
how agent: integrate dimension analysis into the derivation flow

liuzhiheng 1 month ago
parent commit 318b69dbc4

+ 14 - 3
examples_how/overall_derivation/derivation_main.md

@@ -76,6 +76,7 @@ $system$
 | 账号 pattern 复用 | `find_pattern` | 需 `account_name`、`post_id`、`derived_items`(可为空)、条件概率阈值、`top_n`;当 `derived_items` 非空时优先返回元素中包含已推导选题点的 pattern;**返回数据中已包含每个 pattern 各元素(含通过人设树子节点/兄弟节点扩展匹配)匹配成功的帖子选题点**。 |
 | 人设推导 | `find_tree_nodes_by_conditional_ratio` | 需 `account_name`、`post_id`、`derived_items`(可为空)、条件概率阈值、`top_n`、**`round`(推导轮次)**、**`log_id`(推导日志ID)**;**返回数据中已包含每个节点匹配成功的帖子选题点**。 |
 | 信息搜索 | 调用子 agent | 使用 `agent(agent_type="derivation_search", "task="...")`,在 `task` 中传入本次搜索所需的全部参数(见下方说明)。搜索子 agent 内部完成搜索与评估,将候选点及匹配结果一并返回。 |
+| **轮末维度汇总** | `round_pattern_dimension_analyze` | **每轮步骤四**在已写入本轮评估日志之后调用;需 `account_name`、`post_id`、`log_id`、当前轮次 **`round`**。工具基于第 `round` 轮及之前的评估累计状态,输出**当前已推导维度**与**可能尚未推导的维度**(可读文本),供下一轮策略决策。|
 
 ### 信息搜索子 agent 调用参数说明
 
@@ -136,7 +137,7 @@ agent(agent_type="derivation_search", task="执行搜索任务,account_name=xx
 
 ### 推导主循环
 
-每一轮推导按以下四个步骤顺序执行,**不可跳步、不可乱序**:
+每一轮推导按以下**四个步骤**顺序执行(步骤四内含 4.1→4.2→4.3 子顺序),**不可跳步、不可乱序**:
 
 **步骤一:策略决策**
 执行推导前,先明确本轮方向:当前处于广召回阶段还是收敛阶段?上一轮评估结果如何,哪些方向值得延伸或放弃?本轮应选用哪些方法与参数组合?同时检查 `failed_points` 列表,确保本轮不重复已失败的推导方向。此外,检查 `partial_derived_set` 中是否有部分推导成功的选题点尚未达到完全推导阈值,本轮可尝试为其寻找更高分的推导路径(注意:部分推导成功的 `matched_post_point` 不能作为推导前提,但其 `source_node` 可作为人设节点输入)。
@@ -169,10 +170,14 @@ agent(agent_type="derivation_search", task="执行搜索任务,account_name=xx
 
 **部分推导升级检查**:对于 `partial_derived_set` 中已有的选题点,若本轮出现了更高 `matched_score` 的路径,则更新该选题点的记录(包括分数和对应的推导路径信息)。若更新后 `matched_score >= 0.78`,则将其从 `partial_derived_set` 移入 `derived_success_set`。
 
-**步骤四:写入日志 + 更新集合(每轮必须执行,不可省略)**
+**步骤四:写入日志 + 更新集合 + 获取当前推导维度数据(每轮必须按序执行,不可省略)**
+
+**4.1 写入推导日志与评估日志**
 - 将本轮推导路径按**推导日志**格式写入 `/Users/liuzhiheng/work/aigc/code/Agent/examples_how/overall_derivation/output/{account_name}/推导日志/{帖子ID}/{log_id}/{轮次}_推导.json`
 - 将步骤三的匹配判断结果按**评估日志**格式写入 `/Users/liuzhiheng/work/aigc/code/Agent/examples_how/overall_derivation/output/{account_name}/推导日志/{帖子ID}/{log_id}/{轮次}_评估.json`
 - 直接调用工具写入文件即可,不需要创建目录
+
+**4.2 更新集合**
 - 根据匹配结果更新集合:
   - `is_matched=true` 且 `matched_score >= 0.78`:将 `matched_post_point` 加入 `derived_success_set`(完全推导成功)
   - `is_matched=true` 且 `matched_score < 0.78`:若该选题点不在 `derived_success_set` 中,将其加入 `partial_derived_set`(部分推导成功),或更新 `partial_derived_set` 中已有记录的分数(取更高分)
@@ -180,7 +185,13 @@ agent(agent_type="derivation_search", task="执行搜索任务,account_name=xx
 - 根据匹配结果更新 `failed_points`:将 `is_matched=false` 的推导选题点记录在案,后续推导中**原则上不得再次输出**该名称;若确有必要重新推导,须换用完全不同的推导方法与输入组合
 - **更新 `consecutive_zero_rounds`**:若本轮匹配率 = 0%(无任何 `is_matched=true` 的记录,包括完全推导和部分推导),则 `consecutive_zero_rounds += 1`;否则重置为 `0`
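步骤 4.2 的集合更新与 `consecutive_zero_rounds` 规则可以用如下 Python 草图表达(仅为示意:集合与记录的数据形状是本示例的假设,并非仓库中的实际数据结构):

```python
# Sketch of the step-4.2 bookkeeping rules (full-derivation threshold 0.78).
# Record shapes ({"point", "is_matched", "matched_score"}) are illustrative
# assumptions, not the repo's real structures.
FULL_THRESHOLD = 0.78

def apply_round_results(results, derived_success_set, partial_derived_set,
                        failed_points, consecutive_zero_rounds):
    """results: list of {"point", "is_matched", "matched_score"} dicts."""
    any_matched = False
    for r in results:
        point, score = r["point"], r.get("matched_score", 0.0)
        if r["is_matched"] and score >= FULL_THRESHOLD:
            # fully derived: promote (possibly out of the partial set)
            any_matched = True
            derived_success_set.add(point)
            partial_derived_set.pop(point, None)
        elif r["is_matched"]:
            # partially derived: keep the best score seen so far
            any_matched = True
            if point not in derived_success_set:
                partial_derived_set[point] = max(
                    partial_derived_set.get(point, 0.0), score)
        else:
            # failed: record so later rounds avoid re-deriving this point
            failed_points.add(point)
    # a zero-match round increments the counter; any match resets it
    return 0 if any_matched else consecutive_zero_rounds + 1
```

「部分推导升级」(步骤三)也落在同一规则里:同一选题点后续出现 `>= 0.78` 的路径时,第一个分支会把它从 partial 集合移入 success 集合。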
 
-> **日志输出要求(强制)**:上述两个 JSON 文件是每轮唯一合法的输出载体。**禁止**以 markdown 文件、汇总报告或任何其他格式替代按轮次写入的 JSON 日志文件。每轮的推导日志和评估日志必须在该轮匹配判断完成后**立即写入**,不得延迟到任务结束后统一输出。
+**4.3 获取当前推导维度数据(须在 4.1 写入本轮 `{轮次}_评估.json` 之后执行)**
+- **任务目的**:汇总截至本轮结束时,人设维度上**已推导**与**仍可能未推导**的节点视图,便于下一轮策略决策(例如优先覆盖未推导维度、收敛方向判断)。
+- **操作方式**:调用工具 **`round_pattern_dimension_analyze(account_name, post_id, log_id, round)`**,其中 **`round` 与当前轮次一致**(与刚写入的 `{轮次}_评估.json` 对应)。
+- **工具行为**:读取第 `round` 轮及之前的评估累计状态,返回 **已推导的维度**、**未推导的维度** 等结构化结论(`ToolResult.output` 为可读摘要)。
+- **主 agent 动作**:读取工具返回内容,将「已推导 / 未推导维度」记入工作记忆,供**下一轮步骤一(策略决策)**与后续调用人设推导工具时参考;若工具报错(如评估文件尚未就绪),须先确认 4.1 已正确写入本轮评估日志后再重试或按错误说明处理。
+
+> **日志输出要求(强制)**:**推导日志**与**评估日志**两个 JSON 是每轮推导与评估结果的唯一合法输出载体。**禁止**以 markdown 文件、汇总报告或任何其他格式替代按轮次写入的 `{轮次}_推导.json` / `{轮次}_评估.json`。上述两文件须在步骤三完成后**立即写入**。
 
 **完成四步后**,根据「失败恢复与策略调整」章节判断是否继续下一轮。
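步骤 4.1 的按轮文件布局可以用 pathlib 勾勒如下(目录结构取自上文文档;函数名与参数名是本示例的假设):

```python
from pathlib import Path

def round_log_paths(base: Path, account_name: str, post_id: str,
                    log_id: str, round_no: int) -> tuple[Path, Path]:
    """Return the (derivation log, evaluation log) paths for one round,
    following output/{account_name}/推导日志/{post_id}/{log_id}/."""
    d = base / account_name / "推导日志" / post_id / log_id
    return d / f"{round_no}_推导.json", d / f"{round_no}_评估.json"
```

维度分析工具写出的 `{轮次}_维度分析.json` 与评估日志同级,因此同一个目录函数也覆盖 4.3 的产物。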
 

+ 1 - 0
examples_how/overall_derivation/overall_derivation_agent_run.py

@@ -269,6 +269,7 @@ async def main(account_name, post_id):
         ("find_pattern", "find_pattern.py"),
         ("point_match", "point_match.py"),
         ("search_and_eval", "search_and_eval.py"),
+        ("pattern_dimension_analyze", "pattern_dimension_analyze.py"),
     ]:
         path = tools_dir / file_name
         if path.is_file():

+ 5 - 14
examples_how/overall_derivation/tools/find_pattern.py

@@ -219,36 +219,27 @@ def get_patterns_by_conditional_ratio(
     return result
 
 
-@tool(
-    description="按条件概率从 pattern 库中筛选 pattern,优先返回包含已推导选题点的 pattern,并检查每个 pattern 的元素是否与帖子选题点匹配。"
-    "功能:根据账号与已推导选题点(可选),筛选条件概率不低于阈值的 pattern;当 derived_items 非空时,优先返回 pattern 元素中包含已推导选题点的 pattern;同时对每个 pattern 的所有元素做帖子选题点匹配,匹配结果直接包含在返回数据中。"
-    "参数:account_name 为账号名;post_id 为帖子ID,用于加载帖子选题点并做匹配判断;derived_items 为已推导选题点列表,每项含 topic(或已推导的选题点)与 source_node(或推导来源人设树节点),可为空,为空时条件概率使用 pattern 自身的 support;conditional_ratio_threshold 为条件概率阈值;top_n 为返回条数上限,默认 100。"
-    "返回:ToolResult,output 为可读的 pattern 列表文本"
-)
+@tool()
 async def find_pattern(
     account_name: str,
     post_id: str,
     derived_items: list[dict[str, str]],
     conditional_ratio_threshold: float,
     top_n: int = 100,
-    context: Optional[ToolContext] = None,
 ) -> ToolResult:
     """
     按条件概率阈值从 pattern 库筛选 pattern,返回最多 top_n 条(按条件概率降序)。
     当 derived_items 非空时,优先返回元素中包含已推导选题点的 pattern。
     返回前对每个 pattern 的所有元素做帖子选题点匹配,匹配结果直接包含在返回数据中。
 
-    参数
-    -------
+    Args:
     account_name : 账号名,用于定位该账号的 pattern 库。
     post_id : 帖子ID,用于加载帖子选题点并与 pattern 元素做匹配判断。
-    derived_items : 已推导选题点列表,可为空。非空时每项为字典,需含 topic(或「已推导的选题点」)与 source_node(或「推导来源人设树节点」);为空时各 pattern 的条件概率取其自身 support。
+    derived_items : 已推导选题点列表,可为空。非空时每项为字典,需含 topic(或「已推导的选题点」)与 source_node(或「推导来源人设树节点」)。
     conditional_ratio_threshold : 条件概率阈值,仅返回条件概率 >= 该值的 pattern。
-    top_n : 返回条数上限,默认 100。
-    context : 可选,Agent 工具上下文。
+    top_n : 返回条数上限。
 
-    返回
-    -------
+    Returns:
     ToolResult:
         - title: 结果标题。
         - output: 可读的 pattern 列表文本(每行:pattern名称、条件概率、帖子匹配情况)。
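docstring 中描述的 `derived_items` 键名规范化(`topic` 或「已推导的选题点」、`source_node` 或「推导来源人设树节点」)可以如下草拟;这只是对文档描述的示意,仓库中 `_parse_derived_list` 的实际实现可能不同:

```python
def parse_derived_items(items: list[dict]) -> list[tuple[str, str]]:
    """Normalize derived_items into (topic, source_node) tuples.

    Each dict may use the English keys or their Chinese aliases, as the
    tool docstrings describe; entries missing either field are skipped.
    """
    out: list[tuple[str, str]] = []
    for d in items or []:
        topic = d.get("topic") or d.get("已推导的选题点")
        source = d.get("source_node") or d.get("推导来源人设树节点")
        if topic and source:
            out.append((str(topic).strip(), str(source).strip()))
    return out
```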

+ 164 - 49
examples_how/overall_derivation/tools/find_tree_node.py

@@ -15,8 +15,8 @@ from typing import Any, Optional
 _root = Path(__file__).resolve().parent.parent
 if str(_root) not in sys.path:
     sys.path.insert(0, str(_root))
-from utils.conditional_ratio_calc import calc_node_conditional_ratio
-from tools.point_match import match_derivation_to_post_points
+from utils.conditional_ratio_calc import calc_node_conditional_ratio  # noqa: E402
+from tools.point_match import match_derivation_to_post_points  # noqa: E402
 
 try:
     from agent.tools import tool, ToolResult, ToolContext
@@ -26,8 +26,82 @@ except ImportError:
     ToolResult = None  # 仅用 main() 测核心逻辑时可无 agent
     ToolContext = None
 
-# 相对本文件:tools -> overall_derivation,input 在 overall_derivation 下
+# 相对本文件:tools -> overall_derivation,input / output 在 overall_derivation 下
 _BASE_INPUT = Path(__file__).resolve().parent.parent / "input"
+_BASE_OUTPUT = Path(__file__).resolve().parent.parent / "output"
+
+
+def _dimension_analysis_log_dir(account_name: str, post_id: str, log_id: str) -> Path:
+    """推导日志目录:output/{account_name}/推导日志/{post_id}/{log_id}/"""
+    return _BASE_OUTPUT / account_name / "推导日志" / post_id / log_id
+
+
+def _load_derived_dim_tree_node_names(
+    account_name: str, post_id: str, log_id: str, round: int
+) -> list[str]:
+    """
+    读取当前轮次对应的维度分析 JSON(优先 {round}_维度分析.json,不存在则 {round-1}_维度分析.json),
+    返回 derived_dims 中每项的 tree_node_name(已推导出的维度节点,人设树中层次较高)。
+    无可用文件时返回空列表。
+    """
+    if not log_id or not str(log_id).strip():
+        return []
+    log_dir = _dimension_analysis_log_dir(account_name, post_id, str(log_id).strip())
+    for r in (round, round - 1):
+        if r < 1:
+            continue
+        path = log_dir / f"{r}_维度分析.json"
+        if not path.is_file():
+            continue
+        try:
+            with open(path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+        except Exception:
+            continue
+        dims = data.get("derived_dims") or []
+        names: list[str] = []
+        for d in dims:
+            if isinstance(d, dict):
+                tn = d.get("tree_node_name")
+                if tn is not None and str(tn).strip():
+                    names.append(str(tn).strip())
+        return names
+    return []
+
+
+def _descendant_names_under_tree_nodes(
+    account_name: str, anchor_node_names: list[str]
+) -> tuple[set[str], dict[str, str]]:
+    """
+    在每个人设维度树根上 DFS,收集所有锚点(derived_dims.tree_node_name)之下的**全部后代**(不含锚点自身)。
+
+    同时记录「所属维度」:对路径上每个后代节点,取从维度根到该节点路径上**最深的**那个锚点
+    (与原先沿父链向上找最近 derived_dim 一致;多个锚点呈祖孙时取更深者)。
+
+    Returns:
+        (allowed 节点名集合, 节点名 -> 所属已推导维度树节点名)
+    """
+    if not anchor_node_names:
+        return set(), {}
+    S = set(anchor_node_names)
+    allowed: set[str] = set()
+    dim_map: dict[str, str] = {}
+
+    for dim_root_name, root in _load_trees(account_name):
+
+        def dfs(node_name: str, node_dict: dict, parent_deepest_s: Optional[str]) -> None:
+            d_self = node_name if node_name in S else parent_deepest_s
+            for cname, cnode in (node_dict.get("children") or {}).items():
+                if not isinstance(cnode, dict):
+                    continue
+                if cname not in S and d_self is not None:
+                    allowed.add(cname)
+                    dim_map[cname] = d_self
+                dfs(cname, cnode, d_self)
+
+        dfs(dim_root_name, root, None)
+
+    return allowed, dim_map
 
 
 def _tree_dir(account_name: str) -> Path:
@@ -107,41 +181,60 @@ def get_nodes_by_conditional_ratio(
     derived_list: list[tuple[str, str]],
     threshold: float,
     top_n: int,
+    allowed_node_names: Optional[set[str]] = None,
+    node_belonging_dim: Optional[dict[str, str]] = None,
 ) -> list[dict[str, Any]]:
     """
     获取人设树中条件概率 >= threshold 的节点,按条件概率降序,返回前 top_n 个。
     derived_list: 已推导列表,每项 (已推导的选题点, 推导来源人设树节点);为空时使用节点自身的 _ratio 作为条件概率。
-    返回列表项:节点名称、条件概率、父节点名称。
+    allowed_node_names: 若给定,仅保留节点名称在该集合内的结果。
+    node_belonging_dim: 与 allowed 同步生成(见 _descendant_names_under_tree_nodes),节点名 -> 所属已推导维度;不传则所属维度均为「—」。
+    返回列表项:节点名称、条件概率、父节点名称、所属维度。
     """
     base_dir = _BASE_INPUT
-    scored: list[tuple[str, float, str]] = []
+    node_to_parent: dict[str, str] = {}
+    if derived_list:
+        for n, p, _ in _iter_all_nodes(account_name):
+            node_to_parent[n] = p
+
+    def dim_for(node_name: str) -> str:
+        if not node_belonging_dim:
+            return "—"
+        return node_belonging_dim.get(node_name) or "—"
+
+    scored: list[tuple[str, float, str, str]] = []
 
     if not derived_list:
-        # derived_items 为空:条件概率取节点本身的 _ratio
         for node_name, parent_name, node in _iter_all_nodes(account_name):
+            if allowed_node_names is not None and node_name not in allowed_node_names:
+                continue
             ratio = node.get("_ratio")
             if ratio is None:
                 ratio = 0.0
             else:
                 ratio = float(ratio)
             if ratio >= threshold:
-                scored.append((node_name, ratio, parent_name))
+                scored.append((node_name, ratio, parent_name, dim_for(node_name)))
     else:
-        node_to_parent: dict[str, str] = {}
-        for node_name, parent_name, _ in _iter_all_nodes(account_name):
-            node_to_parent[node_name] = parent_name
         for node_name, parent_name in node_to_parent.items():
+            if allowed_node_names is not None and node_name not in allowed_node_names:
+                continue
             ratio = calc_node_conditional_ratio(
                 account_name, derived_list, node_name, base_dir=base_dir
             )
             if ratio >= threshold:
-                scored.append((node_name, ratio, parent_name))
+                scored.append((node_name, ratio, parent_name, dim_for(node_name)))
 
     scored.sort(key=lambda x: x[1], reverse=True)
     top = scored[:top_n]
     return [
-        {"节点名称": name, "条件概率": ratio, "父节点名称": parent}
-        for name, ratio, parent in top
+        {
+            "节点名称": name,
+            "条件概率": ratio,
+            "父节点名称": parent,
+            "所属维度": dim,
+        }
+        for name, ratio, parent, dim in top
     ]
 
 
@@ -163,28 +256,19 @@ def _parse_derived_list(derived_items: list[dict[str, str]]) -> list[tuple[str,
 # Agent Tools(参考 glob_tool 封装)
 # ---------------------------------------------------------------------------
 
-@tool(
-    description="获取指定账号人设树中的常量节点(全局常量、局部常量),并检查每个节点与帖子选题点的匹配情况。"
-    "功能:根据账号名查询该账号人设树中所有常量节点,同时对每个节点判断是否匹配帖子选题点,匹配结果直接包含在返回数据中。"
-    "参数:account_name 为账号名;post_id 为帖子ID,用于加载帖子选题点并做匹配判断。"
-    "返回:ToolResult,output 为可读的节点列表文本"
-)
+@tool()
 async def find_tree_constant_nodes(
     account_name: str,
     post_id: str,
-    context: Optional[ToolContext] = None,
 ) -> ToolResult:
     """
     获取人设树中的常量节点列表(全局常量与局部常量),并检查每个节点与帖子选题点的匹配情况。
 
-    参数
-    -------
+    Args:
     account_name : 账号名,用于定位该账号的人设树数据。
     post_id : 帖子ID,用于加载帖子选题点并与各常量节点做匹配判断。
-    context : 可选,Agent 工具上下文。
 
-    返回
-    -------
+    Returns:
     ToolResult:
         - title: 结果标题。
         - output: 可读的节点列表文本(每行:节点名称、概率、常量类型、帖子匹配情况)。
@@ -237,37 +321,32 @@ async def find_tree_constant_nodes(
         )
 
 
-@tool(
-    description="按条件概率从人设树中筛选节点,返回达到阈值且按条件概率排序的前 topN 条,并检查每个节点与帖子选题点的匹配情况。"
-    "功能:根据账号与已推导选题点(可选),筛选人设树中条件概率不低于阈值的节点,同时对每个节点判断是否匹配帖子选题点,匹配结果直接包含在返回数据中。"
-    "参数:account_name 为账号名;post_id 为帖子ID,用于加载帖子选题点并做匹配判断;derived_items 为已推导选题点列表,每项含 topic(或已推导的选题点)与 source_node(或推导来源人设树节点),可为空,为空时条件概率使用节点自身的 _ratio;conditional_ratio_threshold 为条件概率阈值;top_n 为返回条数上限,默认 100。"
-    "返回:ToolResult,output 为可读的节点列表文本"
-)
+@tool()
 async def find_tree_nodes_by_conditional_ratio(
     account_name: str,
     post_id: str,
     derived_items: list[dict[str, str]],
     conditional_ratio_threshold: float,
     top_n: int = 100,
-    context: Optional[ToolContext] = None,
+    round: int = 1,
+    log_id: str = "",
 ) -> ToolResult:
     """
     按条件概率阈值从人设树筛选节点,返回最多 top_n 条(按条件概率降序),并检查每个节点与帖子选题点的匹配情况。
 
-    参数
-    -------
+    Args:
     account_name : 账号名,用于定位该账号的人设树数据。
     post_id : 帖子ID,用于加载帖子选题点并与各节点做匹配判断。
-    derived_items : 已推导选题点列表,可为空。非空时每项为字典,需含 topic(或「已推导的选题点」)与 source_node(或「推导来源人设树节点」);为空时各节点的条件概率取其自身 _ratio。
+    derived_items : 已推导选题点列表,可为空。非空时每项为字典,需含 topic(或「已推导的选题点」)与 source_node(或「推导来源人设树节点」)。
     conditional_ratio_threshold : 条件概率阈值,仅返回条件概率 >= 该值的节点。
-    top_n : 返回条数上限,默认 100。
-    context : 可选,Agent 工具上下文。
+    top_n : 返回条数上限。
+    round : 推导轮次。
+    log_id : 推导日志ID。
 
-    返回
-    -------
+    Returns:
     ToolResult:
         - title: 结果标题。
-        - output: 可读的节点列表文本(每行:节点名称、条件概率、父节点名称、帖子匹配情况)。
+        - output: 可读的节点列表文本(每行:节点名称、条件概率、父节点、所属维度、帖子匹配情况)。
         - 出错时 error 为错误信息。
     """
     tree_dir = _tree_dir(account_name)
@@ -279,9 +358,37 @@ async def find_tree_nodes_by_conditional_ratio(
         )
     try:
         derived_list = _parse_derived_list(derived_items or [])
+
+        allowed: Optional[set[str]] = None
+        node_belonging_dim: dict[str, str] = {}
+        dim_source = ""
+        derived_dim_names: list[str] = []
+        if log_id and str(log_id).strip():
+            derived_dim_names = _load_derived_dim_tree_node_names(
+                account_name, post_id, str(log_id).strip(), int(round)
+            )
+            if derived_dim_names:
+                allowed, node_belonging_dim = _descendant_names_under_tree_nodes(
+                    account_name, derived_dim_names
+                )
+                # 记录实际用到的维度分析文件(与读取逻辑一致)
+                log_dir = _dimension_analysis_log_dir(account_name, post_id, str(log_id).strip())
+                for r in (int(round), int(round) - 1):
+                    if r >= 1 and (log_dir / f"{r}_维度分析.json").is_file():
+                        dim_source = f"{r}_维度分析.json (derived_dims -> 全部后代)"
+                        break
+            else:
+                dim_source = "未读到 derived_dims(无对应维度分析文件或为空),未收窄"
+
         items = get_nodes_by_conditional_ratio(
-            account_name, derived_list, conditional_ratio_threshold, top_n
+            account_name,
+            derived_list,
+            conditional_ratio_threshold,
+            top_n,
+            allowed_node_names=allowed,
+            node_belonging_dim=node_belonging_dim if node_belonging_dim else None,
         )
+
         # 批量匹配所有节点与帖子选题点
         if items and post_id:
             node_names = [x["节点名称"] for x in items]
@@ -296,11 +403,9 @@ async def find_tree_nodes_by_conditional_ratio(
                 matches = node_match_map.get(item["节点名称"], [])
                 item["帖子选题点匹配"] = matches if matches else "无"
 
-
         # [临时] 仅保留有帖子选题点匹配的记录(过滤掉「无」),方便后续删除
         items = [x for x in items if isinstance(x.get("帖子选题点匹配"), list)]
 
-
         if not items:
             output = f"未找到条件概率 >= {conditional_ratio_threshold} 的节点"
         else:
@@ -311,8 +416,9 @@ async def find_tree_nodes_by_conditional_ratio(
                     match_str = "、".join(f"{m['帖子选题点']}({m['匹配分数']})" for m in match_info)
                 else:
                     match_str = str(match_info)
+                dim_label = x.get("所属维度", "—")
                 lines.append(
-                    f"- {x['节点名称']}\t条件概率={x['条件概率']}\t父节点={x['父节点名称']}\t帖子选题点匹配={match_str}"
+                    f"- {x['节点名称']}\t条件概率={x['条件概率']}\t父节点={x['父节点名称']}\t所属维度={dim_label}\t帖子选题点匹配={match_str}"
                 )
             output = "\n".join(lines)
         return ToolResult(
@@ -323,6 +429,13 @@ async def find_tree_nodes_by_conditional_ratio(
                 "threshold": conditional_ratio_threshold,
                 "top_n": top_n,
                 "count": len(items),
+                "round": int(round),
+                "log_id": str(log_id).strip() if log_id else "",
+                "dimension_filter": {
+                    "derived_dim_nodes": derived_dim_names,
+                    "allowed_descendant_count": len(allowed) if allowed is not None else None,
+                    "source": dim_source or ("未提供 log_id,未按维度收窄" if not (log_id and str(log_id).strip()) else ""),
+                },
             },
         )
     except Exception as e:
@@ -343,9 +456,9 @@ def main() -> None:
     #     {"topic": "分享", "source_node": "分享"},
     #     {"topic": "叙事结构", "source_node": "叙事结构"},
     # ]
-    derived_items = [{"source_node":"分享","topic":"分享"},{"source_node":"叙事结构","topic":"叙事结构"},{"source_node":"图片文字","topic":"图片文字"},{"source_node":"补充说明式","topic":"补充说明式"},{"source_node":"幽默化标题","topic":"幽默化标题"},{"source_node":"标题","topic":"标题"}]
+    derived_items = [{"topic":"分享","source_node":"分享"},{"topic":"叙事结构","source_node":"叙事编排"},{"topic":"幽默化标题","source_node":"幽默化标题"},{"source_node":"叙事结构","topic":"叙事结构"},{"topic":"夸张堆叠","source_node":"夸张转化"},{"topic":"居家生活场景","source_node":"生活场景"},{"topic":"图片文字","source_node":"图片文字"},{"source_node":"补充说明式","topic":"补充说明式"},{"topic":"标题","source_node":"标题"},{"topic":"递进式","source_node":"递进式"},{"source_node":"荒诞夸张","topic":"夸张堆叠"},{"topic":"图片文字","source_node":"图文组合"},{"topic":"补充说明式","source_node":"解读说明"},{"source_node":"分步骤","topic":"递进式"},{"source_node":"视觉证据","topic":"拖鞋物证"},{"topic":"鞋架","source_node":"家居器具"},{"topic":"柴犬形象","source_node":"形象演绎"},{"topic":"叙事结构","source_node":"结构编排"},{"topic":"图片文字","source_node":"图文编排"},{"source_node":"形象","topic":"柴犬形象"},{"source_node":"版面结构","topic":"叙事结构"},{"source_node":"夸张变形","topic":"夸张堆叠"},{"source_node":"夸张造型","topic":"夸张堆叠"},{"topic":"夸张堆叠","source_node":"夸张穿戴法"}]
     conditional_ratio_threshold = 0.2
-    top_n = 1000
+    top_n = 2000
 
     # # 1)常量节点(核心函数,无匹配)
     # constant_nodes = get_constant_nodes(account_name)
@@ -367,15 +480,17 @@ def main() -> None:
     # 3)有 agent 时通过 tool 接口再跑一遍(含帖子选题点匹配)
     if ToolResult is not None:
         async def run_tools():
-            r1 = await find_tree_constant_nodes(account_name, post_id=post_id)
-            print("--- find_tree_constant_nodes ---")
-            print(r1.output[:2000] + "..." if len(r1.output) > 2000 else r1.output)
+            # r1 = await find_tree_constant_nodes(account_name, post_id=post_id)
+            # print("--- find_tree_constant_nodes ---")
+            # print(r1.output[:2000] + "..." if len(r1.output) > 2000 else r1.output)
             r2 = await find_tree_nodes_by_conditional_ratio(
                 account_name,
                 post_id=post_id,
                 derived_items=derived_items,
                 conditional_ratio_threshold=conditional_ratio_threshold,
                 top_n=top_n,
+                round=6,
+                log_id="20260318172724",
             )
             print("\n--- find_tree_nodes_by_conditional_ratio ---")
             print(r2.output)
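本文件新增的 `_descendant_names_under_tree_nodes` 的核心逻辑(收集锚点之下的全部后代,并把每个后代归属到根路径上**最深**的锚点)可以在一个玩具树上独立验证;下面的树形状是示意假设:

```python
from typing import Optional

def descendants_under_anchors(trees, anchors):
    """Collect all descendants below any anchor node (anchors excluded),
    mapping each to the deepest anchor on its root path.

    trees: list of (root_name, root_dict) pairs, where each node dict has
    a "children" mapping of name -> child dict (toy shape, mirroring
    _descendant_names_under_tree_nodes above).
    """
    S = set(anchors)
    allowed: set[str] = set()
    dim_map: dict[str, str] = {}

    def dfs(name: str, node: dict, deepest: Optional[str]) -> None:
        # an anchor on the path replaces any shallower anchor
        d_self = name if name in S else deepest
        for cname, cnode in (node.get("children") or {}).items():
            if cname not in S and d_self is not None:
                allowed.add(cname)
                dim_map[cname] = d_self
            dfs(cname, cnode, d_self)

    for root_name, root in trees:
        dfs(root_name, root, None)
    return allowed, dim_map
```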

+ 306 - 56
examples_how/overall_derivation/tools/pattern_dimension_analyze.py

@@ -10,14 +10,27 @@ Pattern 维度分析 Tool
 - account_name: 账号名称
 - post_id: 帖子 ID
 - log_id: 推导日志目录名(形如 20260313210921)
-- cluster_level: 在人设树中查找祖先节点的目标深度(root 为 0 层)
+
+已推导/未推导维度节点在结果中以对象列表表示,字段见 _analyze_single_round 返回说明。
 """
 
 import json
+import logging
 import sys
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple, Set
 
+logger = logging.getLogger(__name__)
+
+try:
+    from agent.tools import tool, ToolResult, ToolContext
+except ImportError:
+    def tool(*args, **kwargs):
+        return lambda f: f
+
+    ToolResult = None
+    ToolContext = None
+
 # 保证直接运行或作为包加载时都能解析 utils / tools(IDE 可跳转)
 _root = Path(__file__).resolve().parent.parent
 if str(_root) not in sys.path:
@@ -35,6 +48,9 @@ TOP_KEYS = [
 ]
 SUB_KEYS = ["two_x", "one_x", "zero_x"]
 
+# 在人设树中查找祖先节点的目标深度(root 为 0 层)
+CLUSTER_LEVEL = 3
+
 
 # ---------------------------------------------------------------------------
 # 1. 读取推导日志:按轮次累计 matched_post_point
@@ -52,6 +68,7 @@ def _load_round_matched_points(
     account_name: str,
     post_id: str,
     log_id: str,
+    max_round: Optional[int] = None,
 ) -> List[Dict[str, Any]]:
     """
     读取指定日志目录下所有 {轮次}_评估.json,按轮次排序,生成:
@@ -91,6 +108,9 @@ def _load_round_matched_points(
             continue
         eval_files.append((r, p))
 
+    if max_round is not None:
+        eval_files = [(r, p) for r, p in eval_files if r <= max_round]
+
     eval_files.sort(key=lambda x: x[0])
     results: List[Dict[str, Any]] = []
     cumulative: List[Dict[str, Any]] = []
@@ -512,11 +532,36 @@ class TreeIndex:
 # 4. 对单轮数据执行 pattern & 聚类分析
 # ---------------------------------------------------------------------------
 
+def _dim_obj(
+    tree_node_name: str,
+    tree_index: TreeIndex,
+    matched_point: Optional[str] = None,
+) -> Dict[str, Any]:
+    dim = (tree_index.node_info.get(tree_node_name) or {}).get("dimension") or ""
+    o: Dict[str, Any] = {
+        "tree_node_name": tree_node_name,
+        "dimension": dim,
+    }
+    if matched_point is not None:
+        o["matched_point"] = matched_point
+    return o
+
+
+def _entry_to_matched_point(entry: Dict[str, Any]) -> str:
+    """is_fully_derived 为 true 时用 matched_post_point,否则用 derivation_output_point。"""
+    dop = entry.get("derivation_output_point")
+    dop_s = str(dop).strip() if dop is not None else ""
+    if entry.get("is_fully_derived") is True:
+        mpp = entry.get("matched_post_point")
+        return str(mpp).strip() if mpp is not None else ""
+    return dop_s
+
+
 def _analyze_single_round(
     patterns: List[Dict[str, Any]],
     tree_index: TreeIndex,
     cumulative_points: List[Dict[str, Any]],
-    cluster_level: int,
+    cluster_level: int = CLUSTER_LEVEL,
 ) -> Dict[str, Any]:
     """
     对某一轮(给定累计 point 列表)执行维度分析:
@@ -527,24 +572,19 @@ def _analyze_single_round(
     3. 对筛选出 pattern 的每个元素标记是否已推导:
        - 元素在 derived_ancestor_set 中 → is_derived=True(已推导维度)
        - 其他 → is_derived=False(未推导维度)
-    4. 汇总 derived_dims / underived_dims 列表。
-
-    返回结构:
-    {
-      "cumulative_points": [...],          # 原始累计 point 对象列表
-      "derived_ancestor_nodes": [...],     # 所有 derivation_output_point 对应的 cluster_level 层祖先节点(已推导维度节点集合)
-      "patterns": [...],                   # 筛选后带 is_derived 标记的 pattern 列表
-      "derived_dims": [...],               # 已推导维度节点(去重,出现于筛选 pattern 中)
-      "underived_dims": [...],             # 未推导维度节点(去重,排除已推导节点)
-      "patterns_count": int,
-      "derived_dim_count": int,
-      "underived_dim_count": int,
-    }
+    4. 汇总 derived_dims / underived_dims 对象列表。
+
+    返回结构(节选):
+    - derived_ancestor_nodes: [{ tree_node_name, dimension, matched_point }, ...]
+    - derived_dims: [{ tree_node_name, dimension, matched_point }, ...]
+    - underived_dims: [{ tree_node_name, dimension }, ...](无 matched_point)
     """
-    # 1. 收集 derived_ancestor_set,同时记录每个祖先节点对应的 matched_post_point 来源
+    # 1. 收集 derived_ancestor_set,同时按规则累计每个祖先的 matched_point
     derived_ancestor_set: Set[str] = set()
-    ancestor_to_mpps: Dict[str, List[str]] = {}  # 祖先节点 -> [matched_post_point, ...]
+    ancestor_to_matched: Dict[str, List[str]] = {}
     for entry in cumulative_points:
+        if not isinstance(entry, dict):
+            continue
         dop = entry.get("derivation_output_point")
         if not dop:
             continue
@@ -552,9 +592,9 @@ def _analyze_single_round(
         if not ancestor:
             continue
         derived_ancestor_set.add(ancestor)
-        mpp = entry.get("matched_post_point") or ""
-        if mpp and mpp not in ancestor_to_mpps.get(ancestor, []):
-            ancestor_to_mpps.setdefault(ancestor, []).append(mpp)
+        pt = _entry_to_matched_point(entry)
+        if pt and pt not in ancestor_to_matched.get(ancestor, []):
+            ancestor_to_matched.setdefault(ancestor, []).append(pt)
 
     # 2. 筛选 pattern:已推导维度节点占所有元素的比例 >= 50%
     filtered_patterns: List[Dict[str, Any]] = []
@@ -578,23 +618,19 @@ def _analyze_single_round(
         f"derived_ancestor_set: {len(derived_ancestor_set)}"
     )
 
-    def _node_label(name: str, is_derived: bool) -> str:
-        """
-        返回格式化标签:
-        - 已推导节点:'node_name->dimension(mpp1,mpp2,...)'
-        - 未推导节点:'node_name->dimension'
-        """
-        dim = (tree_index.node_info.get(name) or {}).get("dimension") or ""
-        base = f"{name}->{dim}" if dim else name
-        if is_derived:
-            mpps = ancestor_to_mpps.get(name) or []
-            if mpps:
-                return f"{base}({','.join(mpps)})"
-        return base
+    def _matched_join(name: str) -> str:
+        pts = ancestor_to_matched.get(name) or []
+        return ", ".join(pts) if pts else ""
+
+    derived_ancestor_nodes: List[Dict[str, Any]] = []
+    for anc in sorted(derived_ancestor_set):
+        derived_ancestor_nodes.append(
+            _dim_obj(anc, tree_index, matched_point=_matched_join(anc) or "")
+        )
 
     # 3. 对筛选 pattern 元素分类并汇总维度列表
-    derived_dims: List[str] = []
-    underived_dims: List[str] = []
+    derived_dims: List[Dict[str, Any]] = []
+    underived_dims: List[Dict[str, Any]] = []
     derived_dims_seen: Set[str] = set()
     underived_dims_seen: Set[str] = set()
 
@@ -616,11 +652,17 @@ def _analyze_single_round(
             if is_derived:
                 if name and name not in derived_dims_seen:
                     derived_dims_seen.add(name)
-                    derived_dims.append(_node_label(name, is_derived=True))
+                    derived_dims.append(
+                        _dim_obj(
+                            name,
+                            tree_index,
+                            matched_point=_matched_join(name) or "",
+                        )
+                    )
             else:
                 if name and name not in underived_dims_seen:
                     underived_dims_seen.add(name)
-                    underived_dims.append(_node_label(name, is_derived=False))
+                    underived_dims.append(_dim_obj(name, tree_index))
 
         scored_patterns.append(
             {
@@ -631,7 +673,7 @@ def _analyze_single_round(
         )
 
     # 从 underived_dims 中排除与 derived_dims 重叠的节点
-    underived_dims = [d for d in underived_dims if d.split("->")[0] not in derived_dims_seen]
+    underived_dims = [d for d in underived_dims if d["tree_node_name"] not in derived_dims_seen]
 
     # 按 is_derived=True 的元素数量从高到低排序,数量相同再按元素总数从高到低
     scored_patterns.sort(
@@ -644,7 +686,7 @@ def _analyze_single_round(
 
     return {
         "cumulative_points": list(cumulative_points),
-        "derived_ancestor_nodes": sorted(derived_ancestor_set),
+        "derived_ancestor_nodes": derived_ancestor_nodes,
         "patterns": scored_patterns,
         "derived_dims": derived_dims,
         "underived_dims": underived_dims,
@@ -654,12 +696,32 @@ def _analyze_single_round(
     }
 
 
+def _format_round_dimension_text(
+    derived_dims: List[Dict[str, Any]],
+    underived_dims: List[Dict[str, Any]],
+) -> str:
+    """已推导/未推导维度,每行:维度:tree_node_name,匹配点:matched_point"""
+    lines: List[str] = ["【已推导的维度】"]
+    for d in derived_dims:
+        name = d.get("tree_node_name") or ""
+        mp = d.get("matched_point") or "-"
+        lines.append(f"维度:{name},匹配点:{mp}")
+    if not derived_dims:
+        lines.append("(无)")
+    lines.append("")
+    lines.append("【未推导的维度】")
+    for d in underived_dims:
+        name = d.get("tree_node_name") or ""
+        lines.append(f"维度:{name}")
+    if not underived_dims:
+        lines.append("(无)")
+    return "\n".join(lines)
+
 
 def pattern_dimension_analyze(
     account_name: str,
     post_id: str,
     log_id: str,
-    cluster_level: int = 2,
 ) -> Dict[str, Any]:
     """
     Pattern 维度分析主入口。
@@ -669,14 +731,13 @@ def pattern_dimension_analyze(
     account_name : 账号名(用于定位 input / output 下的数据目录)
     post_id : 帖子 ID(用于定位推导日志)
     log_id : 推导日志目录名(../output/{account_name}/推导日志/{post_id}/{log_id}/)
-    cluster_level : 在人设树中查找祖先节点的目标深度(root 为 0 层),默认 2
 
     逻辑概述
     --------
-    每一轮:
-    1. 从 derivation_output_point 在人设树中找到 cluster_level 层祖先节点 → 已推导维度节点集合。
+    聚类层级固定为 CLUSTER_LEVEL(默认 3)。每一轮:
+    1. 从 derivation_output_point 在人设树中找到第 CLUSTER_LEVEL 层祖先节点 → 已推导维度节点集合。
     2. 筛选包含已推导维度节点的 pattern。
-    3. 标记每个 pattern 元素是否已推导,汇总 derived_dims / underived_dims。
+    3. 标记每个 pattern 元素是否已推导,汇总 derived_dims / underived_dims(对象列表)
     """
     eval_dir = _round_eval_dir(account_name, post_id, log_id)
     if not eval_dir.is_dir():
@@ -688,7 +749,7 @@ def pattern_dimension_analyze(
             "account_name": account_name,
             "post_id": post_id,
             "log_id": log_id,
-            "cluster_level": cluster_level,
+            "cluster_level": CLUSTER_LEVEL,
             "rounds": [],
             "message": "未在指定日志目录下找到任何评估结果文件(*_评估.json)",
         }
@@ -707,7 +768,6 @@ def pattern_dimension_analyze(
             patterns=deduped_patterns,
             tree_index=tree_index,
             cumulative_points=cumulative_points,
-            cluster_level=cluster_level,
         )
         analyzed["round"] = r
         rounds_output.append(analyzed)
@@ -716,18 +776,145 @@ def pattern_dimension_analyze(
         "account_name": account_name,
         "post_id": post_id,
         "log_id": log_id,
-        "cluster_level": cluster_level,
+        "cluster_level": CLUSTER_LEVEL,
         "rounds": rounds_output,
     }
 
 
+def round_pattern_dimension_analyze_core(
+    account_name: str,
+    post_id: str,
+    log_id: str,
+    round: int,
+) -> Dict[str, Any]:
+    """
+    仅使用第 round 轮及之前的评估文件,得到该轮结束时的累计选题点状态并做维度分析。
+    返回 analyzed 单轮结构(含 derived_dims / underived_dims 等),失败时含 error 字段。
+    """
+    eval_dir = _round_eval_dir(account_name, post_id, log_id)
+    if not eval_dir.is_dir():
+        return {"error": f"推导日志目录不存在: {eval_dir}"}
+
+    round_infos = _load_round_matched_points(
+        account_name, post_id, log_id, max_round=round
+    )
+    if not round_infos:
+        return {
+            "error": f"在 {eval_dir} 下未找到第 {round} 轮及之前的 *_评估.json",
+        }
+    last = round_infos[-1]
+    if last.get("round") != round:
+        return {
+            "error": (
+                f"指定轮次 {round} 的评估文件不存在;"
+                f"当前仅加载到第 {last.get('round')} 轮"
+            ),
+        }
+
+    tree_index = TreeIndex(account_name)
+    raw_patterns = _load_raw_patterns(account_name)
+    deduped_patterns = _dedupe_patterns(raw_patterns)
+    analyzed = _analyze_single_round(
+        patterns=deduped_patterns,
+        tree_index=tree_index,
+        cumulative_points=last["cumulative_points"],
+    )
+    analyzed["round"] = round
+    return analyzed
+
+
+@tool()
+async def round_pattern_dimension_analyze(
+    account_name: str,
+    post_id: str,
+    log_id: str,
+    round: int,
+) -> Any:
+    """
+    Derivation dimension analysis: returns the dimensions derived as of the
+    current round, plus the dimensions that may not have been derived yet.
+
+    Args:
+        account_name: account name
+        post_id: post ID
+        log_id: derivation log directory name
+        round: derivation round (positive integer)
+
+    Returns:
+        ToolResult: `output` is readable text with two sections,
+        「已推导的维度」 and 「未推导的维度」; each row has the form
+        「维度:tree_node_name,匹配点:matched_point」 (underived rows always
+        show "-"; matched_point is the post point when is_fully_derived is
+        true, otherwise the derivation output point).
+    """
+    if ToolResult is None:
+        return None
+
+    logger.info(
+        "round_pattern_dimension_analyze: account=%s post_id=%s log_id=%s round=%s",
+        account_name,
+        post_id,
+        log_id,
+        round,
+    )
+
+    try:
+        r = int(round)
+        if r < 1:
+            return ToolResult(
+                title="维度分析: 轮次无效",
+                output="",
+                error="round 须为 >= 1 的整数",
+            )
+    except (TypeError, ValueError):
+        return ToolResult(
+            title="维度分析: 轮次无效",
+            output="",
+            error="round 须为整数",
+        )
+
+    try:
+        analyzed = round_pattern_dimension_analyze_core(
+            account_name, post_id, log_id, r
+        )
+        if analyzed.get("error"):
+            return ToolResult(
+                title=f"维度分析 第{r}轮 失败",
+                output="",
+                error=str(analyzed["error"]),
+            )
+
+        # Save next to the per-round {轮次}_评估.json files in the same directory
+        out_dir = _round_eval_dir(account_name, post_id, log_id)
+        out_dir.mkdir(parents=True, exist_ok=True)
+        out_path = out_dir / f"{r}_维度分析.json"
+        with open(out_path, "w", encoding="utf-8") as f:
+            json.dump(analyzed, f, ensure_ascii=False, indent=2)
+
+        derived = analyzed.get("derived_dims") or []
+        underived = analyzed.get("underived_dims") or []
+        text = _format_round_dimension_text(derived, underived)
+        meta = (
+            f"round={r}, derived={len(derived)}, underived={len(underived)}, "
+            f"patterns={analyzed.get('patterns_count', 0)}"
+        )
+        return ToolResult(
+            title=f"第 {r} 轮维度分析(已推导 {len(derived)} / 未推导 {len(underived)})",
+            output=text,
+            metadata={"round_pattern_dimension_analyze": meta},
+        )
+    except Exception as e:
+        logger.exception("round_pattern_dimension_analyze failed: %s", e)
+        return ToolResult(
+            title="维度分析失败",
+            output="",
+            error=str(e),
+        )
+
+
 def main(account_name, post_id, log_id) -> None:
     """本地简单测试:以家有大志账号的一次推导日志做分析,并将结果写入输出目录。"""
     result = pattern_dimension_analyze(
         account_name=account_name,
         post_id=post_id,
         log_id=log_id,
-        cluster_level=3,
     )
     # 控制台打印前 4000 字符,便于快速查看
     # print(json.dumps(result, ensure_ascii=False, indent=2)[:4000] + "...")
@@ -742,16 +929,79 @@ def main(account_name, post_id, log_id) -> None:
     print(f"\n分析结果已写入: {out_path}")
 
 
+def main_round_pattern_dimension_analyze(
+    account_name: str,
+    post_id: str,
+    log_id: str,
+    round_num: int,
+) -> None:
+    """Local test: call round_pattern_dimension_analyze directly and print its ToolResult."""
+    import asyncio
+
+    async def _run() -> None:
+        result = await round_pattern_dimension_analyze(
+            account_name=account_name,
+            post_id=post_id,
+            log_id=log_id,
+            round=round_num,
+        )
+        if result is None:
+            print(
+                "round_pattern_dimension_analyze 返回 None:请先将 Agent 项目根目录加入 PYTHONPATH,"
+                "或在 __main__ 中保证能 import agent.tools.ToolResult"
+            )
+            return
+        if result.error:
+            print(f"错误: {result.error}")
+        else:
+            print(result.title)
+            print(result.output)
+
+    asyncio.run(_run())
+
+
 if __name__ == "__main__":
-    account_name = "家有大志"
+    import importlib.util
+
+    # Load ToolResult directly so "import agent" does not pull its full dependency tree (e.g. langchain)
+    _agent_root = Path(__file__).resolve().parents[3]
+    _models_py = _agent_root / "agent" / "tools" / "models.py"
+    if _models_py.is_file():
+        _spec = importlib.util.spec_from_file_location(
+            "_pattern_dim_tool_models", _models_py
+        )
+        if _spec and _spec.loader:
+            _m = importlib.util.module_from_spec(_spec)
+            _spec.loader.exec_module(_m)
+            globals()["ToolResult"] = _m.ToolResult
+            if getattr(_m, "ToolContext", None) is not None:
+                globals()["ToolContext"] = _m.ToolContext
+
+    # ---------- Switches and parameters (edit these to configure a run) ----------
+    run_round_pattern_test = False
+    run_full_pattern_analyze = True
+
+    test_account_name = "家有大志"
+    test_post_id = "68fb6a5c000000000302e5de"
+    test_log_id = "20260317214307"
+    test_round = 1
 
     items = [
-        {"post_id": "68fb6a5c000000000302e5de", "log_id": "20260317214307"},
-        {"post_id": "69185d49000000000d00f94e", "log_id": "20260317214841"},
-        {"post_id": "6921937a000000001b0278d1", "log_id": "20260317215616"}
+        {"post_id": "68fb6a5c000000000302e5de", "log_id": "20260318172724"},
+        # {"post_id": "69185d49000000000d00f94e", "log_id": "20260317214841"},
+        # {"post_id": "6921937a000000001b0278d1", "log_id": "20260317215616"},
     ]
-    for item in items:
-        post_id = item["post_id"]
-        log_id = item["log_id"]
-        main(account_name, post_id, log_id)
+
+    if run_round_pattern_test:
+        main_round_pattern_dimension_analyze(
+            test_account_name,
+            test_post_id,
+            test_log_id,
+            test_round,
+        )
+    elif run_full_pattern_analyze:
+        for item in items:
+            main(test_account_name, item["post_id"], item["log_id"])
 

+ 3 - 11
examples_how/overall_derivation/tools/point_match.py

@@ -149,12 +149,7 @@ async def match_derivation_to_post_points(
     return matched
 
 
-@tool(
-    description="判断推导选题点(人设树节点)与帖子选题点是否匹配。"
-    "功能:根据账号与帖子ID,将传入的推导选题点列表与帖子选题点做匹配,返回达到阈值的匹配对。"
-    "参数:derivation_output_points 为推导选题点字符串列表;account_name 为账号名;post_id 为帖子ID;match_threshold 为匹配分数阈值,默认 0.8。"
-    "返回:ToolResult,output 为可读匹配结果文本"
-)
+# @tool()
 async def point_match(
     derivation_output_points: List[str],
     account_name: str,
@@ -165,16 +160,13 @@ async def point_match(
     """
     判断推导选题点与帖子选题点是否匹配,返回达到阈值的匹配对。
 
-    参数
-    -------
+    Args:
     derivation_output_points : 推导选题点字符串列表。
     account_name : 账号名,用于定位 input 下的账号目录。
     post_id : 帖子ID,用于定位该帖的选题点与匹配数据。
     match_threshold : 匹配分数阈值,分数 >= 该值视为匹配成功,默认 0.8。
-    context : 可选,Agent 工具上下文。
 
-    返回
-    -------
+    Returns:
     ToolResult:
         - title: 结果标题。
         - output: 可读的匹配结果文本(每行:推导选题点、帖子选题点、匹配分数)。
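The `Returns` section above describes one readable line per matched pair (derivation point, post point, match score), kept only when the score reaches `match_threshold`. A hedged sketch of how such lines might be filtered and rendered — the pair structure, field names, and separator here are illustrative assumptions, not the actual `point_match` output format:

```python
from typing import Any, Dict, List


def render_match_lines(matched: List[Dict[str, Any]], threshold: float = 0.8) -> str:
    """Keep pairs whose score reaches the threshold; one readable line per kept pair."""
    kept = [m for m in matched if float(m["score"]) >= threshold]
    return "\n".join(
        f"推导选题点:{m['derivation_point']} | "
        f"帖子选题点:{m['post_point']} | "
        f"匹配分数:{float(m['score']):.2f}"
        for m in kept
    )


if __name__ == "__main__":
    demo = [
        {"derivation_point": "晨间学习仪式感", "post_point": "早起陪读打卡", "score": 0.86},
        {"derivation_point": "玩具收纳技巧", "post_point": "早起陪读打卡", "score": 0.41},
    ]
    print(render_match_lines(demo))
```

With the default threshold of 0.8, only the first demo pair would be kept; scoring itself (how `score` is computed) is outside this sketch.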