
fix: exclude nodes already in the direct-match layer from the expansion layer

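Prevent the same node from appearing in both layer 3 (direct match) and layer 4 (expansion),
which produced incorrect match edges between layer-4 nodes and post tags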
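
A minimal sketch of the filtering this commit describes, run on toy data. The helper name `exclude_direct_matches` and the node IDs are hypothetical and used only for illustration. The dictionary keys `节点ID`, `源节点ID`, and `目标节点ID` are the ones that appear in the diff. The real input comes from `expand_one_layer`, which is not reproduced here.

```python
def exclude_direct_matches(expanded_nodes_raw, expanded_edges_raw, persona_node_ids):
    """Drop expansion results that already belong to the direct-match layer.

    Hypothetical helper: it mirrors the two list comprehensions added in the diff.
    """
    # Keep only nodes that are not already direct matches (layer 3).
    expanded_nodes = [
        n for n in expanded_nodes_raw if n["节点ID"] not in persona_node_ids
    ]
    # Keep an edge unless both of its endpoints are direct-match nodes.
    expanded_edges = [
        e for e in expanded_edges_raw
        if e["目标节点ID"] not in persona_node_ids
        or e["源节点ID"] not in persona_node_ids
    ]
    return expanded_nodes, expanded_edges


# Toy data (made-up IDs): category X is already a direct match, category Y is not.
persona_node_ids = {"p_标签_A", "p_标签_B", "p_分类_X"}
raw_nodes = [{"节点ID": "p_分类_X"}, {"节点ID": "p_分类_Y"}]
raw_edges = [
    {"源节点ID": "p_标签_A", "目标节点ID": "p_分类_X"},
    {"源节点ID": "p_标签_B", "目标节点ID": "p_分类_Y"},
]

nodes, edges = exclude_direct_matches(raw_nodes, raw_edges, persona_node_ids)
print([n["节点ID"] for n in nodes])      # ['p_分类_Y']
print([e["目标节点ID"] for e in edges])  # ['p_分类_Y']
```

Because expansion only follows outgoing edges from tag nodes that are themselves in `persona_node_ids`, the `or` condition in practice drops exactly the edges whose target is already a layer-3 node.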

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
yangxiaohui 5 days ago
parent
commit
7f86de3a26
1 changed file with 6 additions and 1 deletion

+ 6 - 1
script/data_processing/build_match_graph.py

@@ -542,12 +542,17 @@ def process_filtered_result(
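     # Expand persona nodes by one layer; only tag-type nodes are expanded to categories via "属于" (belongs-to) edges
     # Keep only tag-type persona nodes (only tags can "属于" a category)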
     tag_persona_ids = {pid for pid in persona_node_ids if "_标签_" in pid}
-    expanded_nodes, expanded_edges, _ = expand_one_layer(
+    expanded_nodes_raw, expanded_edges_raw, _ = expand_one_layer(
         tag_persona_ids, edges_data, nodes_data,
         edge_types=["属于"],
         direction="outgoing"  # outgoing only: tag -> category
     )
 
+    # Exclude nodes already in layer 3 (direct match) so that no node appears in both layers
+    expanded_nodes = [n for n in expanded_nodes_raw if n["节点ID"] not in persona_node_ids]
+    expanded_edges = [e for e in expanded_edges_raw
+                      if e["目标节点ID"] not in persona_node_ids or e["源节点ID"] not in persona_node_ids]
+
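     # Create post mirror edges via the expanded nodes (correct logic)
     # Logic: post -> tag -> category; if two categories are connected by an edge, the corresponding posts get a second-order edge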