# sug_v6_1_2_8.py 流程分析文档

## 📋 概述

`sug_v6_1_2_8.py` 是一个基于 LLM Agent 的智能搜索查询优化工具，主要用于小红书平台的搜索优化。通过多轮迭代的方式，从原始查询出发，逐步扩展和优化搜索词，最终获取高质量的搜索结果。

**版本**: v6.1.2.8
**核心模型**: google/gemini-2.5-flash
**主要特性**:
- 🔄 多轮迭代优化
- 🤖 多 Agent 协作
- 📊 相关度评分系统
- 🔍 小红书搜索集成
- 📈 可视化支持

---

## 🏗️ 整体架构

### 架构图

```
原始问题(o)
    ↓
[初始化阶段]
    ├─ 分词 → seg_list
    ├─ 评估分词相关度
    ├─ 构建 word_list_1
    ├─ 构建 q_list_1
    └─ 构建 seed_list
    ↓
[第1轮迭代]
    ├─ 请求 sug (建议词)
    ├─ 评估 sug 相关度
    ├─ 构建 search_list (高分sug搜索)
    ├─ 为 seed 加词 → q_list_next
    ├─ 更新 seed_list
    └─ 保存搜索结果
    ↓
[第2轮迭代] ...
    ↓
[第N轮迭代] ...
    ↓
[输出结果 + 可视化]
```

### 核心组件

1. **数据模型层** - 定义所有数据结构（Seg, Word, Q, Sug, Seed, Post, Search）
2. **Agent 层** - 三个专家 Agent（分词、相关度评估、加词选择）
3. **流程控制层** - 初始化、轮次迭代、主循环
4. **外部服务层** - 小红书 API 集成（搜索推荐、搜索）

---

## 📦 数据模型

### 核心数据结构

#### 1. Seg (分词)
```python
class Seg(BaseModel):
    text: str                    # 分词文本
    score_with_o: float = 0.0    # 与原始问题的评分
    reason: str = ""             # 评分理由
    from_o: str = ""             # 原始问题
```

**用途**: 存储原始问题分词后的每个词单元

#### 2. Word (词)
```python
class Word(BaseModel):
    text: str                    # 词文本
    score_with_o: float = 0.0    # 与原始问题的评分
    from_o: str = ""             # 原始问题
```

**用途**: 词库，用于后续组合新的查询词

#### 3. Q (查询)
```python
class Q(BaseModel):
    text: str                    # 查询文本
    score_with_o: float = 0.0    # 与原始问题的评分
    reason: str = ""             # 评分理由
    from_source: str = ""        # 来源: seg/sug/add
```

**用途**: 待处理的查询队列，每轮从 q_list 中取 query 进行处理

#### 4. Sug (建议词)
```python
class Sug(BaseModel):
    text: str                    # 建议词文本
    score_with_o: float = 0.0    # 与原始问题的评分
    reason: str = ""             # 评分理由
    from_q: QFromQ | None        # 来自哪个 q
```

**用途**: 存储从小红书 API 获取的建议词

#### 5. Seed (种子)
```python
class Seed(BaseModel):
    text: str                    # 种子文本
    added_words: list[str]       # 已添加的词
    from_type: str = ""          # 来源: seg/sug
    score_with_o: float = 0.0    # 与原始问题的评分
```

**用途**: 用于加词扩展的基础词，记录已经添加过的词以避免重复

#### 6. Post (帖子)
```python
class Post(BaseModel):
    title: str                   # 标题
    body_text: str               # 正文
    type: str = "normal"         # 类型: video/normal
    images: list[str]            # 图片URL列表
    video: str = ""              # 视频URL
    interact_info: dict          # 互动信息(点赞/收藏/评论/分享)
    note_id: str                 # 笔记ID
    note_url: str                # 笔记URL
```

**用途**: 存储小红书搜索结果的帖子详情

#### 7. Search (搜索结果)
```python
class Search(Sug):
    post_list: list[Post]        # 搜索到的帖子列表
```

**用途**: 继承 Sug，附加实际搜索到的帖子数据

#### 8. RunContext (运行上下文)
```python
class RunContext(BaseModel):
    version: str                 # 版本号
    input_files: dict            # 输入文件路径
    c: str                       # 原始需求
    o: str                       # 原始问题
    log_url: str                 # 日志URL
    log_dir: str                 # 日志目录
    rounds: list[dict]           # 每轮的详细数据
    final_output: str | None     # 最终结果
```

**用途**: 记录整个运行过程的上下文信息和中间结果

---

## 🤖 Agent 系统

### Agent 1: 分词专家 (word_segmenter)

**功能**: 将原始问题拆分成有意义的最小单元

**输入**: 原始查询文本
**输出**:
```python
class WordSegmentation:
    words: list[str]        # 分词结果列表
    reasoning: str          # 分词理由
```

**分词原则**:
1. 保留有搜索意义的词汇
2. 拆分成独立的概念
3. 保留专业术语的完整性
4. 去除虚词（的、吗、呢等）

**示例**:
- 输入: "如何获取能体现川西秋季特色的高质量风光摄影素材？"
- 输出: ["川西", "秋季", "风光摄影", "素材"]

### Agent 2: 相关度评估专家 (relevance_evaluator)

**功能**: 评估文本与原始问题的匹配程度

**输入**: 原始问题 + 待评估文本
**输出**:
```python
class RelevanceEvaluation:
    relevance_score: float  # 0-1的相关性分数
    reason: str            # 评估理由
```

**评估标准**:
- 主题相关性
- 要素覆盖度
- 意图匹配度

**示例**:
- 原始问题: "川西秋季摄影"
- 待评估: "川西旅游攻略"
- 输出: score=0.75, reason="与川西相关但缺少秋季和摄影要素"

### Agent 3: 加词选择专家 (word_selector)

**功能**: 从候选词中选择最合适的词与 seed 组合

**输入**: 原始问题 + 当前 seed + 候选词列表
**输出**:
```python
class WordSelection:
    selected_word: str       # 选择的词
    combined_query: str      # 组合后的新query
    reasoning: str           # 选择理由
```

**选择原则**:
1. 选择与当前 seed 最相关的词
2. 组合后的 query 语义通顺
3. 符合搜索习惯
4. 优先选择能扩展搜索范围的词

**示例**:
- seed: "川西"
- 候选词: ["秋季", "摄影", "旅游"]
- 输出: selected_word="秋季", combined_query="川西秋季"

---

## 🔄 核心流程

### 阶段 0: 初始化 (initialize)

**目标**: 从原始问题创建初始数据结构

**流程**:

```
步骤1: 分词
o → [word_segmenter] → WordSegmentation → seg_list

步骤2: 评估分词
for each seg in seg_list:
    seg + o → [relevance_evaluator] → score + reason
    更新 seg.score_with_o, seg.reason

步骤3: 构建 word_list_1
seg_list → word_list_1 (直接转换)

步骤4: 构建 q_list_1
seg_list → q_list_1 (from_source="seg")

步骤5: 构建 seed_list
seg_list → seed_list (from_type="seg")
```

**输入**:
- `o`: 原始问题（例如: "如何获取川西秋季风光摄影素材？"）

**输出**:
- `seg_list`: 分词结果列表
- `word_list_1`: 初始词库
- `q_list_1`: 第一轮待处理查询列表
- `seed_list`: 初始种子列表

**示例数据流**:
```
o = "川西秋季摄影素材"
    ↓
seg_list = [
    Seg(text="川西", score_with_o=0.85),
    Seg(text="秋季", score_with_o=0.90),
    Seg(text="摄影", score_with_o=0.88),
    Seg(text="素材", score_with_o=0.75)
]
    ↓
word_list_1 = [Word("川西"), Word("秋季"), ...]
q_list_1 = [Q("川西"), Q("秋季"), ...]
seed_list = [Seed("川西"), Seed("秋季"), ...]
```

---

### 阶段 N: 轮次迭代 (run_round)

**目标**: 基于当前 q_list 扩展搜索，生成下一轮的数据

**输入**:
- `round_num`: 轮次编号
- `q_list`: 当前轮的查询列表
- `word_list`: 当前词库
- `seed_list`: 当前种子列表
- `sug_threshold`: 建议词阈值（默认 0.7）

**输出**:
- `word_list_next`: 下一轮词库
- `q_list_next`: 下一轮查询列表
- `seed_list_next`: 下一轮种子列表
- `search_list`: 本轮搜索结果

#### 步骤1: 请求建议词

```python
for each q in q_list:
    sug_texts = xiaohongshu_api.get_recommendations(q.text)
    for sug_text in sug_texts:
        sug_list.append(Sug(
            text=sug_text,
            from_q=QFromQ(text=q.text, score=q.score_with_o)
        ))
```

**并发处理**: 所有 q 的请求可以并发执行

**数据流**:
```
q_list = [Q("川西"), Q("秋季")]
    ↓ [小红书API]
sug_list_list = [
    [Sug("川西旅游"), Sug("川西攻略"), ...],  # 来自 "川西"
    [Sug("秋季景色"), Sug("秋季摄影"), ...]   # 来自 "秋季"
]
```

#### 步骤2: 评估建议词

```python
async def evaluate_sug(sug: Sug) -> Sug:
    sug.score_with_o, sug.reason = await evaluate_with_o(sug.text, o)
    return sug

# 并发评估所有 sug
await asyncio.gather(*[evaluate_sug(sug) for sug in all_sugs])
```

**评估标准**: 使用 relevance_evaluator Agent

**数据流**:
```
Sug("川西旅游") + o → score=0.75, reason="..."
Sug("秋季摄影") + o → score=0.92, reason="..."
```

#### 步骤3: 构建 search_list（搜索高分建议词）

```python
high_score_sugs = [sug for sug in all_sugs if sug.score_with_o > sug_threshold]

async def search_for_sug(sug: Sug) -> Search:
    result = xiaohongshu_search.search(sug.text)
    posts = process_notes(result)
    return Search(text=sug.text, post_list=posts, ...)

search_list = await asyncio.gather(*[search_for_sug(sug) for sug in high_score_sugs])
```

**阈值过滤**: 只搜索评分 > `sug_threshold` 的建议词

**并发搜索**: 所有高分 sug 并发搜索

**数据流**:
```
high_score_sugs = [Sug("秋季摄影", score=0.92), ...]
    ↓ [小红书搜索API]
search_list = [
    Search(text="秋季摄影", post_list=[Post(...), ...])
]
```

#### 步骤4: 构建 word_list_next

```python
word_list_next = word_list.copy()  # 暂时直接复制
```

**说明**: 当前版本词库保持不变，未来可扩展从 sug 中提取新词

#### 步骤5: 构建 q_list_next

**5.1 为每个 seed 加词**

```python
for each seed in seed_list:
    # 过滤候选词
    candidate_words = [w for w in word_list_next
                       if w.text not in seed.text
                       and w.text not in seed.added_words]

    # Agent 选词
    selection_input = f"""
    原始问题: {o}
    当前Seed: {seed.text}
    候选词: {candidate_words}
    """
    result = await Runner.run(word_selector, selection_input)

    # 创建新 query
    new_q = Q(
        text=result.combined_query,
        score_with_o=...,
        from_source="add"
    )
    q_list_next.append(new_q)

    # 更新 seed
    seed.added_words.append(result.selected_word)
```

**关键逻辑**:
- 避免重复: 词不在 seed.text 中且未被添加过
- Agent 智能选择: 使用 word_selector 选择最佳组合
- 评估新 query: 评估组合后的 query 与原始问题的相关度

**示例**:
```
seed = Seed("川西", added_words=[])
candidate_words = ["秋季", "摄影"]
    ↓ [word_selector]
selected_word = "秋季"
combined_query = "川西秋季"
    ↓ [relevance_evaluator]
new_q = Q("川西秋季", score=0.88, from_source="add")
```

**5.2 高分 sug 加入 q_list_next**

```python
for sug in all_sugs:
    if sug.score_with_o > sug.from_q.score_with_o:
        new_q = Q(
            text=sug.text,
            score_with_o=sug.score_with_o,
            from_source="sug"
        )
        q_list_next.append(new_q)
```

**条件**: sug 分数 > 来源 query 分数

**示例**:
```
sug = Sug("秋季摄影技巧", score=0.92, from_q=Q("秋季", score=0.85))
    ↓ (0.92 > 0.85)
q_list_next.append(Q("秋季摄影技巧", score=0.92, from_source="sug"))
```

#### 步骤6: 更新 seed_list

```python
seed_list_next = seed_list.copy()  # 保留原有 seed

for sug in all_sugs:
    if (sug.score_with_o > sug.from_q.score_with_o
        and sug.text not in existing_seed_texts):
        new_seed = Seed(
            text=sug.text,
            from_type="sug",
            score_with_o=sug.score_with_o
        )
        seed_list_next.append(new_seed)
```

**条件**:
1. sug 分数 > 来源 query 分数
2. sug 未在 seed_list 中出现过

**示例**:
```
sug = Sug("川西秋季攻略", score=0.90, from_q=Q("川西", score=0.85))
    ↓ (0.90 > 0.85 且未重复)
seed_list_next.append(Seed("川西秋季攻略", from_type="sug"))
```

---

### 主循环 (iterative_loop)

**流程控制**:

```python
# 初始化
seg_list, word_list, q_list, seed_list = await initialize(o, context)

# 迭代
round_num = 1
while q_list and round_num <= max_rounds:
    word_list, q_list, seed_list, search_list = await run_round(
        round_num, q_list, word_list, seed_list, ...
    )
    all_search_list.extend(search_list)
    round_num += 1

return all_search_list
```

**终止条件**:
1. `q_list` 为空（没有更多查询需要处理）
2. 达到 `max_rounds` 限制

**数据累积**: 所有轮次的 search_list 合并到 `all_search_list`

---

## 📊 数据流图

### 完整数据流

```
输入:
├─ input_dir/context.md  (原始需求 c)
└─ input_dir/q.md        (原始问题 o)
    ↓
[初始化]
o → seg_list → word_list_1, q_list_1, seed_list
    ↓
[第1轮]
q_list_1 → sug_list_1 → search_list_1
         → q_list_2, seed_list_2 (通过加词+高分sug)
    ↓
[第2轮]
q_list_2 → sug_list_2 → search_list_2
         → q_list_3, seed_list_3
    ↓
[第N轮] ...
    ↓
输出:
├─ all_search_list (所有搜索结果)
├─ log_dir/run_context.json (运行上下文)
├─ log_dir/search_results.json (详细搜索结果)
└─ log_dir/visualization.html (可视化HTML)
```

### 每轮数据变化

```
轮次输入                          轮次输出
┌─────────────────┐             ┌─────────────────┐
│ q_list          │──┐          │ q_list_next     │
│ word_list       │  │          │ word_list_next  │
│ seed_list       │  │          │ seed_list_next  │
└─────────────────┘  │          │ search_list     │
                     │          └─────────────────┘
                     ↓
            ┌──────────────────┐
            │   run_round()    │
            │                  │
            │ 1. 请求sug       │
            │ 2. 评估sug       │
            │ 3. 搜索高分sug   │
            │ 4. 为seed加词    │
            │ 5. 构建q_next    │
            │ 6. 更新seed_list │
            └──────────────────┘
```

---

## 🎯 关键算法

### 1. 相关度评分机制

**评分函数**: `evaluate_with_o(text, o)`

**输入**:
- `text`: 待评估文本
- `o`: 原始问题

**输出**: `(score, reason)`

**实现**:
```python
async def evaluate_with_o(text: str, o: str) -> tuple[float, str]:
    eval_input = f"""
    <原始问题>{o}</原始问题>
    <当前文本>{text}</当前文本>
    请评估当前文本与原始问题的相关度。
    """
    result = await Runner.run(relevance_evaluator, eval_input)
    return result.final_output.relevance_score, result.final_output.reason
```

**应用场景**:
- 评估分词与原始问题的相关度
- 评估 sug 与原始问题的相关度
- 评估新组合 query 与原始问题的相关度

### 2. 加词策略

**目标**: 从词库中为 seed 选择最佳词进行组合

**候选词过滤**:
```python
candidate_words = [
    w for w in word_list
    if w.text not in seed.text           # 词不在seed中
    and w.text not in seed.added_words   # 词未被添加过
]
```

**智能选择**:
```python
selection_input = f"""
<原始问题>{o}</原始问题>
<当前Seed>{seed.text}</当前Seed>
<候选词列表>{', '.join([w.text for w in candidate_words])}</候选词列表>
请从候选词中选择一个最合适的词，与当前seed组合成新的query。
"""
result = await Runner.run(word_selector, selection_input)
```

**验证和评估**:
```python
# 验证选择的词在候选列表中
if selection.selected_word not in [w.text for w in candidate_words]:
    continue

# 评估组合后的query
new_q_score, new_q_reason = await evaluate_with_o(
    selection.combined_query, o
)
```

### 3. Sug 晋升机制

**晋升到 q_list 的条件**:
```python
if sug.score_with_o > sug.from_q.score_with_o:
    q_list_next.append(Q(
        text=sug.text,
        score_with_o=sug.score_with_o,
        from_source="sug"
    ))
```

**晋升到 seed_list 的条件**:
```python
if (sug.score_with_o > sug.from_q.score_with_o
    and sug.text not in existing_seed_texts):
    seed_list_next.append(Seed(
        text=sug.text,
        from_type="sug",
        score_with_o=sug.score_with_o
    ))
```

**逻辑**: 只有当 sug 的评分超过其来源 query 时，才认为 sug 是更优的查询词

### 4. 搜索阈值过滤

**目标**: 只搜索高质量的建议词

**实现**:
```python
high_score_sugs = [
    sug for sug in all_sugs
    if sug.score_with_o > sug_threshold
]

# 并发搜索
search_list = await asyncio.gather(*[
    search_for_sug(sug) for sug in high_score_sugs
])
```

**默认阈值**: 0.7（可通过 `--sug-threshold` 参数调整）

---

## 🔧 外部服务集成

### 1. 小红书搜索推荐 API

**类**: `XiaohongshuSearchRecommendations`

**方法**: `get_recommendations(keyword: str) -> list[str]`

**功能**: 获取指定关键词的搜索建议词

**使用场景**: 在每轮中为 q_list 中的每个 query 请求建议词

### 2. 小红书搜索 API

**类**: `XiaohongshuSearch`

**方法**: `search(keyword: str) -> dict`

**功能**: 搜索指定关键词，返回帖子列表

**返回数据处理**:
```python
def process_note_data(note: dict) -> Post:
    note_card = note.get("note_card", {})
    return Post(
        note_id=note.get("id"),
        title=note_card.get("display_title"),
        body_text=note_card.get("desc"),
        type=note_card.get("type", "normal"),
        images=[img.get("image_url") for img in note_card.get("image_list", [])],
        interact_info={
            "liked_count": ...,
            "collected_count": ...,
            "comment_count": ...,
            "shared_count": ...
        },
        note_url=f"https://www.xiaohongshu.com/explore/{note.get('id')}"
    )
```

---

## 📝 日志和输出

### 运行上下文 (run_context.json)

**保存内容**:
```json
{
  "version": "sug_v6_1_2_8.py",
  "input_files": {...},
  "c": "原始需求",
  "o": "原始问题",
  "log_dir": "...",
  "log_url": "...",
  "rounds": [
    {
      "round_num": 0,
      "type": "initialization",
      "seg_list": [...],
      "word_list_1": [...],
      "q_list_1": [...],
      "seed_list": [...]
    },
    {
      "round_num": 1,
      "input_q_list": [...],
      "sug_count": 20,
      "high_score_sug_count": 5,
      "search_count": 5,
      "total_posts": 50,
      "sug_details": {...},
      "add_word_details": {...},
      "search_results": [...]
    },
    ...
  ],
  "final_output": "..."
}
```

### 搜索结果 (search_results.json)

**保存内容**:
```json
[
  {
    "text": "秋季摄影",
    "score_with_o": 0.92,
    "reason": "...",
    "from_q": {
      "text": "秋季",
      "score_with_o": 0.85
    },
    "post_list": [
      {
        "note_id": "...",
        "note_url": "...",
        "title": "...",
        "body_text": "...",
        "images": [...],
        "interact_info": {...}
      },
      ...
    ]
  },
  ...
]
```

### 可视化 HTML

**生成方式**:
```python
subprocess.run([
    "node",
    "visualization/sug_v6_1_2_8/index.js",
    abs_context_file,
    abs_output_html
])
```

**依赖**: Node.js + React + esbuild

**生成的文件**: `log_dir/visualization.html`

---

## 🚀 使用方法

### 命令行参数

```bash
python3 sug_v6_1_2_8.py \
  --input-dir "input/旅游/如何获取川西秋季风光摄影素材？" \
  --max-rounds 4 \
  --sug-threshold 0.7 \
  --visualize
```

**参数说明**:

| 参数 | 类型 | 默认值 | 说明 |
|------|------|--------|------|
| `--input-dir` | str | `input/旅游-逸趣玩旅行/...` | 输入目录路径 |
| `--max-rounds` | int | 4 | 最大迭代轮数 |
| `--sug-threshold` | float | 0.7 | 建议词评分阈值 |
| `--visualize` | flag | True | 是否生成可视化 |

### 输入文件结构

```
input_dir/
├── context.md   # 原始需求描述
└── q.md         # 原始问题
```

### 输出文件结构

```
input_dir/output/sug_v6_1_2_8/{timestamp}/
├── run_context.json      # 运行上下文
├── search_results.json   # 详细搜索结果
└── visualization.html    # 可视化页面
```

---

## 🎨 并发优化

### 并发点

1. **分词评估**: 所有 seg 并发评估
   ```python
   await asyncio.gather(*[evaluate_seg(seg) for seg in seg_list])
   ```

2. **Sug 评估**: 所有 sug 并发评估
   ```python
   await asyncio.gather(*[evaluate_sug(sug) for sug in all_sugs])
   ```

3. **搜索**: 所有高分 sug 并发搜索
   ```python
   await asyncio.gather(*[search_for_sug(sug) for sug in high_score_sugs])
   ```

### 串行点

1. **分词**: 必须先完成分词才能评估
2. **轮次迭代**: 必须按顺序执行各轮
3. **加词选择**: 每个 seed 的加词必须等待 Agent 返回

---

## 🔍 核心特点

### 1. 迭代扩展
- 从原始问题出发，逐轮扩展搜索词
- 通过 seed + word 组合生成新查询
- 通过 sug 晋升机制引入新的高质量查询

### 2. 智能评分
- 所有文本与原始问题的相关度都通过 LLM 评估
- 评分结果用于过滤、排序、晋升决策

### 3. 多 Agent 协作
- 分词专家: 拆分原始问题
- 相关度评估专家: 统一评分标准
- 加词选择专家: 智能组合词汇

### 4. 数据驱动
- 完整记录每轮的输入输出
- 可追溯每个 query/sug 的来源
- 支持可视化分析

### 5. 高并发
- 利用 asyncio 实现高并发
- 评估、搜索等操作并行执行
- 提升整体执行效率

---

## 🐛 潜在问题和改进方向

### 1. 词库静态
**问题**: word_list 在初始化后不再更新，可能错过新的有价值的词

**改进方向**:
- 从高分 sug 中提取新词加入 word_list
- 从搜索结果的标题/正文中提取关键词

### 2. 加词盲目性
**问题**: 每个 seed 每轮必须加一个词，即使候选词质量不高

**改进方向**:
- 增加加词的评分阈值
- 允许 seed 在某轮跳过加词

### 3. Sug 重复
**问题**: 不同 query 可能返回相同的 sug，导致重复搜索

**改进方向**:
- 全局去重 sug
- 记录已搜索的 query，避免重复搜索

### 4. 搜索结果未利用
**问题**: 搜索到的帖子内容没有被进一步分析和利用

**改进方向**:
- 分析帖子标题/内容提取新的关键词
- 评估帖子质量，作为 query 质量的反馈

### 5. 阈值固定
**问题**: sug_threshold 固定，可能导致某些轮次没有搜索结果

**改进方向**:
- 动态调整阈值
- 保证每轮至少有一定数量的搜索

---

## 📈 性能分析

### 时间复杂度

假设:
- 每轮 q_list 大小: `Q`
- 每个 q 的 sug 数量: `S`
- 每轮 seed 数量: `K`
- 最大轮数: `R`

**每轮时间复杂度**:
- 请求 sug: `O(Q)`（并发）
- 评估 sug: `O(Q * S)`（并发）
- 搜索: `O(高分sug数量)`（并发）
- 加词: `O(K * word_list大小)`（串行，但每个加词操作并发评估）

**总时间复杂度**: `O(R * (Q + Q*S + K*W))`

### 空间复杂度

- `seg_list`: `O(分词数)`
- `word_list`: `O(分词数)`（当前版本）
- `q_list`: `O(Q)` 每轮
- `seed_list`: `O(K)` 每轮
- `sug_list`: `O(Q * S)` 每轮
- `search_list`: `O(高分sug数) * O(每个搜索的帖子数)`

**总空间复杂度**: `O(R * (Q*S + 高分sug数*帖子数))`

---

## 🎯 总结

`sug_v6_1_2_8.py` 是一个设计精良的搜索查询优化系统，具有以下特点:

### 优势
1. ✅ **模块化设计**: 数据模型、Agent、流程控制分离清晰
2. ✅ **智能化**: 利用多个 LLM Agent 实现分词、评估、选词
3. ✅ **可扩展**: 通过迭代机制不断扩展搜索范围
4. ✅ **高性能**: 大量使用并发优化执行效率
5. ✅ **可追溯**: 完整记录每轮数据，支持可视化分析

### 核心流程
```
原始问题 → 分词 → 评估 → 迭代(请求sug → 评估 → 搜索 → 加词 → 更新) → 输出结果
```

### 关键机制
- **评分机制**: 统一的相关度评估标准
- **晋升机制**: 高分 sug 晋升为 query 和 seed
- **扩展机制**: seed + word 组合生成新 query
- **过滤机制**: 阈值过滤低质量 sug

### 适用场景
- 搜索查询扩展和优化
- 关键词发现和探索
- 内容检索和推荐
- 搜索效果分析

---

**文档生成时间**: 2025-11-03
**代码版本**: sug_v6_1_2_8.py
**作者**: Knowledge Agent Team