
project init

luojunhui committed 1 day ago
parent
commit
2c943a43f8
62 files changed with 10639 additions and 477 deletions
  1. CLAUDE.md (+97, -0)
  2. agent/core/runner.py (+6, -2)
  3. docs/content-finder-pipeline-design.md (+317, -0)
  4. docs/database-schema.sql (+272, -0)
  5. docs/database-tool.md (+685, -0)
  6. docs/run_search_agent_workflow.md (+387, -0)
  7. docs/search-agent-architecture.md (+521, -0)
  8. docs/search-agent-harness-design.md (+203, -0)
  9. docs/search_agent_strategy.sql (+35, -0)
  10. docs/supply-agent-solution.md (+403, -0)
  11. docs/wechat-search-recovery-requirements.md (+173, -0)
  12. pipeline_visualize.py (+1861, -0)
  13. run_pipeline.py (+94, -0)
  14. run_search_agent.py (+408, -0)
  15. src/domain/search/__init__.py (+10, -0)
  16. src/domain/search/core.py (+76, -17)
  17. src/domain/search/policy.py (+88, -0)
  18. src/domain/search/repository.py (+60, -0)
  19. src/infra/shared/common.py (+4, -1)
  20. src/infra/shared/http_client.py (+5, -2)
  21. src/infra/trace/logging/log_capture.py (+134, -0)
  22. src/infra/trace/logging/log_service.py (+4, -2)
  23. src/infra/trace/logging/tool_logging.py (+129, -0)
  24. src/pipeline/__init__.py (+11, -0)
  25. src/pipeline/adapters/__init__.py (+6, -0)
  26. src/pipeline/adapters/base.py (+88, -0)
  27. src/pipeline/adapters/knowledge/__init__.py (+5, -0)
  28. src/pipeline/adapters/knowledge/base.py (+49, -0)
  29. src/pipeline/adapters/weixin.py (+112, -0)
  30. src/pipeline/base.py (+106, -0)
  31. src/pipeline/config/pipeline_config.py (+26, -0)
  32. src/pipeline/context.py (+190, -0)
  33. src/pipeline/gates/__init__.py (+11, -0)
  34. src/pipeline/gates/filter_sufficiency.py (+51, -0)
  35. src/pipeline/gates/output_schema.py (+43, -0)
  36. src/pipeline/gates/search_completeness.py (+36, -0)
  37. src/pipeline/hooks/__init__.py (+8, -0)
  38. src/pipeline/hooks/db_hook.py (+388, -0)
  39. src/pipeline/hooks/live_progress_hook.py (+203, -0)
  40. src/pipeline/hooks/pipeline_trace_hook.py (+305, -0)
  41. src/pipeline/hooks/trace_hook.py (+43, -0)
  42. src/pipeline/orchestrator.py (+155, -0)
  43. src/pipeline/runner.py (+130, -0)
  44. src/pipeline/stages/__init__.py (+18, -0)
  45. src/pipeline/stages/account_precipitate.py (+60, -0)
  46. src/pipeline/stages/coarse_filter.py (+155, -0)
  47. src/pipeline/stages/common.py (+279, -0)
  48. src/pipeline/stages/content_filter.py (+578, -0)
  49. src/pipeline/stages/content_search.py (+393, -0)
  50. src/pipeline/stages/demand_analysis.py (+128, -0)
  51. src/pipeline/stages/output_persist.py (+78, -0)
  52. tests/content_finder.prompt (+0, -77)
  53. tests/run_single.py (+299, -97)
  54. tests/skills/account_precipitation.md (+113, -78)
  55. tests/skills/article_filter_strategy.md (+166, -100)
  56. tests/skills/article_finding_strategy.md (+98, -43)
  57. tests/skills/demand_analysis.md (+60, -0)
  58. tests/tools/__init__.py (+2, -0)
  59. tests/tools/douyin_user_videos.py (+1, -1)
  60. tests/tools/store_results_mysql.py (+1, -1)
  61. tests/tools/think_and_plan.py (+35, -0)
  62. tests/tools/weixin_tools.py (+237, -56)

+ 97 - 0
CLAUDE.md

@@ -0,0 +1,97 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+LongArticleSearchAgent is an AI-powered content discovery system for finding long-form WeChat articles. It combines a general-purpose agent framework (`agent/`) with a domain-specific pipeline (`src/pipeline/`) that orchestrates search, filtering, and account aggregation stages.
+
+## Common Commands
+
+```bash
+# Install dependencies
+make install                    # or: pip install -r requirements.txt
+
+# Run web server (dev with reload)
+make dev                        # hypercorn app:app --bind 0.0.0.0:8000 --reload
+
+# Run web server (production)
+make run
+
+# Run pipeline CLI (no database required)
+python run_pipeline.py
+
+# Run production harness (with budget control, observer, DB)
+python run_search_agent.py
+
+# Lint
+make lint                       # ruff check src/ tests/ app.py
+
+# Format
+make format                     # ruff check --fix + ruff format
+
+# Run all tests
+make test                       # pytest tests/ -v
+
+# Run specific test file or test
+pytest tests/unit/test_config.py -v
+pytest tests/integration/test_health.py::test_health_returns_ok -v
+
+# Docker
+make docker-build && make docker-run
+
+# Clean caches
+make clean
+```
+
+## Environment Variables
+
+Required: `OPEN_ROUTER_API_KEY`. Key optional vars: `PIPELINE_QUERY`, `PIPELINE_DEMAND_ID`, `PIPELINE_TARGET_COUNT`, `PIPELINE_TEMPERATURE`, `PIPELINE_MAX_ITERATIONS`, `PIPELINE_TIMEOUT`, `MODEL` (default: `anthropic/claude-sonnet-4-5`). Database vars (`SEARCH_AGENT_DB_*`) are optional; the system runs without a database using default policies.
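The runtime variables above follow a simple env-with-defaults pattern. A minimal stdlib sketch of that pattern (the project itself uses Pydantic `BaseSettings`; the field set here mirrors the documented `PIPELINE_*` variables, and the class shape is illustrative, not the repository's actual code):

```python
import os
from dataclasses import dataclass


@dataclass
class RuntimePipelineConfig:
    """Sketch only: mirrors the documented PIPELINE_* variables and defaults."""
    model: str
    temperature: float
    max_iterations: int
    target_count: int

    @classmethod
    def from_env(cls) -> "RuntimePipelineConfig":
        # Each field falls back to the documented default when the env var is unset.
        return cls(
            model=os.environ.get("MODEL", "anthropic/claude-sonnet-4-5"),
            temperature=float(os.environ.get("PIPELINE_TEMPERATURE", "0.2")),
            max_iterations=int(os.environ.get("PIPELINE_MAX_ITERATIONS", "12")),
            target_count=int(os.environ.get("PIPELINE_TARGET_COUNT", "10")),
        )


cfg = RuntimePipelineConfig.from_env()
```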
+
+## Architecture
+
+### Two-Layer Design
+
+- **Agent Core** (`agent/`): General autonomous agent framework (~24K lines). Contains `AgentRunner` execution loop, LLM provider integrations (OpenRouter, Gemini, Qwen), tool registry with JSON schema, skill system (markdown-based prompt injection), and `FileSystemTraceStore` for execution persistence.
+
+- **Application Layer** (`src/`): Domain-specific logic built on top of the agent framework.
+  - `src/domain/search/` — `SearchAgentCore` service, `SearchAgentPolicy` strategy config, `SearchAgentPolicyRepository` for DB-driven runtime policy override.
+  - `src/pipeline/` — `PipelineOrchestrator` engine with 6 stages, 3 quality gates, and 4 lifecycle hooks.
+  - `src/config/` — Pydantic `BaseSettings` classes with env prefix and `.env` support.
+  - `src/infra/` — Async MySQL pool, HTTP client, Aliyun logging, Redis/ES/Milvus clients.
+
+### Pipeline Flow
+
+```
+DemandAnalysisStage → ContentSearchStage → [SearchCompletenessGate]
+  → HardFilterStage → QualityFilterStage → [FilterSufficiencyGate]
+  → AccountPrecipitateStage → OutputPersistStage → [OutputSchemaGate]
+```
+
+- **Stages** (`src/pipeline/stages/`): Sequential processing units. `DemandAnalysisStage` and `QualityFilterStage` use `StageAgentExecutor` to run LLM calls with skill-defined prompts. `ContentSearchStage` delegates to a `ToolAdapter` (currently `WeixinToolAdapter`).
+- **Quality Gates** (`src/pipeline/gates/`): Post-stage checks that return proceed/retry/fallback/abort actions. `FilterSufficiencyGate` can trigger fallback to `ContentSearchStage` if insufficient results.
+- **Hooks** (`src/pipeline/hooks/`): Observer pattern for observability — JSONL trace writing, terminal progress, MySQL persistence, structured logging. All hooks are invoked at each lifecycle point by the orchestrator.
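The observer idea behind the hooks can be sketched as follows (class and method names are illustrative, not the actual `PipelineHook` interface; the real hooks also receive gate and pipeline-level events):

```python
import asyncio


class PipelineHook:
    """Illustrative hook base: the orchestrator calls every hook at each lifecycle point."""
    async def on_stage_start(self, stage_name: str, ctx: dict) -> None: ...
    async def on_stage_complete(self, stage_name: str, ctx: dict) -> None: ...


class RecordingHook(PipelineHook):
    """Toy observer that records the events it sees."""
    def __init__(self) -> None:
        self.events: list = []

    async def on_stage_start(self, stage_name: str, ctx: dict) -> None:
        self.events.append(("start", stage_name))

    async def on_stage_complete(self, stage_name: str, ctx: dict) -> None:
        self.events.append(("complete", stage_name))


async def run_stage_with_hooks(stage_name: str, ctx: dict, hooks: list) -> None:
    # The orchestrator fans each lifecycle point out to every registered hook.
    for h in hooks:
        await h.on_stage_start(stage_name, ctx)
    # ... the stage body would run here ...
    for h in hooks:
        await h.on_stage_complete(stage_name, ctx)


hook = RecordingHook()
asyncio.run(run_stage_with_hooks("content_search", {}, [hook]))
```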
+
+### Key Patterns
+
+- **Adapter pattern** for platform abstraction: `ToolAdapter` base class (`src/pipeline/adapters/base.py`) with `WeixinToolAdapter` implementation wrapping `tests/tools/weixin_tools.py`.
+- **Skills as markdown files** in `tests/skills/`: LLM prompts defining domain expertise (demand analysis, article finding/filtering strategy, account precipitation, output schema). Loaded by stages to guide agent behavior.
+- **Pydantic config hierarchy**: `LongArticlesSearchAgentConfig` → includes `SearchAgentMySQLConfig`, `DeepSeekConfig`, `AliyunLogConfig`. `RuntimePipelineConfig` for per-run settings. `SearchAgentPolicy` for strategy parameters (loadable from DB via `search_agent_strategy` table).
+- **Async throughout**: All stages, hooks, adapters, and database access use `async/await`.
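As one concrete, hypothetical illustration of the async stage contract, a hard-rule filter stage might look like this (the trimmed `PipelineContext` and the https-only rule are assumptions for the sketch):

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PipelineContext:
    """Trimmed stand-in for the real context; only one field shown."""
    candidate_articles: list = field(default_factory=list)


class Stage:
    name = "stage"

    async def execute(self, ctx: PipelineContext) -> None:
        raise NotImplementedError


class HardFilterStage(Stage):
    name = "hard_filter"

    async def execute(self, ctx: PipelineContext) -> None:
        # Hypothetical hard rule: drop candidates without an https URL.
        ctx.candidate_articles = [
            a for a in ctx.candidate_articles if a.get("url", "").startswith("https://")
        ]


ctx = PipelineContext(candidate_articles=[
    {"url": "https://mp.weixin.qq.com/s/abc"},
    {"url": "http://insecure.example"},
])
asyncio.run(HardFilterStage().execute(ctx))
```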
+
+### Entry Points
+
+| Entry | Purpose |
+|-------|---------|
+| `app.py` | Quart ASGI web server with REST API |
+| `run_pipeline.py` | Simple CLI runner (no DB strategy) |
+| `run_search_agent.py` | Production CLI with budget control, timeout, DB policy loading |
+| `pipeline_visualize.py` | Converts JSONL trace → HTML visualization |
+
+## Code Style
+
+- Python 3.11+, line length 120, Ruff for linting/formatting
+- Rules: E, F, I (isort), W. E501 (line length) is ignored.
+- Known first-party modules for isort: `src`, `agent`, `gateway`, `knowhub`
+- Tests use `pytest-asyncio` with `asyncio_mode = "auto"` (no need for `@pytest.mark.asyncio` decorator)
+- Test path includes project root (`pythonpath = ["."]`)
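With `asyncio_mode = "auto"`, pytest-asyncio collects and awaits async test functions directly, so no decorator is needed. A minimal example (function names are illustrative):

```python
# Under asyncio_mode = "auto", pytest-asyncio awaits async tests automatically;
# no @pytest.mark.asyncio decorator is required.
async def fake_health_check() -> str:
    # Stand-in for an async call into the app under test.
    return "ok"


async def test_health_returns_ok() -> None:
    assert await fake_health_check() == "ok"
```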

+ 6 - 2
agent/core/runner.py

@@ -274,6 +274,8 @@ class AgentRunner:
         self.skills_dir = skills_dir
         self.goal_tree = goal_tree
         self.debug = debug
+        # Pre-load skills so run() does not re-read the disk on every call
+        self._cached_skills: Optional[List] = None
         self.stdin_check: Optional[Callable] = None  # set externally; used to check stdin while a sub-agent runs
         self._cancel_events: Dict[str, asyncio.Event] = {}  # trace_id → cancel event
 
@@ -2684,8 +2686,10 @@ class AgentRunner:
             if preset is not None:
                 skills_filter = preset.skills  # may still be None (load all)
 
-        # Load and filter
-        all_skills = load_skills_from_dir(self.skills_dir)
+        # Load and filter (cached: read from disk only once)
+        if self._cached_skills is None:
+            self._cached_skills = load_skills_from_dir(self.skills_dir)
+        all_skills = self._cached_skills
         if skills_filter is not None:
             skills = [s for s in all_skills if s.name in skills_filter]
         else:

+ 317 - 0
docs/content-finder-pipeline-design.md

@@ -0,0 +1,317 @@
+# Content Finder Pipeline Design
+
+This document describes the **orchestration layer** built for the "WeChat content finding" task **without modifying the `agent/` kernel**: a multi-stage pipeline, quality gates, pluggable adapters, and observability/persistence hooks. The implementation lives in `src/pipeline/`; the entry point is `run_pipeline.py` at the repository root.
+
+---
+
+## 1. Background and Goals
+
+### 1.1 Business Goal
+
+Given an upstream demand (`query`, `demand_id`, target count, etc.), within the **WeChat platform**:
+
+1. Understand the demand (feature layering, search strategy suggestions)
+2. Recall candidate articles
+3. Fetch full text and judge quality/relevance
+4. Aggregate results at the official-account level
+5. Write `output.json` with a fixed schema, optionally persisting to MySQL
+
+### 1.2 Differences from the legacy entry `tests/run_single.py`
+
+| Dimension | `run_single.py` + `content_finder.md` | `run_pipeline.py` + `src/pipeline/` |
+|------|----------------------------------------|-------------------------------------|
+| Flow control | Single `AgentRunner.run`; ordering enforced by a long system prompt | **Programmatic stages** + local LLM sub-tasks |
+| Constraint enforcement | Mostly prompt self-checks | **QualityGate** programmatic checks; can **fallback** to re-search when results fall short |
+| Tool invocation | Model picks tools autonomously | Search/detail/account: **direct adapter calls** mixed with **in-stage agents** |
+| Observability | JSONL + console | Hooks: `TraceHook`, optional `DatabasePersistHook` |
+| Extension | Edit prompt / register tools | **ToolAdapter**, **KnowledgeSource**, new Stage / Gate |
+
+The legacy entry is kept for comparison and gradual migration; the new entry targets a production shape that is re-runnable, auditable, and extensible.
+
+---
+
+## 2. Overall Architecture
+
+### 2.1 Layered View
+
+```mermaid
+flowchart TB
+    subgraph Entry["Entry point"]
+        RP[run_pipeline.py]
+    end
+
+    subgraph Orch["Orchestration layer src/pipeline"]
+        PO[PipelineOrchestrator]
+        CTX[PipelineContext]
+        ST[Stages]
+        G[Gates]
+        H[Hooks]
+    end
+
+    subgraph AgentKernel["agent/ kernel (unchanged)"]
+        AR[AgentRunner]
+        TS[FileSystemTraceStore]
+        LLM[OpenRouter LLM]
+    end
+
+    subgraph Exec["Execution layer"]
+        WA[WeixinToolAdapter]
+        WX[tests/tools/weixin_tools]
+        HTTP[src.infra.shared.http_client]
+    end
+
+    subgraph Persist["Optional persistence"]
+        DB[(MySQL)]
+    end
+
+    RP --> PO
+    PO --> CTX
+    PO --> ST
+    PO --> G
+    PO --> H
+    ST -->|DemandAnalysis / LLM re-review| AR
+    AR --> LLM
+    AR --> TS
+    ST --> WA
+    WA --> WX
+    WX --> HTTP
+    H -->|DatabasePersistHook| DB
+```
+
+**Principle**: `agent/` is responsible only for "one conversation + the tool loop"; **who calls it, when, and the data contracts between stages** are decided by `src/pipeline`.
+
+### 2.2 Module Responsibilities
+
+| Module | Path | Responsibility |
+|------|------|------|
+| Context & contracts | `context.py` | `PipelineContext`; article/account/output data structures |
+| Abstract base classes | `base.py` | `Stage`, `QualityGate`, `PipelineHook`, `PipelineConfig` |
+| Orchestrator | `orchestrator.py` | Sequential stage execution, checkpoints, gate resolution, hook callbacks |
+| Stages | `stages/*.py` | Implementations of each business step |
+| Gates | `gates/*.py` | Post-stage programmatic checks returning `proceed / retry / fallback / abort` |
+| Adapters | `adapters/*.py` | Platform tool abstraction; currently `WeixinToolAdapter` |
+| Knowledge sources | `adapters/knowledge/` | `KnowledgeSource` extension point |
+| Hooks | `hooks/*.py` | Trace logging, optional MySQL persistence |
+| Runtime config | `config/pipeline_config.py` | Env-driven model and hyperparameters |
+
+---
+
+## 3. Pipeline Stages and Data Flow
+
+### 3.1 Stage Order (current default)
+
+```mermaid
+flowchart LR
+    S1[demand_analysis] --> S2[content_search]
+    S2 --> G1{SearchCompletenessGate}
+    G1 --> S3[hard_filter]
+    S3 --> S4[quality_filter]
+    S4 --> G2{FilterSufficiencyGate}
+    G2 -->|fallback| S2
+    G2 --> S5[account_precipitate]
+    S5 --> S6[output_persist]
+    S6 --> G3{OutputSchemaGate}
+```
+
+### 3.2 Stage Descriptions
+
+| Stage | Type | Main behavior | Writes to `ctx` |
+|-------|------|----------|----------------|
+| `demand_analysis` | LLM (`StageAgentExecutor` + `AgentRunner`) | Demand layering and strategy JSON | `demand_analysis` |
+| `content_search` | Code + `WeixinToolAdapter` | Serial `weixin_search` per keyword, deduplicated | `candidate_articles` |
+| `hard_filter` | Code | Hard rules on URL/timestamp etc., dedup | `candidate_articles` |
+| `quality_filter` | Code + detail API + optional LLM | `fetch_article_detail`; heuristic scoring + **LLM re-review** (can be disabled) | `filtered_articles` |
+| `account_precipitate` | Code + API | Account aggregation via `fetch_weixin_account` | `accounts`, `article_account_relations` |
+| `output_persist` | Code | Writes `tests/output/{trace_id}/output.json` | `output`, `metadata["output_file"]` |
+
+**Note**: `trace_id` is generated by the framework on the first `AgentRunner.run` (demand analysis or quality re-review) and written to `ctx.trace_id`; `output_persist` relies on this field to determine the output directory.
+
+### 3.3 Sequence of a Single Run (simplified)
+
+```mermaid
+sequenceDiagram
+    participant RP as run_pipeline
+    participant PO as Orchestrator
+    participant DA as DemandAnalysisStage
+    participant AR as AgentRunner
+    participant CS as ContentSearchStage
+    participant AD as WeixinAdapter
+    participant QF as QualityFilterStage
+    participant OP as OutputPersistStage
+
+    RP->>PO: run(ctx)
+    PO->>DA: execute(ctx)
+    DA->>AR: run_json_stage(...)
+    AR-->>DA: assistant JSON
+    DA-->>PO: ctx.demand_analysis
+
+    PO->>CS: execute(ctx)
+    CS->>AD: search(keyword)
+    AD-->>CS: CandidateArticle[]
+    CS-->>PO: ctx.candidate_articles
+
+    Note over PO: Gate: candidate count
+
+    PO->>QF: execute(ctx)
+    loop each candidate (capped)
+        QF->>AD: get_article_detail(url)
+        opt enable_llm_review
+            QF->>AR: run_json_stage(re-review)
+        end
+    end
+    QF-->>PO: ctx.filtered_articles
+
+    Note over PO: Gate: selected count / fallback re-search
+
+    PO->>OP: execute(ctx)
+    OP-->>PO: output.json
+```
+
+---
+
+## 4. Quality Gates (QualityGate)
+
+Gates run **after a stage completes successfully**; their results are interpreted by `orchestrator._resolve_gate`:
+
+| Gate | Attached stage | Behavior summary |
+|------|----------|----------|
+| `SearchCompletenessGate` | `content_search` | Compares candidate count against `target_count * 2`; `abort` on a severe shortfall |
+| `FilterSufficiencyGate` | `quality_filter` | Compares selected count against the target; may `fallback` to `content_search` when insufficient |
+| `OutputSchemaGate` | `output_persist` | Consistency between `summary` and `contents/accounts/relations` |
+
+`GateResult.action`: `proceed` | `retry_stage` | `fallback` | `abort` (`abort` raises and terminates the pipeline).
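The gate-resolution semantics described above can be sketched as follows (a hedged illustration; the real logic lives in `orchestrator._resolve_gate`, and any names beyond the four actions are assumptions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    action: str                      # "proceed" | "retry_stage" | "fallback" | "abort"
    reason: str = ""
    fallback_stage: Optional[str] = None


def resolve_gate(result: GateResult, current_stage: str) -> str:
    """Return the name of the next stage to run (sketch)."""
    if result.action == "abort":
        # abort raises and terminates the pipeline
        raise RuntimeError(f"pipeline aborted: {result.reason}")
    if result.action == "retry_stage":
        return current_stage
    if result.action == "fallback":
        return result.fallback_stage or "content_search"
    return "__next__"  # proceed: the orchestrator advances to the following stage
```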
+
+---
+
+## 5. Hooks and Observability
+
+### 5.1 PipelineTraceHook
+
+- Path: `src/pipeline/hooks/pipeline_trace_hook.py`
+- Role: writes a JSONL event stream (`tests/traces/{trace_id}/pipeline.jsonl`) at pipeline / stage / gate checkpoints, rendered to HTML by `pipeline_visualize.py`.
+- **Decision data on disk**: the `stage_complete` event carries a `decisions` field recording the stage's full decision information:
+
+| Stage | `decisions` content |
+|------|------------------|
+| `demand_analysis` | Feature layering (substance/form/upper/lower), search strategy (precise/topic terms), filtering focus |
+| `content_search` | Hit statistics per search term (keyword, returned, newly added), cumulative candidate count |
+| `hard_filter` | Remaining candidate count after filtering |
+| `quality_filter` | Per-article review records (title, score, accept/reject/skip, reason, review phase) |
+| `account_precipitate` | Account name, article count, sample titles |
+| `output_persist` | Output file path |
+
+### 5.2 LiveProgressHook
+
+- Path: `src/pipeline/hooks/live_progress_hook.py`
+- Role: live progress visualization in the terminal, printing per-stage progress, model reasoning, search-term hits, and per-article review decisions to stdout.
+- Cooperation with `PipelineTraceHook`: both listen to the same events; `LiveProgressHook` writes to the terminal for real-time observation, while `PipelineTraceHook` persists to JSONL for after-the-fact visualization and auditing.
+
+### 5.3 DatabasePersistHook
+
+- Path: `src/pipeline/hooks/db_hook.py`
+- Role: when `LongArticlesSearchAgentConfig.search_agent_db` (`SEARCH_AGENT_DB_*`) is configured, writes task, stage, candidate, score, account, and event summaries to the tables proposed in `docs/supply-agent-solution.md`.
+- **Skipped automatically when no database is configured**, so local DB-less runs are unaffected.
+
+---
+
+## 6. Extension Points
+
+### 6.1 ToolAdapter
+
+- Base class: `src/pipeline/adapters/base.py` (`search` / `get_article_detail` / `get_account`)
+- WeChat implementation: `WeixinToolAdapter`, which internally calls `tests/tools/weixin_tools.py`
+- New platforms: implement a new adapter and swap it in within `build_pipeline`.
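A sketch of what adding a new platform adapter could look like (the three-method contract follows the summary above; `ZhihuToolAdapter` and its stub data are hypothetical, and a real adapter would call platform APIs):

```python
import asyncio


class ToolAdapter:
    """Illustrative contract matching the three methods listed above."""
    async def search(self, keyword: str) -> list:
        raise NotImplementedError

    async def get_article_detail(self, url: str) -> dict:
        raise NotImplementedError

    async def get_account(self, account_id: str) -> dict:
        raise NotImplementedError


class ZhihuToolAdapter(ToolAdapter):
    """Hypothetical new-platform adapter returning stub data."""
    async def search(self, keyword: str) -> list:
        # A real implementation would call the platform's search API here.
        return [{"title": f"[stub] {keyword}", "url": "https://example.com/a/1"}]


results = asyncio.run(ZhihuToolAdapter().search("sleep health"))
```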
+
+### 6.2 KnowledgeSource
+
+- Base class: `src/pipeline/adapters/knowledge/base.py`
+- Example: `StaticKnowledgeSource`; `run_pipeline.py` demonstrates injecting `platform_rules`
+- Extensions: historical tasks, account pools, blacklists, etc. Fill `knowledge_sources` when building `PipelineContext`; `DemandAnalysisStage` will `query` them and splice the results into the prompt.
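The extension point can be sketched like this (a hedged illustration; the real interface lives in `adapters/knowledge/base.py`, and the constructor shape here is an assumption):

```python
import asyncio


class KnowledgeSource:
    """Illustrative base: stages call query() and splice the text into prompts."""
    name = "knowledge"

    async def query(self, query: str) -> str:
        raise NotImplementedError


class StaticKnowledgeSource(KnowledgeSource):
    """Returns fixed guidance text regardless of the query."""
    def __init__(self, name: str, text: str) -> None:
        self.name = name
        self._text = text

    async def query(self, query: str) -> str:
        return self._text


rules = StaticKnowledgeSource("platform_rules", "prefer accounts that update weekly")
snippet = asyncio.run(rules.query("anything"))
```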
+
+### 6.3 New Stage / Gate / Hook
+
+- Subclass `Stage` / `QualityGate` / `PipelineHook` and register it in `run_pipeline.build_pipeline`; no changes to `agent/` are required.
+
+---
+
+## 7. Configuration and Environment Variables
+
+### 7.1 Required
+
+| Variable | Description |
+|------|------|
+| `OPEN_ROUTER_API_KEY` | OpenRouter API key |
+
+### 7.2 Pipeline runtime (`RuntimePipelineConfig`)
+
+| Variable | Default | Description |
+|------|------|------|
+| `MODEL` | `anthropic/claude-sonnet-4-5` | LLM model id |
+| `PIPELINE_TEMPERATURE` | `0.2` | Agent temperature inside stages |
+| `PIPELINE_MAX_ITERATIONS` | `12` | Max agent turns per stage |
+| `PIPELINE_TARGET_COUNT` | `10` | Default target article count |
+| `PIPELINE_QUERY` / `PIPELINE_DEMAND_ID` | see `run_pipeline.main` | For CLI demos |
+
+### 7.3 MySQL (optional, `DatabasePersistHook`)
+
+Uses the `SearchAgentMySQLConfig` prefix from `src/config/database/mysql_config.py`: `SEARCH_AGENT_DB_HOST` etc.; the hook stays disabled unless fully configured.
+
+---
+
+## 8. Output Contract
+
+- File: `tests/output/{trace_id}/output.json`
+- Fields align with `tests/skills/output_schema.md`: `trace_id`, `query`, `demand_id`, `summary`, `contents`, `accounts`, `article_account_relations`
+
+---
+
+## 9. Relationship to the "Supply Agent Technical Solution"
+
+`docs/supply-agent-solution.md` describes a more complete task state machine and table structure. This pipeline is a working subset of its **RECALL → (detail) → RANK → account aggregation → persistence** flow; `DatabasePersistHook` performs **best-effort writes** against the table names in that document, logging a warning and skipping the corresponding SQL when tables are missing.
+
+---
+
+## 10. Known Limitations and Roadmap
+
+1. **Re-review cost**: with `enable_llm_review=True`, `quality_filter` may trigger an extra `AgentRunner` call per candidate; tune `detail_limit` for cost, or switch to batch review.
+2. **Fallback re-search**: if the keyword set is unchanged, simply repeating `content_search` may not add candidates; pagination / keyword-expansion strategies can be added later.
+3. **Alignment with mega-agent behavior**: to match `content_finder.md` exactly, some stages can revert to "pure agent + tools" with results extracted as JSON, while the orchestration layer keeps the gates and persistence.
+4. **Diagrams**: this document embeds architecture and flow diagrams as **Mermaid**; for external slides, export Mermaid to PNG/SVG (VS Code / Mermaid Live Editor, etc.).
+
+---
+
+## 11. Key File Index
+
+| Purpose | Path |
+|------|------|
+| Entry (no DB strategy) | `run_pipeline.py` → `src/pipeline/runner.py` |
+| Entry (strategy from the search_agent DB) | `run_search_agent.py` → `src/domain/search/core.py` |
+| Orchestration core | `src/pipeline/orchestrator.py`, `base.py`, `context.py` |
+| JSONL trace hook | `src/pipeline/hooks/pipeline_trace_hook.py` |
+| Live terminal progress hook | `src/pipeline/hooks/live_progress_hook.py` |
+| HTML visualization tool | `pipeline_visualize.py` |
+| Trace artifact directory | `tests/traces/<trace_id>/` |
+| Legacy comparison entry | `tests/run_single.py` |
+| Business prompt (legacy path) | `tests/content_finder.md` |
+| Skill snippets | `tests/skills/*.md` |
+| WeChat tools | `tests/tools/weixin_tools.py` |
+
+---
+
+## 12. Search Agent Strategy (search_agent DB)
+
+The database referenced by `LongArticlesSearchAgentConfig.search_agent_db` (env vars `SEARCH_AGENT_DB_*`) contains the configurable table **`search_agent_strategy`**, used to override runtime search and filtering parameters without code changes.
+
+- **DDL and field descriptions**: [docs/search_agent_strategy.sql](search_agent_strategy.sql)
+- **Loading logic**: `SearchAgentPolicyRepository` first looks up the latest `version` by `demand_id`; if none exists, it falls back to `strategy_code='default'`.
+- **Injection**: `SearchAgentCore.run()` writes the policy into `ctx.metadata["search_agent_policy"]`, which is read by:
+  - `content_search`: `max_keywords`, `initial_cursor`, `keyword_priority`, `extra_keywords`, `min_candidate_multiplier`
+  - `SearchCompletenessGate`: `min_candidate_multiplier`, `near_enough_candidate_multiplier`
+  - `FilterSufficiencyGate`: `filter_near_ratio`
+  - `quality_filter`: `max_detail_fetch`, `enable_llm_review`
+
+Code entry: `src/domain/search/core.py` (`SearchAgentCore`); CLI: `python run_search_agent.py`.
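Reading the injected policy with safe defaults can be sketched like this (the default values and helper name are assumptions for illustration; the real defaults come from `SearchAgentPolicy`):

```python
# Hypothetical defaults; the real ones come from SearchAgentPolicy.
POLICY_DEFAULTS = {"max_keywords": 5, "enable_llm_review": True, "filter_near_ratio": 0.8}


def policy_value(metadata: dict, key: str):
    """Read one policy field from ctx.metadata, falling back to a default."""
    policy = metadata.get("search_agent_policy") or {}
    return policy.get(key, POLICY_DEFAULTS[key])


# A DB-loaded policy overrides only the fields it sets.
meta = {"search_agent_policy": {"max_keywords": 8}}
```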
+
+---
+
+*Document version: in sync with the current `src/pipeline` implementation; if stage names or table structures change, update sections 3, 5, 9, and 12 accordingly.*

+ 272 - 0
docs/database-schema.sql

@@ -0,0 +1,272 @@
+-- =========================================================
+-- Search Agent database schema
+-- Database name: search_agent
+-- Character set: utf8mb4
+-- =========================================================
+
+SET NAMES utf8mb4;
+SET FOREIGN_KEY_CHECKS = 0;
+
+-- =========================================================
+-- 1) Upstream demand snapshot: freezes the demand context for traceability
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_demand_snapshot (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  demand_id BIGINT NOT NULL COMMENT 'Upstream demand ID (business-side key)',
+  demand_code VARCHAR(64) DEFAULT NULL COMMENT 'Upstream demand code (optional)',
+  query VARCHAR(255) NOT NULL COMMENT 'Core search term',
+  query_expansion JSON DEFAULT NULL COMMENT 'Expansion terms (synonyms/scenario terms/banned terms, etc.)',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT 'Target platform, currently weixin',
+  expected_count INT NOT NULL DEFAULT 10 COMMENT 'Expected number of results',
+  audience_profile JSON DEFAULT NULL COMMENT 'Target audience profile (e.g. 50+)',
+  quality_constraints JSON DEFAULT NULL COMMENT 'Quality constraints (min view count, blacklist terms, etc.)',
+  source_payload JSON DEFAULT NULL COMMENT 'Raw upstream request snapshot',
+  version INT NOT NULL DEFAULT 1 COMMENT 'Snapshot version (a demand_id may have multiple versions)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  KEY idx_demand_id (demand_id),
+  KEY idx_platform (platform),
+  KEY idx_created_at (created_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Upstream demand snapshot table';
+
+
+-- =========================================================
+-- 2) Task master table: one demand can trigger multiple tasks (rerun/backfill)
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Unique task ID (UUID/snowflake ID recommended)',
+  demand_snapshot_id BIGINT UNSIGNED NOT NULL COMMENT 'References supply_demand_snapshot.id',
+  demand_id BIGINT NOT NULL COMMENT 'Denormalized: upstream demand ID',
+  trace_id VARCHAR(64) DEFAULT NULL COMMENT 'Agent trace_id for tracing model execution',
+  status VARCHAR(32) NOT NULL COMMENT 'Task status: PENDING/RUNNING/RECALL_DONE/ENRICHED/RANKED/PLAN_CREATED/COMPLETED/FAILED/CANCELLED',
+  current_stage VARCHAR(32) DEFAULT NULL COMMENT 'Current stage: INIT/RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN/FINALIZE',
+  priority TINYINT NOT NULL DEFAULT 5 COMMENT 'Priority (1 highest, 9 lowest)',
+  retry_count INT NOT NULL DEFAULT 0 COMMENT 'Retries so far',
+  max_retry INT NOT NULL DEFAULT 3 COMMENT 'Max retries',
+  is_idempotent TINYINT NOT NULL DEFAULT 1 COMMENT 'Idempotency enabled (1 yes, 0 no)',
+  idempotency_key VARCHAR(128) DEFAULT NULL COMMENT 'Idempotency key (e.g. composed of demand_id + query hash)',
+  started_at DATETIME DEFAULT NULL COMMENT 'Task start time',
+  finished_at DATETIME DEFAULT NULL COMMENT 'Task finish time',
+  duration_ms BIGINT DEFAULT NULL COMMENT 'Total duration (ms)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT 'Error code (network timeout/bad params/platform rate limit, etc.)',
+  error_message TEXT COMMENT 'Error details',
+  operator VARCHAR(64) DEFAULT 'agent' COMMENT 'Executor (agent/manual/cron)',
+  ext JSON DEFAULT NULL COMMENT 'Extension fields (gray-release flags, AB experiment info, etc.)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_id (task_id),
+  UNIQUE KEY uk_idempotency_key (idempotency_key),
+  KEY idx_demand_id (demand_id),
+  KEY idx_status_priority (status, priority),
+  KEY idx_current_stage (current_stage),
+  KEY idx_created_at (created_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Supply task master table (state machine core)';
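The `idempotency_key` comment suggests composing the key from `demand_id` plus a query hash; one possible construction, sketched in Python (the exact normalization and recipe are assumptions):

```python
import hashlib


def make_idempotency_key(demand_id: int, query: str) -> str:
    # Normalize whitespace and case so trivial query variations map to one key.
    normalized = " ".join(query.split()).lower()
    query_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:32]
    return f"{demand_id}:{query_hash}"  # comfortably fits VARCHAR(128)


key = make_idempotency_key(1001, "blood sugar management")
```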
+
+
+-- =========================================================
+-- 3) Stage execution table: records each stage's execution
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_stage (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  stage_name VARCHAR(32) NOT NULL COMMENT 'Stage name: RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN',
+  stage_status VARCHAR(32) NOT NULL COMMENT 'Stage status: PENDING/RUNNING/SUCCESS/FAILED/SKIPPED',
+  attempt_no INT NOT NULL DEFAULT 1 COMMENT 'Execution attempt number for this stage',
+  input_payload JSON DEFAULT NULL COMMENT 'Stage input snapshot',
+  output_payload JSON DEFAULT NULL COMMENT 'Stage output snapshot (summary)',
+  started_at DATETIME DEFAULT NULL COMMENT 'Stage start time',
+  finished_at DATETIME DEFAULT NULL COMMENT 'Stage end time',
+  duration_ms BIGINT DEFAULT NULL COMMENT 'Stage duration (ms)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT 'Stage error code',
+  error_message TEXT COMMENT 'Stage error message',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  KEY idx_task_stage (task_id, stage_name),
+  KEY idx_stage_status (stage_status),
+  KEY idx_started_at (started_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Task stage execution detail table';
+
+
+-- =========================================================
+-- 4) Candidate content pool: raw candidates recalled by search
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_candidate_content (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  source_keyword VARCHAR(255) DEFAULT NULL COMMENT 'Keyword that triggered the recall',
+  recall_round INT NOT NULL DEFAULT 1 COMMENT 'Recall round (core/expansion/scenario terms)',
+  recall_page INT DEFAULT NULL COMMENT 'Recall page number / cursor index',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT 'Source platform',
+  content_id VARCHAR(128) DEFAULT NULL COMMENT 'Platform content ID (if extractable)',
+  title VARCHAR(512) NOT NULL COMMENT 'Article title',
+  url VARCHAR(1024) NOT NULL COMMENT 'Article URL (business-unique per candidate)',
+  digest VARCHAR(1000) DEFAULT NULL COMMENT 'Article digest',
+  cover_url VARCHAR(1024) DEFAULT NULL COMMENT 'Cover image URL',
+  publish_time DATETIME DEFAULT NULL COMMENT 'Publish time',
+  account_id VARCHAR(128) DEFAULT NULL COMMENT 'Official account ID (wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT 'Official account name',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT 'Official account biz identifier',
+  view_count BIGINT DEFAULT NULL COMMENT 'View count (if available)',
+  like_count BIGINT DEFAULT NULL COMMENT 'Like count (if available)',
+  comment_count BIGINT DEFAULT NULL COMMENT 'Comment count (if available)',
+  favorite_count BIGINT DEFAULT NULL COMMENT 'Favorite count (if available)',
+  share_count BIGINT DEFAULT NULL COMMENT 'Share count (if available)',
+  raw_statistics JSON DEFAULT NULL COMMENT 'Raw statistics fields',
+  raw_payload JSON DEFAULT NULL COMMENT 'Raw API response (single item)',
+  dedup_hash CHAR(64) DEFAULT NULL COMMENT 'Dedup hash (suggested: sha256 of normalized title+url)',
+  quality_flag VARCHAR(32) DEFAULT 'UNKNOWN' COMMENT 'Preliminary quality flag: UNKNOWN/PASS/REJECT',
+  reject_reason VARCHAR(255) DEFAULT NULL COMMENT 'Pre-filter rejection reason',
+  is_deleted TINYINT NOT NULL DEFAULT 0 COMMENT 'Soft-delete flag (1 deleted)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_url (task_id, url(255)),
+  KEY idx_task_round (task_id, recall_round),
+  KEY idx_task_account (task_id, account_id),
+  KEY idx_publish_time (publish_time),
+  KEY idx_quality_flag (quality_flag),
+  KEY idx_dedup_hash (dedup_hash)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Candidate content pool (raw recall results)';
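The `dedup_hash` comment recommends sha256 over a normalized title+url. One possible normalization, sketched in Python (the normalization details are an assumption; only the sha256-over-title+url idea comes from the schema comment):

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def dedup_hash(title: str, url: str) -> str:
    parts = urlsplit(url.strip())
    # Drop query string and fragment so tracking parameters don't defeat dedup.
    clean_url = urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, "", ""))
    basis = f"{title.strip()}|{clean_url}"
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()  # 64 hex chars, fits CHAR(64)


h = dedup_hash("Health guide", "https://MP.Weixin.QQ.com/s/abc?chksm=xyz#rd")
```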
+
+
+-- =========================================================
+-- 5) Account profile table: account-level enrichment and scoring inputs
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_account_profile (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  account_id VARCHAR(128) NOT NULL COMMENT 'Official account ID (wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT 'Account name',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT 'Account biz identifier',
+  intro TEXT COMMENT 'Account introduction',
+  verify_status TINYINT DEFAULT NULL COMMENT 'Verification status (0 unverified, 1 verified; per platform definition)',
+  category VARCHAR(64) DEFAULT NULL COMMENT 'Account category (health/news/emotion, etc.)',
+  update_frequency VARCHAR(32) DEFAULT NULL COMMENT 'Update frequency (daily/weekly/irregular)',
+  recent_article_count INT DEFAULT 0 COMMENT 'Articles published in the recent period',
+  median_view_count BIGINT DEFAULT NULL COMMENT 'Median historical view count',
+  median_like_count BIGINT DEFAULT NULL COMMENT 'Median historical like count',
+  vertical_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Verticality score (0-100)',
+  stability_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Update-stability score (0-100)',
+  credibility_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Credibility score (0-100)',
+  risk_tags JSON DEFAULT NULL COMMENT 'Risk tags (clickbait/marketing account/high violation risk)',
+  raw_profile_payload JSON DEFAULT NULL COMMENT 'Raw account detail response',
+  raw_history_payload JSON DEFAULT NULL COMMENT 'Raw historical-article response (summary)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_account (task_id, account_id),
+  KEY idx_task_credibility (task_id, credibility_score),
+  KEY idx_category (category)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Account profile table (account-level features)';
+
+
+-- =========================================================
+-- 6) Content feature table: quality features parsed from article details
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_feature (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT 'Associated candidate content ID',
+  content_length INT DEFAULT NULL COMMENT 'Body length (characters)',
+  image_count INT DEFAULT NULL COMMENT 'Image count',
+  video_count INT DEFAULT NULL COMMENT 'Video count',
+  has_source_reference TINYINT DEFAULT 0 COMMENT 'Contains source references (0 no, 1 yes)',
+  readability_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Readability score (0-100)',
+  information_density_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Information density score (0-100)',
+  elder_friendly_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Elder-friendliness score (0-100)',
+  sentiment_score DECIMAL(5,2) DEFAULT NULL COMMENT 'Sentiment score (optional)',
+  risk_score DECIMAL(5,2) DEFAULT 0 COMMENT 'Risk score (higher means riskier)',
+  risk_reasons JSON DEFAULT NULL COMMENT 'Risk reason details (exaggerated/pseudo-scientific/inducement terms)',
+  keyword_coverage JSON DEFAULT NULL COMMENT 'Query keyword coverage details',
+  summary TEXT COMMENT 'Article summary (extracted)',
+  raw_detail_payload JSON DEFAULT NULL COMMENT 'Raw article detail response',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_feature (task_id, candidate_id),
+  KEY idx_task_quality (task_id, readability_score, information_density_score)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Content feature table (article-level features)';
+
+
+-- =========================================================
+-- 7) Scoring and filtering results: final ranking and selection
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_score (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT 'Associated candidate content ID',
+  relevance_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Relevance score (0-100)',
+  popularity_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Popularity score (0-100)',
+  quality_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Content quality score (0-100)',
+  account_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Account credibility score (0-100)',
+  elder_fit_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Elder-fit score (0-100)',
+  diversity_penalty DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT 'Diversity penalty (prevents one account dominating)',
+  total_score DECIMAL(6,2) NOT NULL DEFAULT 0 COMMENT 'Overall score',
+  filter_status VARCHAR(32) NOT NULL DEFAULT 'PENDING' COMMENT 'Filter status: PENDING/PASS/REJECT',
+  filter_reason VARCHAR(255) DEFAULT NULL COMMENT 'Filter reason',
+  is_selected TINYINT NOT NULL DEFAULT 0 COMMENT 'Selected into final results (1 yes, 0 no)',
+  rank_no INT DEFAULT NULL COMMENT 'Final rank (starting at 1)',
+  score_version VARCHAR(32) DEFAULT 'v1' COMMENT 'Scoring model version',
+  score_detail JSON DEFAULT NULL COMMENT 'Score breakdown (weights, intermediate scores)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_score (task_id, candidate_id),
+  KEY idx_task_selected_rank (task_id, is_selected, rank_no),
+  KEY idx_task_total_score (task_id, total_score)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Content scoring and filtering results table';
+
+
+-- =========================================================
+-- 8) Crawler plan mapping: interaction records with the AIGC platform
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_crawler_plan (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
+  task_id VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT 'Platform',
+  plan_type VARCHAR(32) NOT NULL COMMENT 'Plan creation mode: BY_CONTENT/BY_ACCOUNT',
+  input_count INT NOT NULL DEFAULT 0 COMMENT 'Number of input contents/accounts',
+  crawler_plan_id VARCHAR(128) DEFAULT NULL COMMENT 'AIGC crawler plan ID',
+  crawler_plan_name VARCHAR(255) DEFAULT NULL COMMENT 'AIGC crawler plan name',
+  produce_plan_id VARCHAR(128) DEFAULT NULL COMMENT 'Bound production plan ID (nullable)',
+  produce_bind_status VARCHAR(32) DEFAULT NULL COMMENT 'Bind status: NOT_REQUIRED/SUCCESS/FAILED',
+  request_payload JSON NOT NULL COMMENT 'Plan creation request body',
+  response_payload JSON DEFAULT NULL COMMENT 'Plan creation response body',
+  plan_status VARCHAR(32) NOT NULL COMMENT 'Plan status: CREATED/FAILED/BIND_SUCCESS/BIND_FAILED',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT 'Failure error code',
+  error_message TEXT COMMENT 'Failure error message',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
+  PRIMARY KEY (id),
+  KEY idx_task_id (task_id),
+  KEY idx_crawler_plan_id (crawler_plan_id),
+  KEY idx_plan_status (plan_status)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='AIGC crawler plan mapping table';
+
+
+-- =========================================================
+-- 9) 事件日志表:完整可观测性与审计
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_event (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  stage_name VARCHAR(32) DEFAULT NULL COMMENT '所属阶段',
+  event_type VARCHAR(64) NOT NULL COMMENT '事件类型:STATE_CHANGE/TOOL_CALL/TOOL_RESULT/RETRY/ERROR/MANUAL_INTERVENTION',
+  event_level VARCHAR(16) NOT NULL DEFAULT 'INFO' COMMENT '日志级别:DEBUG/INFO/WARN/ERROR',
+  tool_name VARCHAR(64) DEFAULT NULL COMMENT '工具名(如 weixin_search)',
+  request_id VARCHAR(64) DEFAULT NULL COMMENT '外部请求ID(可用于链路追踪)',
+  event_payload JSON DEFAULT NULL COMMENT '事件详情(入参/出参摘要)',
+  message VARCHAR(1000) DEFAULT NULL COMMENT '事件描述',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  PRIMARY KEY (id),
+  KEY idx_task_created (task_id, created_at),
+  KEY idx_stage_type (stage_name, event_type),
+  KEY idx_tool_name (tool_name),
+  KEY idx_event_level (event_level)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='任务事件日志表';
+
+SET FOREIGN_KEY_CHECKS = 1;

+ 685 - 0
docs/database-tool.md

@@ -0,0 +1,685 @@
+# 数据库工具与交互文档
+
+本文档描述 Search Agent 系统与 MySQL 数据库的交互机制、表结构设计、环境配置与使用方法。
+
+---
+
+## 1. 数据库交互架构
+
+### 1.1 核心组件
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                   Pipeline 执行层                        │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
+│  │ DemandAnalysis│  │ContentSearch │  │QualityFilter │  │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
+│         │                  │                  │          │
+│         └──────────────────┼──────────────────┘          │
+│                            │                             │
+│                    ┌───────▼────────┐                    │
+│                    │ DatabaseHook   │                    │
+│                    │ (旁路观测)      │                    │
+│                    └───────┬────────┘                    │
+└────────────────────────────┼──────────────────────────────┘
+                             │
+                    ┌────────▼─────────┐
+                    │ AsyncMySQLPool   │
+                    │ (连接池管理)      │
+                    └────────┬─────────┘
+                             │
+                    ┌────────▼─────────┐
+                    │  MySQL Database  │
+                    │  (better_me)     │
+                    └──────────────────┘
+```
+
+### 1.2 设计原则
+
+1. **旁路特性**:数据库持久化失败不阻断主流程,只记录警告日志
+2. **异步非阻塞**:使用 `aiomysql` 连接池,避免阻塞 Agent 执行
+3. **幂等性保证**:通过 `idempotency_key` 防止重复任务创建
+4. **可追溯性**:每个任务关联唯一 `task_id` 和 `trace_id`,全链路可追踪
+
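幂等键的具体组成在 `supply_task` 表注释中建议为 "demand_id+query_hash"。下面是一个最小示意(归一化规则与截断长度均为笔者假设,非源码实现):

```python
import hashlib

def build_idempotency_key(demand_id: int, query: str) -> str:
    """按 supply_task.idempotency_key 注释的建议,由 demand_id 与 query 哈希组成(示意)。"""
    # 压缩空白,避免查询词的空格差异导致幂等失效
    normalized = " ".join(query.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:32]
    return f"{demand_id}:{digest}"
```

同一 demand 下语义相同的查询会得到相同的键,从而命中 `uk_idempotency_key` 唯一索引、阻止重复建任务。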
+---
+
+## 2. 环境配置
+
+### 2.1 必需的环境变量
+
+在 `.env` 文件中配置以下变量:
+
+```bash
+# MySQL 数据库连接配置
+SEARCH_AGENT_DB_HOST=localhost
+SEARCH_AGENT_DB_PORT=3306
+SEARCH_AGENT_DB_USER=your_username
+SEARCH_AGENT_DB_PASSWORD=your_password
+SEARCH_AGENT_DB_DB=better_me
+
+# 可选:连接池配置
+SEARCH_AGENT_DB_MINSIZE=5
+SEARCH_AGENT_DB_MAXSIZE=20
+SEARCH_AGENT_DB_CHARSET=utf8mb4
+```
+
+### 2.2 配置类说明
+
+配置通过 `SearchAgentMySQLConfig` 类管理(位于 `src/config/database/mysql_config.py`):
+
+| 字段 | 默认值 | 说明 |
+|------|--------|------|
+| `host` | `localhost` | MySQL 服务器地址 |
+| `port` | `3306` | MySQL 端口 |
+| `user` | `""` | 数据库用户名(必填) |
+| `password` | `""` | 数据库密码(必填) |
+| `db` | `""` | 数据库名称(必填) |
+| `charset` | `utf8mb4` | 字符集 |
+| `minsize` | `5` | 连接池最小连接数 |
+| `maxsize` | `20` | 连接池最大连接数 |
+
+---
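按上表字段与默认值,环境变量到配置对象的映射可示意如下(仅为草图,字段名取自上表,实际 `SearchAgentMySQLConfig` 实现以源码为准):

```python
import os
from dataclasses import dataclass

@dataclass
class MySQLConfigSketch:
    host: str = "localhost"
    port: int = 3306
    user: str = ""
    password: str = ""
    db: str = ""
    charset: str = "utf8mb4"
    minsize: int = 5
    maxsize: int = 20

    @classmethod
    def from_env(cls, prefix: str = "SEARCH_AGENT_DB_") -> "MySQLConfigSketch":
        # 环境变量缺失时回退到上表列出的默认值
        def get(name: str, default: str) -> str:
            return os.environ.get(prefix + name, default)

        return cls(
            host=get("HOST", "localhost"),
            port=int(get("PORT", "3306")),
            user=get("USER", ""),
            password=get("PASSWORD", ""),
            db=get("DB", ""),
            charset=get("CHARSET", "utf8mb4"),
            minsize=int(get("MINSIZE", "5")),
            maxsize=int(get("MAXSIZE", "20")),
        )
```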
+
+## 3. 数据库表结构
+
+### 3.1 表关系图
+
+```
+supply_demand_snapshot (需求快照)
+         │
+         │ 1:N
+         ▼
+supply_task (任务主表) ──────┐
+         │                   │
+         │ 1:N               │ 1:N
+         ▼                   ▼
+supply_task_stage      supply_task_event
+(阶段执行明细)          (事件日志)
+         │
+         │ 1:N
+         ▼
+supply_candidate_content (候选内容池)
+         │
+         │ 1:1
+         ├──────────────────┬──────────────────┐
+         ▼                  ▼                  ▼
+supply_content_feature  supply_content_score  supply_account_profile
+(内容特征)              (评分结果)            (账号画像)
+```
+
+### 3.2 核心表说明
+
+#### 3.2.1 supply_demand_snapshot (需求快照表)
+**用途**:冻结上游需求上下文,便于回溯和审计
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `demand_id` | BIGINT | 上游需求 ID(业务侧主键) |
+| `query` | VARCHAR(255) | 核心搜索词 |
+| `query_expansion` | JSON | 扩展词列表(同义词/场景词) |
+| `expected_count` | INT | 期望返回条数 |
+| `audience_profile` | JSON | 目标人群画像 |
+
+#### 3.2.2 supply_task (任务主表)
+**用途**:状态机核心,记录任务全生命周期
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `task_id` | VARCHAR(64) | 任务唯一 ID(UUID) |
+| `trace_id` | VARCHAR(64) | Agent trace_id,串联整个执行链路 |
+| `status` | VARCHAR(32) | 任务状态(见下方状态机) |
+| `current_stage` | VARCHAR(32) | 当前执行阶段 |
+| `idempotency_key` | VARCHAR(128) | 幂等键(防重复) |
+| `error_code` / `error_message` | VARCHAR / TEXT | 错误信息 |
+
+**任务状态流转**:
+```
+PENDING → RUNNING → RECALL_DONE → ENRICHED → RANKED → PLAN_CREATED → COMPLETED
+                     (任一运行中状态出错时由 on_error 置为 FAILED)
+```
+
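状态流转可以用邻接表显式约束,以下为示意(FAILED 可从任意运行中状态进入、CANCELLED 仅从 PENDING 进入均为笔者假设,非源码实现):

```python
# 任务状态流转示意:键为当前状态,值为允许迁移到的状态集合
ALLOWED_TRANSITIONS = {
    "PENDING": {"RUNNING", "CANCELLED"},
    "RUNNING": {"RECALL_DONE", "FAILED"},
    "RECALL_DONE": {"ENRICHED", "FAILED"},
    "ENRICHED": {"RANKED", "FAILED"},
    "RANKED": {"PLAN_CREATED", "FAILED"},
    "PLAN_CREATED": {"COMPLETED", "FAILED"},
    "COMPLETED": set(),   # 终态
    "FAILED": set(),      # 终态
}

def can_transition(current: str, target: str) -> bool:
    """更新 supply_task.status 前校验迁移是否合法(示意)。"""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```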
+#### 3.2.3 supply_task_stage (阶段执行明细表)
+**用途**:记录每个 Pipeline 阶段的执行情况
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `stage_name` | VARCHAR(32) | 阶段名(RECALL/RANK/CREATE_PLAN 等) |
+| `stage_status` | VARCHAR(32) | 阶段状态(PENDING/RUNNING/SUCCESS/FAILED) |
+| `attempt_no` | INT | 第几次执行(支持重试) |
+| `duration_ms` | BIGINT | 阶段耗时(毫秒) |
+| `input_payload` / `output_payload` | JSON | 输入输出快照 |
+
+#### 3.2.4 supply_candidate_content (候选内容池)
+**用途**:存储搜索召回的原始候选文章
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `source_keyword` | VARCHAR(255) | 触发召回的关键词 |
+| `recall_round` | INT | 第几轮召回 |
+| `title` / `url` | VARCHAR | 文章标题和链接 |
+| `account_id` / `account_name` | VARCHAR | 公众号信息 |
+| `view_count` / `like_count` | BIGINT | 阅读量/点赞量 |
+| `dedup_hash` | CHAR(64) | 去重哈希(title+url 的 SHA256) |
+| `quality_flag` | VARCHAR(32) | 初步质量标记(UNKNOWN/PASS/REJECT) |
+
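`dedup_hash` 建议对归一化后的 title+url 取 SHA256。以下为一种示意实现(具体归一化规则——压缩空白、去 fragment、host 小写——为笔者假设):

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def compute_dedup_hash(title: str, url: str) -> str:
    """归一化 title+url 后取 SHA256,产出 64 位十六进制,对应 CHAR(64) 列(示意)。"""
    norm_title = " ".join(title.split())          # 标题:压缩空白
    parts = urlsplit(url.strip())
    norm_url = urlunsplit((                        # URL:小写协议与 host,去掉 fragment
        parts.scheme.lower(), parts.netloc.lower(),
        parts.path, parts.query, "",
    ))
    raw = f"{norm_title}\n{norm_url}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```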
+#### 3.2.5 supply_content_score (评分与筛选结果表)
+**用途**:存储最终排序与入选结果
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `relevance_score` | DECIMAL(5,2) | 相关性分(0-100) |
+| `quality_score` | DECIMAL(5,2) | 内容质量分(0-100) |
+| `total_score` | DECIMAL(6,2) | 综合分 |
+| `is_selected` | TINYINT | 是否入选最终结果 |
+| `rank_no` | INT | 最终排序名次 |
+| `score_detail` | JSON | 评分明细(权重、中间分) |
+
+#### 3.2.6 supply_account_profile (账号画像表)
+**用途**:账号维度的补全与打分输入
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `account_id` | VARCHAR(128) | 公众号 ID(wx_gh) |
+| `credibility_score` | DECIMAL(5,2) | 可信度评分(0-100) |
+| `risk_tags` | JSON | 风险标签(标题党/营销号) |
+| `median_view_count` | BIGINT | 历史阅读量中位数 |
+
+#### 3.2.7 supply_task_event (事件日志表)
+**用途**:完整可观测性与审计
+
+| 核心字段 | 类型 | 说明 |
+|---------|------|------|
+| `event_type` | VARCHAR(64) | 事件类型(STATE_CHANGE/TOOL_CALL/ERROR) |
+| `event_level` | VARCHAR(16) | 日志级别(DEBUG/INFO/WARN/ERROR) |
+| `tool_name` | VARCHAR(64) | 工具名(如 weixin_search) |
+| `event_payload` | JSON | 事件详情(入参/出参摘要) |
+
+---
+
+## 4. 数据库初始化
+
+### 4.1 创建数据库
+
+```sql
+CREATE DATABASE IF NOT EXISTS better_me
+  DEFAULT CHARACTER SET utf8mb4
+  DEFAULT COLLATE utf8mb4_unicode_ci;
+
+USE better_me;
+```
+
+### 4.2 执行建表 SQL
+
+完整的建表 SQL 请参考本文档末尾的「附录 A:完整建表 SQL」。
+
+执行方式:
+```bash
+# 方式 1:通过 MySQL 客户端
+mysql -u your_username -p better_me < docs/database-schema.sql
+
+# 方式 2:通过命令行直接执行
+mysql -u your_username -p -e "source /path/to/database-schema.sql"
+```
+
+---
+
+## 5. Hook 工作机制
+
+### 5.1 DatabasePersistHook 生命周期
+
+`DatabasePersistHook` 实现了 `PipelineHook` 接口,在 Pipeline 各个关键节点被调用:
+
+```python
+class DatabasePersistHook(PipelineHook):
+    async def on_pipeline_start(ctx):
+        # 1. 写入需求快照
+        # 2. 创建任务主记录(status=RUNNING)
+        # 3. 写入 INIT 事件
+
+    async def on_stage_complete(stage_name, ctx):
+        # 1. 写入阶段执行记录
+        # 2. 根据阶段名写入业务表:
+        #    - content_search → supply_candidate_content
+        #    - quality_filter → supply_content_score
+        #    - account_precipitate → supply_account_profile
+        # 3. 更新任务状态
+
+    async def on_gate_check(gate_name, result, ctx):
+        # 写入门禁检查事件(通过/拦截/回退)
+
+    async def on_error(stage_name, error, ctx):
+        # 1. 更新任务状态为 FAILED
+        # 2. 写入错误事件
+
+    async def on_pipeline_complete(ctx):
+        # 写入 FINALIZE 事件
+```
+
+### 5.2 错误处理策略
+
+所有数据库操作均采用 **try-except + warning 日志** 模式:
+
+```python
+try:
+    await self.pool.async_save(sql, params)
+except Exception as exc:
+    logger.warning("persist task skipped: %s", exc)
+    # 不抛出异常,不阻断主流程
+```
+
+这确保了即使数据库不可用,Pipeline 仍能正常执行。
+
+---
+
+## 6. 使用指南
+
+### 6.1 启用数据库持久化
+
+在 `src/pipeline/runner.py` 或 `run_search_agent.py` 中注册 Hook:
+
+```python
+from src.pipeline.hooks import DatabasePersistHook
+
+# 创建 Pipeline
+pipeline = PipelineOrchestrator(...)
+
+# 注册数据库 Hook
+pipeline.add_hook(DatabasePersistHook())
+
+# 运行 Pipeline
+await pipeline.run(ctx)
+```
+
+### 6.2 查询任务执行记录
+
+```sql
+-- 查询最近 10 个任务
+SELECT task_id, status, current_stage, started_at, duration_ms
+FROM supply_task
+ORDER BY created_at DESC
+LIMIT 10;
+
+-- 查询某个任务的所有阶段执行记录
+SELECT stage_name, stage_status, attempt_no, duration_ms
+FROM supply_task_stage
+WHERE task_id = 'your-task-id'
+ORDER BY id;
+
+-- 查询某个任务的候选文章
+SELECT title, url, view_count, quality_flag
+FROM supply_candidate_content
+WHERE task_id = 'your-task-id'
+ORDER BY view_count DESC;
+
+-- 查询某个任务的最终入选文章
+SELECT c.title, c.url, s.total_score, s.rank_no
+FROM supply_candidate_content c
+JOIN supply_content_score s ON c.id = s.candidate_id
+WHERE s.task_id = 'your-task-id' AND s.is_selected = 1
+ORDER BY s.rank_no;
+```
+
+### 6.3 监控与告警
+
+```sql
+-- 统计最近 24 小时的任务成功率
+SELECT
+    COUNT(*) as total,
+    SUM(CASE WHEN status = 'COMPLETED' THEN 1 ELSE 0 END) as success,
+    ROUND(SUM(CASE WHEN status = 'COMPLETED' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as success_rate
+FROM supply_task
+WHERE created_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR);
+
+-- 查询失败任务的错误分布
+SELECT error_code, COUNT(*) as count
+FROM supply_task
+WHERE status = 'FAILED' AND created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
+GROUP BY error_code
+ORDER BY count DESC;
+```
+
+---
+
+## 7. 故障排查
+
+### 7.1 常见错误
+
+#### 错误 1:Table doesn't exist
+```
+WARNING - persist stage skipped: (1146, "Table 'better_me.supply_task_stage' doesn't exist")
+```
+
+**原因**:数据库表未创建
+**解决**:执行本文档末尾的完整建表 SQL
+
+#### 错误 2:Connection refused
+```
+WARNING - DatabasePersistHook init skipped: (2003, "Can't connect to MySQL server on 'localhost'")
+```
+
+**原因**:MySQL 服务未启动或连接配置错误
+**解决**:
+1. 检查 MySQL 服务状态:`systemctl status mysql` 或 `brew services list`
+2. 验证 `.env` 中的连接配置
+3. 测试连接:`mysql -h localhost -u your_user -p`
+
+#### 错误 3:Access denied
+```
+WARNING - DatabasePersistHook init skipped: (1045, "Access denied for user 'xxx'@'localhost'")
+```
+
+**原因**:用户名或密码错误,或用户无权限
+**解决**:
+```sql
+-- 创建用户并授权
+CREATE USER 'your_user'@'localhost' IDENTIFIED BY 'your_password';
+GRANT ALL PRIVILEGES ON better_me.* TO 'your_user'@'localhost';
+FLUSH PRIVILEGES;
+```
+
+### 7.2 调试模式
+
+启用详细日志:
+
+```python
+import logging
+logging.getLogger('src.pipeline.hooks.db_hook').setLevel(logging.DEBUG)
+logging.getLogger('src.infra.database').setLevel(logging.DEBUG)
+```
+
+---
+
+## 8. 性能优化建议
+
+1. **批量插入**:候选文章和账号画像使用 `executemany` 批量插入
+2. **索引优化**:已为常用查询字段创建索引(task_id, status, created_at 等)
+3. **分区表**:若 `supply_task_event` 数据量大,建议按月分区
+4. **连接池调优**:根据并发量调整 `minsize` 和 `maxsize`
+
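第 1 条提到的 `executemany` 批量插入,通常配合固定批次切分使用,示意如下(批大小 500 为假设值,`cur.executemany` 调用仅为伪用法):

```python
from typing import Iterator, Sequence

def chunked(rows: Sequence[tuple], batch_size: int = 500) -> Iterator[Sequence[tuple]]:
    """将待插入行切分为固定大小的批次,供 cursor.executemany 逐批执行。"""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# 用法示意(伪调用):
# for batch in chunked(candidate_rows):
#     await cur.executemany(insert_sql, batch)
```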
+---
+
+## 附录 A:完整建表 SQL
+
+将以下 SQL 保存为 `docs/database-schema.sql` 并执行:
+
+```sql
+SET NAMES utf8mb4;
+SET FOREIGN_KEY_CHECKS = 0;
+
+-- =========================================================
+-- 1) 上游需求快照表:冻结需求上下文,便于回溯
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_demand_snapshot (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  demand_id BIGINT NOT NULL COMMENT '上游需求ID(业务侧主键)',
+  demand_code VARCHAR(64) DEFAULT NULL COMMENT '上游需求编码(可选)',
+  query VARCHAR(255) NOT NULL COMMENT '核心搜索词',
+  query_expansion JSON DEFAULT NULL COMMENT '扩展词列表(同义词/场景词/禁用词等)',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '目标平台,当前为 weixin',
+  expected_count INT NOT NULL DEFAULT 10 COMMENT '期望返回条数',
+  audience_profile JSON DEFAULT NULL COMMENT '目标人群画像(如50+)',
+  quality_constraints JSON DEFAULT NULL COMMENT '质量约束(最低阅读量、黑名单词等)',
+  source_payload JSON DEFAULT NULL COMMENT '上游原始请求快照',
+  version INT NOT NULL DEFAULT 1 COMMENT '快照版本号(同一 demand_id 可多版本)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_demand_id (demand_id),
+  KEY idx_platform (platform),
+  KEY idx_created_at (created_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='上游需求快照表';
+
+
+-- =========================================================
+-- 2) 任务主表:一个 demand 可触发多次任务(重跑/补跑)
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '任务唯一ID(建议UUID/雪花ID)',
+  demand_snapshot_id BIGINT UNSIGNED NOT NULL COMMENT '关联 supply_demand_snapshot.id',
+  demand_id BIGINT NOT NULL COMMENT '冗余字段:上游需求ID',
+  trace_id VARCHAR(64) DEFAULT NULL COMMENT 'Agent trace_id,便于追踪模型执行',
+  status VARCHAR(32) NOT NULL COMMENT '任务状态:PENDING/RUNNING/RECALL_DONE/ENRICHED/RANKED/PLAN_CREATED/COMPLETED/FAILED/CANCELLED',
+  current_stage VARCHAR(32) DEFAULT NULL COMMENT '当前阶段:INIT/RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN/FINALIZE',
+  priority TINYINT NOT NULL DEFAULT 5 COMMENT '优先级(1最高,9最低)',
+  retry_count INT NOT NULL DEFAULT 0 COMMENT '已重试次数',
+  max_retry INT NOT NULL DEFAULT 3 COMMENT '最大重试次数',
+  is_idempotent TINYINT NOT NULL DEFAULT 1 COMMENT '是否启用幂等(1是0否)',
+  idempotency_key VARCHAR(128) DEFAULT NULL COMMENT '幂等键(可由 demand_id+query_hash 组成)',
+  started_at DATETIME DEFAULT NULL COMMENT '任务启动时间',
+  finished_at DATETIME DEFAULT NULL COMMENT '任务完成时间',
+  duration_ms BIGINT DEFAULT NULL COMMENT '总耗时(毫秒)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '错误码(网络超时/参数错误/平台限流等)',
+  error_message TEXT COMMENT '错误详情',
+  operator VARCHAR(64) DEFAULT 'agent' COMMENT '执行方(agent/manual/cron)',
+  ext JSON DEFAULT NULL COMMENT '扩展字段(灰度标记、AB实验信息等)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_id (task_id),
+  UNIQUE KEY uk_idempotency_key (idempotency_key),
+  KEY idx_demand_id (demand_id),
+  KEY idx_status_priority (status, priority),
+  KEY idx_current_stage (current_stage),
+  KEY idx_created_at (created_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='供给任务主表(状态机核心)';
+
+
+-- =========================================================
+-- 3) 阶段执行表:记录每个阶段的执行情况
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_stage (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  stage_name VARCHAR(32) NOT NULL COMMENT '阶段名:RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN',
+  stage_status VARCHAR(32) NOT NULL COMMENT '阶段状态:PENDING/RUNNING/SUCCESS/FAILED/SKIPPED',
+  attempt_no INT NOT NULL DEFAULT 1 COMMENT '该阶段第几次执行',
+  input_payload JSON DEFAULT NULL COMMENT '阶段输入快照',
+  output_payload JSON DEFAULT NULL COMMENT '阶段输出快照(摘要)',
+  started_at DATETIME DEFAULT NULL COMMENT '阶段开始时间',
+  finished_at DATETIME DEFAULT NULL COMMENT '阶段结束时间',
+  duration_ms BIGINT DEFAULT NULL COMMENT '阶段耗时(毫秒)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '阶段错误码',
+  error_message TEXT COMMENT '阶段错误信息',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_task_stage (task_id, stage_name),
+  KEY idx_stage_status (stage_status),
+  KEY idx_started_at (started_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='任务阶段执行明细表';
+
+
+-- =========================================================
+-- 4) 候选内容池:搜索召回到的原始候选
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_candidate_content (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  source_keyword VARCHAR(255) DEFAULT NULL COMMENT '触发召回的关键词',
+  recall_round INT NOT NULL DEFAULT 1 COMMENT '第几轮召回(核心词/扩展词/场景词)',
+  recall_page INT DEFAULT NULL COMMENT '召回分页页码/游标序号',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '来源平台',
+  content_id VARCHAR(128) DEFAULT NULL COMMENT '平台内容ID(若可提取)',
+  title VARCHAR(512) NOT NULL COMMENT '文章标题',
+  url VARCHAR(1024) NOT NULL COMMENT '文章链接(业务唯一候选)',
+  digest VARCHAR(1000) DEFAULT NULL COMMENT '文章摘要',
+  cover_url VARCHAR(1024) DEFAULT NULL COMMENT '封面图链接',
+  publish_time DATETIME DEFAULT NULL COMMENT '发布时间',
+  account_id VARCHAR(128) DEFAULT NULL COMMENT '公众号ID(wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT '公众号名称',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT '公众号 biz 标识',
+  view_count BIGINT DEFAULT NULL COMMENT '阅读量(若有)',
+  like_count BIGINT DEFAULT NULL COMMENT '点赞量(若有)',
+  comment_count BIGINT DEFAULT NULL COMMENT '评论量(若有)',
+  favorite_count BIGINT DEFAULT NULL COMMENT '收藏量(若有)',
+  share_count BIGINT DEFAULT NULL COMMENT '分享量(若有)',
+  raw_statistics JSON DEFAULT NULL COMMENT '原始统计字段',
+  raw_payload JSON DEFAULT NULL COMMENT '原始API返回(单条)',
+  dedup_hash CHAR(64) DEFAULT NULL COMMENT '去重哈希(建议 title+url 归一化后sha256)',
+  quality_flag VARCHAR(32) DEFAULT 'UNKNOWN' COMMENT '初步质量标记:UNKNOWN/PASS/REJECT',
+  reject_reason VARCHAR(255) DEFAULT NULL COMMENT '初筛淘汰原因',
+  is_deleted TINYINT NOT NULL DEFAULT 0 COMMENT '软删除标记(1删除)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_url (task_id, url(255)),
+  KEY idx_task_round (task_id, recall_round),
+  KEY idx_task_account (task_id, account_id),
+  KEY idx_publish_time (publish_time),
+  KEY idx_quality_flag (quality_flag),
+  KEY idx_dedup_hash (dedup_hash)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='候选内容池(召回原始结果)';
+
+
+-- =========================================================
+-- 5) 账号画像表:账号维度的补全与打分输入
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_account_profile (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  account_id VARCHAR(128) NOT NULL COMMENT '公众号ID(wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT '账号名称',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT '账号biz标识',
+  intro TEXT COMMENT '账号简介',
+  verify_status TINYINT DEFAULT NULL COMMENT '认证状态(0未认证1认证,按平台定义)',
+  category VARCHAR(64) DEFAULT NULL COMMENT '账号分类(健康/资讯/情感等)',
+  update_frequency VARCHAR(32) DEFAULT NULL COMMENT '更新频率(daily/weekly/irregular)',
+  recent_article_count INT DEFAULT 0 COMMENT '近周期发文数',
+  median_view_count BIGINT DEFAULT NULL COMMENT '历史阅读量中位数',
+  median_like_count BIGINT DEFAULT NULL COMMENT '历史点赞量中位数',
+  vertical_score DECIMAL(5,2) DEFAULT 0 COMMENT '垂直度评分(0-100)',
+  stability_score DECIMAL(5,2) DEFAULT 0 COMMENT '稳定更新评分(0-100)',
+  credibility_score DECIMAL(5,2) DEFAULT 0 COMMENT '可信度评分(0-100)',
+  risk_tags JSON DEFAULT NULL COMMENT '风险标签(标题党/营销号/违规高风险)',
+  raw_profile_payload JSON DEFAULT NULL COMMENT '账号详情原始返回',
+  raw_history_payload JSON DEFAULT NULL COMMENT '历史文章原始返回(摘要)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_account (task_id, account_id),
+  KEY idx_task_credibility (task_id, credibility_score),
+  KEY idx_category (category)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='账号画像表(账号维度特征)';
+
+
+-- =========================================================
+-- 6) 内容特征表:文章详情解析后的质量特征
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_feature (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT '关联候选内容ID',
+  content_length INT DEFAULT NULL COMMENT '正文长度(字符数)',
+  image_count INT DEFAULT NULL COMMENT '图片数量',
+  video_count INT DEFAULT NULL COMMENT '视频数量',
+  has_source_reference TINYINT DEFAULT 0 COMMENT '是否包含来源引用(0否1是)',
+  readability_score DECIMAL(5,2) DEFAULT 0 COMMENT '可读性评分(0-100)',
+  information_density_score DECIMAL(5,2) DEFAULT 0 COMMENT '信息密度评分(0-100)',
+  elder_friendly_score DECIMAL(5,2) DEFAULT 0 COMMENT '老年友好评分(0-100)',
+  sentiment_score DECIMAL(5,2) DEFAULT NULL COMMENT '情绪倾向分(可选)',
+  risk_score DECIMAL(5,2) DEFAULT 0 COMMENT '风险分(越高风险越大)',
+  risk_reasons JSON DEFAULT NULL COMMENT '风险原因明细(夸张词/伪科学词/诱导词)',
+  keyword_coverage JSON DEFAULT NULL COMMENT 'query关键词覆盖明细',
+  summary TEXT COMMENT '文章摘要(抽取)',
+  raw_detail_payload JSON DEFAULT NULL COMMENT '文章详情原始返回',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_feature (task_id, candidate_id),
+  KEY idx_task_quality (task_id, readability_score, information_density_score)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='内容特征表(文章维度特征)';
+
+
+-- =========================================================
+-- 7) 评分与筛选结果表:最终排序与入选
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_score (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT '关联候选内容ID',
+  relevance_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '相关性分(0-100)',
+  popularity_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '热度分(0-100)',
+  quality_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '内容质量分(0-100)',
+  account_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '账号可信分(0-100)',
+  elder_fit_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '老年适配分(0-100)',
+  diversity_penalty DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '多样性惩罚分(防止同账号过多)',
+  total_score DECIMAL(6,2) NOT NULL DEFAULT 0 COMMENT '综合分',
+  filter_status VARCHAR(32) NOT NULL DEFAULT 'PENDING' COMMENT '过滤状态:PENDING/PASS/REJECT',
+  filter_reason VARCHAR(255) DEFAULT NULL COMMENT '过滤原因',
+  is_selected TINYINT NOT NULL DEFAULT 0 COMMENT '是否入选最终结果(1是0否)',
+  rank_no INT DEFAULT NULL COMMENT '最终排序名次(从1开始)',
+  score_version VARCHAR(32) DEFAULT 'v1' COMMENT '评分模型版本',
+  score_detail JSON DEFAULT NULL COMMENT '评分明细(权重、中间分)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_score (task_id, candidate_id),
+  KEY idx_task_selected_rank (task_id, is_selected, rank_no),
+  KEY idx_task_total_score (task_id, total_score)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='内容评分与筛选结果表';
+
+
+-- =========================================================
+-- 8) 抓取计划映射表:与AIGC平台交互记录
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_crawler_plan (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '平台',
+  plan_type VARCHAR(32) NOT NULL COMMENT '建计划方式:BY_CONTENT/BY_ACCOUNT',
+  input_count INT NOT NULL DEFAULT 0 COMMENT '输入内容/账号数量',
+  crawler_plan_id VARCHAR(128) DEFAULT NULL COMMENT 'AIGC爬取计划ID',
+  crawler_plan_name VARCHAR(255) DEFAULT NULL COMMENT 'AIGC爬取计划名称',
+  produce_plan_id VARCHAR(128) DEFAULT NULL COMMENT '绑定的生成计划ID(可空)',
+  produce_bind_status VARCHAR(32) DEFAULT NULL COMMENT '绑定状态:NOT_REQUIRED/SUCCESS/FAILED',
+  request_payload JSON NOT NULL COMMENT '创建计划请求体',
+  response_payload JSON DEFAULT NULL COMMENT '创建计划响应体',
+  plan_status VARCHAR(32) NOT NULL COMMENT '计划状态:CREATED/FAILED/BIND_SUCCESS/BIND_FAILED',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '失败错误码',
+  error_message TEXT COMMENT '失败错误信息',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_task_id (task_id),
+  KEY idx_crawler_plan_id (crawler_plan_id),
+  KEY idx_plan_status (plan_status)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='AIGC抓取计划映射表';
+
+
+-- =========================================================
+-- 9) 事件日志表:完整可观测性与审计
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_event (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  stage_name VARCHAR(32) DEFAULT NULL COMMENT '所属阶段',
+  event_type VARCHAR(64) NOT NULL COMMENT '事件类型:STATE_CHANGE/TOOL_CALL/TOOL_RESULT/RETRY/ERROR/MANUAL_INTERVENTION',
+  event_level VARCHAR(16) NOT NULL DEFAULT 'INFO' COMMENT '日志级别:DEBUG/INFO/WARN/ERROR',
+  tool_name VARCHAR(64) DEFAULT NULL COMMENT '工具名(如 weixin_search)',
+  request_id VARCHAR(64) DEFAULT NULL COMMENT '外部请求ID(可用于链路追踪)',
+  event_payload JSON DEFAULT NULL COMMENT '事件详情(入参/出参摘要)',
+  message VARCHAR(1000) DEFAULT NULL COMMENT '事件描述',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  PRIMARY KEY (id),
+  KEY idx_task_created (task_id, created_at),
+  KEY idx_stage_type (stage_name, event_type),
+  KEY idx_tool_name (tool_name),
+  KEY idx_event_level (event_level)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='任务事件日志表';
+
+SET FOREIGN_KEY_CHECKS = 1;
+```
+
+---
+
+## 附录 B:快速启动检查清单
+
+- [ ] 已安装 MySQL 5.7+ 或 MariaDB 10.3+
+- [ ] 已创建数据库 `better_me`
+- [ ] 已执行完整建表 SQL
+- [ ] 已在 `.env` 中配置数据库连接信息
+- [ ] 已验证数据库连接:`mysql -h localhost -u your_user -p better_me`
+- [ ] 已在 Pipeline 中注册 `DatabasePersistHook`
+- [ ] 运行测试任务,检查表中是否有数据写入

+ 387 - 0
docs/run_search_agent_workflow.md

@@ -0,0 +1,387 @@
+# run_search_agent.py 完整工作流程
+
+## 总览
+
+```
+main()
+ ├─ ① validate_prerequisites()          — 前置检查
+ ├─ ② 读取环境变量 (query, demand_id)
+ ├─ ③ AgentBudget.from_env() + validate — 预算约束
+ ├─ ④ uuid4() 生成全局 trace_id
+ ├─ ⑤ print_run_plan()                  — 打印运行计划
+ ├─ ⑥ run_with_harness()                — 核心执行(带超时)
+ │    ├─ 策略加载 (DB / default fallback)
+ │    ├─ 预算注入 (cap target_count)
+ │    └─ asyncio.wait_for(core.run(), timeout)
+ │         ├─ SearchAgentCore.run()
+ │         │    ├─ 解析策略 (SearchAgentPolicy)
+ │         │    ├─ 构建 PipelineContext
+ │         │    └─ run_content_finder_pipeline(ctx)
+ │         │         ├─ build_default_pipeline() — 组装 stages/gates/hooks
+ │         │         └─ PipelineOrchestrator.run(ctx)
+ │         │              ├─ Stage 1: DemandAnalysisStage
+ │         │              ├─ Stage 2: ContentSearchStage
+ │         │              │    └─ Gate: SearchCompletenessGate
+ │         │              ├─ Stage 3: HardFilterStage
+ │         │              ├─ Stage 4: CoarseFilterStage (LLM 标题粗筛)
+ │         │              ├─ Stage 5: QualityFilterStage (正文精排)
+ │         │              │    └─ Gate: FilterSufficiencyGate (可 fallback 到 Stage 2)
+ │         │              ├─ Stage 6: AccountPrecipitateStage
+ │         │              └─ Stage 7: OutputPersistStage
+ │         │                   └─ Gate: OutputSchemaGate
+ │         └─ 返回 PipelineContext
+ ├─ ⑦ summary.log()                     — 结构化摘要
+ └─ ⑧ exit code (0=成功, 1=失败)
+```
+
+---
+
+## 详细流程
+
+### 1. 入口:`main()`
+
+> 文件:`run_search_agent.py:284`
+
+```
+asyncio.run(main())
+```
+
+#### ① 前置检查 — Fallback Harness
+
+`validate_prerequisites()` (line 173) 检查 `OPEN_ROUTER_API_KEY` 是否存在,缺失则立即抛出 `EnvironmentError`,快速失败。
+
+#### ② 读取运行参数
+
+| 环境变量 | 默认值 | 用途 |
+|---------|--------|------|
+| `PIPELINE_QUERY` | `"伊朗以色列冲突、中老年人会关注什么?"` | 搜索查询 |
+| `PIPELINE_DEMAND_ID` | `"1"` | 需求 ID(关联 DB 策略) |
+
+#### ③ 预算约束 — Budget Harness
+
+`AgentBudget.from_env()` (line 61) 从环境变量构建预算:
+
+| 参数 | 环境变量 | 默认值 | 约束范围 |
+|------|---------|--------|---------|
+| `timeout_seconds` | `PIPELINE_TIMEOUT` | 1800 (30min) | ≥ 30 |
+| `max_target_count` | `PIPELINE_MAX_TARGET_COUNT` | 10 | [1, 200] |
+| `max_fallback_rounds` | `PIPELINE_MAX_FALLBACK_ROUNDS` | 1 | [0, 5] |
+
+`budget.validate()` 校验参数范围,不合法直接抛异常。
+
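按上表的环境变量、默认值与约束范围,预算构建与校验可示意如下(仅为草图,实际 `AgentBudget` 实现以 `run_search_agent.py` 源码为准):

```python
import os
from dataclasses import dataclass

@dataclass
class AgentBudgetSketch:
    timeout_seconds: int = 1800
    max_target_count: int = 10
    max_fallback_rounds: int = 1

    @classmethod
    def from_env(cls) -> "AgentBudgetSketch":
        return cls(
            timeout_seconds=int(os.environ.get("PIPELINE_TIMEOUT", "1800")),
            max_target_count=int(os.environ.get("PIPELINE_MAX_TARGET_COUNT", "10")),
            max_fallback_rounds=int(os.environ.get("PIPELINE_MAX_FALLBACK_ROUNDS", "1")),
        )

    def validate(self) -> None:
        # 约束范围来自上表,不合法直接抛异常,快速失败
        if self.timeout_seconds < 30:
            raise ValueError("timeout_seconds must be >= 30")
        if not 1 <= self.max_target_count <= 200:
            raise ValueError("max_target_count must be in [1, 200]")
        if not 0 <= self.max_fallback_rounds <= 5:
            raise ValueError("max_fallback_rounds must be in [0, 5]")
```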
+#### ④ 生成全局 trace_id
+
+```python
+trace_id = str(uuid4())
+```
+
+此 `trace_id` 贯穿整个运行周期:运行计划 → harness → core → pipeline context → 所有 stages/hooks/gates。
+
+#### ⑤ 运行计划 — Planner Harness
+
+`print_run_plan()` (line 138) 在执行前打印结构化计划,包含 trace_id、query、各阶段目标与约束,使运行意图可审计。
+
+---
+
+### 2. 核心执行:`run_with_harness()`
+
+> 文件:`run_search_agent.py:193`
+
+#### 2.1 策略加载
+
+```
+use_db_policy=True?
+  ├─ Yes → core.load_policy(demand_id)
+  │         ├─ 成功 → policy_source = "db"
+  │         └─ 失败 → 降级为 SearchAgentPolicy.defaults(), policy_source = "default(fallback)"
+  └─ No  → SearchAgentPolicy.defaults(), policy_source = "default"
+```
+
+**策略加载链路:**
+- `SearchAgentCore.load_policy()` → `SearchAgentPolicyRepository.load_policy()` (`src/domain/search/repository.py:20`)
+- 查询 `search_agent` 库的 `search_agent_strategy` 表
+- 优先按 `demand_id` 查找,找不到则查 `strategy_code='default'`
+- 解析 `config_json` 字段,合并默认值,返回 `SearchAgentPolicy`
+
+**SearchAgentPolicy 关键字段** (`src/domain/search/policy.py`):
+
+| 字段 | 默认值 | 含义 |
+|------|--------|------|
+| `max_keywords` | 6 | 最大搜索关键词数 |
+| `min_candidate_multiplier` | 2.0 | 候选目标 = target_count × 此值 |
+| `near_enough_candidate_multiplier` | 1.2 | "接近足够"的阈值倍数 |
+| `filter_near_ratio` | 0.8 | 过滤后"接近足够"的比例 |
+| `max_detail_fetch` | 30 | 最多获取详情的文章数 |
+| `enable_llm_review` | True | 是否启用 LLM 复评 |
+
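`config_json` 与默认值的合并可示意如下(默认值取自上表;"忽略未知字段" 为笔者假设的合并策略):

```python
import json
from typing import Optional

POLICY_DEFAULTS = {
    "max_keywords": 6,
    "min_candidate_multiplier": 2.0,
    "near_enough_candidate_multiplier": 1.2,
    "filter_near_ratio": 0.8,
    "max_detail_fetch": 30,
    "enable_llm_review": True,
}

def merge_policy(config_json: Optional[str]) -> dict:
    """解析 config_json:缺失字段回退默认值,未知字段忽略(示意)。"""
    merged = dict(POLICY_DEFAULTS)
    if config_json:
        overrides = json.loads(config_json)
        merged.update({k: v for k, v in overrides.items() if k in POLICY_DEFAULTS})
    return merged
```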
+#### 2.2 预算注入
+
+```python
+effective_target = min(runtime.target_count, budget.max_target_count)
+```
+
+`RuntimePipelineConfig.from_env()` (`src/pipeline/config/pipeline_config.py`) 读取:
+
+| 环境变量 | 默认值 |
+|---------|--------|
+| `MODEL` | `anthropic/claude-sonnet-4-5` |
+| `PIPELINE_TEMPERATURE` | 0.2 |
+| `PIPELINE_MAX_ITERATIONS` | 12 |
+| `PIPELINE_TARGET_COUNT` | 10 |
+
+Budget Harness 将 `target_count` 限制在 `max_target_count` 以内,防止无限扩张。
+
+#### 2.3 超时包裹执行
+
+```python
+ctx = await asyncio.wait_for(
+    core.run(query, demand_id, effective_target, ..., trace_id=trace_id),
+    timeout=budget.timeout_seconds,
+)
+```
+
+三种结果:
+- 正常返回 → 采集摘要,`success=True`
+- `asyncio.TimeoutError` → 记录超时错误,`success=False`
+- 其他异常 → 记录异常信息,`success=False`
+
+---
+
+### 3. 服务层:`SearchAgentCore.run()`
+
+> 文件:`src/domain/search/core.py:41`
+
+```python
+trace_id = trace_id or str(uuid4())   # 优先使用外部传入的 trace_id
+ctx = PipelineContext(
+    task_id=str(uuid4()),
+    trace_id=trace_id,
+    query=query,
+    demand_id=demand_id,
+    target_count=target_count,
+    model=runtime.model,
+    output_dir="tests/output",
+    knowledge_sources=default_knowledge_sources(),
+)
+apply_search_agent_policy(ctx, policy)   # 策略写入 ctx.metadata
+return await run_content_finder_pipeline(ctx)
+```
+
+`default_knowledge_sources()` (`src/pipeline/runner.py`) 加载静态知识:平台规则、受众画像等。
+
+---
+
+### 4. Pipeline 组装:`build_default_pipeline()`
+
+> 文件:`src/pipeline/runner.py:39`
+
+组装三大组件:
+
+**Stages(按顺序执行):**
+
+| # | Stage | 职责 |
+|---|-------|------|
+| 1 | `DemandAnalysisStage` | LLM 理解需求,产出搜索策略 |
+| 2 | `ContentSearchStage` | 按关键词召回候选文章 |
+| 3 | `HardFilterStage` | 去重 + URL/时间基础校验 |
+| 4 | `CoarseFilterStage` | LLM 批量标题语义粗筛 |
+| 5 | `QualityFilterStage` | 数据指标评分 + LLM 正文精排 |
+| 6 | `AccountPrecipitateStage` | 账号信息聚合沉淀 |
+| 7 | `OutputPersistStage` | 输出结构化 JSON |
+
+**Gates(阶段后置检查):**
+
+| Gate | 挂载在 | 动作 |
+|------|--------|------|
+| `SearchCompletenessGate` | Stage 2 之后 | 候选不足 → abort |
+| `FilterSufficiencyGate` | Stage 5 之后 | 不足 → fallback 到 Stage 2 |
+| `OutputSchemaGate` | Stage 7 之后 | 结构校验 |
+
+**Hooks(观测层):**
+
+| Hook | 职责 |
+|------|------|
+| `TraceHook` | JSON 快照写入 logger |
+| `PipelineTraceHook` | JSONL 事件写入 `tests/traces/{trace_id}/pipeline.jsonl` |
+| `LiveProgressHook` | 终端实时进度展示 |
+| `DatabasePersistHook` | MySQL 持久化(可选,失败不阻塞) |
+
+---
+
+### 5. Pipeline 执行:`PipelineOrchestrator.run()`
+
+> 文件:`src/pipeline/orchestrator.py:43`
+
+```
+on_pipeline_start(ctx)                    ← 所有 hooks 触发
+
+for stage in stages:
+    on_stage_start(ctx, stage)            ← hooks
+    validate_input(ctx)                   ← 前置校验
+    _execute_stage(ctx)                   ← 执行(带重试,max_stage_retries=1)
+    checkpoint(ctx)                       ← 快照保存
+    on_stage_complete(ctx, stage)         ← hooks
+
+    if gate exists for this stage:
+        result = gate.check(ctx)
+        ├─ passed / proceed  → 继续下一个 stage
+        ├─ retry_stage       → 重新执行当前 stage
+        ├─ fallback          → 跳转到 fallback_stage(最多 1 轮)
+        └─ abort             → 抛出异常,终止 pipeline
+
+on_pipeline_complete(ctx)                 ← 所有 hooks 触发
+return ctx
+```
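该调度循环可以用如下极简 Python 草图表达(省略 hooks、前置校验与 stage 级重试;`gate_fn` 的返回值形态为假设):

```python
def run_pipeline(stages, gates, ctx, max_fallback_rounds=1):
    """示意:stages 为 [(name, fn)] 顺序列表;gates 为 {stage_name: gate_fn};
    gate_fn(ctx) 返回 ("pass" | "fallback" | "abort", fallback 目标 stage 名或 None)。"""
    order = [name for name, _ in stages]
    fallback_used = 0
    i = 0
    while i < len(stages):
        name, fn = stages[i]
        fn(ctx)                                 # 执行 stage
        gate = gates.get(name)
        if gate:
            action, target = gate(ctx)
            if action == "abort":
                raise RuntimeError(f"gate abort at {name}")
            if action == "fallback" and fallback_used < max_fallback_rounds:
                fallback_used += 1              # 最多回退 max_fallback_rounds 轮
                i = order.index(target)
                continue
        i += 1
    return ctx
```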
+
+---
+
+### 6. 各 Stage 详细逻辑
+
+#### Stage 1: DemandAnalysisStage
+
+> 文件:`src/pipeline/stages/demand_analysis.py:19`
+
+- 输入:`ctx.query` + `ctx.knowledge_sources`
+- 调用 `StageAgentExecutor.run_json_stage()`,加载 skill `demand_analysis.md`
+- LLM 输出 JSON,解析为 `DemandAnalysisResult`:
+  - `substantive_features` / `formal_features` — 内容特征
+  - `search_strategy.precise_keywords` / `topic_keywords` — 搜索关键词
+  - `filter_focus.format_rules` / `relevance_focus` — 过滤指导
+- 输出:写入 `ctx.demand_analysis`
+
+#### Stage 2: ContentSearchStage
+
+> 文件:`src/pipeline/stages/content_search.py:42`
+
+两种模式:
+- **Agent 模式**(有 `agent_executor`):LLM 自主调用 `weixin_search` 工具,迭代搜索
+- **Code 模式**(无 `agent_executor`):按关键词列表逐个调用 `WeixinToolAdapter.search()`
+
+流程:
+1. 从 `ctx.demand_analysis` 提取关键词列表
+2. 逐关键词搜索,结果按 URL 去重
+3. 达到 `target_count × min_candidate_multiplier` 时停止
+4. 输出:`ctx.candidate_articles` + `ctx.metadata["_search_keyword_stats"]`
+
+**→ Gate: SearchCompletenessGate** (`src/pipeline/gates/search_completeness.py:18`)
+
+```
+候选数 ≥ target × 2.0        → PASS
+候选数 ≥ target × 1.2        → PASS (warning)
+候选数 < target × 1.2        → ABORT
+```
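该门禁的阈值判定可草绘如下(2.0 / 1.2 分别对应策略字段 `min_candidate_multiplier` / `near_enough_candidate_multiplier`;函数名为假设):

```python
def check_search_completeness(candidate_count, target, min_mult=2.0, near_mult=1.2):
    """示意:召回充分性判定,返回门禁动作字符串。"""
    if candidate_count >= target * min_mult:
        return "pass"
    if candidate_count >= target * near_mult:
        return "pass_with_warning"      # 接近足够:放行但告警
    return "abort"                      # 严重不足:终止 pipeline
```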
+
+#### Stage 3: HardFilterStage
+
+> 文件:`src/pipeline/stages/content_filter.py:96`
+
+- 按 URL 去重(保留首次出现)
+- 过滤掉:缺少 title/URL、非 HTTP URL、publish_time ≤ 0
+- 输出:更新 `ctx.candidate_articles`
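上述去重与规则过滤可草绘为(字段名 `title` / `url` / `publish_time` 取自 CandidateArticle,函数为示意实现):

```python
def hard_filter(articles):
    """示意:URL 去重(保留首次出现)+ 基础规则过滤;article 为 dict。"""
    seen = set()
    kept = []
    for a in articles:
        url = a.get("url") or ""
        if not a.get("title") or not url.startswith("http"):
            continue                       # 缺 title 或非 HTTP URL
        if a.get("publish_time", 0) <= 0:
            continue                       # 非法发布时间
        if url in seen:
            continue                       # 重复 URL
        seen.add(url)
        kept.append(a)
    return kept
```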
+
+#### Stage 4: CoarseFilterStage
+
+> 文件:`src/pipeline/stages/coarse_filter.py`
+
+- 输入:`ctx.candidate_articles`(硬过滤后)
+- 将候选文章标题按批(每批 ~20 篇)+ query + demand_analysis 特征打包成 LLM prompt
+- 调用 `StageAgentExecutor.run_simple_llm_json()` 做语义相关性判断
+- 宽松原则:只淘汰明显不相关的标题,不确定时放行
+- LLM 调用失败时该批全部放行(fail-open)
+- 输出:更新 `ctx.candidate_articles` + `ctx.metadata["_coarse_filter_log"]`
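分批 + fail-open 的骨架大致如下(`judge_batch` 抽象了 LLM 批量判定调用,为假设接口):

```python
def coarse_filter(articles, judge_batch, batch_size=20):
    """示意:标题粗筛。judge_batch(titles) 返回与 titles 等长的布尔列表;
    调用失败时该批全部放行(fail-open)。"""
    kept = []
    for i in range(0, len(articles), batch_size):
        batch = articles[i : i + batch_size]
        try:
            verdicts = judge_batch([a["title"] for a in batch])
        except Exception:
            verdicts = [True] * len(batch)   # LLM 失败:整批放行
        kept.extend(a for a, ok in zip(batch, verdicts) if ok)
    return kept
```

fail-open 保证粗筛只能"减负",不会因 LLM 抖动丢失候选。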
+
+#### Stage 5: QualityFilterStage
+
+> 文件:`src/pipeline/stages/content_filter.py:134`
+
+1. 对每篇文章(上限 `max_detail_fetch=30`):
+   - 调用 `adapter.get_article_detail(url)` 获取正文
+   - 数据指标评分 `_score_article()`:正文长度、阅读量、互动率 → interest 等级
+   - 关键词匹配结果仅作为参考信息传给 LLM
+   - LLM 复评决定最终 relevance(所有非 spam 文章都进 LLM 审核)
+   - relevance ≠ "low" 的文章入选
+2. 按 (relevance, interest, view_count, publish_time) 降序排序
+3. 截断到 `target_count`
+4. 输出:`ctx.filtered_articles` + `ctx.metadata["_quality_review_log"]`
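第 2、3 步的排序与截断可草绘为(等级到序数的映射为假设,仅示意排序键的构造):

```python
_LEVEL_ORDER = {"high": 2, "medium": 1, "low": 0}

def rank_and_truncate(articles, target_count):
    """示意:按 (relevance, interest, view_count, publish_time) 降序排序后截断。"""
    ranked = sorted(
        articles,
        key=lambda a: (
            _LEVEL_ORDER.get(a.get("relevance"), 0),
            _LEVEL_ORDER.get(a.get("interest"), 0),
            a.get("view_count", 0),
            a.get("publish_time", 0),
        ),
        reverse=True,
    )
    return ranked[:target_count]
```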
+
+**→ Gate: FilterSufficiencyGate** (`src/pipeline/gates/filter_sufficiency.py:21`)
+
+```
+入选数 ≥ target_count              → PASS
+入选数 ≥ target_count × 0.8        → PASS (warning)
+已 fallback 过 1 次                → PASS (接受现有结果)
+否则                               → FALLBACK 到 Stage 2 (content_search)
+```
+
+Fallback 时:跳回 Stage 2 重新搜索,跳过已用关键词,尝试新组合。最多 1 轮。
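该门禁的决策顺序可草绘为(返回值为示意动作名,非仓库真实枚举;0.8 对应策略字段 `filter_near_ratio`):

```python
def check_filter_sufficiency(selected, target, fallback_rounds_used,
                             near_ratio=0.8, max_rounds=1):
    """示意:筛选充分性判定。"""
    if selected >= target:
        return "pass"
    if selected >= target * near_ratio:
        return "pass_with_warning"       # 接近目标:放行但告警
    if fallback_rounds_used >= max_rounds:
        return "pass_accept_partial"     # 已回退过:接受现有结果
    return "fallback_to_search"          # 回到 content_search 补召回
```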
+
+#### Stage 6: AccountPrecipitateStage
+
+> 文件:`src/pipeline/stages/account_precipitate.py:24`
+
+- 遍历 `ctx.filtered_articles`
+- 调用 `adapter.get_account(url)` 获取公众号信息
+- 按 `wx_gh` / `account_name` 去重聚合:
+  - `article_count`(累加)
+  - `sample_articles`(最多 5 篇标题)
+  - `source_urls`(去重)
+- 建立 `ArticleAccountRelation` 映射
+- 输出:`ctx.accounts` + `ctx.article_account_relations`
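聚合逻辑可草绘为(`get_account` 抽象了 adapter 调用,其返回字段为假设):

```python
def precipitate_accounts(articles, get_account):
    """示意:按 wx_gh(缺失时退回 account_name)聚合账号并建立关联。
    get_account(url) 为假设接口,返回 {"wx_gh", "account_name"} 形态的 dict。"""
    accounts = {}
    relations = []
    for a in articles:
        info = get_account(a["url"])
        key = info.get("wx_gh") or info.get("account_name")
        acc = accounts.setdefault(key, {**info, "article_count": 0,
                                        "sample_articles": [], "source_urls": []})
        acc["article_count"] += 1                      # 累加文章数
        if len(acc["sample_articles"]) < 5:
            acc["sample_articles"].append(a["title"])  # 最多 5 篇示例标题
        if a["url"] not in acc["source_urls"]:
            acc["source_urls"].append(a["url"])        # 去重来源 URL
        relations.append({"article_url": a["url"], "wx_gh": key})
    return list(accounts.values()), relations
```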
+
+#### Stage 7: OutputPersistStage
+
+> 文件:`src/pipeline/stages/output_persist.py:28`
+
+- 构建 `PipelineOutput`:
+  - `summary`:candidate_count, filtered_in_count, account_count
+  - `contents`:[{title, url, statistics, reason}, ...]
+  - `accounts`:[{wx_gh, account_name, article_count, sample_articles}, ...]
+  - `article_account_relations`:[{article_url, wx_gh}, ...]
+- 写入 `tests/output/{trace_id}/output.json`
+- 输出:`ctx.output` + `ctx.metadata["output_file"]`
+
+**→ Gate: OutputSchemaGate** (`src/pipeline/gates/output_schema.py:18`)
+
+校验:
+- `candidate_count ≥ filtered_in_count`
+- `account_count == len(accounts)`
+- `filtered_in_count == len(contents)`
+- 所有 relation 引用的 URL 和 wx_gh 均存在
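上述校验可草绘为(输入为 PipelineOutput 的 dict 形态,字段名取自上文;返回错误列表,空表示通过):

```python
def validate_output(output):
    """示意:输出结构一致性校验。"""
    errors = []
    s = output["summary"]
    if s["candidate_count"] < s["filtered_in_count"]:
        errors.append("candidate_count < filtered_in_count")
    if s["account_count"] != len(output["accounts"]):
        errors.append("account_count mismatch")
    if s["filtered_in_count"] != len(output["contents"]):
        errors.append("filtered_in_count mismatch")
    urls = {c["url"] for c in output["contents"]}
    ghs = {a["wx_gh"] for a in output["accounts"]}
    for rel in output["article_account_relations"]:
        if rel["article_url"] not in urls or rel["wx_gh"] not in ghs:
            errors.append(f"dangling relation: {rel}")  # 关联引用悬空
    return errors
```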
+
+---
+
+### 7. 结果回传
+
+Pipeline 执行完毕后,`PipelineContext` 沿调用链返回:
+
+```
+PipelineOrchestrator.run(ctx)
+  → run_content_finder_pipeline(ctx) 返回 ctx
+    → SearchAgentCore.run() 返回 ctx
+      → run_with_harness() 采集 RunSummary
+        → main() 打印摘要 + 设置退出码
+```
+
+`RunSummary` 采集的指标:
+
+| 字段 | 来源 |
+|------|------|
+| `trace_id` | harness 层生成,贯穿全程 |
+| `policy_source` | "db" / "default" / "default(fallback)" |
+| `candidate_count` | `len(ctx.candidate_articles)` |
+| `filtered_count` | `len(ctx.filtered_articles)` |
+| `account_count` | `len(ctx.accounts)` |
+| `elapsed_seconds` | `time.monotonic()` 计时 |
+| `stage_history` | 各 stage 的 name/status/attempt |
+| `output_file` | `ctx.metadata["output_file"]` |
+
+---
+
+### 8. 产出物
+
+| 产出 | 路径 | 内容 |
+|------|------|------|
+| 结构化输出 | `tests/output/{trace_id}/output.json` | 文章列表 + 账号列表 + 关联关系 |
+| Pipeline 事件流 | `tests/traces/{trace_id}/pipeline.jsonl` | 每个 stage/gate 的 JSONL 事件 |
+| DB 记录 | `search_agent` 库多张表 | 任务、阶段、候选、评分、账号(可选) |
+| 终端摘要 | stdout | RunSummary 结构化日志 |

+ 521 - 0
docs/search-agent-architecture.md

@@ -0,0 +1,521 @@
+# Search Agent 系统架构
+
+本文提供 LongArticleSearchAgent 的全局架构视图,涵盖从 CLI 入口到外部 API 调用的全部分层、数据流与关键组件。
+
+---
+
+## 1. 系统分层总览
+
+```mermaid
+flowchart TB
+    subgraph Entry["入口层"]
+        RSA["run_search_agent.py<br/>(Harness 模式)"]
+        RP["run_pipeline.py<br/>(简易 CLI)"]
+        APP["app.py<br/>(Web API / Quart)"]
+    end
+
+    subgraph Domain["领域层 src/domain"]
+        SAC["SearchAgentCore"]
+        SAP["SearchAgentPolicy"]
+        SAPR["SearchAgentPolicyRepository"]
+    end
+
+    subgraph Pipeline["编排层 src/pipeline"]
+        PO["PipelineOrchestrator"]
+        CTX["PipelineContext"]
+        STAGES["Stages × 7"]
+        GATES["QualityGates × 3"]
+        HOOKS["Hooks × 4"]
+    end
+
+    subgraph AgentCore["Agent 内核 agent/"]
+        AR["AgentRunner"]
+        LLM["OpenRouter LLM"]
+        TR["Trace / Message"]
+        TOOLS["Tool Registry"]
+        SK["Skill System"]
+    end
+
+    subgraph Adapters["适配器层"]
+        WA["WeixinToolAdapter"]
+        KS["KnowledgeSource"]
+    end
+
+    subgraph Infra["基础设施层 src/infra"]
+        HTTP["AsyncHttpClient"]
+        MYSQL["AsyncMySQLPool"]
+        LOG["Logging / Trace"]
+    end
+
+    subgraph External["外部服务"]
+        WXAPI["微信搜索 API<br/>crawler-cn.aiddit.com"]
+        ORAPI["OpenRouter API"]
+        DB[("MySQL<br/>search_agent / supply_*")]
+    end
+
+    RSA --> SAC
+    RP --> PO
+    APP --> SAC
+    SAC --> SAPR
+    SAPR --> MYSQL
+    SAC --> PO
+    PO --> CTX
+    PO --> STAGES
+    PO --> GATES
+    PO --> HOOKS
+    STAGES -->|"DemandAnalysis<br/>CoarseFilter<br/>QualityFilter"| AR
+    AR --> LLM
+    AR --> TR
+    AR --> TOOLS
+    AR --> SK
+    STAGES --> WA
+    STAGES --> KS
+    WA --> HTTP
+    HTTP --> WXAPI
+    LLM --> ORAPI
+    HOOKS -->|"DatabasePersistHook"| MYSQL
+    HOOKS -->|"PipelineTraceHook"| LOG
+    MYSQL --> DB
+```
+
+---
+
+## 2. Pipeline 阶段与门禁流程
+
+采用 **粗筛 + 精排** 两阶段过滤架构:先用标题做 LLM 语义粗筛,再对通过的文章拉取正文做 LLM 精排。
+
+```mermaid
+flowchart LR
+    S1["1. demand_analysis<br/>🔍 需求理解与特征分层"]
+    S2["2. content_search<br/>📡 按策略搜索候选文章"]
+    G1{"SearchCompleteness<br/>Gate 🚦"}
+    S3["3. hard_filter<br/>🧹 去重与基础规则过滤"]
+    S4["4. coarse_filter<br/>🏷️ LLM 标题语义粗筛"]
+    S5["5. quality_filter<br/>⭐ 正文精排(数据 + LLM)"]
+    G2{"FilterSufficiency<br/>Gate 🚦"}
+    S6["6. account_precipitate<br/>👤 基于文章聚合公众号"]
+    S7["7. output_persist<br/>💾 生成标准输出并落盘"]
+    G3{"OutputSchema<br/>Gate 🚦"}
+
+    S1 --> S2 --> G1 --> S3 --> S4 --> S5 --> G2 --> S6 --> S7 --> G3
+    G2 -->|"fallback: 候选不足"| S2
+    G1 -->|"abort: 严重不足"| ABORT["Pipeline 终止"]
+    G3 -->|"abort: Schema 校验失败"| ABORT
+```
+
+### 粗筛 → 精排 设计原理
+
+| 维度 | 粗筛 (CoarseFilter) | 精排 (QualityFilter) |
+|------|---------------------|---------------------|
+| 输入 | 文章标题 + 来源关键词 | 文章正文(detail API 拉取) |
+| 方法 | LLM 批量语义判断(每批 ~20 篇) | 数据指标评分 + LLM 逐篇复评 |
+| 目标 | 快速淘汰明显不相关的文章 | 精确判定相关性 + 兴趣度 |
+| 原则 | **宽进**:不确定就放行 | **准确**:LLM 基于正文做最终判定 |
+| 性能优势 | 减少后续 detail API 调用量 | — |
+
+---
+
+## 3. 数据流与 Context 变迁
+
+```mermaid
+flowchart TD
+    subgraph Input["输入"]
+        Q["query + demand_id + target_count"]
+    end
+
+    subgraph S1["demand_analysis"]
+        DA["DemandAnalysisResult<br/>─ 实质/形式/上层/下层特征<br/>─ 精准词 + 主题词<br/>─ 筛选关注点"]
+    end
+
+    subgraph S2["content_search"]
+        CA["CandidateArticle[]<br/>─ title, url, publish_time<br/>─ view/like/share count<br/>─ source_keyword"]
+    end
+
+    subgraph S3["hard_filter"]
+        CA2["CandidateArticle[]<br/>(去重 + 规则过滤后)"]
+    end
+
+    subgraph S4["coarse_filter"]
+        CA3["CandidateArticle[]<br/>(LLM 标题粗筛后)<br/>+ metadata._coarse_filter_log"]
+    end
+
+    subgraph S5["quality_filter"]
+        FA["FilteredArticle[]<br/>─ relevance/interest level<br/>─ body_text, reason<br/>─ phase(heuristic/LLM)"]
+    end
+
+    subgraph S6["account_precipitate"]
+        ACC["AccountInfo[]<br/>─ account_name, wx_gh<br/>─ article_count<br/>─ sample_articles"]
+        REL["article_account_relations"]
+    end
+
+    subgraph S7["output_persist"]
+        OUT["PipelineOutput → JSON 文件<br/>tests/output/{trace_id}/output.json"]
+    end
+
+    Q --> S1
+    S1 -->|"ctx.demand_analysis"| S2
+    S2 -->|"ctx.candidate_articles"| S3
+    S3 -->|"ctx.candidate_articles"| S4
+    S4 -->|"ctx.candidate_articles (筛减后)"| S5
+    S5 -->|"ctx.filtered_articles"| S6
+    S6 -->|"ctx.accounts<br/>ctx.article_account_relations"| S7
+```
+
+---
+
+## 4. 核心组件详解
+
+### 4.1 入口层
+
+| 入口 | 文件 | 说明 |
+|------|------|------|
+| Harness CLI | `run_search_agent.py` | 带预算控制(`AgentBudget`)、观测(`Observer`)、回退策略的生产入口 |
+| 简易 CLI | `run_pipeline.py` | 无 DB 策略,硬编码 query,开发调试用 |
+| Web API | `app.py` | Quart 服务,REST 端点调用 `SearchAgentCore` |
+
+### 4.2 领域层 (`src/domain/search/`)
+
+```mermaid
+classDiagram
+    class SearchAgentCore {
+        +run(query, demand_id, ...) PipelineContext
+        +load_policy(demand_id) SearchAgentPolicy
+        -_build_pipeline(ctx) PipelineOrchestrator
+    }
+    class SearchAgentPolicy {
+        +max_keywords: int
+        +keyword_priority: str
+        +min_candidate_multiplier: float
+        +enable_llm_review: bool
+        +extra_keywords: list
+    }
+    class SearchAgentPolicyRepository {
+        +load_policy(demand_id) SearchAgentPolicy
+    }
+
+    SearchAgentCore --> SearchAgentPolicyRepository : 加载策略
+    SearchAgentCore --> SearchAgentPolicy : 应用到 ctx
+    SearchAgentPolicyRepository --> SearchAgentPolicy : 返回
+```
+
+### 4.3 编排层 (`src/pipeline/`)
+
+```mermaid
+classDiagram
+    class PipelineOrchestrator {
+        +run(ctx) PipelineContext
+        -_run_stage(stage, ctx)
+        -_resolve_gate(gate, result, ctx)
+    }
+    class PipelineContext {
+        +task_id: str
+        +trace_id: str
+        +query: str
+        +demand_analysis: DemandAnalysisResult
+        +candidate_articles: list
+        +filtered_articles: list
+        +accounts: list
+        +stage_history: list
+        +metadata: dict
+    }
+    class Stage {
+        <<abstract>>
+        +execute(ctx)*
+        +validate_input(ctx)*
+        +on_retry(ctx, attempt)
+    }
+    class QualityGate {
+        <<abstract>>
+        +check(ctx)* GateResult
+    }
+    class PipelineHook {
+        <<abstract>>
+        +on_pipeline_start(ctx)
+        +on_stage_start(name, ctx)
+        +on_stage_complete(name, ctx)
+        +on_gate_check(name, result, ctx)
+        +on_error(name, error, ctx)
+        +on_pipeline_complete(ctx)
+    }
+
+    PipelineOrchestrator --> PipelineContext
+    PipelineOrchestrator --> Stage
+    PipelineOrchestrator --> QualityGate
+    PipelineOrchestrator --> PipelineHook
+```
+
+### 4.4 七阶段实现
+
+| # | Stage | 类型 | 关键依赖 | 写入 `ctx` |
+|---|-------|------|----------|------------|
+| 1 | `DemandAnalysisStage` | LLM (AgentRunner) | `StageAgentExecutor` | `demand_analysis` |
+| 2 | `ContentSearchStage` | API 调用 | `WeixinToolAdapter.search` | `candidate_articles` |
+| 3 | `HardFilterStage` | 纯代码 | — | `candidate_articles` (清洗后) |
+| 4 | `CoarseFilterStage` | LLM 批量判断 | `StageAgentExecutor.run_simple_llm_json` | `candidate_articles` (粗筛后), `metadata["_coarse_filter_log"]` |
+| 5 | `QualityFilterStage` | API + LLM | `WeixinToolAdapter.get_article_detail` | `filtered_articles` |
+| 6 | `AccountPrecipitateStage` | API 调用 | `WeixinToolAdapter.get_account` | `accounts`, `article_account_relations` |
+| 7 | `OutputPersistStage` | 纯代码 | — | `output`, `metadata["output_file"]` |
+
+#### Stage 4: CoarseFilterStage(粗筛)
+
+- 将候选文章标题列表 + query + demand_analysis 特征打包成 LLM prompt
+- 每批 ~20 篇,批量调用 `run_simple_llm_json()` 做语义相关性判断
+- 只判断 `pass`(通过)或 `reject`(淘汰),附带简短理由
+- 宽松原则:不确定时倾向 pass,LLM 调用失败时全部放行
+- 粗筛日志写入 `ctx.metadata["_coarse_filter_log"]`
+
+#### Stage 5: QualityFilterStage(精排)
+
+- 对粗筛通过的文章调用 detail API 获取正文
+- **interest**(兴趣度)纯数据驱动:正文长度、阅读量、互动率
+- **relevance**(相关性)由 LLM 语义分析决定(关键词匹配结果仅作参考信息传给 LLM)
+- 所有非 spam 文章都进入 LLM 复评,不做启发式提前淘汰
+- 按 relevance/interest/阅读量/发布时间排序后截断到目标数量
+
+### 4.5 三道门禁
+
+| Gate | 挂载阶段后 | 通过条件 | 失败动作 |
+|------|-----------|---------|---------|
+| `SearchCompletenessGate` | `content_search` | 候选数 ≥ target × multiplier | `abort` |
+| `FilterSufficiencyGate` | `quality_filter` | 入选数 ≥ target | `fallback` → `content_search` |
+| `OutputSchemaGate` | `output_persist` | JSON Schema 合法 | `abort` |
+
+### 4.6 四种 Hook
+
+| Hook | 路径 | 职责 |
+|------|------|------|
+| `PipelineTraceHook` | `hooks/pipeline_trace_hook.py` | JSONL 事件流落盘(含 `decisions` 决策数据),供 HTML 可视化 |
+| `LiveProgressHook` | `hooks/live_progress_hook.py` | 终端实时进度打印 |
+| `DatabasePersistHook` | `hooks/db_hook.py` | 写入 MySQL `supply_*` 表(可选) |
+| `TraceHook` | `hooks/trace_hook.py` | 结构化日志 |
+
+---
+
+## 5. Agent 内核集成
+
+Pipeline 中需要 LLM 能力的阶段通过 `StageAgentExecutor` 桥接到 `agent/` 框架:
+
+```mermaid
+sequenceDiagram
+    participant Stage as Pipeline Stage
+    participant SAE as StageAgentExecutor
+    participant AR as AgentRunner
+    participant LLM as OpenRouter API
+    participant TS as FileSystemTraceStore
+
+    Stage->>SAE: run_json_stage / run_simple_llm_json
+    SAE->>AR: run(system_prompt, tools, config)
+    loop Agent Loop (仅 run_json_stage)
+        AR->>LLM: chat completion
+        LLM-->>AR: response (text / tool_call)
+        AR->>AR: execute tool if needed
+    end
+    AR->>TS: persist trace
+    AR-->>SAE: final assistant message
+    SAE->>SAE: extract JSON from response
+    SAE-->>Stage: parsed result
+```
+
+涉及 LLM 的阶段:
+- **`DemandAnalysisStage`**:将 query 分解为特征分层 + 搜索策略(`run_json_stage`,带工具调用)
+- **`CoarseFilterStage`**:批量标题语义相关性判断(`run_simple_llm_json`,单轮无工具)
+- **`QualityFilterStage`**(`enable_llm_review=True` 时):基于正文的 LLM 精排复评(`run_simple_llm_json`)
+
+---
+
+## 6. 适配器与外部 API
+
+```mermaid
+flowchart LR
+    subgraph Adapter["WeixinToolAdapter"]
+        A1["search(keyword)"]
+        A2["get_article_detail(url)"]
+        A3["get_account(url)"]
+    end
+
+    subgraph Tools["tests/tools/weixin_tools.py"]
+        T1["weixin_search()"]
+        T2["fetch_article_detail()"]
+        T3["fetch_weixin_account()"]
+    end
+
+    subgraph API["crawler-cn.aiddit.com"]
+        E1["/crawler/wei_xin/keyword"]
+        E2["/crawler/wei_xin/detail"]
+        E3["/crawler/wei_xin/account"]
+    end
+
+    A1 --> T1 --> E1
+    A2 --> T2 --> E2
+    A3 --> T3 --> E3
+```
+
+---
+
+## 7. 可观测性与追踪
+
+```mermaid
+flowchart LR
+    PO["PipelineOrchestrator"]
+
+    subgraph Hooks["Hook 回调"]
+        H1["PipelineTraceHook"]
+        H2["LiveProgressHook"]
+        H3["DatabasePersistHook"]
+    end
+
+    subgraph Output["产物"]
+        JSONL["tests/traces/{trace_id}/pipeline.jsonl"]
+        HTML["pipeline_trace.html"]
+        TERM["终端 stdout"]
+        DBOUT[("MySQL supply_* 表")]
+    end
+
+    PO --> H1 --> JSONL
+    JSONL -->|"pipeline_visualize.py"| HTML
+    PO --> H2 --> TERM
+    PO --> H3 --> DBOUT
+```
+
+**JSONL 事件类型**:`init` → `stage_start` → `stage_complete`(含 `decisions` 决策详情)→ `gate_check` → `error` → `complete`
+
+**HTML 可视化**:`pipeline_visualize.py` 读取 JSONL 生成深色主题 Timeline 页面,每个阶段的决策数据渲染为可折叠的 `<details>` 卡片。可视化内容包括:
+- **粗筛阶段**:标题级别的 pass/reject 表格,含判断理由
+- **精排阶段**:相关性和兴趣度分列显示,含 LLM 评审理由
+
+---
+
+## 8. 配置体系
+
+```mermaid
+flowchart TD
+    ENV[".env 环境变量"]
+
+    subgraph Configs["配置类"]
+        MC["LongArticlesSearchAgentConfig<br/>(Pydantic BaseSettings)"]
+        RPC["RuntimePipelineConfig"]
+        AB["AgentBudget"]
+        SAMC["SearchAgentMySQLConfig"]
+    end
+
+    ENV --> MC
+    ENV --> RPC
+    ENV --> AB
+    MC --> SAMC
+
+    MC -->|"search_agent_db"| SAMC
+    RPC -->|"model / temperature<br/>target_count / max_iterations"| PO["PipelineOrchestrator"]
+    AB -->|"timeout / max_fallback_rounds"| RSA["run_search_agent.py"]
+    SAMC -->|"host/port/db"| MYSQL["AsyncMySQLPool"]
+```
+
+| 变量 | 默认值 | 说明 |
+|------|-------|------|
+| `OPEN_ROUTER_API_KEY` | 必填 | OpenRouter 调用密钥 |
+| `MODEL` | `anthropic/claude-sonnet-4-5` | LLM 模型 |
+| `PIPELINE_TARGET_COUNT` | `10` | 目标文章数 |
+| `PIPELINE_TEMPERATURE` | `0.2` | LLM 温度 |
+| `PIPELINE_MAX_ITERATIONS` | `12` | 单阶段 Agent 最大轮次 |
+| `SEARCH_AGENT_DB_HOST` / `PORT` / `DB` | 可选 | MySQL 连接(策略读取 + 结果持久化) |
+
+---
+
+## 9. 数据库表结构
+
+```mermaid
+erDiagram
+    search_agent_strategy {
+        int id PK
+        varchar demand_id
+        varchar strategy_code
+        json config_json
+        int version
+        boolean enabled
+    }
+
+    supply_task {
+        varchar task_id PK
+        varchar query
+        varchar demand_id
+        varchar status
+        timestamp created_at
+    }
+
+    supply_candidate_content {
+        int id PK
+        varchar task_id FK
+        varchar title
+        varchar url
+        varchar source_keyword
+    }
+
+    supply_content_score {
+        int id PK
+        int candidate_id FK
+        varchar relevance_level
+        varchar interest_level
+        text reason
+    }
+
+    supply_account_profile {
+        int id PK
+        varchar task_id FK
+        varchar account_name
+        varchar wx_gh
+        int article_count
+    }
+
+    search_agent_strategy ||--o{ supply_task : "策略驱动"
+    supply_task ||--o{ supply_candidate_content : "候选"
+    supply_candidate_content ||--o| supply_content_score : "评分"
+    supply_task ||--o{ supply_account_profile : "账号"
+```
+
+---
+
+## 10. 关键文件索引
+
+| 分层 | 文件 | 说明 |
+|------|------|------|
+| **入口** | `run_search_agent.py` | Harness 生产入口 |
+| | `run_pipeline.py` | 简易 CLI |
+| | `app.py` | Web API |
+| **领域** | `src/domain/search/core.py` | `SearchAgentCore` |
+| | `src/domain/search/policy.py` | `SearchAgentPolicy` 策略模型 |
+| | `src/domain/search/repository.py` | 策略 DB 加载 |
+| **编排** | `src/pipeline/orchestrator.py` | `PipelineOrchestrator` |
+| | `src/pipeline/base.py` | Stage / Gate / Hook 抽象基类 |
+| | `src/pipeline/context.py` | `PipelineContext` 与数据结构 |
+| | `src/pipeline/runner.py` | Pipeline 构建器 |
+| | `src/pipeline/config/pipeline_config.py` | `RuntimePipelineConfig` |
+| **阶段** | `src/pipeline/stages/demand_analysis.py` | 需求分析 |
+| | `src/pipeline/stages/content_search.py` | 内容召回 |
+| | `src/pipeline/stages/content_filter.py` | 硬规则 + 质量精排 |
+| | `src/pipeline/stages/coarse_filter.py` | LLM 标题粗筛 |
+| | `src/pipeline/stages/account_precipitate.py` | 账号沉淀 |
+| | `src/pipeline/stages/output_persist.py` | 输出落盘 |
+| | `src/pipeline/stages/common.py` | `StageAgentExecutor` (LLM 桥接) |
+| **门禁** | `src/pipeline/gates/search_completeness.py` | 召回充分性 |
+| | `src/pipeline/gates/filter_sufficiency.py` | 筛选充分性 |
+| | `src/pipeline/gates/output_schema.py` | 输出 Schema 校验 |
+| **Hook** | `src/pipeline/hooks/pipeline_trace_hook.py` | JSONL 追踪 + 决策落盘 |
+| | `src/pipeline/hooks/live_progress_hook.py` | 终端实时进度 |
+| | `src/pipeline/hooks/db_hook.py` | MySQL 持久化 |
+| **适配器** | `src/pipeline/adapters/weixin.py` | 微信平台适配器 |
+| | `src/pipeline/adapters/knowledge/` | 知识源扩展点 |
+| **工具** | `tests/tools/weixin_tools.py` | 微信 API 封装 |
+| **Skills** | `tests/skills/demand_analysis.md` | 需求分析 LLM 提示词 |
+| | `tests/skills/article_finding_strategy.md` | 内容搜索策略 |
+| | `tests/skills/article_filter_strategy.md` | 筛选评审策略 |
+| | `tests/skills/account_precipitation.md` | 账号沉淀策略 |
+| **可视化** | `pipeline_visualize.py` | JSONL → HTML 渲染 |
+| **Agent 内核** | `agent/core/runner.py` | `AgentRunner` |
+| | `agent/llm/openrouter.py` | LLM 调用 |
+| | `agent/trace/models.py` | Trace / Message |
+| | `agent/tools/registry.py` | 工具注册 |
+| **配置** | `src/config/agent_config.py` | 主配置类 |
+| | `.env` | 环境变量 |
+
+---
+
+*文档版本:与当前仓库 `src/pipeline` + `src/domain/search` 实现同步(7 阶段,含 CoarseFilterStage 粗筛)。*

+ 203 - 0
docs/search-agent-harness-design.md

@@ -0,0 +1,203 @@
+# Search Agent Harness 设计文档
+
+本文描述对 `run_search_agent.py` 及 Pipeline 可观测性的 **Harness Engineering** 改造:在不修改 `agent/` 内核模块的前提下,为 Search Agent 引入**约束、规划、观测、容错**四层 Harness,并实现全生命周期的 `trace_id` 追踪与可视化。
+
+---
+
+## 1. 背景与动机
+
+### 1.1 原始系统的问题
+
+在初期实现中,系统的启动入口仅仅是一个 5 行的薄壳代码,同时缺乏生命周期追踪,导致以下工程问题:
+
+| 问题 | 影响 | 解决思路 |
+|------|------|----------|
+| 缺乏整体超时与资源消耗上限控制 | Agent 可能无限运行,资源被耗尽 | 引入 **Budget Harness** 兜底 |
+| 执行过程像"黑盒",不清楚即将运行哪些阶段 | 难以审计与预判行为 | 引入 **Planner Harness** 打印计划 |
+| 运行结果只打散在零碎的日志中 | CI/CD 与调度系统无法判断成功与否 | 引入 **Observer Harness** 结构化总结 |
+| 缺少前置 API/DB 校验,深层阶段才报错 | 调试成本高,报错信息不明确 | 引入 **Fallback Harness** 快速失败 |
+| `trace_id` 生命周期不完整,各阶段割裂 | 日志分散,难以串联一整次执行记录 | 在 Context 构造阶段**前置生成 trace_id** |
+| 长周期运行缺乏直观的呈现方式 | 仅靠终端日志复盘困难 | **事件流 JSONL 追踪 + HTML 可视化** |
+
+### 1.2 核心架构约束
+
+**不可变内核原则**:`agent/` 模块为外部协作模块,严格保持**不可变(Immutable)**。所有流程控制、观测埋点和策略注入均通过 Pipeline 层的 Hook 机制和外部的 Harness 层实现。
+
+---
+
+## 2. 系统架构
+
+整个系统现在分为三个清晰的层级:
+1. **Engine(引擎层 - `agent/`)**:负责具体的 LLM 交互与工具调用(不可变)。
+2. **Assembly Line(装配线层 - `src/pipeline/`)**:负责将多个 Agent 阶段串联,执行门禁校验,生成事件流。
+3. **Harness(控制台层 - `run_search_agent.py`)**:负责包裹整个装配线,提供运行时的宏观控制与观测。
+
+### 架构图示
+
+```mermaid
+graph TD
+    subgraph Harness [Harness Layer - run_search_agent.py]
+        F[Fallback: 检查 API/DB 等前置依赖]
+        B[Budget: 设定超时/目标上限/重试约束]
+        P[Planner: 打印执行计划]
+        O[Observer: 收集并打印结构化摘要]
+    end
+
+    subgraph Pipeline [Pipeline Layer - Orchestrator]
+        Ctx[PipelineContext + trace_id前置生成]
+        T_Hook[PipelineTraceHook]
+        S1[DemandAnalysis] --> S2[ContentSearch] --> S3[...]
+    end
+
+    subgraph Visualization [Tracing & Vis]
+        JSONL[pipeline.jsonl 流式写入]
+        HTML[pipeline_trace.html 渲染展示]
+    end
+
+    F --> P --> B --> Ctx
+    Ctx --> S1
+    Pipeline -->|Hook 监听事件流| T_Hook
+    T_Hook --> JSONL
+    JSONL -->|pipeline_visualize.py| HTML
+    Pipeline -->|执行结果| O
+```
+
+---
+
+## 3. 四层 Harness 详解
+
+### 3.1 Fallback Harness (前置防线)
+**职责**:在触发任何消耗资源的 Agent 之前,前置验证环境与依赖,实现“快速失败”。
+**实现**:`validate_prerequisites()`
+- 检查 `OPEN_ROUTER_API_KEY` 是否配置。
+- 尝试预加载数据库策略,验证数据库连通性(但不强制阻断,失败可降级至默认策略)。
+
+### 3.2 Planner Harness (运行规划)
+**职责**:在实际执行前,向终端输出明确的执行计划,消除“盲盒”感。
+**实现**:`print_run_plan()`
+- 明确输出:任务目标(查询词)、预算约束(超时、目标条数)、使用的模型。
+
+### 3.3 Budget Harness (预算约束)
+**职责**:限制 Agent 系统可能发生的资源滥用(时间、Token、API 调用)。
+**实现**:`AgentBudget` 数据类
+- `timeout_seconds`:全局执行超时(由 `asyncio.wait_for` 兜底进行硬中断)。
+- `max_target_count`:召回与入选的数量上限控制(与 `PIPELINE_TARGET_COUNT` 取小值防御扩张)。
+- `max_fallback_rounds`:门禁失败后的最大回退重试次数。
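`AgentBudget` 的解析与边界约束可草绘如下(环境变量名与合法范围取自第 6.3 节配置表;`from_env` 的签名与 `_clamp` 辅助函数为示意性假设):

```python
import os
from dataclasses import dataclass

def _clamp(value, lo, hi):
    return max(lo, min(hi, value))

@dataclass
class AgentBudget:
    timeout_seconds: int = 300
    max_target_count: int = 30
    max_fallback_rounds: int = 1

    @classmethod
    def from_env(cls, env=None):
        env = env if env is not None else os.environ
        return cls(
            timeout_seconds=max(30, int(env.get("PIPELINE_TIMEOUT", 300))),
            max_target_count=_clamp(int(env.get("PIPELINE_MAX_TARGET_COUNT", 30)), 1, 200),
            max_fallback_rounds=_clamp(int(env.get("PIPELINE_MAX_FALLBACK_ROUNDS", 1)), 0, 5),
        )
```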
+
+### 3.4 Observer Harness (观测与总结)
+**职责**:无论执行成功还是失败,提供结构化的最终执行报告。
+**实现**:`RunSummary` 数据类
+- 包含:最终状态、统一获取自 Context 的 `trace_id`、候选文章数、最终入选文章数、各个阶段的耗时与重试记录。
+- 特性:失败时抛出 `SystemExit(1)` 供外部调度系统感知。
+
+---
+
+## 4. 全局 Trace 与可视化体系
+
+为了彻底解决长周期运行的观测问题,我们实现了一套基于事件流的追踪与可视化体系,风格与 `visualize_log.py` 保持一致。
+
+### 4.1 `trace_id` 的生命周期
+1. **前置生成**:在 `src/pipeline/runner.py` 或 `SearchAgentCore` 构造 `PipelineContext` 的第一时间,就生成统一的 `trace_id = str(uuid4())`。
+2. **全程贯穿**:这个 `trace_id` 作为唯一标识被注入 Context,贯穿整个 Pipeline 的所有阶段、门禁和 Hook。
+3. **隔离存储**:所有该次运行的产物与日志均归档至 `tests/traces/<trace_id>/` 目录下。
+
+### 4.2 事件流追踪 (PipelineTraceHook)
+不修改核心 Agent 逻辑,通过 Pipeline 的 Hook 机制监听生命周期:
+- `on_pipeline_start` -> 写入 `init` 事件
+- `on_stage_start` -> 写入 `stage_start` 事件
+- `on_stage_complete` -> 写入 `stage_complete` 事件(含阶段耗时、产出统计与 **decisions 决策数据**)
+- `on_gate_check` -> 写入 `gate_check` 事件(含门禁通过状态、拦截原因、回退动作)
+- `on_error` / `on_pipeline_complete` -> 写入异常与结束事件
+
+**decisions 字段**:`stage_complete` 事件中新增的 `decisions` 字段,将各阶段的决策信息(原来仅在 `LiveProgressHook` 打印到 stdout 后丢失)持久化到 JSONL,包括:
+- 需求分析的特征分层与搜索策略
+- 搜索词逐词命中统计
+- 质量筛选的逐篇审核记录(accept/reject/skip + 原因 + 审核阶段)
+- 账号聚合详情与输出路径
+
+**产物**:流式追加写入 `tests/traces/<trace_id>/pipeline.jsonl`,安全、解耦且可追溯。
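Hook 写 JSONL 的核心只是按 trace_id 追加一行 JSON,可草绘为(类名与路径组织为示意,真实实现见 `src/pipeline/hooks/pipeline_trace_hook.py`):

```python
import json
import time
from pathlib import Path

class JsonlTraceWriter:
    """示意:按 trace_id 流式追加 JSONL 事件。"""

    def __init__(self, root: str, trace_id: str):
        self.path = Path(root) / trace_id / "pipeline.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def emit(self, event_type: str, **payload):
        record = {"ts": time.time(), "event": event_type, **payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

追加写(append-only)保证中途崩溃时已写入的事件仍可用于复盘。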
+
+### 4.3 HTML 可视化 (pipeline_visualize.py)
+独立的 CLI 工具,读取 JSONL 事件流并渲染为美观的 HTML 报告。
+- **特性**:深色主题、Timeline 时间轴展示、错误详情折叠、各阶段核心指标(耗时/召回/入选/沉淀账号)卡片。
+- **决策详情卡片**:每个阶段完成事件下方展示可折叠的 `<details>` 决策详情,包含:
+  - **需求理解**:特征分层标签 + 搜索策略表 + 筛选关注点列表
+  - **内容召回**:搜索词命中表格(关键词 | 返回数 | 新增数)
+  - **硬规则过滤**:过滤后剩余数量
+  - **质量筛选**:审核统计概览 + 入选/淘汰文章表(标题、评分、审核阶段、原因)
+  - **账号沉淀**:账号列表 + 文章数 + 示例标题
+  - **结果输出**:输出文件路径
+- **无侵入性**:只依赖生成的 JSONL 文件,独立运行,方便结果分享与复盘。
+
+---
+
+## 5. 使用指南
+
+### 5.1 运行 Agent
+```bash
+# 基本运行(受 Harness 保护)
+python run_search_agent.py "中老年人健康饮食"
+
+# 指定数量与 Demand ID
+python run_search_agent.py "高血压食谱" --demand-id "d_1001" --target-count 20
+```
+
+### 5.2 追踪可视化
+Pipeline 运行结束后(或运行中途出错),可以生成可视化报告:
+```bash
+# 自动读取最新的 trace 并生成 HTML
+python pipeline_visualize.py
+
+# 列出所有历史 trace
+python pipeline_visualize.py --list
+
+# 为指定的 trace_id 生成 HTML
+python pipeline_visualize.py <trace_id>
+```
+生成的 HTML 报告路径为:`tests/traces/<trace_id>/pipeline_trace.html`,可通过浏览器直接打开。
+
+---
+
+## 6. 配置全览
+
+### 6.1 必选环境变量
+| 变量 | 说明 |
+|------|------|
+| `OPEN_ROUTER_API_KEY` | OpenRouter API 密钥,缺失时立即 `EnvironmentError` |
+
+### 6.2 运行参数(原有)
+| 变量 | 默认值 | 说明 |
+|------|--------|------|
+| `PIPELINE_QUERY` | 默认演示查询词 | 搜索 query |
+| `PIPELINE_DEMAND_ID` | `1` | 需求 ID(用于读取 DB 策略) |
+| `MODEL` | `anthropic/claude-sonnet-4-5` | LLM 模型 |
+| `PIPELINE_TEMPERATURE` | `0.2` | 阶段内 LLM 温度 |
+| `PIPELINE_MAX_ITERATIONS` | `12` | 单阶段 Agent 最大迭代轮次 |
+| `PIPELINE_TARGET_COUNT` | `10` | 目标文章数(受 Budget Harness 约束) |
+
+### 6.3 Budget Harness 新增变量
+| 变量 | 默认值 | 合法范围 | 说明 |
+|------|--------|---------|------|
+| `PIPELINE_TIMEOUT` | `300` | ≥ 30 | 整体超时(秒) |
+| `PIPELINE_MAX_TARGET_COUNT` | `30` | [1, 200] | 单次最多产出文章数上限兜底 |
+| `PIPELINE_MAX_FALLBACK_ROUNDS` | `1` | [0, 5] | 补召回最大轮次 |
+
+---
+
+## 7. 关键文件索引
+
+| 用途 | 路径 |
+|------|------|
+| Harness 入口 | `run_search_agent.py` |
+| Search Agent 核心服务 | `src/domain/search/core.py` |
+| Pipeline 装配器 | `src/pipeline/runner.py` |
+| 事件流监听 Hook | `src/pipeline/hooks/pipeline_trace_hook.py` |
+| HTML 渲染工具 | `pipeline_visualize.py` |
+| 可视化产物目录 | `tests/traces/<trace_id>/` |
+
+---
+
+## 8. 已知限制与后续路线
+
+1. **`max_fallback_rounds` 目前未在 Pipeline 内部传递**:当前 `FilterSufficiencyGate` 的 fallback 回搜轮次由 `PipelineOrchestrator` 的逻辑控制,约束尚未通过 Context 注入深层,待后续通过 `ctx.metadata` 传递。
+2. **Observer 指标未接入外部监控**:`RunSummary` 的字段目前只用于日志输出,后续可在 `run_with_harness` 返回后将关键指标(如通过率、阶段耗时等)上报到 Prometheus / 云监控。

+ 35 - 0
docs/search_agent_strategy.sql

@@ -0,0 +1,35 @@
+-- search_agent 库:内容搜索策略配置表
+-- 与 LongArticlesSearchAgentConfig.search_agent_db(SEARCH_AGENT_DB_*)指向同一库
+
+SET NAMES utf8mb4;
+
+CREATE TABLE IF NOT EXISTS search_agent_strategy (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '主键',
+  demand_id BIGINT UNSIGNED DEFAULT NULL COMMENT '绑定上游需求 ID,NULL 表示仅按 strategy_code 使用',
+  strategy_code VARCHAR(64) NOT NULL DEFAULT 'default' COMMENT '策略编码,如 default / experiment_a',
+  name VARCHAR(128) DEFAULT NULL COMMENT '策略名称',
+  config_json JSON NOT NULL COMMENT '运行时参数,见下方示例',
+  enabled TINYINT NOT NULL DEFAULT 1 COMMENT '1 启用 0 停用',
+  version INT NOT NULL DEFAULT 1 COMMENT '同 demand 多版本时取最大 version',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+  PRIMARY KEY (id),
+  KEY idx_strategy_code (strategy_code, enabled),
+  KEY idx_demand_enabled (demand_id, enabled),
+  KEY idx_version (version)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Search Agent 搜索与筛选策略';
+
+-- config_json 示例(字段均可选,未填则用代码默认)
+-- {
+--   "max_keywords": 8,
+--   "initial_cursor": "1",
+--   "keyword_priority": "demand_first",
+--   "extra_keywords": ["扩展词1"],
+--   "min_candidate_multiplier": 2.0,
+--   "near_enough_candidate_multiplier": 1.2,
+--   "filter_near_ratio": 0.8,
+--   "max_detail_fetch": 30,
+--   "enable_llm_review": true
+-- }
+--
+-- 加载顺序:先按 demand_id 命中;若无则取 strategy_code='default' 的最新 version。
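上述加载顺序可用如下 Python 草图模拟(`rows` 模拟查询到的表数据,仅示意选择逻辑;真实的 SQL 查询实现见 `src/domain/search/repository.py`):

```python
def pick_strategy(rows, demand_id):
    """示意:先按 demand_id 命中启用行并取最大 version;
    无命中时回落到 strategy_code='default' 的最新 version。"""
    hits = [r for r in rows if r.get("demand_id") == demand_id and r["enabled"]]
    if hits:
        return max(hits, key=lambda r: r["version"])
    defaults = [r for r in rows if r["strategy_code"] == "default" and r["enabled"]]
    return max(defaults, key=lambda r: r["version"]) if defaults else None
```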

+ 403 - 0
docs/supply-agent-solution.md

@@ -0,0 +1,403 @@
+# 供给 Agent 技术方案(微信内容供给 → AIGC 抓取计划)
+
+本文档用于落地一个 **供给 agent**:从上游获取需求(`demand_id + query`)后,在微信平台搜索文章,按策略筛选与打分,最终在 AIGC 平台创建对应的抓取计划;全链路可追踪、可重跑、可审计。
+
+---
+
+## 背景与目标
+
+- **输入**:上游需求(`demand_id`、`query`、可选约束/画像/期望条数等)
+- **处理**:微信文章搜索 → 候选池扩展 → 账号/内容补全 → 过滤/打分/去重/多样性 → 结果集
+- **输出**:AIGC 抓取计划(`crawler_plan_id`)+ 可追踪 `task_id/trace_id` + 最终入选内容列表
+
+---
+
+## 总体技术架构
+
+### 模块划分
+
+- **Orchestrator(编排层)**
+  - 接收上游需求
+  - 创建/恢复任务状态
+  - 驱动 agentic 状态机执行与重试
+- **Search Agent(决策层)**
+  - 只做策略与决策:多轮召回、筛选、打分、去重、多样性控制
+  - 通过工具完成数据获取
+- **Tool Gateway(执行层)**
+  - 统一封装 4 个工具接口
+  - 统一处理:超时、重试、限流、熔断、请求追踪
+- **Plan Service(AIGC 对接层)**
+  - 将入选内容映射成 AIGC 平台创建抓取计划的 payload
+  - 创建计划、绑定生成计划(可选)、记录请求/响应
+- **Persistence(存储层)**
+  - 任务/候选/特征/评分/计划/事件日志持久化
+- **Ops(运维层)**
+  - 监控、告警、重试队列、人工复核入口
+
+---
+
+## Agentic 流程(建议状态机)
+
+### 状态/阶段定义
+
+- **任务状态(`supply_task.status`)**:`PENDING/RUNNING/RECALL_DONE/ENRICHED/RANKED/PLAN_CREATED/COMPLETED/FAILED/CANCELLED`
+- **当前阶段(`supply_task.current_stage`)**:
+  - `INIT`
+  - `RECALL`(搜索召回)
+  - `ACCOUNT_ENRICH`(账号信息补全)
+  - `ACCOUNT_PROFILE`(账号历史/画像)
+  - `CONTENT_ENRICH`(文章详情补全)
+  - `RANK`(过滤/打分/去重/多样性)
+  - `CREATE_PLAN`(创建 AIGC 抓取计划)
+  - `FINALIZE`(收尾:输出/回传)
+
+### 流程说明(可重跑)
+
+1. **S0 接收任务(INIT)**
+   - 输入:`demand_id, query, expected_count(默认10), platform=weixin`
+   - 产出:`task_id`,写入需求快照与任务主表,状态 `PENDING → RUNNING`
+
+2. **S1 搜索召回(RECALL)**
+   - 工具:`weixin_search`
+   - 多轮召回(核心词/扩展词/场景词),支持分页,形成候选池(建议 40~60 条)
+   - 状态:`RECALL_DONE`
+
+3. **S2 账号信息补全(ACCOUNT_ENRICH)**
+   - 工具:`fetch_weixin_account`
+   - 为候选文章补齐账号字段(如 `wx_gh/account_name/biz_info`)
+   - 状态:`ENRICHED`
+
+4. **S3 账号画像(ACCOUNT_PROFILE)**
+   - 工具:`fetch_account_article_list`
+   - 对高潜账号抓取最近 N 篇历史文章,形成账号画像特征(垂直度、稳定更新、历史热度中位数等)
+
+5. **S4 文章详情补全(CONTENT_ENRICH)**
+   - 工具:`fetch_article_detail`
+   - 只对 topK 候选(例如 30)拉取全文/结构化详情,控制成本
+
+6. **S5 过滤与打分(RANK)**
+   - **硬过滤**:不相关、明显营销/标题党、低质等直接淘汰
+   - **软排序**:综合评分(示例 100 分制)
+     - 相关性 30
+     - 热度 25
+     - 内容质量 25
+     - 账号可信 15
+     - 老年适配 5
+   - **去重**:同链接、标题近似、同账号重复过多
+   - **多样性**:同账号最多 3 篇;主题子类覆盖(健康/政策/防骗/生活技巧等)
+   - **不足补召回**:若高质量不足 10 条,回到 S1 进行增量召回,直到补齐
+
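S5 的 100 分制综合评分可按如下方式示意(权重即上文的 30/25/25/15/5;函数签名与多样性惩罚的扣分方式为示意假设,实际以 `score_version` 对应的评分模型为准):

```python
# 各维度权重(满分 100),与上文 S5 的拆分一致
WEIGHTS = {
    "relevance": 0.30,   # 相关性 30
    "popularity": 0.25,  # 热度 25
    "quality": 0.25,     # 内容质量 25
    "account": 0.15,     # 账号可信 15
    "elder_fit": 0.05,   # 老年适配 5
}


def total_score(scores: dict, diversity_penalty: float = 0.0) -> float:
    """各维度 0-100 分加权求和,再扣除多样性惩罚分,下限为 0。"""
    base = sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)
    return round(max(base - diversity_penalty, 0.0), 2)
```

落库时可将 `WEIGHTS` 与各维度中间分一并写入 `supply_content_score.score_detail`,便于规则迭代后重算。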
+7. **S6 创建抓取计划(CREATE_PLAN)**
+   - 调用 AIGC 平台创建抓取计划,记录 `crawler_plan_id`
+   - 可选:绑定生成计划(produce plan)
+   - 状态:`PLAN_CREATED`
+
+8. **S7 完成(FINALIZE)**
+   - 输出:`task_id/trace_id/crawler_plan_id` + 入选内容列表
+   - 状态:`COMPLETED`
+
+### 失败与重试策略
+
+- **可重试错误**:网络超时、5xx、短时限流 → 最多重试 3 次(2s/4s/8s 指数退避)
+- **不可重试错误**:参数错误、权限错误、数据结构不符合约束 → 直接失败并落库
+- 所有错误都写入:`supply_task.error_code/error_message` + `supply_task_event`
+
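上述重试策略(可重试错误最多 3 次,2s/4s/8s 指数退避)的一个极简骨架如下;异常分类与 `sleep` 注入方式为示意写法:

```python
import time


class RetryableError(Exception):
    """网络超时/5xx/短时限流等可重试错误(示意分类)。"""


def backoff_delays(max_retry: int = 3, base: float = 2.0) -> list:
    """生成 2s/4s/8s 式的指数退避间隔。"""
    return [base * (2 ** i) for i in range(max_retry)]


def call_with_retry(fn, max_retry: int = 3, sleep=time.sleep):
    """可重试错误按退避重试;其余异常直接抛出,由调用方写入 error_code/error_message。"""
    last = None
    for delay in [0.0] + backoff_delays(max_retry):
        if delay:
            sleep(delay)
        try:
            return fn()
        except RetryableError as exc:
            last = exc
    raise last
```
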
+---
+
+## 工程关键点(必须项)
+
+- **并发控制**:`fetch_weixin_account/fetch_article_detail` 建议并发 5~10(按平台容量调参)
+- **幂等**:
+  - `task_id` 幂等:重复请求不重复建任务
+  - 创建计划前检查是否已存在 `crawler_plan_id`
+- **可观测性**:每一步写 `supply_task_event`(工具入参/出参摘要、耗时、错误码)
+- **可回放**:保留 `raw_payload` 与特征/评分明细,便于规则迭代后重算
+
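幂等键建议由 demand_id 加归一化后的 query 哈希组成(即上表注释"可由 demand_id+query_hash 组成"),一个示意实现(归一化规则与截断长度为假设):

```python
import hashlib


def build_idempotency_key(demand_id: int, query: str) -> str:
    """demand_id + 归一化 query 的 sha256 前 16 位,写入 supply_task.idempotency_key。"""
    normalized = " ".join(query.split()).lower()  # 压缩空白并小写,避免同义请求生成不同 key
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{demand_id}:{digest}"
```

同一需求重复下发时 key 相同,配合表中 `uk_idempotency_key` 唯一索引即可在建任务时天然去重。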
+---
+
+## MySQL 表结构(带字段备注)
+
+> 建议统一使用 `utf8mb4`;如需高吞吐日志,可将 `supply_task_event` 拆库/分区。
+
+```sql
+SET NAMES utf8mb4;
+SET FOREIGN_KEY_CHECKS = 0;
+
+-- =========================================================
+-- 1) 上游需求快照表:冻结需求上下文,便于回溯
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_demand_snapshot (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  demand_id BIGINT NOT NULL COMMENT '上游需求ID(业务侧主键)',
+  demand_code VARCHAR(64) DEFAULT NULL COMMENT '上游需求编码(可选)',
+  query VARCHAR(255) NOT NULL COMMENT '核心搜索词',
+  query_expansion JSON DEFAULT NULL COMMENT '扩展词列表(同义词/场景词/禁用词等)',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '目标平台,当前为 weixin',
+  expected_count INT NOT NULL DEFAULT 10 COMMENT '期望返回条数',
+  audience_profile JSON DEFAULT NULL COMMENT '目标人群画像(如50+)',
+  quality_constraints JSON DEFAULT NULL COMMENT '质量约束(最低阅读量、黑名单词等)',
+  source_payload JSON DEFAULT NULL COMMENT '上游原始请求快照',
+  version INT NOT NULL DEFAULT 1 COMMENT '快照版本号(同一 demand_id 可多版本)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_demand_id (demand_id),
+  KEY idx_platform (platform),
+  KEY idx_created_at (created_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='上游需求快照表';
+
+
+-- =========================================================
+-- 2) 任务主表:一个 demand 可触发多次任务(重跑/补跑)
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '任务唯一ID(建议UUID/雪花ID)',
+  demand_snapshot_id BIGINT UNSIGNED NOT NULL COMMENT '关联 supply_demand_snapshot.id',
+  demand_id BIGINT NOT NULL COMMENT '冗余字段:上游需求ID',
+  trace_id VARCHAR(64) DEFAULT NULL COMMENT 'Agent trace_id,便于追踪模型执行',
+  status VARCHAR(32) NOT NULL COMMENT '任务状态:PENDING/RUNNING/RECALL_DONE/ENRICHED/RANKED/PLAN_CREATED/COMPLETED/FAILED/CANCELLED',
+  current_stage VARCHAR(32) DEFAULT NULL COMMENT '当前阶段:INIT/RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN/FINALIZE',
+  priority TINYINT NOT NULL DEFAULT 5 COMMENT '优先级(1最高,9最低)',
+  retry_count INT NOT NULL DEFAULT 0 COMMENT '已重试次数',
+  max_retry INT NOT NULL DEFAULT 3 COMMENT '最大重试次数',
+  is_idempotent TINYINT NOT NULL DEFAULT 1 COMMENT '是否启用幂等(1是0否)',
+  idempotency_key VARCHAR(128) DEFAULT NULL COMMENT '幂等键(可由 demand_id+query_hash 组成)',
+  started_at DATETIME DEFAULT NULL COMMENT '任务启动时间',
+  finished_at DATETIME DEFAULT NULL COMMENT '任务完成时间',
+  duration_ms BIGINT DEFAULT NULL COMMENT '总耗时(毫秒)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '错误码(网络超时/参数错误/平台限流等)',
+  error_message TEXT COMMENT '错误详情',
+  operator VARCHAR(64) DEFAULT 'agent' COMMENT '执行方(agent/manual/cron)',
+  ext JSON DEFAULT NULL COMMENT '扩展字段(灰度标记、AB实验信息等)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_id (task_id),
+  UNIQUE KEY uk_idempotency_key (idempotency_key),
+  KEY idx_demand_id (demand_id),
+  KEY idx_status_priority (status, priority),
+  KEY idx_current_stage (current_stage),
+  KEY idx_created_at (created_at),
+  CONSTRAINT fk_task_demand_snapshot
+    FOREIGN KEY (demand_snapshot_id) REFERENCES supply_demand_snapshot(id)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='供给任务主表(状态机核心)';
+
+
+-- =========================================================
+-- 3) 阶段执行表:记录每个阶段的执行情况
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_stage (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  stage_name VARCHAR(32) NOT NULL COMMENT '阶段名:RECALL/ACCOUNT_ENRICH/ACCOUNT_PROFILE/CONTENT_ENRICH/RANK/CREATE_PLAN',
+  stage_status VARCHAR(32) NOT NULL COMMENT '阶段状态:PENDING/RUNNING/SUCCESS/FAILED/SKIPPED',
+  attempt_no INT NOT NULL DEFAULT 1 COMMENT '该阶段第几次执行',
+  input_payload JSON DEFAULT NULL COMMENT '阶段输入快照',
+  output_payload JSON DEFAULT NULL COMMENT '阶段输出快照(摘要)',
+  started_at DATETIME DEFAULT NULL COMMENT '阶段开始时间',
+  finished_at DATETIME DEFAULT NULL COMMENT '阶段结束时间',
+  duration_ms BIGINT DEFAULT NULL COMMENT '阶段耗时(毫秒)',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '阶段错误码',
+  error_message TEXT COMMENT '阶段错误信息',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_task_stage (task_id, stage_name),
+  KEY idx_stage_status (stage_status),
+  KEY idx_started_at (started_at)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='任务阶段执行明细表';
+
+
+-- =========================================================
+-- 4) 候选内容池:搜索召回到的原始候选
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_candidate_content (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  source_keyword VARCHAR(255) DEFAULT NULL COMMENT '触发召回的关键词',
+  recall_round INT NOT NULL DEFAULT 1 COMMENT '第几轮召回(核心词/扩展词/场景词)',
+  recall_page INT DEFAULT NULL COMMENT '召回分页页码/游标序号',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '来源平台',
+  content_id VARCHAR(128) DEFAULT NULL COMMENT '平台内容ID(若可提取)',
+  title VARCHAR(512) NOT NULL COMMENT '文章标题',
+  url VARCHAR(1024) NOT NULL COMMENT '文章链接(业务唯一候选)',
+  digest VARCHAR(1000) DEFAULT NULL COMMENT '文章摘要',
+  cover_url VARCHAR(1024) DEFAULT NULL COMMENT '封面图链接',
+  publish_time DATETIME DEFAULT NULL COMMENT '发布时间',
+  account_id VARCHAR(128) DEFAULT NULL COMMENT '公众号ID(wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT '公众号名称',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT '公众号 biz 标识',
+  view_count BIGINT DEFAULT NULL COMMENT '阅读量(若有)',
+  like_count BIGINT DEFAULT NULL COMMENT '点赞量(若有)',
+  comment_count BIGINT DEFAULT NULL COMMENT '评论量(若有)',
+  favorite_count BIGINT DEFAULT NULL COMMENT '收藏量(若有)',
+  share_count BIGINT DEFAULT NULL COMMENT '分享量(若有)',
+  raw_statistics JSON DEFAULT NULL COMMENT '原始统计字段',
+  raw_payload JSON DEFAULT NULL COMMENT '原始API返回(单条)',
+  dedup_hash CHAR(64) DEFAULT NULL COMMENT '去重哈希(建议 title+url 归一化后sha256)',
+  quality_flag VARCHAR(32) DEFAULT 'UNKNOWN' COMMENT '初步质量标记:UNKNOWN/PASS/REJECT',
+  reject_reason VARCHAR(255) DEFAULT NULL COMMENT '初筛淘汰原因',
+  is_deleted TINYINT NOT NULL DEFAULT 0 COMMENT '软删除标记(1删除)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_url (task_id, url(255)),
+  KEY idx_task_round (task_id, recall_round),
+  KEY idx_task_account (task_id, account_id),
+  KEY idx_publish_time (publish_time),
+  KEY idx_quality_flag (quality_flag),
+  KEY idx_dedup_hash (dedup_hash)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='候选内容池(召回原始结果)';
+
+
+-- =========================================================
+-- 5) 账号画像表:账号维度的补全与打分输入
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_account_profile (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  account_id VARCHAR(128) NOT NULL COMMENT '公众号ID(wx_gh)',
+  account_name VARCHAR(255) DEFAULT NULL COMMENT '账号名称',
+  biz_info VARCHAR(128) DEFAULT NULL COMMENT '账号biz标识',
+  intro TEXT COMMENT '账号简介',
+  verify_status TINYINT DEFAULT NULL COMMENT '认证状态(0未认证1认证,按平台定义)',
+  category VARCHAR(64) DEFAULT NULL COMMENT '账号分类(健康/资讯/情感等)',
+  update_frequency VARCHAR(32) DEFAULT NULL COMMENT '更新频率(daily/weekly/irregular)',
+  recent_article_count INT DEFAULT 0 COMMENT '近周期发文数',
+  median_view_count BIGINT DEFAULT NULL COMMENT '历史阅读量中位数',
+  median_like_count BIGINT DEFAULT NULL COMMENT '历史点赞量中位数',
+  vertical_score DECIMAL(5,2) DEFAULT 0 COMMENT '垂直度评分(0-100)',
+  stability_score DECIMAL(5,2) DEFAULT 0 COMMENT '稳定更新评分(0-100)',
+  credibility_score DECIMAL(5,2) DEFAULT 0 COMMENT '可信度评分(0-100)',
+  risk_tags JSON DEFAULT NULL COMMENT '风险标签(标题党/营销号/违规高风险)',
+  raw_profile_payload JSON DEFAULT NULL COMMENT '账号详情原始返回',
+  raw_history_payload JSON DEFAULT NULL COMMENT '历史文章原始返回(摘要)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_account (task_id, account_id),
+  KEY idx_task_credibility (task_id, credibility_score),
+  KEY idx_category (category)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='账号画像表(账号维度特征)';
+
+
+-- =========================================================
+-- 6) 内容特征表:文章详情解析后的质量特征
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_feature (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT '关联候选内容ID',
+  content_length INT DEFAULT NULL COMMENT '正文长度(字符数)',
+  image_count INT DEFAULT NULL COMMENT '图片数量',
+  video_count INT DEFAULT NULL COMMENT '视频数量',
+  has_source_reference TINYINT DEFAULT 0 COMMENT '是否包含来源引用(0否1是)',
+  readability_score DECIMAL(5,2) DEFAULT 0 COMMENT '可读性评分(0-100)',
+  information_density_score DECIMAL(5,2) DEFAULT 0 COMMENT '信息密度评分(0-100)',
+  elder_friendly_score DECIMAL(5,2) DEFAULT 0 COMMENT '老年友好评分(0-100)',
+  sentiment_score DECIMAL(5,2) DEFAULT NULL COMMENT '情绪倾向分(可选)',
+  risk_score DECIMAL(5,2) DEFAULT 0 COMMENT '风险分(越高风险越大)',
+  risk_reasons JSON DEFAULT NULL COMMENT '风险原因明细(夸张词/伪科学词/诱导词)',
+  keyword_coverage JSON DEFAULT NULL COMMENT 'query关键词覆盖明细',
+  summary TEXT COMMENT '文章摘要(抽取)',
+  raw_detail_payload JSON DEFAULT NULL COMMENT '文章详情原始返回',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_feature (task_id, candidate_id),
+  KEY idx_task_quality (task_id, readability_score, information_density_score),
+  CONSTRAINT fk_feature_candidate
+    FOREIGN KEY (candidate_id) REFERENCES supply_candidate_content(id)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='内容特征表(文章维度特征)';
+
+
+-- =========================================================
+-- 7) 评分与筛选结果表:最终排序与入选
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_content_score (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  candidate_id BIGINT UNSIGNED NOT NULL COMMENT '关联候选内容ID',
+  relevance_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '相关性分(0-100)',
+  popularity_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '热度分(0-100)',
+  quality_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '内容质量分(0-100)',
+  account_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '账号可信分(0-100)',
+  elder_fit_score DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '老年适配分(0-100)',
+  diversity_penalty DECIMAL(5,2) NOT NULL DEFAULT 0 COMMENT '多样性惩罚分(防止同账号过多)',
+  total_score DECIMAL(6,2) NOT NULL DEFAULT 0 COMMENT '综合分',
+  filter_status VARCHAR(32) NOT NULL DEFAULT 'PENDING' COMMENT '过滤状态:PENDING/PASS/REJECT',
+  filter_reason VARCHAR(255) DEFAULT NULL COMMENT '过滤原因',
+  is_selected TINYINT NOT NULL DEFAULT 0 COMMENT '是否入选最终结果(1是0否)',
+  rank_no INT DEFAULT NULL COMMENT '最终排序名次(从1开始)',
+  score_version VARCHAR(32) DEFAULT 'v1' COMMENT '评分模型版本',
+  score_detail JSON DEFAULT NULL COMMENT '评分明细(权重、中间分)',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  UNIQUE KEY uk_task_candidate_score (task_id, candidate_id),
+  KEY idx_task_selected_rank (task_id, is_selected, rank_no),
+  KEY idx_task_total_score (task_id, total_score),
+  CONSTRAINT fk_score_candidate
+    FOREIGN KEY (candidate_id) REFERENCES supply_candidate_content(id)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='内容评分与筛选结果表';
+
+
+-- =========================================================
+-- 8) 抓取计划映射表:与AIGC平台交互记录
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_crawler_plan (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  platform VARCHAR(32) NOT NULL DEFAULT 'weixin' COMMENT '平台',
+  plan_type VARCHAR(32) NOT NULL COMMENT '建计划方式:BY_CONTENT/BY_ACCOUNT',
+  input_count INT NOT NULL DEFAULT 0 COMMENT '输入内容/账号数量',
+  crawler_plan_id VARCHAR(128) DEFAULT NULL COMMENT 'AIGC爬取计划ID',
+  crawler_plan_name VARCHAR(255) DEFAULT NULL COMMENT 'AIGC爬取计划名称',
+  produce_plan_id VARCHAR(128) DEFAULT NULL COMMENT '绑定的生成计划ID(可空)',
+  produce_bind_status VARCHAR(32) DEFAULT NULL COMMENT '绑定状态:NOT_REQUIRED/SUCCESS/FAILED',
+  request_payload JSON NOT NULL COMMENT '创建计划请求体',
+  response_payload JSON DEFAULT NULL COMMENT '创建计划响应体',
+  plan_status VARCHAR(32) NOT NULL COMMENT '计划状态:CREATED/FAILED/BIND_SUCCESS/BIND_FAILED',
+  error_code VARCHAR(64) DEFAULT NULL COMMENT '失败错误码',
+  error_message TEXT COMMENT '失败错误信息',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+  PRIMARY KEY (id),
+  KEY idx_task_id (task_id),
+  KEY idx_crawler_plan_id (crawler_plan_id),
+  KEY idx_plan_status (plan_status)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='AIGC抓取计划映射表';
+
+
+-- =========================================================
+-- 9) 事件日志表:完整可观测性与审计
+-- =========================================================
+CREATE TABLE IF NOT EXISTS supply_task_event (
+  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增主键',
+  task_id VARCHAR(64) NOT NULL COMMENT '关联任务ID',
+  stage_name VARCHAR(32) DEFAULT NULL COMMENT '所属阶段',
+  event_type VARCHAR(64) NOT NULL COMMENT '事件类型:STATE_CHANGE/TOOL_CALL/TOOL_RESULT/RETRY/ERROR/MANUAL_INTERVENTION',
+  event_level VARCHAR(16) NOT NULL DEFAULT 'INFO' COMMENT '日志级别:DEBUG/INFO/WARN/ERROR',
+  tool_name VARCHAR(64) DEFAULT NULL COMMENT '工具名(如 weixin_search)',
+  request_id VARCHAR(64) DEFAULT NULL COMMENT '外部请求ID(可用于链路追踪)',
+  event_payload JSON DEFAULT NULL COMMENT '事件详情(入参/出参摘要)',
+  message VARCHAR(1000) DEFAULT NULL COMMENT '事件描述',
+  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+  PRIMARY KEY (id),
+  KEY idx_task_created (task_id, created_at),
+  KEY idx_stage_type (stage_name, event_type),
+  KEY idx_tool_name (tool_name),
+  KEY idx_event_level (event_level)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='任务事件日志表';
+
+SET FOREIGN_KEY_CHECKS = 1;
+```
+
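表 4 中 `dedup_hash` 建议为"title+url 归一化后 sha256",一个示意实现如下;其中 URL 只去掉 fragment、保留查询串(微信链接的文章标识可能位于查询参数中),具体归一化口径以业务约定为准:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def dedup_hash(title: str, url: str) -> str:
    """title+url 归一化后取 sha256,写入 supply_candidate_content.dedup_hash。"""
    # 标题:去首尾空白、压缩内部空白、小写
    norm_title = " ".join(title.split()).lower()
    # URL:scheme/netloc 小写、去掉 fragment,查询串保留
    parts = urlsplit(url.strip())
    norm_url = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )
    return hashlib.sha256(f"{norm_title}|{norm_url}".encode("utf-8")).hexdigest()
```
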
+---
+
+## 与现有代码对齐(建议后续改动)
+
+- `tests/run_single.py` 的 `allowed_tools` 需要与真实工具名对齐(如 `fetch_account_article_list`),避免 agent 无法调用。
+- `tests/tools/weixin_tools.py` 的 `fetch_article_detail` 当前未实现,它是内容质量判断的关键环节,建议补齐后再做强筛选策略。
+

+ 173 - 0
docs/wechat-search-recovery-requirements.md

@@ -0,0 +1,173 @@
+# 长文搜寻 Agent 微信搜索能力恢复需求文档
+
+## 1. 背景与问题
+
+长文搜寻 Agent 原有的微信搜索能力依赖上游接口(或工具)提供结果。当前该能力被上游下线,导致 `content_search` 阶段无法稳定召回候选文章,进而影响整条 pipeline 的有效性。
+
+现阶段需要快速恢复可用搜索能力,优先保障业务连续性,同时保留后续可替换为更稳定方案的空间。
+
+## 2. 目标
+
+### 2.1 业务目标
+
+- 恢复 Agent 对微信内容的搜索召回能力。
+- 在不改动核心业务流程语义的前提下,保证 pipeline 可持续产出候选文章。
+- 将恢复方案设计为可灰度、可观测、可回滚。
+
+### 2.2 技术目标
+
+- 引入“Agent 调用浏览器在微信搜狗搜索”的替代能力。
+- 对搜索过程和结果质量建立可观测日志与追踪。
+- 支持后续平滑切换到其他搜索数据源。
+
+## 3. 方案概述
+
+核心思路:Agent 通过浏览器自动化访问搜狗微信搜索页,执行关键词检索、解析结果列表,并输出统一结构的候选文章数据,接入现有 `content_search` 阶段。
+
+### 3.1 核心流程
+
+1. Agent 根据需求分析产出的关键词列表逐个发起检索。
+2. 浏览器打开搜狗微信搜索页面,输入关键词并请求结果。
+3. 解析搜索结果页面,提取文章卡片信息(标题、链接、摘要、时间、来源等)。
+4. 对结果做去重、清洗、结构化,映射为 `CandidateArticle`。
+5. 回填到 pipeline 上下文,供后续过滤与沉淀阶段使用。
+
+## 4. 适用范围与非目标
+
+### 4.1 适用范围
+
+- 长文搜寻 Agent 的微信内容召回能力恢复。
+- 当前 pipeline 的 `content_search` 阶段替代执行。
+- 测试/预发/生产环境均可运行(按环境开关控制)。
+
+### 4.2 非目标
+
+- 本期不实现高并发采集平台化改造。
+- 本期不做多搜索引擎聚合。
+- 本期不承诺 100% 抗页面结构变化(先满足可用与可维护)。
+
+## 5. 功能需求
+
+### 5.1 搜索执行
+
+- 输入:关键词、分页参数(默认 1 页,可扩展)。
+- 输出:结构化文章列表,字段至少包含:
+  - `title`
+  - `url`
+  - `source_keyword`
+  - `publish_time`(可解析则填时间戳,否则置 0)
+  - `statistics`(保留原始可用统计信息)
+
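5.1 的输出结构可以用一个轻量 dataclass 示意(字段即上面列出的必备字段,默认值遵循"不可解析则置 0"的约定;现有 `CandidateArticle` 的真实定义以代码为准):

```python
from dataclasses import dataclass, field


@dataclass
class CandidateArticle:
    """搜狗微信搜索解析出的单篇候选文章(字段与 5.1 约定一致)。"""
    title: str
    url: str
    source_keyword: str
    publish_time: int = 0  # 可解析则填时间戳,否则置 0
    statistics: dict = field(default_factory=dict)  # 保留原始可用统计信息
```
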
+### 5.2 去重与合并
+
+- 以文章 URL 为主键去重。
+- 同一关键词多轮检索结果合并时保留首条并补充来源关键词信息。
+
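5.2 的去重合并规则(以 URL 为主键、保留首条并补充来源关键词)可示意如下;候选以 dict 表示、来源词汇总到 `source_keywords` 字段均为假设写法:

```python
def merge_results(batches: list) -> list:
    """多轮检索结果按 URL 去重:保留首条,后续命中仅补充来源关键词。"""
    merged = {}
    for batch in batches:
        for item in batch:
            url = item["url"]
            if url not in merged:
                first = dict(item)
                first["source_keywords"] = [item["source_keyword"]]
                merged[url] = first
            else:
                kws = merged[url]["source_keywords"]
                if item["source_keyword"] not in kws:
                    kws.append(item["source_keyword"])
    return list(merged.values())
```
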
+### 5.3 失败与重试
+
+- 单关键词检索失败可重试(建议 2~3 次,指数退避)。
+- 若关键词全部失败,需返回可解释错误并进入现有 gate 策略。
+- 支持超时控制,避免单次任务阻塞全流程。
+
+### 5.4 可观测性
+
+- 记录每个关键词的检索状态:成功/失败/重试次数/返回条数/新增条数。
+- 输出页面解析错误样本(截断保存)用于排障。
+- 与现有 trace 对齐,保证可视化可追踪。
+
+### 5.5 配置与开关
+
+- 增加搜索后端策略开关,例如:
+  - `WECHAT_SEARCH_BACKEND=browser_sogou|legacy|hybrid`
+- 浏览器参数可配置:
+  - 无头/有头模式
+  - 超时时间
+  - User-Agent
+  - 代理配置(如需要)
+
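5.5 的后端开关可以从环境变量读取并校验取值(变量名即上文的 `WECHAT_SEARCH_BACKEND`;默认值与非法取值的回退行为为假设,需按环境策略确认):

```python
import os

_ALLOWED_BACKENDS = {"browser_sogou", "legacy", "hybrid"}


def resolve_search_backend(env=None, default: str = "legacy") -> str:
    """读取 WECHAT_SEARCH_BACKEND;取值非法时回退默认值,调用方可据此告警。"""
    env = os.environ if env is None else env
    value = (env.get("WECHAT_SEARCH_BACKEND") or default).strip().lower()
    return value if value in _ALLOWED_BACKENDS else default
```
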
+## 6. 非功能需求
+
+### 6.1 性能
+
+- 单关键词首屏检索期望 < 8s(网络波动允许超时重试)。
+- 单次任务(默认关键词数)总时长需在当前 pipeline 可接受范围内。
+
+### 6.2 稳定性
+
+- 解析逻辑需具备容错(DOM 选择器多策略 + 兜底)。
+- 网络异常、跳转异常、验证码/反爬触发需可识别并记录。
+
+### 6.3 可维护性
+
+- 页面解析逻辑与业务逻辑分层,避免耦合。
+- 关键选择器与规则集中管理,便于热修复。
+
+### 6.4 合规与安全
+
+- 遵守目标站点使用条款与公司安全规范。
+- 不在日志中泄露敏感信息(cookie、token、账号凭据等)。
+
+## 7. 技术设计要求(高层)
+
+### 7.1 模块划分建议
+
+- `BrowserSearchAdapter`:统一对外搜索接口(与现有 adapter 风格一致)。
+- `SogouWeixinClient`:浏览器会话管理、页面访问、请求控制。
+- `SogouResultParser`:DOM 解析与结构化映射。
+- `SearchTelemetry`:关键词级指标与错误采样。
+
+### 7.2 与现有 pipeline 对接
+
+- 在 `content_search` 阶段通过策略开关选择搜索后端。
+- 输出继续写入 `ctx.candidate_articles` 与 `_search_keyword_stats`,保持下游兼容。
+
+### 7.3 失败降级
+
+- 浏览器方案失败时可降级:
+  - 返回空结果并触发现有 gate/fallback 逻辑;
+  - 或回退到备用后端(若配置了 hybrid)。
+
+## 8. 验收标准(DoD)
+
+### 8.1 功能验收
+
+- 在测试环境运行 20 条代表性 query:
+  - 至少 90% 任务能成功完成搜索阶段并产出候选文章。
+  - 候选结果结构化字段完整率 >= 95%(title/url/source_keyword 必填)。
+
+### 8.2 质量验收
+
+- 关键词级日志完整可追踪,trace 页面可查看搜索统计。
+- 页面解析异常可定位(有错误类型和样本)。
+
+### 8.3 稳定性验收
+
+- 连续运行 3 天无阻塞性故障(可接受少量单关键词失败并自动重试恢复)。
+
+## 9. 里程碑计划(建议)
+
+- M1(1-2 天):最小可用版本(单关键词检索 + 结果解析 + 接入 content_search)。
+- M2(2-3 天):完善重试、超时、日志、trace 展示与配置开关。
+- M3(1-2 天):回归测试、灰度发布、故障预案与回滚验证。
+
+## 10. 风险与应对
+
+- 页面结构变化导致解析失效  
+  - 应对:多选择器策略 + 快速热修 + 监控告警。
+- 反爬策略导致检索受限  
+  - 应对:限速、随机等待、会话治理、必要时代理策略。
+- 检索质量波动  
+  - 应对:关键词策略优化、召回后过滤强化、A/B 对比。
+
+## 11. 回滚策略
+
+- 通过配置开关一键切回旧后端或禁用浏览器后端。
+- 保留完整日志与 trace,便于回滚后复盘。
+
+## 12. 待确认事项
+
+- 是否允许引入浏览器自动化依赖(Playwright/Selenium)及运行环境要求。
+- 是否需要代理池/固定出口 IP。
+- 生产环境是否允许有头模式调试。
+- 搜索结果页抓取频率上限与并发策略。
+

+ 1861 - 0
pipeline_visualize.py

@@ -0,0 +1,1861 @@
+"""
+Pipeline 执行追踪可视化工具。
+
+读取 tests/traces/{trace_id}/pipeline.jsonl,生成 HTML 可视化页面。
+HTML 风格与 tests/.cache/visualize_log.py 保持一致。
+
+用法:
+    python pipeline_visualize.py                          # 读取最新 trace
+    python pipeline_visualize.py <trace_id>               # 指定 trace_id
+    python pipeline_visualize.py --list                   # 列出所有可用 trace
+"""
+
+from __future__ import annotations
+
+import html as html_mod
+import json
+import re
+import sys
+from datetime import datetime
+from pathlib import Path
+
+TRACES_DIR = Path(__file__).parent / "tests" / "traces"
+
+
+# ─────────────────────────────────────────────────────────────
+# 工具函数
+# ─────────────────────────────────────────────────────────────
+
+def _esc(s: str) -> str:
+    """HTML 转义;统一先转 str,避免 None/数字入参报错。"""
+    return html_mod.escape(str(s))
+
+
+def _ts(s: str) -> str:
+    """截取 'YYYY-MM-DD HH:MM:SS' 前缀作为展示时间;空值返回空串。"""
+    return s[:19] if s else ""
+
+
+def _duration_label(ms: int | None) -> str:
+    if ms is None:
+        return ""
+    if ms < 1000:
+        return f"{ms}ms"
+    return f"{ms / 1000:.1f}s"
+
+
+def _calc_duration(start_ts: str, end_ts: str) -> str:
+    try:
+        t0 = datetime.strptime(start_ts[:19], "%Y-%m-%d %H:%M:%S")
+        t1 = datetime.strptime(end_ts[:19], "%Y-%m-%d %H:%M:%S")
+        delta = int((t1 - t0).total_seconds())
+        mins, secs = divmod(delta, 60)
+        return f"{mins}分{secs}秒" if mins else f"{secs}秒"
+    except Exception:
+        return "N/A"
+
+
+def _ts_readable(ts: int) -> str:
+    """将时间戳转为可读格式"""
+    try:
+        dt = datetime.fromtimestamp(ts)
+        return dt.strftime("%Y-%m-%d %H:%M")
+    except Exception:
+        return str(ts)
+
+
+# ─────────────────────────────────────────────────────────────
+# JSONL 读取
+# ─────────────────────────────────────────────────────────────
+
+def read_jsonl(path: Path) -> list[dict]:
+    """逐行读取 pipeline.jsonl;空行跳过,坏行静默丢弃以保证可视化不中断。"""
+    events = []
+    for line in path.read_text(encoding="utf-8").splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            events.append(json.loads(line))
+        except json.JSONDecodeError:
+            pass
+    return events
+
+
+# ─────────────────────────────────────────────────────────────
+# HTML 渲染
+# ─────────────────────────────────────────────────────────────
+
+_STAGE_LABELS = {
+    "demand_analysis": "需求理解",
+    "content_search": "内容召回",
+    "hard_filter": "硬规则过滤",
+    "coarse_filter": "标题粗筛",
+    "quality_filter": "质量精排",
+    "account_precipitate": "账号沉淀",
+    "output_persist": "结果输出",
+}
+
+_GATE_LABELS = {
+    "content_search": "SearchCompletenessGate",
+    "quality_filter": "FilterSufficiencyGate",
+    "output_persist": "OutputSchemaGate",
+}
+
+_ACTION_COLORS = {
+    "proceed": "var(--green)",
+    "retry_stage": "var(--yellow)",
+    "fallback": "var(--orange)",
+    "abort": "var(--red)",
+}
+
+
+# ─────────────────────────────────────────────────────────────
+# 决策数据渲染
+# ─────────────────────────────────────────────────────────────
+
+def _render_decisions(stage: str, decisions: dict) -> str:
+    """根据阶段类型,将 decisions dict 渲染为 HTML(可折叠 <details> 卡片)。"""
+    if not decisions:
+        return ""
+    renderer = _DECISION_RENDERERS.get(stage)
+    if not renderer:
+        return ""
+    try:
+        inner = renderer(decisions)
+    except Exception:
+        return ""
+    if not inner:
+        return ""
+    label = _STAGE_LABELS.get(stage, stage)
+    # 重要阶段默认展开
+    open_attr = " open" if stage in ("quality_filter", "demand_analysis", "content_search", "coarse_filter") else ""
+    return (
+        f'<details class="decision-card"{open_attr}>'
+        f'<summary>📋 {_esc(label)} 决策详情</summary>'
+        f'<div class="decision-body">{inner}</div>'
+        f'</details>'
+    )
+
+
+def _render_demand_analysis(d: dict) -> str:
+    parts: list[str] = []
+
+    # 特征分层
+    feature_rows: list[str] = []
+    for key, label in [
+        ("substantive_features", "实质特征"),
+        ("formal_features", "形式特征"),
+        ("upper_features", "上层特征"),
+        ("lower_features", "下层特征"),
+    ]:
+        items = d.get(key, [])
+        if items:
+            tags = " ".join(f'<span class="tag">{_esc(t)}</span>' for t in items)
+            feature_rows.append(
+                f'<tr><td class="feature-label">{label}</td><td>{tags}</td></tr>'
+            )
+    if feature_rows:
+        parts.append(
+            '<div class="decision-section">'
+            '<div class="section-title">🧠 特征分层</div>'
+            '<table class="decision-table">'
+            + "\n".join(feature_rows)
+            + '</table></div>'
+        )
+
+    # 搜索策略
+    ss = d.get("search_strategy", {})
+    precise = ss.get("precise_keywords", [])
+    topic = ss.get("topic_keywords", [])
+    if precise or topic:
+        rows: list[str] = []
+        if precise:
+            tags = " ".join(f'<span class="tag tag-blue">{_esc(k)}</span>' for k in precise)
+            rows.append(f'<tr><td class="feature-label">精准词</td><td>{tags}</td></tr>')
+        if topic:
+            tags = " ".join(f'<span class="tag tag-purple">{_esc(k)}</span>' for k in topic)
+            rows.append(f'<tr><td class="feature-label">主题词</td><td>{tags}</td></tr>')
+        parts.append(
+            '<div class="decision-section">'
+            '<div class="section-title">🔎 搜索策略</div>'
+            '<table class="decision-table">'
+            + "\n".join(rows)
+            + '</table></div>'
+        )
+
+    # 筛选关注点
+    ff = d.get("filter_focus", {})
+    relevance = ff.get("relevance_focus", [])
+    risks = ff.get("elimination_risks", [])
+    if relevance or risks:
+        items_html = ""
+        if relevance:
+            items_html += '<div class="focus-group"><b>关注点</b><ul>'
+            items_html += "".join(f'<li>{_esc(r)}</li>' for r in relevance)
+            items_html += '</ul></div>'
+        if risks:
+            items_html += '<div class="focus-group"><b>淘汰风险</b><ul>'
+            items_html += "".join(f'<li>{_esc(r)}</li>' for r in risks)
+            items_html += '</ul></div>'
+        parts.append(
+            '<div class="decision-section">'
+            '<div class="section-title">🎯 筛选关注</div>'
+            + items_html
+            + '</div>'
+        )
+
+    return "\n".join(parts)
+
+
+def _render_content_search(d: dict) -> str:
+    parts: list[str] = []
+    stats = d.get("keyword_stats", [])
+    total = d.get("total_candidates", 0)
+    candidates = d.get("candidates", [])
+
+    # 搜索词命中统计
+    if stats:
+        rows = []
+        for s in stats:
+            kw = _esc(s.get("keyword", ""))
+            returned = s.get("returned", 0)
+            new = s.get("new", 0)
+            rows.append(
+                f'<tr><td><code>{kw}</code></td>'
+                f'<td class="num-cell">{returned}</td>'
+                f'<td class="num-cell">{new}</td></tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            '<div class="section-title">📊 搜索词命中</div>'
+            '<table class="decision-table kw-table">'
+            '<thead><tr><th>关键词</th><th>返回数</th><th>新增数</th></tr></thead>'
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    # 全部召回文章列表
+    if candidates:
+        rows = []
+        for idx, c in enumerate(candidates, 1):
+            title = _esc(c.get("title", ""))
+            url = _esc(c.get("url", ""))
+            kw = _esc(c.get("source_keyword", ""))
+            pt = c.get("publish_time", 0)
+            pt_str = _ts_readable(pt) if pt else "未知"
+            view = c.get("view_count", 0)
+            rows.append(
+                f'<tr>'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{kw}</code></td>'
+                f'<td>{_esc(pt_str)}</td>'
+                f'<td class="num-cell">{view}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">📋 全部召回文章({len(candidates)} 篇)</div>'
+            '<table class="decision-table recall-table">'
+            '<thead><tr><th>#</th><th>标题</th><th>来源关键词</th><th>发布时间</th><th>阅读量</th></tr></thead>'
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    parts.append(
+        f'<div class="total-line">累计候选: <b>{total}</b> 篇</div>'
+    )
+    return "\n".join(parts)
+
+
+def _render_hard_filter(d: dict) -> str:
+    count = d.get("after_filter_count", 0)
+    return (
+        '<div class="decision-section">'
+        f'<div class="section-title">📊 过滤结果</div>'
+        f'<div class="total-line">过滤后剩余: <b>{count}</b> 篇</div>'
+        '</div>'
+    )
+
+
+def _render_coarse_filter(d: dict) -> str:
+    log = d.get("coarse_log", [])
+    total = d.get("total_count", len(log))
+    passed_cnt = d.get("passed_count", 0)
+    rejected_cnt = d.get("rejected_count", 0)
+    after_cnt = d.get("after_filter_count", 0)
+
+    if not log:
+        return f'<div class="decision-section">粗筛后剩余: {after_cnt} 篇</div>'
+
+    parts: list[str] = []
+
+    # 统计概览
+    parts.append(
+        '<div class="decision-section">'
+        f'<div class="section-title">📊 粗筛统计</div>'
+        f'<span class="stat-pill stat-accept">通过 {passed_cnt}</span>'
+        f'<span class="stat-pill stat-reject">淘汰 {rejected_cnt}</span>'
+        '</div>'
+    )
+
+    # 通过的文章
+    passed = [r for r in log if r.get("status") == "pass"]
+    if passed:
+        rows = []
+        for idx, r in enumerate(passed, 1):
+            title = _esc(r.get("title", ""))
+            url = _esc(r.get("url", ""))
+            reason = _esc(r.get("reason", ""))
+            src_kw = _esc(r.get("source_keyword", ""))
+            rows.append(
+                f'<tr class="row-accept">'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{src_kw}</code></td>'
+                f'<td class="reason-full-cell">{reason}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">✅ 通过文章({len(passed)} 篇)</div>'
+            '<table class="decision-table review-table">'
+            '<thead><tr><th>#</th><th>标题</th><th>来源词</th><th>理由</th></tr></thead>'
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    # 淘汰的文章
+    rejected = [r for r in log if r.get("status") == "reject"]
+    if rejected:
+        rows = []
+        for idx, r in enumerate(rejected, 1):
+            title = _esc(r.get("title", ""))
+            url = _esc(r.get("url", ""))
+            reason = _esc(r.get("reason", ""))
+            src_kw = _esc(r.get("source_keyword", ""))
+            rows.append(
+                f'<tr class="row-reject">'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{src_kw}</code></td>'
+                f'<td class="reason-full-cell">{reason}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">❌ 淘汰文章({len(rejected)} 篇)</div>'
+            '<table class="decision-table review-table">'
+            '<thead><tr><th>#</th><th>标题</th><th>来源词</th><th>理由</th></tr></thead>'
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    parts.append(
+        f'<div class="total-line">粗筛后剩余: <b>{after_cnt}</b> 篇</div>'
+    )
+    return "\n".join(parts)
+
+
+def _render_quality_filter(d: dict) -> str:
+    reviews = d.get("review_log", [])
+    accepted_cnt = d.get("accepted_count", 0)
+    rejected_cnt = d.get("rejected_count", 0)
+    skipped_cnt = d.get("skipped_count", 0)
+    final_cnt = d.get("final_filtered_count", 0)
+    score_config = d.get("score_config", {})
+    match_terms = d.get("match_terms", [])
+
+    if not reviews:
+        return f'<div class="decision-section">最终入选: {final_cnt} 篇</div>'
+
+    parts: list[str] = []
+
+    # 评分配置
+    if score_config:
+        cfg_rows: list[str] = []
+        for key, label in [
+            ("min_body_length", "最小正文长度"),
+            ("high_relevance_ratio", "高相关性阈值"),
+            ("view_count_threshold", "阅读量阈值"),
+            ("engage_rate_threshold", "互动率阈值"),
+            ("spam_keywords_count", "标题党关键词数"),
+        ]:
+            val = score_config.get(key, "")
+            if val != "":
+                cfg_rows.append(
+                    f'<tr><td class="feature-label">{label}</td>'
+                    f'<td><code>{_esc(str(val))}</code></td></tr>'
+                )
+        if cfg_rows:
+            parts.append(
+                '<div class="decision-section">'
+                '<div class="section-title">⚙️ 评分配置</div>'
+                '<table class="decision-table">'
+                + "\n".join(cfg_rows)
+                + '</table></div>'
+            )
+
+    # 匹配词表
+    if match_terms:
+        tags = " ".join(f'<span class="tag tag-blue">{_esc(t)}</span>' for t in match_terms)
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">🔑 匹配词表({len(match_terms)} 个)</div>'
+            f'{tags}'
+            '</div>'
+        )
+
+    # 统计概览
+    parts.append(
+        '<div class="decision-section">'
+        f'<div class="section-title">📊 审核统计</div>'
+        f'<span class="stat-pill stat-accept">入选 {accepted_cnt}</span>'
+        f'<span class="stat-pill stat-reject">淘汰 {rejected_cnt}</span>'
+        + (f'<span class="stat-pill stat-skip">跳过 {skipped_cnt}</span>' if skipped_cnt else '')
+        + '</div>'
+    )
+
+    review_table_head = (
+        '<thead><tr>'
+        '<th>#</th><th>标题</th><th>来源词</th><th>相关性</th><th>兴趣</th><th>阶段</th>'
+        '<th>发布日期</th><th>正文长度</th><th>阅读</th><th>点赞</th><th>分享</th><th>在看</th>'
+        '<th>原因</th>'
+        '</tr></thead>'
+    )
+
+    # 入选文章
+    accepted = [r for r in reviews if r.get("status") == "accept"]
+    accepted.sort(key=lambda r: int(r.get("view_count", 0) or 0), reverse=True)
+    if accepted:
+        rows = []
+        for idx, r in enumerate(accepted, 1):
+            title = _esc(r.get("title", ""))
+            url = _esc(r.get("url", ""))
+            relevance = _esc(r.get("relevance", ""))
+            interest = _esc(r.get("interest", ""))
+            reason = _esc(r.get("reason", ""))
+            phase = "LLM" if r.get("phase") == "llm" else "启发式"
+            pt = r.get("publish_time", 0)
+            pt_str = _ts_readable(pt) if pt else "未知"
+            view = r.get("view_count", 0)
+            like = r.get("like_count", 0)
+            share = r.get("share_count", 0)
+            looking = r.get("looking_count", 0)
+            body_len = r.get("body_length", "-")
+            src_kw = _esc(r.get("source_keyword", ""))
+            rows.append(
+                f'<tr class="row-accept">'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{src_kw}</code></td>'
+                f'<td class="score-cell">{relevance}</td>'
+                f'<td class="score-cell">{interest}</td>'
+                f'<td><span class="phase-badge">{phase}</span></td>'
+                f'<td class="date-cell">{_esc(pt_str)}</td>'
+                f'<td class="num-cell">{body_len}</td>'
+                f'<td class="num-cell">{view}</td>'
+                f'<td class="num-cell">{like}</td>'
+                f'<td class="num-cell">{share}</td>'
+                f'<td class="num-cell">{looking}</td>'
+                f'<td class="reason-full-cell">{reason}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">✅ 入选文章({len(accepted)} 篇)</div>'
+            '<table class="decision-table review-table">'
+            + review_table_head +
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    # 淘汰文章
+    rejected = [r for r in reviews if r.get("status") == "reject"]
+    if rejected:
+        rows = []
+        for idx, r in enumerate(rejected, 1):
+            title = _esc(r.get("title", ""))
+            url = _esc(r.get("url", ""))
+            relevance = _esc(r.get("relevance", ""))
+            interest = _esc(r.get("interest", ""))
+            reason = _esc(r.get("reason", ""))
+            phase = "LLM" if r.get("phase") == "llm" else "启发式"
+            pt = r.get("publish_time", 0)
+            pt_str = _ts_readable(pt) if pt else "未知"
+            view = r.get("view_count", 0)
+            like = r.get("like_count", 0)
+            share = r.get("share_count", 0)
+            looking = r.get("looking_count", 0)
+            body_len = r.get("body_length", "-")
+            src_kw = _esc(r.get("source_keyword", ""))
+            rows.append(
+                f'<tr class="row-reject">'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{src_kw}</code></td>'
+                f'<td class="score-cell">{relevance}</td>'
+                f'<td class="score-cell">{interest}</td>'
+                f'<td><span class="phase-badge">{phase}</span></td>'
+                f'<td class="date-cell">{_esc(pt_str)}</td>'
+                f'<td class="num-cell">{body_len}</td>'
+                f'<td class="num-cell">{view}</td>'
+                f'<td class="num-cell">{like}</td>'
+                f'<td class="num-cell">{share}</td>'
+                f'<td class="num-cell">{looking}</td>'
+                f'<td class="reason-full-cell">{reason}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">❌ 淘汰文章({len(rejected)} 篇)</div>'
+            '<table class="decision-table review-table">'
+            + review_table_head +
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    # 跳过文章
+    skipped = [r for r in reviews if r.get("status") == "skip"]
+    if skipped:
+        rows = []
+        for idx, r in enumerate(skipped, 1):
+            title = _esc(r.get("title", ""))
+            url = _esc(r.get("url", ""))
+            reason = _esc(r.get("reason", ""))
+            phase = "LLM" if r.get("phase") == "llm" else "启发式"
+            pt = r.get("publish_time", 0)
+            pt_str = _ts_readable(pt) if pt else "未知"
+            view = r.get("view_count", 0)
+            src_kw = _esc(r.get("source_keyword", ""))
+            rows.append(
+                f'<tr class="row-skip">'
+                f'<td class="num-cell">{idx}</td>'
+                f'<td class="article-title-cell"><a href="{url}" target="_blank">{title}</a></td>'
+                f'<td><code>{src_kw}</code></td>'
+                f'<td class="score-cell">-</td>'
+                f'<td class="score-cell">-</td>'
+                f'<td><span class="phase-badge">{phase}</span></td>'
+                f'<td class="date-cell">{_esc(pt_str)}</td>'
+                f'<td class="num-cell">-</td>'
+                f'<td class="num-cell">{view}</td>'
+                f'<td class="num-cell">-</td>'
+                f'<td class="num-cell">-</td>'
+                f'<td class="num-cell">-</td>'
+                f'<td class="reason-full-cell">{reason}</td>'
+                f'</tr>'
+            )
+        parts.append(
+            '<div class="decision-section">'
+            f'<div class="section-title">⏭️ 跳过文章({len(skipped)} 篇)</div>'
+            '<table class="decision-table review-table">'
+            + review_table_head +
+            '<tbody>' + "\n".join(rows) + '</tbody>'
+            '</table></div>'
+        )
+
+    parts.append(
+        f'<div class="total-line">最终入选: <b>{final_cnt}</b> 篇</div>'
+    )
+    return "\n".join(parts)
+
+
+def _render_account_precipitate(d: dict) -> str:
+    accounts = d.get("accounts", [])
+    if not accounts:
+        return '<div class="decision-section">聚合账号: 0 个</div>'
+
+    rows = []
+    for acc in accounts:
+        name = _esc(acc.get("account_name", ""))
+        count = acc.get("article_count", 0)
+        samples = acc.get("sample_articles", [])
+        sample_html = ""
+        if samples:
+            sample_html = '<div class="sample-titles">' + ", ".join(
+                _esc(s) for s in samples[:3]
+            ) + '</div>'
+        rows.append(
+            f'<tr><td><b>{name}</b></td>'
+            f'<td class="num-cell">{count} 篇</td>'
+            f'<td>{sample_html}</td></tr>'
+        )
+    return (
+        '<div class="decision-section">'
+        '<div class="section-title">👤 聚合账号</div>'
+        '<table class="decision-table acct-table">'
+        '<thead><tr><th>账号名</th><th>文章数</th><th>示例标题</th></tr></thead>'
+        '<tbody>' + "\n".join(rows) + '</tbody>'
+        '</table></div>'
+    )
+
+
+def _render_output_persist(d: dict) -> str:
+    path = d.get("output_file", "")
+    if not path:
+        return ""
+    return (
+        '<div class="decision-section">'
+        f'<div class="section-title">📄 输出文件</div>'
+        f'<code class="file-path">{_esc(path)}</code>'
+        '</div>'
+    )
+
+
+_DECISION_RENDERERS = {
+    "demand_analysis": _render_demand_analysis,
+    "content_search": _render_content_search,
+    "hard_filter": _render_hard_filter,
+    "coarse_filter": _render_coarse_filter,
+    "quality_filter": _render_quality_filter,
+    "account_precipitate": _render_account_precipitate,
+    "output_persist": _render_output_persist,
+}
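上表的实际分发调用不在本段 diff 中;下面是一个按阶段名查找渲染器、未知阶段回退空串的独立示意(`render_one` 与 `demo_renderers` 均为假设的演示封装,并非源码中的函数):

```python
def render_one(stage: str, payload: dict, renderers: dict) -> str:
    """按阶段名分发到对应渲染函数;未注册的阶段返回空串。"""
    renderer = renderers.get(stage)
    return renderer(payload) if renderer else ""

# 用最小的假设渲染器演示分发与回退
demo_renderers = {"output_persist": lambda d: f"file={d.get('output_file', '')}"}
print(render_one("output_persist", {"output_file": "a.md"}, demo_renderers))  # → file=a.md
print(repr(render_one("unknown_stage", {}, demo_renderers)))  # → ''
```

`dict.get` 回退保证新增阶段在注册渲染器之前不会让整个报告渲染失败。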
+
+
+# ─────────────────────────────────────────────────────────────
+# LLM 交互追踪渲染
+# ─────────────────────────────────────────────────────────────
+
+def _render_llm_interactions(interactions: list[dict]) -> str:
+    """渲染阶段内所有 LLM 交互记录(折叠卡片,展示思考过程)。"""
+    if not interactions:
+        return ""
+
+    sections: list[str] = []
+    for idx, ix in enumerate(interactions, 1):
+        name = _esc(ix.get("name", "LLM 调用"))
+        model = _esc(ix.get("model", ""))
+        duration_ms = ix.get("duration_ms", 0)
+        tokens = ix.get("tokens", 0)
+        dur_label = _duration_label(duration_ms)
+
+        parts: list[str] = []
+
+        # 输入 Prompt
+        messages = ix.get("messages", [])
+        for msg in messages:
+            role = msg.get("role", "")
+            content = msg.get("content", "")
+            if not content:
+                continue
+            content_str = content if isinstance(content, str) else str(content)
+            role_icon = {"system": "📜", "user": "👤", "assistant": "🤖"}.get(role, "💬")
+            role_label = {"system": "System Prompt", "user": "User Prompt", "assistant": "Assistant"}.get(role, role)
+            if len(content_str) > 500:
+                parts.append(
+                    f'<details class="llm-msg llm-msg-{_esc(role)}">'
+                    f'<summary>{role_icon} {role_label} ({len(content_str)} 字)</summary>'
+                    f'<pre class="llm-msg-pre">{_esc(content_str)}</pre>'
+                    f'</details>'
+                )
+            else:
+                parts.append(
+                    f'<div class="llm-msg llm-msg-{_esc(role)}">'
+                    f'<div class="llm-msg-header">{role_icon} {role_label}</div>'
+                    f'<pre class="llm-msg-pre">{_esc(content_str)}</pre>'
+                    f'</div>'
+                )
+
+        # 推理过程(reasoning)
+        reasoning = ix.get("reasoning", "")
+        if reasoning:
+            parts.append(
+                f'<details class="llm-reasoning" open>'
+                f'<summary>🧠 LLM 推理过程 ({len(reasoning)} 字)</summary>'
+                f'<pre class="llm-msg-pre llm-reasoning-text">{_esc(reasoning)}</pre>'
+                f'</details>'
+            )
+
+        # 工具调用
+        tool_calls = ix.get("tool_calls") or []
+        for tc in tool_calls:
+            tool_name = _esc(tc.get("tool_name", ""))
+            args = tc.get("arguments", "")
+            try:
+                args_obj = json.loads(args) if isinstance(args, str) else args
+                args_formatted = json.dumps(args_obj, ensure_ascii=False, indent=2)
+            except Exception:
+                args_formatted = str(args)
+            result_preview = tc.get("result_preview", "")
+            parts.append(
+                f'<div class="llm-tool-call">'
+                f'<div class="llm-tool-name">🔧 <code>{tool_name}</code></div>'
+                f'<pre class="llm-tool-args">{_esc(args_formatted)}</pre>'
+            )
+            if result_preview:
+                parts.append(
+                    f'<details class="llm-tool-result">'
+                    f'<summary>📤 返回结果 ({len(result_preview)} 字)</summary>'
+                    f'<pre class="llm-msg-pre">{_esc(result_preview)}</pre>'
+                    f'</details>'
+                )
+            parts.append('</div>')
+
+        # LLM 回复
+        response_text = ix.get("response_text", "")
+        if response_text:
+            if len(response_text) > 2000:
+                parts.append(
+                    f'<details class="llm-response">'
+                    f'<summary>💬 LLM 回复 ({len(response_text)} 字)</summary>'
+                    f'<pre class="llm-msg-pre">{_esc(response_text)}</pre>'
+                    f'</details>'
+                )
+            else:
+                parts.append(
+                    f'<div class="llm-response">'
+                    f'<div class="llm-msg-header">💬 LLM 回复</div>'
+                    f'<pre class="llm-msg-pre">{_esc(response_text)}</pre>'
+                    f'</div>'
+                )
+
+        inner = "\n".join(parts)
+        meta_parts = []
+        if model:
+            meta_parts.append(f'<code>{model}</code>')
+        if dur_label:
+            meta_parts.append(dur_label)
+        if tokens:
+            meta_parts.append(f'{tokens} tokens')
+        meta_html = " · ".join(meta_parts)
+
+        sections.append(
+            f'<details class="llm-interaction-card" open>'
+            f'<summary>🧪 LLM 交互 #{idx}: {name}'
+            f'<span class="llm-interaction-meta"> ({meta_html})</span>'
+            f'</summary>'
+            f'<div class="llm-interaction-body">{inner}</div>'
+            f'</details>'
+        )
+
+    return "\n".join(sections)
+
+
+# ─────────────────────────────────────────────────────────────
+# Agent Trace 渲染
+# ─────────────────────────────────────────────────────────────
+
+def _load_agent_trace(trace_id: str) -> tuple[list[dict], dict]:
+    """读取 agent 子任务的 events.jsonl 和 meta.json。"""
+    agent_dir = TRACES_DIR / trace_id
+    events: list[dict] = []
+    meta: dict = {}
+    events_path = agent_dir / "events.jsonl"
+    meta_path = agent_dir / "meta.json"
+    if events_path.exists():
+        events = read_jsonl(events_path)
+    if meta_path.exists():
+        try:
+            meta = json.loads(meta_path.read_text(encoding="utf-8"))
+        except Exception:
+            pass
+    return events, meta
+
+
+def _render_agent_meta(meta: dict) -> str:
+    """渲染 agent 元信息摘要。"""
+    task = meta.get("task", "")
+    model = meta.get("model", "")
+    status = meta.get("status", "")
+    total_tokens = meta.get("total_tokens", 0)
+    total_cost = meta.get("total_cost", 0)
+    created = meta.get("created_at", "")[:19]
+    completed = meta.get("completed_at", "")[:19]
+    duration = _calc_duration(created, completed) if created and completed else "N/A"
+
+    status_color = "var(--green)" if status == "completed" else "var(--red)"
+    return (
+        f'<div class="agent-meta">'
+        f'<span class="agent-meta-item">任务: <b>{_esc(task)}</b></span>'
+        f'<span class="agent-meta-item">模型: <code>{_esc(model)}</code></span>'
+        f'<span class="agent-meta-item">状态: <span style="color:{status_color}">{_esc(status)}</span></span>'
+        f'<span class="agent-meta-item">耗时: {_esc(duration)}</span>'
+        f'<span class="agent-meta-item">Tokens: {total_tokens}</span>'
+        f'<span class="agent-meta-item">Cost: ${total_cost:.4f}</span>'
+        f'</div>'
+    )
+
+
+def _render_agent_message(msg: dict) -> str:
+    """渲染单条 agent 消息。"""
+    role = msg.get("role", "")
+    content = msg.get("content", "")
+    tool_call_id = msg.get("tool_call_id", "")
+    ts = msg.get("created_at", "")[:19]
+    tokens = msg.get("tokens", 0)
+
+    if role == "system":
+        # System prompt 默认折叠,summary 中标注总字数
+        text = content if isinstance(content, str) else str(content)
+        return (
+            f'<details class="agent-msg agent-msg-system">'
+            f'<summary>📜 System Prompt ({len(text)} 字)</summary>'
+            f'<pre class="agent-msg-pre">{_esc(text)}</pre>'
+            f'</details>'
+        )
+
+    elif role == "user":
+        text = content if isinstance(content, str) else str(content)
+        # User prompt 折叠,如果太长
+        if len(text) > 500:
+            return (
+                f'<details class="agent-msg agent-msg-user">'
+                f'<summary>👤 User Prompt ({len(text)} 字)</summary>'
+                f'<pre class="agent-msg-pre">{_esc(text)}</pre>'
+                f'</details>'
+            )
+        return (
+            f'<div class="agent-msg agent-msg-user">'
+            f'<div class="agent-msg-header">👤 User</div>'
+            f'<pre class="agent-msg-pre">{_esc(text)}</pre>'
+            f'</div>'
+        )
+
+    elif role == "assistant":
+        parts: list[str] = []
+
+        # 提取文本和工具调用
+        if isinstance(content, dict):
+            text = content.get("text", "") or ""
+            tool_calls = content.get("tool_calls", []) or []
+            reasoning = content.get("reasoning_content", "") or ""
+        else:
+            text = str(content or "")
+            tool_calls = []
+            reasoning = ""
+
+        token_badge = f'<span class="agent-token-badge">{tokens} tokens</span>' if tokens else ""
+
+        # 推理内容
+        if reasoning:
+            parts.append(
+                f'<details class="agent-reasoning">'
+                f'<summary>🧠 推理过程</summary>'
+                f'<pre class="agent-msg-pre">{_esc(reasoning)}</pre>'
+                f'</details>'
+            )
+
+        # 文本回复
+        if text:
+            # 过长文本折叠显示
+            if len(text) > 2000:
+                parts.append(
+                    f'<details class="agent-thinking">'
+                    f'<summary>💭 回复 ({len(text)} 字)</summary>'
+                    f'<pre class="agent-msg-pre">{_esc(text)}</pre>'
+                    f'</details>'
+                )
+            else:
+                parts.append(
+                    f'<div class="agent-thinking">'
+                    f'<pre class="agent-msg-pre">{_esc(text)}</pre>'
+                    f'</div>'
+                )
+
+        # 工具调用
+        for tc in tool_calls:
+            fn = tc.get("function", {})
+            fn_name = fn.get("name", "?")
+            fn_args = fn.get("arguments", "")
+            # 格式化 JSON 参数
+            try:
+                args_obj = json.loads(fn_args) if isinstance(fn_args, str) else fn_args
+                fn_args_formatted = json.dumps(args_obj, ensure_ascii=False, indent=2)
+            except Exception:
+                fn_args_formatted = str(fn_args)
+            parts.append(
+                f'<div class="agent-tool-call">'
+                f'<div class="agent-tool-name">🔧 <code>{_esc(fn_name)}</code></div>'
+                f'<pre class="agent-tool-args">{_esc(fn_args_formatted)}</pre>'
+                f'</div>'
+            )
+
+        inner = "\n".join(parts)
+        return (
+            f'<div class="agent-msg agent-msg-assistant">'
+            f'<div class="agent-msg-header">🤖 Assistant {token_badge}</div>'
+            f'{inner}'
+            f'</div>'
+        )
+
+    elif role == "tool":
+        # 工具返回
+        if isinstance(content, dict):
+            tool_name = content.get("tool_name", "")
+            result = content.get("result", "")
+        else:
+            tool_name = ""
+            result = str(content or "")
+
+        result_str = str(result)
+        if len(result_str) > 1500:
+            return (
+                f'<details class="agent-msg agent-msg-tool">'
+                f'<summary>📤 {_esc(tool_name)} 返回 ({len(result_str)} 字)</summary>'
+                f'<pre class="agent-msg-pre">{_esc(result_str)}</pre>'
+                f'</details>'
+            )
+        return (
+            f'<div class="agent-msg agent-msg-tool">'
+            f'<div class="agent-msg-header">📤 {_esc(tool_name)}</div>'
+            f'<pre class="agent-msg-pre">{_esc(result_str)}</pre>'
+            f'</div>'
+        )
+
+    return ""
+
+
+def _render_agent_trace_section(agent_trace_ids: list[str]) -> str:
+    """渲染阶段内所有 agent 子任务的执行详情(折叠卡片)。"""
+    if not agent_trace_ids:
+        return ""
+
+    sections: list[str] = []
+    for tid in agent_trace_ids:
+        agent_events, meta = _load_agent_trace(tid)
+        if not agent_events and not meta:
+            continue
+
+        task_name = meta.get("task", tid[:8])
+        total_tokens = meta.get("total_tokens", 0)
+        total_cost = meta.get("total_cost", 0)
+
+        # 渲染元信息
+        meta_html = _render_agent_meta(meta) if meta else ""
+
+        # 渲染消息序列
+        msg_blocks: list[str] = []
+        for ev in agent_events:
+            if ev.get("event") == "message_added":
+                msg = ev.get("message", {})
+                rendered = _render_agent_message(msg)
+                if rendered:
+                    msg_blocks.append(rendered)
+            elif ev.get("event") == "goal_added":
+                goal = ev.get("goal", {})
+                goal_desc = goal.get("description", "")
+                msg_blocks.append(
+                    f'<div class="agent-msg agent-msg-goal">'
+                    f'🎯 目标创建: <b>{_esc(goal_desc)}</b>'
+                    f'</div>'
+                )
+
+        messages_html = "\n".join(msg_blocks)
+        sections.append(
+            f'<details class="agent-trace-card">'
+            f'<summary>🤖 Agent 执行详情: {_esc(task_name)}'
+            f'<span class="agent-trace-summary"> ({total_tokens} tokens · ${total_cost:.4f})</span>'
+            f'</summary>'
+            f'<div class="agent-trace-body">'
+            f'{meta_html}'
+            f'<div class="agent-messages">{messages_html}</div>'
+            f'</div>'
+            f'</details>'
+        )
+
+    return "\n".join(sections)
+
+
+def _parse_log_lines(log_text: str) -> list[dict]:
+    """将 full_log.log 文本解析为结构化行列表,供 HTML 渲染。"""
+    lines: list[dict] = []
+    for raw in log_text.splitlines():
+        level = "STDOUT"
+        if "| DEBUG   |" in raw:
+            level = "DEBUG"
+        elif "| INFO    |" in raw:
+            level = "INFO"
+        elif "| WARNING |" in raw:
+            level = "WARNING"
+        elif "| ERROR   |" in raw:
+            level = "ERROR"
+        elif "| CRITICAL|" in raw:
+            level = "CRITICAL"
+        lines.append({"level": level, "text": raw})
+    return lines
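上面的级别判定依赖 loguru 默认的 8 字符宽度级别列(`{level: <8}`);一个与之等价的独立示意(样例日志行为假设数据):

```python
def parse_log_lines(log_text: str) -> list[dict]:
    """与 _parse_log_lines 等价的独立版本:按 loguru 级别列标记归类,无标记视为 STDOUT。"""
    markers = [
        ("| DEBUG   |", "DEBUG"),
        ("| INFO    |", "INFO"),
        ("| WARNING |", "WARNING"),
        ("| ERROR   |", "ERROR"),
        ("| CRITICAL|", "CRITICAL"),
    ]
    result = []
    for raw in log_text.splitlines():
        level = next((lv for mark, lv in markers if mark in raw), "STDOUT")
        result.append({"level": level, "text": raw})
    return result

sample = (
    "2024-01-01 12:00:01 | INFO    | pipeline start\n"
    "plain print output\n"
    "2024-01-01 12:00:02 | ERROR   | stage failed"
)
print([ln["level"] for ln in parse_log_lines(sample)])  # → ['INFO', 'STDOUT', 'ERROR']
```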
+
+
+def _render_full_log_section(log_lines: list[dict]) -> str:
+    """生成完整日志面板的 HTML(带过滤、搜索、行号、颜色)。"""
+    if not log_lines:
+        return ""
+
+    level_counts = {}
+    for ln in log_lines:
+        level_counts[ln["level"]] = level_counts.get(ln["level"], 0) + 1
+
+    rows: list[str] = []
+    for idx, ln in enumerate(log_lines, 1):
+        lvl = ln["level"].lower()
+        text = _esc(ln["text"])
+        rows.append(
+            f'<tr class="log-row log-{lvl}" data-level="{ln["level"]}">'
+            f'<td class="log-ln">{idx}</td>'
+            f'<td class="log-txt">{text}</td>'
+            f'</tr>'
+        )
+
+    filter_buttons: list[str] = []
+    filter_buttons.append(
+        f'<button class="log-btn log-btn-active" data-filter="ALL">ALL ({len(log_lines)})</button>'
+    )
+    for lvl in ("DEBUG", "INFO", "WARNING", "ERROR", "STDOUT"):
+        cnt = level_counts.get(lvl, 0)
+        if cnt:
+            filter_buttons.append(
+                f'<button class="log-btn" data-filter="{lvl}">{lvl} ({cnt})</button>'
+            )
+
+    return (
+        '<div class="log-panel">'
+        '<h2 class="log-title">📋 完整执行日志</h2>'
+        '<div class="log-toolbar">'
+        '<div class="log-filters">' + "".join(filter_buttons) + '</div>'
+        '<input type="text" class="log-search" placeholder="🔍 搜索日志..." id="logSearch" />'
+        '</div>'
+        '<div class="log-container">'
+        '<table class="log-table"><tbody id="logBody">'
+        + "\n".join(rows) +
+        '</tbody></table>'
+        '</div>'
+        '</div>'
+    )
+
+
+_LOG_VIEWER_CSS = """
+/* ── 日志面板 ── */
+.log-panel { margin-top:32px; border:1px solid var(--border); border-radius:10px; background:var(--bg2); overflow:hidden; }
+.log-title { font-size:18px; color:var(--blue); padding:16px 20px 0; margin:0; letter-spacing:-0.3px; }
+.log-toolbar { display:flex; flex-wrap:wrap; gap:8px; align-items:center; padding:12px 20px; border-bottom:1px solid var(--border); }
+.log-filters { display:flex; gap:4px; flex-wrap:wrap; }
+.log-btn {
+  background:rgba(139,148,158,.08); border:1px solid var(--border); border-radius:6px;
+  padding:3px 10px; font-size:11px; color:var(--dim); cursor:pointer; transition:all .15s;
+}
+.log-btn:hover { background:rgba(88,166,255,.1); color:var(--blue); border-color:rgba(88,166,255,.3); }
+.log-btn-active { background:rgba(88,166,255,.15); color:var(--blue); border-color:rgba(88,166,255,.4); }
+.log-search {
+  flex:1; min-width:180px; background:var(--bg); border:1px solid var(--border); border-radius:6px;
+  padding:4px 10px; font-size:12px; color:var(--text); outline:none;
+}
+.log-search:focus { border-color:var(--blue); }
+.log-container { max-height:calc(100vh - 200px); overflow-y:auto; }
+.log-table { width:100%; border-collapse:collapse; font-family:"SF Mono",Monaco,Menlo,monospace; font-size:11px; }
+.log-row { border-bottom:1px solid rgba(33,38,45,.3); }
+.log-row.log-hidden { display:none; }
+.log-ln { width:50px; text-align:right; padding:2px 8px 2px 4px; color:rgba(139,148,158,.4); user-select:none; vertical-align:top; }
+.log-txt { padding:2px 8px; white-space:pre-wrap; word-break:break-all; line-height:1.5; }
+.log-debug .log-txt { color:var(--dim); }
+.log-info .log-txt { color:var(--text); }
+.log-warning .log-txt { color:var(--yellow); }
+.log-error .log-txt, .log-critical .log-txt { color:var(--red); }
+.log-stdout .log-txt { color:var(--purple); }
+.log-highlight { background:rgba(227,179,65,.15); }
+"""
+
+_LOG_VIEWER_JS = """
+<script>
+(function(){
+  var activeFilter = 'ALL';
+  var searchTerm = '';
+
+  function applyFilters(){
+    var rows = document.querySelectorAll('.log-row');
+    var term = searchTerm.toLowerCase();
+    rows.forEach(function(row){
+      var level = row.getAttribute('data-level');
+      var text = row.querySelector('.log-txt').textContent.toLowerCase();
+      var matchLevel = (activeFilter === 'ALL' || level === activeFilter);
+      var matchSearch = (!term || text.indexOf(term) !== -1);
+      if(matchLevel && matchSearch){
+        row.classList.remove('log-hidden');
+        if(term){
+          row.classList.add('log-highlight');
+        } else {
+          row.classList.remove('log-highlight');
+        }
+      } else {
+        row.classList.add('log-hidden');
+        row.classList.remove('log-highlight');
+      }
+    });
+  }
+
+  document.querySelectorAll('.log-btn').forEach(function(btn){
+    btn.addEventListener('click', function(){
+      document.querySelectorAll('.log-btn').forEach(function(b){ b.classList.remove('log-btn-active'); });
+      btn.classList.add('log-btn-active');
+      activeFilter = btn.getAttribute('data-filter');
+      applyFilters();
+    });
+  });
+
+  var searchInput = document.getElementById('logSearch');
+  if(searchInput){
+    var debounce;
+    searchInput.addEventListener('input', function(){
+      clearTimeout(debounce);
+      debounce = setTimeout(function(){
+        searchTerm = searchInput.value;
+        applyFilters();
+      }, 200);
+    });
+  }
+})();
+</script>
+"""
+
+
+def render_html(events: list[dict], full_log_lines: list[dict] | None = None) -> str:
+    init_ev = next((e for e in events if e["type"] == "init"), None)
+    complete_ev = next((e for e in events if e["type"] == "complete"), None)
+
+    query = init_ev.get("query", "") if init_ev else ""
+    model = init_ev.get("model", "") if init_ev else ""
+    demand_id = init_ev.get("demand_id", "") if init_ev else ""
+    trace_id = init_ev.get("trace_id", "") if init_ev else ""
+    target_count = init_ev.get("target_count", 0) if init_ev else 0
+
+    timestamps = [e.get("ts", "") for e in events if e.get("ts")]
+    start_ts = timestamps[0] if timestamps else ""
+    end_ts = timestamps[-1] if timestamps else ""
+    duration = _calc_duration(start_ts, end_ts) if start_ts and end_ts else "N/A"
+
+    stage_completes = [e for e in events if e["type"] == "stage_complete"]
+    gate_checks = [e for e in events if e["type"] == "gate_check"]
+    error_events = [e for e in events if e["type"] == "error"]
+
+    final_stats = complete_ev.get("stats", {}) if complete_ev else {}
+    candidate_count = final_stats.get("candidate_count", 0)
+    filtered_count = final_stats.get("filtered_count", 0)
+    account_count = final_stats.get("account_count", 0)
+    output_file = complete_ev.get("output_file", "") if complete_ev else ""
+    pipeline_status = complete_ev.get("status", "unknown") if complete_ev else "running"
+    ordered_stages = list(_STAGE_LABELS.keys())
+    stage_order_map = {name: idx + 1 for idx, name in enumerate(ordered_stages)}
+
+    # ── 构建事件块 ──
+    blocks: list[str] = []
+
+    for ev in events:
+        t = ev["type"]
+        ts = _ts(ev.get("ts", ""))
+
+        if t == "init":
+            # 基础信息
+            init_html_parts = [
+                f'<div class="ev-init">',
+                f'<div class="ev-label">🚀 Pipeline 启动</div>',
+                f'<div class="kv"><span class="k">查询词</span><span class="v">{_esc(query)}</span></div>',
+                f'<div class="kv"><span class="k">模型</span><span class="v">{_esc(model)}</span></div>',
+                f'<div class="kv"><span class="k">需求ID</span><span class="v">{_esc(str(demand_id))}</span></div>',
+                f'<div class="kv"><span class="k">目标条数</span><span class="v">{target_count}</span></div>',
+                f'<div class="kv"><span class="k">trace_id</span><span class="v mono">{_esc(trace_id)}</span></div>',
+                f'<div class="kv"><span class="k">时间</span><span class="v">{_esc(ts)}</span></div>',
+            ]
+
+            # 执行计划(来自 Harness)
+            rp = ev.get("run_plan")
+            if rp:
+                init_html_parts.append('<hr style="border-color:#444; margin:12px 0;">')
+                init_html_parts.append('<div class="ev-label" style="margin-top:4px;">📋 执行计划</div>')
+                init_html_parts.append(
+                    f'<div class="kv"><span class="k">超时上限</span>'
+                    f'<span class="v">{rp.get("timeout_seconds", "N/A")} 秒</span></div>'
+                )
+                init_html_parts.append(
+                    f'<div class="kv"><span class="k">目标文章上限</span>'
+                    f'<span class="v">{rp.get("max_target_count", "N/A")} 篇</span></div>'
+                )
+                init_html_parts.append(
+                    f'<div class="kv"><span class="k">最大补召回轮次</span>'
+                    f'<span class="v">{rp.get("max_fallback_rounds", "N/A")} 轮</span></div>'
+                )
+                plan_stages = rp.get("stages", [])
+                if plan_stages:
+                    init_html_parts.append(
+                        '<div style="margin-top:8px; font-size:13px; color:#aaa;">阶段规划:</div>'
+                    )
+                    for idx, ps in enumerate(plan_stages, 1):
+                        sname = _esc(ps.get("name", ""))
+                        slabel = _esc(ps.get("label", ""))
+                        gate = ps.get("gate", "")
+                        gate_html = (
+                            f' <span style="color:#e0a040; font-size:12px;">'
+                            f'└─ Gate: {_esc(gate)}</span>'
+                        ) if gate else ""
+                        init_html_parts.append(
+                            f'<div style="padding:2px 0 2px 16px; font-size:13px;">'
+                            f'<span style="color:#6cf;">{idx}.</span> '
+                            f'<code style="color:#8be9fd;">{sname}</code> '
+                            f'<span style="color:#bbb;">← {slabel}</span>'
+                            f'{gate_html}</div>'
+                        )
+
+            init_html_parts.append('</div>')
+            blocks.append("\n".join(init_html_parts))
+
+        elif t == "stage_start":
+            stage = ev.get("stage", "")
+            icon = ev.get("icon", "▶")
+            label = _STAGE_LABELS.get(stage, stage)
+            stage_no = stage_order_map.get(stage, 0)
+            blocks.append(
+                f'<div class="ev-phase">'
+                f'<span class="ts">{_esc(ts)}</span> '
+                f'{icon} <h1 class="stage-h1">{stage_no}. {_esc(label)}</h1>'
+                f'<span class="stage-name"> ({_esc(stage)})</span>'
+                f'</div>'
+            )
+
+        elif t == "stage_complete":
+            stage = ev.get("stage", "")
+            icon = ev.get("icon", "▶")
+            label = _STAGE_LABELS.get(stage, stage)
+            attempt = ev.get("attempt", 1)
+            dur = _duration_label(ev.get("duration_ms"))
+            stats = ev.get("stats", {})
+            decisions = ev.get("decisions", {})
+            agent_trace_ids = ev.get("agent_trace_ids", [])
+            llm_interactions = ev.get("llm_interactions", [])
+            attempt_badge = f'<span class="badge-retry">重试#{attempt}</span>' if attempt > 1 else ""
+            dur_badge = f'<span class="badge-dur">{_esc(dur)}</span>' if dur else ""
+            stats_html = (
+                f'<span class="stat-pill">候选 {stats.get("candidate_count", 0)}</span>'
+                f'<span class="stat-pill">入选 {stats.get("filtered_count", 0)}</span>'
+                f'<span class="stat-pill">账号 {stats.get("account_count", 0)}</span>'
+            )
+            decisions_html = _render_decisions(stage, decisions)
+            agent_html = _render_agent_trace_section(agent_trace_ids)
+            llm_html = _render_llm_interactions(llm_interactions)
+            blocks.append(
+                f'<div class="ev-stage-ok">'
+                f'<span class="ts">{_esc(ts)}</span> '
+                f'✅ {icon} <b>{_esc(label)}</b> 完成 '
+                f'{attempt_badge}{dur_badge}'
+                f'<div class="stage-stats">{stats_html}</div>'
+                f'{llm_html}'
+                f'{decisions_html}'
+                f'{agent_html}'
+                f'</div>'
+            )
+
+        elif t == "gate_check":
+            gate = ev.get("gate", "")
+            gate_label = _GATE_LABELS.get(gate, gate)
+            passed = ev.get("passed", False)
+            action = ev.get("action", "proceed")
+            issues = ev.get("issues", [])
+            fallback = ev.get("fallback_stage", "")
+            icon = ev.get("icon", "🚦")
+            action_color = _ACTION_COLORS.get(action, "var(--dim)")
+            status_icon = "✅" if passed else "⚠️"
+            issues_html = ""
+            if issues:
+                issues_html = (
+                    '<ul class="gate-issues">'
+                    + "".join(f"<li>{_esc(i)}</li>" for i in issues)
+                    + "</ul>"
+                )
+            fallback_html = (
+                f'<span class="gate-fallback">→ 回退到 <b>{_esc(fallback)}</b></span>'
+                if fallback else ""
+            )
+            cls = "ev-gate-ok" if passed else "ev-gate-warn"
+            blocks.append(
+                f'<div class="{cls}">'
+                f'<span class="ts">{_esc(ts)}</span> '
+                f'{status_icon} {icon} <b>{_esc(gate_label)}</b> '
+                f'<span class="gate-action" style="color:{action_color}">[{_esc(action)}]</span>'
+                f'{fallback_html}'
+                f'{issues_html}'
+                f'</div>'
+            )
+
+        elif t == "error":
+            stage = ev.get("stage", "")
+            msg = ev.get("msg", "")
+            blocks.append(
+                f'<details class="ev-error" open>'
+                f'<summary>❌ 错误 @ {_esc(stage)}: {_esc(msg[:200])}</summary>'
+                f'<pre>{_esc(msg)}</pre>'
+                f'</details>'
+            )
+
+        elif t == "complete":
+            status = ev.get("status", "unknown")
+            stats = ev.get("stats", {})
+            stage_count = ev.get("stage_count", 0)
+            err_count = ev.get("error_count", 0)
+            cls = "ev-complete-ok" if status == "completed" else "ev-complete-fail"
+            out_html = (
+                f'<div class="kv"><span class="k">输出文件</span>'
+                f'<span class="v mono">{_esc(output_file)}</span></div>'
+                if output_file else ""
+            )
+            blocks.append(
+                f'<div class="{cls}">'
+                f'<div class="ev-label">🏁 Pipeline 结束</div>'
+                f'<div class="kv"><span class="k">状态</span><span class="v">{_esc(status)}</span></div>'
+                f'<div class="kv"><span class="k">trace_id</span><span class="v mono">{_esc(trace_id)}</span></div>'
+                f'<div class="kv"><span class="k">阶段数</span><span class="v">{stage_count}</span></div>'
+                f'<div class="kv"><span class="k">错误数</span><span class="v">{err_count}</span></div>'
+                f'<div class="kv"><span class="k">候选文章</span><span class="v">{stats.get("candidate_count", 0)}</span></div>'
+                f'<div class="kv"><span class="k">入选文章</span><span class="v">{stats.get("filtered_count", 0)}</span></div>'
+                f'<div class="kv"><span class="k">账号数</span><span class="v">{stats.get("account_count", 0)}</span></div>'
+                f'{out_html}'
+                f'</div>'
+            )
+
+    body = "\n".join(blocks)
+    status_color = "var(--green)" if pipeline_status == "completed" else "var(--red)"
+    completed_stage_names = {e.get("stage", "") for e in stage_completes}
+    errored_stage_names = {e.get("stage", "") for e in error_events}
+    flow_items: list[str] = []
+    for stage_key in ordered_stages:
+        label = _STAGE_LABELS.get(stage_key, stage_key)
+        if stage_key in errored_stage_names:
+            cls = "flow-step flow-error"
+            icon = "✖"
+        elif stage_key in completed_stage_names:
+            cls = "flow-step flow-done"
+            icon = "✔"
+        else:
+            cls = "flow-step flow-pending"
+            icon = "○"
+        stage_no = stage_order_map.get(stage_key, 0)
+        flow_items.append(
+            f'<div class="{cls}"><span class="flow-icon">{icon}</span>'
+            f'<h1 class="flow-h1">{stage_no}. {_esc(label)}</h1></div>'
+        )
+    flow_html = "".join(flow_items)
+
+    return f"""<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Pipeline 执行追踪 — {_esc(query[:40])}</title>
+<style>
+:root {{
+  --bg: #0b1020; --bg2: #121a2b; --bg3: #1a2336; --border: #263149;
+  --text: #d9e1f2; --dim: #94a3bf; --blue: #66b3ff;
+  --purple: #c9a4ff; --green: #53d79f; --red: #ff6b7a;
+  --yellow: #f2c35d; --orange: #ff9f5a; --green-bg: #132d28;
+  --shadow: 0 12px 34px rgba(0,0,0,.28);
+}}
+* {{ margin:0; padding:0; box-sizing:border-box; }}
+body {{
+  font-family: -apple-system,"SF Pro Text","Segoe UI","Helvetica Neue",sans-serif;
+  background:
+    radial-gradient(1200px 520px at 10% -10%, rgba(102,179,255,.11), transparent 50%),
+    radial-gradient(900px 500px at 90% 0%, rgba(201,164,255,.10), transparent 45%),
+    var(--bg);
+  color:var(--text); line-height:1.7;
+  margin:0; padding:24px 16px;
+}}
+.page-wrap {{
+  width: 80vw;
+  max-width: 1920px;
+  margin: 0 auto;
+}}
+.page-nav {{
+  position:sticky; top:0; z-index:20; backdrop-filter: blur(6px);
+  background:rgba(10,15,28,.72); border:1px solid var(--border); border-radius:12px;
+  box-shadow: var(--shadow);
+  padding:8px 10px; margin-bottom:14px; display:flex; gap:8px; flex-wrap:wrap;
+}}
+.page-nav a {{
+  color:var(--dim); text-decoration:none; font-size:12px; padding:5px 12px;
+  border-radius:999px; border:1px solid rgba(148,163,191,.2); background:rgba(18,26,43,.92);
+}}
+.page-nav a:hover {{ color:var(--blue); border-color:rgba(102,179,255,.45); background:rgba(102,179,255,.08); }}
+header {{
+  border:1px solid var(--border); border-radius:14px;
+  background:linear-gradient(180deg, rgba(18,26,43,.95), rgba(18,26,43,.78));
+  box-shadow: var(--shadow);
+  padding:18px 20px; margin-bottom:20px;
+  display:flex; justify-content:space-between; align-items:flex-end; flex-wrap:wrap; gap:12px;
+}}
+header h1 {{ color:var(--blue); font-size:24px; letter-spacing:-0.5px; margin:0; }}
+header .sub {{ color:var(--dim); font-size:12px; margin-top:4px; }}
+header .sub span {{ margin:0 6px; }}
+.stats {{
+  display:grid; grid-template-columns:repeat(5,1fr);
+  gap:12px; margin-bottom:24px;
+}}
+.stat {{
+  background:linear-gradient(180deg, rgba(18,26,43,.95), rgba(18,26,43,.75));
+  border:1px solid var(--border);
+  border-radius:12px; padding:14px 16px; text-align:center;
+  box-shadow: var(--shadow);
+}}
+.stat .num {{ font-size:28px; font-weight:700; line-height:1.2; }}
+.stat .desc {{ font-size:11px; color:var(--dim); margin-top:4px; text-transform:uppercase; letter-spacing:0.5px; }}
+.s-time .num {{ color:var(--yellow); font-size:20px; }}
+.s-cand .num {{ color:var(--blue); }}
+.s-filt .num {{ color:var(--green); }}
+.s-acct .num {{ color:var(--purple); }}
+.s-err .num {{ color:var(--red); }}
+.section-title-bar {{
+  display:flex; align-items:center; justify-content:space-between;
+  margin:22px 0 10px;
+}}
+.section-title-bar h2 {{
+  margin:0; font-size:15px; color:var(--text); font-weight:700; letter-spacing:.2px;
+}}
+.section-title-bar .hint {{ color:var(--dim); font-size:11px; }}
+.flow-strip {{
+  display:grid; grid-template-columns:repeat(auto-fit,minmax(160px,1fr));
+  gap:8px; margin:8px 0 18px;
+}}
+.flow-step {{
+  background:linear-gradient(180deg, rgba(18,26,43,.92), rgba(18,26,43,.75));
+  border:1px solid var(--border); border-radius:10px;
+  padding:9px 11px; font-size:12px; display:flex; gap:8px; align-items:center;
+  box-shadow: 0 4px 16px rgba(0,0,0,.18);
+}}
+.flow-h1 {{
+  margin:0; font-size:16px; font-weight:700; letter-spacing:.1px; color:var(--text);
+}}
+.flow-icon {{ font-size:11px; opacity:.95; }}
+.flow-done {{ border-color:rgba(86,211,100,.45); background:rgba(86,211,100,.08); }}
+.flow-done .flow-icon {{ color:var(--green); }}
+.flow-error {{ border-color:rgba(248,81,73,.55); background:rgba(248,81,73,.10); }}
+.flow-error .flow-icon {{ color:var(--red); }}
+.flow-pending {{ border-color:rgba(139,148,158,.35); }}
+.timeline {{
+  position:relative; padding:12px 12px 12px 28px;
+  border:1px solid var(--border); border-radius:14px;
+  background:linear-gradient(180deg, rgba(18,26,43,.90), rgba(18,26,43,.72));
+  box-shadow: var(--shadow);
+}}
+.timeline::before {{
+  content:''; position:absolute; left:8px; top:0; bottom:0;
+  width:2px; background:linear-gradient(180deg, var(--blue) 0%, var(--green) 50%, var(--purple) 100%);
+  opacity:0.4;
+}}
+.ts {{ font-size:10px; color:var(--dim); margin-right:6px; }}
+.ev-label {{ font-weight:600; margin-bottom:6px; }}
+.kv {{ font-size:13px; margin:2px 0; }}
+.kv .k {{ color:var(--dim); margin-right:8px; }}
+.kv .k::after {{ content:':'; }}
+.kv .v {{ color:var(--text); }}
+.mono {{ font-family:monospace; font-size:11px; color:var(--dim); }}
+.ev-init, .ev-complete-ok, .ev-complete-fail {{
+  background:rgba(16,24,41,.85); border:1px solid var(--border);
+  border-radius:12px; padding:14px 16px; margin:12px 0;
+}}
+.ev-init {{ border-left:4px solid var(--blue); }}
+.ev-init .ev-label {{ color:var(--blue); }}
+.ev-complete-ok {{ border-left:4px solid var(--green); }}
+.ev-complete-ok .ev-label {{ color:var(--green); }}
+.ev-complete-fail {{ border-left:4px solid var(--red); }}
+.ev-complete-fail .ev-label {{ color:var(--red); }}
+.ev-phase {{
+  background:linear-gradient(135deg,#1f3f67,#1a2b47);
+  border:1px solid rgba(88,166,255,.3); border-radius:10px;
+  padding:11px 14px; margin:18px 0 8px;
+  font-size:15px; font-weight:700; color:var(--blue);
+  position:relative; display:flex; align-items:center; gap:8px;
+}}
+.ev-phase::before {{
+  content:''; position:absolute; left:-20px; top:14px;
+  width:10px; height:10px; background:var(--blue);
+  border-radius:50%; border:2px solid var(--bg);
+}}
+.stage-name {{ font-size:11px; color:var(--dim); font-weight:400; margin-left:6px; }}
+.stage-h1 {{
+  margin:0; font-size:22px; line-height:1.15; font-weight:800; color:var(--blue);
+}}
+.ev-stage-ok {{
+  background:linear-gradient(180deg, rgba(19,45,40,.92), rgba(19,45,40,.72));
+  border:1px solid rgba(83,215,159,.35);
+  border-radius:10px; padding:10px 12px; margin:8px 0; font-size:13px;
+}}
+.stage-stats {{ margin-top:4px; }}
+.stat-pill {{
+  display:inline-block; background:rgba(88,166,255,.1);
+  border:1px solid rgba(88,166,255,.2); border-radius:12px;
+  padding:1px 8px; font-size:11px; color:var(--blue); margin-right:6px;
+}}
+.badge-retry {{
+  background:rgba(227,179,65,.15); border:1px solid rgba(227,179,65,.3);
+  border-radius:4px; padding:1px 6px; font-size:10px; color:var(--yellow); margin-left:6px;
+}}
+.badge-dur {{
+  background:rgba(139,148,158,.1); border-radius:4px;
+  padding:1px 6px; font-size:10px; color:var(--dim); margin-left:4px;
+}}
+.ev-gate-ok, .ev-gate-warn {{
+  border-radius:10px; padding:9px 12px; margin:6px 0; font-size:12px;
+}}
+.ev-gate-ok {{
+  background:rgba(86,211,100,.05); border:1px solid rgba(86,211,100,.2);
+}}
+.ev-gate-warn {{
+  background:rgba(240,136,62,.05); border:1px solid rgba(240,136,62,.3);
+}}
+.gate-action {{ font-weight:600; margin-left:8px; }}
+.gate-fallback {{ color:var(--orange); font-size:11px; margin-left:8px; }}
+.gate-issues {{ margin:4px 0 0 16px; color:var(--dim); font-size:11px; }}
+.ev-error {{ background:#2d1215; border:1px solid rgba(248,81,73,.4); border-radius:8px; margin:6px 0; color:var(--red); }}
+.ev-error summary {{ padding:10px 14px; cursor:pointer; font-size:13px; }}
+.ev-error pre {{ white-space:pre-wrap; word-break:break-word; font-size:11px; color:#f0a0a0; padding:10px 14px; border-top:1px solid rgba(248,81,73,.2); max-height:300px; overflow-y:auto; }}
+/* ── Decision detail cards ── */
+.decision-card {{
+  margin:10px 0 6px; border:1px solid var(--border); border-radius:10px;
+  background:rgba(18,26,43,.92); overflow:hidden;
+}}
+.decision-card summary {{
+  padding:10px 16px; cursor:pointer; font-size:13px; font-weight:600;
+  color:var(--blue); user-select:none; background:rgba(88,166,255,.03);
+}}
+.decision-card summary:hover {{ background:rgba(88,166,255,.08); }}
+.decision-body {{ padding:8px 16px 16px; }}
+.decision-section {{ margin-bottom:12px; }}
+.section-title {{ font-size:12px; font-weight:600; color:var(--dim); margin-bottom:6px; }}
+.decision-table {{
+  width:100%; border-collapse:collapse; font-size:12px; margin-top:4px;
+  table-layout:auto;
+}}
+.decision-table th {{
+  text-align:left; padding:6px 10px; border-bottom:2px solid var(--border);
+  color:var(--dim); font-weight:600; font-size:11px; white-space:nowrap;
+  background:rgba(0,0,0,.2); position:sticky; top:0;
+}}
+.decision-table td {{ padding:5px 10px; border-bottom:1px solid rgba(33,38,45,.5); vertical-align:top; }}
+.feature-label {{ color:var(--dim); white-space:nowrap; width:70px; }}
+.tag {{
+  display:inline-block; background:rgba(210,168,255,.1); border:1px solid rgba(210,168,255,.25);
+  border-radius:4px; padding:1px 6px; font-size:11px; color:var(--purple); margin:1px 3px 1px 0;
+}}
+.tag-blue {{ background:rgba(88,166,255,.1); border-color:rgba(88,166,255,.25); color:var(--blue); }}
+.tag-purple {{ background:rgba(210,168,255,.1); border-color:rgba(210,168,255,.25); color:var(--purple); }}
+.focus-group {{ margin:4px 0 8px; font-size:12px; }}
+.focus-group b {{ color:var(--dim); font-size:11px; }}
+.focus-group ul {{ margin:2px 0 0 16px; color:var(--text); }}
+.focus-group li {{ margin:1px 0; }}
+.kw-table code {{ color:var(--blue); font-size:11px; }}
+.num-cell {{ text-align:right; font-variant-numeric:tabular-nums; }}
+.total-line {{ font-size:12px; color:var(--dim); margin-top:6px; padding-top:6px; border-top:1px solid var(--border); }}
+.article-title-cell {{ word-break:break-all; }}
+.article-title-cell a {{ color:var(--blue); text-decoration:none; }}
+.article-title-cell a:hover {{ text-decoration:underline; }}
+.recall-table .article-title-cell,
+.review-table .article-title-cell {{
+  min-width: 520px;
+  max-width: 680px;
+}}
+.score-cell {{ white-space:nowrap; font-size:11px; }}
+.reason-full-cell {{ font-size:11px; color:var(--dim); line-height:1.5; }}
+.date-cell {{ white-space:nowrap; font-size:11px; color:var(--dim); }}
+.review-table th {{ white-space:nowrap; }}
+.review-table {{ min-width:900px; }}
+.decision-section {{ overflow-x:auto; }}
+.recall-table code {{ color:var(--blue); font-size:11px; }}
+.phase-badge {{
+  display:inline-block; font-size:10px; padding:1px 5px; border-radius:3px;
+  background:rgba(139,148,158,.12); color:var(--dim); white-space:nowrap;
+}}
+.row-accept td {{ border-left:2px solid var(--green); }}
+.row-reject td {{ border-left:2px solid var(--red); }}
+.row-skip td {{ border-left:2px solid var(--dim); }}
+.stat-accept {{ background:rgba(86,211,100,.1); border-color:rgba(86,211,100,.3); color:var(--green); }}
+.stat-reject {{ background:rgba(248,81,73,.1); border-color:rgba(248,81,73,.3); color:var(--red); }}
+.stat-skip {{ background:rgba(139,148,158,.1); border-color:rgba(139,148,158,.3); color:var(--dim); }}
+.acct-table .sample-titles {{ font-size:11px; color:var(--dim); }}
+.file-path {{ font-size:11px; color:var(--dim); background:rgba(139,148,158,.08); padding:3px 8px; border-radius:4px; }}
+/* ── LLM interaction trace cards ── */
+.llm-interaction-card {{
+  margin:10px 0 6px; border:1px solid rgba(102,179,255,.3); border-radius:10px;
+  background:rgba(18,26,43,.95); overflow:hidden;
+}}
+.llm-interaction-card summary {{
+  padding:8px 14px; cursor:pointer; font-size:12px; font-weight:600;
+  color:var(--blue); user-select:none;
+}}
+.llm-interaction-card summary:hover {{ background:rgba(102,179,255,.06); }}
+.llm-interaction-meta {{ font-weight:400; color:var(--dim); font-size:11px; margin-left:4px; }}
+.llm-interaction-body {{ padding:6px 14px 14px; display:flex; flex-direction:column; gap:6px; }}
+.llm-msg {{ border-radius:6px; font-size:12px; overflow:hidden; }}
+.llm-msg-header {{ font-size:11px; font-weight:600; color:var(--dim); margin-bottom:4px; }}
+.llm-msg-pre {{
+  white-space:pre-wrap; word-break:break-word; font-size:11px;
+  font-family:monospace; line-height:1.5; color:var(--text);
+  max-height:400px; overflow-y:auto; margin:0; padding:4px 0;
+}}
+.llm-msg-system {{
+  background:rgba(88,166,255,.04); border:1px solid rgba(88,166,255,.1);
+}}
+.llm-msg-system summary {{
+  padding:6px 10px; cursor:pointer; font-size:11px; color:var(--blue);
+}}
+.llm-msg-system .llm-msg-pre {{ padding:0 10px 10px; color:var(--dim); max-height:300px; }}
+.llm-msg-user {{
+  background:rgba(139,148,158,.04); border:1px solid rgba(139,148,158,.1);
+  padding:8px 10px;
+}}
+.llm-msg-user summary {{
+  padding:6px 10px; cursor:pointer; font-size:11px; color:var(--dim);
+}}
+.llm-msg-user .llm-msg-pre {{ padding:0 10px 10px; }}
+.llm-reasoning {{
+  background:rgba(227,179,65,.08); border:1px solid rgba(227,179,65,.25);
+  border-radius:6px; margin:4px 0;
+}}
+.llm-reasoning summary {{
+  cursor:pointer; font-size:12px; font-weight:600; color:var(--yellow); padding:6px 10px;
+}}
+.llm-reasoning-text {{ padding:4px 10px 10px; color:var(--yellow); opacity:0.9; }}
+.llm-response {{
+  background:rgba(86,211,100,.04); border:1px solid rgba(86,211,100,.15);
+  border-radius:6px; padding:8px 10px;
+}}
+.llm-response summary {{
+  cursor:pointer; font-size:11px; color:var(--green); padding:4px 0;
+}}
+.llm-tool-call {{
+  background:rgba(88,166,255,.06); border:1px solid rgba(88,166,255,.12);
+  border-radius:4px; margin:4px 0; padding:6px 8px;
+}}
+.llm-tool-name {{ font-size:11px; font-weight:600; color:var(--blue); margin-bottom:2px; }}
+.llm-tool-args {{
+  font-size:10px; color:var(--dim); font-family:monospace; margin:2px 0 0;
+  white-space:pre-wrap; max-height:200px; overflow-y:auto;
+}}
+.llm-tool-result {{ margin:4px 0; }}
+.llm-tool-result summary {{
+  cursor:pointer; font-size:11px; color:var(--purple); padding:2px 0;
+}}
+/* ── Agent trace cards ── */
+.agent-trace-card {{
+  margin:10px 0 6px; border:1px solid rgba(210,168,255,.3); border-radius:10px;
+  background:rgba(18,26,43,.95); overflow:hidden;
+}}
+.agent-trace-card summary {{
+  padding:8px 14px; cursor:pointer; font-size:12px; font-weight:600;
+  color:var(--purple); user-select:none;
+}}
+.agent-trace-card summary:hover {{ background:rgba(210,168,255,.06); }}
+.agent-trace-summary {{ font-weight:400; color:var(--dim); font-size:11px; margin-left:4px; }}
+.agent-trace-body {{ padding:6px 14px 14px; }}
+.agent-meta {{
+  display:flex; flex-wrap:wrap; gap:8px 16px; padding:6px 0 10px;
+  border-bottom:1px solid var(--border); margin-bottom:10px; font-size:11px;
+}}
+.agent-meta-item {{ color:var(--dim); }}
+.agent-meta-item b {{ color:var(--text); }}
+.agent-meta-item code {{ color:var(--blue); font-size:10px; }}
+.agent-messages {{ display:flex; flex-direction:column; gap:6px; }}
+.agent-msg {{ border-radius:6px; font-size:12px; overflow:hidden; }}
+.agent-msg-header {{
+  font-size:11px; font-weight:600; color:var(--dim); margin-bottom:4px;
+}}
+.agent-msg-pre {{
+  white-space:pre-wrap; word-break:break-word; font-size:11px;
+  font-family:monospace; line-height:1.5; color:var(--text);
+  max-height:400px; overflow-y:auto; margin:0; padding:4px 0;
+}}
+.agent-msg-system {{
+  background:rgba(88,166,255,.04); border:1px solid rgba(88,166,255,.1);
+}}
+.agent-msg-system summary {{
+  padding:6px 10px; cursor:pointer; font-size:11px; color:var(--blue);
+}}
+.agent-msg-system .agent-msg-pre {{ padding:0 10px 10px; color:var(--dim); max-height:300px; }}
+.agent-msg-user {{
+  background:rgba(139,148,158,.04); border:1px solid rgba(139,148,158,.1);
+  padding:8px 10px;
+}}
+.agent-msg-user summary {{
+  padding:6px 10px; cursor:pointer; font-size:11px; color:var(--dim);
+}}
+.agent-msg-user .agent-msg-pre {{ padding:0 10px 10px; }}
+.agent-msg-assistant {{
+  background:rgba(86,211,100,.04); border:1px solid rgba(86,211,100,.15);
+  padding:8px 10px;
+}}
+.agent-token-badge {{
+  display:inline-block; font-size:9px; padding:1px 5px; border-radius:3px;
+  background:rgba(139,148,158,.12); color:var(--dim); margin-left:6px; font-weight:400;
+}}
+.agent-thinking {{ margin:4px 0; }}
+.agent-thinking summary {{
+  cursor:pointer; font-size:11px; color:var(--dim); padding:2px 0;
+}}
+.agent-reasoning {{
+  background:rgba(227,179,65,.06); border:1px solid rgba(227,179,65,.15);
+  border-radius:4px; margin:4px 0;
+}}
+.agent-reasoning summary {{
+  cursor:pointer; font-size:11px; color:var(--yellow); padding:4px 8px;
+}}
+.agent-reasoning .agent-msg-pre {{ padding:4px 8px 8px; color:var(--yellow); }}
+.agent-tool-call {{
+  background:rgba(88,166,255,.06); border:1px solid rgba(88,166,255,.12);
+  border-radius:4px; margin:4px 0; padding:6px 8px;
+}}
+.agent-tool-name {{ font-size:11px; font-weight:600; color:var(--blue); margin-bottom:2px; }}
+.agent-tool-args {{
+  font-size:10px; color:var(--dim); font-family:monospace; margin:2px 0 0;
+  white-space:pre-wrap; max-height:200px; overflow-y:auto;
+}}
+.agent-msg-tool {{
+  background:rgba(210,168,255,.04); border:1px solid rgba(210,168,255,.1);
+  padding:6px 10px;
+}}
+.agent-msg-tool summary {{
+  cursor:pointer; font-size:11px; color:var(--purple); padding:4px 0;
+}}
+.agent-msg-tool .agent-msg-pre {{ max-height:250px; color:var(--dim); }}
+.agent-msg-goal {{
+  background:rgba(227,179,65,.06); border:1px solid rgba(227,179,65,.12);
+  border-radius:4px; padding:6px 10px; font-size:11px; color:var(--yellow);
+}}
+@media(max-width:900px) {{
+  body {{ padding:16px 12px; }}
+  .stats {{ grid-template-columns:repeat(auto-fit,minmax(120px,1fr)); }}
+  .timeline {{ padding-left:16px; }}
+  .timeline::before {{ left:3px; }}
+  .ev-phase::before {{ left:-14px; width:8px; height:8px; }}
+  header {{ flex-direction:column; align-items:flex-start; }}
+}}
+{_LOG_VIEWER_CSS}
+</style>
+</head>
+<body>
+<div class="page-wrap">
+<nav class="page-nav">
+  <a href="#overview">总览</a>
+  <a href="#flow">流程总览</a>
+  <a href="#timeline">执行时间线</a>
+  <a href="#logs">完整日志</a>
+</nav>
+
+<header id="overview">
+  <h1>🔄 Pipeline 执行追踪</h1>
+  <div class="sub">
+    查询: {_esc(query)} &nbsp;|&nbsp;
+    模型: {_esc(model)} &nbsp;|&nbsp;
+    状态: <span style="color:{status_color}">{_esc(pipeline_status)}</span> &nbsp;|&nbsp;
+    耗时: {_esc(duration)} &nbsp;|&nbsp;
+    trace_id: <span style="font-family:monospace">{_esc(trace_id)}</span>
+  </div>
+</header>
+
+<div class="stats">
+  <div class="stat s-time"><div class="num">{_esc(duration)}</div><div class="desc">总耗时</div></div>
+  <div class="stat s-cand"><div class="num">{candidate_count}</div><div class="desc">召回候选</div></div>
+  <div class="stat s-filt"><div class="num">{filtered_count}</div><div class="desc">入选文章</div></div>
+  <div class="stat s-acct"><div class="num">{account_count}</div><div class="desc">沉淀账号</div></div>
+  <div class="stat s-err"><div class="num">{len(error_events)}</div><div class="desc">错误</div></div>
+</div>
+
+<div class="section-title-bar" id="flow">
+  <h2>流程总览</h2>
+  <span class="hint">按阶段展示执行状态</span>
+</div>
+<div class="flow-strip">
+{flow_html}
+</div>
+
+<div class="section-title-bar" id="timeline">
+  <h2>执行时间线</h2>
+  <span class="hint">按事件顺序展示关键决策与结果</span>
+</div>
+<div class="timeline">
+{body}
+</div>
+
+<div id="logs">
+{_render_full_log_section(full_log_lines or [])}
+</div>
+
+{_LOG_VIEWER_JS}
+
+</div>
+</body>
+</html>"""
+
+
+# ─────────────────────────────────────────────────────────────
+# Entry points
+# ─────────────────────────────────────────────────────────────
+
+def list_traces() -> None:
+    if not TRACES_DIR.exists():
+        print(f"traces directory does not exist: {TRACES_DIR}")
+        return
+    dirs = sorted(
+        [d for d in TRACES_DIR.iterdir() if d.is_dir() and (d / "pipeline.jsonl").exists()],
+        key=lambda d: d.stat().st_mtime,
+        reverse=True,
+    )
+    if not dirs:
+        print("No traces available yet (run run_search_agent.py first)")
+        return
+    print(f"{'trace_id':<40} {'mtime'}")
+    print("-" * 60)
+    for d in dirs:
+        mtime = datetime.fromtimestamp(d.stat().st_mtime).strftime("%Y-%m-%d %H:%M:%S")
+        print(f"{d.name:<40} {mtime}")
+
+
+def find_latest_trace() -> Path | None:
+    if not TRACES_DIR.exists():
+        return None
+    dirs = sorted(
+        [d for d in TRACES_DIR.iterdir() if d.is_dir() and (d / "pipeline.jsonl").exists()],
+        key=lambda d: d.stat().st_mtime,
+        reverse=True,
+    )
+    return dirs[0] if dirs else None
+
+
+def main() -> None:
+    args = sys.argv[1:]
+
+    if "--list" in args:
+        list_traces()
+        return
+
+    if args and not args[0].startswith("--"):
+        trace_id = args[0]
+        trace_dir = TRACES_DIR / trace_id
+    else:
+        trace_dir = find_latest_trace()
+        if trace_dir is None:
+            print("❌ No trace found; run run_search_agent.py first")
+            print(f"   traces directory: {TRACES_DIR}")
+            sys.exit(1)
+        trace_id = trace_dir.name
+
+    jsonl_path = trace_dir / "pipeline.jsonl"
+    if not jsonl_path.exists():
+        print(f"❌ Cannot find {jsonl_path}")
+        sys.exit(1)
+
+    events = read_jsonl(jsonl_path)
+    print(f"📄 Read {len(events)} events (trace_id={trace_id})")
+
+    # Read the full log file, if present
+    log_path = trace_dir / "full_log.log"
+    full_log_lines: list[dict] | None = None
+    if log_path.exists():
+        log_text = log_path.read_text(encoding="utf-8")
+        full_log_lines = _parse_log_lines(log_text)
+        print(f"📋 Read {len(full_log_lines)} full-log lines")
+
+    html_content = render_html(events, full_log_lines=full_log_lines)
+    out_path = trace_dir / "pipeline_trace.html"
+    out_path.write_text(html_content, encoding="utf-8")
+    size_kb = out_path.stat().st_size / 1024
+    print(f"✅ Generated: {out_path}  ({size_kb:.0f} KB)")
+
+
+if __name__ == "__main__":
+    main()
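The trace-selection logic above (`find_latest_trace` picks the most recently modified directory that actually contains a `pipeline.jsonl`) can be exercised in isolation. A minimal sketch, with made-up directory names and mtimes pinned via `os.utime` so the ordering is deterministic:

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def find_latest_trace(traces_dir: Path) -> Path | None:
    """Newest trace directory that actually contains a pipeline.jsonl."""
    if not traces_dir.exists():
        return None
    dirs = sorted(
        (d for d in traces_dir.iterdir() if d.is_dir() and (d / "pipeline.jsonl").exists()),
        key=lambda d: d.stat().st_mtime,
        reverse=True,
    )
    return dirs[0] if dirs else None


root = Path(tempfile.mkdtemp())
for name, mtime, complete in [
    ("trace-old", 1000, True),
    ("trace-new", 2000, True),
    ("trace-broken", 3000, False),  # no pipeline.jsonl → must be ignored
]:
    d = root / name
    d.mkdir()
    if complete:
        (d / "pipeline.jsonl").write_text("{}", encoding="utf-8")
    os.utime(d, (mtime, mtime))  # pin mtime so sorting is reproducible

latest = find_latest_trace(root)
print(latest.name)  # → trace-new
```

Note that `trace-broken` is newer than `trace-new` by mtime, but it never qualifies: the `pipeline.jsonl` existence check runs before the sort, which is what keeps half-written trace directories out of the default selection.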

+ 94 - 0
run_pipeline.py

@@ -0,0 +1,94 @@
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import shutil
+import sys
+import tempfile
+
+from dotenv import load_dotenv
+
+from src.pipeline.runner import run_content_finder_from_cli
+
+load_dotenv()
+
+# ── Log levels controlled via env vars; default DEBUG captures everything ────────────
+_LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG").upper()
+_CONSOLE_LEVEL = os.getenv("CONSOLE_LOG_LEVEL", "INFO").upper()
+
+_LOG_FMT = "%(asctime)s | %(levelname)-7s | %(name)s | %(message)s"
+_LOG_DATEFMT = "%Y-%m-%d %H:%M:%S"
+
+
+def _setup_logging(log_file_path: str) -> logging.FileHandler:
+    """
+    Configure dual-channel logging: console (INFO) + file (DEBUG).
+
+    Intercepts every child logger via the root logger, so the agent
+    kernel code itself stays untouched.
+    """
+    root = logging.getLogger()
+    root.setLevel(getattr(logging, _LOG_LEVEL, logging.DEBUG))
+
+    formatter = logging.Formatter(fmt=_LOG_FMT, datefmt=_LOG_DATEFMT)
+
+    console = logging.StreamHandler(sys.__stdout__)
+    console.setLevel(getattr(logging, _CONSOLE_LEVEL, logging.INFO))
+    console.setFormatter(formatter)
+    root.addHandler(console)
+
+    fh = logging.FileHandler(log_file_path, mode="w", encoding="utf-8")
+    fh.setLevel(logging.DEBUG)
+    fh.setFormatter(formatter)
+    root.addHandler(fh)
+
+    for noisy in ("httpx", "httpcore", "urllib3", "asyncio"):
+        logging.getLogger(noisy).setLevel(logging.WARNING)
+
+    # Keep agent-kernel logs out of the full log file (less noise)
+    class _AgentLogFilter(logging.Filter):
+        def filter(self, record: logging.LogRecord) -> bool:
+            return not record.name.startswith("agent.")
+
+    fh.addFilter(_AgentLogFilter())
+
+    return fh
+
+
+logger = logging.getLogger(__name__)
+
+
+async def main() -> None:
+    tmp = tempfile.NamedTemporaryFile(
+        delete=False, suffix=".log", prefix="pipeline_run_", mode="w", encoding="utf-8",
+    )
+    tmp_path = tmp.name
+    tmp.close()
+
+    file_handler = _setup_logging(tmp_path)
+
+    try:
+        query = os.getenv("PIPELINE_QUERY", "伊朗、以色列、和平是永恒的主题")
+        demand_id = os.getenv("PIPELINE_DEMAND_ID", "1")
+        result = await run_content_finder_from_cli(query=query, demand_id=demand_id)
+
+        logger.info("pipeline trace_id=%s", result.trace_id)
+        logger.info("pipeline output=%s", result.metadata.get("output_file", ""))
+
+        # Move the log file into the trace directory
+        file_handler.close()
+        trace_dir = os.path.join("tests", "traces", result.trace_id)
+        os.makedirs(trace_dir, exist_ok=True)
+        dest = os.path.join(trace_dir, "full_log.log")
+        shutil.move(tmp_path, dest)
+        logger.info("Full log saved to: %s", dest)
+    finally:
+        file_handler.close()
+        if os.path.exists(tmp_path):
+            try:
+                os.unlink(tmp_path)
+            except OSError:
+                pass
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
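The close-then-move step above matters: moving the log file before closing the handler can lose buffered records. A minimal standalone sketch of the same pattern (the directory and file names here are illustrative, not the project's real layout):

```python
import logging
import os
import shutil
import tempfile


def relocate_log(handler: logging.FileHandler, src: str, trace_dir: str) -> str:
    """Close the handler first so buffered records are flushed, then move the file."""
    handler.close()
    os.makedirs(trace_dir, exist_ok=True)
    dest = os.path.join(trace_dir, "full_log.log")
    shutil.move(src, dest)
    return dest


# Usage: write one record, then relocate the file into a per-trace directory.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
tmp.close()
fh = logging.FileHandler(tmp.name, mode="w", encoding="utf-8")
logging.getLogger("demo").addHandler(fh)
logging.getLogger("demo").warning("hello")
dest = relocate_log(fh, tmp.name, os.path.join(tempfile.gettempdir(), "trace-demo"))
```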

+ 408 - 0
run_search_agent.py

@@ -0,0 +1,408 @@
+"""
+Search Agent Harness — constraint-driven entry point for the search agent.
+
+Harness engineering layers:
+  1. Budget Harness   — explicit run budgets (timeout, iteration cap, recall cap)
+  2. Planner Harness  — prints the run plan before start, with per-stage goals and constraints
+  3. Observer Harness — structured progress callbacks exposing key checkpoint states
+  4. Fallback Harness — explicit degradation paths for DB policy failure / missing API key
+
+Prerequisites:
+- OPEN_ROUTER_API_KEY
+- Optional: SEARCH_AGENT_DB_* and table search_agent_strategy (see docs/search_agent_strategy.sql)
+
+Environment variables:
+- PIPELINE_QUERY        / default "伊朗、以色列、和平是永恒的主题"
+- PIPELINE_DEMAND_ID    / default "1"
+- PIPELINE_TIMEOUT      / overall agent timeout in seconds, default 1800 (30 minutes)
+- PIPELINE_TARGET_COUNT / target article count, defaults to RuntimePipelineConfig
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import shutil
+import sys
+import tempfile
+import time
+from dataclasses import dataclass, field
+from typing import Optional
+from uuid import uuid4
+
+from dotenv import load_dotenv
+from src.domain.search.core import SearchAgentCore
+from src.domain.search.policy import SearchAgentPolicy
+
+load_dotenv()
+
+# ── Log level controlled by env vars ────────────
+_LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG").upper()
+_CONSOLE_LEVEL = os.getenv("CONSOLE_LOG_LEVEL", "INFO").upper()
+_LOG_FMT = "%(asctime)s | %(levelname)-7s | %(name)s | %(message)s"
+_LOG_DATEFMT = "%Y-%m-%d %H:%M:%S"
+
+# Global file-handler reference so main() can move the log file
+_file_handler: Optional[logging.FileHandler] = None
+_tmp_log_path: Optional[str] = None
+
+
+def _setup_logging() -> None:
+    """
+    Configure dual-channel logging: console (INFO) + file (DEBUG).
+
+    The full log is written to a temp file and moved into the trace
+    directory once the pipeline finishes.
+    """
+    global _file_handler, _tmp_log_path
+
+    root = logging.getLogger()
+    root.setLevel(getattr(logging, _LOG_LEVEL, logging.DEBUG))
+
+    formatter = logging.Formatter(fmt=_LOG_FMT, datefmt=_LOG_DATEFMT)
+
+    console = logging.StreamHandler(sys.__stdout__)
+    console.setLevel(getattr(logging, _CONSOLE_LEVEL, logging.INFO))
+    console.setFormatter(formatter)
+    root.addHandler(console)
+
+    tmp = tempfile.NamedTemporaryFile(
+        delete=False, suffix=".log", prefix="search_agent_", mode="w", encoding="utf-8",
+    )
+    _tmp_log_path = tmp.name
+    tmp.close()
+
+    _file_handler = logging.FileHandler(_tmp_log_path, mode="w", encoding="utf-8")
+    _file_handler.setLevel(logging.DEBUG)
+    _file_handler.setFormatter(formatter)
+    root.addHandler(_file_handler)
+
+    for noisy in ("httpx", "httpcore", "urllib3", "asyncio"):
+        logging.getLogger(noisy).setLevel(logging.WARNING)
+
+    # Keep agent-core logs out of the full log file (reduces noise).
+    # Filters agent.core.runner / agent.llm.* / agent.tools.* / agent.trace.* etc.
+    class _AgentLogFilter(logging.Filter):
+        def filter(self, record: logging.LogRecord) -> bool:
+            return not record.name.startswith("agent.")
+
+    _file_handler.addFilter(_AgentLogFilter())
+
+
+_setup_logging()
+logger = logging.getLogger(__name__)
+
+
+# ─────────────────────────────────────────────
+# 1. Budget Harness — run budget constraints
+# ─────────────────────────────────────────────
+
+@dataclass
+class AgentBudget:
+    """
+    Explicitly declares the resource ceilings the agent may consume.
+
+    Constraint-driven principles:
+    - All limits must be fixed before start; no implicit widening mid-run.
+    - Timeouts are enforced uniformly at the harness layer, not by each stage.
+    """
+    timeout_seconds: int = 1800         # overall timeout (30 minutes)
+    max_target_count: int = 10          # max articles per run (prevents unbounded growth)
+    max_fallback_rounds: int = 1        # max content_search gate fallback rounds (prevents loops)
+
+    @classmethod
+    def from_env(cls) -> "AgentBudget":
+        return cls(
+            timeout_seconds=int(os.getenv("PIPELINE_TIMEOUT", "1800")),
+            max_target_count=int(os.getenv("PIPELINE_MAX_TARGET_COUNT", "10")),
+            max_fallback_rounds=int(os.getenv("PIPELINE_MAX_FALLBACK_ROUNDS", "1")),
+        )
+
+    def validate(self) -> None:
+        """Precondition assertions: budget parameters must be within sane ranges."""
+        if self.timeout_seconds < 30:
+            raise ValueError(f"timeout_seconds must be >= 30, got: {self.timeout_seconds}")
+        if self.max_target_count < 1 or self.max_target_count > 200:
+            raise ValueError(f"max_target_count must be in [1, 200], got: {self.max_target_count}")
+        if self.max_fallback_rounds < 0 or self.max_fallback_rounds > 5:
+            raise ValueError(f"max_fallback_rounds must be in [0, 5], got: {self.max_fallback_rounds}")
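The read-from-env-then-validate pattern above can be sketched standalone. This mirrors `AgentBudget.from_env()` + `validate()` with plain dicts (the env var names are taken from the file; the helper functions are illustrative):

```python
import os


def budget_from_env(defaults: dict[str, int]) -> dict[str, int]:
    """Read integer budget knobs from the environment, falling back to defaults."""
    return {key: int(os.getenv(key, str(default))) for key, default in defaults.items()}


def validate_budget(b: dict[str, int]) -> None:
    """Fail fast before the run starts; limits are never widened mid-run."""
    if b["PIPELINE_TIMEOUT"] < 30:
        raise ValueError(f"PIPELINE_TIMEOUT must be >= 30, got {b['PIPELINE_TIMEOUT']}")
    if not 1 <= b["PIPELINE_MAX_TARGET_COUNT"] <= 200:
        raise ValueError("PIPELINE_MAX_TARGET_COUNT must be in [1, 200]")


# Usage: an env override wins over the default; validation runs before anything else.
os.environ["PIPELINE_TIMEOUT"] = "60"
budget = budget_from_env({"PIPELINE_TIMEOUT": 1800, "PIPELINE_MAX_TARGET_COUNT": 10})
validate_budget(budget)
```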
+
+
+# ─────────────────────────────────────────────
+# 2. Observer Harness — structured run summary
+# ─────────────────────────────────────────────
+
+@dataclass
+class RunSummary:
+    """
+    Structured post-run summary for the agent (not raw logs).
+
+    Design intent:
+    - Callers can inspect success / error_message to decide next actions.
+    - Key metrics (candidate_count / filtered_count) can feed alerting.
+    """
+    success: bool
+    query: str
+    demand_id: str
+    policy_source: str = "unknown"      # "db" | "default" | "override"
+    trace_id: Optional[str] = None
+    output_file: str = ""
+    candidate_count: int = 0
+    filtered_count: int = 0
+    account_count: int = 0
+    elapsed_seconds: float = 0.0
+    error_message: str = ""
+    stage_history: list = field(default_factory=list)
+
+    def log(self) -> None:
+        """Print the run summary in a structured form."""
+        status = "✅ success" if self.success else "❌ failure"
+        logger.info("=" * 60)
+        logger.info("Agent run summary %s", status)
+        logger.info("  query        : %s", self.query)
+        logger.info("  demand_id    : %s", self.demand_id)
+        logger.info("  policy_source: %s", self.policy_source)
+        logger.info("  trace_id     : %s", self.trace_id)
+        logger.info("  output_file  : %s", self.output_file)
+        logger.info("  candidates   : %d", self.candidate_count)
+        logger.info("  selected     : %d", self.filtered_count)
+        logger.info("  accounts     : %d", self.account_count)
+        logger.info("  elapsed      : %.1f s", self.elapsed_seconds)
+        if self.error_message:
+            logger.error("  error        : %s", self.error_message)
+        if self.stage_history:
+            logger.info("  stage history:")
+            for record in self.stage_history:
+                status_flag = "✓" if record.get("status") == "completed" else "✗"
+                logger.info(
+                    "    %s %-28s attempt=%d",
+                    status_flag,
+                    record.get("stage_name", "?"),
+                    record.get("attempt", 1),
+                )
+        logger.info("=" * 60)
+
+
+# ─────────────────────────────────────────────
+# 3. Planner Harness — print the run plan before start
+# ─────────────────────────────────────────────
+
+def print_run_plan(query: str, demand_id: str, budget: AgentBudget, trace_id: str) -> dict:
+    """
+    Print a structured run plan before the agent starts, and return the plan
+    data for use in the trace.
+
+    Purpose:
+    - Make the run intent visible and auditable, aiding debugging and tracing.
+    - State each stage's goal and constraints up front, avoiding "black box" execution.
+    """
+    logger.info("=" * 60)
+    logger.info("▶ Search Agent run plan")
+    logger.info("  Trace ID   : %s", trace_id)
+    logger.info("  Query      : %s", query)
+    logger.info("  Demand ID  : %s", demand_id or "(unspecified; using default policy)")
+    logger.info("  Timeout cap          : %d s", budget.timeout_seconds)
+    logger.info("  Target article cap   : %d", budget.max_target_count)
+    logger.info("  Max fallback rounds  : %d", budget.max_fallback_rounds)
+    logger.info("")
+    logger.info("  Stage plan:")
+    logger.info("    1. [demand_analysis   ]  ← demand analysis, produces search strategy (no tool calls)")
+    logger.info("    2. [content_search    ]  ← recall candidate articles by keyword")
+    logger.info("       └─ Gate: SearchCompletenessGate — abort if candidates insufficient")
+    logger.info("    3. [hard_filter       ]  ← dedup + basic URL / time validation")
+    logger.info("    4. [coarse_filter     ]  ← LLM coarse filtering on title semantics")
+    logger.info("    5. [quality_filter    ]  ← metric scoring + LLM body re-ranking")
+    logger.info("       └─ Gate: FilterSufficiencyGate — fall back to re-recall if short (max %d rounds)",
+                budget.max_fallback_rounds)
+    logger.info("    6. [account_precipitate] ← account info persistence")
+    logger.info("    7. [output_persist    ]  ← emit structured JSON")
+    logger.info("       └─ Gate: OutputSchemaGate — schema validation")
+    logger.info("=" * 60)
+
+    return {
+        "trace_id": trace_id,
+        "query": query,
+        "demand_id": demand_id or "",
+        "timeout_seconds": budget.timeout_seconds,
+        "max_target_count": budget.max_target_count,
+        "max_fallback_rounds": budget.max_fallback_rounds,
+        "stages": [
+            {"name": "demand_analysis", "label": "demand analysis, produces search strategy"},
+            {"name": "content_search", "label": "recall candidate articles by keyword", "gate": "SearchCompletenessGate"},
+            {"name": "hard_filter", "label": "dedup + basic rule filtering"},
+            {"name": "coarse_filter", "label": "LLM coarse filtering on title semantics"},
+            {"name": "quality_filter", "label": "metric scoring + LLM body re-ranking", "gate": "FilterSufficiencyGate"},
+            {"name": "account_precipitate", "label": "account info persistence"},
+            {"name": "output_persist", "label": "emit structured JSON", "gate": "OutputSchemaGate"},
+        ],
+    }
+
+
+# ─────────────────────────────────────────────
+# 4. Fallback Harness — preflight checks and degradation paths
+# ─────────────────────────────────────────────
+
+def validate_prerequisites() -> None:
+    """
+    Preflight checks (harness level, independent of Core's internal checks).
+
+    Design intent:
+    - Lift must-hold constraints to the outermost layer so failures are fast and clear.
+    - Avoid discovering "OPEN_ROUTER_API_KEY not set" only deep inside a stage.
+    """
+    api_key = os.getenv("OPEN_ROUTER_API_KEY", "").strip()
+    if not api_key:
+        raise EnvironmentError(
+            "Missing required environment variable: OPEN_ROUTER_API_KEY\n"
+            "Set it in your .env file or system environment and retry."
+        )
+
+
+# ─────────────────────────────────────────────
+# 5. Main flow — unified harness orchestration
+# ─────────────────────────────────────────────
+
+async def run_with_harness(
+    query: str,
+    demand_id: str,
+    budget: AgentBudget,
+    trace_id: str,
+    use_db_policy: bool = True,
+    run_plan: dict | None = None,
+) -> RunSummary:
+    """
+    Harness-wrapped agent execution entry point.
+
+    Separation of responsibilities:
+    - This function only does constraint injection + timeout wrapping + summary collection.
+    - Business logic is delegated to SearchAgentCore.
+    - No business if/else decisions live here.
+    """
+
+    start = time.monotonic()
+    summary = RunSummary(success=False, query=query, demand_id=demand_id, trace_id=trace_id)
+
+    # --- Policy source marker (for the Observer) ---
+    core = SearchAgentCore()
+    policy_override: Optional[SearchAgentPolicy] = None
+
+    if use_db_policy:
+        try:
+            # Pre-read the policy only to confirm DB connectivity and mark the source;
+            # SearchAgentCore.run() reloads it with the same demand_id internally.
+            await core.load_policy(demand_id or None)
+            summary.policy_source = "db"
+            logger.info("Policy loaded from DB: demand_id=%s", demand_id)
+        except Exception as exc:
+            logger.warning("DB policy read failed; degrading to default policy: %s", exc)
+            policy_override = SearchAgentPolicy.defaults()
+            summary.policy_source = "default(fallback)"
+    else:
+        policy_override = SearchAgentPolicy.defaults()
+        summary.policy_source = "default"
+
+    # --- Budget injection: target_count capped at max_target_count ---
+    from src.pipeline.config.pipeline_config import RuntimePipelineConfig
+    runtime = RuntimePipelineConfig.from_env()
+    effective_target = min(runtime.target_count, budget.max_target_count)
+    if effective_target != runtime.target_count:
+        logger.info(
+            "target_count capped by the Budget Harness: %d → %d",
+            runtime.target_count,
+            effective_target,
+        )
+
+    # --- Execute under the timeout wrapper ---
+    try:
+        ctx = await asyncio.wait_for(
+            core.run(
+                query=query,
+                demand_id=demand_id,
+                target_count=effective_target,
+                use_db_policy=(policy_override is None),
+                policy_override=policy_override,
+                trace_id=trace_id,
+                run_plan=run_plan,
+            ),
+            timeout=budget.timeout_seconds,
+        )
+    except asyncio.TimeoutError:
+        summary.elapsed_seconds = time.monotonic() - start
+        summary.error_message = f"Agent timed out (>{budget.timeout_seconds}s); aborted"
+        logger.error(summary.error_message)
+        return summary
+    except Exception as exc:
+        summary.elapsed_seconds = time.monotonic() - start
+        summary.error_message = str(exc)
+        logger.exception("Agent run failed: %s", exc)
+        return summary
+
+    # --- Collect the Observer summary ---
+    summary.success = True
+    summary.output_file = ctx.metadata.get("output_file", "")
+    summary.candidate_count = len(ctx.candidate_articles)
+    summary.filtered_count = len(ctx.filtered_articles)
+    summary.account_count = len(ctx.accounts)
+    summary.elapsed_seconds = time.monotonic() - start
+    summary.stage_history = [
+        {
+            "stage_name": r.stage_name,
+            "status": r.status,
+            "attempt": r.attempt,
+        }
+        for r in ctx.stage_history
+    ]
+    return summary
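The harness-level ceiling around `core.run()` relies on `asyncio.wait_for` cancelling the wrapped coroutine on timeout and converting the result into a (success, value) outcome instead of an exception. A minimal sketch of that wrapper (the stage coroutines are illustrative stand-ins):

```python
import asyncio


async def run_with_timeout(coro, timeout_s: float):
    """Harness-level ceiling: do not trust each stage's own timeouts."""
    try:
        return True, await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising
        return False, None


async def fast_stage():
    return "done"


async def slow_stage():
    await asyncio.sleep(1.0)
    return "done"


ok_fast, res_fast = asyncio.run(run_with_timeout(fast_stage(), 0.5))
ok_slow, res_slow = asyncio.run(run_with_timeout(slow_stage(), 0.05))
```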
+
+
+async def main() -> None:
+    # 1) Preflight checks (Fallback Harness)
+    validate_prerequisites()
+
+    # 2) Read run parameters
+    query = os.getenv("PIPELINE_QUERY", "伊朗以色列冲突、中老年人会关注什么?")
+    demand_id = os.getenv("PIPELINE_DEMAND_ID", "1")
+
+    # 3) Budget constraints (Budget Harness)
+    budget = AgentBudget.from_env()
+    budget.validate()
+
+    # 4) Generate a global trace_id spanning the whole run
+    trace_id = str(uuid4())
+    logger.info("Trace ID: %s", trace_id)
+
+    # 5) Run plan (Planner Harness)
+    run_plan = print_run_plan(query=query, demand_id=demand_id, budget=budget, trace_id=trace_id)
+
+    # 6) Execute (with constraints + observation)
+    summary = await run_with_harness(
+        query=query,
+        demand_id=demand_id,
+        budget=budget,
+        trace_id=trace_id,
+        use_db_policy=True,
+        run_plan=run_plan,
+    )
+
+    # 7) Emit the structured summary (Observer Harness)
+    summary.log()
+
+    # 8) Move the full log into the trace directory
+    global _file_handler, _tmp_log_path
+    if _file_handler and _tmp_log_path and os.path.exists(_tmp_log_path):
+        try:
+            _file_handler.close()
+            trace_dir = os.path.join("tests", "traces", trace_id)
+            os.makedirs(trace_dir, exist_ok=True)
+            dest = os.path.join(trace_dir, "full_log.log")
+            shutil.move(_tmp_log_path, dest)
+            logger.info("Full log saved to: %s", dest)
+        except Exception as exc:
+            logger.warning("Failed to move log file: %s", exc)
+
+    # 9) Non-zero exit code so CI/schedulers can detect failure
+    if not summary.success:
+        raise SystemExit(1)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())

+ 10 - 0
src/domain/search/__init__.py

@@ -0,0 +1,10 @@
+from .core import SearchAgentCore
+from .policy import SearchAgentPolicy, apply_search_agent_policy
+from .repository import SearchAgentPolicyRepository
+
+__all__ = [
+    "SearchAgentCore",
+    "SearchAgentPolicy",
+    "SearchAgentPolicyRepository",
+    "apply_search_agent_policy",
+]

+ 76 - 17
src/domain/search/core.py

@@ -1,25 +1,84 @@
-from typing import Optional
-from src.agents import cop_agent
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+from uuid import uuid4
 
 from src.config import LongArticlesSearchAgentConfig
+from src.domain.search.policy import SearchAgentPolicy, apply_search_agent_policy
+from src.domain.search.repository import SearchAgentPolicyRepository
 from src.infra.database import AsyncMySQLPool
-from src.infra.trace import LogService
+from src.pipeline.context import PipelineContext
+from src.pipeline.runner import default_knowledge_sources, run_content_finder_pipeline
+
+logger = logging.getLogger(__name__)
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+
 
 class SearchAgentCore:
-    def __init__(self, pool: AsyncMySQLPool, log_service: LogService, config: LongArticlesSearchAgentConfig):
-        self.pool = pool
-        self.log_service = log_service
-        self.config = config
-
-    async def run_agent(
-            self, query: Optional[str] = None,
-            trace_id: Optional[int] = None,
-            stream_output: bool = True
+    """
+    Search Agent service entry: loads the policy from the search_agent DB,
+    injects it into the pipeline, and executes.
+
+    Dependencies:
+    - OPEN_ROUTER_API_KEY
+    - SEARCH_AGENT_DB_* configured and table search_agent_strategy created
+      (optional; defaults are used on failure)
+    """
+
+    def __init__(
+        self,
+        config: LongArticlesSearchAgentConfig | None = None,
+        pool: AsyncMySQLPool | None = None,
     ):
-        # query = query or DEFAULT_QUERY
-        res = cop_agent.AgentRunner()
+        self.config = config or LongArticlesSearchAgentConfig()
+        self.pool = pool or AsyncMySQLPool(self.config)
+        self._policy_repo = SearchAgentPolicyRepository(self.pool)
+
+    async def load_policy(self, demand_id: str | None = None) -> SearchAgentPolicy:
+        return await self._policy_repo.load_policy(demand_id)
+
+    async def run(
+        self,
+        query: str,
+        demand_id: str = "",
+        target_count: int | None = None,
+        *,
+        use_db_policy: bool = True,
+        policy_override: SearchAgentPolicy | None = None,
+        trace_id: str | None = None,
+        run_plan: dict | None = None,
+    ) -> PipelineContext:
+        if not os.getenv("OPEN_ROUTER_API_KEY"):
+            raise ValueError("OPEN_ROUTER_API_KEY is not set")
+
+        if policy_override is not None:
+            policy = policy_override
+        elif use_db_policy:
+            policy = await self.load_policy(demand_id or None)
+        else:
+            policy = SearchAgentPolicy.defaults()
 
-        pass
+        from src.pipeline.config.pipeline_config import RuntimePipelineConfig
 
-    async def deal(self):
-        pass
+        runtime = RuntimePipelineConfig.from_env()
+        trace_id = trace_id or str(uuid4())
+        ctx = PipelineContext(
+            task_id=str(uuid4()),
+            trace_id=trace_id,
+            query=query,
+            demand_id=demand_id,
+            target_count=target_count or runtime.target_count,
+            model=runtime.model,
+            output_dir=str(REPO_ROOT / "tests" / "output"),
+            knowledge_sources=default_knowledge_sources(),
+        )
+        apply_search_agent_policy(ctx, policy)
+        if run_plan:
+            ctx.metadata["_run_plan"] = run_plan
+        logger.info(
+            "SearchAgentCore policy injected: demand_id=%s policy=%s",
+            demand_id,
+            ctx.metadata.get("search_agent_policy"),
+        )
+        return await run_content_finder_pipeline(ctx)

+ 88 - 0
src/domain/search/policy.py

@@ -0,0 +1,88 @@
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Literal
+
+from src.pipeline.context import PipelineContext
+
+KeywordPriority = Literal["demand_first", "query_first"]
+
+
+@dataclass
+class SearchAgentPolicy:
+    """
+    Runtime parameters for policies stored in the search_agent DB
+    (aligned with search_agent_strategy.config_json).
+
+    defaults() is used when nothing is configured in the DB.
+    """
+
+    max_keywords: int = 6
+    initial_cursor: str = "1"
+    keyword_priority: KeywordPriority = "demand_first"
+    extra_keywords: List[str] = field(default_factory=list)
+    min_candidate_multiplier: float = 2.0
+    near_enough_candidate_multiplier: float = 1.2
+    filter_near_ratio: float = 0.8
+    max_detail_fetch: int = 30
+    enable_llm_review: bool = True
+
+    @classmethod
+    def defaults(cls) -> SearchAgentPolicy:
+        return cls()
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> SearchAgentPolicy:
+        base = cls.defaults().__dict__.copy()
+        for key, value in (data or {}).items():
+            if key in base and value is not None:
+                base[key] = value
+        ek = base["extra_keywords"]
+        if not isinstance(ek, list):
+            ek = []
+        return cls(
+            max_keywords=int(base["max_keywords"]),
+            initial_cursor=str(base["initial_cursor"]),
+            keyword_priority=base["keyword_priority"]
+            if base["keyword_priority"] in ("demand_first", "query_first")
+            else "demand_first",
+            extra_keywords=[str(x).strip() for x in ek if str(x).strip()],
+            min_candidate_multiplier=float(base["min_candidate_multiplier"]),
+            near_enough_candidate_multiplier=float(base["near_enough_candidate_multiplier"]),
+            filter_near_ratio=float(base["filter_near_ratio"]),
+            max_detail_fetch=int(base["max_detail_fetch"]),
+            enable_llm_review=bool(base["enable_llm_review"]),
+        )
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "max_keywords": self.max_keywords,
+            "initial_cursor": self.initial_cursor,
+            "keyword_priority": self.keyword_priority,
+            "extra_keywords": list(self.extra_keywords),
+            "min_candidate_multiplier": self.min_candidate_multiplier,
+            "near_enough_candidate_multiplier": self.near_enough_candidate_multiplier,
+            "filter_near_ratio": self.filter_near_ratio,
+            "max_detail_fetch": self.max_detail_fetch,
+            "enable_llm_review": self.enable_llm_review,
+        }
+
+
+def apply_search_agent_policy(ctx: PipelineContext, policy: SearchAgentPolicy) -> None:
+    """Write the policy into the context for stages / gates to read."""
+    ctx.metadata["search_agent_policy"] = policy.to_dict()
+
+
+def parse_policy_json(raw: Any) -> Dict[str, Any]:
+    if raw is None:
+        return {}
+    if isinstance(raw, dict):
+        return raw
+    if isinstance(raw, (bytes, bytearray)):
+        raw = raw.decode("utf-8", errors="replace")
+    if isinstance(raw, str):
+        try:
+            return json.loads(raw)
+        except json.JSONDecodeError:
+            return {}
+    return {}
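`parse_policy_json` is deliberately tolerant: DB drivers may hand back a dict, bytes, a JSON string, or NULL, and malformed input must degrade to `{}` rather than crash policy loading. A self-contained sketch of the same normalization (function name here is illustrative):

```python
import json
from typing import Any


def parse_config(raw: Any) -> dict:
    """Accept dict / bytes / str / None; anything unparseable degrades to {}."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8", errors="replace")
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
            # a JSON array is valid JSON but not a config mapping
            return parsed if isinstance(parsed, dict) else {}
        except json.JSONDecodeError:
            return {}
    return {}
```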

+ 60 - 0
src/domain/search/repository.py

@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+import logging
+from typing import Any, Dict, Optional
+
+from src.domain.search.policy import SearchAgentPolicy, parse_policy_json
+from src.infra.database import AsyncMySQLPool
+
+logger = logging.getLogger(__name__)
+
+
+class SearchAgentPolicyRepository:
+    """Loads search_agent_strategy configuration from the search_agent DB."""
+
+    TABLE = "search_agent_strategy"
+
+    def __init__(self, pool: AsyncMySQLPool):
+        self._pool = pool
+
+    async def load_policy(self, demand_id: Optional[str] = None) -> SearchAgentPolicy:
+        merged: Dict[str, Any] = {}
+        try:
+            if not self._pool.get_pool("search_agent"):
+                await self._pool.init_pools()
+            pool = self._pool.get_pool("search_agent")
+            if not pool:
+                return SearchAgentPolicy.defaults()
+
+            row = await self._fetch_by_demand(demand_id)
+            if not row:
+                row = await self._fetch_default()
+            if row and row.get("config_json") is not None:
+                merged.update(parse_policy_json(row["config_json"]))
+        except Exception as exc:
+            logger.warning("search_agent policy read failed; using defaults: %s", exc)
+            return SearchAgentPolicy.defaults()
+
+        return SearchAgentPolicy.from_dict(merged)
+
+    async def _fetch_by_demand(self, demand_id: Optional[str]) -> Optional[Dict[str, Any]]:
+        if not demand_id or not str(demand_id).strip():
+            return None
+        try:
+            did = int(str(demand_id).strip())
+        except ValueError:
+            return None
+        sql = (
+            f"SELECT config_json FROM {self.TABLE} "
+            "WHERE demand_id=%s AND enabled=1 ORDER BY version DESC, id DESC LIMIT 1"
+        )
+        rows = await self._pool.async_fetch(sql, params=(did,))
+        return rows[0] if rows else None
+
+    async def _fetch_default(self) -> Optional[Dict[str, Any]]:
+        sql = (
+            f"SELECT config_json FROM {self.TABLE} "
+            "WHERE strategy_code=%s AND enabled=1 ORDER BY version DESC, id DESC LIMIT 1"
+        )
+        rows = await self._pool.async_fetch(sql, params=("default",))
+        return rows[0] if rows else None
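The repository's fallback behavior (any DB failure degrades to defaults, and a found row is merged over them) can be sketched without a real database; the fetch coroutines below are hypothetical stand-ins, not the repository's API:

```python
import asyncio


async def load_policy_with_fallback(fetch, defaults: dict) -> dict:
    """Any failure in the strategy source degrades to defaults instead of crashing the run."""
    try:
        row = await fetch()
    except Exception:
        return dict(defaults)
    return {**defaults, **(row or {})}


async def broken_fetch():
    raise ConnectionError("db unreachable")


async def good_fetch():
    return {"max_keywords": 3}


DEFAULTS = {"max_keywords": 6, "enable_llm_review": True}
fell_back = asyncio.run(load_policy_with_fallback(broken_fetch, DEFAULTS))
merged = asyncio.run(load_policy_with_fallback(good_fetch, DEFAULTS))
```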

+ 4 - 1
src/infra/shared/common.py

@@ -1,5 +1,8 @@
+import logging
 from typing import Any, Dict, List, Optional
 
+logger = logging.getLogger(__name__)
+
 
 def show_desc_to_sta(show_desc):
     def decode_show_v(show_v):
@@ -27,7 +30,7 @@ def show_desc_to_sta(show_desc):
             "赞赏": "show_zs_count",
         }
         if show_k not in this_dict:
-            print(f"error from decode_show_k, show_k not found: {show_k}")
+            logger.warning("decode_show_k: show_k not found: %s", show_k)
         return this_dict.get(show_k, "show_unknown")
 
     show_desc = show_desc.replace("+", "")

+ 5 - 2
src/infra/shared/http_client.py

@@ -1,6 +1,9 @@
+import logging
 import aiohttp
 from typing import Optional, Union, Dict, Any
 
+logger = logging.getLogger(__name__)
+
 
 class AsyncHttpClient:
     def __init__(
@@ -59,10 +62,10 @@ class AsyncHttpClient:
                 return await response.text()
 
         except aiohttp.ClientResponseError as e:
-            print(f"HTTP error: {e.status} {e.message}")
+            logger.error("HTTP error: %d %s", e.status, e.message)
             raise
         except aiohttp.ClientError as e:
-            print(f"Network error: {str(e)}")
+            logger.error("Network error: %s", e)
             raise
 
     async def get(

+ 134 - 0
src/infra/trace/logging/log_capture.py

@@ -0,0 +1,134 @@
+"""
+Tee log-capture utilities.
+
+Supports multiple agents executing concurrently:
+  - each agent registers its own log buffer via build_log(build_id)
+  - log() routes to the current agent's buffer automatically via contextvars
+  - output also goes to the real stdout; sys.stdout is never hijacked
+"""
+import io
+import sys
+import contextvars
+import threading
+from contextlib import contextmanager
+
+# build_id bound to the current agent execution
+# (propagated across asyncio.to_thread via contextvars)
+_current_build_id: contextvars.ContextVar[int | None] = contextvars.ContextVar(
+    'log_build_id', default=None
+)
+
+# Global registry mapping build_id → StringIO buffer (thread-safe)
+_buffers: dict[int, io.StringIO] = {}
+_buffers_lock = threading.Lock()
+
+# The real stdout captured at process start (never overwritten)
+_real_stdout = sys.stdout
+
+
+def log(*args, **kwargs):
+    """Concurrency-safe logging function; a drop-in replacement for print().
+
+    Writes to stdout and to the current agent's log buffer.
+    Outside an agent context it behaves exactly like print().
+    """
+    # 1. Always write to the real stdout
+    print(*args, file=_real_stdout, **kwargs)
+
+    # 2. Inside an agent context, also write to the buffer
+    build_id = _current_build_id.get()
+    if build_id is not None:
+        buf = _buffers.get(build_id)
+        if buf is not None:
+            print(*args, file=buf, **kwargs)
+
+
+@contextmanager
+def build_log(build_id: int):
+    """Log context manager for an agent execution.
+
+    Usage:
+        with build_log(build_id):
+            log("this line goes into the buffer")
+            ...
+        # leaving the with-block only releases the in-memory buffer
+    """
+    buf = io.StringIO()
+    token = _current_build_id.set(build_id)
+
+    with _buffers_lock:
+        _buffers[build_id] = buf
+
+    try:
+        yield buf
+    finally:
+        # Cleanup
+        with _buffers_lock:
+            _buffers.pop(build_id, None)
+        _current_build_id.reset(token)
+        buf.close()
+
+
+@contextmanager
+def log_fold(label: str):
+    """Context manager for a foldable log block."""
+    log(f"[FOLD:{label}]")
+    try:
+        yield
+    finally:
+        log("[/FOLD]")
+
+
+def get_log_content(build_id: int) -> str | None:
+    """Return the log collected so far for a build (for live viewing)."""
+    buf = _buffers.get(build_id)
+    return buf.getvalue() if buf else None
+
+
+def _save_to_db(build_id: int, content: str) -> bool:
+    """Legacy-compat shim: DB persistence is disabled."""
+    return False
+
+
+# ============================================================================
+# Legacy compat — TeeStream (single-threaded use only, e.g. run_build_topic_agent.py)
+# ============================================================================
+
+class TeeStream(io.TextIOBase):
+    """Tee output stream: writes to the original stdout and an internal buffer.
+
+    ⚠️ Single-process, single-agent use only (e.g. CLI runs); use build_log()
+    for concurrent scenarios.
+    """
+
+    def __init__(self, original_stdout):
+        super().__init__()
+        self.original_stdout = original_stdout
+        self._buffer = io.StringIO()
+
+    def write(self, s):
+        if s:
+            self.original_stdout.write(s)
+            self._buffer.write(s)
+        return len(s) if s else 0
+
+    def flush(self):
+        self.original_stdout.flush()
+        self._buffer.flush()
+
+    def get_log(self) -> str:
+        return self._buffer.getvalue()
+
+    def save_to_db(self, build_id: int) -> bool:
+        return False
+
+    @property
+    def encoding(self):
+        return self.original_stdout.encoding
+
+    def isatty(self):
+        return False
+
+    def readable(self):
+        return False
+
+    def writable(self):
+        return True

+ 4 - 2
src/infra/trace/logging/log_service.py

@@ -1,4 +1,5 @@
 import asyncio
+import logging
 import traceback
 import time, json
 import datetime
@@ -8,6 +9,8 @@ from typing import Optional
 from aliyun.log import LogClient, PutLogsRequest, LogItem
 from src.config.aliyun import AliyunLogConfig
 
+logger = logging.getLogger(__name__)
+
 
 class LogService:
     def __init__(self, log_config: AliyunLogConfig):
@@ -65,8 +68,7 @@ class LogService:
                 try:
                     await asyncio.to_thread(self._put_log, contents)
                 except Exception as e:
-                    print(f"[Log Error] {e}")
-                    print(traceback.format_exc())
+                    logger.error("Log worker error: %s\n%s", e, traceback.format_exc())
         except asyncio.CancelledError:
             pass
 

+ 129 - 0
src/infra/trace/logging/tool_logging.py

@@ -0,0 +1,129 @@
+"""Generic wrappers for tool-call logging."""
+
+from __future__ import annotations
+
+import json
+from typing import Any, Dict
+
+from .log_capture import log, log_fold
+
+
+def _pretty_json_if_possible(text: str) -> str:
+    """Return indented, readable JSON if the text is valid JSON; otherwise return it unchanged."""
+    raw = (text or "").strip()
+    if not raw:
+        return text
+    if not (raw.startswith("{") or raw.startswith("[")):
+        return text
+    try:
+        parsed = json.loads(raw)
+    except Exception:
+        return text
+    return json.dumps(parsed, ensure_ascii=False, indent=2)
+
+
+def _truncate_deep(obj: Any, str_limit: int = 2000) -> Any:
+    """Recursively walk the object, truncating over-long strings and keeping all other structure."""
+    if isinstance(obj, str):
+        return obj if len(obj) <= str_limit else obj[:str_limit] + f"...(truncated, total {len(obj)} chars)"
+    if isinstance(obj, dict):
+        return {k: _truncate_deep(v, str_limit) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [_truncate_deep(item, str_limit) for item in obj]
+    return obj
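The recursive truncation above preserves structure while bounding every string. A simplified standalone sketch with a shorter marker (the function name and marker format here are illustrative, not the module's exact output):

```python
from typing import Any


def truncate_deep(obj: Any, limit: int = 10) -> Any:
    """Bound every string in a nested dict/list structure; leave other values intact."""
    if isinstance(obj, str):
        return obj if len(obj) <= limit else obj[:limit] + "..."
    if isinstance(obj, dict):
        return {k: truncate_deep(v, limit) for k, v in obj.items()}
    if isinstance(obj, list):
        return [truncate_deep(v, limit) for v in obj]
    return obj


# Usage: long strings are clipped at every depth; numbers pass through unchanged.
out = truncate_deep({"a": "x" * 20, "b": [1, "y" * 20]}, limit=5)
```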
+
+
+def _structure_metadata(md: Dict[str, Any], body_limit: int = 200) -> Dict[str, Any]:
+    """Structurally trim metadata, stripping large fields such as raw_data / full body text.
+
+    - Results with article_info: keep title, stats, and a body preview; drop raw HTML / image lists.
+    - Results with account_info: keep the key account fields.
+    - Results with search_results: keep only title and URL per item.
+    - Otherwise: recursively truncate over-long strings.
+    """
+    # --- Article detail ---
+    article_info = md.get("article_info")
+    if isinstance(article_info, dict):
+        body = str(article_info.get("body_text", "") or "")
+        body_preview = body[:body_limit] + "..." if len(body) > body_limit else body
+        # Strip image marker lines
+        body_preview = "\n".join(
+            line for line in body_preview.splitlines()
+            if not line.strip().startswith("[image:")
+        )
+        images = article_info.get("image_url_list") or []
+        return {
+            "article_info": {
+                "title": article_info.get("title", ""),
+                "content_link": article_info.get("content_link", ""),
+                "publish_timestamp": article_info.get("publish_timestamp"),
+                "statistics": {
+                    "view_count": article_info.get("view_count"),
+                    "like_count": article_info.get("like_count"),
+                    "share_count": article_info.get("share_count"),
+                    "looking_count": article_info.get("looking_count"),
+                    "comment_count": article_info.get("comment_count"),
+                    "collect_count": article_info.get("collect_count"),
+                },
+                "is_original": article_info.get("is_original", False),
+                "image_count": len(images),
+                "body_length": len(body),
+                "body_preview": body_preview,
+            }
+        }
+
+    # --- Account info ---
+    account_info = md.get("account_info")
+    if isinstance(account_info, dict):
+        return {
+            "account_info": {
+                "account_name": account_info.get("account_name", ""),
+                "wx_gh": account_info.get("wx_gh", ""),
+                "channel_account_id": account_info.get("channel_account_id", ""),
+            }
+        }
+
+    # --- Search result list ---
+    search_results = md.get("search_results")
+    if isinstance(search_results, list):
+        brief = [
+            {"title": item.get("title", ""), "url": item.get("url", "")}
+            for item in search_results[:20]
+        ]
+        return {"search_results": brief, "total": len(search_results)}
+
+    # --- Fallback: recursive truncation ---
+    return _truncate_deep(md)
+
+
+def format_tool_result_for_log(result: Any) -> str:
+    """将 ToolResult 或普通字符串格式化为可写入日志的文本。
+
+    对文章详情类结果,输出结构化摘要(标题/统计/正文预览),
+    剥离 raw_data 和完整正文,避免日志被大段内容淹没。
+    """
+    if result is None:
+        return ""
+    if isinstance(result, str):
+        s = result
+        return s if len(s) <= 8000 else s[:8000] + "\n...(truncated)"
+    title = getattr(result, "title", "") or ""
+    output = getattr(result, "output", None) or ""
+    err = getattr(result, "error", None)
+    payload: Dict[str, Any] = {"title": title, "output": output}
+    if err:
+        payload["error"] = err
+    md = getattr(result, "metadata", None)
+    if isinstance(md, dict) and md:
+        payload["metadata"] = _structure_metadata(md)
+    return json.dumps(payload, ensure_ascii=False)
+
+
+def log_tool_call(tool_name: str, params: Dict[str, Any], result: str) -> None:
+    """以折叠块结构化输出工具调用参数与返回内容。"""
+    with log_fold(f"🔧 {tool_name}"):
+        with log_fold("📥 调用参数"):
+            log(json.dumps(params, ensure_ascii=False, indent=2))
+        with log_fold("📤 返回内容"):
+            log(_pretty_json_if_possible(result))
+
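The truncation helper above can be exercised on its own. The sketch below mirrors `_truncate_deep` so it runs standalone (the sample payload is made up for illustration):

```python
from typing import Any

def _truncate_deep(obj: Any, str_limit: int = 2000) -> Any:
    # Mirror of the helper above: recursively truncate over-long strings,
    # leaving dict/list structure and short values untouched.
    if isinstance(obj, str):
        return obj if len(obj) <= str_limit else obj[:str_limit] + f"...(truncated, total {len(obj)} chars)"
    if isinstance(obj, dict):
        return {k: _truncate_deep(v, str_limit) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_truncate_deep(item, str_limit) for item in obj]
    return obj

payload = {"title": "ok", "body": "x" * 5000, "tags": ["a", "b" * 3000]}
compact = _truncate_deep(payload, str_limit=100)
```

Short values pass through unchanged, while the 5000-character body collapses to a 100-character prefix plus a truncation note.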

+ 11 - 0
src/pipeline/__init__.py

@@ -0,0 +1,11 @@
+"""Pipeline 对外导出入口。"""
+
+from .base import PipelineConfig
+from .context import PipelineContext
+from .orchestrator import PipelineOrchestrator
+
+__all__ = [
+    "PipelineConfig",
+    "PipelineContext",
+    "PipelineOrchestrator",
+]

+ 6 - 0
src/pipeline/adapters/__init__.py

@@ -0,0 +1,6 @@
+"""适配器层对外导出。"""
+
+from .base import ArticleDetail, ToolAdapter
+from .weixin import WeixinToolAdapter
+
+__all__ = ["ArticleDetail", "ToolAdapter", "WeixinToolAdapter"]

+ 88 - 0
src/pipeline/adapters/base.py

@@ -0,0 +1,88 @@
+"""
+Execution-layer adapter abstractions.
+
+Design intent:
+- Decouple business orchestration from external platform tool calls.
+- Upper-layer stages depend only on the ToolAdapter interface and never on
+  WeChat (or other platform) implementation details.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Any, Dict, List, Optional
+
+from src.pipeline.context import AccountInfo, CandidateArticle
+
+
+@dataclass
+class ArticleDetail:
+    """文章详情的标准化结构。"""
+
+    title: str
+    url: str
+    publish_time: int
+    body_text: str = ""
+    view_count: int = 0
+    like_count: int = 0
+    share_count: int = 0
+    looking_count: int = 0
+    comment_count: int = 0
+    collect_count: int = 0
+    image_count: int = 0
+    is_original: bool = False
+    channel_account_id: str = ""
+    raw_data: Dict[str, Any] = field(default_factory=dict)
+
+    def to_log_dict(self, body_limit: int = 200) -> Dict[str, Any]:
+        """返回适合日志输出的结构化字典,正文截断、raw_data 剥离。"""
+        body_preview = self.body_text[:body_limit] + "..." if len(self.body_text) > body_limit else self.body_text
+        # Strip image markers from the body; keep only a text preview
+        body_preview = "\n".join(
+            line for line in body_preview.splitlines()
+            if not line.strip().startswith("[image:")
+        )
+        publish_str = ""
+        if self.publish_time > 0:
+            try:
+                publish_str = datetime.fromtimestamp(self.publish_time, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
+            except (OSError, ValueError):
+                publish_str = str(self.publish_time)
+
+        return {
+            "title": self.title,
+            "url": self.url,
+            "publish_time": publish_str,
+            "statistics": {
+                "view_count": self.view_count,
+                "like_count": self.like_count,
+                "share_count": self.share_count,
+                "looking_count": self.looking_count,
+                "comment_count": self.comment_count,
+                "collect_count": self.collect_count,
+            },
+            "is_original": self.is_original,
+            "image_count": self.image_count,
+            "body_length": len(self.body_text),
+            "body_preview": body_preview,
+        }
+
+
+class ToolAdapter(ABC):
+    """平台能力接口:搜索、详情、账号信息。"""
+
+    @abstractmethod
+    async def search(self, keyword: str, page: str = "1") -> List[CandidateArticle]:
+        """按关键词召回候选文章。"""
+        raise NotImplementedError
+
+    @abstractmethod
+    async def get_article_detail(self, article_url: str) -> Optional[ArticleDetail]:
+        """按文章 URL 获取正文与发布时间等详情。"""
+        raise NotImplementedError
+
+    @abstractmethod
+    async def get_account(self, article_url: str) -> Optional[AccountInfo]:
+        """按文章 URL 获取所属公众号信息。"""
+        raise NotImplementedError

+ 5 - 0
src/pipeline/adapters/knowledge/__init__.py

@@ -0,0 +1,5 @@
+"""知识源子模块导出。"""
+
+from .base import KnowledgeItem, KnowledgeSource, StaticKnowledgeSource
+
+__all__ = ["KnowledgeItem", "KnowledgeSource", "StaticKnowledgeSource"]

+ 49 - 0
src/pipeline/adapters/knowledge/base.py

@@ -0,0 +1,49 @@
+"""
+Knowledge-source abstraction.
+
+Purpose:
+- Provide external priors (rules, blacklists, historical experience) to the
+  demand-understanding and filtering stages.
+- Upgrade knowledge injection from hard-coded prompt text to pluggable queries.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from typing import Any, Dict, List
+
+
+@dataclass
+class KnowledgeItem:
+    """知识条目最小结构。"""
+
+    title: str
+    content: str
+    metadata: Dict[str, Any] = field(default_factory=dict)
+
+
+class KnowledgeSource(ABC):
+    """知识源接口。"""
+
+    @abstractmethod
+    async def query(self, query: str, top_k: int = 5) -> List[KnowledgeItem]:
+        """按 query 检索知识条目。"""
+        raise NotImplementedError
+
+    def format_for_prompt(self, items: List[KnowledgeItem]) -> str:
+        """将知识条目格式化为可直接拼接到 prompt 的文本。"""
+        lines: List[str] = []
+        for index, item in enumerate(items, start=1):
+            lines.append(f"{index}. {item.title}: {item.content}")
+        return "\n".join(lines)
+
+
+class StaticKnowledgeSource(KnowledgeSource):
+    """静态知识实现:适合规则常量、演示数据、离线环境。"""
+
+    def __init__(self, items: List[KnowledgeItem]):
+        self.items = items
+
+    async def query(self, query: str, top_k: int = 5) -> List[KnowledgeItem]:
+        """返回前 top_k 条静态数据。"""
+        return self.items[:top_k]

+ 112 - 0
src/pipeline/adapters/weixin.py

@@ -0,0 +1,112 @@
+"""WeChat official-account tool adapter implementation."""
+
+from __future__ import annotations
+
+from typing import Any, Dict, List, Optional
+
+from src.pipeline.adapters.base import ArticleDetail, ToolAdapter
+from src.pipeline.context import AccountInfo, CandidateArticle
+from tests.tools.weixin_tools import (
+    fetch_article_detail,
+    fetch_weixin_account,
+    weixin_search,
+)
+
+
+class WeixinToolAdapter(ToolAdapter):
+    """
+    将 tests/tools/weixin_tools.py 的工具返回统一映射到 Pipeline 契约。
+
+    这里做的工作主要是“标准化”:
+    - 字段抽取(metadata -> dataclass)
+    - 轻量清洗(标题引号统一、空值保护)
+    - 时间戳转换(毫秒 -> 秒)
+    """
+
+    async def search(self, keyword: str, page: str = "1") -> List[CandidateArticle]:
+        """调用 weixin_search 并转换为 CandidateArticle 列表。"""
+        result = await weixin_search(keyword=keyword, page=page)
+        metadata = getattr(result, "metadata", {}) or {}
+        items = metadata.get("search_results", []) or []
+        candidates: List[CandidateArticle] = []
+        for item in items:
+            title = self._normalize_title(item.get("title", ""))
+            url = item.get("url", "")
+            statistics = item.get("statistics", {}) or {}
+            publish_time = int(statistics.get("time") or 0)
+            if not title or not url or not publish_time:
+                continue
+            candidates.append(
+                CandidateArticle(
+                    title=title,
+                    url=url,
+                    publish_time=publish_time,
+                    source_keyword=keyword,
+                    recall_round=1,
+                    statistics=statistics,
+                )
+            )
+        return candidates
+
+    async def get_article_detail(self, article_url: str) -> Optional[ArticleDetail]:
+        """调用 fetch_article_detail 并转换为 ArticleDetail。"""
+        result = await fetch_article_detail(article_link=article_url)
+        metadata = getattr(result, "metadata", {}) or {}
+        article_info = metadata.get("article_info")
+        if not isinstance(article_info, dict):
+            return None
+
+        publish_ms = int(article_info.get("publish_timestamp") or 0)
+        publish_time = publish_ms // 1000 if publish_ms else 0
+        image_list = article_info.get("image_url_list") or []
+        return ArticleDetail(
+            title=self._normalize_title(article_info.get("title", "")),
+            url=article_info.get("content_link", article_url),
+            publish_time=publish_time,
+            body_text=article_info.get("body_text", "") or "",
+            view_count=int(article_info.get("view_count") or 0),
+            like_count=int(article_info.get("like_count") or 0),
+            share_count=int(article_info.get("share_count") or 0),
+            looking_count=int(article_info.get("looking_count") or 0),
+            comment_count=int(article_info.get("comment_count") or 0),
+            collect_count=int(article_info.get("collect_count") or 0),
+            image_count=len(image_list),
+            is_original=bool(article_info.get("is_original", False)),
+            channel_account_id=str(article_info.get("channel_account_id") or ""),
+            raw_data=article_info,
+        )
+
+    async def get_account(self, article_url: str) -> Optional[AccountInfo]:
+        """调用 fetch_weixin_account 并转换为 AccountInfo。"""
+        result = await fetch_weixin_account(content_link=article_url)
+        metadata = getattr(result, "metadata", {}) or {}
+        account_info = metadata.get("account_info")
+        if not isinstance(account_info, dict):
+            return None
+
+        return AccountInfo(
+            account_name=account_info.get("account_name", "") or "",
+            wx_gh=account_info.get("wx_gh", "") or "",
+            channel_account_id=str(account_info.get("channel_account_id") or ""),
+            biz_info=self._normalize_dict(account_info.get("biz_info")),
+        )
+
+    @staticmethod
+    def _normalize_title(text: str) -> str:
+        """把英文双引号替换为中文成对引号,保持输出风格一致。"""
+        if '"' not in text:
+            return text
+        chars: List[str] = []
+        open_quote = True
+        for ch in text:
+            if ch == '"':
+                chars.append("“" if open_quote else "”")
+                open_quote = not open_quote
+            else:
+                chars.append(ch)
+        return "".join(chars)
+
+    @staticmethod
+    def _normalize_dict(value: Any) -> Dict[str, Any]:
+        """确保返回值为字典,避免上游类型异常传染到业务层。"""
+        return value if isinstance(value, dict) else {}

+ 106 - 0
src/pipeline/base.py

@@ -0,0 +1,106 @@
+"""
+Pipeline abstraction layer.
+
+This file holds only protocols and basic data structures:
+- Stage: a single execution phase
+- QualityGate: post-stage validation
+- PipelineHook: lifecycle observation points
+- PipelineConfig: orchestrator configuration
+
+Concrete business logic belongs under stages/, gates/, and hooks/; keep it out of this file.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Literal, Optional
+
+from .context import PipelineContext
+
+
+@dataclass
+class RetryDecision:
+    """描述阶段异常后的重试策略。"""
+
+    should_retry: bool = False
+    max_retries: int = 0
+    reason: str = ""
+
+
+@dataclass
+class GateResult:
+    """
+    门禁输出结果。
+
+    action 约定:
+    - proceed: 继续到下一阶段
+    - retry_stage: 重跑当前阶段
+    - fallback: 跳转到 fallback_stage
+    - abort: 终止整个流水线
+    """
+
+    passed: bool
+    issues: List[str] = field(default_factory=list)
+    action: Literal["proceed", "retry_stage", "fallback", "abort"] = "proceed"
+    fallback_stage: Optional[str] = None
+
+
+class Stage(ABC):
+    """流水线阶段基类。每个阶段关注单一职责。"""
+
+    name: str = ""
+    description: str = ""
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        """执行前输入校验。返回错误列表;空列表表示通过。"""
+        return []
+
+    def on_retry(self, ctx: PipelineContext, error: Exception) -> RetryDecision:
+        """阶段异常时的重试策略,默认不重试。"""
+        return RetryDecision()
+
+    @abstractmethod
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """执行阶段逻辑并返回更新后的上下文。"""
+        raise NotImplementedError
+
+
+class QualityGate(ABC):
+    """阶段后校验基类。用于把“约束”固化为代码规则。"""
+
+    @abstractmethod
+    def check(self, ctx: PipelineContext) -> GateResult:
+        raise NotImplementedError
+
+
+class PipelineHook(ABC):
+    """编排生命周期钩子。适合做日志、监控、持久化等旁路能力。"""
+
+    async def on_pipeline_start(self, ctx: PipelineContext) -> None:
+        return None
+
+    async def on_stage_start(self, stage_name: str, ctx: PipelineContext) -> None:
+        return None
+
+    async def on_stage_complete(self, stage_name: str, ctx: PipelineContext) -> None:
+        return None
+
+    async def on_gate_check(self, gate_name: str, result: GateResult, ctx: PipelineContext) -> None:
+        return None
+
+    async def on_pipeline_complete(self, ctx: PipelineContext) -> None:
+        return None
+
+    async def on_error(self, stage_name: str, error: Exception, ctx: PipelineContext) -> None:
+        return None
+
+
+@dataclass
+class PipelineConfig:
+    """编排器运行配置。"""
+
+    max_stage_retries: int = 1
+    checkpoint_enabled: bool = True
+    fail_fast: bool = True
+    metadata: Dict[str, Any] = field(default_factory=dict)

+ 26 - 0
src/pipeline/config/pipeline_config.py

@@ -0,0 +1,26 @@
+"""Pipeline runtime configuration (environment variables -> strongly typed config object)."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+import os
+
+
+@dataclass
+class RuntimePipelineConfig:
+    """运行时参数聚合,避免在代码中散落 os.getenv。"""
+
+    model: str
+    temperature: float
+    max_iterations: int
+    target_count: int
+
+    @classmethod
+    def from_env(cls) -> "RuntimePipelineConfig":
+        """从环境变量读取配置并填充默认值。"""
+        return cls(
+            model=os.getenv("MODEL", "anthropic/claude-opus-4-6"),
+            temperature=float(os.getenv("PIPELINE_TEMPERATURE", "0.2")),
+            max_iterations=int(os.getenv("PIPELINE_MAX_ITERATIONS", "12")),
+            target_count=int(os.getenv("PIPELINE_TARGET_COUNT", "10")),
+        )

+ 190 - 0
src/pipeline/context.py

@@ -0,0 +1,190 @@
+"""
+Shared pipeline context and data contracts.
+
+Design intent:
+- Constrain inter-stage inputs/outputs with strongly typed dataclasses, reducing "stringly keyed dict" errors.
+- PipelineContext is the single source of truth that flows between stages.
+- snapshot() gives logs and hooks a stable summary view.
+"""
+
+from __future__ import annotations
+
+from dataclasses import asdict, dataclass, field
+from datetime import datetime
+from typing import Any, Dict, List, Literal, Optional
+
+# Relevance level
+RelevanceLevel = Literal["high", "medium", "low"]
+# Interest level
+InterestLevel = Literal["high", "medium", "low"]
+# Popularity level
+HotLevel = Literal["high", "medium", "low"]
+
+
+@dataclass
+class SearchStrategy:
+    """需求理解阶段产出的搜索策略提示。"""
+
+    precise_search: bool = True  # 是否建议“精准词直搜”(搜非常具体、可直接命中的词)
+    topic_drill_down: bool = True  # 是否建议“主题下钻”(先搜大主题,再沿子主题扩展)
+    precise_keywords: List[str] = field(default_factory=list)  # 具体可搜的精准词列表(优先级高)
+    topic_keywords: List[str] = field(default_factory=list)  # 主题层关键词列表(用于扩展召回)
+
+
+@dataclass
+class FilterFocus:
+    """筛选阶段关注点(格式、相关性、风险)。"""
+
+    format_rules: List[str] = field(default_factory=list)
+    relevance_focus: List[str] = field(default_factory=list)
+    elimination_risks: List[str] = field(default_factory=list)
+
+
+@dataclass
+class DemandAnalysisResult:
+    """需求分析阶段标准输出。"""
+    substantive_features: List[str] = field(default_factory=list)
+    formal_features: List[str] = field(default_factory=list)
+    upper_features: List[str] = field(default_factory=list)
+    lower_features: List[str] = field(default_factory=list)
+    search_strategy: SearchStrategy = field(default_factory=SearchStrategy)
+    filter_focus: FilterFocus = field(default_factory=FilterFocus)
+    raw_result: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class CandidateArticle:
+    """搜索阶段候选文章结构。"""
+    title: str
+    url: str
+    publish_time: int
+    view_count: int = 0  # read count
+    like_count: int = 0  # like count
+    share_count: int = 0  # share count
+    looking_count: int = 0  # "looking" (wow) count
+    source_keyword: str = ""
+    recall_round: int = 1
+    statistics: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class FilteredArticle(CandidateArticle):
+    """质量筛选后保留文章结构。"""
+    reason: str = ""
+    relevance_level: RelevanceLevel = "medium"
+    interest_level: InterestLevel = "medium"
+    # hot_level: HotLevel = "medium"
+    detail_title: str = ""
+    detail_url: str = ""
+    body_text: str = ""
+
+
+@dataclass
+class AccountInfo:
+    """账号沉淀后的账号聚合信息。"""
+    account_name: str
+    wx_gh: str
+    channel_account_id: str = ""
+    biz_info: Dict[str, Any] = field(default_factory=dict)
+    article_count: int = 0
+    sample_articles: List[str] = field(default_factory=list)
+    source_urls: List[str] = field(default_factory=list)
+
+
+@dataclass
+class ArticleAccountRelation:
+    """文章与账号的映射关系。"""
+    article_url: str
+    wx_gh: str
+
+
+@dataclass
+class StageError:
+    """阶段执行错误记录。"""
+    stage_name: str
+    error_type: str
+    message: str
+    retryable: bool = False
+    created_at: str = field(default_factory=lambda: datetime.utcnow().isoformat())
+
+
+@dataclass
+class StageRecord:
+    """阶段执行记录(用于审计与重跑分析)。"""
+    stage_name: str
+    started_at: str
+    completed_at: Optional[str] = None
+    status: str = "pending"
+    attempt: int = 1
+    summary: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class OutputSummary:
+    """输出摘要统计。"""
+    candidate_count: int = 0
+    filtered_in_count: int = 0
+    account_count: int = 0
+
+
+@dataclass
+class PipelineOutput:
+    """最终 output.json 的数据结构。"""
+    trace_id: str
+    query: str
+    demand_id: str
+    summary: OutputSummary
+    contents: List[Dict[str, Any]]
+    accounts: List[Dict[str, Any]]
+    article_account_relations: List[Dict[str, Any]]
+
+    def to_dict(self) -> Dict[str, Any]:
+        return asdict(self)
+
+
+@dataclass
+class PipelineContext:
+    """
+    流水线执行上下文。
+
+    任何阶段都应只读/修改该对象中的明确字段,不应在外部维护隐式状态。
+    """
+    task_id: str
+    query: str
+    demand_id: str = ""
+    target_count: int = 10
+    model: str = ""
+    platform: str = "weixin"
+    trace_id: Optional[str] = None
+    output_dir: str = ""
+    current_stage: str = "INIT"
+
+    demand_analysis: Optional[DemandAnalysisResult] = None
+    candidate_articles: List[CandidateArticle] = field(default_factory=list)
+    filtered_articles: List[FilteredArticle] = field(default_factory=list)
+    accounts: List[AccountInfo] = field(default_factory=list)
+    article_account_relations: List[ArticleAccountRelation] = field(default_factory=list)
+    output: Optional[PipelineOutput] = None
+
+    stage_history: List[StageRecord] = field(default_factory=list)
+    checkpoints: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+    errors: List[StageError] = field(default_factory=list)
+    metadata: Dict[str, Any] = field(default_factory=dict)
+    knowledge_sources: Dict[str, Any] = field(default_factory=dict)
+
+    def snapshot(self) -> Dict[str, Any]:
+        """生成轻量快照,供日志、监控与数据库 Hook 使用。"""
+        return {
+            "task_id": self.task_id,
+            "query": self.query,
+            "demand_id": self.demand_id,
+            "target_count": self.target_count,
+            "model": self.model,
+            "platform": self.platform,
+            "trace_id": self.trace_id,
+            "output_dir": self.output_dir,
+            "current_stage": self.current_stage,
+            "candidate_count": len(self.candidate_articles),
+            "filtered_count": len(self.filtered_articles),
+            "account_count": len(self.accounts),
+        }

+ 11 - 0
src/pipeline/gates/__init__.py

@@ -0,0 +1,11 @@
+"""门禁模块导出。"""
+
+from .filter_sufficiency import FilterSufficiencyGate
+from .output_schema import OutputSchemaGate
+from .search_completeness import SearchCompletenessGate
+
+__all__ = [
+    "FilterSufficiencyGate",
+    "OutputSchemaGate",
+    "SearchCompletenessGate",
+]

+ 51 - 0
src/pipeline/gates/filter_sufficiency.py

@@ -0,0 +1,51 @@
+"""Gate on the number of filtered-in results."""
+
+from __future__ import annotations
+
+from src.pipeline.base import GateResult, QualityGate
+from src.pipeline.context import PipelineContext
+
+
+class FilterSufficiencyGate(QualityGate):
+    """校验 quality_filter 阶段输出是否满足目标数量。
+
+    当入选数量不足时,最多触发一轮 fallback 补召回。
+    若补召回后仍不足,则放行(有多少用多少)。
+    """
+
+    def __init__(self, fallback_stage: str = "content_search"):
+        # When the count falls clearly short, fall back to the given stage for extra recall
+        self.fallback_stage = fallback_stage
+        self._check_count = 0
+
+    def check(self, ctx: PipelineContext) -> GateResult:
+        self._check_count += 1
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        near_ratio = float(policy.get("filter_near_ratio", 0.5))
+        count = len(ctx.filtered_articles)
+        target = max(ctx.target_count, 1)
+
+        if count >= target:
+            return GateResult(passed=True)
+
+        if count >= max(int(target * near_ratio), 1):
+            return GateResult(
+                passed=True,
+                issues=[f"入选数量接近目标: {count}/{target}"],
+                action="proceed",
+            )
+
+        # Already fell back once; don't retry again, use whatever we have
+        if self._check_count > 1:
+            return GateResult(
+                passed=True,
+                issues=[f"补召回后入选仍不足({count}/{target}),放行已有结果"],
+                action="proceed",
+            )
+
+        return GateResult(
+            passed=False,
+            issues=[f"入选数量不足: {count}/{target}"],
+            action="fallback",
+            fallback_stage=self.fallback_stage,
+        )

+ 43 - 0
src/pipeline/gates/output_schema.py

@@ -0,0 +1,43 @@
+"""Gate on final-output consistency."""
+
+from __future__ import annotations
+
+from src.pipeline.base import GateResult, QualityGate
+from src.pipeline.context import PipelineContext
+
+
+class OutputSchemaGate(QualityGate):
+    """
+    校验 output_persist 阶段产出的结构完整性与引用一致性。
+
+    关注点:
+    - summary 计数与实际数组长度一致
+    - relation 中的 article_url / wx_gh 必须能在主表中找到
+    """
+
+    def check(self, ctx: PipelineContext) -> GateResult:
+        output = ctx.output
+        if not output:
+            return GateResult(passed=False, issues=["缺少 output"], action="abort")
+
+        issues = []
+        if output.summary.candidate_count < output.summary.filtered_in_count:
+            issues.append("candidate_count 不能小于 filtered_in_count")
+        if output.summary.account_count != len(output.accounts):
+            issues.append("account_count 与 accounts 数量不一致")
+        if output.summary.filtered_in_count != len(output.contents):
+            issues.append("filtered_in_count 与 contents 数量不一致")
+
+        content_urls = {item["url"] for item in output.contents}
+        account_ids = {item["wx_gh"] for item in output.accounts if item["wx_gh"]}
+        for relation in output.article_account_relations:
+            if relation["article_url"] not in content_urls:
+                issues.append(f"relation 引用了不存在的 article_url: {relation['article_url']}")
+            if relation["wx_gh"] and relation["wx_gh"] not in account_ids:
+                issues.append(f"relation 引用了不存在的 wx_gh: {relation['wx_gh']}")
+
+        return GateResult(
+            passed=not issues,
+            issues=issues,
+            action="proceed" if not issues else "abort",
+        )

+ 36 - 0
src/pipeline/gates/search_completeness.py

@@ -0,0 +1,36 @@
+"""Gate on candidate recall volume."""
+
+from __future__ import annotations
+
+from src.pipeline.base import GateResult, QualityGate
+from src.pipeline.context import PipelineContext
+
+
+class SearchCompletenessGate(QualityGate):
+    """
+    校验 content_search 阶段是否召回了足够候选。
+
+    策略来源:
+    - ctx.target_count
+    - ctx.metadata.search_agent_policy 中的候选倍率参数
+    """
+
+    def check(self, ctx: PipelineContext) -> GateResult:
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        mult = float(policy.get("min_candidate_multiplier", 2.0))
+        near = float(policy.get("near_enough_candidate_multiplier", 1.2))
+        target = max(int(ctx.target_count * mult), 1)
+        count = len(ctx.candidate_articles)
+        if count >= target:
+            return GateResult(passed=True)
+        if count >= max(int(ctx.target_count * near), 1):
+            return GateResult(
+                passed=True,
+                issues=[f"候选数量低于理想值,但可继续: {count}/{target}"],
+                action="proceed",
+            )
+        return GateResult(
+            passed=False,
+            issues=[f"候选数量不足: {count}/{target}"],
+            action="abort",
+        )

+ 8 - 0
src/pipeline/hooks/__init__.py

@@ -0,0 +1,8 @@
+"""Hook 模块导出。"""
+
+from .db_hook import DatabasePersistHook
+from .live_progress_hook import LiveProgressHook
+from .pipeline_trace_hook import PipelineTraceHook
+from .trace_hook import TraceHook
+
+__all__ = ["DatabasePersistHook", "LiveProgressHook", "PipelineTraceHook", "TraceHook"]

+ 388 - 0
src/pipeline/hooks/db_hook.py

@@ -0,0 +1,388 @@
+"""
+Database persistence hook.
+
+Intent:
+- Persist pipeline runs to MySQL for auditing, review, and operational analysis.
+- Stay strictly on the side channel: any persistence failure only warns and
+  never blocks the main flow.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from datetime import datetime
+from hashlib import sha256
+from typing import Any, Iterable
+
+from src.config import LongArticlesSearchAgentConfig
+from src.infra.database.mysql.async_mysql_pool import AsyncMySQLPool
+from src.pipeline.base import GateResult, PipelineHook
+from src.pipeline.context import PipelineContext
+
+logger = logging.getLogger(__name__)
+
+
+class DatabasePersistHook(PipelineHook):
+    """
+    轻量数据库持久化 hook。
+
+    约束:
+    - 若未配置数据库连接,自动跳过
+    - 只做 upsert / insert,不影响主流程
+    - 依赖 docs/supply-agent-solution.md 中建议的表结构
+    """
+
+    def __init__(self) -> None:
+        # When enabled=False, every callback returns immediately with no DB overhead
+        self.enabled = False
+        self.pool: AsyncMySQLPool | None = None
+        self._snapshot_pk: int | None = None
+        try:
+            config = LongArticlesSearchAgentConfig()
+            db_conf = config.search_agent_db
+            if db_conf.host and db_conf.user and db_conf.db:
+                self.pool = AsyncMySQLPool(config)
+                self.enabled = True
+        except Exception as exc:
+            logger.warning("DatabasePersistHook init skipped: %s", exc)
+
+    async def on_pipeline_start(self, ctx: PipelineContext) -> None:
+        # Initialize the demand snapshot and the main task row
+        if not self.enabled or not self.pool:
+            return
+        self._snapshot_pk = await self._upsert_demand_snapshot(ctx)
+        await self._upsert_task(ctx, status="RUNNING")
+        await self._insert_event(ctx, "INIT", "STATE_CHANGE", "INFO", "pipeline started")
+
+    async def on_stage_complete(self, stage_name: str, ctx: PipelineContext) -> None:
+        # After each stage completes, write the stage record and event, plus stage-specific business tables
+        if not self.enabled or not self.pool:
+            return
+        await self._insert_stage_record(stage_name, ctx)
+        await self._insert_event(ctx, stage_name.upper(), "STATE_CHANGE", "INFO", "stage completed")
+
+        if stage_name == "content_search":
+            await self._persist_candidates(ctx)
+            await self._upsert_task(ctx, status="RECALL_DONE")
+        elif stage_name == "quality_filter":
+            await self._persist_scores(ctx)
+            await self._upsert_task(ctx, status="RANKED")
+        elif stage_name == "account_precipitate":
+            await self._persist_accounts(ctx)
+            await self._upsert_task(ctx, status="ENRICHED")
+        elif stage_name == "output_persist":
+            await self._upsert_task(ctx, status="COMPLETED")
+
+    async def on_gate_check(self, gate_name: str, result: GateResult, ctx: PipelineContext) -> None:
+        if not self.enabled or not self.pool:
+            return
+        payload = {"passed": result.passed, "issues": result.issues, "action": result.action}
+        await self._insert_event(ctx, gate_name.upper(), "GATE_CHECK", "INFO", json.dumps(payload, ensure_ascii=False))
+
+    async def on_error(self, stage_name: str, error: Exception, ctx: PipelineContext) -> None:
+        if not self.enabled or not self.pool:
+            return
+        await self._upsert_task(ctx, status="FAILED", error_code=type(error).__name__, error_message=str(error))
+        await self._insert_event(ctx, stage_name.upper(), "ERROR", "ERROR", str(error))
+
+    async def on_pipeline_complete(self, ctx: PipelineContext) -> None:
+        if not self.enabled or not self.pool:
+            return
+        await self._insert_event(ctx, "FINALIZE", "STATE_CHANGE", "INFO", "pipeline completed")
+
+    async def _upsert_demand_snapshot(self, ctx: PipelineContext) -> int | None:
+        """写需求快照表,返回快照主键(供 task 表关联)。"""
+        assert self.pool is not None
+        query_expansion = []
+        if ctx.demand_analysis:
+            query_expansion = (
+                ctx.demand_analysis.search_strategy.precise_keywords
+                + ctx.demand_analysis.search_strategy.topic_keywords
+            )
+        sql = """
+        INSERT INTO supply_demand_snapshot
+        (demand_id, query, query_expansion, platform, expected_count, audience_profile, quality_constraints, source_payload, version)
+        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, 1)
+        """
+        params = (
+            int(ctx.demand_id or 0),
+            ctx.query,
+            json.dumps(query_expansion, ensure_ascii=False),
+            ctx.platform,
+            ctx.target_count,
+            json.dumps({"core_user_group": "50岁以上中老年人"}, ensure_ascii=False),
+            json.dumps(ctx.metadata.get("quality_constraints", {}), ensure_ascii=False),
+            json.dumps({"task_id": ctx.task_id, "query": ctx.query}, ensure_ascii=False),
+        )
+        try:
+            await self.pool.async_save(sql, params)
+            rows = await self.pool.async_fetch(
+                "SELECT id FROM supply_demand_snapshot WHERE demand_id=%s ORDER BY id DESC LIMIT 1",
+                params=(int(ctx.demand_id or 0),),
+            )
+            return int(rows[0]["id"]) if rows else None
+        except Exception as exc:
+            logger.warning("persist demand snapshot skipped: %s", exc)
+            return None
+
+    async def _upsert_task(
+        self,
+        ctx: PipelineContext,
+        status: str,
+        error_code: str | None = None,
+        error_message: str | None = None,
+    ) -> None:
+        """写/更新任务主表。"""
+        assert self.pool is not None
+        now = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
+        started_at = now if status == "RUNNING" else None
+        finished_at = now if status in {"COMPLETED", "FAILED"} else None
+        sql = """
+        INSERT INTO supply_task
+        (task_id, demand_snapshot_id, demand_id, trace_id, status, current_stage, priority, retry_count, max_retry,
+         is_idempotent, idempotency_key, started_at, finished_at, error_code, error_message, operator, ext)
+        VALUES (%s, %s, %s, %s, %s, %s, 5, 0, 3, 1, %s, %s, %s, %s, %s, 'agent', %s)
+        ON DUPLICATE KEY UPDATE
+            trace_id=VALUES(trace_id),
+            status=VALUES(status),
+            current_stage=VALUES(current_stage),
+            finished_at=COALESCE(VALUES(finished_at), finished_at),
+            error_code=VALUES(error_code),
+            error_message=VALUES(error_message),
+            ext=VALUES(ext)
+        """
+        params = (
+            ctx.task_id,
+            self._snapshot_pk or 0,
+            int(ctx.demand_id or 0),
+            ctx.trace_id,
+            status,
+            ctx.current_stage.upper(),
+            _build_idempotency_key(ctx),
+            started_at,
+            finished_at,
+            error_code,
+            error_message,
+            json.dumps(ctx.snapshot(), ensure_ascii=False),
+        )
+        try:
+            await self.pool.async_save(sql, params)
+        except Exception as exc:
+            logger.warning("persist task skipped: %s", exc)
+
+    async def _insert_stage_record(self, stage_name: str, ctx: PipelineContext) -> None:
+        """写阶段执行明细表。"""
+        assert self.pool is not None
+        record = next((item for item in reversed(ctx.stage_history) if item.stage_name == stage_name), None)
+        if not record:
+            return
+        sql = """
+        INSERT INTO supply_task_stage
+        (task_id, stage_name, stage_status, attempt_no, input_payload, output_payload, started_at, finished_at, duration_ms, error_code, error_message)
+        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, NULL, NULL)
+        """
+        params = (
+            ctx.task_id,
+            stage_name.upper(),
+            "SUCCESS" if record.status == "completed" else "FAILED",
+            record.attempt,
+            json.dumps(ctx.checkpoints.get(stage_name, {}), ensure_ascii=False),
+            json.dumps(record.summary, ensure_ascii=False),
+            _iso_to_mysql(record.started_at),
+            _iso_to_mysql(record.completed_at),
+            _duration_ms(record.started_at, record.completed_at),
+        )
+        try:
+            await self.pool.async_save(sql, params)
+        except Exception as exc:
+            logger.warning("persist stage skipped: %s", exc)
+
+    async def _persist_candidates(self, ctx: PipelineContext) -> None:
+        """写候选内容池。"""
+        assert self.pool is not None
+        if not ctx.candidate_articles:
+            return
+        sql = """
+        INSERT INTO supply_candidate_content
+        (task_id, source_keyword, recall_round, recall_page, platform, title, url, publish_time, raw_statistics, dedup_hash, quality_flag, reject_reason)
+        VALUES (%s, %s, %s, %s, %s, %s, %s, FROM_UNIXTIME(%s), %s, %s, %s, %s)
+        ON DUPLICATE KEY UPDATE
+            source_keyword=VALUES(source_keyword),
+            recall_round=VALUES(recall_round),
+            raw_statistics=VALUES(raw_statistics),
+            quality_flag=VALUES(quality_flag),
+            reject_reason=VALUES(reject_reason)
+        """
+        params = [
+            (
+                ctx.task_id,
+                item.source_keyword,
+                item.recall_round,
+                1,
+                ctx.platform,
+                item.title,
+                item.url,
+                item.publish_time,
+                json.dumps(item.statistics, ensure_ascii=False),
+                _dedup_hash(item.title, item.url),
+                "UNKNOWN",
+                "",
+            )
+            for item in ctx.candidate_articles
+        ]
+        try:
+            await self.pool.async_save(sql, params, batch=True)
+        except Exception as exc:
+            logger.warning("persist candidates skipped: %s", exc)
+
+    async def _persist_scores(self, ctx: PipelineContext) -> None:
+        """写质量评分结果。"""
+        assert self.pool is not None
+        if not ctx.filtered_articles:
+            return
+        for rank, item in enumerate(ctx.filtered_articles, start=1):
+            rows = await self.pool.async_fetch(
+                "SELECT id FROM supply_candidate_content WHERE task_id=%s AND url=%s LIMIT 1",
+                params=(ctx.task_id, item.url),
+            )
+            if not rows:
+                continue
+            candidate_id = rows[0]["id"]
+            total_score = _level_to_score(item.relevance_level) * 0.6 + _level_to_score(item.interest_level) * 0.4
+            sql = """
+            INSERT INTO supply_content_score
+            (task_id, candidate_id, relevance_score, popularity_score, quality_score, account_score, elder_fit_score,
+             diversity_penalty, total_score, filter_status, filter_reason, is_selected, rank_no, score_version, score_detail)
+            VALUES (%s, %s, %s, 0, %s, 0, %s, 0, %s, 'PASS', %s, 1, %s, 'v1', %s)
+            ON DUPLICATE KEY UPDATE
+                relevance_score=VALUES(relevance_score),
+                quality_score=VALUES(quality_score),
+                elder_fit_score=VALUES(elder_fit_score),
+                total_score=VALUES(total_score),
+                filter_status='PASS',
+                filter_reason=VALUES(filter_reason),
+                is_selected=1,
+                rank_no=VALUES(rank_no),
+                score_detail=VALUES(score_detail)
+            """
+            params = (
+                ctx.task_id,
+                candidate_id,
+                _level_to_score(item.relevance_level),
+                _level_to_score(item.interest_level),
+                _level_to_score(item.interest_level),
+                total_score,
+                item.reason,
+                rank,
+                json.dumps(
+                    {
+                        "relevance_level": item.relevance_level,
+                        "interest_level": item.interest_level,
+                        "reason": item.reason,
+                    },
+                    ensure_ascii=False,
+                ),
+            )
+            try:
+                await self.pool.async_save(sql, params)
+            except Exception as exc:
+                logger.warning("persist score skipped: %s", exc)
+
+    async def _persist_accounts(self, ctx: PipelineContext) -> None:
+        """写账号画像/沉淀结果。"""
+        assert self.pool is not None
+        if not ctx.accounts:
+            return
+        sql = """
+        INSERT INTO supply_account_profile
+        (task_id, account_id, account_name, biz_info, credibility_score, risk_tags, raw_profile_payload)
+        VALUES (%s, %s, %s, %s, %s, %s, %s)
+        ON DUPLICATE KEY UPDATE
+            account_name=VALUES(account_name),
+            biz_info=VALUES(biz_info),
+            credibility_score=VALUES(credibility_score),
+            risk_tags=VALUES(risk_tags),
+            raw_profile_payload=VALUES(raw_profile_payload)
+        """
+        params = [
+            (
+                ctx.task_id,
+                item.wx_gh,
+                item.account_name,
+                json.dumps(item.biz_info, ensure_ascii=False),
+                min(item.article_count * 20, 100),
+                json.dumps([], ensure_ascii=False),
+                json.dumps(
+                    {
+                        "account_name": item.account_name,
+                        "wx_gh": item.wx_gh,
+                        "article_count": item.article_count,
+                        "source_urls": item.source_urls,
+                    },
+                    ensure_ascii=False,
+                ),
+            )
+            for item in ctx.accounts
+            if item.wx_gh
+        ]
+        try:
+            if params:
+                await self.pool.async_save(sql, params, batch=True)
+        except Exception as exc:
+            logger.warning("persist accounts skipped: %s", exc)
+
+    async def _insert_event(
+        self,
+        ctx: PipelineContext,
+        stage_name: str,
+        event_type: str,
+        event_level: str,
+        message: str,
+    ) -> None:
+        """写统一事件日志。"""
+        assert self.pool is not None
+        sql = """
+        INSERT INTO supply_task_event
+        (task_id, stage_name, event_type, event_level, event_payload, message)
+        VALUES (%s, %s, %s, %s, %s, %s)
+        """
+        params = (
+            ctx.task_id,
+            stage_name,
+            event_type,
+            event_level,
+            json.dumps(ctx.snapshot(), ensure_ascii=False),
+            message[:1000],
+        )
+        try:
+            await self.pool.async_save(sql, params)
+        except Exception as exc:
+            logger.warning("persist event skipped: %s", exc)
+
+
+def _build_idempotency_key(ctx: PipelineContext) -> str:
+    raw = f"{ctx.demand_id}:{ctx.query}:{ctx.platform}"
+    return sha256(raw.encode("utf-8")).hexdigest()
+
+
+def _dedup_hash(title: str, url: str) -> str:
+    raw = f"{title.strip()}::{url.strip()}"
+    return sha256(raw.encode("utf-8")).hexdigest()
+
+
+def _iso_to_mysql(value: str | None) -> str | None:
+    if not value:
+        return None
+    return datetime.fromisoformat(value).strftime("%Y-%m-%d %H:%M:%S")
+
+
+def _duration_ms(started_at: str | None, completed_at: str | None) -> int | None:
+    if not started_at or not completed_at:
+        return None
+    start = datetime.fromisoformat(started_at)
+    end = datetime.fromisoformat(completed_at)
+    return int((end - start).total_seconds() * 1000)
+
+
+def _level_to_score(level: str) -> int:
+    return {"low": 30, "medium": 70, "high": 90}.get(level, 0)

+ 203 - 0
src/pipeline/hooks/live_progress_hook.py

@@ -0,0 +1,203 @@
+from __future__ import annotations
+
+"""
+终端实时进度可视化 Hook。
+
+在终端输出 pipeline 各阶段的:
+- 实时进度与耗时
+- 模型思考(需求分析结果、搜索策略)
+- 参考点(搜索词命中数据)
+- 决策(逐篇文章审核结论)
+"""
+
+import logging
+import time
+from typing import Any, Dict, List, Optional
+
+from src.pipeline.base import GateResult, PipelineHook
+from src.pipeline.context import PipelineContext
+
+logger = logging.getLogger(__name__)
+
+# 阶段元信息
+_STAGE_META: Dict[str, Dict[str, str]] = {
+    "demand_analysis":      {"icon": "🔍", "label": "需求理解与特征分层"},
+    "content_search":       {"icon": "📡", "label": "按策略搜索候选文章"},
+    "hard_filter":          {"icon": "🧹", "label": "候选去重与基础规则过滤"},
+    "quality_filter":       {"icon": "⭐", "label": "规则与 LLM 混合筛选"},
+    "account_precipitate":  {"icon": "👤", "label": "基于文章聚合公众号"},
+    "output_persist":       {"icon": "💾", "label": "生成标准输出并落盘"},
+}
+
+_STAGE_ORDER = list(_STAGE_META.keys())
+
+_BAR = "━" * 56
+
+
+def _truncate(text: str, max_len: int = 40) -> str:
+    """截断过长文本。"""
+    if len(text) <= max_len:
+        return text
+    return text[: max_len - 1] + "…"
+
+
+class LiveProgressHook(PipelineHook):
+    """终端实时进度可视化。"""
+
+    def __init__(self) -> None:
+        self._stage_start_times: Dict[str, float] = {}
+        self._pipeline_start: float = 0.0
+        self._stage_index: int = 0
+        self._total_stages: int = len(_STAGE_ORDER)
+
+    # ── Pipeline 生命周期 ───────────────────────────────────
+
+    async def on_pipeline_start(self, ctx: PipelineContext) -> None:
+        self._pipeline_start = time.monotonic()
+        logger.info("\n%s", _BAR)
+        logger.info("  Pipeline Started")
+        logger.info("  Query   : %s", ctx.query)
+        logger.info("  Target  : %d 篇 | Model: %s", ctx.target_count, ctx.model)
+        logger.info("  TraceID : %s", ctx.trace_id or ctx.task_id)
+        logger.info(_BAR)
+
+    async def on_stage_start(self, stage_name: str, ctx: PipelineContext) -> None:
+        self._stage_start_times[stage_name] = time.monotonic()
+        meta = _STAGE_META.get(stage_name, {"icon": "▶", "label": stage_name})
+        self._stage_index = _STAGE_ORDER.index(stage_name) + 1 if stage_name in _STAGE_ORDER else self._stage_index + 1
+        logger.info(
+            "\n[%d/%d] %s %s — %s...",
+            self._stage_index, self._total_stages, meta["icon"], stage_name, meta["label"],
+        )
+
+    async def on_stage_complete(self, stage_name: str, ctx: PipelineContext) -> None:
+        elapsed = time.monotonic() - self._stage_start_times.get(stage_name, time.monotonic())
+        logger.info("      ✓ 完成 (%.1fs)", elapsed)
+
+        # 根据不同阶段打印决策信息
+        printer = getattr(self, f"_print_{stage_name}", None)
+        if printer:
+            printer(ctx)
+
+    async def on_gate_check(self, gate_name: str, result: GateResult, ctx: PipelineContext) -> None:
+        if result.passed:
+            logger.info("\n      🚦 %s Gate: ✓ 通过", gate_name)
+        else:
+            logger.warning("\n      🚦 %s Gate: ✗ 未通过 [%s]", gate_name, result.action)
+            for issue in result.issues:
+                logger.warning("         ⚠ %s", issue)
+            if result.fallback_stage:
+                logger.warning("         ↩ 回退到: %s", result.fallback_stage)
+
+    async def on_error(self, stage_name: str, error: Exception, ctx: PipelineContext) -> None:
+        logger.error("      ✗ 失败: %s", error)
+
+    async def on_pipeline_complete(self, ctx: PipelineContext) -> None:
+        total = time.monotonic() - self._pipeline_start
+        logger.info("\n%s", _BAR)
+        logger.info("  ✅ Pipeline 完成 | 总耗时 %.1fs", total)
+        logger.info(
+            "  候选: %d 篇 → 入选: %d 篇 → 账号: %d 个",
+            len(ctx.candidate_articles), len(ctx.filtered_articles), len(ctx.accounts),
+        )
+        if ctx.errors:
+            logger.warning("  ⚠ 错误: %d 个", len(ctx.errors))
+        logger.info("%s\n", _BAR)
+
+    # ── 各阶段决策信息打印 ──────────────────────────────────
+
+    def _print_demand_analysis(self, ctx: PipelineContext) -> None:
+        da = ctx.demand_analysis
+        if not da:
+            return
+        logger.info("      ┌─ 模型思考")
+        if da.substantive_features:
+            logger.info("      │  实质特征: %s", ", ".join(da.substantive_features))
+        if da.formal_features:
+            logger.info("      │  形式特征: %s", ", ".join(da.formal_features))
+        if da.upper_features:
+            logger.info("      │  上层特征: %s", ", ".join(da.upper_features))
+        if da.lower_features:
+            logger.info("      │  下层特征: %s", ", ".join(da.lower_features))
+        logger.info("      ├─ 搜索策略")
+        s = da.search_strategy
+        if s.precise_keywords:
+            logger.info("      │  精准词: %s", ", ".join(s.precise_keywords))
+        if s.topic_keywords:
+            logger.info("      │  主题词: %s", ", ".join(s.topic_keywords))
+        ff = da.filter_focus
+        if ff.relevance_focus:
+            logger.info("      ├─ 筛选关注")
+            for f in ff.relevance_focus:
+                logger.info("      │  · %s", f)
+        if ff.elimination_risks:
+            logger.info("      ├─ 淘汰风险")
+            for r in ff.elimination_risks:
+                logger.info("      │  · %s", r)
+        logger.info("      └─")
+
+    def _print_content_search(self, ctx: PipelineContext) -> None:
+        stats: List[Dict] = ctx.metadata.get("_search_keyword_stats", [])
+        if not stats:
+            logger.info("      └─ 📊 候选: %d 篇", len(ctx.candidate_articles))
+            return
+        logger.info("      ┌─ 搜索词命中")
+        for i, s in enumerate(stats):
+            connector = "│" if i < len(stats) - 1 else "│"
+            logger.info(
+                '      %s  "%s" → 返回 %d 篇, 新增 %d 篇',
+                connector, s["keyword"], s["returned"], s["new"],
+            )
+        logger.info("      └─ 📊 累计候选: %d 篇", len(ctx.candidate_articles))
+
+    def _print_hard_filter(self, ctx: PipelineContext) -> None:
+        logger.info("      └─ 📊 过滤后: %d 篇", len(ctx.candidate_articles))
+
+    def _print_quality_filter(self, ctx: PipelineContext) -> None:
+        reviews: List[Dict] = ctx.metadata.get("_quality_review_log", [])
+        if not reviews:
+            logger.info("      └─ 📊 入选: %d 篇", len(ctx.filtered_articles))
+            return
+
+        accepted = [r for r in reviews if r["status"] == "accept"]
+        rejected = [r for r in reviews if r["status"] == "reject"]
+        skipped = [r for r in reviews if r["status"] == "skip"]
+
+        logger.info(
+            "      ┌─ 详情拉取: %d/%d 成功%s",
+            len(reviews) - len(skipped), len(reviews),
+            f", {len(skipped)} 跳过" if skipped else "",
+        )
+
+        # 打印入选文章
+        if accepted:
+            logger.info("      ├─ ✓ 入选 (%d 篇)", len(accepted))
+            for r in accepted:
+                title = _truncate(r["title"])
+                logger.info("      │    %s", title)
+                logger.info("      │    → %s/%s | %s", r["relevance"], r["interest"], _truncate(r["reason"], 50))
+
+        # 打印淘汰文章
+        if rejected:
+            logger.info("      ├─ ✗ 淘汰 (%d 篇)", len(rejected))
+            for r in rejected:
+                title = _truncate(r["title"])
+                phase = "LLM复评" if r.get("phase") == "llm" else "启发式"
+                logger.info("      │    %s", title)
+                logger.info("      │    → [%s] %s", phase, _truncate(r["reason"], 80))
+
+        logger.info("      └─ 📊 最终入选: %d 篇", len(ctx.filtered_articles))
+
+    def _print_account_precipitate(self, ctx: PipelineContext) -> None:
+        if not ctx.accounts:
+            logger.info("      └─ 📊 账号: 0 个")
+            return
+        logger.info("      ┌─ 聚合账号")
+        for i, acc in enumerate(ctx.accounts):
+            connector = "│" if i < len(ctx.accounts) - 1 else "└"
+            logger.info("      %s  %s (%d 篇)", connector, acc.account_name, acc.article_count)
+
+    def _print_output_persist(self, ctx: PipelineContext) -> None:
+        output_file = ctx.metadata.get("output_file", "")
+        if output_file:
+            logger.info("      └─ 📄 %s", output_file)

+ 305 - 0
src/pipeline/hooks/pipeline_trace_hook.py

@@ -0,0 +1,305 @@
+from __future__ import annotations
+
+"""
+Pipeline JSONL 追踪 Hook。
+
+职责:
+- 将 Pipeline 整个生命周期的关键事件写成 JSONL 格式
+- 输出到 tests/traces/{trace_id}/pipeline.jsonl
+- 事件格式与 visualize_log.py 兼容,可直接用 pipeline_visualize.py 渲染 HTML
+
+事件类型(type 字段):
+  init            — Pipeline 启动(含 query / model / demand_id)
+  stage_start     — 阶段开始
+  stage_complete  — 阶段完成(含候选/入选数量快照)
+  gate_check      — 质量门禁结果
+  error           — 阶段异常
+  complete        — Pipeline 结束
+"""
+
+import json
+import logging
+import re
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict
+
+from src.pipeline.base import GateResult, PipelineHook
+from src.pipeline.context import PipelineContext
+
+logger = logging.getLogger(__name__)
+
+# 阶段图标映射
+_STAGE_ICONS: Dict[str, str] = {
+    "demand_analysis": "🔍",
+    "content_search": "📡",
+    "hard_filter": "🧹",
+    "coarse_filter": "🏷️",
+    "quality_filter": "⭐",
+    "account_precipitate": "👤",
+    "output_persist": "💾",
+}
+
+_GATE_ICONS: Dict[str, str] = {
+    "content_search": "🚦",
+    "quality_filter": "🚦",
+    "output_persist": "🚦",
+}
+
+
+def _now() -> str:
+    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
+
+
+def _strip_html(text: str) -> str:
+    """去除字符串中的 HTML 标签,只保留纯文本。"""
+    return re.sub(r"<[^>]+>", "", text) if text else ""
+
+
+def _stage_icon(name: str) -> str:
+    return _STAGE_ICONS.get(name, "▶")
+
+
+class PipelineTraceHook(PipelineHook):
+    """
+    将 Pipeline 生命周期事件写成 JSONL 文件,供 pipeline_visualize.py 渲染。
+
+    文件路径:{trace_dir}/{trace_id}/pipeline.jsonl
+    trace_id 在 on_pipeline_start 时从 ctx.trace_id 读取(已在 Context 构造时生成)。
+    """
+
+    def __init__(self, trace_dir: str | Path) -> None:
+        self._trace_dir = Path(trace_dir)
+        self._jsonl_path: Path | None = None
+        self._start_ts: str = ""
+        self._agent_trace_cursor: int = 0  # 已消费的 _stage_agent_traces 游标
+        self._llm_interaction_cursor: int = 0  # 已消费的 _llm_interactions 游标
+
+    # ── 内部工具 ──────────────────────────────────────────────
+
+    def _init_file(self, ctx: PipelineContext) -> None:
+        """根据 trace_id 确定输出路径并创建目录。"""
+        trace_id = ctx.trace_id or ctx.task_id
+        out_dir = self._trace_dir / trace_id
+        out_dir.mkdir(parents=True, exist_ok=True)
+        self._jsonl_path = out_dir / "pipeline.jsonl"
+        # 清空旧文件(重跑场景)
+        self._jsonl_path.write_text("", encoding="utf-8")
+
+    def _write(self, event: Dict[str, Any]) -> None:
+        if self._jsonl_path is None:
+            return
+        try:
+            with self._jsonl_path.open("a", encoding="utf-8") as f:
+                f.write(json.dumps(event, ensure_ascii=False) + "\n")
+        except Exception as exc:
+            logger.warning("PipelineTraceHook write failed: %s", exc)
+
+    def _snapshot_stats(self, ctx: PipelineContext) -> Dict[str, Any]:
+        return {
+            "candidate_count": len(ctx.candidate_articles),
+            "filtered_count": len(ctx.filtered_articles),
+            "account_count": len(ctx.accounts),
+        }
+
+    def _extract_decisions(self, stage_name: str, ctx: PipelineContext) -> Dict[str, Any]:
+        """根据 stage_name 提取该阶段的决策数据,写入 JSONL 供可视化使用。"""
+        extractor = getattr(self, f"_decisions_{stage_name}", None)
+        if extractor:
+            try:
+                return extractor(ctx)
+            except Exception as exc:
+                logger.debug("_extract_decisions(%s) failed: %s", stage_name, exc)
+        return {}
+
+    def _decisions_demand_analysis(self, ctx: PipelineContext) -> Dict[str, Any]:
+        da = ctx.demand_analysis
+        if not da:
+            return {}
+        return {
+            "substantive_features": da.substantive_features,
+            "formal_features": da.formal_features,
+            "upper_features": da.upper_features,
+            "lower_features": da.lower_features,
+            "search_strategy": {
+                "precise_keywords": da.search_strategy.precise_keywords,
+                "topic_keywords": da.search_strategy.topic_keywords,
+            },
+            "filter_focus": {
+                "relevance_focus": da.filter_focus.relevance_focus,
+                "elimination_risks": da.filter_focus.elimination_risks,
+            },
+        }
+
+    def _decisions_content_search(self, ctx: PipelineContext) -> Dict[str, Any]:
+        stats: list = ctx.metadata.get("_search_keyword_stats", [])
+        candidates = []
+        for a in ctx.candidate_articles:
+            candidates.append({
+                "title": _strip_html(a.title),
+                "url": a.url,
+                "source_keyword": a.source_keyword,
+                "publish_time": a.publish_time,
+                "view_count": a.view_count,
+            })
+        return {
+            "keyword_stats": stats,
+            "total_candidates": len(ctx.candidate_articles),
+            "candidates": candidates,
+        }
+
+    def _decisions_hard_filter(self, ctx: PipelineContext) -> Dict[str, Any]:
+        return {
+            "after_filter_count": len(ctx.candidate_articles),
+        }
+
+    def _decisions_coarse_filter(self, ctx: PipelineContext) -> Dict[str, Any]:
+        log: list = ctx.metadata.get("_coarse_filter_log", [])
+        return {
+            "coarse_log": log,
+            "passed_count": sum(1 for r in log if r.get("status") == "pass"),
+            "rejected_count": sum(1 for r in log if r.get("status") == "reject"),
+            "after_filter_count": len(ctx.candidate_articles),
+        }
+
+    def _decisions_quality_filter(self, ctx: PipelineContext) -> Dict[str, Any]:
+        reviews: list = ctx.metadata.get("_quality_review_log", [])
+        return {
+            "score_config": ctx.metadata.get("_quality_score_config", {}),
+            "match_terms": ctx.metadata.get("_quality_match_terms", []),
+            "review_log": reviews,
+            "accepted_count": sum(1 for r in reviews if r.get("status") == "accept"),
+            "rejected_count": sum(1 for r in reviews if r.get("status") == "reject"),
+            "skipped_count": sum(1 for r in reviews if r.get("status") == "skip"),
+            "final_filtered_count": len(ctx.filtered_articles),
+        }
+
+    def _decisions_account_precipitate(self, ctx: PipelineContext) -> Dict[str, Any]:
+        return {
+            "accounts": [
+                {
+                    "account_name": acc.account_name,
+                    "article_count": acc.article_count,
+                    "sample_articles": acc.sample_articles[:3],
+                }
+                for acc in ctx.accounts
+            ],
+        }
+
+    def _decisions_output_persist(self, ctx: PipelineContext) -> Dict[str, Any]:
+        return {
+            "output_file": ctx.metadata.get("output_file", ""),
+        }
+
+    # ── Hook 回调 ─────────────────────────────────────────────
+
+    async def on_pipeline_start(self, ctx: PipelineContext) -> None:
+        self._init_file(ctx)
+        self._start_ts = _now()
+        init_event: Dict[str, Any] = {
+            "type": "init",
+            "ts": self._start_ts,
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "task_id": ctx.task_id,
+            "query": ctx.query,
+            "demand_id": ctx.demand_id,
+            "model": ctx.model,
+            "target_count": ctx.target_count,
+            "platform": ctx.platform,
+        }
+        # 注入 harness 执行计划(如果存在)
+        run_plan = ctx.metadata.get("_run_plan")
+        if run_plan:
+            init_event["run_plan"] = run_plan
+        self._write(init_event)
+
+    async def on_stage_start(self, stage_name: str, ctx: PipelineContext) -> None:
+        icon = _stage_icon(stage_name)
+        self._write({
+            "type": "stage_start",
+            "ts": _now(),
+            "stage": stage_name,
+            "icon": icon,
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "stats": self._snapshot_stats(ctx),
+        })
+
+    async def on_stage_complete(self, stage_name: str, ctx: PipelineContext) -> None:
+        icon = _stage_icon(stage_name)
+        record = next(
+            (r for r in reversed(ctx.stage_history) if r.stage_name == stage_name), None
+        )
+        duration_ms = None
+        attempt = 1
+        if record and record.started_at and record.completed_at:
+            try:
+                t0 = datetime.fromisoformat(record.started_at)
+                t1 = datetime.fromisoformat(record.completed_at)
+                duration_ms = int((t1 - t0).total_seconds() * 1000)
+            except Exception:
+                pass
+            attempt = record.attempt
+
+        # 取出本阶段新增的 agent 子任务 trace_id(游标推进)
+        all_traces = ctx.metadata.get("_stage_agent_traces", [])
+        new_traces = all_traces[self._agent_trace_cursor:]
+        self._agent_trace_cursor = len(all_traces)
+        agent_trace_ids = [entry["agent_trace_id"] for entry in new_traces]
+
+        # 取出本阶段新增的 LLM 交互记录(游标推进)
+        all_interactions = ctx.metadata.get("_llm_interactions", [])
+        new_interactions = all_interactions[self._llm_interaction_cursor:]
+        self._llm_interaction_cursor = len(all_interactions)
+
+        self._write({
+            "type": "stage_complete",
+            "ts": _now(),
+            "stage": stage_name,
+            "icon": icon,
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "attempt": attempt,
+            "duration_ms": duration_ms,
+            "stats": self._snapshot_stats(ctx),
+            "decisions": self._extract_decisions(stage_name, ctx),
+            "agent_trace_ids": agent_trace_ids,
+            "llm_interactions": new_interactions if new_interactions else [],
+        })
+
+    async def on_gate_check(self, gate_name: str, result: GateResult, ctx: PipelineContext) -> None:
+        icon = _GATE_ICONS.get(gate_name, "🚦")
+        self._write({
+            "type": "gate_check",
+            "ts": _now(),
+            "gate": gate_name,
+            "icon": icon,
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "passed": result.passed,
+            "action": result.action,
+            "issues": result.issues,
+            "fallback_stage": result.fallback_stage,
+        })
+
+    async def on_error(self, stage_name: str, error: Exception, ctx: PipelineContext) -> None:
+        self._write({
+            "type": "error",
+            "ts": _now(),
+            "stage": stage_name,
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "msg": str(error),
+            "traceback": "",
+        })
+
+    async def on_pipeline_complete(self, ctx: PipelineContext) -> None:
+        self._write({
+            "type": "complete",
+            "ts": _now(),
+            "trace_id": ctx.trace_id or ctx.task_id,
+            "task_id": ctx.task_id,
+            "status": "completed",
+            "stats": self._snapshot_stats(ctx),
+            "output_file": ctx.metadata.get("output_file", ""),
+            "stage_count": len(ctx.stage_history),
+            "error_count": len(ctx.errors),
+        })
+        if self._jsonl_path:
+            logger.info("Pipeline trace saved: %s", self._jsonl_path)

+ 43 - 0
src/pipeline/hooks/trace_hook.py

@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+"""轻量追踪 Hook:将流水线关键事件输出到日志。"""
+
+import json
+import logging
+
+from src.pipeline.base import GateResult, PipelineHook
+from src.pipeline.context import PipelineContext
+
+logger = logging.getLogger(__name__)
+
+
+class TraceHook(PipelineHook):
+    """
+    默认观测 Hook。
+
+    使用意图:
+    - 在不引入外部监控系统时,先用结构化日志观察阶段执行轨迹。
+    - 若后续接入 APM/日志平台,可在该 Hook 基础上扩展。
+    """
+
+    async def on_pipeline_start(self, ctx: PipelineContext) -> None:
+        logger.info("pipeline_start: %s", json.dumps(ctx.snapshot(), ensure_ascii=False))
+
+    async def on_stage_start(self, stage_name: str, ctx: PipelineContext) -> None:
+        logger.info("stage_start[%s]: %s", stage_name, json.dumps(ctx.snapshot(), ensure_ascii=False))
+
+    async def on_stage_complete(self, stage_name: str, ctx: PipelineContext) -> None:
+        logger.info("stage_complete[%s]: %s", stage_name, json.dumps(ctx.snapshot(), ensure_ascii=False))
+
+    async def on_gate_check(self, gate_name: str, result: GateResult, ctx: PipelineContext) -> None:
+        logger.info(
+            "gate_check[%s]: %s",
+            gate_name,
+            json.dumps(
+                {"passed": result.passed, "issues": result.issues, "action": result.action},
+                ensure_ascii=False,
+            ),
+        )
+
+    async def on_pipeline_complete(self, ctx: PipelineContext) -> None:
+        logger.info("pipeline_complete: %s", json.dumps(ctx.snapshot(), ensure_ascii=False))

+ 155 - 0
src/pipeline/orchestrator.py

@@ -0,0 +1,155 @@
+from __future__ import annotations
+
+"""
+Pipeline 编排器。
+
+职责:
+1) 按顺序执行 Stage
+2) 记录阶段执行历史(含重试)
+3) 在阶段后运行 QualityGate
+4) 触发 Hook(日志/持久化/监控)
+5) 支持 checkpoint 与 fallback 跳转
+"""
+
+import copy
+from datetime import datetime
+from typing import Dict, List, Optional
+
+from .base import GateResult, PipelineConfig, PipelineHook, QualityGate, Stage
+from .context import PipelineContext, StageError, StageRecord
+
+
+class PipelineOrchestrator:
+    """有状态流水线执行器。"""
+
+    def __init__(
+        self,
+        stages: List[Stage],
+        gates: Optional[Dict[str, QualityGate]] = None,
+        config: Optional[PipelineConfig] = None,
+        max_gate_fallbacks: int = 1,
+    ):
+        self.stages = stages
+        self.gates = gates or {}
+        self.config = config or PipelineConfig()
+        self.hooks: List[PipelineHook] = []
+        self._max_gate_fallbacks = max_gate_fallbacks
+        self._gate_fallback_counts: Dict[str, int] = {}
+
+    def add_hook(self, hook: PipelineHook) -> None:
+        """注册一个生命周期 Hook。"""
+        self.hooks.append(hook)
+
+    async def run(self, ctx: PipelineContext) -> PipelineContext:
+        """执行整条流水线,直到完成或抛错。"""
+        for hook in self.hooks:
+            await hook.on_pipeline_start(ctx)
+
+        stage_index = 0
+        while stage_index < len(self.stages):
+            stage = self.stages[stage_index]
+            ctx.current_stage = stage.name
+            for hook in self.hooks:
+                await hook.on_stage_start(stage.name, ctx)
+
+            errors = stage.validate_input(ctx)
+            if errors:
+                raise ValueError(f"{stage.name} input validation failed: {'; '.join(errors)}")
+
+            try:
+                ctx = await self._execute_stage(stage, ctx)
+            except Exception as exc:
+                ctx.errors.append(
+                    StageError(
+                        stage_name=stage.name,
+                        error_type=type(exc).__name__,
+                        message=str(exc),
+                        retryable=False,
+                    )
+                )
+                for hook in self.hooks:
+                    await hook.on_error(stage.name, exc, ctx)
+                raise
+
+            if self.config.checkpoint_enabled:
+                ctx.checkpoints[stage.name] = copy.deepcopy(ctx.snapshot())
+
+            for hook in self.hooks:
+                await hook.on_stage_complete(stage.name, ctx)
+
+            gate = self.gates.get(stage.name)
+            if gate:
+                gate_result = gate.check(ctx)
+                for hook in self.hooks:
+                    await hook.on_gate_check(stage.name, gate_result, ctx)
+                if gate_result.action == "fallback":
+                    ctx.metadata["_fallback_round"] = ctx.metadata.get("_fallback_round", 0) + 1
+                stage_index = self._resolve_gate(stage_index, gate_result, stage.name)
+                continue
+
+            stage_index += 1
+
+        for hook in self.hooks:
+            await hook.on_pipeline_complete(ctx)
+        return ctx
+
+    async def _execute_stage(self, stage: Stage, ctx: PipelineContext) -> PipelineContext:
+        """Execute a single stage, applying the retry policy."""
+        retries = 0
+        max_retries = self.config.max_stage_retries
+        while True:
+            record = StageRecord(
+                stage_name=stage.name,
+                started_at=datetime.utcnow().isoformat(),
+                status="running",
+                attempt=retries + 1,
+            )
+            ctx.stage_history.append(record)
+            try:
+                ctx = await stage.execute(ctx)
+                record.status = "completed"
+                record.completed_at = datetime.utcnow().isoformat()
+                record.summary = ctx.snapshot()
+                return ctx
+            except Exception as exc:
+                retry = stage.on_retry(ctx, exc)
+                retryable = retry.should_retry and retries < min(retry.max_retries, max_retries)
+                record.status = "failed"
+                record.completed_at = datetime.utcnow().isoformat()
+                record.summary = {"error": str(exc), "retryable": retryable}
+                ctx.errors.append(
+                    StageError(
+                        stage_name=stage.name,
+                        error_type=type(exc).__name__,
+                        message=str(exc),
+                        retryable=retryable,
+                    )
+                )
+                if not retryable:
+                    raise
+                retries += 1
+
+    def _resolve_gate(self, current_index: int, result: GateResult, stage_name: str) -> int:
+        """Compute the next stage index from the gate action; fallbacks are capped to avoid infinite loops."""
+        if result.passed or result.action == "proceed":
+            return current_index + 1
+        if result.action == "retry_stage":
+            return current_index
+        if result.action == "fallback":
+            if not result.fallback_stage:
+                raise ValueError("fallback action is missing fallback_stage")
+            key = f"{stage_name}->{result.fallback_stage}"
+            count = self._gate_fallback_counts.get(key, 0) + 1
+            self._gate_fallback_counts[key] = count
+            if count > self._max_gate_fallbacks:
+                import logging
+                logging.getLogger(__name__).warning(
+                    "Gate fallback %s hit its limit of %d; forcing the pipeline to proceed",
+                    key, self._max_gate_fallbacks,
+                )
+                return current_index + 1
+            for index, stage in enumerate(self.stages):
+                if stage.name == result.fallback_stage:
+                    return index
+            raise ValueError(f"unknown fallback stage: {result.fallback_stage}")
+        raise RuntimeError(f"quality gate failed: {'; '.join(result.issues)}")
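The fallback cap in `_resolve_gate` can be exercised in isolation. Here is a standalone sketch of the same decision logic (hypothetical `GateResolver`/`GateDecision` types, simplified from the orchestrator above):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class GateDecision:
    passed: bool
    action: str                       # "proceed" | "retry_stage" | "fallback"
    fallback_stage: Optional[str] = None

@dataclass
class GateResolver:
    stage_names: List[str]
    max_fallbacks: int = 1
    counts: Dict[str, int] = field(default_factory=dict)

    def resolve(self, current: int, result: GateDecision) -> int:
        if result.passed or result.action == "proceed":
            return current + 1                      # gate satisfied: move on
        if result.action == "retry_stage":
            return current                          # re-run the same stage
        if result.action == "fallback":
            key = f"{self.stage_names[current]}->{result.fallback_stage}"
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] > self.max_fallbacks:
                return current + 1                  # cap reached: force progress
            return self.stage_names.index(result.fallback_stage)
        raise RuntimeError(f"gate failed: {result.action}")
```

Each `stage -> fallback_stage` edge carries its own counter, so one noisy gate cannot starve the rest of the pipeline.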

+ 130 - 0
src/pipeline/runner.py

@@ -0,0 +1,130 @@
+from __future__ import annotations
+
+"""
+Pipeline assembly and launch entry point (for reuse from code).
+
+Relationship to the repo-root run_pipeline.py:
+- run_pipeline.py: the CLI shell (reads env vars and calls into here)
+- this file: the actual "assembler" that wires up the default pipeline and context
+"""
+
+import logging
+import os
+from pathlib import Path
+from uuid import uuid4
+
+from agent import FileSystemTraceStore
+
+from src.pipeline import PipelineConfig, PipelineContext, PipelineOrchestrator
+from src.pipeline.adapters.knowledge import KnowledgeItem, StaticKnowledgeSource
+from src.pipeline.adapters.weixin import WeixinToolAdapter
+from src.pipeline.config.pipeline_config import RuntimePipelineConfig
+from src.pipeline.gates import FilterSufficiencyGate, OutputSchemaGate, SearchCompletenessGate
+from src.pipeline.hooks import DatabasePersistHook, LiveProgressHook, PipelineTraceHook, TraceHook
+from src.pipeline.stages import (
+    AccountPrecipitateStage,
+    CoarseFilterStage,
+    ContentSearchStage,
+    DemandAnalysisStage,
+    HardFilterStage,
+    OutputPersistStage,
+    QualityFilterStage,
+)
+from src.pipeline.stages.common import StageAgentExecutor
+
+logger = logging.getLogger(__name__)
+
+PROJECT_ROOT = Path(__file__).resolve().parents[2]
+
+
+def build_default_pipeline(runtime: RuntimePipelineConfig) -> PipelineOrchestrator:
+    """
+    Build the default content-finder pipeline.
+
+    Defined centrally here:
+    - stage order
+    - gate rules
+    - hooks (trace/db)
+    """
+    trace_dir = PROJECT_ROOT / "tests" / "traces"
+    trace_dir.mkdir(parents=True, exist_ok=True)
+
+    skills_dir = str(PROJECT_ROOT / "tests" / "skills")
+    trace_store = FileSystemTraceStore(base_path=str(trace_dir))
+    agent_executor = StageAgentExecutor(
+        trace_store=trace_store,
+        skills_dir=skills_dir,
+        model=runtime.model,
+        temperature=runtime.temperature,
+        max_iterations=runtime.max_iterations,
+    )
+    adapter = WeixinToolAdapter()
+
+    pipeline = PipelineOrchestrator(
+        stages=[
+            DemandAnalysisStage(agent_executor=agent_executor),
+            ContentSearchStage(adapter=adapter, agent_executor=agent_executor),
+            HardFilterStage(),
+            CoarseFilterStage(agent_executor=agent_executor),
+            QualityFilterStage(adapter=adapter, agent_executor=agent_executor, enable_llm_review=True),
+            AccountPrecipitateStage(adapter=adapter),
+            OutputPersistStage(),
+        ],
+        gates={
+            "content_search": SearchCompletenessGate(),
+            "quality_filter": FilterSufficiencyGate(),
+            "output_persist": OutputSchemaGate(),
+        },
+        config=PipelineConfig(max_stage_retries=1, checkpoint_enabled=True, fail_fast=True),
+    )
+    pipeline.add_hook(TraceHook())
+    pipeline.add_hook(PipelineTraceHook(trace_dir=trace_dir))
+    pipeline.add_hook(LiveProgressHook())
+    pipeline.add_hook(DatabasePersistHook())
+    return pipeline
+
+
+def default_knowledge_sources() -> dict:
+    """Default knowledge sources (static rules), injected into the demand-analysis prompt."""
+    return {
+        "platform_rules": StaticKnowledgeSource(
+            [
+                KnowledgeItem(title="平台约束", content="只允许使用微信平台相关工具,不切换到其他平台。"),
+                KnowledgeItem(title="受众画像", content="核心受众为 50 岁以上中老年人,更关注实用、稳健、可信内容。"),
+            ]
+        )
+    }
+
+
+async def run_content_finder_pipeline(ctx: PipelineContext) -> PipelineContext:
+    """Run the content-finder pipeline (ctx must be pre-built; search_agent_policy may already be injected)."""
+    if not os.getenv("OPEN_ROUTER_API_KEY"):
+        raise ValueError("OPEN_ROUTER_API_KEY is not set")
+
+    runtime = RuntimePipelineConfig.from_env()
+    if not ctx.model:
+        ctx.model = runtime.model
+
+    pipeline = build_default_pipeline(runtime)
+    return await pipeline.run(ctx)
+
+
+async def run_content_finder_from_cli(
+    query: str,
+    demand_id: str = "",
+    target_count: int | None = None,
+) -> PipelineContext:
+    """Convenience entry point when no DB policy is involved (matches the repo-root run_pipeline behavior)."""
+    runtime = RuntimePipelineConfig.from_env()
+    trace_id = str(uuid4())
+    ctx = PipelineContext(
+        task_id=str(uuid4()),
+        trace_id=trace_id,
+        query=query,
+        demand_id=demand_id,
+        target_count=target_count or runtime.target_count,
+        model=runtime.model,
+        output_dir=str(PROJECT_ROOT / "tests" / "output"),
+        knowledge_sources=default_knowledge_sources(),
+    )
+    return await run_content_finder_pipeline(ctx)

+ 18 - 0
src/pipeline/stages/__init__.py

@@ -0,0 +1,18 @@
+"""Stage module exports."""
+
+from .account_precipitate import AccountPrecipitateStage
+from .coarse_filter import CoarseFilterStage
+from .content_filter import HardFilterStage, QualityFilterStage
+from .content_search import ContentSearchStage
+from .demand_analysis import DemandAnalysisStage
+from .output_persist import OutputPersistStage
+
+__all__ = [
+    "AccountPrecipitateStage",
+    "CoarseFilterStage",
+    "ContentSearchStage",
+    "DemandAnalysisStage",
+    "HardFilterStage",
+    "OutputPersistStage",
+    "QualityFilterStage",
+]

+ 60 - 0
src/pipeline/stages/account_precipitate.py

@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+"""Account precipitation stage: aggregate selected articles into an account pool."""
+
+from typing import Dict, List
+
+from src.pipeline.adapters.base import ToolAdapter
+from src.pipeline.base import Stage
+from src.pipeline.context import AccountInfo, ArticleAccountRelation, PipelineContext
+
+
+class AccountPrecipitateStage(Stage):
+    name = "account_precipitate"
+    description = "Aggregate selected articles into official-account records"
+
+    def __init__(self, adapter: ToolAdapter):
+        self.adapter = adapter
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        return []
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        Fetch the account for each article, then dedupe and aggregate by wx_gh/account_name.
+
+        Aggregated fields:
+        - article_count
+        - sample_articles (at most 5)
+        - source_urls (deduped)
+        """
+        account_map: Dict[str, AccountInfo] = {}
+        relations: List[ArticleAccountRelation] = []
+
+        for article in ctx.filtered_articles:
+            account = await self.adapter.get_account(article.url)
+            if not account:
+                continue
+            key = account.wx_gh or account.account_name
+            if not key:
+                continue
+
+            existing = account_map.get(key)
+            if not existing:
+                existing = account
+                existing.article_count = 0
+                existing.sample_articles = []
+                existing.source_urls = []
+                account_map[key] = existing
+
+            existing.article_count += 1
+            if article.title and article.title not in existing.sample_articles and len(existing.sample_articles) < 5:
+                existing.sample_articles.append(article.title)
+            if article.url and article.url not in existing.source_urls:
+                existing.source_urls.append(article.url)
+
+            relations.append(ArticleAccountRelation(article_url=article.url, wx_gh=existing.wx_gh))
+
+        ctx.accounts = list(account_map.values())
+        ctx.article_account_relations = relations
+        return ctx
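The aggregation rules in `execute` (dedupe by account key, cap sample titles at 5, dedupe source URLs) reduce to a small fold over the articles. A standalone sketch over plain tuples (hypothetical `precipitate` helper, not the actual stage):

```python
def precipitate(articles, max_samples=5):
    """Aggregate (account_key, title, url) triples into per-account buckets."""
    accounts = {}
    for key, title, url in articles:
        if not key:
            continue  # skip articles whose account could not be identified
        acc = accounts.setdefault(
            key, {"article_count": 0, "sample_articles": [], "source_urls": []}
        )
        acc["article_count"] += 1
        # Keep at most max_samples distinct sample titles per account.
        if title and title not in acc["sample_articles"] and len(acc["sample_articles"]) < max_samples:
            acc["sample_articles"].append(title)
        # Source URLs are deduped but unbounded.
        if url and url not in acc["source_urls"]:
            acc["source_urls"].append(url)
    return accounts
```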

+ 155 - 0
src/pipeline/stages/coarse_filter.py

@@ -0,0 +1,155 @@
+from __future__ import annotations
+
+"""Coarse-filter stage: batched LLM semantic-relevance screening of titles.
+
+Runs after HardFilterStage and before QualityFilterStage.
+Uses an LLM to batch-judge the semantic relevance of candidate article titles,
+quickly discarding clearly irrelevant ones and reducing subsequent detail API calls.
+"""
+
+import logging
+from typing import Any, Dict, List
+
+from src.pipeline.base import Stage
+from src.pipeline.context import PipelineContext
+from src.pipeline.stages.common import StageAgentExecutor
+
+logger = logging.getLogger(__name__)
+
+# Maximum number of articles per batch
+_BATCH_SIZE = 20
+
+
+class CoarseFilterStage(Stage):
+    name = "coarse_filter"
+    description = "LLM coarse semantic-relevance screening based on titles"
+
+    def __init__(self, agent_executor: StageAgentExecutor, batch_size: int = _BATCH_SIZE):
+        self.agent_executor = agent_executor
+        self.batch_size = batch_size
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        if not ctx.candidate_articles:
+            return ["missing candidate_articles"]
+        return []
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        articles = ctx.candidate_articles
+        query = ctx.query
+
+        # Build a demand-feature summary for the LLM to reference
+        demand_summary = self._build_demand_summary(ctx)
+
+        coarse_log: List[Dict[str, Any]] = []
+        passed_articles = []
+
+        # Process in batches
+        for batch_start in range(0, len(articles), self.batch_size):
+            batch = articles[batch_start : batch_start + self.batch_size]
+            batch_results = await self._judge_batch(query, demand_summary, batch, ctx)
+
+            for article, result in zip(batch, batch_results):
+                relevance = result.get("relevance", "reject")
+                reason = result.get("reason", "")
+                status = "pass" if relevance == "pass" else "reject"
+
+                coarse_log.append({
+                    "title": article.title,
+                    "url": article.url,
+                    "source_keyword": article.source_keyword,
+                    "status": status,
+                    "reason": reason,
+                })
+
+                if status == "pass":
+                    passed_articles.append(article)
+
+        logger.info(
+            "coarse_filter done: %d -> %d articles kept (%d rejected)",
+            len(articles), len(passed_articles), len(articles) - len(passed_articles),
+        )
+
+        ctx.candidate_articles = passed_articles
+        ctx.metadata["_coarse_filter_log"] = coarse_log
+        return ctx
+
+    async def _judge_batch(
+        self,
+        query: str,
+        demand_summary: str,
+        batch: list,
+        ctx: PipelineContext,
+    ) -> List[Dict[str, str]]:
+        """Batch-judge the semantic relevance of a set of article titles via the LLM."""
+        titles_block = "\n".join(
+            f"{i + 1}. {a.title}" for i, a in enumerate(batch)
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "你是内容相关性筛选专家。根据用户的搜索需求,判断每篇文章标题是否与需求相关。\n"
+                    "你不能调用任何工具。\n\n"
+                    "【判断原则】\n"
+                    "1. 宽松判断:只要标题暗示文章可能与需求主题相关,就判为 pass\n"
+                    "2. 只淘汰明显不相关的:标题与需求主题完全无关才判为 reject\n"
+                    "3. 不确定时倾向于 pass(宁可多留,不可误杀)\n\n"
+                    "输出格式:JSON,放在 ```json 代码块中。\n"
+                    "```json\n"
+                    '{"results": [{"index": 1, "relevance": "pass", "reason": "简短理由"}, ...]}\n'
+                    "```\n"
+                    "relevance 只能是 \"pass\" 或 \"reject\"。"
+                ),
+            },
+            {
+                "role": "user",
+                "content": (
+                    f"搜索需求: {query}\n\n"
+                    f"{demand_summary}\n\n"
+                    f"请判断以下 {len(batch)} 篇文章标题的相关性:\n\n"
+                    f"{titles_block}"
+                ),
+            },
+        ]
+
+        try:
+            result = await self.agent_executor.run_simple_llm_json(
+                name="标题粗筛",
+                messages=messages,
+                ctx=ctx,
+            )
+            items = result.get("results", [])
+            # Sort by index and align results back onto the batch
+            indexed: Dict[int, Dict] = {}
+            for item in items:
+                idx = item.get("index", 0)
+                if isinstance(idx, int) and 1 <= idx <= len(batch):
+                    indexed[idx] = item
+
+            return [
+                indexed.get(i + 1, {"relevance": "pass", "reason": "no LLM result; pass by default"})
+                for i in range(len(batch))
+            ]
+        except Exception as exc:
+            logger.warning("coarse_filter LLM call failed; passing all articles: %s", exc)
+            return [{"relevance": "pass", "reason": "LLM call failed; pass by default"} for _ in batch]
+
+    @staticmethod
+    def _build_demand_summary(ctx: PipelineContext) -> str:
+        """Extract a summary from the demand-analysis result for the coarse-filter LLM to reference."""
+        da = ctx.demand_analysis
+        if not da:
+            return ""
+
+        parts: List[str] = []
+        if da.substantive_features:
+            parts.append(f"核心主题特征: {', '.join(da.substantive_features)}")
+        if da.search_strategy.precise_keywords:
+            parts.append(f"精准关键词: {', '.join(da.search_strategy.precise_keywords)}")
+        if da.search_strategy.topic_keywords:
+            parts.append(f"主题关键词: {', '.join(da.search_strategy.topic_keywords)}")
+        if da.filter_focus.relevance_focus:
+            parts.append(f"相关性关注点: {', '.join(da.filter_focus.relevance_focus)}")
+
+        return "需求分析摘要:\n" + "\n".join(f"- {p}" for p in parts) if parts else ""
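The index-alignment step in `_judge_batch` — mapping the LLM's 1-based `index` fields back onto batch positions and defaulting missing or out-of-range entries to a pass — can be sketched standalone (hypothetical `align_results` helper mirroring that logic):

```python
def align_results(items, batch_len, default):
    """Map {'index': i, ...} items (1-based) back onto batch positions, filling gaps."""
    indexed = {}
    for item in items:
        idx = item.get("index", 0)
        # Drop entries whose index is missing, non-integer, or out of range.
        if isinstance(idx, int) and 1 <= idx <= batch_len:
            indexed[idx] = item
    # One result per batch slot; unmatched slots fall back to the default.
    return [indexed.get(i + 1, default) for i in range(batch_len)]
```

The lenient default keeps the stage fail-open: a malformed LLM answer never rejects an article on its own.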

+ 279 - 0
src/pipeline/stages/common.py

@@ -0,0 +1,279 @@
+from __future__ import annotations
+
+"""
+Shared stage utilities.
+
+Core capabilities:
+- StageAgentExecutor: reuse AgentRunner inside a stage for subtasks that must emit JSON
+- extract_json_object: robustly extract a JSON object from model output text
+- LLM interaction tracing: capture each LLM call's prompt/response/reasoning into ctx.metadata
+"""
+
+import json
+import logging
+import re
+import time
+from typing import Any, Dict, List, Optional
+
+from agent import AgentRunner, FileSystemTraceStore, Message, RunConfig, Trace
+from agent.llm import create_openrouter_llm_call
+from agent.skill.skill_loader import load_skills_from_dir
+from agent.tools.builtin.knowledge import KnowledgeConfig
+from src.pipeline.context import PipelineContext
+
+logger = logging.getLogger(__name__)
+
+JSON_BLOCK_RE = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)
+
+
+def _append_llm_interaction(ctx: PipelineContext, interaction: Dict[str, Any]) -> None:
+    """Append one LLM interaction record to ctx.metadata['_llm_interactions']."""
+    interactions: List[Dict[str, Any]] = ctx.metadata.setdefault("_llm_interactions", [])
+    interactions.append(interaction)
+
+
+class StageAgentExecutor:
+    """In-stage LLM executor (for structured JSON-output scenarios)."""
+
+    def __init__(
+        self,
+        trace_store: FileSystemTraceStore,
+        skills_dir: str,
+        model: str,
+        temperature: float = 0.2,
+        max_iterations: int = 12,
+        extra_llm_params: Optional[Dict[str, Any]] = None,
+    ):
+        self.trace_store = trace_store
+        self.skills_dir = skills_dir
+        self.model = model
+        self.temperature = temperature
+        self.max_iterations = max_iterations
+        self.extra_llm_params = extra_llm_params or {"max_tokens": 8192}
+        self._llm_call = create_openrouter_llm_call(model=model)
+        # Preload all skills, indexed by name (read from disk only once)
+        loaded_skills = load_skills_from_dir(skills_dir)
+        self._all_skills = {s.name: s for s in loaded_skills}
+        self.runner = AgentRunner(
+            llm_call=self._llm_call,
+            trace_store=trace_store,
+            skills_dir=skills_dir,
+        )
+        # Share the loaded skills with AgentRunner to avoid re-reading from disk
+        self.runner._cached_skills = loaded_skills
+
+    async def run_json_stage(
+        self,
+        ctx: PipelineContext,
+        name: str,
+        messages: List[Dict[str, Any]],
+        allowed_tools: List[str],
+        skills: Optional[List[str]] = None,
+    ) -> Dict[str, Any]:
+        """
+        Run a stage subtask and parse its JSON result.
+
+        Usage constraints:
+        - the prompt must instruct the model to emit a JSON code block
+        - parse failures raise and are handled by the calling stage
+
+        Args:
+            skills: names of skills to inject. None means no custom skills.
+        """
+        config = RunConfig(
+            name=name,
+            model=self.model,
+            temperature=self.temperature,
+            max_iterations=self.max_iterations,
+            tools=allowed_tools,
+            skills=skills,
+            extra_llm_params=self.extra_llm_params,
+            knowledge=KnowledgeConfig(
+                enable_extraction=False,
+                enable_completion_extraction=False,
+                enable_injection=False,
+            ),
+        )
+
+        t0 = time.monotonic()
+        assistant_texts: List[str] = []
+        assistant_reasonings: List[str] = []
+        tool_interactions: List[Dict[str, Any]] = []
+        agent_trace_id: str | None = None
+        total_tokens = 0
+
+        async for item in self.runner.run(messages=messages, config=config):
+            if isinstance(item, Trace) and item.trace_id:
+                agent_trace_id = item.trace_id
+                ctx.trace_id = ctx.trace_id or item.trace_id
+            elif isinstance(item, Message):
+                if item.role == "assistant":
+                    content = item.content
+                    if isinstance(content, dict):
+                        text = content.get("text", "") or ""
+                        reasoning = content.get("reasoning_content", "") or ""
+                        tc = content.get("tool_calls", []) or []
+                    else:
+                        text = str(content or "")
+                        reasoning = ""
+                        tc = []
+                    if text:
+                        assistant_texts.append(text)
+                    if reasoning:
+                        assistant_reasonings.append(reasoning)
+                    for call in tc:
+                        fn = call.get("function", {})
+                        tool_interactions.append({
+                            "tool_name": fn.get("name", ""),
+                            "arguments": fn.get("arguments", ""),
+                        })
+                    if item.tokens:
+                        total_tokens += item.tokens
+                elif item.role == "tool":
+                    content = item.content if isinstance(item.content, dict) else {}
+                    tool_name = content.get("tool_name", "")
+                    result_text = content.get("result", "")
+                    if isinstance(result_text, list):
+                        result_text = next(
+                            (b.get("text", "") for b in result_text if isinstance(b, dict) and b.get("type") == "text"),
+                            str(result_text),
+                        )
+                    # Attach the tool result to the most recent tool_interaction
+                    if tool_interactions and tool_interactions[-1].get("tool_name") == tool_name:
+                        tool_interactions[-1]["result_preview"] = str(result_text)[:500]
+
+        duration_ms = int((time.monotonic() - t0) * 1000)
+
+        # Record the LLM interaction into ctx.metadata
+        _append_llm_interaction(ctx, {
+            "stage": ctx.current_stage,
+            "name": name,
+            "messages": _compact_messages(messages),
+            "response_text": "\n".join(assistant_texts),
+            "reasoning": "\n".join(assistant_reasonings),
+            "tool_calls": tool_interactions if tool_interactions else None,
+            "duration_ms": duration_ms,
+            "tokens": total_tokens,
+            "model": self.model,
+        })
+
+        # Record the agent subtask trace_id so PipelineTraceHook can correlate it
+        if agent_trace_id:
+            stage_traces: List[Dict[str, str]] = ctx.metadata.setdefault("_stage_agent_traces", [])
+            stage_traces.append({"stage": name, "agent_trace_id": agent_trace_id})
+
+        for text in reversed(assistant_texts):
+            data = extract_json_object(text)
+            if data is not None:
+                return data
+        raise ValueError(f"{name} did not produce parseable JSON")
+
+    async def run_simple_llm_json(
+        self,
+        name: str,
+        messages: List[Dict[str, Any]],
+        skills: Optional[List[str]] = None,
+        ctx: PipelineContext | None = None,
+    ) -> Dict[str, Any]:
+        """
+        Single-shot LLM call with JSON parsing, bypassing the agent loop.
+
+        For pure text-generation cases that need no tool calls (e.g. quality
+        re-review); skips the overhead of goal creation and iteration loops.
+
+        Args:
+            skills: names of skills to inject. When given, the corresponding
+                    skill contents are appended to the system message. None means no injection.
+            ctx: PipelineContext; when provided, the LLM interaction is logged.
+        """
+        if skills:
+            messages = self._inject_skills_into_messages(messages, skills)
+
+        params = {
+            "temperature": self.temperature,
+            **self.extra_llm_params,
+        }
+        t0 = time.monotonic()
+        response = await self._llm_call(messages=messages, **params)
+        duration_ms = int((time.monotonic() - t0) * 1000)
+
+        text = response.get("content") or ""
+        reasoning = response.get("reasoning_content") or ""
+        tokens = (response.get("prompt_tokens") or 0) + (response.get("completion_tokens") or 0)
+
+        # Record the LLM interaction
+        if ctx is not None:
+            _append_llm_interaction(ctx, {
+                "stage": ctx.current_stage,
+                "name": name,
+                "messages": _compact_messages(messages),
+                "response_text": text,
+                "reasoning": reasoning,
+                "tool_calls": None,
+                "duration_ms": duration_ms,
+                "tokens": tokens,
+                "model": self.model,
+            })
+
+        data = extract_json_object(text)
+        if data is not None:
+            return data
+        raise ValueError(f"{name} did not produce parseable JSON")
+
+    def _inject_skills_into_messages(
+        self,
+        messages: List[Dict[str, Any]],
+        skill_names: List[str],
+    ) -> List[Dict[str, Any]]:
+        """Append the given skills' contents to the end of the first system message."""
+        matched = [self._all_skills[n] for n in skill_names if n in self._all_skills]
+        if not matched:
+            return messages
+        skills_text = "\n\n".join(s.to_prompt_text() for s in matched)
+        suffix = f"\n\n## Skills\n{skills_text}"
+
+        result = []
+        injected = False
+        for msg in messages:
+            if not injected and msg.get("role") == "system":
+                result.append({**msg, "content": (msg.get("content") or "") + suffix})
+                injected = True
+            else:
+                result.append(msg)
+        return result
+
+
+def _compact_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """Compact messages for trace logging: keep role and truncated content."""
+    compacted = []
+    for msg in messages:
+        content = msg.get("content", "")
+        if isinstance(content, str) and len(content) > 2000:
+            content = content[:2000] + f"... (truncated, total {len(msg['content'])} chars)"
+        compacted.append({"role": msg.get("role", ""), "content": content})
+    return compacted
+
+
+def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
+    """
+    Extract a JSON object from text.
+
+    Priority:
+    1) a ```json code block
+    2) the outermost brace-delimited span in the text (fallback)
+    """
+    matches = JSON_BLOCK_RE.findall(text or "")
+    for block in reversed(matches):
+        try:
+            return json.loads(block)
+        except json.JSONDecodeError:
+            continue
+
+    start = (text or "").find("{")
+    end = (text or "").rfind("}")
+    if start >= 0 and end > start:
+        try:
+            return json.loads(text[start : end + 1])
+        except json.JSONDecodeError:
+            return None
+    return None
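The two-tier priority of `extract_json_object` — prefer the last parseable fenced `json` block, fall back to the outermost brace span — is easiest to see in a standalone copy of the function:

```python
import json
import re

# Standalone copy of the extraction logic above, for illustration only.
JSON_BLOCK_RE = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)

def extract_json_object(text):
    # Tier 1: the last parseable fenced json block wins.
    for block in reversed(JSON_BLOCK_RE.findall(text or "")):
        try:
            return json.loads(block)
        except json.JSONDecodeError:
            continue
    # Tier 2: fall back to the outermost {...} span.
    start, end = (text or "").find("{"), (text or "").rfind("}")
    if start >= 0 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None
```

Scanning fenced blocks in reverse means a model that "thinks out loud" with several JSON drafts is judged on its final answer.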

+ 578 - 0
src/pipeline/stages/content_filter.py

@@ -0,0 +1,578 @@
+from __future__ import annotations
+
+"""Filter stages: hard-rule filtering + quality assessment (heuristics / optional LLM re-review)."""
+
+import logging
+import os
+import re
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Any, Dict, List, Tuple
+
+from src.pipeline.adapters.base import ToolAdapter
+from src.pipeline.base import Stage
+from src.pipeline.context import CandidateArticle, FilteredArticle, PipelineContext
+from src.pipeline.stages.common import StageAgentExecutor
+
+logger = logging.getLogger(__name__)
+
+# Punctuation regex used to split a Chinese query into sub-terms
+_PUNCT_RE = re.compile(r"[,,、。??!!;;::""''()()\[\]【】\s]+")
+_FILLER_RE = re.compile(r"^(什么|怎么|怎样|哪里|哪些|如何|为什么|会不会|是不是|谁|多少)$")
+# Minimum effective length of a split term
+_MIN_TERM_LEN = 2
+# CJK Unicode range, used to detect pure-Chinese strings
+_CJK_RE = re.compile(r"[\u4e00-\u9fff]+")
+
+# Chinese function/connector words ignored during fuzzy matching
+_CONNECTORS = set("的了和与在是也不但又则被把将对从而且或")
+
+
+def _fuzzy_contains(term: str, haystack: str) -> bool:
+    """Return whether term occurs in haystack (supports Chinese fuzzy matching).
+
+    Rules:
+    1. Try an exact substring match first
+    2. For pure-Chinese terms (>= 3 chars), check that every core character
+       (connector words removed) occurs in haystack, allowing connectors in between
+    """
+    term_lower = term.lower()
+    if term_lower in haystack:
+        return True
+
+    # Skip fuzzy matching for short or non-Chinese terms
+    if len(term) < 3 or not _CJK_RE.fullmatch(term):
+        return False
+
+    # Extract the core characters of term (connectors removed)
+    core_chars = [c for c in term if c not in _CONNECTORS]
+    if len(core_chars) < 2:
+        return False
+
+    # Check that all core characters appear near each other in haystack (500-char window)
+    # First, collect every position of each core character in haystack
+    positions: List[List[int]] = []
+    for c in core_chars:
+        pos = [i for i, ch in enumerate(haystack) if ch == c]
+        if not pos:
+            return False  # a core character is entirely absent
+        positions.append(pos)
+
+    # Check whether some 500-character window contains all core characters,
+    # using each position of the first character as an anchor
+    window_size = 500
+    for anchor in positions[0]:
+        window_start = max(0, anchor - window_size // 2)
+        window_end = anchor + window_size // 2
+        all_in_window = True
+        for other_pos in positions[1:]:
+            if not any(window_start <= p <= window_end for p in other_pos):
+                all_in_window = False
+                break
+        if all_in_window:
+            return True
+
+    return False
+# ─────────────────────────────────────────────
+
+@dataclass
+class QualityScoreConfig:
+    """
+    Threshold configuration for heuristic article-quality scoring.
+
+    Every value can be overridden via (highest priority first):
+    1. the same-named field in the DB policy search_agent_policy
+    2. the score_config passed when constructing QualityFilterStage
+    3. QUALITY_SCORE_* environment variables
+    4. the defaults here
+    """
+    # Body-length threshold: >= this counts as substantial content; interest becomes high
+    min_body_length: int = 800
+
+    # Keyword-match ratio threshold: >= this means relevance=high, otherwise medium
+    high_relevance_ratio: float = 0.8
+
+    # View-count threshold: >= this promotes interest straight to high
+    high_view_count: int = 10000
+    # View-count threshold: >= this promotes interest to high when it is not already
+    medium_view_count: int = 1000
+
+    # Engagement-rate thresholds ((likes + shares + "在看") / view count)
+    high_engage_rate: float = 0.05      # >= this marks high engagement
+    low_engage_rate: float = 0.001      # < this with adequate views marks very low engagement
+
+    # Clickbait / sensational keywords
+    spam_keywords: Tuple[str, ...] = ("震惊", "必看", "吓坏", "立刻转发")
+
+    @classmethod
+    def from_env(cls) -> "QualityScoreConfig":
+        """Read from environment variables, prefix QUALITY_SCORE_."""
+        def _int(key: str, default: int) -> int:
+            return int(os.getenv(f"QUALITY_SCORE_{key}", str(default)))
+
+        def _float(key: str, default: float) -> float:
+            return float(os.getenv(f"QUALITY_SCORE_{key}", str(default)))
+
+        spam_raw = os.getenv("QUALITY_SCORE_SPAM_KEYWORDS", "")
+        spam = tuple(s.strip() for s in spam_raw.split(",") if s.strip()) if spam_raw else cls.spam_keywords
+
+        return cls(
+            min_body_length=_int("MIN_BODY_LENGTH", cls.min_body_length),
+            high_relevance_ratio=_float("HIGH_RELEVANCE_RATIO", cls.high_relevance_ratio),
+            high_view_count=_int("HIGH_VIEW_COUNT", cls.high_view_count),
+            medium_view_count=_int("MEDIUM_VIEW_COUNT", cls.medium_view_count),
+            high_engage_rate=_float("HIGH_ENGAGE_RATE", cls.high_engage_rate),
+            low_engage_rate=_float("LOW_ENGAGE_RATE", cls.low_engage_rate),
+            spam_keywords=spam,
+        )
+
+    def merge_policy(self, policy: Dict) -> "QualityScoreConfig":
+        """Override the current config with values from the DB policy, returning a new instance."""
+        score_cfg = policy.get("quality_score") or {}
+        if not score_cfg:
+            return self
+        return QualityScoreConfig(
+            min_body_length=int(score_cfg.get("min_body_length", self.min_body_length)),
+            high_relevance_ratio=float(score_cfg.get("high_relevance_ratio", self.high_relevance_ratio)),
+            high_view_count=int(score_cfg.get("high_view_count", self.high_view_count)),
+            medium_view_count=int(score_cfg.get("medium_view_count", self.medium_view_count)),
+            high_engage_rate=float(score_cfg.get("high_engage_rate", self.high_engage_rate)),
+            low_engage_rate=float(score_cfg.get("low_engage_rate", self.low_engage_rate)),
+            spam_keywords=tuple(score_cfg.get("spam_keywords", self.spam_keywords)),
+        )
+
+    def to_log_dict(self) -> Dict[str, Any]:
+        """返回适合日志展示的字典(隐藏 spam_keywords 细节)。"""
+        return {
+            "min_body_length": self.min_body_length,
+            "high_relevance_ratio": self.high_relevance_ratio,
+            "view_count_threshold": f"{self.medium_view_count}~{self.high_view_count}",
+            "engage_rate_threshold": f"{self.low_engage_rate}~{self.high_engage_rate}",
+            "spam_keywords_count": len(self.spam_keywords),
+        }
+
+
+class HardFilterStage(Stage):
+    name = "hard_filter"
+    description = "候选去重与基础规则过滤"
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        if not ctx.candidate_articles:
+            return ["缺少 candidate_articles"]
+        return []
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """执行基础合法性过滤与 URL 去重。"""
+        dedup: Dict[str, CandidateArticle] = {}
+        for item in ctx.candidate_articles:
+            if not item.title or not item.url:
+                continue
+            if not item.url.startswith("http"):
+                continue
+            if item.publish_time <= 0:
+                continue
+            dedup.setdefault(item.url, item)
+        ctx.candidate_articles = list(dedup.values())
+        return ctx
+
+
+class QualityFilterStage(Stage):
+    name = "quality_filter"
+    description = "根据文章详情完成规则与 LLM 混合筛选"
+
+    def __init__(
+        self,
+        adapter: ToolAdapter,
+        detail_limit: int = 50,
+        agent_executor: StageAgentExecutor | None = None,
+        enable_llm_review: bool = True,
+        score_config: QualityScoreConfig | None = None,
+    ):
+        self.adapter = adapter
+        self.detail_limit = detail_limit
+        self.agent_executor = agent_executor
+        self.enable_llm_review = enable_llm_review and agent_executor is not None
+        self._base_score_config = score_config or QualityScoreConfig.from_env()
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        if not ctx.candidate_articles:
+            return ["缺少 candidate_articles"]
+        return []
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        执行质量筛选主流程。
+
+        步骤:
+        1) 拉取文章详情
+        2) 启发式打分(相关性/兴趣)
+        3) 可选 LLM 复评
+        4) 按等级与时间排序后截断到目标数量
+        """
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        limit = int(policy.get("max_detail_fetch", self.detail_limit))
+        if "enable_llm_review" in policy:
+            enable_llm = bool(policy["enable_llm_review"]) and self.agent_executor is not None
+        else:
+            enable_llm = self.enable_llm_review
+
+        # 合并 DB 策略中的评分阈值
+        score_cfg = self._base_score_config.merge_policy(policy)
+
+        # 记录评分配置到 metadata,供 trace hook 和可视化使用
+        logger.info("quality_filter 评分配置: %s", score_cfg.to_log_dict())
+        ctx.metadata["_quality_score_config"] = score_cfg.to_log_dict()
+
+        # 记录匹配词表
+        match_terms = self._build_match_terms(ctx.query, ctx)
+        logger.info("quality_filter 匹配词表 (%d 个): %s", len(match_terms), match_terms)
+        ctx.metadata["_quality_match_terms"] = match_terms
+
+        # 保留上一轮已有的入选结果和审核记录(fallback 重跑时不丢失)
+        results: List[FilteredArticle] = list(ctx.filtered_articles)
+        review_log: List[Dict] = list(ctx.metadata.get("_quality_review_log", []))
+
+        # 跳过已审核过的 URL,只审核新增候选
+        reviewed_urls: set[str] = {r["url"] for r in review_log if "url" in r}
+        pending = [a for a in ctx.candidate_articles if a.url not in reviewed_urls]
+        logger.info(
+            "quality_filter 候选 %d 篇,已审核 %d 篇,待审核 %d 篇(limit=%d)",
+            len(ctx.candidate_articles), len(reviewed_urls), len(pending), limit,
+        )
+        for article in pending[:limit]:
+            detail = await self.adapter.get_article_detail(article.url)
+            if not detail or not detail.body_text:
+                review_log.append({
+                    "title": article.title, "url": article.url,
+                    "status": "skip",
+                    "reason": "无详情或正文为空",
+                    "publish_time": article.publish_time,
+                    "view_count": article.view_count,
+                    "source_keyword": article.source_keyword,
+                })
+                continue
+
+            relevance, interest, reason = self._score_article(ctx.query, article, detail, score_cfg, ctx)
+
+            # spam 硬规则仍直接淘汰(_score_article 返回 low)
+            if relevance == "low":
+                review_log.append({
+                    "title": detail.title or article.title,
+                    "url": detail.url or article.url,
+                    "status": "reject",
+                    "phase": "heuristic",
+                    "relevance": relevance, "interest": interest, "reason": reason,
+                    "publish_time": detail.publish_time or article.publish_time,
+                    "view_count": detail.view_count,
+                    "like_count": detail.like_count,
+                    "share_count": detail.share_count,
+                    "looking_count": detail.looking_count,
+                    "body_length": len(detail.body_text) if detail.body_text else 0,
+                    "source_keyword": article.source_keyword,
+                })
+                continue
+
+            # 所有非 spam 文章都进 LLM 复评(relevance 由 LLM 决定)
+            review_phase = "heuristic"
+            if enable_llm:
+                llm_review = await self._llm_review(
+                    ctx=ctx,
+                    article=article,
+                    detail_title=detail.title or article.title,
+                    detail_url=detail.url or article.url,
+                    publish_time=detail.publish_time or article.publish_time,
+                    body_text=detail.body_text,
+                    heuristic=(relevance, interest, reason),
+                )
+                if llm_review:
+                    relevance, interest, reason = llm_review
+                    review_phase = "llm"
+
+            if relevance == "low":
+                review_log.append({
+                    "title": detail.title or article.title,
+                    "url": detail.url or article.url,
+                    "status": "reject",
+                    "phase": review_phase,
+                    "relevance": relevance, "interest": interest, "reason": reason,
+                    "publish_time": detail.publish_time or article.publish_time,
+                    "view_count": detail.view_count,
+                    "like_count": detail.like_count,
+                    "share_count": detail.share_count,
+                    "looking_count": detail.looking_count,
+                    "body_length": len(detail.body_text) if detail.body_text else 0,
+                    "source_keyword": article.source_keyword,
+                })
+                continue
+
+            review_log.append({
+                "title": detail.title or article.title,
+                "url": detail.url or article.url,
+                "status": "accept",
+                "phase": review_phase,
+                "relevance": relevance, "interest": interest, "reason": reason,
+                "publish_time": detail.publish_time or article.publish_time,
+                "view_count": detail.view_count,
+                "like_count": detail.like_count,
+                "share_count": detail.share_count,
+                "looking_count": detail.looking_count,
+                "body_length": len(detail.body_text) if detail.body_text else 0,
+                "source_keyword": article.source_keyword,
+            })
+            results.append(
+                FilteredArticle(
+                    title=detail.title or article.title,
+                    url=detail.url or article.url,
+                    publish_time=detail.publish_time or article.publish_time,
+                    view_count=detail.view_count,
+                    like_count=detail.like_count,
+                    share_count=detail.share_count,
+                    looking_count=detail.looking_count,
+                    source_keyword=article.source_keyword,
+                    recall_round=article.recall_round,
+                    statistics=article.statistics,
+                    reason=reason,
+                    relevance_level=relevance,
+                    interest_level=interest,
+                    detail_title=detail.title,
+                    detail_url=detail.url,
+                    body_text=detail.body_text,
+                )
+            )
+
+        results.sort(
+            key=lambda item: (
+                _level_rank(item.relevance_level),
+                _level_rank(item.interest_level),
+                item.view_count,
+                item.publish_time,
+            ),
+            reverse=True,
+        )
+        ctx.filtered_articles = results[: ctx.target_count]
+        ctx.metadata["_quality_review_log"] = review_log
+        return ctx
+
+    def _score_article(
+        self,
+        query: str,
+        article: CandidateArticle,
+        detail: "ArticleDetail",
+        cfg: QualityScoreConfig,
+        ctx: PipelineContext | None = None,
+    ) -> Tuple[str, str, str]:
+        """数据驱动的 interest 评分 + 关键词匹配参考信息。
+
+        返回 (relevance_level, interest_level, reason)。
+        - relevance 固定为 "pending",交由 LLM 复评决定最终相关性
+        - interest 仅基于数据指标(正文长度、阅读量、互动率)
+        - 关键词匹配结果作为参考信息写入 reason,供 LLM 参考
+        """
+        body_text = detail.body_text
+        haystack = f"{article.title}\n{body_text[:2500]}"
+        haystack_lower = haystack.lower()
+
+        # interest 基于数据指标
+        interest = "high" if len(body_text) >= cfg.min_body_length else "medium"
+
+        # spam 检测仍保留为硬规则
+        if any(flag in haystack_lower for flag in cfg.spam_keywords):
+            return "low", "low", "存在明显标题党或情绪煽动风险"
+
+        # 利用阅读量/互动数据辅助判断 interest
+        view_count = detail.view_count
+        engagement = detail.like_count + detail.share_count + detail.looking_count
+        quality_notes: List[str] = []
+
+        if view_count >= cfg.medium_view_count:
+            # high_view_count 与 medium_view_count 两档目前均提升为 high,按低档阈值合并判断
+            interest = "high"
+            quality_notes.append(f"阅读量 {view_count}")
+
+        if view_count > 0 and engagement > 0:
+            engage_rate = engagement / view_count
+            if engage_rate >= cfg.high_engage_rate:
+                quality_notes.append(f"互动率 {engage_rate:.1%}")
+            elif engage_rate < cfg.low_engage_rate and view_count >= cfg.medium_view_count:
+                quality_notes.append("互动率极低")
+
+        # 关键词匹配作为参考信息(不再用于决定 relevance)
+        match_terms = self._build_match_terms(query, ctx)
+        matches = [term for term in match_terms if _fuzzy_contains(term, haystack_lower)]
+        match_ratio = len(matches) / max(len(match_terms), 1)
+
+        keyword_part = ""
+        if matches:
+            keyword_part = f"关键词命中: {', '.join(matches)}({match_ratio:.0%})"
+        elif match_terms:
+            keyword_part = f"关键词未命中: {'、'.join(match_terms[:8])}"
+
+        stats_part = f"({'; '.join(quality_notes)})" if quality_notes else ""
+        reason = f"{keyword_part}{stats_part}".strip()
+
+        # relevance 设为 pending,交由 LLM 决定
+        return "pending", interest, reason
+
+    @staticmethod
+    def _build_match_terms(query: str, ctx: PipelineContext | None = None) -> List[str]:
+        """从 query 和需求分析结果中提取用于匹配的子词列表。
+
+        策略:
+        1. 需求分析的 precise_keywords + topic_keywords 按空格拆分为原子词
+        2. 需求分析的 substantive_features 作为补充主题词
+        3. query 的规则拆分结果兜底
+
+        匹配模式:命中结果仅作为参考信息写入 reason,供 LLM 复评参考;
+        relevance 的最终判定由 LLM 复评完成(启发式阶段仅通过 spam 规则直接淘汰)。
+        """
+        seen: set[str] = set()
+        deduped: List[str] = []
+
+        def _add(w: str) -> None:
+            w = w.strip()
+            if len(w) >= _MIN_TERM_LEN and w not in seen:
+                seen.add(w)
+                deduped.append(w)
+
+        # 优先使用需求分析的关键词
+        if ctx and ctx.demand_analysis:
+            da = ctx.demand_analysis
+            kw = list(da.search_strategy.precise_keywords) + list(da.search_strategy.topic_keywords)
+            for phrase in kw:
+                # 按空格拆分搜索短语为原子词
+                for atom in phrase.strip().split():
+                    _add(atom)
+            # 补充 substantive_features(通常是核心主题,如"伊朗以色列冲突")
+            for feat in da.substantive_features:
+                _add(feat)
+
+        # 始终补充 query 的规则拆分结果(提高召回率)
+        for term in _extract_sub_terms(query):
+            _add(term)
+
+        return deduped
+
+    async def _llm_review(
+        self,
+        ctx: PipelineContext,
+        article: CandidateArticle,
+        detail_title: str,
+        detail_url: str,
+        publish_time: int,
+        body_text: str,
+        heuristic: Tuple[str, str, str],
+    ) -> Tuple[str, str, str] | None:
+        """使用 LLM 对单篇文章做复评;失败返回 None 走启发式结果。"""
+        assert self.agent_executor is not None
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "你是微信文章质检评审员。"
+                    "你不能调用任何工具。"
+                    "请基于文章标题、正文摘要、启发式初判结果做最终质量判断。"
+                    "最终必须输出 JSON,并放在 ```json 代码块中。\n\n"
+                    "【重要评审原则】\n"
+                    "1. 宽进严出:只要文章与查询主题相关即可入选,不要求完全匹配查询的字面表述。\n"
+                    "2. 查询词是用户感兴趣的话题方向,文章只要围绕该话题的任何角度展开均可入选。\n"
+                    "3. 只有以下情况才应判为 low 淘汰:\n"
+                    "   - 文章内容与查询主题完全无关\n"
+                    "   - 明显的标题党、虚假信息或阴谋论\n"
+                    "   - 内容质量极差(纯广告、乱码、无实质信息)\n"
+                    "4. 不要因为文章角度、立场、情感倾向与查询不完全一致就淘汰。"
+                ),
+            },
+            {
+                "role": "user",
+                "content": f"""
+query: {ctx.query}
+
+文章标题: {detail_title}
+文章链接: {detail_url}
+发布时间: {_ts_to_readable(publish_time)}
+
+启发式初判:
+- relevance_level: {heuristic[0]}
+- interest_level: {heuristic[1]}
+- reason: {heuristic[2]}
+
+正文摘录:
+{body_text[:3000]}
+
+请复评此文章。记住:只要文章与查询主题相关(即使角度不同),就应入选。
+只淘汰真正无关、低质或虚假的内容。
+
+输出格式:
+```json
+{{
+  "relevance_level": "high",
+  "interest_level": "medium",
+  "reason": "基于正文证据的入选或淘汰理由"
+}}
+```
+""",
+            },
+        ]
+        try:
+            result = await self.agent_executor.run_simple_llm_json(
+                name="文章质量复评",
+                messages=messages,
+                skills=["article_filter_strategy"],
+                ctx=ctx,
+            )
+        except Exception:
+            return None
+
+        relevance = str(result.get("relevance_level", heuristic[0]))
+        interest = str(result.get("interest_level", heuristic[1]))
+        reason = str(result.get("reason", heuristic[2]))
+        if relevance not in {"high", "medium", "low"}:
+            # 非法取值回退到启发式结果;heuristic[0] 可能为 "pending",此时按入选处理,排序权重最低
+            relevance = heuristic[0]
+        if interest not in {"high", "medium", "low"}:
+            interest = heuristic[1]
+        return relevance, interest, reason
+
+
+def _extract_sub_terms(query: str) -> List[str]:
+    """从原始 query 中拆分出子词用于模糊匹配(兜底方案)。
+
+    仅按标点/空白切分,过滤填充词和过短片段。
+    正式 pipeline 中优先使用需求分析阶段 LLM 产出的关键词。
+    """
+    segments = _PUNCT_RE.split(query)
+    terms: List[str] = []
+    seen: set[str] = set()
+
+    def _add(w: str) -> None:
+        w = w.strip()
+        if len(w) >= _MIN_TERM_LEN and w not in seen and not _FILLER_RE.match(w):
+            seen.add(w)
+            terms.append(w)
+
+    for seg in segments:
+        seg = seg.strip()
+        if len(seg) < _MIN_TERM_LEN:
+            continue
+        _add(seg)
+    return terms
+
+
+def _ts_to_readable(ts: int) -> str:
+    """将时间戳转为可读日期,自动判断秒级还是毫秒级。"""
+    if not ts:
+        return "未知"
+    # 秒级时间戳一般 10 位(~到 2286 年),毫秒级 13 位
+    if ts > 1e12:
+        ts = ts // 1000
+    try:
+        return datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S")
+    except (OSError, ValueError):
+        return str(ts)
+
+
+def _level_rank(value: str) -> int:
+    """把等级映射为排序权重。"""
+    return {"low": 1, "medium": 2, "high": 3}.get(value, 0)

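The threshold-precedence chain implemented above (dataclass defaults → `QUALITY_SCORE_*` environment variables via `from_env` → DB policy via `merge_policy`) can be sketched in isolation. The two-field config below is a simplified stand-in for illustration, not the project's `QualityScoreConfig`:

```python
# Minimal sketch of the defaults -> env -> DB-policy precedence rules.
import os
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class QualityScoreConfig:
    min_body_length: int = 800
    high_view_count: int = 10000

    @classmethod
    def from_env(cls) -> "QualityScoreConfig":
        # Environment variables override the dataclass defaults.
        return cls(
            min_body_length=int(os.getenv("QUALITY_SCORE_MIN_BODY_LENGTH", "800")),
            high_view_count=int(os.getenv("QUALITY_SCORE_HIGH_VIEW_COUNT", "10000")),
        )

    def merge_policy(self, policy: dict) -> "QualityScoreConfig":
        # Values present in the DB policy win; missing keys fall through unchanged.
        cfg = policy.get("quality_score") or {}
        return replace(
            self,
            min_body_length=int(cfg.get("min_body_length", self.min_body_length)),
            high_view_count=int(cfg.get("high_view_count", self.high_view_count)),
        )

os.environ["QUALITY_SCORE_MIN_BODY_LENGTH"] = "500"
base = QualityScoreConfig.from_env()
final = base.merge_policy({"quality_score": {"high_view_count": 20000}})
print(final.min_body_length, final.high_view_count)  # 500 20000
```

Each layer only overrides the keys it actually defines, so env-tuned values survive a partial DB policy, matching the behavior of `merge_policy` in the diff.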
+ 393 - 0
src/pipeline/stages/content_search.py

@@ -0,0 +1,393 @@
+"""内容搜索阶段:按策略生成关键词并召回候选文章。"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from typing import Dict, List
+
+from agent import Message, RunConfig, Trace
+from agent.tools.builtin.knowledge import KnowledgeConfig
+
+from src.pipeline.adapters.base import ToolAdapter
+from src.pipeline.base import Stage
+from src.pipeline.context import CandidateArticle, PipelineContext
+from src.pipeline.stages.common import StageAgentExecutor, _append_llm_interaction, _compact_messages, extract_json_object
+
+# 从 weixin_search 工具输出中提取 JSON 文章列表
+_TOOL_JSON_RE = re.compile(r"```json\s*(\[.*?\])\s*```", re.DOTALL)
+
+
+class ContentSearchStage(Stage):
+    name = "content_search"
+    description = "按策略搜索候选文章"
+
+    def __init__(
+        self,
+        adapter: ToolAdapter,
+        agent_executor: StageAgentExecutor | None = None,
+        max_keywords: int = 6,
+        page: str = "1",
+        recall_multiplier: float = 5.0,
+    ):
+        self.adapter = adapter
+        self.agent_executor = agent_executor
+        self.max_keywords = max_keywords
+        self.page = page
+        self.recall_multiplier = recall_multiplier
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        if not ctx.demand_analysis:
+            return ["缺少 demand_analysis"]
+        return []
+
+    def _max_recall(self, ctx: PipelineContext) -> int:
+        """计算本次搜索的最大召回候选数。
+
+        公式:target_count * recall_multiplier,可通过 policy 覆盖。
+        例:目标 10 篇 × 5.0 = 最多 50 篇候选进入过滤阶段。
+        """
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        mult = float(policy.get("recall_multiplier", self.recall_multiplier))
+        return max(int(ctx.target_count * mult), ctx.target_count + 1)
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        执行候选召回。
+
+        优先使用 Agent 驱动搜索(注入 content_finding_strategy skill),
+        Agent 不可用时回退到代码驱动搜索。
+        """
+        if self.agent_executor is not None:
+            return await self._agent_search(ctx)
+        return await self._code_search(ctx)
+
+    # ─────────────────────────────────────────────
+    # Agent 驱动搜索
+    # ─────────────────────────────────────────────
+
+    async def _agent_search(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        使用 Agent + content_finding_strategy skill 执行搜索。
+
+        候选文章从 weixin_search 工具返回的结构化数据中直接提取,
+        不依赖 Agent 最终 JSON 输出(LLM 无法可靠地复述大量文章数据)。
+        Agent 的 JSON 输出仅用于 keyword_stats。
+        """
+        assert self.agent_executor is not None
+        analysis = ctx.demand_analysis
+        assert analysis is not None
+
+        precise_keywords = json.dumps(analysis.search_strategy.precise_keywords, ensure_ascii=False)
+        topic_keywords = json.dumps(analysis.search_strategy.topic_keywords, ensure_ascii=False)
+        upper_features = json.dumps(analysis.upper_features, ensure_ascii=False)
+        lower_features = json.dumps(analysis.lower_features, ensure_ascii=False)
+        filter_focus = ""
+        if analysis.filter_focus:
+            elimination_risks = json.dumps(analysis.filter_focus.elimination_risks, ensure_ascii=False)
+            filter_focus = f"淘汰风险点: {elimination_risks}"
+
+        # 回退搜索信息
+        fallback_round = ctx.metadata.get("_fallback_round", 0)
+        fallback_hint = ""
+        if fallback_round >= 1:
+            prev_stats: list[dict] = ctx.metadata.get("_search_keyword_stats", [])
+            used_kws = [str(s.get("keyword", "")) for s in prev_stats]
+            fallback_hint = (
+                f"\n⚠️ 这是第 {fallback_round + 1} 轮补充搜索,上一轮结果不够。"
+                f"\n上一轮已使用的搜索词(请勿重复使用): {json.dumps(used_kws, ensure_ascii=False)}"
+                f"\n请使用不同的搜索词组合,尝试不同角度和表述方式来扩大召回范围。"
+            )
+
+        max_recall = self._max_recall(ctx)
+        existing_count = len(ctx.candidate_articles)
+        remaining_quota = max(max_recall - existing_count, 0)
+
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "你是微信内容搜索执行器。"
+                    "你必须调用 weixin_search 工具搜索文章。"
+                    "搜索完成后,必须输出一个 JSON 结果,放在 ```json 代码块中。"
+                ),
+            },
+            {
+                "role": "user",
+                "content": f"""
+任务:根据需求分析结果搜索微信文章。
+
+原始 query: {ctx.query}
+目标文章数: {ctx.target_count}
+召回上限: {remaining_quota} 篇(已有 {existing_count} 篇候选,总上限 {max_recall} 篇)
+
+需求分析结果:
+- 精准词候选: {precise_keywords}
+- 主题下钻候选: {topic_keywords}
+- 上层特征: {upper_features}
+- 下层特征: {lower_features}
+- {filter_focus}
+{fallback_hint}
+
+注意:搜索 2-3 个关键词即可,不要搜索过多。优先使用最相关的精准词。
+
+请按照 content_finding_strategy 技能中的方法论执行搜索,完成后输出 JSON:
+```json
+{{
+  "keyword_stats": [
+    {{"keyword": "搜索词", "round": 1, "returned": 10, "new": 8}}
+  ]
+}}
+```
+""",
+            },
+        ]
+
+        # 直接迭代 runner,同时捕获 tool 消息和 assistant 消息
+        config = RunConfig(
+            name="内容搜索",
+            model=self.agent_executor.model,
+            temperature=self.agent_executor.temperature,
+            max_iterations=self.agent_executor.max_iterations,
+            tools=["weixin_search"],
+            skills=["content_finding_strategy"],
+            extra_llm_params=self.agent_executor.extra_llm_params,
+            knowledge=KnowledgeConfig(
+                enable_extraction=False,
+                enable_completion_extraction=False,
+                enable_injection=False,
+            ),
+        )
+
+        dedup: Dict[str, CandidateArticle] = {item.url: item for item in ctx.candidate_articles}
+        assistant_texts: List[str] = []
+        assistant_reasonings: List[str] = []
+        tool_interactions: List[Dict] = []
+        tool_call_round = 0
+        total_tokens = 0
+        logger = logging.getLogger(__name__)
+        logger.info("content_search(agent) 最大召回上限: %d 篇 (target=%d)", max_recall, ctx.target_count)
+
+        import time as _time
+        t0 = _time.monotonic()
+
+        async for item in self.agent_executor.runner.run(messages=messages, config=config):
+            if isinstance(item, Trace) and item.trace_id:
+                ctx.trace_id = ctx.trace_id or item.trace_id
+                stage_traces: List[Dict] = ctx.metadata.setdefault("_stage_agent_traces", [])
+                stage_traces.append({"stage": "内容搜索", "agent_trace_id": item.trace_id})
+            elif isinstance(item, Message):
+                if item.role == "assistant":
+                    content = item.content
+                    text = content.get("text", "") if isinstance(content, dict) else str(content or "")
+                    reasoning = content.get("reasoning_content", "") if isinstance(content, dict) else ""
+                    tc = content.get("tool_calls", []) if isinstance(content, dict) else []
+                    if text:
+                        assistant_texts.append(text)
+                    if reasoning:
+                        assistant_reasonings.append(reasoning)
+                    for call in (tc or []):
+                        fn = call.get("function", {})
+                        tool_interactions.append({
+                            "tool_name": fn.get("name", ""),
+                            "arguments": fn.get("arguments", ""),
+                        })
+                    if hasattr(item, "tokens") and item.tokens:
+                        total_tokens += item.tokens
+                elif item.role == "tool":
+                    # 从 tool 消息中提取 weixin_search 返回的文章数据
+                    content = item.content if isinstance(item.content, dict) else {}
+                    tool_name = content.get("tool_name", "")
+                    # 记录工具返回摘要
+                    result_text = content.get("result", "")
+                    if isinstance(result_text, list):
+                        result_text = next(
+                            (b.get("text", "") for b in result_text if isinstance(b, dict) and b.get("type") == "text"),
+                            str(result_text),
+                        )
+                    if tool_interactions and tool_interactions[-1].get("tool_name") == tool_name:
+                        tool_interactions[-1]["result_preview"] = str(result_text)[:500]
+
+                    if tool_name == "weixin_search":
+                        tool_call_round += 1
+                        articles = self._parse_articles_from_tool_output(str(result_text), tool_call_round)
+                        for article in articles:
+                            if len(dedup) >= max_recall:
+                                break
+                            if article.url not in dedup:
+                                dedup[article.url] = article
+
+        duration_ms = int((_time.monotonic() - t0) * 1000)
+
+        # 记录 LLM 交互
+        _append_llm_interaction(ctx, {
+            "stage": ctx.current_stage,
+            "name": "内容搜索",
+            "messages": _compact_messages(messages),
+            "response_text": "\n".join(assistant_texts),
+            "reasoning": "\n".join(assistant_reasonings),
+            "tool_calls": tool_interactions if tool_interactions else None,
+            "duration_ms": duration_ms,
+            "tokens": total_tokens,
+            "model": self.agent_executor.model,
+        })
+
+        ctx.candidate_articles = list(dedup.values())
+
+        # 尝试从 agent 最终输出提取 keyword_stats(可选)
+        keyword_stats = []
+        for text in reversed(assistant_texts):
+            data = extract_json_object(text)
+            if data is not None:
+                keyword_stats = data.get("keyword_stats", [])
+                break
+        ctx.metadata["_search_keyword_stats"] = keyword_stats
+        return ctx
+
+    @staticmethod
+    def _parse_articles_from_tool_output(tool_text: str, round_num: int) -> List[CandidateArticle]:
+        """从 weixin_search 工具输出文本中解析文章列表。"""
+        # 工具输出格式: "搜索关键词「xxx」返回 N 条结果\n```json\n[...]\n```"
+        # 提取搜索关键词
+        keyword = ""
+        kw_match = re.search(r"搜索关键词「(.+?)」", tool_text)
+        if kw_match:
+            keyword = kw_match.group(1)
+
+        # 提取 JSON 数组
+        articles: List[CandidateArticle] = []
+        json_match = _TOOL_JSON_RE.search(tool_text)
+        if not json_match:
+            return articles
+        try:
+            items = json.loads(json_match.group(1))
+        except json.JSONDecodeError:
+            return articles
+
+        for item in items:
+            url = str(item.get("url", "")).strip()
+            if not url:
+                continue
+            publish_time = 0
+            stats = item.get("statistics") or {}
+            if stats.get("time"):
+                publish_time = int(stats["time"])
+            articles.append(CandidateArticle(
+                title=str(item.get("title", "")),
+                url=url,
+                publish_time=publish_time,
+                view_count=0,
+                source_keyword=keyword,
+                recall_round=round_num,
+                statistics=stats,
+            ))
+        return articles
+
+    # ─────────────────────────────────────────────
+    # 代码驱动搜索(回退)
+    # ─────────────────────────────────────────────
+
+    async def _code_search(self, ctx: PipelineContext) -> PipelineContext:
+        """代码驱动搜索:按关键词依次调用 adapter.search。"""
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        page = str(policy.get("initial_cursor", self.page))
+        max_recall = self._max_recall(ctx)
+        logger = logging.getLogger(__name__)
+        logger.info("content_search(code) 最大召回上限: %d 篇 (target=%d)", max_recall, ctx.target_count)
+
+        fallback_round = ctx.metadata.get("_fallback_round", 0)
+        keywords = self._build_keywords(ctx, fallback_round=fallback_round)
+        dedup: Dict[str, CandidateArticle] = {item.url: item for item in ctx.candidate_articles}
+        keyword_stats: List[Dict] = []
+
+        for index, keyword in enumerate(keywords, start=1):
+            if len(dedup) >= max_recall:
+                break
+            before = len(dedup)
+            articles = await self.adapter.search(keyword=keyword, page=page)
+            for article in articles:
+                article.source_keyword = keyword
+                article.recall_round = index
+                if article.url not in dedup:
+                    dedup[article.url] = article
+            keyword_stats.append({
+                "keyword": keyword,
+                "round": index,
+                "returned": len(articles),
+                "new": len(dedup) - before,
+            })
+
+        ctx.candidate_articles = list(dedup.values())
+        ctx.metadata["_search_keyword_stats"] = keyword_stats
+        return ctx
+
+    def _build_keywords(self, ctx: PipelineContext, *, fallback_round: int = 0) -> List[str]:
+        """
+        构建搜索词队列。
+
+        来源:
+        - demand_analysis 产出的精准词/主题词/上下层特征
+        - policy.extra_keywords
+        - 原始 query(兜底)
+
+        回退搜索(fallback_round >= 1)时:
+        - 跳过上一轮已使用的关键词
+        - 优先使用 topic_keywords 和 lower_features 中未使用的词
+        - 加入 filter_focus.relevance_focus 作为补充搜索词
+        - 增大搜索词数量上限
+        """
+        policy = ctx.metadata.get("search_agent_policy") or {}
+        max_kw = int(policy.get("max_keywords", self.max_keywords))
+        priority = policy.get("keyword_priority", "demand_first")
+        extras = [str(x).strip() for x in (policy.get("extra_keywords") or []) if str(x).strip()]
+
+        analysis = ctx.demand_analysis
+        assert analysis is not None
+
+        # 收集已使用的关键词(从上一轮搜索统计中获取)
+        used_keywords: set[str] = set()
+        if fallback_round >= 1:
+            prev_stats: List[Dict] = ctx.metadata.get("_search_keyword_stats", [])
+            used_keywords = {str(s.get("keyword", "")).strip() for s in prev_stats}
+            # 回退轮增大关键词数量上限
+            max_kw = max(max_kw, self.max_keywords) + 4
+
+        from_demand: List[str] = []
+        if fallback_round >= 1:
+            # 回退搜索:优先未用过的 topic_keywords 和 lower/upper features
+            from_demand.extend(analysis.search_strategy.topic_keywords)
+            from_demand.extend(analysis.lower_features)
+            from_demand.extend(analysis.upper_features)
+            from_demand.extend(analysis.search_strategy.precise_keywords)
+            # 加入 filter_focus.relevance_focus 作为补充搜索词
+            if analysis.filter_focus and analysis.filter_focus.relevance_focus:
+                from_demand.extend(analysis.filter_focus.relevance_focus)
+        else:
+            from_demand.extend(analysis.search_strategy.precise_keywords)
+            from_demand.extend(analysis.search_strategy.topic_keywords)
+            from_demand.extend(analysis.lower_features)
+            from_demand.extend(analysis.upper_features)
+
+        query = str(ctx.query).strip()
+
+        if priority == "query_first":
+            ordered = [query] + extras + from_demand
+        else:
+            ordered = from_demand + extras + [query]
+
+        seen = set()
+        keywords: List[str] = []
+        for keyword in ordered:
+            value = str(keyword).strip()
+            if not value or value in seen:
+                continue
+            # 回退搜索时跳过上一轮已使用的关键词
+            if fallback_round >= 1 and value in used_keywords:
+                continue
+            seen.add(value)
+            keywords.append(value)
+            if len(keywords) >= max_kw:
+                break
+        return keywords
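
上面关键词挑选的「排序 → 去重 → 跳过已用词 → 截断」流程,可以用一个脱离 PipelineContext 的最小示意验证(函数名 `pick_keywords` 为示意,非仓库内真实接口):

```python
# 关键词挑选示意:按优先级顺序去重,跳过上一轮已用词,截断到上限
# 与上文 _select_keywords 的核心循环同逻辑,仅为说明用
from typing import List, Set


def pick_keywords(ordered: List[str], used: Set[str], max_kw: int) -> List[str]:
    seen: Set[str] = set()
    keywords: List[str] = []
    for raw in ordered:
        value = str(raw).strip()
        if not value or value in seen or value in used:
            continue
        seen.add(value)
        keywords.append(value)
        if len(keywords) >= max_kw:
            break
    return keywords


candidates = ["养生", "养生", " 健康饮食 ", "太极", "广场舞", ""]
print(pick_keywords(candidates, used={"太极"}, max_kw=3))
# → ['养生', '健康饮食', '广场舞']
```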

+ 128 - 0
src/pipeline/stages/demand_analysis.py

@@ -0,0 +1,128 @@
+"""需求理解阶段:把自然语言 query 转成结构化策略。"""
+
+from __future__ import annotations
+
+from typing import List
+
+from src.pipeline.base import Stage
+from src.pipeline.context import DemandAnalysisResult, FilterFocus, PipelineContext, SearchStrategy
+from src.pipeline.stages.common import StageAgentExecutor
+
+
+class DemandAnalysisStage(Stage):
+    name = "demand_analysis"
+    description = "需求理解与特征分层"
+
+    def __init__(self, agent_executor: StageAgentExecutor):
+        self.agent_executor = agent_executor
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        运行需求分析子任务。
+
+        输入:
+        - ctx.query
+        - ctx.knowledge_sources(可选)
+
+        输出:
+        - ctx.demand_analysis
+        """
+        knowledge_context = await _build_knowledge_context(ctx)
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "你是内容搜索任务的需求分析器。"
+                    "你只做需求理解,不调用任何工具,不执行搜索。"
+                    "你必须严格返回 JSON,并放在 ```json 代码块中。"
+                ),
+            },
+            {
+                "role": "user",
+                "content": f"""
+任务:分析搜索需求,拆出特征分层与搜索策略。
+
+原始 query: {ctx.query}
+目标条数: {ctx.target_count}
+
+补充知识:
+{knowledge_context or "无"}
+
+要求:
+1. 只能用 query 中已有词语做归类,禁止编造核心特征。
+2. 先区分 `实质特征` 与 `形式特征`。
+3. 只对 `实质特征` 继续区分 `上层特征` 与 `下层特征`。
+4. 输出 JSON:
+```json
+{{
+  "特征归类": {{
+    "实质特征": [],
+    "形式特征": [],
+    "上层特征": [],
+    "下层特征": []
+  }},
+  "起点策略": {{
+    "建议精准词直搜": true,
+    "建议主题下钻": true,
+    "精准词候选": [],
+    "主题下钻候选": []
+  }},
+  "筛选关注点": {{
+    "形式规则": [],
+    "相关性关注点": [],
+    "淘汰风险点": []
+  }}
+}}
+```
+""",
+            },
+        ]
+        result = await self.agent_executor.run_json_stage(
+            ctx=ctx,
+            name="需求理解",
+            messages=messages,
+            allowed_tools=[],
+            skills=["demand_analysis"],
+        )
+        feature_group = result.get("特征归类", {}) or {}
+        start_strategy = result.get("起点策略", {}) or {}
+        filter_focus = result.get("筛选关注点", {}) or {}
+
+        ctx.demand_analysis = DemandAnalysisResult(
+            substantive_features=_ensure_list(feature_group.get("实质特征")),
+            formal_features=_ensure_list(feature_group.get("形式特征")),
+            upper_features=_ensure_list(feature_group.get("上层特征")),
+            lower_features=_ensure_list(feature_group.get("下层特征")),
+            search_strategy=SearchStrategy(
+                precise_search=bool(start_strategy.get("建议精准词直搜", True)),
+                topic_drill_down=bool(start_strategy.get("建议主题下钻", True)),
+                precise_keywords=_ensure_list(start_strategy.get("精准词候选")),
+                topic_keywords=_ensure_list(start_strategy.get("主题下钻候选")),
+            ),
+            filter_focus=FilterFocus(
+                format_rules=_ensure_list(filter_focus.get("形式规则")),
+                relevance_focus=_ensure_list(filter_focus.get("相关性关注点")),
+                elimination_risks=_ensure_list(filter_focus.get("淘汰风险点")),
+            ),
+            raw_result=result,
+        )
+        return ctx
+
+
+async def _build_knowledge_context(ctx: PipelineContext) -> str:
+    """拉取知识源并拼接成 prompt 可消费文本。"""
+    lines: List[str] = []
+    for name, source in ctx.knowledge_sources.items():
+        items = await source.query(ctx.query, top_k=3)
+        if not items:
+            continue
+        lines.append(f"## {name}")
+        lines.append(source.format_for_prompt(items))
+    return "\n".join(lines)
+
+
+def _ensure_list(value) -> List[str]:
+    """把外部结果安全规范为字符串列表。"""
+    if isinstance(value, list):
+        return [str(item) for item in value if str(item).strip()]
+    return []
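
`run_json_stage` 的实现不在本次 diff 中;既然 system prompt 要求模型「严格返回 JSON,并放在 json 代码块中」,下游解析大致需要如下步骤(`extract_json_block` 为假设的示意函数,非仓库真实接口):

```python
# 从 LLM 回复中提取 json 代码块并解析(最小示意,假设实现)
import json
import re

FENCE = "`" * 3  # 三反引号,避免在示例中直接书写代码块定界符


def extract_json_block(reply: str) -> dict:
    pattern = re.compile(FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    payload = match.group(1) if match else reply  # 无代码块时兜底整体解析
    return json.loads(payload)


reply = "分析如下:\n" + FENCE + 'json\n{"特征归类": {"实质特征": ["养生"]}}\n' + FENCE
result = extract_json_block(reply)
print(result["特征归类"]["实质特征"])  # → ['养生']
```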

+ 78 - 0
src/pipeline/stages/output_persist.py

@@ -0,0 +1,78 @@
+"""输出阶段:组装标准结果并落盘 output.json。"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import List
+
+from src.pipeline.base import Stage
+from src.pipeline.context import OutputSummary, PipelineContext, PipelineOutput
+
+
+class OutputPersistStage(Stage):
+    name = "output_persist"
+    description = "生成标准输出并写入 output.json"
+
+    def validate_input(self, ctx: PipelineContext) -> List[str]:
+        """落盘前必要字段校验(filtered_articles 允许为空,代表无匹配结果)。"""
+        errors: List[str] = []
+        if not ctx.trace_id:
+            errors.append("缺少 trace_id")
+        if not ctx.output_dir:
+            errors.append("缺少 output_dir")
+        return errors
+
+    async def execute(self, ctx: PipelineContext) -> PipelineContext:
+        """
+        构造 PipelineOutput 并写入 `{output_dir}/{trace_id}/output.json`。
+
+        注意:这里只负责写文件,不做 schema 深度校验;一致性由 OutputSchemaGate 保证。
+        """
+        summary = OutputSummary(
+            candidate_count=len(ctx.candidate_articles),
+            filtered_in_count=len(ctx.filtered_articles),
+            account_count=len(ctx.accounts),
+        )
+        output = PipelineOutput(
+            trace_id=ctx.trace_id or "",
+            query=ctx.query,
+            demand_id=ctx.demand_id,
+            summary=summary,
+            contents=[
+                {
+                    "title": item.title,
+                    "url": item.url,
+                    "statistics": {"time": item.publish_time},
+                    "reason": item.reason,
+                }
+                for item in ctx.filtered_articles
+            ],
+            accounts=[
+                {
+                    "wx_gh": item.wx_gh,
+                    "account_name": item.account_name,
+                    "channel_account_id": item.channel_account_id,
+                    "biz_info": item.biz_info,
+                    "article_count": item.article_count,
+                    "sample_articles": item.sample_articles,
+                    "source_urls": item.source_urls,
+                }
+                for item in ctx.accounts
+            ],
+            article_account_relations=[
+                {
+                    "article_url": item.article_url,
+                    "wx_gh": item.wx_gh,
+                }
+                for item in ctx.article_account_relations
+            ],
+        )
+        ctx.output = output
+
+        target_dir = Path(ctx.output_dir) / (ctx.trace_id or "")
+        target_dir.mkdir(parents=True, exist_ok=True)
+        output_file = target_dir / "output.json"
+        output_file.write_text(json.dumps(output.to_dict(), ensure_ascii=False, indent=2), encoding="utf-8")
+        ctx.metadata["output_file"] = str(output_file)
+        return ctx
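
OutputSchemaGate 的实现未在此展示;落盘后的顶层字段校验大致可以这样勾勒(`ALLOWED_TOP_KEYS` 取自 content_finder.prompt 中的 Schema 约束,`check_top_keys` 为假设的示意函数):

```python
# 顶层字段白名单校验示意:output.json 只允许固定 7 个顶层 key
# 字段列表来自 prompt 中的 Schema 约束;函数本身为示意实现
from typing import List

ALLOWED_TOP_KEYS = {
    "trace_id", "query", "demand_id",
    "summary", "contents", "accounts", "article_account_relations",
}


def check_top_keys(output: dict) -> List[str]:
    """返回非法的自创字段列表,空列表代表通过。"""
    return sorted(set(output) - ALLOWED_TOP_KEYS)


doc = {"trace_id": "t1", "query": "q", "results": []}  # results 是自创字段
print(check_top_keys(doc))  # → ['results']
```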

+ 0 - 77
tests/content_finder.prompt

@@ -1,77 +0,0 @@
----
-model: sonnet-4.6
-temperature: 0.3
----
-
-$system$
-你是一个专业的内容寻找助手,帮助运营人员在微信平台上寻找符合要求的文章内容。
-
-## 重要约束
-- 只在微信平台搜索,不要切换到其他平台(小红书、B站、抖音等)
-- 可用工具:`wechat_search`、`fetch_weixin_account`、`fetch_account_article_list`、`fetch_article_detail`
-- **严格禁止**调用任何名称以 `browser_` 开头的浏览器工具
-
-## 平台背景
-- 平台载体:微信公众号
-- 核心用户群:95% 是 50 岁以上中老年人
-- 增长方式:微信文章阅读率
-- 核心指标:阅读量
-
-## 执行流程(按顺序,禁止跳步)
-1. **搜索阶段(article_finding_strategy)**
-   - 基于 `%query%` 提取关键词并串行搜索。
-   - 搜索结果只从工具返回的结构化字段读取,不解析工具 output 文本。
-   - 先收集候选文章池(数量建议为目标数量 M 的 2 倍)。
-
-2. **过滤阶段(article_filter_strategy)**
-   - 输入:`input_query=%query%` + 候选文章池。
-   - 对候选文章调用 `fetch_article_detail` 获取详情后再过滤。
-   - 按三段执行:
-     - 相关性判断:文章与 `input_query` 的核心意图、对象、场景是否匹配。
-     - 硬性淘汰:低质、夸大、明显跑题内容直接剔除。
-     - 兴趣评分:以 query 需求价值为核心,结合可读性、可信度、情感适配、时效性综合排序。
-   - 若筛选后数量不足(C < M × 0.8),回到搜索阶段补充候选再筛选。
-
-3. **账号沉淀阶段(account_precipitation)**
-   - 对通过过滤的文章逐条调用 `fetch_weixin_account` 获取账号信息。
-   - 账号聚合去重规则:优先 `wx_gh`,缺失时用 `account_name` 兜底。
-   - 统计账号命中文章数与代表文章(用于内部沉淀与后续复用)。
-   - 生成账号结果 `accounts` 与文章-账号关系 `article_account_relations`。
-
-4. **数据记录阶段(本地)**
-   - 先按 `output_schema` 生成并写入 `output.json`(JSON 格式)。
-   - 输出必须包含:文章结果 + 账号结果 + 文章-账号关系。
-   - 本阶段仅保存本地文件,不写数据库。
-
-## 强制要求(违反即为错误)
-
-### 输出字段必须严格遵循 Schema
-- 顶层字段只能有:`trace_id`、`query`、`demand_id`、`summary`、`contents`、`accounts`、`article_account_relations`
-- `contents` 每条字段只能有:`title`、`url`、`statistics`、`reason`
-- `accounts` 每条字段只能有:`wx_gh`、`account_name`、`channel_account_id`、`biz_info`、`article_count`、`sample_articles`、`source_urls`
-- `article_account_relations` 每条字段只能有:`article_url`、`wx_gh`
-- **禁止自创字段**(如 `results`、`metrics`、`tags`、`platform` 等)
-- **禁止使用中文 key**
-- `summary` 必须包含:`candidate_count`、`filtered_in_count`、`account_count`
-- `title` 中英文双引号 `"` 必须标准化为中文双引号 `“`、`”`
-
-## 流程自检
-
-**在宣称任务完成或结束对话前,必须逐项确认;任一项未满足则继续执行,不得提前收尾。**
-
-### 输出、校验、入库顺序是否正确
-- 无需写数据库,直接写入 `output.json` 即可。
-- 必须输出沉淀的账号以及文章-账号关系
-- **禁止**:未校验 Schema 就直接入库。
-
-
-$user$
-任务:找 20 个与「%query%」相关的、老年人感兴趣的文章。
-要求:
-- 适合老年人分享观看
-- 热度要高,质量要好
-
-搜索词: %query%
-搜索词id: %demand_id%(如有)
-
-请开始执行内容寻找任务。记住要多步推理,每次只执行一小步,然后思考下一步该做什么。

+ 299 - 97
tests/run_single.py

@@ -1,10 +1,179 @@
-from dotenv import load_dotenv
-load_dotenv()
+"""
+内容寻找 Agent 执行入口
 
-from typing import Dict, Any, Optional
+日志设计:
+- agent.log: JSONL 格式的完整执行追踪(无截断),用于 HTML 可视化
+- 控制台: 人类可读的简要输出
+"""
+
+import logging
+import sys
 import os
-from pathlib import Path
 import json
+import traceback as tb_mod
+from typing import Dict, Any, Optional, List
+from pathlib import Path
+from datetime import datetime
+
+PROJECT_ROOT = Path(__file__).resolve().parent
+log_dir = PROJECT_ROOT / '.cache'
+log_dir.mkdir(exist_ok=True)
+
+TRACE_LOG_PATH = log_dir / 'agent.log'
+
+
+# ============================================================
+# TraceWriter: 结构化 JSONL 追踪日志(写入 agent.log,无截断)
+# ============================================================
+
+class TraceWriter:
+    """将 Agent 执行的每一步写为 JSONL 事件到 agent.log,不做任何截断。"""
+
+    def __init__(self, path: Path):
+        self._file = open(path, 'w', encoding='utf-8')
+        self._iteration = 0
+
+    def _ts(self) -> str:
+        return datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
+
+    def _write(self, event: dict):
+        event["ts"] = self._ts()
+        self._file.write(json.dumps(event, ensure_ascii=False) + '\n')
+        self._file.flush()
+
+    # --- 任务生命周期 ---
+
+    def log_init(self, query: str, demand_id: int, model: str, trace_id: Optional[str] = None):
+        self._write({
+            "type": "init",
+            "query": query,
+            "demand_id": demand_id,
+            "model": model,
+            "trace_id": trace_id,
+        })
+
+    def log_complete(self, trace_id: Optional[str], status: str, error: Optional[str] = None):
+        self._write({
+            "type": "complete",
+            "trace_id": trace_id,
+            "status": status,
+            "error": error,
+            "total_iterations": self._iteration,
+        })
+
+    # --- 框架日志 ---
+
+    def log_framework(self, logger_name: str, level: str, message: str):
+        self._write({
+            "type": "framework",
+            "logger": logger_name,
+            "level": level,
+            "msg": message,
+        })
+
+    # --- Agent 思考 / LLM 输出 ---
+
+    def log_assistant(self, text: str, tool_calls: Optional[List[dict]] = None,
+                      reasoning: Optional[str] = None, tokens: Optional[dict] = None):
+        """记录 LLM 的完整输出(思考文本 + 工具调用,不截断)。"""
+        self._iteration += 1
+
+        parsed_calls = []
+        for tc in (tool_calls or []):
+            func = tc.get("function", {})
+            name = func.get("name", "unknown")
+            args_str = func.get("arguments", "{}")
+            try:
+                params = json.loads(args_str)
+            except (json.JSONDecodeError, TypeError):
+                params = args_str
+            parsed_calls.append({
+                "name": name,
+                "params": params,
+                "call_id": tc.get("id", ""),
+            })
+
+        self._write({
+            "type": "assistant",
+            "iteration": self._iteration,
+            "text": text or "",
+            "tool_calls": parsed_calls,
+            "reasoning": reasoning or "",
+            "tokens": tokens or {},
+        })
+
+    # --- 工具结果 ---
+
+    def log_tool_result(self, tool_name: str, result: Any, call_id: str = ""):
+        """记录工具的完整返回(不截断)。"""
+        if isinstance(result, list):
+            texts = [
+                p.get("text", "") for p in result
+                if isinstance(p, dict) and p.get("type") == "text"
+            ]
+            output = "\n".join(texts) if texts else json.dumps(result, ensure_ascii=False)
+        elif isinstance(result, str):
+            output = result
+        else:
+            output = str(result)
+
+        self._write({
+            "type": "tool_result",
+            "name": tool_name,
+            "call_id": call_id,
+            "output": output,
+        })
+
+    # --- 错误 ---
+
+    def log_error(self, message: str, traceback_str: str = ""):
+        self._write({
+            "type": "error",
+            "msg": message,
+            "traceback": traceback_str,
+        })
+
+    def close(self):
+        if self._file and not self._file.closed:
+            self._file.close()
+
+
+class JsonlLogHandler(logging.Handler):
+    """将 Python logging 记录路由到 TraceWriter(JSONL 格式)。"""
+
+    def __init__(self, trace_writer: TraceWriter):
+        super().__init__()
+        self.trace_writer = trace_writer
+
+    def emit(self, record: logging.LogRecord):
+        try:
+            self.trace_writer.log_framework(
+                record.name,
+                record.levelname,
+                record.getMessage(),
+            )
+        except Exception:
+            pass
+
+
+# ============================================================
+# 控制台日志(人类可读)
+# ============================================================
+
+console_handler = logging.StreamHandler(sys.stdout)
+console_handler.setFormatter(
+    logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+)
+logging.basicConfig(level=logging.INFO, handlers=[console_handler], force=True)
+logger = logging.getLogger(__name__)
+
+
+# ============================================================
+# 第三方 / 项目依赖导入(放在 logging 配置之后)
+# ============================================================
+
+from dotenv import load_dotenv
+load_dotenv()
 
 from tools import fetch_account_article_list, fetch_weixin_account, weixin_search
 
@@ -13,18 +182,16 @@ from agent.llm import create_openrouter_llm_call
 from agent.llm.prompts import SimplePrompt
 from agent.tools.builtin.knowledge import KnowledgeConfig
 
-# 默认搜索词
 DEFAULT_QUERY = "伊朗、以色列、和平是永恒的主题"
 DEFAULT_DEMAND_ID = 1
 
-import logging
-
-logger = logging.getLogger(__name__)
-PROJECT_ROOT = Path(__file__).resolve().parent
 
+# ============================================================
+# 工具函数
+# ============================================================
 
 def _normalize_ascii_double_quotes(text: str) -> str:
-    """将字符串中的 ASCII 双引号 `"` 规范化为中文双引号 `“`、`”`。"""
+    """将字符串中的 ASCII 双引号 `"` 规范化为中文双引号 `“`、`”`。"""
     if '"' not in text:
         return text
 
@@ -32,7 +199,7 @@ def _normalize_ascii_double_quotes(text: str) -> str:
     open_quote = True
     for ch in text:
         if ch == '"':
-            chars.append("“" if open_quote else "”")
+            chars.append("\u201c" if open_quote else "\u201d")
             open_quote = not open_quote
         else:
             chars.append(ch)
@@ -50,11 +217,6 @@ def _sanitize_json_strings(value: Any) -> Any:
 
 
 def _sanitize_output_json(output_json_path: Path) -> None:
-    """
-    任务完成后对 output.json 做后处理:
-    - 递归清洗所有字符串值中的英文双引号 `"`
-    - 保持合法 JSON
-    """
     if not output_json_path.exists():
         logger.warning(f"未找到 output.json,跳过清洗: {output_json_path}")
         return
@@ -73,41 +235,54 @@ def _sanitize_output_json(output_json_path: Path) -> None:
     logger.info(f"已完成 output.json 引号清洗: {output_json_path}")
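
上面 `_normalize_ascii_double_quotes` 的交替配对规则(奇数次出现视为开引号),可以用一个等价的独立示例验证:

```python
# ASCII 双引号 → 中文双引号(“ ”)交替配对示意,与上文函数同逻辑
def normalize_quotes(text: str) -> str:
    if '"' not in text:
        return text
    chars, open_quote = [], True
    for ch in text:
        if ch == '"':
            chars.append("\u201c" if open_quote else "\u201d")
            open_quote = not open_quote
        else:
            chars.append(ch)
    return "".join(chars)


print(normalize_quotes('他说"早睡早起"身体好'))
# → 他说“早睡早起”身体好
```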
 
 
+# ============================================================
+# 控制台流式输出(简洁版)
+# ============================================================
+
+def _print_assistant(text: str, tool_calls: list):
+    """向控制台打印 Agent 输出的摘要。"""
+    if text:
+        logger.info("\n%s", text)
+    for tc in (tool_calls or []):
+        name = tc.get("function", {}).get("name", "unknown")
+        if name not in ("goal", "get_current_context"):
+            logger.info("[工具] %s", name)
+
+
+def _print_tool_result(tool_name: str):
+    """向控制台打印工具结果标记。"""
+    if tool_name not in ("goal", "get_current_context"):
+        logger.info("[结果] %s ✓", tool_name)
+
+
+# ============================================================
+# Agent 执行
+# ============================================================
+
 async def run_agent(
         query: Optional[str] = None,
         demand_id: Optional[int] = None,
         stream_output: bool = True,
 ) -> Dict[str, Any]:
-    """
-    执行 agent 任务
-
-    Args:
-        query: 查询内容(搜索词),None 则使用默认值
-        demand_id: 本次搜索任务 id(int,关联 demand_content 表)
-        stream_output: 是否流式输出到 stdout(run.py 需要,server.py 不需要)
-
-    Returns:
-        {
-            "trace_id": "20260317_103046_xyz789",
-            "status": "completed" | "failed",
-            "error": "错误信息"  # 失败时
-        }
-    """
     query = query or DEFAULT_QUERY
     demand_id = demand_id or DEFAULT_DEMAND_ID
 
-    # 加载 prompt
-    prompt_path = PROJECT_ROOT / "content_finder.prompt"
+    # 创建 TraceWriter → agent.log(JSONL,完整无截断)
+    tw = TraceWriter(TRACE_LOG_PATH)
+
+    # 将 Python logging 也路由到 JSONL
+    jsonl_handler = JsonlLogHandler(tw)
+    jsonl_handler.setLevel(logging.DEBUG)
+    logging.getLogger().addHandler(jsonl_handler)
+
+    prompt_path = PROJECT_ROOT / "content_finder.md"
     prompt = SimplePrompt(prompt_path)
 
-    # output 目录
     output_dir = str(PROJECT_ROOT / "output")
 
-    # 构建消息(替换 %query%、%output_dir%、%demand_id%)
     demand_id_str = str(demand_id) if demand_id is not None else ""
     messages = prompt.build_messages(query=query, output_dir=output_dir, demand_id=demand_id_str)
 
-    # 初始化配置
     api_key = os.getenv("OPEN_ROUTER_API_KEY")
     if not api_key:
         raise ValueError("OPEN_ROUTER_API_KEY 未设置")
@@ -117,11 +292,9 @@ async def run_agent(
     temperature = float(prompt.config.get("temperature", 0.3))
     max_iterations = 30
     trace_dir = str(PROJECT_ROOT / "traces")
-
     skills_dir = str(PROJECT_ROOT / "skills")
 
     Path(trace_dir).mkdir(parents=True, exist_ok=True)
-
     store = FileSystemTraceStore(base_path=trace_dir)
 
     allowed_tools = [
@@ -148,102 +321,131 @@ async def run_agent(
             enable_extraction=False,
             enable_completion_extraction=False,
             enable_injection=False,
-            # owner="content_finder_agent",
-            # default_tags={"project": "content_finder"},
-            # default_scopes=["com.piaoquantv.supply"],
-            # default_search_types=["tool", "usecase", "definition"],
-            # default_search_owner="content_finder_agent"
         )
     )
 
-    # 执行
+    tw.log_init(query, demand_id, model)
+
     trace_id = None
 
     try:
         async for item in runner.run(messages=messages, config=config):
+
+            # ---------- Trace 对象 ----------
             if isinstance(item, Trace):
                 trace_id = item.trace_id
+                tw._write({"type": "trace_status", "trace_id": trace_id, "status": item.status})
 
                 if item.status == "completed":
                     if trace_id:
                         output_json_path = Path(output_dir) / trace_id / "output.json"
                         _sanitize_output_json(output_json_path)
+                    tw.log_complete(trace_id, "completed")
                     logger.info(f"Agent 执行完成: trace_id={trace_id}")
-                    return {
-                        "trace_id": trace_id,
-                        "status": "completed"
-                    }
+                    return {"trace_id": trace_id, "status": "completed"}
+
                 elif item.status == "failed":
+                    tw.log_complete(trace_id, "failed", item.error_message)
                     logger.error(f"Agent 执行失败: {item.error_message}")
-                    return {
-                        "trace_id": trace_id,
-                        "status": "failed",
-                        "error": item.error_message
-                    }
-
-            elif isinstance(item, Message) and stream_output:
-                # 流式输出(仅 run.py 需要)
+                    return {"trace_id": trace_id, "status": "failed", "error": item.error_message}
+
+            # ---------- Message 对象 ----------
+            elif isinstance(item, Message):
+
+                # --- Assistant 消息(思考 + 工具调用)---
                 if item.role == "assistant":
                     content = item.content
                     if isinstance(content, dict):
                         text = content.get("text", "")
                         tool_calls = content.get("tool_calls", [])
+                        reasoning = content.get("reasoning_content", "")
+
+                        # JSONL: 完整记录,不截断
+                        tw.log_assistant(
+                            text=text,
+                            tool_calls=tool_calls,
+                            reasoning=reasoning,
+                            tokens={
+                                "prompt": getattr(item, "prompt_tokens", None),
+                                "completion": getattr(item, "completion_tokens", None),
+                            },
+                        )
+
+                        # 控制台:简要输出
+                        if stream_output:
+                            _print_assistant(text, tool_calls)
 
-                        if text:
-                            # 如果有推荐结果,完整输出
-                            if len(text) > 500 and ("推荐结果" in text or "推荐内容" in text or "🎯" in text):
-                                print(f"\n{text}")
-                            # 如果有工具调用且文本较短,只输出摘要
-                            elif tool_calls and len(text) > 100:
-                                print(f"[思考] {text[:100]}...")
-                            # 其他情况输出完整文本
-                            else:
-                                print(f"\n{text}")
-
-                        # 输出工具调用信息
-                        if tool_calls:
-                            for tc in tool_calls:
-                                tool_name = tc.get("function", {}).get("name", "unknown")
-                                # 跳过 goal 工具的输出,减少噪音
-                                if tool_name != "goal":
-                                    print(f"[工具] {tool_name}")
                     elif isinstance(content, str) and content:
-                        print(f"\n{content}")
+                        tw.log_assistant(text=content)
+                        if stream_output:
+                            logger.info("\n%s", content)
 
+                # --- Tool 消息(工具返回)---
                 elif item.role == "tool":
                     content = item.content
                     if isinstance(content, dict):
                         tool_name = content.get("tool_name", "unknown")
-                        print(f"[结果] {tool_name} ✓")
-
-        # 如果循环结束但没有返回,说明异常退出
-        return {
-            "trace_id": trace_id,
-            "status": "failed",
-            "error": "Agent 异常退出"
-        }
+                        result = content.get("result", "")
+                        error = content.get("error")
+
+                        # JSONL: 完整记录,不截断
+                        if error:
+                            tw.log_error(
+                                message=f"Tool {tool_name}: {error}",
+                            )
+                        else:
+                            tw.log_tool_result(
+                                tool_name=tool_name,
+                                result=result,
+                                call_id=item.tool_call_id or "",
+                            )
+
+                        # 控制台:简要标记
+                        if stream_output:
+                            _print_tool_result(tool_name)
+
+        # 循环正常结束但未返回
+        tw.log_complete(trace_id, "failed", "Agent 异常退出(循环结束未返回结果)")
+        return {"trace_id": trace_id, "status": "failed", "error": "Agent 异常退出"}
 
     except KeyboardInterrupt:
         logger.info("用户中断")
+        tw.log_complete(trace_id, "interrupted", "用户中断")
         if stream_output:
-            print("\n用户中断")
-        return {
-            "trace_id": trace_id,
-            "status": "failed",
-            "error": "用户中断"
-        }
+            pass  # 上方 logger.info 已输出到控制台,避免重复打印
+        return {"trace_id": trace_id, "status": "failed", "error": "用户中断"}
+
     except Exception as e:
+        tb_str = tb_mod.format_exc()
         logger.error(f"Agent 执行异常: {e}", exc_info=True)
+        tw.log_error(str(e), tb_str)
+        tw.log_complete(trace_id, "failed", str(e))
         if stream_output:
-            print(f"\n执行失败: {e}")
-        return {
-            "trace_id": trace_id,
-            "status": "failed",
-            "error": str(e)
-        }
+            logger.error("执行失败: %s", e)
+        return {"trace_id": trace_id, "status": "failed", "error": str(e)}
+
+    finally:
+        logging.getLogger().removeHandler(jsonl_handler)
+        tw.close()
+
+
+async def main():
+    try:
+        result = await run_agent(query=None, demand_id=None, stream_output=True)
+
+        if result["status"] == "completed":
+            logger.info(f"[完成] trace_id={result['trace_id']}")
+        else:
+            logger.error(f"[失败] trace_id={result.get('trace_id')}, 错误: {result.get('error')}")
+            sys.exit(1)
+
+    except KeyboardInterrupt:
+        logger.info("用户中断")
+    except Exception as e:
+        logger.error(f"执行失败: {e}", exc_info=True)
+        sys.exit(1)
 
 
 if __name__ == "__main__":
     import asyncio
-    asyncio.run(run_agent())
-
+    asyncio.run(main())

+ 113 - 78
tests/skills/account_precipitation.md

@@ -1,119 +1,154 @@
 ---
 name: account_precipitation
-description: 账号沉淀策略(公众号账号信息提取与聚合
+description: 账号沉淀策略(Harness 架构:biz 批量合并 + 质量分级)
 ---
 
-# 账号沉淀策略(公众号账号信息提取与聚合)
+# 账号沉淀策略
 
-## 目标
+---
 
-基于已筛选文章,调用 `fetch_weixin_account` 获取公众号账号信息,完成账号层面的沉淀与聚合,形成可复用的“优质账号池”。
+## ⚡ Harness: Fallback — 前置验证(快速失败)
 
----
+在调用任何账号工具前,先验证以下前置条件。**任一失败则立即终止。**
 
-## 输入与输出
-
-### 输入
-- `filtered_articles`:`array<object>`,建议来自 `article_filter_strategy` 输出
-- `filtered_articles[i]` 必须包含:
-  - `url`:`string`,完整文章链接(原样透传)
-  - `title`:`string`,文章标题
-
-### 输入强约束(必须满足)
-- 逐条以文章 `url` 作为 `fetch_weixin_account` 的 `content_link` 入参
-- 禁止把 `wx_gh` 当作 `fetch_weixin_account` 的输入参数
-- 禁止修改、截断、重写 `url`
-- 任一条缺失 `url` 或 `title`,该条直接跳过
-
-### 输出
-- `accounts`:`array<object>`,去重后的账号列表
-- `accounts[i]` 固定字段:
-  - `account_name`:`string`
-  - `wx_gh`:`string`(公众号 ID)
-  - `channel_account_id`:`string | int | null`
-  - `biz_info`:`object | null`
-  - `article_count`:`int`
-  - `sample_articles`:`array<string>`(最多 5 条)
-  - `source_urls`:`array<string>`(可追溯来源链接)
-
-### 输出强约束(必须满足)
-- 仅允许输出上述字段,禁止额外字段
-- `account_name/wx_gh/channel_account_id/biz_info` 必须来自 `metadata.account_info`
-- `article_count` 必须等于该账号聚合到的文章数
-- `source_urls` 必须为真实输入文章链接,不得编造
+| 检查项 | 通过条件 | 失败处理 |
+|---|---|---|
+| 筛选文章非空 | `filtered_articles` 长度 ≥ 1 | 终止,输出空账号列表 `{"accounts": []}` |
+| 账号工具可用 | `fetch_weixin_account` 存在 | 终止,告知用户"账号工具不可用" |
+| 每条文章有 url | `url` 字段非空 | 跳过缺失 url 的文章,其余继续 |
 
 ---
 
-## 工具使用规则
+## 📋 Harness: Planner — 执行计划(开始前打印)
 
-### 必用工具
-- `fetch_weixin_account`:根据文章查询其所属公众号账号信息
+```
+[AccountPlanner]
+  输入文章数           = {len(filtered_articles)}
+  biz 分组数           = {distinct_biz_count}(相同 biz 合并调用,节省 API)
+  无 biz 文章数        = {no_biz_count}(每篇单独调用)
+  预计 API 调用数      = {estimated_calls}
+  A 级候选账号数       = {a_grade_candidates}(relevance_level=high 文章 ≥ 2 篇)
+  可选历史验证         = {True/False}(A 级账号 ≤ 3 个时开启)
+```
 
-### 工具 I/O 契约(强约束)
-- 入参固定:
-  - `content_link` ← 文章 `url`
-- 仅从以下路径读取账号信息:
-  - `metadata.account_info.account_name`
-  - `metadata.account_info.wx_gh`
-  - `metadata.account_info.biz_info`
-  - `metadata.account_info.channel_account_id`
-- 返回 `None` 或缺失 `metadata.account_info` 时:
-  - 当前文章不参与账号沉淀,记录为失败样本但不编造账号
+---
 
-### 调用顺序
-1. 遍历文章,逐条调用 `fetch_weixin_account(content_link=url)`
-2. 将返回账号按唯一键聚合(优先 `wx_gh`,若缺失则用 `account_name`)
-3. 统计每个账号命中文章数与代表文章
-4. 生成账号沉淀结果
+## 💰 Harness: Budget — 预算约束
+
+| 预算项 | 限制 | 超出处理 |
+|---|---|---|
+| fetch_weixin_account 调用数 | 等于 distinct biz 数(无 biz 时等于文章数) | 不超出,biz 分组已保证最小调用 |
+| 工具重试 | 内置 3 次自动重试,Skill 层不额外重试 | 仍失败则跳过该文章,不编造账号 |
+| 可选历史验证上限 | 最多验证 A 级账号中前 3 个 | 超出 3 个时跳过验证,直接输出 |
+| fetch_account_article_list | 每个 A 级账号仅取 1 页(is_cache=True) | 不翻页,只用近期数据做质量判断 |
 
 ---
 
-## 聚合与去重规则
+## ⚙️ Core Execution — 核心执行
 
-- 优先用 `wx_gh` 去重;若无 `wx_gh`,使用 `account_name` 兜底
-- 同一账号下累计:
-  - `article_count += 1`
-  - 将文章标题加入 `sample_articles`(最多保留 5 条)
-  - 将文章链接加入 `source_urls`(去重保留)
-- 若账号字段冲突,采用“非空优先 + 最新记录优先”策略
+### 步骤 1:利用搜索元数据做前置分组
 
----
+整理 `filtered_articles` 中携带的 `biz` 字段:
 
-## 质量筛选(账号层)
+- `biz` 非空的文章 → 按 `biz` 分组,同组只取第一篇 `url` 作为调用入参(节省 API)
+- `biz` 为空的文章 → 每篇单独调用 `fetch_weixin_account`
 
-用于后续细化,可先保留框架:
+### 步骤 2:调用 fetch_weixin_account 获取账号信息
 
-- **内容匹配度**:账号历史内容是否持续覆盖老年人关注主题
-- **稳定性**:不是一次性命中,而是多篇文章持续命中
-- **可信度**:内容风格是否理性、信息是否相对可靠
+```
+fetch_weixin_account(content_link = article.url)
+```
 
-> 当前阶段先沉淀,不强制淘汰;后续可在此增加评分阈值。
+读取路径(禁止解析 output 文本,禁止猜测账号名):
 
----
+```
+metadata.account_info.account_name       公众号名称
+metadata.account_info.wx_gh              公众号 ID(gh_xxx)
+metadata.account_info.biz_info           biz 详情
+metadata.account_info.channel_account_id 内部 ID
+```
+
+失败处理:`ToolResult.error` 非空 → 该文章不参与账号沉淀,记录失败,继续下一条。
+
+### 步骤 3:账号聚合与去重
+
+**去重优先级**:`wx_gh` → `biz` → `account_name`(前者非空时用前者)
+
+同一账号累计规则:
+
+```
+article_count += 1
+sample_articles ← 优先保留 relevance_level=high 的标题,最多 5 条
+source_urls     ← 去重追加文章链接
+```
+
+字段冲突时:非空优先 + 最新记录优先。
+
+### 步骤 4:账号质量分级(内部评估,不写入最终输出)
+
+| 等级 | 条件 |
+|---|---|
+| **A 级(优质)** | `article_count ≥ 3` **且** `relevance_level=high` 文章 ≥ 2 篇 |
+| **B 级(良好)** | `article_count ≥ 2` 或 `relevance_level=high` 文章 ≥ 1 篇 |
+| **C 级(一般)** | 其余(仅 1 篇且质量中/低) |
 
-## 数据真实性要求
+### 步骤 5:可选 — A 级账号历史验证
+
+仅当 A 级账号数量 ≤ 3 个时执行(避免过多 API 调用):
+
+```
+fetch_account_article_list(wx_gh = account.wx_gh, is_cache=True)
+```
 
-- 账号信息必须来自 `fetch_weixin_account` 返回
-- 不得根据标题或 URL 猜测账号名
-- 文章与账号映射关系必须可追溯(必须保留 `source_urls`)
-- 禁止解析工具 `output` 文本,必须使用 `metadata.account_info`
+读取 `metadata.articles[i].title` 近 3 篇,判断是否持续产出相关内容:
+- 近 3 篇有 1 篇以上与目标主题相关 → 维持 A 级
+- 近 3 篇全部无关 → 降级为 B 级(仅影响内部排序,不写入输出字段)
 
 ---
 
-## 输出固定格式(必须严格遵守)
+## 📊 Harness: Observer — 观测与输出
+
+### 账号沉淀摘要(写入日志 / 传递给下游)
+
+```
+[AccountObserver]
+  输入文章数           = {len(filtered_articles)}
+  账号 API 成功数      = {success_calls}
+  账号 API 失败数      = {failed_calls}
+  去重后账号总数       = {total_accounts}
+    - A 级账号         = {a_count}
+    - B 级账号         = {b_count}
+    - C 级账号         = {c_count}
+  历史验证执行         = {True/False},降级数 = {downgraded}
+```
+
+### 输出顺序
+
+账号列表按质量等级排序:A 级 → B 级 → C 级(同级按 `article_count` 降序)。
+
+### 输出固定格式
 
 ```json
 {
   "accounts": [
     {
-      "account_name": "示例公众号",
+      "account_name": "示例公众号",
       "wx_gh": "gh_xxx",
       "channel_account_id": "12345",
       "biz_info": {},
-      "article_count": 4,
-      "sample_articles": ["标题A", "标题B", "标题C"],
-      "source_urls": ["https://mp.weixin.qq.com/s?..."]
+      "article_count": 3,
+      "sample_articles": ["高质量文章A", "高质量文章B", "文章C"],
+      "source_urls": [
+        "https://mp.weixin.qq.com/s?...",
+        "https://mp.weixin.qq.com/s?..."
+      ]
     }
   ]
 }
 ```
+
+**数据真实性约束**(违反即视为输出无效):
+- 账号信息必须来自 `fetch_weixin_account` 的 `metadata.account_info`,不得猜测
+- `source_urls` 中所有链接必须真实存在于 `filtered_articles` 输入
+- `article_count` 必须等于该账号实际关联成功的文章数,禁止估算
+- 禁止解析工具 `output` 文本

+ 166 - 100
tests/skills/article_filter_strategy.md

@@ -1,146 +1,212 @@
 ---
 name: article_filter_strategy
-description: 文章过滤与筛选策略(老年人兴趣向
+description: 文章过滤与筛选策略(Harness 架构:Phase-0 预筛 + 四维评分)
 ---
 
-# 文章过滤与筛选策略(老年人兴趣向)
+# 文章过滤与筛选策略
 
-## 目标
+---
+
+## ⚡ Harness: Fallback — 前置验证(快速失败)
+
+在调用任何详情工具前,先验证以下前置条件。**任一失败则立即终止。**
 
-在 `article_finding_strategy` 拿到候选文章后,调用 `fetch_article_detail` 获取详情,筛选出更适合老年人、且老年人更感兴趣的内容,
-输出可直接使用的高质量文章集合。
+| 检查项 | 通过条件 | 失败处理 |
+|---|---|---|
+| 候选列表非空 | `candidate_articles` 长度 ≥ 1 | 终止,向上游发出"候选为空"信号 |
+| 目标数量有效 | `target_count >= 1` | 终止,告知上游"目标数量无效" |
+| 详情工具可用 | `fetch_article_detail` 存在 | 终止,告知用户"详情工具不可用" |
+| filter_focus 可用 | 有 `demand_analysis.筛选关注点` 输出 | 降级:以 `input_query` 关键词自行构造相关性判断,记录警告 |
 
 ---
 
-## 输入与输出
-
-### 输入
-- `input_query`:`string`,用户原始需求,非空
-- `target_count`:`int`,目标数量,`target_count >= 1`
-- `candidate_articles`:`array<object>`,候选文章列表,非空数组
-- `candidate_articles[i]` 必须包含:
-  - `title`:`string`,非空
-  - `url`:`string`,完整文章链接,禁止截断/改写
-  - `statistics.time`:`int`,秒级时间戳
-
-### 输入强约束(必须满足)
-- 只接受上述字段;禁止传入未定义字段作为筛选依据
-- 严禁从工具 `output` 文本解析结构化数据
-- 候选文章字段必须来自同一条 `metadata.search_results` 记录
-- 任一候选缺失 `title/url/statistics.time`,该条直接丢弃,不补造字段
-
-### 输出
-- `filtered_articles`:`array<object>`,最多 `target_count` 条
-- `filtered_articles[i]` 字段固定为:
-  - `title`:`string`
-  - `url`:`string`
-  - `publish_time`:`int`(秒级时间戳)
-  - `reason`:`string`(基于证据的入选原因)
-  - `relevance_level`:`"high" | "medium" | "low"`
-  - `interest_level`:`"high" | "medium" | "low"`
-
-### 输出强约束(必须满足)
-- 只允许输出上述字段,禁止增加自定义 key
-- `title/url/publish_time` 必须与原候选记录或详情记录一致,不得改写
-- `publish_time` 统一为秒级:若详情返回毫秒时间戳(`publish_timestamp`),需整除 1000
-- `reason` 必须可被详情证据支撑,禁止主观臆测
-- `title` 必须做引号标准化:若包含英文双引号 `"`,统一替换为中文双引号 `“` 和 `”`(成对),禁止保留英文双引号
+## 📋 Harness: Planner — 执行计划(开始前打印)
+
+```
+[FilterPlanner]
+  候选文章数           = {len(candidate_articles)}
+  目标输出数 M         = {target_count}
+  Phase 0(预筛)      = 基于搜索元数据,零 API 调用淘汰明显不合格项
+  Phase 1(详情拉取)  = 最多拉取 {budget_detail_fetch} 篇详情
+  评分维度             = 需求相关性 / 内容深度 / 社交证明 / 时效可信度
+  淘汰风险点           = {filter_focus.淘汰风险点}
+```
 
 ---
 
-## 工具使用规则
-
-### 必用工具
-- `fetch_article_detail`:用于查询文章详情(正文、摘要、标签、互动信息等)
-
-### 工具 I/O 契约(强约束)
-- 入参固定:
-  - `article_link` ← 候选文章 `url`
-  - `is_count` 使用默认值(不强制改写)
-  - `is_cache` 使用默认值(不强制改写)
-- 只从以下路径读取详情:
-  - `metadata.article_info.title`
-  - `metadata.article_info.content_link`
-  - `metadata.article_info.body_text`
-  - `metadata.article_info.publish_timestamp`(毫秒)
-- 调用失败(返回 `None` 或缺失 `metadata.article_info`):
-  - 当前文章标记为“详情缺失”,不进入最终结果
-
-### 调用顺序
-1. 先基于候选列表做初筛(去重、基础质量过滤)
-2. 对初筛后的文章逐条调用 `fetch_article_detail`
-3. 根据详情做老年人兴趣打分与淘汰
-4. 按得分排序,输出前 `target_count` 条
+## 💰 Harness: Budget — 预算约束
+
+| 预算项 | 限制 | 超出处理 |
+|---|---|---|
+| 详情拉取上限 | `min(Phase0通过数, target_count × 3)` | 达到上限停止拉取,对已有结果排序输出 |
+| 重试次数 | 工具内置 3 次自动重试,Skill 层不额外重试 | 工具仍失败则跳过该文章 |
+| 最终输出上限 | `target_count` | 超出时按综合评分截断 |
 
 ---
 
-## 筛选框架(老年人导向)
+## ⚙️ Core Execution — 核心执行
+
+### Phase 0:基于搜索元数据的快速预筛(零 API 调用)
+
+对每条候选按顺序执行以下检查,**命中任一条则丢弃**:
+
+| # | 检查项 | 判断标准 |
+|---|---|---|
+| 1 | 链接有效性 | `url` 为空或非 `http` 开头 |
+| 2 | 标题有效性 | `title` 为空或仅空白字符 |
+| 3 | 时效性 | `statistics.time` 距今 > 18 个月 |
+| 4 | 淘汰风险点(标题级) | 标题命中 `filter_focus.淘汰风险点` 中任意词 |
+| 5 | url 去重 | `url` 已存在于已通过列表 |
+| 6 | 账号过度集中 | 同 `biz` 已保留 3 条且本条超额 |
 
-采用三阶段筛选:**相关性判断** + **硬性淘汰** + **兴趣评分**。
+**Phase 0 结束后检查**:若通过数 < `target_count × 1.5`,立即通知上游补充候选,收到补充后再继续 Phase 1
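Phase 0 的六条检查可以汇成一个零 API 调用的判定函数(草图;`seen_urls`、`biz_counts`、`risk_words` 由调用方在遍历候选时维护,`now` 参数仅为便于测试的假设):

```python
import time

EIGHTEEN_MONTHS_SECONDS = 18 * 30 * 86400  # 约 18 个月(按每月 30 天近似)

def phase0_pass(item, seen_urls, biz_counts, risk_words, now=None):
    """对单条候选执行 Phase 0 预筛;命中任一淘汰规则即返回 False。"""
    now = now if now is not None else int(time.time())
    url = item.get("url") or ""
    title = (item.get("title") or "").strip()
    if not url.startswith("http"):                                   # 1 链接有效性
        return False
    if not title:                                                    # 2 标题有效性
        return False
    if now - item["statistics"]["time"] > EIGHTEEN_MONTHS_SECONDS:   # 3 时效性
        return False
    if any(w in title for w in risk_words):                          # 4 淘汰风险点(标题级)
        return False
    if url in seen_urls:                                             # 5 url 去重
        return False
    if biz_counts.get(item.get("biz", ""), 0) >= 3:                  # 6 同 biz 超额
        return False
    return True
```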
 
-### 一、相关性判断(与 `input_query` 对齐)
+---
 
-先判断文章与用户需求是否相关,再进入后续评分。
+### Phase 1:详情拉取 + 四维评分
 
-- **核心意图匹配**:文章是否回答了 `input_query` 的主要问题
-- **关键信息匹配**:主题、对象、人群、场景是否与 query 一致
-- **偏题惩罚**:仅关键词相似但实际内容跑题,判定为低相关
+#### 工具调用
 
-建议先按高/中/低分档:
-- 高相关:直接围绕 query 展开,信息可直接用于需求
-- 中相关:部分覆盖 query,需要用户二次判断
-- 低相关:偏题或仅弱关联,优先淘汰
+```
+fetch_article_detail(article_link = candidate.url)
+```
 
-### 二、硬性淘汰(任一命中即排除)
-- 内容明显低俗、猎奇惊悚、过度煽动情绪
-- 医疗健康类但存在明显不实或夸大表述
-- 标题党严重、正文信息密度极低
-- 与用户主题完全无关
+读取路径(仅从以下路径取值,禁止解析 output 文本):
 
-### 三、兴趣评分(建议维度)
-- **需求价值(query-driven)**:是否真正解决 `input_query`,并给出可执行、可理解的信息
-- **可读性**:结构清晰、语言直白、专业术语少
-- **可信度**:来源可靠、论述相对客观、有明确依据
-- **情感适配**:积极稳健,不过度制造焦虑
-- **时效性**:在可接受时间范围内(避免过旧信息误导)
+```
+metadata.article_info.title              标准标题(覆盖搜索标题)
+metadata.article_info.body_text          正文文本
+metadata.article_info.publish_timestamp  发布时间(毫秒,÷ 1000 → 秒)
+metadata.article_info.view_count         阅读量
+metadata.article_info.like_count         点赞量
+metadata.article_info.share_count        分享量
+metadata.article_info.looking_count      在看量
+metadata.article_info.mini_program       嵌入小程序列表
+metadata.article_info.image_url_list     文章图片列表
+```
 
-> 说明:先用相对评分即可(高/中/低),后续再细化成数值阈值。  
-> 价值判断不再限定固定品类,而是以 `input_query` 为核心做动态匹配。
+工具调用失败处理:`ToolResult.error` 非空(已内部重试 3 次)→ 当前文章丢弃,不编造字段,继续下一条。
 
 ---
 
-## 结果数量控制
+#### 硬性淘汰(在评分前执行,任一命中则排除)
 
-- 若通过数 **C >= target_count**:按评分取前 `target_count` 条输出
-- 若 **target_count × 0.8 <= C < target_count**:输出当前结果并标注“数量接近目标”
-- 若 **C < target_count × 0.8**:返回上游继续补充候选,再进入本流程
+| 规则 | 依据字段 |
+|---|---|
+| 正文完全缺失 | `body_text` 为空字符串 |
+| 纯广告植入 | `mini_program` 非空 **且** 正文 < 200 字 |
+| 标题党 | 强情绪煽动词 + 正文信息密度极低(< 200 字) |
+| 明显不实 | 医疗类文章含"根治""100%"等绝对化断言 |
+| 完全无关 | 正文与 `input_query` 核心意图无任何交集 |
+| 淘汰风险点命中正文 | 正文(非仅标题)命中 `filter_focus.淘汰风险点` |
 
 ---
 
-## 数据真实性与一致性
+#### 四维评分框架
+
+每维独立评为 `high / medium / low`,最终综合映射到 `relevance_level` 和 `interest_level`。
+
+**维度 A:需求相关性** → 直接映射为 `relevance_level`
+
+参考 `filter_focus.相关性关注点`:
+
+| 评级 | 标准 |
+|---|---|
+| `high` | 核心内容直接围绕 query,信息可直接用于需求 |
+| `medium` | 部分覆盖 query,涉及相关场景或周边信息 |
+| `low` | 仅关键词相似但内容实质跑题 |
+
+**维度 B:内容深度** → 影响 `interest_level`
 
-- 仅使用工具返回数据,不编造字段
-- `title`、`url`、`publish_time` 必须来自同一条记录
-- `reason` 必须基于可验证内容生成(例如“涉及防诈骗案例且表达清晰”)
-- 优先使用结构化路径:搜索结果取 `metadata.search_results`,详情取 `metadata.article_info`
-- 禁止把不同文章的字段进行拼接混用
-- 允许在不改变语义的前提下对 `title` 做最小清洗:去 HTML 标签 + 英文双引号标准化(`"` -> `“”`)
+| 信号 | 评估 |
+|---|---|
+| `body_text` > 800 字 | 内容充实(加分) |
+| `body_text` 200~800 字 | 中等 |
+| `body_text` < 200 字 | 浅薄(减分) |
+| `image_url_list` 非空 | 图文结合(加分) |
+| `mini_program` 非空 | 商业嵌入(减分) |
+| 符合 `filter_focus.形式规则` | 加分 |
+
+**维度 C:社交证明** → 影响 `interest_level`
+
+| 指标 | 低 | 中 | 高 |
+|---|---|---|---|
+| `view_count` | < 1,000 | 1,000 ~ 10,000 | > 10,000 |
+| `like_count` | < 50 | 50 ~ 500 | > 500 |
+| `looking_count` | < 20 | 20 ~ 200 | > 200 |
+| `share_count` | < 10 | 10 ~ 100 | > 100 |
+
+≥ 2 项达到「高」阈值 → 社交证明维度为 `high`;全部为 0 时不作淘汰依据,仅降低置信度。
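阈值表与"≥ 2 项达『高』→ high"的综合规则可以草拟为(示例;「中档」的综合判定文档未明示,此处取"1 项高或 ≥ 2 项中"为假设):

```python
_SOCIAL_THRESHOLDS = {  # 指标: (中档下限, 高档下限),与上表一致
    "view_count": (1_000, 10_000),
    "like_count": (50, 500),
    "looking_count": (20, 200),
    "share_count": (10, 100),
}

def social_proof_level(info):
    highs = sum(1 for k, (_, hi) in _SOCIAL_THRESHOLDS.items()
                if info.get(k, 0) > hi)
    mids = sum(1 for k, (mid, hi) in _SOCIAL_THRESHOLDS.items()
               if mid <= info.get(k, 0) <= hi)
    if highs >= 2:                 # ≥ 2 项达到「高」阈值
        return "high"
    if highs == 1 or mids >= 2:    # 假设的中档判定
        return "medium"
    return "low"
```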
+
+**维度 D:时效与可信度** → 影响 `interest_level`
+
+| 信号 | 评估 |
+|---|---|
+| `publish_timestamp` 距今 < 1 年 | 好 |
+| 1 ~ 2 年 | 可接受 |
+| > 2 年 | 谨慎(时事/政策类更严格)|
+| 论述客观、有来源依据 | 加分 |
+| 绝对化断言、制造焦虑 | 减分 |
+
+**综合映射规则:**
+
+```
+relevance_level = 维度A 直接映射
+
+interest_level:
+  high   = B高 + C中以上 + D良好
+  medium = B中 或 C高,其余不差
+  low    = B浅薄 或 C全低 或 D有明显问题
+```
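上方的综合映射可以落成一个小函数(草图;将"D 良好"解读为时效可信度维度为 `high`、"其余不差"解读为三维均不为 `low`,均是示例假设):

```python
def interest_level(depth, social, trust):
    """depth/social/trust 分别为维度 B/C/D 的 high|medium|low 评级。"""
    if depth == "high" and social in ("medium", "high") and trust == "high":
        return "high"    # B高 + C中以上 + D良好
    if depth == "low" or social == "low" or trust == "low":
        return "low"     # B浅薄 或 C全低 或 D有明显问题
    return "medium"      # B中 或 C高,其余不差
```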
+
+`reason` 必须引用具体证据,例如:
+> "阅读量 1.2 万,正文 1,500 字,完整覆盖防诈骗流程,发布于 3 个月内"
 
 ---
 
-## 输出固定格式(必须严格遵守)
+## 📊 Harness: Observer — 观测与输出
+
+### 筛选摘要(写入日志 / 传递给下游)
+
+```
+[FilterObserver]
+  Phase 0 输入数       = {len(candidate_articles)}
+  Phase 0 通过数       = {phase0_passed}
+  详情拉取数           = {detail_fetched}
+    - 拉取失败数       = {detail_failed}
+    - 硬性淘汰数       = {hard_eliminated}
+    - 评分通过数       = {score_passed}
+  最终输出数           = {final_count} / {target_count}
+  状态                 = 充足 / 接近目标 / 不足(触发补量)
+```
+
+### 数量不足时的补量信号
+
+| 通过数 C | 状态 | 行为 |
+|---|---|---|
+| C ≥ target_count | 充足 | 按评分取前 `target_count` 条输出 |
+| target_count × 0.8 ≤ C < target_count | 接近目标 | 输出全部,备注"数量接近目标" |
+| C < target_count × 0.8 | 不足 | 向上游发出补量请求,补充后重新进入 Phase 0 |
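三档状态的判定逻辑等价于(草图,阈值与上表一致):

```python
def sufficiency(c, target):
    """根据通过数 C 与 target_count 返回补量状态。"""
    if c >= target:
        return "充足"
    if c >= target * 0.8:
        return "接近目标"
    return "不足"
```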
+
+### 输出固定格式
 
 ```json
 {
   "filtered_articles": [
     {
-      "title": "文章标题",
-      "url": "完整链接",
+      "title": "文章标题(已清洗)",
+      "url": "https://mp.weixin.qq.com/s?...",
       "publish_time": 1710000000,
-      "reason": "适合老年人阅读:围绕 query、信息清晰且证据充分",
+      "reason": "阅读量 1.2 万,正文 1,500 字,直接回答防诈骗流程,3 个月内发布",
       "relevance_level": "high",
       "interest_level": "high"
     }
   ]
 }
 ```
+
+**数据真实性约束**(违反即视为输出无效):
+- `title/url/publish_time` 必须来自同一条记录,不得跨记录混用
+- 优先用详情中的 `title` 覆盖搜索标题
+- `reason` 必须可被工具返回字段支撑,禁止主观臆测
+- 禁止解析工具 `output` 文本;禁止编造任何字段

+ 98 - 43
tests/skills/article_finding_strategy.md

@@ -1,75 +1,130 @@
 ---
 name: content_finding_strategy
-description: 内容搜索方法论
+description: 内容搜索方法论(Harness 架构:两轨搜索 + 搜索期预过滤)
 ---
 
 # 内容搜索方法论
 
-## 核心流程:关键词提取 → 串行搜索 → 结果评估 → 按需补充
-
 ---
 
-## 第一步:需求分析与关键词提取
+## ⚡ Harness: Fallback — 前置验证(快速失败)
+
+在调用任何搜索工具前,先验证以下前置条件。**任一失败则立即终止,不继续执行。**
 
-- 从用户需求中提取核心关键词和扩展关键词,优先使用用户原话
-- 按相关性排序:用户明确说的 > 用户暗示的 > 推测的
-- 确定目标数量 **M**(如"找10条",则 M = 10)
+| 检查项 | 通过条件 | 失败处理 |
+|---|---|---|
+| demand_analysis 已完成 | `起点策略.精准词候选` 或 `主题下钻候选` 至少一个非空 | 回退:用用户原始 query 分词后作为精准词候选,记录警告 |
+| 目标数量有效 | `target_count >= 1` | 终止,告知用户"请指定目标文章数量" |
+| 搜索工具可用 | `weixin_search` 工具存在 | 终止,告知用户"搜索工具不可用" |
 
 ---
 
-## 第二步:串行关键词搜索
+## 📋 Harness: Planner — 执行计划(开始前打印)
+
+执行前输出以下计划,消除"盲盒"感:
+
+```
+[SearchPlanner]
+  目标文章数 M         = {target_count}
+  候选目标量 P         = M × 3 = {target_count * 3}
+  第一轨(精准词)     = {精准词候选列表},期望贡献 P × 60%
+  第二轨(下钻词)     = {主题下钻候选列表},仅在一轨不足时启用
+  搜索期淘汰规则       = 时效 >18月 / 标题命中淘汰风险点 / 同biz >3条
+```
 
-**数量控制**:只搜索 **N = M × 2** 条,搜到后立即停止,不超出此限制。
+---
 
-**数据读取规则**:
-- 搜索结果从 `metadata.search_results` 获取,**不要解析工具的 output 文本**
+## 💰 Harness: Budget — 预算约束
 
-**分页策略**:第一次使用默认 cursor(`"0"` 或 `""`),需要更多时使用返回的 cursor 继续获取。
+| 预算项 | 限制 | 超出处理 |
+|---|---|---|
+| 候选上限 P | `target_count × 3` | 达到 P 立即停止搜索,不再调用 weixin_search |
+| 关键词轮询上限 | 精准词 + 下钻词全部使用完 | 用完后不再补充关键词,将已有候选交付筛选 |
+| 单关键词最多翻页 | 2 页(首页 + next_cursor 续页) | 单词超 2 页不再翻页,换下一个关键词 |
+| 同一 biz 保留上限 | 3 条 | 超出丢弃,防止单一账号垄断候选 |
 
 ---
 
-## 第三步:数据真实性规范(严格遵守)
+## ⚙️ Core Execution — 核心执行
+
+### 步骤 1:两轨串行搜索
+
+**第一轨:精准词直搜**(高确信,优先)
+
+- 来源:`demand_analysis.起点策略.精准词候选`(按顺序逐个)
+- 首页 cursor = `"0"`;结果 < 5 条时用 `next_cursor` 取第二页
+- 候选量 ≥ P × 60% 后停止第一轨
+
+**第二轨:主题下钻词**(补量,仅在一轨不足时启用)
+
+- 来源:`demand_analysis.起点策略.主题下钻候选`
+- 策略同第一轨;候选总量 ≥ P 后停止
 
-**禁止编造数据**,所有字段必须来自工具返回的 metadata。
+**数据读取(强约束,禁止解析 output 文本):**
 
-### 字段完整性要求
-- `url`:文章链接,必须**逐字符完整复制**,不能截断或修改。
-- `title`: 文章标题,必须来自**同一条记录**,不能混用,去掉标题中的 html 符号(如 `<p>`、`</p>` 等);标题中若出现英文双引号(`"`),必须标准化为中文双引号(成对 `“`、`”`),禁止保留英文双引号。
-- `statistics.time`: 文章发布时间戳(秒),必须来自**同一条记录**,不能混用。
-### 正确做法
-```python
-item = metadata.search_results[0]
-url = item["url"]
-title = normalize_quotes(item["title"])  # 英文双引号标准化为中文双引号“”
+```
+metadata.search_results[i].title           文章标题
+metadata.search_results[i].url             文章链接(逐字符完整复制)
+metadata.search_results[i].statistics.time 发布时间(秒)
+metadata.search_results[i].nick_name       账号名称
+metadata.search_results[i].biz             账号 biz(去重键)
+metadata.search_results[i].cover_url       封面图(有值=带图)
 ```
 
-### 禁止行为
-❌ 编造 url  
-❌ 截断 url  
-❌ 从 output 文本中解析数据  
-❌ 混用不同记录的字段  
+### 步骤 2:搜索期预过滤(与步骤 1 同步执行,零额外 API)
 
-**违反后果**:编造数据会导致 404 错误,严重影响用户体验。
+每条结果收到后立即判断,**命中任一条则丢弃**:
 
----
+| 条件 | 判断方式 |
+|---|---|
+| 链接缺失 | `url` 为空或非 `http` 开头 |
+| 标题缺失 | `title` 为空或仅空格 |
+| 时效过期 | `statistics.time` 距今 > 18 个月 |
+| 命中淘汰风险点 | 标题含 `demand_analysis.筛选关注点.淘汰风险点` 中任意词 |
+| url 重复 | 相同 `url` 已在候选池中 |
+| 同 biz 超额 | 相同 `biz` 已保留 3 条 |
 
-## 第四步:结果评估与补充
+通过预过滤的文章做标题清洗(去 HTML 标签;英文 `"` 替换为 `“` / `”`)后加入候选池。
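标题清洗的最小草图(示例,`clean_title` 为示意命名;成对替换按出现顺序交替取 `“` 与 `”`):

```python
import re

def clean_title(title):
    """去 HTML 标签,并把英文双引号成对标准化为中文双引号。"""
    title = re.sub(r"<[^>]+>", "", title)   # 去 HTML 标签,如 <p>、</p>
    out, opening = [], True
    for ch in title:
        if ch == '"':                        # 奇数个替换为 “,偶数个替换为 ”
            out.append("“" if opening else "”")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```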
 
-经 `content_filtering_strategy` 筛选后,统计符合要求的内容数量 **C**:
+### 步骤 3:工具调用失败处理
 
-- **C >= M**:完成,进入输出阶段
-- **C < M × 0.8**:内容不足,选下一个关键词,回到第二步
-- **M × 0.8 <= C < M**:接近目标,可选择继续补充或直接输出
+工具已内置 3 次自动重试。若 `ToolResult.error` 仍非空:
+
+- 当前关键词视为失败,跳过,换下一个关键词
+- 所有关键词均失败:终止搜索,进入 Observer 报告失败状态
 
 ---
 
-## 错误处理
+## 📊 Harness: Observer — 观测与交付
 
-| 错误类型 | 处理策略 |
-|---|---|
-| HTTP 502/503/504 | 服务暂时不可用,最多重试 1 次,失败则告知用户 |
-| HTTP 400/404 | 检查参数格式,调整后重试 |
-| Timeout | 重试 1 次,仍超时则告知用户 |
-| 网络错误 | 重试 1-2 次,持续失败则告知用户 |
+### 候选池交付给 `article_filter_strategy`
+
+每条记录固定包含:
+
+```json
+{
+  "title": "标题(已清洗)",
+  "url": "完整文章链接(原样)",
+  "statistics": { "time": 1710000000 },
+  "nick_name": "账号名称",
+  "biz": "账号 biz",
+  "cover_url": "封面图链接或空字符串"
+}
+```
+
+### 搜索摘要(写入日志 / 传递给下游)
+
+```
+[SearchObserver]
+  关键词使用数       = {used_keywords} / {total_keywords}
+  搜索原始返回条数   = {raw_count}
+  预过滤淘汰条数     = {filtered_out}
+  候选池最终数量     = {candidate_count} (目标 P = {P})
+  状态              = 充足 / 接近目标 / 不足(< P × 60%)
+```
 
-不要切换到其他平台或工具。
+| 候选量 C | 状态判断 | 下游行为 |
+|---|---|---|
+| C ≥ P | 充足 | 直接进入筛选 |
+| P × 0.6 ≤ C < P | 接近目标 | 进入筛选,筛选层判断是否补量 |
+| C < P × 0.6 | 不足 | 通知筛选层"候选不足",筛选层可触发回搜 |

+ 60 - 0
tests/skills/demand_analysis.md

@@ -0,0 +1,60 @@
+---
+name: demand_analysis
+description: 需求分析
+---
+
+# 需求分析(仅理解,不执行)
+
+输入:逗号分隔特征词,如 `养老,防骗,政策解读,故事化`。
+本步骤只输出结构化理解结果,不调用工具、不执行搜索/过滤/沉淀。
+
+## 步骤1:特征分层
+
+仅对输入词归类,禁止编造新词。
+
+1. **实质 vs 形式**
+   - `实质特征`:主题/问题/对象/场景
+   - `形式特征`:表达方式/结构/语气(不参与下一步细分)
+
+2. **仅对实质特征继续细分**
+   - `上层特征`:宽泛,不能直接检索(如"养老政策")
+   - `下层特征`:具体,可直接检索(如"退休金被骗套路")
+   - 约束:`上层 ∪ 下层 = 实质特征`
+
+## 步骤2:策略判定(只给建议)
+
+| 条件 | 建议 |
+|---|---|
+| 下层特征非空 | 精准词直搜 |
+| 上层特征非空 | 主题下钻 |
+| 两者都非空 | 并行 |
+| 只有形式特征 | 用原话构造最小词包 |
+
+## 输出模板
+
+```json
+{
+  "特征归类": {
+    "实质特征": [],
+    "形式特征": [],
+    "上层特征": [],
+    "下层特征": []
+  },
+  "起点策略": {
+    "建议精准词直搜": true,
+    "建议主题下钻": true,
+    "精准词候选": [],
+    "主题下钻候选": []
+  },
+  "筛选关注点": {
+    "形式规则": [],
+    "相关性关注点": [],
+    "淘汰风险点": []
+  }
+}
+```
+
+## 自检
+- 完成实质/形式 + 上层/下层双重标注
+- 只输出理解结果,未执行任何动作
+- 未引入输入外的核心主题词

+ 2 - 0
tests/tools/__init__.py

@@ -3,10 +3,12 @@
 """
 
 from .weixin_tools import weixin_search, fetch_weixin_account, fetch_account_article_list, fetch_article_detail
+from .think_and_plan import think_and_plan
 
 __all__ = [
     "weixin_search",
     "fetch_weixin_account",
     "fetch_account_article_list",
     "fetch_article_detail",
+    "think_and_plan"
 ]

+ 1 - 1
tests/tools/douyin_user_videos.py

@@ -193,7 +193,7 @@ async def main():
         sort_type="最新",
         cursor=""
     )
-    print(result.output)
+    logger.info(result.output)
 
 if __name__ == "__main__":
     asyncio.run(main())

+ 1 - 1
tests/tools/store_results_mysql.py

@@ -94,7 +94,7 @@ async def main():
         trace_id="7b211fa6-f0d6-4f98-a6f5-689e6af64748",
     )
     # ToolResult 是 dataclass,用 vars 输出
-    print(vars(result))
+    logger.info(vars(result))
 
 if __name__ == "__main__":
     asyncio.run(main())

+ 35 - 0
tests/tools/think_and_plan.py

@@ -0,0 +1,35 @@
+from agent.tools import tool
+from src.infra.trace.logging.tool_logging import format_tool_result_for_log, log_tool_call
+
+
+@tool(
+    description="系统化思考与规划工具。不会获取新信息或更改数据库,只用于记录思考过程。",
+)
+def think_and_plan(thought: str, thought_number: int, action: str, plan: str) -> str:
+    """这是用于系统化思考与规划的工具,支持在面对复杂选题构建任务时分阶段梳理思考、规划和行动步骤。该工具不会获取新信息或更改数据库,只会将想法附加到记忆中。
+
+    Args:
+        thought: 当前的思考内容,可以是对问题的分析、假设、洞见、反思或对前一步骤的总结。
+        thought_number: 当前思考步骤的编号,用于追踪和回溯整个思考与规划过程。
+        action: 基于当前思考和计划,建议下一步采取的行动步骤。
+        plan: 针对当前任务拟定的计划或方案。
+
+    Returns:
+        A string describing the thought, plan, and action steps.
+    """
+    params = {
+        "thought": thought,
+        "thought_number": thought_number,
+        "action": action,
+        "plan": plan,
+    }
+
+    result = (
+        f"[思考 #{thought_number}]\n"
+        f"思考: {thought}\n"
+        f"计划: {plan}\n"
+        f"下一步: {action}\n"
+        f"(此工具仅用于记录思考过程,不会修改任何数据)"
+    )
+    log_tool_call("think_and_plan", params, format_tool_result_for_log(result))
+    return result

+ 237 - 56
tests/tools/weixin_tools.py

@@ -1,11 +1,13 @@
 from __future__ import annotations
 
+import asyncio
 import json
 import logging
 
 from agent.tools import tool, ToolContext, ToolResult
 from src.infra.shared.http_client import AsyncHttpClient
 from src.infra.shared.common import extract_history_articles
+from src.infra.trace.logging.tool_logging import format_tool_result_for_log, log_tool_call
 
 logger = logging.getLogger(__name__)
 
@@ -13,6 +15,40 @@ logger = logging.getLogger(__name__)
 base_url = "http://crawler-cn.aiddit.com/crawler/wei_xin"
 headers = {"Content-Type": "application/json"}
 
+# fetch_article_detail 串行锁:防止并发请求压垮上游
+_detail_lock = asyncio.Lock()
+
+# 重试配置
+_MAX_RETRIES = 3
+_RETRY_DELAYS = (2, 4, 8)  # 指数退避(秒)
+
+
+class _UpstreamError(Exception):
+    """
+    上游返回业务级失败。
+
+    触发场景:HTTP 200 但响应体中 code != 0 且 data 为 None,
+    例如 {"code": 10000, "msg": "未知错误", "data": null}。
+    这种情况与网络异常一样需要重试。
+    """
+
+
+def _check_upstream(response: dict, tool_name: str) -> None:
+    """
+    检查上游响应是否为业务级失败,若是则抛出 _UpstreamError。
+
+    判定条件:code 字段非 0 且 data 字段为 None。
+    调用方应在拿到 response 后立即调用此函数,
+    从而让外层重试逻辑统一捕获。
+    """
+    code = response.get("code")
+    data = response.get("data")
+    if code != 0 and data is None:
+        msg = response.get("msg", "")
+        raise _UpstreamError(
+            f"[{tool_name}] 上游业务错误: code={code}, msg={msg}"
+        )
+
 
 def _build_success_result(title: str, response: dict) -> ToolResult:
     """把上游响应规范为 ToolResult。"""
@@ -56,18 +92,63 @@ async def weixin_search(keyword: str, page: str = "1", ctx: ToolContext = None)
         """
     url = "{}/keyword".format(base_url)
     payload = json.dumps({"keyword": keyword, "cursor": page})
-    try:
-        async with AsyncHttpClient(timeout=120) as http_client:
-            response = await http_client.post(url=url, headers=headers, data=payload)
-        return _build_success_result("微信文章搜索结果", response)
-    except Exception as e:
-        logger.exception("weixin_search failed")
-        return ToolResult(
-            title="微信文章搜索失败",
-            output="",
-            error=str(e),
-            metadata={"keyword": keyword, "page": page},
-        )
+    params = {"keyword": keyword, "page": page}
+    last_error: Exception | None = None
+
+    for attempt in range(1, _MAX_RETRIES + 1):
+        try:
+            async with AsyncHttpClient(timeout=120) as http_client:
+                response = await http_client.post(url=url, headers=headers, data=payload)
+
+            logger.debug("weixin_search 原始响应: %s", json.dumps(response, ensure_ascii=False))
+            # 业务级失败(code 非 0 且 data 为 None)视为可重试错误
+            _check_upstream(response, "weixin_search")
+            # API 返回格式: {"code": 0, "data": {"data": [...], "next_cursor": "2"}}
+            # 需要将 data.data 映射为 search_results,每条记录的 time 字段包装到 statistics 中
+            raw_items = (response.get("data") or {}).get("data") or []
+            search_results = []
+            for item in raw_items:
+                search_results.append({
+                    "title": item.get("title", ""),
+                    "url": item.get("url", ""),
+                    "statistics": {"time": item.get("time", 0)},
+                    "cover_url": item.get("cover_url", ""),
+                    "nick_name": item.get("nick_name", ""),
+                    "biz": item.get("biz", ""),
+                })
+            next_cursor = (response.get("data") or {}).get("next_cursor")
+            # output 中嵌入结构化文章数据,使 Agent 和下游解析都能获取完整结果
+            articles_json = json.dumps(search_results, ensure_ascii=False)
+            output_text = (
+                f"搜索关键词「{keyword}」返回 {len(search_results)} 条结果\n"
+                f"```json\n{articles_json}\n```"
+            )
+            normalized = {
+                "output": output_text,
+                "metadata": {
+                    "search_results": search_results,
+                    "next_cursor": next_cursor,
+                    "raw_data": response,
+                },
+            }
+            result = _build_success_result("微信文章搜索结果", normalized)
+            log_tool_call("weixin_search", params, format_tool_result_for_log(result))
+            return result
+        except Exception as e:
+            last_error = e
+            logger.warning("weixin_search 第 %d/%d 次失败: %s", attempt, _MAX_RETRIES, e)
+            if attempt < _MAX_RETRIES:
+                await asyncio.sleep(_RETRY_DELAYS[attempt - 1])
+
+    logger.error("weixin_search 重试 %d 次后仍失败", _MAX_RETRIES)
+    result = ToolResult(
+        title="微信文章搜索失败",
+        output="",
+        error=str(last_error),
+        metadata=params,
+    )
+    log_tool_call("weixin_search", params, format_tool_result_for_log(result))
+    return result
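同一段"最多 3 次、指数退避"的重试循环在四个工具里各写了一遍;后续可以抽成一个通用协程辅助(建议性草图,不属于本次提交;`with_retries` 为示意命名,`delays` 参数化便于测试):

```python
import asyncio

async def with_retries(fn, *args, retries=3, delays=(2, 4, 8), **kwargs):
    """执行协程 fn,失败时按 delays 退避重试,最终仍失败则抛出最后一次异常。"""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return await fn(*args, **kwargs)
        except Exception as e:
            last_error = e
            if attempt < retries:
                await asyncio.sleep(delays[attempt - 1])
    raise last_error
```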
 
 
 @tool(description="通过公众号文章链接获取公众号详情信息")
@@ -94,18 +175,48 @@ async def fetch_weixin_account(content_link: str, ctx: ToolContext = None) -> To
     url = "{}/account_info".format(base_url)
     payload = json.dumps({"content_link": content_link, "is_cache": False})
 
-    try:
-        async with AsyncHttpClient(timeout=120) as http_client:
-            response = await http_client.post(url=url, headers=headers, data=payload)
-        return _build_success_result("公众号详情信息", response)
-    except Exception as e:
-        logger.exception("fetch_weixin_account failed")
-        return ToolResult(
-            title="公众号详情获取失败",
-            output="",
-            error=str(e),
-            metadata={"content_link": content_link},
-        )
+    params = {"content_link": content_link}
+    last_error: Exception | None = None
+
+    for attempt in range(1, _MAX_RETRIES + 1):
+        try:
+            async with AsyncHttpClient(timeout=120) as http_client:
+                response = await http_client.post(url=url, headers=headers, data=payload)
+            # 业务级失败(code 非 0 且 data 为 None)视为可重试错误
+            _check_upstream(response, "fetch_weixin_account")
+            # API 返回格式: {"code": 0, "data": {"data": {"account_name": ..., "wx_gh": ..., ...}}}
+            raw_data = (response.get("data") or {}).get("data") or {}
+            account_info = {
+                "account_name": raw_data.get("account_name", ""),
+                "wx_gh": raw_data.get("wx_gh", ""),
+                "biz_info": raw_data.get("biz_info", {}),
+                "channel_account_id": raw_data.get("channel_account_id", ""),
+            }
+            normalized = {
+                "output": f"公众号: {account_info['account_name']} (wx_gh={account_info['wx_gh']})",
+                "metadata": {
+                    "account_info": account_info,
+                    "raw_data": response,
+                },
+            }
+            result = _build_success_result("公众号详情信息", normalized)
+            log_tool_call("fetch_weixin_account", params, format_tool_result_for_log(result))
+            return result
+        except Exception as e:
+            last_error = e
+            logger.warning("fetch_weixin_account 第 %d/%d 次失败: %s", attempt, _MAX_RETRIES, e)
+            if attempt < _MAX_RETRIES:
+                await asyncio.sleep(_RETRY_DELAYS[attempt - 1])
+
+    logger.error("fetch_weixin_account 重试 %d 次后仍失败", _MAX_RETRIES)
+    result = ToolResult(
+        title="公众号详情获取失败",
+        output="",
+        error=str(last_error),
+        metadata=params,
+    )
+    log_tool_call("fetch_weixin_account", params, format_tool_result_for_log(result))
+    return result
 
 
 @tool(description="通过微信公众号的 wx_gh 获取微信公众号的历史发文列表")
@@ -157,25 +268,51 @@ async def fetch_account_article_list(
         }
     )
 
-    try:
-        async with AsyncHttpClient(timeout=120) as http_client:
-            response = await http_client.post(url=url, headers=headers, data=payload)
-        normalized = extract_history_articles(response)
-        return _build_success_result("公众号历史发文列表", normalized)
-    except Exception as e:
-        logger.exception("fetch_account_article_list failed")
-        return ToolResult(
-            title="公众号历史发文获取失败",
-            output="",
-            error=str(e),
-            metadata={"wx_gh": wx_gh, "index": index, "is_cache": is_cache},
-        )
+    params = {"wx_gh": wx_gh, "index": index, "is_cache": is_cache}
+    last_error: Exception | None = None
+
+    for attempt in range(1, _MAX_RETRIES + 1):
+        try:
+            async with AsyncHttpClient(timeout=120) as http_client:
+                response = await http_client.post(url=url, headers=headers, data=payload)
+            # 业务级失败(code 非 0 且 data 为 None)视为可重试错误
+            _check_upstream(response, "fetch_account_article_list")
+            extracted = extract_history_articles(response)
+            articles = extracted.get("articles", [])
+            normalized = {
+                "output": f"公众号 {wx_gh} 历史发文 {len(articles)} 篇",
+                "metadata": {
+                    "next_cursor": extracted.get("next_cursor"),
+                    "articles": articles,
+                    "raw_data": response,
+                },
+            }
+            result = _build_success_result("公众号历史发文列表", normalized)
+            log_tool_call("fetch_account_article_list", params, format_tool_result_for_log(result))
+            return result
+        except Exception as e:
+            last_error = e
+            logger.warning(
+                "fetch_account_article_list 第 %d/%d 次失败: %s", attempt, _MAX_RETRIES, e
+            )
+            if attempt < _MAX_RETRIES:
+                await asyncio.sleep(_RETRY_DELAYS[attempt - 1])
+
+    logger.error("fetch_account_article_list 重试 %d 次后仍失败", _MAX_RETRIES)
+    result = ToolResult(
+        title="公众号历史发文获取失败",
+        output="",
+        error=str(last_error),
+        metadata=params,
+    )
+    log_tool_call("fetch_account_article_list", params, format_tool_result_for_log(result))
+    return result
 
 
 @tool(description="通过公众号文章链接获取文章详情")
 async def fetch_article_detail(
     article_link: str,
-    is_count: bool = False,
+    is_count: bool = True,
     is_cache: bool = True,
     ctx: ToolContext = None,
 ) -> ToolResult:
@@ -197,6 +334,11 @@ async def fetch_article_detail(
                 - mini_program: 文章嵌入小程序信息【若无则是空数组】
                 - image_url_list: 文章图片列表【若无则是空数组】
                 - publish_timestamp: 文章发布时间戳【毫秒时间戳】
+                - view_count: 文章阅读量
+                - like_count: 文章点赞量
+                - share_count: 文章分享量
+                - looking_count: 文章在看量
+                
             - metadata.raw_data: 原始 API 返回数据
 
     Note:
@@ -212,30 +354,69 @@ async def fetch_article_detail(
             "is_cache": is_cache,
         }
     )
-    try:
-        async with AsyncHttpClient(timeout=10) as http_client:
-            response = await http_client.post(target_url, headers=headers, data=payload)
-        return _build_success_result("文章详情信息", response)
-    except Exception as e:
-        logger.exception("fetch_article_detail failed")
-        return ToolResult(
-            title="文章详情获取失败",
-            output="",
-            error=str(e),
-            metadata={
-                "article_link": article_link,
-                "is_count": is_count,
-                "is_cache": is_cache,
-            },
-        )
+    params = {"article_link": article_link, "is_count": is_count, "is_cache": is_cache}
+
+    last_error: Exception | None = None
+
+    async with _detail_lock:
+        for attempt in range(1, _MAX_RETRIES + 1):
+            try:
+                async with AsyncHttpClient(timeout=30) as http_client:
+                    response = await http_client.post(target_url, headers=headers, data=payload)
+                # 业务级失败(code 非 0 且 data 为 None)视为可重试错误
+                _check_upstream(response, "fetch_article_detail")
+                # API 返回格式: {"code": 0, "data": {"data": {"title": ..., "body_text": ..., ...}}}
+                raw_detail = (response.get("data") or {}).get("data") or {}
+                article_info = {
+                    "title": raw_detail.get("title", ""),
+                    "channel_content_id": raw_detail.get("channel_content_id", ""),
+                    "content_link": raw_detail.get("content_link", article_link),
+                    "body_text": raw_detail.get("body_text", ""),
+                    "mini_program": raw_detail.get("mini_program", []),
+                    "image_url_list": raw_detail.get("image_url_list", []),
+                    "publish_timestamp": raw_detail.get("publish_timestamp", 0),
+                    "view_count": raw_detail.get("view_count") or 0,
+                    "like_count": raw_detail.get("like_count") or 0,
+                    "share_count": raw_detail.get("share_count") or 0,
+                    "looking_count": raw_detail.get("looking_count") or 0,
+                }
+                normalized = {
+                    "output": f"文章详情: {article_info['title']}",
+                    "metadata": {
+                        "article_info": article_info,
+                        "raw_data": response,
+                    },
+                }
+                result = _build_success_result(article_info["title"] or "文章详情", normalized)
+                log_tool_call("fetch_article_detail", params, format_tool_result_for_log(result))
+                return result
+            except Exception as e:
+                last_error = e
+                logger.warning(
+                    "fetch_article_detail 第 %d/%d 次失败: %s", attempt, _MAX_RETRIES, e
+                )
+                if attempt < _MAX_RETRIES:
+                    await asyncio.sleep(_RETRY_DELAYS[attempt - 1])
+
+    logger.error("fetch_article_detail 重试 %d 次后仍失败", _MAX_RETRIES)
+    result = ToolResult(
+        title="文章详情获取失败",
+        output="",
+        error=str(last_error),
+        metadata=params,
+    )
+    log_tool_call("fetch_article_detail", params, format_tool_result_for_log(result))
+    return result
 
 
 if __name__ == "__main__":
     url = "http://mp.weixin.qq.com/s?__biz=MjM5ODI5NTE2MA==&mid=2651871172&idx=1&sn=791630221da3b28fc23949c48c994218&chksm=bc39e9a2a29ea779aef9f6a510f24c3b0addfbc08c86d2d20f8bce0c132fc9b0bed98dc6c8ee&scene=7#rd"
+
     async def run():
-        response = await fetch_article_detail(url)
+        # response = await fetch_article_detail(url)
+        response = await weixin_search("伊朗局势")
         import json
-        print(json.dumps(response, ensure_ascii=False, indent=4))
+        logger.info(json.dumps(vars(response), ensure_ascii=False, indent=4, default=str))
 
     import asyncio
     asyncio.run(run())