Agent Functional Requirements and Architecture Design Document

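Executable specification: this document is the core design of the system. Any code change must be accompanied by a matching update to this document. If the document and the code conflict, the code takes precedence and the document must be fixed immediately.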

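Document Maintenance Guidelines

Maintenance principles:

Whoever changes the code updates the docs - after a functional change, the related documents must be updated in the same change
Keep the structure stable - add or remove content only; do not casually rearrange the heading hierarchy
Flow first - write a new feature into the core flow first, then fill in the module details
Link to code - annotate key implementations with a file path, in the format module/file.py:function_name
Keep it concise - record only the most important information: what corresponds exactly to the code, or designs that are clearly finalized; avoid speculation, suggestions, and large amounts of code
Layered documents - each layer is an overview at a different level of detail; the upper-layer document references the lower-layer detailed document at the corresponding position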

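System Overview

A single call is a special case of an Agent:

Feature	Single call	Agent mode	Sub-Agent mode
Loop count	1	N (configurable)	N (configurable, limited)
Tool calls	Optional	Common	Restricted tool set
State management	None	Yes (Trace)	Yes (separate Trace + parent-child relationship)
Memory retrieval	None	Yes (Experience/Skill)	Yes (inherited from the main Agent)
Execution graph	1 node	DAG of N nodes	Nested DAG (multiple Traces)
Trigger	Direct call	Direct call	Via the Task tool
Permission scope	Full	Full	Restricted (configurable)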

Core Architecture

Three-Layer Memory Model

┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Skills (skill library)                             │
│ - Markdown files holding detailed capability descriptions   │
│ - Loaded on demand through the skill tool                   │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ generalize
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Experience (experience library)                    │
│ - Stored in a database: condition + rule + evidence         │
│ - Vector retrieval, injected into the system prompt         │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ extract
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Task State                                         │
│ - Working memory for the current task                       │
│ - Trace/Step records of the execution process               │
└─────────────────────────────────────────────────────────────┘

Injection methods:

Skills: loaded dynamically into the conversation history through the skill tool
Experiences: retrieved, then injected into the system prompt

Core Flow: Agent Loop

from typing import AsyncIterator, Union

async def run(task: str, max_steps: int = 50) -> AsyncIterator[Union[Trace, Step]]:
    # 1. Create the Trace ("running" is the initial value allowed by Trace.status)
    trace = Trace.create(mode="agent", task=task, status="running")
    await trace_store.create_trace(trace)
    yield trace  # Yield the Trace (signals the start)

    # 2. Load Skills (built-in + custom)
    # Built-in skills (agent/skills/core.md) are loaded automatically
    skills = load_skills_from_dir(skills_dir)  # skills_dir is optional
    skills_text = format_skills(skills)

    # 3. Retrieve Experiences and build the system prompt
    experiences = await search_experiences(task)
    system_prompt = build_system_prompt(experiences, skills_text)

    # 4. Initialize the messages
    messages = [{"role": "user", "content": task}]

    # 5. ReAct loop
    for step in range(max_steps):
        # Call the LLM
        response = await llm.chat(
            messages=messages,
            system=system_prompt,
            tools=tool_registry.to_schema()  # Includes tools such as skill and task
        )

        # Record the LLM call as a Step
        llm_step = Step.create(
            trace_id=trace.trace_id,
            step_type="thought",
            status="completed",
            data={"content": response.content, "tool_calls": response.tool_calls}
        )
        await trace_store.add_step(llm_step)
        yield llm_step  # Yield the Step

        # No tool calls: finished
        if not response.tool_calls:
            break

        # Execute the tools
        for tool_call in response.tool_calls:
            # Doom loop detection
            if is_doom_loop(tool_call):
                raise DoomLoopError()

            # Execute the tool (including the skill and task tools)
            result = await execute_tool(tool_call)

            # Record the action Step
            action_step = Step.create(
                trace_id=trace.trace_id,
                step_type="action",
                status="completed",
                parent_id=llm_step.step_id,
                data={"tool_name": tool_call.name, "arguments": tool_call.args}
            )
            await trace_store.add_step(action_step)
            yield action_step

            # Record the result Step
            result_step = Step.create(
                trace_id=trace.trace_id,
                step_type="result",
                status="completed",
                parent_id=action_step.step_id,
                data={"output": result}
            )
            await trace_store.add_step(result_step)
            yield result_step

            # Append to the message history
            messages.append({"role": "assistant", "tool_calls": [tool_call]})
            messages.append({"role": "tool", "content": result})

    # 6. Done
    trace.status = "completed"
    await trace_store.update_trace(trace.trace_id, status="completed")
    yield trace  # Yield the updated Trace (an async generator cannot return a value)

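Key mechanisms:

Unified return type: AsyncIterator[Union[Trace, Step]] - streams execution state in real time
Doom Loop detection: tracks the last 3 tool calls; if all three use the same tool with identical arguments, the loop is aborted
Automatic Skills loading: agent/skills/core.md is always loaded automatically; skills_dir optionally loads additional skills
Dynamic tool loading: Skills are loaded dynamically through a tool, so they consume context only when needed
Sub-Agent support: the task tool launches specialized Sub-Agents to handle subtasks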
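
The Doom Loop rule above can be sketched as follows. This is a minimal illustration, not the code in core/runner.py: the class name DoomLoopDetector, the record method, and the choice to compare arguments as sorted-key JSON are all assumptions made for the example.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class DoomLoopDetector:
    """Hypothetical helper: flags a doom loop when the last `window` tool calls are identical."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        # Only the most recent `window` calls are kept.
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=window)

    def record(self, tool_name: str, arguments: Dict[str, Any]) -> bool:
        """Record a call; return True if it completes a run of identical calls."""
        # Serialize with sorted keys so that argument order does not matter.
        key = (tool_name, json.dumps(arguments, sort_keys=True, default=str))
        self._recent.append(key)
        return len(self._recent) == self.window and len(set(self._recent)) == 1


detector = DoomLoopDetector()
results = [detector.record("read_file", {"path": "a.txt"}) for _ in range(3)]
changed = detector.record("read_file", {"path": "b.txt"})
```

Here the third identical call is the one that trips the detector, and a call with different arguments resets the streak.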

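Sub-Agent Execution Flow

The main Agent launches a Sub-Agent through the task tool.

Hierarchy:

Main Trace (primary agent)
  └─▶ Sub Trace (parent_trace_id points to the main Trace)
        └─▶ Steps...

Key fields:

Trace.parent_trace_id - points to the parent Trace
Trace.agent_definition - Sub-Agent type (e.g. "explore")
Each Sub-Agent has its own tool permission configuration

Implementation location: agent/tools/builtin/task.py (not yet implemented)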
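
Since the task tool is not implemented yet, the following is only a sketch of how the parent-child link could be recorded. The Trace class is a reduced copy of the fields listed in the data model, and spawn_sub_trace is a hypothetical helper name, not an existing function.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    # Reduced copy of the documented Trace fields, for illustration only.
    trace_id: str
    mode: str
    task: Optional[str] = None
    status: str = "running"
    parent_trace_id: Optional[str] = None
    agent_definition: Optional[str] = None
    spawned_by_tool: Optional[str] = None


def spawn_sub_trace(parent: Trace, task: str, agent_definition: str, step_id: str) -> Trace:
    """Hypothetical helper: create the child Trace for a Sub-Agent run."""
    return Trace(
        trace_id=uuid.uuid4().hex,
        mode="agent",
        task=task,
        parent_trace_id=parent.trace_id,    # link back to the main Trace
        agent_definition=agent_definition,  # e.g. "explore"
        spawned_by_tool=step_id,            # Step that launched the Sub-Agent
    )


main = Trace(trace_id="main-1", mode="agent", task="Refactor the parser")
sub = spawn_sub_trace(main, "Find all call sites", "explore", step_id="step-7")
```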

Data Model

Trace (task execution)

from dataclasses import dataclass, field
from typing import Any, Dict, Literal, Optional

@dataclass
class Trace:
    trace_id: str
    mode: Literal["call", "agent"]
    task: Optional[str] = None
    agent_type: Optional[str] = None
    status: Literal["running", "completed", "failed"] = "running"

    # Sub-Agent support
    parent_trace_id: Optional[str] = None      # Parent Trace ID
    agent_definition: Optional[str] = None     # Agent type name
    spawned_by_tool: Optional[str] = None      # ID of the Step that spawned this Sub-Agent

    # Statistics
    total_steps: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0

    # Context
    uid: Optional[str] = None
    context: Dict[str, Any] = field(default_factory=dict)

Implementation: agent/execution/models.py:Trace

Step (execution step)

@dataclass
class Step:
    step_id: str
    trace_id: str
    step_type: StepType    # "goal", "thought", "action", "result", "evaluation", "response"
    status: Status         # "planned", "in_progress", "completed", "failed", "skipped"
    parent_id: Optional[str] = None  # Tree structure (single parent)
    description: str = ""            # Extracted automatically by the system
    data: Dict[str, Any] = field(default_factory=dict)
    summary: Optional[str] = None    # Only needed for the evaluation type

Implementation: agent/execution/models.py:Step

Detailed design: see docs/step-tree.md
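
Because every Step has at most one parent_id, a flat list of Steps can be turned back into a tree. The sketch below shows one way to do that. The reduced Step class and the render_tree function are illustrative assumptions and are not the implementation in agent/execution/tree_dump.py.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    # Reduced copy of the documented Step fields, for illustration only.
    step_id: str
    step_type: str
    parent_id: Optional[str] = None


def render_tree(steps: List[Step]) -> List[str]:
    """Return one indented line per step, parents before their children."""
    # Group steps by parent so each node's children can be looked up directly.
    children: Dict[Optional[str], List[Step]] = defaultdict(list)
    for step in steps:
        children[step.parent_id].append(step)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for step in children[parent_id]:
            lines.append("  " * depth + f"{step.step_type}:{step.step_id}")
            walk(step.step_id, depth + 1)

    walk(None, 0)  # root steps have no parent
    return lines


steps = [
    Step("s1", "thought"),
    Step("s2", "action", parent_id="s1"),
    Step("s3", "result", parent_id="s2"),
]
tree = render_tree(steps)
```

The example mirrors the thought → action → result chain that the Agent Loop records for each tool call.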

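Module Details

For detailed module documentation, see:

Project Structure

Layered design of framework and application
Framework built-in presets (agent/presets/, agent/skills/)
Project-level configuration (./skills/, ./subagents/, ./tools/)
Loading priority and override mechanism

Sub-Agent Mechanism

Data model: AgentDefinition, Trace extensions
Built-in Sub-Agents: general, explore, analyst
Task Tool implementation
Agent Registry and permission control
Configuration file format

Usage example: examples/subagent_example.py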

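Step Tree and Context Management

Step types: goal, action, result, evaluation
Step statuses: planned, in_progress, completed, failed, skipped
Tree structure: a single representation for both plan and execution
step tool: plan management and progress updates
Context compression: compresses message history based on the tree structure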

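Tool System

Tool definition and registration
Two-tier memory management
Domain filtering and sensitive data handling
Best practices

Tool integration guidelines:

High-frequency, simple tools: native Python implementation → agent/tools/builtin/
Complex, low-frequency tools: third-party repository → vendor/ + adapter → agent/tools/advanced/
CLI tools: install with pip → invoke through bash_command (e.g. browser-use)

Built-in tools (builtin/):

read_file, edit_file, write_file - file operations
bash_command - command execution
glob_files, grep_content - file search
skill, list_skills - skill library management

Advanced tools (advanced/):

webfetch - web page fetching (calls opencode)
lsp_diagnostics - LSP diagnostics (calls opencode)

Skills (agent/skills/):

browser_use - browser automation (includes environment setup tools)

Detailed design: see docs/tools-adapters.md

Core features: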

from reson_agent import tool, ToolResult, ToolContext

@tool(
    url_patterns=["*.google.com"],
    requires_confirmation=True
)
async def my_tool(arg: str, ctx: ToolContext) -> ToolResult:
    return ToolResult(
        title="Success",
        output="Result content",
        long_term_memory="Short summary"
    )
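
The url_patterns argument suggests glob-style matching against a URL's host. The sketch below shows one way such a check could work, using only the standard library. The function name url_allowed and the host-only matching rule are assumptions; the actual behavior of agent/tools/url_matcher.py may differ.

```python
from fnmatch import fnmatch
from typing import List
from urllib.parse import urlparse


def url_allowed(url: str, patterns: List[str]) -> bool:
    """Hypothetical check: True if the URL's host matches any glob pattern."""
    # urlparse lowercases the hostname; a missing host never matches.
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in patterns)


patterns = ["*.google.com"]
subdomain_ok = url_allowed("https://mail.google.com/inbox", patterns)
lookalike_ok = url_allowed("https://google.com.evil.example/", patterns)
```

Matching on the parsed host rather than the raw URL string keeps a lookalike domain such as google.com.evil.example from passing the filter.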

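Multimodal Support

Multimodal message construction at the Prompt layer
OpenAI-format message specification
Gemini Provider adaptation
Image resource handling

Implementation:

agent/llm/prompts/wrapper.py:SimplePrompt - Prompt wrapper
agent/llm/gemini.py:_convert_messages_to_gemini - format conversion

Usage example: examples/feature_extract/run.py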

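Prompt Loader

Responsibility: load and process .prompt files, with support for building multimodal messages

File format:

---
model: gemini-2.5-flash
temperature: 0.3
---

$system$
System prompt...

$user$
User prompt: %variable%

Core features:

YAML frontmatter parsing (configuration)
$section$ section syntax
%variable% parameter substitution
Multimodal message support (images, etc.)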
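
To make the file format concrete, here is a minimal sketch of the three text-level features: frontmatter, $section$ markers, and %variable% substitution. The function names parse_prompt and substitute are hypothetical, and the frontmatter is read as flat "key: value" lines rather than with a real YAML parser, so this is a simplification and not the logic in agent/llm/prompts/loader.py.

```python
import re
from typing import Dict, Tuple


def parse_prompt(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a .prompt file into (config, sections)."""
    config: Dict[str, str] = {}
    body = text
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match:
        # Simplification: flat "key: value" lines instead of full YAML.
        for line in match.group(1).splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                config[key.strip()] = value.strip()
        body = text[match.end():]

    sections: Dict[str, str] = {}
    # With one capture group, re.split yields [preamble, name1, body1, name2, body2, ...].
    parts = re.split(r"^\$(\w+)\$[ \t]*$", body, flags=re.MULTILINE)
    for name, content in zip(parts[1::2], parts[2::2]):
        sections[name] = content.strip()
    return config, sections


def substitute(template: str, **params: str) -> str:
    """Replace %name% placeholders; unknown names are left untouched."""
    return re.sub(r"%(\w+)%", lambda m: params.get(m.group(1), m.group(0)), template)


SAMPLE = """---
model: gemini-2.5-flash
temperature: 0.3
---

$system$
You are a helpful assistant.

$user$
Summarize: %text%
"""

config, sections = parse_prompt(SAMPLE)
user_message = substitute(sections["user"], text="hello")
```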

Implementation:

agent/llm/prompts/loader.py:load_prompt() - file parsing
agent/llm/prompts/loader.py:get_message() - parameter substitution
agent/llm/prompts/wrapper.py:SimplePrompt - Prompt wrapper

Usage:

prompt = SimplePrompt("task.prompt")
messages = prompt.build_messages(text="...", images="img.png")

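Skills (skill library)

Categories:

Type	Load location	Load time
Core Skill	System Prompt	Automatically at Agent startup
Regular Skill	Conversation messages	When the model calls the `skill` tool

Directory structure:

./agent/skills/
├── core.md                   # Core Skill (loaded automatically into the System Prompt)
└── browser_use/              # Regular Skill (loaded on demand into the conversation messages)
    ├── browser-use.md
    ├── setup.py
    └── __init__.py

Core Skill (agent/skills/core.md):

Core system functions: Step management, progress tracking
Injected into the System Prompt automatically by the framework

Regular Skill: loaded dynamically through the skill tool

# Called by the Agent at runtime
await tools.execute("skill", {"skill_name": "browser-use"})
# The content is injected into the conversation history

Implementation:

agent/memory/skill_loader.py:SkillLoader - Markdown parser
agent/memory/skill_loader.py:load_skills_from_dir() - automatic Skills loading (built-in + custom)
agent/tools/builtin/skill.py:skill() - skill tool implementation
agent/tools/builtin/skill.py:list_skills() - lists the available skills

Detailed documentation: see docs/skills.md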

Experiences (experience library)

Storage: PostgreSQL + pgvector

CREATE TABLE experiences (
    exp_id TEXT PRIMARY KEY,
    scope TEXT,           -- "agent:executor" or "user:123"
    condition TEXT,       -- "when a database connection times out"
    rule TEXT,            -- "increase the retry count to 5"
    evidence JSONB,       -- evidence (step_ids)

    source TEXT,          -- "execution", "feedback", "manual"
    confidence FLOAT,
    usage_count INT,
    success_rate FLOAT,

    embedding vector(1536),  -- vector retrieval

    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

Retrieval and injection:

# 1. Retrieve the relevant Experiences
experiences = await db.query(
    """
    SELECT condition, rule, success_rate
    FROM experiences
    WHERE scope = $1
    ORDER BY embedding <-> $2
    LIMIT 10
    """,
    f"agent:{agent_type}",
    embed(task)
)

# 2. Inject into the system prompt
system_prompt = base_prompt + "\n\n# Learned Experiences\n" + "\n".join([
    f"- When {e.condition}, then {e.rule} (success rate: {e.success_rate:.1%})"
    for e in experiences
])

Implementation: agent/memory/stores.py:ExperienceStore (PostgreSQL version not yet implemented)

Storage Interfaces

from typing import Dict, List, Protocol

class TraceStore(Protocol):
    async def save(self, trace: Trace) -> None: ...
    async def get(self, trace_id: str) -> Trace: ...
    async def add_step(self, step: Step) -> None: ...
    async def get_steps(self, trace_id: str) -> List[Step]: ...

class ExperienceStore(Protocol):
    async def search(self, scope: str, query: str, limit: int) -> List[Dict]: ...
    async def add(self, exp: Dict) -> None: ...
    async def update_stats(self, exp_id: str, success: bool) -> None: ...

class SkillLoader(Protocol):
    async def scan(self) -> List[str]:  # Returns skill names
        """Scan for and return the names of all available skills."""

    async def load(self, name: str) -> str:  # Returns the content
        """Load the Markdown content of the given skill."""

Implementation:

Trace/Step protocols: agent/execution/protocols.py
Memory protocols: agent/memory/protocols.py

Implementation strategy:

Trace/Step: file system (JSON)
- FileSystemTraceStore - file persistence (works across processes)
Experience: PostgreSQL + pgvector
Skill: file system (Markdown)
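
Because TraceStore is a Protocol, any class with the same four async methods can stand in for the file-system store, for example in unit tests. The sketch below is a hypothetical in-memory variant; the class name InMemoryTraceStore, the reduced Trace and Step classes, and the choice to increment total_steps inside add_step are assumptions made for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trace:
    # Reduced copy of the documented Trace fields, for illustration only.
    trace_id: str
    total_steps: int = 0


@dataclass
class Step:
    # Reduced copy of the documented Step fields, for illustration only.
    step_id: str
    trace_id: str


class InMemoryTraceStore:
    """Hypothetical store exposing the same methods as the TraceStore protocol."""

    def __init__(self) -> None:
        self._traces: Dict[str, Trace] = {}
        self._steps: Dict[str, List[Step]] = {}

    async def save(self, trace: Trace) -> None:
        self._traces[trace.trace_id] = trace
        self._steps.setdefault(trace.trace_id, [])

    async def get(self, trace_id: str) -> Trace:
        return self._traces[trace_id]

    async def add_step(self, step: Step) -> None:
        self._steps[step.trace_id].append(step)
        # Assumption: the store keeps the Trace's step counter in sync.
        self._traces[step.trace_id].total_steps += 1

    async def get_steps(self, trace_id: str) -> List[Step]:
        return list(self._steps[trace_id])  # copy, so callers cannot mutate the store


async def demo() -> int:
    store = InMemoryTraceStore()
    await store.save(Trace(trace_id="t1"))
    await store.add_step(Step(step_id="s1", trace_id="t1"))
    await store.add_step(Step(step_id="s2", trace_id="t1"))
    return (await store.get("t1")).total_steps


step_count = asyncio.run(demo())
```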

Module Structure (v0.2.2)

agent/
├── __init__.py            # Public API
│
├── core/                  # Core engine
│   ├── runner.py          # AgentRunner
│   └── config.py          # AgentConfig, CallResult
│
├── execution/             # Execution tracing
│   ├── models.py          # Trace, Step
│   ├── protocols.py       # TraceStore
│   ├── fs_store.py        # FileSystemTraceStore
│   ├── tree_dump.py       # Visualization
│   ├── api.py             # RESTful API
│   └── websocket.py       # WebSocket
│
├── memory/                # Memory system
│   ├── models.py          # Experience, Skill
│   ├── protocols.py       # MemoryStore, StateStore
│   ├── stores.py          # Store implementations
│   └── skill_loader.py    # Skill loader (loads built-in skills automatically)
│
├── tools/                 # Tool system
│   ├── registry.py        # ToolRegistry
│   ├── models.py          # ToolResult, ToolContext
│   ├── schema.py          # SchemaGenerator
│   ├── url_matcher.py     # URL pattern matching
│   ├── sensitive.py       # Sensitive data handling
│   ├── builtin/           # Core tools
│   ├── advanced/          # Advanced tools
│   └── adapters/          # External integrations
│
├── llm/                   # LLM-related
│   ├── gemini.py
│   ├── openrouter.py
│   └── prompts/
│       ├── loader.py
│       └── wrapper.py
│
├── skills/                # Built-in Skills (loaded automatically)
│   └── core.md            # Core skill, loaded automatically on every run
│
└── subagents/             # Sub-agent

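Design Decisions

See the design decisions document for details.

Core decisions:

Skills are loaded through a tool (vs. injected up front)
- Loaded on demand; the Agent chooses for itself
- Modeled on OpenCode and the Claude API documentation
Skills use the file system (vs. a database)
- Easy to edit (Markdown)
- Version control (Git)
- Zero dependencies
Experiences use a database (vs. files)
- Vector retrieval is required
- Statistical analysis is required
- Large in number and updated dynamically
No event system is needed
- Background scenarios do not need real-time notifications
- Trace/Step already records all the information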

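Debug Tools

During development and debugging, the Step tree can be viewed in real time:

from agent.debug import dump_tree

# Call after every step change
dump_tree(trace, steps)

# Watch live in a terminal
watch -n 0.5 cat .trace/tree.txt

Implementation: agent/execution/tree_dump.py

Details: see docs/step-tree.md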

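Testing

See the testing guide for details.

Test layers:

Unit tests: Agent definitions, tool system, Trace model
Integration tests: Sub-Agent, Trace storage, multi-module cooperation
E2E tests: real LLM calls (requires an API key)

Running the tests:

# Unit tests
pytest tests/ -v -m "not e2e"

# Coverage
pytest --cov=agent tests/ -m "not e2e"

# E2E tests (optional)
GEMINI_API_KEY=xxx pytest tests/e2e/ -v -m e2e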

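Core Concepts Quick Reference

Concept	Definition	Storage	Implementation
Trace	One task execution	File system (JSON)	`execution/models.py`
Step	Execution step (tree structure)	File system (JSON)	`execution/models.py`
Goal Step	Plan item / goal	A type of Step	`execution/models.py`
Sub-Agent	Specialized sub-agent	Separate Trace	`tools/builtin/task.py`
AgentDefinition	Agent type definition	Config file / code	`subagents/`
Skill	Capability description (Markdown)	File system	`memory/skill_loader.py`
Experience	Experience rule (condition + rule)	Database + vectors	`memory/stores.py`
Tool	Callable function	In memory (registry)	`tools/registry.py`
Agent Loop	ReAct loop	-	`core/runner.py`
Doom Loop	Infinite-loop detection	-	`core/runner.py`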
