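# Agent Functional Requirements and Architecture Design

> **Executable specification**: This document is the core design of the system. Any code change must be accompanied by an update to this document.
> If the document and the code conflict, the code takes precedence and the document must be fixed immediately.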

---

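## Documentation Maintenance Rules

**Principles**:
1. **Whoever changes the code updates the docs** - After a feature change, the related documents must be updated in the same change
2. **Keep the structure stable** - Add or remove content only; do not rearrange the heading hierarchy without good reason
3. **Flow first** - Write a new feature into the core flow first, then fill in module details
4. **Link to code** - Annotate key implementations with their file path, in the format `module/file.py:function_name`
5. **Keep it concise** - Record only the most important information: what matches the code exactly or reflects a design that is clearly finalized. Avoid speculation, suggestions, and large amounts of code
6. **Layered documents** - Each document layer is an overview at a different level of detail; the upper-layer document references the detailed lower-layer document at the corresponding place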

---

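## System Overview

**A single call is a special case of an Agent**:

| Feature | Single call | Agent mode | Sub-Agent mode |
|------|---------|-----------|--------------|
| Loop iterations | 1 | N (configurable) | N (configurable, limited) |
| Tool calls | Optional | Common | Restricted tool set |
| State management | None | Yes (Trace) | Yes (independent Trace + parent-child link) |
| Memory retrieval | None | Yes (Experience/Skill) | Yes (inherited from the main Agent) |
| Execution graph | 1 Message | DAG of N Messages | Nested DAG (multiple Traces) |
| Trigger | Direct call | Direct call | Via the Task tool |
| Permission scope | Full | Full | Restricted (configurable) |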

---

## Core Architecture

### Three-Layer Memory Model

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Skills (skill library)                             │
│ - Markdown files holding detailed capability descriptions   │
│ - Loaded on demand through the skill tool                   │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ generalize
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Experience (experience library)                    │
│ - Stored in a database: condition + rule + evidence         │
│ - Vector retrieval, injected into the system prompt         │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ extract
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Task State                                         │
│ - Working memory for the current task                       │
│ - Trace + Messages record the execution process             │
│ - GoalTree manages the execution plan                       │
└─────────────────────────────────────────────────────────────┘
```

**Injection**:
- **Skills**: loaded dynamically into the conversation history through the `skill` tool
- **Experiences**: retrieved, then injected into the system prompt

---

## Core Flow: Agent Loop

```python
from typing import AsyncIterator, Union


async def run(task: str, max_steps: int = 50) -> AsyncIterator[Union[Trace, Message]]:
    # 1. Create the Trace (status values follow the Trace model: running/completed/failed)
    trace = Trace.create(mode="agent", task=task, status="running")
    await trace_store.create_trace(trace)
    yield trace  # Yield the Trace to signal the start

    # 2. Load Skills (built-in + custom)
    skills = load_skills_from_dir(skills_dir)
    skills_text = format_skills(skills)

    # 3. Retrieve Experiences and build the system prompt
    experiences = await search_experiences(task)
    system_prompt = build_system_prompt(experiences, skills_text)

    # 4. Initialize the message history and the GoalTree
    messages = [{"role": "user", "content": task}]
    goal_tree = GoalTree(mission=task)

    # 5. ReAct loop
    for step in range(max_steps):
        # Render the current plan so it can be appended to the system prompt
        plan_text = goal_tree.to_prompt()

        # Call the LLM
        response = await llm.chat(
            messages=messages,
            system=system_prompt + plan_text,
            tools=tool_registry.to_schema()
        )

        # Record the assistant Message
        assistant_msg = Message.create(
            trace_id=trace.trace_id,
            role="assistant",
            goal_id=goal_tree.current_id,
            content=response.content,  # text + tool_calls
        )
        await trace_store.add_message(assistant_msg)
        yield assistant_msg

        # No tool calls: the task is finished
        if not response.tool_calls:
            break

        # Append the assistant turn to the history once, keeping its text
        # and all of its tool calls together
        messages.append({
            "role": "assistant",
            "content": response.content,
            "tool_calls": response.tool_calls,
        })

        # Execute the tools
        for tool_call in response.tool_calls:
            result = await execute_tool(tool_call)

            # Record the tool Message
            tool_msg = Message.create(
                trace_id=trace.trace_id,
                role="tool",
                goal_id=goal_tree.current_id,
                tool_call_id=tool_call.id,
                content=result,
            )
            await trace_store.add_message(tool_msg)
            yield tool_msg

            # Append the tool result to the history
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    # 6. Done
    trace.status = "completed"
    await trace_store.update_trace(trace.trace_id, status="completed")
    yield trace
```

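**Key mechanisms**:
- **Unified return type**: `AsyncIterator[Union[Trace, Message]]` - streams execution state as it happens
- **GoalTree injection**: the current plan is injected before every LLM call (abandoned goals are filtered out and the remaining goals are numbered consecutively)
- **Message-to-Goal link**: every Message is linked to its Goal through `goal_id`
- **Doom Loop detection**: the last 3 tool calls are tracked; if all three use the same tool with the same arguments, the loop is interrupted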
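The Doom Loop rule above can be sketched as a small sliding-window check. This is a minimal illustration, not the code in `core/runner.py`: the class name `DoomLoopDetector`, its `record` method, and the choice to compare arguments via sorted-key JSON are all assumptions made for the example.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class DoomLoopDetector:
    """Flags a run of identical tool calls (hypothetical helper, not the project's API)."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        # Only the most recent `window` calls are kept.
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=window)

    def record(self, tool_name: str, arguments: Dict[str, Any]) -> bool:
        """Record one call; return True once the last `window` calls are identical."""
        # Serialize with sorted keys so argument order does not affect equality.
        key = (tool_name, json.dumps(arguments, sort_keys=True, default=str))
        self._recent.append(key)
        return len(self._recent) == self.window and len(set(self._recent)) == 1


detector = DoomLoopDetector()
calls = [
    ("read_file", {"path": "a.txt"}),
    ("read_file", {"path": "a.txt"}),
    ("read_file", {"path": "a.txt"}),
]
flags = [detector.record(name, args) for name, args in calls]
# A different call breaks the streak.
after_change = detector.record("grep_content", {"pattern": "TODO"})
```

In the loop, a `True` return would be the point at which the runner breaks out instead of executing the tool again.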

### Sub-Agent Execution Flow

The main Agent starts a Sub-Agent through the `task` tool.

**Hierarchy**:
```
Main Trace (primary agent)
  └─▶ Sub Trace (parent_trace_id points to the main Trace)
        └─▶ Steps...
```

**Key fields**:
- `Trace.parent_trace_id` - points to the parent Trace
- `Trace.agent_definition` - Sub-Agent type (e.g. "explore")
- A Sub-Agent has its own tool permission configuration

**Implementation location**: `agent/tools/builtin/task.py` (not yet implemented)
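Since the `task` tool is not implemented yet, the following is only a sketch of how the two ideas above might fit together: linking a Sub-Agent's Trace to its parent, and narrowing the tool set. The `allowed_tools` field, `spawn_sub_trace`, and `restrict_tools` are hypothetical names introduced for illustration; the trimmed `Trace` keeps only the fields relevant here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentDefinition:
    name: str
    allowed_tools: List[str] = field(default_factory=list)  # hypothetical field


@dataclass
class Trace:
    trace_id: str
    parent_trace_id: Optional[str] = None
    agent_definition: Optional[str] = None


def spawn_sub_trace(parent: Trace, definition: AgentDefinition) -> Trace:
    """Create a child Trace that points back to its parent."""
    return Trace(
        trace_id=uuid.uuid4().hex,
        parent_trace_id=parent.trace_id,
        agent_definition=definition.name,
    )


def restrict_tools(
    registry: Dict[str, Callable[..., object]], definition: AgentDefinition
) -> Dict[str, Callable[..., object]]:
    """Keep only the tools the Sub-Agent type is allowed to call."""
    return {name: fn for name, fn in registry.items() if name in definition.allowed_tools}


registry = {
    "read_file": lambda path: f"contents of {path}",
    "bash_command": lambda cmd: f"ran {cmd}",
    "grep_content": lambda pattern: [],
}
explore = AgentDefinition(name="explore", allowed_tools=["read_file", "grep_content"])
main_trace = Trace(trace_id="main-1")
sub_trace = spawn_sub_trace(main_trace, explore)
sub_tools = restrict_tools(registry, explore)
```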

---

## Data Model

### Trace (task execution)

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Literal, Optional


@dataclass
class Trace:
    trace_id: str
    mode: Literal["call", "agent"]
    task: Optional[str] = None
    agent_type: Optional[str] = None
    status: Literal["running", "completed", "failed"] = "running"

    # Sub-Agent support
    parent_trace_id: Optional[str] = None      # Parent Trace ID
    agent_definition: Optional[str] = None     # Agent type name
    spawned_by_tool: Optional[str] = None      # ID of the Message that spawned this Sub-Agent

    # Statistics
    total_messages: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0

    # Context
    uid: Optional[str] = None
    context: Dict[str, Any] = field(default_factory=dict)
    current_goal_id: Optional[str] = None      # Goal currently in focus
```

**Implementation**: `agent/execution/models.py:Trace`

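### Message (execution message)

Corresponds to an LLM API message plus metadata. The earlier `parent_id` tree structure has been removed.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional


@dataclass
class Message:
    message_id: str
    trace_id: str
    role: Literal["assistant", "tool"]   # Same as the LLM API
    sequence: int                        # Global order
    goal_id: str                         # Internal ID of the associated Goal
    tool_call_id: Optional[str] = None   # For tool messages: the tool_call they answer
    content: Any = None                  # Message content (same format as the LLM API)
    description: str = ""                # Summary extracted automatically by the system

    # Metadata
    tokens: Optional[int] = None
    cost: Optional[float] = None
```

**Implementation**: `agent/execution/models.py:Message`

**Message types**:
- `role="assistant"`: one model response (may contain both text and multiple tool_calls)
- `role="tool"`: the result of one tool execution (linked through `tool_call_id`)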
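Because Messages are stored flat and ordered by `sequence`, the chat history sent to the LLM can be rebuilt from them. The sketch below illustrates that idea under stated assumptions: `StoredMessage` is a trimmed stand-in for the model above, `to_llm_history` is a hypothetical helper, and the assistant `content` shape (`{"text": ..., "tool_calls": [...]}`) is assumed for the example only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StoredMessage:
    """Trimmed-down stand-in for the Message model."""
    role: str                          # "assistant" or "tool"
    sequence: int
    content: Any = None
    tool_call_id: Optional[str] = None


def to_llm_history(task: str, stored: List[StoredMessage]) -> List[Dict[str, Any]]:
    """Rebuild the chat history from persisted messages, in sequence order."""
    history: List[Dict[str, Any]] = [{"role": "user", "content": task}]
    for msg in sorted(stored, key=lambda m: m.sequence):
        if msg.role == "assistant":
            # Assumed content shape: {"text": str, "tool_calls": [...]}
            entry: Dict[str, Any] = {"role": "assistant", "content": msg.content.get("text", "")}
            if msg.content.get("tool_calls"):
                entry["tool_calls"] = msg.content["tool_calls"]
        else:
            entry = {"role": "tool", "tool_call_id": msg.tool_call_id, "content": msg.content}
        history.append(entry)
    return history


stored = [
    StoredMessage(role="tool", sequence=2, tool_call_id="call_1", content="3 files found"),
    StoredMessage(
        role="assistant",
        sequence=1,
        content={"text": "Searching.", "tool_calls": [{"id": "call_1", "name": "glob_files"}]},
    ),
]
history = to_llm_history("List the Python files", stored)
```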

---

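## Module Details

For detailed module documentation, see:

### [Project Structure](./project-structure.md)
- Layering of framework and application
- Framework built-in presets (agent/presets/, agent/skills/)
- Project-level configuration (./skills/, ./subagents/, ./tools/)
- Load priority and override mechanism

### [Sub-Agent Mechanism](./sub-agents.md)
- Data model: AgentDefinition, Trace extensions
- Built-in Sub-Agents: general, explore, analyst
- Task Tool implementation
- Agent Registry and permission control
- Configuration file format

**Usage example**: `examples/subagent_example.py`

### [Context Management and Visualization](./context-management.md)
- GoalTree: hierarchical goal management (nested JSON, injected into the LLM)
- Goal ID design: internal ID (stable) vs display number (consecutive, shown to the LLM)
- goal tool: plan management (add, after, under, done, abandon, focus)
- Position control: after (append at the same level), under (break down into subtasks)
- Plan injection strategy: show the current goal and its parent chain in full
- explore tool: parallel explore-and-merge, collecting the last message of each Sub-Trace as metadata
- Backtracking: goals not yet executed are edited directly / executed goals are marked abandoned and a new branch is created
- DAG visualization: nodes = results, edges = actions, edges can be expanded or collapsed
- Data structures: GoalTree + Messages (flat list, linked by goal_id)
- Sub-Trace metadata: last_message, summary, stats (used to support decisions and visualization)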
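The split between stable internal IDs and consecutive display numbers can be illustrated with a small renderer. This is a sketch under assumptions, not the code in `context-management.md`: the `Goal` fields, the status values, the `render_plan` name, and the dotted `1.1` numbering style are all chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    goal_id: str              # Stable internal ID, never shown to the LLM here
    description: str
    status: str = "pending"   # Assumed values: "pending", "done", "abandoned"
    children: List["Goal"] = field(default_factory=list)


def render_plan(goals: List[Goal], prefix: str = "") -> List[str]:
    """Render goals as numbered lines, skipping abandoned ones so numbers stay consecutive."""
    lines: List[str] = []
    number = 0
    for goal in goals:
        if goal.status == "abandoned":
            continue  # Abandoned goals (and their subtrees) are hidden from the plan
        number += 1
        label = f"{prefix}{number}"
        indent = "  " * label.count(".")
        mark = "x" if goal.status == "done" else " "
        lines.append(f"{indent}{label}. [{mark}] {goal.description}")
        lines.extend(render_plan(goal.children, prefix=f"{label}."))
    return lines


tree = [
    Goal("g1", "Locate the failing test", status="done"),
    Goal("g2", "Patch via monkeypatching", status="abandoned"),
    Goal("g3", "Fix the parser", children=[
        Goal("g4", "Rewrite tokenizer", status="abandoned"),
        Goal("g5", "Handle empty input"),
    ]),
]
plan = render_plan(tree)
```

Here `g3` is displayed as goal 2 and `g5` as 2.1, while Messages keep referring to the internal IDs, so abandoning a goal never invalidates stored `goal_id` links.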

### [Tool System](./tools.md)
- Tool definition and registration
- Two-tier memory management
- Domain filtering, sensitive-data handling
- Best practices

**Tool integration conventions**:
- **High-frequency, simple tools**: native Python implementation → `agent/tools/builtin/`
- **Complex, low-frequency tools**: third-party repository → `vendor/` + adapter → `agent/tools/advanced/`
- **CLI tools**: installed via pip → invoked through `bash_command` (e.g. browser-use)

**Built-in tools** (`builtin/`):
- `read_file`, `edit_file`, `write_file` - file operations
- `bash_command` - command execution
- `glob_files`, `grep_content` - file search
- `skill`, `list_skills` - skill library management

**Advanced tools** (`advanced/`):
- `webfetch` - web page fetching (calls opencode)
- `lsp_diagnostics` - LSP diagnostics (calls opencode)

**Skills** (`agent/skills/`):
- `browser_use` - browser automation (includes environment setup tools)

**Detailed design**: see [`docs/tools-adapters.md`](./tools-adapters.md)

**Core features**:
```python
from reson_agent import tool, ToolResult, ToolContext

@tool(
    url_patterns=["*.google.com"],
    requires_confirmation=True
)
async def my_tool(arg: str, ctx: ToolContext) -> ToolResult:
    return ToolResult(
        title="Success",
        output="Result content",
        long_term_memory="Short summary"
    )
```
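How `url_patterns` such as `*.google.com` are evaluated is defined in `agent/tools/url_matcher.py`, which is not reproduced here. The following is only a sketch of one plausible interpretation using the standard library: `url_allowed` is a hypothetical helper, and treating `*.google.com` as also matching the bare domain `google.com` is an assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable
from urllib.parse import urlparse


def url_allowed(url: str, patterns: Iterable[str]) -> bool:
    """Return True if the URL's hostname matches any glob pattern."""
    # Match on the hostname only, so a pattern cannot be satisfied by the path or query.
    host = urlparse(url).hostname or ""
    for pattern in patterns:
        if fnmatchcase(host, pattern):
            return True
        # Assumption: "*.example.com" also covers the bare domain "example.com".
        if pattern.startswith("*.") and host == pattern[2:]:
            return True
    return False


patterns = ["*.google.com"]
results = {
    "subdomain": url_allowed("https://mail.google.com/inbox", patterns),
    "bare": url_allowed("https://google.com", patterns),
    "query_trick": url_allowed("https://evil.com/?q=.google.com", patterns),
    "lookalike": url_allowed("https://notgoogle.com", patterns),
}
```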

### [Multimodal Support](./multimodal.md)
- Multimodal message construction at the Prompt layer
- OpenAI-format message conventions
- Gemini Provider adaptation
- Image resource handling

**Implementation**:
- `agent/llm/prompts/wrapper.py:SimplePrompt` - Prompt wrapper
- `agent/llm/gemini.py:_convert_messages_to_gemini` - format conversion

**Usage example**: `examples/feature_extract/run.py`

### Prompt Loader

**Responsibility**: load and process `.prompt` files, with support for building multimodal messages

**File format**:
```yaml
---
model: gemini-2.5-flash
temperature: 0.3
---

$system$
System prompt...

$user$
User prompt: %variable%
```

**Core features**:
- YAML frontmatter parsing (configuration)
- `$section$` section syntax
- `%variable%` parameter substitution
- Multimodal message support (images, etc.)

**Implementation**:
- `agent/llm/prompts/loader.py:load_prompt()` - file parsing
- `agent/llm/prompts/loader.py:get_message()` - parameter substitution
- `agent/llm/prompts/wrapper.py:SimplePrompt` - Prompt wrapper

**Usage**:
```python
prompt = SimplePrompt("task.prompt")
messages = prompt.build_messages(text="...", images="img.png")
```
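To make the `.prompt` format concrete, here is a minimal parsing sketch. It is not the code in `loader.py`: `parse_prompt` and `fill` are hypothetical names, and the frontmatter is read as flat `key: value` strings instead of going through a real YAML parser, so values such as `0.3` stay strings.

```python
import re
from typing import Dict, Tuple

# A section header is a line consisting only of $name$.
SECTION_RE = re.compile(r"^\$(\w+)\$[ \t]*$", re.MULTILINE)


def parse_prompt(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a .prompt file into (frontmatter config, named sections)."""
    config: Dict[str, str] = {}
    body = text
    if text.startswith("---"):
        # Frontmatter sits between the first two '---' markers.
        _, front, body = text.split("---", 2)
        for line in front.strip().splitlines():
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    # re.split with one capture group yields [preamble, name1, content1, name2, content2, ...]
    parts = SECTION_RE.split(body)
    sections = {name: content.strip() for name, content in zip(parts[1::2], parts[2::2])}
    return config, sections


def fill(template: str, **variables: str) -> str:
    """Replace %name% placeholders; unknown names are left untouched."""
    return re.sub(r"%(\w+)%", lambda m: str(variables.get(m.group(1), m.group(0))), template)


source = """---
model: gemini-2.5-flash
temperature: 0.3
---

$system$
You are terse.

$user$
Summarize: %topic%
"""
config, sections = parse_prompt(source)
user_text = fill(sections["user"], topic="GoalTree")
```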

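### Skills (skill library)

**Categories**:

| Type | Loaded into | Loaded when |
|------|---------|---------|
| **Core Skill** | System Prompt | Automatically at Agent startup |
| **Regular Skill** | Conversation messages | When the model calls the `skill` tool |

**Directory layout**:

```
./agent/skills/
├── core.md                   # Core Skill (loaded automatically into the System Prompt)
└── browser_use/              # Regular Skill (loaded on demand into the conversation)
    ├── browser-use.md
    ├── setup.py
    └── __init__.py
```

**Core Skill** (`agent/skills/core.md`):
- Core system functions: Step management, progress tracking
- Injected into the System Prompt automatically by the framework

**Regular Skill**: loaded dynamically through the `skill` tool

```python
# Called by the Agent at runtime
await tools.execute("skill", {"skill_name": "browser-use"})
# The content is injected into the conversation history
```

**Implementation**:
- `agent/memory/skill_loader.py:SkillLoader` - Markdown parser
- `agent/memory/skill_loader.py:load_skills_from_dir()` - automatic Skills loading (built-in + custom)
- `agent/tools/builtin/skill.py:skill()` - skill tool implementation
- `agent/tools/builtin/skill.py:list_skills()` - lists available skills

**Detailed documentation**: see [`docs/skills.md`](./skills.md)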

### Experiences (experience library)

**Storage**: PostgreSQL + pgvector

```sql
CREATE TABLE experiences (
    exp_id TEXT PRIMARY KEY,
    scope TEXT,           -- "agent:executor" or "user:123"
    condition TEXT,       -- "when a database connection times out"
    rule TEXT,            -- "raise the retry count to 5"
    evidence JSONB,       -- evidence (step_ids)

    source TEXT,          -- "execution", "feedback", "manual"
    confidence FLOAT,
    usage_count INT,
    success_rate FLOAT,

    embedding vector(1536),  -- vector retrieval

    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```

**Retrieval and injection**:

```python
# 1. Retrieve relevant Experiences (nearest neighbours by embedding distance)
experiences = await db.query(
    """
    SELECT condition, rule, success_rate
    FROM experiences
    WHERE scope = $1
    ORDER BY embedding <-> $2
    LIMIT 10
    """,
    f"agent:{agent_type}",
    embed(task)
)

# 2. Inject into the system prompt
system_prompt = base_prompt + "\n\n# Learned Experiences\n" + "\n".join([
    f"- When {e.condition}, then {e.rule} (success rate: {e.success_rate:.1%})"
    for e in experiences
])
```

**Implementation**: `agent/memory/stores.py:ExperienceStore` (PostgreSQL version not yet implemented)
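The `usage_count` and `success_rate` columns suggest a running statistic that is updated after each use. The document does not specify the formula, so the sketch below assumes a plain cumulative average; `ExperienceStats` and `update_stats` here are illustrative stand-ins, not the store's actual method.

```python
from dataclasses import dataclass


@dataclass
class ExperienceStats:
    usage_count: int = 0
    success_rate: float = 0.0


def update_stats(stats: ExperienceStats, success: bool) -> ExperienceStats:
    """Fold one outcome into the cumulative success rate (assumed formula)."""
    # Recover the number of past successes, add the new outcome, then re-normalize.
    successes = stats.success_rate * stats.usage_count + (1.0 if success else 0.0)
    count = stats.usage_count + 1
    return ExperienceStats(usage_count=count, success_rate=successes / count)


stats = ExperienceStats()
for outcome in (True, True, False):
    stats = update_stats(stats, outcome)
```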

---

## Storage Interfaces

```python
from typing import Dict, List, Protocol


class TraceStore(Protocol):
    async def save(self, trace: Trace) -> None: ...
    async def get(self, trace_id: str) -> Trace: ...
    async def add_message(self, message: Message) -> None: ...
    async def get_messages(self, trace_id: str) -> List[Message]: ...
    async def get_messages_by_goal(self, trace_id: str, goal_id: str) -> List[Message]: ...

class ExperienceStore(Protocol):
    async def search(self, scope: str, query: str, limit: int) -> List[Dict]: ...
    async def add(self, exp: Dict) -> None: ...
    async def update_stats(self, exp_id: str, success: bool) -> None: ...

class SkillLoader(Protocol):
    async def scan(self) -> List[str]:  # Returns skill names
        """Scan and return the names of all available skills."""

    async def load(self, name: str) -> str:  # Returns the content
        """Load the Markdown content of the given skill."""
```

**Implementation**:
- Trace/Message protocols: `agent/execution/protocols.py`
- Memory protocols: `agent/memory/protocols.py`

**Implementation strategy**:
- Trace/Message: file system (JSON)
  - `FileSystemTraceStore` - file persistence (works across processes)
- Experience: PostgreSQL + pgvector
- Skill: file system (Markdown)
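For tests or quick experiments, the `TraceStore` protocol can be satisfied by an in-memory class. The sketch below is an illustration only: `InMemoryTraceStore` is a hypothetical name, the `Trace` and `Message` classes are trimmed stand-ins, and assigning `sequence` inside `add_message` is an assumption about where ordering is decided.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trace:
    trace_id: str
    status: str = "running"


@dataclass
class Message:
    message_id: str
    trace_id: str
    goal_id: str
    sequence: int = 0


class InMemoryTraceStore:
    """Dictionary-backed store with the same method names as the TraceStore protocol."""

    def __init__(self) -> None:
        self._traces: Dict[str, Trace] = {}
        self._messages: Dict[str, List[Message]] = {}

    async def save(self, trace: Trace) -> None:
        self._traces[trace.trace_id] = trace
        self._messages.setdefault(trace.trace_id, [])

    async def get(self, trace_id: str) -> Trace:
        return self._traces[trace_id]

    async def add_message(self, message: Message) -> None:
        bucket = self._messages.setdefault(message.trace_id, [])
        message.sequence = len(bucket) + 1  # Assumed: the store assigns the order
        bucket.append(message)

    async def get_messages(self, trace_id: str) -> List[Message]:
        return list(self._messages.get(trace_id, []))

    async def get_messages_by_goal(self, trace_id: str, goal_id: str) -> List[Message]:
        return [m for m in self._messages.get(trace_id, []) if m.goal_id == goal_id]


async def demo():
    store = InMemoryTraceStore()
    await store.save(Trace(trace_id="t1"))
    await store.add_message(Message("m1", "t1", goal_id="g1"))
    await store.add_message(Message("m2", "t1", goal_id="g2"))
    await store.add_message(Message("m3", "t1", goal_id="g1"))
    all_ids = [m.message_id for m in await store.get_messages("t1")]
    g1 = await store.get_messages_by_goal("t1", "g1")
    return all_ids, [(m.message_id, m.sequence) for m in g1]


all_ids, g1_messages = asyncio.run(demo())
```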

---

## Module Structure (v0.2.2)

```
agent/
├── __init__.py            # Public API
│
├── core/                  # Core engine
│   ├── runner.py          # AgentRunner
│   └── config.py          # AgentConfig, CallResult
│
├── execution/             # Execution tracing
│   ├── models.py          # Trace, Message
│   ├── protocols.py       # TraceStore
│   ├── fs_store.py        # FileSystemTraceStore
│   ├── api.py             # RESTful API (DAG view)
│   └── websocket.py       # WebSocket
│
├── memory/                # Memory system
│   ├── models.py          # Experience, Skill
│   ├── protocols.py       # MemoryStore, StateStore
│   ├── stores.py          # Store implementations
│   └── skill_loader.py    # Skill loader (loads built-in skills automatically)
│
├── tools/                 # Tool system
│   ├── registry.py        # ToolRegistry
│   ├── models.py          # ToolResult, ToolContext
│   ├── schema.py          # SchemaGenerator
│   ├── url_matcher.py     # URL pattern matching
│   ├── sensitive.py       # Sensitive-data handling
│   ├── builtin/           # Core tools
│   ├── advanced/          # Advanced tools
│   └── adapters/          # External integrations
│
├── llm/                   # LLM-related
│   ├── gemini.py
│   ├── openrouter.py
│   └── prompts/
│       ├── loader.py
│       └── wrapper.py
│
├── skills/                # Built-in Skills (loaded automatically)
│   └── core.md            # Core skill, loaded automatically on every run
│
└── subagents/             # Sub-agent
```

---

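## Design Decisions

See the [design decisions document](./decisions.md)

**Core decisions**:

1. **Skills are loaded through a tool** (vs injected up front)
   - Loaded on demand; the Agent chooses for itself
   - Informed by OpenCode and the Claude API documentation

2. **Skills live in the file system** (vs a database)
   - Easy to edit (Markdown)
   - Version controlled (Git)
   - Zero dependencies

3. **Experiences live in a database** (vs files)
   - Vector retrieval is required
   - Statistical analysis is required
   - Large in number and updated dynamically

4. **Context management: GoalTree + Message + DAG visualization**
   - The GoalTree is injected into the LLM as nested JSON; Messages are stored flat
   - The DAG visualization is derived from GoalTree + Messages
   - See [`docs/context-management.md`](./context-management.md)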

---

## Debug Tools

During development and debugging, the DAG visualization can be inspected through the API:

```bash
# Start the API server
python api_server.py

# View the DAG
curl http://localhost:8000/api/traces/{trace_id}/dag
```

**Implementation**: `agent/execution/api.py`

---

## Testing

See the [testing guide](./testing.md)

**Test layers**:
- **Unit tests**: Agent definitions, tool system, Trace model
- **Integration tests**: Sub-Agent, Trace storage, multi-module collaboration
- **E2E tests**: real LLM calls (an API key is required)

**Running tests**:
```bash
# Unit tests
pytest tests/ -v -m "not e2e"

# Coverage
pytest --cov=agent tests/ -m "not e2e"

# E2E tests (optional)
GEMINI_API_KEY=xxx pytest tests/e2e/ -v -m e2e
```

---

## Core Concepts Quick Reference

| Concept | Definition | Storage | Implementation |
|------|------|------|------|
| **Trace** | One task execution | File system (JSON) | `execution/models.py` |
| **Message** | Execution message (corresponds to an LLM message) | File system (JSON) | `execution/models.py` |
| **GoalTree** | Hierarchical execution plan | goal.json | `goal/models.py` |
| **Goal** | A goal node in the plan | Nested inside the GoalTree | `goal/models.py` |
| **Sub-Agent** | Specialized child agent | Independent Trace | `tools/builtin/task.py` |
| **AgentDefinition** | Agent type definition | Config file/code | `subagents/` |
| **Skill** | Capability description (Markdown) | File system | `memory/skill_loader.py` |
| **Experience** | Experience rule (condition + rule) | Database + vectors | `memory/stores.py` |
| **Tool** | Callable function | In memory (registry) | `tools/registry.py` |
| **Agent Loop** | ReAct loop | - | `core/runner.py` |
| **Doom Loop** | Infinite-loop detection | - | `core/runner.py` |