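# Step Tree Structure and Context Management

> This document describes the structured record of an Agent's execution, plan management, and the Context compression mechanism.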

---

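## Design Goals

1. **Visualization**: render the execution path as a tree whose nodes can be collapsed and expanded
2. **Plan management**: represent both "executed" and "planned" steps in one structure
3. **Context optimization**: compress the message history based on the tree structure to save tokens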

---

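## Core Design: the Step Tree

### Step Types

```python
from typing import Literal

StepType = Literal[
    # Planning
    "goal",        # a goal / plan item (may have child steps)

    # LLM output
    "thought",     # thinking / analysis (intermediate process)
    "evaluation",  # evaluation summary (requires a summary)
    "response",    # final reply

    # Tools
    "action",      # tool call (tool_call)
    "result",      # tool result (tool_result)
]
```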

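| Type | Source | Description |
|------|--------|-------------|
| `goal` | LLM (via the step tool) | Sets a goal / plan item |
| `thought` | LLM | Intermediate thinking that produces no tool call |
| `evaluation` | LLM | Summary of a group of operations; requires a summary |
| `response` | LLM | Final reply to the user |
| `action` | System | The LLM decides to call a tool; the system records it |
| `result` | System | Result of the tool execution |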

### Step Status

```python
from typing import Literal

Status = Literal[
    "planned",      # planned (not yet executed)
    "in_progress",  # executing
    "completed",    # done
    "failed",       # failed
    "skipped",      # skipped
]
```

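### Step Model

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass
class Step:
    step_id: str
    trace_id: str
    step_type: StepType
    status: Status
    sequence: int

    # Content
    # description has no default, so it must precede every defaulted field
    description: str                      # present on every node
    data: Dict[str, Any] = field(default_factory=dict)

    # Tree structure (single parent)
    parent_id: Optional[str] = None

    # Only needed for the evaluation type
    summary: Optional[str] = None

    # Execution metrics
    duration_ms: Optional[int] = None
    cost: Optional[float] = None
    tokens: Optional[int] = None

    # Time
    created_at: datetime = field(default_factory=datetime.now)
```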

**Key points**:
- `parent_id` is a single value (a tree), not a list (a DAG)
- `summary` is filled in only on `evaluation` nodes; other nodes do not need it
- A step in `planned` status is the equivalent of a TODO item

---

## Tree Structure Example

```
Trace
├── goal: "Explore the codebase" (completed)
│   ├── thought: "Need to understand the project structure first"
│   ├── action: glob_files
│   ├── result: [15 files...]
│   ├── thought: "Found a config file; need to read it"
│   ├── action: read_file
│   ├── result: [content...]
│   └── evaluation: "Main config is at /src/config.yaml" ← summary
│
├── goal: "Modify the config" (in_progress)
│   ├── action: read_file
│   └── result: [content...]
│
└── goal: "Run the tests" (planned)
```

### Parent Rules

| Step type | Parent |
|-----------|--------|
| `goal` | The enclosing `goal` (or None for a top-level goal) |
| `thought` | The `goal` currently `in_progress` |
| `action` | The `goal` currently `in_progress` |
| `result` | The corresponding `action` |
| `evaluation` | The `goal` it belongs to |
| `response` | The `goal` currently `in_progress` (or None) |
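
The table can be read as a small lookup. The sketch below is only an illustration of those rules, not the project's implementation; `resolve_parent` and all of its parameter names are hypothetical.

```python
from typing import Optional


def resolve_parent(
    step_type: str,
    focused_goal_id: Optional[str],
    last_action_id: Optional[str] = None,
    enclosing_goal_id: Optional[str] = None,
) -> Optional[str]:
    """Pick the parent_id for a new step according to the parent rules table."""
    if step_type == "goal":
        # A goal hangs under its enclosing goal, or is a root when there is none.
        return enclosing_goal_id
    if step_type == "result":
        # A result belongs to the action that produced it.
        if last_action_id is None:
            raise ValueError("a result needs a corresponding action")
        return last_action_id
    if step_type == "evaluation":
        # An evaluation always belongs to a goal.
        if focused_goal_id is None:
            raise ValueError("an evaluation needs a goal to belong to")
        return focused_goal_id
    # thought / action / response attach to the goal in progress.
    # This is None when no goal is in progress (the table allows that for response).
    return focused_goal_id
```

For example, `resolve_parent("result", "g1", last_action_id="a1")` returns `"a1"`, while a `response` produced with no goal in focus gets `None` and becomes a root node.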

---

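## Metadata

### Recorded Automatically by the System

The following fields are filled in by the system and need no involvement from the LLM:

```python
step_id: str          # generated automatically
parent_id: str        # set from the goal currently in focus
step_type: StepType   # inferred from the LLM output (see below)
sequence: int         # incrementing sequence number
tokens: int           # returned by the API
cost: float           # computed
duration_ms: int      # timed
created_at: datetime  # current time
```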

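### Step Type Inference

The system infers the type from the content of the LLM output, so no explicit declaration is needed. `result` steps are recorded by the system when a tool returns, so they never come out of this function.

```python
def infer_step_type(llm_response) -> StepType:
    # The step tool is itself a tool call, so it must be checked
    # before the generic "any tool call" case.

    # step tool called with complete=True → evaluation
    if called_step_tool(llm_response, complete=True):
        return "evaluation"

    # step tool called with a non-empty plan → goal
    if called_step_tool(llm_response, plan=True):
        return "goal"

    # Any other tool call → action
    if llm_response.tool_calls:
        return "action"

    # Final reply (no further tool calls, conversation ends)
    if is_final_response(llm_response):
        return "response"

    # Default: intermediate thinking
    return "thought"
```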

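### Extracting description

The system extracts the `description` field from the LLM output:

| Step type | Source of description |
|-----------|-----------------------|
| `goal` | The `plan` argument of the step tool |
| `thought` | First sentence of the LLM output (or a truncation) |
| `action` | Tool name + key arguments |
| `result` | The title returned by the tool, or a brief of its output |
| `evaluation` | The `summary` argument of the step tool |
| `response` | First sentence of the LLM output (or a truncation) |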
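
Two of these rules are mechanical enough to sketch. The helpers below are hypothetical (`first_sentence`, `describe_action`, the 80-character limit and the "first two arguments" rule are all assumptions made for illustration); they show one plausible way to derive a `thought`/`response` description and an `action` description.

```python
import re
from typing import Any, Dict

# End of sentence: CJK punctuation, or ASCII punctuation followed by
# whitespace/end (so the dot in "config.yaml" does not count), or a newline.
_SENTENCE_END = re.compile(r"[。！？]|[.!?](?=\s|$)|\n")


def first_sentence(text: str, limit: int = 80) -> str:
    """Return the first sentence of text, truncated to at most limit characters."""
    text = text.strip()
    match = _SENTENCE_END.search(text)
    if match:
        text = text[: match.end()].strip()
    if len(text) > limit:
        text = text[: limit - 3] + "..."
    return text


def describe_action(tool_name: str, arguments: Dict[str, Any], max_args: int = 2) -> str:
    """Render the tool name plus its first few arguments."""
    shown = ", ".join(f"{k}={v!r}" for k, v in list(arguments.items())[:max_args])
    return f"{tool_name}({shown})"


print(first_sentence("Look at config.yaml first. Then edit it."))
print(describe_action("read_file", {"path": "src/config.yaml"}))
```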

---

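## Plan Management Tools

### The step Tool

The model manages execution progress through the `step` tool:

```python
from typing import List, Optional

@tool
def step(
    plan: Optional[List[str]] = None,     # add planned goals
    focus: Optional[str] = None,          # which goal to switch focus to
    complete: bool = False,               # complete the current goal
    summary: Optional[str] = None,        # evaluation summary (used with complete)
):
    """Manage execution steps."""
```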

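### Usage Example

```python
# 1. Create the plan
step(plan=["Explore the codebase", "Modify the config", "Run the tests"])

# 2. Start on the first goal
step(focus="Explore the codebase")

# 3. [perform various tool calls...]

# 4. Complete it and switch to the next goal
step(complete=True, summary="Main config is at /src/config.yaml", focus="Modify the config")

# 5. Adjust the plan midway
step(plan=["Back up the config"])  # appends a new goal
```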

### State Changes

```
After step(plan=["A", "B", "C"]):
├── goal: "A" (planned)
├── goal: "B" (planned)
└── goal: "C" (planned)

After step(focus="A"):
├── goal: "A" (in_progress) ← current focus
├── goal: "B" (planned)
└── goal: "C" (planned)

After step(complete=True, summary="...", focus="B"):
├── goal: "A" (completed)
│   └── evaluation: "..." ← created automatically
├── goal: "B" (in_progress) ← new focus
└── goal: "C" (planned)
```
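
The transitions above can be reproduced with a small in-memory state machine. This is a minimal sketch under stated assumptions, not the planned `step` tool: `Goal` and `Plan` are hypothetical classes, goals are looked up by description, and the `summary` field stands in for the automatically created `evaluation` child. What should happen to the previous goal when focus changes without `complete` is not specified here, so the sketch leaves it untouched.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    description: str
    status: str = "planned"
    summary: Optional[str] = None  # stands in for the evaluation child


@dataclass
class Plan:
    goals: List[Goal] = field(default_factory=list)

    def current(self) -> Optional[Goal]:
        """Return the first goal in progress, if any."""
        return next((g for g in self.goals if g.status == "in_progress"), None)

    def step(self, plan=None, focus=None, complete=False, summary=None) -> None:
        # 1. Append new planned goals.
        for description in plan or []:
            self.goals.append(Goal(description))
        # 2. Complete the goal in focus before moving on.
        if complete:
            goal = self.current()
            if goal is None:
                raise ValueError("no goal in progress to complete")
            goal.status, goal.summary = "completed", summary
        # 3. Switch focus to the named goal.
        if focus is not None:
            target = next((g for g in self.goals if g.description == focus), None)
            if target is None:
                raise ValueError(f"unknown goal: {focus}")
            target.status = "in_progress"


p = Plan()
p.step(plan=["A", "B", "C"])
p.step(focus="A")
p.step(complete=True, summary="...", focus="B")
print([(g.description, g.status) for g in p.goals])
```

Completing before switching focus matters: if the order were reversed, `current()` could pick up the newly focused goal and mark the wrong one as completed.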

---

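## Context Management

### Information Layers

Different uses need different levels of detail:

| Use | Nodes selected | Level of detail |
|-----|----------------|-----------------|
| **Todo list** | `goal` nodes only | Brief: description + status |
| **History compression** | `goal` + `result` + `evaluation` | Detailed: includes key results |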

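### Todo Format (Brief)

```python
def to_todo_string(tree: StepTree) -> str:
    # Every Status value needs an icon, otherwise the lookup raises KeyError.
    # The failed / skipped icons are illustrative choices.
    icons = {"completed": "✓", "in_progress": "→", "planned": " ", "failed": "✗", "skipped": "-"}
    lines = []
    for goal in tree.filter(step_type="goal"):
        lines.append(f"[{icons[goal.status]}] {goal.description}")
    return "\n".join(lines)
```

Output:
```
[✓] Explore the codebase
[→] Modify the config
[ ] Run the tests
```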

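### History Compression Format (Detailed)

```python
def to_history_string(tree: StepTree) -> str:
    # Defined once, outside the loop; failed / skipped labels are illustrative choices.
    status_label = {
        "completed": "Done", "in_progress": "In progress", "planned": "To do",
        "failed": "Failed", "skipped": "Skipped",
    }
    lines = []
    for goal in tree.filter(step_type="goal"):
        lines.append(f"[{status_label[goal.status]}] {goal.description}")

        # Goals that have started list their key results; planned goals have none yet.
        if goal.status in ("completed", "in_progress"):
            # children() is assumed to yield the goal's result and evaluation steps
            for step in goal.children():
                if step.step_type == "result":
                    lines.append(f"  → {extract_brief(step.data)}")
                elif step.step_type == "evaluation":
                    lines.append(f"  Summary: {step.summary}")

    return "\n".join(lines)
```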

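Output:
```
[Done] Explore the codebase
  → glob_files: found 15 files
  → read_file(config.yaml): db_host=prod.db.com
  Summary: main config is at /src/config.yaml and contains the database connection settings

[In progress] Modify the config
  → read_file(config.yaml): read

[To do] Run the tests
```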

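### Compression Trigger

Compression starts once the estimated token count reaches 70% of `MAX_CONTEXT`.

```python
def build_messages(messages: List, tree: StepTree) -> List:
    # Normal case: no compression
    if estimate_tokens(messages) < MAX_CONTEXT * 0.7:
        return messages

    # Over the threshold: replace the detailed history with the tree summary
    history_summary = to_history_string(tree)
    summary_msg = {"role": "assistant", "content": history_summary}

    # Keep the most recent messages in full detail
    return [summary_msg] + recent_messages(messages)
```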
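
`estimate_tokens`, `recent_messages` and `MAX_CONTEXT` are left undefined above. The following self-contained sketch fills them in with placeholder choices so the trigger can be exercised: the 4-characters-per-token estimate, the 8000-token budget and the "keep the last 4 messages" rule are all assumptions, and the tree summary is passed in as a plain string to keep the example standalone.

```python
from typing import Dict, List

MAX_CONTEXT = 8000   # assumed token budget
KEEP_RECENT = 4      # assumed number of recent messages kept verbatim


def estimate_tokens(messages: List[Dict]) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def recent_messages(messages: List[Dict], keep: int = KEEP_RECENT) -> List[Dict]:
    """Return the last `keep` messages unchanged."""
    return messages[-keep:]


def build_messages(messages: List[Dict], history_summary: str) -> List[Dict]:
    # Below 70% of the budget: return the history untouched.
    if estimate_tokens(messages) < MAX_CONTEXT * 0.7:
        return messages
    # Otherwise: one summary message followed by the recent tail.
    summary_msg = {"role": "assistant", "content": history_summary}
    return [summary_msg] + recent_messages(messages)


short = [{"role": "user", "content": "hi"}]
long = [{"role": "user", "content": "x" * 4000} for _ in range(10)]
print(len(build_messages(short, "summary")))  # history is returned as-is
print(len(build_messages(long, "summary")))   # summary + recent tail
```

A real implementation would likely need a tokenizer-based count, and would have to avoid cutting the tail between a tool call and its result, since chat APIs generally expect those to stay paired.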

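### On-Demand Reads

The model can read the current progress through a tool instead of having it injected on every turn:

```python
@tool
def read_progress() -> str:
    """Read the current execution progress."""
    # tree: the StepTree of the current trace
    return to_todo_string(tree)
```

**Strategy**:
- Normal case: the model reads progress on demand via `read_progress` (saves context)
- During compression: the detailed history summary is injected automatically (so nothing is lost)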

---

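## Visualization Support

The tree structure lends itself naturally to visualization:

- **Collapse**: collapsing a `goal` node hides its children
- **Expand**: shows the child nodes in detail
- **Backtracking**: shown as branches in `failed` or `skipped` status
- **Parallelism**: multiple `action` nodes under the same `goal` (parallel tool calls)

### Edge Information

When visualizing, an edge (connecting line) can show:
- Execution time: `Step.duration_ms`
- Cost: `Step.cost`
- Brief description: `Step.description`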
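
As an illustration of how these three fields might be combined into one edge label, here is a small sketch. `edge_label`, the `|` separator and the 40-character limit are hypothetical; the `ms` and `$0.0000` formats follow the `tree.txt` example in the Debug section.

```python
from typing import Optional


def edge_label(
    description: str,
    duration_ms: Optional[int] = None,
    cost: Optional[float] = None,
    max_len: int = 40,
) -> str:
    """Build a compact label from a step's description, duration and cost."""
    if len(description) > max_len:
        description = description[: max_len - 3] + "..."
    parts = [description]
    # Metrics are optional on Step, so only show the ones that are set.
    if duration_ms is not None:
        parts.append(f"{duration_ms}ms")
    if cost is not None:
        parts.append(f"${cost:.4f}")
    return " | ".join(parts)


print(edge_label("glob_files", duration_ms=50, cost=0.005))
```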

---

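## Comparison with OpenCode

| Aspect | OpenCode | This design |
|--------|----------|-------------|
| Plan storage | Markdown file + Todo list | Step tree (`planned` status) |
| Link between plan and execution | No structured link | Unified in the tree structure |
| Reading progress | `todoread` tool | `read_progress` tool |
| Updating progress | `todowrite` tool | `step` tool |
| Context compression | None | Automatic, based on the tree structure |

**Reference**: for OpenCode's implementation see `src/tool/todo.ts` and `src/session/prompt.ts`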

---

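## Debug Tools

### Viewing the Step Tree Live

For development and debugging, the Step tree can be written out in three formats:

```python
from agent.debug import dump_tree, dump_markdown, dump_json

# 1. Text format (concise, with truncation)
dump_tree(trace, steps)  # writes .trace/tree.txt

# 2. Markdown format (complete, collapsible)
dump_markdown(trace, steps)  # writes .trace/tree.md

# 3. JSON format (for programmatic analysis)
dump_json(trace, steps)  # writes .trace/tree.json
```

**Auto-generation**: in the debug mode of `AgentRunner`, the two files `tree.txt` and `tree.md` are generated automatically.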

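### Comparison of the Three Formats

| Format | File size | Completeness | Use case |
|--------|-----------|--------------|----------|
| **tree.txt** | Small (1-2 KB) | Long content truncated | Quick preview, viewing in a terminal |
| **tree.md** | Medium (5-10 KB) | Full content | Detailed debugging, viewing in an editor |
| **tree.json** | Large (possibly > 10 KB) | Full, structured | Programmatic analysis, processing by tools |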

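### Markdown Format Features

**Complete and collapsible**: uses the HTML `<details>` tag for native collapsing

````markdown
<details>
<summary><b>📨 Messages</b></summary>

```json
[full messages content]
```

</details>
````

**Smart truncation**:
- ✅ **Text content**: shown in full, not truncated
- ✅ **Tool calls**: the full JSON schema is shown
- ✅ **Image base64**: truncated smartly, showing the size and a preview

Example output: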
```json
{
  "type": "image_url",
  "image_url": {
    "url": "<IMAGE_DATA: 2363.7KB, data:image/png;base64, preview: iVBORw0KGgo...>"
  }
}
```
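
The image placeholder above can be produced with a short recursive pass over the message structure. This is a sketch under assumptions, not the code in `tree_dump.py`: the function names are hypothetical, the size is taken to be the length of the base64 payload, and the preview is taken to be its first 11 characters.

```python
from typing import Any


def truncate_image_url(url: str, preview_chars: int = 11) -> str:
    """Replace a base64 image data URL with a short placeholder; leave other strings alone."""
    if not url.startswith("data:image/") or ";base64," not in url:
        return url
    header, payload = url.split(",", 1)  # header looks like "data:image/png;base64"
    size_kb = len(payload) / 1024        # assumption: size of the base64 text
    return f"<IMAGE_DATA: {size_kb:.1f}KB, {header}, preview: {payload[:preview_chars]}...>"


def truncate_images(value: Any) -> Any:
    """Walk dicts and lists, truncating every embedded image data URL."""
    if isinstance(value, dict):
        return {k: truncate_images(v) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_images(v) for v in value]
    if isinstance(value, str):
        return truncate_image_url(value)
    return value


message = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64," + "iVBORw0KGgo" + "A" * 2037},
}
print(truncate_images(message)["image_url"]["url"])
```

Because only strings that look like base64 image data URLs are rewritten, ordinary text and tool-call arguments pass through unchanged, which matches the "shown in full" behaviour for the other content types.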

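### How to View

```bash
# Option 1: live refresh in a terminal (tree.txt)
watch -n 0.5 cat .trace/tree.txt

# Option 2: open in VS Code (tree.md, supports collapsing)
code .trace/tree.md

# Option 3: rendered preview (tree.md)
# In VS Code, right-click → "Open Preview", or use a Markdown preview extension
```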

### Sample tree.txt Output

```
============================================================
 Step Tree Debug
 Generated: 2024-01-15 14:30:25
============================================================

## Trace
  trace_id: abc123
  task: Modify the config file
  status: running
  total_steps: 5
  total_tokens: 1234
  total_cost: 0.0150

## Steps

├── [✓] goal: Explore the codebase
│   id: a1b2c3d4...
│   duration: 1234ms
│   tokens: 500
│   cost: $0.0050
│   data:
│     description: Explore the codebase
│   time: 14:30:10
│
│   ├── [✓] thought: Need to understand the project structure first
│   │   id: e5f6g7h8...
│   │   data:
│   │     content: Let me look at the project's directory structure first...
│   │   time: 14:30:11
│   │
│   ├── [✓] action: glob_files
│   │   id: i9j0k1l2...
│   │   duration: 50ms
│   │   data:
│   │     tool_name: glob_files
│   │     arguments: {"pattern": "**/*.py"}
│   │   time: 14:30:12
│   │
│   └── [✓] result: Found 15 files
│       id: m3n4o5p6...
│       data:
│         output: ["src/main.py", "src/config.py", ...]
│       time: 14:30:12
│
└── [→] goal: Modify the config
    id: q7r8s9t0...
    time: 14:30:15
```

**Implementation**: `agent/execution/tree_dump.py`

---

## Implementation Locations

- Step model: `agent/execution/models.py:Step` (implemented)
- Trace model: `agent/execution/models.py:Trace` (implemented)
- Store interface: `agent/execution/protocols.py:TraceStore` (implemented)
- In-memory store: `agent/execution/store.py:MemoryTraceStore` (implemented)
- Debug tools: `agent/execution/tree_dump.py` (implemented)
- **Core Skill**: `agent/skills/core.md` (implemented)
- step tool: `agent/tools/builtin/step.py` (not yet implemented)
- read_progress tool: `agent/tools/builtin/step.py` (not yet implemented)
- Context compression: `agent/context/compressor.py` (not yet implemented)

---

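## Visualization API

### Design Goals

Provide the frontend with query and real-time push interfaces for the Step tree, supporting:
1. Queries for both historical and in-progress tasks
2. On-demand loading of large Traces (thousands of Steps)
3. Real-time WebSocket push of updates for in-progress tasks

### Core Design

**Simplification principle**: drop the complex "batch computation" and "same-level completeness check" logic in favor of simple level-by-level lazy loading

**Data structure**: return tree-shaped JSON so the frontend does not have to build the tree itself

**Performance strategy**:
- Small Traces (< 100 Steps): return the complete tree in one call to `/tree`
- Large Traces (≥ 100 Steps): lazy-load on demand with `/node/{step_id}`
- In-progress tasks: push incremental updates over WebSocket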
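
For the small-Trace case, returning tree-shaped JSON comes down to nesting a flat list of steps by `parent_id`. The sketch below is a hypothetical helper (`build_tree` is not a name from the codebase) that works on plain dicts carrying the `step_id`, `parent_id` and `sequence` fields of the Step model.

```python
from typing import Any, Dict, List


def build_tree(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest a flat list of step dicts into root nodes, each with a `children` list."""
    # Index copies of the steps by id, in sequence order so children stay ordered.
    ordered = sorted(steps, key=lambda s: s["sequence"])
    nodes = {s["step_id"]: {**s, "children": []} for s in ordered}

    roots = []
    for node in nodes.values():
        parent = nodes.get(node.get("parent_id"))
        if parent is None:
            # No parent_id, or the parent is not part of this list: treat as a root.
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


flat = [
    {"step_id": "a1", "parent_id": "g1", "sequence": 2},
    {"step_id": "g1", "parent_id": None, "sequence": 1},
    {"step_id": "r1", "parent_id": "a1", "sequence": 3},
]
print(build_tree(flat))
```

This runs in a single pass over the steps after sorting, which is fine for the sub-100-Step case; larger Traces are better served by the lazy-loading endpoint.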

### API Endpoints

```
GET  /api/traces                          # list Traces (supports filtering)
GET  /api/traces/{trace_id}               # get Trace metadata
GET  /api/traces/{trace_id}/tree          # get the complete tree (small Traces)
GET  /api/traces/{trace_id}/node/{step_id}  # lazy-load a node + its children
WS   /api/traces/{trace_id}/watch         # watch updates of an in-progress Trace
```

### Lazy-Loading Core Logic

```python
from typing import List, Optional

async def get_node_with_children(
    store: TraceStore,
    step_id: Optional[str],  # None = root level
    trace_id: str,
    expand: bool = False,
    max_depth: int = 1,
    current_depth: int = 0,  # recursion depth reached so far
) -> List[dict]:
    # 1. Fetch the nodes at the current level
    if step_id is None:
        steps = await store.get_trace_steps(trace_id)
        current_nodes = [s for s in steps if s.parent_id is None]
    else:
        current_nodes = await store.get_step_children(step_id)

    # 2. Build the response
    result = []
    for step in current_nodes:
        node = step.to_dict()
        node["children"] = []
        # 3. Optionally recurse into the children, bounded by max_depth
        if expand and current_depth < max_depth:
            node["children"] = await get_node_with_children(
                store, step.step_id, trace_id, expand, max_depth, current_depth + 1
            )
        result.append(node)
    return result
```

**Taste rating**: 🟢 Good taste (clear logic, under 30 lines, no special cases)

### WebSocket Events

```json
// Step added
{"event": "step_added", "step": {...}}

// Step updated
{"event": "step_updated", "step_id": "...", "updates": {...}}

// Trace completed
{"event": "trace_completed", "trace_id": "..."}
```
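
On the receiving side, these three events can be folded into a local cache of steps. The reducer below is a minimal sketch: `apply_event` is a hypothetical name, the `step` payload is assumed to carry a `step_id` as in the Step model, and unknown event names are ignored on the assumption that new event types may be added later.

```python
from typing import Any, Dict


def apply_event(steps: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> bool:
    """Apply one event to the step cache. Returns False once the trace has completed."""
    kind = event.get("event")
    if kind == "step_added":
        step = event["step"]
        steps[step["step_id"]] = dict(step)
    elif kind == "step_updated":
        # Ignore updates for steps that have not been loaded (e.g. lazy-loaded subtrees).
        if event["step_id"] in steps:
            steps[event["step_id"]].update(event["updates"])
    elif kind == "trace_completed":
        return False
    return True


cache: Dict[str, Dict[str, Any]] = {}
events = [
    {"event": "step_added", "step": {"step_id": "g1", "status": "in_progress"}},
    {"event": "step_updated", "step_id": "g1", "updates": {"status": "completed"}},
    {"event": "trace_completed", "trace_id": "abc123"},
]
for ev in events:
    if not apply_event(cache, ev):
        break
print(cache)
```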

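### Implementation Location (To Be Decided)

Two options:

**Option 1: standalone API module** (recommended if several APIs will be needed in the future)
```
agent/api/
├── server.py           # FastAPI application
├── routes/
│   ├── traces.py       # Step tree routes
│   └── websocket.py    # WebSocket push
└── schemas.py          # Pydantic models
```

**Option 2: dedicated Step tree module** (recommended if the API only serves Step tree visualization)
```
agent/step_tree/
├── api.py              # FastAPI routes
├── websocket.py        # WebSocket push
└── server.py           # standalone service entry point
```

Basis for the decision:
- If the system will need to expose several APIs in the future (Experience management, Agent control, etc.) → Option 1
- If the API is only used for Step tree visualization → Option 2

**Detailed design**: see `/Users/sunlit/.claude/plans/starry-yawning-zebra.md`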

---

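## Future Extensions

- Retry reason, retry count, and whether a degraded/fallback path was used
- Why a given action was chosen, whether skills were triggered, and the strategies in the system prompt
- Database persistence (PostgreSQL/Neo4j)
- Recursive query optimization (PostgreSQL CTE)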