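Design goal: an extensible Agent framework that can learn, with execution tracing and persistent memory.

Use case: running complex tasks in the background, with human experts periodically reviewing the work and giving feedback.

A single call is a special case of an Agent:

| Feature | Single call | Agent mode |
|---|---|---|
| Loop iterations | 1 | N (configurable) |
| Tool calls | Optional | Common |
| State management | None | Yes (Trace) |
| Memory retrieval | None | Yes (Experience/Skill) |
| Execution graph | 1 node | DAG of N nodes |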
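
The comparison above can be illustrated with a toy loop in which a single call is simply the same loop capped at one iteration. This is a minimal sketch, not the framework's runner: `run_loop`, the `LLM` callable type, and the two scripted "models" are hypothetical stand-ins introduced only for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an LLM: takes the message history and returns
# (response text, name of a tool to call or None when it has a final answer).
LLM = Callable[[List[str]], Tuple[str, Optional[str]]]


def run_loop(llm: LLM, task: str, max_steps: int) -> List[str]:
    """Run up to max_steps LLM calls and return the recorded node types."""
    messages = [task]
    nodes: List[str] = []
    for _ in range(max_steps):
        text, tool_name = llm(messages)
        nodes.append("llm_call")
        messages.append(text)
        if tool_name is None:  # no tool requested: the loop ends
            break
        nodes.append(f"tool_call:{tool_name}")
        messages.append(f"result of {tool_name}")
    return nodes


def direct_answer(messages: List[str]) -> Tuple[str, Optional[str]]:
    return ("done", None)


def search_then_answer(messages: List[str]) -> Tuple[str, Optional[str]]:
    # Ask for the (made-up) "search" tool on the first turn only
    if len(messages) == 1:
        return ("searching", "search")
    return ("done", None)


single_call_nodes = run_loop(direct_answer, "summarize", max_steps=1)
agent_nodes = run_loop(search_then_answer, "find the error", max_steps=50)
```

With `max_steps=1` the graph holds a single `llm_call` node; with a larger budget the same function records several nodes, which matches the last row of the table.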

    ┌────────────────────────────────────────────────────────────┐
    │ Layer 3: Skills (skill library)                            │
    │ - Markdown files holding detailed capability descriptions  │
    │ - Loaded on demand through the skill tool                  │
    └────────────────────────────────────────────────────────────┘
                                  ▲
                                  │ generalize
    ┌────────────────────────────────────────────────────────────┐
    │ Layer 2: Experience (experience store)                     │
    │ - Stored in a database: condition + rule + evidence        │
    │ - Vector retrieval, injected into the system prompt        │
    └────────────────────────────────────────────────────────────┘
                                  ▲
                                  │ extract
    ┌────────────────────────────────────────────────────────────┐
    │ Layer 1: Task State                                        │
    │ - Working memory for the current task                      │
    │ - Trace/Step records of the execution process              │
    └────────────────────────────────────────────────────────────┘

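Injection methods:

- The skill tool loads Skills dynamically into the conversation history

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Any, Dict, List, Literal, Optional

    @dataclass
    class Trace:
        trace_id: str
        mode: Literal["call", "agent"]
        # Task information
        task: Optional[str] = None
        agent_type: Optional[str] = None
        # Status
        status: Literal["running", "completed", "failed"] = "running"
        # Context (flexible metadata)
        context: Dict[str, Any] = field(default_factory=dict)
        # Timestamps (created_at needs a default because it follows defaulted fields)
        created_at: datetime = field(default_factory=datetime.now)
        completed_at: Optional[datetime] = None
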
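The context field stores task-related metadata used for management and analysis:

- user_id: user ID
- project_id: project ID
- priority: priority
- tags: list of tags

    @dataclass
    class Step:
        step_id: str
        trace_id: str
        step_type: str  # "llm_call", "tool_call", "tool_result", ...
        # DAG structure
        parent_ids: List[str] = field(default_factory=list)
        # Flexible step data
        data: Dict[str, Any] = field(default_factory=dict)
        # Needs a default because it follows defaulted fields
        created_at: datetime = field(default_factory=datetime.now)
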
Common step_type values:

- llm_call: an LLM call (data: messages, response, tokens, cost)
- tool_call: a tool call (data: tool_name, arguments)
- tool_result: a tool result (data: output, metadata)
- reasoning: a reasoning step (data: content)

    Trace
    │
    ├─▶ Step(llm_call)
    │     │
    │     ├─▶ Step(tool_call: skill)
    │     │     └─▶ Step(tool_result: "# Error Handling...")
    │     │
    │     └─▶ Step(tool_call: search_logs)
    │           └─▶ Step(tool_result: "...")
    │
    └─▶ Step(llm_call)
          └─▶ ...

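
To make the `parent_ids` linkage concrete, the sketch below rebuilds the first branch of the tree above from a flat list of steps. It is only an illustration under stated assumptions: the `Step` class is a trimmed copy of the one above, and `children_of` and `render` are hypothetical helpers, not part of the framework.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Step:
    step_id: str
    trace_id: str
    step_type: str
    parent_ids: List[str] = field(default_factory=list)
    data: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)


def children_of(steps: List[Step], parent_id: str) -> List[Step]:
    """Return the steps that list parent_id among their parents."""
    return [s for s in steps if parent_id in s.parent_ids]


def render(steps: List[Step], root_id: str, depth: int = 0) -> List[str]:
    """Depth-first walk from root_id, producing one indented line per step."""
    by_id = {s.step_id: s for s in steps}
    lines = ["  " * depth + by_id[root_id].step_type]
    for child in children_of(steps, root_id):
        lines.extend(render(steps, child.step_id, depth + 1))
    return lines


steps = [
    Step("s1", "t1", "llm_call"),
    Step("s2", "t1", "tool_call", parent_ids=["s1"], data={"tool_name": "skill"}),
    Step("s3", "t1", "tool_result", parent_ids=["s2"], data={"output": "# Error Handling..."}),
    Step("s4", "t1", "tool_call", parent_ids=["s1"], data={"tool_name": "search_logs"}),
    Step("s5", "t1", "tool_result", parent_ids=["s4"], data={"output": "..."}),
]
tree = render(steps, "s1")
```

Because each step stores its parents rather than its children, appending a step never requires rewriting an earlier one. A step with several entries in `parent_ids` would appear once under each parent in this rendering.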
Markdown files:

    ~/.reson/skills/              # global
    ├── error-handling/SKILL.md
    └── data-processing/SKILL.md
    ./project/.reson/skills/      # project-level
    └── api-integration/SKILL.md

An example SKILL.md:

    ---
    name: error-handling
    description: Error handling best practices
    ---
    ## When to use
    - Analyzing error logs
    - Debugging production issues
    ## Guidelines
    - Look for stack traces first
    - Check error frequency
    - Group by error type

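
A loader may want the `name` and `description` fields separately from the Markdown body, for example to list available skills without sending whole files to the model. The following is a minimal standard-library sketch that assumes the front matter contains only flat `key: value` lines, as in the sample above; `parse_skill` is a hypothetical helper, and a real implementation would more likely use a YAML parser.

```python
from typing import Dict, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md document into (front matter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter: the whole file is the body
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # unterminated front matter: treat as plain body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not simple key: value pairs
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SAMPLE = """---
name: error-handling
description: Error handling best practices
---
## When to use
- Analyzing error logs
"""
meta, body = parse_skill(SAMPLE)
```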
Via the skill tool:

    from pathlib import Path

    @tool(id="skill", description="Load a skill by name", parameters={"name": str})
    async def skill_tool(name: str) -> str:
        # Scan the skill directories: global first, then the current project
        for skills_dir in [Path.home() / ".reson/skills", Path.cwd() / ".reson/skills"]:
            skill_file = skills_dir / name / "SKILL.md"
            if skill_file.exists():
                return skill_file.read_text(encoding="utf-8")
        raise FileNotFoundError(f"Skill '{name}' not found")

In essence, it is a file-reading tool that returns a string.
PostgreSQL + pgvector:

    CREATE TABLE experiences (
        exp_id TEXT PRIMARY KEY,
        scope TEXT,               -- "agent:executor" or "user:123"
        condition TEXT,           -- e.g. "on a database connection timeout"
        rule TEXT,                -- e.g. "raise the retry count to 5"
        evidence JSONB,           -- evidence (step_ids)
        source TEXT,              -- "execution", "feedback", "manual"
        confidence FLOAT,
        usage_count INT,
        success_rate FLOAT,
        embedding vector(1536),   -- used for vector retrieval
        created_at TIMESTAMP,
        updated_at TIMESTAMP
    );

    # 1. Retrieve relevant Experiences
    experiences = await db.query(
        """
        SELECT condition, rule, success_rate
        FROM experiences
        WHERE scope = $1
        ORDER BY embedding <-> $2
        LIMIT 10
        """,
        f"agent:{agent_type}",
        embed(task),
    )

    # 2. Inject them into the system prompt
    system_prompt = base_prompt + "\n\n# Learned Experiences\n" + "\n".join([
        f"- When {e.condition}, then {e.rule} (success rate: {e.success_rate:.1%})"
        for e in experiences
    ])

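
The injection step can be isolated into a small pure function, which makes it easy to test without a database. This is a sketch under assumptions: `ExperienceRow` is a hypothetical row type carrying the three selected columns, and this `build_system_prompt` takes the base prompt as an explicit argument. It also skips the section header when nothing was retrieved, which the inline version above does not do.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExperienceRow:
    """Hypothetical row type holding the three columns selected above."""
    condition: str
    rule: str
    success_rate: float


def build_system_prompt(base_prompt: str, experiences: List[ExperienceRow]) -> str:
    """Append retrieved experiences to the base prompt as a bulleted section."""
    if not experiences:
        return base_prompt  # nothing retrieved: leave the prompt untouched
    bullets = [
        f"- When {e.condition}, then {e.rule} (success rate: {e.success_rate:.1%})"
        for e in experiences
    ]
    return base_prompt + "\n\n# Learned Experiences\n" + "\n".join(bullets)


prompt = build_system_prompt(
    "You are an executor.",
    [ExperienceRow("the database connection times out", "raise the retry count to 5", 0.8)],
)
```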

    async def run(task: str, max_steps: int = 50):
        # 1. Create the Trace (mode is a required field)
        trace = Trace(trace_id=gen_id(), mode="agent", task=task, status="running")
        await trace_store.save(trace)

        # 2. Retrieve Experiences and build the system prompt
        experiences = await search_experiences(task)
        system_prompt = build_system_prompt(experiences)

        # 3. Initialize the message history
        messages = [{"role": "user", "content": task}]

        # 4. ReAct loop
        for _ in range(max_steps):
            # Call the LLM
            response = await llm.chat(
                messages=messages,
                system=system_prompt,
                tools=tool_registry.to_schema(),  # includes the skill tool
            )

            # Record the LLM call
            await add_step(trace, "llm_call", {
                "response": response.content,
                "tool_calls": response.tool_calls,
            })

            # No tool calls: the task is done
            if not response.tool_calls:
                break

            # Add the assistant turn once, together with all of its tool calls
            messages.append({
                "role": "assistant",
                "content": response.content,
                "tool_calls": response.tool_calls,
            })

            # Execute the tools
            for tool_call in response.tool_calls:
                # Doom loop detection
                if is_doom_loop(tool_call):
                    raise DoomLoopError()

                # Execute the tool (including the skill tool)
                result = await execute_tool(tool_call)

                # Record the steps
                await add_step(trace, "tool_call", {"tool": tool_call.name, "args": tool_call.args})
                await add_step(trace, "tool_result", {"output": result})

                # Append the tool result to the message history
                messages.append({"role": "tool", "content": result})

        # 5. Finish
        trace.status = "completed"
        trace.completed_at = datetime.now()
        await trace_store.save(trace)
        return trace

Doom loop detection:
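
The source does not spell out the detection criterion, so the following is only one plausible sketch: treat the run as a doom loop once the same tool is called with the same arguments several times in a row. The `DoomLoopDetector` class, its `check` method, and the default threshold of 3 are all assumptions introduced here, not the framework's `is_doom_loop`.

```python
import json
from collections import deque
from typing import Any, Deque, Dict


class DoomLoopDetector:
    """Flags a call once the same tool + arguments repeat `threshold` times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        # Only the most recent `threshold` call signatures are kept
        self._recent: Deque[str] = deque(maxlen=threshold)

    def check(self, tool_name: str, args: Dict[str, Any]) -> bool:
        """Record the call and return True if it completes a repeated streak."""
        # Canonical signature: sorted keys so argument order does not matter
        signature = json.dumps([tool_name, args], sort_keys=True, default=str)
        self._recent.append(signature)
        return len(self._recent) == self.threshold and len(set(self._recent)) == 1


detector = DoomLoopDetector(threshold=3)
results = [
    detector.check("search_logs", {"query": "timeout"}),
    detector.check("search_logs", {"query": "timeout"}),
    detector.check("search_logs", {"query": "timeout"}),
    detector.check("read", {"path": "app.log"}),
]
```

A consecutive-repeat rule is cheap and has few false positives, but it misses loops that alternate between two calls; a stricter variant could count signatures over a longer window instead.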

    @dataclass
    class ToolResult:
        output: str
        metadata: Dict[str, Any] = field(default_factory=dict)

    class Tool(Protocol):
        id: str
        description: str
        parameters: Type[BaseModel]

        async def execute(self, args: Dict, ctx: ToolContext) -> ToolResult: ...

    @tool(id="read", description="Read a file", parameters={"path": str})
    async def read_tool(path: str) -> str:
        return Path(path).read_text()

    @tool(id="skill", description="Load a skill", parameters={"name": str})
    async def skill_tool(name: str) -> str:
        # Scan for and load the Skill file
        ...

    registry = ToolRegistry()
    registry.register(read_tool)
    registry.register(skill_tool)
    registry.register(search_tool)

    # Convert to the LLM tool schema
    tools_schema = registry.to_schema()

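
One way the `@tool` decorator and `ToolRegistry` could fit together is sketched below. Everything here is an assumption made for illustration: `ToolSpec`, the `spec` attribute set on the function, the `execute` method, and the schema layout (a name, a description, and a JSON Schema object) are hypothetical, and the exact schema shape depends on the LLM provider in use.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List

# Assumed mapping from Python types to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class ToolSpec:
    id: str
    description: str
    parameters: Dict[str, type]
    func: Callable[..., Awaitable[str]]


def tool(id: str, description: str, parameters: Dict[str, type]):
    """Attach a ToolSpec to the decorated coroutine function and return it unchanged."""
    def decorator(func):
        func.spec = ToolSpec(id, description, parameters, func)
        return func
    return decorator


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, func) -> None:
        self._tools[func.spec.id] = func.spec

    def to_schema(self) -> List[Dict[str, Any]]:
        """Describe every registered tool in a provider-neutral dictionary form."""
        return [
            {
                "name": spec.id,
                "description": spec.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        name: {"type": _JSON_TYPES[py_type]}
                        for name, py_type in spec.parameters.items()
                    },
                    "required": list(spec.parameters),
                },
            }
            for spec in self._tools.values()
        ]

    async def execute(self, tool_id: str, args: Dict[str, Any]) -> str:
        """Look up a tool by id and await it with the given keyword arguments."""
        return await self._tools[tool_id].func(**args)


@tool(id="echo", description="Return the text unchanged", parameters={"text": str})
async def echo_tool(text: str) -> str:
    return text


registry = ToolRegistry()
registry.register(echo_tool)
schema = registry.to_schema()
output = asyncio.run(registry.execute("echo", {"text": "hi"}))
```

Keeping the decorated function callable as-is means a tool can still be unit-tested directly, without going through the registry.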

    class TraceStore(Protocol):
        async def save(self, trace: Trace) -> None: ...
        async def get(self, trace_id: str) -> Trace: ...
        async def add_step(self, step: Step) -> None: ...
        async def get_steps(self, trace_id: str) -> List[Step]: ...

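
As a rough illustration of a file-system backend for the step half of this protocol, the sketch below appends each step as one JSON line per trace. The on-disk layout (`<root>/<trace_id>/steps.jsonl`), the `FileStepStore` name, and the trimmed `Step` class without `created_at` (to sidestep datetime serialization) are assumptions made here; the source only states that traces and steps are stored as JSON on the file system. The file I/O is blocking, which a production version might move to a thread.

```python
import asyncio
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Step:
    step_id: str
    trace_id: str
    step_type: str
    parent_ids: List[str] = field(default_factory=list)
    data: Dict[str, Any] = field(default_factory=dict)


class FileStepStore:
    """Appends each Step as one JSON line to <root>/<trace_id>/steps.jsonl."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _steps_file(self, trace_id: str) -> Path:
        return self.root / trace_id / "steps.jsonl"

    async def add_step(self, step: Step) -> None:
        path = self._steps_file(step.trace_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(step)) + "\n")

    async def get_steps(self, trace_id: str) -> List[Step]:
        path = self._steps_file(trace_id)
        if not path.exists():
            return []  # unknown trace: no steps recorded yet
        with path.open(encoding="utf-8") as f:
            return [Step(**json.loads(line)) for line in f if line.strip()]


async def demo() -> List[Step]:
    # Write two steps into a temporary directory and read them back
    with tempfile.TemporaryDirectory() as tmp:
        store = FileStepStore(Path(tmp))
        await store.add_step(Step("s1", "t1", "llm_call"))
        await store.add_step(Step("s2", "t1", "tool_call", parent_ids=["s1"]))
        return await store.get_steps("t1")


loaded = asyncio.run(demo())
```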

    class ExperienceStore(Protocol):
        async def search(self, scope: str, query: str, limit: int) -> List[Dict]: ...
        async def add(self, exp: Dict) -> None: ...
        async def update_stats(self, exp_id: str, success: bool) -> None: ...

    class SkillLoader(Protocol):
        async def scan(self) -> List[str]:
            """Scan for and return the names of all available skills."""

        async def load(self, name: str) -> str:
            """Load the Markdown content of the named skill."""

Implementation strategy:

    reson_agent/
    ├── __init__.py
    ├── runner.py              # AgentRunner
    ├── models.py              # Trace, Step
    ├── storage/
    │   ├── protocols.py       # TraceStore, ExperienceStore, SkillLoader
    │   ├── trace_fs.py        # file-system implementation
    │   ├── experience_pg.py   # PostgreSQL implementation
    │   └── skill_fs.py        # file-system implementation
    ├── tools/
    │   ├── registry.py        # ToolRegistry
    │   ├── decorator.py       # @tool
    │   └── builtin.py         # read, skill, search
    └── llm.py                 # LLMProvider Protocol

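Comparison of options:

| Option | Pros | Cons |
|---|---|---|
| Pre-inject into the system prompt | Simple | Wastes tokens; the Agent cannot choose |
| Load dynamically as a tool | Loaded on demand; the Agent chooses for itself | Requires implementing the skill tool |

Choice: dynamic loading (following OpenCode and the Claude API documentation)

Choice: file system

Choice: database (PostgreSQL + pgvector)

Reason: this is a background use case, so real-time notifications are not needed
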
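| Concept | Definition | Storage |
|---|---|---|
| Trace | One task execution | File system (JSON) |
| Step | An execution step | File system (JSON) |
| Skill | Capability description (Markdown) | File system |
| Experience | Learned rule (condition + rule) | Database + vectors |
| Agent Loop | ReAct loop | - |
| Doom Loop | Infinite-loop detection | - |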