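Tool Agent Project Design

1. Project Overview

Project name: tool_agent
Project goal: build an Agent plus tool-library system that can automatically wrap, integrate, deploy, and write tools
Target users: other Agent systems (tool_agent acts as their tool supplier)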

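2. System Positioning

tool_agent is a locally resident, intelligent tool management system driven by two cooperating Agents:

┌────────────────────────────────────────────────────────────┐
│                        Router Agent                        │
│  Coordinates the whole system; owns the routing layer      │
│  - External interface management (FastAPI/MCP/SSE)         │
│  - Request dispatch and scheduling                         │
│  - Hot/cold scheduling, health monitoring                  │
│  - Middleware management (auth/caching/metering)           │
├────────────────────────────────────────────────────────────┤
│                         Tool Agent                         │
│  Maintains the tool library                                │
│  - Writes/acquires/deploys new tools                       │
│  - Staging validation and promote                          │
│  - Self-repair of reverse-engineered APIs                  │
│  - Finance (account registration/API top-ups)              │
└────────────────────────────────────────────────────────────┘

Role	Responsibility	Focus
Router Agent	Maintains the routing layer; manages external interfaces and internal dispatch	"How to call": scheduling, monitoring, traffic
Tool Agent	Maintains the tool library; acquires and produces tools	"What can be called": tool supply, quality, evolution

The two communicate through internal messages:

After the Tool Agent promotes a new tool, it notifies the Router Agent to update the registry and routing rules
When the Router Agent detects a tool failure, it notifies the Tool Agent to step in and repair it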

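3. External Interfaces

3.1 Tool catalog query

An external Agent requests the list of currently available tools, grouped by category, and picks from it.

3.2 Tool invocation

Once an external Agent has chosen a tool, it sends an invocation request; tool_agent starts the corresponding tool service, executes it, and returns the result.

3.3 Responding to new-tool requests

When an external Agent asks for a tool that is not in the library, tool_agent decides on its own how to acquire it:

Strategy	Description
A. Register for / purchase an API	Integrate directly with a third-party paid or free API
B. Local deployment	Pull an open-source project and deploy it locally or in Docker
C. Reverse-engineered API	Register an account, then reverse-engineer the target service's interface
D. Browser-Use	Operate a web-based tool through browser automation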

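4. Three-Layer Tool Library Architecture

┌────────────────────────────────────────────────────────────┐
│                   Requests from external Agents            │
└──────────────────┬─────────────────────────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────────────────────────┐
│                     Registry layer                         │
│  - Categorized tool registration                           │
│  - Unified interface wrapping (input/output spec)          │
│  - Tool metadata (usage, parameters, return values)        │
└──────────────────┬─────────────────────────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────────────────────────┐
│        Router layer (also the external gateway)            │
│  - External: FastAPI (HTTP+SSE) / MCP interfaces           │
│  - Request parsing and tool matching                       │
│  - Hot/cold scheduling (wake on demand/sleep on idle)      │
│  - Environment awareness (where, how to start/connect)     │
│  - Parameter conversion and passing                        │
│  - Middleware: auth / result cache / call metering         │
└──────────────────┬─────────────────────────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────────────────────────┐
│                     Runtime layer                          │
│  - Physical tool storage and management                    │
│  - Isolation (local process / Docker container)            │
│  - Lifecycle management (start/stop/health check)          │
│  - Resource quotas (CPU/memory/GPU memory limits)          │
└────────────────────────────────────────────────────────────┘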

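5. Interface Design

5.1 External interfaces (exposed directly by the router layer)

The router layer doubles as the gateway. It exposes only two fixed external entry points, both served on port 8001:

Protocol	Port	Purpose
FastAPI (HTTP)	8001	RESTful interface (including SSE streaming)
MCP (Model Context Protocol)	8001	Standard MCP Server that MCP-capable Agents can connect to directly

No separate WebSocket port is opened. Streaming responses use SSE, and asynchronous tasks use callbacks or polling, because:

SSE runs over plain HTTP as a one-way push stream, so clients are simple to implement and problems are easy to debug
External Agents are mostly short-lived processes, for which maintaining a long-lived WebSocket connection is a burden
If a clear need for multi-event long-lived connections emerges later, WebSocket can be added then

FastAPI routes

GET  /tools                     # get the tool catalog (by category)
GET  /tools/{tool_id}/schema    # get a single tool's input/output spec
POST /tools/{tool_id}/invoke    # invoke a tool (returns SSE when Accept: text/event-stream)
POST /tools/request             # submit a new-tool request; returns task_id
GET  /tasks/{task_id}/status    # poll asynchronous task status
GET  /health                    # service health check
GET  /openapi.json              # auto-exported OpenAPI definition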

Asynchronous tasks (new-tool requests)

// POST /tools/request
{
  "description": "Need an image compression tool",
  "callback_url": "http://caller-agent/callback"   // optional
}

// Response
{
  "task_id": "uuid",
  "status": "pending"
}

// When the task completes:
// Option A: tool_agent POSTs the result to callback_url
// Option B: the external Agent polls GET /tasks/{task_id}/status
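
The submit-then-poll flow above can be sketched with a small in-memory task store. This is only an illustration under stated assumptions: the names `TASKS`, `submit_tool_request`, `complete_task`, and `get_task_status` are hypothetical, as is the `"done"` status value (the document only shows `"pending"`), and a real service would persist tasks and perform the callback POST.

```python
import uuid
from typing import Optional

# Hypothetical in-memory task store; a real deployment would persist this.
TASKS: dict = {}


def submit_tool_request(description: str, callback_url: Optional[str] = None) -> dict:
    """Register a new-tool request and return the immediate response body."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {
        "description": description,
        "callback_url": callback_url,
        "status": "pending",
        "result": None,
    }
    return {"task_id": task_id, "status": "pending"}


def complete_task(task_id: str, result: dict) -> None:
    """Mark a task finished; option A would also POST result to callback_url here."""
    TASKS[task_id]["status"] = "done"  # assumed status name
    TASKS[task_id]["result"] = result


def get_task_status(task_id: str) -> dict:
    """Body for GET /tasks/{task_id}/status (option B, polling)."""
    task = TASKS[task_id]
    return {"task_id": task_id, "status": task["status"], "result": task["result"]}
```

A caller that supplies no callback_url simply keeps calling `get_task_status` until the status leaves `"pending"`.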

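MCP interface

Tools are exposed through the standard MCP Tool protocol: each registered tool is automatically mapped to one MCP Tool, so external Agents can use them directly via tools/list and tools/call.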
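
As a rough sketch of that mapping, a registry entry can be turned into an MCP tool descriptor with the `name`, `description`, and `inputSchema` fields that tools/list returns. The helper names `to_mcp_tool` and `list_mcp_tools` are hypothetical, and exposing only entries whose status is `active` is an assumption rather than something the design states.

```python
def to_mcp_tool(entry: dict) -> dict:
    """Map one registry entry to an MCP tool descriptor (as listed by tools/list)."""
    return {
        "name": entry["tool_id"],
        "description": entry.get("description", ""),
        # MCP expects a JSON Schema object; fall back to an empty object schema.
        "inputSchema": entry.get("input_schema") or {"type": "object", "properties": {}},
    }


def list_mcp_tools(registry: list) -> dict:
    """Build a tools/list result; only active tools are exposed (assumption)."""
    return {"tools": [to_mcp_tool(e) for e in registry if e.get("status") == "active"]}
```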

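5.2 Internal interfaces (layered by environment)

Each uv local tool has its own venv and communicates through a subprocess plus JSON over stdio (zero network overhead while keeping environments isolated). Docker tools communicate over HTTP through mapped ports.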

                    External Agent
                        │
              ┌─────────┴─────────┐
              ▼                   ▼
        :8001 FastAPI        :8001 MCP
        (HTTP + SSE)
              └─────────┬─────────┘
                        │ router layer dispatches by tool_id
              ┌─────────┴─────────┐
              ▼                   ▼
     ┌─ uv local tools ─┐   ┌── Docker tools ──┐
     │ subprocess+stdio │   │ HTTP port mapping│
     │ separate venvs   │   │  :9001  :9002    │
     │ tool_a  tool_c   │   │  tool_b  tool_d  │
     └──────────────────┘   └──────────────────┘

uv local tool interface spec (subprocess + stdio JSON)

Every uv tool must implement the same Python entry point, which reads the request from stdin and writes the result to stdout:

# tools/local/tool_a/main.py
import json
import sys


class Tool:
    name: str = "tool_a"
    description: str = "Tool description"

    def schema(self) -> dict:
        # Placeholder: return the tool's real input/output schema here
        return {"input": {}, "output": {}}

    def run(self, params: dict, stream: bool = False) -> dict:
        # Placeholder: the tool's main logic goes here
        return {"result": None}

    def health(self) -> bool:
        return True


if __name__ == "__main__":
    tool = Tool()
    # One JSON request per process: read all of stdin, reply once on stdout
    request = json.loads(sys.stdin.read())
    action = request.get("action", "run")

    if action == "schema":
        output = tool.schema()
    elif action == "health":
        output = {"healthy": tool.health()}
    else:
        output = tool.run(request.get("params", {}), request.get("stream", False))

    json.dump(output, sys.stdout)

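How the router layer calls it:

import json
import subprocess


def call_local_tool(tool_id: str, request: dict, timeout_s: float = 60.0) -> dict:
    # Run the tool inside its own uv-managed venv; the request goes in on stdin
    result = subprocess.run(
        ["uv", "run", "--directory", f"tools/local/{tool_id}", "python", "main.py"],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed default; keeps a stuck tool from hanging the router
    )
    if result.returncode != 0:
        # Report the tool's stderr instead of failing inside json.loads
        return {"status": "error", "error": result.stderr.strip()}
    return json.loads(result.stdout)

Each call starts a separate subprocess, so calls are isolated by construction
No ports or long-running processes to manage
Cold start adds some overhead; for high-frequency tools, consider reusing processes from a pool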

Docker tool interface spec (HTTP)

Because Docker tools run in an isolated environment, each must start an HTTP service inside its container:

POST /run          # run the tool's main logic (supports SSE streaming)
GET  /health       # health check
GET  /schema       # return the tool's own input/output schema

Request/response format:

// POST /run request
{
  "request_id": "uuid",
  "params": { },
  "stream": false
}

// POST /run response (non-streaming)
{
  "request_id": "uuid",
  "status": "success | error",
  "result": { },
  "error": "error message (optional)"
}

// POST /run response (streaming, Content-Type: text/event-stream)
data: {"request_id": "uuid", "chunk": "partial result...", "done": false}
data: {"request_id": "uuid", "chunk": "final result", "done": true}
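Router layer middleware

Middleware	Purpose
Result cache	Repeated calls with the same tool_id + params return the cached result directly; TTL is configurable
Call metering	Records each tool's call frequency, latency, and error rate for use in scheduling decisions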
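
The result cache row can be illustrated with a small TTL cache keyed on tool_id plus a hash of the parameters. This is a sketch under assumptions: the class name `ResultCache`, the 60-second default TTL, and the choice of SHA-256 over canonical JSON are all illustrative rather than part of the design.

```python
import hashlib
import json
import time


class ResultCache:
    """TTL cache keyed by tool_id plus a hash of the canonicalized params."""

    def __init__(self, ttl_s: float = 60.0):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (expires_at, result)

    @staticmethod
    def _key(tool_id: str, params: dict) -> str:
        # sort_keys makes logically equal params produce the same key
        blob = json.dumps(params, sort_keys=True, ensure_ascii=False)
        return f"{tool_id}:{hashlib.sha256(blob.encode('utf-8')).hexdigest()}"

    def get(self, tool_id: str, params: dict):
        key = self._key(tool_id, params)
        hit = self._store.get(key)
        if hit is None:
            return None
        expires_at, result = hit
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return result

    def put(self, tool_id: str, params: dict, result: dict) -> None:
        key = self._key(tool_id, params)
        self._store[key] = (time.monotonic() + self.ttl_s, result)
```

Canonicalizing with `sort_keys=True` matters because two callers may send the same parameters in a different key order.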

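5.3 Communication between environments

Scenario	Channel	Notes
uv tool ↔ router layer	Subprocess stdio JSON	`uv run` starts a separate venv; parameters pass over stdin/stdout, isolated by construction
Docker(conda) ↔ router layer	localhost:{mapped port} HTTP	Container port mapped to the host
Docker ↔ Docker	docker network	Containers on the same bridge can talk directly
uv ↔ Docker(conda)	Relayed by the router layer	No direct communication; everything goes through the router layer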

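5.4 Port allocation and service discovery

External (fixed):
  8001        - FastAPI (HTTP + SSE)
  8001        - MCP Server

Internal:
  uv local tools  - no port needed; subprocess stdio communication
  Docker tools    - mapped ports from 9001 upward, assigned in registration order
  9000            - router layer management port

Externally only port 8001 is exposed (FastAPI and MCP); external Agents only ever use these two entry points
uv tools occupy no ports; the router layer calls them through a subprocess over stdio
Docker tools are assigned a port at startup, which is written to the registry and reclaimed when the tool stops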
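
The allocate-on-start, reclaim-on-stop rule for Docker tool ports could look roughly like the allocator below. It is a sketch only: `PortAllocator` is a hypothetical name, reusing the lowest released port first is an assumption, and writing the port into the registry is left out.

```python
class PortAllocator:
    """Hands out Docker tool ports from 9001 upward and reuses released ones."""

    def __init__(self, start: int = 9001):
        self._next = start
        self._free = []      # released ports, reused lowest-first (assumption)
        self._by_tool = {}   # tool_id -> port currently assigned

    def allocate(self, tool_id: str) -> int:
        if tool_id in self._by_tool:
            return self._by_tool[tool_id]  # already running: keep its port
        if self._free:
            self._free.sort()
            port = self._free.pop(0)
        else:
            port = self._next
            self._next += 1
        self._by_tool[tool_id] = port
        return port

    def release(self, tool_id: str) -> None:
        port = self._by_tool.pop(tool_id, None)
        if port is not None:
            self._free.append(port)
```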

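Alternative: Unix Domain Sockets (UDS)

For same-machine communication, UDS can replace TCP ports:

Path convention: /tmp/tool_agent/sockets/{tool_id}.sock
Advantages: uses no port numbers, avoids network protocol stack overhead, and is faster
Suited to uv local tools; Docker tools still need port mapping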

Registry enhancements

Each tool entry in registry.json gains scheduling-related fields:

{
  "tool_id": "tool_a",
  "port": 9001,
  "socket": "/tmp/tool_agent/sockets/tool_a.sock",
  "last_used_time": "2026-03-19T10:00:00Z",
  "call_count": 42,
  "avg_latency_ms": 120,
  "state": "running | sleeping | stopped"
}

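5.5 Deployment Design

Main service (tool_agent itself)

Runtime: a Python virtual environment managed by uv
Start command: uv run python -m tool_agent
Contains: router layer (external interfaces + internal dispatch) + MCP Server + Agent logic

uv local tools

One directory per tool, each with its own pyproject.toml
Started by the router layer as a subprocess via uv run
Suited to lightweight tools and pure-Python tools

Docker tools (conda environment)

One Dockerfile per tool, with conda managing dependencies inside the container
Containers are started and stopped by the router layer through the Docker SDK
Suited to dependency-heavy tools, GPU tools, and tools that need special system libraries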

tools/
├── local/                    # uv local tools (production)
│   ├── tool_a/
│   │   ├── pyproject.toml
│   │   ├── main.py           # implements the run / health / schema actions
│   │   └── ...
│   └── tool_c/
├── docker/                   # Docker tools (production)
│   ├── tool_b/
│   │   ├── Dockerfile
│   │   ├── environment.yml   # conda environment definition
│   │   ├── main.py
│   │   └── ...
│   └── tool_d/
├── staging/                  # staging area (tools newly written by the Agent land here first)
│   └── tool_new/
│       ├── main.py
│       └── test_tool.py      # test script generated automatically by the Agent
└── registry.json             # tool registry

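5.6 Hot/Cold Scheduling

The router layer manages the lifecycle of tool processes automatically, based on tool type and usage frequency:

Type	Policy	Description
Cold tools (uv local)	Wake on demand	Started with `uv run` when a request arrives; the process is killed after an idle timeout to free memory
Hot tools (Docker)	Resident + eviction	High-frequency tools keep their containers running; when GPU memory or RAM is full, low-frequency containers are evicted by LRU
API proxy	Stateless	No process to manage; requests are forwarded directly

Scheduling parameters (configurable):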

{
  "cold_tool_idle_timeout_s": 300,
  "hot_tool_max_containers": 5,
  "eviction_policy": "lru"
}
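
To make the `hot_tool_max_containers` and `eviction_policy: "lru"` settings concrete, here is a minimal sketch of LRU bookkeeping for hot containers. `HotToolPool` and `touch` are hypothetical names, and the sketch only decides which tool_ids to evict; actually stopping containers (and reacting to memory pressure rather than a simple count) is outside its scope.

```python
from collections import OrderedDict


class HotToolPool:
    """Tracks running containers in LRU order, capped at max_containers."""

    def __init__(self, max_containers: int = 5):
        self.max_containers = max_containers
        self._running = OrderedDict()  # tool_id -> None, oldest use first

    def touch(self, tool_id: str) -> list:
        """Record a call to tool_id; return the tool_ids evicted to make room."""
        if tool_id in self._running:
            self._running.move_to_end(tool_id)  # now the most recently used
            return []
        evicted = []
        while len(self._running) >= self.max_containers:
            victim, _ = self._running.popitem(last=False)  # least recently used
            evicted.append(victim)
        self._running[tool_id] = None
        return evicted
```

The scheduler would call `touch` on every invocation and stop whichever containers come back in the returned list.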

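5.7 Staging Environment

New tools written by the Agent do not go straight into the production directories; they must first pass staging validation:

Agent writes code → staging/ directory
       ↓
Router layer assigns a temporary port and starts the tool
       ↓
Agent writes and runs the test script
       ↓
  ┌─ Pass → promote to local/ or docker/ and register in the registry
  └─ Fail → Agent fixes and retries, or the tool is marked failed pending manual intervention

Security audit: before promotion, Agent-generated code must pass a basic security check (dangerous system calls such as rm -rf, os.system, and the like are forbidden).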
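
One way to approximate that basic security check is a static scan of the generated source with Python's `ast` module. The sketch below is illustrative only: the denylist `BANNED_CALLS` is an assumption (the design names only rm -rf and os.system as examples), and a scan like this is easy to evade through aliasing or dynamic imports, so it would complement rather than replace sandboxed execution.

```python
import ast

# Illustrative denylist only; the real rule set is not specified in this design.
BANNED_CALLS = {"os.system", "os.popen", "eval", "exec"}


def _dotted_name(node: ast.AST) -> str:
    """Return 'os.system' for attribute chains, 'eval' for bare names, else ''."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def audit_source(source: str) -> list:
    """Return 'line N: call' findings; an empty list means the check passed."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BANNED_CALLS:
                findings.append(f"line {node.lineno}: {name}")
    return findings
```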

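5.8 Self-Repair of Reverse-Engineered APIs

When a tool integrated through a reverse-engineered API fails, an automatic repair loop is triggered:

Tool returns an error (403 / signature expired / interface changed)
       ↓
Router layer marks the tool state as degraded
       ↓
Tool Agent is notified to step in
       ↓
  ┌─ Try Browser-Use to recapture traffic and update interface parameters
  ├─ Try switching to a fallback API strategy
  └─ All fail → mark inactive and notify the caller to degrade

This is transparent to external callers: they see only a brief delay or a degradation notice and need not be aware of the internal strategy switch.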
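
The repair loop above amounts to trying strategies in order and recording the outcome. A minimal sketch follows, assuming each strategy is a callable that returns True on success; `repair_tool` and the `repaired_by` field are hypothetical, and the sketch folds the degraded/active/inactive values into a single `status` field for brevity.

```python
from typing import Callable, Iterable, Tuple


def repair_tool(tool: dict, strategies: Iterable[Tuple[str, Callable[[dict], bool]]]) -> dict:
    """Try each repair strategy in order; end as active or inactive."""
    tool["status"] = "degraded"
    for name, attempt in strategies:
        try:
            if attempt(tool):
                tool["status"] = "active"
                tool["repaired_by"] = name  # hypothetical bookkeeping field
                return tool
        except Exception:
            continue  # a crashing strategy counts as a failed attempt
    tool["status"] = "inactive"  # every strategy failed: callers must degrade
    return tool
```

In this shape, "Browser-Use recapture" and "switch to fallback API" are just two entries in the `strategies` list, so new repair strategies can be added without touching the loop.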

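6. Dual-Agent Architecture

6.1 Router Agent (maintains the routing layer)

A resident process that coordinates the whole system; it is the system's "brain".

Capability	Description
External interface management	Maintains the FastAPI / MCP services (HTTP + SSE) and handles external requests
Request routing	Parses requests, matches tools, and dispatches to uv subprocesses or Docker containers
Hot/cold scheduling	Wakes cold tools on demand, evicts hot tools by LRU, and manages process/container lifecycles
Health monitoring	Checks tool status periodically and notifies the Tool Agent to repair failures
Middleware management	Auth, result caching, call metering, traffic control
Registry maintenance	Receives promote notifications from the Tool Agent and updates routing rules

6.2 Tool Agent (maintains the tool library)

Woken on demand or kept resident; responsible for "producing" and "repairing" tools.

Capability	Description
Engineering and coding	Has its own workspace (staging/) and writes tool code, test scripts, and interface adapters on its own
Code audit	Runs security checks on generated code (dangerous system calls, SQL injection, and so on)
Finance	Registers third-party accounts, manages API keys, and tops up or subscribes to paid APIs automatically
Decision reasoning	Analyzes tool requests and picks the best acquisition strategy (API / deployment / reverse engineering / browser)
Browser-Use	Browser automation for web operations, information gathering, account registration, and traffic capture for reverse engineering
Sub-Agent scheduling	Distributes subtasks to specialized sub-Agents for parallel processing
Self-repair	Receives failure notifications from the Router Agent and repairs broken tools automatically
Knowledge maintenance	Summarizes structural changes to the tool library as they happen and maintains metadata for each layer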

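6.3 Dual-Agent Collaboration Flows

External Agent requests a new tool
       │
       ▼
Router Agent receives the request and finds no such tool in the registry
       │
       ▼ notify
Tool Agent steps in → chooses an acquisition strategy → codes/deploys/purchases
       │
       ▼
Staging validation passes → promote
       │
       ▼ notify
Router Agent updates the registry + routing rules
       │
       ▼
Router Agent replies to the external Agent: the tool is ready

Router Agent health check finds a tool failure
       │
       ▼ notify
Tool Agent steps in → self-repair (recapture traffic / switch strategy)
       │
       ▼ repair-complete notification
Router Agent updates the tool status to active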

6.4 Financial Management (Tool Agent)

Acquiring new tools may involve payments, so the Agent needs a set of financial controls:

{
  "budget": {
    "monthly_limit_usd": 100,
    "single_tx_limit_usd": 20,
    "require_approval_above_usd": 10,
    "spent_this_month_usd": 0
  },
  "accounts": [
    {
      "provider": "openai",
      "api_key": "sk-***",
      "balance_usd": 50,
      "auto_recharge": false
    }
  ]
}

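At or below the approval threshold (require_approval_above_usd): the Agent completes registration or top-up on its own
Above the approval threshold: the operation is paused and resumes only after the user approves it
All spending is recorded in billing_log.json for traceability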
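
The budget rules can be expressed as one small decision function over the `budget` object shown above. This is a sketch under assumptions: `check_spend` and its return values are hypothetical, and rejecting payments that exceed `single_tx_limit_usd` or would push the month over `monthly_limit_usd` is one plausible reading, since the design does not say what happens in those cases.

```python
def check_spend(budget: dict, amount_usd: float) -> str:
    """Return 'auto', 'needs_approval', or 'rejected' for a proposed payment."""
    if amount_usd > budget["single_tx_limit_usd"]:
        return "rejected"  # assumption: hard cap per transaction
    if budget["spent_this_month_usd"] + amount_usd > budget["monthly_limit_usd"]:
        return "rejected"  # assumption: hard cap per month
    if amount_usd > budget["require_approval_above_usd"]:
        return "needs_approval"  # pause and wait for the user
    return "auto"  # the Agent may proceed on its own
```

With the sample values (approval above 10, single-transaction limit 20), a 15 USD top-up waits for approval while a 5 USD one goes through automatically.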

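6.5 Independent Coding Workspace (Tool Agent)

The full loop the Agent follows when writing code:

staging/
├── {task_id}/              # one isolated directory per coding task
│   ├── main.py             # tool code written by the Agent
│   ├── test_main.py        # test script written by the Agent
│   ├── pyproject.toml      # dependency declarations
│   └── run_log.txt         # test run log

Workflow:

1. Create the task directory staging/{task_id}/
2. The Agent writes the code and the test script
3. Run the tests in an isolated environment with uv run pytest
4. Tests fail → Agent reads the log → fixes → reruns the tests (at most N rounds)
5. Tests pass → security audit → promote to the production directory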
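
Steps 3 to 5 form a bounded test-and-fix loop, which can be sketched as below. The function `build_until_green` and its return values are hypothetical, and the default of three rounds is only a stand-in for the unspecified N. In practice `run_tests` would wrap `uv run pytest` in the task directory and `fix` would hand the log back to the Agent.

```python
from typing import Callable, Tuple


def build_until_green(
    run_tests: Callable[[], Tuple[bool, str]],
    fix: Callable[[str], None],
    max_rounds: int = 3,
) -> str:
    """Run tests, feed the log to fix() on failure, give up after max_rounds."""
    for _ in range(max_rounds):
        passed, log = run_tests()
        if passed:
            return "ready_for_audit"  # next step: security audit, then promote
        fix(log)                      # Agent reads the log and patches the code
    return "failed"                   # left for manual intervention
```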

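6.6 Run Modes

Router Agent: resident process, always online, serving HTTP + SSE (no long-lived WebSocket connection, per section 5.1)
Tool Agent: woken on demand (started when a new-tool request or failure notification arrives); may also stay resident
The two communicate through an in-process message queue, with no extra network overhead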

7. Data Model (tool metadata)

{
  "tool_id": "string",
  "name": "tool name",
  "category": "category tag",
  "description": "what the tool does",
  "input_schema": {},
  "output_schema": {},
  "stream_support": false,
  "runtime": {
    "type": "local | docker | api | browser",
    "entry": "startup entry point",
    "env": {},
    "resource_limits": {
      "cpu": "1.0",
      "memory_mb": 512,
      "gpu": false
    }
  },
  "scheduling": {
    "mode": "cold | hot | stateless",
    "idle_timeout_s": 300,
    "state": "running | sleeping | stopped | degraded"
  },
  "stats": {
    "last_used_time": "ISO8601",
    "call_count": 0,
    "avg_latency_ms": 0,
    "error_rate": 0.0
  },
  "fallback_strategy": ["api", "browser"],
  "status": "active | inactive | staging | building"
}

8. Communication Protocols

8.1 External protocol (HTTP + SSE)

// Synchronous call: POST /tools/{tool_id}/invoke
// Request
{
  "params": { },
  "stream": false
}

// Response (non-streaming)
{
  "status": "success | error",
  "result": { }
}

// Response (streaming, Accept: text/event-stream)
data: {"chunk": "partial result...", "done": false}
data: {"chunk": "final result", "done": true}

8.2 Internal protocol (dual-Agent message bus)

The Router Agent and Tool Agent communicate through asyncio.Queue, with zero network overhead:

# Message format
{
  "type": "tool_request | tool_ready | tool_error | health_alert",
  "payload": { }
}
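
A message bus of this kind could be as small as one `asyncio.Queue` inbox per Agent. The sketch below is an assumption about how `MessageBus` might look, not the project's actual class: the `send`/`receive` methods, the `"router"`/`"tool"` inbox names, and the `demo` coroutine are all hypothetical. Only the message shape (`type` plus `payload`) comes from the format above.

```python
import asyncio


class MessageBus:
    """One asyncio.Queue inbox per Agent (hypothetical interface)."""

    def __init__(self):
        self._inboxes = {"router": asyncio.Queue(), "tool": asyncio.Queue()}

    async def send(self, to: str, msg_type: str, payload: dict) -> None:
        await self._inboxes[to].put({"type": msg_type, "payload": payload})

    async def receive(self, who: str) -> dict:
        return await self._inboxes[who].get()


async def demo() -> list:
    bus = MessageBus()
    # Router Agent asks for a missing tool...
    await bus.send("tool", "tool_request", {"description": "image compression"})
    request = await bus.receive("tool")
    # ...and the Tool Agent reports back once it has been promoted.
    await bus.send("router", "tool_ready", {"tool_id": "img_compress"})
    reply = await bus.receive("router")
    return [request["type"], reply["type"]]
```

Running `asyncio.run(demo())` exchanges one `tool_request` and one `tool_ready` message without any sockets involved, which is the property the design relies on.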

9. Project Code Structure

tool_agent/
├── pyproject.toml                  # main project dependencies (fastapi, uvicorn, docker, mcp-sdk, etc.)
├── README.md
│
├── src/
│   └── tool_agent/
│       ├── __init__.py
│       ├── __main__.py             # entry point: uv run python -m tool_agent
│       ├── config.py               # global config (ports, scheduling parameters, budget, etc.)
│       │
│       ├── router/                 # Router Agent: routing layer
│       │   ├── __init__.py
│       │   ├── agent.py            # Router Agent main logic (scheduling decisions, health monitoring)
│       │   ├── server.py           # FastAPI app definition + route registration + SSE streaming
│       │   ├── mcp_server.py       # MCP Server adapter layer
│       │   ├── dispatcher.py       # request dispatch: route by tool_id to uv subprocess / Docker
│       │   ├── scheduler.py        # hot/cold scheduling (wake/sleep/LRU eviction)
│       │   ├── middleware/
│       │   │   ├── __init__.py
│       │   │   ├── auth.py         # authentication
│       │   │   ├── cache.py        # result cache
│       │   │   └── metrics.py      # call metering
│       │   └── health.py           # tool health checks
│       │
│       ├── tool/                   # Tool Agent: tool library maintenance
│       │   ├── __init__.py
│       │   ├── agent.py            # Tool Agent main logic (decisions, orchestration)
│       │   ├── builder.py          # tool coding: generates code + test scripts
│       │   ├── deployer.py         # tool deployment: uv init / docker build
│       │   ├── promoter.py         # staging → production promote flow
│       │   ├── auditor.py          # code security audit
│       │   ├── repairer.py         # self-repair of reverse-engineered APIs
│       │   ├── browser.py          # Browser-Use capability wrapper
│       │   └── finance.py          # financial management (account registration/top-ups/budget)
│       │
│       ├── registry/               # registry layer
│       │   ├── __init__.py
│       │   ├── registry.py         # registry CRUD (reads/writes registry.json)
│       │   ├── catalog.py          # tool catalog (organized by category, searchable)
│       │   └── schema.py           # tool metadata schema definitions (Pydantic models)
│       │
│       ├── runtime/                # runtime layer
│       │   ├── __init__.py
│       │   ├── local_runner.py     # uv subprocess calls (subprocess + stdio JSON)
│       │   ├── docker_runner.py    # Docker container management (Docker SDK)
│       │   ├── api_proxy.py        # external API proxy forwarding
│       │   └── resource.py         # resource quota management (CPU/memory/GPU memory)
│       │
│       ├── messaging.py            # internal dual-Agent message queue (asyncio.Queue)
│       └── models.py               # shared data models (request/response/tool metadata)
│
├── tools/                          # tool library (kept separate from the main project code)
│   ├── local/                      # uv local tools (production)
│   │   └── example_tool/
│   │       ├── pyproject.toml
│   │       └── main.py
│   ├── docker/                     # Docker tools (production)
│   │   └── example_gpu_tool/
│   │       ├── Dockerfile
│   │       ├── environment.yml
│   │       └── main.py
│   └── staging/                    # staging area
│       └── .gitkeep
│
├── data/                           # runtime data
│   ├── registry.json               # tool registry
│   ├── billing_log.json            # spending records
│   └── config.json                 # runtime config overrides
│
└── tests/
    ├── test_dispatcher.py
    ├── test_registry.py
    ├── test_scheduler.py
    └── test_local_runner.py

9.1 Module Responsibility Map

Directory	Owner	Responsibility
`src/tool_agent/router/`	Router Agent	External interfaces, request dispatch, scheduling, middleware, health monitoring
`src/tool_agent/tool/`	Tool Agent	Tool coding, deployment, audit, repair, finance, browser-use
`src/tool_agent/registry/`	Shared	Registry reads and writes; both Agents access it
`src/tool_agent/runtime/`	Called by Router Agent	The runtime that actually executes tools (subprocess/Docker/API proxy)
`tools/`	Maintained by Tool Agent	The tool code itself, decoupled from the main project
`data/`	Shared	Runtime state data

9.2 Key Entry Point

# src/tool_agent/__main__.py
import asyncio
from tool_agent.router.agent import RouterAgent
from tool_agent.tool.agent import ToolAgent
from tool_agent.messaging import MessageBus

async def main():
    bus = MessageBus()
    router_agent = RouterAgent(bus)
    tool_agent = ToolAgent(bus)

    await asyncio.gather(
        router_agent.start(),   # start FastAPI/MCP (HTTP + SSE) + the scheduling loop
        tool_agent.start(),     # listen on the message queue and handle tasks on demand
    )

if __name__ == "__main__":
    asyncio.run(main())

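10. Milestone Plan

Phase	Content	Description
Phase 1	Base framework	Router layer (FastAPI + MCP + SSE) + registry layer + tool catalog query
Phase 2	Routing and execution	Router layer implementation + hot/cold scheduling + end-to-end local tool invocation path
Phase 3	Runtime layer	Docker isolation + resource quotas + tool lifecycle management
Phase 4	Staging loop	Staging environment + code audit + automated testing + promote flow
Phase 5	Agent intelligence	Automatic acquisition of new tools (API / deployment / reverse engineering / browser-use) + self-repair
Phase 6	Self-maintenance	Knowledge summarization, health checks, automatic tool updates, call caching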

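11. Risks and Mitigations

Risk	Impact	Mitigation
Reverse-engineered APIs are unstable	Tool unavailable	Self-repair loop: Browser-Use recapture → fallback strategy switch → degradation notice
Docker environments consume resources	Local performance	Hot/cold scheduling + resource quotas (CPU / memory / GPU memory caps)
Dependency conflicts between tools	Environment pollution	Strict isolation; each tool gets its own environment
Malicious calls from external Agents	Security risk	Authentication + call rate limiting
Agent generates dangerous code	System security	Staging validation + code security audit
Port fragmentation	Management complexity	UDS alternative + port reclamation + LRU eviction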
