Prompt Caching 测试

测试 Anthropic Prompt Caching 功能的命中率和效果。

功能说明

这个测试用例会：

执行多次简单的工具调用（bash_command）
产生大量的 assistant 和 tool 消息
观察缓存的创建和命中情况
统计缓存命中率和节省的成本

使用方法

1. 配置环境变量

export OPENROUTER_API_KEY="your-api-key"

2. 运行测试

cd examples/cache-test
python3 run.py

3. 查看结果

测试完成后会显示：

总消息数
总 tokens
Cache creation tokens（创建缓存的 tokens）
Cache read tokens（从缓存读取的 tokens）
缓存命中率
估算节省的成本

4. 分析详细数据

# 查看 trace 数据
cd .trace/<trace-id>

# 分析缓存命中情况
python3 << 'EOF'
import json

print("seq | role | cache_creation | cache_read | hit_rate")
print("-" * 70)

with open("events.jsonl") as f:
    for line in f:
        event = json.loads(line)
        if event.get("event") == "message_added":
            msg = event["message"]
            if msg.get("role") == "assistant":
                seq = msg["sequence"]
                role = msg["role"]
                creation = msg.get("cache_creation_tokens", 0)
                read = msg.get("cache_read_tokens", 0)
                prompt = msg.get("prompt_tokens", 1)
                rate = f"{read/prompt*100:.1f}%" if prompt > 0 else "0%"
                print(f"{seq:3d} | {role:9s} | {creation:14,d} | {read:10,d} | {rate:>8s}")
EOF

预期结果

修改前（按 user/assistant 计数）

缓存点稀疏（每 20 条 user/assistant）
缓存命中率：10-15%
大量 tool 消息的 tokens 未被缓存

修改后（按总消息数计数）

缓存点密集（每 15 条总消息）
缓存命中率：30-50%
tool 消息也被包含在缓存范围内

缓存策略

当前策略：

System message（如果 > 1000 字符）
每 15 条总消息，找最近的 user/assistant 添加缓存点
最多 4 个缓存点（含 system）

注意事项

缓存是 ephemeral 的，5 分钟后过期
只有 Claude 模型支持 Prompt Caching
Tool 消息不能直接添加 cache_control，但会被包含在缓存范围内
Level 2 压缩会导致缓存失效（这是已知问题）

README.md 2.3 KB История Исходник

Prompt Caching 测试

功能说明

使用方法

1. 配置环境变量

2. 运行测试

3. 查看结果

4. 分析详细数据

预期结果

修改前（按 user/assistant 计数）

修改后（按总消息数计数）

缓存策略

注意事项

README.md 2.3 KB

История Исходник