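This document records proposed optimizations to the knowledge base's feedback mechanism and scale management.

Discussion date: 2026-03-17. Status: proposal stage, to be implemented after review.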
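Existing structure:

- eval field: score (1-5), helpful/harmful counts, confidence, history
- knowledge_update, knowledge_batch_update
- min_score filtering, knowledge evolution (evolve_feedback)

Existing problems:

Problems with the current slim mechanism:

Causes of knowledge base bloat: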
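
| Relation type | Description | Handling strategy |
|---|---|---|
| duplicate | Fully redundant; only the wording differs slightly | Skip saving |
| subset | The new knowledge is a special case or part of existing knowledge | Skip saving, or attach it as an example |
| superset | The new knowledge is more complete and covers existing knowledge | Save the new knowledge, deprecate the old |
| conflict | The two entries give contradictory advice | Save but flag the conflict; requires human review |
| complement | Related but not redundant; the entries complement each other | Save and create a relation link |
| independent | The two entries are unrelated | Save directly |
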
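
The handling strategies in the table can be expressed as a small lookup. The sketch below is illustrative only: the `Relation` enum, the `STRATEGY` mapping, the action names and `plan_save` are hypothetical and not taken from the existing code. For `subset` it picks the "skip" alternative of the two options listed.

```python
from enum import Enum
from typing import Tuple


class Relation(str, Enum):
    DUPLICATE = "duplicate"
    SUBSET = "subset"
    SUPERSET = "superset"
    CONFLICT = "conflict"
    COMPLEMENT = "complement"
    INDEPENDENT = "independent"


# Hypothetical action names; each tuple mirrors one row of the table.
STRATEGY = {
    Relation.DUPLICATE: ("skip",),
    Relation.SUBSET: ("skip",),
    Relation.SUPERSET: ("save", "deprecate_old"),
    Relation.CONFLICT: ("save", "flag_conflict", "request_human_review"),
    Relation.COMPLEMENT: ("save", "link"),
    Relation.INDEPENDENT: ("save",),
}


def plan_save(relation: str) -> Tuple[str, ...]:
    """Return the ordered actions for a relation type given as a plain string."""
    return STRATEGY[Relation(relation)]
```

Keeping the policy in one table-like structure means a change of strategy (for example, attaching subsets as examples instead of skipping them) touches a single entry.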
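
```
Layer 1: vector similarity search (fast filter)
  ↓ no similar knowledge → save directly
  ↓ similar knowledge found
Layer 2: rule-based checks (free)
  - identical task + content overlap > 90% → skip
  - identical content → skip
  ↓ rules cannot decide
Layer 3: LLM judgment (borderline cases only)
  - called only when similarity > 0.85
  - uses gemini-2.5-flash-lite
```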
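
The three layers can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not the project's implementation: `check_before_save`, `token_overlap` and the `llm_judge` callback are hypothetical names, whitespace-token Jaccard overlap stands in for "content overlap" (a real tokenizer would be needed for Chinese text), and saving when similarity is at or below 0.85 is an assumption the flow does not spell out.

```python
from typing import Callable, Optional


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens; a stand-in for 'content overlap'."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def check_before_save(
    new: dict,
    nearest: Optional[dict],
    similarity: float,
    llm_judge: Callable[[dict, dict], str],
    llm_threshold: float = 0.85,
) -> str:
    """Return 'save', 'skip', or whatever relation label the LLM callback returns."""
    # Layer 1: the vector search found nothing similar, so save directly.
    if nearest is None:
        return "save"
    # Layer 2: free rule-based checks.
    if new["content"] == nearest["content"]:
        return "skip"
    same_task = new["task"] == nearest["task"]
    if same_task and token_overlap(new["content"], nearest["content"]) > 0.90:
        return "skip"
    # Layer 3: only borderline cases above the threshold reach the LLM.
    if similarity > llm_threshold:
        return llm_judge(new, nearest)
    # Assumption: below the threshold the entries are treated as independent.
    return "save"
```

Passing the LLM call in as a callback keeps the rule layers testable without network access.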

Cost estimate:

- agent/tools/builtin/knowledge.py:knowledge_save - check before saving
- knowhub/server.py:analyze_knowledge_relation - relation analysis
- knowhub/server.py:handle_knowledge_relation - relation handling

```python
{
    "eval": {
        "score": 4.2,  # weighted overall score (computed automatically)
        "confidence": 0.9,
        # Per-source statistics
        "feedback_by_source": {
            "human": {
                "helpful": 3,
                "harmful": 0,
                "weight": 1.0,  # highest weight
                "last_feedback": "2026-03-17"
            },
            "agent_explicit": {
                "helpful": 12,
                "harmful": 2,
                "weight": 0.6,  # medium weight
                "last_feedback": "2026-03-17"
            },
            "task_outcome": {
                "success": 45,
                "failure": 5,
                "weight": 0.3,  # lowest weight (attribution is unclear)
                "last_feedback": "2026-03-17"
            }
        },
        # Detailed history (keeps the source tag)
        "feedback_history": [
            {
                "source": "human",
                "type": "helpful",
                "comment": "Very accurate",
                "timestamp": "2026-03-17",
                "user_id": "user@example.com"
            }
        ]
    }
}
```
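
```python
def calculate_weighted_score(feedback_by_source):
    """Compute the overall score, weighting each feedback source."""
    total_weight = 0
    weighted_sum = 0
    for source, data in feedback_by_source.items():
        # task_outcome stores success/failure rather than helpful/harmful;
        # they are assumed here to play the same roles.
        helpful = data.get("helpful", data.get("success", 0))
        harmful = data.get("harmful", data.get("failure", 0))
        weight = data["weight"]
        if helpful + harmful == 0:
            continue
        # Positive ratio
        positive_ratio = helpful / (helpful + harmful)
        # Confidence: more feedback is more trustworthy (saturates at 10 entries)
        confidence = min(1.0, (helpful + harmful) / 10)
        # Score for this source: 3 + 2 * (positive ratio - 0.5), i.e. between 2 and 4
        source_score = 3 + 2 * (positive_ratio - 0.5)
        # Weighted accumulation
        weighted_sum += source_score * weight * confidence
        total_weight += weight * confidence
    # With no feedback at all, fall back to the neutral score 3.0
    if total_weight == 0:
        return 3.0
    return max(1.0, min(5.0, weighted_sum / total_weight))
```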
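
Written as a formula, the function above computes a confidence-weighted mean of per-source scores. In the block below, $h_s$ and $m_s$ are the helpful and harmful counts of source $s$ and $w_s$ is its weight. The worked numbers use the sample record shown earlier and assume that `success`/`failure` in `task_outcome` count as helpful/harmful, which the source does not state explicitly.

```latex
\[
\text{score} = \frac{\sum_{s} w_s\, c_s\, \bigl(3 + 2\,(r_s - 0.5)\bigr)}{\sum_{s} w_s\, c_s},
\qquad
r_s = \frac{h_s}{h_s + m_s},
\qquad
c_s = \min\!\left(1, \frac{h_s + m_s}{10}\right)
\]

% human:          r = 3/3   = 1,      c = 0.3, per-source score = 4
% agent_explicit: r = 12/14 ~ 0.857,  c = 1,   per-source score ~ 3.714
% task_outcome:   r = 45/50 = 0.9,    c = 1,   per-source score = 3.8
\[
\text{score} = \frac{1.0 \cdot 0.3 \cdot 4 \;+\; 0.6 \cdot 1 \cdot 3.714 \;+\; 0.3 \cdot 1 \cdot 3.8}{0.3 + 0.6 + 0.3}
= \frac{4.569}{1.2} \approx 3.81
\]
```

Each per-source score lies between 2 and 4, so the combined score does too. The 4.2 in the sample record therefore appears to be an illustrative value rather than one computed from those counts.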

- knowhub/server.py:update_knowledge - update the scoring logic
- knowhub/server.py:calculate_weighted_score - score calculation
- agent/tools/builtin/knowledge.py:knowledge_feedback - new human feedback tool

```
active → stable → cold → archived
  ↓
deprecated
```
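
```python
from datetime import datetime, timezone


def calculate_state(knowledge, now=None):
    """Derive the lifecycle state from recency and usage count."""
    # Assumes knowledge["last_used"] is a timezone-aware datetime
    now = now or datetime.now(timezone.utc)
    days_since_last_use = (now - knowledge["last_used"]).days
    usage_count = knowledge["implicit_feedback"]["search_count"]
    if days_since_last_use > 180 and usage_count < 5:
        return "archived"  # unused for six months and rarely used → archived
    elif days_since_last_use > 90:
        return "cold"  # unused for three months → cold
    elif usage_count > 20:
        return "active"  # frequently used → active
    else:
        return "stable"  # default: stable
```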
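
- active + stable
- include_cold=true extends the search to cold knowledge
- archived and deprecated entries are excluded from search but remain accessible by ID

```python
{
    "state": "active",  # active/stable/cold/archived/deprecated
    "state_reason": "",  # reason for the state change
    "state_updated_at": "2026-03-17T12:00:00Z"
}
```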
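
The search-scope rules above can be turned into a filter string. The sketch below is an assumption-laden illustration: `build_state_filter` is a hypothetical helper, and the expression is written in the style of a Milvus boolean filter (`field in [...]`) on the `state` field from the schema; whether the deployed vector store accepts exactly this form is not confirmed by the source.

```python
SEARCHABLE_DEFAULT = ("active", "stable")


def build_state_filter(include_cold: bool = False) -> str:
    """Build a state filter; archived and deprecated are never included."""
    states = list(SEARCHABLE_DEFAULT)
    if include_cold:
        states.append("cold")
    quoted = ", ".join(f'"{s}"' for s in states)
    return f"state in [{quoted}]"
```

Lookups by ID would bypass this filter, which is how archived and deprecated entries stay reachable.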
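
- knowhub/server.py:update_knowledge_states - background task, updates states daily
- knowhub/server.py:search_knowledge_api - filters by state at search time
- knowhub/vector_store.py - adds a state filter to Milvus queries

Knowledge with score < 2 and harmful > helpful is marked as deprecated rather than deleted, so it can be restored.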
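
```python
from datetime import datetime, timezone


async def prune_low_quality():
    """Periodically deprecate low-quality knowledge."""
    # Assumes k["created_at"] is a timezone-aware datetime
    now = datetime.now(timezone.utc)
    low_quality = milvus_store.query(
        filter_expr='eval["score"] < 2 and eval["harmful"] > eval["helpful"]'
    )
    for k in low_quality:
        age_days = (now - k["created_at"]).days
        # Only entries older than 30 days are deprecated
        if age_days > 30:
            await knowledge_update(
                knowledge_id=k["id"],
                metadata={
                    "state": "deprecated",
                    "state_reason": "low_quality",
                    "deprecated_at": now
                }
            )
```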

- knowhub/server.py:prune_low_quality - background task, runs daily

```python
{
    "relations": [
        {
            "target_id": "knowledge-20260310-c3d4",
            "relation_type": "complement",  # duplicate/subset/superset/conflict/complement
            "direction": "bidirectional",  # bidirectional/outgoing/incoming
            "confidence": 0.95,
            "reason": "The two entries complement each other and cover different scenarios",
            "created_at": "2026-03-17T12:00:00Z",
            "created_by": "system",  # system/human/agent
            "action_taken": ""  # optional: deprecated_target/merged/etc
        }
    ]
}
```
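
| Relation type | Direction | Description |
|---|---|---|
| complement | Bidirectional | Complementary; a link is created in both directions |
| duplicate | Bidirectional | Fully redundant |
| subset | One-way | This knowledge is a subset of the target |
| superset | One-way | This knowledge is a superset of the target |
| conflict | Bidirectional | Conflicting |
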
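
One way to honor the directions in the table is to derive both sides of a link from a single call. The sketch below is hypothetical: `build_links`, `INVERSE` and `BIDIRECTIONAL` are illustrative names, and storing a mirrored entry on the target for one-way relations (with `subset` and `superset` swapped) is an assumption rather than something the source specifies.

```python
# Relation seen from the target's side; symmetric types map to themselves.
INVERSE = {
    "complement": "complement",
    "duplicate": "duplicate",
    "conflict": "conflict",
    "subset": "superset",
    "superset": "subset",
}
BIDIRECTIONAL = {"complement", "duplicate", "conflict"}


def build_links(source_id: str, target_id: str, relation_type: str) -> dict:
    """Return the relation entry to store on each side, keyed by knowledge ID."""
    if relation_type in BIDIRECTIONAL:
        src_dir = tgt_dir = "bidirectional"
    else:
        src_dir, tgt_dir = "outgoing", "incoming"
    return {
        source_id: {
            "target_id": target_id,
            "relation_type": relation_type,
            "direction": src_dir,
        },
        target_id: {
            "target_id": source_id,
            "relation_type": INVERSE[relation_type],
            "direction": tgt_dir,
        },
    }
```

For example, `build_links("k1", "k2", "subset")` records `k1` as an outgoing subset of `k2` and `k2` as an incoming superset of `k1`.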
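
- knowhub/server.py:create_knowledge_link - create a relation link
- knowhub/server.py:get_related_knowledge - query related knowledge

Detects duplicates that save-time deduplication missed (a fallback mechanism).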
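
```python
from datetime import datetime, timedelta, timezone


async def weekly_health_check():
    """Weekly check for duplicates among newly added knowledge."""
    # Only check knowledge added in the last 7 days
    seven_days_ago = (datetime.now(timezone.utc) - timedelta(days=7)).isoformat()
    recent = query(filter=f'created_at > "{seven_days_ago}"')
    if len(recent) < 10:
        return  # too few new entries to be worth checking
    # Use vector clustering to detect obvious duplicates (threshold 0.90)
    clusters = await cluster_similar_knowledge(
        knowledge_list=recent,
        threshold=0.90
    )
    # Report only; never act automatically
    if clusters:
        send_alert(f"Found {len(clusters)} groups of suspected duplicates; please review manually")
```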

Cost: nearly zero (vector clustering only, no LLM calls).
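
The health check calls `cluster_similar_knowledge`, which the source does not show. As a rough illustration of what threshold-based vector clustering could look like, here is a synchronous, pure-Python sketch that groups IDs whose embeddings have cosine similarity at or above the threshold, using union-find (single linkage). The function names and the input shape (a dict from ID to vector) are assumptions, and the pairwise loop is O(n²), which is only reasonable for a week's worth of new entries.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_by_similarity(
    vectors: Dict[str, Sequence[float]], threshold: float = 0.90
) -> List[List[str]]:
    """Group IDs connected by at least one pair with similarity >= threshold."""
    ids = list(vectors)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Singletons are not suspected duplicates, so drop them.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Single linkage can chain loosely related entries into one group, which is acceptable here because the check only reports clusters for human review.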

- knowhub/server.py:weekly_health_check - background task, runs weekly

```python
{
    "implicit_feedback": {
        "search_count": 156,  # number of times retrieved
        "click_count": 89,  # number of times selected and used
        "last_used": "2026-03-17",
        "avg_rank": 2.3  # average search rank
    }
}
```

Implementation location: knowhub/server.py:search_knowledge_api - recorded when results are returned
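
A possible way to maintain these counters without storing every search event is an incremental mean for `avg_rank`. The helpers below are hypothetical (`record_search_hit` and `record_click` do not appear in the source), and the choice to update `last_used` on selection rather than on retrieval is an assumption.

```python
def record_search_hit(feedback: dict, rank: int) -> dict:
    """Update counters after the entry appears in search results at `rank`."""
    n = feedback.get("search_count", 0)
    avg = feedback.get("avg_rank", 0.0)
    feedback["search_count"] = n + 1
    # Incremental mean: new_avg = avg + (rank - avg) / (n + 1)
    feedback["avg_rank"] = avg + (rank - avg) / (n + 1)
    return feedback


def record_click(feedback: dict, used_on: str) -> dict:
    """Update counters after the entry is actually selected and used."""
    feedback["click_count"] = feedback.get("click_count", 0) + 1
    feedback["last_used"] = used_on  # assumption: 'used' means selected
    return feedback
```

The incremental form keeps the update O(1) per search and needs only the previous count and average.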

```python
def apply_time_decay(knowledge, current_time):
    """Scale the search score down for old knowledge."""
    age_days = (current_time - knowledge["created_at"]).days
    # Decay starts after 6 months and bottoms out at 50% after about 1 year
    if age_days > 180:
        decay_factor = max(0.5, 1 - (age_days - 180) / 365)
        knowledge["_search_score"] *= decay_factor
    return knowledge
```

Implementation location: knowhub/server.py:_llm_rerank - apply the decay before reranking
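
To make the decay curve concrete, the factor from the formula above can be isolated and evaluated at a few ages. `decay_factor` is a hypothetical helper extracted for illustration; the constants (180 days, 365, floor of 0.5) are the ones used in the source.

```python
def decay_factor(age_days: int) -> float:
    """Multiplier applied to the search score for knowledge of a given age."""
    if age_days <= 180:
        return 1.0  # no decay during the first six months
    return max(0.5, 1 - (age_days - 180) / 365)


if __name__ == "__main__":
    for age in (100, 180, 270, 365, 730):
        print(age, round(decay_factor(age), 3))
```

With these constants the factor falls linearly from 1.0 at 180 days to the 0.5 floor at about 363 days (180 + 182.5) and stays there for older entries.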

```python
{
    "eval": {
        "dimensions": {
            "accuracy": 5,  # accuracy
            "completeness": 4,  # completeness
            "clarity": 4,  # clarity
            "timeliness": 3  # timeliness
        }
    }
}
```

For task success/failure feedback, estimate how much of the outcome can be attributed to the knowledge:
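
```python
async def calculate_attribution_confidence(
    knowledge_id: str,
    task_result: dict
) -> float:
    """Estimate the attribution confidence for one knowledge entry."""
    # Factor 1: how heavily the knowledge was used in the task
    total_steps = task_result["total_steps"]
    usage = task_result["knowledge_usage"].get(knowledge_id, 0)
    usage_ratio = usage / total_steps if total_steps else 0.0
    # Factor 2: whether it was the only knowledge used
    is_only_knowledge = len(task_result["used_knowledge_ids"]) == 1
    # Factor 3: the error type on failure
    if task_result["status"] == "failed":
        error_type = task_result["error_type"]
        if error_type in ["network", "timeout", "rate_limit"]:
            return 0.2  # environmental issue, low attribution confidence
        elif error_type in ["logic_error", "wrong_output"]:
            return 0.9  # logic issue, high attribution confidence
    # Combined estimate
    if is_only_knowledge:
        return 0.9
    else:
        return 0.3 + 0.6 * usage_ratio
```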

Implementation location: agent/core/runner.py - task completion callback

```python
@app.get("/api/knowledge/stats")
async def knowledge_stats():
    """Knowledge base quality statistics."""
    return {
        "total": 1234,
        "by_score": {5: 234, 4: 567, 3: 345, 2: 67, 1: 21},
        "by_state": {"active": 800, "stable": 300, "cold": 100, "archived": 34},
        "low_quality": [...],  # knowledge with score < 3
        "stale": [...],  # knowledge unused for 6 months
        "top_helpful": [...],  # knowledge with the most helpful votes
        "needs_review": [...],  # knowledge with harmful > helpful
        "conflicts": [...]  # knowledge pairs flagged as conflicting
    }
```

Use clustering plus batch processing to replace the current approach of loading everything at once:

```python
@app.post("/api/knowledge/slim")
async def slim_knowledge_v2(
    batch_size: int = 100,
    similarity_threshold: float = 0.85,
    model: str = "google/gemini-2.5-flash-lite"
):
    """Knowledge base slimming v2: cluster, then merge in batches."""
    # 1. Cluster similar knowledge (vectors only, no LLM)
    clusters = await cluster_similar_knowledge(
        similarity_threshold=similarity_threshold
    )
    # 2. Ask the LLM for a decision on each cluster (processed in batches)
    merged_count = 0
    for cluster in clusters:
        knowledge_list = [milvus_store.get_by_id(kid) for kid in cluster]
        # Each call handles only the 2-5 entries in this cluster
        decision = await llm_merge_cluster(knowledge_list, model)
        if decision["should_merge"]:
            await execute_merge(decision)
            merged_count += 1
    return {"clusters_found": len(clusters), "merged": merged_count}
```

Cost: ~$0.5 per run (run on demand)

| Mechanism | Cost | Implementation location |
|---|---|---|
| Save-time relation analysis | $0.15/year | agent/tools/builtin/knowledge.py:knowledge_save |
| Feedback source separation | $0 | knowhub/server.py:update_knowledge |
| Tiered storage | $0 | knowhub/server.py + knowhub/vector_store.py |
| Quality-based pruning | $0 | knowhub/server.py:prune_low_quality |
| Knowledge relation network | $0 | knowhub/server.py |

Total P0 cost: ~$0.15/year
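
| Mechanism | Cost | Implementation location |
|---|---|---|
| Lightweight health check | ~$0 | knowhub/server.py:weekly_health_check |
| Attribution confidence | $0 | agent/core/runner.py |

| Mechanism | Cost | Notes |
|---|---|---|
| Implicit feedback collection | $0 | Optional |
| Time decay | $0 | Optional |
| Multi-dimensional feedback | $0 | Optional |
| Quality dashboard | $0 | Optional |
| Improved slim v2 | $0.5 per run | Run on demand |
| Periodic full deduplication | $10-20 per run | Needed only if the save-time deduplication error rate exceeds 5% |
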
- The relations field builds a knowledge graph
- The state field controls the visibility of knowledge

```python
{
    # Existing fields
    "id": "knowledge-20260317-a1b2",
    "message_id": "msg-xxx",
    "types": ["strategy", "tool"],
    "task": "What goal to accomplish in what scenario",
    "content": "Core knowledge content",
    "tags": {"category": "preference"},
    "scopes": ["org:cybertogether"],
    "owner": "agent:research_agent",
    "resource_ids": ["code/selenium/login"],
    "source": {
        "name": "Resource name",
        "category": "exp",
        "urls": ["https://example.com"],
        "agent_id": "research_agent",
        "submitted_by": "user@example.com",
        "timestamp": "2026-03-17T12:00:00Z",
        "message_id": "msg-xxx"
    },
    # Improved evaluation fields
    "eval": {
        "score": 4.2,  # weighted overall score (computed automatically)
        "confidence": 0.9,
        "feedback_by_source": {
            "human": {"helpful": 3, "harmful": 0, "weight": 1.0, "last_feedback": "2026-03-17"},
            "agent_explicit": {"helpful": 12, "harmful": 2, "weight": 0.6, "last_feedback": "2026-03-17"},
            "task_outcome": {"success": 45, "failure": 5, "weight": 0.3, "last_feedback": "2026-03-17"}
        },
        "feedback_history": [
            {
                "source": "human",
                "type": "helpful",
                "comment": "Very accurate",
                "timestamp": "2026-03-17T12:00:00Z",
                "user_id": "user@example.com"
            }
        ]
    },
    # New: implicit feedback (P2, optional)
    "implicit_feedback": {
        "search_count": 156,
        "click_count": 89,
        "last_used": "2026-03-17",
        "avg_rank": 2.3
    },
    # New: knowledge relations (P0)
    "relations": [
        {
            "target_id": "knowledge-xxx",
            "relation_type": "complement",
            "direction": "bidirectional",
            "confidence": 0.95,
            "reason": "The two entries complement each other and cover different scenarios",
            "created_at": "2026-03-17T12:00:00Z",
            "created_by": "system",
            "action_taken": ""
        }
    ],
    # New: knowledge state (P0)
    "state": "active",  # active/stable/cold/archived/deprecated
    "state_reason": "",
    "state_updated_at": "2026-03-17T12:00:00Z",
    "created_at": "2026-03-17T12:00:00Z",
    "updated_at": "2026-03-17T12:00:00Z"
}
```
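
relations, state, feedback_by_source)

Impact: new knowledge may be misclassified as duplicate, causing useful knowledge to be lost

Mitigation:

Impact: knowledge relations may grow into a complex network that is hard to maintain

Mitigation:

Impact: useful knowledge may be archived too early

Mitigation: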

- knowhub/docs/knowledge-management.md
- knowhub/docs/decisions.md
- knowhub/docs/resource-storage.md