@@ -0,0 +1,678 @@
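+# Knowledge Base Feedback and Management Optimization Proposal
+
+> This document records the proposed optimizations to the knowledge base's feedback mechanism and size management.
+>
+> Discussion date: 2026-03-17
+> Status: proposal stage; to be implemented after review
+
+---
+
+## 1. Background and Problems
+
+### 1.1 Current Feedback Mechanism
+
+**Existing structure**:
+- `eval` field: score (1-5), helpful/harmful counts, confidence, history
+- Tools: `knowledge_update`, `knowledge_batch_update`
+- Uses: `min_score` filtering, knowledge evolution (`evolve_feedback`)
+
+**Problems**:
+1. Feedback sources are not distinguished (human, Agent, and task-outcome feedback are mixed together)
+2. Score updates are simplistic (set manually, not adjusted automatically from the feedback history)
+3. No implicit feedback (usage frequency, retrieval rank, etc.)
+4. No time decay (old knowledge may be outdated)
+
+### 1.2 Size Control Problems
+
+**Problems with the existing slim mechanism**:
+- Loads 10000 knowledge entries into memory at once
+- Processes all of them in a single LLM call (high cost, $1-5 per run, and poor quality)
+- Truncates each entry to its first 200 characters, so information is incomplete
+
+**Causes of knowledge base bloat**:
+1. Repeated extraction: similar tasks run many times, and each run extracts "new" knowledge
+2. Inconsistent granularity: the same experience is split into several entries or merged into one coarse entry
+3. Version evolution: updating knowledge creates a new version instead of overwriting the old one
+4. Low-quality accumulation: large amounts of "mediocre" knowledge with score=3 pile up
+
+---
+
+## 2. Core Optimizations
+
+### 2.1 Relation Check at Save Time (P0 core mechanism)
+
+#### Knowledge relation types
+
+| Relation type | Description | Handling |
+|---------|------|---------|
+| `duplicate` | Identical content with only slightly different wording | Skip saving |
+| `subset` | The new knowledge is a special case or part of existing knowledge | Skip saving, or attach it as an example |
+| `superset` | The new knowledge is more complete and covers existing knowledge | Save the new knowledge and deprecate the old |
+| `conflict` | The two entries give contradictory advice | Save but flag the conflict for human review |
+| `complement` | Related but not duplicated; the entries complement each other | Save and create a relation link |
+| `independent` | The two entries are unrelated | Save directly |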
+
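The relation table can be read as a dispatch from relation type to a save-time action. The sketch below is one possible encoding, not the proposal's actual `handle_knowledge_relation`: the `SaveAction` names and the `decide_action` helper are hypothetical, and `subset` is mapped to the simpler "skip" variant of its row.

```python
from enum import Enum


class Relation(str, Enum):
    DUPLICATE = "duplicate"
    SUBSET = "subset"
    SUPERSET = "superset"
    CONFLICT = "conflict"
    COMPLEMENT = "complement"
    INDEPENDENT = "independent"


class SaveAction(str, Enum):  # hypothetical action names
    SKIP = "skip"
    SAVE_AND_DEPRECATE_OLD = "save_and_deprecate_old"
    SAVE_AND_FLAG_CONFLICT = "save_and_flag_conflict"
    SAVE_AND_LINK = "save_and_link"
    SAVE = "save"


# One entry per table row.
ACTION_BY_RELATION = {
    Relation.DUPLICATE: SaveAction.SKIP,
    Relation.SUBSET: SaveAction.SKIP,
    Relation.SUPERSET: SaveAction.SAVE_AND_DEPRECATE_OLD,
    Relation.CONFLICT: SaveAction.SAVE_AND_FLAG_CONFLICT,
    Relation.COMPLEMENT: SaveAction.SAVE_AND_LINK,
    Relation.INDEPENDENT: SaveAction.SAVE,
}


def decide_action(relation_type: str) -> SaveAction:
    """Map a relation label (e.g. returned by the LLM) to a save-time action."""
    try:
        return ACTION_BY_RELATION[Relation(relation_type)]
    except ValueError:
        # Unknown label: fall back to saving so that no knowledge is lost.
        return SaveAction.SAVE
```

Falling back to `SAVE` on an unrecognized label is a design choice of this sketch: it favors keeping knowledge over dropping it when the classifier output is malformed.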
+
+#### Layered check strategy (to reduce cost)
+
+```
+Layer 1: vector similarity search (fast filter)
+  ↓ no similar knowledge → save directly
+  ↓ similar knowledge found
+Layer 2: rule-based check (free)
+  - identical task + content overlap > 90% → skip
+  - identical content → skip
+  ↓ rules cannot decide
+Layer 3: LLM check (edge cases only)
+  - called only when similarity > 0.85
+  - uses gemini-2.5-flash-lite
+```
+
+**Cost estimate**:
+- Assume 50 knowledge entries are saved per day
+- Layer 1 filters out 70%, Layer 2 filters out 20%, and Layer 3 handles the remaining 10%
+- Each LLM call: 1100 tokens
+- Annual cost: 50 × 10% × 1100 tokens × 365 days ≈ 2.0M tokens ≈ **$0.15/year**
+
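A minimal sketch of the three-layer check, under several assumptions that are not part of the proposal: entries carry an in-memory `embedding` list (a real implementation would query the vector store), `sim_floor` is a hypothetical cutoff for what Layer 1 counts as "similar", and content overlap is approximated with `difflib.SequenceMatcher`. The function only decides which layer resolves the case; the LLM call itself is left out.

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_before_save(new, existing, sim_floor=0.5, llm_threshold=0.85):
    """Return ("save" | "skip" | "ask_llm", matched_entry_or_None)."""
    # Layer 1: vector similarity filter.
    scored = [(cosine(new["embedding"], e["embedding"]), e) for e in existing]
    scored = [(s, e) for s, e in scored if s >= sim_floor]
    if not scored:
        return "save", None
    best_sim, best = max(scored, key=lambda pair: pair[0])

    # Layer 2: free rule checks.
    if new["content"] == best["content"]:
        return "skip", best
    overlap = SequenceMatcher(None, new["content"], best["content"]).ratio()
    if new["task"] == best["task"] and overlap > 0.90:
        return "skip", best

    # Layer 3: only high-similarity edge cases reach the LLM.
    if best_sim > llm_threshold:
        return "ask_llm", best
    return "save", None


existing = [{"task": "login", "content": "use explicit waits before clicking",
             "embedding": [1.0, 0.0]}]
edge_case = {"task": "checkout", "content": "retry the payment call with backoff",
             "embedding": [0.9, 0.1]}
print(check_before_save(edge_case, existing)[0])
```

Only the last branch costs money, which is why the cost estimate counts just the share of saves that reach Layer 3.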
+
+#### Implementation locations
+
+- `agent/tools/builtin/knowledge.py:knowledge_save` - check before saving
+- `knowhub/server.py:analyze_knowledge_relation` - relation analysis
+- `knowhub/server.py:handle_knowledge_relation` - relation handling
+
+---
+
+### 2.2 Feedback Source Separation and Weighted Scoring (P0)
+
+#### Data structure changes
+
+```python
+{
+    "eval": {
+        "score": 4.2,  # weighted overall score (computed automatically)
+        "confidence": 0.9,
+
+        # per-source statistics
+        "feedback_by_source": {
+            "human": {
+                "helpful": 3,
+                "harmful": 0,
+                "weight": 1.0,  # highest weight
+                "last_feedback": "2026-03-17"
+            },
+            "agent_explicit": {
+                "helpful": 12,
+                "harmful": 2,
+                "weight": 0.6,  # medium weight
+                "last_feedback": "2026-03-17"
+            },
+            "task_outcome": {
+                "success": 45,
+                "failure": 5,
+                "weight": 0.3,  # lowest weight (attribution is uncertain)
+                "last_feedback": "2026-03-17"
+            }
+        },
+
+        # detailed history (keeps the source tag)
+        "feedback_history": [
+            {
+                "source": "human",
+                "type": "helpful",
+                "comment": "very accurate",
+                "timestamp": "2026-03-17",
+                "user_id": "user@example.com"
+            }
+        ]
+    }
+}
+```
+
+#### Weighted scoring algorithm
+
+```python
+def calculate_weighted_score(feedback_by_source):
+    """Compute the overall score, weighted by feedback source."""
+
+    total_weight = 0
+    weighted_sum = 0
+
+    for source, data in feedback_by_source.items():
+        # task_outcome stores success/failure instead of helpful/harmful
+        helpful = data.get("helpful", data.get("success", 0))
+        harmful = data.get("harmful", data.get("failure", 0))
+        weight = data["weight"]
+
+        if helpful + harmful == 0:
+            continue
+
+        # positive ratio
+        positive_ratio = helpful / (helpful + harmful)
+
+        # confidence: more feedback is more trustworthy (saturates at 10 items)
+        confidence = min(1.0, (helpful + harmful) / 10)
+
+        # score for this source: 3 + 2 * (positive ratio - 0.5)
+        source_score = 3 + 2 * (positive_ratio - 0.5)
+
+        # weighted accumulation
+        weighted_sum += source_score * weight * confidence
+        total_weight += weight * confidence
+
+    return max(1.0, min(5.0, weighted_sum / total_weight)) if total_weight > 0 else 3.0
+```
+
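A worked example of the weighted score, using the sample counts from the data structure in this section. The function is repeated so the block runs on its own; treating `task_outcome`'s `success`/`failure` as `helpful`/`harmful` is an assumption of this sketch.

```python
def calculate_weighted_score(feedback_by_source):
    total_weight = 0.0
    weighted_sum = 0.0
    for data in feedback_by_source.values():
        # Assumption: success/failure count as helpful/harmful for task_outcome.
        helpful = data.get("helpful", data.get("success", 0))
        harmful = data.get("harmful", data.get("failure", 0))
        n = helpful + harmful
        if n == 0:
            continue
        positive_ratio = helpful / n
        confidence = min(1.0, n / 10)
        source_score = 3 + 2 * (positive_ratio - 0.5)
        weighted_sum += source_score * data["weight"] * confidence
        total_weight += data["weight"] * confidence
    return max(1.0, min(5.0, weighted_sum / total_weight)) if total_weight > 0 else 3.0


sample = {
    "human": {"helpful": 3, "harmful": 0, "weight": 1.0},
    "agent_explicit": {"helpful": 12, "harmful": 2, "weight": 0.6},
    "task_outcome": {"success": 45, "failure": 5, "weight": 0.3},
}

score = calculate_weighted_score(sample)
print(round(score, 2))  # 3.81
```

Step by step: the three human votes get confidence 0.3, so their effective weight is 1.0 × 0.3 = 0.3, while the agent and task sources are saturated at 0.6 and 0.3. The per-source scores are 4.0, about 3.71, and 3.8, giving (1.2 + 2.229 + 1.14) / 1.2 ≈ 3.81. Because each per-source score lies between 2 and 4 under this mapping, the combined score does too, so a cutoff such as `score < 2` would only be reachable with a wider mapping (for example a factor of 4 instead of 2) or a higher cutoff.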
+
+#### Implementation locations
+
+- `knowhub/server.py:update_knowledge` - score update logic
+- `knowhub/server.py:calculate_weighted_score` - score calculation
+- `agent/tools/builtin/knowledge.py:knowledge_feedback` - new tool for human feedback
+
+---
+
+### 2.3 Tiered Storage (P0, required)
+
+#### Knowledge state machine
+
+```
+active → stable → cold → archived
+ ↓
+ deprecated
+```
+
+#### State transition rules
+
+```python
+from datetime import datetime
+
+def calculate_state(knowledge, now=None):
+    """Derive the lifecycle state from recency of use and usage count."""
+    now = now or datetime.now()
+    feedback = knowledge["implicit_feedback"]
+    # last_used is stored as an ISO date string, e.g. "2026-03-17"
+    days_since_last_use = (now - datetime.fromisoformat(feedback["last_used"])).days
+    usage_count = feedback["search_count"]
+
+    if days_since_last_use > 180 and usage_count < 5:
+        return "archived"  # unused for half a year and rarely used → archive
+    elif days_since_last_use > 90:
+        return "cold"      # unused for 3 months → cold storage
+    elif usage_count > 20:
+        return "active"    # used frequently → active
+    else:
+        return "stable"    # default: stable
+```
+
+#### Retrieval strategy
+
+- By default, search only `active` + `stable`
+- The optional parameter `include_cold=true` extends the search to cold knowledge
+- `archived` and `deprecated` entries are excluded from search but remain accessible by ID
+
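The retrieval rules above amount to a state filter attached to every search. The helper below is a hypothetical sketch: `build_state_filter` is not an existing function, it assumes `state` is stored as a filterable scalar field, and the output follows the Milvus-style `field in [...]` expression form.

```python
SEARCHABLE_BY_DEFAULT = ("active", "stable")


def build_state_filter(include_cold: bool = False) -> str:
    """Build a filter expression restricting search to visible states."""
    states = list(SEARCHABLE_BY_DEFAULT)
    if include_cold:
        states.append("cold")
    quoted = ", ".join(f'"{s}"' for s in states)
    return f"state in [{quoted}]"


print(build_state_filter())
print(build_state_filter(include_cold=True))
```

`archived` and `deprecated` never appear in the list, so they can only be reached through a direct lookup by ID.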
+
+#### Data structure
+
+```python
+{
+    "state": "active",  # active/stable/cold/archived/deprecated
+    "state_reason": "",  # reason for the state change
+    "state_updated_at": "2026-03-17T12:00:00Z"
+}
+```
+
+#### Implementation locations
+
+- `knowhub/server.py:update_knowledge_states` - background task, updates daily
+- `knowhub/server.py:search_knowledge_api` - filters by state at search time
+- `knowhub/vector_store.py` - adds the state filter to Milvus queries
+
+---
+
+### 2.4 Quality-Based Pruning (P0, required)
+
+#### Pruning conditions
+
+An entry is pruned when all of the following hold:
+
+- `score < 2`
+- `harmful > helpful`
+- It has existed for more than 30 days
+
+#### Action
+
+Mark the entry as `deprecated` instead of deleting it, so it can be restored
+
+#### Implementation
+
+```python
+from datetime import datetime, timezone
+
+async def prune_low_quality():
+    """Periodically deprecate low-quality knowledge."""
+    now = datetime.now(timezone.utc)
+
+    low_quality = milvus_store.query(
+        filter_expr='eval["score"] < 2 and eval["harmful"] > eval["helpful"]'
+    )
+
+    for k in low_quality:
+        # created_at is an ISO 8601 string such as "2026-03-17T12:00:00Z"
+        created_at = datetime.fromisoformat(k["created_at"].replace("Z", "+00:00"))
+        age_days = (now - created_at).days
+        if age_days > 30:
+            await knowledge_update(
+                knowledge_id=k["id"],
+                metadata={
+                    "state": "deprecated",
+                    "state_reason": "low_quality",
+                    "deprecated_at": now.isoformat()
+                }
+            )
+```
+
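The three pruning conditions can also be checked as a pure predicate, which is easier to unit test than a store query. This is a sketch under assumptions: `should_deprecate` is a hypothetical helper, and it derives the `helpful`/`harmful` totals by summing the explicit counters under `feedback_by_source` (ignoring `task_outcome`, whose attribution is uncertain), since the per-source structure has no top-level counters.

```python
from datetime import datetime, timezone


def should_deprecate(knowledge: dict, now: datetime) -> bool:
    """True when all three pruning conditions hold."""
    ev = knowledge["eval"]
    helpful = harmful = 0
    for counts in ev.get("feedback_by_source", {}).values():
        # Only explicit helpful/harmful votes are counted (assumption).
        helpful += counts.get("helpful", 0)
        harmful += counts.get("harmful", 0)
    # created_at is an ISO 8601 string such as "2026-03-17T12:00:00Z".
    created_at = datetime.fromisoformat(knowledge["created_at"].replace("Z", "+00:00"))
    age_days = (now - created_at).days
    return ev["score"] < 2 and harmful > helpful and age_days > 30


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
entry = {
    "eval": {
        "score": 1.5,
        "feedback_by_source": {
            "human": {"helpful": 0, "harmful": 2},
            "agent_explicit": {"helpful": 1, "harmful": 3},
        },
    },
    "created_at": "2026-03-17T12:00:00Z",
}
print(should_deprecate(entry, now))
```

Passing `now` in as a parameter keeps the function deterministic, so the 30-day boundary can be tested without patching the clock.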
+
+#### Implementation location
+
+- `knowhub/server.py:prune_low_quality` - background task, runs daily
+
+---
+
+### 2.5 Knowledge Relation Network (P0)
+
+#### Data structure
+
+```python
+{
+    "relations": [
+        {
+            "target_id": "knowledge-20260310-c3d4",
+            "relation_type": "complement",  # duplicate/subset/superset/conflict/complement
+            "direction": "bidirectional",  # bidirectional/outgoing/incoming
+            "confidence": 0.95,
+            "reason": "The two entries are complementary and cover different scenarios",
+            "created_at": "2026-03-17T12:00:00Z",
+            "created_by": "system",  # system/human/agent
+            "action_taken": ""  # optional: deprecated_target/merged/etc
+        }
+    ]
+}
+```
+
+#### Relation directions
+
+| Relation type | Directionality | Description |
+|---------|--------|------|
+| `complement` | Bidirectional | Complementary entries; linked in both directions |
+| `duplicate` | Bidirectional | Exact duplicates |
+| `subset` | Unidirectional | This entry is a subset of the target |
+| `superset` | Unidirectional | This entry is a superset of the target |
+| `conflict` | Bidirectional | Conflicting entries |
+
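One way to keep both ends of a relation consistent is to write a mirrored record on the target whenever a link is created. The sketch below assumes that design, which the proposal does not spell out: `link` and the `INVERSE` table are hypothetical, and the store is a plain dict standing in for the real knowledge store.

```python
# Relation seen from the other end: symmetric types map to themselves.
INVERSE = {
    "complement": "complement",
    "duplicate": "duplicate",
    "conflict": "conflict",
    "subset": "superset",
    "superset": "subset",
}


def link(store: dict, source_id: str, target_id: str, relation_type: str) -> None:
    """Record a relation on the source and its mirror on the target."""
    symmetric = INVERSE[relation_type] == relation_type
    store[source_id].setdefault("relations", []).append({
        "target_id": target_id,
        "relation_type": relation_type,
        "direction": "bidirectional" if symmetric else "outgoing",
    })
    store[target_id].setdefault("relations", []).append({
        "target_id": source_id,
        "relation_type": INVERSE[relation_type],
        "direction": "bidirectional" if symmetric else "incoming",
    })


store = {"a": {}, "b": {}}
link(store, "a", "b", "subset")  # "a" is a subset of "b"
print(store["b"]["relations"][0])
```

With the mirror in place, a lookup that starts from either entry finds the relation without scanning the whole store, at the cost of two writes per link.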
+
+#### Implementation locations
+
+- `knowhub/server.py:create_knowledge_link` - create a relation link
+- `knowhub/server.py:get_related_knowledge` - query related knowledge
+
+---
+
+### 2.6 Lightweight Health Check (P1, recommended)
+
+#### Purpose
+
+Catch duplicates that save-time deduplication missed (a safety net)
+
+#### Strategy
+
+```python
+from datetime import datetime, timedelta, timezone
+
+async def weekly_health_check():
+    """Weekly check for duplicates among newly added knowledge."""
+
+    # only check knowledge added in the last 7 days
+    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
+    seven_days_ago = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
+    recent = query(filter=f'created_at > "{seven_days_ago}"')
+
+    if len(recent) < 10:
+        return  # too few new entries to be worth checking
+
+    # detect obvious duplicates with vector clustering (threshold 0.90)
+    clusters = await cluster_similar_knowledge(
+        knowledge_list=recent,
+        threshold=0.90
+    )
+
+    # report only; do not act automatically
+    if clusters:
+        send_alert(f"Found {len(clusters)} groups of suspected duplicates; please review manually")
+```
+
+#### Cost
+
+Close to 0 (vector clustering only; no LLM calls)
+
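The health check relies on grouping entries whose embeddings are close. A minimal sketch of such threshold clustering is shown below, using pairwise cosine similarity and a union-find structure. It is not the proposal's `cluster_similar_knowledge`: the function name, the in-memory `embedding` field, and the O(n²) comparison are assumptions that are only reasonable for a small weekly batch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_threshold(items, threshold=0.90):
    """Group ids whose embeddings are linked by similarity >= threshold."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if cosine(items[i]["embedding"], items[j]["embedding"]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item["id"])
    # Singletons are not suspected duplicates, so only real groups are reported.
    return [ids for ids in groups.values() if len(ids) > 1]


items = [
    {"id": "k1", "embedding": [1.0, 0.0]},
    {"id": "k2", "embedding": [0.99, 0.05]},
    {"id": "k3", "embedding": [0.0, 1.0]},
]
print(cluster_by_threshold(items))
```

Union-find makes the grouping transitive: if A is close to B and B is close to C, all three land in one group even when A and C fall just under the threshold, which suits a report meant for human review.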
+
+#### Implementation location
+
+- `knowhub/server.py:weekly_health_check` - background task, runs weekly
+
+---
+
+## 3. Optional Optimizations (P2)
+
+### 3.1 Implicit Feedback Collection
+
+```python
+{
+    "implicit_feedback": {
+        "search_count": 156,  # number of times retrieved
+        "click_count": 89,  # number of times selected and used
+        "last_used": "2026-03-17",
+        "avg_rank": 2.3  # average retrieval rank
+    }
+}
+```
+
+**Implementation location**: `knowhub/server.py:search_knowledge_api` - recorded when results are returned
+
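The implicit-feedback counters can be maintained incrementally each time an entry appears in a result list. The helper below is a hypothetical sketch (`record_search_hit` is not an existing function); it keeps `avg_rank` as a running mean so no per-search history has to be stored.

```python
def record_search_hit(implicit: dict, rank: int, used: bool, today: str) -> dict:
    """Update implicit-feedback counters after one search result is returned."""
    n = implicit.get("search_count", 0)
    prev_avg = implicit.get("avg_rank", 0.0)
    # Incremental mean: new_avg = old_avg + (x - old_avg) / (n + 1)
    implicit["avg_rank"] = prev_avg + (rank - prev_avg) / (n + 1)
    implicit["search_count"] = n + 1
    if used:
        implicit["click_count"] = implicit.get("click_count", 0) + 1
        implicit["last_used"] = today
    return implicit


fb = {}
record_search_hit(fb, rank=1, used=True, today="2026-03-17")
record_search_hit(fb, rank=3, used=False, today="2026-03-18")
print(fb)
```

In this sketch `last_used` only moves when the entry is actually used, not merely retrieved; whether retrieval alone should count as use is a policy decision left open here.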
+
+### 3.2 Time Decay
+
+```python
+from datetime import datetime
+
+def apply_time_decay(knowledge, current_time):
+    # created_at is an ISO 8601 string such as "2026-03-17T12:00:00Z";
+    # current_time must be a timezone-aware datetime
+    created_at = datetime.fromisoformat(knowledge["created_at"].replace("Z", "+00:00"))
+    age_days = (current_time - created_at).days
+
+    # decay starts after 6 months and bottoms out at 50% after about 1 year
+    if age_days > 180:
+        decay_factor = max(0.5, 1 - (age_days - 180) / 365)
+        knowledge["_search_score"] *= decay_factor
+
+    return knowledge
+```
+
+**Implementation location**: `knowhub/server.py:_llm_rerank` - apply the decay before reranking
+
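To make the shape of the decay curve concrete, the multiplier can be isolated and evaluated at a few ages. `decay_factor` below is a standalone restatement of the rule in `apply_time_decay`, written only for illustration.

```python
def decay_factor(age_days: int) -> float:
    """Decay multiplier: flat for the first 180 days, linear after, floor at 0.5."""
    if age_days <= 180:
        return 1.0
    return max(0.5, 1 - (age_days - 180) / 365)


for age in (90, 180, 270, 365, 730):
    print(age, round(decay_factor(age), 3))
```

At 270 days the multiplier is about 0.753. The linear part reaches the 0.5 floor when (age - 180) / 365 = 0.5, that is at roughly 363 days, and stays there for older entries.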
+
+### 3.3 Multi-Dimensional Feedback
+
+```python
+{
+    "eval": {
+        "dimensions": {
+            "accuracy": 5,  # accuracy
+            "completeness": 4,  # completeness
+            "clarity": 4,  # clarity
+            "timeliness": 3  # timeliness
+        }
+    }
+}
+```
+
+### 3.4 Attribution Confidence (P1)
+
+For task success/failure feedback, estimate how much of the outcome can be attributed to the knowledge:
+
+```python
+async def calculate_attribution_confidence(
+    knowledge_id: str,
+    task_result: dict
+) -> float:
+    """Compute the attribution confidence."""
+
+    # factor 1: how heavily the task used this knowledge
+    # (guards against a missing usage entry and a zero step count)
+    usage = task_result["knowledge_usage"].get(knowledge_id, 0)
+    usage_ratio = usage / max(task_result["total_steps"], 1)
+
+    # factor 2: whether it was the only knowledge used
+    is_only_knowledge = len(task_result["used_knowledge_ids"]) == 1
+
+    # factor 3: the error type on failure
+    if task_result["status"] == "failed":
+        error_type = task_result["error_type"]
+        if error_type in ["network", "timeout", "rate_limit"]:
+            return 0.2  # environment issue: low attribution confidence
+        elif error_type in ["logic_error", "wrong_output"]:
+            return 0.9  # logic issue: high attribution confidence
+
+    # combined result
+    if is_only_knowledge:
+        return 0.9
+    else:
+        return 0.3 + 0.6 * usage_ratio
+```
+
+**Implementation location**: `agent/core/runner.py` - task completion callback
+
+### 3.5 Quality Dashboard
+
+```python
+@app.get("/api/knowledge/stats")
+async def knowledge_stats():
+    """Knowledge base quality statistics."""
+    return {
+        "total": 1234,
+        "by_score": {5: 234, 4: 567, 3: 345, 2: 67, 1: 21},
+        "by_state": {"active": 800, "stable": 300, "cold": 100, "archived": 34},
+        "low_quality": [...],  # entries with score < 3
+        "stale": [...],  # entries unused for 6 months
+        "top_helpful": [...],  # entries with the most helpful votes
+        "needs_review": [...],  # entries with harmful > helpful
+        "conflicts": [...]  # entry pairs flagged as conflicting
+    }
+```
+
+### 3.6 Improved slim v2 (run on demand)
+
+Replace the existing load-everything-at-once approach with clustering plus batched processing:
+
+```python
+@app.post("/api/knowledge/slim")
+async def slim_knowledge_v2(
+    batch_size: int = 100,
+    similarity_threshold: float = 0.85,
+    model: str = "google/gemini-2.5-flash-lite"
+):
+    """Knowledge base slimming v2: cluster, then merge in batches."""
+
+    # 1. cluster similar knowledge (vectors only, no LLM)
+    clusters = await cluster_similar_knowledge(
+        threshold=similarity_threshold
+    )
+
+    # 2. ask the LLM about each cluster (processed in batches)
+    merged_count = 0
+    for cluster in clusters:
+        knowledge_list = [milvus_store.get_by_id(kid) for kid in cluster]
+
+        # each call only sees the 2-5 entries in this cluster
+        decision = await llm_merge_cluster(knowledge_list, model)
+
+        if decision["should_merge"]:
+            await execute_merge(decision)
+            merged_count += 1
+
+    return {"clusters_found": len(clusters), "merged": merged_count}
+```
+
+**Cost**: ~$0.5 per run (run on demand)
+
+---
+
+## 4. Implementation Priority and Cost
+
+### P0 (implement immediately)
+
+| Mechanism | Cost | Implementation location |
+|------|------|---------|
+| Relation check at save time | $0.15/year | `agent/tools/builtin/knowledge.py:knowledge_save` |
+| Feedback source separation | $0 | `knowhub/server.py:update_knowledge` |
+| Tiered storage | $0 | `knowhub/server.py` + `knowhub/vector_store.py` |
+| Quality-based pruning | $0 | `knowhub/server.py:prune_low_quality` |
+| Knowledge relation network | $0 | `knowhub/server.py` |
+
+**Total P0 cost**: ~$0.15/year
+
+### P1 (implement in the short term)
+
+| Mechanism | Cost | Implementation location |
+|------|------|---------|
+| Lightweight health check | ~$0 | `knowhub/server.py:weekly_health_check` |
+| Attribution confidence | $0 | `agent/core/runner.py` |
+
+### P2 (implement as needed)
+
+| Mechanism | Cost | Notes |
+|------|------|------|
+| Implicit feedback collection | $0 | Optional |
+| Time decay | $0 | Optional |
+| Multi-dimensional feedback | $0 | Optional |
+| Quality dashboard | $0 | Optional |
+| Improved slim v2 | $0.5 per run | Run on demand |
+| Periodic full deduplication | $10-20 per run | Only needed if the save-time deduplication error rate exceeds 5% |
+
+---
+
+## 5. Key Design Principles
+
+1. **Real-time defense beats after-the-fact cleanup**: deduplicating at save time is more effective than periodic deduplication
+2. **Layered checks reduce cost**: vector → rules → LLM, using the LLM only when necessary
+3. **Feedback weighted by source**: human (1.0) > Agent (0.6) > task outcome (0.3)
+4. **Knowledge relation network**: build a knowledge graph through the `relations` field
+5. **Lifecycle management**: control the visibility of knowledge through the `state` field
+6. **Quality-driven pruning**: automatically clean up low-quality knowledge based on feedback
+
+---
+
+## 6. Complete Data Structure
+
+```python
+{
+    # existing fields
+    "id": "knowledge-20260317-a1b2",
+    "message_id": "msg-xxx",
+    "types": ["strategy", "tool"],
+    "task": "the goal to achieve in a given scenario",
+    "content": "core knowledge content",
+    "tags": {"category": "preference"},
+    "scopes": ["org:cybertogether"],
+    "owner": "agent:research_agent",
+    "resource_ids": ["code/selenium/login"],
+    "source": {
+        "name": "resource name",
+        "category": "exp",
+        "urls": ["https://example.com"],
+        "agent_id": "research_agent",
+        "submitted_by": "user@example.com",
+        "timestamp": "2026-03-17T12:00:00Z",
+        "message_id": "msg-xxx"
+    },
+
+    # improved evaluation fields
+    "eval": {
+        "score": 4.2,  # weighted overall score (computed automatically)
+        "confidence": 0.9,
+        "feedback_by_source": {
+            "human": {"helpful": 3, "harmful": 0, "weight": 1.0, "last_feedback": "2026-03-17"},
+            "agent_explicit": {"helpful": 12, "harmful": 2, "weight": 0.6, "last_feedback": "2026-03-17"},
+            "task_outcome": {"success": 45, "failure": 5, "weight": 0.3, "last_feedback": "2026-03-17"}
+        },
+        "feedback_history": [
+            {
+                "source": "human",
+                "type": "helpful",
+                "comment": "very accurate",
+                "timestamp": "2026-03-17T12:00:00Z",
+                "user_id": "user@example.com"
+            }
+        ]
+    },
+
+    # new: implicit feedback (P2, optional)
+    "implicit_feedback": {
+        "search_count": 156,
+        "click_count": 89,
+        "last_used": "2026-03-17",
+        "avg_rank": 2.3
+    },
+
+    # new: knowledge relations (P0)
+    "relations": [
+        {
+            "target_id": "knowledge-xxx",
+            "relation_type": "complement",
+            "direction": "bidirectional",
+            "confidence": 0.95,
+            "reason": "The two entries are complementary and cover different scenarios",
+            "created_at": "2026-03-17T12:00:00Z",
+            "created_by": "system",
+            "action_taken": ""
+        }
+    ],
+
+    # new: knowledge state (P0)
+    "state": "active",  # active/stable/cold/archived/deprecated
+    "state_reason": "",
+    "state_updated_at": "2026-03-17T12:00:00Z",
+
+    "created_at": "2026-03-17T12:00:00Z",
+    "updated_at": "2026-03-17T12:00:00Z"
+}
+```
+
+---
+
+## 7. Implementation Roadmap
+
+### Phase 1: Core Mechanisms (1-2 weeks)
+
+1. Modify the knowledge data structure (add `relations`, `state`, `feedback_by_source`)
+2. Implement the relation check at save time
+3. Implement feedback source separation and weighted scoring
+4. Implement tiered storage
+5. Implement quality-based pruning
+
+### Phase 2: Monitoring and Tuning (1 week)
+
+6. Implement the lightweight health check
+7. Implement attribution confidence
+8. Observe how the system behaves in operation and tune the parameters
+
+### Phase 3: Enhancements (as needed)
+
+9. Implicit feedback collection
+10. Time decay
+11. Multi-dimensional feedback
+12. Quality dashboard
+13. Improved slim v2
+
+---
+
+## 8. Risks and Mitigations
+
+### Risk 1: LLM Misjudgment
+
+**Impact**: an entry may be wrongly classified as `duplicate`, losing useful knowledge
+
+**Mitigation**:
+- Use the layered check so the LLM only handles edge cases
+- Set a confidence threshold; below 0.8, escalate to a stronger model
+- Use the lightweight health check as a safety net
+
+### Risk 2: Relation Network Complexity
+
+**Impact**: knowledge relations may grow into a complex network that is hard to maintain
+
+**Mitigation**:
+- Initially create only the necessary relations (complement, conflict)
+- Provide a visualization tool for viewing the relation graph
+- Periodically clean up invalid relations
+
+### Risk 3: Overly Aggressive State Transitions
+
+**Impact**: useful knowledge may be archived too early
+
+**Mitigation**:
+- Conservative thresholds (archive only after 180 days)
+- Archived knowledge remains accessible by ID
+- Provide a restore interface
+
+---
+
+## 9. Success Metrics
+
+### Quantitative Metrics
+
+- Knowledge base growth rate: down from the current X entries/month to Y entries/month
+- Duplicate rate: fewer than 5% of newly added entries are duplicates
+- Low-quality share: entries with score < 3 make up less than 10%
+- Archived share: entries in the `archived` state make up less than 20%
+
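The share-based metrics can be derived directly from a payload shaped like the `/api/knowledge/stats` response in section 3.5. The helper below is a hypothetical sketch; it uses the sample numbers from that response and assumes integer score keys (after a JSON round trip they would be strings and need converting).

```python
def quality_metrics(stats: dict) -> dict:
    """Derive the share-based indicators from a stats payload."""
    total = stats["total"]
    low_quality = sum(n for score, n in stats["by_score"].items() if score < 3)
    archived = stats["by_state"].get("archived", 0)
    return {
        "low_quality_share": low_quality / total,
        "archived_share": archived / total,
    }


stats = {
    "total": 1234,
    "by_score": {5: 234, 4: 567, 3: 345, 2: 67, 1: 21},
    "by_state": {"active": 800, "stable": 300, "cold": 100, "archived": 34},
}
metrics = quality_metrics(stats)
print({name: round(value, 3) for name, value in metrics.items()})
```

With the sample numbers, 88 of 1234 entries score below 3 (about 7.1%) and 34 are archived (about 2.8%), so both stay under the 10% and 20% targets.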
+
+### Qualitative Metrics
+
+- Knowledge retrieved by the Agent is more relevant
+- Knowledge quality feedback is more accurate
+- Knowledge base maintenance cost goes down
+
+---
+
+## 10. References
+
+- Existing knowledge management doc: `knowhub/docs/knowledge-management.md`
+- Decision log: `knowhub/docs/decisions.md`
+- Resource storage doc: `knowhub/docs/resource-storage.md`