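This is an LLM-based intelligent search system for Xiaohongshu content search. Starting from raw features, it automatically generates, evaluates, and executes search tasks through a seven-step pipeline. Its core capabilities are feature matching, relation analysis, LLM-driven search query generation, concurrent search execution, LLM evaluation, deep deconstruction, and similarity analysis.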
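Input: how.json
↓
Step 1: Feature filtering → filtered_features.json
↓
Step 2: Candidate word extraction → candidate_words.json
↓
Step 3: Search query generation (LLM) → search_queries_evaluated.json
↓
Step 4: Search execution (Xiaohongshu API) → search_results.json
↓
Step 5: Result evaluation (two-layer LLM filtering) → evaluated_results.json
↓
Step 6: Deep deconstruction (optional) → deep_analysis_results.json
↓
Step 7: Similarity analysis (optional) → similarity_analysis_results.json
↓
Visualization: interactive HTML interface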
Location: input/posts/{post_id}_how.json
Core fields:
{
"帖子id": "690d977d0000000007036331",
"帖子详情": {
"link": "https://www.xiaohongshu.com/explore/...",
"title": "post title",
"body_text": "post body text",
"images": ["image URL 1", "image URL 2"],
"like_count": 7428,
"collect_count": 368,
"channel_account_name": "author name"
},
"解构结果": {
"灵感点列表": [
{
"ID": "inspiration_1",
"名称": "猫咪考试祝福",
"类型": "实质",
"描述": "detailed description...",
"置信度": 0.95,
"匹配人设结果": [
{
"人设特征名称": "鼓励式猫咪表情包",
"人设特征层级": "灵感点",
"特征类型": "标签", // or "分类"
"特征分类": ["表情包素材", "日常生活演绎", "实质"],
"相似度": 0.5339,
"说明": "explanation of the similarity calculation..."
}
]
}
],
"目的点列表": [...],
"关键点列表": [...]
}
}
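As an illustration of how this structure might be traversed, the sketch below walks the three point lists and picks the best persona match for each point. The function names `load_how` and `iter_best_matches` are hypothetical and not part of the project; only the field names are taken from the sample above.

```python
import json
from typing import Any, Dict, Iterator, Optional, Tuple

# The three point lists found under "解构结果" in how.json.
POINT_LISTS = ("灵感点列表", "目的点列表", "关键点列表")


def load_how(path: str) -> Dict[str, Any]:
    """Read a how.json file (assumed to be valid UTF-8 JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_best_matches(
    how: Dict[str, Any],
) -> Iterator[Tuple[str, str, Optional[Dict[str, Any]]]]:
    """Yield (list name, point name, best persona match or None)."""
    result = how.get("解构结果", {})
    for list_name in POINT_LISTS:
        for point in result.get(list_name, []):
            matches = point.get("匹配人设结果", [])
            # The best match is the one with the highest "相似度".
            best = max(matches, key=lambda m: m["相似度"], default=None)
            yield list_name, point["名称"], best
```

A point with no persona matches yields `None` as its best match, so callers can decide whether to skip it.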
Filter condition: 0.5 ≤ highest similarity < 0.8
[
{
"原始特征名称": "猫咪考试祝福",
"来源层级": "灵感点列表",
"权重": 1.0,
"所属点名称": "猫咪考试祝福",
"最高匹配信息": {
"人设特征名称": "鼓励式猫咪表情包",
"人设特征层级": "灵感点",
"特征类型": "标签",
"相似度": 0.5339,
"是分类": false,
"所属分类路径": "表情包素材/日常生活演绎/实质"
},
"top3匹配信息": [...]
}
]
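The filter condition for this step can be sketched as a single comparison on the best match of each feature. This is a minimal illustration, assuming records shaped like the sample above; `filter_features` is a hypothetical name, not the project's actual function.

```python
from typing import Any, Dict, List


def filter_features(
    features: List[Dict[str, Any]],
    low: float = 0.5,
    high: float = 0.8,
) -> List[Dict[str, Any]]:
    """Keep features whose best match satisfies low <= similarity < high."""
    return [
        f for f in features
        if low <= f["最高匹配信息"]["相似度"] < high
    ]
```

The lower bound is inclusive and the upper bound exclusive, matching the stated condition: a feature at exactly 0.8 is dropped, one at exactly 0.5 is kept.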
Candidate word sources:
[
{
"原始特征名称": "猫咪考试祝福",
"来源层级": "灵感点列表",
"高相似度候选_按base_word": {
"鼓励式猫咪表情包": [
{
"候选词": "鼓励式猫咪表情包",
"候选词类型": "persona", // or "post"
"相似度": 0.85,
"特征类型": "标签",
"人设特征层级": "灵感点",
"来源层级": "persona",
"来源路径": "表情包素材/日常生活演绎/实质",
"来源原始特征": "猫咪考试祝福"
}
]
}
}
]
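To show how the grouping by base word can be consumed, the sketch below flattens one entry of this file into rows sorted by similarity. `flatten_candidates` is a hypothetical helper written for illustration; it assumes the field names shown in the sample.

```python
from typing import Any, Dict, List, Tuple


def flatten_candidates(
    entry: Dict[str, Any],
) -> List[Tuple[str, str, str, float]]:
    """Return (base_word, candidate, candidate type, similarity) rows,
    highest similarity first."""
    rows = []
    grouped = entry.get("高相似度候选_按base_word", {})
    for base_word, candidates in grouped.items():
        for c in candidates:
            rows.append((base_word, c["候选词"], c["候选词类型"], c["相似度"]))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```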
Generation method: the LLM generates search queries from the center word (base_word) and the candidate words
[
{
"原始特征名称": "猫咪考试祝福",
"组合评估结果_分组": [
{
"base_word": "鼓励式猫咪表情包",
"base_word_similarity": 0.5339,
"base_word_info": {...},
"top10_searches": [
{
"search_word": "鼓励式猫咪表情包 考试祝福",
"score": 0.92,
"reasoning": "组合词很好地结合了中心词和候选词...",
"source_word": "鼓励式猫咪表情包",
"original_feature": "猫咪考试祝福"
}
],
"available_words": ["鼓励式猫咪表情包", "考试祝福", ...]
}
]
}
]
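Once the LLM has scored the generated queries, selecting which ones to run comes down to ranking by `score`. The sketch below is an assumption about how that selection might look; `top_searches` is a hypothetical name, and the default of 10 mirrors the `top_n` setting.

```python
from typing import Any, Dict, List


def top_searches(group: Dict[str, Any], top_n: int = 10) -> List[str]:
    """Return the top_n search words of one base_word group, best score first."""
    ranked = sorted(
        group.get("top10_searches", []),
        key=lambda s: s["score"],
        reverse=True,
    )
    return [s["search_word"] for s in ranked[:top_n]]
```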
Data source: Xiaohongshu search API
[
{
"原始特征名称": "猫咪考试祝福",
"组合评估结果_分组": [
{
"base_word": "鼓励式猫咪表情包",
"top10_searches": [
{
"search_word": "鼓励式猫咪表情包 考试祝福",
"search_result": {
"data": {
"data": [
{
"id": "note_id_123",
"note_card": {
"display_title": "考试祝福表情包",
"type": "normal", // or "video"
"image_list": ["http://..."],
"user": {
"nickname": "user nickname",
"user_id": "..."
},
"interact_info": {
"liked_count": "1234",
"collected_count": "56"
}
}
}
]
}
},
"search_metadata": {
"searched_at": "2025-01-01T12:00:00",
"status": "success",
"note_count": 20
}
}
]
}
]
}
]
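The notes sit two levels deep under `search_result.data.data`, and the interaction counts arrive as strings. The sketch below shows one way to pull out a flat list of notes; `extract_notes` and `to_int` are hypothetical helpers, and falling back to 0 for counts that are not plain digit strings is an assumption made for this example.

```python
from typing import Any, Dict, List


def to_int(value: Any) -> int:
    """Convert a count such as "1234" to int; return 0 if it cannot be parsed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def extract_notes(search_item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the notes of one top10_searches entry."""
    notes = search_item.get("search_result", {}).get("data", {}).get("data", [])
    rows = []
    for note in notes:
        card = note.get("note_card", {})
        info = card.get("interact_info", {})
        rows.append({
            "id": note.get("id"),
            "title": card.get("display_title", ""),
            "type": card.get("type", "normal"),
            "liked": to_int(info.get("liked_count")),
            "collected": to_int(info.get("collected_count")),
        })
    return rows
```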
Evaluation mechanism: two-layer LLM filtering
[
{
"原始特征名称": "猫咪考试祝福",
"组合评估结果_分组": [
{
"base_word": "鼓励式猫咪表情包",
"top10_searches": [
{
"search_word": "鼓励式猫咪表情包 考试祝福",
"search_result": {...},
"evaluation_with_filter": {
"total_notes": 20,
"filtered_count": 5, // layer 1 filter: notes unrelated to the query
"evaluated_count": 15, // number of notes scored in layer 2
"statistics": {
"完全匹配(0.8-1.0)": 8,
"相似匹配(0.6-0.79)": 4,
"弱相似(0.5-0.59)": 2,
"无匹配(≤0.4)": 1
},
"notes_evaluation": [
{
"note_index": 0,
"note_id": "note_id_123",
"Query相关性": "相关", // or "不相关"
"Query相关性说明": "...",
"综合得分": 0.85,
"评分说明": "...",
"关键匹配点": ["考试祝福", "猫咪表情包"]
}
]
}
}
]
}
]
}
]
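The counters in `evaluation_with_filter` can be reproduced from the per-note evaluations. The sketch below is an illustration under two assumptions: `notes_evaluation` contains both relevant and irrelevant notes, and scores that fall between the labelled ranges (for example 0.45) drop into the next lower bucket. The names `bucket` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

# Lower bounds of the buckets, checked from highest to lowest.
BUCKETS = [
    (0.8, "完全匹配(0.8-1.0)"),
    (0.6, "相似匹配(0.6-0.79)"),
    (0.5, "弱相似(0.5-0.59)"),
]
NO_MATCH = "无匹配(≤0.4)"


def bucket(score: float) -> str:
    """Map a score to its statistics label."""
    for threshold, label in BUCKETS:
        if score >= threshold:
            return label
    return NO_MATCH


def summarize(notes_evaluation: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Layer 1 drops notes unrelated to the query; layer 2 buckets the rest."""
    relevant = [n for n in notes_evaluation if n.get("Query相关性") == "相关"]
    stats = Counter(bucket(n["综合得分"]) for n in relevant)
    return {
        "total_notes": len(notes_evaluation),
        "filtered_count": len(notes_evaluation) - len(relevant),
        "evaluated_count": len(relevant),
        "statistics": dict(stats),
    }
```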
Trigger condition: evaluation score ≥ min_score (default 0.8) and the evaluation type is "完全匹配" (full match)
{
"metadata": {
"stage": "deconstruction",
"post_id": "690d977d0000000007036331",
"total_matched_notes": 50,
"processed_notes": 45,
"success_count": 40,
"min_score_threshold": 0.8,
"created_at": "2025-01-01T12:00:00"
},
"results": [
{
"note_id": "note_id_123",
"search_word": "鼓励式猫咪表情包 考试祝福",
"original_feature": "猫咪考试祝福",
"evaluation_score": 0.85,
"evaluation_type": "完全匹配",
"key_matching_points": ["考试祝福", "猫咪表情包"],
"inspiration_features": [
{
"feature_name": "考试祝福主题",
"dimension": "灵感点-全新内容",
"dimension_detail": "实质",
"weight": 0.8,
"source_index": 0,
"source_candidate_number": 1
}
],
"purpose_features": [...],
"key_point_features": [...],
"api_response": {
"result": {
"data": {
"三点解构": {
"灵感点": {
"全新内容": [...],
"共性差异": [...],
"共性内容": [...]
},
"目的点": {...},
"关键点": {...}
}
}
}
}
}
]
}
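The trigger condition for this step can be sketched as a walk over the evaluated results that keeps only qualifying notes. This is a minimal illustration, not the project's implementation: `iter_deconstruction_targets` is a hypothetical name, and treating "完全匹配" as a score of at least 0.8 is an assumption based on the statistics buckets of the evaluation step.

```python
from typing import Any, Dict, Iterator, List

# Assumption: "完全匹配" corresponds to the 0.8-1.0 score bucket.
FULL_MATCH_FLOOR = 0.8


def iter_deconstruction_targets(
    evaluated: List[Dict[str, Any]],
    min_score: float = 0.8,
) -> Iterator[Dict[str, Any]]:
    """Yield notes that are relevant, fully matched, and score >= min_score."""
    for feature in evaluated:
        for group in feature.get("组合评估结果_分组", []):
            for search in group.get("top10_searches", []):
                block = search.get("evaluation_with_filter", {})
                for note in block.get("notes_evaluation", []):
                    if note.get("Query相关性") != "相关":
                        continue
                    score = note.get("综合得分", 0.0)
                    if score >= min_score and score >= FULL_MATCH_FLOOR:
                        yield {
                            "note_id": note["note_id"],
                            "search_word": search["search_word"],
                            "original_feature": feature["原始特征名称"],
                            "evaluation_score": score,
                        }
```

Raising `min_score` narrows the selection, while lowering it below 0.8 has no effect in this sketch because the full-match requirement still applies.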
Formula: weighted_score = weight_embedding × embedding_score + weight_semantic × semantic_score
{
"metadata": {
"stage": "similarity_analysis",
"weight_embedding": 0.5,
"weight_semantic": 0.5,
"min_similarity": 0.0,
"created_at": "2025-01-01T12:00:00"
},
"results": [
{
"note_id": "note_id_123",
"original_feature": "猫咪考试祝福",
"features_similarity": [
{
"feature_name": "考试祝福主题",
"dimension": "灵感点-全新内容",
"dimension_detail": "实质",
"weight": 0.8,
"similarity_score": 0.75,
"similarity_breakdown": {
"embedding_score": 0.80, // embedding model
"semantic_score": 0.70, // LLM
"weighted_score": 0.75 // weighted average
},
"original_feature": "猫咪考试祝福"
}
],
"statistics": {
"total_features": 5,
"avg_similarity": 0.72,
"max_similarity": 0.85,
"min_similarity": 0.60
}
}
],
"综合得分P": 0.78 // optional; requires calculate_comprehensive_score to be enabled
}
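The `similarity_breakdown` fields suggest a weighted average of the embedding score and the LLM semantic score. The sketch below reproduces the sample's 0.75 (up to floating-point rounding) from 0.80 and 0.70 with equal weights. Dividing by the sum of the weights is an assumption added here so that weights which do not sum to 1 still give a value in the same range; `weighted_similarity` and `summarize_scores` are hypothetical names.

```python
from typing import Dict, List


def weighted_similarity(
    embedding_score: float,
    semantic_score: float,
    weight_embedding: float = 0.5,
    weight_semantic: float = 0.5,
) -> float:
    """Weighted average of the embedding and LLM semantic scores."""
    total = weight_embedding + weight_semantic
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = weight_embedding * embedding_score + weight_semantic * semantic_score
    return weighted / total


def summarize_scores(scores: List[float]) -> Dict[str, float]:
    """Build a "statistics"-style summary for one note (scores must be non-empty)."""
    return {
        "total_features": len(scores),
        "avg_similarity": sum(scores) / len(scores),
        "max_similarity": max(scores),
        "min_similarity": min(scores),
    }
```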
optimization/
├── lib/ # core library
│ ├── client.py # API client
│ ├── data_loader.py # data loading
│ ├── match_analyzer.py # match analysis
│ ├── relation_analyzer.py # relation analysis
│ ├── semantic_similarity.py # semantic similarity
│ ├── hybrid_similarity.py # hybrid similarity (embedding + LLM)
│ └── utils.py # utility functions
├── src/ # main source code
│ ├── pipeline/ # pipeline control
│ ├── analyzers/ # analyzers (deconstruction / similarity)
│ ├── evaluators/ # evaluators (LLM)
│ ├── clients/ # API clients
│ └── visualizers/ # visualization generators
├── input/posts/ # input data (*_how.json)
├── output_v2/ # output results
├── visualization/ # visualization HTML files
├── cache/search/ # search cache
└── logs/ # log files
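Data processing flow:
EnhancedSearchV2 (src/pipeline/feature_search_pipeline.py) - main pipeline controller
LLMEvaluator (src/evaluators/llm_evaluator.py) - search query generation and result evaluation
PostDeconstructionAnalyzer (src/analyzers/) - deep deconstruction analysis
SimilarityAnalyzer (src/analyzers/) - similarity analysis
XiaohongshuSearch (src/clients/) - Xiaohongshu search API client
OpenRouterClient (src/clients/) - LLM API client
Visualization module:
deconstruction_visualizer.py - integrated visualization (search + evaluation + deconstruction + similarity)
python3 main.py --mode basic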
python3 main.py --mode standard
python3 main.py --mode full
python3 main.py \
--enable-evaluation \
--enable-deep-analysis \
--deep-analysis-min-score 0.9 \
--enable-similarity \
--similarity-weight-embedding 0.6
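For readers who want to see how flags like these could be wired up, the sketch below declares them with `argparse`. How `main.py` actually parses its arguments is not shown in this document, so everything here is an assumption except the flag names, the three mode values, and the two defaults (0.8 and 0.5) listed in the configuration parameters. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags used in the examples above."""
    parser = argparse.ArgumentParser(description="Feature search pipeline")
    parser.add_argument("--mode", choices=["basic", "standard", "full"], default=None)
    parser.add_argument("--enable-evaluation", action="store_true")
    parser.add_argument("--enable-deep-analysis", action="store_true")
    parser.add_argument("--deep-analysis-min-score", type=float, default=0.8)
    parser.add_argument("--enable-similarity", action="store_true")
    parser.add_argument("--similarity-weight-embedding", type=float, default=0.5)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

`argparse` converts dashes in flag names to underscores, so `--deep-analysis-min-score` is read back as `args.deep_analysis_min_score`.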
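top_n: number of top-scoring search queries kept per feature (default 10)
max_total_searches: global maximum number of searches
max_searches_per_feature: maximum number of searches per feature
search_max_workers: search concurrency (default 3)
enable_evaluation: whether to enable result evaluation
evaluation_max_workers: evaluation concurrency (default 10)
evaluation_max_notes_per_query: maximum number of notes evaluated per search (default 20)
enable_deep_analysis: whether to enable deep deconstruction
deep_analysis_min_score: minimum score threshold (default 0.8)
deep_analysis_max_workers: deconstruction concurrency (default 5)
deep_analysis_api_url: deconstruction API URL
enable_similarity: whether to enable similarity analysis
similarity_weight_embedding: embedding model weight (default 0.5)
similarity_weight_semantic: LLM model weight (default 0.5)
similarity_min_similarity: minimum similarity threshold (default 0.0)
calculate_comprehensive_score: whether to compute the comprehensive score P

logs/app.log
output_v2/*.json (output files of each step)
cache/search/*.json (search result cache)
visualization/*.html

Search result visualization:
组合评估结果_分组 → top10_searches → evaluation_with_filter
data[featureIdx]['组合评估结果_分组'][groupIdx]['top10_searches']
search_result.data.data[noteIdx]
noteEvaluations["featureIdx-groupIdx-swIdx-noteIdx"]

Deconstruction result visualization:
evaluated_data[featureIdx]['组合评估结果_分组'][groupIdx]['top10_searches'][swIdx]
deconstruction_mapping[note_id] (linked by note_id)
similarity_mapping[note_id] (linked by note_id)
all_features[feature_name] (extracted from how.json)
note_id (unique post ID)
note_id match
note_id match
原始特征名称 match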
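The lookups above rely on two kinds of keys: a positional key of the form "featureIdx-groupIdx-swIdx-noteIdx" for note evaluations, and `note_id` for the deconstruction and similarity mappings. The sketch below shows how such indexes might be built. It is written in Python for consistency with the rest of the project, and the function names are hypothetical; the visualizer's real code is not shown in this document.

```python
from typing import Any, Dict, List


def note_key(feature_idx: int, group_idx: int, sw_idx: int, note_idx: int) -> str:
    """Positional key in the form "featureIdx-groupIdx-swIdx-noteIdx"."""
    return f"{feature_idx}-{group_idx}-{sw_idx}-{note_idx}"


def build_note_evaluations(data: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index every note evaluation by its position in the nested results."""
    evaluations = {}
    for fi, feature in enumerate(data):
        for gi, group in enumerate(feature.get("组合评估结果_分组", [])):
            for si, search in enumerate(group.get("top10_searches", [])):
                block = search.get("evaluation_with_filter", {})
                for ev in block.get("notes_evaluation", []):
                    evaluations[note_key(fi, gi, si, ev["note_index"])] = ev
    return evaluations


def build_mapping(results: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index deconstruction or similarity results by note_id."""
    return {r["note_id"]: r for r in results}
```

Because both the deconstruction and similarity outputs carry `note_id` in each result, the same `build_mapping` helper can serve for `deconstruction_mapping` and `similarity_mapping` in this sketch.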