Add FastAPI service, reusing the stage 3-7 pipeline

刘立冬, 2 weeks ago
Commit ed8a112c80
5 changed files with 424 additions and 2 deletions
  1. api/CURL_EXAMPLES.md (+222, -0)
  2. api/README.md (+197, -0)
  3. api/config.py (+1, -1)
  4. api/search_service.py (+1, -1)
  5. requirements.txt (+3, -0)

+ 222 - 0
api/CURL_EXAMPLES.md

@@ -0,0 +1,222 @@
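+# Accessing the API service with curl
+
+## 1. Start the service
+
+First make sure the dependencies are installed and the environment variables are set: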
+
+```bash
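+# Install dependencies
+pip install -r requirements.txt
+
+# Set the OpenRouter API key (required)
+export OPENROUTER_API_KEY='your-api-key'
+
+# Optional: set the port (api/config.py defaults to 8001; the examples below assume 8000)
+export API_PORT='8000'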
+```
+
+Start the service:
+
+```bash
+python api/main.py
+```
+
+Once the service starts, you should see output similar to:
+```
+INFO:     Started server process
+INFO:     Waiting for application startup.
+INFO:     Pipeline包装器初始化成功
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:8000
+```
+
+## 2. Test the health check endpoint
+
+```bash
+curl http://localhost:8000/health
+```
+
+Expected response:
+```json
+{
+  "status": "healthy",
+  "pipeline_initialized": true
+}
+```
+
+## 3. Send a search request
+
+### Basic request
+
+```bash
+curl -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "original_target": "墨镜",
+    "persona_features": [
+      {"persona_feature_name": "时尚达人"},
+      {"persona_feature_name": "潮流穿搭"},
+      {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+  }'
+```
+
+### Pretty-printed output (with jq)
+
+If `jq` is installed, you can pretty-print the JSON output:
+
+```bash
+curl -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "original_target": "墨镜",
+    "persona_features": [
+      {"persona_feature_name": "时尚达人"},
+      {"persona_feature_name": "潮流穿搭"},
+      {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+  }' | jq '.'
+```
+
+### Save the response to a file
+
+```bash
+curl -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "original_target": "墨镜",
+    "persona_features": [
+      {"persona_feature_name": "时尚达人"},
+      {"persona_feature_name": "潮流穿搭"},
+      {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+  }' -o response.json
+```
+
+### Show verbose request details
+
+```bash
+curl -v -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "original_target": "墨镜",
+    "persona_features": [
+      {"persona_feature_name": "时尚达人"},
+      {"persona_feature_name": "潮流穿搭"},
+      {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+  }'
+```
+
+### Read the request body from a file
+
+Create a request file named `request.json`:
+
+```json
+{
+  "original_target": "墨镜",
+  "persona_features": [
+    {"persona_feature_name": "时尚达人"},
+    {"persona_feature_name": "潮流穿搭"},
+    {"persona_feature_name": "配饰搭配"}
+  ],
+  "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+}
+```
+
+Then send it:
+
+```bash
+curl -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d @request.json
+```
+
+## 4. Complete example script
+
+Create a test script named `test_api.sh`:
+
+```bash
+#!/bin/bash
+
+# Set variables
+API_URL="http://localhost:8000"
+ORIGINAL_TARGET="墨镜"
+
+# Test the health check
+echo "=== Health check ==="
+curl -s "${API_URL}/health" | jq '.'
+echo -e "\n"
+
+# Test a search request
+echo "=== Search request ==="
+curl -X POST "${API_URL}/what/search" \
+  -H "Content-Type: application/json" \
+  -d "{
+    \"original_target\": \"${ORIGINAL_TARGET}\",
+    \"persona_features\": [
+      {\"persona_feature_name\": \"时尚达人\"},
+      {\"persona_feature_name\": \"潮流穿搭\"},
+      {\"persona_feature_name\": \"配饰搭配\"}
+    ],
+    \"candidate_words\": [\"太阳镜\", \"墨镜\", \"遮阳\", \"时尚\", \"潮流\"]
+  }" | jq '.'
+```
+
+Run the script:
+
+```bash
+chmod +x test_api.sh
+./test_api.sh
+```
+
+## 5. Troubleshooting
+
+### Connection refused
+
+If you get a `Connection refused` error:
+
+1. Check whether the service is running:
+   ```bash
+   ps aux | grep "api/main.py"
+   ```
+
+2. Check whether the port is in use:
+   ```bash
+   lsof -i :8000
+   ```
+
+3. Check the firewall settings
+
+### 500 errors
+
+If the service returns a 500 error, check:
+
+1. Whether the OpenRouter API key is set:
+   ```bash
+   echo $OPENROUTER_API_KEY
+   ```
+
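+2. The error messages in the service log
+
+### 400 errors
+
+If the service returns a 400 error, check:
+
+1. Whether the JSON is well-formed
+2. Whether all required fields are present
+3. Whether the field values are valid (for example, not empty)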
+
+## 6. View the API documentation
+
+After starting the service, open the following in a browser:
+
+- Swagger UI: http://localhost:8000/docs
+- ReDoc: http://localhost:8000/redoc
+
+These pages provide interactive API documentation, so you can try the API directly from the browser.
+
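Reviewer sketch (not part of the commit): the curl calls in `api/CURL_EXAMPLES.md` could be mirrored in Python with only the standard library. The helper names `build_search_payload` and `build_search_request` are hypothetical, and the base URL in the usage note assumes the service listens on port 8000 as in the curl examples. The block only constructs the request, so it runs without a live service.

```python
import json
from urllib import request


def build_search_payload(original_target, persona_names, candidate_words):
    """Build the JSON body shown in the curl examples."""
    return {
        "original_target": original_target,
        "persona_features": [
            {"persona_feature_name": name} for name in persona_names
        ],
        "candidate_words": list(candidate_words),
    }


def build_search_request(base_url, payload):
    """Wrap the payload in a POST request for the /what/search route."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/what/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `build_search_request("http://localhost:8000", payload)` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload shape easy to check before any network call.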

+ 197 - 0
api/README.md

@@ -0,0 +1,197 @@
+# Feature Search API Service
+
+## Overview
+
+This FastAPI service reuses the complete stage 3-7 pipeline to provide search and evaluation.
+
+## Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+## Configure environment variables
+
+```bash
+export OPENROUTER_API_KEY='your-api-key'
+export API_HOST='0.0.0.0'  # optional, default 0.0.0.0
+export API_PORT='8000'     # optional, default 8001 (see api/config.py)
+```
+
+## Start the service
+
+```bash
+python api/main.py
+```
+
+Or start it directly with uvicorn:
+
+```bash
+uvicorn api.search_service:app --host 0.0.0.0 --port 8000
+```
+
+## API documentation
+
+After starting the service, visit the following URLs for the auto-generated API documentation:
+
+- Swagger UI: http://localhost:8000/docs
+- ReDoc: http://localhost:8000/redoc
+
+## API endpoints
+
+### POST /what/search
+
+Runs search and evaluation.
+
+**Request body:**
+
+```json
+{
+  "original_target": "original target name",
+  "persona_features": [
+    {
+      "persona_feature_name": "persona feature name 1"
+    },
+    {
+      "persona_feature_name": "persona feature name 2"
+    },
+    {
+      "persona_feature_name": "persona feature name 3"
+    }
+  ],
+  "candidate_words": ["candidate word 1", "candidate word 2", "candidate word 3"]
+}
+```
+
+**Response:**
+
+```json
+{
+  "original_target": "original target name",
+  "search_results": [
+    {
+      "search_word": "search word",
+      "comprehensive_score": 0.85,
+      "comprehensive_score_detail": {
+        "N": 20,
+        "M": 5,
+        "total_contribution": 4.25,
+        "P": 0.2125
+      },
+      "matched_notes": [
+        {
+          "note_id": "post ID",
+          "note_title": "post title",
+          "evaluation_score": 0.9,
+          "max_similarity": 0.95,
+          "contribution": 0.855,
+          "note_data": {
+            // full search result data for the post
+          }
+        }
+      ]
+    }
+  ]
+}
+```
+
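+**Notes:**
+- Only search results with a comprehensive score P > 0 are returned
+- `matched_notes` carries the full post data in the `note_data` field
+
+### GET /health
+
+Health check endpoint.
+
+**Response:**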
+
+```json
+{
+  "status": "healthy",
+  "pipeline_initialized": true
+}
+```
+
+## Usage examples
+
+### Using curl
+
+```bash
+curl -X POST "http://localhost:8000/what/search" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "original_target": "墨镜",
+    "persona_features": [
+      {"persona_feature_name": "时尚达人"},
+      {"persona_feature_name": "潮流穿搭"},
+      {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+  }'
+```
+
+### Using Python
+
+```python
+import requests
+
+url = "http://localhost:8000/what/search"
+data = {
+    "original_target": "墨镜",
+    "persona_features": [
+        {"persona_feature_name": "时尚达人"},
+        {"persona_feature_name": "潮流穿搭"},
+        {"persona_feature_name": "配饰搭配"}
+    ],
+    "candidate_words": ["太阳镜", "墨镜", "遮阳", "时尚", "潮流"]
+}
+
+response = requests.post(url, json=data)
+result = response.json()
+print(result)
+```
+
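+## Notes
+
+1. **Existing code is untouched**: the API service is implemented independently and does not modify stages 1-2 in `src/pipeline/feature_search_pipeline.py`
+2. **Local testing is unaffected**: the existing `main.py` and the `run_full_pipeline` method remain unchanged
+3. **Temporary files**: the service creates files in a temporary directory and cleans them up automatically on shutdown
+4. **Comprehensive score P**: only search results with P > 0 are returned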
+
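+## Configuration options
+
+The following options can be set through environment variables:
+
+- `OPENROUTER_API_KEY`: OpenRouter API key (required)
+- `API_HOST`: host address for the API service (default: 0.0.0.0)
+- `API_PORT`: port for the API service (default: 8001)
+- `SEARCH_MAX_WORKERS`: search concurrency (default: 3)
+- `EVALUATION_MAX_WORKERS`: evaluation concurrency (default: 10)
+- `EVALUATION_MAX_NOTES_PER_QUERY`: maximum number of posts evaluated per search word (default: 20)
+- `DEEP_ANALYSIS_MAX_WORKERS`: deep-analysis concurrency (default: 5)
+- `DEEP_ANALYSIS_MIN_SCORE`: minimum score threshold for deep analysis (default: 0.8)
+- `SIMILARITY_WEIGHT_EMBEDDING`: weight of the embedding model in similarity analysis (default: 0.5)
+- `SIMILARITY_WEIGHT_SEMANTIC`: weight of the LLM in similarity analysis (default: 0.5)
+- `SIMILARITY_MAX_WORKERS`: similarity-analysis concurrency (default: 5)
+- `MAX_CANDIDATES`: maximum number of candidate words used in combinations (default: 20)
+- `MAX_COMBO_LENGTH`: maximum number of words per combination (default: 3)
+- `QUERY_GENERATION_MAX_WORKERS`: query-generation concurrency (default: 8)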
+
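+## Error handling
+
+The API returns the following HTTP status codes:
+
+- `200`: success
+- `400`: invalid request parameters
+- `500`: internal server error
+- `503`: service unavailable (pipeline not initialized)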
+
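+## Logging
+
+The service writes detailed log output, including:
+- Request handling progress
+- Execution status of each stage
+- Error messages
+
+The log level can be configured through Python's logging module.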
+
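Reviewer sketch (not part of the commit): the sample response in `api/README.md` is consistent with two simple relations, `contribution = evaluation_score * max_similarity` (0.9 × 0.95 = 0.855) and `P = total_contribution / N` (4.25 / 20 = 0.2125). The commit does not include the scoring code, so both relations are assumptions inferred from that one sample, and the function names below are hypothetical.

```python
import math


def note_contribution(evaluation_score: float, max_similarity: float) -> float:
    """Assumed relation: contribution = evaluation_score * max_similarity."""
    return evaluation_score * max_similarity


def comprehensive_p(total_contribution: float, n: int) -> float:
    """Assumed relation: P = total_contribution / N (0.0 when N is not positive)."""
    return total_contribution / n if n > 0 else 0.0


def keep_positive(search_results: list) -> list:
    """Mirror the documented rule that only results with P > 0 are returned."""
    return [
        r for r in search_results
        if r["comprehensive_score_detail"]["P"] > 0
    ]


# Values copied from the README's sample response.
sample_detail = {"N": 20, "M": 5, "total_contribution": 4.25, "P": 0.2125}
matches_note = math.isclose(note_contribution(0.9, 0.95), 0.855)
matches_p = math.isclose(
    comprehensive_p(sample_detail["total_contribution"], sample_detail["N"]),
    sample_detail["P"],
)
print(matches_note, matches_p)
```

A check like this is only a sanity test on response payloads. If the pipeline computes the score differently, the comparison would fail on real data, and that would be the signal to consult the scoring code.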

+ 1 - 1
api/config.py

@@ -13,7 +13,7 @@ class APIConfig:
     
     # API service configuration
     API_HOST: str = os.getenv("API_HOST", "0.0.0.0")
-    API_PORT: int = int(os.getenv("API_PORT", "8000"))
+    API_PORT: int = int(os.getenv("API_PORT", "8001"))
     
     # Pipeline configuration
     OPENROUTER_API_KEY: Optional[str] = os.getenv("OPENROUTER_API_KEY")
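
Reviewer sketch (not part of the commit): in the hunk above, `int(os.getenv("API_PORT", "8001"))` runs once when the class body executes, so a non-numeric `API_PORT` raises a bare `ValueError` at import time. The helper below is a hypothetical alternative that reads the same variable with the same default and adds a range check; the name `read_api_port` and the `environ` parameter are assumptions made for illustration.

```python
import os

DEFAULT_API_PORT = "8001"  # default string used by api/config.py after this commit


def read_api_port(environ=None) -> int:
    """Return the API port from the environment, validating the value."""
    env = os.environ if environ is None else environ
    raw = env.get("API_PORT", DEFAULT_API_PORT)
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"API_PORT must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"API_PORT out of range: {port}")
    return port
```

Accepting a mapping instead of always reading `os.environ` keeps the function testable without touching the process environment.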

+ 1 - 1
api/search_service.py

@@ -99,7 +99,7 @@ async def shutdown_event():
             logger.warning(f"Pipeline包装器清理失败: {e}")
 
 
-@app.post("/search", response_model=SearchResponse)
+@app.post("/what/search", response_model=SearchResponse)
 async def search(request: SearchRequest):
     """
     Run search and evaluation
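
Reviewer sketch (not part of the commit): because the route moves from `/search` to `/what/search`, a client may want to turn HTTP failures into readable errors. The hints for 400, 500 and 503 come from the status codes listed in `api/README.md`. The 404 and 422 entries are assumptions based on FastAPI's default behavior for unknown routes and request-body validation failures. `post_search` and its `opener` parameter are hypothetical names.

```python
import json
from urllib import error, request

# 400/500/503 follow api/README.md; 404 and 422 are assumed FastAPI defaults.
STATUS_HINTS = {
    400: "invalid request parameters",
    404: "route not found (the path is now /what/search)",
    422: "request body failed validation",
    500: "internal server error",
    503: "service unavailable (pipeline not initialized)",
}


def post_search(req, opener=request.urlopen):
    """Send a prepared request and return the decoded JSON body.

    `opener` is injectable so the function can be exercised without a server.
    """
    try:
        with opener(req) as resp:
            return json.load(resp)
    except error.HTTPError as exc:
        hint = STATUS_HINTS.get(exc.code, "unexpected status")
        raise RuntimeError(f"HTTP {exc.code}: {hint}") from exc
```

In real use, `req` would be a `urllib.request.Request` aimed at `/what/search` and `opener` would stay at its default.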

+ 3 - 0
requirements.txt

@@ -1,2 +1,5 @@
 requests>=2.28.0
 pyyaml>=6.0
+fastapi>=0.104.0
+uvicorn[standard]>=0.24.0
+pydantic>=2.0.0