
Initial commit: AI Architecture framework

刘立冬 2 weeks ago
commit
bddd69f63a

+ 135 - 0
.gitignore

@@ -0,0 +1,135 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+ai_arch_env/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Project specific
+data/
+logs/
+*.db
+*.sqlite
+*.sqlite3
+
+# API keys and secrets
+.env.local
+.env.production
+.env.staging
+
+# Temporary files
+*.tmp
+*.temp
+temp/
+tmp/

+ 103 - 0
INSTALL.md

@@ -0,0 +1,103 @@
+# Installation Guide
+
+## Dependency Management
+
+### Known Issues
+
+During installation, some dependencies may fail because of build problems:
+
+1. **faiss-cpu**: requires the SWIG compiler and is difficult to build on some systems
+2. **selenium**: needs a browser driver, which adds complexity
+3. **Development tools**: black, isort, flake8, and similar tools are optional
+
+### Solution
+
+We provide two requirements files:
+
+#### 1. requirements-minimal.txt (recommended)
+Contains only the dependencies required for core functionality, so installation is most likely to succeed:
+
+```bash
+pip install -r requirements-minimal.txt
+```
+
+#### 2. requirements.txt
+Contains all dependencies, including optional tools:
+
+```bash
+pip install -r requirements.txt
+```
+
+### If You Still Run into Problems
+
+#### For faiss-cpu:
+```bash
+# Option 1: install via conda
+conda install -c conda-forge faiss-cpu
+
+# Option 2: use a prebuilt wheel
+pip install faiss-cpu --no-build-isolation
+
+# Option 3: skip it for now and use ChromaDB only
+# The project is configured to use ChromaDB by default
+```
+
+#### For other build problems:
+```bash
+# Install build tools
+# Ubuntu/Debian
+sudo apt-get install build-essential
+
+# macOS
+xcode-select --install
+
+# Windows
+# Install Visual Studio Build Tools
+```
+
+### Feature Impact
+
+Impact of removing certain dependencies:
+
+1. **FAISS vector store**: disabled for now, but ChromaDB provides full functionality
+2. **Selenium**: web scraping is limited, but everything else works normally
+3. **Development tools**: code formatting is unavailable, but core features are unaffected
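
The FAISS-to-ChromaDB fallback described above can be expressed as a small runtime check. This is an illustrative sketch only, not the project's actual selection code; `pick_vector_backend` is a hypothetical helper:

```python
import importlib.util

def pick_vector_backend() -> str:
    """Prefer FAISS when its compiled package is importable; otherwise fall back to ChromaDB."""
    if importlib.util.find_spec("faiss") is not None:
        return "faiss"
    return "chroma"

print(pick_vector_backend())
```

The same pattern works for any optional dependency: probe with `importlib.util.find_spec` instead of a bare `import`, so the check never triggers a failed build at import time.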
+
+### Verifying the Installation
+
+After installation, run the following commands to verify it works:
+
+```bash
+# Run the basic example
+python examples/basic_usage.py
+
+# Or use the example runner
+python run_examples.py
+```
+
+### System Requirements
+
+- Python 3.8+
+- Memory: at least 2 GB RAM
+- Disk space: at least 500 MB free
+
+### Supported Platforms
+
+- ✅ Linux (Ubuntu 18.04+, CentOS 7+)
+- ✅ macOS (10.14+)
+- ✅ Windows (10+)
+
+### Troubleshooting
+
+If you run into installation problems:
+
+1. Check your Python version: `python --version`
+2. Upgrade pip: `pip install --upgrade pip`
+3. Use a virtual environment:
+   ```bash
+   python -m venv ai_arch_env
+   source ai_arch_env/bin/activate  # Linux/macOS
+   # or
+   ai_arch_env\Scripts\activate     # Windows
+   ```
+4. Try the minimal installation: `pip install -r requirements-minimal.txt`

+ 61 - 0
README.md

@@ -0,0 +1,61 @@
+# AI Architecture - A General-Purpose LangChain Service Framework
+
+An AI engineering framework built on LangChain that provides reusable AI service implementations and wrappers.
+
+## Features
+
+- 🚀 Modular architecture built on LangChain
+- 🔧 General-purpose AI service wrappers
+- 📝 Complete documentation and examples
+- 🛠️ Easy to extend and customize
+- 🔒 Secure configuration management
+
+## Project Structure
+
+```
+ai_arch/
+├── src/                    # Source code
+│   ├── core/              # Core modules
+│   ├── services/          # Service layer
+│   ├── utils/             # Utility functions
+│   └── config/            # Configuration management
+├── examples/              # Usage examples
+├── tests/                 # Tests
+├── docs/                  # Documentation
+├── requirements.txt       # Dependencies
+└── README.md              # Project overview
+```
+
+## Quick Start
+
+1. Install dependencies:
+```bash
+# Full installation
+pip install -r requirements.txt
+
+# Or minimal installation (recommended)
+pip install -r requirements-minimal.txt
+```
+
+2. Configure environment variables:
+```bash
+cp env.example .env
+# Edit the .env file and add your API keys
+```
+
+3. Run an example:
+```bash
+python examples/basic_usage.py
+```
+
+## Main Features
+
+- **LLM service wrapper**: a unified LLM interface, using OpenRouter by default
+- **Vector store integration**: supports multiple vector databases
+- **Document processing**: automatic document loading and processing
+- **Conversation management**: session state management
+- **Tool integration**: wrappers for common AI tools
+
+## License
+
+MIT License

+ 389 - 0
docs/README.md

@@ -0,0 +1,389 @@
+# AI Architecture Reference Documentation
+
+## Overview
+
+AI Architecture is a general-purpose AI service framework built on LangChain. It has a modular design and supports multiple LLM providers, vector databases, and document processing features.
+
+## Architecture
+
+### Core Modules
+
+#### 1. Configuration management (`src/config/`)
+- **Settings**: a unified configuration class supporting environment variables and config files
+- Multiple configuration sources: environment variables, .env files, defaults
+- Type checking and configuration validation
+
+#### 2. Utilities (`src/utils/`)
+- **Logger**: unified logging with console and file output
+- **CacheManager**: in-memory caching with TTL support and a decorator interface
+
+#### 3. Core functionality (`src/core/`)
+- **LLMManager**: multi-provider LLM management
+- **VectorStoreManager**: vector database management
+- **DocumentProcessor**: document processing and splitting
+
+#### 4. Service layer (`src/services/`)
+- **ChatService**: chat and session management
+- **QAService**: vector-store-backed question answering
+- **DocumentService**: document management and search
+
+## Feature Details
+
+### LLM Manager
+
+Supports multiple LLM providers:
+
+```python
+from src.core.llm_manager import llm_manager
+
+# Create an OpenAI LLM
+llm = llm_manager.create_llm("openai", model="gpt-4")
+
+# Create an Anthropic LLM
+llm = llm_manager.create_llm("anthropic", model="claude-3-sonnet-20240229")
+
+# Test the connection
+if llm_manager.test_connection("openai"):
+    print("Connection OK")
+```
+
+### Vector Store Management
+
+ChromaDB is supported (FAISS is temporarily disabled because of build issues):
+
+```python
+from src.core.vector_store import vector_store_manager
+
+# Create a ChromaDB store
+chroma_store = vector_store_manager.create_chroma_store("my_collection")
+
+# FAISS stores are temporarily disabled (SWIG compiler required)
+# faiss_store = vector_store_manager.create_faiss_store("my_index")
+
+# Add documents
+vector_store_manager.add_documents(documents, "chroma", "my_collection")
+
+# Similarity search
+results = vector_store_manager.similarity_search("query text", k=4)
+```
+
+### Document Processing
+
+Multiple document formats are supported:
+
+```python
+from src.core.document_processor import document_processor
+
+# Supported formats
+formats = document_processor.supported_extensions
+# {'.pdf': PyPDFLoader, '.docx': Docx2txtLoader, '.txt': TextLoader, ...}
+
+# Process a single document
+documents = document_processor.process_document_pipeline("document.pdf")
+
+# Process a directory
+documents = document_processor.load_directory("./docs", recursive=True)
+
+# Split documents into chunks
+split_docs = document_processor.split_documents(documents, chunk_size=1000)
+```
+
+### Chat Service
+
+Full session management:
+
+```python
+from src.services.chat_service import chat_service
+
+# Create a session
+session = chat_service.create_session("user_123", "You are a helpful assistant")
+
+# Send a message
+response = chat_service.send_message("user_123", "Hello")
+
+# Get the conversation history
+messages = session.get_messages()
+
+# Get a conversation summary
+summary = chat_service.get_conversation_summary("user_123")
+```
+
+### QA Service
+
+Vector-store-backed question answering:
+
+```python
+from src.services.qa_service import qa_service
+
+# Add documents to the knowledge base
+qa_service.add_documents_for_qa(documents, collection_name="knowledge_base")
+
+# Ask a question
+answer = qa_service.ask_question("What is artificial intelligence?", collection_name="knowledge_base")
+
+# Batch QA
+questions = ["Question 1", "Question 2", "Question 3"]
+results = qa_service.batch_qa(questions, collection_name="knowledge_base")
+```
+
+### Document Service
+
+Document management and search:
+
+```python
+from src.services.document_service import document_service
+
+# Process and store a document
+result = document_service.process_and_store_document(
+    "document.pdf", 
+    collection_name="my_docs"
+)
+
+# Search documents
+results = document_service.search_documents("keyword", k=5)
+
+# Get collection info
+stats = document_service.get_collection_info("my_docs")
+```
+
+## Configuration
+
+### Environment Variables
+
+Copy `env.example` to `.env` and configure it:
+
+```bash
+# OpenAI
+OPENAI_API_KEY=your_openai_api_key
+OPENAI_API_BASE=https://api.openai.com/v1
+
+# Anthropic
+ANTHROPIC_API_KEY=your_anthropic_api_key
+
+# Vector stores
+CHROMA_DB_PATH=./data/chroma_db
+FAISS_INDEX_PATH=./data/faiss_index
+
+# Logging
+LOG_LEVEL=INFO
+LOG_FILE=./logs/ai_arch.log
+
+# Application
+APP_NAME=AI_Architecture
+DEBUG=false
+
+# Caching
+CACHE_ENABLED=true
+CACHE_TTL=3600
+
+# LLM parameters
+MAX_TOKENS=4000
+TEMPERATURE=0.7
+```
+
+### Configuration Priority
+
+1. Environment variables
+2. .env file
+3. Defaults
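
The priority chain above can be sketched with plain stdlib code. This is illustrative only: `resolve_setting` is a hypothetical helper, while in the framework itself the lookup is performed by pydantic-settings inside the `Settings` class:

```python
import os

def resolve_setting(name: str, dotenv: dict, default: str) -> str:
    """Resolve one setting: environment variable first, then the parsed .env file, then the default."""
    if name in os.environ:
        return os.environ[name]
    return dotenv.get(name, default)

# A value from .env wins over the built-in default ...
assert resolve_setting("DEMO_LOG_LEVEL", {"DEMO_LOG_LEVEL": "DEBUG"}, "INFO") == "DEBUG"

# ... but an exported environment variable wins over both.
os.environ["DEMO_LOG_LEVEL"] = "WARNING"
assert resolve_setting("DEMO_LOG_LEVEL", {"DEMO_LOG_LEVEL": "DEBUG"}, "INFO") == "WARNING"
```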
+
+## Usage Examples
+
+### Basic Usage
+
+```python
+from src.services.chat_service import chat_service
+from src.services.qa_service import qa_service
+
+# Create a chat session
+session = chat_service.create_session("user_123")
+response = chat_service.send_message("user_123", "Hello")
+
+# Build a QA system
+qa_service.add_documents_for_qa(documents)
+answer = qa_service.ask_question("A question")
+```
+
+### Advanced Usage
+
+```python
+from src.core.llm_manager import llm_manager
+from src.core.vector_store import vector_store_manager
+
+# Multi-provider support
+llm_openai = llm_manager.create_llm("openai", model="gpt-4")
+llm_anthropic = llm_manager.create_llm("anthropic", model="claude-3-sonnet")
+
+# Vector store operations
+vector_store = vector_store_manager.get_or_create_store("chroma", "my_collection")
+vector_store.add_documents(documents)
+results = vector_store.similarity_search("query")
+```
+
+## Extending the Framework
+
+### Adding a New LLM Provider
+
+```python
+from src.core.llm_manager import BaseLLMProvider
+
+class CustomProvider(BaseLLMProvider):
+    def create_llm(self, **kwargs):
+        # Implement LLM creation logic here
+        pass
+    
+    def get_available_models(self):
+        # Return the list of available models
+        return ["model1", "model2"]
+```
+
+### Adding a New Vector Store
+
+```python
+from src.core.vector_store import VectorStoreManager
+
+# Add a new store type to VectorStoreManager
+def create_custom_store(self, name: str):
+    # Implement the custom store logic here
+    pass
+```
+
+### Adding a New Document Format
+
+```python
+from src.core.document_processor import DocumentProcessor
+
+# Register a new loader in DocumentProcessor
+self.supported_extensions['.custom'] = CustomLoader
+```
+
+## Best Practices
+
+### 1. Error Handling
+
+```python
+try:
+    response = chat_service.send_message(session_id, message)
+except ValueError as e:
+    logger.error(f"Invalid argument: {e}")
+except Exception as e:
+    logger.error(f"Unexpected error: {e}")
+```
+
+### 2. Caching
+
+```python
+from src.utils.cache import cache_manager
+
+# Use the cache decorator
+@cache_manager.cache(ttl=3600)
+def expensive_operation():
+    # Expensive work goes here
+    pass
+
+# Manual cache management
+cache_manager.set("key", value, ttl=1800)
+cached_value = cache_manager.get("key")
+```
+
+### 3. Logging
+
+```python
+from loguru import logger
+
+logger.info("Operation started")
+logger.debug("Debug details")
+logger.warning("Warning message")
+logger.error("Error message")
+```
+
+### 4. Configuration
+
+```python
+import sys
+
+from loguru import logger
+from src.config.settings import settings
+
+# Use settings (loguru has no setLevel(); reconfigure the sink level instead)
+if settings.debug:
+    logger.remove()
+    logger.add(sys.stderr, level="DEBUG")
+
+if settings.cache_enabled:
+    # Enable caching
+    pass
+```
+
+## Performance
+
+### 1. Batch Processing
+
+```python
+# Process documents in batches
+documents = document_processor.load_directory("./docs")
+vector_store_manager.add_documents(documents)
+
+# Batch QA
+questions = ["Question 1", "Question 2", "Question 3"]
+results = qa_service.batch_qa(questions)
+```
+
+### 2. Caching Strategy
+
+```python
+# Reuse cached LLM instances
+llm = llm_manager.get_or_create_llm("openai", model="gpt-4")
+
+# Reuse cached vector stores
+vector_store = vector_store_manager.get_or_create_store("chroma", "collection")
+```
+
+### 3. Async Processing
+
+```python
+import asyncio
+
+# qa_service.ask_question is synchronous, so run each call in a worker thread
+async def async_qa(questions):
+    tasks = [asyncio.to_thread(qa_service.ask_question, q) for q in questions]
+    results = await asyncio.gather(*tasks)
+    return results
+```
+
+## Troubleshooting
+
+### Common Problems
+
+1. **Invalid API key**
+   - Check your environment variables
+   - Verify that the API key is valid
+
+2. **Vector store connection failures**
+   - Check permissions on the database path
+   - Verify the database configuration
+
+3. **Document processing failures**
+   - Check that the file format is supported
+   - Verify file integrity
+
+4. **Out of memory**
+   - Reduce the document chunk size
+   - Enable cache eviction
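
A quick self-check for the first two failure modes above can look like this. It is an illustrative sketch: `quick_diagnostics` is a hypothetical helper, and the path default mirrors `env.example`:

```python
import os
from pathlib import Path

def quick_diagnostics() -> list:
    """Return human-readable hints for common misconfigurations."""
    problems = []
    # Missing API key is the most common failure
    if not os.getenv("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is not set")
    # An unwritable database directory breaks the vector store
    db_path = Path(os.getenv("CHROMA_DB_PATH", "./data/chroma_db"))
    if db_path.exists() and not os.access(db_path, os.W_OK):
        problems.append(f"no write permission on {db_path}")
    return problems

for hint in quick_diagnostics():
    print("hint:", hint)
```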
+
+### Debug Mode
+
+```python
+import sys
+from loguru import logger
+
+# Enable debug mode (loguru has no setLevel(); reconfigure the sink instead)
+settings.debug = True
+logger.remove()
+logger.add(sys.stderr, level="DEBUG")
+
+# Test the connection
+llm_manager.test_connection("openai")
+```
+
+## Contributing
+
+1. Fork the project
+2. Create a feature branch
+3. Commit your changes
+4. Open a pull request
+
+## License
+
+MIT License

+ 31 - 0
env.example

@@ -0,0 +1,31 @@
+# OpenAI API
+OPENAI_API_KEY=your_openai_api_key_here
+OPENAI_API_BASE=https://api.openai.com/v1
+
+# Anthropic API
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
+
+# OpenRouter API
+OPENROUTER_API_KEY=your_openrouter_api_key_here
+OPENROUTER_API_BASE=https://openrouter.ai/api/v1
+
+# Vector stores
+CHROMA_DB_PATH=./data/chroma_db
+FAISS_INDEX_PATH=./data/faiss_index
+
+# Logging
+LOG_LEVEL=INFO
+LOG_FILE=./logs/ai_arch.log
+
+# Application
+APP_NAME=AI_Architecture
+APP_VERSION=1.0.0
+DEBUG=false
+
+# Caching
+CACHE_ENABLED=true
+CACHE_TTL=3600
+
+# LLM parameters
+MAX_TOKENS=4000
+TEMPERATURE=0.7

+ 3 - 0
examples/__init__.py

@@ -0,0 +1,3 @@
+"""
+Usage examples.
+"""

+ 284 - 0
examples/advanced_usage.py

@@ -0,0 +1,284 @@
+"""
+Advanced usage examples showcasing more features.
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add the project root to the Python path
+project_root = Path(__file__).parent.parent
+sys.path.insert(0, str(project_root))
+
+from src.core.llm_manager import llm_manager
+from src.core.vector_store import vector_store_manager
+from src.core.document_processor import document_processor
+from src.services.chat_service import chat_service
+from src.services.qa_service import qa_service
+from src.services.document_service import document_service
+from src.utils.logger import setup_logger
+from src.utils.cache import cache_manager
+from loguru import logger
+
+
+def demo_multi_provider_llm():
+    """Demonstrate using multiple LLM providers."""
+    print("=== Multi-Provider LLM Demo ===")
+    
+    providers = ["openrouter", "openai", "anthropic"]
+    
+    for provider in providers:
+        try:
+            print(f"\nUsing provider: {provider}")
+            
+            # List available models
+            models = llm_manager.get_available_models(provider)
+            print(f"  Available models: {models[:3]}...")  # show only the first 3
+            
+            # Test the connection
+            if llm_manager.test_connection(provider):
+                print(f"  {provider} connection OK")
+                
+                # Create an LLM instance
+                llm = llm_manager.get_or_create_llm(provider)
+                
+                # Simple smoke test
+                response = llm.invoke("Introduce yourself in one sentence")
+                print(f"  Reply: {response.content[:100]}...")
+            else:
+                print(f"  {provider} connection failed")
+                
+        except Exception as e:
+            print(f"  {provider} test failed: {e}")
+
+
+def demo_advanced_chat():
+    """Demonstrate advanced chat features."""
+    print("\n=== Advanced Chat Demo ===")
+    
+    # Create several sessions with different system prompts
+    sessions = [
+        ("session_1", "You are a technical expert who answers programming questions"),
+        ("session_2", "You are a creative writing assistant who helps users craft stories"),
+        ("session_3", "You are a data analyst who helps users analyze data")
+    ]
+    
+    for session_id, system_prompt in sessions:
+        try:
+            # Create the session
+            session = chat_service.create_session(session_id, system_prompt)
+            print(f"\nCreated session: {session_id}")
+            
+            # Send a message
+            if session_id == "session_1":
+                message = "Please explain what a Python decorator is"
+            elif session_id == "session_2":
+                message = "Please write the opening of a short story about space exploration"
+            else:
+                message = "How do I analyze trends in sales data?"
+            
+            response = chat_service.send_message(session_id, message)
+            print(f"  Question: {message}")
+            print(f"  Answer: {response['assistant_message'][:150]}...")
+            
+        except Exception as e:
+            print(f"  Session {session_id} failed: {e}")
+    
+    # Summarize each conversation
+    for session_id, _ in sessions:
+        try:
+            summary = chat_service.get_conversation_summary(session_id)
+            print(f"\nSession {session_id} summary: {summary[:100]}...")
+        except Exception as e:
+            print(f"Failed to summarize session {session_id}: {e}")
+
+
+def demo_document_processing():
+    """Demonstrate document processing features."""
+    print("\n=== Document Processing Demo ===")
+    
+    # Create test documents of different kinds
+    test_docs = {
+        "technical.txt": """
+Artificial intelligence is advancing rapidly, and machine learning algorithms are widely applied across many fields.
+Deep learning models such as GPT and BERT excel at natural language processing tasks.
+Computer vision has also made major breakthroughs in image recognition and object detection.
+        """,
+        "business.txt": """
+Digital transformation is a major trend: cloud computing and big-data analytics are reshaping business models.
+Adoption of remote work and online collaboration tools has surged, boosting enterprise efficiency.
+Data security and privacy protection have become key concerns for companies.
+        """,
+        "creative.txt": """
+Creative writing demands a rich imagination and a distinctive point of view; every story has its own charm.
+Literary writing is more than stacking words together; it expresses emotion and thought.
+Good work resonates with readers and conveys deep insights about life.
+        """
+    }
+    
+    # Write the documents to disk
+    for filename, content in test_docs.items():
+        with open(filename, "w", encoding="utf-8") as f:
+            f.write(content.strip())
+    
+    try:
+        # Process the documents in a batch
+        for filename in test_docs.keys():
+            print(f"\nProcessing document: {filename}")
+            
+            # Validate the document
+            validation = document_service.validate_document(filename)
+            print(f"  Validation result: {validation['valid']}")
+            
+            if validation["valid"]:
+                # Process and store the document
+                result = document_service.process_and_store_document(
+                    filename,
+                    collection_name=f"demo_{filename.split('.')[0]}",
+                    additional_metadata={"category": "demo", "filename": filename}
+                )
+                print(f"  Done: {result['processed_documents']} chunks")
+        
+        # Search the documents
+        search_queries = ["artificial intelligence", "enterprise", "creative"]
+        for query in search_queries:
+            print(f"\nSearch query: {query}")
+            results = document_service.search_documents_with_scores(query, k=3)
+            for i, result in enumerate(results, 1):
+                print(f"  Result {i}: {result['content'][:100]}... (similarity: {result['similarity_score']:.3f})")
+        
+        # List collection info
+        collections = document_service.get_document_collections()
+        print(f"\nDocument collections: {collections}")
+        
+    except Exception as e:
+        print(f"Document processing demo failed: {e}")
+    
+    # Clean up the test files
+    for filename in test_docs.keys():
+        if os.path.exists(filename):
+            os.remove(filename)
+
+
+def demo_advanced_qa():
+    """Demonstrate advanced QA features."""
+    print("\n=== Advanced QA Demo ===")
+    
+    # Build a knowledge-base document
+    knowledge_doc = """
+LangChain is a framework for developing applications powered by language models.
+Its main features include:
+1. Model management: supports multiple LLM providers such as OpenAI and Anthropic
+2. Prompt management: flexible prompt templates and chaining
+3. Memory management: conversation history and context handling
+4. Indexing and retrieval: vector store integration for document retrieval
+5. Agents and tools: tool calling and task automation
+
+LangChain's main advantages:
+- Modular design that is easy to extend
+- A rich set of integrations
+- An active community
+- Thorough documentation and examples
+    """
+    
+    # Write the test document
+    test_file = "langchain_knowledge.txt"
+    with open(test_file, "w", encoding="utf-8") as f:
+        f.write(knowledge_doc)
+    
+    try:
+        # Process the document and add it to the QA system
+        documents = document_processor.process_document_pipeline(test_file)
+        qa_service.add_documents_for_qa(documents, collection_name="langchain_qa")
+        
+        # Batch QA
+        questions = [
+            "What is LangChain?",
+            "What are LangChain's main features?",
+            "What are LangChain's advantages?",
+            "How do I use LangChain for document retrieval?"
+        ]
+        
+        print("Batch QA test:")
+        results = qa_service.batch_qa(questions, collection_name="langchain_qa")
+        
+        for i, result in enumerate(results, 1):
+            print(f"\nQuestion {i}: {result['question']}")
+            print(f"Answer: {result['answer'][:200]}...")
+            print(f"Source count: {result['source_count']}")
+        
+        # Similar-document search
+        print("\nSimilar-document search:")
+        similar_docs = qa_service.search_similar_documents(
+            "LangChain framework", 
+            k=3, 
+            collection_name="langchain_qa"
+        )
+        
+        for i, doc in enumerate(similar_docs, 1):
+            print(f"  Document {i}: {doc.page_content[:150]}...")
+        
+    except Exception as e:
+        print(f"Advanced QA demo failed: {e}")
+    
+    # Clean up the test file
+    if os.path.exists(test_file):
+        os.remove(test_file)
+
+
+def demo_cache_management():
+    """Demonstrate cache management features."""
+    print("\n=== Cache Management Demo ===")
+    
+    # Exercise the cache
+    test_data = {"key": "value", "number": 42, "list": [1, 2, 3]}
+    
+    # Set a cache entry
+    cache_manager.set("test_key", test_data, ttl=60)
+    print("Set cache entry: test_key")
+    
+    # Get the cache entry
+    cached_data = cache_manager.get("test_key")
+    if cached_data:
+        print(f"Cache hit: {cached_data}")
+    else:
+        print("Cache miss")
+    
+    # Delete the cache entry
+    cache_manager.delete("test_key")
+    print("Deleted cache entry: test_key")
+    
+    # Verify the deletion
+    cached_data = cache_manager.get("test_key")
+    if cached_data is None:
+        print("Cache entry deleted")
+    else:
+        print("Cache deletion failed")
+
+
+def main():
+    """Entry point."""
+    print("AI Architecture advanced usage examples")
+    print("=" * 60)
+    
+    # Set up logging
+    setup_logger()
+    
+    # Check environment variables
+    if not os.getenv("OPENROUTER_API_KEY"):
+        print("Warning: the OPENROUTER_API_KEY environment variable is not set")
+        print("Copy env.example to .env and set your OpenRouter API key")
+        return
+    
+    # Run the demos
+    demo_multi_provider_llm()
+    demo_advanced_chat()
+    demo_document_processing()
+    demo_advanced_qa()
+    demo_cache_management()
+    
+    print("\nAdvanced examples finished!")
+
+
+if __name__ == "__main__":
+    main()

+ 152 - 0
examples/basic_usage.py

@@ -0,0 +1,152 @@
+"""
+Basic usage examples.
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add the project root to the Python path
+project_root = Path(__file__).parent.parent
+sys.path.insert(0, str(project_root))
+
+from src.core.llm_manager import llm_manager
+from src.core.vector_store import vector_store_manager
+from src.core.document_processor import document_processor
+from src.services.chat_service import chat_service
+from src.services.qa_service import qa_service
+from src.services.document_service import document_service
+from src.utils.logger import setup_logger
+from loguru import logger
+
+
+def test_llm_manager():
+    """Exercise the LLM manager."""
+    print("=== LLM Manager Test ===")
+    
+    # List available providers
+    providers = llm_manager.list_providers()
+    print(f"Available LLM providers: {providers}")
+    
+    # List models available through OpenRouter
+    openrouter_models = llm_manager.get_available_models("openrouter")
+    print(f"OpenRouter models: {openrouter_models[:5]}...")  # show only the first 5
+    
+    # Test the connection
+    if llm_manager.test_connection("openrouter"):
+        print("OpenRouter connection test passed")
+    else:
+        print("OpenRouter connection test failed")
+
+
+def test_chat_service():
+    """Exercise the chat service."""
+    print("\n=== Chat Service Test ===")
+    
+    # Create a session
+    session = chat_service.create_session("test_session", "You are a helpful AI assistant")
+    print(f"Created session: {session.session_id}")
+    
+    # Send a message
+    try:
+        response = chat_service.send_message("test_session", "Hello, please introduce yourself")
+        print(f"AI reply: {response['assistant_message']}")
+    except Exception as e:
+        print(f"Chat failed: {e}")
+    
+    # List sessions
+    sessions = chat_service.list_sessions()
+    print(f"Current sessions: {sessions}")
+
+
+def test_document_service():
+    """Exercise the document service."""
+    print("\n=== Document Service Test ===")
+    
+    # Create a test document
+    test_doc_path = "test_document.txt"
+    with open(test_doc_path, "w", encoding="utf-8") as f:
+        f.write("This is a test document.\nIt contains some sample text.\nIt is used to test document processing.")
+    
+    try:
+        # Validate the document
+        validation = document_service.validate_document(test_doc_path)
+        print(f"Validation result: {validation}")
+        
+        if validation["valid"]:
+            # Process and store the document
+            result = document_service.process_and_store_document(
+                test_doc_path, 
+                collection_name="test_collection"
+            )
+            print(f"Processing result: {result}")
+            
+            # Search the documents
+            search_results = document_service.search_documents(
+                "test", 
+                collection_name="test_collection"
+            )
+            print(f"Search results: {len(search_results)} documents")
+            
+    except Exception as e:
+        print(f"Document processing failed: {e}")
+    
+    # Clean up the test file
+    if os.path.exists(test_doc_path):
+        os.remove(test_doc_path)
+
+
+def test_qa_service():
+    """Exercise the QA service."""
+    print("\n=== QA Service Test ===")
+    
+    try:
+        # Create a test document and add it to the QA system
+        test_doc_path = "qa_test_doc.txt"
+        with open(test_doc_path, "w", encoding="utf-8") as f:
+            f.write("Artificial intelligence (AI) is a branch of computer science that aims to build systems capable of performing tasks that normally require human intelligence.")
+        
+        # Process the document
+        documents = document_processor.process_document_pipeline(test_doc_path)
+        qa_service.add_documents_for_qa(documents, collection_name="qa_test")
+        
+        # Ask a question
+        question = "What is artificial intelligence?"
+        answer = qa_service.ask_question(question, collection_name="qa_test")
+        print(f"Question: {question}")
+        print(f"Answer: {answer['answer']}")
+        print(f"Source count: {answer['source_count']}")
+        
+        # Clean up the test file
+        if os.path.exists(test_doc_path):
+            os.remove(test_doc_path)
+            
+    except Exception as e:
+        print(f"QA test failed: {e}")
+
+
+def main():
+    """Entry point."""
+    print("AI Architecture basic usage examples")
+    print("=" * 50)
+    
+    # Set up logging
+    setup_logger()
+    
+    # Check environment variables
+    if not os.getenv("OPENROUTER_API_KEY"):
+        print("Warning: the OPENROUTER_API_KEY environment variable is not set")
+        print("Copy env.example to .env and set your OpenRouter API key")
+        return
+    
+    # Run the tests
+    test_llm_manager()
+    test_chat_service()
+    test_document_service()
+    test_qa_service()
+    
+    print("\nExamples finished!")
+
+
+if __name__ == "__main__":
+    main()

+ 32 - 0
requirements-minimal.txt

@@ -0,0 +1,32 @@
+# Core dependencies - minimal set
+langchain>=0.1.0
+langchain-openai>=0.0.5
+langchain-community>=0.0.10
+
+# LLM providers
+openai>=1.0.0
+anthropic>=0.7.0
+
+# Vector store
+chromadb>=0.4.0
+
+# Document processing
+pypdf>=3.15.0
+python-docx>=0.8.11
+
+# Basic utilities
+requests>=2.31.0
+
+# Data processing
+numpy>=1.24.0
+
+# Configuration
+python-dotenv>=1.0.0
+pydantic>=2.0.0
+pydantic-settings>=2.0.0
+
+# Logging
+loguru>=0.7.0
+
+# Testing
+pytest>=7.4.0

+ 43 - 0
requirements.txt

@@ -0,0 +1,43 @@
+# LangChain core dependencies
+langchain>=0.1.0
+langchain-openai>=0.0.5
+langchain-community>=0.0.10
+
+# LLM providers
+openai>=1.0.0
+anthropic>=0.7.0
+
+# Vector stores
+chromadb>=0.4.0
+# faiss-cpu>=1.7.0  # requires the SWIG compiler; commented out for now
+
+# Document processing
+pypdf>=3.15.0
+python-docx>=0.8.11
+markdown>=3.4.0
+
+# Tools and toolkits
+requests>=2.31.0
+beautifulsoup4>=4.12.0
+selenium>=4.15.0
+
+# Data processing
+pandas>=2.0.0
+numpy>=1.24.0
+
+# Configuration
+python-dotenv>=1.0.0
+pydantic>=2.0.0
+pydantic-settings>=2.0.0
+
+# Logging and monitoring
+loguru>=0.7.0
+
+# Testing
+pytest>=7.4.0
+pytest-asyncio>=0.21.0
+
+# Development tools (optional)
+# black>=23.0.0
+# isort>=5.12.0
+# flake8>=6.0.0

+ 67 - 0
run_examples.py

@@ -0,0 +1,67 @@
+#!/usr/bin/env python3
+"""
+Example runner for AI Architecture.
+"""
+
+import sys
+import os
+from pathlib import Path
+
+# Add the project root to the Python path
+project_root = Path(__file__).parent
+sys.path.insert(0, str(project_root))
+
+def main():
+    """Entry point."""
+    print("AI Architecture example runner")
+    print("=" * 40)
+    
+    # Check environment variables
+    if not os.getenv("OPENROUTER_API_KEY"):
+        print("❌ Error: the OPENROUTER_API_KEY environment variable is not set")
+        print("\nPlease configure it as follows:")
+        print("1. Copy env.example to .env")
+        print("2. Set your OpenRouter API key in the .env file")
+        print("3. Re-run this script")
+        return
+    
+    print("✅ Environment check passed")
+    
+    # Pick an example to run
+    print("\nSelect an example to run:")
+    print("1. Basic usage (basic_usage.py)")
+    print("2. Advanced usage (advanced_usage.py)")
+    print("3. Run all examples")
+    
+    try:
+        choice = input("\nEnter your choice (1-3): ").strip()
+        
+        if choice == "1":
+            print("\nRunning the basic usage example...")
+            from examples.basic_usage import main as basic_main
+            basic_main()
+        elif choice == "2":
+            print("\nRunning the advanced usage example...")
+            from examples.advanced_usage import main as advanced_main
+            advanced_main()
+        elif choice == "3":
+            print("\nRunning all examples...")
+            from examples.basic_usage import main as basic_main
+            from examples.advanced_usage import main as advanced_main
+            
+            print("\n" + "="*50)
+            basic_main()
+            print("\n" + "="*50)
+            advanced_main()
+        else:
+            print("❌ Invalid choice")
+            return
+            
+    except KeyboardInterrupt:
+        print("\n\n👋 Cancelled by user")
+    except Exception as e:
+        print(f"\n❌ Run failed: {e}")
+        print("Check that the configuration and dependencies are correct")
+
+if __name__ == "__main__":
+    main()

+ 71 - 0
setup.py

@@ -0,0 +1,71 @@
+"""
+Setup script for AI Architecture.
+"""
+
+from setuptools import setup, find_packages
+from pathlib import Path
+
+# Read the README file
+readme_path = Path(__file__).parent / "README.md"
+long_description = readme_path.read_text(encoding="utf-8") if readme_path.exists() else ""
+
+# Read requirements.txt, skipping comments and blank lines
+requirements_path = Path(__file__).parent / "requirements.txt"
+requirements = []
+if requirements_path.exists():
+    requirements = [
+        line.strip()
+        for line in requirements_path.read_text().splitlines()
+        if line.strip() and not line.strip().startswith("#")
+    ]
+
+setup(
+    name="ai-architecture",
+    version="1.0.0",
+    author="AI Architecture Team",
+    author_email="team@ai-architecture.com",
+    description="A general-purpose AI service framework built on LangChain",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/your-username/ai-architecture",
+    packages=find_packages(),
+    classifiers=[
+        "Development Status :: 4 - Beta",
+        "Intended Audience :: Developers",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+        "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
+        "Topic :: Software Development :: Libraries :: Python Modules",
+    ],
+    python_requires=">=3.8",
+    install_requires=requirements,
+    extras_require={
+        "dev": [
+            "pytest>=7.4.0",
+            "pytest-asyncio>=0.21.0",
+            "black>=23.0.0",
+            "isort>=5.12.0",
+            "flake8>=6.0.0",
+        ],
+        "docs": [
+            "sphinx>=4.0.0",
+            "sphinx-rtd-theme>=1.0.0",
+        ],
+    },
+    entry_points={
+        "console_scripts": [
+            "ai-arch=run_examples:main",
+        ],
+    },
+    include_package_data=True,
+    package_data={
+        "": ["*.md", "*.txt", "*.yml", "*.yaml"],
+    },
+    keywords="ai langchain llm vector-database document-processing chat qa",
+    project_urls={
+        "Bug Reports": "https://github.com/your-username/ai-architecture/issues",
+        "Source": "https://github.com/your-username/ai-architecture",
+        "Documentation": "https://ai-architecture.readthedocs.io/",
+    },
+)

+ 6 - 0
src/__init__.py

@@ -0,0 +1,6 @@
+"""
+AI Architecture - a general-purpose LangChain service framework.
+"""
+
+__version__ = "1.0.0"
+__author__ = "AI Architecture Team"

+ 7 - 0
src/config/__init__.py

@@ -0,0 +1,7 @@
+"""
+Configuration management module.
+"""
+
+from .settings import Settings
+
+__all__ = ["Settings"]

+ 52 - 0
src/config/settings.py

@@ -0,0 +1,52 @@
+"""
+Application configuration management.
+"""
+
+from typing import Optional
+from pydantic import Field
+from pydantic_settings import BaseSettings
+
+
+class Settings(BaseSettings):
+    """Application settings."""
+    
+    # Application basics
+    app_name: str = Field(default="AI_Architecture", description="Application name")
+    app_version: str = Field(default="1.0.0", description="Application version")
+    debug: bool = Field(default=False, description="Debug mode")
+    
+    # OpenAI
+    openai_api_key: Optional[str] = Field(default=None, description="OpenAI API key")
+    openai_api_base: str = Field(default="https://api.openai.com/v1", description="OpenAI API base URL")
+    
+    # Anthropic
+    anthropic_api_key: Optional[str] = Field(default=None, description="Anthropic API key")
+    
+    # OpenRouter
+    openrouter_api_key: Optional[str] = Field(default=None, description="OpenRouter API key")
+    openrouter_api_base: str = Field(default="https://openrouter.ai/api/v1", description="OpenRouter API base URL")
+    
+    # Vector stores
+    chroma_db_path: str = Field(default="./data/chroma_db", description="ChromaDB path")
+    faiss_index_path: str = Field(default="./data/faiss_index", description="FAISS index path")
+    
+    # Logging
+    log_level: str = Field(default="INFO", description="Log level")
+    log_file: str = Field(default="./logs/ai_arch.log", description="Log file path")
+    
+    # Caching
+    cache_enabled: bool = Field(default=True, description="Whether caching is enabled")
+    cache_ttl: int = Field(default=3600, description="Cache TTL in seconds")
+    
+    # LLM parameters
+    max_tokens: int = Field(default=4000, description="Maximum number of tokens")
+    temperature: float = Field(default=0.7, description="Sampling temperature")
+    
+    class Config:
+        env_file = ".env"
+        env_file_encoding = "utf-8"
+        case_sensitive = False
+
+
+# Global settings instance
+settings = Settings()

+ 9 - 0
src/core/__init__.py

@@ -0,0 +1,9 @@
+"""
+Core modules.
+"""
+
+from .llm_manager import LLMManager
+from .vector_store import VectorStoreManager
+from .document_processor import DocumentProcessor
+
+__all__ = ["LLMManager", "VectorStoreManager", "DocumentProcessor"]

+ 174 - 0
src/core/document_processor.py

@@ -0,0 +1,174 @@
+"""
+Document processor - handles documents in various formats.
+"""
+
+from typing import List, Dict, Any, Optional
+from pathlib import Path
+from loguru import logger
+
+from langchain.schema import Document
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.document_loaders import (
+    PyPDFLoader,
+    Docx2txtLoader,
+    TextLoader,
+    CSVLoader,
+    UnstructuredMarkdownLoader
+)
+
+from ..config.settings import settings
+from ..utils.cache import cache_manager
+
+
+class DocumentProcessor:
+    """Document processor."""
+    
+    def __init__(self):
+        self.text_splitter = RecursiveCharacterTextSplitter(
+            chunk_size=1000,
+            chunk_overlap=200,
+            length_function=len,
+        )
+        self.supported_extensions = {
+            '.pdf': PyPDFLoader,
+            '.docx': Docx2txtLoader,
+            '.txt': TextLoader,
+            '.csv': CSVLoader,
+            '.md': UnstructuredMarkdownLoader
+        }
+    
+    def load_document(self, file_path: str) -> List[Document]:
+        """Load a single document."""
+        path = Path(file_path)
+        
+        if not path.exists():
+            raise FileNotFoundError(f"File not found: {path}")
+        
+        extension = path.suffix.lower()
+        if extension not in self.supported_extensions:
+            raise ValueError(f"Unsupported file format: {extension}")
+        
+        loader_class = self.supported_extensions[extension]
+        loader = loader_class(str(path))
+        
+        try:
+            documents = loader.load()
+            logger.info(f"Loaded document: {path}, pages: {len(documents)}")
+            return documents
+        except Exception as e:
+            logger.error(f"Failed to load document: {path}, error: {e}")
+            raise
+    
+    def load_documents(self, file_paths: List[str]) -> List[Document]:
+        """Load multiple documents."""
+        all_documents = []
+        
+        for file_path in file_paths:
+            try:
+                documents = self.load_document(file_path)
+                all_documents.extend(documents)
+            except Exception as e:
+                logger.error(f"Skipping file {file_path}: {e}")
+                continue
+        
+        logger.info(f"Loaded {len(all_documents)} document fragments in total")
+        return all_documents
+    
+    def load_directory(self, directory_path: str, recursive: bool = True) -> List[Document]:
+        """Load all documents in a directory."""
+        directory = Path(directory_path)
+        
+        if not directory.exists():
+            raise FileNotFoundError(f"Directory not found: {directory_path}")
+        
+        file_paths = []
+        if recursive:
+            for extension in self.supported_extensions.keys():
+                file_paths.extend(directory.rglob(f"*{extension}"))
+        else:
+            for extension in self.supported_extensions.keys():
+                file_paths.extend(directory.glob(f"*{extension}"))
+        
+        file_paths = [str(path) for path in file_paths]
+        return self.load_documents(file_paths)
+    
+    def split_documents(self, documents: List[Document], 
+                       chunk_size: int = 1000, 
+                       chunk_overlap: int = 200) -> List[Document]:
+        """Split documents into chunks."""
+        if chunk_size != 1000 or chunk_overlap != 200:
+            text_splitter = RecursiveCharacterTextSplitter(
+                chunk_size=chunk_size,
+                chunk_overlap=chunk_overlap,
+                length_function=len,
+            )
+        else:
+            text_splitter = self.text_splitter
+        
+        split_docs = text_splitter.split_documents(documents)
+        logger.info(f"Document splitting finished: {len(documents)} -> {len(split_docs)} chunks")
+        return split_docs
+    
+    def add_metadata(self, documents: List[Document], 
+                    metadata: Dict[str, Any]) -> List[Document]:
+        """Attach metadata to documents."""
+        for doc in documents:
+            doc.metadata.update(metadata)
+        return documents
+    
+    def filter_documents(self, documents: List[Document], 
+                        filter_func) -> List[Document]:
+        """Filter documents with a predicate."""
+        filtered_docs = [doc for doc in documents if filter_func(doc)]
+        logger.info(f"Document filtering finished: {len(documents)} -> {len(filtered_docs)} documents")
+        return filtered_docs
+    
+    def get_document_stats(self, documents: List[Document]) -> Dict[str, Any]:
+        """Collect statistics about a list of documents."""
+        total_chars = sum(len(doc.page_content) for doc in documents)
+        total_words = sum(len(doc.page_content.split()) for doc in documents)
+        
+        # Group by source
+        sources = {}
+        for doc in documents:
+            source = doc.metadata.get('source', 'unknown')
+            if source not in sources:
+                sources[source] = 0
+            sources[source] += 1
+        
+        return {
+            "total_documents": len(documents),
+            "total_characters": total_chars,
+            "total_words": total_words,
+            "avg_chars_per_doc": total_chars / len(documents) if documents else 0,
+            "avg_words_per_doc": total_words / len(documents) if documents else 0,
+            "sources": sources
+        }
+    
+    @cache_manager.cache(ttl=3600)  # cache for 1 hour
+    def process_document_pipeline(self, file_path: str, 
+                                chunk_size: int = 1000,
+                                chunk_overlap: int = 200,
+                                additional_metadata: Optional[Dict[str, Any]] = None) -> List[Document]:
+        """Full document processing pipeline."""
+        logger.info(f"Processing document: {file_path}")
+        
+        # 1. Load the document
+        documents = self.load_document(file_path)
+        
+        # 2. Split it into chunks
+        split_docs = self.split_documents(documents, chunk_size, chunk_overlap)
+        
+        # 3. Attach metadata
+        if additional_metadata:
+            split_docs = self.add_metadata(split_docs, additional_metadata)
+        
+        # 4. Collect statistics
+        stats = self.get_document_stats(split_docs)
+        logger.info(f"Document processing finished: {stats}")
+        
+        return split_docs
+
+
+# Global document processor instance
+document_processor = DocumentProcessor()
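The `chunk_size`/`chunk_overlap` parameters above mean consecutive chunks share a window of text so context is not lost at chunk boundaries. As a sanity check on that arithmetic, here is a deliberately naive fixed-window splitter (illustrative only; LangChain's `RecursiveCharacterTextSplitter` additionally respects separators like paragraphs and sentences):

```python
def split_text(text: str, chunk_size: int = 10, chunk_overlap: int = 3):
    """Naive fixed-window splitter: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# With chunk_size=10 and chunk_overlap=3, chunk boundaries repeat 3 characters.
chunks = split_text("abcdefghijklmnop", chunk_size=10, chunk_overlap=3)
```

With the defaults used in the class (1000/200), every 1000-character chunk repeats the last 200 characters of its predecessor.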

+ 189 - 0
src/core/llm_manager.py

@@ -0,0 +1,189 @@
+"""
+LLM manager - a unified interface over multiple LLM providers.
+"""
+
+from typing import Dict, List, Optional, Any
+from abc import ABC, abstractmethod
+from loguru import logger
+
+from langchain.llms.base import LLM
+from langchain_openai import ChatOpenAI
+from langchain_community.chat_models import ChatOpenAI as CommunityChatOpenAI, ChatAnthropic
+
+from ..config.settings import settings
+from ..utils.cache import cache_manager
+
+
+class BaseLLMProvider(ABC):
+    """Base class for LLM providers."""
+    
+    @abstractmethod
+    def create_llm(self, **kwargs) -> LLM:
+        """Create an LLM instance."""
+        pass
+    
+    @abstractmethod
+    def get_available_models(self) -> List[str]:
+        """Return the list of available models."""
+        pass
+
+
+class OpenAIProvider(BaseLLMProvider):
+    """OpenAI provider."""
+    
+    def create_llm(self, **kwargs) -> LLM:
+        """Create an OpenAI LLM instance."""
+        if not settings.openai_api_key:
+            raise ValueError("OpenAI API key is not configured")
+        
+        default_params = {
+            "openai_api_key": settings.openai_api_key,
+            "openai_api_base": settings.openai_api_base,
+            "temperature": settings.temperature,
+            "max_tokens": settings.max_tokens,
+            "model": "gpt-3.5-turbo"
+        }
+        default_params.update(kwargs)
+        
+        return ChatOpenAI(**default_params)
+    
+    def get_available_models(self) -> List[str]:
+        """Return the available OpenAI models."""
+        return [
+            "gpt-3.5-turbo",
+            "gpt-3.5-turbo-16k",
+            "gpt-4",
+            "gpt-4-turbo",
+            "gpt-4-32k"
+        ]
+
+
+class AnthropicProvider(BaseLLMProvider):
+    """Anthropic provider."""
+    
+    def create_llm(self, **kwargs) -> LLM:
+        """Create an Anthropic LLM instance."""
+        if not settings.anthropic_api_key:
+            raise ValueError("Anthropic API key is not configured")
+        
+        default_params = {
+            "anthropic_api_key": settings.anthropic_api_key,
+            "temperature": settings.temperature,
+            "max_tokens": settings.max_tokens,
+            "model": "claude-3-sonnet-20240229"
+        }
+        default_params.update(kwargs)
+        
+        return ChatAnthropic(**default_params)
+    
+    def get_available_models(self) -> List[str]:
+        """Return the available Anthropic models."""
+        return [
+            "claude-3-opus-20240229",
+            "claude-3-sonnet-20240229",
+            "claude-3-haiku-20240307"
+        ]
+
+
+class OpenRouterProvider(BaseLLMProvider):
+    """OpenRouter provider."""
+    
+    def create_llm(self, **kwargs) -> LLM:
+        """Create an OpenRouter LLM instance."""
+        if not settings.openrouter_api_key:
+            raise ValueError("OpenRouter API key is not configured")
+        
+        default_params = {
+            "openai_api_key": settings.openrouter_api_key,
+            "openai_api_base": settings.openrouter_api_base,
+            "temperature": settings.temperature,
+            "max_tokens": settings.max_tokens,
+            "model": "openai/gpt-3.5-turbo"
+        }
+        default_params.update(kwargs)
+        
+        return CommunityChatOpenAI(**default_params)
+    
+    def get_available_models(self) -> List[str]:
+        """Return the available OpenRouter models."""
+        return [
+            "openai/gpt-3.5-turbo",
+            "openai/gpt-4",
+            "openai/gpt-4-turbo",
+            "anthropic/claude-3-sonnet",
+            "anthropic/claude-3-opus",
+            "google/palm-2-chat-bison",
+            "meta-llama/llama-2-13b-chat",
+            "meta-llama/llama-2-70b-chat",
+            "microsoft/wizardlm-13b",
+            "nousresearch/nous-hermes-llama2-13b"
+        ]
+
+
+class LLMManager:
+    """LLM manager."""
+    
+    def __init__(self):
+        self.providers: Dict[str, BaseLLMProvider] = {
+            "openai": OpenAIProvider(),
+            "anthropic": AnthropicProvider(),
+            "openrouter": OpenRouterProvider()
+        }
+        self._llm_instances: Dict[str, LLM] = {}
+    
+    def get_provider(self, provider_name: str) -> BaseLLMProvider:
+        """Get an LLM provider by name."""
+        if provider_name not in self.providers:
+            raise ValueError(f"Unsupported LLM provider: {provider_name}")
+        return self.providers[provider_name]
+    
+    def create_llm(self, provider: str = "openrouter", **kwargs) -> LLM:
+        """Create an LLM instance."""
+        provider_instance = self.get_provider(provider)
+        return provider_instance.create_llm(**kwargs)
+    
+    def get_or_create_llm(self, provider: str = "openrouter", model: Optional[str] = None, **kwargs) -> LLM:
+        """Get or create an LLM instance (cached)."""
+        cache_key = f"llm_{provider}_{model or 'default'}"
+        
+        # Check the cache
+        cached_llm = cache_manager.get(cache_key)
+        if cached_llm:
+            return cached_llm
+        
+        # Create a new instance
+        if model:
+            kwargs["model"] = model
+        
+        llm = self.create_llm(provider, **kwargs)
+        
+        # Cache the instance
+        cache_manager.set(cache_key, llm, ttl=3600)  # cache for 1 hour
+        
+        return llm
+    
+    def get_available_models(self, provider: str = "openrouter") -> List[str]:
+        """Return all available models for the given provider."""
+        provider_instance = self.get_provider(provider)
+        return provider_instance.get_available_models()
+    
+    def list_providers(self) -> List[str]:
+        """List all available providers."""
+        return list(self.providers.keys())
+    
+    @cache_manager.cache(ttl=1800)  # cache for 30 minutes
+    def test_connection(self, provider: str = "openrouter") -> bool:
+        """Test the LLM connection."""
+        try:
+            llm = self.create_llm(provider)
+            # Send a simple test request
+            llm.invoke("Hello")
+            logger.info(f"{provider} connection test succeeded")
+            return True
+        except Exception as e:
+            logger.error(f"{provider} connection test failed: {e}")
+            return False
+
+
+# Global LLM manager instance
+llm_manager = LLMManager()
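The manager above is an instance of the registry pattern: providers are looked up by name, and unknown names fail fast with a `ValueError` rather than falling through to a confusing downstream error. A minimal, self-contained sketch of that pattern (the `EchoProvider`/`Registry` names are illustrative, not part of the project):

```python
from abc import ABC, abstractmethod

# Illustrative stand-in for BaseLLMProvider and LLMManager's registry:
# concrete providers implement a common interface, and the registry maps
# provider names to instances.
class Provider(ABC):
    @abstractmethod
    def create(self) -> str:
        """Create a client; a string stands in for a real LLM object here."""

class EchoProvider(Provider):
    def create(self) -> str:
        return "echo-llm"

class Registry:
    def __init__(self):
        self.providers = {"echo": EchoProvider()}

    def get_provider(self, name: str) -> Provider:
        # Fail fast on unknown names, mirroring LLMManager.get_provider
        if name not in self.providers:
            raise ValueError(f"Unsupported provider: {name}")
        return self.providers[name]
```

Adding a new backend then only means implementing the interface and registering one more entry in the dictionary.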

+ 145 - 0
src/core/vector_store.py

@@ -0,0 +1,145 @@
+"""
+Vector store manager.
+"""
+
+from typing import List, Dict, Any, Optional
+from pathlib import Path
+from loguru import logger
+
+from langchain_community.vectorstores import Chroma
+from langchain_community.embeddings import OpenAIEmbeddings
+from langchain.schema import Document
+
+from ..config.settings import settings
+from ..utils.cache import cache_manager
+
+
+class VectorStoreManager:
+    """Vector store manager."""
+    
+    def __init__(self):
+        self.chroma_path = Path(settings.chroma_db_path)
+        self.faiss_path = Path(settings.faiss_index_path)
+        self._embeddings = None
+        self._vector_stores = {}
+    
+    @property
+    def embeddings(self):
+        """Lazily create the embedding model."""
+        if self._embeddings is None:
+            if not settings.openai_api_key:
+                raise ValueError("OpenAI API key is not configured; cannot create the embedding model")
+            self._embeddings = OpenAIEmbeddings(
+                openai_api_key=settings.openai_api_key,
+                openai_api_base=settings.openai_api_base
+            )
+        return self._embeddings
+    
+    def create_chroma_store(self, collection_name: str = "default") -> Chroma:
+        """Create a Chroma vector store."""
+        self.chroma_path.mkdir(parents=True, exist_ok=True)
+        
+        return Chroma(
+            collection_name=collection_name,
+            embedding_function=self.embeddings,
+            persist_directory=str(self.chroma_path)
+        )
+    
+    def create_faiss_store(self, index_name: str = "default"):
+        """Create a FAISS vector store (disabled; requires the SWIG compiler)."""
+        raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+    
+    def get_or_create_store(self, store_type: str = "chroma", name: str = "default"):
+        """Get or create a vector store."""
+        cache_key = f"vector_store_{store_type}_{name}"
+        
+        # Check the cache
+        cached_store = cache_manager.get(cache_key)
+        if cached_store:
+            return cached_store
+        
+        # Create a new store
+        if store_type == "chroma":
+            store = self.create_chroma_store(name)
+        elif store_type == "faiss":
+            raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+        else:
+            raise ValueError(f"Unsupported vector store type: {store_type}")
+        
+        # Cache the store
+        cache_manager.set(cache_key, store, ttl=3600)
+        
+        return store
+    
+    def add_documents(self, documents: List[Document], store_type: str = "chroma", 
+                     collection_name: str = "default") -> None:
+        """Add documents to a vector store."""
+        store = self.get_or_create_store(store_type, collection_name)
+        
+        if store_type == "chroma":
+            store.add_documents(documents)
+            store.persist()
+        elif store_type == "faiss":
+            raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+        
+        logger.info(f"Added {len(documents)} documents to the {store_type} store")
+    
+    def similarity_search(self, query: str, k: int = 4, store_type: str = "chroma",
+                         collection_name: str = "default") -> List[Document]:
+        """Similarity search."""
+        store = self.get_or_create_store(store_type, collection_name)
+        return store.similarity_search(query, k=k)
+    
+    def similarity_search_with_score(self, query: str, k: int = 4, 
+                                   store_type: str = "chroma",
+                                   collection_name: str = "default") -> List[tuple]:
+        """Similarity search with scores."""
+        store = self.get_or_create_store(store_type, collection_name)
+        return store.similarity_search_with_score(query, k=k)
+    
+    def delete_collection(self, store_type: str = "chroma", 
+                         collection_name: str = "default") -> None:
+        """Delete a collection."""
+        if store_type == "chroma":
+            store = self.create_chroma_store(collection_name)
+            store._collection.delete()
+        elif store_type == "faiss":
+            raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+        
+        # Invalidate the cache
+        cache_key = f"vector_store_{store_type}_{collection_name}"
+        cache_manager.delete(cache_key)
+        
+        logger.info(f"Deleted {store_type} collection: {collection_name}")
+    
+    def list_collections(self, store_type: str = "chroma") -> List[str]:
+        """List all collections."""
+        if store_type == "chroma":
+            # Chroma collection management is fairly involved; simplified here
+            return ["default"]
+        elif store_type == "faiss":
+            raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+        else:
+            raise ValueError(f"Unsupported vector store type: {store_type}")
+    
+    def get_collection_stats(self, store_type: str = "chroma", 
+                           collection_name: str = "default") -> Dict[str, Any]:
+        """Return statistics for a collection."""
+        store = self.get_or_create_store(store_type, collection_name)
+        
+        if store_type == "chroma":
+            count = store._collection.count()
+            return {
+                "type": "chroma",
+                "name": collection_name,
+                "document_count": count,
+                "path": str(self.chroma_path)
+            }
+        elif store_type == "faiss":
+            raise NotImplementedError("FAISS support is disabled; please use ChromaDB")
+        
+        return {}
+
+
+# Global vector store manager instance
+vector_store_manager = VectorStoreManager()
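`get_or_create_store` above (like `get_or_create_llm` in the LLM manager) follows one idiom throughout: derive a deterministic cache key from the arguments, return the cached object when present, otherwise build and cache a new one. A self-contained sketch of that idiom with a plain dict standing in for `cache_manager` (all names here are illustrative):

```python
# Illustrative get-or-create cache, mirroring the keying scheme used by
# VectorStoreManager.get_or_create_store. A dict stands in for a real store.
class StoreCache:
    def __init__(self):
        self._cache = {}
        self.creations = 0  # counts how often a store was actually built

    def get_or_create(self, store_type: str, name: str):
        key = f"vector_store_{store_type}_{name}"
        if key in self._cache:
            return self._cache[key]       # cache hit: reuse the instance
        self.creations += 1
        store = {"type": store_type, "name": name}
        self._cache[key] = store          # cache miss: build and remember
        return store
```

Repeated calls with the same `(store_type, name)` pair return the same object, so expensive store construction happens at most once per collection.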

+ 9 - 0
src/services/__init__.py

@@ -0,0 +1,9 @@
+"""
+Service layer modules.
+"""
+
+from .chat_service import ChatService
+from .qa_service import QAService
+from .document_service import DocumentService
+
+__all__ = ["ChatService", "QAService", "DocumentService"]

+ 211 - 0
src/services/chat_service.py

@@ -0,0 +1,211 @@
+"""
+Chat service - manages conversations and sessions.
+"""
+
+from typing import List, Dict, Any, Optional
+from datetime import datetime
+from loguru import logger
+
+from langchain.schema import HumanMessage, AIMessage, SystemMessage
+from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
+
+from ..core.llm_manager import llm_manager
+from ..utils.cache import cache_manager
+
+
+class ChatSession:
+    """A chat session."""
+    
+    def __init__(self, session_id: str, system_prompt: Optional[str] = None):
+        self.session_id = session_id
+        self.system_prompt = system_prompt
+        self.messages: List[Dict[str, Any]] = []
+        self.created_at = datetime.now()
+        self.last_activity = datetime.now()
+        
+        # Initialise conversation memory
+        self.memory = ConversationBufferMemory(
+            memory_key="chat_history",
+            return_messages=True
+        )
+        if system_prompt:
+            self.memory.chat_memory.add_message(SystemMessage(content=system_prompt))
+    
+    def add_message(self, role: str, content: str, metadata: Optional[Dict[str, Any]] = None):
+        """Append a message."""
+        message = {
+            "role": role,
+            "content": content,
+            "timestamp": datetime.now(),
+            "metadata": metadata or {}
+        }
+        self.messages.append(message)
+        self.last_activity = datetime.now()
+        
+        # Mirror the message into the LangChain memory
+        if role == "user":
+            self.memory.chat_memory.add_user_message(content)
+        elif role == "assistant":
+            self.memory.chat_memory.add_ai_message(content)
+    
+    def get_messages(self, limit: Optional[int] = None) -> List[Dict[str, Any]]:
+        """Return the message history."""
+        if limit:
+            return self.messages[-limit:]
+        return self.messages
+    
+    def get_langchain_messages(self) -> List:
+        """Return the messages in LangChain format."""
+        return self.memory.chat_memory.messages
+    
+    def clear_history(self):
+        """Clear the message history."""
+        self.messages.clear()
+        self.memory.clear()
+        logger.info(f"History for session {self.session_id} cleared")
+
+
+class ChatService:
+    """Chat service."""
+    
+    def __init__(self):
+        self.sessions: Dict[str, ChatSession] = {}
+        self.default_provider = "openrouter"
+        self.default_model = "openai/gpt-3.5-turbo"
+    
+    def create_session(self, session_id: str, system_prompt: Optional[str] = None) -> ChatSession:
+        """Create a new chat session."""
+        if session_id in self.sessions:
+            logger.warning(f"Session {session_id} already exists; returning the existing session")
+            return self.sessions[session_id]
+        
+        session = ChatSession(session_id, system_prompt)
+        self.sessions[session_id] = session
+        logger.info(f"Created new session: {session_id}")
+        return session
+    
+    def get_session(self, session_id: str) -> Optional[ChatSession]:
+        """Get a session."""
+        return self.sessions.get(session_id)
+    
+    def delete_session(self, session_id: str) -> bool:
+        """Delete a session."""
+        if session_id in self.sessions:
+            del self.sessions[session_id]
+            logger.info(f"Deleted session: {session_id}")
+            return True
+        return False
+    
+    def list_sessions(self) -> List[Dict[str, Any]]:
+        """List all sessions."""
+        sessions_info = []
+        for session_id, session in self.sessions.items():
+            sessions_info.append({
+                "session_id": session_id,
+                "created_at": session.created_at,
+                "last_activity": session.last_activity,
+                "message_count": len(session.messages),
+                "system_prompt": session.system_prompt
+            })
+        return sessions_info
+    
+    def send_message(self, session_id: str, message: str, 
+                    provider: Optional[str] = None, model: Optional[str] = None,
+                    **kwargs) -> Dict[str, Any]:
+        """Send a message and return the reply."""
+        session = self.get_session(session_id)
+        if not session:
+            raise ValueError(f"Session not found: {session_id}")
+        
+        # Append the user message
+        session.add_message("user", message)
+        
+        try:
+            # Get the LLM
+            llm = llm_manager.get_or_create_llm(
+                provider or self.default_provider,
+                model or self.default_model,
+                **kwargs
+            )
+            
+            # Get the conversation history
+            messages = session.get_langchain_messages()
+            
+            # Generate the reply
+            response = llm.invoke(messages)
+            
+            # Append the AI reply
+            session.add_message("assistant", response.content)
+            
+            logger.info(f"Message for session {session_id} processed")
+            
+            return {
+                "session_id": session_id,
+                "user_message": message,
+                "assistant_message": response.content,
+                "timestamp": datetime.now(),
+                "provider": provider or self.default_provider,
+                "model": model or self.default_model
+            }
+            
+        except Exception as e:
+            logger.error(f"Message processing failed: {e}")
+            # Roll back the user message that failed
+            if session.messages:
+                session.messages.pop()
+            raise
+    
+    def get_conversation_summary(self, session_id: str) -> str:
+        """Return a summary of the conversation."""
+        session = self.get_session(session_id)
+        if not session:
+            raise ValueError(f"Session not found: {session_id}")
+        
+        if len(session.messages) < 2:
+            return "Conversation has not started yet"
+        
+        # Use a summary memory
+        summary_memory = ConversationSummaryMemory(
+            llm=llm_manager.get_or_create_llm(),
+            memory_key="chat_history",
+            return_messages=True
+        )
+        
+        # Feed all messages into the summary memory
+        for message in session.messages:
+            if message["role"] == "user":
+                summary_memory.chat_memory.add_user_message(message["content"])
+            elif message["role"] == "assistant":
+                summary_memory.chat_memory.add_ai_message(message["content"])
+        
+        # Get the summary
+        summary = summary_memory.moving_summary_buffer
+        return summary if summary else "Unable to generate a summary"
+    
+    def export_conversation(self, session_id: str, format: str = "json") -> str:
+        """Export a conversation."""
+        session = self.get_session(session_id)
+        if not session:
+            raise ValueError(f"Session not found: {session_id}")
+        
+        if format == "json":
+            import json
+            conversation_data = {
+                "session_id": session_id,
+                "created_at": session.created_at.isoformat(),
+                "last_activity": session.last_activity.isoformat(),
+                "system_prompt": session.system_prompt,
+                "messages": session.messages
+            }
+            # default=str serialises the datetime timestamps inside messages
+            return json.dumps(conversation_data, ensure_ascii=False, indent=2, default=str)
+        else:
+            raise ValueError(f"Unsupported export format: {format}")
+
+
+# Global chat service instance
+chat_service = ChatService()

+ 256 - 0
src/services/document_service.py

@@ -0,0 +1,256 @@
+"""
+文档服务 - 文档管理和处理服务
+"""
+
+from typing import List, Dict, Any, Optional
+from pathlib import Path
+from loguru import logger
+
+from langchain.schema import Document
+
+from ..core.document_processor import document_processor
+from ..core.vector_store import vector_store_manager
+from ..utils.cache import cache_manager
+
+
+class DocumentService:
+    """文档服务"""
+    
+    def __init__(self):
+        self.default_store_type = "chroma"
+        self.default_collection = "documents"
+    
+    def process_and_store_document(self, file_path: str, 
+                                 store_type: str = None, 
+                                 collection_name: str = None,
+                                 chunk_size: int = 1000,
+                                 chunk_overlap: int = 200,
+                                 additional_metadata: Dict[str, Any] = None) -> Dict[str, Any]:
+        """处理并存储文档"""
+        try:
+            # 处理文档
+            documents = document_processor.process_document_pipeline(
+                file_path, chunk_size, chunk_overlap, additional_metadata
+            )
+            
+            # 存储到向量数据库
+            vector_store_manager.add_documents(
+                documents,
+                store_type or self.default_store_type,
+                collection_name or self.default_collection
+            )
+            
+            # 获取统计信息
+            stats = document_processor.get_document_stats(documents)
+            collection_stats = vector_store_manager.get_collection_stats(
+                store_type or self.default_store_type,
+                collection_name or self.default_collection
+            )
+            
+            result = {
+                "file_path": file_path,
+                "processed_documents": len(documents),
+                "document_stats": stats,
+                "collection_stats": collection_stats,
+                "store_type": store_type or self.default_store_type,
+                "collection_name": collection_name or self.default_collection
+            }
+            
+            logger.info(f"文档处理完成: {file_path}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"文档处理失败: {file_path}, 错误: {e}")
+            raise
+    
+    def process_and_store_directory(self, directory_path: str,
+                                  store_type: str = None,
+                                  collection_name: str = None,
+                                  recursive: bool = True,
+                                  chunk_size: int = 1000,
+                                  chunk_overlap: int = 200,
+                                  additional_metadata: Dict[str, Any] = None) -> Dict[str, Any]:
+        """处理并存储目录中的所有文档"""
+        try:
+            # 加载目录中的所有文档
+            documents = document_processor.load_directory(directory_path, recursive)
+            
+            if not documents:
+                logger.warning(f"目录中没有找到可处理的文档: {directory_path}")
+                return {
+                    "directory_path": directory_path,
+                    "processed_documents": 0,
+                    "message": "没有找到可处理的文档"
+                }
+            
+            # 分割文档
+            split_docs = document_processor.split_documents(
+                documents, chunk_size, chunk_overlap
+            )
+            
+            # 添加元数据
+            if additional_metadata:
+                split_docs = document_processor.add_metadata(split_docs, additional_metadata)
+            
+            # 存储到向量数据库
+            vector_store_manager.add_documents(
+                split_docs,
+                store_type or self.default_store_type,
+                collection_name or self.default_collection
+            )
+            
+            # 获取统计信息
+            stats = document_processor.get_document_stats(split_docs)
+            collection_stats = vector_store_manager.get_collection_stats(
+                store_type or self.default_store_type,
+                collection_name or self.default_collection
+            )
+            
+            result = {
+                "directory_path": directory_path,
+                "processed_documents": len(split_docs),
+                "original_documents": len(documents),
+                "document_stats": stats,
+                "collection_stats": collection_stats,
+                "store_type": store_type or self.default_store_type,
+                "collection_name": collection_name or self.default_collection
+            }
+            
+            logger.info(f"目录处理完成: {directory_path}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"目录处理失败: {directory_path}, 错误: {e}")
+            raise
+    
+    def search_documents(self, query: str, k: int = 4,
+                        store_type: str = None, collection_name: str = None) -> List[Dict[str, Any]]:
+        """搜索文档"""
+        documents = vector_store_manager.similarity_search(
+            query, k, store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        
+        results = []
+        for doc in documents:
+            result = {
+                "content": doc.page_content,
+                "metadata": doc.metadata,
+                "source": doc.metadata.get("source", "unknown")
+            }
+            results.append(result)
+        
+        return results
+    
+    def search_documents_with_scores(self, query: str, k: int = 4,
+                                   store_type: str = None, collection_name: str = None) -> List[Dict[str, Any]]:
+        """带分数的文档搜索"""
+        results_with_scores = vector_store_manager.similarity_search_with_score(
+            query, k, store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        
+        results = []
+        for doc, score in results_with_scores:
+            result = {
+                "content": doc.page_content,
+                "metadata": doc.metadata,
+                "source": doc.metadata.get("source", "unknown"),
+                "similarity_score": score
+            }
+            results.append(result)
+        
+        return results
+    
+    def get_document_collections(self, store_type: str = None) -> List[str]:
+        """获取文档集合列表"""
+        return vector_store_manager.list_collections(
+            store_type or self.default_store_type
+        )
+    
+    def get_collection_info(self, store_type: str = None, collection_name: str = None) -> Dict[str, Any]:
+        """获取集合信息"""
+        return vector_store_manager.get_collection_stats(
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+    
+    def delete_collection(self, store_type: str = None, collection_name: str = None) -> None:
+        """删除文档集合"""
+        vector_store_manager.delete_collection(
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        logger.info(f"Deleted document collection: {collection_name or self.default_collection}")
+    
+    def export_documents(self, store_type: str = None, collection_name: str = None,
+                        format: str = "json") -> str:
+        """Export a document collection"""
+        # Fetch all documents (simplified; a real implementation would paginate)
+        documents = vector_store_manager.similarity_search(
+            "", 1000, store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        
+        if format == "json":
+            import json
+            export_data = {
+                "collection_name": collection_name or self.default_collection,
+                "store_type": store_type or self.default_store_type,
+                "document_count": len(documents),
+                "documents": [
+                    {
+                        "content": doc.page_content,
+                        "metadata": doc.metadata
+                    }
+                    for doc in documents
+                ]
+            }
+            return json.dumps(export_data, ensure_ascii=False, indent=2)
+        else:
+            raise ValueError(f"Unsupported export format: {format}")
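The `ensure_ascii=False` flag matters here: it keeps non-ASCII document content (e.g. Chinese text) readable in the exported JSON instead of escaping it to `\uXXXX` sequences. A minimal demonstration:

```python
import json

record = {"content": "向量数据库", "metadata": {"source": "demo.txt"}}

escaped = json.dumps(record)                       # non-ASCII escaped to \uXXXX
readable = json.dumps(record, ensure_ascii=False)  # characters kept as-is

print("向量" in readable)  # True
print("向量" in escaped)   # False
```

Both forms parse back to the same dictionary; the difference is purely in the readability of the file on disk.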
+    
+    def get_supported_formats(self) -> List[str]:
+        """Return the supported file extensions"""
+        return list(document_processor.supported_extensions.keys())
+    
+    def validate_document(self, file_path: str) -> Dict[str, Any]:
+        """Validate a document file"""
+        file_path = Path(file_path)
+        
+        if not file_path.exists():
+            return {
+                "valid": False,
+                "error": "File does not exist"
+            }
+        
+        extension = file_path.suffix.lower()
+        if extension not in document_processor.supported_extensions:
+            return {
+                "valid": False,
+                "error": f"Unsupported file format: {extension}",
+                "supported_formats": self.get_supported_formats()
+            }
+        
+        try:
+            # Try to load the document
+            documents = document_processor.load_document(str(file_path))
+            stats = document_processor.get_document_stats(documents)
+            
+            return {
+                "valid": True,
+                "file_path": str(file_path),
+                "file_size": file_path.stat().st_size,
+                "extension": extension,
+                "document_count": len(documents),
+                "stats": stats
+            }
+        except Exception as e:
+            return {
+                "valid": False,
+                "error": str(e)
+            }
+
+
+# Global document service instance
+document_service = DocumentService()
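The validation flow above is essentially a `pathlib` suffix check followed by a trial load. A standalone sketch of the same flow (the supported-format set here is illustrative; the real mapping lives in `document_processor.supported_extensions`):

```python
from pathlib import Path

SUPPORTED = {".txt", ".md", ".pdf"}  # illustrative stand-in for the real format map

def quick_validate(path_str: str) -> dict:
    """Mirror the suffix-then-load validation pattern, minus the load step."""
    path = Path(path_str)
    if not path.exists():
        return {"valid": False, "error": "File does not exist"}
    ext = path.suffix.lower()
    if ext not in SUPPORTED:
        return {"valid": False, "error": f"Unsupported file format: {ext}"}
    return {"valid": True, "extension": ext, "file_size": path.stat().st_size}

print(quick_validate("no_such_file.txt"))  # {'valid': False, 'error': 'File does not exist'}
```

Returning a dict with a `valid` flag (rather than raising) lets API callers surface the error message directly without a try/except.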

+ 195 - 0
src/services/qa_service.py

@@ -0,0 +1,195 @@
+"""
+QA service - question answering backed by the vector store
+"""
+
+from typing import List, Dict, Any, Optional
+from loguru import logger
+
+from langchain.chains import RetrievalQA
+from langchain.prompts import PromptTemplate
+from langchain.schema import Document
+
+from ..core.llm_manager import llm_manager
+from ..core.vector_store import vector_store_manager
+from ..utils.cache import cache_manager
+
+
+class QAService:
+    """Question answering service"""
+    
+    def __init__(self):
+        self.default_provider = "openrouter"
+        self.default_model = "openai/gpt-3.5-turbo"
+        self.default_store_type = "chroma"
+        self.default_collection = "default"
+        
+        # Default prompt template
+        self.default_prompt_template = """Answer the question using the context below. If you do not know the answer, say "I don't know" rather than making one up.
+
+Context:
+{context}
+
+Question: {question}
+
+Answer:"""
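With `chain_type="stuff"`, the retrieved documents are joined into `{context}` and substituted into this template via plain string formatting. A sketch using an English rendering of the template:

```python
template = """Answer the question using the context below. If you do not know the answer, say "I don't know".

Context:
{context}

Question: {question}

Answer:"""

# Stand-in for retrieved document contents
docs = ["Chroma is a vector store.", "FAISS supports similarity search."]

# The "stuff" strategy: concatenate all retrieved chunks into one context block
prompt = template.format(context="\n\n".join(docs), question="What is Chroma?")
```

Because every retrieved chunk is stuffed into a single prompt, `k` and the chunk size together must stay within the model's context window.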
+    
+    def create_qa_chain(self, provider: str = None, model: str = None,
+                       store_type: str = None, collection_name: str = None,
+                       prompt_template: str = None) -> RetrievalQA:
+        """Create a question-answering chain"""
+        # Get the LLM
+        llm = llm_manager.get_or_create_llm(
+            provider or self.default_provider,
+            model or self.default_model
+        )
+        
+        # Get the vector store
+        vector_store = vector_store_manager.get_or_create_store(
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        
+        # Build the prompt template
+        if prompt_template:
+            prompt = PromptTemplate(
+                template=prompt_template,
+                input_variables=["context", "question"]
+            )
+        else:
+            prompt = PromptTemplate(
+                template=self.default_prompt_template,
+                input_variables=["context", "question"]
+            )
+        
+        # Build the QA chain
+        qa_chain = RetrievalQA.from_chain_type(
+            llm=llm,
+            chain_type="stuff",
+            retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
+            chain_type_kwargs={"prompt": prompt},
+            return_source_documents=True
+        )
+        
+        return qa_chain
+    
+    def ask_question(self, question: str, provider: str = None, model: str = None,
+                    store_type: str = None, collection_name: str = None,
+                    k: int = 4, **kwargs) -> Dict[str, Any]:
+        """Ask a question and return the answer"""
+        try:
+            # Create the QA chain
+            qa_chain = self.create_qa_chain(
+                provider, model, store_type, collection_name
+            )
+            
+            # Run the chain
+            result = qa_chain({"query": question})
+            
+            # Process the result
+            answer = result.get("result", "")
+            source_documents = result.get("source_documents", [])
+            
+            # Format the source-document info
+            sources = []
+            for doc in source_documents:
+                source_info = {
+                    "content": doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content,
+                    "metadata": doc.metadata
+                }
+                sources.append(source_info)
+            
+            response = {
+                "question": question,
+                "answer": answer,
+                "sources": sources,
+                "source_count": len(sources),
+                "provider": provider or self.default_provider,
+                "model": model or self.default_model,
+                "store_type": store_type or self.default_store_type,
+                "collection": collection_name or self.default_collection
+            }
+            
+            logger.info(f"QA completed: {question[:50]}...")
+            return response
+            
+        except Exception as e:
+            logger.error(f"QA failed: {e}")
+            raise
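The source previews built above use a simple length-capped truncation; extracted as a reusable helper:

```python
def preview(text: str, limit: int = 200) -> str:
    # Mirror the truncation used when formatting source documents:
    # cap at `limit` characters and append an ellipsis only when cut
    return text[:limit] + "..." if len(text) > limit else text

print(preview("short text"))    # short text
print(len(preview("x" * 500)))  # 203
```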
+    
+    def add_documents_for_qa(self, documents: List[Document], 
+                           store_type: str = None, collection_name: str = None) -> None:
+        """Add documents to the QA system"""
+        vector_store_manager.add_documents(
+            documents,
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+        logger.info(f"Added {len(documents)} documents to the QA system")
+    
+    def search_similar_documents(self, query: str, k: int = 4,
+                               store_type: str = None, collection_name: str = None) -> List[Document]:
+        """Search for similar documents"""
+        return vector_store_manager.similarity_search(
+            query, k, store_type or self.default_store_type, 
+            collection_name or self.default_collection
+        )
+    
+    def search_with_scores(self, query: str, k: int = 4,
+                          store_type: str = None, collection_name: str = None) -> List[tuple]:
+        """Similarity search with scores"""
+        return vector_store_manager.similarity_search_with_score(
+            query, k, store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+    
+    def get_collection_stats(self, store_type: str = None, 
+                           collection_name: str = None) -> Dict[str, Any]:
+        """Get collection statistics"""
+        return vector_store_manager.get_collection_stats(
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+    
+    def list_collections(self, store_type: str = None) -> List[str]:
+        """List all collections"""
+        return vector_store_manager.list_collections(
+            store_type or self.default_store_type
+        )
+    
+    def delete_collection(self, store_type: str = None, 
+                         collection_name: str = None) -> None:
+        """Delete a collection"""
+        vector_store_manager.delete_collection(
+            store_type or self.default_store_type,
+            collection_name or self.default_collection
+        )
+    
+    @cache_manager.cache(ttl=1800)  # cache results for 30 minutes
+    def batch_qa(self, questions: List[str], provider: str = None, 
+                model: str = None, store_type: str = None, 
+                collection_name: str = None) -> List[Dict[str, Any]]:
+        """Batch question answering"""
+        results = []
+        
+        for i, question in enumerate(questions):
+            try:
+                result = self.ask_question(
+                    question, provider, model, store_type, collection_name
+                )
+                results.append(result)
+                logger.info(f"Batch QA progress: {i+1}/{len(questions)}")
+            except Exception as e:
+                logger.error(f"Question {i+1} failed: {e}")
+                results.append({
+                    "question": question,
+                    "answer": f"Processing failed: {str(e)}",
+                    "sources": [],
+                    "source_count": 0,
+                    "error": True
+                })
+        
+        return results
+
+
+# Global QA service instance
+qa_service = QAService()
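`batch_qa` keeps going when one question fails and records the error in-place, so one bad question never aborts the batch. The collection pattern, sketched with a stand-in `ask` function (hypothetical, for illustration only):

```python
def ask(question: str) -> dict:
    # Stand-in for qa_service.ask_question; fails on empty questions
    if not question:
        raise ValueError("empty question")
    return {"question": question, "answer": f"echo: {question}", "error": False}

def batch(questions: list) -> list:
    """Run every question; record failures as result entries instead of raising."""
    results = []
    for q in questions:
        try:
            results.append(ask(q))
        except Exception as e:
            results.append({"question": q, "answer": f"Processing failed: {e}", "error": True})
    return results

out = batch(["What is RAG?", "", "What is FAISS?"])
print([r["error"] for r in out])  # [False, True, False]
```

Keeping results positionally aligned with the input list lets callers zip questions and answers back together after the run.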

+ 8 - 0
src/utils/__init__.py

@@ -0,0 +1,8 @@
+"""
+Utility modules
+"""
+
+from .logger import setup_logger
+from .cache import CacheManager
+
+__all__ = ["setup_logger", "CacheManager"]

+ 95 - 0
src/utils/cache.py

@@ -0,0 +1,95 @@
+"""
+Cache management utilities
+"""
+
+import json
+import hashlib
+from typing import Any, Optional
+from datetime import datetime, timedelta
+from loguru import logger
+from ..config.settings import settings
+
+
+class CacheManager:
+    """In-memory cache manager with TTL expiry"""
+    
+    def __init__(self):
+        self.cache_enabled = settings.cache_enabled
+        self.cache_ttl = settings.cache_ttl
+        self._cache = {}
+        self._cache_timestamps = {}
+    
+    def _generate_key(self, *args, **kwargs) -> str:
+        """Generate a cache key from the call arguments"""
+        key_data = {
+            'args': args,
+            'kwargs': sorted(kwargs.items())
+        }
+        # default=str covers arguments json cannot serialize natively, such as
+        # the bound instance that is passed in when the decorator wraps a method
+        key_string = json.dumps(key_data, sort_keys=True, default=str)
+        return hashlib.md5(key_string.encode()).hexdigest()
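Sorting the kwargs before serializing makes the key independent of keyword order, so `f(a=1, b=2)` and `f(b=2, a=1)` hit the same cache entry. A self-contained version of the key function:

```python
import hashlib
import json

def make_key(*args, **kwargs) -> str:
    # Sort kwargs so keyword order never changes the key; default=str
    # falls back to str() for values json cannot serialize natively
    key_data = {"args": args, "kwargs": sorted(kwargs.items())}
    key_string = json.dumps(key_data, sort_keys=True, default=str)
    return hashlib.md5(key_string.encode()).hexdigest()

print(make_key("q", k=4, store="chroma") == make_key("q", store="chroma", k=4))  # True
print(make_key("q", k=4) == make_key("q", k=5))  # False
```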
+    
+    def get(self, key: str) -> Optional[Any]:
+        """Get a cached value"""
+        if not self.cache_enabled:
+            return None
+        
+        if key not in self._cache:
+            return None
+        
+        # Evict the entry if it has expired
+        timestamp = self._cache_timestamps.get(key)
+        if timestamp and datetime.now() > timestamp:
+            del self._cache[key]
+            del self._cache_timestamps[key]
+            return None
+        
+        logger.debug(f"Cache hit for key: {key}")
+        return self._cache[key]
+    
+    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
+        """Set a cached value"""
+        if not self.cache_enabled:
+            return
+        
+        ttl = ttl or self.cache_ttl
+        self._cache[key] = value
+        self._cache_timestamps[key] = datetime.now() + timedelta(seconds=ttl)
+        logger.debug(f"Cache set for key: {key}, TTL: {ttl}s")
+    
+    def delete(self, key: str) -> None:
+        """Delete a cached value"""
+        if key in self._cache:
+            del self._cache[key]
+        if key in self._cache_timestamps:
+            del self._cache_timestamps[key]
+        logger.debug(f"Cache deleted for key: {key}")
+    
+    def clear(self) -> None:
+        """Clear the entire cache"""
+        self._cache.clear()
+        self._cache_timestamps.clear()
+        logger.info("Cache cleared")
+    
+    def cache(self, ttl: Optional[int] = None):
+        """Caching decorator"""
+        def decorator(func):
+            def wrapper(*args, **kwargs):
+                if not self.cache_enabled:
+                    return func(*args, **kwargs)
+                
+                cache_key = self._generate_key(func.__name__, *args, **kwargs)
+                cached_result = self.get(cache_key)
+                
+                if cached_result is not None:
+                    return cached_result
+                
+                result = func(*args, **kwargs)
+                self.set(cache_key, result, ttl)
+                return result
+            
+            return wrapper
+        return decorator
+
+
+# Global cache instance
+cache_manager = CacheManager()
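A self-contained miniature of the decorator's behavior (no `settings` dependency), showing that the second identical call is served from the cache rather than re-executing the function:

```python
import time

calls = {"n": 0}
_store = {}

def cached(ttl: float):
    """Minimal TTL-cache decorator, same shape as CacheManager.cache."""
    def decorator(func):
        def wrapper(*args):
            key = (func.__name__, args)
            entry = _store.get(key)
            if entry and time.monotonic() < entry[0]:
                return entry[1]  # cache hit: return without calling func
            result = func(*args)
            _store[key] = (time.monotonic() + ttl, result)
            return result
        return wrapper
    return decorator

@cached(ttl=60)
def slow_add(a, b):
    calls["n"] += 1  # count real executions
    return a + b

print(slow_add(2, 3), slow_add(2, 3), calls["n"])  # 5 5 1
```

One caveat of this scheme (shared by the `CacheManager` above): a cached `None` result is indistinguishable from a cache miss, since `get` returns `None` for both.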

+ 44 - 0
src/utils/logger.py

@@ -0,0 +1,44 @@
+"""
+Logging utilities
+"""
+
+import sys
+from pathlib import Path
+from loguru import logger
+from ..config.settings import settings
+
+
+def setup_logger():
+    """Configure logging"""
+    
+    # Create the log directory
+    log_file = Path(settings.log_file)
+    log_file.parent.mkdir(parents=True, exist_ok=True)
+    
+    # Remove the default handler
+    logger.remove()
+    
+    # Add a console handler
+    logger.add(
+        sys.stdout,
+        format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>",
+        level=settings.log_level,
+        colorize=True
+    )
+    
+    # Add a file handler
+    logger.add(
+        settings.log_file,
+        format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
+        level=settings.log_level,
+        rotation="10 MB",
+        retention="7 days",
+        compression="zip"
+    )
+    
+    return logger
+
+
+# Initialize logging on import
+setup_logger()
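loguru handles rotation, retention, and compression natively in `logger.add`. For comparison, the stdlib analogue of the file handler above is a `RotatingFileHandler` (size-based rotation only; retention windows and zip compression would need extra code). The path here is a temporary stand-in for `settings.log_file`:

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

log_path = Path(tempfile.mkdtemp()) / "app.log"  # stand-in for settings.log_file

# maxBytes/backupCount approximate loguru's rotation="10 MB" behavior
handler = RotatingFileHandler(log_path, maxBytes=10 * 1024 * 1024, backupCount=7)
handler.setFormatter(logging.Formatter(
    "%(asctime)s | %(levelname)-8s | %(name)s:%(funcName)s:%(lineno)d - %(message)s"
))

log = logging.getLogger("ai_arch")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info("logger configured")
```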

+ 3 - 0
tests/__init__.py

@@ -0,0 +1,3 @@
+"""
+Test package
+"""

+ 50 - 0
tests/test_config.py

@@ -0,0 +1,50 @@
+"""
+Tests for the config module
+"""
+
+import pytest
+import os
+from unittest.mock import patch
+from src.config.settings import Settings
+
+
+class TestSettings:
+    """Tests for the Settings class"""
+    
+    def test_default_settings(self):
+        """Test the default settings"""
+        settings = Settings()
+        
+        assert settings.app_name == "AI_Architecture"
+        assert settings.app_version == "1.0.0"
+        assert settings.debug is False
+        assert settings.log_level == "INFO"
+        assert settings.cache_enabled is True
+        assert settings.max_tokens == 4000
+        assert settings.temperature == 0.7
+    
+    def test_environment_variables(self):
+        """Test environment-variable overrides"""
+        with patch.dict(os.environ, {
+            "OPENAI_API_KEY": "test_key",
+            "LOG_LEVEL": "DEBUG",
+            "CACHE_ENABLED": "false"
+        }):
+            settings = Settings()
+            
+            assert settings.openai_api_key == "test_key"
+            assert settings.log_level == "DEBUG"
+            assert settings.cache_enabled is False
+    
+    def test_validation(self):
+        """Test configuration field types"""
+        settings = Settings()
+        
+        # Numeric types
+        assert isinstance(settings.max_tokens, int)
+        assert isinstance(settings.temperature, float)
+        assert isinstance(settings.cache_ttl, int)
+        
+        # Boolean types
+        assert isinstance(settings.debug, bool)
+        assert isinstance(settings.cache_enabled, bool)
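The env-override test above relies on pydantic-style parsing of strings like `"false"` into booleans. The core behavior can be sketched without pydantic (the accepted truthy strings here are a subset, for illustration):

```python
import os
from unittest.mock import patch

def parse_bool(value: str) -> bool:
    # pydantic-style boolean parsing of env strings (illustrative subset)
    return value.strip().lower() in {"1", "true", "yes", "on"}

# patch.dict restores the environment on exit, as in the test above
with patch.dict(os.environ, {"CACHE_ENABLED": "false", "LOG_LEVEL": "DEBUG"}):
    cache_enabled = parse_bool(os.environ.get("CACHE_ENABLED", "true"))
    log_level = os.environ.get("LOG_LEVEL", "INFO")

print(cache_enabled, log_level)  # False DEBUG
```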