This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LongArticleSearchAgent is an AI-powered content discovery system for finding long-form WeChat articles. It combines a general-purpose agent framework (agent/) with a domain-specific pipeline (src/pipeline/) that orchestrates search, filtering, and account aggregation stages.
```bash
# Install dependencies
make install          # or: pip install -r requirements.txt

# Run web server (dev with reload)
make dev              # hypercorn app:app --bind 0.0.0.0:8000 --reload

# Run web server (production)
make run

# Run pipeline CLI (no database required)
python run_pipeline.py

# Run production harness (with budget control, observer, DB)
python run_search_agent.py

# Lint
make lint             # ruff check src/ tests/ app.py

# Format
make format           # ruff check --fix + ruff format

# Run all tests
make test             # pytest tests/ -v

# Run specific test file or test
pytest tests/unit/test_config.py -v
pytest tests/integration/test_health.py::test_health_returns_ok -v

# Docker
make docker-build && make docker-run

# Clean caches
make clean
```
Required: `OPEN_ROUTER_API_KEY`. Key optional vars: `PIPELINE_QUERY`, `PIPELINE_DEMAND_ID`, `PIPELINE_TARGET_COUNT`, `PIPELINE_TEMPERATURE`, `PIPELINE_MAX_ITERATIONS`, `PIPELINE_TIMEOUT`, `MODEL` (default: `anthropic/claude-sonnet-4-5`). Database vars (`SEARCH_AGENT_DB_*`) are optional; the system runs without a database using default policies.
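For local runs these settings typically live in a `.env` file. A minimal sketch — variable names are from the list above, all values are placeholders:

```bash
# .env (placeholder values — adjust for your environment)
OPEN_ROUTER_API_KEY=sk-or-your-key-here
MODEL=anthropic/claude-sonnet-4-5
PIPELINE_QUERY="example search topic"
PIPELINE_TARGET_COUNT=20
# SEARCH_AGENT_DB_* vars are optional; without them the system
# falls back to default policies.
```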
Agent Core (agent/): General autonomous agent framework (~24K lines). Contains AgentRunner execution loop, LLM provider integrations (OpenRouter, Gemini, Qwen), tool registry with JSON schema, skill system (markdown-based prompt injection), and FileSystemTraceStore for execution persistence.
Application Layer (src/): Domain-specific logic built on top of the agent framework.
- `src/domain/search/` — `SearchAgentCore` service, `SearchAgentPolicy` strategy config, `SearchAgentPolicyRepository` for DB-driven runtime policy override.
- `src/pipeline/` — `PipelineOrchestrator` engine with 6 stages, 3 quality gates, and 4 lifecycle hooks.
- `src/config/` — Pydantic `BaseSettings` classes with env prefix and `.env` support.
- `src/infra/` — Async MySQL pool, HTTP client, Aliyun logging, Redis/ES/Milvus clients.

Pipeline flow:

```
DemandAnalysisStage → ContentSearchStage → [SearchCompletenessGate]
  → HardFilterStage → QualityFilterStage → [FilterSufficiencyGate]
  → AccountPrecipitateStage → OutputPersistStage → [OutputSchemaGate]
```
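The stage/gate sequencing above can be sketched as a simple loop. This is a hypothetical illustration — the real class and method names in `src/pipeline/` may differ; stages here are plain functions and gates are keyed by stage name for brevity:

```python
# Hypothetical sketch of stage → gate sequencing; not the actual orchestrator API.
from enum import Enum

class GateAction(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    FALLBACK = "fallback"
    ABORT = "abort"

def run_pipeline(stages, gates, ctx):
    """Run stages in order; after a gated stage, the gate decides what happens next."""
    i = 0
    while i < len(stages):
        stage = stages[i]
        ctx = stage(ctx)
        gate = gates.get(stage.__name__)  # not every stage is gated
        if gate is None:
            i += 1
            continue
        action, target = gate(ctx)
        if action is GateAction.PROCEED:
            i += 1
        elif action is GateAction.RETRY:
            continue  # re-run the same stage
        elif action is GateAction.FALLBACK:
            i = stages.index(target)  # jump back, e.g. to a search stage
        else:  # GateAction.ABORT
            raise RuntimeError(f"gate aborted pipeline after {stage.__name__}")
    return ctx
```

This mirrors how `FilterSufficiencyGate` can send control back to `ContentSearchStage` when results are insufficient, while other gate verdicts advance, repeat, or abort the run.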
- **Stages** (`src/pipeline/stages/`): Sequential processing units. `DemandAnalysisStage` and `QualityFilterStage` use `StageAgentExecutor` to run LLM calls with skill-defined prompts. `ContentSearchStage` delegates to a `ToolAdapter` (currently `WeixinToolAdapter`).
- **Gates** (`src/pipeline/gates/`): Post-stage checks that return proceed/retry/fallback/abort actions. `FilterSufficiencyGate` can trigger a fallback to `ContentSearchStage` if results are insufficient.
- **Hooks** (`src/pipeline/hooks/`): Observer pattern for observability — JSONL trace writing, terminal progress, MySQL persistence, structured logging. All hooks are invoked at each lifecycle point by the orchestrator.
- **Adapters**: `ToolAdapter` base class (`src/pipeline/adapters/base.py`) with a `WeixinToolAdapter` implementation wrapping `tests/tools/weixin_tools.py`.
- **Skills** (`tests/skills/`): LLM prompts defining domain expertise (demand analysis, article finding/filtering strategy, account precipitation, output schema). Loaded by stages to guide agent behavior.
- **Config**: `LongArticlesSearchAgentConfig` includes `SearchAgentMySQLConfig`, `DeepSeekConfig`, and `AliyunLogConfig`. `RuntimePipelineConfig` holds per-run settings; `SearchAgentPolicy` holds strategy parameters (loadable from DB via the `search_agent_strategy` table).
- All I/O uses `async`/`await`.

| Entry | Purpose |
|---|---|
| `app.py` | Quart ASGI web server with REST API |
| `run_pipeline.py` | Simple CLI runner (no DB strategy) |
| `run_search_agent.py` | Production CLI with budget control, timeout, DB policy loading |
| `pipeline_visualize.py` | Converts JSONL trace → HTML visualization |
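The JSONL traces consumed by `pipeline_visualize.py` are written by an observer-style hook. A minimal sketch of what such a hook might look like — the actual hook interface and event fields in `src/pipeline/hooks/` may differ:

```python
# Illustrative JSONL trace hook; real field names and the hook protocol are assumptions.
import json
import time
from pathlib import Path

class JsonlTraceHook:
    """Append one JSON object per lifecycle event, suitable for later visualization."""

    def __init__(self, path: Path):
        self.path = path

    def on_event(self, event: str, stage: str, payload: dict) -> None:
        record = {"ts": time.time(), "event": event, "stage": stage, **payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage: the orchestrator would call every registered hook at each lifecycle point.
hook = JsonlTraceHook(Path("trace.jsonl"))
hook.on_event("stage_started", "ContentSearchStage", {"query": "example"})
hook.on_event("stage_finished", "ContentSearchStage", {"result_count": 12})
```

Appending one self-contained JSON object per line keeps the trace streamable: a visualizer can tail the file while a run is still in progress.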
Top-level packages: `src`, `agent`, `gateway`, `knowhub`. Tests use pytest-asyncio with `asyncio_mode = "auto"` (no need for the `@pytest.mark.asyncio` decorator), and `pythonpath = ["."]` so tests resolve imports from the repo root.
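With `asyncio_mode = "auto"`, plain `async def test_*` functions are collected and awaited by pytest without any decorator. A minimal illustration — `fetch_status` is a hypothetical stand-in for real async code under test:

```python
# Example async test; with asyncio_mode = "auto", no @pytest.mark.asyncio is needed.
import asyncio

async def fetch_status() -> dict:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return {"status": "ok"}

async def test_health_returns_ok():
    result = await fetch_status()
    assert result["status"] == "ok"
```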