投流数据文章相关服务开发

luojunhui 333893e846 update requirements.txt há 10 meses atrás
applications c9ed0ac0cf 账号质量分析 há 10 meses atrás
routes c9ed0ac0cf 账号质量分析 há 10 meses atrás
ui 519fd8ba99 账号质量分析 há 10 meses atrás
.gitignore 18a8b26662 2025-01-03 修改gitignore文件 há 1 ano atrás
Dockerfile 4b238580d4 dockerfile 增加对nodejs对处理 há 10 meses atrás
LICENSE d3cfb36cda Initial commit há 1 ano atrás
README.md c5eece4ab6 update README.md há 10 meses atrás
app_config.toml 6c618d1e6a 公众号搜索新增try-except há 10 meses atrás
docker-compose.yaml 7aa217a476 init há 10 meses atrás
requirements.txt 333893e846 update requirements.txt há 10 meses atrás
run_dev_server.sh 12ff42e794 dev环境 há 10 meses atrás
task_app.py c9ed0ac0cf 账号质量分析 há 10 meses atrás

README.md

LongArticleTaskServer

description: a server for long_articles project experiments and tasks

启动服务

use hypercorn

hypercorn task_app:app --config app_config.toml

use docker

docker compose up -d

项目结构

.
├── Dockerfile
├── LICENSE
├── README.md
├── app_config.toml
├── applications
│   ├── __init__.py
│   ├── ab_test
│   │   ├── __init__.py
│   │   └── get_cover.py
│   ├── api
│   │   ├── __init__.py
│   │   ├── aliyun_log_api.py
│   │   ├── async_aigc_system_api.py
│   │   ├── async_apollo_api.py
│   │   ├── async_feishu_api.py
│   │   ├── async_piaoquan_api.py
│   │   ├── deep_seek_official_api.py
│   │   └── elastic_search_api.py
│   ├── config
│   │   ├── __init__.py
│   │   ├── aliyun_log_config.py
│   │   ├── deepseek_config.py
│   │   ├── elastic_search_mappings.py
│   │   ├── es_certs.crt
│   │   └── mysql_config.py
│   ├── crawler
│   │   ├── toutiao
│   │   │   ├── __init__.py
│   │   │   ├── blogger.py
│   │   │   ├── detail_recommend.py
│   │   │   ├── main_page_recomend.py
│   │   │   ├── search.py
│   │   │   ├── toutiao.js
│   │   │   └── use_js.py
│   │   └── wechat
│   │       ├── __init__.py
│   │       └── gzh_spider.py
│   ├── database
│   │   ├── __init__.py
│   │   └── mysql_pools.py
│   ├── pipeline
│   │   ├── __init__.py
│   │   ├── crawler_pipeline.py
│   │   └── data_recycle_pipeline.py
│   ├── service
│   │   ├── __init__.py
│   │   └── log_service.py
│   ├── tasks
│   │   ├── __init__.py
│   │   ├── cold_start_tasks
│   │   │   ├── __init__.py
│   │   │   └── article_pool_cold_start.py
│   │   ├── crawler_tasks
│   │   │   ├── __init__.py
│   │   │   └── crawler_toutiao.py
│   │   ├── data_recycle_tasks
│   │   │   ├── __init__.py
│   │   │   └── recycle_daily_publish_articles.py
│   │   ├── llm_tasks
│   │   │   ├── __init__.py
│   │   │   ├── candidate_account_process.py
│   │   │   └── process_title.py
│   │   ├── monitor_tasks
│   │   │   ├── __init__.py
│   │   │   ├── get_off_videos.py
│   │   │   ├── gzh_article_monitor.py
│   │   │   ├── kimi_balance.py
│   │   │   └── task_processing_monitor.py
│   │   ├── task_mapper.py
│   │   ├── task_scheduler.py
│   │   └── task_scheduler_v2.py
│   └── utils
│       ├── __init__.py
│       ├── aigc_system_database.py
│       ├── async_apollo_client.py
│       ├── async_http_client.py
│       ├── async_mysql_utils.py
│       ├── common.py
│       ├── get_cover.py
│       ├── item.py
│       └── response.py
├── dev
│   ├── code.py
│   ├── dev.py
│   ├── run_task_dev.py
│   ├── sample.txt
│   ├── title.json
│   └── totp.py
├── dev.py
├── docker-compose.yaml
├── myapp.log
├── requirements.txt
├── routes
│   ├── __init__.py
│   └── blueprint.py
└── task_app.py

get code strategy

tree -I "__pycache__|*.pyc"

1. 数据任务

daily发文数据回收

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "daily_publish_articles_recycle"}'

daily发文更新root_source_id

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "update_root_source_id"}'

账号质量处理

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'

2. 抓取任务

今日头条账号内文章抓取

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao"}'

今日头条推荐抓取文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "recommend"}'

今日头条搜索抓取账号

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "search"}'

抓取账号管理(微信)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_account_manager", "platform": "weixin"}'

抓取微信文章(抓账号模式)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "account_association", "crawl_mode": "account"}'

抓取微信文章(搜索模式)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "search", "crawl_mode": "search"}'

3. 冷启动发布任务

发布头条文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start", "platform": "toutiao", "crawler_methods": ["toutiao_account_association"]}'

发布公众号文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start"}'

4. 其他

校验kimi余额

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_kimi_balance"}'

自动下架视频

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "get_off_videos"}'

校验视频可见状态

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_publish_video_audit_status"}'

外部服务号监测

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "outside_article_monitor"}'

站内服务号发文监测

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "inner_article_monitor"}'

标题重写

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "title_rewrite"}'

为标题增加品类(文章池)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_category_generation", "limit": "1000"}'

候选账号质量分析

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'