Development of services related to paid-traffic data and articles

luojunhui 2949b33452 Add data recycling for service-account posts 10 months ago
applications 2949b33452 Add data recycling for service-account posts 10 months ago
routes c9ed0ac0cf Account quality analysis 10 months ago
ui 519fd8ba99 Account quality analysis 10 months ago
.gitignore 18a8b26662 2025-01-03 Update .gitignore 1 year ago
Dockerfile 22bbe39d93 update requirements.txt 10 months ago
LICENSE d3cfb36cda Initial commit 1 year ago
README.md c5eece4ab6 update README.md 10 months ago
app_config.toml 6c618d1e6a Add try-except to WeChat official-account search 10 months ago
docker-compose.yaml 7aa217a476 init 10 months ago
requirements.txt 333893e846 update requirements.txt 10 months ago
run_dev_server.sh 12ff42e794 dev environment 10 months ago
task_app.py c9ed0ac0cf Account quality analysis 10 months ago

README.md

LongArticleTaskServer

description: a server for long_articles project experiments and tasks

Starting the service

use hypercorn

hypercorn task_app:app --config app_config.toml

use docker

docker compose up -d

Project structure

.
├── Dockerfile
├── LICENSE
├── README.md
├── app_config.toml
├── applications
│   ├── __init__.py
│   ├── ab_test
│   │   ├── __init__.py
│   │   └── get_cover.py
│   ├── api
│   │   ├── __init__.py
│   │   ├── aliyun_log_api.py
│   │   ├── async_aigc_system_api.py
│   │   ├── async_apollo_api.py
│   │   ├── async_feishu_api.py
│   │   ├── async_piaoquan_api.py
│   │   ├── deep_seek_official_api.py
│   │   └── elastic_search_api.py
│   ├── config
│   │   ├── __init__.py
│   │   ├── aliyun_log_config.py
│   │   ├── deepseek_config.py
│   │   ├── elastic_search_mappings.py
│   │   ├── es_certs.crt
│   │   └── mysql_config.py
│   ├── crawler
│   │   ├── toutiao
│   │   │   ├── __init__.py
│   │   │   ├── blogger.py
│   │   │   ├── detail_recommend.py
│   │   │   ├── main_page_recomend.py
│   │   │   ├── search.py
│   │   │   ├── toutiao.js
│   │   │   └── use_js.py
│   │   └── wechat
│   │       ├── __init__.py
│   │       └── gzh_spider.py
│   ├── database
│   │   ├── __init__.py
│   │   └── mysql_pools.py
│   ├── pipeline
│   │   ├── __init__.py
│   │   ├── crawler_pipeline.py
│   │   └── data_recycle_pipeline.py
│   ├── service
│   │   ├── __init__.py
│   │   └── log_service.py
│   ├── tasks
│   │   ├── __init__.py
│   │   ├── cold_start_tasks
│   │   │   ├── __init__.py
│   │   │   └── article_pool_cold_start.py
│   │   ├── crawler_tasks
│   │   │   ├── __init__.py
│   │   │   └── crawler_toutiao.py
│   │   ├── data_recycle_tasks
│   │   │   ├── __init__.py
│   │   │   └── recycle_daily_publish_articles.py
│   │   ├── llm_tasks
│   │   │   ├── __init__.py
│   │   │   ├── candidate_account_process.py
│   │   │   └── process_title.py
│   │   ├── monitor_tasks
│   │   │   ├── __init__.py
│   │   │   ├── get_off_videos.py
│   │   │   ├── gzh_article_monitor.py
│   │   │   ├── kimi_balance.py
│   │   │   └── task_processing_monitor.py
│   │   ├── task_mapper.py
│   │   ├── task_scheduler.py
│   │   └── task_scheduler_v2.py
│   └── utils
│       ├── __init__.py
│       ├── aigc_system_database.py
│       ├── async_apollo_client.py
│       ├── async_http_client.py
│       ├── async_mysql_utils.py
│       ├── common.py
│       ├── get_cover.py
│       ├── item.py
│       └── response.py
├── dev
│   ├── code.py
│   ├── dev.py
│   ├── run_task_dev.py
│   ├── sample.txt
│   ├── title.json
│   └── totp.py
├── dev.py
├── docker-compose.yaml
├── myapp.log
├── requirements.txt
├── routes
│   ├── __init__.py
│   └── blueprint.py
└── task_app.py

generate the project structure (skipping __pycache__ directories and *.pyc files)

tree -I "__pycache__|*.pyc"
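Where the tree utility is not installed, a rough stdlib Python equivalent of the command above can produce a similar listing. This is a sketch, not a drop-in replacement: it assumes the same exclusions (__pycache__ and *.pyc), hides dot-files as tree does by default, sorts entries by plain code-point order, does not follow symlinked directories, and omits the directory/file count that tree prints at the end. The names render_tree, is_ignored and IGNORE_PATTERNS are illustrative.

```python
import fnmatch
from pathlib import Path

# Same exclusions as: tree -I "__pycache__|*.pyc"
IGNORE_PATTERNS = ("__pycache__", "*.pyc")


def is_ignored(name: str, patterns=IGNORE_PATTERNS) -> bool:
    """Skip hidden entries (tree's default) and anything matching an -I pattern."""
    return name.startswith(".") or any(fnmatch.fnmatch(name, p) for p in patterns)


def render_tree(root, patterns=IGNORE_PATTERNS) -> list[str]:
    """Return the lines of a tree-style listing of `root`, starting with '.'."""
    lines = ["."]

    def walk(directory: Path, prefix: str) -> None:
        entries = sorted(
            (p for p in directory.iterdir() if not is_ignored(p.name, patterns)),
            key=lambda p: p.name,
        )
        for index, entry in enumerate(entries):
            is_last = index == len(entries) - 1
            lines.append(prefix + ("└── " if is_last else "├── ") + entry.name)
            if entry.is_dir() and not entry.is_symlink():
                # Continue the vertical bar only while more siblings follow
                walk(entry, prefix + ("    " if is_last else "│   "))

    walk(Path(root), "")
    return lines
```

Usage: print("\n".join(render_tree("."))) from the repository root.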

1. Data tasks

Recycle data for daily published articles

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "daily_publish_articles_recycle"}'

Update root_source_id for daily published articles

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "update_root_source_id"}'

Account quality processing

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'
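Every task in this README is triggered the same way: a JSON POST to /api/run_task whose body always carries task_name plus optional task-specific fields. A minimal stdlib Python sketch of the same call, for scripting instead of curl. It reuses the host and port from the curl examples and assumes the server answers with a JSON body (the response format is not documented here); build_run_task_request and run_task are illustrative helper names, not part of the project.

```python
import json
import urllib.request

# Host and port taken from the curl examples; adjust for your deployment.
BASE_URL = "http://192.168.142.66:6060"


def build_run_task_request(
    task_name: str, base_url: str = BASE_URL, **params
) -> urllib.request.Request:
    """Build the same POST /api/run_task request that the curl examples send."""
    payload = {"task_name": task_name, **params}
    return urllib.request.Request(
        url=f"{base_url}/api/run_task",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task_name: str, base_url: str = BASE_URL, timeout: float = 30.0, **params):
    """Send the request and decode the body as JSON (assumed response format)."""
    request = build_run_task_request(task_name, base_url=base_url, **params)
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Extra fields become keyword arguments, so run_task("crawler_toutiao", method="recommend") sends the same body as the Toutiao recommendation example, and list values such as crawler_methods=["toutiao_account_association"] are serialized as JSON arrays.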

2. Crawler tasks

Crawl articles from Toutiao accounts

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao"}'

Crawl articles from Toutiao recommendations

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "recommend"}'

Crawl accounts via Toutiao search

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "search"}'

Crawler account management (WeChat)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_account_manager", "platform": "weixin"}'

Crawl WeChat articles (account mode)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "account_association", "crawl_mode": "account"}'

Crawl WeChat articles (search mode)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "search", "crawl_mode": "search"}'
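The crawler tasks differ only in their extra fields: crawler_toutiao takes an optional method (omitted for account crawling, recommend or search otherwise), crawler_gzh_articles pairs account_method with crawl_mode, and crawler_account_manager takes a platform. The sketch below gathers just the combinations shown in this section and rejects anything else, so a typo fails locally rather than on the server. It is not derived from the server code, which may accept other values; all function and constant names are hypothetical.

```python
from typing import Optional

# Toutiao "method" values shown in this README; None means the field is omitted
# (the account-article example sends no "method").
TOUTIAO_METHODS = (None, "recommend", "search")

# WeChat (gzh) modes shown in this README: crawl_mode -> matching account_method.
GZH_ACCOUNT_METHODS = {"account": "account_association", "search": "search"}


def toutiao_crawl_payload(method: Optional[str] = None) -> dict:
    """Body for crawler_toutiao: account crawl by default, or recommend/search."""
    if method not in TOUTIAO_METHODS:
        raise ValueError(f"unsupported Toutiao method: {method!r}")
    payload = {"task_name": "crawler_toutiao"}
    if method is not None:
        payload["method"] = method
    return payload


def gzh_crawl_payload(crawl_mode: str) -> dict:
    """Body for crawler_gzh_articles with the account_method paired to crawl_mode."""
    try:
        account_method = GZH_ACCOUNT_METHODS[crawl_mode]
    except KeyError:
        raise ValueError(f"unsupported WeChat crawl_mode: {crawl_mode!r}") from None
    return {
        "task_name": "crawler_gzh_articles",
        "account_method": account_method,
        "crawl_mode": crawl_mode,
    }


def account_manager_payload(platform: str = "weixin") -> dict:
    """Body for crawler_account_manager; only WeChat ("weixin") appears in this README."""
    return {"task_name": "crawler_account_manager", "platform": platform}
```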

3. Cold-start publishing tasks

Publish Toutiao articles

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start", "platform": "toutiao", "crawler_methods": ["toutiao_account_association"]}'

Publish WeChat official-account articles

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start"}'
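All curl commands in this README follow one template, so they can be generated from a payload dictionary instead of being edited by hand, which matters most for nested bodies like the Toutiao cold-start one. A small sketch, assuming the host and port from the examples: json.dumps with its default separators reproduces the spacing used above, and shlex.quote wraps the body in single quotes (escaping any single quote inside it). to_curl is a hypothetical helper name.

```python
import json
import shlex

BASE_URL = "http://192.168.142.66:6060"  # host and port from the examples above


def to_curl(payload: dict, base_url: str = BASE_URL) -> str:
    """Render a run_task payload as a curl command in the style of this README."""
    # Default separators (", " and ": ") match the hand-written bodies above
    body = json.dumps(payload, ensure_ascii=False)
    return (
        f"curl -X POST {base_url}/api/run_task "
        f'-H "Content-Type: application/json" -d {shlex.quote(body)}'
    )
```

For example, to_curl({"task_name": "article_pool_cold_start"}) yields exactly the WeChat cold-start command above.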

4. Other

Check Kimi balance

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_kimi_balance"}'

Automatically take down videos

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "get_off_videos"}'

Check video visibility status

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_publish_video_audit_status"}'

Monitor external service accounts

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "outside_article_monitor"}'

Monitor articles published by in-house service accounts

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "inner_article_monitor"}'

Rewrite titles

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "title_rewrite"}'

Add categories to titles (article pool)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_category_generation", "limit": "1000"}'

Candidate account quality analysis

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'
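The monitoring and maintenance tasks in this section are independent single calls, so a cron job or script may want to trigger several in a row and keep going when one fails. A sketch of that pattern, assuming a send(payload) callable that performs the HTTP request (for instance a thin wrapper around urllib, left abstract so the loop can be exercised without a server). run_in_sequence is a hypothetical name, and the task list simply reuses task names from this section.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Task names taken from this section; pick whichever subset you need.
MONITOR_TASKS = [
    "check_kimi_balance",
    "get_off_videos",
    "check_publish_video_audit_status",
    "outside_article_monitor",
    "inner_article_monitor",
]


def run_in_sequence(
    task_names: Iterable[str],
    send: Callable[[dict], object],
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Trigger each task in order; record failures instead of stopping at the first."""
    results: Dict[str, object] = {}
    failures: List[Tuple[str, str]] = []
    for name in task_names:
        try:
            results[name] = send({"task_name": name})
        except Exception as exc:  # network errors, HTTP errors, bad JSON, ...
            failures.append((name, repr(exc)))
    return results, failures
```

Running the tasks one after another keeps the load on the server predictable; the returned failures list can then be logged or used to decide on a retry.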