投流数据文章相关服务开发

luojunhui 1da4dc2fcf 更新优化小程序信息任务 3 days ago
applications 1da4dc2fcf 更新优化小程序信息任务 3 days ago
routes c9ed0ac0cf 账号质量分析 3 weeks ago
ui 519fd8ba99 账号质量分析 3 weeks ago
.gitignore 18a8b26662 2025-01-03 修改gitignore文件 8 months ago
Dockerfile c343a65eaf 设置docker时间 2 weeks ago
LICENSE d3cfb36cda Initial commit 1 year ago
README.md c5eece4ab6 update README.md 1 month ago
app_config.toml 6c618d1e6a 公众号搜索新增try-except 4 weeks ago
docker-compose.yaml 7aa217a476 init 1 month ago
requirements.txt 9a01707b42 更新requirements.txt 2 weeks ago
run_dev_server.sh 12ff42e794 dev环境 4 weeks ago
task_app.py c9ed0ac0cf 账号质量分析 3 weeks ago

README.md

LongArticleTaskServer

description: a server for long_articles project experiments and tasks

启动服务

use hypercorn

hypercorn task_app:app --config app_config.toml

use docker

docker compose up -d

项目结构

.
├── Dockerfile
├── LICENSE
├── README.md
├── app_config.toml
├── applications
│   ├── __init__.py
│   ├── ab_test
│   │   ├── __init__.py
│   │   └── get_cover.py
│   ├── api
│   │   ├── __init__.py
│   │   ├── aliyun_log_api.py
│   │   ├── async_aigc_system_api.py
│   │   ├── async_apollo_api.py
│   │   ├── async_feishu_api.py
│   │   ├── async_piaoquan_api.py
│   │   ├── deep_seek_official_api.py
│   │   └── elastic_search_api.py
│   ├── config
│   │   ├── __init__.py
│   │   ├── aliyun_log_config.py
│   │   ├── deepseek_config.py
│   │   ├── elastic_search_mappings.py
│   │   ├── es_certs.crt
│   │   └── mysql_config.py
│   ├── crawler
│   │   ├── toutiao
│   │   │   ├── __init__.py
│   │   │   ├── blogger.py
│   │   │   ├── detail_recommend.py
│   │   │   ├── main_page_recomend.py
│   │   │   ├── search.py
│   │   │   ├── toutiao.js
│   │   │   └── use_js.py
│   │   └── wechat
│   │       ├── __init__.py
│   │       └── gzh_spider.py
│   ├── database
│   │   ├── __init__.py
│   │   └── mysql_pools.py
│   ├── pipeline
│   │   ├── __init__.py
│   │   ├── crawler_pipeline.py
│   │   └── data_recycle_pipeline.py
│   ├── service
│   │   ├── __init__.py
│   │   └── log_service.py
│   ├── tasks
│   │   ├── __init__.py
│   │   ├── cold_start_tasks
│   │   │   ├── __init__.py
│   │   │   └── article_pool_cold_start.py
│   │   ├── crawler_tasks
│   │   │   ├── __init__.py
│   │   │   └── crawler_toutiao.py
│   │   ├── data_recycle_tasks
│   │   │   ├── __init__.py
│   │   │   └── recycle_daily_publish_articles.py
│   │   ├── llm_tasks
│   │   │   ├── __init__.py
│   │   │   ├── candidate_account_process.py
│   │   │   └── process_title.py
│   │   ├── monitor_tasks
│   │   │   ├── __init__.py
│   │   │   ├── get_off_videos.py
│   │   │   ├── gzh_article_monitor.py
│   │   │   ├── kimi_balance.py
│   │   │   └── task_processing_monitor.py
│   │   ├── task_mapper.py
│   │   ├── task_scheduler.py
│   │   └── task_scheduler_v2.py
│   └── utils
│       ├── __init__.py
│       ├── aigc_system_database.py
│       ├── async_apollo_client.py
│       ├── async_http_client.py
│       ├── async_mysql_utils.py
│       ├── common.py
│       ├── get_cover.py
│       ├── item.py
│       └── response.py
├── dev
│   ├── code.py
│   ├── dev.py
│   ├── run_task_dev.py
│   ├── sample.txt
│   ├── title.json
│   └── totp.py
├── dev.py
├── docker-compose.yaml
├── myapp.log
├── requirements.txt
├── routes
│   ├── __init__.py
│   └── blueprint.py
└── task_app.py

get code strategy

tree -I "__pycache__|*.pyc"

1. 数据任务

daily发文数据回收

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "daily_publish_articles_recycle"}'

daily发文更新root_source_id

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "update_root_source_id"}'

账号质量处理

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'

2. 抓取任务

今日头条账号内文章抓取

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao"}'

今日头条推荐抓取文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "recommend"}'

今日头条搜索抓取账号

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_toutiao", "method": "search"}'

抓取账号管理(微信)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_account_manager", "platform": "weixin"}'

抓取微信文章(抓账号模式)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "account_association", "crawl_mode": "account"}'

抓取微信文章(搜索模式)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "crawler_gzh_articles", "account_method": "search", "crawl_mode": "search"}'

3. 冷启动发布任务

发布头条文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start", "platform": "toutiao", "crawler_methods": ["toutiao_account_association"]}'

发布公众号文章

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_cold_start"}'

4. 其他

校验kimi余额

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_kimi_balance"}'

自动下架视频

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "get_off_videos"}'

校验视频可见状态

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "check_publish_video_audit_status"}'

外部服务号监测

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "outside_article_monitor"}'

站内服务号发文监测

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "inner_article_monitor"}'

标题重写

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "title_rewrite"}'

为标题增加品类(文章池)

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "article_pool_category_generation", "limit": "1000"}'

候选账号质量分析

curl -X POST http://192.168.142.66:6060/api/run_task -H "Content-Type: application/json" -d '{"task_name": "candidate_account_quality_analysis"}'