sh ./main/scheduling_main.sh ${crawler_dir} ${log_type} ${crawler} ${env} ${machine} >>${nohup_dir} 2>&1 &
Parameters
${crawler_dir}: path of the crawler script to run, e.g. scheduling/scheduling_main/run_write_task.py
${log_type}: log naming format, e.g. scheduling-task, which generates 2023-02-08-scheduling-task.log under the scheduling/logs/ directory
${crawler}: which crawler, e.g. youtube / kanyikan / weixinzhishu
${env}: runtime environment, production: prod / test: dev
${machine}: machine the crawler runs on, e.g. Alibaba Cloud servers aliyun_hk / aliyun, or local
${nohup_dir}: path where the nohup log is stored, e.g. scheduling/nohup-task.log
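The log naming rule described for ${log_type} can be sketched as follows. This is only an illustration: the function name log_path is hypothetical, and the <crawler>/logs/ layout is an assumption inferred from the example above, since the real logic lives in scripts not shown here.

```python
from datetime import date
from pathlib import PurePosixPath


def log_path(crawler: str, log_type: str, day: date) -> str:
    """Build '<crawler>/logs/<YYYY-MM-DD>-<log_type>.log' (assumed layout)."""
    filename = f"{day.isoformat()}-{log_type}.log"
    return str(PurePosixPath(crawler) / "logs" / filename)


print(log_path("scheduling", "scheduling-task", date(2023, 2, 8)))
# scheduling/logs/2023-02-08-scheduling-task.log
```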
Alibaba Cloud 102 server
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_write_task.py --log_type="scheduling-write" --crawler="scheduling" --env="prod" --machine="aliyun" nohup-write.log
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_scheduling_task.py --log_type="scheduling-task" --crawler="scheduling" --env="prod" --machine="aliyun" nohup-task.log
Hong Kong server
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_write_task.py --log_type="scheduling-write" --crawler="scheduling" --env="prod" --machine="aliyun_hk" scheduling/nohup-write.log
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_scheduling_task.py --log_type="scheduling-task" --crawler="scheduling" --env="prod" --machine="aliyun_hk" scheduling/nohup-task.log
Local debugging
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_write_task.py --log_type="scheduling-write" --crawler="scheduling" --env="dev" --machine="local" ./scheduling/nohup-write.log
sh ./main/scheduling_main.sh scheduling/scheduling_main/run_scheduling_task.py --log_type="scheduling-task" --crawler="scheduling" --env="dev" --machine="local" ./scheduling/nohup-task.log
Kill processes
ps aux | grep scheduling
ps aux | grep scheduling | grep -v grep | awk '{print $2}' | xargs kill -9
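What the kill pipeline above does can be sketched in Python as follows. The helper names matching_pids and kill_matching are hypothetical, and the sample ps output is made up for illustration. The sketch uses plain substring matching, which behaves like grep only for simple patterns such as the ones used in this README. Passing signal.SIGTERM instead of the default gives the process a chance to exit cleanly, whereas kill -9 corresponds to SIGKILL.

```python
import os
import signal
import subprocess
from typing import List


def matching_pids(ps_output: str, pattern: str) -> List[int]:
    """Mimic `grep pattern | grep -v grep | awk '{print $2}'` on `ps aux` output."""
    pids = []
    for line in ps_output.splitlines():
        if pattern in line and "grep" not in line:
            fields = line.split()
            # Column 2 of `ps aux` is the PID.
            if len(fields) > 1 and fields[1].isdigit():
                pids.append(int(fields[1]))
    return pids


def kill_matching(pattern: str, sig: int = signal.SIGKILL) -> List[int]:
    """Send `sig` to every process whose `ps aux` line contains `pattern`."""
    ps_output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    own_pid = os.getpid()  # never signal the helper itself
    pids = [pid for pid in matching_pids(ps_output, pattern) if pid != own_pid]
    for pid in pids:
        os.kill(pid, sig)
    return pids


# Made-up sample output, only to show the filtering step.
sample = """\
USER   PID %CPU %MEM COMMAND
root  4312  0.1  0.5 python3 -u scheduling/scheduling_main/run_write_task.py
root  4410  0.0  0.0 grep scheduling
root  4555  0.0  0.2 python3 -u youtube/youtube_main/run_youtube_follow.py
"""
print(matching_pids(sample, "scheduling"))  # [4312]
```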
sh ./main/main.sh ${crawler_dir} ${log_type} ${crawler} ${strategy} ${oss_endpoint} ${env} ${machine} ${nohup_dir}
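Parameters
${crawler_dir}: path of the crawler script to run, e.g. ./youtube/youtube_main/run_youtube_follow.py
${log_type}: log naming format, e.g. follow, which generates 2023-02-08-follow.log under the youtube/logs/ directory
${crawler}: which crawler, e.g. youtube / kanyikan / weixinzhishu
${strategy}: crawler strategy, e.g. 定向爬虫策略 (targeted crawl) / 小时榜爬虫策略 (hourly ranking) / 热榜爬虫策略 (trending ranking)
# ${oss_endpoint}: OSS endpoint, internal network: inner / public network: out / Hong Kong: hk
${env}: runtime environment, production: prod / test: dev
${machine}: machine the crawler runs on, e.g. Alibaba Cloud servers aliyun_hk / aliyun, or macpro / macair / local
${nohup_dir}: path where the nohup log is stored, e.g. ./youtube/nohup.log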
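For orientation, a run_*.py entry point could parse the flags listed above roughly as follows. This is a minimal sketch, not the repository's actual code: build_parser is a hypothetical name, the allowed values are copied from the parameter list, and --oss_endpoint is treated as optional because its description is commented out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Hypothetical crawler entry point")
    parser.add_argument("--log_type", required=True, help="log name suffix, e.g. follow")
    parser.add_argument("--crawler", required=True, help="e.g. youtube / kanyikan / weixinzhishu")
    parser.add_argument("--strategy", required=True, help="crawler strategy name")
    # Assumption: optional, since ${oss_endpoint} is commented out in the README.
    parser.add_argument("--oss_endpoint", choices=["inner", "out", "hk"], default=None)
    parser.add_argument("--env", choices=["prod", "dev"], required=True)
    parser.add_argument(
        "--machine",
        choices=["aliyun_hk", "aliyun", "macpro", "macair", "local"],
        required=True,
    )
    return parser


args = build_parser().parse_args([
    "--log_type=follow",
    "--crawler=youtube",
    "--strategy=定向爬虫策略",
    "--oss_endpoint=hk",
    "--env=prod",
    "--machine=aliyun_hk",
])
print(args.crawler, args.env, args.machine)  # youtube prod aliyun_hk
```

Restricting --env and --machine with choices makes a mistyped value fail at startup instead of partway through a crawl.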
sh ./main/main.sh ./youtube/youtube_main/run_youtube_follow.py --log_type="follow" --crawler="youtube" --strategy="定向爬虫策略" --oss_endpoint="hk" --env="prod" --machine="aliyun_hk" youtube/nohup.log
# sh ./main/main.sh ./youtube/youtube_main/run_youtube_follow.py --log_type="follow" --crawler="youtube" --strategy="定向爬虫策略" --env="prod" --machine="aliyun_hk" youtube/nohup.log
Commands to kill the youtube processes:
ps aux | grep run_youtube
ps aux | grep run_youtube | grep -v grep | awk '{print $2}' | xargs kill -9
WeChat Index (weixinzhishu): start and kill processes
nohup python3 -u weixinzhishu/weixinzhishu_main/weixinzhishu_inner_sort.py >>./weixinzhishu/nohup_inner_sort.log 2>&1 &
nohup python3 -u weixinzhishu/weixinzhishu_main/weixinzhishu_inner_long.py >>./weixinzhishu/nohup_inner_long.log 2>&1 &
nohup python3 -u weixinzhishu/weixinzhishu_main/weixinzhishu_out.py >>./weixinzhishu/nohup_out.log 2>&1 &
ps aux | grep run_weixinzhishu
ps aux | grep run_weixinzhishu | grep -v grep | awk '{print $2}' | xargs kill -9
Device used to obtain wechat_key: Mac Air
ps aux | grep weixinzhishu | grep -v grep | awk '{print $2}' | xargs kill -9 && cd /Users/piaoquan/Desktop/piaoquan_crawler && nohup python3 -u weixinzhishu/weixinzhishu_key/search_key_mac.py >> weixinzhishu/nohup.log 2>&1 &
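The launch commands above share the pattern nohup python3 -u <script> >> <log> 2>&1 &. A roughly comparable launcher in Python might look like the sketch below. The name start_detached is hypothetical, and the demo runs a harmless child process rather than a real crawler script. Note that start_new_session puts the child in its own session on POSIX systems, which is similar to, but not identical with, what nohup does.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def start_detached(cmd: List[str], log_file: str) -> subprocess.Popen:
    """Start cmd in its own session, appending stdout and stderr to log_file."""
    with open(log_file, "ab") as log:
        # The child inherits the file descriptor, so closing our handle is safe.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,                # like `>> log_file`
            stderr=subprocess.STDOUT,  # like `2>&1`
            start_new_session=True,    # POSIX only: detach from the terminal session
        )


# Demo with a harmless child process instead of a real crawler script.
demo_log = os.path.join(tempfile.mkdtemp(), "nohup.log")
proc = start_detached([sys.executable, "-u", "-c", "print('started')"], demo_log)
proc.wait(timeout=30)
with open(demo_log) as fh:
    print(fh.read().strip())  # started
```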
Alibaba Cloud 102 server
sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --oss_endpoint="inner" --env="prod" --machine="aliyun" xigua/nohup.log
# sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --env="prod" --machine="aliyun" xigua/nohup.log
Local machine
sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --oss_endpoint="out" --env="prod" --machine="local" xigua/nohup.log
# sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --env="prod" --machine="local" xigua/nohup.log
macpro
sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --oss_endpoint="out" --env="prod" --machine="macpro" xigua/nohup.log
# sh ./main/main.sh ./xigua/xigua_main/run_xigua_follow.py --log_type="follow" --crawler="xigua" --strategy="定向爬虫策略" --env="prod" --machine="macpro" xigua/nohup.log
Commands to kill the processes:
ps aux | grep run_xigua
ps aux | grep run_xigua | grep -v grep | awk '{print $2}' | xargs kill -9
Alibaba Cloud 102 server
sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --oss_endpoint="inner" --env="prod" --machine="aliyun" kuaishou/nohup.log
# sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --env="prod" --machine="aliyun" kuaishou/nohup.log
Local machine
sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --oss_endpoint="out" --env="dev" --machine="local" kuaishou/nohup.log
# sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --env="dev" --machine="local" kuaishou/nohup.log
macpro
sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --oss_endpoint="out" --env="prod" --machine="macpro" kuaishou/nohup.log
# sh ./main/main.sh ./kuaishou/kuaishou_main/run_kuaishou_follow.py --log_type="follow" --crawler="kuaishou" --strategy="定向爬虫策略" --env="prod" --machine="macpro" kuaishou/nohup.log
Commands to kill the processes:
ps aux | grep run_kuaishou
ps aux | grep run_kuaishou | grep -v grep | awk '{print $2}' | xargs kill -9
Alibaba Cloud 102 server
Targeted crawler strategy: sh ./main/main.sh ./xiaoniangao/xiaoniangao_main/run_xiaoniangao_follow.py --log_type="follow" --crawler="xiaoniangao" --strategy="定向爬虫策略" --oss_endpoint="inner" --env="prod" --machine="aliyun" xiaoniangao/nohup.log
Hourly ranking crawler strategy:
Play-count ranking crawler strategy:
Local debugging
Targeted crawler strategy: sh ./main/main.sh ./xiaoniangao/xiaoniangao_main/run_xiaoniangao_follow.py --log_type="follow" --crawler="xiaoniangao" --strategy="定向爬虫策略" --oss_endpoint="out" --env="dev" --machine="local" xiaoniangao/nohup.log
Hourly ranking crawler strategy:
Play-count ranking crawler strategy:
Commands to kill the processes:
ps aux | grep run_xiaoniangao
ps aux | grep run_xiaoniangao | grep -v grep | awk '{print $2}' | xargs kill -9