feat: add SQL/JSON files for tail-number experiments, AB effects, table insights, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yangxiaohui, 3 days ago
Parent
Commit
035432b49e

+ 95 - 0
table_gen/loghubods.alg_recsys_feature_user_share_return_stat.md

@@ -0,0 +1,95 @@
+# loghubods.alg_recsys_feature_user_share_return_stat
+
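+## Overview
+
+A user-level share/return feature table that characterizes each user's "sharing ability" and "return effect" within a recent time window. It is partitioned by hour and refreshed hourly.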
+
+## Data sources
+
+| Table | Purpose |
+|---|------|
+| `loghubods.dwd_recsys_alg_exposure_base_20250108` | Exposure base table; only rows with `is_share=1` are read |
+| `loghubods.alg_vid_feature_basic_info` | Video info table; supplies the category and label fields |
+
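+## Parameters
+
+| Parameter | Meaning |
+|------|------|
+| `${dt}` | Date, e.g. 20260301 |
+| `${hh}` | Hour, e.g. 04 |
+| `${hours_early}` | Look-back in hours (length of the time window) |
+| `${max_n}` | Top-N size for the max sequences |
+| `${last_n}` | Top-N size for the last sequences |
+| `${cate_n}` | Top-N size for the category/label sequences |
+| `${smooth_plus}` | Smoothing constant added to the denominator when ranking categories by return rate |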
+
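+## Production flow
+
+```
+exposure_base (is_share=1, last ${hours_early} hours)
+    ↓ LEFT JOIN video_info (adds category/label)
+t_base
+    ├──→ t_total           user-level aggregates (SUM/MAX)
+    ├──→ t_max_share       Top-N videos by share count
+    ├──→ t_max_return      Top-N videos by return UV
+    ├──→ t_last_share      Top-N most recently shared videos
+    ├──→ t_last_return     Top-N videos with the most recent N-level return
+    ├──→ t_last_1_return   Top-N videos with the most recent 1-level return
+    ├──→ t_cate1           Top-N first-level categories by return rate
+    ├──→ t_cate2           Top-N second-level categories by return rate
+    ├──→ t_label1          Top-N label1 values by return rate
+    └──→ t_label2          Top-N label2 values by return rate
+    ↓
+t_result (LEFT JOIN merging the 10 CTEs)
+    ↓
+JSON_FORMAT packing → feature column
+```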
+
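+## Output fields
+
+Table schema: `mid STRING, feature STRING`, partitioned by `dt, hh`.
+
+`feature` is a JSON object with the following fields:
+
+### Overall user statistics
+
+| JSON key | Meaning |
+|----------|------|
+| `s_pv` | Total share PV |
+| `s_cnt` | Total share count |
+| `r_pv` | Total return PV (excluding the sharer) |
+| `r_uv` | Total return UV (excluding the sharer) |
+| `m_s_cnt` | Maximum share count for a single exposure |
+| `m_r_uv` | Maximum return UV for a single exposure |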
+
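+### Behavior sequences
+
+| JSON key | Meaning | Sort order |
+|----------|------|------|
+| `m_s_s` | Top-N videos by share count | share_cnt DESC, ts DESC |
+| `m_r_s` | Top-N videos by return UV | return_uv DESC, ts DESC |
+| `l_s_s` | Top-N most recently shared videos | ts DESC |
+| `l_r_s` | Top-N videos with the most recent N-level return | ts DESC |
+| `l_r1_s` | Top-N videos with the most recent 1-level return | ts DESC |
+
+Sequence element format: `{"id": vid, "cnt"/"uv": value, "ts": timestamp}`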
+
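+### Category preference sequences
+
+| JSON key | Meaning |
+|----------|------|
+| `c1_s` | First-level category (merge_first_level_cate) |
+| `c2_s` | Second-level category (merge_second_level_cate) |
+| `l1_s` | Festive label 1 (festive_label1) |
+| `l2_s` | Festive label 2 (festive_label2) |
+
+Sorted by return rate (`return_uv / (share_pv + smooth)`, where `smooth` is `${smooth_plus}`) in descending order, keeping the Top-N.
+
+Sequence element format: `{"na": name, "sp": share PV, "rp": return PV, "ru": return UV, "mu": max return UV}`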
+
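+## Design notes
+
+- **noself**: return statistics exclude the sharer, so they measure the user's real ability to bring in new or reactivated users
+- **max vs last sequences**: max captures the best historical performance; last captures recent interest drift
+- **Category ranking smoothing**: `${smooth_plus}` is added to the denominator so that small-sample categories are not ranked artificially high
+- **mid value**: uses uid when it is available and falls back to mid when uid is empty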
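
Note on the smoothing rule: the effect of the smoothing constant on category ranking can be illustrated outside SQL. The following is a minimal Python sketch, not the job's implementation. The function name `rank_by_smoothed_return_rate` and the sample counts are hypothetical; only the formula `return_uv / (share_pv + smooth)` comes from the README.

```python
from typing import Dict, List, Tuple


def rank_by_smoothed_return_rate(
    stats: Dict[str, Tuple[int, int]],
    smooth_plus: float,
    top_n: int,
) -> List[Tuple[str, float]]:
    """Rank categories by return_uv / (share_pv + smooth_plus), descending.

    stats maps a category name to (share_pv, return_uv). share_pv is
    assumed to be at least 1, because only shared exposures are counted.
    """
    scored = [
        (name, return_uv / (share_pv + smooth_plus))
        for name, (share_pv, return_uv) in stats.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical counts: one category with a single lucky share,
# one category with many shares and a solid return rate.
stats = {"niche": (1, 1), "popular": (50, 30)}

unsmoothed = rank_by_smoothed_return_rate(stats, smooth_plus=0, top_n=2)
smoothed = rank_by_smoothed_return_rate(stats, smooth_plus=5, top_n=2)
```

With `smooth_plus=0` the single-share category ranks first; with `smooth_plus=5` the larger-sample category overtakes it, which is the behavior the design note describes.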

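Note on reading the table: a consumer has to decode the `feature` column before using it. Below is a minimal Python sketch under one assumption that should be checked against real rows: each sequence field arrives as a JSON-encoded string nested inside the outer object, since the SQL assembles every sequence with `CONCAT`/`ARRAY_JOIN` before passing it to `JSON_OBJECT`. The helper `parse_feature` and the sample values are hypothetical.

```python
import json
from typing import Any, Dict

# Sequence keys listed in the README's output-field tables.
SEQ_KEYS = (
    "m_s_s", "m_r_s", "l_s_s", "l_r_s", "l_r1_s",
    "c1_s", "c2_s", "l1_s", "l2_s",
)


def parse_feature(feature: str) -> Dict[str, Any]:
    """Decode the outer JSON and normalize every sequence field to a list.

    Assumption: a sequence may be a nested JSON string, an already
    decoded list, null, or missing. The last two become empty lists.
    """
    obj = json.loads(feature)
    for key in SEQ_KEYS:
        value = obj.get(key)
        if isinstance(value, str):
            obj[key] = json.loads(value)
        elif value is None:
            obj[key] = []
    return obj


# Made-up row for illustration only.
sample = json.dumps({
    "s_pv": 3,
    "s_cnt": 4,
    "r_pv": 2,
    "r_uv": 5,
    "m_s_s": json.dumps([{"id": 101, "cnt": 2, "ts": 1700000000}]),
    "c1_s": json.dumps([{"na": "music", "sp": 3, "rp": 2, "ru": 5, "mu": 4}]),
})
parsed = parse_feature(sample)
```

Normalizing missing sequences to empty lists keeps downstream code free of null checks, at the cost of hiding the difference between "no data" and "key absent".
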
+ 447 - 0
table_gen/loghubods.alg_recsys_feature_user_share_return_stat.sql

@@ -0,0 +1,447 @@
+--********************************************************************--
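+-- Return statistics dimensions
+-- return metrics are computed from is_return_n_noself and return_n_uv_noself
+-- 
+-- Abbreviated JSON keys, category dimension
+-- na:name, sp:share_pv (sum of share PV), rp:return_n_pv_noself (sum of return PV)
+-- ru:return_n_uv_noself (sum of return UV), mu:max_return_n_uv_noself (max return UV of a single exposure)
+--
+-- Abbreviated JSON keys, mid dimension
+-- s_pv:share_pv (sum of share PV), s_cnt:share_cnt (sum of share counts),
+-- r_pv:return_pv (sum of return PV), r_uv:return_uv (sum of return UV)
+-- m_s_cnt:max_share_cnt (max share count of a single exposure), m_r_uv:max_return_uv (max return UV of a single exposure)
+-- m_s_s:max_share_seq (max share sequence), m_r_s:max_return_seq (max return sequence)
+-- l_s_s:last_share_seq (last share sequence), l_r_s:last_return_seq (last return sequence), l_r1_s:last_1_return_seq (last 1-level return sequence)
+-- c1_s:cate1_seq (merge_first_level_cate sequence, by return rate), c2_s:cate2_seq (merge_second_level_cate sequence, by return rate)
+-- l1_s:label1_seq (festive_label1 sequence, by return rate), l2_s:label2_seq (festive_label2 sequence, by return rate)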
+--********************************************************************--
+CREATE TABLE IF NOT EXISTS loghubods.alg_recsys_feature_user_share_return_stat
+(
+    mid      STRING COMMENT 'mid'
+    ,feature STRING COMMENT 'statistics'
+)
+PARTITIONED BY 
+(
+    dt       STRING COMMENT 'date: 20240105'
+    ,hh      STRING COMMENT 'hour: 04'
+)
+STORED AS ALIORC
+TBLPROPERTIES ('comment' = 'share/return statistics for each user over the last 30 days')
+LIFECYCLE 60
+;
+
+INSERT OVERWRITE TABLE loghubods.alg_recsys_feature_user_share_return_stat PARTITION (dt = '${dt}',hh = '${hh}')
+WITH t_video_info AS 
+(
+    SELECT  vid
+            ,TRIM(channel) AS channel
+            ,TRIM(cate1) AS cate1
+            ,TRIM(cate2) AS cate2
+            ,TRIM(label1) AS label1
+            ,TRIM(label2) AS label2
+    FROM    (
+                SELECT  CAST(vid AS BIGINT) vid
+                        ,GET_JSON_OBJECT(feature,"$.channel") AS channel
+                        ,GET_JSON_OBJECT(feature,"$.merge_first_level_cate") AS cate1
+                        ,GET_JSON_OBJECT(feature,"$.merge_second_level_cate") AS cate2
+                        ,GET_JSON_OBJECT(feature,"$.festive_label1") AS label1
+                        ,GET_JSON_OBJECT(feature,"$.festive_label2") AS label2
+                        ,ROW_NUMBER() OVER (PARTITION BY vid ORDER BY dt DESC,hh DESC ) AS rn -- keep the row from the latest partition per vid
+                FROM    loghubods.alg_vid_feature_basic_info
+                WHERE   CONCAT(dt,hh) BETWEEN TO_CHAR(FROM_UNIXTIME(UNIX_TIMESTAMP(TO_DATE('${dt}${hh}','YYYYMMDDHH')) - 2 * 3600),'YYYYMMDDHH') AND "${dt}${hh}"
+            ) 
+    WHERE   rn = 1
+)
+,t_base AS 
+(
+    SELECT  mid
+            ,tt1.vid
+            ,is_share
+            ,share_cnt
+            ,is_return_noself
+            ,return_1_uv_noself
+            ,is_return_n_noself
+            ,return_n_uv_noself
+            ,ts
+            ,channel
+            ,cate1
+            ,cate2
+            ,label1
+            ,label2
+    FROM    (
+                SELECT  (CASE    WHEN uid IS NOT NULL
+                                    AND LENGTH(uid) > 0
+                                    AND uid != 'null' THEN uid
+                                ELSE mid
+                        END) AS mid
+                        ,CAST(vid AS BIGINT) vid
+                        ,CAST(is_share AS BIGINT) AS is_share
+                        ,CAST(share_cnt AS BIGINT) AS share_cnt
+                        ,CAST(is_return_noself AS BIGINT) AS is_return_noself
+                        ,CAST(return_1_uv_noself AS BIGINT) AS return_1_uv_noself
+                        ,CAST(is_return_n_noself AS BIGINT) AS is_return_n_noself
+                        ,CAST(return_n_uv_noself AS BIGINT) AS return_n_uv_noself
+                        ,CAST(ts AS BIGINT) AS ts
+                FROM    loghubods.dwd_recsys_alg_exposure_base_20250108
+                WHERE   CONCAT(dt,hh) BETWEEN TO_CHAR(FROM_UNIXTIME(UNIX_TIMESTAMP(TO_DATE('${dt}${hh}','YYYYMMDDHH')) - ${hours_early} * 3600),'YYYYMMDDHH') AND TO_CHAR(FROM_UNIXTIME(UNIX_TIMESTAMP(TO_DATE('${dt}${hh}','YYYYMMDDHH')) - 1 * 3600),'YYYYMMDDHH')
+                AND     mid IS NOT NULL
+                AND     LENGTH(mid) > 1
+                AND     vid IS NOT NULL
+                AND     LENGTH(vid) > 1
+                AND     is_share = '1'
+            ) tt1
+    LEFT JOIN t_video_info tt2
+    ON      tt1.vid = tt2.vid
+)
+,t_total AS 
+(
+    SELECT  mid
+            ,SUM(is_share) AS share_pv
+            ,SUM(share_cnt) AS share_cnt
+            ,SUM(is_return_n_noself) AS return_n_pv_noself
+            ,SUM(return_n_uv_noself) AS return_n_uv_noself
+            ,MAX(share_cnt) AS max_share_cnt
+            ,MAX(return_n_uv_noself) AS max_return_uv
+    FROM    t_base
+    GROUP BY mid
+)
+,t_max_share AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(JSON_OBJECT("id",vid,"cnt",share_cnt,"ts",ts)) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,vid
+                                    ,share_cnt
+                                    ,ts
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY share_cnt DESC,ts DESC ) AS rank
+                            FROM    t_base
+                            WHERE   share_cnt > 0
+                        ) 
+                WHERE   rank <= ${max_n}
+            ) 
+    GROUP BY mid
+)
+,t_max_return AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(JSON_OBJECT("id",vid,"uv",return_n_uv_noself,"ts",ts)) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,vid
+                                    ,return_n_uv_noself
+                                    ,ts
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY return_n_uv_noself DESC,ts DESC ) AS rank
+                            FROM    t_base
+                            WHERE   is_return_n_noself > 0
+                            AND     return_n_uv_noself > 0
+                        ) 
+                WHERE   rank <= ${max_n}
+            ) 
+    GROUP BY mid
+)
+,t_last_share AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(JSON_OBJECT("id",vid,"cnt",share_cnt,"ts",ts)) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,vid
+                                    ,share_cnt
+                                    ,ts
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY ts DESC ) AS rank
+                            FROM    t_base
+                            WHERE   share_cnt > 0
+                        ) 
+                WHERE   rank <= ${last_n}
+            ) 
+    GROUP BY mid
+)
+,t_last_return AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(JSON_OBJECT("id",vid,"uv",return_n_uv_noself,"ts",ts)) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,vid
+                                    ,return_n_uv_noself
+                                    ,ts
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY ts DESC ) AS rank
+                            FROM    t_base
+                            WHERE   is_return_n_noself > 0
+                            AND     return_n_uv_noself > 0
+                        ) 
+                WHERE   rank <= ${last_n}
+            ) 
+    GROUP BY mid
+)
+,t_last_1_return AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(JSON_OBJECT("id",vid,"uv",return_1_uv_noself,"ts",ts)) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,vid
+                                    ,return_1_uv_noself
+                                    ,ts
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY ts DESC ) AS rank
+                            FROM    t_base
+                            WHERE   is_return_noself > 0
+                            AND     return_1_uv_noself > 0
+                        ) 
+                WHERE   rank <= ${last_n}
+            ) 
+    GROUP BY mid
+)
+,t_cate1 AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(
+                                    JSON_OBJECT("na",cate1,"sp",share_pv,"rp",return_n_pv_noself,"ru",return_n_uv_noself,"mu",max_return_uv)
+                        ) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,cate1
+                                    ,share_pv
+                                    ,(CASE   WHEN return_n_pv_noself > 0 THEN return_n_pv_noself
+                                            ELSE NULL
+                                    END) AS return_n_pv_noself
+                                    ,(CASE   WHEN return_n_uv_noself > 0 THEN return_n_uv_noself
+                                            ELSE NULL
+                                    END) AS return_n_uv_noself
+                                    ,(CASE   WHEN max_return_uv > 0 THEN max_return_uv
+                                            ELSE NULL
+                                    END) AS max_return_uv
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY (1.0 * return_n_uv_noself / (share_pv + ${smooth_plus})) DESC ) AS rank
+                            FROM    (
+                                        SELECT  mid
+                                                ,cate1
+                                                ,SUM(is_share) AS share_pv
+                                                ,SUM(is_return_n_noself) AS return_n_pv_noself
+                                                ,SUM(return_n_uv_noself) AS return_n_uv_noself
+                                                ,MAX(return_n_uv_noself) AS max_return_uv
+                                        FROM    t_base
+                                        WHERE   cate1 IS NOT NULL
+                                        AND     cate1 != 'unknown'
+                                        AND     cate1 != ''
+                                        GROUP BY mid
+                                                 ,cate1
+                                    ) 
+                        ) 
+                WHERE   rank <= ${cate_n}
+            ) 
+    GROUP BY mid
+)
+,t_cate2 AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(
+                                    JSON_OBJECT("na",cate2,"sp",share_pv,"rp",return_n_pv_noself,"ru",return_n_uv_noself,"mu",max_return_uv)
+                        ) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,cate2
+                                    ,share_pv
+                                    ,(CASE   WHEN return_n_pv_noself > 0 THEN return_n_pv_noself
+                                            ELSE NULL
+                                    END) AS return_n_pv_noself
+                                    ,(CASE   WHEN return_n_uv_noself > 0 THEN return_n_uv_noself
+                                            ELSE NULL
+                                    END) AS return_n_uv_noself
+                                    ,(CASE   WHEN max_return_uv > 0 THEN max_return_uv
+                                            ELSE NULL
+                                    END) AS max_return_uv
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY (1.0 * return_n_uv_noself / (share_pv + ${smooth_plus})) DESC ) AS rank
+                            FROM    (
+                                        SELECT  mid
+                                                ,cate2
+                                                ,SUM(is_share) AS share_pv
+                                                ,SUM(is_return_n_noself) AS return_n_pv_noself
+                                                ,SUM(return_n_uv_noself) AS return_n_uv_noself
+                                                ,MAX(return_n_uv_noself) AS max_return_uv
+                                        FROM    t_base
+                                        WHERE   cate2 IS NOT NULL
+                                        AND     cate2 != 'unknown'
+                                        AND     cate2 != ''
+                                        GROUP BY mid
+                                                 ,cate2
+                                    ) 
+                        ) 
+                WHERE   rank <= ${cate_n}
+            ) 
+    GROUP BY mid
+)
+,t_label1 AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(
+                                    JSON_OBJECT("na",label1,"sp",share_pv,"rp",return_n_pv_noself,"ru",return_n_uv_noself,"mu",max_return_uv)
+                        ) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,label1
+                                    ,share_pv
+                                    ,(CASE   WHEN return_n_pv_noself > 0 THEN return_n_pv_noself
+                                            ELSE NULL
+                                    END) AS return_n_pv_noself
+                                    ,(CASE   WHEN return_n_uv_noself > 0 THEN return_n_uv_noself
+                                            ELSE NULL
+                                    END) AS return_n_uv_noself
+                                    ,(CASE   WHEN max_return_uv > 0 THEN max_return_uv
+                                            ELSE NULL
+                                    END) AS max_return_uv
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY (1.0 * return_n_uv_noself / (share_pv + ${smooth_plus})) DESC ) AS rank
+                            FROM    (
+                                        SELECT  mid
+                                                ,label1
+                                                ,SUM(is_share) AS share_pv
+                                                ,SUM(is_return_n_noself) AS return_n_pv_noself
+                                                ,SUM(return_n_uv_noself) AS return_n_uv_noself
+                                                ,MAX(return_n_uv_noself) AS max_return_uv
+                                        FROM    t_base
+                                        WHERE   label1 IS NOT NULL
+                                        AND     label1 != 'unknown'
+                                        AND     label1 != ''
+                                        GROUP BY mid
+                                                 ,label1
+                                    ) 
+                        ) 
+                WHERE   rank <= ${cate_n}
+            ) 
+    GROUP BY mid
+)
+,t_label2 AS 
+(
+    SELECT  mid
+            ,CONCAT("[",ARRAY_JOIN(COLLECT_LIST(record),","),"]") AS seq
+    FROM    (
+                SELECT  mid
+                        ,JSON_FORMAT(
+                                    JSON_OBJECT("na",label2,"sp",share_pv,"rp",return_n_pv_noself,"ru",return_n_uv_noself,"mu",max_return_uv)
+                        ) AS record
+                FROM    (
+                            SELECT  mid
+                                    ,label2
+                                    ,share_pv
+                                    ,(CASE   WHEN return_n_pv_noself > 0 THEN return_n_pv_noself
+                                            ELSE NULL
+                                    END) AS return_n_pv_noself
+                                    ,(CASE   WHEN return_n_uv_noself > 0 THEN return_n_uv_noself
+                                            ELSE NULL
+                                    END) AS return_n_uv_noself
+                                    ,(CASE   WHEN max_return_uv > 0 THEN max_return_uv
+                                            ELSE NULL
+                                    END) AS max_return_uv
+                                    ,ROW_NUMBER() OVER (PARTITION BY mid ORDER BY (1.0 * return_n_uv_noself / (share_pv + ${smooth_plus})) DESC ) AS rank
+                            FROM    (
+                                        SELECT  mid
+                                                ,label2
+                                                ,SUM(is_share) AS share_pv
+                                                ,SUM(is_return_n_noself) AS return_n_pv_noself
+                                                ,SUM(return_n_uv_noself) AS return_n_uv_noself
+                                                ,MAX(return_n_uv_noself) AS max_return_uv
+                                        FROM    t_base
+                                        WHERE   label2 IS NOT NULL
+                                        AND     label2 != 'unknown'
+                                        AND     label2 != ''
+                                        GROUP BY mid
+                                                 ,label2
+                                    ) 
+                        ) 
+                WHERE   rank <= ${cate_n}
+            ) 
+    GROUP BY mid
+)
+,t_result AS 
+(
+    SELECT  mid
+            ,share_pv
+            ,share_cnt
+            ,(CASE   WHEN return_pv > 0 THEN return_pv
+                    ELSE NULL
+            END) AS return_pv
+            ,(CASE   WHEN return_uv > 0 THEN return_uv
+                    ELSE NULL
+            END) AS return_uv
+            ,(CASE   WHEN max_share_cnt > 0 THEN max_share_cnt
+                    ELSE NULL
+            END) AS max_share_cnt
+            ,(CASE   WHEN max_return_uv > 0 THEN max_return_uv
+                    ELSE NULL
+            END) AS max_return_uv
+            ,max_share_seq
+            ,max_return_seq
+            ,last_share_seq
+            ,last_return_seq
+            ,cate1_seq
+            ,cate2_seq
+            ,label1_seq
+            ,label2_seq
+            ,last_1_return_seq
+    FROM    (
+                SELECT  t1.mid
+                        ,t1.share_pv
+                        ,t1.share_cnt
+                        ,t1.return_n_pv_noself AS return_pv
+                        ,t1.return_n_uv_noself AS return_uv
+                        ,t1.max_share_cnt
+                        ,t1.max_return_uv
+                        ,t2.seq AS max_share_seq
+                        ,t3.seq AS max_return_seq
+                        ,t4.seq AS last_share_seq
+                        ,t5.seq AS last_return_seq
+                        ,t6.seq AS cate1_seq
+                        ,t7.seq AS cate2_seq
+                        ,t8.seq AS label1_seq
+                        ,t9.seq AS label2_seq
+                        ,t10.seq AS last_1_return_seq
+                FROM    t_total t1
+                LEFT JOIN t_max_share t2
+                ON      t1.mid = t2.mid
+                LEFT JOIN t_max_return t3
+                ON      t1.mid = t3.mid
+                LEFT JOIN t_last_share t4
+                ON      t1.mid = t4.mid
+                LEFT JOIN t_last_return t5
+                ON      t1.mid = t5.mid
+                LEFT JOIN t_cate1 t6
+                ON      t1.mid = t6.mid
+                LEFT JOIN t_cate2 t7
+                ON      t1.mid = t7.mid
+                LEFT JOIN t_label1 t8
+                ON      t1.mid = t8.mid
+                LEFT JOIN t_label2 t9
+                ON      t1.mid = t9.mid
+                LEFT JOIN t_last_1_return t10
+                ON      t1.mid = t10.mid
+            ) 
+)SELECT  mid
+        ,JSON_FORMAT(
+                    JSON_OBJECT('s_pv',share_pv,"s_cnt",share_cnt,"r_pv",return_pv,"r_uv",return_uv,"m_s_cnt",max_share_cnt,"m_r_uv",max_return_uv,"m_s_s",max_share_seq,"m_r_s",max_return_seq,"l_s_s",last_share_seq,"l_r_s",last_return_seq,"c1_s",cate1_seq,"c2_s",cate2_seq,"l1_s",label1_seq,"l2_s",label2_seq,"l_r1_s",last_1_return_seq)
+        ) AS feature
+FROM    t_result
+;

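Note on the partition windows: both `BETWEEN` filters on `CONCAT(dt,hh)` in the SQL above select an inclusive range of hourly partitions relative to the run hour. The Python sketch below recomputes those bounds so they can be checked by hand. The helper `partition_window` is hypothetical, `hours_early = 24` is an assumed value (the commit does not state the real one), and the arithmetic uses naive datetimes, so it ignores any session time-zone handling done by `UNIX_TIMESTAMP`/`FROM_UNIXTIME`.

```python
from datetime import datetime, timedelta
from typing import Tuple

FMT = "%Y%m%d%H"


def partition_window(
    dt: str, hh: str, back_hours: int, end_offset_hours: int
) -> Tuple[str, str]:
    """Return the inclusive (start, end) partition keys in YYYYMMDDHH form.

    start = run hour - back_hours, end = run hour - end_offset_hours.
    """
    run_hour = datetime.strptime(dt + hh, FMT)
    start = run_hour - timedelta(hours=back_hours)
    end = run_hour - timedelta(hours=end_offset_hours)
    return start.strftime(FMT), end.strftime(FMT)


# Exposure rows: from (run hour - hours_early) through (run hour - 1).
exposure = partition_window("20260301", "04", back_hours=24, end_offset_hours=1)

# Video info rows: from (run hour - 2) through the run hour itself.
video_info = partition_window("20260301", "04", back_hours=2, end_offset_hours=0)
```

Because both ends are inclusive, the exposure filter covers exactly `hours_early` hourly partitions and never includes the partition of the run hour itself.
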
+ 351 - 0
tasks/00_AB效果/03_推荐AB天级效果_对比对照组_分ab_含c1d1.sql

@@ -0,0 +1,351 @@
+--odps sql
+--********************************************************************--
+-- Recommendation AB daily metrics - with control-group comparison + bn_rov/c1_rov/cn_rov/d1_rov/dn_rov metrics
+-- Based on: 02_推荐AB天级效果_对比对照组_分ab.sql
+-- Added: bn_rov (formerly rovn), c1_rov, cn_rov, d1_rov, dn_rov (following the 2026-03-05 version by 于卓异)
+-- apptype=0, ab5/ab6/ab7 vs. control group ab8/ab9
+--********************************************************************--
+WITH t_base AS
+(
+    SELECT  sub.*
+            ,CASE   WHEN apptype IN ("0") AND abcode_raw IN ("ab5") THEN "rovn新损失函数实验"
+                    WHEN apptype IN ("0") AND abcode_raw IN ("ab6") THEN "c1_rovn&去掉vor实验"
+                    WHEN apptype IN ("0") AND abcode_raw IN ("ab7") THEN "c1_rovn实验"
+                    WHEN apptype IN ("0") AND abcode_raw IN ("ab8","ab9") THEN "对照组"
+                    ELSE "other"
+            END AS abcode
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,abcode AS abcode_raw
+                        ,CASE   WHEN page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页") THEN "推荐"
+                                WHEN page IN ("回流页","其他") THEN "非推荐"
+                                ELSE "其他"
+                        END AS page
+                        ,a.mid
+                        ,a.vid
+                        ,is_share
+                        ,share_cnt
+                        ,is_return_1
+                        ,is_return_n
+                        ,is_return_noself
+                        ,return_1_uv
+                        ,return_n_uv
+                        ,return_n_uv_noself
+                        ,new_exposure_cnt
+                        ,flowpool
+                        ,cc.cn
+                        ,cc.c1
+                        ,dd.dn
+                        ,dd.d1
+                FROM    loghubods.dwd_recsys_alg_exposure_base_20250108 a
+                LEFT JOIN   (
+                                -- c1/cn: return UV from clicks on the share
+                                SELECT  a.machinecode AS mid
+                                        ,a.subsessionid
+                                        ,a.videoid AS vid
+                                        ,COUNT(DISTINCT CASE WHEN b1.machinecode <> b2.machinecode THEN b2.machinecode END) AS cn
+                                        ,COUNT(DISTINCT CASE WHEN b2.sharedepth = 1 AND b1.machinecode <> b2.machinecode THEN b2.machinecode END) AS c1
+                                FROM    (
+                                            SELECT  DISTINCT machinecode
+                                                    ,shareobjectid AS videoid
+                                                    ,recomTraceId
+                                                    ,subsessionid
+                                                    ,sharedepth
+                                                    ,shareid
+                                            FROM    loghubods.user_share_log
+                                            WHERE   dt = '${dt}'
+                                            AND     topic = 'share'
+                                            AND     pagesource REGEXP 'category$|recommend$|-pages/user-videos-detail$'
+                                        ) a
+                                LEFT JOIN   (
+                                                SELECT  DISTINCT machinecode
+                                                        ,clickobjectid
+                                                        ,recomTraceId
+                                                        ,subsessionid
+                                                        ,sharedepth
+                                                        ,rootshareid
+                                                FROM    loghubods.user_share_log
+                                                WHERE   dt = '${dt}'
+                                                AND     topic = 'click'
+                                            ) b
+                                ON      a.shareid = b.rootshareid
+                                LEFT JOIN   (
+                                                SELECT  DISTINCT machinecode
+                                                        ,shareobjectid
+                                                        ,recomTraceId
+                                                        ,subsessionid
+                                                        ,sharedepth
+                                                        ,shareid
+                                                FROM    loghubods.user_share_log
+                                                WHERE   dt = '${dt}'
+                                                AND     topic = 'share'
+                                                AND     pagesource REGEXP 'category$|recommend$|-pages/user-videos-detail$'
+                                            ) b1
+                                ON      b.machinecode = b1.machinecode
+                                AND     b.subsessionid = b1.subsessionid
+                                LEFT JOIN   (
+                                                SELECT  DISTINCT machinecode
+                                                        ,clickobjectid
+                                                        ,recomTraceId
+                                                        ,subsessionid
+                                                        ,sharedepth
+                                                        ,shareid
+                                                        ,rootshareid
+                                                FROM    loghubods.user_share_log
+                                                WHERE   dt = '${dt}'
+                                                AND     topic = 'click'
+                                            ) b2
+                                ON      b1.shareid = b2.rootshareid
+                                GROUP BY a.machinecode
+                                         ,a.subsessionid
+                                         ,a.videoid
+                            ) cc
+                ON      a.mid = cc.mid
+                AND     a.subsessionid = cc.subsessionid
+                AND     a.vid = cc.vid
+                LEFT JOIN   (
+                                -- d1/dn: return traffic brought by the video that follows the shared video
+                                SELECT  *
+                                        ,LAG(回流,1,0) OVER (PARTITION BY mid,subsessionid ORDER BY rn DESC) AS dn
+                                        ,LAG(回流1,1,0) OVER (PARTITION BY mid,subsessionid ORDER BY rn DESC) AS d1
+                                FROM    (
+                                            SELECT  a.mid AS mid
+                                                    ,a.subsessionid
+                                                    ,a.videoid AS vid
+                                                    ,COUNT(DISTINCT b.shareid) AS 分享次数
+                                                    ,COUNT(DISTINCT CASE WHEN c.machinecode <> b.machinecode THEN c.machinecode END) AS 回流
+                                                    ,COUNT(DISTINCT CASE WHEN c.machinecode <> b.machinecode AND c.sharedepth = 1 THEN c.machinecode END) AS 回流1
+                                                    ,ROW_NUMBER() OVER (PARTITION BY a.subsessionid ORDER BY a.logtimestamp ASC) AS rn
+                                            FROM    (
+                                                        SELECT  *
+                                                        FROM    (
+                                                                    SELECT  DISTINCT mid
+                                                                            ,subsessionid
+                                                                            ,videoid
+                                                                            ,logtimestamp
+                                                                            ,ROW_NUMBER() OVER (PARTITION BY mid,subsessionid,videoid ORDER BY logtimestamp ASC) AS rn
+                                                                    FROM    loghubods.video_action_log_rp
+                                                                    WHERE   dt = '${dt}'
+                                                                    AND     businesstype = 'videoView'
+                                                                    AND     pagesource REGEXP 'category$|recommend$|-pages/user-videos-detail$'
+                                                                )
+                                                        WHERE   rn = 1
+                                                    ) a
+                                            LEFT JOIN   (
+                                                            SELECT  DISTINCT machinecode
+                                                                    ,shareobjectid AS videoid
+                                                                    ,recomTraceId
+                                                                    ,subsessionid
+                                                                    ,sharedepth
+                                                                    ,shareid
+                                                                    ,clienttimestamp
+                                                            FROM    loghubods.user_share_log
+                                                            WHERE   dt = '${dt}'
+                                                            AND     topic = 'share'
+                                                            AND     pagesource REGEXP 'category$|recommend$|-pages/user-videos-detail$'
+                                                        ) b
+                                            ON      a.mid = b.machinecode
+                                            AND     a.subsessionid = b.subsessionid
+                                            AND     a.videoid = b.videoid
+                                            LEFT JOIN   (
+                                                            SELECT  DISTINCT machinecode
+                                                                    ,clickobjectid
+                                                                    ,recomTraceId
+                                                                    ,subsessionid
+                                                                    ,sharedepth
+                                                                    ,rootshareid
+                                                            FROM    loghubods.user_share_log
+                                                            WHERE   dt = '${dt}'
+                                                            AND     topic = 'click'
+                                                        ) c
+                                            ON      b.shareid = c.rootshareid
+                                            GROUP BY a.mid
+                                                     ,a.subsessionid
+                                                     ,a.videoid
+                                                     ,a.logtimestamp
+                                        )
+                            ) dd
+                ON      a.mid = dd.mid
+                AND     a.subsessionid = dd.subsessionid
+                AND     a.vid = dd.vid
+                WHERE   dt = '${dt}'
+                AND     apptype IN ("0")
+                AND     page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页","回流页","其他")
+                AND     abcode IN ("ab5","ab6","ab7","ab8","ab9")
+            ) sub
+)
+-- Aggregate per abcode bucket (UV deduplicated within each abcode)
+,t_bucket AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,abcode_raw
+            ,COALESCE(COUNT(1) / COUNT(DISTINCT mid),0) AS exp_per_dau
+            ,COALESCE(SUM(is_share) / COUNT(1),0) AS str_one
+            ,COALESCE(SUM(return_n_uv) / SUM(is_share),0) AS ros_one
+            ,COALESCE(SUM(share_cnt) / COUNT(1),0) AS str
+            ,COALESCE(SUM(return_n_uv) / SUM(share_cnt),0) AS ros
+            ,COALESCE(SUM(is_return_1) / COUNT(1),0) AS str_plus
+            ,COALESCE(SUM(return_n_uv) / SUM(is_return_1),0) AS ros_minus
+            ,COALESCE(SUM(return_n_uv) / COUNT(1),0) AS bn_rov
+            ,COALESCE(SUM(c1) / COUNT(1),0) AS c1_rov
+            ,COALESCE(SUM(cn) / COUNT(1),0) AS cn_rov
+            ,COALESCE(SUM(d1) / COUNT(1),0) AS d1_rov
+            ,COALESCE(SUM(dn) / COUNT(1),0) AS dn_rov
+            ,COALESCE(SUM(new_exposure_cnt) / COUNT(1),0) AS vovh24
+            ,COUNT(DISTINCT mid) AS dau
+            ,COUNT(1) AS exp
+            ,COALESCE(SUM(is_share),0) AS is_share
+            ,COALESCE(SUM(share_cnt),0) AS share_cnt
+            ,COALESCE(SUM(is_return_1),0) AS is_return_1
+            ,COALESCE(SUM(return_n_uv),0) AS return_n_uv
+            ,COALESCE(SUM(new_exposure_cnt),0) AS viewh24
+            ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
+            ,COALESCE(SUM(cn),0) AS cn
+            ,COALESCE(SUM(c1),0) AS c1
+            ,COALESCE(SUM(dn),0) AS dn
+            ,COALESCE(SUM(d1),0) AS d1
+            -- rov ratio metrics (denominator = exposure count) are computed above
+    FROM    t_base
+    WHERE   page = "推荐"
+    AND     abcode != "other"
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,abcode_raw
+)
+-- Per experiment group, average the metrics across abcode_raw buckets
+,t_metrics AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,ROUND(AVG(exp_per_dau),2) AS exp_per_dau
+            ,ROUND(AVG(str_one),6) AS str_one
+            ,ROUND(AVG(ros_one),6) AS ros_one
+            ,ROUND(AVG(str),6) AS str
+            ,ROUND(AVG(ros),6) AS ros
+            ,ROUND(AVG(str_plus),6) AS str_plus
+            ,ROUND(AVG(ros_minus),6) AS ros_minus
+            ,ROUND(AVG(bn_rov),6) AS bn_rov
+            ,ROUND(AVG(c1_rov),6) AS c1_rov
+            ,ROUND(AVG(cn_rov),6) AS cn_rov
+            ,ROUND(AVG(d1_rov),6) AS d1_rov
+            ,ROUND(AVG(dn_rov),6) AS dn_rov
+            ,ROUND(AVG(vovh24),6) AS vovh24
+            ,AVG(dau) AS dau
+            ,AVG(exp) AS exp
+            ,AVG(is_share) AS is_share
+            ,AVG(share_cnt) AS share_cnt
+            ,AVG(is_return_1) AS is_return_1
+            ,AVG(return_n_uv) AS return_n_uv
+            ,AVG(viewh24) AS viewh24
+            ,AVG(return_n_uv_noself) AS return_n_uv_noself
+            ,AVG(cn) AS cn
+            ,AVG(c1) AS c1
+            ,AVG(dn) AS dn
+            ,AVG(d1) AS d1
+            ,WM_CONCAT(DISTINCT ',',abcode_raw) AS abcode_raws
+    FROM    t_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+-- Control group
+,t_control AS
+(
+    SELECT  dt
+            ,apptype
+            ,exp_per_dau AS ctrl_exp_per_dau
+            ,str_one AS ctrl_str_one
+            ,ros_one AS ctrl_ros_one
+            ,str AS ctrl_str
+            ,ros AS ctrl_ros
+            ,str_plus AS ctrl_str_plus
+            ,ros_minus AS ctrl_ros_minus
+            ,bn_rov AS ctrl_bn_rov
+            ,c1_rov AS ctrl_c1_rov
+            ,cn_rov AS ctrl_cn_rov
+            ,d1_rov AS ctrl_d1_rov
+            ,dn_rov AS ctrl_dn_rov
+            ,vovh24 AS ctrl_vovh24
+            ,dau AS ctrl_dau
+            ,exp AS ctrl_exp
+            ,is_share AS ctrl_is_share
+            ,share_cnt AS ctrl_share_cnt
+            ,is_return_1 AS ctrl_is_return_1
+            ,return_n_uv AS ctrl_return_n_uv
+            ,viewh24 AS ctrl_viewh24
+            ,return_n_uv_noself AS ctrl_return_n_uv_noself
+            ,cn AS ctrl_cn
+            ,c1 AS ctrl_c1
+            ,dn AS ctrl_dn
+            ,d1 AS ctrl_d1
+    FROM    t_metrics
+    WHERE   abcode = "对照组"
+)
+-- Join against the control group and compute lift
+SELECT  m.dt
+        ,m.apptype
+        ,m.abcode
+        ,m.abcode_raws
+        -- Raw metrics
+        ,m.exp_per_dau
+        ,m.str_one
+        ,m.ros_one
+        ,m.str
+        ,m.ros
+        ,m.str_plus
+        ,m.ros_minus
+        ,m.bn_rov
+        ,m.c1_rov
+        ,m.cn_rov
+        ,m.d1_rov
+        ,m.dn_rov
+        ,m.vovh24
+        ,m.dau
+        ,m.exp
+        ,m.is_share
+        ,m.share_cnt
+        ,m.is_return_1
+        ,m.return_n_uv
+        ,m.viewh24
+        ,m.return_n_uv_noself
+        ,m.cn
+        ,m.c1
+        ,m.dn
+        ,m.d1
+        -- Lift vs. the control group
+        ,(m.exp_per_dau - c.ctrl_exp_per_dau) / c.ctrl_exp_per_dau AS exp_per_dau_lift
+        ,(m.str_one - c.ctrl_str_one) / c.ctrl_str_one AS str_one_lift
+        ,(m.ros_one - c.ctrl_ros_one) / c.ctrl_ros_one AS ros_one_lift
+        ,(m.str - c.ctrl_str) / c.ctrl_str AS str_lift
+        ,(m.ros - c.ctrl_ros) / c.ctrl_ros AS ros_lift
+        ,(m.str_plus - c.ctrl_str_plus) / c.ctrl_str_plus AS str_plus_lift
+        ,(m.ros_minus - c.ctrl_ros_minus) / c.ctrl_ros_minus AS ros_minus_lift
+        ,(m.bn_rov - c.ctrl_bn_rov) / c.ctrl_bn_rov AS bn_rov_lift
+        ,(m.c1_rov - c.ctrl_c1_rov) / c.ctrl_c1_rov AS c1_rov_lift
+        ,(m.cn_rov - c.ctrl_cn_rov) / c.ctrl_cn_rov AS cn_rov_lift
+        ,(m.d1_rov - c.ctrl_d1_rov) / c.ctrl_d1_rov AS d1_rov_lift
+        ,(m.dn_rov - c.ctrl_dn_rov) / c.ctrl_dn_rov AS dn_rov_lift
+        ,(m.vovh24 - c.ctrl_vovh24) / c.ctrl_vovh24 AS vovh24_lift
+        ,(m.dau - c.ctrl_dau) / c.ctrl_dau AS dau_lift
+        ,(m.exp - c.ctrl_exp) / c.ctrl_exp AS exp_lift
+        ,(m.is_share - c.ctrl_is_share) / c.ctrl_is_share AS is_share_lift
+        ,(m.share_cnt - c.ctrl_share_cnt) / c.ctrl_share_cnt AS share_cnt_lift
+        ,(m.is_return_1 - c.ctrl_is_return_1) / c.ctrl_is_return_1 AS is_return_1_lift
+        ,(m.return_n_uv - c.ctrl_return_n_uv) / c.ctrl_return_n_uv AS return_n_uv_lift
+        ,(m.viewh24 - c.ctrl_viewh24) / c.ctrl_viewh24 AS viewh24_lift
+        ,(m.return_n_uv_noself - c.ctrl_return_n_uv_noself) / c.ctrl_return_n_uv_noself AS return_n_uv_noself_lift
+        ,(m.cn - c.ctrl_cn) / c.ctrl_cn AS cn_lift
+        ,(m.c1 - c.ctrl_c1) / c.ctrl_c1 AS c1_lift
+        ,(m.dn - c.ctrl_dn) / c.ctrl_dn AS dn_lift
+        ,(m.d1 - c.ctrl_d1) / c.ctrl_d1 AS d1_lift
+FROM    t_metrics m
+LEFT JOIN t_control c
+ON      m.dt = c.dt
+AND     m.apptype = c.apptype
+ORDER BY m.dt DESC, m.apptype, m.abcode
+;
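
Reviewer note: every `*_lift` column above is the relative change `(m - c) / c` against the control row. A minimal Python sketch of that arithmetic follows, with a guard for a zero or missing control value. The function name `lift` and the choice to return `None` are illustrative assumptions, not part of the committed query.

```python
from typing import Optional


def lift(metric: Optional[float], control: Optional[float]) -> Optional[float]:
    """Relative change of `metric` versus `control`.

    Returns None when either input is missing or the control value is zero,
    because the ratio is undefined in those cases.
    """
    if metric is None or control is None or control == 0:
        return None
    return (metric - control) / control


# Hypothetical values, for illustration only.
print(lift(5.0, 10.0))   # treatment is half of control
print(lift(0.12, 0.0))   # undefined: control is zero
```

A SQL analogue of the same guard would wrap each denominator, for example in `NULLIF(c.ctrl_str, 0)`, assuming the engine supports `NULLIF`.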

+ 7 - 0
tasks/00_尾号实验/02_推荐AB天级效果_对比对照组_分ab.json

@@ -0,0 +1,7 @@
+{
+  "token": "ONZqsxB9BhGH8tt90EScSJT5nHh",
+  "sheet_id": "JWT28U",
+  "sort": "dt:desc",
+  "cols": null,
+  "filter": "abcode!=other"
+}

+ 171 - 0
tasks/00_尾号实验/02_推荐AB天级效果_对比对照组_分ab.sql

@@ -0,0 +1,171 @@
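+-- Recommendation A/B daily metrics, with comparison against the control group
+-- Added columns: relative change (lift) of each metric versus the control group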
+WITH t_base AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode as abcode_raw
+            ,CASE
+                    WHEN apptype IN ("0") AND abcode IN ("ab5") THEN "rovn新损失函数实验"
+                    WHEN apptype IN ("0") AND abcode IN ("ab6") THEN "c1_rovn&去掉vor实验"
+                    WHEN apptype IN ("0") AND abcode IN ("ab7") THEN "c1_rovn实验"
+                    WHEN apptype IN ("0") AND abcode IN ("ab8", "ab9") THEN "对照组"
+                    ELSE "other"
+            END AS abcode
+            ,CASE   WHEN page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页") THEN "推荐"
+                    WHEN page IN ("回流页","其他") THEN "非推荐"
+                    ELSE "其他"
+            END AS page
+            ,mid
+            ,vid
+            ,is_share
+            ,share_cnt
+            ,is_return_1
+            ,is_return_n
+            ,is_return_noself
+            ,return_1_uv
+            ,return_n_uv
+            ,return_n_uv_noself
+            ,new_exposure_cnt
+            ,flowpool
+    FROM    loghubods.dwd_recsys_alg_exposure_base_20250108
+    WHERE   dt = '${dt}'
+    AND     apptype IN ("0")
+    AND     page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页","回流页","其他")
+    AND     abcode IN ("ab0","ab1","ab2","ab3","ab4","ab5","ab6","ab7","ab8","ab9")
+),
+-- Aggregate per single bucket (UV deduplicated within the bucket)
+t_bucket AS (
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,abcode_raw
+            ,page
+            ,COUNT(1) / COUNT(DISTINCT mid) AS exp_per_dau
+            ,SUM(is_share) / COUNT(1) AS str_one
+            ,SUM(return_n_uv) / SUM(is_share) AS ros_one
+            ,SUM(share_cnt) / COUNT(1) AS str
+            ,SUM(return_n_uv) / SUM(share_cnt) AS ros
+            ,SUM(is_return_1) / COUNT(1) AS str_plus
+            ,SUM(return_n_uv) / SUM(is_return_1) AS ros_minus
+            ,SUM(return_n_uv) / COUNT(1) AS rovn
+            ,SUM(new_exposure_cnt) / COUNT(1) AS vovh24
+            ,COUNT(DISTINCT mid) AS dau
+            ,COUNT(1) AS exp
+            ,COALESCE(SUM(is_share),0) AS is_share
+            ,COALESCE(SUM(share_cnt),0) AS share_cnt
+            ,COALESCE(SUM(is_return_1),0) AS is_return_1
+            ,COALESCE(SUM(return_n_uv),0) AS return_n_uv
+            ,COALESCE(SUM(new_exposure_cnt),0) AS viewh24
+            ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
+    FROM    t_base
+    WHERE   page IN ("推荐")
+    AND     abcode != "other"
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,abcode_raw
+             ,page
+),
+-- Per experiment group, average the metrics across buckets
+t_metrics AS (
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,page
+            ,AVG(exp_per_dau) AS exp_per_dau
+            ,AVG(str_one) AS str_one
+            ,AVG(ros_one) AS ros_one
+            ,AVG(str) AS str
+            ,AVG(ros) AS ros
+            ,AVG(str_plus) AS str_plus
+            ,AVG(ros_minus) AS ros_minus
+            ,AVG(rovn) AS rovn
+            ,AVG(vovh24) AS vovh24
+            ,AVG(dau) AS dau
+            ,AVG(exp) AS exp
+            ,AVG(is_share) AS is_share
+            ,AVG(share_cnt) AS share_cnt
+            ,AVG(is_return_1) AS is_return_1
+            ,AVG(return_n_uv) AS return_n_uv
+            ,AVG(viewh24) AS viewh24
+            ,AVG(return_n_uv_noself) AS return_n_uv_noself
+    FROM    t_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,page
+),
+-- Extract the control-group rows
+t_control AS (
+    SELECT  dt
+            ,apptype
+            ,page
+            ,exp_per_dau AS ctrl_exp_per_dau
+            ,str_one AS ctrl_str_one
+            ,ros_one AS ctrl_ros_one
+            ,str AS ctrl_str
+            ,ros AS ctrl_ros
+            ,str_plus AS ctrl_str_plus
+            ,ros_minus AS ctrl_ros_minus
+            ,rovn AS ctrl_rovn
+            ,vovh24 AS ctrl_vovh24
+            ,dau AS ctrl_dau
+            ,exp AS ctrl_exp
+            ,is_share AS ctrl_is_share
+            ,share_cnt AS ctrl_share_cnt
+            ,is_return_1 AS ctrl_is_return_1
+            ,return_n_uv AS ctrl_return_n_uv
+            ,viewh24 AS ctrl_viewh24
+            ,return_n_uv_noself AS ctrl_return_n_uv_noself
+    FROM    t_metrics
+    WHERE   abcode IN ("对照组")
+)
+-- Join against the control group and compute the relative change
+SELECT  m.dt
+        ,m.apptype
+        ,m.abcode
+        ,m.page
+        -- Bucket-averaged metrics
+        ,m.exp_per_dau
+        ,m.str_one
+        ,m.ros_one
+        ,m.str
+        ,m.ros
+        ,m.str_plus
+        ,m.ros_minus
+        ,m.rovn
+        ,m.vovh24
+        ,m.dau
+        ,m.exp
+        ,m.is_share
+        ,m.share_cnt
+        ,m.is_return_1
+        ,m.return_n_uv
+        ,m.viewh24
+        ,m.return_n_uv_noself
+        -- Relative change versus the control group
+        ,(m.exp_per_dau - c.ctrl_exp_per_dau) / c.ctrl_exp_per_dau AS exp_per_dau_lift
+        ,(m.str_one - c.ctrl_str_one) / c.ctrl_str_one AS str_one_lift
+        ,(m.ros_one - c.ctrl_ros_one) / c.ctrl_ros_one AS ros_one_lift
+        ,(m.str - c.ctrl_str) / c.ctrl_str AS str_lift
+        ,(m.ros - c.ctrl_ros) / c.ctrl_ros AS ros_lift
+        ,(m.str_plus - c.ctrl_str_plus) / c.ctrl_str_plus AS str_plus_lift
+        ,(m.ros_minus - c.ctrl_ros_minus) / c.ctrl_ros_minus AS ros_minus_lift
+        ,(m.rovn - c.ctrl_rovn) / c.ctrl_rovn AS rovn_lift
+        ,(m.vovh24 - c.ctrl_vovh24) / c.ctrl_vovh24 AS vovh24_lift
+        ,(m.dau - c.ctrl_dau) / c.ctrl_dau AS dau_lift
+        ,(m.exp - c.ctrl_exp) / c.ctrl_exp AS exp_lift
+        ,(m.is_share - c.ctrl_is_share) / c.ctrl_is_share AS is_share_lift
+        ,(m.share_cnt - c.ctrl_share_cnt) / c.ctrl_share_cnt AS share_cnt_lift
+        ,(m.is_return_1 - c.ctrl_is_return_1) / c.ctrl_is_return_1 AS is_return_1_lift
+        ,(m.return_n_uv - c.ctrl_return_n_uv) / c.ctrl_return_n_uv AS return_n_uv_lift
+        ,(m.viewh24 - c.ctrl_viewh24) / c.ctrl_viewh24 AS viewh24_lift
+        ,(m.return_n_uv_noself - c.ctrl_return_n_uv_noself) / c.ctrl_return_n_uv_noself AS return_n_uv_noself_lift
+FROM    t_metrics m
+LEFT JOIN t_control c
+ON      m.dt = c.dt
+AND     m.apptype = c.apptype
+AND     m.page = c.page
+ORDER BY m.dt DESC, m.apptype, m.page, m.abcode
+;
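
Reviewer note: `t_bucket` computes each ratio per `abcode_raw` bucket and `t_metrics` then takes `AVG` over the buckets in a group, so the control group (`ab8`, `ab9`) reports a mean of per-bucket ratios rather than a pooled ratio. The sketch below shows how the two can differ when bucket sizes are unequal. The share and exposure counts are made-up numbers, and both helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical per-bucket totals: bucket -> (shares, exposures).
Buckets = Dict[str, Tuple[int, int]]

control_buckets: Buckets = {"ab8": (30, 1000), "ab9": (10, 200)}


def bucket_mean_rate(buckets: Buckets) -> float:
    """Mean of the per-bucket share rates (what AVG over t_bucket rows does)."""
    return mean(shares / exposures for shares, exposures in buckets.values())


def pooled_rate(buckets: Buckets) -> float:
    """Share rate over all exposures pooled together, for comparison."""
    total_shares = sum(shares for shares, _ in buckets.values())
    total_exposures = sum(exposures for _, exposures in buckets.values())
    return total_shares / total_exposures


print(bucket_mean_rate(control_buckets))
print(pooled_rate(control_buckets))
```

The bucket mean weights every bucket equally, so a small bucket with an unusual rate moves the group figure more than it would in a pooled ratio.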

+ 137 - 0
tasks/00_尾号实验/a.json

@@ -0,0 +1,137 @@
+{
+    "s_pv": 10,
+    "s_cnt": 25,
+    "r_pv": 2,
+    "r_uv": 16,
+    "m_s_cnt": 8,
+    "m_r_uv": 15,
+    "m_s_s": [
+        {
+            "id": 64291100,
+            "cnt": 8,
+            "ts": 1770179547
+        },
+        {
+            "id": 65338494,
+            "cnt": 4,
+            "ts": 1772152001
+        }
+    ],
+    "m_r_s": [
+        {
+            "id": 64291100,
+            "uv": 15,
+            "ts": 1770179547
+        },
+        {
+            "id": 65152927,
+            "uv": 1,
+            "ts": 1771502038
+        }
+    ],
+    "l_s_s": [
+        {
+            "id": 64632804,
+            "cnt": 2,
+            "ts": 1772193314
+        },
+        {
+            "id": 65404119,
+            "cnt": 1,
+            "ts": 1772169844
+        }
+    ],
+    "l_r_s": [
+        {
+            "id": 65152927,
+            "uv": 1,
+            "ts": 1771502038
+        },
+        {
+            "id": 64291100,
+            "uv": 15,
+            "ts": 1770179547
+        }
+    ],
+    "c1_s": [
+        {
+            "na": "时政社会",
+            "sp": 3,
+            "rp": 2,
+            "ru": 16,
+            "mu": 15
+        },
+        {
+            "na": "医疗健康/长寿/健身",
+            "sp": 1
+        },
+        {
+            "na": "剧情/剧情演绎",
+            "sp": 2
+        },
+        {
+            "na": "明星/名人",
+            "sp": 1
+        },
+        {
+            "na": "综艺/影视综艺",
+            "sp": 1
+        },
+        {
+            "na": "法律/科普/人文社科",
+            "sp": 2
+        }
+    ],
+    "c2_s": [
+        {
+            "na": "国家力量",
+            "sp": 2,
+            "rp": 1,
+            "ru": 15,
+            "mu": 15
+        },
+        {
+            "na": "贪污腐败",
+            "sp": 1,
+            "rp": 1,
+            "ru": 1,
+            "mu": 1
+        },
+        {
+            "na": "健康知识",
+            "sp": 1
+        },
+        {
+            "na": "知识科普",
+            "sp": 2
+        },
+        {
+            "na": "老综艺影像",
+            "sp": 1
+        },
+        {
+            "na": "正能量剧情",
+            "sp": 1
+        },
+        {
+            "na": "历史名人",
+            "sp": 1
+        },
+        {
+            "na": "对口型表演",
+            "sp": 1
+        }
+    ],
+    "l_r1_s": [
+        {
+            "id": 65152927,
+            "uv": 1,
+            "ts": 1771502038
+        },
+        {
+            "id": 64291100,
+            "uv": 15,
+            "ts": 1770179547
+        }
+    ]
+}

+ 177 - 0
tasks/00_尾号实验/base_v3 copy.sql

@@ -0,0 +1,177 @@
+WITH t_abmap AS
+(
+    SELECT "0" AS suffix, "实验组:ros损失函数优化" AS abcode
+    UNION ALL SELECT "5", "实验组:ros损失函数优化"
+    UNION ALL SELECT "f", "实验组:ros损失函数优化"
+    UNION ALL SELECT "4", "实验组:c1_rovn & 去掉vor实验"
+    UNION ALL SELECT "6", "实验组:c1_rovn & 去掉vor实验"
+    UNION ALL SELECT "7", "实验组:c1_rovn & 去掉vor实验"
+    UNION ALL SELECT "8", "实验组:c1_rovn"
+    UNION ALL SELECT "9", "实验组:c1_rovn"
+    UNION ALL SELECT "e", "实验组:c1_rovn"
+    UNION ALL SELECT "a", "对照组"
+    UNION ALL SELECT "b", "对照组"
+    UNION ALL SELECT "c", "对照组"
+)
+,t_base AS
+(
+    SELECT  sub.*
+            ,COALESCE(m.abcode,"other") AS abcode
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,SUBSTR(GET_JSON_OBJECT(extend,'$.rootsessionid'),LENGTH(GET_JSON_OBJECT(extend,'$.rootsessionid')),1) AS suffix
+                        ,CASE   WHEN page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页") THEN "推荐"
+                                WHEN page IN ("回流页","其他") THEN "非推荐"
+                                ELSE "其他"
+                        END AS page
+                        ,mid
+                        ,vid
+                        ,is_share
+                        ,share_cnt
+                        ,is_return_1
+                        ,is_return_n
+                        ,is_return_noself
+                        ,return_1_uv
+                        ,return_n_uv
+                        ,return_n_uv_noself
+                        ,new_exposure_cnt
+                        ,flowpool
+                        -- ,abcode as abcode_origin
+                FROM    loghubods.dwd_recsys_alg_exposure_base_20250108
+                WHERE   dt="${dt}"
+                AND     apptype IN ("4")
+                AND     page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页","回流页","其他")
+                AND     abcode IN ("ab0","ab1","ab2","ab3","ab4","ab5","ab6","ab7","ab8","ab9")
+                AND     abcode NOT IN ("ab100")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "4"
+    AND     sub.suffix = m.suffix
+)
+-- dau2: aggregate per single suffix (last character of rootSessionId)
+,t_dau2_bucket AS
+(
+    SELECT  SUBSTR(sub.dt,1,8) AS dt
+            ,sub.apptype
+            ,COALESCE(m.abcode,"other") AS abcode
+            ,sub.suffix
+            ,COUNT(DISTINCT sub.machinecode) AS dau2
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,machinecode
+                        ,SUBSTR(GET_JSON_OBJECT(extparams,'$.rootSessionId'),LENGTH(GET_JSON_OBJECT(extparams,'$.rootSessionId')),1) AS suffix
+                FROM    loghubods.useractive_log
+                WHERE   dt="${dt}"
+                AND     apptype IN ("4")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "4"
+    AND     sub.suffix = m.suffix
+    GROUP BY SUBSTR(sub.dt,1,8)
+             ,sub.apptype
+             ,COALESCE(m.abcode,"other")
+             ,sub.suffix
+)
+-- dau2: per experiment group, average across suffixes
+,t_dau2 AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,AVG(dau2) AS dau2
+    FROM    t_dau2_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+-- Aggregate per single suffix (UV deduplicated within the suffix)
+,t_bucket AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,suffix
+            ,COALESCE(COUNT(1) / COUNT(DISTINCT mid),0) AS exp_per_dau
+            ,COALESCE(SUM(is_share) / COUNT(1),0) AS str_one
+            ,COALESCE(SUM(return_n_uv) / SUM(is_share),0) AS ros_one
+            ,COALESCE(SUM(share_cnt) / COUNT(1),0) AS str
+            ,COALESCE(SUM(return_n_uv) / SUM(share_cnt),0) AS ros
+            ,COALESCE(SUM(is_return_1) / COUNT(1),0) AS str_plus
+            ,COALESCE(SUM(return_n_uv) / SUM(is_return_1),0) AS ros_minus
+            ,COALESCE(SUM(return_n_uv) / COUNT(1),0) AS rovn
+            ,COALESCE(SUM(new_exposure_cnt) / COUNT(1),0) AS vovh24
+            ,COUNT(DISTINCT mid) AS dau
+            ,COUNT(1) AS exp
+            ,COALESCE(SUM(is_share),0) AS is_share
+            ,COALESCE(SUM(share_cnt),0) AS share_cnt
+            ,COALESCE(SUM(is_return_1),0) AS is_return_1
+            ,COALESCE(SUM(return_n_uv),0) AS return_n_uv
+            ,COALESCE(SUM(new_exposure_cnt),0) AS viewh24
+            ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
+    FROM    t_base
+    WHERE   page = "推荐"
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,suffix
+)
+-- Per experiment group, average the metrics across suffixes
+,t_metrics AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,ROUND(AVG(exp_per_dau),2) AS exp_per_dau
+            ,ROUND(AVG(str_one),6) AS str_one
+            ,ROUND(AVG(ros_one),6) AS ros_one
+            ,ROUND(AVG(str),6) AS str
+            ,ROUND(AVG(ros),6) AS ros
+            ,ROUND(AVG(str_plus),6) AS str_plus
+            ,ROUND(AVG(ros_minus),6) AS ros_minus
+            ,ROUND(AVG(rovn),6) AS rovn
+            ,ROUND(AVG(vovh24),6) AS vovh24
+            ,AVG(dau) AS dau
+            ,AVG(exp) AS exp
+            ,AVG(is_share) AS is_share
+            ,AVG(share_cnt) AS share_cnt
+            ,AVG(is_return_1) AS is_return_1
+            ,AVG(return_n_uv) AS return_n_uv
+            ,AVG(viewh24) AS viewh24
+            ,AVG(return_n_uv_noself) AS return_n_uv_noself
+            ,WM_CONCAT(DISTINCT ',',suffix) AS suffix
+    FROM    t_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+SELECT  a.dt
+        ,a.apptype
+        ,a.abcode
+        ,a.suffix
+        ,a.exp_per_dau
+        ,a.str_one
+        ,a.ros_one
+        ,a.str
+        ,a.ros
+        ,a.str_plus
+        ,a.ros_minus
+        ,a.rovn
+        ,a.vovh24
+        ,a.dau
+        ,a.exp
+        ,a.is_share
+        ,a.share_cnt
+        ,a.is_return_1
+        ,a.return_n_uv
+        ,a.viewh24
+        ,a.return_n_uv_noself
+        ,b.dau2
+FROM    t_metrics a
+LEFT JOIN t_dau2 b
+ON      a.dt = b.dt
+AND     a.apptype = b.apptype
+AND     a.abcode = b.abcode
+ORDER BY a.dt DESC,a.apptype,a.abcode
+;
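
Reviewer note: the query above assigns each row to a group from the last character of the root session ID (`SUBSTR(x, LENGTH(x), 1)`), joins that suffix to `t_abmap`, and falls back to `"other"` through `COALESCE`. A minimal Python sketch of the same lookup is below. The mapping is copied from `t_abmap` in this file; the function name `group_for` and the sample session IDs are hypothetical.

```python
from typing import Optional

# Suffix -> group label, copied from t_abmap in this file.
AB_MAP = {
    "0": "实验组:ros损失函数优化",
    "5": "实验组:ros损失函数优化",
    "f": "实验组:ros损失函数优化",
    "4": "实验组:c1_rovn & 去掉vor实验",
    "6": "实验组:c1_rovn & 去掉vor实验",
    "7": "实验组:c1_rovn & 去掉vor实验",
    "8": "实验组:c1_rovn",
    "9": "实验组:c1_rovn",
    "e": "实验组:c1_rovn",
    "a": "对照组",
    "b": "对照组",
    "c": "对照组",
}


def group_for(root_session_id: Optional[str]) -> str:
    """Map a root session ID to its group by its last character.

    Missing or empty IDs, and suffixes absent from AB_MAP, map to "other",
    mirroring the LEFT JOIN plus COALESCE in the query.
    """
    if not root_session_id:
        return "other"
    return AB_MAP.get(root_session_id[-1], "other")


# Hypothetical session IDs, for illustration only.
print(group_for("session-00a"))
print(group_for("session-001"))
```

The match is case-sensitive, as in the SQL join: an uppercase `"F"` suffix would fall into `"other"`.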

+ 1 - 0
tasks/00_尾号实验/base_v3.sql

@@ -114,6 +114,7 @@ WITH t_abmap AS
             ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
     FROM    t_base
     WHERE   page = "推荐"
+    AND     abcode != "other"
     GROUP BY dt
              ,apptype
              ,abcode

+ 173 - 0
tasks/00_尾号实验/base_v4 copy.sql

@@ -0,0 +1,173 @@
+WITH t_abmap AS
+(
+    SELECT "3" AS suffix, "实验组:ros损失函数优化" AS abcode
+    UNION ALL SELECT "4", "实验组:c1_rovn & 去掉vor实验"
+    UNION ALL SELECT "5", "实验组:c1_rovn"
+    UNION ALL SELECT "a", "对照组"
+    UNION ALL SELECT "b", "对照组"
+    UNION ALL SELECT "c", "对照组"
+)
+,t_base AS
+(
+    SELECT  sub.*
+            ,COALESCE(m.abcode,"other") AS abcode
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,SUBSTR(GET_JSON_OBJECT(extend,'$.rootsessionid'),LENGTH(GET_JSON_OBJECT(extend,'$.rootsessionid')),1) AS suffix
+                        ,CASE   WHEN page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页") THEN "推荐"
+                                WHEN page IN ("回流页","其他") THEN "非推荐"
+                                ELSE "其他"
+                        END AS page
+                        ,mid
+                        ,vid
+                        ,is_share
+                        ,share_cnt
+                        ,is_return_1
+                        ,is_return_n
+                        ,is_return_noself
+                        ,return_1_uv
+                        ,return_n_uv
+                        ,return_n_uv_noself
+                        ,new_exposure_cnt
+                        ,flowpool
+                        -- ,abcode as abcode_origin
+                FROM    loghubods.dwd_recsys_alg_exposure_base_20250108
+                WHERE   dt="${dt}"
+                AND     apptype IN ("0")
+                AND     page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页","回流页","其他")
+                AND     abcode IN ("ab0","ab1","ab2","ab3","ab4","ab5","ab6","ab7","ab8","ab9")
+                AND     abcode NOT IN ("ab100")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "0"
+    AND     sub.suffix = m.suffix
+)
+-- dau2: aggregate per single suffix (last character of rootSessionId)
+,t_dau2_bucket AS
+(
+    SELECT  SUBSTR(sub.dt,1,8) AS dt
+            ,sub.apptype
+            ,COALESCE(m.abcode,"other") AS abcode
+            ,sub.suffix
+            ,COUNT(DISTINCT sub.machinecode) AS dau2
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,machinecode
+                        ,SUBSTR(GET_JSON_OBJECT(extparams,'$.rootSessionId'),LENGTH(GET_JSON_OBJECT(extparams,'$.rootSessionId')),1) AS suffix
+                FROM    loghubods.useractive_log
+                WHERE   dt="${dt}"
+                -- FROM    loghubods.useractive_log_per5min
+                -- WHERE   dt BETWEEN CONCAT("${dt}","000000") AND CONCAT("${dt}","235500")
+                AND     apptype IN ("0")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "0"
+    AND     sub.suffix = m.suffix
+    GROUP BY SUBSTR(sub.dt,1,8)
+             ,sub.apptype
+             ,COALESCE(m.abcode,"other")
+             ,sub.suffix
+)
+-- dau2: per experiment group, average across suffixes
+,t_dau2 AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,AVG(dau2) AS dau2
+    FROM    t_dau2_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+-- Aggregate per single suffix (UV deduplicated within the suffix)
+,t_bucket AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,suffix
+            ,COALESCE(COUNT(1) / COUNT(DISTINCT mid),0) AS exp_per_dau
+            ,COALESCE(SUM(is_share) / COUNT(1),0) AS str_one
+            ,COALESCE(SUM(return_n_uv) / SUM(is_share),0) AS ros_one
+            ,COALESCE(SUM(share_cnt) / COUNT(1),0) AS str
+            ,COALESCE(SUM(return_n_uv) / SUM(share_cnt),0) AS ros
+            ,COALESCE(SUM(is_return_1) / COUNT(1),0) AS str_plus
+            ,COALESCE(SUM(return_n_uv) / SUM(is_return_1),0) AS ros_minus
+            ,COALESCE(SUM(return_n_uv) / COUNT(1),0) AS rovn
+            ,COALESCE(SUM(new_exposure_cnt) / COUNT(1),0) AS vovh24
+            ,COUNT(DISTINCT mid) AS dau
+            ,COUNT(1) AS exp
+            ,COALESCE(SUM(is_share),0) AS is_share
+            ,COALESCE(SUM(share_cnt),0) AS share_cnt
+            ,COALESCE(SUM(is_return_1),0) AS is_return_1
+            ,COALESCE(SUM(return_n_uv),0) AS return_n_uv
+            ,COALESCE(SUM(new_exposure_cnt),0) AS viewh24
+            ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
+    FROM    t_base
+    WHERE   page = "推荐"
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,suffix
+)
+-- Per experiment group, average the metrics across suffixes
+,t_metrics AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,ROUND(AVG(exp_per_dau),2) AS exp_per_dau
+            ,ROUND(AVG(str_one),6) AS str_one
+            ,ROUND(AVG(ros_one),6) AS ros_one
+            ,ROUND(AVG(str),6) AS str
+            ,ROUND(AVG(ros),6) AS ros
+            ,ROUND(AVG(str_plus),6) AS str_plus
+            ,ROUND(AVG(ros_minus),6) AS ros_minus
+            ,ROUND(AVG(rovn),6) AS rovn
+            ,ROUND(AVG(vovh24),6) AS vovh24
+            ,AVG(dau) AS dau
+            ,AVG(exp) AS exp
+            ,AVG(is_share) AS is_share
+            ,AVG(share_cnt) AS share_cnt
+            ,AVG(is_return_1) AS is_return_1
+            ,AVG(return_n_uv) AS return_n_uv
+            ,AVG(viewh24) AS viewh24
+            ,AVG(return_n_uv_noself) AS return_n_uv_noself
+            ,WM_CONCAT(DISTINCT ',',suffix) AS suffix
+    FROM    t_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+SELECT  a.dt
+        ,a.apptype
+        ,a.abcode
+        ,a.suffix
+        ,a.exp_per_dau
+        ,a.str_one
+        ,a.ros_one
+        ,a.str
+        ,a.ros
+        ,a.str_plus
+        ,a.ros_minus
+        ,a.rovn
+        ,a.vovh24
+        ,a.dau
+        ,a.exp
+        ,a.is_share
+        ,a.share_cnt
+        ,a.is_return_1
+        ,a.return_n_uv
+        ,a.viewh24
+        ,a.return_n_uv_noself
+        ,b.dau2
+FROM    t_metrics a
+LEFT JOIN t_dau2 b
+ON      a.dt = b.dt
+AND     a.apptype = b.apptype
+AND     a.abcode = b.abcode
+ORDER BY a.dt DESC,a.apptype,a.abcode
+;

+ 7 - 0
tasks/00_尾号实验/base_v4_v1.json

@@ -0,0 +1,7 @@
+{
+  "token": "ONZqsxB9BhGH8tt90EScSJT5nHh",
+  "sheet_id": "I1byJV",
+  "sort": "dt:desc",
+  "cols": null,
+  "filter": "abcode!=other,abcode!=6,abcode!=e,abcode!=f"
+}

+ 180 - 0
tasks/00_尾号实验/base_v4_v1.sql

@@ -0,0 +1,180 @@
+WITH t_abmap AS
+(
+    SELECT "3" AS suffix, "实验组:ros损失函数优化" AS abcode
+    UNION ALL SELECT "4", "实验组:c1_rovn & 去掉vor实验"
+    UNION ALL SELECT "5", "实验组:c1_rovn"
+    UNION ALL SELECT "d", "实验组:c1_rovn"
+    UNION ALL SELECT "e", "实验组:c1_rovn"
+    UNION ALL SELECT "f", "实验组:c1_rovn"
+    UNION ALL SELECT "6", "实验组:dn_rovn"
+    UNION ALL SELECT "7", "实验组:cn_rovn"
+    UNION ALL SELECT "a", "对照组"
+    UNION ALL SELECT "b", "对照组"
+    UNION ALL SELECT "c", "对照组"
+)
+,t_base AS
+(
+    SELECT  sub.*
+            ,COALESCE(m.abcode,"other") AS abcode
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,SUBSTR(GET_JSON_OBJECT(extend,'$.rootsessionid'),LENGTH(GET_JSON_OBJECT(extend,'$.rootsessionid')),1) AS suffix
+                        ,CASE   WHEN page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页") THEN "推荐"
+                                WHEN page IN ("回流页","其他") THEN "非推荐"
+                                ELSE "其他"
+                        END AS page
+                        ,mid
+                        ,vid
+                        ,is_share
+                        ,share_cnt
+                        ,is_return_1
+                        ,is_return_n
+                        ,is_return_noself
+                        ,return_1_uv
+                        ,return_n_uv
+                        ,return_n_uv_noself
+                        ,new_exposure_cnt
+                        ,flowpool
+                        -- ,abcode as abcode_origin
+                FROM    loghubods.dwd_recsys_alg_exposure_base_20250108
+                WHERE   dt="${dt}"
+                AND     apptype IN ("0")
+                AND     page IN ("回流后沉浸页&内页feed","详情后沉浸页","首页feed","详情页","回流页","其他")
+                AND     abcode IN ("ab0","ab1","ab2","ab3","ab4","ab8","ab9")
+                AND     abcode NOT IN ("ab100")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "0"
+    AND     sub.suffix = m.suffix
+)
+-- dau2: aggregate per suffix bucket (last character of rootSessionId)
+,t_dau2_bucket AS
+(
+    SELECT  SUBSTR(sub.dt,1,8) AS dt
+            ,sub.apptype
+            ,COALESCE(m.abcode,"other") AS abcode
+            ,sub.suffix
+            ,COUNT(DISTINCT sub.machinecode) AS dau2
+    FROM    (
+                SELECT  dt
+                        ,apptype
+                        ,machinecode
+                        ,SUBSTR(GET_JSON_OBJECT(extparams,'$.rootSessionId'),LENGTH(GET_JSON_OBJECT(extparams,'$.rootSessionId')),1) AS suffix
+                FROM    loghubods.useractive_log
+                WHERE   dt="${dt}"
+                -- FROM    loghubods.useractive_log_per5min
+                -- WHERE   dt BETWEEN CONCAT("${dt}","000000") AND CONCAT("${dt}","235500")
+                AND     apptype IN ("0")
+                AND     GET_JSON_OBJECT(extparams,'$.eventInfos.ab_test003') IN ("ab0","ab1","ab2","ab3","ab4","ab8","ab9")
+                AND     GET_JSON_OBJECT(extparams,'$.eventInfos.ab_test003') NOT IN ("ab100")
+            ) sub
+    LEFT JOIN t_abmap m
+    ON      sub.apptype = "0"
+    AND     sub.suffix = m.suffix
+    GROUP BY SUBSTR(sub.dt,1,8)
+             ,sub.apptype
+             ,COALESCE(m.abcode,"other")
+             ,sub.suffix
+)
+-- dau2: average across suffix buckets within each experiment group
+,t_dau2 AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,AVG(dau2) AS dau2
+    FROM    t_dau2_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+-- Aggregate per suffix bucket (UV deduplicated within each suffix)
+,t_bucket AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,suffix
+            ,COALESCE(COUNT(1) / COUNT(DISTINCT mid),0) AS exp_per_dau
+            ,COALESCE(SUM(is_share) / COUNT(1),0) AS str_one
+            ,COALESCE(SUM(return_n_uv) / SUM(is_share),0) AS ros_one
+            ,COALESCE(SUM(share_cnt) / COUNT(1),0) AS str
+            ,COALESCE(SUM(return_n_uv) / SUM(share_cnt),0) AS ros
+            ,COALESCE(SUM(is_return_1) / COUNT(1),0) AS str_plus
+            ,COALESCE(SUM(return_n_uv) / SUM(is_return_1),0) AS ros_minus
+            ,COALESCE(SUM(return_n_uv) / COUNT(1),0) AS rovn
+            ,COALESCE(SUM(new_exposure_cnt) / COUNT(1),0) AS vovh24
+            ,COUNT(DISTINCT mid) AS dau
+            ,COUNT(1) AS exp
+            ,COALESCE(SUM(is_share),0) AS is_share
+            ,COALESCE(SUM(share_cnt),0) AS share_cnt
+            ,COALESCE(SUM(is_return_1),0) AS is_return_1
+            ,COALESCE(SUM(return_n_uv),0) AS return_n_uv
+            ,COALESCE(SUM(new_exposure_cnt),0) AS viewh24
+            ,COALESCE(SUM(return_n_uv_noself),0) AS return_n_uv_noself
+    FROM    t_base
+    WHERE   page = "推荐"
+    GROUP BY dt
+             ,apptype
+             ,abcode
+             ,suffix
+)
+-- Average across suffix buckets within each experiment group
+,t_metrics AS
+(
+    SELECT  dt
+            ,apptype
+            ,abcode
+            ,ROUND(AVG(exp_per_dau),2) AS exp_per_dau
+            ,ROUND(AVG(str_one),6) AS str_one
+            ,ROUND(AVG(ros_one),6) AS ros_one
+            ,ROUND(AVG(str),6) AS str
+            ,ROUND(AVG(ros),6) AS ros
+            ,ROUND(AVG(str_plus),6) AS str_plus
+            ,ROUND(AVG(ros_minus),6) AS ros_minus
+            ,ROUND(AVG(rovn),6) AS rovn
+            ,ROUND(AVG(vovh24),6) AS vovh24
+            ,AVG(dau) AS dau
+            ,AVG(exp) AS exp
+            ,AVG(is_share) AS is_share
+            ,AVG(share_cnt) AS share_cnt
+            ,AVG(is_return_1) AS is_return_1
+            ,AVG(return_n_uv) AS return_n_uv
+            ,AVG(viewh24) AS viewh24
+            ,AVG(return_n_uv_noself) AS return_n_uv_noself
+            ,WM_CONCAT(DISTINCT ',',suffix) AS suffix
+    FROM    t_bucket
+    GROUP BY dt
+             ,apptype
+             ,abcode
+)
+SELECT  a.dt
+        ,a.apptype
+        ,a.abcode
+        ,a.suffix
+        ,a.exp_per_dau
+        ,a.str_one
+        ,a.ros_one
+        ,a.str
+        ,a.ros
+        ,a.str_plus
+        ,a.ros_minus
+        ,a.rovn
+        ,a.vovh24
+        ,a.dau
+        ,a.exp
+        ,a.is_share
+        ,a.share_cnt
+        ,a.is_return_1
+        ,a.return_n_uv
+        ,a.viewh24
+        ,a.return_n_uv_noself
+        ,b.dau2
+FROM    t_metrics a
+LEFT JOIN t_dau2 b
+ON      a.dt = b.dt
+AND     a.apptype = b.apptype
+AND     a.abcode = b.abcode
+ORDER BY a.dt DESC,a.apptype,a.abcode
+;
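
The two-step aggregation in `base_v4_v1.sql` (`t_bucket` per suffix, then `t_metrics` averaging over suffixes) can be sketched in Python as below. This is a minimal illustration, not the production logic: the rows, the `control` label, and the helper names `bucket_metrics` and `group_metrics` are hypothetical, and only two of the metrics are reproduced. It shows that the group value is an unweighted mean of per-suffix ratios, which can differ from the ratio computed over pooled rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exposure rows: (suffix, abcode, mid, is_share). Not real data.
ROWS = [
    ("a", "control", "u1", 1),
    ("a", "control", "u1", 0),
    ("a", "control", "u2", 0),
    ("a", "control", "u2", 0),
    ("b", "control", "u3", 1),
    ("b", "control", "u3", 1),
]


def bucket_metrics(rows):
    """Step 1 (mirrors t_bucket): one metrics dict per (abcode, suffix)."""
    grouped = defaultdict(list)
    for suffix, abcode, mid, is_share in rows:
        grouped[(abcode, suffix)].append((mid, is_share))
    result = {}
    for key, items in grouped.items():
        exp = len(items)                      # COUNT(1)
        dau = len({mid for mid, _ in items})  # COUNT(DISTINCT mid)
        shares = sum(flag for _, flag in items)
        result[key] = {"exp_per_dau": exp / dau, "str_one": shares / exp}
    return result


def group_metrics(buckets):
    """Step 2 (mirrors t_metrics): unweighted mean over suffixes per abcode."""
    per_group = defaultdict(list)
    for (abcode, _suffix), metrics in buckets.items():
        per_group[abcode].append(metrics)
    return {
        abcode: {name: mean(m[name] for m in items) for name in items[0]}
        for abcode, items in per_group.items()
    }


metrics = group_metrics(bucket_metrics(ROWS))
pooled_str_one = sum(r[3] for r in ROWS) / len(ROWS)
print(metrics["control"]["str_one"], pooled_str_one)
```

With these made-up rows, suffix `a` has `str_one = 0.25` and suffix `b` has `str_one = 1.0`, so the suffix-averaged value is 0.625 while the pooled value is 0.5. Each suffix contributes equally regardless of its traffic.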

+ 38 - 0
tasks/00_表的开发/loghubods.user_relation/02_验证from_mid.sql

@@ -0,0 +1,38 @@
+-- Compare the accuracy of the old and new from_mid extraction rules
+SELECT  shareid_type
+       ,total
+       ,old_match
+       ,new_match
+       ,old_mismatch
+       ,new_mismatch
+       ,ROUND(old_match * 100.0 / total, 4) AS old_rate
+       ,ROUND(new_match * 100.0 / total, 4) AS new_rate
+FROM (
+    SELECT  CASE WHEN c.shareid LIKE 'weixin_openid%' THEN 'weixin_openid'
+                 WHEN c.shareid LIKE 'frontend%'      THEN 'frontend'
+                 WHEN c.shareid LIKE 'app_share_id%'  THEN 'app_share_id'
+                 ELSE 'other'
+            END AS shareid_type
+           ,COUNT(1) AS total
+           ,SUM(CASE WHEN REGEXP_REPLACE(c.shareid, '-[^-]+-[^-]+$', '') = s.mid THEN 1 ELSE 0 END) AS old_match
+           ,SUM(CASE WHEN REGEXP_REPLACE(c.shareid, '(-[0-9a-f]{4})?-[0-9]{13}[0-9]*$', '') = s.mid THEN 1 ELSE 0 END) AS new_match
+           ,SUM(CASE WHEN REGEXP_REPLACE(c.shareid, '-[^-]+-[^-]+$', '') <> s.mid THEN 1 ELSE 0 END) AS old_mismatch
+           ,SUM(CASE WHEN REGEXP_REPLACE(c.shareid, '(-[0-9a-f]{4})?-[0-9]{13}[0-9]*$', '') <> s.mid THEN 1 ELSE 0 END) AS new_mismatch
+    FROM    loghubods.user_share_log c
+    JOIN (
+        SELECT  shareid, MAX(machinecode) AS mid
+        FROM    loghubods.user_share_log
+        WHERE   dt = '${dt}' AND topic = 'share'
+        GROUP BY shareid
+    ) s ON c.shareid = s.shareid
+    WHERE   c.dt = '${dt}'
+    AND     c.topic = 'click'
+    GROUP BY
+            CASE WHEN c.shareid LIKE 'weixin_openid%' THEN 'weixin_openid'
+                 WHEN c.shareid LIKE 'frontend%'      THEN 'frontend'
+                 WHEN c.shareid LIKE 'app_share_id%'  THEN 'app_share_id'
+                 ELSE 'other'
+            END
+) t
+ORDER BY total DESC
+;
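
The difference between the two rules compared in `02_验证from_mid.sql` can be sketched in Python as follows. The regular expressions are copied from the SQL. The helper names and the second shareid (`abc-def-1772196892078`, a hyphenated mid with no code segment) are hypothetical, and Python's `re` module is assumed to treat these particular patterns the same way as MaxCompute's `REGEXP_REPLACE`.

```python
import re

# Old rule: always strip the last two hyphen-separated segments.
OLD_RULE = re.compile(r"-[^-]+-[^-]+$")
# New rule: strip an optional 4-hex-digit code, then a timestamp of 13+ digits.
NEW_RULE = re.compile(r"(-[0-9a-f]{4})?-[0-9]{13}[0-9]*$")


def from_mid_old(shareid: str) -> str:
    """Return shareid with the old suffix pattern removed."""
    return OLD_RULE.sub("", shareid)


def from_mid_new(shareid: str) -> str:
    """Return shareid with the new suffix pattern removed."""
    return NEW_RULE.sub("", shareid)


# Example taken from the SQL comments in this commit.
with_code = "weixin_openid_o0w175QzI9bWlWjFv5dVzFWz9vO0-e200-1772196892078"
# Hypothetical shareid: the mid contains "-" and there is no code segment.
without_code = "abc-def-1772196892078"

for shareid in (with_code, without_code):
    print(shareid, from_mid_old(shareid), from_mid_new(shareid))
```

Both rules agree on the first shareid. On the hypothetical second one, the old rule removes part of the mid (`abc`) while the new rule keeps it whole (`abc-def`), which is the kind of case the `old_mismatch` and `new_mismatch` columns would surface if such shareids exist in the log.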

+ 29 - 0
tasks/00_表的洞察/loghubods.user_share_log/00_洞察/05_click_top_from_mid_v2.sql

@@ -0,0 +1,29 @@
+-- Top 1000 click sharers (from_mid), ordered by the number of distinct target_mid users brought back, descending
+-- from_mid = shareid with the last two segments (-code-timestamp) removed, because mid itself may contain "-"
+-- Example: weixin_openid_o0w175QzI9bWlWjFv5dVzFWz9vO0-e200-1772196892078 → weixin_openid_o0w175QzI9bWlWjFv5dVzFWz9vO0
+-- Usage: python fetch_daily.py .../05_click_top_from_mid_v2.sql --date 20260210
+
+SELECT dt, apptype, from_mid,
+       target_uv,
+       ROUND(target_uv * 100.0 / SUM(target_uv) OVER(PARTITION BY dt, apptype), 2) as target_uv_pct,
+       cnt,
+       vid,
+       max_depth, avg_depth
+FROM (
+    SELECT dt,
+           apptype,
+           REGEXP_REPLACE(shareid, '-[^-]+-[^-]+$', '') as from_mid,
+           COUNT(DISTINCT machinecode) as target_uv,
+           COUNT(1) as cnt,
+           COUNT(DISTINCT clickobjectid) as vid,
+           MAX(CAST(usersharedepth AS BIGINT)) as max_depth,
+           ROUND(AVG(CAST(usersharedepth AS BIGINT)), 2) as avg_depth,
+           ROW_NUMBER() OVER(PARTITION BY dt, apptype ORDER BY COUNT(DISTINCT machinecode) DESC) as rn
+    FROM loghubods.user_share_log
+    WHERE dt = '${dt}' AND topic = 'click'
+      AND REGEXP_REPLACE(shareid, '-[^-]+-[^-]+$', '') <> machinecode  -- exclude self-clicks
+      AND (rootsourceid = '' OR rootsourceid IS NULL)  -- exclude clicks brought back via forwards
+    GROUP BY dt, apptype, REGEXP_REPLACE(shareid, '-[^-]+-[^-]+$', '')
+) t
+WHERE rn <= 1000
+ORDER BY apptype, target_uv DESC

+ 35 - 0
tasks/00_表的洞察/loghubods.user_share_log/00_洞察/05_verify_shareid_timestamp.sql

@@ -0,0 +1,35 @@
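+-- Verify whether the trailing segment of shareid is the share timestamp
+-- shareid format: {mid}-{code}-{timestamp_ms}
+-- Extract the last segment as a millisecond timestamp, convert it to a date, and compare it with the dt of the share record
+-- Usage: python run_sql.py .../05_verify_shareid_timestamp.sql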
+
+WITH clicks AS (
+    SELECT shareid
+    FROM loghubods.user_share_log
+    WHERE dt = '${start}'
+      AND topic = 'click'
+      AND shareid IS NOT NULL AND shareid <> ''
+    GROUP BY shareid
+),
+shares AS (
+    -- Look back 90 days for share records
+    SELECT shareid, MIN(dt) AS share_dt
+    FROM loghubods.user_share_log
+    WHERE dt >= TO_CHAR(DATEADD(TO_DATE('${start}', 'yyyyMMdd'), -90, 'dd'), 'yyyyMMdd')
+      AND dt <= '${start}'
+      AND topic = 'share'
+      AND shareid IS NOT NULL AND shareid <> ''
+    GROUP BY shareid
+)
+SELECT
+    s.share_dt,
+    TO_CHAR(FROM_UNIXTIME(CAST(REGEXP_EXTRACT(c.shareid, '-([^-]+)$', 1) AS BIGINT) / 1000), 'yyyyMMdd') AS ts_date,
+    CASE WHEN s.share_dt = TO_CHAR(FROM_UNIXTIME(CAST(REGEXP_EXTRACT(c.shareid, '-([^-]+)$', 1) AS BIGINT) / 1000), 'yyyyMMdd')
+         THEN 'match' ELSE 'mismatch' END AS result,
+    COUNT(1) AS cnt
+FROM clicks c
+JOIN shares s ON c.shareid = s.shareid
+GROUP BY
+    s.share_dt,
+    TO_CHAR(FROM_UNIXTIME(CAST(REGEXP_EXTRACT(c.shareid, '-([^-]+)$', 1) AS BIGINT) / 1000), 'yyyyMMdd')
+ORDER BY cnt DESC
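
The timestamp check in `05_verify_shareid_timestamp.sql` can be reproduced locally with a sketch like the one below. The function name `shareid_ts_date` is hypothetical, and the conversion assumes UTC, whereas `FROM_UNIXTIME` in the SQL follows the time zone configured for the MaxCompute session, so dates near midnight may differ.

```python
import re
from datetime import datetime, timezone

# Same extraction as REGEXP_EXTRACT(shareid, '-([^-]+)$', 1) in the SQL.
LAST_SEGMENT = re.compile(r"-([^-]+)$")


def shareid_ts_date(shareid, tz=timezone.utc):
    """Return the yyyyMMdd date encoded in the last shareid segment, or None.

    Assumption: the last segment is a millisecond Unix timestamp and the
    date is rendered in `tz` (UTC by default, which may not match the
    time zone used by the SQL engine).
    """
    match = LAST_SEGMENT.search(shareid)
    if match is None:
        return None
    try:
        millis = int(match.group(1))
    except ValueError:
        # Last segment is not numeric, so it cannot be a timestamp.
        return None
    return datetime.fromtimestamp(millis / 1000, tz=tz).strftime("%Y%m%d")


# Example shareid taken from the SQL comments in this commit.
print(shareid_ts_date("weixin_openid_o0w175QzI9bWlWjFv5dVzFWz9vO0-e200-1772196892078"))
```

Comparing this value with the `dt` partition of the matching `share` record is what the `match` / `mismatch` column in the query reports.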

+ 55 - 0
tasks/00_表的洞察/loghubods.user_share_log/00_洞察/10_click_top_from_mid_排除外部首层_v2.sql

@@ -0,0 +1,55 @@
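+-- Top 1000 click sharers (from_mid) after excluding the external first layer, ordered by the number of distinct target_mid users brought back, descending
+-- from_mid = shareid with the last two segments (-code-timestamp) removed, because mid itself may contain "-"
+-- Kept: internal + external non-first-layer
+-- Usage: python fetch_daily.py .../10_click_top_from_mid_排除外部首层_v2.sql --date 20260210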
+
+WITH t9 AS (
+    SELECT  t1.root_source_id AS rootsourceid
+           ,t1.channel
+    FROM    loghubods.ad_put_flow_record_tencent_day t1
+    LEFT JOIN loghubods.content_platform_account t2
+    ON      t1.channel = t2.channel
+    WHERE   t1.dt = MAX_PT('loghubods.ad_put_flow_record_tencent_day')
+    AND     t1.put_type_one = '企微'
+    AND     t1.root_source_id REGEXP 'touliu_tencentwbqw_|dyyqw_|dyycd_'
+    GROUP BY t1.root_source_id, t1.channel
+)
+
+SELECT dt, apptype, from_mid,
+       target_uv,
+       ROUND(target_uv * 100.0 / SUM(target_uv) OVER(PARTITION BY dt, apptype), 2) as target_uv_pct,
+       cnt,
+       vid,
+       max_depth, avg_depth
+FROM (
+    SELECT s.dt,
+           s.apptype,
+           REGEXP_REPLACE(s.shareid, '-[^-]+-[^-]+$', '') as from_mid,
+           COUNT(DISTINCT s.machinecode) as target_uv,
+           COUNT(1) as cnt,
+           COUNT(DISTINCT s.clickobjectid) as vid,
+           MAX(CAST(s.usersharedepth AS BIGINT)) as max_depth,
+           ROUND(AVG(CAST(s.usersharedepth AS BIGINT)), 2) as avg_depth,
+           ROW_NUMBER() OVER(PARTITION BY s.dt, s.apptype ORDER BY COUNT(DISTINCT s.machinecode) DESC) as rn
+    FROM loghubods.user_share_log s
+    LEFT JOIN t9
+    ON   s.rootsourceid = t9.rootsourceid
+    WHERE s.dt = '${dt}' AND s.topic = 'click'
+      AND REGEXP_REPLACE(s.shareid, '-[^-]+-[^-]+$', '') <> s.machinecode  -- exclude self-clicks
+      -- exclude the external first layer
+      AND NOT (
+          (s.rootsourceid REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND t9.channel REGEXP 'xycsd|csaq|shy|jxjx|gzcr|xyjj|jxatm|xjcy|yqyx|hbwq|jxxm|gzmy|cdjh|gzjr|gzxts|twhc|qdjdz|sjzyd|gzyhc|djh|gzlx|yywl|szjn|gzdd1|cqqd|cqslh|hzjy|hzjh|sclh|xyhc|snss'
+           AND CAST(s.usersharedepth AS BIGINT) <= 1)
+          OR
+          (s.rootsourceid REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND (t9.channel IS NULL OR t9.channel NOT REGEXP 'xycsd|csaq|shy|jxjx|gzcr|xyjj|jxatm|xjcy|yqyx|hbwq|jxxm|gzmy|cdjh|gzjr|gzxts|twhc|qdjdz|sjzyd|gzyhc|djh|gzlx|yywl|szjn|gzdd1|cqqd|cqslh|hzjy|hzjh|sclh|xyhc|snss')
+           AND CAST(s.usersharedepth AS BIGINT) = 0)
+          OR
+          (s.rootsourceid NOT REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND CAST(s.usersharedepth AS BIGINT) = 0)
+      )
+    GROUP BY s.dt, s.apptype, REGEXP_REPLACE(s.shareid, '-[^-]+-[^-]+$', '')
+) t
+WHERE rn <= 1000
+ORDER BY apptype, target_uv DESC

+ 62 - 0
tasks/00_表的洞察/loghubods.user_share_log/00_洞察/11_click_low_from_mid_sample.sql

@@ -0,0 +1,62 @@
+-- from_mid values ranked 1001~1020 (the top 20 beyond the top 1000), with 10 detail rows taken for each
+-- Based on the exclude-external-first-layer logic
+-- Usage: python fetch_daily.py .../11_click_low_from_mid_sample.sql --date 20260210
+
+WITH t9 AS (
+    SELECT  t1.root_source_id AS rootsourceid
+           ,t1.channel
+    FROM    loghubods.ad_put_flow_record_tencent_day t1
+    LEFT JOIN loghubods.content_platform_account t2
+    ON      t1.channel = t2.channel
+    WHERE   t1.dt = MAX_PT('loghubods.ad_put_flow_record_tencent_day')
+    AND     t1.put_type_one = '企微'
+    AND     t1.root_source_id REGEXP 'touliu_tencentwbqw_|dyyqw_|dyycd_'
+    GROUP BY t1.root_source_id, t1.channel
+),
+-- Filtered click details (exclude self-clicks + exclude the external first layer)
+filtered AS (
+    SELECT s.*,
+           REGEXP_REPLACE(s.shareid, '-[^-]+-[^-]+$', '') as from_mid
+    FROM loghubods.user_share_log s
+    LEFT JOIN t9
+    ON   s.rootsourceid = t9.rootsourceid
+    WHERE s.dt = '${dt}' AND s.topic = 'click'
+      AND REGEXP_REPLACE(s.shareid, '-[^-]+-[^-]+$', '') <> s.machinecode
+      AND NOT (
+          (s.rootsourceid REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND t9.channel REGEXP 'xycsd|csaq|shy|jxjx|gzcr|xyjj|jxatm|xjcy|yqyx|hbwq|jxxm|gzmy|cdjh|gzjr|gzxts|twhc|qdjdz|sjzyd|gzyhc|djh|gzlx|yywl|szjn|gzdd1|cqqd|cqslh|hzjy|hzjh|sclh|xyhc|snss'
+           AND CAST(s.usersharedepth AS BIGINT) <= 1)
+          OR
+          (s.rootsourceid REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND (t9.channel IS NULL OR t9.channel NOT REGEXP 'xycsd|csaq|shy|jxjx|gzcr|xyjj|jxatm|xjcy|yqyx|hbwq|jxxm|gzmy|cdjh|gzjr|gzxts|twhc|qdjdz|sjzyd|gzyhc|djh|gzlx|yywl|szjn|gzdd1|cqqd|cqslh|hzjy|hzjh|sclh|xyhc|snss')
+           AND CAST(s.usersharedepth AS BIGINT) = 0)
+          OR
+          (s.rootsourceid NOT REGEXP 'touliu_tencentwbqw_|dyyqw_'
+           AND CAST(s.usersharedepth AS BIGINT) = 0)
+      )
+),
+-- Rank by from_mid
+ranked_mid AS (
+    SELECT from_mid,
+           COUNT(DISTINCT machinecode) as target_uv,
+           ROW_NUMBER() OVER(ORDER BY COUNT(DISTINCT machinecode) DESC) as rn
+    FROM filtered
+    GROUP BY from_mid
+),
+-- Take ranks 1001~1020
+low_mids AS (
+    SELECT from_mid, target_uv, rn
+    FROM ranked_mid
+    WHERE rn > 1000 AND rn <= 1020
+)
+-- Take 10 detail rows per from_mid
+SELECT lm.rn as mid_rank, lm.target_uv, f.*
+FROM (
+    SELECT f.*,
+           ROW_NUMBER() OVER(PARTITION BY f.from_mid ORDER BY CAST(f.usersharedepth AS BIGINT) DESC) as row_rn
+    FROM filtered f
+    JOIN low_mids lm ON f.from_mid = lm.from_mid
+) f
+JOIN low_mids lm ON f.from_mid = lm.from_mid
+WHERE f.row_rn <= 10
+ORDER BY lm.rn, f.row_rn