首页 > 解决方案 > 如何根据访问频率将回访用户分类到不同的时间段

问题描述

所以我有一个 visit_log 表,它基本上跟踪应用程序的多个 user_visit。它具有 user_id 和 timestamp 作为列。我已经能够聚合 user_visit 数据,例如,现在我知道有多少周、双周、月、季度和年用户使用以下逻辑访问了应用程序:

SELECT DISTINCT ,user_id
                ,count( distinct date_part('year', timestamp) * 100 + date_part('week', timestamp)) as week
                ,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('week', timestamp)::decimal/2)) as biweek
                ,count( distinct date_part('year', timestamp) * 100 + date_part('month', timestamp)) as month
                ,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('month', timestamp)::decimal/3)) as quarter
                ,count( distinct date_part('year', timestamp)) as year
     FROM visit_logs

现在我想更进一步,将用户定义为每周、每两周、每月、每季度、每年和很少返回的用户,以便他们在所有存储桶中遵循 80% 的阈值,以澄清用户是否返回应用程序在所选时间范围内一周内超过 80% 的时间,那么他是每周用户等等。

下面是我尝试更进一步并复制上述 Python 逻辑的 SQL 代码:

, temp AS 
(
      SELECT 
              user_id
              ,SUM(week) * 0.8 as total_weeks
              ,SUM(biweek) * 0.8 as total_biweek
              ,SUM(month) * 0.8 as total_months
              ,SUM(quarter) * 0.8 as total_quarters
              ,SUM(year) * 0.8 as total_year
      FROM  time_count
      GROUP BY 1
)


,week_count as 
(
            SELECT CASE WHEN week > total_weeks THEN 'Weekly'
                        WHEN biweek > total_biweek  THEN 'Biweekly'
                        WHEN month > total_months THEN 'Monthly'
                        WHEN quarter > total_quarters THEN 'quarterly'
                        WHEN year > total_year THEN 'yearly'
                        ELSE 'rarely' 
                    END as time_bucket
            FROM time_count
            LEFT JOIN temp ON temp.user_id = time_count.user_id
                           
)

SELECT * FROM week_count

不知道我哪里出错了,但它似乎没有按预期工作。请尽早提供建议/协助

标签: pythonsqlpostgresql

解决方案


我有点糊涂了。我会使用窗口函数来解决这个问题。这是一个带有内置时间框架的示例:

select user_id,
       (case when num_weeks > 0.8 * total_weeks then 'weekly'
             when num_months > 0.8 * total_months then 'monthly'
             when num_years > 0.8 * total_years then 'yearly'
             else 'rarely'
        end) as frequency
from (select user_id,
             count(distinct date_trunc('week', timestamp)) as num_weeks,
             count(distinct date_trunc('month', timestamp)) as num_months,
             count(distinct date_trunc('year', timestamp)) as num_years,
             extract(day from current_date - min(timestamp)) / 7 as total_weeks,
             extract(year from age(min(timestamp, current_date))) * 12 + extract(month from age(min(timestamp, current_date))) as total_months,
             extract(year from age(min(timestamp, current_date))) as total_years
      from visit_logs vl
      group by user_id
     ) vl;

推荐阅读