sql - 如何预测 Google BigQuery SQL 中按相同 order_id 分组的交易的流失率?
问题描述
我对 BigQuery 中的 SQL 很陌生。我试图根据他们的 order_id 和 date_start 预测客户状态。
规则
对于相同的 order_id,customer_status 必须相同
customer_status 必须根据下一个 Order_ID (Transaction) 或上一个 Order_ID 进行预测
这是我的代码
select week_start, t.account_id, t.order_id, date_start,date_ended,
from unnest(generate_date_array(date('2018-12-31'), date('2019-2-11'), interval 1 week) ) week_start cross join
(select distinct account_id
from t
) a left join
t
on t.account_id = a.account_id and
t.reporting_week_start_date = week_start
order by a.account_id,week_start
我的代码的输出
week_start | account_id | order_id | date_start | date_ended |
2018-12-31 | 1 | 1001 | 2019-01-01 | 2019-01-15 |
2019-01-07 | 1 | 1001 | 2019-01-01 | 2019-01-15 |
2019-01-14 | 1 | 1002 | 2019-01-15 | 2019-01-27 |
2019-01-21 | 1 | 1002 | 2019-01-15 | 2019-01-27 |
2019-01-28 | 1 | 1003 | 2019-01-29 | 2019-02-04 |
愿望输出
week_start | account_id | order_id | date_start | date_ended | Customer_Status
2018-12-31 | 1 | 1001 | 2019-01-01 | 2019-01-15 | First Time
2019-01-07 | 1 | 1001 | 2019-01-01 | 2019-01-15 | First Time
2019-01-14 | 1 | 1002 | 2019-01-15 | 2019-01-27 | Recurring
2019-01-21 | 1 | 1002 | 2019-01-15 | 2019-01-27 | Recurring
2019-01-28 | 1 | 1003 | 2019-01-29 | 2019-02-04 | Churned
我如何在 Big Query SQL 中执行此操作?谢谢。
解决方案
在分析了你的案例之后,我想出了你可以用来对客户进行分类的规则。由于您想在 BigQuery 上对每个客户进行分类,因此所有规则都是硬编码的。
您提到分类应该基于下一个和上一个Order_id,但我也使用了字段account_id。出于这个原因,我使用了两种方法LAG()和LEAD()。
下面的示例代码应合并到您当前的查询中,我使用您当前的输出作为我的源输入。我遵循以下算法:
1) 对首次使用的客户进行分类。所以我创建了临时表 new_c;
2) 对Recurring客户进行分类,然后创建rec_c临时表;
3) 最后一个临时表churn_c的流失客户;
4)根据源输入(您当前的输出)连接所有表格,在处理重复状态的阶段,如我们有两个“第一次”客户的状态等等。
下面是代码,请注意我使用的列名略有不同:
#First selecting the new costumers and changing the namwes of some columns so we can use left join in the end
WITH
new_c AS (
SELECT
_account_id__ AS acc,
_order_id__ AS oo,
week_start_ AS wk_nc,
CASE
WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL )) THEN 'First time'
WHEN (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) != _account_id__ ) THEN 'First time'
END
AS status
FROM
`test-proj-261014.sample.customer` ),
#Now, we classify the customers who are Recurring
rec_c AS (
SELECT
_account_id__ AS acc_rc,
_order_id__ AS oo_rc,
week_start_ AS wk_rc,
CASE
WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ ) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _order_id__) AND (LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) = _account_id__ ) ) THEN 'Recurring'
END
AS status_rc
FROM
`test-proj-261014.sample.customer` ),
#Last, the Churned customers
churn_c AS (
SELECT
_account_id__ AS acc_c,
_order_id__ AS oo_c,
week_start_ AS wk_ch,
CASE
WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _account_id__ ) AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) IS NULL)
AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
END
AS status_churn
FROM
`test-proj-261014.sample.customer` )
#In the end, all the temporary tables are joined together and we handle same status customers.
SELECT
week_start_,
_account_id__,
_order_id__,
_date_start___,
_date_ended___,
CASE
WHEN status IS NOT NULL THEN status
WHEN (LAG(status) OVER (PARTITION BY acc ORDER BY oo ASC) ='First time'
AND LAG(oo) OVER (PARTITION BY acc ORDER BY oo ASC)=oo)THEN 'First time'
WHEN status_rc IS NOT NULL THEN status_rc
WHEN (LAG(status_rc) OVER (PARTITION BY acc_rc ORDER BY oo_rc ASC) ='Recurring'
AND LAG(oo_rc) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_rc) THEN 'Recurring'
WHEN churn_c IS NOT NULL THEN status_churn
WHEN (LAG(status_churn) OVER (PARTITION BY acc_c ORDER BY oo_c ASC) ='Churned'
AND LAG(oo_c) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_c) THEN 'Churned'
END
AS final_status
FROM
`test-proj-261014.sample.customer`
INNER JOIN
new_c
ON
_account_id__ = ACC
AND _order_id__ = OO
AND week_start_ = wk_nc
INNER JOIN
rec_c
ON
_account_id__ = acc_rc
AND _order_id__ = oo_rc
AND week_start_ = wk_rc
INNER JOIN
churn_c
ON
_account_id__ = acc_c
AND _order_id__ = oo_c
AND week_start_ = wk_ch ;
推荐阅读
- tfs - 无法使用 VS 2017 在本地文件夹中映射工作区
- java - 如何编译.java文件,需要哪些工具和数据?
- aframe - 在 vr 模式下访问左右投影设置
- npm - 如何禁用`npm install`
- azure - 如何将 vm 添加到多个 azure 自动化混合工作组
- ios - 在 ARImageTrackingConfiguration 中转换坐标
- javascript - Chrome 扩展程序:模拟对非焦点选项卡的关注
- android - AndroidKeystore 密钥无法初始化密码
- c++ - `std::filesystem::path` 没有标准哈希吗?
- java - 如何访问 javafx 元素的子元素?