sql - Summing distinct counts after grouping inflates the metric
Question
I am running this query:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
product.productSKU,
product.v2ProductName,
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;
which returns:
month pdp add_cart conversions f0_
2021-02-01 500 100 20 10
2021-01-01 600 200 30 20
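I know that f0_ ( count(distinct(hits.transaction.transactionId)) ) is wrong here, because the query groups by product.productSKU and product.v2ProductName. In general, when a user has 3 items in their basket, I want to count that as one order, whereas right now it is counted as 3.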
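To make the effect concrete, here is a minimal sketch using made-up inline data rather than the real table. The CTE names, the transaction ID 'T1' and the three SKUs are hypothetical; they stand in for one order with 3 items after the UNNEST.

```sql
-- Hypothetical rows: one transaction (T1) containing three products,
-- i.e. what a single order looks like after UNNEST(product).
WITH rows_after_unnest AS (
  SELECT 'T1' AS transactionId, 'SKU_A' AS productSKU UNION ALL
  SELECT 'T1', 'SKU_B' UNION ALL
  SELECT 'T1', 'SKU_C'
),
-- Counting distinct transactions per product gives 1 for each of the 3 products.
per_product AS (
  SELECT
    productSKU,
    COUNT(DISTINCT transactionId) AS transaction_id_cnt
  FROM rows_after_unnest
  GROUP BY productSKU
)
SELECT
  -- Summing the per-product counts: 1 + 1 + 1 = 3 (inflated)
  (SELECT SUM(transaction_id_cnt) FROM per_product) AS summed_per_product,
  -- Counting distinct transactions without the product grouping: 1 (the real number of orders)
  (SELECT COUNT(DISTINCT transactionId) FROM rows_after_unnest) AS true_orders;
```

The same transaction is distinct within each product group, so it contributes once per product to the outer sum.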
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt produces the correct output if I comment out product.productSKU and product.v2ProductName.
Running this query:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
-- product.productSKU, # commented out
-- product.v2ProductName, # commented out
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;
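returns what I expect, but now I no longer have productSKU and v2ProductName, which I need. I suspect the problem is that every order is a new row in Google BigQuery, and when I select it by product name and SKU, I count the distinct values per group and then sum them.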
How can I get the correct total for count(distinct(hits.transaction.transactionId)) without losing product.productSKU and product.v2ProductName, given that grouping by them inflates this metric?
Solution
In the GROUP BY query, you can collect them as arrays (so that you do not group by them):
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
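Put into the query from the question, this could look roughly like the sketch below. It is an illustration under assumptions, not a tested drop-in: it reuses the question's `table` placeholder and column names as they are, and it assumes the remaining grouping columns are month, fullVisitorId and pdp_visitor, so the GROUP BY shrinks to 1,2,3.

```sql
WITH t1 AS (
  SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) AS month,
    fullVisitorId,
    CASE WHEN hits.ecommerceaction.action_type = '2' THEN 1 ELSE 0 END AS pdp_visitor,
    -- Products are collected into arrays instead of being grouping columns,
    -- so they no longer split one transaction across several groups.
    ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
    ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '2' THEN fullVisitorId END) AS views_pdp,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '3' THEN fullVisitorId END) AS add_cart,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '6' THEN hits.transaction.transactionId END) AS conversions,
    COUNT(DISTINCT hits.transaction.transactionId) AS transaction_id_cnt
  FROM `table` AS nr,
    UNNEST(hits) hits,
    UNNEST(product) product
  -- Only the three non-aggregated columns are grouped on.
  GROUP BY 1, 2, 3
)
SELECT
  month,
  SUM(views_pdp) AS pdp,
  SUM(add_cart) AS add_cart,
  SUM(conversions) AS conversions,
  SUM(transaction_id_cnt) AS transaction_id_cnt
FROM t1
GROUP BY 1
ORDER BY 1 DESC;
```

Each row of t1 still carries its products in productSKU_list and productName_list, while transaction_id_cnt is computed once per month and visitor rather than once per product.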
Update based on your comment below: if you want to use them in a further GROUP BY, just keep them as strings instead of arrays.
STRING_AGG(DISTINCT product.productSKU, ',') AS productSKU_list,
STRING_AGG(DISTINCT product.v2ProductName, ',') AS productName_list,
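One thing to watch if the string is used as a grouping key: without an ORDER BY inside STRING_AGG, the order of the concatenated values is not guaranteed, so the same set of products could end up as two different strings. The sketch below uses made-up inline data (the CTE names, visitor IDs, transaction IDs and SKUs are hypothetical) and adds an ORDER BY on the aggregated expression itself, which BigQuery permits together with DISTINCT.

```sql
-- Hypothetical rows: two visitors, each with one transaction containing the same two products,
-- listed in a different order.
WITH rows_after_unnest AS (
  SELECT 'V1' AS fullVisitorId, 'T1' AS transactionId, 'SKU_B' AS productSKU UNION ALL
  SELECT 'V1', 'T1', 'SKU_A' UNION ALL
  SELECT 'V2', 'T2', 'SKU_A' UNION ALL
  SELECT 'V2', 'T2', 'SKU_B'
),
per_visitor AS (
  SELECT
    fullVisitorId,
    -- Sorting inside the aggregate makes the string the same for the same set of SKUs.
    STRING_AGG(DISTINCT productSKU, ',' ORDER BY productSKU) AS productSKU_list,
    COUNT(DISTINCT transactionId) AS transaction_id_cnt
  FROM rows_after_unnest
  GROUP BY fullVisitorId
)
-- Further grouping on the string: both visitors fall into the group 'SKU_A,SKU_B'.
SELECT
  productSKU_list,
  SUM(transaction_id_cnt) AS transactions
FROM per_visitor
GROUP BY productSKU_list;
```

With this data the final query returns a single row, 'SKU_A,SKU_B' with 2 transactions.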