Sum of distinct values after grouping blows up the metric

Question

I am using:

with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    product.productSKU,
    product.v2ProductName,
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;

which returns:

month       pdp     add_cart    conversions     f0_ 
2021-02-01  500     100         20              10  
2021-01-01  600     200         30              20

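I know that f0_ (count(distinct(hits.transaction.transactionId))) is wrong here because the rows are grouped by product.productSKU and product.v2ProductName. In general, when a user has 3 items in their basket, I want to count that as one order, but right now it is counted as 3.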
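The overcount can be reproduced on a few rows of inline data. This is a minimal BigQuery sketch, not the asker's data: the CTE `order_lines` and its values are hypothetical and stand in for one transaction `T1` that contains three SKUs.

```sql
-- Hypothetical inline data: one transaction (T1) with three products.
WITH order_lines AS (
  SELECT 'T1' AS transactionId, 'SKU_A' AS productSKU UNION ALL
  SELECT 'T1', 'SKU_B' UNION ALL
  SELECT 'T1', 'SKU_C'
),
per_sku AS (
  -- COUNT(DISTINCT) is evaluated separately inside each SKU group,
  -- so T1 contributes 1 to every group it appears in.
  SELECT productSKU, COUNT(DISTINCT transactionId) AS transaction_id_cnt
  FROM order_lines
  GROUP BY productSKU
)
SELECT
  (SELECT SUM(transaction_id_cnt) FROM per_sku) AS summed_per_sku,    -- 3
  (SELECT COUNT(DISTINCT transactionId) FROM order_lines) AS true_cnt -- 1
```

Distinct counts are not additive across groups: once the same transaction ID lands in several groups, summing the per-group counts counts it once per group.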

If I comment out product.productSKU and product.v2ProductName, count(distinct(hits.transaction.transactionId)) as transaction_id_cnt produces the correct output.

Running this query:

with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    -- product.productSKU, # commented out
    -- product.v2ProductName, # commented out
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,3  -- month, fullVisitorId, pdp_visitor (positions 4 and 5 are now aggregates)
)
select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;

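returns what I expect, but now I no longer have productSKU and v2ProductName, which I need. I suspect the problem is that each order is a new row in Google BigQuery, so when I group by product name and SKU, I count the unique values per group and then sum them.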

How can I get the correct total for count(distinct(hits.transaction.transactionId)) without losing the grouping by product.productSKU and product.v2ProductName, which blows up this metric?

Tags: sql, google-bigquery

Solution


In the GROUP BY query, you can collect them as arrays instead (so you are not grouping by them):

  ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
  ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
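To show where those two lines go, here is a sketch of the question's first CTE with productSKU and v2ProductName moved into ARRAY_AGG and dropped from the GROUP BY. It assumes the same `table` placeholder and GA-export-style schema as the question; only the grouping and the two array columns differ from the original.

```sql
WITH t1 AS (
  SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) AS month,
    fullVisitorId,
    CASE WHEN hits.ecommerceaction.action_type = '2' THEN 1 ELSE 0 END AS pdp_visitor,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '2' THEN fullVisitorId END) AS views_pdp,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '3' THEN fullVisitorId END) AS add_cart,
    COUNT(CASE WHEN hits.ecommerceaction.action_type = '6' THEN hits.transaction.transactionId END) AS conversions,
    COUNT(DISTINCT hits.transaction.transactionId) AS transaction_id_cnt,
    -- Products are collected per row instead of being grouping keys,
    -- so a 3-item basket no longer splits into 3 groups.
    ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
    ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list
  FROM `table` AS nr,
    UNNEST(hits) AS hits,
    UNNEST(product) AS product
  GROUP BY 1, 2, 3  -- month, fullVisitorId, pdp_visitor
)
SELECT month, fullVisitorId, transaction_id_cnt, productSKU_list, productName_list
FROM t1
ORDER BY month DESC;
```

The monthly SUM query from the question can then run on this t1 unchanged; the arrays simply ride along on each row.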

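Update based on your comment below: if you want to use them in a further GROUP BY, save them as strings instead of arrays, since BigQuery does not allow grouping by an ARRAY column.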

  STRING_AGG(DISTINCT product.productSKU, ',') AS productSKU_list,
  STRING_AGG(DISTINCT product.v2ProductName, ',') AS productName_list,
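A small worked example of grouping by such a string, on hypothetical inline data (`order_lines`, visitors V1/V2, transactions T1/T2 are made up): two visitors buy the same two SKUs, listed in different orders. One assumption beyond the answer: an `ORDER BY` is added inside STRING_AGG, because without it the element order in the string is not guaranteed, and the same basket could produce `SKU_A,SKU_B` in one row and `SKU_B,SKU_A` in another. With DISTINCT, BigQuery requires the ORDER BY key to be the aggregated expression itself.

```sql
-- Hypothetical inline data: two visitors, each with one 2-item transaction.
WITH order_lines AS (
  SELECT DATE '2021-02-01' AS month, 'V1' AS fullVisitorId, 'T1' AS transactionId, 'SKU_B' AS productSKU UNION ALL
  SELECT DATE '2021-02-01', 'V1', 'T1', 'SKU_A' UNION ALL
  SELECT DATE '2021-02-01', 'V2', 'T2', 'SKU_A' UNION ALL
  SELECT DATE '2021-02-01', 'V2', 'T2', 'SKU_B'
),
t1 AS (
  SELECT
    month,
    fullVisitorId,
    COUNT(DISTINCT transactionId) AS transaction_id_cnt,
    -- Sorted so identical baskets always yield the same string.
    STRING_AGG(DISTINCT productSKU, ',' ORDER BY productSKU) AS productSKU_list
  FROM order_lines
  GROUP BY month, fullVisitorId
)
-- The string can be a grouping key; each transaction is still counted once.
SELECT month, productSKU_list, SUM(transaction_id_cnt) AS transactions
FROM t1
GROUP BY month, productSKU_list;
-- Result: one row, 2021-02-01 | SKU_A,SKU_B | 2
```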
