GCP BigQuery: aggregating values by the categorical values in a column

Problem description

I have the following data to analyze, with this schema:

timestamp           price volume Qualifiers TradeCatogary
------------------- ----- ------ ---------- -------------
2016-07-11 01:00:00     3   2323          U           OML
2016-07-11 01:02:03   2.5    434          K           KCL
2016-07-11 01:03:34     4   3244          U           KCL
2016-07-11 01:23:00   2.3     45          K           OML
...

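What I want is to compute, for each 5-minute interval, the total share volume, the dollar value traded, the number of trades, and the average price (VWAP = dollarTraded / totalVolume). The desired schema is: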

timestamp tradeCount totalVolume dollarTraded VWAP
--------- ---------- ----------- ------------ ----

This is not hard; I managed it with the following SQL:

SELECT
    TIMESTAMP_SECONDS(CAST((ROUND(UNIX_SECONDS(Date_Time) / 300) * 300) AS int64)) AS interval_alias1,
    count(*) AS cnt,
    SUM(Volume) AS ShareVolumeTraded,
    SUM(Price * Volume) AS DollarVolumeTraded,
    (SUM(Price * Volume) / SUM(Volume)) AS VWAP
FROM
    `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
WHERE
    RIC = '1606.HK'
    AND Type="Trade"
    AND (Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000') AND
    TIMESTAMP('2016-07-11 23:59:59.999999'))
    AND Volume >0
    AND Price >0
GROUP BY 
    ROUND(UNIX_SECONDS(Date_Time) / 300), interval_alias1
ORDER BY interval_alias1

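However, I want to take the analysis further. Beyond the overall trade summary, I also want a more specific breakdown by each combination of Qualifiers and TradeCatogary, for example: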

count_U volume_U dollar_U VWAP_U count_U_OML volume_U_OML dollar_U_OML VWAP_U_OML ....

There are two qualifiers and two categories in the example, so there will be:
3 (basic) + 2 * 3 (qualifier only) + 3 * 2 * 2 (qualifier and category combination) = 21 columns
If there is no such trade in a time slot, the value should default to 0.

I would like to get all of this from a single SQL execution, but I don't know how to approach it. Any hint would be appreciated. Thanks in advance.

Tags: mysql, sql, google-cloud-platform, google-bigquery

Solution


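As mentioned in the comments, GROUP BY ROLLUP fits your case. GROUP BY ROLLUP returns the GROUP BY results at successive levels of aggregation: first for the full list of columns, then for each shorter prefix of that list, down to the grand total. With the columns ordered as (interval, Qualifiers, TradeCatogary), that gives exactly the per-combination, per-qualifier, and overall rows you describe. Since you also want the output as columns rather than rows, you can use PIVOT to turn the rows into columns. For your other requirement, combining the values of two columns into a single column, you can use the CONCAT function.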
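
The combination of ROLLUP, CONCAT, and PIVOT described above might look roughly like the sketch below. It is not a tested query. It assumes that the table really has STRING columns named Qualifiers and TradeCatogary (the names are taken from the sample data in the question), that the only values are U/K and OML/KCL, and that neither column contains NULL or the literal 'ALL', which is used here as a made-up placeholder for a rolled-up level. The CTE names and the output column names are also illustrative.

```sql
-- Sketch under the assumptions stated above; adjust names and values to the real data.
WITH trades AS (
  SELECT
    -- Same 5-minute bucketing expression as in the question.
    TIMESTAMP_SECONDS(CAST(ROUND(UNIX_SECONDS(Date_Time) / 300) * 300 AS INT64)) AS interval_start,
    Qualifiers,      -- assumed column name
    TradeCatogary,   -- assumed column name
    Price,
    Volume
  FROM `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
  WHERE RIC = '1606.HK'
    AND Type = 'Trade'
    AND Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000')
                      AND TIMESTAMP('2016-07-11 23:59:59.999999')
    AND Volume > 0
    AND Price > 0
),
rolled AS (
  -- One row per (interval, qualifier, category), per (interval, qualifier),
  -- per interval, plus one grand-total row. Rolled-up columns come back as NULL.
  SELECT
    interval_start,
    Qualifiers,
    TradeCatogary,
    COUNT(*) AS cnt,
    SUM(Volume) AS volume,
    SUM(Price * Volume) AS dollar
  FROM trades
  GROUP BY ROLLUP(interval_start, Qualifiers, TradeCatogary)
),
labeled AS (
  -- Build a single label per aggregation level; 'ALL' is a hypothetical placeholder.
  SELECT
    interval_start,
    CONCAT(IFNULL(Qualifiers, 'ALL'), '_', IFNULL(TradeCatogary, 'ALL')) AS grp,
    cnt,
    volume,
    dollar
  FROM rolled
  WHERE interval_start IS NOT NULL  -- drop the grand-total row
),
pivoted AS (
  -- Output columns are named <aggregate alias>_<pivot alias>, e.g. cnt_U, volume_U_OML.
  SELECT *
  FROM labeled
  PIVOT (
    SUM(cnt) AS cnt, SUM(volume) AS volume, SUM(dollar) AS dollar
    FOR grp IN ('ALL_ALL' AS total, 'U_ALL' AS U, 'K_ALL' AS K,
                'U_OML' AS U_OML, 'U_KCL' AS U_KCL,
                'K_OML' AS K_OML, 'K_KCL' AS K_KCL)
  )
)
SELECT
  *,
  -- VWAP is derived after pivoting; only two of the seven groups are shown.
  SAFE_DIVIDE(dollar_total, volume_total) AS VWAP_total,
  SAFE_DIVIDE(dollar_U, volume_U) AS VWAP_U
FROM pivoted
ORDER BY interval_start
```

PIVOT leaves NULL where a group has no trades in an interval, so to get the 0 default asked for in the question, the final SELECT would need to wrap each pivoted column, for example IFNULL(cnt_U, 0) AS count_U. PIVOT also requires the pivoted values to be listed as constants, so a new qualifier or category means extending the IN list by hand.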

