How to query all records nested in a RECORD column whose field names are suffixed with a date

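Problem description

I have a table with a field markedDates (type: RECORD).

markedDates has some properties like this:

I don't know how the designer created this table, but the data is saved like this:

I have to sum all the earnedPoint values with a datetime filter (something <= now <= something), but I am new to Google BigQuery. The problem is that markedDates also contains other dates: d_2018_09_08, d_2019_09_09, ... and I have to sum the points earned on those other dates too. I have searched but could not find anything.

Could you give me some keywords for this case, or show me how to write the query? Thanks!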

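Tags: google-bigquery

Solution


I have to sum all the earnedPoint values...

Below is for BigQuery Standard SQL. It sums earnedPoint for a single known date field, d_2018_11_30: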

#standardSQL
SELECT userID, SUM(markedDates.d_2018_11_30.earnedPoint) AS allEarnedPoint
FROM `project.dataset.table`
GROUP BY userID     

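...with a datetime filter

I don't see any datetime-related field that could be used for such filtering.

The problem is that markedDates also contains other dates: d_2018_09_08, d_2019_09_09, ... and I have to sum the points earned on those other dates too

Below is the trick: serialize markedDates to a JSON string, extract each d_* entry with a regular expression, and filter on the date part of the field name.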

#standardSQL
SELECT userID, SUM(CAST(JSON_EXTRACT(REGEXP_EXTRACT(x, r'"d_.*?":(.*)'), '$.earnedPoint') AS FLOAT64)) allEarnedPoint
FROM `project.dataset.table`, 
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(markedDates), r'"d_.*?":{.*?}')) x
WHERE REGEXP_EXTRACT(x, r'"d_(.*?)"') BETWEEN '2018_12_02' AND '2018_12_05'
GROUP BY userID   
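
To see what the trick operates on, it can help to look at the intermediate rows before they are summed. The following is only a sketch: the `sample` CTE and its values are hypothetical stand-ins for `project.dataset.table`, and the column aliases `fragment`, `dateKey`, and `earnedPoint` are made up for illustration.

```sql
#standardSQL
-- `sample` is a hypothetical stand-in for `project.dataset.table`.
WITH sample AS (
  SELECT 1 AS userID,
    STRUCT(
      STRUCT(2 AS earnedPoint, FALSE AS earnedShare) AS d_2018_12_02,
      STRUCT(3 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_03
    ) AS markedDates
)
SELECT
  userID,
  -- One JSON fragment per date field, shaped like
  -- "d_2018_12_02":{"earnedPoint":2,"earnedShare":false}
  x AS fragment,
  -- The part of the field name after the d_ prefix, e.g. 2018_12_02
  REGEXP_EXTRACT(x, r'"d_(.*?)"') AS dateKey,
  -- The value of earnedPoint, still a string at this stage
  JSON_EXTRACT(REGEXP_EXTRACT(x, r'"d_.*?":(.*)'), '$.earnedPoint') AS earnedPoint
FROM sample,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(markedDates), r'"d_.*?":{.*?}')) AS x
ORDER BY dateKey
```

Note that the pattern `{.*?}` stops at the first closing brace, so this approach assumes each d_* entry is a flat struct with no nested records inside it.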

You can test and play with the trick above using very simplified dummy data, which I hope represents your case:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 userID, 
    STRUCT(
      STRUCT(0 AS earnedPoint, TRUE AS earnedShare) AS d_2018_11_30,
      STRUCT(1 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_01,
      STRUCT(2 AS earnedPoint, FALSE AS earnedShare) AS d_2018_12_02,
      STRUCT(3 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_03,
      STRUCT(4 AS earnedPoint, FALSE AS earnedShare) AS d_2018_12_04,
      STRUCT(5 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_05,
      STRUCT(6 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_06
    ) markedDates
)
SELECT userID, SUM(CAST(JSON_EXTRACT(REGEXP_EXTRACT(x, r'"d_.*?":(.*)'), '$.earnedPoint') AS FLOAT64)) allEarnedPoint
FROM `project.dataset.table`, 
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(markedDates), r'"d_.*?":{.*?}')) x
WHERE REGEXP_EXTRACT(x, r'"d_(.*?)"') BETWEEN '2018_12_02' AND '2018_12_05'
GROUP BY userID    

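Note: this should work as-is with your data, but even if you need to make some adjustments, the above should give you a good idea of the approach.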
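
One possible adjustment, if the bounds need to be real dates (for example relative to CURRENT_DATE(), as in the "something <= now <= something" filter from the question), is to parse the field-name suffix with PARSE_DATE instead of comparing strings. This is a sketch under the assumption that every d_* suffix follows the YYYY_MM_DD pattern; the `sample` CTE and its values are hypothetical.

```sql
#standardSQL
-- `sample` is a hypothetical stand-in for `project.dataset.table`.
WITH sample AS (
  SELECT 1 AS userID,
    STRUCT(
      STRUCT(2 AS earnedPoint, FALSE AS earnedShare) AS d_2018_12_02,
      STRUCT(3 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_03,
      STRUCT(6 AS earnedPoint, TRUE AS earnedShare) AS d_2018_12_06
    ) AS markedDates
)
SELECT
  userID,
  SUM(CAST(JSON_EXTRACT(REGEXP_EXTRACT(x, r'"d_.*?":(.*)'), '$.earnedPoint') AS FLOAT64)) AS allEarnedPoint
FROM sample,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(markedDates), r'"d_.*?":{.*?}')) AS x
-- Turn the suffix (e.g. 2018_12_02) into a DATE so it can be compared
-- with DATE literals or expressions such as CURRENT_DATE().
WHERE PARSE_DATE('%Y_%m_%d', REGEXP_EXTRACT(x, r'"d_(.*?)"'))
      BETWEEN DATE '2018-12-02' AND DATE '2018-12-05'
GROUP BY userID
```

With this sample, only d_2018_12_02 and d_2018_12_03 fall inside the range. PARSE_DATE raises an error on a suffix that does not match the format, so a table with irregular field names would need extra handling.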

