regex - BigQuery 使用从宽到长的结构重塑表
问题描述
删除了这篇文章的以前版本,而不是这个清理过的帖子,并带有一个可重复的示例。我有以下格式的表格:
WITH wide_stats AS (
(
SELECT
'joe' name, 'bills' team,
struct(struct(7 as fga) as o, struct(8 as fga) as d) as t1,
struct(struct(3 as fga) as o, struct(9 as fga) as d) as t2,
struct(3 as pts, 9 as ast, 5 as reb) as t3,
7 tov, 3 blk
) UNION ALL (
SELECT 'nick' name, 'jets' team,
struct(struct(12 as fga) as o, struct(13 as fga) as d) as t1,
struct(struct(15 as fga) as o, struct(22 as fga) as d) as t2,
struct(13 as pts, 5 as ast, 15 as reb) as t3,
75 tov, 23 blk
)
)
SELECT
name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT
name, team,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM wide_stats,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(wide_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')
我正在努力将表格重塑为以下输出:
name team metric value
joe bills t1_o_fga 7
joe bills t1_d_fga 8
joe bills t2_o_fga 3
joe bills t2_d_fga 9
joe bills t3_pts 3
joe bills t3_ast 9
joe bills t3_reb 5
joe bills tov 7
joe bills blk 3
nick jets t1_o_fga 12
nick jets t1_d_fga 13
nick jets t2_o_fga 15
nick jets t2_d_fga 22
nick jets t3_pts 13
nick jets t3_ast 5
nick jets t3_reb 15
nick jets tov 75
nick jets blk 23
该任务很容易解释 - 从宽到长,但在表中使用struct
和嵌套struct
s。我在另一个 stackoveflow 帖子中的正则表达式工作以错误的方式拆分列名,并且当前输出与它需要的不匹配。
行的顺序无关紧要。有了名字,不管它是 t1_o_fga 还是 t1-o-fga 还是 t1/o/fga,只要有一些分隔符并且清楚变量是什么。非常感谢任何帮助或指导,谢谢!
解决方案
以下是 BigQuery 标准 SQL
#standardSQL
WITH wide_stats AS (
SELECT 'joe' name, 'bills' team,
STRUCT(STRUCT(7 AS fga) AS o, STRUCT(8 AS fga) AS d) AS t1,
STRUCT(STRUCT(3 AS fga) AS o, STRUCT(9 AS fga) AS d) AS t2,
STRUCT(3 AS pts, 9 AS ast, 5 AS reb) AS t3, 7 tov, 3 blk UNION ALL
SELECT 'nick' name, 'jets' team,
STRUCT(STRUCT(12 AS fga) AS o, STRUCT(13 AS fga) AS d) AS t1,
STRUCT(STRUCT(15 AS fga) AS o, STRUCT(22 AS fga) AS d) AS t2,
STRUCT(13 AS pts, 5 AS ast, 15 AS reb) AS t3, 75 tov, 23 blk
), flat_stats AS (
SELECT name, team,
t1.o.fga AS t1_o_fga,
t1.d.fga AS t1_d_fga,
t2.o.fga AS t2_o_fga,
t2.d.fga AS t2_d_fga,
t3.pts AS t3_pts,
t3.ast AS t3_ast,
t3.reb AS t3_reb,
tov, blk
FROM wide_stats
)
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT name, team,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM flat_stats,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')
带输出
Row name team metric value
1 joe bills t1_o_fga 7.0
2 joe bills t1_d_fga 8.0
3 joe bills t2_o_fga 3.0
4 joe bills t2_d_fga 9.0
5 joe bills t3_pts 3.0
6 joe bills t3_ast 9.0
7 joe bills t3_reb 5.0
8 joe bills tov 7.0
9 joe bills blk 3.0
10 nick jets t1_o_fga 12.0
11 nick jets t1_d_fga 13.0
12 nick jets t2_o_fga 15.0
13 nick jets t2_d_fga 22.0
14 nick jets t3_pts 13.0
15 nick jets t3_ast 5.0
16 nick jets t3_reb 15.0
17 nick jets tov 75.0
18 nick jets blk 23.0
如果由于某种原因您在flat_stats
手动组装临时表时遇到问题 - 您可以做一个像下面这样的小技巧
第 1 步- 只需在旧模式下使用目标表 [project:dataset.flat_stats] 运行以下查询
#legacySQL
SELECT *
FROM [project:dataset.wide_stats]
“令人惊讶”,这将创建[project:dataset.flat_stats]
具有以下结构的 表
Row name team t1_o_fga t1_d_fga t2_o_fga t2_d_fga t3_pts t3_ast t3_reb tov blk
1 joe bills 7 8 3 9 3 9 5 7 3
2 nick jets 12 13 15 22 13 5 15 75 23
第 2 步- 之后您可以简单地在下面运行(现在在标准 SQL 中)
#standardSQL
SELECT name, team, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT name, team,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.flat_stats` flat_stats,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(flat_stats), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('name', 'team')
推荐阅读
- html - 固定 div 比移动设备上的屏幕更宽。为什么?
- javascript - setBackgroundImage 无法按预期使用多个功能
- python - 注册为多个用户
- python - 如何根据给定行中第 3 次出现的值获取列?
- angular - Angular2+ 使用 NgFor 创建具有相同内容的 mat-tab
- php - 使用 PHPMailer 发送电子邮件并旋转图像
- mysql - MySQL全文布尔模式搜索带有通配符的两个字母单词
- c++ - 使用相同名称类型的两个不同库
- node.js - 你如何使用 event-stream 来暂停和恢复 Gulp 流?
- php - 奇怪的 curl 结果 php (301 -> 0)