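r - How to flatten a data frame with multiple list-columns in R (loaded from BigQuery)
Question
I am using R's bigrquery library to load data from a BQ database into R. bigrquery works fine, and a short example of the output I receive is below: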
dput(my_df_from_bq)
structure(list(season = c("2019", "2019", "2017", "2018", "2018"
), o_or_d = c("Offense", "Defense", "Offense", "Offense", "Defense"
), chances = list(list(num_ato_chances = 6L, ato_pts_scored = 4L,
ato_ppp = 0.66667, num_ato_chances_pctile = 0.272955974842767,
ato_ppp_pctile = 0.335849056603774), list(num_ato_chances = 7L,
ato_pts_scored = 2L, ato_ppp = 0.28571, num_ato_chances_pctile = 0.534591194968553,
ato_ppp_pctile = 0.913207547169811), list(num_ato_chances = 5L,
ato_pts_scored = 2L, ato_ppp = 0.4, num_ato_chances_pctile = 0.147118921127912,
ato_ppp_pctile = 0.177768696362893), list(num_ato_chances = 1L,
ato_pts_scored = 0L, ato_ppp = 0, num_ato_chances_pctile = 0,
ato_ppp_pctile = 0), list(num_ato_chances = 6L, ato_pts_scored = 8L,
ato_ppp = 1.33333, num_ato_chances_pctile = 0.70093839249286,
ato_ppp_pctile = 0.165646674826601)), dribbles = list(list(
dribbles = 928L, dribbles_pctile = 0.437735849056604), list(
dribbles = 1040L, dribbles_pctile = 0.113207547169811), list(
dribbles = 771L, dribbles_pctile = 0.0469963220269718), list(
dribbles = 735L, dribbles_pctile = 0.00489596083231334),
list(dribbles = 1049L, dribbles_pctile = 0.103223174214606))), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
> my_df_from_bq
# A tibble: 5 x 4
  season o_or_d  chances          dribbles
  <chr>  <chr>   <list>           <list>
1 2019   Offense <named list [5]> <named list [2]>
2 2019   Defense <named list [5]> <named list [2]>
3 2017   Offense <named list [5]> <named list [2]>
4 2018   Offense <named list [5]> <named list [2]>
5 2018   Defense <named list [5]> <named list [2]>
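The table I am loading from BQ contains many nested structs, so this data frame structure is expected: the bigrquery documentation itself states that nested values become list-columns containing named lists.
However, I would now like to flatten it. Notice that the list my_df_from_bq$chances[[1]] contains the values num_ato_chances, ato_pts_scored, ato_ppp, and so on. I would therefore like to flatten this data frame so that the column names are: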
- season
- o_or_d
- chances_num_ato_chances
- chances_ato_chances_pg
- chances_ato_ppp
- ...
- dribbles_dribbles
- dribbles_dribbles_pctile
...with the name of the list-column joined to the name of each value inside the list.
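The naming rule described above can be sketched in base R. This is only an illustration: `toy` is a hypothetical, cut-down stand-in for my_df_from_bq (two rows, fewer fields), and it assumes every row of a list-column holds the same names as the first row.

```r
# Hypothetical, reduced stand-in for my_df_from_bq (not the real BigQuery result)
toy <- data.frame(season = c("2019", "2017"), stringsAsFactors = FALSE)
toy$chances <- list(
  list(num_ato_chances = 6L, ato_ppp = 0.66667),
  list(num_ato_chances = 5L, ato_ppp = 0.4)
)
toy$dribbles <- list(
  list(dribbles = 928L, dribbles_pctile = 0.4377),
  list(dribbles = 771L, dribbles_pctile = 0.0470)
)

# Which columns are list-columns?
is_list_col <- vapply(toy, is.list, logical(1))

# For a list-column, prefix each inner name with the column name;
# ordinary columns keep their name unchanged.
target_names <- unlist(lapply(names(toy), function(col) {
  if (is_list_col[[col]]) {
    paste(col, names(toy[[col]][[1]]), sep = "_")
  } else {
    col
  }
}))

print(target_names)
```

This only computes the column names the flattened result should have; it does not reshape the data.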
Answer
You can use unnest_wider, but I don't think it can unnest multiple columns in one go, as referenced in this open GitHub issue.
library(tidyr)
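
# Unnest one list-column at a time; names_sep prefixes each new column
# with the name of the list-column it came from.
my_df_from_bq %>%
  unnest_wider(chances, names_sep = "_") %>%
  unnest_wider(dribbles, names_sep = "_")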
# season o_or_d chances_num_ato… chances_ato_pts… chances_ato_ppp chances_num_ato…
# <chr> <chr> <int> <int> <dbl> <dbl>
#1 2019 Offen… 6 4 0.667 0.273
#2 2019 Defen… 7 2 0.286 0.535
#3 2017 Offen… 5 2 0.4 0.147
#4 2018 Offen… 1 0 0 0
#5 2018 Defen… 6 8 1.33 0.701
# … with 3 more variables: chances_ato_ppp_pctile <dbl>, dribbles_dribbles <int>,
# dribbles_dribbles_pctile <dbl>
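With many list-columns, writing one unnest_wider call per column gets repetitive. Below is a minimal sketch of a helper that applies the same call to every list-column in turn. `flatten_list_cols` and the `toy` tibble are hypothetical names made up for this illustration; they are not part of tidyr or bigrquery, and the sketch assumes every list-column holds named lists like the ones above.

```r
library(tidyr)

# Hypothetical helper: unnest every list-column of `df`, one after another,
# prefixing the new columns with the original column name.
flatten_list_cols <- function(df, sep = "_") {
  list_cols <- names(df)[vapply(df, is.list, logical(1))]
  Reduce(
    function(acc, col) {
      unnest_wider(acc, tidyselect::all_of(col), names_sep = sep)
    },
    list_cols,
    init = df
  )
}

# Hypothetical, reduced stand-in for my_df_from_bq
toy <- tibble::tibble(
  season = c("2019", "2017"),
  chances = list(
    list(num_ato_chances = 6L, ato_ppp = 0.66667),
    list(num_ato_chances = 5L, ato_ppp = 0.4)
  ),
  dribbles = list(
    list(dribbles = 928L, dribbles_pctile = 0.4377),
    list(dribbles = 771L, dribbles_pctile = 0.0470)
  )
)

flat <- flatten_list_cols(toy)
print(names(flat))
```

Newer tidyr releases may also accept several columns in a single unnest_wider call, so it is worth checking the documentation for the installed version before reaching for a helper like this.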