List of data frames, but each list item holds a df and another value; can't get bind_rows() to combine them

Problem description

I have a list of data frames:

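library(dplyr)    # group_by(), mutate(), lag(), group_split()
library(purrr)    # map()
library(ggplot2)  # provides the diamonds dataset

mydiamonds <- diamonds %>% 
  group_by(cut, color) %>% 
  mutate(cumprice = cumsum(price)) %>% 
  mutate(lag_cumprice = lag(cumprice)) %>% 
  na.omit(.) %>%   # drops the first row of each group, where the lag is NA
  group_split %>% 
  map(~ list(dta = ., initial_val = min(.$cumprice)))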

If this were just a list of data frames and nothing else, I think I could combine them into a single data frame with just:

mydiamonds %>% bind_rows %>% glimpse

However, this produces an error:

Error: Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.

Presumably this is because it is not a plain list of data frames: each list item holds a df and a numeric value:

mydiamonds[[1]]
$dta
# A tibble: 162 x 12
   carat cut   color clarity depth table price     x     y     z cumprice lag_cumprice
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <int>        <int>
 1  0.71 Fair  D     VS2      56.9    65  2858  5.89  5.84  3.34     5706         2848
 2  0.9  Fair  D     SI2      66.9    57  2885  6.02  5.9   3.99     8591         5706
 3  1    Fair  D     SI2      69.3    58  2974  5.96  5.87  4.1     11565         8591
 4  1.01 Fair  D     SI2      64.6    56  3003  6.31  6.24  4.05    14568        11565
 5  0.73 Fair  D     VS1      66      54  3047  5.56  5.66  3.7     17615        14568
 6  0.71 Fair  D     VS2      64.7    58  3077  5.61  5.58  3.62    20692        17615
 7  0.91 Fair  D     SI2      62.5    66  3079  6.08  6.01  3.78    23771        20692
 8  0.9  Fair  D     SI2      65.9    59  3205  6     5.95  3.94    26976        23771
 9  0.9  Fair  D     SI2      66      58  3205  6     5.97  3.95    30181        26976
10  0.9  Fair  D     SI2      64.7    54  3205  6.1   6.04  3.93    33386        30181
# … with 152 more rows

$initial_val
[1] 5706

Is there a way to tell bind_rows() to use only the $dta part of each list item?

Tags: r, dplyr

Solution


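We only need to pluck the tibble out of each list item: loop over the elements with map, and use the _dfr suffix to row-bind all of the extracted tibbles.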

library(purrr)
mydiamonds_full <- map_dfr(mydiamonds, pluck, 'dta')

Checking the result:

glimpse(mydiamonds_full)
Rows: 53,905
Columns: 12
$ carat        <dbl> 0.71, 0.90, 1.00, 1.01, 0.73, 0.71, 0.91, 0.90, 0.90, 0.90, 0.90, 0.90, 0.25, 0.70, 1.00, 0.90, 0.95, 0.90, 0.90, 1.00, 0.90, 0.90, 0.91, 0.91, 1.03, 0.90…
$ cut          <ord> Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair…
$ color        <ord> D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D…
$ clarity      <ord> VS2, SI2, SI2, SI2, VS1, VS2, SI2, SI2, SI2, SI2, SI2, SI2, VS1, VVS2, SI2, SI1, SI2, SI2, SI2, SI2, SI1, SI1, SI1, SI1, SI2, SI1, SI2, SI2, SI1, SI2, SI2…
$ depth        <dbl> 56.9, 66.9, 69.3, 64.6, 66.0, 64.7, 62.5, 65.9, 66.0, 64.7, 65.7, 64.7, 61.2, 58.5, 64.8, 66.4, 64.4, 64.9, 64.5, 65.2, 64.8, 64.5, 64.7, 65.2, 66.4, 65.7…
$ table        <dbl> 65, 57, 58, 56, 54, 58, 66, 59, 58, 54, 60, 59, 55, 62, 60, 59, 60, 57, 61, 56, 59, 61, 61, 57, 56, 65, 66, 58, 61, 59, 59, 60, 57, 66, 66, 55, 56, 59, 53…
$ price        <int> 2858, 2885, 2974, 3003, 3047, 3077, 3079, 3205, 3205, 3205, 3205, 3205, 563, 3296, 3304, 3382, 3384, 3473, 3473, 3634, 3689, 3689, 3730, 3730, 3743, 3751,…
$ x            <dbl> 5.89, 6.02, 5.96, 6.31, 5.56, 5.61, 6.08, 6.00, 6.00, 6.10, 5.98, 6.09, 4.09, 5.72, 6.23, 5.97, 6.06, 6.03, 6.10, 6.27, 6.10, 6.05, 6.06, 6.08, 6.31, 6.06…
$ y            <dbl> 5.84, 5.90, 5.87, 6.24, 5.66, 5.58, 6.01, 5.95, 5.97, 6.04, 5.93, 5.99, 4.11, 5.81, 6.18, 5.92, 6.02, 5.98, 6.00, 6.21, 6.03, 6.01, 5.99, 6.04, 6.19, 5.94…
$ z            <dbl> 3.34, 3.99, 4.10, 4.05, 3.70, 3.62, 3.78, 3.94, 3.95, 3.93, 3.91, 3.91, 2.51, 3.37, 4.02, 3.95, 3.89, 3.90, 3.90, 4.07, 3.93, 3.89, 3.90, 3.95, 4.15, 3.94…
$ cumprice     <int> 5706, 8591, 11565, 14568, 17615, 20692, 23771, 26976, 30181, 33386, 36591, 39796, 40359, 43655, 46959, 50341, 53725, 57198, 60671, 64305, 67994, 71683, 75…
$ lag_cumprice <int> 2848, 5706, 8591, 11565, 14568, 17615, 20692, 23771, 26976, 30181, 33386, 36591, 39796, 40359, 43655, 46959, 50341, 53725, 57198, 60671, 64305, 67994, 716…
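
The same idea can be tried on a much smaller list. The sketch below is only an illustration: `toy` is a hypothetical stand-in for `mydiamonds`, and it uses `map()` with a string (which extracts the element of that name from each item) followed by `bind_rows()` rather than `map_dfr()`. Recent purrr releases mark `map_dfr()` as superseded, so this two-step form may be preferable, assuming a reasonably current purrr and dplyr.

```r
library(dplyr)
library(purrr)

# Hypothetical toy list: each item holds a data frame plus a scalar,
# mirroring the shape of `mydiamonds`.
toy <- list(
  list(dta = tibble(g = "a", x = 1:2), initial_val = 1L),
  list(dta = tibble(g = "b", x = 3:5), initial_val = 3L)
)

# map("dta") pulls the `dta` element out of every item,
# leaving a plain list of data frames that bind_rows() can stack.
toy_full <- toy %>%
  map("dta") %>%
  bind_rows()

nrow(toy_full)  # 5 rows: 2 from the first item, 3 from the second
```

Once only the `dta` elements are left, `bind_rows()` sees the plain list of data frames the question originally expected.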

Alternatively, use keep to retain only the tibble elements of each item, then flatten and row-bind them:

map_dfr(mydiamonds, ~ keep(.x, is_tibble) %>% 
        flatten_dfr)
# A tibble: 53,905 x 12
   carat cut   color clarity depth table price     x     y     z cumprice lag_cumprice
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <int>        <int>
 1  0.71 Fair  D     VS2      56.9    65  2858  5.89  5.84  3.34     5706         2848
 2  0.9  Fair  D     SI2      66.9    57  2885  6.02  5.9   3.99     8591         5706
 3  1    Fair  D     SI2      69.3    58  2974  5.96  5.87  4.1     11565         8591
 4  1.01 Fair  D     SI2      64.6    56  3003  6.31  6.24  4.05    14568        11565
 5  0.73 Fair  D     VS1      66      54  3047  5.56  5.66  3.7     17615        14568
 6  0.71 Fair  D     VS2      64.7    58  3077  5.61  5.58  3.62    20692        17615
 7  0.91 Fair  D     SI2      62.5    66  3079  6.08  6.01  3.78    23771        20692
 8  0.9  Fair  D     SI2      65.9    59  3205  6     5.95  3.94    26976        23771
 9  0.9  Fair  D     SI2      66      58  3205  6     5.97  3.95    30181        26976
10  0.9  Fair  D     SI2      64.7    54  3205  6.1   6.04  3.93    33386        30181
# … with 53,895 more rows
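
To see what the `keep()` step does on its own, here is a minimal sketch on a single hypothetical item. It uses base R's `is.data.frame()` as the predicate instead of `is_tibble`, which is an assumption made so that the example works with plain data frames as well.

```r
library(purrr)

# Hypothetical single list item with the same shape as mydiamonds[[1]].
item <- list(dta = data.frame(x = 1:3), initial_val = 1L)

# keep() retains only the elements for which the predicate returns TRUE,
# so the numeric `initial_val` is dropped and `dta` survives.
kept <- keep(item, is.data.frame)

names(kept)  # "dta"
```

The result is still a list (of length one), which is why the answer above flattens it before row-binding.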

Or using base R:

do.call(rbind, lapply(mydiamonds, \(x) x$dta))
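
Note that the `\(x)` shorthand for anonymous functions requires R 4.1.0 or later. A sketch of an equivalent that should also run on older versions is shown below; `toy` is a hypothetical stand-in for `mydiamonds`, and `[[` is passed to `lapply()` as the extractor.

```r
# Hypothetical toy list with the same shape as `mydiamonds`.
toy <- list(
  list(dta = data.frame(g = "a", x = 1:2), initial_val = 1L),
  list(dta = data.frame(g = "b", x = 3:5), initial_val = 3L)
)

# Extract the `dta` element from every item, then stack the pieces.
parts <- lapply(toy, `[[`, "dta")
toy_full <- do.call(rbind, parts)

dim(toy_full)  # 5 rows, 2 columns
```

`rbind()` matches data frame columns by name, so this form assumes every `dta` has the same set of columns.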
