How do I apply pivot_wider to an entire data frame?

Problem description

I can use the following command to pivot_wider on a specific column:

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

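I want to apply pivot_wider to every column in the data frame (except an id column). What is the best way to do this? Should I use a loop, or is there a way to have the function take the whole data frame?

To clarify, with the example data frames below, I can use the pivot_wider call listed above to go from old_df to new_df. I now want to go from old_df2 to new_df2.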

old_df <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), col11 = c("dog", 
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

new_df2 <- structure(list(id = c("1", "2"), yellow = c("1", "NULL"), green = c("1", "1"), dog = c("1", "1"), cat = c("1", "NULL")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

Tags: r, dataframe, pivot, reshape, tidyr

Solution


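If you want a separate column for each distinct value found across these two columns (or any number of columns), first use pivot_longer to stack the values of all those columns into a single column, then use pivot_wider to spread them out. The final summarise step collapses the result to one row per id: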

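library(tidyr)
library(dplyr)  # provides %>%, select(), group_by(), summarise(), across()

old_df2 %>%
  # stack every column except id and value_col into name/value pairs
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  # spread each distinct value into its own column
  pivot_wider(names_from = vals, values_from = value_col) %>%
  select(-Cols) %>%
  # collapse to one row per id; combinations that never occur sum to 0
  group_by(id) %>%
  summarise(across(everything(), ~ sum(as.numeric(.x), na.rm = TRUE)))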

# A tibble: 2 x 5
  id    yellow   dog green   cat
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          1     1     1     1
2 2          0     1     1     0
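
The output above reports missing combinations as 0, whereas the new_df2 in the question marks them as missing. A minimal sketch of one possible variant follows: it drops the column that records where each value came from before widening, so that id is the only identifying column and absent combinations come out as NA. The names source_col, level, and wide_na are hypothetical and chosen only for illustration, and the sketch assumes each id/value pair occurs at most once, as it does in the example data.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt with tibble() for brevity
old_df2 <- tibble(
  id        = c("1", "1", "2"),
  col10     = c("yellow", "green", "green"),
  col11     = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1")
)

wide_na <- old_df2 %>%
  # stack col10 and col11; source_col and level are hypothetical names
  pivot_longer(!c(id, value_col), names_to = "source_col", values_to = "level") %>%
  # drop the origin column so that id alone identifies each output row
  select(-source_col) %>%
  # assumes each (id, level) pair is unique; missing pairs become NA
  pivot_wider(names_from = level, values_from = value_col)

print(wide_na)
```

If an id/value pair can repeat in your data, pivot_wider will warn about duplicates, and an aggregation passed through values_fn (as in the question's own call) would be needed.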
