首页 > 解决方案 > 旋转数据框并粘贴字符

问题描述

我有两个数据框 df 和 df_class

df <- structure(list(a = c(1,2,2,0),
                       b = c(2,1,2,2)),row.names=c(NA,-4L) ,class = "data.frame")
  a b
1 1 2
2 2 1
3 2 2
4 0 2
df_class <- structure(list(atype = c("A1","A2","A3","A4"),
                                 btype = c("B1","B2","B3","B4")),row.names=c(NA,-4L) ,class = "data.frame")
  atype btype
1    A1    B1
2    A2    B2
3    A3    B3
4    A4    B4

我使用这个将 df 更改为宽格式(归功于 Ronak Shah):

library(dplyr)
library(tidyr)

df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  filter(value != 0) %>%
  group_by(row, value) %>%
  summarise(val = paste(name, collapse = "/")) %>%
  pivot_wider(names_from = value, values_from = val)
  row    1   2
1   1    a   b
2   2    b   a
3   3 <NA> a/b
4   4 <NA>   b

使用来自 df 的模式,我想获得一个转换后的 df_class,如下所示:

  1_type 2_type
1    A1    B1
2    B2    A2
3    NA    A3/B3
4    NA    B4

我尝试组合 df 和 df_class,然后使用 pivot_wider(cols=c("a","b")) 但无法正确获得正确的 1_type、2_type 列。实际上,我有 4 列,并且正在将它们变成 2。

标签: rdplyrtidyr

解决方案


在这种情况下,我们可以在通过连接将“df_class”转换为“long”格式后从“df_class”创建一个列,然后summarise使用“df_class”中的列代替“value”列

library(dplyr)
library(tidyr)
library(stringr)
df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  left_join( df_class %>% 
                mutate(row = row_number()) %>%
                pivot_longer(cols = -row) %>%
                mutate(name = str_remove(name, 'type')),
     by = c('row', 'name')) %>% 
  filter(value.x != 0) %>%
  group_by(row, value.x = str_c(value.x, "_type")) %>%
  summarise(val = str_c(value.y, collapse="/")) %>% 
  ungroup %>%
  pivot_wider(names_from = value.x, values_from = val) %>%
  select(-row)
# A tibble: 4 x 2
#  `1_type` `2_type`
#  <chr>    <chr>   
#1 A1       B1      
#2 B2       A2      
#3 <NA>     A3/B3   
#4 <NA>     B4     

或者另一种选择是将数据集绑定在一起bind_colsnames_sep在执行时使用pivot_longer

df %>%
   rename_all(~ str_c(., '_number')) %>%
   bind_cols(df_class %>%
                rename_all(~ str_replace(., 'type', '_type'))) %>% 
   mutate(rn = row_number()) %>%
   pivot_longer(cols = -rn, names_sep = "_", names_to = c("group", ".value")) %>%
   filter(number != 0) %>% 
   group_by(rn, number = str_c(number, "_type")) %>%
   summarise(type = str_c(type, collapse="/")) %>% 
   ungroup %>% 
   pivot_wider(names_from = number, values_from = type) %>% 
   select(-rn)
# A tibble: 4 x 2
#  `1_type` `2_type`
#  <chr>    <chr>   
#1 A1       B1      
#2 B2       A2      
#3 <NA>     A3/B3   
#4 <NA>     B4    

推荐阅读