R - Using column indices in the gather function

Problem description

I'm trying to use the gather function to combine multiple columns and make my wide data longer. Sample data below:

User ID  Book 1  Book 1_YN  Book 2  Book 2_YN  Book 3  Book 3_YN  Book 4  Book 4_YN
1        ABC     Y          XYZ     N          LMN     Y
2        XYZ     Y          DEF     Y
3        ABC     N          XYZ     Y          TUV     N          HIJ     Y

Ideally, I'd like the data to look like the table below so that I can summarize information about the books:

User ID  Book_Num  Book  Book_YN
1        Book 1    ABC   Y
1        Book 2    XYZ   N
1        Book 3    LMN   Y
2        Book 1    XYZ   Y
2        Book 2    DEF   Y
3        Book 1    ABC   N
3        Book 2    XYZ   Y
3        Book 3    TUV   N
3        Book 4    HIJ   Y

When I try to use column indices in the gather function...

data_clean <- gather(data, Book_Num, Book, data[c(2,4,6,8)])

I get the following error: "Error: data[c(2,4,6,8)] must evaluate to column positions or names, not a list"

Does anyone know what this error means and/or whether there is a better way to approach this task?

*Edited to change the image to a table

Tags: r tidyr

Solution


One option is melt from data.table

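library(data.table)
# Melt both column sets in one pass: the first pattern collects the "Book n"
# columns, the second collects the "Book n_YN" columns; NA values are dropped
melt(setDT(df1), measure.vars = patterns("^Book \\d+$", "^Book \\d+_YN$"), na.rm = TRUE,
     value.name = c("Book", "Book_YN"), variable.name = "Book_Num")[,
      # Book_Num holds the group index (1, 2, ...); prefix it with "Book"
      Book_Num := paste("Book", Book_Num)][order(`User ID`)]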
#   User ID Book_Num Book Book_YN
#1:       1   Book 1  ABC       Y
#2:       1   Book 2  XYZ       N
#3:       1   Book 3  LMN       Y
#4:       2   Book 1  XYZ       Y
#5:       2   Book 2  DEF       Y
#6:       3   Book 1  ABC       N
#7:       3   Book 2  XYZ       Y
#8:       3   Book 3  TUV       N
#9:       3   Book 4  HIJ       Y
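Since the question was specifically about column indices: melt() from data.table also accepts a list of integer vectors as measure.vars, so the paired columns can be picked by position instead of by regex. A minimal sketch, assuming the column layout of the sample data (Book columns at positions 2, 4, 6, 8 and their _YN partners at 3, 5, 7, 9); it rebuilds the sample data so it runs on its own, and the names dt and res are only illustrative.

```r
library(data.table)

# Sample data from the question (same contents as df1)
dt <- data.table(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y")
)

# Each list element is one group of measure columns, given by position
# (assumed layout): 2, 4, 6, 8 feed "Book" and 3, 5, 7, 9 feed "Book_YN"
res <- melt(dt, id.vars = "User ID",
            measure.vars = list(c(2L, 4L, 6L, 8L), c(3L, 5L, 7L, 9L)),
            value.name = c("Book", "Book_YN"),
            variable.name = "Book_Num", na.rm = TRUE)

# Book_Num holds the group index (1..4); turn it into "Book 1", "Book 2", ...
res[, Book_Num := paste("Book", Book_Num)]
setorderv(res, c("User ID", "Book_Num"))
print(res)
```

Positions are brittle if columns are ever added or reordered, which is why the patterns() version above is usually the safer choice.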

Or using pivot_longer

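library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
    # "Book 1_YN" -> "Book_YN 1", so every name reads "<output column> <number>"
    rename_at(-1, ~ str_replace(., ' (\\d+)_YN', '_YN \\1')) %>%
    # ".value" sends the first part of each name (Book / Book_YN) to its own column
    pivot_longer(cols = -`User ID`, names_to = c(".value", "Book_Num"),
      names_sep = " ", values_drop_na = TRUE) %>%
    mutate(Book_Num = str_c('Book ', Book_Num))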
# A tibble: 9 x 4
#  `User ID` Book_Num Book  Book_YN
#      <int> <chr>    <chr> <chr>  
#1         1 Book 1   ABC   Y      
#2         1 Book 2   XYZ   N      
#3         1 Book 3   LMN   Y      
#4         2 Book 1   XYZ   Y      
#5         2 Book 2   DEF   Y      
#6         3 Book 1   ABC   N      
#7         3 Book 2   XYZ   Y      
#8         3 Book 3   TUV   N      
#9         3 Book 4   HIJ   Y      
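A variant that skips the renaming step is to build the pivot specification explicitly and edit which output column (.value) each input column feeds. This is a sketch assuming tidyr 1.0 or later, where build_longer_spec() and pivot_longer_spec() are available; spec and res are illustrative names.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (same contents as df1)
df1 <- data.frame(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One spec row per input column: .name = source column, .value = target column
spec <- build_longer_spec(df1, cols = -`User ID`, names_to = "Book_Num") %>%
  mutate(
    # "Book n_YN" columns feed Book_YN; plain "Book n" columns feed Book
    .value   = if_else(grepl("_YN$", .name), "Book_YN", "Book"),
    # Both columns of a pair share one Book_Num value, e.g. "Book 1"
    Book_Num = sub("_YN$", "", .name)
  )

res <- pivot_longer_spec(df1, spec, values_drop_na = TRUE) %>%
  arrange(`User ID`, Book_Num)
print(res)
```

Editing the spec keeps the original column names untouched, at the cost of one extra step compared with the names_sep approach.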

Data

df1 <- structure(list(`User ID` = 1:3, `Book 1` = c("ABC", "XYZ", "ABC"
), `Book 1_YN` = c("Y", "Y", "N"), `Book 2` = c("XYZ", "DEF", 
"XYZ"), `Book 2_YN` = c("N", "Y", "Y"), `Book 3` = c("LMN", NA, 
"TUV"), `Book 3_YN` = c("Y", NA, "N"), `Book 4` = c(NA, NA, "HIJ"
), `Book 4_YN` = c(NA, NA, "Y")), class = "data.frame", row.names = c(NA, 
-3L))
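On the error in the question itself: data[c(2,4,6,8)] is a data frame, i.e. a list of columns, which is what gather() is complaining about. Its column arguments are tidyselect expressions, so positions are passed bare (for example c(2, 4, 6, 8) or 2:9). Note that selecting only 2, 4, 6, 8 would leave the _YN columns behind as id columns, so a gather()-only route needs a few more steps. A minimal sketch, assuming a tidyr version where gather() and spread() are still available (they are superseded by pivot_longer()/pivot_wider() but not removed); the intermediate names key, value and field are illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (same contents as df1)
df1 <- data.frame(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# The failing call passed this object, which is a data frame, not positions
print(class(df1[c(2, 4, 6, 8)]))  # "data.frame"

res <- df1 %>%
  # Columns 2 to 9 are selected by position; NA cells are dropped
  gather(key = "key", value = "value", 2:9, na.rm = TRUE) %>%
  # "Book 1" -> Book_Num "Book 1", field NA; "Book 1_YN" -> "Book 1", "YN"
  separate(key, into = c("Book_Num", "field"), sep = "_", fill = "right") %>%
  mutate(field = if_else(is.na(field), "Book", "Book_YN")) %>%
  # Back to one column per field: one row per user and book
  spread(field, value) %>%
  arrange(`User ID`, Book_Num)
print(res)
```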
