首页 > 解决方案 > Reshape from long to wide according to the number of occurrence of one variable

问题描述

I have a dataframe looks like this

df1<-structure(list(person = c("a", "a", "a", "a", "b", "b", "b", 
"c"), visitID = c(123, 123, 256, 816, 237, 828, 828, 911), v1 = c(10, 
5, 15, 8, 95, 41, 31, 16), v2 = c(8, 72, 29, 12, 70, 23, 28, 
66), v3 = c(0, 1, 0, 0, 1, 1, 0, 1)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

Where person is the name/id of the person and visitID is a number generated for each visit. Now each visit may have one or multiple variables (v1, v2, v3). My problem is that I'm trying to transform the structure where cases are aggregated into unique row with wide visits and variables, to look like:

df2<-structure(list(person = c("a", "b", "c"), visit1 = c(123, 237, 
911), visit2 = c(256, 828, NA), visit3 = c(816, NA, NA), v1.visit1 = c("10,5", 
"95", "16"), v1.visit2 = c("15", "41,31", NA), v1.visit3 = c("8", 
NA, NA), v2.visit1 = c("8,72", "70", "66"), v2.visit2 = c("29", 
"23,28", NA), v1.visit3 = c("12", NA, NA), v3.visit1 = c("0,1", 
"1", "1"), v3.visit2 = c("0", "1,0", NA), v3.visit3 = c("0", 
NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"))

Methods I have tried so far:
Method1:
1-aggregate according to "person" with all other variables separated by comma
2-split the variables into multiple columns
The problem with this method is that I would not know then which variable corresponds to which visit, especially that some may have multiple entries and some may not.

Method2:
1-Count number of each visitID. Take the maximum number of visits per unique person (in the case above is 3)
2-Create 3 columns for each variable. didn't know how to proceed from here

I found a great answer in the thread Reshape three column data frame to matrix ("long" to "wide" format) so tried working around with reshape and pivot_wider but couldn't get it to work.

Any ideas are appreciated even if did not lead to the same output. Thank you

标签: r

解决方案


You can try something like this:

df1 %>% 
  group_by(person, visitID) %>% 
  summarise(across(matches("v[0-9]+"), list)) %>%
  group_by(person) %>%
  mutate(visit = seq_len(n()) %>% str_c("visit.", .)) %>%  
  ungroup() %>%
  pivot_wider(
    id_cols = person, 
    names_from = visit, 
    values_from = c("visitID", matches("v[0-9]+"))
  ) 

replace list with ~str_c(.x, collapse = ",") if you want to have it in character style.


推荐阅读