r - Reshape from long to wide according to the number of occurrence of one variable
问题描述
I have a dataframe looks like this
df1<-structure(list(person = c("a", "a", "a", "a", "b", "b", "b",
"c"), visitID = c(123, 123, 256, 816, 237, 828, 828, 911), v1 = c(10,
5, 15, 8, 95, 41, 31, 16), v2 = c(8, 72, 29, 12, 70, 23, 28,
66), v3 = c(0, 1, 0, 0, 1, 1, 0, 1)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
Where person is the name/id of the person and visitID
is a number generated for each visit. Now each visit may have one or multiple variables (v1
, v2
, v3
). My problem is that I'm trying to transform the structure where cases are aggregated into unique row with wide visits and variables, to look like:
df2<-structure(list(person = c("a", "b", "c"), visit1 = c(123, 237,
911), visit2 = c(256, 828, NA), visit3 = c(816, NA, NA), v1.visit1 = c("10,5",
"95", "16"), v1.visit2 = c("15", "41,31", NA), v1.visit3 = c("8",
NA, NA), v2.visit1 = c("8,72", "70", "66"), v2.visit2 = c("29",
"23,28", NA), v1.visit3 = c("12", NA, NA), v3.visit1 = c("0,1",
"1", "1"), v3.visit2 = c("0", "1,0", NA), v3.visit3 = c("0",
NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
Methods I have tried so far:
Method1:
1-aggregate according to "person" with all other variables separated by comma
2-split the variables into multiple columns
The problem with this method is that I would not know then which variable corresponds to which visit, especially that some may have multiple entries and some may not.
Method2:
1-Count number of each visitID
. Take the maximum number of visits per unique person (in the case above is 3)
2-Create 3 columns for each variable.
didn't know how to proceed from here
I found a great answer in the thread Reshape three column data frame to matrix ("long" to "wide" format) so tried working around with reshape and pivot_wider but couldn't get it to work.
Any ideas are appreciated even if did not lead to the same output. Thank you
解决方案
You can try something like this:
df1 %>%
group_by(person, visitID) %>%
summarise(across(matches("v[0-9]+"), list)) %>%
group_by(person) %>%
mutate(visit = seq_len(n()) %>% str_c("visit.", .)) %>%
ungroup() %>%
pivot_wider(
id_cols = person,
names_from = visit,
values_from = c("visitID", matches("v[0-9]+"))
)
replace list
with ~str_c(.x, collapse = ",")
if you want to have it in character style.
推荐阅读
- reactjs - 更新和删除(单个查询)(EF + .NET Core Api)
- java - 如何对 Kotlin 中共享相同变量的自定义对象进行相同的操作?
- python - 如何在 QTableView 中复制和粘贴多行/列(来自 pandas 数据框的数据源)?
- node.js - 节点循环依赖与 vue-cli 创建的项目
- python - 打开另一个脚本的pygame打开脚本不起作用
- r - R中是否有等效的COUNTIF?
- google-cloud-firestore - Cloud Firestore 数据存储模式 - 没有复合索引的事务争用
- oracle - Oracle 触发器无法访问 user_tab_columns
- react-redux - getServerSideProps 未填充状态
- javascript - 将 HTML 或组件传递给 Vue 插件