How to group one variable by the status of a different variable in longitudinal data in R?

Problem description

I'm new to R, so please go easy on me. I have some longitudinal data that looks like this

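Basically, I'm trying to find a way to get a table with a) the number of unique cases that have all complete data and b) the number of unique cases that have at least one incomplete or missing record. The ideal end result is this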

df<- df %>% group_by(Location)
df1<- df %>% group_by(any(Completion_status=='Incomplete' | 'Missing'))

Tags: r, dplyr, any

Solution


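I'm not sure exactly what you want, because your request and the desired output seem somewhat inconsistent, but let's give it a try. You seem to need a kind of frequency table, which you can produce with base R. Some data similar to yours is at the bottom of this answer.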

# There are two cases: 'Complete' and everything else ('MorIn' = Missing or Incomplete).
# Record which one each row falls into in a new column:
data$case <- ifelse(data$Completion_status == 'Complete', 'Complete', 'MorIn')

# Build a frequency table of Location by case and convert it to a data.frame
result <- as.data.frame.matrix(table(data$Location, data$case))

# Store the location as a regular column rather than as row names
result$Location <- rownames(result)

# Finally, assemble the result with descriptive column names. data.frame() replaces
# spaces in column names by default, so use a tibble if you want to keep them.
result <- data.frame(Location = result$Location,
                     Number.complete = result$Complete,
                     Number.incomplete.missing = result$MorIn)

result
     Location Number.complete Number.incomplete.missing
1      London               0                         1
2 Los Angeles               0                         1
3       Paris               3                         1
4     Phoenix               0                         2
5     Toronto               1                         1
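A side note on the attempt in the question: `Completion_status=='Incomplete' | 'Missing'` does not test for either value. It tries to combine a logical vector with the string 'Missing' using `|`, which R rejects. A minimal base R sketch of the membership test that was probably intended, using `%in%` (the `status` vector here is a made-up illustration, not the asker's data):

```r
# Hypothetical status vector, for illustration only
status <- c("Complete", "Incomplete", "Missing", "Complete")

# TRUE where the status is either 'Incomplete' or 'Missing'
is_problem <- status %in% c("Incomplete", "Missing")

# Does at least one record have a problem?
has_problem <- any(is_problem)
```

The same `%in%` condition can be used inside `ifelse()` if you prefer to name the problem statuses explicitly instead of treating everything that is not 'Complete' as a problem.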

Or, if you prefer a dplyr chain:

library(dplyr)

data %>%
  mutate(case = ifelse(Completion_status == 'Complete', 'Complete', 'MorIn')) %>%
  do(as.data.frame.matrix(table(.$Location, .$case))) %>%
  mutate(Location = rownames(.)) %>%
  select(3, 1, 2) %>%
  `colnames<-`(c("Location", "Number of complete ", "Number of incomplete or"))
     Location Number of complete  Number of incomplete or
1      London                   0                       1
2 Los Angeles                   0                       1
3       Paris                   3                       1
4     Phoenix                   0                       2
5     Toronto                   1                       1
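In recent dplyr releases `do()` is superseded, so the same per-row counts can probably be obtained more directly with `group_by()` and `summarise()`. The following is only a sketch under that assumption; the data frame repeats the example data from this answer, and the column names are my own choice:

```r
library(dplyr)

# Example data copied from this answer
data <- data.frame(
  ID = c("A1", "A1", "A2", "A2", "B1", "C1", "C2", "D1", "D2", "E1"),
  Location = c("Paris", "Paris", "Paris", "Paris", "London", "Toronto",
               "Toronto", "Phoenix", "Phoenix", "Los Angeles"),
  Completion_status = c("Complete", "Complete", "Incomplete", "Complete",
                        "Incomplete", "Missing", "Complete", "Incomplete",
                        "Incomplete", "Missing")
)

# Count rows per location: complete vs. anything else (incomplete or missing)
counts <- data %>%
  group_by(Location) %>%
  summarise(
    `Number complete` = sum(Completion_status == "Complete"),
    `Number incomplete or missing` = sum(Completion_status != "Complete")
  )

counts
```

Because `summarise()` returns a tibble, the column names can keep their spaces.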

With the data:

# Here is your data (next time, try to include it in a usable form in the question)
data <- data.frame(ID = c("A1","A1","A2","A2","B1","C1","C2","D1","D2","E1"),
                   Location = c('Paris','Paris','Paris','Paris','London','Toronto','Toronto','Phoenix','Phoenix','Los Angeles'),
                   Completion_status = c('Complete','Complete','Incomplete','Complete','Incomplete','Missing',
                                         'Complete','Incomplete','Incomplete','Missing'))
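The tables above count rows, whereas the question asks for the number of unique cases. If a case is identified by `ID` (an assumption on my part), one possible sketch is to first reduce each ID to a single flag and then count the flags per location. The names `per_case`, `all_complete`, `n_complete` and `n_incomplete_missing` are made up for this illustration, and the `.groups` argument assumes dplyr 1.0 or later:

```r
library(dplyr)

# Example data copied from this answer
data <- data.frame(
  ID = c("A1", "A1", "A2", "A2", "B1", "C1", "C2", "D1", "D2", "E1"),
  Location = c("Paris", "Paris", "Paris", "Paris", "London", "Toronto",
               "Toronto", "Phoenix", "Phoenix", "Los Angeles"),
  Completion_status = c("Complete", "Complete", "Incomplete", "Complete",
                        "Incomplete", "Missing", "Complete", "Incomplete",
                        "Incomplete", "Missing")
)

# Step 1: one row per case, flagged TRUE only if every record is 'Complete'
per_case <- data %>%
  group_by(Location, ID) %>%
  summarise(all_complete = all(Completion_status == "Complete"),
            .groups = "drop")

# Step 2: count fully complete cases and cases with any problem, per location
result <- per_case %>%
  group_by(Location) %>%
  summarise(n_complete = sum(all_complete),
            n_incomplete_missing = sum(!all_complete))

result
```

With this data, Paris ends up with one complete case (A1) and one case with an incomplete record (A2), rather than the row counts of 3 and 1 shown earlier.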
