Case_when problem: using sum to classify data - R/dplyr

Problem description

I'm probably doing something silly here, but I'd appreciate some help. I'm trying to classify some data that was filled in incorrectly.

df <- data.frame(ID = c("A", "A", "A","A", "A", "B", "B", "B", "B", "B"),
                 headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                 headache_days =c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"))

The rule I want: if headache_y_n is "Yes" 3 or more times for an ID, that ID meets the "prolonged" criterion; otherwise it should be "short".

So I want the following output:

output <- data.frame(ID = c("A", "A", "A","A", "A", "B", "B", "B", "B", "B"),
                 headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                 headache_days =c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"),
                 criteria =c("prolonged", "prolonged", "prolonged", "prolonged", "prolonged", "short", "short", "short", "short", "short"))
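As a quick check of the rule (not part of the original question), counting the "Yes" answers per ID in base R shows why A should come out "prolonged" and B "short". This is a minimal sketch using tapply(); yes_counts is an illustrative name.

```r
# Rebuild the example data from the question
df <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                 headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"))

# sum() on a logical vector counts its TRUE values, so this is the
# number of "Yes" answers for each ID (A: 4, B: 1)
yes_counts <- tapply(df$headache_y_n == "Yes", df$ID, sum)
print(yes_counts)
```

A has 4 "Yes" answers (at least 3, so "prolonged") and B has 1 ("short").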

My code is as follows:

library(dplyr)
df %>% group_by(ID) %>% mutate(criteria=case_when(
    sum(any(headache_y_n=="Yes") >= 3) ~ "prolonged",
    TRUE ~ "short"
))

Unfortunately, it doesn't work, and I get the following error:

Error: Problem with `mutate()` input `criteria`.
x LHS of case 1 (`sum(any(headache_y_n == "Yes") >= 3)`) must be a logical vector, not an integer vector.
ℹ Input `criteria` is `case_when(...)`.
ℹ The error occurred in group 1: ID = "A".

I can't figure out where I went wrong, which is why I'm asking for your help!

Thanks!

Tags: r, dplyr

Solution


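The any() and sum() calls are in the wrong order. In the original, any(headache_y_n == "Yes") collapses each group to a single TRUE; TRUE >= 3 is FALSE (TRUE is coerced to 1); and sum(FALSE) is the integer 0. That is why case_when() complains that the LHS is an integer vector rather than a logical one. Instead, after grouping by ID, first count the "Yes" values with sum(headache_y_n == "Yes") (summing a logical vector counts its TRUEs), then compare that count with >= 3. Wrapping the comparison in any() is optional here, because the sum is already a single value per group.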

library(dplyr)
df %>%
     group_by(ID) %>%
     mutate(criteria=case_when(
        any(sum(headache_y_n=="Yes") >= 3) ~ "prolonged",
         TRUE ~ "short"
    ))
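To see where the type error comes from, the original and corrected left-hand sides can be evaluated by hand on group A's values. This is a minimal base-R sketch; x, step1, step2, original_lhs and fixed_lhs are hypothetical names introduced for illustration.

```r
# headache_y_n values for group "A" from the question's data
x <- c("Yes", "Yes", "Yes", "No", "Yes")

# Original LHS, inside out: any() collapses to a single TRUE first
step1 <- any(x == "Yes")    # TRUE
step2 <- step1 >= 3         # TRUE is treated as 1, so 1 >= 3 is FALSE
original_lhs <- sum(step2)  # sum(FALSE) is 0L, an integer, not a logical
print(class(original_lhs))  # "integer", which case_when() rejects

# Corrected LHS: count the "Yes" values first, then compare the count
fixed_lhs <- sum(x == "Yes") >= 3  # 4 >= 3 is TRUE
print(class(fixed_lhs))            # "logical"
```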

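Dropping any() from the corrected condition gives the same result, because sum(headache_y_n == "Yes") >= 3 is already a single TRUE/FALSE per group: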

df %>%
      group_by(ID) %>%
      mutate(criteria=case_when(
         sum(headache_y_n=="Yes") >= 3 ~ "prolonged",
          TRUE ~ "short"
     ))
# A tibble: 10 x 4
# Groups:   ID [2]
#   ID    headache_y_n headache_days criteria 
#   <chr> <chr>        <chr>         <chr>    
# 1 A     Yes          2             prolonged
# 2 A     Yes          2             prolonged
# 3 A     Yes          2             prolonged
# 4 A     No           2             prolonged
# 5 A     Yes          2             prolonged
# 6 B     No           1             short    
# 7 B     No           1             short    
# 8 B     No           1             short    
# 9 B     Yes          1             short    
#10 B     No           1             short
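If dplyr is not required, the same per-ID rule can be sketched in base R with ave(), which returns each group's summary value repeated on every row of that group. This is an alternative, not part of the answer above; yes_per_id is a hypothetical name.

```r
# Rebuild the example data from the question
df <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                 headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"))

# For every row, the number of "Yes" answers in that row's ID group
yes_per_id <- ave(as.integer(df$headache_y_n == "Yes"), df$ID, FUN = sum)

# Apply the threshold row by row
df$criteria <- ifelse(yes_per_id >= 3, "prolonged", "short")
print(df)
```

ave() keeps the original row order, so criteria lines up with the dplyr result row for row.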
