Can I use purrr to run a dplyr query and save the results of each query's output?

Question

I have the following dataset:

combined <- data.frame(
  client = c('aaa','aaa','aaa','bbb','bbb','ccc','ccc','ddd','ddd','ddd'),
  type = c('norm','reg','opt','norm','norm','reg','opt','opt','opt','reg'),
  age = c('>50','>50','75+','<25','<25','>50','75+','25-50','25-50','75+'),
  cases = c('1','2','2','1','0','1','2','0','3','2'),
  IsActive = c('1','0','0','1','1','0','1','1','1','0')
)

and I have identified the unique combinations of variables:

# get unique variable combinations
unique_vars <- combined %>%
  select(1:3,5) %>%
  distinct()

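I am trying to use purrr to iterate the query combined %>% anti_join(slice(unique_vars, 1)), save the output of each query, and save a summary of cases for each output back to the unique_vars table. The slice should step through every row of unique_vars rather than being fixed at 1.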

I tried:

qry <- combined %>% anti_join(slice(unique_vars,1))


map(.x = unique_vars %>%
      slice(.),
      ~qry %>%
      summarise(CaseCnt = sum(cases)) %>%
      inner_join(.x))

My desired output would be two things:

  1. The full output of each query
  2. A new field, CaseCnt, added to the unique_vars data frame

Is this possible?

Tags: r, dplyr, purrr

Solution

Although I don't fully follow the intuition behind your query, for #1 it seems you want:

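library(dplyr)

# One anti-join result per row of unique_vars, returned as a list.
# anti_join() is a filtering join; it takes no `keep` argument.
lapply(seq_len(nrow(unique_vars)), function(x) {
  combined %>%
    anti_join(slice(unique_vars, x), by = c("client", "type", "age", "IsActive"))
})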
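
Since the question asks about purrr, the same list can also be built with purrr::map(). This is a minimal sketch rather than the answerer's own code. It assumes the four columns client, type, age and IsActive are the intended join keys; key_cols and query_outputs are names introduced here for illustration only.

```r
library(dplyr)
library(purrr)

combined <- data.frame(
  client = c('aaa','aaa','aaa','bbb','bbb','ccc','ccc','ddd','ddd','ddd'),
  type = c('norm','reg','opt','norm','norm','reg','opt','opt','opt','reg'),
  age = c('>50','>50','75+','<25','<25','>50','75+','25-50','25-50','75+'),
  cases = c('1','2','2','1','0','1','2','0','3','2'),
  IsActive = c('1','0','0','1','1','0','1','1','1','0')
)

unique_vars <- combined %>%
  select(1:3, 5) %>%
  distinct()

# Assumed join keys (hypothetical helper name, not from the original post)
key_cols <- c("client", "type", "age", "IsActive")

# One element per row of unique_vars: every row of combined that does
# NOT match that key combination
query_outputs <- map(
  seq_len(nrow(unique_vars)),
  function(i) anti_join(combined, slice(unique_vars, i), by = key_cols)
)

length(query_outputs)     # one result per unique combination
query_outputs[[1]]        # full query output for the first combination
```

Keeping the outputs in a list means each full result stays available by position, for example query_outputs[[3]] for the third row of unique_vars.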

For #2, you would want:

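# Sum the cases left after each anti-join, then keep only the numeric column.
unique_vars$CaseCnt <- lapply(seq_len(nrow(unique_vars)), function(x) {
  combined %>%
    anti_join(slice(unique_vars, x), by = c("client", "type", "age", "IsActive")) %>%
    summarise(CaseCnt = sum(as.numeric(cases)))
}) %>%
  do.call(what = rbind.data.frame, args = .) %>%
  pull(CaseCnt)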

Or, for #2, with purrr::map_df():

library(purrr)

unique_vars$CaseCnt <- map_df(seq_len(nrow(unique_vars)), function(x) {
  combined %>%
    anti_join(slice(unique_vars, x), by = c("client", "type", "age", "IsActive")) %>%
    summarise(CaseCnt = sum(as.numeric(cases)))
}) %>%
  pull(CaseCnt)

By the way, you can do this directly, without looping:

combined %>% 
  mutate(cases = as.numeric(cases)) %>%
  mutate(tot_cases = sum(cases)) %>%    # sum total cases across unique_id's
  group_by(client, type, age, IsActive) %>%
  summarize(CaseCnt = mean(tot_cases) - sum(cases))
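
The shortcut works because anti-joining one key combination removes exactly that group's rows, so the sum over the remaining rows equals the grand total minus the group's own sum. The sketch below checks that claim on the question's data by computing CaseCnt both ways and comparing. It assumes the four key columns are the join keys; key_cols, loop_counts, direct and comparison are illustrative names, not part of the original post.

```r
library(dplyr)
library(purrr)

combined <- data.frame(
  client = c('aaa','aaa','aaa','bbb','bbb','ccc','ccc','ddd','ddd','ddd'),
  type = c('norm','reg','opt','norm','norm','reg','opt','opt','opt','reg'),
  age = c('>50','>50','75+','<25','<25','>50','75+','25-50','25-50','75+'),
  cases = c('1','2','2','1','0','1','2','0','3','2'),
  IsActive = c('1','0','0','1','1','0','1','1','1','0')
)

key_cols <- c("client", "type", "age", "IsActive")  # assumed join keys
unique_vars <- distinct(combined, client, type, age, IsActive)

# Loop version: sum of cases over the rows left after each anti-join
loop_counts <- map_dbl(seq_len(nrow(unique_vars)), function(i) {
  remaining <- anti_join(combined, slice(unique_vars, i), by = key_cols)
  sum(as.numeric(remaining$cases))
})

# Direct version: grand total minus each group's own total
direct <- combined %>%
  mutate(cases = as.numeric(cases)) %>%
  group_by(client, type, age, IsActive) %>%
  summarise(group_cases = sum(cases), .groups = "drop") %>%
  mutate(CaseCnt = sum(group_cases) - group_cases)

# Attach both results to unique_vars and compare them row by row
comparison <- unique_vars %>%
  mutate(loop_CaseCnt = loop_counts) %>%
  left_join(direct, by = key_cols)

all.equal(comparison$loop_CaseCnt, comparison$CaseCnt)
```

Joining on the key columns instead of relying on row order keeps the comparison valid even though group_by() sorts the groups differently from distinct().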

Or, if what you are actually looking for is the sum of cases within each group:

combined %>% 
  mutate(cases = as.numeric(cases)) %>%
  group_by(client, type, age, IsActive) %>%
  summarize(CaseCnt = sum(cases))
