Evaluating multiple lines in dplyr

Problem description

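I have a dataset that lists the variables, the calculation I want to perform on each one (sum, number of distinct values), and the name of the new variable to create from the result.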

library(dplyr)

RefDf <- read.table(text = "Variables   Calculation NewVariable
Sepal.Length    sum Sepal.Length2
Petal.Length    n_distinct  Petal.LengthNew
", header = T)

Manual approach: summarise after grouping by the Species variable.

iris %>% group_by_at("Species") %>% 
  summarise(Sepal.Length2 = sum(Sepal.Length,na.rm = T),
            Petal.LengthNew = n_distinct(Petal.Length, na.rm = T)
            )

Automating it with eval(parse())

x <- RefDf %>% mutate(Check = paste0(NewVariable, " = ", Calculation, "(", Variables, ", na.rm = T", ")")) %>% pull(Check)
iris %>% group_by_at("Species") %>% summarise(eval(parse(text = x)))

Currently, it returns:

  Species    `eval(parse(text = x))`
  <fct>                        <int>
1 setosa                           9
2 versicolor                      19
3 virginica                       20

It should return:

  Species    Sepal.Length2 Petal.LengthNew
  <fct>              <dbl>           <int>
1 setosa              250.               9
2 versicolor          297.              19
3 virginica           329.              20
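
The gap between the two outputs can be reproduced in base R, without dplyr. The following is a minimal sketch that uses made-up strings as stand-ins for x (the names a and b are hypothetical). It suggests a likely reason for the single unnamed column: parse() returns one expression vector, eval() evaluates its elements in order and hands back only the last value, and `name = value` in parsed text is an assignment rather than a named argument.

```r
# Hypothetical stand-ins for the strings built from RefDf.
x <- c("a = 1 + 1", "b = 10 * 2")

# parse() keeps both lines in a single expression vector.
parsed <- parse(text = x)
length(parsed)

# eval() runs each element in turn but returns only the last value.
env <- new.env()
result <- eval(parsed, env)
result

# The `=` in each string acted as an assignment inside env,
# not as a named argument for the caller.
ls(env)
```

Inside summarise() the same thing happens per group, so only the value of the last expression (the n_distinct() call) reaches the output.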

Tags: r, dplyr

Solution


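You can use parse_exprs() from rlang: it parses each string into its own expression, and !!! then splices the resulting named list into summarise() as separate arguments.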

library(tidyverse)
library(rlang)

RefDf <- read.table(text = "Variables   Calculation NewVariable
Sepal.Length    sum Sepal.Length2
Petal.Length    n_distinct  Petal.LengthNew
", header = T)

# Build one call string per row, named by the new variable name
expr_txt <- set_names(str_c(RefDf$Calculation, "(", RefDf$Variables, ")"), 
                      RefDf$NewVariable)

iris %>%
     group_by_at("Species") %>%
     summarise(!!!parse_exprs(expr_txt), .groups = "drop")

# A tibble: 3 x 3
#   Species    Sepal.Length2 Petal.LengthNew
#   <fct>              <dbl>           <int>
# 1 setosa              250.               9
# 2 versicolor          297.              19
# 3 virginica           329.              20
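
To check what the splice is doing, the result can be compared with the hand-written summarise() from the question. This is a minimal sketch, assuming dplyr 1.0.0 or later (for .groups) and a version of rlang in which parse_exprs() keeps the names of its input; the expression strings are written out directly instead of being built from RefDf.

```r
library(dplyr)
library(rlang)

# Names become the output columns; values are the calls to evaluate.
expr_txt <- c(Sepal.Length2   = "sum(Sepal.Length)",
              Petal.LengthNew = "n_distinct(Petal.Length)")

# One expression per string, returned as a named list.
exprs <- parse_exprs(expr_txt)

# Splicing the list with !!! supplies each element as a named argument.
spliced <- iris %>%
  group_by(Species) %>%
  summarise(!!!exprs, .groups = "drop")

# The same summary typed out by hand, for comparison.
manual <- iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length2   = sum(Sepal.Length),
            Petal.LengthNew = n_distinct(Petal.Length),
            .groups = "drop")

identical(names(spliced), names(manual))
all.equal(as.data.frame(spliced), as.data.frame(manual))
```

To keep the na.rm = T from the question, the argument can be included in the pasted string, for example str_c(RefDf$Calculation, "(", RefDf$Variables, ", na.rm = TRUE)").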
