Programmatically multiply, select, and group by arbitrary variables with dplyr

Problem description

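In my dplyr code I often apply some operation to a data frame variable (here simply multiplication by 2, to keep the MRE small), optionally group by another variable, and then select only some of the resulting variables. To avoid duplicating code, I'd like to write a function.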

The test data frame is

library(dplyr)
library(ggplot2)  # provides the msleep data set
msleep_mini <- msleep[1:20, ]

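The function must reproduce the following behavior. If called with a single argument, e.g. sleep_total, it simply multiplies sleep_total by 2 and returns a data frame containing the columns name, vore, order and sleep_total: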

# test_1
msleep_mini %>%
  group_double_select(sleep_total)
#> # A tibble: 20 x 4
#>    name                       vore  order           sleep_total
#>    <chr>                      <chr> <chr>                 <dbl>
#>  1 Cheetah                    carni Carnivora              24.2
#>  2 Owl monkey                 omni  Primates               34  
#>  3 Mountain beaver            herbi Rodentia               28.8
#>  4 Greater short-tailed shrew omni  Soricomorpha           29.8
#>  5 Cow                        herbi Artiodactyla            8  
#>  6 Three-toed sloth           herbi Pilosa                 28.8
#>  7 Northern fur seal          carni Carnivora              17.4
#>  8 Vesper mouse               <NA>  Rodentia               14  
#>  9 Dog                        carni Carnivora              20.2
#> 10 Roe deer                   herbi Artiodactyla            6  
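For reference, the single-argument case corresponds to the following hard-coded pipeline for sleep_total, which is what the function has to generalize. This is a sketch assuming dplyr and ggplot2 are installed; res_1 is just an illustrative name:

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

msleep_mini <- msleep[1:20, ]

# Keep the identifier columns plus the target column, then double the target
res_1 <- msleep_mini %>%
  select(name, vore, order, sleep_total) %>%
  mutate(sleep_total = sleep_total * 2)

res_1
```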

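If called with two arguments, the second one is interpreted as a grouping variable. Again the first is multiplied by 2, but now the data frame is also grouped by the second argument and sorted by it, and finally a column id containing the progressive row number within each group is added. In other words, the output would be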

# test_2
msleep_mini %>%
  group_double_select(sleep_total, vore)
#> # A tibble: 20 x 5
#> # Groups:   vore [4]
#>    vore  name                       order           sleep_total    id
#>    <chr> <chr>                      <chr>                 <dbl> <int>
#>  1 carni Cheetah                    Carnivora              24.2     1
#>  2 carni Northern fur seal          Carnivora              17.4     2
#>  3 carni Dog                        Carnivora              20.2     3
#>  4 carni Long-nosed armadillo       Cingulata              34.8     4
#>  5 herbi Mountain beaver            Rodentia               28.8     1
#>  6 herbi Cow                        Artiodactyla            8       2
#>  7 herbi Three-toed sloth           Pilosa                 28.8     3
#>  8 herbi Roe deer                   Artiodactyla            6       4
#>  9 herbi Goat                       Artiodactyla           10.6     5
#> 10 herbi Guinea pig                 Rodentia               18.8     6
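The two-argument case, hard-coded for sleep_total and vore, can be sketched as below. It assumes dplyr >= 1.0.0 for relocate(), which is only used here to put the grouping column first as in the test_2 output; res_2 is an illustrative name:

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

msleep_mini <- msleep[1:20, ]

res_2 <- msleep_mini %>%
  select(name, vore, order, sleep_total) %>%
  mutate(sleep_total = sleep_total * 2) %>%
  group_by(vore) %>%
  mutate(id = row_number()) %>%  # row number restarts at 1 in each group
  arrange(vore) %>%              # arrange() ignores the grouping by default
  relocate(vore)                 # move the grouping column to the front

res_2
```

Computing id before arrange() numbers the rows in their original order within each group, which matches the Cheetah, Northern fur seal, Dog, Long-nosed armadillo sequence shown above.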

Of course, the function must work with arbitrary variables (as long as they exist in the data frame):

# test_3
msleep_mini %>%
  group_double_select(sleep_rem, order)
#> # A tibble: 20 x 5
#> # Groups:   order [9]
#>    order           name                       vore  sleep_rem    id
#>    <chr>           <chr>                      <chr>     <dbl> <int>
#>  1 Artiodactyla    Cow                        herbi       1.4     1
#>  2 Artiodactyla    Roe deer                   herbi      NA       2
#>  3 Artiodactyla    Goat                       herbi       1.2     3
#>  4 Carnivora       Cheetah                    carni      NA       1
#>  5 Carnivora       Northern fur seal          carni       2.8     2
#>  6 Carnivora       Dog                        carni       5.8     3
#>  7 Cingulata       Long-nosed armadillo       carni       6.2     1
#>  8 Didelphimorphia North American Opossum     omni        9.8     1
#>  9 Hyracoidea      Tree hyrax                 herbi       1       1
#> 10 Pilosa          Three-toed sloth           herbi       4.4     1

As far as I can tell, the only robust and maintainable way to write group_double_select is with tidy evaluation, but I may be wrong. Can you help me?
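Tidy evaluation is indeed the usual route. As a minimal sketch, assuming dplyr >= 1.0.0 (for across()) and rlang >= 0.4.0 (for the embrace operator {{ }}), a helper that doubles an arbitrary column passed as a bare name could look like this; double_col and df are hypothetical names:

```r
library(dplyr)

# {{ col }} forwards the caller's bare column name into dplyr's data masking
double_col <- function(data, col) {
  data %>% mutate(across({{ col }}, ~ .x * 2))
}

df <- tibble(a = 1:3, b = c(10, 20, 30))
double_col(df, b)
```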

Tags: r, select, group-by, dplyr, tidyeval

Solution


We can use missing() to check whether the grouping argument was supplied to the function.
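missing() is base R: inside a function it returns TRUE when the caller did not supply that argument, without evaluating it. A toy illustration (describe_call is a hypothetical name):

```r
describe_call <- function(x, group) {
  # group has no default, so missing(group) tells us whether it was passed
  if (missing(group)) "no grouping" else "grouped"
}

describe_call(1)     # "no grouping"
describe_call(1, 2)  # "grouped"
```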

library(dplyr)

group_double_select <- function(data, colVar, groupVar) {
  # Capture the column to double as a quosure
  colVar <- enquo(colVar)

  if (missing(groupVar)) {
    # One argument: select the columns and double colVar
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!!quo_name(colVar) := !!colVar * 2)
  } else {
    # Two arguments: additionally group, number the rows within each group, and sort
    groupVar <- enquo(groupVar)
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!!quo_name(colVar) := !!colVar * 2) %>%
      group_by(!!groupVar) %>%
      mutate(id = row_number()) %>%
      arrange(!!groupVar)
  }
}
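With dplyr >= 1.0.0 the same function can be written with the embrace operator {{ }} instead of enquo()/!!, and the shared part of the pipeline only needs to be written once. This is a sketch, not the answerer's code; group_double_select2 is a hypothetical name and, like the original, it assumes the grouping variable is one of the selected columns (name, vore, order):

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

group_double_select2 <- function(data, colVar, groupVar) {
  # Common part: keep the identifier columns and double the target column
  out <- data %>%
    select(name, vore, order, {{ colVar }}) %>%
    mutate(across({{ colVar }}, ~ .x * 2))

  # Without a grouping variable we are done
  if (missing(groupVar)) {
    return(out)
  }

  # Otherwise: group, number rows within each group, then sort by the group
  out %>%
    group_by({{ groupVar }}) %>%
    mutate(id = row_number()) %>%
    arrange({{ groupVar }})
}

msleep_mini <- msleep[1:20, ]
res_a <- group_double_select2(msleep_mini, sleep_total)
res_b <- group_double_select2(msleep_mini, sleep_rem, order)
```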

Testing:

msleep_mini %>%
       group_double_select(sleep_total, vore) %>%
       head
# A tibble: 6 x 5
# Groups:   vore [2]
#  name                 vore  order        sleep_total    id
#  <chr>                <chr> <chr>              <dbl> <int>
#1 Cheetah              carni Carnivora           24.2     1
#2 Northern fur seal    carni Carnivora           17.4     2
#3 Dog                  carni Carnivora           20.2     3
#4 Long-nosed armadillo carni Cingulata           34.8     4
#5 Mountain beaver      herbi Rodentia            28.8     1
#6 Cow                  herbi Artiodactyla         8       2



msleep_mini %>% 
       group_double_select(sleep_total) %>%
       head
# A tibble: 6 x 4
#  name                       vore  order        sleep_total
#  <chr>                      <chr> <chr>              <dbl>
#1 Cheetah                    carni Carnivora           24.2
#2 Owl monkey                 omni  Primates            34  
#3 Mountain beaver            herbi Rodentia            28.8
#4 Greater short-tailed shrew omni  Soricomorpha        29.8
#5 Cow                        herbi Artiodactyla         8  
#6 Three-toed sloth           herbi Pilosa              28.8




msleep_mini %>%
       group_double_select(sleep_rem, order) %>%
       head
# A tibble: 6 x 5
# Groups:   order [2]
#  name              vore  order        sleep_rem    id
#  <chr>             <chr> <chr>            <dbl> <int>
#1 Cow               herbi Artiodactyla       1.4     1
#2 Roe deer          herbi Artiodactyla      NA       2
#3 Goat              herbi Artiodactyla       1.2     3
#4 Cheetah           carni Carnivora         NA       1
#5 Northern fur seal carni Carnivora          2.8     2
#6 Dog               carni Carnivora          5.8     3
