首页 > 解决方案 > “R:dplyr:如何添加将值除以第一组值的列(有点像 vlookup)”

问题描述

我正在尝试分析我的数据以按计划比较 db_perk。我想创建一个列,将 db_perk 除以计划列中第一个计划的 db_perk。这样我可以根据计划看到 db_perk 的差异。

我想把这个数据称为SQL_Table

   plan   gender marital_status accel_type extension_type inflation iss_age   dur    db_perk
   <chr>  <chr>  <chr>          <chr>        <chr>        <chr>     <chr>    <dbl>   <dbl>
 1 BasicF   F    Married        A.24         E.0          AC3.EC3    40       1      0.20
 2 BasicF   F    Married        A.24         E.0          AC3.EC3    40       2      0.25
 3 BasicF   F    Married        A.24         E.0          AC3.EC3    40       3      0.30
 4 BasicF   F    Married        A.24         E.0          AC3.EC3    40       4      0.40
 5 BasicF   M    Single         A.36         E.24         AC3.EC3    40       1      0.15
 6 GradedF  F    Married        A.24         E.0          AC3.EC3    40       1      0.25
 7 GradedF  F    Married        A.24         E.0          AC3.EC3    40       2      0.30
 8 GradedF  F    Married        A.24         E.0          AC3.EC3    40       3      0.50
 9 GradedF  F    Married        A.24         E.0          AC3.EC3    40       4      0.70
10 GradedF  M    Single         A.36         E.24         AC3.EC3    40       1      0.10

并将其转换为:

   plan   gender marital_status accel_type extension_type inflation iss_age   dur    db_perk  db_perk_compare
   <chr>  <chr>  <chr>          <chr>        <chr>        <chr>     <chr>    <dbl>   <dbl>      <dbl>
 1 BasicF   F    Married        A.24         E.0          AC3.EC3    40       1      0.20       1.00
 2 BasicF   F    Married        A.24         E.0          AC3.EC3    40       2      0.25       1.00
 3 BasicF   F    Married        A.24         E.0          AC3.EC3    40       3      0.30       1.00
 4 BasicF   F    Married        A.24         E.0          AC3.EC3    40       4      0.40       1.00
 5 BasicF   M    Single         A.36         E.24         AC3.EC3    40       1      0.15       1.00
 6 GradedF  F    Married        A.24         E.0          AC3.EC3    40       1      0.25       1.25
 7 GradedF  F    Married        A.24         E.0          AC3.EC3    40       2      0.30       1.20
 8 GradedF  F    Married        A.24         E.0          AC3.EC3    40       3      0.50       1.67
 9 GradedF  F    Married        A.24         E.0          AC3.EC3    40       4      0.70       1.75
10 GradedF  M    Single         A.36         E.24         AC3.EC3    40       1      0.10       0.67

如您所见,当计划为“BasicF”时,db_perk_compare 列 =“1”,因为公式是将 db_perk 除以 BasicF 的 db_perk。其他列也可以有多个会影响 db_perk 的不同值。

我试过这样的东西

for (i in nrow(SQL_Table)){
      SQL_Table$db_perk_compare[i] <- SQL_Table$db_perk[i]/SQL_Table$db_perk[which(plan == SQL_Table$plan[1],
                                                                                   gender == SQL_Table$gender[i],
                                                                                   marital_status == SQL_Table$marital_status[i],
                                                                                   accel_type == SQL_Table$accel_type[i],
                                                                                   extension_type  == SQL_Table$extension_type [i],
                                                                                   inflation  == SQL_Table$inflation [i],
                                                                                   iss_age    == SQL_Table$iss_age[i],
                                                                                   dur  == SQL_Table$dur[i])]
  }

但得到这个错误:

Error in which(plan == SQL_Table$plan[1], gender == SQL_Table$gender[i],  : 
  unused arguments (accel_type == SQL_Table$accel_type[i], extension_type == SQL_Table$extension_type[i], inflation == SQL_Table$inflation[i], iss_age == SQL_Table$iss_age[i], dur == SQL_Table$dur[i])

标签: rdplyr

解决方案


使用tidyverse,我们将所有要分组的列放在 中group_by,然后mutate通过除以first该列的观察值来分组“db_perk”

library(tidyverse)
SQL_Table %>%
       arrange(plan != "BasicF")%>%
       group_by(gender, marital_status, accel_type,
                extension_type, inflation, iss_age, dur) %>%
      mutate(db_perk_compare = db_perk/first(db_perk))
# A tibble: 10 x 10
# Groups:   gender, marital_status, accel_type, extension_type, inflation, iss_age, dur [5]
#   plan    gender marital_status accel_type extension_type inflation iss_age   dur db_perk db_perk_compare
#   <chr>   <chr>  <chr>          <chr>      <chr>          <chr>       <int> <int>   <dbl>           <dbl>
# 1 BasicF  F      Married        A.24       E.0            AC3.EC3        40     1    0.2            1    
# 2 BasicF  F      Married        A.24       E.0            AC3.EC3        40     2    0.25           1    
# 3 BasicF  F      Married        A.24       E.0            AC3.EC3        40     3    0.3            1    
# 4 BasicF  F      Married        A.24       E.0            AC3.EC3        40     4    0.4            1    
# 5 BasicF  M      Single         A.36       E.24           AC3.EC3        40     1    0.15           1    
# 6 GradedF F      Married        A.24       E.0            AC3.EC3        40     1    0.25           1.25 
# 7 GradedF F      Married        A.24       E.0            AC3.EC3        40     2    0.3            1.2  
# 8 GradedF F      Married        A.24       E.0            AC3.EC3        40     3    0.5            1.67 
# 9 GradedF F      Married        A.24       E.0            AC3.EC3        40     4    0.7            1.75 
#10 GradedF M      Single         A.36       E.24           AC3.EC3        40     1    0.1            0.667

推荐阅读