A more efficient way to modify the values of multiple rows using dplyr

Question

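After learning the basics of data transformation in R, I am now practicing on a dataset. I have four variables that share the same set of values, and I want to convert the numeric values to strings. I found the case_when() function on this site and applied it to each column, but I would really like a quicker way to do this.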

The data looks like this:

   climate_change air_quality water_polution trash
            <dbl>       <dbl>          <dbl> <dbl>
 1              3           2              2     1
 2              3           3              3     3
 3             NA          NA             NA    NA
 4             NA          NA             NA    NA
 5              1           1              1     1
 6              2           1              4     2
 7              2           3              3     2
 8             NA          NA             NA    NA
 9              3           3              2     2
10             NA          NA             NA    NA

I used this code:

dataset <- dataset %>%
  mutate(climate_change = case_when(
    climate_change %in% c(1) ~ "A very serious problem",
    climate_change %in% c(2) ~ "A somewhat serious problem",
    climate_change %in% c(3) ~ "Not a very serious problem",
    climate_change %in% c(4) ~ "Not at all a serious problem"),
    air_quality = case_when(
      air_quality %in% c(1) ~ "A very serious problem",
      air_quality %in% c(2) ~ "A somewhat serious problem",
      air_quality %in% c(3) ~ "Not a very serious problem",
      air_quality %in% c(4) ~ "Not at all a serious problem"),
    water_polution = case_when(
      water_polution %in% c(1) ~ "A very serious problem",
      water_polution %in% c(2) ~ "A somewhat serious problem",
      water_polution %in% c(3) ~ "Not a very serious problem",
      water_polution %in% c(4) ~ "Not at all a serious problem"),
    trash = case_when(
      trash %in% c(1) ~ "A very serious problem",
      trash %in% c(2) ~ "A somewhat serious problem",
      trash %in% c(3) ~ "Not a very serious problem",
      trash %in% c(4) ~ "Not at all a serious problem"))

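Besides the four values (1-4), these variables also contain two codes for missing values (88, 99). I left those codes out of the case_when() call because they seem to be converted to NA automatically. Is there any drawback to not coding these values as NA explicitly?

Regards

Tags: r, dplyr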

Answer


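To apply the same function to multiple columns, you can use across() in newer versions of dplyr (1.0.0 or later).

If each number maps to exactly one value, you can use recode():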

library(dplyr)

dataset %>%
  mutate(across(climate_change:trash, 
                #use everything() if you want to do it for all the columns
                ~recode(., `1` = "A very serious problem",
                           `2` = "A somewhat serious problem",
                           `3` = "Not a very serious problem",
                           `4` = "Not at all a serious problem")))
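To try the recode() approach on something runnable, here is a minimal sketch. The toy data is made up to resemble the question's columns, with 88 and 99 standing in for the missing-value codes. The `.default = NA_character_` argument is an addition that is not part of the code above; it spells out that every code not listed becomes NA, rather than relying on recode() to fall back to NA on its own.

```r
library(dplyr)

# Made-up toy data modelled on the question; 88 and 99 are the missing-value codes.
dataset <- tibble(
  climate_change = c(3, 3, NA, 1, 2, 88),
  air_quality    = c(2, 3, NA, 1, 1, 99),
  water_polution = c(2, 3, NA, 1, 4, 88),
  trash          = c(1, 3, NA, 1, 2, 99)
)

recoded <- dataset %>%
  mutate(across(climate_change:trash,
                ~ recode(.x,
                         `1` = "A very serious problem",
                         `2` = "A somewhat serious problem",
                         `3` = "Not a very serious problem",
                         `4` = "Not at all a serious problem",
                         # Assumption: any other code (here 88 and 99) should be NA.
                         .default = NA_character_)))

print(recoded)
```

All four columns come back as character vectors, and rows that held NA, 88, or 99 are NA afterwards.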

Similarly, with case_when():

dataset %>%
   mutate(across(climate_change:trash, 
                ~case_when(. == 1 ~ "A very serious problem",
                           . == 2 ~ "A somewhat serious problem",
                           . == 3 ~ "Not a very serious problem",
                           . == 4 ~"Not at all a serious problem")))
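On the question of leaving out 88 and 99: case_when() returns NA for any value that matches no condition, so the missing-value codes do end up as NA. One possible drawback is that an unexpected code (a data-entry error such as 5, say) would silently become NA as well. The sketch below is one way to keep the two cases apart. The toy data and the "unexpected code" label are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Made-up toy data: 88/99 are the documented missing codes, 5 is a stray value.
responses <- tibble(
  climate_change = c(1, 2, 88, NA, 5),
  trash          = c(4, 99, 3, NA, 1)
)

labelled <- responses %>%
  mutate(across(climate_change:trash,
                ~ case_when(
                  # Handle the known missing codes and real NAs explicitly first.
                  .x %in% c(88, 99) | is.na(.x) ~ NA_character_,
                  .x == 1 ~ "A very serious problem",
                  .x == 2 ~ "A somewhat serious problem",
                  .x == 3 ~ "Not a very serious problem",
                  .x == 4 ~ "Not at all a serious problem",
                  # Anything left over is flagged instead of silently becoming NA.
                  TRUE ~ "unexpected code"
                )))

print(labelled)
```

With this ordering, only NA, 88, and 99 turn into NA, and the stray 5 shows up under its own label so it can be inspected.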

In older versions of dplyr, you can use mutate_at():

dataset %>%
   mutate_at(vars(climate_change:trash), ~case_when....
                                          #same code
   #Use mutate_all if you want to do it for all the columns.
