R: replace a specific value across many columns of a data frame

Problem description

I am mainly interested in replacing a specific value (81) across many columns of a data frame.

For example, if this is my dataset:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        81       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   81         V017     78.12        81         
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I would like to replace the value 81 in columns Col_01, Col_02, Col_03 and Col_04 with NA. The expected final dataset looks like this:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        NA       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   NA         V017     78.12        NA
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I tried this approach:

df %>% select(matches("^Col_\\d+$"))[ df %>% select(matches("^Col_\\d+$")) == 81 ] <- NA

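It is modeled on this solution, data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10, from the question "Replace occurrences of a number in multiple columns of a data frame with another value in R".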

This did not work.

Any suggestions are much appreciated. Thanks in advance.

Tags: r, replace

Solution


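Instead of select, we can pass matches directly to across inside mutate and replace the value "81" with NA using na_if. The columns are character (they hold values such as "X48"), so the comparison value is the string "81":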

library(dplyr)
df <- df %>%
   mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))

Output:

df
   Id       Date Col_01 Col_02 Col_03 Col_04
1  30 2012-03-31      1  A42.2  20.46     43
2  36 1996-11-15     42    V73     23     55
3  96 2010-02-07    X48   <NA>     13     3R
4  40 2010-03-18   AD14  18.12  20.12     36
5  69 2012-02-21      8  22.45     12     10
6  11 2013-07-03   <NA>   V017  78.12   <NA>
7  22 2001-06-01     11     09     55     12
8  83 2005-03-16  80.45 V22.15  46.52 X29.11
9  92 2012-02-12      1      4     67     12
10 34 2014-03-10  82.12 N72.22 V45.44     10
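As a small illustration of the same pattern, here is a sketch on a made-up three-row tibble (the name toy and its values are hypothetical, not the OP's data). It is meant to show that na_if only replaces exact matches, so a value such as "81.5" is left alone, and that only columns matching the regular expression are touched.

```r
library(dplyr)

# Hypothetical toy data; the Col_NN names mirror the pattern in the question.
toy <- tibble(
  Id     = c(1L, 2L, 3L),
  Col_01 = c("81", "X48", "8"),
  Col_02 = c("A42.2", "81", "81.5")
)

# Apply na_if to every column whose name matches Col_<digits>.
# The comparison value is the string "81" because the columns are character.
out <- toy %>%
  mutate(across(matches("^Col_\\d+$"), ~ na_if(.x, "81")))

# Count the NA values per column to see where replacements happened.
colSums(is.na(out))
```

Id does not match the pattern, so it is passed through unchanged.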

Or we can use base R:

i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA
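Another base R variant, sketched here on a made-up data frame (toy and its values are hypothetical), works column by column with lapply and replace. It should give the same result as the matrix-style indexing above and can be easier to adapt if each column needs slightly different handling.

```r
# Hypothetical toy data; the Col_NN names mirror the pattern in the question.
toy <- data.frame(
  Id     = c(1L, 2L, 3L),
  Col_01 = c("81", "X48", "8"),
  Col_02 = c("A42.2", "81", "81.5"),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names match Col_<digits>.
cols <- grep("^Col_\\d+$", names(toy))

# Replace exact matches of "81" with NA, one column at a time,
# and assign the modified columns back into the data frame.
toy[cols] <- lapply(toy[cols], function(x) replace(x, x == "81", NA))

print(toy)
```

Because the assignment target is toy[cols], the Id column is never modified.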

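The issue with the OP's code is that the assignment is not triggered the way we expect. Extracting the matching values works, i.e.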

(df %>% 
     select(matches("^Col_\\d+$")))[(df %>% 
        select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"

which is the same as

df[i1][df[i1] == "81"]
[1] "81" "81" "81"

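but the assignment fails, because its left-hand side is the result of a pipeline rather than a named object: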

(df %>% 
      select(matches("^Col_\\d+$")))[(df %>% 
         select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) ==  : 
  could not find function "(<-"

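In base R, df[i1][df[i1] == "81"] <- NA works because the target is the named object df, so R can call the replacement function [<- on it.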
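To make the role of the replacement function concrete, the sketch below writes the nested assignment out as explicit calls to `[<-` and compares the result with the usual syntax. The data frame toy is made up for illustration, and the explicit form is an approximation of what R evaluates internally, not something one would normally write.

```r
# Hypothetical toy data with two character columns.
toy <- data.frame(
  Col_01 = c("81", "7"),
  Col_02 = c("3", "81"),
  stringsAsFactors = FALSE
)
i1 <- grep("^Col_\\d+$", names(toy))

# Form 1: the usual replacement syntax on a named object.
a <- toy
a[i1][a[i1] == "81"] <- NA

# Form 2: roughly the same operation as explicit calls to `[<-`.
# The inner call sets the matching cells to NA; the outer call
# writes the modified columns back, and the result is assigned to b.
b <- toy
b <- `[<-`(b, i1, value = `[<-`(b[i1], b[i1] == "81", value = NA))

# Compare the two results.
identical(a, b)
```

The final step in form 2 is an ordinary assignment to a name, which is the step that has no counterpart when the left-hand side is a piped expression.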

Data

df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L, 
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07", 
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16", 
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14", 
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2", 
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55", 
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36", 
"10", "81", "12", "X29.11", "12", "10")),
 class = "data.frame", row.names = c(NA, 
-10L))
