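r - R: Replace specific values in many columns across a data frame
Problem description
I am mainly interested in replacing a specific value (81) in many columns of a data frame.
For example, if this is my dataset: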
Id Date Col_01 Col_02 Col_03 Col_04
30 2012-03-31 1 A42.2 20.46 43
36 1996-11-15 42 V73 23 55
96 2010-02-07 X48 81 13 3R
40 2010-03-18 AD14 18.12 20.12 36
69 2012-02-21 8 22.45 12 10
11 2013-07-03 81 V017 78.12 81
22 2001-06-01 11 09 55 12
83 2005-03-16 80.45 V22.15 46.52 X29.11
92 2012-02-12 1 4 67 12
34 2014-03-10 82.12 N72.22 V45.44 10
I would like to replace the value 81 in columns Col_01, Col_02, Col_03 and Col_04
with NA. The final expected dataset looks like this:
Id Date Col_01 Col_02 Col_03 Col_04
30 2012-03-31 1 A42.2 20.46 43
36 1996-11-15 42 V73 23 55
96 2010-02-07 X48 NA 13 3R
40 2010-03-18 AD14 18.12 20.12 36
69 2012-02-21 8 22.45 12 10
11 2013-07-03 NA V017 78.12 NA
22 2001-06-01 11 09 55 12
83 2005-03-16 80.45 V22.15 46.52 X29.11
92 2012-02-12 1 4 67 12
34 2014-03-10 82.12 N72.22 V45.44 10
I tried this approach:
df %>% select(matches("^Col_\\d+$"))[ df %>% select(matches("^Col_\\d+$")) == 81 ] <- NA
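This is modeled on the solution
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10
from the question "Replace numbers appearing in multiple columns of a data frame with another value in R".
It didn't work.
Any suggestions are much appreciated. Thanks in advance.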
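The borrowed idiom does work when the root of the assignment is a named object. Here is a minimal sketch using a hypothetical numeric data frame `data`, with values invented for illustration:

```r
# Hypothetical toy data frame; all columns numeric
data <- data.frame(a = c(4, 1), b = c(2, 4), c = c(4, 4))

# Replace every 4 in columns 2 and 3 with 10.
# The logical matrix data[, 2:3] == 4 selects the cells to overwrite;
# column a (also containing a 4) is left untouched.
data[, 2:3][data[, 2:3] == 4] <- 10

print(data)
```

R can rewrite this call into nested `[<-` calls that end by assigning back into the variable `data`. That rewrite is not possible once `data` is replaced by a pipeline expression.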
Solution
Instead of select, we can use matches directly inside across() in mutate and replace the value '81' with NA (using na_if):
library(dplyr)
df <- df %>%
  mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))
-output
df
Id Date Col_01 Col_02 Col_03 Col_04
1 30 2012-03-31 1 A42.2 20.46 43
2 36 1996-11-15 42 V73 23 55
3 96 2010-02-07 X48 <NA> 13 3R
4 40 2010-03-18 AD14 18.12 20.12 36
5 69 2012-02-21 8 22.45 12 10
6 11 2013-07-03 <NA> V017 78.12 <NA>
7 22 2001-06-01 11 09 55 12
8 83 2005-03-16 80.45 V22.15 46.52 X29.11
9 92 2012-02-12 1 4 67 12
10 34 2014-03-10 82.12 N72.22 V45.44 10
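The per-column step that na_if() performs here can be approximated in base R. The helper na_if_base() below is hypothetical and not part of dplyr. It only covers the simple case in this question: exact equality with a single value, applied to columns selected by the same regex.

```r
# Hypothetical helper approximating dplyr::na_if() for one value:
# every element of x equal to y becomes NA; existing NAs stay NA.
na_if_base <- function(x, y) {
  x[which(x == y)] <- NA  # which() skips NA comparisons safely
  x
}

# Small character columns in the style of the question's data
df_small <- data.frame(
  Id     = c(96L, 11L),
  Col_01 = c("X48", "81"),
  Col_04 = c("3R", "81"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column whose name matches ^Col_\d+$
cols <- grep("^Col_\\d+$", names(df_small))
df_small[cols] <- lapply(df_small[cols], na_if_base, y = "81")

print(df_small)
```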
Or we can use base R:
i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA
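The sketch below shows what the base R version indexes with, using a hypothetical two-row subset. `df[i1] == "81"` returns a logical matrix with one column per selected column, and `which(..., arr.ind = TRUE)` lists the matching cells.

```r
df_small <- data.frame(
  Id     = c(96L, 11L),
  Col_01 = c("X48", "81"),
  Col_02 = c("81", "V017"),
  stringsAsFactors = FALSE
)

i1 <- grep("^Col_\\d+$", names(df_small))  # positions of the Col_ columns
hits <- df_small[i1] == "81"               # logical matrix, one column per Col_
print(hits)
print(which(hits, arr.ind = TRUE))         # row/col of each "81"

# The same replacement as in the answer, on the small data frame
df_small[i1][hits] <- NA
print(df_small)
```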
The problem in the OP's code is that the assignment is not carried out the way we expect. Extracting the values works, i.e.
(df %>%
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"
which is the same as
df[i1][df[i1] == "81"]
[1] "81" "81" "81"
but the assignment fails:
(df %>%
select(matches("^Col_\\d+$")))[(df %>%
select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) == :
could not find function "(<-"
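In base R, the replacement works through [<- applied to the named object df. In the OP's code, the target of the assignment is a parenthesized function call. R therefore looks for a replacement function "(<-", which does not exist.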
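The sketch below makes the difference concrete on a hypothetical two-row data frame. It compares the sugar form df[i1][...] <- NA with nested [<- calls written out by hand, a simplified version of what R evaluates. Both give the same result, and base R defines [<- but no (<-.

```r
df <- data.frame(
  Id     = c(96L, 11L),
  Col_01 = c("X48", "81"),
  Col_02 = c("81", "V017"),
  stringsAsFactors = FALSE
)
i1 <- grep("^Col_\\d+$", names(df))

# Sugar form, as in the answer
a <- df
a[i1][a[i1] == "81"] <- NA

# Written out by hand (simplified): the inner `[<-` modifies the extracted
# columns, the outer `[<-` writes them back into the named object
b <- df
b <- `[<-`(b, i1, value = `[<-`(b[i1], b[i1] == "81", value = NA))

print(isTRUE(all.equal(a, b)))
print(exists("[<-"))  # base replacement function for [
print(exists("(<-"))  # no replacement function exists for (
```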
Data
df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L,
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07",
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16",
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14",
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2",
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55",
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36",
"10", "81", "12", "X29.11", "12", "10")),
class = "data.frame", row.names = c(NA,
-10L))