Remove individual characters without changing the numbers in an R data frame

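Problem description

My data frame contains a number of arrows, as well as ">" and "<" characters, attached to some of the element values. I want to remove these characters but keep the numbers. The only thing I know how to do is replace the whole element with NA, using the code below.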

df <- apply(df, 1:2, gsub, pattern = "<|>", replacement = "")

Could someone help me edit it so that it keeps the number in each element instead of throwing the whole thing away?

Data frame:

structure(list(`Analyte  Sample` = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14), A = c("4190", "6665", "7435", "2052", 
"783", "322", "199", "90", "46", "17", "8", "3", "3", "<1↓"
), B = c("11569", "6677", "3852", "983.88", "589", "359", "203", 
"68", "33", "12", "6", "<2↓", "4", "<1↓"), C = c("20453", 
"7699", "2499", "707.98", "412", "328", "156", "88", "39", "27", 
"17", "<1↓", "<3↓", "<1↓"), D = c("7893", ">20000↑", 
"1623", "685.64", "321", "644", "112", "65", "35", "29", "9", 
"5", "<3↓", "<1↓"), E = c("320", "15444", "2049", "1065", 
"389", "365", "145", "77", "38", "16", "9", "6", "<2↓", "<2↓"
), F = c("7438", ">21999↑", "3472", "1057", "563", "401", "167", 
"89", "46", "19", "6", "<1↓", "<1↓", "<1↓"), G = c(7345, 
9001, 2473, 1138, 516, 403, 134, 81, 37, 17, 8, 6, 4, 3), H = c("9004", 
"3998", "2299", "964.88", "499", "341", "112", "88", "39", "32", 
"<29↓", "<30↓", "<31↓", "<29↓"), I = c("8434", "8700", 
"2217", "1263", "567", "352", "153", "80", "43", "18", "9", "2", 
"3", "<1↓"), J = c("7734", "6733", "2092", "1115", "637", "332", 
"155", "82", "37", "17", "10", "4", "1", "<1↓"), K = c(">3718↑", 
">3000↑", "2118", "862.13", "426", "355", "143", "78", "44", 
"22", "11", "<4↓", "<4↓", "<3↓"), L = c(6345, 7688, 2311, 
1195, 647, 366, 177, 83, 41, 20, 8, 6, 3, 2), M = c("4222", ">25587↑", 
"1846", "814.61", "422", "314", "154", "86", "41", "27", "21", 
"<2↓", "<2↓", "<3↓"), N = c("6773", "8934", "2381", "1221", 
"677", "356", "146", "89", "40", "17", "10", "5", "2", "<2↓"
), O = c(">2200↑", ">2133↑", ">2000↑", "564.5", "226", 
"476", "111", "60", "32", "36", "18", "<10↓", "<1↓", "<2↓"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-14L), spec = structure(list(cols = list(`Analyte  Sample` = structure(list(), class = c("collector_double", 
"collector")), A = structure(list(), class = c("collector_character", 
"collector")), B = structure(list(), class = c("collector_character", 
"collector")), C = structure(list(), class = c("collector_character", 
"collector")), D = structure(list(), class = c("collector_character", 
"collector")), E = structure(list(), class = c("collector_character", 
"collector")), F = structure(list(), class = c("collector_character", 
"collector")), G = structure(list(), class = c("collector_double", 
"collector")), H = structure(list(), class = c("collector_character", 
"collector")), I = structure(list(), class = c("collector_character", 
"collector")), J = structure(list(), class = c("collector_character", 
"collector")), K = structure(list(), class = c("collector_character", 
"collector")), L = structure(list(), class = c("collector_double", 
"collector")), M = structure(list(), class = c("collector_character", 
"collector")), N = structure(list(), class = c("collector_character", 
"collector")), O = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

Tags: r

Solution


I think the best approach in your case is to use a regular expression. With the tidyverse:

df %>% mutate_at(vars(A:O), ~ as.numeric(gsub("[^0-9]*([0-9]*).*", "\\1", .)))

If you only want to change values that start with < or >, do the following:

df %>% mutate_at(vars(A:O), ~ as.numeric(gsub("[<>]*([0-9]*).*", "\\1", .)))
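
To make the difference between the two patterns concrete, here is a small base R sketch. The vector x and the value "abc12" are made up for illustration and are not part of the data above.

```r
# Hypothetical sample values; "\u2193" and "\u2191" are the down and up arrows.
x <- c("4190", "<1\u2193", ">20000\u2191", "abc12")

# Pattern 1: skip any leading non-digits, then keep the first run of digits.
any_prefix <- as.numeric(gsub("[^0-9]*([0-9]*).*", "\\1", x))

# Pattern 2: skip only leading "<" or ">". For "abc12" nothing is captured,
# the result is "", and as.numeric() turns that into NA.
angle_only <- as.numeric(gsub("[<>]*([0-9]*).*", "\\1", x))

print(any_prefix)
print(angle_only)
```

The first pattern recovers 12 from "abc12", while the second yields NA for it, which may be the safer behaviour if unexpected prefixes should be flagged rather than silently cleaned.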

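Of course, you can also use apply, but note that apply converts the data frame to a matrix before applying the function. Columns that are numeric end up padded with leading spaces, so the pattern needs to include a space: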

apply(df, 2, function(x) gsub("[ <>]*([0-9]*).*", "\\1", x))
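
The padding, and the fact that apply hands back a character matrix rather than a data frame, can be seen in a minimal sketch. The two-column frame df_small is a hypothetical stand-in for the real data, and the conversion back to numeric columns is one possible follow-up step, not something the call above does by itself.

```r
# Hypothetical two-column frame: A is character, G is numeric.
df_small <- data.frame(A = c("4190", "<1\u2193"), G = c(7345, 3),
                       stringsAsFactors = FALSE)

# apply() coerces with as.matrix(); the numeric column is formatted to a
# common width, which introduces the leading spaces.
padded <- as.matrix(df_small)[, "G"]   # "7345" and "   3"

# The space in the character class strips that padding along with "<" and ">".
cleaned <- apply(df_small, 2, function(x) gsub("[ <>]*([0-9]*).*", "\\1", x))

# cleaned is a character matrix, so convert it back to numeric columns.
result <- as.data.frame(cleaned, stringsAsFactors = FALSE)
result[] <- lapply(result, as.numeric)
str(result)
```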

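Explanation:

The pattern [0-9]* matches a digit any number of times (including zero). The pattern [^0-9]* matches anything other than a digit, any number of times. The parentheses capture the run of digits, the trailing .* consumes the rest of the string, and "\\1" in the replacement puts back only the captured digits.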
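
One thing worth checking against your data: because the capture group accepts digits only, a decimal point and everything after it is discarded, so a value such as "983.88" would come back as 983. The variant below adds the dot to the captured class. It is a sketch and an assumption on my part rather than part of the patterns above.

```r
# Sample values taken from the question; "\u2193" and "\u2191" are the arrows.
x <- c("983.88", "<1\u2193", ">20000\u2191")

# Digits-only capture group: the decimal part is dropped.
digits_only <- as.numeric(gsub("[^0-9]*([0-9]*).*", "\\1", x))

# Assumed variant: also allow "." inside the captured number.
with_decimals <- as.numeric(gsub("[^0-9]*([0-9.]*).*", "\\1", x))

print(digits_only)
print(with_decimals)
```

The variant is deliberately loose: a malformed value with two dots, such as "1.2.3", would be captured whole and become NA in as.numeric.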

