Find and replace numbers in a text file in R

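Problem description

I am trying to find every sentence in a text file in R that contains a number in any format, and to wrap each of those numbers in hashtags.

For example, given the following input: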

ex <- c("I have $5.78 in my account","Hello my name is blank","do you want 1,785 puppies?", 
        "I love stack overflow!","My favorite numbers are 3, 14,568, and 78")

As the output of the function, I am looking for:

 > "I have #$5.78# in my account" 
 > "do you want #1,785# puppies?"
 > "My favorite numbers are #3#, #14,568#, and #78#"

Tags: r, regex, text-files, gsub, hashtag

Solution


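Wrapping the numbers is straightforward, assuming that any run of digits, periods, commas, hyphens, and dollar signs counts as part of a number. One caveat: `\b` cannot match between a space and `$`, so in the first result the dollar sign ends up outside the hashtags.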

gsub("\\b([-$0-9.,]+)\\b", "#\\1#", ex)
# [1] "I have $#5.78# in my account"                   
# [2] "Hello my name is blank"                         
# [3] "do you want #1,785# puppies?"                   
# [4] "I love stack overflow!"                         
# [5] "My favorite numbers are #3#, #14,568#, and #78#"
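
If the dollar sign should land inside the hashtags, as the question's expected output shows, one possible variation is to spell out the shape of a number instead of relying on `\b`. The following is a minimal sketch, not the original answer's method. It assumes a number is an optional `$`, a run of digits, and then any count of `.` or `,` separators that are each followed by more digits; negative numbers are not handled. The name `wrap_numbers` is hypothetical.

```r
# Hypothetical helper: wrap each number (with an optional leading "$") in "#".
# Assumed number shape: optional "$", digits, then zero or more groups of
# "." or "," followed by digits. Negative signs are not handled.
wrap_numbers <- function(x) {
  gsub("(\\$?\\d+(?:[.,]\\d+)*)", "#\\1#", x, perl = TRUE)
}

ex <- c("I have $5.78 in my account", "Hello my name is blank",
        "do you want 1,785 puppies?", "I love stack overflow!",
        "My favorite numbers are 3, 14,568, and 78")

wrap_numbers(ex)
# [1] "I have #$5.78# in my account"
# [2] "Hello my name is blank"
# [3] "do you want #1,785# puppies?"
# [4] "I love stack overflow!"
# [5] "My favorite numbers are #3#, #14,568#, and #78#"
```

Because a separator only counts when digits follow it, the comma after "3" and the one after "14,568" stay outside the hashtags.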

To keep only the entries that contain numbers:

grep("\\d", gsub("\\b([-$0-9.,]+)\\b", "#\\1#", ex), value = TRUE)
# [1] "I have $#5.78# in my account"                   
# [2] "do you want #1,785# puppies?"                   
# [3] "My favorite numbers are #3#, #14,568#, and #78#"
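
Since the question starts from a text file rather than a vector, the same two steps can be applied to lines read from disk. This is a rough sketch under a few assumptions: the file holds one sentence per line, the function name `tag_numbered_lines` is hypothetical, and a temporary file stands in for the real input.

```r
# Hypothetical helper: read a text file, keep only the lines that contain a
# digit, and wrap the numbers using the same pattern as the answer above.
tag_numbered_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  tagged <- gsub("\\b([-$0-9.,]+)\\b", "#\\1#", lines)
  tagged[grepl("\\d", lines)]
}

# Demo with a temporary file standing in for the real input file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Hello my name is blank", "do you want 1,785 puppies?"), tmp)
result <- tag_numbered_lines(tmp)
print(result)
# [1] "do you want #1,785# puppies?"
unlink(tmp)
```

The result could then be written back out with `writeLines(result, "some_output.txt")`, where the output path is again only a placeholder.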
