Replacing words in R

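Problem description

I have a data frame of words with their synonyms. In a different data frame, I have sentences. I want to search the sentences for synonyms from the first data frame. If a synonym is found, it should be replaced with the word that the synonym belongs to.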

dt = read.table(header = TRUE, 
text ="Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come    'advance, approach, arrive, near, reach'
Go  'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry   'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide    'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
", stringsAsFactors = FALSE)


mydf = read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "sentence
'I can utilize this file'
'I can cover these things'
")

The desired output looks like:

I can Use this file
I can Hide these things

The above is only a sample. My real dataset has more than 10,000 sentences.

Tags: r, stringr

Solution


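The ", " separators in dt$Synonyms can be replaced with "|" so that each entry can be used as the pattern argument of gsub. With dt$Synonyms as the pattern, any occurrence of one of the alternatives (separated by "|") is replaced with the corresponding dt$Word. This can be done with sapply and gsub:

Edited: added a word-boundary check (as part of the gsub pattern), as suggested by the OP.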
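
To see what the generated pattern looks like and why the word-boundary anchors matter, here is a minimal sketch on a single synonym list. The variable names (syn, plain, bounded) and the test sentences are illustrative and not part of the original post.

```r
# One synonym list, copied from the "Come" row of dt
syn <- "advance, approach, arrive, near, reach"

# Plain alternation: only the separators are swapped for "|"
plain <- gsub(", ", "|", syn)

# Bounded alternation: every alternative is wrapped in \b ... \b
bounded <- paste0("\\b", gsub(", ", "\\\\b|\\\\b", syn), "\\b")

# Show the regex as the regex engine sees it
cat(bounded, "\n")
# \badvance\b|\bapproach\b|\barrive\b|\bnear\b|\breach\b

# Without boundaries, "near" inside "nearly" is also replaced
r1 <- gsub(plain, "Come", "I am nearly there")
# With boundaries, "nearly" is left alone ...
r2 <- gsub(bounded, "Come", "I am nearly there")
# ... while the whole word "near" is still replaced
r3 <- gsub(bounded, "Come", "I am near the door")

print(c(r1, r2, r3))
# "I am Comely there"  "I am nearly there"  "I am Come the door"
```

Note that the match is case-sensitive, so a replacement such as "Hurry" is not picked up again by a later pattern containing "hurry".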

# First replace `, ` with `|` in dt$Synonyms and wrap each synonym in
# word boundaries (\b). dt$Synonyms can then be used as the 'pattern'
# argument of `gsub`.
dt$Synonyms <- paste("\\b",gsub(", ","\\\\b|\\\\b",dt$Synonyms),"\\b", sep = "")

# Loop through each row of 'dt' to replace Synonyms with word using sapply
mydf$sentence <- sapply(mydf$sentence, function(x){
  for(row in seq_len(nrow(dt))){
    x = gsub(dt$Synonyms[row],dt$Word[row], x)
  }
  x
})

mydf
#                  sentence
# 1     I can Use this file
# 2 I can Hide these things
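
Since the question mentions more than 10,000 sentences, it may help that gsub is vectorized over its input: the loop can run once per dictionary row over the whole column instead of once per sentence. The following is a sketch of that variant, not part of the original answer, and no timings are claimed for it. It rebuilds a small subset of the sample data so that it runs on its own; the names words, synonyms, patterns and sentences are illustrative.

```r
# Subset of the dictionary from the question (two of the six rows)
words    <- c("Use", "Hide")
synonyms <- c("employ, utilize, exhaust, spend, expend, consume, exercise",
              "conceal, cover, mask, cloak, camouflage, screen, shroud, veil")

# Build word-bounded alternation patterns, one per target word
patterns <- paste0("\\b", gsub(", ", "\\\\b|\\\\b", synonyms), "\\b")

sentences <- c("I can utilize this file", "I can cover these things")

# gsub() accepts a character vector for x, so each pass
# applies one pattern to every sentence at once
for (i in seq_along(patterns)) {
  sentences <- gsub(patterns[i], words[i], sentences)
}

print(sentences)
# "I can Use this file"     "I can Hide these things"
```

With the data frames from the question, the same loop would iterate over dt$Synonyms and dt$Word and update mydf$sentence in place.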
