
Problem description

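I am writing a function to expand word contractions. It takes a data frame as its input argument and returns the data frame with a "text_clean" column in which the contractions in the text have been expanded. I can do this by using the qdap mgsub function to replace the patterns in the text. However, I would like to know whether there is a better solution.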

# Contractions to look for (lower case)
contrap_pattern <- c(
  "i'm", "you're", "he's", "she's", "it's", "we're", "they're",
  "i've", "you've", "we've", "they've",
  "i'd", "you'd", "he'd", "she'd", "we'd", "they'd",
  "i'll", "you'll", "he'll", "she'll", "we'll", "they'll",
  "isn't", "aren't", "wasn't", "weren't", "hasn't", "haven't", "hadn't",
  "doesn't", "don't", "didn't", "won't", "wouldn't", "shan't", "shouldn't",
  "can't", "couldn't", "mustn't",
  "let's", "that's", "who's", "what's", "here's", "there's",
  "when's", "where's", "why's", "how's"
)

# Expanded forms, in the same order as contrap_pattern
replacement_pattern <- c(
  "I am", "you are", "he is", "she is", "it is", "we are", "they are",
  "I have", "you have", "we have", "they have",
  "I would", "you would", "he would", "she would", "we would", "they would",
  "I will", "you will", "he will", "she will", "we will", "they will",
  "is not", "are not", "was not", "were not", "has not", "have not", "had not",
  "does not", "do not", "did not", "will not", "would not", "shall not", "should not",
  "can not", "could not", "must not",
  "let us", "that is", "who is", "what is", "here is", "there is",
  "when is", "where is", "why is", "how is"
)

# Replace every contraction in the text column with its expanded form
clean$text_clean <- qdap::mgsub(pattern = contrap_pattern, replacement = replacement_pattern, clean$text_clean)
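For comparison, the same lookup idea can be sketched in base R without qdap. This is only an illustration, not the approach from the post: the helper name expand_contractions, the named vector lookup, and the sample data frame are all hypothetical. Each contraction is stored as name = expansion so that the pairs cannot drift out of order, and word boundaries keep "he's" from matching inside "she's".

```r
# Hypothetical helper (not from the original post): expand contractions with base R only.
expand_contractions <- function(text, lookup) {
  for (i in seq_along(lookup)) {
    # \\b restricts the match to whole words; ignore.case also catches "I'm" or "Don't"
    pattern <- paste0("\\b", names(lookup)[i], "\\b")
    text <- gsub(pattern, lookup[[i]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

# A small subset of the pairs from the question, kept together as name = value
lookup <- c("i'm" = "I am", "you're" = "you are", "don't" = "do not",
            "can't" = "can not", "let's" = "let us")

# Sample data frame (made up for illustration)
clean <- data.frame(text_clean = c("I'm sure you're right", "Don't worry, let's go"),
                    stringsAsFactors = FALSE)
clean$text_clean <- expand_contractions(clean$text_clean, lookup)
```

Note that the replacement text is inserted as written, so the capitalization of the original word ("Don't") is not preserved in this sketch.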

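Update: there is no need to write the patterns out explicitly in the code; the replace_contraction() function does what is needed. Thanks to @phiver for the suggestion.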
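A minimal sketch of what that might look like, assuming the qdap package is installed and that the text lives in a text_clean column as in the question. The sample data frame is made up for illustration, and the call relies on the contraction dictionary that qdap uses by default.

```r
# Sample data frame (made up for illustration)
clean <- data.frame(text_clean = c("i'm sure you're right", "they don't know"),
                    stringsAsFactors = FALSE)

# Assumes qdap is installed; the text is left unchanged otherwise
if (requireNamespace("qdap", quietly = TRUE)) {
  # replace_contraction() uses qdap's built-in contraction dictionary by default
  clean$text_clean <- qdap::replace_contraction(clean$text_clean)
}

print(clean$text_clean)
```

A custom dictionary can be supplied through the contraction argument if the default list does not cover a needed form.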

Tags: r, nlp, qdap

Solution

