R: How to extract everything after a specific word in all paragraphs?

Problem description

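Hello, I am looking for R code that removes every word that follows a specific term, in every paragraph. For example: find "Talk:" and replace everything from there up to the next paragraph. I tried regular expressions and spent some time on it, but could not get it to work ("fjeaofiz" is always left behind).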

library(stringi)

x <- c("12 3456 789", "Talk: zpfozefpozjgzigzehgoi oezjgzogzjgoezjgo \r fjeaofiz ", "", NA, "Talk: 667")
# Attempt: replace the located match with "***" (only matches up to the \r)
stri_sub_all(x, stri_locate_all_regex(x, "^Talk:.*\r", omit_no_match=TRUE)) <- "***"
print(x)

My output should be:

x <- c("12 3456 789", "***", "", NA, "***")

Any help?

Tags: r, regex, string

Solution


You need to use

stri_sub_all(x, stri_locate_all_regex(x, "(?s)^Talk:.*", omit_no_match=TRUE)) <- "***"

The fix has two parts. First, remove \r from the pattern: your regex matched only the part of the string up to the CR character, so everything after it was left in place. Second, add (?s) so that .* matches the rest of the whole string. The stringi package uses the ICU regex flavor, in which . does not match line break characters (such as CR and LF) by default; (?s) enables . to match them.
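As a minimal sketch of that difference, the following compares the same pattern with and without (?s) on a single string that contains a carriage return. It uses stri_replace_first_regex instead of the stri_sub_all assignment, only to keep the comparison short; the variable names are illustrative and not part of the original answer.

```r
library(stringi)

s <- "Talk: abc \r def"

# Without (?s): in ICU, '.' stops at the carriage return,
# so the text after \r survives the replacement.
without_dotall <- stri_replace_first_regex(s, "^Talk:.*", "***")

# With (?s): '.' also matches line breaks, so the whole string is replaced.
with_dotall <- stri_replace_first_regex(s, "(?s)^Talk:.*", "***")

# The same flag can be passed as an option instead of inline.
with_option <- stri_replace_first_regex(
  s, "^Talk:.*", "***",
  opts_regex = stri_opts_regex(dotall = TRUE)
)

print(without_dotall)
print(with_dotall)
print(with_option)
```

Setting dotall through stri_opts_regex may be preferable when the pattern comes from elsewhere and should not be modified.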

Probably a simpler approach is to use

sub("^Talk:.*", "***", x)

Here, base R's default TRE regex library is used, and in this regex flavor . matches line breaks by default, so no (?s) flag is needed.
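A short sketch of the base R approach on the vector from the question, assuming nothing beyond base R. The second part is an added illustration, not from the original answer: with perl=TRUE the PCRE engine is used instead of TRE, and there . does not match a newline unless (?s) is given. The names result, tre_default, pcre_default and pcre_dotall are illustrative.

```r
x <- c("12 3456 789",
       "Talk: zpfozefpozjgzigzehgoi oezjgzogzjgoezjgo \r fjeaofiz ",
       "", NA, "Talk: 667")

# Default engine (TRE): '.' matches line breaks, so everything after "Talk:" goes.
result <- sub("^Talk:.*", "***", x)
print(result)

# Contrast on a string with a newline (\n):
s <- "Talk: abc\ndef"
tre_default  <- sub("^Talk:.*", "***", s)                   # TRE
pcre_default <- sub("^Talk:.*", "***", s, perl = TRUE)      # PCRE, '.' stops at \n
pcre_dotall  <- sub("(?s)^Talk:.*", "***", s, perl = TRUE)  # PCRE with dotall
```

Elements that do not start with "Talk:", the empty string, and NA are returned unchanged by sub.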

