r - Find two words before a match

Problem description

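I'm trying to split a string with a regular expression. My regex should match the two words before a colon. The end goal is to split something like this: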

"Joe Biden: We need to reform healthcare. It is important. Bernie Sanders: I agree. It is important."

into a character vector like this:

"Joe Biden" "We need to reform healthcare. It is important." "Bernie Sanders" "I agree. It is important"

The closest I've gotten is:

foo <- strsplit(my_string, split="(\\S+)\\s*(\\S+)\\s*:",perl=TRUE)

But the result drops the text that the regex matched. I tried using a lookbehind like this:

foo <- strsplit(my_string, split="(?<=.)(?=(\\S+)\\s*(\\S+)\\s*:)",perl=TRUE)

But it throws an error:

  PCRE pattern compilation error
    'lookbehind assertion is not fixed length'
    at ')'

Is there an alternative regex that does this, or should I use a different function?
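To see why the first attempt loses the names: strsplit() treats whatever the split pattern matches as a delimiter and discards it, and a match at the very start of the string yields an empty first element. The sketch below simply reruns the question's first pattern on the same string so the effect is visible.

```r
my_string <- "Joe Biden: We need to reform healthcare. It is important. Bernie Sanders: I agree. It is important."

# The pattern consumes "Joe Biden:" and "Bernie Sanders:" as delimiters,
# so the names vanish from the result instead of becoming their own elements.
foo <- strsplit(my_string, split = "(\\S+)\\s*(\\S+)\\s*:", perl = TRUE)

# Elements: "" (the match sat at position 1), then the two statements,
# each still carrying the surrounding spaces that were not part of a match.
foo[[1]]
```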

Tags: r, regex, stringr

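This splits on two alternatives separated by the or operator (|): 1) a space followed by two words separated by a space and then a colon; 2) a colon followed by a space. In the first alternative the words and the colon sit inside a lookahead, so only the space is consumed and the speaker's name is kept.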

my_string <- "Joe Biden: We need to reform healthcare. It is important. Bernie Sanders: I agree. It is important."
strsplit(my_string, split="( (?=\\w+ \\w+:)|: )",perl=TRUE)
[[1]]
[1] "Joe Biden"            "We need to reform healthcare. It is important."
[3] "Bernie Sanders"       "I agree. It is important."
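The same pattern can be assembled from its two alternatives so each piece carries its own comment. This is only a restatement of the regex above; the variable names (before_name, after_name, speaker, statement) are illustrative, and pairing odd and even elements assumes the text always starts with a speaker name.

```r
my_string <- "Joe Biden: We need to reform healthcare. It is important. Bernie Sanders: I agree. It is important."

# Alternative 1: a space followed by "word word:". Only the space is consumed;
# the lookahead (?=...) leaves the two-word name in the output.
before_name <- " (?=\\w+ \\w+:)"
# Alternative 2: the colon and space that end a speaker's name (consumed).
after_name <- ": "

pattern <- paste0("(", before_name, "|", after_name, ")")
parts <- strsplit(my_string, split = pattern, perl = TRUE)[[1]]

# Speakers sit at odd positions, their statements at even positions.
speaker <- parts[seq(1, length(parts), by = 2)]
statement <- parts[seq(2, length(parts), by = 2)]
```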

You'll run into trouble here if a speaker's name is only one word. That is why my answer to your previous question looked for punctuation instead.
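A minimal sketch of the one-word-name problem and of one punctuation-based workaround. The sample string one_word is hypothetical, and the punct pattern is an illustration, not necessarily the one from the earlier answer (which isn't reproduced here). It assumes every statement ends in ".", "!" or "?", and every speaker name starts with a capital letter and contains no sentence punctuation. Its lookbehind is a single character class, so it is fixed length and PCRE accepts it.

```r
one_word <- "Joe: Hi there. Bernie: I agree."
my_string <- "Joe Biden: We need to reform healthcare. It is important. Bernie Sanders: I agree. It is important."

# The two-word pattern never recognises "Bernie:" as a speaker,
# so the name stays glued to Joe's statement.
two_word <- strsplit(one_word, split = "( (?=\\w+ \\w+:)|: )", perl = TRUE)[[1]]

# Hypothetical alternative: split on a space that follows sentence-ending
# punctuation (fixed-length lookbehind) and precedes a capitalised name
# ending in a colon (lookahead), or on ": " itself.
punct <- "(?<=[.!?]) (?=[A-Z][^.!?:]*:)|: "
by_punct <- strsplit(one_word, split = punct, perl = TRUE)[[1]]

# The same pattern still handles the original two-word names.
two_names <- strsplit(my_string, split = punct, perl = TRUE)[[1]]
```

A known limitation of this assumption: a statement that itself contains something like "Note: ..." right after a full stop would also be treated as a new speaker.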
