R: How do I extract substrings before and after a key phrase?

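Question

I am trying to split a long string on several elements that appear before and after a key phrase. I can partially split it at the first occurrence, but not at every occurrence. None of the earlier questions on pattern matching answered this for me.

Example line of text: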

"#1 Player A advances to third on a wild pitch. #2 Player B advances to second on an error."

Partial solution:

gsub('((advances).*$)', '', "#1 Player A advances to third on a wild pitch. #2 Player B advances to second on an error.")

This returns:

"#1 Player A "

However, I would like:

[1] "#1 Player A advances to third" [2] "#2 Player B advances to second"

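as two separate output strings.

I don't know of a technique for extracting the text between the player number and the phrase "advances to ...".

Thanks in advance!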

Tags: r, string, substring, match

Answer


Is "to" always followed by exactly one word? If so, this will work:

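library(stringr)

# The example line from the question
str1 <- "#1 Player A advances to third on a wild pitch. #2 Player B advances to second on an error."
# Lazily match from each "#" through " to " and the next word; column 2 holds the capture group
str_match_all(str1, "(#.*? to \\S+)")[[1]][, 2]
# [1] "#1 Player A advances to third"  "#2 Player B advances to second"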
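
If you would rather not load stringr, the same idea can be sketched in base R with gregexpr() and regmatches(). This is only a rough alternative, not part of the answer above: it assumes every entry starts with "#" followed by digits and that exactly one word follows "advances to". The name `pattern` is just an illustrative variable.

```r
# Same example line as in the question.
str1 <- "#1 Player A advances to third on a wild pitch. #2 Player B advances to second on an error."

# Assumed pattern (illustrative):
#   #\\d+          the player number, e.g. "#1"
#   .*?            lazily, everything up to the key phrase
#    advances to   the key phrase itself (with surrounding spaces)
#   \\S+           the single word that follows it
pattern <- "#\\d+.*? advances to \\S+"

# gregexpr() locates every match; regmatches() pulls the matched text out.
result <- regmatches(str1, gregexpr(pattern, str1, perl = TRUE))[[1]]
print(result)
```

Anchoring on the full phrase "advances to" rather than just "to" is a design choice: it avoids stopping early if a player name or earlier text happens to contain the word "to".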
