r - str_extract with a specific pattern
Problem description
I am trying to extract strings that follow the same pattern from a text.
library(readr)
txt <- read_file('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
Sample of the text:
Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet.
...
Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris, and [Servant] -the Clown.\r\n\r\n\r\n  Cap.
I want to extract:
Verona. A public place.
A Street.
I tried:
library(stringr)
str_extract(txt, "Scene\\s[IV]+\\.\\s\\s\\b[A-Z]+\\b")
It did not work.
Thanks in advance for any suggestions.
Solution
str_extract_all(gsub("(Scene.*?)\r\n","\\1 ",txt),"Scene.*")
[[1]]
[1] "Scene I. Verona. A public place."
[2] "Scene II. A Street."
[3] "Scene III. Capulet's house."
[4] "Scene IV. A street."
[5] "Scene V. Capulet's house."
[6] "Scene I. A lane by the wall of Capulet's orchard."
[7] "Scene II. Capulet's orchard."
[8] "Scene III. Friar Laurence's cell."
[9] "Scene IV. A street."
[10] "Scene V. Capulet's orchard."
[11] "Scene VI. Friar Laurence's cell."
[12] "Scene I. A public place."
[13] "Scene II. Capulet's orchard."
[14] "Scene III. Friar Laurence's cell."
[15] "Scene IV. Capulet's house"
[16] "Scene V. Capulet's orchard."
[17] "Scene I. Friar Laurence's cell."
[18] "Scene II. Capulet's house."
[19] "Scene III. Juliet's chamber."
[20] "Scene IV. Capulet's house."
[21] "Scene V. Juliet's chamber."
[22] "Scene I. Mantua. A street."
[23] "Scene II. Verona. Friar Laurence's cell."
[24] "Scene III. Verona. A churchyard; in it the monument of the Capulets."
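The result above keeps the "Scene I." prefix, while the question asks only for the location line. One possible variation, sketched here under assumptions rather than taken from the original answer, is to capture the line after the heading with str_match_all. The sample_txt string is a hypothetical offline stand-in for txt, and the pattern assumes headings of the form "Scene" plus a Roman numeral followed by a \r\n line break, as in the sample shown in the question. As a side note, the pattern in the question ends in \\b[A-Z]+\\b, which can only match a word written entirely in capitals, so it cannot match "Verona".

```r
library(stringr)

# Hypothetical stand-in for txt, so the example runs without downloading the file.
# It mimics the \r\n line breaks shown in the question.
sample_txt <- paste0(
  "Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory\r\n",
  "Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris\r\n"
)

# Match the heading, then capture everything up to the next line break.
# str_match_all returns a list with one matrix per input string;
# column 1 is the full match, column 2 is the first capture group.
matches <- str_match_all(sample_txt, "Scene [IVX]+\\.\\r\\n([^\\r\\n]+)")
locations <- matches[[1]][, 2]

print(locations)
```

On the full text the same call would take txt in place of sample_txt. If the file turns out to use different line endings, the \\r\\n part of the pattern would need adjusting.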