str_extract: extracting a specific pattern

Problem description

I am trying to extract strings that follow the same pattern from a text:

Shakespeare's The Tragedy of Romeo and Juliet

library(readr)

txt <- read_file('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')

Sample of the text:

Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet.
...
Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris, and [Servant] -the Clown.\r\n\r\n\r\n  Cap.

I want to extract:

Verona. A public place.
A Street.

I tried:

library(stringr)

str_extract(txt, "Scene\\s[IV]+\\.\\s\\s\\b[A-Z]+\\b")

It did not work.

Thanks in advance for any advice.

Tags: r, stringr

Solution


str_extract_all(gsub("(Scene.*?)\r\n","\\1 ",txt),"Scene.*")
[[1]]
 [1] "Scene I. Verona. A public place."                                    
 [2] "Scene II. A Street."                                                 
 [3] "Scene III. Capulet's house."                                         
 [4] "Scene IV. A street."                                                 
 [5] "Scene V. Capulet's house."                                           
 [6] "Scene I. A lane by the wall of Capulet's orchard."                   
 [7] "Scene II. Capulet's orchard."                                        
 [8] "Scene III. Friar Laurence's cell."                                   
 [9] "Scene IV. A street."                                                 
[10] "Scene V. Capulet's orchard."                                         
[11] "Scene VI. Friar Laurence's cell."                                    
[12] "Scene I. A public place."                                            
[13] "Scene II. Capulet's orchard."                                        
[14] "Scene III. Friar Laurence's cell."                                   
[15] "Scene IV. Capulet's house"                                           
[16] "Scene V. Capulet's orchard."                                         
[17] "Scene I. Friar Laurence's cell."                                     
[18] "Scene II. Capulet's house."                                          
[19] "Scene III. Juliet's chamber."                                        
[20] "Scene IV. Capulet's house."                                          
[21] "Scene V. Juliet's chamber."                                          
[22] "Scene I. Mantua. A street."                                          
[23] "Scene II. Verona. Friar Laurence's cell."                            
[24] "Scene III. Verona. A churchyard; in it the monument of the Capulets."
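
The result above keeps the "Scene I." prefix, whereas the question asks only for the location line. One possible variation, offered as a sketch rather than as part of the original answer, is to use a capture group with str_match_all and keep only the captured part. The short txt string below is a hypothetical stand-in for the downloaded file, assumed to use the same "\r\n" line endings, and the numeral class [IVX] is an assumption about how the headings are numbered.

```r
library(stringr)

# Hypothetical stand-in for the Gutenberg text, with "\r\n" line endings.
txt <- paste0(
  "Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory.\r\n",
  "Scene II.\r\nA Street.\r\n\r\nEnter Capulet.\r\n"
)

# Match each "Scene <roman numeral>." heading followed by a line break,
# and capture everything on the next line up to its line break.
m <- str_match_all(txt, "Scene [IVX]+\\.\\r\\n([^\\r\\n]*)")[[1]]

# Column 1 holds the full match; column 2 holds the captured location.
locations <- m[, 2]
print(locations)
```

On the real text, the same call would be applied to the txt object returned by read_file; whether every heading in the file fits this exact shape has not been checked here.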
