python - Find the last occurrence of a word in a sentence using a regular expression
Question
Given the sentence below, the expected answer is: Page 2
import re

content = "This is the sentence I have written at Page 1 but I wrote in Page 2"
# Search for words after Page
res1 = re.search(r"\.?([^\.]*Page[^\.]*)", content)
if res1 is not None:
    a = res1.group(1)
    # Gets Page 2
    extracted_page_sequence = re.sub(r"^(.*)(?=Page)", "", a)
    print("Extracted sequence", extracted_page_sequence)
With this code I get "Page 1 but I wrote in Page 2". Is there a way to get "Page 2" using a regular expression in Python? In short, I need to get the last page mentioned in the sentence.
Solution
Try this:
all_last_occurrences = re.findall(r"[^\.]*(Page\s+\w+)\s*[^\.]*", content)
If content consists of a single sentence, for example "This is the sentence I have written at Page 1 but I wrote in Page 2", you get the only occurrence at its end:
['Page 2']
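When only the final occurrence in the whole string matters, one alternative (not part of the original answer) is to let a greedy `.*` consume as much text as possible before the capture group. The sketch below assumes that reading of the question; the helper name `last_page` is hypothetical.

```python
import re

def last_page(text):
    """Return the last 'Page <word>' in text, or None if there is none."""
    # The greedy .* grabs as much as it can, so the capture group ends up
    # on the final occurrence; re.DOTALL lets .* cross line breaks.
    match = re.search(r".*(Page\s+\w+)", text, flags=re.DOTALL)
    return match.group(1) if match else None

content = "This is the sentence I have written at Page 1 but I wrote in Page 2"
print(last_page(content))  # Page 2
```

Unlike the `findall` pattern above, this ignores sentence boundaries and returns a single string rather than a list.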
If content contains several sentences (each separated from the next by a period, "."), for example:
content = (
    "This is the sentence I have written at Page 1 but I wrote in Page 2. "
    "This is the sentence I have written at Page 3 but I wrote in Page 4. "
    "This is the sentence I have written at Page 5 but I wrote in Page 6. "
    "Page 7. "
    "ghghPage 8."
    "Page 9 Page 10 "
)
you get:
['Page 2', 'Page 4', 'Page 6', 'Page 7', 'Page 8', 'Page 10']
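The per-sentence behaviour can be wrapped in a small helper. This is a minimal sketch that reuses the pattern from the answer; the names `LAST_PAGE_PER_SENTENCE` and `last_pages` and the shortened sample text are assumptions made for illustration. Sentences without "Page" simply contribute nothing, and the final list element gives the last page of the whole text.

```python
import re

# Same pattern as in the answer, compiled once as a raw string.
LAST_PAGE_PER_SENTENCE = re.compile(r"[^.]*(Page\s+\w+)\s*[^.]*")

def last_pages(text):
    """Return the last 'Page <word>' of every '.'-separated sentence."""
    return LAST_PAGE_PER_SENTENCE.findall(text)

content = (
    "This is the sentence I have written at Page 1 but I wrote in Page 2. "
    "Nothing to see here. "
    "Page 7. "
)
pages = last_pages(content)
print(pages)

# The last page of the whole text, or None when nothing matched.
overall_last = pages[-1] if pages else None
print(overall_last)
```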