python - Figuring out RegEx search term
Problem description
I'm very new to this whole thing. I am using regex to extract data from an HTML page that contains:
<p class="bold"> Last Statement:</p>
<p>Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden. </p>
I am trying to extract the text using
word = re.findall('Last Statement:</p>.*<p>(.+)</p>', x)
But it's giving me an empty list. How can I debug that?
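Solution
You were almost there. By default . does not match a newline, so .* cannot cross the line break between </p> and the next <p>, and the pattern never matches. Replacing .* with \s* (which does match newlines) should make it work. Using a raw string also keeps Python from treating \s as a string escape.
word = re.findall(r'Last Statement:</p>\s*<p>(.+)</p>', x)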
For example:
import re

if __name__ == "__main__":
    # Sample HTML; the two <p> elements are separated by a newline.
    s = """
<p class="bold"> Last Statement:</p>
<p>Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden. </p>
"""
    # Raw string so that \s reaches the regex engine unchanged.
    word = re.findall(r'Last Statement:</p>\s*<p>(.+)</p>', s)
    print(word)
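To debug this kind of empty result, it can help to run the candidate patterns side by side on a small input. The sketch below uses a shortened, made-up stand-in for the page (the variable names are only for illustration) and compares the original pattern, the \s* fix, and an alternative that passes re.DOTALL so that . also matches newlines.

```python
import re

# Shortened, hypothetical stand-in for the real page content.
html = '<p class="bold"> Last Statement:</p>\n<p>Thank you warden. </p>\n'

# Original pattern: '.' does not match '\n', so '.*' cannot reach the next line.
original = re.findall(r'Last Statement:</p>.*<p>(.+)</p>', html)

# Fix 1: '\s*' matches the newline between the two tags.
with_whitespace = re.findall(r'Last Statement:</p>\s*<p>(.+)</p>', html)

# Fix 2: re.DOTALL lets '.' match newlines; the non-greedy quantifiers
# stop at the first '<p>' and the first '</p>' instead of the last ones.
with_dotall = re.findall(r'Last Statement:</p>.*?<p>(.+?)</p>', html, re.DOTALL)

print(original)         # []
print(with_whitespace)  # ['Thank you warden. ']
print(with_dotall)      # ['Thank you warden. ']
```

The re.DOTALL variant is worth knowing if the statement itself may span several lines, since (.+) without that flag stops at the first newline.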
Since you are processing HTML, though, it might be better to use an XML parser with XPath to find the text you are interested in.
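As a rough illustration of the parser-based approach, here is a minimal sketch that uses the standard library's html.parser instead of an XML parser with XPath, so it runs without third-party packages. The class name LastStatementParser and the shortened sample input are assumptions made for this example, and it assumes the statement is the first <p> after the one whose text is "Last Statement:".

```python
from html.parser import HTMLParser


class LastStatementParser(HTMLParser):
    """Hypothetical helper: stores the text of the first <p> that
    follows the <p> containing 'Last Statement:'."""

    def __init__(self):
        super().__init__()
        self.in_p = False        # currently inside a <p> element
        self.label_seen = False  # the label paragraph has been closed
        self.buffer = []         # text pieces of the current paragraph
        self.statement = None    # result, set once the target <p> closes

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "p" or not self.in_p:
            return
        self.in_p = False
        text = "".join(self.buffer).strip()
        if self.statement is not None:
            return  # already found; ignore later paragraphs
        if self.label_seen:
            self.statement = text
        elif text == "Last Statement:":
            self.label_seen = True


# Shortened, hypothetical stand-in for the real page content.
html = """
<p class="bold"> Last Statement:</p>
<p>Thank you warden. </p>
"""

parser = LastStatementParser()
parser.feed(html)
parser.close()
print(parser.statement)  # Thank you warden.
```

Unlike the regex, this does not depend on the exact whitespace between the two tags or on the attributes of the label paragraph.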