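python - How do I extract all review/text sentences from the following text?
Question
Here I want to extract the review/text entries, but my code only extracts a small part of each one. This is the output:
<re.Match object; span=(226, 258), match='review/text: I like Creme Brulee'>
<re.Match object; span=(750, 860), match='review/text: not what I was expecting ...'>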
import re
text='''
'product/productId: B004K2IHUO\n',
'review/userId: A2O9G2521O626G\n',
'review/profileName: Rachel Westendorf\n',
'review/helpfulness: 0/0\n',
'review/score: 5.0\n',
'review/time: 1308700800\n',
'review/summary: The best\n',
'review/text: I like Creme Brulee. I loved that these were so easy. Just sprinkle on the sugar that came with and broil. They look amazing and taste great. My guess thought I really went out of the way for them when really it took all of 5 minutes. I will be ordering more!\n',
'\n',
'product/productId: B004K2IHUO\n',
'review/userId: A1ZKFQLHFZAEH9\n',
'review/profileName: S. J. Monson "world citizen"\n',
'review/helpfulness: 2/8\n',
'review/score: 3.0\n',
'review/time: 1236384000\n',
'review/summary: disappointing\n',
"review/text: not what I was expecting in terms of the company's reputation for excellent home delivery products\n",
'\n',
'''
pattern = re.compile(r'review/text:\s[^.]+')
matches = pattern.finditer(text)
for match in matches:
    print(match)
Answer
If you don't mind not using re, the identifier is always 'review/text', and your data is always comma-separated, you can get those lines simply with:
matches = [s.strip() for s in text.split(',') if s.strip(' "\n\'').startswith('review/text')]
for match in matches:
    print(match)
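Here s.strip(' "\n\'') removes spaces, ", ', and newline characters from the start and end of each piece so that the startswith comparison works. This returns these two entries (each one still contains its embedded newline, so it prints across two lines):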
'review/text: I like Creme Brulee. I loved that these were so easy. Just sprinkle on the sugar that came with and broil. They look amazing and taste great. My guess thought I really went out of the way for them when really it took all of 5 minutes. I will be ordering more!
'
"review/text: not what I was expecting in terms of the company's reputation for excellent home delivery products
"
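If you would rather stay with re, the pattern in the question stops early because [^.]+ ends the match at the first period. A minimal sketch of one possible fix, assuming each review sits on a single line as in the sample: since . does not match a newline by default, .* captures everything up to the end of the line. The shortened text and the name reviews below are illustrative only, not part of the original post.

```python
import re

# Shortened sample in the same shape as the question's data (illustrative only).
text = '''
'review/summary: The best\n',
'review/text: I like Creme Brulee. I loved that these were so easy.\n',
'\n',
'review/summary: disappointing\n',
"review/text: not what I was expecting in terms of the company's reputation\n",
'\n',
'''

# '.' does not match a newline by default, so '.*' runs to the end of the
# line instead of stopping at the first period the way '[^.]+' does.
# '[ \t]*' skips the blanks after the colon without crossing a line break.
pattern = re.compile(r'review/text:[ \t]*(.*)')

# group(1) holds only the review body, without the 'review/text:' label.
reviews = [m.group(1) for m in pattern.finditer(text)]
for review in reviews:
    print(review)
```

Unlike the split(',') approach, this does not depend on the data being comma-separated, so a review that itself contains commas would still be captured whole.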