python - Python: find the full text only after a specific word in a string using regex
Problem description
I have a text like the following:
text = list of documents check 01 original invoice in favour of company z 02 cjpc abstract sheet weighment
slip goods receipt note iz checklist creator id name 30009460 xyz@abc.com
checklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything
written manually on the checklist will not be considered invoice parth enterprise â invoice no dated
kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka
mundra kutch supplierâ s ref other reference s gst no 24acypt3861 c1 z 7 dated buyer i buyer s order
no 21 jun 2017 abc corporation 5700214006 â dated 40 mwp solar power plant i despatch document no
vill bitta ta naliya abadasa despatched through destination march 18 terms of
Goal: I want to extract the text after the word 'invoice', in particular the text after its second occurrence.
My approach:
txt = re.findall('invoice (.*)',text)
With the approach above, I expected a list of strings like this:
txt = ['in favour of company z 02 cjpc abstract sheet weighment
slip goods receipt note iz checklist creator id name 30009460 xyz@abc.com
checklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything
written manually on the checklist will not be considered','parth enterprise â invoice no dated
kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment
taluka ..... #rest of the string]
Instead, I got the entire original string from text. Using text.partition('invoice') does not give me the strings in txt either.
Any help would be appreciated.
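A minimal sketch of why both attempts fall short, using a shortened, hypothetical single-line sample rather than the full text: ".*" is greedy, so the first match of 'invoice (.*)' runs from the first "invoice " to the end of the line and findall() has nothing left to match, while str.partition() also splits only at the first occurrence.

```python
import re

# Hypothetical short sample (not the question's full text); "invoice" appears twice on one line.
sample = "original invoice in favour of company z invoice parth enterprise"

# ".*" is greedy: the single match consumes everything after the first "invoice "
# up to the end of the line, so findall() returns just one element.
greedy = re.findall("invoice (.*)", sample)
print(greedy)  # ['in favour of company z invoice parth enterprise']

# str.partition() also splits at the first occurrence only.
before, sep, after = sample.partition("invoice")
print(after)  # ' in favour of company z invoice parth enterprise'
```

Note that on multiline input "." stops at each line break, so the matches would instead be cut at line ends.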
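Solution
If you want to get the 2 matches from the question, you can use 2 capturing groups.
First match up to the first occurrence of invoice, then capture everything before the second occurrence of invoice in group 1.
Then match invoice again and capture the rest of the string in group 2.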
^.*? invoice (.*?) invoice (.*)
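The same pattern can be written with re.VERBOSE so that each step above gets its own comment. This sketch also adds re.DOTALL, which is an assumption about the input: the text in the question spans several lines, and without DOTALL "." does not match newlines, so the anchored search returns None on such input. The helper name split_after_invoice is hypothetical.

```python
import re

# The answer's pattern, annotated with re.VERBOSE. In VERBOSE mode unescaped
# whitespace is ignored, so the literal spaces are written as "[ ]".
PATTERN = re.compile(
    r"""
    ^.*?            # lazily skip everything up to the first " invoice "
    [ ]invoice[ ]
    (.*?)           # group 1: text between the first and second occurrence
    [ ]invoice[ ]
    (.*)            # group 2: the rest of the string after the second occurrence
    """,
    re.VERBOSE | re.DOTALL,  # DOTALL lets "." also match newlines in multiline text
)


def split_after_invoice(text):
    """Return (group1, group2), or None if ' invoice ' does not occur twice."""
    m = PATTERN.search(text)
    return m.groups() if m else None


# Hypothetical multiline sample.
sample = "original invoice in favour\nof company z invoice parth\nenterprise invoice no dated"
print(split_after_invoice(sample))
```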
For example
import re
text = "list of documents check 01 original invoice in favour of company z 02 cjpc abstract sheet weighment slip goods receipt note iz checklist creator id name 30009460 xyz@abc.comchecklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything written manually on the checklist will not be considered invoice parth enterprise â invoice no dated kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka mundra kutch supplierâ s ref other reference s gst no 24acypt3861 c1 z 7 dated buyer i buyer s order no 21 jun 2017 abc corporation 5700214006 â dated 40 mwp solar power plant i despatch document no vill bitta ta naliya abadasa despatched through destination march 18 terms of"
regex = r"^.*? invoice (.*?) invoice (.*)"
matches = re.search(regex, text)
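if matches:
    # group(1): text between the first and second "invoice"
    print(matches.group(1))
    print('\n')
    # group(2): everything after the second "invoice"
    print(matches.group(2))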
Output
in favour of company z 02 cjpc abstract sheet weighment slip goods receipt note iz checklist creator id name 30009460 xyz@abc.comchecklist creation date 31 03 2018 checklist print date time 31 03 2018 10 45 57 note anything written manually on the checklist will not be considered
parth enterprise â invoice no dated kashish aarcade baroi road 18 25 mar 2018 village baroi delivery note mode terms of payment taluka mundra kutch supplierâ s ref other reference s gst no 24acypt3861 c1 z 7 dated buyer i buyer s order no 21 jun 2017 abc corporation 5700214006 â dated 40 mwp solar power plant i despatch document no vill bitta ta naliya abadasa despatched through destination march 18 terms of
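If a regex is not strictly required, str.split with maxsplit=2 gives the same two pieces: everything before the first " invoice ", the text between the first and second occurrence, and the rest (which may itself still contain "invoice", as in "â invoice no dated"). This is a sketch assuming the words are separated by single spaces; the helper name parts_after_invoice is hypothetical.

```python
def parts_after_invoice(text):
    """Return [text between 1st and 2nd ' invoice ', text after 2nd], or [] if fewer than two."""
    # maxsplit=2 -> at most 3 pieces; later occurrences stay inside the last piece
    parts = text.split(" invoice ", 2)
    return parts[1:] if len(parts) == 3 else []


# Hypothetical shortened sample in the style of the question's text.
sample = "original invoice in favour of company z invoice parth enterprise â invoice no dated"
print(parts_after_invoice(sample))
```

Unlike the regex without re.DOTALL, this also works when the text contains newlines, as long as each "invoice" is surrounded by spaces.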