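python - Recursively retrying a try block until a condition is met
Problem description
I want to iterate over a text file line by line, search for a pattern, and extract entities from it. However, some of the records I need to extract span multiple lines, and that part is lost when I iterate one line at a time.
Right now I am using a try-except block and appending the next line to the current line, for example: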
try:
    id_value, utterance, prediction = process(line + ' ' + lines[n + 1])
except AttributeError:
    # Handle bad data
    try:
        id_value, utterance, prediction = process(line + ' ' + lines[n + 1] + ' ' + lines[n + 2])
    except AttributeError:
        # Handle bad data
        try:
            id_value, utterance, prediction = process(
                line + ' ' + lines[n + 1] + ' ' + lines[n + 2] + ' ' + lines[n + 3])
        except AttributeError:
            pass  # ...and so on, one more nesting level per extra line
Here is the data:
data.txt
[22 Aug 2019 13:25:12] [ID:9ea1566460506294] INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1
[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_2
is 1
[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0
As you can see, the record
[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_2
is 1
spans 2 lines when iterating line by line.
Code
import re

matching_string = 'Model classification for'
id_start_string = '[ID:'
id_end_string = ']'

def process(line):
    start_idx = line.find(id_start_string)
    # re.escape makes it explicit that ']' is matched literally
    end_idx = [s.start() for s in re.finditer(re.escape(id_end_string), line)]
    for end in end_idx:
        if end > start_idx:
            # Get first index greater than start string index
            end_idx = end
            break
    id_value = line[start_idx + len(id_start_string): end_idx]
    # re.search returns None when there is no match, so .groups() raises AttributeError
    groups = re.search('Model classification for (.*) is (0|1)', line).groups()
    utterance = groups[0]
    prediction = groups[1]
    return id_value, utterance, prediction

with open('data.txt', 'r') as f:
    lines = f.read().splitlines()

for n, line in enumerate(lines):
    # Search for pattern in string
    if matching_string in line:
        try:
            id_value, utterance, prediction = process(line)
        except AttributeError:
            print('Bad data')
            print(line)
        else:
            # Print only when parsing succeeded, so stale values are never shown
            print(id_value, utterance, prediction)
Is there a recursive solution to my problem? Any help is greatly appreciated.
Edit -
lines = ['22 Aug 2019 13:25:12] [ID:9ea1566460506294] INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1', '[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_2', ' is 1', '[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0 ']
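Solution
If you want to find matches anywhere in the file, including ones that cross line breaks, you can read the whole file into a single string and use re.findall():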
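import re

with open("input.txt", "r") as f:
    text = f.read()

# re.findall returns a list of matches, not a string
output = re.findall(r'some regex pattern', text)
output1 = re.findall(r'some other pattern', text)
output2 = re.findall(r'another pattern', text)

with open("output.txt", "w") as f:
    # write() needs a str, so join each list first (assumes the patterns
    # have at most one capture group, so every match is a plain string)
    for matches in (output, output1, output2):
        f.write("\n".join(matches) + "\n")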
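To make the idea concrete, here is a minimal sketch of re.findall applied to the sample data from the question. The pattern and the name RECORD_RE are my own assumptions, not something given in the question: it uses re.DOTALL and \s+ so that a record may wrap onto the next line, and it has only been checked against the three sample records.

```python
import re

# Sample text copied from the question; the second record wraps onto a new line.
text = (
    "[22 Aug 2019 13:25:12] [ID:9ea1566460506294] INFO [139921763325696] "
    "(ModelClassification:056) - Model classification for utterance_1 is 1\n"
    "[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] "
    "(ModelClassification:056) - Model classification for utterance_2\n"
    " is 1\n"
    "[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] "
    "(ModelClassification:056) - Model classification for utterance_3 is 0\n"
)

# Hypothetical pattern (an assumption, not from the question):
#   group 1: the ID between "[ID:" and the next "]"
#   group 2: the utterance
#   group 3: the prediction, 0 or 1
# re.DOTALL lets "." match newlines, and \s+ absorbs the line break before "is".
RECORD_RE = re.compile(
    r"\[ID:([^\]]+)\].*?Model classification for\s+(.*?)\s+is\s+(0|1)",
    re.DOTALL,
)

# With three capture groups, findall returns a list of 3-tuples.
records = RECORD_RE.findall(text)

for id_value, utterance, prediction in records:
    print(id_value, utterance, prediction)
```

Because the lazy quantifiers can run across line breaks, a record that has an ID but no "is 0" or "is 1" ending would be merged with the record after it, so malformed input still needs a separate check.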
You can do it recursively if you want to, but re.findall sounds like what you need.
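For completeness, a recursive version of the nested try-except from the question could look roughly like the sketch below. The helper name process_multiline, its extra and max_extra parameters, and the simplified process function are all hypothetical names introduced for illustration; the sketch keeps the question's convention that a failed match surfaces as an AttributeError.

```python
import re

# Simplified stand-in for the question's process(); the pattern is an assumption.
PATTERN = re.compile(
    r"\[ID:([^\]]+)\].*Model classification for (.*?)\s+is (0|1)"
)


def process(text):
    # re.search returns None when nothing matches, so .groups() raises
    # AttributeError, the same failure signal the question's code relies on.
    return PATTERN.search(text).groups()


def process_multiline(lines, n, extra=0, max_extra=3):
    """Try lines[n] alone, then joined with up to max_extra following lines."""
    chunk = " ".join(lines[n:n + extra + 1])
    try:
        return process(chunk)
    except AttributeError:
        # Give up when the look-ahead budget is spent or the input runs out.
        if extra >= max_extra or n + extra + 1 >= len(lines):
            return None
        # Otherwise recurse with one more line appended.
        return process_multiline(lines, n, extra + 1, max_extra)


# Sample lines based on the data in the question.
lines = [
    "[22 Aug 2019 13:25:12] [ID:9ea1566460506294] INFO [139921763325696] "
    "(ModelClassification:056) - Model classification for utterance_1 is 1",
    "[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] "
    "(ModelClassification:056) - Model classification for utterance_2",
    " is 1",
    "[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] "
    "(ModelClassification:056) - Model classification for utterance_3 is 0 ",
]

results = []
for n, line in enumerate(lines):
    if "Model classification for" in line:
        record = process_multiline(lines, n)
        if record is None:
            print("Bad data:", line)
        else:
            results.append(record)

print(results)
```

The max_extra limit bounds the recursion depth, which replaces the fixed number of nested try blocks. One caveat: if a broken record is followed directly by a complete one, joining the lines can produce a match that mixes the two, so the limit should stay small.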