How to parse a text file with regular expressions

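Problem description

I am trying to parse some log files to extract a few numbers and write them to a CSV file. The log files contain many other messages; below is an excerpt of the lines that need to be parsed.

Specifically, I want to convert the loss and accuracy numbers in the text below to CSV. Any suggestions for doing this in bash or Python?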

1500/1500 [==============================] - 1802s 1s/step - loss: 0.3430 - accuracy: 0.8753 - val_loss: 0.1110 - val_accuracy: 0.9670
Epoch 00002: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5
1500/1500 [==============================] - 1679s 1s/step - loss: 0.0849 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9807
Epoch 00003: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5
1500/1500 [==============================] - 1674s 1s/step - loss: 0.0742 - accuracy: 0.9791 - val_loss: 0.0559 - val_accuracy: 0.9845
Epoch 00004: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5
1500/1500 [==============================] - 1671s 1s/step - loss: 0.0565 - accuracy: 0.9841 - val_loss: 0.0539 - val_accuracy: 0.9850
Epoch 00005: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5
1500/1500 [==============================] - 1675s 1s/step - loss: 0.0409 - accuracy: 0.9881 - val_loss: 0.0533 - val_accuracy: 0.9855

Here is what I have tried in Python:

import re
text = r"""00 [==============================] - 1802s 1s/step - loss: 0.3430 - accuracy: 0.8753 - val_loss: 0.1110 - val_accuracy: 0.9670
Epoch 00002: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5
1500/1500 [==============================] - 1679s 1s/step - loss: 0.0849 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9807
Epoch 00003: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5
1500/1500 [==============================] - 1674s 1s/step - loss: 0.0742 - accuracy: 0.9791 - val_loss: 0.0559 - val_accuracy: 0.9845
Epoch 00004: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5
1500/1500 [==============================] - 1671s 1s/step - loss: 0.0565 - accuracy: 0.9841 - val_loss: 0.0539 - val_accuracy: 0.9850
Epoch 00005: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5"""
regular_exp = re.compile(r'^.*val_accuracy.*$', re.M)
for match in regular_exp.finditer(text):
    print(match)

Tags: python, regex, bash

Solution

In Python, use named capture groups:

(?m)^(?P<iteration>\d+(?:/\d+)?)\s+\[=+]\s+-\s+(?P<seconds>\d+)s\s+1s/step\s+-\s+loss:\s*(?P<loss>\d+\.\d+)\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)\s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)\r?\nEpoch\s+(?P<epoch_num>\d+):\s*saving model to\s*(?P<epoch_file>.*)
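The one-line pattern is hard to read, so here is a sketch of the same expression split across lines with re.VERBOSE. The name LOG_PATTERN and the sample string are assumptions made for illustration; the two sample lines are copied from the log excerpt in the question. In verbose mode unescaped whitespace is ignored, so the literal spaces in "saving model to" are written as [ ].

```python
import re

# Same groups as the one-line pattern, one field per line.
# re.VERBOSE ignores unescaped whitespace and treats "#" as a comment start,
# so literal spaces are written as [ ] and the rest uses \s as before.
LOG_PATTERN = re.compile(r"""
    ^(?P<iteration>\d+(?:/\d+)?)                 # step counter, e.g. 1500/1500
    \s+\[=+\]                                    # progress bar
    \s+-\s+(?P<seconds>\d+)s                     # elapsed seconds
    \s+1s/step
    \s+-\s+loss:\s*(?P<loss>\d+\.\d+)
    \s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)
    \s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)
    \s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)
    \r?\nEpoch\s+(?P<epoch_num>\d+):             # the checkpoint line that follows
    \s*saving[ ]model[ ]to\s*(?P<epoch_file>.*)
""", re.MULTILINE | re.VERBOSE)

# Two consecutive lines taken from the log excerpt in the question.
sample = (
    "1500/1500 [==============================] - 1679s 1s/step - loss: 0.0849"
    " - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9807\n"
    "Epoch 00003: saving model to /root/data-cache/data/tmp/models/"
    "ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5\n"
)

rows = [m.groupdict() for m in LOG_PATTERN.finditer(sample)]
print(rows[0]["loss"], rows[0]["val_accuracy"], rows[0]["epoch_num"])
```

Note that the pattern requires the "Epoch ...: saving model to" line directly after the metrics line, so a metrics line without a checkpoint line after it does not match.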

Python code:

import re
regular_exp = re.compile(r'^(?P<iteration>\d+(?:/\d+)?)\s+\[=+]\s+-\s+(?P<seconds>\d+)s\s+1s/step\s+-\s+loss:\s*(?P<loss>\d+\.\d+)\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)\s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)\r?\nEpoch\s+(?P<epoch_num>\d+):\s*saving model to\s*(?P<epoch_file>.*)', re.M)
# filepath must hold the path to the log file
with open(filepath, 'r') as file:
    # finditer is called on the compiled pattern, one dict of named groups per match
    results = [match.groupdict() for match in regular_exp.finditer(file.read())]

Output (the first 'iteration' value is '00' because the sample text in the question starts mid-line):

[
    {'iteration': '00', 'seconds': '1802', 'loss': '0.3430', 'accuracy': '0.8753', 'val_loss': '0.1110', 'val_accuracy': '0.9670', 'epoch_num': '00002', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5'}, 
    {'iteration': '1500/1500', 'seconds': '1679', 'loss': '0.0849', 'accuracy': '0.9739', 'val_loss': '0.0693', 'val_accuracy': '0.9807', 'epoch_num': '00003', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5'}, 
    {'iteration': '1500/1500', 'seconds': '1674', 'loss': '0.0742', 'accuracy': '0.9791', 'val_loss': '0.0559', 'val_accuracy': '0.9845', 'epoch_num': '00004', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5'},
    {'iteration': '1500/1500', 'seconds': '1671', 'loss': '0.0565', 'accuracy': '0.9841', 'val_loss': '0.0539', 'val_accuracy': '0.9850', 'epoch_num': '00005', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5'}
]
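The question asks for a CSV file, so here is a minimal sketch of that last step using csv.DictWriter from the standard library. It is not part of the original answer. The names METRICS and log_to_csv are hypothetical, and the pattern is a reduced variant that matches only the metrics line, on the assumption that the checkpoint file name is not needed. That way the final epoch in the question's excerpt, which has no "Epoch ..." line after it, is included too.

```python
import csv
import io
import re

# Hypothetical reduced pattern: only the four metrics are captured.
# \b keeps "loss:" from matching inside "val_loss:".
METRICS = re.compile(
    r"\bloss:\s*(?P<loss>\d+\.\d+)"
    r"\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)"
    r"\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)"
    r"\s+-\s+val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)"
)

def log_to_csv(log_text, out):
    """Write one CSV row per metrics line in log_text to the file-like object out."""
    fields = ["loss", "accuracy", "val_loss", "val_accuracy"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    count = 0
    for match in METRICS.finditer(log_text):
        writer.writerow(match.groupdict())
        count += 1
    return count

# Last three lines of the log excerpt in the question.
log_text = (
    "1500/1500 [==============================] - 1671s 1s/step - loss: 0.0565"
    " - accuracy: 0.9841 - val_loss: 0.0539 - val_accuracy: 0.9850\n"
    "Epoch 00005: saving model to /root/data-cache/data/tmp/models/"
    "ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5\n"
    "1500/1500 [==============================] - 1675s 1s/step - loss: 0.0409"
    " - accuracy: 0.9881 - val_loss: 0.0533 - val_accuracy: 0.9855\n"
)

buffer = io.StringIO()
n_rows = log_to_csv(log_text, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with newline="" (as the csv module documentation recommends), for example open("metrics.csv", "w", newline=""), where the file name is only a placeholder.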
