python - 如何使用正则表达式解析文本文件
问题描述
我正在尝试解析一些日志文件以获取一些数字并放入 CSV 文件。日志文件有很多日志消息,但下面是需要解析的行的摘录。
我正在尝试将下面这个文本文件中的损失和准确度数字转换为 CSV。对 bash 或 python 技巧有什么建议吗?
1500/1500 [==============================] - 1802s 1s/step - loss: 0.3430 - accuracy: 0.8753 - val_loss: 0.1110 - val_accuracy: 0.9670
Epoch 00002: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5
1500/1500 [==============================] - 1679s 1s/step - loss: 0.0849 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9807
Epoch 00003: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5
1500/1500 [==============================] - 1674s 1s/step - loss: 0.0742 - accuracy: 0.9791 - val_loss: 0.0559 - val_accuracy: 0.9845
Epoch 00004: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5
1500/1500 [==============================] - 1671s 1s/step - loss: 0.0565 - accuracy: 0.9841 - val_loss: 0.0539 - val_accuracy: 0.9850
Epoch 00005: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5
1500/1500 [==============================] - 1675s 1s/step - loss: 0.0409 - accuracy: 0.9881 - val_loss: 0.0533 - val_accuracy: 0.9855
这是我在 python 中尝试过的:
import re
text = r"""00 [==============================] - 1802s 1s/step - loss: 0.3430 - accuracy: 0.8753 - val_loss: 0.1110 - val_accuracy: 0.9670
Epoch 00002: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5
1500/1500 [==============================] - 1679s 1s/step - loss: 0.0849 - accuracy: 0.9739 - val_loss: 0.0693 - val_accuracy: 0.9807
Epoch 00003: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5
1500/1500 [==============================] - 1674s 1s/step - loss: 0.0742 - accuracy: 0.9791 - val_loss: 0.0559 - val_accuracy: 0.9845
Epoch 00004: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5
1500/1500 [==============================] - 1671s 1s/step - loss: 0.0565 - accuracy: 0.9841 - val_loss: 0.0539 - val_accuracy: 0.9850
Epoch 00005: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5"""
regular_exp = re.compile(r'^.*val_accuracy.*$', re.M)
for match in regular_exp.finditer(text)
print(match)
解决方案
在 Python 中,使用命名的捕获组:
(?m)^(?P<iteration>\d+(?:/\d+)?)\s+\[=+]\s+-\s+(?P<seconds>\d+)s\s+1s/step\s+-\s+loss:\s*(?P<loss>\d+\.\d+)\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)\s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)\r?\nEpoch\s+(?P<epoch_num>\d+):\s*saving model to\s*(?P<epoch_file>.*)
见证明。
Python代码:
regular_exp = re.compile(r'^(?P<iteration>\d+(?:/\d+)?)\s+\[=+]\s+-\s+(?P<seconds>\d+)s\s+1s/step\s+-\s+loss:\s*(?P<loss>\d+\.\d+)\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)\s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)\r?\nEpoch\s+(?P<epoch_num>\d+):\s*saving model to\s*(?P<epoch_file>.*)', re.M)
with open(filepath, 'r') as file:
results = [ match.groupdict() for match in re.finditer(file.read()) ]
在线查看Python 证明,输出
[
{'iteration': '00', 'seconds': '1802', 'loss': '0.3430', 'accuracy': '0.8753', 'val_loss': '0.1110', 'val_accuracy': '0.9670', 'epoch_num': '00002', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5'},
{'iteration': '1500/1500', 'seconds': '1679', 'loss': '0.0849', 'accuracy': '0.9739', 'val_loss': '0.0693', 'val_accuracy': '0.9807', 'epoch_num': '00003', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5'},
{'iteration': '1500/1500', 'seconds': '1674', 'loss': '0.0742', 'accuracy': '0.9791', 'val_loss': '0.0559', 'val_accuracy': '0.9845', 'epoch_num': '00004', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5'},
{'iteration': '1500/1500', 'seconds': '1671', 'loss': '0.0565', 'accuracy': '0.9841', 'val_loss': '0.0539', 'val_accuracy': '0.9850', 'epoch_num': '00005', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5'}
]
推荐阅读
- java - 如何通过 TDLib 从 Telegram 获取频道的数据列表?
- javascript - Vue:当我更改控制台日志时,数据仅在第一次渲染时传递给孩子,在刷新时被删除
- java - Websphere 超时异常
- php - 用购物车总数和订阅期替换 WooCommerce“下订单”按钮文本
- docker - docker run 和 docker-compose 结果不同?
- html - 为什么我不能根据 CSS 中的文本调整图像大小?
- c# - 如何生成不连续重复 3 次的随机 int?
- r - 按组划分的因子水平
- activex - 是否有对 Webview2 控件的 ActiveX 支持
- android - Android Autofill 执行回调