How to fix a .txt file that has no delimiter or whitespace between rows

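Problem description

I am trying to read logged observations from a device I have no control over, and the format of its .txt log is less than ideal. The file should have three columns per line: date, time, and observation. However, the .txt file has no delimiter or whitespace between an observation and the next date. Example: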

11/20/20,00:00,44.411/20/20,00:05,44.411/20/20,00:10,44.6 ... and so on.

Ideally it would be formatted as

11/20/20,00:00,44.4
11/20/20,00:05,44.4
11/20/20,00:10,44.6 

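with a date, time, and observation on each line. Is there a way to get pandas to read this file the way I want?

Tags: python, pandas

Solution

Use a regular expression to find each record: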

import re
import pandas as pd

# replace s with your original text input
s = '11/20/20,00:00,44.411/20/20,00:05,44.411/20/20,00:10,44.6'

# find all occurrences of the record pattern; the raw string (r'...')
# keeps the backslashes from being treated as Python string escapes
data = re.findall(r'\d{2}/\d{2}/\d{2},\d{2}:\d{2},\d{2}\.\d', s)

# pass the list of matched records to a DataFrame (one record per row)
df = pd.DataFrame(data)

print(df)

Output

                     0
0  11/20/20,00:00,44.4
1  11/20/20,00:05,44.4
2  11/20/20,00:10,44.6
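
The result above is a single column of strings. Since the question asks for three columns, here is a minimal sketch of one way to split each record, using capture groups in the same kind of pattern. It is not part of the original answer and rests on assumptions: every observation has exactly one digit after the decimal point (as in the sample), dates are MM/DD/YY, and the column names `date`, `time`, `observation`, and `timestamp` are made up for illustration.

```python
import re
import pandas as pd

s = '11/20/20,00:00,44.411/20/20,00:05,44.411/20/20,00:10,44.6'

# One capture group per column. Assumption: each observation ends after
# exactly one decimal digit, which is what tells us where the next date begins.
record = re.compile(r'(\d{2}/\d{2}/\d{2}),(\d{2}:\d{2}),(-?\d+\.\d)')

# With groups in the pattern, findall returns a list of (date, time, observation) tuples
df = pd.DataFrame(record.findall(s), columns=['date', 'time', 'observation'])

# Convert the observation to a number and build a combined timestamp
# (assumed format: month/day/two-digit year, 24-hour time)
df['observation'] = df['observation'].astype(float)
df['timestamp'] = pd.to_datetime(df['date'] + ' ' + df['time'],
                                 format='%m/%d/%y %H:%M')

print(df)
```

If the device ever logs observations with a different number of decimal places, the pattern can no longer tell where one record ends and the next begins, so it would have to be adapted to the real data. To work on the actual log, `s` would be replaced with the contents of the file read as one string.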
