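python-3.x - Parsing data from a tricky text file with Python: how to put all related data on one line
Problem description
I'm relatively new to Python and in the middle of a project where I have to extract forecast wind speed data from a poorly formatted text/NAVTEX file (~150000 lines; see the sample file). I've managed to parse the dates and the forecast regions, but I'm having trouble with the wind speed line "WND:", because in some cases it spans more than one line and in others it doesn't: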
NORTHEAST COAST: *<--- forecast region*
WNG: STORM / FREEZING SPRAY.
WND: NW25. 01/15Z S15. 02/00Z SE25. 02/12Z NE40. 02/18Z N50 LCLY G60 *<--- Wind speed line*
ALONG THE COAST. *<--- Wind speed line*
VIS: 02/00Z-03/03Z 0-1 SN.
I had the same problem with the forecast regions, but managed to solve it with the following code:
lines = open(newfile,'r').readlines()
finalfile = open(final, 'w')
for i, line in enumerate(lines):
    if line.startswith("AND SOUTH:") or line.startswith("BANKS:"):
        lines[i-1] = lines[-1].strip() + line
        lines.pop(i)
    finalfile.write(line)
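Removing items from lines while enumerate is walking over it shifts every later index, so the line right after each pop is never visited; also, lines[-1] is the last line of the whole file, not the previous one. A minimal sketch that avoids both problems by building a new list instead of mutating the one being iterated, assuming the continuation prefixes are exactly "AND SOUTH:" and "BANKS:" as in the code above (merge_region_lines is a hypothetical helper name, and the two-line header in sample is an assumed example of the wrapping that check targets):

```python
def merge_region_lines(lines, prefixes=("AND SOUTH:", "BANKS:")):
    """Append each line that starts with a continuation prefix to the previous line."""
    merged = []
    for line in lines:
        if merged and line.startswith(prefixes):
            # Join onto the previous *output* line; the input list is never modified.
            merged[-1] = merged[-1].rstrip("\n") + " " + line
        else:
            merged.append(line)
    return merged


# Hypothetical example of a region header wrapped over two lines.
sample = [
    "EAST COAST-CAPE ST FRANCIS\n",
    "AND SOUTH:\n",
    "WNG: NIL.\n",
]
print("".join(merge_region_lines(sample)), end="")
```

With the file from the question this could be used as: with open(newfile) as src, open(final, "w") as dst: dst.writelines(merge_region_lines(src)). Iterating over an open file yields its lines with the trailing newline kept, which is the form the function expects.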
I tried doing something similar, using "VIS:" as the keyword, to put "WND:" (the wind speed) on one line, but I didn't get the desired result:
lines = open(newfile,'r').readlines()
finalfile = open(final, 'w')
for i, line in enumerate(lines):
    if line.startswith("AND SOUTH:") or line.startswith("BANKS:"):
        lines[i-1] = lines[-1].strip() + line
        lines.pop(i)
    if line.startswith("VIS:"):
        if not lines[i-1].startswith("WND:") and lines[i-2].startswith("WND:"):
            lines[i-2] = lines[i-1].strip() + lines[i-1]
            lines.pop(i-1)
    finalfile.write(line)
My desired output is:
NORTHEAST COAST: *<--- forecast region*
WNG: STORM / FREEZING SPRAY.
WND: NW25. 01/15Z S15. 02/00Z SE25. 02/12Z NE40. 02/18Z N50 LCLY G60 ALONG THE COAST. *<--- Wind speed line*
VIS: 02/00Z-03/03Z 0-1 SN.
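A minimal sketch that produces this output without looking back from "VIS:": each line is checked on its own, and it is treated as a continuation of the "WND:" line above it when it is not blank, does not start with a known field prefix and is not a region header. It assumes "WNG:", "WND:" and "VIS:" are the only field prefixes (FIELDS is a hypothetical list to extend if the file has others) and that region headers end with a colon:

```python
FIELDS = ("WNG:", "WND:", "VIS:")  # assumed set of field prefixes


def merge_wind_lines(lines):
    """Join continuation lines onto the preceding WND: line."""
    merged = []
    for line in lines:
        text = line.strip()
        is_continuation = (
            bool(merged)
            and merged[-1].startswith("WND:")
            and bool(text)                  # keep blank lines as they are
            and not text.startswith(FIELDS)  # a new field starts here
            and not text.endswith(":")       # a region header such as "NORTHEAST COAST:"
        )
        if is_continuation:
            merged[-1] += " " + text
        else:
            merged.append(text)
    return merged


sample = """NORTHEAST COAST:
WNG: STORM / FREEZING SPRAY.
WND: NW25. 01/15Z S15. 02/00Z SE25. 02/12Z NE40. 02/18Z N50 LCLY G60
ALONG THE COAST.
VIS: 02/00Z-03/03Z 0-1 SN."""

for line in merge_wind_lines(sample.splitlines()):
    print(line)
```

Because the check always looks at the last output line, a "WND:" forecast wrapped over three or more lines is joined the same way.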
From there I think I can split the wind speed line as needed. Thanks in advance.
Solution
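Once the "WND:" text is on one line, it can be split at each timestamp. A minimal sketch, assuming every change in the forecast is introduced by a token like 02/00Z (two digits, a slash, two digits, "Z") followed by whitespace, and that the text before the first timestamp is the initial forecast (split_wind and TIMESTAMP are hypothetical names):

```python
import re

# Assumed timestamp format: day/hour in UTC, e.g. "02/18Z".
TIMESTAMP = re.compile(r"(\d{2}/\d{2}Z)\s+")


def split_wind(wnd_line):
    """Return (timestamp, forecast) pairs; the initial forecast has timestamp None."""
    body = wnd_line.split(":", 1)[1].strip()  # drop the "WND:" prefix
    # With a capturing group, re.split keeps the timestamps:
    # [initial, t1, text1, t2, text2, ...]
    parts = TIMESTAMP.split(body)
    segments = []
    if parts[0].strip():
        segments.append((None, parts[0].strip()))
    for stamp, text in zip(parts[1::2], parts[2::2]):
        segments.append((stamp, text.strip()))
    return segments


line = ("WND: NW25. 01/15Z S15. 02/00Z SE25. 02/12Z NE40. "
        "02/18Z N50 LCLY G60 ALONG THE COAST.")
for stamp, forecast in split_wind(line):
    print(stamp, forecast)
```

Individual speeds could then be pulled out of each forecast string, for example re.findall(r"\d+", forecast) returns ['50', '60'] for "N50 LCLY G60 ALONG THE COAST.".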
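This script finds each section that contains "WNG:" and then removes the extra line breaks (the variable txt is the string from the link in the question) (regex101):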
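import re

def get_lines(txt):
    # Join every line that has no "KEY: " marker onto the previous line.
    lines = iter(txt.splitlines())
    buf = next(lines, '')
    for line in lines:
        if ': ' in line:
            # A new field (e.g. "WND: ...") starts: emit the buffered one.
            yield buf
            buf = line
        else:
            # Continuation line: append it to the current field.
            buf += ' ' + line
    if buf:
        yield buf

# txt holds the full NAVTEX text; each section runs from the region header
# (the line just before "WNG:") up to the next blank line.
for wind_data in re.findall(r'([^\n]+:\nWNG:.*?)\n\n', txt, flags=re.S):
    for line in get_lines(wind_data):
        print(line)
    print('-' * 80)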
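If the goal is to work with the values rather than print them, the joined lines of each section can be collected into a dictionary keyed by region. A minimal sketch that repeats the answer's get_lines (copied so the block runs on its own) and assumes, like the regex above, that every section ends with a blank line; parse_sections is a hypothetical helper name and txt here is a short made-up sample in the file's format:

```python
import re


def get_lines(txt):
    """Join lines without ': ' onto the previous line (same logic as the answer)."""
    lines = iter(txt.splitlines())
    buf = next(lines, '')
    for line in lines:
        if ': ' in line:
            yield buf
            buf = line
        else:
            buf += ' ' + line
    if buf:
        yield buf


def parse_sections(txt):
    """Map each region name to a dict of its fields, e.g. {'WNG': ..., 'WND': ...}."""
    sections = {}
    for block in re.findall(r'([^\n]+:\nWNG:.*?)\n\n', txt, flags=re.S):
        region, *fields = get_lines(block)
        # Each field looks like "KEY: value"; split only at the first ': '.
        sections[region.rstrip(':')] = dict(f.split(': ', 1) for f in fields)
    return sections


txt = """NORTHEAST COAST:
WNG: STORM / FREEZING SPRAY.
WND: NW25. 01/15Z S15. 02/00Z SE25. 02/12Z NE40. 02/18Z N50 LCLY G60
ALONG THE COAST.
VIS: 02/00Z-03/03Z 0-1 SN.

FUNK ISLAND BANK:
WNG: NIL.
WND: SW25. 15/05Z S15-20. 15/17Z VRB10-15.

"""

sections = parse_sections(txt)
print(sections['NORTHEAST COAST']['WND'])
```

To run it on the real data, read the file into a string first, e.g. txt = open(newfile).read().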
Prints:
EAST COAST-CAPE ST FRANCIS AND SOUTH:
WNG: NIL.
WND: SW25 LCLY G35 ALONG THE COAST. 14/23Z SW25. 15/05Z SW15 XCPT SW25 OVER SOUTHERN SECTIONS. 15/11Z LGT XCPT W25 OVER SOUTHERN SECTIONS.
--------------------------------------------------------------------------------
EAST COAST-NORTH OF CAPE ST FRANCIS:
WNG: NIL.
WND: SW25 LCLY G35 ALONG THE COAST. 15/05Z SW15-20. 15/11Z VRB10-15. 15/17Z NW15-20.
--------------------------------------------------------------------------------
NORTHEAST COAST:
WNG: NIL.
WND: SW15-20. 14/21Z VRB15. 15/02Z NW20. 15/23Z NW10-15.
--------------------------------------------------------------------------------
FUNK ISLAND BANK:
WNG: NIL.
WND: SW25. 15/05Z S15-20. 15/17Z VRB10-15.
--------------------------------------------------------------------------------
NORTHERN GRAND BANKS:
WNG: NIL.
WND: SW25. 15/02Z SW15-20.
--------------------------------------------------------------------------------
SOUTHWEST COAST:
WNG: GALE.
WND: W25-35. 14/23Z W25. 15/23Z NW15-20.
--------------------------------------------------------------------------------
SOUTH COAST:
WNG: NIL.
WND: SW25 LCLY G35 ALONG THE COAST. 15/01Z SW25. 15/08Z W25 XCPT W15 OVER NORTHERN SECTIONS.
--------------------------------------------------------------------------------
SOUTHEASTERN GRAND BANKS:
WNG: NIL.
WND: SW15-20. 15/14Z W25.
--------------------------------------------------------------------------------
SOUTHWESTERN GRAND BANKS:
WNG: NIL.
WND: W20. 15/05Z W25.
--------------------------------------------------------------------------------