How do I find a string with Python and delete the text before it?

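Problem description

I need to find a specific string in each line and delete the text that precedes it.

I have a huge file and want to remove some unwanted text from it.

For example, I have lines like the following: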

&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
 He    /  [A j  }    .   D   V   Fd     Y       $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

Here I need to find the string $G and delete the unwanted characters in front of it. I need a file like this:

$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

Can someone help me write a Python script?

Tags: python, regex

Solution


You can use the re module for this task:

import re

# create demo file
t = """&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
 He    /  [A j  }    .   D   V   Fd     Y       $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64"""

with open("f.txt", "w") as f:
    f.write(t)


# process demo file: keep only the text from $G to the end of each line
pattern = r"^.*?(\$G.*)$"
with open("f.txt") as f, open("r.txt", "w") as w:
    for line in f:
        m = re.search(pattern, line)
        if m:
            # lines without $G are skipped; every kept line ends with one newline
            w.write(m.group(1).rstrip("\n") + "\n")

# show the result
with open("r.txt") as result:
    print(result.read())

Output file:

$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

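The regex searches each line for the text running from the first $G to the end of the line. If a match is found, it is written to the new file; lines without $G are dropped.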
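If the marker is always the fixed text $G, the same trimming might also be done without a regex. The following is only a sketch of that alternative using str.find, not the approach from the answer above; the helper name strip_before_marker and the shortened second sample line are made up for illustration.

```python
def strip_before_marker(line, marker="$G"):
    """Return the line from the first marker onward, or None if it is absent."""
    idx = line.find(marker)
    if idx == -1:
        return None  # no marker: the caller decides whether to drop the line
    return line[idx:].rstrip("\r\n")


# Hypothetical sample lines (the second one is shortened for readability).
sample = [
    "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71\n",
    " He    /  [A j  }    $GLGSV,4,1,13*64\n",
    "no marker here\n",
]

# Keep only the lines that contain the marker, trimmed to start at it.
kept = [s for s in (strip_before_marker(l) for l in sample) if s is not None]
print(kept)
```

Because it works one line at a time, it can be used inside the same `for line in f` loop, so a huge file never has to be loaded into memory at once.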

The regular expression ^.*?(\$G.*)$ means:

^   start of line  
  .*? as few anythings as possible
    ( start of captured group
      \$G  literal $ followed by G
      .* anything greedy
    ) end of captured group
$ end of line
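The breakdown above can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is a small sketch; the names PATTERN, line and result are chosen here for illustration.

```python
import re

PATTERN = re.compile(
    r"""
    ^        # start of line
    .*?      # as few characters as possible
    (        # start of captured group
      \$G    # literal $ followed by G
      .*     # everything up to the end of the line (greedy)
    )        # end of captured group
    $        # end of line
    """,
    re.VERBOSE,
)

line = "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71"
match = PATTERN.search(line)
result = match.group(1) if match else None
print(result)
```

Since the leading .*? is lazy, the capture starts at the first $G in the line, so a line containing several $G markers is kept from the first one onward.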

You may need to add a CRLF after the last line or incorporate \Z into the pattern.
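How the line ending interacts with the pattern can be checked quickly. The sketch below assumes Python's default behaviour, where . does not match a newline and $ also matches just before a trailing newline; the helper extract and the shortened sample sentence are hypothetical.

```python
import re

pattern = re.compile(r"^.*?(\$G.*)$")


def extract(line):
    """Return the text from $G onward without any line ending, or None."""
    m = pattern.search(line)
    # "." can match "\r", so strip both "\r" and "\n" from the captured text
    return m.group(1).rstrip("\r\n") if m else None


with_newline = extract("junk$GNDTM,W84*71\n")
without_newline = extract("junk$GNDTM,W84*71")
with_crlf = extract("junk$GNDTM,W84*71\r\n")

# With \Z instead of $, a line that still carries its "\n" no longer matches.
strict = re.search(r"^.*?(\$G.*)\Z", "junk$GNDTM,W84*71\n")
print(with_newline, without_newline, with_crlf, strict)
```

Under these assumptions the $ version behaves the same whether or not the last line ends with a newline, while a \Z version would need the line ending stripped first.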

It is probably best to test the pattern against your real data, for example at http://regex101.com

