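python - Python: extract the lines of a .txt that match a word in another .txt (like grep)
Problem description
I'm new to the Python world, so please forgive me if I say something stupid... I have a problem with my script. I have a huge list of stations (I'll call it huge_list.txt) that looks like this: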
1ULM MIDAS4 2003.4497 2019.1075 15.6578 5496 4984 7928 -0.013284 -0.000795
20NA MIDAS4 2008.2355 2017.4511 9.2156 2793 2793 5010 0.031619 0.059160
21NA MIDAS4 2008.2355 2017.4648 9.2293 3287 3287 5891 0.031598 0.059243
25MA MIDAS4 2013.3717 2019.1075 5.7358 2007 1279 1398 -0.010216 0.016478
299C MIDAS4 2003.0308 2007.0856 4.0548 1407 1407 2159 -0.003861 -0.021031
2TRY MIDAS4 2012.0465 2013.6564 1.6099 564 437 437 0.018726 0.054083
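The first four characters of each line are the station name (e.g. 25MA, 299C...). I created a .txt with some of the station names (I'll call it "station_list.txt") that looks like this: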
20NA
21NA
2TRY
ETC...
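What I want to do is create a .txt file containing the lines of huge_list.txt whose station name appears in station_list.txt. I can do it, but only for a single item of the station list, like this: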
# The station file is opened but not used yet: only one hard-coded station is handled
with open("station_name.txt", "r") as p:
    item = '20NA'

    def lines_that_start_with(string, fp):
        return [line for line in fp if line.startswith(string)]

    with open("station_line.txt", "w") as l:
        with open(r"C:\huge_list.txt", "r") as fp:
            for line in lines_that_start_with(item, fp):
                print(line, end="")
                l.write(line)
How can I make it run for every item of my station_list?
Solution
# Your huge list will be input.txt
# Your station list will be input2.txt
# Read the (small) station list once and strip the newlines, otherwise startswith() never matches
In [3]: with open('input2.txt') as inp2:
   ...:     stations = tuple(j.strip() for j in inp2 if j.strip())
# If you don't want to hold the huge file in memory, stream it line by line; memory consumption stays low
In [4]: with open('input.txt') as inp1:
   ...:     for i in inp1:
   ...:         if i.startswith(stations): print(i, end='')
# Result
25MA MIDAS4 2013.3717 2019.1075 5.7358 2007 1279 1398 -0.010216 0.016478
299C MIDAS4 2003.0308 2007.0856 4.0548 1407 1407 2159 -0.003861 -0.021031
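The session above only prints the matches, while the question asks for an output .txt file. Below is a minimal sketch of the same streaming idea as a reusable function, assuming whitespace-separated columns with the station name in the first field; the function name extract_station_lines and its parameters are hypothetical. It looks the first field up in a set instead of using startswith, so a name like 20NA cannot accidentally match a longer name such as 20NAX.

```python
def extract_station_lines(huge_path, stations_path, out_path):
    """Copy the lines of huge_path whose first field is listed in stations_path.

    Hypothetical helper: only the station names are held in memory,
    the huge file is streamed line by line. Returns the number of lines written.
    """
    # One station name per line; blank lines are ignored
    with open(stations_path) as f:
        stations = {line.strip() for line in f if line.strip()}

    written = 0
    with open(huge_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split(maxsplit=1)
            # Exact match on the first column (the station name)
            if fields and fields[0] in stations:
                # The last line of a file may lack '\n'; add it so records stay separate
                dst.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

With the question's file names this would be called as extract_station_lines("huge_list.txt", "station_list.txt", "station_line.txt").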
# If you want to do some kind of work on the filtered data, it is better to store it in memory
In [5]: with open('input.txt') as f:
   ...:     inp1 = dict(line.split(maxsplit=1) for line in f if line.strip())
# The lines above read your huge file and convert it into a dict of key-value pairs (station name -> rest of the line)
# The result will look like this:
In [6]: inp1
Out[6]:
{'1ULM': 'MIDAS4 2003.4497 2019.1075 15.6578 5496 4984 7928 -0.013284 -0.000795\n',
'20NA': 'MIDAS4 2008.2355 2017.4511 9.2156 2793 2793 5010 0.031619 0.059160\n',
'21NA': 'MIDAS4 2008.2355 2017.4648 9.2293 3287 3287 5891 0.031598 0.059243\n',
'25MA': 'MIDAS4 2013.3717 2019.1075 5.7358 2007 1279 1398 -0.010216 0.016478\n',
'299C': 'MIDAS4 2003.0308 2007.0856 4.0548 1407 1407 2159 -0.003861 -0.021031\n',
'2TRY': 'MIDAS4 2012.0465 2013.6564 1.6099 564 437 437 0.018726 0.054083'}
# Do the same for the station file, but use a set instead of a dict
In [22]: with open('input2.txt') as f:
    ...:     inp2 = {j.strip() for j in f if j.strip()}
# inp2 will look like this:
In [23]: inp2
Out[23]: {'25MA', '299C'}
# To get your result, filter the dict by the station set:
In [24]: res = {k:v for k,v in inp1.items() if k in inp2}
In [25]: res
Out[25]:
{'25MA': 'MIDAS4 2013.3717 2019.1075 5.7358 2007 1279 1398 -0.010216 0.016478\n',
'299C': 'MIDAS4 2003.0308 2007.0856 4.0548 1407 1407 2159 -0.003861 -0.021031\n'}
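To write res to the output file the question asks for, each line has to be rebuilt from the key and the value. A small sketch, assuming the dict layout shown above (station name -> rest of the line); the name write_result is hypothetical. The last line of the input file has no trailing newline, so each value is normalised before writing; otherwise two records could end up on the same line.

```python
def write_result(res, out_path):
    """Hypothetical helper: write {station: rest_of_line} as 'station rest_of_line' lines."""
    with open(out_path, "w") as out:
        for station, rest in res.items():
            # Values keep their original '\n' except the file's last line; normalise both cases
            rest = rest.rstrip("\n")
            out.write(f"{station} {rest}\n")
```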
# Hope this answer helps you
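For comparison, the question's own lines_that_start_with can also be called once per station. A file object can only be iterated once, so the huge file has to be reopened for every station; this makes one full pass over the huge file per station and is much slower than the set lookups above for long station lists. A sketch under those assumptions, with the hypothetical function name extract_by_looping; it assumes single spaces between columns (as in the sample data) and uses the station name plus a space as the prefix. Output lines come out grouped in station-list order rather than in huge-file order.

```python
def lines_that_start_with(string, fp):
    # The helper from the question, unchanged
    return [line for line in fp if line.startswith(string)]


def extract_by_looping(huge_path, stations_path, out_path):
    """Hypothetical helper: run lines_that_start_with once per station."""
    with open(stations_path) as p:
        stations = [line.strip() for line in p if line.strip()]

    with open(out_path, "w") as out:
        for item in stations:
            # Reopen the huge file: the previous pass exhausted the iterator
            with open(huge_path) as fp:
                # Append a space so '20NA' does not also match a longer name like '20NAX'
                for line in lines_that_start_with(item + " ", fp):
                    # Keep records on separate lines even if the file's last line lacks '\n'
                    out.write(line if line.endswith("\n") else line + "\n")
```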