I want to merge some txt files

Problem description

Have a nice day, everyone!

I have some txt files and a script that puts them together. Each txt file starts with:

Export Type:                        by LAI\GCI\SAI
LAI\GCI\SAI:                        fjdfkj
HLR NUMBER:                         NA
Routing Category:                   NA
Telephone Service:                  NA
Export User Scope:                  Attached & Detached User
Task Name:                          lfl;sfd
Data Type:                          col1/col2
Begin Time of Exporting data:       2019-4-14 19:41
=================================
col1                    col2         
401885464645645         54634565754     
401884645645564         54545454564
401087465836453         54545454565     
401885645656567         53434343435
401084569498484         54342340788
401088465836453         56767686334
401439569345656         64545467558
401012993933334         55645342352
401034545566463         34353463464

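I want to combine the files starting from the col1 and col2 data (without the column names), but the script also merges the metadata lines at the top of each file. Can you update this script?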

import fileinput
import glob

file_list = glob.glob("*.txt")

with open('resultfile.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)

Another question: I want to remove the leading 5 from the values in col2, and delete all rows that do not start with 40108/40188/401088e. Thanks!

Tags: excel, python-3.x, pandas, data-science

Solution


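Import selectively by telling pandas which row is the header: read_csv then takes the column names from that row and ignores the metadata lines above it. The resulting DataFrames can then be concatenated and written back out as CSV.

Given the tags on the question, I assume you want to do this with pandas.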
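The header row is hard-coded as row 10 in this answer. As a minimal sketch, assuming every export has a single line of `=` characters directly above the column names, the row number could instead be located per file. The shortened `sample` text and the helper `find_header_row` are hypothetical and introduced only for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical, shortened export: two metadata lines, a separator, then the table.
sample = r"""Export Type:                        by LAI\GCI\SAI
Task Name:                          lfl;sfd
=================================
col1                    col2
401885464645645         54634565754
401087465836453         54545454565
"""


def find_header_row(text):
    """Return the zero-based row number of the line after the '=====' separator.

    Only non-blank lines are counted, because read_csv skips blank lines
    by default when it interprets the header argument.
    """
    row = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are not counted
        if line.startswith("====="):
            return row + 1
        row += 1
    raise ValueError("no '=====' separator line found")


header_row = find_header_row(sample)

# Rows above header_row are discarded; header_row supplies the column names.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=header_row)
print(df)
```

For files on disk, the same helper could be applied to the text of each file before calling read_csv on it, which avoids relying on every export having exactly the same number of metadata lines.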

import glob

import pandas as pd


# Sample export. The raw string keeps the backslashes in LAI\GCI\SAI literal.
csvdata = r"""Export Type:                        by LAI\GCI\SAI
LAI\GCI\SAI:                        fjdfkj
HLR NUMBER:                         NA
Routing Category:                   NA
Telephone Service:                  NA
Export User Scope:                  Attached & Detached User
Task Name:                          lfl;sfd
Data Type:                          col1/col2
Begin Time of Exporting data:       2019-4-14 19:41
=================================
col1                    col2
401885464645645         54634565754
401884645645564         54545454564
401087465836453         54545454565
401885645656567         53434343435
401084569498484         54342340788
401088465836453         56767686334
401439569345656         64545467558
401012993933334         55645342352
401034545566463         34353463464"""

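# Write three identical sample files so the example is self-contained.
files = ["file{}.txt".format(i) for i in range(3)]
for fn in files:
    with open(fn, "w") as f:
        f.write(csvdata)

file_list = glob.glob("file*.txt")

# Row 10 (zero-based) holds the column names; the rows above it are discarded.
dfs = []
for f in file_list:
    df = pd.read_csv(f, sep=r"\s+", header=10)
    dfs.append(df)

df = pd.concat(dfs)
# reset_index keeps each file's original row number as an "index" column.
df.reset_index(inplace=True)

df.to_csv("resultfile.txt")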

This produces:

,index,col1,col2
0,0,401885464645645,54634565754
1,1,401884645645564,54545454564
2,2,401087465836453,54545454565
3,3,401885645656567,53434343435
4,4,401084569498484,54342340788
5,5,401088465836453,56767686334
6,6,401439569345656,64545467558
7,7,401012993933334,55645342352
8,8,401034545566463,34353463464
9,0,401885464645645,54634565754
10,1,401884645645564,54545454564
11,2,401087465836453,54545454565
12,3,401885645656567,53434343435
...snip...
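The second part of the question (dropping the leading 5 from col2 and keeping only certain col1 prefixes) is not covered by the script above. A minimal sketch of one way to do it follows, under two assumptions: the values are read as strings (`dtype=str`) so prefixes can be tested directly, and the "401088e" in the question is a typo for 401088, which is already covered by the 40108 prefix. The small `table` text is a hypothetical excerpt of the sample data.

```python
import io

import pandas as pd

# Hypothetical excerpt of the data section of one export.
table = """col1                    col2
401885464645645         54634565754
401087465836453         54545454565
401439569345656         64545467558
401034545566463         34353463464
"""

# Read everything as text so that prefixes can be matched and stripped.
df = pd.read_csv(io.StringIO(table), sep=r"\s+", dtype=str)

# Keep only rows whose col1 starts with one of the wanted prefixes.
# str.match anchors the pattern at the start of each value.
mask = df["col1"].str.match(r"(?:40108|40188)")
kept = df[mask].copy()

# Remove a single leading 5 from col2, if present.
kept["col2"] = kept["col2"].str.replace(r"^5", "", regex=True)

# Write without the index and without the column names, as the question asks.
result_text = kept.to_csv(sep="\t", index=False, header=False)
print(result_text)
```

Passing a file name as the first argument of `to_csv` writes the same content to disk instead of returning it as a string. The tab separator is an arbitrary choice for this sketch.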
