Split a CSV file with headers in Windows using Python and remove text qualifiers from the start and end of lines

Problem description

I have a large CSV file that I need to split. I have managed to split the file with the following Python code:

 import csv

 divisor = 500000

 outfileno = 1
 outfile = None

 with open('file_temp.txt', 'r') as infile:
     for index, row in enumerate(csv.reader(infile)):
         if index % divisor == 0:
             if outfile is not None:
                 outfile.close()
             outfilename = 'big-{}.csv'.format(outfileno)
             outfile = open(outfilename, 'w')
             outfileno += 1
             writer = csv.writer(outfile)
         writer.writerow(row)

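The problem I am facing is that the header row is not copied into the remaining files. How can I modify my code so that the header is added to every split file?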

Tags: python, csv

Solution


You only need to cache the header row and then write it out at the start of each CSV file, for example:

import csv

divisor = 500000
outfileno = 1
outfile = None

try:
    with open('file_temp.txt', 'r', newline='') as infile:
        infile_iter = csv.reader(infile)
        header = next(infile_iter)
        for index, row in enumerate(infile_iter):
            if index % divisor == 0:
                if outfile is not None:
                    outfile.close()
                outfilename = 'big-{}.csv'.format(outfileno)
                # newline='' stops csv.writer from adding blank lines on Windows
                outfile = open(outfilename, 'w', newline='')
                outfileno += 1
                writer = csv.writer(outfile)
                writer.writerow(header)
            writer.writerow(row)
finally:
    # Don't forget to close the last file
    if outfile is not None:
        outfile.close()
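
The same header-caching idea can be wrapped in a function so that it is easy to try on a small file first. This is only a minimal sketch: the names `split_csv`, `rows_per_file` and `prefix` are hypothetical and do not appear in the original answer, and it assumes the first row of the input is the header.

```python
import csv


def split_csv(src_path, rows_per_file, prefix="big-"):
    """Split src_path into numbered CSV files, repeating the header in each.

    Returns the list of file names written. All names are illustrative.
    """
    written = []
    outfile = None
    try:
        # newline='' lets the csv module handle line endings itself,
        # which avoids blank lines between rows on Windows.
        with open(src_path, "r", newline="") as infile:
            reader = csv.reader(infile)
            header = next(reader, None)
            if header is None:  # empty input: nothing to split
                return written
            for index, row in enumerate(reader):
                if index % rows_per_file == 0:
                    # Start a new output file and repeat the cached header.
                    if outfile is not None:
                        outfile.close()
                    name = "{}{}.csv".format(prefix, len(written) + 1)
                    outfile = open(name, "w", newline="")
                    written.append(name)
                    writer = csv.writer(outfile)
                    writer.writerow(header)
                writer.writerow(row)
    finally:
        # Close the last file even if an error occurred part-way through.
        if outfile is not None:
            outfile.close()
    return written
```

Calling `split_csv('file_temp.txt', 500000)` would then behave like the script above, and the returned list shows which files were created.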

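Since you are only working with whole lines (assuming every record occupies exactly one line), you don't actually need the csv module. Here is a version without it: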

divisor = 500000
outfileno = 1
outfile = None

try:
    with open('file_temp.txt', 'r') as infile:
        header = next(infile)
        for index, row in enumerate(infile):
            if index % divisor == 0:
                if outfile is not None:
                    outfile.close()
                outfilename = 'big-{}.csv'.format(outfileno)
                outfile = open(outfilename, 'w')
                outfileno += 1
                outfile.write(header)
            outfile.write(row)
finally:
    # Don't forget to close the last file
    if outfile is not None:
        outfile.close()
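
One caveat with the line-based version: it counts physical lines, while the csv module counts records. The two differ as soon as a quoted field contains a line break, in which case a record could be cut in half at a file boundary. The small check below illustrates the difference; the sample data is made up for this sketch.

```python
import csv
import io

# One header plus two records; the second record has a quoted field
# that contains a line break (made-up sample data).
data = 'id,comment\r\n1,plain\r\n2,"first line\r\nsecond line"\r\n'

# What the line-based splitter sees: every physical line.
physical_lines = data.splitlines()

# What csv.reader sees: one entry per logical record.
records = list(csv.reader(io.StringIO(data, newline="")))

print(len(physical_lines))  # 4 physical lines
print(len(records))         # 3 records (header + 2 rows)
```

If the input may contain such fields, the csv-based version is the safer choice.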
