Reading a CSV file in batches: the reader always skips the same row

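Problem description

I have a simple Python script that reads a CSV file in batches of 5. The CSV file contains 9 records in total (not counting the header). The script below reads the file in batches of 5, but it always seems to skip the record with ID 6. What am I doing wrong?

The .csv file: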

"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"

The Python script:

import csv

batch_size = 5
transaction_count = 0

parameter_set = []

with open('test.csv', 'r') as file:
    reader = csv.DictReader(file, delimiter=',')

    for row in reader:

        entry = get_entry(row)

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
        else:
            parameter_set.append(entry)
            
    # check if we have records that didn't fit into a batch (i.e. less than 5)
    if(len(parameter_set) > 0):
        execute_transaction(sql, parameter_set)
        transaction_count = transaction_count + 1
        print(f'Transaction count: {transaction_count}')

If I set a breakpoint on the line entry = get_entry(row) after the first batch has finished, I get ID = 7, so row 6 of the CSV has been skipped.

Tags: python, csv

Solution


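The problem is that entry is never appended to parameter_set when the if condition is true:

len(parameter_set) == batch_size

On that iteration the batch is executed and the list is cleared, but the row that was just read (ID 6 in your file) is thrown away. After clearing the list you also need to append entry to parameter_set, so I would suggest: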

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()
            parameter_set.append(entry)
        else:
            parameter_set.append(entry)

Alternatively, to avoid duplicated code, you can move the .append() call out of the if-else, since it is executed in both branches anyway.

        if(len(parameter_set) == batch_size):
            execute_transaction(sql, parameter_set)

            transaction_count = transaction_count + 1
            print(f'Transaction count: {transaction_count}')

            parameter_set.clear()

        parameter_set.append(entry)
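
For completeness, here is a minimal self-contained sketch of the whole script with the second variant applied. Several pieces are assumptions, because the question does not show them: get_entry and execute_transaction are hypothetical stand-ins, sql is a placeholder string, the function name load is made up, and the CSV is read from an in-memory string instead of test.csv.

```python
import csv
import io

CSV_TEXT = '''"RIG_ID","STATUS_DATE"
"1","2019-04-10"
"2","2019-04-11"
"3","2019-04-12"
"4","2019-04-13"
"5","2019-04-14"
"6","2019-04-15"
"7","2019-04-16"
"8","2019-04-17"
"9","2019-04-18"
'''

executed = []  # every batch handed to the stub below is recorded here
sql = "-- placeholder: the real statement is not shown in the question"


def get_entry(row):
    # Hypothetical stand-in for the question's get_entry.
    return (int(row["RIG_ID"]), row["STATUS_DATE"])


def execute_transaction(sql, parameter_set):
    # Hypothetical stand-in: store a copy, because the caller clears the list.
    executed.append(list(parameter_set))


def load(file, batch_size=5):
    parameter_set = []
    transaction_count = 0
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        # Flush a full batch first, then always keep the current row.
        if len(parameter_set) == batch_size:
            execute_transaction(sql, parameter_set)
            transaction_count += 1
            parameter_set.clear()
        parameter_set.append(get_entry(row))
    # Flush the records that did not fill a whole batch.
    if parameter_set:
        execute_transaction(sql, parameter_set)
        transaction_count += 1
    return transaction_count


count = load(io.StringIO(CSV_TEXT))
print(count)  # 2
print([[rig_id for rig_id, _ in batch] for batch in executed])
# [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

Note that parameter_set.clear() empties the same list object that was passed to execute_transaction, so a real implementation that keeps the parameters around after returning would need its own copy, as the stub does here.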
