Sending multiple rows at a time with pandas

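Problem description

I have 8492 rows in a CSV file. In each iteration I need to send 1000 rows to functionX, except in the last iteration, which should send 1492 rows (the final 1000 rows plus the remainder, which is fewer than 1000). How can I update my code to do this?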

import pandas as pd

Path_source_file = 'C:/Users/lap/Desktop/bone/Z1.csv'
row_count = len(open(Path_source_file).readlines())
print(row_count)
count = 1000    # number of rows sent per iteration
skip = 0
nraw = 1000
no_cluster = 0
for i in range(1, row_count + 1):
    if count <= row_count:
        dataset = pd.read_csv(Path_source_file, skiprows=skip, nrows=nraw, header=None)
        X = dataset.iloc[:, [0, 0]].values
        functionX(X, i)
        no_cluster += 1
        count += 1000
        skip += 1000
    if no_cluster == 9:
        break

Tags: python, pandas, file, csv

Solution


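To avoid reading the whole file at once, you still need to keep two chunks of 1000 rows in memory (I assume this is acceptable, since in the worst case the final call receives 1000 + 999 rows):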

import pandas as pd

Path_source_file = 'C:/Users/lap/Desktop/bone/Z1.csv'
count = 1000    # number of rows per chunk
chunk0 = None   # previous full chunk, held back until we know whether it is the last one
readLast = False
for chunk1 in pd.read_csv(Path_source_file, chunksize=count, header=None):
    if chunk1.shape[0] == count:  # All but last iteration (except special case)
        data = chunk0
        chunk0 = chunk1
        if data is None:  # First chunk: nothing to send yet
            continue
    else:  # Last iteration: merge the short tail into the held-back chunk
        # DataFrame.append was removed in pandas 2.0, so use pd.concat
        data = chunk1 if chunk0 is None else pd.concat([chunk0, chunk1])
        readLast = True
    X = data.iloc[:, [0, 0]].values
    functionX(X)  # I removed the index i, I let you reimplement it if needed
if not readLast and chunk0 is not None:  # Special case with multiple of 1000, do the last chunk here.
    X = chunk0.iloc[:, [0, 0]].values
    functionX(X)  # I removed the index i, I let you reimplement it if needed
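
The same hold-back idea can also be packaged as a generator, which keeps the chunking logic separate from the call to functionX. The following is only a minimal sketch under stated assumptions: the name `iter_merged_chunks` is hypothetical (it is not part of pandas or of the original code), and an in-memory CSV of 8492 rows stands in for Z1.csv so the result can be checked without the real file.

```python
import io

import pandas as pd


def iter_merged_chunks(source, chunk_size=1000):
    """Yield DataFrames of chunk_size rows; a shorter final remainder is
    merged into the last full chunk instead of being yielded on its own.

    Hypothetical helper for illustration, not a pandas API.
    """
    held = None  # most recent full chunk that has not been yielded yet
    for chunk in pd.read_csv(source, chunksize=chunk_size, header=None):
        if held is not None and len(chunk) < chunk_size:
            # Short tail: attach it to the held-back chunk and yield both together.
            yield pd.concat([held, chunk])
            held = None
        else:
            # Full chunk (or the very first chunk): release the previous one
            # and hold this one back until we know what follows it.
            if held is not None:
                yield held
            held = chunk
    if held is not None:
        # Row count was a multiple of chunk_size, or the file had only one chunk.
        yield held


# Stand-in data: 8492 rows with two columns, mimicking the size in the question.
csv_text = "\n".join(f"{i},{i * 2}" for i in range(8492))
sizes = [len(df) for df in iter_merged_chunks(io.StringIO(csv_text))]
print(sizes)  # seven chunks of 1000 rows, then one of 1492
```

If the index i is needed again, wrapping the generator in `enumerate(..., start=1)` and calling `functionX(df.iloc[:, [0, 0]].values, i)` inside the loop should be enough; with the real file, pass `Path_source_file` instead of the `io.StringIO` object.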

