Make it simpler: iterating over multiple files with pandas

Question

import pandas as pd
import glob
import csv
files=glob.glob('*.csv')
for file in files:

    df=pd.read_csv(file, header= None)
    output_file_name = "output_" + file
    with open(output_file_name, 'w') as f:
        f.write("sum of the 1. column is " + str(df.iloc[:, 0].sum())+"\n")
        f.write("sum of the 2. column is " + str(df.iloc[:, 1].sum())+"\n")
        f.write("sum of the 3. column is " + str(df.iloc[:, 2].sum())+"\n")
        f.write("sum of the 4. column is " + str(df.iloc[:, 3].sum())+"\n")

        f.write("max of the 1. column is " + str(df.iloc[:, 0].max()) + "\n")
        f.write("max of the 2. column is " + str(df.iloc[:, 1].max()) + "\n")
        f.write("max of the 3. column is " + str(df.iloc[:, 2].max()) + "\n")
        f.write("max of the 4. column is " + str(df.iloc[:, 3].max()) + "\n")

    f.close()

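How can I loop over this in pandas so that I don't have to repeat all these lines? I want the same kind of output file, with information about the max and the sum. For each CSV file, I want a new file created in the same folder that describes the max, sum, standard deviation (std), etc. For example, an output file would be: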

sum of the 1. column is 21
sum of the 2. column is 23
sum of the 3. column is 33
sum of the 4. column is 30
max of the 1. column is 6
max of the 2. column is 6
max of the 3. column is 8
max of the 4. column is 9

How can this be made simpler? :D :D Tnx

Tags: python-3.x pandas loops dataframe

Answer


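Use iloc to select the first 4 columns, apply the functions with agg, renumber the columns so they start at 1, reshape the result with stack, build the output lines with a list comprehension, and finally write them to the output file with Series.to_csv.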
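To show what each step of that chain produces, here is a minimal sketch that runs it on a small in-memory DataFrame instead of a file. The numbers are made up and stand in for one headerless CSV file (header=None gives the columns the labels 0 to 3); the per-file loop that follows applies the same steps to every CSV file in the folder.

```python
import pandas as pd

# Made-up data standing in for one headerless CSV file (columns labelled 0..3)
df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 6, 8, 9],
                   [6, 1, 2, 3]])

# Rows are the statistics ('sum', 'max', 'std'), columns are the original columns 0..3
df1 = df.iloc[:, :4].agg(['sum', 'max', 'std'])

# Renumber the columns 1..4 so the text reads "1. column", "2. column", ...
df1.columns = range(1, len(df1.columns) + 1)

# stack() turns the table into a Series indexed by (statistic, column number),
# ordered sum 1..4, then max 1..4, then std 1..4
s = df1.stack()

# One line of text per (statistic, column) pair
L = ['{} of the {}. column is {}'.format(a, b, c) for (a, b), c in s.items()]
print("\n".join(L))
```

Printing df1 and s between the steps is an easy way to check the intermediate shapes on your own data.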

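import glob

import pandas as pd

files = glob.glob('*.csv')
for file in files:
    # skip files written by a previous run of this script
    if file.startswith('output_'):
        continue
    df = pd.read_csv(file, header=None)
    # rows: sum, max, std; columns: the first 4 columns of the file
    df1 = df.iloc[:, :4].agg(['sum', 'max', 'std'])
    # number the columns 1..4 instead of 0..3
    df1.columns = range(1, len(df1.columns) + 1)
    # Series indexed by (statistic, column number)
    s = df1.stack()
    L = ['{} of the {}. column is {}'.format(a, b, c) for (a, b), c in s.items()]

    output_file_name = "output_" + file
    # header=False: otherwise recent pandas versions write a "0" header line
    pd.Series(L).to_csv(output_file_name, index=False, header=False)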
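Because std is always a float, agg returns a float table, so the agg/stack version writes sums and maxima as floats (21.0 rather than 21 as in the question's example). If the exact format matters, a plain nested loop over statistic names and column positions also removes the repetition. This is an alternative sketch, not the answer's method; the names STATS, write_summary and summarize_folder are hypothetical, and it assumes headerless numeric CSV files like the question's.

```python
import glob
import os

import pandas as pd

# Hypothetical list of (label, pandas Series method name) pairs; add e.g. ("mean", "mean") as needed
STATS = [("sum", "sum"), ("max", "max"), ("std", "std")]


def write_summary(csv_path, n_cols=4):
    """Write 'output_<name>' next to csv_path, one line per statistic and column."""
    df = pd.read_csv(csv_path, header=None)
    folder, name = os.path.split(csv_path)
    out_path = os.path.join(folder, "output_" + name)
    with open(out_path, "w") as f:  # the with block closes the file, no f.close() needed
        for label, method in STATS:
            for i in range(min(n_cols, df.shape[1])):
                # getattr(col, "sum")() is the same call as col.sum(); int columns stay int
                value = getattr(df.iloc[:, i], method)()
                f.write("{} of the {}. column is {}\n".format(label, i + 1, value))
    return out_path


def summarize_folder(pattern="*.csv"):
    """Summarize every matching CSV file, skipping outputs of earlier runs."""
    outputs = []
    for path in glob.glob(pattern):
        if os.path.basename(path).startswith("output_"):
            continue
        outputs.append(write_summary(path))
    return outputs
```

Calling summarize_folder() from the folder that holds the CSV files writes one output_ file per input file; keeping the statistics in one list means adding a new one is a one-line change.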
