How do I add up a column from multiple data frames in a for loop?

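Problem description

I have some csv files stored in a folder. I want to read each of them and sum a specific column into a new data frame. They all have the same index range and the same column names. This is what I have so far: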

import pandas as pd
import glob
path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")
df2= pd.DataFrame(index=range(646))
for file in files:    
    df = pd.read_csv(file, encoding="latin", sep=';')


    # split "3500105 - Adamantina" at the first hyphen into two columns
    new = df["Unnamed: 0"].str.split("-", n = 1, expand = True)


    # IBGE code: the part before the hyphen
    df["IBGE"]= new[0] 

    # city name: the part after the hyphen
    df["Cidade"]= new[1]

    # drop the original combined column
    df.drop(columns =["Unnamed: 0"], inplace = True) 

    df = df.set_index('Cidade')

    df2 = df['Total']

df2.head()

Out:
Cidade
 Adamantina          0
 Adolfo              0
 Aguaí               0
 Águas da Prata      0
 Águas de Lindóia    0
Name: Total, dtype: int64

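What I expected in the new data frame is the sum of the column named 'Total' across every file in the folder (I could not code this without getting errors).

Here is a sample of one of the .csv files: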

                  Unnamed: 0  Total  Cadastro  Sem Registro Civil
0        3500105 - Adamantina   17.0      17.0                   0
1            3500204 - Adolfo    3.0       3.0                   0
2             3500303 - Aguaí   14.0      14.0                   0
3    3500402 - Águas da Prata    2.0       2.0                   0
4  3500501 - Águas de Lindóia    0.0       0.0                   0

Tags: python, pandas

Solution


Try concat and groupby. Does this work for you:

import pandas as pd
import glob
path = r'C:\Users\lsminervino\Desktop\MUN'
files = glob.glob(path + "/*.csv")
total_df = []
for file in files:    
    df = pd.read_csv(file, encoding="latin", sep=';')


    # split "3500105 - Adamantina" at the first hyphen into two columns
    new = df["Unnamed: 0"].str.split("-", n = 1, expand = True)


    # IBGE code: the part before the hyphen
    df["IBGE"]= new[0] 

    # city name: the part after the hyphen
    df["Cidade"]= new[1]

    # drop the original combined column
    df.drop(columns =["Unnamed: 0"], inplace = True) 

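    # index by city so the stacked totals can be grouped by 'Cidade'
    total_df.append(df.set_index('Cidade')['Total'])

# stack the per-file 'Total' series and sum the rows that share a city
df_final = pd.concat(total_df).groupby(level='Cidade').sum()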
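
To check the idea without touching the folder, here is a minimal self-contained sketch of the same concat and groupby approach. The two in-memory "files" (csv_a, csv_b), their values, and the helper name total_by_city are made up for illustration; they only assume the same ';'-separated layout as the sample above, with an unnamed first column. Unlike the code above, the sketch also strips the whitespace around the city name.

```python
import io
import pandas as pd

# Two hypothetical in-memory "files" that mimic the sample layout (sep=';').
# The empty first header field makes pandas name that column "Unnamed: 0".
csv_a = """;Total;Cadastro;Sem Registro Civil
3500105 - Adamantina;17.0;17.0;0
3500204 - Adolfo;3.0;3.0;0
3500303 - Aguaí;14.0;14.0;0
"""
csv_b = """;Total;Cadastro;Sem Registro Civil
3500105 - Adamantina;1.0;1.0;0
3500204 - Adolfo;2.0;2.0;0
3500303 - Aguaí;0.0;0.0;0
"""


def total_by_city(source):
    """Return the 'Total' column of one file as a Series indexed by city name."""
    df = pd.read_csv(source, sep=";")
    # Split "3500105 - Adamantina" at the first hyphen; keep the city part.
    parts = df["Unnamed: 0"].str.split("-", n=1, expand=True)
    df["Cidade"] = parts[1].str.strip()
    return df.set_index("Cidade")["Total"]


# One Series per file, then stack them and sum the rows that share a city.
series_list = [total_by_city(io.StringIO(text)) for text in (csv_a, csv_b)]
df_final = pd.concat(series_list).groupby(level="Cidade").sum()
print(df_final)
```

With these made-up values, Adamantina sums to 18.0 (17.0 + 1.0) and Adolfo to 5.0. Grouping on the index level means a city missing from one file is still summed over the files that do contain it.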
