首页 > 解决方案 > 一次将多个数据帧合并在一起

问题描述

我有所有这些数据框:

demographic_data1 = pd.read_csv('Demographic Data Part 1',index_col=0,dtype={'Year':object})
demographic_data2 = pd.read_csv('Demographic Data Part 2',index_col=0,dtype={'Year':object})
employment_data1 = pd.read_csv('Employment Data Part 1',index_col=0,dtype={'Year':object})
employment_data2 = pd.read_csv('Employment Data Part 2',index_col=0,dtype={'Year':object})
employment_data3 = pd.read_csv('Employment Data Part 3',index_col=0,dtype={'Year':object})
employment_data4 = pd.read_csv('Employment Data Part 4',index_col=0,dtype={'Year':object})
employment_data5 = pd.read_csv('Employment Data Part 5',index_col=0,dtype={'Year':object})
employment_data6 = pd.read_csv('Employment Data Part 6',index_col=0,dtype={'Year':object})
employment_data7 = pd.read_csv('Employment Data Part 7',index_col=0,dtype={'Year':object})
employment_data8 = pd.read_csv('Employment Data Part 8',index_col=0,dtype={'Year':object})
employment_data9 = pd.read_csv('Employment Data Part 9',index_col=0,dtype={'Year':object})
employment_data10 = pd.read_csv('Employment Data Part 10',index_col=0,dtype={'Year':object})
employment_data11 = pd.read_csv('Employment Data Part 11',index_col=0,dtype={'Year':object})
employment_data12 = pd.read_csv('Employment Data Part 12',index_col=0,dtype={'Year':object})
employment_data13 = pd.read_csv('Employment Data Part 13',index_col=0,dtype={'Year':object})
health_insurance_data = pd.read_csv('Health Insurance Data Part 1',index_col=0,dtype={'Year':object})
orig_data_updated = pd.read_csv('ML Original Data Updated 2018',index_col=0,dtype={'Year':object})

如果我想加入其中两个,我必须这样做:

new_df1 = orig_data_updated.merge(demographic_data1.drop_duplicates(subset=['Location+Type']), how='left')

然后为了继续加入更多我这样做:

new_df2 = new_df1.merge(demographic_data2.drop_duplicates(subset=['Location+Type']), how='left')

我如何一口气做到这一点?

标签: pythonpandasdataframe

解决方案


更新答案:

Pandas 有一种组合数据帧列表(在其他序列/地图中)的方法pd.concat:(https://pandas.pydata.org/docs/reference/api/pandas.concat.html

combined_df = pd.concat(df_list)

-------------------下面的上一个答案-------------------

您可以将每个数据框附加到列表中,并在列表中循环,每次合并一个新数据框。

file_list = [
  "Demographic Data Part 1",
  "Demographic Data Part 2",
  ...
]

df_list = []
for file_name in file_list:
  df = pd.read_csv(file_name,index_col=0,dtype={'Year':object})
  df_list.append(df)

combined_df = pd.DataFrame()
for df in df_list:
  combined_df = combined_df.merge(df.drop_duplicates(subset=['Location+Type']), how='left')

推荐阅读