python - Merging multiple DataFrames together at once
Problem description
I have all of these DataFrames:
demographic_data1 = pd.read_csv('Demographic Data Part 1',index_col=0,dtype={'Year':object})
demographic_data2 = pd.read_csv('Demographic Data Part 2',index_col=0,dtype={'Year':object})
employment_data1 = pd.read_csv('Employment Data Part 1',index_col=0,dtype={'Year':object})
employment_data2 = pd.read_csv('Employment Data Part 2',index_col=0,dtype={'Year':object})
employment_data3 = pd.read_csv('Employment Data Part 3',index_col=0,dtype={'Year':object})
employment_data4 = pd.read_csv('Employment Data Part 4',index_col=0,dtype={'Year':object})
employment_data5 = pd.read_csv('Employment Data Part 5',index_col=0,dtype={'Year':object})
employment_data6 = pd.read_csv('Employment Data Part 6',index_col=0,dtype={'Year':object})
employment_data7 = pd.read_csv('Employment Data Part 7',index_col=0,dtype={'Year':object})
employment_data8 = pd.read_csv('Employment Data Part 8',index_col=0,dtype={'Year':object})
employment_data9 = pd.read_csv('Employment Data Part 9',index_col=0,dtype={'Year':object})
employment_data10 = pd.read_csv('Employment Data Part 10',index_col=0,dtype={'Year':object})
employment_data11 = pd.read_csv('Employment Data Part 11',index_col=0,dtype={'Year':object})
employment_data12 = pd.read_csv('Employment Data Part 12',index_col=0,dtype={'Year':object})
employment_data13 = pd.read_csv('Employment Data Part 13',index_col=0,dtype={'Year':object})
health_insurance_data = pd.read_csv('Health Insurance Data Part 1',index_col=0,dtype={'Year':object})
orig_data_updated = pd.read_csv('ML Original Data Updated 2018',index_col=0,dtype={'Year':object})
If I want to join two of them, I have to do this:
new_df1 = orig_data_updated.merge(demographic_data1.drop_duplicates(subset=['Location+Type']), how='left')
Then, to keep joining more, I do this:
new_df2 = new_df1.merge(demographic_data2.drop_duplicates(subset=['Location+Type']), how='left')
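How can I do this all in one go?
Solution
Updated answer:
Pandas has a function for combining a list (or another sequence or mapping) of DataFrames, pd.concat (https://pandas.pydata.org/docs/reference/api/pandas.concat.html). Note that pd.concat stacks the frames along an axis (rows by default) rather than joining them on shared columns, so it fits when the parts are meant to be stacked or are aligned by index; it is not a drop-in replacement for the left merges in the question.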
combined_df = pd.concat(df_list)
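To make the difference between stacking and joining concrete, here is a minimal sketch. The two small frames and their values are made up for illustration; only the 'Location+Type' column name comes from the question.

```python
import pandas as pd

# Hypothetical toy frames standing in for two of the CSV files.
a = pd.DataFrame({"Location+Type": ["A", "B"], "Population": [100, 200]})
b = pd.DataFrame({"Location+Type": ["A", "C"], "Employed": [50, 70]})

# Default (axis=0): rows are stacked, columns are the union of both frames.
stacked = pd.concat([a, b], ignore_index=True)

# axis=1: frames are placed side by side, aligned on the index (outer join by default).
aligned = pd.concat(
    [a.set_index("Location+Type"), b.set_index("Location+Type")],
    axis=1,
)

print(stacked)
print(aligned)
```

With axis=1 the index values must be unique in each frame, which is why duplicates would need to be dropped first, as the question already does with drop_duplicates.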
-------------------Previous answer below-------------------
You can append each DataFrame to a list, then loop over the list and merge one more DataFrame on each iteration.
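file_list = [
    "Demographic Data Part 1",
    "Demographic Data Part 2",
    ...
]
# Read every file into a list of DataFrames
df_list = []
for file_name in file_list:
    df = pd.read_csv(file_name, index_col=0, dtype={'Year': object})
    df_list.append(df)
# Start from the base table from the question; merging into an empty
# pd.DataFrame() raises MergeError because there are no common columns
combined_df = orig_data_updated
for df in df_list:
    combined_df = combined_df.merge(df.drop_duplicates(subset=['Location+Type']), how='left')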
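The same loop can also be written as a single expression with functools.reduce. The following is only a sketch under stated assumptions: the helper name merge_all and the toy frames are hypothetical, and as in the question the merge relies on pandas picking the common columns as join keys.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy frames standing in for the real CSV files.
base = pd.DataFrame({"Location+Type": ["A", "B", "C"], "Year": ["2018", "2018", "2018"]})
demo = pd.DataFrame({"Location+Type": ["A", "B", "B"], "Population": [100, 200, 999]})
jobs = pd.DataFrame({"Location+Type": ["A", "C"], "Employed": [50, 70]})


def merge_all(base_df, others, key="Location+Type"):
    """Left-merge each frame in `others` onto `base_df`, one after another."""
    return reduce(
        # Drop duplicate keys on the right side first, as in the question,
        # so the left merge cannot multiply rows.
        lambda left, right: left.merge(right.drop_duplicates(subset=[key]), how="left"),
        others,
        base_df,  # initial value: the table every other frame is merged onto
    )


combined = merge_all(base, [demo, jobs])
print(combined)
```

Passing the base table as the initial value of reduce plays the same role as starting the loop from a non-empty DataFrame. Rows of the base table with no match in a given frame get NaN in that frame's columns.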