Using pd.merge to fill missing values in the left DataFrame from a reference table

Problem description

I can't get pd.merge to fill in some missing data in this DataFrame:

fulldf.head(20)

 code    Major_Project_Theme
0   8   Human development
1   11  
2   1   Economic management
3   6   Social protection and risk management
4   5   Trade and integration
5   2   Public sector governance
6   11  Environment and natural resources management
7   6   Social protection and risk management
8   7   Social dev/gender/inclusion
9   7   Social dev/gender/inclusion
10  5   Trade and integration
11  4   Financial and private sector development
12  6   Social protection and risk management
13  6   
14  2   Public sector governance
15  4   Financial and private sector development
16  11  Environment and natural resources management
17  8   
18  10  Rural development
19  7   

using this reference table:

fullgroupeddf = fulldf.groupby(['code', 'Major_Project_Theme']).count()
fullgroupeddf

code    Major_Project_Theme
1   Economic management
10  Rural development
11  Environment and natural resources management
2   Public sector governance
3   Rule of law
4   Financial and private sector development
5   Trade and integration
6   Social protection and risk management
7   Social dev/gender/inclusion
8   Human development
9   Urban development

I tried this, but it didn't work:

filleddf = fulldf.merge(fullgroupeddf, how='left', left_on='code', right_on='code')

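Honestly, I don't know what I'm doing when it comes to merging. The idea is to use the reference table I created to fill in the missing values of Major_Project_Theme in the first DataFrame. What should I add to the merge statement, or is there a better way to do this?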

Tags: python, pandas

Solution


Assuming the rows with missing data actually contain an empty string '', you can use transform(max) after groupby, as in the following code:

filleddf = fulldf.copy()  # copy only if you want to keep fulldf unchanged
# fill missing values in Major_Project_Theme with the per-code maximum string:
filleddf['Major_Project_Theme'] = (filleddf.groupby('code')['Major_Project_Theme']
                                            .transform(max))
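
To see why this works, here is a minimal sketch on a made-up six-row sample (the `sample` frame and its rows are assumptions for illustration, not the asker's full data). Any non-empty string compares greater than '', so the per-code maximum is the theme name whenever at least one row for that code has it. The sketch passes the string "max" rather than the builtin; pandas treats it as the name of the same group-wise aggregation.

```python
import pandas as pd

# Hypothetical sample reusing a few rows from the question; '' marks a missing theme.
sample = pd.DataFrame({
    "code": [8, 11, 11, 6, 6, 8],
    "Major_Project_Theme": [
        "Human development",
        "",
        "Environment and natural resources management",
        "Social protection and risk management",
        "",
        "",
    ],
})

# '' sorts before every non-empty string, so the maximum within each
# code group is the theme name, broadcast back to every row of the group.
sample_filled = sample.copy()
sample_filled["Major_Project_Theme"] = (
    sample_filled.groupby("code")["Major_Project_Theme"].transform("max")
)

print(sample_filled)
```

A code whose rows are all '' would stay '' under this approach, since there is nothing to take the maximum over except empty strings.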

filleddf should then have every row filled with the correct Major_Project_Theme for its code.
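
If you would rather stay close to the reference-table idea from the question, the following is a rough sketch of that route, again assuming missing themes are empty strings. One likely reason the original merge did not help is that groupby(['code', 'Major_Project_Theme']).count() places both keys in the index, so the right-hand table has no ordinary Major_Project_Theme column to bring across. The names `demo_df`, `ref`, `lookup`, `demo_filled` and `merged` below are hypothetical, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical stand-in for fulldf; '' marks a missing theme.
demo_df = pd.DataFrame({
    "code": [8, 11, 1, 11, 6, 6, 8],
    "Major_Project_Theme": [
        "Human development",
        "",
        "Economic management",
        "Environment and natural resources management",
        "Social protection and risk management",
        "",
        "",
    ],
})

# Reference table as plain columns: one row per code, taken from rows with a theme.
known = demo_df[demo_df["Major_Project_Theme"] != ""]
ref = known.drop_duplicates("code")[["code", "Major_Project_Theme"]]

# Option 1: map each code through a Series indexed by code.
lookup = ref.set_index("code")["Major_Project_Theme"]
demo_filled = demo_df.copy()
mapped = demo_filled["code"].map(lookup)
# Keep the original value for any code that is absent from the reference.
demo_filled["Major_Project_Theme"] = mapped.fillna(demo_filled["Major_Project_Theme"])

# Option 2: a left merge on 'code', dropping the incomplete column first
# so the theme comes only from the reference table.
merged = demo_df[["code"]].merge(ref, on="code", how="left")

print(demo_filled)
print(merged)
```

Both options give the same themes here because every code in the sample appears in the reference; with the merge, a code missing from the reference would come back as NaN instead of ''.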

