python - Merging two irregular DataFrames in Python
Problem description
I have two DataFrames, df1 and df2:
ID Range(US) Count(US) Mean(US)
0 690 1-3 266 4.0
1 4-7 277 NaN
2 354 1-3 233 2.0
3 4-7 85 NaN
4 947 1-3 156 4.0
and
ID Range(UK) Count(UK) Mean(UK)
0 690 1-3 186 4.0
1 4-7 25 NaN
2 354 1-3 44 1.0
3 947 1-3 213 3.0
4 4-7 33 NaN
I merge them with this code:
In: df = df1.merge(df2, left_on='deviceid', right_on='deviceid', how='left')
df
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266 4.0 1-3 186 4.0
1 4-7 277 NaN 4-7 25 NaN
2 4-7 277 NaN 4-7 33 NaN
3 354 1-3 233 2.0 1-3 44 1.0
4 4-7 85 NaN 4-7 25 NaN
5 4-7 85 NaN 4-7 33 NaN
6 947 1-3 156 4.0 1-3 213 3.0
As the output above shows, rows whose ID is missing get repeated in the result.
But the expected output is:
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266 4.0 1-3 186 4.0
1 4-7 277 NaN 4-7 25 NaN
2 354 1-3 233 2.0 1-3 44 1.0
3 4-7 85 NaN NaN NaN NaN
4 947 1-3 156 4.0 1-3 213 3.0
5 4-7 NaN NaN 4-7 33 NaN
Solution
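The repeated rows in the question are consistent with a many-to-many match on the blank IDs: each blank-ID row on the left pairs with every blank-ID row on the right. The following is a minimal sketch of that effect, assuming the blanks are stored as empty strings; the `left` and `right` frames are hypothetical miniatures of the question's data, not the original DataFrames.

```python
import pandas as pd

# Hypothetical miniature of the question's data: blank IDs stored as empty strings.
left = pd.DataFrame({"ID": ["690", "", "354", ""],
                     "Count(US)": [266, 277, 233, 85]})
right = pd.DataFrame({"ID": ["690", "", "947", ""],
                      "Count(UK)": [186, 25, 213, 33]})

# Merging on ID alone: every blank ID on the left matches both blank IDs on the right.
out = left.merge(right, on="ID", how="left")
print(out)
```

Each of the two blank-ID rows in `left` appears twice in `out`, which is the same kind of duplication seen in the question.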
First, stop replacing the duplicated ID values in both DataFrames, so that every row keeps its ID:
# Leave these two lines out: they blank the repeated IDs, and the merge needs the ID on every row
#df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '')
#df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '')
print(df1)
ID Range(US) Count(US) Mean(US)
0 690 1-3 266 4.0
1 690 4-7 277 NaN
2 354 1-3 233 2.0
3 354 4-7 85 NaN
4 947 1-3 156 4.0
print(df2)
ID Range(UK) Count(UK) Mean(UK)
0 690 1-3 186 4.0
1 690 4-7 25 NaN
2 354 1-3 44 1.0
3 947 1-3 213 3.0
4 947 4-7 33 NaN
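If the blanks are already present in the data and the masking step cannot simply be skipped, one possible way to restore the IDs is to forward-fill them. This is only a sketch: it assumes the blanks are empty strings, that the IDs are stored as strings, and that each blank row belongs to the nearest ID above it. The `df1` below is a hypothetical reconstruction of the question's first table.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of df1 from the question, with blank IDs as empty strings.
df1 = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
    "Mean(US)": [4.0, np.nan, 2.0, np.nan, 4.0],
})

# Turn the blanks into missing values, then carry the last seen ID forward.
df1["ID"] = df1["ID"].mask(df1["ID"].eq("")).ffill()
print(df1)
```

The same two steps would apply to df2 under the same assumptions.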
Then merge on both columns (ID and the range) with an outer join:
df = df1.merge(df2, left_on=['ID', 'Range(US)'], right_on=['ID', 'Range(UK)'], how='outer')
print(df)
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266.0 4.0 1-3 186.0 4.0
1 690 4-7 277.0 NaN 4-7 25.0 NaN
2 354 1-3 233.0 2.0 1-3 44.0 1.0
3 354 4-7 85.0 NaN NaN NaN NaN
4 947 1-3 156.0 4.0 1-3 213.0 3.0
5 947 NaN NaN NaN 4-7 33.0 NaN
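For reference, here is a self-contained sketch of the same outer merge that can be run as-is. The DataFrames are rebuilt by hand from the printed tables above. The `indicator=True` argument is an addition not used in the answer; it adds a `_merge` column showing whether each row came from the left frame, the right frame, or both. The row order of an outer merge may differ between pandas versions, so the checks below do not depend on it.

```python
import numpy as np
import pandas as pd

# Both frames rebuilt from the printed tables, with the ID present on every row.
df1 = pd.DataFrame({
    "ID": [690, 690, 354, 354, 947],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
    "Mean(US)": [4.0, np.nan, 2.0, np.nan, 4.0],
})
df2 = pd.DataFrame({
    "ID": [690, 690, 354, 947, 947],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
    "Mean(UK)": [4.0, np.nan, 1.0, 3.0, np.nan],
})

# Outer join on the (ID, range) pair; indicator=True records where each row came from.
merged = df1.merge(
    df2,
    left_on=["ID", "Range(US)"],
    right_on=["ID", "Range(UK)"],
    how="outer",
    indicator=True,
)
print(merged)

# Rows present in only one of the two frames.
print(merged[merged["_merge"] != "both"])
```

Because the join key is the (ID, range) pair rather than the ID alone, each row matches at most one row on the other side, so nothing is duplicated; unmatched rows are kept with NaN in the other country's columns.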