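python - Why does my DataFrame become empty after a pandas merge?
Problem description
I am trying to merge a data frame into my main data frame. I have already merged several data frames successfully to build the main one, but one particular data frame causes a problem when I try to merge it.
The code that produces my current main frame: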
postransaction_df.PROD_NBR = postransaction_df.PROD_NBR.astype(float)
postprod_df = pd.merge(products_df, postransaction_df, on='PROD_NBR')
postcat_df = pd.merge(postprod_df, major_product_categories_df, on='MAJOR_CAT_CD')
The main frame:
postcat_df
Out[40]:
PROD_NBR PROD_DESC MAJOR_CAT_CD \
0 -7.358821e+10 VAL BABYS 1ST GENERAL 9687
1 -7.358821e+10 VAL BABYS 1ST GENERAL 9687
2 -7.204736e+10 CARD VAL ANYONE 9687
3 -7.204736e+10 CARD VAL ANYONE 9687
4 -7.204736e+10 CARD VAL ANYONE 9687
... ... ...
878509 8.940460e+10 ADVOCARE REDICODE PLUS DME STRIP 50CT 2343
878510 8.940460e+10 ADVOCARE REDICODE PLUS DME STRIP 50CT 2343
878511 8.940460e+10 ADVOCARE REDICODE PLUS DME STRIP 50CT 2343
878512 8.940460e+10 ADVOCARE REDICODE PLUS DME STRIP 50CT 2343
878513 8.940460e+10 ADVOCATE REDICODE TALKING GLUCOSE METER 2343
BSKT_ID PHRMCY_NBR SLS_DTE_NBR \
0 600010665100006106120160128 748613589991092598 20160128
1 600010665100006202720160208 748613589991092598 20160208
2 300000003998234235982 1174450154022548624 20160211
3 300000003787577235982 1174450154022548624 20160209
4 300000003792067235982 1174450154022548624 20160211
... ... ...
878509 600010687700002715520160312 1360787588063411417 20160312
878510 600010687700003139020160528 1360787588063411417 20160528
878511 600010687700002377820160111 1360787588063411417 20160111
878512 600010687700002814520160331 1360787588063411417 20160331
878513 600010687700002871320160412 1360787588063411417 20160412
EXT_SLS_AMT SLS_QTY MAJOR_CAT_DESC
0 1.25 1 GREETING CARDS
1 1.25 1 GREETING CARDS
2 1.99 1 GREETING CARDS
3 1.99 1 GREETING CARDS
4 1.99 1 GREETING CARDS
... ... ...
878509 24.00 2 DIABETES
878510 24.00 2 DIABETES
878511 12.00 1 DIABETES
878512 12.00 1 DIABETES
878513 10.00 1 DIABETES
The problematic frame:
pharmacy_df
Out[41]:
PHRMCY_NBR PHRMCY_NAM ST_CD
0 1.017330e+18 GNP PHARMACY #1 NJ
1 1.041420e+18 GNP PHARMACY #2 NJ
2 1.048830e+18 GNP PHARMACY #3 MA
3 1.057350e+18 GNP PHARMACY #4 NJ
4 1.058510e+18 GNP PHARMACY #5 NY
... ... ...
1092 9.471890e+17 GNP PHARMACY #1093 PA
1093 9.657430e+17 GNP PHARMACY #1094 PA
1094 9.671640e+16 GNP PHARMACY #1095 PA
1095 9.686930e+17 GNP PHARMACY #1096 PR
1096 9.741830e+17 GNP PHARMACY #1097 NJ
The code I use to merge the two frames:
pharmtotal_df = pd.merge(postcat_df, pharmacy_df, on='PHRMCY_NBR')
The result of that last merge:
pharmtotal_df
Out[43]:
Empty DataFrame
Columns: [PROD_NBR, PROD_DESC, MAJOR_CAT_CD, BSKT_ID, PHRMCY_NBR, SLS_DTE_NBR, EXT_SLS_AMT, SLS_QTY, MAJOR_CAT_DESC, PHRMCY_NAM, ST_CD]
Index: []
Can anyone tell me how to do this merge without ending up with an empty data frame?
Any help is greatly appreciated.
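Solution
The merge comes back empty because either the key values in the two tables do not match or the dtypes of the key columns do not match. In the output above, PHRMCY_NBR is displayed as an integer in postcat_df but as a float in pharmacy_df. Try casting the keys to a common type before merging, as in the following code: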
postprod_df = pd.merge(products_df.assign(PROD_NBR=lambda d: d.PROD_NBR.astype(int)),
                       postransaction_df.assign(PROD_NBR=lambda d: d.PROD_NBR.astype(int)),
                       on='PROD_NBR')
postcat_df = pd.merge(postprod_df.assign(MAJOR_CAT_CD=lambda d: d.MAJOR_CAT_CD.astype(int)),
                      major_product_categories_df.assign(MAJOR_CAT_CD=lambda d: d.MAJOR_CAT_CD.astype(int)),
                      on='MAJOR_CAT_CD')
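The same check can be applied to the PHRMCY_NBR merge that actually returns the empty frame. The sketch below is only an illustration under stated assumptions: the frames `sales` and `pharmacies` are tiny hypothetical stand-ins for postcat_df and pharmacy_df, the pharmacy names and the CSV text are made up, and it assumes the float keys were already rounded before they reached pandas. It compares the key dtypes, uses an outer merge with `indicator=True` to see which keys fail to match, and shows that an 18-digit ID does not survive a round trip through float64, so a plain `astype(int)` may not be enough for keys this large.

```python
import io

import pandas as pd

# Hypothetical miniature stand-ins for postcat_df and pharmacy_df.
sales = pd.DataFrame({
    "PHRMCY_NBR": [748613589991092598, 1174450154022548624],  # exact int64 keys
    "EXT_SLS_AMT": [1.25, 1.99],
})
pharmacies = pd.DataFrame({
    "PHRMCY_NBR": [7.486136e+17, 1.174450e+18],  # float keys, assumed already rounded
    "PHRMCY_NAM": ["GNP PHARMACY #A", "GNP PHARMACY #B"],
})

# 1. Compare the key dtypes on both sides before merging.
print(sales["PHRMCY_NBR"].dtype, pharmacies["PHRMCY_NBR"].dtype)

# 2. An outer merge with indicator=True labels every row as
#    left_only, right_only, or both, so unmatched keys become visible.
check = pd.merge(sales, pharmacies, on="PHRMCY_NBR", how="outer", indicator=True)
print(check["_merge"].value_counts())

# 3. float64 cannot represent every 18-digit integer exactly, so an ID
#    that has passed through a float column may no longer equal the original.
original_id = 748613589991092598
print(int(float(original_id)) == original_id)

# 4. If the source file still holds the full IDs, reading the key as an
#    integer (or as a string) from the start avoids the problem.
#    This CSV text is hypothetical.
csv_text = (
    "PHRMCY_NBR,PHRMCY_NAM\n"
    "748613589991092598,GNP PHARMACY #A\n"
    "1174450154022548624,GNP PHARMACY #B\n"
)
pharmacies_exact = pd.read_csv(io.StringIO(csv_text), dtype={"PHRMCY_NBR": "int64"})
merged = pd.merge(sales, pharmacies_exact, on="PHRMCY_NBR")
print(merged)
```

If the indicator column shows only left_only and right_only rows, the key values themselves differ, and casting dtypes alone will not help; the keys would then have to be reloaded from a source that preserves all digits.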