Why does a DataFrame become empty after a pandas merge?

Problem description

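I am trying to merge a DataFrame into my master DataFrame. I have already merged several DataFrames successfully to build the master frame, but one DataFrame causes a problem when I try to merge it.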

The code that builds the current master frame:

postransaction_df.PROD_NBR = postransaction_df.PROD_NBR.astype(float)

postprod_df = pd.merge(products_df, postransaction_df, on='PROD_NBR')

postcat_df = pd.merge(postprod_df, major_product_categories_df, on='MAJOR_CAT_CD')

The master frame:

postcat_df
Out[40]: 
            PROD_NBR                                PROD_DESC  MAJOR_CAT_CD  \
0      -7.358821e+10                    VAL BABYS 1ST GENERAL          9687   
1      -7.358821e+10                    VAL BABYS 1ST GENERAL          9687   
2      -7.204736e+10                          CARD VAL ANYONE          9687   
3      -7.204736e+10                          CARD VAL ANYONE          9687   
4      -7.204736e+10                          CARD VAL ANYONE          9687   
              ...                                      ...           ...   
878509  8.940460e+10    ADVOCARE REDICODE PLUS DME STRIP 50CT          2343   
878510  8.940460e+10    ADVOCARE REDICODE PLUS DME STRIP 50CT          2343   
878511  8.940460e+10    ADVOCARE REDICODE PLUS DME STRIP 50CT          2343   
878512  8.940460e+10    ADVOCARE REDICODE PLUS DME STRIP 50CT          2343   
878513  8.940460e+10  ADVOCATE REDICODE TALKING GLUCOSE METER          2343  

                            BSKT_ID           PHRMCY_NBR  SLS_DTE_NBR  \
0       600010665100006106120160128   748613589991092598     20160128   
1       600010665100006202720160208   748613589991092598     20160208   
2             300000003998234235982  1174450154022548624     20160211   
3             300000003787577235982  1174450154022548624     20160209   
4             300000003792067235982  1174450154022548624     20160211   
                             ...                  ...          ...   
878509  600010687700002715520160312  1360787588063411417     20160312   
878510  600010687700003139020160528  1360787588063411417     20160528   
878511  600010687700002377820160111  1360787588063411417     20160111   
878512  600010687700002814520160331  1360787588063411417     20160331   
878513  600010687700002871320160412  1360787588063411417     20160412  

        EXT_SLS_AMT  SLS_QTY  MAJOR_CAT_DESC  
0              1.25        1  GREETING CARDS  
1              1.25        1  GREETING CARDS  
2              1.99        1  GREETING CARDS  
3              1.99        1  GREETING CARDS  
4              1.99        1  GREETING CARDS  
             ...      ...             ...  
878509        24.00        2        DIABETES  
878510        24.00        2        DIABETES  
878511        12.00        1        DIABETES  
878512        12.00        1        DIABETES  
878513        10.00        1        DIABETES  

The problematic frame:

pharmacy_df
Out[41]: 
        PHRMCY_NBR          PHRMCY_NAM ST_CD
0     1.017330e+18     GNP PHARMACY #1    NJ
1     1.041420e+18     GNP PHARMACY #2    NJ
2     1.048830e+18     GNP PHARMACY #3    MA
3     1.057350e+18     GNP PHARMACY #4    NJ
4     1.058510e+18     GNP PHARMACY #5    NY
            ...                 ...   ...
1092  9.471890e+17  GNP PHARMACY #1093    PA
1093  9.657430e+17  GNP PHARMACY #1094    PA
1094  9.671640e+16  GNP PHARMACY #1095    PA
1095  9.686930e+17  GNP PHARMACY #1096    PR
1096  9.741830e+17  GNP PHARMACY #1097    NJ

The code I use to merge the two frames:

pharmtotal_df = pd.merge(postcat_df, pharmacy_df, on='PHRMCY_NBR')

The result of that last merge:

pharmtotal_df
Out[43]: 
Empty DataFrame
Columns: [PROD_NBR, PROD_DESC, MAJOR_CAT_CD, BSKT_ID, PHRMCY_NBR, SLS_DTE_NBR, EXT_SLS_AMT, SLS_QTY, MAJOR_CAT_DESC, PHRMCY_NAM, ST_CD]
Index: []

Can anyone suggest how to merge these frames without ending up with an empty DataFrame?

Any help is greatly appreciated.

Tags: python, pandas, join, merge

Solution


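An inner merge returns an empty DataFrame when no key in one table equals a key in the other. That happens either because the key values really do not match, or because the key columns have different dtypes. Try casting the keys to a common dtype before merging, for example: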

postprod_df = pd.merge(products_df.assign(PROD_NBR=lambda d: d.PROD_NBR.astype(int)),
                       postransaction_df.assign(PROD_NBR=lambda d: d.PROD_NBR.astype(int)), 
                       on='PROD_NBR')

postcat_df = pd.merge(postprod_df.assign(MAJOR_CAT_CD=lambda d: d.MAJOR_CAT_CD.astype(int)), 
                      major_product_categories_df.assign(MAJOR_CAT_CD=lambda d: d.MAJOR_CAT_CD.astype(int)), 
                      on='MAJOR_CAT_CD')
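
The same reasoning can be applied to the PHRMCY_NBR merge that actually fails. In the printed output, postcat_df.PHRMCY_NBR appears as full integers while pharmacy_df.PHRMCY_NBR appears in scientific notation, which suggests (but does not prove) an int64 column on one side and a float64 column on the other. The sketch below uses small made-up frames named sales and pharmacies, not the real data, to show how such a mismatch can be diagnosed with dtypes and indicator=True, and why an 18-digit ID cannot survive a round trip through float64.

```python
import pandas as pd

# Hypothetical stand-ins for postcat_df and pharmacy_df; the values are
# invented for illustration and are not taken from the real data.
sales = pd.DataFrame({
    "PHRMCY_NBR": [748613589991092598, 1174450154022548624],  # int64 keys
    "SLS_QTY": [1, 2],
})
pharmacies = pd.DataFrame({
    "PHRMCY_NBR": [7.486136e+17, 1.174450e+18],  # float64 keys, digits lost
    "PHRMCY_NAM": ["GNP PHARMACY #A", "GNP PHARMACY #B"],
})

# Step 1: compare the dtypes of the key columns.
print(sales["PHRMCY_NBR"].dtype, pharmacies["PHRMCY_NBR"].dtype)

# Step 2: the default inner merge finds no equal keys.
broken = pd.merge(sales, pharmacies, on="PHRMCY_NBR")
print(len(broken))

# Step 3: an outer merge with indicator=True shows which side each
# unmatched key came from.
check = pd.merge(sales, pharmacies, on="PHRMCY_NBR", how="outer", indicator=True)
print(check["_merge"].value_counts())

# float64 keeps only about 15 to 17 significant digits, so an 18-digit ID
# does not survive a round trip through float.
exact_id = 748613589991092598
round_trip_ok = int(float(exact_id)) == exact_id
print(round_trip_ok)

# Step 4: with the IDs kept exact (here as text) and the other side cast
# to the same dtype, the keys line up again.
pharmacies_exact = pd.DataFrame({
    "PHRMCY_NBR": ["748613589991092598", "1174450154022548624"],
    "PHRMCY_NAM": ["GNP PHARMACY #A", "GNP PHARMACY #B"],
})
fixed = pd.merge(
    sales.assign(PHRMCY_NBR=lambda d: d["PHRMCY_NBR"].astype(str)),
    pharmacies_exact,
    on="PHRMCY_NBR",
)
print(fixed)
```

If the pharmacy table is loaded from a CSV file, passing dtype={"PHRMCY_NBR": str} to pd.read_csv is one way to keep the IDs exact. If the file itself already stores the IDs in scientific notation, the missing digits cannot be recovered by casting and the file would need to be exported again.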
