'CategoricalIndex' object has no attribute 'is_dtype_equal'

Problem description

I moved a Pandas script I wrote from one computer to another. Running it on the new computer raises this error, and I'm not sure what causes it.

dfm = master_df
dfa = pd.read_csv(path)

dfa["Size"] = pd.cut(dfa["NOMSIZE_IN_MM_U"],bins=[0,300,600,float('inf')])
dfa["Depth"] = pd.cut(dfa["DEPTH_U"],bins=[0,2,4,6,float('inf')])

dfm['Size'] = pd.cut(dfm['NOMSIZE_IN_MM'], bins = [0,300,600,float('inf')])
dfm['Depth'] = pd.cut(dfm['AVE_DEPTH'], bins = [0,2,4,6,float('inf')])

master_df = dfm.join(dfa.set_index(['Size', 'Depth'])['REPAIR_DURATION'],on=['Size', 'Depth'])

This returns:

Traceback (most recent call last):
  File "s:/!AMD Share/Julian D - Student/LARM Gravity/Python Scripts/LARM3_GS.py", line 442, in <module>
    master_df = dfm.join(dfa.set_index(['Size', 'Depth'])['REPAIR_DURATION'],on=['Size', 'Depth'])
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4767, in join
    rsuffix=rsuffix, sort=sort)
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4782, in _join_compat
    suffixes=(lsuffix, rsuffix), sort=sort)
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 54, in merge
    return op.get_result()
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 569, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 726, in _get_join_info
    sort=self.sort)
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 1353, in _left_join_on_index
    _get_multiindex_indexer(join_keys, right_ax, sort=sort)
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 1304, in _get_multiindex_indexer
    rlab, llab, shape = map(list, zip(* map(fkeys, index.levels, join_keys)))
  File "C:\Users\DITTHAJ0\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 1390, in _factorize_keys
    lk.is_dtype_equal(rk)):
AttributeError: 'CategoricalIndex' object has no attribute 'is_dtype_equal'

Where dfa is:

    NOMSIZE_IN_MM_U  DEPTH_U  REPAIR_DURATION
0               300        2                1
1               300        4                1
2               300        6                2
3               300        8                3
4               600        2                2
5               600        4                2
6               600        6                2
7               600        8                5
8               900        2                4
9               900        4                4
10              900        6                5
11              900        8               10

Main data (master_df):

  ID AVE_DEPTH NOMSIZE_IN_MM
1  0     3.985           915
2  1     2.655           915
3  2     4.200           915

Tags: python, pandas

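Solution


  • The code below was tested with pandas 1.2.1. Update pandas with pip or conda, depending on your environment.
  • See also Pandas Merging 101 for a full breakdown of merging and joining dataframes.

Sample DataFrames and setup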
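Before updating, it can help to confirm which pandas version each machine actually runs. The following is a minimal sketch; the variable names `installed` and `tested` are illustrative, and 1.2.1 is only the version the answer's code was tested with, not a known minimum.

```python
import pandas as pd

# Version currently imported by this interpreter
print(pd.__version__)

# Compare only the major and minor parts, e.g. "1.2.1" -> (1, 2)
installed = tuple(int(part) for part in pd.__version__.split('.')[:2])
tested = (1, 2)  # the answer's code was tested with pandas 1.2.1

if installed < tested:
    # Upgrade from a shell, depending on the environment:
    #   pip install --upgrade pandas
    #   conda update pandas
    print('pandas is older than the tested version; consider updating')
```

Running this on both computers shows whether the two environments differ, which is the likely reason the same script behaves differently.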

import pandas as pd

# test dataframes
dfm = pd.DataFrame({'ID': [0, 1, 2], 'AVE_DEPTH': [3.985, 2.655, 4.200], 'NOMSIZE_IN_MM': [915, 915, 915]})
dfa = pd.DataFrame({'NOMSIZE_IN_MM_U': [300, 300, 300, 300, 600, 600, 600, 600, 900, 900, 900, 900], 'DEPTH_U': [2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8], 'REPAIR_DURATION': [1, 1, 2, 3, 2, 2, 2, 5, 4, 4, 5, 10]})

# add bins
dfa["Size"] = pd.cut(dfa["NOMSIZE_IN_MM_U"],bins=[0,300,600,float('inf')])
dfa["Depth"] = pd.cut(dfa["DEPTH_U"],bins=[0,2,4,6,float('inf')])

dfm['Size'] = pd.cut(dfm['NOMSIZE_IN_MM'], bins = [0,300,600,float('inf')])
dfm['Depth'] = pd.cut(dfm['AVE_DEPTH'], bins = [0,2,4,6,float('inf')])

# join or merge the dataframes

.join

  • Combines on the index
# set index - it's better to be explicit
dfm.set_index(['Size', 'Depth'], inplace=True)
dfa.set_index(['Size', 'Depth'], inplace=True)

# join dataframes
df = dfm.join(dfa.REPAIR_DURATION)

# display(df)
                         ID  AVE_DEPTH  NOMSIZE_IN_MM  REPAIR_DURATION
Size         Depth                                                    
(600.0, inf] (2.0, 4.0]   0      3.985            915                4
             (2.0, 4.0]   1      2.655            915                4
             (4.0, 6.0]   2      4.200            915                5

.merge

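  • Combines on a combination of indexes and columns
# merge dataframes (Size and Depth must still be columns here, i.e. before the set_index calls in the .join example)
df = dfm.merge(dfa[['Size', 'Depth', 'REPAIR_DURATION']], on=['Size', 'Depth'])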

# display(df)
   ID  AVE_DEPTH  NOMSIZE_IN_MM          Size       Depth  REPAIR_DURATION
0   0      3.985            915  (600.0, inf]  (2.0, 4.0]                4
1   1      2.655            915  (600.0, inf]  (2.0, 4.0]                4
2   2      4.200            915  (600.0, inf]  (4.0, 6.0]                5
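If updating pandas is not an option, one possible workaround is to avoid categorical keys altogether. The sketch below is an assumption, not part of the original answer: it passes `labels=False` to `pd.cut`, so the bins come back as integer positions and the merge keys are plain integers. It has not been verified against the old pandas version from the traceback. The names `size_bins`, `depth_bins`, and `result` are illustrative.

```python
import pandas as pd

# same test dataframes as above
dfm = pd.DataFrame({'ID': [0, 1, 2],
                    'AVE_DEPTH': [3.985, 2.655, 4.200],
                    'NOMSIZE_IN_MM': [915, 915, 915]})
dfa = pd.DataFrame({'NOMSIZE_IN_MM_U': [300, 300, 300, 300, 600, 600, 600, 600, 900, 900, 900, 900],
                    'DEPTH_U': [2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8],
                    'REPAIR_DURATION': [1, 1, 2, 3, 2, 2, 2, 5, 4, 4, 5, 10]})

size_bins = [0, 300, 600, float('inf')]
depth_bins = [0, 2, 4, 6, float('inf')]

# labels=False returns the integer position of each bin instead of a Categorical
dfa['Size'] = pd.cut(dfa['NOMSIZE_IN_MM_U'], bins=size_bins, labels=False)
dfa['Depth'] = pd.cut(dfa['DEPTH_U'], bins=depth_bins, labels=False)
dfm['Size'] = pd.cut(dfm['NOMSIZE_IN_MM'], bins=size_bins, labels=False)
dfm['Depth'] = pd.cut(dfm['AVE_DEPTH'], bins=depth_bins, labels=False)

# how='left' keeps every row of dfm, like the default of DataFrame.join
result = dfm.merge(dfa[['Size', 'Depth', 'REPAIR_DURATION']],
                   on=['Size', 'Depth'], how='left')

print(result['REPAIR_DURATION'].tolist())  # [4, 4, 5]
```

The trade-off is that the Size and Depth columns now hold bin positions (0, 1, 2, ...) rather than readable intervals, so the bin edges have to be kept alongside the data to interpret them.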
