python - pd.merge gives error: 'DataFrame' objects are mutable, thus they cannot be hashed
Problem description
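I have a DataFrame dfCM, which was created from another DataFrame, dfdict[dfCM], and then processed as follows:
- Unwanted rows were removed.
- Unwanted columns were removed.
- New columns were added.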
I now need to add the deleted columns from dfdict[dfCM] back to dfCM. Note that dfdict[dfCM] is stored in a dictionary of DataFrames.
I have run similar merge commands many times before in my code, but now I get the error: 'DataFrame' objects are mutable, thus they cannot be hashed
#add back deleted dfCM columns
dfCM = pd.merge(dfCM, dfdict[dfCM], on=['ClaimID'], how = 'left', suffixes = ('', '_cm'))
#remove duplicate columns
dfCM.filter(like='_cm',axis=1)
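A side note on the `#remove duplicate columns` step: `DataFrame.filter(like='_cm', axis=1)` selects the columns whose names contain `_cm` and returns them as a new DataFrame. It neither removes those columns nor modifies `dfCM` in place. A minimal sketch of dropping them instead, on a small hypothetical frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical merged frame: 'MeasAppType_cm' duplicates 'MeasAppType',
# while 'MeasDesc' is a genuinely new column from the right-hand frame.
merged = pd.DataFrame({
    "ClaimID": ["MCE-2019-02-066-01", "MCE-2019-02-066-02"],
    "MeasAppType": ["AR", "AR"],
    "MeasAppType_cm": ["AR", "AR"],
    "MeasDesc": ["Attic insulation", "Attic insulation"],
})

# filter(like=...) KEEPS only the matching columns and returns a new frame;
# `merged` itself is unchanged.
only_dupes = merged.filter(like="_cm", axis=1)
print(list(only_dupes.columns))  # ['MeasAppType_cm']

# To remove the suffixed duplicates, drop those columns and reassign.
merged = merged.drop(columns=merged.filter(like="_cm", axis=1).columns)
print(list(merged.columns))  # ['ClaimID', 'MeasAppType', 'MeasDesc']
```

`like` matches the substring anywhere in a column name; if other columns might contain `_cm`, `merged.columns.str.endswith('_cm')` is a stricter test.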
This is what dfCM looks like (it has more columns and rows):
index ClaimID MeasCode MeasAppType
0 MCE-2019-02-02-068-01 CLA48 AR
1 MCE-2019-02-066-01 CLA48 AR
2 MCE-2019-02-066-01B CLA48 AR
3 MCE-2019-02-066-02 CLB50 AR
4 MCE-2019-02-066-02B CLB50 AR
5 MCE-2019-02-067-01 CLB51 AR
Below is a screenshot of dfdict:
This is what dfdict[dfCM] looks like (it has more rows and columns):
index ClaimID MeasAppType MeasDesc
0 BAY-2019_C&S_19Q1 AR Attic insulation; Domestic hot water heater/boiler;
1 BAY-2019_COM_19Q1 AR Attic insulation; Domestic hot water heater/boiler;
2 BAY-2019_Com_Q2 NR This record is not a project
3 BAY-2019_CS_Q2 NR This record is not a project
4 BAY-2019_EM&V_19Q1 AR Attic insulation; Domestic hot water heater/boiler;
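I can do the merge by renaming all the columns in dfdict[dfCM], as shown below. But this is not ideal, because I can then no longer distinguish the duplicate columns added to dfCM from the unique ones, so I cannot remove the duplicates.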
#add back deleted dfCM columns
dfdict['dfCM'] = dfdict['dfCM'].add_suffix('_cm') #identified columns from dfCL
dfCM = pd.merge(dfCM, dfdict['dfCM'], left_on='ClaimID', right_on='ClaimID_cm', how = 'left', suffixes = ('', '_cm'))
Is there a better way to solve this? Thanks.
Solution
You will need to explain how you created dfdict, because you are trying to use a DataFrame as a dictionary key, which you cannot do:
import pandas as pd
df1 = pd.DataFrame()
df2 = pd.DataFrame()
dfdict = {df1: 1, df2: 2}
Traceback (most recent call last):
File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-3207e8fd0e73>", line 1, in <module>
{df1: 1, df2: 2}
File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 1887, in __hash__
" hashed".format(self.__class__.__name__)
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
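Perhaps your dictionary keys are actually strings holding the DataFrame variable names? In that case, you get this error when you try to look up a value using the DataFrame itself as the key: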
dfdict = {"df1": df1, "df2": df2}
dfdict[df1]
Traceback (most recent call last):
File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-825e4ae2577b>", line 1, in <module>
dfdict[df1]
File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 1887, in __hash__
" hashed".format(self.__class__.__name__)
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
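To make the distinction concrete, here is a minimal sketch with two small hypothetical frames: a dictionary keyed by strings works fine as long as the lookup also uses the string, while passing the DataFrame object as the key triggers `DataFrame.__hash__`, which raises the `TypeError` shown above.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# Keys are plain strings (hashable); the values are the DataFrames.
dfdict = {"df1": df1, "df2": df2}

# Look up with the string name, not with the DataFrame object itself.
same = dfdict["df1"]
print(same is df1)  # True

# Using the DataFrame as the key makes Python call DataFrame.__hash__,
# which pandas implements to raise TypeError.
try:
    dfdict[df1]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```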
Perhaps you meant to do this, with the key as a string: dfdict["dfCM"]?
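Putting this together with the original goal, a minimal sketch, assuming dfdict is keyed by the string 'dfCM' and that ClaimID is unique in the original frame. Instead of merging every column back and then dropping `_cm` duplicates, it selects only the key plus the columns dfCM no longer has, so no suffixes are needed and duplicate and unique columns never get mixed. The toy data and the column `NewCol` are hypothetical stand-ins for the real frames.

```python
import pandas as pd

# Hypothetical stand-in for the original frame stored in the dictionary.
original = pd.DataFrame({
    "ClaimID": ["MCE-2019-02-066-01", "MCE-2019-02-066-02", "MCE-2019-02-067-01"],
    "MeasCode": ["CLA48", "CLB50", "CLB51"],
    "MeasAppType": ["AR", "AR", "AR"],
    "MeasDesc": ["Attic insulation", "Attic insulation", "Attic insulation"],
})
dfdict = {"dfCM": original}  # string key, DataFrame value

# Processed copy: a row removed, 'MeasDesc' removed, a new column added.
dfCM = dfdict["dfCM"].iloc[:2].drop(columns=["MeasDesc"]).copy()
dfCM["NewCol"] = [1, 2]

# Bring back only the columns dfCM no longer has, plus the join key,
# so no column names collide and no suffixes are produced.
missing = [c for c in dfdict["dfCM"].columns if c not in dfCM.columns]
dfCM = pd.merge(
    dfCM,
    dfdict["dfCM"][["ClaimID"] + missing],
    on="ClaimID",
    how="left",
    validate="many_to_one",  # raises MergeError if ClaimID repeats on the right
)
print(dfCM.columns.tolist())
# ['ClaimID', 'MeasCode', 'MeasAppType', 'NewCol', 'MeasDesc']
```

If ClaimID is not unique in the original frame, the left merge would duplicate rows; the `validate` argument makes that assumption explicit by failing loudly instead.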