python - Trying to join two pandas DataFrames but getting "ValueError: You are trying to merge on object and int64 columns."?
Problem description
I have two pandas DataFrames: seren1 and bbox. I want to perform an inner join of them on the column named filepath.
seren1[["filepath", "label"]].join(bbox[["filepath", "label"]], on="filepath", how="inner", lsuffix='_caller', rsuffix='_other')
gives the error:
ValueError Traceback (most recent call last)
<ipython-input-74-c001a7adc7cd> in <module>
----> 1 seren1[["filepath", "label"]].join(bbox[["filepath", "label"]], on="filepath", how="inner", lsuffix='_caller', rsuffix='_other')
/projects/community/py-data-science-stack/5.1.0/kp807/envs/fastai/lib/python3.7/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
6822 # For SparseDataFrame's benefit
6823 return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
-> 6824 rsuffix=rsuffix, sort=sort)
6825
6826 def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
/projects/community/py-data-science-stack/5.1.0/kp807/envs/fastai/lib/python3.7/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
6837 return merge(self, other, left_on=on, how=how,
6838 left_index=on is None, right_index=True,
-> 6839 suffixes=(lsuffix, rsuffix), sort=sort)
6840 else:
6841 if on is not None:
/projects/community/py-data-science-stack/5.1.0/kp807/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
45 right_index=right_index, sort=sort, suffixes=suffixes,
46 copy=copy, indicator=indicator,
---> 47 validate=validate)
48 return op.get_result()
49
/projects/community/py-data-science-stack/5.1.0/kp807/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
531 # validate the merge keys dtypes. We may need to coerce
532 # to avoid incompat dtypes
--> 533 self._maybe_coerce_merge_keys()
534
535 # If argument passed to validate,
/projects/community/py-data-science-stack/5.1.0/kp807/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
978 (inferred_right in string_types and
979 inferred_left not in string_types)):
--> 980 raise ValueError(msg)
981
982 # datetimelikes must match exactly
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
However, if I take the intersection of the two filepath columns and wrap it in a Series:
import numpy as np
pd.Series(np.intersect1d(seren1["filepath"].values,bbox["filepath"].values))
it works fine:
0 S1/B04/B04_R1/S1_B04_R1_PICT0006
1 S1/B04/B04_R1/S1_B04_R1_PICT0007
2 S1/B04/B04_R1/S1_B04_R1_PICT0008
3 S1/B04/B04_R1/S1_B04_R1_PICT0013
4 S1/B04/B04_R1/S1_B04_R1_PICT0039
5 S1/B04/B04_R1/S1_B04_R1_PICT0040
6 S1/B04/B04_R1/S1_B04_R1_PICT0041
7 S1/B05/B05_R1/S1_B05_R1_PICT0056
......
Type checks:
seren1.dtypes
filepath object
timestamp object
label object
dtype: object
bbox.dtypes
filepath object
label object
X int64
Y int64
W int64
H int64
dtype: object
all (seren1.filepath.apply(lambda x: isinstance(x, str)) )
True
all (bbox.filepath.apply(lambda x: isinstance(x, str)) )
True
What is going wrong?
Solution
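I was able to get rid of this error as follows.
Suppose you are trying to join df2 to df1. DataFrame.join matches the column named in on (taken from the calling frame) against the index of the other frame, not against a column of the same name in it. If df2 still has its default integer index, pandas ends up comparing an object column with an int64 index, which is what the error message reports. So the key column, here called 'Column', must be set as the index of the frame being joined. To join df2 to df1 on 'Column', use: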
df1.join(df2.set_index('Column'), on = 'Column')
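Applied to the frames in the question, this could look like the sketch below. The two small frames are made-up stand-ins for seren1 and bbox (the real data is not shown in the question), so the paths and labels are assumptions. The sketch first confirms that the default index of bbox is int64, then does the join with set_index, and finally shows DataFrame.merge as an alternative that matches column to column and needs no index change.

```python
import pandas as pd

# Hypothetical stand-ins for the question's seren1 and bbox frames.
seren1 = pd.DataFrame({
    "filepath": ["a/1", "a/2", "a/3"],
    "label": ["cat", "dog", "cat"],
})
bbox = pd.DataFrame({
    "filepath": ["a/2", "a/3", "a/4"],
    "label": ["dog", "cat", "bird"],
})

# The default index is an integer RangeIndex; join(on="filepath") would
# compare the string filepath column against this index.
index_dtype = str(bbox.index.dtype)

# Option 1: make filepath the index of the right-hand frame, then join.
joined = seren1[["filepath", "label"]].join(
    bbox[["filepath", "label"]].set_index("filepath"),
    on="filepath",
    how="inner",
    lsuffix="_caller",
    rsuffix="_other",
)

# Option 2: merge matches the filepath columns directly.
merged = seren1[["filepath", "label"]].merge(
    bbox[["filepath", "label"]],
    on="filepath",
    how="inner",
    suffixes=("_caller", "_other"),
)

print(joined)
print(merged)
```

Both options keep only the filepaths present in both frames; merge is often the simpler choice when the key is an ordinary column on both sides.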