Python pandas: assert_frame_equal

Problem description

I have two DataFrames (called bdf and cdf) that I want to compare to verify that their contents are equal. So I use

pd.testing.assert_frame_equal(bdf, cdf, check_dtype=False, check_like=True, check_exact=True)

to do the comparison. However, the function reports a difference in a column where I did not expect one:

DataFrame.iloc[:, 70] are different

DataFrame.iloc[:, 70] values are different (100.0 %)
[left]:  [201801300040150000014217, 201801300040150000014217, 201801300040150000013737, 201801290040150000019605, 201801300040150000076982, 201801300040150000136588, 201801300040150000242399, 201801300040150000293800, 201801300040150000293801, 201801290040150000128792, 201801300040150000367067, 201801300040150000367770, 201801300040150000369255, 201801260040150000097789, 0, 0, 201801290040150000145140, 0, 201801290040150000145184, 201801290040150000145190, 201801290040150000145198, 201801290040150000145206, 201801290040150000145214, 201801290040150000145222, 0, 0, 201801290040150000145245, 201801290040150000145254, 201801290040150000145263, 201801290040150000145271, 201801290040150000145278, 201801290040150000145286, 201801290040150000145297, 201801290040150000145309, 201801290040150000145318, 201801290040150000145327, 201801290040150000149263, 201801290040150000149264, 201801300040150000433569, 201801290040150000156348, 201801290040150000161046, 201801290040150000161050, 201801290040150000165445, 0, 201801290040150000165456, 201801290040150000165472, 0, 0, 201801290040150000165496, 0, 0, 201801290040150000165520, 0, 0, 0, 201801290040150000165556, 0, 201801260040150000129418]
[right]: [201801300040150000014217, 201801300040150000014217, 201801300040150000013737, 201801290040150000019605, 201801300040150000076982, 201801300040150000136588, 201801300040150000242399, 201801300040150000293800, 201801300040150000293801, 201801290040150000128792, 201801300040150000367067, 201801300040150000367770, 201801300040150000369255, 201801260040150000097789, 0, 0, 201801290040150000145140, 0, 201801290040150000145184, 201801290040150000145190, 201801290040150000145198, 201801290040150000145206, 201801290040150000145214, 201801290040150000145222, 0, 0, 201801290040150000145245, 201801290040150000145254, 201801290040150000145263, 201801290040150000145271, 201801290040150000145278, 201801290040150000145286, 201801290040150000145297, 201801290040150000145309, 201801290040150000145318, 201801290040150000145327, 201801290040150000149263, 201801290040150000149264, 201801300040150000433569, 201801290040150000156348, 201801290040150000161046, 201801290040150000161050, 201801290040150000165445, 0, 201801290040150000165456, 201801290040150000165472, 0, 0, 201801290040150000165496, 0, 0, 201801290040150000165520, 0, 0, 0, 201801290040150000165556, 0, 201801260040150000129418]

Visually, the two sides look identical. When I print a value and the dtype from each frame:

print("bdf: {}, type {}".format(bdf['refid'][0], bdf['refid'].dtype))
print("cdf: {}, type {}".format(cdf['refid'][0], cdf['refid'].dtype))

I get:

bdf: 201801300040150000014217, type object
cdf: 201801300040150000014217, type object

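So why does assert_frame_equal() say the columns are different when their values and dtypes are the same? As an additional observation, both tables have more than 200 columns, all of them dtype=object, yet I get no comparison errors for any of the other columns.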

Tags: python, pandas, dataframe

Solution
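The source page gives no answer, so what follows is only one possible explanation, offered as an assumption. A column with dtype=object can hold any Python objects, so the same identifier may be stored as a str in one frame and as an int in the other. Both print identically and both report dtype object, but they do not compare equal. The sketch below uses small made-up frames (the data and the choice of str versus int are hypothetical) to show how to inspect the Python type of each element rather than the column dtype:

```python
import pandas as pd

# Hypothetical data: the same identifiers stored as str in one frame
# and as Python int in the other. Both columns are forced to dtype=object.
bdf = pd.DataFrame({"refid": pd.Series(["201801300040150000014217", "0"], dtype=object)})
cdf = pd.DataFrame({"refid": pd.Series([201801300040150000014217, 0], dtype=object)})

# The column dtypes are identical, so printing the dtype reveals nothing.
dtypes = (bdf["refid"].dtype, cdf["refid"].dtype)
print(dtypes)

# Inspect the Python type of every element instead of the column dtype.
left_types = bdf["refid"].map(type).unique().tolist()
right_types = cdf["refid"].map(type).unique().tolist()
print(left_types, right_types)

# Element-wise comparison shows which rows actually differ.
mismatch = [a != b for a, b in zip(bdf["refid"], cdf["refid"])]
print(mismatch)

# The frames fail the comparison even though they look the same when printed.
try:
    pd.testing.assert_frame_equal(
        bdf, cdf, check_dtype=False, check_like=True, check_exact=True
    )
    frames_equal = True
except AssertionError:
    frames_equal = False
print(frames_equal)

# Converting the column to a common type on both sides makes the comparison pass.
pd.testing.assert_frame_equal(
    bdf.astype({"refid": str}), cdf.astype({"refid": str})
)
```

If the element types in the real bdf and cdf turn out to match, this is not the cause, and the next thing to check would be invisible differences such as leading or trailing whitespace in string values.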

