MemoryError when merging two DataFrames in Python

Problem description

I have two huge DataFrames. The first one is called limdata:

        SACC_ID     OPPLINE_LINE_ID     OPP_CREATION_DATE
    0   001A000000qqefQIAQ  a0W1200000F5TWOEA3  2015-01-09
    1   001A000000siuo7IAA  a0W1200000JEmTdEAL  2017-01-02
    2   001A000000qqCDcIAM  a0W1200000H3FYTEA3  2016-01-15
    3   001A0000014MJgpIAG  a0W1200000F5TW9EAN  2015-01-09
    4   001A000000ZdyuMIAR  a0W1200000H11lHEAR  2015-12-10
    5   001A000000aOmo4IAC  a0W1200000H11n3EAB  2015-12-10
    6   001A000000v6diCIAQ  a0W1200000HkwfzEAB  2016-05-02
    .....
    151185  001A000000skyIMIAY  a0WA000000EMTouMAH  2014-09-12

The second DataFrame is called hist:

        SACC_PS     CASE_ID     CREATION_DATE
    0   0011200001K64ncAAB  5001200000eXVMvAAO  2017-01-25 05:00:07
    1   001A000000iUrwSIAS  5001200000eX7FMAA0  2017-01-25 05:06:38
    2   001A0000011lNmnIAE  5001200000Xyi38AAB  2016-03-04 13:02:19
    3   001A000000aOlebIAC  5001200000XyE0TAAV  2016-03-04 13:02:09
    5   001A0000013XIPoIAO  5001200000XyG0LAAV  2016-03-04 13:02:12
    7   001A000000aOkIoIAK  5001200000XyLT3AAN  2016-03-04 13:02:12
    9   001A000000m5pCAIAY  5001200000XyKhsAAF  2016-03-04 13:02:12
    11  001A000000yLcL4IAK  5001200000Xyg2wAAB  2016-03-04 13:02:12
    ....
    12473746    001A000000aOkumIAC  5001200000gXsWHAA0  2017-05-02 16:20:59

I tried to merge these two DataFrames with this line of code:

case = pd.merge(limdata, hist, left_on='SACC_ID',right_on='SACC_PS')

But I get this memory-related error:

MemoryError                               Traceback (most recent call last)
in ()
----> 1 case = pd.merge(limdata, hist, left_on='SACC_ID',right_on='SACC_PS')

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     56                          copy=copy, indicator=indicator,
     57                          validate=validate)
---> 58     return op.get_result()
     59
     60

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in get_result(self)
    594             [(ldata, lindexers), (rdata, rindexers)],
    595             axes=[llabels.append(rlabels), join_index],
--> 596             concat_axis=0, copy=self.copy)
    597
    598         typ = self.left._constructor

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5201         else:
   5202             b = make_block(
-> 5203                 concatenate_join_units(join_units, concat_axis, copy=copy),
   5204                 placement=placement)
   5205             blocks.append(b)

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   5336         concat_values = to_concat[0]
   5337         if copy and concat_values.base is not None:
-> 5338             concat_values = concat_values.copy()
   5339     else:
   5340         concat_values = _concat._concat_compat(to_concat, axis=concat_axis)

MemoryError:

Can you help me solve this problem? Thanks in advance.

Best regards

Tags: python, pandas, dataframe

Solution
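No answer was recorded for this question, so the following is only a sketch of one possible approach, not a confirmed fix. A likely cause is that the account IDs repeat on both sides: an inner merge then emits (matches on the left) × (matches on the right) rows per key, and with about 12 million rows in hist the result may not fit in memory. The sketch first estimates the size of the result, then builds the merge one slice of hist at a time. The helper names `estimate_merge_rows` and `merge_in_chunks`, the `chunk_size` parameter, and the toy frames are assumptions made for illustration; they are not part of pandas or of the original post.

```python
import pandas as pd


def estimate_merge_rows(left, right, left_on, right_on):
    """Count the rows an inner merge would produce, without building it.

    Assumes the key columns contain no missing values.
    """
    left_counts = left[left_on].value_counts()
    right_counts = right[right_on].value_counts()
    # Each key present on both sides contributes
    # (occurrences on the left) * (occurrences on the right) rows.
    common = left_counts.index.intersection(right_counts.index)
    return int((left_counts.loc[common] * right_counts.loc[common]).sum())


def merge_in_chunks(left, right, left_on, right_on, chunk_size=1000000):
    """Yield an inner merge piece by piece, one slice of `right` at a time."""
    # Rows of `right` whose key never occurs in `left` cannot match,
    # so drop them before merging.
    right = right[right[right_on].isin(left[left_on])]
    for start in range(0, len(right), chunk_size):
        chunk = right.iloc[start:start + chunk_size]
        yield pd.merge(left, chunk, left_on=left_on, right_on=right_on)


# Toy stand-ins for limdata and hist (made-up values, same column names).
limdata_toy = pd.DataFrame({
    "SACC_ID": ["a", "b", "b"],
    "OPPLINE_LINE_ID": ["x1", "x2", "x3"],
})
hist_toy = pd.DataFrame({
    "SACC_PS": ["b", "b", "c", "a"],
    "CASE_ID": ["c1", "c2", "c3", "c4"],
})

expected_rows = estimate_merge_rows(limdata_toy, hist_toy, "SACC_ID", "SACC_PS")
pieces = list(merge_in_chunks(limdata_toy, hist_toy, "SACC_ID", "SACC_PS",
                              chunk_size=2))
case = pd.concat(pieces, ignore_index=True)
```

If the estimated row count is far larger than either input, concatenating all pieces will hit the same MemoryError. In that case each piece could be filtered, aggregated, or appended to a file as it is produced instead of being collected into one DataFrame.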

