Pandas: I want to multiply two columns with 19 million rows, but the system runs out of memory (MemoryError)

Problem description

I want to multiply two columns with 19 million rows and add it to a new column.

For example, I have a column col_X and a column col_Y, each with 19 million records. col_X has values of type 'float' and col_Y has values of type 'numpy.float64'. I want to multiply them and store the result in a new column, New_col. The code I am using for the multiplication is:

df['New_col']=df['col_X']*df['col_Y']

This worked well when I was working with 10 million records. But now with 19 million, I am facing the following error:

Memory Error: (lambda x: op(x, rvalues)) MemoryError)
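
One thing worth checking at this point, offered as an assumption rather than a diagnosis: if individual values of col_X report as plain Python 'float' rather than 'numpy.float64', the column may be stored with object dtype, which takes considerably more memory than a native float64 column. The sketch below uses a tiny made-up frame (the data is hypothetical, only the column names come from the question) to show how to inspect the dtypes and the real memory footprint, and how to convert such a column with pd.to_numeric.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real 19-million-row frame (hypothetical data).
df = pd.DataFrame({
    "col_X": pd.Series([1.5, 2.0, 3.25], dtype=object),     # Python floats in an object column
    "col_Y": np.array([2.0, 4.0, 8.0], dtype=np.float64),   # native float64 column
})

# Show how each column is stored and how many bytes it really occupies.
print(df.dtypes)
print(df.memory_usage(deep=True))

bytes_before = df["col_X"].memory_usage(index=False, deep=True)

# Convert the object column to a native float64 column.
df["col_X"] = pd.to_numeric(df["col_X"])

bytes_after = df["col_X"].memory_usage(index=False, deep=True)
print(bytes_before, bytes_after)
```

If both columns already show float64 in df.dtypes, this step changes nothing and the memory pressure comes from somewhere else.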

I am thinking of multiplying these two columns in two parts (i.e. multiply initial 10 million records first and then multiply the next 9 million records after that, and then later join the two series and add it to a new column), but I don't know how to go about implementing this. Is there any other solution?

I am new to Python and would really appreciate your help.

Tags: python-3.x, pandas, dataframe

Solution
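
The source does not include an answer, so the following is only a minimal sketch of the approach the question itself proposes: multiply the columns piece by piece and collect the results. The helper name multiply_in_chunks and the chunk_size parameter are hypothetical, and the small frame stands in for the real data. The result is written into one preallocated NumPy array, so the pieces never need to be joined afterwards. Whether this avoids the MemoryError depends on where the memory is actually being used, which the question does not show.

```python
import numpy as np
import pandas as pd


def multiply_in_chunks(df, left, right, chunk_size=1_000_000):
    """Multiply two columns chunk by chunk into a preallocated float64 array.

    Hypothetical helper: the name and the chunk_size default are illustrative.
    """
    n = len(df)
    out = np.empty(n, dtype=np.float64)  # single result buffer, 8 bytes per row
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Take positional slices of each column as float64 arrays.
        a = df[left].iloc[start:stop].to_numpy(dtype=np.float64)
        b = df[right].iloc[start:stop].to_numpy(dtype=np.float64)
        # Write the product directly into the matching slice of the buffer.
        np.multiply(a, b, out=out[start:stop])
    return out


# Small stand-in for the real frame (hypothetical data).
df = pd.DataFrame({
    "col_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col_Y": [10.0, 20.0, 30.0, 40.0, 50.0],
})

df["New_col"] = multiply_in_chunks(df, "col_X", "col_Y", chunk_size=2)
print(df)
```

On the real data the default chunk_size of one million rows would give 19 passes; a smaller value lowers the size of each temporary array at the cost of more loop iterations.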

