How to optimize df.assign?

Problem description

I am working in Python with a DataFrame named data of shape (55025, 12), and I am trying to assign new columns. My code is:

data_cat = data.assign(
    type0=lambda dataframe: dataframe['value'].map(lambda x: x > 0),
    type1=lambda dataframe: dataframe['value'].map(lambda x: x > 1),
    type2=lambda dataframe: dataframe['value'].map(lambda x: x > 2),
)

It takes forever to run. How can I optimize it?

Thanks!

Tags: python, pandas, assign

Solution


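You can create the new columns directly on the original dataframe to avoid copying data, if modifying it in place is acceptable. Series.gt compares the whole column at once instead of calling a Python lambda for every row.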

data["type0"] = data["value"].gt(0)
data["type1"] = data["value"].gt(1)
...

Otherwise assign is fine, as long as you pass it the vectorized comparisons:

data_cat = data.assign(
    type0=data["value"].gt(0),
    type1=data["value"].gt(1),
    ...
)
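
As a rough, self-contained sketch of the same idea: the small random frame below is a hypothetical stand-in for the asker's data (only the "value" column name comes from the question), and the thresholds 0, 1, 2 are the three shown in the question. It builds the columns both ways and checks that the vectorized version gives the same answer without touching the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: a single numeric "value" column.
rng = np.random.default_rng(0)
data = pd.DataFrame({"value": rng.integers(-5, 6, size=1_000)})

# Element-wise version from the question: one Python lambda call per row.
slow = data.assign(
    type0=lambda df: df["value"].map(lambda x: x > 0),
    type1=lambda df: df["value"].map(lambda x: x > 1),
    type2=lambda df: df["value"].map(lambda x: x > 2),
)

# Vectorized version: one whole-column comparison per threshold.
# The dict comprehension only generates the keyword arguments for assign.
fast = data.assign(**{f"type{t}": data["value"].gt(t) for t in range(3)})

# Both approaches should agree column by column.
new_cols = ["type0", "type1", "type2"]
same_result = all((slow[c].astype(bool) == fast[c]).all() for c in new_cols)

# assign returns a new frame, so the original keeps only its own column.
original_untouched = list(data.columns) == ["value"]

print(same_result, original_untouched)
```

Timing both versions on your own data (for example with timeit) is the most reliable way to see how much the vectorized form helps in your case.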

See also pandas accessors for some other frequent operations that may have been already implemented in pandas.

