Pandas - How can I Improve execution time of a function in a pandas dataframe?

Question

I'm performing a task on a pandas DataFrame (50k+ rows), but it's very slow: it currently takes around 7 seconds.

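import pandas as pd

def check_uno(number, area):
    # Prefix 'uno-' only when number is 'adm' and area is 1;
    # in every other case return area unchanged
    if number == 'adm' and area == 1:
        return 'uno-' + str(area)
    return area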

%%timeit
# Row-wise apply: calls check_uno once per row from Python
df['area_uno'] = df.apply(lambda row: check_uno(row['number'], row['area']), axis=1)
df
>>7.16 s ± 1.44 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is there any way I can improve this time? Any help will be greatly appreciated! Thanks in advance!

Tags: pandas, dataframe, function, performance, time

Solution


Try this with np.where:

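import numpy as np

# Build the condition for all rows at once instead of row by row
mask = df['number'].eq('adm') & df['area'].eq(1)
# 'uno-<area>' where the condition holds, the original area everywhere else
df['area_uno'] = np.where(mask, 'uno-' + df['area'].astype(str), df['area'])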
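A quick way to gain confidence in a vectorized rewrite is to check it against the original row-wise function on a tiny frame. The sketch below uses hypothetical sample data (not the asker's real DataFrame) and also shows `Series.mask`, a pandas-only alternative to `np.where` that is not part of the answer above; it assumes `area` holds integers, as the question's `area == 1` comparison suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DataFrame has 50k+ rows
df = pd.DataFrame({
    'number': ['adm', 'adm', 'usr', 'usr'],
    'area':   [1, 2, 1, 3],
})

def check_uno(number, area):
    # Row-wise reference logic from the question
    if number == 'adm' and area == 1:
        return 'uno-' + str(area)
    return area

# Reference result from the slow row-wise apply
expected = df.apply(lambda row: check_uno(row['number'], row['area']), axis=1)

# One boolean mask computed over whole columns
mask = df['number'].eq('adm') & df['area'].eq(1)

# Vectorized with NumPy: pick per element from two whole-column candidates
via_np_where = np.where(mask, 'uno-' + df['area'].astype(str), df['area'])

# Pandas-only alternative: Series.mask replaces values where mask is True
area_uno = df['area'].mask(mask, 'uno-' + df['area'].astype(str))

print(area_uno.tolist())
```

Both vectorized versions leave non-matching rows as their original integers, just like `check_uno` does.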

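np.where is much faster than df.apply(..., axis=1) because it evaluates the condition on whole columns at once inside NumPy's compiled C code, whereas apply calls the Python function once per row and builds a new Series for every row. On tens of thousands of rows that difference is often orders of magnitude.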
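To see the gap on your own machine, here is a rough benchmark sketch. The 50,000-row frame is synthetic (hypothetical random values chosen only to match the size mentioned in the question), and the helper names `with_apply` and `with_np_where` are made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data, roughly the size mentioned in the question
rng = np.random.default_rng(0)
n = 50_000
df = pd.DataFrame({
    'number': rng.choice(['adm', 'usr', 'ops'], size=n),
    'area': rng.integers(0, 5, size=n),
})

def check_uno(number, area):
    if number == 'adm' and area == 1:
        return 'uno-' + str(area)
    return area

def with_apply(frame):
    # One Python call (plus one row Series) for every row
    return frame.apply(lambda row: check_uno(row['number'], row['area']), axis=1)

def with_np_where(frame):
    # A handful of whole-column operations executed in compiled code
    mask = frame['number'].eq('adm') & frame['area'].eq(1)
    values = np.where(mask, 'uno-' + frame['area'].astype(str), frame['area'])
    return pd.Series(values, index=frame.index)

t_apply = timeit.timeit(lambda: with_apply(df), number=1)
t_where = timeit.timeit(lambda: with_np_where(df), number=1)
print(f"apply:    {t_apply:.3f} s")
print(f"np.where: {t_where:.3f} s")
```

Absolute timings depend on hardware and the pandas/NumPy versions, so compare the two numbers relative to each other rather than to the 7 s in the question.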

