pandas - How can I improve the execution time of a function in a pandas DataFrame?
Problem description
I'm performing a task on a pandas DataFrame (50k+ rows), but it's very slow: it takes around 7 seconds.
def check_uno(number, area):
    if number == 'adm':
        if area == 1:
            return 'uno-' + str(area)
        else:
            return area
    else:
        return area
%%timeit
df['area_uno']=df.apply(lambda row:check_uno(row['number'],row['area']),axis=1)
df
>>7.16 s ± 1.44 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Is there any way I can improve this time? Any help will be greatly appreciated! Thanks in advance!
Solution
Try this with np.where:
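import numpy as np

# Convert area to strings so that 'uno-' can be concatenated with it
df['area'] = df['area'].astype(str)
# Prefix 'uno-' only where number is 'adm' and area is "1"; otherwise keep area
df['area_uno'] = np.where(df['number'].eq('adm') & df['area'].eq("1"), 'uno-' + df['area'], df['area'])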
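np.where is much faster than df.apply with axis=1, because it evaluates the condition on whole columns at once in NumPy's compiled C code, whereas df.apply calls a Python function once per row. Comparing the speed of C and Python is like comparing night and day...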
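As a rough sanity check, the two approaches can be compared side by side on a tiny frame. This is only a sketch: the four sample rows are made up for illustration, and check_uno is a condensed version of the function from the question. Note that the np.where version returns every value as a string, while the original function leaves non-matching areas as integers, so the comparison is done on string form.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the frame in the question has 50k+ rows.
df = pd.DataFrame({
    "number": ["adm", "adm", "usr", "usr"],
    "area": [1, 2, 1, 3],
})

def check_uno(number, area):
    # Row-wise logic from the question, written compactly.
    if number == "adm" and area == 1:
        return "uno-" + str(area)
    return area

# Row-wise baseline: one Python function call per row.
via_apply = df.apply(lambda row: check_uno(row["number"], row["area"]), axis=1)

# Vectorized version: build one boolean mask, then select column-wise.
area_str = df["area"].astype(str)
mask = df["number"].eq("adm") & area_str.eq("1")
via_where = np.where(mask, "uno-" + area_str, area_str)

# Both approaches agree once the row-wise result is viewed as strings.
assert list(via_apply.astype(str)) == list(via_where)
```

If the integer values must be preserved in the rows that do not match, the string conversion step would need to be reconsidered; the sketch above follows the answer's approach as written.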