python - Python: reducing the run time of a for loop
Problem description
I want to calculate the APRU for several countries.
country_list = ['us','gb','ca','id']
count = {}
for i in country_list:
    # Rows for country i, plus a copy in reversed row order
    count[i] = df_day_country[df_day_country.isin([i])]
    count[i+'_reverse'] = count[i].iloc[::-1]
    # Running sum of 'count' over the reversed rows
    for j in range(1,len(count[i+'_reverse'])):
        count[i+'_reverse']['count'].iloc[j] = count[i+'_reverse']['count'][j-1:j+1].sum()
    # Running sum of the revenue column
    for k in range(1,len(count[i])):
        count[i][revenue_sum].iloc[k] = count[i][revenue_sum][k-1:k+1].sum()
    count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100
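After that, I will create 4 data frames, df_us, df_gb, df_ca and df_id, each showing the APRU for one country.
However, the dataset is large, and the run time becomes extremely slow as the country list grows. Is there a way to reduce the run time?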
Solution
Consider using Numba.
Your code then becomes:
from numba import njit

country_list = ['us','gb','ca','id']

@njit
def count(country_list):
    count = {}
    for i in country_list:
        count[i] = df_day_country[df_day_country.isin([i])]
        count[i+'_reverse'] = count[i].iloc[::-1]
        for j in range(1,len(count[i+'_reverse'])):
            count[i+'_reverse']['count'].iloc[j] = count[i+'_reverse']['count'][j-1:j+1].sum()
        for k in range(1,len(count[i])):
            count[i][revenue_sum].iloc[k] = count[i][revenue_sum][k-1:k+1].sum()
        count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100
    return count
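Numba compiles Python loops to machine code, which can make numerical loops a lot faster, and some of the heavier-duty Python libraries, pandas among them, can use it as an optional engine. Note that @njit works on NumPy arrays and scalars rather than on pandas DataFrames, so in practice the numeric columns need to be pulled out as arrays (for example with .to_numpy()) and the loop moved into a function that takes those arrays. Definitely give it a look.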
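Here is a minimal sketch of Numba applied to the running sums from the loops above. It assumes the per-country 'count' and revenue columns have already been extracted as NumPy float arrays. The function name running_sum, the variable names and the sample values are hypothetical, and the last line is only one reading of the original APRU formula. The try/except fallback is there so the sketch still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if Numba is missing
    def njit(func):
        return func


@njit
def running_sum(values):
    """Return the cumulative sum of a 1-D float array using an explicit loop."""
    out = np.empty_like(values)
    total = 0.0
    for idx in range(values.shape[0]):
        total += values[idx]
        out[idx] = total
    return out


# Hypothetical data for one country; with a real DataFrame these would come
# from something like df["count"].to_numpy(dtype="float64").
revenue = np.array([100.0, 50.0, 25.0])
counts = np.array([10.0, 5.0, 5.0])

cumulative_revenue = running_sum(revenue)
# Reverse, accumulate, reverse back: mirrors the '_reverse' step (assumption).
reverse_cumulative_count = running_sum(counts[::-1].copy())[::-1]
# Assumed reading of the original formula: revenue / count[0] / 100.
apru = cumulative_revenue / reverse_cumulative_count[0] / 100
```

For a plain running sum like this one, np.cumsum gives the same result without any explicit loop, so Numba mainly pays off when the loop body is more involved than a sum.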