Python: reducing the run time of a for loop

Problem description

I want to calculate the APRU for several countries.

country_list = ['us','gb','ca','id']

count = {}
for i in country_list:
    count[i] = df_day_country[df_day_country.isin([i])]
    count[i+'_reverse'] = count[i].iloc[::-1]
    for j in range(1,len(count[i+'_reverse'])): 
        count[i+'_reverse']['count'].iloc[j] = count[i+'_reverse']['count'][j-1:j+1].sum()
    for k in range(1,len(count[i])): 
        count[i][revenue_sum].iloc[k] = count[i][revenue_sum][k-1:k+1].sum()

    count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100

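After that, I will create four DataFrames (df_us, df_gb, df_ca and df_id) showing the APRU for each country.

However, the dataset is large, and the run time becomes extremely slow as the country list grows. Is there a way to reduce the run time?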

Tags: python, pandas, dataframe, for-loop

Solution


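Consider using Numba. Its @njit decorator compiles functions that work on NumPy arrays and scalars; it cannot compile pandas DataFrame operations, so move the inner loops into a jitted helper and pass it the column values as arrays.

Your code thus becomes

import numpy as np
from numba import njit

country_list = ['us','gb','ca','id']

@njit
def running_sum(values):
    # Same as the inner loops: each element becomes the sum of itself
    # and the already-updated previous element
    out = values.copy()
    for j in range(1, len(out)):
        out[j] = out[j-1] + out[j]
    return out

# df_day_country and revenue_sum are defined elsewhere, as in the question
count = {}
for i in country_list:
    count[i] = df_day_country[df_day_country.isin([i])]
    count[i+'_reverse'] = count[i].iloc[::-1].copy()
    count[i+'_reverse']['count'] = running_sum(count[i+'_reverse']['count'].to_numpy(dtype=np.float64))
    count[i][revenue_sum] = running_sum(count[i][revenue_sum].to_numpy(dtype=np.float64))

    count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100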

Numba compiles numeric Python loops over NumPy arrays to machine code, which often makes them much faster, and a growing number of scientific Python libraries use it. Definitely give it a look.
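To make the inner loops concrete, here is a minimal sketch of what they compute and how Numba could be applied to them. It assumes the column has already been extracted as a NumPy float array; the helper name `running_sum` and the sample values are hypothetical and not taken from the question. Because each step adds the already-updated previous element, the loop appears to be a plain cumulative sum, so `np.cumsum` should give the same result with no explicit loop at all.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the loop to machine code
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs without Numba installed
        return func


@njit
def running_sum(values):
    # Mirrors the inner loops of the question: element j becomes the sum of
    # itself and the already-updated element j-1.
    out = values.copy()
    for j in range(1, len(out)):
        out[j] = out[j - 1] + out[j]
    return out


# Hypothetical daily values for one country (illustration only)
daily = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

looped = running_sum(daily)
vectorized = np.cumsum(daily)  # same values, no explicit loop
```

If the two results agree on your real data, the pandas equivalents `Series.cumsum()` and `DataFrame.groupby(...).cumsum()` may remove the need for both the per-row loop and Numba.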

