How can I optimize my Python loop function?

Problem description

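I wrote a function that sorts a DataFrame containing stock returns and a return signal, then splits the rows by percentile to see how much profit or loss the signal produced in each bin, along with the sum of the gains or losses. It runs, but slowly. It has two `while` loops and several `if` statements, so I'm fairly sure those are what slow it down. Is there a way to speed up this Python function?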

Here is some sample data to work with:

import numpy as np
import pandas as pd
#make y
y_mean = 1.6966731029796089e-06
y_std =  0.0010495629794829604

x_mean = -7.146476349274362e-06
x_std = 0.00020444862628284671

df_dict = {'x1':np.random.normal(loc=y_mean, scale = y_std, size = 100000), 'x2':np.random.normal(loc=x_mean, scale = x_std, size = 100000)}

df = pd.DataFrame(df_dict)

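Here is the function itself. Again, it works... but slowly. I use this function in a permutation test, which means it runs 1000 times. Currently that takes 1 hour 10 minutes to finish.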

def roc_table(df, row_count, signal, returns):
    """
    

    Parameters
    ----------
    df : dataframe
    row_count : length of data
    signal : signal/s
    returns : log returns

    Returns
    -------
    table - hopefully

    """
    df = df.copy()
    
    bins = [.01, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, .99]
    
    df = df.sort_values(signal)
    threshold = []
    frac_greater = []
    frac_less = []
    win_above_list = []
    win_below_list = []
    lose_above_list = []
    lose_below_list = []
    
    work_signal = np.array(df[signal])
    work_return = np.array(df[returns])
    
    
    for bin_ in bins:
        k = np.round((bin_*(row_count+1))-1)
        # clamp before indexing so a negative k cannot wrap around to the last row
        k = max(int(k), 0)
        threshold.append(work_signal[k])
        win_above = 1e-60
        win_below = 1e-60
        lose_above = 1e-60
        lose_below = 1e-60
    

        i=0
        while i < k:
            if work_return[i] > 0:
                lose_below += work_return[i]
            else:
                win_below -= work_return[i]

            i += 1
        
        
            
        r = i
        while r < row_count:
            if work_return[r] > 0:
                win_above += work_return[r]
            else: 
                lose_above -= work_return[r]
            r+=1
        

        frac_greater.append((np.round(((row_count-k)/row_count),2)))
        if lose_above > 0:
            lose_above_list.append(np.round(win_above/lose_above,2))
        else:
            lose_above_list.append("inf")
            
        if win_above > 0:
            win_above_list.append(np.round((lose_above/win_above),2))
        else:
            win_above_list.append("inf")
            
        frac_less.append(np.round((k/row_count),2))
        
        if lose_below > 0:
            lose_below_list.append(np.round((win_below/lose_below),2))
        else:
            lose_below_list.append("inf")
            
        if win_below > 0:
            win_below_list.append(np.round((lose_below/win_below),2))
        else:
            win_below_list.append("inf")
            
    roc_dict = {"threshold":threshold,
                "frac Gtr/Eq":frac_greater,
                "Long PF":lose_above_list,
                "Short PF":win_above_list,
                "Frac Less":frac_less,
                "Short PF Less":lose_below_list,
                "Long PF Less":win_below_list}
    

    
    roc = pd.DataFrame(roc_dict)
        
    return roc        

Then, to run it, just do the following:

df1 = roc_table(df, df.shape[0], 'x2', 'x1')
df1

I'm not sure what can be done, but thanks in advance for taking a look.

Tags: python, performance

Solution


You can easily speed up the loops in your code with Numba. Here is an example:

import numba as nb

@nb.njit(nb.types.UniTuple(nb.float64,4)(nb.float64[::1], nb.int64, nb.int64))
def loops(work_return, row_count, k):
    win_above = 1e-60
    win_below = 1e-60
    lose_above = 1e-60
    lose_below = 1e-60

    i=0
    while i < k:
        if work_return[i] > 0:
            lose_below += work_return[i]
        else:
            win_below -= work_return[i]
        i += 1

    r = i
    while r < row_count:
        if work_return[r] > 0:
            win_above += work_return[r]
        else: 
            lose_above -= work_return[r]
        r+=1

    return win_above, win_below, lose_above, lose_below

def roc_table(df, row_count, signal, returns):
    # [...] the beginning is left unchanged

    for bin_ in bins:
        k = np.round((bin_*(row_count+1))-1)
        # clamp before indexing so a negative k cannot wrap around to the last row
        k = max(int(k), 0)
        threshold.append(work_signal[k])

        win_above, win_below, lose_above, lose_below = loops(work_return, row_count, k)

        # [...] the remaining is left unchanged

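This made the code 43 times faster on my machine. If you optimize the entire outer loop with Numba as well, you can probably speed it up further, but that is harder and may not be worth it.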
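If Numba is not an option, one possible alternative is to replace both `while` loops with prefix sums in plain NumPy, so that every bin is answered from the same two cumulative arrays. The following is only a sketch: `split_sums_all` is a hypothetical helper name that does not appear in the original post, and it assumes `row_count` equals `len(work_return)` and that `work_return` has already been sorted by the signal. Because the sums are accumulated in a different order, results may differ from the loops in the last few floating-point digits.

```python
import numpy as np

def split_sums_all(work_return, ks):
    """For each split index k in ks, return the four sums the while-loops compute.

    Assumes work_return is a 1-D float array already sorted by the signal and
    that row_count == len(work_return). Hypothetical helper, not from the post.
    """
    work_return = np.asarray(work_return, dtype=np.float64)
    ks = np.asarray(ks, dtype=np.int64)
    eps = 1e-60  # same starting offset as the original accumulators

    gains = np.where(work_return > 0, work_return, 0.0)    # positive returns only
    losses = np.where(work_return > 0, 0.0, -work_return)  # magnitudes of the rest

    # Prefix sums with a leading 0, so cum[k] is the sum over indices 0..k-1.
    cum_gains = np.concatenate(([0.0], np.cumsum(gains)))
    cum_losses = np.concatenate(([0.0], np.cumsum(losses)))

    lose_below = eps + cum_gains[ks]                       # gains below the split
    win_below = eps + cum_losses[ks]                       # losses below the split
    win_above = eps + (cum_gains[-1] - cum_gains[ks])      # gains at or above the split
    lose_above = eps + (cum_losses[-1] - cum_losses[ks])   # losses at or above the split
    return win_above, win_below, lose_above, lose_below
```

In `roc_table`, the split indices for all bins could be collected first and passed in a single call, after which the ratio and rounding logic stays as it is. Whether this beats the Numba version depends on the data size, so it is worth timing both.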

