Vectorising a loop based on the order of values in a series

Problem description

This question is based on a previous question I answered.

The input looks like:

Index   Results  Price
0       Buy      10
1       Sell     11
2       Buy      12
3       Neutral  13
4       Buy      14
5       Sell     15

I need to find every Buy-Sell sequence (ignoring extra Buy / Sell values out of sequence) and calculate the difference in Price.

The desired output:

Index Results Price Difference
0     Buy     10    
1     Sell    11    1
2     Buy     12    
3     Neutral 13    
4     Buy     14    
5     Sell    15    3

My solution is verbose but seems to work:

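import numpy as np
import pandas as pd
from numba import njit

@njit
def get_diffs(results, prices):
    # NaN everywhere except on rows that complete a Buy-Sell pair
    res = np.full(prices.shape, np.nan)
    # prev_one: a Buy (0) is expected next; prev_zero: a Sell (1) is expected next
    prev_one, prev_zero = True, False
    for i in range(len(results)):
        if prev_one and (results[i] == 0):
            # first Buy after a Sell (or at the start) opens a sequence
            price_start = prices[i]
            prev_zero, prev_one = True, False
        elif prev_zero and (results[i] == 1):
            # first Sell after that Buy closes the sequence
            res[i] = prices[i] - price_start
            prev_zero, prev_one = False, True
    return res

# 'Neutral' is not in the mapping, so it becomes NaN and never matches 0 or 1
results = df['Results'].map({'Buy': 0, 'Sell': 1})

df['Difference'] = get_diffs(results.values, df['Price'].values)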

Is there a vectorised method? I'm concerned about code maintainability and performance over a large number of rows.
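
For checking any vectorised candidate against the intended rule, the pairing logic can also be written as a plain-Python reference with no numba dependency. This is only a sketch: the name pair_diffs and the waiting_for state variable are illustrative and not part of the original code, and it works on the raw labels rather than the 0/1 mapping.

```python
import math

def pair_diffs(results, prices):
    """Return Sell-minus-Buy price differences, NaN on every other row."""
    diffs = [math.nan] * len(prices)
    waiting_for = 'Buy'   # a sequence must open with a Buy
    price_start = None    # price of the Buy that opened the current sequence
    for i, (label, price) in enumerate(zip(results, prices)):
        if label == 'Buy' and waiting_for == 'Buy':
            price_start = price
            waiting_for = 'Sell'
        elif label == 'Sell' and waiting_for == 'Sell':
            diffs[i] = price - price_start
            waiting_for = 'Buy'
        # Neutral rows and repeated Buy / Sell rows fall through untouched
    return diffs

results = ['Buy', 'Sell', 'Buy', 'Neutral', 'Buy', 'Sell']
prices = [10, 11, 12, 13, 14, 15]
sample_diffs = pair_diffs(results, prices)
print(sample_diffs)  # [nan, 1, nan, nan, nan, 3]
```

It is slow on large inputs, so it is meant for small test cases only.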


Edit: Benchmarking code:

df = pd.DataFrame.from_dict({'Index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
                             'Results': {0: 'Buy', 1: 'Sell', 2: 'Buy', 3: 'Neutral', 4: 'Buy', 5: 'Sell'},
                             'Price': {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15}})

df = pd.concat([df]*10**4, ignore_index=True)

def jpp(df):
    results = df['Results'].map({'Buy': 0, 'Sell': 1})    
    return get_diffs(results.values, df['Price'].values)

%timeit jpp(df)  # 7.99 ms ± 142 µs per loop

Tags: python, pandas, performance, numpy, dataframe

Solution


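You can use cumcount to find the pairs. groupby('Results').cumcount() numbers each occurrence of every Results value, so the n-th Buy and the n-th Sell get the same number; grouping Price by that number and taking diff() gives the price change within each pair, and the final .loc drops the Neutral rows. This reproduces the desired output for the sample. Because the pairing is by occurrence count, it can differ from the loop in the question when extra Buy or Sell values shift those counts, or when a Neutral row falls between a Buy and its Sell, so check it against your own data: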

s=df.groupby('Results').cumcount()
df['Diff']=df.Price.groupby(s).diff().loc[df.Results.isin(['Buy','Sell'])]
df
Out[596]: 
   Index  Results  Price  Diff
0      0      Buy     10   NaN
1      1     Sell     11   1.0
2      2      Buy     12   NaN
3      3  Neutral     13   NaN
4      4      Buy     14   NaN
5      5     Sell     15   3.0
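
The snippet above pairs rows by occurrence count. A variant that follows the question's loop more literally is to keep only the first row of each run of Buy or Sell values and then take consecutive price differences. The following is a minimal sketch of that idea, not the answer's method: the name run_start_diffs is hypothetical, and it assumes the frame has a unique index.

```python
import numpy as np
import pandas as pd

def run_start_diffs(df):
    """Sell price minus the price of the Buy that opened the sequence."""
    # Neutral rows never take part in a pair, so drop them first.
    trades = df.loc[df['Results'].isin(['Buy', 'Sell']), ['Results', 'Price']]
    side = trades['Results'].map({'Buy': 0, 'Sell': 1})
    # Keep the first row of every run of identical sides; what remains
    # strictly alternates between Buy and Sell.
    kept = trades[side.diff().ne(0)]
    # Each kept Sell now sits right after the Buy that opened its sequence.
    # A leading Sell has no predecessor and stays NaN.
    step = kept['Price'].diff()
    sell_diffs = step[kept['Results'] == 'Sell']
    # Put the values back on the original index; other rows become NaN.
    return sell_diffs.reindex(df.index)

sample = pd.DataFrame({
    'Results': ['Buy', 'Sell', 'Buy', 'Neutral', 'Buy', 'Sell'],
    'Price': [10, 11, 12, 13, 14, 15],
})
sample['Difference'] = run_start_diffs(sample)
print(sample)  # Difference is 1.0 at index 1 and 3.0 at index 5, NaN elsewhere
```

Whether this is faster than the numba loop on large frames would need to be measured with the benchmarking code from the question.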
