Replace repeated values and repeated combinations in a column

Problem description

I have a "state" column in a DataFrame whose values are "Buy", "Sell", and NaN:

No.Row  state  price
1       NaN    10
2       Sell   20
3       Buy    10
4       NaN    5
5       Buy    6
6       Buy    30
7       Sell   50
8       Sell   25
9       Buy    40
10      Buy    35
11      NaN    10
12      Sell   5
12 5

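I want to keep the first "Buy" and then the next "Sell" after it, so the rows form (Buy, Sell) pairs. The "reduction" column should then hold the price of the "Sell" (current) minus the price of the "Buy" (previous). For example, in row 7 we have 50 (row 7) - 10 (row 3) = 40.
Desired output: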

No.Row  state  reduction
1
2
3
4
5
6
7              40
8
9
10
11
12             -35

Tags: python, dataframe, numpy

Solution


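One approach is to use a global variable that tracks the price of the last "Buy".

A reduction function then computes the price difference each time a "Sell" is reached.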

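last_buy = None  # price of the pending Buy, or None if no Buy is waiting for a Sell


def reduction(row):
    global last_buy
    # A Sell closes the pending Buy: return Sell price minus Buy price.
    if row.state == "Sell" and last_buy is not None:
        cost = row.price - last_buy
        last_buy = None
        return cost

    # Remember only the first Buy; later Buys are ignored until a Sell closes the pair.
    if row.state == "Buy" and last_buy is None:
        last_buy = row.price
    return None  # rows that do not close a pair get no value


df["reduction"] = df.apply(reduction, axis=1)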

Output:

   state  price  reduction
0   NaN      10        NaN
1   Sell     20        NaN
2    Buy     10        NaN
3   NaN       5        NaN
4    Buy      6        NaN
5    Buy     30        NaN
6   Sell     50       40.0
7   Sell     25        NaN
8    Buy     40        NaN
9    Buy     35        NaN
10  NaN      10        NaN
11  Sell      5      -35.0
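
The same pairing logic can also be written without a global variable, which makes it safe to run more than once. The following is only a minimal sketch: the helper name `pair_reductions` is hypothetical, and the input lists are copied from the output table above. It uses plain Python lists so it runs without pandas.

```python
import math


def pair_reductions(states, prices):
    """For each row, return Sell price minus the pending Buy price, else NaN.

    `pair_reductions` is a hypothetical helper name, not part of any library.
    """
    last_buy = None  # price of the first Buy not yet matched by a Sell
    out = []
    for state, price in zip(states, prices):
        value = math.nan
        if state == "Sell" and last_buy is not None:
            value = price - last_buy  # close the (Buy, Sell) pair
            last_buy = None
        elif state == "Buy" and last_buy is None:
            last_buy = price  # keep only the first Buy of a run
        out.append(value)
    return out


# Sample data taken from the output table above (NaN marks a missing state).
states = [math.nan, "Sell", "Buy", math.nan, "Buy", "Buy",
          "Sell", "Sell", "Buy", "Buy", math.nan, "Sell"]
prices = [10, 20, 10, 5, 6, 30, 50, 25, 40, 35, 10, 5]

result = pair_reductions(states, prices)
```

With a DataFrame, the same function could presumably be applied as df["reduction"] = pair_reductions(df["state"], df["price"]), since iterating over a column yields its values in row order.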
