Python loop using groupby

Problem description

Below is an excerpt of my data frame:

  ticker       date   open   high    low  close
0     A2M 2020-08-28  18.45  18.71  17.39  17.47
1     A2M 2020-09-04  17.47  17.52  16.53  16.70
2     A2M 2020-09-11  16.70  16.97  16.13  16.45
3     A2M 2020-09-18  16.54  16.77  16.25  16.39
4     A2M 2020-09-25  16.36  17.13  16.32  17.02
5     AAN 2007-06-08  15.29  15.33  14.93  15.07
6     AAN 2007-06-15  15.10  15.23  14.95  15.18
7     AAN 2007-06-22  15.18  15.25  15.12  15.16
8     AAN 2007-06-29  15.14  15.25  15.11  15.22
9     AAN 2007-07-06  15.11  15.33  15.07  15.33
10    AAN 2007-07-13  15.29  15.35  15.12  15.26
11    AAN 2007-07-20  15.25  15.27  15.02  15.10
12    AAN 2007-07-27  15.05  15.15  14.00  14.82
13    AAN 2007-08-03  14.72  14.85  14.47  14.69
14    AAN 2007-08-10  14.56  14.90  14.22  14.54
15    AAN 2007-08-17  14.55  14.79  13.71  14.42
16    AAP 2000-10-06   7.11   7.14   7.10   7.12
17    AAP 2000-10-13   7.13   7.17   7.12   7.17
18    AAP 2000-10-20   7.16   7.25   7.16   7.23
19    AAP 2000-10-27   7.23   7.24   7.22   7.23
20    AAP 2000-11-03   7.16   7.25   7.12   7.25
21    AAP 2000-11-10   7.24   7.24   7.12   7.12
22    ABB 2002-07-26   2.70   3.05   2.60   2.95
23    ABB 2002-08-02   2.92   2.95   2.75   2.80
24    ABB 2002-08-09   2.80   2.84   2.70   2.70
25    ABB 2002-08-16   2.72   2.75   2.70   2.75
26    ABB 2002-08-23   2.71   2.85   2.71   2.75
27    ABB 2002-08-30   2.75   2.75   2.75   2.75

I wrote the following code to compute upPrices and downPrices:

i = 0
upPrices=[]
downPrices=[]

while i < len(df['close']):
    if i == 0:
        upPrices.append(0)
        downPrices.append(0)
    else:
        if (df['close'][i]-df['close'][i-1])>0:
            upPrices.append(df['close'][i]-df['close'][i-1])
            downPrices.append(0)
        else:
            downPrices.append(df['close'][i]-df['close'][i-1])
            upPrices.append(0)
    i += 1
df['upPrices'] = upPrices
df['downPrices'] = downPrices

The result is the following data frame:

   ticker       date   open   high    low  close  upPrices  downPrices
0     A2M 2020-08-28  18.45  18.71  17.39  17.47      0.00        0.00
1     A2M 2020-09-04  17.47  17.52  16.53  16.70      0.00       -0.77
2     A2M 2020-09-11  16.70  16.97  16.13  16.45      0.00       -0.25
3     A2M 2020-09-18  16.54  16.77  16.25  16.39      0.00       -0.06
4     A2M 2020-09-25  16.36  17.13  16.32  17.02      0.63        0.00
5     AAN 2007-06-08  15.29  15.33  14.93  15.07      0.00       -1.95
6     AAN 2007-06-15  15.10  15.23  14.95  15.18      0.11        0.00
7     AAN 2007-06-22  15.18  15.25  15.12  15.16      0.00       -0.02
8     AAN 2007-06-29  15.14  15.25  15.11  15.22      0.06        0.00
9     AAN 2007-07-06  15.11  15.33  15.07  15.33      0.11        0.00
10    AAN 2007-07-13  15.29  15.35  15.12  15.26      0.00       -0.07
11    AAN 2007-07-20  15.25  15.27  15.02  15.10      0.00       -0.16
12    AAN 2007-07-27  15.05  15.15  14.00  14.82      0.00       -0.28
13    AAN 2007-08-03  14.72  14.85  14.47  14.69      0.00       -0.13
14    AAN 2007-08-10  14.56  14.90  14.22  14.54      0.00       -0.15
15    AAN 2007-08-17  14.55  14.79  13.71  14.42      0.00       -0.12
16    AAP 2000-10-06   7.11   7.14   7.10   7.12      0.00       -7.30
17    AAP 2000-10-13   7.13   7.17   7.12   7.17      0.05        0.00
18    AAP 2000-10-20   7.16   7.25   7.16   7.23      0.06        0.00
19    AAP 2000-10-27   7.23   7.24   7.22   7.23      0.00        0.00
20    AAP 2000-11-03   7.16   7.25   7.12   7.25      0.02        0.00
21    AAP 2000-11-10   7.24   7.24   7.12   7.12      0.00       -0.13
22    ABB 2002-07-26   2.70   3.05   2.60   2.95      0.00       -4.17
23    ABB 2002-08-02   2.92   2.95   2.75   2.80      0.00       -0.15
24    ABB 2002-08-09   2.80   2.84   2.70   2.70      0.00       -0.10
25    ABB 2002-08-16   2.72   2.75   2.70   2.75      0.05        0.00
26    ABB 2002-08-23   2.71   2.85   2.71   2.75      0.00        0.00
27    ABB 2002-08-30   2.75   2.75   2.75   2.75      0.00        0.00

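Unfortunately, the logic is incorrect. upPrices and downPrices need to be computed separately for each ticker. At the moment, you can see that rows 5, 16 and 22 are compared against the previous close of a different ticker. Essentially, I need this formula to restart at each ticker, using groupby or some other approach. However, when I try to add groupby, it returns an index length mismatch error.

Please help!

Tags: python, pandas

Solution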


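Your instinct to use groupby is correct. Group by ticker, then take the diff of the close price within each group. You can then use where to split the result into the up and down columns you want. Also, no more loop! For anything that needs only "basic" math operations, a vectorized approach is much better.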

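import pandas as pd

# Subset of the question's data: two tickers are enough to show the group boundary.
data = {
    "ticker": ["A2M", "A2M", "A2M", "A2M", "A2M", "AAN", "AAN", "AAN", "AAN"],
    "close": [17.47, 16.7, 16.45, 16.39, 17.02, 15.07, 15.18, 15.16, 15.22],
}
df = pd.DataFrame(data)

# Change in close within each ticker; the first row of every ticker is NaN.
df["diff"] = df.groupby("ticker")["close"].diff()
# where() keeps values where the condition holds and puts 0 elsewhere (NaN included).
df["upPrice"] = df["diff"].where(df["diff"] > 0, 0)
df["downPrice"] = df["diff"].where(df["diff"] < 0, 0)
del df["diff"]
print(df)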
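
As a variation on the same idea, here is a minimal sketch that uses clip instead of where and writes to the column names from the question (upPrices, downPrices). It assumes the rows should be in date order within each ticker, so it sorts first; that sorting step is an assumption and is not part of the answer above. The data is a small subset of the question's table.

```python
import pandas as pd

# Small subset of the question's data (two tickers, three rows each).
df = pd.DataFrame({
    "ticker": ["A2M", "A2M", "A2M", "AAN", "AAN", "AAN"],
    "date": pd.to_datetime(["2020-08-28", "2020-09-04", "2020-09-11",
                            "2007-06-08", "2007-06-15", "2007-06-22"]),
    "close": [17.47, 16.70, 16.45, 15.07, 15.18, 15.16],
})

# Assumption: diffs are only meaningful if each ticker's rows are in date order.
df = df.sort_values(["ticker", "date"]).reset_index(drop=True)

# Per-ticker change in close. The first row of each ticker has no
# predecessor, so diff() yields NaN there; replace it with 0.
change = df.groupby("ticker")["close"].diff().fillna(0)

# clip(lower=0) keeps gains and zeroes out losses; clip(upper=0) does the reverse.
df["upPrices"] = change.clip(lower=0)
df["downPrices"] = change.clip(upper=0)
print(df)
```

Because diff is computed per group, the first AAN row gets 0 in both columns rather than being compared with the last A2M close.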
