Computing cumprod on a pandas DataFrame based on a condition (Python)

Problem description

A sample DataFrame is built as follows:

import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

index = pd.date_range(start, end)
df = pd.DataFrame(np.random.randn(366, 1), index=index, columns=["Returns"])

I know that cumulative returns are computed as follows (with a unit starting value):

start=1.0
df['Cumulative Returns']=start * (1 + df['Returns']).cumprod()
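
As a quick check of that formula, here is a minimal sketch on three made-up returns (0.10, -0.50 and 0.20 are illustrative values, not taken from the random data above). Each cumulative value is the running product of (1 + return), scaled by the starting value.

```python
import pandas as pd

# Illustrative returns only; not taken from the question's random data.
returns = pd.Series([0.10, -0.50, 0.20])

start = 1.0
cumulative = start * (1 + returns).cumprod()

# Running product: 1.10, then 1.10 * 0.50, then 1.10 * 0.50 * 1.20
print(cumulative.round(4).tolist())
```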

I need to compute the cumulative returns based on a boolean condition held in another column.

df['bool']=0
df.iloc[0:5,2]=1
df.iloc[8:18,2]=1

The data looks like this:

             Returns  Cumulative Returns  bool
2011-01-01 -0.180628            0.819372     1
2011-01-02  0.585284            1.298938     1
2011-01-03  0.032713            1.341430     1
2011-01-04  0.161464            1.558023     1
2011-01-05  1.741576            4.271438     1
2011-01-06 -1.893358           -3.815922     0
2011-01-07  0.015942           -3.876755     0
2011-01-08 -0.615686           -1.489891     0
2011-01-09  0.330300           -1.982002     1
2011-01-10  0.274620           -2.526298     1
2011-01-11  0.222498           -3.088395     1
2011-01-12 -0.131634           -2.681858     1
2011-01-13 -0.217193           -2.099378     1
2011-01-14 -0.794016           -0.432438     1
2011-01-15  0.077270           -0.465853     1
2011-01-16  0.388143           -0.646670     1
2011-01-17  0.361618           -0.880518     1
2011-01-18 -1.732723            0.645176     1
2011-01-19 -0.045690            0.615698     0
2011-01-20  1.018151            1.242571     0
2011-01-21 -0.218665            0.970865     0
2011-01-22 -1.454362           -0.441124     0
2011-01-23  1.401056           -1.059163     0
2011-01-24  0.233366           -1.306336     0
2011-01-25 -0.235055           -0.999275     0
2011-01-26  0.577812           -1.576668     0
2011-01-27  0.510124           -2.380965     0
2011-01-28 -0.848362           -0.361045     0
2011-01-29  0.712476           -0.618281     0
2011-01-30 -0.176403           -0.509214     0

Based on the bool column, I want to compute the non-consecutive cumulative returns from 2011-01-01 to 2011-01-05 and from 2011-01-09 to 2011-01-18.

Tags: python, pandas, dataframe

Solution


Use your original formula, but apply it only to the rows where bool == 1. To do that, use df[df['bool'] == 1] instead of df. The whole statement can be:

df['CumProd2'] = start * (1 + df[df['bool'] == 1].Returns).cumprod()

Rows where bool == 0 are left as NaN, because the assignment aligns on the index. If you want to change them to, for example, 0, run:

df['CumProd2'] = df['CumProd2'].fillna(0)
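
To see how the filtered assignment lines up with the original index, here is a small self-contained sketch on made-up data. The first new column reproduces the approach above: one running product across all flagged rows. The second column, `CumProdByBlock`, and the `run_id` helper are illustrative assumptions that are not part of the answer: they show one possible way to restart the product at the starting value for each separate run of bool == 1, in case "non-consecutive" is meant that way.

```python
import pandas as pd

# Made-up returns and flags, for illustration only.
index = pd.date_range("2011-01-01", periods=6)
df = pd.DataFrame(
    {"Returns": [0.10, 0.20, -0.50, 0.30, 0.10, 0.50],
     "bool": [1, 1, 0, 0, 1, 1]},
    index=index,
)
start = 1.0
mask = df["bool"] == 1

# The answer's approach: one running product over all flagged rows.
# Assignment aligns on the index, so unflagged rows become NaN.
df["CumProd2"] = start * (1 + df.loc[mask, "Returns"]).cumprod()

# Assumed variant: restart the product for each consecutive run of flagged rows.
# A new run id begins whenever the flag differs from the previous row.
run_id = df["bool"].diff().ne(0).cumsum()
flagged_growth = 1 + df.loc[mask, "Returns"]
df["CumProdByBlock"] = start * flagged_growth.groupby(run_id[mask]).cumprod()

print(df)
```

With these flags, `CumProd2` carries the product from the first run into the second, while `CumProdByBlock` begins again from `start` on 2011-01-05. Which of the two is wanted depends on how the two date ranges in the question are meant to relate.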
