Resample data by group and compute a rolling sum

Question

I want to create an additional column in my DataFrame without having to loop through the steps that build it.

The column is created in the following steps:

 1. Start from the end of the data. For each date, pick every nth row (here n = 5)
 going backwards from that date, counting the date's own row as the first.
 2. Sum the first x of the picked values (here x = 2).

 A worked example:
 11/22: 5, 7, 3, 2 (every 5th row picked), but x = 2, so 5 + 7 = 12
 11/15: 6, 5, 2 (every 5th row picked), but x = 2, so 6 + 5 = 11
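Taken literally, the steps can be written as a loop, which is handy as a reference to check a vectorized version against. A minimal sketch, assuming the values sit in a plain list in date order; the helper name `picked_sum` is hypothetical. Note that the worked example (5, 7, 3, 2 for 11/22) steps back 4 rows at a time, so "every 5th row" counts the current row as the first.

```python
# Values from the table in the question, oldest (8/30/2019) first
values = [2, 4, 1, 2, 3, 3, 5, 5, 7, 4, 9, 6, 5]

def picked_sum(values, i, n=5, x=2):
    """Hypothetical helper: sum the first x values picked by walking back
    from row i in steps of n - 1 (row i counts as the 1st of every n rows)."""
    picked = values[i::-(n - 1)]  # e.g. row 12 -> rows 12, 8, 4, 0
    if len(picked) < x:
        return None  # not enough history: blank in the question's table
    return sum(picked[:x])

print(values[12::-4])          # [5, 7, 3, 2]  (11/22)
print(picked_sum(values, 12))  # 12
print(picked_sum(values, 11))  # 11
```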


        cumulative 
 8/30/2019  2   
 9/6/2019   4   
 9/13/2019  1   
 9/20/2019  2   
 9/27/2019  3   5
 10/4/2019  3   7
 10/11/2019 5   6
 10/18/2019 5   7
 10/25/2019 7   10
 11/1/2019  4   7
 11/8/2019  9   14
 11/15/2019 6   11
 11/22/2019 5   12
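The whole column can be reproduced without looping over rows: the k-th picked value for a row lies k * (n - 1) rows earlier, so the column is the sum of x shifted copies of the series. A minimal sketch, assuming the unnamed middle column is called `value` and the dates are the index (both are illustrative choices, not taken from the question):

```python
import pandas as pd

n, x = 5, 2  # every 5th row (counting the current one), sum the first 2

df = pd.DataFrame(
    {"value": [2, 4, 1, 2, 3, 3, 5, 5, 7, 4, 9, 6, 5]},
    index=pd.to_datetime(
        ["8/30/2019", "9/6/2019", "9/13/2019", "9/20/2019", "9/27/2019",
         "10/4/2019", "10/11/2019", "10/18/2019", "10/25/2019",
         "11/1/2019", "11/8/2019", "11/15/2019", "11/22/2019"],
        format="%m/%d/%Y",
    ),
)

# Add x shifted copies; shift() introduces NaN at the top, and NaN
# propagates through "+", so rows without enough history stay NaN.
df["cumulative"] = sum(df["value"].shift(k * (n - 1)) for k in range(x))
print(df)
```

The only Python-level loop runs over the x shifts, not over the rows, so the cost stays vectorized however long the frame is.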

Tags: python, pandas, dataframe

Solution


Suppose we have a set of 15 integers:

import pandas as pd

df = pd.DataFrame([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], columns=['original_data'])

We define n, the row step to the values that should be added, and x, the number of values to add:

n = 5
x = 2

(
    df

    # Add x - 1 columns; the k-th one is original_data shifted down n*k rows
    .assign(**{
        'n{} x{}'.format(n, k): df['original_data'].shift(n * k)
        for k in range(1, x)})

    # Take the row-wise sum (NaN is skipped by default)
    .sum(axis=1)
)

Output (the frame produced by `.assign`, before the row-wise sum):

    original_data   n5 x1
0   1               NaN
1   2               NaN
2   3               NaN
3   4               NaN
4   5               NaN
5   6               1.0
6   7               2.0
7   8               3.0
8   9               4.0
9   10              5.0
10  11              6.0
11  12              7.0
12  13              8.0
13  14              9.0
14  15              10.0
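Summing these columns row-wise gives the new column. A sketch of that final step, rebuilt here so it runs on its own with the same data, n and x; `min_count=x` (a standard `DataFrame.sum` parameter) leaves NaN where fewer than x values exist, matching the blank cells in the question, whereas plain `.sum(axis=1)` skips NaN and returns the row's own value. This keeps the answer's shift of n rows; the question's worked example steps back n - 1 rows, so use `k * (n - 1)` to reproduce its table.

```python
import pandas as pd

n, x = 5, 2
df = pd.DataFrame({"original_data": range(1, 16)})

# Original column (k = 0) plus x - 1 copies shifted by n, 2n, ... rows
parts = [df["original_data"].shift(n * k) for k in range(x)]

# Require all x values to be present, otherwise the result is NaN
result = pd.concat(parts, axis=1).sum(axis=1, min_count=x)
print(result.tolist())
```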
