首页 > 解决方案 > Pandas 与每日间隔的值的最大差异

问题描述

一个问题的继续

df

                    Timestamp  Spot DK1  Spot DK2  Ubalance DK1  Ubalance DK2
0   2020-01-01T00:00:00+01:00     33.42     33.42         34.00         34.00
1   2020-01-01T01:00:00+01:00     31.77     31.77         34.00         34.00
2   2020-01-01T02:00:00+01:00     31.57     31.57         34.00         34.00
3   2020-01-01T03:00:00+01:00     31.28     31.28         34.00         34.00
4   2020-01-01T04:00:00+01:00     30.85     30.85         26.00         26.00
5   2020-01-01T05:00:00+01:00     30.14     30.14         25.00         25.00
6   2020-01-01T06:00:00+01:00     30.17     30.17         24.00         24.00
7   2020-01-01T07:00:00+01:00     30.00     30.00         24.00         24.00
8   2020-01-01T08:00:00+01:00     30.63     30.63         24.00         24.00
9   2020-01-01T09:00:00+01:00     30.59     30.59         25.00         25.00
10  2020-01-01T10:00:00+01:00     30.27     30.27         25.00         25.00
11  2020-01-01T11:00:00+01:00     30.34     30.34         25.00         25.00
12  2020-01-01T12:00:00+01:00     30.59     30.59         30.59         30.59
13  2020-01-01T13:00:00+01:00     30.04     30.04         30.04         30.04
14  2020-01-01T14:00:00+01:00     30.60     30.60         30.60         30.60
15  2020-01-01T15:00:00+01:00     31.09     31.09         31.09         31.09
16  2020-01-01T16:00:00+01:00     31.53     31.53         31.53         31.53
17  2020-01-01T17:00:00+01:00     31.78     31.78         33.50         33.50
18  2020-01-01T18:00:00+01:00     31.64     31.64         33.50         33.50
19  2020-01-01T19:00:00+01:00     31.44     31.44         31.44         31.44
20  2020-01-01T20:00:00+01:00     31.35     31.35         31.35         31.35
21  2020-01-01T21:00:00+01:00     31.07     31.07         31.07         31.07
22  2020-01-01T22:00:00+01:00     30.96     30.96         25.00         25.00
23  2020-01-01T23:00:00+01:00     30.61     30.61         21.00         21.00
24  2020-01-02T00:00:00+01:00     30.78     30.78         20.00         20.00
25  2020-01-02T01:00:00+01:00     30.64     30.64         20.00         20.00
26  2020-01-02T02:00:00+01:00     30.43     30.43         20.00         20.00
27  2020-01-02T03:00:00+01:00     28.79     28.79         23.00         23.00
28  2020-01-02T04:00:00+01:00     28.42     28.42         22.73         22.73
29  2020-01-02T05:00:00+01:00     28.75     28.75         23.00         23.00
30  2020-01-02T06:00:00+01:00     33.38     34.16         22.50         22.50
31  2020-01-02T07:00:00+01:00     31.79     42.07         22.28         22.28
32  2020-01-02T08:00:00+01:00     31.83     44.89         22.50         22.50
33  2020-01-02T09:00:00+01:00     31.74     45.26         23.00         23.00
34  2020-01-02T10:00:00+01:00     31.63     45.57         24.00         24.00
35  2020-01-02T11:00:00+01:00     31.32     45.09         25.00         25.00
36  2020-01-02T12:00:00+01:00     31.07     45.16         25.00         25.00
37  2020-01-02T13:00:00+01:00     31.06     44.90         25.00         25.00
38  2020-01-02T14:00:00+01:00     31.07     44.06         26.00         26.00
39  2020-01-02T15:00:00+01:00     31.26     44.84         26.00         26.00
40  2020-01-02T16:00:00+01:00     31.41     44.40         27.50         27.50
41  2020-01-02T17:00:00+01:00     31.40     46.05         26.00         46.05
42  2020-01-02T18:00:00+01:00     31.10     46.72         26.00         26.00
43  2020-01-02T19:00:00+01:00     30.75     45.26         25.32         25.32
44  2020-01-02T20:00:00+01:00     30.47     39.32         20.25         20.25
45  2020-01-02T21:00:00+01:00     30.10     30.10         16.50         16.50
46  2020-01-02T22:00:00+01:00     29.71     29.71         16.50         16.50
47  2020-01-02T23:00:00+01:00     24.99     24.99         15.00         15.00
48  2020-01-03T00:00:00+01:00     18.93     18.93         15.00         15.00
49  2020-01-03T01:00:00+01:00      9.98      9.98          9.98          9.98

我想生成一组单独的值,其中每天的值的最大差异在其中。最好能够选择要包含的列。

因此,该系列中的第一个值应该是 12.5,因为第一天的最大值和最小值分别是 33.5 和 21.00。

预期结果:

       Day          Max diff
0   2020-01-01       12.50
1   2020-01-02       31.72

到目前为止,我已经尝试过这个 - 但这给了我一个滚动的 24 小时间隔,跨越天,这是我试图避免的:

cols = ['Spot DK1','Spot DK2','Ubalance DK1','Ubalance DK2']
N = 24

battery['dif'] = (battery[cols].stack(dropna=False)
                     .rolling(len(cols) * N)
                     .agg(lambda x: x.max() - x.min())
                     .groupby(level=0)
                     .max())

标签: pythonpandasdataframestatistics

解决方案


想法是每天聚合最小值和最大值,因此可能 getmaxminbymaxminlevel indf1和 last 减去 new df2

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

cols = ['Spot DK1','Spot DK2','Ubalance DK1','Ubalance DK2']

df1 = (df.set_index('Timestamp')[cols]
         .groupby(pd.Grouper(freq='D', level='Timestamp'))
         .agg(['min','max']))

s1 = df1.xs('max', level=1, axis=1).max(axis=1)
s2 = df1.xs('min', level=1, axis=1).min(axis=1)

df2 = s1.sub(s2).rename_axis('Day').reset_index(name='Max diff')
print (df2)
                        Day  Max diff
0 2020-01-01 00:00:00+01:00     13.00
1 2020-01-02 00:00:00+01:00     31.72
2 2020-01-03 00:00:00+01:00      8.95

细节:

print (df1)
                          Spot DK1        Spot DK2        Ubalance DK1        \
                               min    max      min    max          min   max   
Timestamp                                                                      
2020-01-01 00:00:00+01:00    30.00  33.42    30.00  33.42        21.00  34.0   
2020-01-02 00:00:00+01:00    24.99  33.38    24.99  46.72        15.00  27.5   
2020-01-03 00:00:00+01:00     9.98  18.93     9.98  18.93         9.98  15.0   

                          Ubalance DK2         
                                   min    max  
Timestamp                                      
2020-01-01 00:00:00+01:00        21.00  34.00  
2020-01-02 00:00:00+01:00        15.00  46.05  
2020-01-03 00:00:00+01:00         9.98  15.00  

推荐阅读