python - 按一天中的小时对 Pandas 数据进行分组
问题描述
我使用以下代码生成随机日期和值:
import pandas as pd
import numpy as np
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
输出如下所示:
2000-01-01 00:00:00 40
2000-01-01 00:05:00 13
2000-01-01 00:10:00 99
2000-01-01 00:15:00 72
2000-01-01 00:20:00 4
2000-01-01 00:25:00 36
2000-01-01 00:30:00 24
2000-01-01 00:35:00 20
2000-01-01 00:40:00 83
2000-01-01 00:45:00 44
然后我按索引小时值对这些数据进行排序,然后按平均值聚合它,如下所示:
0 50.380952
1 49.380952
2 49.904762
3 53.273810
4 47.178571
5 46.095238
6 49.047619
7 44.297619
8 53.119048
9 48.261905
10 45.166667
11 54.214286
12 50.714286
13 56.130952
14 50.916667
15 42.428571
16 46.880952
17 56.892857
18 54.071429
19 47.607143
20 50.940476
21 50.511905
22 44.550000
23 50.250000
但是,如果我想仅按索引小时值而不使用平均值对所有数据进行分组,那么我现在应该怎么做,以便我可以获得与其索引关联的所有值?
提前致谢。
问候,
解决方案
如果要通过 s 聚合hour
,则表示:
np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
s = series.groupby(series.index.hour).mean()
print (s)
0 49.392857
1 52.523810
2 53.047619
3 49.083333
4 49.785714
5 49.071429
6 52.476190
7 47.821429
8 52.190476
9 50.000000
10 49.035714
11 52.988095
12 52.785714
13 52.023810
14 46.964286
15 52.095238
16 51.047619
17 52.166667
18 48.357143
19 51.416667
20 45.214286
21 46.130952
22 49.750000
23 48.527778
dtype: float64
但如果需要MultiIndex
按小时计算:
series.index = [series.index.hour, series.index]
print (series)
0 2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
1 2000-01-01 01:00:00 11
2000-01-01 01:05:00 77
2000-01-01 01:10:00 69
2000-01-01 01:15:00 98
...
然后可以按小时选择:
print (series.loc[0])
2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
2000-01-02 00:00:00 82
2000-01-02 00:05:00 69
2000-01-02 00:10:00 99
2000-01-02 00:15:00 17
2000-01-02 00:20:00 59
...
此外,如果需要mean
没有变化DatetimeIndex
:
s1 = series.groupby(series.index.hour).transform('mean')
print (s1)
2000-01-01 00:00:00 49.392857
2000-01-01 00:05:00 49.392857
2000-01-01 00:10:00 49.392857
2000-01-01 00:15:00 49.392857
2000-01-01 00:20:00 49.392857
2000-01-01 00:25:00 49.392857
2000-01-01 00:30:00 49.392857
2000-01-01 00:35:00 49.392857
2000-01-01 00:40:00 49.392857
2000-01-01 00:45:00 49.392857
2000-01-01 00:50:00 49.392857
2000-01-01 00:55:00 49.392857
2000-01-01 01:00:00 52.523810
2000-01-01 01:05:00 52.523810
2000-01-01 01:10:00 52.523810
2000-01-01 01:15:00 52.523810
2000-01-01 01:20:00 52.523810
2000-01-01 01:25:00 52.523810
2000-01-01 01:30:00 52.523810
...
编辑:
对于每小时使用的列表:
s = series.groupby(series.index.hour).apply(list)
print (s)
0 [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1 [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2 [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3 [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4 [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5 [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6 [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7 [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8 [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9 [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10 [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11 [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12 [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13 [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14 [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15 [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16 [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17 [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18 [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19 [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20 [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21 [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22 [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23 [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object
推荐阅读
- powershell - 如何根据这些条件转换字符串?(无垃圾邮件到真正的电子邮件)
- python - 无法在 Python 中导入我自己的依赖项
- performance - 在高负载应用程序中使用装饰器是个好主意吗?
- python - 如何使用 CNN-LSTM 对不同长度的视频进行分类?
- vb.net - 在另一个表单上引发事件
- powershell - 相当于`netsh http show sslcert ipport='0.0.0.0:8172'`的powershell
- python - 以正确的顺序根据另一个列表从列表中删除元素
- python - 如何在 python 中“写入新的 .CSV 文件”或“另存为新的 .CSV 文件”
- r - 如何更改 R 中数据框的形状?(将具有相同名称的列堆叠在一起)
- verilog - 如何在verilog中获得小数(比如0.78)的平方和立方?