首页 > 解决方案 > 桶时间超过 24 小时时钟,并汇总每个桶的数量

问题描述

我想对Date_Time超过 24 小时时钟的计数进行存储,但也要Amt对每个存储桶的关联进行求和。这只适用于最高容量的书。分桶代码是 24 小时完成的,只需要求和的帮助Amt

数据框:

import pandas as pd
import numpy as np

df_Highest_Traded_Away_Book = [
                                ('Book', ['A', 'A','A','A','B','C','C','C']),
                                ('Amt', ['10', '10', '10', '10', '20', '30', '30', '30']),
                                ('Date_Time', ['2018-09-03 01:06:09', '2018-09-08 01:23:29',
                                                          '2018-09-15 02:23:29','2018-09-20 03:23:29',
                                                          '2018-09-20 00:23:29','2018-09-25 01:23:29',
                                                          '2018-09-25 02:23:29','2018-09-30 02:23:29',])
                              ]

获得最高容量的书

df_Highest_Traded_Away_Book = pd.DataFrame.from_items(df_Highest_Traded_Away_Book)
df_Highest_Traded_Away_Book['Date_Time'] = pd.to_datetime(df_Highest_Traded_Away_Book['Date_Time'])
df_Highest_Traded_Away_Book['Time_in_GMT'] =  df_Highest_Traded_Away_Book['Date_Time'].dt.hour
print(df_Highest_Traded_Away_Book)

df_Highest_Book =  df_Highest_Traded_Away_Book.groupby(['Book']).size().idxmax()
print(df_Highest_Book)

24 小时内的存储桶时间:

df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book['Book']
              .eq(df_Highest_Book)
              .groupby(df_Highest_Traded_Away_Book['Time_in_GMT'])
              .sum()
              .astype(int)
              .reindex(np.arange(25), fill_value=0)
              .to_frame(df_Highest_Book))

print(df_Highest_Traded_Away_Book )
             A
Time_in_GMT   
0            0
1            2
2            1
3            1
4            0
5            0
6            0
7            0
8            0
9            0
10           0
11           0
12           0
13           0
14           0
15           0
16           0
17           0
18           0
19           0
20           0
21           0
22           0
23           0
24           0

需要输出:

             A
Time_in_GMT      Sum Amt
0            0   0
1            2   20
2            1   10
3            1   10
4            0   0
5            0   0
6            0   0
7            0   0
8            0   0
9            0   0
10           0   0
11           0   0
12           0   0
13           0   0
14           0   0
15           0   0
16           0   0
17           0   0
18           0   0
19           0   0
20           0   0
21           0   0
22           0   0
23           0   0
24           0   0

标签: pythonpandaspandas-groupby

解决方案


首先boolean indexing仅按df_Highest_Book值过滤,然后按aggwithsize和聚合sum

#convert column to integers
df_Highest_Traded_Away_Book['Amt'] = df_Highest_Traded_Away_Book['Amt'].astype(int)
df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book[df_Highest_Traded_Away_Book['Book']
                                  .eq(df_Highest_Book)]
                                  .groupby('Time_in_GMT')
                                  .agg({'Time_in_GMT':'size','Amt':'sum'})
                                  .reindex(np.arange(25), fill_value=0)
                                  .rename(columns={'Time_in_GMT':'Count','Amt':'Sum Amt'})
                                  )

print(df_Highest_Traded_Away_Book)
             Count  Sum Amt
Time_in_GMT                
0                0        0
1                2       20
2                1       10
3                1       10
4                0        0
5                0        0
6                0        0
7                0        0
8                0        0
9                0        0
10               0        0
11               0        0
12               0        0
13               0        0
14               0        0
15               0        0
16               0        0
17               0        0
18               0        0
19               0        0
20               0        0
21               0        0
22               0        0
23               0        0
24               0        0

推荐阅读