Determine the max count in a Pandas grouped-by df and use it as the criteria to return records

Problem Description

Afternoon all,

I have a large set of data covering a single month. I want to:

a. Find the book that was traded the most during that month.

b. Knowing that, produce a grouped summary of all trades done on that book for the month, but with the trades bucketed into each hour of the 24-hour clock.

Here is a sample dataset:

import pandas as pd

df_Highest_Traded_Away_Book = [
                                ('trading_book', ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C']),
                                ('rfq_create_date_time', ['2018-09-03 01:06:09', '2018-09-08 01:23:29',
                                                          '2018-09-15 02:23:29', '2018-09-20 03:23:29',
                                                          '2018-09-20 00:23:29', '2018-09-25 01:23:29',
                                                          '2018-09-25 02:23:29', '2018-09-30 02:23:29'])
                              ]

# pd.DataFrame.from_items was removed in pandas 1.0; building from a dict
# of (column, values) pairs preserves the column order here.
df_Highest_Traded_Away_Book = pd.DataFrame(dict(df_Highest_Traded_Away_Book))
display(df_Highest_Traded_Away_Book)

    trading_book    rfq_create_date_time
0              A    2018-09-03 01:06:09
1              A    2018-09-08 01:23:29
2              A    2018-09-15 02:23:29
3              A    2018-09-20 03:23:29
4              B    2018-09-20 00:23:29
5              C    2018-09-25 01:23:29
6              C    2018-09-25 02:23:29
7              C    2018-09-30 02:23:29    

df_Highest_Traded_Away_Book['rfq_create_date_time'] = pd.to_datetime(df_Highest_Traded_Away_Book['rfq_create_date_time'])
df_Highest_Traded_Away_Book['Time_in_GMT'] =  df_Highest_Traded_Away_Book['rfq_create_date_time'].dt.hour
display(df_Highest_Traded_Away_Book)

    trading_book    rfq_create_date_time    Time_in_GMT
0              A    2018-09-03 01:06:09               1
1              A    2018-09-08 01:23:29               1
2              A    2018-09-15 02:23:29               2
3              A    2018-09-20 03:23:29               3
4              B    2018-09-20 00:23:29               0
5              C    2018-09-25 01:23:29               1
6              C    2018-09-25 02:23:29               2
7              C    2018-09-30 02:23:29               2

df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book
                               .groupby(['trading_book'])
                               .size()
                               .reset_index(name='Traded_Away_for_the_Hour')
                               .sort_values(['Traded_Away_for_the_Hour'], ascending=False))
display(df_Highest_Traded_Away_Book)

  trading_book  Traded_Away_for_the_Hour
0            A                         4
2            C                         3
1            B                         1

display(df_Highest_Traded_Away_Book['Traded_Away_for_the_Hour'].max())

4 

i.e. Book A was traded the most during the month.
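(Side note: max() alone only gives the count. If several books could tie at that count, a boolean filter against max() returns all of them; a minimal sketch using the grouped-count frame built above:)

# Sketch: list every book tied at the maximum count, using the
# grouped-count frame from the step above.
max_count = df_Highest_Traded_Away_Book['Traded_Away_for_the_Hour'].max()
top_books = df_Highest_Traded_Away_Book.loc[
    df_Highest_Traded_Away_Book['Traded_Away_for_the_Hour'] == max_count,
    'trading_book'
].tolist()
print(top_books)   # ['A'] for this sample data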

Now return the grouped results for all trades done on this book (for the month), but displayed with the trades bucketed into the hour in which they occurred:

Time_in_GMT Trades_Book_A_Bucketed_into_the_Hour_They_Occurred
0                     0
1                     2
2                     1
3                     1
4                     0
.                     0
.                     0
.                     0
24                    0

Any help would be much appreciated. I assume there is some way to return the criteria in a single line of code.

Tags: pandas, dataframe, group-by, pandas-groupby

Solution


Use Series.idxmax to get the top book:

df_Highest_Traded_Away_Book['rfq_create_date_time'] = pd.to_datetime(df_Highest_Traded_Away_Book['rfq_create_date_time'])
df_Highest_Traded_Away_Book['Time_in_GMT'] =  df_Highest_Traded_Away_Book['rfq_create_date_time'].dt.hour

df_Highest_Book =  df_Highest_Traded_Away_Book.groupby(['trading_book']).size().idxmax()
#alternative solution
#df_Highest_Book =  df_Highest_Traded_Away_Book['trading_book'].value_counts().idxmax()  
print(df_Highest_Book)
A
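Note that Series.idxmax returns the label of the first occurrence of the maximum, so with tied counts only one book comes back; a tiny illustration:

# Sketch: with tied counts, idxmax() returns only the first matching label.
s = pd.Series({'A': 4, 'B': 1, 'C': 4})
print(s.idxmax())   # 'A' - the first index label holding the maximum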

Then compare with eq (==), aggregate the count of True values with sum, and add the missing hours with reindex:

import numpy as np

# Boolean mask of trades on the top book, counted per hour; reindex fills
# the hours with no trades with 0 (hours 0-24 to match the desired output).
df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book['trading_book']
                                      .eq(df_Highest_Book)
                                      .groupby(df_Highest_Traded_Away_Book['Time_in_GMT'])
                                      .sum()
                                      .astype(int)
                                      .reindex(np.arange(25), fill_value=0)
                                      .to_frame(df_Highest_Book))

print(df_Highest_Traded_Away_Book)
             A
Time_in_GMT   
0            0
1            2
2            1
3            1
4            0
5            0
6            0
7            0
8            0
9            0
10           0
11           0
12           0
13           0
14           0
15           0
16           0
17           0
18           0
19           0
20           0
21           0
22           0
23           0
24           0
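If you prefer to filter first and then count, the following is an equivalent sketch; it assumes df is the original DataFrame (before it was overwritten above) with the trading_book and Time_in_GMT columns, and reuses df_Highest_Book:

# Hypothetical alternative: filter to the top book, then count per hour.
# Assumes df is the original frame with 'trading_book' and 'Time_in_GMT'.
out = (df.loc[df['trading_book'].eq(df_Highest_Book), 'Time_in_GMT']
         .value_counts()
         .reindex(np.arange(25), fill_value=0)
         .to_frame(df_Highest_Book))
print(out)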
