Get the largest n items from grouped data in Pandas

Problem description

Using this data:

import pandas as pd

df = pd.read_excel(
    "https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=True"
)
df["date"] = pd.to_datetime(df["date"])
df.head()

I use this code to sum ext price per name, grouped by month:

df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()

Out:

date        name                           
2014-01-31  Barton LLC                          6177.57
            Cronin, Oberbrunner and Spencer     1141.75
            Frami, Hills and Schmidt            5112.34
            Fritsch, Russel and Anderson       15130.77
            Halvorson, Crona and Champlin       9997.17
            Herman LLC                         10749.84
            Jerde-Hilpert                      11274.33
            Kassulke, Ondricka and Metz         7322.83
            Keeling LLC                         6847.86
            Kiehn-Spinka                        8097.50
            Koepp Ltd                          10768.33
            Kuhn-Gusikowski                     7309.54
            Kulas Inc                          15398.87
            Pollich LLC                         1004.22
            Purdy-Kunde                         4689.37
            Sanford and Sons                    9544.13
            Stokes LLC                          5809.34
            Trantow-Barrows                    14328.26
            White-Trantow                      13703.77
            Will LLC                           20953.87
2014-02-28  Barton LLC                         12218.03
            Cronin, Oberbrunner and Spencer    13976.26
            Frami, Hills and Schmidt            4124.53
            Fritsch, Russel and Anderson        9595.35
            Halvorson, Crona and Champlin       7082.15
            Herman LLC                          5831.40
            Jerde-Hilpert                       4088.40
            Kassulke, Ondricka and Metz         3061.12
            Keeling LLC                         3383.45
            Kiehn-Spinka                        3461.12
                                                 ...   
2014-11-30  Koepp Ltd                           4882.27
            Kuhn-Gusikowski                     7197.89
            Kulas Inc                           4149.34
            Pollich LLC                         6334.21
            Purdy-Kunde                         2376.00
            Sanford and Sons                    6834.04
            Stokes LLC                          6158.81
            Trantow-Barrows                     6550.10
            White-Trantow                       9544.61
            Will LLC                            3210.44
2014-12-31  Barton LLC                          2772.90
            Cronin, Oberbrunner and Spencer     7640.60
            Frami, Hills and Schmidt           16249.81
            Fritsch, Russel and Anderson       12345.64
            Halvorson, Crona and Champlin       2900.51
            Herman LLC                          4664.54
            Jerde-Hilpert                       6941.99
            Kassulke, Ondricka and Metz         4425.22
            Keeling LLC                        13247.88
            Kiehn-Spinka                       17401.28
            Koepp Ltd                          11791.00
            Kuhn-Gusikowski                     4959.85
            Kulas Inc                           6106.38
            Pollich LLC                        12357.76
            Purdy-Kunde                         4051.79
            Sanford and Sons                    2151.48
            Stokes LLC                          6366.26
            Trantow-Barrows                    10124.23
            White-Trantow                       4806.93
            Will LLC                           12561.21
Name: ext price, Length: 240, dtype: float64

Now I am trying to get, for each month, the top 5 name values ranked by ext price.

I tried nlargest(5), but it does not work.

head(5) does not solve the problem either.

Tags: python, pandas, group-by

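Solution


Another option is nlargest, although it will probably not be faster than James's suggestion, since sorting and then taking head or tail should be faster than nlargest.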

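# Monthly total of 'ext price' per name
new = df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
# Top 5 per month; nlargest prepends the group key, so the duplicated date level is dropped
new.groupby(level=0).nlargest(5).sort_index().reset_index(level=1, drop=True).to_frame()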
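
James's suggestion itself is not reproduced on this page. The sort-then-head idea mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not the original answer: the toy DataFrame is made up, n is 2 instead of 5 so the result stays small, and the month key is built with dt.to_period("M") instead of pd.Grouper.

```python
import pandas as pd

# Hypothetical toy data standing in for the sales spreadsheet (not the real file).
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-01-05", "2014-01-10", "2014-01-20", "2014-01-25",
        "2014-02-03", "2014-02-14", "2014-02-20",
    ]),
    "name": ["A", "B", "C", "A", "A", "B", "C"],
    "ext price": [100.0, 300.0, 200.0, 50.0, 10.0, 30.0, 20.0],
})

# Monthly total per name. The month key here is a Period (assumption:
# used in place of pd.Grouper(key='date', freq='M') from the question).
monthly = toy.groupby([toy["date"].dt.to_period("M"), "name"])["ext price"].sum()

# Sort all totals in descending order, then keep the first n rows of each month.
n = 2
top_n = (
    monthly.sort_values(ascending=False)
    .groupby(level=0)   # level 0 is the month
    .head(n)
    .sort_index()       # back to month order; names are alphabetical within a month
)
print(top_n)
```

With the real data, monthly would be the series called new above and n would be 5. Note that the final sort_index() orders the rows by month and name, so drop it if the rows should stay ranked by value within each month.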
