Pandas DataFrame: take the per-group maximum when using groupby

Problem description

I have a DataFrame with many columns; two of them are categorical and the rest are numeric:

df = [type1 , type2 , type3 , val1, val2, val3
       a       b        q       1    2     3
       a       c        w       3    5     2
       b       c        t       2    9     0
       a       b        p       4    6     7
       a       c        m       2    1     8]

I want to group it with groupby(["type1","type2"]) and, for each group, take the maximum value from the grouped rows:

df = [type1 , type2 , type3 , val1, val2, val3
       a       b        q       4    6     7
       a       c        w       3    5     8
       b       c        t       2    9     0]

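Explanation: val3 in the first row is 7 because that is the maximum when type1 = a and type2 = b.

Likewise, val3 in the second row is 8 because that is the maximum when type1 = a and type2 = c.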

Tags: python, pandas, dataframe, pandas-groupby

Solution


To aggregate every column by its maximum:

# Store the result under a new name so the original df stays available
df1 = df.groupby(["type1", "type2"]).max()
print(df1)
            type3  val1  val2  val3
type1 type2                        
a     b         q     4     6     7
      c         w     3     5     8
b     c         t     2     9     0
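
The answer does not show how df is built, so here is a minimal, self-contained sketch that reconstructs the sample data from the question and applies the same groupby max. The variable name group_max is only illustrative, and as_index=False is an optional choice made here to keep type1 and type2 as regular columns. Note that each column's maximum is computed independently, so a result row does not necessarily correspond to one single row of the input.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Column-wise maximum within each (type1, type2) group.
# as_index=False keeps the group keys as ordinary columns.
group_max = df.groupby(["type1", "type2"], as_index=False).max()
print(group_max)
```

For the string column type3, max picks the lexicographically largest value in each group, which happens to match the first value of each group in this sample.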

If some columns need a different aggregation, build a dictionary that maps each column name to an aggregate function, defaulting to max, and then override the entries for specific columns. Here type3 uses first and val1 uses last:

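# Default every non-key column to 'max'
d = dict.fromkeys(df.columns.difference(['type1', 'type2']), 'max')
# Override the aggregation for selected columns
d['type3'] = 'first'
d['val1'] = 'last'

# as_index=False keeps the keys as columns; sort=False keeps groups in order of first appearance
df2 = df.groupby(["type1", "type2"], as_index=False, sort=False).agg(d)
print(df2)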
  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     2     5     8
2     b     c     t     2     9     0
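
As an alternative to the dictionary, the same per-column choice can be spelled out with named aggregation, which is available in pandas 0.25 and later. This is only a sketch under the assumption that df holds the sample data from the question; the variable name result is illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Each keyword is an output column; the tuple is (input column, aggregate function)
result = df.groupby(["type1", "type2"], as_index=False, sort=False).agg(
    type3=("type3", "first"),  # first value seen in each group
    val1=("val1", "last"),     # last value seen in each group
    val2=("val2", "max"),
    val3=("val3", "max"),
)
print(result)
```

Named aggregation lists every output column explicitly, which is more verbose than dict.fromkeys for wide frames but makes the column order and the function used for each column easy to read.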
