Drop duplicates, but keep the row with the highest value in a given column for each group

Question

I have a DataFrame like this:

    Name        Gender         Age      Level
  Pikachu        Male           4         8
 Charmander     Female          5         7
 Charmander     Female          5         7
 Squirtle        Male           3         6
 Squirtle        Male           3         9
 Squirtle       Female          4         9

I want it to look like this:

   Name        Gender         Age      Level
  Pikachu        Male           4         8
 Charmander     Female          5         7
 Squirtle        Male           3         9
 Squirtle       Female          4         9

I am not sure how to explain in words what I want to do, so I will write it as pseudocode.

Basically:

If Name, Gender and Age are the same:
      If there is a difference in levels:
            Keep the row with higher level
      If there is a tie:
            Keep a random one

Any ideas are appreciated!

Tags: python, pandas, dataframe, group-by

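Solution

Use sort_values + drop_duplicates. Sorting by Level in ascending order puts the highest-level row last within each (Name, Gender, Age) group, so keep='last' retains it. When two rows tie on Level, whichever happens to come last after sorting is kept, which satisfies the "keep a random one" requirement.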

df = df.sort_values('Level').drop_duplicates(['Name', 'Gender', 'Age'], keep='last')
df
         Name  Gender  Age  Level
2  Charmander  Female    5      7
0     Pikachu    Male    4      8
4    Squirtle    Male    3      9
5    Squirtle  Female    4      9
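
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question and wraps the same sort_values + drop_duplicates chain in a helper. The helper name keep_highest_level and its parameters are hypothetical, introduced only for illustration, and the trailing sort_index() is an optional extra (not part of the answer above) that restores the original row order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander",
             "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})


def keep_highest_level(frame, keys=("Name", "Gender", "Age"), value="Level"):
    """Hypothetical helper: keep one row per `keys` group, the one with the largest `value`."""
    return (
        frame.sort_values(value)                               # ascending: largest value ends up last
             .drop_duplicates(subset=list(keys), keep="last")  # keep that last row in each group
             .sort_index()                                     # optional: back to original row order
    )


result = keep_highest_level(df)
print(result)
```

If Level is tied within a group (as with the two Charmander rows), which of the tied rows survives depends on the sort order, so do not rely on a particular index being kept.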
