Little logic issue with pandas duplicates in Python

Problem description

I have a pandas DataFrame with the following entries:

type    location    unit     condition
Car        01        19          2
Car        01        19          3
Car        01        19          1
Car        02        19          1
Car        01        20          1
Machine    05        09          1
Machine    05        09          2
Machine    05        09          3
Machine    15        09          1
Machine    15        10          1
Truck      02        09          2
Truck      02        09          1

For the duplicates (based on type | location | unit), I would like to keep only the row with the best (highest) condition. I tried several combinations of sort_values, drop_duplicates, etc., but I think there is a flaw in my logic.

So the ideal result would look like this:

type    location    unit     condition
Car        01        19          3
Car        02        19          1
Car        01        20          1
Machine    05        09          3
Machine    15        09          1
Machine    15        10          1
Truck      02        09          2

Commentary on proposed duplicate

Sorry for the mistake: I completely forgot that type, location, unit and condition are not the only columns. There are currently at least two more (torque and wheels), which I also need access to.

So I guess my problem cannot be solved with the answers to that question.

Current full df

type    location    unit     condition      wheels      torque
Car        01        19          2            4          256
Car        01        19          3            4          320
Car        01        19          1            4          190
Car        02        19          1            4          280
Car        01        20          1            4          400
Machine    05        09          1            4          320
Machine    05        09          2            6          690
Machine    05        09          3            12        1180
Machine    15        09          1            4          290
Machine    15        10          1            6          445
Truck      02        09          2            6          625
Truck      02        09          1            8          804

Tags: python, pandas

Solution


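You can use groupby and keep just the maximum condition for each (type, location, unit) group. Note that the result contains only the three grouping columns and condition: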

df = df.groupby(['type', 'location', 'unit'])['condition'].max().reset_index()
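Since the question also needs the other columns (wheels and torque), a variant that keeps the whole row may be more useful. The following is only a sketch, not part of the original answer: it rebuilds the full df from the question (location and unit are assumed to be strings so the leading zeros survive) and shows two common ways to select the complete row with the highest condition per group. The names keys, best and best_alt are illustrative.

```python
import pandas as pd

# Full df from the question; location/unit stored as strings (assumption)
df = pd.DataFrame({
    "type": ["Car"] * 5 + ["Machine"] * 5 + ["Truck"] * 2,
    "location": ["01", "01", "01", "02", "01", "05", "05", "05", "15", "15", "02", "02"],
    "unit": ["19", "19", "19", "19", "20", "09", "09", "09", "09", "10", "09", "09"],
    "condition": [2, 3, 1, 1, 1, 1, 2, 3, 1, 1, 2, 1],
    "wheels": [4, 4, 4, 4, 4, 4, 6, 12, 4, 6, 6, 8],
    "torque": [256, 320, 190, 280, 400, 320, 690, 1180, 290, 445, 625, 804],
})

keys = ["type", "location", "unit"]

# Option 1: find the index label of the max condition in each group,
# then select those full rows. sort=False keeps groups in order of appearance.
best = df.loc[df.groupby(keys, sort=False)["condition"].idxmax()]
best = best.reset_index(drop=True)

# Option 2: sort so the highest condition comes first, keep the first row
# per key combination, then restore the original row order.
best_alt = (
    df.sort_values("condition", ascending=False)
      .drop_duplicates(subset=keys)
      .sort_index()
      .reset_index(drop=True)
)

print(best)
```

Both options keep exactly one row per group. If several rows in a group share the highest condition, which of them survives depends on the method, so an extra tie-breaking sort column may be needed in that case.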
