首页 > 解决方案 > Rank/Row Number Window Function in Python

问题描述

I am a newcomer to Python and I am needing a solution for this example below, This is how my df looks like,

enter image description here

I need to return just two records one for ID_Number 207921021 and one for ID_Number 222037001. Using the below code I was able to get one record for ID_Number 222037001 based on max value of Score 1 and Score 2 when the Action is the same ('Invest' in this case),

Data=Data.groupby(['ID_Number','Action'])['Score_1','Score_2'].max().reset_index()

Below was my result,

enter image description here

Is there way to just to get one record for ID_Number 207921021 as well (Two separate values in the action column this time i.e. 'Use' and 'Avoid')? I have used Window Functions in SQL i.e using QUALIFY RANK() OVER (PARTITION BY ID_number, Action ORDER BY Score_1,Score_2 DESC)=1 to get this done previously.

Is there a similar function in Python to do this?

标签: pythonpython-3.xpandasjupyter-notebook

解决方案


ID_Number这将根据您定义的排序选择一行。

df.sort_values(by=['Score_2', 'Score_1'], ascending=[False, True]).groupby(['ID_Number']).head(1)

输出:

    Action  ID_Number   Score_1     Score_2
3   Invest  222037001   9   0.4600
0   Use     207821021   7   0.4525

推荐阅读