python - Ranking values within grouped data
Problem description
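I am looking for an efficient way to renumber the rank vector of a grouped DataFrame in Python.
In a simple dataset, I first rank soldier_type by counts. So far this is simple! OTOH, I need to re-rank the soldiers based on the following condition:
If soldier_type == S1, then within each group of regiment and trucks I want it to always be ranked 1, and the other soldier types re-ranked starting from 2 by counts (highest to lowest).
Here is my attempt at solving this: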
import pandas as pd
from numpy.random import seed
from numpy.random import randint
seed(1234)
raw_data = {'regiment': ['51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st'],
'trucks': ['MAZ-7310', 'MAZ-7310', 'MAZ-7310', 'MAZ-7310', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'ZIS-150', 'ZIS-150', 'ZIS-150', 'ZIS-150'],
'soldier_type': ['S1', 'S2', 'S3', 'S4', 'S1', 'S3', 'S4', 'S5', 'S1', 'S2', 'S4', 'S5'],
'counts': randint(1,100,12)}
df = pd.DataFrame(raw_data, columns = ['regiment', 'trucks','soldier_type', 'counts'])
    regiment     trucks  soldier_type  counts
 0      51st   MAZ-7310            S1      48
 1      51st   MAZ-7310            S2      84
 2      51st   MAZ-7310            S3      39
 3      51st   MAZ-7310            S4      54
 4      51st  Tatra 810            S1      77
 5      51st  Tatra 810            S3      25
 6      51st  Tatra 810            S4      16
 7      51st  Tatra 810            S5      50
 8      51st    ZIS-150            S1      24
 9      51st    ZIS-150            S2      27
10      51st    ZIS-150            S4      31
11      51st    ZIS-150            S5      44
def rank_soldier_type (df):
    df = df.assign(rank_ = df.groupby(['regiment','trucks'])['counts'].rank(ascending = False,method='dense'))
    return df #1st part
#%%
    if df.soldier_type != 'S1' and df.rank_ == 1 :
        df['new_rank_'] = 1
    else:
        df['new_rank_'] = df['rank_'].rank(ascending = False,method='dense')
    return df
df = rank_soldier_type(df)
If I run only the first part of this function, I can create the rank_ column:
df = rank_soldier_type(df)
    regiment     trucks  soldier_type  counts  rank_
 0      51st   MAZ-7310            S1      48    3.0
 1      51st   MAZ-7310            S2      84    1.0
 2      51st   MAZ-7310            S3      39    4.0
 3      51st   MAZ-7310            S4      54    2.0
 4      51st  Tatra 810            S1      77    1.0
 5      51st  Tatra 810            S3      25    3.0
 6      51st  Tatra 810            S4      16    4.0
 7      51st  Tatra 810            S5      50    2.0
 8      51st    ZIS-150            S1      24    4.0
 9      51st    ZIS-150            S2      27    3.0
10      51st    ZIS-150            S4      31    2.0
11      51st    ZIS-150            S5      44    1.0
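As for the second part of the function, two things keep it from running: it sits after return df, so it is never reached, and an if applied to a whole column raises a ValueError because pandas cannot reduce one boolean per row to a single True/False. A minimal sketch on a hypothetical two-row frame (demo is made up for illustration, it is not the data above):

```python
import pandas as pd

# Hypothetical toy frame, only to show the behaviour of `if` on a column.
demo = pd.DataFrame({'soldier_type': ['S1', 'S2'], 'rank_': [2.0, 1.0]})

# A comparison on a column yields one boolean per row, and `if` needs
# exactly one, so pandas raises instead of guessing.
try:
    if demo['soldier_type'] != 'S1':
        pass
    raised = False
except ValueError:
    raised = True

# Row-wise conditions are combined with & (each side parenthesised),
# which produces a boolean mask instead of a single truth value.
mask = (demo['soldier_type'] != 'S1') & (demo['rank_'] == 1)
```

The resulting mask can then be used with df.loc[mask, ...] or numpy.where to assign values row by row.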
Expected output:
    regiment     trucks  soldier_type  counts  rank_  new_rank_
 0      51st   MAZ-7310            S1      48    3.0        1.0
 1      51st   MAZ-7310            S2      84    1.0        2.0
 2      51st   MAZ-7310            S3      39    4.0        4.0
 3      51st   MAZ-7310            S4      54    2.0        3.0
 4      51st  Tatra 810            S1      77    1.0        1.0
 5      51st  Tatra 810            S3      25    3.0        3.0
 6      51st  Tatra 810            S4      16    4.0        4.0
 7      51st  Tatra 810            S5      50    2.0        2.0
 8      51st    ZIS-150            S1      24    4.0        1.0
 9      51st    ZIS-150            S2      27    3.0        4.0
10      51st    ZIS-150            S4      31    2.0        3.0
11      51st    ZIS-150            S5      44    1.0        2.0
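Solution
Fix your code by adding duplicated. It marks every row except the first of each (regiment, trucks) group, so this relies on S1 being the first row of its group, as it is in your data: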
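# rank every row except the first of each group, shifted so ranks start at 2
df['New']=df[df[['regiment', 'trucks']].duplicated()].\
    groupby(['regiment', 'trucks'])['counts'].rank(ascending=False, method='dense')+1
# the first row of each group (S1) is still NaN here: give it rank 1
df['New'] = df['New'].fillna(1)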
df
Out[35]:
    regiment     trucks  soldier_type  counts  New
 0      51st   MAZ-7310            S1      48  1.0
 1      51st   MAZ-7310            S2      84  2.0
 2      51st   MAZ-7310            S3      39  4.0
 3      51st   MAZ-7310            S4      54  3.0
 4      51st  Tatra 810            S1      77  1.0
 5      51st  Tatra 810            S3      25  3.0
 6      51st  Tatra 810            S4      16  4.0
 7      51st  Tatra 810            S5      50  2.0
 8      51st    ZIS-150            S1      24  1.0
 9      51st    ZIS-150            S2      27  4.0
10      51st    ZIS-150            S4      31  3.0
11      51st    ZIS-150            S5      44  2.0
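If S1 is not guaranteed to be the first row of each group, one possible variant is to select the S1 rows by value rather than by position. The following is only a sketch under that assumption: the helper name rank_with_s1_first is made up for illustration, and the counts are copied from the printed frame rather than re-drawn with randint.

```python
import pandas as pd

df = pd.DataFrame({
    'regiment': ['51st'] * 12,
    'trucks': ['MAZ-7310'] * 4 + ['Tatra 810'] * 4 + ['ZIS-150'] * 4,
    'soldier_type': ['S1', 'S2', 'S3', 'S4', 'S1', 'S3', 'S4', 'S5',
                     'S1', 'S2', 'S4', 'S5'],
    # counts copied from the frame shown in the question
    'counts': [48, 84, 39, 54, 77, 25, 16, 50, 24, 27, 31, 44],
})

def rank_with_s1_first(frame):
    """Hypothetical helper: S1 gets rank 1, the rest are ranked from 2."""
    is_s1 = frame['soldier_type'].eq('S1')
    # Dense-rank the non-S1 rows within each group by counts (highest
    # first), then shift by 1 so their ranks start at 2.
    others = (frame[~is_s1]
              .groupby(['regiment', 'trucks'])['counts']
              .rank(ascending=False, method='dense') + 1)
    # Align back to the full index; the S1 rows come back as NaN and get 1.
    new_rank = others.reindex(frame.index).fillna(1.0)
    return frame.assign(new_rank_=new_rank)

result = rank_with_s1_first(df)
```

Because the ranks are aligned on the index, the row order does not matter. Note that a group with several S1 rows would give all of them rank 1, and a group with no S1 row would start at 2.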