Calculating percentages with a for loop and group by

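Question

I have the following code with a loop. It prints the share of each outcome type (Passed or Failed) among rows in the '20-34' size group, as shown below. How would I change the code to see the same data grouped by professor?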

leads = ['Passed', 'Failed']
for lead in leads:
    # rows with the current status in the '20-34' size group
    df_overall = df[(df['Status'] == lead) & (df['size'] == '20-34')]
    num_overall = len(df_overall)
    # all rows in the '20-34' size group (the denominator)
    lead_df = df[df['size'] == '20-34']
    num_total = len(lead_df)
    percentage_overall = num_overall / num_total
    print(lead, percentage_overall)

This gives me output like the following:

Passed .65
Failed .35

I want to edit the code so that it groups by professor, since my DataFrame also has a professor column.

Expected output:

Mr. Johnson Passed .35
Mr. Johnson Failed .65
Ms. Jones   Passed .90
Ms. Jones   Failed .10
Mr. Boe     Passed .80
Mr. Boe     Failed .20

Thanks

Tags: python, python-3.x, pandas

Answer

I believe you need GroupBy.size:

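leads = ['Passed','Failed']

lead_df = df[(df['size']== '20-34')]
#filter by list leads
df_overall = lead_df[lead_df['Status'].isin(leads)]

#count rows per professor and status (numerator)
num_overall1 = df_overall.groupby(['professor','Status']).size()
#count all rows per professor (denominator)
num_total1 = lead_df.groupby('professor').size()

#divide each count by its professor's total, matching on the professor level
out = num_overall1.div(num_total1, level='professor').reset_index(name='per')
print (out)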
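
To try the idea end to end, here is a minimal sketch that divides each (professor, Status) count by that professor's total row count. The sample DataFrame and the helper name status_share_by_professor are hypothetical and only serve as illustration; the column names 'professor', 'Status' and 'size' are taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'professor': ['Mr. Johnson'] * 5 + ['Ms. Jones'] * 5,
    'Status': ['Passed', 'Failed', 'Failed', 'Failed', 'Passed',
               'Passed', 'Passed', 'Passed', 'Failed', 'Failed'],
    'size': ['20-34', '20-34', '20-34', '20-34', '35-49',
             '20-34', '20-34', '20-34', '20-34', '35-49'],
})


def status_share_by_professor(df, leads, size='20-34'):
    """Share of each status in `leads` per professor, within one size group."""
    # Keep only the requested size group; this defines the denominator.
    lead_df = df[df['size'] == size]
    # Keep only the statuses of interest; this defines the numerator.
    df_overall = lead_df[lead_df['Status'].isin(leads)]

    counts = df_overall.groupby(['professor', 'Status']).size()
    totals = lead_df.groupby('professor').size()

    # Broadcast the per-professor totals across the Status level.
    return counts.div(totals, level='professor').reset_index(name='per')


out = status_share_by_professor(df, leads=['Passed', 'Failed'])
print(out)
# Mr. Johnson: Passed 0.25, Failed 0.75
# Ms. Jones:   Passed 0.75, Failed 0.25
```

The rows with size '35-49' are filtered out before counting, so they affect neither the numerator nor the denominator. If every status should be reported, calling value_counts(normalize=True) on lead_df.groupby('professor')['Status'] is likely a shorter route to the same shares.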
