Duplicated results from a function applied to a DataFrame with groupby

Question

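I am trying to calculate, for each income group, the percentage of individuals who are in debt relative to the total number of individuals in that group (debt = 1, no debt = 0). I also tried the groupby() method, but did not manage to get it to work. This is what I entered: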

import pandas as pd
import numpy as np
credit_scoring = pd.read_csv('/datasets/credit_scoring_eng.csv')

in_debt = credit_scoring[credit_scoring['debt'] == 1]['income_group'].value_counts()
total = credit_scoring['income_group'].value_counts()
print(in_debt)
print(total)

def percentage_of_debt(incomegroup):
    calc = in_debt / total * 100
    return calc

credit_scoring.groupby('income_group')['debt'].apply(percentage_of_debt)

The result shows the correct percentages, but it also groups the results by income group a second time, as shown below:

< 20000          608
25000 - 29999    409
>= 35000         290
20000 - 24999    288
30000 - 34999    146
Name: income_group, dtype: int64
< 20000          7369
25000 - 29999    4856
>= 35000         4071
20000 - 24999    3378
30000 - 34999    1851
Name: income_group, dtype: int64
income_group                
20000 - 24999  < 20000          8.250780
               25000 - 29999    8.422570
               >= 35000         7.123557
               20000 - 24999    8.525755
               30000 - 34999    7.887628
                                  ...   
>= 35000       < 20000          8.250780
               25000 - 29999    8.422570
               >= 35000         7.123557
               20000 - 24999    8.525755
               30000 - 34999    7.887628
Name: debt, Length: 25, dtype: float64

I would like the output to show:

< 20000          8.250780
25000 - 29999    8.422570
20000 - 24999    8.525755
30000 - 34999    7.887628
>= 35000         7.123557

Thanks for any suggestions and guidance!

Tags: python, pandas, function, dataframe, analysis

Solution


It is a bit tricky without the underlying table, but I think what you want is

credit_scoring.groupby("income_group")["debt"].agg(lambda s: 100 * s.sum() / s.count()).sort_index()

Is this what you are after?
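
As for why the original attempt repeats itself: percentage_of_debt never uses its incomegroup argument and returns the whole in_debt / total * 100 Series on every call, so apply() stacks one full copy per group, which matches the 5 x 5 = 25 rows in the output above. Since debt is coded as 0/1, the per-group mean already is the share in debt, so a shorter route is possible. The sketch below runs on a small made-up table (the rows are hypothetical and only reuse the question's column names; they are not the real credit_scoring_eng.csv) and also shows that the value_counts ratio from the question is the answer on its own, with no groupby on top.

```python
import pandas as pd

# Hypothetical toy data standing in for credit_scoring_eng.csv (not the real dataset).
credit_scoring = pd.DataFrame({
    "income_group": ["< 20000"] * 4 + [">= 35000"] * 2,
    "debt": [1, 0, 0, 0, 1, 0],
})

# debt is coded 0/1, so the per-group mean is the fraction of people in debt.
pct_in_debt = credit_scoring.groupby("income_group")["debt"].mean() * 100

# The value_counts ratio from the question is already the per-group result,
# so it does not need to be wrapped in groupby().apply().
in_debt = credit_scoring[credit_scoring["debt"] == 1]["income_group"].value_counts()
total = credit_scoring["income_group"].value_counts()
pct_from_counts = (in_debt / total * 100).sort_index()

print(pct_in_debt)
print(pct_from_counts)
```

Both Series hold one row per income group. Note that the value_counts route yields NaN rather than 0 for a group in which nobody is in debt, because that group is missing from in_debt; the mean route does not have that gap.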

