Finding the percentage of males and females

Problem description

With the code below, I am able to get the counts from my dataset:

Users2 = Users.gender.groupby([Users['occupation'],Users['gender']]).count().astype(int)
Users2

Output:

occupation     gender
administrator  F          36
               M          43
artist         F          13
               M          15
doctor         M           7
educator       F          26
               M          69
engineer       F           2
               M          65

However, I need the percentages of males and females rather than the counts.

Sample data:

user_id age gender  occupation
0   1   24    M     doctor
1   2   53    F     educator
2   3   23    M     writer
3   4   24    M     administrator
4   5   33    F     artist

Tags: python-3.x, pandas

Solution


Use SeriesGroupBy.value_counts with normalize=True:

#changed sample data for better MCVE
print (Users)
   user_id  age gender  occupation
0        1   24      M  technician
1        2   53      F  technician
2        3   23      M      writer
3        4   24      M  technician
4        5   33      F      writer

df = (Users.groupby('occupation')['gender']
           .value_counts(normalize=True)
           .reset_index(name='perc'))
print (df)
   occupation gender      perc
0  technician      M  0.666667
1  technician      F  0.333333
2      writer      F  0.500000
3      writer      M  0.500000
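
Note that the values in perc are fractions between 0 and 1, not percentages. If values on a 0-100 scale are wanted, a minimal sketch could look like the following. It rebuilds the changed sample data from above; the names perc and wide and the rounding to 2 decimals are only illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same changed sample data as in the answer above
Users = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'age': [24, 53, 23, 24, 33],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'occupation': ['technician', 'technician', 'writer', 'technician', 'writer'],
})

# value_counts(normalize=True) returns fractions, so scale them to percent
perc = (Users.groupby('occupation')['gender']
             .value_counts(normalize=True)
             .mul(100)
             .round(2))

# One row per occupation, one column per gender;
# a gender missing from an occupation is filled with 0
wide = perc.unstack(fill_value=0)
print(wide)
```

The unstack step is optional. It is mainly useful when some occupation contains only one gender (like doctor in the question), because the missing gender then shows up as 0 instead of being absent.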

Details:

#without normalize=True get counts per groups
print (Users.groupby('occupation')['gender']
             .value_counts())
occupation  gender
technician  M         2
            F         1
writer      F         1
            M         1
Name: gender, dtype: int64

#with normalize=True get proportions per group (multiply by 100 for percentages)
print (Users.groupby('occupation')['gender']
             .value_counts(normalize=True))

occupation  gender
technician  M         0.666667
            F         0.333333
writer      F         0.500000
            M         0.500000
Name: gender, dtype: float64
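
As a possible alternative to the groupby approach (this is a different technique, not the one used in the answer above), pd.crosstab with normalize='index' should give the same per-occupation proportions directly in wide form. A small sketch on the same sample data, where tab is just an illustrative name:

```python
import pandas as pd

# Same changed sample data as in the answer above
Users = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'age': [24, 53, 23, 24, 33],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'occupation': ['technician', 'technician', 'writer', 'technician', 'writer'],
})

# normalize='index' divides every row by its own row total,
# so each occupation row sums to 1
tab = pd.crosstab(Users['occupation'], Users['gender'], normalize='index')
print(tab)
```

Each row of tab sums to 1, and combinations that do not occur in the data appear as 0.0.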
