Get the percentage of each row within a group by value

Problem description

I have the following DataFrame:

import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.array([['Iza', 'Tuesday'],['Martin', 'Friday'],['John', 'Monday'],['Iza', 'Tuesday'],['Iza', 'Tuesday'],['Iza', 'Wednesday'],['Sara', 'Friday'], ['Sara', 'Friday'], ['Sara', 'Sunday'],['Silvia', 'Monday'],['Silvia', 'Wednesday'],['Paul', 'Monday'],['Paul', 'Tuesday'],['Paul', 'Wednesday']]),
                   columns=['Name', 'Day'])

df3:

    Name    Day
0   Iza     Tuesday
1   Martin  Friday
2   John    Monday
3   Iza     Tuesday
4   Iza     Tuesday
5   Iza     Wednesday
6   Sara    Friday
7   Sara    Friday
8   Sara    Sunday
9   Silvia  Monday
10  Silvia  Wednesday
11  Paul    Monday
12  Paul    Tuesday
13  Paul    Wednesday

I counted how many times each day occurs for each user:

oo = df3.groupby(['Name','Day'])['Day'].size().reset_index(name='counts')

Result:

    Name    Day      counts
0   Iza     Tuesday     3
1   Iza     Wednesday   1
2   John    Monday      1
3   Martin  Friday      1
4   Paul    Monday      1
5   Paul    Tuesday     1
6   Paul    Wednesday   1
7   Sara    Friday      2
8   Sara    Sunday      1
9   Silvia  Monday      1
10  Silvia  Wednesday   1

Then I dropped the users I am not interested in, those with records on only one distinct day:

uniq_us = oo[oo.duplicated(['Name'], keep=False)]

Result:

    Name    Day       counts
0   Iza     Tuesday     3
1   Iza     Wednesday   1
4   Paul    Monday      1
5   Paul    Tuesday     1
6   Paul    Wednesday   1
7   Sara    Friday      2
8   Sara    Sunday      1
9   Silvia  Monday      1
10  Silvia  Wednesday   1

Now I want each day's count expressed as a percentage of the total for that name:

uniq_us.groupby(['Name','Day'])['counts'].apply(lambda x: x.value_counts(normalize=True)) * 100

I get:

Name    Day         
Iza     Tuesday    3    100.0
        Wednesday  1    100.0
Paul    Monday     1    100.0
        Tuesday    1    100.0
        Wednesday  1    100.0
Sara    Friday     2    100.0
        Sunday     1    100.0
Silvia  Monday     1    100.0
        Wednesday  1    100.0
Name: counts, dtype: float64

I don't know how to compute the percentage within each name group.

Expected output:

     Name    Day         

    Iza     Tuesday    3    75.0
            Wednesday  1    25.0
    Paul    Monday     1    33.33
            Tuesday    1    33.33
            Wednesday  1    33.33
    Sara    Friday     2    66.66
            Sunday     1    33.34
    Silvia  Monday     1    50.0
            Wednesday  1    50.0
    Name: counts, dtype: float64

Tags: python, pandas, dataframe, numpy

Solution


Another option is to normalize the counts at an earlier stage:

(df3.groupby('Name')
    .Day
    .value_counts(normalize=True)
    .mul(100)
    .rename('Counts')
    .reset_index()
    .pipe(lambda x: x[x.duplicated(['Name'], keep=False)]))

#      Name        Day     Counts
#0      Iza    Tuesday  75.000000
#1      Iza  Wednesday  25.000000
#4     Paul     Monday  33.333333
#5     Paul    Tuesday  33.333333
#6     Paul  Wednesday  33.333333
#7     Sara     Friday  66.666667
#8     Sara     Sunday  33.333333
#9   Silvia     Monday  50.000000
#10  Silvia  Wednesday  50.000000
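
For comparison, here is a minimal sketch of a different route that keeps the intermediate frames from the question. The attempt in the question groups by both Name and Day, so every group holds a single row and value_counts(normalize=True) is always 1.0. Dividing each count by its per-name total avoids that. The column name percent and the variables totals and result are introduced here for illustration only; they do not appear in the original post.

```python
import pandas as pd

# Same data as in the question, built from a plain list of tuples.
rows = [
    ('Iza', 'Tuesday'), ('Martin', 'Friday'), ('John', 'Monday'),
    ('Iza', 'Tuesday'), ('Iza', 'Tuesday'), ('Iza', 'Wednesday'),
    ('Sara', 'Friday'), ('Sara', 'Friday'), ('Sara', 'Sunday'),
    ('Silvia', 'Monday'), ('Silvia', 'Wednesday'),
    ('Paul', 'Monday'), ('Paul', 'Tuesday'), ('Paul', 'Wednesday'),
]
df3 = pd.DataFrame(rows, columns=['Name', 'Day'])

# Count rows per (Name, Day), as in the question.
oo = df3.groupby(['Name', 'Day']).size().reset_index(name='counts')

# Keep only names that appear with more than one distinct day.
uniq_us = oo[oo.duplicated(['Name'], keep=False)].copy()

# Per-name total, broadcast back to every row of that name.
totals = uniq_us.groupby('Name')['counts'].transform('sum')

# 'percent' is a hypothetical column name chosen for this sketch.
uniq_us['percent'] = uniq_us['counts'] / totals * 100

result = uniq_us.set_index(['Name', 'Day'])
print(result)
```

Grouping by Name alone is what makes the percentages add up to 100 within each user. Round with result.round(2) if two decimals are enough.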
