python - Get the percentage of each row, grouped by value
Problem description
I have the following df:
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.array([['Iza', 'Tuesday'],['Martin', 'Friday'],['John', 'Monday'],['Iza', 'Tuesday'],['Iza', 'Tuesday'],['Iza', 'Wednesday'],['Sara', 'Friday'], ['Sara', 'Friday'], ['Sara', 'Sunday'],['Silvia', 'Monday'],['Silvia', 'Wednesday'],['Paul', 'Monday'],['Paul', 'Tuesday'],['Paul', 'Wednesday']]),
                   columns=['Name', 'Day'])
df3:
      Name        Day
0      Iza    Tuesday
1   Martin     Friday
2     John     Monday
3      Iza    Tuesday
4      Iza    Tuesday
5      Iza  Wednesday
6     Sara     Friday
7     Sara     Friday
8     Sara     Sunday
9   Silvia     Monday
10  Silvia  Wednesday
11    Paul     Monday
12    Paul    Tuesday
13    Paul  Wednesday
I counted how many times each day occurs for each user:
oo = df3.groupby(['Name','Day'])['Day'].size().reset_index(name='counts')
Result:
      Name        Day  counts
0      Iza    Tuesday       3
1      Iza  Wednesday       1
2     John     Monday       1
3   Martin     Friday       1
4     Paul     Monday       1
5     Paul    Tuesday       1
6     Paul  Wednesday       1
7     Sara     Friday       2
8     Sara     Sunday       1
9   Silvia     Monday       1
10  Silvia  Wednesday       1
Then I dropped the unwanted users who have records for only one day:
uniq_us = oo[oo.duplicated(['Name'], keep=False)]
Result:
      Name        Day  counts
0      Iza    Tuesday       3
1      Iza  Wednesday       1
4     Paul     Monday       1
5     Paul    Tuesday       1
6     Paul  Wednesday       1
7     Sara     Friday       2
8     Sara     Sunday       1
9   Silvia     Monday       1
10  Silvia  Wednesday       1
Now I want to get, for each name, the percentage that each day's count represents:
uniq_us.groupby(['Name','Day'])['counts'].apply(lambda x: x.value_counts(normalize=True)) * 100
What I get:
Name    Day
Iza     Tuesday    3    100.0
        Wednesday  1    100.0
Paul    Monday     1    100.0
        Tuesday    1    100.0
        Wednesday  1    100.0
Sara    Friday     2    100.0
        Sunday     1    100.0
Silvia  Monday     1    100.0
        Wednesday  1    100.0
Name: counts, dtype: float64
I don't know how to calculate the percentage within each name group.
Expected output:
Name    Day
Iza     Tuesday    3    75.0
        Wednesday  1    25.0
Paul    Monday     1    33.33
        Tuesday    1    33.33
        Wednesday  1    33.33
Sara    Friday     2    66.66
        Sunday     1    33.34
Silvia  Monday     1    50.0
        Wednesday  1    50.0
Name: counts, dtype: float64
Solution
Another option is to normalize the counts at an early stage, before filtering out the single-day users:
(df3.groupby('Name')
    .Day
    .value_counts(normalize=True)
    .mul(100)
    .rename('Counts')
    .reset_index()
    .pipe(lambda x: x[x.duplicated(['Name'], keep=False)]))
#      Name        Day     Counts
#0      Iza    Tuesday  75.000000
#1      Iza  Wednesday  25.000000
#4     Paul     Monday  33.333333
#5     Paul    Tuesday  33.333333
#6     Paul  Wednesday  33.333333
#7     Sara     Friday  66.666667
#8     Sara     Sunday  33.333333
#9   Silvia     Monday  50.000000
#10  Silvia  Wednesday  50.000000
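The attempt in the question returns 100.0 everywhere because grouping by both Name and Day leaves exactly one row in each group. If you would rather keep the already-filtered uniq_us frame from the question, one possible sketch is to divide each count by the per-Name total obtained with groupby(...).transform('sum'). The column name percent and the rounding to two decimals are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

df3 = pd.DataFrame(
    [['Iza', 'Tuesday'], ['Martin', 'Friday'], ['John', 'Monday'],
     ['Iza', 'Tuesday'], ['Iza', 'Tuesday'], ['Iza', 'Wednesday'],
     ['Sara', 'Friday'], ['Sara', 'Friday'], ['Sara', 'Sunday'],
     ['Silvia', 'Monday'], ['Silvia', 'Wednesday'],
     ['Paul', 'Monday'], ['Paul', 'Tuesday'], ['Paul', 'Wednesday']],
    columns=['Name', 'Day'])

# Same two steps as in the question: count each day per user, then keep
# only the users that appear with more than one distinct day.
oo = df3.groupby(['Name', 'Day']).size().reset_index(name='counts')
uniq_us = oo[oo.duplicated(['Name'], keep=False)].copy()

# Total count per Name, broadcast back onto every row of that Name.
totals = uniq_us.groupby('Name')['counts'].transform('sum')

# 'percent' is a hypothetical column name chosen for this sketch.
uniq_us['percent'] = (uniq_us['counts'] / totals * 100).round(2)

print(uniq_us)
```

Grouping by Name alone is what makes the denominator the per-user total; for Iza this gives 75.0 for Tuesday and 25.0 for Wednesday. The .copy() call is there so that adding a column does not trigger a SettingWithCopyWarning on the filtered frame.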