python - Nested "All" subtotal rows with Pandas pivot_table
Question
I have some long-format data that looks like this (see below for code to recreate it):
>>> df
section subsection name topic score
0 A W zwphf a 0.802427
1 A W jcyyc a 0.404077
2 A W kucem a 0.367319
3 A X ldbxz a 0.554260
4 A X vkcqh a 0.265864
5 A X cvksn a 0.548099
6 B Y spghx a 0.472612
7 B Y cqokn a 0.577504
8 B Y wjsxg a 0.815309
9 B Z holoo a 0.459850
10 B Z lnihf a 0.667877
11 B Z wirhq a 0.138879
12 A W zwphf b 0.673711
13 A W jcyyc b 0.507962
14 A W kucem b 0.546055
15 A X ldbxz b 0.148214
16 A X vkcqh b 0.773320
17 A X cvksn b 0.791990
18 B Y spghx b 0.487480
19 B Y cqokn b 0.252534
20 B Y wjsxg b 0.237767
21 B Z holoo b 0.432981
22 B Z lnihf b 0.317932
23 B Z wirhq b 0.614401
I would like to do a groupby on section + subsection + name and unstack on topic, but also show intermediate, nested "All" subtotal rows:
>>> result
section subsection name a b
0 A All All 0.490341 0.573542
1 A W All 0.524608 0.575909
2 A W jcyyc 0.404077 0.507962
3 A W kucem 0.367319 0.546055
4 A W zwphf 0.802427 0.673711
5 A X All 0.456074 0.571174
6 A X cvksn 0.548099 0.791990
7 A X ldbxz 0.554260 0.148214
8 A X vkcqh 0.265864 0.773320
9 B All All 0.522005 0.390516
10 B Y All 0.621808 0.325927
11 B Y cqokn 0.577504 0.252534
12 B Y spghx 0.472612 0.487480
13 B Y wjsxg 0.815309 0.237767
14 B Z All 0.422202 0.455104
15 B Z holoo 0.459850 0.432981
16 B Z lnihf 0.667877 0.317932
17 B Z wirhq 0.138879 0.614401
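The new rows are the ones that contain All in the subsection or name column.
The initial groupby on its own, without subtotals, looks like this: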
>>> df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
topic a b
section subsection name
A W jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B Y cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z holoo 0.459850 0.432981
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
But I am not sure how to use margins to get subtotals for the groupby operations ['section', 'topic'] and ['section', 'subsection', 'topic'].
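For context, here is a minimal sketch of what margins=True does on its own in pivot_table. The frame demo and its values are made up for illustration and are not the question's data. As far as I can tell, margins only appends a single grand-total row (plus an All column), which is why it does not directly give the nested subtotals asked for here.

```python
import pandas as pd

# Tiny stand-in for the question's data (values are made up for illustration).
demo = pd.DataFrame({
    'section':    ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'subsection': ['W', 'W', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'name':       ['n1', 'n1', 'n2', 'n2', 'n3', 'n3', 'n4', 'n4'],
    'topic':      ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
    'score':      [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

table = demo.pivot_table(index=['section', 'subsection', 'name'],
                         columns='topic', values='score',
                         aggfunc='mean', margins=True)

# Count how many rows carry the 'All' label in the first index level:
# margins adds one grand total, not one subtotal per section or subsection.
first_level = table.index.get_level_values(0)
print(table)
print((first_level == 'All').sum())
```

The four detail rows are followed by one All row, so the per-section and per-subsection subtotals still have to be built separately.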
To recreate df:
import pandas as pd
data = [['A', 'W', 'zwphf', 'a', 0.80242702],
['A', 'W', 'jcyyc', 'a', 0.40407741],
['A', 'W', 'kucem', 'a', 0.36731944],
['A', 'X', 'ldbxz', 'a', 0.55426007],
['A', 'X', 'vkcqh', 'a', 0.26586396],
['A', 'X', 'cvksn', 'a', 0.54809939],
['B', 'Y', 'spghx', 'a', 0.47261223],
['B', 'Y', 'cqokn', 'a', 0.57750357],
['B', 'Y', 'wjsxg', 'a', 0.81530899],
['B', 'Z', 'holoo', 'a', 0.45985020],
['B', 'Z', 'lnihf', 'a', 0.66787651],
['B', 'Z', 'wirhq', 'a', 0.13887864],
['A', 'W', 'zwphf', 'b', 0.67371101],
['A', 'W', 'jcyyc', 'b', 0.50796174],
['A', 'W', 'kucem', 'b', 0.54605544],
['A', 'X', 'ldbxz', 'b', 0.14821402],
['A', 'X', 'vkcqh', 'b', 0.77331968],
['A', 'X', 'cvksn', 'b', 0.79198960],
['B', 'Y', 'spghx', 'b', 0.48747995],
['B', 'Y', 'cqokn', 'b', 0.25253355],
['B', 'Y', 'wjsxg', 'b', 0.23776694],
['B', 'Z', 'holoo', 'b', 0.43298050],
['B', 'Z', 'lnihf', 'b', 0.31793156],
['B', 'Z', 'wirhq', 'b', 0.61440056]]
df = pd.DataFrame(data,
columns=['section', 'subsection', 'name', 'topic', 'score'])
To recreate the expected result:
import numpy as np
result = np.array([['A', 'All', 'All', 0.490341219, 0.573541919],
['A', 'W', 'All', 0.52460796, 0.5759094],
['A', 'W', 'jcyyc', 0.404077415, 0.5079617479999999],
['A', 'W', 'kucem', 0.36731944, 0.546055442],
['A', 'W', 'zwphf', 0.8024270240000001, 0.673711011],
['A', 'X', 'All', 0.45607447700000003, 0.571174437],
['A', 'X', 'cvksn', 0.548099391, 0.791989603],
['A', 'X', 'ldbxz', 0.554260074, 0.148214029],
['A', 'X', 'vkcqh', 0.265863967, 0.77331968],
['B', 'All', 'All', 0.5220050279999999, 0.390515513],
['B', 'Y', 'All', 0.621808268, 0.325926816],
['B', 'Y', 'cqokn', 0.577503576, 0.252533557],
['B', 'Y', 'spghx', 0.472612233, 0.487479951],
['B', 'Y', 'wjsxg', 0.815308995, 0.237766941],
['B', 'Z', 'All', 0.42220178799999997, 0.455104209],
['B', 'Z', 'holoo', 0.459850205, 0.43298050200000004],
['B', 'Z', 'lnihf', 0.667876511, 0.317931565],
['B', 'Z', 'wirhq', 0.13887864800000002, 0.61440056]], dtype=object)
result = pd.DataFrame(result, columns=['section', 'subsection', 'name', 'a', 'b'])
Solution
You need:
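# Detail rows: one row per (section, subsection, name), one column per topic.
s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
# DataFrame.mean(level=...) was removed in pandas 2.0; groupby(level=...).mean() is the equivalent.
# Section-level subtotals, labelled 'All' in the two lower levels.
s1 = (s.groupby(level=0).mean()
       .assign(subsection='All', name='All')
       .set_index(['subsection', 'name'], append=True))
# Subsection-level subtotals, labelled 'All' in the name level.
s2 = (s.groupby(level=[0, 1]).mean()
       .assign(name='All')
       .set_index(['name'], append=True))
s = pd.concat([s, s1, s2]).sort_index()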
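However, the approach above averages the per-name means (a mean of means), which equals the true group mean only when every name contributes the same number of rows. If that is not guaranteed, or you are not sure, it is better to compute the subtotals directly from the original data: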
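# Rebuild the detail rows first so the subtotal rows are not appended twice.
s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
s1 = (df.groupby(['section', 'topic'])['score'].mean().unstack('topic')
        .assign(subsection='All', name='All').set_index(['subsection', 'name'], append=True))
s2 = (df.groupby(['section', 'subsection', 'topic'])['score'].mean().unstack('topic')
        .assign(name='All').set_index(['name'], append=True))
s = pd.concat([s, s1, s2]).sort_index()
print(s)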
topic a b
section subsection name
A All All 0.490341 0.573542
W All 0.524608 0.575909
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X All 0.456074 0.571174
cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B All All 0.522005 0.390516
Y All 0.621808 0.325927
cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z All 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
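The mean-of-means caveat mentioned above does not show up in the question's data, because every name has exactly one score per topic. A small sketch with a hypothetical unbalanced frame (the names demo, per_name and the values are made up) shows where the two approaches diverge:

```python
import pandas as pd

# Hypothetical unbalanced data: name 'x' has one row, name 'y' has three.
demo = pd.DataFrame({
    'section': ['A', 'A', 'A', 'A'],
    'name':    ['x', 'y', 'y', 'y'],
    'score':   [1.0, 0.0, 0.0, 0.0],
})

# Mean per name first, then the mean of those two means.
per_name = demo.groupby(['section', 'name'])['score'].mean()
mean_of_means = per_name.groupby(level='section').mean()

# Mean taken directly over all four rows of the section.
direct_mean = demo.groupby('section')['score'].mean()

print(mean_of_means['A'], direct_mean['A'])  # 0.5 0.25
```

Which of the two numbers is the "right" subtotal depends on whether each name or each row should carry equal weight.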
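Edit:
If a specific row order is needed (for example when the subtotal label is tot instead of All, which would no longer sort first alphabetically), use ordered categoricals: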
# Put 'tot' first, followed by the values in their order of appearance.
cat1 = ['tot'] + df['subsection'].unique().tolist()
cat2 = ['tot'] + df['name'].unique().tolist()
df['subsection'] = pd.Categorical(df['subsection'], categories=cat1, ordered=True)
df['name'] = pd.Categorical(df['name'], categories=cat2, ordered=True)
# observed=True keeps only the combinations that occur in the data; without it,
# categorical groupers can expand to every combination of categories.
s = (df.groupby(['section', 'subsection', 'name', 'topic'], observed=True)['score'].mean()
       .unstack('topic'))
s1 = (df.groupby(['section', 'topic'])['score'].mean()
        .unstack('topic')
        .assign(subsection='tot', name='tot')
        .set_index(['subsection', 'name'], append=True))
s2 = (df.groupby(['section', 'subsection', 'topic'], observed=True)['score'].mean()
        .unstack('topic')
        .assign(name='tot')
        .set_index(['name'], append=True))
s = pd.concat([s, s1, s2]).sort_index()
print(s)
topic a b
section subsection name
A tot tot 0.490341 0.573542
W tot 0.524608 0.575909
zwphf 0.802427 0.673711
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
X tot 0.456074 0.571174
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
cvksn 0.548099 0.791990
B tot tot 0.522005 0.390516
Y tot 0.621808 0.325927
spghx 0.472612 0.487480
cqokn 0.577504 0.252534
wjsxg 0.815309 0.237767
Z tot 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
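The pattern used in this answer (one extra groupby per level of nesting, with the aggregated-away levels filled in by a label) can be wrapped in a small function. This is only a sketch under stated assumptions: with_subtotals and its parameters are hypothetical names, not a pandas API. It computes the subtotals directly from the rows and relies on the label sorting before the other values, as All does with the data shown here.

```python
import pandas as pd

def with_subtotals(df, keys, columns, values, label='All'):
    """Hypothetical helper: detail rows plus a mean subtotal for every key prefix."""
    def table(by):
        # Mean of `values` per group, with one column per value of `columns`.
        return df.groupby(by + [columns])[values].mean().unstack(columns)

    parts = [table(keys)]
    for depth in range(1, len(keys)):
        sub = table(keys[:depth])
        # Fill the levels that were aggregated away with the subtotal label.
        filler = {k: label for k in keys[depth:]}
        sub = sub.assign(**filler).set_index(keys[depth:], append=True)
        parts.append(sub)
    return pd.concat(parts).sort_index()

# Made-up demo data: one section, two subsections, three names, two topics.
demo = pd.DataFrame(
    [('A', 'W', 'p', 'a', 1.0), ('A', 'W', 'p', 'b', 2.0),
     ('A', 'W', 'q', 'a', 3.0), ('A', 'W', 'q', 'b', 4.0),
     ('A', 'X', 'r', 'a', 5.0), ('A', 'X', 'r', 'b', 6.0)],
    columns=['section', 'subsection', 'name', 'topic', 'score'])

out = with_subtotals(demo, ['section', 'subsection', 'name'], 'topic', 'score')
print(out)
```

With the question's frame, the call would be with_subtotals(df, ['section', 'subsection', 'name'], 'topic', 'score'), assuming the key columns are still plain strings rather than categoricals.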