pandas - Rows calculated from total and subtotal rows
Problem description
How can we find the top n rows that together contribute 80% of the Total?
Item Number Item Amount State
1 Agriculture, forestry and fishing 308507 Oregon
--
10 Gross State Domestic Product
More data is available in a file on gdrive:
https://drive.google.com/open?id=10l84MVcIDIwyWyKa_ftEYrNDwB0C3HWS
I have the following code to calculate the top n contributing rows:
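# cols is assumed to be defined earlier (not shown): the value columns of result_pivot
for col in cols:
    # express each column as a percentage of the Gross State Domestic Product
    PercentageCol = col + ' %'
    result_pivot[PercentageCol] = round((result_pivot[col] / result_pivot['Gross State Domestic Product']) * 100, 2)
# keep only the percentage columns and transpose so each of them becomes a row
cols = result_pivot.columns[result_pivot.columns.str.contains('%')]
result_pivot = result_pivot[cols].T
# average across the columns, then keep the 5 largest averages
result_pivot['Avg'] = round(result_pivot.mean(axis=1), 2)
result_pivot = result_pivot.sort_values(['Avg'], ascending=False)
result_pivot = result_pivot.nlargest(5, columns='Avg')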
Is there another way to do this?
Solution
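If I understand correctly, there are two cases: without and with sorting.
1. Without descending sort:
We can compute each row's share as Amount / sum of Amount. We then take the cumulative sum of these shares and use boolean indexing to keep all rows whose cumulative share is less than 0.8, which is 80%: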
df_80 = df[(df['Amount'] / df['Amount'].sum()).cumsum() < 0.8]
print(df_80)
Item Number Item \
0 1.0 Agriculture, forestry and fishing
1 1.1 Crops
2 1.2 Livestock
3 1.3 Forestry and logging
4 1.4 Fishing and aquaculture
5 2.0 Mining and quarrying
6 3.0 Manufacturing
7 4.0 Electricity, gas, water supply & other utility...
8 5.0 Construction
9 6.0 Trade, repair, hotels and restaurants
10 6.1 Trade & repair services
11 6.2 Hotels & restaurants
12 7.0 Transport, storage, communication & services r...
13 7.1 Railways
14 7.2 Road transport
15 7.3 Water transport
16 7.4 Air transport
17 7.5 Services incidental to transport
18 7.6 Storage
19 7.7 Communication & services related to broadcasting
20 8.0 Financial services
21 9.0 Real estate, ownership of dwelling & professio...
22 10.0 Public administration
23 11.0 Other services
24 12.0 TOTAL GSVA at basic prices
25 13.0 Taxes on Products
26 14.0 Subsidies on products
27 15.0 Gross State Domestic Product
28 16.0 Population ('00)
29 17.0 Per Capita GSDP (Rs.)
.. ... ...
54 12.0 TOTAL GSVA at basic prices
55 13.0 Taxes on Products
56 14.0 Subsidies on products
57 15.0 Gross State Domestic Product
58 16.0 Population ('00)
59 17.0 Per Capita GSDP (Rs.)
60 1.0 Agriculture, forestry and fishing
61 1.1 Crops
62 1.2 Livestock
63 1.3 Forestry and logging
64 1.4 Fishing and aquaculture
65 2.0 Mining and quarrying
66 3.0 Manufacturing
67 4.0 Electricity, gas, water supply & other utility...
68 5.0 Construction
69 6.0 Trade, repair, hotels and restaurants
70 6.1 Trade & repair services*
71 6.2 Hotels & restaurants
72 7.0 Transport, storage, communication & services r...
73 7.1 Railways
74 7.2 Road transport**
75 7.3 Water transport
76 7.4 Air transport
77 7.5 Services incidental to transport
78 7.6 Storage
79 7.7 Communication & services related to broadcasting
80 8.0 Financial services
81 9.0 Real estate, ownership of dwelling & professio...
82 10.0 Public administration
83 11.0 Other services
Amount State
0 308507.0 Oregon
1 140421.0 Oregon
2 30141.0 Oregon
3 15744.0 Oregon
4 122201.0 Oregon
5 3622.0 Oregon
6 1177608.0 Oregon
7 204110.0 Oregon
8 165819.0 Oregon
9 380927.0 Oregon
10 343492.0 Oregon
11 37434.0 Oregon
12 189656.0 Oregon
13 15649.0 Oregon
14 46171.0 Oregon
15 17820.0 Oregon
16 46359.0 Oregon
17 19272.0 Oregon
18 357.0 Oregon
19 44028.0 Oregon
20 233618.0 Oregon
21 407099.0 Oregon
22 346486.0 Oregon
23 180431.0 Oregon
24 3597882.0 Oregon
25 527279.0 Oregon
26 61854.0 Oregon
27 4063307.0 Oregon
28 14950.0 Oregon
29 271793.0 Oregon
.. ... ...
54 39828404.0 Washington
55 4985670.0 Washington
56 1067867.0 Washington
57 43746207.0 Washington
58 266620.0 Washington
59 164077.0 Washington
60 5930617.0 Idaho
61 3070386.0 Idaho
62 1656104.0 Idaho
63 499808.0 Idaho
64 704319.0 Idaho
65 558824.0 Idaho
66 4273567.0 Idaho
67 482470.0 Idaho
68 7314003.0 Idaho
69 8557345.0 Idaho
70 7763847.0 Idaho
71 793498.0 Idaho
72 4020934.0 Idaho
73 147897.0 Idaho
74 2761427.0 Idaho
75 26956.0 Idaho
76 125029.0 Idaho
77 71567.0 Idaho
78 3290.0 Idaho
79 884767.0 Idaho
80 2010306.0 Idaho
81 7287633.0 Idaho
82 2068915.0 Idaho
83 5728645.0 Idaho
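To make the effect of the cumulative-share filter easier to see, here is a minimal sketch on made-up data. The four-row frame and its values are hypothetical and not taken from the original file. Note that the strict `< 0.8` comparison drops the row that actually crosses the 80% mark; the `shift` variant below is one possible way to keep it, offered as an assumption about what may be wanted rather than as part of the original answer.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original file.
df = pd.DataFrame({
    "Item": ["A", "B", "C", "D"],
    "Amount": [40.0, 30.0, 20.0, 10.0],
})

# Share of each row in the grand total, then the running total of those shares.
share = df["Amount"] / df["Amount"].sum()   # 0.4, 0.3, 0.2, 0.1
running = share.cumsum()                    # 0.4, 0.7, 0.9, 1.0

# Same filter as in the answer: rows whose running share is still below 80%.
df_80 = df[running < 0.8]

# Variant (an assumption, not from the answer): compare the running share
# *before* each row, so the row that crosses 80% is kept as well.
df_80_incl = df[running.shift(fill_value=0.0) < 0.8]

print(df_80["Item"].tolist())
print(df_80_incl["Item"].tolist())
```

With these values the first filter keeps A and B (70% in total), while the shifted variant also keeps C, the row that takes the running share past 80%.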
2. With descending sort:
df_80 = df[(df['Amount'] / df['Amount'].sum()).sort_values(ascending=False).cumsum() < 0.8]
print(df_80)
Item Number Item \
30 1.0 Agriculture, forestry and fishing
36 3.0 Manufacturing
39 6.0 Trade, repair, hotels and restaurants
51 9.0 Real estate, ownership of dwelling & professio...
54 12.0 TOTAL GSVA at basic prices
55 13.0 Taxes on Products
57 15.0 Gross State Domestic Product
60 1.0 Agriculture, forestry and fishing
68 5.0 Construction
69 6.0 Trade, repair, hotels and restaurants
70 6.1 Trade & repair services*
81 9.0 Real estate, ownership of dwelling & professio...
83 11.0 Other services
84 12.0 TOTAL GSVA at basic prices
85 13.0 Taxes on Products
87 15.0 Gross State Domestic Product
Amount State
30 8015238.0 Washington
36 7756921.0 Washington
39 4986319.0 Washington
51 6970183.0 Washington
54 39828404.0 Washington
55 4985670.0 Washington
57 43746207.0 Washington
60 5930617.0 Idaho
68 7314003.0 Idaho
69 8557345.0 Idaho
70 7763847.0 Idaho
81 7287633.0 Idaho
83 5728645.0 Idaho
84 48233259.0 Idaho
85 5189352.0 Idaho
87 52600230.0 Idaho
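Both results above still contain aggregate rows such as TOTAL GSVA at basic prices and Gross State Domestic Product, and the shares are computed against the sum over all states. If the goal is the top contributors to each state's own total, one possible refinement is sketched below. The toy frame, the helper name `top_contributors`, and the choice of Gross State Domestic Product as the denominator are assumptions for illustration; in the real file, subtotal rows (for example 6.0 alongside 6.1 and 6.2) would also need to be excluded, which this sketch does not attempt.

```python
import pandas as pd

# Hypothetical toy data: item names mirror the question, amounts are made up.
df = pd.DataFrame({
    "Item": ["Crops", "Manufacturing", "Construction", "Gross State Domestic Product",
             "Crops", "Manufacturing", "Construction", "Gross State Domestic Product"],
    "Amount": [10.0, 60.0, 30.0, 100.0, 50.0, 15.0, 35.0, 100.0],
    "State": ["Oregon"] * 4 + ["Idaho"] * 4,
})

TOTAL_ITEM = "Gross State Domestic Product"  # assumed to be the row holding the total


def top_contributors(group, threshold=0.8):
    """Return the largest rows of one state that together reach `threshold` of its total."""
    total = group.loc[group["Item"] == TOTAL_ITEM, "Amount"].iloc[0]
    # Drop the total row itself, then rank the remaining rows by size.
    parts = group[group["Item"] != TOTAL_ITEM].sort_values("Amount", ascending=False)
    share = parts["Amount"] / total
    # Running share before each row, so the row that crosses the threshold is kept.
    reached_before = share.cumsum().shift(fill_value=0.0)
    return parts[reached_before < threshold]


# sort=False keeps the states in their order of first appearance.
result = pd.concat([top_contributors(g) for _, g in df.groupby("State", sort=False)])
print(result)
```

Sorting the rows before taking the cumulative sum, rather than sorting only the mask, also avoids relying on index alignment when the boolean Series is applied to the frame.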