python - 向熊猫数据框添加多索引,这是相同数据框值的总和
问题描述
我有一个df
:
df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29 00:00:00'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29 00:00:00'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])
我正在尝试创建一个级别1
索引combined
,级别2
索引将是当前索引级别的名称,2
category
但没有total
'Amazon',
'Apple',
'Facebook',
'Google',
'Netflix',
'Tesla',
'Uber'
这将是每个级别索引的总和,1
不包括列的级别索引。基本上得到所有不包括, per的总数。group
category
all
group
1
sales
groups
all
sum
category
是否也可以编写索引的group
名称combined
以考虑在内,这样我就可以对combined
categories
for selectedgroups
而不是每个group
exclude求和all
?
我试过了:
c = df.reset_index()
c[(c.group.isin(['A','B']))& (c.category.isin(['Amazon','Apple','Facebook', 'Google', 'Netflix', 'Tesla', 'Uber']))].loc[:,(slice(None),'sales')].sum()
但后来我意识到这不是分组的,category
所以我不知道如何继续。
预期输出示例(数据不一致):
2020-06-29 00:00:00
last_sales sales difference
group category
combined Amazon 195.000 1,268.850 0.000
Apple 61.000 18,274.385 0.000
Facebook 106.000 19,722.650 0.000
Google 61.000 55,547.255 0.000
Netflix 37.000 15,323.800 0.000
Tesla 13.000 1,688.675 0.000
Uber 4.000 1,906.000 0.000
A Amazon 50.000 3,219.650 0.000
Apple 50.000 15,852.060 0.000
Facebook 75.000 17,743.700 0.000
Google 43.000 37,795.150 0.000
Netflix 17.000 5,918.500 0.000
Tesla 14.000 1,708.750 0.000
Total 504.000 166,349.640 0.000
Uber 3.000 937.010 0.000
total 252.000 83,174.820 0.000
B Amazon 50.000 3,219.650 0.000
Apple 50.000 15,852.060 0.000
Facebook 75.000 17,743.700 0.000
Google 43.000 37,795.150 0.000
Netflix 17.000 5,918.500 0.000
Tesla 14.000 1,708.750 0.000
Total 504.000 166,349.640 0.000
Uber 3.000 937.010 0.000
total 252.000 83,174.820 0.000
all Total 2,916.000 787,625.740 0.000
total 2,916.000 787,625.740 0.000
解决方案
重申我之前解决方案的想法,我们可以通过以下方式解决这个问题
s = df.loc[['A', 'B']].drop(['total', 'Total'], level=1).sum(level=1)
s.index = pd.MultiIndex.from_product([['combined'], s.index])
df_out = s.append(df)
结果
print(df_out)
2020-06-29 00:00:00 2020-07-06 00:00:00 2021-06-28 00:00:00 2021-07-07 00:00:00
last_sales sales difference last_sales sales difference last_sales sales difference last_sales sales difference
category
combined Amazon 245.0 4488.500 0.0 115.0 12293.070 0.0 197.0 11217.250 0.0 126.0 13156.960 0.0
Apple 111.0 34126.445 0.0 73.0 24915.610 0.0 78.0 26475.820 0.0 83.0 26194.970 0.0
Facebook 181.0 37466.350 0.0 212.0 47132.250 0.0 226.0 56310.420 0.0 230.0 49623.510 0.0
Google 104.0 93342.405 0.0 115.0 95137.670 0.0 84.0 66796.565 0.0 101.0 78340.175 0.0
Netflix 54.0 21242.300 0.0 31.0 11242.995 0.0 47.0 16210.250 0.0 22.0 7910.120 0.0
Tesla 27.0 3397.425 0.0 30.0 4548.300 0.0 21.0 1894.610 0.0 17.0 2225.950 0.0
Uber 7.0 2843.010 0.0 8.0 3028.805 0.0 10.0 4020.450 0.0 3.0 1984.650 0.0
A Amazon 195.0 1268.850 0.0 26.0 3978.150 0.0 96.0 5194.950 0.0 45.0 7556.415 0.0
Apple 61.0 18274.385 0.0 39.0 12138.960 0.0 56.0 19102.220 0.0 47.0 14985.050 0.0
Facebook 106.0 19722.650 0.0 79.0 19084.175 0.0 106.0 22796.420 0.0 87.0 16790.900 0.0
Google 61.0 55547.255 0.0 49.0 40033.460 0.0 44.0 30853.115 0.0 45.0 36202.730 0.0
Netflix 37.0 15323.800 0.0 10.0 4280.150 0.0 34.0 11461.250 0.0 13.0 4024.970 0.0
Tesla 13.0 1688.675 0.0 10.0 1495.100 0.0 13.0 992.600 0.0 8.0 1034.450 0.0
Total 954.0 227463.230 0.0 436.0 165548.290 0.0 716.0 188143.410 0.0 494.0 163960.330 0.0
Uber 4.0 1906.000 0.0 5.0 1764.150 0.0 9.0 3671.150 0.0 2.0 1385.650 0.0
total 477.0 113731.615 0.0 218.0 82774.145 0.0 358.0 94071.705 0.0 247.0 81980.165 0.0
B Amazon 50.0 3219.650 0.0 89.0 8314.920 0.0 101.0 6022.300 0.0 81.0 5600.545 0.0
Apple 50.0 15852.060 0.0 34.0 12776.650 0.0 22.0 7373.600 0.0 36.0 11209.920 0.0
Facebook 75.0 17743.700 0.0 133.0 28048.075 0.0 120.0 33514.000 0.0 143.0 32832.610 0.0
Google 43.0 37795.150 0.0 66.0 55104.210 0.0 40.0 35943.450 0.0 56.0 42137.445 0.0
Netflix 17.0 5918.500 0.0 21.0 6962.845 0.0 13.0 4749.000 0.0 9.0 3885.150 0.0
Tesla 14.0 1708.750 0.0 20.0 3053.200 0.0 8.0 902.010 0.0 9.0 1191.500 0.0
Total 504.0 166349.640 0.0 732.0 231049.110 0.0 610.0 177707.320 0.0 670.0 194912.340 0.0
Uber 3.0 937.010 0.0 3.0 1264.655 0.0 1.0 349.300 0.0 1.0 599.000 0.0
total 252.0 83174.820 0.0 366.0 115524.555 0.0 305.0 88853.660 0.0 335.0 97456.170 0.0
all Total 2916.0 787625.740 0.0 2336.0 793194.800 0.0 2652.0 731701.460 0.0 2328.0 717745.340 0.0
total 2916.0 787625.740 0.0 2336.0 793194.800 0.0 2652.0 731701.460 0.0 2328.0 717745.340 0.0
推荐阅读
- php - Foreach 循环只处理一个元素
- python - 按月查找每个组中最早的记录
- python - 打开文本文件时出现“OSError:[Errno 36] 文件名太长”
- apache - 带有问号的 URL 重定向
- javascript - 如何用javascript给某个td一个类名?
- javascript - 具有相同状态名称的多个组件,在父div上单击时更改一个而不影响其他人?
- hbase - 带空格的 Hbase 行键
- python - BigTable - 删除/跳过前 N 行
- powershell - 通过 Powershell 获取具有特定域的 Office 365 用户
- unity3d - 跳跃时重力不影响玩家