python - Adding a total for each level-2 index in a MultiIndex pandas DataFrame
Problem
I have a DataFrame:
import numpy as np
import pandas as pd

df_full = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'B',
8: 'B',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B'},
('category', ''): {0: 'Books',
1: 'Candy',
2: 'Pencil',
3: 'Table',
4: 'PC',
5: 'Printer',
6: 'Lamp',
7: 'Books',
8: 'Candy',
9: 'Pencil',
10: 'Table',
11: 'PC',
12: 'Printer',
13: 'Lamp'},
(pd.Timestamp('2021-06-28 00:00:00'),
'Sales_1'): {0: 9.937449997200002, 1: 30.71300000639998, 2: 58.81199999639999, 3: 25.661999978399994, 4: 3.657999996, 5: 12.0879999972, 6: 61.16600000040001, 7: 6.319439989199998, 8: 12.333119997600003, 9: 24.0544100028, 10: 24.384659998799997, 11: 1.9992000012000002, 12: 0.324, 13: 40.69122000000001},
(pd.Timestamp('2021-06-28 00:00:00'),
'Sales_2'): {0: 21.890370397789923, 1: 28.300470581874837, 2: 53.52039700062155, 3: 52.425508769690694, 4: 6.384936971649232, 5: 6.807138946302334, 6: 52.172, 7: 5.916852561, 8: 5.810764652, 9: 12.1243325, 10: 17.88071596, 11: 0.913782413, 12: 0.869207661, 13: 20.9447844},
(pd.Timestamp('2021-06-28 00:00:00'), 'last_week_sales'): {0: np.nan,
1: np.nan,
2: np.nan,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: np.nan},
(pd.Timestamp('2021-06-28 00:00:00'), 'total_orders'): {0: 86.0,
1: 66.0,
2: 188.0,
3: 556.0,
4: 12.0,
5: 4.0,
6: 56.0,
7: 90.0,
8: 26.0,
9: 49.0,
10: 250.0,
11: 7.0,
12: 2.0,
13: 44.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'total_sales'): {0: 4390.11,
1: 24825.059999999998,
2: 48592.39999999998,
3: 60629.77,
4: 831.22,
5: 1545.71,
6: 34584.99,
7: 5641.54,
8: 6798.75,
9: 13290.13,
10: 42692.68000000001,
11: 947.65,
12: 329.0,
13: 29889.65},
(pd.Timestamp('2021-07-05 00:00:00'),
'Sales_1'): {0: 13.690399997999998, 1: 38.723000005199985, 2: 72.4443400032, 3: 36.75802000560001, 4: 5.691999996, 5: 7.206999998399999, 6: 66.55265999039996, 7: 6.4613199911999954, 8: 12.845630001599998, 9: 26.032340003999998, 10: 30.1634600016, 11: 1.0203399996, 12: 1.4089999991999997, 13: 43.67116000320002},
(pd.Timestamp('2021-07-05 00:00:00'),
'Sales_2'): {0: 22.874363860953647, 1: 29.5726042895728, 2: 55.926190956481534, 3: 54.7820864335212, 4: 6.671946105284065, 5: 7.113126469779095, 6: 54.517, 7: 6.194107518, 8: 6.083562133, 9: 12.69221484, 10: 18.71872129, 11: 0.956574175, 12: 0.910216433, 13: 21.92632044},
(pd.Timestamp('2021-07-05 00:00:00'), 'last_week_sales'): {0: 4390.11,
1: 24825.059999999998,
2: 48592.39999999998,
3: 60629.77,
4: 831.22,
5: 1545.71,
6: 34584.99,
7: 5641.54,
8: 6798.75,
9: 13290.13,
10: 42692.68000000001,
11: 947.65,
12: 329.0,
13: 29889.65},
(pd.Timestamp('2021-07-05 00:00:00'), 'total_orders'): {0: 109.0,
1: 48.0,
2: 174.0,
3: 587.0,
4: 13.0,
5: 5.0,
6: 43.0,
7: 62.0,
8: 13.0,
9: 37.0,
10: 196.0,
11: 8.0,
12: 1.0,
13: 33.0},
(pd.Timestamp('2021-07-05 00:00:00'), 'total_sales'): {0: 3453.02,
1: 17868.730000000003,
2: 44707.82999999999,
3: 60558.97999999999,
4: 1261.0,
5: 1914.6000000000001,
6: 24146.09,
7: 6201.489999999999,
8: 5513.960000000001,
9: 9645.87,
10: 25086.785,
11: 663.0,
12: 448.61,
13: 26332.7}}).set_index(['group','category'])
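I am trying to get a total of each column within each group. In this df example, that means adding 2 rows below Lamp (one per group) that hold the column totals. In the original post, red lines in an image marked the desired position of the totals.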
What I tried:
df_out['total'] = df_out.sum(level=1).loc[:, (slice(None), 'total_sales')]
But I get:
ValueError: Wrong number of items passed 4, placement implies 1
I also checked this question, but could not apply it to my own case.
Solution
Let's try groupby on level=0:
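# Sum every column within each level-0 group (A, B)
s = df_full.groupby(level=0).sum()
# Label each summed row as (group, 'Total') and keep the index level names
s.index = pd.MultiIndex.from_product([s.index, ['Total']], names=df_full.index.names)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_out = pd.concat([df_full, s]).sort_index()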
print(df_out)
2021-06-28 00:00:00 2021-07-05 00:00:00
Sales_1 Sales_2 last_week_sales total_orders total_sales Sales_1 Sales_2 last_week_sales total_orders total_sales
group category
A Books 9.93745 21.890370 NaN 86.0 4390.11 13.69040 22.874364 4390.11 109.0 3453.020
Candy 30.71300 28.300471 NaN 66.0 24825.06 38.72300 29.572604 24825.06 48.0 17868.730
Lamp 61.16600 52.172000 NaN 56.0 34584.99 66.55266 54.517000 34584.99 43.0 24146.090
PC 3.65800 6.384937 NaN 12.0 831.22 5.69200 6.671946 831.22 13.0 1261.000
Pencil 58.81200 53.520397 NaN 188.0 48592.40 72.44434 55.926191 48592.40 174.0 44707.830
Printer 12.08800 6.807139 NaN 4.0 1545.71 7.20700 7.113126 1545.71 5.0 1914.600
Table 25.66200 52.425509 NaN 556.0 60629.77 36.75802 54.782086 60629.77 587.0 60558.980
Total 202.03645 221.500823 0.0 968.0 175399.26 241.06742 231.457318 175399.26 979.0 153910.250
B Books 6.31944 5.916853 NaN 90.0 5641.54 6.46132 6.194108 5641.54 62.0 6201.490
Candy 12.33312 5.810765 NaN 26.0 6798.75 12.84563 6.083562 6798.75 13.0 5513.960
Lamp 40.69122 20.944784 NaN 44.0 29889.65 43.67116 21.926320 29889.65 33.0 26332.700
PC 1.99920 0.913782 NaN 7.0 947.65 1.02034 0.956574 947.65 8.0 663.000
Pencil 24.05441 12.124332 NaN 49.0 13290.13 26.03234 12.692215 13290.13 37.0 9645.870
Printer 0.32400 0.869208 NaN 2.0 329.00 1.40900 0.910216 329.00 1.0 448.610
Table 24.38466 17.880716 NaN 250.0 42692.68 30.16346 18.718721 42692.68 196.0 25086.785
Total 110.10605 64.460440 0.0 468.0 99589.40 121.60325 67.481717 99589.40 350.0 73892.415
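Note that the result above relies on sort_index placing 'Total' after every other category alphabetically, which happens to hold here ('Table' < 'Total'), and that a column containing only NaN sums to 0.0 (see last_week_sales for 2021-06-28). The following is a minimal sketch of one way to avoid both, assuming a small made-up frame `toy` and a hypothetical helper `add_group_totals` (neither is from the original answer). It appends the total row explicitly at the end of each group and passes `min_count=1` so an all-NaN column stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like df_full: (group, category) rows and
# (date, metric) columns. 'Zebra' would sort after 'Total' alphabetically.
cols = pd.MultiIndex.from_product(
    [[pd.Timestamp("2021-06-28"), pd.Timestamp("2021-07-05")],
     ["total_orders", "total_sales"]]
)
rows = pd.MultiIndex.from_product(
    [["A", "B"], ["Books", "Zebra"]], names=["group", "category"]
)
toy = pd.DataFrame(
    [[1.0, 10.0, 2.0, np.nan],
     [3.0, 30.0, 4.0, np.nan],
     [5.0, 50.0, 6.0, 60.0],
     [7.0, 70.0, 8.0, 80.0]],
    index=rows, columns=cols,
)


def add_group_totals(df, label="Total"):
    """Return df with one `label` row appended after each level-0 group."""
    pieces = []
    for key, block in df.groupby(level=0, sort=False):
        # min_count=1: a column with no valid values sums to NaN, not 0.0
        total = block.sum(min_count=1).to_frame().T
        total.index = pd.MultiIndex.from_tuples(
            [(key, label)], names=df.index.names
        )
        pieces.append(block)
        pieces.append(total)
    # No sort_index here, so each total row stays last within its group
    return pd.concat(pieces)


out = add_group_totals(toy)
print(out)
```

Because the rows are no longer lexicographically sorted, partial-key lookups such as `out.loc['A']` may be slower than on a sorted index; if that matters, the sort_index version above is the simpler choice.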