首页 > 解决方案 > 在熊猫数据透视表中设置多级索引

问题描述

我有以下数据

,DISTRICT,STATUS,DEM,REP,CON,GRE,WOR,IND,WEP,REF,OTH,BLANK
3,1,Active,144470,162308,11778,1108,2066,23245,150.0,41.0,394,128547
4,1,Inactive,10086,8290,643,96,130,1545,5.0,3.0,28,8436
12,2,Active,159400,155996,8457,830,1712,18008,137.0,43.0,291,114789
13,2,Inactive,9440,7752,428,61,119,1043,6.0,3.0,13,6832
24,3,Active,188902,148599,5872,832,1193,19370,144.0,115.0,228,127907
25,3,Inactive,12367,8835,305,68,69,1255,3.0,4.0,16,8315
30,4,Active,204544,160861,4911,888,1286,17455,148.0,173.0,162,113486
31,4,Inactive,15085,10302,315,74,119,1269,4.0,2.0,6,8442
39,5,Active,298385,33244,1468,483,1251,8662,175.0,27.0,43,62106
40,5,Inactive,20071,2273,88,30,126,643,9.0,3.0,2,4380
45,6,Active,198024,52511,1951,592,849,8905,161.0,38.0,83,87250
46,6,Inactive,15016,3699,145,35,77,789,4.0,0.0,9,6769
57,7,Active,271911,22736,803,899,1369,6971,203.0,22.0,113,67437
58,7,Inactive,31143,3355,133,155,219,1166,10.0,1.0,20,9357
66,8,Active,324586,32846,1122,713,1666,8649,202.0,25.0,54,62021
67,8,Inactive,36652,3183,141,109,271,1245,11.0,2.0,20,8419
72,9,Active,313949,30234,1034,823,1474,7952,139.0,21.0,90,62772
73,9,Inactive,33517,3184,119,115,257,1160,8.0,1.0,11,8696
81,10,Active,247623,45926,1128,743,570,9287,128.0,24.0,152,79918
82,10,Inactive,38988,9685,177,143,114,2749,9.0,0.0,57,16564
90,11,Active,183564,110062,5299,654,1362,12817,168.0,67.0,137,88130
91,11,Inactive,16846,7921,409,69,187,1259,5.0,2.0,19,8107
102,12,Active,259877,54431,883,1009,622,13703,128.0,32.0,210,91923
103,12,Inactive,44238,12638,200,244,136,3629,10.0,2.0,87,20458
111,13,Active,327895,16808,738,840,1235,7730,238.0,37.0,82,54421
112,13,Inactive,50766,3815,154,167,304,1816,18.0,1.0,27,11201
120,14,Active,214570,33020,1694,650,993,7544,200.0,28.0,73,61960
121,14,Inactive,21175,3395,241,70,145,946,9.0,0.0,17,6736
126,15,Active,282788,13663,1069,426,1509,5539,284.0,37.0,23,46081
127,15,Inactive,44336,2904,203,53,393,1077,28.0,1.0,12,8909
135,16,Active,251841,62717,4039,584,1140,12216,153.0,72.0,92,76433
136,16,Inactive,29544,6369,369,54,181,1430,6.0,0.0,16,9342
144,17,Active,201872,102969,7728,807,1393,18013,131.0,206.0,97,106682
145,17,Inactive,16792,8085,520,110,134,1658,3.0,3.0,13,10190
159,18,Active,154331,135552,8585,1126,1764,22973,157.0,104.0,350,105443
160,18,Inactive,12972,9506,574,133,196,1995,9.0,4.0,39,9418
195,19,Active,141289,138473,10576,1867,2155,26423,167.0,85.0,360,114937
196,19,Inactive,11503,8552,656,209,245,2350,9.0,4.0,37,10392
213,20,Active,178650,107575,10258,1452,2098,25382,186.0,99.0,295,107500
214,20,Inactive,17920,7738,768,189,353,2570,25.0,1.0,75,11487
252,21,Active,117878,166307,5996,1243,1589,24589,96.0,65.0,201,83284
253,21,Inactive,10560,11493,502,175,245,2646,8.0,8.0,25,10898
279,22,Active,127765,158621,6587,1179,1818,23179,139.0,90.0,377,80746
280,22,Inactive,11143,9472,478,144,262,2183,23.0,11.0,49,8998
315,23,Active,128565,153949,7342,1452,1986,21646,136.0,93.0,357,84173
316,23,Inactive,11331,8808,510,196,232,2032,14.0,8.0,44,9399
330,24,Active,143319,138353,8370,1394,1844,21997,147.0,55.0,521,103922
331,24,Inactive,15962,9745,642,211,325,2496,16.0,6.0,74,12229
336,25,Active,178181,120433,7400,1180,1404,19377,163.0,43.0,607,103153
337,25,Inactive,15180,6838,404,158,146,1533,19.0,3.0,61,8252
345,26,Active,234929,90506,7508,1351,2372,19155,192.0,87.0,444,74621
346,26,Inactive,19306,5621,405,155,284,1602,22.0,7.0,47,8109
372,27,Active,142703,183641,12825,1337,2299,25525,102.0,73.0,627,97801
373,27,Inactive,7283,7959,550,102,165,1433,8.0,0.0,42,6377

并且可以在这里查看和下载。

我正在尝试旋转这个数据框,使“DISTRICT”是索引的第一级,而“BLANK”、“CON”、“DEM”、“GRE”、“IND”、“OTH”、“REF”、“REP” ', 'WEP', 'WOR' 列是多级索引的第二级。然后我想将“状态”列设置为水平列“活动”和“非活动”。我写了以下代码:

active_inactive8.pivot(index=['DISTRICT', 'DEM', 'REP', 'CON', 'GRE',
                              'WOR', 'IND', 'WEP', 'REF', 'OTH', 'BLANK'],
                       columns='STATUS')

但得到错误:ValueError: all arrays must be same length

我也试过 .melt 没有成功。

标签: pythonpandaspivot

解决方案


我相信需要set_indexunstack但随后需要转置,swaplevel最后sort_index

df = pd.read_csv('active_inactive8.csv', index_col=0)

df2 = df.set_index(['STATUS','DISTRICT']).unstack().T.swaplevel(1,0).sort_index()

df2 = df.pivot(index='STATUS', columns='DISTRICT').T.swaplevel(1,0).sort_index()


print (df2.head(10))
STATUS            Active  Inactive
DISTRICT                          
1        BLANK  128547.0    8436.0
         CON     11778.0     643.0
         DEM    144470.0   10086.0
         GRE      1108.0      96.0
         IND     23245.0    1545.0
         OTH       394.0      28.0
         REF        41.0       3.0
         REP    162308.0    8290.0
         WEP       150.0       5.0
         WOR      2066.0     130.0

推荐阅读