python - 在熊猫数据透视表中设置多级索引
问题描述
我有以下数据
,DISTRICT,STATUS,DEM,REP,CON,GRE,WOR,IND,WEP,REF,OTH,BLANK
3,1,Active,144470,162308,11778,1108,2066,23245,150.0,41.0,394,128547
4,1,Inactive,10086,8290,643,96,130,1545,5.0,3.0,28,8436
12,2,Active,159400,155996,8457,830,1712,18008,137.0,43.0,291,114789
13,2,Inactive,9440,7752,428,61,119,1043,6.0,3.0,13,6832
24,3,Active,188902,148599,5872,832,1193,19370,144.0,115.0,228,127907
25,3,Inactive,12367,8835,305,68,69,1255,3.0,4.0,16,8315
30,4,Active,204544,160861,4911,888,1286,17455,148.0,173.0,162,113486
31,4,Inactive,15085,10302,315,74,119,1269,4.0,2.0,6,8442
39,5,Active,298385,33244,1468,483,1251,8662,175.0,27.0,43,62106
40,5,Inactive,20071,2273,88,30,126,643,9.0,3.0,2,4380
45,6,Active,198024,52511,1951,592,849,8905,161.0,38.0,83,87250
46,6,Inactive,15016,3699,145,35,77,789,4.0,0.0,9,6769
57,7,Active,271911,22736,803,899,1369,6971,203.0,22.0,113,67437
58,7,Inactive,31143,3355,133,155,219,1166,10.0,1.0,20,9357
66,8,Active,324586,32846,1122,713,1666,8649,202.0,25.0,54,62021
67,8,Inactive,36652,3183,141,109,271,1245,11.0,2.0,20,8419
72,9,Active,313949,30234,1034,823,1474,7952,139.0,21.0,90,62772
73,9,Inactive,33517,3184,119,115,257,1160,8.0,1.0,11,8696
81,10,Active,247623,45926,1128,743,570,9287,128.0,24.0,152,79918
82,10,Inactive,38988,9685,177,143,114,2749,9.0,0.0,57,16564
90,11,Active,183564,110062,5299,654,1362,12817,168.0,67.0,137,88130
91,11,Inactive,16846,7921,409,69,187,1259,5.0,2.0,19,8107
102,12,Active,259877,54431,883,1009,622,13703,128.0,32.0,210,91923
103,12,Inactive,44238,12638,200,244,136,3629,10.0,2.0,87,20458
111,13,Active,327895,16808,738,840,1235,7730,238.0,37.0,82,54421
112,13,Inactive,50766,3815,154,167,304,1816,18.0,1.0,27,11201
120,14,Active,214570,33020,1694,650,993,7544,200.0,28.0,73,61960
121,14,Inactive,21175,3395,241,70,145,946,9.0,0.0,17,6736
126,15,Active,282788,13663,1069,426,1509,5539,284.0,37.0,23,46081
127,15,Inactive,44336,2904,203,53,393,1077,28.0,1.0,12,8909
135,16,Active,251841,62717,4039,584,1140,12216,153.0,72.0,92,76433
136,16,Inactive,29544,6369,369,54,181,1430,6.0,0.0,16,9342
144,17,Active,201872,102969,7728,807,1393,18013,131.0,206.0,97,106682
145,17,Inactive,16792,8085,520,110,134,1658,3.0,3.0,13,10190
159,18,Active,154331,135552,8585,1126,1764,22973,157.0,104.0,350,105443
160,18,Inactive,12972,9506,574,133,196,1995,9.0,4.0,39,9418
195,19,Active,141289,138473,10576,1867,2155,26423,167.0,85.0,360,114937
196,19,Inactive,11503,8552,656,209,245,2350,9.0,4.0,37,10392
213,20,Active,178650,107575,10258,1452,2098,25382,186.0,99.0,295,107500
214,20,Inactive,17920,7738,768,189,353,2570,25.0,1.0,75,11487
252,21,Active,117878,166307,5996,1243,1589,24589,96.0,65.0,201,83284
253,21,Inactive,10560,11493,502,175,245,2646,8.0,8.0,25,10898
279,22,Active,127765,158621,6587,1179,1818,23179,139.0,90.0,377,80746
280,22,Inactive,11143,9472,478,144,262,2183,23.0,11.0,49,8998
315,23,Active,128565,153949,7342,1452,1986,21646,136.0,93.0,357,84173
316,23,Inactive,11331,8808,510,196,232,2032,14.0,8.0,44,9399
330,24,Active,143319,138353,8370,1394,1844,21997,147.0,55.0,521,103922
331,24,Inactive,15962,9745,642,211,325,2496,16.0,6.0,74,12229
336,25,Active,178181,120433,7400,1180,1404,19377,163.0,43.0,607,103153
337,25,Inactive,15180,6838,404,158,146,1533,19.0,3.0,61,8252
345,26,Active,234929,90506,7508,1351,2372,19155,192.0,87.0,444,74621
346,26,Inactive,19306,5621,405,155,284,1602,22.0,7.0,47,8109
372,27,Active,142703,183641,12825,1337,2299,25525,102.0,73.0,627,97801
373,27,Inactive,7283,7959,550,102,165,1433,8.0,0.0,42,6377
并且可以在这里查看和下载。
我正在尝试旋转这个数据框,使“DISTRICT”是索引的第一级,而“BLANK”、“CON”、“DEM”、“GRE”、“IND”、“OTH”、“REF”、“REP” ', 'WEP', 'WOR' 列是多级索引的第二级。然后我想将“状态”列设置为水平列“活动”和“非活动”。我写了以下代码:
active_inactive8.pivot(index=['DISTRICT', 'DEM', 'REP', 'CON', 'GRE',
'WOR', 'IND', 'WEP', 'REF', 'OTH', 'BLANK'],
columns='STATUS')
但得到错误:ValueError: all arrays must be same length
。
我也试过 .melt 没有成功。
解决方案
我相信需要set_index
,unstack
但随后需要转置,swaplevel
最后sort_index
:
df = pd.read_csv('active_inactive8.csv', index_col=0)
df2 = df.set_index(['STATUS','DISTRICT']).unstack().T.swaplevel(1,0).sort_index()
df2 = df.pivot(index='STATUS', columns='DISTRICT').T.swaplevel(1,0).sort_index()
print (df2.head(10))
STATUS Active Inactive
DISTRICT
1 BLANK 128547.0 8436.0
CON 11778.0 643.0
DEM 144470.0 10086.0
GRE 1108.0 96.0
IND 23245.0 1545.0
OTH 394.0 28.0
REF 41.0 3.0
REP 162308.0 8290.0
WEP 150.0 5.0
WOR 2066.0 130.0
推荐阅读
- angular - 如何在 angular.json 中导入外部脚本 src?
- android - 无法从资产“index.android.bundle”加载脚本,请确保您的捆绑包已打包
- sql - 动态生成视图名称的拖放视图语法
- php - 在 codeigniter 中使用 $this 时不在对象上下文中
- java - 比较 url 字符串的路径。忽略域名
- javascript - ChartJS:设置图表加载时显示哪些数据并保存(记住)它
- jquery - 在发出请求之前覆盖 ajax 请求以更改 url
- javascript - 如何创建具有函数返回结果的数组?
- vba - 从展开/折叠数据透视表中提取“隐藏”数据 - Excel
- javascript - Lodash:提取属性,拆分数组,获取唯一值