Tidying data correctly with a pivot table: keeping a two-level index

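Problem description

I'm struggling with this problem. I know how to build a pivot table, but I'm having a hard time keeping the index at two levels. Here is the problem statement, followed by my code:

Use `pivot_table` to tidy the data in `table1` below, assigning the result to the variable `table1_tidy`. In this case, keep the index as two levels: `country` and `year`.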

import pandas as pd

table1columns = ["country",  "year",       "type",     "count"]
table1data =[ ["Afghanistan",  1999,      "cases",       745],
          ["Afghanistan",  1999, "population",  19987071],
          ["Afghanistan",  2000,      "cases",      2666],
          ["Afghanistan",  2000, "population",  20595360],
          [     "Brazil",  1999,      "cases",     37737],
          [     "Brazil",  1999, "population", 172006362],
          [     "Brazil",  2000,      "cases",     80488],
          [     "Brazil",  2000, "population", 174504898],
          [      "China",  1999,      "cases",    212258],
          [      "China",  1999, "population",1272915272],
          [      "China",  2000,      "cases",    213766],
          [      "China",  2000, "population",1280428583] ]

table1 = pd.DataFrame(table1data, columns=table1columns)

### BEGIN SOLUTION
'''
This code uses `pivot_table` to tidy the data below in `table1`, 
assigning the result to the variable `table1_tidy`.
'''
table1_tidy = table1.pivot('type', 'count')
### END SOLUTION
# When done, comment out line below
# raise NotImplementedError()
print(table1_tidy)

My code needs to pass the following assert statements, but it currently fails them:

assert table1_tidy.shape == (6, 2)
assert table1_tidy.iloc[3, 0] == 80488

Tags: python, pandas, pivot

Solution


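`pivot` raises a `ValueError` when given a multi-level index. There is an open bug report for this on GitHub. The current workaround is to use `pivot_table` instead:

table1_tidy = table1.pivot_table(index=['country', 'year'], columns='type', values='count')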



type                cases   population
country     year        
Afghanistan 1999    745     19987071
            2000    2666    20595360
Brazil      1999    37737   172006362
            2000    80488   174504898
China       1999    212258  1272915272
            2000    213766  1280428583
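As a quick check, the `pivot_table` call above can be run end to end against the two assertions from the question. The sketch below is self-contained and rebuilds `table1` from the question's data; the extra checks on index levels and column names are additions for illustration, not part of the original exercise.

```python
import pandas as pd

# Rebuild the long-format table from the question.
table1 = pd.DataFrame(
    [
        ["Afghanistan", 1999, "cases", 745],
        ["Afghanistan", 1999, "population", 19987071],
        ["Afghanistan", 2000, "cases", 2666],
        ["Afghanistan", 2000, "population", 20595360],
        ["Brazil", 1999, "cases", 37737],
        ["Brazil", 1999, "population", 172006362],
        ["Brazil", 2000, "cases", 80488],
        ["Brazil", 2000, "population", 174504898],
        ["China", 1999, "cases", 212258],
        ["China", 1999, "population", 1272915272],
        ["China", 2000, "cases", 213766],
        ["China", 2000, "population", 1280428583],
    ],
    columns=["country", "year", "type", "count"],
)

# Two index levels (country, year); one output column per value of "type".
table1_tidy = table1.pivot_table(
    index=["country", "year"], columns="type", values="count"
)

# The assertions from the question.
assert table1_tidy.shape == (6, 2)
assert table1_tidy.iloc[3, 0] == 80488

# Extra checks (added for illustration): the index really has two levels.
print(table1_tidy.index.nlevels)
print(list(table1_tidy.columns))
```

Note that `pivot_table` aggregates by default (its default `aggfunc` is the mean). Each (country, year, type) combination appears only once here, so the aggregation leaves the values unchanged.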

You can get the same result with `set_index`:

table1_tidy = table1.set_index(['country', 'year', 'type'])['count'].unstack()
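The two approaches agree on this data, but they can differ when a (country, year, type) combination appears more than once. The sketch below illustrates this on a subset of the question's rows (Brazil only, to keep it short); the duplicated row and the variable names `subset`, `dup`, and `unstack_failed` are made up for the illustration.

```python
import pandas as pd

# A subset of the question's data (Brazil only).
subset = pd.DataFrame(
    [
        ["Brazil", 1999, "cases", 37737],
        ["Brazil", 1999, "population", 172006362],
        ["Brazil", 2000, "cases", 80488],
        ["Brazil", 2000, "population", 174504898],
    ],
    columns=["country", "year", "type", "count"],
)

via_pivot_table = subset.pivot_table(
    index=["country", "year"], columns="type", values="count"
)
via_unstack = subset.set_index(["country", "year", "type"])["count"].unstack()

# With unique keys, both approaches produce the same values.
same_values = bool((via_pivot_table.values == via_unstack.values).all())
print(same_values)

# Hypothetical duplicate: repeat the first row so one key appears twice.
dup = pd.concat([subset, subset.iloc[[0]]], ignore_index=True)

# unstack cannot reshape an index with duplicate entries and raises ValueError.
try:
    dup.set_index(["country", "year", "type"])["count"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True
print(unstack_failed)

# pivot_table aggregates the duplicates instead (mean by default).
dup_tidy = dup.pivot_table(index=["country", "year"], columns="type", values="count")
print(dup_tidy.shape)
```

If duplicates would indicate a data error, the `set_index(...).unstack()` form may be the safer choice because it fails loudly, whereas `pivot_table` silently aggregates.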
