How do I unstack a column with multiple indexes in Python?

Problem description

I have a DataFrame like this:

import pandas as pd

d = {'state': ["Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"],
     'county_name': ["Autauga", "Autauga", "Autauga", "Baldwin", "Baldwin", "Baldwin", "Baldwin", "Barbour"],
     'col3': ["A", "B", "C", "A", "C", "D", "B", "B"],
     'count': [2, 2, 1, 9, 3, 2, 50, 1],
     'sum': [1, 0, 0, 3, 1, 0, 13, 0]}
df = pd.DataFrame(data=d)
print(df)
     state county_name col3  count  sum
0  Alabama     Autauga    A      2    1
1  Alabama     Autauga    B      2    0
2  Alabama     Autauga    C      1    0
3  Alabama     Baldwin    A      9    3
4  Alabama     Baldwin    C      3    1
5  Alabama     Baldwin    D      2    0
6  Alabama     Baldwin    B     50   13
7  Alabama     Barbour    B      1    0

I am trying to reshape the DataFrame with unstack in Python. The final output I want looks something like this:

state    county_name  col3_A_count  col3_B_count  col3_C_count  col3_D_count  col3_A_sum  col3_B_sum  col3_C_sum  col3_D_sum
Alabama  Autauga                 2             2             1            NA           1           0           0          NA
Alabama  Baldwin                 9            50             3             2           3          13           1           0
Alabama  Barbour                NA             1            NA            NA          NA           0          NA          NA

I tried to solve this with set_index and unstack, but it raises an error.

location = ['state','county_name']
df = df.set_index(['col3']+location).unstack('col3')

ValueError: Index contains duplicate entries, cannot reshape

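Well, I don't know why it works fine on the sample above, possibly because the data is small. But when I apply it to my original dataset, it raises that error. It seems that duplicate records cannot be used as an index. Can someone tell me how to solve this?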
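One way to check the duplicate-index suspicion is to look for repeated (state, county_name, col3) combinations before reshaping. The snippet below is a minimal sketch on made-up data: `df_dup` and its values are hypothetical, built only to reproduce the error, and are not taken from the original dataset.

```python
import pandas as pd

# Hypothetical sample: one (state, county_name, col3) combination
# appears twice (Alabama / Autauga / A), which is enough to trigger the error.
d = {'state': ["Alabama"] * 4,
     'county_name': ["Autauga", "Autauga", "Autauga", "Baldwin"],
     'col3': ["A", "B", "A", "A"],
     'count': [2, 2, 5, 9],
     'sum': [1, 0, 2, 3]}
df_dup = pd.DataFrame(data=d)

keys = ['state', 'county_name', 'col3']

# keep=False flags every row that belongs to a repeated key combination
dup_rows = df_dup[df_dup.duplicated(subset=keys, keep=False)]
print(dup_rows)

# With repeated keys, unstack cannot place two values in one cell
try:
    df_dup.set_index(keys).unstack('col3')
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```

The sample in the question has no repeated key combination, which would explain why set_index plus unstack works there but not on the full data.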

Tags: python, pandas, dataframe, group-by, pivot

Solution
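A possible approach, sketched here under assumptions rather than taken from an accepted answer, is to use pivot_table instead of set_index plus unstack. pivot_table aggregates rows that share the same index and column keys, so repeated keys no longer raise an error. The choice of aggfunc='sum' is an assumption: it adds up duplicate rows, which may or may not be what the original dataset needs. The name `wide` is illustrative only.

```python
import pandas as pd

d = {'state': ["Alabama"] * 8,
     'county_name': ["Autauga", "Autauga", "Autauga", "Baldwin",
                     "Baldwin", "Baldwin", "Baldwin", "Barbour"],
     'col3': ["A", "B", "C", "A", "C", "D", "B", "B"],
     'count': [2, 2, 1, 9, 3, 2, 50, 1],
     'sum': [1, 0, 0, 3, 1, 0, 13, 0]}
df = pd.DataFrame(data=d)

location = ['state', 'county_name']

# Assumption: duplicate (state, county_name, col3) rows should be summed.
# Swap aggfunc for 'first', 'max', etc. if that fits the data better.
wide = df.pivot_table(index=location, columns='col3',
                      values=['count', 'sum'], aggfunc='sum')

# The columns are a two-level MultiIndex of (value name, col3 key);
# flatten them into names such as col3_A_count.
wide.columns = [f"col3_{key}_{value}" for value, key in wide.columns]
wide = wide.reset_index()
print(wide)
```

Combinations that do not occur in the data, such as Barbour with col3 A, come out as NaN, so the value columns are floats. If the duplicates in the real data are accidental rather than meaningful, dropping them first with drop_duplicates and keeping the original set_index and unstack code is another option.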

