Transposing pandas columns with duplicate values

Problem description

I have a DataFrame like the one shown below.

import pandas as pd

df1 = pd.DataFrame({'Gender':['Male','Male','Male','Male','Female','Female','Female','Female','Male','Male','Male','Male','Female','Female','Female','Female'],
                    'Year':[2008,2008,2009,2009,2008,2008,2009,2009,2008,2008,2009,2009,2008,2008,2009,2009],
                    'rate':[2.3,3.2,4.5,6.7,5.6,3.2,3.5,2.6,2.3,3.2,4.5,6.7,5.6,3.2,3.5,2.6],
                    'Heading':['TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123',
                               'TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456'],
                    'target':[31.2,33.4,33.4,35.2,35.2,36.4,36.4,37.2,31.2,33.4,33.4,35.2,35.2,36.4,36.4,37.2],
                    'day_type':['wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend']})

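As you can see, every column contains repeated values.

I want to transpose/pivot them to get the output shown below. I tried the following, but it did not work.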

df1.pivot(index='Year', columns='Heading', values='rate')
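A likely reason the call above fails is that pivot() expects at most one row for each index/column combination, while here several rows share the same (Year, Heading) pair. The sketch below uses a small hypothetical frame named sample (not the full df1) to count the rows per cell and to show that pivot() raises a ValueError when a cell holds more than one row.

```python
import pandas as pd

# Hypothetical reduced sample: several rows share the same (Year, Heading) pair,
# which is the same situation as in df1.
sample = pd.DataFrame({
    'Year': [2008, 2008, 2009, 2009],
    'Heading': ['TNMAB123', 'TNMAB123', 'TNMAB123', 'TNMAB123'],
    'rate': [2.3, 3.2, 4.5, 6.7],
})

# Count how many rows land in each (Year, Heading) cell.
cell_counts = sample.groupby(['Year', 'Heading']).size()
print(cell_counts)

# pivot() cannot decide which of the duplicate rows to keep, so it raises.
try:
    sample.pivot(index='Year', columns='Heading', values='rate')
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

print(pivot_error)
```

When a cell can hold several rows, a reshaping function that aggregates (such as pivot_table) is usually the better fit.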

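I expect my output to look like the following, with each year as a row and all corresponding entries for that year as columns.

Note that I have not filled in the values, because the table's column structure is what matters.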

[image: expected output table]

Can you help me?

Tags: python, pandas, numpy, dataframe, pandas-groupby

Solution


You can try this. Use df.unstack() here, then flatten the resulting MultiIndex columns to a single level with join.

df1 = df1.pivot_table(index=['Year','Gender'],columns='Heading',values='rate').unstack()

df1.columns = ['_'.join(i) for i in df1.columns.tolist()]

df1 
      TDAS3_Female  TDAS3_Male  TNMAB123_Female  TNMAB123_Male  TSAD4_Female  TSAD4_Male  TWQE2_Female  TWQE2_Male
Year
2008           NaN         NaN              6.3            2.3           NaN         NaN           NaN         NaN
2009           NaN         NaN              7.1            3.2           NaN         NaN           2.1         4.5
2010           5.3         5.6              NaN            NaN           NaN         NaN           4.2         6.7
2011           3.6         3.2              NaN            NaN           2.9         3.5           NaN         NaN
2012           NaN         NaN              NaN            NaN           6.2         2.6           NaN         NaN
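The printed table above contains headings (such as TDAS3 and TWQE2) and years that do not occur in the question's df1, so it appears to come from a different sample. As a sketch, the same steps applied to the rate data from the question look like this. The name wide is introduced here only for illustration, and the unused target and day_type columns are left out. Note that pivot_table averages duplicate entries by default, so each cell holds the mean of the rows that share the same Year, Gender and Heading.

```python
import pandas as pd

# Same Gender, Year, rate and Heading values as df1 in the question,
# written compactly with list repetition.
df1 = pd.DataFrame({
    'Gender': (['Male'] * 4 + ['Female'] * 4) * 2,
    'Year': [2008, 2008, 2009, 2009] * 4,
    'rate': [2.3, 3.2, 4.5, 6.7, 5.6, 3.2, 3.5, 2.6] * 2,
    'Heading': ['TNMAB123'] * 8 + ['TNMAB456'] * 8,
})

# pivot_table aggregates duplicates (default aggfunc is the mean);
# unstack() then moves the innermost index level (Gender) into the columns.
wide = df1.pivot_table(index=['Year', 'Gender'],
                       columns='Heading',
                       values='rate').unstack()

# Columns are now (Heading, Gender) tuples; join them into flat names.
wide.columns = ['_'.join(col) for col in wide.columns]

print(wide.columns.tolist())
print(wide)
```

If the mean is not the aggregation you want, pivot_table accepts an aggfunc argument (for example 'first' or 'sum').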

There are several ways to flatten a MultiIndex to a single level: iterate over df.columns directly, use df.columns.tolist(), or use pd.MultiIndex.to_flat_index().

  • ['_'.join(i) for i in df1.columns.tolist()]
  • ['_'.join(i) for i in df1.columns]
  • ['_'.join(i) for i in df1.columns.to_flat_index()]
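The three variants listed above should give the same flat names, since each one iterates over the same tuples. The sketch below checks this on a hypothetical MultiIndex called cols that mimics the (Heading, Gender) columns. It also shows one caveat: '_'.join only accepts strings, so a level holding integers (such as years) has to be cast first. The mixed index is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical two-level column index resembling the (Heading, Gender) columns.
cols = pd.MultiIndex.from_product(
    [['TNMAB123', 'TNMAB456'], ['Female', 'Male']],
    names=['Heading', 'Gender'],
)

# Three equivalent ways to build flat column names from the tuples.
from_tolist = ['_'.join(i) for i in cols.tolist()]
from_iter = ['_'.join(i) for i in cols]
from_flat = ['_'.join(i) for i in cols.to_flat_index()]

print(from_tolist)

# When a level contains non-strings (e.g. integer years), cast before joining.
mixed = pd.MultiIndex.from_product([['rate'], [2008, 2009]])
mixed_names = ['_'.join(map(str, i)) for i in mixed]

print(mixed_names)
```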
