首页 > 解决方案 > Pandas df unstack rows

问题描述

I have a dataframe with the following structure:

id       timestamp_str  y   x
2569     Jan.2018       4   9   
2569     Feb.2018       6   5
2569     Mar.2018       3   9   
2569     Apr.2018       2   10
2569     May.2018       6   9
3679     Jan.2018       14  52  
3679     Feb.2018       62  4   
3679     Mar.2018       5   89
3679     Apr.2018       9   4
3679     May.2018       32  9
.
.
.

and I would like to have this data into the following format:

id    parameter   Jan.2018   Feb.2018   Mar.2018   Apr.2018   May.2018
2569  y            4         6          3          2          6
2569  x            9         5          9          10         9
3679  y            14        62         5          9          32
3679  x            52        4          89         4          9

But so far I was not able to change the format, I tried something like this

df = (df.set_index(['id',df.groupby(['id']).cumcount().add(1)])
    .unstack()
    .sort_index(axis=1, level=[1, 0], ascending=[True, False]))
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()

Thanks for any hints,

cheers!

标签: pandasrows

解决方案


如果需要不转换为日期时间的解决方案:

使用DataFrame.stackwithSeries.unstack进行重塑,在 中设置日期时间的原始顺序DataFrame.reindex

df = (df.set_index(['id_x','timestamp_str'])
       .stack()
       .unstack(1)
       .reindex(df['timestamp_str'].unique(),axis=1)
       .rename_axis(index=['id','parameter'], columns=None)
       .reset_index())

print (df)
     id parameter  Jan.2018  Feb.2018  Mar.2018  Apr.2018  May.2018
0  2569         y         4         6         3         2         6
1  2569         x         9         5         9        10         9
2  3679         y        14        62         5         9        32
3  3679         x        52         4        89         4         9

或者像提到的@sammywemmyDataFrame.melt一样DataFrame.pivot

df = (df.melt(['id_x', 'timestamp_str'])
       .pivot(['id_x', 'variable'], 'timestamp_str', 'value')
       .reindex(df['timestamp_str'].unique(),axis=1)
       .rename_axis(index=['id','parameter'], columns=None)
       .reset_index())

print (df)
     id parameter  Jan.2018  Feb.2018  Mar.2018  Apr.2018  May.2018
0  2569         x         9         5         9        10         9
1  2569         y         4         6         3         2         6
2  3679         x        52         4        89         4         9
3  3679         y        14        62         5         9        32

转换为日期时间的解决方案 - 首先将列转换timestamp_str为日期时间,应用没有的解决方案reindex并添加rename原始格式datetimes

df['timestamp_str'] = pd.to_datetime(df['timestamp_str'], format='%b.%Y')

df = (df.set_index(['id_x','timestamp_str'])
       .stack()
       .unstack(1)
       .rename_axis(index=['id','parameter'], columns=None)
       .rename(columns= lambda x: x.strftime('%b.%Y'))
       .reset_index())

df['timestamp_str'] = pd.to_datetime(df['timestamp_str'], format='%b.%Y')

df = (df.melt(['id_x', 'timestamp_str'])
       .pivot(['id_x', 'variable'], 'timestamp_str', 'value')
       .rename_axis(index=['id','parameter'], columns=None)
       .rename(columns= lambda x: x.strftime('%b.%Y'))
       .reset_index())

推荐阅读