首页 > 解决方案 > Multi-column or block melt and then pivot?

问题描述

I'm struggling with a data reshaping problem. I have data approximately like this (but up to C10 group, with a multi-index I added myself.

df = pd.DataFrame({('C0',0) : {0:'ID1', 1:'ID2', 2:'ID3'},
               ('C0',1) : {0:'ID1', 1:'ID2', 2:'ID3'},
               ('C1',2) : {0:'A'  , 1:'A'  , 2:'A'},
               ('C1',3) : {0:'B'  , 1:'B'  , 2:'B'},
               ('C1',4) : {0:'C'  , 1:'C'  , 2:'C'},
               ('C2',5) : {0:'A'  , 1:'A'  , 2:'A'},
               ('C2',6) : {0:'B'  , 1:'B'  , 2:'B'},
               ('C2',7) : {0:'C'  , 1:'C'  , 2:'C'},
               ('C3',8) : {0:'A'  , 1:'A'  , 2:'A'},
               ('C3',9) : {0:'B'  , 1:'B'  , 2:'B'},
               ('C3',10) : {0:'C'  , 1:'C'  , 2:'C'}
              })

    C0      C1          C2          C3
    0   1   2   3   4   5   6   7   8   9   10
0   ID1 ID1 A   B   C   A   B   C   A   B   C
1   ID2 ID2 A   B   C   A   B   C   A   B   C
2   ID3 ID3 A   B   C   A   B   C   A   B   C

ID columns aren't actually identical values, but the unique key for this data is across multiple columns. Similarly, As, Bs, Cs are not actually identical values, just different types of values. The desired end result is this

    0   1   2   3   4   5
0   ID1 ID1 C1  A   B   C
1   ID1 ID1 C1  A   B   C
2   ID1 ID1 C1  A   B   C
3   ID2 ID2 C2  A   B   C
4   ID2 ID2 C2  A   B   C
5   ID2 ID2 C2  A   B   C
6   ID3 ID3 C3  A   B   C
7   ID3 ID3 C3  A   B   C
8   ID3 ID3 C3  A   B   C

In messing with this I've gotten to an intermediate point, where I've melted down the first bit, to include both column indexes on a line, but I wasn't able to re-pivot that into the desired format. Ultimately I'll also need to enumerate the lines within the C-groups, but I believe I can do that. First post here, mercy appreciated, but I've checked here and elsewhere to the best of my abilities.

标签: python-3.xpandasreshapemelt

解决方案


你可以融化然后取消堆叠:

s = df.melt(id_vars = list(df.columns[:2]), value_vars = list(df.columns[2:]))
d = s.assign(nm = s.groupby(list(s.columns[:3])).cumcount()).drop('variable_1', 1)
d.set_index(list(d.drop('value', 1).columns)).unstack().reset_index()

    C0      variable_0 value      
nm    0    1                0  1  2
0   ID1  ID1         C1     A  B  C
1   ID1  ID1         C2     A  B  C
2   ID1  ID1         C3     A  B  C
3   ID2  ID2         C1     A  B  C
4   ID2  ID2         C2     A  B  C
5   ID2  ID2         C3     A  B  C
6   ID3  ID3         C1     A  B  C
7   ID3  ID3         C2     A  B  C
8   ID3  ID3         C3     A  B  C

从 hre 你可以删除你想要的列级别


推荐阅读