Melting a MultiIndex DataFrame in pandas

Question

I've been stuck on this multi-level DataFrame of stock prices, which I'm trying to reshape from wide to long format.

I want to go from this:

Attributes    Close   Close     High          
Symbols        AMZN    ^DJI     AMZN 
Date                                                                 
2020-12-01       32     29         35 
2020-11-30       31     28         36 
2020-11-27       30     27         37  

To this:

      Attri   Sym        Date     price
0     Close  AMZN  2020-12-01        32
1     Close  AMZN  2020-11-30        31
2     Close  AMZN  2020-11-27        30
3     Close  ^DJI  2020-12-01        29
4     Close  ^DJI  2020-11-30        28
5     High   AMZN  2020-12-01        35
6     ....
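To experiment with the reshaping, the wide input frame above can be rebuilt with a short sketch like the one below. It assumes the two column levels are named Attributes and Symbols and the row index is named Date, as the printed header suggests. For simplicity the dates are plain strings; data from a price downloader would more likely carry a DatetimeIndex.

```python
import pandas as pd

# Hypothetical reconstruction of the wide sample frame from the question.
# Assumption: column levels are named "Attributes" and "Symbols",
# the row index is named "Date", and dates are kept as plain strings.
columns = pd.MultiIndex.from_tuples(
    [("Close", "AMZN"), ("Close", "^DJI"), ("High", "AMZN")],
    names=["Attributes", "Symbols"],
)
index = pd.Index(["2020-12-01", "2020-11-30", "2020-11-27"], name="Date")
df = pd.DataFrame(
    [[32, 29, 35],
     [31, 28, 36],
     [30, 27, 37]],
    index=index,
    columns=columns,
)
print(df)
```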

I tried:

df = df.reset_index()
df = df.set_index([('Date', '')]).rename_axis(index=None, columns=('Date', ''))
df = df.fillna('').set_index('Date').T\
    .set_index('',append=True).stack().reset_index()

But I can't get it to work. Any ideas what else I should try?

Tags: python, pandas

Solution


For me, DataFrame.stack on both column levels, followed by Series.reset_index, works:

df = df.stack([0,1]).reset_index(name='price')
print (df)
         Date Attributes Symbols  price
0  2020-12-01      Close    AMZN   32.0
1  2020-12-01      Close    ^DJI   29.0
2  2020-12-01       High    AMZN   35.0
3  2020-11-30      Close    AMZN   31.0
4  2020-11-30      Close    ^DJI   28.0
5  2020-11-30       High    AMZN   36.0
6  2020-11-27      Close    AMZN   30.0
7  2020-11-27      Close    ^DJI   27.0
8  2020-11-27       High    AMZN   37.0
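A self-contained version of the stack approach follows, under the same assumptions as the sample data (level names Attributes and Symbols, index named Date, dates as strings). The explicit astype(int) is an addition, not part of the answer: as the output above shows, stack can return the prices as floats, and whether it does may depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample frame (same assumptions as the question's data).
columns = pd.MultiIndex.from_tuples(
    [("Close", "AMZN"), ("Close", "^DJI"), ("High", "AMZN")],
    names=["Attributes", "Symbols"],
)
df = pd.DataFrame(
    [[32, 29, 35], [31, 28, 36], [30, 27, 37]],
    index=pd.Index(["2020-12-01", "2020-11-30", "2020-11-27"], name="Date"),
    columns=columns,
)

# Stacking both column levels yields a Series indexed by
# (Date, Attributes, Symbols); reset_index turns those levels into columns
# and names the values column "price".
long_df = df.stack([0, 1]).reset_index(name="price")

# The price dtype can come back as float, so cast back to int explicitly.
long_df["price"] = long_df["price"].astype(int)
print(long_df)
```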

Another idea is the solution from @sammywemmy's comment:

df = df.melt(ignore_index=False, value_name="price").reset_index()
print (df)
         Date Attributes Symbols  price
0  2020-12-01      Close    AMZN     32
1  2020-11-30      Close    AMZN     31
2  2020-11-27      Close    AMZN     30
3  2020-12-01      Close    ^DJI     29
4  2020-11-30      Close    ^DJI     28
5  2020-11-27      Close    ^DJI     27
6  2020-12-01       High    AMZN     35
7  2020-11-30       High    AMZN     36
8  2020-11-27       High    AMZN     37
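To get exactly the column names and order requested in the question (Attri, Sym, Date, price), the melt result only needs a rename and a column selection. This is a minimal sketch that again assumes the sample frame's level names. Note that the ignore_index parameter of melt requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample frame (same assumptions as the question's data).
columns = pd.MultiIndex.from_tuples(
    [("Close", "AMZN"), ("Close", "^DJI"), ("High", "AMZN")],
    names=["Attributes", "Symbols"],
)
df = pd.DataFrame(
    [[32, 29, 35], [31, 28, 36], [30, 27, 37]],
    index=pd.Index(["2020-12-01", "2020-11-30", "2020-11-27"], name="Date"),
    columns=columns,
)

# melt takes its id column names from the column level names;
# ignore_index=False keeps Date as the index so reset_index restores it.
long_df = df.melt(ignore_index=False, value_name="price").reset_index()

# Rename and reorder to the layout asked for in the question.
long_df = long_df.rename(columns={"Attributes": "Attri", "Symbols": "Sym"})
long_df = long_df[["Attri", "Sym", "Date", "price"]]
print(long_df)
```

Because melt emits the original columns one after another, the rows already come out grouped by attribute and symbol in the requested order, so no extra sort is needed.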
