Pandas MultiIndex: reindexing on a level

Problem description

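I have several different data series stored in a pandas DataFrame with a two-level MultiIndex. I would like to know how to reindex this MultiIndex DataFrame so that, within each group, the index contains every hourly timestamp between the first and last existing index values.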

Here is an example of my DataFrame:

                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0                                        
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0
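
The question does not show how `df` was created. A minimal sketch for rebuilding the frame above, assuming both index levels are datetimes and the values are floats (the `rows` list is simply typed in from the printed table):

```python
import pandas as pd

# Values copied from the printed example; (tick, act, A, B, C, D).
rows = [
    ("2019-01-10", "2019-01-09 20:00", 5.0, 5.0, 5.0, 5.0),
    ("2019-01-10", "2019-01-10 00:00", 52.0, 34.0, 1.0, 9.0),
    ("2019-01-10", "2019-01-10 01:00", 75.0, 52.0, 61.0, 1.0),
    ("2019-01-10", "2019-01-10 02:00", 28.0, 29.0, 46.0, 61.0),
    ("2019-01-16", "2019-01-09 22:00", 91.0, 42.0, 3.0, 34.0),
    ("2019-01-16", "2019-01-10 02:00", 2.0, 22.0, 41.0, 59.0),
    ("2019-01-16", "2019-01-10 03:00", 16.0, 9.0, 92.0, 53.0),
]

df = pd.DataFrame(rows, columns=["tick", "act", "A", "B", "C", "D"])
# Assumption: both levels are real datetimes rather than strings.
df["tick"] = pd.to_datetime(df["tick"])
df["act"] = pd.to_datetime(df["act"])
df = df.set_index(["tick", "act"])
```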

This is what I would like to get:

                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0

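An important point to keep in mind: the `act` index level does not cover the same date range for every `tick` (for example, for 2019-01-10 it starts at 2019-01-09 20:00:00 and ends at 2019-01-10 02:00:00, while for 2019-01-16 it starts at 2019-01-09 22:00:00 and ends at 2019-01-10 03:00:00).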
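
To make the differing ranges concrete, here is a small sketch that lists the first and last `act` timestamp per `tick`. It uses only the index values from the example; the variable names `index` and `ranges` are illustrative.

```python
import pandas as pd

# Index values copied from the example frame; the data columns are not needed here.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-01-10"), pd.Timestamp("2019-01-09 20:00")),
        (pd.Timestamp("2019-01-10"), pd.Timestamp("2019-01-10 00:00")),
        (pd.Timestamp("2019-01-10"), pd.Timestamp("2019-01-10 01:00")),
        (pd.Timestamp("2019-01-10"), pd.Timestamp("2019-01-10 02:00")),
        (pd.Timestamp("2019-01-16"), pd.Timestamp("2019-01-09 22:00")),
        (pd.Timestamp("2019-01-16"), pd.Timestamp("2019-01-10 02:00")),
        (pd.Timestamp("2019-01-16"), pd.Timestamp("2019-01-10 03:00")),
    ],
    names=["tick", "act"],
)

# Turn the two levels into ordinary columns, then take min/max of `act` per tick.
ranges = index.to_frame(index=False).groupby("tick")["act"].agg(["min", "max"])
print(ranges)
```

Each `tick` therefore needs its own hourly range, which is why a single `reindex` against one shared `date_range` would not produce the desired result.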

I am mainly interested in whether there is a solution that uses pandas methods, without unnecessary outer loops.

Tags: python, pandas

Solution


First, call reset_index on your data.

d = df.reset_index()

d

         tick                 act     A     B     C     D
0  2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
1  2019-01-10 2019-01-10 00:00:00  52.0  34.0   1.0   9.0
2  2019-01-10 2019-01-10 01:00:00  75.0  52.0  61.0   1.0
3  2019-01-10 2019-01-10 02:00:00  28.0  29.0  46.0  61.0
4  2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
5  2019-01-16 2019-01-10 02:00:00   2.0  22.0  41.0  59.0
6  2019-01-16 2019-01-10 03:00:00  16.0   9.0  92.0  53.0

Group the data by tick and apply the interpolate function to each group.

import pandas as pd

def interpolate(df):
    # build a complete hourly index from this group's first to last `act`
    new_index = pd.date_range(df.act.min(), df.act.max(), freq="h")
    # set `act` as the index and upsample to hourly; missing hours become NaN rows
    return df.set_index("act").reindex(new_index)

d.groupby("tick").apply(interpolate)

This gives:

                                      tick     A     B     C     D
tick                                                              
2019-01-10 2019-01-09 20:00:00  2019-01-10   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  2019-01-10  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  2019-01-10  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  2019-01-10  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  2019-01-16  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00  2019-01-16   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  2019-01-16  16.0   9.0  92.0  53.0
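
The result above still carries a redundant `tick` column (NaN on the filled rows) and leaves the inner index level unnamed. Below is a minimal sketch of one way to reach the two-level layout asked for in the question. It is a variation on the answer rather than the answer's own code: `fill_hours` and `value_cols` are hypothetical names, and the example frame is rebuilt by hand so the block runs on its own.

```python
import pandas as pd

# Example data copied from the question; (tick, act, A, B, C, D).
rows = [
    ("2019-01-10", "2019-01-09 20:00", 5.0, 5.0, 5.0, 5.0),
    ("2019-01-10", "2019-01-10 00:00", 52.0, 34.0, 1.0, 9.0),
    ("2019-01-10", "2019-01-10 01:00", 75.0, 52.0, 61.0, 1.0),
    ("2019-01-10", "2019-01-10 02:00", 28.0, 29.0, 46.0, 61.0),
    ("2019-01-16", "2019-01-09 22:00", 91.0, 42.0, 3.0, 34.0),
    ("2019-01-16", "2019-01-10 02:00", 2.0, 22.0, 41.0, 59.0),
    ("2019-01-16", "2019-01-10 03:00", 16.0, 9.0, 92.0, 53.0),
]
df = pd.DataFrame(rows, columns=["tick", "act", "A", "B", "C", "D"])
df["tick"] = pd.to_datetime(df["tick"])
df["act"] = pd.to_datetime(df["act"])
df = df.set_index(["tick", "act"])


def fill_hours(group):
    """Reindex one tick group onto a complete hourly `act` index."""
    hours = pd.date_range(group["act"].min(), group["act"].max(), freq="h")
    # rename_axis restores the `act` level name that reindex drops.
    return group.set_index("act").reindex(hours).rename_axis("act")


value_cols = ["A", "B", "C", "D"]
result = (
    df.reset_index()
    # Selecting columns explicitly keeps `tick` out of each group's frame.
    .groupby("tick")[["act"] + value_cols]
    .apply(fill_hours)
)
print(result)
```

Selecting the columns before `apply` is a deliberate choice here: whether `apply` hands the grouping column to the function has changed between pandas versions, and an explicit selection sidesteps that difference.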
