Filtering a Pandas MultiIndex across all first-level columns

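Problem description

I'm trying to find an efficient way to filter all entries under both top-level columns using a filter that is defined on only one of them. It is best explained with the example and desired output below.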

Example DataFrame

import pandas as pd
import numpy as np
info = ['price', 'year']
months = ['month0','month1','month2']
settlement_dates = ['2020-12-31', '2021-01-01']
Data = [[[2,4,5],[2020,2021,2022]],[[1,4,2],[2021,2022,2023]]]
# One row per settlement date, one column per (info, month) pair
Data = np.array(Data).reshape(len(settlement_dates), len(months) * len(info))
midx = pd.MultiIndex.from_product([info, months])
df = pd.DataFrame(Data, index=settlement_dates, columns=midx)
df

            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31      2      4      5   2020   2021   2022
2021-01-01      1      4      2   2021   2022   2023

Creating a filter for the MultiIndex DataFrame

idx_cols = pd.IndexSlice

df_filter = df.loc[:, idx_cols['year', :]]==2021

df[df_filter]


            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    NaN    NaN     NaN  2021.0    NaN
2021-01-01    NaN    NaN    NaN  2021.0     NaN    NaN

Desired output:

            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4      NaN     NaN  2021.0    NaN
2021-01-01    1      NaN    NaN  2021.0     NaN    NaN

Tags: python, pandas, dataframe, multi-index

Solution


You can simplify the solution by reshaping the DataFrame with DataFrame.stack and then filtering with DataFrame.where:

df1 = df.stack()

df_filter = df1['year']==2021

df_filter = df1.where(df_filter).unstack()
print (df_filter)
            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4.0    NaN     NaN  2021.0    NaN
2021-01-01    1.0    NaN    NaN  2021.0     NaN    NaN

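Your approach also works, but it is more complicated: reindex the mask to all of the original columns, then fill the resulting missing values by back filling and forward filling across the stacked columns: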

idx_cols = pd.IndexSlice

df_filter = df.loc[:, idx_cols['year', :]]==2021

df_filter = df_filter.reindex(df.columns, axis=1).stack(dropna=False).bfill(axis=1).ffill(axis=1).unstack()
print (df_filter)
            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31  False   True  False  False   True  False
2021-01-01   True  False  False   True  False  False

print (df[df_filter])
            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4.0    NaN     NaN  2021.0    NaN
2021-01-01    1.0    NaN    NaN  2021.0     NaN    NaN
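
As a possible alternative to the two approaches above, the mask built from 'year' can be repeated once per top-level label and passed straight to DataFrame.where, which avoids the stack/unstack round trip. This is only a minimal sketch: the names mask, top_levels, full_mask and result are illustrative, and it assumes every top-level column has exactly the same second-level labels (month0 to month2 here).

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
info = ['price', 'year']
months = ['month0', 'month1', 'month2']
settlement_dates = ['2020-12-31', '2021-01-01']
data = np.array([[[2, 4, 5], [2020, 2021, 2022]],
                 [[1, 4, 2], [2021, 2022, 2023]]])
data = data.reshape(len(settlement_dates), len(months) * len(info))
df = pd.DataFrame(data, index=settlement_dates,
                  columns=pd.MultiIndex.from_product([info, months]))

# Boolean mask defined on one top-level column only; its columns are
# the second-level labels (month0, month1, month2).
mask = df['year'].eq(2021)

# Assumption: all top-level columns share the same second-level labels,
# so the same mask can be repeated under each top-level key.
top_levels = list(df.columns.get_level_values(0).unique())
full_mask = pd.concat([mask] * len(top_levels), axis=1, keys=top_levels)

# where() keeps values where the mask is True and sets the rest to NaN.
result = df.where(full_mask)
print(result)
```

Because full_mask carries the same MultiIndex columns as df, DataFrame.where aligns the two label by label, so 'price' is filtered by the condition that was defined on 'year'.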
