Insert a level 0 into an existing DataFrame so that each set of 4 columns is grouped as one

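Problem description

I want to add a MultiIndex to my DataFrame so that MAE, MSE, RMSE, and MPE are grouped together under a new index level. Likewise, the remaining four columns should be grouped at the same level but under a different name.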

> import numpy as np
> import pandas as pd
>
> mux3 = pd.MultiIndex.from_product([list('ABCD'), list('1234')],
>                                   names=['one', 'two'])  # dummy data
> df3 = pd.DataFrame(np.random.choice(10, (3, len(mux3))), columns=mux3)  # dummy data frame
> print(df3)  # intended output required for the data frame in the picture given below

Sample DataFrame

Tags: pandas, dataframe, indexing, insert, multi-level

Solution


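Assuming the column groups are already in the appropriate order, we can simply create an np.arange over the length of the columns, floor-divide it by 4 to get the group numbers, and build the new columns with MultiIndex.from_arrays.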

Sample input and output:

import numpy as np
import pandas as pd

initial_index = [1, 2, 3, 4] * 3
np.random.seed(5)
df3 = pd.DataFrame(
    np.random.choice(10, (3, len(initial_index))), columns=initial_index
)

   1  2  3  4  1  2  3  4  1  2  3  4  # Column headers are in repeating order
0  3  6  6  0  9  8  4  7  0  0  7  1
1  5  7  0  1  4  6  2  9  9  9  9  1
2  2  7  0  5  0  0  4  4  9  3  2  4
# Create New Columns
df3.columns = pd.MultiIndex.from_arrays([
    np.arange(len(df3.columns)) // 4,  # Group Each set of 4 columns together
    df3.columns  # Keep level 1 the same as current columns
], names=['one', 'two'])  # Set Names (optional)
df3

one  0           1           2         
two  1  2  3  4  1  2  3  4  1  2  3  4
0    3  6  6  0  9  8  4  7  0  0  7  1
1    5  7  0  1  4  6  2  9  9  9  9  1
2    2  7  0  5  0  0  4  4  9  3  2  4
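
The question asks for named groups rather than integer group numbers. As a minimal sketch, assuming the metric columns MAE, MSE, RMSE, MPE repeat in a fixed order, the group numbers from the floor division can be used to index an array of names. The names `group_a` and `group_b` are hypothetical placeholders, not taken from the original post:

```python
import numpy as np
import pandas as pd

metrics = ['MAE', 'MSE', 'RMSE', 'MPE']
# Hypothetical group names (assumption); replace with the real ones
group_names = np.array(['group_a', 'group_b'])

np.random.seed(5)
df = pd.DataFrame(
    np.random.choice(10, (3, len(metrics) * len(group_names))),
    columns=metrics * len(group_names)
)

# Group number for each column position: 0 0 0 0 1 1 1 1
group_ids = np.arange(len(df.columns)) // len(metrics)

# Use the group numbers to look up a name for every column
df.columns = pd.MultiIndex.from_arrays(
    [group_names[group_ids], df.columns],
    names=['one', 'two']
)

# Selecting a top-level label returns that group's four columns
group_a = df['group_a']
```

This only works when every group has the same number of columns and the columns are already in group order.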

If the columns are in mixed order:

np.random.seed(5)
df3 = pd.DataFrame(
    np.random.choice(10, (3, 8)), columns=[1, 1, 3, 2, 4, 3, 2, 4]
)
df3

   1  1  3  2  4  3  2  4  # Cannot select groups positionally
0  3  6  6  0  9  8  4  7
1  0  0  7  1  5  7  0  1
2  4  6  2  9  9  9  9  1

We can convert the columns with Index.to_series, enumerate them with groupby cumcount, and then sort_index if needed:

df3.columns = pd.MultiIndex.from_arrays([
    # Enumerate Groups to create new level 0 index
    df3.columns.to_series().groupby(df3.columns).cumcount(),
    df3.columns
], names=['one', 'two'])  # Set Names (optional)
# Sort to order correctly
# (Do not sort before setting the columns; that would break alignment with the data)
df3 = df3.sort_index(axis=1)
df3

one  0           1         
two  1  2  3  4  1  2  3  4  # Notice Data has moved with headers
0    3  0  6  9  6  4  8  7
1    0  1  7  5  0  0  7  1
2    4  9  2  9  6  9  9  1
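
To see why the enumeration works, it can help to look at cumcount on its own. A small sketch using the same mixed column labels as above: each label is numbered by how many times it has already occurred, so the first occurrence of every label lands in group 0 and the second in group 1.

```python
import pandas as pd

labels = pd.Index([1, 1, 3, 2, 4, 3, 2, 4])

# Number each occurrence of a label: first occurrence -> 0, second -> 1, ...
occurrence = labels.to_series().groupby(labels).cumcount()

print(occurrence.tolist())  # [0, 1, 0, 0, 0, 1, 1, 1]
```

Note that this assumes each label appears the same number of times; if some labels occur more often than others, the later groups will contain fewer columns.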
