Pandas DataFrame columns to sub-columns

Problem description

I want to convert columns into sub-columns.

Suppose the data looks like this:

    Q1  Q2:Q21  Q2:Q22 Q2:Q23 Q3:Q31 Q3:Q32
0  yes   green    blue  green    bus    car
1   no     red  orange   blue    car   bike
2  yes   green  yellow  black    car   walk
3  yes  yellow   green  brown    bus   walk
4   no   green   green    red    car    bus

After reshaping the columns, I want to have:

    Q1              Q2               Q3
    Q1     Q21     Q22    Q23    Q31    Q32
0  yes   green    blue  green    bus    car
1   no     red  orange   blue    car   bike
2  yes   green  yellow  black    car   walk
3  yes  yellow   green  brown    bus   walk
4   no   green   green    red    car    bus

Here is what I tried:

import pandas as pd
survey = pd.read_csv('survey.csv')
# first column names
survey_cols = [col.split(':')[0] for col in survey.columns]
# unique column names
survey_ucols = []
for e in survey_cols:
    if e not in survey_ucols:
        survey_ucols.append(e)
# second column names, subcolumns
survey_subcols = []
for col in survey_ucols:
    survey_subcols.append([subcol.split(':')[-1] for subcol in survey.columns if col in subcol])
# create new df
tuples = list(zip(survey_ucols,survey_subcols))
cols = pd.MultiIndex.from_tuples(tuples, names=['mainQ', 'subQ'])
survey_new = pd.DataFrame(survey, columns=cols)

Thanks in advance.

Tags: python, pandas

Solution


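You can build a helper DataFrame with Index.to_series and Series.str.split, forward-fill the missing values in each row with ffill (so a column without a ':' reuses its own name as the sub-column), and finally assign the result back with MultiIndex.from_arrays: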

df = survey.columns.to_series().str.split(':', expand=True).ffill(axis=1)
survey.columns = pd.MultiIndex.from_arrays([df[0].tolist(), df[1].tolist()])
# simplified: assigning a list of arrays also builds a MultiIndex
#survey.columns = [df[0].tolist(), df[1].tolist()]
print (survey)
    Q1      Q2                  Q3      
    Q1     Q21     Q22    Q23  Q31   Q32
0  yes   green    blue  green  bus   car
1   no     red  orange   blue  car  bike
2  yes   green  yellow  black  car  walk
3  yes  yellow   green  brown  bus  walk
4   no   green   green    red  car   bus
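
The snippet above reads survey.csv, which is not included here. As a minimal self-contained sketch, the sample rows from the question can be built inline and run through the same steps. The variable name `parts` and the level names `mainQ` / `subQ` (borrowed from the question's attempt) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; it stands in for survey.csv.
survey = pd.DataFrame({
    "Q1": ["yes", "no", "yes", "yes", "no"],
    "Q2:Q21": ["green", "red", "green", "yellow", "green"],
    "Q2:Q22": ["blue", "orange", "yellow", "green", "green"],
    "Q2:Q23": ["green", "blue", "black", "brown", "red"],
    "Q3:Q31": ["bus", "car", "car", "bus", "car"],
    "Q3:Q32": ["car", "bike", "walk", "walk", "bus"],
})

# Split each label on ':'; labels without a ':' get a missing second part,
# which the row-wise forward fill replaces with the first part.
parts = survey.columns.to_series().str.split(":", expand=True).ffill(axis=1)

# Rename the columns in place with a two-level index.
survey.columns = pd.MultiIndex.from_arrays(
    [parts[0].tolist(), parts[1].tolist()], names=["mainQ", "subQ"]
)

# Selecting a top-level label returns only its sub-columns.
q2 = survey["Q2"]
print(q2.columns.tolist())
```

Because only the column labels are replaced, the row data is untouched, and each main question can then be selected as a block with `survey["Q2"]`.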

Details

print (df)
         0    1
Q1      Q1   Q1
Q2:Q21  Q2  Q21
Q2:Q22  Q2  Q22
Q2:Q23  Q2  Q23
Q3:Q31  Q3  Q31
Q3:Q32  Q3  Q32
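
The same (main, sub) pairs can also be passed to MultiIndex.from_tuples, which stays closer to the attempt in the question. That attempt appears to fail for two reasons: zipping the three main names with three lists yields tuples whose second element is a list, and `pd.DataFrame(survey, columns=cols)` selects columns by label rather than renaming them, so the new labels would not match any existing data. The sketch below is one possible alternative, not the original answer's method; `split_label` and `labels` are hypothetical names introduced for illustration.

```python
import pandas as pd

def split_label(label, sep=":"):
    """Return (main, sub); a label without `sep` is used for both levels."""
    main, _, sub = label.partition(sep)
    return (main, sub or main)

# Column labels as they appear in the question's data.
labels = ["Q1", "Q2:Q21", "Q2:Q22", "Q2:Q23", "Q3:Q31", "Q3:Q32"]

# One tuple per original column, in the original order.
cols = pd.MultiIndex.from_tuples(
    [split_label(label) for label in labels], names=["mainQ", "subQ"]
)
print(cols.tolist())
```

With a real frame, the labels would come from `survey.columns`, and the result would be assigned with `survey.columns = cols` so that the data is renamed rather than reindexed.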
