Pivoting a Pandas Dataframe on Categorical Variables

Question

I have a dataframe containing categorical variables:

{'SysID': {0: '00721778',
1: '00721778',
2: '00721778',
3: '00721779',
4: '00721779'},
'SoftwareComponent': {0: 'AA13912',
1: 'AA24120',
2: 'AA21612',
3: 'AA30861',
4: 'AA20635'},
'SoftwareSubcomponent': {0: None,
1: 'AK21431',
2: None,
3: 'AK22116',
4: None}}
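To reproduce the answer below, the sample frame can be rebuilt from this dictionary. This is a minimal sketch that assumes the dictionary is the output of df.to_dict(); the names data and df are only for illustration.

```python
import pandas as pd

# The question's data, assumed to be the output of df.to_dict()
data = {
    'SysID': {0: '00721778', 1: '00721778', 2: '00721778',
              3: '00721779', 4: '00721779'},
    'SoftwareComponent': {0: 'AA13912', 1: 'AA24120', 2: 'AA21612',
                          3: 'AA30861', 4: 'AA20635'},
    'SoftwareSubcomponent': {0: None, 1: 'AK21431', 2: None,
                             3: 'AK22116', 4: None},
}

# pd.DataFrame accepts a dict of {column: {row_label: value}} mappings
df = pd.DataFrame(data)
print(df)
```

The None entries stay as missing values in the object-dtype SoftwareSubcomponent column.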

I would like to pivot on the categorical variables, ignoring any null (None) values, with zero as the fill value. The output should look like this:

{'SysID': {0: '00721778', 1: '00721779'},
'SoftwareCom-AA13912': {0: '1', 1: '0'},
'SoftwareCom-AA24120': {0: '1', 1: '0'},
'SoftwareCom-AA21612': {0: '1', 1: '0'},
'SoftwareCom-AA30861': {0: '0', 1: '1'},
'SoftwareCom-AA20635': {0: '0', 1: '1'},
'SoftwareSub-AK21431': {0: '1', 1: '0'},
'SoftwareSub-AK22116': {0: '0', 1: '1'}}

How can I do this?

Tags: python-3.x, pandas, pivot, categorical-data

Answer


You can use pd.crosstab after a bit of cleanup. First stack the frame into long format, dropping the None values, then build labels that combine a shortened column name (SoftwareCom or SoftwareSub) with the value, so that both columns are counted together in a single crosstab.
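To see what the stacking step produces before the crosstab, here is a small sketch of just the reshape on the question's data. The explicit dropna() is an addition of this sketch: the newer stack implementation (opt-in via future_stack=True from pandas 2.1) keeps missing values instead of dropping them, so dropping them explicitly keeps the result the same across versions.

```python
import pandas as pd

df = pd.DataFrame({
    'SysID': ['00721778', '00721778', '00721778', '00721779', '00721779'],
    'SoftwareComponent': ['AA13912', 'AA24120', 'AA21612', 'AA30861', 'AA20635'],
    'SoftwareSubcomponent': [None, 'AK21431', None, 'AK22116', None],
})

# stack() turns every (SysID, column) cell into one entry of a Series;
# dropna() removes the None cells; reset_index(level=1) moves the original
# column name into a column called 'level_1' (the value column is named 0).
long_df = df.set_index('SysID').stack().dropna().reset_index(level=1)
print(long_df)
```

Each remaining row pairs a SysID with one non-null value and the column it came from, which is exactly what the label-building step below works on.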

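import pandas as pd

# Long format: one row per (SysID, non-null value); dropna() skips the None cells
df = df.set_index('SysID').stack().dropna().reset_index(level=1)

# 'SoftwareComponent'[0:11] -> 'SoftwareCom', 'SoftwareSubcomponent'[0:11] -> 'SoftwareSub'
df['val'] = df['level_1'].str[0:11] + '-' + df[0]

# Count labels per SysID; combinations that never occur are filled with 0
pd.crosstab(df.index, df.val).rename_axis('SysID').rename_axis(None, axis=1).reset_index()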

Output:

      SysID  SoftwareCom-AA13912  SoftwareCom-AA20635  SoftwareCom-AA21612  SoftwareCom-AA24120  SoftwareCom-AA30861  SoftwareSub-AK21431  SoftwareSub-AK22116
0  00721778                    1                    0                    1                    1                    0                    1                    0
1  00721779                    0                    1                    0                    0                    1                    0                    1
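As an alternative to the crosstab approach (a different technique, not the one used above), pd.get_dummies can one-hot encode both columns at once and a groupby can collapse the rows per SysID. This is a sketch on the question's data; get_dummies skips missing values by default (dummy_na=False), so None produces no indicator column.

```python
import pandas as pd

df = pd.DataFrame({
    'SysID': ['00721778', '00721778', '00721778', '00721779', '00721779'],
    'SoftwareComponent': ['AA13912', 'AA24120', 'AA21612', 'AA30861', 'AA20635'],
    'SoftwareSubcomponent': [None, 'AK21431', None, 'AK22116', None],
})

# One indicator column per (column, value); prefixes give 'SoftwareCom-AA13912' etc.
dummies = pd.get_dummies(
    df.set_index('SysID'),
    prefix=['SoftwareCom', 'SoftwareSub'],
    prefix_sep='-',
)

# One row per SysID: 1 if any of its rows had the value, else 0
result = dummies.groupby(level=0).max().astype(int).reset_index()
print(result)
```

Because max() is used instead of a count, this version already yields only 0s and 1s even when a SysID repeats a value.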

If the same SysID can list the same component more than once, crosstab reports counts greater than 1. To keep only 1s and 0s, either cast the counts to bool and back to int, or use .clip(0, 1):

pd.crosstab(df.index, df.val).rename_axis('SysID').rename_axis(None, axis=1).clip(0, 1).reset_index()
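The two options can be compared on a small hypothetical long-format example in which SysID 00721778 lists AA13912 twice. The sys_ids and labels series are invented for illustration; both conversions map any positive count to 1 and leave 0 unchanged.

```python
import pandas as pd

# Hypothetical data: '00721778' has 'SoftwareCom-AA13912' twice
sys_ids = pd.Series(['00721778', '00721778', '00721779'], name='SysID')
labels = pd.Series(['SoftwareCom-AA13912', 'SoftwareCom-AA13912',
                    'SoftwareCom-AA30861'], name='val')

counts = pd.crosstab(sys_ids, labels)  # contains a count of 2

via_clip = counts.clip(0, 1)                # caps counts at 1
via_bool = counts.astype(bool).astype(int)  # nonzero -> True -> 1
print(via_clip)
```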
