python-3.x - Pivoting a Pandas Dataframe on Categorical Variables
Problem description
I have a dataframe containing categorical variables:
{'SysID': {0: '00721778',
1: '00721778',
2: '00721778',
3: '00721779',
4: '00721779'},
'SoftwareComponent': {0: 'AA13912',
1: 'AA24120',
2: 'AA21612',
3: 'AA30861',
4: 'AA20635'},
'SoftwareSubcomponent': {0: None,
1: 'AK21431',
2: None,
3: 'AK22116',
4: None}}
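To reproduce the example, the dictionary above can be passed straight to pd.DataFrame. SysID stays a string column, so the leading zeros are kept, and the None entries are treated as missing values. The name data is only illustrative.

```python
import pandas as pd

# The dictionary from the question: column -> {row label: value}
data = {
    'SysID': {0: '00721778', 1: '00721778', 2: '00721778',
              3: '00721779', 4: '00721779'},
    'SoftwareComponent': {0: 'AA13912', 1: 'AA24120', 2: 'AA21612',
                          3: 'AA30861', 4: 'AA20635'},
    'SoftwareSubcomponent': {0: None, 1: 'AK21431', 2: None,
                             3: 'AK22116', 4: None},
}
df = pd.DataFrame(data)
print(df)

# isna() treats the None entries as missing
print(df.isna().sum())
```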
I would like to pivot on the categorical variables, ignoring any NULL (None) values and using 0 to fill combinations that do not occur. The output should look like this:
{'SysID': {0: '00721778', 1: '00721779'},
'SoftwareCom-AA13912': {0: '1', 1: '0'},
'SoftwareCom-AA24120': {0: '1', 1: '0'},
'SoftwareCom-AA21612': {0: '1', 1: '0'},
'SoftwareCom-AA30861': {0: '0', 1: '1'},
'SoftwareCom-AA20635': {0: '0', 1: '1'},
'SoftwareSub-AK21431': {0: '1', 1: '0'},
'SoftwareSub-AK22116': {0: '0', 1: '1'}}
How can I do this?
Solution
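You can use pd.crosstab after a bit of cleanup. Stack the frame so that each SysID/value pair becomes its own row, drop the None values, and build each new column name from the first 11 characters of the original column name ('SoftwareCom' or 'SoftwareSub') plus the value, so that SoftwareComponent and SoftwareSubcomponent are treated the same way.
import pandas as pd
# df is the DataFrame built from the dictionary in the question
# stack() turns each (SysID, column) cell into a row; dropna() removes the None cells
# explicitly, since not every pandas version's stack() drops missing values
df = df.set_index('SysID').stack().dropna().reset_index(level=1)
# 'level_1' now holds the original column name and column 0 holds the value
df['val'] = df['level_1'].str[0:11] + '-' + df[0]
pd.crosstab(df.index, df.val).rename_axis(index='SysID', columns=None).reset_index()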
Output:
SysID SoftwareCom-AA13912 SoftwareCom-AA20635 SoftwareCom-AA21612 SoftwareCom-AA24120 SoftwareCom-AA30861 SoftwareSub-AK21431 SoftwareSub-AK22116
0 00721778 1 0 1 1 0 1 0
1 00721779 0 1 0 0 1 0 1
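A self-contained run of the same steps can be used to check the table above. It is a sketch that differs from the answer's code in two small ways, both choices made only for this example: missing values are dropped explicitly with dropna(), and a plain reset_index() moves SysID back into a regular column so that crosstab receives two Series from the same frame. The names data, long and result are illustrative.

```python
import pandas as pd

data = {
    'SysID': {0: '00721778', 1: '00721778', 2: '00721778',
              3: '00721779', 4: '00721779'},
    'SoftwareComponent': {0: 'AA13912', 1: 'AA24120', 2: 'AA21612',
                          3: 'AA30861', 4: 'AA20635'},
    'SoftwareSubcomponent': {0: None, 1: 'AK21431', 2: None,
                             3: 'AK22116', 4: None},
}
df = pd.DataFrame(data)

# Wide -> long: one row per non-missing (SysID, source column, value).
# reset_index() names the unnamed column level 'level_1' and the values column 0.
long = df.set_index('SysID').stack().dropna().reset_index()

# Prefix each value with the first 11 characters of its source column
# ('SoftwareCom' or 'SoftwareSub').
long['val'] = long['level_1'].str[0:11] + '-' + long[0]

# Count each (SysID, val) pair; pairs that never occur are filled with 0.
result = (
    pd.crosstab(long['SysID'], long['val'])
    .rename_axis(columns=None)
    .reset_index()
)
print(result)
```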
If the same SysID/value pair can occur more than once, crosstab returns counts greater than 1. If you only want 1s and 0s, you can either cast to bool and then back to int, or use .clip:
pd.crosstab(df.index, df.val).rename_axis(index='SysID', columns=None).clip(0, 1).reset_index()
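The bool round-trip mentioned above can be sketched on hypothetical data in which one SysID lists the same component twice, so crosstab counts it as 2. astype(bool) maps every nonzero count to True and astype('int64') turns that back into 1, which gives the same 0/1 table as clip(0, 1) for non-negative counts. The rows and the names long, counts, flags and clipped are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data: '00721778' has 'SoftwareCom-AA13912' twice.
long = pd.DataFrame({
    'SysID': ['00721778', '00721778', '00721778', '00721779'],
    'val': ['SoftwareCom-AA13912', 'SoftwareCom-AA13912',
            'SoftwareSub-AK21431', 'SoftwareCom-AA30861'],
})

counts = pd.crosstab(long['SysID'], long['val'])
print(counts)

# Option 1: any count > 0 becomes True, then 1; zeros stay 0.
flags = counts.astype(bool).astype('int64')

# Option 2: cap the counts at 1.
clipped = counts.clip(0, 1)

print(flags.rename_axis(columns=None).reset_index())
```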