Pivot DataFrame with multiple dimensions

Problem description

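I have two reports: a training report and a master roster. The training report has 15 columns and the master roster has 9 columns. I've created a small sample below. My terminology may be off, since I'm new to Python.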

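Training report (I added the Training column using some conditional logic on Training Code. Note that a name can appear more than once if that person completed multiple trainings, e.g. Name2.)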

import pandas as pd
df = pd.DataFrame({'Name':['Name1','Name2','Name2','Name3'],
                   'Office':['A', 'B', 'B', 'A'],
                   'Position':['Director','Manager','Manager','Analyst'],
                   'Training Code':['C3','C1-L','C2','C1-B'],
                   'Training':['ADV','BEG','INT','BEG']
                  })

Output

    Name Office  Position Training Code Training
0  Name1      A  Director            C3      ADV
1  Name2      B   Manager          C1-L      BEG
2  Name2      B   Manager            C2      INT
3  Name3      A   Analyst          C1-B      BEG

Master roster (I added the Status column based on conditions on the Required column. This is a unique list of names for everyone on the roster.)

df4 = pd.DataFrame({'Name':['Name1','Name2','Name3','Name4'],
                   'Office':['A', 'B', 'A', 'C'],
                   'Position':['Director','Manager','Analyst','Supervisor'],
                   'Symbol':['OS','BP','OD','EO'],
                   'Status':[1,3,8,2],
                   'Required':['Required','Required','Recommended','Required']})

Output

    Name Office    Position Symbol  Status     Required
0  Name1      A    Director     OS       1     Required
1  Name2      B     Manager     BP       3     Required
2  Name3      A     Analyst     OD       8  Recommended
3  Name4      C  Supervisor     EO       2     Required

I need to combine the master roster and the training data, as shown below.

df3 = pd.DataFrame({'Name':['Name1','Name2','Name3','Name4'],
                   'Office':['A', 'B', 'A', 'C'],
                   'Position':['Director','Manager','Analyst','Supervisor'],
                   'Symbol':['OS','BP','OD','EO'],
                   'Status':[1,3,8,2],
                   'Required':['Required','Required','Recommended','Required'],
                   'ADV':[1,0,0,0],
                   'INT':[0,1,0,0],
                   'BEG':[0,1,1,0]
                  })

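Desired output (a unique list of names with each name's information from the master roster, merged with a pivoted version of the training report.)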

    Name Office    Position Symbol  Status     Required  ADV  INT  BEG
0  Name1      A    Director     OS       1     Required    1    0    0
1  Name2      B     Manager     BP       3     Required    0    1    1
2  Name3      A     Analyst     OD       8  Recommended    0    0    1
3  Name4      C  Supervisor     EO       2     Required    0    0    0

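I need to use the master roster to get all of the names and the other fields in that report. Then I need to merge it with a pivoted training report, where the Training column is split into one column per training level holding a count.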

My first step was to try pivoting the training report data (without using all of its columns) and then merge it with the master roster.

pvt = df.pivot_table(index = ['Name','Office','Position'],
                     columns = 'Training',
                     fill_value = 0,
                     aggfunc='count')

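However, I'm not sure this is the best approach, and the pivot output doesn't seem merge-friendly (I could be wrong). In SQL, I would just LEFT JOIN the training report to the master roster on the Name column.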
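One way to express that LEFT JOIN in pandas is to pivot the training report first and then merge it onto the master roster. The following is a minimal sketch using the sample frames above. Using pd.crosstab for the counting step is a substitution for the pivot_table call, and the names counts, merged, and training_cols are illustrative only.

```python
import pandas as pd

# Sample training report and master roster from the question.
df = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name2', 'Name3'],
                   'Office': ['A', 'B', 'B', 'A'],
                   'Position': ['Director', 'Manager', 'Manager', 'Analyst'],
                   'Training Code': ['C3', 'C1-L', 'C2', 'C1-B'],
                   'Training': ['ADV', 'BEG', 'INT', 'BEG']})
df4 = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name3', 'Name4'],
                    'Office': ['A', 'B', 'A', 'C'],
                    'Position': ['Director', 'Manager', 'Analyst', 'Supervisor'],
                    'Symbol': ['OS', 'BP', 'OD', 'EO'],
                    'Status': [1, 3, 8, 2],
                    'Required': ['Required', 'Required', 'Recommended', 'Required']})

# One row per Name, one column per Training value; each cell holds a count.
counts = pd.crosstab(df['Name'], df['Training']).reset_index()
counts.columns.name = None  # drop the leftover 'Training' axis label

# Equivalent of SQL: roster LEFT JOIN counts ON Name.
merged = df4.merge(counts, on='Name', how='left')

# Names with no training rows come back as NaN; turn those into 0 counts.
training_cols = ['ADV', 'INT', 'BEG']
merged[training_cols] = merged[training_cols].fillna(0).astype(int)

# Put the columns in the order of the desired output.
merged = merged[list(df4.columns) + training_cols]
print(merged)
```

Because the pivot happens on the training report alone, only Name has to be used as the join key, and the roster columns never need to be part of the pivot index.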

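Any guidance on the simplest and best way to merge these 2 reports and get the result I'm after would be greatly appreciated. Let me know if I need to clarify anything further!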

----- Update 2 --------

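I was able to merge and then pivot the dataset, but the result isn't quite what I want. The merge looks fine, and I only brought in the columns I need.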

result = pd.merge(df4,
                  df[['Name','Training']],
                  on='Name',
                  how='left')

Then I replaced the NaN values in the Training column with 'NONE'.

result.update(result[['Training']].fillna('NONE'))

Merge output

    Name Office    Position Symbol  Status     Required Training
0  Name1      A    Director     OS       1     Required      ADV
1  Name2      B     Manager     BP       3     Required      BEG
2  Name2      B     Manager     BP       3     Required      INT
3  Name3      A     Analyst     OD       8  Recommended      BEG
4  Name4      C  Supervisor     EO       2     Required     NONE
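The fill step can also be written as a direct column assignment, which some readers may find easier to follow than update. This is a small sketch with the sample data trimmed to the columns the merge uses; it is an alternative spelling, not a change in behavior for this data.

```python
import pandas as pd

# Training report trimmed to the two columns used in the merge.
df = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name2', 'Name3'],
                   'Training': ['ADV', 'BEG', 'INT', 'BEG']})
df4 = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name3', 'Name4'],
                    'Office': ['A', 'B', 'A', 'C'],
                    'Position': ['Director', 'Manager', 'Analyst', 'Supervisor'],
                    'Symbol': ['OS', 'BP', 'OD', 'EO'],
                    'Status': [1, 3, 8, 2],
                    'Required': ['Required', 'Required', 'Recommended', 'Required']})

# Left join keeps every roster row, including Name4 with no training.
result = pd.merge(df4, df, on='Name', how='left')

# Replace the missing Training value for unmatched roster rows.
result['Training'] = result['Training'].fillna('NONE')
print(result)
```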

However, when I try to pivot the result DataFrame, I now get "Empty DataFrame".

cols = ['Name','Office','Position','Symbol','Status','Required']
pvt2 = result.pivot_table(index=cols,
                          columns='Training',
                          fill_value = 0,
                          aggfunc = 'count')
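A plausible reason for the empty result, offered here as an assumption rather than something confirmed against the pandas documentation: once every column of result is used in either index or columns, pivot_table has no remaining value column for 'count' to work on. One way around that is to give it an explicit column to aggregate. The sketch below adds a helper column holding 1 for each row and sums it; the column name n and the variable pvt are hypothetical names chosen for illustration.

```python
import pandas as pd

# The merged frame from the "Merge output" table, with NaN already set to 'NONE'.
result = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name2', 'Name3', 'Name4'],
                       'Office': ['A', 'B', 'B', 'A', 'C'],
                       'Position': ['Director', 'Manager', 'Manager', 'Analyst', 'Supervisor'],
                       'Symbol': ['OS', 'BP', 'BP', 'OD', 'EO'],
                       'Status': [1, 3, 3, 8, 2],
                       'Required': ['Required', 'Required', 'Required', 'Recommended', 'Required'],
                       'Training': ['ADV', 'BEG', 'INT', 'BEG', 'NONE']})
cols = ['Name', 'Office', 'Position', 'Symbol', 'Status', 'Required']

# 'n' is a helper column (hypothetical name): one unit per training row,
# so summing it per group yields the count for each Training value.
pvt = (result.assign(n=1)
             .pivot_table(index=cols,
                          columns='Training',
                          values='n',
                          aggfunc='sum',
                          fill_value=0))
print(pvt)
```

Passing values explicitly also keeps the resulting columns as plain Training labels (ADV, BEG, INT, NONE).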

-------- Final update --------

I got it working! Yay!

result = pd.merge(df4,
                  df[['Name','Training']],
                  on='Name',
                  how='left')

result.update(result[['Training']].fillna('NONE'))
cols = ['Name','Office','Position','Symbol','Status','Required']

pvt2 = result.pivot_table(index=cols,                          
                          columns=['Training'],
                          fill_value = 0,
                          aggfunc = len)

All I had to do was change aggfunc = 'count' to aggfunc = len. I hope this ends up helping someone! If anyone has improvements, I'm definitely open to them as well.

Tags: python, pivot-table

Solution

There may be a better way, but this solution worked for me! Again, I'm happy to take feedback or improvements!

import pandas as pd
#Create DataFrame for training
df = pd.DataFrame({'Name':['Name1','Name2','Name2','Name3','Name1'],
                   'Office':['A', 'B', 'B', 'A','A'],
                   'Position':['Director','Manager','Manager','Analyst','Director'],
                   'Training Code':['C3','C1-L','C2','C1-B','C3'],
                   'Training':['ADV','BEG','INT','BEG','ADV']
                  })

#Create DataFrame for master roster
df4 = pd.DataFrame({'Name':['Name1','Name2','Name3','Name4'],
                   'Office':['A', 'B', 'A', 'C'],
                   'Position':['Director','Manager','Analyst','Supervisor'],
                   'Symbol':['OS','BP','OD','EO'],
                   'Status':[1,3,8,2],
                   'Required':['Required','Required','Recommended','Required']})

#Left join the training DataFrame to the master roster DataFrame using the 'Name'
#column as the join key.
result = pd.merge(df4,
                  df[['Name','Training']],
                  on='Name',
                  how='left')

#Substitute any 'NaN' values with 'NONE' so the pivot doesn't drop rows with 'NaN'
result.update(result[['Training']].fillna('NONE'))

#Store all the column headers of the master roster into the 'cols' list
cols = list(df4.columns)

#Pivot the combined 'result' DataFrame using all the columns from
#the master roster DataFrame. The 'Training' column is the column
#that will be broken apart. 'aggfunc = len' does a count of the instances
#of each 'Training' element.
pvt2 = result.pivot_table(index=cols,                          
                          columns=['Training'],
                          fill_value = 0,
                          aggfunc = len)
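To get from a pivoted table to the flat layout shown under the desired output, the index levels can be moved back into ordinary columns and the placeholder NONE column dropped. A minimal sketch follows. It rebuilds the counts with groupby(...).size().unstack(...) instead of the pivot_table(aggfunc = len) call above (a substitution, chosen so the example does not depend on how pivot_table treats a frame with no value columns), and the names counts and flat are illustrative. Note that this answer's training data lists the ADV training for Name1 twice, so that count comes out as 2.

```python
import pandas as pd

# Training report from this answer (Name1's ADV training appears twice).
df = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name2', 'Name3', 'Name1'],
                   'Office': ['A', 'B', 'B', 'A', 'A'],
                   'Position': ['Director', 'Manager', 'Manager', 'Analyst', 'Director'],
                   'Training Code': ['C3', 'C1-L', 'C2', 'C1-B', 'C3'],
                   'Training': ['ADV', 'BEG', 'INT', 'BEG', 'ADV']})
df4 = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name3', 'Name4'],
                    'Office': ['A', 'B', 'A', 'C'],
                    'Position': ['Director', 'Manager', 'Analyst', 'Supervisor'],
                    'Symbol': ['OS', 'BP', 'OD', 'EO'],
                    'Status': [1, 3, 8, 2],
                    'Required': ['Required', 'Required', 'Recommended', 'Required']})

result = pd.merge(df4, df[['Name', 'Training']], on='Name', how='left')
result['Training'] = result['Training'].fillna('NONE')
cols = list(df4.columns)

# Count rows per roster entry and Training value, then spread Training into columns.
counts = (result.groupby(cols + ['Training'])
                .size()
                .unstack('Training', fill_value=0))

# Drop the placeholder column and turn the index levels back into columns.
flat = counts.drop(columns='NONE').reset_index()
flat.columns.name = None
flat = flat[cols + ['ADV', 'INT', 'BEG']]
print(flat)
```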
