Normalize rows into unique rows by moving some values into columns

Problem description

I have a DataFrame that currently looks like this:

 index  serial          email     firstname     lastname     country     job    course     completed
     0    0005    one@two.com         David        Smith          US   Sales   course1            Y
     1    0076  three@two.com          John       Bloggs          GB    Exec   course2            Y
     2    0005    one@two.com         David        Smith          US   Sales   course2            Y
     3    0005    one@two.com         David        Smith          US   Sales   course3            Y
     4     NaN    foo@bar.com           Foo          Bar          IN     ext   course2            Y
     5     NaN    bar@foo.com           Bar          Far          NZ     ext   course2            Y
   ...     ...            ...           ...         ...          ...          ...           ...

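I want to normalize this DataFrame so that each person appears only once (on a single row). In other words, I want to turn it into something like this: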

 index   serial           email     firstname     lastname     country     job    course1    course2    course3
     0     0005     one@two.com         David        Smith          US   Sales        Yes        Yes        Yes
     1     0076   three@two.com          John       Bloggs          GB    Exec        NaN        Yes        NaN
     2      NaN     foo@bar.com           Foo          Bar          IN     ext        NaN        Yes        NaN
     3      NaN     bar@foo.com           Bar          Far          NZ     ext        NaN        Yes        NaN
   ...      ...             ...           ...          ...         ...           ...        ...        ...

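Note that the unique identifier is the serial number for company staff (job == Sales or Exec), whereas for external people (job == ext) it is their email address.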
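The identifier rule above can be expressed as a single key column. The following is a minimal sketch, not part of the original question: the column name `person_id` is a hypothetical helper, and the small frame only mirrors the `serial`, `email` and `job` columns from the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with only the columns needed to build the key
df = pd.DataFrame({
    "serial": ["0005", "0076", "0005", np.nan, np.nan],
    "email": ["one@two.com", "three@two.com", "one@two.com",
              "foo@bar.com", "bar@foo.com"],
    "job": ["Sales", "Exec", "Sales", "ext", "ext"],
})

# person_id (hypothetical name): serial for company staff, email for everyone else
is_staff = df["job"].isin(["Sales", "Exec"])
df["person_id"] = df["serial"].where(is_staff, df["email"])
```

A column like this could then serve as the grouping or de-duplication key wherever a per-person key is needed.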

Tags: pandas

Solution


I tried this:

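import numpy as np
import pandas as pd

# One indicator column per course ('Yes' or NaN); astype(int) copes with the booleans recent pandas returns
dumm = pd.get_dummies(df['course']).astype(int).astype(str).replace({'0': np.nan, '1': 'Yes'})
df = pd.concat([df.drop(columns='course'), dumm], axis=1)
# Within each email, back-fill so the first row collects every 'Yes', then keep only that row
courses = list(dumm.columns)
df[courses] = df.groupby('email')[courses].bfill()
df = df.drop_duplicates(subset=['email'], keep='first')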

Output:

   index  serial          email firstname lastname country    job completed  \
0      0     5.0    one@two.com     David    Smith      US  Sales         Y   
1      1    76.0  three@two.com      John   Bloggs      GB   Exec         Y   
4      4     NaN    foo@bar.com       Foo      Bar      IN    ext         Y   
5      5     NaN    bar@foo.com       Bar      Far      NZ    ext         Y   

  course1 course2 course3  
0     Yes     Yes     Yes  
1     NaN     Yes     NaN  
4     NaN     Yes     NaN  
5     NaN     Yes     NaN 
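For comparison, the same reshaping can be sketched with `pivot_table` instead of `get_dummies` plus back-fill. This is an alternative technique, not the answer's method. It rebuilds the sample rows from the question and, like the answer, assumes `email` is unique per person; the names `people`, `courses` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (serial kept as strings to preserve leading zeros)
cols = ["serial", "email", "firstname", "lastname", "country", "job", "course", "completed"]
rows = [
    ("0005", "one@two.com", "David", "Smith", "US", "Sales", "course1", "Y"),
    ("0076", "three@two.com", "John", "Bloggs", "GB", "Exec", "course2", "Y"),
    ("0005", "one@two.com", "David", "Smith", "US", "Sales", "course2", "Y"),
    ("0005", "one@two.com", "David", "Smith", "US", "Sales", "course3", "Y"),
    (np.nan, "foo@bar.com", "Foo", "Bar", "IN", "ext", "course2", "Y"),
    (np.nan, "bar@foo.com", "Bar", "Far", "NZ", "ext", "course2", "Y"),
]
df = pd.DataFrame(rows, columns=cols)

# One row per person: the first occurrence of each email carries the personal details
people = df.drop(columns=["course", "completed"]).drop_duplicates(subset="email")

# One column per course, indexed by email; 'Y' becomes 'Yes', absent combinations stay NaN
courses = (
    df.assign(completed=df["completed"].map({"Y": "Yes"}))
      .pivot_table(index="email", columns="course", values="completed", aggfunc="first")
)

# Attach the course columns to each person and renumber the rows
result = people.join(courses, on="email").reset_index(drop=True)
print(result)
```

Keeping `serial` as a string here avoids the `5.0` / `76.0` float conversion visible in the output above. If emails are not a reliable key, any other per-person key column could replace `email` in both `drop_duplicates` and `pivot_table`.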
