How to group by year and month column values to get the previous month's salary?

Problem description

I have employee data in the following shape, where each employee's salary increases over several months:

Employee    year    month     Salary
PersonA     2001    1         $50000 
PersonB     2001    5         $65000 
PersonB     2002    1         $75000 
PersonB     2002    3         $100000 
PersonC     2002    5         $75000 
PersonC     2002    6         $100000 
PersonC     2003    3         $110000 
PersonC     2003    9         $130000 
PersonC     2004    3         $150000 
PersonC     2005    3         $200000

I want to produce the same shape with one extra column holding the previous month's salary:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     0
PersonB     2001    5         $65000     0
PersonB     2002    1         $75000     $65000
PersonB     2002    3         $100000    $75000
PersonC     2002    5         $75000     0
PersonC     2002    6         $100000    $75000
PersonC     2003    3         $110000    $100000
PersonC     2003    9         $130000    $110000
PersonC     2004    3         $150000    $130000
PersonC     2005    3         $200000    $150000

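I tried groupby in pandas, but I could not work out how to take the month value minus one. This is only a sample; the real data covers every month, so getting the value for the previous month is all I need.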

When I try groupby, I cannot see how to do the subtraction:

df["previous_salary"]=df.groupby(['year',"month"])['salary'].transform('mean').astype(np.float16)

df["previous_salary"]=df.groupby(['year',"month"])['salary']

The result is just the mean, or the value, for the same month:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     $50000
PersonB     2001    5         $65000     $65000
PersonB     2002    1         $75000     $75000
PersonB     2002    3         $100000    $100000
PersonC     2002    5         $75000     $75000 
PersonC     2002    6         $100000    $100000
PersonC     2003    3         $110000    $110000
PersonC     2003    9         $130000    $130000
PersonC     2004    3         $150000    $150000
PersonC     2005    3         $200000    $200000

Is there a way to subtract one from the month value before grouping, or is there another way to do this?

Tags: python, pandas

Solution


You can use groupby().shift() to get the previous row's value within each employee:

prev_salaries = df.groupby(['Employee']).Salary.shift()

# an employee's first row has no earlier record; fall back to the current salary
df['prev_salary'] = prev_salaries.fillna(df['Salary'])

Output:

  Employee  year  month   Salary prev_salary
0  PersonA  2001      1   $50000      $50000
1  PersonB  2001      5   $65000      $65000
2  PersonB  2002      1   $75000      $65000
3  PersonB  2002      3  $100000      $75000
4  PersonC  2002      5   $75000      $75000
5  PersonC  2002      6  $100000      $75000
6  PersonC  2003      3  $110000     $100000
7  PersonC  2003      9  $130000     $110000
8  PersonC  2004      3  $150000     $130000
9  PersonC  2005      3  $200000     $150000
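
The output above fills each employee's first row with the current salary, whereas the question asked for 0 there. A minimal sketch of that variant follows. It assumes Salary is stored as a number rather than a "$"-prefixed string, and it uses only the first six sample rows. The sort step is also an assumption: shift() works on row order, so it only returns the previous month if the rows are in chronological order within each employee.

```python
import pandas as pd

# First six rows of the sample data, with Salary as integers (assumption)
df = pd.DataFrame({
    "Employee": ["PersonA", "PersonB", "PersonB", "PersonB", "PersonC", "PersonC"],
    "year": [2001, 2001, 2002, 2002, 2002, 2002],
    "month": [1, 5, 1, 3, 5, 6],
    "Salary": [50000, 65000, 75000, 100000, 75000, 100000],
})

# Put rows in chronological order within each employee before shifting
df = df.sort_values(["Employee", "year", "month"]).reset_index(drop=True)

# Previous row's salary per employee; the first row of each group becomes NaN
prev = df.groupby("Employee")["Salary"].shift()

# Replace the missing first-row values with 0, as the question requested
df["prev_salary"] = prev.fillna(0).astype(int)

print(df)
```

Note that shift() returns the previous recorded row for the employee, not strictly the calendar month before; if months can be missing, the two differ.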
