Sum values by grouped columns

Problem description

I'm very close to figuring out how to sort my dataframe the way I want; I just don't know if there's a cleaner way to do it.

Suppose my dataframe is defined as:

import pandas as pd

dic = {'firstname':['John','John','John','John','John','Susan','Susan',
                    'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
                    'Mike'],
       'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
                   'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
                   'Jones','Jones','Jones'],
       'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
                  'WND','TB','CHP','TB','CHP','TB'],
       'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310],
       'overtime':[205,554,840,100,203,640,978,451,356,779,650,950,230,250,980]}
df = pd.DataFrame(dic)
print(df)

With output:

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith      BK   300       554
2       John    Smith     KFC   250       840
3       John    Smith     KFC   100       100
4       John    Smith     KFC   900       203
5      Susan   Wilson      BK   650       640
6      Susan   Wilson      BK   430       978
7      Susan   Wilson     WND   218       451
8      Susan   Wilson     WND   946       356
9      Susan   Wilson     WND   789       779
10      Mike    Jones      TB   305       650
11      Mike    Jones     CHP   750       950
12      Mike    Jones      TB   140       230
13      Mike    Jones     CHP   860       250
14      Mike    Jones      TB   310       980

My goal is to find out whether anyone made more than $1,300 at a single company, not counting overtime (so only the paid column).

Here is what I tried:

df = df.groupby(['lastname', 'firstname','company']).sum()
s = df['paid']>1300
df['limit']=s
df = df.loc[df['limit']==True]
del df['limit']
df = df.sort_values(by=['paid'],ascending=False)
print(df)
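The same steps can be written without the temporary limit column: select only paid before summing, then index the resulting Series with a boolean mask. This is a minimal sketch that rebuilds the question's data; the names totals and over are illustrative, not taken from the question.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

# Sum only the 'paid' column for each (lastname, firstname, company) group.
totals = df.groupby(['lastname', 'firstname', 'company'])['paid'].sum()

# Keep groups above the threshold with a boolean mask, then sort descending.
over = totals[totals > 1300].sort_values(ascending=False)
print(over)

# A single yes/no answer to "did anyone earn more than $1,300 at one company?"
print((totals > 1300).any())
```

Because the mask is applied directly, there is no helper column to add and then delete, and overtime is never summed.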

Output:

                            paid  overtime
lastname firstname company                
Wilson   Susan     WND      1953      1586
Jones    Mike      CHP      1610      1200
Smith    John      KFC      1450      1348

I'm looking for help cleaning up my code, but I also have a few questions.

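Why are the paid and overtime column headers shifted up one row in the output? Can I move them back down? Also, I only want to sum the paid column and nothing else (it's fine if the overtime column doesn't appear in the final output).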
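On the header question: grouping moves lastname, firstname and company into a MultiIndex, and pandas prints the index names on their own line beneath the column headers, which is why paid and overtime appear one row higher. Calling reset_index() turns the index levels back into ordinary columns. A minimal sketch, assuming the question's data; the names totals and flat are illustrative. Selecting ['paid'] before summing also keeps overtime out of the result.

```python
import pandas as pd

df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

# Sum only 'paid'; the three group keys become a MultiIndex.
totals = df.groupby(['lastname', 'firstname', 'company'])['paid'].sum()

# reset_index() moves the index levels back into regular columns,
# so all headers print on a single line.
flat = totals.reset_index()
print(flat)
```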

This is closer to what I'm looking for:

lastname firstname company paid              
Wilson   Susan     WND     1953
Jones    Mike      CHP     1610
Smith    John      KFC     1450

Tags: python pandas dataframe group-by sum

Solution


Change the groupby line to use as_index=False and agg, keeping your filtering and sorting steps after it:

df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid': 'sum'})

  lastname firstname company  paid
5   Wilson     Susan     WND  1953
0    Jones      Mike     CHP  1610
3    Smith      John     KFC  1450
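Putting the answer together with the filter and sort from the question gives one chained expression. This is a sketch assuming the question's data; .loc with a lambda filters the aggregated frame, reset_index(drop=True) renumbers the rows, and to_string(index=False) hides the row labels so the printout matches the layout the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

result = (
    # as_index=False keeps the group keys as ordinary columns.
    df.groupby(['lastname', 'firstname', 'company'], as_index=False)
      # Aggregate only 'paid'; 'overtime' is dropped from the output.
      .agg({'paid': 'sum'})
      # Keep groups whose total pay exceeds $1,300.
      .loc[lambda d: d['paid'] > 1300]
      .sort_values('paid', ascending=False)
      .reset_index(drop=True)
)
print(result.to_string(index=False))
```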
