How do I sum certain values from a DataFrame under a specific condition using a loop?

Problem description

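I have a dictionary and a DataFrame with two columns: name and salary. I want to add up the salaries of the rows whose name matches each value in the dictionary. This is what I have so far. I want to total the salaries of managers, clerks, and analysts separately.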

import pandas as pd

a = ['manager', 'sales', 'clerk', 'manager', 'analyst', 'sales', 'manager', 'analyst', 'sales', 'clerk', 'clerk', 'analyst']
b = [45000, 78000, 12000, 45000, 96000, 78000, 56000, 95000, 84000, 75000, 95000, 26000]
df = pd.DataFrame({'name':a,'salary':b})

sum = 0
k = 0
c = []

for i in a:
    if i not in c:
        c.append(i)

for j in range(len(df)):
    while k < len(c):
        p = c[k]
        print(p)
        
        d = df[df['name'] == p]['salary'].sum()
        k += 1

Tags: python, pandas, dataframe

Solution


Use GroupBy.sum:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.sum.html

Here is a snippet showing the result:

>>> import pandas as pd
>>> a = ['manager', 'sales', 'clerk', 'manager', 'analyst', 'sales', 'manager', 'analyst', 'sales', 'clerk',  'clerk', 'analyst']
>>> b = [45000, 78000, 12000, 45000, 96000, 78000, 56000, 95000, 84000, 75000, 95000, 26000]
>>> df = pd.DataFrame({'name': a, 'salary': b})
>>> df
       name  salary
0   manager   45000
1     sales   78000
2     clerk   12000
3   manager   45000
4   analyst   96000
5     sales   78000
6   manager   56000
7   analyst   95000
8     sales   84000
9     clerk   75000
10    clerk   95000
11  analyst   26000
>>> df.groupby(['name']).sum()
         salary
name
analyst  217000
clerk    182000
manager  146000
sales    240000
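
The groupby result above totals every name, including sales. If only the manager, clerk, and analyst totals are wanted, one possible approach is to filter the rows first and then group. The sketch below assumes the roles of interest are held in a plain list called wanted (a hypothetical name, not from the question), and it also shows a loop version that stays closer to the original attempt.

```python
import pandas as pd

names = ['manager', 'sales', 'clerk', 'manager', 'analyst', 'sales',
         'manager', 'analyst', 'sales', 'clerk', 'clerk', 'analyst']
salaries = [45000, 78000, 12000, 45000, 96000, 78000,
            56000, 95000, 84000, 75000, 95000, 26000]
df = pd.DataFrame({'name': names, 'salary': salaries})

# Assumption: the roles to total are given as a list (hypothetical name).
wanted = ['manager', 'clerk', 'analyst']

# Keep only rows whose name is in `wanted`, then sum salary per name.
totals = df[df['name'].isin(wanted)].groupby('name')['salary'].sum()
totals_dict = totals.to_dict()
print(totals_dict)

# Loop equivalent: one boolean-mask sum per wanted role.
loop_totals = {}
for role in wanted:
    loop_totals[role] = int(df.loc[df['name'] == role, 'salary'].sum())
print(loop_totals)
```

Both versions should give the same three totals as the groupby output above, minus the sales row. The groupby form makes a single pass over the data, while the loop scans the DataFrame once per role, which is fine for a handful of roles but slower for many.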
