How to group by keys in a dictionary and sum up the values in Python?

Problem description

How can I group by two keys in a dictionary and get the sum of the values stored under a third key, val?

Input:

data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4]}

In this example, I want to group by key1 and key2, and then sum the values in val within each group.

Expected:

data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4], 'val_sum':[1, 2, 7, 7]}

I don't want to convert the dictionary into a pandas.DataFrame and then convert it back to a dictionary to achieve this, because my data is very big.


Update:

To show how val_sum is generated, here is my code using pandas.DataFrame.

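import pandas as pd

df = pd.DataFrame(data)
# Named aggregation; the dict form agg({'val_sum': 'sum'}) raises an error on a grouped Series in pandas >= 1.0
tmp = df.groupby(['key1', 'key2'])['val'].agg(val_sum='sum')
df['val_sum'] = df.set_index(['key1', 'key2']).index.map(tmp.to_dict()['val_sum'])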

And the result is shown as follows:

  key1 key2  val  val_sum
0    a    m    1        1
1    a    n    2        2
2    b    m    3        7
3    b    m    4        7

Tags: python, dictionary, pandas-groupby

Solution


You can build your own summing solution with defaultdict, as shown below.

from collections import defaultdict

data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4]}


keys_to_group = ['key1','key2']

temp = defaultdict(int)  # missing groups start at a sum of zero


for val, *key_group in zip(data['val'], *[data[key] for key in keys_to_group]):
    # key_group is a list such as ['a', 'm'] or ['b', 'm'];
    # convert it to a tuple so it is hashable and can serve as a dict key
    temp[tuple(key_group)] += val

val_sum = [temp[key_group] for key_group in zip(*[data[key] for key in keys_to_group])]

data['val_sum'] = val_sum

print(data)
{'key1': ['a', 'a', 'b', 'b'],
 'key2': ['m', 'n', 'm', 'm'],
 'val': [1, 2, 3, 4],
 'val_sum': [1, 2, 7, 7]}
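
If you need the same operation for other key combinations, the two passes above could be wrapped in a small helper. This is only a sketch: the function name add_group_sum and its parameters are made up for illustration and do not come from any library. It assumes every list in the dictionary has the same length.

```python
from collections import defaultdict


def add_group_sum(data, group_keys, value_key, out_key):
    """Return a copy of `data` with a per-row group sum stored under `out_key`.

    Hypothetical helper, not part of any library.
    """
    # One tuple of group-key values per row, e.g. ('a', 'm')
    row_groups = list(zip(*(data[key] for key in group_keys)))

    # First pass: accumulate the total for each distinct group
    totals = defaultdict(int)
    for group, value in zip(row_groups, data[value_key]):
        totals[group] += value

    # Second pass: broadcast each group's total back to its rows
    result = dict(data)  # shallow copy; the input dict itself is not modified
    result[out_key] = [totals[group] for group in row_groups]
    return result


data = {'key1': ['a', 'a', 'b', 'b'], 'key2': ['m', 'n', 'm', 'm'],
        'val': [1, 2, 3, 4]}
result = add_group_sum(data, ['key1', 'key2'], 'val', 'val_sum')
print(result['val_sum'])
```

Both passes are linear in the number of rows, and the only extra memory is one tuple per row plus one total per distinct group, which may matter if the data is large.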

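That said, your data looks better suited to a tabular structure. If you plan to do more than this one operation, loading it into a DataFrame may make sense anyway.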

