How to group by two keys in a dictionary and sum the values of a third key (val)?


data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4]}

In this example, I want to group by key1 and key2 and then sum the values in val within each group, so that the result looks like this:


data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4], 'val_sum':[1, 2, 7, 7]}

I don't want to convert the dictionary to a pandas.DataFrame and then back to a dictionary to do this, because my actual data is very large.


To show how val_sum is generated, here is my code using pandas.DataFrame:

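import pandas as pd

df = pd.DataFrame(data)
# Named aggregation; recent pandas versions reject the renaming-dict form agg({'val_sum': 'sum'})
tmp = df.groupby(['key1', 'key2'])['val'].agg(val_sum='sum')
# Map each row's (key1, key2) pair to its group total
df['val_sum'] = df.set_index(['key1', 'key2']).index.map(tmp.to_dict()['val_sum'])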

The result is:

  key1 key2  val  val_sum
0    a    m    1        1
1    a    n    2        2
2    b    m    3        7
3    b    m    4        7

You can build your own summing solution with defaultdict, as shown below.

from collections import defaultdict

data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'], 
        'val':[1, 2, 3, 4]}

keys_to_group = ['key1','key2']

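temp = defaultdict(int)  # missing keys start at 0

# Pair each value with its group keys, e.g. (1, 'a', 'm'), and accumulate per-group sums
for val, *key_group in zip(data['val'], *[data[key] for key in keys_to_group]):
    # key_group is a list such as ['a', 'm'] or ['b', 'm']; convert it to a tuple so it is hashable
    temp[tuple(key_group)] += val

# Look up the group total for every row, in the original row order
val_sum = [temp[key_group] for key_group in zip(*[data[key] for key in keys_to_group])]

data['val_sum'] = val_sum

After this, data is: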

{'key1': ['a', 'a', 'b', 'b'],
 'key2': ['m', 'n', 'm', 'm'],
 'val': [1, 2, 3, 4],
 'val_sum': [1, 2, 7, 7]}
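
If you need this for several dictionaries or column combinations, the same two-pass idea can be wrapped in a function. The following is only a sketch: the function name add_group_sum and its parameters (group_keys, value_key, out_key) are made up for illustration, and it assumes every list in data has the same length and that the group values are hashable.

```python
from collections import defaultdict


def add_group_sum(data, group_keys, value_key, out_key):
    """Add a list of per-group sums to a dict of equal-length lists, in place.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Build one tuple of group values per row, e.g. ('a', 'm').
    groups = list(zip(*(data[key] for key in group_keys)))

    # First pass: accumulate one total per distinct group.
    totals = defaultdict(int)
    for group, value in zip(groups, data[value_key]):
        totals[group] += value

    # Second pass: look up the total of each row's group, keeping row order.
    data[out_key] = [totals[group] for group in groups]
    return data


data = {'key1': ['a', 'a', 'b', 'b'], 'key2': ['m', 'n', 'm', 'm'],
        'val': [1, 2, 3, 4]}
add_group_sum(data, ['key1', 'key2'], 'val', 'val_sum')
print(data['val_sum'])  # [1, 2, 7, 7]
```

Storing the group tuples in a list avoids zipping the key lists twice, at the cost of one extra tuple per row. If memory is the main concern with very large data, you could instead call zip again in the second pass, as the code above this sketch does.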

