python - How to group by the keys in a dictionary and sum up the values in Python?
Problem description
How can I group by two keys in a dictionary and get the sum of the values of the other key, val?
Input:
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
'val':[1, 2, 3, 4]}
In this example, I want to group by key1 and key2, and then sum up the values in val.
Expected:
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
'val':[1, 2, 3, 4], 'val_sum':[1, 2, 7, 7]}
I don't want to convert the dictionary data into a pandas.DataFrame and then convert it back to a dictionary to achieve this, because my data is actually very big.
Update:
To help explain how val_sum is generated, here is my code using pandas.DataFrame.
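import pandas as pd
df = pd.DataFrame(data)
# Named aggregation (pandas >= 0.25); the dict-renaming form agg({'val_sum':'sum'}) was removed in pandas 1.0
tmp = df.groupby(['key1', 'key2'])['val'].agg(val_sum='sum')
df['val_sum'] = df.set_index(['key1', 'key2']).index.map(tmp.to_dict()['val_sum'])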
And the result is shown as follows:
  key1 key2  val  val_sum
0    a    m    1        1
1    a    n    2        2
2    b    m    3        7
3    b    m    4        7
Solution
You can build your own summing solution using defaultdict, as shown below.
from collections import defaultdict
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
        'val':[1, 2, 3, 4]}
keys_to_group = ['key1','key2']
temp = defaultdict(int)  # missing groups start with a sum of zero
# First pass: accumulate the total of val for each (key1, key2) group
for i, *key_group in zip(data['val'], *[data[key] for key in keys_to_group]):
    # key_group looks like ['a', 'm'] or ['b', 'm'] and so on
    temp[tuple(key_group)] += i
# Second pass: look up the group total for every row, in the original order
val_sum = [temp[key_group] for key_group in zip(*[data[key] for key in keys_to_group])]
data['val_sum'] = val_sum
print(data)
{'key1': ['a', 'a', 'b', 'b'],
'key2': ['m', 'n', 'm', 'm'],
'val': [1, 2, 3, 4],
'val_sum': [1, 2, 7, 7]}
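The same two-pass idea can be wrapped in a reusable function. This is only a minimal sketch: the function name add_group_sum and its parameters (group_keys, value_key, out_key) are hypothetical names chosen for illustration, not part of the original answer, and it assumes every list in the dictionary has the same length.

```python
from collections import defaultdict


def add_group_sum(data, group_keys, value_key, out_key):
    """Add data[out_key]: for each row, the sum of value_key over its group.

    Hypothetical helper for illustration; it modifies `data` in place.
    """
    # Build the per-row group keys once, e.g. ('a', 'm'), ('a', 'n'), ...
    row_groups = list(zip(*(data[key] for key in group_keys)))

    # First pass: accumulate one running total per distinct group.
    totals = defaultdict(int)
    for group, value in zip(row_groups, data[value_key]):
        totals[group] += value

    # Second pass: look up each row's group total, preserving row order.
    data[out_key] = [totals[group] for group in row_groups]
    return data


data = {'key1': ['a', 'a', 'b', 'b'], 'key2': ['m', 'n', 'm', 'm'],
        'val': [1, 2, 3, 4]}
add_group_sum(data, ['key1', 'key2'], 'val', 'val_sum')
print(data['val_sum'])  # [1, 2, 7, 7]
```

Both passes are linear in the number of rows. The extra memory is one tuple per row plus one total per distinct group, which may still be noticeably lighter than a round trip through a DataFrame.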
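That said, your data seems better suited to a tabular structure. If you plan to do more than this one operation, it may make sense to load it into a DataFrame anyway.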