How to create a new column of values based on the grouped sum of values from two other columns?

Problem description

First of all, apologies if my question isn't worded very well.

I'll use an example DataFrame to illustrate my question.

import pandas as pd
medals = pd.DataFrame({'Year': [2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018], 'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],
                       'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'], 'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3]})

Suppose I have this DataFrame of countries and the number of medals they won at the Olympics:

    Year  Country     Sex  No. of medals
0   2010   Canada  female              2
1   2010   Canada    male              4
2   2010      USA  female              2
3   2010      USA    male              0
4   2010  Germany  female              3
5   2010  Germany    male              0
6   2014   Canada  female              1
7   2014   Canada    male              1
8   2014      USA  female              3
9   2014      USA    male              2
10  2014  Germany  female              4
11  2014  Germany    male              4
12  2018   Canada  female              1
13  2018   Canada    male              3
14  2018      USA  female              2
15  2018      USA    male              2
16  2018  Germany  female              1
17  2018  Germany    male              3 

Now suppose I want to add a column showing the total number of medals that country won in that year:

    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4

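How do I do this? I've already grouped by country and year and computed the sums, but I'm not sure how to map them back onto the Year and Country columns.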

medals.groupby(['Year','Country'])['No. of medals'].sum()

gives me this:

Year  Country
2010  Canada     6
      Germany    3
      USA        2
2014  Canada     2
      Germany    8
      USA        5
2018  Canada     4
      Germany    4
      USA        4
Name: No. of medals, dtype: int64

Any tips and pointers are much appreciated. Thanks!

Tags: python, pandas, pandas-groupby

Solution


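Use groupby with transform('sum'). Unlike sum(), which returns one row per group, transform returns a value for every original row, so the result can be assigned directly as a new column: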

medals['Total medals'] = medals.groupby(['Country', 'Year'])['No. of medals'].transform('sum')
print(medals)




    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
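Since the question asked how to map the asker's own grouped sum back onto the rows, here is a minimal sketch (assuming the same medals data) contrasting the two shapes. groupby(...).sum() collapses to one row per (Year, Country), while transform('sum') keeps the original index. The collapsed Series can also be attached with DataFrame.join on the key columns. The column name 'Total (join)' is hypothetical, used only to compare the two results.

```python
import pandas as pd

# Rebuild the example data from the question (same values, written compactly)
medals = pd.DataFrame({
    'Year': [2010] * 6 + [2014] * 6 + [2018] * 6,
    'Country': ['Canada', 'Canada', 'USA', 'USA', 'Germany', 'Germany'] * 3,
    'Sex': ['female', 'male'] * 9,
    'No. of medals': [2, 4, 2, 0, 3, 0, 1, 1, 3, 2, 4, 4, 1, 3, 2, 2, 1, 3],
})

# sum() collapses to one row per (Year, Country) pair: 9 rows here
totals = medals.groupby(['Year', 'Country'])['No. of medals'].sum()

# Option 1 (the answer above): transform keeps the original 18-row index,
# so it can be assigned directly as a new column
medals['Total medals'] = medals.groupby(['Year', 'Country'])['No. of medals'].transform('sum')

# Option 2: join the collapsed totals back, matching the Series' MultiIndex
# levels against the Year and Country columns (hypothetical column name)
joined = medals.join(totals.rename('Total (join)'), on=['Year', 'Country'])

print(len(totals), len(medals))
print(joined[['Year', 'Country', 'Total medals', 'Total (join)']])
```

transform is usually the shorter route when only the new column is needed; the join (or an equivalent merge) approach is handy when the aggregated table is also useful on its own.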
