Sum a column by the key values in another column

Problem description

I have a pandas DataFrame like this:

        city             country         city_population
0      New York            USA             8300000
1      London              UK              8900000
2      Paris              France           2100000
3      Chicago             USA             2700000
4      Manchester          UK              510000
5      Marseille          France           860000

I want to create a new column, country_population, that holds the sum of the populations of all cities in each country. I tried:

df['Country population'] = df['city_population'].sum().where(df['country'])

But this doesn't work. Any suggestions?

Tags: python, pandas, dataframe

Solution


It sounds like you are looking for groupby:

import pandas as pd

data = {
    'city': ['New York', 'London', 'Paris', 'Chicago', 'Manchester', 'Marseille'],
    'country': ['USA', 'UK', 'France', 'USA', 'UK', 'France'],
    'city_population': [8_300_000, 8_900_000, 2_100_000, 2_700_000, 510_000, 860_000]
}

df = pd.DataFrame.from_dict(data)
# group by country, access 'city_population' column, sum
pop = df.groupby('country')['city_population'].sum()
print(pop)

Output:

country
France     2960000
UK         9410000
USA       11000000
Name: city_population, dtype: int64

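You can then attach this Series to the DataFrame (although this is arguably discouraged, since it stores information redundantly and does not really fit the structure of the original DataFrame):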

# add to existing df
pop = pop.rename('country_population')
# how='left' to preserve original ordering of df
df = df.merge(pop, how='left', on='country')
print(df)

Output:

         city country  city_population  country_population
0    New York     USA          8300000            11000000
1      London      UK          8900000             9410000
2       Paris  France          2100000             2960000
3     Chicago     USA          2700000            11000000
4  Manchester      UK           510000             9410000
5   Marseille  France           860000             2960000
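As an alternative that the original answer does not use, groupby(...).transform('sum') returns the per-country total already aligned row by row with the original DataFrame, so no merge step is needed. A minimal sketch, rebuilding the same sample data:

```python
import pandas as pd

data = {
    'city': ['New York', 'London', 'Paris', 'Chicago', 'Manchester', 'Marseille'],
    'country': ['USA', 'UK', 'France', 'USA', 'UK', 'France'],
    'city_population': [8_300_000, 8_900_000, 2_100_000, 2_700_000, 510_000, 860_000],
}
df = pd.DataFrame(data)

# transform('sum') computes each country's total and broadcasts it back
# to every row of that country, keeping df's original index and order
df['country_population'] = df.groupby('country')['city_population'].transform('sum')
print(df)
```

If you already have the grouped Series pop from above, df['country'].map(pop) gives the same per-row values.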
