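python - Create a new column based on the values of another column in pandas
Question
In a DataFrame, I have a column of country names, and I want to create a new column containing each country's region. For example, if the country is India, the region should be Asia. I tried using np.where, but it looks like I am doing something wrong. Here is the code I tried: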
Region = np.where(country_name == 'US' , "US",
np.where(country_name == ('Brazil' or 'Canada' or 'Peru' or 'Chile') , "Rest of America",
np.where(country_name == ('South Africa 'or 'Egypt' or 'Morocco' or 'Algeria' or 'Ghana'), "Africa",
np.where(country_name == ('Afghanistan'or 'Armenia'or 'Azerbaijan' or 'Bahrain'or'Bangladesh'or 'Bhutan'or
'Brunei'or 'Burma'or 'Cambodia'or 'China'or 'East Timor' or
'Georgia'or 'Hong Kong'or 'India' or 'Indonesia'or 'Iran' or 'Iraq'or 'Israel'or 'Japan'or
'Jordan'or 'Kazakhstan'or 'Kuwait'or 'Kyrgyzstan'or 'Laos'or
'Lebanon'or 'Malaysia' or 'Mongolia'or 'Nepal'or 'North Korea'or 'Oman'or 'Pakistan'|
'Papua New Guinea'or 'Philippines'or 'Qatar'or 'Russia'or 'Saudi Arabia'or 'Singapore'|
'South Korea'or 'Sri Lanka'or 'Syria'or 'Taiwan'or 'Tajikistan'or 'Thailand'or 'Turkey'or 'Turkmenistan'or
'United Arab Emirates'or 'Uzbekistan'or 'Vietnam'or 'Yemen'), "Asia",
np.where(country_name == ('Spain'or 'Italy' or 'Germany'or 'United Kingdom' or'France'), "Europe", "Unchange")))))
Below is the data:
    Entity       Region  Code  Date        Total confirmed deaths (deaths)  Total confirmed cases (cases)
0   Afghanistan  Asia    AFG   2019-12-31  0                                0
1   Afghanistan  Asia    AFG   2020-01-01  0                                0
2   Afghanistan  Asia    AFG   2020-01-02  0                                0
3   Afghanistan  Asia    AFG   2020-01-03  0                                0
4   Afghanistan  Asia    AFG   2020-01-04  0                                0
5   Afghanistan  Asia    AFG   2020-01-05  0                                0
6   Afghanistan  Asia    AFG   2020-01-06  0                                0
7   Afghanistan  Asia    AFG   2020-01-07  0                                0
8   Afghanistan  Asia    AFG   2020-01-08  0                                0
9   Afghanistan  Asia    AFG   2020-01-09  0                                0
10  Afghanistan  Asia    AFG   2020-01-10  0                                0
11  Afghanistan  Asia    AFG   2020-01-11  0                                0
But this code only works for the first country in each group, i.e. only for Brazil, South Africa, Afghanistan and Spain.
Answer
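The behaviour described in the question comes from how Python evaluates `or`: it returns its first truthy operand, so a parenthesised chain of non-empty strings collapses to the first string before `==` is applied. The sketch below illustrates this on a small, hypothetical `country_name` Series (not the asker's data) and contrasts it with `Series.isin`, which tests membership in the whole list.

```python
import pandas as pd

# `or` returns its first truthy operand, so this whole group collapses
# to a single string before any comparison runs.
group = ('Brazil' or 'Canada' or 'Peru' or 'Chile')
print(group)  # Brazil

# Hypothetical sample column, for illustration only.
country_name = pd.Series(['Brazil', 'Canada', 'Peru', 'India'])

# Compares every row against 'Brazil' only.
only_first = country_name == group

# Tests every row for membership in the full list.
all_members = country_name.isin(['Brazil', 'Canada', 'Peru', 'Chile'])

print(only_first.tolist())   # [True, False, False, False]
print(all_members.tolist())  # [True, True, True, False]
```

This is why only Brazil, South Africa, Afghanistan and Spain were matched: each is the first string in its group.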
import numpy as np

# df is the DataFrame from the question; country names are in the 'Entity' column.

# Europe
list_1 = ["Iceland", "Norway", "Sweden", "Finland", "Denmark", "United Kingdom", "Ireland",
          "France", "Belgium", "Netherlands", "Luxembourg", "Monaco", "Portugal", "Spain",
          "Andorra", "Italy", "Malta", "San Marino", "Vatican City", "Germany",
          "Switzerland", "Liechtenstein", "Austria", "Poland", "Czech Republic", "Slovakia",
          "Hungary", "Slovenia", "Croatia", "Bosnia", "Herzegovina", "Serbia", "Montenegro",
          "Albania", "Macedonia", "Romania", "Bulgaria", "Greece", "Estonia", "Latvia",
          "Lithuania", "Belarus", "Ukraine", "Moldova"]
# Rest of America
list_2 = ['Brazil', 'Canada', 'Peru', 'Chile', 'South America']
# Asia (the original was missing a comma after 'Taiwan', which silently
# concatenated it with 'Tajikistan' into a single string)
list_3 = ['Afghanistan', 'Armenia', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Bhutan',
          'Brunei', 'Burma', 'Cambodia', 'China', 'East Timor', 'Georgia', 'Hong Kong',
          'India', 'Indonesia', 'Iran', 'Iraq', 'Israel', 'Japan', 'Jordan', 'Kazakhstan',
          'Kuwait', 'Kyrgyzstan', 'Laos', 'Lebanon', 'Malaysia', 'Mongolia', 'Nepal',
          'North Korea', 'Oman', 'Pakistan', 'Papua New Guinea', 'Philippines', 'Qatar',
          'Saudi Arabia', 'Singapore', 'South Korea', 'Sri Lanka', 'Syria', 'Taiwan',
          'Tajikistan', 'Thailand', 'Turkey', 'Turkmenistan', 'United Arab Emirates',
          'Uzbekistan', 'Vietnam', 'Yemen']
# US
list_4 = ['United States']
# Africa
list_5 = ['South Africa', 'Egypt', 'Morocco', 'Algeria', 'Ghana', 'Africa']

# One boolean mask per region, in the same order as `choices`.
conditions = [
    df['Entity'].isin(list_4),
    df['Entity'].isin(list_2),
    df['Entity'].isin(list_5),
    df['Entity'].isin(list_3),
    df['Entity'].isin(list_1),
]
choices = ['US', "Rest of America", "Africa", "Asia", "Europe"]

# np.select picks the choice for the first matching condition in each row;
# rows that match none of them get the default.
df['Region'] = np.select(conditions, choices, default='Rest of the world')
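As an alternative to `np.select`, the same lookup can be written as a dictionary passed to `Series.map`. The sketch below is only an illustration: `region_by_country` is a hypothetical, shortened lookup table and the four-row `df` is a stand-in for the real data, so the dictionary would need to be extended with the full country lists above.

```python
import pandas as pd

# Hypothetical, shortened lookup table; extend it with the full country lists.
region_by_country = {
    'United States': 'US',
    'Brazil': 'Rest of America',
    'Canada': 'Rest of America',
    'South Africa': 'Africa',
    'Egypt': 'Africa',
    'Afghanistan': 'Asia',
    'India': 'Asia',
    'Spain': 'Europe',
    'France': 'Europe',
}

# Small stand-in for the real data.
df = pd.DataFrame({'Entity': ['Afghanistan', 'Brazil', 'Spain', 'Australia']})

# map() yields NaN for countries missing from the dict;
# fillna() then supplies the default region.
df['Region'] = df['Entity'].map(region_by_country).fillna('Rest of the world')
print(df)
```

A dictionary keeps each country next to its region, which can make typos such as a missing comma or a stray space easier to spot than in long parallel lists.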