python - How to create a new column in pandas and set its values according to whether a second column includes a string from various lists of strings
Problem description
I have a dataframe with values for Turkish provinces:
df['province']
2078982 Adana
2078983 Adana
2078984 Adana
2078985 Adana
2078986 Adana
2210113 Zonguldak
2210114 Zonguldak
2210115 Zonguldak
2210116 Zonguldak
2210117 Zonguldak
I want to write a loop or a function that creates a new column categorizing each of these provinces by region. To do that, I created 7 lists, one per region, each containing the provinces that belong to that region:
aegean = ['Izmir', 'Aydin', 'Manisa', 'Uşak', 'Afyonkarahisar', 'Denizli', 'Kütahya', 'Muğla']
blacksea = ['Amasya', 'Gümüşhane', 'Bartın', 'Bolu', 'Giresun', 'Kastamonu', 'Karabük','Ordu', 'Rize', 'Samsun',
'Sinop', 'Tokat', 'Trabzon', 'Zonguldak', 'Artvin', 'Bayburt', 'Çorum', 'Düzce']
cen_ana= ['Aksaray', 'Kırıkkale', 'Kırşehir', 'Nevşehir', 'Ankara', 'Çankırı', 'Eskisehir', 'Karaman', 'Kayseri', 'Konya', 'Sivas', 'Yozgat']
eas_ana= ['Ağrı', 'Bingöl', 'Elazığ', 'Hakkari', 'Iğdır', 'Kars', 'Tunceli', 'Van', 'Ardahan', 'Erzurum','Şırnak']
marmara=['Edirne', 'Istanbul', 'Kırklareli', 'Kocaeli', 'Tekirdağ', 'Yalova', 'Balıkesir', 'Bilecik', ' Bursa','Çanakkale','Sakarya' ]
medite=['Adana', 'Antalya', 'Mersin', 'Burdur', 'Hatay', 'Isparta', 'Osmaniye','Kahramanmaraş' ]
sou_ana=['Adiyaman', 'Batman','Diyarbakır', 'Gaziantep', 'Siirt', 'Mardin', 'Şanlıurfa']
After that, I loop through the dataset with a for loop and if/elif checks:
for i, row in df.iterrows():
    df['Region']='something'
    if any(e in df["province"] for e in aegean):
        df['Region']=="Aegean Region"
    elif any(q in df["province"] for q in blacksea):
        df['Region']=="Black Sea Region"
    elif any(s in df["province"] for s in cen_ana):
        df['Region']=="Central Anatolia"
    elif any(c in df["province"] for c in eas_ana):
        df['Region']=="Eastern Anatolia"
    elif any(v in df["province"] for v in sou_ana):
        df['Region']=="Southern Anatolia"
    elif any(g in df["province"] for g in marmara):
        df['Region']=="Marmara"
    elif any(h in df["province"] for h in medite):
        df['Region']=="Mediterranean"
    else:
        df['Region']=="Other"
But every row of the new column ends up with the value "something" for some reason.
df['Region']
Out[148]:
2078982 something
2078983 something
2078984 something
2078985 something
2078986 something
2210113 something
2210114 something
2210115 something
2210116 something
2210117 something
Name: Region, Length: 15901, dtype: object
I tried some examples which suggest using a function instead:
def regionaler(x):
    if any(e in df["province"] for e in aegean):
        return "Aegean Region"
    elif any(e in df["province"] for e in blacksea):
        return "Black Sea Region"
    elif any(e in df["province"] for e in cen_ana):
        return "Central Anatolia"
    elif any(e in df["province"] for e in eas_ana):
        return "Eastern Anatolia"
    elif any(e in df["province"] for e in sou_ana):
        return "Southern Anatolia"
    elif any(e in df["province"] for e in marmara):
        return "Marmara"
    elif any(e in df["province"] for e in medite):
        return "Mediterranean"
    else:
        return "Other"
But the result is similarly off for me:
df['Region'] = df.apply(regionaler,axis=1)
df['Region']
Out[151]:
2078982 Other
2078983 Other
2078984 Other
2078985 Other
2078986 Other
2210113 Other
2210114 Other
2210115 Other
2210116 Other
2210117 Other
Name: Region, Length: 15901, dtype: object
I have the feeling that I am making some seriously stupid mistake which can be easily fixed, but I can't figure it out. I would be very grateful to anyone who could help!
Solution
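It may help to first see why both attempts in the question produce a single value for every row. Two things appear to be going on: `e in df["province"]` tests the index labels of the Series rather than its values, and `df['Region']=="Aegean Region"` is a comparison, not an assignment, so its result is simply thrown away. The function version also never uses its argument `x`, so every row gets the same answer. A minimal sketch on a two-row stand-in dataframe (the data here is made up for illustration, reusing two index labels from the question):

```python
import pandas as pd

# Small stand-in for the question's data; not the real dataset.
df = pd.DataFrame(
    {"province": ["Adana", "Zonguldak"]},
    index=[2078982, 2210113],
)

# `in` on a Series tests the index labels, not the values.
label_hit = 2078982 in df["province"]   # an index label, so this is truthy
value_hit = "Adana" in df["province"]   # a value, not a label, so this is falsy

# To test the values element-wise, use isin.
value_mask = df["province"].isin(["Adana"])

# `==` builds a boolean Series; it does not assign anything.
df["Region"] = "something"
comparison = df["Region"] == "Mediterranean"
```

After this runs, `df["Region"]` still holds "something" in every row, which matches the output shown in the question.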
You can do this more simply by using Series.map.
First, create a dict with the region lists, like below (I am using only a sample):
In [2511]: medite=['Adana', 'Antalya', 'Mersin']
In [2508]: blacksea = ['Amasya', 'Gümüşhane', 'Bartın','Zonguldak']
In [2512]: province_map = {'medite': medite, 'blacksea':blacksea}
In [2513]: print(province_map)
Out[2513]:
{'medite': ['Adana', 'Antalya', 'Mersin'],
'blacksea': ['Amasya', 'Gümüşhane', 'Bartın', 'Zonguldak']}
Now invert province_map so that each province becomes a key and its region the value, like below:
In [2514]: d = {i: k for k,v in province_map.items() for i in v}
In [2515]: print(d)
Out[2515]:
{'Adana': 'medite',
'Antalya': 'medite',
'Mersin': 'medite',
'Amasya': 'blacksea',
'Gümüşhane': 'blacksea',
'Bartın': 'blacksea',
'Zonguldak': 'blacksea'}
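One thing to watch when inverting the dict this way: if a province appears in more than one list, the comprehension silently keeps whichever region comes last. A small sketch of a check for that, using a hypothetical sample in which "Adana" is deliberately repeated (that duplicate is not in the original lists):

```python
from collections import Counter

# Hypothetical sample; "Adana" is repeated on purpose to show the check.
province_map = {
    "medite": ["Adana", "Antalya", "Mersin"],
    "blacksea": ["Amasya", "Zonguldak", "Adana"],
}

# Count how many lists each province appears in.
counts = Counter(p for provinces in province_map.values() for p in provinces)
duplicates = sorted(p for p, n in counts.items() if n > 1)

# Same inversion as above: later entries overwrite earlier ones.
d = {i: k for k, v in province_map.items() for i in v}
```

Here `duplicates` lists the repeated province, and `d["Adana"]` ends up as the region from the later list.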
Now use Series.map to create the new column in the dataframe:
In [2518]: df['Region'] = df.province.map(d)
In [2519]: df
Out[2519]:
province Region
2078982 Adana medite
2078983 Adana medite
2078984 Adana medite
2078985 Adana medite
2078986 Adana medite
2210113 Zonguldak blacksea
2210114 Zonguldak blacksea
2210115 Zonguldak blacksea
2210116 Zonguldak blacksea
2210117 Zonguldak blacksea
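Provinces that are missing from the dict come back from `Series.map` as NaN, so the "Other" fallback from the question needs an extra `fillna`. The sketch below also uses the question's display labels as dict keys and strips stray whitespace, since the question's `marmara` list contains `' Bursa'` with a leading space. The lists are abbreviated and the dataframe is made up for illustration; `regions` and `province_to_region` are names chosen here, not from the original.

```python
import pandas as pd

# Abbreviated lists for illustration; labels are the ones used in the question.
regions = {
    "Mediterranean": ["Adana", "Antalya", "Mersin"],
    "Black Sea Region": ["Amasya", "Zonguldak"],
    "Marmara": ["Edirne", " Bursa"],  # leading space kept as in the question's list
}

# Invert to province -> region, stripping stray whitespace from the keys.
province_to_region = {
    province.strip(): region
    for region, provinces in regions.items()
    for province in provinces
}

# Made-up sample data; "Ankara" is absent from the abbreviated lists above.
df = pd.DataFrame({"province": ["Adana", "Zonguldak", "Bursa", "Ankara"]})

# map() yields NaN for provinces missing from the dict; fillna supplies the fallback.
df["Region"] = df["province"].str.strip().map(province_to_region).fillna("Other")
```

With the full seven lists in `regions`, the same two lines should cover the whole dataset, and any misspelled or unlisted province shows up as "Other" rather than NaN.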