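python-3.x - Populating a column based on other column values using multiple if/else conditions
Problem description
I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would look something like this: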
if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'
I read that this can be done with np.where, but my code fails. Can someone tell me what I am doing wrong? This is what I wrote:
masterDf['Match'] = np.where(
masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
masterDf[location] != masterDf[location_new], 'Location didnt match',
True)
The error message TypeError: unsupported operand type(s) for &: 'str' and 'str' doesn't make any sense to me, since '&' is the syntax for 'and'.
dfsample is what I have, and dfFinal is what I want:
dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})
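Solution
To evaluate multiple conditions with numpy, it is better to use numpy.select, where you specify the conditions, the expected output for each condition, and a default output, just like an if-elif-else statement: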
import numpy as np

# Conditions are checked in order; the first one that is True wins.
condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) &
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]
# One label per condition, in the same order as condlist.
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match',
]
# A string default keeps every value in the column the same type as the labels.
dfsample['match'] = np.select(condlist, choicelist, default='True')
print(dfsample)
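Here ne stands for "not equal". You can use != instead, but each comparison must then be wrapped in parentheses, because & binds more tightly than !=. That precedence is what caused the TypeError in the question: Python tried to evaluate masterDf[speciality_new] & masterDf[location] before either comparison.
Output: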
   ID  speciality  location  speciality_new  location_new                                    match
0   1      doctor     texas          doctor         texas                                     True
1   2       nurse        dc           nurse        alaska                     Location didnt match
2   3     patient   georgia        director       georgia                    Specialty didnt match
3   4      driver                     nurse      maryland  Both specialty and location didnt match
4   5    director   florida                       florida                    Specialty didnt match
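For comparison, here is a minimal sketch of the same logic written with != instead of ne, which is the form the question attempted. It rebuilds the sample data from the question so that it runs on its own. The variable names spec_differs and loc_differs are only illustrative, and the string default 'True' is an assumption made so that all values in the column share one type.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
dfsample = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
    'location': ['texas', 'dc', 'georgia', '', 'florida'],
    'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
    'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
})

# Illustrative helper names; each comparison is evaluated on its own line,
# so there is no precedence clash with &.
spec_differs = dfsample['speciality'] != dfsample['speciality_new']
loc_differs = dfsample['location'] != dfsample['location_new']

condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    # Written inline, the parentheses are required: & binds tighter than !=.
    (dfsample['speciality'] != dfsample['speciality_new'])
    & (dfsample['location'] != dfsample['location_new']),
    spec_differs,
    loc_differs,
]
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match',
]

# The first True condition in condlist decides the label for each row,
# so the "both" check has to come before the single-column checks.
dfsample['match'] = np.select(condlist, choicelist, default='True')
print(dfsample['match'].tolist())
```

Note that the empty strings in the sample data are not null, so the first condition never fires here; rows with '' are simply compared as ordinary values.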