Using multiple if/else conditions to populate a column based on other columns' values

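Problem description

I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would look something like this: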

if speciality_new is null and location_new is null then 'No match found'
elif speciality <> speciality_new and location <> location_new then 'both are different'
elif speciality_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'

I read that this can be done with np.where, but my code fails. Can someone tell me what I am doing wrong? This is what I wrote:

masterDf['Match'] = np.where(
    masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
    masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
    masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
    masterDf[location] != masterDf[location_new], 'Location didnt match',
    True)

The error message TypeError: unsupported operand type(s) for &: 'str' and 'str' makes no sense to me, because '&' is the syntax for 'and'.

dfsample is what I have, and dfFinal is what I want:

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
       'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})

Tags: python-3.x, pandas

Solution


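To evaluate multiple conditions with numpy, it is better to use numpy.select. You pass it a list of conditions, the output for each condition, and a default output, just like an if-elif-else statement: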

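import numpy as np
import pandas as pd  # dfsample is the DataFrame defined in the question

# Conditions are checked in order; for each row the first True condition wins.
# Note: isnull() only flags NaN/None, not the empty strings ('') in the sample data.
condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new'])
    & dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]

# One output label per condition, in the same order as condlist.
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match',
]

# A string default keeps every value in the new column a string.
dfsample['match'] = np.select(condlist, choicelist, default='True')
print(dfsample)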

Here ne stands for "not equal" (you can use != instead, as long as each comparison is wrapped in parentheses).
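The parentheses matter because of operator precedence, which is also the likely source of the TypeError in the question: in Python, & binds more tightly than !=, so an unparenthesized expression first tries to apply & to two string columns. The sketch below uses a small hypothetical frame (the name df and its three rows are assumptions for illustration, with column names borrowed from the question) to show the parenthesized form:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, reusing the column names from the question.
df = pd.DataFrame({
    'speciality': ['doctor', 'nurse', 'driver'],
    'speciality_new': ['doctor', 'nurse', 'nurse'],
    'location': ['texas', 'dc', ''],
    'location_new': ['texas', 'alaska', 'maryland'],
})

# Without parentheses, & is evaluated before !=, so Python first attempts
# df['speciality_new'] & df['location'] on two string columns:
#   df['speciality'] != df['speciality_new'] & df['location'] != df['location_new']

# With parentheses, each comparison becomes a boolean Series before & combines them.
spec_diff = df['speciality'] != df['speciality_new']
loc_diff = df['location'] != df['location_new']
both_diff = (df['speciality'] != df['speciality_new']) & (df['location'] != df['location_new'])

df['match'] = np.select(
    [both_diff, spec_diff, loc_diff],
    ['Both specialty and location didnt match',
     'Specialty didnt match',
     'Location didnt match'],
    default='True',
)
print(df)
```

The question's call also passes more than three positional arguments to np.where, which accepts only a condition and two alternatives; np.select is the variant that takes a whole list of conditions.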


Output:

   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match
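In this output the 'No match found' branch never fires, because the sample data marks missing values with empty strings and isnull() does not treat '' as null. If blanks are meant to count as missing (an assumption about the data, not something the question states), one possible sketch is to convert them to NaN first. The two-row frame below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second row has both *_new values blank.
df = pd.DataFrame({
    'speciality': ['doctor', 'nurse'],
    'location': ['texas', 'dc'],
    'speciality_new': ['doctor', ''],
    'location_new': ['texas', ''],
})

# Assumption: an empty string means "no value", so map it to NaN first.
new_cols = ['speciality_new', 'location_new']
df[new_cols] = df[new_cols].replace('', np.nan)

condlist = [
    df['speciality_new'].isnull() & df['location_new'].isnull(),
    df['speciality'].ne(df['speciality_new']) & df['location'].ne(df['location_new']),
]
choicelist = ['No match found', 'Both specialty and location didnt match']

# For each row, np.select returns the choice of the first condition that is True.
df['match'] = np.select(condlist, choicelist, default='True')
print(df)
```

For the second row both conditions are True (NaN compares as not equal to everything), and the first one in the list decides the label, which mirrors how an if-elif chain short-circuits.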
