Invalid syntax in a Python function that creates a new column based on two other conditions

Problem description

I keep getting the error SyntaxError: invalid syntax, and I would like to know (1) why this happens and (2) how to fix my function so that it does what I want.

I have a pandas DataFrame that looks like this:

import pandas as pd

d = {'Relationship': ['Male', 'Female','Spouse','Spouse','Male','Spouse','Male','Male','Male','Spouse','Female'], 'Sex': ['Male', 'Female','Female','Male','Male','Female','Male','Male','Male','Female','Female']}
df = pd.DataFrame(data=d)
df

Relationship    Sex
Male            Male
Female          Female
Spouse          Female
Spouse          Male
Male            Male
Spouse          Female
Male            Male
Male            Male
Male            Male
Spouse          Female
Female          Female

What I want is to replace every instance of Spouse with the sex opposite to the one listed in df['Sex']. So df should look like this:

df

Relationship    Sex
Male            Male
Female          Female
Male            Female
Female          Male
Male            Male
Male            Female
Male            Male
Male            Male
Male            Male
Male            Female
Female          Female

Here is the function I wrote:

def typex(column):
    if column['Relationship']!='Spouse' & column['Sex']! ='Female':
        return 'Male'
    elif column['Relationship']!='Spouse' & column['Sex']! ='Male':
        return 'Female'

df.loc[:,'Relationship'] = df.apply(typex, axis=1)
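The two problems in the condition above can be checked in isolation with the standard-library ast module. This is a minimal sketch; the expressions are shortened from the question's code, and the helper name parses is hypothetical.

```python
import ast


def parses(src: str) -> bool:
    """Return True if src is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


# 1) "! =" with a space is not an operator, so the line cannot even be parsed.
print(parses("column['Sex']! ='Female'"))   # False
print(parses("column['Sex'] != 'Female'"))  # True

# 2) Even with the spacing fixed, & binds tighter than !=, so
#    a != 'Spouse' & b  is read as  a != ('Spouse' & b).
tree = ast.parse("a != 'Spouse' & b", mode="eval")
rhs = tree.body.comparators[0]
print(isinstance(tree.body, ast.Compare), isinstance(rhs.op, ast.BitAnd))  # True True

# At runtime, 'Spouse' & 'Male' is a bitwise AND on two strings, which fails.
try:
    a, b = "Spouse", "Male"
    a != "Spouse" & b
except TypeError as exc:
    print("TypeError:", exc)
```

So fixing the spacing alone is not enough: each comparison also needs its own parentheses (for &) or a switch to and.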

Tags: python pandas for-loop dataframe syntax

Solution


I suggest numpy.select for a vectorized solution:

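import numpy as np

# Spouse rows whose Sex is Female become Male, and vice versa
m1 = (df['Relationship']=='Spouse') & (df['Sex']=='Female')
m2 = (df['Relationship']=='Spouse') & (df['Sex']=='Male')

# Rows matching neither mask keep their existing Relationship value
df['new'] = np.select([m1, m2], ['Male','Female'], default=df['Relationship'])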
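A self-contained sketch of the vectorized approach on the sample data from the question. It assumes the goal is the expected output shown above, so the masks select Spouse rows with ==, and default falls back to the current Relationship value instead of a placeholder string. The result goes into a column named new, as in the answer; assigning to df['Relationship'] instead would overwrite the original column.

```python
import numpy as np
import pandas as pd

d = {
    "Relationship": ["Male", "Female", "Spouse", "Spouse", "Male", "Spouse",
                     "Male", "Male", "Male", "Spouse", "Female"],
    "Sex": ["Male", "Female", "Female", "Male", "Male", "Female",
            "Male", "Male", "Male", "Female", "Female"],
}
df = pd.DataFrame(data=d)

# Boolean masks; each comparison is parenthesized because & binds tighter than ==.
is_spouse = df["Relationship"] == "Spouse"
m1 = is_spouse & (df["Sex"] == "Female")  # Spouse with Sex Female -> Male
m2 = is_spouse & (df["Sex"] == "Male")    # Spouse with Sex Male   -> Female

# np.select takes the choice of the first mask that is True in each row;
# rows matching no mask get the default, here the existing Relationship value.
df["new"] = np.select([m1, m2], ["Male", "Female"], default=df["Relationship"])
print(df)
```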

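The SyntaxError itself comes from "! =": the not-equal operator must be written "!=" with no space. Note also that & binds more tightly than comparison operators, which is why the vectorized version wraps each comparison in parentheses. If you want to keep your function, change & to and, because inside apply(axis=1) each value is a scalar (a single string), not a Series. Compare with == 'Spouse', since it is the spouse rows that need changing: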

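def typex(row):
    # apply(axis=1) passes one row at a time, so these values are scalars and `and` works
    if (row['Relationship']=='Spouse') and (row['Sex']=='Female'):
        return 'Male'
    elif (row['Relationship']=='Spouse') and (row['Sex']=='Male'):
        return 'Female'
    # Non-spouse rows keep their current value (otherwise the function returns None)
    return row['Relationship']

df['new'] = df.apply(typex, axis=1)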
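For comparison, a self-contained sketch of the row-wise version on the question's sample data. The final return of row["Relationship"] is an assumption based on the expected output: without it, the function returns None for every non-spouse row. The last lines illustrate why and is fine per row but not on whole columns.

```python
import pandas as pd

d = {
    "Relationship": ["Male", "Female", "Spouse", "Spouse", "Male", "Spouse",
                     "Male", "Male", "Male", "Spouse", "Female"],
    "Sex": ["Male", "Female", "Female", "Male", "Male", "Female",
            "Male", "Male", "Male", "Female", "Female"],
}
df = pd.DataFrame(data=d)


def typex(row):
    """Return the new Relationship value for one row (row is a Series)."""
    # Each value here is a plain string, so `and` is the right operator.
    if row["Relationship"] == "Spouse" and row["Sex"] == "Female":
        return "Male"
    elif row["Relationship"] == "Spouse" and row["Sex"] == "Male":
        return "Female"
    # Not a spouse: keep the existing value.
    return row["Relationship"]


df["new"] = df.apply(typex, axis=1)
print(df)

# On whole columns, `and` fails: it needs bool() of a Series, which is ambiguous.
try:
    (df["Relationship"] == "Spouse") and (df["Sex"] == "Male")
except ValueError as exc:
    print("ValueError:", exc)
```

The apply version is easier to read for more complex rules, but it calls a Python function once per row, so the np.select version is usually faster on large frames.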
