New pandas DataFrame column based on conditions on multiple columns does not get the expected values

Problem description

I have a pandas DataFrame with the data shown in the table below:

Negative  Positive  Neutral
True      False     False
True      False     False
False     False     True
False     True      False
True      False     False
False     True      False
True      False     False
True      False     False

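I am creating a new column ("Overall") based on conditions: if the row's value in the "Positive" column is True, Overall should be "Positive"; if "Negative" is True, Overall should be "Negative"; otherwise it should be "Neutral":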

def flag_df(df):
    if (df["Negative"] == "True") and (df["Positive"] == "False") and (df["Neutral"] == "False"):
        return "Negative"
    elif (df["Negative"] == "False") and (df["Positive"] == "True") and (df["Neutral"] == "False"):
        return "Positive"
    else :
        return "Neutral"

fdf['Overall'] = fdf.apply(flag_df, axis = 1)

Unfortunately, I can't tell what I'm doing wrong: every observation in the "Overall" column is "Neutral":

Negative    Positive    Neutral     Overall
True           False     False      Neutral
True           False     False      Neutral
False          False     True       Neutral
False          True      False      Neutral
True           False     False      Neutral
False          True      False      Neutral
True           False     False      Neutral
True           False     False      Neutral

Can someone tell me where I went wrong?

Tags: python, python-3.x, pandas, if-statement

Solution


If all columns are boolean and each row always contains exactly one True, you can use DataFrame.dot:

print (df.dtypes)
Negative    bool
Positive    bool
Neutral     bool
dtype: object

df['Overall'] = df.dot(df.columns)
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

If the DataFrame contains more columns, restrict the calculation to a list of column names:

import pandas as pd

cols = ['Negative', 'Positive', 'Neutral']
df['Overall'] = df[cols].dot(pd.Index(cols))

Or:

df1 = df[cols]
df['Overall'] = df1.dot(df1.columns)

Your solution should be changed to use numpy.select:

import numpy as np

# m1: only Negative is True; m2: only Positive is True
m1 = df["Negative"] & ~df["Positive"] & ~df["Neutral"]
m2 = ~df["Negative"] & df["Positive"] & ~df["Neutral"]

df['Overall'] = np.select([m1, m2], ['Negative','Positive'], default='Neutral')
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative
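
As for why the original function labels every row Neutral: assuming the columns hold real booleans (as the dtypes output above suggests), a comparison such as `True == "True"` is always False, so neither of the first two branches can ever match. Below is a minimal sketch of a row-wise fix. `flag_row` is a hypothetical name, and plain dicts stand in for DataFrame rows so that the snippet runs without pandas:

```python
# A boolean never equals the string spelling of itself.
print(True == "True")  # False

def flag_row(row):
    """Return the label for one row; `row` maps column names to booleans."""
    if row["Negative"] and not row["Positive"] and not row["Neutral"]:
        return "Negative"
    elif row["Positive"] and not row["Negative"] and not row["Neutral"]:
        return "Positive"
    else:
        return "Neutral"

# Plain dicts standing in for rows (hypothetical sample data).
rows = [
    {"Negative": True, "Positive": False, "Neutral": False},
    {"Negative": False, "Positive": True, "Neutral": False},
    {"Negative": False, "Positive": False, "Neutral": True},
]
labels = [flag_row(r) for r in rows]
print(labels)  # ['Negative', 'Positive', 'Neutral']
```

On a DataFrame the same function could be passed as fdf.apply(flag_row, axis=1). The numpy.select version avoids a Python call per row and is usually the faster choice on large frames.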

If a row can contain more than one True, append a separator to the column names and then strip the trailing separator:

print (df)
   Negative  Positive  Neutral
0      True     False     True
1      True     False    False
2     False     False     True
3     False      True    False
4      True     False    False
5     False      True    False
6      True     False    False
7      True     False    False


df['Overall'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
   Negative  Positive  Neutral           Overall
0      True     False     True  Negative,Neutral
1      True     False    False          Negative
2     False     False     True           Neutral
3     False      True    False          Positive
4      True     False    False          Negative
5     False      True    False          Positive
6      True     False    False          Negative
7      True     False    False          Negative
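
To see where the trailing separator comes from and why rstrip is needed, here is a plain-Python sketch of the arithmetic for one row with two True values. It relies on bool * str keeping or dropping the string; the names are illustrative only:

```python
labels = ["Negative", "Positive", "Neutral"]
row = [True, False, True]

# Each True keeps "label,", each False contributes an empty string.
joined = "".join(flag * (label + ",") for flag, label in zip(row, labels))
print(joined)   # Negative,Neutral,

# Strip the separator left over after the last matching label.
overall = joined.rstrip(",")
print(overall)  # Negative,Neutral
```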
