What is wrong with this NumPy/Pandas code that builds a new boolean column from the values in two other boolean columns?

Problem description

I have the following dataset:

Starting dataset:

ObjectID,Date,Price,Vol,Mx
101,2017-01-01,,145,203
101,2017-01-02,,155,163
101,2017-01-03,67.0,140,234
101,2017-01-04,78.0,130,182
101,2017-01-05,58.0,178,202
101,2017-01-06,53.0,134,204
101,2017-01-07,52.0,134,183
101,2017-01-08,62.0,148,176
101,2017-01-09,42.0,152,193
101,2017-01-10,80.0,137,150

I first create two new boolean columns, VolPrice and Check, from the values in my starting dataset. I want to create a third column, DoubleCheck, whose value is True if VolPrice OR Check is True, and False otherwise. Initially I got the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

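I then appended .any() to each column in the statement that builds the DoubleCheck column. That does not work either: it produces True throughout the DoubleCheck column, even where the value should be False, as shown below.

Code: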

import pandas as pd
import numpy as np

Observations = pd.read_csv("C:\\Users\\Observations.csv", parse_dates=['Date'], index_col=['ObjectID', 'Date'])

Observations['VolPrice'] = np.where((Observations['Price']<Observations['Vol']) & (Observations['Vol']<Observations['Mx']), True, False)
Observations['Check'] = np.where(Observations['Vol']<Observations['Price'], True, False)
Observations['DoubleCheck'] = np.where((Observations['Check'].any()==True) or (Observations['VolPrice'].any()==True), True, False)

print(Observations)

Current result:

ObjectID,Date,Price,Vol,Mx,VolPrice,Check,DoubleCheck
101,2017-01-01,,145,203,False,False,True
101,2017-01-02,,155,163,False,False,True
101,2017-01-03,67.0,140,234,True,False,True
101,2017-01-04,78.0,130,182,True,False,True
101,2017-01-05,58.0,178,202,True,False,True
101,2017-01-06,53.0,134,204,True,False,True
101,2017-01-07,52.0,134,183,True,False,True
101,2017-01-08,62.0,148,176,True,False,True
101,2017-01-09,42.0,152,193,True,False,True
101,2017-01-10,80.0,137,150,True,False,True

Desired result:

ObjectID,Date,Price,Vol,Mx,VolPrice,Check,DoubleCheck
101,2017-01-01,,145,203,False,False,False
101,2017-01-02,,155,163,False,False,False
101,2017-01-03,67.0,140,234,True,False,True
101,2017-01-04,78.0,130,182,True,False,True
101,2017-01-05,58.0,178,202,True,False,True
101,2017-01-06,53.0,134,204,True,False,True
101,2017-01-07,52.0,134,183,True,False,True
101,2017-01-08,62.0,148,176,True,False,True
101,2017-01-09,42.0,152,193,True,False,True
101,2017-01-10,80.0,137,150,True,False,True

Tags: python, pandas, numpy

Solution

Use | for element-wise (bitwise) OR, which works the same way as & does for element-wise AND:

Observations['DoubleCheck'] = Observations['Check'] | Observations['VolPrice']
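A minimal sketch of why the original line fails, using two small made-up Series (check and vol_price are illustrative names, not the question's data). Python's or asks for the truth value of a whole column, which raises the ValueError, while .any() collapses each column to a single boolean, so np.where receives one scalar and the assignment broadcasts it to every row.

```python
import pandas as pd

# Made-up columns, for illustration only.
check = pd.Series([False, False, False])
vol_price = pd.Series([False, True, True])

# `or` needs one truth value per operand, so a whole Series is ambiguous.
try:
    check or vol_price
    raised = False
except ValueError:
    raised = True

# .any() reduces each column to a single boolean: the result is one value,
# not a column, and it is True as soon as any row in either column is True.
collapsed = check.any() or vol_price.any()

# | works row by row and returns a Series of the same length.
elementwise = check | vol_price

print(raised)
print(collapsed)
print(elementwise.tolist())
```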

Or use DataFrame.any on the two columns:

Observations['DoubleCheck'] = Observations[['Check','VolPrice']].any(axis=1)
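The two forms should give the same result. The sketch below checks this on a small made-up frame (df and its values are assumptions for illustration). The any(axis=1) form may be the more convenient one when more than two boolean columns need to be combined, since the column list can simply be extended.

```python
import pandas as pd

# Made-up boolean columns, for illustration only.
df = pd.DataFrame({
    "Check": [False, False, True],
    "VolPrice": [False, True, False],
})

# Row-wise OR written with the | operator.
by_operator = df["Check"] | df["VolPrice"]

# Row-wise OR written as "any value in this row is True".
by_any = df[["Check", "VolPrice"]].any(axis=1)

print(by_operator.equals(by_any))
print(by_any.tolist())
```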

All of this is also possible without np.where, because the comparisons already return boolean columns:

Observations['VolPrice'] = (Observations['Price']<Observations['Vol']) & (Observations['Vol']<Observations['Mx'])
Observations['Check'] = Observations['Vol']<Observations['Price']
Observations['DoubleCheck'] = Observations['Check'] | Observations['VolPrice']
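As a check against the desired result in the question, the three lines above can be run on the sample data. This sketch assumes the CSV is read from an in-memory string via io.StringIO rather than from the file path in the question, and it uses the shorter name obs. Comparisons against the missing Price values evaluate to False, which is why the first two rows end up False in every boolean column.

```python
import io
import pandas as pd

# The starting dataset from the question, inlined so the example is self-contained.
csv_text = """ObjectID,Date,Price,Vol,Mx
101,2017-01-01,,145,203
101,2017-01-02,,155,163
101,2017-01-03,67.0,140,234
101,2017-01-04,78.0,130,182
101,2017-01-05,58.0,178,202
101,2017-01-06,53.0,134,204
101,2017-01-07,52.0,134,183
101,2017-01-08,62.0,148,176
101,2017-01-09,42.0,152,193
101,2017-01-10,80.0,137,150
"""

obs = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"],
                  index_col=["ObjectID", "Date"])

# Each comparison already yields a boolean Series; NaN compares as False.
obs["VolPrice"] = (obs["Price"] < obs["Vol"]) & (obs["Vol"] < obs["Mx"])
obs["Check"] = obs["Vol"] < obs["Price"]

# Row-wise OR of the two boolean columns.
obs["DoubleCheck"] = obs["Check"] | obs["VolPrice"]

print(obs)
```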
