Python: Create new binary list based on some condition between two pandas series

Problem description

I am trying to compare two lists, one holding known values and the other holding a classifier's predictions, and create a new binary list indicating whether each pair of corresponding elements is "close enough".

I will be using RMSE as a measure of fitness. If the absolute difference between the predicted value and the known value is at most, say, 1, I would like to put a 1 at that position in the new list; if the difference is greater than 1, I would like a 0 at that position.

For example:

y_known = [23, 45, 67, 83]
y_pred = [23, 46, 64, 78]

Should return

binary_array = [1,1,0,0]

I need this to calculate the precision / recall curve of my trained system. I have looked at using lambda expressions but apparently for this type of problem it is more hassle than it is worth. Any suggestions would be greatly appreciated.

UPDATE

This works flawlessly and does exactly what I needed. The original author withdrew the comment, but thanks a lot!

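def createBinaryArray(x, y, k):
    # Both sequences must have the same length so zip() pairs every element.
    assert len(x) == len(y)
    # 1 where the absolute difference is within k, otherwise 0.
    return [1 if abs(a - b) <= k else 0 for a, b in zip(x, y)]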
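As a quick check, the function can be run on the example lists from the question. This is only a sketch: the function is repeated so the snippet runs on its own, and the threshold k=1 is the value assumed in the question.

```python
def createBinaryArray(x, y, k):
    # Both sequences must have the same length so zip() pairs every element.
    assert len(x) == len(y)
    # 1 where the absolute difference is within k, otherwise 0.
    return [1 if abs(a - b) <= k else 0 for a, b in zip(x, y)]


y_known = [23, 45, 67, 83]
y_pred = [23, 46, 64, 78]

# Differences are 0, 1, 3 and 5, so only the first two fall within k=1.
result = createBinaryArray(y_known, y_pred, 1)
print(result)  # [1, 1, 0, 0]
```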

Tags: python, pandas, numpy, scikit-learn, scipy

Solution


You could use

(np.abs(y_known - y_pred) <= 1).astype(int)

With your example input:

In [265]: y_known = np.array([23, 45, 67, 83])

In [266]: y_pred = np.array([23, 46, 64, 78])

In [267]: (np.abs(y_known - y_pred) <= 1).astype(int)
Out[267]: array([1, 1, 0, 0])
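If the threshold needs to vary, the same expression can be wrapped in a small helper. The sketch below is an illustration rather than part of the original answer: the name within_tolerance and the parameter tol are hypothetical, and np.asarray is used so that plain lists are accepted as well as arrays.

```python
import numpy as np


def within_tolerance(y_known, y_pred, tol=1):
    """Return 1 where |y_known - y_pred| <= tol, otherwise 0 (hypothetical helper)."""
    # Convert the inputs so lists, tuples and arrays are all handled the same way.
    y_known = np.asarray(y_known)
    y_pred = np.asarray(y_pred)
    # The boolean mask is cast to integers to give the 0/1 array.
    return (np.abs(y_known - y_pred) <= tol).astype(int)


flags = within_tolerance([23, 45, 67, 83], [23, 46, 64, 78])
print(flags)  # [1 1 0 0]
```

Passing tol=3 would also accept the third pair, whose difference is exactly 3.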

Edit, based on the comments: The same approach works if what you start out with are pandas Series:

In [273]: y_known = pd.Series([23, 45, 67, 83])

In [274]: y_pred = pd.Series([23, 46, 64, 78])

In [278]: ((y_known - y_pred).abs() <= 1).astype(int)
Out[278]:
0    1
1    1
2    0
3    0
dtype: int32
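One caveat with Series: subtraction aligns on index labels rather than on position. The example above is unaffected because both Series use the default index, but if the two indexes differ the result may not be what you expect. The sketch below uses a made-up index for y_pred purely to illustrate this; comparing the underlying arrays with to_numpy() pairs the values by position instead.

```python
import numpy as np
import pandas as pd

y_known = pd.Series([23, 45, 67, 83])
# Hypothetical case: the predictions carry different index labels.
y_pred = pd.Series([23, 46, 64, 78], index=[10, 11, 12, 13])

# Label-aligned subtraction finds no matching labels, so every difference
# is NaN and every comparison is False.
aligned = ((y_known - y_pred).abs() <= 1).astype(int)

# Comparing the underlying arrays ignores the labels and pairs by position.
positional = (np.abs(y_known.to_numpy() - y_pred.to_numpy()) <= 1).astype(int)

print(aligned.sum())  # 0
print(positional)     # [1 1 0 0]
```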
