Numpy: matching two specific columns

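Question

I have a matrix with six columns. I want to find the rows where two of the columns both match a query.

I have been trying to use numpy.where, but I can't get it to match on only those two columns.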

import numpy as np

# Example of the array
x = np.array([
    [860259, 860328, 861277, 861393, 865534, 865716],
    [860259, 860328, 861301, 861393, 865534, 865716],
    [860259, 860328, 861301, 861393, 871151, 871173],
])

print(x)

#Match first column of interest
A = np.where(x[:,2] == 861301)

#Match second column of interest
B = np.where(x[:,3] == 861393)

#rows in both A and B
np.intersect1d(A, B)
#This approach works, but is not column specific for the intersect, leaving me with extra rows I don't want.


#This is the only way I can get Numpy to match the two columns, but
#when I query I will not actually know the values of columns 0, 1, 4 and 5.
#So this approach will not work.
#Specify exactly what the whole row should look like
np.where(np.all(x == [860259, 860328, 861277, 861393, 865534, 865716], axis=1))

#I want something like this (pseudocode, not valid Python),
#where * could be any number. But I think that this approach may be
#inefficient. It would be best to just match columns 2 and 3.
#np.where(all([*, *, 861277, 861393, *, *]))

I am looking for an efficient answer, because I am going through a 150GB HDF5 file.

Thanks for your help!

Tags: python, arrays, numpy

Answer

If I understand correctly, you can use slightly more advanced slicing, like this:

np.where(np.all(x[:,2:4] == [861277, 861393], axis=1))

This gives you only the indices of the rows where those two columns equal [861277, 861393].
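To make the idea concrete, here is a minimal sketch that wraps the comparison in a small helper and then applies it chunk by chunk, which may be useful when the data does not fit in memory. The names `match_rows`, `match_rows_chunked` and `chunk_rows` are hypothetical and not part of NumPy. The chunked version only assumes that `data` has a `.shape` and supports row slicing; an h5py dataset should satisfy that, but this has not been tested against the 150GB file from the question.

```python
import numpy as np

x = np.array([
    [860259, 860328, 861277, 861393, 865534, 865716],
    [860259, 860328, 861301, 861393, 865534, 865716],
    [860259, 860328, 861301, 861393, 871151, 871173],
])


def match_rows(block, cols, values):
    """Return indices of rows in `block` whose columns `cols` equal `values`."""
    # Compare only the selected columns, then require every one of them to match.
    mask = np.all(block[:, cols] == np.asarray(values), axis=1)
    return np.flatnonzero(mask)


def match_rows_chunked(data, cols, values, chunk_rows=100_000):
    """Scan `data` in row chunks so only one chunk is held in memory at a time."""
    hits = []
    for start in range(0, data.shape[0], chunk_rows):
        # Slicing loads just this chunk (assumption: `data` supports row slicing).
        block = np.asarray(data[start:start + chunk_rows])
        # Shift chunk-local indices back to positions in the full dataset.
        hits.append(match_rows(block, cols, values) + start)
    return np.concatenate(hits) if hits else np.array([], dtype=np.intp)


rows_a = match_rows(x, [2, 3], [861277, 861393])
rows_b = match_rows(x, [2, 3], [861301, 861393])
rows_chunked = match_rows_chunked(x, [2, 3], [861301, 861393], chunk_rows=2)
print(rows_a, rows_b, rows_chunked)
```

Passing the columns as a list (`[2, 3]`) rather than a slice (`2:4`) means the same helper also works when the columns of interest are not adjacent.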

