Use numpy to get row indexes for a given column value sorted along another column

Problem description

The question title may be confusing, but here's the problem: I have a 2-dimensional numpy array, and I want the list/array of row indexes whose 1st column has a specific value, ordered so that those rows are sorted along the 2nd column:

import numpy as np

a = np.array([[1,2],[1,3],[1,4],[1,5],[1,6],[2,9],[1,9],[1,7],[2,7],[1,8]])

index = [0, 1, 2, 3, 4, 7, 9, 6] # <---- the solution, I want this list

# indexing with this list gives the rows whose 1st column is 1,
# sorted by the 2nd column; a[index] is:
# array([[1, 2],
#        [1, 3],
#        [1, 4],
#        [1, 5],
#        [1, 6],
#        [1, 7],
#        [1, 8],
#        [1, 9]])

NOTE: I want the index list, not the sorted array for the given value.

What I've currently come up with is the following:

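# sort all rows by 1st column, then by 2nd column
tmp = a[np.lexsort((a[:,1], a[:,0]))]
# keep only the rows whose 1st column equals 1
tmp = tmp[tmp[:,0] == 1]
# look up each sorted row's position in the original array (one full scan per row)
index = [np.where(np.all(a == i, axis=1))[0][0] for i in tmp]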

As you can see, this is pretty bad, and since I'm working with very large data sets, it needs real improvement. Is there any way to accomplish this more efficiently with numpy?

Tags: python, arrays, numpy

Solution


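Here is another approach, using np.unique. The advantage of np.unique is that it can be configured to return the indices along with the sorted array directly. Note that np.unique keeps only the first occurrence of each duplicate row, so this assumes the rows of a are distinct. See the code below: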

# Get the sorted array and indices
tmp = np.unique(a, return_index=True, axis=0)
# Get the indices only where the sorted array's first column equals 1 
index = tmp[1][tmp[0][:,0]==1]
print(index)

Output:

[0 1 2 3 4 7 9 6]
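
For comparison, here is a minimal sketch of a different technique that is not part of the answer above: select the matching rows with a boolean mask, then order them with np.argsort. The helper name sorted_row_indices is hypothetical and introduced only for illustration. Unlike the np.unique approach, this sketch should also keep duplicate rows, and it sorts only the matching rows rather than the whole array.

```python
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4], [1, 5], [1, 6],
              [2, 9], [1, 9], [1, 7], [2, 7], [1, 8]])

def sorted_row_indices(arr, value):
    # hypothetical helper, not from the original answer
    # indices of the rows whose 1st column equals `value`
    rows = np.flatnonzero(arr[:, 0] == value)
    # order those rows by their 2nd column; a stable sort keeps
    # the original row order among equal 2nd-column values
    order = np.argsort(arr[rows, 1], kind="stable")
    return rows[order]

index = sorted_row_indices(a, 1)
print(index)  # [0 1 2 3 4 7 9 6]
```

For the array in the question this gives the same indices as the np.unique version.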
