Sorting a numpy array with lexsort

Question

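I am mainly interested in 2-D arrays of shape Nx3, but the problem also occurs for arrays of shape Nxm with m > 1. Specifically, I want to sort an Nx3 array first by the first column, then by the second, and finally by the third. So, suppose we have an array k given by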

array([[0.90625, 0.90625, 0.15625],
       [0.40625, 0.40625, 0.15625],
       [0.40625, 0.90625, 0.65625],
       [0.15625, 0.90625, 0.40625],
       [0.90625, 0.40625, 0.90625],
       [0.40625, 0.65625, 0.15625],
       [0.40625, 0.65625, 0.65625],
       [0.15625, 0.65625, 0.40625],
       [0.65625, 0.15625, 0.90625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.90625, 0.40625],
       [0.65625, 0.40625, 0.40625],
       [0.15625, 0.15625, 0.90625],
       [0.40625, 0.40625, 0.40625],
       [0.65625, 0.90625, 0.40625],
       [0.90625, 0.15625, 0.40625]])

The desired (sorted) array should be

array([[0.15625, 0.15625, 0.90625],
       [0.15625, 0.65625, 0.40625],
       [0.15625, 0.90625, 0.40625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.15625],
       [0.40625, 0.40625, 0.40625],
       [0.40625, 0.65625, 0.15625],
       [0.40625, 0.65625, 0.65625],
       [0.40625, 0.90625, 0.40625],
       [0.40625, 0.90625, 0.65625],
       [0.65625, 0.15625, 0.90625],
       [0.65625, 0.40625, 0.40625],
       [0.65625, 0.90625, 0.40625],
       [0.90625, 0.15625, 0.40625],
       [0.90625, 0.40625, 0.90625],
       [0.90625, 0.90625, 0.15625]])

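I thought I could achieve this with np.lexsort, but it seems I may be missing something, because it does not work as expected. So far I have been doing the following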

In [28]: k[np.lexsort((k[:,2], k[:,1], k[:,0]))]
Out[28]: 
array([[0.15625, 0.65625, 0.40625],
       [0.15625, 0.15625, 0.90625],
       [0.15625, 0.90625, 0.40625],
       [0.40625, 0.65625, 0.65625],
       [0.40625, 0.90625, 0.40625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.40625],
       [0.40625, 0.90625, 0.65625],
       [0.40625, 0.40625, 0.15625],
       [0.40625, 0.65625, 0.15625],
       [0.65625, 0.15625, 0.90625],
       [0.65625, 0.90625, 0.40625],
       [0.65625, 0.40625, 0.40625],
       [0.90625, 0.40625, 0.90625],
       [0.90625, 0.15625, 0.40625],
       [0.90625, 0.90625, 0.15625]])

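It seems that the first column is sorted correctly but the other columns are not. A similar question has been asked before, but I believe the accepted answer (which is basically what I am doing) does not work.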

From what I understand after researching this further, I think it has to do with the array values being floats.
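A minimal sketch of how floats could produce this symptom, using made-up rows rather than the actual data: two values that print identically at NumPy's default precision can still differ in their last bit, and np.lexsort compares the stored values, not the printed ones. The variable `below` and the three rows are assumptions chosen for illustration.

```python
import numpy as np

# Largest float64 strictly smaller than 0.15625 (differs only in the last bit).
below = np.nextafter(0.15625, 0.0)

# Hypothetical rows: column 0 looks constant when printed, but is not.
k = np.array([
    [0.15625, 0.15625, 0.90625],
    [below,   0.90625, 0.40625],
    [below,   0.65625, 0.40625],
])

# Same call as in the question: the last key (column 0) is the primary key.
order = np.lexsort((k[:, 2], k[:, 1], k[:, 0]))

print(order)     # rows 1 and 2 come first because their column 0 is smaller
print(k[order])  # column 1 then looks unsorted at the default print precision
```

If the data contain this kind of noise, the primary key splits rows that look tied, so the secondary keys are only applied within each truly identical group.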

Edit

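I found the answer to my problem. However, I am adding it as an "edit" rather than posting it as an answer, because I believe the whole situation could have been avoided if I had mentioned the details about the matrix k in the original post. The matrix k is created from another matrix a, where a is essentially created by reading a matrix of floats with 16 decimal places from a file. Now let's look at the workflow that led me to the solution.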

In [6]: k=a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]]

In [7]: k
Out[7]: 
array([[0.15625, 0.15625, 0.40625],
       [0.15625, 0.40625, 0.15625],
       [0.15625, 0.65625, 0.15625],
       [0.15625, 0.90625, 0.15625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.15625],
       [0.40625, 0.65625, 0.15625],
       [0.40625, 0.90625, 0.15625],
       [0.65625, 0.15625, 0.15625],
       [0.65625, 0.40625, 0.15625],
       [0.65625, 0.65625, 0.15625],
       [0.65625, 0.90625, 0.15625],
       [0.90625, 0.15625, 0.15625],
       [0.90625, 0.40625, 0.15625],
       [0.90625, 0.65625, 0.15625],
       [0.90625, 0.90625, 0.15625]])

In [8]: np.random.shuffle(k)

In [9]: k
Out[9]: 
array([[0.15625, 0.90625, 0.15625],
       [0.90625, 0.40625, 0.15625],
       [0.40625, 0.65625, 0.15625],
       [0.90625, 0.90625, 0.15625],
       [0.15625, 0.40625, 0.15625],
       [0.65625, 0.15625, 0.15625],
       [0.40625, 0.90625, 0.15625],
       [0.65625, 0.65625, 0.15625],
       [0.40625, 0.15625, 0.15625],
       [0.90625, 0.65625, 0.15625],
       [0.65625, 0.40625, 0.15625],
       [0.15625, 0.65625, 0.15625],
       [0.65625, 0.90625, 0.15625],
       [0.15625, 0.15625, 0.40625],
       [0.90625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.15625]])

In [10]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[10]: 
array([[0.15625, 0.40625, 0.15625],
       [0.15625, 0.65625, 0.15625],
       [0.15625, 0.90625, 0.15625],
       [0.15625, 0.15625, 0.40625],
       [0.40625, 0.65625, 0.15625],
       [0.40625, 0.90625, 0.15625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.15625],
       [0.65625, 0.15625, 0.15625],
       [0.65625, 0.40625, 0.15625],
       [0.65625, 0.65625, 0.15625],
       [0.65625, 0.90625, 0.15625],
       [0.90625, 0.15625, 0.15625],
       [0.90625, 0.40625, 0.15625],
       [0.90625, 0.65625, 0.15625],
       [0.90625, 0.90625, 0.15625]])

In [11]: k=np.round(k, 5)

In [12]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[12]: 
array([[0.15625, 0.15625, 0.40625],
       [0.15625, 0.40625, 0.15625],
       [0.15625, 0.65625, 0.15625],
       [0.15625, 0.90625, 0.15625],
       [0.40625, 0.15625, 0.15625],
       [0.40625, 0.40625, 0.15625],
       [0.40625, 0.65625, 0.15625],
       [0.40625, 0.90625, 0.15625],
       [0.65625, 0.15625, 0.15625],
       [0.65625, 0.40625, 0.15625],
       [0.65625, 0.65625, 0.15625],
       [0.65625, 0.90625, 0.15625],
       [0.90625, 0.15625, 0.15625],
       [0.90625, 0.40625, 0.15625],
       [0.90625, 0.65625, 0.15625],
       [0.90625, 0.90625, 0.15625]])

In [13]: np.savetxt(sys.stdout, a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]], fmt='%.18f')
0.156250000000000000 0.156250000000000000 0.406250000000000000
0.156249999999999972 0.406250000000000000 0.156250000000000028
0.156249999999999972 0.656250000000000000 0.156250000000000028
0.156249999999999972 0.906250000000000000 0.156250000000000028
0.406250000000000000 0.156249999999999972 0.156250000000000028
0.406250000000000000 0.406250000000000000 0.156250000000000028
0.406249999999999944 0.656250000000000000 0.156250000000000028
0.406249999999999944 0.906250000000000000 0.156250000000000028
0.656250000000000000 0.156249999999999972 0.156250000000000028
0.656250000000000000 0.406249999999999944 0.156250000000000028
0.656250000000000000 0.656250000000000000 0.156250000000000028
0.656250000000000000 0.906250000000000000 0.156250000000000056
0.906250000000000000 0.156249999999999972 0.156250000000000028
0.906250000000000000 0.406249999999999944 0.156250000000000028
0.906250000000000000 0.656250000000000000 0.156250000000000056
0.906250000000000000 0.906250000000000000 0.156250000000000056

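As can be seen above, this is entirely a rounding issue. Everything looks fine when the values are printed to a few decimal places, but when the file was read and the matrix a was created, the values were stored with inaccuracies around the 16th decimal place. These inaccuracies then carried over to k when it was defined from a. So lexsort was giving the correct result all along, given the actual numbers stored in the matrix. Once I rounded the matrix k, everything worked as expected.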
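A variation on the rounding fix, sketched under assumptions: instead of overwriting k with rounded values, the rounded copy can serve only as the sort key, so the rows keep their stored values. The helper name `lexsort_rounded`, the default of 5 decimals, and the sample rows are hypothetical; the number of decimals has to be coarser than the noise but finer than any real difference in the data.

```python
import numpy as np

def lexsort_rounded(arr, decimals=5):
    """Sort the rows of an Nx3 array by columns 0, 1, 2 using rounded keys."""
    keys = np.round(arr, decimals)
    # Same key order as in the question: the last key (column 0) is primary.
    order = np.lexsort((keys[:, 2], keys[:, 1], keys[:, 0]))
    return arr[order]  # rows keep their original, unrounded values

below = np.nextafter(0.15625, 0.0)  # one float64 step under 0.15625
k = np.array([
    [0.15625, 0.15625, 0.40625],
    [below,   0.90625, 0.15625],
    [below,   0.40625, 0.15625],
])

sorted_k = lexsort_rounded(k)
print(sorted_k)
```

With the rounded keys, all three rows tie on column 0, so column 1 decides the order.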

Moral of the story: always check the precision of your values.
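One possible way to run such a check, as a rough sketch on a made-up column: count the distinct values before and after rounding, and print with more digits than the default. The array `col` is an assumption for illustration.

```python
import numpy as np

# Hypothetical column: three entries that all print as 0.15625 by default.
col = np.array([0.15625, np.nextafter(0.15625, 0.0), 0.15625])

n_raw = np.unique(col).size                   # distinct stored values
n_rounded = np.unique(np.round(col, 5)).size  # distinct values after rounding

print(n_raw, n_rounded)

# Temporarily raise the print precision to expose the trailing digits.
with np.printoptions(precision=18):
    print(col)
```

If the two counts differ, some entries that look equal are stored as different floats, and a sort on the raw values will treat them as different keys.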

Tags: python, python-3.x, numpy

Answer


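I don't think numpy is very flexible for this kind of operation, although I can't rule out that some solution exists. I suggest using another package such as pandas or numpy_indexed (assuming data is your array):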

pandas

import pandas as pd

df = pd.DataFrame(data)
# Sort by column 0, then 1, then 2, and convert back to a NumPy array
sorted_data = df.sort_values(by=[0, 1, 2]).to_numpy()

numpy_indexed

import numpy_indexed as npi
npi.sort(data)

Source

For more general use cases, you may want to look at this answer
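If adding a dependency is not an option, the same row ordering can also be obtained with np.lexsort alone for any number of columns. The following is a minimal sketch and not necessarily what the linked answer proposes; `sort_rows` is a hypothetical helper and the random integer data are only for demonstration.

```python
import numpy as np

def sort_rows(data):
    """Return the rows of a 2-D array sorted by column 0, then 1, ..., then m-1."""
    data = np.asarray(data)
    # data.T has one row per column; reversing puts column 0 last,
    # which np.lexsort treats as the primary key.
    return data[np.lexsort(data.T[::-1])]

rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(8, 4))  # small values so that ties occur

result = sort_rows(data)
print(result)
```

For float data read from a file, the rounding caveat from the question still applies: values that differ in the last bit are treated as different keys.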

