python - 将多维数组中的元素映射到其索引

问题描述

我正在使用此处get_tuples(length, total)的函数来生成给定长度和总和的所有元组的数组，示例和函数如下所示。创建数组后，我需要找到一种方法来返回数组中给定数量元素的索引。我可以通过将数组更改为列表来做到这一点，如下所示。但是，此解决方案或同样基于搜索（例如使用）的其他解决方案需要大量时间来查找索引。由于数组中的所有元素（数组.index()np.wheres在示例中）是不同的，我想知道我们是否可以构造一个一对一的映射，即一个函数，使得给定数组中的元素，它通过对值进行一些加法和乘法来返回元素的索引这个元素的。如果可能的话，有什么想法吗？谢谢！

import numpy as np

def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return

    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))

# array s
In [1]: s
Out[1]: 
array([[ 0,  0,  0, 20],
       [ 0,  0,  1, 19],
       [ 0,  0,  2, 18],
       ...,
       [19,  0,  1,  0],
       [19,  1,  0,  0],
       [20,  0,  0,  0]])

#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0,  0,  0, 20],
                            [ 0,  0,  7, 13],
                            [ 0,  5,  5, 10],
                            [ 0,  0,  5, 15],
                            [ 0,  2,  4, 14]])
#change array to list
s_list = s.tolist()

#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]

#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]

标签： pythonarraysnumpyindexing

解决方案

这是一个仅基于元组计算索引的公式，即它不需要看到完整的数组。要计算 N 元组的索引，它需要评估 N-1 个二项式系数。以下实现是（部分）矢量化的，它接受 ND 数组，但元组必须在最后一维中。

import numpy as np
from scipy.special import comb

# unfortunately, comb with option exact=True is not vectorized
def bc(N,k):
    return np.round(comb(N,k)).astype(int)

def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1,N)
    ps = s[...,::-1].cumsum(-1)
    B = bc(ps[...,1:-1]+R,1+R)
    return bc(ps[...,-1]+N,N) - ps[...,0] - 1 - B.sum(-1)

# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return

    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))

# compute each index
r = get_idx(s)

# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")

#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0,  0,  0, 20],
                            [ 0,  0,  7, 13],
                            [ 0,  5,  5, 10],
                            [ 0,  0,  5, 15],
                            [ 0,  2,  4, 14]])

print(get_idx(elements_to_find))

样品运行：

all ok
[  0   7 100   5  45]

如何推导出公式：

使用星形和条形将完整的分区计数#part(N,k)（N 是总数，k 是长度）表示为单个二项式系数(N + k - 1) choose (k - 1)。
从后到前计数：不难验证在 OP 的生成器的外循环的第 i 次完整迭代之后，究竟#part(N-i,k)还没有枚举。实际上，剩下的就是所有分区 p1+p2+... = N with p1>=i; 我们可以写 p1=q1+i 使得 q1+p2+... = Ni 并且后一个分区是无约束的，所以我们可以使用 1. 来计数。

python - 将多维数组中的元素映射到其索引

问题描述

解决方案

推荐阅读