首页 > 解决方案 > 高效地计算矩阵中所有行的运算

问题描述

假设我有一个任意大小(n,m)的二维矩阵 M。现在我想有效地对 M 中的所有行与 M 中的所有其他行进行一些向量操作。就我而言,我想在每一行之间计算按位异或。

我们所知道的是,不需要计算 xnor(x, y) 和 xnor(y, x),计算其中一个就足够了。此外,就我而言,不需要计算 xnor(x, x)。我知道这个问题的两种解决方案:

def xnor(p, q):
    return abs(p-1) * abs(q-1) + (p * q)

def solution_one(M):
    return xor(M[:, None], M).all(axis=2)

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper triangular indices of 
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii,jj):
        results[i,j] = xnor(M[i], M[j]).all(axis=0)

    return results

用一些任意大矩阵运行这两者,我们得到解决方案一

# 100x100
%timeit solution_one(M)
# 3.55 ms ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 500x500
%timeit solution_one(M)
# 1.36 s ± 59.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_one(M)
# 27.4 s ± 603 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

对于解决方案二

# 100x100
%timeit solution_two(M)
# 29.5 ms ± 535 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 500x500
%timeit solution_two(M)
# 1.06 s ± 41.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_two(M)
# 5.58 s ± 95.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

如您所见,对于较小的实例solution_one比 快solution_two,但对于较大的实例则相反。

我们可以看到这solution_one是在做n x n计算,而solution_two会做(正如 Naphat Amundsen 在评论中提到的那样)((n x n) - n) / 2。有更好的解决方案吗?可以以某种方式构造一个新的 M 矩阵,使得解决方案更接近((n x n) - n) / 2计算吗?

标签: pythonnumpymatrix

解决方案


我相信在这种情况下您可以进行的最少计算次数是((n x n) - n) / 2. 如果是这种情况,那么就复杂性而言,您solution_two已经是最佳的了。但是,我相信您可以进行solution_two更多矢量化,这可能会提高运行时性能。这是我的尝试:

import numpy as np

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper triangular indices of
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii,jj):
        results[i,j] = xnor(M[i], M[j]).all(axis=0)

    return results

def solution_two_vectorized(M):
    n = M.shape[0]
    results = np.zeros((n, n))
    ii, jj = np.triu_indices(n, 1)
    results[ii,jj] = xnor(M[ii], M[jj]).all(-1)
    return results

# Quick test checking that your outputs are same as mine on some random matrices
import time
# Generator of random test matrices
Ms = (np.random.randint(-10,10,(np.random.randint(100,200),np.random.randint(100,200))) for i in
      range(100))

vectortimes = []
loopytimes = []

for M in Ms:
    t0 = time.time()
    out_vectorized = solution_two_vectorized(M)
    vectortimes.append(time.time() - t0)

    t0 = time.time()
    out_loopy = solution_two(M)
    loopytimes.append(time.time() - t0)

    if not np.allclose(out_vectorized, out_loopy):
        print("Output mismatch!")
        break
else:
    print("Success!")
# Success!

mean_vectortimes = np.mean(vectortimes)
mean_loopytimes = np.mean(loopytimes)

print("Vector times: ", mean_vectortimes)
# Vector times:  0.017428524494171142
print("Loopy times: ", mean_loopytimes)
# Loopy times:  0.07040918111801148
print(f"Performance improvement: {mean_loopytimes/mean_vectortimes} times")
# Performance improvement: 4.039881926984661 times

推荐阅读