Element-wise addition of multiple sparse matrices

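Question

I have a list of sparse CSR matrices, all of the same shape. I want to add them element-wise so that the resulting matrix stays sparse.

Is there a better way to do this than a loop like the following?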

a = lil_matrix((5,5)).tocsr()

for m in m_list:
    a += m

I have also tried this:

a = np.sum(m_list)

But I have read somewhere that numpy functions should not be mixed with scipy sparse matrices. Is that right?

Tags: python, python-3.x, scipy, python-3.7, sparse-matrix

Answer

Let's experiment.

Make some matrices (the session assumes import numpy as np and from scipy import sparse):

In [30]: mlist = [sparse.random(5,5,.2,'csr')*10 for _ in range(3)]
In [32]: mlist = [(sparse.random(5,5,.2,'csr')*10).astype(int) for _ in range(3)
    ...: ]
In [33]: mlist
Out[33]: 
[<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
 <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
 <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>]
In [34]: [m.A for m in mlist]
Out[34]: 
[array([[0, 0, 3, 0, 0],
        [4, 0, 0, 0, 0],
        [0, 9, 0, 0, 0],
        [7, 0, 6, 0, 0],
        [0, 0, 0, 0, 0]]),
 array([[0, 0, 1, 0, 0],
        [0, 0, 6, 0, 0],
        [8, 0, 0, 0, 0],
        [0, 0, 7, 0, 0],
        [0, 0, 0, 0, 0]]),
 array([[0, 0, 0, 0, 8],
        [0, 0, 0, 0, 0],
        [7, 0, 8, 0, 0],
        [2, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])]

Do the explicit addition (equivalent to the loop):

In [36]: mlist[0]+mlist[1]+mlist[2]
Out[36]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [37]: _.A
Out[37]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

Apply Python's built-in sum:

In [38]: sum(mlist)
Out[38]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [39]: _.A
Out[39]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

np.sum

In [40]: np.sum(mlist)
Out[40]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [41]: _.A
Out[41]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

Both work. Python's sum simply iterates over the list and applies + between the items.
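As a minimal sketch of that behavior, the snippet below compares sum with functools.reduce on three small matrices. The 2x2 values are made up for illustration and are not the matrices from the session above.

```python
from functools import reduce
import operator

import numpy as np
from scipy import sparse

# Three small CSR matrices of the same shape (values chosen arbitrarily).
mlist = [
    sparse.csr_matrix(np.array([[0, 3], [4, 0]])),
    sparse.csr_matrix(np.array([[0, 1], [0, 6]])),
    sparse.csr_matrix(np.array([[8, 0], [0, 0]])),
]

# sum() starts from the integer 0 and applies + left to right:
# ((0 + m0) + m1) + m2
total_builtin = sum(mlist)

# reduce() starts from the first matrix instead: (m0 + m1) + m2
total_reduce = reduce(operator.add, mlist)

print(sparse.issparse(total_builtin), sparse.issparse(total_reduce))
print(total_builtin.toarray())
print(total_reduce.toarray())
```

Either way, every step is an ordinary sparse + sparse addition, so the result stays sparse.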

np.sum first makes an array from the list:

In [42]: np.array(mlist)
Out[42]: 
array([<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
       <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
       <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>], dtype=object)

But since this is an object-dtype array, it also delegates the work to the matrices' own + method (__add__).
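The delegation can be checked directly. This is only a sketch: it builds the object array by hand with np.empty (an assumption made here to keep the construction explicit) and uses made-up 2x2 values.

```python
import numpy as np
from scipy import sparse

mlist = [
    sparse.csr_matrix(np.array([[0, 3], [4, 0]])),
    sparse.csr_matrix(np.array([[0, 1], [0, 6]])),
    sparse.csr_matrix(np.array([[8, 0], [0, 0]])),
]

# Build the 1-D object array explicitly, one matrix per slot.
arr = np.empty(len(mlist), dtype=object)
for i, m in enumerate(mlist):
    arr[i] = m

# Reducing an object array calls + on the elements themselves,
# so the result is whatever the matrices' own addition returns.
total = arr.sum()

print(arr.dtype, arr.shape)
print(sparse.issparse(total))
print(total.toarray())
```

numpy never looks inside the matrices here; it only holds references to them and asks them to add themselves.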

The timings don't differ much:

In [43]: timeit sum(mlist)
421 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [44]: timeit np.sum(mlist)
391 µs ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [45]: timeit mlist[0]+mlist[1]+mlist[2]
334 µs ± 629 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But compare that with adding the dense arrays:

In [46]: timeit mlist[0].A+mlist[1].A+mlist[2].A
25.3 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

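Sparse matrix addition is not very efficient. The format is better suited to matrix multiplication, and even there the fraction of nonzero entries needs to be around 10% or less. I haven't tested how addition performs as a function of sparsity.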
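For anyone who wants to run that test, here is a rough timing harness. It is a sketch under assumptions: time_addition is a hypothetical helper, the matrix size, densities and repeat counts are arbitrary, and the dense arrays are converted once outside the timed region (unlike the .A timing above, which includes the conversion). No results are claimed here; the numbers depend on the machine and library versions.

```python
import timeit

from scipy import sparse


def time_addition(n=500, density=0.01, count=3, number=20, seed=0):
    """Return (sparse_seconds, dense_seconds) per summation of `count` n x n matrices."""
    mats = [
        sparse.random(n, n, density=density, format="csr", random_state=seed + i)
        for i in range(count)
    ]
    dense = [m.toarray() for m in mats]  # converted once, not timed
    t_sparse = timeit.timeit(lambda: sum(mats), number=number) / number
    t_dense = timeit.timeit(lambda: sum(dense), number=number) / number
    return t_sparse, t_dense


for density in (0.001, 0.01, 0.1):
    t_sparse, t_dense = time_addition(density=density)
    print(f"density={density}: sparse {t_sparse * 1e6:.1f} us, "
          f"dense {t_dense * 1e6:.1f} us")
```

Varying n as well as density should show where the sparse format starts to pay off, since dense addition cost grows with n squared regardless of how many entries are nonzero.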

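If you are constructing these csr matrices from coo-style inputs, you might consider combining the inputs first. With coo-style inputs, duplicate entries are summed.

Just to illustrate the idea: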

def foo(mlist):
    data, row, col = [],[],[]
    for m in mlist:
        mc = m.tocoo()
        data.extend(mc.data)
        row.extend(mc.row)
        col.extend(mc.col)
    res = sparse.csr_matrix((data,(row,col)),shape=mc.shape)
    return res

In [55]: foo(mlist)
Out[55]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 11 stored elements in Compressed Sparse Row format>
In [56]: _.A
Out[56]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])
In [57]: timeit foo(mlist)
738 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It's slower than sum, so I won't pursue it further. But it's still an option to keep in mind.
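The same idea can be written with np.concatenate instead of extending Python lists. This is a sketch, not a timed alternative: combine_coo is a hypothetical name, and whether it beats sum will depend on the number and size of the matrices. The three inputs reuse the dense arrays shown in the session above.

```python
import numpy as np
from scipy import sparse


def combine_coo(mlist):
    """Sum same-shaped sparse matrices by concatenating their COO triplets."""
    coos = [m.tocoo() for m in mlist]
    data = np.concatenate([c.data for c in coos])
    row = np.concatenate([c.row for c in coos])
    col = np.concatenate([c.col for c in coos])
    # Duplicate (row, col) pairs are summed when CSR is built from COO-style input.
    return sparse.csr_matrix((data, (row, col)), shape=coos[0].shape)


dense_inputs = [
    np.array([[0, 0, 3, 0, 0], [4, 0, 0, 0, 0], [0, 9, 0, 0, 0],
              [7, 0, 6, 0, 0], [0, 0, 0, 0, 0]]),
    np.array([[0, 0, 1, 0, 0], [0, 0, 6, 0, 0], [8, 0, 0, 0, 0],
              [0, 0, 7, 0, 0], [0, 0, 0, 0, 0]]),
    np.array([[0, 0, 0, 0, 8], [0, 0, 0, 0, 0], [7, 0, 8, 0, 0],
              [2, 0, 0, 0, 0], [0, 0, 0, 0, 0]]),
]
mlist = [sparse.csr_matrix(d) for d in dense_inputs]

res = combine_coo(mlist)
print(res.toarray())
```

The printed array should match the result of sum(mlist) for the same inputs.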

