python - Splitting a sparse matrix by rows
Problem description
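I have a scipy.sparse.csr.csr_matrix of shape (8723, 1741277).
How can I efficiently split it by rows into n chunks?
Ideally the chunks should have roughly the same number of rows.
I say roughly because it depends on whether (number of rows) / (number of chunks) leaves a remainder.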
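I believe this is easy to do for a numpy array with numpy.split, but it does not seem to work for sparse matrices.
Specifically, if I choose a number of chunks n that does not divide 8723 evenly, I get this error: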
ValueError: array split does not result in an equal division
If I choose a number of chunks n that does divide 8723 evenly, I get this error instead:
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
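The reason I want to split the sparse matrix into chunks is that I want to convert it to a (dense) array, but I cannot do that directly because the whole matrix is too large.
Solution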
In [6]: from scipy import sparse
In [7]: M = sparse.random(12,3,.1,'csr')
In [8]: np.split?
In [9]: np.split(M,3)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
55 try:
---> 56 return getattr(obj, method)(*args, **kwds)
57
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __getattr__(self, attr)
687 else:
--> 688 raise AttributeError(attr + " not found")
689
AttributeError: swapaxes not found
During handling of the above exception, another exception occurred:
AxisError Traceback (most recent call last)
<ipython-input-9-11a4dcdd89af> in <module>
----> 1 np.split(M,3)
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
848 raise ValueError(
849 'array split does not result in an equal division')
--> 850 res = array_split(ary, indices_or_sections, axis)
851 return res
852
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in array_split(ary, indices_or_sections, axis)
760
761 sub_arys = []
--> 762 sary = _nx.swapaxes(ary, axis, 0)
763 for i in range(Nsections):
764 st = div_points[i]
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in swapaxes(a, axis1, axis2)
583
584 """
--> 585 return _wrapfunc(a, 'swapaxes', axis1, axis2)
586
587
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
64 # a downstream library like 'pandas'.
65 except (AttributeError, TypeError):
---> 66 return _wrapit(obj, method, *args, **kwds)
67
68
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
44 except AttributeError:
45 wrap = None
---> 46 result = getattr(asarray(obj), method)(*args, **kwds)
47 if wrap:
48 if not isinstance(result, mu.ndarray):
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
If we apply np.array to M, we get a 0-d object array; it is just a naive wrapper around the sparse object.
In [10]: np.array(M)
Out[10]:
array(<12x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>, dtype=object)
In [11]: _.shape
Out[11]: ()
Splitting the proper dense equivalent works:
In [12]: np.split(M.A,3)
Out[12]:
[array([[0. , 0.61858517, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]]), array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]), array([[0. , 0.89573059, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0.02334738],
[0. , 0. , 0. ]])]
And a direct sparse split, by slicing rows:
In [13]: [M[i:j,:] for i,j in zip([0,4,8],[4,8,12])]
Out[13]:
[<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>]
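The hand-written boundaries above can be generalized to an arbitrary number of chunks, including the case where the row count is not evenly divisible. The following is only a minimal sketch: `split_rows` is a hypothetical helper name, not a scipy function. It uses np.array_split on the row indices (which tolerates uneven divisions) rather than on the sparse matrix itself, and then converts each chunk to a dense array one at a time.

```python
import numpy as np
from scipy import sparse

def split_rows(M, n_chunks):
    """Yield consecutive row slices of M in n_chunks nearly equal pieces.

    Hypothetical helper for illustration; not part of scipy.
    """
    # Split the row indices, not M: np.array_split allows uneven divisions.
    for idx in np.array_split(np.arange(M.shape[0]), n_chunks):
        if idx.size:  # skip empty pieces when n_chunks > number of rows
            yield M[idx[0]:idx[-1] + 1, :]

# 13 rows do not divide evenly by 3, so the chunks get 5, 4 and 4 rows.
M = sparse.random(13, 3, density=0.1, format='csr', random_state=0)
chunks = list(split_rows(M, 3))
shapes = [c.shape for c in chunks]

# Each chunk is small enough to densify on its own.
dense_chunks = [c.toarray() for c in chunks]
```

For a matrix as large as the one in the question, it is probably better to iterate over the generator and process one dense chunk at a time instead of collecting them all in a list.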
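Row slices such as M[i:j,:] are less efficient for sparse matrices than for dense ones. Dense slices are views; sparse slices have to be copies. The only exception is the lil format, which has a getrowview method. While there are a number of functions for constructing sparse matrices from blocks, there has not been much need for functions that split them.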
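As an illustration of the block-construction side, here is a small sketch that reassembles row chunks with sparse.vstack and checks that the result matches the original. The matrix size and chunk boundaries are arbitrary choices for the example.

```python
from scipy import sparse

M = sparse.random(12, 3, density=0.1, format='csr', random_state=0)

# Split into three 4-row chunks by slicing; each slice is a copy.
chunks = [M[i:i + 4, :] for i in range(0, 12, 4)]

# Stack the chunks back together along the row axis.
rebuilt = sparse.vstack(chunks, format='csr')

# Number of positions where the rebuilt matrix differs from the original.
n_diff = (rebuilt != M).nnz
```

If the split and the stacking are consistent, `n_diff` should be zero.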
sklearn may have some splitting functionality. It has a number of sparse utility functions that support its own use of sparse matrices.