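python-3.x - Efficient matrix slicing with numpy
Question
I am trying to multiply a submatrix by a subvector. This should be faster than multiplying the whole matrix by the whole vector, but the timings show the opposite: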
B = np.random.randn(26200, 2000)
h = np.random.randn(2000)
%time z = B @ h
CPU times: user 56 ms, sys: 4 ms, total: 60 ms
Wall time: 29.4 ms
%time z = B[:, :256] @ h[:256]
CPU times: user 44 ms, sys: 28 ms, total: 72 ms
Wall time: 54.5 ms
Results with %timeit:
%timeit z = B @ h
100 loops, best of 3: 18.8 ms per loop
%timeit z = B[:, :256] @ h[:256]
10 loops, best of 3: 38.2 ms per loop
Running it again:
%timeit z = B @ h
10 loops, best of 3: 18.7 ms per loop
%timeit z = B[:, :256] @ h[:256]
10 loops, best of 3: 36.8 ms per loop
Is there an efficient way to do this with numpy, or do I need something like TensorFlow to make this kind of slicing efficient?
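Answer
This is a matter of memory layout and access time. By default, arrays are stored row by row, as in C (order='C'). You can instead store the data column by column, as in Fortran (order='F'). That layout suits your restricted problem better, because you select only a few columns and each column is then contiguous in memory.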
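To make the layout difference concrete, here is a minimal sketch that inspects the strides and contiguity flags of the two layouts. The shapes are smaller than in the question and were chosen only to keep the example quick; the names sub_c and sub_f are illustrative.

```python
import numpy as np

# Smaller shapes than in the question, chosen only to keep the example quick.
B = np.random.randn(1000, 200)   # default layout: order='C' (row-major)
BF = np.asfortranarray(B)        # same values, column-major copy

# Strides are the byte steps along (rows, columns); a float64 takes 8 bytes.
print(B.strides)    # (1600, 8): moving down one row skips a full row of 200 values
print(BF.strides)   # (8, 8000): moving down one row moves to the adjacent value

# Slicing the first columns gives views with the same strides as the parent.
sub_c = B[:, :32]
sub_f = BF[:, :32]

# The C-ordered view is contiguous in neither order, because each of its rows
# is separated by the unused columns. The Fortran-ordered view is one block.
print(sub_c.flags['C_CONTIGUOUS'], sub_c.flags['F_CONTIGUOUS'])  # False False
print(sub_f.flags['F_CONTIGUOUS'])                               # True
```

Under these assumptions, the column slice of the Fortran-ordered array can be read as a single contiguous run of memory, which is the situation the answer describes.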
Illustration:
In [107]: BF=np.asfortranarray(B)
In [108]: np.equal(B,BF).all()
Out[108]: True
In [110]: %timeit B@h
78.5 ms ± 20.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [111]: %timeit BF@h
89.3 ms ± 7.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [112]: %timeit B[:,:256]@h[:256]
150 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [113]: %timeit BF[:,:256]@h[:256]
10.5 ms ± 893 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
With the Fortran-ordered array, the execution time scales with the size of the slice.
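If converting the whole matrix is not convenient, a related option (not part of the answer above, so treat it as an assumption) is to copy only the needed columns into a compact array once and reuse that copy. The sketch below only checks that both routes give the same product. The shapes and the value of k are hypothetical, and whether the copy pays off depends on how often the slice is reused, so it is worth measuring with %timeit.

```python
import numpy as np

B = np.random.randn(2000, 500)
h = np.random.randn(500)
k = 64  # hypothetical number of leading columns to keep

# One-time copy of the column slice into its own contiguous block of memory.
B_sub = np.ascontiguousarray(B[:, :k])

z_view = B[:, :k] @ h[:k]   # product computed from the strided view
z_copy = B_sub @ h[:k]      # product computed from the compact copy

print(B_sub.flags['C_CONTIGUOUS'])   # True
print(np.allclose(z_view, z_copy))   # True
```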