python - 如何在python中对包含eigvalsh的复杂代码进行矢量化
问题描述
我有以下代码(对不起,它不是太小,我已经尝试从原始代码中减少它)。
eval_s()
基本上,我在运行以下方法/函数时遇到性能问题:
1) 找到 4x4 厄密矩阵的 4 个特征值eigvalsh()
2) 将特征值的倒数相加成一个变量result
3) 我对由 参数化的许多矩阵重复第 1 步和第 2 步x,y,z
,将累积和存储在 中result
。
我在步骤 3 中重复计算(查找特征值和求和)的次数取决于ksep
我的代码中的一个变量,我需要这个数字来增加我的实际代码(即,ksep
必须减少)。但是其中的计算eval_s()
有一个 for 循环x,y,z
,我猜这真的会减慢速度。[试着ksep=0.5
明白我的意思。]
有没有办法对我的示例代码中指示的方法(或者一般来说,涉及查找参数化矩阵的特征值的函数)进行矢量化?
代码:
import numpy as np
import sympy as sp
import itertools as it
from sympy.abc import x, y, z
class Solver:
def __init__(self, vmat):
self._vfunc = sp.lambdify((x, y, z),
expr=vmat,
modules='numpy')
self._q_count, self._qs = None, [] # these depend on ksep!
################################################################
# How to vectorize this?
def eval_s(self, stiff):
assert len(self._qs) == self._q_count, "Run 'populate_qs' first!"
result = 0
for k in self._qs:
evs = np.linalg.eigvalsh(self._vfunc(*k))
result += np.sum(np.divide(1., (stiff + evs)))
return result.real - 4 * self._q_count
################################################################
def populate_qs(self, ksep: float = 1.7):
self._qs = [(kx, ky, kz) for kx, ky, kz
in it.product(np.arange(-3*np.pi, 3.01*np.pi, ksep),
np.arange(-3*np.pi, 3.01*np.pi, ksep),
np.arange(-3*np.pi, 3.01*np.pi, ksep))]
self._q_count = len(self._qs)
def test():
vmat = sp.Matrix([[1, sp.cos(x/4+y/4), sp.cos(x/4+z/4), sp.cos(y/4+z/4)],
[sp.cos(x/4+y/4), 1, sp.cos(y/4-z/4), sp.cos(x/4 - z/4)],
[sp.cos(x/4+z/4), sp.cos(y/4-z/4), 1, sp.cos(x/4-y/4)],
[sp.cos(y/4+z/4), sp.cos(x/4-z/4), sp.cos(x/4-y/4), 1]]) * 2
solver = Solver(vmat)
solver.populate_qs(ksep=1.7) # <---- Performance starts to worsen (in eval_s) when ksep is reduced!
print(solver.eval_s(0.65))
if __name__ == "__main__":
import timeit
print(timeit.timeit("test()", setup="from __main__ import test", number=100))
ps 代码的 sympy 部分可能看起来很奇怪,但它在我的原始代码中是有目的的。
解决方案
您可以,方法如下:
def eval_s_vectorized(self, stiff):
assert len(self._qs) == self._q_count, "Run 'populate_qs' first!"
mats = np.stack([self._vfunc(*k) for k in self._qs], axis=0)
evs = np.linalg.eigvalsh(mats)
result = np.sum(np.divide(1., (stiff + evs)))
return result.real - 4 * self._q_count
这仍然使 Sympy 表达式的评估未向量化。这部分向量化有点棘手,主要是因为1
输入矩阵中的 s。您可以通过修改来制作代码的完全矢量化版本,Solver
以便将标量常量替换为数组常量vmat
:
import itertools as it
import numpy as np
import sympy as sp
from sympy.abc import x, y, z
from sympy.core.numbers import Number
from sympy.utilities.lambdify import implemented_function
xones = implemented_function('xones', lambda x: np.ones(len(x)))
lfuncs = {'xones': xones}
def vectorizemat(mat):
ret = mat.copy()
# get the first element of the set of symbols that mat uses
for x in mat.free_symbols: break
for i,j in it.product(*(range(s) for s in mat.shape)):
if isinstance(mat[i,j], Number):
ret[i,j] = xones(x) * mat[i,j]
return ret
class Solver:
def __init__(self, vmat):
self._vfunc = sp.lambdify((x, y, z),
expr=vectorizemat(vmat),
modules=[lfuncs, 'numpy'])
self._q_count, self._qs = None, [] # these depend on ksep!
def eval_s_vectorized_completely(self, stiff):
assert len(self._qs) == self._q_count, "Run 'populate_qs' first!"
evs = np.linalg.eigvalsh(self._vfunc(*self._qs.T).T)
result = np.sum(np.divide(1., (stiff + evs)))
return result.real - 4 * self._q_count
def populate_qs(self, ksep: float = 1.7):
self._qs = np.array([(kx, ky, kz) for kx, ky, kz
in it.product(np.arange(-3*np.pi, 3.01*np.pi, ksep),
np.arange(-3*np.pi, 3.01*np.pi, ksep),
np.arange(-3*np.pi, 3.01*np.pi, ksep))])
self._q_count = len(self._qs)
测试/计时
对于小型ksep
,矢量化版本比原始版本快约 2 倍,完全矢量化版本快约 20 倍:
# old version for ksep=.3
import timeit
print(timeit.timeit("test()", setup="from __main__ import test", number=10))
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
-85240.46154500882
118.42847006605007
# vectorized version for ksep=.3
import timeit
print(timeit.timeit("test()", setup="from __main__ import test", number=10))
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
64.95763925800566
# completely vectorized version for ksep=.3
import timeit
print(timeit.timeit("test()", setup="from __main__ import test", number=10))
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
-85240.46154498367
5.648927717003971
矢量化版本的结果中的舍入误差与原始版本略有不同。这可能是由于result
计算总和的方式不同。
推荐阅读
- phpstorm - 是否有人为 PhpStorm 配置了 xml php 文件?
- linux - 在内核版本 3.16.x 中实现自定义系统调用
- python-3.x - 多页 Tiff 图像的 PyTesseract 错误
- asp.net - ASP.NET 上的单页授权绕过问题
- c# - 如何从 DataTemplate 中绑定到 ViewModel 上的属性?
- neo4j - 通过 Cypher 中的聚合计算进行过滤
- assembly - 如何在汇编中使用缓冲区(GNU asm)?
- python - while循环中的变量赋值
- wolfram-mathematica - 如何在 Wolfram Mathematica 中的等高线图和梯度矢量场上绘制 3-D 图
- angular - 错误渲染的条形图 Chart.js