Inference speed of a simple model in TensorFlow 2.0 (CPU)? Vanilla TensorFlow/NumPy still 50x faster?

Question

I find that tensorflow keras (2.0) is still slow compared with numpy. Is tfdeploy still in use? What are the options for getting close to numpy speed at inference?

Update:

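Here is a cleaned-up example, run on a different machine, that compares three approaches: calling the Keras model directly, calling it through tf.function, and chaining the matrix products by hand (in TensorFlow and in NumPy). The tf.function-wrapped TensorFlow call now takes about 8x as long as vanilla NumPy, which may be reasonable, although it still looks like a fair amount of overhead.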

import tensorflow as tf
import numpy as np

x = tf.convert_to_tensor(np.random.randn(1, 12).astype(np.float32))  # deliberately a single row
xnumpy = x.numpy()
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, use_bias=False) for i in range(10)
    ])

@tf.function
def f(x):
    return model(x)


def chain_layers_tf(x, kernels):
    y = x
    for kernel in kernels:
        y = tf.matmul(y, kernel)
    return y

# Call once so the model is built and the tf.function is traced before timing.
f(x)
chain_layers_tf_wrapped = tf.function(chain_layers_tf)

# Use `layer` as the loop variable so it does not shadow the input tensor `x`.
kernels_tf = [layer.kernel for layer in model.layers]
kernels_np = [layer.kernel.numpy() for layer in model.layers]

chain_layers_tf_wrapped(x, kernels_np)
chain_layers_tf_wrapped(x, kernels_tf)

def chain_layers_np(x, kernels):
    y = x
    for kernel in kernels:
        y = y.dot(kernel)
    return y

model(x)

# The timings below were taken in IPython, with the names above accessed through `t`.

# In [107]: %timeit t.model(t.x)
#      ...: %timeit t.f(t.x)
#      ...: %timeit t.chain_layers_tf(t.x, t.kernels_tf)
#      ...: %timeit t.chain_layers_tf_wrapped(t.x, t.kernels_tf)
#      ...: %timeit t.chain_layers_np(t.xnumpy, t.kernels_np)
# 1.51 ms ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 131 µs ± 856 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 491 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 417 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 5.21 µs ± 7.56 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
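
The NumPy baseline above can also be reproduced without IPython or TensorFlow. The following is a minimal sketch using the standard-library `timeit` module. It assumes random float32 kernels with the same shapes as the model (one 12x16, then nine 16x16) in place of the Keras weights, and `time_us` is a hypothetical helper, so the absolute numbers will differ from the ones reported here.

```python
import timeit
import numpy as np

def chain_layers_np(x, kernels):
    """Apply a stack of bias-free dense layers as plain matrix products."""
    y = x
    for kernel in kernels:
        y = y.dot(kernel)
    return y

def time_us(fn, number=10000, repeat=7):
    """Return (mean, std) of the per-call time in microseconds, similar to %timeit."""
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = np.array(runs) / number * 1e6
    return per_call.mean(), per_call.std()

rng = np.random.default_rng(0)
# Assumed shapes: 12 input features, ten layers of 16 units each.
shapes = [(12, 16)] + [(16, 16)] * 9
kernels = [rng.standard_normal(s).astype(np.float32) for s in shapes]
x = rng.standard_normal((1, 12)).astype(np.float32)  # one row, as in the question

out = chain_layers_np(x, kernels)
mean_us, std_us = time_us(lambda: chain_layers_np(x, kernels))
print(f"{mean_us:.2f} µs ± {std_us:.2f} µs per loop")
```

The lambda adds a small constant cost per call, so this measures slightly more than the bare function would under `%timeit`.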

Previous notes:


Update:

It is not just numpy, as above: plain tensorflow is also faster than tensorflow.keras.

In [22]: x = np.random.randn(100, 12).astype(np.float32)

In [23]: a = np.random.randn(12, 16).astype(np.float32)

In [24]: %timeit tf.matmul(x, a)
39.3 µs ± 98.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
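
This measurement uses 100 rows, while the cleaned-up example uses a single row, so it can help to check how much of each call is fixed per-call cost rather than arithmetic. Below is a rough NumPy-only sketch, assuming the same 12x16 kernel shape; `per_call_us` is a hypothetical helper, and no particular timings are implied.

```python
import timeit
import numpy as np

def per_call_us(fn, number=20000):
    """Average wall-clock time of one call to fn, in microseconds."""
    return timeit.timeit(fn, number=number) / number * 1e6

rng = np.random.default_rng(0)
a = rng.standard_normal((12, 16)).astype(np.float32)

results = {}
for rows in (1, 100):
    x = rng.standard_normal((rows, 12)).astype(np.float32)
    # Time the same matrix product for each batch size.
    results[rows] = per_call_us(lambda: x.dot(a))
    print(f"{rows:>3} rows: {results[rows]:.2f} µs per call, "
          f"{results[rows] / rows:.3f} µs per row")
```

If the per-call time grows much more slowly than the row count, the single-row case is dominated by call overhead, which is the regime the question is about.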


Update:

Adding more layers does not change the picture much.


Tags: numpy, keras, tensorflow2.0

Solution

