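tensorflow - Using tf.map_fn with multiple GPUs
Problem description
I am trying to extend my single-GPU TensorFlow code to multiple GPUs. I have to work over 3 degrees of freedom, and unfortunately I need tf.map_fn to parallelize over the third one. I tried the device placement shown in the official documentation, but it does not seem to work with tf.map_fn. Is there a way to run tf.map_fn on multiple GPUs?
Here is the error output: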
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'map_1/TensorArray_1': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:1'
Colocation Debug Info:
Colocation group had the following types and devices:
TensorArrayGatherV3: GPU CPU
Range: GPU CPU
TensorArrayWriteV3: GPU CPU
TensorArraySizeV3: GPU CPU
MatMul: GPU CPU
Enter: GPU CPU
TensorArrayV3: GPU CPU
Const: GPU CPU
Colocation members and user-requested devices:
map_1/TensorArrayStack/range/delta (Const)
map_1/TensorArrayStack/range/start (Const)
map_1/TensorArray_1 (TensorArrayV3)
map_1/while/TensorArrayWrite/TensorArrayWriteV3/Enter (Enter) /device:GPU:1
map_1/TensorArrayStack/TensorArraySizeV3 (TensorArraySizeV3)
map_1/TensorArrayStack/range (Range)
map_1/TensorArrayStack/TensorArrayGatherV3 (TensorArrayGatherV3)
map_1/while/MatMul (MatMul) /device:GPU:1
map_1/while/TensorArrayWrite/TensorArrayWriteV3 (TensorArrayWriteV3) /device:GPU:1
[[Node: map_1/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=<unknown>, identical_element_shapes=true, tensor_array_name=""](map_1/TensorArray_1/size)]]
Here is a simple code example that reproduces it:
```python
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_uniform)
import numpy

rc = 1000

sess = tf.Session()

for deviceName in ['/cpu:0', '/device:GPU:0', '/device:GPU:1']:
    with tf.device(deviceName):
        matrices = tf.random_uniform([rc,rc,4],minval = 0, maxval = 1, dtype = tf.float32)

        def mult(i):
            product = tf.matmul(matrices[:,:,i],matrices[:,:,i+1])
            return product

        mul = tf.zeros([rc,rc,3], dtype = tf.float32)
        mul = tf.map_fn(mult, numpy.array([0,1,2]), dtype = tf.float32, parallel_iterations = 10)

m = sess.run(mul)
```
Solution
What you are trying to do can be accomplished with a batched matmul. Consider the following variation:
```python
import tensorflow as tf
import numpy
import time
import numpy as np

rc = 1000

sess = tf.Session()

#compute on cpu for comparison later
vals = np.random.uniform(size=[rc,rc,4]).astype(np.float32)
mat1 = tf.identity(vals)
mat2 = tf.transpose(vals, [2, 0, 1])

#store mul in array so all are fetched in run call
muls = []

#I only have one GPU.
for deviceName in ['/cpu:0', '/device:GPU:0']:
    with tf.device(deviceName):

        def mult(i):
            product = tf.matmul(mat1[:,:,i],mat1[:,:,i+1])
            return product

        mul = tf.zeros([rc,rc,3], dtype = tf.float32)
        mul = tf.map_fn(mult, numpy.array([0,1,2]), dtype = tf.float32, parallel_iterations = 10)
        muls.append(mul)

#use transposed mat with a shift to matmul in one go
mul = tf.matmul(mat2[:-1], mat2[1:])

print(muls)
print(mul)

start = time.time()
m1 = sess.run(muls)
end = time.time()

print("muls:", end - start)

start = time.time()
m2 = sess.run(mul)
end = time.time()

print("mul:", end - start)

print(np.allclose(m1[0],m1[1]))
print(np.allclose(m1[0],m2))
print(np.allclose(m1[1],m2))
```
The results on my machine are:
[<tf.Tensor 'map/TensorArrayStack/TensorArrayGatherV3:0' shape=(3, 1000, 1000) dtype=float32>, <tf.Tensor 'map_1/TensorArrayStack/TensorArrayGatherV3:0' shape=(3, 1000, 1000) dtype=float32>]
Tensor("MatMul:0", shape=(3, 1000, 1000), dtype=float32)
muls: 0.4262731075286865
mul: 0.3794088363647461
True
True
True
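The one-line `tf.matmul(mat2[:-1], mat2[1:])` works because moving the slice axis to the front turns the third dimension into a batch dimension, and shifting the stack by one pairs each slice with its successor. The sketch below checks that equivalence with NumPy on small random inputs, so it runs without TensorFlow or a GPU. It also sketches one possible way to spread that batch over several devices, which is what the question was after: split the pair indices into contiguous chunks and run one batched matmul per chunk. The device names, the chunking scheme, and all variable names are assumptions for illustration only; in TensorFlow 1.x each chunk's `tf.matmul` would presumably be built inside its own `tf.device(...)` block, which is not shown or tested here.

```python
import numpy as np

# Small sizes so the check runs instantly (the answer uses rc = 1000).
rc, depth = 3, 5
rng = np.random.default_rng(0)
vals = rng.uniform(size=(rc, rc, depth)).astype(np.float32)

# What the tf.map_fn version computes: one matmul per pair of
# consecutive slices along the last axis.
looped = np.stack([vals[:, :, i] @ vals[:, :, i + 1] for i in range(depth - 1)])

# Batched version: move the slice axis to the front, then multiply the
# stack by itself shifted by one. matmul treats the leading axis as a batch.
stacked = np.transpose(vals, (2, 0, 1))          # (depth, rc, rc)
batched = np.matmul(stacked[:-1], stacked[1:])   # (depth - 1, rc, rc)

# Hypothetical multi-device split: each "device" gets a contiguous range of
# pair indices. Here the chunks are simply computed one after another;
# in a real graph each chunk would be placed on its own device.
devices = ["/device:GPU:0", "/device:GPU:1"]     # placeholder device names
pair_chunks = np.array_split(np.arange(depth - 1), len(devices))
partials = [np.matmul(stacked[idx], stacked[idx + 1]) for idx in pair_chunks]
sharded = np.concatenate(partials, axis=0)

print(batched.shape)                    # (4, 3, 3)
print(np.allclose(looped, batched))     # True
print(np.allclose(batched, sharded))    # True
```

Contiguous chunks keep each device's work a single batched matmul, and concatenating along axis 0 restores the original pair order.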
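You rarely want to use the CPU synchronously with the GPU, because it becomes the bottleneck: the GPU ends up waiting for the CPU to finish. If you use the CPU for anything, it should run asynchronously with the GPU so that both can work at full speed.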
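The asynchronous arrangement described above is essentially a producer/consumer pipeline: the CPU prepares the next piece of work while the accelerator processes the current one, with a bounded buffer between them. The following is a minimal, framework-free sketch that assumes two Python threads stand in for the CPU and GPU stages; `produce`, `consume`, and the placeholder arithmetic are hypothetical and only show the pattern, not real performance. In TensorFlow this overlap is usually handled by the input pipeline (for example `tf.data.Dataset.prefetch`) rather than by hand-written threads.

```python
import queue
import threading


def produce(batches, q):
    """Stand-in for CPU work (e.g. preprocessing); pushes each batch when ready."""
    for batch in batches:
        q.put([x * 2 for x in batch])  # placeholder "CPU" step
    q.put(None)                        # sentinel: no more batches


def consume(q, results):
    """Stand-in for the GPU step; takes the next batch as soon as it exists."""
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(sum(batch))     # placeholder "GPU" step


batches = [[1, 2], [3, 4], [5, 6]]
q = queue.Queue(maxsize=2)  # bounded buffer: producer stays at most 2 batches ahead
results = []

producer = threading.Thread(target=produce, args=(batches, q))
consumer = threading.Thread(target=consume, args=(q, results))
producer.start()
consumer.start()
producer.join()
consumer.join()

print(results)  # [6, 14, 22]
```

The bounded queue is the design choice that matters: neither side blocks the other until the buffer is full or empty, which is the "both run at full speed" behaviour the answer recommends.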