python-3.x - Inference with a pb model on multiple GPUs in TensorFlow
Problem description
I am using a server with 8 Titan X GPUs and trying to predict images faster than with a single GPU. I load the pb model like this:
import json
import os
import time

import numpy as np
import tensorflow as tf

model_dir = "./model"
model = "nasnet_large_v1.pb"
model_path = os.path.join(model_dir, model)

# Load the frozen graph and grab the input/output tensors
model_graph = tf.Graph()
with model_graph.as_default():
    with tf.gfile.GFile(model_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
        input_layer = model_graph.get_tensor_by_name("input:0")
        output_layer = model_graph.get_tensor_by_name('final_layer/predictions:0')
Then I start iterating over the files in the ./data_input directory, like this:
with tf.Session(graph=model_graph, config=config) as inference_session:
    # Initialize session
    initializer = np.zeros([1, 331, 331, 3])
    print("Initialing session...")
    inference_session.run(output_layer, feed_dict={input_layer: initializer})
    print("Done initialing.")

    # Prediction
    file_list = []
    processed_files = []
    for path, dir, files in os.walk('./model_output/processed_files'):
        for file in files:
            processed_files.append(file.split('_')[0]+'.tfrecord')
    print("Processed files: ")
    for f in processed_files:
        print('\t', f)

    while True:
        for path, dir, files in os.walk("./data_input"):
            for file in files:
                if file == '.DS_Store': continue
                if file in processed_files: continue
                print("Reading file {}".format(file))
                file_path = os.path.join('./data_input', file)
                file_list.append(file_path)
                res = predict(file_path)
                processed_files.append(file)
                with open('./model_output/processed_files/{}_{}_processed_files.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                    f.write(json.dumps(processed_files))
                with open('./model_output/classify_result/{}_{}_classify_result.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                    f.write(json.dumps(res, indent=4, separators=(',',':')))
        time.sleep(1)
Inside the predict() function, I wrote the following:
def predict(filename):
    label_map = get_label()
    # read tfrecord file by tf.data
    dataset = get_dataset(filename)
    # dataset.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
    # load data
    iterator = dataset.make_one_shot_iterator()
    features = iterator.get_next()
    result = []
    content = {}
    count = 0
    # session
    with tf.Session() as sess:
        tf.global_variables_initializer()
        t1 = time.time()
        try:
            while True:
                [_image, _label, _filepath] = sess.run(fetches=features)
                _image = np.asarray([_image])
                _image = _image.reshape(-1, 331, 331, 3)
                predictions = inference_session.run(output_layer, feed_dict={input_layer: _image})
                predictions = np.squeeze(predictions)
                # res = []
                for i, pred in enumerate(predictions):
                    count += 1
                    overall_result = np.argmax(pred)
                    predict_result = label_map[overall_result].split(":")[-1]
                    if predict_result == 'unknown': continue
                    content['prob'] = str(np.max(pred))
                    content['label'] = predict_result
                    content['filepath'] = str(_filepath[i], encoding='utf-8')
                    result.append(content)
        except tf.errors.OutOfRangeError:
            t2 = time.time()
            print("{} images processed, average time: {}s".format(count, (t2-t1)/count))
    return result
I tried wrapping the model-loading part, the inference-session part, and the session part in with tf.device('/gpu:{}'.format(i)), but nvidia-smi shows only GPU 0 running at 100% utilization, while the other GPUs do not seem to do any work even though memory is allocated on them.
What should I do to get all the GPUs running at the same time and speed up prediction?
My code is available at https://github.com/tzattack/image_classification_algorithms.
Solution
You can force the device for every node in the graph like this:
def load_network(graph, i):
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        for node in od_graph_def.node:
            node.device = '/gpu:{}'.format(i) if i >= 0 else '/cpu:0'
    return {"od_graph_def": od_graph_def}
You can then merge the graphs obtained this way (one per GPU) into a single graph, renaming the tensors if you use the same model for all GPUs, and run everything in one session.
Works perfectly for me.
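A minimal sketch of that idea, assuming the same nasnet_large_v1.pb is reused on every GPU and that the load_network() helper above is in scope; NUM_GPUS, the tower_ name prefix, and the dummy batch are illustrative and not part of the original answer:

import numpy as np
import tensorflow as tf

NUM_GPUS = 8
model_path = "./model/nasnet_large_v1.pb"

merged_graph = tf.Graph()
with merged_graph.as_default():
    for i in range(NUM_GPUS):
        # load_network() pins every node of this copy to /gpu:i
        gpu_graph_def = load_network(model_path, i)["od_graph_def"]
        # a distinct name prefix keeps the per-GPU copies from colliding
        tf.import_graph_def(gpu_graph_def, name='tower_{}'.format(i))

# per-GPU input/output tensors, e.g. tower_0/input:0 ... tower_7/input:0
input_layers = [merged_graph.get_tensor_by_name('tower_{}/input:0'.format(i))
                for i in range(NUM_GPUS)]
output_layers = [merged_graph.get_tensor_by_name('tower_{}/final_layer/predictions:0'.format(i))
                 for i in range(NUM_GPUS)]

config = tf.ConfigProto(allow_soft_placement=True)  # let CPU-only ops fall back
with tf.Session(graph=merged_graph, config=config) as sess:
    # split one big batch of shape [NUM_GPUS * n, 331, 331, 3] across the towers
    batch = np.zeros([NUM_GPUS * 4, 331, 331, 3], dtype=np.float32)  # dummy input
    chunks = np.split(batch, NUM_GPUS)
    feed = dict(zip(input_layers, chunks))
    predictions = sess.run(output_layers, feed_dict=feed)  # all towers run in one session call

Whether this scales well depends on the input pipeline: feeding all towers through a single feed_dict per step means host-to-device copies can become the bottleneck before the GPUs do.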