TensorFlow Object Detection API (v1), mask_rcnn_inception_v2_coco: slow inference and memory leak

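Problem description

I am trying to use a Mask R-CNN model (mask_rcnn_inception_v2_coco) from the TensorFlow Object Detection API (v1). I tried running inference on a video with the two methods below. (Both are similar to the approach in the TensorFlow Object Detection API notebook.)

I am facing two problems.

  1. Inference is slower than the figure listed in the model zoo.

Inference is roughly 40 times slower, which certainly cannot be explained by the difference between the two GPUs. (TensorFlow used an Nvidia GeForce GTX TITAN X; mine is a GTX 1660 Ti (laptop).)

  2. Memory usage grows steadily. (The increase is especially significant with Method 2.) (Possibly a memory leak?)

(The video I used is very short: 2 seconds, 60 frames, 1280x720.)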

[Figure: memory usage and inference time]
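When comparing against a published per-image figure, it can help to time each frame separately and set the first call(s) aside, since the first run in a new session typically includes one-off initialization. The following is a minimal sketch of such a measurement, not the method used to produce the numbers above. The names `time_frames`, `summarize`, and `infer` are hypothetical, and `infer` stands for whatever callable wraps a single `sess.run` on one frame.

```python
import time
from statistics import mean


def time_frames(infer, frames, warmup=1):
    """Time infer(frame) for every frame.

    Returns two lists of latencies in seconds: the first `warmup` calls
    and the remaining (steady-state) calls.
    """
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    return latencies[:warmup], latencies[warmup:]


def summarize(steady):
    """Summarize steady-state latencies as mean milliseconds and frames per second."""
    if not steady:
        raise ValueError("no steady-state frames were timed")
    avg = mean(steady)
    return {"mean_ms": avg * 1000.0, "fps": 1.0 / avg}
```

If the steady-state latencies keep rising from frame to frame instead of settling, that points at something accumulating inside the loop rather than at raw GPU speed.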

Method 1

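import numpy as np
import tensorflow as tf
from object_detection.utils import ops as utils_ops
from object_detection.utils import visualization_utils as viz_utils

# detection_graph and category_index are assumed to be loaded beforehand, as in the API notebook.
def run_inference_1(cap):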
    while cap.isOpened():
        with detection_graph.as_default():
            with tf.Session() as sess:
                
                ret, image_np = cap.read()
                if not ret:
                    break
                # Get handles to input and output tensors
                ops = tf.get_default_graph().get_operations()
                all_tensor_names = {output.name for op in ops for output in op.outputs}
                tensor_dict = {}
                for key in [
                    'num_detections', 'detection_boxes', 'detection_scores',
                    'detection_classes', 'detection_masks']:
                    tensor_name = key + ':0'

                    if tensor_name in all_tensor_names:
                        tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
                if 'detection_masks' in tensor_dict:
                    # The following processing is only for single image
                    detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                    detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                    # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                    real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                    
                    detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                    detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(detection_masks,detection_boxes, image_np.shape[0], image_np.shape[1])
                    
                    detection_masks_reframed = tf.cast(tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                    # Follow the convention by adding back the batch dimension
                    tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)

                image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
                # Run inference
                output_dict = sess.run(tensor_dict,feed_dict={image_tensor: np.expand_dims(image_np, 0)})

                # all outputs are float32 numpy arrays, so convert types as appropriate
                output_dict['num_detections'] = int(output_dict['num_detections'][0])
                output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.uint8)
                output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
                output_dict['detection_scores'] = output_dict['detection_scores'][0]
                if 'detection_masks' in output_dict:
                    output_dict['detection_masks'] = output_dict['detection_masks'][0]

                # Visualization of the results of a detection.
                viz_utils.visualize_boxes_and_labels_on_image_array(
                    image_np,
                    output_dict['detection_boxes'],
                    output_dict['detection_classes'],
                    output_dict['detection_scores'],
                    category_index,
                    instance_masks=output_dict.get('detection_masks', None),  # the key stored above is 'detection_masks'
                    use_normalized_coordinates=True,
                    line_thickness=8)

Method 2

Change the first three lines of the function to the following (so that the video loop runs inside a single session):

    with detection_graph.as_default():
        with tf.Session() as sess:
            while cap.isOpened():

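Do you know what causes this, both the slow inference and the apparent memory leak?

Are there any improvements I can make to the code to speed up inference? (Other than using a TensorFlow Lite model.)

Tags: python, tensorflow, image-segmentation, object-detection-api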

Solution
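One likely cause, offered as a sketch rather than a confirmed diagnosis: in both methods the calls to tf.squeeze, tf.slice, and reframe_box_masks_to_image_masks sit inside the frame loop, so every frame adds new ops to the same graph. A graph that keeps growing would be consistent with steadily rising memory, and Method 1 additionally pays the session start-up cost on every frame. The sketch below builds the output tensors once, finalizes the graph, and leaves only sess.run in the loop. The function names `build_fetches` and `run_video` and the `on_result` callback are hypothetical. It assumes `tensorflow.compat.v1` is importable, that the graph is the already loaded frozen detection graph, and that all frames share one size.

```python
def build_fetches(graph, height, width):
    """Build the output tensors once, before the frame loop.

    Assumes `graph` is the frozen detection graph already loaded by the
    caller and that every frame has the given height and width.
    """
    # Imported lazily so the module can be imported without TensorFlow installed.
    import tensorflow.compat.v1 as tf
    from object_detection.utils import ops as utils_ops

    with graph.as_default():
        names = {out.name for op in graph.get_operations() for out in op.outputs}
        fetches = {}
        for key in ('num_detections', 'detection_boxes', 'detection_scores',
                    'detection_classes', 'detection_masks'):
            if key + ':0' in names:
                fetches[key] = graph.get_tensor_by_name(key + ':0')
        if 'detection_masks' in fetches:
            # Same post-processing as in the question, but created only once.
            boxes = tf.squeeze(fetches['detection_boxes'], [0])
            masks = tf.squeeze(fetches['detection_masks'], [0])
            n = tf.cast(fetches['num_detections'][0], tf.int32)
            boxes = tf.slice(boxes, [0, 0], [n, -1])
            masks = tf.slice(masks, [0, 0, 0], [n, -1, -1])
            masks = utils_ops.reframe_box_masks_to_image_masks(
                masks, boxes, height, width)
            masks = tf.cast(tf.greater(masks, 0.5), tf.uint8)
            fetches['detection_masks'] = tf.expand_dims(masks, 0)
        image_tensor = graph.get_tensor_by_name('image_tensor:0')
    # Make the graph read-only: any later attempt to add an op raises an error.
    graph.finalize()
    return fetches, image_tensor


def run_video(cap, sess, fetches, image_tensor, on_result):
    """Run inference on every frame; the loop only reads frames and calls sess.run."""
    frames = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # frame[None, ...] adds the batch dimension to the NumPy frame.
        output = sess.run(fetches, feed_dict={image_tensor: frame[None, ...]})
        on_result(frame, output)
        frames += 1
    return frames
```

A possible way to use it: call build_fetches(detection_graph, 720, 1280) once, open a single tf.Session(graph=detection_graph), and pass that session to run_video, with the type conversions and visualization from the question moved into on_result. Because the graph is finalized, any op that is still created per frame raises a RuntimeError instead of silently growing the graph, which makes this kind of leak easy to spot.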

