TensorFlow error: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables

Problem description

I'm having trouble with TensorFlow. The following code works.

from __future__ import division, print_function, absolute_import

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

import tensorflow.contrib.slim as slim
from tensorflow.contrib.layers.python.layers import initializers
from tensorflow.python.ops import init_ops

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Visualize decoder setting
# Parameters
training_epochs = 100
batch_size = 50
display_step = 40

LEARNING_RATE = 0.001
# LEARNING_RATE_BASE = 0.01
# LEARNING_RATE_DECAY = 0.99

min_after_dequeue = 1000
TRAINING_SAMPLE_SIZE = 3365

IMAGE_SIZE1 = 40
IMAGE_SIZE2 = 80
IMAGE_CHANNEL = 1
DEEP_SIZE = 8

def inference(x, reuse=False):
    # Building the encoder
    with tf.variable_scope('inference') as scope:
        if reuse:
            scope.reuse_variables()
        with slim.arg_scope([slim.conv2d], padding='SAME',
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            normalizer_fn=slim.batch_norm,
            activation_fn=tf.nn.relu):
            net = slim.conv2d(x, DEEP_SIZE, [2, 2], stride=1, scope='conv1')
            net = slim.conv2d(net, DEEP_SIZE*2, [2, 2], stride=1, scope='conv2')
            net = slim.conv2d(net, DEEP_SIZE, [2, 2], stride=1, scope='conv3')
            # net = slim.conv2d(net, DEEP_SIZE*2, [2, 2], stride=1, scope='conv4')
            # net = slim.conv2d(net, DEEP_SIZE, [2, 2], stride=1, scope='conv5')
            net = slim.conv2d(net, IMAGE_CHANNEL, [2, 2], stride=1, activation_fn=None, scope='conv6')
    return net

# Construct model
def train():
    # tf Graph input (only pictures)
    x = tf.placeholder(tf.float32, [batch_size, IMAGE_SIZE1, IMAGE_SIZE2, IMAGE_CHANNEL])
    y = tf.placeholder(tf.float32, [batch_size, IMAGE_SIZE1, IMAGE_SIZE2, IMAGE_CHANNEL])

    y_pred = inference(x)
    y_true = y

    cost = tf.reduce_mean((y_pred - y_true) ** 2)
    # cost = - tf.reduce_mean(y_true * tf.log(tf.clip_by_value(y_pred, 1e-10, 1.0)))
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)

    saver = tf.train.Saver(tf.global_variables())
    # Launch the graph
    with tf.Session(config = tf.ConfigProto(allow_soft_placement = True, log_device_placement = True)) as sess:
        sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        total_batch = TRAINING_SAMPLE_SIZE // batch_size
        # Training cycle
        loss_init = 100000000
        for epoch in range(training_epochs):
            # Loop over all batches
            loss = 0
            for i in range(total_batch):
                batch_xs = sess.run(input_imgs)
                batch_ys = sess.run(label_imgs)
                batch_ys1 = (batch_ys<0.5)*0.
                batch_ys2 = (batch_ys>0.5)*1.
                batch_ys = batch_ys1 + batch_ys2
                # Run optimization op (backprop) and cost op (to get loss value)
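                _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
                loss += c  # accumulate the epoch loss; otherwise `loss` stays 0 and the checkpoint test below is meaningless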
                # Display logs per epoch step
                if i % display_step == 0:
                    print(epoch+1,'\t', i+1, '\t', "{:.9f}".format(c))

                    # Compare original images with their reconstructions
                    a = 0
                    for j in range(1):
                        xs = sess.run(input_imgs)
                        ys = sess.run(label_imgs)
                        ys1 = (ys<0.5)*0.
                        ys2 = (ys>0.5)*1.
                        ys = ys1 + ys2
                        encode_decode = sess.run(y_pred, feed_dict={x: xs, y:ys})
                        for i in range(5):
                            # plt.imshow(xs[i].reshape(IMAGE_SIZE1, IMAGE_SIZE2), cmap='Greys_r')
                            # plt.axis('off')  # hide the axes
                            # fig = plt.gcf()  
                            # fig.set_size_inches(IMAGE_SIZE2/300, IMAGE_SIZE1/300)
                            # plt.gca().xaxis.set_major_locator(plt.NullLocator())  
                            # plt.gca().yaxis.set_major_locator(plt.NullLocator())  
                            # plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)  
                            # plt.margins(0,0)  
                            # fig.savefig('autoencoder_AE/input/%d.jpg' % a, transparent=True, dpi=300, pad_inches = 0)

                            plt.imshow(ys[i].reshape(IMAGE_SIZE1, IMAGE_SIZE2), cmap='Greys_r')
                            plt.axis('off')  # hide the axes
                            fig = plt.gcf()  
                            fig.set_size_inches(IMAGE_SIZE2/300, IMAGE_SIZE1/300)
                            plt.gca().xaxis.set_major_locator(plt.NullLocator())  
                            plt.gca().yaxis.set_major_locator(plt.NullLocator())  
                            plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)  
                            plt.margins(0,0)  
                            fig.savefig('autoencoder_AE/label/%d.jpg' % a, transparent=True, dpi=300, pad_inches = 0)

                            plt.imshow(encode_decode[i].reshape(IMAGE_SIZE1, IMAGE_SIZE2), cmap='Greys_r')
                            plt.axis('off')  # hide the axes
                            fig = plt.gcf()  
                            fig.set_size_inches(IMAGE_SIZE2/300, IMAGE_SIZE1/300)
                            plt.gca().xaxis.set_major_locator(plt.NullLocator())  
                            plt.gca().yaxis.set_major_locator(plt.NullLocator())  
                            plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)  
                            plt.margins(0,0)  
                            fig.savefig('autoencoder_AE/predict/%d.jpg' % a, transparent=True, dpi=300, pad_inches = 0)

                            a = a + 1
            if loss < loss_init:
                saver.save(sess,'AE_simp_model/autoencoder.ckpt')
                loss_init = loss
        print("Optimization Finished!")
train()

However, when inference() is changed to:

def inference(x, reuse=False):
    # Building the encoder
    with tf.variable_scope('inference') as scope:
        if reuse:
            scope.reuse_variables()
        with slim.arg_scope([slim.conv2d], padding='SAME',
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            normalizer_fn=slim.batch_norm,
            activation_fn=tf.nn.relu):
            net = slim.conv2d(x, DEEP_SIZE, [2, 2], stride=1, scope='conv1')
            net = slim.conv2d(net, DEEP_SIZE*2, [2, 2], stride=1, scope='conv2')
            net = slim.conv2d(net, DEEP_SIZE, [2, 2], stride=1, scope='conv3')
            # net = slim.conv2d(net, DEEP_SIZE*2, [2, 2], stride=1, scope='conv4')
            # net = slim.conv2d(net, DEEP_SIZE, [2, 2], stride=1, scope='conv5')
            net = slim.conv2d(net, IMAGE_CHANNEL, [2, 2], stride=1, activation_fn=None, scope='conv6')
            net = tf.reshape(net, [-1, 1])
            a = tf.ones_like(net) * 0.5
            net = tf.concat([net, a], 1)
            net = tf.arg_max(net, 1)
            net = tf.cast(net, tf.float32)
            net = tf.reshape(net, [batch_size, IMAGE_SIZE1, IMAGE_SIZE2, IMAGE_CHANNEL])
    return net

the error "No gradients provided for any variable, check your graph for ops that do not support gradients, between variables" is raised.

Tags: python, tensorflow

Solution


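As the error message already says, there is an operation (op) through which TensorFlow cannot compute a gradient. In your case the problem is tf.arg_max. That function is not differentiable: its output is an integer index that stays constant under small changes of its input, so there is no useful derivative and TensorFlow defines no gradient for it. Because every path from the loss back to the variables passes through this op, TensorFlow cannot create a gradient for any variable in the graph and raises this error.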
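To make this concrete, here is a minimal sketch in plain NumPy (not TensorFlow) that compares a numerical gradient of the loss with and without the hard decision. The one-weight model, the data, and the helper names (`continuous_output`, `hard_output`, `finite_diff`) are assumptions made for illustration only; `hard_output` mirrors the concat-with-0.5 followed by argmax from the question.

```python
import numpy as np

def continuous_output(w, x):
    # Hypothetical stand-in for the network's raw output: one linear unit.
    return w * x

def hard_output(w, x):
    # Same trick as in the question: stack [output, 0.5] and take argmax.
    # The result is 1 where the constant 0.5 is larger, else 0.
    out = continuous_output(w, x)
    stacked = np.stack([out, np.full_like(out, 0.5)], axis=1)
    return np.argmax(stacked, axis=1).astype(np.float64)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def finite_diff(f, w, eps=1e-4):
    # Central finite difference as a numerical estimate of df/dw.
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

x = np.array([0.2, 0.8, 1.5, 3.0])   # made-up inputs
y = np.array([0.0, 0.0, 1.0, 1.0])   # made-up binary targets
w = 0.3

grad_cont = finite_diff(lambda v: mse(continuous_output(v, x), y), w)
grad_hard = finite_diff(lambda v: mse(hard_output(v, x), y), w)

print("gradient with continuous output:", grad_cont)
print("gradient with argmax output:    ", grad_hard)
```

With the continuous output the loss responds to a small change in `w`, so there is a direction to move in. With the argmax output a small change in `w` does not flip any decision, the loss does not move, and an optimizer would have nothing to work with even if a gradient were defined.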

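The fix is to design a network that uses only differentiable functions and to get rid of tf.arg_max. Since you have not explained the intent behind your network design, it is hard to say how your inference function should be redesigned in this case. My guess is that you would rather use a softmax to obtain a predicted probability for each class, which can then be used in the loss function for training.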

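Even if your target images contain only the values 0 and 1, it is not reasonable to force the network to output only 0 and 1. Instead, you need a continuous output that can be interpreted as an (unscaled) probability. Continuous values are required for well-defined derivatives, so that your loss function can be optimized by gradient descent.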

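Of course, once training is finished you can apply argmax (or a simple threshold) to set all pixels with a predicted value < 0.5 to zero and all other pixels to 1. You should not do this during training, though.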
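The train-on-continuous-values, threshold-afterwards pattern can be sketched as follows. This is a NumPy illustration under stated assumptions, not the asker's network: the per-pixel model has a single hypothetical weight `w` and bias `b`, the data is synthetic, and the loss is sigmoid cross-entropy (in TensorFlow 1.x the analogous op would be tf.nn.sigmoid_cross_entropy_with_logits applied to the raw conv6 output).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data (assumption): pixel intensities in [-1, 1], target is 1 where x > 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = (x > 0.0).astype(np.float64)

# Hypothetical one-weight "network": logit = w * x + b.
w, b = 0.0, 0.0
learning_rate = 0.5

for _ in range(500):
    # Training uses the continuous probability, never a hard 0/1 decision.
    p = sigmoid(w * x + b)
    # Gradient of the mean sigmoid cross-entropy with respect to w and b.
    grad_w = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Only after training: threshold the probabilities to get a binary image.
pred = (sigmoid(w * x + b) > 0.5).astype(np.float64)
accuracy = float(np.mean(pred == y))
print("w =", w, "b =", b, "accuracy =", accuracy)
```

The thresholding step sits outside the training loop, so it never appears on the path between the loss and the trainable parameters.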

