"Getting OperatorNotAllowedInGraphError: iterating over tf.Tensor" without obviously iterating over a tensor

Problem description

OS: Manjaro Linux x64

CUDA: 11.0.3

TF/Keras: 2.4(.1)

Python: 3.8

Hi, I am trying to build a kind of W-VAE-GAN. I keep running into the same error over and over. I already had this problem with Keras 2.3, and, interestingly, not with TF/Keras 2.2. Unfortunately, I need to use Keras 2.4, because I am supposed to run my code on our university server with exactly these TF/Keras and CUDA versions. At this point I just want to make sure my code works as intended.

The error I get is the following, with the exact lines annotated in my code:

...
Epoch 1/20
Traceback (most recent call last):
  File "/home/peer/Programmierkram/BA/Metal_GAN/wvaegan.py", line 282, in <module>
    gan.fit(gen_trans, batch_size=batch_size, epochs=epochs, callbacks=[callback])
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 725, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3196, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: in user code:

    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /home/peer/Programmierkram/BA/Metal_GAN/wvaegan.py:190 train_step
        z_mean, z_log_var, z = self.encoder(clip_img) # <------ WHY NO ERROR HERE?
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:505 __iter__
        self._disallow_iteration()
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:498 _disallow_iteration
        self._disallow_when_autograph_enabled("iterating over `tf.Tensor`")
    /home/peer/Programmierkram/BA/Metal_GAN/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:474 _disallow_when_autograph_enabled
        raise errors.OperatorNotAllowedInGraphError(

    OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

First of all, I don't understand how this error comes about at all, since I successfully ran the VAE part with TF 2.2 without any such error, and, more importantly, there is no tensor iteration that is obvious to me. Even if I comment out the for loop in my train_step, the same error occurs a few lines later in the same context. I also tried decorating train_step() with @tf.function, but nothing changed.

The code I am using is the following:

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow.keras.backend as K
from keras.preprocessing.image import ImageDataGenerator
import itertools
import scipy.io
import matplotlib.pyplot as plt
import matplotlib.image  as PIL


runOnGPU = 0

if runOnGPU==1:
    os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"]="0"
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as e:
            print(e)
else:
    os.environ['CUDA_VISIBLE_DEVICES'] = '-1'


path_clipped_train = os.path.join('./sinograms/clip')
path_transparent_train = os.path.join('./sinograms/transparent')

img_width, img_height = 512, 512
bottleneck = 1024 * 2
filters = (1024, 512, 256, 64)
filter_size = (3, 3, 3, 3)
batch_size = 4
epochs = 20
dsc_steps = 1
gp_w = 10.0
beta_v = 2
learning_rate = 125e-5
latent_dim = 2
input_shape = (1, img_width, img_height, 1)

dataset_gen1 = ImageDataGenerator(rescale=1 / 255, dtype="float32")
dataset_gen2 = ImageDataGenerator(rescale=1 / 255, dtype="float32")

gen_trans = dataset_gen1.flow_from_directory(path_transparent_train,
                                             target_size=(img_width, img_height),
                                             color_mode='grayscale',
                                             classes=[''],
                                             class_mode=None,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             )

gen_clip = dataset_gen2.flow_from_directory(path_clipped_train,
                                            target_size=(img_width, img_height),
                                            color_mode='grayscale',
                                            classes=[''],
                                            class_mode=None,
                                            batch_size=batch_size,
                                            shuffle=False,
                                            )


class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon


def get_encoder():
    encoder_inputs = keras.Input(shape=input_shape[1:], name="encoder_input")
    enc = encoder_inputs
    for (numFilters, szFilters) in zip(filters, filter_size):
        enc = layers.Conv2D(numFilters, szFilters, activation='relu', strides=2, padding='same')(enc)
        enc = layers.BatchNormalization()(enc)
        enc = layers.Dropout(0.2)(enc)
    conv_shape = K.int_shape(enc)[1:]
    enc = layers.Flatten()(enc)
    enc = layers.Dense(bottleneck, activation='relu', name="bottleneck")(enc)
    enc = layers.BatchNormalization()(enc)
    z_mean = layers.Dense(latent_dim, name="z_mean")(enc)
    z_log_var = layers.Dense(latent_dim, name="z_log_var")(enc)

    latent_z = Sampling()([z_mean, z_log_var])

    encoder_model = keras.models.Model(encoder_inputs, latent_z, name="encoder")
    return encoder_model, conv_shape

enc_model, conv_shape = get_encoder()
enc_model.summary()


def get_decoder():
    latent_input = keras.Input(shape=(latent_dim,))
    dec = layers.Dense(conv_shape[0] * conv_shape[1] * conv_shape[2], activation='relu')(latent_input)
    dec = layers.Reshape(conv_shape)(dec)
    for (numFilters, szFilters) in zip(reversed(filters), reversed(filter_size)):
        dec = layers.Conv2DTranspose(numFilters, szFilters, activation='relu', strides=2, padding='same')(dec)
        dec = layers.BatchNormalization()(dec)
        dec = layers.Dropout(0.2)(dec)
    decoder_outputs = layers.Conv2DTranspose(1, 3, activation='relu', padding='same')(dec)

    decoder_model = keras.models.Model(latent_input, decoder_outputs, name="decoder")
    return decoder_model

dec_model = get_decoder()
dec_model.summary()


def get_discriminator():
    dscr_input = keras.Input(shape=input_shape[1:])
    dscr = dscr_input
    for numFilters in filters:
        dscr = layers.Conv2D(numFilters, kernel_size=5, activation='relu', strides=2, padding='same')(dscr)
    dscr = layers.Flatten()(dscr)
    dscr = layers.Dense(1, activation="relu", name="dsc_end")(dscr)

    discriminator_model = keras.models.Model(dscr_input, dscr, name="discriminator")
    return discriminator_model


dsc_model = get_discriminator()
dsc_model.summary()


class GAN(keras.Model):

    def __init__(self,
                 discriminator,
                 encoder,
                 decoder,
                 latent_dim,
                 dsc_steps=dsc_steps,
                 gp_w=gp_w,
                 ):
        super(GAN, self).__init__()
        self.discriminator = discriminator
        self.encoder = encoder
        self.decoder = decoder
        self.latent_dim = latent_dim
        self.dsc_steps = dsc_steps
        self.gp_w = gp_w

    def compile(self,
                dsc_optimizer, enc_optimizer, dec_optimizer,
                dsc_loss_fn, enc_loss_fn, dec_loss_fn):
        super(GAN, self).compile()
        self.dsc_optimizer = dsc_optimizer
        self.enc_optimizer = enc_optimizer
        self.dec_optimizer = dec_optimizer
        self.dsc_loss_fn = dsc_loss_fn
        self.enc_loss_fn = enc_loss_fn
        self.dec_loss_fn = dec_loss_fn

    def call(self, data):
        ds = self.discriminator(data)
        e = self.encoder(data)
        d = self.decoder(e)

    def gradient_penalty(self, batch_size, ref_img, gen_img):
        alpha = tf.random_normal([batch_size, 1, 1, 1], 0.0, 1.0)
        diff = gen_img - ref_img
        interpolated = ref_img + alpha * diff

        with tf.GradientTape() as gp_tape:
            gp_tape.watch(interpolated)
            pred = self.discriminator(interpolated, training=True)

        grads = gp_tape.gradient(pred, [interpolated])[0]
        norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
        gp = tf.reduce_mean((norm - 1.0) ** 2)
        return gp

    @tf.function # doesn't make any difference if decorating with that
    def train_step(self, data):
        trans_img = data
        clip_img = data
        batch_size = tf.shape(trans_img)[:1]

        for i in range(self.dsc_steps):
            with tf.GradientTape() as tape:
                z_mean, z_log_var, z = self.encoder(clip_img) # <------ ERROR HERE
                gen_img = self.decoder(z)
                gen_logits = self.discriminator(gen_img)
                ref_logits = self.discriminator(trans_img)

                dsc_cost = self.dsc_loss_fn(ref_img=ref_logits, gen_img=gen_logits)
                gp = self.gradient_penalty(batch_size, trans_img, gen_img)
                dsc_loss = dsc_cost + gp * self.gp_w

            dsc_gradient = tape.gradient(dsc_loss, self.discriminator.trainable_variables)
            self.dsc_optimizer.apply_gradients(zip(dsc_gradient, self.discriminator.trainable_variables))

        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(clip_img)   # <------ ERROR ALSO HERE IF dsc_steps = 0
            gen_img = self.decoder(z)

            gen_img_logits = self.discriminator(gen_img)
            dec_loss = self.dec_loss_fn(gen_img_logits)
            kl_loss = self.kl_loss(z_mean, z_log_var)

        enc_gradient = tape.gradient(kl_loss, self.encoder.trainable_variables)
        self.enc_optimizer.apply_gradients(zip(enc_gradient, self.encoder.trainable_variables))

        dec_gradient = tape.gradient(dec_loss, self.decoder.trainable_variables)
        self.dec_optimizer.apply_gradients(zip(dec_gradient, self.decoder.trainable_variables))

        return {"dsc_loss": dsc_loss, "KL-Loss": kl_loss, "dec_loss": dec_loss}


class GANMonitor(keras.callbacks.Callback):
    def __init__(self, num_img=6, latent_dim=latent_dim):
        self.num_img = num_img
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        generated_images = self.model.decoder()
        generated_images = (generated_images * 127.5) + 127.5

        for i in range(self.num_img):
            img = generated_images[i].np()
            img = keras.preprocessing.image.array_to_img(img)
            img.save("generated_img_{i}_{epoch}.png".format(i=i, epoch=epoch))

encoder_optimizer       = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
decoder_optimizer       = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
discriminator_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)


def discriminator_loss(real_img, fake_img):
    real_loss = tf.reduce_mean(real_img)
    fake_loss = tf.reduce_mean(fake_img)
    return fake_loss - real_loss


def generator_loss(fake_img):
    return -tf.reduce_mean(fake_img)


def kl_loss(z_mean, z_log_var):
    kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
    kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
    return beta_v * kl_loss


def reconstruction_loss(data, reconstruction):
    rec_loss = tf.reduce_mean(
        tf.reduce_sum(keras.losses.mse(data, reconstruction), axis=(1, 2))
    )
    return rec_loss


callback = GANMonitor(num_img=3, latent_dim=latent_dim)


gan = GAN(
    discriminator=dsc_model,
    encoder=enc_model,
    decoder=dec_model,
    latent_dim=latent_dim,
    dsc_steps=dsc_steps,
)


gan.compile(
    dsc_optimizer=discriminator_optimizer,
    enc_optimizer=encoder_optimizer,
    dec_optimizer=decoder_optimizer,
    dsc_loss_fn=discriminator_loss,
    enc_loss_fn=kl_loss,
    dec_loss_fn=generator_loss,
)

gan.fit(gen_trans, batch_size=batch_size, epochs=epochs, callbacks=[callback])

I would greatly appreciate your help, as I have not been able to find or fix this problem myself. I have read that errors like this should no longer occur in TF 2.5, but using TF 2.5 is not an option for me.

Tags: python, tensorflow, keras

Solution


I found the problem.

In the original Keras example for the VAE, the encoder ends in three layers: z_mean, z_log_var, and latent_z. While in TF 2.2 it was possible to access all terminal layers of a model, as I did in my train_step with

z_mean, z_log_var, z = encoder(data)

here only (latent_) z is returned, as defined in the encoder model's initialization.

By defining the model as

encoder_model = keras.models.Model(encoder_inputs, ([z_mean, z_log_var, latent_z]), name="encoder")

with a list of output layers, all of the z* become accessible. My assumption is that unpacking multiple variables from a single tensor with a single output, as in

x1, x2, x3 = model(data)

results in a loop that looks something like this:

for i in x:
    x{i} = model(data)

This is the only explanation for iterating over a tensor that I can think of.
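The mechanism behind this hunch can be checked in plain Python: tuple unpacking always calls `__iter__` on the right-hand side, and that is exactly the hook TensorFlow uses to raise the error in graph mode. A minimal sketch with a stand-in class (not a real `tf.Tensor`):

```python
class FakeTensor:
    """Stand-in that forbids iteration, like tf.Tensor inside a tf.function."""
    def __iter__(self):
        raise TypeError("iterating over `tf.Tensor` is not allowed")

t = FakeTensor()
try:
    z_mean, z_log_var, z = t  # unpacking invokes t.__iter__()
except TypeError as e:
    print(e)  # -> iterating over `tf.Tensor` is not allowed
```

So the innocent-looking `z_mean, z_log_var, z = self.encoder(clip_img)` is itself the iteration the error complains about, once the encoder returns a single tensor.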

However, reading the code more carefully would probably have helped; I will try to keep that in mind.
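For reference, a minimal sketch of the two model definitions side by side (a toy Dense encoder with assumed layer sizes, not the question's Conv architecture): the single-output model forces Python to iterate over one tensor when unpacked, while the list-output model returns three separate tensors.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2  # as in the question

inputs = keras.Input(shape=(8,), name="encoder_input")
h = layers.Dense(16, activation="relu")(inputs)
z_mean = layers.Dense(latent_dim, name="z_mean")(h)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(h)
# Reparameterization trick via a Lambda layer for brevity;
# the question uses a custom Sampling layer instead.
z = layers.Lambda(
    lambda t: t[0] + tf.exp(0.5 * t[1]) * tf.random.normal(tf.shape(t[0])),
    name="latent_z")([z_mean, z_log_var])

# Problematic: only latent_z is the model output, so
# `z_mean, z_log_var, z = model(x)` must iterate over one tensor.
single_output = keras.Model(inputs, z, name="encoder_single")

# Fix: declare all three terminal layers as outputs.
multi_output = keras.Model(inputs, [z_mean, z_log_var, z], name="encoder_multi")

x = tf.zeros((4, 8))
m, lv, s = multi_output(x)  # safe: unpacks a list of three tensors
print(m.shape, lv.shape, s.shape)  # (4, 2) (4, 2) (4, 2)
```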

