Concurrent caching iterator error when using tf.data.cache(file)

Problem description

I get the following error from model.fit() when using tf.data.cache(file), and I don't know why. There is no lockfile in the directory:

tensorflow.python.framework.errors_impl.AlreadyExistsError:  There appears to be a concurrent caching iterator running - cache lockfile already exists ('/tmp/cache/mydataset-train_0.lockfile'). If you are sure no other running TF computations are using this cache prefix, delete the lockfile and re-initialize the iterator. Lockfile contents: Created at: 1601972246
     [[node IteratorGetNext (defined at /Users/lzuwei/workspace/train_model.py:132) ]] [Op:__inference_train_function_2847]

Function call stack:
train_function
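
For reference, the lockfile named in the error sits next to the cache shards under the cache prefix. The sketch below is not from the question; it only illustrates the remedy the error message itself suggests: list what is on disk under the prefix, and remove a stale lockfile only if you are certain no other TF computation is using it.

import glob, os

# List everything currently written under the cache prefix.
for path in glob.glob('/tmp/cache/mydataset-train*'):
    print(path)

# Only if no other running TF computation uses this prefix (as the error
# message says), a stale lockfile could be removed like this:
# os.remove('/tmp/cache/mydataset-train_0.lockfile')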

Here is what my data pipeline looks like. files_list contains 15 files in TFRecord format, and num_parallel_reads is set to 15:

ds = tf.data.TFRecordDataset(filenames=files_list, compression_type='GZIP', num_parallel_reads=num_parallel_reads) \
        .map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
        .cache("/tmp/cache/mydataset-train") \
        .shuffle(buffer_size=10*batch_size) \
        .batch(batch_size) \
        .prefetch(tf.data.experimental.AUTOTUNE)

model_merged = modelMHA_tfa() # returns a tf.keras.models.Model

model_merged.fit(
    ds,
    epochs=10,
)

def map_fn(data_record):
    features = tf.io.parse_single_example(data_record, fc_dataset_schema)
    # dd = tf.cast(features['a'], dtype=tf.float32)
    X = tf.stack([
        tf.cast(features['b'], dtype=tf.float32),
        tf.cast(features['c'], dtype=tf.float32),
        features['d'],
        features['e'],
        features['f'],
        features['g']
    ],
        axis=0
    )
    Y = tf.stack([
        features['h']
    ],
        axis=0
    )
    return X, Y

Any hints or suggestions would be greatly appreciated!

Tags: python, tensorflow

Solution


The problem was caused by creating an iterator over the dataset before calling model.fit():

ds_iter = iter(ds)
x, y = ds_iter.next()

After removing this code, the problem went away.
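
If you still need to peek at a batch for debugging, the sketch below shows two hedged workarounds (not part of the original answer), reusing the question's ds, files_list, map_fn, batch_size and model_merged names: either drop the debugging iterator before model.fit() creates its own caching iterator, or inspect the data through a throwaway pipeline that never touches the cache file prefix.

# Option 1: release the debugging iterator before training, so at most one
# caching iterator is ever active for the '/tmp/cache/mydataset-train' prefix.
ds_iter = iter(ds)
x, y = next(ds_iter)
print(x.shape, y.shape)
del ds_iter  # dropping the reference should destroy the iterator and its lockfile

model_merged.fit(ds, epochs=10)

# Option 2: inspect through a separate pipeline that bypasses the file cache,
# so only model.fit() ever writes to '/tmp/cache/mydataset-train'.
debug_ds = tf.data.TFRecordDataset(filenames=files_list, compression_type='GZIP') \
    .map(map_fn) \
    .batch(batch_size)
for x, y in debug_ds.take(1):
    print(x.shape, y.shape)

Option 2 is the safer of the two, since it does not rely on the debugging iterator being garbage-collected before training starts.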

