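tensorflow - Writing a tfrecord, but the file gets too large
Problem description
I am trying to create a tfrecord from a folder of numpy arrays. The folder contains about 2000 numpy files of 50 MB each.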
import numpy as np
import tensorflow as tf

# print_progress and wrap_bytes are helper functions defined elsewhere
# in the script (not shown here).

def convert(image_paths, out_path):
    # Args:
    # image_paths   List of file-paths for the numpy arrays.
    # out_path      File-path for the TFRecords output file.
    print("Converting: " + out_path)
    # Number of images. Used when printing the progress.
    num_images = len(image_paths)
    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:
        # Iterate over all the file-paths.
        for i, path in enumerate(image_paths):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images-1)
            # Load the array from the .npy file.
            img = np.load(path)
            # Convert the array to raw bytes
            # (tobytes replaces the deprecated tostring).
            img_bytes = img.tobytes()
            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = {
                'image': wrap_bytes(img_bytes)
            }
            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)
            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)
            # Serialize the data.
            serialized = example.SerializeToString()
            # Write the serialized data to the TFRecords file.
            writer.write(serialized)
I think it converted about 200 files, and then I got this:
Converting: tf.recordtrain
- Progress: 3.6%Traceback (most recent call last):
File "tf_record.py", line 71, in <module>
out_path=path_tfrecords_train)
File "tf_record.py", line 54, in convert
writer.write(serialized)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/tf_record.py", line 236, in write
self._writer.WriteRecord(record, status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: tf.recordtrain; File too large
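Any suggestions for fixing this would be helpful. Thanks in advance.
Solution
I'm not sure what the size limit for tfrecords files is, but assuming you have enough disk space, the more common approach is to split the dataset across multiple tfrecords files, for example by writing every 20 numpy files to a separate tfrecords file.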
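The splitting described above could be sketched roughly as follows. This is a minimal sketch, not the asker's code: the names chunk_paths, shard_name and convert_sharded, and the "prefix-00000-of-00100.tfrecord" naming pattern, are assumptions made for illustration. It also assumes a TensorFlow version that provides tf.io.TFRecordWriter; on older 1.x releases tf.python_io.TFRecordWriter, as used in the question, plays the same role.

```python
def chunk_paths(paths, files_per_shard):
    """Split a list of paths into consecutive groups of at most files_per_shard."""
    if files_per_shard < 1:
        raise ValueError("files_per_shard must be >= 1")
    return [paths[i:i + files_per_shard]
            for i in range(0, len(paths), files_per_shard)]


def shard_name(prefix, index, num_shards):
    """Build an output name such as 'train-00003-of-00100.tfrecord'.

    The naming pattern is only a convention chosen for this sketch.
    """
    return "{}-{:05d}-of-{:05d}.tfrecord".format(prefix, index, num_shards)


def convert_sharded(image_paths, out_prefix, files_per_shard=20):
    """Write the arrays in image_paths to several TFRecords files.

    Returns the list of output file paths that were written.
    """
    # Imported here so the two helpers above can be used (and tested)
    # without NumPy or TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    shards = chunk_paths(image_paths, files_per_shard)
    out_files = []
    for index, shard in enumerate(shards):
        out_path = shard_name(out_prefix, index, len(shards))
        # One writer per shard keeps every output file small.
        with tf.io.TFRecordWriter(out_path) as writer:
            for path in shard:
                # Load the array and store its raw bytes as one feature.
                img = np.load(path)
                feature = {
                    "image": tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[img.tobytes()]))
                }
                example = tf.train.Example(
                    features=tf.train.Features(feature=feature))
                writer.write(example.SerializeToString())
        out_files.append(out_path)
    return out_files
```

With the numbers from the question, 2000 files at 20 per shard give 100 output files of roughly 1 GB each (20 x 50 MB). When reading the data back, tf.data.TFRecordDataset accepts a list of filenames, so the returned list can be passed to it directly.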