TFRecords 100 times larger than the original size

Problem description

I am using dataset_tool.py from the StyleGAN GitHub repo to convert training images in a local folder to TFRecords. Here is the code:

def create_from_images(tfrecord_dir, image_dir, shuffle):
    print('Loading images from "%s"' % image_dir)
    image_filenames = sorted(glob.glob(os.path.join(image_dir, '*')))
    if len(image_filenames) == 0:
        error('No input images found')

    img = np.asarray(PIL.Image.open(image_filenames[0]))
    resolution = img.shape[0]
    channels = img.shape[2] if img.ndim == 3 else 1
    if img.shape[1] != resolution:
        error('Input images must have the same width and height')
    if resolution != 2 ** int(np.floor(np.log2(resolution))):
        error('Input image resolution must be a power-of-two')
    if channels not in [1, 3]:
        error('Input images must be stored as RGB or grayscale')

    with TFRecordExporter(tfrecord_dir, len(image_filenames)) as tfr:
        order = tfr.choose_shuffled_order() if shuffle else np.arange(len(image_filenames))
        for idx in range(order.size):
            img = np.asarray(PIL.Image.open(image_filenames[order[idx]]))
            if channels == 1:
                img = img[np.newaxis, :, :] # HW => CHW
            else:
                img = img.transpose([2, 0, 1]) # HWC => CHW
            tfr.add_image(img)

def add_image(self, img):
    if self.print_progress and self.cur_images % self.progress_interval == 0:
        print('%d / %d\r' % (self.cur_images, self.expected_images), end='', flush=True)
    if self.shape is None:
        self.shape = img.shape
        self.resolution_log2 = int(np.log2(self.shape[1]))
        assert self.shape[0] in [1, 3]
        assert self.shape[1] == self.shape[2]
        assert self.shape[1] == 2**self.resolution_log2
        tfr_opt = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.NONE)
        for lod in range(self.resolution_log2 - 1):
            tfr_file = self.tfr_prefix + '-r%02d.tfrecords' % (self.resolution_log2 - lod)
            self.tfr_writers.append(tf.python_io.TFRecordWriter(tfr_file, tfr_opt))
    assert img.shape == self.shape
    for lod, tfr_writer in enumerate(self.tfr_writers):
        if lod:
            img = img.astype(np.float32)
            img = (img[:, 0::2, 0::2] + img[:, 0::2, 1::2] + img[:, 1::2, 0::2] + img[:, 1::2, 1::2]) * 0.25
        quant = np.rint(img).clip(0, 255).astype(np.uint8)
        ex = tf.train.Example(features=tf.train.Features(feature={
            'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=quant.shape)),
            'data': tf.train.Feature(bytes_list=tf.train.BytesList(value=[quant.tostring()]))}))
        tfr_writer.write(ex.SerializeToString())
    self.cur_images += 1

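It creates several TFRecords files, one for each resolution derived from the original. As a result, the TFRecords file created at the original resolution is 100 times larger than the original folder of files. My original files are black-and-white PNGs of about 2 KB each, and the folder is 120 MB in total, while the TFRecords I get are 12 GB. I know TFRecords are usually larger than the originals, but not 100 times! What could be wrong here?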

Tags: python, tensorflow

Solution


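The problem is that you are saving uncompressed images in the record files: quant.tostring() stores the raw uint8 pixel array, one byte per pixel per channel, which takes far more space than the compressed image files. To avoid this you could write the image files directly into the records, but since you do some image processing first, you have to do that processing and then save the resulting image in a compressed format again. You can use a function like the following to convert an image array to its PNG-compressed form: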

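import io
import numpy as np
from PIL import Image

def img2png(image):
    # Assumes image is a uint8 array passed in CHW format
    hwc = np.moveaxis(image, 0, 2)  # CHW => HWC
    if hwc.shape[2] == 1:
        # PIL expects grayscale as a 2-D HW array, not HW1
        hwc = hwc[:, :, 0]
    img = Image.fromarray(hwc)
    with io.BytesIO() as img_bytes:
        img.save(img_bytes, 'PNG')
        return img_bytes.getvalue()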
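
As a quick sanity check (not part of the original answer), the sketch below encodes a synthetic grayscale CHW array, compares the raw and PNG byte counts, and decodes the PNG again with PIL to confirm that the round trip is lossless. The helper names encode_png_chw and decode_png_chw and the test image are illustrative assumptions; real photographs will compress much less than this flat synthetic image.

```python
import io

import numpy as np
from PIL import Image


def encode_png_chw(image):
    """Encode a uint8 CHW array as PNG bytes (hypothetical helper name)."""
    hwc = np.moveaxis(image, 0, 2)  # CHW => HWC
    if hwc.shape[2] == 1:
        hwc = hwc[:, :, 0]  # PIL wants grayscale as a 2-D HW array
    buf = io.BytesIO()
    Image.fromarray(hwc).save(buf, 'PNG')
    return buf.getvalue()


def decode_png_chw(png_bytes):
    """Decode PNG bytes back to a uint8 CHW array (hypothetical helper name)."""
    arr = np.asarray(Image.open(io.BytesIO(png_bytes)))
    if arr.ndim == 2:
        return arr[np.newaxis, :, :]  # HW => CHW
    return arr.transpose([2, 0, 1])  # HWC => CHW


# Synthetic 256x256 grayscale image: a white square on a black background.
image = np.zeros((1, 256, 256), dtype=np.uint8)
image[0, 64:192, 64:192] = 255

png_bytes = encode_png_chw(image)
raw_size = image.nbytes  # the number of bytes quant.tostring() would store
print('raw bytes:', raw_size)
print('png bytes:', len(png_bytes))
print('lossless round trip:', np.array_equal(decode_png_chw(png_bytes), image))
```

Because PNG is lossless, the decoded array matches the input exactly; only the stored size changes.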

In your example, you can then save the quant image like this:

ex = tf.train.Example(features=tf.train.Features(feature={
    'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=quant.shape)),
    'data': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img2png(quant)]))}))
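
For a sense of what the uncompressed 'data' field costs, the raw bytes that add_image writes per input image can be estimated from the resolution alone: one uint8 value per pixel per channel, at every level from the full resolution down to 4x4. The sketch below does that arithmetic. The function name raw_bytes_per_image and the 256x256 grayscale example are assumptions for illustration, since the question does not state the image resolution.

```python
import math


def raw_bytes_per_image(resolution, channels):
    """Sum of uncompressed uint8 bytes over all levels written by add_image.

    add_image opens one writer per lod in range(resolution_log2 - 1), so the
    levels run from the full resolution down to 4x4, halving each time.
    """
    resolution_log2 = int(math.log2(resolution))
    assert resolution == 2 ** resolution_log2, 'resolution must be a power of two'
    total = 0
    for lod in range(resolution_log2 - 1):
        side = 2 ** (resolution_log2 - lod)
        total += channels * side * side
    return total


# Assumed example: 256x256 grayscale input.
full_res_only = 1 * 256 * 256
all_levels = raw_bytes_per_image(256, 1)
print(full_res_only)  # 65536 bytes in the full-resolution file alone
print(all_levels)     # 87376 bytes across all seven files
```

Under this assumption the full-resolution payload alone is 32 times the size of a 2 KB PNG. The figures ignore the small per-record overhead of the protobuf framing and the 'shape' feature.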

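Note that, since you are now saving compressed images, you will need to use tf.io.decode_image when you parse the records later. The decoded image comes back in HWC layout, so transpose it if the rest of the pipeline expects CHW. This is the overhead you have to "pay" for the reduced disk size.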
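
A minimal parsing sketch, assuming TensorFlow 2.x with eager execution and records that carry the 'shape' and 'data' features shown above, with 'data' holding PNG bytes. The function name parse_png_example is hypothetical. It uses tf.io.decode_png, the PNG-specific counterpart of tf.io.decode_image, because it always returns a 3-D HWC tensor; this is a swap made for the sketch, not something the original answer prescribes.

```python
import tensorflow as tf


def parse_png_example(serialized):
    """Parse one serialized tf.train.Example into a uint8 CHW image tensor."""
    features = tf.io.parse_single_example(serialized, features={
        'shape': tf.io.FixedLenFeature([3], tf.int64),
        'data': tf.io.FixedLenFeature([], tf.string)})
    image = tf.io.decode_png(features['data'])  # uint8, HWC
    image = tf.transpose(image, [2, 0, 1])      # HWC => CHW
    # Reshape to the stored CHW shape; this fails loudly if the decoded
    # image does not have the number of elements the record claims.
    return tf.reshape(image, features['shape'])


# Usage with tf.data (the path is a placeholder, not a real file):
# dataset = tf.data.TFRecordDataset('path/to/file.tfrecords').map(parse_png_example)
```

The decoding runs every time a record is read, which is the runtime cost traded for the smaller files.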

