No space left on device when gzipping

Problem description

I'm using django-bakery to gzip a largish file, about 1MB in size.

However, it fails with this error:

[2021-11-08 09:59:37] myapp.views DEBUG Target path: /project/build/index.html
[2021-11-08 09:59:37] bakery.views.base DEBUG Gzipping to osfs:////project/build/index.html
Traceback (most recent call last):
  File "/project/.env/lib/python3.7/site-packages/fs/osfs.py", line 646, in open
    **options
OSError: [Errno 28] No space left on device: b'/project/build/index.html'

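I'm running on Ubuntu 20. I've run into this error before, and I know the various things that can cause it as well as several ways to diagnose the root cause, but so far I don't see any obvious culprit.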

Running df -H shows that I have over 200GB of free disk space.

Filesystem      Size  Used Avail Use% Mounted on
udev             17G     0   17G   0% /dev
tmpfs           3.4G  877k  3.4G   1% /run
/dev/md1        977G  684G  244G  74% /
tmpfs            17G  291k   17G   1% /dev/shm
tmpfs           5.3M     0  5.3M   0% /run/lock
tmpfs            17G     0   17G   0% /sys/fs/cgroup
tmpfs           3.4G     0  3.4G   0% /run/user/1000
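
The same free-space check can be repeated from inside the Python process, to confirm that it sees the same filesystem as the shell. This is a minimal sketch using the standard library's shutil.disk_usage; the helper name free_bytes is illustrative and not part of django-bakery.

```python
import shutil


def free_bytes(path):
    """Return the free bytes on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free)
    return usage.free


if __name__ == "__main__":
    # "." is a placeholder; point it at the build directory being written to.
    print(free_bytes("."))
```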

Running sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n does show that my build directory uses a very large number of inodes.

        ...
   4414 .git
  38208 .env
 660615 media
16699756 build
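
For reference, a rough Python equivalent of that pipeline is sketched below. It is only an approximation: the function name count_files_by_top_dir is made up for illustration, files sitting directly in the root are grouped under "." rather than listed by name, and it does not reproduce the -xdev flag, so it will cross into other mounted filesystems.

```python
import os
from collections import Counter


def count_files_by_top_dir(root):
    """Count regular files under each top-level entry of `root`."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # First path component below root, or "." for root itself.
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Mirror `find -type f`: regular files only, no symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                counts[top] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_files_by_top_dir(".").items(), key=lambda kv: kv[1]):
        print(f"{n:>10} {name}")
```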

However, df -i shows that my disk still has plenty of free inodes:

Filesystem       Inodes    IUsed    IFree IUse% Mounted on
udev            4110382      457  4109925    1% /dev
tmpfs           4116299      726  4115573    1% /run
/dev/md1       60563456 18135948 42427508   30% /
tmpfs           4116299       24  4116275    1% /dev/shm
tmpfs           4116299        6  4116293    1% /run/lock
tmpfs           4116299       18  4116281    1% /sys/fs/cgroup
tmpfs           4116299       11  4116288    1% /run/user/1000
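
The inode figures can also be read from within Python via os.statvfs (available on Unix). A minimal sketch, with inode_usage as an illustrative helper name:

```python
import os


def inode_usage(path):
    """Return (total, free) inode counts for the filesystem containing `path`."""
    st = os.statvfs(path)
    # f_files is the total number of inodes, f_ffree the number still free.
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    total, free = inode_usage(".")
    print(f"inodes: {total} total, {free} free")
```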

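When I run the process, I notice that it appears to run out of memory. I have 32GB, about half of which is free, and the moment the process has consumed all of it is when it reports being out of storage space.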
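
One way to confirm that memory growth from inside the process is to log its peak resident set size as the build progresses. The sketch below uses the standard library's resource module; the helper name peak_rss_mib is illustrative, and the conversion assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import resource


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (Linux units assumed)."""
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak_kib / 1024.0


if __name__ == "__main__":
    # Calling this every N objects in the build loop would show whether
    # memory climbs steadily before the error appears.
    print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```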

The full traceback is:

  File "/project/myapp/views.py", line 238, in build_queryset
    self.build_object(o)
  File "/project/myapp/views.py", line 261, in build_object
    self.build_file(target_path, self.get_content())
  File "/project/.env/lib/python3.7/site-packages/bakery/views/base.py", line 68, in build_file
    self.gzip_file(path, html)
  File "/project/.env/lib/python3.7/site-packages/bakery/views/base.py", line 123, in gzip_file
    with self.fs.open(smart_text(target_path), 'wb') as outfile:
  File "/project/.env/lib/python3.7/site-packages/fs/osfs.py", line 646, in open
    **options
  File "/project/.env/lib/python3.7/site-packages/fs/error_tools.py", line 90, in __exit__
    reraise(fserror, fserror(self._path, exc=exc_value), traceback)
  File "/project/.env/lib/python3.7/site-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/project/.env/lib/python3.7/site-packages/fs/osfs.py", line 646, in open

If I look up bakery's code for gzip_file(), I see:

# Write GZIP data to an in-memory buffer
data_buffer = six.BytesIO()
kwargs = dict(
    filename=path.basename(target_path),
    mode='wb',
    fileobj=data_buffer
)
if float(sys.version[:3]) >= 2.7:
    kwargs['mtime'] = 0
with gzip.GzipFile(**kwargs) as f:
    f.write(six.binary_type(html))

# Write that buffer out to the filesystem
with self.fs.open(smart_text(target_path), 'wb') as outfile:
    outfile.write(data_buffer.getvalue())
    outfile.close()

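So this looks like where the error occurs, since the code is designed to write the gzip data to an in-memory buffer. However, I don't understand why it fails on the line that writes the buffer out to disk, rather than on the line f.write(six.binary_type(html)) that writes into the in-memory buffer.

Why doesn't it fail during the buffer write?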

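The obvious fix is to override this code so that it skips the in-memory buffer and writes directly to disk. That would make smaller writes much slower, but that may just be the cost of doing business.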
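
A minimal sketch of that direct-to-disk variant, using only the standard library. It is not bakery's implementation: the function name gzip_to_disk is hypothetical, html is assumed to already be bytes, and the built-in open stands in for bakery's self.fs.open, which assumes the target is a local path.

```python
import gzip
import os


def gzip_to_disk(target_path, html):
    """Gzip `html` (bytes) straight into `target_path`, with no BytesIO buffer."""
    with open(target_path, "wb") as outfile:
        with gzip.GzipFile(
            filename=os.path.basename(target_path),
            mode="wb",
            fileobj=outfile,
            mtime=0,  # fixed timestamp, matching the quoted bakery code
        ) as f:
            f.write(html)
```

In a bakery view, the override of gzip_file() would do the equivalent with the view's own filesystem object.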

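Is there a better solution? Is there any way to tell Gzip to fail over to disk, or even swap, instead of raising an exception when it runs out of memory?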
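
One standard-library option that fits the "fail over to disk" idea is tempfile.SpooledTemporaryFile, which keeps data in memory up to max_size and then rolls over to a temporary file on disk. The sketch below is an assumption-laden illustration, not a confirmed fix: the function name gzip_with_spill and the 1 MiB threshold are made up, html is assumed to be bytes, and the built-in open again stands in for bakery's self.fs.open. Whether it helps depends on what is actually exhausting memory, which the traceback alone does not establish.

```python
import gzip
import os
import shutil
import tempfile


def gzip_with_spill(target_path, html, max_in_memory=1024 * 1024):
    """Gzip `html` into a buffer that spills to disk past `max_in_memory` bytes."""
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as buf:
        with gzip.GzipFile(
            filename=os.path.basename(target_path),
            mode="wb",
            fileobj=buf,
            mtime=0,  # fixed timestamp, matching the quoted bakery code
        ) as f:
            f.write(html)
        # Rewind the buffer and stream it to the final location in chunks.
        buf.seek(0)
        with open(target_path, "wb") as outfile:
            shutil.copyfileobj(buf, outfile)
```

Swap, by contrast, is managed by the operating system rather than by gzip, so there is no gzip-level switch for it.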

Tags: python, gzip

Solution

