File compression/decompression of a binary representation of a list of integers

Problem description

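Currently, I have a system that converts a list of integers into a binary representation. I calculate the number of bytes each number needs and then convert it to bytes with the to_bytes() function, like this: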

import math

with open(outFileName, "wb") as o:
    for n in result:
        # Minimal byte count for n (note: 0 gets zero bytes and writes nothing)
        numBytes = math.ceil(n.bit_length() / 8)
        o.write(n.to_bytes(numBytes, 'little'))

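However, since each integer is encoded with a different number of bytes, how can an unpacking program/function know how long each one is? I've heard of using the struct module, specifically the pack function, but considering efficiency and keeping the file as small as possible, what is the best way to solve this so that such an unpacking program can retrieve the exact list of originally encoded integers?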

Tags: python, python-3.x, compression, lossless-compression

Solution


You can't. Your encoding maps different lists of integers to the same sequence of bytes. It is then impossible to know which one was the original input.
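A quick way to see the ambiguity is to run the question's scheme on two different inputs. The sketch below wraps the question's loop in a small hypothetical helper, encode, which concatenates the bytes instead of writing them to a file. It shows that [1, 2] and [513] produce identical output, and so do [0, 1] and [1], because 0 is written as zero bytes.

```python
import math

def encode(values):
    """The question's scheme: each int in its minimal little-endian byte count."""
    return b"".join(
        n.to_bytes(math.ceil(n.bit_length() / 8), "little") for n in values
    )

# Different inputs, identical bytes: no decoder can tell them apart.
a = encode([1, 2])   # b'\x01' + b'\x02'
b = encode([513])    # 513 == 0x0201, little-endian -> b'\x01\x02'
print(a, b, a == b)  # b'\x01\x02' b'\x01\x02' True

# 0 encodes to no bytes at all, so it simply vanishes.
print(encode([0, 1]) == encode([1]))  # True
```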

You need a different encoding.

Take a look at using the high bit of each byte to mark whether another byte of the same integer follows. There are other ways that might be better, depending on the distribution of your integers, such as Golomb coding.
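Below is a minimal sketch of the high-bit idea, assuming all integers are non-negative. It uses the same layout as unsigned LEB128 (also used for Protocol Buffers varints): each byte carries 7 bits of the value, least significant group first, and the high bit is set on every byte except the last one of an integer. The function names encode_varints and decode_varints are hypothetical, and negative numbers are not handled here.

```python
def encode_varints(values):
    """Encode non-negative ints: 7 payload bits per byte, high bit = 'more follows'."""
    out = bytearray()
    for n in values:
        if n < 0:
            raise ValueError("this sketch only handles non-negative integers")
        while True:
            byte = n & 0x7F              # take the low 7 bits
            n >>= 7
            if n:
                out.append(byte | 0x80)  # high bit set: another byte follows
            else:
                out.append(byte)         # high bit clear: last byte of this int
                break
    return bytes(out)


def decode_varints(data):
    """Inverse of encode_varints: rebuild the exact list of integers."""
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift      # place the 7 payload bits
        shift += 7
        if not byte & 0x80:              # high bit clear: this integer is complete
            values.append(n)
            n, shift = 0, 0
    if shift:
        raise ValueError("truncated input: last integer has no final byte")
    return values


nums = [0, 1, 127, 128, 300, 513, 2**40]
packed = encode_varints(nums)
assert decode_varints(packed) == nums    # round-trips exactly
print(packed.hex())
```

With the question's variables, writing the file becomes `o.write(encode_varints(result))` inside the same `with open(outFileName, "wb") as o:` block. Each byte spends one bit on the continuation flag, so values below 128 take one byte and values below 16384 take two; whether that beats something like Golomb coding depends on how your integers are distributed.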

