python - 在 GIF LZW 解压缩算法中读取最大代码表大小后跟清晰代码到底要做什么?
问题描述
我一直在学习如何编码和解码 GIF 图像文件。这是我的测试图像:
有问题的代码替换为return
, 在错误前停止:
from enum import Enum
class Codes(Enum):
CLEAR = 0
EOI = 1
def decode(code_bytes: bytes):
"""Return decoded gif from incoming bytes, currently only works with entire bytes object"""
lzw_min, leng, *_ = code_bytes
total = 2 + leng
# Skip lzw_min and sub-block size indicator bytes
skips = {0, 1, total}
while leng != 1:
leng = code_bytes[total] + 1
total += leng
skips |= {total}
def _stream(skips):
"""Return least significant bit still available from current byte"""
for i, byte in enumerate(code_bytes):
if i in skips:
continue
for _ in range(8):
yield byte & 1 and 1
byte >>= 1
code_stream = _stream(skips)
def get_code(bits, x=0, s=0):
"""Retrieve bits and assemble in proper order"""
for n in range(s, bits):
x |= next(code_stream) << n
return x
code_table = {
**{i: [i] for i in range(2 ** lzw_min)},
2 ** lzw_min: [Codes.CLEAR],
2 ** lzw_min + 1: [Codes.EOI]
}
bits = lzw_min + 1
assert Codes.CLEAR in code_table[get_code(bits)] # First code is always CLEAR
table_index = init_table = len(code_table) - 1
last = get_code(bits)
code = get_code(bits)
yield last
while Codes.EOI not in code_table.get(code, []):
if code <= table_index:
for i in code_table[code]:
yield i
table_index += 1
code_table[table_index] = code_table[last] + [code_table[code][0]]
else:
k = code_table[last][0]
for i in code_table[last] + [k]:
yield i
table_index += 1
code_table[table_index] = code_table[last] + [k]
if table_index == 2**bits-1:
bits += 1
# Problem replaced here
if bits == 13:
return
last, code = code, get_code(bits)
上面输出的第二帧 GIF:
目前,有问题的相同输入的输出包括:
所以这就是问题所在:
if bits == 13:
last, code = code, get_code(bits)
bits = lzw_min + 1
table_index = init_table
assert Codes.CLEAR in code_table[code]
在读取图像时,当table_index
达到最大位长以下 1 并增加超过最大位长的位时,此代码将执行。它显然不能正常工作。
达到最大位大小后,CLEAR
接下来会读取代码,正如预期的那样,不会引发 AssertionError。
根据我的阅读,CLEAR
代码意味着将位长重置为lzw_min + 1
并重新初始化代码表。我相信这就是我的代码所做的。
最初,code_table 是一个列表,并且在位长度重置后被CLEAR
重置,但是比当前表索引大得多的代码仍然从流中输出。使代码表成为字典允许在重新初始化后访问当前未覆盖的代码,但这只会产生噪音。显然,一些噪声最终记录为EOI
解析结束。
在遇到最大大小的代码后跟一个 ,这个算法有什么问题CLEAR
?
重新创建我的问题(需要枕头):
以下是测试 gif 的第二帧作为字节文字,您可以将其用于函数输入,帧的调色板作为构建 img 的列表文字:
字节:(从 pastebin 中删除)
调色板:
[(230, 226, 225), (54, 28, 25), (99, 25, 28), (117, 22, 28), (65, 39, 33), (79, 45, 38), (92, 39, 36), (81, 45, 39), (79, 48, 40), (88, 50, 43), (100, 42, 38), (104, 60, 50), (111, 66, 55), (119, 68, 57), (138, 11, 23), (134, 14, 24), (139, 14, 25), (151, 9, 23), (148, 13, 26), (156, 12, 26), (132, 18, 27), (141, 18, 29), (149, 19, 30), (166, 7, 23), (173, 7, 24), (164, 11, 26), (172, 11, 27), (184, 4, 23), (181, 7, 25), (190, 6, 25), (180, 10, 27), (191, 9, 28), (166, 16, 30), (142, 22, 33), (142, 25, 35), (151, 23, 35), (146, 27, 37), (147, 30, 41), (189, 14, 33), (169, 23, 37), (180, 20, 37), (180, 23, 39), (188, 18, 36), (188, 26, 43), (149, 35, 45), (155, 36, 42), (151, 41, 51), (153, 43, 53), (134, 63, 56), (151, 48, 57), (156, 51, 60), (166, 40, 52), (173, 42, 51), (184, 39, 54), (167, 53, 55), (192, 7, 26), (192, 10, 29), (193, 14, 33), (194, 19, 37), (196, 26, 39), (195, 22, 40), (195, 27, 44), (196, 31, 48), (199, 34, 45), (197, 36, 52), (201, 42, 53), (198, 42, 58), (200, 45, 60), (202, 51, 60), (131, 75, 63), (175, 67, 61), (159, 55, 64), (157, 59, 67), (161, 59, 68), (170, 58, 69), (185, 55, 68), (202, 52, 67), (199, 56, 70), (204, 60, 75), (208, 62, 64), (148, 81, 71), (164, 68, 76), (171, 69, 76), (183, 70, 73), (189, 86, 75), (165, 75, 83), (169, 76, 84), (184, 75, 85), (170, 90, 83), (168, 84, 91), (172, 84, 91), (169, 99, 82), (177, 104, 86), (182, 106, 88), (172, 90, 97), (181, 92, 99), (173, 102, 108), (181, 100, 107), (183, 109, 114), (186, 118, 123), (198, 67, 72), (205, 65, 78), (206, 77, 73), (206, 69, 82), (202, 74, 87), (207, 73, 86), (208, 74, 87), (208, 77, 90), (199, 85, 90), (215, 91, 82), (209, 81, 94), (199, 104, 87), (218, 106, 92), (195, 114, 93), (203, 114, 92), (212, 112, 95), (201, 90, 100), (210, 85, 97), (211, 90, 101), (212, 94, 105), (199, 99, 108), (213, 98, 109), (202, 119, 99), (212, 122, 99), (215, 121, 100), (217, 125, 102), (201, 105, 114), (214, 102, 113), (215, 106, 117), (215, 109, 119), (216, 111, 120), (199, 116, 123), (212, 115, 124), (217, 115, 124), (225, 121, 103), (221, 130, 107), (226, 133, 110), (228, 135, 112), (230, 136, 113), (232, 138, 114), (224, 140, 121), (184, 125, 129), (198, 124, 130), (215, 122, 130), (219, 123, 132), (185, 133, 137), (189, 143, 146), (190, 149, 152), (191, 160, 162), (198, 132, 137), (215, 135, 141), (221, 131, 139), (201, 141, 145), (212, 141, 146), (222, 139, 146), (203, 150, 154), (214, 150, 154), (226, 153, 135), (224, 142, 149), (224, 144, 151), (225, 148, 155), (226, 153, 159), (224, 169, 156), (200, 158, 161), (214, 158, 162), (227, 156, 162), (200, 167, 169), (210, 162, 165), (216, 166, 169), (213, 170, 173), (201, 175, 177), (216, 174, 176), (205, 184, 185), (218, 179, 180), (215, 182, 184), (221, 187, 188), (228, 161, 166), (229, 165, 170), (230, 170, 174), (231, 173, 177), (232, 174, 178), (232, 178, 181), (227, 183, 185), (233, 182, 185), (234, 186, 189), (212, 191, 192), (228, 191, 193), (235, 190, 193), (207, 203, 202), (211, 195, 196), (212, 205, 204), (218, 201, 201), (212, 208, 207), (214, 210, 210), (216, 212, 212), (219, 216, 215), (222, 218, 217), (226, 195, 196), (236, 195, 197), (228, 199, 200), (237, 199, 200), (229, 203, 203), (238, 203, 204), (238, 207, 208), (240, 207, 208), (230, 213, 213), (235, 212, 212), (226, 221, 220), (237, 220, 219), (241, 211, 212), (241, 215, 216), (242, 219, 219), (228, 224, 223), (239, 224, 223), (242, 224, 223), (230, 226, 224), (234, 229, 228), (237, 232, 231), (239, 234, 232), (244, 228, 227), (245, 232, 231), (244, 237, 235), (248, 239, 237), (246, 241, 239), (248, 241, 239), (247, 242, 240), (248, 242, 240), (250, 248, 246), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
这是使用这些文字产生与我相同的输出所需的代码:
from PIL import Image
w, h = 380, 407
gifo = Image.new('RGB', (w, h))
gif = gifo.load()
pix = decode(FRAME_2) # bytes literal as input
palette = PALETTE # palette literal as input
try:
for y in range(h):
for x in range(w):
t = next(pix)
while t in {Codes.EOI, Codes.CLEAR}: # Only needed to handle buggy noise with codes mixed into it
t = next(pix)
gif[x, y] = palette[t]
except StopIteration:
...
gifo.show()
我在这个项目中使用的资源:
http://giflib.sourceforge.net/whatsinagif/lzw_image_data.html
https://www.daubnet.com/en/file-format-gif
https://www.w3.org/Graphics/GIF/spec-gif89a.txt
https://www.eecis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html
关于延迟CLEAR
代码,虽然这个 gif 不包含这些,我也没有在这个问题中包含它们的逻辑:
https://halicery.com/Image%20Decoders/GIF/GIF-LZW%20Deferred%20Clear%20Code.html
解决方案
我想到了。除其他外,我正在提取 13 位CLEAR
代码而不是 12 位代码,并且在正确初始化表时我没有循环到下一个数据。这个版本很好用。
BYTE = 8
MAX_BITLEN = 12
def decode(code_bytes):
"""Yield decoded gif-style LZW from incoming bytes."""
bytes_iter = iter(code_bytes)
lzw_min = next(bytes_iter)
get_code = _get_code_getter(bytes_iter)
code = clear = get_code(lzw_min + 1)
eoi = clear + 1
while code != eoi:
if code == clear:
bitlength = lzw_min + 1
code_table = [[i] for i in range(eoi + 1)]
last, code = get_code(bitlength), get_code(bitlength)
yield last
if code < len(code_table):
output = code_table[code]
to_add = code_table[last] + [code_table[code][0]]
else:
to_add = output = code_table[last] + [code_table[last][0]]
for i in output:
yield i
code_table += [to_add]
bitlength += len(code_table) == (bitlength < MAX_BITLEN) << bitlength
last, code = code, get_code(bitlength)
def _get_code_getter(code_iter):
def bit_stream():
"""Yield least significant bit remaining in current byte."""
length = next(code_iter)
for read, byte in enumerate(code_iter):
if read == length:
length += byte + 1
else:
for bit in (1 << b & byte and 1 for b in range(BYTE)):
yield bit
def code_getter(bitlength):
"""Retrieve/structure bits, least significant first."""
return sum(next(code_stream) << z for z in range(bitlength))
code_stream = bit_stream()
return code_getter
推荐阅读
- c++ - 为什么虚拟 cudaMalloc 在moderngpu 中加速interval_gather?
- sql - JPA 查询参数 IN 或 IS NULL
- firebase - Firebase 实时数据库过滤数据
- python - PYTHON CSV 文件上传 - WebUI -> Lambda ->S3
- python - 在“字符串”类型的 pd.dataframe 上使用 iloc 时,字符串比较总是错误的
- kubernetes - Nginx-ingress 上的默认证书
- c# - 表单间导航的路由类
- javascript - 解构数组并在一行中分配给属性对象
- javascript - Threejs中的“加号”或“X”形波浪
- flutter - webview_flutter 只是在 android 模拟器中显示一个空白屏幕