首页 > 解决方案 > 在 GIF LZW 解压缩算法中读取最大代码表大小后跟清晰代码到底要做什么?

问题描述

我一直在学习如何编码和解码 GIF 图像文件。这是我的测试图像:

这是我的测试图像:

有问题的代码替换为return, 在错误前停止:

from enum import Enum

class Codes(Enum):
    CLEAR = 0
    EOI = 1

def decode(code_bytes: bytes):
    """Return decoded gif from incoming bytes, currently only works with entire bytes object"""
    lzw_min, leng, *_ = code_bytes
    total = 2 + leng

    # Skip lzw_min and sub-block size indicator bytes
    skips = {0, 1, total}
    while leng != 1:
        leng = code_bytes[total] + 1
        total += leng
        skips |= {total}

    def _stream(skips):
        """Return least significant bit still available from current byte"""
        for i, byte in enumerate(code_bytes):
            if i in skips:
                continue
            for _ in range(8):
                yield byte & 1 and 1
                byte >>= 1

    code_stream = _stream(skips)

    def get_code(bits, x=0, s=0):
        """Retrieve bits and assemble in proper order"""
        for n in range(s, bits):
            x |= next(code_stream) << n
        return x

    code_table = {
            **{i: [i] for i in range(2 ** lzw_min)},
            2 ** lzw_min: [Codes.CLEAR], 
            2 ** lzw_min + 1: [Codes.EOI]
    }
    bits = lzw_min + 1
    assert Codes.CLEAR in code_table[get_code(bits)]  # First code is always CLEAR
    table_index = init_table = len(code_table) - 1
    last = get_code(bits)
    code = get_code(bits)
    yield last
    while Codes.EOI not in code_table.get(code, []):

        if code <= table_index:
            for i in code_table[code]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [code_table[code][0]]

        else:
            k = code_table[last][0]
            for i in code_table[last] + [k]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [k]

        if table_index == 2**bits-1:
            bits += 1

        # Problem replaced here
        if bits == 13:
            return

        last, code = code, get_code(bits)

上面输出的第二帧 GIF:

在此处输入图像描述

目前,有问题的相同输入的输出包括:

测试图像输出,只有上半部分

所以这就是问题所在:

        if bits == 13:
            last, code = code, get_code(bits)
            bits = lzw_min + 1
            table_index = init_table
            assert Codes.CLEAR in code_table[code]

在读取图像时,当table_index达到最大位长以下 1 并增加超过最大位长的位时,此代码将执行。它显然不能正常工作。

达到最大位大小后,CLEAR接下来会读取代码,正如预期的那样,不会引发 AssertionError。

根据我的阅读,CLEAR代码意味着将位长重置为lzw_min + 1并重新初始化代码表。我相信这就是我的代码所做的。

最初,code_table 是一个列表,并且在位长度重置后被CLEAR重置,但是比当前表索引大得多的代码仍然从流中输出。使代码表成为字典允许在重新初始化后访问当前未覆盖的代码,但这只会产生噪音。显然,一些噪声最终记录为EOI解析结束。

在遇到最大大小的代码后跟一个 ,这个算法有什么问题CLEAR


重新创建我的问题(需要枕头):

以下是测试 gif 的第二帧作为字节文字,您可以将其用于函数输入,帧的调色板作为构建 img 的列表文字:

字节:(从 pastebin 中删除)

调色板:

[(230, 226, 225), (54, 28, 25), (99, 25, 28), (117, 22, 28), (65, 39, 33), (79, 45, 38), (92, 39, 36), (81, 45, 39), (79, 48, 40), (88, 50, 43), (100, 42, 38), (104, 60, 50), (111, 66, 55), (119, 68, 57), (138, 11, 23), (134, 14, 24), (139, 14, 25), (151, 9, 23), (148, 13, 26), (156, 12, 26), (132, 18, 27), (141, 18, 29), (149, 19, 30), (166, 7, 23), (173, 7, 24), (164, 11, 26), (172, 11, 27), (184, 4, 23), (181, 7, 25), (190, 6, 25), (180, 10, 27), (191, 9, 28), (166, 16, 30), (142, 22, 33), (142, 25, 35), (151, 23, 35), (146, 27, 37), (147, 30, 41), (189, 14, 33), (169, 23, 37), (180, 20, 37), (180, 23, 39), (188, 18, 36), (188, 26, 43), (149, 35, 45), (155, 36, 42), (151, 41, 51), (153, 43, 53), (134, 63, 56), (151, 48, 57), (156, 51, 60), (166, 40, 52), (173, 42, 51), (184, 39, 54), (167, 53, 55), (192, 7, 26), (192, 10, 29), (193, 14, 33), (194, 19, 37), (196, 26, 39), (195, 22, 40), (195, 27, 44), (196, 31, 48), (199, 34, 45), (197, 36, 52), (201, 42, 53), (198, 42, 58), (200, 45, 60), (202, 51, 60), (131, 75, 63), (175, 67, 61), (159, 55, 64), (157, 59, 67), (161, 59, 68), (170, 58, 69), (185, 55, 68), (202, 52, 67), (199, 56, 70), (204, 60, 75), (208, 62, 64), (148, 81, 71), (164, 68, 76), (171, 69, 76), (183, 70, 73), (189, 86, 75), (165, 75, 83), (169, 76, 84), (184, 75, 85), (170, 90, 83), (168, 84, 91), (172, 84, 91), (169, 99, 82), (177, 104, 86), (182, 106, 88), (172, 90, 97), (181, 92, 99), (173, 102, 108), (181, 100, 107), (183, 109, 114), (186, 118, 123), (198, 67, 72), (205, 65, 78), (206, 77, 73), (206, 69, 82), (202, 74, 87), (207, 73, 86), (208, 74, 87), (208, 77, 90), (199, 85, 90), (215, 91, 82), (209, 81, 94), (199, 104, 87), (218, 106, 92), (195, 114, 93), (203, 114, 92), (212, 112, 95), (201, 90, 100), (210, 85, 97), (211, 90, 101), (212, 94, 105), (199, 99, 108), (213, 98, 109), (202, 119, 99), (212, 122, 99), (215, 121, 100), (217, 125, 102), (201, 105, 114), (214, 102, 113), (215, 106, 117), (215, 109, 119), (216, 111, 120), (199, 116, 123), (212, 115, 124), (217, 115, 124), (225, 121, 103), (221, 130, 107), (226, 133, 110), (228, 135, 112), (230, 136, 113), (232, 138, 114), (224, 140, 121), (184, 125, 129), (198, 124, 130), (215, 122, 130), (219, 123, 132), (185, 133, 137), (189, 143, 146), (190, 149, 152), (191, 160, 162), (198, 132, 137), (215, 135, 141), (221, 131, 139), (201, 141, 145), (212, 141, 146), (222, 139, 146), (203, 150, 154), (214, 150, 154), (226, 153, 135), (224, 142, 149), (224, 144, 151), (225, 148, 155), (226, 153, 159), (224, 169, 156), (200, 158, 161), (214, 158, 162), (227, 156, 162), (200, 167, 169), (210, 162, 165), (216, 166, 169), (213, 170, 173), (201, 175, 177), (216, 174, 176), (205, 184, 185), (218, 179, 180), (215, 182, 184), (221, 187, 188), (228, 161, 166), (229, 165, 170), (230, 170, 174), (231, 173, 177), (232, 174, 178), (232, 178, 181), (227, 183, 185), (233, 182, 185), (234, 186, 189), (212, 191, 192), (228, 191, 193), (235, 190, 193), (207, 203, 202), (211, 195, 196), (212, 205, 204), (218, 201, 201), (212, 208, 207), (214, 210, 210), (216, 212, 212), (219, 216, 215), (222, 218, 217), (226, 195, 196), (236, 195, 197), (228, 199, 200), (237, 199, 200), (229, 203, 203), (238, 203, 204), (238, 207, 208), (240, 207, 208), (230, 213, 213), (235, 212, 212), (226, 221, 220), (237, 220, 219), (241, 211, 212), (241, 215, 216), (242, 219, 219), (228, 224, 223), (239, 224, 223), (242, 224, 223), (230, 226, 224), (234, 229, 228), (237, 232, 231), (239, 234, 232), (244, 228, 227), (245, 232, 231), (244, 237, 235), (248, 239, 237), (246, 241, 239), (248, 241, 239), (247, 242, 240), (248, 242, 240), (250, 248, 246), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]

这是使用这些文字产生与我相同的输出所需的代码:

from PIL import Image
w, h = 380, 407
gifo = Image.new('RGB', (w, h))
gif = gifo.load()
pix = decode(FRAME_2)  # bytes literal as input
palette = PALETTE      # palette literal as input
try:
    for y in range(h):
        for x in range(w):
            t = next(pix)
            while t in {Codes.EOI, Codes.CLEAR}: # Only needed to handle buggy noise with codes mixed into it
                    t = next(pix)
            gif[x, y] = palette[t]
except StopIteration:
    ...
gifo.show()

我在这个项目中使用的资源:

http://giflib.sourceforge.net/whatsinagif/lzw_image_data.html

https://www.daubnet.com/en/file-format-gif

https://www.w3.org/Graphics/GIF/spec-gif89a.txt

https://www.eecis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html

关于延迟CLEAR代码,虽然这个 gif 不包含这些,我也没有在这个问题中包含它们的逻辑:

https://halicery.com/Image%20Decoders/GIF/GIF-LZW%20Deferred%20Clear%20Code.html

标签: pythonalgorithmcompressiongiflzw

解决方案


我想到了。除其他外,我正在提取 13 位CLEAR代码而不是 12 位代码,并且在正确初始化表时我没有循环到下一个数据。这个版本很好用。

BYTE = 8
MAX_BITLEN = 12

def decode(code_bytes):
    """Yield decoded gif-style LZW from incoming bytes."""
    bytes_iter = iter(code_bytes)
    lzw_min = next(bytes_iter)
    get_code = _get_code_getter(bytes_iter)
    code = clear = get_code(lzw_min + 1)
    eoi = clear + 1
    while code != eoi:
        if code == clear:
            bitlength = lzw_min + 1
            code_table = [[i] for i in range(eoi + 1)]
            last, code = get_code(bitlength), get_code(bitlength)
            yield last
        if code < len(code_table):
            output = code_table[code]
            to_add = code_table[last] + [code_table[code][0]]
        else:
            to_add = output = code_table[last] + [code_table[last][0]]
        for i in output:
            yield i
        code_table += [to_add]
        bitlength += len(code_table) == (bitlength < MAX_BITLEN) << bitlength
        last, code = code, get_code(bitlength)


def _get_code_getter(code_iter):
    def bit_stream():
        """Yield least significant bit remaining in current byte."""
        length = next(code_iter)
        for read, byte in enumerate(code_iter):
            if read == length:
                length += byte + 1
            else:
                for bit in (1 << b & byte and 1 for b in range(BYTE)):
                    yield bit

    def code_getter(bitlength):
        """Retrieve/structure bits, least significant first."""
        return sum(next(code_stream) << z for z in range(bitlength))

    code_stream = bit_stream()
    return code_getter

推荐阅读