Python text encoding program

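Problem description

When text is passed to the function run_length_encoder, repeated letters should be compressed. For example, for the input aaabbac the output should be ['a', 'a', 3, 'b', 'b', 2, 'a', 'c'], but my code does not compress the repeats.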

def run_length_encoder(string):
#def compress(string):

    res = []

    count = 1

    #Add in first character
    res.append(string[0])

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
            res.append(string[i+1])
        else:
            if(count > 1):
                #Ignore if no repeats
                res.append(count)
            res.append(string[i+1])
            count = 1
    #print last one
    if(count > 1):
        res.append(str(count))
    return res

For example, for the input abbbbaa the output should be ['a', 'b', 'b', 4, 'a', 'a', 2], but what I get is ['a', 'b', 'b', 'b', 'b', 4, 'a', 'a', '2'].

Tags: python, python-3.x

Solution


Itertools loves you and wants you to be happy:

from itertools import chain, groupby

def run_length_encoder(src):
    return list(
        # chain.from_iterable flattens the (letter, count) tuples produced by
        # the generator expression below into one flat iterator; the outer
        # list() collects it into a list.
        chain.from_iterable(
            # groupby yields (key, group) pairs, where group is an iterator
            # over the consecutive items of the input that are equal to key.
            # Taking the length of list(group) therefore gives the run length
            # of that letter, so this expression produces a series of
            # (letter, count) tuples.
            (letter, len(list(group))) for letter, group in groupby(src)
        )
    )


print(run_length_encoder("aajjjjiiiiohhkkkkkkkkhkkkk"))
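
Note that the groupby version above returns alternating letter/count pairs (for abbbbaa it gives ['a', 1, 'b', 4, 'a', 2]), which is not quite the format the question asks for, where a repeated letter appears twice followed by its count and a single letter appears once with no count. The sketch below shows two ways that format could be produced. The first is a minimal fix of the question's loop: the original appended every repeated character and stored the final count as a string. The second adapts the groupby idea. The function names run_length_encoder_loop and run_length_encoder_grouped are made up for this illustration, and returning an empty list for empty input is an assumption, since the question does not cover that case.

```python
from itertools import groupby


def run_length_encoder_loop(string):
    """Minimal fix of the question's loop (hypothetical name)."""
    if not string:
        return []  # assumption: empty input encodes to an empty list

    res = [string[0]]
    count = 1

    for i in range(len(string) - 1):
        if string[i] == string[i + 1]:
            count += 1
            # Emit only the second copy of a run; the original code
            # appended every repeat, which is why nothing was compressed.
            if count == 2:
                res.append(string[i + 1])
        else:
            if count > 1:
                res.append(count)
            res.append(string[i + 1])
            count = 1

    if count > 1:
        res.append(count)  # keep the count an int, not str(count)
    return res


def run_length_encoder_grouped(src):
    """groupby variant producing the same format (hypothetical name)."""
    res = []
    for letter, group in groupby(src):
        run = len(list(group))
        if run == 1:
            res.append(letter)  # single letters carry no count
        else:
            res.extend([letter, letter, run])  # letter twice, then the count
    return res


print(run_length_encoder_loop("abbbbaa"))     # ['a', 'b', 'b', 4, 'a', 'a', 2]
print(run_length_encoder_grouped("aaabbac"))  # ['a', 'a', 3, 'b', 'b', 2, 'a', 'c']
```

Both functions should agree on any input; the loop version stays closest to the original code, while the groupby version avoids manual index handling.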
