Word list compression

Problem description

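I'm working on a compression algorithm for a string that contains a comma-separated list of words. The result should be a string of "word-count" entries. For example: 'a, a, b, c' -> 'a-2, b-1, c-1'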

I can't figure out why this doesn't work:

def word_compression(words):
    words=words.split(",")
    print(words)
    prev=words[0]
    count=0
    #record=dict()
    s=""
    for i in range (len(words)):
        word=words[i].strip()
        if(word==prev):
            count+=1
            print(word,count)
        else:
            #new word
            #record[prev]=count
            print(word,prev)
            if(s==""):
                s=prev+"-"+str(count)
            else:
                s=s+", "+prev+"-"+str(count)
            print("changed from: ",prev, word)
            count=1
            prev=word

    print(prev,count)
    s=s+", "+prev+"-"+str(count)
    return s
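The question doesn't say which input fails. Tracing the loop, the example 'a, a, b, c' does produce 'a-2, b-1, c-1', but two cases go wrong. The last line always prepends ", ", so an input with only one distinct word such as 'a, a' returns ', a-2'. Also, prev = words[0] is never stripped, so whitespace before the first comma (e.g. 'a ,a') makes the first comparison fail and emits a spurious 'a -0' entry. The sketch below is one way to repair the original loop while keeping its approach: strip every token up front and collect entries in a list joined at the end (the debug print calls are dropped).

```python
def word_compression(words):
    # Strip every token up front so 'a ' and 'a' compare equal,
    # including the first word used to seed `prev`.
    tokens = [w.strip() for w in words.split(",")]
    parts = []  # collected "word-count" entries
    prev = tokens[0]
    count = 0
    for word in tokens:
        if word == prev:
            count += 1
        else:
            # A run just ended: record it and start counting the new word.
            parts.append(f"{prev}-{count}")
            prev = word
            count = 1
    # Record the final run; joining a list avoids the leading ", "
    # the original produced when every word was the same.
    parts.append(f"{prev}-{count}")
    return ", ".join(parts)

print(word_compression("a, a, b, c"))  # a-2, b-1, c-1
print(word_compression("a, a"))        # a-2
```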

Tags: python-3.x, string, list

Solution


I prefer a solution that uses itertools.groupby to get the counts, then ', '.join to combine the results:

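from itertools import groupby
from re import split

def word_compression(words):
    # Split on commas, absorbing any whitespace on either side of them.
    words = split(r'\s*,\s*', words.strip())
    # groupby yields (word, iterator) for each run of consecutive equal words;
    # sum(1 for _ in group) counts the length of that run.
    counts = (f'{word}-{sum(1 for _ in group)}' for word, group in groupby(words))
    return ', '.join(counts)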

print(word_compression("1,1,1,1,1,1, 2, 2, 2, 2, 3, 3, 1, 1, 1"))
# 1-6, 2-4, 3-2, 1-3
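groupby only merges adjacent equal words, which is why 1 appears twice in the output above. The question's example 'a, a, b, c' doesn't distinguish between run lengths and total counts per word. If totals are what's wanted, here is a minimal sketch using collections.Counter instead. word_totals is a hypothetical name, and it assumes words should be listed in the order they first appear, which Counter preserves because it is a dict subclass (insertion-ordered since Python 3.7).

```python
from collections import Counter
from re import split

def word_totals(words):
    # Hypothetical variant: count every occurrence of a word, not just adjacent runs.
    tokens = split(r'\s*,\s*', words.strip())
    # Counter keeps keys in first-seen order, so output follows first appearance.
    counts = Counter(tokens)
    return ', '.join(f'{word}-{n}' for word, n in counts.items())

print(word_totals("1,1,1,1,1,1, 2, 2, 2, 2, 3, 3, 1, 1, 1"))
# 1-9, 2-4, 3-2
```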
