Zip longest lists and align the output: Python itertools from scratch

Problem description

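I am trying to compare one list against several other lists and produce a CSV file in which matching values are aligned on the same row. itertools.zip_longest does a good job of zipping, but it pairs items by position, and because I need the output aligned by value I thought I would build my own version. It will also help me understand generators. If there is a better way, please let me know.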
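
For reference, here is a minimal sketch of what the standard itertools.zip_longest yields on the sample lists used below. It only illustrates why positional zipping does not give value-aligned rows; the variable name `rows` is just for this example.

```python
from itertools import zip_longest

a = ['apple', 'banana', 'pear']
b = ['apple', 'orange', 'orange', 'pear']
c = ['banana', 'cucumber']
d = ['1 apple', '2 cherries']

# zip_longest pairs items by position, not by value,
# so 'banana' from c ends up on the same row as 'apple' from a.
rows = [','.join(row) for row in zip_longest(a, b, c, d, fillvalue='')]
for row in rows:
    print(row)
# apple,apple,banana,1 apple
# banana,orange,cucumber,2 cherries
# pear,orange,,
# ,pear,,
```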

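Because the first list is the one I compare the rest against, I thought I would iterate over args[0] and compare the other lists to it. Because I want to call next(it) manually only once its value has been matched, I created a cache for the comparison. I believe this is where my problem is: I should be producing more rows than my results show.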


def main():
    a = ['apple','banana','pear']
    b = ['apple','orange','orange','pear']
    c = ['banana','cucumber']
    d = ['1 apple','2 cherries']
    
    zipped_data = [','.join(x) for x in zip_longest_list(a,b,c,d,)]


def zip_longest_list(*args, fillvalue=''):
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    # I created a cache to compare lists with
    cache = [{'value': '', 'isLoaded': False} for i in range(num_active)]
    data = []
    # check if args are valid
    if not num_active:
        return

    # Because the first list is the one I am comparing the rest of the lists to
    # I thought I would use args[0] to iterate over it and compare the others to it

    # iterate over the list to compare to
    for i in args[0]:
        values = []

        for j, it in enumerate(iterators):
            value = ''

    # Because I wanted to manually run next(it) only once its value has been found
    # I created a cache

            # load cache
            try:
                if cache[i]['isLoaded'] == False:
                    value = next(it)
                    cache['value'] = value
                    cache[i]['isLoaded'] = True

            # check if list is empty
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue

    # I believe this is where I am having an issue
    # I should be creating more rows than my results are showing

            if cache[i]['isLoaded'] == True:
                if i == cache[i]['value']:
                    new_row = []
                    [new_row.append(x['value']) for x in cache]
                    row.append(str(','.join([x for x in new_row])))
                    cache[i]['isLoaded'] = False
                else:
                    continue
        data.append(values)
    for i in data:
        yield i


# local copy of itertools.repeat
def repeat(object, times=None):
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object


if __name__ == '__main__':
    main()

Expected output

[',,,1 apple']
[',,,2 cherries']
['apple,apple,,']
['banana,,banana,']
[',,cucumber,']
[',orange,,']
[',orange,,']
['pear,pear,,']

Actual output

['apple,,,','apple,apple,,']
['banana,apple,banana,1 apple','banana,orange,banana,1 apple']
['pear,orange,banana,1 apple']

Many thanks.

Tags: python, csv, zip, itertools

Solution


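I think I may have figured out how to get the desired output. It resembles the old "file matching" process used for sorting tapes: keep the current value of every list, emit the smallest one on each pass, and advance only the lists that matched it: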

def fileMatch(*content, fillValue=None):
    Done      = []   # unique sentinel marking an exhausted iterator
    iterators = [ iter(c) for c in content ]
    values    = [ next(i,Done) for i in iterators ]   # current head of each list
    while not all(v is Done for v in values):
        # the smallest head still pending decides the next output row
        matchValue = min(v for v in values if v is not Done)
        matched    = [ v is not Done and v == matchValue for v in values ]
        yield  tuple ( v if isMatch else fillValue
                       for v,isMatch in zip(values,matched) )
        # advance only the iterators whose head was just emitted
        values     = [ next(i,Done) if isMatch else v
                       for v,isMatch,i in zip(values,matched,iterators) ]

# a, b, c, d are the lists defined in the question
for t in fileMatch(a, b, c, d, fillValue=""):
    print(t)

('', '', '', '1 apple')
('', '', '', '2 cherries')
('apple', 'apple', '', '')
('banana', '', 'banana', '')
('', '', 'cucumber', '')
('', 'orange', '', '')
('', 'orange', '', '')
('pear', 'pear', '', '')
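
Since the question asks for a CSV file, the rows can be handed to the standard csv module. The following is only a sketch under a few assumptions: `aligned_rows` is a hypothetical, self-contained restatement of the same matching idea, the input lists are assumed to be sorted already, and an in-memory io.StringIO buffer stands in for a real file.

```python
import csv
import io

def aligned_rows(*columns, fill=''):
    """Yield value-aligned rows; each input is assumed to be sorted."""
    done = object()  # sentinel for exhausted iterators
    iterators = [iter(col) for col in columns]
    heads = [next(it, done) for it in iterators]
    while any(h is not done for h in heads):
        smallest = min(h for h in heads if h is not done)
        hits = [h is not done and h == smallest for h in heads]
        yield [h if hit else fill for h, hit in zip(heads, hits)]
        # advance only the iterators whose head was just written out
        heads = [next(it, done) if hit else h
                 for h, hit, it in zip(heads, hits, iterators)]

a = ['apple', 'banana', 'pear']
b = ['apple', 'orange', 'orange', 'pear']
c = ['banana', 'cucumber']
d = ['1 apple', '2 cherries']

# To write a real file, replace the buffer with
# open('aligned.csv', 'w', newline='') (hypothetical file name).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(aligned_rows(a, b, c, d))
csv_text = buffer.getvalue()
print(csv_text)
```

Letting csv.writer join the fields, rather than ','.join, means values that themselves contain commas or quotes get quoted correctly.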
    
