Conditionally splitting a list into 2 sublists, with support for successive passes

Problem description

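I want to progressively filter the values of a list into sublists. Each time a condition matches, I want that value to be ignored by the next filter.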

For example, say I want to a) grab the items divisible by 3, b) grab the odd items, and c) keep the rest.

li = [0,1,2,3,4,5,6,7,8,9]

I want to get:

divby3 = [3,6,9]
odd = [1,5,7]
rest = [0,2,4,8]

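Is there anything in itertools that does this? I wrote some test code, but it feels like something might already exist. The most expressive approach, and also among the fastest, is: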

def append_split(li,cond):
    """ list comp appends to 2 separate lists """
    hits, miss = [],[]
    [hits.append(v) if cond(v) else miss.append(v) for v in li]
    return hits, miss

p1_divby3, li = append_split(li, is_3)
p2_odd, p3_rest = append_split(li, is_odd)

Or the alternative suggested in the comments:

def looped_append(li, cond):
    """ for-loop to avoid side-effects within list comp """
    hits, miss = [],[]
    for v in li: 
        (hits if cond(v) else miss).append(v)
    return hits, miss

Is there a better way in the standard library?

The performance I get (on a 10000-item list) is as follows:

timings:
0.00301504 by_filter_prune_set
0.00335598 by_append_split
0.00498891 by_tupling
0.56877589 by_memberships
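
Single-run measurements taken with time() can be noisy. As a rough sketch (not part of the original test script), a comparable measurement could be repeated with the standard timeit module. looped_append, is_3 and is_odd are copied from this post; best_time is a name invented here for illustration, and the repeat/number values are arbitrary.

```python
import timeit

def is_3(v):
    return v and not (v % 3)

def is_odd(v):
    return bool(v % 2)

def looped_append(li, cond):
    """ for-loop split into (hits, miss), as in the question """
    hits, miss = [], []
    for v in li:
        (hits if cond(v) else miss).append(v)
    return hits, miss

def by_looped_append(li):
    p1_divby3, li = looped_append(li, is_3)
    p2_odd, p3_rest = looped_append(li, is_odd)
    return p1_divby3, p2_odd, p3_rest

def best_time(fn, data, repeat=5, number=10):
    """ hypothetical helper: best per-call time in seconds over several repeats """
    runs = timeit.repeat(lambda: fn(data), repeat=repeat, number=number)
    return min(runs) / number

data = list(range(10000))
elapsed = best_time(by_looped_append, data)
print(f"{elapsed:10.8f} by_looped_append")
```

Taking the minimum over several repeats tends to filter out interference from other processes, which matters when the candidates differ by only a few percent.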

Full test code:

import sys
from time import time

if len(sys.argv) >=2 :
    li = range(0,int(sys.argv[1]))
    do_compare = False
else:
    li = [0,1,2,3,4,5,6,7,8,9]
    do_compare = True

exp = dict(
    p1_divby3 = [3,6,9],
    p2_odd = [1,5,7],
    p3_rest =[0,2,4,8],
)

def is_3(v): 
    return v and not (v % 3)

def is_odd(v): 
    return bool(v % 2)

def get_result(di):
    return {k:v for k,v in sorted(di.items()) if k in exp}

def by_memberships(li):
    """ SLOWEST.  filter checks that item wasn't previously extracted """
    p1_divby3 = [v for v in li if is_3(v)]
    p2_odd = [v for v in li if is_odd(v) and not v in p1_divby3]
    p3_rest = [v for v in li if not v in p1_divby3 and not v in p2_odd]
    return get_result(locals())

def prune_set(candidates, seen):
    """ filter, then prune found from list."""
    seen = set(seen)  # really slow if you don't cast to a set
    return [v for v in candidates if not v in seen]

def by_filter_prune_set(li):
    p1_divby3 = [v for v in li if is_3(v)]
    li = prune_set(li, p1_divby3)
    p2_odd = [v for v in li if is_odd(v)]
    p3_rest = prune_set(li, p2_odd)
    return get_result(locals())

def looped_append(li, cond):
    # from comments, also slightly faster than append_split
    hits, miss = [],[]
    for v in li: 
        (hits if cond(v) else miss).append(v)
    return hits, miss

def by_looped_append(li):
    p1_divby3, li = looped_append(li, is_3)
    p2_odd, p3_rest = looped_append(li, is_odd)
    return get_result(locals())


def append_split(li,cond):
    """ list comp appends to 2 separate lists """
    hits, miss = [],[]
    [hits.append(v) if cond(v) else miss.append(v) for v in li]
    return hits, miss

def by_append_split(li):
    p1_divby3, li = append_split(li, is_3)
    p2_odd, p3_rest = append_split(li, is_odd)
    return get_result(locals())

def split_tupling(li, cond):
    """ put into a (hit, miss) tuple then re-filter into 2 lists"""
    undefined = NotImplemented
    li = [(v, undefined) if cond(v) else (undefined, v) for v in li  ]
    hits = [v[0] for v in li if v[0] is not undefined]
    miss = [v[1] for v in li if v[0] is undefined]
    return hits, miss

def by_tupling(li):
    p1_divby3, li = split_tupling(li, is_3)
    p2_odd, p3_rest = split_tupling(li, is_odd)
    return get_result(locals())

timings = {}
for fn in [by_memberships, by_looped_append, by_append_split, by_tupling, by_filter_prune_set]:
    sys.stdout.write(f"\n\n{fn.__name__:20.20}")
    start = time()
    got = fn(li)
    duration = time()-start
    sys.stdout.write(f" {duration:10.8f}\n")
    timings[fn.__name__] = duration
    if do_compare:
        if got == exp:
            flag = "✅"
        else:
            flag = "❌"
        print(f"{flag}{exp=}\n{flag}{got=}")

li = sorted([(v,k) for k,v in timings.items()])
print("\n\ntimings:")
[print(f"{tu[0]:010.8f} {tu[1]}") for tu in li]

Tags: python, functional-programming

Solution


Here is a minimal working example of an answer using more_itertools.partition:

from more_itertools import partition

li = [0,1,2,3,4,5,6,7,8,9]

def is_3(v): 
    return v and not (v % 3)

def is_odd(v): 
    return bool(v % 2)

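def by_partition(li):
    """ using more_itertools.partition(pred, iterable) """
    # partition returns (items where pred is false, items where pred is true)
    li2, p1_divby3 = partition(is_3, li)
    p3_rest, p2_odd = partition(is_odd, li2)
    # both results are lazy iterators, so materialize them as lists
    return tuple(map(list, [p1_divby3, p2_odd, p3_rest]))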

div_by_3, odd, rest = by_partition(li)
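
If installing more_itertools is not an option, roughly the same behaviour can be sketched with the standard library alone. The helper name partition_stdlib below is made up for this illustration. It mirrors the (false items, true items) return order used above and relies only on itertools.tee, itertools.filterfalse and the built-in filter.

```python
from itertools import filterfalse, tee

def is_3(v):
    return v and not (v % 3)

def is_odd(v):
    return bool(v % 2)

def partition_stdlib(pred, iterable):
    """ hypothetical stdlib-only helper: returns (false_items, true_items) as lazy iterators """
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# first pass: pull out the multiples of 3, keep the others for the next pass
not_div3, div_by_3 = partition_stdlib(is_3, li)
# second pass: split what is left into odd and everything else
rest, odd = partition_stdlib(is_odd, not_div3)

div_by_3, odd, rest = list(div_by_3), list(odd), list(rest)
```

Note that in this sketch the predicate is evaluated twice per item (once for each output iterator), so it suits cheap, side-effect-free conditions.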

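I can only add that, if you run into this situation often, it may be worthwhile (and neat) to write a more general function that splits one iterable into several according to multiple conditions.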
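
One possible shape for such a general function is sketched below. The name split_by_conditions and its signature are assumptions made for this example, not an existing library API. It makes a single pass, sends each item to the bucket of the first condition it satisfies, and collects unmatched items in a final bucket.

```python
def is_3(v):
    return v and not (v % 3)

def is_odd(v):
    return bool(v % 2)

def split_by_conditions(iterable, *conds):
    """ hypothetical helper: returns len(conds) + 1 lists, the last one holding unmatched items """
    buckets = [[] for _ in range(len(conds) + 1)]
    for v in iterable:
        for i, cond in enumerate(conds):
            if cond(v):
                # first matching condition wins; later ones never see this item
                buckets[i].append(v)
                break
        else:
            # no condition matched
            buckets[-1].append(v)
    return buckets

li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
div_by_3, odd, rest = split_by_conditions(li, is_3, is_odd)
```

Because the conditions are tried in order, the order of the arguments determines priority, which matches the "ignore already matched values" requirement in the question.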

PS: Thanks for your code!

