Set containment check is faster with a set literal

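Problem description

In the code below I check, 10_000_000 times, whether 10 is in {0, ..., 9}.

In the first check I assign the set to an intermediate variable; in the second check I use the literal directly.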

import timeit

x = 10
s = set(range(x))
number = 10 ** 7

stmt = f'my_set = {s} ; {x} in my_set'
print(f'eval "{stmt}"')
print(timeit.timeit(stmt=stmt, number=number))

stmt = f'{x} in {s}'
print(f'eval "{stmt}"')
print(timeit.timeit(stmt=stmt, number=number))

Output:

eval "my_set = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} ; 10 in my_set"
1.2576093
eval "10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}"
0.20336140000000036

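Why is the second one faster (roughly 6 times)? Does Python perform some runtime optimization, for example when the containment check is made against a literal? Or could it be due to garbage collection (because it is a literal, Python garbage-collects it right after use)?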

Tags: python, performance

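Solution

You are not testing the same thing in both cases. In the first test, in addition to the membership test, you are also timing the construction of the set, an assignment, and a name lookup: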

In [1]: import dis
   ...: x = 10
   ...: s = set(range(x))

In [2]: dis.dis("x in s")
  1        0 LOAD_NAME                0 (x)
           2 LOAD_NAME                1 (s)
           4 CONTAINS_OP              0
           6 RETURN_VALUE

In [3]: dis.dis("my_set = s; x in my_set")
  1        0 LOAD_NAME                0 (s)
           2 STORE_NAME               1 (my_set)
           4 LOAD_NAME                2 (x)
           6 LOAD_NAME                1 (my_set)
           8 CONTAINS_OP              0
          10 POP_TOP
          12 LOAD_CONST               0 (None)
          14 RETURN_VALUE

# By request
In [4]: dis.dis("s = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; 10 in s")
  1        0 BUILD_SET                0
           2 LOAD_CONST               0 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}))
           4 SET_UPDATE               1
           6 STORE_NAME               0 (s)
           8 LOAD_CONST               1 (10)
          10 LOAD_NAME                0 (s)
          12 CONTAINS_OP              0
          14 POP_TOP
          16 LOAD_CONST               2 (None)
          18 RETURN_VALUE

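The actual difference between using a literal and using x in s is that the latter has to look the names up in the global namespace, i.e. the difference is LOAD_NAME vs LOAD_CONST: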

In [5]: dis.dis("10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}")
  1        0 LOAD_CONST               0 (10)
           2 LOAD_CONST               1 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}))
           4 CONTAINS_OP              0
           6 RETURN_VALUE
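
The disassembly listings above can also be checked programmatically. The following is a minimal sketch, assuming CPython 3.9 or later; opcode names and constant folding are implementation details and may differ in other versions or interpreters. The helper name opnames is made up for illustration.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for a string of source code."""
    return [ins.opname for ins in dis.get_instructions(source)]


assigned = "s = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; 10 in s"
literal = "10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}"

# The assigned form has to build a fresh set every time it runs.
builds_set_assigned = "BUILD_SET" in opnames(assigned)

# The pure-literal form only loads a precomputed constant.
builds_set_literal = "BUILD_SET" in opnames(literal)

# The compiler stores that literal as a frozenset in the code object's constants.
code = compile(literal, "<example>", "eval")
folded = [c for c in code.co_consts if isinstance(c, frozenset)]

print(builds_set_assigned, builds_set_literal, folded)
```

Because a set literal used only as the right-hand side of in can never be mutated, the compiler is free to replace it with an immutable frozenset constant; once the set is bound to a name, it has to be a real, mutable set again.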

Timings:

In [6]: %timeit x in s
28.5 ns ± 0.792 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit 10 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
20.3 ns ± 0.384 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
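
To compare like with like using the timeit module from the question, the set construction can be kept out of the timed statement. This is a rough sketch, assuming x and s are handed to timeit through its globals parameter; absolute numbers depend on the machine and Python version, so none are shown here.

```python
import timeit

x = 10
s = set(range(x))
number = 10 ** 6  # fewer iterations than in the question, to keep the run short

# Membership test only; x and s are looked up by name on every iteration.
t_names = timeit.timeit("x in s", globals={"x": x, "s": s}, number=number)

# Membership test against a literal, which CPython loads as a constant.
t_literal = timeit.timeit(f"{x} in {s}", number=number)

# The question's first statement: it rebuilds the set and rebinds the name
# on every iteration before testing membership.
t_rebuild = timeit.timeit(f"my_set = {s}; {x} in my_set", number=number)

for label, t in [("names", t_names), ("literal", t_literal), ("rebuild", t_rebuild)]:
    print(f"{label:8s} {t / number * 1e9:.1f} ns per loop")
```

Comparing the "names" and "literal" rows isolates the name-lookup cost, while the "rebuild" row includes the set construction that dominated the question's first measurement.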
