Strange enumerate behavior with Cython and Python 3

Problem description

We have a body of code to port to Python 3, and we are running into very strange behavior with enumerate.

cdef char **c_argv
c_argv = <char**>malloc(sizeof(char*) * len(args))
for idx, s in enumerate(args):
    if bytes != str:
        s = s.encode('utf-8')
    c_argv[idx] = s

In Python 2, c_argv ends up holding every argument, whereas in Python 3 we see only one of them (the last one, repeated). Note that writing the for loop the "pythonic" way, without enumerate:

for i in args:

does not work either.

Here is the complete reproducer we tested:

test_enumerate.pyx

from libc.stdlib cimport malloc, free
from libc.string cimport const_char

def test_enumerate(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for idx, s in enumerate(args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s

    for i in range(len(args)):
        print("Set by enumerate",c_argv[i])        
    free(c_argv)

def test_loop_obj(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    idx=0
    for s in (args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s
        idx = idx+1
        
    for i in range(len(args)):
        print("Set by loop on objects",c_argv[i])        
    free(c_argv)

def test_loop(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for i in range(len(args)):
        if bytes != str:
            args[i] = args[i].encode('utf-8')
        c_argv[i] = args[i]

    for i in range(len(args)):
        print("Set by loop on index",c_argv[i])        
    free(c_argv)

test.py

from test_enumerate import test_enumerate, test_loop_obj, test_loop
test_enumerate(['salut','tu','vas','bien'])
test_loop_obj(['salut','tu','vas','bien'])
test_loop(['salut','tu','vas','bien'])

setup.py:

from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules = cythonize("test_enumerate.pyx")
)

We compile it:

python/python3 setup.py build_ext --inplace

Here is the output that illustrates our problem:

$ python test.py
('Set by enumerate', 'salut')
('Set by enumerate', 'tu')
('Set by enumerate', 'vas')
('Set by enumerate', 'bien')
('Set by loop on objects', 'salut')
('Set by loop on objects', 'tu')
('Set by loop on objects', 'vas')
('Set by loop on objects', 'bien')
('Set by loop on index', 'salut')
('Set by loop on index', 'tu')
('Set by loop on index', 'vas')
('Set by loop on index', 'bien')
$ python3 test.py
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on index', b'salut')
('Set by loop on index', b'tu')
('Set by loop on index', b'vas')
('Set by loop on index', b'bien')

Can someone explain what we are missing here?

Tags: python-3.x, python-2.7, cython, enumerate

Solution


c_argv[idx] = s

This sets c_argv[idx] to a pointer to the data inside s. The pointer is only valid for as long as the object s refers to still exists.

s = s.encode('utf-8')

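If this line runs, a new encoded bytes object is created and bound to s. The bytes object that s referred to on the previous iteration loses its last reference, so it may be freed while c_argv still points into it.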

Basically, don't mess with C pointers unless you understand (and can control) the lifetime of what they point to.
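
The lifetime effect described above can be observed from plain Python without any C pointers. The following is only a minimal sketch, not the Cython code from the question: it uses weak references as a stand-in for the raw char* pointers, since neither keeps its target alive. The Encoded class and both function names are hypothetical and exist only for this illustration, and the results rely on CPython's reference counting, which frees an object as soon as its last reference goes away.

```python
import weakref


class Encoded:
    """Hypothetical stand-in for the temporary bytes object made by s.encode().

    A plain class is used because bytes objects cannot be weakly referenced.
    """

    def __init__(self, text):
        self.data = text.encode("utf-8")


def alive_after_rebinding(args):
    """Mirror test_enumerate: rebind s on every iteration, keep only weak refs."""
    refs = []
    for s in args:
        s = Encoded(s)               # rebinding s drops the previous Encoded
        refs.append(weakref.ref(s))  # like a raw char*, this does not own s
    # s still names the last Encoded here, so only that one is still alive.
    return [r() is not None for r in refs]


def alive_when_kept(args):
    """Same loop, but a list owns every encoded object."""
    keep = []
    refs = []
    for s in args:
        enc = Encoded(s)
        keep.append(enc)             # keep owns enc for as long as keep lives
        refs.append(weakref.ref(enc))
    return [r() is not None for r in refs]


words = ["salut", "tu", "vas", "bien"]
print(alive_after_rebinding(words))  # on CPython: [False, False, False, True]
print(alive_when_kept(words))        # [True, True, True, True]
```

In the question's test_loop, writing the encoded bytes back into args[i] plays the role of the keep list, which is presumably why that variant prints all four values. The analogous fix in the Cython code would be to append each encoded bytes object to a Python list that stays alive for as long as c_argv is used.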

