Fixed-size sequence of byte strings in Cython

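Question

I'm new to Cython and have very little experience with C, so please bear with me.

I want to store a fixed-size sequence of immutable byte objects. The object looks like: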

obj = (b'abc', b'1234', b'^&$#%')

The elements of the tuple are immutable, but their lengths are arbitrary.

What I tried was something like the following:

cdef char[3] *obj
cdef char* a, b, c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = (a, b, c)

But I get:

Storing unsafe C derivative of temporary Python reference

Can someone point me in the right direction?

Bonus question: how do I type an arbitrarily long sequence of these 3-tuples?

Thanks!

Tags: python-3.x, cython

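Answer

You're definitely close! There seem to be two problems.

First, we need to change the declaration of obj so that it says we are trying to create an array of char* objects with a fixed size of 3. To do this, you write the type, then the variable name, and only then the size of the array. This gives you the stack-allocated array of char* that you want.

Second, when you declare char* a, b, c, only a is a char*, while b and c are just char! Cython makes this clear at compile time, where it outputs the following warning for me: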

Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

So you should do this instead:

cdef char* obj[3]
cdef char* a
cdef char* b
cdef char* c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = [a, b, c]

As a side note, you can cut down on typing cdef by writing your code like this:

cdef:
    char* obj[3]
    char* a
    char* b
    char* c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = [a, b, c]
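
The shared-declaration pitfall described above comes from C's declarator rules, which the answer says Cython follows. As a rough illustration in plain C++ (not Cython), the sketch below uses compile-time checks to show that only the first name in char *a, b, c is a pointer. The namespace demo and the two helper functions are hypothetical names introduced only for this example.

```cpp
#include <type_traits>

namespace demo {

// Shared declaration: the '*' attaches to the name `a`, not to the type,
// so `b` and `c` are plain char values rather than pointers.
char *a, b, c;

static_assert(std::is_same<decltype(a), char*>::value, "a is a char*");
static_assert(std::is_same<decltype(b), char>::value, "b is a plain char");
static_assert(std::is_same<decltype(c), char>::value, "c is a plain char");

// One pointer per declaration, as the Cython warning recommends.
char *p;
char *q;

static_assert(std::is_same<decltype(p), char*>::value, "p is a char*");
static_assert(std::is_same<decltype(q), char*>::value, "q is a char*");

// Hypothetical helpers that report the same facts to calling code.
constexpr bool first_name_is_pointer()  { return std::is_pointer<decltype(a)>::value; }
constexpr bool second_name_is_pointer() { return std::is_pointer<decltype(b)>::value; }

}  // namespace demo
```

If any of the static_assert lines were wrong, the file would fail to compile, so a successful build is itself the demonstration.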

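Bonus:

Given your level of experience with C and with pointers in general, I thought I would show a more beginner-friendly approach using C++ data structures. C++ has simple built-in data structures such as vector, which is the equivalent of a Python list. The C alternative is to have a pointer to a struct representing the triplets. You would then be personally responsible for managing the memory with functions such as malloc, free, and realloc.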
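
To give a feel for what the manual C-style alternative involves, here is a minimal sketch of a growable array of triplets managed with realloc and free. It is written so that it compiles as C++, and every name in it (Triplet, TripletList, triplet_list_push, triplet_list_free) is hypothetical. It assumes the strings are literals or otherwise outlive the list, because only the pointers are stored.

```cpp
#include <cstddef>
#include <cstdlib>

// Three borrowed C strings; the list does not own or copy the characters.
struct Triplet {
    const char* data[3];
};

struct TripletList {
    Triplet* items = nullptr;   // heap buffer, grown on demand
    std::size_t size = 0;       // number of triplets in use
    std::size_t capacity = 0;   // number of triplets the buffer can hold
};

// Appends one triplet, doubling the buffer when it is full.
// realloc on a null pointer behaves like malloc, so the first call allocates.
// Returns false and leaves the list unchanged if the allocation fails.
bool triplet_list_push(TripletList& list, Triplet t) {
    if (list.size == list.capacity) {
        std::size_t new_capacity = list.capacity ? list.capacity * 2 : 4;
        void* grown = std::realloc(list.items, new_capacity * sizeof(Triplet));
        if (grown == nullptr) {
            return false;
        }
        list.items = static_cast<Triplet*>(grown);
        list.capacity = new_capacity;
    }
    list.items[list.size++] = t;
    return true;
}

// Releases the buffer and resets the list so it can be reused safely.
void triplet_list_free(TripletList& list) {
    std::free(list.items);
    list = TripletList{};
}
```

Every push has to check for allocation failure and every list has to be freed exactly once, which is the bookkeeping that vector takes care of for you.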

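Here is something to get you started; I strongly recommend that you follow some online C or C++ tutorials on your own and adapt them to Cython, which should be fairly straightforward after some practice. I show a test.pyx file along with a setup.py file that shows how to compile it with C++ support.

test.pyx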

from libcpp.vector cimport vector

"""
While it can be discouraged to mix raw char* C "strings" with C++ data types,
the code here is pretty simple.
Fixed arrays cannot be used directly for vector type, so we use a struct.
Ideally, you would use an std::array of std::string, but std::array does not 
exist in cython's libcpp. It should be easy to add support for this with an
extern statement though (out of the scope of this mini-tutorial).
"""
ctypedef struct triplet:
    char* data[3]

cdef:
    vector[triplet] obj
    triplet abc
    triplet xyz

# bytes literals coerce to char* regardless of the language_level in use
abc.data = [b"abc", b"1234", b"^&$#%"]
xyz.data = [b"xyz", b"5678", b"%#$&^"]
obj.push_back(abc)  # pretty much like Python's list.append
obj.push_back(xyz)

"""
Loops through the vector.
Cython can automagically print structs so long as their members can be 
converted trivially to python types.
"""
for o in obj:
    print(o)
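
The comment in test.pyx mentions that an std::array of std::string would be the ideal element type. As a rough plain C++ sketch of that idea (not Cython, and not the answer's own code), the following stores owning copies of the bytes, so no pointer can outlive the data it refers to. The names StringTriplet, make_triplets, and total_bytes are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Each triplet owns its three strings; lengths can differ freely.
using StringTriplet = std::array<std::string, 3>;

// Builds the same two example triplets used in test.pyx.
std::vector<StringTriplet> make_triplets() {
    std::vector<StringTriplet> obj;
    obj.push_back(StringTriplet{{"abc", "1234", "^&$#%"}});
    obj.push_back(StringTriplet{{"xyz", "5678", "%#$&^"}});
    return obj;
}

// Sums the lengths of all strings in all triplets.
std::size_t total_bytes(const std::vector<StringTriplet>& obj) {
    std::size_t n = 0;
    for (const StringTriplet& triplet : obj) {
        for (const std::string& s : triplet) {
            n += s.size();
        }
    }
    return n;
}
```

Because std::string copies its contents, this layout costs an allocation per string but removes the lifetime concerns that come with storing raw char* values.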

setup.py

# distutils was removed from the standard library in Python 3.12; setuptools provides the same names
from setuptools import setup, Extension
from Cython.Build import cythonize

def create_extension(ext_name):
    global language, libs, args, link_args
    path_parts = ext_name.split(".")
    path = "./{0}.pyx".format("/".join(path_parts))
    ext = Extension(ext_name, sources=[path], libraries=libs, language=language,
            extra_compile_args=args, extra_link_args=link_args)
    return ext

if __name__ == "__main__":
    libs = []  # no external C libraries in this case
    language = "c++"  # chooses C++ rather than C since the STL is used
    args = ["-w", "-O3", "-ffast-math", "-march=native", "-fopenmp"]  # assumes gcc is the compiler
    link_args = ["-fopenmp"]  # matches -fopenmp in args; only needed for parallel (OpenMP) code
    annotate = True  # autogenerates .html files per .pyx
    directives = {  # saves typing @cython decorators and applies them globally
        "boundscheck": False,
        "wraparound": False,
        "initializedcheck": False,
        "cdivision": True,
        "nonecheck": False,
    }

    ext_names = [
        "test",
    ]

    extensions = [create_extension(ext_name) for ext_name in ext_names]
    setup(ext_modules = cythonize(
            extensions, 
            annotate=annotate, 
            compiler_directives=directives,
        )
    )
