Can ctypes.data_as always be relied on to keep a reference to a temporary object?

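Problem description

When passing an array from a python front-end to a c++ back-end library, can the following be relied upon? This used to work in python <= 3.6, but seems to cause sporadic crashes in python >= 3.7.

(This is a greatly simplified version of the "real" code, in which a user-facing python interface passes data back and forth to an underlying c++ library.)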

import ctypes
import numpy as np

# a 2d array, possibly not order="F"
xmat = np.ones((16, 32), dtype=np.float64)

# get a pointer to a version of xmat that is guaranteed to have order="F"
# if xmat already has order="F": no temporary
# if not, a temporary copy is made, reordered and a ptr to that returned
xptr = np.asfortranarray(xmat).ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# pass xptr to c++ back-end to do things (expects order="F" data)

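As I (currently!) understand ctypes.data_as:

Return the data pointer cast to a particular c-types object...

The returned pointer will keep a reference to the array.

There is also an example showing that when a temporary is created, e.g. (a + b).ctypes.data_as(ctypes.c_void_p), using data_as is the right thing to do.

With python >= 3.7, data_as does not seem to keep a reference to the temporary, and in the code above xptr ends up pointing to freed memory...

Am I doing something wrong? Is this a bug in python >= 3.7? Is there a better way to do this?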


A full example is given here (with some extra boilerplate that marshals arrays into structs for the back-end library):

import numpy as np
import ctypes as ct

lib_REALS_t = ct.c_double
lib_INDEX_t = ct.c_int32
lib_REALS_p = ct.POINTER(lib_REALS_t)

class lib_REALS_array_t(ct.Structure):
    _fields_ = [("size", lib_INDEX_t),
                ("data", lib_REALS_p)]

class lib_t(ct.Structure):
    _fields_ = [
    ("value", lib_REALS_array_t)]

def bug():

    libt = lib_t()

    # a 2d array, user-specified, possibly not order="F"
    xmat = np.ones((16, 32), dtype=np.float64, order="C")

    # get a pointer to a version of xmat that is guaranteed to have order="F"
    # if xmat already has order="F": no temporary
    # if not, a temporary copy is made, reordered and a ptr to that returned
    libt.value.size = xmat.size
    libt.value.data = np.asfortranarray(xmat).ctypes.data_as(ct.POINTER(lib_REALS_t))

    # pass xptr to c++ back-end to do things (expects order="F" data)

    # just "simulate" this by trying to access data using the pointer
    print(libt.value.data[1])

    return


if (__name__ == "__main__"): bug()

For me, python <= 3.6 prints 1.0 (as expected), while python >= 3.7 prints 6.92213454250094e-310 (i.e. the temporary must have been freed, so the pointer refers to uninitialised memory).

Tags: python, numpy, ctypes

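Solution

Listing [Python 3.Docs]: ctypes - A foreign function library for Python.

After investigating and browsing through the code, I reached a conclusion (I had an intuition from the start about what was going on).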

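It seems that the statement in [SciPy.Docs]: numpy.ndarray.ctypes:

_ctypes.data_as(self, obj)

...

The returned pointer will keep a reference to the array.

is misleading. Here, keeping a reference means that the pointer keeps the address of the array's (internal) buffer (in the sense that it does not copy the memory contents), not that it holds a Python reference (Py_XINCREF) to the array.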

Looking at [Github]: numpy/numpy - numpy/numpy/core/_internal.py:

def data_as(self, obj):
    # Comments
    return self._ctypes.cast(self._data, obj)

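This is a call to ctypes.cast, which only stores the address of the source array's buffer.

What happens is that np.asfortranarray(xmat) creates a temporary array (on the fly), and ctypes.data_as then returns the address of its buffer. After that line, the temporary goes out of scope (and so does its buffer), but its address is still referenced, which yields Undefined Behavior (UB).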
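Whether a temporary is involved at all depends on the memory layout of the input. The following is a minimal sketch (the variable names are illustrative and not part of the original code) that uses np.shares_memory to check whether np.asfortranarray had to copy the data:

```python
import numpy as np

# C-ordered input: np.asfortranarray has to copy and reorder the data,
# so its result is a new array (a temporary, if it is not bound to a name).
c_mat = np.ones((16, 32), dtype=np.float64, order="C")
c_copied = not np.shares_memory(c_mat, np.asfortranarray(c_mat))

# F-ordered input: no copy is needed, the existing buffer is reused.
f_mat = np.ones((16, 32), dtype=np.float64, order="F")
f_copied = not np.shares_memory(f_mat, np.asfortranarray(f_mat))

print("C-ordered input copied:", c_copied)
print("F-ordered input copied:", f_copied)
```

Only the copied case is at risk: when the input is already Fortran-ordered, the pointer refers to the buffer of the caller's own array, which stays alive as long as the caller keeps it.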

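This was mentioned in v1.15.0 of [SciPy.Docs]: numpy.ndarray.ctypes (emphasis is mine):

Be careful using the ctypes attribute - especially on temporary arrays or arrays constructed on the fly. For example, calling (a+b).ctypes.data_as(ctypes.c_void_p) returns a pointer to memory that is invalid because the array created as (a+b) is deallocated before the next Python statement. You can avoid this problem using either c=a+b or ct=(a+b).ctypes. In the latter case, ct will hold a reference to the array until ct is deleted or re-assigned.

But they took it out afterwards (although the code was not modified, as far as this behavior is concerned).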
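The two workarounds named in that note can be sketched as follows. This is only an illustration under simple assumptions: the arrays a and b and the name ct_iface are made up for the example (ct_iface is used instead of ct to avoid clashing with the ctypes module alias).

```python
import ctypes as ct
import numpy as np

DblPtr = ct.POINTER(ct.c_double)

a = np.full(4, 1.0)
b = np.full(4, 2.0)

# Option 1: bind the sum to a name, so the array outlives the statement.
c = a + b
ptr_c = c.ctypes.data_as(DblPtr)
first_c = ptr_c[0]  # c is still alive at this point

# Option 2: keep the ctypes interface object itself. Per the quoted note,
# it holds a reference to the array until it is deleted or re-assigned.
ct_iface = (a + b).ctypes
ptr_ct = ct_iface.data_as(DblPtr)
first_ct = ptr_ct[0]  # ct_iface is still alive at this point

print(first_c, first_ct)
```

In both cases the pointer is only dereferenced while something named in Python still owns the array.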

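To get past the error, "save" the temporary array, or keep a (Python) reference to it. The same problem was encountered in [SO]: Access violation when trying to read out object created in Python passed to std::vector on C++ side and then returned to Python (@CristiFati's answer).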
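One way to keep such a reference without relying on local variables is to store the array next to the structure that points into it. The sketch below is an assumption of how that could look, not code from the question: FortranArrayHandle is a hypothetical helper, and Struct0 uses the same size/data layout as the script in this answer.

```python
import ctypes as ct
import numpy as np

DblPtr = ct.POINTER(ct.c_double)


class Struct0(ct.Structure):
    _fields_ = [
        ("size", ct.c_uint32),
        ("data", DblPtr),
    ]


class FortranArrayHandle:
    """Hypothetical helper: owns the array and the struct pointing into it."""

    def __init__(self, array):
        # Stored on self, so the (possibly copied) array lives as long
        # as the handle does.
        self.array = np.asfortranarray(array, dtype=np.float64)
        self.struct = Struct0(self.array.size,
                              self.array.ctypes.data_as(DblPtr))


mat = np.arange(6, dtype=np.float64).reshape(2, 3)  # C-ordered input
handle = FortranArrayHandle(mat)

# In column-major order, element [1] of the buffer is mat[1, 0].
second = handle.struct.data[1]
print(second)
```

As long as handle is kept around for the duration of the back-end call, the buffer behind handle.struct.data stays valid.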

I changed your code a bit (including those horrible names :) ).

code00.py

#!/usr/bin/env python3

import sys
import ctypes as ct
import numpy as np
from collections import defaultdict


DblPtr = ct.POINTER(ct.c_double)

class Struct0(ct.Structure):
    _fields_ = [
        ("size", ct.c_uint32),
        ("data", DblPtr),
    ]


class Wrapper(ct.Structure):
    _fields_ = [
        ("value", Struct0),
    ]


def test_np(np_array, save_intermediary_array):
    wrapper = Wrapper()
    wrapper.value.size = np_array.size

    if save_intermediary_array:
        fortran_array = np.asfortranarray(np_array)
        wrapper.value.data = fortran_array.ctypes.data_as(DblPtr)
    else:
        wrapper.value.data = np.asfortranarray(np_array).ctypes.data_as(DblPtr)
    #print(wrapper.value.data[0])
    return wrapper.value.data[1]


def main(*argv):
    dim1, dim0 = 16, 32
    mat = np.ones((dim1, dim0), dtype=np.float64, order="C")
    print("NumPy CTypes data: {0:}\n{1:}".format(mat.ctypes, mat.ctypes._ctypes))

    dd = defaultdict(int)
    flag = 0  # Change to 1 to avoid problem
    print("Saving intermediary array: {0:d}".format(flag))
    for i in range(100):
        dd[test_np(mat, flag)] += 1
    print("\nResult: {0:}".format(dd))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    print("NumPy version: {0:}".format(np.version.version))
    main(*sys.argv[1:])
    print("\nDone.")

Output:

e:\Work\Dev\StackOverflow\q059959608>sopr.bat
*** Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ***

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code01.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy version: 1.18.0
NumPy CTypes data: <numpy.core._internal._ctypes object at 0x000001C9744B0348>
<module 'ctypes' from 'c:\\Install\\pc064\\Python\\Python\\03.07.06\\Lib\\ctypes\\__init__.py'>
Saving intermediary array: 0

Result: defaultdict(<class 'int'>, {9.707134377684e-312: 100})

Done.

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code01.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy version: 1.18.0
NumPy CTypes data: <numpy.core._internal._ctypes object at 0x000001842ECA4FC8>
<module 'ctypes' from 'c:\\Install\\pc064\\Python\\Python\\03.07.06\\Lib\\ctypes\\__init__.py'>
Saving intermediary array: 0

Result: defaultdict(<class 'int'>, {1.0: 100})

Done.

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code01.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy version: 1.18.0
NumPy CTypes data: <numpy.core._internal._ctypes object at 0x000001AD586E91C8>
<module 'ctypes' from 'c:\\Install\\pc064\\Python\\Python\\03.07.06\\Lib\\ctypes\\__init__.py'>
Saving intermediary array: 0

Result: defaultdict(<class 'int'>, {9.110668798574e-312: 100})

Done.

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code01.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy version: 1.18.0
NumPy CTypes data: <numpy.core._internal._ctypes object at 0x0000012F903A9188>
<module 'ctypes' from 'c:\\Install\\pc064\\Python\\Python\\03.07.06\\Lib\\ctypes\\__init__.py'>
Saving intermediary array: 0

Result: defaultdict(<class 'int'>, {6.44158096444e-312: 100})

Done.

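Notes:

  • As seen, the results are pretty random, which is usually an indicator of UB
  • Interestingly, within the same run it is always the same value (the defaultdict has only one item)
  • Changing flag to 1 (or anything that evaluates to True) makes the problem go away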
