What is numpy.core._multiarray_umath.implement_array_function and why does it take so much time?

Problem description

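I use numpy for large-scale data analysis with many matrix operations (such as dot, count_nonzero, linalg.svd, etc.). After running %prun in a Jupyter notebook, I found that numpy.core._multiarray_umath.implement_array_function takes a lot of time: 38 seconds of cumtime out of about 250 seconds in total, with a large ncalls value (67139/66979). I know there are other functions I should optimize, but I wonder whether this one can be suppressed as well, and what is it used for?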

Here is my %prun output:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 1848  203.845    0.110  242.582    0.131 stacking.py:130(_rda_cv)
 67139/66979   27.980    0.000   38.901    0.001 {built-in method numpy.core._multiarray_umath.implement_array_function}
    4    8.181    2.045  251.415   62.854 stacking.py:192(_model_selection)
14883    7.942    0.001    7.942    0.001 {method 'reduce' of 'numpy.ufunc' objects}
11096    2.107    0.000    2.353    0.000 linalg.py:1468(svd)
    4    0.154    0.038    0.188    0.047 stacking.py:20(_get_qvalues)
    1    0.149    0.149  251.887  251.887 stacking.py:255(fit)
   16    0.149    0.009    0.508    0.032 stacking.py:70(_construct_cov)
26341    0.140    0.000    0.140    0.000 {built-in method numpy.array}
    4    0.132    0.033    0.609    0.152 stacking.py:89(_construct_cov_cv)
11164    0.114    0.000    0.367    0.000 _methods.py:134(_mean)
 1919    0.102    0.000    0.102    0.000 {built-in method numpy.empty}
36989    0.073    0.000    0.073    0.000 {method 'astype' of 'numpy.ndarray' objects}
11132    0.052    0.000    0.383    0.000 fromnumeric.py:3153(mean)
   32    0.052    0.002    0.302    0.009 function_base.py:2245(cov)
38870    0.052    0.000   27.967    0.001 <__array_function__ internals>:2(dot)
11164    0.051    0.000    0.054    0.000 _methods.py:50(_count_reduce_items)
11096    0.043    0.000    0.070    0.000 linalg.py:144(_commonType)
   13    0.036    0.003    0.036    0.003 {method 'argsort' of 'numpy.ndarray' objects}
 3696    0.035    0.000    7.909    0.002 numeric.py:409(count_nonzero)
11096    0.033    0.000    0.064    0.000 linalg.py:116(_makearray)
66728    0.031    0.000    0.031    0.000 {built-in method builtins.issubclass}
11096    0.027    0.000    2.407    0.000 <__array_function__ internals>:2(svd)
11145    0.026    0.000    0.026    0.000 {method 'flatten' of 'numpy.ndarray' objects}
11096    0.024    0.000    0.024    0.000 linalg.py:111(get_linalg_error_extobj)
348583    0.023    0.000    0.023    0.000 {method 'append' of 'list' objects}
11132    0.021    0.000    0.421    0.000 <__array_function__ internals>:2(mean)
 7408    0.018    0.000    0.034    0.000 numerictypes.py:293(issubclass_)
 3696    0.017    0.000    7.940    0.002 <__array_function__ internals>:2(count_nonzero)
 3704    0.017    0.000    0.053    0.000 numerictypes.py:365(issubdtype)
 5544    0.017    0.000    0.017    0.000 stacking.py:146(<dictcomp>)
22192    0.016    0.000    0.025    0.000 linalg.py:134(_realType)
   40    0.016    0.000    0.016    0.000 {method 'sort' of 'numpy.ndarray' objects}
 3702    0.013    0.000    7.795    0.002 {method 'sum' of 'numpy.ndarray' objects}
15009    0.012    0.000    0.028    0.000 _asarray.py:88(asanyarray)
    5    0.012    0.002    0.053    0.011 _split.py:628(_make_test_folds)
22192    0.010    0.000    0.013    0.000 linalg.py:121(isComplexType)
22602    0.010    0.000    0.010    0.000 {built-in method builtins.isinstance}
13199    0.010    0.000    0.010    0.000 {built-in method builtins.getattr}
11264    0.010    0.000    0.025    0.000 _asarray.py:16(asarray)
11096    0.009    0.000    0.009    0.000 linalg.py:203(_assertRankAtLeast2)
22196    0.009    0.000    0.009    0.000 {method 'get' of 'dict' objects}
 1964    0.009    0.000    0.009    0.000 {method 'argmax' of 'numpy.ndarray' objects}
11132    0.008    0.000    0.008    0.000 {built-in method __new__ of type object at 0x00007FF847CE9BA0}
38870    0.008    0.000    0.008    0.000 multiarray.py:707(dot)
11625    0.008    0.000    0.008    0.000 {built-in method builtins.hasattr}
   45    0.007    0.000    0.038    0.001 arraysetops.py:297(_unique1d)
60/20    0.006    0.000    0.059    0.003 _split.py:74(split)
 1964    0.006    0.000    0.034    0.000 <__array_function__ internals>:2(argmax)
 1964    0.006    0.000    0.023    0.000 fromnumeric.py:1091(argmax)
 3702    0.005    0.000    7.782    0.002 _methods.py:36(_sum)
    4    0.005    0.001    0.221    0.055 stacking.py:317(_normalizer)
 1982    0.004    0.000    0.044    0.000 fromnumeric.py:55(_wrapfunc)
22192    0.004    0.000    0.004    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
11096    0.004    0.000    0.004    0.000 linalg.py:1464(_svd_dispatcher)
   40    0.003    0.000    0.004    0.000 _split.py:107(_iter_test_masks)
11132    0.003    0.000    0.003    0.000 fromnumeric.py:3149(_mean_dispatcher)
 3696    0.003    0.000    0.003    0.000 numeric.py:405(_count_nonzero_dispatcher)
    3    0.003    0.001    0.005    0.002 stacking.py:243(_rda_prediction)
   20    0.002    0.000    0.055    0.003 _split.py:680(_iter_test_masks)
    1    0.002    0.002  251.889  251.889 <string>:1(<module>)
   48    0.002    0.000    0.002    0.000 {built-in method numpy.zeros}
   25    0.002    0.000    0.002    0.000 {built-in method numpy.arange}
    4    0.001    0.000    0.001    0.000 {method 'partition' of 'numpy.ndarray' objects}
    5    0.001    0.000    0.001    0.000 {method 'cumsum' of 'numpy.ndarray' objects}
   45    0.001    0.000    0.039    0.001 arraysetops.py:151(unique)
 1964    0.001    0.000    0.001    0.000 fromnumeric.py:1087(_argmax_dispatcher)
    5    0.001    0.000    0.011    0.002 multiclass.py:174(type_of_target)
  116    0.001    0.000    0.002    0.000 fromnumeric.py:42(_wrapit)
   32    0.001    0.000    0.001    0.000 stride_tricks.py:116(_broadcast_to)
   32    0.000    0.000    0.038    0.001 function_base.py:293(average)
    4    0.000    0.000    0.001    0.000 stacking.py:107(_calculate_weights)
  120    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(where)
  115    0.000    0.000    0.001    0.000 validation.py:127(_num_samples)
   40    0.000    0.000    0.001    0.000 _split.py:430(_iter_test_indices)
  135    0.000    0.000    0.000    0.000 {built-in method _abc._abc_instancecheck}
60/20    0.000    0.000    0.060    0.003 _split.py:299(split)
   30    0.000    0.000    0.001    0.000 validation.py:238(indexable)
    5    0.000    0.000    0.001    0.000 validation.py:362(check_array)
    1    0.000    0.000  251.889  251.889 {built-in method builtins.exec}
    5    0.000    0.000    0.000    0.000 {method 'nonzero' of 'numpy.ndarray' objects}
    4    0.000    0.000    0.002    0.001 function_base.py:3508(_median)
  130    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
    5    0.000    0.000    0.000    0.000 function_base.py:1147(diff)
    1    0.000    0.000    0.003    0.003 stacking.py:350(_check_y)
   32    0.000    0.000    0.321    0.010 <__array_function__ internals>:2(cov)
    4    0.000    0.000    0.000    0.000 utils.py:1142(_median_nancheck)
    5    0.000    0.000    0.001    0.000 _split.py:661(<listcomp>)
   32    0.000    0.000    0.038    0.001 <__array_function__ internals>:2(average)
   32    0.000    0.000    0.036    0.001 {method 'mean' of 'numpy.ndarray' objects}
   30    0.000    0.000    0.001    0.000 validation.py:220(check_consistent_length)
   32    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
   32    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(broadcast_to)
   15    0.000    0.000    0.000    0.000 fromnumeric.py:73(_wrapreduction)
    5    0.000    0.000    0.001    0.000 validation.py:40(_assert_all_finite)
   15    0.000    0.000    0.000    0.000 _split.py:277(__init__)
   45    0.000    0.000    0.040    0.001 <__array_function__ internals>:2(unique)
   32    0.000    0.000    0.001    0.000 stride_tricks.py:143(broadcast_to)
    4    0.000    0.000    0.002    0.001 function_base.py:3359(_ureduce)
   32    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(result_type)
   32    0.000    0.000    0.000    0.000 <string>:1(__new__)
  135    0.000    0.000    0.000    0.000 abc.py:137(__instancecheck__)
    8    0.000    0.000    0.000    0.000 numeric.py:1273(normalize_axis_tuple)
   32    0.000    0.000    0.000    0.000 {built-in method builtins.any}
    4    0.000    0.000    0.000    0.000 numeric.py:1336(moveaxis)
  130    0.000    0.000    0.000    0.000 abc.py:141(__subclasscheck__)
   32    0.000    0.000    0.000    0.000 function_base.py:257(iterable)
  269    0.000    0.000    0.000    0.000 {built-in method builtins.len}
    5    0.000    0.000    0.000    0.000 validation.py:153(_shape_repr)
  120    0.000    0.000    0.000    0.000 multiarray.py:312(where)
   18    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
   32    0.000    0.000    0.000    0.000 {method 'conj' of 'numpy.ndarray' objects}
   95    0.000    0.000    0.000    0.000 base.py:1189(isspmatrix)
   45    0.000    0.000    0.000    0.000 arraysetops.py:138(_unpack_tuple)
    5    0.000    0.000    0.000    0.000 _split.py:622(__init__)
    5    0.000    0.000    0.000    0.000 warnings.py:474(__enter__)
   32    0.000    0.000    0.000    0.000 {method 'squeeze' of 'numpy.ndarray' objects}
   30    0.000    0.000    0.000    0.000 validation.py:231(<listcomp>)
   10    0.000    0.000    0.000    0.000 numeric.py:290(full)
   10    0.000    0.000    0.000    0.000 _split.py:423(__init__)
    8    0.000    0.000    0.026    0.003 fromnumeric.py:978(argsort)
    8    0.000    0.000    0.000    0.000 numeric.py:166(ones)
   64    0.000    0.000    0.000    0.000 stride_tricks.py:121(<genexpr>)
   32    0.000    0.000    0.000    0.000 stride_tricks.py:26(_maybe_view_as_subclass)
    5    0.000    0.000    0.000    0.000 warnings.py:181(_add_filter)
    4    0.000    0.000    0.000    0.000 {built-in method _bisect.bisect_left}
    5    0.000    0.000    0.001    0.000 _split.py:685(split)
    8    0.000    0.000    0.026    0.003 <__array_function__ internals>:2(argsort)
    5    0.000    0.000    0.000    0.000 _internal.py:865(npy_ctypes_check)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1648(ravel)
    4    0.000    0.000    0.002    0.000 fromnumeric.py:657(partition)
   10    0.000    0.000    0.000    0.000 validation.py:180(<genexpr>)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2629(amin)
    4    0.000    0.000    0.002    0.001 function_base.py:3419(median)
   32    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
   10    0.000    0.000    0.000    0.000 {built-in method builtins.max}
    5    0.000    0.000    0.000    0.000 warnings.py:453(__init__)
    5    0.000    0.000    0.000    0.000 warnings.py:165(simplefilter)
   32    0.000    0.000    0.000    0.000 function_base.py:2240(_cov_dispatcher)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(nonzero)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2189(any)
    5    0.000    0.000    0.000    0.000 validation.py:771(column_or_1d)
    5    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
   15    0.000    0.000    0.000    0.000 fromnumeric.py:74(<dictcomp>)
   32    0.000    0.000    0.000    0.000 function_base.py:289(_average_dispatcher)
    5    0.000    0.000    0.001    0.000 fromnumeric.py:2358(cumsum)
    4    0.000    0.000    0.002    0.001 <__array_function__ internals>:2(median)
    5    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
   13    0.000    0.000    0.000    0.000 {built-in method numpy.core._multiarray_umath.normalize_axis_index}
    4    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(partition)
    5    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(bincount)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(concatenate)
    4    0.000    0.000    0.000    0.000 core.py:6251(isMaskedArray)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
    9    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
    5    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
    5    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(cumsum)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(diff)
    4    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1759(nonzero)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(amin)
   32    0.000    0.000    0.000    0.000 stride_tricks.py:139(_broadcast_to_dispatcher)
   45    0.000    0.000    0.000    0.000 arraysetops.py:146(_unique_dispatcher)
    4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(moveaxis)
    5    0.000    0.000    0.000    0.000 _config.py:12(get_config)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(shape)
    5    0.000    0.000    0.000    0.000 multiclass.py:111(is_multilabel)
    5    0.000    0.000    0.000    0.000 warnings.py:493(__exit__)
   32    0.000    0.000    0.000    0.000 multiarray.py:635(result_type)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2277(all)
    5    0.000    0.000    0.000    0.000 validation.py:355(_ensure_no_complex_data)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(all)
    5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ravel)
   18    0.000    0.000    0.000    0.000 multiarray.py:1043(copyto)
    8    0.000    0.000    0.000    0.000 numeric.py:1323(<listcomp>)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1755(_nonzero_dispatcher)
    4    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
    5    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
   15    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
    8    0.000    0.000    0.000    0.000 fromnumeric.py:974(_argsort_dispatcher)
    1    0.000    0.000    0.000    0.000 _methods.py:32(_amin)
    8    0.000    0.000    0.000    0.000 {built-in method _operator.index}
   15    0.000    0.000    0.000    0.000 {built-in method _warnings._filters_mutated}
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1856(shape)
    5    0.000    0.000    0.000    0.000 multiarray.py:145(concatenate)
    4    0.000    0.000    0.000    0.000 function_base.py:3414(_median_dispatcher)
    1    0.000    0.000    0.000    0.000 {method 'min' of 'numpy.ndarray' objects}
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2185(_any_dispatcher)
    5    0.000    0.000    0.000    0.000 multiarray.py:853(bincount)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1852(_shape_dispatcher)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2354(_cumsum_dispatcher)
    5    0.000    0.000    0.000    0.000 function_base.py:1143(_diff_dispatcher)
    1    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
    4    0.000    0.000    0.000    0.000 numeric.py:1399(<listcomp>)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2273(_all_dispatcher)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:2624(_amin_dispatcher)
    4    0.000    0.000    0.000    0.000 fromnumeric.py:653(_partition_dispatcher)
    5    0.000    0.000    0.000    0.000 fromnumeric.py:1644(_ravel_dispatcher)
    4    0.000    0.000    0.000    0.000 numeric.py:1332(_moveaxis_dispatcher)
    1    0.000    0.000    0.000    0.000 _methods.py:28(_amax)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Tags: python, numpy

Solution


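Recent versions of NumPy support an __array_function__ hook that objects can implement to customize what arbitrary NumPy callables do when called on them. Support is disabled by default in 1.16, enabled by default in 1.17, and expected to eventually be enabled unconditionally.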
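On whether the dispatcher can be suppressed: the sketch below assumes NumPy 1.17.x, where the NUMPY_EXPERIMENTAL_ARRAY_FUNCTION environment variable is read once at import time and setting it to 0 turns the __array_function__ protocol off (in 1.16 the same variable set to 1 turns it on). Treat this as an assumption about your installed version. Later releases may ignore the variable, and it has no effect if numpy was already imported in the session, so in Jupyter the kernel has to be restarted first.

```python
import importlib.util
import os

# Assumption: NumPy 1.17.x. The variable is only consulted when numpy is
# first imported, so it must be set before the import statement runs.
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "0"

# Guarded import so the snippet also runs where numpy is not installed.
if importlib.util.find_spec("numpy") is not None:
    import numpy as np

    # Print the version so you can check whether the assumption applies.
    print(np.__version__)
```

Disabling the protocol only removes the dispatch wrapper. The time spent in the underlying implementations (dot, svd, and so on) is unaffected, and objects that rely on __array_function__ overrides will no longer be dispatched to.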

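implement_array_function is the dispatcher that calls either the default implementation or an __array_function__ hook in order to implement __array_function__ support. By design, it is called on practically every call to a public NumPy callable, including calls that happen inside NumPy itself, and it has to do a number of method lookups. Hopefully, future optimization work will cut out some of this overhead.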
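To make the dispatch step concrete, here is a simplified pure-Python model of it, loosely following the description in NEP 18. It is a sketch and not NumPy's actual C implementation: the array_function_dispatch decorator here is a cut-down stand-in, and total, _total_dispatcher and LoggedList are hypothetical names invented for the illustration.

```python
import functools


def implement_array_function(implementation, public_api, relevant_args, args, kwargs):
    """Simplified model of the dispatcher (NOT NumPy's real C code)."""
    # Keep one argument per distinct type that defines the hook.
    overloaded = []
    for arg in relevant_args:
        has_hook = hasattr(type(arg), "__array_function__")
        if has_hook and type(arg) not in [type(o) for o in overloaded]:
            overloaded.append(arg)

    # No overrides found: run the default implementation.
    if not overloaded:
        return implementation(*args, **kwargs)

    # Otherwise give each hook a chance to handle the call.
    types = tuple(type(arg) for arg in overloaded)
    for arg in overloaded:
        result = arg.__array_function__(public_api, types, args, kwargs)
        if result is not NotImplemented:
            return result
    raise TypeError("no implementation found for {!r}".format(public_api.__name__))


def array_function_dispatch(dispatcher):
    """Wrap an implementation so every call goes through the dispatcher."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            return implement_array_function(
                implementation, public_api, relevant_args, args, kwargs)
        return public_api
    return decorator


def _total_dispatcher(values):
    # Returns the arguments that should be checked for the hook.
    return (values,)


@array_function_dispatch(_total_dispatcher)
def total(values):
    """Hypothetical stand-in for a public NumPy function."""
    return sum(values)


class LoggedList:
    """Hypothetical duck array that overrides `total` through the hook."""

    def __init__(self, items):
        self.items = list(items)

    def __array_function__(self, func, types, args, kwargs):
        if func is not total:
            return NotImplemented
        return "LoggedList total = {}".format(sum(self.items))


plain_result = total([1, 2, 3])               # default implementation
hooked_result = total(LoggedList([1, 2, 3]))  # handled by the hook
print(plain_result)    # 6
print(hooked_result)   # LoggedList total = 6
```

Even in this toy version, every call to total pays for a dispatcher call plus a type and attribute check per relevant argument before any real work starts, which is the kind of per-call overhead described above.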

You can find more details in NEP 18, and you can inspect the function's docstring with help(numpy.core._multiarray_umath.implement_array_function):

Help on built-in function implement_array_function in module numpy.core._multiarray_umath:

implement_array_function(...)
    Implement a function with checks for __array_function__ overrides.

    All arguments are required, and can only be passed by position.

    Arguments
    ---------
    implementation : function
        Function that implements the operation on NumPy array without
        overrides when called like ``implementation(*args, **kwargs)``.
    public_api : function
        Function exposed by NumPy's public API originally called like
        ``public_api(*args, **kwargs)`` on which arguments are now being
        checked.
    relevant_args : iterable
        Iterable of arguments to check for __array_function__ methods.
    args : tuple
        Arbitrary positional arguments originally passed into ``public_api``.
    kwargs : dict
        Arbitrary keyword arguments originally passed into ``public_api``.

    Returns
    -------
    Result from calling ``implementation()`` or an ``__array_function__``
    method, as appropriate.

    Raises
    ------
    TypeError : if no implementation is found.
