Suggestions for improving a function that parses a numpy array to a single value

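Problem description

I wrote a function that aggregates a 2D numpy integer array of shape n x m into a 2D numpy array of shape 1 x 1, based on a selected method. How can I improve my function for speed/performance?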

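The methods are:

  1. min: returns the minimum value
  2. max: returns the maximum value
  3. median: returns the most frequently occurring value (strictly speaking, the mode).
  4. priority value: returns the specified value `prio` if its number of occurrences in the array is at least the threshold `th`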
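
As a small illustration of the last two methods, the sketch below applies `np.unique` with `return_counts=True` to a made-up array. The sample values and the choices of `prio` and `th` are assumptions picked only for this example.

```python
import numpy as np

# Hypothetical 3 x 3 input; the values are made up for illustration.
arr = np.array([[1, 2, 2],
                [3, 2, 5],
                [5, 5, 2]])

# Distinct values and how often each one occurs.
vals, counts = np.unique(arr, return_counts=True)

# "median" as used in this question: the most frequent value (the mode).
most_frequent = vals[counts.argmax()]

# "priority": return prio when it occurs at least th times, else the mode.
prio, th = 5, 3
prio_count = counts[vals == prio]  # empty when prio is absent from arr
meets_threshold = prio_count.size > 0 and prio_count[0] >= th
priority_result = prio if meets_threshold else most_frequent

print(arr.min(), arr.max(), most_frequent, priority_result)
```

Here 2 occurs four times and 5 occurs three times, so the most frequent value is 2, while the priority rule returns 5 because its count reaches `th`.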

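Other requirements:

  1. If all values in the input are the same, return that number
  2. An `ignore` value can be provided that is masked out of the methods, but not out of the requirement above.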
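
The precedence between these two requirements can be sketched as follows: the all-equal check runs on the raw array before anything is dropped, so it applies even when the shared value is the `ignore` value. The helper name `all_equal_or_none` and the sample arrays are hypothetical and not part of the original function.

```python
import numpy as np

def all_equal_or_none(arr):
    """Return the shared value if every element of arr is equal, else None."""
    first = arr.flat[0]
    return first if (arr == first).all() else None

ignore = 0

# Every cell holds the ignore value: requirement 1 still applies.
uniform = np.zeros((2, 3), dtype=int)
shared = all_equal_or_none(uniform)

# Mixed input: no shared value, so the ignore value is dropped
# before any method is applied.
mixed = np.array([[0, 0, 0],
                  [4, 7, 7]])
kept = mixed[mixed != ignore]  # 1-D array of the non-ignored values
```

With these inputs, `shared` is 0 even though 0 is the ignored value, and `kept` holds only 4, 7, 7.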

My current implementation:

import numpy as np

def array2val(arr, method, dt, prio=None, th=None, ignore=None):
    """
    Parse a Numpy array to a single output value based on method. Useful for aggregation
    :param arr: 2D numpy array
    :param method: [sum, min, max, median, priority]. priority means to give priority to a value if it occurs >= a threshold
    :param dt: datatype of output array
    :param prio: the value to be prioritized if method == priority
    :param th: occurrence threshold for the priority value. Return median if threshold is not met
    :param ignore: value to ignore in all methods
    :return: 2D numpy array with shape (1,1) with value following above, unless the input array has all same values,
             then return that value. This trumps ignore values
    """

    # All values are the same, return this value
    if arr.std() == 0:
        return np.array([[arr[0, 0]]]).astype(dt)

    # Mask away ignored values if requested
    if ignore is not None:
        arr = np.ma.array(arr, mask=(arr == ignore))
        # Count only the unmasked values; compressed() drops the masked entries
        vals, counts = np.unique(arr.compressed(), return_counts=True)
    else:
        vals, counts = np.unique(arr, return_counts=True)

    if method == 'median':
        out = vals[counts.argmax()]
        return np.array([[out]]).astype(dt)

    elif method == 'priority':
        prio_count = counts[vals == prio]  # empty if prio is not in the array
        if prio_count.size > 0 and prio_count[0] >= th:  # priority value is in the array and meets the threshold
            return np.array([[prio]]).astype(dt)
        else:  # priority value is below the threshold or is not in the array at all
            out = vals[counts.argmax()]  # default to the most frequently occurring value
            return np.array([[out]]).astype(dt)

    elif method == 'sum':
        return np.array([[arr.sum()]]).astype(dt)

    elif method == 'min':
        return np.array([[arr.min()]]).astype(dt)

    elif method == 'max':
        return np.array([[arr.max()]]).astype(dt)

    else:
        raise ValueError('Invalid method for aggregation')

Tags: python, arrays, numpy, aggregate, mask

Solution
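
One possible direction, offered as an unbenchmarked sketch rather than a verified answer: test for uniform input with `min` and `max` instead of `std`, and drop the ignored value with a plain boolean index instead of a masked array, which may reduce overhead. The name `array2val_sketch` is hypothetical, and the sketch assumes at least one non-ignored value remains after filtering.

```python
import numpy as np

def array2val_sketch(arr, method, dt, prio=None, th=None, ignore=None):
    """Hypothetical variant of array2val; same arguments, same (1, 1) result."""
    lo, hi = arr.min(), arr.max()
    if lo == hi:
        # All values are equal; this takes precedence over ignore.
        return np.array([[lo]]).astype(dt)

    # Flatten, then drop the ignored value with a plain boolean index.
    flat = arr.ravel()
    if ignore is not None:
        flat = flat[flat != ignore]

    if method == 'sum':
        out = flat.sum()
    elif method == 'min':
        out = flat.min()
    elif method == 'max':
        out = flat.max()
    elif method in ('median', 'priority'):
        vals, counts = np.unique(flat, return_counts=True)
        out = vals[counts.argmax()]  # most frequent value
        if method == 'priority':
            prio_count = counts[vals == prio]  # empty if prio is absent
            if prio_count.size > 0 and prio_count[0] >= th:
                out = prio
    else:
        raise ValueError('Invalid method for aggregation')

    return np.array([[out]]).astype(dt)

# Made-up input: with 0 ignored, the most frequent remaining value is 2.
a = np.array([[1, 2, 2],
              [0, 0, 0]])
print(array2val_sketch(a, 'median', np.int32, ignore=0))
```

Whether this is actually faster depends on array size and dtype, so it is worth timing both versions on representative inputs before adopting either change.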

