Remove the smallest p% of numbers from multiple numpy arrays

Problem description

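I hope everyone is doing well in these COVID-19 pandemic times! I have the following question:

I have multiple multi-dimensional numpy arrays containing different random numbers. I need to remove the smallest-magnitude p% of numbers across all of these arrays taken together. This is like storing all the numbers from the arrays in a single 1-D array and then removing the smallest-magnitude numbers from it.

I know how to do this for a single numpy array. The code is as follows: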

import numpy as np

# Define 3 numpy arrays
x = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 64))
y = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 128))
z = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 256))

x.shape, y.shape, z.shape
# ((3, 3, 64), (3, 3, 128), (3, 3, 256))

# Print a slice of 'x'
# x[:,:, 0]

# Compute absolute values of 'x'
x_mod = np.abs(x)

# Set the smallest (p = 10%) absolute-magnitude numbers to zero
x_mod[x_mod < np.percentile(x_mod, 10)] = 0

# Zero for the pruned numbers, otherwise the original values from 'x'
x_fin = np.where(x_mod == 0, 0, x)

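Is there a way to remove the smallest-magnitude weights while taking all of the numpy arrays into account (x, y and z in this example)?

I am using Python 3.8 and numpy 1.18.

Thanks!

Edit:

My current approach to this problem is as follows: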

# Create a 1-D numpy array by concatenating all numpy arrays
# (axis = None flattens each input before concatenating)
a = np.concatenate((x, y, z), axis = None)

a.shape
# (4032,)

# Compute absolute values of the numbers
a_mod = np.abs(a)

# Set the smallest 10% of numbers by magnitude to zero
a_mod[a_mod < np.percentile(a_mod, 10)] = 0

# Final array which has 0 for pruned numbers and the original number otherwise
a_fin = np.where(a_mod == 0, 0, a)

# Take slices from the 1-D array and re-create the separate numpy arrays.
# Slice 'a_fin' (signed values), not 'a_mod' (absolute values).
x_new = a_fin[:576].reshape(3, 3, 64)
y_new = a_fin[576:1728].reshape(3, 3, 128)
z_new = a_fin[1728:].reshape(3, 3, 256)

x_new.shape, y_new.shape, z_new.shape
# ((3, 3, 64), (3, 3, 128), (3, 3, 256))

Is there a better way to do this? I have to handle around 20 or more numpy arrays, and writing hard-coded slices such as [576:1728] becomes error-prone and does not scale!
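If the concatenate-and-slice approach is kept, the slice boundaries do not have to be typed by hand: they can be derived from the array sizes. The following is a minimal sketch under that assumption, not a definitive implementation. The helper names `flatten_all` and `unflatten` are hypothetical and only used for illustration; `np.random.default_rng` requires numpy 1.17 or newer.

```python
import numpy as np

def flatten_all(arrs):
    """Concatenate arrays into one 1-D copy and remember their shapes."""
    shapes = [a.shape for a in arrs]
    flat = np.concatenate([a.ravel() for a in arrs])
    return flat, shapes

def unflatten(flat, shapes):
    """Split a 1-D array back into arrays with the given shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    # Split points are the running totals of the sizes (here 576 and 1728)
    split_points = np.cumsum(sizes)[:-1]
    parts = np.split(flat, split_points)
    return [part.reshape(s) for part, s in zip(parts, shapes)]

rng = np.random.default_rng(0)
x = rng.uniform(-1000, 1000, size=(3, 3, 64))
y = rng.uniform(-1000, 1000, size=(3, 3, 128))
z = rng.uniform(-1000, 1000, size=(3, 3, 256))

flat, shapes = flatten_all([x, y, z])

# Zero out the smallest 10% of values by magnitude across all arrays
flat[np.abs(flat) < np.percentile(np.abs(flat), 10)] = 0

x_new, y_new, z_new = unflatten(flat, shapes)
```

Because `np.concatenate` returns a copy, `x`, `y` and `z` themselves are left unchanged here, and the same two helpers work for any number of arrays.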

Tags: python, python-3.x, numpy

Solution


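You can flatten and concatenate all the numpy arrays, compute the threshold with np.percentile on their absolute values, and then modify the original arrays in place based on this threshold.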

import numpy as np

# Define 3 numpy arrays
x = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 64))
y = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 128))
z = np.random.uniform(low = -1000, high = 1000, size = (3, 3, 256))

# your list of arrays
arrs = [x, y, z]

# flatten all arrays
flattened = [a.flatten() for a in arrs]
# calculate the threshold: the 10th percentile of all absolute values
threshold = np.percentile(np.abs(np.concatenate(flattened)), 10)

# set values to 0 in place when their magnitude is below the threshold
for arr in arrs:
    arr[np.abs(arr) < threshold] = 0
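
The loop above overwrites x, y and z in place. If the originals need to be kept, the same idea can be wrapped in a function that returns pruned copies. This is only a sketch under that assumption: the function name `prune_smallest` and its parameter `p` are hypothetical, not part of numpy.

```python
import numpy as np

def prune_smallest(arrs, p=10.0):
    """Return copies of `arrs` with the smallest-magnitude p% of all values
    (pooled across every array) set to 0, plus the threshold that was used."""
    pooled = np.concatenate([np.abs(a).ravel() for a in arrs])
    threshold = np.percentile(pooled, p)
    # np.where builds new arrays, so the inputs are left untouched
    pruned = [np.where(np.abs(a) < threshold, 0.0, a) for a in arrs]
    return pruned, threshold

rng = np.random.default_rng(42)  # requires numpy >= 1.17
shapes = [(3, 3, 64), (3, 3, 128), (3, 3, 256)]
arrs = [rng.uniform(-1000, 1000, size=s) for s in shapes]

pruned, threshold = prune_smallest(arrs, p=10)

# Report the fraction of entries pruned in each array
for original, result in zip(arrs, pruned):
    print(original.shape, f"{np.mean(result == 0):.1%} pruned")
```

Since the threshold is computed over the pooled values, the fraction pruned in any single array will generally differ from p%; only the overall fraction is close to p%.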
