How to change outliers using numpy and scipy

Problem description

I am trying to remove outliers using numpy (without pandas). I have an array that I created, which looks like this:

[[-9.00681170e-01  1.01900435e+00 -1.34022653e+00 -1.31544430e+00]
[-1.14301691e+00 -1.31979479e-01 -1.34022653e+00 -1.31544430e+00]
[-1.38535265e+00  3.28414053e-01 -1.39706395e+00 -1.31544430e+00]
[-1.50652052e+00  9.82172869e-02 -1.28338910e+00 -1.31544430e+00]
[-1.02184904e+00  1.24920112e+00 -1.34022653e+00 -1.31544430e+00]
[-5.37177559e-01  1.93979142e+00 -1.16971425e+00 -1.05217993e+00]
[-1.50652052e+00  7.88807586e-01 -1.34022653e+00 -1.18381211e+00]
[-1.02184904e+00  7.88807586e-01 -1.28338910e+00 -1.31544430e+00]]

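I want to create a function that checks this array: if it finds any number x >= 3 it should replace it with 2.9, and if it finds a number x <= -3 it should replace it with -2.9. I have tried two different ways. First I tried writing it like this: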

def ignoreOutlieres(array):
    for i in array:
        for x in i:
            x = float(format(x,".2f"))
            if x >= 3:
                x = 2.99
            elif x <= -3:
                x = -2.99
    return array

But I got this kind of error:

TypeError: 'float' object cannot be interpreted as an integer

Then I tried using numpy and a z-score test:

def ignoreOutlieres(num_array):
    for i in num_array:
        i = np.all(stats.zscore(i)>=3, axis = 2.9)
        return num_array

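But I don't think I really understand the idea behind it, and I am not using it correctly. I would appreciate any kind of help or guidance. The output I ultimately want looks like this: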

[[-0.90068117,  1.01900435, -1.34022653, -1.3154443 ],
                [-1.14301691, -0.13197948, -1.34022653, -1.3154443 ],
                [-1.38535265,  0.32841405, -1.39706395, -1.3154443 ],
                [-1.50652052,  0.09821729, -1.2833891 , -1.3154443 ],
                [-1.02184904,  1.24920112, -1.34022653, -1.3154443 ],
                [-0.53717756,  1.93979142, -1.16971425, -1.05217993],
                [-1.50652052,  0.78880759, -1.34022653, -1.18381211],
                [-1.02184904,  0.78880759, -1.2833891 , -1.3154443 ],
                [-1.74885626, -0.36217625, -1.34022653, -1.3154443 ],
                [-1.14301691,  0.09821729, -1.2833891 , -1.44707648],
                [-0.53717756,  1.47939788, -1.2833891 , -1.3154443 ],
                [-1.26418478,  0.78880759, -1.22655167, -1.3154443 ],
                [-1.26418478, -0.13197948, -1.34022653, -1.44707648],
                [-1.87002413, -0.13197948, -1.51073881, -1.44707648],
                [-0.05250608,  2.16998818, -1.45390138, -1.3154443 ],
                [-0.17367395, 2.9       , -1.2833891 , -1.05217993],
                [-0.53717756,  1.93979142, -1.39706395, -1.05217993],
                [-0.90068117,  1.01900435, -1.34022653, -1.18381211],
                [-0.17367395,  1.70959465, -1.16971425, -1.18381211],
                [-0.90068117,  1.70959465, -1.2833891 , -1.18381211]])

Tags: python-3.x, numpy, scipy

Solution


You should not use loops when working with numpy. What you need is np.where, which is numpy's combined equivalent of for and if:

patched = np.where(array <= -3, -2.99, 
                   np.where(array >= 3, 2.99, array))
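
For context, the loop in the question most likely leaves the array unchanged because `x = 2.99` only rebinds the local name `x` and never writes back into the array. The `TypeError` looks consistent with passing a float (`axis = 2.9`) where an integer axis is expected. Below is a minimal, self-contained sketch of the `np.where` approach next to two alternatives. The sample values are hypothetical and are assumed to be standardized already, as the data in the question appears to be.

```python
import numpy as np

# Hypothetical sample data (not from the question): standardized values
# containing one high outlier, one low outlier, and one value just below 3.
array = np.array([[-0.17, 3.11, -1.28],
                  [-3.40, 0.79, 2.995]])

# Nested np.where, as in the answer: replace values <= -3 with -2.99,
# values >= 3 with 2.99, and keep everything else unchanged.
patched = np.where(array <= -3, -2.99,
                   np.where(array >= 3, 2.99, array))

# Alternative: boolean-mask assignment on a copy. This is the vectorized
# counterpart of the loop in the question, and it does modify the data.
masked = array.copy()
masked[masked >= 3] = 2.99
masked[masked <= -3] = -2.99

# Alternative: np.clip limits every value to the interval [-2.99, 2.99].
# Unlike the two versions above, it also changes values strictly between
# 2.99 and 3 (here 2.995 becomes 2.99).
clipped = np.clip(array, -2.99, 2.99)

print(patched)
print(masked)
print(clipped)
```

If the replacement value should be 2.9 rather than 2.99 (the question mentions both), only the constants need to change. `np.clip` is the shortest form, but it is an exact match for the `np.where` version only when no value falls between the replacement value and the threshold.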
