How do I replace the current value with the previous value based on a condition?

Problem description

My task is to detect outliers using z-scores and replace each one with the previous valid value.

signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']

import numpy as np

results = [float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)

threshold = -1.5
outlier = []
new_list = []

for i in results:
    z = (i - mean) / std
    if z < threshold:
        outlier.append(i)

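The outliers in the dataset are [89.48, 88.8, 95.8].

The final list should have these values replaced with the previous value (but only if the previous value's z-score does not itself satisfy the condition z < threshold).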

Edit:

Now, when I try to extend this to an entire file with similar elements, it raises an error. File:

with open("File.txt") as f:
    img_intensity_list = f.readlines()
    for count, value in enumerate(img_intensity_list):
        img_intensity_list[count] = value.split("[")[1].split("]")[0].split(", ")
        # print(img_intensity_list)
        for elem, val in enumerate(img_intensity_list):

            results = [float(elem) for elem in img_intensity_list]
            mean = np.mean(results)
            std = np.std(results)

            threshold = -1.5
            outlier = []
            # new_list = [0 for k in range(len(results))]

            for i, value in enumerate(results):
                z = (value - mean) / std

                if float(z) < threshold:
                    outlier.append(value)
                    results[i] = results[i - 1]
                else:
                    results[i] = value

Error: float() argument must be a string or a number, not 'list'

Tags: python, list, statistics, conditional-statements

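Solution


In your code, i is a value from the list, not a position in it. You use it as a value when computing the z-score, but then as an index when assigning the previous result.

Use enumerate to get both the index and the value of each element in the list, like this: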

for i, value in enumerate(results):
    z = (value - mean) / std
    if z < threshold:
        outlier.append(value)
        results[i] = results[i - 1]
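To make the index/value distinction concrete, here is a minimal sketch on a small made-up list (the toy numbers are hypothetical, not taken from the question). The guard for i == 0 is an added assumption: without it, results[i - 1] would silently wrap around to the last element when the very first value is an outlier.

```python
import numpy as np

# Toy data (hypothetical), with one obviously low reading at index 3.
results = [10.0, 11.0, 10.5, 1.0, 10.2, 10.8]
mean = np.mean(results)
std = np.std(results)
threshold = -1.5

outlier = []
for i, value in enumerate(results):
    # value is the element itself, i is its position in the list.
    z = (value - mean) / std
    if z < threshold:
        outlier.append(value)
        # Assumption: leave the first element alone, since it has no
        # previous value (results[-1] would be the last element).
        if i > 0:
            results[i] = results[i - 1]

print(outlier)  # [1.0]
print(results)  # [10.0, 11.0, 10.5, 10.5, 10.2, 10.8]
```

Because the list is updated in place from left to right, a run of consecutive outliers all end up with the last good value before the run.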

If I have understood your code correctly, this version should give you the expected result.

import numpy as np


signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']

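# Convert the strings to floats
results = [float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)

threshold = -1.5
outlier = []
# Pre-fill the output list; every slot is overwritten in the loop below
new_list = [0 for k in range(len(results))]

for i, value in enumerate(results):
    z = (value - mean) / std

    if float(z) < threshold:
        # Outlier: record it and reuse the previous (already cleaned) value.
        # Note: for i == 0 this reads new_list[-1], which is still 0.
        outlier.append(value)
        new_list[i] = new_list[i - 1]
    else:
        new_list[i] = value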
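As for the error in the edit: after the split, img_intensity_list holds one list per line, so float() is handed a whole list rather than a single string. One possible way to handle the file case is to parse and clean one line at a time. The sketch below is only an illustration under stated assumptions: it assumes each line contains a single bracketed, comma-separated list of (optionally quoted) numbers, and the helper names parse_line and replace_low_outliers are hypothetical. An in-memory io.StringIO stands in for open("File.txt") so the example runs on its own.

```python
import io
import numpy as np


def parse_line(line):
    """Extract the floats from one line shaped like "... ['1.0', '2.0'] ..."."""
    inner = line.split("[", 1)[1].split("]", 1)[0]
    # Strip whitespace and optional quotes around each number.
    return [float(tok.strip().strip("'\"")) for tok in inner.split(",")]


def replace_low_outliers(values, threshold=-1.5):
    """Return (cleaned, outliers); low-z values take the previous kept value."""
    mean = np.mean(values)
    std = np.std(values)
    cleaned, outliers = [], []
    for value in values:
        # std == 0 means all values are equal, so nothing is an outlier.
        if std > 0 and (value - mean) / std < threshold:
            outliers.append(value)
            # Assumption: keep the value itself if there is no previous one.
            cleaned.append(cleaned[-1] if cleaned else value)
        else:
            cleaned.append(value)
    return cleaned, outliers


# Stand-in for open("File.txt"): two hypothetical lines in the assumed format.
fake_file = io.StringIO(
    "['10.0', '11.0', '10.5', '1.0', '10.2', '10.8']\n"
    "['5.0', '5.0', '5.0', '5.0']\n"
)

for line in fake_file:
    if "[" not in line:
        continue  # skip blank or malformed lines
    cleaned, outliers = replace_low_outliers(parse_line(line))
    print(outliers, cleaned)
```

Here the mean and standard deviation are computed per line. If the z-scores should instead be based on the whole file, the parsed lines would need to be collected first and the statistics computed once over all of them.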
