首页 > 解决方案 > (Oracle)Oulier 去除

问题描述

我的目标是使用下表中的 IQR(四分位间距)检查 C1 列中的数据是否异常。去除异常值的数据有:10,20,30 异常值是数据100。(按此方法)

在此处输入图像描述

  Q1 : For this C1 data, Q1_Loacation data(interpolated, such as (10 + (20-10)*(1.75-1) )

  Q3 : For this C1 data, Q3_Loacation data(interpolated, such as (20 + (30-20)*(3.25-3) )

  IQR : Q3-Q1

  Fence1 : Q1-1.5*IQR

  Fence2 : Q3+1.5*IQR

  checked column: if c1 is in the Fence1 ~ Fence2 then 1

我可以得到 Q1_Location 和 Q3_Location 如下。

WITH t AS (
SELECT 10 C1 FROM DUAL
UNION 
SELECT 20  FROM DUAL
UNION 
SELECT 30  FROM DUAL
UNION 
SELECT 100  FROM DUAL
)
SELECT rn, c1,
(Count(c1) over () - 1)*0.25+1 Q1_Location
,(Count(c1) over () - 1)*0.75+1 Q3_Lacation
FROM (Select t.*, Row_number() over (order by c1) rn From t);

有没有一种简洁的方法来获取其他列(Q1、Q3、IQR、Fence1、Fence2 和检查列)?

标签: oracleoutliers

解决方案


WITH t AS (
SELECT 10 C1 FROM DUAL
UNION 
SELECT 20  FROM DUAL
UNION 
SELECT 30  FROM DUAL
UNION 
SELECT 100  FROM DUAL
) 
,LOC AS (
SELECT rn, c1,
(Count(c1) over () - 1)*0.25+1 Q1_Loc
,trunc((Count(c1) over () - 1)*0.25+1) Q1_Loc_tr
,(Count(c1) over () - 1)*0.75+1 Q3_Loc
,trunc((Count(c1) over () - 1)*0.75+1) Q3_Loc_tr
,c1+((lead(c1) over (order by c1)-c1))*(((Count(c1) over () - 1)*0.25+1)-rn) Q1_Val
,c1+((lead(c1) over (order by c1)-c1))*(((Count(c1) over () - 1)*0.75+1)-rn) Q3_Val
FROM (Select t.*, Row_number() over (order by c1) rn From t)
)
,Q1 AS (
SELECT Q1_Val FROM LOC WHERE rn=Q1_Loc_tr
)
,Q3 AS (
SELECT Q3_Val FROM LOC WHERE rn=Q3_Loc_tr
)
,IQR AS (
SELECT (SELECT Q3_Val FROM Q3) - (SELECT Q1_Val FROM Q1) FROM DUAL
),OUTLIER_REMOVED AS (
--SELECT * FROM IQR
SELECT * FROM t
WHERE C1 >= (SELECT Q1_Val FROM Q1)-(SELECT * FROM IQR)*1.5 AND C1 <= (SELECT Q3_Val FROM Q3) +(SELECT * FROM IQR)*1.5
)
SELECT * FROM OUTLIER_REMOVED;

推荐阅读