首页 > 解决方案 > 擦洗数据后尝试获取平均值、标准差和新行数

问题描述

我正在使用 Advantage Works 2012,我正在清理数据以消除订单数量的异常值。最大值和最小值看起来像它的工作,但新的标准偏差似乎不起作用,当我对 OrderQty 进行新计数时,我得到一个高于原始计数的荒谬数字。我做了新的标准偏差,但它似乎不正确。有点不对劲。

完成所有 CTE 后,我添加了新计数。但数量高于原始数量。看起来它比原始行数高 3 倍。

WITH rawdata 
     AS (SELECT sod.[salesorderid], 
                [orderqty], 
                [purchaseordernumber] 
         FROM   [AdventureWorks2012].[Sales].[salesorderdetail]sod 
                JOIN [Sales].[salesorderheader] soh 
                  ON sod.salesorderid = soh.salesorderid), 
     stats 
     AS (SELECT [salesorderid], 
                Avg([orderqty])   Avg_qty, 
                Sum([orderqty])   sum_qty, 
                Stdevp([orderqty])STDDev_Qty, 
                Min([orderqty])   Min_Qty, 
                Max([orderqty])   Max_Qty, 
                Count(*)          ctrows 

         FROM   [Sales].[salesorderdetail] 
         GROUP  BY [salesorderid]), 
     diff 
     AS (SELECT salesorderid, 
                Round(( avg_qty - stddev_qty ), 2) AS NegDiff, 
                Round(( avg_qty + stddev_qty ), 2) AS PosDiff 
         FROM   stats), 
     scrubed 
     AS (SELECT st.salesorderid, 
                r.orderqty, 
                d.negdiff, 
                d.posdiff 
         FROM   stats st 
                INNER JOIN diff d 
                        ON st.salesorderid = d.salesorderid 
                JOIN rawdata r 
                  ON st.salesorderid = r.salesorderid 
         WHERE  r.[orderqty] > st.avg_qty - stddev_qty 
                AND r.[orderqty] < st.avg_qty + stddev_qty 
         GROUP  BY st.salesorderid, 
                   r.[orderqty], 
                   d.negdiff, 
                   d.posdiff) 
SELECT r.[salesorderid], 
       ctrows                        AS rawct, 
       --  r.[purchaseordernumber],  
       st.avg_qty                    AS oldAvgQty, 
       sum_qty                       AS sumQty, 
       Round(st.stddev_qty, 2)       AS Old_StdDevQty, 
       Round(st.min_qty, 2)          AS Min_Qty, 
       Round(st.max_qty, 2)          AS Max_Qty, 
       Round(Avg(sc.orderqty), 4)    AS NewAvgQty, 
       Round(Stdevp(sc.orderqty), 4) AS NewStdDevQty, 
       Min(sc.orderqty)              AS newMin, 
       Max(sc.orderqty)              AS NewMax, 
       negdiff                       AS LowerLimit, 
       posdiff                       AS HigherLimit 
FROM   rawdata r 
       LEFT JOIN scrubed sc 
              ON r.salesorderid = sc.salesorderid 
       JOIN stats st 
         ON sc.salesorderid = st.salesorderid 
GROUP  BY r.salesorderid, 
          -- r.[purchaseordernumber],  
          st.avg_qty, 
          sum_qty, 
          st.stddev_qty, 
          st.max_qty, 
          st.min_qty, 
          ctrows, 
          negdiff, 
          posdiff 
ORDER  BY 1 

我希望在数据清理后获得正确的行数和正确的标准偏差。

标签: sql-servercommon-table-expression

解决方案


Declare @STD DECIMAL(5,2)

set @std = 2.0;
With rawdata as 
(SELECT [salesorderid], 
                Avg([orderqty])   Avg_qty, 
                round(Stdevp([orderqty]),2) STDDev_Qty, 
                Min([orderqty])   Min_Qty, 
                Max([orderqty])   Max_Qty, 
                Count(*)          ctrows, 
                Sum(orderqty)     AS Ordqty_sum,
                Avg([orderqty]) - @STD as LowerLmt,
                Avg([orderqty]) + @STD as HigherLmt


         FROM   [Sales].[salesorderdetail] 
         --Where [salesorderid] = 53621
         GROUP  BY [salesorderid]    ),

         scrubed as(


         select         sod.SalesOrderID,
                        sod.orderqty


         FROM 

         [Sales].[SalesOrderDetail]sod inner join rawdata c on sod.SalesOrderID=c.SalesOrderID
         Where 
         --sod.salesorderid = 53621 and 
         sod.orderqty between c.LowerLmt and c.HigherLmt )


         select 
                c.salesorderid,
                c.ctrows oldCtrows,
                Avg_qty oldAvg,
                STDDev_Qty oldStd,
                Min_Qty oldMin,
                Max_Qty oldMax,
                count(s.SalesOrderID)newct , 
                avg(orderqty)newAvg, 
                min(orderqty)NewMin, 
                max(orderqty)NewMax, 
                round(stdevp(orderqty),3)newSTD,
                c.LowerLmt,
                c.HigherLmt



         from scrubed s join  rawdata c on s.salesorderid= c.SalesOrderID
         Where  c.ctrows > 20
         group by 
                c.salesorderid,
                c.ctrows,
                c.Avg_qty ,
                c.STDDev_Qty ,
                c.Min_Qty ,
                c.Max_Qty ,
                c.LowerLmt,
                c.HigherLmt

推荐阅读