sql-server - 擦洗数据后尝试获取平均值、标准差和新行数
问题描述
我正在使用 Advantage Works 2012,我正在清理数据以消除订单数量的异常值。最大值和最小值看起来像它的工作,但新的标准偏差似乎不起作用,当我对 OrderQty 进行新计数时,我得到一个高于原始计数的荒谬数字。我做了新的标准偏差,但它似乎不正确。有点不对劲。
完成所有 CTE 后,我添加了新计数。但数量高于原始数量。看起来它比原始行数高 3 倍。
WITH rawdata
AS (SELECT sod.[salesorderid],
[orderqty],
[purchaseordernumber]
FROM [AdventureWorks2012].[Sales].[salesorderdetail]sod
JOIN [Sales].[salesorderheader] soh
ON sod.salesorderid = soh.salesorderid),
stats
AS (SELECT [salesorderid],
Avg([orderqty]) Avg_qty,
Sum([orderqty]) sum_qty,
Stdevp([orderqty])STDDev_Qty,
Min([orderqty]) Min_Qty,
Max([orderqty]) Max_Qty,
Count(*) ctrows
FROM [Sales].[salesorderdetail]
GROUP BY [salesorderid]),
diff
AS (SELECT salesorderid,
Round(( avg_qty - stddev_qty ), 2) AS NegDiff,
Round(( avg_qty + stddev_qty ), 2) AS PosDiff
FROM stats),
scrubed
AS (SELECT st.salesorderid,
r.orderqty,
d.negdiff,
d.posdiff
FROM stats st
INNER JOIN diff d
ON st.salesorderid = d.salesorderid
JOIN rawdata r
ON st.salesorderid = r.salesorderid
WHERE r.[orderqty] > st.avg_qty - stddev_qty
AND r.[orderqty] < st.avg_qty + stddev_qty
GROUP BY st.salesorderid,
r.[orderqty],
d.negdiff,
d.posdiff)
SELECT r.[salesorderid],
ctrows AS rawct,
-- r.[purchaseordernumber],
st.avg_qty AS oldAvgQty,
sum_qty AS sumQty,
Round(st.stddev_qty, 2) AS Old_StdDevQty,
Round(st.min_qty, 2) AS Min_Qty,
Round(st.max_qty, 2) AS Max_Qty,
Round(Avg(sc.orderqty), 4) AS NewAvgQty,
Round(Stdevp(sc.orderqty), 4) AS NewStdDevQty,
Min(sc.orderqty) AS newMin,
Max(sc.orderqty) AS NewMax,
negdiff AS LowerLimit,
posdiff AS HigherLimit
FROM rawdata r
LEFT JOIN scrubed sc
ON r.salesorderid = sc.salesorderid
JOIN stats st
ON sc.salesorderid = st.salesorderid
GROUP BY r.salesorderid,
-- r.[purchaseordernumber],
st.avg_qty,
sum_qty,
st.stddev_qty,
st.max_qty,
st.min_qty,
ctrows,
negdiff,
posdiff
ORDER BY 1
我希望在数据清理后获得正确的行数和正确的标准偏差。
解决方案
Declare @STD DECIMAL(5,2)
set @std = 2.0;
With rawdata as
(SELECT [salesorderid],
Avg([orderqty]) Avg_qty,
round(Stdevp([orderqty]),2) STDDev_Qty,
Min([orderqty]) Min_Qty,
Max([orderqty]) Max_Qty,
Count(*) ctrows,
Sum(orderqty) AS Ordqty_sum,
Avg([orderqty]) - @STD as LowerLmt,
Avg([orderqty]) + @STD as HigherLmt
FROM [Sales].[salesorderdetail]
--Where [salesorderid] = 53621
GROUP BY [salesorderid] ),
scrubed as(
select sod.SalesOrderID,
sod.orderqty
FROM
[Sales].[SalesOrderDetail]sod inner join rawdata c on sod.SalesOrderID=c.SalesOrderID
Where
--sod.salesorderid = 53621 and
sod.orderqty between c.LowerLmt and c.HigherLmt )
select
c.salesorderid,
c.ctrows oldCtrows,
Avg_qty oldAvg,
STDDev_Qty oldStd,
Min_Qty oldMin,
Max_Qty oldMax,
count(s.SalesOrderID)newct ,
avg(orderqty)newAvg,
min(orderqty)NewMin,
max(orderqty)NewMax,
round(stdevp(orderqty),3)newSTD,
c.LowerLmt,
c.HigherLmt
from scrubed s join rawdata c on s.salesorderid= c.SalesOrderID
Where c.ctrows > 20
group by
c.salesorderid,
c.ctrows,
c.Avg_qty ,
c.STDDev_Qty ,
c.Min_Qty ,
c.Max_Qty ,
c.LowerLmt,
c.HigherLmt
推荐阅读
- python - 为什么在python中的范围函数中排除了结束值?
- php - 如何在首页(wordpress)上仅显示来自post_type的一篇文章
- javascript - 模态弹出窗口的关闭动画
- angular - 如果用户正在访问它,则在我的网站中显示自定义通知,并且仅当用户没有关注我的网站选项卡时才显示推送通知
- common-lisp - 在 commonlisp 的嵌套列表中替换字符串
- testing - 在 Cucumber 中,Before 和 After 钩子在哪里
- c# - 语句执行但不将文本附加到 TextBox
- swift - SwiftUI 状态初始化器之间的区别
- python - 从另一个数组python为每个元素创建数组
- jquery - jquery需要在点击时切换文本并在第二次点击时再次切换回来