首页 > 解决方案 > 如何评估一列数字以查找重复序列中的变化?

问题描述

我有一个 100,000 行的表,其中一列是一系列重复的数字,代表机器人生产线上的一个“点”。我需要检测重复序列何时发生变化。一个序列由一个偶数和两边的奇数组成。因此,给定序列中的数字要么是 1 要么是 2,例如 32, 31, 33, 31, 33, 31, 33, 32, 31, 32, 31, 33, 32。三个数字的顺序是随机的,除了一个数字不能跟随它自己。顺序可以改变,例如 33, 32, 31, 33, 31, 33, 31, 32, 31, 32, 33, 44, 45, 44, 43, 45, 43, 45, 43, 44, 29, 30, 31, 29, 31, 30, 31, 29, 31, 29, 30, . . .

我希望能够检测到序列中的每个更改,并用 1 标记每个更改(在没有更改的行中保留默认的 0)。

如果不确定一个数字是否属于一个序列或另一个序列,例如 32、31、33、33、31、30、29、30、31(序列已从基于 32 的一个变为基于 30 的一个),则假设可以属于任一序列的最后一个数字属于“导入”序列。

数字序列中的每个步骤都有一个唯一的 moveID,由设施和机器人编号分区。

我尝试使用 Lead 和 Lag 推导出“点”前后的数字,找出点前三个数字之和与点后三个数字之和之间的差异,看看差异是否超过一定限制因此表明顺序发生了变化。(见下面的代码)。

SELECT moveID, facility, robotNum, spot, 
       CASE WHEN (((lastSpot1 + lastSpot2 + lastSpot3) - (nextSpot1 +  nextSpot2 +  nextSpot3))/3 BETWEEN -1.3 AND 1.3 ) 
       THEN 1
       ELSE 0 
       END 
       AS sequenceChange
     FROM
             ( SELECT facility, robotNum,
             , LEAD(spot,1,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS nextSpot1
             , LEAD(spot,2,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS nextSpot2
             , LEAD(spot,3,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS nextSpot3
             , LAG(spot,1,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS lastSpot1
             , LAG(spot,2,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS lastSpot2
             , LAG(spot,3,0) OVER(PARTITION BY facility, robotNum ORDER BY facility, robotNum, moveID ) AS lastSpot3

            FROM SequenceTable1 
        ) t1
     ORDER BY facility, robotNum, moveID.

然而,这种方法是不可靠的,因为一些不同序列的前 3 位数字之和重叠。需要某种方式来“放大”差异,以便更可靠地检测变化,并在每个新序列开始时在“sequenceChange”列中更新 1。

标签: tsqlnumerical-analysis

解决方案


问题中描述的数字序列之间有几个简单的关系。一个是序列中的数字之间的差异永远不会超过2。另一个是每个唯一序列中的偶数的模数为2,有些的模数为4。有了模数,奇数的偶数属于很容易找到的。我确定了涵盖数字之间差异的所有可能的逻辑组合,以检测序列的变化(20 条规则涵盖 144 种可能的组合)并计算模数。以下是需要按顺序应用的规则的 SQL 案例语句。

                CASE    
                    WHEN    Spot = 1 OR Spot = 2   THEN 2   
                    WHEN    Spot = 3 and (POWER(lastSpot1,4) + POWER(Spot,4) + POWER(nextSpot2,4)) <114   THEN 2    
                    WHEN    Spot = 3 and nextSpot1 =  5   THEN 4    
                    WHEN    Spot = 3 and nextSpot1 = 4    THEN 4    
                    WHEN    Spot%4=0   THEN 4   
                    WHEN    Spot%4!=0 and Spot%2=0   THEN 2 
                    WHEN    Spot%2=1 and lastSpot1%2=1 AND (Spot -2 =lastSpot1 OR Spot+2 = lastSpot1) AND ((Spot+lastSpot1)/2)%4=0   THEN 4 
                    WHEN    Spot%2=1 and lastSpot1%2=1 AND (Spot -2 = lastSpot1 OR Spot+2 = lastSpot1) AND ((Spot+lastSpot1)/2)%4!=0 and ((Spot+lastSpot1)/2)%2=0   THEN 2  
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 = lastSpot1 OR Spot+1 = lastSpot1) AND lastSpot1%4=0    THEN 4   
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 = lastSpot1 OR Spot+1 = lastSpot1) AND lastSpot1%4!=0 AND lastSpot1%2=0     THEN 2   
                    WHEN    Spot%2=1 AND lastSpot1%2=1 AND (Spot-2 > lastSpot1 OR Spot+2 <  lastSpot1) AND nextSpot1%2=1 AND (Spot -2 = nextSpot1 OR Spot+2 = nextSpot1) AND  ((Spot+nextSpot1)/2)%4=0   THEN 4 
                    WHEN    Spot%2=1 AND lastSpot1%2=1 AND (Spot-2 > lastSpot1 OR Spot+2 <  lastSpot1) AND nextSpot1%2=1 AND (Spot -2 = nextSpot1 OR Spot+2 = nextSpot1) AND  ((Spot+nextSpot1)/2)%4!=0 AND and ((Spot+nextSpot1)/2)%2=0   THEN 2   
                    WHEN    Spot%2=1 AND lastSpot1%2=1 AND (Spot-2 > lastSpot1 OR Spot+2 <  lastSpot1) AND nextSpot1%2=0 AND (Spot - 1 = nextSpot1 OR Spot+1 = nextSpot1) AND  nextSpot1%4=0   THEN 4   
                    WHEN    Spot%2=1 AND lastSpot1%2=1 AND (Spot-2 > lastSpot1 OR Spot+2 <  lastSpot1) AND nextSpot1%2=1 AND (Spot -2 = nextSpot1 OR Spot+2 = nextSpot1) AND   nextSpot1%4!=0 AND and nextSpot1%2=0   THEN 2    
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 > lastSpot1 OR Spot+1 <  lastSpot1) AND ((nextSpot1%2=1 AND (Spot-2 > nextSpot1 OR Spot+2 <  nextSpot1) ) OR(nextSpot1%2=0 AND (Spot-1 > nextSpot1 OR Spot+1 <  nextSpot1)))   THEN lastSpotStart    
                    WHEN    Spot%2=1 AND lastSpot1%2=1 AND (Spot-2 > lastSpot1 OR Spot+2 <  lastSpot1) AND ((nextSpot1%2=1 AND (Spot-2 > nextSpot1 OR Spot+2 <  nextSpot1) ) OR(nextSpot1%2=0 AND (Spot-1 > nextSpot1 OR Spot+1 <  nextSpot1)))   THEN lastSpotStart    
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-12 > lastSpot1 OR Spot+1 <  lastSpot1) AND nextSpot1%2=0 AND (Spot - 1 = nextSpot1 OR Spot+1 = nextSpot1) AND  nextSpot1%4=0   THEN 4  
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 > lastSpot1 OR Spot+1 <  lastSpot1) AND nextSpot1%2=0 AND (Spot - 1 = nextSpot1 OR Spot+1 = nextSpot1) AND   nextSpot1%4!=0 AND and nextSpot1%2=0   THEN 2   
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 > lastSpot1 OR Spot+1 < lastSpot1) AND nextSpot1%2=1 AND  (Spot-2 = nextSpot1 OR Spot+2 = nextSpot1)  AND ((Spot + nextSpot1)/2)%4=0    THEN 4   
                    WHEN    Spot%2=1 AND lastSpot1%2=0 AND (Spot-1 > lastSpot1 OR Spot+1 < lastSpot1) AND nextSpot1%2=1 AND  (Spot-2 = nextSpot1 OR Spot+2 = nextSpot1)  AND lastSpot1%4!=0 AND  ((Spot + nextSpot1)/2)%2=0   THEN 2    
            ELSE 1      
            END AS seqMod

接下来,为了计算每个序列中每个数字的索引(索引是序列的偶数),我使用了以下代码(感谢https://stackoverflow.com/users/15498/damien-the-unbeliever):

            CASE 
            WHEN seqMod = 4 AND (Spot = 1 OR Spot = 2) THEN 2  -- for when a robot that started at index 4 returns to index 2
            ELSE 
            CONVERT(int, ROUND((Spot +  CASE    WHEN seqMod = 2 THEN 2 ELSE 0 END )/4.0,0)* 4 - CASE    WHEN seqMod = 2 THEN 2 ELSE 0 END   )
            END AS spotIndex

使用索引很容易检测从一个序列到另一个序列的变化


推荐阅读