Vertica SQL for running count distinct and running conditional count

Question

I am trying to build a department-level score table from a more granular, product-URL-level score table.

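  1. Dates are not consecutive

  2. Not all URLs get a score update on the same day (they are independent of each other)

  3. dist_url should be a running count distinct (cumulative count distinct)

  4. Both dist urls and urls score >=30 are distinct counts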

What I have now is:

   Date  url   Store   Dept   Page   Score   
   10/1   a      US      A      X      10   
   10/1   b      US      A      X      30  
   10/1   c      US      A      X      60
   10/4   a      US      A      X      20  
   10/4   d      US      A      X      60
   10/6   b      US      A      X      22 
   10/9   a      US      A      X      40
   10/9   e      US      A      X      10

What I want is:


   Date  Store   Dept   Page   dist urls   urls score >=30  
   10/1   US      A      X          3          2 
   10/4   US      A      X          4          3
   10/6   US      A      X          4          2
   10/9   US      A      X          5          2

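I think dist_url can be done with a window function; I am just not sure how to write the query.

My current query is below, but it is wrong because COUNT(DISTINCT ...) is evaluated per date and is not a cumulative distinct count: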

   SELECT
        bm.AnalysisDate,
        su.SoID         AS Store,
        su.DptCaID      AS DTID,
        su.PageTypeID   AS PTID,
        COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
        SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
    FROM csn_seo.tblBotifyMetrics bm 
    INNER JOIN csn_seo.tblSEOURLs su 
        ON bm.SeoURLID = su.ID
    WHERE su.DptCaID IS NOT NULL 
        AND su.DptCaID <> 0    
        AND su.PageTypeID IS NOT NULL
        AND su.PageTypeID <> -1
        AND bm.iscompliant = 1
    GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
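
To illustrate why the per-date grouping falls short, here is a minimal sketch on the question's sample rows. It assumes Python's built-in sqlite3 as a stand-in for Vertica and a hypothetical two-column table `t` (neither is part of the original question). It compares COUNT(DISTINCT url) grouped by date against a cumulative distinct count built with a plain Python set.

```python
import sqlite3

# Sample (date, url) pairs from the question; every row shares the same
# store/dept/page, so those columns are left out of this sketch.
# Dates are kept as text labels; they happen to sort correctly for this sample.
ROWS = [("10/1", "a"), ("10/1", "b"), ("10/1", "c"),
        ("10/4", "a"), ("10/4", "d"),
        ("10/6", "b"),
        ("10/9", "a"), ("10/9", "e")]


def per_date_distinct(rows):
    """COUNT(DISTINCT url) grouped by date, as in the query above."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name `t`; not from the original schema.
    conn.execute("CREATE TABLE t (date TEXT, url TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    out = conn.execute(
        "SELECT date, COUNT(DISTINCT url) FROM t GROUP BY date ORDER BY date"
    ).fetchall()
    conn.close()
    return out


def cumulative_distinct(rows):
    """Distinct URLs seen on or before each date (the desired dist urls)."""
    seen, out = set(), []
    for date in sorted({d for d, _ in rows}):
        seen.update(u for d, u in rows if d == date)
        out.append((date, len(seen)))
    return out


if __name__ == "__main__":
    print(per_date_distinct(ROWS))    # counts URLs updated on each date only
    print(cumulative_distinct(ROWS))  # counts URLs seen so far
```

On this sample the per-date counts are 3, 2, 1, 2, while the cumulative counts are 3, 4, 4, 5, which is the dist urls column in the desired output.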

If anyone has any ideas, please let me know.

Tags: sql, count, vertica

Answer


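Based on your question, you seem to need two levels of logic: first find the date on which each URL first appears (and the date on which it first scores >= 30), then take a running sum of those first appearances by date: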

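select date, store, dept, page,
       -- running total of first appearances = cumulative distinct URLs
       sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
       -- running total of URLs that have scored >= 30 at least once so far
       sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
       from t
       group by store, dept, page, url
      ) union all
      (select store, dept, page, url, min(date) as date, 0, 1
       from t
       where score >= 30
       group by store, dept, page, url
      ) union all
      -- zero-valued rows so that dates with no new URL (e.g. 10/6) still appear
      (select store, dept, page, url, date, 0, 0
       from t
      )
     ) t
group by date, store, dept, page;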
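
As a way to check this first-appearance idea against the question's sample rows, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Vertica (an assumption; window-function support requires SQLite 3.25 or later). The table name `t` and the column names follow the query above. The CTE names `events` and `daily` are hypothetical, as is the third branch, which adds zero-valued rows so that dates with no new URL still show up.

```python
import sqlite3

# Sample rows from the question: (date, url, store, dept, page, score).
# Dates are kept as text labels; they happen to sort correctly for this sample.
ROWS = [
    ("10/1", "a", "US", "A", "X", 10),
    ("10/1", "b", "US", "A", "X", 30),
    ("10/1", "c", "US", "A", "X", 60),
    ("10/4", "a", "US", "A", "X", 20),
    ("10/4", "d", "US", "A", "X", 60),
    ("10/6", "b", "US", "A", "X", 22),
    ("10/9", "a", "US", "A", "X", 40),
    ("10/9", "e", "US", "A", "X", 10),
]

QUERY = """
WITH events AS (
    -- one row per URL on the first date it appears
    SELECT store, dept, page, MIN(date) AS date, 1 AS start, 0 AS start_30
    FROM t
    GROUP BY store, dept, page, url
    UNION ALL
    -- one row per URL on the first date it scores >= 30
    SELECT store, dept, page, MIN(date) AS date, 0, 1
    FROM t
    WHERE score >= 30
    GROUP BY store, dept, page, url
    UNION ALL
    -- zero-valued row per date so dates with no new URL still appear
    SELECT DISTINCT store, dept, page, date, 0, 0
    FROM t
),
daily AS (
    -- level 1: how many URLs start (or first reach 30) on each date
    SELECT date, store, dept, page,
           SUM(start) AS new_urls, SUM(start_30) AS new_urls_30
    FROM events
    GROUP BY date, store, dept, page
)
-- level 2: running totals of the daily starts
SELECT date, store, dept, page,
       SUM(new_urls) OVER (PARTITION BY store, dept, page ORDER BY date)
           AS distinct_urls,
       SUM(new_urls_30) OVER (PARTITION BY store, dept, page ORDER BY date)
           AS distinct_urls_30
FROM daily
ORDER BY store, dept, page, date
"""


def running_counts(rows):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (date TEXT, url TEXT, store TEXT, dept TEXT, "
        "page TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in running_counts(ROWS):
        print(row)
```

On the sample data, distinct_urls comes out as 3, 4, 4, 5, matching the question's dist urls column. distinct_urls_30 comes out as 2, 3, 3, 4: it counts URLs that have scored >= 30 at least once, so it never decreases, whereas the question's expected column drops on 10/6. If a URL should stop counting once its score falls below 30, a different approach based on each URL's latest score would be needed.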

I don't understand how your query relates to your question.

