Teradata, transactional data where columns have individual update dates

Question

I'm working on a Teradata query where I'd like to find the latest value for multiple columns. Each data column has a related date column (its update date). The header of my table should help explain the issue: ID, Attr_1, Attr_1_Update_Dt, Attr_2, Attr_2_Update_Dt, Attr_3, Attr_3_Update_Dt

I need to select Attr_1, Attr_2 and Attr_3 as of the latest update date of each attribute (for each ID). I've already thought of running an individual rank (by update date) for each attribute and then joining the results into a single table, but I don't think that is very elegant (especially since I actually have 8 attribute columns).

I hope the above is enough to go on. Looking forward to hearing from you.

Thanks!

Tags: sql, teradata

Answer


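I would hope to see you storing this data in a normalized form. Until you do, you'll be cursed with needing more complex queries for tasks like this.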

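One way to handle the data as it stands is the GREATEST() function. For reasons known only to Teradata's developers, it does not accept dates, but dates can be cast to integers, which it does accept. NULLs must also be avoided. This gives you the latest date per row, and from there you can use ROW_NUMBER() to get, for each ID, the row with the latest date.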

WITH cte AS (
    SELECT
           ID
         , Attr_1
         , Attr_2
         , Attr_3
           -- GREATEST() is given integers, so each date is cast to INT and the
           -- winning value is cast back to DATE. COALESCE swaps NULL dates for a
           -- sentinel date, because NULLs must be avoided.
         , CAST(GREATEST(
                  CAST(COALESCE(Attr_1_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_2_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_3_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_4_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_5_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_6_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_7_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_8_Update_Dt, DATE '1900-01-01') AS INT)
           ) AS DATE) AS maxdate
    FROM yourtable
)
SELECT
       ID
     , Attr_1
     , Attr_2
     , Attr_3
     , maxdate
FROM (
      SELECT
             ID
           , Attr_1
           , Attr_2
           , Attr_3
           , maxdate
             -- rn = 1 marks the row with the latest date per ID (ties are broken arbitrarily)
           , ROW_NUMBER() OVER (PARTITION BY ID
                                ORDER BY maxdate DESC) AS rn
      FROM cte
     ) d
WHERE rn = 1
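
Teradata's QUALIFY clause filters on a window function directly, so the outer derived table can be dropped. A minimal sketch of the same query in that form, assuming the same placeholder table yourtable and showing only three of the eight date columns for brevity:

```sql
-- Sketch only: same logic as the query above, with QUALIFY replacing
-- the derived table and the "WHERE rn = 1" filter.
-- "yourtable" is a placeholder; extend the GREATEST() list to all date columns.
WITH cte AS (
    SELECT
           ID
         , Attr_1
         , Attr_2
         , Attr_3
         , CAST(GREATEST(
                  CAST(COALESCE(Attr_1_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_2_Update_Dt, DATE '1900-01-01') AS INT)
                , CAST(COALESCE(Attr_3_Update_Dt, DATE '1900-01-01') AS INT)
           ) AS DATE) AS maxdate
    FROM yourtable
)
SELECT
       ID
     , Attr_1
     , Attr_2
     , Attr_3
     , maxdate
FROM cte
-- keep only the row with the latest date for each ID
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY maxdate DESC) = 1
```

This should return the same rows as the rn = 1 filter; only the form of the query changes.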

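I can't accurately assess the performance of this approach; it is probably best suited to moderately sized tables. If your table is very large, I would seriously explore options for normalizing the data.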
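
To illustrate what normalizing could buy you, here is a rough sketch that unpivots the wide table into one row per (ID, attribute, update date) and then ranks each attribute by its own date. The names attr_history, attr_name, attr_value and update_dt are made up for illustration, only three attributes are shown, and it assumes the attribute columns have compatible data types (otherwise cast them to a common type first).

```sql
-- Sketch only: "yourtable" is a placeholder, and attr_history / attr_name /
-- attr_value / update_dt are hypothetical names, not part of the original schema.
SELECT
       ID
     , attr_name
     , attr_value
     , update_dt
FROM (
      -- unpivot: one row per ID and attribute, carrying that attribute's own date
      SELECT ID, 'Attr_1' AS attr_name, Attr_1 AS attr_value, Attr_1_Update_Dt AS update_dt
      FROM yourtable
      UNION ALL
      SELECT ID, 'Attr_2', Attr_2, Attr_2_Update_Dt
      FROM yourtable
      UNION ALL
      SELECT ID, 'Attr_3', Attr_3, Attr_3_Update_Dt
      FROM yourtable
     ) attr_history
WHERE update_dt IS NOT NULL
-- latest row per ID and attribute, each ranked by its own update date
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, attr_name
                           ORDER BY update_dt DESC) = 1
```

This returns one row per ID and attribute instead of one wide row per ID. If the wide shape is needed, the result can be pivoted back with MAX(CASE WHEN attr_name = 'Attr_1' THEN attr_value END) and so on, grouped by ID. If the data were stored in this long form to begin with, the unpivot step would disappear and no GREATEST() or integer casts would be needed.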

