首页 > 解决方案 > 根据 Snowflake 中的时间戳和值组获取最后一个值

问题描述

我正在尝试使用 Snowflake 根据时间戳获取组的最后一个值。

我有下表,

| ISSUE_ID | ISSUE_ID | FIELD_TIME | FIELD_NAME | FIELD_VALUE| STATUS
| -------- | ---------- | ---------- | ----------| ----------| ----------
|19229| X1|2021-08-01 09:04:35.234000000 +00:00|Status|10010|PENDING
|19229| X1|2021-08-01 09:04:35.234000000 +00:00|Number of Products|55|PENDING
|19229| X1|2021-08-01 09:04:35.234000000 +00:00|Number of SKU not created|34|PENDING
|19229| X1|2021-08-01 09:04:35.234000000 +00:00|Number of SKU live|21|PENDING
|19229| X1|2021-08-01 09:04:35.234000000 +00:00|Date of SKU live|2021-08-12|PENDING
|19229| X1|2021-08-08 06:19:05.209000000 +00:00|Status|10010|PENDING
|19229| X1|2021-08-08 11:43:51.953000000 +00:00|Status|10010|PENDING
|19229| X1|2021-08-12 06:12:15.185000000 +00:00|Number of Products|55|PENDING
|19229| X1|2021-08-12 06:27:32.199000000 +00:00|Number of SKU not created|31|PENDING
|19229| X1|2021-08-12 09:14:16.178000000 +00:00|Number of SKU live|24|PENDING

我希望表格中的每个值都有一列(例如,产品数量、SKU 上线日期等),并且该列只有当天的最后一个值。如下表所示,

| ISSUE_ID | ISSUE_ID | FIELD_TIME | Number of Products | Number of SKU live| Date of SKU live|Number of SKU not created|Work_In_Progress_Date|Pending_Date|Status|Resolution|
|--------|----------|---------- |----------|----------|----------|----------|----------
|19229   | X1       |2021-08-01 |55        |21        |2021-08-01|34        |          |             |PENDING              | null
|19229   | X1       |2021-08-08 |          |          |          |          |          |             |PENDING              | null
|19229   | X1       |2021-08-12 |55        |24        |2021-08-01|31        |2021-08-12|   2021-08-12|PENDING              | null

我已经尝试过last_value(FIELD_VALUE) over (partition by FIELD_NAME, ISSUE_ID order by field_time),但它给了我 23 行而不是 3 行的重复值。我也试过lag()了,但没有运气。
如果有人知道如何进行这种转换,请帮助我。

这是我的 SQL 代码,

select
       t.ISSUE_ID, t.issue_name,
--        t.created_date,
       t.field_time::date as field_time,
       max(case when field_name = 'Number of Products' then field_value end) as Number_of_Products,
       max(case when field_name = 'Number of SKU live' then field_value end) as Number_of_SKU_Live,
       max(case when field_name = 'Number of SKU not created' then field_value end) as Number_of_SKU_Not_Created,
       max(case when field_name = 'Date of SKU live' then field_value end) as Date_of_SKU_Live,
       max(case when field_value = '10020' then date(t.field_time) end) as Work_In_Progress_Date,
       max(case when field_value = '10010' then date(t.field_time) end) as Pending_Date,
       t.status, t.resolution
       from
(select fh.ISSUE_ID,
       i.issue_name,
       date(i.created_date) as created_date,
       fh.TIME as field_time,
       f.name as field_name,
       fh.value as field_value,
       i.status,
       i.resolution
from JIRA.ISSUE_FIELD_HISTORY fh
         left join JIRA.FIELD f on fh.FIELD_ID = f.ID and f._FIVETRAN_DELETED = 0
         left join (select i0.created as created_date,r.name as resolution, i0.id, i0.key as issue_name, i0.status as status_id, s.name as status
                    from JIRA.issue i0
                             left join JIRA.status s on i0.status = s.ID
                             left join JIRA.RESOLUTION r on i0.RESOLUTION = r.ID
             where i0._FIVETRAN_DELETED = 0
             and i0.key like 'PIM%')
             i on i.id = fh.ISSUE_ID
where fh.ISSUE_ID in (select ID from ISSUE where PROJECT = 10041)
and fh.FIELD_ID in ('customfield_10067', 'customfield_10063', 'customfield_10066', 'customfield_10068', 'status', 'resolution')
-- and issue_name = 'PIM-11'
qualify row_number() over (partition by issue_id, field_time::date, field_name order by field_time desc) = 1
order by field_time) t
group by issue_id, issue_name,created_date, field_time::date, status, resolution

标签: sqlsnowflake-cloud-data-platform

解决方案


您可以使用qualify获取最新值:

select t.*
from t
qualify row_number() over (partition by issue_id, field_time::date, field_name order by field_time desc) = 0;

然后聚合:

select issue_id, field_time::date,
       max(case when field_name = 'Number of Products' then field_value end) as number_of_products,
       max(case when field_name = 'Number of SKU live' then field_value end) as number_of_sku_live,
       . . . 
from (select t.*
      from t
      qualify row_number() over (partition by issue_id, field_time::date, field_name order by field_time desc) = 0
     ) t
group by issue_id, field_time::date

推荐阅读