How do I write this query correctly with Amazon Redshift?

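Question

I want to write an update query that updates a count column in a table, but I don't know how to implement it. I have narrowed it down to three options, but I keep running into one problem or another. Which is the right approach, and which query is correct?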

update fact_spv_commissioned_lot
set sn_count = fact_spv_commissioned_lot.sn_count + 
(
  SELECT COUNT(*) FROM staging_serials s
  JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
  JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
  JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
  JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
  WHERE c.sk_company_id = f.sk_company_id
  AND s.lotnumber = f.lot_number
  AND p.sk_product_id = f.sk_product_id
  AND l.sk_packaging_level_id = f.sk_packaging_level_id
)

Or is this the correct way to write it?

update fact_spv_commissioned_lot
set sn_count = fact_spv_commissioned_lot.sn_count + 
(
  SELECT COUNT(*) FROM staging_serials s
  JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
  JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
  JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
  JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
  WHERE c.sk_company_id = f.sk_company_id
  AND s.lotnumber = f.lot_number
  AND p.sk_product_id = f.sk_product_id
  AND l.sk_packaging_level_id = f.sk_packaging_level_id
)
FROM staging_serials s
  JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
  JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
  JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
  JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
  WHERE c.sk_company_id = f.sk_company_id
  AND s.lotnumber = f.lot_number
  AND p.sk_product_id = f.sk_product_id
  AND l.sk_packaging_level_id = f.sk_packaging_level_id

Or is this the correct way to write it?

update fact_spv_commissioned_lot
set sn_count = fact_spv_commissioned_lot.sn_count + 
(
  SELECT COUNT(*) FROM staging_serials s
  JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
  JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
  JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
  JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
)
  WHERE c.sk_company_id = f.sk_company_id
  AND s.lotnumber = f.lot_number
  AND p.sk_product_id = f.sk_product_id
  AND l.sk_packaging_level_id = f.sk_packaging_level_id

Tags: sql, amazon-redshift

Answer


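Personally I like CTEs, but your first query is almost there: the sub-select only needs to be correlated with the row being updated, rather than joining fact_spv_commissioned_lot a second time.

The CTE version looks like this (replace <pk-col> with the actual primary key column):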
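To see why the first query is only "almost there", here is a minimal sketch that runs it against toy data. SQLite (through Python's sqlite3 module) stands in for Redshift. The column types and sample rows are assumptions made up for illustration, and the correlated form references the outer table by name instead of an alias.

```python
import sqlite3

# Column names come from the queries in the question; the types and the
# sample rows are assumptions invented for this illustration.
SETUP = """
CREATE TABLE dim_md_company (sk_company_id INT, lsc_company_id INT);
CREATE TABLE staging_product (compositeproductcode TEXT, sk_product_id INT, packaginguom TEXT);
CREATE TABLE dim_packaging_level (sk_packaging_level_id INT, unit_of_measure TEXT);
CREATE TABLE staging_serials (companyid INT, lotnumber TEXT, compositeproductcode TEXT);
CREATE TABLE fact_spv_commissioned_lot (
    sk_company_id INT, lot_number TEXT, sk_product_id INT,
    sk_packaging_level_id INT, sn_count INT);
INSERT INTO dim_md_company VALUES (1, 1);
INSERT INTO staging_product VALUES ('P1', 10, 'EA');
INSERT INTO dim_packaging_level VALUES (100, 'EA');
INSERT INTO staging_serials VALUES (1, 'LOT-A', 'P1'), (1, 'LOT-A', 'P1'), (1, 'LOT-B', 'P1');
INSERT INTO fact_spv_commissioned_lot VALUES (1, 'LOT-A', 10, 100, 5), (1, 'LOT-B', 10, 100, 7);
"""

# First query from the question: the sub-select joins the fact table again
# (alias f), so it never looks at the row that is being updated.
UNCORRELATED = """
UPDATE fact_spv_commissioned_lot
SET sn_count = fact_spv_commissioned_lot.sn_count + (
    SELECT COUNT(*) FROM staging_serials s
    JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
    JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
    JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
    JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
    WHERE c.sk_company_id = f.sk_company_id
      AND s.lotnumber = f.lot_number
      AND p.sk_product_id = f.sk_product_id
      AND l.sk_packaging_level_id = f.sk_packaging_level_id
)
"""

# Correlated form: the inner JOIN on f is gone and the conditions point at
# the outer table, referenced here by its name rather than an alias.
CORRELATED = """
UPDATE fact_spv_commissioned_lot
SET sn_count = sn_count + (
    SELECT COUNT(*) FROM staging_serials s
    JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
    JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
    JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
    WHERE s.companyid = fact_spv_commissioned_lot.sk_company_id
      AND s.lotnumber = fact_spv_commissioned_lot.lot_number
      AND c.sk_company_id = fact_spv_commissioned_lot.sk_company_id
      AND p.sk_product_id = fact_spv_commissioned_lot.sk_product_id
      AND l.sk_packaging_level_id = fact_spv_commissioned_lot.sk_packaging_level_id
)
"""


def counts_after(update_sql):
    """Run one UPDATE on a fresh toy database and return {lot_number: sn_count}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(update_sql)
    rows = conn.execute(
        "SELECT lot_number, sn_count FROM fact_spv_commissioned_lot"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print("uncorrelated:", counts_after(UNCORRELATED))
    print("correlated:  ", counts_after(CORRELATED))
```

On this toy data the uncorrelated form adds the grand total of 3 matching serials to both lots (5 becomes 8, 7 becomes 10). The correlated form adds 2 to LOT-A and 1 to LOT-B (7 and 8).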

WITH
    agg_data (pk, count) AS (
        SELECT f.<pk-col>, COUNT(*)
        FROM staging_serials s
            JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
            JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
            JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
            JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
        WHERE c.sk_company_id = f.sk_company_id
            AND s.lotnumber = f.lot_number
            AND p.sk_product_id = f.sk_product_id
            AND l.sk_packaging_level_id = f.sk_packaging_level_id
        GROUP BY 1
    )
UPDATE fact_spv_commissioned_lot AS to_update
SET sn_count = sn_count + agg_data.count
FROM agg_data WHERE agg_data.pk = to_update.<pk-col>;

As an alternative, you can match on the original join columns of fact_spv_commissioned_lot (alias f) from the sub-select instead of a primary key, e.g.:

WITH
    agg_data (sk_company_id, lot_number, sk_product_id, sk_packaging_level_id, count) AS (
        SELECT f.sk_company_id, f.lot_number, f.sk_product_id, f.sk_packaging_level_id, COUNT(*)
        FROM staging_serials s
            JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
            JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
            JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
            JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
        WHERE c.sk_company_id = f.sk_company_id
            AND s.lotnumber = f.lot_number
            AND p.sk_product_id = f.sk_product_id
            AND l.sk_packaging_level_id = f.sk_packaging_level_id
        GROUP BY 1, 2, 3, 4
    )
UPDATE fact_spv_commissioned_lot AS to_update
SET sn_count = sn_count + agg_data.count
FROM agg_data
WHERE agg_data.sk_company_id = to_update.sk_company_id
    AND agg_data.lot_number = to_update.lot_number
    AND agg_data.sk_product_id = to_update.sk_product_id
    AND agg_data.sk_packaging_level_id = to_update.sk_packaging_level_id
;
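The variant that matches on the four join columns implicitly assumes that those columns identify a single row of fact_spv_commissioned_lot. If two fact rows shared the same combination, the grouped COUNT(*) would count each serial once per duplicate, and both rows would receive the inflated number. Below is a hedged sketch of a pre-check, again using SQLite through Python as a stand-in for Redshift. The helper name duplicate_keys and the column types are assumptions for illustration.

```python
import sqlite3

# Lists every key combination that occurs more than once in the fact table.
# An empty result means the four columns are safe to use as the match key.
KEY_CHECK = """
SELECT sk_company_id, lot_number, sk_product_id, sk_packaging_level_id, COUNT(*) AS n
FROM fact_spv_commissioned_lot
GROUP BY 1, 2, 3, 4
HAVING COUNT(*) > 1
"""


def duplicate_keys(fact_rows):
    """Load the given fact rows into a scratch table and return duplicated keys.

    fact_rows is a list of (sk_company_id, lot_number, sk_product_id,
    sk_packaging_level_id, sn_count) tuples; the column types are assumed.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_spv_commissioned_lot ("
        "sk_company_id INT, lot_number TEXT, sk_product_id INT, "
        "sk_packaging_level_id INT, sn_count INT)"
    )
    conn.executemany(
        "INSERT INTO fact_spv_commissioned_lot VALUES (?, ?, ?, ?, ?)", fact_rows
    )
    dupes = conn.execute(KEY_CHECK).fetchall()
    conn.close()
    return dupes


if __name__ == "__main__":
    print(duplicate_keys([(1, "LOT-A", 10, 100, 5), (1, "LOT-A", 10, 100, 9)]))
```

The KEY_CHECK statement uses only a plain GROUP BY and HAVING, so it should be runnable on the real table as well. If it returns rows, the primary-key variant or the sub-select style is the safer choice.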

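...or use the sub-select style, which is a bit shorter overall (the JOIN on f is removed and its conditions are correlated with the outer table instead):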

UPDATE fact_spv_commissioned_lot AS to_update
SET sn_count = sn_count + (
    SELECT COUNT(*)
    FROM staging_serials s
        JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
        JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
        JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
    WHERE s.companyid = to_update.sk_company_id
        AND s.lotnumber = to_update.lot_number
        AND c.sk_company_id = to_update.sk_company_id
        AND p.sk_product_id = to_update.sk_product_id
        AND l.sk_packaging_level_id = to_update.sk_packaging_level_id
);

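If your tables are medium to large (millions to billions of rows), the CTE versions should also perform better (especially the first variant, which uses the primary key column), although the SQL is somewhat more verbose.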
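One difference between the styles can be confirmed on toy data. The sub-select form has no outer WHERE clause, so it writes every row of the fact table, including rows with no staged serials (it adds 0 to them). The CTE forms only touch rows that have a match in agg_data. Whether this explains a performance gap on Redshift is an assumption, not something measured here. The sketch below uses SQLite through Python as a stand-in, with invented sample rows, and references the outer table by name instead of an alias.

```python
import sqlite3

# Same toy schema as the question implies; types and rows are assumptions.
# LOT-C exists in the fact table but has no staged serials.
SETUP = """
CREATE TABLE dim_md_company (sk_company_id INT, lsc_company_id INT);
CREATE TABLE staging_product (compositeproductcode TEXT, sk_product_id INT, packaginguom TEXT);
CREATE TABLE dim_packaging_level (sk_packaging_level_id INT, unit_of_measure TEXT);
CREATE TABLE staging_serials (companyid INT, lotnumber TEXT, compositeproductcode TEXT);
CREATE TABLE fact_spv_commissioned_lot (
    sk_company_id INT, lot_number TEXT, sk_product_id INT,
    sk_packaging_level_id INT, sn_count INT);
INSERT INTO dim_md_company VALUES (1, 1);
INSERT INTO staging_product VALUES ('P1', 10, 'EA');
INSERT INTO dim_packaging_level VALUES (100, 'EA');
INSERT INTO staging_serials VALUES (1, 'LOT-A', 'P1'), (1, 'LOT-A', 'P1'), (1, 'LOT-B', 'P1');
INSERT INTO fact_spv_commissioned_lot VALUES
    (1, 'LOT-A', 10, 100, 5), (1, 'LOT-B', 10, 100, 7), (1, 'LOT-C', 10, 100, 4);
"""

# Body of the second CTE variant, run as a plain SELECT: one row per fact key
# that has at least one matching serial.
AGGREGATION = """
SELECT f.sk_company_id, f.lot_number, f.sk_product_id, f.sk_packaging_level_id, COUNT(*)
FROM staging_serials s
    JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
    JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
    JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
    JOIN fact_spv_commissioned_lot f ON (f.sk_company_id = s.companyid)
WHERE c.sk_company_id = f.sk_company_id
    AND s.lotnumber = f.lot_number
    AND p.sk_product_id = f.sk_product_id
    AND l.sk_packaging_level_id = f.sk_packaging_level_id
GROUP BY 1, 2, 3, 4
"""

# Sub-select style without an outer WHERE clause.
SUBSELECT_UPDATE = """
UPDATE fact_spv_commissioned_lot
SET sn_count = sn_count + (
    SELECT COUNT(*) FROM staging_serials s
        JOIN dim_md_company c ON (c.lsc_company_id = s.companyid)
        JOIN staging_product p ON (s.compositeproductcode = p.compositeproductcode)
        JOIN dim_packaging_level l ON (l.unit_of_measure = p.packaginguom)
    WHERE s.companyid = fact_spv_commissioned_lot.sk_company_id
        AND s.lotnumber = fact_spv_commissioned_lot.lot_number
        AND c.sk_company_id = fact_spv_commissioned_lot.sk_company_id
        AND p.sk_product_id = fact_spv_commissioned_lot.sk_product_id
        AND l.sk_packaging_level_id = fact_spv_commissioned_lot.sk_packaging_level_id
)
"""


def rows_touched():
    """Return (fact keys with matches, rows written by the sub-select UPDATE,
    sn_count of the unmatched LOT-C row after the UPDATE)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    matched = len(conn.execute(AGGREGATION).fetchall())
    touched = conn.execute(SUBSELECT_UPDATE).rowcount
    lot_c = conn.execute(
        "SELECT sn_count FROM fact_spv_commissioned_lot WHERE lot_number = 'LOT-C'"
    ).fetchone()[0]
    conn.close()
    return matched, touched, lot_c


if __name__ == "__main__":
    print(rows_touched())
```

Here only two fact keys have staged serials, yet the sub-select UPDATE writes all three rows. LOT-C keeps its value of 4 because 0 is added to it.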

