Most efficient way to split rows by name and then transpose a column for each name

Problem description

I am using standard SQL in Google BigQuery. I have some metric data in this format:

Date        | metric_name  | metric_level
01/02/2019  | metric_one   | 1
02/03/2019  | metric_one   | 2
14/02/2019  | metric_two   | 6
17/02/2019  | metric_two   | 4
01/03/2019  | metric_three | 2
10/03/2019  | metric_three | 7

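I want to get it into the format below, with the date history going back one year and every metric filled in for every date. If a metric has no data for a particular date, it uses the most recent earlier data point (or null if there is none yet):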

Date        | metric_one   | metric_two   | metric_three
..........
01/02/2019  | 1            | null         | null
02/02/2019  | 1            | null         | null
03/02/2019  | 1            | null         | null
...........
...........
13/02/2019  | 1            | null         | null
14/02/2019  | 1            | 6            | null
15/02/2019  | 1            | 6            | null
...........
...........
09/03/2019  | 2            | 4            | 2
10/03/2019  | 2            | 4            | 7
11/03/2019  | 2            | 4            | 7
...........

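And so on.

I have managed to write some code that does this, but I would like to know whether there is a more efficient way. There are actually more than 3 metrics, so any efficiency gain would save a lot of resources in the long run.

Here is my code: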
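To pin down the fill rule, here is a minimal sketch in plain Python (not BigQuery) of the expected result. The helper name `pivot_and_fill` and the in-memory `rows` list are made up for illustration; the sample values are the ones from the table above, with dates read as DD/MM/YYYY.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, metric_name, metric_level).
rows = [
    (date(2019, 2, 1), "metric_one", 1),
    (date(2019, 3, 2), "metric_one", 2),
    (date(2019, 2, 14), "metric_two", 6),
    (date(2019, 2, 17), "metric_two", 4),
    (date(2019, 3, 1), "metric_three", 2),
    (date(2019, 3, 10), "metric_three", 7),
]


def pivot_and_fill(rows, start, end):
    """Return one dict per calendar day, carrying each metric's latest value forward."""
    # One output column per metric, in first-seen order.
    names = list(dict.fromkeys(name for _, name, _ in rows))
    by_day = {(day, name): level for day, name, level in rows}
    latest = dict.fromkeys(names)  # None until a metric's first data point
    result = []
    day = start
    while day <= end:
        for name in names:
            if (day, name) in by_day:
                latest[name] = by_day[(day, name)]
        result.append({"date": day, **latest})
        day += timedelta(days=1)
    return result


table = pivot_and_fill(rows, date(2019, 2, 1), date(2019, 3, 11))
for row in table[:3]:
    print(row)
```

This only illustrates the desired output; it says nothing about how efficiently BigQuery can produce it.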

    WITH date_arr AS(

        SELECT 
        date

        FROM UNNEST(
            GENERATE_DATE_ARRAY(
                DATE_SUB(CURRENT_DATE(),INTERVAL 365 DAY), 
                CURRENT_DATE(), 
                INTERVAL 1 day
            )
        ) AS date

    ),

    metric_one_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_one'

    ),

    metric_one_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_one

        FROM date_arr d
        LEFT JOIN metric_one_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    ),

    metric_two_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_two'

    ),

    metric_two_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_two

        FROM date_arr d
        LEFT JOIN metric_two_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    ),

    metric_three_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_three'

    ),

    metric_three_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_three

        FROM date_arr d
        LEFT JOIN metric_three_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    )

    SELECT
    *
    FROM metric_one_gapless
    LEFT JOIN metric_two_gapless USING(date)
    LEFT JOIN metric_three_gapless USING(date)

Hope this makes sense. Thanks in advance!

Tags: sql, google-bigquery, bigquery-standard-sql

Solution


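After some research, here is what I came up with. You are using left joins, and the number of left joins may be more than one or even variable, and you cannot use DECLARE in the BigQuery web UI. You may therefore be better off calling the BigQuery REST API through a client library. Client libraries are available for C#, Go, Java, Node.js, PHP, Python, and Ruby, and the required dependencies are listed in their documentation.

Working from code lets you hold the set of metrics in a variable. I suggest first running a SELECT DISTINCT to find out which metrics exist, saving the result in a variable, and then looping over it to build the left joins you need.

I hope this information helps. If you need more, I am here.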
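The loop described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: `build_gapless_query` and the two template strings are hypothetical names, the table name `database` is the placeholder from the question, and the metric names are hard-coded where a real script would read them from a `SELECT DISTINCT metric_name` query run through a client library. The sketch only builds the SQL text and does not contact BigQuery.

```python
import re

# Hypothetical helper code, not part of any BigQuery library.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Same date spine as in the question: one row per day for the last year.
DATE_ARR_CTE = """date_arr AS (
  SELECT date
  FROM UNNEST(GENERATE_DATE_ARRAY(
    DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY), CURRENT_DATE(), INTERVAL 1 DAY
  )) AS date
)"""

# Same pattern as the question's *_gapless CTEs. The IFNULL wrapper is dropped
# because the window frame already includes the current row.
GAPLESS_CTE = """{name}_gapless AS (
  SELECT
    d.date AS date,
    LAST_VALUE(i.metric_level IGNORE NULLS) OVER (
      ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS {name}
  FROM date_arr d
  LEFT JOIN (
    SELECT date, metric_level FROM {table} WHERE metric_name = '{name}'
  ) i ON d.date = i.date
)"""


def build_gapless_query(metric_names, table="database"):
    """Build one query with a gapless CTE and a join per metric name.

    `table` is assumed to be a trusted value supplied by the developer.
    """
    if not metric_names:
        raise ValueError("at least one metric name is required")
    for name in metric_names:
        # Names are interpolated into the SQL text, so accept plain identifiers only.
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe metric name: {name!r}")
    ctes = [DATE_ARR_CTE] + [
        GAPLESS_CTE.format(name=name, table=table) for name in metric_names
    ]
    first, *rest = metric_names
    joins = "".join(f"\nLEFT JOIN {name}_gapless USING (date)" for name in rest)
    return (
        "WITH " + ",\n".join(ctes)
        + f"\nSELECT *\nFROM {first}_gapless{joins}\nORDER BY date"
    )


# Hard-coded here; in practice these would come from the SELECT DISTINCT step.
sql = build_gapless_query(["metric_one", "metric_two", "metric_three"])
print(sql)
```

The generated text can then be submitted as an ordinary query. Note that this automates writing the joins but keeps one scan and one join per metric, so it does not by itself reduce the work BigQuery performs.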

