首页 > 解决方案 > 相对于 BigQuery 中的另一个数组列对一个数组列进行排序

问题描述

我在 Bigquery 中有下表 -

WITH results AS
  (SELECT 1 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.1,0.4,0.3,0.2] as probability
  UNION ALL
  SELECT 2 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.2,0.1,0.6,0.1] as probability
  UNION ALL
  SELECT 3 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.5,0.05,0.35,0.1] as probability
  )
 select * from results

在这里,每个顾客都有一定的概率购买一个水果。我想top 2为每个客户和他们相应probabilities的购买买水果。

有类似这样的输出会很好 -

customerid, fruits, probability
1, bananas, 0.4
1, grapes, 0.3
..

在上面的最终结果中,因为customerid 1我只拿起bananas并且grapes因为这两种水果的购买概率最高(来自[0.1,0.4,0.3,0.2]

有没有我可以在 BiqQuery 中使用的功能来实现这一点?

标签: google-bigquery

解决方案


以下是 BigQuery 标准 SQL

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
  FROM results, 
    UNNEST(probability) probability WITH OFFSET off1
    JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
    ON off1 = off2
  GROUP BY customerid
), UNNEST(top)  

结果

Row customerid  fruit   probability  
1   1           bananas 0.4  
2   1           grapes  0.3  
3   2           grapes  0.6  
4   2           apples  0.2  
5   3           apples  0.5  
6   3           grapes  0.35     

或者可能是更好的选择

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, 
    (
      SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) 
      FROM   UNNEST(probability) probability WITH OFFSET off1
      JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
      ON off1 = off2
    ) top
  FROM results
), UNNEST(top)

结果相同


推荐阅读