JSON arrays: select the elements of the left array that are not in the right array

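Problem description

For each cust_id, I have aggregated the array1 values of all rows that share the same shape_id and have a later created_at. Note that the array1 column is json! I now have two json array columns (array1, array2), and I want to select every element that is in the left array (array1) but not in the right array (array2).

Can anyone help?

Thanks in advance...

I have tried the code below and am stuck on where to go next. I know I could use json_array_elements with a left join and then json_agg to put the result back together, but I think there must be a more elegant way.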

create table a
(cust_id numeric,shape_id numeric,array1 json,created_at timestamp with time zone);

insert into a
(cust_id,shape_id,array1,created_at)
values
(123,1,'[1,2,3]','2019-07-23 13:42:33+10'),
(456,1,'[3,4,5]','2019-07-23 13:44:52+10'),
(789,1,'[2,10,11]','2019-07-23 13:48:11+10'),
(555,2,'[5,4,3,2]','2019-07-26 13:48:11+10'),
(888,2,'[1]','2019-07-27 13:48:11+10'),
(982,3,'["x"]','2019-07-23 13:48:11+10');

SELECT a1.cust_id,
       a1.shape_id,
       a1.array1,
       a1.created_at,
       (SELECT json_agg(jae.e)
               FROM a a2
                    CROSS JOIN LATERAL json_array_elements(a2.array1) jae (e)
               WHERE a2.shape_id = a1.shape_id
                     AND a2.created_at > a1.created_at) array2
       FROM a a1;

Actual result:

cust_id  |  shape_id  | array1    | created_at              |    array2
------------------------------------------------------------------------------
123      | 1          | [1,2,3]   | 2019-07-23 13:42:33+10  | [3,4,5,2,10,11]  
456      | 1          | [3,4,5]   | 2019-07-23 13:44:52+10  | [2,10,11]    
789      | 1          | [2,10,11] | 2019-07-23 13:48:11+10  |
555      | 2          | [5,4,3,2] | 2019-07-26 13:48:11+10  | [1]
888      | 2          | [1]       | 2019-07-27 13:48:11+10  |
982      | 3          | ["x"]     | 2019-07-23 13:48:11+10  |

Expected result:

cust_id  |  shape_id  | array1    | created_at              |  
-------------------------------------------------------------
123      | 1          | [1]       | 2019-07-23 13:42:33+10  |  
456      | 1          | [3,4,5]   | 2019-07-23 13:44:52+10  |     
789      | 1          | [2,10,11] | 2019-07-23 13:48:11+10  |
555      | 2          | [5,4,3,2] | 2019-07-26 13:48:11+10  | 
888      | 2          | [1]       | 2019-07-27 13:48:11+10  |
982      | 3          | ["x"]     | 2019-07-23 13:48:11+10  |

Tags: sql, json, postgresql

Solution

Something like this:

SELECT y.cust_id, y.shape_id, y.created_at, json_agg(y.value) as array1
FROM
(
   SELECT t.cust_id, t.shape_id, j.value, t.created_at,
   ROW_NUMBER() OVER(PARTITION BY t.shape_id, j.value::text ORDER BY t.created_at DESC) as rn
   FROM 
    a t
    CROSS JOIN LATERAL json_array_elements(t.array1) j(value)
) y
WHERE y.rn = 1
GROUP BY y.cust_id, y.shape_id, y.created_at

The query takes your data (snippet):

123,1,[1,2,3],2019-07-23 13:42
456,1,[3,4,5],2019-07-23 13:44

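and expands the JSON so that each array element becomes a single value on its own row, with all the other row data duplicated, so it becomes (expanded snippet):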

123,1,1,2019-07-23 13:42
123,1,2,2019-07-23 13:42
123,1,3,2019-07-23 13:42
456,1,3,2019-07-23 13:44
456,1,4,2019-07-23 13:44
456,1,5,2019-07-23 13:44
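
The expansion step can be run on its own to inspect the intermediate rows. This is a minimal sketch against the table a from the question; the WHERE filter and the WITH ORDINALITY column (named ord here) are additions for illustration only, used to limit the output to the two snippet rows and to list the elements in array order.

```sql
-- One output row per array element; cust_id, shape_id and created_at are repeated.
SELECT t.cust_id,
       t.shape_id,
       j.value,
       t.created_at
  FROM a t
       CROSS JOIN LATERAL json_array_elements(t.array1)
                  WITH ORDINALITY AS j(value, ord)   -- ord = position in the array
 WHERE t.cust_id IN (123, 456)                      -- only the two snippet rows
 ORDER BY t.created_at, j.ord;
```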

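It then reduces this to only the latest row for each shape/array1_element combination: the rows of each combination are numbered in descending order of created_at, and only the rows where rn = 1 are kept. Here is the query that does the magic: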

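-- expanded_data: one row per array element, cast to text because json has no equality operator
WITH expanded_data AS (
  SELECT t.cust_id, t.shape_id, j.value::text AS element, t.created_at
  FROM a t
  CROSS JOIN LATERAL json_array_elements(t.array1) j(value)
)
SELECT * FROM
(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY shape_id, element ORDER BY created_at DESC) as rn
  FROM expanded_data
) x
WHERE rn = 1;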

This returns only the latest shape/element pairings and their associated customer:

123,1,1,2019-07-23 13:42
123,1,2,2019-07-23 13:42
                           <--the array1_element = 3 row from 13:42 is removed because a newer one from 13:44 exists
456,1,3,2019-07-23 13:44
456,1,4,2019-07-23 13:44
456,1,5,2019-07-23 13:44

Then we re-aggregate the elements into a JSON array for each shape/customer combination:

123,1,[1,2],  2019-07-23 13:42
456,1,[3,4,5],2019-07-23 13:44
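
A few edge cases may be worth checking with the ROW_NUMBER approach. A row whose elements all reappear in later rows has no rn = 1 rows and so drops out of the result entirely. An element repeated within a single array is kept only once. And json_agg without an ORDER BY does not guarantee the original element order. If any of these matter, one possible variant is sketched below. It is not part of the original answer, and it assumes that comparing elements by their text form is acceptable (so 1 and 1.0 would count as different values). The alias names e1, e2 and ord are illustrative.

```sql
-- Keep every row of a; array1 becomes NULL when no element survives.
SELECT a1.cust_id,
       a1.shape_id,
       a1.created_at,
       (SELECT json_agg(e1.value ORDER BY e1.ord)          -- keep original element order
          FROM json_array_elements(a1.array1)
               WITH ORDINALITY AS e1(value, ord)
         WHERE NOT EXISTS (                                 -- element absent from every later row
                 SELECT 1
                   FROM a a2
                        CROSS JOIN LATERAL json_array_elements(a2.array1) AS e2(value)
                  WHERE a2.shape_id = a1.shape_id
                    AND a2.created_at > a1.created_at
                    AND e2.value::text = e1.value::text)
       ) AS array1
  FROM a a1
 ORDER BY a1.shape_id, a1.created_at;
```

On the sample data from the question this should produce the same arrays as the expected result. The correlated subquery is re-evaluated for each row, so on large tables the window-function version may perform better.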
