sql - 如何在 Redshift 中使用两个不同的分隔符从两列中拆分数据?
问题描述
我有一个CTE
有这样的数据。它几乎遵循两种格式,counts
并且process_ids
将有这两种类型的数据。
client_id day counts process_ids
--------------------------------------------------------------------------------------------
abc1 Feb-01-2021 3 C1,C2 | C3,C4,C5 | C6,C7
abc2 Feb-05-2021 2, 3 C10,C11,C12 | C13,C14 # C15,C16 | C17,C18
现在我想CTE
在将其拆分后从上面得到下面counts
的输出process_ids
-
client_id day counts process_ids
--------------------------------------------------------
abc1 Feb-01-2021 3 C1
abc1 Feb-01-2021 3 C2
abc1 Feb-01-2021 3 C3
abc1 Feb-01-2021 3 C4
abc1 Feb-01-2021 3 C5
abc1 Feb-01-2021 3 C6
abc1 Feb-01-2021 3 C7
abc2 Feb-05-2021 2 C10
abc2 Feb-05-2021 2 C11
abc2 Feb-05-2021 2 C12
abc2 Feb-05-2021 2 C13
abc2 Feb-05-2021 2 C14
abc2 Feb-05-2021 3 C15
abc2 Feb-05-2021 3 C16
abc2 Feb-05-2021 3 C17
abc2 Feb-05-2021 3 C18
基本上,如果它们遵循任何这些格式,这个想法是基于以下两个用例进行拆分counts
和基础。process_ids
用例 1
如果counts
列只有一位数并且process_ids
列有|
分隔符。
用例 2
如果counts
列只有两位数,由,
分隔符分隔,并且process_ids
列有#
分隔符和pipe
.
我在Amazon Redshift
这里工作,我很困惑如何根据需要将它们分开。
这有可能做到吗?
解决方案
乍一看,这可能看起来有点毛茸茸,但它是由扎实的技术建立起来的,并给出了预期的结果......
SQL
WITH seq_0_9 AS (
SELECT 0 AS d
UNION ALL SELECT 1 AS d
UNION ALL SELECT 2 AS d
UNION ALL SELECT 3 AS d
UNION ALL SELECT 4 AS d
UNION ALL SELECT 5 AS d
UNION ALL SELECT 6 AS d
UNION ALL SELECT 7 AS d
UNION ALL SELECT 8 AS d
UNION ALL SELECT 9 AS d
),
numbers AS (
SELECT a.d + b.d * 10 + c.d * 100 + 1 AS n
FROM seq_0_9 a, seq_0_9 b, seq_0_9 c
),
processed AS
(SELECT client_id,
day,
REPLACE(counts, ' ', '') AS counts,
REPLACE(REPLACE(process_ids, ' ', ''), '|', ',') AS process_ids
FROM tbl),
split_pids AS
(SELECT
client_id,
day,
counts,
split_part(process_ids, '#', n) AS process_ids,
n AS n1
FROM processed
CROSS JOIN numbers
WHERE
split_part(process_ids, '#', n) IS NOT NULL
AND split_part(process_ids, '#', n) != ''),
split_counts AS
(SELECT
client_id,
day,
split_part(counts, ',', n) AS counts,
process_ids,
n1,
n AS n2
FROM split_pids
CROSS JOIN numbers
WHERE
split_part(counts, ',', n) IS NOT NULL
and split_part(counts, ',', n) != ''),
matched_up AS
(SELECT * FROM split_counts WHERE n1 = n2)
SELECT
client_id,
day,
counts,
split_part(process_ids, ',', n) AS process_ids
FROM
matched_up
CROSS JOIN
numbers
WHERE
split_part(process_ids, ',', n) IS NOT NULL
AND split_part(process_ids, ',', n) != '';
演示
在线 rextester 演示(使用 PostgreSQL,但应与 Redshift 兼容):https ://rextester.com/FNA16497
简要说明
此技术用于生成数字表(从 1 到 1000 包括在内)。然后将该技术与多个公用表表达式一起使用多次,以在单个 SQL 语句中实现它。
推荐阅读
- dart - 如何在 Flutter 中将多个小部件作为子级传递?
- php - 如何在没有foreach的情况下获取数组中的每个项目
- java - 将两个独立的地图合并为第三个地图
- swift - NSMenu 的奇怪问题
- php - How to find out the values from a specific column in the results of a MySQL query?
- elasticsearch - 如何在 kibana 数据表中添加多个案例条件
- c# - 我正在尝试使用鼠标拖动元素,但它总是从特定位置而不是当前位置开始拖动
- discord.js - Bot mention as a prefix in Discord.js
- c# - C# How to connect to different databases based on a flag
- javascript - Using external JS libraries in a web component