sql - REGEXP_SUBSTR 将逗号分隔的值拆分为不起作用的行
问题描述
我有一个表结构,其中包含以逗号分隔值的列中的数据。我想将这些值分成几行。
表中的数据是这样的
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN |ATTACHED_SUPPLIER_DOCUMENT |ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|--------------------|---------------------------|------------------------|
001059-0000 |CHENGDA |268452,268453 |268456,268457,268459,268460|268465 |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458,268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451,268452,268453|268456,268457,268459,268460|268465,268466 |
我已经尝试过这个查询,但它表现不佳,不确定原因
WITH CTE AS(
SELECT DISTINCT
MP.PART_NUMBER MANUFACTURER_PART_NUMBER, M.NAME MANUFACTURER_NAME,
RTRIM(LTRIM(PG3.MULTILIST31, ','), ',') TEST_PLAN,
RTRIM(LTRIM(PG3.MULTILIST32, ','), ',') ATTACHED_SUPPLIER_DOCUMENT,
RTRIM(LTRIM(PG3.MULTILIST33, ','), ',') ATTACHED_LOGITECH_REPORT
FROM MANU_PARTS MP
INNER JOIN MANUFACTURERS M ON M.ID = MP.MANU_ID
LEFT JOIN PAGE_TWO PG2 ON PG2.ID = MP.ID
LEFT JOIN PAGE_THREE PG3 ON PG3.ID = MP.ID
WHERE PART_NUMBER = '001059-0000'
)
SELECT DISTINCT
MANUFACTURER_PART_NUMBER, MANUFACTURER_NAME,
REGEXP_SUBSTR(TEST_PLAN, '[^,]+', 1, LEVEL) TEST_PLAN,
REGEXP_SUBSTR(ATTACHED_SUPPLIER_DOCUMENT, '[^,]+', 1, LEVEL) ATTACHED_SUPPLIER_DOCUMENT,
REGEXP_SUBSTR(ATTACHED_LOGITECH_REPORT, '[^,]+', 1, LEVEL) ATTACHED_LOGITECH_REPORT
FROM CTE
CONNECT BY REGEXP_SUBSTR(NVL(NVL(TEST_PLAN, ATTACHED_SUPPLIER_DOCUMENT), ATTACHED_LOGITECH_REPORT), '[^,]+', 1, LEVEL) IS NOT NULL
ORDER BY MANUFACTURER_NAME
它应该提供以下也是必需的输出
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000 |CHENGDA |268452 |268456 |268465 |
001059-0000 |CHENGDA |268453 |268457 | |
001059-0000 |CHENGDA | |268459 | |
001059-0000 |CHENGDA | |268460 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451 |268456 |268465 |
001059-0000 |SYSTRON |268452 |268457 |268466 |
001059-0000 |SYSTRON |268453 |268459 | |
001059-0000 |SYSTRON | |268460 | |
但相反,它给出了这个输出并缺少 2 行或第CHENGDA
1 行和最后 1 行的值SYSTRON
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000 |CHENGDA |268452 |268456 |268465 |
001059-0000 |CHENGDA |268453 |268457 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451 |268456 |268465 |
001059-0000 |SYSTRON |268452 |268457 |268466 |
001059-0000 |SYSTRON |268453 |268459 | |
我发现的原因是当我在 Connect By 子句中交换和的位置时Test Plan
,Attached Supplier Document
这可能是因为两列中的值数量。但这不应该发生,因为如果测试计划在任何情况下都有更多的价值呢?
希望有人能帮忙。
(在将此问题标记为重复之前,请询问它是否已经有答案,因为该查询不起作用)
解决方案
正如评论中所指出的,您的数据模型并不理想,您不应该将值列表存储在字符串中。假设您坚持这一点,您的问题是列表具有不同数量的元素,因此每个列表的深度可能不同,从而导致对level
;的限制不同。正如您已经意识到的那样,试图猜测它的基础将是一个问题。
您可能会尝试找出每个 ID 使用哪个值regexp_count
,greatest
但这对于处理多行也会很棘手。(这distinct
表明您已经在那里遇到了问题......)
另一种方法是单独拆分每个列表,然后尝试关联匹配的元素——尽管这本身似乎相当随意。这似乎得到了你想要的结果:
select mp.part_number,
m.name,
t.value31,
t.value32,
t.value33
from manu_parts mp
join manufacturers m
on m.id = mp.manu_id
left join (
select coalesce(t31.id, t32.id, t33.id) as id,
coalesce(t31.lvl, t32.lvl, t33.lvl) as lvl,
t31.value as value31,
t32.value as value32,
t33.value as value33
from
(
select id, level as lvl,
regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist31, '(.*?)(,|$)')
) t31
full outer join (
select id, level as lvl,
regexp_substr(multilist32, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist32, '(.*?)(,|$)')
) t32
on t32.id = t31.id and t32.lvl = t31.lvl
full outer join (
select id, level as lvl,
regexp_substr(multilist33, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist33, '(.*?)(,|$)')
) t33
on (t33.id = t32.id and t33.lvl = t32.lvl)
or (t33.id = t31.id and t33.lvl = t31.lvl)
) t on t.id = mp.id
where part_number = '001059-0000'
order by part_number, name, lvl;
PART_NUMBER NAME VALUE31 VALUE32 VALUE33
----------- ------------------------- ---------- ---------- ----------
001059-0000 CHENGDA 268452 268456 268465
001059-0000 CHENGDA 268453 268457
001059-0000 CHENGDA 268459
001059-0000 CHENGDA 268460
001059-0000 SHANGHAI MARRISON CO.,LTD 268458
001059-0000 SHANGHAI MARRISON CO.,LTD 268462
001059-0000 SUZHOU SHARETECH
001059-0000 SYSTRON 268451 268456 268465
001059-0000 SYSTRON 268452 268457 268466
001059-0000 SYSTRON 268453 268459
001059-0000 SYSTRON 268460
11 rows selected.
每个子查询如:
(
select id, level as lvl,
regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist31, '(.*?)(,|$)')
) t31
从该表中拆分单个列,保留稍后连接的 ID,以及与该 ID 的其他列关联的级别。然后将其中三个从外部连接在一起;并将其结果连接到主表。
性能可能很糟糕,但是如果您要过滤单个零件号,那么它可能会被推送到子查询中,因此它不会做比需要更多的工作。
您还可以查看递归子查询分解而不是分层查询,或者可能是与集合相关的子查询,但它们都有自己的问题。当您必须解决错误的数据模型时,就会发生这种情况。
推荐阅读
- protractor - 量角器:使用 API 调用登录应用程序
- xml - 如何在 XSLT 中计算 for Loop 中父节点的所有子节点
- c# - 我们可以在datagridview中更改列标题文本的某些部分的字体吗
- go - time.Sleep() 会阻塞上下文吗?
- android - 如何在 Java 中以编程方式设置微光布局持续时间?
- azure - Azure DevOps“Visual Studio 测试”-使用“覆盖参数”时无法启用“重新运行失败的测试”
- python - 如何连接数据框选择标准
- java - SimpleDateFormat 返回 12 小时前 5.30 到 6.30 之间的时间
- android - 在地图上显示几个书签
- scala - 如何在火花中只读取新文件