首页 > 解决方案 > REGEXP_SUBSTR 将逗号分隔的值拆分为不起作用的行

问题描述

我有一个表结构,其中包含以逗号分隔值的列中的数据。我想将这些值分成几行。

表中的数据是这样的

MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME        |TEST_PLAN           |ATTACHED_SUPPLIER_DOCUMENT |ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|--------------------|---------------------------|------------------------|
001059-0000             |CHENGDA                  |268452,268453       |268456,268457,268459,268460|268465                  |
001059-0000             |SHANGHAI MARRISON CO.,LTD|                    |268458,268462              |                        |
001059-0000             |SUZHOU SHARETECH         |                    |                           |                        |
001059-0000             |SYSTRON                  |268451,268452,268453|268456,268457,268459,268460|268465,268466           |

我已经尝试过这个查询,但它表现不佳,不确定原因

WITH CTE AS(
SELECT DISTINCT 
MP.PART_NUMBER MANUFACTURER_PART_NUMBER, M.NAME MANUFACTURER_NAME, 
RTRIM(LTRIM(PG3.MULTILIST31, ','), ',') TEST_PLAN,
RTRIM(LTRIM(PG3.MULTILIST32, ','), ',') ATTACHED_SUPPLIER_DOCUMENT,
RTRIM(LTRIM(PG3.MULTILIST33, ','), ',') ATTACHED_LOGITECH_REPORT
FROM MANU_PARTS MP
INNER JOIN MANUFACTURERS M ON M.ID = MP.MANU_ID 
LEFT JOIN PAGE_TWO PG2 ON PG2.ID = MP.ID
LEFT JOIN PAGE_THREE PG3 ON PG3.ID = MP.ID

WHERE PART_NUMBER = '001059-0000'
)

SELECT DISTINCT 
MANUFACTURER_PART_NUMBER, MANUFACTURER_NAME, 
REGEXP_SUBSTR(TEST_PLAN, '[^,]+', 1, LEVEL) TEST_PLAN, 
REGEXP_SUBSTR(ATTACHED_SUPPLIER_DOCUMENT, '[^,]+', 1, LEVEL) ATTACHED_SUPPLIER_DOCUMENT, 
REGEXP_SUBSTR(ATTACHED_LOGITECH_REPORT, '[^,]+', 1, LEVEL) ATTACHED_LOGITECH_REPORT
FROM CTE 
CONNECT BY REGEXP_SUBSTR(NVL(NVL(TEST_PLAN, ATTACHED_SUPPLIER_DOCUMENT), ATTACHED_LOGITECH_REPORT), '[^,]+', 1, LEVEL) IS NOT NULL
ORDER BY MANUFACTURER_NAME

它应该提供以下也是必需的输出

MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME        |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000             |CHENGDA                  |268452   |268456                    |268465                  |
001059-0000             |CHENGDA                  |268453   |268457                    |                        |
001059-0000             |CHENGDA                  |         |268459                    |                        |
001059-0000             |CHENGDA                  |         |268460                    |                        |
001059-0000             |SHANGHAI MARRISON CO.,LTD|         |268458                    |                        |
001059-0000             |SHANGHAI MARRISON CO.,LTD|         |268462                    |                        |
001059-0000             |SUZHOU SHARETECH         |         |                          |                        |
001059-0000             |SYSTRON                  |268451   |268456                    |268465                  |
001059-0000             |SYSTRON                  |268452   |268457                    |268466                  |
001059-0000             |SYSTRON                  |268453   |268459                    |                        |
001059-0000             |SYSTRON                  |         |268460                    |                        |

但相反,它给出了这个输出并缺少 2 行或第CHENGDA1 行和最后 1 行的值SYSTRON

MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME        |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000             |CHENGDA                  |268452   |268456                    |268465                  |
001059-0000             |CHENGDA                  |268453   |268457                    |                        |
001059-0000             |SHANGHAI MARRISON CO.,LTD|         |268458                    |                        |
001059-0000             |SHANGHAI MARRISON CO.,LTD|         |268462                    |                        |
001059-0000             |SUZHOU SHARETECH         |         |                          |                        |
001059-0000             |SYSTRON                  |268451   |268456                    |268465                  |
001059-0000             |SYSTRON                  |268452   |268457                    |268466                  |
001059-0000             |SYSTRON                  |268453   |268459                    |                        |

我发现的原因是当我在 Connect By 子句中交换和的位置时Test PlanAttached Supplier Document这可能是因为两列中的值数量。但这不应该发生,因为如果测试计划在任何情况下都有更多的价值呢?

希望有人能帮忙。

(在将此问题标记为重复之前,请询问它是否已经有答案,因为该查询不起作用)

标签: sqloracle

解决方案


正如评论中所指出的,您的数据模型并不理想,您不应该将值列表存储在字符串中。假设您坚持这一点,您的问题是列表具有不同数量的元素,因此每个列表的深度可能不同,从而导致对level;的限制不同。正如您已经意识到的那样,试图猜测它的基础将是一个问题。

您可能会尝试找出每个 ID 使用哪个值regexp_countgreatest但这对于处理多行也会很棘手。(这distinct表明您已经在那里遇到了问题......)

另一种方法是单独拆分每个列表,然后尝试关联匹配的元素——尽管这本身似乎相当随意。这似乎得到了你想要的结果:

select mp.part_number,
  m.name,
  t.value31,
  t.value32,
  t.value33
from manu_parts mp
join manufacturers m
on m.id = mp.manu_id
left join (
  select coalesce(t31.id, t32.id, t33.id) as id,
    coalesce(t31.lvl, t32.lvl, t33.lvl) as lvl,
    t31.value as value31,
    t32.value as value32,
    t33.value as value33
  from 
  (
    select id, level as lvl,
      regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
    from page_three
    connect by id = prior id
    and prior dbms_random.value is not null
    and level < regexp_count(multilist31, '(.*?)(,|$)')
  ) t31
  full outer join (
    select id, level as lvl,
      regexp_substr(multilist32, '(.*?)(,|$)', 1, level, null, 1) as value
    from page_three
    connect by id = prior id
    and prior dbms_random.value is not null
    and level < regexp_count(multilist32, '(.*?)(,|$)')
  ) t32 
  on t32.id = t31.id and t32.lvl = t31.lvl
  full outer join (
    select id, level as lvl,
      regexp_substr(multilist33, '(.*?)(,|$)', 1, level, null, 1) as value
    from page_three
    connect by id = prior id
    and prior dbms_random.value is not null
    and level < regexp_count(multilist33, '(.*?)(,|$)')
  ) t33
  on (t33.id = t32.id and t33.lvl = t32.lvl)
  or (t33.id = t31.id and t33.lvl = t31.lvl)
) t on t.id = mp.id
where part_number = '001059-0000'
order by part_number, name, lvl;
PART_NUMBER NAME                      VALUE31    VALUE32    VALUE33   
----------- ------------------------- ---------- ---------- ----------
001059-0000 CHENGDA                   268452     268456     268465    
001059-0000 CHENGDA                   268453     268457               
001059-0000 CHENGDA                              268459               
001059-0000 CHENGDA                              268460               
001059-0000 SHANGHAI MARRISON CO.,LTD            268458               
001059-0000 SHANGHAI MARRISON CO.,LTD            268462               
001059-0000 SUZHOU SHARETECH                                          
001059-0000 SYSTRON                   268451     268456     268465    
001059-0000 SYSTRON                   268452     268457     268466    
001059-0000 SYSTRON                   268453     268459               
001059-0000 SYSTRON                              268460               

11 rows selected. 

每个子查询如:

  (
    select id, level as lvl,
      regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
    from page_three
    connect by id = prior id
    and prior dbms_random.value is not null
    and level < regexp_count(multilist31, '(.*?)(,|$)')
  ) t31

从该表中拆分单个列,保留稍后连接的 ID,以及与该 ID 的其他列关联的级别。然后将其中三个从外部连接在一起;并将其结果连接到主表。

性能可能很糟糕,但是如果您要过滤单个零件号,那么它可能会被推送到子查询中,因此它不会做比需要更多的工作。

您还可以查看递归子查询分解而不是分层查询,或者可能是与集合相关的子查询,但它们都有自己的问题。当您必须解决错误的数据模型时,就会发生这种情况。


推荐阅读