A smarter way to implement regex_replace?

Problem description

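I'm trying to solve this with regex_replace, but I'm wondering whether there is a smarter approach that would save me from adding more nested regex_replace calls for every new case in the future.

For the setup, see this sqlfiddle: http://sqlfiddle.com/#!17/82948/12

The main problem I'm trying to solve is repeated values of "ACK" or "ZEBRA", or a combination of the two.

So basically a value should not contain both ZEBRA and ACK. If it does, keep only the ACK or ZEBRA closest to the number.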

  1. ACK_ACK_DOV should be ACK_DOV
  2. ZEBRA_ZEBRA_DOV should be ZEBRA_DOV
  3. ZEBRA_ACK_ACK_DOV should be ACK_DOV
  4. ZEBRA_ZEBRA_ACK_DOV should be ACK_DOV
  5. ZEBRA_393939_DOV should be ZEBRA_393939_DOV
  6. ZEBRA_ZEBRA_29393930 should be ZEBRA_29393930

value                      fixed                  ideal
ACK_ACK_DOV_90000          ACK_DOV_90000          ACK_DOV_90000
ACK_910101                 ACK_910101             ACK_910101
ACK_XIS_900000000          ACK_XIS_900000000      ACK_XIS_900000000
GGG_0000000                GGG_0000000            GGG_0000000
ASC_VNA_303930             ASC_VNA_303930         ASC_VNA_303930
ACK_393848489              ACK_393848489          ACK_393848489
ACK_VNA_30303              ACK_VNA_30303          ACK_VNA_30303
ACK_XPM_303030303030       ACK_XPM_303030303030   ACK_XPM_303030303030
ACK_ACK_DOV_39393          ACK_DOV_39393          ACK_DOV_39393
ZEBRA_0393930              ZEBRA_0393930          ZEBRA_0393930
ZEBRA_393939_DOV           ZEBRA_393939_DOV       ZEBRA_393939_DOV
ZEBRA_VNA_3930321          ZEBRA_VNA_3930321      ZEBRA_VNA_3930321
ZEBRA_ACK_ACK_DOV_3934994  ZEBRA_ACK_DOV_3934994  ACK_DOV_3934994
ZEBRA_ZEBRA_29393930       ZEBRA_ZEBRA_29393930   ZEBRA_29393930

Thanks in advance!

Tags: sql, regex, snowflake-cloud-data-platform

Solution


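Instead of using a regex (there is no lookaround syntax available here), turn your logic into token handling: split the value on underscores, count the "bad" tokens (ACK and ZEBRA) from the right-hand end, keep only the good tokens and the last bad one, and then glue the pieces back together.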

with data(value,fixed,ideal) as (
    select * from values
        ('ACK_ACK_DOV_90000','ACK_DOV_90000','ACK_DOV_90000')
        ,('ACK_910101','ACK_910101','ACK_910101')
        ,('ACK_XIS_900000000','ACK_XIS_900000000','ACK_XIS_900000000')
        ,('GGG_0000000','GGG_0000000','GGG_0000000')
        ,('ASC_VNA_303930','ASC_VNA_303930','ASC_VNA_303930')
        ,('ACK_393848489','ACK_393848489','ACK_393848489')
        ,('ACK_VNA_30303','ACK_VNA_30303','ACK_VNA_30303')
        ,('ACK_XPM_303030303030','ACK_XPM_303030303030','ACK_XPM_303030303030')
        ,('ACK_ACK_DOV_39393','ACK_DOV_39393','ACK_DOV_39393')
        ,('ZEBRA_0393930','ZEBRA_0393930','ZEBRA_0393930')
        ,('ZEBRA_393939_DOV','ZEBRA_393939_DOV','ZEBRA_393939_DOV')
        ,('ZEBRA_VNA_3930321','ZEBRA_VNA_3930321','ZEBRA_VNA_3930321')
        ,('ZEBRA_ACK_ACK_DOV_3934994','ZEBRA_ACK_DOV_3934994','ACK_DOV_3934994')
        ,('ZEBRA_ZEBRA_29393930','ZEBRA_ZEBRA_29393930','ZEBRA_29393930')
)
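select org_value
    ,seq
    -- glue the surviving tokens back together in their original order
    ,array_to_string(array_agg(part) within group (order by index), '_') as output
from (
    select d.value as org_value
        ,f.seq
        ,f.index
        ,f.value as part
        -- flag the tokens that must not repeat
        ,case when part='ZEBRA' then 1
            when part='ACK' then 1
            else 0
         end bad_bit
        -- running count of flagged tokens, walking from the last token to the first
        ,sum(bad_bit)over(partition by f.seq order by f.index desc) as c
    from data d, table(split_to_table(d.value,'_')) f
)
-- keep a token only while at most one flagged token has been seen from the right
where c <= 1
group by org_value, seq
order by seq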

gives:

ORG_VALUE                  SEQ  OUTPUT
ACK_ACK_DOV_90000          1    ACK_DOV_90000
ACK_910101                 2    ACK_910101
ACK_XIS_900000000          3    ACK_XIS_900000000
GGG_0000000                4    GGG_0000000
ASC_VNA_303930             5    ASC_VNA_303930
ACK_393848489              6    ACK_393848489
ACK_VNA_30303              7    ACK_VNA_30303
ACK_XPM_303030303030       8    ACK_XPM_303030303030
ACK_ACK_DOV_39393          9    ACK_DOV_39393
ZEBRA_0393930              10   ZEBRA_0393930
ZEBRA_393939_DOV           11   ZEBRA_393939_DOV
ZEBRA_VNA_3930321          12   ZEBRA_VNA_3930321
ZEBRA_ACK_ACK_DOV_3934994  13   ACK_DOV_3934994
ZEBRA_ZEBRA_29393930       14   ZEBRA_29393930
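
To sanity-check the rule outside the database, the same token logic can be sketched in a few lines of Python. This is only an illustration of the split, count from the right, filter, and rejoin steps used in the query above, not a replacement for it. The names `BAD_TOKENS` and `collapse_prefix` are hypothetical and chosen for this example only.

```python
# Tokens that may appear at most once in the result (taken from the question).
BAD_TOKENS = {"ACK", "ZEBRA"}


def collapse_prefix(value: str) -> str:
    """Keep tokens only while at most one flagged token has been seen from the right."""
    kept = []
    bad_seen = 0
    # Walk the tokens from last to first, like `order by f.index desc` in the query.
    for part in reversed(value.split("_")):
        if part in BAD_TOKENS:
            bad_seen += 1
        # Equivalent of `where c <= 1`.
        if bad_seen <= 1:
            kept.append(part)
    # Restore the original left-to-right order before joining.
    return "_".join(reversed(kept))


print(collapse_prefix("ZEBRA_ACK_ACK_DOV_3934994"))  # -> ACK_DOV_3934994
print(collapse_prefix("ZEBRA_ZEBRA_29393930"))       # -> ZEBRA_29393930
print(collapse_prefix("ZEBRA_393939_DOV"))           # -> ZEBRA_393939_DOV
```

One behaviour worth knowing, and it applies to the SQL as well: once a second flagged token has been counted from the right, every token to its left is dropped, including non-flagged ones. For example, X_ACK_ACK_DOV becomes ACK_DOV rather than X_ACK_DOV.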
