首页 > 解决方案 > 删除字符左右的单词

问题描述

使用 MS SQL 2008,我需要删除包含在文本字符串中的电子邮件地址。IE:

“这是一个带有多个电子邮件地址的示例文本行,例如 fred@example.com,或者我也可以使用 bert@home.co.uk 甚至某人@somewhere.pl 来混淆”。

期望的结果是:

“这是一个包含多个电子邮件地址的示例文本行,或者我也可以拥有甚至混合起来”

甚至:

“这是一个包含多个电子邮件地址的示例文本行,例如 fred@***** 或者我也可以使用 bert@***** 甚至某人@***** 来混淆”。

有很多例子可以删除某个字符左侧或右侧的所有内容,但不会删除固定字符左侧或右侧的文本,直到第一个空格。任何帮助表示赞赏。

标签: sql-server-2008

解决方案


假设文本字符串在表列中,您可以在 SQL Server 2008 中借助字符串吐出函数和通过for xml. 在较新版本的 SQL Server 中(您应该真正尝试迁移到该版本),使用类似string_agg而不是stuff( ... for xml )组合的功能可以更轻松地实现这一点。

这种实现是贪婪的,因为它将掩盖两个空格(或字符串的开头/结尾)之间的任何字符组合,其中至少包含一个@字符。fullsentences-like^this'one%will£be$included,so\long)as/there>is.a=@*character&somewhere以及仅由一个或多个@字符组成的字符串。

我将把如何处理不是电子邮件地址的字符串留给你。

询问

with t as
(
    select *
    from(values('This is a sample line of text with multiple email addresses like fred@example.com or I could also have bert@home.co.uk or even someone@somewhere.pl to mix things up')
              ,('This is another sample line of text with multiple email addresses like fred2@example2.com or I could also have bert2@home2.co.uk or even someone2@somewhere2.pl to mix things up')
              ,('fullsentences-like^this''one%will£be$included,so\long)as/there>is.a=@*character&somewhere')
              ,('@')
              ,('a @@@@@ b')
              ,('@ @ @ @ @')
              ,('Let''s meet @ the beach')
        ) as s(s)
)
,s as
(
    select t.s
          ,s.rn
          ,case when charindex('@',s.item) > 0 then '***' else s.item end as item
    from t
        cross apply dbo.fn_StringSplit4k(t.s,' ',null) as s
)
select t.s
      ,stuff((select ' ' + s.item
              from s
              where t.s = s.s
              order by s.rn
              for xml path('')
             )
            ,1,1,''
            ) as s
from t;

输出

s s
这是一个带有多个电子邮件地址的示例文本行,例如 fred@example.com 或者我也可以使用 bert@home.co.uk 甚至某人@somewhere.pl 来混合 这是带有多个电子邮件地址的示例文本行,例如 *** 或者我也可以使用 *** 甚至 *** 来混淆
这是另一个带有多个电子邮件地址的示例文本行,例如 fred2@example2.com 或者我也可以使用 bert2@home2.co.uk 甚至someone2@somewhere2.pl 来混合 这是另一个带有多个电子邮件地址的示例文本行,例如 *** 或者我也可以使用 *** 甚至 *** 来混淆
fullsentences-like ^this'one%will£be$included,so\long)as/there>is.a=@*character&somewhere ***
@ ***
一个@@@@@ b a *** b
@@@@@ *** *** *** *** ***
让我们见面@海滩 让我们见见***海滩

字符串拆分功能

create function [dbo].[fn_StringSplit4k]
(
     @str nvarchar(4000) = ' '              -- String to split.
    ,@delimiter as nvarchar(1) = ','        -- Delimiting value to split on.
    ,@num as int = null                     -- Which value to return.
)
returns table
as
return
                    -- Start tally table with 10 rows.
    with n(n)   as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)

                    -- Select the same number of rows as characters in @str as incremental row numbers.
                    -- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
        ,t(t)   as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)

                    -- Return the position of every value that follows the specified delimiter.
        ,s(s)   as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)

                    -- Return the start and length of every value, to use in the SUBSTRING function.
                    -- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
        ,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
    
    select rn
          ,item
    from(select row_number() over(order by s) as rn
               ,substring(@str,s,l) as item
        from l
        ) a
    where rn = @num
       or @num is null;

推荐阅读