首页 > 解决方案 > SQL Server:如何在字符串中查询?

问题描述

我有一个要求,我的客户希望我从文本列中检索特定信息

以下是相同的示例

the student scored following result:     class: 6            subject:            result:    english               80       math         23

科学 45

预期的结果需要像 -

English        Maths          Science
80             23             45

我尝试使用 string_split

select value from STRING_SPLIT( (select value from mytable where [student roll number] = 'SCH-01097') , ' ' ) 

但这只会将值拆分为无法查询的多行。

我还尝试将 LTRIM 与 CHARINDEX 方法一起使用,但该列的文本不同,而且并不总是井井有条。大多数情况下,初始文本都不同。

可以这样做吗?

编辑 - 我很接近,但还没有

到目前为止,我已经到达这里

SELECT VALUE FROM STRING_SPLIT ((select 
        substring(value, charindex('Block',value),1000)
                     from mytable where [rollnumber ] = 'SCH-01097'),' ') WHERE VALUE <> ' '

这给了我我需要的一切,但在一个专栏中

class6:
Subject
result
english
80
math
23
science
45

现在如何以所需的表格形式制作它?

标签: sql-servertsql

解决方案


为了保持拆分值的顺序,此答案使用DelimitedSplit8K。像这样的东西有效。

[编辑] 查询现在使用“词干”将多个字符串映射到同一个类,而不是在 CTE 中包含特定字符串。例如,如果英语输入为 En,它仍将映射到英语。

表格和数据

drop table if exists #tTest;
go
create table #tTest(
  string        Varchar(256));
  
insert #tTest(string) values
('the student scored following result:     class: 6            subject:            result:    english               80       math         23');

DelimitedSplit8k

CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;

询问

;with 
stems_cte(stem, word) as (
    select 'English', 'English' union all
    select 'En', 'English' union all
    select 'Math', 'Math' union all
    select 'Maths', 'Math' union all
    select 'Science', 'Science'),
splt_cte(string, str_val, ndx, lead_ndx, lead_len, rn) as (
    select t.string, ds.Item, charndx.ndx,
           lead(charndx.ndx) over (order by ds.ItemNumber),
           lead(len(ds.[Item])) over (order by ds.ItemNumber),
           ItemNumber
    from #tTest t
         cross apply dbo.DelimitedSplit8K(t.string, ' ') ds
         cross apply (select charindex(ds.Item, t.string, 1) ndx) charndx
    where Item <> ' '),
spec_rows_cte(word, ndx, lead_ndx, lead_len, rn) as (
    select sp.word, sc.ndx, sc.lead_ndx, sc.lead_len, sc.rn 
    from splt_cte sc
         join stems_cte sp on sc.str_val=sp.stem)
select max(case when src.word='English' then substring(sc.string, src.lead_ndx, src.lead_len) else null end) English,
       max(case when src.word='Math' then substring(sc.string, src.lead_ndx, src.lead_len) else null end) Math,
       max(case when src.word='Science' then substring(sc.string, src.lead_ndx, src.lead_len) else null end) Science
from splt_cte sc
     join spec_rows_cte src on sc.rn=src.rn;

输出

English Math    Science
80      23      NULL

推荐阅读