首页 > 解决方案 > 在文本字段中选择单词 - SQL server

问题描述

我的数据有一个文本字段,我想在以下列表中选择关键字:'mr'、'jr'、'dr'、'ii' 这些词需要在前面和结尾有空格。所以使用下面的数据,输出应该是:

id|text|keyword1|keyword2|keyword3|keyword4
1, 'xxxx', 'jr','mr','ii'
2, 'xxxx','mr','',''

感谢您的帮助。HHC

Create TABLE have  (
    id   int,
    text varchar(225)   
);

Insert into have (id,text) values (1,'monday jr due date mr ii final');
Insert into have (id,text) values (2,'happy new year mr J');

标签: sqlsql-server

解决方案


首先获取Ngrams8K的副本。

接下来你可以这样做:

SELECT
  h.Id,
  h.[text],
  ng.Token,
  Keyword = ROW_NUMBER() OVER (PARTITION BY h.Id ORDER BY ng.Position)
FROM        dbo.have AS h
CROSS APPLY dbo.NGrams8k(h.[text], 4) AS ng
WHERE       ng.token IN (' mr ' , ' jr ', ' dr ', ' ii ');

回报:

Id   text                             Token   Keyword
---- -------------------------------- ----------------------------
1    monday jr due date mr ii final    jr     1
1    monday jr due date mr ii final    mr     2
1    monday jr due date mr ii final    ii     3
2    happy new year mr J               mr     1

一个简单的修改:

SELECT
  f.Id,
  f.[Text],
  Keyword1 = MAX(CASE f.Keyword WHEN 1 THEN f.Token ELSE '' END),
  Keyword2 = MAX(CASE f.Keyword WHEN 2 THEN f.Token ELSE '' END),
  Keyword3 = MAX(CASE f.Keyword WHEN 3 THEN f.Token ELSE '' END),
  Keyword4 = MAX(CASE f.Keyword WHEN 4 THEN f.Token ELSE '' END)
FROM
(
  SELECT h.Id, h.[text], ng.Token, Keyword = 
           ROW_NUMBER() OVER (PARTITION BY h.Id ORDER BY ng.Position)
  FROM        dbo.have AS h
  CROSS APPLY dbo.NGrams8k(h.[text], 4) AS ng
  WHERE       ng.token IN (' mr ' , ' jr ', ' dr ', ' ii ')
) AS f
GROUP BY f.Id, f.[Text]
ORDER BY f.Id;

回报:

Id   Text                            Keyword1       Keyword2     Keyword3      Keyword4
---- ------------------------------- -------------- ------------ ------------- ------------
1    monday jr due date mr ii final   jr             mr           ii           
2    happy new year mr J              mr                                       

推荐阅读