首页 > 解决方案 > 按来自 nvarchar(MAX) 列的订单 ID 过滤是否会受益于全文搜索?

问题描述

在我的表中,我有一nvarchar(MAX)列以 XML 格式保存有关我们订单的数据。另一个部门通常必须通过在该列中查找某个订单 ID 来从表中获取信息,但我们都需要能够查找该列中文本的任何部分。

问题是订单 ID 和其他过滤器/搜索条件对于每个订单都是唯一的,不是我可以在停用词或停用词列表中定义的词,并且会为新订单生成新的词。

由于表的大小正在增长,使用LIKE子句查询需要很长时间,但我无法想象我的同事如何获取有关订单的信息,除非通过查找订单 ID、零件序列号或一段文本等信息。我们无法控制 XML 订单数据,而且它没有标准化的结构,也就是说每种订单类型都有自己的 XML 结构,而且数量太多无法分成不同的表,我们无法提取单独的元素进入不同的列,因为我们无法控制处理,只能控制数据的存储。

由于在这种情况下索引将是徒劳的,我开始阅读全文搜索,但由于我提到的原因,这会有所帮助not吗?是否有更好的选择?

编辑:

添加存储在 nvarchar(MAX) 列中的示例消息(目前此处未显示大约 23 种其他类型):

<?xml version='1.0' encoding='UTF-8'?>
<ServiceRequest>
    <OrderID>GT123456789123465</OrderID>
    <CreatedBy>some.person</CreatedBy>
    <CreationTime>2021-08-18</CreationTime>
    <CustomerReference>INC123456</CustomerReference>
    <CustomerContract>123456</CustomerContract>
    <ShortDescription>A little bit of text</ShortDescription>
    <StockLocation>Text-Identifier</StockLocation>
    <DueDate>2021-08-19</DueDate>
    <ExpectedDeliveryDate>2021-08-19T10:30:00</ExpectedDeliveryDate>
    <ServiceLevel>Same-Day</ServiceLevel>
    <DeliveryLocation>SITE</DeliveryLocation>
    <Site>Address of a building or something</Site>
    <ContactName>Name of a person</ContactName>
    <ContactPhone>0123456789</ContactPhone>
    <DocTypeID>123</DocTypeID>
    <DeliveryAddress>
        <Address1>Address1</Address1>
        <Address2>More details about address</Address2>
        <City>Some city</City>
        <Postcode>LOL KEK</Postcode>
        <Country>Narnia</Country>
    </DeliveryAddress>
    <Parts>
        <Part>
            <UniqueID>168468468</UniqueID>
            <PartNumber>#ABCDE-1234-ABCDE</PartNumber>
            <Description>Example TV set with model name</Description>
            <Quantity>1</Quantity>
            <Returnable>Y</Returnable>
            <ReturnInformation>TBC</ReturnInformation>
            <LinkedDemandId>123456789</LinkedDemandId>
        </Part>
    </Parts>
</ServiceRequest>

.

<?xml version='1.0' encoding='UTF-8'?>
<Engineer>
    <PersonId>some.guy</PersonId>
    <SearchName>Some Guy</SearchName>
    <personemail>some.guy@company.com</PersonEmail>
    <PersonPhone>00000000000</PersonPhone>
    <Address>
        <Zip>LOL KEK</Zip>
        <CountryId>US</CountryId>
    </Address>
</Engineer>

.

<WarehouseUpdate>
    <WarehouseId>Warehouse-name-123456</WarehouseId>
    <WarehouseDescription>Friendlier WH name</WarehouseDescription>
    <CurrencyId>USD</CurrencyId>
    <CostDomainId>MAIN</CostDomainId>
    <NodeId>SSL</NodeId>
    <SupplySource>W</SupplySource>
    <ReturnWarehouse>Warehouse-ID</ReturnWarehouse>
    <IsAutoReceive>Y</IsAutoReceive>
    <IsRepairWhse>N</IsRepairWhse>
    <IsReplenishable>Y</IsReplenishable>
    <SupplyWarehouse>Warehouse-ID</SupplyWarehouse>
    <WarehouseTypeId>ABC</WarehouseTypeId>
    <Address>
        <Zip>LOL KEK</Zip>
        <CountryId>US</CountryId>
    </Address>
</WarehouseUpdate>

.

<briefing>
    <incident>
        <serviceProvider>A long name of a company that provides a service</serviceProvider>
        <receiverURL>https://www.google.com/some/url/that/is/used</receiverURL>
        <incidentNumber>123456789</incidentNumber>
        <taskNumber>123456789</taskNumber>
        <taskAssignmentID>123456789</taskAssignmentID>
        <taskCreationDate>2021-02-31</taskCreationDate>
        <taskCreationTime>10:32:13</taskCreationTime>
        <sendDate>2021-02-31</sendDate>
        <sendTime>10:32:27</sendTime>
        <request>ABC</request>
        <urgency>ABC</urgency>
        <severity>D</severity>
        <customerName>Name of a company</customerName>
        <helpdeskNumber>123456789</helpdeskNumber>
        <originalCustomerReference>123456789</originalCustomerReference>
        <projectNumber/>
        <project/>
        <callerFirstName>Some</callerFirstName>
        <callerLastName>Person</callerLastName>
        <callerPhone>123456789123456789</callerPhone>
        <callerPhoneType>ABC</callerPhoneType>
        <callerEmailaddress/>
        <callerPreferredLanguage/>
        <communicationPreference/>
        <installedAtAddress1>Address short</installedAtAddress1>
        <installedAtAddress2/>
        <installedAtAddress3>More address details</installedAtAddress3>
        <installedAtAddress4>Even more address details</installedAtAddress4>
        <installedAtCity>Washington</installedAtCity>
        <installedAtState/>
        <installedAtProvince/>
        <installedAtPostalCode>LOL KEK</installedAtPostalCode>
        <installedAtCountry>US</installedAtCountry>
        <installedAtPhone/>
        <installedAtFax/>
        <installedAtEmail/>
        <productSerialNumber>ABCDF123456789</productSerialNumber>
        <productTag>ABCDF123456789</productTag>
        <productSystem>123456</productSystem>
        <productItemNumber>123456789123456789</productItemNumber>
        <productItemDescription>This is a thingy</productItemDescription>
        <productComponentNumber/>
        <productComponentDescription/>
        <productServiceGroupNumber>12</productServiceGroupNumber>
        <customerSerialNumber/>
        <defectDescription>ABCD::ABCD::ABCD EF::ABCD EF</defectDescription>
        <orderDescription>Lorem ipsum</orderDescription>
        <taskType>In summet idit</taskType>
        <customerErrorCode/>
        <problemCode>corpsem mepsem dopsem</problemCode>
        <resolutionSummary/>
        <resolutionCode/>
        <reporteddate>2021-05-12</reporteddate>
        <reportedtime>10:30:44</reportedtime>
        <customerTimezone>EET</customerTimezone>
        <coverage>-ABC-EF-</coverage>
        <contractServiceNumber>123456789123456789</contractServiceNumber>
        <plannedStartDate>2021-05-12</plannedStartDate>
        <plannedStartTime>10:32:13</plannedStartTime>
        <plannedEndDate>2021-05-14</plannedEndDate>
        <plannedEndTime>10:32:44</plannedEndTime>
        <chargeableFlag/>
        <vkOrg>12AB</vkOrg>
        <attribute1/>
        <attribute2/>
        <attribute3>AB</attribute3>
        <attribute4/>
        <attribute5/>
        <attribute6/>
        <attribute7/>
        <attribute8/>
        <attribute9/>
    </incident>
    <incidentNotes>
        <technicianNote>Went to lunch, bought a snack</technicianNote>
        <technicianNote>AB    11.11  11/11/2021
            A huge block of text incoming
            Here a monumental, monolithic, megalithic block of text resides that has numerous details written in freeform however the person doing the thing decides
            There is some structure to the text but it is completely specific to this one type of XML message.
            
            Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla eu feugiat enim. Praesent malesuada, diam ut ornare tristique, ipsum dolor rutrum enim, et sodales lectus mi at ex. Nulla in varius nisl. Nunc non enim augue. Integer condimentum tempor lacus, non maximus tortor dictum a. Donec dapibus urna nulla, ac tempus justo sodales sed. Cras lacinia tempus lacinia. Sed fermentum libero vel lectus ornare, at egestas eros dignissim. Quisque ac vehicula erat. Morbi id ultrices sem, auctor dapibus ligula. Vivamus vestibulum consectetur ligula non viverra. Proin id mi non ipsum consectetur interdum. Aenean id posuere metus.

            Aenean diam justo, ultrices sed cursus eget, posuere eget justo. Maecenas egestas mi et rutrum auctor. Fusce tincidunt ac purus ut gravida. Proin ac condimentum nibh, id venenatis nibh. Sed eu turpis non sem venenatis posuere eu sit amet leo. Vivamus velit lacus, tempor quis dolor vel, sagittis vulputate risus. Praesent dignissim sed turpis vel porta. Duis elit ante, pellentesque sit amet nisl eu, ullamcorper varius ex. Aenean tortor ligula, posuere sed tempor eget, consequat ut orci. Vestibulum eu aliquet ante. Ut eros ex, dignissim nec accumsan eu, posuere nec ligula. Cras tempor volutpat tempor. Duis vitae dui sit amet diam porttitor viverra. Aliquam ornare, turpis ut pulvinar bibendum, urna eros sodales turpis, sed malesuada felis massa in neque. Etiam venenatis volutpat diam eget placerat. Integer ultrices vulputate neque ut ullamcorper.
        </technicianNote>
    </incidentNotes>
    <ibaseNotes>
        <serialnumberNote/>
    </ibaseNotes>
</briefing>

另外我想我无意中欺骗了你,最常见的订单 ID 正在被搜索,但我们需要能够通过 XML 消息的任何部分进行搜索/过滤,例如我需要能够找到所有包含“some.人”中。我会回去编辑它。

标签: sqlsql-serverperformancefull-text-search

解决方案


理想情况下,您应该将 XML 存储在一个xml列中。然后您可以使用 XQuery 进行搜索。列上的 XML 索引可能是明智的。

但是您也可以将数据转换为 XML(以降低效率为代价),像这样

CROSS APPLY (VALUES (
    CAST(CAST(yourData AS varchar(max)) AS xml)
) ) v(xmlData)

强制转换varchar是必要的,因为 XML 是使用 UTF-8 定义的。

然后您可以使用 XQuery,例如这样

WHERE v.xmlValue.exist('/Engineer/PersonId[contains(text()[1], "some.guy")]') = 1
   or v.xmlValue.exist('/Engineer/PersonEmail[contains(text()[1], "some.guy")] ') = 1

或者通过一个 SQL 变量

WHERE v.xmlValue.exist('/Engineer/PersonId[contains(text()[1], sql:variable("@toSearch"))]') = 1
   or v.xmlValue.exist('/Engineer/PersonEmail[contains(text()[1], sql:variable("@toSearch"))] ') = 1

SQL小提琴

要在任何节点中查找文本,您可以使用//*这意味着“下降任何深度,匹配任何节点名称”

WHERE v.xmlValue.exist('//*[contains(text()[1], sql:variable("@toSearch"))]') = 1

但这可能效率低下。


推荐阅读