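azure-cognitive-search - Azure Search - Cannot merge (using a skill) the data obtained from KeyPhraseExtractionSkill
Problem description
I am creating an indexer that takes documents, runs the KeyPhraseExtractionSkill, and writes its output back to the index.
For many documents this works out of the box. For records longer than 50,000 characters, however, it does not work. Fine, no problem; the documentation states this limit clearly.
The documentation suggests using the Text Split skill. So I used the Text Split skill to split the original document into pages and passed all the pages to the KeyPhraseExtractionSkill. The results then need to be merged back together, because we end up with an array of string arrays, one per page. Unfortunately, the Merge skill does not seem to accept an array of arrays, only a single array.
https://i.imgur.com/dBD4qgb.png <- link to the skillset hierarchy.
This is the error reported by Azure: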
Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:
What I ultimately want to achieve is to run the KeyPhraseExtractionSkill on texts longer than 50,000 characters and add the result back to the index.
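To make the type mismatch in the error concrete, here is a minimal Python sketch of the shape of the data. It is not Azure code: `split_into_pages` is a naive fixed-width stand-in for the Text Split skill, and `fake_key_phrases` is a hypothetical stand-in for KeyPhraseExtractionSkill. The only point is that running a per-page extractor yields one list per page, i.e. a list of lists rather than a flat 'StringCollection'.

```python
from typing import List


def split_into_pages(text: str, max_len: int) -> List[str]:
    """Naive fixed-width chunking (assumption: the real skill is smarter about boundaries)."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def fake_key_phrases(page: str) -> List[str]:
    """Hypothetical extractor: treats every word longer than 5 characters as a key phrase."""
    return [word for word in page.split() if len(word) > 5]


def enrich(text: str, max_len: int) -> List[List[str]]:
    """Run the extractor once per page, producing one list of phrases per page."""
    return [fake_key_phrases(page) for page in split_into_pages(text, max_len)]
```

Calling `enrich("skillset indexer", 8)` returns two inner lists, one per page, which is the nested shape the Merge skill rejects.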
Skillset JSON
{
  "@odata.context": "https://-----------.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8D957466A2C1E47\"",
  "name": "devalbertcollectionfilesskillset2",
  "description": null,
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "SplitSkill",
      "description": null,
      "context": "/document/content",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1000,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
      "name": "EntityRecognitionSkill",
      "description": null,
      "context": "/document/content/pages/*",
      "categories": [
        "person",
        "quantity",
        "organization",
        "url",
        "email",
        "location",
        "datetime"
      ],
      "defaultLanguageCode": "en",
      "minimumPrecision": null,
      "includeTypelessEntities": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "persons",
          "targetName": "people"
        },
        {
          "name": "organizations",
          "targetName": "organizations"
        },
        {
          "name": "entities",
          "targetName": "entities"
        },
        {
          "name": "locations",
          "targetName": "locations"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "KeyPhraseExtractionSkill",
      "description": null,
      "context": "/document/content/pages/*",
      "defaultLanguageCode": "en",
      "maxKeyPhraseCount": null,
      "modelVersion": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "keyPhrases",
          "targetName": "keyPhrases"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "name": "Merge Skill - keyPhrases",
      "description": null,
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name": "itemsToInsert",
          "source": "/document/content/pages/*/keyPhrases"
        }
      ],
      "outputs": [
        {
          "name": "mergedText",
          "targetName": "keyPhrases"
        }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "key": "------",
    "description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
  },
  "knowledgeStore": null,
  "encryptionKey": null
}
Please let me know if there is anything else that I can add to improve the question. Thanks!
[1]: https://i.stack.imgur.com/GNf7F.png
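Solution
You don't have to merge the key phrase outputs in order to insert them into the index.
Assuming your index already has a field named mykeyphrases of type Collection(Edm.String), you can populate it with the key phrase outputs by adding this indexer output field mapping: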
"outputFieldMappings": [
  ...
  {
    "sourceFieldName": "/document/content/pages/*/keyPhrases/*",
    "targetFieldName": "mykeyphrases"
  },
  ...
]
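The /* at the end of sourceFieldName is important: it flattens the array of arrays of strings. The same path also works as a skill input if you want to pass the flattened array of strings to another skill for further enrichment.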
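As a rough mental model of why the trailing /* matters, the Python sketch below resolves a slash-separated path against a nested dictionary. This is an illustration only, not how Azure implements path resolution: the `resolve` helper and the simplified tree (page text omitted, each page reduced to a dict holding its `keyPhrases`) are assumptions made for the example.

```python
from typing import Any, List


def resolve(node: Any, path: str) -> List[Any]:
    """Return every node matched by the path; a '*' segment enumerates a list."""
    matches = [node]
    for part in path.strip("/").split("/"):
        next_matches: List[Any] = []
        for match in matches:
            if part == "*":
                next_matches.extend(match)        # step into each list element
            else:
                next_matches.append(match[part])  # step into a named child
        matches = next_matches
    return matches


# Simplified, hypothetical enrichment tree with two pages.
tree = {
    "document": {
        "content": {
            "pages": [
                {"keyPhrases": ["azure search", "indexer"]},
                {"keyPhrases": ["skillset"]},
            ]
        }
    }
}

nested = resolve(tree, "/document/content/pages/*/keyPhrases")
flat = resolve(tree, "/document/content/pages/*/keyPhrases/*")
```

In this model `nested` holds one list per page, while `flat` is a single list of strings, which is the shape a Collection(Edm.String) field expects.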