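elasticsearch - Elasticsearch deduplication
Problem description
I am using Elasticsearch 6.5.x and index documents into an ES index through an external web crawler. My index is shown below. I want to fetch records based on a filter. With the query below, a term query for https returns different results from a term query for http. Records 1 and 2 look identical; the only difference is https versus http in the URL. How can I compare records on the part of the URL field that follows the protocol? If two records carry the same information, how can I show just one of them together with the remaining unique records?
Index: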
"title": "About elastic search"
"content": "Elasticsearch is an open source distributed, RESTful search and analytics engine capable of solving a growing number of use cases."
"URL": "https://www.elastic.co/webinars/getting-started-elasticsearch"
"title":"About elastic search"
"content":"Elasticsearch is an open source distributed, RESTful search and analytics engine capable of solving a growing number of use cases."
"URL":"http://www.elastic.co/webinars/getting-started-elasticsearch"
"title":"About Similarity"
"content":"A similarity (scoring / ranking model) defines how matching documents are scored. Similarity is per field, meaning that via the mapping one can define a different similarity per field."
"URL":"https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules-similarity.html"
"title":"SQL Access"
"content":"This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features."
"URL":"http://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-sql.html"
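The first two records above differ only in the URL scheme. As a minimal sketch of the comparison the question asks about, and assuming the scheme is the only part of the URL that should be ignored, a scheme-independent key can be derived on the client side and used to keep one record per key. The helper names `url_key` and `dedupe_by_url` are hypothetical and not part of Elasticsearch.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Return the URL without its scheme (host lowercased, path and query kept)."""
    parts = urlsplit(url)
    key = parts.netloc.lower() + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


def dedupe_by_url(docs):
    """Keep the first document seen for each scheme-independent URL key."""
    seen = set()
    unique = []
    for doc in docs:
        key = url_key(doc["URL"])
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Titles and URLs copied from the sample index above (content omitted for brevity).
docs = [
    {"title": "About elastic search",
     "URL": "https://www.elastic.co/webinars/getting-started-elasticsearch"},
    {"title": "About elastic search",
     "URL": "http://www.elastic.co/webinars/getting-started-elasticsearch"},
    {"title": "About Similarity",
     "URL": "https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules-similarity.html"},
    {"title": "SQL Access",
     "URL": "http://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-sql.html"},
]

for doc in dedupe_by_url(docs):
    print(doc["title"], "->", url_key(doc["URL"]))
```

The same key could also be computed by the crawler and stored in its own field at index time, so that the comparison does not have to happen after every search.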
Query:
GET test-index/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "test"
        }
      },
      "filter": {
        "bool": {
          "must": {
            "term": { "url": "https" }
          }
        }
      }
    }
  }
}
Solution
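The page records no answer, so what follows is only one possible approach, not the thread's accepted solution. Elasticsearch's search API has a `collapse` option that returns one top hit per distinct value of a keyword or numeric field. The sketch below builds such a request body, assuming a hypothetical keyword field `url_key` that the crawler fills at index time with the URL minus its scheme. Neither that field nor the helper `build_dedup_search` exists in the original index; the scheme filter from the question is dropped so that both the http and https variants can compete for the single slot.

```python
import json


def build_dedup_search(query_text, collapse_field="url_key"):
    """Build a search body that returns one hit per distinct collapse_field value.

    collapse_field is an assumed keyword field holding the URL without
    its scheme; it must be populated when the documents are indexed.
    """
    return {
        "query": {
            "query_string": {"query": query_text}
        },
        # Field collapsing: keep only the best-scoring hit for each url_key value.
        "collapse": {"field": collapse_field},
    }


body = build_dedup_search("test")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of `GET test-index/_search`. Because collapsing needs a single-valued keyword or numeric field, it cannot be applied directly to an analyzed text field such as the original URL field.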