elasticsearch - How to perform a JOIN efficiently, in real time, on two data sources produced by Elasticsearch?
Problem description
We have two very large, flat Elasticsearch indices: `variant` and `genotype`. Each document in the latter index, `genotype`, has a `variantId` field that links it to a document in the `variant` index. We need to perform a regular JOIN operation (for each `variant` document in a list, get all of its `genotype` documents as a list), but Elasticsearch cannot do this efficiently. So we are wondering whether such a JOIN could be performed outside ES by a separately run tool or engine (Kafka? Spark? We are not sure). We could issue two ES queries and feed their results into the engine, which would output the joined result. What would be the best tool for this? Could anyone point to resources on how this particular problem could be addressed?
Previously we tried simply storing `genotype` as a nested field in the `variant` documents, but because there are so many of both, and there is a hard limit on the number of nested fields, we had to abandon that idea.