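elasticsearch - Adding new documents to a production Elasticsearch cluster
Problem description
My Elasticsearch cluster is used heavily for search queries. Once a week I receive a batch of new documents that need to be added to the index. If I add them directly to the index, search slows down considerably while the cluster indexes the documents and merges or relocates shards. What is the best way to avoid this slowdown?
My solution so far: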
1. Spin up an empty single-node Elasticsearch cluster.
2. Restore the index I need to update from a snapshot.
3. Add the new documents to this index.
4. Force-merge the shards.
5. Snapshot the resulting index.
6. Restore the updated index on the production cluster.
7. Update the aliases to point to the updated index and delete the old index.
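Steps 2 to 7 above can be sketched as a sequence of REST calls. This is only an illustrative outline, not a tested deployment script: it assumes both clusters can reach the same snapshot repository, that the restored index is renamed so it can sit next to the old one, and that index names contain no regex metacharacters. The repository, snapshot, index, and alias names are hypothetical, and the function only builds the requests rather than sending them.

```python
import json


def bulk_body(docs):
    """Serialize documents as NDJSON for the _bulk API (one action line per document)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # target index is taken from the URL path
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def build_update_plan(repo, base_snapshot, new_snapshot, old_index, new_index, alias, docs):
    """Return steps 2-7 as (cluster, method, path, body) tuples. Nothing is sent."""
    return [
        # Step 2: restore the current index on the staging node under a new name.
        ("staging", "POST", f"/_snapshot/{repo}/{base_snapshot}/_restore",
         {"indices": old_index,
          "rename_pattern": old_index,  # assumption: name has no regex metacharacters
          "rename_replacement": new_index}),
        # Step 3: add the weekly batch.
        ("staging", "POST", f"/{new_index}/_bulk", bulk_body(docs)),
        # Step 4: force-merge down to a single segment per shard.
        ("staging", "POST", f"/{new_index}/_forcemerge?max_num_segments=1", None),
        # Step 5: snapshot the updated index.
        ("staging", "PUT", f"/_snapshot/{repo}/{new_snapshot}?wait_for_completion=true",
         {"indices": new_index, "include_global_state": False}),
        # Step 6: restore the updated index on production.
        ("production", "POST", f"/_snapshot/{repo}/{new_snapshot}/_restore?wait_for_completion=true",
         {"indices": new_index, "include_global_state": False}),
        # Step 7a: move the alias in one atomic _aliases request.
        ("production", "POST", "/_aliases",
         {"actions": [
             {"remove": {"index": old_index, "alias": alias}},
             {"add": {"index": new_index, "alias": alias}},
         ]}),
        # Step 7b: delete the old index once nothing points at it.
        ("production", "DELETE", f"/{old_index}", None),
    ]


if __name__ == "__main__":
    # Hypothetical names, used only to print the call sequence.
    plan = build_update_plan("weekly_repo", "snap_1", "snap_2",
                             "docs_v1", "docs_v2", "docs", [{"title": "example"}])
    for cluster, method, path, _body in plan:
        print(cluster, method, path)
```

Putting the remove and add actions in a single `_aliases` request means searches against the alias never see a moment with no index behind it.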
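My thinking is that restoring from a snapshot should not consume many resources. The restored index may need to be warmed up to get good performance.
Is this a normal solution, or is it too complicated?
Perhaps Elasticsearch has a proper way to add documents without downtime or slowing down the cluster?
Solution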
With 500GB on one primary shard, I would clearly fix this before doing anything else. You have 10 nodes, so you need to spread the load over all of them. Adding nodes will not help at all, because a single shard always lives on a single node.
The official recommendation is to keep shards between roughly 10GB and 50GB. So in your case I would split that index into 10 primary shards (with 1 replica each), so that each node can handle part of the job. Otherwise there is always only one node doing the write work and two nodes doing the read work, which is far from optimal.
So before coming up with a way to work around the issue, fix the issue as described above. Your cluster will be much better off, because 10 nodes should definitely handle 5TB easily without having to resort to a complex update procedure like the one you listed.
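To make the sizing arithmetic concrete, here is a minimal sketch. It assumes a 50GB target shard size and rounds the result up to a multiple of the node count so primaries spread evenly, which is a choice made for this example rather than an Elasticsearch rule. It also uses reindexing into a freshly created index as one possible way to reach the new shard count; that is not necessarily what the answer means by "split". The index names are hypothetical, and the destination index would still need its mappings defined, since `_reindex` does not copy them.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb, node_count):
    """Smallest shard count that respects the size target and is a multiple of node_count."""
    by_size = math.ceil(index_size_gb / target_shard_gb)
    return math.ceil(by_size / node_count) * node_count


def reshard_requests(source_index, dest_index, primaries, replicas=1):
    """Build (method, path, body) tuples: create the new index, then copy the data."""
    return [
        ("PUT", f"/{dest_index}",
         {"settings": {"index": {"number_of_shards": primaries,
                                 "number_of_replicas": replicas}}}),
        ("POST", "/_reindex",
         {"source": {"index": source_index}, "dest": {"index": dest_index}}),
    ]


if __name__ == "__main__":
    # 500GB index, 50GB target per shard, 10 nodes (figures from the answer).
    primaries = primary_shard_count(500, 50, 10)
    for method, path, body in reshard_requests("docs_v1", "docs_v2", primaries):
        print(method, path, body)
```

With 10 primaries and 1 replica each there are 20 shard copies in total, so each of the 10 nodes can hold two of them.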
Try it out...