druid - Flush data from Historical Node Memory to Deep Storage
Problem
I had initially set up a Druid cluster with two historical nodes (30 GB memory each), two middle manager nodes, one node running the coordinator and overlord, and one broker node.
After running it successfully for 3-4 weeks, I noticed that my tasks were staying in the running state even after the window period. I then added one more historical node with the same configuration, and my tasks started working fine again. This suggested that all the data ingested into Druid is held on the historical nodes, and that I would have to keep adding historical nodes.
Is there a way to flush some of the data to deep storage only, so that it gets loaded back whenever a query is fired against that set of data? Each of my historical nodes has 30 GB RAM. Configs:
druid.processing.buffer.sizeBytes=1073741824
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":32212254720}]
druid.port=7080
druid.service=druid/historical
druid.server.maxSize=100000000000
druid.server.http.numThreads=50
druid.processing.numThreads=5
druid.query.groupBy.maxResults=10000000
druid.query.groupBy.maxOnDiskStorage=10737418240
Solution
As mentioned in the question, my problem was that I had to spin up a new node every few days without knowing why. The root cause was disk space on each historical node. Essentially, even though Druid pushes data to deep storage, it also keeps a local copy of every segment on the historical nodes. So across the cluster you can only serve as much data as the sum of the `druid.server.maxSize` settings of all historical nodes. If you don't want to scale out horizontally, you can add disk to the historical nodes, increase this setting, and restart them.
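As a sketch, the two settings to grow together on each historical node are `druid.server.maxSize` and the segment-cache capacity. The sizes below are illustrative assumptions for a node with a roughly 500 GB data disk, not values from the original setup:

```properties
# Illustrative values for a historical node's runtime.properties,
# assuming ~500 GB of local disk available for segments.

# Total bytes of segments this historical announces it can serve.
druid.server.maxSize=450000000000

# Local segment cache; its maxSize should be at least druid.server.maxSize,
# since every served segment is kept on local disk.
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":450000000000}]
```

Note that in the question's config, `druid.server.maxSize` (100 GB) exceeds the segment-cache `maxSize` (about 30 GB); keeping the cache capacity at least as large as `druid.server.maxSize` avoids the node advertising more capacity than its disk can actually hold.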