Flush data from Historical Node Memory to Deep Storage

Problem Description

I had initially set up a Druid cluster with 2 historical nodes (30 GB of memory each), 2 MiddleManager nodes, 1 node running the Coordinator and Overlord, and 1 Broker node.

After running it successfully for 3-4 weeks, I noticed that my tasks stayed in the running state even after the window period had passed. I then added one more historical node with the same configuration, and my tasks started working fine again. This meant that all the data ingested into Druid was going into memory, and that I would have to keep adding historical nodes.

Is there a way to flush some of the data from memory to deep storage, so that it gets loaded back into memory only when a query is fired against that set of data? Each of my historical nodes has 30 GB of RAM. Configs:

druid.processing.buffer.sizeBytes=1073741824
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":32212254720}]
druid.port=7080
druid.service=druid/historical
druid.server.maxSize=100000000000
druid.server.http.numThreads=50
druid.processing.numThreads=5
druid.query.groupBy.maxResults=10000000
druid.query.groupBy.maxOnDiskStorage=10737418240
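(Editor's note: as I understand the Druid documentation, druid.server.maxSize should not exceed the sum of the maxSize values in druid.segmentCache.locations, since the Coordinator assigns segments based on the former while the latter caps how much can actually sit on the historical node's disk. A consistent pairing of the two settings above, with values kept purely for illustration, would look roughly like this:

# Local segment cache, capped at ~30 GiB
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":32212254720}]
# Advertised capacity kept no larger than the total segment-cache size
druid.server.maxSize=32212254720
)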

Tags: druid

Solution


As mentioned in the question, my problem was that I had to bring up a new node every few days and did not know why. The root cause was the disk space on each historical node. Essentially, even though Druid pushes data to deep storage, it also keeps a local copy of all of it on the historical nodes. So, across all historical nodes, you can only store an amount of data equal to the sum of the druid.server.maxSize settings. If you don't want to scale out, you can add disk to the historical nodes, increase the value of this setting, and restart the historical nodes.
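As a rough sketch of that fix (the sizes below are assumptions for illustration, not values from the original post): after attaching a larger disk to each historical node, you would raise both the segment-cache cap and the advertised capacity in the historical's runtime.properties, then restart the process.

# historical runtime.properties — enlarged local segment cache (example: ~200 GiB per node)
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":214748364800}]
# Advertise the new capacity so the Coordinator can assign more segments to this node
druid.server.maxSize=214748364800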

