Cloudera Manager - Service Monitor crashes

Problem description

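I have been using Cloudera Manager for the past year, running 20+ nodes. Recently I started seeing heap size problems in the Service Monitor role. I increased its heap from 3 GB to 4 GB, then to 5 GB, and then to 6 GB. However, the Service Monitor still crashes and restarts from time to time, and while this happens the whole dashboard looks unhealthy. What do I need to do to fix this?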

The logs are:

2021-04-26 16:10:34,938 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20583ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=182ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20877ms
2021-04-26 16:11:34,862 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19870ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=131ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20228ms
2021-04-26 16:12:35,132 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20427ms: GC pool 'G1 Young Generation' had collection(s): count=3 time=149ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20733ms
2021-04-26 16:13:36,415 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19008ms: GC pool 'G1 Young Generation' had collection(s): count=1 time=104ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=19381ms
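To see where these pauses come from, the JvmPauseMonitor warnings can be tallied with a short script. This is a minimal sketch, not a Cloudera tool: the regular expression is written only against the log lines quoted above and assumes the young-generation pool is reported before the old-generation pool, as it is here. The names `parse_pauses`, `summarize` and `SAMPLE_LOG` are hypothetical. Run against the quoted lines, it shows that almost all of each roughly 20-second pause is spent in a single G1 Old Generation collection, which is consistent with a heap that is too small for what the Service Monitor keeps in memory.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed format: based only on the JvmPauseMonitor lines quoted in the question.
PAUSE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"com\.cloudera\.enterprise\.debug\.JvmPauseMonitor: .*?"
    r"paused approximately (?P<paused>\d+)ms: "
    r"GC pool 'G1 Young Generation' had collection\(s\): "
    r"count=(?P<young_count>\d+) time=(?P<young_ms>\d+)ms, "
    r"GC pool 'G1 Old Generation' had collection\(s\): "
    r"count=(?P<old_count>\d+) time=(?P<old_ms>\d+)ms"
)


@dataclass
class Pause:
    timestamp: str
    paused_ms: int  # total pause reported by the monitor
    young_ms: int   # time spent in G1 Young Generation collections
    old_ms: int     # time spent in G1 Old Generation collections


def parse_pauses(log_text: str) -> List[Pause]:
    """Extract every pause warning, whether entries are on separate lines or run together."""
    return [
        Pause(m["ts"], int(m["paused"]), int(m["young_ms"]), int(m["old_ms"]))
        for m in PAUSE_RE.finditer(log_text)
    ]


def summarize(pauses: List[Pause]) -> Dict[str, float]:
    """Count pauses, find the longest one, and compute the old-gen share of GC time."""
    young = sum(p.young_ms for p in pauses)
    old = sum(p.old_ms for p in pauses)
    total = young + old
    return {
        "count": len(pauses),
        "max_paused_ms": max((p.paused_ms for p in pauses), default=0),
        "old_gen_share": old / total if total else 0.0,
    }


# First two warnings from the log above.
SAMPLE_LOG = (
    "2021-04-26 16:10:34,938 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): "
    "paused approximately 20583ms: GC pool 'G1 Young Generation' had collection(s): "
    "count=2 time=182ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20877ms\n"
    "2021-04-26 16:11:34,862 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): "
    "paused approximately 19870ms: GC pool 'G1 Young Generation' had collection(s): "
    "count=2 time=131ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20228ms\n"
)

if __name__ == "__main__":
    s = summarize(parse_pauses(SAMPLE_LOG))
    # prints "2 pauses, longest 20583 ms, 99.2% of GC time in G1 Old Generation"
    print(f"{s['count']} pauses, longest {s['max_paused_ms']} ms, "
          f"{s['old_gen_share']:.1%} of GC time in G1 Old Generation")
```

On a real host you would feed it the Service Monitor log file instead of `SAMPLE_LOG`. If other pause lines list the GC pools in a different order or omit one, the pattern will skip them and would need adjusting.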

Can you help me solve this?

Tags: cloudera, cloudera-manager

Solution


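The Service Monitor can need considerably more memory depending on the number of hosts in the cluster, the types of services running, and the number of entities it is currently monitoring. Explicit sizing guidelines based on these factors are given here.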

You may need to keep increasing the heap size as your cluster usage grows. The same page also offers some tuning tips, such as using G1GC.

HBase, Solr, Kafka, and Kudu are known to generate a large number of entities, which increases the Service Monitor's heap requirements.

If you have a Cloudera Support subscription, open a case to get official help from their experts.

