apache-zookeeper - Error in Kafka: unable to reconnect to ZooKeeper, session 0x20000xxxxxxxx
Problem description
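We are running Confluent Kafka ( https://github.com/confluentinc/cp-helm-charts ) on Kubernetes (1.14.6). Our log retention is 30 minutes and we have 300 GB of storage. We have 4 brokers with a replication factor of 3, and throughput of about 65 MBps. Each Kafka broker has a 6 GB heap. After about an hour of running, we observe the following error.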
[2019-09-27 12:32:05,278] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error when sending leader epoch request for Map(RT-15-7 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-17-0 -> (currentLeaderEpoch=Optional[5], leaderEpoch=3), RT-19-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-02-0 -> (currentLeaderEpoch=Optional[5], leaderEpoch=3), RT-27-4 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-22-3 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-32-5 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-42-4 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-27-1 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), _confluent-controlcenter-5-2-0-1-MetricsAggregateStore-repartition-2 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-21-6 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), _confluent-controlcenter-5-2-0-1-metrics-trigger-measurement-rekey-3 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-30-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-06-6 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-controlcenter-5-2-0-1-expected-group-consumption-rekey-1 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-17-1 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-metrics-10 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-21-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-monitoring-9 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-17-9 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-02-9 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-20-1 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-30-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-12-0 -> (currentLeaderEpoch=Optional[8], leaderEpoch=6), RT-32-9 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-02-1 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-06-0 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), _confluent-monitoring-3 -> 
(currentLeaderEpoch=Optional[11], leaderEpoch=9), _confluent-metrics-7 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), _confluent-controlcenter-5-2-0-1-MetricsAggregateStore-changelog-0 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-27-5 -> (currentLeaderEpoch=Optional[9], leaderEpoch=7), RT-17-6 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), RT-32-3 -> (currentLeaderEpoch=Optional[7], leaderEpoch=5), RT-02-6 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), _confluent-monitoring-0 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9), RT-27-2 -> (currentLeaderEpoch=Optional[6], leaderEpoch=4), _confluent-controlcenter-5-2-0-1-actual-group-consumption-rekey-2 -> (currentLeaderEpoch=Optional[11], leaderEpoch=9)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 3 was disconnected before the response was read
at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:100)
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:107)
at kafka.server.ReplicaFetcherThread.fetchEpochEndOffsets(ReplicaFetcherThread.scala:310)
at kafka.server.AbstractFetcherThread.truncateToEpochEndOffsets(AbstractFetcherThread.scala:208)
at kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
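The rest of the configuration is left at the defaults. I am not sure what is causing ZooKeeper to close the socket connection. I can also see that all of my pods are healthy. Please let me know if I should add more information. Any debugging pointers are appreciated.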
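As a rough sanity check on the figures quoted above, the sketch below estimates how much data each broker would retain at steady state. It is only an estimate under assumptions the question does not confirm: that the 65 MBps is producer ingress before replication, that partitions are spread evenly over the 4 brokers, that 1 GB = 1000 MB, and that segments are deleted as soon as they pass the 30-minute retention (in practice only whole closed segments are removed, so real usage would be somewhat higher). The function name is hypothetical.

```python
def retained_gb_per_broker(throughput_mb_s, retention_s, replication_factor, brokers):
    """Estimate steady-state retained data per broker, in GB (1 GB = 1000 MB).

    Assumes throughput is pre-replication ingress and that replicas are
    evenly balanced across brokers; both are assumptions, not facts
    stated in the question.
    """
    unique_mb = throughput_mb_s * retention_s        # data written within the retention window
    replicated_mb = unique_mb * replication_factor   # every byte is stored once per replica
    return replicated_mb / brokers / 1000            # even spread across brokers, MB -> GB


# Figures from the question: 65 MBps, 30 min retention, RF 3, 4 brokers.
estimate = retained_gb_per_broker(65, 30 * 60, 3, 4)
print(f"{estimate:.2f} GB per broker")  # prints "87.75 GB per broker"
```

If the 300 GB of storage is per broker (the question does not say), this estimate would sit well below it, which would point away from disk exhaustion and toward the network or ZooKeeper session side when looking for the cause.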
Solution