Getting NotLeaderForPartitionException for a very long time

Problem description

I have a 3-node Kafka cluster. One of the nodes suddenly went down, and I started seeing NotLeaderForPartitionException in my application logs when sending messages to one of the topics. For some of the other topics, however, I am still able to post and consume messages.

The problem persists until all the Kafka servers are restarted; after the restart everything works again.

Now, my question is: why is a new leader not elected for those topics, so that the same NotLeaderForPartitionException keeps being thrown, and how can I get a new leader election to happen for these topics?

Exception Trace:

2020-04-11 22:05:21,747 ERROR [pool-15-thread-297] [KafkaMessageProducer:92] Message send failed:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)

Tags: java, apache-kafka

Solution


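Produce and Fetch requests are both sent to the leader replica of a partition. NotLeaderForPartitionException is thrown when a request is sent to a broker that is no longer the leader replica for that partition.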

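The client maintains information about the leader of each partition as a cache. The full cache-management process is shown in the figure below.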

[Figure: client-side cache management of partition leader metadata]
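The caching behaviour described above can be sketched as a toy model. This is not Kafka's client code: the class, the maps, and the methods `brokerAccepts`, `refreshMetadata`, and `send` are all hypothetical names chosen for illustration, and the "cluster" is just an in-memory map. It only aims to show why a stale cache produces a "not leader" error and how a metadata refresh followed by a retry resolves it once a new leader exists.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderCacheSketch {
    // Hypothetical stand-in for the cluster's true state: partition -> current leader broker id.
    static final Map<String, Integer> clusterLeaders = new HashMap<>();
    // The client's cached view, which can go stale after a leader change.
    static final Map<String, Integer> cachedLeaders = new HashMap<>();

    // Simulates a broker handling a produce request: only the current leader accepts it.
    static boolean brokerAccepts(int brokerId, String partition) {
        return clusterLeaders.get(partition) == brokerId;
    }

    // Simulates a metadata refresh: copy the cluster's view into the client cache.
    static void refreshMetadata() {
        cachedLeaders.clear();
        cachedLeaders.putAll(clusterLeaders);
    }

    // Sends to the cached leader; on a "not leader" response, refreshes the cache and retries.
    // Returns the number of attempts used, or -1 if every attempt was rejected.
    static int send(String partition, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int target = cachedLeaders.get(partition);
            if (brokerAccepts(target, partition)) {
                return attempt;
            }
            refreshMetadata();
        }
        return -1;
    }

    public static void main(String[] args) {
        clusterLeaders.put("orders-0", 1);
        refreshMetadata();                 // client caches broker 1 as leader
        clusterLeaders.put("orders-0", 2); // broker 1 goes down, leadership moves to broker 2
        System.out.println("attempts: " + send("orders-0", 3)); // prints "attempts: 2"
    }
}
```

In this model the first attempt goes to the stale leader and is rejected, and the second attempt succeeds after the refresh. If no new leader were ever elected, refreshing would not help and every attempt would keep failing, which resembles the symptom described in the question.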

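The client has to refresh this information, which is governed by the metadata.max.age.ms setting in the producer configuration. The default value of this setting is 300000 ms (5 minutes).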
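As a minimal sketch of how this setting might be applied, the following builds a producer configuration with a shorter metadata refresh interval. The concrete values (30000 ms, 5 retries, 500 ms backoff) and the broker addresses are illustrative assumptions, not recommendations from the original answer. The `retries` and `retry.backoff.ms` keys are included because NotLeaderForPartitionException is a retriable error. The class name `ProducerMetadataConfig` is hypothetical.

```java
import java.util.Properties;

public class ProducerMetadataConfig {
    // Builds producer settings; bootstrapServers is a placeholder supplied by the caller.
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Refresh cached metadata at least every 30 s instead of the default 300000 ms (assumed value).
        props.put("metadata.max.age.ms", "30000");
        // Retry sends that fail with retriable errors, waiting between attempts (assumed values).
        props.put("retries", "5");
        props.put("retry.backoff.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        // Broker addresses are hypothetical.
        Properties props = build("broker1:9092,broker2:9092,broker3:9092");
        // With kafka-clients on the classpath, these properties would be passed to
        // new KafkaProducer<String, String>(props).
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A shorter refresh interval only helps the client notice a new leader sooner; it does not cause the brokers to elect one.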

You can browse the following Apache Kafka documentation.

https://kafka.apache.org/documentation/

