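apache-kafka - Kafka producer/consumer opens too many file descriptors
Problem description
We have a 3-node Kafka cluster deployment with 5 topics and 6 partitions per topic. The replication factor is configured as 3. We are seeing a very strange problem: the number of file descriptors exceeds the ulimit (50K for our application).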
According to the lsof output and our analysis:
1. There are 15K established connections from the Kafka producer/consumer to the brokers, and in the thread dump we observed thousands of entries for the Kafka 'admin-client-network-thread':
"admin-client-network-thread" #224398 daemon prio=5 os_prio=0 tid=0x00007f12ca119800 nid=0x5363 runnable [0x00007f12c4db8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000005e0603238> (a sun.nio.ch.Util$3)
- locked <0x00000005e0603228> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000005e0602f08> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:672)
at org.apache.kafka.common.network.Selector.poll(Selector.java:396)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
- locked <0x00000005e0602dc0> (a org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:205)
at kafka.admin.AdminClient$$anon$1.run(AdminClient.scala:61)
at java.lang.Thread.run(Thread.java:748)
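To see which thread names dominate without reading a full dump by hand, the live threads can be grouped by name. The following is a minimal sketch, assuming it runs inside the affected JVM (for example from a diagnostic hook); the class name `ThreadNameHistogram` and the suffix-stripping rule are illustrative choices, not part of Kafka.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadNameHistogram {
    /**
     * Counts live threads in this JVM by name family. A trailing numeric
     * suffix such as "-12" or "#12" is removed so that numbered threads
     * fall into the same bucket.
     */
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String family = t.getName().replaceAll("[-#]?\\d+$", "");
            counts.merge(family, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per thread-name family: count, then name.
        histogram().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

A family whose count keeps growing between two runs is a likely candidate for a thread leak.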
2. According to the lsof output, we observed 35K entries for pipes and event polls:
java 5441 app 374r FIFO 0,9 0t0 22415240 pipe
java 5441 app 375w FIFO 0,9 0t0 22415240 pipe
java 5441 app 376u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 377r FIFO 0,9 0t0 22473333 pipe
java 5441 app 378r FIFO 0,9 0t0 28054726 pipe
java 5441 app 379r FIFO 0,9 0t0 22415241 pipe
java 5441 app 380w FIFO 0,9 0t0 22415241 pipe
java 5441 app 381u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 382w FIFO 0,9 0t0 22473333 pipe
java 5441 app 383u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 384u a_inode 0,10 0 6379 [eventpoll]
java 5441 app 385r FIFO 0,9 0t0 40216087 pipe
java 5441 app 386r FIFO 0,9 0t0 22483470 pipe
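The same breakdown can be tracked over time by reading `/proc/<pid>/fd` directly. This is a rough sketch, assuming a Linux host and permission to read the target process's descriptor directory; the class name `FdSummary` and the bucket labels are illustrative. Epoll descriptors appear there as `anon_inode:[eventpoll]`, which lsof lists as `a_inode ... [eventpoll]`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class FdSummary {
    /** Groups the open descriptors of a process ("self" or a numeric pid) by kind. */
    static Map<String, Integer> summarize(String pid) {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed between listing and reading
                }
                counts.merge(kindOf(target), 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Maps a link target such as "pipe:[22415240]" to a coarse bucket name. */
    static String kindOf(String target) {
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("anon_inode:")) return target; // e.g. anon_inode:[eventpoll]
        return "file";
    }

    public static void main(String[] args) {
        String pid = args.length > 0 ? args[0] : "self";
        summarize(pid).forEach((kind, n) -> System.out.println(n + "\t" + kind));
    }
}
```

Running it periodically against the application's pid shows whether the pipe and eventpoll buckets grow together, which would point at leaked NIO selectors rather than leaked sockets or files.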
Setup details:
Apache Kafka client: 1.0.1
Kafka version: 1.0.1
OpenJDK: java-1.8.0-openjdk-1.8.0.222.b10-1
CentOS version: CentOS Linux release 7.6.1810
Note: After restarting the VM, the file descriptor count cleared and returned to a normal count of 1000.
A few seconds later the count started to increase again, and it reaches the 50K limit after
1 week, even in idle scenarios.
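Solution
This problem is caused by using the deprecated kafka.admin.AdminClient API. Instead, org.apache.kafka.clients.admin.AdminClient can be used to fetch similar information from Kafka. This API has equivalent methods and provides the same functionality as the legacy API.
With the legacy API (kafka.admin.AdminClient), many daemon threads named "admin-client-network-thread" were observed in the thread dump. For some reason the admin client network thread is not managed well in the legacy API: a new "admin-client-network-thread" daemon thread is created for every call, and none of them ever terminates. As a result, a large number of file descriptors is observed at both the process and system level.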
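The leak pattern described above, and the effect of closing each client, can be illustrated without a Kafka dependency. The sketch below is a stand-in only: `StandInClient`, `ClientLifecycleDemo`, and the thread name are hypothetical, and the class merely imitates a client that owns an NIO selector and a network thread. The newer org.apache.kafka.clients.admin.AdminClient is obtained with `AdminClient.create(...)` and implements `AutoCloseable`, so the same try-with-resources shape should apply to it, assuming the application creates clients on demand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

public class ClientLifecycleDemo {
    static final String THREAD_NAME = "stand-in-network-thread"; // hypothetical name

    /** Hypothetical stand-in for a client that owns a selector and a network thread. */
    static final class StandInClient implements AutoCloseable {
        private final Selector selector;
        private final Thread networkThread;

        StandInClient() {
            try {
                selector = Selector.open(); // holds OS descriptors until closed
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            networkThread = new Thread(this::pollLoop, THREAD_NAME);
            networkThread.setDaemon(true);
            networkThread.start();
        }

        private void pollLoop() {
            try {
                while (selector.isOpen()) {
                    selector.select(); // blocks, like the epollWait frame in the dump
                }
            } catch (IOException | ClosedSelectorException e) {
                // selector was closed: let the thread exit
            }
        }

        @Override
        public void close() {
            try {
                selector.close(); // wakes the blocked select() and frees descriptors
                networkThread.join();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Counts live stand-in network threads in this JVM. */
    static long liveNetworkThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> THREAD_NAME.equals(t.getName()))
                .count();
    }

    public static void main(String[] args) {
        // Leaky pattern: a new client per call, never closed.
        for (int i = 0; i < 5; i++) {
            new StandInClient();
        }
        System.out.println("after unclosed clients: " + liveNetworkThreads());

        // Bounded pattern: every client is closed when the call finishes.
        for (int i = 0; i < 5; i++) {
            try (StandInClient client = new StandInClient()) {
                // issue requests with the client here
            }
        }
        System.out.println("after closed clients: " + liveNetworkThreads());
    }
}
```

The unclosed clients keep their threads and selectors for the life of the process, while the clients used in try-with-resources leave nothing behind. Creating one client at startup and reusing it is another option that avoids per-call setup cost.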