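docker - Ignite TcpDiscoverySpi fails with a critical system error due to a SocketTimeout from the accept loop for "ServerSocket [addr=0.0.0.0/0.0.0.0.."
Problem description
With Ignite 2.7.6, I am starting an embedded Ignite server node (inside a Spring Boot application) with a simple configuration on a Docker bridge network. Server startup fails with the following error: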
[10:16:16] Ignite node started OK (id=e7276b83)
[10:16:16] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
[10:16:16] Topology snapshot [ver=1, locNode=e7276b83, servers=1, clients=0, state=INACTIVE, CPUs=1, offheap=0.1GB, heap=0.4GB]
mediation-service - [INFO ] 10:16:16.981 [main] com.**.**.perfmon.common.spring.EmbeddedIgnite - ====>>> Activating Ignite Cluster
mediation-service - [WARN ] 10:16:17.383 [exchange-worker-#49] org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager - Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
[10:16:17] Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
mediation-service - [ERROR] 10:16:21.982 [tcp-disco-srvr-#3] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5845)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:5763)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-19887] sun.rmi.transport.tcp - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-0] sun.rmi.transport.tcp - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33254] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 10:16:21.984 [tcp-disco-srvr-#3] - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]
Here is the relevant configuration.
Ignite configuration XML snippet:
....
....
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"/>
        </property>
    </bean>
</property>
....
....
Docker Compose snippet:
services:
  ***-mediation-service:
    image: ***/mediation-service:latest
    build: .
    environment:
      - PERCENTAGE_OF_RAM_FOR_HEAP=80.0
      - SERVICE_NAME=mediation-service
      - SERVICE_PORT=9887
      - IGNITE_TCP_DISCOVERY_ADDRESSES=localhost
      - JAVA_TOOL_OPTIONS=-Dcom.sun.management.jmxremote=true
        -Dcom.sun.management.jmxremote.rmi.port=19887
        -Dcom.sun.management.jmxremote.port=19887
        -Dcom.sun.management.jmxremote.local.only=false
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
        -Djava.rmi.server.hostname=$HOST_IP
        -Djava.net.preferIPv4Stack=true
        -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=29887
    ...
    ...
    networks:
      - something-mediation-network
networks:
  something-mediation-network:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 186.30.240.0/24
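The Compose file sets IGNITE_TCP_DISCOVERY_ADDRESSES=localhost, while the XML declares a TcpDiscoveryVmIpFinder with no addresses; the question does not show how the application connects the two. A minimal sketch of one possible wiring, assuming the application splits the variable on commas and appends the default discovery port range when no port is given. The class name DiscoveryAddresses, the method parse and the range of 10 ports are hypothetical; only the "host:47500..47509" address format and the default port 47500 (visible in the logs as [:47500]) come from Ignite itself.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscoveryAddresses {

    // Default TcpDiscoverySpi port; the 2.9.0 log shows the discovery server on :47500.
    static final int DEFAULT_DISCOVERY_PORT = 47500;

    // Hypothetical choice: probe 10 ports, i.e. 47500..47509.
    static final int DEFAULT_PORT_RANGE = 10;

    /**
     * Hypothetical helper: turns "localhost" or "host1,host2:47600" into entries such as
     * "localhost:47500..47509". Entries that already contain ':' (an explicit port,
     * or an IPv6 literal) are passed through unchanged.
     */
    static List<String> parse(String raw) {
        List<String> result = new ArrayList<>();
        if (raw == null) {
            return result;
        }
        for (String part : raw.split(",")) {
            String host = part.trim();
            if (host.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            if (host.contains(":")) {
                result.add(host);
            } else {
                int lastPort = DEFAULT_DISCOVERY_PORT + DEFAULT_PORT_RANGE - 1;
                result.add(host + ":" + DEFAULT_DISCOVERY_PORT + ".." + lastPort);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = System.getenv().getOrDefault("IGNITE_TCP_DISCOVERY_ADDRESSES", "localhost");
        // The list could then be handed to TcpDiscoveryVmIpFinder#setAddresses(...).
        System.out.println(parse(raw));
    }
}
```

With the Compose value above, this yields a single entry, localhost:47500..47509, which is the kind of address list a single-node embedded setup typically uses.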
Does anyone know what is going on here?
Thanks, Muthu
Update (Nov 13, 2020): As @alamar suggested, I tried the same setup with Ignite 2.9.0, but the result is the same. See below:
mediation-service - [ERROR] 01:03:16.871 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-19887] sun.rmi.transport.tcp - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-0] sun.rmi.transport.tcp - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33351] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 01:03:16.876 [tcp-disco-srvr-[:47500]-#3-#50] - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:17.271 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.internal.processors.cache.CacheDiagnosticManager - Page locks dump:
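Update (Nov 18, 2020):
One more update: if I run on Java 8 instead of Java 11, I don't see this issue during cluster activation, and everything works fine.
So I suspect this is related to something in the underlying Java libraries.
Solution
The error means that the socket has a timeout set and that no incoming connection arrived within that timeout.
Interestingly, the socket Ignite creates has no timeout! This suggests there is a bug somewhere...
...and this time it is in Java: JDK-8237858. The bug description says that `accept` can be interrupted by a signal (which is expected), and that this causes Java to throw a `SocketTimeoutException` even though no timeout is set (which is the bug).
According to the OpenJDK Jira, Java 8 is not affected. The bug is fixed in Java 16, and Java 13 is not affected with its default settings.
However, I don't see the fix mentioned for any Java 11 maintenance release.
Update: This is fixed in Ignite 2.12. Essentially, Ignite had to embed a workaround for the JDK bug in its own code.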
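To make the first point of the answer concrete: with the standard java.net.ServerSocket API, a freshly created server socket has SO_TIMEOUT 0 (wait forever), and `accept()` only throws `SocketTimeoutException` once a positive timeout has been set with `setSoTimeout`. The small demo below shows both; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;

public class AcceptTimeoutDemo {

    /** SO_TIMEOUT of a freshly created ServerSocket; 0 means accept() blocks indefinitely. */
    static int defaultSoTimeout() {
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns true if accept() timed out. Must be called with a positive timeout:
     * with 0 and no client, accept() would block forever.
     */
    static boolean acceptTimesOut(int soTimeoutMillis) {
        // Port 0 lets the OS pick a free port; loopback keeps outside clients away.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(soTimeoutMillis);
            try {
                server.accept().close(); // nobody connects, so this waits for the timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the only legitimate source of this exception
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default SO_TIMEOUT = " + defaultSoTimeout());
        System.out.println("accept() timed out with 200 ms: " + acceptTimesOut(200));
    }
}
```

Since Ignite's discovery socket stays at SO_TIMEOUT 0, the `Accept timed out` in the logs should never occur, which is what points at the JDK rather than at Ignite or the Docker network.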
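Until an Ignite version with the workaround is in use, one option is to warn at application startup when the JVM falls into the affected range. The sketch below encodes only the answer's summary (Java 8 unaffected, Java 13 to 15 unaffected by default, fixed in 16) and treats 9 to 12 as potentially affected; that mapping is an assumption, not an official compatibility matrix. It also assumes that JDK 13 to 15 re-enable the old socket implementation through the `jdk.net.usePlainSocketImpl` system property (JEP 353). Class and method names are hypothetical.

```java
public class JdkAcceptBugCheck {

    /**
     * Rough check derived from the answer's summary of JDK-8237858 (an assumption).
     *
     * @param specVersion      "java.specification.version", e.g. "1.8", "11", "17"
     * @param legacySocketImpl whether the legacy PlainSocketImpl was re-enabled (JDK 13-15)
     */
    static boolean potentiallyAffected(String specVersion, boolean legacySocketImpl) {
        if (specVersion.startsWith("1.")) {
            return false; // Java 8 and older report "1.x"; Java 8 is not affected
        }
        int feature = Integer.parseInt(specVersion.split("\\.")[0]);
        if (feature >= 16) {
            return false; // fixed in Java 16
        }
        if (feature >= 13) {
            return legacySocketImpl; // new default socket implementation is not affected
        }
        return true; // Java 9-12, including the Java 11 used in the question
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        // Assumption: the JEP 353 switch back to the old implementation is this property.
        String prop = System.getProperty("jdk.net.usePlainSocketImpl");
        boolean legacy = prop != null && !prop.equalsIgnoreCase("false");
        if (potentiallyAffected(spec, legacy)) {
            System.err.println("WARNING: Java " + spec + " may be affected by JDK-8237858");
        } else {
            System.out.println("Java " + spec + ": JDK-8237858 not expected");
        }
    }
}
```

The check deliberately reads `java.specification.version` instead of `Runtime.version()`, because the latter does not exist on Java 8, one of the runtimes in the question.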
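The general shape of such an application-side workaround can be sketched as follows: if `accept()` throws `SocketTimeoutException` while the socket's SO_TIMEOUT is 0, the exception cannot be a real timeout, so the loop simply retries; a genuinely configured timeout is still propagated. This is a minimal illustration of the idea under that assumption, not Ignite's actual 2.12 code, and the method name `acceptIgnoringSpuriousTimeouts` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SpuriousTimeoutAccept {

    /** Accepts a connection, retrying on timeouts that cannot be real (SO_TIMEOUT == 0). */
    static Socket acceptIgnoringSpuriousTimeouts(ServerSocket server) throws IOException {
        while (true) {
            try {
                return server.accept();
            } catch (SocketTimeoutException e) {
                if (server.getSoTimeout() > 0) {
                    throw e; // a timeout that was actually configured: propagate it
                }
                // SO_TIMEOUT is 0, so this is the spurious JDK exception: try again
            }
        }
    }

    /** Connects a client over loopback and checks that the accept succeeds. */
    static boolean acceptsClient() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                    // connect, then close immediately
                } catch (IOException ignored) {
                    // the accept side decides the result
                }
            });
            client.start();
            try (Socket accepted = acceptIgnoringSpuriousTimeouts(server)) {
                client.join();
                return accepted.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** With a positive SO_TIMEOUT and no client, the timeout must still surface. */
    static boolean configuredTimeoutPropagates() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(100);
            acceptIgnoringSpuriousTimeouts(server).close();
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted client: " + acceptsClient());
        System.out.println("configured timeout propagated: " + configuredTimeoutPropagates());
    }
}
```

Checking `getSoTimeout()` inside the catch block keeps normal timeout semantics intact for callers that do set a timeout, while making the loop immune to the spurious exception described in JDK-8237858.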