ignite - Apache Ignite: Error "Getting affinity for topology version earlier than affinity is calculated"
Problem description
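I am running an Apache Ignite .NET 2.7 cluster on Linux in a Kubernetes cluster. The Ignite cluster consists of 5 Ignite nodes running three microservices (2 instances of the first service, 2 instances of the second, and 1 instance of the third). Two of the microservices deploy several Ignite services that call each other.
The cluster starts successfully, discovery works, and all nodes join the cluster. Then, suddenly, both instances of one of the services (2 nodes) fail with the following error: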
java.lang.IllegalStateException: Getting affinity for topology version earlier than affinity is calculated [locNode=TcpDiscoveryNode [id=76308a3b-221a-4307-b181-bd4e66d82683, addrs=[10.0.0.62, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, product-service-deployment-7dd5496d58-l426m/10.0.0.62:47500], discPort=47500, order=8, intOrder=6, lastExchangeTime=1560283011887, loc=true, ver=2.7.0#20181130-sha1:256ae401, isClient=false], grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], head=AffinityTopologyVersion [topVer=18, minorTopVer=0], history=[AffinityTopologyVersion [topVer=9, minorTopVer=0], AffinityTopologyVersion [topVer=11, minorTopVer=0], AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=14, minorTopVer=0], AffinityTopologyVersion [topVer=16, minorTopVer=0], AffinityTopologyVersion [topVer=18, minorTopVer=0]]]
at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:712)
at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:612)
at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.nodesByPartition(GridCacheAffinityManager.java:226)
at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByPartition(GridCacheAffinityManager.java:266)
at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByKey(GridCacheAffinityManager.java:257)
at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByKey(GridCacheAffinityManager.java:281)
at org.apache.ignite.internal.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1877)
at org.apache.ignite.internal.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:2064)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This causes another service to fail, because it depends on the first one:
Unhandled Exception: Apache.Ignite.Core.Services.ServiceInvocationException: Proxy method invocation failed with an exception. Examine InnerException for details. ---> Apache.Ignite.Core.Common.IgniteException: Failed to find deployed service: ProductService ---> Apache.Ignite.Core.Common.JavaException: class org.apache.ignite.IgniteException: Failed to find deployed service: ProductService
Because Kubernetes keeps restarting the second service, the first service reports constant topology changes:
[19:57:14] Topology snapshot [ver=20, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
[19:57:15] Topology snapshot [ver=21, locNode=76308a3b, servers=5, clients=0, state=ACTIVE, CPUs=5, offheap=7.8GB, heap=2.5GB]
[19:57:17] Topology snapshot [ver=22, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
[19:57:49] Topology snapshot [ver=23, locNode=76308a3b, servers=5, clients=0, state=ACTIVE, CPUs=5, offheap=7.8GB, heap=2.5GB]
[19:57:50] Topology snapshot [ver=24, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
[19:57:56] Topology snapshot [ver=25, locNode=76308a3b, servers=5, clients=0, state=ACTIVE, CPUs=5, offheap=7.8GB, heap=2.5GB]
[19:57:58] Topology snapshot [ver=26, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
[19:58:41] Topology snapshot [ver=27, locNode=76308a3b, servers=5, clients=0, state=ACTIVE, CPUs=5, offheap=7.8GB, heap=2.5GB]
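Just before I noticed this problem, I made a small configuration change to the Kubernetes cluster, which did not cause any pod restarts. I don't know whether that could be the cause.
Is this a known issue with a solution? What should I check (especially in the logs) to shed light on the situation?
Thanks!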
Solution
The "Getting affinity for topology version earlier than affinity is calculated" error is caused by a known issue. Here is its JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-8098
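Read literally, the exception message in the question says that affinity was requested for topVer=17, while the head is already at 18 and the listed history contains no entry for 17. Below is a minimal Python sketch for pulling those fields out of such a message so they can be compared quickly. The function name is hypothetical, the sample string is an abbreviated copy of the message from the question (the locNode details are left out), and the code only parses text; it does not use any Ignite API.

```python
import re

# One "AffinityTopologyVersion [topVer=N, minorTopVer=M]" entry, as printed in the message.
_VER = r"AffinityTopologyVersion \[topVer=(\d+), minorTopVer=(\d+)\]"

# Abbreviated copy of the message from the question (locNode details omitted).
SAMPLE = (
    "Getting affinity for topology version earlier than affinity is calculated "
    "[grp=ignite-sys-cache, "
    "topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], "
    "head=AffinityTopologyVersion [topVer=18, minorTopVer=0], "
    "history=[AffinityTopologyVersion [topVer=9, minorTopVer=0], "
    "AffinityTopologyVersion [topVer=11, minorTopVer=0], "
    "AffinityTopologyVersion [topVer=11, minorTopVer=1], "
    "AffinityTopologyVersion [topVer=12, minorTopVer=0], "
    "AffinityTopologyVersion [topVer=14, minorTopVer=0], "
    "AffinityTopologyVersion [topVer=16, minorTopVer=0], "
    "AffinityTopologyVersion [topVer=18, minorTopVer=0]]]"
)


def summarize_affinity_error(message):
    """Extract the requested version, head version, and history (hypothetical helper)."""
    requested = re.search(r"\btopVer=" + _VER, message)
    head = re.search(r"\bhead=" + _VER, message)
    start = message.find("history=[")
    if requested is None or head is None or start < 0:
        raise ValueError("message does not look like an affinity history error")

    def as_pair(match):
        return int(match.group(1)), int(match.group(2))

    # Every version entry after "history=[" belongs to the history list.
    history = [(int(top), int(minor)) for top, minor in re.findall(_VER, message[start:])]
    req = as_pair(requested)
    return {
        "requested": req,
        "head": as_pair(head),
        "history": history,
        "requested_in_history": req in history,
    }


if __name__ == "__main__":
    print(summarize_affinity_error(SAMPLE))
```

For the sample above, the requested version is (17, 0) and it is not among the seven history entries.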
So far, no negative effects of this issue have been observed, so the pod failures are probably caused by something else.
This problem will not occur in Ignite 2.8, because the service processor implementation has been completely reworked. Here is the relevant IEP: https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
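Since the pod failures likely have another cause, one thing that can be checked in the logs is how often nodes join and leave. The "Topology snapshot" lines in the question alternate between servers=4 and servers=5, which suggests a single node restarting repeatedly. The following is a rough Python sketch, not an Ignite tool: it assumes the console log format shown in the question, and the function name and sample text are illustrative only.

```python
import re

# Matches lines such as:
# [19:57:14] Topology snapshot [ver=20, locNode=76308a3b, servers=4, clients=0, ...]
_SNAPSHOT = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] Topology snapshot \[ver=(\d+),.*?servers=(\d+)"
)

# First three snapshot lines from the question.
SAMPLE_LOG = """\
[19:57:14] Topology snapshot [ver=20, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
[19:57:15] Topology snapshot [ver=21, locNode=76308a3b, servers=5, clients=0, state=ACTIVE, CPUs=5, offheap=7.8GB, heap=2.5GB]
[19:57:17] Topology snapshot [ver=22, locNode=76308a3b, servers=4, clients=0, state=ACTIVE, CPUs=4, offheap=6.2GB, heap=2.0GB]
"""


def server_count_changes(log_text):
    """Return (time, version, delta) for each snapshot whose server count changed."""
    changes = []
    previous = None
    for line in log_text.splitlines():
        match = _SNAPSHOT.search(line)
        if match is None:
            continue  # not a topology snapshot line
        time, version, servers = match.group(1), int(match.group(2)), int(match.group(3))
        if previous is not None and servers != previous:
            # Positive delta: servers joined; negative delta: servers left.
            changes.append((time, version, servers - previous))
        previous = servers
    return changes


if __name__ == "__main__":
    for time, version, delta in server_count_changes(SAMPLE_LOG):
        print(f"{time} ver={version} servers {delta:+d}")
```

Matching the timestamps of the negative deltas against the Kubernetes events and the logs of the restarting pod may help narrow down why that pod keeps failing.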