Infinispan 9.4.16, JBoss EAP 7.3 with a replicated cache: lock contention between threads on 2 nodes, TIMED_WAITING (parking)

Problem description

I have an application that currently relies on an Infinispan replicated cache to share a work queue across all nodes. The queue is fairly standard: the head, tail, and size pointers are all kept in an Infinispan map.
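
For context, here is a minimal sketch of the layout described above. The key names, value types, and method bodies are illustrative assumptions; the actual com.siperian.mrm.match.InfinispanQueue code is not shown here.

import org.infinispan.Cache;

// Illustrative only: a queue whose bookkeeping lives in a replicated cache so
// that every node sees the same head/tail/size state.
public class SharedQueueSketch {

    private final Cache<String, Object> cache;

    public SharedQueueSketch(Cache<String, Object> cache) {
        this.cache = cache;
    }

    public void initialize() {
        // Runs on every node at startup; in the real code this path also takes
        // pessimistic locks on the pointer keys (see the thread dumps below),
        // which is where the contention shows up when both nodes start together.
        cache.putIfAbsent("QUEUE_HEAD", 0L);
        cache.putIfAbsent("QUEUE_TAIL", 0L);
        cache.putIfAbsent("QUEUE_SIZE", 0L);
    }

    public void enqueue(Object item) {
        // Entries are stored under their index; the tail pointer is advanced
        // in the same cache so all nodes observe the update.
        long tail = (Long) cache.get("QUEUE_TAIL");
        cache.put("ENTRY_" + tail, item);
        cache.put("QUEUE_TAIL", tail + 1);
        cache.put("QUEUE_SIZE", (Long) cache.get("QUEUE_SIZE") + 1);
    }
}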

We upgraded from Infinispan 7.2.5 to 9.4.16 and noticed that locking performs much worse than before. I managed to capture thread dumps from both nodes while they were both trying to initialize the queue at the same time. Locking and synchronization on Infinispan 7.2.5 worked very well with no problems at all; now we are seeing lock timeouts and far more failures.

Partial stack trace from Node #1, thread dump taken 2021-04-20 13:45:13:

"default task-2" #600 prio=5 os_prio=0 tid=0x000000000c559000 nid=0x1f8a waiting on condition [0x00007f4df3f72000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000006e1f4fec0> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:105)
    at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:38)
    at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
    at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1077)
    at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1057)
    at org.infinispan.cache.impl.AbstractDelegatingAdvancedCache.lock(AbstractDelegatingAdvancedCache.java:286)
    at org.infinispan.cache.impl.EncoderCache.lock(EncoderCache.java:318)
    at com.siperian.mrm.match.InfinispanQueue.initialize(InfinispanQueue.java:88)

Partial stack trace from Node #2, thread dump taken 2021-04-20 13:45:04:

"default task-2" #684 prio=5 os_prio=0 tid=0x0000000011f26000 nid=0x3c60 waiting on condition [0x00007f55107e4000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000746bd36d8> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:105)
    at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:38)
    at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
    at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1077)
    at org.infinispan.cache.impl.CacheImpl.lock(CacheImpl.java:1057)
    at org.infinispan.cache.impl.AbstractDelegatingAdvancedCache.lock(AbstractDelegatingAdvancedCache.java:286)
    at org.infinispan.cache.impl.EncoderCache.lock(EncoderCache.java:318)
    at com.siperian.mrm.match.InfinispanQueue.initialize(InfinispanQueue.java:88)

Client error that appeared on the console of the machine running Node #1:

2021-04-20 13:45:49,069 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (jgroups-15,infinispan-cleanse-cluster_192.168.0.24_cmx_system105,N1618938080334-63633(machine-id=M1618938080334)) ISPN000136: Error executing command LockControlCommand on Cache 'orclmdm-MDM_SAMPLE105/FUZZY_MATCH', writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 60 seconds for key QUEUE_TAIL_C_PARTY and requestor GlobalTx:N1618938080334-63633(machine-id=M1618938080334):429. Lock is held by GlobalTx:N1618938062946-60114(machine-id=M1618938062946):420
    at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288)
    at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261)
    at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$CompositeLockPromise.lock(DefaultLockManager.java:348)
    at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:208)
    at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46)
    at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25)
    at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:118)
    at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:81)
    at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:30)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:67)
    at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:102)
    at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1369)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1272)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:126)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1417)
    at org.jgroups.JChannel.up(JChannel.java:816)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:900)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:128)
    at org.jgroups.protocols.RSVP.up(RSVP.java:163)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:177)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:872)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
    at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1008)
    at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:734)
    at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:389)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:590)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:171)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:131)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:203)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:253)
    at org.jgroups.protocols.MERGE3.up(MERGE3.java:280)
    at org.jgroups.protocols.Discovery.up(Discovery.java:295)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1250)
    at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Infinispan configuration:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:9.4 http://www.infinispan.org/schemas/infinispan-config-9.4.xsd"
        xmlns="urn:infinispan:config:9.4">    

    <jgroups>
        <stack-file name="mdmudp" path="$cmx.home$/jgroups-udp.xml" />
        <stack-file name="mdmtcp" path="$cmx.home$/jgroups-tcp.xml" />
    </jgroups>

    <cache-container name="MDMCacheManager" statistics="true"
        shutdown-hook="DEFAULT">
        <transport stack="mdmudp" cluster="infinispan-cluster"
            node-name="$node$" machine="$machine$" />

        <jmx domain="org.infinispan.mdm.hub"/>  

        <replicated-cache name="FUZZY_MATCH" statistics="true" unreliable-return-values="false">
            <locking isolation="READ_COMMITTED" acquire-timeout="60000"
                concurrency-level="5000" striping="false" />
            <transaction
                transaction-manager-lookup="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                stop-timeout="30000" auto-commit="true" locking="PESSIMISTIC"
                mode="NON_XA" notifications="true" />
        </replicated-cache>

    </cache-container>
</infinispan>

We use UDP multicast by default; here is the UDP configuration:

<!--
  Default stack using IP multicasting. It is similar to the "udp"
  stack in stacks.xml, but doesn't use streaming state transfer and flushing
  author: Bela Ban
-->

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP
         mcast_port="${jgroups.udp.mcast_port:46688}"
         ip_ttl="4"
         tos="8"
         ucast_recv_buf_size="5M"
         ucast_send_buf_size="5M"
         mcast_recv_buf_size="5M"
         mcast_send_buf_size="5M"
         max_bundle_size="64K"
         enable_diagnostics="true"
         thread_naming_pattern="cl"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"/>

    <PING />
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3 xmit_interval="500"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="0"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <RSVP resend_interval="2000" timeout="10000"/>
    <pbcast.STATE_TRANSFER />
    <!-- pbcast.FLUSH  /-->
</config>

Any thoughts on the configuration would be great. What happens is that both nodes time out and the queue is never initialized correctly (null keys). Thanks in advance. By the way, up to 24 threads on each node (48 in total) can access the shared queue.

Tags: locking, infinispan, jgroups, contention, parking

Solution


I did some research, and it turns out that with a replicated cache the lock is acquired on the remote nodes first, before the key is locked locally. I believe a deadlock is possible if node1 tries to lock node2 at the same time that node2 tries to lock node1. I have therefore changed all caches to use Flag.FAIL_SILENTLY and Flag.ZERO_LOCK_ACQUISITION_TIMEOUT, and added extra retry logic on the client side when adding or removing elements from the queue. Based on initial testing, things look much better now.
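
For reference, a minimal sketch of what that flag-plus-retry approach might look like, assuming a transactional, pessimistically locked cache as configured above. The helper name, key type, and retry parameters are illustrative, not taken from the actual application code.

import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.context.Flag;

public class QueueLockHelper {

    // Hypothetical retry settings; tune to the workload.
    private static final int MAX_ATTEMPTS = 5;
    private static final long BACKOFF_MS = 200;

    // Tries to lock a queue pointer key without blocking for the full
    // 60-second acquire-timeout. ZERO_LOCK_ACQUISITION_TIMEOUT makes the
    // attempt return immediately if the lock is held elsewhere, and
    // FAIL_SILENTLY is expected to turn the failure into a false result
    // instead of a TimeoutException that would abort the transaction.
    public static boolean lockWithRetry(Cache<String, Long> cache, String key)
            throws InterruptedException {
        AdvancedCache<String, Long> advanced = cache.getAdvancedCache()
                .withFlags(Flag.FAIL_SILENTLY, Flag.ZERO_LOCK_ACQUISITION_TIMEOUT);

        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // Must be called inside the cache's transaction (the cache is
            // configured with pessimistic locking and auto-commit).
            if (advanced.lock(key)) {
                return true;
            }
            Thread.sleep(BACKOFF_MS * attempt); // simple linear back-off
        }
        return false; // caller decides whether to roll back and retry later
    }
}

The idea is that a failed attempt comes back quickly and without poisoning the transaction, so the client-side retry loop (or a fresh transaction) can try again instead of waiting 60 seconds for a timeout.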

I am curious what changed between Infinispan 7 and later versions that makes pessimistic locking behave so much worse in the newer release. The old client code (with no flags or retry logic) ran flawlessly under the same test conditions before. I suspect changes related to the use of futures and the ForkJoinPool, since I have run into problems with them in other projects and had to fall back to the old way of doing things with standard Executors.

