Is it possible to avoid nested RetryLoop.callWithRetry calls so that I get a consistent timeout?

Question

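I have configured what I think is a reasonable timeout using BoundedExponentialBackoffRetry, and it generally works as I expect when ZK is down and I make a call like "create.forPath". However, if ZK is unavailable when I call acquire on an InterProcessReadWriteLock, it takes far longer before it finally times out.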

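I call acquire, which is wrapped in "RetryLoop.callWithRetry", and it goes on to call findProtectedNodeInForeground, which is also wrapped in "RetryLoop.callWithRetry". If I have configured BoundedExponentialBackoffRetry to retry 20 times, the inner loop makes its 20 attempts for each of the 20 attempts of the outer loop, so it retries 400 times in total.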
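To make the multiplication concrete, here is a minimal sketch that does not use Curator at all. The `callWithRetry` helper and the `Action` interface below are hypothetical stand-ins for a retry loop, not Curator's actual implementation, and the sketch treats "retry 20 times" as 20 attempts per loop to match the numbers above. It only counts how often the innermost operation runs when one retry loop is nested inside another.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {

    /** Hypothetical operation type: a unit of work that may fail. */
    interface Action {
        void run() throws Exception;
    }

    /**
     * Hypothetical stand-in for a retry loop (NOT Curator's RetryLoop):
     * runs op up to maxAttempts times and rethrows the last failure.
     */
    static void callWithRetry(int maxAttempts, Action op) throws Exception {
        Exception last = new IllegalArgumentException("maxAttempts must be positive");
        for (int i = 0; i < maxAttempts; i++) {
            try {
                op.run();
                return; // success, stop retrying
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Counts how many times the innermost operation runs when it always fails. */
    static int countAttempts(int outerAttempts, int innerAttempts) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            callWithRetry(outerAttempts, () ->
                callWithRetry(innerAttempts, () -> {
                    attempts.incrementAndGet();
                    throw new IllegalStateException("simulated: ZK unavailable");
                }));
        } catch (Exception expected) {
            // Both loops are exhausted; only the count matters here.
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println(countAttempts(20, 20)); // prints 400
    }
}
```

Because the inner loop has to exhaust all of its attempts before the outer loop sees a single failure, the attempt counts multiply rather than add, and the total wait grows along with them.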

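We really need a consistent timeout after which we fail. Am I doing something wrong here? If not, I think I will call the troublesome methods in a new thread that I can kill after my own timeout.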
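The "new thread with my own timeout" workaround could look roughly like the sketch below. It uses only java.util.concurrent; the class name `TimeBoundedCall` and the method `callWithDeadline` are hypothetical names chosen for illustration. It assumes that the blocked call reacts to thread interruption, which I have not verified for every code path inside Curator.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBoundedCall {

    /**
     * Runs task on a separate daemon thread and waits at most the given time.
     * On timeout the worker is interrupted and TimeoutException is thrown.
     */
    static <T> T callWithDeadline(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            throw e;
        } catch (ExecutionException e) {
            // Surface the task's own exception where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fast task: completes well inside the deadline.
        System.out.println(callWithDeadline(() -> "ok", 1, TimeUnit.SECONDS));

        // Slow task: simulates a call stuck in nested retries.
        try {
            callWithDeadline(() -> {
                Thread.sleep(5_000);
                return "late";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

One caveat for locks specifically: Curator's InterProcessMutex tracks ownership per thread, so a lock acquired on the worker thread would also have to be released on that same thread. A wrapper like this is therefore a better fit for calls such as create().forPath than for acquire itself, unless the whole acquire/work/release sequence runs inside the task.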

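Here is sample code that reproduces it. I set a breakpoint on the line following each comment, shut down ZK, then let the program continue and grab a stack trace while it is retrying.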

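import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
import org.apache.curator.retry.BoundedExponentialBackoffRetry;

public class GoCurator {
    public static void main(String[] args) throws Exception {

        // Retry policy: base sleep 200 ms, max sleep 10,000 ms, max 20 retries
        CuratorFramework cf = CuratorFrameworkFactory.newClient(
                "localhost:2181",
                new BoundedExponentialBackoffRetry(200, 10000, 20)
        );
        cf.start();

        String root = "/myRoot";
        if (cf.checkExists().forPath(root) == null) {
            // Stacktrace A showing what happens if ZK is down for this call
            cf.create().forPath(root);
        }

        InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf, "/grant/myLock");

        // See stacktrace B showing the nested retry if ZK is down for this call
        lock.readLock().acquire();

        lock.readLock().release();

        System.out.println("done");
    }
}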

Stacktrace A (if ZK is down when I call create().forPath). This shows a single retry loop, so it exits after the correct number of attempts:

  java.lang.Thread.State: WAITING
  at java.lang.Object.wait(Object.java:-1)
  at java.lang.Object.wait(Object.java:502)
  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
  at com.gebatech.curator.GoCurator.main(GoCurator.java:25)

Stacktrace B (if ZK is down when I call InterProcessReadWriteLock#readLock#acquire). This shows the nested retry loops, so it does not exit until 20*20 attempts have been made:

  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
  at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
  at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
  at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
  at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
  at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
  at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
  at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
  at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
  at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
  at com.gebatech.curator.GoCurator.main(GoCurator.java:29)

Tags: apache-zookeeper, apache-curator

Answer


It turns out that this is a real, long-standing problem with how Curator uses retries. I have a fix and a PR ready here: https://github.com/apache/curator/pull/346. I would appreciate more eyes on it.

