The reasoning behind the spinlock back-off strategy

Question

I'm looking at the spinlock implementation in the HotSpot JVM from OpenJDK 12. Here is how it is implemented (comments preserved):

// Polite TATAS spinlock with exponential backoff - bounded spin.
// Ideally we'd use processor cycles, time or vtime to control
// the loop, but we currently use iterations.
// All the constants within were derived empirically but work over
// over the spectrum of J2SE reference platforms.
// On Niagara-class systems the back-off is unnecessary but
// is relatively harmless.  (At worst it'll slightly retard
// acquisition times).  The back-off is critical for older SMP systems
// where constant fetching of the LockWord would otherwise impair
// scalability.
//
// Clamp spinning at approximately 1/2 of a context-switch round-trip.
// See synchronizer.cpp for details and rationale.

int Monitor::TrySpin(Thread * const Self) {
  if (TryLock())    return 1;
  if (!os::is_MP()) return 0;

  int Probes  = 0;
  int Delay   = 0;
  int SpinMax = 20;
  for (;;) {
    intptr_t v = _LockWord.FullWord;
    if ((v & _LBIT) == 0) {
      if (Atomic::cmpxchg (v|_LBIT, &_LockWord.FullWord, v) == v) {
        return 1;
      }
      continue;
    }

    SpinPause();

    // Periodically increase Delay -- variable Delay form
    // conceptually: delay *= 1 + 1/Exponent
    ++Probes;
    if (Probes > SpinMax) return 0;

    if ((Probes & 0x7) == 0) {
      Delay = ((Delay << 1)|1) & 0x7FF;
      // CONSIDER: Delay += 1 + (Delay/4); Delay &= 0x7FF ;
    }

    // Stall for "Delay" time units - iterations in the current implementation.
    // Avoid generating coherency traffic while stalled.
    // Possible ways to delay:
    //   PAUSE, SLEEP, MEMBAR #sync, MEMBAR #halt,
    //   wr %g0,%asi, gethrtime, rdstick, rdtick, rdtsc, etc. ...
    // Note that on Niagara-class systems we want to minimize STs in the
    // spin loop.  N1 and brethren write-around the L1$ over the xbar into the L2$.
    // Furthermore, they don't have a W$ like traditional SPARC processors.
    // We currently use a Marsaglia Shift-Xor RNG loop.
    if (Self != NULL) {
      jint rv = Self->rng[0];
      for (int k = Delay; --k >= 0;) {
        rv = MarsagliaXORV(rv);
        if (SafepointMechanism::should_block(Self)) return 0;
      }
      Self->rng[0] = rv;
    } else {
      Stall(Delay);
    }
  }
}

Link to source
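The Probes/Delay bookkeeping in the loop above can be traced in isolation. What follows is a minimal sketch, not HotSpot code: delay_schedule is a hypothetical helper that replays only the counter arithmetic, assuming the lock stays held for the whole spin so that every iteration reaches the back-off branch.

```cpp
#include <vector>

// Hypothetical helper (not part of HotSpot): records the value of Delay
// used for the stall after each failed probe, assuming the lock is never
// observed free during the spin.
std::vector<int> delay_schedule(int spin_max) {
  std::vector<int> delays;
  int probes = 0;
  int delay = 0;
  for (;;) {
    ++probes;
    if (probes > spin_max) break;          // bounded spin: give up
    if ((probes & 0x7) == 0) {             // every 8th probe...
      delay = ((delay << 1) | 1) & 0x7FF;  // ...roughly double, capped at 2047
    }
    delays.push_back(delay);               // stall length for this probe
  }
  return delays;
}
```

With SpinMax = 20, as in the listing, Delay changes only at probes 8 and 16, so it takes the values 0, 1 and 3 before the spin gives up. The 0x7FF mask would only matter for a much larger SpinMax, where it caps Delay at 2047.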

Atomic::cmpxchg is implemented on x86 as

template<>
template<typename T>
inline T Atomic::PlatformCmpxchg<8>::operator()(T exchange_value,
                                                T volatile* dest,
                                                T compare_value,
                                                atomic_memory_order /* order */) const {
  STATIC_ASSERT(8 == sizeof(T));
  __asm__ __volatile__ ("lock cmpxchgq %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest)
                        : "cc", "memory");
  return exchange_value;
}

Link to source
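The TrySpin loop relies on Atomic::cmpxchg returning the value it observed at dest, which is why success is detected with == v. Below is a minimal portable sketch of that contract, assuming std::atomic in place of the inline assembly. The name cmpxchg_sketch is hypothetical and only mirrors the argument order used above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the contract of Atomic::cmpxchg as used in
// TrySpin: store exchange_value into *dest only if *dest == compare_value,
// and return the value that was observed in *dest.
inline std::intptr_t cmpxchg_sketch(std::intptr_t exchange_value,
                                    std::atomic<std::intptr_t>* dest,
                                    std::intptr_t compare_value) {
  std::intptr_t observed = compare_value;
  // On failure compare_exchange_strong writes the current value of *dest
  // into observed; on success observed keeps compare_value, which was the
  // old contents of *dest.
  dest->compare_exchange_strong(observed, exchange_value);
  return observed;
}
```

A caller detects success the same way TrySpin does, by comparing the return value with the compare_value it passed in.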

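What I don't understand is the reason behind the back-off for "older SMP" systems. The comments say:

The back-off is critical for older SMP systems where constant fetching of the LockWord would otherwise impair scalability.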

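The reason I can imagine is that on older SMP systems, fetching and then CASing the LockWord always asserts a bus lock (rather than a cache lock). As the Intel manual, Volume 3, section 8.1.4 states:

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus.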

Is this the actual reason? If not, what is?
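For reference, the "polite TATAS with exponential back-off" pattern named in the comments can be sketched in portable C++. This is only an illustration under stated assumptions, not HotSpot's implementation: the class name, the use of std::atomic<bool>, and the xorshift constants are all hypothetical choices made for the example.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only: a bounded test-and-test-and-set (TATAS) spin with
// exponential back-off. Names and structure are hypothetical, not HotSpot's.
class TatasSpinSketch {
 public:
  // Returns true if the lock was acquired within spin_max failed probes.
  bool try_spin(int spin_max = 20) {
    // Per-thread scratch state for the stall loop (arbitrary nonzero seed).
    static thread_local std::uint32_t rng_state = 2463534242u;
    int probes = 0;
    int delay = 0;
    for (;;) {
      // "Test": a plain load, so a waiter does not attempt an atomic
      // read-modify-write while it sees the lock held.
      if (!locked_.load(std::memory_order_relaxed)) {
        // "Test-and-set": only now attempt the atomic compare-and-swap.
        bool expected = false;
        if (locked_.compare_exchange_strong(expected, true,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
          return true;
        }
        continue;  // lost the race; re-read the lock word
      }

      if (++probes > spin_max) return false;  // bounded spin
      if ((probes & 0x7) == 0) delay = ((delay << 1) | 1) & 0x7FF;

      // Stall on thread-local state only, so the lock word itself is not
      // read or written during the delay.
      std::uint32_t x = rng_state;
      for (int k = delay; --k >= 0;) {
        x ^= x << 13;  // common xorshift32 constants, an assumption here;
        x ^= x >> 17;  // HotSpot's MarsagliaXORV may use different shifts
        x ^= x << 5;
      }
      rng_state = x;
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```

The sketch leaves out SpinPause() and the safepoint check from the original. It keeps only the two parts the question is about: the load before the CAS, and the growing stall between probes.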

Tags: c++, assembly, cpu-architecture, lock-free, compare-and-swap

Answer

