C++ countdown in CyclicBarrier going wrong using atomic variables [solutions without locks please]

Problem description

I am trying to implement a cyclic barrier in C++ from scratch. The aim is to stay as close to the Java implementation as possible; the class reference is here: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CyclicBarrier.html

In my testing, returnStatus should give each thread that successfully trips the barrier a distinct value, ranging from barrierLimit-1 down to zero. I am trying to achieve this using atomic variables and memory fences, but my code fails the test: in some cases two threads end up with the same value of returnStatus.

Would someone please suggest a technique that could resolve this? I want to solve it without using locks, so that the implementation stays as close to lock-free as possible.

The full code is at: https://github.com/anandkulkarnisg/CyclicBarrier/blob/master/CyclicBarrier.cpp

A sample test run is below (buggy case):

I am currently in thread id = 140578053969664.My barrier state count is = 4
I am currently in thread id = 140577877722880.My barrier state count is = 2
I am currently in thread id = 140577550407424.My barrier state count is = 1
I am currently in thread id = 140577936471808.My barrier state count is = 2
I am currently in thread id = 140577760225024.My barrier state count is = 0


The code snippet is below.

        // First check and ensure that the barrier is in a good (not broken) state.
        if(!m_barrierState && !m_tripStatus)
        {
            // First check the status of the variable and immediately exit, throwing an exception, if the count is zero.
            int returnResult;
            if(m_count == 0)
                throw std::string("The barrier has already tripped. Please reset the barrier before use again!!" + std::to_string(returnResult));

            // First ensure that the current wait gets the waiting result assigned immediately.

            std::atomic_thread_fence(std::memory_order_acquire);
            m_count.fetch_sub(1, std::memory_order_seq_cst);
            returnResult = m_count.load();
            std::atomic_thread_fence(std::memory_order_release);

Tags: c++, multithreading, c++11, relaxed-atomics

Solution


std::atomic_thread_fence(std::memory_order_acquire);
m_count.fetch_sub(1, std::memory_order_seq_cst);      // [1]
returnResult = m_count.load();                        // [2]
std::atomic_thread_fence(std::memory_order_release);

At [2], multiple threads can be executing this step at the same time. std::atomic_thread_fence does not stop other threads from running the same code concurrently. That is how two threads can end up with the same value.

Instead, capture the value returned by fetch_sub on the line marked [1]:

returnResult = m_count.fetch_sub(1, std::memory_order_seq_cst) - 1;
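To illustrate, here is a minimal sketch of how that decrement could sit inside an await()-style method. This is not the author's class from the linked repository; the names (BarrierSketch, arrive, m_count) are illustrative only, and the blocking/trip logic and broken-state handling are omitted.

#include <atomic>
#include <stdexcept>

class BarrierSketch
{
public:
    explicit BarrierSketch(int parties) : m_count(parties) {}

    // Each of the first `parties` callers receives a distinct value:
    // parties-1 for the first arrival, down to 0 for the last one.
    int arrive()
    {
        // fetch_sub returns the value held *before* the decrement, so
        // subtracting one yields an index that belongs to this thread alone.
        int returnResult = m_count.fetch_sub(1, std::memory_order_seq_cst) - 1;
        if (returnResult < 0)
            throw std::runtime_error("The barrier has already tripped. Please reset the barrier before using it again!");
        // (a real implementation would also repair the negative overshoot
        //  and block until the barrier trips)
        return returnResult;
    }

private:
    std::atomic<int> m_count;
};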

As an aside, I'm fairly sure you don't need the fences here at all (I can't really tell without seeing more of the function). If you did, you could probably just make returnResult an atomic instead.

It looks like you are using the fences as if they were transactional memory. They are not. The release essentially controls the store-ordering guarantees as perceived by any CPU that performs an acquire. Writes are free to propagate before the release is actually processed, as long as that does not break the ordering guarantees. As a thought experiment, imagine that [1] executes, then a context switch happens, a million years pass, and then [2] executes. It would obviously be absurd to assume that m_count still holds the same value it did a million years ago. The release may flush the write buffer, but the change may already have been flushed.
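To make the interleaving concrete, here is a small self-contained demo (hypothetical, not taken from the question's repository). Every thread runs exactly the pattern from the question; because the load at [2] can observe decrements performed by other threads between [1] and [2], repeated runs will occasionally print the same number from two different threads, fences or not.

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    constexpr int kThreads = 8;
    std::atomic<int> count{kThreads};

    std::vector<std::thread> pool;
    for (int i = 0; i < kThreads; ++i)
    {
        pool.emplace_back([&count] {
            std::atomic_thread_fence(std::memory_order_acquire);
            count.fetch_sub(1, std::memory_order_seq_cst);   // [1]
            int observed = count.load();                     // [2] may already include
                                                             //     other threads' decrements
            std::atomic_thread_fence(std::memory_order_release);
            std::printf("observed %d\n", observed);          // duplicates are possible here
            // Capturing the return value instead gives each thread a unique result:
            //   int mine = count.fetch_sub(1, std::memory_order_seq_cst) - 1;
        });
    }
    for (auto& t : pool)
        t.join();
    return 0;
}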

Finally, odd things can happen if you mix seq_cst with acquire/release semantics. Sorry that this is vague, but I don't understand it well enough to attempt an explanation.

