Decrementing an unsigned long long variable by one with an atomic operation in a CUDA kernel

Question

I have an unsigned long long count that needs to be decremented atomically inside a CUDA kernel. How do I do this correctly?

atomicAdd(&count, -1); // Impossible as second argument is also required to be `unsigned long long`.
atomicSub(&count, 1); // Impossible, because `unsigned long long` is not supported.

Tags: cuda, atomic

Solution


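All integer types on the platforms CUDA supports use two's-complement representation. This means that subtracting a number n from a counter c is the same as adding the two's complement of n to c. Integer types of the same size can easily be cast between signed and unsigned representations. So instead of adding 0xffffffffffffffffULL, we can use the more readable (unsigned long long int)(-1LL).

Below is a short CUDA program that initializes an unsigned long long int counter to a nonzero starting value and then has every thread of the launched kernel decrement it once. Note that error checking has been omitted for brevity, which is not advisable in real code.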
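The wraparound identity described above can be checked on the host without a GPU. The following is a minimal sketch in plain C++ (which also compiles as CUDA host code); the helper names negate_for_add and sub_via_add are hypothetical and exist only for this illustration. It models a single addition and does not exercise atomicAdd itself.

```cpp
#include <cassert>

// Hypothetical helper (not part of the original answer): returns the value
// that, when added with 64-bit wraparound, has the same effect as
// subtracting n. Unsigned arithmetic in C++ is defined modulo 2^64.
constexpr unsigned long long negate_for_add(unsigned long long n)
{
    return 0ULL - n;
}

// Hypothetical helper: subtract n from c using only addition, which is the
// arithmetic an atomicAdd on unsigned long long would perform.
constexpr unsigned long long sub_via_add(unsigned long long c,
                                         unsigned long long n)
{
    return c + negate_for_add(n);
}

// Converting -1LL to unsigned long long yields the all-ones bit pattern.
static_assert((unsigned long long)(-1LL) == 0xffffffffffffffffULL,
              "cast of -1LL must equal the all-ones 64-bit value");
static_assert(sub_via_add(10ULL, 1ULL) == 9ULL,
              "adding the negated value must subtract");

// Runtime version of the same checks.
inline void check_decrement_identity()
{
    assert(negate_for_add(1ULL) == (unsigned long long)(-1LL));
    assert(sub_via_add(1000ULL, 1ULL) == 999ULL);
}
```

Under the same reasoning, decrementing by an arbitrary n would add 0ULL - n rather than the constant for n = 1.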

#include <stdio.h>
#include <stdlib.h>

#define BLOCK_COUNT  (2)
#define THREAD_COUNT (256)

__global__ void kernel (unsigned long long int *counter)
{
    atomicAdd (counter, (unsigned long long int)(-1LL));
}

int main (void)
{
    unsigned long long int counter;
    unsigned long long int *counter_d = 0;
    cudaMalloc ((void**)&counter_d, sizeof (*counter_d));
    // cudaMemset writes byte-wise, so the counter becomes 0x0101010101010101
    cudaMemset (counter_d, 0x01, sizeof (*counter_d));
    cudaMemcpy (&counter, counter_d, sizeof counter, cudaMemcpyDeviceToHost);
    printf ("counter before kernel = %llu\n", counter);
    printf ("decrement counter with %d threads\n", BLOCK_COUNT * THREAD_COUNT);
    kernel<<<BLOCK_COUNT, THREAD_COUNT>>>(counter_d);
    cudaMemcpy (&counter, counter_d, sizeof counter, cudaMemcpyDeviceToHost);
    printf ("counter after kernel  = %llu\n", counter);
    cudaFree (counter_d);
    cudaDeviceSynchronize ();
    return EXIT_SUCCESS;
}

The output of this program should look like this:

counter before kernel = 72340172838076673
decrement counter with 512 threads
counter after kernel  = 72340172838076161
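The numbers in this output can be reproduced on the host as a sanity check. The sketch below is plain C++ and assumes nothing about the GPU; the function names memset_pattern and after_decrements are hypothetical. It models the byte-wise fill performed by cudaMemset and then applies the same addition once per thread, sequentially, which yields the same final value as the atomic updates because each one adds the same constant.

```cpp
#include <cassert>
#include <cstring>

// Host-side model of cudaMemset(ptr, 0x01, 8): every byte is set to 0x01,
// giving the 64-bit value 0x0101010101010101.
inline unsigned long long memset_pattern()
{
    unsigned long long v;
    std::memset(&v, 0x01, sizeof v);
    return v;
}

// Sequential model of `threads` additions of (unsigned long long)(-1LL),
// one per thread, as performed by the kernel.
inline unsigned long long after_decrements(unsigned long long start,
                                           unsigned threads)
{
    unsigned long long c = start;
    for (unsigned i = 0; i < threads; ++i)
        c += (unsigned long long)(-1LL);   // wraps modulo 2^64, i.e. c - 1
    return c;
}
```

Calling memset_pattern() gives 72340172838076673, and after_decrements(memset_pattern(), 512) gives 72340172838076161, matching the values printed before and after the kernel.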
