How would you convert a "while" iterator into SIMD instructions?

Problem description

Here is the code I currently have. It is scalar code, repeated four times (once per lane) so the results can be stored into a SIMD vector:

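waveTable *waveTables[4];
for (int i = 0; i < 4; i++) {
    int waveTableIindex = 0;
    // Test the bound first so mWaveTables is never read past its end,
    // and stop at the last slot so the result always points at a real table.
    while ((waveTableIindex < kNumWaveTableSlots - 1) && (phaseIncrement[i] >= mWaveTables[waveTableIindex].mTopFreq)) {
        waveTableIindex++;
    }
    waveTables[i] = &mWaveTables[waveTableIindex];
}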

Of course, this isn't any faster at all. How would you do the same thing with SIMD to save CPU? Any hints or starting points? I'm targeting SSE2.

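Here is the context of the computation. Each wavetable's topFreq is computed from the maximum number of harmonics (times 2, because of Nyquist) and doubles from one table to the next, while the number of harmonics available per table is halved: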

double topFreq = 1.0 / (maxHarmonic * 2);
while (maxHarmonic) {
    // fill the table in with the needed harmonics
    // ... makeWaveTable() code
    
    // prepare for next table
    topFreq *= 2;
    maxHarmonic >>= 1;
}
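To make the resulting table boundaries concrete, here is a minimal C++ sketch that keeps only the topFreq bookkeeping of the loop above (the makeWaveTable() body is left out). The helper name computeTopFreqs is hypothetical. For maxHarmonic = 500 it should produce nine tables whose topFreq values run 0.001, 0.002, 0.004, ... up to 0.256, each expressed as a phase increment (cycles per sample).

```cpp
#include <vector>

// Hypothetical helper: returns the topFreq of every table the loop above
// would build, in build order. Only the topFreq bookkeeping is reproduced;
// the actual table filling (makeWaveTable()) is omitted.
std::vector<double> computeTopFreqs(int maxHarmonic) {
    std::vector<double> topFreqs;
    double topFreq = 1.0 / (maxHarmonic * 2);  // highest harmonic must stay below Nyquist
    while (maxHarmonic) {
        topFreqs.push_back(topFreq);  // one table per iteration
        topFreq *= 2;                 // next table covers one octave higher
        maxHarmonic >>= 1;            // with half as many harmonics
    }
    return topFreqs;
}
```

Because maxHarmonic is halved with an integer shift (500, 250, 125, 62, 31, 15, 7, 3, 1), the loop runs nine times and the topFreq values are sorted in ascending order, which is what the table search relies on.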

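Then, during processing, for each sample I need to "snap" to the correct wavetable based on the oscillator's frequency (i.e., its phase increment):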

freq = clamp(freq, 20.0f, 22050.0f);
phaseIncrement = freq * vSampleTime;
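The clamp and the multiply are identical for every oscillator, so they vectorize directly. A possible SSE sketch for four oscillators at once is shown here; the function name computePhaseIncrements4 and the layout (four floats in, four floats out) are assumptions, and the 20 Hz / 22050 Hz limits come from the line above. The intrinsics used are plain SSE, so they are available on any SSE2 target.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_max_ps, _mm_min_ps, _mm_mul_ps

// Hypothetical helper: clamps four oscillator frequencies to [20, 22050] Hz
// and converts them to phase increments (freq * vSampleTime) in one pass.
// freq and phaseIncrement must each point at 4 floats.
void computePhaseIncrements4(const float* freq, float vSampleTime, float* phaseIncrement) {
    __m128 f = _mm_loadu_ps(freq);
    f = _mm_max_ps(f, _mm_set1_ps(20.0f));     // lower clamp
    f = _mm_min_ps(f, _mm_set1_ps(22050.0f));  // upper clamp
    _mm_storeu_ps(phaseIncrement, _mm_mul_ps(f, _mm_set1_ps(vSampleTime)));
}
```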

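So, for example, with vSampleTime = 1/44100 and maxHarmonic = 500, 30 Hz uses wavetable 0, 50 Hz uses wavetable 1, and so on: 30/44100 ≈ 0.00068 is below table 0's topFreq of 1/1000, while 50/44100 ≈ 0.00113 is not, but is below table 1's topFreq of 2/1000.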
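The example can be checked with a small self-contained C++ sketch. It rebuilds the topFreq values for maxHarmonic = 500 and applies the same "advance while phaseIncrement >= topFreq" rule as the scalar loop. buildTopFreqs and selectTable are hypothetical helper names, and selectTable stops at the last table rather than running past the end.

```cpp
#include <vector>

// Hypothetical helper: topFreq of each table, as in the table-building loop.
std::vector<double> buildTopFreqs(int maxHarmonic) {
    std::vector<double> topFreqs;
    double topFreq = 1.0 / (maxHarmonic * 2);
    while (maxHarmonic) {
        topFreqs.push_back(topFreq);
        topFreq *= 2;
        maxHarmonic >>= 1;
    }
    return topFreqs;
}

// Hypothetical helper: index of the table to use for a given phase increment.
// Advances while phaseIncrement >= topFreq, checking the bound first and
// never going past the last table.
int selectTable(const std::vector<double>& topFreqs, double phaseIncrement) {
    const int last = static_cast<int>(topFreqs.size()) - 1;
    int index = 0;
    while (index < last && phaseIncrement >= topFreqs[index])
        ++index;
    return index;
}
```

With a sample time of 1/44100, 30 Hz selects table 0, 50 Hz table 1, 100 Hz table 2, and 22050 Hz (a phase increment of 0.5) falls through to the last table, index 8.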

Tags: c++, while-loop, simd, sse, sse2

Solution


Assuming your values are FP32, here's how I would do it. Untested.

#include <emmintrin.h>  // SSE2 intrinsics

const __m128 phaseIncrements = _mm_loadu_ps( phaseIncrement );
__m128i indices = _mm_setzero_si128();
__m128i activeIndices = _mm_set1_epi32( -1 );

for( size_t idx = 0; idx < kNumWaveTableSlots; idx++ )
{
    // Broadcast mTopFreq to all four FP32 lanes. If mTopFreq is stored as a float, building for AVX turns this into a single fast instruction (vbroadcastss).
    const __m128 topFreq = _mm_set1_ps( mWaveTables[ idx ].mTopFreq );
    // Compare for phaseIncrements >= topFreq
    const __m128 cmp_f32 = _mm_cmpge_ps( phaseIncrements, topFreq );
    // The following line compiles into no instruction, it's only to please the type checker
    __m128i cmp = _mm_castps_si128( cmp_f32 );
    // Bitwise AND with activeIndices
    cmp = _mm_and_si128( cmp, activeIndices );
    // Increment indices by 1 only in lanes where cmp is TRUE: TRUE lanes hold -1 (all bits set), so subtracting them adds 1
    indices = _mm_sub_epi32( indices, cmp );
    // Update the set of active lane indices
    activeIndices = cmp;
    // Once activeIndices is all zeros, every lane has reached a table whose topFreq is greater than its phaseIncrement, so we can stop early
    if( 0 == _mm_movemask_epi8( activeIndices ) )
        break;
}

// The indices vector holds four 32-bit integers.
// Each lane contains the index of the first table whose mTopFreq is greater than the corresponding lane of phaseIncrements,
// or kNumWaveTableSlots if no such table exists (clamp it before using it as an index).
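For experimenting, the answer's loop can be wrapped in a self-contained SSE2 function and checked against the expected table indices. The WaveTable struct, the table count of 9 and the function name selectWaveTables4 are assumptions made for this sketch. Unlike the answer, the loop stops one table early, so every lane ends on a valid table index instead of possibly reaching kNumWaveTableSlots.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

constexpr int kNumWaveTableSlots = 9;  // hypothetical table count

struct WaveTable {  // hypothetical: only the field the lookup needs
    float mTopFreq;
};

// The answer's loop as a function: for each of the 4 lanes, counts the leading
// tables whose mTopFreq <= phaseIncrement, stopping at the last table.
// Results (0 .. kNumWaveTableSlots - 1) are written to out[0..3].
void selectWaveTables4(const WaveTable* mWaveTables, const float* phaseIncrement, int* out) {
    const __m128 phaseIncrements = _mm_loadu_ps(phaseIncrement);
    __m128i indices = _mm_setzero_si128();
    __m128i activeIndices = _mm_set1_epi32(-1);  // all lanes still searching

    for (int idx = 0; idx < kNumWaveTableSlots - 1; idx++) {
        // Broadcast this table's topFreq to all four lanes
        const __m128 topFreq = _mm_set1_ps(mWaveTables[idx].mTopFreq);
        // -1 in lanes where phaseIncrement >= topFreq, 0 elsewhere
        __m128i cmp = _mm_castps_si128(_mm_cmpge_ps(phaseIncrements, topFreq));
        // Lanes that already stopped stay stopped
        cmp = _mm_and_si128(cmp, activeIndices);
        // Subtracting -1 adds 1 in the lanes that advance
        indices = _mm_sub_epi32(indices, cmp);
        activeIndices = cmp;
        // All lanes done: leave the loop early
        if (_mm_movemask_epi8(activeIndices) == 0)
            break;
    }
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), indices);
}
```

With tables at 0.001 × 2^i, the phase increments for 30, 50 and 100 Hz at 44.1 kHz map to tables 0, 1 and 2, and a phase increment of 0.5 maps to the last table, 8.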
