c++ - How would you convert a "while" iterator to SIMD instructions?
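Problem description
This is the code I actually have (the scalar version), which I've replicated four times so the results can be stored for SIMD: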
waveTable *waveTables[4];
for (int i = 0; i < 4; i++) {
    int waveTableIindex = 0;
    // Test the bound first so mWaveTables is never read past its last slot
    while ((waveTableIindex < kNumWaveTableSlots) && (phaseIncrement[i] >= mWaveTables[waveTableIindex].mTopFreq)) {
        waveTableIindex++;
    }
    waveTables[i] = &mWaveTables[waveTableIindex];
}
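Of course, it isn't "faster" at all. How would you do the same thing with SIMD to save CPU? Any tips or starting points? I'm on SSE2.
Here is the context of the computation. The topFreq of each wavetable is calculated starting from the maximum number of harmonics (x2, because of Nyquist) and is multiplied by 2 for every following wavetable, while the number of harmonics available to each table is halved: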
double topFreq = 1.0 / (maxHarmonic * 2);
while (maxHarmonic) {
    // fill the table in with the needed harmonics
    // ... makeWaveTable() code
    // prepare for next table
    topFreq *= 2;
    maxHarmonic >>= 1;
}
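Then, when processing, for each sample I need to "catch" the correct wavetable to use, based on the oscillator's frequency (i.e. its phase increment):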
freq = clamp(freq, 20.0f, 22050.0f);
phaseIncrement = freq * vSampleTime;
So, for example (with vSampleTime = 1/44100 and maxHarmonic = 500), 30 Hz maps to wavetable 0, 50 Hz to wavetable 1, and so on.
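To make those numbers concrete, here is a minimal scalar sketch of the setup described above. The names buildTopFreqs and selectTable are hypothetical (they are not part of the original code), and the top frequencies are kept in a plain vector instead of the mWaveTables array, purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: collects the topFreq value each table would receive,
// following the halving loop shown in the question.
std::vector<double> buildTopFreqs(int maxHarmonic) {
    assert(maxHarmonic > 0);
    std::vector<double> topFreqs;
    double topFreq = 1.0 / (maxHarmonic * 2);
    while (maxHarmonic) {
        topFreqs.push_back(topFreq); // table built here would use this topFreq
        topFreq *= 2;                // next table covers twice the frequency
        maxHarmonic >>= 1;           // ...with half the harmonics
    }
    return topFreqs;
}

// Hypothetical helper: scalar table selection for one oscillator.
// Returns topFreqs.size() when the phase increment is >= every topFreq.
int selectTable(const std::vector<double>& topFreqs, double freqHz, double sampleTime) {
    const double phaseIncrement = freqHz * sampleTime;
    int index = 0;
    while (index < static_cast<int>(topFreqs.size()) && phaseIncrement >= topFreqs[index]) {
        index++;
    }
    return index;
}
```

With maxHarmonic = 500 this yields 9 top frequencies starting at 0.001, so 30 Hz (phase increment about 0.00068) selects table 0 and 50 Hz (about 0.00113) selects table 1. Under these assumptions a frequency clamped to 22050 Hz gives a phase increment of about 0.5, which is above every topFreq, so the "not found" result needs handling before it is used as an array index.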
Solution
Assuming your values are FP32, here is how I would do it. Untested.
const __m128 phaseIncrements = _mm_loadu_ps( phaseIncrement );
__m128i indices = _mm_setzero_si128();
__m128i activeIndices = _mm_set1_epi32( -1 );
for( size_t idx = 0; idx < kNumWaveTableSlots; idx++ )
{
    // Broadcast the mTopFreq value into an FP32 vector. When built for AVX, this becomes a single, very fast instruction.
    const __m128 topFreq = _mm_set1_ps( mWaveTables[ idx ].mTopFreq );
    // Compare for phaseIncrements >= topFreq; TRUE lanes have all bits set, i.e. -1 as an integer
    const __m128 cmp_f32 = _mm_cmpge_ps( phaseIncrements, topFreq );
    // The following line compiles into no instruction, it's only there to please the type checker
    __m128i cmp = _mm_castps_si128( cmp_f32 );
    // Bitwise AND with activeIndices, so a lane that has already stopped stays stopped
    cmp = _mm_and_si128( cmp, activeIndices );
    // Subtracting -1 increments the indices vector by 1, only in the lanes where cmp was TRUE
    indices = _mm_sub_epi32( indices, cmp );
    // Update the set of active lanes
    activeIndices = cmp;
    // The vector may become completely zero, meaning all 4 lanes have encountered a value where phaseIncrements < topFreq
    if( 0 == _mm_movemask_epi8( activeIndices ) )
        break;
}
// The indices vector holds 4 32-bit integers.
// Each lane contains the index of the first table entry whose mTopFreq is greater than the corresponding lane of phaseIncrements,
// or kNumWaveTableSlots if there is no such entry.
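Since the code above is untested, here is a self-contained sketch of the same loop that can be checked against a scalar reference. It assumes the top frequencies are available as a flat float array (topFreqs stands in for mWaveTables[i].mTopFreq), and the function names selectTables4 and selectTableScalar are hypothetical. It also shows one way to get the four indices out of the vector, with an unaligned store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical wrapper: finds, for 4 phase increments at once, the index of the
// first entry in topFreqs that is greater than the phase increment.
// A lane receives numTables when no entry qualifies.
void selectTables4(const float* phaseIncrement, const float* topFreqs,
                   std::size_t numTables, std::int32_t* outIndices) {
    assert(phaseIncrement && topFreqs && outIndices);
    const __m128 phaseIncrements = _mm_loadu_ps(phaseIncrement);
    __m128i indices = _mm_setzero_si128();
    __m128i active = _mm_set1_epi32(-1); // all lanes start active
    for (std::size_t idx = 0; idx < numTables; idx++) {
        const __m128 topFreq = _mm_set1_ps(topFreqs[idx]);
        // All-ones (-1) in lanes where phaseIncrement >= topFreq
        __m128i cmp = _mm_castps_si128(_mm_cmpge_ps(phaseIncrements, topFreq));
        cmp = _mm_and_si128(cmp, active);      // stopped lanes stay stopped
        indices = _mm_sub_epi32(indices, cmp); // x - (-1) == x + 1
        active = cmp;
        if (_mm_movemask_epi8(active) == 0) {
            break; // every lane has found its table
        }
    }
    // Unaligned store of the 4 lane indices
    _mm_storeu_si128(reinterpret_cast<__m128i*>(outIndices), indices);
}

// Scalar reference with the same semantics, for comparison.
std::int32_t selectTableScalar(float phaseIncrement, const float* topFreqs,
                               std::size_t numTables) {
    std::size_t index = 0;
    while (index < numTables && phaseIncrement >= topFreqs[index]) {
        index++;
    }
    return static_cast<std::int32_t>(index);
}
```

After the call, each outIndices[i] can be used to pick &mWaveTables[outIndices[i]], provided the "not found" value numTables is clamped or otherwise handled first. Whether this beats the scalar loop likely depends on how many tables there are and how early the lanes stop, so it is worth measuring.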