SSE Instructions

Question

I have a question regarding SSE instructions.

I hope this is the right place to ask such a question; if not, please let me know and I will remove it.

My goal is to use SSE instructions to execute calculations on 3 chars in parallel.

I have a typedef'd struct that is declared with the packed attribute:

typedef struct
{
        unsigned char x;
        unsigned char y;
        unsigned char z;
} __attribute__((packed)) Number;

For each char I have to go through a certain calculation.

As an example:

((Number[0].x * 20)  / 256);

I have to do a small calculation for every char and then add them together.

Since I have to write the code in assembly, I have already done some research and stumbled upon this instruction:

__m128i _mm_add_epi8 (__m128i a, __m128i b)

As far as I understand, this should add two values (each 8 bytes in size) together and store the result.

At least that's how I understand it from this link.

But since it only adds two values together, this defeats the whole purpose of executing multiple operations at once.

Any help would be very much appreciated. Kind regards!

Tags: c, intel, sse

Answer


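You could probably optimize this better if you provided more information about how you are actually using it, but based on what you wrote I would guess you want something like _mm_srli_epi32(_mm_mullo_epi32(_mm_set_epi32(n.x, n.y, n.z, 0), _mm_set1_epi32(20)), 8). This requires SSE 4.1 (for _mm_mullo_epi32); if you need something that works with SSE 2, see "SSE multiplication of 4 32-bit integers" for a _mm_mullo_epi32 replacement.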
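As a minimal sketch of an SSE 2-only alternative, the same calculation can be done in 16-bit lanes instead of the 32-bit lanes used above, since 255 * 20 = 5100 still fits in 16 bits. This is a swapped-in technique, not the replacement from the linked question. The helper name scale_number and the out array are hypothetical, chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

typedef struct
{
        unsigned char x;
        unsigned char y;
        unsigned char z;
} __attribute__((packed)) Number;

/* Hypothetical helper: computes (v * 20) / 256 for x, y and z in one pass.
   out[0] receives the x result, out[1] the y result, out[2] the z result. */
static void scale_number(const Number *n, uint16_t out[3])
{
        /* _mm_set_epi16 lists the highest lane first, so x ends up in lane 0. */
        __m128i v = _mm_set_epi16(0, 0, 0, 0, 0, n->z, n->y, n->x);

        /* Multiply every 16-bit lane by 20; the product is at most 5100. */
        __m128i prod = _mm_mullo_epi16(v, _mm_set1_epi16(20));

        /* A logical right shift by 8 is the unsigned division by 256. */
        __m128i res = _mm_srli_epi16(prod, 8);

        /* Copy the lanes to memory and pick out the three that are used. */
        uint16_t lanes[8];
        _mm_storeu_si128((__m128i *)lanes, res);
        out[0] = lanes[0];
        out[1] = lanes[1];
        out[2] = lanes[2];
}
```

With only three values per vector most lanes are wasted; if many Number structs are processed in a loop, packing several of them into one register would use the hardware better.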

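You didn't specify what you want to do with the result, but you can access it with something like ((int*) &r_sse)[i], where i is 1 for z, 2 for y, and 3 for x. Lane 0 holds the padding 0, because _mm_set_epi32 takes its arguments from the highest lane down to the lowest.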
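The lane order can be checked with a small sketch that copies the vector to an int array with _mm_storeu_si128 instead of casting the pointer, which sidesteps any aliasing questions. The helper names lane_at and sum_xyz are hypothetical, and the sketch assumes the vector was built with _mm_set_epi32(x, y, z, 0) as in the expression above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns 32-bit lane i (0..3) of r_sse. */
static int lane_at(__m128i r_sse, int i)
{
        int lanes[4];
        _mm_storeu_si128((__m128i *)lanes, r_sse);
        return lanes[i];
}

/* Hypothetical helper: adds the three per-component results together.
   Assumes the layout produced by _mm_set_epi32(x, y, z, 0):
   lane 0 = padding, lane 1 = z, lane 2 = y, lane 3 = x. */
static int sum_xyz(__m128i r_sse)
{
        return lane_at(r_sse, 1) + lane_at(r_sse, 2) + lane_at(r_sse, 3);
}
```

For example, passing _mm_set_epi32(15, 7, 1, 0) to lane_at with i = 3 gives back the x value 15, and sum_xyz adds the three components as the question asks.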

