c - SSE Instructions
Problem description
I have a question regarding SSE instructions.
I hope this is the right place to ask such a question; if not, please let me know and I will remove it.
My goal is to use SSE instructions to execute calculations on 3 chars in parallel.
I have a typedef struct that is declared with the packed attribute:
typedef struct
{
unsigned char x;
unsigned char y;
unsigned char z;
} __attribute__((packed)) Number;
For each char I have to go through a certain calculation.
As an example:
((Number[0].x * 20) / 256);
I have to do a small calculation for every char and then add them together.
Since I have to write the code in assembly, I have already done some research and stumbled upon this instruction:
__m128i _mm_add_epi8 (__m128i a, __m128i b)
As far as I understand, this should add two values (each 8 bytes in size) together and save the result.
At least that's how I understand it: From this link
But since we only add two values together, this defeats the whole purpose of executing multiple operations at once.
Any help would be very appreciated. Kind regards!
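Solution
If you can provide more information about how you actually use this, it may be possible to optimize it better, but based on what you wrote, I guess you want something like _mm_srli_epi32(_mm_mullo_epi32(_mm_set_epi32(n.x, n.y, n.z, 0), _mm_set1_epi32(20)), 8). It requires SSE 4.1; if you want something that works with SSE 2, see "SSE multiplication of 4 32-bit integers" for a replacement for _mm_mullo_epi32.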
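For this particular calculation there may be a simpler SSE 2 route, sketched below. It assumes the inputs are 8-bit values as in the question, so each product is at most 255 * 20 = 5100 and fits in 16 bits; under that assumption _mm_mullo_epi16 produces the same lanes as _mm_mullo_epi32. The helper name scale_sse2 is made up for illustration and is not from the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef struct {
    unsigned char x;
    unsigned char y;
    unsigned char z;
} __attribute__((packed)) Number;

/* Hypothetical helper: computes (v * 20) / 256 for x, y and z at once.
   Lane layout follows the answer: lane 3 = x, lane 2 = y, lane 1 = z, lane 0 = 0. */
static __m128i scale_sse2(Number n)
{
    __m128i v = _mm_set_epi32(n.x, n.y, n.z, 0);

    /* Assumption: every 32-bit lane holds a value <= 255, so its upper
       16 bits are zero and the product (<= 5100) fits in the lower 16 bits.
       A 16-bit multiply then yields the same lanes as a 32-bit multiply. */
    __m128i prod = _mm_mullo_epi16(v, _mm_set1_epi32(20));

    /* Logical shift right by 8 divides each unsigned lane by 256. */
    return _mm_srli_epi32(prod, 8);
}
```

This shortcut only holds while the product stays below 65536; for larger inputs or multipliers a true 32-bit multiply is still needed.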
You did not specify what you want to do with the result, but you can access it with something like ((int*) &r_sse)[i], where i is 1 for z, 2 for y, and 3 for x.
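As an alternative to the pointer cast, the lanes can be copied into a plain int array with _mm_storeu_si128, which makes the index order explicit. The sketch below assumes the lane layout produced by _mm_set_epi32(x, y, z, 0); the helper names store_lanes and sum_xyz are hypothetical, and the final addition is only a guess at what "add them together" in the question means.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies the four 32-bit lanes of r into out[0..3].
   With _mm_set_epi32(x, y, z, 0), out[3] is x, out[2] is y, out[1] is z. */
static void store_lanes(__m128i r, int out[4])
{
    /* Unaligned store, so out does not need 16-byte alignment. */
    _mm_storeu_si128((__m128i *)out, r);
}

/* Hypothetical helper: adds the x, y and z lanes (indices 3, 2 and 1). */
static int sum_xyz(__m128i r)
{
    int lanes[4];
    store_lanes(r, lanes);
    return lanes[3] + lanes[2] + lanes[1];
}
```

Calling sum_xyz on the vector returned by the shift-and-multiply expression above would give the sum of the three scaled components as an ordinary int.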