Y86 architecture: immediate vs. register arithmetic efficiency

Question

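I'm working with a team in a computer architecture course, writing Y86 programs, to implement a multiplication function, imul. We have a block of code that works, but we're trying to make it execute as efficiently as possible. Currently our imul block looks like this: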

imul:
    # push all used registers to stack for preservation
    pushq %rdi
    pushq %rsi
    pushq %r8
    pushq %r9
    pushq %r10
    
    irmovq $0, %r9      # set 0 into r9
    rrmovq %rdi, %r10   # preserve rdi in r10
    subq %rsi, %rdi     # compare rdi and rsi
    rrmovq %r10, %rdi   # restore rdi
    jl continue         # if rdi (looping value/count) less than rsi, don't swap
    
swap:
    # swap rsi and rdi to make rdi smaller value of the two
    rrmovq %rsi, %rdi
    rrmovq %r10, %rsi
    
continue: 
    subq %r9, %rdi      # check if rdi is zero
    cmove %r9, %rax     # if rdi = 0, rax = 0
    je imulDone         # if rdi = 0, jump to end
    irmovq $1, %r8      # set 1 into r8
    rrmovq %rsi, %rax   # set rax equal to initial value from rsi
    
imulLoop:
    subq %r8, %rdi      # count - 1
    je imulDone         # if count = 0, jump to end
    addq %rsi, %rax     # add another copy of rsi into rax (repeated addition)
    jmp imulLoop        # restart loop
    
imulDone:
    # restore all saved registers from the stack and return
    popq %r10
    popq %r9
    popq %r8
    popq %rsi
    popq %rdi
    ret  
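To make the cost of this routine concrete, here is a small Python model of it. This is a sketch under stated assumptions, not a simulator: `imul_model` and `ImulResult` are hypothetical names, every executed Y86 instruction is counted as one unit (as in the SEQ implementation, where each instruction takes one clock cycle), and only non-negative operands are covered, because with a negative smaller operand the Y86 loop would keep decrementing until the count wrapped around. The model follows the same control flow as the assembly and tallies executed instructions.

```python
from dataclasses import dataclass


@dataclass
class ImulResult:
    product: int   # value the routine leaves in %rax
    executed: int  # number of Y86 instructions executed


def imul_model(rdi: int, rsi: int) -> ImulResult:
    """Hypothetical model of the Y86 imul routine: same control flow,
    plus a count of executed instructions (one unit each)."""
    if rdi < 0 or rsi < 0:
        raise ValueError("model only covers non-negative operands")
    executed = 5                 # five pushq
    executed += 5                # irmovq, rrmovq, subq, rrmovq, jl
    if not (rdi < rsi):          # jl not taken: fall through into swap
        rdi, rsi = rsi, rdi
        executed += 2            # two rrmovq
    executed += 3                # subq, cmove, je
    product = 0                  # cmove leaves 0 in rax when rdi == 0
    if rdi != 0:
        executed += 2            # irmovq 1, rrmovq %rsi, %rax
        product = rsi
        while True:
            rdi -= 1
            executed += 2        # subq, je
            if rdi == 0:
                break
            product += rsi
            executed += 2        # addq, jmp
    executed += 6                # five popq and ret
    return ImulResult(product, executed)


for a, b in [(0, 7), (3, 5), (5, 3), (100, 2), (100, 100)]:
    r = imul_model(a, b)
    print(f"{a} * {b} = {r.product}, {r.executed} instructions")
```

Under this model, when the smaller operand is n ≥ 1 the routine executes 4n + 19 instructions, plus 2 more when the swap path is taken. The loop body (`subq`, `je`, `addq`, `jmp`) runs n - 1 times, so that term dominates as n grows.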

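Right now our best idea is to use the immediate arithmetic instructions (isubq, etc.) instead of the plain OPq instructions, which require first loading constants into registers and then operating on those registers. Would that approach be more efficient in this particular case? Thanks very much!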
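One way to weigh the immediate-operand idea is to count executed instructions for both variants. The Python sketch below assumes a hypothetical immediate version in which the loop's `subq %r8, %rdi` becomes `isubq $1, %rdi`, so `irmovq 1, %r8` and the `pushq`/`popq` of %r8 are no longer needed; everything else stays as in the imul code in the question, and each instruction counts as one unit. `imul_cost` is an illustrative name, not part of any Y86 toolchain.

```python
def imul_cost(n: int, swapped: bool, use_isubq: bool) -> int:
    """Count executed Y86 instructions for the imul routine, assuming the
    smaller operand is n >= 1 and each instruction costs one unit.

    use_isubq=True models a hypothetical variant where the loop uses
    `isubq $1, %rdi`, so %r8 is never loaded, pushed, or popped.
    """
    if n < 1:
        raise ValueError("model covers a smaller operand of at least 1")
    saved = 4 if use_isubq else 5          # registers pushed and popped
    prologue = saved                       # one pushq per saved register
    compare = 5 + (2 if swapped else 0)    # irmovq, rrmovq, subq, rrmovq, jl (+ swap)
    zero_check = 3                         # subq, cmove, je
    loop_setup = 1 if use_isubq else 2     # (irmovq 1,) rrmovq %rsi, %rax
    loop = 4 * (n - 1) + 2                 # n-1 full passes, then final subq + je
    epilogue = saved + 1                   # one popq per register, plus ret
    return prologue + compare + zero_check + loop_setup + loop + epilogue


for n in (1, 10, 1000):
    with_reg = imul_cost(n, swapped=False, use_isubq=False)
    with_imm = imul_cost(n, swapped=False, use_isubq=True)
    print(f"n={n}: register constant {with_reg}, isubq {with_imm}, "
          f"saved {with_reg - with_imm}")
```

In this model the immediate version is always exactly 3 instructions shorter, whatever n is. The constant 1 is loaded into %r8 only once, before the loop, so `subq %r8, %rdi` and `isubq $1, %rdi` are both a single instruction per iteration; the saving comes only from the one-time setup and the register save/restore, while the 4-instruction loop body is unchanged.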

Tags: performance, assembly, micro-optimization, y86, immediate-operand
