首页 > 解决方案 > 需要用汇编计算二进制中 1 的数量

问题描述

我有一个任务,我应该计算我设置的具有奇数的二进制 1 的数量,然后我需要在 7 段显示器上显示它。

在代码上,我写了一条我应该这样做的评论。

我正在使用德州仪器 msp430。我查看了另一种解决方案,但他们用 C 而不是组装,不幸的是无法弄清楚如何在组装时做到这一点。

       bis.b #11111111b, &P1DIR
       bic.b #11111111b, &P1OUT

loop_1:
       ; do stuff with &P1OUT
       call #delay
       ...

delay

       mov #0, R5
       mov #0, R4

odd_even:
           ;Over here i need to count number of 1's in binary but cant figure out how to do it
           jnz try
           jz delay_over


      ...
           ret

标签: assemblymsp430

解决方案


There are algorithms that are better for more than 8 bits. @rcgldr's answer is a useful start to a 16 or 32-bit popcount. See How to count the number of set bits in a 32-bit integer? for some bithack and other algorithms, including table lookup.

You could consider a 4-bit lookup table. MSP430 shifts are slow-ish (1 cycle per bit, and 1 instruction per bit if you don't have MSP430X). Or use a big 8-bit lookup table.

Or loop over the set bits, clearing the low bit with v &= v - 1;. In MSP430 that takes a MOV, DEC, and AND. That's great if only a couple bits are usually set, but they're often scattered.


But the simplest and smallest code-size way is to just loop over all the bits one at a time.

If you're going to loop one bit at a time to keep it simple and compact, you want to take advantage of the carry flag by shifting into carry and using ADDC (add-with-carry).

I tried to write C that compilers could turn into nice asm using ADDC, but https://godbolt.org/z/2Ev2IC is the best I managed. GCC and clang don't do very well for MSP430 with the tmp = a+a; carry = tmp<a; idiom that they recognize for x86 and most other architectures.

So anyway, you wanted asm in the first place:

;; simple naive bit-count.  Small code-size and not too slow for 8 bits

;; input in r12,  result: r11 = popcount(r12)
mov.w     #0, r11        ; retval = 0
.popcount_loop:          ; do{
    add.b   r12,r12          ; shift a bit into carry flag
    addc    #0, r11          ; add that bit to r11:  r11 += 0 + C

    tst.b    r12
    jnz   .popcount_loop ; } while( (uint8_t)r12 != 0);

Using byte operand-size for the add means that bit 7 goes into C, not bit 15.

We could instead use a right-shift to put the low bit into the C flag, especially if we expect many inputs to be small numbers (so the non-zero bits are all toward the lower end). According to this copy of a MSP430 / MSP430X instruction-set reference google found, plain MSP430 doesn't have shift right, only rotate-right through carry. RRC[.W] / RRC.B. MSP430X has some "rotates" that actually shift in zeros, so they're really shifts. But we don't need that if we make sure C=0 before we run it. Since the population count won't wrap, ADDC will reliably clear C for us.

We can optimize this to fewer instructions inside the loop (same code size but runs faster), by having both JNZ and ADDC consume flags from the same ADD. Since ADDC also writes flags, that means it has to be in the next iteration. So we have to skew the loop. We can peel the first iteration and do its ADD outside the loop. We won't check for zero afterwards, but that's fine. Running one extra iteration for input = 0x80 is not a correctness problem, and not worth spending extra instructions on.

; simple looping popcount, optimized for small numbers (right shift)
; and optimized for fewer instructions inside the loop

;; input in r12,  result: r11 = popcount(r12)
xor.w     r11, r11        ; r11=0,  C=!Z=0.   (mov doesn't set flags; this saves a CLRC)

rrc.b     r12             ; C = lsb(r12);   r12 >>= 1  ; prep for first iter

.popcount_loop:            ; do{
    addc    #0, r11          ; result += C;  Clears C because r11 won't wrap
    rrc.b   r12              ; C = lsb(r12);   r12 >>= 1;  Z = (r12==0)
    jnz    .popcount_loop  ; } while( (uint8_t)r12 != 0);

    addc    #0, r11        ; we left the loop with the last bit still in C

If your input value is zero-extended, you could use rrc.w r12 so the loop works for 8 or 16-bit values. But it's no slower because it still exits after shifting all the bits out the right.

Skewing the loop and peeling the first half of the first iteration, and last half of the last iteration, only cost us one extra instruction total. (And they're still all single-word instructions.)


You mention odd/even. Do you actually just want the parity? (Whether the population count is odd or even)? That's the same thing as the horizontal XOR of all the bits.

; Needs MSP430X for rrum, otherwise you can only shift by 1 bit per instruction

;; input in r12,  result: r12=parity(r12)
;; clobbers: r11
mov.b   r12, r11       ; copy the low byte, zero the upper byte of R11 (not that it matters)
rrum     #4, r11       ; costs 4 cycles for shift-count = 4
xor     r11, r12       ; low 4 bits ^= (high 4 bits >> 4)

mov.b   r12, r11
rrum     #2, r11       ; costs 2 cycles for shift-count = 2
xor     r11, r12       ; narrow again to 2 bits

mov.b   r12, r11
rrum    #1,  r11       ; costs 1 cycle for shift-count = 1.  
xor     r11, r12       ; narrow again to 2 bits

and      #1, r12       ; clear high garbage from the high bits.

; ret  if this isn't inline

You could do this with a loop, e.g. use the popcount loop and do and #1, r12 at the end.

I feel like maybe we could save instructions if we shifted left (by 4 then 2) and did the last step (shift by 1) with an add.b r12,r12, because signed overflow (the V flag) = carry_in XOR carry_out for the sign bit. With both inputs the same for add, the existing sign bit will always be 0+0=00 or 1+1=10, so the sign bit = carry_in to the sign bit.

So with a bit-pattern like r12.b = XY??????, add.b r12,r12 sets V = X^Y, the horizontal XOR of the top two bits of the input. Because Y is the carry-in to the MSB, and X is the carry-out.

This would be good if you want to branch on it, but MSP430 doesn't seem to have a jXX that branches on V being set or not. It has JL and JGE which branch on (N XOR V) (i.e. signed compare), but N will be equal to the MSB, so N ^ V is just C, after our left-shift V sets V = N ^ C. I guess you'd have to get the flag word out of the flag register and shift/mask it! Or test that flag bit and JNZ.


推荐阅读