Thursday, February 3, 2022

[SOLVED] Why does this simple counting loop in assembly start behaving erratically when I add a random floating-point instruction?

February 03, 2022 arm, assembly, floating-point, raspberry-pi

Issue

I am working on a bare-metal OS for the Raspberry Pi Model B, which has an ARM1176JZF-S processor. While implementing the sine function for a math library, I ran into something very strange, which I've whittled down to a fairly small minimal reproducible example.

The following code counts up from zero to four, and prints out each number with spaces in between:

    mov     r4, #0              // Initialize counter to 0

c_loop$:
    ldr     r0, =IntString      // Convert counter to a string
    mov     r1, r4
    bl      int_to_str

    ldr     r0, =IntString      // Print the string
    ldr     r1, =0x00000FF0     // (Green text on black background)
    bl      print

    ldr     r0, =Space          // Print a space
    ldr     r1, =0x00000FF0     // (Green text on black background)
    bl      print

    mov     r5, #0x1000000      // Pause for a beat
c_pause$:
    subs    r5, #1
    bne     c_pause$

    add     r4, #1              // Increment counter
    cmp     r4, #5              // Repeat until counter = 5
    blt     c_loop$

halt:                           // Wait forever
    b       halt

The functions int_to_str and print were both written by me, and work fine. To be clear, they are not printing to any kind of output stream; they just write pixels in the shape of numbers directly to a frame buffer, which I got from the GPU through the mailbox system. The label IntString is a space for me to store the conversion of the counter to a string so I can print it out, and the label Space points to a string that's just a single space. This code works as intended and I see the numbers displayed on the screen.

Here's what's odd. Have a look at this floating-point operation:

    vadd.f32    s2, s0, s1      // What the heck is happening here?

When I add this instruction to the loop right before the line that increments the counter, the behavior changes entirely. Instead of printing "0, 1, 2, 3, 4", I now see "0, 1, 0, 1, 0, 1, ..." repeating forever. Why is this happening? Why does a floating-point instruction have any effect on this code at all?

Important additional info: A while ago, I was working on some code to draw a Mandelbrot fractal on the screen, using floating-point arithmetic for the calculations. Back then I believed that my Raspberry Pi had a Cortex-A7 processor (which is what the newer models have), so I turned to the Cortex-A7 Floating-Point Unit Technical Reference Manual, which says:

To use the Cortex-A7 FPU in Secure state and Non-secure state, first define the NSACR and then define the CPACR and FPEXC registers to enable the Cortex-A7 FPU.

It gave the following code snippet to accomplish this task:

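    MRC     p15, 0, r0, c1, c1, 2   // Read NSACR
    ORR     r0, r0, #3<<10          // Allow Non-secure access to CP10 and CP11 (the FPU)
    MCR     p15, 0, r0, c1, c1, 2   // Write NSACR back

    LDR     r0, =(0xF << 20)        // CP10 and CP11: full access (overwrites CPACR)
    MCR     p15, 0, r0, c1, c0, 2   // Write CPACR

    MOV     r3, #0x40000000         // FPEXC.EN is bit 30
    VMSR    FPEXC, r3               // Turn the FPU on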

For some reason, this worked, and my Mandelbrot fractal appeared. This snippet is also present in the program I'm working on today, directly above the code shown. When I remove it, I get a different unexpected behavior: the program prints "0, 0, 0, ...", an infinite series of just 0's instead of 0's and 1's.
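The enable sequence above touches three control registers. The constants it writes can be checked on a host machine. The following is a minimal C sketch, not code from the post. It assumes the field positions given in the ARM Architecture Reference Manual:

- NSACR bits 10 and 11 grant Non-secure access to coprocessors 10 and 11.
- CPACR bits 20 to 23 hold the access rights for CP10 and CP11.
- FPEXC bit 30 is the EN (enable) bit.

The helper names are hypothetical.

```c
#include <stdint.h>

/* Field positions (per the ARM ARM); illustrative host-side constants only. */
#define NSACR_CP10      (1u << 10) /* Non-secure access to coprocessor 10 */
#define NSACR_CP11      (1u << 11) /* Non-secure access to coprocessor 11 */
#define CPACR_CP10_FULL (3u << 20) /* CP10: privileged and user access */
#define CPACR_CP11_FULL (3u << 22) /* CP11: privileged and user access */
#define FPEXC_EN        (1u << 30) /* FPEXC.EN: enable the VFP unit */

/* Hypothetical model of "MRC; ORR r0, r0, #3<<10; MCR" on NSACR:
   a read-modify-write that preserves the other bits. */
static uint32_t nsacr_enable_vfp(uint32_t nsacr) {
    return nsacr | NSACR_CP10 | NSACR_CP11;
}

/* Hypothetical model of "LDR r0, =(0xF << 20)": the whole CPACR is
   replaced, so any other coprocessor access bits end up cleared. */
static uint32_t cpacr_vfp_full_access(void) {
    return CPACR_CP10_FULL | CPACR_CP11_FULL;
}
```

Note that the NSACR step preserves existing bits, while the CPACR step overwrites the register outright.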

More details: My best guess about what's going on here is that the s0 and s1 floating point registers initially contain garbage, and that adding them together can raise an exception. This would explain a detail I haven't mentioned yet, which is that the code occasionally works even with the floating point instruction included -- maybe one time in five.
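One way to reason about the "garbage in s0 and s1" theory is to look at which IEEE 754 category an arbitrary 32-bit pattern falls into. The following is a small C sketch, assuming the standard binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits); it is not part of the original debugging.

Zero and the largest finite float, the two values tried below, are both ordinary cases. A random leftover pattern can also be subnormal or NaN. Those are the kinds of operands an ARM11 VFP may, outside flush-to-zero mode, hand off to software instead of handling in hardware.

```c
#include <stdint.h>

/* Categories of an IEEE 754 single-precision bit pattern. */
enum f32_class { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INFINITE, F32_NAN };

/* Classify a raw 32-bit pattern (e.g. whatever is left in s0 after reset)
   using only its exponent and fraction fields; the sign bit is ignored. */
static enum f32_class classify_f32_bits(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu; /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFFu;     /* 23-bit fraction */

    if (exponent == 0)
        return fraction == 0 ? F32_ZERO : F32_SUBNORMAL;
    if (exponent == 0xFFu)
        return fraction == 0 ? F32_INFINITE : F32_NAN;
    return F32_NORMAL;
}
```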

In order to test this theory, I tried setting all registers involved to zero right before the counting loop begins:

    mov     r0, #0
    vmov    s0, r0
    vmov    s1, r0
    vmov    s2, r0

And lo and behold, the loop worked again. However, as a further test, I decided to set both s0 and s1 to the maximum value a float can hold, reasoning that this should yield an overflow error and cause the unexpected behavior to return:

    ldr     r0, =0b01111111011111111111111111111111
    vmov    s0, r0
    vmov    s1, r0
    vmov    s2, r0

But this too leads to the correct counting behavior!
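The binary literal loaded into r0 is 0x7F7FFFFF, the bit pattern of FLT_MAX. The following host-side C sketch confirms this and shows what default IEEE 754 arithmetic does with FLT_MAX + FLT_MAX. It is an illustration under the assumption of a standard IEEE 754 host.

The sum rounds to +infinity and only sets the overflow flag; a trap is taken only if the overflow trap is enabled. So an overflow on its own need not reproduce the problem. That may be why this test still counted correctly.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Same value as the binary literal loaded into r0 above. */
static const uint32_t MAX_FLOAT_BITS = 0x7F7FFFFFu;

/* Reinterpret a 32-bit pattern as a float, as "vmov s0, r0" does. */
static float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f); /* well-defined type pun */
    return f;
}

/* Single-precision add, standing in for "vadd.f32 s2, s0, s1". */
static float add_f32(float a, float b) {
    return a + b;
}
```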

I'm at a loss for what's going on here. What's causing this?

Update: I've just noticed a problem. The command I'm using to assemble .s files into .o files is this:

    arm-none-eabi-as -o $@ $< -mfpu=vfpv4 -mcpu=cortex-a72 -mfloat-abi=hard

This has two issues. First, vfpv4 is wrong, since the Model B has a VFPv2 unit. Second, cortex-a72 is wrong, since the Model B has an ARM1176JZF-S.

Fixing the first of these two issues doesn't change any of the behavior mentioned above (I re-tried each example and got the same results). The second issue seems more serious, however, since the man page for arm-none-eabi-as doesn't list the model B's processor type as one of the options. I will investigate further and post an update once I know more.

Solution

I have fixed this now. This web page explains what needs to be done to set up floating point numbers, and I was missing this part of the process:

    @; load the status register
    fmrx    r0, fpscr
    @; enable flush-to-zero (bit 24)
    orr     r0, #0x01000000
    @; disable traps (bits 8-12 and bit 15)
    bic     r0, #0x9f00
    @; save the status register
    fmxr    fpscr, r0
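The orr/bic pair above can be modeled in C to check the mask. The sketch below is not code from the answer. It assumes the VFPv2 FPSCR layout:

- FZ is bit 24.
- The trap enables IOE, DZE, OFE, UFE and IXE are bits 8 to 12.
- IDE is bit 15.

Together the trap enables make exactly 0x9F00. The function name is hypothetical.

```c
#include <stdint.h>

/* FPSCR fields touched by the snippet above (VFPv2 layout). */
#define FPSCR_FZ  (1u << 24) /* flush-to-zero */
#define FPSCR_IOE (1u << 8)  /* invalid-operation trap enable */
#define FPSCR_DZE (1u << 9)  /* divide-by-zero trap enable */
#define FPSCR_OFE (1u << 10) /* overflow trap enable */
#define FPSCR_UFE (1u << 11) /* underflow trap enable */
#define FPSCR_IXE (1u << 12) /* inexact trap enable */
#define FPSCR_IDE (1u << 15) /* input-denormal trap enable */
#define FPSCR_TRAP_ENABLES \
    (FPSCR_IOE | FPSCR_DZE | FPSCR_OFE | FPSCR_UFE | FPSCR_IXE | FPSCR_IDE)

/* Hypothetical mirror of the orr/bic applied to the value read by fmrx. */
static uint32_t fpscr_flush_to_zero_no_traps(uint32_t fpscr) {
    fpscr |= FPSCR_FZ;            /* orr r0, #0x01000000 */
    fpscr &= ~FPSCR_TRAP_ENABLES; /* bic r0, #0x9f00 */
    return fpscr;
}
```

If I read the ARM1176JZF-S documentation correctly, full RunFast mode also expects the Default NaN bit (DN, bit 24 + 1 = bit 25) to be set. The snippet above leaves that bit unchanged.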

The page explains:

The default floating point mode on the ARM11 is to implement the most common floating point operations in hardware, and delegate to software for special cases. This is done by raising an unsupported operation exception, called a trap, in which you, the programmer, are supposed to figure out what went wrong (e.g., an underflow), calculate the correct result, and resume the program.

If, like me, you don't feel like implementing a bunch of floating point operations, there is an alternative: RunFast mode, or Flush-to-zero mode (which nearly means the same thing). This is a pure hardware floating point implementation which is not-quite IEEE 754-compliant. [...]

I haven't implemented any such handlers, so it looks like this configuration is what I need. I don't have a full mental picture of why this was causing the exact problem I was having, but I'm no longer surprised that there was a problem.
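To make "flush-to-zero" concrete, here is a software model of what it does to a single operand. It is a sketch under general flush-to-zero semantics, not a description of the VFP11's exact behavior. A subnormal value (exponent field 0, non-zero fraction) is replaced by a zero of the same sign, and every other value passes through unchanged. The hardware also flushes results that would otherwise be subnormal, which this operand-only model omits.

```c
#include <stdint.h>

/* Hypothetical operand-side model of flush-to-zero on a binary32 pattern. */
static uint32_t flush_to_zero_bits(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0 && fraction != 0)
        return bits & 0x80000000u; /* subnormal: keep only the sign bit */
    return bits;                   /* zero, normal, inf, NaN: unchanged */
}
```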

Answered By - MegaWidget

Answer Checked By - David Goodson (WPSolving Volunteer)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
