Wednesday, February 7, 2024

[SOLVED] what stops GCC __restrict__ qualifier from working

Issue

Here is some fairly straightforward code, compiled with -O2 (gcc 4.8.5):

unsigned char  * linebuf;
int yuyv_tojpegycbcr(unsigned char * buf, int w)
{
    int  col;
    unsigned char * restrict pix = buf;
    unsigned char * restrict line = linebuf;

    for (col = 0; col < w - 1; col += 2)
    {
        line[col*3] = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}

And here is the corresponding assembly:

0000000000000000 <yuyv_tojpegycbcr>:
   0:   83 fe 01                cmp    $0x1,%esi
   3:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # a <yuyv_tojpegycbcr+0xa>
   a:   7e 4e                   jle    5a <yuyv_tojpegycbcr+0x5a>
   c:   83 ee 02                sub    $0x2,%esi
   f:   31 d2                   xor    %edx,%edx
  11:   d1 ee                   shr    %esi
  13:   48 8d 74 76 03          lea    0x3(%rsi,%rsi,2),%rsi
  18:   48 01 f6                add    %rsi,%rsi
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   0f b6 0f                movzbl (%rdi),%ecx
  23:   48 83 c2 06             add    $0x6,%rdx
  27:   48 83 c7 04             add    $0x4,%rdi
  2b:   48 83 c0 06             add    $0x6,%rax
  2f:   88 48 fa                mov    %cl,-0x6(%rax)
  32:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
  36:   88 48 fb                mov    %cl,-0x5(%rax)
  39:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
  3d:   88 48 fc                mov    %cl,-0x4(%rax)
  40:   0f b6 4f fe             movzbl -0x2(%rdi),%ecx
  44:   88 48 fd                mov    %cl,-0x3(%rax)
  47:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
  4b:   88 48 fe                mov    %cl,-0x2(%rax)
  4e:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
  52:   88 48 ff                mov    %cl,-0x1(%rax)
  55:   48 39 f2                cmp    %rsi,%rdx
  58:   75 c6                   jne    20 <yuyv_tojpegycbcr+0x20>
  5a:   31 c0                   xor    %eax,%eax
  5c:   c3                      retq   

When compiled without the restrict qualifier, the output is identical: lots of intermixed loads and stores. Some values are loaded twice, and it looks like no optimisation happened. If pix and line do not alias, I expect the compiler to be smart enough to, among other things, load pix[1] and pix[3] only once.

Do you know of anything that could cause the restrict qualifier to be ignored?

PS: With a newer gcc (4.9.2) on another architecture (ARMv7), the result is similar. Here is a test script to compare the generated code with and without restrict.

#!/bin/sh
gcc -c -o test.o -std=c99 -O2 yuyv_to_jpegycbcr.c
objdump -d test.o > test.S


gcc -c -o test2.o -O2 -D restrict='' yuyv_to_jpegycbcr.c
objdump -d test2.o > test2.S

Solution

Put the restrict on the function parameters rather than the local variables.
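
As a minimal sketch of that suggestion, the loop from the question could be rewritten with restrict on the parameters. The name yuyv_tojpegycbcr_params and the extra line parameter are hypothetical: the original function reads its destination from the global linebuf, so passing it in as an argument is an assumption made here to give the qualifier a parameter to sit on. Whether a particular compiler version then drops the duplicate loads of pix[1] and pix[3] still has to be checked in the generated assembly.

```c
/* Hypothetical variant of the function from the question.
   Assumption: the destination is passed as a parameter instead of
   being read from the global linebuf. Needs -std=c99 or later. */
int yuyv_tojpegycbcr_params(const unsigned char *restrict pix,
                            unsigned char *restrict line, int w)
{
    int col;

    /* restrict promises that pix and line never overlap, so the
       compiler is allowed to reuse a loaded byte across stores. */
    for (col = 0; col < w - 1; col += 2)
    {
        line[col*3]     = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}
```

A caller would then pass both buffers explicitly, for example yuyv_tojpegycbcr_params(buf, linebuf, w), and must guarantee that they really do not overlap.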

In my experience, most compilers (including GCC) make use of restrict only when it is specified on the function parameters. Uses on local variables within a function are ignored.

I suspect this has to do with aliasing analysis being done at the function level rather than the basic-block level, but I have no evidence to back this up. Furthermore, it probably varies by compiler and compiler version.

Either way, these sorts of things are pretty finicky to rely on. So if the performance matters, either you optimize it manually, or you remember to revisit it every time you upgrade or change compilers.
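
One way the manual route could look, as a sketch rather than a tuned implementation: read each source byte into a local variable once, before any store, so the result no longer depends on what the compiler can prove about aliasing. The function name yuyv_tojpegycbcr_manual and the local names y0, cb, y1 and cr are hypothetical. Note that this gives the same output as the original only when the two buffers do not overlap, which is the assumption the question already makes.

```c
unsigned char *linebuf;  /* destination buffer, as in the question */

/* Hypothetical hand-optimized variant: every source byte is loaded
   exactly once per iteration, with or without restrict. */
int yuyv_tojpegycbcr_manual(unsigned char *buf, int w)
{
    unsigned char *pix = buf;
    unsigned char *line = linebuf;
    int col;

    for (col = 0; col < w - 1; col += 2)
    {
        /* Loads first: four bytes of one YUYV pixel pair. */
        unsigned char y0 = pix[0];
        unsigned char cb = pix[1];
        unsigned char y1 = pix[2];
        unsigned char cr = pix[3];

        /* Stores second: two Y/Cb/Cr triplets sharing cb and cr. */
        line[0] = y0;
        line[1] = cb;
        line[2] = cr;
        line[3] = y1;
        line[4] = cb;
        line[5] = cr;

        pix += 4;
        line += 6;
    }
    return 0;
}
```

Because the loads are grouped ahead of the stores in the source itself, the duplicate reads of pix[1] and pix[3] are gone regardless of compiler or version.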



Answered By - Mysticial
Answer Checked By - Timothy Miller (WPSolving Admin)