Issue
I am using as and gcc to assemble and create executables of ARM assembly programs, as recommended by this tutorial, as follows:
Given an assembly source file, program.s, I run:
as -o program.o program.s
Then:
gcc -o program program.o
However, running gcc on the assembly source directly like so:
gcc -o program program.s
yields the same result.
Does gcc call as behind the scenes? Is there any reason at all to use both as and gcc, given that gcc alone is able to produce an executable from source?
I am running this on a Raspberry Pi 3, Raspbian Jessie (a Debian derivative), gcc 4.9.2, as 2.25.
Solution
GCC calls all kinds of things behind the scenes; not just as but ld as well. This is pretty easy to instrument if you want to prove it: replace the real as, ld, and other binaries with ones that, say, print out their command line, then run GCC and see which binaries get called.
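When you use GCC as an assembler, it can run the source through the C preprocessor first (the driver does this for files named with a capital .S, or when given -x assembler-with-cpp; a plain .s file goes straight to the assembler), so you can do some fairly disgusting things like this: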
start.s
//this is a comment
@this is a comment
#define FOO BAR
.globl _start
_start:
mov sp,#0x80000
bl hello
b .
.globl world
world:
bx lr
And to see more of what is going on, here are other files:
so.h
unsigned int world ( unsigned int, unsigned int );
#define FIVE 5
#define SIX 6
so.c
#include "so.h"
unsigned int hello ( void )
{
unsigned int a,b,c;
a=FIVE;
b=SIX;
c=world(a,b);
return(c+1);
}
build
arm-none-eabi-gcc -save-temps -nostdlib -nostartfiles -ffreestanding -O2 start.s so.c -o so.elf
arm-none-eabi-objdump -D so.elf
producing
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
This is a very simple project. Here is so.i, the output of the preprocessor, which goes and gets the include files and expands the defines:
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
# 1 "so.h" 1
unsigned int world ( unsigned int, unsigned int );
# 4 "so.c" 2
unsigned int hello ( void )
{
unsigned int a,b,c;
a=5;
b=6;
c=world(a,b);
return(c+1);
}
Then GCC calls the actual compiler (whose program name is not gcc; it is cc1, shown further down).
That produces so.s:
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global hello
.syntax unified
.arm
.fpu softvfp
.type hello, %function
hello:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
mov r1, #6
mov r0, #5
bl world
pop {r4, lr}
add r0, r0, #1
bx lr
.size hello, .-hello
.ident "GCC: (GNU) 6.3.0"
That is then fed to the assembler to make so.o. Then the linker is called to turn the object files into so.elf.
Now, you can do most of the calls directly. That doesn't mean that these programs don't have other programs they call; GCC still calls one or more programs to actually do the compile.
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -S so.c
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Giving the same result:
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
Using -S with GCC and then running the assembler yourself does feel a bit wrong. Letting GCC go straight to the object file with -c feels more natural:
arm-none-eabi-gcc -O2 -c so.c -o so.o
Note that we didn't provide a linker script; the toolchain has a default one. We can control that, and depending on what this is being aimed at, perhaps we should.
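GNU ld will print that built-in script if asked. A small sketch, assuming the ld on the PATH is GNU ld (for the cross toolchain it would be arm-none-eabi-ld); the output file name default_ldscript.txt is made up:

```shell
#!/bin/sh
# Sketch: dump the linker's built-in default linker script.
# Assumes GNU ld; other linkers may not support this output.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# --verbose makes GNU ld print its version, the supported
# emulations, and the default linker script it would use.
ld --verbose > default_ldscript.txt

# The script proper contains a SECTIONS command.
grep -n 'SECTIONS' default_ldscript.txt | head -n 1
```

To supply your own script instead, GNU ld takes -T followed by the script file name.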
I am not happy to see that the current version of as is tolerant of C-style comments and the like. It didn't use to be that way; it must be new with the latest release.
Thus the term "toolchain": it is a number of tools chained together, one feeding the next in order.
Not all compilers take the assembly language step directly. Some compile to intermediate code, and then another tool turns that compiler-specific intermediate code into assembly language, after which some assembler is called. (GCC's intermediate code lives in internal tables during the compile step, whereas with clang/LLVM you can ask it to emit that intermediate code and then go from there to assembly language for one of the targets.)
Some compilers go straight to machine code and don't stop at assembly language. This is likely one of those "climb the mountain just because it is there" things vs "go around". Like writing an operating system purely in assembly language.
For any decent-sized project and a tool that can support it, you are going to have a linker, and an assembler is the first tool you make to support a new-to-you target. The processor (chip or IP or both) vendor is going to have an assembler, and then other tools available as well.
Try compiling even the above simple C program by hand using assembly language. Then try it again without using assembly language, by hand, using just machine code. You will find that using assembly language as an intermediate step is far more sane for compiler developers, along with the fact that it has been done this way forever, which is also a good reason to keep doing it this way.
If you wander about in the gnu toolchain directory you are using, you may find programs like cc1:
./libexec/gcc/arm-none-eabi/6.3.0/cc1 --help
The following options are specific to just the language Ada:
None found. Use --help=Ada to show *all* the options supported by the Ada front-end.
The following options are specific to just the language AdaSCIL:
None found. Use --help=AdaSCIL to show *all* the options supported by the AdaSCIL front-end.
The following options are specific to just the language AdaWhy:
None found. Use --help=AdaWhy to show *all* the options supported by the AdaWhy front-end.
The following options are specific to just the language C:
None found. Use --help=C to show *all* the options supported by the C front-end.
The following options are specific to just the language C++:
-Wplacement-new
-Wplacement-new= 0xffffffff
The following options are specific to just the language Fortran:
Now if you run that cc1 program against the so.i file you saved with -save-temps, you get so.s, the assembly language file.
You could probably continue to dig into the directory or the gnu tools sources to find even more goodies.
Note this question has been asked before many times here at Stack Overflow in various ways.
Also note main() isn't anything special, as I have demonstrated. In some compilers it might be, but I can make programs that don't require that function name.
Answered By - old_timer