gcc main.c | Analysis for beginners | Linux
Have you ever wondered what happens when you compile a *.c file?
What does “compile” mean, and why do we do it?
As always… let’s do this in simple words.
When you write C code, the computer can’t understand the instructions you just wrote, so it needs a translation into something closer to its own language (binary).
This translation from your code to machine code takes place in a process called compilation.
In this case, we are going to focus on the C compiler named gcc.
your_code_in_c.c --> gcc --> machine_code
The process
The first thing to understand is that this is intended for beginners, so if you are a compiler pro, please leave in the comments all the knowledge you want to share.
We can divide this process into 4 main steps: preprocessor, compiler, assembler, and linker.
Hands-on
To understand this, the first thing you are going to do is create a simple C file, which we are going to name main.c.
$ cat main.c
#include <stdio.h>

/**
 * This is a comment
 */
int main(void)
{
	return (0);
}
Preprocessor
The preprocessor gathers all the code the program needs. That means:
- Get the code from stdio.h
- Remove all the comments.
Luckily for us, gcc lets us see this step by step. We are going to use the flag -E so that gcc stops after the preprocessor stage, and the flag -o to name the output file.
$ gcc -E main.c -o output
Now let’s see the content of the output file.
Because the first line is #include <stdio.h>, the preprocessor puts the contents of that header file at the beginning of the output file.
$ head output
# 1 "main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "main.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 374 "/usr/include/features.h" 3 4
If you want to read the whole file you can use:
$ less output
At the end you will find:
$ tail output
# 2 "main.c" 2
int main(void)
{
	return (0);
}
Compiler
The compiler step translates all the C code into assembly code.
That means everything you wrote in C is translated into a language closer to the hardware.
$ gcc -S main.c -o output
$ cat output
	.file	"main.c"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4"
	.section	.note.GNU-stack,"",@progbits
Assembler
When the compiler is done, the assembler is next. It assembles the code from the previous step and produces object code.
We are going to use the flag -c to see what the result looks like at this point, without linking anything.
$ gcc -c main.c -o output
Now you can check the output file:
$ less output
^?ELF^B^A^A^@^@^@^@^@^@^@^@^@^A^@>^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^A^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@@^@^K^@^H^@UH<89><E5><B8>^@^@^@^@]<C3>^@GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4^@^@
^T^@^@^@^@^@^@^@^AzR^@^Ax^P^AESC^L^G^H<90>^A^@^@^\^@^@^@^\^@^@^@^@^@^@^@^K^@^@^@^@A^N^P<86>^BC^M^FF^L^G^H^@^@^@^@.symtab^@.strtab^@.shstrtab^@.text^@.data^@.bss^@.comment^@.note.GNU-stack^@.rela.eh_frame^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ESC^@^@^@^A^@^@^@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@^@^@^K^@^@^@^@^@
As you can see, there is an ELF marker at the beginning. This is a format the computer can understand much better than C, so it can follow the instructions you gave it.
Linker
Finally, the linker links all the libraries that you call in your code. This connects your program with the extra, already-compiled code that your C file relies on.
The final product is your brand new executable file.
$ gcc main.c -o output
$ ./output
$ echo $?
0
Extra
Try to answer these questions:
- What is an ELF?
- What does $? do?
- What’s the difference between compiled and interpreted?
Hope this helps you! See you later… and remember… there's nothing like the gcc man page.