Compilation

TeachGuru Foundation (Regd.)
Bangalore 560054
Contact: +91 9845456000, 9632203409
E-mail: teachgurufoundation@gmail.com

About the Author: Subhash.K.U works as Senior Software Engineer at Mascon Global Ltd. and a consultant for Motorola India Pvt. Ltd. He is the author of Object Oriented Programming with C++ published by Pearson Sanguine. He is a passionate programmer, speaker, teacher and an author. He has delivered lectures at various schools and colleges on technical and motivational topics. He can be reached at subhash.writes@gmail.com. You can join his network on Orkut.

Overview of C/C++ compilation process in GNU/Linux:

Compilation is the process by which the human-readable source code of a program (here written in C or C++) is converted into machine-executable code. The final output, produced after linking, is known as the executable. In GNU/Linux, C and C++ programs are compiled with the gcc and g++ commands: gcc is the GNU C compiler and g++ is the GNU C++ compiler. This article explains how gcc and g++ transform source files into an executable.

Let us work with a C file named 1.c and a C++ file named 1.cpp throughout this article. Both files contain the simple and famous Hello World program.

    /* 1.c */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello World\n");
        return 0;
    }

    // 1.cpp
    #include <iostream>

    int main()
    {
        std::cout << "Hello World" << std::endl;
        return 0;
    }

The command to compile a C program in GNU/Linux is gcc 1.c, and a C++ program is compiled with the command g++ 1.cpp. The compilation process involves a series of steps, and understanding them is important for any professional programmer. The steps are summarized in Figure 1 below, and a detailed explanation follows. Readers should pay special attention to this section, as it is frequently asked about in examinations and interviews. Although the explanation is based on Linux-like operating systems, the same sequence of stages applies on most other operating systems as well.

    1.c
     |   C preprocessor (1.c to 1.i)
     |   C compiler proper (1.i to 1.s)
     |   Assembler (1.s to 1.o)
     |   Linker (1.o to a.out)
     v
    a.out

Figure 1

The sequence of steps involved in the compilation of a C/C++ program is as follows:

1. Preprocessing
2. Compiling
3. Assembling
4. Linking

All the above steps happen automatically in a single invocation of gcc or g++, as in the command gcc 1.c or g++ 1.cpp. In this article, however, let us split up these steps and see how each one works by invoking a dedicated command for each operation.

Preprocessing: This is the first step of the compilation process. During this step, the source code is expanded according to the preprocessor directives it contains (for example, #include <stdio.h> or #define A 1). The expanded source code is stored in an intermediate file: 1.i for the C file 1.c and 1.ii for the C++ file 1.cpp. Keep in mind that the expanded files 1.i and 1.ii are not saved on disk unless the -save-temps option is added to the command. The other way of saving the expanded source code is to run only the preprocessor with the -E option and redirect its output. In GNU/Linux, this can be tried by typing any of the following commands at the command prompt:

    gcc -save-temps 1.c
    g++ -save-temps 1.cpp
    gcc -E 1.c > 1.i
    gcc -E 1.cpp > 1.ii

For preprocessing, instead of invoking the gcc/g++ compilers, we can also generate the expanded source code with the preprocessor command cpp:

    cpp -E 1.c > 1.i
    cpp -E 1.cpp > 1.ii

Doing a cat on 1.i or 1.ii will display the entire expanded source code on the console.

Compilation: This is the second step of the compilation process. During this step, the expanded source code is passed to the compiler proper, which checks it for syntax errors. If the code violates the syntax of the programming language, the compilation process stops and the error messages are displayed on the console. If there are no errors, the expanded source code is converted to assembly code for the target processor (for example, the Pentium family of Intel processors). The assembly code file has the extension .s. In other words, compiling the expanded source code 1.i or 1.ii generates another file, 1.s, that contains the assembly code. In GNU/Linux, we can see this by typing any of the following commands:

    gcc -save-temps 1.c
    g++ -save-temps 1.cpp
    gcc -S 1.i
    g++ -S 1.ii

Open the file 1.s and you will find the assembly code for the processor of your computer system. The assembly code below was obtained when the source file 1.c was compiled on an Intel x86-64 machine. Note that the compiler has replaced the call to printf() with a call to puts(), since the string contains no format specifiers and ends with a newline.

    /* 1.s */
        .file   "1.c"
        .section    .rodata
    .LC0:
        .string "Hello World"
        .text
    .globl main
        .type   main, @function
    main:
    .LFB2:
        pushq   %rbp
    .LCFI0:
        movq    %rsp, %rbp
    .LCFI1:
        movl    $.LC0, %edi
        call    puts
        movl    $0, %eax
        leave
        ret
    .LFE2:
        .size   main, .-main
        .section    .eh_frame,"a",@progbits
    .Lframe1:
        .long   .LECIE1-.LSCIE1
    .LSCIE1:
        .long   0x0
        .byte   0x1
        .string "zR"
        .uleb128 0x1
        .sleb128 -8
        .byte   0x10
        .uleb128 0x1
        .byte   0x3
        .byte   0xc
        .uleb128 0x7
        .uleb128 0x8
        .byte   0x90
        .uleb128 0x1
        .align 8
    .LECIE1:
    .LSFDE1:
        .long   .LEFDE1-.LASFDE1
    .LASFDE1:
        .long   .LASFDE1-.Lframe1
        .long   .LFB2
        .long   .LFE2-.LFB2
        .uleb128 0x0
        .byte   0x4
        .long   .LCFI0-.LFB2
        .byte   0xe
        .uleb128 0x10
        .byte   0x86
        .uleb128 0x2
        .byte   0x4
        .long   .LCFI1-.LCFI0
        .byte   0xd
        .uleb128 0x6
        .align 8
    .LEFDE1:
        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
        .section    .note.GNU-stack,"",@progbits

Assembling: This is the third stage of the compilation process. Here, the assembler converts the assembly language code into machine-dependent code (opcodes) and creates an object file with the extension .o. Each object file contains a table known as the symbol table. The symbol table contains the name, type and relative address (not the final address) of each global variable; the name and relative address of each function defined in the program; and the names of external functions such as printf() and scanf().

It is important to keep in mind that local variables do not appear in the symbol table, so no relative addresses are generated for them during compilation. Instead, the compiler emits instructions that allocate memory for the local variables at run time. Local variables are created and destroyed in the stack region of memory as functions are entered and left.

Suppose we have the files 1.c and 2.c; the object files generated for them during assembling are 1.o and 2.o. This can be seen in GNU/Linux by typing one of the following commands:

    gcc -c 1.s               (for C programs)
    g++ -c 1.s               (for C++ programs)
    gcc -save-temps 1.c      (for C programs)
    g++ -save-temps 1.cpp    (for C++ programs)

For assembling, instead of invoking the gcc/g++ compilers, we can generate the object code directly with the command as -o 1.o 1.s. as is the assembler that gcc/g++ use internally to convert assembly code into machine code and generate the object file. Note that the assembler requires the file with the .s extension to be available.

Linking: Linking is the final step in the compilation process. The linker combines all the object files (files with the .o extension) into one final executable file. In Linux, the default name of the executable is a.out. We can choose another name by typing the command gcc -o <my_executable_name> 1.c for C programs and g++ -o <my_executable_name> 1.cpp for C++ programs.

Let us imagine a situation where we write two C++ program files, 1.cpp and 2.cpp. In 1.cpp we define main(), within which we call another function add(). The definition of add(), however, is placed in 2.cpp. During linking, the linker uses the symbol tables of the object files produced by the assembler to build a global symbol table, and maps the function call in one file to its definition in the other. After linking we therefore get the final executable, which contains a global symbol table holding the name, type and relative address of each global and static variable; the name and address of each function, whether defined in the same file or in a different file; and the name (but not the address) of external functions such as printf() and scanf(). Because printf() and scanf() are defined in separate libraries, their addresses are not available during compilation. With the default dynamic linking, they are resolved at run time instead. In GNU/Linux, to see the global symbol table, type the command nm a.out or nm <my_executable_name>.

For linking, instead of invoking the gcc/g++ compilers, we can explicitly invoke the GNU linker ld, which gcc/g++ invoke internally. In practice, an executable program requires many external functions from the system and C runtime (crt) libraries, so the link command that gcc/g++ run internally is quite complex. A sample invocation of ld on my Fedora 9, x86-64 machine to link the above 1.c program looks like this:

    $ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
        /usr/lib/gcc/x86_64-redhat-linux/4.3.0/../../../../lib64/crt1.o \
        /usr/lib/gcc/x86_64-redhat-linux/4.3.0/../../../../lib64/crti.o \
        /usr/lib/gcc/x86_64-redhat-linux/4.3.0/crtbegin.o \
        -L/usr/lib/gcc/x86_64-redhat-linux/4.3.0 \
        -L/usr/lib/gcc/x86_64-redhat-linux/4.3.0 \
        -L/usr/lib/gcc/x86_64-redhat-linux/4.3.0/../../../../lib64 \
        -L/lib/../lib64 -L/usr/lib/../lib64 \
        -L/usr/lib/gcc/x86_64-redhat-linux/4.3.0/../../.. \
        -lgcc --as-needed -lgcc_s --no-as-needed -lc \
        -lgcc --as-needed -lgcc_s --no-as-needed \
        /usr/lib/gcc/x86_64-redhat-linux/4.3.0/crtend.o \
        /usr/lib/gcc/x86_64-redhat-linux/4.3.0/../../../../lib64/crtn.o 1.o

After this command, the final executable a.out is created in the current directory.

The -v option can be added to the gcc/g++ command to display, on stderr (standard error output), all the commands executed to run the stages of compilation. It also prints the version number of the compiler driver program, of the preprocessor and of the compiler proper.

After compilation, the executable is ready and is stored on the hard disk. When the executable is run, an operating system program known as the program loader loads it from the hard disk into RAM. Once the executable has been brought into RAM, it is known as a process. The relative addresses present in the global symbol table are resolved to actual memory addresses, and the CPU starts executing by fetching instructions from the program image in RAM.
