The steps of compilation with GCC
On Windows, most of us write code in a tool such as Visual Studio (VS). Tools of this kind are collectively called IDEs (Integrated Development Environments). Why "integrated"? Because once VS is installed, you can edit, compile, debug, and so on, all in one place. Every time we finish writing code in VS, we press F5, the program is built and started automatically, and we see its output. In fact, a good deal has to happen between writing the code and the program running properly and producing results.
This process is generally divided into four steps: preprocessing, compilation, assembly, and linking.
1. Preprocessing:
Directives handled in the preprocessing stage usually begin with #, so the #include, #define, and similar lines we write are all processed here. The preprocessor does four main things. First, it expands header files. When we include stdio.h or iostream, that one line stands for a whole header describing library code that has already been written for us; the preprocessor opens every header we include and inserts its contents into our own source. Second, it replaces all macros. We often define macros at the top of a program, and every use of them is substituted during preprocessing. Third, it removes comments. Comments are for programmers to read and have no effect on the program, so they are stripped at this stage and the compiler never sees them. Fourth, it handles conditional compilation. When we write #ifdef, the parts that do not satisfy the condition are dropped and never reach the compilation stage.
Having covered this, we should bring in GCC. We introduced vim earlier. Vim is an editor; on Linux the tools are separate, so vim does not compile your program once you have written it. That work is done by GCC, and GCC can run the four steps above one at a time, so we can see what each step does to the program. Here we use GCC to look at the result of each step.
Here we write a simple C program, test.c, on Linux:
We use the following command so that test.c goes through the preprocessing step only:
    gcc -E test.c -o test.i
Here -E tells GCC to stop once preprocessing is complete, and -o names the output file, so the file produced by running -E on test.c is called test.i. A .i file is a preprocessed C source file. If we now run ls, we can see a .i file in the folder, and we can open it.
We will see a lot of unfamiliar content. Pressing G in vim jumps to the end of the file.
Only after more than 800 lines do we reach our own program, where we find that max has been replaced by 5 and our comments have disappeared. The preceding 800-odd lines are the expanded contents of stdio.h.
2. Compilation:
Here we need to mention assembly language. C and C++ are high-level languages: they are relatively easy for humans to understand, but the machine cannot understand them. High-level languages exist precisely because machine language is too hard for humans to work with. Assembly language sits below C and above machine language; it is essentially machine language written with mnemonics. The main job of this second stage is for the compiler to check the program for syntax errors. When we write a program in VS, we likewise compile it first to see whether there are any errors or warnings, and only then run it; running the program directly to save time is not a good habit. When the program has no problems, compilation translates it into assembly language, which is one step closer to machine language.
Now we compile test.i with the -S option, which tells GCC to compile the file into assembly code but not to assemble it:
    gcc -S test.i -o test.s
Open the test.s file and you can see the assembly code. It is easy to follow if you have learned assembly language, so I won't explain much here. This is what the second stage, compilation, does.
3. Assembly:
This stage converts the assembly code generated in the second stage into machine code that our machine can execute, stored in an object file. The program must go through this stage because the machine cannot understand C, assembly, or any other language directly; it understands only machine language.
The -c option (lowercase) tells GCC to run through the third stage and produce machine code that the machine can understand:
    gcc -c test.s -o test.o
Running ls now shows the generated .o file, which is displayed in a different color from the other files.
Can we run it directly at this point? No. Although the machine can now understand our file, there is still work to do. For example, our source file may call functions declared in other header files, or functions from libraries that have already been written. If these separate pieces are not linked together, the program cannot run correctly, so we need the fourth step, linking.
Why can we simply write #include <stdio.h> and then use the printf function directly?
The reason is that these functions have already been written and placed in a library. Unless told otherwise, GCC searches a default path for this library and links against it, combining these files into a whole, after which the program can run without problems. In general, there are two types of function libraries: static libraries and dynamic libraries. Simply put, with a static library the library code we use is copied directly into our program. A static library is actually a collection of object files, each containing one function or a group of related functions. If the library holds a hundred functions and we call only printf, the linker copies the object file that contains printf, along with any other object files it depends on, from the static library into the final executable. This code is then loaded into the process's virtual address space when the program runs. Dynamic linking works differently.
4. Linking:
With dynamic linking, the library code is not inserted into the executable file; instead, the dynamic linker loads the library when the program is run, which can save a lot of system overhead. Such libraries usually have the suffix .so. The library behind the stdio functions mentioned above is libc.so.6, which is a dynamic library.
Dynamic linking makes the final executable smaller, and it saves memory when the shared object is used by several processes, because only one copy of the shared object's code needs to be kept in memory. But dynamic linking is not necessarily better than static linking; in some cases it can hurt performance. With static linking, the program loads and runs faster, because everything it needs has already been linked in at build time, but the executable is relatively large, and if the same library is used by several applications, its code is loaded several times, wasting memory. There is no fixed rule here, so whether to use dynamic or static linking still has to be decided case by case.