An Introduction to compilers

The origin

The term compiler was coined in the early 1950s by Grace Murray Hopper. At the time, translation was viewed as the compilation of a sequence of machine-language subprograms selected from a library. The activity was called automatic programming, and there was widespread doubt that it could ever become real.

What is a compiler?

A compiler is a program that translates a program written in a programming language (the source program) into some machine language that the computer can use (the target program). Compilers can be distinguished in two ways:

Machine code generator
- A compiler can generate different kinds of machine code (target code), for example:
  - Pure machine code
  - Augmented machine code
  - Virtual machine code
    - VM code is composed of virtual instructions rather than the instructions of a real machine. This approach is attractive because the same code can run on different computers, as long as each one has an interpreter for the virtual machine. Java is an example: the Java Virtual Machine (JVM) executes simple bytecode instructions, which makes Java programs easy to port.
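
The idea of virtual machine code can be illustrated with a minimal sketch of a stack-based virtual machine. The instruction names (`PUSH`, `ADD`, `MUL`) and the `run` function are invented for this example and are not real JVM bytecodes; the point is only that the same list of virtual instructions runs wherever the small interpreter loop exists.

```python
# A toy stack-based virtual machine. The opcodes are hypothetical and
# chosen for illustration; they are not real JVM instructions.

def run(program):
    """Execute a list of (opcode, operand) pairs and return the top of the stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Virtual code a compiler might emit for the expression 2 + 3 * 4.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("PUSH", 4),
    ("MUL", None),
    ("ADD", None),
]

result = run(program)
print(result)  # prints 14
```
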
Target code generator
- Compilers also differ in the format of the target code they generate. The various formats are:
  - Assembly
    - Generating assembly code simplifies and modularizes translation, because some decisions about instructions and data addresses can be left to the assembler.
  - Relocatable binary
    - Here the compiler does not generate assembly language; it generates target code directly in a relocatable binary format, in which addresses are not yet fixed. This is the format most assemblers produce, and a compiler can also generate it directly.
  - Absolute binary

The interpreter

An interpreter is another kind of language processor. It shares the syntactic and semantic processing of a compiler, but differs in how the program is executed.

Instead of translating the whole program and the necessary libraries into a target program, an interpreter executes the source line by line, handling one statement at a time. Interpreters are largely machine independent, because no machine code is generated.
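
As a rough sketch of this line-by-line execution, the toy interpreter below handles a made-up mini language with only three statement forms: `name = value`, `name = value op value`, and `print name`. The language, the `interpret` function, and the `value_of` helper are assumptions made for illustration. Each line is executed as soon as it is read, and no target code is produced.

```python
import operator

# Operators supported by this hypothetical mini language.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def value_of(word, env):
    """Return an integer literal, or the current value of a variable."""
    if word.lstrip("-").isdigit():
        return int(word)
    return env[word]


def interpret(source):
    """Execute the source one line at a time and return the printed values."""
    env = {}      # variable name -> current value
    printed = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        if len(words) == 2 and words[0] == "print":
            printed.append(value_of(words[1], env))
        elif len(words) == 3 and words[1] == "=":
            env[words[0]] = value_of(words[2], env)
        elif len(words) == 5 and words[1] == "=" and words[3] in OPERATORS:
            left = value_of(words[2], env)
            right = value_of(words[4], env)
            env[words[0]] = OPERATORS[words[3]](left, right)
        else:
            raise SyntaxError(f"line {line_number}: cannot interpret {line!r}")
    return printed


source = """
b = 2
c = 5
a = b + c
print a
"""
print(interpret(source))  # prints [7]
```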

The syntax and semantics

Every programming language must have a specification of its syntax and its semantics.

The syntax defines the structure of the language: which sequences of symbols (tokens) are legal. It is usually specified with a context-free grammar. Syntactic legality is independent of any notion of what the symbols mean; it only validates that the tokens appear in a correct order and can form valid sentences.

The semantics of a language refers to the meaning of those sentences.

For example, in a statement like a = b + c every token belongs to the language, the tokens are in a legal order, and the statement passes both syntax and semantic validation. If the tokens are rearranged as a + b = c, the statement stops making sense on semantic validation, because a + b is not something a value can be assigned to, even when the grammar accepts the order of the tokens.

The static semantics

The static semantics are a set of rules that specify whether a syntactically legal program is valid. These rules require, for example, that all identifiers are declared before they are used and that operators and operands are compatible with the operations being performed. Because this validation is static, it is carried out only at compile time; errors that appear only when the program actually runs are not detected by this check.
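
A minimal sketch of two such rules, assuming a made-up program representation: each statement is a tuple, either `("declare", name, type)` or `("add", target, left, right)` meaning `target = left + right`. The `check` function below is hypothetical; it inspects the statements without running them, which is what makes the check static.

```python
def check(statements):
    """Return a list of static-semantic errors; the program is never executed."""
    declared = {}  # identifier -> declared type
    errors = []
    for index, statement in enumerate(statements, start=1):
        kind = statement[0]
        if kind == "declare":
            _, name, type_name = statement
            declared[name] = type_name
        elif kind == "add":
            _, target, left, right = statement
            names = (target, left, right)
            # Rule 1: every identifier must be declared before it is used.
            missing = [name for name in names if name not in declared]
            for name in missing:
                errors.append(f"statement {index}: '{name}' is not declared")
            # Rule 2: operands and target must have compatible (here: equal) types.
            if not missing and len({declared[name] for name in names}) > 1:
                errors.append(f"statement {index}: operand types do not match")
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return errors


program = [
    ("declare", "a", "int"),
    ("declare", "b", "int"),
    ("declare", "s", "string"),
    ("add", "a", "b", "c"),  # c was never declared
    ("add", "a", "b", "s"),  # int + string
    ("add", "a", "b", "b"),  # valid
]

for error in check(program):
    print(error)
```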

The run-time semantics

Run-time semantics, also called dynamic semantics, cover the validations that are made while the program runs. Because these conditions cannot be seen at compile time, they are checked at run time, when the loop or instruction that depends on dynamic values is actually executed.
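
The difference can be seen with Python's built-in `compile` and `exec`. In the sketch below, the small program in `source` is an invented example: it compiles without complaint, and the out-of-range index is detected only when the offending statement actually executes. The helper `run_checked` is a name chosen for this illustration.

```python
def run_checked(source):
    """Compile first, then execute; report the name of any run-time error."""
    code = compile(source, "<example>", "exec")  # compile-time checks pass here
    try:
        exec(code, {})
    except Exception as error:
        return type(error).__name__
    return "ok"


# The index is computed from a value known only while the program runs.
source = (
    "values = [10, 20, 30]\n"
    "index = len(values)\n"
    "result = values[index]\n"
)

print(run_checked(source))  # prints IndexError
```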

The compiler process (organization)

In the previous image we can see the compiler processes through which the source code is translated into a language the machine can execute.

The scanner. It reads the source program, separates it into tokens, and checks whether each word is legal. It returns the tokens only if all of them are valid.
The parser. It reads the valid tokens produced by the scanner and groups them into phrases that make sense syntactically.
The type checker. This semantic checker analyzes whether the phrases formed from those tokens make sense. It verifies the construct at each node, validating that the types are correct and the construct is well formed (static semantics).
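
The three phases above can be sketched for a tiny, made-up language whose only statement is `name = operand + operand + ...`. The function names (`scan`, `parse`, `check_types`), the token kinds, and the tuple-based tree are assumptions for this illustration, not the design used in the book.

```python
import re

# One token per match: identifier, integer literal, or operator.
TOKEN_PATTERN = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+]))")


def scan(source):
    """Scanner: split the source into (kind, text) tokens or reject it."""
    tokens, position, source = [], 0, source.rstrip()
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"illegal character near position {position}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        position = match.end()
    return tokens


def parse(tokens):
    """Parser: group tokens into a tree for  id '=' operand ('+' operand)*."""
    position = 0

    def expect(kind, text=None):
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        if token[0] != kind or (text is not None and token[1] != text):
            raise SyntaxError(f"unexpected token {token[1]!r}")
        position += 1
        return token

    def operand():
        if position < len(tokens) and tokens[position][0] == "num":
            return expect("num")
        return expect("id")

    target = expect("id")[1]
    expect("op", "=")
    tree = operand()
    while position < len(tokens):
        expect("op", "+")
        tree = ("add", tree, operand())
    return ("assign", target, tree)


def check_types(tree, symbols):
    """Type checker: verify each node; symbols maps identifiers to type names."""
    kind = tree[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if tree[1] not in symbols:
            raise TypeError(f"'{tree[1]}' is not declared")
        return symbols[tree[1]]
    if kind == "add":
        left, right = check_types(tree[1], symbols), check_types(tree[2], symbols)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    if kind == "assign":
        value_type = check_types(tree[2], symbols)
        if symbols.get(tree[1]) != value_type:
            raise TypeError(f"cannot assign {value_type} to '{tree[1]}'")
        return value_type
    raise ValueError(f"unknown node kind: {kind}")


symbols = {"a": "int", "b": "int", "c": "int"}
tokens = scan("a = b + c")
tree = parse(tokens)
print(tokens)
print(tree)
print(check_types(tree, symbols))  # prints int
```

Each phase only consumes the output of the previous one, so an error is reported by the first phase that can detect it: an illegal character by the scanner, a misplaced token by the parser, and a type mismatch by the type checker.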

Obtained from: Crafting a Compiler, by Charles N. Fischer.