Compilers

From Dev Wiki
Revision as of 18:28, 19 September 2020 by Brodriguez (talk | contribs) (Create page)

Compilers are essentially programs that read in one set of code, potentially optimize it, and then output a different set of code.

Aspects of a Compiler

There are three main parts to a compiler:

  • Front End/Input
  • Middle End/Optimization
  • Back End/Output

Each of these is explained in further detail in the following sections.
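As a rough illustration of how the three parts fit together, here is a minimal sketch for a made-up toy language consisting only of lines of the form "name = value". Every name in it (front_end, middle_end, back_end, compile_source, and the "SET" output format) is an assumption invented for this example, not part of any real compiler.

```python
# Hypothetical three-stage pipeline for a toy "name = value" language.
# All function names and the SET output format are made up for illustration.

def front_end(source):
    """Read source text and build a simple intermediate representation (IR).

    The IR here is a list of (target, value) tuples, one per assignment.
    A line without "=" raises ValueError, which stands in for a syntax error.
    """
    ir = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        target, value = line.split("=", 1)
        ir.append((target.strip(), value.strip()))
    return ir


def middle_end(ir):
    """Optimize the IR by dropping self-assignments such as "b = b"."""
    return [(target, value) for (target, value) in ir if target != value]


def back_end(ir):
    """Write the IR out in a different (made-up) output language."""
    return "\n".join(f"SET {target} {value}" for (target, value) in ir)


def compile_source(source):
    """Run the front end, middle end, and back end in order."""
    return back_end(middle_end(front_end(source)))
```

For example, compile_source("a = 1\nb = b\nc = a") reads three assignments, drops the redundant "b = b", and writes out the remaining two in the output format.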


Front End / Input

The "Front End" or "Input" reads code in from a file. The compiler needs to be programmed to understand the structure of each language it's expected to accept as input. The structure of a given language (whether that be English, Python, or even Math in general) is often called a Grammar (ToDo: Link to grammars).

Once the input file is read in, the "front end" will often convert it to some "intermediate language" or "intermediate representation". This representation can be in any format, including ones that are only recognized internally by the compiler itself.

The point of an "intermediate representation" is that the compiler can read in any supported language (say, either C or Fortran; ToDo: Link to languages) as input, and then convert it to a single format for the "Middle End" part of the compiler to handle. That way, the compiler can use one set of optimizations, regardless of what language the original input file was written in.

Ideally, the Front End will also check at some point that the syntax of the original input file is correct for the corresponding language. After all, it doesn't make sense to do work on something that's broken from the start.
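To make the front end's job more concrete, the following sketch reads simple integer arithmetic, checks its syntax against a small grammar, and builds a nested-tuple intermediate representation. The grammar, the tuple format, and the names tokenize and parse are all assumptions chosen for this example; real compilers use far richer grammars and representations.

```python
import re

# Assumed toy grammar (not taken from any real language):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input text into number and operator tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Check the tokens against the grammar and return a nested-tuple IR."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return ("num", int(token))
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this sketch, parse(tokenize("1 + 2 * 3")) yields a tree in which the multiplication is nested under the addition, while malformed input such as "(1 + 2" is rejected with a SyntaxError before any later stage has to deal with it.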

Parse Trees

Middle End

The "Middle End" or "Optimization" looks for optimizations that can be applied to the "intermediate representation" of the code.

Depending on how the compiler is implemented, this usually works in one of two ways:

  • The user can specify exactly which optimizations occur from a list of the compiler's known optimizations.
  • The compiler attempts to execute all known optimizations.

Regardless of which approach the compiler takes, it will run through the code and attempt to apply the selected optimizations.
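The two approaches above can be sketched with a small registry of optimization passes. This is only an illustration under assumptions: the nested-tuple IR (("num", n), ("var", name), or (op, left, right)), the two passes, and the names KNOWN_PASSES and optimize are all invented for this example, and the passes assume integer arithmetic.

```python
def rewrite(node, rule):
    """Apply `rule` to every node of the IR tree, children first."""
    if node[0] in ("num", "var"):
        return rule(node)
    op, left, right = node
    return rule((op, rewrite(left, rule), rewrite(right, rule)))


def drop_add_zero(ir):
    """Rewrite `a + 0` and `0 + a` to `a`."""
    def rule(node):
        if node[0] == "+" and node[2] == ("num", 0):
            return node[1]
        if node[0] == "+" and node[1] == ("num", 0):
            return node[2]
        return node
    return rewrite(ir, rule)


def drop_mul_one(ir):
    """Rewrite `a * 1` and `1 * a` to `a`."""
    def rule(node):
        if node[0] == "*" and node[2] == ("num", 1):
            return node[1]
        if node[0] == "*" and node[1] == ("num", 1):
            return node[2]
        return node
    return rewrite(ir, rule)


# The compiler's list of known optimizations (hypothetical names).
KNOWN_PASSES = {
    "drop_add_zero": drop_add_zero,
    "drop_mul_one": drop_mul_one,
}


def optimize(ir, enabled=None):
    """Run the passes named in `enabled`, or every known pass if it is None."""
    names = list(KNOWN_PASSES) if enabled is None else enabled
    for name in names:
        if name not in KNOWN_PASSES:
            raise ValueError(f"unknown optimization: {name}")
        ir = KNOWN_PASSES[name](ir)
    return ir
```

Calling optimize(ir) corresponds to the "attempt everything" approach, while optimize(ir, enabled=["drop_mul_one"]) corresponds to the user picking from the list of known optimizations.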

Note that the compiler has to be programmed with each individual optimization first. It doesn't just arbitrarily know them ahead of time. Furthermore, each optimization should ideally be programmed such that it will never break the code. Generally speaking, it's better that an optimization "doesn't do enough but never breaks any input files" rather than "changes as much as possible, but breaks for some specific input files".

This can be incredibly difficult to get right, as a given optimization may work with many input files but break with one specific file. Alternatively, an optimization may seem to work with all input files when run alone, but then break some of them when it is run alongside other optimizations.
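A small sketch of how an optimization can look correct and still break one specific input: rewriting "a / a" to 1 gives the same answer for almost every value, but changes the behavior when a is 0. The IR format and the names evaluate, unsafe_div_self, and behaves_the_same are assumptions made up for this example, and division is taken to be integer division.

```python
import operator

# Operators for a toy integer expression IR (assumed for this example).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}


def evaluate(node, env):
    """Evaluate an IR tree; `env` maps variable names to integer values."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


def unsafe_div_self(node):
    """Rewrite `a / a` to 1. Looks harmless, but is wrong when a == 0."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left = unsafe_div_self(left)
    right = unsafe_div_self(right)
    if op == "/" and left == right:
        return ("num", 1)
    return (op, left, right)


def behaves_the_same(before, after, env):
    """Compare results, treating a division-by-zero error as an outcome."""
    def run(node):
        try:
            return ("ok", evaluate(node, env))
        except ZeroDivisionError:
            return ("error", "division by zero")
    return run(before) == run(after)
```

Here the original and rewritten forms of "x / x" agree for x = 5, but for x = 0 the original fails with a division error while the rewritten form quietly returns 1. A check like behaves_the_same only catches this if the breaking input happens to be tried.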

Regardless, assuming the optimizations found anything to improve (which is often), then the end result should be an improved form of the "intermediate representation" code that effectively accomplishes the same logic as the original code.
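One well-known optimization of this kind is constant folding, where sub-expressions made only of constants are computed ahead of time. The sketch below assumes a nested-tuple IR (("num", n), ("var", name), or (op, left, right)); that format and the names fold_constants and evaluate are made up for illustration. Division is deliberately left unfolded so that a division by zero in the input stays a problem of the original code rather than of the optimizer.

```python
import operator

# Operators that this sketch is willing to fold (integer arithmetic assumed).
FOLDABLE = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def fold_constants(node):
    """Return an equivalent IR tree with constant sub-expressions computed."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if op in FOLDABLE and left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))
    return (op, left, right)  # anything else is left untouched


def evaluate(node, env):
    """Evaluate a tree that uses only the operators in FOLDABLE."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    op, left, right = node
    return FOLDABLE[op](evaluate(left, env), evaluate(right, env))
```

For the expression "x + 2 * 3", the folded tree is smaller ("x + 6") but evaluates to the same value as the original for any x, which is the property every optimization is supposed to preserve.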


Back End

The "Back End" or "Output" takes the "intermediate representation" of the code and outputs it to a file in a given language. It's basically the reverse of what the Front End of the compiler does.

By keeping it separate from the Front End, the compiler can read in one language for the input file, then output an entirely different language for the output file. For example, it might take in C code and output an equivalent, optimized Assembly file (ToDo: link to languages).
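As a sketch of why this separation is useful, the example below emits two different outputs from the same intermediate representation: instructions for a made-up stack machine, and a plain infix expression. The nested-tuple IR, the instruction names (PUSH, LOAD, ADD, and so on), and the function names are all assumptions for this example and do not correspond to any real assembly language.

```python
# Instruction names for a hypothetical stack machine.
INSTRUCTIONS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def emit_stack_code(node):
    """Emit stack-machine instructions by walking the tree in post-order."""
    kind = node[0]
    if kind == "num":
        return [f"PUSH {node[1]}"]
    if kind == "var":
        return [f"LOAD {node[1]}"]
    op, left, right = node
    # Operands are emitted first, then the instruction that combines them.
    return emit_stack_code(left) + emit_stack_code(right) + [INSTRUCTIONS[op]]


def emit_infix(node):
    """Emit the same IR as a fully parenthesized infix expression."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    op, left, right = node
    return f"({emit_infix(left)} {op} {emit_infix(right)})"
```

Both functions accept the same tree, so adding a new output language in this sketch only means writing another emit function; the front end and the optimizations do not need to change.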