
Compilers

Compilers are programs which translate computer programs from high-level
languages such as Pascal, C++, Java or JavaScript into the raw 1s and 0s which the
computer can understand, but the human programmers cannot: you write the program
in the high-level language, the computer translates it into machine code, and the
machine code is what it can run.

Compilers
Compilers were the first sort of translator program to be written. The idea is simple:
You write the program, and then hand it to the compiler which translates it. Then
you run the result.

The compiler takes the file that you have written and produces another file from it.
In the case of Pascal programs, for instance, you might write a program called
myProg.pas and the Pascal compiler would translate it into the file myProg.exe
which you could then run. If you tried to examine the contents of myProg.exe using,
say, a text editor, then it would just appear as gobbledygook.

The compiler has another task apart from translating your program. It also checks it
to make sure that it is grammatically correct. Only when it is sure that there are no
grammatical errors does it do the translation. Any errors that the compiler detects
are called compile-time errors; grammatical ones are syntax errors. If it finds so much
as one syntax error, it does not produce a translated program and reports the errors
to you instead. Here is an example of the C++ compiler reporting a whole list of errors:

Most "serious" languages are compiled, including Pascal, C++ and Ada.

Interpreters
An interpreter is also a program that translates a high-level language into a low-
level one, but it does it at the moment the program is run. You write the program
using a text editor or something similar, and then instruct the interpreter to run the
program. It takes the program, one line at a time, and translates each line before
running it: It translates the first line and runs it, then translates the second line and
runs it etc. The interpreter has no "memory" for the translated lines, so if it comes
across lines of the program within a loop, it must translate them afresh every time
that particular line runs. Consider this simple Basic program:
10 FOR COUNT = 1 TO 1000
20 PRINT COUNT * COUNT
30 NEXT COUNT

Line 20 of the program displays the square of the value stored in COUNT and this
line has to be carried out 1000 times. The interpreter must also translate that line
1000 times, which is clearly an inefficient process. However, interpreted languages
do have their uses, as we will see in a later section.
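The cost of re-translating a loop body can be sketched in Python. This is only a rough model: the function names are invented for the example, and Python's built-in compile() stands in for the interpreter's translation step.

```python
def translate(line):
    """Stand-in for the translation step: turn source text into runnable code."""
    return compile(line, "<line 20>", "eval")

def run_interpreted(line, n):
    """Re-translate the loop body on every pass, as a simple interpreter would."""
    results, translations = [], 0
    for count in range(1, n + 1):
        code = translate(line)          # translated afresh each time round
        translations += 1
        results.append(eval(code, {"COUNT": count}))
    return results, translations

def run_translated_once(line, n):
    """Translate the loop body a single time, then reuse the result."""
    code = translate(line)
    results = [eval(code, {"COUNT": count}) for count in range(1, n + 1)]
    return results, 1
```

Both functions produce the same squares for COUNT * COUNT, but the first performs 1000 translations for a loop of 1000 passes where the second performs one.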

Examples of interpreted languages are Basic, JavaScript and LISP.

So which is better?

Well, that depends on how you want to write and run your program. The main
advantages of compilers are as follows:

• They produce programs which run quickly.
• They can spot syntax errors while the program is being compiled (i.e. you are
informed of any grammatical errors before you try to run the program). However,
this does not mean that a program that compiles correctly is error-free!

The main advantages of interpreters are as follows:

• There is no lengthy "compile time", i.e. you do not have to wait between writing a
program and running it, for it to compile. As soon as you have written a program,
you can run it.
• They tend to be more "portable", which means that they will run on a
greater variety of machines. This is because each machine can have its own
interpreter for that language. For instance, the version of the BASIC interpreter for
the PDP series computers is different from the QBasic program for personal
computers, as they run on different pieces of hardware, but programs written in
BASIC are identical from the user's point of view.

Some computer systems try to get the best of both worlds. For instance, when I was
at Durham, we programmed in Pascal on the old PDP-11 machines. Running a
Pascal program on those machines was a two-stage process. Firstly, we ran a
compiler program (called pc) which compiled the program to a low-level version,
and spotted any grammatical errors in the process. We then ran an interpreter
program which took the output of pc and ran it. The fact that pc produced
something that didn't have to run directly as machine code made the program more
portable. Different versions of the low-level interpreter could be written for different
machines in the PDP range, each taking as its input the same output from pc.

How does a compiler work?

Compiling a program takes several stages of processing, which I have outlined below.
The principles which are explained below also apply to interpreters, with the
exception that an interpreter translates the program one line at a time, running each
line before moving on to the next. The process may be summarised in this diagram:

Tokenising → Syntax analysis → Semantic analysis → Translation

Tokenising

This part of the process is also sometimes called Lexical Analysis. It involves turning
the program from a series of characters into a series of tokens that represent the
building blocks of a program. The tokens are keywords of the language (i.e.
important words such as if, print or repeat), variable names and mathematical
operators (+, *, brackets etc.).

The tokeniser takes each of the characters in turn, and as soon as it recognises a
legitimate token, it reports it to the next stage of the process. Each token has a
label and a type, so a variable "count" in a program would have the label
corresponding to its name ("count") and the type "variable name". The tokeniser is
also responsible for ignoring comments in the program - these are words and
phrases inserted purely for the benefit of any human reading the program, and they
have no function.
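A minimal tokeniser along these lines can be sketched in Python. The keyword set, the token type names and the // comment style are assumptions chosen for illustration; a real language defines its own.

```python
import re

KEYWORDS = {"if", "print", "repeat"}

# Each alternative is a named group; the group name becomes the token type.
TOKEN_PATTERN = re.compile(r"""
    (?P<comment>//[^\n]*)            # comment: recognised, then ignored
  | (?P<number>\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<operator>\+\+|[-+*/=();])
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenise(source):
    """Return a list of (type, label) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = match.end()
        kind, label = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue                 # no function for the later stages
        if kind == "name":
            kind = "keyword" if label in KEYWORDS else "variable name"
        tokens.append((kind, label))
    return tokens
```

Calling tokenise("count = count + 1; // bump") yields the variable name "count", the operator "=", and so on, with the comment dropped.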

Syntax Analysis

Syntax means "grammar" and the syntax analyser in a compiler checks that the
right tokens appear in the right order to make grammatically correct instructions.
For instance, in C++, the instruction xyz++; is syntactically correct, but the
instruction +;xyz+ is not - the order of the tokens is wrong.
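As a minimal sketch, the check on token order might look like this in Python. It assumes a toy grammar with a single rule (a name, then ++, then ;) and a hypothetical function name; a real syntax analyser handles the full grammar of the language.

```python
def is_increment_statement(tokens):
    """Check the toy grammar rule: statement -> NAME '++' ';'"""
    if len(tokens) != 3:
        return False
    name, op, end = tokens
    # str.isidentifier() is used here as a rough test for a variable name.
    return name.isidentifier() and op == "++" and end == ";"
```

With this rule, the tokens of xyz++; are accepted, while the tokens of +;xyz+ are rejected because they appear in the wrong order.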

Semantic Analysis

The word "semantics" refers to meaning, and the semantic analyser checks the
meaning of the program. This refers to aspects such as whether the variables have
been declared, e.g. xyz++; may be syntactically correct, but if the variable xyz has
not been declared, then it is semantically incorrect!
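The declaration check can be sketched in Python as follows. The statement format (a list of tuples) and the function name are assumptions made for this example, and scope is ignored to keep it short.

```python
def check_declared(statements):
    """Report uses of variables that have not been declared.

    Each statement is a tuple: ("declare", name) or ("increment", name).
    """
    declared = set()
    errors = []
    for number, (kind, name) in enumerate(statements, start=1):
        if kind == "declare":
            declared.add(name)
        elif name not in declared:
            errors.append(f"statement {number}: '{name}' has not been declared")
    return errors
```

A program that declares xyz before incrementing it produces no errors; one that increments xyz without a declaration produces one error, even though each statement is syntactically correct.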

The semantic analyser checks not only variable declarations and scope, but whether
the program has entered or left loops, or subroutines, whether classes are accessed
correctly etc.

Translation

3.1 Compiler vs. Interpreter

An interpreter translates some form of source code into a target representation that
it can immediately execute and evaluate. The structure of the interpreter is similar
to that of a compiler, but the amount of time it takes to produce the executable
representation will vary as will the amount of optimization. The following diagram
shows one representation of the differences.

Compiler characteristics:

• spends a lot of time analyzing and processing the program
• the resulting executable is some form of machine-specific binary code
• the computer hardware interprets (executes) the resulting code
• program execution is fast

Interpreter characteristics:

• relatively little time is spent analyzing and processing the program
• the resulting code is some sort of intermediate code
• the resulting code is interpreted by another program
• program execution is relatively slow

The above characteristics are typical. There are well-known cases that are
somewhere in between, such as Java with its JVM.

We usually prefer to write computer programs in languages we understand rather
than in machine language, but the processor can only understand machine
language. So we need a way of converting our instructions (source code) into
machine language. This is done by an interpreter or a compiler.

An interpreter reads the source code one instruction or line at a time, converts this
line into machine code and executes it. The machine code is then discarded and the
next line is read. The advantage of this is it's simple and you can interrupt it while it
is running, change the program and either continue or start again. The
disadvantage is that every line has to be translated every time it is executed, even
if it is executed many times as the program runs. Because of this interpreters tend
to be slow. Examples of interpreters are Basic on older home computers, and script
interpreters such as JavaScript, and languages such as Lisp and Forth.

A compiler reads the whole source code and translates it into a complete machine
code program to perform the required tasks which is output as a new file. This
completely separates the source code from the executable file. The biggest
advantage of this is that the translation is done once only and as a separate
process. The program that is run is already translated into machine code so is much
faster in execution. The disadvantage is that you cannot change the program
without going back to the original source code, editing that and recompiling (though
for a professional software developer this is more of an advantage because it stops
source code being copied). Languages that are normally compiled include Visual Basic,
C, C++, C#, Fortran, Cobol, Ada, Pascal and so on.

You will sometimes see reference to a third type of translation program: an
assembler. This is like a compiler, but works at a much lower level, where one
source code line usually translates directly into one machine code instruction.
Assemblers are normally used only by people who want to squeeze the last bit of
performance out of a processor by working at machine code level.
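The one-line-to-one-instruction idea can be sketched with a toy assembler in Python. The instruction set below (four mnemonics, one opcode byte plus one operand byte) is invented for this example and does not correspond to any real processor.

```python
# Hypothetical instruction set, invented for this example:
# one opcode byte followed by one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(source):
    """Translate each assembly line into exactly one two-byte instruction."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()    # drop ; comments and blank lines
        if not line:
            continue
        mnemonic, *operand = line.split()
        value = int(operand[0]) if operand else 0
        machine_code += bytes([OPCODES[mnemonic], value])
    return bytes(machine_code)
```

Four source lines (LOAD 10, ADD 5, STORE 11, HALT) become four two-byte instructions, with no analysis or rearrangement in between.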

Compiler

A compiler is a program that translates code written in a programming language into
machine code.

A compiler is a special program that processes statements written in a particular
programming language and converts them into machine language, a "binary
program" or "code," that a computer processor uses. A compiler works with what
are sometimes called 3GL and higher-level languages (3rd-generation languages,
such as Java and C).

Interpreter

Interpreters translate code one line at a time, executing each line as it is "translated,"
much the way a foreign language interpreter would translate a book, by translating
one line at a time. Interpreters do generate binary code, but that code is never
compiled into one program entity.

Interpreters offer programmers some advantages that compilers do not. Interpreted
languages are easier to learn than compiled languages, which is great for beginning
programmers. An interpreter lets the programmer know immediately when and
where problems exist in the code; compiled programs make the programmer wait
until the program is complete.

Interpreters therefore can be easier to use and produce more immediate results;
however, the source code of an interpreted language cannot run without the
interpreter.
Compilers produce better optimized code that generally runs faster, and compiled
code is self-sufficient: it can be run on its intended platform without the
compiler present.

Compilation: source code ==> relocatable object code (binaries)

Linking: many relocatable binaries (modules plus libraries) ==> one relocatable
binary (with all external references satisfied)

Loading: relocatable ==> absolute binary (with all code and data references bound
to the addresses occupied in memory)

Execution: control is transferred to the first instruction of the program

At compile time (CT), absolute addresses of variables and statement labels are not
known.
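The linking and loading steps can be illustrated with a much simplified Python model. The module format (a dictionary of symbols and a list of words) and the function names are assumptions made for this sketch; real object files are far more elaborate.

```python
def link(modules):
    """Combine modules into one relocatable image and resolve symbol references.

    Each module is {"symbols": {name: offset}, "code": [words]}, where a
    word is either an int or a symbol name (an external reference).
    """
    symbols, code = {}, []
    for module in modules:
        base = len(code)
        for name, offset in module["symbols"].items():
            symbols[name] = base + offset    # offset within the combined image
        code.extend(module["code"])
    # References become offsets relative to the start of the image.
    return [("reloc", symbols[w]) if isinstance(w, str) else ("abs", w)
            for w in code]

def load(image, load_address):
    """Bind relocatable offsets to absolute addresses for a given load address."""
    return [value + load_address if kind == "reloc" else value
            for kind, value in image]
```

In this model, a reference to helper stays symbolic until link time, becomes an offset in the combined image, and only becomes an absolute address once load() knows where the program sits in memory.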

In this article, the compiling process is explained.

A compiler for a language generally has several different stages as it processes
the input.

These are:

1. Preprocessing

During the preprocessing stage, comments, macros, and directives are processed.
Comments are removed from the source file. This greatly simplifies the later stages.

If the language supports macros, the macros are replaced with the equivalent text.

For example, C and C++ support macros using the #define directive. So if a macro
were defined for pi as:

#define PI 3.1415927

Any time the preprocessor encountered the word PI, it would replace PI with
3.1415927 and process the resulting text.
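A toy version of this stage can be sketched in Python. It handles only simple #define NAME text macros and // comments, and it ignores string literals, so it is an illustration of the idea rather than a model of the real C preprocessor. The function name is hypothetical.

```python
import re

def expand_macros(source):
    """Expand simple #define macros and strip // comments (toy version)."""
    macros, output = {}, []
    for line in source.splitlines():
        line = re.sub(r"//.*", "", line).rstrip()        # remove comments
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            macros[match.group(1)] = match.group(2)      # remember the macro
            continue
        for name, text in macros.items():
            # \b keeps PI from matching inside longer names such as PIPE.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda m: text, line)
        output.append(line)
    return "\n".join(output)
```

Given the definition of PI above, a later line such as area = PI * r * r; comes out as area = 3.1415927 * r * r; before the remaining stages see it.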

Special character sequences may also be replaced at this early stage. In C and C++,
the \ character inside a character or string literal introduces an escape sequence,
which is replaced with the special character it stands for. For example, \t is the
escape code for a tab, so \t would be replaced with a tab character. (Strictly
speaking, C and C++ perform this replacement just after preprocessing proper.)

2. Lexical analysis is the process of breaking down the source files into keywords,
constants, identifiers, operators and other simple tokens. A token is the smallest
piece of text that the language defines.

3. Syntactical analysis is the process of combining the tokens into well-formed
expressions, statements, and programs. Each language has specific rules about the
structure of a program, called the grammar or syntax. Just like English grammar, it
specifies how things may be put together. In English, a simple sentence is: subject,
verb, predicate.

In C or C++ an if statement is:

if ( expression ) statement

The syntactical analysis checks that the syntax is correct, but doesn't enforce that
it makes sense. In English, a subject could be: Pants, the verb: are, the predicate:
a kind of car. This would yield: Pants are a kind of car. This is a sentence, but it
doesn't make much sense.

In C or C++, a constant can be used in an expression, so the expression:

float x = "This is red"++

is syntactically valid, but doesn't make sense because a float number cannot have
a string assigned to it, and a string cannot be incremented.

4. Semantic analysis is the process of examining the types and values of the
statements used to make sure they make sense. During the semantic analysis, the
types, values, and other required information about statements are recorded,
checked, and transformed as appropriate to make sure the program makes sense.

For C/C++ in the line:

float x = "This is red"++

the semantic analysis would reveal that the types do not match and cannot be made
to match, so the statement would be rejected and an error reported.

While in the statement:

float y = 5 + 3.0;

the semantic analysis would reveal that 5 is an integer and 3.0 is a double, and
also that the rules of the language allow 5 to be converted to a double. The 5 would
therefore be converted to a double and the addition performed. Then the compiler
would recognize y as a float, perform another conversion from the double 8.0 to a
float, and process the assignment.
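The type reasoning for these two statements can be sketched in Python. The ranking table and the function names are assumptions made for illustration, covering only int, float and double; the real C and C++ conversion rules are considerably richer.

```python
# Conversion rank: a value may be widened to any type with a higher rank.
RANK = {"int": 0, "float": 1, "double": 2}

def literal_type(text):
    """Classify a literal roughly as C does: "..." string, 3.0 double, 5 int."""
    if text.startswith('"'):
        return "string"
    return "double" if "." in text else "int"

def add_type(left, right):
    """Type of left + right, or raise if the operands cannot be made to match."""
    if left not in RANK or right not in RANK:
        raise TypeError(f"cannot add {left} and {right}")
    return left if RANK[left] >= RANK[right] else right

def check_assignment(target_type, value_type):
    """Return the conversion needed to store value_type in target_type."""
    if target_type not in RANK or value_type not in RANK:
        raise TypeError(f"cannot assign {value_type} to {target_type}")
    if target_type == value_type:
        return "no conversion"
    return f"convert {value_type} to {target_type}"
```

For float y = 5 + 3.0; the sketch finds that the sum has type double and that a conversion from double to float is needed for the assignment. For the string example it raises an error, since a string cannot be made to match a float.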

5. Intermediate code generation

Depending on the compiler, this step may be skipped, and instead the program may
be translated directly into the target language (usually machine object code). If
this step is implemented, the compiler designers also design a machine-independent
language of their own that is close to machine language and easily translated into
machine language for any number of different computers.

The purpose of this step is to allow the compiler writers to support different target
computers and different languages with a minimum of effort. The part of the
compiler which deals with processing the source files, analyzing the language and
generating the intermediate code is called the front end, while the process of
optimizing and converting the intermediate code into the target language is called
the back end.
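One common style of intermediate code is "three-address code", in which each instruction has at most one operator. The Python sketch below assumes the front end has already produced an expression tree as nested tuples; the tuple format and the temporary names t1, t2 are choices made for this example, not a description of any particular compiler.

```python
import itertools

def to_three_address(expr):
    """Flatten a nested expression tree into three-address instructions.

    expr is either a name or number (a leaf) or a tuple (op, left, right).
    """
    code = []
    temps = (f"t{n}" for n in itertools.count(1))

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)             # leaves are used directly
        op, left, right = node
        a, b = emit(left), emit(right)
        result = next(temps)             # fresh temporary for this result
        code.append(f"{result} = {a} {op} {b}")
        return result

    emit(expr)
    return code
```

For the tree of a + b * 2 this produces t1 = b * 2 followed by t2 = a + t1, a form that a back end for any target machine could translate further.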

6. Code optimization

During this process the code generated is analyzed and improved for efficiency.
The compiler analyzes the code to see if improvements can be made to the
intermediate code that couldn't be made earlier. For example, some languages, such
as standard Pascal, do not allow pointer arithmetic, while all machine languages do.
When stepping through arrays, it is often more efficient to use pointers, so the code
optimizer may detect this case and internally use pointers.
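The pointer transformation is hard to show briefly, so the sketch below illustrates a different and simpler optimization, constant folding: any sub-expression whose operands are all known numbers is computed at compile time. The nested-tuple tree format and the function name are assumptions made for this example.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace sub-expressions whose operands are all numbers with their value.

    node is either a name or number (a leaf) or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    both_numbers = (isinstance(left, (int, float))
                    and isinstance(right, (int, float)))
    if op in FOLDABLE and both_numbers:
        return FOLDABLE[op](left, right)     # computed once, at compile time
    return (op, left, right)
```

The tree for x + 2 * 3 is reduced to the tree for x + 6, so the multiplication never has to be carried out when the program runs.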

7. Code generation

Finally, after the intermediate code has been generated and optimized, the compiler
generates code for the specific target language. Almost always this is machine code
for a particular target machine.

Also, it is usually not the final machine code, but is instead object code, which
contains all the instructions, but in which not all of the final memory addresses
have been determined.

Interpreter
Last modified: Monday, December 10, 2001

A program that executes instructions written in a high-level language. There are two ways
to run programs written in a high-level language. The most common is to compile the
program; the other method is to pass the program through an interpreter.

An interpreter translates high-level instructions into an intermediate form, which it then
executes. In contrast, a compiler translates high-level instructions directly into machine
language. Compiled programs generally run faster than interpreted programs. The advantage
of an interpreter, however, is that it does not need to go through the compilation stage during
which machine instructions are generated. This process can be time-consuming if the
program is long. The interpreter, on the other hand, can immediately execute high-level
programs. For this reason, interpreters are sometimes used during the development of a
program, when a programmer wants to add small sections at a time and test them quickly. In
addition, interpreters are often used in education because they allow students to program
interactively.

Both interpreters and compilers are available for most high-level languages.
However, BASIC and LISP are especially designed to be executed by an interpreter. In
addition, page description languages, such as PostScript, use an interpreter. Every
PostScript printer, for example, has a built-in interpreter that executes PostScript instructions.

Efficiency
The main disadvantage of interpreters is that when a program is interpreted, it typically runs more slowly
than if it had been compiled. Interpreting code is slower than running the compiled code because the
interpreter must analyze each statement in the program each time it is executed and then perform the
desired action, whereas the compiled code just performs the action within a fixed context determined by
the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also
slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly
at run-time rather than at compile time.

Once a routine has been tested and debugged under the interpreter, it can be compiled and thus benefit
from faster execution while other routines are being developed. Many interpreters do not execute the
source code as it stands but convert it into some more compact internal form.


An interpreter usually just needs to translate to an intermediate representation or not translate at all, thus
requiring less time before the changes can be tested.

This often makes it easier to learn an interpreted language, and to find bugs and correct problems in
programs written in one. Thus simple interpreted languages tend to have a friendlier environment for
beginners.

Execution environment
An interpreter translates the source at run time. This means every line has to be converted
each time the program runs. This slows down program execution and is a major
disadvantage of interpreters compared with compilers. Another main disadvantage of an interpreter is
that it must be present on the machine, as additional software, to run the program.

Advantages and disadvantages of using interpreters


Programmers usually write programs in high level code which the CPU cannot execute. So this source
code has to be converted into machine code. This conversion is done by a compiler or an interpreter. A
compiler makes the conversion just once, while an interpreter typically converts it every time a program is
executed (or in some languages like early versions of BASIC, every time a single instruction is executed).

Another definition of interpreter


Instead of producing a target program as a translation, an interpreter performs the operation implied by
the source program. For an assignment statement, for example, an interpreter might build a tree and then
carry out the operation at the node as it "walks" the tree.
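A tree-walking interpreter of this kind can be sketched in a few lines of Python. The tree format (nested tuples), the variable dictionary and the function names are assumptions chosen for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node, variables):
    """Walk an expression tree and carry out the operation at each node."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]           # a variable reference
    return node                          # a numeric constant

def execute_assignment(target, expr, variables):
    """Perform `target = expr` directly; no target program is produced."""
    variables[target] = evaluate(expr, variables)
    return variables
```

For the assignment position = 10 + rate * 2 with rate equal to 60, the interpreter walks the tree, multiplies, adds, and stores 130 in position, without ever emitting machine code for the statement.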

Compiler vs. interpreter


Translation of a program written in a high-level language into a
low-level language is done by software called a compiler or an
interpreter. The difference between compilation and interpretation
lies in the method of translation. A compiler takes the whole program
and generates object code, whereas an interpreter executes the
program line by line. During compilation the whole program is
scanned for syntax errors, and the compiler lists all the errors at
once. In an interpreter, translation is done line by line, so only one
line at a time is checked for syntax errors. Compiled (object) code
executes faster than interpreted code. Debugging can be harder
with a compiler because all the errors are listed together at every
compilation attempt; an interpreter is better suited to debugging
because errors are reported line by line.

Advantages of an Interpreter

• Interpreters are useful for program development
when execution speed is not important. As the
interpreter is in command of the execution process,
debugging features can be built in.
• Debugging is easier since the interpreter stops when
it encounters an error. If an error is detected there is
no need to retranslate the whole program.
• There is no lengthy "compile time", i.e. you do not
have to wait between writing a program and running
it, for it to compile. As soon as you have written a
program, you can run it.

Disadvantages of an Interpreter
• Interpreters normally translate and execute
programs line by line, converting each program
statement into a sequence of machine code
instructions and executing these instructions without
retaining the translated version. That is, in a program
with a loop, the same statement will be translated
every time it is encountered. Therefore interpreted
programs are usually slower in execution than
compiled programs.
• No object code is produced, so a translation has to
be done every time the program is run. The source
code is required for the program to be executed.
Compilers and interpreters do similar jobs, but there
are differences:

To run a program you've written, e.g. in Java, it must first
be translated into machine code so the computer can
read it. This is what compilers and interpreters do.

However, compilers convert the code all at once and save
the result, which is then run; whereas interpreters translate
the code one line at a time, as it is run.

With an interpreter there is less waiting for translation after
each change, so interpreters are used mostly for debugging.
This is because if you used a compiler, you'd have to re-compile
your entire project every time you changed one little thing.

However, it's not very efficient to keep re-translating your
code once you've finished writing it, because it would
waste CPU time. Because of this, once code is done, it is
normally compiled so that it runs faster and takes up less
space. Another advantage of this is that your code is then
much harder to copy without lengthy 'reverse
engineering.'