CSE309N

Chapter 1
Introduction to Compiling

Introduction to Compilers
 As a Discipline, Involves Multiple CS&E Areas
 Programming Languages and Algorithms
 Theory of Computing & Software Engineering
 Computer Architecture & Operating Systems
 Has a Deceptively Simple Intent:

Source program → Compiler → Target program
                    ↓
              Error messages

Source and target programs are diverse and varied.

Classifications of Compilers
 Compilers Viewed from Many Perspectives
 By Construction: Single Pass, Multiple Pass
 By Function: Load & Go, Debugging, Optimizing

 However, all perform the same basic tasks to accomplish their actions

The Model
 The TWO Fundamental Parts:

 Analysis: Decompose the source into an intermediate representation
 Synthesis: Generate the target program from that representation

 We Will Discuss Both in This Class, and FOCUS on Analysis.

Important Notes
 Today: many software tools help with the analysis part.
This wasn’t the case in the early days. Some analysis is also
important in:
 Structure / Syntax directed editors: Force
“syntactically” correct code to be entered
 Takes input as a sequence of commands to build a
source program.
 Performs:
– Text-creation
– Text modifications
– Analyzes the source program

Important Notes (Continued)
 Pretty Printers: Standardized version for program structure
(i.e., blank space, indenting, etc.)
 Analyzes the source program and prints it in such a way that
the structure of the program becomes clearly visible.
 Examples
 Comments may appear in a special font
 Statements may appear with indentation proportional to the
depth of their nesting in the hierarchical organization of the
statements.
 Static Checkers: A “quick” compilation to detect
rudimentary errors
 Examples
 Detects parts of the program that can never be executed
 A variable used before it is defined
 Interpreters: “real-time” execution of code, “a line at a time”

Important Notes (Continued)
 Compilation Is Not Limited to Programming Language
Applications
 Text Formatters
 LaTeX and troff are languages whose commands
format text (paragraphs, figures, mathematical
structures, etc.)
 Silicon Compilers
 Textual / Graphical: Take Input and Generate Circuit
Design
 Database Query Processors
 Database Query Languages Are Also Programming
Languages
 Input is compiled Into a Set of Operations for
Accessing the Database

The Many Phases of a Compiler
Source Program
    ↓
1 Lexical Analyzer
    ↓
2 Syntax Analyzer
    ↓
3 Semantic Analyzer
    ↓
4 Intermediate Code Generator
    ↓
5 Code Optimizer
    ↓
6 Code Generator
    ↓
Target Program

The Symbol-table Manager and the Error Handler interact with all six phases.

Language-Processing System
Skeleton Source Program
    ↓ Pre-Processor
Source Program
    ↓ Compiler
Target Assembly Program
    ↓ Assembler
Relocatable Machine Code
    ↓ Loader / Link-Editor  ← Library, relocatable object files
Executable

The Analysis Task For Compilation
 Three Phases:
 Linear / Lexical Analysis:
 Left-to-Right Scan to Identify Tokens
token: a sequence of characters having a collective meaning
 Hierarchical Analysis:
 Grouping of Tokens Into Meaningful Collection
 Semantic Analysis:
 Checking to ensure Correctness of Components

Phase 1. Lexical Analysis

Easiest Analysis - Identify tokens, which are the basic building blocks

For Example:
position := initial + rate * 60 ;
________ __ _______ _ ____ _ __ _

All are tokens
Blanks, line breaks, etc. are scanned out
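
The left-to-right scan above can be sketched as a small regular-expression scanner. This is a minimal illustration, not the course's reference implementation: the token category names (`ID`, `ASSIGN`, `OP`, `SEMI`, `NUMBER`) and the function `tokenize` are assumptions chosen for this example.

```python
import re

# Hypothetical token categories for the slide's example statement.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),   # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Scan left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":          # drop whitespace, keep everything else
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize("position := initial + rate * 60 ;")` yields eight tokens, one per underlined group in the example. Note that the scanner is a plain loop: no recursion is needed at this phase.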

Phase 2. Hierarchical Analysis
Parsing or Syntax Analysis
For the previous example, we would have the parse tree:

assignment statement
    identifier: position
    :=
    expression
        expression
            identifier: initial
        +
        expression
            expression
                identifier: rate
            *
            expression
                number: 60

Nodes of the tree are constructed using a grammar for the language


What is a Grammar?
 Grammar is a Set of Rules Which Govern the
Interdependencies & Structure Among the Tokens

statement is an   assignment statement, or
                  while statement, or
                  if statement, or ...

assignment statement is an   identifier := expression ;

expression is an   (expression), or
                   expression + expression, or
                   expression * expression, or
                   number, or
                   identifier, or ...
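
A grammar like this maps naturally onto a recursive-descent parser, with one function per rule. The sketch below is only an illustration under stated assumptions: the rules as written do not fix operator precedence, so the sketch assumes the usual layering in which `*` binds tighter than `+`; the names `lex`, `Parser`, `term`, and `factor` are hypothetical, and the tree is represented as nested tuples.

```python
import re

def lex(text):
    # Minimal scanner for this sketch: identifiers, numbers, ':=', single symbols.
    # Characters it does not recognize are silently skipped.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*();]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.i += 1

    def assignment(self):
        # assignment statement is an identifier := expression ;
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.i += 1
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", name, value)

    def expression(self):
        # expression + expression (assumed lower precedence than '*')
        node = self.term()
        while self.peek() == "+":
            self.i += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # expression * expression
        node = self.factor()
        while self.peek() == "*":
            self.i += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # (expression), number, or identifier
        tok = self.peek()
        if tok == "(":
            self.i += 1
            node = self.expression()   # recursion: expressions nest inside expressions
            self.expect(")")
            return node
        if tok is not None and (tok.isdigit() or tok[0].isalpha() or tok[0] == "_"):
            self.i += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `Parser(lex("position := initial + rate * 60 ;")).assignment()` returns `(":=", "position", ("+", "initial", ("*", "rate", "60")))`, a compressed form of the parse tree.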

Why Have We Divided Analysis in This Manner?
 Lexical Analysis - Scans Input; Its Linear Actions Are Not Recursive
 Identify Only Individual “words” That Are the Tokens of the Language

 Recursion Is Required to Identify the Structure of an Expression, As Indicated in the Parse Tree
 Verify that the “words” are Correctly Assembled into “sentences”

 What is the Third Phase?
 Determine Whether the Sentences Have One and Only One Unambiguous Interpretation
 … and do something about it!
 e.g. “John Took Picture of Mary Out on the Patio”

Phase 3. Semantic Analysis
 Find More Complicated Semantic Errors and
Support Code Generation
 Parse Tree Is Augmented With Semantic Actions

Compressed Tree:
:=
    position
    +
        initial
        *
            rate
            60

Conversion Action:
:=
    position
    +
        initial
        *
            rate
            inttoreal
                60

Phase 3. Semantic Analysis
 Most Important Activity in This Phase:
 Type Checking - Legality of Operands
 Many Different Situations:

Real := int + char ;
A[int] := A[real] + int ;
while char <> int do
... etc.
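
As a rough sketch of type checking with a conversion action, the function below walks a compressed tree (nested tuples) and wraps an integer operand in `inttoreal` when it meets a real operand. The declared types in `SYMBOL_TYPES`, the two-type universe (`int`, `real`), and the function name `check` are assumptions made for illustration only.

```python
# Hypothetical declared types; the slides do not list the declarations.
SYMBOL_TYPES = {"position": "real", "initial": "real", "rate": "real"}

def check(node):
    """Return (typed_tree, type), inserting inttoreal where an int meets a real."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"                    # integer literal
        if node not in SYMBOL_TYPES:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, SYMBOL_TYPES[node]
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == ":=":
        if ltype == "real" and rtype == "int":
            right = ("inttoreal", right)          # widen the assigned value
        elif ltype != rtype:
            raise TypeError(f"cannot assign {rtype} to {ltype}")
        return (op, left, right), ltype
    if ltype != rtype:
        # Mixed int/real arithmetic: widen whichever operand is the int.
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), ltype
```

Applied to `(":=", "position", ("+", "initial", ("*", "rate", "60")))`, the literal `60` is wrapped as `("inttoreal", "60")`, matching the conversion action in the tree for this phase.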

Analysis in Text Formatting

Simple Commands: LaTeX

\begin{proof}
\end{proof}
\noindent
\section{Introduction}
$A_i$
$A_{i_j}$

Language Commands: begin, end, noindent, section
Commands are embedded in a single stream of text, i.e., a FILE
\ and $ serve as signals to LaTeX

What are tokens?
What is hierarchical structure?
What kind of semantic analysis is required?

Supporting Phases / Activities for Analysis
 Symbol Table Creation / Maintenance
 Contains Info (storage, type, scope, args) on Each
“Meaningful” Token, Typically Identifiers
 Data Structure Created / Initialized During Lexical
Analysis
 Utilized / Updated During Later Analysis & Synthesis
 Error Handling
 Detection of Different Errors Which Correspond to All
Phases
 What Kinds of Errors Are Found During the Analysis
Phase?
 What Happens When an Error Is Found?
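
The symbol-table bullets above can be illustrated with a very small table: the lexical analyzer enters identifiers, and later phases fill in attributes. This is a sketch under assumptions: the class name `SymbolTable`, its methods, and the single `type` attribute are hypothetical, and scope handling is omitted.

```python
class SymbolTable:
    """Maps identifier names to attribute records; no scoping in this sketch."""

    def __init__(self):
        self.entries = {}

    def insert(self, name):
        # Called by the lexical analyzer the first time an identifier is seen.
        # Returns the entry's index, so 'position' can be referred to as id1.
        if name not in self.entries:
            self.entries[name] = {"index": len(self.entries) + 1, "type": None}
        return self.entries[name]["index"]

    def set_type(self, name, type_name):
        # Later analysis phases update attributes such as the type.
        self.entries[name]["type"] = type_name

    def lookup(self, name):
        # Returns the attribute record, or None if the name was never entered.
        return self.entries.get(name)
```

Inserting `position`, `initial`, and `rate` in that order gives indices 1, 2, and 3, which is how a scanner could turn the source statement into `id1 := id2 + id3 * 60`.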

The Synthesis Task For Compilation
 Intermediate Code Generation
 Abstract Machine Version of Code - Independent of
Architecture
 Easy to Produce and
 Easy to translate into target program
 Code Optimization
 Find More Efficient Ways to Execute Code
 Replace Code With More Efficient Statements
 Final Code Generation
 Generate Relocatable Machine-Dependent Code

Reviewing the Entire Process
position := initial + rate * 60
    ↓ lexical analyzer
id1 := id2 + id3 * 60
    ↓ syntax analyzer
:=
    id1
    +
        id2
        *
            id3
            60
    ↓ semantic analyzer
:=
    id1
    +
        id2
        *
            id3
            inttoreal
                60
    ↓ intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
(3-address code)
    ↓ code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
    ↓ final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Alongside all phases: Symbol Table (position ...., initial …., rate ….) and Errors
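
The intermediate-code and optimization steps of this walkthrough can be sketched as follows. This is a minimal illustration, not a general optimizer: instructions are `(dest, op, arg1, arg2)` tuples, the names `gen_tac`, `optimize`, and `show` are hypothetical, and the optimizer applies only the two rewrites visible in the example (converting an integer constant at compile time, and dropping a final copy of a temporary).

```python
def gen_tac(tree):
    """Return three-address instructions as (dest, op, arg1, arg2) tuples."""
    code, count = [], 0

    def emit(op, a, b=None):
        nonlocal count
        count += 1
        dest = f"temp{count}"
        code.append((dest, op, a, b))
        return dest

    def walk(node):
        if isinstance(node, str):            # identifier or literal
            return node
        if node[0] == "inttoreal":
            return emit("inttoreal", walk(node[1]))
        op, left, right = node
        a = walk(left)
        b = walk(right)
        return emit(op, a, b)

    _, target, expr = tree
    code.append((target, "copy", walk(expr), None))
    return code

def optimize(code):
    consts, out = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttoreal" and a.isdigit():
            consts[dest] = a + ".0"          # do the conversion at compile time
            continue
        out.append((dest, op, a, b))
    # Drop a trailing copy of the temporary computed just before it.
    if (len(out) >= 2 and out[-1][1] == "copy"
            and out[-1][2] == out[-2][0] and out[-2][0].startswith("temp")):
        dest = out.pop()[0]
        _, op, a, b = out.pop()
        out.append((dest, op, a, b))
    # Renumber the surviving temporaries from 1.
    names = {}
    for dest, _, _, _ in out:
        if dest.startswith("temp"):
            names[dest] = f"temp{len(names) + 1}"
    return [(names.get(d, d), op, names.get(a, a), names.get(b, b))
            for d, op, a, b in out]

def show(ins):
    dest, op, a, b = ins
    if op == "copy":
        return f"{dest} := {a}"
    if op == "inttoreal":
        return f"{dest} := inttoreal({a})"
    return f"{dest} := {a} {op} {b}"
```

Given the tree `(":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))`, `gen_tac` reproduces the four three-address statements, and `optimize` reduces them to the two statements produced by the code optimizer in the walkthrough.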
Assemblers
 Assembly code: names are used for instructions,
and names are used for memory addresses.
MOV a, R1
ADD #2, R1
MOV R1, b
 Two-pass Assembly:
 First Pass: all identifiers are assigned to memory
addresses (0-offset)
e.g. substitute 0 for a, and 4 for b
 Second Pass: produce relocatable machine code:

Load    0001 01 00 00000000 *
Add     0011 01 10 00000010
Store   0010 01 00 00000100 *
(* = relocation bit)
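
The two passes can be sketched for the three-instruction example. Everything here is an assumption made to match that example: the opcode table, the 4-byte word size (so `a` gets 0 and `b` gets 4), the field layout (opcode, register, mode, 8-bit address), and the function names `first_pass` and `second_pass`. The sketch also assumes memory identifiers never start with `R` or `#`.

```python
OPCODES = {"load": "0001", "store": "0010", "add": "0011"}  # assumed encodings
WORD = 4  # assumed bytes per data word

def first_pass(program):
    """Assign each memory identifier an offset from 0, in order of first appearance."""
    symbols = {}
    for _, src, dst in program:
        for operand in (src, dst):
            if not operand.startswith(("R", "#")) and operand not in symbols:
                symbols[operand] = len(symbols) * WORD
    return symbols

def second_pass(program, symbols):
    """Emit (bits, relocatable) pairs using the addresses from the first pass."""
    out = []
    for mnemonic, src, dst in program:
        if mnemonic == "MOV" and dst.startswith("R"):    # memory -> register
            op, reg, operand = "load", dst, src
        elif mnemonic == "MOV":                          # register -> memory
            op, reg, operand = "store", src, dst
        else:                                            # ADD operand, register
            op, reg, operand = "add", dst, src
        if operand.startswith("#"):
            mode, value, reloc = "10", int(operand[1:]), False  # immediate value
        else:
            mode, value, reloc = "00", symbols[operand], True   # address: relocatable
        bits = f"{OPCODES[op]} {int(reg[1:]):02b} {mode} {value:08b}"
        out.append((bits, reloc))
    return out

program = [("MOV", "a", "R1"), ("ADD", "#2", "R1"), ("MOV", "R1", "b")]
symbols = first_pass(program)
machine_code = second_pass(program, symbols)
```

Here `symbols` is `{"a": 0, "b": 4}`, and only the load and store instructions carry the relocation flag, because the immediate operand `#2` is not an address.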

Loaders and Link-Editors
 Loader: takes relocatable machine code, alters
the addresses, and places the altered instructions
into memory.

 Link-editor: takes many (relocatable) machine code
programs (with cross-references) and produces a
single file.
 Needs to keep track of the correspondence between variable
names and their addresses in each piece of code.
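
The loader's address alteration can be sketched as adding a load address to every instruction marked relocatable. The instruction format (a bit string of opcode, register, mode, and 8-bit address, paired with a relocation flag), the function name `load`, and the base address 15 are assumptions for illustration.

```python
def load(relocatable, base):
    """Add the load address `base` to every instruction flagged as relocatable."""
    memory = []
    for bits, reloc in relocatable:
        opcode, reg, mode, addr = bits.split()
        if reloc:
            addr = f"{int(addr, 2) + base:08b}"   # alter the address field only
        memory.append(f"{opcode} {reg} {mode} {addr}")
    return memory

relocatable = [
    ("0001 01 00 00000000", True),    # load a
    ("0011 01 10 00000010", False),   # add #2 (immediate, not an address)
    ("0010 01 00 00000100", True),    # store b
]
absolute = load(relocatable, base=15)
```

With a base of 15, the addresses 0 and 4 become 15 and 19, while the immediate operand 2 is left untouched.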

Compiler Cousins: Preprocessors
Provide Input to Compilers
1. Macro Processing

#define in C: does text substitution before compiling

#define X 3
#define Y A*B+C
#define Z getchar()
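
Text substitution of this kind can be imitated in a few lines. This sketch is a simplification and not how a real C preprocessor works: it handles only argument-free macros, makes a single pass without rescanning, and does not skip string literals or comments. The function name `expand` is hypothetical.

```python
import re

def expand(source, macros):
    """Replace whole-identifier occurrences of macro names with their bodies."""
    def substitute(match):
        word = match.group()
        return macros.get(word, word)      # non-macro identifiers pass through
    return re.sub(r"[A-Za-z_]\w*", substitute, source)

macros = {"X": "3", "Y": "A*B+C", "Z": "getchar()"}
expanded = expand("n = X + 2*Y;", macros)
```

The result is `n = 3 + 2*A*B+C;`. Because the substitution is purely textual, `2*Y` becomes `2*A*B+C` rather than `2*(A*B+C)`, which is why macro bodies are usually parenthesized.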

2. File Inclusion

#include in C - bring in another file before compiling

defs.h:
    //////
    //////
    //////

main.c:
    #include "defs.h"
    …---…---…---
    …---…---…---
    …---…---…---

main.c after preprocessing:
    //////
    //////
    //////
    …---…---…---
    …---…---…---
    …---…---…---

3. Rational Preprocessors
 Augment “Old” Languages With Modern Constructs
 Add Macros for If - Then, While, Etc.
 #define Can Make C Code More Pascal-like

#define begin {
#define end }

4. Language Extensions for a Database System

EQUEL - Database query language embedded in C

## Retrieve (DN=Department.Dnum) where
## Department.Dname = ‘Research’

is Preprocessed into:

ingres_system(“Retr…..Research’”,____,____);

a procedure call in a programming language.

The Grouping of Phases

Front End: Analysis + Intermediate Code Generation
vs.
Back End: Code Generation + Optimization

Number of Passes:
A pass: requires reading and writing intermediate files
Fewer passes: more efficiency.
However: fewer passes require more sophisticated memory
management and compiler phase interaction.
Tradeoffs ...

Compiler Construction Tools

Parser Generators:
Produce Syntax Analyzers
Scanner Generators:
Produce Lexical Analyzers
Syntax-directed Translation Engines:
Generate Intermediate Code
Automatic Code Generators:
Generate Actual Code
Data-Flow Engines:
Support Optimization


The End
