
Compiler Design 1.

Overview

Compilers
Compilers translate from a source language (typically a high-level language) into a functionally equivalent target language (typically the machine code of a particular machine or of a machine-independent virtual machine). Compilers for high-level programming languages are among the larger and more complex pieces of software.
Early compiled languages included Fortran and COBOL.
Early compilers were often multi-pass (so that memory could be reused between passes).
Compiler development helped lead to better programming language design.
Early development focused on syntactic analysis and optimization.
Commercially, compilers are developed by very large software groups.
The current focus is on optimization and smart use of resources for modern RISC (reduced instruction set computer) architectures.

Why Study Compilers?


It is general background knowledge for a good software engineer.
It increases understanding of language semantics.
Seeing the machine code generated for language constructs helps in understanding the performance issues of languages.
It teaches good language design.
New devices may need device-specific languages.
New business fields may need domain-specific languages.

Applications of Compiler Technology & Tools


Processing XML or other formats to generate documents, code, etc.
Processing domain-specific and device-specific languages
Implementing a server that uses a protocol such as HTTP or IMAP
Natural language processing, for example spam filters, search, document comprehension, and summary generation
Translating from a hardware description language to the schematic of a circuit
Automatic graph layout (Graphviz, for example)
Extending an existing programming language
Program analysis and improvement tools

Dynamic Structure of a Compiler


Front end (analysis)

character stream
    val = 10 * val + i

    -> lexical analysis (scanning)

token stream
    token number    token value
    1 (ident)       "val"
    3 (assign)      -
    2 (number)      10
    4 (times)       -
    1 (ident)       "val"
    5 (plus)        -
    1 (ident)       "i"

    -> syntax analysis (parsing)

syntax tree
    Statement
        ident
        =
        Expression
            Term
                number
                *
                ident
            +
            Term
                ident
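As a rough illustration of the scanning step, the Java sketch below groups the characters of val = 10 * val + i into the token stream shown above. It assumes the token numbers from the slide (ident = 1, number = 2, assign = 3, times = 4, plus = 5) and Java 17 or later; the class TokenScanner, its Token record, and the method scan are hypothetical names, and only the characters needed for this example are recognized.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-written scanner sketch (hypothetical names).
// Token codes follow the slide: ident = 1, number = 2, assign = 3, times = 4, plus = 5.
class TokenScanner {
    static final int IDENT = 1, NUMBER = 2, ASSIGN = 3, TIMES = 4, PLUS = 5;

    // A token carries its kind (token number) and, for names and numbers, a value.
    record Token(int kind, String value) {
        @Override public String toString() { return kind + " " + (value == null ? "-" : value); }
    }

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {            // blanks only separate tokens
                i++;
            } else if (Character.isLetter(ch)) {         // ident = letter {letter | digit}
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(new Token(IDENT, src.substring(start, i)));
            } else if (Character.isDigit(ch)) {          // number = digit {digit}
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(NUMBER, src.substring(start, i)));
            } else {
                int kind = switch (ch) {                 // single-character operators
                    case '=' -> ASSIGN;
                    case '*' -> TIMES;
                    case '+' -> PLUS;
                    default -> throw new IllegalArgumentException("invalid character '" + ch + "' at " + i);
                };
                tokens.add(new Token(kind, null));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one "token-number value" line per token of the slide's example.
        scan("val = 10 * val + i").forEach(System.out::println);
    }
}
```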

Dynamic Structure of a Compiler (cont'd)

Front end
    syntax tree (Statement: ident = number * ident + ident)
    -> semantic analysis (type checking, ...)
    intermediate representation: syntax tree, symbol table, or three address code (TAC) ...

Back end (synthesis)
    -> optimization
    -> code generation
    machine code: const 10, load 1, mul, ...
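The first machine instructions on the slide (const 10, load 1, mul, ...) can be reproduced with a post-order walk over the syntax tree of val = 10 * val + i. The Java sketch below assumes a stack machine; the add and store instructions and the variable addresses (val at 1, i at 2) are assumptions chosen to match "load 1" for val, and StackCodeGen with its record types is a hypothetical name (Java 17 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: emitting stack-machine code for  val = 10 * val + i.
// const/load/mul come from the slide; add/store and the addresses are assumptions.
class StackCodeGen {
    interface Expr {}
    record Num(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record BinOp(char op, Expr left, Expr right) implements Expr {}

    static final Map<String, Integer> ADDRESS = Map.of("val", 1, "i", 2); // hypothetical addresses
    final List<String> code = new ArrayList<>();

    // Post-order walk: code for both operands first, then the operator.
    void gen(Expr e) {
        if (e instanceof Num n) code.add("const " + n.value());
        else if (e instanceof Var v) code.add("load " + ADDRESS.get(v.name()));
        else if (e instanceof BinOp b) {
            gen(b.left());
            gen(b.right());
            code.add(b.op() == '*' ? "mul" : "add");   // only * and + occur in this example
        }
    }

    // Assignment: evaluate the right-hand side, then store it into the target.
    void genAssign(String target, Expr rhs) {
        gen(rhs);
        code.add("store " + ADDRESS.get(target));
    }

    public static void main(String[] args) {
        Expr rhs = new BinOp('+', new BinOp('*', new Num(10), new Var("val")), new Var("i"));
        StackCodeGen g = new StackCodeGen();
        g.genAssign("val", rhs);
        System.out.println(g.code); // [const 10, load 1, mul, load 2, add, store 1]
    }
}
```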

Compiler versus Interpreter


Compiler: translates to machine code
    source code -> scanner -> parser -> ... -> code generator -> machine code -> loader

Interpreter: executes source code "directly"
    source code -> scanner -> parser -> interpretation
    Statements in a loop are scanned and parsed again and again.

Variant: interpretation of intermediate code
    source code -> ... compiler ... -> intermediate code (e.g. Java bytecode) -> VM
    The source code is translated into the code of a virtual machine (VM).
    The VM interprets that code, simulating the physical machine.
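To make the VM variant concrete, here is a minimal sketch of a stack-based virtual machine that interprets intermediate code in a fetch-decode-execute loop instead of running native machine code. The instruction set (const, load, store, add, mul) and the class name TinyVM are hypothetical; this is neither the Java VM nor the MicroJava VM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stack VM: interprets a straight-line list of textual instructions.
class TinyVM {
    static int[] run(List<String> program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : program) {                       // fetch
            String[] parts = instr.split(" ");
            switch (parts[0]) {                              // decode and execute
                case "const" -> stack.push(Integer.parseInt(parts[1]));
                case "load"  -> stack.push(locals[Integer.parseInt(parts[1])]);
                case "store" -> locals[Integer.parseInt(parts[1])] = stack.pop();
                case "add"   -> { int r = stack.pop(), l = stack.pop(); stack.push(l + r); }
                case "mul"   -> { int r = stack.pop(), l = stack.pop(); stack.push(l * r); }
                default      -> throw new IllegalStateException("unknown instruction: " + instr);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // val = 10 * val + i, with val in slot 1 and i in slot 2 (slot 0 unused)
        List<String> code = List.of("const 10", "load 1", "mul", "load 2", "add", "store 1");
        int[] locals = run(code, new int[] {0, 4, 2});
        System.out.println("val = " + locals[1]); // val = 42
    }
}
```

A real VM would also need jumps, calls, and a heap; they are left out here to keep the interpretation loop visible.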

Static Structure of a Compiler


parser & semantic analysis: the "main program"; directs the whole compilation
scanner: provides tokens from the source code
code generation: generates machine code
symbol table: maintains information about declared names and types

(Diagram legend: arrows show "uses" relationships and data flow between these components.)



Lexical Analysis
A stream of characters is grouped into tokens. Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters, operators, and special symbols.

Example: int a; a = a + 2;

    int    reserved word
    a      identifier
    ;      special symbol
    a      identifier
    =      operator
    a      identifier
    +      operator
    2      integer constant
    ;      special symbol
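One common way to implement this grouping is a regular expression with one alternative per token class. The Java sketch below classifies the example above; the reserved-word set, the exact patterns, and the names Lexer, Tok, and classify are simplified assumptions rather than a full language definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: regex-based classification of lexemes into the token classes above.
class Lexer {
    record Tok(String lexeme, String kind) {}

    // Hypothetical, deliberately small reserved-word set.
    static final Set<String> RESERVED = Set.of("int", "double", "if", "else", "while", "return");

    // One alternative per token class; the named group that matched tells us the class.
    static final Pattern TOKEN = Pattern.compile(
        "(?<word>[A-Za-z_][A-Za-z0-9_]*)|(?<num>[0-9]+)|(?<op>[-+*/=])|(?<sym>[;,(){}])");

    static List<Tok> classify(String src) {
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            if (Character.isWhitespace(src.charAt(pos))) { pos++; continue; }  // skip blanks
            m.region(pos, src.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("invalid character at index " + pos);
            String lexeme = m.group();
            String kind;
            if (m.group("word") != null) kind = RESERVED.contains(lexeme) ? "reserved word" : "identifier";
            else if (m.group("num") != null) kind = "integer constant";
            else if (m.group("op") != null) kind = "operator";
            else kind = "special symbol";
            out.add(new Tok(lexeme, kind));
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : classify("int a; a = a + 2;")) System.out.println(t.lexeme() + "\t" + t.kind());
    }
}
```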

Syntax Analysis or Parsing


Parsing uses a context-free grammar of valid programming language structures to find the structure of the input. The result of parsing is usually represented by a syntax tree. Example grammar rules:
    expression -> expression + expression | variable | constant
    variable   -> identifier
    constant   -> intconstant | doubleconstant | ...

Example parse tree:

        =
       / \
      a   +
         / \
        a   2
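The grammar above is left-recursive (expression -> expression + expression) and ambiguous, so a hand-written top-down parser usually works from an equivalent iterative form, expression -> operand { "+" operand }, which also makes + left-associative. The Java sketch below is a minimal recursive-descent parser under that assumption; it takes pre-split tokens and returns the tree in prefix notation. AssignParser and its methods are hypothetical names.

```java
import java.util.List;

// Recursive-descent sketch for  assignment -> identifier "=" expression
// with the left-recursion-free rule  expression -> operand { "+" operand }.
class AssignParser {
    private final List<String> tokens;
    private int pos = 0;

    AssignParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private static boolean isIdent(String t)  { return t.matches("[A-Za-z_][A-Za-z0-9_]*"); }
    private static boolean isNumber(String t) { return t.matches("[0-9]+"); }

    // assignment -> identifier "=" expression ; returns the tree in prefix form
    String parseAssignment() {
        String target = peek();
        if (!isIdent(target)) throw new IllegalStateException("identifier expected, found " + target);
        pos++;
        if (!peek().equals("=")) throw new IllegalStateException("'=' expected, found " + peek());
        pos++;
        String rhs = expression();
        if (pos != tokens.size()) throw new IllegalStateException("unexpected token " + peek());
        return "(= " + target + " " + rhs + ")";
    }

    // expression -> operand { "+" operand }  (the loop builds a left-associative tree)
    private String expression() {
        String tree = operand();
        while (peek().equals("+")) {
            pos++;
            tree = "(+ " + tree + " " + operand() + ")";
        }
        return tree;
    }

    // operand -> identifier | integer constant  (stands in for variable | constant)
    private String operand() {
        String t = peek();
        if (!isIdent(t) && !isNumber(t)) throw new IllegalStateException("operand expected, found " + t);
        pos++;
        return t;
    }

    public static void main(String[] args) {
        // Tokens of  a = a + 2  (scanning is assumed to have happened already)
        System.out.println(new AssignParser(List.of("a", "=", "a", "+", "2")).parseAssignment());
        // (= a (+ a 2))
    }
}
```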

Semantic Analysis
The parse tree is checked for constructs that violate the semantic rules of the language.
Semantic rules may be written with an attribute grammar.

Examples:
Using undeclared variables
Function called with improper arguments (wrong number or types of arguments)
Array variables used without array syntax
Type checking of operator arguments
Left-hand side of an assignment must be a variable (sometimes called an L-value)
...
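Two of the checks listed above, use of undeclared variables and type checking of operator arguments, can be sketched as a recursive walk that computes the type of each expression. In this hypothetical Java sketch the AST records, the type names "int" and "boolean", and the rule that + accepts only int operands are assumptions; a real checker covers many more rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical semantic-check sketch: undeclared variables and operand types of "+".
class SemanticChecker {
    interface Expr {}
    record Var(String name) implements Expr {}
    record IntLit(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    private final Map<String, String> declared;     // variable name -> declared type
    final List<String> errors = new ArrayList<>();

    SemanticChecker(Map<String, String> declared) { this.declared = declared; }

    // Computes the type of e; records an error and returns "error" if a rule is violated.
    String typeOf(Expr e) {
        if (e instanceof IntLit) return "int";
        if (e instanceof Var v) {
            String t = declared.get(v.name());
            if (t == null) { errors.add("undeclared variable '" + v.name() + "'"); return "error"; }
            return t;
        }
        Add a = (Add) e;
        String l = typeOf(a.left()), r = typeOf(a.right());
        if (l.equals("error") || r.equals("error")) return "error";  // avoid follow-up messages
        if (!l.equals("int") || !r.equals("int")) {
            errors.add("operator + expects int operands, found " + l + " and " + r);
            return "error";
        }
        return "int";
    }

    // The assignment target must be declared and have the same type as the right-hand side.
    void checkAssign(String target, Expr rhs) {
        String rhsType = typeOf(rhs);
        String targetType = declared.get(target);
        if (targetType == null) errors.add("undeclared variable '" + target + "'");
        else if (!rhsType.equals("error") && !targetType.equals(rhsType))
            errors.add("cannot assign " + rhsType + " to " + targetType);
    }

    public static void main(String[] args) {
        SemanticChecker c = new SemanticChecker(Map.of("a", "int", "flag", "boolean"));
        c.checkAssign("a", new Add(new Var("a"), new IntLit(2)));     // a = a + 2: OK
        c.checkAssign("a", new Add(new Var("flag"), new IntLit(2)));  // boolean operand of +
        c.checkAssign("b", new Var("a"));                             // undeclared target
        c.errors.forEach(System.out::println);
    }
}
```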

Intermediate Code Generation


An intermediate code representation often helps contain the complexity of a compiler and discover code optimizations. Typical choices include:
Annotated parse trees
Three Address Code (TAC) and abstract machine languages
Bytecode, as in Java bytecode

Example statements:
    if (a <= b) { a = a - c; }
    c = b * c;

Resulting TAC:
    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
    L0: _t3 = b * c
    c = _t3
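The TAC above can be produced by a generator that allocates a fresh temporary for every operator and jumps around the then-part using the negated condition (a <= b becomes _t1 = a > b). The Java sketch below reproduces the example under those assumptions; TacGen and its methods are hypothetical, and it emits the label L0: on its own line rather than in front of the next instruction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical TAC generator with fresh temporaries _t1, _t2, ... and labels L0, L1, ...
class TacGen {
    interface Expr {}
    record Var(String name) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // Negated relational operators, used to jump over the then-part.
    static final Map<String, String> NEGATE =
        Map.of("<=", ">", ">", "<=", "<", ">=", ">=", "<", "==", "!=", "!=", "==");

    final List<String> code = new ArrayList<>();
    private int temps = 0, labels = 0;

    // Returns the name holding e's value, emitting one TAC instruction per operator.
    String gen(Expr e) {
        if (e instanceof Var v) return v.name();
        Bin b = (Bin) e;
        String l = gen(b.left()), r = gen(b.right());
        String t = "_t" + (++temps);
        code.add(t + " = " + l + " " + b.op() + " " + r);
        return t;
    }

    void genAssign(String target, Expr rhs) { code.add(target + " = " + gen(rhs)); }

    // if (cond) { then }  ==>  t = !cond; if t goto L; then; L:
    void genIf(Bin cond, Runnable thenPart) {
        String t = gen(new Bin(NEGATE.get(cond.op()), cond.left(), cond.right()));
        String label = "L" + (labels++);
        code.add("if " + t + " goto " + label);
        thenPart.run();
        code.add(label + ":");
    }

    public static void main(String[] args) {
        TacGen g = new TacGen();
        Var a = new Var("a"), b = new Var("b"), c = new Var("c");
        g.genIf(new Bin("<=", a, b), () -> g.genAssign("a", new Bin("-", a, c)));
        g.genAssign("c", new Bin("*", b, c));
        g.code.forEach(System.out::println);
    }
}
```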

Intermediate Code Generation (cont'd)


Example statements:
    if (a <= b) { a = a - c; }
    c = b * c;

Postfix/Polish/Stack:
    v1 v2 JumpIf(>) v1 v3 - store(v1) v2 v3 * store(v3)

Java bytecode (javap -c):
    55: iload_1
    56: iload_2
    57: if_icmpgt 64
    60: iload_1
    61: iload_3
    62: isub
    63: istore_1
    64: iload_2
    65: iload_3
    66: imul
    67: istore_3
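Postfix code is a post-order traversal of the expression tree: left operand, right operand, then the operator. The Java sketch below generates the two assignments of the example using the slide's names v1, v2, v3 for a, b, c; the class Postfix is hypothetical, and the conditional jump is left out because its target is not shown above.

```java
// Sketch: postfix (reverse Polish) code via post-order traversal of an expression tree.
class Postfix {
    interface Expr {}
    record Var(String name) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // Left operand, right operand, then operator.
    static String postfix(Expr e) {
        if (e instanceof Var v) return v.name();
        Bin b = (Bin) e;
        return postfix(b.left()) + " " + postfix(b.right()) + " " + b.op();
    }

    // An assignment pushes the value of the right-hand side, then stores it.
    static String assign(String target, Expr rhs) {
        return postfix(rhs) + " store(" + target + ")";
    }

    public static void main(String[] args) {
        Var v1 = new Var("v1"), v2 = new Var("v2"), v3 = new Var("v3");
        System.out.println(assign("v1", new Bin("-", v1, v3))); // v1 v3 - store(v1)
        System.out.println(assign("v3", new Bin("*", v2, v3))); // v2 v3 * store(v3)
    }
}
```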

Code Optimization
The compiler converts the intermediate representation into another one that attempts to be smaller and faster. Typical optimizations:
Suppressing code generation for unreachable segments
Getting rid of unused variables
Eliminating multiplication by 1 and addition of 0
Loop optimization: e.g. moving statements whose results do not change inside the loop out of the loop
Common subexpression elimination
...
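Eliminating multiplication by 1 and addition of 0 can be done as a bottom-up rewrite of the expression tree. The Java sketch below also performs constant folding (evaluating an operator whose operands are both constants), a closely related optimization that is not in the list above. The Simplifier class and its records are hypothetical, and only + and * are supported.

```java
// Sketch: algebraic simplification (e * 1, e + 0) plus constant folding on an expression tree.
class Simplifier {
    interface Expr {}
    record Num(int value) implements Expr {
        @Override public String toString() { return Integer.toString(value); }
    }
    record Var(String name) implements Expr {
        @Override public String toString() { return name; }
    }
    record Bin(char op, Expr left, Expr right) implements Expr {   // op is '+' or '*'
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    static boolean isConst(Expr e, int k) { return e instanceof Num n && n.value() == k; }

    // Simplify children first so that their rewrites can enable rewrites of the parent.
    static Expr simplify(Expr e) {
        if (!(e instanceof Bin b)) return e;
        Expr l = simplify(b.left()), r = simplify(b.right());
        if (l instanceof Num x && r instanceof Num y)                 // constant folding
            return new Num(b.op() == '+' ? x.value() + y.value() : x.value() * y.value());
        if (b.op() == '+' && isConst(r, 0)) return l;                  // e + 0 -> e
        if (b.op() == '+' && isConst(l, 0)) return r;                  // 0 + e -> e
        if (b.op() == '*' && isConst(r, 1)) return l;                  // e * 1 -> e
        if (b.op() == '*' && isConst(l, 1)) return r;                  // 1 * e -> e
        return new Bin(b.op(), l, r);
    }

    public static void main(String[] args) {
        // (x * 1) + (2 * 0)  simplifies to  x
        Expr e = new Bin('+', new Bin('*', new Var("x"), new Num(1)), new Bin('*', new Num(2), new Num(0)));
        System.out.println(e + "  =>  " + simplify(e));
    }
}
```

Working bottom-up matters here: folding 2 * 0 to 0 first is what lets the parent x + 0 simplify to x.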


Object Code Generation


The target program is generated in the machine language of the target architecture.
Memory locations are selected for each variable.
Instructions are chosen for each operation.
Individual tree nodes or TAC instructions are translated into a sequence of machine language instructions that perform the same task.

Typical machine language instructions include things like:
Load register
Add register to memory location
Store register to memory
...
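A very naive code generator translates each TAC instruction independently into a load, an operation, and a store. The Java sketch below assumes a hypothetical one-register machine whose arithmetic instructions take a memory operand (so MUL R1, c means R1 = R1 * c); the mnemonics LOAD, ADD, SUB, MUL, STORE and the class NaiveCodeGen are assumptions, and only straight-line instructions of the forms x = y and x = y op z are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: one load / operate / store sequence per TAC instruction (hypothetical machine).
class NaiveCodeGen {
    static final Map<String, String> OPCODE = Map.of("+", "ADD", "-", "SUB", "*", "MUL");

    private static String ins(String op, String operand) {
        return String.format("%-5s R1, %s", op, operand);
    }

    // Handles only "x = y" and "x = y op z" (tokens separated by single spaces).
    static List<String> translate(List<String> tac) {
        List<String> out = new ArrayList<>();
        for (String instr : tac) {
            String[] p = instr.split(" ");                             // e.g. ["_t3", "=", "b", "*", "c"]
            out.add(ins("LOAD", p[2]));                                // first operand into register R1
            if (p.length == 5) out.add(ins(OPCODE.get(p[3]), p[4]));   // combine R1 with second operand
            out.add(ins("STORE", p[0]));                               // R1 into the target's memory location
        }
        return out;
    }

    public static void main(String[] args) {
        translate(List.of("_t3 = b * c", "c = _t3")).forEach(System.out::println);
    }
}
```

For _t3 = b * c followed by c = _t3, this emits a STORE to _t3 immediately followed by a LOAD of _t3, the kind of redundancy a later object-code optimization can remove.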


Object Code Optimization


It is possible to have another code optimization phase that transforms the object code into more efficient object code. These optimizations use features of the hardware itself to make efficient use of processors and registers:
Specialized instructions
Pipelining
Branch prediction
Peephole optimizations

JIT (Just-In-Time) compilation of intermediate code (e.g. Java bytecode) can discover more context-specific optimizations not available earlier.
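A peephole optimizer slides a small window over the generated instructions and replaces recognizable patterns with cheaper ones. The Java sketch below implements one classic rule: delete LOAD R, x when it directly follows STORE R, x, because the register still holds x. The instruction syntax and the class Peephole are hypothetical, and the rule is only safe if the LOAD is not a jump target.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: two-instruction peephole window removing a redundant LOAD after a STORE.
class Peephole {
    static List<String> optimize(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String instr : code) {
            if (!out.isEmpty()) {
                String[] prev = out.get(out.size() - 1).trim().split("\\s+", 2);
                String[] cur = instr.trim().split("\\s+", 2);
                boolean redundantLoad = prev[0].equals("STORE") && cur[0].equals("LOAD")
                        && prev.length == 2 && cur.length == 2 && prev[1].equals(cur[1]);
                if (redundantLoad) continue;   // same register and address: value already in R
            }
            out.add(instr);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("LOAD  R1, b", "MUL   R1, c", "STORE R1, _t3", "LOAD  R1, _t3", "STORE R1, c");
        optimize(code).forEach(System.out::println);
    }
}
```

Also deleting the STORE to the temporary would require knowing that _t3 is not used later, which is why such passes often rely on data-flow (liveness) information.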


Symbol Table
Symbol table management is the part of the compiler that interacts with several of the phases:
Identifiers are found during lexical analysis and placed in the symbol table.
During syntactic and semantic analysis, type and scope information is added.
During code generation, type information is used to determine which instructions to use.
During optimization, the results of live-variable analysis may be kept in the symbol table.
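Symbol tables for block-structured languages are often organized as a stack of scopes that is searched from the innermost scope outward. The Java sketch below shows that design using names from the MicroJava example at the end of this section; the Symbol fields (kind, type) and the class SymbolTable are hypothetical minimal choices.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch: a stack of scopes; lookup searches from the innermost scope outward.
class SymbolTable {
    record Symbol(String name, String kind, String type) {}

    // The innermost scope is at the head of the deque.
    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();

    SymbolTable() { openScope(); }                        // the global scope

    void openScope()  { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }

    // Declares a name in the current scope; a duplicate in the same scope is an error.
    Symbol insert(String name, String kind, String type) {
        Map<String, Symbol> current = scopes.peek();
        if (current.containsKey(name)) throw new IllegalStateException(name + " declared twice in the same scope");
        Symbol s = new Symbol(name, kind, type);
        current.put(name, s);
        return s;
    }

    // Returns the innermost visible declaration, or null if the name is undeclared.
    Symbol lookup(String name) {
        for (Map<String, Symbol> scope : scopes) {        // ArrayDeque iterates head (innermost) first
            Symbol s = scope.get(name);
            if (s != null) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable tab = new SymbolTable();
        tab.insert("size", "const", "int");                // global declarations of program P
        tab.insert("Table", "type", "class");
        tab.insert("val", "var", "Table");
        tab.openScope();                                   // locals of main()
        tab.insert("x", "var", "int");
        tab.insert("i", "var", "int");
        System.out.println(tab.lookup("size"));            // visible from the inner scope
        tab.closeScope();
        System.out.println(tab.lookup("x"));               // null: x is out of scope
    }
}
```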


Error Handling
Error handling and reporting also occur across many phases:
The lexical analyzer reports invalid character sequences.
The syntactic analyzer reports invalid token sequences.
The semantic analyzer reports type and scope errors, and the like.

The compiler may be able to continue after some errors, but other errors may stop the process.
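One common way for a parser to continue after an error is panic-mode recovery: report the error, skip tokens up to a synchronization point such as ";", and resume. The Java sketch below applies this to a toy statement form ident "=" number ";"; the grammar, the error message format, and the class PanicModeParser are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of panic-mode error recovery for the toy rule  statement -> ident "=" number ";".
class PanicModeParser {
    final List<String> errors = new ArrayList<>();
    private final List<String> tokens;
    private int pos = 0;

    PanicModeParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    // Parses all statements and returns how many were well-formed.
    int parseProgram() {
        int ok = 0;
        while (pos < tokens.size()) {
            if (statement()) ok++;
            else {
                while (pos < tokens.size() && !tokens.get(pos).equals(";")) pos++;  // skip to ';'
                pos++;                                                              // consume ';'
            }
        }
        return ok;
    }

    private boolean statement() {
        if (!expect("[A-Za-z]\\w*", "identifier")) return false;
        if (!expect("=", "'='")) return false;
        if (!expect("\\d+", "number")) return false;
        return expect(";", "';'");
    }

    // Consumes the current token if it matches; otherwise records an error.
    private boolean expect(String regex, String what) {
        if (peek().matches(regex)) { pos++; return true; }
        errors.add("token " + pos + ": " + what + " expected, found " + peek());
        return false;
    }

    public static void main(String[] args) {
        // a = 1 ;  b = = 2 ;  c = 3 ;   (the second statement is malformed)
        PanicModeParser p = new PanicModeParser(
            List.of("a", "=", "1", ";", "b", "=", "=", "2", ";", "c", "=", "3", ";"));
        System.out.println(p.parseProgram() + " statements OK");
        p.errors.forEach(System.out::println);
    }
}
```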


Compiler / Translator Design Decisions


Choose a source language
Large enough to have many interesting language features
Small enough to implement in a reasonable amount of time
Examples for us: MicroJava, Decaf, MiniJava

Choose a target language


Either a real assembly language for a machine with an assembler
Or a virtual machine language with an interpreter
Examples for us: MicroJava VM (JVM), MIPS (a popular RISC architecture, for which there is a SPIM simulator)

Choose an approach for implementation:


Either use an existing scanner and parser / compiler generator: lex/flex, yacc/bison/byacc, ANTLR/JavaCC/SableCC/byaccj/Coco/R
Or implement these yourself (limits the language somewhat)


Example MicroJava Program


program P                          // main program; no separate compilation
    final int size = 10;
    class Table {                  // classes (without methods)
        int[] pos;
        int[] neg;
    }
    Table val;                     // global variables
{
    void main()
        int x, i;                  // local variables
    {
        //---------- initialize val ----------
        val = new Table;
        val.pos = new int[size];
        val.neg = new int[size];
        i = 0;
        while (i < size) {
            val.pos[i] = 0;
            val.neg[i] = 0;
            i = i + 1;
        }
        //---------- read values ----------
        read(x);
        while (x != 0) {
            if (x > 0) val.pos[x] = val.pos[x] + 1;
            else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
            read(x);
        }
    }
}

