
3. LANGUAGE TRANSLATION ISSUES

Dr. Narayana Swamy Ramaiah
Assoc. Prof., Dept. of Electrical and Computer Engineering,
Arba Minch University, AMIT

3.1 Programming Language Syntax
The syntax of a programming language describes the structure of programs without any
consideration of their meaning.

Examples of syntax features:

• Statements end with ';' (C, C++, Pascal), with '.' (Prolog), or have no ending
symbol (FORTRAN).
• Variable names may start with any letter (C, C++, Java), or must start with a capital
letter (Prolog).
• The symbol for assignment is '=', or ':=', or something else.

Key criteria concerning syntax

 Readability – a program is considered readable if the algorithm and data are apparent
by inspection.
 Writeability – ease of writing the program.
 Verifiability – ability to prove program correctness (very difficult issue)
 Translatability – ease of translating the program into executable form.
 Lack of ambiguity – the syntax should provide for ease of avoiding ambiguous
structures.

Basic syntactic concepts in a programming language

• Character set – the alphabet of the language. Several different character sets are used:
ASCII, EBCDIC, Unicode.
• Identifiers – strings of letters and digits, usually beginning with a letter.
• Operator symbols – + - * /
• Keywords or reserved words – used as a fixed part of the syntax of a statement.
• Noise words – optional words inserted into statements to improve readability.
• Comments – used to improve readability and for documentation purposes.
Comments are usually enclosed by special markers.
• Blanks – rules vary from language to language. Usually only significant in literal
strings.
• Delimiters – used to denote the beginning and the end of syntactic constructs.
• Expressions – functions that access data objects in a program and return a value.
• Statements – the sentences of the language; each describes a task to be performed.

Overall Program-Subprogram Structure

• Separate subprogram definitions: subprograms are compiled separately and linked at load
time. Advantage: easy modification.
• Separate data definitions: group together all operations that manipulate a data
object. This is the general approach in OOP.
• Nested subprogram definitions: subprogram definitions appear as declarations
within the main program or other subprograms. Not used in many contemporary
languages. Provides for static type checking in non-local referencing environments.
• Separate interface definitions: the subprogram interface is the way programs and
subprograms interact by means of arguments and returned results. A program
specification component may be used to describe the type of information transferred
between separate components of the program.

E.g. C/C++ use header files as specification components. Packages contain either the
specification of the interface definitions or the source program implementation.
• Data descriptions separated from executable statements: a centralized data division
contains all data declarations, e.g. COBOL. A program has separate data divisions (data
declarations, global), procedure divisions (executable statements, subunits) and
environment divisions (declarations concerning the external operating environment).
Advantage: the logical data format is independent of the algorithms.
• Unseparated subprogram definitions: no syntactic distinction between main
program statements and subprogram statements. The beginning (function call) and
end (return) of a subprogram are not differentiated syntactically. Allows for run-time
translation and execution.

3.2 Stages in Translation

Translation is the process of converting a source program into an executable object program
while ensuring efficiency. Translation is divided into two parts:

• Analysis of the input source program
• Synthesis of the executable object program

2.1. Analysis of the source program

• Lexical analysis (scanning) – identifying the tokens of the programming language: the
keywords, identifiers, constants and other symbols appearing in the language.

In the program

void main()
{
printf("Hello World\n");
}

the tokens are

void, main, (, ), {, printf, (, "Hello World\n", ), ;, }

• Syntactic analysis (parsing) – determines the structure of the program,
as defined by the language grammar. Syntactic and semantic analysers
communicate using a stack.
• Semantic analysis – assigns meaning to the syntactic structures.

Example:

int variable1;

The meaning is that the program needs storage (typically 4 bytes on current
machines) to serve as a location for variable1. Further on, only a specific set of
operations can be used with variable1, namely integer operations.

Semantic analysis builds the bridge between analysis and synthesis.
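
The scanning step described above for the Hello World program can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes in TOKEN_SPEC and the function name tokenize are invented for this example, the program is written with C's lowercase keywords, and only the few symbols that the program needs are covered. A real scanner handles many more token classes.

```python
import re

# Hypothetical token classes for a tiny C-like subset; not a full C lexer.
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),  # string literal, escapes allowed
    ("IDENT",  r"[A-Za-z_]\w*"),       # keywords and identifiers
    ("NUMBER", r"\d+"),                # integer constants
    ("SYMBOL", r"[(){};,]"),           # punctuation
    ("SKIP",   r"\s+"),                # blanks only separate tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return the lexemes found in source, in order of appearance."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.group())
        pos = match.end()
    return tokens


program = 'void main()\n{\n    printf("Hello World\\n");\n}'
print(tokenize(program))
```

The scanner reports eleven tokens for this program, and the string literal is kept as a single token even though it contains a blank.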

Basic semantic tasks:

• Symbol-table maintenance
• Insertion of implicit information
• Error detection
• Macro processing and compile-time operations
  - A macro is bound (expanded) during translation; a subprogram is bound at run time.
  - Compile-time operations select the sequence of statements to be compiled, e.g.
    #define PC……
    #ifdef PC….windows..#else...unix….#endif

The result of the semantic analysis is an internal representation, suitable to be
used for code optimization and code generation.

2.2. Synthesis of the object program

The final result is the executable code of the program. It is obtained in three
main steps:

Optimization - code optimization applies rules and algorithms to the intermediate
and/or assembler code in order to make it more efficient, i.e. faster and smaller.

For example, the statement A = B + C + D

may generate the intermediate code

(a) Temp1 = B + C
(b) Temp2 = Temp1 + D
(c) A = Temp2

which may in turn generate the straightforward, but inefficient, code

1. Load register with B (from (a))
2. Add C to register
3. Store register in Temp1
4. Load register with Temp1 (from (b))
5. Add D to register
6. Store register in Temp2
7. Load register with Temp2 (from (c))
8. Store register in A

Instructions 3, 4, 6 and 7 are redundant: in each pair the register already holds the
value that is stored and immediately reloaded.
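
The removal of these redundant instructions can be sketched as a small peephole pass. The sketch assumes a hypothetical one-register machine whose instructions are written as (operation, operand) pairs, and the names remove_redundant_pairs and is_temp are made up for this example. It only drops a store to a temporary that is immediately reloaded and never used again; real optimizers use far more general techniques.

```python
def remove_redundant_pairs(code, is_temp=lambda name: name.startswith("Temp")):
    """Drop 'STORE t; LOAD t' pairs when t is a temporary that is not used later."""
    optimized = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        is_pair = (
            op == "STORE"
            and i + 1 < len(code)
            and code[i + 1] == ("LOAD", arg)
            and is_temp(arg)
            and all(later_arg != arg for _, later_arg in code[i + 2:])
        )
        if is_pair:
            i += 2  # the register already holds the value; skip both instructions
        else:
            optimized.append(code[i])
            i += 1
    return optimized


# The eight instructions generated for A = B + C + D.
naive = [
    ("LOAD", "B"), ("ADD", "C"), ("STORE", "Temp1"), ("LOAD", "Temp1"),
    ("ADD", "D"), ("STORE", "Temp2"), ("LOAD", "Temp2"), ("STORE", "A"),
]
for instruction in remove_redundant_pairs(naive):
    print(*instruction)
```

Four instructions remain: load B, add C, add D, store A.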

Code generation - generating assembler commands, machine code or another
object program form with relative memory addresses for the separate program
modules; this yields the object code of the program.

Linking and loading - resolving the addresses; this yields the executable code
of the program.

2.3. Bootstrapping

The compiler for a given language can be written in the same language.
The process is based on the notion of a virtual machine. A virtual machine is
characterized by the set of operations, assumed to be executable by the
machine.

Assume we have:

 A real machine (at the lowest level with machine code operations
implemented in hardware)
 A firmware machine (next level - its set is the assembler language
operations and the program that translates them into machine operations
is stored in a special read-only memory)
 A virtual machine for some internal representation (this is the third level,
and there is a program that translates each operation into assembler
code)
 A compiler for the language L (some language) written in L (the same
language)

The translation of the compiler into the internal representation is done manually
- the programmer manually re-writes the compiler into the internal
representation. This is done once and though tedious, it is not difficult - the
programmer uses the algorithm that is encoded into the compiler. From there on
the internal representation is translated into assembler and then into machine
language.
3.3 Formal Translation Models

Syntax is concerned with the structure of programs. The formal description of
the syntax of a language is called a grammar.

• Grammars consist of a set of rules (called productions) that specify
the sequences of characters (or lexical items) that form allowable
programs in the language being defined. A grammar lets us transform a
program, which is normally represented as a linear sequence
of ASCII characters, into a syntax tree. Only programs that are
syntactically valid can be transformed in this way. This tree will be the
main data structure that a compiler or interpreter uses to process the
program. By traversing this tree the compiler can produce machine code,
or can type check the program, for instance. And by traversing this very
tree the interpreter can simulate the execution of the program.
• A formal grammar is just a grammar specified using a strictly defined
notation.
• Two classes of grammars useful in compiler technology are the
BNF grammar (or context-free grammar) and the regular grammar.
• Grammars are independent of the syntactic analysis.

1. More about grammars

a. Word categories, constituents

A language consists of sentences. Each sentence consists of words. The rules that
tell us how to combine words to form correct sentences are called grammar rules.

Example from English:

"The boy reads" is a valid sentence


"Boy the reads" is an invalid sentence.

The corresponding rule here says that the article must precede
the noun.

Here you see the words "article" and "noun". These words
correspond to certain categories of words in the language.
For example, the words boy, book, room, class, are all nouns,
while read, speak, write, are verbs.

Why do we need word categories?

There are infinitely many sentences and we cannot write a rule for each
individual sentence. We need these categories in order to describe the structural
patterns of the sentences. For example, the basic sentence pattern in English is a
noun phrase followed by a verb phrase.

A sequence of words that constitutes a given category is called a constituent.
For example, everything after "The boy" in each of the sentences below is a
constituent called a verb phrase.

The boy is reading a book.
The boy is reading an interesting book.
The boy is reading a book by Mark Twain.

b. Terminal and non-terminal symbols, grammar rules

How do we represent constituents and how do we represent words?

Grammars use two types of symbols:

Terminal - to represent the words.
Non-terminal - to represent categories of words and constituents.

These symbols are used in grammar rules. Here are some examples:

Rule               Meaning
N → boy            N is the non-terminal symbol for "noun"; "boy" is a terminal symbol.
D → the | a | an   D is the non-terminal symbol for definite or indefinite articles.
NP → D N           A noun phrase NP may consist of an article followed by a noun.

There is one special non-terminal symbol, S, that represents "sentence". It is also
called the starting symbol of the grammar.

Grammars for programming languages - no major differences.

2. BNF notation

Grammars for programming languages use a special notation called
BNF (Backus-Naur form):

The non-terminal symbols are enclosed in < >.
Instead of →, the symbol ::= is used.
The vertical bar is used in the same way - meaning choice.
[ ] are used to represent optional constituents.

BNF notation is equivalent to the first notation in the examples above.

A BNF grammar is defined by a four-element tuple (T, N, P, S). The meaning of
these elements is as follows:

• T is a set of terminals (tokens). Tokens form the vocabulary of the
language and are the smallest units of syntax. These elements are
the symbols that programmers see when they are typing their code,
e.g., the while's, for's, +'s, ('s, etc.
• N is a set of non-terminals. Non-terminals are not part of the language
itself. Rather, they help to determine the structure of the derivation
trees that can be derived from the grammar. Usually we enclose
these symbols in angle brackets, to distinguish them from the
terminals.
• P is a set of production rules. Each production is composed of a
left-hand side, a separator and a right-hand side, e.g.,
<non-terminal> ::= <expr1> ... <exprN>, where '::=' is the separator. For
convenience, productions with the same left-hand side can be
abbreviated using the symbol '|'. The pipe, in this case, is used to
separate different alternatives.
• S is the start symbol ("sentence"). Any sequence of derivations that
ultimately produces a grammatically valid program starts from this
special non-terminal.

As an example, below we have a very simple grammar that describes
arithmetic expressions. In other words, any program in this simple
language represents the product or the sum of names such as 'a', 'b' and
'c'.

<exp> ::= <exp> "+" <exp>
<exp> ::= <exp> "*" <exp>
<exp> ::= "(" <exp> ")"
<exp> ::= "a"
<exp> ::= "b"
<exp> ::= "c"

This grammar can also be written more compactly using a sequence of bar symbols:

<exp> ::= <exp> "+" <exp> | <exp> "*" <exp> | "(" <exp> ")" | "a" | "b" | "c"

3. Derivations, Parse trees, Ambiguity

Using a grammar, we can generate sentences. The process is called derivation.

Example: The simple grammar on p. 91, S → SS | (S) | ( ), generates all
sequences of paired parentheses.

The rules of the grammar can be written separately:

Rule1: S → SS
Rule2: S → (S)
Rule3: S → ( )

One possible derivation is:

S ⇒ (S) by Rule2
⇒ (SS) by Rule1
⇒ ( ( ) S ) by Rule3
⇒ ( ( ) ( ) ) by Rule3

The strings obtained at each step are called sentential forms. They may
contain both terminal and non-terminal symbols. The last
string obtained in the derivation contains only terminal symbols. It is
called a sentence in the language.

This derivation is performed in a leftmost manner. That is, at each step
the leftmost variable (non-terminal) in the sentential form is replaced.
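
A leftmost derivation for the parentheses grammar can be replayed mechanically. The sketch below is only an illustration: the dictionary RULES and the function name leftmost_derivation are assumptions made for this example, and the rules are numbered as in the text above (Rule1: SS, Rule2: (S), Rule3: ()).

```python
# Right-hand sides of the three rules for the single non-terminal S.
RULES = {1: "SS", 2: "(S)", 3: "()"}


def leftmost_derivation(rule_numbers, start="S"):
    """Apply the given rules in order, always rewriting the leftmost S."""
    current = start
    forms = [current]
    for number in rule_numbers:
        if "S" not in current:
            raise ValueError("no non-terminal left to replace")
        current = current.replace("S", RULES[number], 1)  # leftmost S only
        forms.append(current)
    return forms


print(" => ".join(leftmost_derivation([2, 1, 3, 3])))
```

Every element of the returned list except the last is a sentential form that still contains S; the last one, (()()), contains only terminal symbols and is therefore a sentence of the language.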

Parsing is the problem of transforming a linear sequence of characters
into a syntax tree. Nowadays we are very good at parsing. In other
words, we have many tools, such as lex and yacc, that help
us in this task. However, in the early days of computer science parsing
was a very difficult problem. This was one of the first, and most
fundamental, challenges that the first compiler writers had to face. If the
program text describes a syntactically valid program, then it is possible
to convert this text into a syntax tree. As an example, the figure below
contains different parsing trees for three different programs written in
our grammar of arithmetic expressions:

There are many algorithms to build a parsing tree from a sequence of
characters. Some are more powerful, others are more practical.
Basically, these algorithms try to find a sequence of applications of the
production rules that ends up generating the target string. For instance,
let's consider the grammar below, which specifies a very small subset
of the English grammar:

<sentence> ::= <noun phrase> <verb phrase> .


<noun phrase> ::= <determiner> <noun> | <determiner> <noun> <prepositional phrase>
<verb phrase> ::= <verb> | <verb> <noun phrase> | <verb> <noun phrase> <prepositional phrase>
<prepositional phrase> ::= <preposition> <noun phrase>
<noun> ::= student | professor | book | university | lesson | programming language | glasses
<determiner> ::= a | the
<verb> ::= taught | learned | read | studied | saw
<preposition> ::= by | with | about

Below we have a sequence of derivations showing that the sentence "the
student learned the programming language with the professor" is a valid
sentence in this language:

<sentence> ⇒ <noun phrase> <verb phrase> .
⇒ <determiner> <noun> <verb phrase> .
⇒ the <noun> <verb phrase> .
⇒ the student <verb phrase> .
⇒ the student <verb> <noun phrase> <prepositional phrase> .
⇒ the student learned <noun phrase> <prepositional phrase> .
⇒ the student learned <determiner> <noun> <prepositional phrase> .
⇒ the student learned the <noun> <prepositional phrase> .
⇒ the student learned the programming language <prepositional phrase> .
⇒ the student learned the programming language <preposition> <noun phrase> .
⇒ the student learned the programming language with <noun phrase> .
⇒ the student learned the programming language with <determiner> <noun> .
⇒ the student learned the programming language with the <noun> .
⇒ the student learned the programming language with the professor .

Ambiguity
Compilers and interpreters use grammars to build the data-structures
that they will use to process programs. Therefore, ideally a given
program should be described by only one derivation tree. However,
depending on how the grammar was designed, ambiguities are possible.
A grammar is ambiguous if some phrase in the language generated by
the grammar has two distinct derivation trees. For instance, the grammar
below, which we have been using as our running example, is ambiguous.

<exp> ::= <exp> "+" <exp>


| <exp> "*" <exp>
| "(" <exp> ")"
| "a" | "b" | "c"

To see that this grammar is ambiguous, observe that it is possible to derive two
different syntax trees for the string "a * b + c". The figure below shows these two
different derivation trees: one groups the string as (a * b) + c, the other as
a * (b + c).

Sometimes the ambiguity in the grammar can compromise the meaning
of the sentences that we derive from that grammar. It is very important
that grammars for programming languages are not ambiguous.
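
The ambiguity can also be checked by counting derivation trees. The sketch below is a brute-force count written for this particular expression grammar only; the function name count_parse_trees is an assumption of this example, and every token is taken to be a single character.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct derivation trees for tokens under
    <exp> ::= <exp> "+" <exp> | <exp> "*" <exp> | "(" <exp> ")" | "a" | "b" | "c"
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j] from <exp>.
        total = 0
        if j - i == 1 and tokens[i] in ("a", "b", "c"):
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):  # operator at k, operands on both sides
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("a*b+c"))    # 2: (a*b)+c and a*(b+c)
print(count_parse_trees("(a*b)+c"))  # 1: the parentheses force one grouping
```

A count above 1 for any string is enough to show that the grammar is ambiguous.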

4. Grammars for programming languages

4.1. Types of grammars

There are 4 types of grammars, depending on the rule format.

Regular grammars (Type 3)

A → a
A → aB

Context-free grammars (Type 2)

A → any string consisting of terminals and non-terminals

Context-sensitive grammars (Type 1)

String1 → String2

String1 and String2 are any strings consisting of terminals and
non-terminals, provided that the length of String1 is not greater
than the length of String2.

General grammars (Type 0)

String1 → String2, no restrictions.

4.2. Regular grammars and Regular expressions

Regular grammars are used to describe identifiers in programming languages and
simple arithmetic expressions without nested parentheses.

<nonterminal> ::= <terminal> <nonterminal> | <terminal>

A grammar to generate binary strings ending in 0 is given by

A → 0A | 1A | 0

The first two alternatives are used to generate any binary string, and the third
alternative is used to end the generation with a final 0.
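
The grammar for binary strings ending in 0 translates directly into a short recursive recognizer. The sketch below is an illustration only: the function name derives is made up, and the comparison with the pattern [01]*0 from Python's re module is added here as a cross-check, not taken from the notes.

```python
import re
from itertools import product


def derives(s):
    """True if A =>* s using the rules A -> 0A | 1A | 0."""
    if s == "0":  # rule A -> 0 ends the derivation
        return True
    # rules A -> 0A and A -> 1A consume one symbol and continue with A
    return len(s) > 1 and s[0] in "01" and derives(s[1:])


# Cross-check against the equivalent regular expression on all short strings.
for length in range(1, 7):
    for chars in product("01", repeat=length):
        s = "".join(chars)
        assert derives(s) == bool(re.fullmatch(r"[01]*0", s))

print(derives("1010"), derives("101"))
```

The recognizer accepts 1010 and rejects 101, because the only rule that ends a derivation produces a final 0.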

Regular languages are the languages that can be described by regular expressions.

Context-free grammars generate context-free languages.
They are used to describe programming languages.

Regular expressions

Strings of symbols may be composed of other strings by means of

Concatenation - appending two strings, and
Kleene star operation - any number of repetitions of the string, including none.
E.g. a* can be the empty string, or a, or aa, or aaaaaaa, etc.

Given an alphabet ∑, regular expressions consist of string
concatenations combined with the symbols U and *, possibly using '('
and ')'. There is one special symbol used to denote an empty expression:
Ø

Formal definition:

1. Ø and each member of ∑ is a regular expression.
2. If α and β are regular expressions, then (α β) is a regular expression.
3. If α and β are regular expressions, then α U β is a regular expression.
4. If α is a regular expression, then α* is a regular expression.
5. Nothing else is a regular expression.

Example:

Let ∑ = {0, 1}. Examples of regular expressions are:

0, 1, 010101, any combination of 0s and 1s

Regular expression   Generated strings
0 U 1                0, 1
(0 U 1)1*            0, 01, 011, 0111, …, 1, 11, 111, …
(0 U 1)*01           01, 001, 0001, …, 1101, 1001, …
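
The generated strings can be listed with Python's re module, which writes the union symbol U as |. This is a sketch under that notational assumption; the helper name matching_strings is made up for the example, and it simply tries every binary string up to a given length.

```python
import re
from itertools import product


def matching_strings(pattern, max_len):
    """All non-empty binary strings of length <= max_len that match pattern in full."""
    compiled = re.compile(pattern)
    result = []
    for length in range(1, max_len + 1):
        for chars in product("01", repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                result.append(candidate)
    return result


print(matching_strings("0|1", 3))       # ['0', '1']
print(matching_strings("(0|1)1*", 3))   # ['0', '1', '01', '11', '011', '111']
print(matching_strings("(0|1)*01", 3))  # ['01', '001', '101']
```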

Exam-like questions

1. List and briefly explain the key criteria in syntax design.
2. Briefly describe lexical analysis, syntactic analysis and semantic analysis.
List the basic semantic tasks.
3. List three steps in synthesis of the object program and briefly characterize them.
4. What is bootstrapping?
5. What type of expressions are identifiers and arithmetic expressions in programming
languages?
6. What type of grammars are used to describe the syntax of identifiers and arithmetic
expressions?
7. What type of grammars are used to describe the syntax of programming language statements?

Chapter 4: Elementary Data Types
4.1 Properties of Types and Objects

Basic differences among programming languages:

 types of data allowed


 types of operations available
 mechanisms for controlling the sequence of operations

Elementary data types: built upon the available hardware features


Structured data types: software simulated

1. Data objects, variables, and constants

1.1. Data object:

A run-time grouping of one or more pieces of data in a virtual computer.
A location in memory with an assigned name in the actual computer.

Types of data objects:

• Programmer defined data objects - variables, arrays, constants, files, etc.
• System defined data objects - set up for housekeeping during program
execution, not directly accessible by the program. E.g. run-time storage
stacks.

Data value: a bit pattern that is recognized by the computer.

Elementary data object: contains a data value that is manipulated as a unit.
Data structure: a combination of data objects.

Attributes: determine how the location may be used. Most important attribute
- the data type.

Attributes and Bindings

 Type: determines the set of data values that the object may take and
the applicable operations.
 Name: the binding of a name to a data object.
 Component: the binding of a data object to one or more data objects.
 Location: the storage location in memory assigned by the system.
 Value: the assignment of a bit pattern to a name.

Type, name and component are bound at translation, location is bound at
loading, and value is bound at execution.

1.2. Data objects in programs

In programs, data objects are represented as variables and constants

Variables: Data objects defined and named by the programmer explicitly.

Constants: a data object with a name that is permanently bound to a value for
its lifetime.

• Literals: constants whose name is the written representation of their value.
• A programmer-defined constant: the name is chosen by the
programmer in a definition of the data object.

1.3. Persistence

Data objects are created and exist during the execution of the program. Some data
objects exist only while the program is running. They are called transient data
objects. Other data objects continue to exist after the program terminates, e.g. data
files. They are called persistent data objects. In certain applications, e.g. transaction-
based systems the data and the programs coexist practically indefinitely, and they
need a mechanism to indicate that an object is persistent. Languages that provide such
mechanisms are called persistent languages.

2. Data types

A data type is a class of data objects with a set of operations for creating and
manipulating them.

Examples of elementary data types:


integer, real, character, Boolean, enumeration, pointer.

2.1. Specification of elementary data types

1. Attributes that distinguish data objects of that type

Data type, name - invariant during the lifetime of the object. Attributes may be

• stored in a descriptor and used during the program execution, or
• used only to determine the storage representation, not used explicitly
during execution.

2. Values that data objects of that type may have

Determined by the type of the object.
Usually an ordered set, i.e. it has a least and a greatest value.

3. Operations that define the possible manipulations of data objects of that type.

Primitive - specified as part of the language definition
Programmer-defined (as subprograms, or class methods)

An operation is defined by:

• Domain - set of possible input arguments
• Range - set of possible results
• Action - how the result is produced

The domain and the range are specified by the operation signature:

• the number, order, and data types of the arguments in the domain,
• the number, order, and data type of the resulting range

Mathematical notation for the specification:

op name: arg type x arg type x … x arg type → result type

The action is specified in the operation implementation

Sources of ambiguity in the definition of programming language operations

• Operations that are undefined for certain inputs.
• Implicit arguments, e.g. use of global variables.
• Implicit results - the operation may modify its arguments

(HW 01 - the value of a changed in x = a + b)

• Self-modification - usually through change of local data between calls,
e.g. random number generators change the seed.

Subtypes : a data type that is part of a larger class.


Examples: in C, C++ int, short, long and char are variations of integers.

The operations available to the larger class are available to the subtype.
This can be implemented using inheritance.

2.2. Implementation of a data type

4. Storage representation

Influenced by the hardware.
Described in terms of:
Size of the memory blocks required
Layout of attributes and data values within the block

Two methods to treat attributes:

a. determined by the compiler and not stored in descriptors during
execution - C
b. stored in a descriptor as part of the data object at run time -
LISP, Prolog

5. Implementation of operations

• Directly as a hardware operation, e.g. integer addition
• Subprogram/function, e.g. square root operation
• In-line code. Instead of using a subprogram, the code is copied into the
program at the point where the subprogram would have been invoked.

3. Declarations

Declarations provide information about the name and type of data objects
needed during program execution.

• Explicit – programmer defined
• Implicit – system defined

e.g. in FORTRAN - the first letter in the name of the variable determines the
type
Perl - the variable is declared by assigning a value

$abc = 'a string'     $abc is a string variable
$abc = 7              $abc is an integer variable

Operation declarations: prototypes of the functions or subroutines that are
programmer-defined.

Examples:
declaration: float Sub(int, float)
signature: Sub: int x float --> float

Purpose of declaration

• Choice of storage representation
• Storage management
Declaration determines the lifetime of a variable, and allows for more
efficient memory usage.
• Specifying polymorphic operations.

Depending on the data types, operations having the same name may have
different meanings, e.g. integer addition and float addition.

In most languages +, -, *, / are overloaded.
Ada - allows the programmer to overload subprograms
ML - full polymorphism

Declarations provide for static type checking

4. Type checking and type conversion

Type checking: checking that each operation executed by a program receives
the proper number of arguments of the proper data types.
Static type checking is done at compilation.
Dynamic type checking is done at run-time.

Dynamic type checking – Perl and Prolog

Implemented by storing a type tag in each data object.

Advantages: Flexibility
Disadvantages:

 Difficult to debug
 Type information must be kept during execution
 Software implementation required as most hardware does not provide
support
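
Dynamic type checking with type tags can be imitated in a few lines. The sketch below is a toy model: the class Tagged and the function add are invented for this illustration, and the check that a hardware add instruction would skip is done in software before the operation runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    tag: str         # type tag stored in the data object itself
    payload: object  # the actual value


def add(x, y):
    """Integer addition that inspects the operand tags at run time."""
    if x.tag != "int" or y.tag != "int":
        raise TypeError(f"add expects int operands, got {x.tag} and {y.tag}")
    return Tagged("int", x.payload + y.payload)


print(add(Tagged("int", 2), Tagged("int", 3)).payload)

try:
    add(Tagged("int", 2), Tagged("string", "three"))
except TypeError as error:
    print("run-time type error:", error)
```

The second call is only rejected when it is executed, which illustrates both the flexibility and the debugging cost listed above.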

Concern for static type checking affects language aspects:

Declarations, data-control structures, provisions for separate compilation of
subprograms

Strong typing: all type errors can be statically checked

Type inference: implicit data types, used if the interpretation is unambiguous. Used
in ML

Type Conversion and Coercion

Explicit type conversion: routines to change from one data type to another.

Pascal: the function round - converts a real type into integer
C - cast, e.g. (int)X for float X converts the value of X to type integer

Coercion: implicit type conversion, performed by the system.

Pascal: + integer and real, integer is converted to real
Java - permits implicit coercions if the operation is widening
C++ - an explicit cast should be given when the conversion is narrowing.

Two opposite approaches to type coercions:

• No coercions (or very few), any type mismatch is considered an error:
Pascal, Ada
• Coercions are the rule. Only if no conversion is possible, an error
is reported.

Advantages of coercions: free the programmer from some low-level
concerns, such as adding real numbers and integers.

Disadvantages: may hide serious programming errors.


5. Assignment and Initialization

Assignment - the basic operation for changing the binding of a value to a data object.

Two different ways to define the assignment operation:

a. does not return a value
b. returns the assigned value

The assignment operation can be defined using the concepts L-value and R-value.

Location for an object is its L-value.
Contents of that location is its R-value.

Consider executing: A = A + B;

1. Pick up contents of location A: R-value of A
2. Add contents of location B: R-value of B
3. Store result into address A: L-value of A.

For each named object, its position on the right-hand-side of the assignment
operator (=) is a content-of access, and its position on the left-hand-side of the
assignment operator is an address-of access.

o address-of is an L-value
o contents-of is an R-value
o Value, by itself, generally means R-value
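
The three steps of A = A + B can be traced with a toy memory model. Everything in the sketch is an assumption made for illustration: the addresses 100 and 104, the initial contents 3 and 4, and the helper names l_value, r_value and assign.

```python
# Hypothetical addresses and contents, chosen for illustration only.
symbol_table = {"A": 100, "B": 104}  # name -> location (L-value)
memory = {100: 3, 104: 4}            # location -> contents (R-value)


def l_value(name):
    """Address-of access: where the object lives."""
    return symbol_table[name]


def r_value(name):
    """Contents-of access: what is stored at the object's location."""
    return memory[l_value(name)]


def assign(target, value):
    """Store value at the L-value of target."""
    memory[l_value(target)] = value


# A = A + B: two contents-of accesses on the right, one address-of on the left.
assign("A", r_value("A") + r_value("B"))
print(l_value("A"), r_value("A"))  # 100 7
```

The location of A is unchanged by the assignment; only the contents bound to that location change.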

Initialization

Uninitialized data object - a data object has been created, but no value is
assigned, i.e. only allocation of a block storage has been performed.

4.2 Elementary Data Types:
Scalar Data Types, Composite Data Types

A. Scalar Data Types
o Numeric data types
o Other data types
B. Composite Data Types
o Character strings
o Pointers and programmer-constructed objects
o Files

Exam-like questions

A. Scalar Data Types

Scalar data types represent a single object, i.e. only one value can be derived.
In general, scalar objects follow the hardware architecture of a computer.

1. Numeric data types

Integers

Specification

• Maximal and minimal values - depending on the hardware.
In some languages these values are represented as defined
constants.
• Operations:

Arithmetic
Relational
Assignment
Bit operations

Implementation: Most often using the hardware-defined integer
storage representation and a set of hardware arithmetic and relational
primitive operations on integers.

Subranges

Specification: A subtype of integer, consisting of a sequence of integer
values within some restricted range, e.g. a Pascal declaration A: 1..10
means that the variable A may be assigned integer values from 1
through 10.

Implementation: smaller storage requirements, better type checking


Floating-point real numbers

Specification

• An ordered sequence of values from some hardware-determined minimum
negative value to a maximum value.
• Similar arithmetic, relational and assignment operations as with
integers.
Roundoff issues - the check for equality may fail due to
roundoff.

Implementation: Mantissa - exponent model.
The storage is divided into a mantissa - the significant bits of the
number, and an exponent.

Example: 10.5 = 0.105 x 10^2

Mantissa: 105
Exponent: 2
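
The mantissa-exponent split and the roundoff problem can both be observed from Python. Note the assumption behind the sketch: Python floats follow the hardware's binary format, so math.frexp reports a base-2 mantissa and exponent, while the decimal module is used to show the base-10 digits 105 from the example above.

```python
import math
from decimal import Decimal

# Binary model used by the hardware: 10.5 == mantissa * 2**exponent
mantissa, exponent = math.frexp(10.5)
print(mantissa, exponent)  # 0.65625 4

# Decimal model, as in the example: digits 1, 0, 5 with a power-of-ten exponent
print(Decimal("10.5").as_tuple())

# Roundoff: an equality check on computed values may fail
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```

Comparing with a tolerance, as math.isclose does, is the usual way around the failed equality check.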

Fixed-point real numbers

Specification: Used to represent real numbers with predefined decimal
places, such as dollars and cents.

Implementation: May be directly supported by hardware or simulated
by software.

2. Other data types

Complex numbers: software simulated with two storage locations -
one for the real portion and one for the imaginary portion.

Rational numbers: the quotient of two integers.

Enumerations: ordered list of different values.

Example: enum StudentClass {Fresh, Soph, Junior, Senior}
A variable of type StudentClass may accept only one of the four listed values.

Implementation: represented during run time as integers
corresponding to the listed values.
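
The integer representation of an enumeration can be made visible with Python's IntEnum. The numbering 0 to 3 below is an assumption that follows the usual C convention; the notes do not say which integers are used.

```python
from enum import IntEnum


class StudentClass(IntEnum):
    # Explicit numbering from 0, mirroring the usual C convention (an assumption).
    Fresh = 0
    Soph = 1
    Junior = 2
    Senior = 3


year = StudentClass.Junior
print(int(year))                  # 2: the run-time integer behind the name
print(StudentClass.Fresh < year)  # True: the listed values are ordered

try:
    StudentClass(7)  # not one of the four listed values
except ValueError:
    print("7 is not a StudentClass value")
```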

Booleans

Specification: Two values: true and false. Can be given explicitly as an
enumeration, as in Pascal and Ada. Basic operations: and, or, not.

Implementation: A single addressable unit such as byte or word. Two
approaches:

• Use a particular bit for the value, e.g. the last bit; 1 - true, 0 -
false.
• Use the entire storage; a zero value would then be false,
otherwise - true.

Characters

Specification: Single character as a value of a data object.
Collating sequence - the ordering of the characters, used for
lexicographic sorting.

Operations:

Relational
Assignment
Testing the type of the character - e.g. digit, letter, special
symbol.

Implementation: usually directly supported by the underlying hardware.

B. Composite Data Types

Characterized by a complex data structure organization, processed by the compiler.

1. Character strings: Data objects that are composed of a sequence of characters

Specification and syntax. Three basic methods of treatment:

a. Fixed declared length - storage allocation at translation time

The data object is always a character string of a declared length.
Strings longer than the declared length are truncated.

b. Variable length to a declared bound - storage allocation at translation time.

An upper bound for length is set and any string over that length
is truncated

c. Unbounded length - storage allocation at run time. Strings can be of any length.

Special case: C (and C-style strings in C++)

Strings are arrays of characters
No string type declaration
Null character determines the end of a string.
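
A short sketch of how the terminating null character defines the length of a C-style string. The function c_string_length is written out here for illustration; the standard library already provides the same behavior as std::strlen.

```cpp
#include <cstddef>

// Count characters up to, but not including, the terminating null.
std::size_t c_string_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}
```

Note that the array holding "hello" occupies six characters: five letters plus the null terminator.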

Operations

 Concatenation - appending two strings one after another
 Relational operation on strings - equal, less than, greater than
 Substring selection using positioning subscripts
 Substring selection using pattern matching
 Input/Output formatting
 Dynamic strings - the string is evaluated at run time.

Perl: "$ABC" will be evaluated as a name of a variable, and the
contents of the variable will be used.

Implementation

Fixed declared length: a packed vector of characters

Variable length to a declared bound: a descriptor that contains the
maximum length and the current length

Unbounded length: either a linked storage of fixed-length data objects
or a contiguous array of characters with dynamic run-time storage
allocation.
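
The "variable length to a declared bound" representation can be sketched as a descriptor stored next to a fixed block of characters. BoundedString is a hypothetical type for this illustration, not a standard one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical bounded string: descriptor (maximum and current length)
// plus storage allocated at translation time.
template <std::size_t MaxLen>
struct BoundedString {
    std::size_t max_length = MaxLen;
    std::size_t length = 0;
    char data[MaxLen] = {};

    // Copy s into the buffer, truncating anything beyond the bound.
    void assign(const char* s) {
        length = std::min(std::strlen(s), MaxLen);
        std::memcpy(data, s, length);
    }
};
```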

2. Pointers and programmer-constructed objects


 Pointers are variables that contain the location of other data objects.
 They allow the construction of complex data objects.
 They are used to link together the components of complex data objects.

Specification:

 Pointers may reference data objects of a single type only - C, Pascal,
Ada.
 Pointers may reference data objects of any type - Smalltalk

C, C++: pointers are data objects and can be manipulated by the
program
Java: pointers are hidden data structures, managed by the language
implementation

Operations:

 Creation operation:

Allocates a block of storage for the new data object, and
returns its address to be stored in the pointer variable.
No name of the location is necessary as the reference
would be by the pointer.

 Selection operation: the contents of the pointer are used as an
address in the memory.

Implementation

Methods:

 Absolute addresses stored in the pointer. Allows for storing
the new object anywhere in the memory
 Relative addresses: offset with respect to some base address.
Requires initial allocation of a block of storage to be used by
the data objects. The address of each object is relative to the
address of the block.

Advantages: the entire block can be moved to another
location without invalidating the addresses in the
pointers, as they are relative, not absolute.
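
The advantage of relative addresses can be sketched with a block of bytes and offsets into it. The names Offset, store_int and load_int are assumptions for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical relative pointer: an offset into a block, not an address.
using Offset = std::size_t;

// Store an int at the given offset inside the block.
void store_int(std::vector<unsigned char>& block, Offset at, int value) {
    std::memcpy(block.data() + at, &value, sizeof value);
}

// Selection: base address of the block plus the stored offset.
int load_int(const std::vector<unsigned char>& block, Offset at) {
    int value = 0;
    std::memcpy(&value, block.data() + at, sizeof value);
    return value;
}
```

If the block is copied to a new location, the same offset still selects the same object, whereas an absolute address into the old block would no longer be valid.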

Implementation problems:

 Creating objects of different sizes during execution requires
the management of a general heap storage area.
 Garbage: occurs when the contents of a pointer are destroyed while
the object still exists; the object is then no longer accessible.
 Dangling references: the object is destroyed, but the
pointer still contains the address of the formerly used location, and can
be wrongly used by the program.
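
Using a dangling raw pointer is undefined behavior in C++, so it cannot be demonstrated safely. As a hedged alternative, the sketch below uses the standard smart pointers std::shared_ptr and std::weak_ptr, which make the destroyed state detectable; the function names are invented for the example.

```cpp
#include <memory>

// True if the observer still refers to a live object.
bool is_alive(const std::weak_ptr<int>& observer) {
    return !observer.expired();
}

bool dangling_detected() {
    std::weak_ptr<int> observer;
    {
        auto owner = std::make_shared<int>(7);
        observer = owner;        // second access path to the same object
    }                            // owner goes out of scope: object destroyed
    return !is_alive(observer);  // a raw pointer here would dangle silently
}
```

Reference-counted ownership also addresses garbage: the object is released when its last owner disappears.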
3. Files

Characteristics:

 Usually reside on secondary storage devices such as disks and tapes.
 Lifetime is greater than the lifetime of the program that has created the
files.

Types of files depending on the method of access

 Sequential file: a data structure composed of a linear sequence of
components of the same type.

File operations:
Open
Read
Write
End-of-file
Close

Implementation: usually handled by the operating system.

 Interactive Input-Output: sequential files used in interactive mode.
 Direct Access Files: any single component can be accessed at random
just as in an array.

Key: the subscript to access a component.

Implementation: a key table is kept in main memory

 Indexed Sequential Files: similar to direct access files using a key,
combined with the capability to process the file sequentially. The file must
be ordered by the key.
4.3: Encapsulation - Structured Data Types

 Structured Data Types
 Specifications of data structure types
 Implementation of data structure types
 Declarations and type checking for data structures
 Vectors and arrays
 Records
 Other structured data objects

Mechanisms to create new data types

 Structured data
o Homogeneous: arrays, lists, sets
o Non-homogeneous: records
 Subprograms
 Type declarations - to define new types and operations
 Inheritance

A. Structured data types

A data structure is a data object that contains other data objects as its elements
or components.
1. Specifications
 Number of components
Fixed size – Arrays, records, fixed size data structure
Variable size – stacks, lists, sets, tables and files
Pointer is used to link components.
 Type of each component
Homogeneous – all components are the same type - arrays
Heterogeneous – components are of different types – records,
lists, structures
 A data structure type needs a selection mechanism to identify each
component - index, pointer
Two-step process:
Referencing the structure
selection of a particular component
 Maximum number of components
 Organization of the components:
 simple linear sequence of components
 multidimensional structures:
 separate types (Fortran)
 vector of vectors (C++)
Operations on data structures
 Component selection operations
Sequential
Random
 Insertion/deletion of components
 Whole-data structure operations
Creation/destruction of data structures

2. Implementation of data structure types

Storage representation

Includes:

a. storage for the components
b. optional descriptor - to contain some or all of the attributes

Sequential representation: the data structure is stored in a single contiguous
block of storage that includes both descriptor and components. Used for
fixed-size structures, homogeneous structures (arrays, character strings)

Linked representation: the data structure is stored in several noncontiguous
blocks of storage, linked together through pointers. Used for variable-size
structures (trees, lists)

Stacks, queues, lists can be represented in either way. Linked representation is
more flexible and ensures true variable size; however, it has to be software
simulated.

Implementation of operations on data structures

Component selection in sequential representation: Base address
plus offset calculation. Add component size to current location to
move to next component.

Component selection in linked representation: Move from address
location to address location following the chain of pointers.
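
Both selection methods can be sketched in C++. The function and type names are hypothetical; the sequential version spells out the base-plus-offset arithmetic that array indexing normally hides.

```cpp
#include <cstddef>

// Sequential representation: address = base + index * component size.
const int* select_sequential(const int* base, std::size_t index) {
    const unsigned char* raw = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const int*>(raw + index * sizeof(int));
}

// Linked representation: each component stores the location of the next.
struct Node {
    int value;
    const Node* next;
};

// Follow the chain of pointers index times; nullptr if the chain ends.
const Node* select_linked(const Node* head, std::size_t index) {
    while (head != nullptr && index > 0) {
        head = head->next;
        --index;
    }
    return head;
}
```

Sequential selection takes constant time, while linked selection takes time proportional to the position of the component.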

Storage management

Access paths to a structured data object - to ensure access to the object
for its processing. Created using a name or a pointer.

Two central problems:

Garbage - the data object is bound but the access path is
destroyed. The memory cannot be unbound.

Dangling references - the data object is destroyed, but the
access path still exists.

3. Declarations and type checking for data structures

What is to be checked:

 Existence of a selected component
 Type of a selected component
4. Vectors and arrays

A vector - one dimensional array

A matrix - two dimensional array

Multidimensional arrays

A slice - a substructure in an array that is also an array, e.g. a column in a
matrix.

Implementation of array operations:

 Access - can be implemented efficiently if the length of the
components of the array is known at compilation time. The address of
each selected element can be computed using an arithmetic expression.
 Whole array operations, e.g. copying an array - may require much
memory.
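
The arithmetic expression for element access can be sketched for a two-dimensional array. Row-by-row storage with subscripts starting at 0 is assumed here (as in C and C++); other languages may store columns first or use other lower bounds. Matrix2D and element_address are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical descriptor for a two-dimensional array stored row by row.
struct Matrix2D {
    std::size_t base;       // address of element [0][0]
    std::size_t columns;    // number of elements per row
    std::size_t elem_size;  // size of one component in bytes
};

// address(i, j) = base + (i * columns + j) * elem_size
std::size_t element_address(const Matrix2D& m, std::size_t i, std::size_t j) {
    return m.base + (i * m.columns + j) * m.elem_size;
}
```

With base 1000, 4 columns and 8-byte components, element [2][1] lies at 1000 + (2 x 4 + 1) x 8 = 1072.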

Associative arrays
Instead of using an integer index, elements are selected by a key value, that is
a part of the element. Usually the elements are sorted by the key and binary
search is performed to find an element in the array.
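
A minimal sketch of the sorted-by-key approach, using the standard algorithm std::lower_bound for the binary search. The alias Entry and the function lookup are assumptions for this example; many implementations use hashing instead.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // key, associated value

// Binary search over entries kept sorted by key; nullptr if absent.
const int* lookup(const std::vector<Entry>& sorted, const std::string& key) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    if (it == sorted.end() || it->first != key) return nullptr;
    return &it->second;
}
```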

5. Records

A record is a data structure composed of a fixed number of components of
different types.
The components may be heterogeneous, and they are named with symbolic
names.

Specification of attributes of a record:

Number of components
Data type of each component
Selector used to name each component.

Implementation:

Storage: single sequential block of memory where the components are
stored sequentially.

Selection: provided the type of each component is known, the location
can be computed at translation time.

Note on efficiency of storage representation:

For some data types storage must begin on specific memory boundaries
(required by the hardware organization). For example, integers must be
allocated at word boundaries (e.g. addresses that are multiples of 4). When the
structure of a record is designed, this fact has to be taken into consideration.
Otherwise the actual memory needed might be more than the sum of the
length of each component in the record. Here is an example:

struct employee
{ char Division;
int IdNumber; };

The first variable occupies one byte only. The next three bytes will remain
unused and then the second variable will be allocated to a word boundary.
Careless design may result in doubling the memory requirements.
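
The unused bytes can be measured with sizeof, alignof and the standard offsetof macro. This is a sketch; the exact numbers depend on the platform, which is why the code computes them instead of stating them.

```cpp
#include <cstddef>

struct Employee {
    char Division;   // one byte
    int IdNumber;    // placed at a multiple of alignof(int)
};

// Bytes of padding the compiler inserted between the two members.
constexpr std::size_t padding_before_id =
    offsetof(Employee, IdNumber) - sizeof(char);
```

On a typical platform with 4-byte integers, padding_before_id is 3 and sizeof(Employee) is 8, although the two members together need only 5 bytes.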

6. Other structured data objects

Records and arrays with structured components: a record may have a
component that is an array, an array may be built out of components that are
records.

Lists and sets: lists are usually considered to represent an ordered sequence of
elements, sets - to represent an unordered collection of elements.

Executable data objects

In most languages, programs and data objects are separate structures (Ada, C,
C++).

Other languages however do not distinguish between programs and data - e.g.
PROLOG. Data structures are considered to be a special type of program
statements and all are treated in the same way.

Exam-like questions

1. What is a data structure?
2. Which are the elements of a structured data type specification?
3. What types of operations are considered when specifying a
structured data type?
4. Which are the basic methods for storage representation
generally used in implementing structured data types? Describe
briefly.
5. Discuss briefly the memory management problems when
implementing structured data types.

Exam-like questions

1. What is a scalar data type? Give examples.
2. Describe briefly the implementation of floating-point real numbers.
3. Describe briefly the implementation of booleans.
4. What is a composite data type? Give examples.
5. Describe briefly the approaches to specification and implementation of
character strings.
6. What implementation problems exist with data objects referred to by
pointers?
Exam-like questions

1. Explain the concept "data object".
2. Give five attributes of data objects, describe them briefly. What are
their binding times?
3. Explain the concept "data type".
4. Which are the three components needed to specify a data type?
Describe them briefly.
5. Which are the two issues to be considered in implementation of a data
type?
6. How can operations be implemented?
7. What is the purpose of declaration?
8. What is type checking? When is type checking performed?
9. What are the advantages and disadvantages of dynamic type checking?
10. Explain the concepts "coercion" and "explicit type conversion".
11. What are the advantages and disadvantages of coercion?
12. Explain the concepts "L-value" and "R-value" of a variable. Give
examples.
Chapter 5: Encapsulation – Abstract data types, Subprograms and Type
Definitions

 Abstract Data Type
 Encapsulation by Subprograms
 Subprograms as abstract operations
 Subprogram definition and invocation
 Subprogram definitions and subprogram activations
 Implementation of subprogram definition and invocation
 Type Definitions

A. Abstract Data Types

An abstract data type is:

o A set of data objects,
o A set of abstract operations on those data objects,
o Encapsulation of the whole in such a way that the user of the data object
cannot manipulate data objects of the type except by the use of the operations
defined.

Encapsulation is primarily a question of language design; effective encapsulation is
possible only when the language prohibits access to the information hidden within
the abstraction.

Some languages that provide for abstract data types:

Ada: packages; C++, Java, Visual Basic: classes.

Information hiding

Information hiding is the term used for the central principle in the design of
programmer-defined abstract data types.

A programming language provides support for abstraction in two ways

a. By providing a virtual computer that is simpler to use and more powerful than
the actual underlying hardware computer.
b. The language provides facilities that aid the programmer to construct
abstractions.

When information is encapsulated in an abstraction, it means that the user of the
abstraction

1. does not need to know the hidden information in order to use the abstraction,
2. is not permitted to directly use or manipulate the hidden information
even if desiring to do so.
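
The two conditions can be sketched with a C++ class. IntStack is a hypothetical abstract data type chosen for this example: the user works only through push, pop, empty and size, and the compiler rejects any direct access to the private member.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical abstract data type: a stack of integers.
class IntStack {
public:
    void push(int value) { items_.push_back(value); }

    // Remove and return the top element; throws if the stack is empty.
    int pop() {
        if (items_.empty()) throw std::out_of_range("pop on empty stack");
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;  // hidden representation
};
```

Because the representation is hidden, it could be replaced (for instance by a linked list) without changing any program that uses the class.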

Mechanisms that support encapsulation:

Subprograms
Type definitions
B. Encapsulation by subprograms

A subprogram is an abstract operation defined by the programmer.

Two views of subprograms

A. Program design level: a subprogram represents an abstract operation that the
programmer defines.
B. Language design level: design and implementation of the general facilities for
subprogram definition and invocation.

1. Subprograms as abstract operations

A subprogram definition is provided by the programmer and has two parts:

a. Specification
b. Implementation

Specification of a subprogram (same as that for a primitive operation):

 the name of the subprogram
 the signature (or prototype) of the subprogram - gives the number of
arguments, their order, and the data type of each, as well as the number of
results, their order, and the data type of each
 the action performed by the subprogram (description of function it
computes)

float FN(float X, int Y);

FN : real x integer -> real

Some problems in attempting to describe precisely the function computed by a
subprogram:

i. Implicit arguments in the form of nonlocal variables.
ii. Implicit results (side effects) returned as changes to nonlocal variables
or as changes in the subprogram's arguments.
iii. Using exception handlers in case the arguments are not of the required
type.
iv. History sensitivity - the results may depend on previous executions.
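
Items i, ii and iv can be illustrated with a small sketch. The names call_count, next_id and bump are invented for this example.

```cpp
int call_count = 0;  // nonlocal variable: implicit argument and result

// The signature says int -> int, but the result also depends on call_count.
int next_id(int base) {
    ++call_count;              // side effect on a nonlocal variable
    return base + call_count;  // history sensitive: depends on earlier calls
}

// Side effect through an argument: the caller's variable is modified.
void bump(int& x) { x += 1; }
```

Two calls of next_id with the same argument return different results, so the signature alone does not describe the function that is computed.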

Implementation of a subprogram:

 Uses the data structures and operations provided by the language
 Defined by the subprogram body

float FN(float X, int Y)   - signature of subprogram
{ float M[10]; int N;      - local data declarations
  ...                      - statements defining the actions
}                            over the data

The body is encapsulated; its components cannot be accessed separately by the user
of the subprogram. The interface with the user (the calling program) is
accomplished by means of arguments and returned results.

Type checking: similar to type checking for primitive operations.

Difference: types of operands and results are explicitly stated in the program

2. Subprogram definition and invocation

2. 1. Subprogram definitions and subprogram activations

Subprogram definition: the set of statements constituting the body of the
subprogram. It is a static property of the program, and it is the only information
available during translation.

Subprogram activation: a data structure (record) created upon invoking the
subprogram. It exists while the subprogram is being executed. After execution the
activation record is destroyed.

2. 2. Implementation of subprogram definition and invocation

A simple (but not efficient) approach:

Each time the subprogram is invoked, a copy of its executable statements,
constants and local variables is created.

A better approach:

The executable statements and constants are the invariant part of the
subprogram - they do not need to be copied for each execution of the
subprogram. A single copy is used for all activations of the subprogram.
This copy is called the code segment. This is the static part.

The activation record contains only the parameters, results and local
data. This is the dynamic part. It has the same structure, but different values
for the variables.
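
The relation between one code segment and many activation records can be sketched with a recursive function. The counters are assumptions added for the demonstration: they model the creation and destruction of activation records, which the language implementation normally handles invisibly.

```cpp
int live_activations = 0;      // activations that exist right now
int max_live_activations = 0;  // largest number alive at the same time

// One code segment; every call gets its own n and result.
int factorial(int n) {
    ++live_activations;  // activation record created
    if (live_activations > max_live_activations)
        max_live_activations = live_activations;
    int result = (n <= 1) ? 1 : n * factorial(n - 1);
    --live_activations;  // activation record destroyed
    return result;
}
```

Calling factorial(5) creates five activations that are alive at the same time at the deepest point, each holding a different value of n.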

Below is Figure 6.3 from the textbook.

On the left is the subprogram definition. On the right is the activation record created
during execution. It contains the types and number of variables used by the
subprogram, and the assigned memory locations at each execution of the
subprogram. The definition serves as a template to create the activation record.
(The use of the word template here is different from the keyword template in class
definitions in C++, though its generic meaning is the same: a pattern, a frame to be
filled in with particular values. In class definitions the binding refers to the data
types and it is performed at compilation time, while here the binding refers to
memory locations and data values, and it is performed at execution time.)

Generic subprograms: have a single name but several different definitions
(overloaded).

3. Subprogram definitions as data objects

In compiled languages subprogram definition is separate from subprogram
execution - C, C++, Java

In interpreted languages there is no difference - definitions are treated as
run-time data objects - Prolog, LISP, Perl. Interpreted languages use an operation
to invoke translation at run-time - consult in Prolog, define in LISP.

C. Type Definitions

3. Basics

Type definitions are used to define new data types. Note, that they do not define a
complete abstract data type, because the definitions of the operations are not
included.

Format: typedef definition name

Actually we have a substitution of name for the definition.

Examples:

typedef int key_type;
key_type key1, key2;

These statements will be processed at translation time and the type
of key1 and key2 will be set to integer.

struct rational_number
{int numerator, denominator;};

typedef struct rational_number rational;
rational r1, r2;

Here r1 and r2 will be of type rational_number
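
That a typedef only introduces another name for an existing type can be checked at translation time. The sketch below uses the standard trait std::is_same; the checks are evaluated by the compiler.

```cpp
#include <type_traits>

typedef int key_type;

struct rational_number { int numerator, denominator; };
typedef rational_number rational;

// Both checks hold at translation time: the new name denotes the same type.
static_assert(std::is_same<key_type, int>::value, "key_type is int");
static_assert(std::is_same<rational, rational_number>::value,
              "rational is rational_number");
```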

4. Type equivalence and equality of data objects

Two questions to be answered:

 When are two types the same?
 When are two data objects of the same type "equal"?

Type equality example program:

program main (input, output);
type vect1 = array[1..10] of real;
     vect2 = array[1..10] of real;
var X, Z: vect1; Y: vect2;

 Name equivalence: two data types are considered equivalent
only if they have the same name.

Assignment X := Z is valid, X := Y is not

Issues

Every object must have an assigned type; there can be no anonymous
types. A single type definition must serve all or large parts of a program.

 Structural equivalence: two data types are considered equivalent
if they define data objects that have the same internal components
(storage representation or runtime implementation of data types is
identical).

vect1 and vect2 are equivalent types

Issues

Do components need to be exact duplicates? Can field order be different in
records? Can field sizes vary?
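
C++ applies name equivalence to struct and class types, which gives a way to sketch the distinction. Vect1, Vect2 and to_vect1 are names chosen to mirror the Pascal example; they are assumptions of this illustration.

```cpp
#include <type_traits>

struct Vect1 { double elems[10]; };
struct Vect2 { double elems[10]; };  // same structure, different name

// Distinct names give distinct types (name equivalence)...
static_assert(!std::is_same<Vect1, Vect2>::value, "distinct types");

// ...even though the storage layout has the same size.
static_assert(sizeof(Vect1) == sizeof(Vect2), "same size");

// An explicit conversion is needed to move data between the two types.
Vect1 to_vect1(const Vect2& y) {
    Vect1 x{};
    for (int i = 0; i < 10; ++i) x.elems[i] = y.elems[i];
    return x;
}
```

A direct assignment from a Vect2 object to a Vect1 object is rejected by the compiler, just as X := Y is rejected under name equivalence.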

 Data object equality

We can consider two objects to be equal if each member in one object is
identical to the corresponding member of the other object. However,
there may still be a problem. Consider, for example, the rational number 1/2:
a fraction with a different numerator and denominator can denote the same
value, yet a member-by-member comparison would report the two as unequal.

In general, the compiler has no way to know how to compare data values of
a user-defined type. It is the task of the programmer who defined that
particular data type to define also the operations on the objects of that
type.
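
A sketch of programmer-defined equality for rational numbers. The type Rational, the helper same_members and the pair 1/2 and 2/4 are assumptions chosen for this illustration.

```cpp
struct Rational {
    long long numerator;
    long long denominator;  // assumed non-zero
};

// Member-by-member comparison: treats 1/2 and 2/4 as different.
bool same_members(const Rational& a, const Rational& b) {
    return a.numerator == b.numerator && a.denominator == b.denominator;
}

// Programmer-defined equality: a/b equals c/d exactly when a*d == c*b.
bool operator==(const Rational& a, const Rational& b) {
    return a.numerator * b.denominator == b.numerator * a.denominator;
}
```

Cross-multiplication avoids division and roundoff, though very large members could overflow; a fuller implementation might reduce fractions to lowest terms first.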

5. Type definition with parameters

Parameters allow the user to prescribe the size of data types needed – array
sizes.

type section (MaxSize: integer) is
record
Room: integer;
Instructor: integer;
ClassSize: integer range 0..MaxSize;
ClassRoll: array (1..MaxSize) of Student_ID;
end record;

X: section (100); -- gives maximum size 100
Y: section (25); -- gives maximum size 25

Implementation: The type definition with parameters is used as a
template as any other type definition during compilation.
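
A comparable effect can be sketched in C++ with a template that takes the size as a parameter. This is an analogy, not a translation of the Ada mechanism; Section, StudentId and enroll are hypothetical names.

```cpp
#include <array>
#include <cstddef>

using StudentId = int;  // hypothetical stand-in for Student_ID

template <std::size_t MaxSize>
struct Section {
    int room = 0;
    int instructor = 0;
    std::size_t class_size = 0;  // intended range: 0..MaxSize
    std::array<StudentId, MaxSize> class_roll{};

    // Add a student unless the section is already full.
    bool enroll(StudentId id) {
        if (class_size == MaxSize) return false;
        class_roll[class_size++] = id;
        return true;
    }
};
```

Section<100> and Section<25> then play the roles of X and Y above; each instantiation is resolved at compilation time.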
Exam-like questions

1. What is an abstract data type? How does it differ from a structured data type?
2. Explain briefly the concept "encapsulation" and how it can be achieved by means
of subprograms.
3. Explain and compare the concepts "subprogram definition" and "subprogram
activation record".
4. Discuss the two aspects of type equivalence: name equivalence and structural
equivalence
5. Discuss briefly data object equality
Chapter . Inheritance

 Abstract data types
 Derived classes
 Multiple inheritance
 Inheritance of methods
 Polymorphism
 Exam-like questions

Inheritance: implicit passing of information between program components.

The concept is generally used in the context of complex data objects.

1. Abstract Data Types

 Data components
 Operations to manipulate the data components

Basic idea: The data components and the programs that implement the operations are
hidden from the external world. The object is encapsulated.

Implementation of ADT: classes (C++), packages (Ada), objects (Smalltalk)

E.g. private section: accessible only to the class functions (class functions are
also called methods)
public section: contains the methods - to be used by other programs

Generic abstract data types - use templates

This is the case when the data components may be of different types,
however the operations stay the same, e.g. a list of integers, a list of
characters.

Instantiation occurs at compile time

2. Derived classes

Object hierarchy: an object may be a special case of a more general object.
Some of the properties are the same - these properties can be inherited

Generalization and specialization: Down the hierarchy the objects become more
specialized, up the hierarchy - more generalized.

Instantiation: The process of creating instances of a class.

Derived classes inherit data components and/or methods.
Further on, they can specify their own data components and their own specific
methods. The specific parts may have the same names as in the parent -
they override the definition in the parent class.

Implementation

Copy-based approach (Direct encapsulation) - each instance of a class
object has its own data storage containing all data components - specific
plus inherited.

Delegation-based approach (Indirect encapsulation) - the object uses the
data storage of the base class. Data sharing.

3. Multiple inheritance

Not allowed for classes in Java (a Java class may still implement several
interfaces). Problems arise when the parent classes contain contradictory
information.

4. Inheritance of methods

Virtual functions - bound at run time

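class Figure
{
public:
    Figure();
    virtual void draw();         // virtual: bound at run time
    virtual void erase();        // virtual: bound at run time
    void center();
    void set_color(T Color);     // T: the color type (not defined in these notes)
    void position_center();
};

void Figure::center()
{
    erase();                     // calls erase() of the object's actual class
    position_center();
    draw();                      // calls draw() of the object's actual class
}

class Box : public Figure
{
public:
    Box();
    void draw();                 // overrides Figure::draw
    void erase();                // overrides Figure::erase
};

in MAIN:

    Box a_box;
    a_box.draw();                // overrides base class
    a_box.set_color(C);          // inherits the method
    a_box.center();              // makes use of virtual functions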

Implementation of virtual methods:

A slot in the record defining the class.
The constructor fills in the location of the new virtual procedure if there is one.
If not, it fills in the location of the virtual procedure from the base class.
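
The slot mechanism can be modeled by hand with a function pointer. This is only a sketch under the assumption of a single virtual procedure. Real C++ compilers generate an equivalent table automatically, and all names below (`FigureRec`, `make_box`, and so on) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hand-written model of a record with one virtual-method slot.
struct FigureRec {
    // The slot: holds the location of the draw procedure for this object.
    std::string (*draw_slot)(const FigureRec&);
};

std::string figure_draw(const FigureRec&) { return "Figure::draw"; }
std::string box_draw(const FigureRec&) { return "Box::draw"; }

// "Constructor" of the base class: fills the slot with the base procedure.
FigureRec make_figure() { return FigureRec{figure_draw}; }

// "Constructor" of a derived class that defines its own draw:
// fills the slot with the location of the new procedure.
FigureRec make_box() { return FigureRec{box_draw}; }

// "Constructor" of a derived class without its own draw:
// fills the slot with the procedure of the base class.
FigureRec make_plain_derived() { return FigureRec{figure_draw}; }

// A call through the slot is bound at run time.
std::string call_draw(const FigureRec& f) {
    assert(f.draw_slot != nullptr);
    return f.draw_slot(f);
}
```

Calling `call_draw` on a record built by `make_box` runs `box_draw`, while a record built by `make_plain_derived` falls back to `figure_draw`.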

Abstract classes - can serve only as templates; no data objects can be declared
with the name of the class. Specified by pure virtual functions (declared with = 0):

virtual void TypeName() = 0;
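
A short sketch of an abstract class in C++. The class names `Shape` and `Circle` and the function `describe` are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Abstract class: it declares a pure virtual function.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string type_name() const = 0;   // pure virtual
};

// A derived class becomes concrete by defining the pure virtual function.
class Circle : public Shape {
public:
    std::string type_name() const override { return "Circle"; }
};

// The compiler knows that no object of type Shape can be declared.
static_assert(std::is_abstract<Shape>::value, "Shape is abstract");
static_assert(!std::is_abstract<Circle>::value, "Circle is concrete");

// Abstract classes can still be used through references or pointers.
std::string describe(const Shape& s) { return s.type_name(); }
```

A declaration such as `Shape s;` is rejected at compile time, while `Circle c; describe(c);` is accepted.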

Mixin inheritance - specify only the differences (not present in C++)

5. Polymorphism

The ability of a single operator or subprogram name to refer to any number of
function definitions, depending on the data types of the arguments and results.

Example:

When printing a person's name, we may want to print the full name, or to print only
the first and the last name. We may use two functions that have the same name but
different number of arguments:

void print_name(string, string, string);

void print_name(string, string);
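
A runnable variant of the two declarations above, as a sketch. The functions are named `format_name` (a hypothetical name) and return the text instead of printing it, so that the result can be checked easily.

```cpp
#include <cassert>
#include <string>

// Three arguments: first, middle and last name.
std::string format_name(const std::string& first,
                        const std::string& middle,
                        const std::string& last) {
    return first + " " + middle + " " + last;
}

// Two arguments: first and last name only. Same name, different parameter list.
std::string format_name(const std::string& first, const std::string& last) {
    return first + " " + last;
}
```

The compiler selects the definition from the number and types of the arguments at the point of call.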

Exam-like questions

Discuss briefly inheritance in programming.

a. How is inheritance defined?
b. What can be inherited?
c. Derived classes and inheritance. Multiple inheritance.

Chapter: Sequence Control

 Sequence control
 Levels of sequence control
 Sequencing with expressions
 Statement level sequence control
 Prime programs

 Exam-like questions

1. Sequence control

Sequence control: the control of the order of execution of the operations, both
primitive and user defined.

Implicit: determined by the order of the statements in the source program
or by the built-in execution model.

Explicit: the programmer uses statements to change the order of execution
(e.g. uses an if statement).

2. Levels of sequence control

Expressions: computing expressions using precedence rules and parentheses.

Statements: sequential execution, conditional and iteration statements.

Declarative programming: an execution model that does not depend on the order
of the statements in the source program.

Subprograms: transfer control from one program to another.

3. Sequencing with expressions

The issue: given a set of operations and an expression involving these operations,
what is the sequence of performing the operations?
How is the sequence defined, and how is it represented?

An operation is defined in terms of an operator and operands.
The number of operands determines the arity of the operator.

Basic sequence-control mechanism: functional composition


Given an operation with its operands, the operands may be:

o Constants
o Data objects
o Other operations

Example 1: 3 * (var1 + 5)

operation: multiplication, operator: *, arity: 2
    operand 1: constant (3)
    operand 2: operation addition, operator: +, arity: 2
        operand 1: data object (var1)
        operand 2: constant (5)

Functional composition imposes a tree structure on the expression,
where we have one main operation, decomposable into an operator and operands.

In a parenthesized expression the main operation is clearly indicated.
However, we may have expressions without parentheses.

Example 2: 3* var1 +5

Question: is the example equivalent to the above one?

Example 3: 3 + var1 +5

Question: is this equivalent to (3 + var1) + 5, or to 3 + (var1 + 5) ?

In order to answer the questions we need to know:

o Operator's precedence
o Operator's associativity

Precedence concerns the order of applying operations, associativity deals with the
order of operations of same precedence.

Precedence and associativity are defined when the language is defined - within the
semantic rules for expressions.

3. 1. Arithmetic operations / expressions

In arithmetic expressions the standard precedence and associativity of operations


are applied to obtain the tree structure of the expression.

Linear representation of the expression tree:

o Prefix notation
o Postfix notation
o Infix notation

Prefix and postfix notations are parentheses-free.

There are algorithms to evaluate prefix and postfix expressions and algorithms to
convert an infix expression into prefix/postfix notation, according to the operators'
precedence and associativity.
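
As an illustration of such an algorithm, the following sketch evaluates a postfix expression with a stack. It assumes integer constants and the four binary operators, with tokens separated by spaces. The function name `eval_postfix` is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Evaluate a postfix expression such as "3 4 5 + *".
int eval_postfix(const std::string& expr) {
    std::stack<int> operands;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            assert(operands.size() >= 2);   // a binary operator needs two operands
            int right = operands.top(); operands.pop();
            int left = operands.top(); operands.pop();
            if (tok == "+") operands.push(left + right);
            else if (tok == "-") operands.push(left - right);
            else if (tok == "*") operands.push(left * right);
            else operands.push(left / right);
        } else {
            operands.push(std::stoi(tok));  // constant operand
        }
    }
    assert(operands.size() == 1);           // well-formed input leaves one value
    return operands.top();
}
```

With var1 = 4, the expression 3 * (var1 + 5) is written "3 4 5 + *" in postfix form and 3 * var1 + 5 is written "3 4 * 5 +". No parentheses are needed in either case.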
3. 2. Other expressions

Languages may have some specific operations, e.g. for processing arrays and vectors,
built-in or user defined. Precedence and associativity still need to be defined -
explicitly in the language definition or implicitly in the language implementation.

3. 3. Execution-time representation of expressions

o Machine code sequence


o Tree structures - software simulation
o Prefix or postfix form - requires stack, executed by an interpreter.

3. 4. Evaluation of tree representation

Eager evaluation - first evaluate all operands and then apply the operation.
Lazy evaluation - pass the operands unevaluated and evaluate an operand only
when (and if) the operation needs its value.

Problems:

o Side effects - some operations may change operands of other operations.
o Error conditions - may depend on the evaluation strategy (eager or lazy
evaluation)
o Boolean expressions - results may differ depending on the evaluation
strategy.
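
The difference can be sketched for a Boolean "and" operation. In the code below the right operand is passed as a function object so that each strategy decides when to evaluate it. The names `lazy_and` and `eager_and` are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Lazy: the right operand is evaluated only if the left operand is true.
bool lazy_and(bool left, const std::function<bool()>& right) {
    return left ? right() : false;
}

// Eager: the right operand is always evaluated before the operation is applied.
bool eager_and(bool left, const std::function<bool()>& right) {
    bool right_value = right();
    return left && right_value;
}
```

Both functions return the same truth value, but if evaluating the right operand has a side effect or can raise an error (for example a division by zero), the two strategies behave differently. The built-in `&&` operator of C++ follows the lazy (short-circuit) strategy.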

4. Statement level sequence control

4. 1. Forms of statement-level control

 Composition – Statements are executed in the order they appear on the page.
 Alternation – Two sequences form alternatives, so one sequence or the other
is executed, but not both (conditionals).
 Iteration – A sequence of statements is executed repeatedly.
 Explicit sequence control:

goto X – transfer control unconditionally to the statement labeled X.
if Y goto X – transfer control to the statement labeled X if Y is true.
break – leave the enclosing loop immediately.

4. 2. Structured programming design

1. Hierarchical design of program structures
2. Representation of hierarchical design directly in the program text
using "structured" control statements.
3. The textual sequence corresponds to the execution sequence
4. Use of single-purpose groups of statements

4. 3. "Structured" control statements

a. Compound statements

Typical syntax:

begin
    statement1;
    statement2;
    ...
end;

Execute each statement in sequence.

Sometimes (e.g., in C) { ... } is used instead of begin ... end

b. Conditional statements

if expression then statement1 else statement2

if expression then statement1

If we need to make a choice among many alternatives:

nested if statements
case statements

Example:

case Tag is
    when 0 => begin
        statement0
    end;
    when 1 => begin
        statement1
    end;
    when 2 => begin
        statement2
    end;
    when others => begin
        statement3
    end;
end case

Implementation: jump and branch machine instructions, jump table
implementation for case statements (see fig. 8.7)
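
The jump-table idea can be imitated in C++ with an array of function pointers indexed by the tag. This is a sketch, not the code a compiler would emit. The function names `statement0` to `statement3` mirror the case example, and `dispatch` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

std::string statement0() { return "statement0"; }
std::string statement1() { return "statement1"; }
std::string statement2() { return "statement2"; }
std::string statement3() { return "statement3"; }   // the "when others" branch

std::string dispatch(int tag) {
    using Handler = std::string (*)();
    // The jump table: entry n holds the location of the code for tag n.
    static const std::array<Handler, 3> jump_table = {{statement0, statement1, statement2}};
    // One range test, then one indexed jump, regardless of the number of cases.
    if (tag >= 0 && static_cast<std::size_t>(tag) < jump_table.size()) {
        return jump_table[static_cast<std::size_t>(tag)]();
    }
    return statement3();
}
```

Compared with a chain of nested if statements, the cost of selecting a branch does not grow with the number of alternatives.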

c. Iteration statements

Simple repetition (for loop): specify a count of the number of
times to execute a loop.

Examples:

perform statement K times;

for I=1 to 10 do statement;

for(I=0; I<10; I++) statement;

Repetition while condition holds

while expression do statement; - Evaluate expression and, if it is true,
execute statement; then repeat the process.

repeat statement until expression; - Execute statement and then
evaluate expression. Quit if expression is true.

The C++ for loop is functionally equivalent to repetition while a condition holds.
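
A small sketch of this equivalence. The two functions (hypothetical names) compute the same sum, once with a for loop and once with the corresponding while loop.

```cpp
#include <cassert>

int sum_with_for(int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += i;
    return total;
}

int sum_with_while(int n) {
    int total = 0;
    int i = 0;          // initialization part of the for loop
    while (i < n) {     // test part of the for loop
        total += i;
        i++;            // increment part of the for loop
    }
    return total;
}
```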

by T. Pratt and M. Zelkowitz

Problems with structured sequence control:

Multiple exit loops
Exceptional conditions
Do-while-do structure

Solutions vary with languages, e.g. in C++: the break statement for loop exits,
and assert or exception handling (try, throw, catch) for exceptional conditions.
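
A sketch of a multiple-exit loop written with break. The function name `find_first` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first occurrence of key, or -1 if key is absent.
int find_first(const std::vector<int>& values, int key) {
    int found = -1;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == key) {
            found = static_cast<int>(i);
            break;      // second exit: leave the loop as soon as the key is found
        }
    }
    return found;       // first exit: the loop condition became false
}
```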

5. Prime programs

Theory of prime programs - a consistent theory of control structures

Consider 3 classes of flowchart nodes: function nodes (one arc in, one arc out),
decision nodes (one arc in, two arcs out) and join nodes (two arcs in, one arc out).

Any flowchart is a graph of directed arcs and these 3 types of nodes.

A proper program is a flowchart with:

o 1 entry arc
o 1 exit arc
o For every node, a path from the entry arc through that node to the exit arc

A prime program is a proper program which has no embedded proper subprogram
of greater than 1 node (i.e., one cannot cut 2 arcs to extract a proper subprogram
of more than one node from within it).

A composite program is a proper program that is not prime.

Every proper program can be decomposed into a hierarchical set of prime subprograms.
This decomposition is unique (except for the special case of linear sequences of
function nodes).

All primes can be enumerated. Fig. 8.9 gives the primes with up to 4 nodes.

Question: Can any prime program be built out of structured control statements?
The answer is given by the structure theorem:

Any flowchart can be replaced by an equivalent one that uses only if statements,
while statements and sequences of statements.

Exam-like questions

1. What is sequence control?
2. Which are the levels of sequence control? Name and define them.
3. What is the basic mechanism of sequence control in expressions?
Describe it and give examples.
4. What is the role of precedence and associativity in sequencing with expressions?
5. Name three linear representations of expression trees.
6. Name three execution-time representations of expressions.
7. List and define the statement-level control structures.
8. List four principles of structured programming design.
9. List three types of structured control statements.
10. Describe the concept prime program.

Chapter: Subprogram Control

 Subprogram sequence control


o Simple call-return subprograms
o Recursive subprograms
 Attributes of data control
o Names and referencing environments
o Static and dynamic scope
o Block structure
o Local data and local referencing environments

 Exam-like questions

Subprogram control: interaction among subprograms and how subprograms manage to pass
data among themselves in a structured and efficient manner.

Terminology:

Function call – subprograms that return values directly


Subroutine call – subprograms that operate only through side effects on shared data.

A. Subprogram Sequence Control

Simple subprogram call return

Copy rule view of subprograms: the effect of a call statement is the same as if
the subprogram were copied and inserted into the main program.

Implicit assumptions present in this view :

o Subprograms cannot be recursive
o Explicit call statements are required
o Subprograms must execute completely at each call
o Immediate transfer of control at point of call
o Single execution sequence

1. Simple call-return subprograms

Execution of subprograms
Outline of the method:

1. Subprogram definition and subprogram activation.

The definition is translated into a template, used to create an activation
each time a subprogram is called.

2. Subprogram activation: consists of

 a code segment (the invariant part) - executable code and constants
 an activation record (the dynamic part) - local data, parameters;
created anew each time the subprogram is called, destroyed when the
subprogram returns.

Execution is implemented by the use of two system-defined pointers:

 Current-instruction pointer (CIP) – address of the next statement to be executed
 Current-environment pointer (CEP) – pointer to the activation record.

On call instruction:

a. An activation record is created.
b. Current CIP and CEP are saved in the created activation record
as return point.
c. CEP is assigned the address of the activation record.
d. CIP gets the address of the first instruction in the code segment.
e. The execution continues from the address in CIP.

On return:

a. The old values of CIP and CEP are retrieved.
b. The execution continues from the address in CIP.

Restrictions of the model: at most one activation of any subprogram exists at any time.

The simplest implementation: to allocate storage for each activation as an
extension of the code segment. Used in FORTRAN and COBOL.
The activation record is not destroyed - only reinitialized for each
subprogram execution.

Hardware support - CIP is the program counter, CEP is not used, simple
jump executed on return.

Stack-based implementation - the simplest run-time storage management technique

call statements: push CIP and CEP
return statements: pop CIP and CEP off the stack.

Used in most C implementations.
LISP: uses the stack as an environment.
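
The call and return steps can be sketched as a small simulation. The types `ActivationRecord` and `Machine` are hypothetical, instruction addresses are modeled as plain integers, and the sketch assumes that the return point is the instruction following the call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical activation record: only the saved return point is modeled.
struct ActivationRecord {
    int saved_cip;   // instruction at which the caller resumes
    int saved_cep;   // caller's activation record (stack index), -1 for none
};

struct Machine {
    int cip = 0;     // current-instruction pointer
    int cep = -1;    // current-environment pointer
    std::vector<ActivationRecord> stack;

    // Call: create a record, save CIP and CEP in it, switch to the callee.
    void call(int entry_address) {
        stack.push_back(ActivationRecord{cip + 1, cep});
        cep = static_cast<int>(stack.size()) - 1;
        cip = entry_address;
    }

    // Return: retrieve the old CIP and CEP and discard the record.
    void ret() {
        assert(!stack.empty());
        cip = stack.back().saved_cip;
        cep = stack.back().saved_cep;
        stack.pop_back();
    }
};
```

After a call from address 10 to a subprogram at address 100, followed by a return, `cip` is 11 and the stack is empty again.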

2. Recursive subprograms

Specification

Syntactically - no difference
Semantically - multiple activations of the same subprogram exist
simultaneously at some point in the execution.

Implementation

Stack-based - CIP and CEP are stored in the stack, forming a dynamic
chain of links.

 A new activation record is created for each call and destroyed at return.
 The lifetimes of the activation records cannot partially overlap -
they are strictly nested.

Some language compilers (C, Pascal) always assume a recursive structure of
subprograms, while in others non-recursive subprograms are implemented in the
simple way.
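
A sketch that makes the simultaneous activations visible. The two counters are hypothetical bookkeeping added for illustration only. They record how many activations of `factorial` exist at the same time.

```cpp
#include <algorithm>
#include <cassert>

int live_activations = 0;   // activations of factorial that currently exist
int max_live = 0;           // largest number that existed at the same time

long factorial(int n) {
    assert(n >= 0);
    ++live_activations;                       // a new activation is created
    max_live = std::max(max_live, live_activations);
    long result = (n <= 1) ? 1 : n * factorial(n - 1);
    --live_activations;                       // the activation is destroyed at return
    return result;
}
```

For `factorial(5)` five activations exist at the deepest point of the recursion, and they are destroyed in the reverse order of their creation.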

B. Attributes of data control

Data control features: determine the accessibility of data at different points during
program execution.

Central problem: the meaning of variable names,
i.e. the correspondence between names and memory locations.

1. Names and referencing environments

Two ways to make a data object available as an operand for an operation:

a. Direct transmission – A data object computed at one point as the result of
an operation may be directly transmitted to another operation as an operand.

Example: x = y + 2*z;
The result of the multiplication is transmitted directly as an operand of
the addition operation.

b. Referencing through a named data object – A data object may be given a name
when it is created, and the name may then be used to designate it as an
operand of an operation.

1. 1. Program elements that may be named

1. Variables
2. Formal parameters
3. Subprograms
4. Defined types
5. Defined constants
6. Labels
7. Exception names
8. Primitive operations
9. Literal constants

Names from 4 thru 9 - resolved at translation time.


Names 1 thru 3 - discussed below.

Simple names: identifiers, e.g. var1.


Composite names: names for data structure components,
e.g. student[4].last_name.

1. 2. Associations and Referencing Environments

Association: binding identifiers to particular data objects and subprograms.
Referencing environment: the set of identifier associations for a
given subprogram.
Referencing operations during program execution: determine the
particular data object or subprogram associated with an identifier.

Local referencing environment:

The set of associations created on entry to a subprogram
that represent formal parameters, local variables, and
subprograms defined only within that subprogram.

Nonlocal referencing environment:

The set of associations for identifiers that may be used within a subprogram
but that are not created on entry to it. Can be global or predefined.

Global referencing environment: associations created at the start of execution
of the main program, available to be used in a subprogram.

Predefined referencing environment: associations predefined in the
language definition.

Visibility of associations

Associations are visible if they are part of the referencing environment.
Otherwise associations are hidden.

Dynamic scope of associations

The set of subprogram activations within which the association is visible.

1. 3. Aliases for data objects: Multiple names of a data object

 in separate referencing environments - no problem
 in a single referencing environment - called aliases.

Problems with aliasing

 Can make code difficult to understand for the programmer.
 Implementation difficulties at the optimization step - it is difficult to
spot interdependent statements, which must not be reordered.
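
A sketch of the aliasing problem in C++. The function `add_twice` (a hypothetical name) looks as if it adds 2*b to a, but this holds only when a and b name different data objects.

```cpp
#include <cassert>

// a and b may or may not be two names for the same data object.
int add_twice(int& a, int& b) {
    a += b;
    a += b;   // if a and b are aliases, b was changed by the previous statement
    return a;
}
```

Called with two distinct variables holding 1 and 2, the result is 5. Called with the same variable holding 1 for both parameters, the result is 4 rather than 3. An optimizer therefore may not replace the two statements by a single `a += 2 * b` unless it can rule out aliasing.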

2. Static and dynamic scope

The dynamic scope of an association for an identifier is the set of
subprogram activations in which the association is visible during execution.

Dynamic scope rules relate references with associations for names during
program execution.

The static scope of a declaration is that part of the program text where a
use of the identifier is a reference to that particular declaration of the
identifier.

Static scope rules relate references with declarations of names in the
program text.

Importance of static scope rules - recording information about a
variable during translation.
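
C++ uses static scope, so the dynamic rule has to be modeled by hand in the following sketch. The stack `x_bindings` is a hypothetical stand-in for the chain of associations for the name x, with the global association at the bottom.

```cpp
#include <cassert>
#include <vector>

int x = 1;                          // global declaration of x

// Static scope: this x refers to the global declaration, whoever the caller is.
int read_x() { return x; }

int caller_static() {
    int x = 2;                      // local x, not visible inside read_x
    (void)x;
    return read_x();
}

// Dynamic scope, modeled by hand: the most recent association for x wins.
std::vector<int> x_bindings = {1};

int read_x_dynamic() { return x_bindings.back(); }

int caller_dynamic() {
    x_bindings.push_back(2);        // the caller's association for x becomes visible
    int result = read_x_dynamic();
    x_bindings.pop_back();          // and is removed when the caller finishes
    return result;
}
```

Under the static rule the reference to x yields 1, under the dynamic rule it yields 2, because the association created by the caller is the one visible during execution.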


3. Block structure

Block-structured languages (Pascal):

 Each program or subprogram is organized as a set of nested blocks.
 The chief characteristic of a block is that it introduces a new local
referencing environment.

Static scope rules for block-structured programs

4. Local data and local referencing environments

Local environment of a subprogram: the various identifiers declared in the
subprogram - variable names, parameters, subprogram names.

Static scope rules: implemented by means of a table of the local declarations.

Dynamic scope rules: two methods:

 Retention - associations and the bound values are retained after execution.
 Deletion - associations are deleted.
(For further explanation and example see Figure 9.9 on p. 369)

Implementation of dynamic scope rules in local referencing environments:
by means of a local environment table to associate names, types and values.

Retention: the table is kept as part of the code segment.
Deletion: the table is kept as part of the activation record,
destroyed after each execution.
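
In C++ the two approaches correspond roughly to static and automatic local variables. The function names in this sketch are hypothetical.

```cpp
#include <cassert>

// Retention: a static local keeps its association and value between calls.
int count_retained() {
    static int calls = 0;
    return ++calls;
}

// Deletion: an automatic local is created anew on every call.
int count_deleted() {
    int calls = 0;
    return ++calls;
}
```

Successive calls of `count_retained` return 1, 2, 3 and so on, while every call of `count_deleted` returns 1.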

Exam-like questions

1. What are the assumptions in simple subprogram call-return?
2. Describe the simple subprogram call-return. Outline the method.
Describe what happens on call and what happens on return.
3. Discuss the implementation of recursive subprograms.
4. Describe two methods to make a data object available as an operand of an operation.
5. Define the terms "association" and "referencing environment".
6. What is a local referencing environment?
7. What is a non-local referencing environment?
List and define types of non-local referencing environments.
8. What is the purpose of dynamic scope rules?
9. What is the purpose of static scope rules?
10. How are static scope rules implemented in local referencing environments?
11. Describe two approaches for dynamic scope rules in local referencing environments
and their implementation.

Chapter: Subprogram Control - Sharing Data Objects

 Parameter transmission
o Actual and formal parameters
o Methods for transmitting parameters
o Transmission semantics
o Implementation of parameter transmission
 Explicit common environment

 Exam-like questions

A. Parameter transmission

Subprograms need mechanisms to exchange data.

Arguments - data objects sent to a subprogram to be processed

Obtained through

o parameters
o non-local references

Results - data object or values delivered by the subprogram

Returned through

o parameters
o assignments to non-local variables
o explicit function values

1. Actual and Formal Parameters

A formal parameter is a particular kind of local data object within a subprogram.
It has a name, and the declaration specifies its attributes.

An actual parameter is a data object that is shared with the caller subprogram.
It might be:

 a local data object belonging to the caller,
 a formal parameter of the caller,
 a nonlocal data object visible to the caller,
 a result returned by a function invoked by the caller and
immediately transmitted to the called subprogram.

Establishing a Correspondence

Positional correspondence – pairing actual and formal parameters based on
their respective positions in the actual- and formal-parameter lists.

Correspondence by explicit name – the name is paired explicitly by the caller.

2. Methods for transmitting parameters

Call by name – the actual parameter expression is substituted for the formal
parameter in the subprogram and is re-evaluated each time the formal parameter
is used.

Call by reference – a pointer to the location of the data object is made
available to the subprogram. The data object does not change position in
memory.

Call by value – the value of the actual parameter is copied into the location of
the formal parameter.

Call by value-result – same as call by value, however at the end of execution
the result is copied back into the actual parameter.

Call by constant value – if a parameter is transmitted by constant value,
then no change in the value of the formal parameter is allowed during
program execution.

Call by result – a parameter transmitted by result is used only to transmit a
result back from a subprogram. The initial value of the actual-parameter
data object makes no difference and cannot be used by the subprogram.

Note: Often "pass by" is used instead of "call by".

Examples:

Pass by name in Algol          Pass by reference in FORTRAN

procedure S (el, k);           SUBROUTINE S (EL, K)
integer el, k;                 K = 2
begin                          EL = 0
  k := 2; el := 0              RETURN
end;                           END

A[1] := A[2] := 1;             A(1) = A(2) = 1
i := 1;                        I = 1
S(A[i], i);                    CALL S (A(I), I)


Pass by name:

After calling S(A[i],i), the effect is as if the body of the procedure were

i := 2;
A[i] := 0;

As a result A[2] becomes 0.
On exit we have: i = 2, A[1] = 1, A[2] = 0.

Pass by reference:

Since at the time of call i is 1, the formal parameter el is linked to
the address of A(1).
Thus it is A(1) that becomes 0.
On exit we have: i = 2, A(1) = 0, A(2) = 1
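
The two outcomes can be reproduced in C++ as a sketch. C++ has no pass by name, so it is modeled here with a function object (a thunk) that re-evaluates the actual parameter A[i] at every use. The parameter k is passed by reference in both versions, which for a simple variable gives the same effect as pass by name. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <functional>

// Pass by reference: el is bound to one location at the time of the call.
void s_by_reference(int& el, int& k) {
    k = 2;
    el = 0;
}

// Pass by name, modeled with a thunk: el() evaluates A[i] again at each use,
// so it sees the value that i has at that moment.
void s_by_name(const std::function<int&()>& el, int& k) {
    k = 2;
    el() = 0;
}
```

With an array A whose elements 1 and 2 are both 1 and with i = 1, the call `s_by_reference(A[i], i)` sets A[1] to 0, while `s_by_name([&]() -> int& { return A[i]; }, i)` sets A[2] to 0.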

3. Transmission semantics

Types of parameters:

input information
output information (the result)
both input and output

The three types can be accomplished by copying or by using pass by reference.

Return results:

Using parameters
Using functions with a return value

4. Implementation of parameter transmission

Implementing formal parameters:

Storage - in the activation record
Type:
Local data object of type T in case of pass by value, pass
by value-result, pass by result
Local data object of type pointer to T in case of pass by
reference

Call by name implementation: the formal parameters are subprograms that
evaluate the actual parameters.

Actions for parameter transmission:

 associated with the point of call of the subprogram:
each actual parameter is evaluated in the referencing environment
of the calling program, and a list of pointers is set up.

 associated with the entry and exit in the subprogram:

on entry:
copying the entire contents of the actual parameter into the formal
parameter, or copying the pointer to the actual parameter
on exit:
copying result values into actual parameters
or copying function values into registers

These actions are performed by prologue and epilogue code generated by
the compiler and stored in the code segment of the subprogram.

Thus the compiler has two main tasks in the implementation of parameter
transmission:

1. It must generate the correct executable code for transmission of
parameters, return of results, and each reference to a formal-parameter name.
2. It must perform the necessary static type checking to ensure that
the type of each actual-parameter data object matches that declared
for the corresponding formal parameter.

B. Explicit common environment

This method of sharing data objects is straightforward.

Specification: A common environment that is similar to a local environment,


however it is not a part of any single subprogram.

It may contain: definitions of variables, constants, types.


It cannot contain: subprograms, formal parameters.

Implementation: as a separate block of memory storage.

Special keywords are used to specify variables to be shared.

Exam-like questions

1. Describe each method of parameter transmission (see A.2).
2. What is the difference between pass by name and pass by reference methods of
parameter transmission?
3. What is the difference between pass by value and pass by reference methods of
parameter transmission?
4. What is the difference between pass by value-result and pass by result methods of
parameter transmission?
5. Consider the following code in some imaginary language:

/* main program */
....
integer i = 1, j = 2;
subprog(i,j);
print(i,j);
.....

subprog(integer k, integer m);
begin
    k = k + 1;
    m = m + i;
    print(i,j,k,m);
end;

What values would be printed in the three modes of parameter transmission? Fill in
the table below:

                        print(i,j,k,m) in subprog    print(i,j) in main program
                        i     j     k     m          i     j
Pass by reference
Pass by value
Pass by value-result

Solution

To solve the problem we have to determine the scope of each variable name and the reference
environments in the three cases of parameter transmission. In the tables below, M1, M2 and M3
denote the three statements of the main program, and S1, S2 and S3 the three statements of subprog.

A. Pass by reference:

Reference environment of the main program:

Name      Location
i         Loc1
j         Loc2
subprog   …..

Reference environment of subprog:

Name      Location
k         Loc1
m         Loc2
i         Loc1
j         Loc2

Changes of variable contents upon statement execution:

        Loc 1           Loc 2
        i in main       j in main
        i in subprog    j in subprog
        k in subprog    m in subprog
M1      1               2
M2
S1      2
S2                      4
S3      Printed: 2, 4, 2, 4
M3      Printed: 2, 4

B. Pass by value:

Reference environment of the main program:

Name      Location
i         Loc1
j         Loc2
subprog   …..

Reference environment of subprog:

Name      Location
k         Loc3
m         Loc4
i         Loc1
j         Loc2

Changes of variable contents upon statement execution:

        Loc 1           Loc 2           Loc 3           Loc 4
        i in main       j in main       k in subprog    m in subprog
        i in subprog    j in subprog
M1      1               2
M2                                      1               2
S1                                      2
S2                                                      3
S3      Print: 1, 2, 2, 3
M3      Print: 1, 2

C. Pass by value-result:

Reference environment of the main program:

Name      Location
i         Loc1
j         Loc2
subprog   …..

Reference environment of subprog:

Name      Location
k         Loc3
m         Loc4
i         Loc1
j         Loc2

Changes of variable contents upon statement execution:

                Loc 1           Loc 2           Loc 3           Loc 4
                i in main       j in main       k in subprog    m in subprog
                i in subprog    j in subprog
M1              1               2
M2                                              1               2
S1                                              2
S2                                                              3
S3              Print: 1, 2, 2, 3
Exit subprog    2               3
M3              Print: 2, 3

The filled in table is:

                        print(i,j,k,m) in subprog    print(i,j) in main program
                        i     j     k     m          i     j
Pass by reference       2     4     2     4          2     4
Pass by value           1     2     2     3          1     2
Pass by value-result    1     2     2     3          2     3
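
The table can be checked with a C++ sketch. C++ offers pass by value and pass by reference directly. Pass by value-result is modeled by hand with a copy in on entry and a copy out on exit. The function names and the type `Printed` are hypothetical, and each function returns the four values that print(i,j,k,m) would show.

```cpp
#include <array>
#include <cassert>

int i = 1, j = 2;   // variables of the main program, non-local to subprog

using Printed = std::array<int, 4>;   // the values shown by print(i,j,k,m)

Printed subprog_by_reference(int& k, int& m) {
    k = k + 1;
    m = m + i;
    return Printed{{i, j, k, m}};
}

Printed subprog_by_value(int k, int m) {
    k = k + 1;
    m = m + i;
    return Printed{{i, j, k, m}};
}

Printed subprog_by_value_result(int& actual_k, int& actual_m) {
    int k = actual_k, m = actual_m;   // copy in on entry
    k = k + 1;
    m = m + i;
    Printed printed = Printed{{i, j, k, m}};
    actual_k = k;                     // copy out on exit
    actual_m = m;
    return printed;
}

// Restore the initial values before trying the next mode.
void reset() { i = 1; j = 2; }
```

Calling each function with the arguments (i, j) after a `reset()` reproduces the corresponding row of the table.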



Concurrency

Concurrency means that an application is making progress on more than one task at the same time
(concurrently). If the computer has only one CPU, the application may not make progress on more than
one task at exactly the same time, but more than one task is being processed at a time inside the
application: it does not completely finish one task before it begins the next.

Parallelism

Parallelism means that an application splits its tasks up into smaller subtasks which can be processed in parallel,
for instance on multiple CPUs at the exact same time.

Concurrency vs. Parallelism In Detail

Concurrency is related to how an application handles the multiple tasks it works on. An application
may process one task at a time (sequentially) or work on multiple tasks at the same time (concurrently).

Parallelism, on the other hand, is related to how an application handles each individual task. An application may
process the task serially from start to end, or split the task up into subtasks which can be completed in parallel.

An application can be concurrent but not parallel. This means that it processes more than one
task at the same time, but the tasks are not broken down into subtasks.

An application can also be parallel but not concurrent. This means that the application only works on one task at
a time, and this task is broken down into subtasks which can be processed in parallel.

Additionally, an application can be neither concurrent nor parallel. This means that it works on only one task at
a time, and the task is never broken down into subtasks for parallel execution.

Finally, an application can be both concurrent and parallel, in that it both works on multiple tasks at the
same time and also breaks each task down into subtasks for parallel execution. However, some of the benefits
of concurrency and parallelism may be lost in this scenario, as the CPUs in the computer are already kept
reasonably busy with either concurrency or parallelism alone. Combining them may lead to only a small
performance gain or even to a performance loss. Analyze and measure before adopting a concurrent
parallel model.
