ISBN 0-321-33025-0
Chapter 5 Topics
.ITCS332
Introduction
Names
Variables
The Concept of Binding
Type Checking
Strong Typing
Type Compatibility
Scope and Lifetime
Referencing Environments
Named Constants
Introduction
Imperative languages are abstractions of the von Neumann computer architecture. Its two primary components are:
Memory: stores both data and instructions.
Processor: provides operations to modify the contents of memory.
The abstractions in a language for memory cells are variables.
Variables are characterized by a collection of attributes: name, address, value, type, lifetime, and scope.
Names
Names are one of the fundamental attributes of variables. Names are also associated with labels, arrays, subprograms, and formal parameters. A name is a string of characters used to identify some entity in a program. In most PLs, names have the same form: a letter followed by a string of letters, digits, and underscores (_). In recent languages, the underscore style has largely been replaced by camel notation (e.g., myStack). Design issues for names:
Maximum length?
Are connector characters allowed?
Are names case sensitive?
Are special words reserved words or keywords?
by Dr. Abdel Fattah Salman
Names (continued)
Length
If names are too short, they cannot be meaningful, which hurts readability. Name length also affects the size of the symbol table. Language examples:
FORTRAN IV: maximum 6
COBOL: maximum 30
FORTRAN 90 and ANSI C: maximum 31
Ada and Java: no limit, and all characters are significant
C++: no limit, but implementers often impose one
External names (those defined outside functions): limited to 31
Names (continued)
Connectors (underscore or space)
Pascal, Modula-2, and FORTRAN 77 don't allow them; others do. In recent languages, the underscore is being replaced by camel notation. Some versions of FORTRAN allowed spaces in names, which the compiler ignored.
Case sensitivity
Disadvantage: readability (names that look alike are different)
This is worse in C++ and Java because predefined names are mixed case (e.g., IndexOutOfBoundsException).
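To make the readability problem concrete, here is a minimal C++ sketch (C++ being one of the case-sensitive languages named above): the names rose, Rose, and ROSE look alike but denote three distinct variables. The names and values are illustrative only.

```cpp
#include <cassert>

// Three distinct variables: C++ names are case sensitive.
int caseSensitiveSum() {
    int rose = 1;
    int Rose = 10;
    int ROSE = 100;
    return rose + Rose + ROSE;  // 111: each name refers to its own memory cell
}
```

A reader skimming this code could easily confuse one of these names with another; the compiler never will.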
Variables
A variable is an abstraction of a memory cell or a collection of memory cells; in effect, variables are names for memory cells. A variable can be characterized as a sextuple of attributes: name, address, value, type, lifetime, and scope.
Name: most variables have names, but not all variables do.
Address: the memory address with which the variable is associated (called its l-value). A variable may have different addresses at different times during execution: for example, a local variable of a function is allocated on the run-time stack each time the function is called. A variable may also have different addresses at different places in a program: two subprograms may each have a local variable with the same name, and these are different variables.
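A minimal C++ sketch of the point about addresses, assuming a recursive call is a fair way to get two activations of the same function alive at once: each activation binds the name local to its own stack cell, so the same name has two different l-values. The function name localHasNewAddress is hypothetical.

```cpp
#include <cassert>

// Each activation binds the name 'local' to its own stack cell. The caller's
// cell is still alive during the nested call, so the two cells are distinct
// objects and must have different addresses (different l-values).
bool localHasNewAddress(const int* callerLocal = nullptr) {
    int local = 0;
    if (callerLocal != nullptr) {
        return &local != callerLocal;  // compare the l-values of the two 'local's
    }
    return localHasNewAddress(&local);  // a second activation of the same name
}
```

Calling localHasNewAddress() with no argument starts the first activation, which passes the address of its own local to the second.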
Variables
Aliases: if more than one name can be used to access the same memory location, those names are called aliases. Aliases are created via:
Pointers: two pointers are aliases when they point to the same memory location; a dereferenced pointer and the name of the variable it points to are also aliases.
Reference variables: a reference and the variable it is bound to are aliases.
Unions in C and C++.
Subprogram parameters.
Aliases are harmful to readability because program readers must remember all of them: if the variables sum and total are aliases, any change to total also changes sum, and vice versa. Aliases also make program verification harder.
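A short C++ sketch of two of the alias sources listed above, reference variables and pointers; the function and variable names are illustrative.

```cpp
#include <cassert>

// Reference variable: 'total' is a second name for the cell named 'sum'.
int aliasViaReference() {
    int sum = 5;
    int& total = sum;  // total and sum are aliases
    total = 7;         // changes sum as well
    return sum;        // 7
}

// Pointers: two pointers to the same cell are aliases when dereferenced.
int aliasViaPointers() {
    int cell = 1;
    int* p = &cell;
    int* q = &cell;    // *p and *q denote the same memory cell
    *q = 42;
    return *p;         // 42
}
```

Neither function ever assigns to sum or *p directly, which is exactly why aliases make code harder to read and verify.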
Type Binding
Before a variable can be referenced, it must be bound to a data type. The two important aspects of this binding are: How is the type specified? When does the binding take place? If the type binding is static, the type may be specified by either an explicit or an implicit declaration.
An explicit declaration is a program statement that lists variable names and specifies their types.
An implicit declaration is a default mechanism (convention) for specifying the types of variables: the type is associated with a variable at its first appearance in the program.
Most PLs require explicit declarations of all program variables. FORTRAN, PL/I, BASIC, Perl, and JavaScript provide implicit declarations.
Advantage: writability.
Disadvantage: implicit declarations can be detrimental to reliability because they prevent the compiler from detecting some typographical and programmer errors (Perl suffers less from this, as described next).
Type Binding
Some problems of implicit declaration can be avoided by requiring names of specific types to begin with particular special characters. In Perl, a name that begins with $ is a scalar (string or numeric), a name that begins with @ is an array, and a name that begins with % is a hash structure. This creates different namespaces for variables of different types.
In C and C++, declarations specify the type and other attributes but do not cause allocation of storage; definitions specify attributes and do cause allocation of storage. A variable may have many compatible declarations but only one definition. Prototypes declare the names and interfaces of functions, but not their code.
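A hedged C++ sketch of the declaration/definition distinction described above; the names counter and next_value are made up for illustration. Everything sits in one file here, although in practice the extern declarations would usually live in a header shared by several files.

```cpp
#include <cassert>

extern int counter;         // declaration: gives the type, allocates no storage
extern int counter;         // a second, compatible declaration is allowed
int next_value(int step);   // prototype: name and interface, no code

int counter = 10;           // the one definition: allocates (and initializes) storage

int next_value(int step) {  // the function definition supplies its code
    counter += step;
    return counter;
}
```

Adding a second definition such as int counter = 10; to this file would be rejected, since only one definition is allowed.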
In JavaScript, assigning an array value to the name list causes list to become a single-dimensional array, regardless of its previous type. On the other hand, the statement list = 17.3; makes list a scalar variable.
Advantage: programming flexibility (generic program units).
Disadvantages: programs are less reliable, and type error detection by the compiler is difficult. An incorrect type on the right-hand side of an assignment is not detected as an error; instead, the type of the left-hand side is changed to the incorrect type.
Suppose i and x are currently storing scalar numeric values and y is storing an array. Suppose that a JavaScript program needs the assignment i = x; but, because of a keying error, it contains i = y; instead. No error is detected: i simply becomes an array.
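C++ binds types statically, so it cannot show JavaScript's behavior directly. As a hedged analogy, std::any (C++17) lets one variable hold a value of any type, so the keying error i = y silently changes what i holds instead of being rejected. The variable names follow the text; the values and the function name are assumptions.

```cpp
#include <any>
#include <cassert>
#include <typeinfo>
#include <vector>

// Simulates the keying error i = y (intended: i = x) under dynamic type binding.
// Returns true when i ended up holding an array instead of a number.
bool keyingErrorGoesUnnoticed() {
    std::any x = 3.5;                            // scalar numeric value
    std::any y = std::vector<double>{1.0, 2.0};  // array
    std::any i = 0.0;                            // scalar numeric value

    i = y;     // typo for i = x: accepted, and i now holds an array
    (void)x;   // x is the value the programmer meant to assign

    return i.type() == typeid(std::vector<double>);
}
```

With ordinary static typing (double i; std::vector<double> y;), the same typo i = y; would be a compile-time error.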
Implicit heap-dynamic variables:
Advantage: variables have the highest degree of flexibility.
Disadvantages: inefficiency, because all attributes are dynamic, which means a large run-time overhead for binding all of them; loss of error detection by the compiler; and the same storage management problems as explicit heap-dynamic variables.
Type Checking
Generalize the concept of operands and operators to include subprograms and assignments: subprograms are treated as operators whose operands are their parameters, and the assignment symbol is treated as a binary operator whose operands are its target variable and its expression. Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that either is legal for the operator or is allowed under the language rules to be implicitly converted, by compiler-generated code, to a legal type; for example, in int + real the int operand is converted (coerced) to real. This automatic conversion is called a coercion. A type error is the application of an operator to an operand of an inappropriate type.
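The int + real case can be checked directly in C++, where double plays the role of real. This sketch (the function name is hypothetical) uses static_assert to confirm at compile time that the int operand is coerced, so the sum has type double.

```cpp
#include <cassert>
#include <type_traits>

// int + double: the int operand is coerced to double before the addition.
double mixedAdd(int i, double x) {
    static_assert(std::is_same_v<decltype(i + x), double>,
                  "the result of int + double has type double");
    return i + x;
}
```

For example, mixedAdd(2, 0.5) yields 2.5, a value an int result could not hold.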
Strong Typing
Advantage of strong typing: it allows the detection of misuses of variables that result in type errors. Language examples:
FORTRAN 77 is not strongly typed: parameters, EQUIVALENCE
Pascal is not: variant records
C and C++ are not: parameter type checking can be avoided, and unions are not type checked
Ada almost is (UNCHECKED_CONVERSION is a loophole); Java is similar, and expressions in Java are strongly typed
The value of strong typing is weakened by coercion. Languages with a great deal of coercion (Fortran, C, C++) are significantly less reliable than those with little coercion, such as Ada. Java and C# have half as many assignment type coercions as C++, so their error detection is better than that of C++.
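One way to see how coercion weakens strong typing, sketched in C++ (the function name is hypothetical): an assignment from double to int is accepted and silently drops the fractional part, so a type mistake turns into a wrong value rather than a compile-time error.

```cpp
#include <cassert>

// The assignment coerces double to int without complaint (at most a warning):
// the fractional part is silently discarded.
int truncatingAssignment(double measured) {
    int stored = measured;  // implicit narrowing conversion, accepted by the compiler
    return stored;
}

// Brace (list) initialization makes the same narrowing ill-formed, so
// compilers diagnose it:
//   int stored{measured};  // error: narrowing conversion from double to int
```

truncatingAssignment(3.9) returns 3, and nothing in the source flags the lost 0.9.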
Variables of these two derived types are incompatible, although their structures are identical, and variables of both types are incompatible with any other floating-point type. Literals are exempt from this rule: a literal such as 3.0 has the type universal real and is compatible with any floating-point type. An Ada subtype is a possibly range-constrained version of an existing type, and a subtype is compatible with its parent type. For example, given
    subtype small_type is integer range 0..99;
variables of subtype small_type are compatible with integer variables.
have different names and different subscript ranges, because for objects of unconstrained array types, structure compatibility rather than name compatibility is used. Because both types have 10 elements and the elements are of type integer, they are compatible. Consider the following Ada declarations of constrained anonymous types:
    A : array (1..10) of integer;   -- A has an anonymous, unique type
    B : array (1..10) of integer;
The arrays A and B have anonymous but distinct, incompatible types, even though they are structurally identical. The multiple declaration
    C, D : array (1..10) of integer;
creates two anonymous types, one for C and one for D, so C and D are not compatible.
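C++ uses name type equivalence for classes, which gives a rough analogue of the Ada rules above: two structurally identical structs are distinct, incompatible types, while an alias declared with using is just another name for its parent type (unlike an Ada subtype, it carries no range constraint). The type names are illustrative.

```cpp
#include <cassert>
#include <type_traits>

struct Celsius    { double value; };  // same structure as Fahrenheit...
struct Fahrenheit { double value; };  // ...but a different name, so a different type

using Score = int;                    // alias: another name for int, fully compatible

static_assert(!std::is_same_v<Celsius, Fahrenheit>, "name equivalence: distinct types");
static_assert(!std::is_convertible_v<Celsius, Fahrenheit>, "no implicit conversion between them");
static_assert(std::is_same_v<Score, int>, "an alias is the same type as its parent");

int addScore(Score s, int bonus) { return s + bonus; }  // mixes Score and int freely
```

Passing a Celsius where a Fahrenheit is expected would not compile, whereas Score and int mix without any conversion.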
Static Scope
There are two categories of static-scoped languages: those in which subprograms can be nested, creating nested static scopes, and those in which subprograms cannot be nested. Ada, JavaScript, and PHP allow nested subprograms, but the C-based languages do not. In static-scoped languages, to connect a name reference to a variable, you (or the compiler) must find the statement in which it is declared. The search process: search declarations first locally, then in increasingly larger enclosing scopes, until one is found for the given name. The static scopes enclosing a given scope are called its static ancestors; the nearest static ancestor is called its static parent. In the following example, suppose a reference is made to a variable x in sub1:
Scope (continued)
procedure Big is
   x : integer;
   procedure sub1 is
   begin -- of sub1
      ... x ...
   end; -- of sub1
   procedure sub2 is
      x : integer;
   begin -- of sub2
      ...
   end; -- of sub2
begin -- of Big
   ...
end; -- of Big
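Under static scoping, the reference to x in sub1 is to the x declared in Big: sub1 declares no x, so the search continues in its static parent, Big, and sub2's x is never considered because sub2 is not a static ancestor of sub1. The presence of predefined names complicates this process: in this case, the search for the meaning of a variable begins with the list of predefined names, even before the local scope declarations are checked.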
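C-based languages cannot nest subprograms, so this C++ sketch approximates the Big/sub1/sub2 example with a global x standing in for Big's x (an assumption of the sketch; the values are arbitrary).

```cpp
#include <cassert>

int x = 1;          // plays the role of Big's x

int sub1() {
    return x;       // no local x: static scoping resolves this to the global x
}

int sub2() {
    int x = 2;      // sub2's own x; invisible to the body of sub1
    (void)x;
    return sub1();  // still the global x, no matter who calls sub1
}
```

Both sub1() and sub2() return 1: the caller of sub1 has no influence on which x it sees.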
Scope (continued)
Variables can be hidden from other units by having a "closer" variable with the same name:
void sub() {
    int count;          // sub's count is hidden from the code inside the while loop
    ...
    while (...) {
        int count = 0;  // references to count in this loop are to this local count
        count++;
        ...
    }
    ...
}
C++ and Ada allow access to some of these "hidden" variables. Ada uses selective references of the form unit.name. In C++, a hidden global variable can be accessed with the scope operator (::name), and class members with class_name::name.
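A C++ sketch of hiding and of the :: operator, modeled loosely on the sub example above (the values are illustrative): the loop's count hides sub's count, which in turn hides a global count, and ::count reaches the global one.

```cpp
#include <cassert>

int count = 100;                 // global count

int sub() {
    int count = 1;               // hides the global count inside sub
    for (int i = 0; i < 3; ++i) {
        int count = 0;           // hides sub's count inside the loop body
        ++count;                 // updates the loop's own count only
        (void)count;
    }
    return count + ::count;      // sub's count (still 1) + the global count via ::
}
```

C++ offers no syntax for reaching sub's own count from inside the loop; a leading :: names only entities in the global namespace.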
Blocks
A block is a code section with its own local variables and a restricted scope. Block variables are stack-dynamic: they are allocated storage when the block is entered and deallocated when it is exited. Ada blocks are specified with a declare clause. C-based languages allow any compound statement (a statement sequence surrounded by matched braces) to have declarations and thus define a new scope; such a compound statement is a block.
Examples:
C and C++:
    for (...) {
        int index;
        ...
    }
Ada:
    declare LCL : FLOAT;
    begin
        ...
    end;
Blocks
C++ allows variable definitions to appear anywhere in a function. When a definition appears at a position other than the beginning of a function, but not within a block, the variable's scope extends from its definition statement to the end of the function. The for statements of C++, Java, and C# allow variable definitions in their initialization expressions; the scope of such a variable is restricted to the for construct. In C++, Java, and C#, the scope of a class variable is the whole class in which it is defined, whereas the scope of a variable defined in a method starts at its definition.
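A C++ sketch of the block and for-statement rules above (function and variable names are illustrative): index exists only inside the for statement, square is bound to storage each time the loop body is entered (conceptually; a compiler may reuse the same slot), and late is visible from its definition to the end of the function.

```cpp
#include <cassert>

// Block and for-statement scopes in C++.
int blockScopeDemo() {
    int total = 0;
    for (int index = 0; index < 4; ++index) {  // index: scope is the for statement only
        int square = index * index;            // block variable, bound on each entry to the body
        total += square;
    }
    // index and square are not visible here; naming them would not compile.
    int late = total + 1;  // scope of late: from this definition to the end of the function
    return late;           // 0 + 1 + 4 + 9 + 1 = 15
}
```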
[Figure: program units A, B, C, D, and E]
Dynamic Scope
Dynamic scope is based on the calling sequence of program units, not on their textual layout (temporal versus spatial), so the scope can be determined only at run time. References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point.
Scope Example
procedure Big is
   x : integer;
   procedure sub1 is
   begin -- of sub1
      ... x ...
   end; -- of sub1
   procedure sub2 is
      x : integer;
   begin -- of sub2
      ...
   end; -- of sub2
begin -- of Big
   ...
end; -- of Big
Under dynamic scoping, the meaning of the identifier x in sub1 is dynamic: it may refer to either declaration of x, depending on the calling sequence. The search begins with the local declarations; if none is found, the declarations of the dynamic parent (the calling procedure) are searched.
Scope Example
The search continues in that procedure's dynamic parent, and so forth, until a declaration of x is found. If none is found in any dynamic ancestor, it is a run-time error. Consider two different call sequences for sub1 under dynamic scoping:
1. Big calls sub2, which calls sub1. The search proceeds from the local procedure sub1 to its caller, sub2, where a declaration of x is found, so the reference is to the x declared in sub2.
2. Big calls sub1 directly. The dynamic parent of sub1 is Big, so the reference is to the x declared in Big.
Under static scoping, the reference to x in sub1 is to Big's x in both call sequences.
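C++ itself is statically scoped, so the lookup rule above can only be simulated. The sketch below is a hypothetical model (the class DynamicScope and its methods are invented for illustration): it keeps one frame per active subprogram and resolves a name by searching from the newest frame back through the callers, reproducing both call sequences for sub1.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a dynamic-scoped run time: one frame per active
// subprogram, with the most recently called subprogram at the back.
class DynamicScope {
public:
    void call(const std::string& unit) { frames_.push_back(Frame{unit, {}}); }
    void ret() { frames_.pop_back(); }
    void declare(const std::string& name) { frames_.back().names.insert(name); }

    // Search the current subprogram first, then its caller, its caller's
    // caller, and so on. Returns the subprogram whose declaration is used,
    // or std::nullopt when no dynamic ancestor declares the name.
    std::optional<std::string> resolve(const std::string& name) const {
        for (auto it = frames_.rbegin(); it != frames_.rend(); ++it) {
            if (it->names.count(name) != 0) return it->unit;
        }
        return std::nullopt;
    }

private:
    struct Frame {
        std::string unit;
        std::set<std::string> names;
    };
    std::vector<Frame> frames_;
};

// Which declaration does the reference to x in sub1 use for a given call chain?
// Big and sub2 declare x; sub1 does not.
std::optional<std::string> xSeenBySub1(const std::vector<std::string>& callChain) {
    DynamicScope env;
    for (const std::string& unit : callChain) {
        env.call(unit);
        if (unit == "Big" || unit == "sub2") env.declare("x");
    }
    return env.resolve("x");  // evaluated while the last unit (sub1) is running
}
```

With the chain Big, sub2, sub1 the reference resolves to sub2's x; with Big, sub1 it resolves to Big's x; with sub1 alone nothing is found, which corresponds to the run-time error case.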
Scope Example
MAIN
   - declaration of x
   SUB1
      - declaration of x
      ...
      call SUB2
      ...
   end SUB1
   SUB2
      ...
      - reference to x
      ...
   end SUB2
   ...
   call SUB1
   ...
Scope Example
Static scoping: the reference to x is to MAIN's x.
Dynamic scoping: the reference to x is to SUB1's x (MAIN calls SUB1, which calls SUB2).
Evaluation of dynamic scoping:
Advantage: convenience.
Disadvantages: programs are less reliable than with static scoping; references to nonlocals cannot be type checked statically; readability is poor; and accesses to nonlocal variables in dynamic-scoped languages take longer than accesses to nonlocals under static scoping.
/* end of compute */
The scope of the variable sum is completely contained within the function compute, but its lifetime extends over the time during which printheader, called from compute, executes: the binding of sum to storage continues until compute terminates, even though sum is not visible inside printheader.
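A C++ sketch of the situation described, assuming compute declares sum and then calls printheader (the function bodies and the global header are hypothetical, added only so the call has an observable effect).

```cpp
#include <cassert>
#include <string>

std::string header;  // hypothetical output, so the effect of printheader can be checked

void printheader() {
    header = "Report";  // sum cannot be named here: this is outside sum's scope
}

int compute() {
    int sum = 40;       // scope of sum: the body of compute only
    printheader();      // while printheader runs, sum is out of scope but still alive
    sum += 2;           // the storage binding (and the value) survived the call
    return sum;
}
```

compute() returns 42, showing that sum kept its value across a call during which it was not in scope.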
Referencing Environments
The referencing environment of a statement is the collection of all names that are visible in the statement. In a static-scoped language, it is the local variables plus all of the visible variables in all of the enclosing (ancestor) scopes. In these languages, the referencing environment is needed while the statement is being compiled, so that code and data structures can be created to allow references to variables from other scopes during run time. In Ada, scopes can be created by procedure definitions: the referencing environment of a statement includes the local variables plus all of the variables declared in the procedures in which the statement is nested (excluding variables in nonlocal scopes that are hidden by declarations in nearer procedures). Each procedure definition creates a new scope and a new referencing environment.
Referencing Environments
procedure Example is
   A, B : integer;
   ...
   procedure sub1 is
      x, y : integer;
   begin -- of sub1
      ...   <------------ 1
   end; -- of sub1
   procedure sub2 is
      x : integer;
      ...
      procedure sub3 is
         x : integer;
      begin -- of sub3
         ...   <------------ 2
      end; -- of sub3
   begin -- of sub2
      ...   <------------ 3
   end; -- of sub2
begin -- of Example
   ...   <------------ 4
end; -- of Example
Point 1: x and y of sub1; A and B of Example
Point 2: x of sub3 (x of sub2 is hidden); A and B of Example
Point 3: x of sub2; A and B of Example
Point 4: A and B of Example
The variables declared in sub1 are stack-dynamic, so they are not bound to storage when sub1 is not executing.
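C++ has no nested procedures, so this sketch uses nested lambdas as a stand-in for the Ada structure above (sub1 is omitted for brevity; the values are arbitrary). The comments mark which names make up the referencing environment at points 2, 3, and 4.

```cpp
#include <cassert>

// Nested lambdas stand in for Ada's nested procedures.
int example() {
    int A = 1, B = 2;              // visible in every nested unit below
    auto sub2 = [&]() {
        int x = 20;                // sub2's x
        auto sub3 = [&]() {
            int x = 30;            // hides sub2's x
            return x + A + B;      // point 2: x of sub3, A and B of example
        };
        return sub3() + x;         // point 3: x of sub2, A and B of example
    };
    return sub2();                 // point 4: A and B of example
}
```

example() returns 53 (33 from sub3 plus sub2's x of 20), confirming that sub3 used its own x while sub2 still saw its own.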
Referencing Environments
A subprogram is active if its execution has begun but has not yet terminated. In a dynamic-scoped language, the referencing environment is the local variables plus all visible variables in all active subprograms. Assume that the only function calls are: main calls sub2, which calls sub1:
void sub1() {
    int a, b;
    ...          <------------ 1
} /* end of sub1 */
void sub2() {
    int b, c;
    ...          <------------ 2
    sub1();
} /* end of sub2 */
int main() {
    int c, d;
    ...          <------------ 3
    sub2();
} /* end of main */

Point 1: a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
Point 2: b and c of sub2, d of main (c of main is hidden)
Point 3: c and d of main
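The dynamic-scoping rule can be checked mechanically. This C++ sketch (the type ActiveUnit and the function referencingEnvironment are hypothetical) walks the active subprograms from the most recent call back to main, keeping only the first declaration of each name it meets, and should reproduce the three lists above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One entry per active subprogram, in call order (main first).
struct ActiveUnit {
    std::string name;
    std::vector<std::string> locals;
};

// Dynamic scoping: walk from the newest active subprogram back to the oldest.
// A name is visible from the first (newest) unit that declares it; the same
// name declared in older units is hidden.
std::vector<std::string> referencingEnvironment(const std::vector<ActiveUnit>& active) {
    std::set<std::string> seen;
    std::vector<std::string> visible;  // entries look like "b of sub1"
    for (auto unit = active.rbegin(); unit != active.rend(); ++unit) {
        for (const auto& local : unit->locals) {
            if (seen.insert(local).second) {
                visible.push_back(local + " of " + unit->name);
            }
        }
    }
    return visible;
}
```

For point 1, the active units are main (c, d), sub2 (b, c), and sub1 (a, b), and the function yields a of sub1, b of sub1, c of sub2, d of main.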
Named Constants
A named constant is a variable that is bound to a value only once, when it is bound to storage. Advantages: readability (using pi instead of 3.14) and modifiability (changing all occurrences of 3.14 to 3.14159 can be tedious and error-prone). Named constants are also used to parameterize programs. The binding of values to named constants can be either static (as in FORTRAN 95) or dynamic (as in Java). Named constants in languages that use static binding of values are called manifest constants. Languages:
FORTRAN 90: only constant-valued expressions
Ada, C++, and Java: expressions of any kind
C#: two kinds, constants defined with const (static) and those defined with readonly (dynamic)
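A C++ sketch of static versus dynamic value binding for named constants; the split loosely mirrors the const/readonly distinction described for C#. The functions circleArea and paddedWidth are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr double pi = 3.14159;  // value bound at compile time: a manifest constant

double circleArea(double radius) {
    return pi * radius * radius;  // changing pi's precision means editing one line
}

// A const local is bound to a value exactly once, when its storage is bound,
// but that value may be computed at run time (dynamic value binding).
std::size_t paddedWidth(const std::string& label) {
    const std::size_t width = label.size() + 2;
    // width = 0;  // would not compile: a named constant cannot be assigned again
    return width;
}
```

Each call to paddedWidth creates a new constant with a possibly different value, while pi has a single value fixed before the program runs.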
Variable Initialization
It is convenient for a variable to have a value before the code in which it is declared begins execution. The binding of a variable to a value at the time it is bound to storage is called initialization; a value is bound to a named constant in the same way. If a variable is statically bound to storage, binding and initialization occur before run time, and the initial value must be specified as a literal or a constant expression. If the storage binding is dynamic, initialization is also dynamic, and the initial value can be any expression. In most languages, initialization is done in the declaration statement, e.g., in C:
    int sum = 0;
    int *ptrSum = &sum;
    char uob[] = "University of Bahrain";
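A C++ sketch of static versus dynamic initialization, reusing the example declarations above (valid in both C and C++); countCalls and scaled are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

int sum = 0;                           // static storage: initialized before run time
int* ptrSum = &sum;                    // the address of a static object is a constant expression
char uob[] = "University of Bahrain";  // size 22 (21 characters plus '\0') taken from the literal

int countCalls() {
    static int calls = 0;  // static storage: initialized once, with a constant as C requires
    return ++calls;
}

int scaled(int x) {
    int result = x * 2 + 1;  // stack-dynamic: initialized on every call from any expression
    return result;
}
```

countCalls() returns 1, then 2, because calls is initialized only once, whereas result in scaled is initialized afresh on each call.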
Summary
Case sensitivity and the relationship of names to special words represent design issues of names.
Variables are characterized by the sextuple: name, address, value, type, lifetime, scope.
Binding is the association of attributes with program entities.
Knowledge of the binding times of attributes to entities is essential to understanding the semantics of PLs.
Scalar variables are categorized as static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic.
Strong typing means detecting all type errors.