CSCI 5535 Course Project - A Report On Interpreted Programming Languages


by Xiaoli Zhang Helen Wong
December 11, 1996
Contents

1. Introduction
2. Two important Languages in the Evolution of Interpreted Languages
   Pascal
   Smalltalk
3. Interpreter and Virtual Machine
   Traditional Compilation Process
   Self Compilation
   Compiler and Interpreter
   Intermediate Language
   Just-in-time and On-the-fly
   Virtual Machine
   Examples of intermediate languages and related abstract machines
   Abstract Machine to Actual Machine
   Portability of Interpreters
4. Scripting Languages and Interpreted Languages
5. Case Study
   a) Java
      Overview
      Java Virtual Machine
      Java Language Construct and Java's Interpreter
   b) Tcl/Tk
      Overview
      On-the-fly Bytecode compiler for Tcl
   c) Both are Web Programming Languages
   d) Another Mobile Language: Omniware
6. Why Interpreted Languages?
   Portability
   Security
   Reusability
   Rapid Development
   Performance
   Other Advantages of Interpreted Languages
7. Summary
1. Introduction

Interpreted languages have become more and more popular. In recent years, interpreted languages such as Java, Tcl/Tk and Perl have become hot topics and are in widespread use. Why? Generally, it is because they are portable, easy to use, fast to develop in, and safe. In addition, most interpreted languages are closely related to Web programming. In this paper, we examine the nature of interpreted programming languages and how these features are achieved.

2. Two important Languages in the Evolution of Interpreted Languages

Pascal

Pascal, developed by Niklaus Wirth, is one of the early interpreted languages. The non-interpreted Pascal was designed and implemented in 1967. The first Pascal compiler was implemented for the CDC 6000 computer family. It was written in Pascal itself. In implementing the Pascal compiler, Wirth found that the effort to generate good code is proportional to the mismatch between language and machine, and the CDC 6000 had certainly not been designed with high-level languages in mind. [1] What's more, after Pascal became well known, many people asked Wirth for assistance in implementing Pascal on various other machines. Most of them wanted to use Pascal for teaching. They liked Pascal for its simplicity and the elegance of its implementation and did not care much about performance. Wirth therefore decided to provide a compiler version that would generate code for machines of different designs. Later, this code became known as P-code.

P-code is an abstract machine code whose target is a virtual machine called the P-machine. As an intermediate language, P-code is then interpreted, so that the virtual machine is emulated on a real machine. The P-code version of Pascal was easy to construct because the new compiler was developed as a substantial exercise in structured programming by stepwise refinement, and therefore the first few refinement steps could be adopted unchanged.
It also proved to be very successful in spreading the language among many users on different machines. Wirth later regretted that he had not had the wisdom to foresee the dimensions of this movement; otherwise, he would have put more effort into designing and documenting P-code. [1]

Pascal's P-code and its virtual machine elaborated the existing concepts of intermediate language and virtual machine, and are therefore very important in the evolution of interpreted languages. P-code has now almost become a household word in the area of programming languages. With the virtual machine, the Pascal-P system developed into an environment with an integrated compiler, filter, editor, and debugger. This helped Pascal spread further.

As mentioned above, Pascal-P is both compiled and interpreted. It has both a compiler, such as pcom, and an interpreter, such as pint. [4] Overall, execution takes place in two phases: first the compiler compiles the source code into P-code, and then the interpreter interprets the P-code. This implementation uses self-compilation: the compiler is written in its own source language and can compile itself. This approach is a common combination of elementary methods and is called
bootstrapping, which is also very helpful in software migration. In Pascal-P, the resulting compiler is written in the virtual machine language, P-code, and generates code for this same machine. Hence the compiler itself must be interpreted. [2] A similar story occurs in Java's implementation.

Smalltalk

Another significant interpreted language is Smalltalk, which was developed during the 1970s at Xerox PARC (Palo Alto Research Center). It was the first language to really exploit a graphical user interface. Many of the ideas for the Macintosh came from Smalltalk. Smalltalk is more of an environment than a language. This is because there is a Smalltalk virtual machine, and the entire operation of the Smalltalk environment and language is built on that virtual machine. [6] We call Smalltalk an interpreted language here solely because it runs on a P-machine-style virtual machine.

What actually happens as a result of a message send in Smalltalk is:

- First, the system checks whether the method has already been translated to machine code that is cached in memory.
- If the native machine code form is in the cache, the system executes that machine code.
- If the cache does not contain a translated form of the method, the system dynamically compiles the method's bytecode. [5]
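The translate-on-first-send behavior listed above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class DynamicTranslator, its two-instruction bytecode ("add", "mul") and the method name are hypothetical, and "machine code" is modeled as a Python callable rather than real native instructions.

```python
# Hypothetical sketch of a translate-on-first-send method cache.
# "Native code" is modeled as a Python callable; a real Smalltalk system
# would emit actual machine instructions instead.

class DynamicTranslator:
    def __init__(self):
        self.cache = {}         # method name -> translated callable
        self.translations = 0   # number of times a method had to be translated

    def translate(self, bytecode):
        """Turn a list of (op, arg) pairs into a callable (stand-in for machine code)."""
        self.translations += 1

        def native(value):
            for op, arg in bytecode:
                if op == "add":
                    value += arg
                elif op == "mul":
                    value *= arg
                else:
                    raise ValueError(f"unknown op {op!r}")
            return value

        return native

    def send(self, name, bytecode, receiver):
        """Execute a method: use the cached translation, or translate first."""
        native = self.cache.get(name)
        if native is None:                 # cache miss: compile dynamically
            native = self.translate(bytecode)
            self.cache[name] = native
        return native(receiver)            # run the translated form


vm = DynamicTranslator()
double_plus_one = [("mul", 2), ("add", 1)]
first = vm.send("doublePlusOne", double_plus_one, 5)    # translated, then executed
second = vm.send("doublePlusOne", double_plus_one, 7)   # served from the cache
```

Only the first send pays the translation cost; later sends of the same method reuse the cached form, while methods that are never sent stay in their compact bytecode form.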
Dynamic translation yields both the execution speed of compiled code and the space compactness of bytecode. If all the code in a running Smalltalk image were kept purely in the form of compiled machine code, the image would consume 5-10 times as much memory, and could therefore in fact degrade performance on a virtual memory system by causing increased paging. [7]

Many features of Smalltalk are worth borrowing by new interpreted languages such as Java. One of these features is the just-in-time compilation mentioned above. Currently, Java is implementing just-in-time compilation of bytecode into native code to improve its performance.

We will address some less well-known interpreted languages in the next section when introducing interpreters and virtual machines. We will also address in detail popular interpreted languages such as Java and Tcl as case studies, and address Perl as a scripting language.

3. Interpreter and Virtual Machine

In the last section, quite a few terms (in bold characters) related to interpreted languages were mentioned. In this section, we report in detail on the concepts these terms represent. Let's first step back a little.

Traditional Compilation Process

The process of translating a high-level language into machine code that the hardware can understand is performed by the compiler. The task of a compiler has two subtasks: analysis of the source program and synthesis of the object program. Typically, as in Figure 1, the analysis task consists
of three subphases: lexical analysis, syntax analysis and semantic analysis, while the synthesis task is usually a single phase: code generation.
[Figure 1. Traditional compilation process. Front end: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer. Back end: code generator -> object program.]
The lexical analyzer is responsible for reading the characters of the source program, recognizing the basic syntactic components, or tokens, that they represent, and returning the tokens to the syntax analyzer, or parser. The parser then determines how to group and structure the tokens according to the syntax rules of the language. The output of the parser is a representation of the syntactic structure of the source program, often expressed in the form of a parse tree. The parse tree is then passed to the semantic analyzer, which determines the meaning of the source program, including the meaning of declarations and the scopes of identifiers, storage allocation, type checking, selection of appropriate polymorphic operators, addition of automatic type transfers, etc.

The code generator, in the last phase of the compilation process, takes the output of the semantic analyzer as input and generates machine code or assembly language for the target hardware. To generate object code for a machine, it has to know the machine architecture, including machine instructions, allocation of machine registers, addressing, interfacing with the operating system, and so on. While the analysis phase, or front end, is language-dependent (the analyzers have to know the syntactic and semantic rules of the language), the synthesis phase, or back end, is machine-dependent. The code generator usually includes some form of code optimizer to produce faster or more compact code. Code generation may include both machine-dependent and machine-independent techniques. [27]

Self Compilation

While a compiler translates high-level languages into machine code or object code, most compilers are themselves software written in high-level languages, and some are written in the very source language they compile. How can this self-compilation be achieved? It is done by a process called bootstrapping.
We take Pascal as an example to illustrate the bootstrapping process. Refer to Table 1 and suppose there are machines X, Y and Z. Any two of them could be the same or different.

Table 1. Bootstrapping

      Source code of Pascal compiler    Compiler used                                   Object code
      {written in, target machine}      [written in, running on]
(1)   { Modula-2 for Y }                [in X's assembly language, running on X]        X's object code
(2)   { Pascal for Z }                  [in Modula-2, running on X]                     Y's object code
(3)   { Pascal for any }                [Pascal, running on Y]                          Z's object code
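The three steps of Table 1 can be modeled as plain data, which makes it easier to see which machine each resulting compiler runs on and which machine it targets. This is a sketch under our own assumptions: the Compiler record, its field names and the compile_compiler function are hypothetical, and we assume the initial Modula-2 compiler targets machine X.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compiler:
    source_lang: str   # language this compiler accepts as input
    target: str        # machine whose object code it emits
    form: str          # what the compiler itself exists as (source language or object code)

def compile_compiler(tool, compiler_source):
    """Run `tool` on the source text of another compiler."""
    if compiler_source.form != tool.source_lang:
        raise ValueError("tool cannot read the language this compiler is written in")
    # Same compiler as before, but now existing as object code of the tool's target.
    return Compiler(compiler_source.source_lang,
                    compiler_source.target,
                    f"{tool.target} object code")

# Starting point: a Modula-2 compiler, hand-written for X (assumed to target X).
modula2_on_x = Compiler("Modula-2", "X", "X object code")

# (1) Pascal compiler for Y, written in Modula-2.
step1 = compile_compiler(modula2_on_x, Compiler("Pascal", "Y", "Modula-2"))
# (2) Pascal compiler for Z, written in Pascal, compiled by the result of (1).
step2 = compile_compiler(step1, Compiler("Pascal", "Z", "Pascal"))
# (3) Pascal compiler for any machine, written in Pascal, compiled by the result of (2).
step3 = compile_compiler(step2, Compiler("Pascal", "any", "Pascal"))
```

After step (1) the Pascal compiler exists as X's object code although it generates code for Y, which is exactly the distinction between the machine a compiler runs on and its target machine.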
In compilation process (1), the source code of a Pascal compiler for Y was written in Modula-2. This code was then compiled by a Modula-2 compiler written in X's assembly language and translated into X's object code. Once this new compiler existed, the source code of a Pascal compiler for Z, written in Pascal, could be passed to it and translated into Y's object code, as in (2). Further, the source code of another Pascal compiler, written in Pascal for an arbitrary machine, could be passed to the newest compiler and translated into Z's object code, as in (3). Note that a compiler compiles its input source code into object code for its target machine, while the compiler itself is object code for the machine on which it runs. That machine does not have to be its target machine. [27]

Compiler and Interpreter

We use the term interpreted languages for those languages whose implementation uses an interpreter. So, what is an interpreter by definition? A translator takes a program written in a source language as input and translates it into a program that has the same meaning but is written in an object language. If the source language is a higher-level one, the translator is a compiler. Generally, a compiler generates machine code or abstract machine code from source code. An interpreter directly executes its source language without first translating it into an object language. Some Lisp or APL implementations could be considered pure interpreters. But many language implementations consist of both a compiler and an interpreter. The former translates the source language into an interpretable intermediate language; in this case, the intermediate language is the source language for the interpreter. [2] With an intermediate language and an interpreter, the compilation process becomes more sophisticated, typically as in Figure 2.
The semantic analysis phase is often followed by another process that takes the parse tree from the syntax analyzer and produces a linear sequence of instructions equivalent to the original source program. [27] This sequence of instructions can be considered abstract machine code, since it is targeted not at an actual machine but at an abstraction of real machines. This abstraction is often called an abstract machine or virtual machine.

Intermediate Language
The intermediate language, which occurs between two phases of a language translation process, is the object code of the first phase and the source language of the second phase. It is very important for modern interpreted languages such as Java and plays a great role in a language's portability and security.

[Figure 2. Compilation with an intermediate language. Front end: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> abstract code generator -> abstract machine code (= intermediate language). Back end: the abstract machine code is either executed by an interpreter or translated by a code generator into machine code, possibly on-the-fly into native machine code held in memory.]

With an intermediate language, the problems arising from the characteristics of the target hardware can be confined to the code generator. The front end of the compiler can therefore be used with different code generators for different machines, and the compiler can easily be ported to a different machine, for example by bootstrapping, since only the code generator needs to be ported. If we replace the code generator with an interpreter in the back end of the compilation process, implementing a language on new hardware becomes easier still, since implementing an interpreter is much easier than implementing a code generator. We will see this in Java's implementation (section 5. a)). A minor disadvantage of an intermediate language is that it is sometimes somewhat harder to generate optimized machine code from the intermediate language than directly from the parse tree. [27]

An intermediate language is usually designed for a particular source language. It reflects the constructs, data types and operators of the source language in its basic operations. For example, the P-machine is a hypothetical stack-based virtual machine with a very simple structure, and P-code contains many instructions closely related to Pascal's language constructs. Designing an intermediate code for several different source languages, by contrast, is hard. Such a generic intermediate language, called UNCOL (UNiversal Computer-Oriented Language), was proposed as early as the 1950s, but it failed to be developed due to practical difficulties. Recently, people have again been thinking about generic intermediate languages, and some efforts are ongoing. We address this in the next section, on virtual machines.

Intermediate languages represent internal interfaces in the compilation process and consequently can take any suitable form: trees, triples, quadruples, assembly languages, bytecode, etc. Pascal's P-code is a famous intermediate language and takes the form of an assembly language. [2]

Just-in-time and On-the-fly

As in Figure 2, an intermediate language can be interpreted directly by an interpreter, or sometimes compiled again by a native compiler or code generator into native machine code. This native compilation is often just-in-time compilation, meaning that the native compiler rewrites compute-intensive sections into native machine code at run time as necessary, and the native machine code does not exist in the disk file system but only in memory. So the compilation of an intermediate language into native machine code is often on-the-fly compilation, which by definition means that the output of the compiler does not exist in the disk file system but is loaded into memory portion by portion. The input of on-the-fly compilation can be either high-level source code or intermediate code, and the output can be either intermediate code or native machine code. When the output is an intermediate code such as bytecode, an interpreter is needed to interpret the bytecode, which is usually cached in memory.
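On-the-fly compilation whose output is an intermediate code can be illustrated with Python's built-in compile() and eval(), which turn source text into an in-memory bytecode object and then interpret it. The wrapper class OnTheFlyEvaluator and its cache are our own hypothetical additions; the sketch shows the bytecode variant only, not translation to native machine code.

```python
class OnTheFlyEvaluator:
    """Compiles expression source at run time and keeps the result only in memory."""

    def __init__(self):
        self._code_cache = {}    # source text -> in-memory code object
        self.compilations = 0    # how often the compiler actually ran

    def run(self, source, **variables):
        code = self._code_cache.get(source)
        if code is None:
            # Compile on demand; nothing is written to the disk file system.
            code = compile(source, "<on-the-fly>", "eval")
            self._code_cache[source] = code
            self.compilations += 1
        # The cached bytecode is executed by the Python interpreter.
        return eval(code, {}, variables)


evaluator = OnTheFlyEvaluator()
total = sum(evaluator.run("x * x + 1", x=i) for i in range(4))
```

The expression is parsed and compiled once and then interpreted four times from the cache, which is the saving that on-the-fly bytecode compilation aims for in loops.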
Virtual Machine

As mentioned above, a virtual machine is an abstraction of a family of real machines. More precisely, a virtual machine is the fictitious target machine of an intermediate language; it specifies a somewhat idealized machine chosen for some convenience, either making it easier to write a simple-minded compiler or being closer to most real machines.

Most computers now have a set of general-purpose registers. Usually, operations take one of their operands from a register and the other from memory. In most cases only some of the registers are used for addressing. [2] A register-based virtual machine is therefore closer to most real machines, and emulating it on a real machine needs fewer native machine instructions and thus has less overhead.

Most virtual machines today, however, are stack-based. A stack machine has few actual registers but has an operand stack, where operations find their operands and put their results. The advantage of stack machines is that they can be totally independent of the computer [2], and the compilation process is relatively simple for a stack machine.

Again, the usefulness of a virtual machine stems from the fact that it allows the majority of the
compilation process to be isolated from dependency on a specific machine.

Examples of intermediate languages and related abstract machines [2]

- P-code and related abstract machine: The abstract machine associated with Pascal-P is very conventional and flexible. It is a stack machine with five registers: top of stack, base of global variables, top of heap (used for dynamic variables), base of local variables, and instruction counter.
- EM-1 intermediate language and related machine: EM-1 is a more sophisticated language than P-code and is closer to actual assembly language. It contains 130 instructions, whereas P-code has only 60. It also contains a dozen pseudo-instructions. The EM-1 machine is very similar to the P-code machine, with a stack of local variable areas whose top is used as an execution stack, a heap, a global variable area, and a program area.
- Janus and its abstract machine: The abstract machine associated with the intermediate language Janus has a memory that is divided into several independent areas organized as tree structures. It uses a stack for expression evaluation, a processing unit to execute Janus instructions, and three specialized registers: condition code, instruction counter, and index register.
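To make the idea of a stack-based abstract machine concrete, here is a minimal interpreter sketch in Python. It is not real P-code, EM-1 or Janus: the instruction names (PUSH, LOAD, STORE, ADD, MUL, HALT) and the function run are hypothetical, and only three of the parts mentioned above are modeled (an operand stack, a global variable area and an instruction counter).

```python
def run(program, globals_size=4):
    """Interpret a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []                          # operand stack; the list end is the top of stack
    globals_area = [0] * globals_size   # global variable area
    pc = 0                              # instruction counter

    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":                # push a constant
            stack.append(arg)
        elif op == "LOAD":              # push the value of global variable `arg`
            stack.append(globals_area[arg])
        elif op == "STORE":             # pop into global variable `arg`
            globals_area[arg] = stack.pop()
        elif op == "ADD":               # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":               # pop two operands, push their product
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return globals_area


# g0 := 2 + 3; g1 := g0 * 4
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None), ("STORE", 0),
    ("LOAD", 0), ("PUSH", 4), ("MUL", None), ("STORE", 1),
    ("HALT", None),
]
memory = run(program)
```

Every arithmetic instruction finds its operands on the stack and leaves its result there, so the instructions need no register names; this is what keeps code generation for a stack machine simple.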
We will address two other important virtual machines, the Java Virtual Machine and Omniware, in the Case Study section of this paper.

Abstract Machine to Actual Machine

To obtain an actual implementation, the abstract machine must be mapped onto an actual machine. Generally, it is the interpreter that executes the abstract machine's instruction set on the actual machine and thereby gives the abstract machine an actual implementation. The two general ways of implementing an interpreter are:

- If the intermediate language resembles an assembly language, like Pascal's P-code, the base operations of the abstract machine can be implemented using a macroprocessor. But if the macroprocessor is not very powerful, the resulting code is usually rather inefficient.
- An interpreter can be programmed or microprogrammed, which amounts to direct execution of the abstract machine's instruction set, like Java's interpreter. [2]
For performance reasons, an interpreter is not the only way to get from abstract machine to actual machine. The intermediate code can also be recompiled, usually on-the-fly, into the target machine's native code.

Portability of Interpreters

The functionality of the front end of a language's compilation process is identical for different target machines. With the self-compilation technique, the implementation of the front end can be totally portable across platforms, as with Java's compiler javac. The non-portable part
of the implementation of a language is the interpreter or code generator. Like a code generator, an interpreter has to make use of the operating system facilities of the target machine for performing input and output, using graphics or window systems, making storage allocation requests, etc. That is, an interpreter has to deal with the run-time library, which is a convenient way of providing an interface between the compiled (abstract) machine code and the operating system. It includes a set of routines that can be called by the compiled code to perform all the machine-dependent and operating-system-dependent functions required by the user's high-level language program. It is possible to write part of the run-time library in a high-level language, as Java's API is written in Java. But at least part of the run-time library has to be written in a low-level language to make use of particular machine and operating system facilities.

4. Scripting Languages and Interpreted Languages

Scripting languages are good at providing variables, flow control and procedures for commands, and at serving as glue between commands. Scripting languages are all interpreted. UNIX shell languages are simple scripting languages; they are interpreted directly, with no intermediate language or virtual machine involved. The interpreter for a UNIX shell language is just a single executable, such as /usr/local/bin/sh for the Bourne shell or /usr/local/bin/ksh for the Korn shell. An interpreter that interprets a high-level language directly has to include the lexical and syntax analysis phases of the compilation front end. But most directly interpreted high-level languages, such as the UNIX shell languages, are simple enough that their interpreters can still be kept simple.

Modern scripting languages such as Perl, however, are no longer interpreted directly. The design goal of Perl is a scripting language that is easy to develop in and portable. So the implementation of Perl is two-phased.
Perl is both compiled and interpreted. It is compiled because the program is completely read and parsed before the first statement is executed. It is interpreted because there is no object code sitting around filling up disk space. In a way, it is the best of both worlds: a typical on-the-fly compiler-interpreter process. The compilation does take time: it is inefficient to have a voluminous Perl program that does one small, quick task and then exits, because the run time of the program will be dwarfed by the compile time. But it is more efficient for heavy tasks, such as those with a large loop body, because compilation saves the time for reparsing. That is why another directly interpreted scripting language, Tcl, is switching to the on-the-fly bytecode compiler-interpreter style used by Perl. To take further advantage of this style, both Perl and Tcl cache the compiled object code between invocations. We will address Tcl's on-the-fly bytecode compilation in the Tcl case study.

The on-the-fly compilation of Tcl or Perl is different from Java's on-the-fly compilation, since theirs is on-the-fly bytecode compilation. The whole compiler-interpreter process happens at run time; the target of the on-the-fly compilation is bytecode that is cached in
memory and then interpreted dynamically.

While all scripting languages are interpreted, not all interpreted languages are scripting languages. An example is the popular page description language PostScript. PostScript is typically interpreted and is stack-based. Its stack-based nature makes PostScript source code natural to interpret and portable. It also makes PostScript device independent, meaning that an image is described without reference to any specific device features. PostScript files can therefore be transferred in source form from machine to machine, even by email in ASCII form, and then interpreted without any modification by interpreters such as ghostview and those built into printers.

5. Case Study

a) Java

Overview

Java is a simple, familiar, object-oriented language. Its syntax is very similar to that of C and C++, and it is in effect a cleaned-up version of C++. It supports garbage collection and removes a number of features that make C and C++ complex, such as pointers, automatic coercions, operator overloading and multiple inheritance. Other important factors in Java's success are its Internet-related features:

- Dynamic: In Java, classes are linked only as needed. New code modules can be linked in on demand from a variety of sources, even from sources across a network. Instead of simply downloading static pages of text and images, Java applets can be downloaded through a web browser and run on the client machine. This supports image animation and real-time user-program interaction.
- Threaded: Modern network-based applications, such as the HotJava Web browser, typically need to do several things at the same time. A user can run several animations concurrently while downloading an image and scrolling the page. Java's multithreading capability provides the means to support this. [11]
The reason Java is a popular mobile language is that it is architecture neutral and portable. To accommodate the diversity of operating environments, the Java compiler generates bytecode, an architecture-neutral intermediate format designed to transport code efficiently to multiple hardware and software platforms. The interpreted nature of Java solves both the binary distribution problem and the version problem; the same Java bytecode will run on any platform.

Java's portability also relies on the fixed definition of its basic data types and of the behavior of its arithmetic operators. Programs therefore behave the same on every platform, and there are no data type incompatibilities across hardware and software architectures.
The self-compilation of Java is another factor that makes Java more portable. Java's compiler is written in Java and exists as Java bytecode. Furthermore, the Java API and the HotJava browser also exist as bytecode. In the end, only the interpreter in the Java system is dependent on the run-time system.

Java Virtual Machine

The architecture-neutral and portable platform of Java is the Java Virtual Machine. It is the specification of an abstract machine for which Java compilers can generate code. Specific implementations of the Java Virtual Machine for specific hardware and software platforms then provide the concrete realization of the virtual machine. The Java Virtual Machine is based primarily on the POSIX interface specification, an industry-standard definition of a portable system interface. Implementing the Java Virtual Machine on a new architecture is a relatively straightforward task as long as the target platform meets basic requirements such as support for multithreading. [11]

The Java VM has been called a soft CPU. It is a stack-based machine. The JVM supports about 248 bytecodes, each of which performs a basic CPU operation such as adding an integer to a register, combining the numbers in two registers, jumping to a subroutine, storing a result, or incrementing or decrementing a register. In effect, the JVM is a stack-based arithmetic logic unit with local and global variables. To add two numbers, the VM first pushes them onto its stack and then adds them. After completing the addition, the VM leaves the result on the stack for the next step in the process. Emulating this on a real machine, most probably a register-based one, takes quite a few real machine instructions and memory references. So there is overhead in mapping the stack-based VM onto a register-based real machine. We address this further in section 6 of this paper, under Performance.

Initially, the Java Virtual Machine was the target machine only for the Java source language.
Recently, people have been trying to support other languages on top of the same Java Virtual Machine. According to Java's creator James Gosling, languages like Visual Basic, COBOL, Dylan and Scheme are fairly reasonable bets for the Java VM. So, although the JVM was not designed as a generic virtual machine, it is now intended to serve as one to meet existing requirements. [28]

Java Language Construct and Java's Interpreter

A Java programmer can create:

- Applets: Programs that are included in HTML pages through the APP tag and displayed in the HotJava browser. The simple hello world program shown in A Simple Java Program is an applet. The HotJava browser is invoked by the hotjava command included in the Java code distribution.
- Applications: Stand-alone programs written in Java and executed independently of the HotJava browser. This is done using the Java interpreter, java, included in the Java code distribution.
- Protocol handlers: Programs that are loaded into the user's HotJava browser and interpret a protocol. These protocols include standard ones such as HTTP as well as programmer-defined protocols.
- Content handlers: Programs loaded into the user's HotJava browser that interpret files of a type defined by the Java programmer. The Java programmer provides the necessary code for the user's HotJava browser to display or interpret this special format.
- Native methods: Methods that are declared in a Java class but implemented in C. These native methods essentially allow a Java programmer to access C code from Java. [10]
There is another tool in the JDK, called AppletViewer, for testing and running applets. AppletViewer also has the Java interpreter, java, embedded. A Java interpreter is built into every Java-enabled web browser.

A practical way to understand the technical description of Java is to look at the steps that occur when a user with a Java-enabled browser requests a page containing a Java applet:

1. The user sends a request for an HTML document to the information provider's server.
2. The HTML document is returned to the user's browser. The document contains the APP tag, which identifies the applet.
3. The corresponding applet bytecode is transferred to the user's host. This bytecode was previously created by the Java compiler from the Java source code for that applet.
4. The Java-enabled browser on the user's host interprets the bytecode and provides the display.
5. The user may interact further with the applet, but with no further downloading from the provider's Web server. This is because the bytecode contains all the information necessary to interpret the applet.

b) Tcl/Tk

Overview

Tcl stands for Tool Command Language. It is an extensible, embeddable command language, or scripting language, implemented by John Ousterhout, originally at the University of California, Berkeley, and now working for Sun. What makes Tcl different from other scripting languages is the ease of adding a Tcl interpreter to an application. A Tcl interpreter consists of a set of commands, a set of variable bindings and a command execution state. It is the basic unit manipulated by most of the Tcl library procedures. An application may have one or more interpreters, depending on its complexity. Multiple interpreters may be responsible for different purposes. Tcl commands may be
built-in commands, such as the flow control keywords for, if, case, eval, etc., or application-specific commands defined by users. There is no limit on how far the developer and user community can extend the set of application-specific commands. Since programmers can structure their applications using a set of primitive operations together with any existing commands and any new commands they develop to best suit their needs, there is no need to invent a command language for each new application. All commands are embedded in Tcl code by creating interpreter objects inside the application through calls to library procedures, similar to defining an extern function in C. It is natural for Tcl to create an interpreter inside the Tcl source code, since an interpreter is essentially a set of commands. This is unlike other languages, such as Java, where the interpreter is a separate executable whose execution costs memory and CPU time concurrently with interpreting the bytecode.

The aspects that set Tcl apart from other extension languages, such as Scheme, Elisp and Python, are:

(1) Tcl has simple constructs somewhat like C, and Tcl primitives are written as C or C++ procedures.
(2) The Tcl C library provides a clean interface to native C code.
(3) Most extensions add new functionality such as socket access for network programming, database access, telephone control and expected interactive features.
(4) Tcl is open to development by its community. [21]

The most notable extension of Tcl is Tk, a toolkit for the X Window System as well as for Windows and the Mac. Tk provides a convenient way for users to build Motif-based GUIs because of its higher-level interface to X and its rapid turnaround in development. Safe-Tcl is a subset of Tcl in which access to system resources is controlled. Because of this security, Safe-Tcl is suited to running network agents. With the combination of Tk and Safe-Tcl, a web browser called TkWWW is now available for free. [12]
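The embedding model described in this overview, in which an interpreter is a set of commands plus a set of variable bindings and the host application registers its own commands, can be sketched as follows. This is a Python illustration under our own assumptions, not Tcl's C library: the class MiniInterp, its methods create_command and eval, and the dial command are all hypothetical, and the script syntax is reduced to whitespace-separated words with $name substitution.

```python
class MiniInterp:
    """A toy command interpreter: a command table plus variable bindings."""

    def __init__(self):
        self.commands = {"set": self._set}   # built-in commands
        self.variables = {}                  # variable bindings (all values are strings)

    def _set(self, interp, name, value):
        self.variables[name] = value
        return value

    def create_command(self, name, func):
        """Let the host application add an application-specific command."""
        self.commands[name] = func

    def eval(self, script):
        """Evaluate a script line by line; return the result of the last command."""
        result = ""
        for line in script.splitlines():
            words = line.split()
            if not words:
                continue
            # Substitute $name with the value bound to that variable.
            words = [self.variables[w[1:]] if w.startswith("$") else w for w in words]
            name, args = words[0], words[1:]
            if name not in self.commands:
                raise NameError(f'invalid command name "{name}"')
            result = self.commands[name](self, *args)
        return result


# The host application supplies a command of its own (telephone control).
calls = []

def dial(interp, number):
    calls.append(number)
    return f"dialing {number}"

interp = MiniInterp()
interp.create_command("dial", dial)
result = interp.eval("set office 555-0100\ndial $office")
```

The application never defines a scripting language of its own; it only contributes the dial command, and variables, substitution and command dispatch come from the embedded interpreter.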
On-the-fly Bytecode Compiler for Tcl [19]

Although Tcl has many advantages as a new scripting language, its lack of structure and its slowness make it poorly suited to large applications. To improve Tcl's performance, researchers at Sun Microsystems Laboratories are working on an on-the-fly bytecode compiler for Tcl. The following is quoted directly from the paper "An On-the-fly Bytecode Compiler for Tcl" by Brian T. Lewis of that lab:

So far Tcl is interpreted directly. Although the current Tcl interpreter is fast enough for most Tcl uses, there are many applications that need greater speed. The two main performance problems in the current Tcl system (Tcl 7.5) are script reparsing and conversions between strings and other data representations. The current interpreter spends as much as 50% of its time in parsing. It reparses the body of a loop, for example, on each iteration. Data conversions also consume a great deal of time. It is reported that 92% of the time in incr's command procedure Tcl_incrCmd() was spent converting between strings and integers. To solve these performance problems, a new Tcl compiler and interpreter are being developed at
Sun Microsystems Laboratories. Their goal for the bytecode compiler is to improve the speed of compute-intensive Tcl scripts by a factor of 10. The compiler translates Tcl scripts at program runtime, or on-the-fly, into a sequence of bytecode instructions that are then interpreted. The compiler eliminates most runtime script parsing. It also makes many decisions at compile time that are now made only at runtime. It can tell, for example, whether a variable name refers to a scalar or an array element. It also compiles away many type conversions. As an example, it can recognize whether the argument string specifying the increment amount in an incr command represents a constant integer. The bytecode interpreter uses dual-ported objects extensively. These objects contain both a string and an internal representation appropriate for some data type. For example, a Tcl list is now represented as an object that holds the list's string representation as well as an array of pointers to the objects for each list element. Dual-ported objects avoid most runtime type conversions. They also improve the speed of many operations since an appropriate representation is available. The compiler itself uses dual-ported objects to cache the bytecode resulting from the compilation of each script.

c) Both are Web Programming Languages

As mentioned in a) of this section, Java is an Internet-oriented language. Tcl/Tk is also closely related to Web programming. Sun has recently released a Tcl/Tk plug-in for Netscape Navigator. It allows Web pages to contain Tcl/Tk scripts and to display interfaces in the browser window. The plug-in uses the Safe-Tcl mechanism to ensure that even untrusted scripts can be executed safely.

So what is the difference between Java and Tcl/Tk? Tcl is a high-level scripting language. It is good for creating small and medium-sized applications quickly and for gluing existing things together. It has a simple syntax and almost no structure, which makes it good for scripting.
However, at least so far, Tcl is a directly interpreted language, so it may not perform well for very large tasks. Think of Tcl as something like a UNIX shell, except that it is embeddable and portable and can be used for Internet scripting, including CGI implementation. Java, on the other hand, is a system programming language like C or C++. It is much more structured than Tcl, which makes large, complex applications easier to build in Java than in Tcl. Java is also compiled, which results in great efficiency. Java also supports multi-threading, whereas Tcl does not. Think of Java as something like C++, except simpler and more powerful and with facilities for sending Java programs around the Internet as executable content. [20]

Since both Java and Tcl belong to Sun and both are web programming languages, people are considering a marriage of Java and Tk, with Tk as the GUI-building part of Java. Sun is said to have an early version of a Tcl-to-Java interface.

d) Another Mobile Language: Omniware
Mobile languages have become popular recently. The term denotes languages whose programs can be easily ported and run on many nodes of a network. Since any programming language can be used for web programming without being portable, it is more accurate to call Java, Perl and Tcl mobile languages. Another notable mobile language from this report's point of view is Omniware. Omniware is an interpreted language with a two-phase compiler-interpreter process. It defines a virtual machine called OmniVM. The advantages of Omniware are:

1. OmniVM is a register-based virtual machine, and thus it is closer to most real machines. Translation from OmniVM to a real machine is therefore a shorter and lighter process than translation from the Java Virtual Machine, which is stack-based.
2. OmniVM was designed with all languages that have C/C++-like constructs in mind, so it can be the compiler target of C/C++ and many other languages. In this sense, Omniware serves as a somewhat generic virtual machine.

For security, Omniware uses a technique called Software-based Fault Isolation, which adds instructions that check at runtime that addresses lie within the legal address space. As with many other mobile languages, however, access to the host's system resources remains a big problem in Omniware. [12]

6. Why Interpreted Languages?

Today's hottest languages, such as Java, Tcl/Tk and Perl, are all interpreted languages. Why? An important reason, we think, is that they are all closely related to the Internet. For an Internet-oriented language, the most important feature is portability. Such a language also has to operate in a distributed environment, which means that security is of paramount importance. Interpreted languages have an advantage in supporting both of these features.

Portability

A program is portable if the effort required to transport it is much less than the effort required for its initial implementation, and if its initial qualities remain the same after the transport. The portability of a program can be evaluated by measuring the transport effort.
For example, if I is the work involved in the initial implementation and T is the work involved in transport, then the program's portability can be evaluated as (I-T)/I. A program with I = 100 and T = 5 units of work, for instance, is 95 percent portable. A program would be 100 percent portable only if T = 0, that is, if no transport effort were involved, which is impossible in practice. [2] A mechanism that supports software portability is therefore one that reduces the effort of software transport. Some significant mechanisms of this kind are:
1. A compiler generates intermediate code that is independent of the target computer. If the compiler is self-compiling, the compiler itself is also portable. This is the typical compiler-interpreter mechanism, with Pascal-P, Snobol4 and Java as typical examples.
2. A compiler can be divided into two parts: a front end that depends on the source language, and a back end that depends on the object language, which in turn depends on the target machine. The interface between these two parts, if well designed, can be independent of both languages. Ongoing work on generic virtual machines focuses on this mechanism. The mobile language Omniware mentioned above is a good attempt in this category.
3. The platform-dependent parts of the software are isolated, and configuration tools such as imake are then used to enable the code to be compiled and installed on different platforms.
The first two mechanisms are typically realized by interpreted languages with a virtual machine. The virtual machines of interpreted languages are the platforms for architecture-neutral and portable languages. Java and Omniware are typical examples.

Security

Part of Java's security comes from its language design policy: simplicity. Java excludes many dangerous features of C++, such as pointers, with which a programmer could directly manipulate memory by accident. At the same time, Java provides automatic garbage collection. The more important security mechanisms, however, come from its compiler-interpreter nature mentioned above. The compiler-interpreter mechanism with bytecode provides several levels of defense for Java. The first level is provided by extensive compile-time checking. A trustworthy compiler ensures that Java source code does not violate the safety rules. The second level is provided by the bytecode verifier, which operates at run time. Java does not trust any applet arriving from the Internet, so the bytecode verifier has to ensure that the code passed to the Java interpreter is in a fit state to be executed and can run without breaking the interpreter. The third level of defense is provided by the class loader. The class loader dynamically partitions the classes from each network source into their own private namespace and prevents classes in one namespace from polluting another namespace. [13]

While Java's security is mainly provided by its compiler-interpreter mechanism, Tcl's security is provided by Safe-Tcl. Safe-Tcl is a mechanism that initializes a Tcl interpreter with a safe subset of Tcl commands so that Tcl scripts cannot harm their hosting machine or application. There are also mechanisms to grant privileges to a safe interpreter so that a script can do non-trivial things. The basic approach to ensuring safety is thus to first remove the file command completely from safe interpreters and then replace it with command aliases.
The Netscape Tcl plug-in supports Tcl/Tk applets, also called Tclets. The Tcl plug-in implements the standard Safe-Tcl subset, plus a limited version of Tk.
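The Safe-Tcl approach described above, in which an unsafe command is removed from an untrusted interpreter and a restricted version is re-exposed through a command implemented by a trusted interpreter, can be sketched as follows. This is a Python analogy under our own assumptions, not Safe-Tcl's actual interface: the Interp class, its call and alias methods, the command names and the in-memory FILES table are all hypothetical and exist only for illustration.

```python
class Interp:
    """A toy interpreter: just a table of named commands."""

    def __init__(self, commands=None):
        self.commands = dict(commands or {})

    def call(self, name, *args):
        if name not in self.commands:
            raise PermissionError(f"command not available: {name}")
        return self.commands[name](*args)

    def alias(self, name, target, target_name):
        """Expose target's command under `name` in this interpreter."""
        self.commands[name] = lambda *args: target.call(target_name, *args)


# Stand-in for the host's files (no real file system access is performed).
FILES = {"/public/motd.txt": "hello", "/etc/passwd": "secret"}


def read_file(path):
    return FILES[path]


def checked_read(path):
    # The trusted side validates the request before doing the real work.
    if not path.startswith("/public/"):
        raise PermissionError(f"access denied: {path}")
    return read_file(path)


trusted = Interp({"file_read": read_file, "checked_read": checked_read})

# The safe interpreter starts with harmless commands only; file_read is absent.
safe = Interp({"upper": str.upper})
# Grant a limited privilege: 'read' in the safe interpreter is an alias
# that is really implemented by the trusted interpreter.
safe.alias("read", trusted, "checked_read")

allowed = safe.call("read", "/public/motd.txt")

try:
    safe.call("read", "/etc/passwd")
    denied = False
except PermissionError:
    denied = True

try:
    safe.call("file_read", "/etc/passwd")
    hidden = False
except PermissionError:
    hidden = True

print(allowed, denied, hidden)
```

The untrusted side never holds a reference to the unrestricted read_file command; it can only reach the host through the alias, and the check runs in the trusted interpreter where the script cannot tamper with it.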
Command aliases are the primary mechanism provided by Safe-Tcl to grant privileges. An alias is a command in the untrusted interpreter that is really implemented by a different, fully trusted interpreter. This is much like the user mode and kernel mode of multiuser operating systems. In Safe-Tcl, an untrusted script is isolated in its own interpreter context and given a few extra commands that are carefully implemented by another Tcl interpreter to ensure safety.

Reusability

Scripting languages, as interpreted languages, typically provide glue for commands. A shared, universal scripting language like Tcl serves as a powerful and flexible glue for assembling reusable components. Tcl is a reusable command language because almost everything in the language is a command, from flow control (for, if, case, continue, etc.) to variables and procedures (global, proc, return, set). These built-in commands provide programmability and extensibility for free. Users of Tcl are free to develop application-specific commands, much as UNIX commands extend the UNIX shell, and these commands look the same as Tcl's built-in commands. The most important design goal of Tcl is reusability, so its approach is component-based. Rather than being built as a self-contained monolith with hundreds of thousands of lines of code, a Tcl application is a combination of many smaller reusable components. Each component is small enough to be implemented by a small group, and interesting applications can be created by assembling existing components. [17]

Rapid Development

Reusability provides a path to rapid development of software applications. The scripting, or interpreted, nature of interpreted languages is obviously good for rapid development. Instead of going through heavyweight compile, link, crash and debug cycles, programs in interpreted languages can be run directly, and it is easier to trace what is happening while they are interpreted.
Performance

Currently, Java runs about 30 times slower than an equivalent C program. This does not seem very bad considering the advantages Java has. In fact, performance has always been a consideration for Java's designers. They believe they have achieved superior performance by adopting a scheme in which the interpreter can run at full speed without needing to check the runtime environment. In addition, the automatic garbage collector runs as a low-priority background thread, ensuring a high probability that memory is available when required, which leads to better performance. Moreover, Sun has been improving performance by providing just-in-time compilation of the bytecode into native code. Applications requiring large amounts of computing power can be designed so that compute-intensive sections are rewritten in native machine code as required and interfaced with the Java platform. In general, Java's interactive applications respond quickly even though they are interpreted. But
the effort to improve performance will never come to an end. The current performance of Java still cannot meet the needs of a certain category of applications. Typically, an interpreted language has relatively low performance because of the overhead of fetching and decoding each virtual command or virtual instruction before performing the work specified by that command. Most virtual machines at present are stack-based, while most real machines are register-based. Interpreting the intermediate code to emulate the virtual machine on a real machine is thus a heavier process than it would be if the virtual machine and the real machine had a similar structure, either both stack-based or both register-based. In Java, interpretation uses token threading: one token-threaded dispatch is performed for each bytecode executed. A dispatch requires about three instructions and five memory references. In addition, each virtual instruction requires several real machine instructions. For example, executing an integer add (IADD) of the JVM on most general-purpose processors (SPARC, 80x86, 680x0, PowerPC, ARM and MIPS) requires at least seven conventional processor instructions when using an interpreter written in C. To improve performance, a just-in-time (JIT) compilation technique has been applied to Java, which translates Java bytecode into instructions for the host processor at runtime. This technique improves Java's performance severalfold. Because native code compilers (or code generators) are usually complex software that costs both memory and execution time, JIT compilation uses less aggressive optimization, which simply translates each bytecode into in-line machine code or keeps the top of the stack in a register. The performance improvement from JIT compilation is therefore limited, and it comes at a cost in memory. Some argue that the most efficient execution vehicle for many Java applications would be a dedicated Java chip that directly executes the bytecode.
Sun is now building a picoJava chip, a microcontroller intended to execute Java bytecodes directly. It is a simple, stack-based processor. Rather than being a pure stack architecture, the machine would have specific hardware features for dealing with bytecode and other hardware features to suit the garbage-collected, object-oriented, multithreaded nature of Java.

Setting aside those rare, newly designed stack-based real machines, consider stack-based virtual machines on register-based real machines. As mentioned above, the execution time of an interpreted program depends on the number of commands interpreted, the fetching and decoding cost of each command, and the time spent actually executing the operation specified by each command. Since the number of commands required to accomplish a given task depends on the level of the language's virtual machine, the performance of an interpreted language depends mainly on the level of the virtual machine defined for that language. A simple virtual machine, like Java's, might require the execution of a large number of commands, but the overhead of each virtual command is small and nearly fixed. In contrast, Perl and Tcl each define complex virtual machines, which result in non-uniform slowdowns relative to C implementations even though their virtual machines can execute a given program in fewer commands. [26]
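The fetch, decode and execute overhead discussed above can be made concrete with a minimal stack-based interpreter loop. The sketch below is in Python and uses a toy instruction set of our own: the opcode names PUSH, IADD, IMUL and HALT and the run function are assumptions made for illustration (IADD merely borrows the name of the JVM instruction) and do not reproduce the JVM's encoding or its token-threaded dispatch.

```python
# Hypothetical opcodes for a toy stack machine (not the JVM instruction set).
PUSH, IADD, IMUL, HALT = range(4)


def run(code):
    """Interpret `code`; return (top of stack, number of dispatches)."""
    stack = []
    pc = 0          # virtual program counter
    dispatches = 0  # how many times the fetch/decode step ran
    while True:
        op = code[pc]          # fetch
        pc += 1
        dispatches += 1
        if op == PUSH:         # decode, then execute
            stack.append(code[pc])  # operand follows the opcode
            pc += 1
        elif op == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack[-1], dispatches
        else:
            raise ValueError(f"unknown opcode: {op}")


# (2 + 3) * 4 as stack code: six virtual instructions for one expression.
program = [PUSH, 2, PUSH, 3, IADD, PUSH, 4, IMUL, HALT]
result, dispatches = run(program)
print(result, dispatches)
```

Even this small expression takes six dispatches, and each one pays the fetch and decode cost before any useful arithmetic is done. A higher-level virtual machine would need fewer dispatches for the same task, but each command would do more, and less uniform, work.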
As mentioned in section 5 b), an on-the-fly bytecode compiler for Tcl is being implemented to improve Tcl's performance. Caching the compiled bytecode will be very helpful in improving the performance of interpreted languages such as Perl and Tcl. [24]

Other Advantages of Interpreted Languages [27]

The type of a variable can change dynamically during execution
Compiling efficient code for dynamic typing, where the type of a variable can change during execution, is hard because the type of a variable is not known at compile time. An interpreter, by contrast, can handle this situation easily and efficiently.

An interpreter can be very good for debugging
The interpreter can access the source program in its original form or in an internal form at any time. It also maintains a symbol table containing variable names and values. Programmers can therefore get diagnostic information in easily understandable forms.

7. Summary

Most interpreted languages have a virtual machine, either explicitly defined, such as Java's, or implicitly defined, such as Tcl's and Perl's. Some simple scripting languages, such as the UNIX shell languages, are interpreted directly and do not have a virtual machine. Virtual machines play a major role in interpreted programming languages. With the assistance of a virtual machine, the compiler-interpreter mechanism provides portability, security and better performance for interpreted programming languages. Scripting languages, as interpreted languages, are good for gluing programming components together. When the set of components is open to extension, as with Tcl's built-in and application-specific commands, the language can provide great reusability. The development process in interpreted languages is relatively lightweight compared with the compile-link-test cycle of a traditional compiled language, so interpreted languages are good for rapid development.

Acknowledgment

We would like to thank Professor Benjamin Zorn for his guidance on the topics in this paper. Without his help we would still be lost in a maze.

References

[1]. Wirth N. From Programming Language Design to Computer Construction, ACM, February 1985, Vol 28, No. 2
[2]. Lecarm O., Cart M. P., Gart M. Software Portability, McGraw-Hill Publishing Company, 1989
[3]. Kamin S. N. Programming Languages: an Interpreter-Based Approach, Addison-Wesley Pub. Co., 1990
[4]. Pemberton S., Daniels M. Pascal Implementation: The P4 Compiler and Interpreter, ISBN: 0-13-653-0311
[5]. Newsgroup: comp.lang.smalltalk
[6]. Byrne S. B. GNU Smalltalk User's Guide, http://www.cs.utah.edu/csinfo/texinfo/mst/mst_toc.html
[7]. Goldberg, Robson, Smalltalk-80: The Language and Its Implementation, Addison Wesley, 1983, ISBN 0-201-11371-6
[8]. Sun Microsystems. The Java Virtual Machine Specification. http://java.sun.com/doc/vmspec/html/vmspecl.html, 1995
[9]. Gosling, J, Java Intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Representation, Jan. 1995
[10]. Sun JavaSoft: Getting Started: The Java Developer's Kit
[11]. Sun JavaSoft: Design Goals of Java 1.2
[12]. Caron J. Java: Status Report and Language Overview, CSCI 5535 Project, Dec. 1995, University of Colorado at Boulder
[13]. Wang W, An Y, Zang L, Security --- How is it implemented in the Java language?, CSCI 5535 Project, Dec. 1995, University of Colorado at Boulder
[14]. Sun JavaSoft: A Look Inside the Java Platform
[15]. Sun JavaSoft: The Java Language Environment, a White Paper
[16]. Abelson, H. and Sussman, G.J. Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1985
[17]. Ousterhout J. K. Tcl and Tk Toolkit, Addison-Wesley, ISBN 0-201-63337-X
[18]. Ousterhout J. K. Tcl: An Embeddable Command Language, USENIX Conference Proceedings, 1990
[19]. Lewis B. T. An On-the-fly Bytecode Compiler for Tcl. http://www.sunlabs.com/people/brian.lewis/
[20]. Ousterhout J. K. What's Happening at Sun Labs. http://www.sunlabs.com/research/tcl/team.html, April 1996
[21]. Welch B. Practical Programming in Tcl and Tk, Prentice-Hall, 1995, ISBN 0-13-182007-9
[22]. Newsgroup: comp.lang.tcl
[23]. Ousterhout J. K. An Introduction To Tcl Scripting, http://www.sunlabs.com/people/john.ousterhout/
[24]. Schwartz R. L. Learning Perl, O'Reilly & Associates, Inc. 1993
[25]. Perl Documentation, http://www.csc.tntech.edu/docs/perl.html
[26]. Romer, T. H., Lee D. et al. The Structure and Performance of Interpreters, ACM, Oct. 1996
[27]. Watson, D. High-level Languages and Their Compilers, Addison-Wesley Publishing Company, 1989
[28]. Gosling on Java, DATAMATION, March 1, 1996