
Tiny Basic

A minimal language for experimenting with

History

Basic was introduced in 1964 at Dartmouth College in the US, on one of the first computer time-sharing systems, which allowed students to log on to a computer and use it interactively. It ran on mainframe computers with teletype terminals attached. It was an interpretive language: as you typed commands in, it stored them, and it executed them when you typed RUN.

IBM 5100

In 1975 IBM introduced the 5100, the first personal computer with a built-in screen and storage. It could be supplied with either Basic or APL, another interpretive language. It was expensive and not widely used.

PET

Launched in 1977, the PET was the first successful mass-market personal computer. It again came with a Basic interpreter, and it was much cheaper because it used an 8-bit microprocessor.

Tiny Basic

Tiny Basic was developed by hobbyists who wanted a small programming language that would fit into 2 kilobytes of ROM, the size of a standard, cheap ROM chip in 1977. It ran on hobby machines like the Altair (top left) and can still be obtained for contemporary hobby machines like the TinyBrick computer (bottom left).
Another Basic dialect, TI-BASIC, also runs on some calculators such as the TI-83 (right), which uses the Z80 microprocessor found in many early personal computers.

Features of the language: line numbers

Here is a very simple Tiny Basic programme:

    10 FOR I := 1 TO 5
    20 PRINT I
    30 NEXT I
    40 END

The language has numbered lines, which should go up in ascending order. In an interpreter the line numbers usually take the place of an editor: typing a line with an existing number replaces that line.
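To make the "line numbers as editor" idea concrete, here is a minimal Python sketch, not taken from any real interpreter. The class name ProgramStore is hypothetical. Lines are kept in a dictionary keyed by line number, so re-entering a number overwrites the old line and the programme is always listed in ascending order. Many interpreters also deleted a line when only its number was typed; that behaviour is included here as an assumption.

```python
class ProgramStore:
    """Hypothetical store of Basic source lines keyed by line number."""

    def __init__(self) -> None:
        self.lines: dict[int, str] = {}

    def enter(self, text: str) -> None:
        """Store a typed line; an existing number is replaced.

        Assumption: typing a bare line number deletes that line.
        """
        number, _, body = text.strip().partition(" ")
        n = int(number)
        if body:
            self.lines[n] = body
        else:
            self.lines.pop(n, None)

    def listing(self) -> list[str]:
        """Return the programme in ascending line-number order."""
        return [f"{n} {self.lines[n]}" for n in sorted(self.lines)]


store = ProgramStore()
# Lines may be typed in any order; line 20 is typed twice and the second wins.
for typed in ["10 FOR I := 1 TO 5", "30 NEXT I", "40 END", "20 PRINT I", "20 PRINT I*I"]:
    store.enter(typed)
print("\n".join(store.listing()))
```

The dictionary plays the role of the editor: insertion order does not matter, and sorting the keys gives the order in which RUN would execute the lines.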

Control structure

The version you will be working with is very simple; it has only three control structures:
- FOR loops
- GOTO statements
- IF statements

For loops

A FOR loop has the structure:

    10 FOR I := 1 TO 100
    20 LET A := I+A
    30 NEXT I

The lines between the FOR and the NEXT lines are executed 100 times in this case. FOR loops can be nested, provided that each loop uses a different iteration variable.
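One way an interpreter can implement FOR and NEXT is sketched below in Python. The statements are given as pre-parsed tuples, a hypothetical representation chosen to skip parsing; it is not the course compiler's format. FOR records the loop limit and the position of the first body line, and NEXT increments the variable and jumps back while it is within the limit. Keying the active loops by their iteration variable is one reason nested loops need different variables: a second FOR on the same variable would overwrite the first loop's entry.

```python
from collections import defaultdict

# Pre-parsed form of the loop above (hypothetical representation).
program = {
    10: ("FOR", "I", 1, 100),
    20: ("LET", "A", lambda v: v["I"] + v["A"]),
    30: ("NEXT", "I"),
}


def run(program):
    order = sorted(program)
    variables = defaultdict(int)  # undeclared variables start at 0
    active_loops = {}             # iteration variable -> (limit, index of first body line)
    let_count = 0                 # how many times a LET executed
    pc = 0
    while pc < len(order):
        stmt = program[order[pc]]
        op = stmt[0]
        if op == "FOR":
            _, var, start, limit = stmt
            variables[var] = start
            active_loops[var] = (limit, pc + 1)
        elif op == "LET":
            _, var, expr = stmt
            variables[var] = expr(variables)
            let_count += 1
        elif op == "NEXT":
            var = stmt[1]
            limit, body_start = active_loops[var]
            variables[var] += 1
            if variables[var] <= limit:
                pc = body_start   # jump back to the line after FOR
                continue
            del active_loops[var]  # loop finished; fall through
        elif op == "END":
            break
        pc += 1
    return dict(variables), let_count


variables, let_count = run(program)
print(variables["A"], let_count)
```

In this sketch the body always runs at least once, and the loop variable is left one past the limit when the loop ends.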

Jumps

An unconditional jump to another line is done with the GOTO statement; a conditional jump is done with an IF statement, which transfers control to another line when its condition is true. This programme prints A if A>B, and B otherwise:

    10 IF A>B THEN 30
    20 GOTO 40
    30 PRINT A
    35 GOTO 50
    40 PRINT B
    50 END
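Because GOTO and IF name a line number rather than a position, the interpreter needs a way to find the line with that number. A minimal Python sketch, assuming the programme is already parsed into tuples (a hypothetical representation), builds a table from line number to position and runs the example above:

```python
def run(program, variables):
    order = sorted(program)
    # Map each line number to its position so jumps can find their target.
    position = {number: i for i, number in enumerate(order)}
    output = []
    pc = 0
    while pc < len(order):
        stmt = program[order[pc]]
        op = stmt[0]
        if op == "IF":
            _, condition, target = stmt
            if condition(variables):
                pc = position[target]  # a missing target raises KeyError
                continue
        elif op == "GOTO":
            pc = position[stmt[1]]
            continue
        elif op == "PRINT":
            output.append(variables[stmt[1]])
        elif op == "END":
            break
        pc += 1
    return output


# The example programme, pre-parsed (hypothetical representation).
program = {
    10: ("IF", lambda v: v["A"] > v["B"], 30),
    20: ("GOTO", 40),
    30: ("PRINT", "A"),
    35: ("GOTO", 50),
    40: ("PRINT", "B"),
    50: ("END",),
}
print(run(program, {"A": 9, "B": 2}))
print(run(program, {"A": 3, "B": 7}))
```

A real interpreter would report an undefined jump target as an error rather than letting the lookup fail.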

Input output

There are three input/output commands in the version of Basic you will be working with, shown below. They allow reading and writing of integers.

    10 READ I
    20 PRINT 2*I
    30 PRINTLN
    40 END

LET statements

The LET keyword allows you to assign values to variables:

    320 LET J:= I*2+1

There is no need to declare variables. In the original Basic, variables were either a single letter or a letter followed by a digit, so P, S, N1, Q9 and T would all be valid. In many Tiny Basic systems only single letters are used.

Statements not currently implemented

REM allows comments, GOSUB and RETURN allow for subroutines, and DIM allows for array variables.

    10 DIM A(10)
    20 GOSUB 100
    30 PRINT S
    40 END
    100 REM calculate sum of A
    105 FOR I:= 1 TO 10
    110 LET S:= S+A(I)
    120 NEXT I
    130 RETURN

Your tasks

You will be working with a Basic compiler that I have written, and you will have to modify it to extend the language slightly:
1. Allow variables to be strings of letters and digits starting with a letter.
2. Add the REM statement to the language to allow comments.
3. Add the DIM statement and support for array indexing to the language.
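The difference between the original naming rule and task 1 can be written as two regular expressions. This Python sketch is only an illustration, not code from the course compiler; the keyword set is an assumption built from the statements mentioned in these notes, and names are assumed to be upper case and separated from keywords by spaces. scan_name reads the longest run of letters and digits, which is what a tokenizer must do once names can be longer than one character.

```python
import re

ORIGINAL_NAME = re.compile(r"[A-Z][0-9]?")     # original rule: P, N1, Q9
EXTENDED_NAME = re.compile(r"[A-Z][A-Z0-9]*")  # task 1: TOTAL, X2Y

# Assumed keyword set: the statements mentioned in these notes.
KEYWORDS = {"LET", "PRINT", "PRINTLN", "READ", "IF", "THEN", "GOTO",
            "FOR", "TO", "NEXT", "END", "REM", "DIM", "GOSUB", "RETURN"}


def is_original_name(s: str) -> bool:
    return ORIGINAL_NAME.fullmatch(s) is not None


def is_extended_name(s: str) -> bool:
    # A keyword such as PRINT must not be accepted as a variable name.
    return EXTENDED_NAME.fullmatch(s) is not None and s not in KEYWORDS


def scan_name(text: str, pos: int):
    """Return (name, new_pos) for the longest identifier at pos, or None."""
    m = EXTENDED_NAME.match(text, pos)
    if m is None or m.group() in KEYWORDS:
        return None
    return m.group(), m.end()


print(scan_name("LET TOTAL1:=5", 4))
```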

Interpret or compile

The early versions of Basic were all interpreters; that is to say, the statements were translated into equivalent machine operations every time they were executed.

Advantages of interpreters:
- Allow interactive use
- Can be implemented in very little code

Advantage of compilers:
- Allow much faster execution once the programme is compiled

Phases of translation

Both interpreters and compilers share the first two tasks:

1. Lexical analysis: recognising keywords, variables etc.
2. Syntax analysis: checking the grammar

They differ in the way they cause execution to take place. In an interpreter, a computed jump goes to a routine that executes a particular type of statement. In a compiler, a sequence of machine instructions is output.
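The contrast can be seen on a single expression. In this Python sketch an expression is a nested tuple (a hypothetical representation, not the course compiler's): interpret computes the value directly each time it is called, while compile_expr emits x86-style instruction text once, in the register conventions used later in these notes. The labels var_A and var_B are hypothetical names for the variables' memory locations, and only + and - are handled.

```python
def interpret(node, variables):
    """Evaluate the expression tree directly (interpreter style)."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    op, left, right = node
    a, b = interpret(left, variables), interpret(right, variables)
    return a + b if op == "+" else a - b


def compile_expr(node, out):
    """Append instructions that leave the expression's value in ax (compiler style)."""
    kind = node[0]
    if kind == "num":
        out.append(f"mov ax,{node[1]}")
    elif kind == "var":
        out.append(f"mov ax,[var_{node[1]}]")
    else:
        op, left, right = node
        compile_expr(left, out)
        out.append("push ax")            # save the left value
        compile_expr(right, out)
        out.append("mov di,ax")          # right value into di
        out.append("pop ax")             # left value back into ax
        out.append("add ax,di" if op == "+" else "sub ax,di")
    return out


# LET A := B + 1
expr = ("+", ("var", "B"), ("num", 1))
print(interpret(expr, {"B": 41}))
code = compile_expr(expr, []) + ["mov [var_A],ax"]
print("\n".join(code))
```

The interpreter walks the tree on every execution; the compiler walks it once and the emitted instructions then run without any checking.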

Outline data flow

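10 LET A:=12

[Figure: the tokenizer converts the source line 10 LET A:=12 into the byte sequence below, which the dispatcher then executes.]

    len   line   LET   A    :=   number  12
    06    000a   82    41   91   92      000C

Here 82 is the code for LET, 41 is the ASCII code for A, 91 is the code for := and 92 is the code for a following number.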

Note that token codes start at hex 80 (decimal 128) and are thus outside the ASCII range.

Why tokenize

- It performs data compression, so the tokenized programme takes up less space in memory; this used to be very important.
- It allows faster interpretation, since what is being interpreted is now a byte code that can be handled by a simple mechanism.

Note that in Basic the semantics of a statement are always defined by its first token.

Word to token translation table

    Word    GOTO   IF    LET   NEXT   PRINT   etc.
    Token   80H    81H   82H   83H    84H     ...

Note that token codes start at 80H (decimal 128) and are thus outside the ASCII range.

Tokenizing Keyboard
On small computers and calculators the tokenizer was sometimes integrated into the keyboard scanning software, so that it returned a token directly for a single keystroke; for example, SHIFT P generated the PRINT token.

How a Basic interpreter worked


We will look at how an interpreter would have worked on the original IBM PC with the following hardware registers

A few basic reminders about assembler

Assembler works on machine registers. On Intel assemblers the mov instruction moves data:

    mov ax,[varstart]         ; load ax with the word at label varstart
    mov ax,[si]               ; load ax with the word pointed to by the si register
    mov ax,[esi*2+varstart]   ; load ax with the word at address 2*esi+varstart

Case is not significant in opcodes or register names. Scaled indexing such as esi*2 needs 32-bit registers, available from the 386 onwards; the 8088 in the original IBM PC could not scale an index register.

Arithmetic

    add ax,varstart   ; add the address of label varstart to ax
    sub ax,[si]       ; ax = ax - memory[si]
    add ax,si         ; add the si register to ax

How the dispatcher works

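[Figure: the SI register points into the tokenized line 10 LET A:=12 (len 06, line 000a, then 82 = code for LET, 41 = A, 91 = code for :=, 92 = code for number, 000C = 12). The LET token selects the letaddr entry of dispatchtab (gotoaddr, ifaddr, letaddr), which leads to the code to handle LET.]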



Reserve a register, for example SI, as the interpreter's program counter (PC), and assume it is pointing at the first token of a line.
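    nextstatement:                         ; dispatch routine
        movzx eax, byte [si]               ; get the token, zero-extended
        inc si                             ; move the interpreter pc on
        jmp word [eax*2+dispatchtab-2*80h] ; jump to the routine for this token
                                           ; we subtract 2*80h from the address
                                           ; since token codes start at 80h
                                           ; (movzx and scaled indexing need a 386 or later)
    dispatchtab:
        dw gotoaddr                        ; token 80h
        dw ifaddr                          ; token 81h
        dw letaddr                         ; token 82h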

This shows the typical feature of a fast interpreter: a short sequence of assembly code that dispatches rapidly to interpretive routines using byte codes. Only three instructions are needed for the dispatch.
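The same table-driven dispatch can be mirrored in Python as a rough sketch: a list of handler functions indexed by the token minus 80H, just as the assembler subtracts 2*80h from the table address. The handler names follow the assembler labels, but here they are hypothetical stubs that only record which routine ran; real handlers would go on to parse the statement's operands.

```python
BASE = 0x80  # first token code, like the 80h subtracted in the assembler


def gotoaddr(pc, trace):
    trace.append("GOTO")
    return pc


def ifaddr(pc, trace):
    trace.append("IF")
    return pc


def letaddr(pc, trace):
    trace.append("LET")
    return pc


# Entry i handles token BASE + i, in the same order as dispatchtab.
DISPATCH_TAB = [gotoaddr, ifaddr, letaddr]


def next_statement(code: bytes, pc: int, trace: list) -> int:
    token = code[pc]                               # get the token
    pc += 1                                        # move pc on
    return DISPATCH_TAB[token - BASE](pc, trace)   # jump to routine


trace = []
new_pc = next_statement(bytes([0x82, 0x41]), 0, trace)
print(trace, new_pc)
```

A dictionary keyed by token would be more idiomatic Python, but the list with an offset keeps the correspondence with the three-instruction assembler version.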

An interpreter routine for LET


    letaddr:
        call checkletter     ; checks it is a letter; address of var in ax
        push ax              ; save the variable's address
        call checkcoleq      ; look for a :=
        call expression      ; evaluate expression; result in ax
        pop di               ; recover the address
        mov [di],ax          ; do the assignment  <- REAL work
        jmp advance          ; move on to the next line

Only the instruction marked REAL work does actual computation; everything else is parsing and checking. Note that the interpreter is made up of a sequence of calls to routines that do subsidiary matching tasks to recognise <letter> := <expression>.

Checking for letters

    checkletter:
        movzx ax, byte [si]  ; get next char into ax register
        sub al,'A'           ; subtract letter A
        jl notletter         ; if negative, it was not a letter
        cmp al,26            ; compare with 26
        jge notletter        ; if al >= 26, not a letter
                             ; ax now in range 0..25
        inc si               ; move past the letter
        add ax,ax            ; map to range 0..50, one 2-byte word per variable
        add ax,varstart      ; add the start address of the variables in memory
        ret
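The arithmetic in checkletter can be checked with a short Python sketch. VARSTART is a hypothetical address for the variable area; each letter A to Z gets its own 2-byte word, so the variable's address is VARSTART + 2*(letter - 'A'). Anything outside A to Z, including token codes of 80H and above, is rejected.

```python
from typing import Optional, Tuple

VARSTART = 0x1000  # hypothetical start address of the variable area


def checkletter(code: bytes, si: int) -> Optional[Tuple[int, int]]:
    """Return (variable address, new si), or None if code[si] is not A..Z."""
    offset = code[si] - ord("A")      # subtract letter A
    if offset < 0 or offset >= 26:    # not a letter
        return None
    return VARSTART + 2 * offset, si + 1  # one 2-byte word per variable


print(checkletter(b"C", 0))
```

Starting the test with a strict less-than is what lets A itself, which maps to offset 0, be accepted.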

Expressions

Suppose we define an expression to be one of:
1. An identifier: A, B etc.
2. A number: 1, 14 etc.
3. An expression followed by an operator followed by another expression: A+1, B-C etc.
4. An expression in brackets: (A+9)

The interpreter routine for expressions must recognise each of these cases.

Expression code
    expression:
        cmp byte [si],'('       ; check for (
        jne nobracket
        inc si                  ; found it, so move past
        call expression         ; must be an expression
        cmp byte [si],')'       ; check we have )
        jne error               ; otherwise it is an error
        inc si                  ; move past
        jmp checkop             ; go look for an operator
    nobracket:
        cmp byte [si],numprefix ; check for number prefix
        jne mustbeletter        ; otherwise look for a letter
        mov ax,[si+1]           ; the 16-bit number follows the prefix
        add si,3                ; move pointer past prefix and number
        jmp checkop             ; go look for an operator

Doing actual arithmetic

At this point we have the expression value so far in the ax register. We will only look for + and - here; you can imagine the other operations.

Only the instruction marked REAL work does actual arithmetic; the rest is parsing and checking.

    checkop:
        cmp byte [si],'+'
        jne tryminus
        inc si                ; move past
        push ax               ; save value so far
        call expression       ; look for another expression
        pop di                ; get back first value
        add ax,di             ; add to the second  <- REAL work
        ret                   ; with result in ax
    tryminus:
        cmp byte [si],'-'
        ; ... and so on for the other operators
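The expression and checkop routines can be mirrored in Python over the tokenized bytes, as a sketch under these assumptions: the number prefix is 92H (from the data-flow example) followed by a 16-bit little-endian value, brackets and operators are stored as ASCII, and single-letter variables are looked up in a dictionary instead of through checkletter's address arithmetic. BasicSyntaxError and the num helper are hypothetical names.

```python
NUMPREFIX = 0x92  # number prefix code from the data-flow example


class BasicSyntaxError(Exception):
    pass


def num(n: int) -> bytes:
    """Encode a number the way the tokenizer is assumed to store it."""
    return bytes([NUMPREFIX]) + n.to_bytes(2, "little")


def expression(code: bytes, si: int, variables: dict):
    """Return (value, new si), following the structure of the assembler routine."""
    if si >= len(code):
        raise BasicSyntaxError("expression expected")
    if code[si] == ord("("):
        value, si = expression(code, si + 1, variables)  # must be an expression
        if si >= len(code) or code[si] != ord(")"):
            raise BasicSyntaxError("missing )")
        si += 1
    elif code[si] == NUMPREFIX:
        value = int.from_bytes(code[si + 1:si + 3], "little")
        si += 3                                           # move past prefix and number
    elif ord("A") <= code[si] <= ord("Z"):
        value = variables[chr(code[si])]
        si += 1
    else:
        raise BasicSyntaxError(f"unexpected byte {code[si]:#04x}")
    return checkop(code, si, value, variables)


def checkop(code: bytes, si: int, value: int, variables: dict):
    if si < len(code) and code[si] in (ord("+"), ord("-")):
        op = code[si]
        right, si = expression(code, si + 1, variables)  # look for another expression
        value = value + right if op == ord("+") else value - right
    return value, si


# (A+1)-B with A=10, B=3
code = b"(A+" + num(1) + b")-B"
print(expression(code, 0, {"A": 10, "B": 3}))
```

Because checkop calls expression for its right operand, in both this sketch and the assembler version, operators group to the right: 9-4-2 is evaluated as 9-(4-2) = 7 rather than 3. A production parser would loop over operators to get left-to-right evaluation.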

Efficiency

I have obviously only given you part of an interpreter here, but it is enough to show several things:
1. The tight, hand-coded assembler style that such interpreters typically used allowed a very small interpreter.
2. The code is structured by the syntax of the Basic.
3. You are lucky if one instruction in 10 or 20 does real computational work, rather than parsing and checking.

Motivation for compiling

The major motivation is greater speed. Against this, a compiler is much more complex, both in its own size and in the number of tools needed to build it. You also get a slower debug cycle for programmes: edit, compile, run instead of just edit, run.