
Hints on Working on a Team

Until this point in your academic career, you have worked primarily independently, and on projects of very limited scope. Once you are employed as a programmer, you will rarely work independently on a project again. A skilled programmer can turn out only about 10 lines of well-designed, documented, and debugged code per day on average. Most systems programs and larger applications require many thousands to hundreds of thousands of lines of code, so they are clearly beyond the scope of a single hot-shot programmer; the time to market would be too great. Hence working with a team to produce a major software system is an essential part of being a computer scientist.

The stereotype of a computer programmer as a loner who communes with his/her machine to avoid people could not be further from the truth. Professional programmers spend more time in design meetings, in code walk-throughs, and communicating with other programmers, users, system maintainers, marketers, etc. than in front of a monitor. That should be your experience in this course as well.

One problem with working as part of a team on very large software systems is that the program is too large for any one programmer to understand in its entirety. A software system such as an operating system has more components than a Boeing 747, and it is clear to us that no one person understands each and every component of a 747, much less their interactions. Hence adherence to good software engineering practices is essential if the results of a large group programming effort are ever going to work together, be suited for debugging, be maintainable, be modifiable, or meet the original requirements.

Working with a team can be anywhere from fun to awful, depending on your attitude and the attitudes of your teammates. Since you may not know the work habits or attitudes of your new team members, how can you ensure a successful project and fairness in grading?

Everyone needs to be involved in each aspect of the project (design, documentation, test planning, coding, and testing). In order to work on a team you will have to be considerate of your teammates. They are all high school graduates and college juniors or above. In spite of your first impression, they are capable, intelligent people, and deserve respect. Most team problems occur because a member of the team currently has too many commitments in life. This overcommitment may be due to class load, work issues, family issues, etc. It does not necessarily mean that they are lazy or stupid. However, if you feel that a team member is not attempting to contribute to the team, just let me know. We can have some friendly discussions and many times resolve the problem.

You should all agree on a common language and a common hardware platform. Unless you are all extremely skilled (or masochistic), you should not use multiple platforms.

Planning is the most important thing you can do. One hour spent in a preliminary team meeting saves many individual hours of redundant and possibly incompatible coding. Many students seem to think that time spent planning and designing away from the keyboard is wasted effort. Such effort will not be wasted in this class.

Based on handout prepared by Al Stutz

I feel that my teammates are not doing their fair share.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems, the more flexibility we have in making adjustments.

My teammate is a coding whiz-kid and has decided to simply do it all by herself or himself.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems, the more flexibility we have in making adjustments. The whiz-kid who prevents others from working on the lab will have his or her lab grade reduced.

Two of my teammates are long-time buddies and they do everything together (software-wise) and leave me out.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems, the more flexibility we have in making an adjustment.

What is differential grading? Why should we avoid it?
If the graders and I determine that a fair share of the work was not done by all team members, then different grades will be assigned to each team member. If one team member does it all, he/she may get a lower grade than the rest of the team. But no one will be happy with the grade. The team members and the instructor will have a meeting to discuss the problem and hopefully correct it. However, a differential grade may still be assigned.

Can I still pass the class without doing anything on the labs?
Absolutely not.

General
Problems are hardly ever fully defined. Welcome to the real world! You get a set of end user requirements (the somewhat misleading term "specifications" is often used here); then you need to study, examine, and sketch out issues and concerns. You will need to ask leading questions. While I do not intentionally leave information out, end user requirements hardly ever match the level of detail needed by the programmer.

You and your teammates must agree on some set of standard coding practices:
- Will variables be passed as parameters or will you use global variables?
- A standard format for variable names.
- Variable names that represent a meaning. Names such as a, x, z, and n are not as clear as number of cases, location counter, etc.
- You should agree to a maximum module size. If a module needs to be larger than that, break it up. My rule is two screens' worth, including comments.

You will need to agree how to share files and how to know when a module should be added or a new update added to the lab. You might designate one team member as having sole responsibility to update the program files. If everyone makes changes then you will have a real mess! Another alternative is to make use of the Unix cvs utility.

You should learn to use the make utility. You should consider writing several Unix scripts that will change the permissions of files for easy compiling. You might also want to write a script to facilitate compilation via the Unix make utility. Use common modules to avoid duplication.

What if we have a question? Order of operations for maximum success!
1. email the grader
2. visit the graders during office hours
3. call the grader
4. email the instructor
5. use the instructor's office hours
6. call the instructor
All these options are acceptable and encouraged.

General Things (before you write any code):


1. Establish a regular meeting at a time and place where everyone can meet.
2. Keep minutes (i.e., the official record of the proceedings) of meetings and/or discussions.
3. Publish clear assignments and due dates.
4. Think about testing as you design the system. What are the syntax rules? How will you discover very subtle defects?

Design:
1. Lay out a top-down design.
2. Look for routine modules you will need repeatedly, such as: binary to hex, binary to decimal, decimal to hex, decimal to binary (could these really be one routine? should they be?), building a table (what's the relationship, if any, between a table and a partial map?), searching a table, . . . .
3. Write a dummy module for each routine needed. The following sample dummy module implementation is written in pseudo-code.

Check_for_overflow:
begin
    begin comment
        Procedure Name: Check_for_overflow
        Description: This routine determines whether the results of the
            operation would have resulted in an overflow. In this system
            the data length is 24 bits so the results range from
            -8,388,608 to +8,388,607
        Calling Sequence: (temp_result: Integer, overflow: Boolean)
        Input parameters: temp_result
        Output parameters: overflow
        Error Conditions Tested: overflow
        Error Messages Generated: message ###
        Original Author: Al Stutz
        Procedure Creation Date: February 22, 1995
        Modification Log:
            Who     when      why
            Al      3/11/93   Forgot to send message to screen
            Wayne   1/3/02    Corrected mismatch between data length and
                              results range in description; introduced quote
                              marks (") into pseudo-code; changed indentations
                              in the comment; inserted colon (:) into
                              pseudo-code; introduced types into the calling
                              sequence; changed parameter name from flag to
                              overflow; changed ambiguous "Initialize overflow"
                              to more precise "Set overflow to false" in the
                              pseudocode;
    end comment
    Set overflow to false
    Write "In routine: Check_for_overflow"
    Write temp_result, overflow
end;
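Once the dummy body is expanded, the range check described in the comment block might look roughly like the following Python sketch. The function name, the constants, and the choice to return a Boolean instead of setting an output parameter are assumptions made for illustration; they are not part of the original handout.

```python
# Hypothetical sketch of the expanded routine: a 24-bit signed range check.
DATA_LENGTH_BITS = 24
MIN_VALUE = -(2 ** (DATA_LENGTH_BITS - 1))   # -8,388,608
MAX_VALUE = 2 ** (DATA_LENGTH_BITS - 1) - 1  # +8,388,607


def check_for_overflow(temp_result: int) -> bool:
    """Return True if temp_result does not fit in a 24-bit signed word."""
    return not (MIN_VALUE <= temp_result <= MAX_VALUE)
```

Keeping the limits as named constants derived from the data length avoids the kind of mismatch between data length and results range that the modification log mentions.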

Documentation:
1. Draft the user guide before any code is written. It is easier to make modifications in this document than in the code.
2. Write down clear assignments to team members.

Designing a Test Plan:


Test plans should be logical. For example, you should test all categories of executions of arithmetic instructions in one test. In another you should test all the shift operations, in another all the branches (jumps), . . . .
1. Write down an overall test plan similar to the above statement.
2. Describe the behaviors each test is trying to be a counterexample for.
3. For each item being tested, you should determine the expected correct outcome by hand before making a run to see if the program's result is incorrect.
4. Never hesitate to use extra output (write) statements in your code. They will help you debug and help us in grading.
5. The grader may provide you with a grading sheet that shows the level of detail that we will check for. Be sure to review this and use it as a guide for your planning. (Hint: We expect everything listed on the grading sheet to be tested, and as many more things as you can think of.)
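One way to record expected outcomes worked out by hand is a small table-driven test, sketched below in Python. The shift_left routine, the 24-bit word size, and the test table are all hypothetical stand-ins; substitute the operations and word size of your own machine.

```python
WORD_MASK = 0xFFFFFF  # assumed 24-bit word; adjust to your machine


def shift_left(word: int, count: int) -> int:
    """Hypothetical routine under test: logical left shift within one word."""
    return (word << count) & WORD_MASK


# Each row: (behavior being probed, input word, shift count, expected result).
# The expected values were computed by hand before any run.
SHIFT_TEST_PLAN = [
    ("shift by one", 0x000001, 1, 0x000002),
    ("high bit is shifted out", 0x800000, 1, 0x000000),
    ("low bits are zero-filled", 0xFFFFFF, 4, 0xFFFFF0),
    ("shift into the high bit", 0x000001, 23, 0x800000),
]


def run_plan(plan):
    """Return a list of (description, expected, actual) for every failure."""
    failures = []
    for description, word, count, expected in plan:
        actual = shift_left(word, count)
        if actual != expected:
            failures.append((description, expected, actual))
    return failures
```

An empty list from run_plan(SHIFT_TEST_PLAN) means no test in this group found a counterexample; a similar table could be kept for the arithmetic and branch groups.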

Writing Code:
Even if you ignore all other advice, you should not do any coding before you complete the above steps. You must know what needs to be done and the limits of your own testing. You will, of course, want to create code for which your own tests can find no counterexamples; furthermore, you will want to strive to create code for which no possible test could reveal a defect. That is to say, you're striving to create code without defects. This is why you hope that your test plan will reveal as many defects as possible.
1. As routines are written, they should be tested, even in their dummy form. Once you have all the routines identified (you will miss some), then start expanding them and step testing as you go. Test changes as you make them rather than all at once!
2. Have someone other than the module author also test it.

Testing:
All the final testing must be done on the same version of the code. If in one of the tests an error shows up that you subsequently fix, then you must re-run all previous tests. Your one-line change could very well impact the results of an earlier run. (We have many examples where a small change has been ruinous.) It is very embarrassing for some very basic and simple feature of your program that you know was working yesterday to fail during grading because of some little, last-minute change you made to fix some advanced feature. Re-run those tests!
1. Use realistic test data.
2. Test extreme cases.
3. Test each function.
4. Test each error message.
Hint: The grader will often assume that you have the basics working but try to catch you on a fine point. Your program had better not crash if the grader's test script references a memory address out of range, or tries to feed a gif file to your assembler. Your program should gracefully catch the error and print an informative error message.
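The hint about out-of-range memory addresses can be illustrated with a short Python sketch. The MachineError class, the fetch_word and report helpers, and the 256-word memory are hypothetical names and sizes chosen for this example only.

```python
MEMORY_SIZE = 256  # assumed size, for illustration only


class MachineError(Exception):
    """Raised for conditions the simulator detects and reports itself."""


def fetch_word(memory, address):
    """Return the word at address, or raise MachineError if it is out of range."""
    if not 0 <= address < len(memory):
        raise MachineError(
            f"memory address {address} is out of range 0..{len(memory) - 1}"
        )
    return memory[address]


def report(action):
    """Run action(); return (value, None) on success or (None, message) on error."""
    try:
        return action(), None
    except MachineError as error:
        return None, f"error: {error}"
```

The point of the design is that the bad address produces a message naming the offending value and the legal range, rather than an uncaught exception and a crash.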

THE OHIO STATE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CSE 560 Software Design

One of the major objectives of CSE 560 is the understanding and practice of techniques that enhance the development of quality software. The process of software development often is thought of as being composed of stages. In the first stage the problem that needs to be solved is defined and the requirements that the software must meet are identified. The next stage consists of designing proposed solutions to the problem, evaluating alternatives, selecting one of the alternatives, and detailing the (modular) structure of the chosen solution. Next, the program is constructed in accordance with the design specification (this is the stage, coding, with which we all are familiar). The software also must be validated (e.g., through testing) to measure conformance with the specifications laid out in the early stages, and installed so that the customer can use it.

Throughout the development process documentation is the chief means of communication and management control. In formal development systems, there are specific documents that are required to be produced during these various stages of development. Each stage itself can be further decomposed into tasks, and each task can result in the production of some task document. Even for relatively small projects such as we have in this course, there are several reasons to follow a more formal approach:
1. To prepare you for more complex and more formal development environments in the real world.
2. To become more conscious about the various tasks that one goes through in developing a program. With this kind of awareness, one can more specifically address sources of error in the development process.
3. To facilitate the later use of CASE tools to assist you with your project work. [1]
4. To allow more direct supervision by the instructor.

Each of the labs in this course goes through only a subset of the stages identified above. Requirements are provided by the instructor and installation is, for the most part, skipped because we aren't in a production environment. You each have had experience with construction and testing of your software. Remember that you must prepare your own test data even though your software also may be tested by the grading assistant. This leaves the design stage, which we want to emphasize in this course. In order to assist you with this stage, we have identified a number of subtasks that you should perform, and a suggested order in which to perform them. Each task has some output, which you are required to produce and (with the exception of task 1.1) turn in as part of the writeup. These will constitute the programmer's guide and part of the user's guide portions of the writeup. To get you started, we have provided suggested output for a few of the earlier tasks. Feel free to augment our suggestions with your own. A preliminary version of your design is due the day before your design review meeting, so do not delay in getting started.

The output for the tasks (particularly those in categories 2 and 4) can be produced using CASE tools that you know or using plain old pencil and paper. The diagrams and data descriptions should be shared among members of your group so that consistency is achieved as each of you works on your respective parts of the project.

[1] CASE stands for Computer Aided Software Engineering. Though no CASE tool is stipulated in this course, you are welcome to use one with which you are familiar. Even if you do not use any such tool, the formal approach suggested in this document will prepare you for later use of CASE tools.

Design Task List


1.0 Define Design Framework
    1.1 Review requirements
    1.2 Identify development standards and utilities
    1.3 Identify top level system structure
    1.4 Prepare descriptive narrative
2.0 Define Data Structures
    2.1 Finalize input layout
    2.2 Finalize output layout
    2.3 Finalize major shared data elements
3.0 Identify Major Design Conventions
4.0 Define Modular Structure
    4.1 Identify modules and their interrelationships
    4.2 Prepare detailed module descriptions

[Diagram: task flow among tasks 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 3.0, 4.1, and 4.2]

Figure 1: A Possible Task Flow Diagram

Note: These tasks are not likely to flow smoothly from one to the other, as the diagram above might suggest. Rather, you probably will find it necessary to iterate on certain tasks, especially when performing tasks 4.1 and 4.2.

Task 1.1 Review of Requirements


Objective: To develop a comprehensive understanding of the requirements for the system.
Inputs:
1. Problem statement handout
2. Machine description handout
3. Instruction set description handout
Task Summary:
1. Thoroughly review the handouts to become fully familiar with the system requirements.
2. Identify key components of the system requirements.
Outputs: Identification of descriptions in handouts of:
1. team configuration and responsibilities
2. quality assurance requirements
3. functional requirements
4. input requirements
5. output requirements

Solution: Make sure you have identified each of the requirements and constraints for the problem. Typically the problem description is not organized so that all of the requirements in each category are together, but it is important that you know exactly what you're expected to do in each of these categories. In the programmer's guide, explicitly identify the responsibilities given to each team member. This is the only part of the solution to task 1.1 that you need to formally document.

Task 1.2 Identify Development Standards and Utilities


Objective: To identify the development standards, utilities and other support software to be used in the design of the system.
Inputs:
1. Output of task 1.1
2. Knowledge of current course standards for internal design of system
3. Documentation on available utilities and support software
Task Summary:
1. Based upon handouts and announcements in class, identify any standards that must be observed and applied in the development of the design specification.
2. Identify the available utilities and support software to be used.
Outputs:
1. A tabulation of standards to be observed
2. A tabulation of utilities and support hardware and software to be used

Possible Partial Solution:
Standards
1. Complete each design task identified in the task list table
2. Include output of each task in writeup, organized according to the task list table
3. Documentation is required of all team members
4. Use structure chart notation to show module relationships
5. Use Jackson notation for data structure diagrams
Utilities and Support Hardware and Software
- Sun Workstation
- Unix operating system and X-windows
- Modula-2 Compiler
- Emacs

Task 1.3 Identify Top Level System Structure


Objective: To develop a high level system structure that will effectively implement the given requirements.
Inputs:
1. Output of task 1.1
2. System flow diagramming expertise
Task summary:
1. Allocate system functions to probable high-level programs.
2. Allocate system files to probable high-level programs.
3. Produce a system flow diagram that depicts the high-level structure of the system.
Outputs: System flow diagram showing all the major planned programs, files, and outputs.

Possible Solution:

[Figure: system flow diagram. Processes: Load Object File, Interpret Instruc. Data: object file, initial m/c state (initial state), final state, trace. External entity: User.]

Task 1.4 Prepare Descriptive Narrative


Objective: To enhance the understanding of the system flow diagram with a narrative description that explains and clarifies the intent of its processes.
Input: System flow diagram from task 1.3
Task summary: In narrative form explain the system flow diagram, emphasizing data flow, and giving the intent behind the function being performed.
Output: Narrative explanation of the system flow diagram.

Possible Solution:

Load module: The user will create a file containing records that will be used to initialize the state of the 560 machine. The load module will read this file, providing initial values to various memory locations and various registers of the 560 machine. A display of the machine configuration is generated at the end of the load process.

Interpreter module: Starting with the initial state provided by the load module, the instruction at the address given by the program counter is fetched and decoded, and the operation indicated by the instruction is performed. Each instruction appropriately resets the program counter. This entire cycle is repeated until the final state is reached, when a HALT instruction is encountered or a fatal exception condition is reached. A trace is generated as each instruction is performed. A display of the machine configuration upon normal or abnormal termination of the simulation is generated.
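The fetch, decode, and execute cycle in the interpreter narrative can be sketched as a short Python loop. Everything specific here is an assumption for illustration: the instruction set (LOADI, ADDI, JUMP, HALT), the representation of a memory word as an (opcode, operand) pair, the single accumulator, and the max_steps guard are not taken from the 560 machine description.

```python
def interpret(memory, program_counter=0, max_steps=1000):
    """Run a toy program; return (accumulator, trace).

    memory is a list of (opcode, operand) pairs. The opcodes are hypothetical
    and stand in for whatever the real instruction set defines.
    """
    accumulator = 0
    trace = []
    for _ in range(max_steps):  # guard against runaway programs (an assumption)
        if not 0 <= program_counter < len(memory):
            trace.append(f"fatal: program counter {program_counter} out of range")
            return accumulator, trace
        opcode, operand = memory[program_counter]  # fetch and decode
        trace.append(f"pc={program_counter} {opcode} {operand}")
        # Execute; each instruction resets the program counter itself.
        if opcode == "HALT":
            return accumulator, trace
        if opcode == "LOADI":
            accumulator = operand
            program_counter += 1
        elif opcode == "ADDI":
            accumulator += operand
            program_counter += 1
        elif opcode == "JUMP":
            program_counter = operand
        else:
            trace.append(f"fatal: unknown opcode {opcode!r}")
            return accumulator, trace
    trace.append("fatal: step limit reached")
    return accumulator, trace
```

Note that both normal termination (HALT) and the fatal conditions return the trace, so the final machine state can be displayed in either case.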

Task 2.1 Finalize Input Layout


Objective: To describe the data element layout for all input files.
Inputs:
1. Task 1.1 output item 4 (input requirements)
2. Task 1.3 and 1.4 outputs
Task summary: For each input file shown in the system flow diagram, prepare a detailed description of the file characteristics and record layouts.
Outputs:
1. Detailed description of the input file characteristics.
2. Detailed layouts for input records.

Possible Solution: File characteristics: record input; each record is of type 1, or 2. There are 13 characters per record for type 1 records, 7 for type 2 records. Record layout: probably some kind of data structure diagram, perhaps using the Jackson notation. A sample is shown below.
input file
    header record
        H-code
        start exec.
        seg name
        seg length
        IPLA
    text part
        text record*
            T-code
            mem. addr.
            init. contents

Each of the primitive subelements start exec, seg name, etc. should have a description as well. The descriptions should be in terms of a simple data type and its possible set of values (e.g., start exec, and mem. addr. might be of type cardinal with range 0..255; init contents might be declared as an integer, or array of char).
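To make the record layout concrete, here is a hedged Python sketch of a parser for the two record types. The handout gives only the total lengths (13 and 7 characters) and the field names; the individual field widths and the hexadecimal encoding used below are invented so that the lengths add up, and must be replaced by the values in your own problem statement.

```python
from typing import NamedTuple


class HeaderRecord(NamedTuple):
    start_exec: int
    seg_name: str
    seg_length: int
    ipla: int


class TextRecord(NamedTuple):
    mem_addr: int
    init_contents: int


def parse_record(line: str):
    """Parse one input record; field widths here are hypothetical."""
    if line.startswith("H"):
        if len(line) != 13:
            raise ValueError(f"header record must be 13 characters, got {len(line)}")
        return HeaderRecord(
            start_exec=int(line[1:3], 16),   # assumed: 2 hex digits
            seg_name=line[3:9].rstrip(),     # assumed: 6 characters, space padded
            seg_length=int(line[9:11], 16),  # assumed: 2 hex digits
            ipla=int(line[11:13], 16),       # assumed: 2 hex digits
        )
    if line.startswith("T"):
        if len(line) != 7:
            raise ValueError(f"text record must be 7 characters, got {len(line)}")
        return TextRecord(
            mem_addr=int(line[1:3], 16),       # assumed: 2 hex digits
            init_contents=int(line[3:7], 16),  # assumed: 4 hex digits
        )
    raise ValueError(f"unknown record type: {line[:1]!r}")
```

Writing the field positions down in one place like this is the code-level counterpart of the data structure diagram: if the layout changes, only this routine has to change.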

Task 2.2 Finalize Output Layouts


Objective: To describe the data element layout for all output files.
Inputs:
1. Task 1.1 output item 5 (output requirements)
2. Task 1.3 and 1.4 outputs
Task summary: For each output file shown in the system flow diagram, prepare a detailed description of the file characteristics and record layouts.
Outputs:
1. Detailed description of the file characteristics
2. Detailed layouts for the program output

Solution: Complete for each of the following, in a manner similar to that used for the outputs in task 2.1.
1. Initial state
2. Trace of execution
3. Final state
Also note the way in which errors are reported. This isn't shown separately in the system flow diagram, but you may wish to create a separate error file. If so, modify the system flow diagram and complete the output description for the error information.

Task 2.3 Finalize Major Shared Data Elements


Objective: To describe the layout and structure of all major shared elements, including structured elements such as arrays and any global data structures.
Input: Task 1.3 and 1.4 output
Task Summary: Complete all shared data element definitions.
Output: Detailed specification for all major shared data elements.

Solution: Include element name, purpose, and attributes (diagramming any substructure as you did with the input and output descriptions). Note: If a Modula-2 module, a C++ class, or a RESOLVE/C++ component will be encapsulating a type that is a major shared element, it is not necessary to describe that element here because it will be described in the detailed description for that module, class, or component.

Task 3.0 Identify Major Design Conventions


Objective: To record some special design conventions, and other observations that are important to the solution to the problem but may not otherwise be immediately apparent.
Inputs: Task 1.1 and 1.4 output
Task summary:
1. Review specific quality assurance requirements and identify their likely effect on the design.
2. Identify special problem constraints and their effect on the design.
Output: Itemized special conventions agreed to as part of the design, and other observations that should be carefully considered during design and implementation.

Possible Partial Solution:
1. Not every word has initial contents read in from the input file.
2. Text records need not be in numerical order by word address.
3. All memory cells will be initialized to (fill in the value your group has chosen).
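The three conventions above translate directly into how a loader builds memory. The Python sketch below is one possible reading; MEMORY_SIZE, FILL_VALUE, and the (address, contents) pair format are assumptions, and FILL_VALUE in particular is only a placeholder for the value your group chooses.

```python
MEMORY_SIZE = 256  # assumed memory size
FILL_VALUE = 0     # placeholder for the value your group has chosen


def build_memory(text_records, size=MEMORY_SIZE, fill=FILL_VALUE):
    """Build initial memory from (address, contents) pairs.

    Cells with no text record keep the fill value, and records may arrive
    in any address order.
    """
    memory = [fill] * size  # convention 3: every cell starts at the fill value
    for address, contents in text_records:
        if not 0 <= address < size:
            raise ValueError(f"text record address {address} is out of range")
        memory[address] = contents
    return memory
```

Because the whole memory is filled first and records are applied by address, the loader does not depend on the order of the text records.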


Task 4.1 Identify Modules and their Interrelationships


Objective: To develop the shared modular structure of the system.
Inputs:
1. Outputs from tasks 1.0, 2.1, 2.2 and 3.0
2. Knowledge of functional decomposition principles
Task summary:
1. For each system function, identify major procedural and/or data abstractions that appear to be needed.
2. For each such abstraction, decompose into more primitive components, continuing this step until the lowest level components are elementary enough that further decomposition would be of questionable value.
3. Where alternative decompositions suggest themselves, evaluate the alternatives and select the one that appears to best satisfy the requirements.
4. Diagram the chosen decomposition, showing the modular structure, names of the modules, and their inputs and outputs.
Output: Structure diagram (structure chart) showing modules and their interrelationships.

Possible form of the solution: A graphical view of the program structure


[Figure: structure chart in which A invokes modules B and C; A passes inputs 1 and 2 to B, and B returns 3.]

Meaning (for procedural abstractions): The program A is thought of as invoking (calling) 2 modules, B and C, which presumably are invoked in that order. Module A provides B with inputs 1 and 2 and B returns 3. Module C is itself composed of D, E and F, and E has a submodule G. (Inputs and outputs for the other interfaces are not shown in this example, but all interfaces should appear in your solution.) The names given to modules should be simple commands such as Interpret Instructions, Compute Target Address, etc. This kind of diagram is called a structure chart.

Meaning (for data abstractions): A data abstraction module Data Template provides type D and operations A, B, and C. A parameter d is both an input and output parameter to operation A.

[Figure: data abstraction module Data Template, providing type D and operations A, B, and C, with parameter d of type D passed into and out of operation A.]

(Again, other parameters to the operations are not shown in this diagram, but should be when developing the completed module structure.) The boxes with curved sides represent the data type component of the module, while the boxes with the triangles on top represent the operations provided by the module. The triangle on top of a box denotes that this operation is lexically included as part of its parent, rather than being called from its parent. To show that this data abstraction module is invoked by (imported into) another module, simply show a line connecting the Data Template box to the other module.
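For the procedural case described above, the structure chart maps onto code quite directly: each box becomes a routine and each labeled arrow becomes a parameter or return value. The Python sketch below shows only the A, B, C portion; the function names, the placeholder bodies, and the interface given to C are assumptions, since the text does not specify them.

```python
def module_b(input_1: int, input_2: int) -> int:
    """Module B: receives inputs 1 and 2 from A and returns output 3."""
    return input_1 + input_2  # placeholder body, chosen only for illustration


def module_c(value: int) -> str:
    """Module C: invoked by A after B (its interface is not given in the text)."""
    return f"result={value}"


def program_a(input_1: int, input_2: int) -> str:
    """Program A: invokes B, then C, in that order."""
    output_3 = module_b(input_1, input_2)
    return module_c(output_3)
```

Dummy bodies like these are enough to check that the interfaces in the chart fit together before any real logic is written.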


Task 4.2 Prepare Detailed Module Descriptions


Objective: To describe each module in the modular structure in sufficient detail that it can be coded in a straightforward manner.
Input: Outputs from tasks 2.0, 3.0 and 4.1
Task summary: For each module in the structure diagram of 4.1, provide
- a statement of the purpose of the module
- a detailed description of each input and output, whether parameterized or global
- an overview of the algorithm (using pseudo code) if the module is a procedural abstraction, or an overview of the algorithms for each operation if the module is a data abstraction
Output: Detailed design for each module

Elements of solution: (for each module)
Module name: For a procedural module, this should be a declarative command like "interpret instructions". For a data module, it should be a descriptive name of the data abstraction, like "stack of integers".
Formal Parameters: Name and type of each formal parameter in order of calling sequence. For data abstractions, there may not be any parameters to the top module box because not all components are templates. But there nearly always will be parameters to the other operations provided by the data module.
Global Elements required, if any (descriptions should be included here unless they already are included in task 2.3). For data modules, any internal state information that is local to this module should be described here.
Statement of purpose of module: A brief, one-sentence description will do. If it's too hard to write a concise statement of purpose, this may be a clue that the module isn't well thought out.
Pseudo code: If this is a data abstraction, give the entire definition module, and pseudo code for each operation. It also would be nice to see pre- and post-conditions for each operation.
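As an illustration of what pre- and post-conditions for each operation might look like, here is a minimal Python sketch of the "stack of integers" data abstraction mentioned above. The class and method names are hypothetical, and expressing the conditions as assert statements is just one convenient choice; in a design document they would normally be written in prose or pseudo code.

```python
class StackOfIntegers:
    """Data abstraction: a last-in, first-out stack of integers."""

    def __init__(self):
        self._items = []  # internal state, local to this module

    def is_empty(self) -> bool:
        return not self._items

    def push(self, value: int) -> None:
        # Pre-condition: value is an integer.
        assert isinstance(value, int), "push requires an integer"
        old_size = len(self._items)
        self._items.append(value)
        # Post-condition: the stack grew by one and value is on top.
        assert len(self._items) == old_size + 1 and self._items[-1] == value

    def pop(self) -> int:
        # Pre-condition: the stack is not empty.
        assert not self.is_empty(), "pop requires a non-empty stack"
        old_size = len(self._items)
        value = self._items.pop()
        # Post-condition: the stack shrank by one.
        assert len(self._items) == old_size - 1
        return value
```

If a condition is hard to state this briefly, that is the same warning sign as a statement of purpose that will not fit in one sentence.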


Principles of Good Technical Writing


Roshan Rao and Wayne Heym

Writing is a process of communication between you and your audience. Generally, it involves reading and synthesizing material from different sources. The writer collates all the different threads of information together and presents them to the reader(s). Writing is not a passive process. Rather, it is creative. The writer should not just string the material together. Instead, he/she should integrate and interpret it in a manner suitable for the audience.

Some guidelines:
- Before starting out, determine the needs and uses of your document. In the case of a software design document, a person who wants to redesign or implement a computer system would form your audience.
- When you write, adopt a tone appropriate to your audience. This means that you:
  - Don't belabor trivial points.
  - Stress the most important material.
  - Present the material as simply and as clearly as possible.
  - Strive to be as professional as possible.

When you begin your report, you will find that it's difficult, if not impossible, to think about everything all at once. Experienced writers take a paper through stages starting from a rough draft and finally ending with a polished document. Initially, they concentrate more on content than on grammar, style or punctuation. The focus, organization, paragraphing and overall tone are considered first. Only later come grammar, sentence structure, word choice and other such matters.

The writing process is a development process. So, its stages have many parallels with the design process. Design includes the following stages:
1. Sketching an overall architectural structure of the solution.
2. Analyzing the proposed solution to see if it meets the specifications.
3. Examining alternative solutions for correctness and relative quality.
4. Detailing the chosen solution, i.e., repeating steps 1-3 at finer levels of detail.
The stages listed above are typically not sequential and you may need to iterate, especially when flaws are discovered during the analysis phase, such as in the testing process.

Likewise, writing can consist of the following stages:
1. Generation of ideas and outlines. Think! Write down ideas. Ask yourself questions and generate a list of specifications. Rearrange them, put them in groups. You may need to iterate here till you are able to sketch out a high-level structure and identify key components.
2. Prepare an initial draft. Flesh out details of the parts of the structure created above so that it can be read and understood by your audience. Basically, at this point you should have a prototype of your report.
3. Make big revisions. Evaluate basic ideas and make sure you are conforming to the requirements stated in (1). Look at the structure and the paragraphing. The organization of the report should represent a sensible coupling of ideas. Revise the components as needed.
4. Revise your paragraphs. Paragraphs reflect the organization of your report. A paragraph should be cohesive, i.e., unified around an important point.
   (a) State the purpose. You should tell the reader the main topic of the paragraph as early as possible before the reader gets lost in it.
   (b) Be pertinent. Reject matter that is unrelated to the main theme of the paragraph. Develop the main point and enlarge on it.
   (c) Proper coupling. Link paragraphs to paragraphs. This improves the overall organization and coherence of the paper and ensures a smooth flow of information.
5. Revise your sentences. Now, you should look at individual sentences and assess them for style and clarity. Basically, at this stage, you are evaluating operations within components.
   (a) Highlight major ideas. Decide what ideas are worth emphasizing and put them in subjects, verbs or objects. Don't have too many short sentences, but don't move to the other extreme either and put too many ideas in one sentence.
   (b) Add necessary words. Put in words that are needed for logical completeness of the structure. Add words needed to complete compound structures.
   (c) Use good grammar. This means that you:
       - Resolve mixed constructions.
       - Fix misplaced and dangling modifiers.
       - Check if quantifiers are properly bound.
       - Provide consistency for verbs, etc.
6. Choose the right language. Your choice of words should suit your audience and your topic.
   - Avoid jargon, slang and the like.
   - Use proper math constructs and check expressions for succinctness.
   - Avoid too much negation.
   - Check logical connectives for succinctness and understandability.
   - Choose an appropriate tone.
7. Edit your punctuation. Check if your punctuation is appropriate to the context.
   Note: The latter steps represent the implementation stage of the writing process. It's an iterative process and you may have to move back and forth through each stage as you discover flaws in your effort (testing and implementation are interspersed throughout the process).
8. Finally, you will be ready to show your report to the world. This represents the end of development. In the software lifecycle, this may correspond to the installation phase.


Guidelines For Writing A Software Report


Roshan Rao and Wayne Heym

A technical report is generally more intricate than the average essay. It contains complex materials, which need to be arranged in a suitable way to help readers read and understand the report quickly. It should be as brief as possible, yet as precise as possible. Accuracy is important, particularly in design documents. A complete design report consists of the following components: 1. The front matter. 2. The body of the report. 3. The references. 4. The appendices. These components are elucidated below for a software report.

The Front Matter

This helps readers use your report efficiently. It includes the following: 1. The Title Page. 2. The Table of Contents. 3. The Introduction.

1.1 The Title Page

The title page is right at the head of the report and is the first thing readers will look at. It should comprise the following:

The title. This should be well-chosen and should clearly reflect the content of the report.
Names. The names of the people responsible for the report.
The date. When the report was submitted.

1.2 The Table of Contents

This is a map of your report. Your readers will use it to find their way through the report. It should be fairly comprehensive and should list all the sections and the subsections of the report in the order in which they appear and the page numbers on which each of them begins. It should be well-designed and should distinguish between sections and subsections by using upper/lower case letters and indentations. Figures and tables should be listed separately after the contents.

1.3 The Introduction

This gives a general overview of the project. It should provide the concepts on which the project is based and how it works. It should also lay the foundation for the other sections.

The Body of the Report

This is the main part of the report. In CIS 560, it might include the following sections: User's Guide. Programmer's Guide. Source Code. Testing Documentation. Alternatively, the User's Guide, Programmer's Guide, etc. can each be considered reports of their own, containing their own individual front matter, body, etc. In that case, the front matter, etc. would be more specific and relate to the particular report.

2.1 User's Guide

This should cover the basics of using your system. It should explain the capabilities of your program to the user and show him/her how to use it. It should not give the inner details of how and why you have written the program. Basically, it should cover the following:

Learning to use the system. Getting started. Starting and exiting from the program. Other basic topics like expected input and output, etc.
Introduction to different commands. This section should cover the instruction set and can typically be subdivided as follows: Understanding the command syntax. Advanced commands.

Error messages. A descriptive list of error messages. How to recover from errors.
An Index. If your User's Guide were a standalone document for a large system, then you could have an index containing all the significant terms you have used. However, for this course, an index is not required. If one is produced, it may be better that it be a global index covering all documentation for the project rather than being separate for the User's Guide.

2.2 Programmer's Guide

Almost invariably, someone (perhaps the original authors) will need to modify the program. The Programmer's Guide is meant for a knowledgeable user who wants to know how it works, i.e., it should portray the design details of the program. Each design detail is the conclusion of some design decision. It should include: A Description of Data structures. The Purpose and Specifications of the Different Modules. Their Inter-relationships. Error-handling. Parameter lists. It should describe your program concisely so that when the user looks at the program, he/she knows where to look for a particular structure/function.

2.3 Source Code

This is an important part of the overall system documentation, and may be considered part of the Programmer's Guide. It is identified separately because it contains the implementation of your modules and data structures, rather than just their description and specification. Your program should include the following features: Modular code with appropriate indentation, so that it is easily readable. Good choice of variable names. Comments. These should neither be too sketchy nor too verbose.

2.4 Testing Documentation

This should contain: A Test Plan. This describes the different tests that are to be carried out, what they test and their input and expected output. Actual Test Runs. This portrays the actual results generated by the program for specified inputs and forms a collection of examples for running the program. Testing can be carried out separately for each module of your project and the documentation should reflect this.

References

This section details the books, journals etc., to which you have referred for the project and also points the reader in the right direction, should he/she desire to learn more about the technical principles behind the project.

Appendices

Here, you can include information that, while it may be valuable to certain readers, can be omitted while still understanding the gist of the overall report. Sometimes the appendix includes extensive descriptions of matter that is more concisely used in the report body. Some candidates for the appendix of your report might be: The Instruction Set of the machine. A Glossary of terms used in the report. A list of errors discovered in the program and how to fix them. A common term for this list is Errata. Possible enhancements to the system. An Index.

CSE 560

Required Lab Documentation

The first thing to consider in doing a CSE 560 writeup, or any writing assignment whatsoever, is the audience for whom you are writing. The actual audience for your 560 writeups, of course, is the grader (and/or instructor) for the course, yourself, and your lab partners. We would like for you to imagine, however, that you are writing to fairly typical computer users: experienced programmers who would like to find pre-packaged software to fill their needs rather than write their own. You should imagine that your finished documentation will be available on the world-wide web for potential users to browse through or study. We can reasonably assume that if one of our hypothetical users cannot find a software package that fits the bill exactly, he or she will be willing to try to modify one that is close.

The first consequence of writing for this imaginary audience is that your documentation should have several distinct parts that will be used for distinct purposes. These will be described below under the headings User's Guide, Programmer's Guide, Test Plan, and Meeting Minutes. Presumably you already have had some experience coding and testing programs, but perhaps little experience designing systems. For this reason, this document has a sequel, CSE 560 Software Design, which goes into more detail on the design of systems.

Another consequence of our audience is that the organization and style of your writeup are almost as important as its content. If a prospective user cannot find necessary information about your program, he or she is likely to give up on your program and look for another. Above all else, you should be concise. Try to avoid redundancy as well as ambiguity and omissions. To convey relationships among elements of your report, use tables and pictures rather than prose whenever possible. The CSE 560 Software Design document provides details and a suggested format for useful tables and pictures. A CASE tool may be employed in preparing this information. Do not reiterate a program from your text. Algorithms described at the same level of detail as the program itself are useless to our hypothetical user. He or she needs the big picture, not bit-twiddling details, most of the time.

Nearly all sections/levels of your documentation should be hypertext. (Nearly all exceptions to this rule will be diagrams or pictures.) The top level should either play the role of a table of contents, or there should be a link from the top level to a table of contents page. Each part can be reached from this table of contents through a link, making it easy to open to any particular one. Each part should begin with its own cover page, stating document information (e.g., title, date written, primary author) and group information (e.g., names of members). You should generously supply cross references to other parts of your documentation; use hyperlinks to implement these cross references.

User's Guide
The user's guide should explain what your program does, and how a user can get the program to do it. Our hypothetical user does not need to know why you wrote the program. He or she simply wants to do X and wants to find out if your program can do it. Based on this part of your writeup, a user must be able to install your program, make it run, and be able to understand its output, error messages and all. Write the user's guide as if the grader knows nothing about the specifics of the lab; the user's guide should be communicating those specific details. It should explain every aspect of running and using the software, including troubleshooting.

When describing what your program does, it is not necessary to copy the original requirements verbatim into your documentation. It is perfectly reasonable simply to paraphrase appropriate sections of it, or attach in their original form whatever parts are important. Remember, however, that in CSE 560 (and in virtually any system you will encounter) some parts of the problem are left unspecified. This means that you will always have something original to say about what your program does. Note that when you are working in a group (as you are in this course), it is important to have this part of your documentation done very early so that everyone in the group is working from the same requirements.

Descriptions of the inputs to and outputs from your program, including their formats, are essential in a user's guide. By reading this document, the user should be able to visualize the reports produced by the program. Error messages and their descriptions are also essential, as are any instructions and conventions needed to access the program. CSE 560 Software Design has some further information about these issues because some of this information is needed for both the user's guide and the programmer's guide.

Programmer's Guide
The programmer's guide should tell the prospective user how your program works. This is necessary in case he or she finds that the program needs to be changed in some way. The user needs to find out fairly quickly how much work the change will take. Having to turn immediately to a long program listing will discourage our user, and probably will result in your program being set aside in favor of another (and someone else getting credit for writing a versatile program), or will result in an unnecessary new system which will be costly and wasteful of resources. Instead, the programmer's guide captures the design details of the program, the blueprint by which the final program was written. It is in this part of the writeup that you should describe your data structures, the algorithms you have chosen, the module structure, the way errors are handled, etc. This is not an appendix to your program. The user has not looked at your program yet, but rather is trying to find out whether or not to look at it, and on what parts to concentrate. You should not force the user to turn to the program to make sense of the writeup, but there certainly will be details in the code that are not in the programmer's guide.

Note that, in addition to the user's guide, the programmer's guide should be a working document for your group. To this end, it should separately document: (1) data structures, (2) relationships among modules, (3) module interfaces, (4) modules themselves. In documenting data structures, it is important to describe the role the structure plays in the execution of the program (e.g., an object called pc may represent the program counter of the virtual machine), as well as its implementation (e.g., pc may be a record having two fields, one called length and the other called value), and any invariants (e.g., pc has a value in the range 0 to 65,536). In documenting modules, it is important to show which modules invoke which others, as well as how individual modules work and, lest we forget, what they do. Modules that encapsulate a data abstraction should separate the specification (i.e., definition) and algorithmic (i.e., implementation) details. Parameter lists are an essential part, but not the whole, of module documentation. Module interface descriptions must include what a module assumes about its calling environment (requires) and what it, in turn, guarantees to perform (ensures).

A programmer's guide should contain a very thorough description of the flow of control of the software. By reading only this guide, a programmer can learn everything that the software does, and how it is accomplished, without having to look at any code. The programmer's guide should provide sufficient detail about the design of the software and how everything works together. The CSE 560 Software Design handout provides you with templates to help you express this information in an organized manner. A CASE tool may be employed to assist you in producing the programmer's guide, and in sharing its contents with other members of your group.
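As an illustration of a module interface description with requires and ensures clauses, here is a minimal sketch in Python. The function name fetch_word and the list-based memory are hypothetical and chosen only for this example; the templates in the CSE 560 Software Design handout are the format to follow.

```python
def fetch_word(memory, address):
    """Return the word stored at `address`.

    requires: 0 <= address < len(memory)
    ensures:  result == memory[address], and memory is unchanged
    """
    # Check the requires clause so a caller's mistake is reported at the
    # interface rather than surfacing later as a confusing failure.
    assert 0 <= address < len(memory), "requires clause violated"
    return memory[address]
```

The requires clause states what the caller must guarantee; the ensures clause states what the module promises in return. Writing both down lets group members code against the interface before the body exists.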

Data Element Dictionary


This section is used to describe each shared variable used in the program. The following format can be used.

Variable Name | Local/Global | Type | Declaring Module | Purpose

Code
An essential part of the documentation of any program is the source code itself. Despite the foregoing, all that you have ever learned about comments, choice of variable names, blocking structure, etc., still applies. Do not forget that someone modifying your program needs to be able to read it. The code should be organized so that individual modules and data structures can be found easily and read quickly. It is a good idea to adopt a precise coding standard, such as can be found under the class web site for C++ and for C.

Test Plan
Your test documentation is important to the hypothetical user for two reasons. First, it provides some indication that, at least sometimes, the program actually does what you say it does. Second, it provides a source of examples for running the program. It probably is obvious that the test documentation should normally include a collection of actual runs of the program in which both the input and output are clear. Perhaps not so obvious is that the test documentation also should contain a test plan that describes the testing that is proposed to be done, rationalizes why these tests were chosen, and indicates the expected outcomes of each test case. In a sense, creation of the test plan is part of the design process. As such, it should be done early in the development process. If a mistake is found in your implementation, it should be possible to quickly find where in the test plan this feature was (or was not) exercised.

Meeting Minutes
In this course, perhaps for the first time in your major program, you will be working on a technical project as part of a group. Group projects offer the advantage of not requiring that each individual be responsible for every part of every assignment, but, at the same time, offer the disadvantage of having to depend on others to do part of the assignment correctly. Welcome to the real world.

One of the most important elements in a successful group project is effective communication among the group members. You should meet with each other, and communicate via e-mail, frequently. At those meetings, there normally will be a set of topics covered, (possibly tentative) decisions made by the group, and perhaps assignments made to individual members of the group. A record of the meeting should be kept, including the date and time of the meeting, the topics covered at the meeting, the major ideas and rationales coming from the discussion, the conclusions reached (and their justification), and the assignments (often called action items) made to individuals as a result of the meeting.

For each meeting, one of the group members should be assigned the responsibility of taking these minutes. The minute taker should type in the minutes and send them out to the members of the group via e-mail as soon as possible after the meeting. In addition, an archived copy of all minutes should be kept to be referred to by the group and by graders or the instructor. Each meeting (and the corresponding set of minutes) should begin with a review of the open action items from previous meetings, so that problems may be caught early. Action items should have milestones that are fine-grained enough to permit the group to determine whether a task is behind schedule early enough to be able to do something about it.

Members of the group will have different documentation responsibilities for the different labs. It is required that each member of the group have primary responsibility for a user's guide or a programmer's guide by the end of the quarter.

Lab Submission Instructions


You should place your documentation files (including source code) into a directory structure. Name your directory <groupname>_<labname>, for example, c560aa01_lab1. I suggest you organize your directory with subdirectories using a logical format, that is, subdirectory names such as doc, src, tests, etc. for the various parts of your lab. Do not submit executables. Your User Guide should contain instructions on building and setting up the emulator. Your install procedure should be as user friendly as possible. Your directory should contain a file named README or README.txt containing a list of the files submitted and any other immediately useful information (such as a suggestion to read the User Manual).

To retain the directory structure you created and to allow easy transmission to others (including, and especially, your graders), you should gather your entire directory structure into one file. Archiving utilities are available for doing this, including, in Windows, sending your top-level directory to a compressed (zipped) folder (right-click on the folder and choose Send To, then Compressed (zipped) Folder). Another alternative, on stdsun (which uses the Solaris flavor of Unix), is to bundle your files into a zipped "tarball" with the gtar command

    gtar zcf <groupname>_<labname>.tar.gz <groupname>_<labname>

Look for error messages coming out of this tar command. Check especially for permission problems because different people have probably created the files. You should test that it worked using the table of contents option

    gtar ztvf <groupname>_<labname>.tar.gz

or, to be even safer, by extracting the files and checking them using the command

    gtar zxvf <groupname>_<labname>.tar.gz
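For groups working away from stdsun, the same bundle-then-verify idea can be sketched with Python's standard tarfile module. This is only an illustration under stated assumptions: the function names bundle and list_contents are made up for this example, and it is not a replacement for the gtar commands given in these instructions.

```python
import os
import tarfile


def bundle(directory):
    """Create <directory>.tar.gz next to the directory and return its path."""
    directory = directory.rstrip("/\\")
    archive = directory + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the top-level directory name inside the archive,
        # so extraction recreates <groupname>_<labname>/... as submitted.
        tar.add(directory, arcname=os.path.basename(directory))
    return archive


def list_contents(archive):
    """Return the member names stored in the archive (a table of contents)."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Calling list_contents(bundle("c560aa01_lab1")) and checking that README and every subdirectory appear plays the same role as the table of contents check described above.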

You should submit the file using the appropriate Carmen Dropbox. Only one person in a group should submit the lab. If there are multiple submissions, only the last one will be considered. Any submission dated after the due date and time is late. I would strongly suggest that groups test for problems in the process by submitting a test file early and seeing if they have any errors. Please email me if your group can't submit the test file. As a last resort, you can email a gzipped tar file to me. Because this last resort will cost me some time, I strongly discourage this.

Lecture #1
System Software Design, Development, and Documentation
Introduction & Administration

Course Objectives
System Software
Software Engineering
(A Little) Requirements Gathering
Design
Team Work
Writing (Documentation)

Choices
Hardware platform
Software platform
Editor(s)
Compiler(s)
Compilation management (make)
Configuration management (cvs)
Off-the-shelf components
Documentation

Remark
Now would be a particularly bad time to have to learn the main programming language that your project team will be using. Learning new off-the-shelf components also takes time.

Evaluation
We have to be able to verify independently that your source code produces an executable that has the desired behaviors. Therefore, if your team desires to use other than a CSE-provided platform, you'll have to negotiate this matter with a grader, and come up with a short, written contract describing the agreement.

Graders
Our graders are: Sean O'Connor (oconnor.173@buckeyemail.osu.edu) Kai Li (li.966@osu.edu)

Lecture #2

Introducing the Machine

Overview of Labs 2-4


Assembly language e.g., LOAD r1,VALUE

Assembler
Machine code e.g., ...0110101110...

Linking Loader
Linked machine code e.g., ...0110101110...

Simulator
Executing program

The A11-560 Machine


An "abstract" machine
Nearly the world's simplest architecture!
[Figure: block diagram with Memory, CPU (containing the PC and a Register Bank), an Input Device, and an Output Device]

Instruction Processing Cycle


Repetition of 2 steps:
Fetch
Read word in memory location indicated by PC Increment PC

Execute
Perform action specified by the contents of that memory location (may involve reading/modifying other memory locations or registers)
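The two-step cycle can be sketched as a small loop. This is a minimal sketch only: the opcodes HALT, INC, and JMP and the (opcode, operand) tuples are hypothetical stand-ins, not the A11-560 instruction set or word format.

```python
# Hypothetical opcodes for illustration; not the A11-560 instruction set.
HALT, INC, JMP = 0, 1, 2


def run(memory, max_steps=1000):
    """Run the fetch/execute loop and return the final (pc, acc)."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        # Fetch: read the word the PC points at, then increment the PC.
        opcode, operand = memory[pc]
        pc += 1
        # Execute: perform the action specified by the fetched word.
        if opcode == HALT:
            break
        elif opcode == INC:
            acc += operand
        elif opcode == JMP:
            # Relative to the PC value after the increment above.
            pc += operand
    return pc, acc


program = [(INC, 5), (JMP, 1), (INC, 100), (INC, 2), (HALT, 0)]
print(run(program))  # prints (5, 7): the jump skips the INC 100 cell
```

Because the PC is incremented during the fetch step, any instruction that uses the PC during execute sees the already-incremented value.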

Memory
Organized in "cells"
i.e., smallest addressable unit
N cells, addressed 0..N-1
each cell consists of k bits

A cell is usually:
a byte (8 bits), or a word

More exotic architectures exist
[Figure: a column of N memory cells, addressed 0, 1, 2, 3, ..., N-2, N-1, each k bits wide]

Questions
How many different values can be represented in such a cell?
[Figure: a column of N memory cells, addressed 0, 1, 2, 3, ..., N-2, N-1, each k bits wide]

How many bits are required to represent an address?
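One way to check answers to these two questions is with a short calculation. The sketch below assumes plain unsigned binary addresses; the function names are made up for this example.

```python
def values_per_cell(k):
    """A cell of k bits can hold 2**k distinct bit patterns."""
    return 2 ** k


def address_bits(n):
    """Smallest number of bits that can name every address in 0..n-1."""
    # The largest address is n - 1, so count the bits needed to write it.
    return (n - 1).bit_length()


print(values_per_cell(8))   # 256 patterns in a byte
print(address_bits(256))    # 8 bits cover addresses 0..255
print(address_bits(257))    # 9 bits once there is an address 256
```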


Number Representation
A number is a concept, for which there are many concrete representations/notations
e.g., the number eleven can be represented as 11 (decimal); XI (roman); 13 (octal); B (hex); 1011 (binary); k (alphabetic); etc

Conversely, a single concrete representation may have several interpretations


e.g., the representation 10 could be interpreted as ten, eight, sixteen, two, etc
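Both directions can be checked with Python's built-in int(text, base), which interprets a digit string in a given base. This is just a quick illustration; the roman and alphabetic notations from the slide are left out because int does not handle them.

```python
# Several notations, one number: each pair below denotes eleven.
notations = [("11", 10), ("13", 8), ("B", 16), ("1011", 2)]
values = [int(text, base) for text, base in notations]
print(values)  # [11, 11, 11, 11]

# One notation, several numbers: "10" read in different bases.
readings = {base: int("10", base) for base in (10, 8, 16, 2)}
print(readings)  # {10: 10, 8: 8, 16: 16, 2: 2}
```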

We have only bits available for our machine


simple binary numbers used in memory for concrete representation
the corresponding data, however, can be interpreted in different ways
e.g., numeric data, instruction, ascii string, ...

Problem: representing negative numbers in binary.



Signed Magnitude
Use first bit to represent positive/negative
[Layout: S (1 bit) | Magnitude (k-1 bits)]
i.e., 1111, 1110, 1101, ..., 0111

Q. what is the smallest number?
Q. what is the largest number?
Q. how many numbers total?
Notice: adding a negative to a positive number looks more like subtraction!

One's Complement
To negate a number, flip the bits
e.g., -5 would be:

First bit is still the sign!
i.e., 1000, 1001, 1010, ..., 0110, 0111

Now addition always looks like addition (with an end-around carry)
But the range is still 2^k - 1. Why?
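To see negation by bit flipping and the end-around carry in action, here is a minimal sketch that works on k-bit patterns stored as Python integers. The function names are made up for this example.

```python
def ones_negate(bits, k):
    """Flip all k bits of the pattern."""
    return ~bits & ((1 << k) - 1)


def ones_add(a, b, k):
    """Add two k-bit one's complement patterns with end-around carry."""
    mask = (1 << k) - 1
    total = a + b
    if total > mask:
        # A carry left the top bit: drop it and add it back in at the bottom.
        total = (total & mask) + 1
    return total & mask


minus_five = ones_negate(0b0101, 4)
print(format(minus_five, "04b"))                     # 1010
print(format(ones_add(0b0111, minus_five, 4), "04b"))  # 0010, i.e., 7 + (-5) = 2
print(format(ones_add(0b0101, minus_five, 4), "04b"))  # 1111, a second pattern for zero
```

The last line hints at the answer to the "Why?" on the slide: 0000 and 1111 both stand for zero, so the 2^k patterns name only 2^k - 1 distinct numbers.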

Two's Complement
To negate a number:
i) flip the bits
ii) add 1

e.g., -5 would be:
Uses the full 2^k range!
consider negating 0

The first bit gives the sign (like signed mag)

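Imagine a circle:
[Figure: the 4-bit patterns arranged around a circle, with 0000 = 0 at the top, 0001 = 1, ..., 0111 = 7, then 1000 = -8, ..., 1110 = -2, 1111 = -1]
addition: clockwise
subtraction: counterclockwise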

Let k be the number of bits in the representation.
Number x is representable iff -2^(k-1) <= x <= 2^(k-1) - 1. Assume so in the following.
Let bin(y) be the simple binary representation of y.
If 0 <= x, then x is represented by bin(x).
Otherwise, x is represented by bin(2^k + x).
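The definition above translates almost directly into code. This is a minimal sketch; the function names to_twos and from_twos are made up for this example, and bit patterns are shown as strings for readability.

```python
def to_twos(x, k):
    """Return the k-bit two's complement pattern for x as a bit string."""
    low, high = -(1 << (k - 1)), (1 << (k - 1)) - 1
    if not low <= x <= high:
        raise ValueError(f"{x} is not representable in {k} bits")
    # Nonnegative x uses bin(x); negative x uses bin(2^k + x).
    value = x if x >= 0 else (1 << k) + x
    return format(value, f"0{k}b")


def from_twos(bits):
    """Interpret a bit string as a two's complement number."""
    k = len(bits)
    value = int(bits, 2)
    # A leading 1 means the pattern encodes value - 2^k.
    return value - (1 << k) if bits[0] == "1" else value


print(to_twos(-5, 4))      # 1011
print(to_twos(-8, 4))      # 1000
print(from_twos("1110"))   # -2
```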


Lecture #3


Instructions
Basic format of instructions in memory:
OP CODE OPERANDS

op code encodes function operands encode arguments

(A11-560 instructions are all the same size) Operands can be interpreted in different ways

Addressing Modes
Immediate
argument is the operand
LOAD #6 effect: ACC ← 6

Register Direct
operand gives the register where argument is
LOAD r1 effect: ACC ← r1

Addressing Modes (II)


Relative (PC or base)
operand gives displacement, relative to a special register (such as PC)
JMP 3 effect: branch forward 3 cells, i.e., PC ← PC_old + 3 (important to know what the PC's old value was!)

Addressing Modes (III)


Memory Direct
operand is the address of the argument
LOAD 1B00 effect is to copy contents of cell 1B00 into ACC, i.e., ACC ← M[1B00]
[Figure: memory cells 1AFF, 1B00, 1B01; cell 1B00 contains 4]

Addressing Modes (IV)


Memory Indirect
operand is the address of the address of the argument
LOAD @1B00 effect is to move contents of cell 4 into ACC, i.e., ACC ← M[M[1B00]]
[Figure: memory cells 0004, 1B00, 1B01; cell 0004 contains 17]

Indexed Addressing Modes


Indexing can be used with direct & indirect
address of argument given by: operand + value of a register
LOAD +1B00, i.e., ACC ← M[1B00 + IND]
[Figure: index register IND contains 2; memory cells 1B00, 1B01, 1B02 contain 4, 25, 18]
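The addressing modes can be compared side by side with a small lookup function. This is a sketch under stated assumptions: the memory contents are read off the slide figures as best they can be recovered (cell 1B00 holding 4, cell 0004 holding 17, IND holding 2), and the mode names and the function load are made up for this example rather than taken from the A11-560 specification.

```python
# Memory contents and IND as assumed from the slide figures.
memory = {0x0004: 17, 0x1B00: 4, 0x1B01: 25, 0x1B02: 18}
IND = 2


def load(mode, operand, registers=None):
    """Return the value that would be placed in ACC for each mode."""
    if mode == "immediate":      # argument is the operand itself
        return operand
    if mode == "register":       # operand names the register holding it
        return registers[operand]
    if mode == "direct":         # operand is the address of the argument
        return memory[operand]
    if mode == "indirect":       # operand is the address of the address
        return memory[memory[operand]]
    if mode == "indexed":        # address is operand + index register
        return memory[operand + IND]
    raise ValueError(f"unknown mode: {mode}")


print(load("immediate", 6))                   # 6
print(load("register", 1, registers=[0, 9]))  # 9
print(load("direct", 0x1B00))                 # 4
print(load("indirect", 0x1B00))               # 17
print(load("indexed", 0x1B00))                # 18
```

The same operand, 1B00, yields three different arguments depending on the mode, which is why the mode must be encoded in the instruction.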

A11-560 Instructions
Field: OP | U2 | R | X | U1 | S
Bits:  4  | 2  | 2 | 2 | 2  | 8

addressing modes:
immediate, register direct, memory direct, PC relative (with or without indexing), opcode extension, and ignore

general syntax:
OP R,S(X)

R - a register (integer in range 0..3)
S(X) - S, if X = 0
       S + rX, if X = 1,2,3
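Splitting an instruction word into its fields, and evaluating S(X), can be sketched with shifts and masks. The sketch assumes that the fields are packed in the order OP, U2, R, X, U1, S from the most significant bit of a 20-bit word, that S is treated as unsigned, and that no wraparound is applied to S + rX; none of these is confirmed here, so check them against the machine specification. The sample word 0xB38B4 is arbitrary.

```python
def decode(word):
    """Split a 20-bit word into fields (assumed order: OP U2 R X U1 S)."""
    return {
        "OP": (word >> 16) & 0xF,   # 4 bits
        "U2": (word >> 14) & 0x3,   # 2 bits
        "R":  (word >> 12) & 0x3,   # 2 bits
        "X":  (word >> 10) & 0x3,   # 2 bits
        "U1": (word >> 8) & 0x3,    # 2 bits
        "S":  word & 0xFF,          # 8 bits
    }


def effective_s(fields, registers):
    """Evaluate S(X): S if X = 0, otherwise S plus the contents of rX."""
    s, x = fields["S"], fields["X"]
    return s if x == 0 else s + registers[x]


fields = decode(0xB38B4)
print(fields)  # {'OP': 11, 'U2': 0, 'R': 3, 'X': 2, 'U1': 0, 'S': 180}
print(hex(effective_s(fields, registers=[0, 0, 0x10, 0])))  # 0xc4
```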


Categories of Instructions
Branch
unconditional and conditional

Load / Store
copies data between registers and memory

Arithmetic / Logical
addition, subtraction, shifting, ...

IO
read/write numeric and ascii data

Advice for Groups


Meetings
set up frequent and convenient times and places
keep minutes (designate someone)
have an agenda (agree on or set before)
conclude with action items
assign responsibility for each action item!
start with progress report on open action items

Advice for Groups (II)


Tasks
exchange email / phone numbers
read all handouts carefully
e.g., Documentation Requirements, Software Design
agree on standards / conventions
e.g., ELLEMTEL document on course web page

Anticipate and deal with problems

Machine Code Examples


1. ST 0,EE(0)
2. IO 3,0(2)
3. OR 1,2B(3)

Address | Memory | Instruction/Data
. . .
B0 | 700A0 |
B1 | 020BA |
B2 | B3800 |
B3 | 92008 |
B4 | B38B4 |
B5 | 000B0 |
B6 | B30B6 |
B7 | 90008 |
B8 | B3000 |
B9 | C00FF |
BA | 546F5 |
. . .

Lecture #4


Labs 1 and 2: Milestones


Place lab1 in its Carmen Dropbox before class October 4.
Mandatory Design Review: October 7, 10, & 11
Beginning Sep. 30, sign up for a 25-minute slot outside DL 481.
Everyone in group must be present.
Turn in printed version of preliminary documentation for lab2 October 5, in class (Programmer's Guide in particular).
Place preliminary documentation in the Carmen lab2-design-review Dropbox before class October 5.

Completed Documentation: October 18, before class, in the Carmen lab2 Dropbox.


Lab 2 Requirements
Input: two text files
executable file: initial header record, followed by a sequence of text records
process input file: consulted at process run time by IO instr.

Output: two text files
process trace:
1. memory configuration & registers after loading
2. trace of each instruction executed (i.e., memory & registers affected)
3. memory after termination
process output file: appended at run time by IO & BR instrs.

Robustness (so test it thoroughly)



CSE 560 Software Design


Task 1.1 Review of Requirements
How can we get a good handle on the functional requirements (Output #3)? Prepare a robust test plan.
Look for boundary conditions.
There are more than a few little gotchas ("Got you!") in these project assignments. It is important for a software development team to find these.

When you discover a gotcha


Judge whether the customer should be involved in making the decision.
When in doubt, ask the customer.
Make a decision.
Document your decision.

Example gotchas
What is a segment good for? How is it used? How might a segment name be used?
Ask the customer.

Should accesses outside of the segment be considered errors and/or warnings?
Not if those accesses are to legal addresses (any address between 0 and 255 is legal).

Lecture #5


Software Engineering


The Software Crisis


We're in the midst of a s/w "crisis"
and we've been there for 40+ years!

Complexity continuously increasing:


machine software

Tools and techniques to manage complexity


tools: CASE, analysis, testing, ...
techniques: languages, methodologies, ...

However, system complexity frequently pushes the envelope
Net effect:
The support for building complex systems always seems to lag behind the systems we build (or want to build)!

Characteristics of Well-Designed Code:


Easy to use

user-friendly
robust
efficient
flexible (and easy to configure)

Characteristics of Well-Designed Code (II):


Easy to maintain

easy to understand and reason about
documentation and simplicity of design

easy to read
coding conventions and style

easy to modify
easy to extend

[Annotation: George Polya's How to Solve It]

Waterfall Model of Development


Simple model of software development
Occurs in stages:
1. requirements analysis
2. system specification
3. design
4. implementation
5. testing
6. maintenance / support
[Annotations, in the order they appear beside the list: "Programmer's Guide ->"; "I spend too much time doing #3 during #4"; beside testing, "do this more often! This is the tried and true method for 'checking'"; "-> User's Guide"]

Problems with this Model


There is no "barrier" between steps
e.g., begin testing before implementation done
[Annotation: testing can happen at many different points. Writing tests even before designing can help mold the design.]

Water flows uphill
e.g., working on design reveals gaps in requirements analysis
more like an Escher print than a real waterfall!

Fundamentals
Many alternatives to pure waterfall exist
spirals, matrices, ...

Concept of distinct stages is useful


helps structure the effort, like a battle plan

Some basic stages:


1. requirements / specification 2. design

Basic Stages in Software Design


Requirements Analysis
answers: What should the system do?
understand the problem
deliverables:
1. requirements document
2. specification document
from the user's point of view: defines and limits the scope of the system
from the developer's point of view: basis for design / implementation / testing
[Annotation: This could be good to write even if the client is us. It makes it straightforward to write a design once we figure out how the system should work.]

Not Done in Requirements Analysis

ON TESTS

Requirements Analysis does not aim to answer any of the following questions:
What environment will the system operate in? [Annotation: This is deferred 'til later.]
What will be the performance characteristics of the system? [Annotation: we haven't thought about HOW to implement this yet.]
What will be the cost of delivering the system to the client? [Annotation: same as above.]

Basic Stages (II)


High Level Design
identify and evaluate possible solutions
refine the design
[Annotations: Brainstorming! Possible solutions (no evaluations). Now evaluate; factors to evaluate on: simplicity, effort, cost. Move on to lower levels...]

Basic Stages (III)


Design
answers: How does the system do what it does?
Note: How do we answer the requirements analysis? (System specs)
high-level description of components, interfaces, and interactions
given in terms of data structures, procedures, algorithms,
Note: The data structures we provide, that is, and the operations we provide. The algorithms are NOT provided to the client. Instead, they are internal information.
abstraction is critical
What details do we ignore?
Which do we emphasize?

Basic Stages (IV)


Module Specification
identify the abstractions describe the abstractions
see the specification skeleton in the syllabus

describe the interactions (interfaces, operations)

Two common and broad classes of design:


1. procedural
2. object-oriented
These are both useful, and we can use either for different projects.

Procedural programming deals with data. There is less abstraction. It deals with transformations that happen to data.

Procedural Design

Focus on the functionality
Create a data-flow view of computation

Input File -> Loader -> Memory Rep.

Use this data-flow to decompose a large system into smaller modules

Procedural Design (II)


Look for modules that:
are small enough to be understood
are large enough to result in reasonable overall complexity
are generic (and flexible) enough to be reused

Key activity: DEFINE INTERFACES


this allows work to proceed in parallel on the sub-parts
Note: Defining interfaces is important for both procedural AND OO programming.

ON TESTS

OO focuses on the data itself, not the functionality. Of course, looking at the name, this makes sense.

Object-Oriented Design (OOD)


Focus on the data
Program = collection of interacting objects
Sketch out the types needed and the interactions between these types
[Diagram: Machine, Memory, Table, State, and Current State, connected by "is a" and "has a" relationships and labeled as class or object]

Note: We think of different types of data as different objects. Objects get functionality by passing messages to each other, that is, by calling each other's methods.

OOD: Finding Classes


Note: Invariants inside of a class are those things which are true inside of a class (correspondence). Outside, we have *constraints*. There are client-level invariants and implementation invariants.

Design is often based on reality
Talk to field experts to understand the system being modeled in software.
Write down scenarios
use case analysis (Note: Very important. (JankCMS))

Draw lots of pictures and refine model
Decide on class invariants

OOD: Specify Relationships


Typical class relationships include:
inheritance (e.g., a car is a vehicle)
containment (e.g., a car has a steering wheel)
use (e.g., a car uses a highway)
encapsulation (e.g., details of steering mechanisms are hidden behind the interface, i.e., the steering wheel)

Determine responsibility of each class


delegate where appropriate (not too much!)

Must strike a balance (small vs. large) in the size and functionality of classes
cuts across both procedural and OO
57

29

OOD: Specify Operations


Important categories of operations:
construct, initialize, copy, assign, xfer, destroy
access, update, iterate

Set should be small and independent
do not implement every possible use / extension

Focus on behavior, not implementation


confirm invariants

Key activity: DEFINE INTERFACES

58

Lecture #6

59

30

Road Map
Project - Lab 2
Form groups
Requirements analysis
Design review
Submit complete project (i.e., implementation, documentation, tests)

Lectures
Admin stuff
Abstract machine
Software Engineering <detour!>
Testing
Technical Writing

System Software - Overview


What is system software anyway?
programs that support the operation of a computer
Note: not necessarily something the user interacts with
often closely related to the architecture
Note: compiled. Java is a weird choice.
allow us to focus on application without knowing details of machine
Note: Java makes more sense for this kind of programming.

31

Overview (II)
Examples:
operating systems
software used to create other software!
compiler
linker / loader
assembler
debugger
editor (Note: could use Word as system software (writing C++ code, etc.))

Note (could be on exam): Compilers and assemblers are both translators (from one language to another). Lab 2 is a simulator (interpreter). Usually a debugger is an interpreter (allowing us to see code execute).

Driving force: people more expensive than machines


62

A program is static, A process is dynamic. A process uses a program's instructions to carry out actions.

Program vs. Process


Program: a collection of action descriptions
Process: a program in execution
contains state:
i) values of variables, ii) location in program, iii) pending I/O, etc.
state changes over time
Note: Processes contain state that changes / evolves in time.


63

32

ALL THIS STUFF IS ON THE TEST.

Program vs. Process (Example)


Program
S1: A <-- 3
S2: B <-- A+1
S3: Branch S1

State (over time)
before S2 executes: A = 3, B = 17, PC = S2
after S2 executes: A = 3, B = 4, PC = S3

Executing an instruction changes the state


That's the purpose of executing an instruction: changing the state.
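The example above can be sketched in C++ as a program (fixed code) plus a process state that each executed instruction changes. This is only an illustration: the names Label, State, step, and run are assumptions, not part of the lecture or of Lab 2.

```cpp
#include <cassert>

// Labels of the three statements in the example program.
enum class Label { S1, S2, S3 };

// The process state: values of variables plus the location in the program.
struct State {
    int A;
    int B;
    Label pc;
};

// Execute the single instruction named by s.pc and return the new state.
// The program text never changes; only the state does.
State step(State s) {
    switch (s.pc) {
        case Label::S1: s.A = 3;       s.pc = Label::S2; break;  // S1: A <-- 3
        case Label::S2: s.B = s.A + 1; s.pc = Label::S3; break;  // S2: B <-- A+1
        case Label::S3:                s.pc = Label::S1; break;  // S3: Branch S1
    }
    return s;
}

// Advance the process through a given number of instructions.
State run(State s, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) s = step(s);
    return s;
}
```

Calling step on the state A = 3, B = 17, PC = S2 yields A = 3, B = 4, PC = S3, matching the two snapshots on the slide.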
64

Layers of Abstraction Architecture


Computer can be viewed at different levels of abstraction
Each layer is a virtual machine (VM)
This helps bridge the human / machine gap
Each VM corresponds to a language

Layers (top to bottom):
Application (Note: the stuff we're writing)
Tools (Note: e.g., JUnit)
---- LINE OF SYSTEM SOFTWARE ----
High-level
Assembly
OS
Machine

Note: Java is translated to assembly language, translated to machine language, and run on the machine. Somewhere within these 3, we have a virtual machine (kind of like a C++ virtual machine or something).
Note (MACHINE): State is determined by the values stored in each memory location, the registers, and the program counter. The machine has no notion of separately running processes. It simply, stupidly executes instructions. The OS keeps track of where the processes go and what they do.
Note: Keystrokes / mouse actions in Word constitute a language.

33

An interpreter is a program. When we run it, it becomes a process that advances another process through its states; that other process comes from the program we give the interpreter. BASIC is interpreted. Java involves both translation and interpretation: Java bytecode is usually interpreted.
A translator is a program, but when we run it, we give it another program, and it produces a third program. The process a translator governs is merely a translation process. C++, for example, is translated to machine code, possibly with an intermediate translation to assembly language. Java also involves translation: source is translated to bytecode (.class files).
Advantages of translating: faster execution, because the program is translated to machine code. Disadvantage of translating: symbolic debugging is difficult.
Advantages of interpreting: quicker debugging and prototyping. Disadvantage of interpreting: slower execution.

Two Important Kinds of Program


Translator
a program that, given a program at one VM level, produces a program at another (lower) one
C++ is translated to machine code
Java source is translated to byte-code
faster in execution

Interpreter
a program that advances another process through its states
BASIC is interpreted
Java byte-code is (usually) interpreted
quicker debugging and prototyping

THIS IS ON TEST.

Why are we more concerned with the OS than with the architecture when compiling? In C++, for example, we have input/output streams; I/O is effected by the language using operating system routines.

Translating vs. Interpreting


Consider Java:
program (MyProgram.java) -> translated -> byte code (MyProgram.class) -> interpreted (Java VM)

Translating vs. Interpreting


Translation / interpretation can occur at all levels
High-level -> translator (compiler) -> Assembly -> translator (assembler) -> Machine Language
High-level: or interpret
Assembly: or interpret
Machine Language: interpreted (usually by a CPU)

Note: Any language can be translated or interpreted. The CPU always interprets its instructions.


68

Lecture #7

69

35

The operating system defines when a process really "begins". Though the CPU is the primary resource, the OS controls it and other pieces like memory in order to allow a process to execute.

Operating Systems - Introduction


When does a process begin?
when it is assigned certain system resources
e.g., processor, memory, I/O, registers,

At any instant, there are many processes
multiple, concurrent users
Note: Even if there is only 1 user, there are usually many processes running.
mix of batch and interactive jobs
interactive: emacs
batch: g++
OS tasks (e.g., buffering printer output)

But only a fixed number of resources...


This is why we must meter them out.
70

OS - Introduction (II)
Resources must be managed This is the job of the operating system (OS)
This is one of the biggest tasks of the OS.

71

36

Concurrency is fundamental to the use of an operating system. Concurrency is hard because of scheduling. Priority, length of time waiting (starvation), and many other factors.

Challenges in OS
Concurrency is fundamental
Concurrency is hard
Example: sharing a bridge
Long tunnel that only fits 1 lane of traffic
What policy do you use to control traffic?

Example:
Process A is using X and needs Y,
Process B is using Y and needs X <-- Deadlock
OS must avoid and/or detect and resolve deadlock and starvation
waiting too long to get resources...
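One way to picture deadlock detection is to follow the chain "process waits for resource, resource is held by process" and see whether it loops back to where it started. The sketch below is a simplification under stated assumptions: each process waits for at most one resource, each resource has at most one holder, and the names ResourceTable and in_circular_wait are hypothetical. It is not how any particular OS does it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical bookkeeping for the example on the slide.
struct ResourceTable {
    std::map<std::string, std::string> holder;  // resource -> process using it
    std::map<std::string, std::string> wants;   // process  -> resource it needs
};

// Returns true if `start` is part of a circular wait (a deadlock cycle).
bool in_circular_wait(const ResourceTable& t, const std::string& start) {
    std::set<std::string> seen;
    std::string p = start;
    while (seen.insert(p).second) {             // stop once a process repeats
        auto w = t.wants.find(p);
        if (w == t.wants.end()) return false;   // p is not waiting for anything
        auto h = t.holder.find(w->second);
        if (h == t.holder.end()) return false;  // the resource is free
        p = h->second;                          // p waits on whoever holds it
    }
    return p == start;                          // the chain came back to start
}
```

With A using X and needing Y, and B using Y and needing X, the function reports a circular wait for A; once B stops waiting for X, it no longer does.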
72

Responsibilities of OS
Handles interrupts
Interrupt-driven IO is supposed to increase efficiency of simultaneous requests.

may be generated by I/O or by programs

Manages real memory


loading of segments

Manages virtual memory "virtual" = "abstract"


virtual: appears to user to have different characteristics than it has in actuality, i.e., in implementation
virtual memory: a large block of contiguous memory space
Note: The OS makes us think we have large blocks of memory, but we really don't.

Responsibilities of OS (II)
virtual memory may be larger than real memory!

[Diagram: the User sees Virtual Memory, which is backed by Real Memory and Disk]

page fault
Note: Trying to access outside of Real Memory
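A page fault can be illustrated with a toy address translation: the virtual address is split into a page number and an offset, and the lookup fails when that page is not currently in real memory. The page size, the PageTable layout, and the function name translate are all assumptions made for this sketch; real hardware and operating systems differ in many details.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical page size, chosen only for illustration.
constexpr std::size_t kPageSize = 256;

struct PageTable {
    // frame[p] is the real-memory frame holding virtual page p,
    // or std::nullopt if that page currently lives only on disk.
    std::vector<std::optional<std::size_t>> frame;
};

// Translate a virtual address to a real address.
// Returns std::nullopt to signal a page fault.
std::optional<std::size_t> translate(const PageTable& pt, std::size_t vaddr) {
    const std::size_t page = vaddr / kPageSize;
    const std::size_t offset = vaddr % kPageSize;
    assert(page < pt.frame.size());            // address must lie in virtual memory
    if (!pt.frame[page]) return std::nullopt;  // page fault: page is not in real memory
    return *pt.frame[page] * kPageSize + offset;
}
```

On a fault, the OS would bring the page in from disk, update the table, and retry the access; that part is left out here.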
74

Responsibilities of OS (III)
File management
keeps file handles and position marks
Note: These let us access bytes in a file without keeping track of our place.

CPU
schedules processes (i.e., running / ready / waiting)
Note: running = process is using CPU; ready = process is capable of making progress by using CPU; waiting = process cannot make progress without something else (user interaction, etc.).

Security
prevents one user from damaging another's data
prevents user from damaging operating system

A Little About Documentation


See "common errors" (Top 15) link on Carmen
Resource: OSU Center for the Study and Teaching of Writing (CSTW)
see link on web page (also under Resources on Carmen)

Clarification of some elements of the Programmer's Guide:
Data Structures / Types
Data Element Dictionary

Especially important: shared elements


76

Documenting Data Structures


Consider a structure representing a cell:
struct MemoryCell { char Bit[CellSize]; };

Give name, declaration, description, invariant, purpose,


invariant: (∀ i : 0 ≤ i < CellSize : Bit[i] = 0 ∨ Bit[i] = 1)

Also use pictures and English to help in the description
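A documented invariant is most useful when it can also be checked. The sketch below writes the MemoryCell invariant as a C++ predicate. It assumes CellSize is 20 (taken from the "array of 20 characters" description in the type documentation) and that the bits are stored as the characters '0' and '1'; the helper names invariant_holds and clear are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value; the declaration on the slide leaves CellSize unspecified.
constexpr std::size_t CellSize = 20;

struct MemoryCell {
    char Bit[CellSize];
};

// Returns true iff every element of m.Bit is the character '0' or '1'.
bool invariant_holds(const MemoryCell& m) {
    for (std::size_t i = 0; i < CellSize; ++i) {
        if (m.Bit[i] != '0' && m.Bit[i] != '1') return false;
    }
    return true;
}

// Hypothetical helper: set every bit to '0' so the invariant holds initially.
void clear(MemoryCell& m) {
    for (std::size_t i = 0; i < CellSize; ++i) m.Bit[i] = '0';
    assert(invariant_holds(m));  // every operation should re-establish the invariant
}
```

Operations on the shared type can call invariant_holds in an assert on exit, so a violation is caught in the module that caused it.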


77

39

Data Structures (II)


These are the shared data structures
used in the declaration of variables (even local)
variable active_cell: MemoryCell;

used in the declaration of other types


type MemoryType = array [256] MemoryCell;

used as parameter types in operation signatures


function decode (MemoryCell m): integer

For more OO designs, you have types


78

Documenting Data Types


Type: MemoryCell
contents: array of 20 characters
description: used to represent a ...
invariant: every character is a 0 or a 1
operations:
set_bit (int)
read_bit (int) : character
initialize (string)

Then need to specify each operation


79

40

Documenting Operations
Name: CheckHeaderSyntax()
description: This function checks whether or not a given string conforms to the required syntax for header records (see section 2.3.4)
calling sequence:
input: char *s - header record to be checked
returns: boolean - true iff header syntax ok
requires:
ensures:

Visible State
Basic principle: information hiding
hide implementation details from client
simplifies interface
client shouldn't rely on these details

Same principle applies to specifications


given in terms of visible (abstract) state

For each shared type, then, there are two kinds of specification: internal & external
81

41

Data Element Dictionary


Should contain all shared data elements
types, variables, and constants
documentation of classes should be elsewhere, not in the Data Element Dictionary

Give pertinent information:


name, type, declaring module, description of use, any invariant, value (for constants)

82

Lecture #8

83

42

Testing
1. Philosophy
2. Example
3. How-tos (including code) and caveats
4. Levels of Testing

84

Testing: Philosophy

85

43

Definition of Testing
What is testing?
A process whereby we increase our confidence in an implementation by observing its behavior

Fundamental point:
testing can detect the presence of mistakes, never their absence!
Note: That's why we should never be confident in our code without testing!

A test case reveals a defect ==> Fix it!
No test case reveals a defect ==> Not enough testing!

Importance of Testing
Despite limitations, testing is the most practical approach for large systems
Knuth quotation:
"Warning: I've only proven this algorithm is correct; I haven't tested it!"
Note: Haha

87

44

The Right Frame of Mind


Tests should be written to break a program
not to show it works! Be mean!

When a test reveals an error, that's success!
Good approach: have someone else test your code
Note: This is one of the best things about working in a team.

88

Theory
3 levels of abstraction in functionality
Want: the idea
Have: implementation
Testing requires comparing it against something, but what?

Idea
Specification (Note: capturing this idea into a concrete form.)
Implementation

89

45

Theory (II)
Ideal: test against our idea
but the idea is usually too fuzzy

So make it concrete by writing specification
defines desired mapping from input to output

Input -> Specification -> Expected Output
Input -> Implementation -> Actual Output
Testing: compare expected and actual

Note: If different people write two different implementations based on the same specification, both should have the same expected output.

Testing: An Example

91

46

Example: Sorting a List


Idea: function sorts a list in ________ order (Note: ARE_IN_ORDER)
Spec: void sort (List& x)
requires: |x| <= 100
modifies: x
ensures: (Note: For all i in List, ARE_IN_ORDER(List(i), List(i+1)); elements(SORTED(List)) = elements(List))

Q: do we really need the expected output?
i.e., why not just look at actual output and see if it is sorted?

Expected Output
A: (Note: #x is a permutation of x, and for all y in x, ARE_IN_ORDER(y, y+1))
Specifications often relate final states to initial ones
but not necessarily true, e.g., void f(int & x)

47

Testing: How Tos and Caveats

94

Importance of Independent Testing


See IEEE Computer, Oct 1999 (J. D. Arthur, et al.)
study at NASA Langley had two groups working in parallel

The group with independent testers found:


more faults overall (critical and non-critical)
found these faults earlier in the process
fixed these faults with less effort

48

Figure 1 from Arthur article

96

Figure 2 from Arthur article

97

49

Lecture #9

98

How To Choose Test Input


Too many possible inputs to test them all
space of possible inputs defined by requires

On MIDTERM / FINAL MEMORIZE

Important kinds of test input:


extremes (e.g., empty list, |x| = 100)
trivial / degenerate (|x| = 1, x is already sorted)
Note: simple cases that are almost too simple. They make the procedure not do work.
error-generating
Note: at least one (probably more) test cases that generate each possible error message
different categories (e.g., pos./neg. numbers)
Note: Also, inputs that cross categories.
typical input (random list)

50

How To Generate Expected Output


1. By hand
error-prone and tedious

2. With another program
also error-prone
often just redoing the implementation, and making the same mistakes!

3. Work backwards
an inverse may be easier to calculate
e.g., start with a sorted list, and permute it

Alternate: Validating Output


1. Keep a copy of the input
2. Run the program
3. Validate the actual output against input

Example: sorting
write two functions:
copy the input
run program and check:

Checking functions may be simpler than the full implementation
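The three validation steps can be sketched for the sorting example as follows. This is an illustration under assumptions, not the course's test harness: List is modeled as std::vector<int>, "in order" is taken to mean non-decreasing, and the names is_sorted_list, is_permutation_of, and validate_sort are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed representation; the slides leave the List type open.
using List = std::vector<int>;

// Check 1: each element is <= its successor.
bool is_sorted_list(const List& x) {
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        if (x[i] > x[i + 1]) return false;
    }
    return true;
}

// Check 2: a and b hold the same elements with the same multiplicities.
bool is_permutation_of(const List& a, const List& b) {
    return a.size() == b.size() &&
           std::is_permutation(a.begin(), a.end(), b.begin());
}

// Validating form of a test for any routine with the signature void(List&).
template <typename SortFn>
bool validate_sort(SortFn sort_under_test, List input) {
    assert(input.size() <= 100);     // the spec's requires clause: |x| <= 100
    const List original = input;     // 1. keep a copy of the input
    sort_under_test(input);          // 2. run the program
    return is_sorted_list(input) &&  // 3. validate the actual output against the input
           is_permutation_of(original, input);
}
```

Both checks are needed: a routine that overwrites the list with zeros produces sorted output, and only the permutation check catches it.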

51

INSERT: Code for Testing Harness
Insert 1: Test Driver for Sort
Insert 2: Test Suite for Sort
Insert 3: Validating Form of Test Driver

102

Dangers with Testing


Expected output is wrong
Testing program is wrong
extra code means more chances to mess up
e.g., is_permutation(A,B) always returns true

With these errors, there are 2 dangers:
1. reporting a non-error
2. not reporting an error
Note: #1 could be reporting errors that are not errors. #2 could be not reporting errors. #2 is worse. That would be a problem.

Which is worse?
Not reporting error! (ON TEST)


103

52

Dangers with Testing (II)


A third, more subtle, potential error:
The specification is wrong
how can this be?
Note: It can be vague on a point. It could say something other than what the author intended (conflicts with the original idea).
may not be exposed during testing
to increase the chances of finding these problems, have someone else test your code!
Note: Make sure that the client is checking progress.

Testing: Levels

105

53

Levels of Testing
Typical testing path:
1. Unit tests (Note: testing individual pieces of the software independently.)
2. Integration tests (Note: testing how the individual parts communicate.)
3. System tests (Note: testing entire program behavior.)

106

Unit Tests
Individual modules tested in isolation
Two flavors:
1. Black box: testing based only on specification (tester doesn't even look at code)
2. White box: testing based on code structure (e.g., tester makes sure every branch of a switch statement is followed)

107

54

Integration Tests
Modules tested in combination in order to check the interfaces
Best done incrementally
[Diagram: module hierarchy containing Main, Initialize, Load, Simulate, FileIO, Header, Reserve, and SetMemory; "test here" labels mark the interfaces between modules]

Bottom-up vs. Top-down Testing


Bottom-up
start with most basic modules
easy to exercise all the features
write a driver in place of higher-level modules

Top-down
start at top (main)
test interfaces early
write stubs in place of lower-level modules
Note: "stub" on test.

Often these two occur simultaneously, in tandem
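The difference between a stub and a driver is easier to see in code. In the sketch below, a higher-level module depends on a lower-level file reader only through an interface, so a stub can replace the reader and a driver can replace the caller. Every name here (ReadLines, count_records, read_lines_stub, drive_count_records) and the record format are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical interface of a lower-level module: read all lines of a file.
using ReadLines = std::function<std::vector<std::string>(const std::string&)>;

// Module under test: counts the non-empty records in a file.
// It reaches the lower-level module only through the ReadLines interface.
int count_records(const ReadLines& read_lines, const std::string& path) {
    int n = 0;
    for (const std::string& line : read_lines(path)) {
        if (!line.empty()) ++n;
    }
    return n;
}

// Stub: stands in for the real file reader (top-down testing).
// It ignores the path and returns canned data.
std::vector<std::string> read_lines_stub(const std::string&) {
    return {"H header", "", "T text"};
}

// Driver: stands in for the missing higher-level caller (bottom-up testing)
// by calling the module directly and checking the result.
bool drive_count_records() {
    return count_records(read_lines_stub, "unused.obj") == 2;
}
```

Because the interface was defined first, the real file reader can later be swapped in for the stub without changing count_records.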


109

55

System Tests
Verify that system as a whole meets the requirements and specifications
Three flavors:
1. alpha: by developers, before release
2. beta: by friendly customers, before general release
3. acceptance: by end customer, to decide whether or not to hire you next time!

Lecture #10

111

56

Technical Writing

112

What Is Technical Writing?


Writing we do as part of our jobs
Possible purposes:
Inform
Instruct
Persuade
Call to action
Note: Occasionally, entertainment is alright.

Missing from this list:


Entertain
113

57

Four Characteristics of Effective Technical Writing


1. Engages a specific audience
2. Uses plain and objective language
3. Stresses presentation (obvious structure, understandable at a glance)
4. Employs visual aids

114

Why Bother?
Communication is fundamental in society
Politics, law, science, personal lives, health,

Fundamental in personal success:


Good idea
Ability to communicate that idea

Highly valued by employers

115

58

Writing in Computer Science


Taking exams
Documentation (for users and developers)
Reports and memos
Papers (journals, conferences, magazines)
Proposals
Reviews of others' work
Books

Good and Bad News


The bad news
Most of us are not very good at it
We enjoy technical challenges much more

The good news
Writing is actually not too different from computer science!

117

59

Parallels Between Writing and Computer Science


Programming | Writing
Must identify/determine: | Must identify/determine:
User | Audience
Purpose of program | Purpose of document
Program features | Depth of document
User interface | Style and tone
Preprogramming | Prewriting

Parallels Between Writing and Computer Science II


Software Development | Technical Writing
Requirements and design | Prewriting
Implementation (i.e., coding) | Composition
Testing | Reviewing
Debugging | Revising

119

60

Analyzing the Audience


Few people read this stuff for fun
Must correctly identify a customer and what that customer's needs are
Consider writing an audience profile
Novice, technician, expert, manager, VC,
Reading level
Motivations, biases, expectations,

Analyzing the Audience II


Make this analysis concrete by stating assumptions made on background, motivation, needs, etc.
"The reader is expected to be familiar with the predicate calculus."
"This manual is designed for application programmers who write S47G applications for the insurance industry."
Note: It's a good idea to say somewhere the requirements for the audience.

61

Technical Audience
Function-oriented organization
E.g., alphabetical listing of all functions

Want a complete (exhaustive) resource


All information they might want is there somewhere

Willing to spend a great deal of time


Will read the document carefully
122

Technical audiences: function-oriented. Customer audiences: task-oriented => more of a tutorial. (They don't care about "why".)

Customer Audience
Prefer task-oriented organization
E.g., enumerated steps for each possible task

Want just the necessary information


Only the information critical to their jobs is there

Will spend as little time as possible


Must be concise and easy to read
123

62

Identify the Purpose


Rule of business letters and memos:
Begin with clear statement of what you want!

Larger documents are no different


What information are you trying to impart?
What are you trying to teach?
What view do you want the reader to adopt?
What action do you want done?

The Depth of Writing


Bloom's taxonomy of cognition:
Note: (categorization) things people do when they think.
1. Knowledge (Note: Facts)
2. Comprehension (Note: Understanding of a fact's implication)
3. Application (Note: How we apply facts to the current situation.)
4. Analysis (Note: Creating new ideas from those we already have.)
5. Synthesis (Note: Joining what we know together, creating connections to other topics and ideas.)
6. Evaluation (Note: Judging ideas based on their importance and validity.)

63

Verbs Used in Statement of Purpose


Knowledge
count, define, draw, identify, indicate, list, name, quote, recall, recite, recognize, record, state, tabulate, trace, write.

Comprehension
associate, compare, compute, contrast, describe, differentiate, discuss, distinguish, estimate, extrapolate, interpolate, predict, translate.
126

Verbs Used in Statement of Purpose II


Application
apply, calculate, classify, complete, construct, demonstrate, employ, examine, illustrate, practice, relate, solve, use.

Analysis
analyze, detect, explain, group, infer, order, relate, separate, summarize, transform.

127

64

Verbs Used in Statement of Purpose III


Synthesis
arrange, combine, construct, create, design, develop, formulate, generalize, integrate, organize, plan, prepare, prescribe, produce, specify, research.

Evaluation
appraise, assess, critique, determine, evaluate, grade, judge, measure, rank, select, test.
128

Prewriting: Getting Started


Read the question / problem statement carefully.
Make a list of the required cognitive tasks.
Assess what you know.
Compare this knowledge with the level of the required cognitive task

129

65

Example
"Compare the performance of two cache replacement algorithms"
Cognitive tasks
Compare
Contrast
Maybe analyze and recommend?

Recall various issues in cache algorithms


130

Prewriting Tasks
Quick list
Specific points for each cognitive task

Brainstorm
List everything you know about the topic
Do not judge or weed anything out
Objective: quantity

Review list
Assess where research is needed
131

66

Prewriting Tasks II
Choose a single point that will be in the final product and outline a section to develop that point.
Involve other points as appropriate.
Do the research.
Plan the format.
This is not a linear process!

Prewriting Tasks III


Outlining
Get the planned structure down
Avoid forgetting a key point
Check for the logical flow of arguments and information
This is an easy step to skip, but good work here will pay dividends in the future!

133

67

Lecture #11

134

Writing the Document


The better the preparation in the prewriting phase, the smoother this goes.
Regardless, it's still work!
Requires tools, skills, practice, experience, and motivation.

135

68

Technical Writing: Document Component Engineering


Component = section of the document.
Often has its own heading.
Large components consist of smaller ones.
Each (large enough) element from the outline becomes a component

136

Advantages of Components
The whole document is too intimidating
Obvious milestones
Reduces panic (you know where you stand)
Permits time budgeting

Reduces writing to a step-by-step process
Instant gratification
Easy cure for writer's block: work on a different section

69

Writing a Component
Know the purpose
Have all the information
Different strategies:
Write a draft using sentences
Jot down points in any form, then flesh out into sentences
Combination (sentences, phrases, points)

Overcoming Writer's Block


Start anywhere
If the problem is lack of information, go back and do more research
Explain it to someone else (verbally)
Work on a different section
Take a walk (in a snow storm?)
Imagine life when you are done

70

Overcoming Writer's Block II


Force yourself to sit at your desk until done
Revise your outline or organization
Revise some section you've already written
Change your environment
Diagram the structure of the component
Set an impossible schedule and panic
Take a break

Rhetorical Patterns
Every culture has well-established patterns of exposition
Ready-made structures into which specific information may be dropped

The reader is already familiar with these patterns The technical writer does not have the time (or skill) to invent new ones
141

71

General-to-specific Pattern
Often used for introductory section
Start with the most general statement
"More computing resources are devoted to the management of data than to any other task"

Gradually get more specific
"This program simplifies the manipulation of numeric data on a personal computer"

Finally specific statements


142

Classification Pattern
Organize information by dividing it into categories
E.g., a section on each of initialization, data entry, selection, access control, etc

Within each category, present parallel information


E.g., purpose, prerequisites, results, error messages, alternatives, references, etc
143

72

Comparison-contrast Pattern (Point-by-point)


Consider one aspect at a time
In football, the ball may be thrown forward from behind the line of scrimmage
In rugby, only lateral passes are allowed
In football, play ends when the ball-carrier is tackled
In rugby, play continues after a tackle, but the tackled player must release the ball

Comparison-contrast Pattern (Whole-to-whole)


Two ways to create a new object (zack)
1. From the command line
On the command line, type "edit zack"
Fill in attributes of the presented template
Select "Save" from the File menu

2. From within the application
Select "New" from the File menu
Fill in attributes of the presented template
Select "Save As" from the File menu, and type "zack"

73

Definition Pattern
Typically short and simple
Example
"Undo is a function that restores an object to its state immediately prior to the last operation"
Places Undo in the class of functions, then distinguishes it from other functions

146

Chronological Pattern
Typical for task-oriented instructions
Given in the order in which they must be performed

147

74

Effect-and-cause Pattern
Often used for error messages
Give a list of error messages
ordered alphabetically, by error number,

For each message, list the possible causes


ordered most to least likely

After each cause, give the action(s) the user should take to recover
148

Putting Components Together


Add headings (part, chapter, section, )
Add transitions where needed
Important to prompt reader for what to expect, or to reinforce that some change is coming

149

75

Possible Transitions
Moving to the next point in a sequence
Firstly, secondly,

Contrasting item or viewpoint


However, on the other hand, otherwise,

A result or conclusion
Therefore, in consequence,

Relating things in time


Now, then, soon, immediately,
150

Possible Transitions
Introducing an example
For example, that is,

Further strengthening a point


Moreover, similarly, further,

Concluding
In conclusion, in summary, finally,

151

76

Preliminary Draft
Starting is always difficult
Helps to remember it's just a draft!

Don't worry about spelling, grammar, form
Spend effort on sound communication of major points
Fill in your outline

152

Middle Draft
Build on the base of the preliminary draft
Refine the organization and fill in points
Ensure each point belongs in that paragraph
Cut and paste
Play with the text
Font, layout, spacing, page count

153

77

Final Draft
Spelling and grammar
Run the spell checker, but that's not enough!

A "which" hunt
Word choice
Transitions
Typos
Pagination


154

Revising
Where bad writing becomes good writing
First draft is always bad
Tempting to become attached to text we've written
Write the first draft anticipating that it will change in the future

155

78

Revising Tasks
Add flow and smooth transitions
Careful, accurate movement from one point to the next, one section to the next
Make decisions that have been put off
Reduce wordiness
Clarify subordination relationships between points

Red Flag - Inconsistent View


Changing from 2nd to 3rd person
Inconsistent: "Limit your disk storage to 100 Mb. The user can submit a request for more storage space to the system administrator."
Consistent: "Limit your disk storage to 100 Mb. For more storage space, you can submit a request to the system administrator."

157

79

Red Flag - Passive Voice


The verb expresses what is done to the object (by someone or something)
Occasional use is OK (and even unavoidable in many technical documents)
But excessive use weakens your writing
Passive: "This error is used to indicate"
Active: "The parser issues this error to indicate"

Red Flag - Wordiness


Wordy: "In the final analysis, the end result of a wordy document is increased cost in terms of pages of paper, bytes on a disk, and inefficient use of the reader's time"
Concise: "Words cost money. It is cheaper to print a short book than a long one."

159

80

Red Flag - Faulty Parallelism


The use of different grammatical constructs in a parallel structure
Consider the list:
Preparing for installation
How to configure
Do you want the advanced options?

160

Red Flag - Dangling Modifier


The use of a verbal phrase that does not connect with (or modify) anything else in the sentence
Dangling: "After typing enter, the system will continue with the second pass over the program"
Fixed: "After you type enter, the system will continue with the second pass over the program"

161

81

Red Flag - Ending a Sentence With a Preposition


Example:
"Before using the software, you must set it up"
"You must set up the software before you can use it"

Winston Churchill:
"That is criticism up with which I will not put."

162

Red Flag Provincial and Sexist Language


Unless you are sure your readership is homogeneous, be sensitive to and inclusive of many cultures and both genders Example: A list of names:
Bill White, Ken Williams, and Bob Smith Chris Amini, Lea Sanchez, and Rei Chi Lee

163

82

Red Flag Which vs. That


Simple rule: if "that" sounds OK, use it!
Use "which" with nonrestrictive clauses
Use "that" with restrictive clauses
Example:
Ed's country house, which is located on five acres, had bats in the attic.
The house that sat on the top of the hill had bats in the attic.

Red Flag - Utilize


Why would anyone utilize the word utilize when the word use would work just as well?

165

83

Things to Check Appropriate Style


Verbose vs. terse
Formal vs. informal
Use of contractions, informal language, slang

Tone
Distant, warm, intrusive

Consistency is very important

166

Bottom Line
Technical writing requires work, practice, skill, technique, time; not talent.
The first draft is always bad writing. Allow time (and energy) for revisions.
There is no substitute for having something to say. You can't bluff it.

167

84

Lecture #12

168

CVS

169

85

CVS: Concurrent Versions System


Widely used, especially in the open-source community, to track all changes to a project and allow multiple access
Can work across networks

Key Idea: Repository


The place where the originals and all the modifications to them are kept.
Each person checks out their own private copy
Changes are committed by each person
Everyone else's changes are updated into your own copy.

170

CVS: Examples
The following examples show an existing project being put under CVS How to start using the repository Then two different people making changes:
Putting modified file into repository Getting each others changes. Finding out how things have changed.

You can read more:


man cvs, or emacs:
^U^Hi/usr/local/info/cvs.info

Pointer to manual in Carmen:Content:Resources

171

86

The Repository
Two ways to set the root of the repository
Environment variable
setenv CVSROOT /project/c560ab05/CVSREP

Command line flag (-d)


-d /project/c560ab05/CVSREP

Repository may contain several modules

172

Creating Repository
Once per project, by one person (with umask 7)
Command:
cvs init

Creates repository root, administrative files Check that group and other permissions have been properly set. (Use ls -alF.)
173

87

Creating Repository (Example)

% cd /project/c560ab05/
% ls
Lab1/ Lab2/
% cvs -d /project/c560ab05/CVSREP init
% ls
CVSREP/ Lab1/ Lab2/

174

Adding Existing Project


Once per project, by one person
Command:
cvs import <module> <vendor> <release>

Copies current directory contents to module
Afterwards, original source can be removed
But be careful!!

175

88

Adding an Existing Project (Example)


% cd Lab2
% ls
loader.c loader.h simulator.c simulator.h
% cvs -d /project/c560ab05/CVSREP import sim A start
[editor starts; save log entry; exit editor to cont.]
N sim/loader.c
N sim/loader.h
N sim/simulator.c
N sim/simulator.h

No conflicts created by this import

% (cd to parent folder & carefully remove Lab2 folder)

Checking Out
Once per person Command:
cvs checkout <module>

Copies repository files to local, working directory


Local directory contains CVS subdirectory for administrative book-keeping
177

89

Checking Out (Example)


% cd ~person1/mycode/
% cvs -d /project/c560ab05/CVSREP checkout sim
cvs checkout: updating sim
U sim/loader.c
U sim/loader.h
U sim/simulator.c
U sim/simulator.h
% ls
sim/
% ls sim/
CVS/ loader.c loader.h simulator.c simulator.h

Committing
Commit (copy) changes made on local (working) files to the repository Command:
cvs commit

New files created in local working directory must be explicitly added (before commit)
cvs add <new-file>

179

90

Committing (Example)
% cd ~person1/mycode/sim
% (modify loader.c and create memory.h)
% cvs add memory.h
cvs add: scheduling file memory.h for addition
cvs add: use cvs commit to add this file permanently
% cvs commit
cvs commit: Examining .
[editor starts; type & save log entry; exit editor to cont.]
Checking in memory.h;
/project/c560ab05/CVSREP/sim/memory.h,v <-- memory.h
Initial revision: 1.1
done
Checking in loader.c;
/project/c560ab05/CVSREP/sim/loader.c,v <-- loader.c
New revision: 1.2; previous revision: 1.1
done

Updating
Each person, with appropriate frequency
Command:
cvs update

Brings your local working directory up-to-date with repository (merging differences if possible)
U: local file was updated
A/R: local file added/removed
M: local file is a modification of repository
C: conflict detected between local file and repository

91

Updating (Example)
% cd ~person2/mycode/sim
% ls
CVS/ loader.h loader.c simulator.h simulator.c
% cvs update
cvs update: Updating .
U loader.c
A memory.h
% ls
CVS/ loader.c simulator.h loader.h memory.h simulator.c

182

Working on Project
Multiple people can simultaneously checkout the same module Person1 and Person2 are both working away on their local copies
If working on different files, no problem
Not quite true!

If they are working on the same file, there could be a conflict


183

92

Conflict Resolution
Person1 checks out code
modifies loader.c

Person2 also checks out code


also modifies loader.c

Person1 commits: no problem Person2 commits:


% cvs commit
cvs commit: Examining .
cvs commit: Up-to-date check failed for loader.c
cvs [commit aborted]: correct above errors first!

Resolving Conflicts
CVS tries to merge changes Sometimes changes clash
% cd ~person2/mycode/sim
% cvs update
cvs update: Updating .
RCS file: /project/c560aa/CVSREP/sim/loader.c,v
Retrieving revision 1.5
Retrieving revision 1.6
Merging differences between 1.5 and 1.6 into loader.c
rcsmerge: warning: conflicts during merge
cvs update: conflicts found in loader.c
C loader.c

93

Resolving Conflicts: Human


Note how all non-overlapping modifications are incorporated in your working copy, and that the overlapping section is clearly marked with `<<<<<<<', `=======' and `>>>>>>>'.
int main(int argc, char **argv)
{
    init_scanner();
    parse();
    if (argc != 1) {
        fprintf(stderr, "tc: No args expected.\n");
        exit(1);
    }
    if (nerr == 0)
        gencode();
    else
        fprintf(stderr, "No code generated.\n");
<<<<<<< loader.c
    exit(nerr == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
=======
    exit(!!nerr);
>>>>>>> 1.6
}

When to Update/Commit
When confident things can be used by others
Don't wait until perfection
Your commits should at least compile though!

One should update before committing


Integrates everyone else's changes

Update when you are ready for someone else's work
The more files, the better

94

Subversion: a more modern alternative


Subversion is also available on stdsun, as svn. (Must subscribe to SVN.)
Documentation is available at http://svnbook.red-bean.com/

188

Distributed Version Control


No central repository Example (free) tools:
Mercurial Git

189

95

Lecture #13

190

Introduction to (and Review of) Assembly Language

191

96

Definition
Recall: translation (vs. ______________)
source program translated into target program (virtual) execution of the target on its VM should represent (have the same behavior as) (virtual) execution of the source on its VM source is not directly executed target (object file) is executed or translated later
192

Definition II
When the source is a symbolic representation of machine language:
source language = __________________ translator = __________________

(When the source is higher-level, the translator is usually called a ____________)

193

97

Advantages of Assembly Language


Over machine code (lower level)
easier to remember mnemonic operations than actual opcodes
e.g., ADD, SUB, MUL, DIV, ... vs. 04, 2C, F6, F6,

similarly for addresses in program


e.g., BR LOOP1 vs. BR 46554
194

Advantages of Assembly Language II


Over higher-level languages
access to full capabilities of the machine
e.g., testing overflow flag, test-and-set instruction, how would you do that in PASCAL or Modula?

performance ?

195

98

The Best of Both Worlds


Systems programming is often done in a language like C
syntax of a higher-level (problem-oriented) language
but gives access to low-level machine, like assembly language

196

An Old Notion, Now Mistaken


If a program will be used a lot, it should (for efficiency) be written in assembly language.
No longer true!

197

99

Good compilers Fast machines Hard to write


10 lines of code / day, independent of language

Hard to read
high cost of maintenance
can be 2/3 of total 15% (annual) programmer turnover
198

Modern Approach
Write in high-level language
Analyze to find where time spent
Invariably, it's a small part of the code
Tune that tiny part for high performance
perhaps by writing in assembly language

199

100

Modern Approach II
Higher level can be a performance win too!
problem-oriented language gives problem-level insights huge performance gains are in algorithmic insights
e.g., O(n^3) vs. O(n lg n)

assembly language programmer tends to be immersed in bit-twiddling (saves small amounts all over, but misses big picture)
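To put rough numbers on the O(n^3) vs. O(n lg n) comparison, here is a minimal sketch that counts abstract "steps" only. The function names are illustrative, and constant factors (where bit-twiddling helps) are deliberately ignored.

```python
import math

def cubic_steps(n):
    """Step count of a hypothetical O(n^3) algorithm."""
    return n ** 3

def nlogn_steps(n):
    """Step count of a hypothetical O(n lg n) algorithm."""
    return n * math.log2(n)

# Show how quickly the gap grows with the input size.
for n in (10, 1_000, 100_000):
    ratio = cubic_steps(n) / nlogn_steps(n)
    print(f"n={n:>7}: n^3 / (n lg n) = {ratio:,.0f}")
```

A constant-factor speedup from hand-tuned assembly cannot close a gap that itself grows with n.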
200

Modern Approach III


Conclusion:
assembly language use is often a holdover from when machines were expensive, and people were cheap

201

101

So Why Do We Learn This Stuff?


You may still need to write that tiny, critical part in assembly language
Concepts/techniques similar for compilers
Good vehicle for understanding architecture
Legacy code with large parts in assembly language
This world still needs assemblers!
many compilers translate to assembly language

Assembly Language Instructions


We've seen the basic structure in memory:
OP CODE OPERANDS

Four parts to an instruction in assembly language:


1. Label
2. Operation
3. Operands
4. Comments

102

Example Instruction
Test BRZ 1,Loop ;if R1=0 goto Loop

label

operation

operands

comment
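The four fields can be separated mechanically. The sketch below is one possible tokenizer, not the course's reference design. It assumes that a comment starts at the first ';', that a label is present only when the line starts in column 1, and that fields are separated by whitespace; none of these are stated on the slides.

```python
def parse_line(line):
    """Split one source line into (label, operation, operands, comment)."""
    # Everything after the first ';' is comment text (assumed syntax).
    code, _, comment = line.partition(";")
    # Assumption: a label exists only if the line does not start with a blank.
    has_label = bool(code) and not code[0].isspace()
    fields = code.split()
    label = fields.pop(0) if has_label and fields else ""
    operation = fields.pop(0) if fields else ""
    operands = fields.pop(0).split(",") if fields else []
    return label, operation, operands, comment.strip()

print(parse_line("Test BRZ 1,Loop ;if R1=0 goto Loop"))
```

Keeping the comment out of the returned code fields matches the later point that comments have no effect on translation.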

204

Label Field
Symbolic name for an instruction's or a datum's address (often, but not always)
Clarifies branching to a particular instruction
e.g., BR Loop1
Also allows symbolic access to data
e.g., IO 2,depth
Often severely limited in length


205

103

Operation Field
Mnemonic for an instruction
e.g., ADD, SUB, BRZ

Mnemonic for a pseudo-instruction


e.g., NMD
we'll see what these mean later...

206

Operand Field
Addresses and registers used by instruction
recall: arguments to the function

What to add, where to branch, where to store, ...
Operands for pseudo-instructions
used to give information to the assembler
e.g., program name, how much space to save, ...

104

Comment Field
No effect on translation
no semantic impact on program

But huge impact on legibility!


clarify the program
strictly for human consumption

208

Lecture #14

209

105

Example Program
If we want:
N := I + J + K;

we might write (in SPARC assembly language) something like

210

Example Program (Continued)


set I_s, %r2        ! %r2 = I_s
ld  [%r2], %r2      ! %r2 = [I_s] = I
set J_s, %r3
ld  [%r3], %r3      ! %r3 = J
set K_s, %r4
ld  [%r4], %r4      ! %r4 = K
add %r2, %r3, %r2   ! %r2 = I + J
add %r2, %r4, %r2   ! %r2 = I+J+K
set N_s, %r3
st  %r2, [%r3]      ! N = I + J + K

106

Pseudo-Operations
Recall: operation field can be either:
instruction (BR, SHL, ...)
pseudo-op

Unlike operations, do not have a machine instruction (opcode) equivalent
Give information to the assembler itself
assembler directives

SPARC Pseudo-Operations
I_s: .word 0
J_s: .word 0
K_s: .word 5
N_s: .word 0
A_s: .skip 400

213

107

Pseudo-Ops: Uses
Four principal uses:
segment definition symbol definition memory initialization storage allocation

214

Segment Definition
Recall information in header record:
initial execution address segment name length load address

All this information comes from pseudo-ops


(all except ______________ )
215

108

Segment Definition II
Two important pseudo-ops:
ORI (origin)
END (end)

MainP ORI 133
      ST  0,136
      . . .
      END 137

What is the header record of the object file? (footprint?)

(133) 85x 86x

ST 0,136

89x

Header record: H89MainP_85??


217

109

Symbol Definition
A label creates a symbol
Symbol is often implicitly defined to be the address of that instruction and/or data
Hello ORI 133
      ST  0,136
Test  BRZ 1,147
      . . .

What is the value of Test?


218

Symbol Definition II
(133) 85x ST 0,136 86x BRZ 1,147

So Test has value: _________


219

110

Explicit Symbol Definition


Symbols can also be defined explicitly
Pseudo-op:
EQU (equate)

Example:
ACC EQU 0 ;set ACC to 0

Symbols are used as program constants

220

Use of Symbols
Example 1: ADD ACC,106
translates as: i.e.:

Example 2: NOut EQU 2 IO NOut,Count


translates as: i.e.:
221

111

. . .
Address  Memory  Instruction/Data
B0       700A0
B1       020BA
B2       B3800
B3       92008
B4       B38B4
B5       000B0
B6       B30B6
B7       90008
B8       B3000
B9       C00FF
BA       546F5

. . .

222

Memory Initialization
Recall "Top\n" example
Sometimes want to load data into memory
might be able to use corresponding instruction (because machine doesn't care!)
but that is inconvenient (and there may not always be a corresponding instruction)

Two pseudo-ops (for 2 kinds of data):


NMD (numeric data) CCD (character data)
223

112

NMD and CCD


Format for these pseudo-ops:
numeric data is decimal integer
character data is two characters

Example:
Count NMD 10007
Text  CCD He
      CCD ll
      CCD o!

NMD and CCD (II)


These pseudo-ops often have labels
(but not required)
(106) 6Ax
0 2 7 1 7 0 0 0 0

(i.e., 10007 in base 10)
Count = ________
Text = ________

H l o

e l !

225

113

Lecture #15

226

Storage Allocation
Set aside a block of memory
not initialized (i.e., don't care)
Pseudo-op:
RES (reserve storage)

Example:
X      NMD 0
Buffer RES 100
Y      NMD 0

114

Yields:

Q: How would we typically address locations in this buffer?

228

Using Blocks of Storage


A:
Example:
NIn EQU 0
. . .
IO NIn,Buffer

Note: some pseudo-ops affect the location counter (storage), others don't
do:
don't:

115

Symbols - Shortcomings
We've seen a lot of utility for symbols
mnemonics for data constants & memory addresses

But they are sometimes inconvenient:


consider initializing a register to 582
i.e., R1 <-- 582

must explicitly name and allocate a constant!


C582  NMD 582
start LD  1,C582

(why not just LDI 1,582 ??)


230

Symbols - Shortcomings II
Problems with this approach:
___________________ ___________________ ___________________ ___________________

231

116

Alternative: Literals
Implicit allocation & initialization of memory Allows us to put the value itself right in the assembly language instruction Preface with = Example:
LD 1,=582
232

Literals
This means:
allocate storage at end of program initialize this storage with the value 582 use this address in the instruction

So it is almost equivalent to:


     LD  1,C582
     . . .
C582 NMD 582

117

Literals - Restrictions
Must be in the range -2^19 .. 2^19 - 1
can be represented by 1 word

Can only replace the S field Cannot be indexed Cannot use with:
loading an immediate value, branch, store, shift, IO for reading, IO for writing a character

Each restriction has a motivation
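The range restriction follows from the word size. The sketch below checks a literal against the range above and encodes it as one 5-hex-digit word. It assumes a 20-bit word, which is consistent with the earlier NMD example where 10007 is stored as 02717, and it assumes two's-complement encoding for negative values, which the slides do not state.

```python
WORD_BITS = 20  # assumption: one word is 20 bits (5 hex digits)

def fits_in_word(value):
    """True if value lies in -2^19 .. 2^19 - 1."""
    return -(2 ** (WORD_BITS - 1)) <= value <= 2 ** (WORD_BITS - 1) - 1

def encode_word(value):
    """Encode value as a 5-hex-digit word (two's complement is assumed)."""
    if not fits_in_word(value):
        raise ValueError(f"{value} does not fit in one {WORD_BITS}-bit word")
    return format(value & (2 ** WORD_BITS - 1), "05X")

print(encode_word(582))    # the literal from LD 1,=582
print(encode_word(10007))  # the NMD example
```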


234

Lecture #16

235

118

Big Picture: Labs 2-4


Assembly File
    EG1 ORI 35
        ST  1,Dt
  -> Assembler ->
Object File
    H23EG1 2307
    T2321026
  -> Loader ->
Executable File
  -> Emulator

Big Picture: Lab 3


Assembly Language Program
EG1 ORI 35 ... Buff RES 50 ...

Memory
0 35

50

Footprint

255

237

119

Lab 3
Assembler does not need to keep this (potentially huge) array/footprint
Instead, use tables (symbol, literal, ...) and location counter
Generate object file only (much smaller)

238

Assembler Tasks
1. Parse assembly language instructions
check for syntax
tokenize the character string
2. Assign addresses to instructions and data
maintain location counter (LC)
LC = eventual location in memory of this instruction or data

120

Assembler Tasks II
3. Generate machine code
evaluate mnemonic instruction
replace with opcode
recognize & translate synthetic instructions (RET, etc.)
evaluate operand subfields
replace symbols & literals with value
concatenate to form instruction
4. Process pseudo-ops
generate header record
evaluate NMD, CCD, etc.

Assembler Tasks III


5. Write object output file
header & text records

6. Write listing output file

Nothing here seems all that hard...


241

121

Example
1 Prog  ORI 20
2 Acc   EQU 0
3 Begin LD  Acc,N    ;R0 <- 13
4       ADD Acc,=1   ;R0 <- R0+1
5       ST  Acc,Ans  ;M[Ans]<-R0
6       BR  0,0
7 N     NMD 13
8 Ans   RES 1
9       END Begin

First Attempt
Read each input line and generate machine code
Line 1
information for header
but not enough for full header record
we do know:
20
243

122

First Attempt II
Line 2:
information for assembler symbol Acc set to 0

Line 3:
yeah! An instruction to translate! (LD Acc,N)
yields:

244

First Attempt - Difficulties


Problem:
Lines 4 & 5:
same problem
do not yet know address for =1 or Ans

Solution:

245

123

Now let's see some basic data structures for assemblers...

246

Machine Op Table
Mnemonic Name   Opcode   Instruction Size   Instruction Format

Static (doesn't change during computation)
For the (simple) abstract machine:
all opcodes are 4 bits all instructions are same size (i.e., 1 word) all formats are the same (i.e., O|U2|R|X|U1|S)
247

124

Machine Op Table II
But this need not be the case in general
different opcode lengths
common instrctns. have short opcodes (e.g., 0110) less common ones are longer (e.g., 1110110)

variable instruction length


e.g., branch relative, where PC <- PC + operand
could be near, operand is 8 bits could be far, operand is 28 bits (instruction uses 2 words)

248

Machine Op Table III


varying formats
e.g., long subroutine call operation syntax might be: CAL 19-bit-address-offset
the fixed format of our abstract machine's instructions makes the assembler's parsing easier

249

125

Pseudo Op Table
Mnemonic Length Format

Also a static table
Some lengths are 0, others are 1, or variable


250

Location Counter
Eventual address of this instruction or data Initialized with __________ Increase with each instruction
see _________________

Increase with (some) pseudo-ops


see __________________

251

126

Symbol Table
Name Value Other Stuff

Pass #1
each symbol is identified
every time a new symbol is seen (i.e., a label), insert it into the Symbol Table
if it's already there?

252

Symbol Table II
Pass #1 (continued)
each symbol given a value
explicit assignment (e.g., Acc EQU 0)
easy! just put the value of operand into the table

implicit assignment (e.g., X NMD 13)


must know the address of this instruction so, keep track of addresses as program is scanned use location counter (LC)
253

127

Symbol Table III


Pass #2
symbols in operands replaced with their value
look up symbol in Symbol Table
if there, replace with value
if not there, (Q: how could the symbol possibly not be there?)
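A minimal sketch of Pass #1 for the nine-line example program, showing how the location counter gives labels their values. The size rules are assumptions made for illustration: each machine instruction and each NMD takes one word, RES n takes n words, and ORI, EQU and END take none. Treating the ORI label as a program name rather than a symbol is also an assumption.

```python
# The example program as (label, operation, operand) triples.
PROGRAM = [
    ("Prog",  "ORI", "20"),
    ("Acc",   "EQU", "0"),
    ("Begin", "LD",  "Acc,N"),
    ("",      "ADD", "Acc,=1"),
    ("",      "ST",  "Acc,Ans"),
    ("",      "BR",  "0,0"),
    ("N",     "NMD", "13"),
    ("Ans",   "RES", "1"),
    ("",      "END", "Begin"),
]

def pass_one(program):
    """Return (symbol table, location counter after the last line)."""
    symbols = {}
    lc = 0
    for label, op, operand in program:
        if op == "ORI":
            lc = int(operand)   # origin initializes the location counter
            continue            # assumption: the program name is not a symbol
        if op == "END":
            break
        if op == "EQU":
            value, size = int(operand), 0   # explicit assignment, no storage
        elif op == "RES":
            value, size = lc, int(operand)  # implicit assignment, n words
        else:
            value, size = lc, 1             # instruction or NMD: one word
        if label:
            if label in symbols:
                raise ValueError(f"duplicate symbol: {label}")
            symbols[label] = value
        lc += size
    return symbols, lc

print(pass_one(PROGRAM))
```

Raising an error on a duplicate label is one possible answer to the "if it's already there?" question; it is a design choice, not the only one.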

254

Lecture #17

255

128

Literal Table
Name Address Value Size Other Stuff

Pass #1
literals are identified and placed in the table name, value, and size fields updated duplicates can be eliminated

256

Literal Table II
After Pass #1:
literals are added to the end of the program address field can now be calculated

Pass #2:
literals in instructions are replaced with the __________ field from the Literal Table
what if the literal is not in the table?
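The literal bookkeeping can be sketched the same way: collect literals during Pass #1, eliminate duplicates, then assign addresses starting at the end of the program. The function name and the dictionary layout are illustrative, and one word per literal is assumed.

```python
def place_literals(operand_fields, end_of_program):
    """Build a literal table from operand strings such as 'Acc,=1'.

    Assumptions: a literal is any operand starting with '=', its value is a
    decimal integer, and each literal occupies one word after the program.
    """
    table = {}
    next_address = end_of_program
    for field in operand_fields:
        for operand in field.split(","):
            if operand.startswith("=") and operand not in table:
                table[operand] = {"value": int(operand[1:]),
                                  "address": next_address}
                next_address += 1
    return table

# Operand fields of the example program; 26 is where that program ends
# if it starts at 20 and occupies six words.
print(place_literals(["Acc,N", "Acc,=1", "Acc,Ans", "0,0"], 26))
```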

129

Information Flow (Passes 1/2)


Source File Intermediate File Object File

Pass #1

Pass #2

Symbol Table Location Counter Literal Table Machine Op Table Pseudo Op Table

Listing File

258

Two Pass Assembler: Limitations


Q: Does our 2-pass approach solve all forward-reference problems?
A: no! Something is still broken
Hint:
what is the key invariant (w.r.t. symbols) during pass #1?

259

130

Pass #1 Invariant
Top ORI 34 ... Loop --- --... ... S EQU --...

Key invariant:

current position

So what could go wrong?

How could this happen?


260

Forward Reference Restriction


Consider
Y EQU 0        X EQU Y
X EQU Y        Y EQU 0

To avoid this trouble, impose a restriction:

261

131

1. Pseudo code for 2-pass assembler
2. In-class exercise: hand assembly
calculate symbol and literal tables
calculate loaded image in memory
calculate object file

(Use list of op codes)

262

Lecture #18

263

132

Relocation

264

Absolute Programs
Programmer decides a priori where program will reside
e.g., Prog ORI 176

But memory is shared


concurrent users, concurrent jobs/processes
several programs loaded simultaneously

265

133

Absolute Programs: Limitation


At any given instant:
Memory occupied

Picture is dynamic
jobs are scheduled jobs complete

We cannot predict what this picture will look like!!


266

Absolute Programs: Limitation II


Would like the loading to be flexible
decide at load time where it goes! (not at ____________ time) this decision is made by __________________

What the programmer and user want:


find a free slot in memory that is big enough to fit this program
267

134

Motivating Relocation: Example


Prog  ORI 0
X     NMD 32
Reg   EQU 2
Start LD  Reg,X
      ST  Reg,Y
      BR  0,0
Y     RES 1
      END Start

In memory, this program appears as:

269

135

One Slight Change


Prog  ORI 58
X     NMD 32
Reg   EQU 2
Start LD  Reg,X
      ST  Reg,Y
      BR  0,0
Y     RES 1
      END Start

Appearance of new program in memory:

271

136

Compare the Old and the New

272

What Changed?
Load Address = ________

i.e., ____________ + _____ i.e., ____________ + _____

273

137

Another Slight Change


Prog  ORI 176
X     NMD 32
Reg   EQU 2
Start LD  Reg,X
      ST  Reg,Y
      BR  0,0
Y     RES 1
      END Start

In memory, this program appears as:

274

Relocation
The loader must update some (parts of) text records, but not all
after load address has been determined

The assembler does 2 things:


assemble with a load address of 0 tell the loader which parts will need to be updated
275

138

Modification Records
One approach: define a new record type
Tag Location

For our example:


H01Prog 0005
T0000020
T0102000
T0222004
T03C0000

Modification Records II
We could add the following records to the object file:
M01 M02

For some architectures, longer modification records would be required


for our machine, it is always the same part of the instruction that needs to be relocated
277

139

Modification Records III


One disadvantage of this approach:

278

Alternative: Bit Masks


Use 1 bit / memory cell
bit value is 0 means no relocation necessary bit value is 1 means relocation necessary

Size of relocation data independent of number of records needing modification
Hard to read (debug, grade, ...)

279

140

Compromise (For Our Machine)


Change the syntax of a text record Flag modification with an M at the end of the record Example
H------------M
T00----
T01-----M
T02-----M
T03----
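A sketch of what the loader might do with flagged text records, using the four text records from the earlier example with an M appended to the two that reference X and Y. The record layout (T, two hex digits of address, five hex digits of contents) is read off that example. Treating the address field as the low 8 bits of the word is an assumption made for illustration.

```python
ADDRESS_MASK = 0xFF  # assumption: the address (S) field is the low 8 bits

def relocate(text_records, load_address):
    """Return {memory address: 5-hex-digit word} after relocation."""
    memory = {}
    for record in text_records:
        flagged = record.endswith("M")
        body = record[1:-1] if flagged else record[1:]
        # Every record moves to its new location...
        address = int(body[:2], 16) + load_address
        word = int(body[2:], 16)
        # ...but only flagged records have their address field adjusted.
        if flagged:
            s_field = ((word & ADDRESS_MASK) + load_address) & ADDRESS_MASK
            word = (word & ~ADDRESS_MASK) | s_field
        memory[address] = format(word, "05X")
    return memory

records = ["T0000020", "T0102000M", "T0222004M", "T03C0000"]
print(relocate(records, 58))
```

The data word holding 32 and the BR 0,0 instruction are left untouched, which is the "some parts, but not all" point made above.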

Lecture #19

281

141

Kinds of Data
Our machine has two flavors of data:
relative (to the load address) absolute

The first must be modified, the second not
Let's look at how these kinds arise

282

Example 2
EG2   ORI
TS    EQU 27        ! TS = 27
V     NMD 27        ! [V] = 27
Start LDI 1,V       ! R1 = V = ?(relative)
      LD  2,0(1)    ! R2 = [0+V] = 27
      LDI 3,TS      ! R3 = TS = 27
      LD  0,0(1)    ! R0 = [0+V] = 27
      LD  1,0(1)    ! R1 = [0+V] = 27
      SUB 1,=27     ! R1 = 27 - 27 = 0
      BRZ 1,Stop    ! if (R1 is 0) then halt
      BR  3,Start   ! goto Start
Stop  BR  0,V(3)    ! halt; dump all
      END

283

142

Symbols
Some are relative:
e.g.,

Some are absolute:


e.g.,

Symbol Table
Name Value Relative?

284

Symbols: Rules
A symbol is absolute if and only if it is defined in an EQU by:
_____________, or _____________________

(we'll see another way later)

285

143

Literals
Our machine does not have relative literals
Other machines allow a special literal, =*, to mean current location counter
e.g., LD 1,=*
such a literal is relative, others are absolute

Name    Value   Address   Relative?
Star1
=6

Literals
With literals, "relative" refers to the value
the addresses are always relative!

287

144

Relocation Information in the Symbol Table: Example


Prog  ORI
X     NMD 32
Reg   EQU 2
Start LD  Reg,X
      ST  Reg,Y
      BR  0,0
Y     RES 1
      END Start

After pass #1, the symbol table is:
Name Value Relative?

288

Convention
To denote a relocatable program, omit the operand of ORI
Prog1 ORI 96   (absolute)
Prog2 ORI      (relocatable)

289

145

Tables: Storing & Searching

290

Overview
Given: a collection of <tag,value> pairs
e.g., symbol table

Searching =
given a tag, return corresponding value

Q: is this spec ok?

291

146

Intentionality of Specification
What do we do if key not in table?
a) return an arbitrary value
b) crash, halt, explode
c) return a special value (NULL, error, ...)

Traditionally, we want (c)
But what if client knows key is in table? Pay for this extra checking with each call to search?
The defensiveness dilemma: maybe options a) and b) look better for production, if checking components are available for development.
The point: intentionally decide what your specification is, and document your decision.
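The two specifications can sit side by side, each with its decision documented. This is only a sketch using a Python dictionary as the table; the names `search_checked`, `search_unchecked` and `MISSING` are hypothetical.

```python
MISSING = None  # option (c): the special value returned for an absent key

def search_checked(table, key):
    """Spec: returns the value for key, or MISSING if key is not in table."""
    return table.get(key, MISSING)

def search_unchecked(table, key):
    """Spec: requires key to be in table.

    The caller guarantees the precondition, so no special value is reserved.
    In Python a violation raises KeyError, which corresponds to option (b).
    """
    return table[key]

symbols = {"Acc": 0, "Begin": 20}
print(search_checked(symbols, "Ans"))      # absent key, special value
print(search_unchecked(symbols, "Begin"))  # caller knows the key is there
```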

Relevance for Assemblers


Every line (almost) has an instruction or a pseudo-op
search tables

Can have lots of symbols and literals


need to create and search tables

Assemblers can spend 50% of the time searching tables!!


so, it's important this be efficient

147

Linear Search
Algorithm:
compare target with 1st key
if match, then done (return value)
else, compare target with 2nd key
if match, then done (return value)

Advantages:
294

Linear Search - Complexity


How long to insert a new <tag,value> pair?

How long to find a target?


best case: worst case: average case:

Average case assumes a distribution


$T_{avg} = \sum_i p(i)\, t(i)$

148

Linear Search - Complexity II


E.g., for a distribution where the probability of seeking a key not present in the table is zero and is equal for all other keys (a uniform distribution),
Tavg =

One way to lower average case complexity:


____________________________________
296

Linear Search - Complexity III


Overall search complexity is ( )
i.e., double table size

Time

Table Size
297

149

Lecture #20

298

Binary Search
Algorithm to search among 2 or more items
compare target with middle key
if target ≤ middle key, then search first half
if target > middle key, then search second half

This algorithm requires:


a) b) c)
299

150

Binary Search - Complexity


Each iteration divides the problem in half
T(n) =
Time

Table Size

300

Binary Search - Complexity II


Example:
table with 1,000,000 entries: _________ table with 2,000,000 entries: _________

Drawback: simple insertion is linear


sorted sorted sorted + + sorted

time to build =

301

151

Building Sorted Tables


Solution #1: use a heap
insertion is faster, (lg n) but more complicated algorithm

Solution #2: build, then sort


in assembler, pass #1 does mostly insertions, then pass #2 does mostly searches create as a linear table ( ) sort ( ) use binary search on sorted table
302

Estimated Search
How do we search for things?
e.g., finding Brutus in phone book binary search? No! Make a guess

For table search, must know ____________


zzzz Key AAAA 0 Table Index N-1
303

152

Hash Tables - Overview


Combines:
strength of binary search (fast _________) strength of linear search (fast _________)

Hash function: converts keys to integers


$h: K \to Z_{N-1}$, where $Z_{N-1} = \{0, 1, 2, \ldots, N-1\}$
N =
K =

Hash Functions
Used to insert and to search
insert(key) --> into h(key) search(key) --> look in h(key)

Ideal: generate a unique integer for each key


but this ideal does not (usually) exist
because:

So what we really look for in h:


returns a number in [0..N-1] with uniform distribution
305

153

Designing Hash Functions


1 Probability 1

0 AAAA Key zzzz

0 0 h(Key) N-1

Want: small differences in key result in big differences in h(key)


like a deterministic random number generator
306

Example Hash Function


h(key) = (Σ letter values) mod 50

For example:

h (magenta ) = (13 + 1 + 7 + 5 + 14 + 20 + 1) mod 50

But two different keys could be mapped to same integer


h (rub ) = (18 + 21 + 2) mod 50 = h (madre ) = (13 + 1 + 4 + 18 + 5) mod 50 =

This is called a collision
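The example hash function is small enough to write out. The sketch assumes a = 1, ..., z = 26 and ignores case, which matches the letter values used in the magenta example.

```python
def letter_value(ch):
    """a -> 1, b -> 2, ..., z -> 26 (case is ignored)."""
    return ord(ch.lower()) - ord("a") + 1

def h(key, table_size=50):
    """Sum of letter values, reduced modulo the table size."""
    return sum(letter_value(ch) for ch in key) % table_size

for word in ("magenta", "rub", "madre"):
    print(word, h(word))
```

Because addition is commutative, any two keys with the same letters in a different order also collide under this function.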

307

154

Dealing with Collisions


a) Keep a linked list: chaining

collisions

b) Go to next open cell : open addressing


yields

insert
308

Dealing with Collisions II


this approach can lead to clustering: big blocks of occupied slots

c) Rehash with another function: open addressing with quadratic probing or double hashing (to approx. uniform hashing)
cost: ___________________
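Approach b) can be sketched in a few lines. The code below uses the letter-sum hash from the earlier example (redefined here so the block stands alone) and probes forward one cell at a time. The function names and the use of `None` for an empty cell are illustrative choices.

```python
def letter_sum_hash(key, table_size=50):
    """Sum of letter values (a=1 .. z=26) modulo the table size."""
    return sum(ord(ch.lower()) - ord("a") + 1 for ch in key) % table_size

def insert_open_addressing(table, key, hash_fn=letter_sum_hash):
    """Insert key using 'go to next open cell'; return (slot, attempts)."""
    n = len(table)
    start = hash_fn(key, n)
    for attempt in range(n):
        slot = (start + attempt) % n   # wrap around at the end of the array
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot, attempt + 1
    raise RuntimeError("table is full")

table = [None] * 50
print(insert_open_addressing(table, "rub"))    # lands in its home cell
print(insert_open_addressing(table, "madre"))  # collides, takes the next cell
```

Each collision extends a run of occupied cells, so later keys that hash anywhere into the run also land at its end; this is how clusters grow.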

155

Complexity - Insertion
For approach c) (uniform hashing), how long does an insertion take?
#attempts depends on _________________
e.g., first entry never collides (1 attempt)

Let r be the fraction of the rep. array that is full


probability of a collision = probability of success =
310

Insertion II
Let p(t) = prob. insertion requires t attempts

The expected number of attempts is given by:

$\sum_{i \geq 1} p(i)\, i$

Example: if a rep. array is half full, how many attempts does the next insertion take?
311

156

Complexity - Building a Table


Example:
array size = 1000, wish to insert 900 elements how long does this take? first item: ______ (so a lower bound: _______) 901st item: _____ (so an upper bound: ______)

Problem: each insertion takes more time

312

Building a Table II
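A = $\int_0^X \frac{1}{1-r}\, dr$ = ... = _________
[Figure: # attempts (vertical axis, starting at 1) plotted against r from 0 to X; A is the area under the curve]

In our example: X = .9; array size = 1000
E(# of insertion attempts) ≈ 2.303 * 1000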
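The integral can be checked against a direct sum. The sketch assumes uniform hashing, so that an insertion into an array with fill fraction r is expected to take 1/(1 - r) attempts, and it compares the summed cost of 900 insertions with the closed form obtained by integrating that expression. The function names are illustrative.

```python
import math

def summed_attempts(array_size, items):
    """Sum of 1/(1 - r) over the insertions, with r = k / array_size."""
    return sum(array_size / (array_size - k) for k in range(items))

def integral_attempts(array_size, items):
    """array_size times the integral of 1/(1 - r) dr from 0 to X."""
    x = items / array_size
    return array_size * math.log(1 / (1 - x))

print(summed_attempts(1000, 900))
print(integral_attempts(1000, 900))
```

The two figures agree to within a fraction of a percent, and the integral form reproduces the 2.303 * 1000 figure since ln 10 is about 2.303.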

157

Searching a (Hash) Table


On average, how many attempts needed (for both insertion and search(!))?
$V \cdot X = A = \ln\frac{1}{1-X}$
$V = \frac{1}{X} \ln\frac{1}{1-X}$
[Figure: # attempts plotted against r from 0 to X, with average height V]

Lecture #21

315

158

Make

316

Prerequisite DAG
Large project: mix of files generated by people and by tools Contents of one file often depend on contents of some others (acyclic structure)
human final product(s)
317

machine

actions

159

Advantages of Make Tool


Automates the whole creation process
Describe DAG and actions once, then build entire structure with one command

Permits distribution of partial DAG to client
Automates the partial creation process
Identify subDAGs that are out of date and need to be rebuilt, and invoke corresponding actions
318

Example Applications
Latex documents
tex, bib, bbl, dvi, ps

Report generation and filtering Compiled and linked code, object files, and executables
cpp, h, o, a, exe

(note: there can be more than 1 final product)


319

160

Example Prerequisite DAG

320

Makefile
DAG is represented in a makefile
target: prerequisites
	command
	command
(the lines above together form one rule)

Notes:
command lines begin with tab
line continuation with \
Comment lines begin with #

161

Processing Makefiles
Default: first target is the final product
Rule: if any prerequisite is newer than target (or target does not exist), then execute associated commands
But first (and in any case): ensure all prerequisites are up to date!
recurse to rule that has prerequisite as a target
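The processing rule is a short recursion over the prerequisite DAG. The sketch below models it with dictionaries instead of a real file system: `rules` maps a target to its prerequisites and one command, `mtimes` maps existing files to timestamps, and executed commands are appended to `log`. All of these names are illustrative, and this is not how any particular make implementation is written.

```python
def build(target, rules, mtimes, log):
    """Bring target up to date, recording executed commands in log."""
    if target not in rules:
        # A source file: nothing to run, but it must exist.
        if target not in mtimes:
            raise FileNotFoundError(f"no rule to make {target}")
        return
    prerequisites, command = rules[target]
    # First (and in any case): make sure every prerequisite is up to date.
    for prerequisite in prerequisites:
        build(prerequisite, rules, mtimes, log)
    out_of_date = target not in mtimes or any(
        mtimes[p] > mtimes[target] for p in prerequisites)
    if out_of_date:
        log.append(command)
        # Model "the command just wrote the target" with a fresh timestamp.
        mtimes[target] = max(mtimes.values(), default=0) + 1

rules = {
    "prog":   (["prog.o"], "cc -o prog prog.o"),
    "prog.o": (["prog.c", "defs.h"], "cc -c prog.c"),
}
mtimes = {"prog.c": 1, "defs.h": 2}
log = []
build("prog", rules, mtimes, log)
print(log)
```

Calling `build` a second time without touching a source file runs nothing, and changing only `defs.h` rebuilds both `prog.o` and `prog`.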

Special Rules Phony Targets


Targets without prerequisites
Always out of date (commands executed)
clean:
	rm -f *.o edit

Rules with no commands


Forces recursion to update prerequisites
all: gui edit doc.ps

162

Variables
Frequently used strings can be replaced by variables Defined with = and referenced with $ Example
CC = g++
CFLAGS = -g
prog: prog.c defs.h
	$(CC) -o prog prog.c $(CFLAGS)

Implicit Rules
Describe when and how to remake files based on their name (extension)
  E.g., <file>.o depends on <file>.c
  The associated command is cc -c <file>.c
Omit the rule entirely
  Prerequisite and command implicitly provided
Omit only the command
  Command implicitly provided

Explicit Pattern Rules


Contains one % character in target
  Matches any non-empty string
  E.g., %.dvi : %.tex
Q: How can we write the command associated with such a rule?
A: Automatic variables
%.dvi : %.tex
	latex $<
	latex $<

Automatic Variables
Not standard between make tools
GNU (gmake):
  $@ - target filename
  $< - name of first prerequisite
  $? - names of all prerequisites newer than target
  $^ - names of all prerequisites
Not restricted to pattern rules

To Read More About Make


make: man make
gmake: in xemacs:
  ^U^Hi/usr/local/info/make.info

Lecture #22
Review for Midterm


Lecture #23
Midterm


Lecture #24


Expressions


Introduction
Most assemblers permit use of expressions
Used as instruction operands
  in machine ops and pseudo ops
Typically simple mathematical forms only
Two parts:
  operators (+, -, *, /)
  individual terms

Introduction II
Individual terms may be:
  constants (e.g., 4, 'A', 0x3F)
  user-defined symbols (e.g., X, Buff)
  special terms (e.g., * for LC)
  parenthesized expressions (e.g., (X-Z) in (X-Z)/2)
Examples
  Buff RES 4
       ST  2,Buff
       ST  2,Buff+1

  Buff RES 4
  BEnd EQU *
  Len  EQU BEnd-Buff

Relocation
Expressions are evaluated at ____________
  (not entirely true, as we'll see later)
Expressions can be relative or absolute or illegal
Intuition:
  the value of an absolute expression _________ __________ with program relocation
What are the rules for well-formed expressions?

Absolute Expressions
An expression is absolute iff:
  1. it contains only absolute terms, OR
  2. it contains relative terms provided:
     i) they occur in summation pairs, AND
     ii) terms in each pair have opposite sign, AND
     iii) relative terms do not enter in * or /
Examples of 1: 3+4+0x2, 2*X (where X is absolute)
Examples of 2: Buff2-Buff

Relative Expressions
An expression is relative iff:
  1. all relative terms can be paired as above, except one, AND
  2. that remaining unpaired term is positive, AND
  3. relative terms do not enter into * or /
Examples:
  Buff+6

Motivation
These restrictions are not arbitrary
They ensure the expression is meaningful after relocation
If the restrictions are not met, the expression is erroneous

Examples
Name  Value  R/A
X     16     R
Z     6      R
Y     4      A

X+1-Z =
2+X/Y =
Z+X =
Z-Y =
Y-Z =
(X-Z)/2 =
((X-Y)-(Z+Y))*Y =
(X/2) - (Z/2) = (X-Z)/2 =

Generalization
A relative value has the form:
  LL + OFFSET
  LL: load location
  OFFSET: independent of load location (i.e., absolute)
An absolute value has the form:
  A
  A: independent of load location

Generalization - Examples
R1 - R2 = (LL + OFF1) - (LL + OFF2) =
R1 - A = (LL + OFF1) - A =
A - R1 = A - (LL + OFF1) =
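
One way to experiment with the LL + OFFSET view is to track the coefficient of LL explicitly. The sketch below is an illustration only: the class `Val` and the function `classify` are hypothetical names, and the sketch takes one possible reading of the rules, in which a parenthesized subexpression that has already become absolute may enter * or /.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Val:
    """A value ll * LL + offset, where LL is the (unknown) load location."""
    ll: int      # coefficient of LL: 0 means absolute, 1 means relative
    offset: int

    def __add__(self, other):
        return Val(self.ll + other.ll, self.offset + other.offset)

    def __sub__(self, other):
        return Val(self.ll - other.ll, self.offset - other.offset)

    def __mul__(self, other):
        if self.ll or other.ll:
            raise ValueError("relative term enters into *")
        return Val(0, self.offset * other.offset)

    def __floordiv__(self, other):
        if self.ll or other.ll:
            raise ValueError("relative term enters into /")
        return Val(0, self.offset // other.offset)

def classify(v):
    """Coefficient 0 is absolute, 1 is relative, anything else is illegal."""
    return {0: "absolute", 1: "relative"}.get(v.ll, "illegal")

# Symbols with the values used in the exercise table: X and Z relative, Y absolute.
X, Z, Y = Val(1, 16), Val(1, 6), Val(0, 4)
one = Val(0, 1)

print(classify(X + one - Z))   # the two LL terms cancel
print(classify(Z + X))         # 2 * LL does not survive relocation
print(classify(Z - Y))
print(classify(Y - Z))         # the unpaired relative term is negative
```

The coefficient makes the motivation visible: a result is meaningful after relocation only if it depends on LL zero times or exactly once.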

Lecture #25


Loaders


What, Again?
So you ask:
  Haven't we done this already? I built one in Lab #2!
Our view then was simplistic:
  Source Program --> Compiler --> Object File --> Loader --> Memory
But there are problems with this view

Problems
Programmer is responsible for putting absolute addresses in code
  error-prone
Programmer decides where object code goes in memory
  better to give OS control
  allows multiple concurrent jobs & users (dynamic scheduling)

Problems II
Program must be self-contained
  would prefer to allow separate assembly
    make one change, do not have to recompile the whole thing
  would prefer to use libraries
    for frequently used code
    functions used by many applications (e.g., strcpy(), sqrt(), sin(), ...)

Problems III
would prefer to have the flexibility to write different parts in different source languages
  some languages are better suited to certain tasks than others
  library functions could be written in a single language (rather than rewriting for every possible source language)

More General View of Compilation and Loading


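[Figure: source files, object files, library files, and memory]
  C Program (.c) --> C Compiler --> .o
  Fortran Program (.f) --> Fortran Compiler --> .o
  Assembly Program (.s) --> Assembler --> .o
  Library Files (.a)
  Object files and library files --> Loader --> Memory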

General Loaders
This requires standardizing the format of the object file
Each source language translator then follows this standard

Summary of Advantages
Needn't worry about address arithmetic
More than 1 program in memory at once
Assemble code once
Separate assembly
Libraries
Multiple languages

Steps Before Execution


1. Translation
produce object file from source

2. Allocation
select area in memory for program

3. Relocation
adjust address references in object file

4. Linking
combine multiple object modules

5. Loading


Types of Loaders
Loaders differ in how these tasks are accomplished
Types include
  compile-and-go
  absolute
  relocating
  linking
  dynamic loading
  dynamic linking

Compile-and-Go
Observation: an executing translator is, itself, a process governed by a program residing in memory!
Reserve memory at the end of its block
As source is compiled, object code is placed directly into this reserved memory
  loader is really just part of the compiler
  example: WATFOR Fortran

Picture
[Figure: Source Program --> Translator, which is resident in Memory and places the object code there]

Advantages / Disadvantages
Advantages:
  speed: need not produce intermediate file (I/O is always slow)
  batch environments: compiler remains resident in memory (so very low start-up cost)
Disadvantages:
  must recompile every time you run
    no object file produced
  libraries likely to be source-based
  requires more memory (program + compiler)

Absolute Loaders
Familiar loader from Lab #2
Consider responsibility for each task:
  allocation:
    calc. length of all modules _______
    calc. actual load location ________
  relocation: ______________
  linking: ________________
  loading: ________________

Absolute Loaders II
Advantages:
simple, fast, small, programmer-controlled

Disadvantages:
  program must be self-contained: programmer must edit library subroutines into one assembly language source file
  to run at a different memory location, reassembly is required

Special Case
Observation: any loader is a program
i.e., resides in memory; executes as a process

Q: What loads the loader?
  the OS? then what loads the OS?
From an idle machine, we need a way to start things up
Solution: a bootstrap loader

Bootstrap Loader
Store a special program in ROM
This program is automatically executed at power-up
This program is an absolute loader
  reads records from an input device
  puts them in a predetermined (absolute) location
Control is then transferred to loaded program (which can load other things, etc.)

Lecture #26


Relocating Loader
We need the assembler to do 2 things:
  flag relative values (e.g., with modification records)
  produce size-of-segment information
machine code

Loader works with OS to determine load location (dynamic)



Relocating Loader II
Loader performs relocation
  adds LL to all relative values
  can be done in a single pass
Advantage:
  more efficient packing of memory
Disadvantage:
  no external subroutines or libraries
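
The single relocation pass can be sketched as follows. This is a simplified illustration, not the Lab loader: the function name `relocate`, the flag list, and the sample words are all made up, and the sketch adds the load location to the whole word rather than to an address field inside it.

```python
def relocate(words, relative_flags, load_location):
    """Single pass: add the load location to every word flagged as relative."""
    return [word + load_location if is_relative else word
            for word, is_relative in zip(words, relative_flags)]

# Hypothetical segment assembled as if it started at address 0.
segment = [0x10, 0x02, 0x7F]
flags = [True, False, True]   # which words the assembler flagged as relative
loaded = relocate(segment, flags, 0x100)
print([hex(w) for w in loaded])
```

Because each word is adjusted independently of the others, one pass over the segment is enough.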

Time for a slight detour, as a motivational aside for the remaining loader types

Subroutine Linkage


Motivation
Example: want to calculate a square root
  first write our program (in assembly)
  now how do we use this code?
Idea #1: embed right in the code of main
  no code reuse
  program gets big and hard to read
  error-prone, tedious, ...

Motivation II
Idea #2: have a separate section
       ORI
       ...
  Sqrt LD  0,=0
       SHR ...
       etc.
From main we need to jump to this code

Branching
Want to branch:
  to Sqrt, and
  (after we're done) return to caller
So, 2 simple branches don't work
  first branch is no problem (BR Sqrt)
  but what does the return look like???
Solution: for our machine, we can use

Branch-to-Subroutine
I.e., BRS R,S(X)
Example: 3Fx : BRS 1,Sqrt
  loads the PC into register R
  branches to location Sqrt
Q: what is the new value in register 1?
A:

Returning from a Subroutine


How do we return to the caller?
Want: PC <-- R
In our machine, we can use: __________
At the end of Sqrt, we have
  BR 3,

(Aside: our synthetic instruction, RET, is a more direct way of expressing this)

Using the Sqrt Subroutine


Prog  ORI
Soln  RES 1
      ...
      LD  1,=625
      BRS 3,Sqrt
      ST  1,Soln
      BR  0,1
Sqrt  ST  1,Value
      SHR ...
      BR  3,
Value RES 1
      END

Calling Conventions
Program and subroutine must agree on:
  where to branch for function
  where to return when done
  where to put/get argument(s)
  where to put/get result(s)
In previous example:
  return address in register 3
  argument in register 1
  result in register 1

Calling Conventions II
So conventions are required
  e.g., caller always places return address in register 3
    if function uses r3, must save value first
  caller pushes return address onto stack
  caller stores return address in first word of subr.
These conventions are generally not checked at assembly time
  contrast this with higher-level languages

All this talk of subroutines is great
But we're still missing...

Separate Compilation
Would like to have in our program the line:
  BRS 3,Sqrt
where Sqrt is a label in a different program!
(Aside: what does your current assembler do with such a thing?)
We extend our language and provide a (typical) mechanism for resolving this...

Lecture #27


New Pseudo Op: EXT


1. EXT symbol_name
  "external"
  indicates that symbol_name is defined in a different program
  legal to use, but assembler can't fill it in
  Prog ORI
       EXT Sqrt
       BRS 3,Sqrt
       END

New Pseudo Op: ENT


2. ENT symbol_name
  symbol_name is defined in this program
  it is a global symbol
  i.e., may be referenced in other programs
These pseudo ops change the scope of a symbol
  local (default) -----> global (with ENT)
Q: why not make global the default, or make them all global?

Example: Two Programs


Main ORI              Subr ORI
     EXT Sqrt              ENT Sqrt
     ...                   ...
     CALL Sqrt        Sqrt SHR 1,1
     ...                   ...
     END                   RETURN
                           END

These two programs can now be:


  Independently written
  Independently assembled into 2 object files
How does the linkage between these object files get resolved?
So now back to loaders

Binary Symbolic Subroutine Loaders (BSS)


One of the first relocating loaders
1956: IBM, GE, UNIVAC

Allows multiple program segments (control sections in your textbook)


  different languages
  different times
    Separate compilation!
Let's examine each of the tasks in turn

Tasks for BSS Loader


Allocation
  assembler calculates each segment length
  loader adds them all up
  load location obtained from OS
Relocation
  assembler flags words for relocation (bit masks)
  loader makes modifications

Tasks for BSS Loader II


Linking
  by loader (with help from assembler)
  a restricted form
  uses a transfer vector
Loading
  by loader

Transfer Vector
Contains 1 entry per external symbol used by this program segment
Assembler sets aside room at the beginning of the object file for TV
Assembler places symbolic representation of referenced external symbols in TV
[Figure: object file with the Transfer Vector (entry: sqrt) placed ahead of the Program]

Transfer Vector II
Assembler replaces all calls to external symbols with calls to appropriate locations in TV
Loader replaces the entries in TV with calls to the appropriate location
  Source:                  0   ORI
                               EXT  Sqrt
                               ...
                               CALL Sqrt
  After assembler:         6   Sqrt      (transfer vector entry)
                           7   CALL 6    (relative)
  After BSS loader:        66  CALL 32
                           67  CALL 66
  (the Sqrt routine ends with RETURN)

Disadvantages of Transfer Vector


Overhead
  time (extra call instruction)
  space (for transfer vector in object file)
Works for subroutine calls, but what about for sharing data?
  e.g., LD 1,XValue
  this cannot be replaced with a call to TV, or even a load (with memory direct addressing) from the TV

Data Sharing with BSS Loaders


Permit 1 common (shared) data segment
All external data is in this one DS
[Figure: data segment DS containing X; code segment CS containing LD 1,X]
Assembler replaces X with its (relative) address in the data segment

Impact on Relocation
What does this mean for relocating the program?
There are 2 different kinds of relative
Assembler must distinguish them
  extend relocation information
  e.g., use 2 bits per word
    00 - absolute
    01 - relative (to CS load location)
    10 - relative (to DS load location)

Lecture #28


Direct Linking Loaders


Introduction
General linking/loading strategy
Very common in modern systems
And used in Lab #4!
Advantages:
  separate assembly
  multiple control and data segments
  lower time overhead (in program execution)
  lower space overhead (in run-time footprint)

Assembler Responsibilities
1. Header information
  length of segment
  execution start address
2. List of entry symbols
  those defined in this segment
  gives their (relative) value
3. Mark each reference to an external symbol
  used in this segment
  defined outside of segment

Assembler Responsibilities II
4. Relocation information
modification records

5. Machine code
text records

There are some new things here, which suggests defining some new record types

Entry Record
List and define all the entry symbols
Possible format:
  <Flag> <symbol_name> <value>
Examples:
  ESqrt 0E (possible because symbols have fewer than 7 characters)
  ESqrt=0E
Idea: provide information to loader

For Lab #4
We'll adopt the following conventions:
  A program's name is always (implicitly) an entry symbol
  Entry symbols must be relative
    (If you wish to handle absolute, too, that's up to you)

External Record
Can be combined with text and modification records if you wish
Examples:
  LD 1,9       T1F01009
  LD 1,Num     T1F01002M
  LD 1,Enum    T1F01000XEnum
  1E           T1F01000XE
Format:
  T <addr> <machine_code> X <symbol_name>

External Records II
Such a record tells the loader to:
  find seg. that defines that (external) symbol
  find the value of that symbol within that seg. (i.e., look at the corresponding entry record!)
  add this value to the one in the text record
  add the load location of the seg. that defines the symbol to the text record
    this last step is just like the usual relocation operation of relative symbols, but using the LL of the segment that defines the symbol

Lecture #29


Direct Linking Loaders: Algorithm and Data Structures


Algorithm & Data Structures


Problem is similar to assembler:
  resolve symbols (with forward references)
  use these values
Solution is similar too!
  use _______________
Pass #1: find definition of all external symbols
Pass #2: aggregate, relocate, link, load

Pass #1
Q. What does the assembler tell the loader about each ENT symbol?
A.
So, to determine the actual symbol value, loader must calculate: ______________ + ______________
For lab #4, we can load the segments into one contiguous block of memory

Loaded Memory

[Figure: Seg. 1, Seg. 2, and Seg. 3 loaded contiguously, starting at PLA (program load address)]

Pass #1: Pseudocode


calculate total size (all segments)
get PLA (from OS)
SLA <-- PLA
for each segment do:
  if (it's Main Seg.) {IPC = SLA + Header's IPC}
  add entry symbols with their absolute values (stated value + SLA) to external symbol table (EST)
    (if symbol already present, flag an error)
  calculate next SLA (SLA += Seg. Len.)
rof

Example
Main ORI              Lib  ORI
     EXT Pnum              EXT Num
     ENT Num               ENT Pnum
     BRS 3,Pnum       Pnum IO  2,Num
     BR  0,0(0)            BR  3,0(3)
Num  NMD 7                 END
     END

External Symbol Table


For our example:
  Name    Value

(assume: ______________________ )
Note: for lab #4, can restrict external symbols to be relative only

Pass #2: Pseudocode


SLA <-- PLA
for each Seg. in same order as pass 1
  for each text record in Seg.
    calculate memory location
    relocate record:
      absolute
      relative
      external
    load the word
  rof
  SLA += Seg. size
rof
transfer control to IPC, start of Main segment
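
The two passes can be sketched together in Python. This is a rough illustration under stated assumptions, not the Lab #4 design: the segment dictionaries, the record kinds "A"/"M"/"X", and the function name `link_and_load` are hypothetical, each word stands for its address field only, and the IPC bookkeeping and implicit program-name entry symbols are left out. The sample data loosely mirrors a Main/Lib pair with entry symbols Num and Pnum, loaded at an assumed PLA of 0x40.

```python
def link_and_load(segments, pla):
    """Two-pass direct linking loader over simplified segment descriptions."""
    # Pass 1: build the external symbol table (EST) with absolute values.
    est = {}
    sla = pla
    for seg in segments:
        for symbol, value in seg["entries"].items():
            if symbol in est:
                raise ValueError(f"duplicate entry symbol {symbol}")
            est[symbol] = value + sla
        sla += seg["length"]

    # Pass 2: relocate, link, and load every text record.
    memory = {}
    sla = pla
    for seg in segments:                     # same order as pass 1
        for addr, word, kind, symbol in seg["text"]:
            if kind == "M":                  # relative: add this segment's LL
                word += sla
            elif kind == "X":                # external: add the symbol's value
                if symbol not in est:
                    raise ValueError(f"undefined external symbol {symbol}")
                word += est[symbol]
            memory[sla + addr] = word        # kind "A" is loaded unchanged
        sla += seg["length"]
    return est, memory

main_seg = {"length": 3, "entries": {"Num": 2},
            "text": [(0, 0, "X", "Pnum"), (1, 0, "A", None), (2, 7, "A", None)]}
lib_seg = {"length": 2, "entries": {"Pnum": 0},
           "text": [(0, 0, "X", "Num"), (1, 0, "A", None)]}

est, memory = link_and_load([main_seg, lib_seg], pla=0x40)
print(est)
print(memory)
```

Checking for an undefined symbol while processing each record in pass 2 costs nothing extra when there are no errors to report.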

Recommended exercise: assemble, link, and load our example (assuming PLA of ________ )

Lecture #30


Checking External Symbols


Q. When do we check whether an external symbol is actually defined?
A. In addition to entry records at the top of the object file, there might be special external records, too, for EXT symbols declared.
We may be tempted to rely on the "fact" that the external symbols used are a subset of the EXT symbols declared; we could try to gain speed with a conservative approach that complains when no definition is found for a declared EXT symbol, regardless of whether it's used. (Look only at entry and special external records for this check.) Could be done after pass 1 or in pass 2.

Checking External Symbols II


However, such a strategy is not robust; the "fact" is only surely true in files produced by the assembler.
Good news: robustness and liberal accuracy can both be achieved here without (in the case of no errors to report) sacrificing speed.
During pass 2, when each text record is processed, report an error if the symbol is not in the EST.

Unifying X and M
Don't really need 2 separate mechanisms!
Recall meanings of
  T_______M
  T_______XSym
    Sym is in EST
    add this value of Sym to address field
Recall that segment names are always in EST
This suggests that X can be seen as a more general form of M!

Replacing M with X
Prog ORI
     ...
Loop - - -
     ...
     BR  3,Loop
     ...

T 05 C3002 M
  or
T 05 C3002 XProg

Linking with Libraries


Common functions often defined in libraries
Library linking can be made implicit:
  after pass 1, may still have unresolved external symbols: symbols from EXT declarations that are not (yet) in EST
  if so, search libraries for matching definitions and load them (after pass 1?)
  still some unresolved externals? Then error
Typically, user-specified libraries searched first, then standard ones (automatically)

Lecture #31


Loader Refinements and Optimizations


Problem: Space
Consider a program that calls sqrt, rnd, and substr
Each defined in its own (large) library
So, linked and loaded program is huge
Solutions (for saving memory):
  virtual memory and paging
  dynamic loading
  dynamic linking

Dynamic Loading
Observe: program does 1 thing at a time
  don't need all segments present simultaneously
Example
[Figure: tree of segments with sizes: A 200, B 500, C 300, D 300, E 200, F 400]
Total Size = 1.9 Mb

Dynamic Loading - Overlays


B/D never together with C/E/F
Define an overlay structure for how segments can be swapped in and out
3 scenarios:
  A (200), B (500), D (300): Total Size = 1000
  A (200), C (300), E (200): Total Size = 700
  A (200), C (300), F (400): Total Size = 900
Only 1 Mb needed (length of longest path)
Trade-off: memory space & time
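
The memory requirement of an overlay structure is the longest root-to-leaf path in the segment tree. The short Python sketch below checks the arithmetic for this example; the tree shape (A at the root, B above D, C above E and F) is read off the three scenarios and is an assumption about the original figure.

```python
# Segment sizes (in Kb) and the assumed overlay tree.
sizes = {"A": 200, "B": 500, "C": 300, "D": 300, "E": 200, "F": 400}
children = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}

def longest_path(node):
    """Size of the largest set of segments that must be resident together."""
    below = (longest_path(child) for child in children.get(node, []))
    return sizes[node] + max(below, default=0)

total = sum(sizes.values())     # everything loaded at once
needed = longest_path("A")      # with overlays
print(total, needed)
```

The saving is paid for in time: segments on different paths must be swapped in and out as execution moves between them.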


Dynamic Linking
Instead of branching directly to an external symbol, program issues a call request to OS
subroutine name is parameter for request

OS responsibilities
  keep table of loaded libraries
    loads new library if needed
    manages swapping of libraries as appropriate
  transfer control to appropriate subroutine
  return control to original program

Dynamic Linking II
Binding: the association of an actual address (5E) with a symbolic name (Sqrt)
Dynamic linking delays binding from load time to execution time (late binding)
Advantages:
  many programs can share 1 loaded library
  library can be recompiled on-the-fly
  library only loaded if actually used

Problem: Time
Every time we want to execute a program, must re-link, relocate, and re-load
  costly if object code hasn't changed
Idea: separate these two operations
  Object File --> Linkage Editor --> Linked Program --> Relocating Loader --> Memory
linkage editor does the binding
loader does alloc'n/reloc'n/loading
  small, simple, fast

Linking and Loading in Practice


Real object files have multiple sections
  Code (C): fetch for execution only (instructions)
  Data (D): no fetches for execution (storage)
Why distinguish between C and D sections?
  Multiple processes executing same program
  Load one copy of C (text) segment(s), and share it
  Each process gets its own copy of D (read/write) segments
When linking multiple object files:
  group C (D) sections together as segments

Object File Format: UNIX a.out


[Figure: a.out file layout, in order: header, text, data, text reloc'n, data reloc'n, symbol table, string table]
[Figure: process memory layout: text, data, bss, heap, stack]

Unix a.out Header Structure


int a_magic;   //magic number
int a_text;    //text seg size
int a_data;    //data seg size
int a_bss;     //uninit data size
int a_syms;    //symbol table size
int a_entry;   //entry point
int a_trsize;  //text reloc'n size
int a_drsize;  //data reloc'n size
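
As a rough illustration of how such a header could be read, the sketch below unpacks eight consecutive integers from the start of a file image. It assumes 32-bit little-endian fields, which varies between machines, and the sample values are made up; `parse_header` is a hypothetical helper, not part of any existing tool.

```python
import struct

FIELDS = ("a_magic", "a_text", "a_data", "a_bss",
          "a_syms", "a_entry", "a_trsize", "a_drsize")

def parse_header(raw, byte_order="<"):
    """Unpack the eight header fields from the first 32 bytes of raw."""
    values = struct.unpack(byte_order + "8I", raw[:32])
    return dict(zip(FIELDS, values))

# Made-up header contents, packed the same way so the example is self-contained.
sample = struct.pack("<8I", 0o407, 4096, 512, 128, 240, 0, 64, 16)
header = parse_header(sample)
print(header)
```

With the sizes in hand, a loader can locate each later part of the file by adding up the lengths of the parts that precede it.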

Unix a.out Relocation Entry


One entry (8 bytes) for each location to be patched
  Handles both relocation and external symbols
  Fields: address, index, flags
Address: location (offset within segment) to patch
Extern flag (1 bit):
  Off: plain relocation (index gives segment)
  On: ext symbol (index is symbol no. from sym table)
Length: (2 bits) patch item is 1, 2, 4, or 8 bytes

Unix a.out Symbol Table


Each entry (12 bytes) describes 1 symbol
  Fields: name offset, type, spare, debug info, value
Name offset: pointer into string table
  Allows arbitrarily long symbol names (null terminated)
Type byte: low bit is "external" flag
  Text/data/bss: relative symbol (to that segment)
  Abs: absolute value (may or may not be external)
  Undefined: external bit must be on

Lecture #32


Macro Processors


Introduction
Macro: a notational convenience for programmers
  short-hand for commonly used blocks of code
  not restricted to assembly languages
Macro Processor: tool that replaces short-hand with corresponding block of code
  performs string substitution (expansion)
  no analysis of instructions
  no semantics of programming language

Example
To clear all registers, we write:
  LDI 0,0
  LDI 1,0
  LDI 2,0
  LDI 3,0
If needed often, this can be tedious
Solutions:
  define a subroutine and call it when needed
  use a macro...

Example II
CLEAR MAC       ;begin defn
      LDI 0,0
      LDI 1,0
      LDI 2,0
      LDI 3,0
      MND
(CLEAR is the macro name; the four LDI lines are the macro body)

Example III
In body of program:
  ...
  CLEAR
  ...
  CLEAR
  ...
After being fed to macro processor:
  ...
  LDI 0,0
  LDI 1,0
  LDI 2,0
  LDI 3,0
  ...
  LDI 0,0
  LDI 1,0
  LDI 2,0
  LDI 3,0
  ...

Picture
Notice that result is a __________ program
  source --> Macro Processor --> source
The languages of the two programs differ only by what can be achieved with textual substitution
  i.e., approximately the same level of abstract machine

Outline
Features
  arguments
  labels
  variables
  conditional expansion
Algorithm for macro processor
Macros in C and C++
Reference: Beck chp. 4

Macro Arguments
Arguments make macros more flexible
Involves textual substitution
  SWAP MAC (&A,&B)
       LD  1,&A
       LD  2,&B
       ST  1,&B
       ST  2,&A
       MND
  Prog ORI
  X    NMD 10
  Y    NMD 0
       SWAP (X,Y)
       BR  0,0(3)
       END

Result of Macro Processing


Prog ORI
X    NMD 10
Y    NMD 0
     LD  1,X
     LD  2,Y
     ST  1,Y
     ST  2,X
     BR  0,0(3)
     END

Labels and Macros


Labels inside macro bodies can be useful
e.g., a macro that swaps values of 2 registers:
SWAPR MAC (&r1,&r2)


Labels: Problem
Consider a program with multiple invoc'ns of macro SWAPR:
  ...
  SWAPR (1,2)
  ...
  SWAPR (1,3)
  ...
Expands to:

Labels defined twice!
  assembler error

Labels: Solution
Macro processor provides a mechanism for generating unique labels
  e.g., preface symbol (definition and use) with $
  SWAPR  MAC (&r1,&r2)
         BR  3,$Strt
  $Tmp1  RES 1
  $Tmp2  RES 1
  $Strt  ST  &r1,$Tmp1
         ...

Labels: Solution II
First expansion of this macro:
           BR  3,$AAStrt
  $AATmp1  RES 1
  $AATmp2  RES 1
  $AAStrt  ...
Unique prefix for each invocation
  generated symbols must conform to assembler syntax (begins with $, length, etc.)
  programmer follows conventions (not to use $ outside of macros, use short labels, etc.)

Variables
Evaluated at time of: __________________
  i.e., not at execution time
Example: &Test SET 0
  &Test: variable name
  SET: special pseudo-op
  0: expression
&Test can then be used in expressions within the macro body
This feature is often used in conjunction with

Conditional Expansion
So far, all macros we've seen have been expanded to the same block of code
  (modulo argument replacement)
Useful to generate different blocks of code
  perhaps depending on value of some bool expr
Syntax: IF / ELSE / ENDIF
  IF (expr)
    block1
  ELSE
    block2
  ENDIF
Meaning: if expr is true, expand with block 1; else, expand with block 2

Example: Shifting Left/Right


SHIFT MAC (&Target, &Amount, &Dir)

Example: Swap
Conditional expansion for efficiency:
  SWAP MAC (&A,&B)
       IF  (&A NEQ &B)
       LD  1,&A
       LD  2,&B
       ST  1,&B
       ST  2,&A
       ENDIF
       MND
Now SWAP (S,S) is expanded to nothing

Lecture #33


Macros vs. Subroutines: Tradeoffs


Macros are expanded inline
  Disadvantage: program size increases
  Advantage: speed
Subroutines are called (branched to)
  Disadvantage: overhead of parameter passing (more costly than sbrtn. body?)
  Advantage: program size
This is another example of the space / time tradeoff

Algorithm: First Attempt


2-pass approach is tempting
lets us resolve forward references

1st pass:
build table of key, domain: ? and attribute, range: ?

2nd pass:
do the expansion (replace macro calls with bodies)

1st pass Invariant: after each MND, table contains all previous macro names seen in definitions, and their bodies

Problem A: Nested Definitions


Often useful to define macros inside macros
  HPOS  MAC              SolOS MAC
  READ  MAC              READ  MAC
        ...                    ...
        MND                    MND
  WRITE MAC              WRITE MAC
        ...                    ...
        MND                    MND
        MND                    MND
In program:
  begin by invoking OS macro (e.g., HPOS)
  then use READ & WRITE

Nested Definitions
To recompile on a different OS, change flag at the top of program only!
Another solution?
  but nested definitions more convenient. Why?
Will this work with our 2-pass approach?
  multiple definitions of READ (notice how invariant is violated)
Problem: definitions depend on previous expansions

Algorithm: Second Attempt


Use a 1-pass approach that alternates, as necessary, between defining and expanding
Data structure: Macro_Def_Table
  Domain: macro names
  Range: macro definitions
Key invariant: after each "outer" MND seen
  1. All previous outer macro definitions have been inserted into the table
  2. All previous macro invocations expanded

Algorithm: Intuition
Scan program line-by-line
MAC seen: change into definition mode
  insert body into DefTable
  match up outer MND with initial MAC
Macro call seen: change into expansion mode
  look up macro name in table
  process expansion from DefTable line-by-line
    may include macro definitions!!
    requires changing back into definition mode

Algorithm: Limitation
This two-edged approach is pretty clever...
But are there any limitations it imposes on the definition / use of macros?
A.
  i.e.,
In practice, this is not a big problem

Problem B: Nested Invocations


Convenient to allow macros to call macros
  CYCLE MAC (&A,&B,&C)
        SWAP (&A,&B)
        SWAP (&A,&C)
        MND
Expansion requires further expansion
  CYCLE (X,Y,Z)  -->  SWAP (X,Y)  -->  LD 1,X
                      SWAP (X,Z)       LD . . .
Or, in particular . . .

Recursive Invocations
Macro invokes itself!
Of course, beware infinite recursion:
  TROUBLE MAC
          NMD 10
          TROUBLE
          MND
Solution: use _____________________

Recursive Macro: Example


Consider
  TAB MAC (&C)
      IF  (NZ &C)
      TAB (&C-1)
      ENDIF
      NMD &C
      MND
Exercise: expand TAB (3)
Exercise: what happens with
  Depth EQU 3
        TAB (Depth)

Expansion of TAB (3)

TAB (3)
-->
TAB (3-1)
NMD 3
-->
TAB (3-1-1)
NMD 3-1
NMD 3
-->
TAB (3-1-1-1)
NMD 3-1-1
NMD 3-1
NMD 3
-->
NMD 3-1-1-1
NMD 3-1-1
NMD 3-1
NMD 3

Algorithm: Refinement for Nested Invocations


Add a data structure
  ArgStack -- arguments in current expansion
As nested expansions encountered:
  arguments are pushed onto this stack
Now, expand mode means:
  look up macro name in table
  push arguments onto stack
  process expansion from DefTable line-by-line
    may include macro definitions (change back to definition mode)
    may include macro invocations (expansions)
After last line from DefTable, pop arguments off stack.
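
A compact Python sketch of the one-pass scheme with a definition table and an argument stack is given below. It is a simplified illustration, not the algorithm from Beck: it assumes macro calls carry no label, uses plain text replacement for parameters, and ignores conditional expansion and unique-label generation. The names `expand`, `def_table`, and `arg_stack` are chosen for the example.

```python
def expand(lines):
    """One-pass macro processor: alternates between defining and expanding."""
    def_table = {}   # macro name -> (parameter names, body lines)
    arg_stack = []   # one parameter -> argument mapping per active expansion
    output = []

    def process(source):
        it = iter(source)
        for line in it:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "MAC":
                # Definition mode: collect the body up to the matching outer MND.
                params = fields[2].strip("()").split(",") if len(fields) > 2 else []
                body, depth = [], 1
                for inner in it:
                    words = inner.split()
                    if "MAC" in words[1:2]:
                        depth += 1          # a nested definition starts
                    elif words[:1] == ["MND"]:
                        depth -= 1
                        if depth == 0:
                            break           # the outer MND closes the definition
                    body.append(inner)
                def_table[fields[0]] = (params, body)
            elif fields and fields[0] in def_table:
                # Expansion mode: push arguments, process the body, then pop.
                params, body = def_table[fields[0]]
                args = fields[1].strip("()").split(",") if len(fields) > 1 else []
                arg_stack.append(dict(zip(params, args)))
                substituted = []
                for body_line in body:
                    for param, arg in arg_stack[-1].items():
                        body_line = body_line.replace(param, arg)
                    substituted.append(body_line)
                process(substituted)        # body may define or invoke macros
                arg_stack.pop()
            else:
                output.append(line)

    process(lines)
    return output

program = [
    "SWAP MAC (&A,&B)",
    " LD 1,&A",
    " LD 2,&B",
    " ST 1,&B",
    " ST 2,&A",
    " MND",
    "CYCLE MAC (&A,&B,&C)",
    " SWAP (&A,&B)",
    " SWAP (&A,&C)",
    " MND",
    " CYCLE (X,Y,Z)",
]
result = expand(program)
print("\n".join(result))
```

Since the substituted body is fed back through the same routine, a definition inside a body is only entered into the table when the enclosing macro is expanded.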



Lecture #34


Macros in C and C++


MP Algorithmic Highlights
No nested definitions; nested invocations supported, BUT no recursion allowed
  Self-references not further expanded
    #define T (x+T) //only one expansion of T
  Circularities handled the same way (stop at first self-reference)
First action: strip comments; don't remove newlines
View results of macro expansion with -E
  gcc -E test.c > test.i
For RESOLVE/C++:
  gcc -E -I/class/sce/rcpp -I/class/sce/rcpp/RESOLVE_Catalog \
      test.cpp > test.ii
Standard file extension for preprocessed C (C++) is .i (.ii), for "intermediate" file.

Basic Features: Definition


Use #define
  e.g., #define BUFF_SIZE 1000
    (BUFF_SIZE is the macro name; 1000 is the macro body)
Must be 1 line
  longer definitions use line continuation, \
Naming convention: all upper case
Defines a global constant
  example of use: int Buffer[BUFF_SIZE];
  to change this constant, must recompile

Using Arguments
Argument list follows name (no space):
  #define INC(X) X++
  #define SUM(X,Y) X+Y
DANGER: arithmetic grouping
Problem #1: protecting the body
  e.g., #define MAX(X,Y) X > Y ? X : Y
  works fine with a = MAX(b,c);
  but consider a = MAX(b,c) + 1;
  solution: protect the body
    #define MAX(X,Y) (X > Y ? X : Y)

Using Arguments II
Problem #2: protecting the arguments
  now consider using MAX macro in:
    flag = MAX (b>0, c<0);
  i.e.,
  solution: protect the arguments
    #define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
Aside: line continuation
  #define INC(X,Y) { \
      X++; \
      Y++; }
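
Both grouping problems come from purely textual substitution, which can be imitated in a few lines of Python. The helper `expand_macro` below is a hypothetical stand-in for the preprocessor, written only to print what the expanded text looks like; it does not parse or evaluate C.

```python
import re

def expand_macro(call_args, params, body):
    """Replace each whole-word parameter name in body with the argument text."""
    mapping = dict(zip(params, call_args))
    pattern = r"\b(" + "|".join(params) + r")\b"
    return re.sub(pattern, lambda m: mapping[m.group(1)], body)

params = ["X", "Y"]

# Problem #1: a = MAX(b,c) + 1; with an unprotected and a protected body.
unprotected = expand_macro(["b", "c"], params, "X > Y ? X : Y") + " + 1"
protected_body = expand_macro(["b", "c"], params, "(X > Y ? X : Y)") + " + 1"

# Problem #2: flag = MAX (b>0, c<0); with and without protected arguments.
bare_args = expand_macro(["b>0", "c<0"], params, "(X > Y ? X : Y)")
protected_args = expand_macro(["b>0", "c<0"], params, "((X) > (Y) ? (X) : (Y))")

for text in (unprotected, protected_body, bare_args, protected_args):
    print(text)
```

Reading the printed text with C precedence rules in mind shows where the `+ 1` and the comparison operators end up binding.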

Conditionals
Common condition: "is this macro defined?"
  #ifndef BUFF_SIZE
  #define BUFF_SIZE 1000
  #endif /*BUFF_SIZE*/
Application: debugging modes
  #define DEBUG_ON 1
  ...
  #ifdef DEBUG_ON
  printf ( . . . );
  #endif /*DEBUG_ON*/
  Incurs no space/time overhead when not debugging!

File Inclusion
Syntax: #include "filename"
  text of file called filename inserted at that point
    #include "f"
DANGER: recursive inclusion
  File f1
    #include "f2"
  File f2
    #include "f1"

Recursive File Inclusion


Solution: protect every included file
Convention: use #ifndef ... #endif
  File F1.h
    #ifndef F1_H_IFP
    #define F1_H_IFP 1
    ...
    #endif /*F1_H_IFP*/
  (F1_H_IFP: something unique, built from the filename)

Predefined Macros
Some defined by ANSI standard:
  __FILE__ / __LINE__: current file name / line number
  __DATE__ / __TIME__: current date / time
Useful for error reporting
  printf("error in %s, line %d\n", __FILE__, __LINE__);
Others defined by particular compilers (e.g., gcc)
  __VERSION__, __BASE_FILE__, __INCLUDE_LEVEL__
Useful for distinguishing OSs
  #ifdef __VAX__
  ...
  #endif /*__VAX__*/

Arguments in Strings
ANSI C: parameter substitution not performed within quoted strings
  #define DISP(EXP) printf("EXP = %d\n", EXP)
  Invocation: DISP (i*j+1);
  Result:
Solution: "stringizing" operator, #
  #define DISP(EXP) printf(#EXP " = %d\n", EXP)
  Result:

Pitfalls to Avoid
"Text substitution" aspect of macros can make them tricky
General strategy: limited use!
Pitfall #1: side effects
  recall MAX example
  consider: a = MAX(b++, c++)
  Q. if b = 2, c = 5 beforehand, what is result?
  A. a = ______ b = ______ c = _______

Pitfalls to Avoid II
Pitfall #2: swallowing the semicolon
  Macro expands to form a compound statement:
    #define INC(X,Y) {X++; Y++;}
  We want to include semicolon with call:
    INC(a,b);
  But consider:
    if (. . . ) INC(a,b); else . . .
  This doesn't compile! Why not?
  Solution (notice the missing semicolon at the end):
    #define INC(X,Y) do { \
        X++; \
        Y++; } \
      while(0)

Lecture #35


Compilers


Introduction
Ref. Beck chapter 5
Compiler = a kind of translator
  high level language --> machine (or assembly) code
Translation gap is larger than for assembly language
  sophisticated data structures
    arrays, records, classes, ...
  sophisticated control structures
    if, while, switch, function calls, nested scopes, ...

The High-level Language


Two aspects to language definition:
1. Syntax
what are legal programs? i.e., what is accepted by the compiler

2. Semantics
what does the program mean? i.e., into what machine code it is translated


Modular Decomposition
View input as a stream of characters
[Figure: the source program shown as a stream of characters, including spaces and newlines; source --> Compiler --> object file]

Compiler must give this stream structure in order to perform the translation

Coarse-Grained Decomposition
(source) stream of characters --> [Lexical Analyzer] --> stream of tokens --> [Parser] --> parse tree --> [Code Generator] --> object file

Lexical Analysis
First step of compilation process
Also called: scanner, tokenizer, lexer
Scans program (often stripping comments)
Recognizes: keywords, operators, identifiers, ints, floats, ...
All of these are called tokens

Tokens
A token is defined by:
1. Type (e.g., integer)
2. Value (e.g., 312)

Keywords (e.g., while) often have their own token type (no associated value)
Example:
MEAN := SUM DIV 100;

Tokens - Example
Result of tokenizing:
Line  Token Type  Token Value
13    id          MEAN
      :=
      id          SUM
      DIV
      int         100
      ;


Token Definition
How to define what is & isn't a token?
Some things seem to be simple
e.g., keywords

But language syntax rules add complexity
line continuation characters
are spaces meaningful?

Need a general notation for defining all token types

Regular Expressions
Examples
label :: [A-Z][A-Z0-9]{0,5}
a label is a capital letter followed by 0 to 5 characters that may be capital letters or digits

int :: 0 | [1-9][0-9]*
an int is either a 0 or a digit in range 1 to 9, followed by any number (0 or more) of digits

Regular expressions are equivalent to



Finite State Automata (FSA)


Definition:
a finite collection of nodes (states)
directed arcs (transitions) between nodes
arcs are labeled
special nodes:
1 start
at least 1 final (or ending or accepting)

An NFSA accepts a string iff it can read the string and end up in a final state

Example: LongLabel
longlabel :: [A-Z][A-Z0-9]*
[Figure: FSA for longlabel, with arcs labeled A-Z and A-Z0-9]


Example: Int
[Figure: FSA for int, with arcs labeled 0, 1-9, and 0-9]


Exercise
Write an NFSA for labels with underscores
same rules as for LongLabels (for letters/no.s)
no _ at start
no _ at end
no 2 _s in a row

Should the following be accepted?
BUFFER1
T_B_SI9ZE
BUFF_SIZE
BUFF_
BUFF__S
1SIZE

Lexical Analysis
Could write code to recognize LongLabel directly
see figure 5.10
but this is hard to read, modify, maintain, ...

Much easier to read and understand FSA!
Scanners are often built automatically from FSA descriptions!

Lecture #36


Review (From Last Time)


Compiler overview
Tokenizers
Stream of char --> stream of tokens
Token definition
regular expressions
FSAs

Today: next step in compilation process...



Step #2: Parsing


Grammar
Defines syntax of language
Given as a collection of rules
transformations
e.g., ( X ) * ( T X ) *
maps string on left into string on the right

One particular (and important) kind of grammar: Context-Free Grammar (CFG)



CFG
Two kinds of symbols:
terminals
non-terminals

Each non-terminal has an associated rule
non-terminal is the only thing on the left
e.g., p --> ε | (p) | pp
application: p ==> pp ==> (p)p ==> (pp)p ==> ((p)p)p ==> (()p)p ==> ... ==> (()())()

BNF (Backus-Naur Form)


A common notation for CFGs
Invented to define the syntax of ALGOL60
Terminals are tokens!
E.g.,
<entry> ::= ENT <entry-list>
ENT is defined to be token ENT

<entry-list> ::= id | <entry-list>, id
(notice the recursion)


BNF II
One special start symbol
e.g., <program> ::= id <origin> <body> <end>

Notice division between tokenizer & parser
tokenizer could return smaller tokens
then rules in parser become more complicated
e.g., <read> ::= R E A D . . . vs. <read> ::= READ ( <id-list> ) . . .

Example: see PASCAL BNF p. 228



Parse Trees
Record the application of BNF rules
root: the start symbol
internal nodes: non-terminal symbols
leaves: terminals (i.e., tokens)

Example: using PASCAL BNF, what is the parse tree for MEAN := SUM DIV 100 ?


Parse Trees - Example


<assign>
  id {MEAN}
  :=
  <exp>
    <term>
      <term>
        <factor>
          id {SUM}
      DIV
      <factor>
        int {100}

Parse Trees - Exercise


Exercise: FOR I := 1 TO 10 DO
READ (TEMP)

Exercise:
<exp> ::= <exp> + <exp> | <exp> - <exp> | int
parse 3 - 6 - 2
answer?

Grammar that allows more than one parse tree to be formed for the same token sequence: ambiguous

Algorithm
How do we calculate a parse tree?
Two approaches:
bottom-up (start at leaves)
top-down (start at root)


Shift-Reduce Parsing
Bottom-up approach
Scan tokens, placing them on a stack (shift)
Group tokens at top of stack (reduce):
pop them all off
push corresponding non-terminal

Repeat until done
should be left with ________________

Shift-Reduce Parsing II
Grammar must be LR
Left-to-right scan of the input, producing a Right-most derivation (in reverse)
symbols to be reduced always appear at top of stack (never inside it)

Need to look ahead to decide how/when to reduce
if we only need to look ahead 1 token: LR(1) grammar

Lecture #37


Recursive Descent
Top-down approach
Each non-terminal has associated routine
scan forward
try to identify string matching this rule

Routine may have to call other routines (or itself)
see Figure 5.16, example for <read>:
find READ; find ( ; find <id-list>; find )
(finding <id-list> means: call a routine)

Recursive-Descent - Problem
Subtle potential problem: left-recursion
the left-most (first) symbol in the BNF rule is the same non-terminal (recursive)
e.g., <id-list> ::= id | <id-list>, id

If we want to expand 2nd alternative, first call ourselves! (i.e., infinite recursion)
One solution: change notation slightly
<id-list> ::= id [ , <id-list> ]
routine always consumes a token before recursion

Step #3: Code Generation


Introduction
Use a collection of routines
1 routine / non-terminal in the grammar
called semantic or code-generating routines

2 approaches:
create entire tree
then walk the tree, generating code

generate code as we go
when a grammar rule is recognized, call the corresponding code-generating routine

Example
Consider: <term> ::= <factor> * <factor>
Occurs in parse tree as:
        <term>
<factor>  *  <factor>

Generate code as we come up the tree
keep track of where (which registers) results of lower nodes are stored
generate code for * operation
keep track of where result is placed


Optimization
An optimizing compiler tries to generate the most efficient object code
time (fast execution times)
space (small object files)

Requires sophisticated analysis
Often uses an intermediate form of code
not executed directly
analyzed for deciding register allocation, instruction ordering, branch shadows, etc...


Lex & Yacc


Unix tools for building compilers
lex: lexical analyzer generator
yacc: yet another compiler compiler

A compiler compiler takes as input:
lexical analyzer
grammar
code-generation rules

And produces as output:
compiler

Lex Example
Input file:
Definitions
%%
Rule {action}
. . .

Definitions: convenient short-hands for REs
Rules: recognized regular expressions and corresponding action to perform
set the token value (use global variable yylval)
return the token type (return an int)

INSERT: Lex Example Input file for simple Pascal syntax (pascal.lex)


Lex Example II
Run: lex pascal.lex
Result:
file called lex.yy.c
a 677-line C program!
implements the function int yylex()


Yacc Example
Create a file defining the grammar
%token NUMBER
%%
expr: NUMBER {$$ = $1; }
| expr '+' expr {$$ = $1 + $3; }
| '(' expr ')' {$$ = $2; }
;

An invocation of yylex used to return the next token (and token value)
Action produces output (object code)
Run yacc on this file to produce a compiler that uses a bottom-up parsing method.

Lecture #38


To Ponder
What is meant by a text file? (vs. binary)
A file of English text occupies 5 Mbytes on disk. A Java program reads the contents of this file into a String (or StringBuilder) object. How much memory does it need?
Java string length vs. number of characters
String s = . . .
assert (s.length() == 7)
How many characters does s contain?

What's so scary about:
..%c0%af..

Unicode
A standard for the discrete representation of written text


The Big Picture


glyph  code point  binary encoding (UTF-8)
ф      U+0444      D1 84
m      U+006D      6D
€      U+20AC      E2 82 AC
’      U+2019      E2 80 99
好     U+597D      E5 A5 BD

Text: A Sequence of Glyphs


Glyph: A recognizable abstract graphic symbol
See foyer floor in main library

One character can have many glyphs
Example: e e e e e e e (the same character in different typefaces)

One glyph can be different characters (capital Latin A and Greek Alpha: A, Α)
One glyph can be several characters (ligature of f+i into one symbol: ﬁ)

Security Issue
Visual homograph: Two different characters that look th same h t th t l k the
Would you click here: www.paypl.com ? Oops! The second a is actually CYRILLIC SMALL LETTER A This site successfully registered in 2005 y g

Solution
Heuristics that warn users when languages are mixed and homographs are possible

Unicode Code Points


Each character is assigned a unique code point
A code point is defined by an integer value, and is also given a name
Example: LATIN SMALL LETTER M, one hundred and nine

Convention: Write code points as U+hex
Example: U+006D

As of November 2010:
Contains 109,000+ code points
Covers 93 scripts (and counting)

Organization
Code points are grouped into categories
e.g., Basic Latin, Cyrillic, Arabic, Cherokee, Currency, Mathematical Operators

Standard allows for 17 x 2^16 code points
i.e., > 1 million
U+0000 to U+10FFFF

Each group of 2^16 called a plane
U+ppnnnn: same leading hex digits pp ==> same plane

Plane 0 called basic multilingual plane (BMP)
Has practically everything you could need
Convention: code points in BMP written U+nnnn, others written with 5 or 6 hex digits

Basic Multilingual Plane


UTF-8
Encoding of code point (integer) in a sequence of bytes (octets)
Standard: all caps, with hyphen (UTF-8)

Variable length
Some code points require 1 octet
Others require 2, 3, or 4

Consequence: Cannot infer number of characters from size of file!
No endian-ness: just a sequence of octets
D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82 ...

UTF-8 Encoding Recipe


1-byte encodings
First bit is 0
Example: 0110 1101 (encodes U+006D)

2-byte encodings
First byte starts with 110
Second byte starts with 10
Example: 1101 0000 1011 1111
Payload: 10000 111111 = 100 0011 1111 = U+043F (i.e., п, Cyrillic small letter pe)

UTF-8 Encoding Recipe


Generalization: An encoding of length k (k >= 2):
First byte starts with k 1s, then 0
Example: 1110 0110 ==> first byte of a 3-byte encoding
Subsequent k-1 bytes each start with 10
Remaining bits are payload

Example: 11100010 10000010 10101100
Payload: x20AC (i.e., U+20AC, €)

Consequence: Stream is self-synchronizing
A dropped byte affects only that character

UTF-8 Encoding Summary

(from wikipedia)


Security Issue
Not all encodings are permitted
overlong encodings are illegal
example: C0 AF = 1100 0000 1010 1111 = U+002F (should be encoded 2F)

Classic security bug (IIS 2001)
Should reject URL requests with ../..
Scanned for 2E 2E 2F 2E 2E (in encoding)
Accepted ..%c0%af.. (doesn't contain x2F)
After accepting, then decoded
2E 2E C0 AF 2E 2E gets decoded into ../..

Moral of the story: Work in code point space!



Other (Older) Encodings


In the beginning
Character sets were small
ASCII: only 128 characters (i.e., 2^7)
1 byte/character, leading bit always 0

Globalization means more characters
But 1 byte/character seemed so fundamental

Solutions:
Use that leading bit!
Text data now looks just like binary data
Use more than 1 encoding!
Must specify data + encoding used

ASCII


ISO-8859 family (-1 Latin)


Windows Family (1252 Latin)


Early Unicode and UTF-16


Unicode started as 2^16 code points
The BMP of modern Unicode
Matches ISO-8859-1 in bottom 256 points

Encode every code point in 2 bytes (1 word)
Simple, but leads to bloat of ASCII text

For code points outside of BMP
A pair of words (surrogate pairs) carry 20-bit payload (code point minus x10000)
split, 10 bits in each word
First: 1101 10xx xxxx xxxx (xD800-DBFF)
Second: 1101 11yy yyyy yyyy (xDC00-DFFF)

U+D800 to U+DFFF are reserved code points in Unicode
And now we're stuck with this legacy, even for UTF-8

UTF-16 and Endianness


Multibyte representation
Must distinguish between big & little endian

One solution: Specify encoding in name
UTF-16BE or UTF-16LE

Another solution: require byte order mark (BOM) at the start of the file
U+FEFF (ZERO WIDTH NO-BREAK SPACE)
there is no U+FFFE code point
FE FF ==> BigE, FF FE ==> LittleE
Not considered part of the text

BOM and UTF-8


Should BOM be added to start of UTF-8?
U+FEFF is encoded as EF BB BF

Advantages:
Forms magic-number for UTF-8 encoding

Disadvantages:
Not backwards-compatible to ASCII
Existing programs may no longer work
E.g., in Unix, shebang (#!, i.e., 23 21) at start of file is significant (file is a script)
#! /bin/bash

To Ponder
What is meant by a text file? (vs. binary)
A text file occupies 5 Mbytes on disk. A Java program reads the contents of this file into a String (or StringBuilder) object. How much memory does it need?
Java string length vs. number of characters
String s = . . .
assert (s.length() == 7)
How many characters does s contain?

What's so scary about:
..%c0%af..

Lecture #39
Review
