Until this point in your academic career, you have worked primarily independently, and on projects of very limited scope. Once you are employed as a programmer, you will rarely work independently on a project again. A skilled programmer can only turn out an average of 10 lines of well-designed, documented, and debugged code per day. Most systems programs and larger applications require many thousands to hundreds of thousands of lines of code, so they are clearly beyond the scope of a single hot-shot programmer; the time to market would be too great. Hence working with a team to produce a major software system is an essential part of being a computer scientist. The stereotype of a computer programmer as a loner who communes with his/her machine to avoid people could not be further from the truth. Professional programmers spend more time in design meetings, in code walk-throughs, and communicating with other programmers, users, system maintainers, marketers, etc. than in front of a monitor. That should be your experience in this course as well.

One problem with working as part of a team, and working on very large software systems, is that the program is too large for any one programmer to understand the whole system. A software system such as an operating system has more components than a Boeing 747, and it is clear to us that no one person understands each and every component of a 747, much less their interactions. Hence adherence to good software engineering practices is essential if the results of a large group programming effort are ever going to work together, be suited for debugging, be maintainable, be modifiable, or meet the original requirements.

Working with a team can be anywhere from fun to awful, depending on your attitude and the attitudes of your teammates. Since you may not know the work habits or attitudes of your new team members, how can you ensure a successful project and fairness in grading?
Everyone needs to be involved in each aspect of the project (design, documentation, test planning, coding, and testing). In order to work on a team you will have to be considerate of your teammates. They are all high school graduates and college juniors or above. In spite of your first impression, they are capable, intelligent people, and deserve respect. Most team problems occur because a member of the team currently has too many commitments in life. This overcommitment may be due to class load, work issues, family issues, etc. It does not necessarily mean that they are lazy or stupid. However, if you feel that a team member is not attempting to contribute to the team, just let me know. We can have some friendly discussions and many times resolve the problem. You should all agree on a common language and a common hardware platform. Unless you are all extremely skilled (or masochistic), you should not use multiple platforms. Planning is the most important thing you can do. One hour spent in a preliminary team meeting saves many individual hours of redundant and possibly incompatible coding. Many students seem to think that time spent planning and designing away from the keyboard is wasted effort. Such effort will not be wasted in this class.
I feel that my teammates are not doing their fair share.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems the more flexibility we have in making adjustments.

My teammate is a coding whiz-kid and has decided to simply do it all by herself or himself.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems the more flexibility we have in making adjustments. The whiz-kid who prevents others from working on the lab will have his or her lab grade reduced.

Two of my teammates are long-time buddies and they do everything together (software-wise) and leave me out.
Contact the graders and/or the instructor. We will have a group meeting or individual meeting to determine exactly what the problem is. The earlier we discover and correct problems the more flexibility we have in making an adjustment.

What is differential grading? Why should we avoid it?
If the graders and I determine that a fair share of the work was not done by all team members, then different grades will be assigned to each team member. If one team member does it all, he/she may get a lower grade than the rest of the team. But no one will be happy with the grade. The team members and the instructor will have a meeting to discuss the problem and hopefully correct it. However, a differential grade may still be assigned.

Can I still pass the class without doing anything on the labs?
Absolutely not.
General
Problems are hardly ever fully defined. Welcome to the real world! You get a set of end user requirements, then you need to study, examine, and sketch out issues and concerns. You will need to ask leading questions. While I do not intentionally leave information out, end user requirements hardly ever match the level of detail needed by the programmer.

You and your teammates must agree on some set of standard coding practices:
Will variables be passed as parameters or will you use global variables?
A standard format for variable names.
Variable names that represent a meaning. Names such as a, x, z, and n are not as clear as number of cases, location counter, etc.
A maximum module size. If a module needs to be larger than that, break it up. My rule is two screens' worth, including comments.
How to share files, and how to know when a module should be added or a new update added to the lab.

You might designate one team member as having sole responsibility for updating the program files. If everyone makes changes then you will have a real mess! Another alternative is to make use of the Unix cvs utility.
You should learn to use the make utility. You should consider writing several Unix scripts that will change the permissions of files for easy compiling. You might also want to write a script to facilitate compilation via the Unix make utility. Use common modules to avoid duplication.

What if we have a question? Order of operations for maximum success!
1. Email the grader.
2. Visit the graders during office hours.
3. Call the grader.
4. Email the instructor.
5. Use the instructor's office hours.
6. Call the instructor.
All these options are acceptable and encouraged.
Design:
1. Lay out a top-down design.
2. Look for routine modules you will need repeatedly, such as: binary to hex, binary to decimal, decimal to hex, decimal to binary (could these really be one routine? should they be?), building a table (what's the relationship, if any, between a table and a partial map?), searching a table, and so on.
3. Write a dummy module for each routine needed. The following sample dummy module implementation is written in pseudo-code.
Check_for_overflow:
begin
  begin comment
    Procedure Name: Check_for_overflow
    Description: This routine determines whether the results of the
      operation would have resulted in an overflow. In this system the
      data length is 24 bits so the results range from
      -8,388,608 to +8,388,607
    Calling Sequence: (temp_result: Integer, overflow: Boolean)
    Input parameters: temp_result
    Output parameters: overflow
    Error Conditions Tested: overflow
    Error Messages Generated: message ###
    Original Author: Al Stutz
    Procedure Creation Date: February 22, 1995
    Modification Log:
      Who    when     why
      Al     3/11/93  Forgot to send message to screen
      Wayne  1/3/02   Corrected mismatch between data length and results
                      range in description; introduced quote marks (")
                      into pseudo-code; changed indentations in the
                      comment; inserted colon (:) into pseudo-code;
                      introduced types into the calling sequence; changed
                      parameter name from flag to overflow; changed
                      ambiguous "Initialize overflow" to more precise
                      "Set overflow to false" in the pseudocode;
  end comment
  Set overflow to false
  Write "In routine: Check_for_overflow"
  Write temp_result, overflow
end;
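Once a dummy module like the one above has been tested, it can be expanded into a working routine. The following is only a minimal sketch, written in Python rather than the pseudo-code used above. It assumes a two's-complement representation (the stated range suggests this, but the description does not name it), and the constant names are illustrative rather than part of the lab.

```python
# Sketch only: WORD_BITS, MIN_VALUE and MAX_VALUE are illustrative names,
# and the two's-complement representation is an assumption.
WORD_BITS = 24
MIN_VALUE = -(2 ** (WORD_BITS - 1))     # -8,388,608
MAX_VALUE = 2 ** (WORD_BITS - 1) - 1    # +8,388,607


def check_for_overflow(temp_result: int) -> bool:
    """Return True when temp_result cannot be stored in a 24-bit word."""
    overflow = not (MIN_VALUE <= temp_result <= MAX_VALUE)
    return overflow


if __name__ == "__main__":
    # Exercise the extreme cases on both sides of each boundary.
    for value in (MAX_VALUE, MAX_VALUE + 1, MIN_VALUE, MIN_VALUE - 1):
        print(value, check_for_overflow(value))
```

Returning the flag instead of writing to an output parameter is a choice made for this sketch; the calling sequence in the header comment passes overflow as a parameter.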
Documentation:
1. Draft the user guide before any code is written. It is easier to make modifications in this document than in the code.
2. Write down clear assignments to team members.
Writing Code:
Even if you ignore all other advice, you should not do any coding before you complete the above steps. You must know what needs to be done and the limits of your own testing. You will, of course, want to create code for which your own tests can find no counterexamples; furthermore, you will want to strive to create code for which no possible test could reveal a defect. That is to say, you're striving to create code without defects. This is why you hope that your test plan will reveal as many defects as possible.
1. As routines are written, they should be tested, even in their dummy form. Once you have all the routines identified (you will miss some), then start expanding them and step testing as you go. Test changes as you make them rather than all at once!
2. Have someone other than the module author also test it.
Testing:
All the final testing must be done on the same version of the code. If in one of the tests an error shows up that you subsequently fix, then you must re-run all previous tests. Your one-line change could very well impact the results of an earlier run. (We have many examples where a small change has been ruinous.) It is very embarrassing for some very basic and simple feature of your program that you know was working yesterday to fail during grading because of some little, last-minute change you made to fix some advanced feature. Re-run those tests!
1. Use realistic test data.
2. Test extreme cases.
3. Test each function.
4. Test each error message.
Hint: The grader will often assume that you have the basics working but try to catch you on a fine point. Your program had better not crash if the grader's test script references a memory address out of range, or tries to feed a GIF file to your assembler. Your program should gracefully catch the error and print an informative error message.
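The hint about out-of-range memory addresses can be sketched as follows. This is an illustrative Python fragment, not part of any lab specification: MachineError, read_word, and run_safely are hypothetical names, and the 256-word memory is an assumption made for the example.

```python
# Sketch only: all names and the memory size are assumptions for illustration.
MEMORY_SIZE = 256


class MachineError(Exception):
    """Raised for errors the simulated machine can detect and report."""


def read_word(memory, address):
    """Return memory[address], or raise MachineError if address is out of range."""
    if not 0 <= address < len(memory):
        raise MachineError(
            f"memory address {address} out of range 0..{len(memory) - 1}"
        )
    return memory[address]


def run_safely(action):
    """Run action(); report a detected error as a message instead of crashing."""
    try:
        return True, str(action())
    except MachineError as err:
        return False, f"error: {err}"


if __name__ == "__main__":
    memory = [0] * MEMORY_SIZE
    print(run_safely(lambda: read_word(memory, 5)))
    print(run_safely(lambda: read_word(memory, 300)))
```

Keeping checks like these in small functions also makes it cheap to re-run them after every change, which is the point of the paragraph above.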
One of the major objectives of CSE 560 is the understanding and practice of techniques that enhance the development of quality software. The process of software development often is thought of as being composed of stages. In the first stage the problem that needs to be solved is defined and the requirements that the software must meet are identified. The next stage consists of designing proposed solutions to the problem, evaluating alternatives, selecting one of the alternatives, and detailing the (modular) structure of the chosen solution. Next, the program is constructed in accordance with the design specification (this is the stage, coding, with which we all are familiar). The software also must be validated (e.g., through testing) to measure conformance with the specifications laid out in the early stages, and installed so that the customer can use it. Throughout the development process documentation is the chief means of communication and management control. In formal development systems, there are specific documents that are required to be produced during these various stages of development. Each stage itself can be further decomposed into tasks, and each task can result in the production of some task document.

Even for relatively small projects such as we have in this course, there are several reasons to follow a more formal approach:
1. To prepare you for more complex and more formal development environments in the real world.
2. To become more conscious about the various tasks that one goes through in developing a program. With this kind of awareness, one can more specifically address sources of error in the development process.
3. To facilitate the later use of CASE tools to assist you with your project work (footnote 1).
4. To allow more direct supervision by the instructor.

Each of the labs in this course goes through only a subset of the stages identified above.
Requirements are provided by the instructor and installation is, for the most part, skipped because we aren't in a production environment. You each have had experience with construction and testing of your software. Remember that you must prepare your own test data even though your software also may be tested by the grading assistant. This leaves the design stage, which we want to emphasize in this course. In order to assist you with this stage, we have identified a number of subtasks that you should perform, and a suggested order in which to perform them. Each task has some output, which you are required to produce and (with the exception of task 1.1) turn in as part of the writeup. These will constitute the programmer's guide and part of the user's guide portions of the writeup. To get you started, we have provided suggested output for a few of the earlier tasks. Feel free to augment our suggestions with your own. A preliminary version of your design is due the day before your design review meeting, so do not delay in getting started. The output for the tasks (particularly those in categories 2 and 4) can be produced using CASE tools that you know or using plain old pencil and paper. The diagrams and data descriptions should be shared among members of your group so that consistency is achieved as each of you works on your respective parts of the project.

Footnote 1: CASE stands for Computer Aided Software Engineering. Though no CASE tool is stipulated in this course, you are welcome to use one with which you are familiar. Even if you do not use any such tool, the formal approach suggested in this document will prepare you for later use of CASE tools.
Figure 1: A Possible Task Flow Diagram
Note: These tasks are not likely to flow smoothly from one to the other, as the diagram above might suggest. Rather, you probably will find it necessary to iterate on certain tasks, especially when performing tasks 4.1 and 4.2.
Solution: Make sure you have identified each of the requirements and constraints for the problem. Typically the problem description is not organized so that all of the requirements in each category are together, but it is important that you know exactly what you're expected to do in each of these categories. In the programmer's guide, explicitly identify the responsibilities given to each team member. This is the only part of the solution to task 1.1 that you need to formally document.
Possible Partial Solution:
Standards
1. Complete each design task identified in the task list table
2. Include output of each task in writeup, organized according to the task list table
3. Documentation is required of all team members
4. Use structure chart notation to show module relationships
5. Use Jackson notation for data structure diagrams
Utilities and Support Hardware and Software
Sun Workstation
Unix operating system and X-windows
Modula-2 Compiler
Emacs
Possible Solution:
[System flow diagram; recoverable labels: User, initial m/c state, Interpret Instruc., initial state, trace, final state]
Possible Solution:
Load module: The user will create a file containing records that will be used to initialize the state of the 560 machine. The load module will read this file, providing initial values to various memory locations and various registers of the 560 machine. A display of the machine configuration is generated at the end of the load process.

Interpreter module: Starting with the initial state provided by the load module, the instruction at the address given by the program counter is fetched and decoded, and the operation indicated by the instruction is performed. Each instruction appropriately resets the program counter. This entire cycle is repeated until the final state is reached, when a HALT instruction is encountered or a fatal exception condition is reached. A trace is generated as each instruction is performed. A display of the machine configuration upon normal or abnormal termination of the simulation is generated.
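The interpreter cycle described above (fetch, decode, perform, reset the program counter, trace) can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the 560 machine: the opcodes HALT, LOAD, ADD, and JUMP, the single accumulator, the (opcode, operand) memory cells, and the step limit are all hypothetical. The real instruction set comes from the lab requirements.

```python
# Sketch only: the opcodes, accumulator, and cell format are assumptions.
HALT, LOAD, ADD, JUMP = range(4)


def interpret(memory, pc=0, max_steps=1000):
    """Run from pc until HALT or a fatal condition; return (acc, pc, trace)."""
    acc = 0
    trace = []
    for _ in range(max_steps):
        if not 0 <= pc < len(memory):
            trace.append(f"fatal: program counter {pc} out of range")
            return acc, pc, trace
        opcode, operand = memory[pc]          # fetch and decode
        trace.append(f"pc={pc} opcode={opcode} operand={operand} acc={acc}")
        if opcode == HALT:                    # normal termination
            return acc, pc, trace
        if opcode == LOAD:
            acc = operand
            pc += 1
        elif opcode == ADD:
            acc += operand
            pc += 1
        elif opcode == JUMP:
            pc = operand                      # the instruction resets the pc
        else:
            trace.append(f"fatal: unknown opcode {opcode} at address {pc}")
            return acc, pc, trace
    trace.append("fatal: step limit reached")
    return acc, pc, trace


if __name__ == "__main__":
    program = [(LOAD, 2), (ADD, 3), (HALT, 0)]
    acc, pc, trace = interpret(program)
    print("\n".join(trace))
    print("final state: acc =", acc, "pc =", pc)
```

Returning the trace as a list, instead of printing inside the loop, keeps the interpreter easy to test; a real solution would also display the full machine configuration on termination.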
Possible Solution: File characteristics: record input; each record is of type 1, or 2. There are 13 characters per record for type 1 records, 7 for type 2 records. Record layout: probably some kind of data structure diagram, perhaps using the Jackson notation. A sample is shown below.
input file
  header record
    H-code
    start exec.
    seg name
    seg length
    IPLA
  text part
    text record *
      T-code
      mem. addr.
      init. contents
(In Jackson notation, the asterisk marks a component that is repeated.)
Each of the primitive subelements start exec, seg name, etc. should have a description as well. The descriptions should be in terms of a simple data type and its possible set of values (e.g., start exec, and mem. addr. might be of type cardinal with range 0..255; init contents might be declared as an integer, or array of char).
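To show how such a record description translates into code, here is a minimal Python sketch of a record parser. Only the field names and the record lengths (13 and 7 characters) come from the description above. The individual field widths, the hexadecimal encoding of numeric fields, the H and T code letters, and the assumption that the 13-character record is the header are all hypothetical choices made so that the example adds up.

```python
# Sketch only. Assumed layouts (hypothetical widths):
#   header: 'H' + start exec (2) + seg name (6) + seg length (2) + IPLA (2) = 13
#   text:   'T' + mem. addr. (2) + init. contents (4)                       = 7
HEADER_LENGTH = 13
TEXT_LENGTH = 7


def parse_record(line):
    """Split one input record into named fields; raise ValueError if malformed."""
    if line.startswith("H"):
        if len(line) != HEADER_LENGTH:
            raise ValueError(f"header record must have {HEADER_LENGTH} characters")
        return {
            "kind": "header",
            "start_exec": int(line[1:3], 16),
            "seg_name": line[3:9].strip(),
            "seg_length": int(line[9:11], 16),
            "ipla": int(line[11:13], 16),
        }
    if line.startswith("T"):
        if len(line) != TEXT_LENGTH:
            raise ValueError(f"text record must have {TEXT_LENGTH} characters")
        return {
            "kind": "text",
            "mem_addr": int(line[1:3], 16),
            "init_contents": line[3:7],
        }
    raise ValueError(f"unknown record code {line[:1]!r}")


if __name__ == "__main__":
    print(parse_record("H0APROG  100A"))
    print(parse_record("T0B1F2E"))
```

Two hexadecimal digits cover the 0..255 range suggested above for start exec and mem. addr.; init. contents is left as a string because its type is a decision for your group.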
Solution: Complete for each of the following, in a manner similar to that used for the outputs in task 2.1.
1. Initial state
2. Trace of execution
3. Final state
Also note the way in which errors are reported. This isn't shown separately in the system flow diagram, but you may wish to create a separate error file. If so, modify the system flow diagram and complete the output description for the error information.
Solution: Include element name, purpose, and attributes (diagramming any substructure as you did with the input and output descriptions). Note: If a Modula-2 module, a C++ class, or a RESOLVE/C++ component will be encapsulating a type that is a major shared element, it is not necessary to describe that element here because it will be described in the detailed description for that module, class, or component.
Possible Partial Solution:
1. Not every word has initial contents read in from the input file.
2. Text records need not be in numerical order by word address.
3. All memory cells will be initialized to (fill in the value your group has chosen).
Meaning (for procedural abstractions): The program A is thought of as invoking (calling) 2 modules, B and C, which presumably are invoked in that order. Module A provides B with inputs 1 and 2 and B returns 3. Module C is itself composed of D, E and F, and E has a submodule G. (Inputs and outputs for the other interfaces are not shown in this example, but all interfaces should appear in your solution.) The names given to modules should be simple commands such as Interpret Instructions, Compute Target Address, etc. This kind of diagram is called a structure chart.

Meaning (for data abstractions): A data abstraction module Data Template provides type D and operations A, B, and C. A parameter d is both an input and output parameter to operation A.
(Again, other parameters to the operations are not shown in this diagram, but should be when developing the completed module structure.) The boxes with curved sides represent the data type component of the module, while the boxes with the triangles on top represent the operations provided by the module. The triangle on top of a box denotes that this operation is lexically included as part of its parent, rather than being called from its parent. To show that this data abstraction module is invoked by (imported into) another module, simply show a line connecting the Data Template box to the other module.
Elements of solution (for each module):
Module name: For a procedural module, this should be a declarative command like interpret instructions. For a data module, it should be a descriptive name of the data abstraction, like stack of integers.
Formal parameters: Name and type of each formal parameter in order of calling sequence. For data abstractions, there may not be any parameters to the top module box because not all components are templates. But there nearly always will be parameters to the other operations provided by the data module.
Global elements required, if any: Descriptions should be included here unless they already are included in task 2.3. For data modules, any internal state information that is local to this module should be described here.
Statement of purpose of module: A brief, one-sentence description will do. If it's too hard to write a concise statement of purpose, this may be a clue that the module isn't well thought out.
Pseudo code: If this is a data abstraction, give the entire definition module, and pseudo code for each operation. It also would be nice to see pre- and post-conditions for each operation.
Likewise, writing can consist of the following stages:
1. Generation of ideas and outlines. Think! Write down ideas. Ask yourself questions and generate a list of specifications. Rearrange them, put them in groups. You may need to iterate here until you are able to sketch out a high-level structure and identify key components.
2. Prepare an initial draft. Flesh out details of the parts of the structure created above so that it can be read and understood by your audience. Basically, at this point you should have a prototype of your report.
3. Make big revisions. Evaluate basic ideas and make sure you are conforming to the requirements stated in (1). Look at the structure and the paragraphing. The organization of the report should represent a sensible coupling of ideas. Revise the components as needed.
4. Revise your paragraphs. Paragraphs reflect the organization of your report. A paragraph should be cohesive, i.e., unified around an important point.
(a) State the purpose. You should tell the reader the main topic of the paragraph as early as possible, before the reader gets lost in it.
(b) Be pertinent. Reject matter that is unrelated to the main theme of the paragraph. Develop the main point and enlarge on it.
(c) Proper coupling. Link paragraphs to paragraphs. This improves the overall organization and coherence of the paper and ensures a smooth flow of information.
5. Revise your sentences. Now, you should look at individual sentences and assess them for style and clarity. Basically, at this stage, you are evaluating operations within components.
(a) Highlight major ideas. Decide what ideas are worth emphasizing and put them in subjects, verbs or objects. Don't have too many short sentences, but don't move to the other extreme either and put too many ideas in one sentence.
(b) Add necessary words. Put in words that are needed for logical completeness of the structure. Add words needed to complete compound structures.
(c) Use good grammar. This means that you resolve mixed constructions, fix misplaced and dangling modifiers, check that quantifiers are properly bound, provide consistency for verbs, etc.
6. Choose the right language. Your choice of words should suit your audience and your topic. Avoid jargon, slang and the like. Use proper math constructs and check expressions for succinctness. Avoid too much negation. Check logical connectives for succinctness and understandability. Choose an appropriate tone.
7. Edit your punctuation. Check if your punctuation is appropriate to the context.
Note: The latter steps represent the implementation stage of the writing process. It's an iterative process and you may have to move back and forth through each stage as you discover flaws in your effort (testing and implementation are interspersed throughout the process).
8. Finally, you will be ready to show your report to the world. This represents the end of development. In the software lifecycle, this may correspond to the installation phase.
A technical report is generally more intricate than the average essay. It contains complex materials, which need to be arranged in a suitable way to help readers read and understand the report quickly. It should be as brief as possible, yet as precise as possible. Accuracy is important, particularly in design documents. A complete design report consists of the following components: 1. The front matter. 2. The body of the report. 3. The references. 4. The appendices. These components are elucidated below for a software report.
The front matter helps readers use your report efficiently. It includes the following:
1. The Title Page.
2. The Table of Contents.
3. The Introduction.
1.1 The Title Page
The title page is right at the head of the report and is the first thing readers will look at. It should comprise the following:
The title. This should be well-chosen and should clearly reflect the content of the report.
Names. The names of the people responsible for the report.
The date. When the report was submitted.
1.2 The Table of Contents
This is a map of your report. Your readers will use it to find their way through the report. It should be fairly comprehensive and should list all the sections and the subsections of the report in the order in which they appear and the page numbers on which each of them begins. It should be well-designed and should distinguish between sections and subsections by using upper/lower case letters and indentations. Figures and tables should be listed separately after the contents.
1.3 The Introduction
This gives a general overview of the project. It should provide the concepts on which the project is based and how it works. It should also lay the foundation for the other sections.
The body is the main part of the report. In CIS 560, it might include the following sections:
User's Guide.
Programmer's Guide.
Source Code.
Testing Documentation.
Alternatively, the User's Guide, Programmer's Guide, etc. can each be considered reports of their own, containing their own individual front matter, body, etc. In that case, the front matter, etc. would be more specific and relate to the particular report.
2.1 User's Guide
This should cover the basics of using your system. It should explain the capabilities of your program to the user and show him/her how to use it. It should not give the inner details of how and why you have written the program. Basically, it should cover the following:
Learning to use the system.
  Getting started.
  Starting and exiting from the program.
  Other basic topics like expected input and output, etc.
Introduction to different commands. This section should cover the instruction set and can typically be subdivided as follows:
  Understanding the command syntax.
  Advanced commands.
Error messages.
  A descriptive list of error messages.
  How to recover from errors.
An Index. If your User's Guide were a standalone document for a large system, then you could have an index containing all the significant terms you have used. However, for this course, an index is not required. If one is produced, it may be better that it be a global index covering all documentation for the project rather than being separate for the User's Guide.
2.2 Programmer's Guide
Almost invariably, someone (perhaps the original authors) will need to modify the program. The Programmer's Guide is meant for a knowledgeable user who wants to know how it works, i.e., it should portray the design details of the program. Each design detail is the conclusion of some design decision. It should include:
A Description of Data Structures.
The Purpose and Specifications of the Different Modules.
Their Inter-relationships.
Error-handling.
Parameter lists.
It should describe your program concisely so that when the user looks at the program, he/she knows where to look for a particular structure/function.
2.3 Source Code
This is an important part of the overall system documentation, and may be considered part of the Programmer's Guide. It is identified separately because it contains the implementation of your modules and data structures, rather than just their description and specification. Your program should include the following features:
Modular code with appropriate indentation, so that it is easily readable.
Good choice of variable names.
Comments. These should neither be too sketchy nor too verbose.
2.4 Testing Documentation
This should contain:
A Test Plan. This describes the different tests that are to be carried out, what they test, and their input and expected output.
Actual Test Runs. This portrays the actual results generated by the program for specified inputs and forms a collection of examples for running the program.
Testing can be carried out separately for each module of your project and the documentation should reflect this.
References
This section details the books, journals etc., to which you have referred for the project and also points the reader in the right direction, should he/she desire to learn more about the technical principles behind the project.
Appendices
Here, you can include information that, while it may be valuable to certain readers, can be omitted while still understanding the gist of the overall report. Sometimes the appendix includes extensive descriptions of matter that is more concisely used in the report body. Some candidates for the appendix of your report might be:
The Instruction Set of the machine.
A Glossary of terms used in the report.
A list of errors discovered in the program and how to fix them. A common term for this list is Errata.
Possible enhancements to the system.
An Index.
CSE 560
The first thing to consider in doing a CSE 560 writeup, or any writing assignment whatsoever, is the audience for whom you are writing. The actual audience for your 560 writeups, of course, is the grader (and/or instructor) for the course, yourself, and your lab partners. We would like for you to imagine, however, that you are writing to fairly typical computer users: experienced programmers who would like to find pre-packaged software to fill their needs rather than write their own. You should imagine that your finished documentation will be available on the world-wide web for potential users to browse through or study. We can reasonably assume that if one of our hypothetical users cannot find a software package that fits the bill exactly, he or she will be willing to try to modify one that is close.

The first consequence of writing for this imaginary audience is that your documentation should have several distinct parts that will be used for distinct purposes. These will be described below under the headings User's Guide, Programmer's Guide, Test Plan, and Meeting Minutes. Presumably you already have had some experience coding and testing programs, but perhaps little experience designing systems. For this reason, this document has a sequel, CSE 560 Software Design, which goes into more detail on the design of systems.

Another consequence of our audience is that the organization and style of your writeup are almost as important as its content. If a prospective user cannot find necessary information about your program, he or she is likely to give up on your program and look for another. Above all else, you should be concise. Try to avoid redundancy as well as ambiguity and omissions. To convey relationships among elements of your report, use tables and pictures rather than prose whenever possible. The CSE 560 Software Design document provides details and a suggested format for useful tables and pictures. A CASE tool may be employed in preparing this information.
Do not reiterate a program from your text. Algorithms described at the same level of detail as the program itself are useless to our hypothetical user. He or she needs the big picture, not bit twiddling details, most of the time. Nearly all sections/levels of your documentation should be hypertext. (Nearly all exceptions to this rule will be diagrams or pictures.) The top level should either play the role of a table of contents, or there should be a link from the top level to a table of contents page. Each part can be reached from this table of contents through a link, making it easy to open to any particular one. Each part should begin with its own cover page, stating document information (e.g., title, date written, primary author) and group information (e.g. names of members). You should generously supply cross references to other parts of your documentation; use hyperlinks to implement these cross references.
User's Guide
The user's guide should explain what your program does, and how a user can get the program to do it. Our hypothetical user does not need to know why you wrote the program. He or she simply wants to do X and wants to find out if your program can do it. Based on this part of your writeup, a user must be able to install your program, make it run, and be able to understand its output, error messages and all. Write the user's guide as if the grader knows nothing about the specifics of the lab; the user's guide should be communicating those specific details. It should explain every aspect of running and using the software, including troubleshooting.

When describing what your program does, it is not necessary to copy the original requirements verbatim into your documentation. It is perfectly reasonable simply to paraphrase appropriate sections of it, or attach in their original form whatever parts are important. Remember, however, that in CSE 560 (and in virtually any system you will encounter) some parts of the problem are left unspecified. This means that you will always have something original to say about what your program does. Note that when you are working in a group (as you are in this course), it is important to have this part of your documentation done very early so that everyone in the group is working from the same requirements.

Descriptions of the inputs to and outputs from your program, including their formats, are essential in a user's guide. By reading this document, the user should be able to visualize the reports produced by the program. Error messages and their descriptions are also essential, as are any instructions and conventions needed to access the program. CSE 560 Software Design has some further information about these issues because some of this information is needed for both the user's guide and the programmer's guide.
Programmer's Guide
The programmer's guide should tell the prospective user how your program works. This is necessary in case he or she finds that the program needs to be changed in some way. The user needs to find out fairly quickly how much work the change will take. Having to turn immediately to a long program listing will discourage our user, and probably will result in your program being set aside in favor of another (and someone else getting credit for writing a versatile program), or will result in an unnecessary new system which will be costly and wasteful of resources. Instead, the programmer's guide captures the design details of the program: the blueprint by which the final program was written. It is in this part of the writeup that you should describe your data structures, the algorithms you have chosen, the module structure, the way errors are handled, etc. This is not an appendix to your program. The user has not looked at your program yet, but rather is trying to find out whether or not to look at it, and on what parts to concentrate. You should not force the user to turn to the program to make sense of the writeup, but there certainly will be details in the code that are not in the programmer's guide. Note that, in addition to the user's guide, the programmer's guide should be a working document for your group. To this end, it should separately document: (1) data structures, (2) relationships among modules, (3) module interfaces, (4) modules themselves. In documenting data structures, it is important to describe the role the structure plays in the execution of the program (e.g., an object called pc may represent the program counter of the virtual machine), as well as its implementation (e.g., pc may be a record having two fields, one called length and the other called value), and any invariants (e.g., pc has a value in the range 0 to 65,536). In documenting modules, it is important to show which modules invoke which others, as well as how individual modules work and, lest we forget, what they do. Modules that encapsulate a data abstraction should separate the specification (i.e., definition) and algorithmic (i.e., implementation) details. Parameter lists are an essential part, but not the whole, of module documentation. Module interface descriptions must include what a module assumes about its calling environment (requires) and what it, in turn, guarantees to perform (ensures). A programmer's guide should contain a very thorough description of the flow of control of the software. By reading only this guide, a programmer can learn everything that the software does, and how it is accomplished, without having to look at any code. The programmer's guide should provide sufficient detail about the design of the software and how everything works together. The CSE 560 Software Design handout provides you with templates to help you express this information in an organized manner. A CASE tool may be employed to assist you in producing the programmer's guide, and in sharing its contents with other members of your group.
Variable Name | Local/Global | Type | Declaring Module | Purpose
Code
An essential part of the documentation of any program is the source code itself. Despite the foregoing, all that you have ever learned about comments, choice of variable names, blocking structure, etc., still applies. Do not forget that someone modifying your program needs to be able to read it. The code should be organized so that individual modules and data structures can be found easily and read quickly. It is a good idea to adopt a precise coding standard, such as can be found under the class web site for C++ and for C.
Test Plan
Your test documentation is important to the hypothetical user for two reasons. First, it provides some indication that, at least sometimes, the program actually does what you say it does. Second, it provides a source of examples for running the program. It probably is obvious that the test documentation should normally include a collection of actual runs of the program in which both the input and output are clear. Perhaps not so obvious is that the test documentation also should contain a test plan that describes the testing that is proposed to be done, rationalizes why these tests were chosen, and indicates the expected outcomes of each test case. In a sense, creation of the test plan is part of the design process. As such, it should be done early in the development process. If a mistake is found in your implementation, it should be possible to quickly find where in the test plan this feature was (or was not) exercised.
Meeting Minutes
In this course, perhaps for the first time in your major program, you will be working on a technical project as part of a group. Group projects offer the advantage of not requiring that each individual be responsible for every part of every assignment, but, at the same time, offer the disadvantage of having to depend on others to do part of the assignment correctly. Welcome to the real world. One of the most important elements in a successful group project is effective communication among the group members. You should meet with each other, and communicate via e-mail, frequently. At those meetings, there normally will be a set of topics covered, (possibly tentative) decisions made by the group, and perhaps assignments made to individual members of the group. A record of the meeting should be kept, including the date and time of the meeting, the topics covered at the meeting, the major ideas and rationales coming from the discussion, the conclusions reached (and their justification), and the assignments (often called action items) made to individuals as a result of the meeting. For each meeting, one of the group members should be assigned the responsibility of taking these minutes. The minute taker should type in the minutes and send them out to the members of the group via e-mail as soon as possible after the meeting. In addition, an archived copy of all minutes should be kept to be referred to by the group and by graders or the instructor. Each meeting (and the corresponding set of minutes) should begin with a review of the open action items from previous meetings, so that problems may be caught early. Action items should have milestones that are fine-grained enough to permit the group to determine whether a task is behind schedule early enough to be able to do something about it.
Members of the group will have different documentation responsibilities for the different labs. It is required that each member of the group have primary responsibility for a user's guide or a programmer's guide by the end of the quarter.
You should submit the file using the appropriate Carmen Dropbox. Only one person in a group should submit the lab. If there are multiple submissions, only the last one will be considered. Any submission dated after the due date and time is late. I would strongly suggest that groups test for problems in the process by submitting a test file early and seeing if they have any errors. Please email me if your group can't submit the test file. As a last resort, you can email a gzipped tar file to me. Because this last resort will cost me some time, I strongly discourage this.
Lecture #1
System Software Design, Development, and Documentation
Introduction & Administration
Course Objectives
System Software
Software Engineering
(A Little) Requirements Gathering
Design
Team Work
Writing (Documentation)
Choices
Hardware platform
Software platform
Editor(s)
Compiler(s)
Compilation management (make)
Configuration management (cvs)
Off-the-shelf components
Documentation
Remark
Now would be a particularly bad time to have to learn the main programming language that your project team will be using. Learning new off-the-shelf components also takes time.
Evaluation
We have to be able to verify independently that your source code produces an executable that has the desired behaviors. Therefore, if your team desires to use other than a CSE-provided platform, you'll have to negotiate this matter with a grader, and come up with a short, written contract describing the agreement.
Graders
Our graders are: Sean O'Connor (oconnor.173@buckeyemail.osu.edu) Kai Li (li.966@osu.edu)
Lecture #2
Assembler
Machine code e.g., ...0110101110...
Linking Loader
Linked machine code e.g., ...0110101110...
Simulator
Executing program
. . .
PC Register Bank
Execute
Perform action specified by the contents of that memory location (may involve reading/modifying other memory locations or registers)
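The fetch/execute cycle described above can be sketched in a few lines. This is only an illustration: the three opcodes (LOADI, ADDI, HALT) and the single accumulator are hypothetical and are not the A11-560 instruction set.

```python
def run(memory):
    """Fetch/execute loop for a toy machine (hypothetical instruction set)."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single accumulator register
    while True:
        op, arg = memory[pc]   # fetch the contents of the cell PC points at
        pc += 1                # advance PC before executing
        if op == "LOADI":      # execute: ACC <- arg
            acc = arg
        elif op == "ADDI":     # execute: ACC <- ACC + arg
            acc += arg
        elif op == "HALT":     # stop and report the accumulator
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc - 1}")


program = [("LOADI", 6), ("ADDI", 5), ("HALT", 0)]
result = run(program)
```

A real simulator would also update other registers and memory cells while executing, as the slide notes.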
Memory
Organized in cells
i.e., smallest addressable unit
A cell is usually:
a byte (8 bits), or a word
[Figure: memory drawn as a column of N cells, each k bits wide, addressed 0, 1, 2, 3, ..., N-2, N-1]
Questions
How many different values can be represented in such a cell?
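One way to explore the question is to enumerate every bit pattern that fits in a k-bit cell. The helper name below is an assumption made for this sketch.

```python
from itertools import product


def bit_patterns(k):
    """All distinct bit patterns that fit in a k-bit cell."""
    return ["".join(bits) for bits in product("01", repeat=k)]


patterns = bit_patterns(3)   # every pattern for a 3-bit cell
```

Counting the patterns for several values of k shows the count doubling with each extra bit.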
Number Representation
A number is a concept, for which there are many concrete representations/notations
e.g., the number eleven can be represented as 11 (decimal); XI (roman); 13 (octal); B (hex); 1011 (binary); k (alphabetic); etc
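The same idea, as a small sketch: one number, several notations. The dictionary name is an assumption for illustration; the Roman numeral is left out because Python has no built-in formatter for it.

```python
ELEVEN = 11  # the number itself, independent of how it is written

notations = {
    "decimal": format(ELEVEN, "d"),
    "octal": format(ELEVEN, "o"),
    "hex": format(ELEVEN, "X"),
    "binary": format(ELEVEN, "b"),
    # eleventh letter of the alphabet (a = 1, b = 2, ...)
    "alphabetic": "abcdefghijklmnopqrstuvwxyz"[ELEVEN - 1],
}
```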
Signed Magnitude
Use first bit to represent positive/negative
[Layout: sign bit S (1 bit), followed by Magnitude (k-1 bits)]
Q. what is the smallest number? Q. what is the largest number? Q. how many numbers total? Notice: adding a negative to a positive number looks more like subtraction!
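A minimal sketch of signed magnitude, assuming the sign bit is written first and the remaining k-1 bits hold the magnitude. The function names are hypothetical. Decoding every 4-bit pattern is one way to check your answers to the questions above.

```python
def sm_encode(x, k):
    """Encode x as a k-bit signed-magnitude bit string (sign bit first)."""
    limit = 2 ** (k - 1) - 1          # largest magnitude that fits in k-1 bits
    if not -limit <= x <= limit:
        raise ValueError(f"{x} does not fit in {k}-bit signed magnitude")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{k - 1}b")


def sm_decode(bits):
    """Decode a signed-magnitude bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


# Decode all sixteen 4-bit patterns; two of them denote zero.
decoded = {sm_decode(format(i, "04b")) for i in range(16)}
```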
One's Complement
To negate a number, flip the bits
e.g., -5 would be:
Now addition always looks like addition (with an end-around carry)
But the range is still $2^k - 1$. Why?
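A sketch of one's complement negation and addition with an end-around carry, on bit strings of equal length. The function names are assumptions for illustration.

```python
def oc_negate(bits):
    """Negate by flipping every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)


def oc_add(a, b):
    """Add two k-bit one's complement numbers with an end-around carry."""
    k = len(a)
    total = int(a, 2) + int(b, 2)
    if total >= 2 ** k:               # a carry came out of the top bit
        total = total - 2 ** k + 1    # drop it and add it back in at the bottom
    return format(total, f"0{k}b")


minus_five = oc_negate("0101")        # 5 is 0101 in four bits
seven_minus_five = oc_add("0111", minus_five)
five_minus_five = oc_add("0101", minus_five)
```

Note that 5 + (-5) yields the all-ones pattern, a second representation of zero, which is worth keeping in mind when answering the "Why?" above.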
Two's Complement
To negate a number:
i) flip the bits ii) add 1
Imagine a circle:
[Figure: the 4-bit patterns arranged around a circle: 0000 = 0, 0001 = 1, ..., 0111 = 7, ..., 1110 = -2, 1111 = -1]
Let k be the number of bits in the representation. Number x is representable iff $-2^{k-1} \le x \le 2^{k-1} - 1$. Assume so in the following. Let bin(y) be the simple binary representation of y. If $0 \le x$, then x is represented by bin(x). Otherwise, x is represented by bin($2^k + x$).
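The definition above translates directly into code. This is a sketch with hypothetical function names; `tc_negate` follows the "flip the bits, add 1" recipe so the two approaches can be compared.

```python
def tc_encode(x, k):
    """k-bit two's complement of x: bin(x) if x >= 0, else bin(2**k + x)."""
    if not -(2 ** (k - 1)) <= x <= 2 ** (k - 1) - 1:
        raise ValueError(f"{x} is not representable in {k} bits")
    value = x if x >= 0 else 2 ** k + x
    return format(value, f"0{k}b")


def tc_negate(bits):
    """Negate by flipping the bits and adding 1 (wrapping at k bits)."""
    k = len(bits)
    flipped = int(bits, 2) ^ (2 ** k - 1)   # flip every bit
    return format((flipped + 1) % 2 ** k, f"0{k}b")


minus_five = tc_encode(-5, 4)
```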
Lecture #3
Instructions
Basic format of instructions in memory:
OP CODE OPERANDS
(A11-560 instructions are all the same size) Operands can be interpreted in different ways
Addressing Modes
Immediate
argument is the operand
LOAD #6 effect: ACC <- 6
Register Direct
operand gives the register where argument is
LOAD r1 effect: ACC <- r1
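The two modes above differ only in where the argument comes from. A minimal sketch, assuming operands are written as strings ("#6" for immediate, "r1" for register direct) and registers are kept in a dictionary; both conventions are assumptions for this example.

```python
def load(operand, registers):
    """Return the value that LOAD would place in ACC."""
    if operand.startswith("#"):       # immediate: the argument is the operand
        return int(operand[1:])
    if operand.startswith("r"):       # register direct: operand names the register
        return registers[operand]
    raise ValueError(f"unsupported operand {operand!r}")


registers = {"r1": 42}
acc_immediate = load("#6", registers)
acc_register = load("r1", registers)
```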
[Slide residue: effect ACC <- M[1B00], then effect ACC <- M[ M[1B00] ], illustrated with memory cells 1B00 and 1B01]
A11-560 Instructions
Instruction fields and widths in bits: OP (4), U2 (2), R (2), X (2), U1 (2), S (8)
addressing modes:
immediate, register direct, memory direct, PC relative (with or without indexing), opcode extension, and ignore
general syntax:
OP R,S(X)
where S(X) denotes:
S , if X = 0
S + rX , if X = 1,2,3
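A sketch of how a simulator might split a 20-bit word into these fields and compute the address denoted by S(X). It assumes the fields are packed most-significant first in the order OP, U2, R, X, U1, S; that ordering, the function names, and the example word are assumptions, and no wrap-around of the sum is modeled.

```python
# (name, width in bits), most significant field first (assumed ordering)
FIELDS = [("OP", 4), ("U2", 2), ("R", 2), ("X", 2), ("U1", 2), ("S", 8)]


def decode(word):
    """Split a 20-bit instruction word into its named fields."""
    fields = {}
    shift = sum(width for _, width in FIELDS)   # 20 bits in total
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def effective_address(fields, registers):
    """S if X = 0, otherwise S plus the contents of register X."""
    x = fields["X"]
    return fields["S"] if x == 0 else fields["S"] + registers[x]


fields = decode(0xB38B4)                        # arbitrary example word
address = effective_address(fields, {1: 0, 2: 4, 3: 0})
```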
Categories of Instructions
Branch
unconditional and conditional
Load / Store
copies data between registers and memory
Arithmetic / Logical
addition, subtraction, shifting,
IO
read/write numeric and ASCII data
Address  Memory (Instruction/Data)
. . .
B0       700A0
B1       020BA
B2       B3800
B3       92008
B4       B38B4
B5       000B0
B6       B30B6
B7       90008
B8       B3000
B9       C00FF
BA       546F5
. . .
Lecture #4
Completed Documentation: October 18, before class, in the Carmen lab2 Dropbox.
Lab 2 Requirements
Input: two text files
executable file: initial header record, followed by a sequence of text records
process input file: consulted at process run time by IO instr.
process trace:
1. memory configuration & registers after loading
2. trace of each instruction executed (i.e., memory & registers affected)
3. memory after termination
Example gotchas
What is a segment good for? How is it used? How might a segment name be used?
Ask the customer.
Lecture #5
Software Engineering
However, system complexity frequently pushes the envelope
Net effect:
The support for building complex systems always seems to lag behind the systems we build (or want to build) !!
easy to read
coding conventions and style
Fundamentals
Many alternatives to pure waterfall exist
spirals, matrices,
write a design once we figure out how the system should work.
ON TESTS
Requirements Analysis does not aim to answer any of the following questions:
What environment will the system operate in? (This is deferred 'til later.)
What will be the performance characteristics of the system? (We haven't thought about HOW to implement this yet.)
What will be the cost of delivering the system to the client? (Same as above.)
Procedural programming deals with data. There is less abstraction. It deals with transformations that happen to data.
Procedural Design
Use this data-flow to decompose a large system into smaller modules
ON TESTS
OO focuses on the data itself, not the functionality. Of course, looking at the name, this makes sense.
Focus on the data. We think of different types of data with different functionality. Objects interact by passing messages to each other, that is, by calling each other's methods.
Design is often based on reality Talk to field experts to understand the system being modeled in software. Write down scenarios
use case analysis. Very important. (JankCMS)
Must strike a balance (small vs. large) in the size and functionality of classes
cuts across both procedural and OO
Lecture #6
Road Map
Project - Lab 2
Form groups
Requirements analysis
Design review
Submit complete project (i.e., implementation, documentation, tests)
Lectures
Admin stuff
Abstract machine
Software Engineering <detour!>
Testing
Technical Writing
Overview (II)
Examples:
operating systems
software used to create other software!
compiler, assembler, linker / loader, debugger, editor (could use Word as system software: writing C++ code, etc.)
Could be on exam: Compilers and assemblers are both translators (from one language to another). Lab 2 is a simulator (interpreter). Usually a debugger is an interpreter (allowing us to see code execute).
A program is static, A process is dynamic. A process uses a program's instructions to carry out actions.
time
A= 3 B=4 PC = S3
An interpreter is a program. When it becomes a process (when we run it), it becomes something that advances another process through its states. That other thing is a program that we give the interpreter. Simply: running an interpreter becomes a process that advances another process through its states. BASIC is interpreted. Java involves both translation and interpretation. Java bytecode is usually interpreted.
A translator is a program, but when we run it, we give it another program, and it produces a 3rd program. The process a translator governs is merely a translation process. C++, for example, is translated to machine code, possibly with an intermediate translation to assembly language. Java also involves translation, by having source translated to bytecode (.class files).
Advantages of translating: faster, because it's translated to machine code. Disadvantages of translating: symbolic debugging is difficult.
Advantages of interpreting: quicker debugging, prototyping. Disadvantages: slower.
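The distinction can be made concrete with a toy stack language, invented purely for this sketch (two instructions: push and add). The interpreter advances the program through its states and yields a result; the translator never runs the program, it only produces a third program (here, Python source text) that is executed later.

```python
def interpret(program):
    """Advance the program through its states, one instruction at a time."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


def translate(program):
    """Produce a new program (Python source text) without running anything."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(str(args[0]))
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} + {right})")
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return "result = " + stack.pop()


source = [("push", 3), ("push", 4), ("add",)]
target = translate(source)     # the target program, as text
namespace = {}
exec(target, namespace)        # executing the target happens later, separately
```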
THIS IS ON TEST.
Interpreter
Why are we concerned more with the OS than with the architecture when compiling? In C++, we have input/output streams. IO, for example, is effected by the language using operating system routines.
Any language can be translated or interpreted. The CPU always interprets its instructions.
Lecture #7
The operating system defines when a process really "begins". Though the CPU is the primary resource, the OS controls it and other pieces like memory in order to allow a process to execute.
OS - Introduction (II)
Resources must be managed This is the job of the operating system (OS)
This is one of the biggest tasks of the OS.
Concurrency is fundamental to the use of an operating system. Concurrency is hard because of scheduling: priority, length of time waiting (starvation), and many other factors.
Challenges in OS
Concurrency is fundamental Concurrency is hard
Example: sharing a bridge
Long tunnel that only fits 1 lane of traffic What policy do you use to control traffic?
Example:
Process A is using X and needs Y; process B is using Y and needs X <-- Deadlock
OS must avoid and/or detect and resolve deadlock and starvation (waiting too long to get resources...)
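One way to detect the situation above is to build a "waits-for" relation and look for a cycle. This is a simplified sketch: it assumes each process holds at most one resource and wants at most one, and the function and parameter names are hypothetical.

```python
def find_deadlock(holds, wants):
    """Return the processes on a waits-for cycle, or None if there is none.

    holds: process -> resource it currently holds
    wants: process -> resource it is waiting for
    """
    owner = {resource: process for process, resource in holds.items()}
    # p waits for whichever process owns the resource p wants
    waits_for = {p: owner.get(r) for p, r in wants.items()}
    for start in waits_for:
        seen = []
        current = start
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current == start:          # followed the chain back to where we began
            return seen
    return None


cycle = find_deadlock(holds={"A": "X", "B": "Y"}, wants={"A": "Y", "B": "X"})
no_cycle = find_deadlock(holds={"A": "X", "B": "Y"}, wants={"A": "Y"})
```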
Responsibilities of OS
Handles interrupts
Interrupt-driven IO is supposed to increase efficiency of simultaneous requests.
Responsibilities of OS (II)
virtual memory may be larger than real memory!
Responsibilities of OS (III)
File management
These let us access bytes in a file without keeping track of our place.
CPU
schedules processes (i.e., running / ready / waiting)
running: process is using CPU
ready: process is capable of making progress by using CPU
waiting: process cannot make progress without something else (user interaction, etc.)
Security
prevents one user from damaging another's data
prevents user from damaging operating system
Documenting Operations
Name: CheckHeaderSyntax()
description: This function checks whether or not a given string conforms to the required syntax for header records (see section 2.3.4)
calling sequence:
input: char *s - header record to be checked
returns: boolean - true iff header syntax ok
requires:
ensures:
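The same template can live in the code as a documentation comment. The sketch below fills in the requires/ensures clauses for illustration. The header layout it checks is invented for this example only (the real syntax is defined in the section cited above, which is not reproduced here).

```python
import re

# Hypothetical header layout, invented for this sketch only:
# 'H', a 6-character alphanumeric segment name, then two 2-digit hex fields.
_HEADER = re.compile(r"H[A-Za-z0-9]{6}[0-9A-F]{2}[0-9A-F]{2}")


def check_header_syntax(s):
    """
    description: checks whether s conforms to the (assumed) header syntax
    input:       s - header record to be checked
    returns:     bool - True iff header syntax ok
    requires:    s is a str
    ensures:     s is not modified; no exception is raised for any str
    """
    return _HEADER.fullmatch(s) is not None


ok = check_header_syntax("HMainP08589")
```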
Visible State
Basic principle: information hiding
hide implementation details from client
simplifies interface
client shouldn't rely on these details
For each shared type, then, there are two kinds of specification: internal & external
Lecture #8
Testing
1. Philosophy
2. Example
3. How-tos (including code) and caveats
4. Levels of Testing
Testing: Philosophy
Definition of Testing
What is testing?
A process whereby we increase our confidence in an implementation by observing its behavior
Fundamental point:
testing can detect the presence of mistakes, never their absence!
That's why we should never be confident in our code without testing!
A test case reveals a defect ==> Fix it! No test case reveals a defect ==> Not enough testing!
Importance of Testing
Despite limitations, testing is the most practical approach for large systems
Knuth quotation:
Warning: I've only proven this algorithm is correct; I haven't tested it! Haha
When a test reveals an error, that's success! Good approach: have someone else test your code
This is one of the best things about working in a team.
Theory
3 levels of abstraction in functionality
Want: the idea
Have: implementation
Testing requires comparing it against something, but what?
Idea
capturing this idea into a concrete form.
Specification
Implementation
Theory (II)
Ideal: test against our idea
but the idea is usually too fuzzy
If different people, based on a specification, write two different implementations, both should have the same expected output.
Testing: An Example
elements(SORTED(List)) = elements(List)
Expected Output
A: #x is a permutation of x, and for all y in x, ARE_IN_ORDER(y,y+1) Specifications often relate final states to initial ones
but not necessarily true, e.g., void f(int & x)
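The sorting specification above relates the final state to the initial one, so a checker needs both. A minimal sketch, with a hypothetical function name, of the specification written as a predicate:

```python
from collections import Counter


def satisfies_sort_spec(before, after):
    """True iff `after` is a permutation of `before` and is in order."""
    same_elements = Counter(before) == Counter(after)     # permutation check
    in_order = all(after[i] <= after[i + 1] for i in range(len(after) - 1))
    return same_elements and in_order


good = satisfies_sort_spec([3, 1, 2], [1, 2, 3])
dropped_element = satisfies_sort_spec([3, 1, 2], [1, 2, 2])
unsorted_output = satisfies_sort_spec([3, 1, 2], [3, 1, 2])
```

Because the predicate is written from the specification alone, it can judge any implementation without knowing how that implementation works.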
Lecture #9
extremes (e.g., empty list, |x| = 100)
trivial / degenerate (|x| = 1, x is already sorted)
error-generating: at least one (probably more) test cases that generate each possible error message
different categories (e.g., pos./neg. numbers). Also, inputs that cross categories.
typical input (random list)
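A sketch of a test suite organized by the categories above, for a sort routine. `sort_under_test` is a stand-in for the implementation being tested (an assumption), and the error-generating category is omitted because this stand-in produces no error messages.

```python
import random


def sort_under_test(items):
    # Stand-in for the implementation being tested (assumption).
    return sorted(items)


random.seed(560)  # fixed seed so a failing run can be reproduced

TEST_SUITE = {
    "extreme: empty list": [],
    "extreme: 100 elements": [random.randint(-50, 50) for _ in range(100)],
    "trivial: one element": [7],
    "trivial: already sorted": [1, 2, 3, 4],
    "category: negative numbers": [-3, -1, -2],
    "crossing categories: mixed signs": [2, -2, 0, -1, 1],
    "typical: random list": [random.randint(0, 999) for _ in range(10)],
}

failures = []
for name, data in TEST_SUITE.items():
    original = list(data)                 # copy the input before running
    output = sort_under_test(data)
    ordered = all(output[i] <= output[i + 1] for i in range(len(output) - 1))
    same_items = sorted(original) == sorted(output)
    if not (ordered and same_items):
        failures.append(name)
```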
3. Work backwards
Example: sorting
write two functions: copy the input run program and check:
Check
INSERT: Code for Testing Harness Insert 1: Test Driver for Sort Insert 2: Test Suite for Sort Insert 3: Validating Form of Test Driver
Which is worse?
It can be vague on a point. It could say something other than what the author intended. (Conflicts with the original idea)
Testing: Levels
Levels of Testing
Typical testing path:
1. Unit tests: Testing individual pieces of the software independently.
2. Integration tests: Testing how the individual parts communicate.
3. System tests: Testing entire program behavior.
Unit Tests
Individual modules tested in isolation Two flavors:
1. Black box: testing based only on specification (tester doesn't even look at code)
2. White box: testing based on code structure (e.g., tester makes sure every branch of a switch statement is followed)
Integration Tests
Modules tested in combination in order to check the interfaces Best done incrementally
[Figure: module call tree (Main; Initialize, Load, Simulate; FileIO, Header, Reserve, SetMemory) with the interfaces between modules marked "test here"]
System Tests
Verify that system as a whole meets the requirements and specifications Three flavors:
1. alpha: by developers, before release
2. beta: by friendly customers, before general release
3. acceptance: by end customer, to decide whether or not to hire you next time!
Lecture #10
Technical Writing
Why Bother?
Communication is fundamental in society
Politics, law, science, personal lives, health,
It's a good idea to say somewhere the requirements for the audience.
Technical Audience
Function-oriented organization
E.g., alphabetical listing of all functions
Technical Audiences: Function-oriented. Customer Audiences: Task-oriented. => More of a tutorial. (They don't care about "why")
Customer Audience
Prefer task-oriented organization
E.g., enumerated steps for each possible task
Knowledge: Facts
Comprehension: Understanding of a fact's implications
Application: How we apply facts to the current situation.
Analysis: Creating new ideas from those we already have.
Synthesis: Joining what we know together, creating connections to other topics and ideas.
Evaluation: Judging ideas based on their importance and validity.
Comprehension
associate, compare, compute, contrast, describe, differentiate, discuss, distinguish, estimate, extrapolate, interpolate, predict, translate.
Analysis
analyze, detect, explain, group, infer, order, relate, separate, summarize, transform.
Evaluation
appraise, assess, critique, determine, evaluate, grade, judge, measure, rank, select, test.
Example
Compare the performance of two cache replacement algorithms
Cognitive tasks
Compare Contrast Maybe analyze and recommend?
Prewriting Tasks
Quick list
Specific points for each cognitive task
Brainstorm
List everything you know about the topic
Do not judge or weed anything out
Objective: quantity
Review list
Assess where research is needed
Prewriting Tasks II
Choose a single point that will be in the final product and outline a section to develop that point. Involve other points as appropriate. Do the research. Plan the format. This is not a linear process!
Lecture #11
Advantages of Components
The whole document is too intimidating Obvious milestones
Reduces panic (you know where you stand) Permits time budgeting
Reduces writing to a step-by-step process
Instant gratification
Easy cure for writer's block: work on a different section
Writing a Component
Know the purpose Have all the information Different strategies:
Write a draft using sentences
Jot down points in any form, then flesh out into sentences
Combination (sentences, phrases, points)
Rhetorical Patterns
Every culture has well-established patterns of exposition
Ready-made structures into which specific information may be dropped
The reader is already familiar with these patterns The technical writer does not have the time (or skill) to invent new ones
General-to-specific Pattern
Often used for introductory section Start with the most general statement
More computing resources are devoted to the management of data than to any other task
Classification Pattern
Organize information by dividing it into categories
E.g., a section on each of initialization, data entry, selection, access control, etc
Definition Pattern
Typically short and simple Example
Undo is a function that restores an object to its state immediately prior to that last operation Places Undo in the class of functions, then distinguishes it from other functions
Chronological Pattern
Typical for task-oriented instructions
Given in the order in which they must be performed
Effect-and-cause Pattern
Often used for error messages Give a list of error messages
ordered alphabetically, by error number,
After each cause, give the action(s) the user should take to recover
Possible Transitions
Moving to the next point in a sequence
Firstly, secondly,
A result or conclusion
Therefore, in consequence,
Possible Transitions
Introducing an example
For example, that is,
Concluding
In conclusion, in summary, finally,
Preliminary Draft
Starting is always difficult
Helps to remember it's just a draft!
Don't worry about spelling, grammar, form
Spend effort on sound communication of major points
Fill in your outline
Middle Draft
Build on the base of the preliminary draft
Refine the organization and fill in points
Ensure each point belongs in that paragraph
Cut and paste
Play with the text
Font, layout, spacing, page count
Final Draft
Spelling and grammar
Run the spell checker, but that's not enough!
Revising
Where bad writing becomes good writing First draft is always bad
Tempting to become attached to text we've written
Write the first draft anticipating that it will change in the future
Revising Tasks
Add flow and smooth transitions Careful, accurate movement from one point to the next, one section to the next Make decisions that have been put off Reduce wordiness Clarify subordination relationships between points
Winston Churchill:
That is criticism up with which I will not put.
Tone
Distant, warm, intrusive
Bottom Line
Technical writing requires work, practice, skill, technique, time; not talent. The first draft is always bad writing. Allow time (and energy) for revisions. There is no substitute for having something to say. You can't bluff it.
Lecture #12
CVS
CVS: Examples
The following examples show an existing project being put under CVS How to start using the repository Then two different people making changes:
Putting modified file into repository Getting each others changes. Finding out how things have changed.
The Repository
Two ways to set the root of the repository
Environment variable
setenv CVSROOT /project/c560ab05/CVSREP
Creating Repository
Once per project, by one person (with umask 7)
Command:
cvs init
Creates repository root, administrative files Check that group and other permissions have been properly set. (Use ls -alF.)
Copies current directory contents to module
Afterwards, original source can be removed
But be careful!!
Checking Out
Once per person Command:
cvs checkout <module>
Committing
Commit (copy) changes made on local (working) files to the repository Command:
cvs commit
New files created in local working directory must be explicitly added (before commit)
cvs add <new-file>
Committing (Example)
% cd ~person1/mycode/sim
% (modify loader.c and create memory.h)
% cvs add memory.h
cvs add: scheduling file memory.h for addition
cvs add: use cvs commit to add this file permanently
% cvs commit
cvs commit: Examining .
[editor starts; type & save log entry; exit editor to cont.]
Checking in memory.h;
/project/c560ab05/CVSREP/sim/memory.h,v <-- memory.h
Initial revision: 1.1
done
Checking in loader.c;
/project/c560ab05/CVSREP/sim/loader.c,v <-- loader.c
New revision: 1.2; previous revision: 1.1
done
Updating
Each person, with appropriate frequency
Command:
cvs update
Brings your local working directory up-to-date with repository (merging differences if possible)
U: local file was updated
A/R: local file added/removed
M: local file is a modification of repository
C: conflict detected between local file and repository
Updating (Example)
% cd ~person2/mycode/sim
% ls
CVS/ loader.h loader.c simulator.h simulator.c
% cvs update
cvs update: Updating .
U loader.c
A memory.h
% ls
CVS/ loader.c simulator.h loader.h memory.h simulator.c
Working on Project
Multiple people can simultaneously checkout the same module Person1 and Person2 are both working away on their local copies
If working on different files, no problem
Not quite true!
Conflict Resolution
Person1 checks out code
modifies loader.c
Resolving Conflicts
CVS tries to merge changes Sometimes changes clash
% cd ~person2/mycode/sim
% cvs update
cvs update: Updating .
RCS file: /project/c560aa/CVSREP/sim/loader.c,v
Retrieving revision 1.5
Retrieving revision 1.6
Merging differences between 1.5 and 1.6 into loader.c
rcsmerge: warning: conflicts during merge
cvs update: conflicts found in loader.c
C loader.c
When to Update/Commit
When confident things can be used by others
Don't wait until perfection
Your commits should at least compile though!
Update when you are ready for someone else's work
The more files, the better
Lecture #13
Definition
Recall: translation (vs. ______________)
source program translated into target program
(virtual) execution of the target on its VM should represent (have the same behavior as) (virtual) execution of the source on its VM
source is not directly executed
target (object file) is executed or translated later
Definition II
When the source is a symbolic representation of machine language:
source language = __________________ translator = __________________
performance ?
Hard to read
high cost of maintenance
can be 2/3 of total
15% (annual) programmer turnover
Modern Approach
Write in high-level language
Analyze to find where time spent
Invariably, it's a small part of the code
Tune that tiny part for high performance
perhaps by writing in assembly language
Modern Approach II
Higher level can be a performance win too!
problem-oriented language gives problem-level insights huge performance gains are in algorithmic insights
e.g., $O(n^3)$ vs. $O(n \lg n)$
assembly language programmer tends to be immersed in bit-twiddling (saves small amounts all over, but misses big picture)
Example Instruction
Test BRZ 1,Loop ;if R1=0 goto Loop
label
operation
operands
comment
Label Field
Symbolic name for an instruction's or a datum's address (often, but not always)
Clarifies branching to a particular instruction
e.g., BR Loop1
e.g., IO 2,depth
Operation Field
Mnemonic for an instruction
e.g., ADD, SUB, BRZ
Operand Field
Addresses and registers used by instruction
recall: arguments to the function
What to add, where to branch, where to store, ...
Operands for pseudo-instructions
used to give information to the assembler
e.g., program name, how much space to save, ...
Comment Field
No effect on translation
no semantic impact on program
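The four fields can be separated mechanically. The sketch below makes one assumption that does not come from the lab specification: a label, if present, starts in column 1, and unlabeled lines start with whitespace. The function name is hypothetical.

```python
def parse_line(line):
    """Split one source line into label, operation, operands, comment.

    Assumption (not from the lab specification): a label, if present,
    starts in column 1; unlabeled lines start with whitespace.
    """
    code, _, comment = line.partition(";")      # comment runs to end of line
    has_label = bool(code) and not code[0].isspace()
    tokens = code.split()
    label = tokens.pop(0) if has_label and tokens else None
    operation = tokens.pop(0) if tokens else None
    operands = tokens.pop(0).split(",") if tokens else []
    return {"label": label, "operation": operation,
            "operands": operands, "comment": comment.strip()}


fields = parse_line("Test BRZ 1,Loop ;if R1=0 goto Loop")
unlabeled = parse_line("     ADD 1,106")
```

Since the comment field has no effect on translation, an assembler can discard it immediately after this step.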
Lecture #14
Example Program
If we want:
N := I + J + K;
Pseudo-Operations
Recall: operation field can be either:
instruction (BR, SHL, ...)
pseudo-op
Unlike operations, do not have a machine instruction (opcode) equivalent
Give information to the assembler itself
assembler directives
SPARC Pseudo-Operations
I_s: .word 0
J_s: .word 0
K_s: .word 5
N_s: .word 0
A_s: .skip 400
Pseudo-Ops: Uses
Four principal uses:
segment definition
symbol definition
memory initialization
storage allocation
Segment Definition
Recall information in header record:
initial execution address
segment name
length
load address
Segment Definition II
Two important pseudo-ops:
ORI (origin)
END (end)
MainP ORI 133
      ST 0,136
      . . .
      END 137
ST 0,136
89x
Symbol Definition
A label creates a symbol
Symbol is often implicitly defined to be the address of that instruction and/or data
Hello ORI 133
Test  ST 0,136
      BRZ 1,147
      . . .
Symbol Definition II
(133) 85x ST 0,136
      86x BRZ 1,147
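A sketch of how an assembler's first pass might define symbols with a location counter. Several things here are assumptions for illustration: lines arrive already split into (label, operation, operand) tuples, every instruction occupies one word, ORI sets the location counter, the label on the ORI line is treated as the segment name rather than a symbol, and the label Test is taken to sit on the ST line.

```python
def pass_one(lines):
    """lines: (label, operation, operand) tuples -> (segment name, symbol table)"""
    location_counter = 0
    segment_name = None
    symbols = {}
    for label, operation, operand in lines:
        if operation == "ORI":                  # origin: set LC, name the segment
            location_counter = int(operand)
            segment_name = label
            continue
        if label is not None:
            symbols[label] = location_counter   # symbol = address of this line
        location_counter += 1                   # assume one word per instruction
    return segment_name, symbols


program = [
    ("Hello", "ORI", "133"),
    ("Test", "ST", "0,136"),
    (None, "BRZ", "1,147"),
]
segment, table = pass_one(program)
test_address_hex = format(table["Test"], "X")
```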
Example:
ACC
Use of Symbols
Example 1: ADD ACC,106
translates as: i.e.:
. . .
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA
Memory
7 0 B 9 B 0 B 9 B C 5 0 2 3 2 3 0 3 0 3 0 4 0 0 8 0 8 0 0 0 0 0 6 A B 0 0 B B B 0 0 F F 0 A 0 8 4 0 6 8 0 F 5
Instruction/Data
. . .
Memory Initialization
Recall "Top\n" example
Sometimes want to load data into memory
might be able to use corresponding instruction (because machine doesn't care!)
but that is inconvenient (and there may not always be a corresponding instruction)
Example:
Count NMD 10007
Text  CCD He
      CCD ll
      CCD o!
H l o
e l !
Lecture #15
Storage Allocation
Set aside a block of memory
not initialized (i.e., don't care)
Pseudo-op:
RES (reserve storage)
Example:
X      NMD 0
Buffer RES 100
Y      NMD 0
Yields:
Symbols - Shortcomings
We've seen a lot of utility for symbols
mnemonics for data constants & memory addresses
Symbols - Shortcomings II
Problems with this approach:
___________________ ___________________ ___________________ ___________________
Alternative: Literals
Implicit allocation & initialization of memory
Allows us to put the value itself right in the assembly language instruction
Preface with =
Example:
LD 1,=582
Literals
This means:
allocate storage at end of program
initialize this storage with the value 582
use this address in the instruction
Literals - Restrictions
Must be in the range -2^19 to 2^19 - 1
can be represented by 1 word
Can only replace the S field
Cannot be indexed
Cannot use with:
loading an immediate value, branch, store, shift, IO for reading, IO for writing a character
Lecture #16
Loader
Executable File
Emulator
Memory
0 35
50
Footprint
255
Lab 3
Assembler does not need to keep this (potentially huge) array/footprint
Instead, use tables (symbol, literal, ...) and location counter
Generate object file only (much smaller)
Assembler Tasks
1. Parse assembly language instructions
check for syntax
tokenize the character string
maintain location counter (LC)
LC = eventual location in memory of this instruction or data
Assembler Tasks II
3. Generate machine code
evaluate mnemonic instruction
replace with opcode
recognize & translate synthetic instructions (RET, etc.)
evaluate operand subfields
replace symbols & literals with value
concatenate to form instruction
4. Process pseudo-ops
generate header record
evaluate NMD, CCD, etc.
Example
1 Prog  ORI 20
2 Acc   EQU 0
3 Begin LD  Acc,N    ;R0 <- 13
4       ADD Acc,=1   ;R0 <- R0+1
5       ST  Acc,Ans  ;M[Ans] <- R0
6       BR  0,0
7 N     NMD 13
8 Ans   RES 1
9       END Begin
First Attempt
Read each input line and generate machine code Line 1
information for header
but not enough for full header record
we do know:
20
First Attempt II
Line 2:
information for assembler
symbol Acc set to 0
Line 3:
yeah! An instruction to translate! (LD Acc,N) yields:
Solution:
Now let's see some basic data structures for assemblers...
Machine Op Table
Mnemonic Name   Opcode   Instruction Size   Instruction Format
Static (doesn't change during computation)
For the (simple) abstract machine:
all opcodes are 4 bits
all instructions are same size (i.e., 1 word)
all formats are the same (i.e., O|U2|R|X|U1|S)
Machine Op Table II
But this need not be the case in general
different opcode lengths
common instructions have short opcodes (e.g., 0110)
less common ones are longer (e.g., 1110110)
Pseudo Op Table
Mnemonic Length Format
Location Counter
Eventual address of this instruction or data
Initialized with __________
Increase with each instruction
see _________________
Symbol Table
Name Value Other Stuff
Pass #1
each symbol is identified
every time a new symbol is seen (i.e., a label), insert it into the Symbol Table
if it's already there?
Symbol Table II
Pass #1 (continued)
each symbol given a value
explicit assignment (e.g., Acc EQU 0)
easy! just put the value of operand into the table
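As a rough illustration of Pass #1, here is a minimal sketch that walks the nine-line example program and fills a symbol table. It rests on assumptions that the slides leave open: the location counter starts at the ORI operand, every instruction and NMD occupies one word, RES n occupies n words, EQU operands are plain integers, and a duplicate label is treated as an error. The name pass_one is hypothetical.

```python
def pass_one(lines):
    """Build a symbol table from (label, operation, operand) triples."""
    symbols = {}
    lc = 0
    for label, op, operand in lines:
        if op == "ORI":
            lc = int(operand)            # assumption: LC starts at the ORI operand
        if op == "EQU":
            value = int(operand)         # explicit assignment, e.g. Acc EQU 0
        else:
            value = lc                   # implicit: address of this instruction/data
        if label:
            if label in symbols:         # assumption: a duplicate label is an error
                raise ValueError(f"duplicate symbol: {label}")
            symbols[label] = value
        if op == "RES":
            lc += int(operand)           # reserve that many words
        elif op not in ("ORI", "EQU", "END"):
            lc += 1                      # assumption: everything else is 1 word
    return symbols, lc


program = [
    ("Prog", "ORI", "20"), ("Acc", "EQU", "0"), ("Begin", "LD", "Acc,N"),
    ("", "ADD", "Acc,=1"), ("", "ST", "Acc,Ans"), ("", "BR", "0,0"),
    ("N", "NMD", "13"), ("Ans", "RES", "1"), ("", "END", "Begin"),
]
symbols, end_of_program = pass_one(program)
print(symbols, end_of_program)
```

Under these assumptions the forward references to N and Ans on line 3 and line 5 are resolved by the end of the pass, which is the reason a second pass can translate every instruction.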
Lecture #17
Literal Table
Name Address Value Size Other Stuff
Pass #1
literals are identified and placed in the table
name, value, and size fields updated
duplicates can be eliminated
Literal Table II
After Pass #1:
literals are added to the end of the program
address field can now be calculated
Pass #2:
literals in instructions are replaced with the __________ field from the Literal Table
what if the literal is not in the table?
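The Pass #1 handling of literals can be sketched as follows. This is an illustration under assumptions: operands are given as plain strings, a literal is any operand starting with '=', each literal occupies one word, and the first free address after the program is passed in. The names build_literal_table and end_of_program are hypothetical.

```python
def build_literal_table(operands, end_of_program):
    """Collect literals (operands written as =value) and assign addresses."""
    table = {}
    for operand in operands:
        # Duplicates are eliminated: a literal is entered only once.
        if operand.startswith("=") and operand not in table:
            table[operand] = {"value": int(operand[1:]), "size": 1, "address": None}
    # After Pass #1: literals go at the end of the program, one word each.
    for offset, entry in enumerate(table.values()):
        entry["address"] = end_of_program + offset
    return table


literals = build_literal_table(["N", "=1", "Ans", "=582", "=1"], 26)
for name, entry in literals.items():
    print(name, entry)
```

Addresses can only be filled in after the whole program has been scanned, since the end of the program is not known earlier.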
Pass #1
Pass #2
Symbol Table Location Counter Literal Table Machine Op Table Pseudo Op Table
Listing File
Pass #1 Invariant
Top ORI 34 ... Loop --- --... ... S EQU --...
Key invariant:
current position
1. Pseudo code for 2-pass assembler
2. In-class exercise: hand assembly
calculate symbol and literal tables
calculate loaded image in memory
calculate object file
Lecture #18
Relocation
Absolute Programs
Programmer decides a priori where program will reside
e.g., Prog ORI 176
Picture is dynamic
jobs are scheduled jobs complete
What Changed?
Load Address = ________
Relocation
The loader must update some (parts of) text records, but not all
after load address has been determined
Modification Records
One approach: define a new record type
Tag Location
Modification Records II
We could add the following records to the object file:
M01 M02
Size of relocation data independent of number of records needing modification
Hard to read (debug, grade, ...)
Lecture #19
Kinds of Data
Our machine has two flavors of data:
relative (to the load address)
absolute
The first must be modified, the second not
Let's look at how these kinds arise
Example 2
EG2   ORI
TS    EQU 27       ! TS = 27
V     NMD 27       ! [V] = 27
Start LDI 1,V      ! R1 = V = ?(relative)
      LD  2,0(1)   ! R2 = [0+V] = 27
      LDI 3,TS     ! R3 = TS = 27
      LD  0,0(1)   ! R0 = [0+V] = 27
      LD  1,0(1)   ! R1 = [0+V] = 27
      SUB 1,=27    ! R1 = 27 - 27 = 0
      BRZ 1,Stop   ! if (R1 is 0) then halt
      BR  3,Start  ! goto Start
Stop  BR  0,V(3)   ! halt; dump all
      END
Symbols
Some are relative:
e.g.,
Symbol Table
Name Value Relative?
Symbols: Rules
A symbol is absolute if and only if it is defined in an EQU by:
_____________, or _____________________
Literals
Our machine does not have relative literals
Other machines allow a special literal, =*, to mean current location counter
e.g., LD 1,=*
such a literal is relative, others are absolute
Name Star1 =6
Value
Address
Relative?
Literals
With literals, "relative" refers to the value
the addresses are always relative!
Convention
To denote a relocatable program, omit the operand of ORI
Prog1 ORI 96   (absolute)
Prog2 ORI      (relocatable)
Overview
Given: a collection of <tag,value> pairs
e.g., symbol table
Searching =
given a tag, return corresponding value
Intentionality of Specification
What do we do if key not in table?
a) return an arbitrary value
b) crash, halt, explode
c) return a special value (NULL, error, ...)
Traditionally, we want (c)
But what if client knows key is in table? Pay for this extra checking with each call to search?
The defensiveness dilemma: options a) and b) may look better for production if checking components are available for development.
The point: intentionally decide what your specification is, and document your decision.
Linear Search
Algorithm:
compare target with 1st key
if match, then done (return value)
else, compare target with 2nd key
if match, then done (return value)
Advantages:
Time
Table Size
Lecture #20
Binary Search
Algorithm to search among 2 or more items
compare target with middle key
if target <= middle key, then search first half
if target > middle key, then search second half
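A minimal sketch of binary search over a table of <tag,value> pairs follows. It uses the common three-way variant (it stops as soon as the middle key matches) rather than the two-way split described on the slide, and it adopts specification option (c): a missing key yields the special value None. The name binary_search and the sample table are illustrative only.

```python
def binary_search(table, target):
    """table: list of (key, value) pairs sorted by key. Returns value or None."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        key, value = table[mid]
        if key == target:
            return value
        if target < key:
            high = mid - 1      # search first half
        else:
            low = mid + 1       # search second half
    return None                 # option (c): special value for "not in table"


symbol_table = sorted({"Prog": 20, "Acc": 0, "Begin": 20, "N": 24, "Ans": 25}.items())
print(binary_search(symbol_table, "N"))
print(binary_search(symbol_table, "Missing"))
```

The table must be sorted first, which is part of the "time to build" cost that the comparison with linear search has to account for.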
Table Size
time to build =
Estimated Search
How do we search for things?
e.g., finding Brutus in phone book
binary search? No!
Make a guess
Hash Functions
Used to insert and to search
insert(key) --> into h(key)
search(key) --> look in h(key)
0 0 h(Key) N-1
For example:
collisions
insert
c) Rehash with another function: open addressing with quadratic probing or double hashing (to approx. uniform hashing)
cost: ___________________
Complexity - Insertion
For approach c) (uniform hashing), how long does an insertion take?
#attempts depends on _________________
e.g., first entry never collides (1 attempt)
Insertion II
Let p(t) = prob. insertion requires t attempts
E(# of attempts) = $\sum_{i \ge 1} p(i) \cdot i$
Example: if a rep. array is half full, how many attempts does the next insertion take?
Building a Table II
[Plot: # attempts vs. fraction full r, for r from 0 to 1; A = area under the curve from 0 to X]
$A = \int_0^X \frac{1}{1-r}\,dr = \dots =$ _________
In our example: X = .9; array size = 1000
E(# of insertion attempts) ≈ 2.303 * 1000
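The figure quoted in the example can be checked numerically. The sketch below simply sums the integrand 1/(1-r) over 900 successive insertions into an array of 1000 slots. It assumes idealized uniform hashing, and the function names are hypothetical.

```python
def expected_attempts(fraction_full):
    """Expected attempts for one insertion when the array is this full
    (the integrand 1/(1-r), assuming uniform hashing)."""
    return 1.0 / (1.0 - fraction_full)


def total_build_attempts(array_size, fill_fraction):
    """Sum the per-insertion expectation while filling an empty array."""
    inserts = round(array_size * fill_fraction)
    return sum(expected_attempts(i / array_size) for i in range(inserts))


total = total_build_attempts(1000, 0.9)
print(total)
```

The discrete sum comes out slightly below 2.303 * 1000 because the integral treats the fill level as continuous.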
Lecture #21
Make
Prerequisite DAG
Large project: mix of files generated by people and by tools
Contents of one file often depend on contents of some others (acyclic structure)
human final product(s)
machine
actions
Permits distribution of partial DAG to client
Automates the partial creation process
Identify subDAGs that are out of date and need to be rebuilt, and invoke corresponding actions
Example Applications
LaTeX documents
tex, bib, bbl, dvi, ps
Report generation and filtering
Compiled and linked code, object files, and executables
cpp, h, o, a, exe
Makefile
DAG is represented in a makefile
target: prerequisites
	command
	command
Each target/prerequisites/commands group is a rule
Notes:
command lines begin with tab
line continuation with \
Comment lines begin with #
Processing Makefiles
Default: first target is the final product
Rule: if any prerequisite is newer than target (or target does not exist), then execute associated commands
But first (and in any case): ensure all prerequisites are up to date!
recurse to rule that has prerequisite as a target
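The processing rule can be sketched in a few lines. This is not how any real make tool is implemented. It is a toy model in which file timestamps live in a dictionary, a one-element list stands in for the clock, and "executing" a command just appends it to a log. All names (bring_up_to_date, rules, mtimes, clock, log) are hypothetical.

```python
def bring_up_to_date(target, rules, mtimes, clock, log):
    """rules maps a target to (prerequisites, command)."""
    if target not in rules:
        return                                   # a source file: no rule to run
    prerequisites, command = rules[target]
    for prerequisite in prerequisites:           # first, and in any case: recurse
        bring_up_to_date(prerequisite, rules, mtimes, clock, log)
    missing = target not in mtimes
    stale = any(mtimes[p] > mtimes.get(target, 0) for p in prerequisites)
    if missing or stale:
        log.append(command)                      # stand-in for running the command
        clock[0] += 1
        mtimes[target] = clock[0]                # target is now the newest file


rules = {
    "prog": (["prog.o"], "link prog"),
    "prog.o": (["prog.c", "defs.h"], "compile prog.c"),
}
mtimes = {"prog.c": 1, "defs.h": 2, "prog.o": 3, "prog": 4}
clock = [4]
log = []
bring_up_to_date("prog", rules, mtimes, clock, log)   # nothing is out of date yet
clock[0] += 1
mtimes["defs.h"] = clock[0]                           # simulate editing defs.h
bring_up_to_date("prog", rules, mtimes, clock, log)
print(log)
```

Only the sub-DAG that depends on the edited file is rebuilt, and prerequisites are always brought up to date before their target is examined.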
Variables
Frequently used strings can be replaced by variables
Defined with = and referenced with $
Example
CC = g++
CFLAGS = -g
prog: prog.c defs.h
	$(CC) -o prog prog.c $(CFLAGS)
Implicit Rules
Describe when and how to remake files based on their name (extension)
E.g., <file>.o depends on <file>.c
The associated command is cc -c <file>.c
Q: How can we write the command associated with such a rule?
A: Automatic variables
%.dvi : %.tex
	latex $<
	latex $<
Automatic Variables
Not standard between make tools
GNU (gmake):
$@ - target filename
$< - name of first prerequisite
$? - names of all prereqs newer than target
$^ - names of all prerequisites
Lecture #22
Review for Midterm
Lecture #23
Midterm
Lecture #24
Expressions
Introduction
Most assemblers permit use of expressions
Used as instruction operands
in machine ops and pseudo ops
Introduction II
Individual terms may be:
constants (e.g., 4, A, 0x3F)
user-defined symbols (e.g., X, Buff)
special terms (e.g., * for LC)
parenthesized expressions (e.g., (X-Z) in (X-Z)/2)
Examples
Buff RES 4
     ST  2,Buff
     ST  2,Buff+1

Buff RES 4
BEnd EQU *
Len  EQU BEnd-Buff
Relocation
Expressions are evaluated at ____________
(not entirely true, as we'll see later)
Absolute Expressions
An expression is absolute iff:
1. it contains only absolute terms, OR
2. it contains relative terms provided:
i) they occur in summation pairs, AND
ii) terms in each pair have opposite sign, AND
iii) relative terms do not enter in * or /
Examples of 1:
Examples of 2:
Relative Expressions
An expression is relative iff:
1. all relative terms can be paired as above, except one, AND
2. that remaining unpaired term is positive, AND
3. relative terms do not enter into * or /
Examples:
Buff+6
Motivation
These restrictions are not arbitrary
They ensure the expression is meaningful after relocation
If the restrictions are not met, the expression is erroneous
Examples
Name  Value  R/A
X     16     R
Z     6      R
Y     4      A

X+1-Z =
2+X/Y =
Z+X =
Z-Y =
Y-Z =
(X-Z)/2 =
((X-Y)-(Z+Y))*Y =
(X/2) - (Z/2) =
(X-Z)/2 =
Generalization
A relative value has the form:
LL + OFFSET
LL: Load Location
OFFSET: independent of Load Location (i.e., absolute)
Generalization - Examples
R1 - R2 = (LL + OFF1) - (LL + OFF2) =
R1 - A = (LL + OFF1) - A =
A - R1 = A - (LL + OFF1) =
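The LL + OFFSET view suggests a mechanical check: carry along, for every value, how many copies of LL it contains. The sketch below is one possible reading of the rules, not the course's official algorithm. A value is a pair (ll_count, offset), multiplication is allowed only between values with ll_count 0, and the offsets 30 and 34 given to Buff and BEnd are made up for illustration.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def mul(a, b):
    # Values that still contain LL must not enter into * (or /).
    if a[0] != 0 or b[0] != 0:
        raise ValueError("relative value used in multiplication")
    return (0, a[1] * b[1])


def classify(value):
    ll_count, _ = value
    if ll_count == 0:
        return "absolute"       # LL cancelled out (or never appeared)
    if ll_count == 1:
        return "relative"       # exactly one positive, unpaired LL
    raise ValueError("erroneous expression")


Buff = (1, 30)                  # relative symbol, hypothetical offset
BEnd = (1, 34)                  # relative symbol, hypothetical offset
print(classify(sub(BEnd, Buff)))        # BEnd-Buff
print(classify(add(Buff, (0, 6))))      # Buff+6
```

An expression whose LL count ends up anywhere other than 0 or 1 has no meaningful value after relocation, which is why it is rejected.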
Lecture #25
Loaders
What, Again?
So you ask:
Haven't we done this already?
I built one in Lab #2!
Problems
Programmer is responsible for putting absolute addresses in code
error-prone
Problems II
Program must be self-contained
would prefer to allow separate assembly
make one change, do not have to recompile the whole thing
Problems III
would prefer to have the flexibility to write different parts in different source languages
some languages are better suited to certain tasks than others
library functions could be written in a single language (rather than rewriting for every possible source language)
Object Files
.o
Library Files
.a
Memory
Fortran Program
.f
Fortran Compiler
.o Loader
Assembly Program
.s
Assembler
.o
General Loaders
This requires standardizing the format of the object file
Each source language translator then follows this standard
Summary of Advantages
Needn't worry about address arithmetic
More than 1 program in memory at once
Assemble code once
Separate assembly
Libraries
Multiple languages
2. Allocation
select area in memory for program
3. Relocation
adjust address references in object file
4. Linking
combine multiple object modules
5. Loading
Types of Loaders
Different loaders differ with respect to how these tasks are accomplished
Types include
compile-and-go
absolute
relocating
linking
dynamic loading
dynamic linking
Compile-and-Go
Observation: an executing translator is, itself, a process governed by a program residing in memory!
Reserve memory at the end of its block
As source is compiled, object code is placed directly into this reserved memory
loader is really just part of the compiler
example: WATFOR Fortran
Picture
Memory
Source Program
Translator
Advantages / Disadvantages
Advantages:
speed: need not produce intermediate file (I/O is always slow)
batch environments: compiler remains resident in memory (so very low start-up cost)
Disadvantages:
must recompile every time you run
no object file produced
Absolute Loaders
Familiar loader from Lab #2
Consider responsibility for each task:
allocation:
calc. length of all modules _______
calc. actual load location ________
relocation: ______________
linking: ________________
loading: ________________
Absolute Loaders II
Advantages:
simple, fast, small, programmer-controlled
Disadvantages:
program must be self-contained: programmer must edit library subroutines into one assembly language source file
to run at a different memory location, reassembly is required
Special Case
Observation: any loader is a program
i.e., resides in memory; executes as a process
From an idle machine, we need a way to start things up
Solution: a bootstrap loader
Bootstrap Loader
Store a special program in ROM
This program is automatically executed at power-up
This program is an absolute loader
reads records from an input device
puts them in a predetermined (absolute) location
Control is then transferred to loaded program (which can load other things, etc.)
Lecture #26
Relocating Loader
We need the assembler to do 2 things:
flag relative values (e.g., with modification records)
produce size-of-segment information
machine code
Relocating Loader II
Loader performs relocation
adds LL to all relative values
can be done in a single pass
Advantage:
more efficient packing of memory
Disadvantage:
no external subroutines or libraries
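The relocation step itself is small. The sketch below assumes, without confirmation from the slides, that a word is 20 bits and that the address (S) field is its low 8 bits. The constant S_MASK and the function relocate are hypothetical names, and one flag per word stands in for the modification records.

```python
S_MASK = 0xFF   # assumption: the S field is the low 8 bits of the word


def relocate(words, relative_flags, load_location):
    """Add the load location (LL) to the S field of every flagged word."""
    relocated = []
    for word, is_relative in zip(words, relative_flags):
        if is_relative:
            new_s = (word + load_location) & S_MASK
            word = (word & ~S_MASK) | new_s     # keep the other fields intact
        relocated.append(word)
    return relocated


# 0xC3002 appears in the slides as the word for BR 3,Loop; 0x0001B is a made-up constant.
image = relocate([0xC3002, 0x0001B], [True, False], 0x20)
print([format(word, "05X") for word in image])
```

Each word is visited once and no lookups are needed, which is why relocation fits in a single pass.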
Time for a slight detour, as a motivational aside for remaining loader types
Subroutine Linkage
Motivation
Example: want to calculate a square root
first write our program (in assembly)
now how do we use this code?
Motivation II
Idea #2: have a separate section
     ORI
     ...
Sqrt LD  0,=0
     SHR . . .
     etc.
Branching
Want to branch:
to Sqrt, and
(after we're done) return to caller
Branch-to-Subroutine
I.e., BRS R,S(X)
Example: 3Fx : BRS 1,Sqrt
loads the PC into register R
branches to location Sqrt
(Aside: our synthetic instruction, RET, is a more direct way of expressing this)
Sqrt Value
Calling Conventions
Program and subroutine must agree on:
where to branch for function
where to return when done
where to put/get argument(s)
where to put/get result(s)
In previous example:
return address in register 3
argument in register 1
result in register 1
Calling Conventions II
So conventions are required
e.g., caller always places return address in register 3
if function uses r3, must save value first
caller pushes return address onto stack
caller stores return address in first word of subr.
Separate Compilation
Would like to have in our program the line:
BRS 3,Sqrt
where Sqrt is a label in a different program!
(Aside: what does your current assembler do with such a thing?)
We extend our language and provide a (typical) mechanism for resolving this...
Lecture #27
Q: why not make global the default, or make them all global?
Sqrt
How does the linkage between these object files get resolved?
So now back to loaders
Relocation
assembler flags words for relocation (bit masks)
loader makes modifications
Loading
by loader
Transfer Vector
Contains 1 entry per external symbol used by this program segment
Assembler sets aside room at the beginning of the object file for TV
Assembler places symbolic representation of referenced external symbols in TV
sqrt Transfer Vector Program
Transfer Vector II
Assembler replaces all calls to external symbols with calls to appropriate locations in TV
Loader replaces the entries in TV with calls to the appropriate location
0 ORI EXT Sqrt ... CALL Sqrt 6 7 Assembler CALL 6 relative Sqrt 60 66 67 BSS Loader CALL 66
CALL 32 RETURN
Works for subroutine calls, but what about for sharing data?
e.g., LD 1,XValue
this cannot be replaced with a call to TV, or even a load (with memory direct addressing) from the TV
CS
LD 1,X
Impact on Relocation
What does this mean for relocating the program?
There are 2 different kinds of relative
Assembler must distinguish them
extend relocation information
e.g., use 2 bits per word
00 - absolute
01 - relative (to CS load location)
10 - relative (to DS load location)
Lecture #28
Introduction
General linking/loading strategy
Very common in modern systems
And used in Lab #4!
Advantages:
separate assembly
multiple control and data segments
lower time overhead (in program execution)
lower space overhead (in run-time footprint)
Assembler Responsibilities
1. Header information
length of segment
execution start address
Assembler Responsibilities II
4. Relocation information
modification records
5. Machine code
text records
There are some new things here, which suggests defining some new record types
Entry Record
List and define all the entry symbols
Possible format:
<Flag> <symbol_name> <value>
Examples:
ESqrt 0E (possible because symbols have fewer than 7 characters)
ESqrt=0E
For Lab #4
We'll adopt the following conventions:
A program's name is always (implicitly) an entry symbol
Entry symbols must be relative
(If you wish to handle absolute, too, that's up to you)
External Record
Can be combined with text and modification records if you wish
Examples:
LD 1,9 LD 1,Num LD 1,Enum 1E T1F01009 T1F01002M T1F01000XEnum T1F01000XE
Format:
T <addr> <machine_code> X <symbol_name>
External Records II
Such a record tells the loader to:
find seg. that defines that (external) symbol
find the value of that symbol within that seg. (i.e., look at the corresponding entry record!)
add this value to the one in the text record
add the load location of the seg. that defines the symbol to the text record
this last step is just like the usual relocation operation of relative symbols, but using the LL of the segment that defines the symbol
Lecture #29
Pass #1: find definition of all external symbols
Pass #2: aggregate, relocate, link, load
Pass #1
Q. What does the assembler tell the loader about each ENT symbol?
A.
So, to determine the actual symbol value, loader must calculate: ______________ + ______________
For lab #4, we can load the segments into one contiguous block of memory
Loaded Memory
Example
Main ORI
     EXT Pnum
     ENT Num
     BRS 3,Pnum
     BR  0,0(0)
Num  NMD 7
     END

Lib  ORI
     EXT Num
     ENT Pnum
Pnum IO  2,Num
     BR  3,0(3)
     END
(assume: ______________________ )
Note: for lab #4, can restrict external symbols to be relative only
Recommended exercise: assemble, link, and load our example (assuming PLA of ________ )
Lecture #30
Unifying X and M
Don't really need 2 separate mechanisms!
Recall meanings of
T_______M T _ _ _ _ _ _ _ XSym
Sym is in EST add this value of Sym to address field
Recall that segment names always in EST
This suggests that X can be seen as a more general form of M!
Replacing M with X
Prog ORI ... Loop - - ... BR 3,Loop ... T 05 C3002 M or T 05 C3002 XProg
Lecture #31
Problem: Space
Consider a program that calls sqrt, rnd, and substr
Each defined in its own (large) library
So, linked and loaded program is huge
Solutions (for saving memory):
virtual memory and paging
dynamic loading
dynamic linking
Dynamic Loading
Observe: program does 1 thing at a time
don't need all segments present simultaneously
Example
B
500
200 300
D
Total Size = 1.9 Mb
300
200
400
C E
300 200
C F
900
Only 1Mb needed (length of longest path)
Trade-off: memory space & time
Dynamic Linking
Instead of branching directly to an external symbol, program issues a call request to OS
subroutine name is parameter for request
OS responsibilities
keep table of loaded libraries
loads new library if needed
manages swapping of libraries as appropriate
Dynamic Linking II
Binding: the association of an actual address (5E) with a symbolic name (Sqrt)
Dynamic linking delays binding from load time to execution time (late binding)
Advantages:
many programs can share 1 loaded library
library can be recompiled on-the-fly
library only loaded if actually used
Problem: Time
Every time we want to execute a program, must re-link, relocate and re-load
costly if object code hasn't changed
text
Lecture #32
Macro Processors
Introduction
Macro: a notational convenience for programmers
short-hand for commonly used blocks of code
not restricted to assembly languages
Macro Processor: tool that replaces short-hand with corresponding block of code
performs string substitution (expansion)
no analysis of instructions
no semantics of programming language
Example
To clear all registers, we write:
LDI 0,0
LDI 1,0
LDI 2,0
LDI 3,0
Example II
CLEAR MAC        ;begin defn
      LDI 0,0
      LDI 1,0
      LDI 2,0
      LDI 3,0
      MND
macro name
macro body
Example III
In body of program:
. . .
CLEAR
. . .
CLEAR
. . .
Picture
Notice that result is a __________ program Macro Processor source
source
The languages of the two programs differ only by what can be achieved with textual substitution
i.e., approximately the same level of abstract machine
Outline
Features
arguments
labels
variables
conditional expansion
Algorithm for macro processor
Macros in C and C++
Reference: Beck chp. 4
Macro Arguments
Arguments make macros more flexible
Involves textual substitution
SWAP MAC (&A,&B)
     LD  1,&A
     LD  2,&B
     ST  1,&B
     ST  2,&A
     MND
Prog ORI
X    NMD 10
Y    NMD 0
     SWAP (X,Y)
     BR  0,0(3)
     END
Labels: Problem
Consider a program with multiple invocations of macro SWAPR:
. . .
SWAPR (1,2)
. . .
SWAPR (1,3)
. . .
Expands to:
Labels: Solution
Macro processor provides a mechanism for generating unique labels
e.g., preface symbol (definition and use) with $
SWAPR MAC (&r1,&r2)
      BR  3,$Strt
$Tmp1 RES 1
$Tmp2 RES 1
$Strt ST  &r1,$Tmp1
      . . .
Labels: Solution II
First expansion of this macro:
        BR  3,$AAStrt
$AATmp1 RES 1
$AATmp2 RES 1
$AAStrt ...
Variables
Evaluated at time of: __________________
i.e., not at execution time
Example: &Test
variable name
SET 0
special expression pseudo-op
&Test can then be used in expressions within the macro body This feature is often used in conjunction with
Conditional Expansion
So far, all macros we've seen have been expanded to the same block of code
(modulo argument replacement)
Example: Swap
Conditional expansion for efficiency:
SWAP MAC (&A,&B)
     IF (&A NEQ &B)
     LD 1,&A
     LD 2,&B
     ST 1,&B
     ST 2,&A
     ENDIF
     MND
Lecture #33
Advantage:
speed
Advantage:
program size
1st pass:
build table of key, domain: ? and attribute, range: ?
2nd pass:
do the expansion (replace macro calls with bodies)
1st pass Invariant: after each MND, table contains all previous macro names seen in definitions, and their bodies
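A stripped-down version of the two passes can be sketched as follows. It deliberately leaves out arguments, unique labels, conditional expansion, and nested definitions or invocations, and it assumes source lines arrive as (label, operation, operand) triples. The function names are hypothetical. The CLEAR macro is the one from the earlier example.

```python
def collect_definitions(lines):
    """1st pass: build the table of macro name -> body, and strip definitions."""
    table, program, current = {}, [], None
    for label, op, operand in lines:
        if op == "MAC":
            current = label                      # macro name is in the label field
            table[current] = []
        elif op == "MND":
            current = None                       # table now holds the whole body
        elif current is not None:
            table[current].append((label, op, operand))
        else:
            program.append((label, op, operand))
    return table, program


def expand(table, program):
    """2nd pass: replace each macro call with its body."""
    output = []
    for label, op, operand in program:
        if op in table:
            output.extend(table[op])
        else:
            output.append((label, op, operand))
    return output


source = [
    ("CLEAR", "MAC", ""), ("", "LDI", "0,0"), ("", "LDI", "1,0"),
    ("", "LDI", "2,0"), ("", "LDI", "3,0"), ("", "MND", ""),
    ("Prog", "ORI", ""), ("", "CLEAR", ""), ("", "BR", "0,0"), ("", "END", ""),
]
table, program = collect_definitions(source)
for line in expand(table, program):
    print(line)
```

Because the whole table is built before any expansion, a macro may be invoked earlier in the file than it is defined. A label on the invoking line is dropped in this sketch.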
M
MND WRITE MAC
M
MND WRITE MAC
M
MND MND
M
MND MND
In program:
begin by invoking OS macro (e.g., HPOS) then use READ & WRITE
Nested Definitions
To recompile on different OS, change flag at the top of program only!
Another solution?
but nested definitions more convenient. Why?
Algorithm: Intuition
Scan program line-by-line
MAC seen: change into definition mode
insert body into DefTable
match up outer MND with initial MAC
Algorithm: Limitation
This two-edged approach is pretty clever...
But are there any limitations it imposes on the definition / use of macros?
A.
i.e.,
Or, in particular . . .
Recursive Invocations
Macro invokes itself! Of course, beware infinite recursion:
TROUBLE MAC
        NMD 10
        TROUBLE
        MND
TAB (3)
Lecture #34
MP Algorithmic Highlights
No nested definitions; nested invocations supported, BUT no recursion allowed
Self-references not further expanded
#define T (x+T) //only one expansion of T
Circularities handled the same way (stop at first self-reference)
First action: strip comments; don't remove newlines
View results of macro expansion with -E
gcc -E test.c > test.i
For RESOLVE/C++:
gcc -E -I/class/sce/rcpp -I/class/sce/rcpp/RESOLVE_Catalog \
test.cpp > test.ii
Standard file extension for preprocessed C (C++) is .i (.ii), for intermediate file.
Must be 1 line
macro name
macro body
Using Arguments
Argument list follows name (no space):
#define INC(X) X++
#define SUM(X,Y) X+Y
Using Arguments II
Problem #2: protecting the arguments
now consider using MAX macro in:
flag = MAX (b>0, c<0);
Conditionals
Common condition is "is this macro defined?"
#ifndef BUFF_SIZE
#define BUFF_SIZE 1000
#endif /*BUFF_SIZE*/
File Inclusion
Syntax: #include filename
text of file called filename inserted at that point
#include f
File f2
#include f1
File F1.h
#ifndef F1_H_IFP
#define F1_H_IFP 1
...
#endif /*F1_H_IFP*/
something unique
Predefined Macros
Some defined by ANSI standard:
__FILE__ / __LINE__: current file name / line number
__DATE__ / __TIME__: current date / time
Arguments in Strings
ANSI C: parameter substitution not performed within quoted strings
#define DISP(EXP) printf("EXP = %d\n", EXP)
Invocation: DISP (i*j+1);
Result:
Pitfalls to Avoid
Text substitution aspect of macros can make them tricky
General strategy: limited use!
Pitfall #1: side effects
recall MAX example
consider: a = MAX(b++, c++)
Q. if b = 2, c = 5 beforehand, what is result?
A. a = ______ b = ______ c = _______
Pitfalls to Avoid II
Pitfall #2: swallowing the semicolon
Macro expands to form a compound statement:
#define INC(X,Y) {X++; Y++;}
But consider:
if (. . . ) INC(a,b); else . . .
This doesn't compile! Why not?
Solution (notice the missing semicolon at the end):
#define INC(X,Y) \
do { \
X++; Y++; \
} while(0)
Lecture #35
Compilers
Introduction
Ref. Beck chapter 5
Compiler = a kind of translator
high level language --> machine (or assembly) code
2. Semantics
what does the program mean? i.e., into what machine code it is translated
Modular Decomposition
View input as a stream of characters
P r o g _ _ _ _ _ O R I _ _ _ \n X _ _
source
Compiler
object file
Compiler must give this stream structure in order to perform the translation
Coarse-Grained Decomposition
(source) stream of characters
Lexical Analyzer
stream of tokens
Parser
parse tree
Code Generator
object file
Lexical Analysis
First step of compilation process
Also called:
scanner, tokenizer, lexer
Tokens
A token is defined by:
1. Type (e.g., integer) 2. Value (e.g., 312)
Keywords (e.g., while) often have their own token type (no associated value)
Example:
MEAN := SUM DIV 100;
Tokens - Example
Result of tokenizing:
Line  Token Type  Token Value
13    id          MEAN
      :=
      id          SUM
      DIV
      int         100
      ;
Token Definition
How to define what is & isn't a token
Some things seem to be simple
e.g., keywords
Regular Expressions
Examples
label :: [A-Z] [A-Z0-9]{0,5}
a label is a capital letter followed by 0 to 5 characters that may be capital letters or numbers
int :: 0 | [1-9] [0-9]*
an int is either a 0 or a digit in range 1 to 9, followed by any number (0 or more) of digits
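The two regular expressions above translate almost directly into Python's re syntax. The sketch below is only an illustration of the definitions, not a scanner for the course language. The function token_type is hypothetical, and trying int before label is an arbitrary choice since the two patterns cannot both match the same text.

```python
import re

LABEL = re.compile(r"[A-Z][A-Z0-9]{0,5}")   # capital letter, then 0 to 5 more
INT = re.compile(r"0|[1-9][0-9]*")          # 0, or no leading zero


def token_type(text):
    """Return 'int', 'label', or None if text matches neither definition."""
    if INT.fullmatch(text):
        return "int"
    if LABEL.fullmatch(text):
        return "label"
    return None


for text in ["312", "0", "007", "LOOP1", "BUFFER1", "1SIZE"]:
    print(text, token_type(text))
```

fullmatch is used so that the whole string must fit the pattern, which mirrors the requirement that an automaton read the entire string and end in a final state.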
An NFSA accepts a string iff it can read the string and end up in a final state
Example: LongLabel
longlabel :: [A - Z] [A - Z 0 - 9]* A-Z0-9 A- Z
Example: Int
0 0-9 1-9
Exercise
Write an NFSA for labels with underscores
same rules as for LongLabels (for letters/numbers)
no _ at start
no _ at end
no 2 _s in a row
BUFFER1
T_B_SI9ZE
BUFF_SIZE
BUFF_
BUFF__S
1SIZE
Lexical Analysis
Could write code to recognize LongLabel directly
see figure 5.10
but this is hard to read, modify, maintain, ...
Much easier to read and understand FSA! Scanners are often built automatically from FSA descriptions!
Lecture #36
Grammar
Defines syntax of language Given as a collection of rules
transformations
e.g., ( X ) * ( T X ) *
maps string on left into string on the right
CFG
Two kinds of symbols:
terminals non-terminals
BNF II
One special start symbol
e.g., <program> ::= id <origin> <body> <end>
Parse Trees
Record the application of BNF rules
root: the start symbol
internal nodes: non-terminal symbols
leaves: terminals (i.e., tokens)
Example: using PASCAL BNF, what is the parse tree for MEAN := SUM DIV 100 ?
DIV
Exercise:
<exp> ::= <exp> + <exp> | <exp> - <exp> | int
parse 3 - 6 - 2
answer?
Grammar that allows more than one parse tree to be formed for the same token sequence: ambiguous
Algorithm
How do we calculate a parse tree? Two approaches:
bottom-up (start at leaves)
top-down (start at root)
Shift-Reduce Parsing
Bottom-up approach
Scan tokens, placing them on a stack (shift)
Group tokens at top of stack (reduce):
pop them all off
push corresponding non-terminal
Shift-Reduce Parsing II
Grammar must be LR
Left-to-right scan of the input, producing a Right-most derivation
symbols to be reduced always appear at top of stack (never inside it)
Lecture #37
Recursive Descent
Top-down approach
Each non-terminal has associated routine
scan forward
try to identify string matching this rule
Recursive-Descent - Problem
Subtle potential problem: left-recursion
the left-most (first) symbol in the BNF rule is the same non-terminal (recursive)
e.g., <id-list> ::= id | <id-list>, id
If we want to expand 2nd alternative, first call ourselves! (i.e., infinite recursion)
One solution: change notation slightly
<id-list> ::= id [ , <id-list> ]
routine always consumes a token before recursion
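The rewritten rule maps onto one routine that consumes an id before it ever calls itself. The following sketch assumes tokens are (type, value) pairs as produced by some lexical analyzer. The function name parse_id_list and the token shapes are illustrative, not taken from Beck.

```python
def parse_id_list(tokens, pos=0):
    """<id-list> ::= id [ , <id-list> ]   Returns (list of ids, next position)."""
    if pos >= len(tokens) or tokens[pos][0] != "id":
        raise SyntaxError(f"expected id at token {pos}")
    ids = [tokens[pos][1]]          # a token is consumed before any recursion
    pos += 1
    if pos < len(tokens) and tokens[pos][0] == ",":
        rest, pos = parse_id_list(tokens, pos + 1)
        ids.extend(rest)
    return ids, pos


tokens = [("id", "A"), (",", ","), ("id", "B"), (",", ","), ("id", "C"), (";", ";")]
print(parse_id_list(tokens))
```

Every recursive call starts at a strictly later position in the token stream, so the routine must terminate, unlike a direct transcription of the left-recursive rule.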
Introduction
Use a collection of routines
1 routine / non-terminal in the grammar
called semantic or code-generating routines
2 approaches:
create entire tree
then walk the tree, generating code
generate code as we go
when a grammar rule is recognized, call the corresponding code-generating routine
Example
Consider: <term> ::= <factor> * <factor>
Occurs in parse tree as:
          <term>
  <factor>   *   <factor>
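A sketch of a code-generating routine for this rule. The target here is a hypothetical stack machine with PUSH and MUL instructions, chosen only to keep the example short; it is not the object code used in the course, and the routine names and operand names are made up.

```python
def gen_factor(node, code):
    """Code-generating routine for <factor>: push the operand's value."""
    code.append(f"PUSH {node}")


def gen_term(node, code):
    """Code-generating routine for <term> ::= <factor> * <factor>."""
    left, op, right = node
    assert op == "*"
    gen_factor(left, code)       # code for the left subtree
    gen_factor(right, code)      # code for the right subtree
    code.append("MUL")           # combine the two results


code = []
gen_term(("ALPHA", "*", "BETA"), code)
print("\n".join(code))
```

Each routine emits the code for its children first and its own operation last, so the same routines work whether they are called while walking a finished tree or as each rule is recognized.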
Optimization
An optimizing compiler tries to generate the most efficient object code
  time (fast execution times)
  space (small object files)
Lex Example
Input file:
  Definitions
  %%
  Rule {action}
  . . .
Definitions: convenient short-hands for REs
Rules: recognized regular expressions and corresponding action to perform
  set the token value (use global variable yylval)
  return the token type (return an int)
INSERT: Lex Example Input file for simple Pascal syntax (pascal.lex)
Lex Example II
Run: lex pascal.lex
Result:
  file called lex.yy.c
  a 677-line C program!
  implements the function int yylex()
Yacc Example
Create a file defining the grammar
  %token NUMBER
  %%
  expr: NUMBER          { $$ = $1; }
      | expr '+' expr   { $$ = $1 + $3; }
      | '(' expr ')'    { $$ = $2; }
      ;
yylex is invoked to return the next token (and token value)
Action produces output (object code)
Run yacc on this file to produce a compiler that uses a bottom-up parsing method.
Lecture #38
To Ponder
What is meant by a text file? (vs. binary)
A file of English text occupies 5 Mbytes on disk. A Java program reads the contents of this file into a String (or StringBuilder) object. How much memory does it need?
Java string length vs. number of characters
  String s = . . .
  assert (s.length() == 7)
  How many characters does s contain?
Unicode
A standard for the discrete representation of written text
Code points: U+0444 U+006D
UTF-8 octets: D1 84 6D
More UTF-8 octets: E2 82 AC   E5 A5 BD   E2 80 99
One glyph can be different characters (capital Latin A and Greek Alpha: A, Α)
One glyph can be several characters (ligature of f+i into one symbol: ﬁ)
Security Issue
Visual homograph: Two different characters that look th same h t th t l k the
Would you click here: www.paypl.com ? Oops! The second a is actually CYRILLIC SMALL LETTER A This site successfully registered in 2005 y g
Solution
Heuristics that warn users when languages are mixed and homographs are possible
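One possible form of such a heuristic, sketched in Python. It approximates a character's script by the first word of its Unicode name (e.g. LATIN, CYRILLIC), which is a simplification made for this example; the function names are hypothetical, and production browsers use more careful rules.

```python
import unicodedata


def scripts_used(label):
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_suspicious(label):
    """Warn when letters from more than one script are mixed in one label."""
    return len(scripts_used(label)) > 1


genuine = "paypal"
spoofed = "payp\u0430l"          # second "a" is U+0430, CYRILLIC SMALL LETTER A
print(looks_suspicious(genuine), looks_suspicious(spoofed))
print(scripts_used(spoofed))
```

The two labels render almost identically, but only the spoofed one mixes scripts, so only it triggers the warning.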
As of November 2010:
  Contains 109,000+ code points
  Covers 93 scripts (and counting)
Organization
Code points are grouped into categories
  e.g., Basic Latin, Cyrillic, Arabic, Cherokee, Currency, Mathematical Operators
UTF-8
Encoding of code point (integer) in a sequence of bytes (octets)
Standard: all caps, with hyphen (UTF-8)
Variable length
  Some code points require 1 octet
  Others require 2, 3, or 4
Consequence: Cannot infer number of characters from size of file!
No endian-ness: just a sequence of octets
D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82 ...
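To see the consequence concretely, the octets shown above can be decoded and counted. This is a small illustration using Python's built-in UTF-8 codec; only the twelve octets printed on the slide are used, and the variable names are arbitrary.

```python
data = bytes.fromhex("D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82")
text = data.decode("utf-8")

print(text)                      # the decoded Cyrillic word
print(len(data), "octets")       # size on disk
print(len(text), "characters")   # number of code points

# The same count of characters in plain ASCII needs half as many octets.
ascii_text = "privet"
print(len(ascii_text.encode("utf-8")), "octets for", len(ascii_text), "ASCII characters")
```

Two files of the same size can therefore hold different numbers of characters, depending on which code points they contain.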
2-byte encodings
First byte starts with 110
Second byte starts with 10
Example: D0 BF
  = 1101 0000 1011 1111
  Payload: 10000 111111 = 100 0011 1111 = U+043F (i.e., п, Cyrillic small letter pe)
Subsequent k-1 bytes each start with 10
Remaining bits are payload
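The bit layout can be written out as a short encoder. This is a teaching sketch (the name `utf8_encode` is made up, and it does not reject surrogates or values above U+10FFFF as a real encoder must); the result is compared against Python's built-in codec.

```python
def utf8_encode(cp):
    """Encode one code point (an int) as UTF-8 octets, following the bit layout."""
    if cp < 0x80:
        return bytes([cp])                   # 1 octet: 0xxxxxxx
    if cp < 0x800:
        k, lead = 2, 0b11000000              # 110xxxxx
    elif cp < 0x10000:
        k, lead = 3, 0b11100000              # 1110xxxx
    else:
        k, lead = 4, 0b11110000              # 11110xxx
    tail = []
    for _ in range(k - 1):
        # each subsequent octet: 10 followed by 6 payload bits
        tail.append(0b10000000 | (cp & 0b111111))
        cp >>= 6
    # whatever payload remains goes into the first octet
    return bytes([lead | cp] + tail[::-1])


for cp in (0x6D, 0x043F, 0x20AC, 0x1F600):
    print(f"U+{cp:04X} ->", utf8_encode(cp).hex(" ").upper())
```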
(from wikipedia)
Security Issue
Not all encodings are permitted
  overlong encodings are illegal
  example: C0 AF = 1100 0000 1010 1111 = U+002F (should be encoded 2F)
A check for x2F accepts ..%c0%af.. (doesn't contain x2F)
After accepting, the input is then decoded
2E 2E C0 AF 2E 2E gets decoded into ../..
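The attack can be reproduced in a few lines. The lenient decoder below is written for this illustration only (it deliberately omits the overlong check); Python's own UTF-8 codec is strict and rejects the same input.

```python
def naive_decode_2byte(b0, b1):
    """Decode a 2-byte sequence WITHOUT rejecting overlong forms (the bug)."""
    return chr(((b0 & 0b00011111) << 6) | (b1 & 0b00111111))


raw = bytes.fromhex("2E 2E C0 AF 2E 2E")

# Step 1: a byte-level check for "/" (x2F) finds nothing, so the input is accepted.
contains_slash_byte = 0x2F in raw

# Step 2: a lenient decoder then turns C0 AF into "/".
lenient = (raw[:2].decode("ascii")
           + naive_decode_2byte(raw[2], raw[3])
           + raw[4:].decode("ascii"))

# A strict decoder refuses the overlong encoding instead.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(contains_slash_byte, repr(lenient), strict_ok)
```

The safer order is to decode strictly first and validate the decoded text afterwards.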
Solutions:
Use that leading bit!
Text data now looks just like binary data
ASCII
Another solution: require byte order mark (BOM) at the start of the file
  U+FEFF (ZERO WIDTH NO-BREAK SPACE)
  there is no U+FFFE code point
  FE FF ==> BigE, FF FE ==> LittleE
  Not considered part of the text
Advantages:
Forms magic-number for UTF-8 encoding
Disadvantages:
  Not backwards-compatible to ASCII
  Existing programs may no longer work
  E.g., in Unix, shebang (#!, i.e., 23 21) at start of file is significant (file is a script)
#! /bin/bash
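A sketch of BOM sniffing and of the shebang problem, using the BOM constants from Python's standard `codecs` module. The helper name `sniff_bom` is invented here, and UTF-32 BOMs are ignored for brevity. In UTF-8, U+FEFF is encoded as the three octets EF BB BF.

```python
import codecs


def sniff_bom(data):
    """Return (encoding, BOM length) guessed from a leading byte order mark."""
    if data.startswith(codecs.BOM_UTF8):         # EF BB BF
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_BE):     # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):     # FF FE
        return "utf-16-le", 2
    return None, 0


script = b"#! /bin/bash\n"
with_bom = codecs.BOM_UTF8 + script

print(sniff_bom(with_bom))            # the BOM acts as a magic number
print(script.startswith(b"#!"))       # plain file: shebang is at offset 0
print(with_bom.startswith(b"#!"))     # with BOM: the first octets are no longer 23 21
```

A program that looks only at the first two octets of the file will no longer recognize the script once the BOM is prepended.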
To Ponder
What is meant by a text file? (vs. binary)
A text file occupies 5 Mbytes on disk. A Java program reads the contents of this file into a String (or StringBuilder) object. How much memory does it need?
Java string length vs. number of characters
  String s = . . .
  assert (s.length() == 7)
  How many characters does s contain?
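One way to explore these questions, sketched in Python rather than Java. Java's String.length() counts UTF-16 code units, so the helper below (a made-up name) emulates it by encoding to UTF-16. The sample string is hypothetical and was chosen so that the count comes out to 7.

```python
def utf16_code_units(s):
    """Number of 16-bit code units, i.e. what Java's String.length() reports."""
    return len(s.encode("utf-16-le")) // 2


s = "caf\u00e9 \U0001F600"       # hypothetical string: c a f é space + one emoji

print(utf16_code_units(s))       # code units (the emoji needs a surrogate pair)
print(len(s))                    # code points
print(len(s.encode("utf-8")))    # octets in a UTF-8 file

# Plain ASCII text doubles in size when every character is held as a 16-bit unit.
ascii_text = "a" * 1000
print(len(ascii_text.encode("utf-8")), len(ascii_text.encode("utf-16-le")))
```

So a length of 7 does not pin down the number of characters, and the memory needed for the text depends on the in-memory encoding, not only on the file size.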
Lecture #39
Review