
OPEN SOURCE LECTURE NOTES Relational Database Theory and Practice

Anthony Aaby DRAFT Version: 1.0 Last Edit: September 29, 2006

This work is licensed cc 2006 by Anthony A. Aaby, Walla Walla College, 204 S. College Ave., College Place, WA 99324, E-mail: aabyan@wwc.edu, under the Creative Commons Attribution-NonCommercial License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/1.0, or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. This book is distributed in the hope it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. You may copy, modify, and distribute this book for the cost of reproduction provided the above creative commons notice remains intact. No explicit permission is required from the author for reproduction of this book in any medium, physical or electronic.
The most current version of this text and LaTeX source is available at:

PDF: http://cs.wwc.edu/~aabyan/LN/RDB/dbln.pdf, Postscript: http://cs.wwc.edu/~aabyan/LN/RDB/dbln.ps, DVI: http://cs.wwc.edu/~aabyan/LN/RDB/dbln.dvi, HTML: http://cs.wwc.edu/~aabyan/LN/RDB/dbln/index.html, Source: http://cs.wwc.edu/~aabyan/LN/RDB/dbln.tar.gz.

Contents

1 Preface

I Background

2 IM1 Information Models and Systems
  2.1 History and Motivation
    2.1.1 History
    2.1.2 Examples

3 IM2. Database Systems
  3.1 History and motivation for database systems
    3.1.1 History
  3.2 Components of database systems
  3.3 Database architecture and data independence
  3.4 Use of a database query language
  3.5 Recent developments and applications (hypertext, hypermedia, multimedia)

II Declarative Programming

4 Prolog Tutorial
  4.1 Introduction
  4.2 The Structure of Prolog Programs
  4.3 Syntax
    4.3.1 Facts
    4.3.2 Queries
    4.3.3 Rules
  4.4 Types
    4.4.1 Simple Types
    4.4.2 Composite Types
    4.4.3 Type Predicates
  4.5 Expressions
    4.5.1 Arithmetic Operators
    4.5.2 Boolean Predicates
    4.5.3 Logical Operators
  4.6 Unification and Pattern Matching
  4.7 Functions
  4.8 Lists
    4.8.1 Iteration
  4.9 Iteration
  4.10 Iterators, Generators and Backtracking
  4.11 Tuples (or Records)
  4.12 Extra-Logical Predicates
    4.12.1 Input/Output
    4.12.2 Program Access and Manipulation
    4.12.3 System Access
  4.13 Style and Layout
    4.13.1 Debugging
  4.14 Negation and Cuts
    4.14.1 Negation
    4.14.2 Cuts
  4.15 Definite Clause Grammars
    4.15.1 Context Free Grammars
    4.15.2 DCG
    4.15.3 Parse Trees
    4.15.4 Simple Semantics for Natural Language Sentences
    4.15.5 Interleaving syntax and semantics in DCG
  4.16 Incomplete Data Structures
  4.17 Meta Level Programming
    4.17.1 Meta-Logical Type Predicates
    4.17.2 Term Comparison
    4.17.3 The Meta-Variable Facility
    4.17.4 Assert/Retract
  4.18 Second-Order Programming
    4.18.1 Setof, Bagof and Findall
    4.18.2 Other second-order predicates
    4.18.3 Applications
  4.19 Database Programming
    4.19.1 Simple Databases
    4.19.2 Recursive Rules
    4.19.3 Logic programs and the relational database model
  4.20 Expert systems
  4.21 Object-Oriented Programming
  4.22 Appendix
  4.23 References

5 Datalog and Logic-Based Databases
  5.1 Introduction
  5.2 Datalog and the Relational Model
  5.3 Relational Algebra and Datalog
    5.3.1 General Form: a Datalog Rule
    5.3.2 Operations and Equivalents
    5.3.3 Bottom-up query evaluation (Coral)
    5.3.4 Top-down query evaluation (Prolog)
    5.3.5 Negation and the Open and Closed World Assumptions
  5.4 Recursive Programming in Datalog
  5.5 Pure Prolog
  5.6 Higher-order Predicates in Prolog
  5.7 Logic and Prolog
  5.8 Exercises

III The Relational Database Model

6 IM4 Relational Databases
  6.1 Mapping conceptual schema to a relational schema
  6.2 Entity and referential integrity
  6.3 Relational algebra and relational calculus
    6.3.1 The Relational Algebra
  6.4 The Tuple Relational Calculus and the Domain Relational Calculus
    6.4.1 Introduction
    6.4.2 The tuple relational calculus (TRC)
    6.4.3 The domain relational calculus (DRC)

7 IM5 Database Query Languages
  7.1 Data Definition Language (DDL)
    7.1.1 Data types (predefined domains)
    7.1.2 User defined domains
    7.1.3 Table declarations and creation
    7.1.4 Table deletion
    7.1.5 Table modification
  7.2 Data Manipulation Language (DML)
  7.3 WWW Tutorials

IV Relational Database Design

8 IM6 Relational Database Design

9 Database and Application Development
  9.1 The Life-Cycle
  9.2 Database Application Project Script
  9.3 Database Design Quality Factors
  9.4 Database Security and Authorization
    9.4.1 Issues
    9.4.2 Security philosophy and issues
    9.4.3 Authentication
    9.4.4 Authorization
    9.4.5 Encryption
  9.5 Top-down vs Bottom-Up

10 IM3 Data Modeling
  10.1 Data Modeling
    10.1.1 Constraints
    10.1.2 Levels of Abstraction
  10.2 E/R Modeling and diagrams
    10.2.1 Conceptual design
  10.3 The Relational Data Model

11 Functional Dependency
  11.1 Keys
  11.2 Keys for relations
  11.3 Reasoning about functional dependencies

12 Normal Forms
  12.1 Overview
  12.2 First normal form - 1NF
  12.3 Second normal form - 2NF
  12.4 Third normal form - 3NF
  12.5 Boyce-Codd normal form - BCNF
  12.6 Fourth normal form - 4NF - No multivalued dependencies
  12.7 Fifth normal form - 5NF

V Advanced topics Limited Coverage

13 IM7 Transaction Processing
  13.1 Transactions
    13.1.1 Overview of transaction processing
    13.1.2 Atomicity and Durability
  13.2 Failure and Recovery
  13.3 Concurrency Control
  13.4 Distributed Transactions

14 IM8. Distributed databases

15 IM9 Physical Database Design

16 IM10. Data mining

17 IM11. Information storage and retrieval

18 IM12. Hypertext and hypermedia

19 IM13. Multimedia information and systems

20 IM14. Digital libraries

VI Tools and RDBMS Specifics

21 Interaction with a RDBMS
  21.1 Design Tools
  21.2 Vendor Specific DB Tools
    21.2.1 MySQL
    21.2.2 PostgreSQL
    21.2.3 Oracle
    21.2.4 DB2
    21.2.5 SQL-Server
    21.2.6 Other

VII Project

22 Quick-Kill Project Management
  22.1 The Quick Kill
  22.2 Vision and Scope Document: Up To 6 Hours
  22.3 Work Breakdown Structure: 2 Hours
  22.4 Code Reviews: 2.5 Hours Per Review
  22.5 Exercises

23 Project Reports: DB Design and Implementation
  23.1 Phase 0 Problem Description
  23.2 Phase I Project Report
  23.3 Phase II Project Report
  23.4 Phase III Project Report

Chapter 1

Preface
Course contents cover subtopics IM1-IM6 of the topic IM. Information Management of the Computer Science Body of Knowledge described by the Computing Curricula 2001. Satisfactory completion of this course requires demonstration of the following skills:

The relational model
  - Be able to construct a simple relational database.
  - Be able to formulate simple queries using the relational algebra.

Database Query Languages
  SQL
    - Be able to use the data definition language (DDL).
    - Be able to use the data manipulation language (DML).

Database Design (The resulting database design must contain at least 15 relations.)
  Modeling
    - Be able to construct either an E/R diagram or a UML diagram for a database.
    - Be able to use a tool for drawing a diagram.
  Mapping
    - Be able to map a conceptual model (diagram) to a relational schema.
  Functional Dependency
    - Be able to determine all functional dependencies of a relation.
    - Be able to determine all candidate keys.
  Normalization
    - Be able to achieve the desirable state of 3NF by progressing through the intermediate states of 1NF and 2NF if needed.
    - Be able to normalize to the BCNF.
    - Be aware of the 4NF and the 5NF.

Database Implementation
  - Create a database using DDL.
  - Identify different classes of users and create appropriate views of the database.
  - Create a web interface to the database for each class of user utilizing a programming language interface (e.g. Perl, PHP, Java, C/C++) to the DBMS.

Coordination with Course Text (Date, C. J. An Introduction to Database Systems, 8th ed., Addison-Wesley):
  Overview: Chapters 1, 2, 3, 4
  Design: Chapters 11, 12, 13
  Theory: Chapters 5, 6, 7, 9, 10, 27

Part I

Background


Chapter 2

IM1 Information Models and Systems


Suggested time: 3 hours

Topics:
  - History and motivation for information systems
  - Information storage and retrieval (IS & R)
  - Information management applications
  - Information capture and representation
  - Analysis and indexing
  - Search, retrieval, linking, navigation
  - Information privacy, integrity, security, and preservation
  - Scalability, efficiency, and effectiveness

Learning objectives:
  1. Compare and contrast information with data and knowledge.
  2. Summarize the evolution of information systems from early visions up through modern offerings, distinguishing their respective capabilities and future potential.
  3. Critique/defend a small- to medium-size information application with regard to its satisfying real user information needs.
  4. Describe several technical solutions to the problems related to information privacy, integrity, security, and preservation.
  5. Explain measures of efficiency (throughput, response time) and effectiveness (recall, precision), and describe approaches to ensure that information systems can scale from the individual to the global.


2.1 History and Motivation

2.1.1 History

Files
  - Sequential files
  - Direct access files
  - Indexed files

Database technology

Other
  - Knowledge bases (expert systems)
  - Object-oriented database
  - Hypertext and media

2.1.2 Examples

Classic business information systems such as:

Corporate records
  Data items include: record of each sale, information about accounts payable and receivable, information about employees (names, addresses, salary, benefit options, tax status, ...).
  Typical queries include: printing of reports such as accounts receivable, employee paychecks, ...
  Data modification operations include: each sale, purchase, bill receipt, each employee hired, fired, or promoted, ...

Airline reservation system
  Data items include:
    1. Reservations by a single customer on a single flight, including assigned seat, meal preference, ...
    2. Information about flights - airports, arrival & departure times, aircraft flown, ...
    3. Information about ticket prices, requirements, and availability
  Typical queries include:
    1. flights from/to with departure/arrival times
    2. what seats are available at what prices.
  Data modification operations include:
    1. booking of a flight
    2. assigning a seat
    3. indicating meal preference
  Must provide for concurrent access and prevent loss of records in the event of system failure.

Banking System
  Data items include: names and addresses of customers, accounts, loans, their balances, and the connection between customers and their accounts and loans.
  Typical queries include: account balances
  Data modification operations include: payments from or deposits to an account.
  Must provide for concurrent access (by tellers and ATM machines) and prevent loss of records in the event of system failure.

Modern applications
  - VLSI design databases
  - CAD databases
  - Graphic databases
  - Software engineering databases (libraries of reusable software) require fast retrieval and sophisticated operations
  - Expert systems (Knowledge bases)
  - Persistent datatype research (object-oriented database)
  - FTP sites
  - Scientific collaboration
  - Web & Web browsers
  - E-commerce


Chapter 3

IM2. Database Systems


Suggested time: 3 hours

Topics:
  - History and motivation for database systems
  - Components of database systems
  - DBMS Functions
  - Database architecture and data independence
  - Use of a database query language

Learning objectives:
  1. Explain the characteristics that distinguish the database approach from the traditional approach of programming with data files.
  2. Cite the basic goals, functions, models, components, applications, and social impact of database systems.
  3. Describe the components of a database system and give examples of their use.
  4. Identify major DBMS functions and describe their role in a database system.
  5. Explain the concept of data independence and its importance in a database system.
  6. Use a query language to elicit information from a database.
  7. Recent developments and applications (hypertext, hypermedia, multimedia)


3.1 History and motivation for database systems

3.1.1 History

Database technology
  1960s  Network, hierarchical: efficient access to large amounts of data, but neither integrated DML and host languages nor provided declarative query languages.
  1970s  Relational: declarative and value oriented, but do not easily allow integration of the DML and host languages.
  1980s  OO-DBMS: support object identity and ADTs; good integration of DML and host languages. Not declarative.
  1990s  KBMS? Declarative, integration of DML and host language, value-oriented and logic based.

3.2 Components of database systems

Database (DB): A collection of data (stored more-or-less permanently in a computer) that is managed by a database management system (DBMS).

Database management system (DBMS): A collection of programs that enables users to create and maintain a database.
  1. Allow users to create new databases and specify their logical structure (schema) using a data-definition language (DDL). Defining a database involves specifying the data types, structures, and constraints for the data stored in the database. Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
  2. Allow users to query the database and modify the data using a query language (data-manipulation language, DML). Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the world, and generating reports from the data.
  3. Support the storage, security, authorized access, and efficient access of data.
  4. Control simultaneous access while maintaining data integrity (prevent data corruption).

A database system is a database together with the DBMS software. The components of a database system include:

Data:
  Database: data that is stored more-or-less permanently in a computer.
  Database management system (DBMS): software which allows the user to use or modify the database.
  DBMS Facilities
    - Data definition language (DDL): used to define the conceptual scheme. The scheme is compiled and stored in the data dictionary.
    - Data manipulation language (DML): query sublanguage (retrieval), maintenance sublanguage (insertion, deletion, modification).
  Structure of DBMS
    - DDL compiler: compiles conceptual schemes to tables stored in the data dictionary.
    - Database manager: translates queries into file operations
    - Query processor
    - File manager: often a general purpose file system provided by the operating system.
    - Disk manager
    - Telecommunication system
    - Data files
    - Data dictionary: structure and usage of data contained in the database.
    - Access aids: indexes

Users:
  Database application programmers: develop programs or interfaces for naive and online users which are precompiled queries.
  Database implementers
  Database administrator (DBA): oversee and manage resources
    - Design of the conceptual and physical schemas
    - Security and authorization
    - Data availability and recovery from failures: backups and repairing damage due to hardware or software failures or misuse.
    - Database tuning: performance and database evolution
  Database designers
  End users
    - Casual end users
    - Naive or parametric end users
    - Sophisticated end users
    - Stand-alone users

Advantages & Disadvantages

Advantages
  - Data independence: application programs are insulated from changes in the way data is structured and stored, which allows dynamic changes and provides growth potential.
  - Efficient data access
  - Data administration centralized
  - Data integrity & security
  - Concurrent access and crash recovery
  - Reduced application development time

Disadvantages
  - Problems associated with centralization
  - Cost of software/hardware and migration
  - Complexity of backup and recovery

In reality, centralized databases are applicable only to small operations. Companies are bought, sold, and merged, often necessitating interaction between distributed databases. An enterprise database is constructed from the distributed databases.

DBMS Functions

Access methods

Security
  Prevention of unauthorized access: fields, files, read & write access privileges.

Deadlock and concurrency problems
  Multiple readers, single writer.

Fourth generation environments

3.3 Database architecture and data independence

Database architecture
  - Centralized
  - Distributed
  - Client-server
  - Collaborating server systems (heterogeneous system)
  - Middleware systems (heterogeneous system)

Data independence: The structure of data files is stored in the DBMS catalog separately from the access programs.

3.4 Use of a database query language

3.5 Recent developments and applications (hypertext, hypermedia, multimedia)

  - Hypertext
  - Hypermedia
  - Optical disks
  - World Wide Web
  - E-commerce

Today, a corporate information system must integrate the database with a web based front end, the webserver, marketing, sales, and technical support.


Part II

Declarative Programming


Chapter 4

Prolog Tutorial
J. A. Robinson: A program is a theory (in some logic) and computation is deduction from the theory.
N. Wirth: Program = data structure + algorithm
R. Kowalski: Algorithm = logic + control

4.1 Introduction

Prolog, which stands for PROgramming in LOGic, is the most widely available language in the logic programming paradigm. Logic, and therefore Prolog, is based on the mathematical notions of relations and logical inference. Prolog is a declarative language: rather than describing how to compute a solution, a program consists of a data base of facts and logical relationships (rules) which describe the relationships that hold for the given application. Rather than running a program to obtain a solution, the user asks a question. When asked a question, the run time system searches through the data base of facts and rules to determine (by logical deduction) the answer.

Among the features of Prolog are logical variables (they behave like mathematical variables), a powerful pattern-matching facility (unification), a backtracking strategy to search for proofs, uniform data structures, and interchangeable input and output. Often there will be more than one way to deduce the answer, or there will be more than one solution; in such cases the run time system may be asked to find other solutions, backtracking to generate the alternatives. Prolog is a weakly typed language with dynamic type checking and static scope rules.

Prolog is used in artificial intelligence applications such as natural language interfaces, automated reasoning systems and expert systems. Expert systems usually consist of a data base of facts and rules and an inference engine; the run time system of Prolog provides many of the services of an inference engine.

Prolog may be compiled or interpreted. At WWC, create a source program file file.pl and launch the Prolog interpreter using the terminal command:

    gprolog

Load a file into the Prolog interpreter with:

    | ?- consult('file.pl').

Exit the Prolog interpreter with:

    | ?- halt.
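To make the remark about interchangeable input and output concrete, here is a minimal sketch. The predicate name app/3 is a hypothetical choice made only to avoid clashing with a built-in append/3; the definition itself is the usual list concatenation relation.

```prolog
% app(Xs, Ys, Zs): Zs is the list Xs followed by the list Ys.
% (app/3 is an illustrative name, not a standard predicate.)
app([], L, L).
app([X|L1], L2, [X|L12]) :- app(L1, L2, L12).

% "Forward" use: the first two arguments are inputs.
%   ?- app([a,b], [c], Z).
%   Z = [a,b,c]
%
% "Backward" use: the third argument is the input, and backtracking
% enumerates every way of splitting it into two lists.
%   ?- app(X, Y, [a,b]).
%   X = [],    Y = [a,b] ;
%   X = [a],   Y = [b]   ;
%   X = [a,b], Y = []
```

The same three-line definition serves as both a concatenation routine and a list splitter; typing ; at the prompt asks the run time system to backtrack for the next solution.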

4.2 The Structure of Prolog Programs

A Prolog program consists of a database of facts and rules, and queries (questions).

  Fact: ... .
  Rule: ... :- ... .
  Query: ?- ... .
  Variables: must begin with an upper case letter.
  Constants: numbers, names beginning with a lowercase letter, or text enclosed in single quotes.

Inductive definitions: base and inductive cases

  Towers of Hanoi: move N disks from pin a to pin b using pin c.

    hanoi(N) :- hanoi(N, a, b, c).
    hanoi(0,_,_,_).
    hanoi(N,FromPin,ToPin,UsingPin) :-
        N > 0,
        M is N-1,
        hanoi(M,FromPin,UsingPin,ToPin),
        move(FromPin,ToPin),
        hanoi(M,UsingPin,ToPin,FromPin).
    move(From,To) :- write([move, disk, from, pin, From, to, pin, To]), nl.

  Lists: append, member

    list([]).
    list([_|L]) :- list(L).

    [X1|[...[Xn|[]]...]] = [X1,...,Xn]

    append([],L,L).
    append([X|L1],L2,[X|L12]) :- append(L1,L2,L12).
    member(X,L) :- append(_,[X|_],L).

  Ancestor

    ancestor(A,D) :- parent(A,D).
    ancestor(A,D) :- parent(A,C), ancestor(C,D).
    % not ancestor(A,D) :- ancestor(A,P), parent(P,D).
    % as infinite recursion may result.

Depth-first search: Maze/Graph traversal

  A database of arcs (we will assume they are directed arcs) of the form:

    a(node_i,node_j).

  Rules for searching the graph:

    go(To,To,_).
    go(From,To,Trail) :- a(From,In), \+ visited(In,Trail), go(In,To,[In|Trail]).
    visited(A,T) :- member(A,T).

I/O: terms, characters, files, lexical analyzer/scanner

    read(T), write(T), nl.
    get0(N), put(N): ASCII value of character
    name(Name,AsciiList).
    see(F), seeing(F), seen, tell(F), telling(F), told.

Natural language processing: Context-free grammars may be represented as Prolog rules. For example, the rule

    sentence ::= noun_clause verb_clause

can be implemented in Prolog as

    sentence(S) :- append(NC,VC,S), noun_clause(NC), verb_clause(VC).

or in DCG as:

    sentence --> noun_clause, verb_clause.
    ?- sentence(S,[]).

Note that two arguments appear in the query. Both are lists: the first is the sentence to be parsed, the second the remaining elements of the list, which in this case is empty.

A Prolog program consists of a data base of facts and rules. No structure is imposed on a Prolog program: there is no main procedure, and there is no nesting of definitions. All facts and rules are global in scope, and the scope of a variable is the fact or rule in which it appears. The readability of a Prolog program is left up to the programmer. A Prolog program is executed by asking a question. The question is called a query. Facts, rules, and queries are called clauses.
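The depth-first search above is only outlined, so a complete version may help. The following is a sketch under stated assumptions: the four-node graph (n1 to n4) is invented for illustration, the trail test uses the standard \+ (negation as failure) operator, and on_trail/2 is a hypothetical helper defined locally so the block does not depend on any library member/2.

```prolog
% A hypothetical directed graph containing a cycle n1 -> n2 -> n3 -> n1.
a(n1, n2).
a(n2, n3).
a(n3, n1).
a(n3, n4).

% go(From, To, Trail): To can be reached from From without stepping
% onto any node already recorded in Trail.
go(To, To, _).
go(From, To, Trail) :-
    a(From, In),
    \+ visited(In, Trail),      % refuse to revisit a node
    go(In, To, [In|Trail]).

visited(A, T) :- on_trail(A, T).

% on_trail(X, L): X occurs in list L (illustrative local helper).
on_trail(X, [X|_]).
on_trail(X, [_|T]) :- on_trail(X, T).

% ?- go(n1, n4, [n1]).    succeeds via n1 -> n2 -> n3 -> n4
% ?- go(n4, n1, [n4]).    fails: no arc leaves n4
```

Seeding the trail with the start node, as in the queries above, is what stops the search from looping around the cycle back to n1.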


4.3 Syntax

4.3.1 Facts

A fact is just what it appears to be: a fact. A fact in everyday language is often a proposition like "It is sunny." or "It is summer." In Prolog such facts could be represented as follows:

    'It is sunny'.
    'It is summer'.

4.3.2 Queries

A query in Prolog is the action of asking the program about information contained within its data base. Thus, queries usually occur in the interactive mode. After a program is loaded, you will receive the query prompt,

    ?-

at which time you can ask the run time system about information in the data base. Using the simple data base above, you can ask the program a question such as

    ?- 'It is sunny'.

and it will respond with the answer

    Yes
    ?-

A yes means that the information in the data base is consistent with the subject of the query. Another way to express this is that the program is capable of proving the query true with the available information in the data base. If a fact is not deducible from the data base, the system replies with a no, which indicates that, based on the information available (the closed world assumption), the fact is not deducible. In particular, if the data base does not contain sufficient information to answer a query, then it answers the query with a no.

    ?- 'It is cold'.
    no
    ?-
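One practical caveat, offered as a hedged sketch rather than a description of the system used in these notes: many current Prolog implementations raise an existence error, instead of answering no, when a query names a predicate that has no clauses at all. Declaring the predicate dynamic is one common way to recover the closed-world "no" shown above.

```prolog
% Declare the proposition so that querying it fails quietly
% even though the data base holds no fact for it.
:- dynamic('It is cold'/0).

'It is sunny'.
'It is summer'.

% ?- 'It is sunny'.    succeeds (yes)
% ?- 'It is cold'.     fails (no), with no error raised
```

The directive changes nothing about what is deducible; it only tells the system that the absence of clauses for 'It is cold' is intentional.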


4.3.3

Rules

Rules extend the capabilities of a logic program. They are what give Prolog the ability to pursue its decision-making process. The following program contains two rules for temperature. The rst rule is read as follows: It is hot if it is summer and it is sunny. The second rule is read as follows: It is cold if it is winter and it is snowing. It It It It is is is is sunny. summer. hot :- It is summer, It is sunny. cold :- It is winter, It is snowing.

The query, ?- It is hot. Yes ?is answered in the armative since both It is summer and It is sunny are in the data base while a query ?- It is cold. will produce a negative response. The previous program is an example of propositional logic. Facts and rules may be parameterized to produce programs in predicate logic. The parameters may be variables, atoms, numbers, or terms. Parameterization permits the denition of more complex relationships. The following program contains a number of predicates that describe a familys genelogical relationships. female(amy). female(johnette). male(anthony). male(bruce). male(ogden). parentof(amy,johnette). parentof(amy,anthony). parentof(amy,bruce). parentof(ogden,johnette). parentof(ogden,anthony). parentof(ogden,bruce). The above program contains the three simple predicates: female; male; and parentof. They are parameterized with what are called atoms. There are other family relationships which could also be written as facts,

30

CHAPTER 4. PROLOG TUTORIAL but this is a tedious process. Assuming traditional marriage and childbearing practices, we could write a few rules which would relieve the tedium of identifying and listing all the possible family relations. For example, say you wanted to know if johnette had any siblings, the rst question you must ask is what does it mean to be a sibling? To be someones sibling you must have the same parent. This last sentence can be written in Prolog as siblingof(X,Y) :parentof(Z,X), parentof(Z,Y).

A translation of the above Prolog rule into English would be X is the sibling of Y provided that Z is a parent of X, and Z is a parent of Y. X, Y, and Z are variables. This rule however, also denes a child to be its own sibling. To correct this we must add that X and Y are not the same. The corrected version is: siblingof(X,Y) :parentof(Z,X), parentof(Z,Y), X Y.

The relation brotherof is similar but adds the condition that X must be a male.

    brotherof(X,Y) :-
        parentof(Z,X),
        male(X),
        parentof(Z,Y),
        X \== Y.

From these examples we see how to construct facts, rules and queries, and that strings are enclosed in single quotes, variables begin with a capital letter, and constants are either enclosed in single quotes or begin with a lowercase letter.
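As a sketch of how these rules answer the question about johnette's siblings, the program below repeats the family facts and rules and adds one helper, siblings_of/2. The helper's name and its use of setof/3 are assumptions made for this illustration; they are not part of the original program.

```prolog
% Family facts and rules from the text.
female(amy).
female(johnette).
male(anthony).
male(bruce).
male(ogden).
parentof(amy,johnette).
parentof(amy,anthony).
parentof(amy,bruce).
parentof(ogden,johnette).
parentof(ogden,anthony).
parentof(ogden,bruce).

siblingof(X,Y) :- parentof(Z,X), parentof(Z,Y), X \== Y.
brotherof(X,Y) :- parentof(Z,X), male(X), parentof(Z,Y), X \== Y.

% siblings_of(+Person, -Siblings): hypothetical helper.
% Siblings is the sorted, duplicate-free list of Person's siblings.
% setof/3 fails when Person has no siblings at all.
siblings_of(Person, Siblings) :-
    setof(S, siblingof(S, Person), Siblings).
```

The query ?- siblingof(S, johnette). returns each sibling twice on backtracking, once per shared parent; setof/3 removes the duplicates, so ?- siblings_of(johnette, L). binds L to a list containing anthony and bruce.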

4.4 Types

Prolog provides for numbers, atoms, lists, tuples, and patterns. The types of objects that can be passed as arguments are defined in this section.


4.4.1 Simple Types

Simple types are implementation dependent in Prolog; however, most implementations provide the simple types summarized in the following table.

    TYPE        VALUES
    boolean     true, false
    integer     integers
    real        floating point numbers
    variable    variables
    atom        character sequences

The boolean constants are not usually passed as parameters but are propositions. The constant fail is useful in forcing the generation of all solutions. Variables are character strings beginning with a capital letter. Atoms are either quoted character strings or unquoted strings beginning with a small letter.

4.4.2 Composite Types

In Prolog the distinction between programs and data is blurred. Facts and rules are used as data, and data is often passed in the arguments to predicates. Lists are the most common data structure in Prolog. They are much like arrays in that they are a sequential list of elements, and much like stacks in that the elements can only be accessed sequentially, that is, from one end only and not in random order. In addition to lists, Prolog permits arbitrary patterns as data. The patterns can be used to represent tuples. Prolog does not provide an array type, but arrays may be represented as lists and multidimensional arrays as lists of lists. An alternate representation is to represent an array as a set of facts in the database.

    TYPE       REPRESENTATION              VALUES
    list       [ comma separated list ]    sequence of items
    pattern

A list is designated in Prolog by square brackets ([ ]). An example of a list is

    [dog,cat,mouse]

This says that the list contains the elements dog, cat, and mouse, in that order. Elements in a Prolog list are ordered, even though there are no indexes. Records or tuples are represented as patterns. Here is an example.

    book(author(aaby,anthony),title(lab_manual),date(1991))

The elements of a tuple are accessed by pattern matching.

    book(Title,Author,Publisher,Date).
    author(LastName,FirstName,MI).
    publisher(Company,City).
    book(T,A,publisher(C,rome),Date)
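The following sketch shows tuple fields being selected purely by pattern matching. The two book/4 facts are invented for illustration, and the predicates published_in/2 and surname_of/2 are hypothetical names; only the shape of the book, author and publisher patterns follows the text.

```prolog
% Hypothetical catalogue entries, invented for this sketch.
book(title(field_guide), author(roe,richard,r),
     publisher(campus_press,college_place), date(1991)).
book(title(travel_notes), author(doe,jane,q),
     publisher(example_house,rome), date(2000)).

% published_in(?City, ?Title): Title names a book whose publisher is in City.
% The nested pattern publisher(_,City) reaches inside the tuple.
published_in(City, Title) :-
    book(title(Title), _Author, publisher(_Company,City), _Date).

% surname_of(?Title, ?LastName): LastName is the author's surname for Title.
surname_of(Title, LastName) :-
    book(title(Title), author(LastName,_First,_MI), _Publisher, _Date).
```

A query such as ?- published_in(rome, T). plays the role of the pattern book(T,A,publisher(C,rome),Date) above: every field that is not of interest is matched by a variable and ignored.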

4.4.3 Type Predicates

Since Prolog is a weakly typed language, it is important for the user to be able to determine the type of a parameter. The following built-in predicates are used to determine the type of a parameter.

    PREDICATE         CHECKS IF
    var(V)            V is a variable
    nonvar(NV)        NV is not a variable
    atom(A)           A is an atom
    integer(I)        I is an integer
    real(R)           R is a floating point number (float(R) in ISO Prolog)
    number(N)         N is an integer or real number
    atomic(A)         A is an atom or a number
    functor(T,F,A)    T is a term with functor F and arity A
    T =.. L           T is a term, L is a list (see example below)
    clause(H,T)       H :- T is a rule in the database

The last three are useful in program manipulation (metalogical programming or metaprogramming) and require additional explanation. clause(H,T) is used to check the contents of the database. functor(T,F,A) and T =.. L are used to manipulate terms. The predicate functor is used as follows.

    functor(T,F,A)

T is a term, F is its functor, and A is its arity. For example,

    ?- functor(t(a,b,c),F,A).
    F = t
    A = 3
    yes

t is the functor of the term t(a,b,c), and 3 is the arity (number of arguments) of the term. The predicate =.. (univ) is used to compose and decompose terms. For example:

    ?- t(a,b,c) =.. L.
    L = [t,a,b,c]
    yes
    ?- T =.. [t,a,b,c].
    T = t(a,b,c)
    yes

4.5 Expressions

Arithmetic expressions are evaluated with the built-in predicate is, which is used as an infix operator in the following form.

    variable is expression

For example,

    ?- X is 3*4.
    X = 12
    yes

4.5.1 Arithmetic Operators

Prolog provides the standard arithmetic operations as summarized in the following table.

    SYMBOL    OPERATION
    +         addition
    -         subtraction
    *         multiplication
    /         real division
    //        integer division
    mod       modulus
    **        power

4.5.2 Boolean Predicates

Besides the usual boolean predicates, Prolog provides more general comparison operators which compare terms, and predicates to test for unifiability and whether terms are identical.


    SYMBOL     OPERATION                     ACTION
    A ?= B     unifiable                     A and B are unifiable but are not unified
    A = B      unify                         unifies A and B if possible
    A \= B     not unifiable
    A == B     identical                     does not unify
    A =:= B    equal (value)                 evaluates A and B to determine if equal
    A < B      less than (numeric)
    A =< B     less or equal (numeric)
    A > B      greater than (numeric)
    A >= B     greater or equal (numeric)
    A @< B     less than (terms)
    A @=< B    less or equal (terms)
    A @> B     greater than (terms)
    A @>= B    greater or equal (terms)
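The difference between unification (=), identity (==) and value equality (=:=) is easy to miss, so here is a small sketch that applies all three to the same pair of terms. The helper predicates holds/2, demo/3 and demo_vars/2 are hypothetical names introduced only for this illustration.

```prolog
% holds(+Goal, -Result): Result is yes if Goal succeeds, otherwise no.
holds(Goal, Result) :-
    (   call(Goal) -> Result = yes
    ;   Result = no
    ).

% demo(-Unify, -Identical, -Equal): compare the term 1+2 with the number 3.
demo(Unify, Identical, Equal) :-
    holds(1+2 = 3, Unify),        % the structure +(1,2) does not unify with 3
    holds(1+2 == 3, Identical),   % nor is it identical to 3
    holds(1+2 =:= 3, Equal).      % but both sides evaluate to the same value

% demo_vars(-Identical, -Unify): compare two distinct unbound variables.
demo_vars(Identical, Unify) :-
    holds(_X == _Y, Identical),   % distinct variables are not identical
    holds(_A = _B, Unify).        % but they can be unified
```

With these definitions ?- demo(U, I, E). answers U = no, I = no, E = yes, and ?- demo_vars(I, U). answers I = no, U = yes.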

For example, the following are all true.

    3 @< 4
    3 @< a
    a @< abc6
    abc6 @< t(c,d)
    t(c,d) @< t(c,d,X)

Logic programming definition of natural number.

    % natural_number(N); N is a natural number.
    natural_number(0).
    natural_number(s(N)) :- natural_number(N).

Prolog definition of natural number.

    natural_number(N) :- integer(N), N >= 0.

Logic programming definition of inequalities.

    % less_than(M,N); M is less than N
    less_than(0,s(M)) :- natural_number(M).
    less_than(s(M),s(N)) :- less_than(M,N).

    % less_than_or_equal(M,N); M is less than or equal to N
    less_than_or_equal(0,N) :- natural_number(N).
    less_than_or_equal(s(M),s(N)) :- less_than_or_equal(M,N).

Prolog definition of inequality.

    M =< N.

Logic programming definition of addition/subtraction.

    % plus(X,Y,Z); Z is X + Y
    plus(0,N,N) :- natural_number(N).
    plus(s(M),N,s(Z)) :- plus(M,N,Z).

Prolog definition of addition.

    plus(M,N,Sum) :- Sum is M+N.

This does not define subtraction.

Logic programming definition of multiplication/division.

    % times(X,Y,Z); Z is X*Y
    times(0,N,0) :- natural_number(N).
    times(s(M),N,Z) :- times(M,N,W), plus(W,N,Z).

Prolog definition of multiplication.

    times(M,N,Product) :- Product is M*N.

This does not define division.

Logic programming definition of exponentiation.

    % exp(N,X,Z); Z is X**N
    exp(s(M),0,0) :- natural_number(M).
    exp(0,s(M),s(0)) :- natural_number(M).
    exp(s(N),X,Z) :- exp(N,X,Y), times(X,Y,Z).

Prolog definition of exponentiation is implementation dependent.
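The remark that the is-based plus does not define subtraction can be made concrete. In the sketch below the successor-notation addition is called with its first argument unbound, which makes it compute a difference. The predicate is named peano_plus/3 here, an assumption made only to avoid a clash with the built-in plus/3 that some systems (SWI-Prolog, for example) already provide.

```prolog
% natural_number(N): N is a natural number in successor notation.
natural_number(0).
natural_number(s(N)) :- natural_number(N).

% peano_plus(X,Y,Z): Z is X + Y.
% Same clauses as the logic programming definition of plus in the text,
% under a hypothetical name.
peano_plus(0, N, N) :- natural_number(N).
peano_plus(s(M), N, s(Z)) :- peano_plus(M, N, Z).
```

The query ?- peano_plus(s(s(0)), s(0), Z). adds 2 and 1, giving Z = s(s(s(0))). The query ?- peano_plus(X, s(0), s(s(s(0)))). asks which X satisfies X + 1 = 3 and answers X = s(s(0)), so the same relation also performs subtraction. The version written with is cannot be run this way because is requires its right-hand side to be fully instantiated.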

4.5.3 Logical Operators

Predicates are functions which return a boolean value. Thus the logical operators are built in to the language. The comma on the right hand side of a rule is logical conjunction. The symbol :- is logical implication. In addition Prolog provides negation and disjunction operators. The logical operators are used in the definition of rules. Thus,

    PROLOG                MEANING
    a :- b.               a if b.
    a :- b, c.            a if b and c.
    a :- b ; c.           a if b or c.
    a :- \+ b.            a if b is not provable.
    a :- b -> c ; d.      a if b and c, or if b is not provable and d.

This table summarizes the logical operators.

    SYMBOL    OPERATION
    not       negation
    \+        not provable
    ,         logical conjunction
    ;         logical disjunction
    :-        logical implication
    ->        if-then-else
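As a small sketch of the last two operators, the first predicate below uses if-then-else to pick exactly one branch and the second uses \+ to express "not provable". The predicate names and the bird facts are invented for this illustration.

```prolog
% classify(+N, -Class): nested if-then-else; exactly one branch is taken.
classify(N, Class) :-
    (   N < 0   -> Class = negative
    ;   N =:= 0 -> Class = zero
    ;   Class = positive
    ).

% Hypothetical facts for the negation example.
bird(tweety).
bird(opus).
penguin(opus).

% flies(X): X is a bird that cannot be shown to be a penguin.
flies(X) :- bird(X), \+ penguin(X).
```

Note that \+ penguin(X) only means that penguin(X) cannot be proved from the facts at hand. bird(X) is placed first so that X is bound before the negated goal is tried.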

4.6 Unification and Pattern Matching

The arguments in a query are matched (or unified in Prolog terminology) to select the appropriate rule. Here is an example which makes extensive use of pattern matching. The rules for computing the derivatives of polynomial expressions can be written as Prolog rules. A given polynomial expression is matched against the first argument of the rule and the corresponding derivative is returned.

    % deriv(Polynomial, Variable, Derivative)
    % dc/dx = 0
    deriv(C,X,0) :- number(C).
    % dx/dx = 1
    deriv(X,X,1).
    % d(cv)/dx = c(dv/dx)
    deriv(C*U,X,C*DU) :- number(C), deriv(U,X,DU).
    % d(uv)/dx = u(dv/dx) + v(du/dx)
    deriv(U*V,X,U*DV + V*DU) :- deriv(U,X,DU), deriv(V,X,DV).
    % d(u ± v)/dx = du/dx ± dv/dx
    deriv(U+V,X,DU+DV) :- deriv(U,X,DU), deriv(V,X,DV).
    deriv(U-V,X,DU-DV) :- deriv(U,X,DU), deriv(V,X,DV).
    % du^n/dx = nu^(n-1)(du/dx)
    deriv(U^N,X,N*U^N1*DU) :- N1 is N-1, deriv(U,X,DU).

Prolog code is often bidirectional. In bidirectional code, the arguments may be used either for input or output. For example, this code may in principle be used for both differentiation and integration with queries of the form

    ?- deriv(Integral, X, Derivative).

where either Integral or Derivative may be instantiated to a formula. In practice the number tests and the use of is limit which integration queries succeed.
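To see the pattern matching at work, the rules can be loaded and queried with a concrete polynomial. The block below repeats the deriv/3 rules in cleaned-up form, which is an assumption about what the garbled listing intended; nothing else is added.

```prolog
% deriv(Polynomial, Variable, Derivative)
deriv(C, _X, 0) :- number(C).                              % dc/dx = 0
deriv(X, X, 1).                                            % dx/dx = 1
deriv(C*U, X, C*DU) :- number(C), deriv(U, X, DU).         % constant factor
deriv(U*V, X, U*DV + V*DU) :- deriv(U, X, DU), deriv(V, X, DV).
deriv(U+V, X, DU+DV) :- deriv(U, X, DU), deriv(V, X, DV).
deriv(U-V, X, DU-DV) :- deriv(U, X, DU), deriv(V, X, DV).
deriv(U^N, X, N*U^N1*DU) :- N1 is N-1, deriv(U, X, DU).    % power rule
```

The first answer to ?- deriv(3*x^2 + x, x, D). is D = 3*(2*x^1*1)+1. The result is a correct but unsimplified term: the rules only rewrite the expression according to its shape and perform no algebraic simplification.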

4.7 Functions

Prolog does not provide for a function type; therefore, functions must be defined as relations. That is, both the arguments to the function and the result of the function must be parameters to the relation. This means that the composition of two functions cannot be written directly. As an example, here is the factorial function defined as a relation in Prolog. Note that the definition requires two rules, one for the base case and one for the inductive case.

    fac(0,1).
    fac(N,F) :- N > 0, M is N - 1, fac(M,Fm), F is N * Fm.

The second rule states that if N > 0, M = N - 1, Fm is (N-1)!, and F = N * Fm, then F is N!. Notice how is is used. In this example it resembles an assignment operator; however, it may not be used to reassign a variable to a new value. In the logical sense, the order of the goals in the body of a rule is irrelevant; however, the order may matter in a practical sense. M must not be a variable in the recursive call, otherwise an infinite loop will result. Much of the clumsiness of this definition comes from the fact that fac is defined as a relation and thus it cannot be used in an expression. Relations are commonly defined using multiple rules and the order of the rules may determine the result. In this case the rule order is irrelevant since, for each value of N, only one rule is applicable. Here are the Prolog equivalents of the definitions of the gcd function, the Fibonacci function and Ackermann's function.

    gcd(A,B,GCD) :- A = B, GCD = A.
    gcd(A,B,GCD) :- A < B, NB is B - A, gcd(A,NB,GCD).
    gcd(A,B,GCD) :- A > B, NA is A - B, gcd(NA,B,GCD).

    fib(0,1).
    fib(1,1).
    fib(N,F) :- N > 1, N1 is N - 1, N2 is N - 2,
                fib(N1,F1), fib(N2,F2), F is F1 + F2.

    ack(0,N,A) :- A is N + 1.
    ack(M1,0,A) :- M1 > 0, M is M1 - 1, ack(M,1,A).
    ack(M1,N1,A) :- M1 > 0, N1 > 0, M is M1 - 1, N is N1 - 1,
                    ack(M1,N,A1), ack(M,A1,A).

Notice that the definition of Ackermann's function is clumsier than the corresponding functional definition since functional composition is not available.

Logic programming definition of the factorial function.

    % factorial(N,F); F is N!
    factorial(0,s(0)).
    factorial(s(N),F) :- factorial(N,F1), times(s(N),F1,F).

Prolog definition of the factorial function.

    factorial(0,1).
    factorial(N,F) :- N > 0, N1 is N-1, factorial(N1,F1), F is N*F1.

Logic programming definition of the minimum.

    % minimum(M,N,Min); Min is the minimum of {M, N}
    minimum(M,N,M) :- less_than_or_equal(M,N).
    minimum(M,N,N) :- less_than_or_equal(N,M).

Prolog definition of the minimum.

    minimum(M,N,M) :- M =< N.
    minimum(M,N,N) :- N =< M.

Logic programming definition of the modulus.

    % mod(M,N,Mod) <- Mod is the remainder of the integer division of M by N.
    mod(X,Y,Z) :- less_than(Z,Y), times(Y,Q,W), plus(W,Z,X).
    % or
    mod(X,Y,X) :- less_than(X,Y).
    mod(X,Y,Z) :- plus(X1,Y,X), mod(X1,Y,Z).

Logic programming definition of Ackermann's function.

    ack(0,N,s(N)).
    ack(s(M),0,Val) :- ack(M,s(0),Val).
    ack(s(M),s(N),Val) :- ack(s(M),N,Val1), ack(M,Val1,Val).

Prolog definition of Ackermann's function.

    ack(0,N,Val) :- Val is N + 1.
    ack(M,0,Val) :- M > 0, M1 is M-1, ack(M1,1,Val).
    ack(M,N,Val) :- M > 0, N > 0, M1 is M-1, N1 is N-1,
                    ack(M,N1,Val1), ack(M1,Val1,Val).

Logic programming definition of the Euclidean algorithm.

    gcd(X,0,X) :- X > 0.
    gcd(X,Y,Gcd) :- mod(X,Y,Z), gcd(Y,Z,Gcd).


4.8 Lists

Outline: Lists; Composition of Recursive Programs; Iteration

Lists are the basic data structure used in logic (and functional) programming. Lists are a recursive data structure, so recursion occurs naturally in the definitions of various list operations. When defining operations on recursive data structures, the definition most often naturally follows the recursive definition of the data structure. In the case of lists, the empty list is the base case, so operations on lists must consider the empty list as a case. The other cases involve a list which is composed of an element and a list. Here is a recursive definition of the list data structure as found in Prolog.

    List --> [ ]
    List --> [Element|List]

Here are some examples of list representation; the first is the empty list.

    Pair Syntax       Element Syntax
    [ ]               [ ]
    [a|[ ]]           [a]
    [a|[b|[ ]]]       [a,b]
    [a|X]             [a|X]
    [a|[b|X]]         [a,b|X]

Predicates on lists are often written using multiple rules: one rule for the empty list (the base case) and a second rule for nonempty lists. For example, here is the definition of the predicate for the length of a list. (Most Prolog systems already provide length/2 as a built-in, so a different name may be needed when trying these definitions.)

    % length(List,Number) <- Number is the length of List
    length([],0).
    length([H|T],N) :- length(T,M), N is M+1.

Element of a list.

    % member(Element,List) <- Element is an element of the list List
    member(X,[X|List]).
    member(X,[Element|List]) :- member(X,List).

Prefix of a list.

    % prefix(Prefix,List) <- Prefix is a prefix of list List
    prefix([],List).
    prefix([X|Prefix],[X|List]) :- prefix(Prefix,List).

Suffix of a list.

    % suffix(Suffix,List) <- Suffix is a suffix of list List
    suffix(Suffix,Suffix).
    suffix(Suffix,[X|List]) :- suffix(Suffix,List).

Append (concatenate) two lists.

    % append(List1,List2,List1List2) <-
    %     List1List2 is the result of concatenating List1 and List2.
    append([],List,List).
    append([Element|List1],List2,[Element|List1List2]) :-
        append(List1,List2,List1List2).

Compare this code with the code for plus.

sublist may be defined in several ways:

    Suffix of a prefix
    Prefix of a suffix
    Recursive definition of sublist using prefix
    Suffix of a prefix using append
    Prefix of a suffix using append

member, prefix and suffix may be defined using append.

Other list predicates: reverse, delete, select, sort, permutation, ordered, insert, quicksort.
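The notes list several ways to define sublist without giving code. Here is a minimal sketch of one of them, "prefix of a suffix". The names list_prefix/2, list_suffix/2 and list_sublist/2 are assumptions chosen to avoid clashing with library predicates of similar names in some systems; the clause bodies follow the prefix and suffix definitions in the text.

```prolog
% list_prefix(Prefix, List): Prefix is a prefix of List.
list_prefix([], _List).
list_prefix([X|Prefix], [X|List]) :- list_prefix(Prefix, List).

% list_suffix(Suffix, List): Suffix is a suffix of List.
list_suffix(Suffix, Suffix).
list_suffix(Suffix, [_X|List]) :- list_suffix(Suffix, List).

% list_sublist(Sub, List): Sub is a contiguous sublist of List,
% defined as a prefix of some suffix of List.
list_sublist(Sub, List) :-
    list_suffix(Suffix, List),
    list_prefix(Sub, Suffix).
```

For example, ?- list_sublist([b,c], [a,b,c,d]). succeeds, while ?- list_sublist([a,c], [a,b,c,d]). fails because a and c are not adjacent. Called with the first argument unbound, the predicate enumerates all sublists on backtracking; the empty list appears once for every suffix.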

4.8.1 Iteration

Iterative version of length.

    % length(List,Number) <- Number is the length of List
    % Iterative version.
    length(List,LengthofList) :- length(List,0,LengthofList).

    % length(SuffixList,LengthofPrefix,LengthofList) <-
    %     LengthofList is LengthofPrefix + length of SuffixList
    length([],LengthofPrefix,LengthofPrefix).
    length([Element|List],LengthofPrefix,LengthofList) :-
        PrefixPlus1 is LengthofPrefix + 1,
        length(List,PrefixPlus1,LengthofList).

Iterative version of reverse.

    % reverse(List,ReversedList) <- ReversedList is List reversed.
    % Iterative version.
    reverse(List,RList) :- reverse(List,[],RList).

    % reverse(SuffixList,RevPrefix,RList) <-
    %     RList is the reversal of SuffixList followed by RevPrefix
    reverse([],RL,RL).
    reverse([Element|List],RevPrefix,RL) :-
        reverse(List,[Element|RevPrefix],RL).

Here are some simple examples of common list operations defined by pattern matching. The first sums the elements of a list and the second forms the product of the elements of a list.

    sum([ ],0).
    sum([X|L],Sum) :- sum(L,SL), Sum is X + SL.

    product([ ],1).
    product([X|L],Prod) :- product(L,PL), Prod is X * PL.

Another example of a common list operation is appending, or the concatenation of two lists to form a third list. Append may be described as the relation between three lists, L1, L2, L3, where L1 = [x1,...,xm], L2 = [y1,...,yn] and L3 = [x1,...,xm,y1,...,yn]. In Prolog, an inductive style definition is required.

    append([ ],L,L).
    append([X1|L1],L2,[X1|L3]) :- append(L1,L2,L3).

The first rule is the base case. The second rule is the inductive case. In effect the second rule says that if L1 = [x2,...,xm], L2 = [y1,...,yn] and L3 = [x2,...,xm,y1,...,yn], then [x1,x2,...,xm,y1,...,yn] is the result of appending [x1,x2,...,xm] and L2.

The append relation is quite flexible. It can be used to determine if an object is an element of a list, if a list is a prefix of a list and if a list is a suffix of a list.

    member(X,L) :- append(_,[X|_],L).
    prefix(Pre,L) :- append(Pre,_,L).
    suffix(Suf,L) :- append(_,Suf,L).

The underscore (_) in the definitions denotes an anonymous variable (or "don't care") whose value is immaterial to the definition. The member relation can be used to derive other useful relations.

    vowel(X) :- member(X,[a,e,i,o,u]).
    digit(D) :- member(D,[0,1,2,3,4,5,6,7,8,9]).

A predicate defining a list and its reversal can be defined using pattern matching and the append relation as follows.

    reverse([ ],[ ]).
    reverse([X|L],Rev) :- reverse(L,RL), append(RL,[X],Rev).

Here is a more efficient (iterative/tail recursive) version.

    reverse(L,RL) :- reverse(L,[ ],RL).

    reverse([ ],RL,RL).
    reverse([X|L],PRL,RL) :- reverse(L,[X|PRL],RL).

To conclude this section, here is a definition of insertion sort.

    isort([ ],[ ]).
    isort([X|UnSorted],AllSorted) :-
        isort(UnSorted,Sorted),
        insert(X,Sorted,AllSorted).

    insert(X,[ ],[X]).
    insert(X,[Y|L],[X,Y|L]) :- X =< Y.
    insert(X,[Y|L],[Y|IL]) :- X > Y, insert(X,L,IL).
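The flexibility of the append relation noted earlier in this section comes from the fact that any of its arguments may be left unbound. As a sketch, the program below calls the relation with only the third list known, which enumerates every way of splitting that list in two. The relation is named app/3 and the helper splits/2 is hypothetical; both names are assumptions made so that the example does not collide with a library append/3.

```prolog
% app(L1, L2, L3): L3 is the concatenation of L1 and L2.
% Same clauses as append/3 in the text, under a different name.
app([], L, L).
app([X1|L1], L2, [X1|L3]) :- app(L1, L2, L3).

% splits(+List, -Splits): Splits is the list of all Front-Back pairs
% such that appending Front and Back gives List.
splits(List, Splits) :-
    findall(Front-Back, app(Front, Back, List), Splits).
```

The query ?- splits([a,b], S). answers S = [[]-[a,b], [a]-[b], [a,b]-[]]. The definitions of member, prefix and suffix in terms of append rely on exactly this behaviour.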

4.9 Iteration

Recursion is the only iterative method available in Prolog. However, tail recursion can often be implemented as iteration. The following definition of the factorial function is an iterative definition because it is tail recursive. It corresponds to an implementation using a while-loop in an imperative programming language.

    fac(0,1).
    fac(N,F) :- N > 0, fac(N,1,F).

    fac(1,F,F).
    fac(N,PP,F) :- N > 1, NPp is N*PP, M is N-1, fac(M,NPp,F).

Note that the second argument functions as an accumulator. The accumulator is used to store the partial product much as might be done in a procedural language. For example, in Pascal an iterative factorial function might be written as follows.

    function fac(N : integer) : integer;
    var i, product : integer;
    begin
        if N >= 0 then
        begin
            product := 1;
            for i := 1 to N do
                product := product * i;
            fac := product
        end
    end;

In the Pascal solution product acts as an accumulator to store the partial product. The Prolog solution also illustrates the fact that Prolog permits different relations to be defined by the same name provided the number of arguments is different. In this example the relations are fac/2 and fac/3, where fac is the functor and the number refers to the arity of the predicate. As an additional example of the use of accumulators, here is an iterative (tail recursive) version of the Fibonacci function.

    fib(0,1).
    fib(1,1).
    fib(N,F) :- N > 1, fib(N,1,1,F).

    fib(2,F1,F2,F) :- F is F1 + F2.
    fib(N,F1,F2,F) :- N > 2, N1 is N - 1, NF1 is F1 + F2,
                      fib(N1,NF1,F1,F).

4.10 Iterators, Generators and Backtracking

The following fact and rule can be used to generate the natural numbers.

    nat(0).
    nat(N) :- nat(M), N is M + 1.

The successive numbers are generated by backtracking. For example, when the following query is executed successive natural numbers are printed.

    ?- nat(N), write(N), nl, fail.

The first natural number is generated and printed, then fail forces backtracking to occur and the second rule is used to generate the successive natural numbers. The following code generates successive prefixes of an infinite list beginning with N.

    natlist(N,[N]).
    natlist(N,[N|L]) :- N1 is N+1, natlist(N1,L).

As a final example, here is the code for generating successive prefixes of the list of prime numbers.

    primes(PL) :- natlist(2,L2), sieve(L2,PL).

    sieve([ ],[ ]).
    sieve([P|L],[P|IDL]) :- sieveP(P,L,PL), sieve(PL,IDL).

    sieveP(P,[ ],[ ]).
    sieveP(P,[N|L],[N|IDL]) :- N mod P > 0, sieveP(P,L,IDL).
    sieveP(P,[N|L],IDL) :- N mod P =:= 0, sieveP(P,L,IDL).

Occasionally, backtracking and multiple answers are annoying. Prolog provides the cut symbol (!) to control backtracking. The following code defines a predicate where the third argument is the maximum of the first two.

    max(A,B,M) :- A < B, M = B.
    max(A,B,M) :- A >= B, M = A.

The code may be simplified by dropping the conditions on the second rule.

    max(A,B,B) :- A < B.
    max(A,B,A).

However, in the presence of backtracking, incorrect answers can result, as is shown here.

    ?- max(3,4,M).
    M = 4 ;
    M = 3


To prevent backtracking to the second rule, the cut symbol is inserted into the first rule.

    max(A,B,B) :- A < B, !.
    max(A,B,A).

Now the erroneous answer will not be generated. A word of caution: cuts are similar to gotos in that they tend to increase the complexity of the code rather than to simplify it. In general the use of cuts should be avoided.
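The word of caution about cuts can be illustrated with the max example itself. The sketch below gives the cut version under the hypothetical name max_cut/3 and a variant, max_safe/3 (also a hypothetical name), in which the output is unified only after the cut. Neither name appears in the original text.

```prolog
% max_cut/3: the cut version from the text, renamed for this sketch.
max_cut(A, B, B) :- A < B, !.
max_cut(A, _, A).

% max_safe/3: the comparison and the cut come first,
% and the result is unified only after the clause has been selected.
max_safe(A, B, M) :- A < B, !, M = B.
max_safe(A, _, A).
```

With the third argument unbound both versions behave alike: ?- max_cut(3,4,M). gives only M = 4. They differ when the third argument is supplied. The query ?- max_cut(3,4,3). succeeds, because the head of the first clause fails to match before the cut is reached and the unguarded second clause then applies. The query ?- max_safe(3,4,3). fails as it should.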

4.11 Tuples (or Records)

We illustrate the data type of tuples with the code for the abstract data type of a binary search tree. The binary search tree is represented as either niltree for the empty tree or as the tuple btree(Item,L_Tree,R_Tree). Here is the Prolog code for the creation of an empty tree, insertion of an element into the tree, and an in-order traversal of the tree.

    create_tree(niltree).

    inserted_in_is(Item,niltree,btree(Item,niltree,niltree)).
    inserted_in_is(Item,btree(ItemI,L_T,R_T),btree(ItemI,New_L_T,R_T)) :-
        Item @< ItemI,
        inserted_in_is(Item,L_T,New_L_T).
    inserted_in_is(Item,btree(ItemI,L_T,R_T),btree(ItemI,L_T,New_R_T)) :-
        Item @> ItemI,
        inserted_in_is(Item,R_T,New_R_T).

    inorder(niltree,[ ]).
    inorder(btree(Item,L_T,R_T),Inorder) :-
        inorder(L_T,Left),
        inorder(R_T,Right),
        append(Left,[Item|Right],Inorder).

The membership relation is a trivial modification of the insert relation. Since Prolog access to the elements of a tuple is by pattern matching, a variety of patterns can be employed to represent the tree. Here are some alternatives.

    [Item,LeftTree,RightTree]
    Item/LeftTree/RightTree
    (Item,LeftTree,RightTree)
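The membership relation mentioned above is not spelled out in the notes. The following is one possible sketch of it, using the btree/niltree representation. The predicate name is_in/2 is an assumption, and the insertion clauses are repeated in a form that rebuilds the tree on the way back from the recursion so that the example is self-contained.

```prolog
% inserted_in_is(Item, Tree, NewTree): NewTree is Tree with Item inserted.
% Fails if Item is already present (neither @< nor @> holds).
inserted_in_is(Item, niltree, btree(Item,niltree,niltree)).
inserted_in_is(Item, btree(ItemI,L_T,R_T), btree(ItemI,New_L_T,R_T)) :-
    Item @< ItemI,
    inserted_in_is(Item, L_T, New_L_T).
inserted_in_is(Item, btree(ItemI,L_T,R_T), btree(ItemI,L_T,New_R_T)) :-
    Item @> ItemI,
    inserted_in_is(Item, R_T, New_R_T).

% is_in(Item, Tree): Item occurs in the binary search tree Tree.
% Hypothetical membership relation; it follows the same three cases
% as insertion but stops when the item is found.
is_in(Item, btree(Item,_,_)).
is_in(Item, btree(ItemI,L_T,_)) :- Item @< ItemI, is_in(Item, L_T).
is_in(Item, btree(ItemI,_,R_T)) :- Item @> ItemI, is_in(Item, R_T).
```

For example, inserting b, a and c in that order into niltree yields btree(b,btree(a,niltree,niltree),btree(c,niltree,niltree)); is_in(c,Tree) then succeeds and is_in(d,Tree) fails.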


4.12 Extra-Logical Predicates

Outline: Input/Output; Assert/Retract; System Access

The class of predicates in Prolog that lie outside the logic programming model are called extra-logical predicates. These predicates achieve a side effect in the course of being satisfied as a logical goal. There are three types of extra-logical predicates: predicates for handling I/O, predicates for manipulating the program, and predicates for accessing the underlying operating system.

4.12.1 Input/Output

Most Prolog implementations provide the predicates read and write. Both take one argument: read unifies its argument with the next term (terminated with a period) on the standard input and write prints its argument to the standard output. As an illustration of input and output as well as a more extended example, here is the code for a checkbook balancing program. The section beginning with the comment "Prompts" handles the I/O.

    % Check Book Balancing Program.
    checkbook :- initialbalance(Balance), newbalance(Balance).

    % Recursively compute new balances
    newbalance(OldBalance) :- transaction(Transaction),
                              action(OldBalance,Transaction).

    % If transaction amount is 0 then finished.
    action(OldBalance,Transaction) :- Transaction = 0,
                                      finalbalance(OldBalance).

    % If transaction amount is not 0 then compute new balance.
    action(OldBalance,Transaction) :- Transaction \= 0,
                                      NewBalance is OldBalance + Transaction,
                                      newbalance(NewBalance).

    %
    % Prompts
    initialbalance(Balance) :- write('Enter initial balance: '),
                               read(Balance).

    transaction(Transaction) :-
        write('Enter transaction ('),
        write('- for withdrawal, 0 to terminate): '),
        read(Transaction).

    finalbalance(Balance) :- write('Your final balance is: '),
                             write(Balance), nl.

Files

    see(File)        Current input file is now File.
    seeing(File)     File is unified with the name of the current input file.
    seen             Closes the current input file.
    tell(File)       Current output file is now File.
    telling(File)    File is unified with the name of the current output file.
    told             Closes the current output file.

Term I/O

    read(Term)       Reads the next full-stop (period) delimited term from the current input stream; at end of file, returns the atom end_of_file.
    write(Term)      Writes a term to the current output stream.
    print(Term)      Writes a term to the current output stream. Uses the user defined predicate portray/1 to write the term, otherwise uses write.
    writeq(Term)     Writes a term to the current output stream in a form acceptable as input to read.

Character I/O

    get(N)           N is the ASCII code of the next non-blank printable character on the current input stream. If end of file, then -1 is returned.
    put(N)           Puts the character corresponding to ASCII code N on the current output stream.
    nl               Causes the next output to be on a new line.
    tab(N)           N spaces are output to the current output stream.

Program Access

    consult(SourceFile)    Loads SourceFile into the interpreter; however, if a predicate is defined across two or more files, consulting them will result in only the clauses in the file last consulted being used.
    reconsult(File)        Available in some systems.

Other

    The conversion routine between lists of ASCII codes and atoms (name/2 in the example that follows), display, prompt.

    % Read a sentence and return a list of words.
    read_in([W|Ws]) :- get0(C), read_word(C,W,C1), rest_sent(W,C1,Ws).

    % Given a word and the next character, read in the rest of the sentence
    rest_sent(W,_,[]) :- lastword(W).
    rest_sent(W,C,[W1|Ws]) :- read_word(C,W1,C1), rest_sent(W1,C1,Ws).

    read_word(C,W,C1) :- single_character(C), !, name(W,[C]), get0(C1).
    read_word(C,W,C2) :- in_word(C,NewC), get0(C1),
                         rest_word(C1,Cs,C2), name(W,[NewC|Cs]).
    read_word(C,W,C2) :- get0(C1), read_word(C1,W,C2).

    rest_word(C,[NewC|Cs],C2) :- in_word(C,NewC), !, get0(C1),
                                 rest_word(C1,Cs,C2).
    rest_word(C,[],C).

    % These are single character words.
    single_character(33).    % !
    single_character(44).    % ,
    single_character(46).    % .
    single_character(58).    % :
    single_character(59).    % ;
    single_character(63).    % ?

    % These characters can appear within a word.
    in_word(C,C) :- C > 96, C < 123.               % a,b,...,z
    in_word(C,L) :- C > 64, C < 91, L is C + 32.   % A,B,...,Z
    in_word(C,C) :- C > 47, C < 58.                % 0,1,...,9
    in_word(39,39).                                % '
    in_word(45,45).                                % -

    % These words terminate a sentence.
    lastword('.').
    lastword('!').
    lastword('?').

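4.12.2 Program Access and Manipulation

    clause(Head,Body)       Body is the body of a clause in the database whose head unifies with Head.
    assert(Clause)          Adds Clause to the end of the database.
    asserta(Clause)         Adds Clause to the beginning of the database.
    retract(Clause_Head)    Removes the first clause that matches from the database.
    consult(File_Name)      Loads the clauses in the file File_Name.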
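As a minimal sketch of program manipulation, the following program keeps a counter as a fact in the database and updates it with retract and assertz. The predicate names counter/1 and increment/0 are invented for this example. assertz/1 is the ISO name for adding a clause at the end of the database, which is the behaviour described above for assert; the dynamic declaration is needed in most current systems before clauses can be added or removed at run time.

```prolog
:- dynamic(counter/1).

% Initial state of the hypothetical counter.
counter(0).

% increment: replace the stored counter(N) by counter(N+1).
increment :-
    retract(counter(N)),
    N1 is N + 1,
    assertz(counter(N1)).
```

After loading the program, the query ?- increment, increment, counter(N). answers N = 2. Because the database is changed as a side effect, the result of a query now depends on what was executed before it, which is why these predicates are called extra-logical.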

4.12.3 System Access

    system(Command)    Execute Command in the operating system.

4.13 Style and Layout

Outline: Style and Layout; Debugging

Some conventions for comments.

    Long comments should precede the code they refer to, while short comments should be interspersed with the code itself.
    Program comments should describe what the program does, how it is used (goal predicate and expected results), limitations, system dependent features, performance, and examples of using the program.
    Predicate comments explain the purpose of the predicate, the meaning and relationship among the arguments, and any restrictions as to argument type.
    Clause comments add to the description of the case the particular clause deals with and are useful for documenting cuts.

Some conventions for program layout.

    Group clauses belonging to a relation or ADT together.
    Clauses should be short. Their body should contain no more than a few goals.
    Make use of indentation to improve the readability of the body of a clause.
    Mnemonic names for relations and variables should be used. Names should indicate the meaning of relations and the role of data objects.
    Clearly separate the clauses defining different relations.
    The cut operator should be used with care. The use of red cuts should be limited to clearly defined mutually exclusive alternatives.

Illustration

    merge( List1, List2, List3 ) :-
        ( List1 = [], !, List3 = List2 );
        ( List2 = [], !, List3 = List1 );
        ( List1 = [X|L1], List2 = [Y|L2] ),
        ( ( X < Y, !, Z = X, merge( L1, List2, L3 ) );
          ( Z = Y, merge( List1, L2, L3 ) ) ),
        List3 = [Z|L3].

A better version

    merge( [], List2, List2 ).
    merge( List1, [], List1 ).
    merge( [X|List1], [Y|List2], [X|List3] ) :-
        X < Y, !, merge( List1, [Y|List2], List3 ).    % Red cut
    merge( List1, [Y|List2], [Y|List3] ) :-
        merge( List1, List2, List3 ).

4.13.1 Debugging

trace/notrace, spy/nospy, and programmer inserted debugging aids: write predicates and clauses of the form p :- write, fail.

4.14 Negation and Cuts

Outline: Negation as failure; Green Cuts; Red Cuts

4.14.1 Negation

4.14.2 Cuts

Green cuts: determinism

    Selection among mutually exclusive clauses.
    Tail recursion optimization.
    Prevention of backtracking when only one solution exists.

    A :- B1,...,Bn,Bn1.
    A :- B1,...,Bn,!,Bn1.    % prevents backtracking

Red cuts: omitting explicit conditions
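A small sketch may help to fix the distinction. In the hypothetical predicate sign/2 below, every clause states its full condition, so the conditions are mutually exclusive and the cuts are green: removing them changes the amount of backtracking but not the set of answers. The predicate and its name are invented for this illustration.

```prolog
% sign(+N, -Sign): green cuts. Each clause carries its own test,
% so the cuts only prevent useless attempts at the later clauses.
sign(N, negative) :- N < 0, !.
sign(N, zero)     :- N =:= 0, !.
sign(N, positive) :- N > 0.
```

If the test N > 0 were dropped from the last clause because the cuts "already guarantee it", the cuts would become red: the clause would then be correct only as long as the cuts are present, and a query such as ?- sign(-2, positive). would wrongly succeed.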


4.15 Definite Clause Grammars

Outline:

    The parsing problem: context-free grammars; construct a parse tree for a sentence given the context-free grammar.
    Representing the parsing problem in Prolog
    The grammar rule notation (Definite Clause Grammars, DCG)
    Adding extra arguments
    Adding extra tests

Prolog originated from attempts to use logic to express grammar rules and formalize the parsing process. Prolog has special syntax rules which are called definite clause grammars (DCG). DCGs are a generalization of context free grammars.

4.15.1 Context Free Grammars

A context free grammar is a set of rules of the form

    <nonterminal> -> <body>

where nonterminal is a nonterminal and body is a sequence of one or more items. Each item is either a nonterminal symbol or a sequence of terminal symbols. The meaning of the rule is that the body is a possible form for an object of type nonterminal.

    S --> a b
    S --> a S b

4.15.2 DCG

Nonterminals are written as Prolog atoms, the items in the body are separated with commas, and sequences of terminal symbols are written as lists of atoms. For each nonterminal symbol, S, a grammar defines a language which is obtained by repeated nondeterministic application of the grammar rules, starting from S.

    s --> [a],[b].
    s --> [a],s,[b].

As an illustration of how DCGs are used, the string [a,a,b,b] is given to the grammar to be parsed.

    ?- s([a,a,b,b],[]).
    yes
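The query s([a,a,b,b],[]) has two arguments although the grammar rules for s have none. The reason is that each grammar rule is translated into an ordinary clause with two extra arguments: the list of symbols before parsing and the list that remains afterwards. The exact translation is implementation dependent; the following hand-written sketch, under the hypothetical name s_plain/2, shows the idea.

```prolog
% s_plain(S0, S): the list S0 starts with a string of the language
% a^n b^n (n >= 1), and S is what remains after that string.

% s --> [a],[b].
s_plain(S0, S) :-
    S0 = [a|S1],          % consume the terminal a
    S1 = [b|S].           % consume the terminal b

% s --> [a],s,[b].
s_plain(S0, S) :-
    S0 = [a|S1],          % consume the terminal a
    s_plain(S1, S2),      % parse a nested s
    S2 = [b|S].           % consume the terminal b
```

The query ?- s_plain([a,a,b,b],[]). succeeds in the same way as the grammar version, and ?- s_plain([a,b,b],[]). fails because one b is left over.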

Here is a natural language example.

    % DCGrammar
    sentence --> noun_phrase, verb_phrase.
    noun_phrase --> determiner, noun.
    noun_phrase --> noun.
    verb_phrase --> verb.
    verb_phrase --> verb, noun_phrase.

    % Vocabulary
    determiner --> [the].
    determiner --> [a].
    noun --> [cat].
    noun --> [cats].
    noun --> [mouse].
    noun --> [mice].
    verb --> [scare].
    verb --> [scares].
    verb --> [hate].
    verb --> [hates].

Context free grammars cannot define the required agreement in number between the noun phrase and the verb phrase. That information is context dependent (sensitive). However, DCGs are more general.

Number agreement

    % DCGrammar - with number agreement between noun phrase and verb phrase
    sentence --> noun_phrase(Number), verb_phrase(Number).
    noun_phrase(Number) --> determiner(Number), noun(Number).
    noun_phrase(Number) --> noun(Number).
    verb_phrase(Number) --> verb(Number).
    verb_phrase(Number) --> verb(Number), noun_phrase(Number1).

    % Vocabulary
    determiner(Number) --> [the].
    determiner(singular) --> [a].
    noun(singular) --> [cat].
    noun(plural) --> [cats].
    noun(singular) --> [mouse].
    noun(plural) --> [mice].
    verb(plural) --> [scare].
    verb(singular) --> [scares].
    verb(plural) --> [hate].
    verb(singular) --> [hates].

4.15.3 Parse Trees

    % DCGrammar -- with parse tree as a result
    sentence(sentence(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).
    noun_phrase(noun_phrase(D,NP)) --> determiner(D), noun(NP).
    noun_phrase(NP) --> noun(NP).
    verb_phrase(verb_phrase(V)) --> verb(V).
    verb_phrase(verb_phrase(V,NP)) --> verb(V), noun_phrase(NP).

    % Vocabulary
    determiner(determiner(the)) --> [the].
    determiner(determiner(a)) --> [a].
    noun(noun(cat)) --> [cat].
    noun(noun(cats)) --> [cats].
    noun(noun(mouse)) --> [mouse].
    noun(noun(mice)) --> [mice].
    verb(verb(scare)) --> [scare].
    verb(verb(scares)) --> [scares].
    verb(verb(hate)) --> [hate].
    verb(verb(hates)) --> [hates].
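To see the parse tree being built, the grammar can be queried through phrase/2, the usual entry point for DCGs in current Prolog systems. The sketch below repeats a cut-down part of the grammar so that it is self-contained; the wrapper parse/2 is a hypothetical name introduced for the example.

```prolog
% A subset of the parse tree grammar from the text.
sentence(sentence(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(noun_phrase(D,N)) --> determiner(D), noun(N).
verb_phrase(verb_phrase(V)) --> verb(V).
verb_phrase(verb_phrase(V,NP)) --> verb(V), noun_phrase(NP).

determiner(determiner(the)) --> [the].
noun(noun(cat)) --> [cat].
noun(noun(mice)) --> [mice].
verb(verb(hates)) --> [hates].

% parse(+Words, -Tree): Tree is a parse tree for the whole word list.
parse(Words, Tree) :- phrase(sentence(Tree), Words).
```

The query ?- parse([the,cat,hates,the,mice], T). answers T = sentence(noun_phrase(determiner(the),noun(cat)), verb_phrase(verb(hates),noun_phrase(determiner(the),noun(mice)))). The extra argument of each nonterminal is filled in by unification as the corresponding rule is applied, so the tree mirrors the derivation.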


4.15.4 Simple Semantics for Natural Language Sentences

Transitive and intransitive verbs

    % DCGrammar -- Transitive and intransitive verbs
    sentence(VP) --> noun_phrase(Actor), verb_phrase(Actor,VP).
    noun_phrase(Actor) --> proper_noun(Actor).
    verb_phrase(Actor,VP) --> intrans_verb(Actor,VP).
    verb_phrase(Actor,VP) --> transitive_verb(Actor,Something,VP),
                              noun_phrase(Something).

    % Vocabulary
    proper_noun(john) --> [john].
    proper_noun(annie) --> [annie].
    intrans_verb(Actor,paints(Actor)) --> [paints].

transitive_verb(Somebody,Something,likes(Somebody,Something)) --> [likes]. Determiners a and every :- op( 100, xfy, and). :- op( 150, xfy, =>).

% DCGrammar -- Transitive and intransitive verbs sentence(S) --> noun_phrase(X,Assn,S), verb_phrase(X,Assn). noun_phrase(X,Assn,S) --> determiner(X,Prop,Assn,S), noun(X,Prop). verb_phrase(X,Assn) --> intrans_verb(X,Assn). % Vocabulary determiner(X,Prop,Assn,exists(X,Prop and Assn)) --> [a]. determiner(X,Prop,Assn, all(X,Prop => Assn)) --> [every]. noun(X,man(X)) --> [man]. noun(X,woman(X)) --> [woman].

4.15. DEFINITE CLAUSE GRAMMARS intrans_verb(X,paints(X)) intrans_verb(X,dances(X)) Relative Clauses --> [paints]. --> [dances].

55
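To see what the determiner grammar computes, it helps to run it. The sketch below restates that grammar so that the block loads on its own and then shows two queries. It assumes phrase/2 is available and that the Prolog in use allows and and => to be declared as operators in user code.

```prolog
% Sketch: the determiner grammar restated, followed by two queries.
:- op(100, xfy, and).
:- op(150, xfy, =>).

sentence(S) --> noun_phrase(X, Assn, S), verb_phrase(X, Assn).
noun_phrase(X, Assn, S) --> determiner(X, Prop, Assn, S), noun(X, Prop).
verb_phrase(X, Assn) --> intrans_verb(X, Assn).

determiner(X, Prop, Assn, exists(X, Prop and Assn)) --> [a].
determiner(X, Prop, Assn, all(X, Prop => Assn))     --> [every].
noun(X, man(X))   --> [man].
noun(X, woman(X)) --> [woman].
intrans_verb(X, paints(X)) --> [paints].
intrans_verb(X, dances(X)) --> [dances].

% Answers are shown up to the naming of the variable X.
% ?- phrase(sentence(S), [every, man, paints]).
% S = all(X, man(X) => paints(X)).
% ?- phrase(sentence(S), [a, woman, dances]).
% S = exists(X, woman(X) and dances(X)).
```

The determiner decides the shape of the whole formula. The noun supplies the property Prop and the verb supplies the assertion Assn, and all three share the same logical variable X.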

4.15.5

Interleaving syntax and semantics in DCG

% Word level
sentence --> word(W), rest_sent(W).
rest_sent(W) --> {last_word(W)}.
rest_sent(_) --> word(W), rest_sent(W).

% Character level
word(W) --> {single_char_word(W)}, [W].
word(W) --> {multiple_char_word(W)}, [W].

% Read a sentence and return a list of words.
sentence --> {get0(C)}, word(C,W,C1), rest_sent(C1,W).

% Given the next character and the previous word,
% read the rest of the sentence
rest_sent(_,W) --> {lastword(W)}. % empty
rest_sent(C,_) --> word(C,W,C1), rest_sent(C1,W).

word(C,W,C1) --> {single_character(C),!,name(W,[C]), get0(C1)}, [W]. % !,.:;?
word(C,W,C2) --> {in_word(C,Cp), get0(C1), rest_word(C1,Cs,C2), name(W,[Cp|Cs])},[W].
word(_,W,C2) --> {get0(C1)}, word(C1,W,C2). % consume blanks

% These words terminate a sentence.
lastword('.').
lastword('!').
lastword('?').

% This reads the rest of the word plus the next character.
rest_word(C,[Cp|Cs],C2) :- in_word(C,Cp), get0(C1), rest_word(C1,Cs,C2).
rest_word(C,[],C).

% These are single character words.
single_character(33). % !
single_character(44). % ,
single_character(46). % .
single_character(58). % :
single_character(59). % ;
single_character(63). % ?

% These characters can appear within a word.
in_word(C,C) :- C > 96, C < 123. % a,b,...,z
in_word(C,L) :- C > 64, C < 91, L is C + 32. % A,B,...,Z
in_word(C,C) :- C > 47, C < 58. % 0,1,...,9
in_word(39,39). % '
in_word(45,45). % -

a calculator!!

4.16

Incomplete Data Structures

Outline: Difference Lists, Dictionaries, Queue, QuickSort

An incomplete data structure is a data structure containing a variable. Such a data structure is said to be partially instantiated or incomplete. We illustrate programming with incomplete data structures by modifying the code for a binary search tree. The resulting code permits the relation inserted_in_is to define both the insertion and membership relations. The empty tree is represented as a variable while a partially instantiated tree is represented as a tuple.

create_tree(Niltree) :- var(Niltree). % Note: Nil is a variable

inserted_in_is(Item,btree(Item,_L_T,_R_T)).
inserted_in_is(Item,btree(ItemI,L_T,_R_T)) :- Item @< ItemI, inserted_in_is(Item,L_T).
inserted_in_is(Item,btree(ItemI,_L_T,R_T)) :- Item @> ItemI, inserted_in_is(Item,R_T).

% The cut keeps an unbound (empty) subtree from also matching the second clause.
inorder(Niltree,[ ]) :- var(Niltree), !.
inorder(btree(Item,L_T,R_T),Inorder) :- inorder(L_T,Left), inorder(R_T,Right), append(Left,[Item|Right],Inorder).
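The outline for this section also lists difference lists and queues, which are the best-known incomplete data structures. The following is a minimal sketch under the usual convention that a pair Front-Back stands for the elements of Front that precede the unbound tail Back. The predicate names are chosen for this example.

```prolog
% Sketch: difference lists as incomplete data structures.

% Appending two difference lists takes a single unification step.
append_dl(A-B, B-C, A-C).

% A queue as a difference list: enqueue at the back, dequeue at the front.
empty_queue(Q-Q).
enqueue(Item, Front-[Item|Back], Front-Back).
dequeue(Item, [Item|Front]-Back, Front-Back).

% Close a difference list into an ordinary list by binding its tail to [].
dl_to_list(List-[], List).

% ?- append_dl([a,b|T1]-T1, [c,d|T2]-T2, R), dl_to_list(R, L).
% L = [a, b, c, d].
% ?- empty_queue(Q0), enqueue(a, Q0, Q1), enqueue(b, Q1, Q2), dequeue(X, Q2, _).
% X = a.
```

Because the tail is a variable, appending and enqueueing do not traverse the list. The cost is that each difference list can be extended only once, since extending it binds the tail.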

4.17

Meta Level Programming

Meta-programs treat other programs as data. They analyze, transform, and simulate other programs. Prolog clauses may be passed as arguments, added to and deleted from the Prolog data base, and may be constructed and then executed by a Prolog program. Implementations may require that the functor and arity of the clause be previously declared dynamic.

Outline: Meta-logical Type Predicates; Assert/Retract; System Access

4.17.1

Meta-Logical Type Predicates

var(V) Tests whether V is a variable.
nonvar(NV) Tests whether NV is a non-variable term.
atom(A) Tests whether A is an atom (non-variable term of arity 0 other than a number).
integer(I) Tests whether I is an integer.
number(N) Tests whether N is a number.

4.17.2

Term Comparison

X = Y
X == Y
X =:= Y
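The three comparisons listed here are easy to confuse, so a small demonstration may help. X = Y is unification, X == Y is a test for identical terms that binds nothing, and X =:= Y evaluates both sides arithmetically and compares the values. The predicate name comparison_demo is invented for this sketch.

```prolog
% Sketch: the three comparisons behave differently on similar arguments.
comparison_demo :-
    X = f(A, b), X = f(a, B),     % unification binds A = a and B = b
    A == a, B == b,               % now identical to the atoms a and b
    \+ f(_, b) == f(a, b),        % not identical: one side has an unbound variable
    \+ 1 + 2 == 3,                % the term 1+2 is not the term 3
    1 + 2 =:= 3,                  % but both sides evaluate to 3
    \+ 1 + 2 = 3.                 % and the two terms do not unify

% ?- comparison_demo.
% true.
```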

4.17.3

The Meta-Variable Facility

call(X) this


4.17.4

Assert/Retract

Here is an example illustrating how clauses may be added to and deleted from the Prolog data base. The example shows how to simulate an assignment statement by using assert and retract to modify the association between a variable and a value.

:- dynamic x/1. % this may be required in some Prologs
x(0). % An initial value is required in this example

assign(X,V) :- Old =.. [X,_], retract(Old), New =.. [X,V], assert(New).

Here is an example using the assign predicate.

?- x(N).
N = 0
yes
?- assign(x,5).
yes
?- x(N).
N = 5

Here are three programs illustrating Prolog's meta-programming capability. The first program is a simple interpreter for pure Prolog programs.

% Meta Interpreter for pure Prolog
prove(true).
prove((A,B)) :- prove(A), prove(B).
prove(A) :- clause(A,B), prove(B).

Here is an execution of an append using the interpreter.

?- prove(append([a,b,c],[d,e],F)).
F = [a,b,c,d,e]

The result is no different from what the usual run-time system produces. The second program is a modification of the interpreter: in addition to interpreting pure Prolog programs, it returns the sequence of deductions required to satisfy the query.

% Proofs for pure Prolog programs
proof(true,true).
proof((A,B),(ProofA,ProofB)) :- proof(A,ProofA), proof(B,ProofB).
proof(A,(A:-Proof)) :- clause(A,B), proof(B,Proof).

Here is a proof of an append.

?- proof(append([a,b,c],[d,e],F),Proof).
F = [a,b,c,d,e]
Proof = (append([a,b,c],[d,e],[a,b,c,d,e]) :-
         (append([b,c],[d,e],[b,c,d,e]) :-
          (append([c],[d,e],[c,d,e]) :-
           (append([ ],[d,e],[d,e]) :- true))))

The third program is also a modification of the interpreter. In addition to interpreting pure Prolog programs, it provides a trace facility. It prints each goal twice, once before and once after satisfying the goal, so that the programmer can see the parameters before and after the satisfaction of the goal.

% Trace facility for pure Prolog
trace(true).
trace((A,B)) :- trace(A), trace(B).
trace(A) :- clause(A,B), downprint(A), trace(B), upprint(A).

downprint(G) :- write('>'), write(G), nl.
upprint(G) :- write('<'), write(G), nl.

Here is a trace of an append.

?- trace(append([a,b,c],[d,e],F)).
>append([a,b,c],[d,e],[a|_1427104])
>append([b,c],[d,e],[b|_1429384])
>append([c],[d,e],[c|_1431664])
>append([ ],[d,e],[d,e])
<append([ ],[d,e],[d,e])
<append([c],[d,e],[c,d,e])
<append([b,c],[d,e],[b,c,d,e])
<append([a,b,c],[d,e],[a,b,c,d,e])
F = [a,b,c,d,e]

Predicates for program manipulation

consult(file name)
var(term), nonvar(term), atom(term), integer(term), atomic(term)
functor(Term,Functor,Arity), arg(N,Term,N-th arg), Term =.. List
call(Term)
clause(Head,Body), assertz(Clause), retract(Clause)

4.18

Second-Order Programming

Outline: Setof, Bagof, Findall; Other second-order predicates; Applications

4.18.1

Setof, Bagof and Findall

4.18.2

Other second-order predicates

has_property, map_list, filter, foldr, etc.

Variable predicate names. A clause such as

p(P,X,Y) :- P(X,Y).

is rejected by most Prologs because a variable may not appear in the functor position. The same effect is obtained by constructing the goal and calling it:

p(P,X,Y) :- R =.. [P,X,Y], call(R).

For the following functions let S be the list [S1, ..., Sn].

1. The function map where map(f,S) is [f(S1), ..., f(Sn)].
2. The function filter where filter(P,S) is the list of elements of S that satisfy the predicate P.
3. The function foldl where foldl(Op,In,S) folds up S, using the given binary operator Op and start value In, in a left associative way, i.e., foldl(op,r,[a,b,c]) = (((r op a) op b) op c).
4. The function foldr where foldr(Op,In,S) folds up S, using the given binary operator Op and start value In, in a right associative way, i.e., foldr(op,r,[a,b,c]) = a op (b op (c op r)).
5. The function map2 is similar to map, but takes a function of two arguments and maps it along two argument lists.
6. The function scan where scan(op,r,S) applies foldl(op,r) to every initial segment of S. For example, scan((+),0,S) computes running sums.
7. The function dropwhile where dropwhile(P,S) returns the suffix of S that remains after removing the longest prefix whose elements all satisfy the predicate P.
8. The function takewhile where takewhile(P,S) returns the list of initial elements of S which satisfy P.
9. The function until where until(P,F,V) returns the result of applying the function F to the value V the smallest number of times necessary to satisfy the predicate P. Example: until (>1000) (2*) 1 = 1024.
10. The function iterate where iterate(f,x) returns the infinite list [x, f x, f(f x), ...].
11. Use the function foldr to define the functions sum, product and reverse.
12. Write a generic sort program; it should take a comparison function as a parameter.
13. Write a generic transitive closure program; it should take a binary relation as a parameter.
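Several of the functions in the list above translate directly into Prolog relations once the function argument is applied with call/N. The following is a sketch, assuming a Prolog that supports call with extra arguments. The names map_list/3, filter/3, foldr/4 and the helper relations are illustrative and are not library predicates.

```prolog
% Sketch of map, filter and foldr as Prolog relations, using call/N.

% map_list(P, Xs, Ys): Ys are the images of Xs under the binary relation P.
map_list(_, [], []).
map_list(P, [X|Xs], [Y|Ys]) :- call(P, X, Y), map_list(P, Xs, Ys).

% filter(P, Xs, Ys): Ys are the elements of Xs that satisfy the test P.
filter(_, [], []).
filter(P, [X|Xs], Ys) :-
    (   call(P, X) -> Ys = [X|Ys1] ; Ys = Ys1 ),
    filter(P, Xs, Ys1).

% foldr(Op, In, Xs, Out): Out = X1 op (X2 op (... (Xn op In))).
foldr(_, In, [], In).
foldr(Op, In, [X|Xs], Out) :- foldr(Op, In, Xs, Acc), call(Op, X, Acc, Out).

% Helper relations used in the queries below.
double(X, Y) :- Y is 2 * X.
even(X) :- 0 is X mod 2.
add(X, Y, Z) :- Z is X + Y.
snoc(X, Acc, Out) :- append(Acc, [X], Out).

% ?- map_list(double, [1,2,3], Ys).   Ys = [2, 4, 6].
% ?- filter(even, [1,2,3,4], Ys).     Ys = [2, 4].
% ?- foldr(add, 0, [1,2,3], S).       S = 6.
% ?- foldr(snoc, [], [a,b,c], R).     R = [c, b, a].
```

The last two queries indicate how sum and reverse arise from foldr by choosing the operator and the start value.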

4.18.3

Applications

Generalized sort, transitive closure ...

transitive_closure(Relation,Item1,Item2) :- Predicate =.. [Relation,Item1,Item2], call(Predicate).
transitive_closure(Relation,Item1,Item2) :- Predicate =.. [Relation,Item1,Link], call(Predicate), transitive_closure(Relation,Link,Item2).

4.19

Database Programming

Outline: Simple Family Database; Recursive Rules; Logic Programming and the Relational Database Model (relational algebra)

Objective: Logic Programming as Database Programming

4.19.1

Simple Databases

Basic predicates: father/2, mother/2, male/1, female/1.

father(Father,Child).
mother(Mother,Child).
male(Person).
female(Person).
son(Son,Parent).
daughter(Daughter,Parent).
parent(Parent,Child).
grandparent(Grandparent,Grandchild).

Question: Which should be facts and which should be rules? Example: if parent, male and female are facts then father and mother could be rules.

father(Parent,Child) :- parent(Parent,Child), male(Parent).
mother(Parent,Child) :- parent(Parent,Child), female(Parent).

Some other relations that could be defined are:

mother(Woman) :- mother(Woman,_Child).
parents(Father,Mother) :- father(Father,Child), mother(Mother,Child).
brother(Brother,Sibling) :- parent(P,Brother), parent(P,Sibling), male(Brother), Brother \== Sibling.
uncle(Uncle,Person) :- brother(Uncle,Parent), parent(Parent,Person).
sibling(Sib1,Sib2) :- parent(P,Sib1), parent(P,Sib2), Sib1 \== Sib2.
cousin(Cousin1,Cousin2) :- parent(P1,Cousin1), parent(P2,Cousin2), sibling(P1,P2).

What about: sister, niece, full sibling, mother in law, etc.
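One possible answer to the facts-or-rules question is sketched below: parent, male and female are stored as facts and everything else is derived. The people are invented for the example, and uncle is defined through sibling here rather than through brother to keep the block short.

```prolog
% Sketch: a tiny family database.
% Facts: parent/2, male/1, female/1.
parent(tom, ann).
parent(tom, bob).
parent(sue, ann).
parent(sue, bob).
parent(ann, joe).
male(tom).
male(bob).
male(joe).
female(sue).
female(ann).

% Rules derived from the facts.
father(P, C) :- parent(P, C), male(P).
mother(P, C) :- parent(P, C), female(P).
sibling(A, B) :- parent(P, A), parent(P, B), A \== B.
uncle(U, X) :- sibling(U, P), male(U), parent(P, X).

% ?- father(F, ann).                 F = tom.
% ?- setof(S, sibling(ann, S), L).   L = [bob].
% ?- uncle(U, joe).                  U = bob.
```

The sibling query is wrapped in setof because bob is found once through each shared parent. Note that \== compares terms, which is what is wanted for names; the arithmetic test =\= would raise an error on atoms.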

4.19.2

Recursive Rules
ancestor(Ancestor,Descendent) :- parent(Ancestor,Descendent).
ancestor(Ancestor,Descendent) :- parent(Ancestor,Person), ancestor(Person,Descendent).

The ancestor relation is an example of the more general relation of transitive closure. Here is an example of the transitive closure for graphs.

Transitive closure: connected

edge(Node1,Node2).
...
connected(Node1,Node2) :- edge(Node1,Node2).
connected(Node1,Node2) :- edge(Node1,Link), connected(Link,Node2).

4.19.3

Logic programs and the relational database model

The mathematical concept underlying the relational database model is the set-theoretic relation, which is a subset of the Cartesian product of a list of domains. A domain is a set of values. A relation is any subset of the Cartesian product of one or more domains. The members of a relation are called tuples. In relational databases, a relation is viewed as a table. The Prolog view of a relation is that of a set of named tuples. For example, in Prolog form, here are some unexpected entries in a city-state-population relation.

city_state_population('San Diego','Texas',4490).
city_state_population('Miami','Oklahoma',13880).
city_state_population('Pittsburg','Iowa',509).

In addition to defining relations as a set of tuples, a relational database management system (DBMS) permits new relations to be defined via a query language. In Prolog form this means defining a rule. For example, the sub-relation consisting of those entries where the population is less than 1000 can be defined as follows:

smalltown(Town,State,Pop) :- city_state_population(Town,State,Pop), Pop < 1000.

One of the query languages for relational databases is the Relational Algebra. Its operations are union, set difference, Cartesian product, projection, and selection. They may be defined for two relations r and s as follows.

% Union of relations r/n and s/n
r_union_s(X1,...,Xn) :- r(X1,...,Xn).
r_union_s(X1,...,Xn) :- s(X1,...,Xn).

% Set difference r/n \ s/n: the tuples of r that are not in s
r_diff_s(X1,...,Xn) :- r(X1,...,Xn), not s(X1,...,Xn).

% Cartesian product r/m, s/n
r_x_s(X1,...,Xm,Y1,...,Yn) :- r(X1,...,Xm), s(Y1,...,Yn).

% Projection onto attributes i and j
r_p_i_j(Xi,Xj) :- r(X1,...,Xn).

% Selection by the condition c
r_c(X1,...,Xn) :- r(X1,...,Xn), c(X1,...,Xn).

% Meet (intersection)
r_m_s(X1,...,Xn) :- r(X1,...,Xn), s(X1,...,Xn).

% Join on a shared attribute Z
r_j_s(X1,...,Xj,Z,Y1,...,Yk) :- r(X1,...,Xj,Z), s(Z,Y1,...,Yk).
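The schematic definitions above become runnable once the arity is fixed. The sketch below instantiates them for two binary relations; the relations r/2 and s/2, their contents, and the selection condition are all made up for the example.

```prolog
% Sketch: relational algebra over two small binary relations.
r(1, a).
r(2, b).
r(3, c).
s(2, b).
s(3, c).
s(4, d).

r_union_s(X, Y) :- r(X, Y).
r_union_s(X, Y) :- s(X, Y).

r_diff_s(X, Y) :- r(X, Y), \+ s(X, Y).            % in r but not in s
r_meet_s(X, Y) :- r(X, Y), s(X, Y).               % intersection
r_x_s(X1, Y1, X2, Y2) :- r(X1, Y1), s(X2, Y2).    % Cartesian product
r_proj_1(X) :- r(X, _).                           % projection on column 1
r_sel(X, Y) :- r(X, Y), X > 1.                    % selection: column 1 > 1
r_join_s(X, Y, Z) :- r(X, Y), s(Z, Y).            % join on the second column

% setof/3 removes duplicates, which gives set semantics.
% ?- setof(X-Y, r_union_s(X, Y), U).   U = [1-a, 2-b, 3-c, 4-d].
% ?- setof(X-Y, r_diff_s(X, Y), D).    D = [1-a].
% ?- findall(t, r_x_s(_, _, _, _), P), length(P, N).   N = 9.
```

Plain Prolog returns one answer per proof, so the union rule yields the tuple (2, b) twice; collecting answers with setof restores the set view of a relation.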

The difference between Prolog and a Relational DBMS is that in Prolog the relations are stored in main memory along with the program, whereas in a Relational DBMS the relations are stored in files and the program extracts the information from the files.

4.20

Expert systems

Expert systems may be programmed in one of two ways in Prolog. One is to construct a knowledge base using Prolog facts and rules and use the built-in inference engine to answer queries. The other is to build a more powerful inference engine in Prolog and use it to implement an expert system.

Pattern matching: Symbolic differentiation

4.21

Object-Oriented Programming

object( Object, Methods )

/******************************************************************************
OOP
******************************************************************************/

/*=============================================================================
Interpreter for OOP
=============================================================================*/
send( Object, Message ) :- get_methods( Object, Methods ), process( Message, Methods ).

get_methods( Object, Methods ) :- object( Object, Methods ).
get_methods( Object, Methods ) :- isa( Object, SuperObject ), get_methods( SuperObject, Methods ).

process( Message, [Message|_] ).
process( Message, [(Message :- Body)|_] ) :- call( Body ).
process( Message, [_|Methods] ) :- process( Message, Methods ).

/*=============================================================================
Geometric Shapes
=============================================================================*/
% sum/2 and makelist/3 are helper predicates that must be defined separately.
object( polygon( Sides ), [ (perimeter( P ) :- sum( Sides, P )) ] ).

object( reg_polygon( Side, N ), [ (perimeter( P ) :- P is N*Side), (describe :- write('Regular polygon')) ] ).

object( rectangle( Length, Width ), [ (area( A ) :- A is Length * Width ), (describe :- write('Rectangle of size '), write( Length*Width)) ] ).

object( square( Side ), [ (describe :- write('Square with side '), write( Side )) ] ).

object( pentagon( _Side ), [ (describe :- write('Pentagon')) ] ).

isa( square( Side ), rectangle( Side, Side ) ).
isa( square( Side ), reg_polygon( Side, 4 ) ).
isa( rectangle( Length, Width ), polygon([Length, Width, Length, Width]) ).
isa( pentagon( Side ), reg_polygon( Side, 5 ) ).
isa( reg_polygon( Side, N ), polygon( L ) ) :- makelist( Side, N, L ).

4.22

Appendix

The entries in this appendix have the form: pred/n definition, where pred is the name of the built-in predicate, n is its arity (the number of arguments it takes), and definition is a short explanation of the function of the predicate.

ARITHMETIC EXPRESSIONS
+, -, *, /, sin, cos, tan, atan, sqrt, pow, exp, log

I/O
see/1 the current input stream becomes arg1
seeing/1 arg1 unifies with the name of the current input stream
seen/0 close the current input stream
tell/1 the current output stream becomes arg1
telling/1 arg1 unifies with the name of the current output stream
told/0 close the current output stream
read/1 arg1 is unified with the next term delimited with a period from the current input stream
get/1 arg1 is unified with the ASCII code of the next printable character in the current input stream
write/1 arg1 is written to the current output stream
writeq/1 arg1 is written to the current output stream so that it can be read with read
nl/0 an end-of-line character is written to the current output stream
spaces/1 arg1 number of spaces is written to the current output stream

PROGRAM STATE
listing/0 all the clauses in the Prolog data base are written to the current output stream
listing/1 all the clauses in the Prolog data base whose functor name is equal to arg1 are written to the current output stream
clause/2 clause(H,B) succeeds if H is a fact or the head of some rule in the data base and B is its body (true in case H is a fact)

PROGRAM MANIPULATION
consult/1 the file with name arg1 is consulted (loaded into the Prolog data base)
reconsult/1 the file with name arg1 is reconsulted
assert/1 arg1 is interpreted as a clause and is added to the Prolog data base (functor must be dynamic)
retract/1 the first clause which is unifiable with arg1 is retracted from the Prolog data base (functor must be dynamic)

META-LOGICAL
ground/1 succeeds if arg1 is completely instantiated (BIM)
functor/3 succeeds if arg1 is a term, arg2 is the functor, and arg3 is the arity of the term
=../2 T =.. L succeeds if T is a term and L is a list whose head is the principal functor of T and whose tail is the list of the arguments of T
name/2 succeeds if arg1 is an atom and arg2 is a list of the ASCII codes of the characters comprising the name of arg1
call/1 succeeds if arg1, executed as a goal, succeeds
setof/3 arg3 is a set (list) of all instances of arg1 for which arg2 holds. Arg2 may be of the form X^T where X is an unbound variable in T other than arg1.
bagof/3 arg3 is a list of all instances of arg1 for which arg2 holds. See setof.
\+/1 succeeds if arg1 is not provable (required instead of not in some Prologs if arg1 contains variables)
not/1 same as \+ but may require arg1 to be completely instantiated

SYSTEM CONTROL
halt/0, C-d exit from Prolog

DIRECTIVES
:- dynamic pred/n. the predicate pred of arity n is dynamic

4.23

References

Clocksin & Mellish, Programming in Prolog, 4th ed. Springer-Verlag, 1994.
Hill, P. & Lloyd, J. W., The Gödel Programming Language. MIT Press, 1994.
Hogger, C. J., Introduction to Logic Programming. Academic Press, 1984.
Lloyd, J. W., Foundations of Logic Programming, 2nd ed. Springer-Verlag, 1987.
Nerode, A. & Shore, R. A., Logic for Applications. Springer-Verlag, 1993.
Robinson, J. A., Logic: Form and Function. North-Holland, 1979.
Sterling and Shapiro, The Art of Prolog. MIT Press, Cambridge, Mass., 1986.


Chapter 5

Datalog and Logic-Based Databases


Related to: relational calculus, logic, expert systems, automated reasoning
Requisite for: relational calculus, logic, expert systems, automated reasoning
Prerequisites:
Additional Resources:
Datalog versions of several databases from Ullman and Widom.
Relational database design tools for normalization. These tools are written in Prolog.
Datalog example.
Prolog Tutorial.
Logic Programming.

5.1

Introduction

Datalog (a subset of Prolog - PROgramming in LOGic) is a language of facts and rules. It is a logic-based query language for the relational model. The basic concepts are:

predicate (a relation)
term, constant, variable
goal
clause, rule, fact
substitution
unification

5.2

Datalog and the Relational Model

Example: The relation R

R (A  B)
   1  2
   2  4

may be represented in Datalog as a collection of facts:

r(1,2).
r(2,4).

where relation names must be atoms and the elements in a relation must be terms. An atom is a string of letters, digits, and underscores that begins with a lowercase letter. A term can be:

a constant symbol: atoms (names), numbers (integers, floats)
...

The facts in a Datalog database are called extensional predicates (defined by their extension) in contrast to relations which are computed by applying one or more Datalog rules. The abbreviation EDB, standing for extensional database, is used to refer to the extensional predicates or relations. The computed relations are called intensional predicates (defined by the programmer's intent). The abbreviation IDB stands for intensional database and is used to refer to the computed predicates or relations.

Aside. Prolog permits terms to be complex objects.

5.3

Relational Algebra and Datalog

Example: given the schema

Movie(Title, Year, Length, InColor, StudioName, Producer)

with appropriate instances represented as facts in the Datalog database, the rule

longMovie(T,Y) :- movie(T,Y,L,_,_,_), L >= 100.

defines a set of long movies as those that are at least 100 minutes long.

A clause has the form:

Concept      Syntax                 Semantics
Goal         P( T1, ..., Tn ).      Predicate P is true of terms T1 and ... and Tn
Query        ?- A1, ... , Ak.       Succeeds if A1 and ... and Ak are true
Implication  A0 :- A1, ... , Ak.    Read as: A0 if A1 and ... and Ak

A term can be:

constant symbol: atoms (names), numbers (integers, floats)
(logical) variable symbol
the anonymous variable symbol (the underscore)

Logical variables start with an upper-case letter, atoms with a lower-case letter, and numbers with a digit. A logical variable:

is untyped
can be instantiated by substituting it by another term
cannot be assigned

Instantiation occurs by unification:

through pattern matching
appearing in an equality expression

The anonymous variable (the underscore) matches or unifies with anything and is used when it does not matter what the variable matches.

Aside. Prolog rules may be recursive.
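The longMovie rule above can be tried against a handful of facts. The sketch below does so; the three movie rows are invented placeholder data, not entries from any real database, and the producer column is filled with arbitrary numbers.

```prolog
% Sketch: the Movie schema as facts, with the longMovie rule from the text.
% movie(Title, Year, Length, InColor, StudioName, Producer)
movie(alpha, 1990, 124, true,  studio_a, 101).
movie(beta,  1991,  95, true,  studio_b, 102).
movie(gamma, 1939, 100, false, studio_a, 103).

% Long movies are those at least 100 minutes long.
longMovie(T, Y) :- movie(T, Y, L, _, _, _), L >= 100.

% ?- longMovie(T, Y).
% T = alpha, Y = 1990 ;
% T = gamma, Y = 1939.
```

The three anonymous variables in the body ignore the attributes that the rule does not need, which is how a rule performs projection.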

5.3.1

General Form: a Datalog Rule

General form

definedRelation(As) :- r1(R1As), ..., rn(RnAs), selectionConditions(R1As, ..., RnAs).

The conditions on the attributes are relational expressions between numerical values: A = B, A <> B, A < B, A <= B, A > B, A >= B. (In Prolog the corresponding arithmetic comparisons are written =:=, =\=, <, =<, >, >=.) Equality may also be used between atoms. Arithmetic expressions are formed using: +, -, *, /, and (). Numerical expressions are evaluated using is: Value is Expression.


5.3.2

Operations and Equivalents

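Selection:
selectR(As) :- r(As), selectionConditions(As).

Projection:
projR(SubsetOfTheAs) :- r(As).

Union, as one rule:
unionR1R2(As) :- r1(As); r2(As).
or as two:
unionR1R2(As) :- r1(As).
unionR1R2(As) :- r2(As).

Difference:
diffR1R2(As) :- r1(As), \+ r2(As).

Cartesian Product:
prodR1R2(As, Bs) :- r1(As), r2(Bs).

Intersection:
intersectionR1R2(As) :- r1(As), r2(As).

Divide (the As that r1 pairs with every Bs in r2):
candidateR1(As) :- r1(As, _).
missingR1R2(As) :- candidateR1(As), r2(Bs), \+ r1(As, Bs).
divideR1R2(As) :- candidateR1(As), \+ missingR1R2(As).

Join (attributes shared by R1As and R2As are written with the same variables):
joinR1R2(UnionOfTheAs) :- r1(R1As), r2(R2As).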
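Division is the least obvious of these operations because "paired with every tuple" has to be expressed through double negation. The sketch below works one concrete case; the relations takes/2 and required/1 and their contents are hypothetical, invented for this example.

```prolog
% Sketch: relational division in Datalog style, using negation as failure.
takes(ann, db).
takes(ann, logic).
takes(ann, ai).
takes(bob, db).
takes(cho, logic).
takes(cho, db).

required(db).
required(logic).

student(S) :- takes(S, _).

% S is missing a course if some required course is not taken by S.
missing(S) :- student(S), required(C), \+ takes(S, C).

% takes divided by required: the students who take every required course.
takes_all(S) :- student(S), \+ missing(S).

% ?- setof(S, takes_all(S), L).
% L = [ann, cho].
```

Each negated goal is called only after its variables have been bound by student/1 and required/1, which keeps negation as failure safe here.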

5.3.3

Bottom-up query evaluation (Coral)

Begin with the facts and systematically apply rules until the goal is reached. Reasoning is based on modus ponens, i.e.

Given: p, if p then q
Conclude: q
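The idea of applying rules to known facts until nothing new appears can be sketched in Prolog itself. What follows is a naive evaluation loop written for illustration; the representation (fact/1, rule/2, derived/1) is an assumption of the sketch and is not the method used by Coral.

```prolog
% Sketch of naive bottom-up evaluation: apply rules to known facts until
% no new fact can be derived.
:- dynamic derived/1.

fact(parent(a, b)).
fact(parent(b, c)).

% rule(Head, Body): Head holds if every goal in the list Body holds.
rule(ancestor(X, Y), [parent(X, Y)]).
rule(ancestor(X, Z), [parent(X, Y), ancestor(Y, Z)]).

holds(F) :- fact(F).
holds(F) :- derived(F).

all_hold([]).
all_hold([G|Gs]) :- holds(G), all_hold(Gs).

% One modus ponens step: find a rule whose body holds and whose head is new.
step :- rule(Head, Body), all_hold(Body), \+ holds(Head), !, assertz(derived(Head)).

% Repeat steps until a fixpoint is reached.
saturate :- step, !, saturate.
saturate.

% ?- saturate, findall(F, derived(F), Fs).
% Fs = [ancestor(a, b), ancestor(b, c), ancestor(a, c)].
```

The loop terminates because Datalog has no function symbols, so only finitely many facts can ever be derived. Practical systems avoid rederiving old facts on every pass, which this sketch does not attempt.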

5.3.4

Top-down query evaluation (Prolog)

Begin with a goal and use the rules to reason back toward the facts. Reasoning is based on modus tollens, i.e.

To establish q, assume not q
Given: not q, if p then q
Conclude: not p.

Evaluation of a ground goal A succeeds:

1. immediately if there is a fact A in the program.
2. if there is a clause (rule) A0 :- A1, ... , Ak in the program and a ground substitution h such that h(A0) = A and the evaluations of all of h(A1), ... , h(Ak) succeed.

The evaluation of a non-ground goal A succeeds if A unifies with a fact or a rule head (after renaming), the substitutions propagate to the body of the rule, and the evaluation of the body of the rule succeeds.

Two goals unify if there is a substitution that maps both to the same goal. A substitution is a most general unifier (mgu) of two goals if all other unifying substitutions can be obtained from it by composition.

If there is a failure then backtracking occurs with the substitutions undone. Infinite looping may occur.

5.3.5

Negation and the Open and Closed World Assumptions


Closed World Assumption (CWA): What is not implied by a program is false.
Open World Assumption (OWA): What is not implied by a program is unknown.

Traditional database applications use the CWA. Negated goals may be represented as \+ G or not G. Negation causes problems:

Not every program has a clear logical meaning (due to the interaction of negation with recursion).
Bottom-up evaluation does not always produce an intuitive result.

Example:

p(X) :- not q(X).
q(X) :- not p(X).

5.4

Recursive Programming in Datalog

Simple recursion: $A_0$ :- $A_1, \ldots, A_k$. where some $A_i$ is $A_0$. If $A_k$ is $A_0$ then the recursion is tail recursion, which is equivalent to iteration. If $A_1$ is $A_0$ then the rule is left recursive. In Prolog, left recursion results in an infinite loop unless there is some change in the parameters.

Mutual recursion: There are two or more rules $A^0_0$ :- $A^0_1, \ldots, A^0_k$., ..., $A^j_0$ :- $A^j_1, \ldots, A^j_l$. where one of the $A^j_i$, $i \in [1..l]$, is $A^0_0$. Mutual recursion may be detected by constructing a dependency graph whose nodes correspond to the relations (predicates). If a cycle occurs, then the nodes are mutually recursive.

5.5

Pure Prolog

Terms are extended to include compound terms.

Lists
[] - the empty list
[X] - a list of one element
[Head|Tail] - head and tail of a list. Note: [X] = [X|[]]
[X, Y|Tail] - the first two elements and the tail of a list.

General Terms
atom or number
List
atom(comma separated list of terms); Example: name(Last,First,MI)

In Prolog, a fact may have the form:

student(name(Last,First,MI), SSNo, address(Number, Street, City, State, Zip)).

5.6

Higher-order Predicates in Prolog


setof(X, P, Set) - collects all distinct values of X for which the goal P holds into the sorted list Set.
bagof(X, P, Bag) - collects all values of X for which the goal P holds into the list Bag.

These higher-order predicates may be used to collect values so that counts, sums, and averages may be computed. If the Prolog implementation does not provide built-in predicates for these aggregates, the following may be used.

sum(List, Sum) :- sum(List, 0, Sum).
sum([], Sum, Sum).
sum([H|T], PS, Sum) :- PS1 is PS+H, sum(T, PS1, Sum).

count(Bag, Size) :- count(Bag, 0, Size).
count([], Size, Size).
count([_|T], SP, Size) :- SP1 is SP+1, count(T, SP1, Size).

average(L, Ave) :- average(L, 0, 0, Ave).
average([], Count, Sum, Ave) :- Count > 0, Ave is Sum/Count.
average([H|T], PCount, PSum, Ave) :- Sum is PSum+H, Count is PCount+1, average(T, Count, Sum, Ave).
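The difference between the collecting predicates shows up as soon as a value occurs more than once. The sketch below uses a hypothetical relation score/2 invented for the example, together with findall/3, which most Prologs also provide.

```prolog
% Sketch: setof/3, bagof/3 and findall/3 on a small relation.
score(ann, 90).
score(bob, 75).
score(cho, 90).

% bagof keeps duplicates, in the order the solutions are found.
% N^ says that N is not to be grouped on.
all_scores(Bag) :- bagof(S, N^score(N, S), Bag).

% setof sorts the solutions and removes duplicates.
distinct_scores(Set) :- setof(S, N^score(N, S), Set).

% findall never groups on free variables, and it returns []
% instead of failing when there are no solutions.
scores_above(Min, L) :- findall(S, (score(_, S), S > Min), L).

% ?- all_scores(B).          B = [90, 75, 90].
% ?- distinct_scores(S).     S = [75, 90].
% ?- scores_above(100, L).   L = [].
% ?- all_scores(B), length(B, Count).   Count = 3.
```

For counts, sums and averages the bag is normally the right input, since the set would drop the repeated value 90.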


5.7

Logic and Prolog

least Herbrand model, substitutions, quantification, fixpoint semantics

5.8

Exercises
1. Construct a family database (yours or some imaginary family) with gender and parentOf facts and rules for fatherOf, motherOf, ancestorOf, siblingOf, and other common family relationships including a relatedTo relation.
2. Construct an airline flight schedule relation flights(Airline, From, To, Departs, Arrives) and construct a collection of rules that, given departure and destination airports, returns a flight itinerary.
3. Reimplement your database project (without the GUI interface) in Datalog.

Part III

The Relational Database Model


Chapter 6

IM4 Relational Databases


Suggested time: 8 hours

Topics:
Mapping conceptual schema to a relational schema
Entity and referential integrity
Relational algebra and relational calculus

Learning objectives:
1. Prepare a relational schema from a conceptual model developed using the entity-relationship model.
2. Explain and demonstrate the concepts of entity integrity constraint and referential integrity constraint (including definition of the concept of a foreign key).
3. Demonstrate use of the relational algebra operations from mathematical set theory (union, intersection, difference, and cartesian product) and the relational algebra operations developed specifically for relational databases (select, product, join, and division).
4. Demonstrate queries in the relational algebra.
5. Demonstrate queries in the tuple relational calculus.

6.1

Mapping conceptual schema to a relational schema


Connections

Related to:
Prerequisites: E/R modeling, Relational data model

ER Model                        Relational Model
Entity type                     Entity relation
1:1 or 1:N relationship type    Foreign key (or relationship relation)
M:N relationship type           Relationship relation and two foreign keys
n-ary relationship type         Relationship relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of simple component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary (or secondary) key

Figure 6.1: Correspondence between ER and Relational Models

Requisite for: E/R diagrams to relational schema

Regular entity sets - represent each entity set as a relation
For each regular (strong) entity set E in the ER schema, create a relation of the same name with the same set of attributes.

Weak entity sets - represent each entity set as a relation
For each weak entity type E in the ER schema, create a relation R that includes all the simple attributes of E, as with regular entity sets, but also include the key attributes of the other entity sets that help form the key of the weak entity set. Note: relations for the relationships connecting a weak entity set to those entity sets do not need to be created, as their attributes are a subset of the attributes of the relation created for the weak entity set.

Relationships - also represent each relationship as a relation
For each relationship type R in the ER schema, create a relation of the same name with the key attributes of each entity set participating in R as attributes of the relation. If the relationship R has attributes, include them as attributes of the relation.

6.2

Entity and referential integrity

Representation of relationships

Entity Integrity: If an attribute A of relation R is a prime attribute of R, then A cannot accept null values.

Referential Integrity: Given two relations R and S, suppose R refers to the relation S via a set of attributes that forms the primary key of S and this set of attributes forms a foreign key in R. Then the value of the foreign key in a tuple in R must either be equal to the primary key of a tuple of S or be entirely null.

Constraints (Integrity rules)

Domain Constraints
  attribute values must be atomic (1st normal form assumption)
  data types (standard numeric types, characters, fixed-length strings, variable-length strings, date, time, timestamp, money, other)
Key Constraint - see functional dependencies
Not Null Constraints
Entity Integrity - A prime attribute (key) of a relation cannot be null.
Referential Integrity - a foreign key value must match the primary key of an existing tuple in the referenced relation, or be null.
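When relations are stored as facts, the two integrity rules above can be checked by queries that search for violating tuples. The following is a sketch in Prolog; the relations dept/2 and emp/3, their contents, and the use of the atom null to stand for a null value are assumptions made for this example.

```prolog
% Sketch: checking entity and referential integrity over relations stored as facts.
% dept(DeptNo, Name): DeptNo is the primary key.
dept(10, sales).
dept(20, research).

% emp(EmpNo, Name, DeptNo): EmpNo is the primary key, DeptNo a foreign key to dept.
emp(1, ann, 10).
emp(2, bob, null).   % allowed: the foreign key is entirely null
emp(3, cho, 30).     % violation: no dept tuple has key 30

% Entity integrity is violated by a null primary key.
entity_violation(emp(null, N, D)) :- emp(null, N, D).
entity_violation(dept(null, N)) :- dept(null, N).

% Referential integrity is violated by a non-null foreign key
% that matches no primary key in the referenced relation.
referential_violation(emp(E, N, D)) :-
    emp(E, N, D),
    D \== null,
    \+ dept(D, _).

% ?- referential_violation(V).   V = emp(3, cho, 30).
% ?- entity_violation(V).        false.
```

A DBMS enforces these rules when tuples are inserted or updated; the sketch only detects violations after the fact.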

6.3

Relational algebra and relational calculus

6.3.1

The Relational Algebra

The relational algebra is a notation for representing the types of operations which can be performed on relational databases. It is used in a RDBMS as the intermediate language for query optimization. Thus an understanding of it is useful for database implementation and for database tuning. A relation is a set of k-tuples, for some k, called the arity of the relation. In general, names are given to the components of the tuple (a tuple corresponds to a record - Pascal or structure - C with elds corresponding to the names of the components). Note: this denition implies that each tuple is unique. Each relation is described by a schema which consists of a relation name and a list of attribute names - relation-name(attribute-list). R(A1 , ..., An ), R.Ai . A relational algebra is an algebraic language based on a small number of operators which operate on relations (tables). It is the intermediate language used by a RDBMS. Queries are expressed by applying special operators to relations. Eight operations were originally dened for relations. Each of these creates a new relation from an existing relation or set of relations. Basic Operators 1. Selection: Selects a subset of tuples from a particular relation, based upon a specied selection condition. The selection condition is a boolean expression formed from the names of the attributes of the relation and constants.

expression          ::= relation | monadic-expression | dyadic-expression
monadic-expression  ::= selection | projection | renaming
dyadic-expression   ::= expression dyadic-operation expression
selection           ::= $\sigma_{\text{selection-condition}}$ (relation name)
projection          ::= $\pi_{\text{attribute-list}}$ (relation name)
renaming            ::= $\rho_{\text{attribute-list}}$ (relation name)
dyadic-operation    ::= $\cup$ | $\cap$ | $-$ | $\times$ | $\div$ | $\bowtie_{\text{join-condition}}$
selection-condition ::= logical-condition | comparison

Figure 6.2: Relational Algebra

2. Projection: Drops columns from a relation, retaining only those in the attribute list.
3. Set union: Combines tuples of two relations with like attributes. Both relations must have the same number of columns. The names of the attributes are the same in both relations. Attributes with the same name in both relations have the same domain.
4. Set difference: Finds tuples in two relations with like attributes which are in the first relation but not the second.
5. Cartesian product: Creates a new relation from all concatenations of a tuple of the first relation with a tuple of the second. NOTE: this is the most computationally expensive operator in the relational algebra.
6. Renaming: The attribute names in the attribute list replace the attribute names of the relation.

Derived operators
1. Set intersection: Finds the common tuples in two relations with like attributes.
2. Divide: Takes two relations, with attributes $\{X_1, \ldots, X_N, Y_1, \ldots, Y_M\}$ and $\{Y_1, \ldots, Y_M\}$ respectively, and returns a relation with attributes $\{X_1, \ldots, X_N\}$ representing all the tuples in the first which match every tuple in the second relation.
3. Join: Creates a new relation from all combinations of tuples in two relations with some matching attributes: $R \bowtie_{\text{join-condition}} S = \sigma_{\text{join-condition}}(R \times S)$. While this operation has the potential to be computationally expensive (due to the cartesian product), the join-condition typically allows the operation to be relatively inexpensive. The join defined above is called a theta-join. Equijoins are joins where the join-condition only involves equalities.

Set Operations
For relations R and S of the same arity

$R \cup S$ is the set of tuples in R or in S.
$R \cap S$ is the set of tuples that are both in R and S.
$R - S$ is the set of tuples in R but not in S.
Where
1. R and S must have schemas with identical sets of attributes.
2. The order of the attributes must be the same.
These rules may require that columns be renamed and/or reordered.

Size Reductions
Projection The projection operator is used to produce from a relation R a new relation that has only some of R's columns. $\pi_{A_1, \ldots, A_n}(R)$ is the relation with only the columns of the attributes $A_1, \ldots, A_n$.
Selection The selection operator, applied to a relation R, produces a new relation with the subset of R's tuples that satisfy some condition C. $\sigma_C(R)$ denotes the operation, where C is a conditional expression on the attributes of R.

Combining Relations
Cartesian product The cartesian product of two sets R and S is the set of pairs of elements of the two sets: $R \times S = \{(r, s) \mid r \in R \text{ and } s \in S\}$
Natural Joins The natural join of two relations R and S, denoted $R \bowtie S$, is only those tuples of $R \times S$ that agree on some list of attributes. The natural join may be defined by
1. Compute $R \times S$.
2. For each attribute A that names both a column in R and a column in S, select from $R \times S$ those tuples whose values agree in the columns for R.A and S.A.
3. For each attribute A above, project out the column S.A and call the remaining column, R.A, simply A.
(Example: given employee(id, name) and salary(id, salary), the natural join is employee-salary(id, name, salary).)
Theta Joins The theta join of two relations R and S, denoted $R \bowtie_C S$, is only those tuples of $R \times S$ that satisfy the condition C.
1. Compute $R \times S$.
2. Select from the product only those tuples that satisfy the condition.

Renaming $\rho_{S(A_1, \ldots, A_n)}(R)$ is the same relation as R but its name is S with the attributes named $A_1, \ldots, A_n$.
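The operators above can be tried out on small in-memory relations. The following is a minimal sketch, not an RDBMS implementation: a relation is modelled as a Python list of dictionaries, the function names select, project, product and natural_join are illustrative choices, and the data for employee and salary are hypothetical. The natural join follows the three steps given above (product, selection, projection).

```python
def select(relation, condition):
    """sigma_C(R): keep the tuples for which condition(tuple) is true."""
    return [t for t in relation if condition(t)]

def project(relation, attributes):
    """pi_{A1..An}(R): keep only the named columns and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

def product(r, s, r_name="R", s_name="S"):
    """R x S: every concatenation of a tuple of R with a tuple of S.
    Column names are qualified (R.A, S.A) so that shared names do not clash."""
    return [
        {**{f"{r_name}.{a}": v for a, v in tr.items()},
         **{f"{s_name}.{a}": v for a, v in ts.items()}}
        for tr in r for ts in s
    ]

def natural_join(r, s):
    """Natural join built from the three steps: product, select, project."""
    if not r or not s:
        return []
    r_attrs, s_attrs = list(r[0]), list(s[0])
    common = [a for a in r_attrs if a in s_attrs]
    # Steps 1 and 2: product, then keep tuples agreeing on the common attributes.
    agreeing = select(
        product(r, s),
        lambda t: all(t[f"R.{a}"] == t[f"S.{a}"] for a in common),
    )
    # Step 3: drop S.A for each common A and strip the qualifiers.
    result = []
    for t in agreeing:
        row = {a: t[f"R.{a}"] for a in r_attrs}
        row.update({a: t[f"S.{a}"] for a in s_attrs if a not in common})
        if row not in result:
            result.append(row)
    return result

# Hypothetical instances of employee(id, name) and salary(id, salary)
employee = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
salary = [{"id": 1, "salary": 50000}, {"id": 3, "salary": 40000}]
employee_salary = natural_join(employee, salary)
```

Building the full product first mirrors the definition but is the expensive route noted earlier; a real system would avoid materializing it.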

6.4 The Tuple Relational Calculus and the Domain Relational Calculus

Connections
Related to: SQL, First-order logic
Prerequisites: Relational data model
Requisite for:

6.4.1 Introduction

Both the tuple relational calculus and the domain relational calculus are based on classical first-order logic. Queries have the form: P(X) = LogicalExpressionIn(X). The distinction between the two is whether X is a single variable ranging over tuples or a tuple of variables ranging over domain values.

6.4.2 The tuple relational calculus (TRC)

The SQL language is based on the tuple relational calculus which in turn is a subset of classical predicate logic. Queries in the TRC all have the form:

$\{\text{QueryTarget} \mid \text{QueryCondition}\}$

The QueryTarget is a tuple variable which ranges over tuples of values. The QueryCondition is a logical expression such that
  It uses the QueryTarget and possibly some other variables.
  If a concrete tuple of values is substituted for each occurrence of the QueryTarget in QueryCondition, the condition evaluates to a boolean value of true or false.
The result of a TRC query with respect to a database instance is the set of all choices of values for the query variable that make the query condition a true statement about the database instance. The relation between the TRC and logic is that the QueryCondition is a logical expression of classical first-order logic.


6.4.3 The domain relational calculus (DRC)

Queries in the DRC have the form:

$\{X_1, \ldots, X_n \mid \text{Condition}\}$

The $X_1, \ldots, X_n$ are a list of domain variables. The condition is a logical expression of classical first-order logic.
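Set comprehensions give a rough feel for the difference between the two calculi. The sketch below is only an analogy, assuming a hypothetical instance of employee(name, dept): in the TRC-style query one variable ranges over whole tuples, while in the DRC-style query the variables range over individual domain values.

```python
# Hypothetical instance of employee(name, dept)
employee = {("Ann", "Sales"), ("Bob", "IT"), ("Cy", "Sales")}

# TRC style: { t | t in employee and t.dept = 'Sales' }
# A single variable t ranges over whole tuples.
trc_result = {t for t in employee if t[1] == "Sales"}

# DRC style: { n | there exists d with employee(n, d) and d = 'Sales' }
# The variables n and d range over individual domain values.
drc_result = {n for (n, d) in employee if d == "Sales"}
```

In both cases the result is the set of all choices of values for the query variables that make the condition true of the instance.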


Chapter 7

IM5 Database Query Languages


Suggested time: 5 hours

Topics:
  Overview of database languages
  SQL
  Data definition language
  Data manipulation language
  Query optimization
  QBE and 4th-generation environments
  Embedding non-procedural queries in a procedural language
  Introduction to Object Query Language

Learning objectives:
1. Create a relational database schema in SQL that incorporates key, entity integrity, and referential integrity constraints.
2. Demonstrate data definition in SQL and retrieving information from a database using the SQL SELECT statement.
3. Evaluate a set of query processing strategies and select the optimal strategy.
4. Create a non-procedural query by filling in templates of relations to construct an example of the desired query result.
5. Embed object-oriented queries into a stand-alone language such as C++ or Java (e.g., SELECT Col.Method() FROM Object).

  Data definition
  Query formulation
  Update sublanguage
  Constraints
  Referential integrity
  Embedding in a procedural language

Structured Query Language (SQL, pronounced sequel)

Summary of SQL Syntax
While SQL keywords are case insensitive, we will use uppercase for SQL reserved words.

7.1 Data Definition Language (DDL)

7.1.1 Data types (predefined domains)

Character strings
  Fixed length: CHAR(n)
  Varying length: VARCHAR(n)
Bit strings
  Fixed length: BIT(n)
  Varying length: BIT VARYING(n)
Integer: SMALLINT, INT, INTEGER
Floating point: FLOAT, REAL, DOUBLE PRECISION
Fixed point: DECIMAL(n,d)
Date and time: DATE, TIME

7.1.2 User defined domains

User defined:
  CREATE DOMAIN domainName AS definition [ DEFAULT value ];
  ALTER DOMAIN domainName SET DEFAULT value;
  DROP DOMAIN domainName;

7.1.3 Table declarations and creation

The CREATE TABLE statement is used to define a new table.

  CREATE TABLE tableName (
      attributeName attributeType [ attributeConstraint ]
      {, attributeName attributeType [ attributeConstraint ] }
      [, tableConstraint {, tableConstraint } ]
  );

The constraints include

Attribute constraints
  PRIMARY KEY, which designates an attribute as the primary key.
  UNIQUE, which designates an attribute as having unique values, so that it can be used as a key.
Table constraints
  PRIMARY KEY(listOfAttributes), which designates a set of attributes as the primary key.

The INSERT command is used to insert a tuple into a table.
  INSERT INTO tableName (attributeNames) VALUES (values)
The DELETE command is used to delete one or more tuples from a table.
  DELETE FROM tableName variable WHERE variable.field = expression
The UPDATE command is used to update/change a tuple in a table.
  UPDATE tableName variable SET variable.field1 = expression WHERE variable.field2 = expression
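The statements summarized above can be exercised against an in-memory database. This is a minimal sketch using Python's standard sqlite3 module; the tables employee and works_in, their columns, and the inserted values are hypothetical, and SQLite's dialect differs in places from the generic syntax shown in these notes (for instance, the sketch does not use tuple variables in UPDATE or DELETE).

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Attribute constraints: PRIMARY KEY and UNIQUE on single attributes.
con.execute(
    "CREATE TABLE employee ("
    " ssn TEXT PRIMARY KEY,"
    " name TEXT,"
    " email TEXT UNIQUE)"
)
# Table constraint: a primary key made of a set of attributes.
con.execute(
    "CREATE TABLE works_in ("
    " ssn TEXT, deptName TEXT,"
    " PRIMARY KEY (ssn, deptName))"
)

con.execute("INSERT INTO employee (ssn, name, email) VALUES ('111', 'Ann', 'ann@example.org')")
con.execute("INSERT INTO employee (ssn, name, email) VALUES ('222', 'Bob', 'bob@example.org')")
con.execute("UPDATE employee SET name = 'Robert' WHERE ssn = '222'")
con.execute("DELETE FROM employee WHERE ssn = '111'")
rows = con.execute("SELECT ssn, name FROM employee").fetchall()

# A second tuple with the same email violates the UNIQUE constraint.
try:
    con.execute("INSERT INTO employee (ssn, name, email) VALUES ('333', 'Cy', 'bob@example.org')")
    unique_enforced = False
except sqlite3.IntegrityError:
    unique_enforced = True
```

After these statements only the updated tuple for ssn 222 remains, and the duplicate email is refused by the DBMS rather than by application code.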

7.1.4 Table deletion

The DROP command is used to delete a table.
  DROP TABLE tableName;

7.1.5 Table modification

The ALTER command is used to add and delete columns from a table.
  ALTER TABLE tableName ADD attributeName attributeType;
  ALTER TABLE tableName DROP attributeName;
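A short sketch of table modification and deletion, again assuming Python's standard sqlite3 module and a hypothetical employee table. Only adding a column is shown, because support for dropping a column depends on the SQLite version in use.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT)")

# ALTER TABLE ... ADD: the new column appears at the end of the schema.
con.execute("ALTER TABLE employee ADD COLUMN birthDate TEXT")
columns = [row[1] for row in con.execute("PRAGMA table_info(employee)")]

# DROP TABLE removes the table and its data from the catalog.
con.execute("DROP TABLE employee")
remaining = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

The PRAGMA and the sqlite_master catalog are SQLite-specific ways to inspect a schema; other systems expose the same information differently.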

7.2 Data Manipulation Language (DML)

Qualified names - where necessary to disambiguate, qualified names may be used.
  relationName.attributeName
Aliases
  attributeExpression AS newAttribute - new name for an attribute
  relationName AS newRelationName - new name for a relation

Queries
The SELECT command is used to create a new temporary relation consisting of those tuples that satisfy the constraints.
  SELECT attributeList FROM relations WHERE constraints

The attributeList (comma separated)
  * - all fields; combined with a WHERE clause this is in effect the selection operation of the relational algebra.
  comma separated list of some of the attributes - in effect the projection operation of the relational algebra.
  attributeExpression AS newAttribute - new name for the attribute
  DISTINCT attribute - will eliminate duplicate tuples
  Relational operators: =, <>, <, >, <=, >=
  Logical operators: NOT, AND, OR
  Membership operators: s IN R, s NOT IN R, EXISTS R, NOT EXISTS R, ALL R, ANY R
  Column operators: SUM(attribute), AVG(attribute), MIN(attribute), MAX(attribute), COUNT(attribute), COUNT(*) - number of tuples
The relations
  comma separated list of relation names, each optionally followed by an alias
The constraints
  attribute relop value - relational expressions
  attributeExpression IN ( subquery )
  string LIKE pattern - % matches 0 or more characters, _ matches any one character
  ... DATE 'yyyy-mm-dd' ...
  TIME 'hh:mm:ss.d...'
ORDER BY attributeList
Subqueries
Set operators: IN, UNION, INTERSECT, EXCEPT
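The correspondence between SELECT and the relational algebra can be seen on a small example. This is a minimal sketch using Python's standard sqlite3 module; the employee table, its columns, and its rows are hypothetical, and the helper query is an illustrative name, not part of any library.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, deptName TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('111', 'Ann',  'Sales', 50000);
    INSERT INTO employee VALUES ('222', 'Bob',  'IT',    60000);
    INSERT INTO employee VALUES ('333', 'Anna', 'Sales', 55000);
""")

def query(sql):
    """Run a query and return all result tuples as a list."""
    return con.execute(sql).fetchall()

# Selection: all attributes, only the tuples that satisfy the condition.
selection = query("SELECT * FROM employee WHERE salary > 52000 ORDER BY ssn")

# Projection with duplicate elimination.
projection = query("SELECT DISTINCT deptName FROM employee ORDER BY deptName")

# Alias, qualified name, and pattern matching with LIKE.
pattern = query("SELECT e.name FROM employee AS e WHERE e.name LIKE 'An%' ORDER BY e.name")

# Column operators together with a membership test.
totals = query("SELECT COUNT(*), SUM(salary) FROM employee WHERE deptName IN ('Sales')")
```

ORDER BY is included so that the order of the result tuples is fixed; without it a relation has no defined order.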


7.3 WWW Tutorials

James Hoffman's Introduction to Structured Query Language
With access to a practice database: http://sqlcourse.com
A Gentle Introduction to SQL


Part IV

Relational Database Design


Chapter 8

IM6 Relational Database Design


Suggested time: 5 hours

Topics:
  Database design
  Functional dependency
  Normal forms (1NF, 2NF, 3NF, BCNF) - schema refinement
  Multivalued dependency (4NF)
  Join dependency (PJNF, 5NF)
  Representation theory

Learning objectives:
  Determine the functional dependency between two or more attributes that are a subset of a relation.
  Describe what is meant by 1NF, 2NF, 3NF, and BCNF, and identify whether a relation is in 1NF, 2NF, 3NF, or BCNF.
  Normalize a 1NF relation into a set of 3NF (or BCNF) relations and denormalize a relational schema.
  Explain the impact of normalization on the efficiency of database operations, especially query optimization.
  Describe what a multivalued dependency is and what type of constraints it specifies.
  Explain why 4NF is useful in schema design.


Chapter 9

Database and Application Development


The development of a database follows the usual project life cycle processes and, as with other software engineering projects, the phases may overlap and/or be performed iteratively.

An Approach to Design (1)
  Philosophy   Ontology, epistemology, axiology (values)
  Policy       An implementation plan
  Mechanism    Used to implement a policy

(1) IS and IT majors should notice the resemblance to the strategic management process.

9.1 The Life-Cycle

The life-cycle of a database system is a subcycle of an information system life-cycle and includes the following phases:
  System Definition
  The Database Design Process
  Loading or Data Conversion
  Application Conversion

Martin Fowler and Pramod Sadalage, in their article Evolutionary Database Design at http://martinfowler.com/articles/evodb.html, suggest the following agile practices for database and database application development:
  DBAs collaborate closely with developers
  Everybody gets their own database instance
  Developers frequently integrate into a shared master
  A database consists of schema and test data
  All changes are database refactorings
  Automate the refactorings
  Automatically update all database developers
  Clearly separate all database access code

9.2 Database Application Project Script

Purpose         To guide a team through the development of a database application. The development of a database, often a part of a larger information system, follows the usual project life cycle processes and, as with other software engineering projects, the phases may overlap and/or be performed iteratively.
Entry Criteria  System definition
                A database application needs statement
                Materials, facilities, and resources for team support
                A development team
General         The following phases are not sequential but proceed in parallel and are iterative, with feedback from one phase to another.

Phase  Activities      Description
0      Requirements    Requirements collection and analysis
1      Design          The database design process
2      Implementation  Database implementation and tuning
3      Loading         Loading or data conversion
4      Conversion      Adding conversion/migration to the database system

Exit criteria   Functioning database application

Requirements Phase
The focus of the requirements phase is on requirements elicitation and analysis. It elicits the user-level ontology and values using appropriate epistemological methods and tools.


Design Phase
Conceptual design Answers the question: What entities and relationships are needed? This phase expands the user-level ontology with additional constructs necessary in an abstract ontology, and the data is captured using E/R modeling or OOD.
Choice of DBMS Answers the question: What is the appropriate DBMS?
Logical design Answers the question: What database model? Mapping the conceptual design to a database schema. For the relational model, it is the transformation of the design ontology into a relational ontology.
Schema refinement Answers the question: What is the simplest schema? The goal is to eliminate redundancy through normalization of the database schema.
Physical design Answers the question: What are the performance requirements? Performance issues.
Security design Answers the question: What are the security requirements? User groups and access restrictions.

Implementation phase
The focus of the implementation phase is the actual database system implementation and tuning with test data.

9.3 Database Design Quality Factors

These are the values by which the quality of a database is determined.
Faithfulness: The design and implementation should be faithful to the requirements. The use of constraints helps to achieve this value.
Avoid Redundancy: Something is redundant if, when hidden from view, you could still figure it out from other data. This value is important because redundancy wastes space and encourages inconsistency.
Simplicity: Simplicity requires that the design and implementation avoid introducing more elements than are absolutely necessary - KISS. This value requires designers to avoid introducing unnecessary intermediate concepts.
Right kind of element: Attributes are easier to implement, but entity sets and relationships are necessary to ensure that the right kind of element is introduced.


9.4 Database Security and Authorization

Connections
Related to:
Prerequisites:
Requisite for: Query optimization

Topics/Lectures:
  Database security issues
  Discretionary access control
  Mandatory access control
  Statistical database security
Learning objectives:

9.4.1 Issues

Protection & security
Security: Protection against unauthorized disclosure, alteration, or destruction - protection against unauthorized users.
An organization's security policy defines the rules for authorizing access to computer and information resources.
Security objectives:
  Secrecy: Information should not be disclosed to unauthorized users - protection against unauthorized users.
  Integrity: Maintain the accuracy or validity of data - protection against invalid changes, including those made by authorized users.
  Availability: Authorized users should not be denied access.
The computer's protection mechanisms are tools for implementing the organization's security policy.

9.4.2 Security philosophy and issues

To develop a security philosophy, ask these questions:
  What do your users expect in the way of system security?
  Will you lose customers if security is not taken seriously enough, is taken too seriously, or is so severe that functionality is impaired?
  How much down-time or monetary loss has occurred due to security incidents in the past?
  Are you concerned about insider threats? Should you trust your users?
  How much sensitive information is on-line? What is the loss to the organization if this information is compromised or stolen?
  Do you need different levels of security for different parts of your organization?
  What is your network and host configuration? Do you support dangerous network services? Do you require individual hosts to meet a basic security profile?
  Are there security guidelines, regulations, or laws you are required to meet?
  Do business requirements take precedence over security where there is a conflict?
  How important are confidentiality, integrity and/or availability to the overall operation of your company or site?
  Are the decisions you've made consistent with your business needs and economic reality?

Legal and ethical issues regarding the right to access information
Policy issues regarding privacy
System level security
Security levels and security policies
Multiuser database system - The DBMS must provide a database security and authorization subsystem to enforce limits on individual and group access rights and privileges.

Two types of database security mechanisms
  Discretionary security mechanisms - grant privileges to users - read, insert, delete, update files, records or fields.
  Mandatory security mechanisms - enforce multilevel security by classifying data and users into various security classes (or levels) and implementing the appropriate security policy of the organization.

9.4.3 Authentication

Authentication is the process by which the identity of a participant is established.


9.4.4 Authorization

Authorization refers to the process that determines the mode in which a particular (previously authenticated) client is allowed to access a specific resource controlled by a server.
  Protection policy
  Access control list (ACL) - Example: file permissions U(rwax) G(rwax) O(rwax), that is, read, write, append, and execute rights for the user, the group, and others
  SQL - GRANT
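An access control list can be pictured as a table from resources and classes of principals to permitted access modes. The following is a minimal sketch under that assumption; the resource name payroll.db, the class names, and the function is_authorized are hypothetical and do not describe any particular operating system or DBMS.

```python
# Hypothetical access control list: resource -> {principal class: permitted modes}
# Modes: r = read, w = write, a = append, x = execute
ACL = {
    "payroll.db": {"user": {"r", "w", "a"}, "group": {"r"}, "other": set()},
}

def is_authorized(resource, principal_class, mode):
    """Return True if the (already authenticated) principal class may use the mode.
    Anything not listed in the ACL is denied."""
    return mode in ACL.get(resource, {}).get(principal_class, set())
```

Denying whatever is not explicitly listed is a design choice of this sketch; the check presupposes that authentication has already established who the client is.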

9.4.5 Encryption

Encryption is used to protect information stored at a particular site or transmitted between sites from being accessed by unauthorized users. Security notes in networking.

9.5 Top-down vs Bottom-Up

Goal of database design:
  Minimize redundancies, to reduce storage requirements and to prevent update anomalies
  Preserve dependencies
  Satisfy the lossless join property
Top-Down: Used extensively in commercial database application design.
  Conceptual design using the ER model
  Map the conceptual design to a relational database schema
  Refine the relational schema using functional dependencies, primary keys, and normalization procedures.
Bottom-Up: Preferred by relational database purists.
  Identify database attributes, functional dependencies, and other constraints
  Starting with one giant relation schema (the universal relation), apply a normalization algorithm to synthesize the relation schemas
A program is available which may be used to normalize individual relations and do bottom-up design.

Chapter 10

IM3 Data Modeling


Suggested time: 4 hours

Topics:
  Data modeling
  Conceptual models (including entity-relationship and UML)
  Object-Oriented model
  Relational data model

Learning objectives:
1. Categorize data models based on the types of concepts that they provide to describe the database structure, that is, conceptual data model, physical data model, and representational data model.
2. Describe the modeling concepts and notation of the entity-relationship model and UML, and illustrate their use in data modeling.
3. Describe the main concepts of the OO model such as object identity, type constructors, encapsulation, inheritance, polymorphism, and versioning.
4. Describe the basic principles of the relational data model, and define the fundamental relational model terms.
5. Illustrate the modeling concepts and notation of the relational data model.

10.1 Data Modeling

A database represents some aspect of the real world - a miniworld or a Universe of Discourse. Changes in the miniworld are reflected in the database. Logically, inferences within the database are subject to the closed world assumption.

When the semantic mapping between the language and the domain is incomplete or even missing, it may not be possible to determine whether a sentence is true or not. The closed world assumption is used to provide a default solution in the absence of a better solution.

Closed world assumption: if you cannot prove $P$ or $\neg P$ from a knowledge base KB, add $\neg P$ to the knowledge base KB.

There are at least two situations where the closed world assumption is used. The first is where it is assumed that a knowledge base contains all relevant facts. This is common in corporate databases. That is, the information it contains is assumed to be complete. The second situation is where it is known that the knowledge base is incomplete (does not have enough information to produce an answer to a question) and a decision must be made without complete information, a situation familiar to most people. The closed world assumption is designed to finesse, but not solve, the reasoning problems in both of these situations, and is adopted in default of a better solution. The idea is that if you cannot prove P or not P, assume P is false. This is the usual semantics of relational databases and is employed by programs written in the programming language PROLOG.

The closed-world assumption simply declares that all relevant facts are stored in the database, so that any statement that is true about the actual world can be deduced from facts in the system. This assumption is useful in these situations, but it is untenable for mathematics or the scientific world. Scientific theories are, of course, rarely complete and in fact, it is their incompleteness that suggests areas for further research. The further research is designed to enlarge the knowledge base and, of course, test the accuracy of the theories.
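The closed world assumption as used by a relational database can be sketched in a few lines. This is an illustration only, assuming a hypothetical knowledge base of ground facts works_in(employee, department); the function name holds is an illustrative choice.

```python
# Hypothetical knowledge base of ground facts: works_in(employee, department)
KB = {("works_in", "Ann", "Sales"), ("works_in", "Bob", "IT")}

def holds(fact, kb=KB):
    """Closed world assumption: a fact that cannot be found (proved)
    in the knowledge base is taken to be false."""
    return fact in kb

ann_in_sales = holds(("works_in", "Ann", "Sales"))   # stored, so true
ann_in_it = holds(("works_in", "Ann", "IT"))         # not stored, so assumed false
```

The second answer does not mean the database knows that Ann is not in IT, only that the fact is absent, which is exactly the default this assumption supplies.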

Data modeling is the hardest and most important activity in the RDBMS world. If you get the data model wrong, your application might not do what users need, it might be unreliable, or it might fill up the database with garbage. Why then do we start with the most challenging part of the job? Because you can't do queries, inserts, and updates until you've defined some tables. And defining tables is data modeling. When data modeling, you are telling the RDBMS the following:
  what elements of the data you will store
  how large each element can be
  what kind of information each element can contain
  what elements may be left blank
  which elements are constrained to a fixed range
  whether and how various tables are to be linked

10.1.1 Constraints

Key: set of attributes that uniquely identify an object within its class or entity within its entity set.
Single-value constraints: a uniqueness requirement as with a key.
Referential integrity constraints: object referenced must exist in the DB.
Domain constraints: value of an attribute must be in a specific set and/or range.
General constraints: assertions that must hold in the DB.

10.1.2 Levels of Abstraction

Conceptual design/database An abstraction of the real world as it pertains to the users of the database. The purpose of a conceptual design is to allow the description of the conceptual scheme of an enterprise to be written down without consideration of the physical implementation. The DDL describes the conceptual scheme and its implementation by the physical scheme. The structure of the database, called the database schema, is expressed in a design language, two of which are entity-relationship (E/R) and object definition language (ODL).
Logical design/database Data model mapping (logical database design).
Physical database A collection of files, indices and other storage structures. Resides permanently on physical storage.
View A portion of the conceptual database. Provides access and security to restricted portions of the database.

10.2 E/R Modeling and diagrams

10.2.1 Conceptual design

The purpose of a conceptual design is to allow the description of the conceptual scheme of an enterprise to be written down without consideration of the physical implementation.

Item: Entity
  Description: An object distinguishable from other objects, e.g. (J. J. Blue, 234432789, 324 SW Steller MyTown, SW 98345)
  Graphical representation: tuple
Item: Entity set
  Description: A collection of entities described by a common set of attributes. A relation whose name is the entity set name and the elements of the tuples are the attribute names. Often stored as a record. The attributes are the fields of the record. e.g. Employee(Name, SS#, Address)
  Graphical representation: rectangle
Item: Attribute
  Description: value describing some property of an entity; connected to an entity set; may have attributes
  Graphical representation: oval with link to entity
Item: Key
  Description: attribute or set of attributes that uniquely identify an entity; attribute names are underlined. e.g. SS#
  Graphical representation: underlined names
Item: Relationships
  Description: a named association among two or more entities; may have attributes; key is the combined key of the entities, e.g. WorksIn(EmpName, DeptName)
  Graphical representation: diamond with links to entities

Figure 10.1: Entity Relationship Diagram Elements

The ER model is convenient for representing an initial, high-level (conceptual) database design. It provides useful concepts that allow us to move from an informal description of what users want to a more detailed, and precise, description that can be implemented in a DBMS. An Entity-Relationship diagram is a graph drawn with
  Rectangles, each containing the name of a distinct entity set
  Ovals, each containing the name of an attribute and each linked to an entity set
  Underlined attribute names, which constitute the set of attributes whose values uniquely identify an entity (key)
  Diamonds, each containing the name of a relationship and linked to entity sets (including isa: the isa relationship captures some properties of inheritance)
The elements of the E/R diagram are listed in figure 10.1.

Constraints - must be identified by the database designer
1. Functional dependencies - attributes or sets of attributes that uniquely identify an object within its class.
2. Single-value constraints
3. Referential integrity constraints - requirement that a value referred to by some object actually exists
4. Domain constraints - the value of an attribute must be in a specific set of values or lie within a range
5. General constraints - arbitrary assertions. May be between entity sets.

Attributes on relationships It is never necessary to place attributes on relationships. A new entity set can be invented with the desired attributes.
Weak entity set A weak entity set is an entity set whose key is composed of attributes some or all of which belong to another entity set. Weak entity sets are enclosed with double rectangles.
1 to 1, n to 1 relationships A number (n if many) is included with the link from a relationship to an entity set to indicate the number of entities associated in the relationship.
n-ary (n > 2) relationships n-ary (n > 2) relationships are rare and can be rewritten as a collection of binary, many-one relationships without loss of information. For example: t(a,b,c) becomes t0(a,t), t1(b,t), t2(c,t).
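The rewriting of an n-ary relationship into binary ones, t(a,b,c) into t0(a,t), t1(b,t), t2(c,t), can be sketched as follows. This is an illustration under one assumption: each tuple of the original relationship is given a surrogate identifier t (here an integer). The function names to_binary and from_binary and the sample data are hypothetical.

```python
def to_binary(relationship):
    """Rewrite an n-ary relationship as n binary, many-one relationships.

    Each tuple gets a surrogate identifier t; component i of the tuple is
    stored in the i-th binary relation as the pair (value, t)."""
    arity = len(next(iter(relationship), ()))
    binaries = [set() for _ in range(arity)]
    for t_id, tup in enumerate(sorted(relationship)):
        for i, value in enumerate(tup):
            binaries[i].add((value, t_id))
    return binaries

def from_binary(binaries):
    """Rebuild the n-ary tuples by matching the surrogate identifiers."""
    ids = {t_id for (_, t_id) in binaries[0]} if binaries else set()
    lookups = [{t_id: value for (value, t_id) in b} for b in binaries]
    return {tuple(lookup[t_id] for lookup in lookups) for t_id in ids}

# Hypothetical ternary relationship t(a, b, c)
t = {("a1", "b1", "c1"), ("a1", "b2", "c1")}
t0, t1, t2 = to_binary(t)
```

Rebuilding the original tuples from t0, t1 and t2 is what "without loss of information" means here: each identifier is related to exactly one value in each binary relation.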

10.3 The Relational Data Model

Proposed in 1970 by E. F. Codd

Connections
Related to: Data definition language (DDL)
Prerequisites: Sets and Relations
Requisite for: Relational algebra and the tuple and domain relational calculus

Relation In the relational model, data is represented as a two-dimensional table called a relation. Relations have names and the columns have names called attributes. The elements in a column must be atomic - an elementary type such as a number, string, date, or timestamp - and from a single domain. A relation r(R) is a mathematical relation of degree n on the domains $dom(A_1), dom(A_2), \ldots, dom(A_n)$, which is a subset of the Cartesian product of the domains that define R:

$r(R) \subseteq dom(A_1) \times dom(A_2) \times \cdots \times dom(A_n)$

Figure 10.2 illustrates the idea using an employee relation consisting of a table of names, birth dates, social security numbers, ... The contents of a relation are rarely static, thus the addition or deletion of a row must be efficient.

  The attributes   Name         BirthDate   SS #        ...
  The data         John Brown   12151934    123454434   ...

Figure 10.2: An Employee Relation

  movie(title, year, length, filmType)
  employee(name, birthDate, ss#, ...)
  department(deptName, empSSNo, employeeName, function)

Figure 10.3: Database schema example

Relational Database A relational database is a finite set of relation schemas (called a database schema) and a corresponding set of relation instances (called a database instance). The relational database model represents data as two-dimensional tables called relations and consists of three basic components:
1. a set of domains and a set of relations
2. operations on relations
3. integrity rules

Database schema A database schema is a set of relation schemas for the relations in a design. Changes to a schema or database schema are expensive, thus careful thought must go into the design of a database schema. Figure 10.3 illustrates a database schema. Note: the examples given in this document do not include the domain constraint information.

Relation Schema - relationName(attribute1:dom1, ..., attributen:domn) A relation schema, e.g. employee(name, birthDate, ss#), consists of
1. The name of the relation. Relation names must be unique across the database.
2. The names of the attributes in the relation along with their associated domain names. An attribute is the name given to a column in a relation instance. All columns must be named and no two columns in the same relation may have the same name. A domain name is a name given to a well-defined set of values. The values of a column are referenced using its attribute name (A) or, alternatively, the relation name followed by the attribute name (R.A).
3. The integrity constraints (IC). Integrity constraints are restrictions on the relation instances of this schema.

Relation Instance A relation instance is a table with rows and named columns. The rows in a relation instance (or just relation) are called tuples. The cardinality of the relation is the number of tuples in it. The names of the columns are called attributes of the relation. The number of columns in a relation is called the arity of the relation. The type constraint that the relation instance must satisfy is
1. the attribute names must correspond to the attribute names of the corresponding schema and
2. the tuple values must correspond to the domain values specified in the corresponding schema.

Database Instance A database instance is a finite set of relation instances.
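The two type constraints on a relation instance can be checked mechanically. The following is a minimal sketch in which a domain is modelled, as an assumption, by a Python type; the schema EMPLOYEE_SCHEMA, the attribute name ss (standing in for ss#), the function satisfies_schema, and the reading of 12151934 as 15 December 1934 are all hypothetical.

```python
from datetime import date

# Hypothetical schema: attribute name -> domain (modelled here as a Python type)
EMPLOYEE_SCHEMA = {"name": str, "birthDate": date, "ss": str}

def satisfies_schema(instance, schema):
    """Check the two type constraints on a relation instance (a list of dicts)."""
    for tup in instance:
        if set(tup) != set(schema):
            return False  # 1. attribute names must match those of the schema
        if not all(isinstance(tup[a], dom) for a, dom in schema.items()):
            return False  # 2. values must lie in the declared domains
    return True

good = [{"name": "John Brown", "birthDate": date(1934, 12, 15), "ss": "123454434"}]
bad = [{"name": "John Brown", "birthDate": "12151934", "ss": "123454434"}]

cardinality = len(good)        # number of tuples in the instance
arity = len(EMPLOYEE_SCHEMA)   # number of columns
```

The instance named bad fails the second constraint because its birthDate value is a string rather than a member of the date domain.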


Chapter 11

Functional Dependency
Motivation: Why is one relational schema better than another? We could, for example, place all data into a single relation. The single most important concept in relational schema design is that of functional dependency.

Connections
Related to:
Prerequisites:
Requisite for: Normal forms

Informal design guidelines
1. Semantics of the attributes: Design a relational schema so that it is easy to explain its meaning.
2. Reduce redundant values in tuples: Minimize storage space.
3. Reduce the null values in tuples: Nulls should apply in exceptional cases only.
4. Avoid generating spurious tuples: Design relations that can be joined with equality conditions on attributes that are primary or foreign keys.

Definition 11.1 A functional dependency, denoted by $X \to Y$, between two sets of attributes X and Y that are subsets of the attributes of relation R, specifies that the values in a tuple corresponding to the attributes in Y are uniquely determined by the values corresponding to the attributes in X. For example, the social security number uniquely determines a name: $\text{SSN} \to \text{Name}$.


Functional dependencies are determined by the semantics of the relation; in general, they cannot be determined by inspection of an instance of the relation. That is, a functional dependency is a constraint and not a property derived from a relation.

Required Skills
Be able to determine all functional dependencies of a relation.
Be able to determine all candidate keys.
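The point that an instance can refute a functional dependency but never establish it can be illustrated with a small check. The following is a minimal sketch, not part of the original notes: the function name violates_fd and the sample employee rows are assumptions made for illustration.

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]

def violates_fd(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return True
    return False

# Hypothetical instance of an employee relation.
employees = [
    {"ssn": "111", "name": "Ann", "dept": "Math"},
    {"ssn": "222", "name": "Bob", "dept": "Math"},
    {"ssn": "333", "name": "Ann", "dept": "Physics"},
]

ssn_name_refuted = violates_fd(employees, ["ssn"], ["name"])
name_dept_refuted = violates_fd(employees, ["name"], ["dept"])
# On the first two rows alone, name -> dept is not refuted, although nothing
# in the semantics of the relation says that a name determines a department.
name_dept_refuted_small = violates_fd(employees[:2], ["name"], ["dept"])
```

A result of False only means that this particular instance contains no counterexample; whether the dependency holds is still decided by the semantics of the relation.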

11.1 Keys

Definition 11.2 We say that a set of one or more attributes {A1, ..., An} is a key for a relation R if:
1. Those attributes functionally determine all other attributes of the relation.
2. No proper subset of those attributes functionally determines all other attributes of R, i.e., a key is minimal.

A set of attributes that contains a key is called a superkey. Thus every key is a superkey, but not every superkey is a key, since a superkey need not be minimal. If a relation has more than one key, one of the keys is designated as the primary key.

11.2 Keys for relations

A relation R is constructed from a relationship:

many-many. The keys of the connected entity sets are the key of R.
many-one. The key attributes of the connected entity set on the many side are the key attributes of R.
one-one. The key attributes of either of the connected entity sets are key attributes of R.

11.3 Reasoning about functional dependencies


Functional dependencies are transitive: If A → B and B → C, then A → C.

Two sets of functional dependencies S and T are equivalent iff S ⇒ T and T ⇒ S, i.e., each set follows from the other.

The dependency {A1, ..., An} → {B1, ..., Bm}
- is trivial if the Bs are a subset of the As
- is nontrivial if at least one of the Bs is not among the As
- is completely nontrivial if none of the Bs is also one of the As

Splitting rule: {A1, ..., An} → {B1, ..., Bm} may be replaced with
{A1, ..., An} → {B1}
...
{A1, ..., An} → {Bm}

Combining rule:
{A1, ..., An} → {B1}
...
{A1, ..., An} → {Bm}
may be replaced with {A1, ..., An} → {B1, ..., Bm}

Reflexive-transitive closure Armstrong's axioms are sound and complete, i.e., they derive only dependencies that actually follow from a given set and they enable the computation of every such dependency.
Reflexivity - If the Bs are a subset of the As, then A → B.
Augmentation - If A → B, then {A, C} → {B, C}.
Transitivity - If A → B and B → C, then A → C.

Additional inference rules
Decomposition - If A → {B, C}, then A → B.
Union - If A → B and A → C, then A → {B, C}.
Pseudotransitive - If A → B and {C, B} → D, then {C, A} → D.
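As a worked example of how the additional inference rules follow from Armstrong's axioms, the union rule can be derived in five steps. This derivation is added for illustration; AB abbreviates the set {A, B}.

```latex
\begin{align*}
1.\quad & A \to B   && \text{given} \\
2.\quad & A \to AB  && \text{augmentation of 1 with } A \\
3.\quad & A \to C   && \text{given} \\
4.\quad & AB \to BC && \text{augmentation of 3 with } B \\
5.\quad & A \to BC  && \text{transitivity of 2 and 4}
\end{align*}
```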

Closure
The computation of the closure of a set of attributes with respect to a set of functional dependencies is used to determine keys and to assist in the normalization of relations. Prolog code to compute the closure http://www.cs.wwc.edu/~aabyan/415/ normal.
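For readers who prefer Python to Prolog, the closure computation and its use for finding keys can be sketched as follows. This is not the Prolog code referenced above; the function names closure and candidate_keys and the representation of a dependency as a (left side, right side) pair of sets are assumptions made for illustration. The brute-force key search is exponential in the number of attributes and is meant only for small classroom examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

# StudentInfo(SSNo, Name, Major, Dept, DeptChair) with
# SSNo -> Name, Major; Major -> Dept; Dept -> DeptChair
student_schema = {"SSNo", "Name", "Major", "Dept", "DeptChair"}
student_fds = [
    ({"SSNo"}, {"Name", "Major"}),
    ({"Major"}, {"Dept"}),
    ({"Dept"}, {"DeptChair"}),
]
major_closure = closure({"Major"}, student_fds)
student_keys = candidate_keys(student_schema, student_fds)
```

Here the closure of {Major} is {Major, Dept, DeptChair}, and {SSNo} is the only candidate key.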


Chapter 12

Normal Forms
12.1 Overview

The principal problem is redundancy, which is often caused by attempts to group both single-valued and multivalued properties of an object. The solution is to replace a relation by a collection of smaller relations. Each of the smaller relations contains a strict subset of the attributes of the original relation. This process is called the decomposition of the larger relation into smaller relations.

Redundancy - information (including null values) may be repeated unnecessarily in several tuples.

Key assumptions: For each relation there is a set of functional dependencies and a designated primary key.

Anomalies
1. Update anomalies - changing information in one tuple leaves the same information unchanged in another; occurs with redundancy.
2. Deletion anomalies - if a set of values becomes empty, we may lose other information as a side effect.
3. Insertion anomalies - inability to represent certain information.

The accepted way to eliminate these anomalies is to decompose relations, splitting a relation into two or more relations. Decomposition must always be based on principles that ensure that the original relation can be reconstructed from the decomposed relations if and when necessary. If we are able to reduce redundancy and not lose any information, it implies that all the redundant information can be derived from the other information in the database. Therefore information that has been removed must be related to or dependent on other information that still exists in the database. Careless decomposition of a relation can result in loss of information.

A relation R with schema {A1, ..., An} may be decomposed into two relations S and T with schemas {B1, ..., Bm} and {C1, ..., Ck} respectively, such that
1. {A1, ..., An} = {B1, ..., Bm} ∪ {C1, ..., Ck}
2. The tuples in S are the projections onto {B1, ..., Bm} of all the tuples in R.
3. Similarly for T.
Note: the decomposition is not necessarily disjoint.

Two questions about refinement:
1. Do we need to decompose a relation?
2. What problems does the given decomposition cause?

Relational design by analysis: normalization is a process of analyzing the given relational scheme based on functional dependencies and primary keys to achieve the desirable properties of
1. minimizing redundancy and
2. minimizing the insertion, deletion, and update anomalies
by decomposing the original relation.

Required Skill Be able to achieve the desirable state of 3NF by progressing through the intermediate states of 1NF and 2NF if needed.
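The difference between a careful and a careless decomposition can be seen by projecting a small instance and joining the projections back together. The sketch below is an illustration only: the helper names project, natural_join and reconstructs and the sample rows are assumptions, and the check speaks only for the given instance, not for every instance of the schema.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate elimination; rows are dicts."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

def natural_join(left, right):
    """Natural join: combine tuples that agree on all common attributes."""
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def reconstructs(rows, attrs_s, attrs_t):
    """True if joining the two projections gives back exactly the original rows."""
    joined = natural_join(project(rows, attrs_s), project(rows, attrs_t))
    original = project(rows, list(rows[0])) if rows else []
    # The join always contains the original tuples; extra ones are spurious.
    return len(joined) == len(original) and all(t in original for t in joined)

# Hypothetical instance of R(A, B, C) in which A -> B and B -> C hold.
r = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 10},
    {"A": 3, "B": "y", "C": 10},
]
good = reconstructs(r, ["A", "B"], ["B", "C"])   # S(A, B), T(B, C)
bad = reconstructs(r, ["A", "C"], ["B", "C"])    # S(A, C), T(B, C)
```

With S(A, C) and T(B, C) the join produces spurious tuples such as (1, y, 10), so the original relation cannot be recovered.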

12.2 First normal form - 1NF

Test: There should be no nonatomic attributes or nested relations.
Goal: Satisfy the definition of the relational model.
Normalization procedure: Form a new relation for each nonatomic attribute or nested relation, along with the primary key. This is the most general procedure.
Example: StudentInfo(Name, Address(Street, City, State, Zip), Classes). Address is not atomic.
Solution: Create a new relation for Address but include the key from the original relation: Address(Name, Street, City, State, Zip), Classes(Name, Class)


12.3 Second normal form - 2NF

Test: A schema must not have an FD X → Y where X is a strict subset of that schema's key, i.e., for relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key. The functional dependency X → Y is a full functional dependency if the removal of any attribute A from X means that the dependency no longer holds.
Goal: Eliminate update and insertion anomalies.
Normalization procedure: Decompose and set up a new relation for each partial key with its dependent attribute(s). Keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
Example: EmpProj(EmpNo, EmpName, ProjNo, ProjName); EmpNo → EmpName, ProjNo → ProjName. Key: EmpNo, ProjNo.
Solution: EmpProj(EmpNo, ProjNo), Emp(EmpNo, EmpName), Proj(ProjNo, ProjName)
Example: R(A, B, C, D); FD1: A, B → C; FD2: A → D
Solution: R1(A, B, C), R2(A, D)
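The 2NF test can be mechanized for small examples by checking what each proper part of the key determines. The following is a minimal sketch under stated assumptions: the caller supplies a single key, every attribute outside that key is treated as a nonkey attribute, and the names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, schema, fds):
    """Map each proper, non-empty part of the key to the nonkey attributes it determines."""
    nonkey = set(schema) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & nonkey
            if determined:
                found[part] = determined
    return found

# EmpProj(EmpNo, EmpName, ProjNo, ProjName) with key {EmpNo, ProjNo}
emp_proj = {"EmpNo", "EmpName", "ProjNo", "ProjName"}
emp_proj_fds = [({"EmpNo"}, {"EmpName"}), ({"ProjNo"}, {"ProjName"})]
violations = partial_dependencies({"EmpNo", "ProjNo"}, emp_proj, emp_proj_fds)
```

Each entry of the result corresponds to one new relation in the normalization procedure: EmpNo with EmpName gives Emp(EmpNo, EmpName), and ProjNo with ProjName gives Proj(ProjNo, ProjName).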

12.4 Third normal form - 3NF

Test: A relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key. A transitive dependency can be described as follows: if A determines B, and B determines C, then A determines C.
Alternately, a relational schema, R=(R,F), is in the 3rd Normal Form if every FD X → Y in F satisfies one of the following:
* Y is a subset of X
* X is a superkey of R
* Each attribute in Y - X (the attributes of Y that are not in X) belongs to some candidate key of R
Goal: Eliminate update anomalies.
Normalization procedure: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
Example: StudentInfo(SSNo, Name, Major, Dept, DeptChair); SSNo → Name, Major; Major → Dept; Dept → DeptChair
Solution: StudentInfo(SSNo, Name, Major), MajDept(Major, Dept), DeptInfo(Dept, DeptChair)
Example: R(A, B, C); FDs: A → B, B → C
Solution: R1(A, B); R2(B, C)


12.5 Boyce-Codd normal form - BCNF

Test: Two (or more) composite candidate keys which share an attribute signal a possible BCNF violation.
The following are equivalent definitions of BCNF:
- A relation R is in BCNF iff whenever there is a nontrivial dependency {A1, ..., An} → B for R, it is the case that {A1, ..., An} is a superkey for R.
- The left side of every nontrivial dependency A → B contains a key.
- A relation is in Boyce-Codd normal form if it is in third normal form and all candidate keys defined for the relation satisfy the test for third normal form.
- Alternately, a relational schema, R=(R,F), is in the Boyce-Codd Normal Form (BCNF) if every FD X → Y in F satisfies either of the following:
  * Y is a subset of X (i.e., this is a trivial FD)
  * X is a superkey of R
Goal: Eliminate update anomalies.
Normalization procedure: Decompose the relation r(As) using the functional dependency X → Y (X is not a superkey) as follows: r1(As\Y), r2(XY).
Example: Teach(Student, Course, Instructor); Student → Course, Instructor → Course
Solution: Teaches(Instructor, Course), Taking(Student, Course)
Example: R(A, B, C, D); FD1: {A, B} → {C, D}; FD2: {B, C} → {A, D}
Solution: R1(B, C, A, D), {B, C} → {A, D}; R2(A, B, C), {A, B} → C
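The second form of the BCNF definition (every nontrivial FD has a superkey on its left side) and one step of the normalization procedure can be sketched as below. The function names and the relation R(A, B, C) with {A, B} → C and C → B are hypothetical and chosen only for illustration; decompose_once removes Y minus X from r1, which coincides with As\Y whenever X and Y are disjoint.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """FDs X -> Y that are nontrivial and whose left side is not a superkey."""
    schema = set(schema)
    return [(x, y) for x, y in fds
            if not y <= x and closure(x, fds) != schema]

def decompose_once(schema, x, y):
    """Split r(As) on X -> Y into r1 (As minus Y) and r2 (X together with Y)."""
    schema = set(schema)
    return schema - (y - x), x | y

# Hypothetical relation R(A, B, C) with {A, B} -> C and C -> B.
r_schema = {"A", "B", "C"}
r_fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
bad_fds = bcnf_violations(r_schema, r_fds)
r1, r2 = decompose_once(r_schema, {"C"}, {"B"})
```

In this example C → B is the only violation, because the closure of {C} is {B, C} rather than the whole schema, and the decomposition step yields r1(A, C) and r2(B, C).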

12.6 Fourth normal form - 4NF - No multivalued dependencies

Problem: Multivalued dependencies are a consequence of 1NF, which does not allow an attribute to have a set of values. The problem arises where there are two or more multivalued dependencies.
Goal: There should not exist any nontrivial multivalued dependencies in a relation. To move from BCNF to 4NF, move any independently multivalued components of the primary key to two new parent entities.
Example: If an employee can have many skills and many dependents, move the skill and dependent information to separate tables, since they repeat AND since they are independent of each other. Emp(Name, Skill, Dependent) is replaced by EmpSkill(Name, Skill), EmpDep(Name, Dependent)
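The Emp example can be made concrete with a few tuples. The sketch below uses made-up data (one employee with three skills and three dependents) to show the redundancy in Emp and that the two smaller relations join back to it.

```python
# Hypothetical data: Ann has three skills and three dependents,
# and the two facts are independent of each other.
skills = ["SQL", "Prolog", "Java"]
dependents = ["Tom", "Sue", "Joe"]

# Emp(Name, Skill, Dependent) must pair every skill with every dependent.
emp = {("Ann", s, d) for s in skills for d in dependents}

# 4NF decomposition: EmpSkill(Name, Skill) and EmpDep(Name, Dependent).
emp_skill = {(name, skill) for name, skill, _ in emp}
emp_dep = {(name, dep) for name, _, dep in emp}

# Joining the two relations on Name reconstructs Emp.
rejoined = {(n, s, d) for n, s in emp_skill for m, d in emp_dep if n == m}

tuples_before = len(emp)                       # 3 * 3 tuples
tuples_after = len(emp_skill) + len(emp_dep)   # 3 + 3 tuples
```

Adding a fourth skill for Ann would require three new tuples in Emp but only one in EmpSkill.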


12.7 Fifth normal form - 5NF

Goal: Keep splitting the tables until either of two states is reached:
- Further splitting would result in tables that could NOT be joined to recreate the original.
- The only splits left are trivial ().

By now, you've seen that normalization results in splitting one table into two or more tables to eliminate anomalies. One tacit property of this splitting is that the designer can always reconstruct the original table by joining the new ones created during normalization. Fifth normal form differs from the definitions of the previous normal forms in that 5NF defines a goal to be reached, rather than the resolution of a particular anomaly. It requires a great deal of intuition about the data on the part of the designer. Current practice pays scant attention to it.


Part V

Advanced topics Limited Coverage


Chapter 13

IM7 Transaction Processing


Connections Related to: Prerequisites: Requisite for: Query optimization

Topics:
Transactions
Failure and recovery
Concurrency control

Learning objectives:
1. Create a transaction by embedding SQL into an application program.
2. Explain the concept of implicit commits.
3. Describe the issues specific to efficient transaction execution.
4. Explain when and why rollback is needed and how logging assures proper rollback.
5. Explain the effect of different isolation levels on the concurrency control mechanisms.
6. Choose the proper isolation level for implementing a specified transaction protocol.

[Figure 13.1: Transaction Processing System. Components shown: Transactions, TP Monitor, DBMS, Database]

13.1 Transactions

13.1.1 Overview of transaction processing

Transaction Changes made in real time to a database are called transactions. Examples include ATM transactions, credit card approvals, flight reservations, hotel check-in, phone calls, supermarket scanning, academic registration and billing.

Transaction processing system A system that manages transactions and controls their access to a DBMS is called a TP monitor. A transaction processing system (TPS) generally consists of a TP monitor, one or more DBMSs, and a set of application programs containing transactions. See Figure 13.1. Online transaction processing (OLTP).

ACID properties The main goal of a transaction processing system is to maintain the correspondence between the database and the real-world situation it is modeling as events occur in the real world. The transactions of a transaction processing application should satisfy the ACID properties.

Atomic Each transaction is executed completely or not at all.

Consistent Each transaction maintains database consistency by maintaining all database integrity constraints. It is assumed that if a transaction starts in a state which satisfies all integrity constraints, then when the transaction completes the database must be left in a state which satisfies all integrity constraints.
committed When a transaction has successfully completed, it is said to be committed.
aborted If a transaction does not successfully complete, it is said to be aborted.
rolled back If a transaction is aborted, the system must ensure that partial changes are undone or rolled back.

Isolated The concurrent execution of a set of transactions has the same effect as some serial execution of that set. No race conditions.

Durable The effects of committed transactions are permanently recorded in the database. The system must not lose information in spite of crashes.


13.1.2 Atomicity and Durability


Write-ahead log. A log is a sequence of records that describe database updates made by transactions.
- used for both atomicity and durability
- is a sequential file, often duplexed to survive media failure
- update (undo) record: before image, transaction id, and changed item
- Log: for each transaction: begin record with transaction id ... update record ... etc ... commit record
- aborted transactions include an abort record
- checkpoint record: list of all uncommitted transactions
- the update record is written to the log before the DB is updated, hence the name write-ahead log

13.2 Failure and Recovery


Rollback - read the log in reverse until the begin record is reached, undoing the transaction using the before images.

Recovery from mass storage failure
- Log and mirrored disks: disk replacement requires resynchronization; good performance
- Log and one DB: periodically copy the DB (dump)

Restore after failure
- Copy the dump file to a new disk
- Restore information from the log: read the log backwards to determine the commits, then read the log forwards applying the writes corresponding to the commits.

A fuzzy dump avoids the need to take the system off line.
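The write-ahead rule and rollback by before images can be illustrated with a toy in-memory store. This is a minimal sketch, not a description of any real DBMS: the class TinyDB, its record layout, and the use of None to mean "item did not exist" are all assumptions made for the example, and nothing is written to disk.

```python
class TinyDB:
    """Toy key-value store with an in-memory write-ahead (undo) log."""

    def __init__(self):
        self.data = {}
        self.log = []   # sequence of log records (tuples)

    def begin(self, tid):
        self.log.append(("begin", tid))

    def write(self, tid, item, value):
        before = self.data.get(item)   # before image (None = item absent)
        # Write-ahead: the update record is logged first ...
        self.log.append(("update", tid, item, before))
        # ... and only then is the database itself changed.
        self.data[item] = value

    def commit(self, tid):
        self.log.append(("commit", tid))

    def rollback(self, tid):
        # Read the log in reverse until the begin record of tid is reached,
        # restoring the before image of every update made by tid.
        for record in reversed(self.log):
            if record[1] != tid:
                continue
            if record[0] == "begin":
                break
            if record[0] == "update":
                _, _, item, before = record
                if before is None:
                    self.data.pop(item, None)
                else:
                    self.data[item] = before
        self.log.append(("abort", tid))

db = TinyDB()
db.begin(1)
db.write(1, "balance", 100)
db.commit(1)
db.begin(2)
db.write(2, "balance", 40)
db.write(2, "fee", 5)
db.rollback(2)   # transaction 2 is aborted; its partial changes are undone
```

After the rollback the store again holds only the committed value of transaction 1, and the log ends with an abort record for transaction 2.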

13.3 Concurrency Control

Serial execution may result in
1. unacceptably small transaction throughput (transactions/sec) and
2. an unacceptably long response time for users;
therefore, concurrency must be available, but race conditions must be prevented to maintain the integrity and consistency of the database.

Serializability - a schedule that is equivalent to a serial schedule is called a serializable schedule. For a concurrent schedule and a serial schedule to be equivalent:
- Values returned by corresponding reads are the same
- Updates occur in the same order.

Goal of Concurrency Control: transform the arriving sequence of database requests into a strict, serializable schedule.

Focus of this section: The interaction between a transaction, concurrency control and a database. The interaction can be
- immediate-update (writes are immediate, reads return the values in the database)
- deferred-update (writes are saved in the intentions list, commits use the intentions list to update the database, reads return values that the transaction has written or that were written by a committed transaction).

Implementation of concurrency
Two-Phase Locking - commercial systems implement serializability using a strict two-phase locking protocol.
Deadlock - See OS3

Locking in relational databases
- Table locks are coarse and may result in loss of concurrency
- Lock tuples returned by a SELECT statement
- Phantoms - tuples added to the DB after the execution of the SELECT statement

Isolation levels (increasing order of strength)
1. READ UNCOMMITTED - dirty reads are possible
2. READ COMMITTED - dirty reads are not permitted (but nonrepeatable reads are possible)
3. REPEATABLE READ - successive reads of the same tuple by the same transaction will not yield different values (but phantoms are possible)
4. SERIALIZABLE - phantoms are not permitted; transactions are serializable

Lock granularity and intention locks
- tuple lock (fine grain)
- page lock (medium grain)
- table lock (coarse grain)
Intention locks ...
Serializable locking strategy using intention locks ...
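One standard way to test a schedule for serializability is the precedence-graph test for conflict serializability. It is brought in here as an illustration and is not taken from these notes; it is a sufficient condition that is somewhat stricter than the equivalence described above. The schedule format (transaction, operation, item) and the function names are assumptions.

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj for conflicting operations; schedule is a list of
    (transaction, op, item) triples with op equal to 'r' or 'w'."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove transactions with no incoming edge from a remaining
    # transaction; if none can be removed, the remaining ones form a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True

# Lost update: both transactions read x before either writes it.
lost_update = [("T1", "r", "x"), ("T2", "r", "x"),
               ("T1", "w", "x"), ("T2", "w", "x")]
# T1 finishes its work on x before T2 starts.
one_after_other = [("T1", "r", "x"), ("T1", "w", "x"),
                   ("T2", "r", "x"), ("T2", "w", "x")]
```

The lost-update schedule produces edges in both directions between T1 and T2, which is the kind of race condition that a locking protocol is meant to prevent.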

13.4 Distributed Transactions

Chapter 14

IM8. Distributed databases


Chapter 15

IM9 Physical Database Design


Connections Related to: Prerequisites: Requisite for: Query optimization

Topics/Lectures:
Storage and file structure
Disk organization and file structure. Assumption: each table is stored in a separate file.
Heap files (simplest file structure): rows are appended to the end of the file as they are created.
Sorted files: the table is sorted based on the values of some attribute(s).
Indexed files: index file + data file. The index uses an appropriate search key, which may involve multiple columns and may not uniquely identify individual rows. The index consists of a set of index entries + a mechanism for locating entries.
  create index index_name on table ( search_key )
  drop index index_name
Changes in the table require updating of the indexes.

Multilevel indexing for sorted files
Indexed-sequential access (ISAM)
B+ trees
Special-purpose indices
Bitmap indices
Join indices
Hashed files
Signature files
B-trees
Files with dense index
Files with variable length records
Database efficiency and tuning

Guidance for choice of search key:
1. A column used in a join condition might be indexed.
2. A clustered B+ tree index on a column that is used in an ORDER BY clause makes it possible to retrieve rows in the specified order.
3. An index on a column that is a candidate key makes it possible to enforce the unique constraint efficiently.
4. A clustered B+ tree index on a column used in a range search allows elements in a particular range to be quickly retrieved.

Additional suggestions
Throughput problems - identify problematic queries: frequently used queries, long running queries
Concurrency - index based locking improves concurrency.

Learning objectives:
1. Explain the concepts of records, record types, and files, as well as the different techniques for placing file records on disk.
2. Give examples of the application of primary, secondary, and clustering indexes.
3. Distinguish between a nondense index and a dense index.
4. Implement dynamic multilevel indexes using B-trees.
5. Explain the theory and application of internal and external hashing techniques.
6. Use hashing to facilitate dynamic file expansion.
7. Describe the relationships among hashing, compression, and efficient database searches.
8. Evaluate costs and benefits of various hashing schemes.
9. Explain how physical database design affects database transaction efficiency.


Chapter 16

IM10. Data mining


Chapter 17

IM11. Information storage and retrieval


Chapter 18

IM12. Hypertext and hypermedia


Chapter 19

IM13. Multimedia information and systems


Chapter 20

IM14. Digital libraries


Part VI

Tools and RDBMS Specifics

Chapter 21

Interaction with a RDBMS


To be student constructed

21.1 Design Tools

21.2 Vendor Specific DB Tools

21.2.1 MySQL
21.2.2 PostgreSQL
21.2.3 Oracle
21.2.4 DB2
21.2.5 SQL-Server
21.2.6 Other


Part VII

Project


Chapter 22

Quick-Kill Project Management


How to do smart software development with a small team even when facing impossible schedules.1

22.1 The Quick Kill

Quick-Kill project management is a project management/development technique for a small team faced with a high-pressure project and little time for development. Ideally, project management takes either a dedicated project manager or a lot of the project lead's time. The lead developer's job isn't management; it's development. He needs to spend most of his time designing the solution, designing the software, and building the code. But if there is neither the time nor the budget to do project management, it is necessary to have a highly directed system of practices that gives project leads a good trade-off, one that yields the most gain for the least effort. Quick-Kill project management consists of three techniques that leads can use to help their project produce what the boss expects and users need (Figure 22.1). Each of these techniques takes little time to implement, and helps the team avoid some of the most common and costly project pitfalls. Using them, leads can vastly improve the odds of delivering acceptable software.
1 A condensation of Andrew Stellman and Jennifer Greene's Quick-Kill Project Management, Dr. Dobb's Journal, Jun 30, 2006. http://ddj.com/dept/windows/189401902 Andrew and Jennifer are the authors of Applied Software Project Management (O'Reilly & Associates). They can be contacted at www.stellman-greene.com.


Vision and scope document
Work breakdown structure
Code review
Figure 22.1: Quick Kill Project Techniques

1. Problem Statement
   (a) Project Background
   (b) Stakeholders
   (c) Users
2. Vision of the Solution
   (a) Vision Statement
   (b) List of Features
   (c) Features That Will Not Be Developed
Figure 22.2: Vision and scope document outline

22.2 Vision and Scope Document: Up To 6 Hours

The vision and scope document (see Figure 22.2) should be brief, no more than a couple of pages. Writing a vision and scope document takes less than a day, and helps the team avoid weeks of rework and false starts.

The first step in writing a vision and scope document is to talk to project stakeholders. The lead should find the people who will be most impacted by the project, either because they plan on using it or because they are somehow responsible for it being developed. Gathering their needs should take less than an hour per stakeholder. All of the information the team gathers by talking to the stakeholders should be added to the Problem Statement section.

The Project Background section is a summary of the problem that the project solves. It should provide a brief history of the problem and an explanation of how the organization justified the decision to build software to address it. This section should cover the reasons why the problem exists, the organization's history with this problem, any previous projects that were undertaken to try to address it, and the way that the decision to begin this project was reached.

The Stakeholders section is a bulleted list of the stakeholders. Each stakeholder may be referred to by name, title, or role (support group manager, SCTO, senior manager). The needs of each stakeholder are described in a few sentences. The Users section is similar, containing a bulleted list of the users. As with the stakeholders, each user can be referred to either by name or by role (support rep, call quality auditor, home web site user). The needs of each user are described.

The goal of the Vision Statement section is to explain the purpose of the project and to describe what the project is expected to accomplish. It should provide a compelling reason, a solid justification for spending time, money, and resources on the project. It should provide a general statement describing how the needs of the stakeholders and users will be filled.

The List of Features and Features That Will Not Be Developed sections contain a concise list of exactly what will and won't be built. Every single feature in each list should be built to address a specific need listed in the Problem Statement section. Those features which seem obvious, but which do not really address a need, should be described in the Features That Will Not Be Developed section.

Task    Est. Time    Assumptions
1.
...
n.
Figure 22.3: Work Breakdown Structure

22.3 Work Breakdown Structure: 2 Hours

The work breakdown structure (WBS) consists of a list of 10-20 tasks that will produce the final product, a time estimate for each task, and a list of assumptions for that task. The total time for creating the WBS will typically be about two hours. It should take less than an hour for the team to brainstorm this list of tasks. Another hour is needed for the team to come up with the assumptions and estimates for the tasks in the WBS. A record must be created for the tasks, their estimates, and the assumptions (Figure 22.3).

The list of tasks is developed by the team in a brainstorming session led by the lead developer. Then the lead should bring up each task one by one, and for each task the team should decide how long it will take. Estimating a project requires the team to figure out in advance the steps required to complete it, and to come up with a number of days (or weeks, months, hours, and so on) that each step requires. The estimates are necessarily based on incomplete information. To deal with incomplete information, the team must make assumptions about the work to be done. Disagreement on how long a task takes is likely due to different assumptions about details of the work product or about the strategy for producing it. As the assumptions are discovered, they should be written down. In the course of its discussion, the team will think through many of the details that would otherwise be left unaddressed until later in the project, and the team will start to make decisions about how the software will be built.

1. Select code segment
2. Distribute copies for individual study
3. Conduct review meeting
4. Log defects
5. Assign repair task
6. Review updated code
Figure 22.4: Code Review

22.4 Code Reviews: 2.5 Hours Per Review

In a code review (Figure 22.4), the team examines a sample of code and fixes any defects in it. To prepare for the review, the lead distributes a printed copy of the selected code (with line numbers) to each team member. The team members each spend half an hour individually reading through (and, if possible, executing) the code sample. Each team member tries to spot as many defects as possible, marking them down on the printed copy. After the individual preparation, the team leader gathers everyone for the review meeting.

The code review session starts with the lead developer reading the first small block in the code sample; he simply gives a brief description (about one sentence) of the purpose of that block. Team members should then discuss any defects found in that block of code. If anyone found a defect, the lead developer should decide whether or not the team can come up with a way to fix it on the spot. The defect and discussion are documented in the review log (Figure 22.5). Once the meeting is over, the lead should e-mail the log to the rest of the team and assign defect repairs to whoever is responsible for that block of code. Once the defects are repaired, he should review the updated code to make sure that it was repaired properly.

Date:                Module:
Defect | Line # | Who found it | How to fix | Fix assigned to | Sign-off
Figure 22.5: The Review Log


22.5 Exercises

1. Construct a rationale for the vision and scope document; consider the programmer, the team, the lead developer, the employing organization, and the stakeholders.
2. Construct a rationale for the work breakdown structure; consider the programmer, the team, and the lead developer.
3. Construct a rationale for code reviews; consider the programmer, the team, the lead developer, the employing organization, and the stakeholders.
4. What criteria should be used to select the sample of code to be inspected for a code review?
5. What should one look for in a code review?
6. Suggest uses for the review log once the project is completed.
7. Evaluate Quick-Kill with respect to the five basic process groups and nine knowledge aspects of project management.
8. Evaluate Quick-Kill with respect to CMM-Software.
9. Evaluate Quick-Kill with respect to project risk management.

Chapter 23

Project Reports: DB Design and Implementation


23.1 Phase 0 Problem Description

The following is an example. It is intended to get you started in the right direction in designing your system. You as the designer must analyze and decide what other details or features should be specied for your system. Thus, individual group implimentations will dier in terms of design and implementation styles. The Allegro music store serves a wide variety of musical interests. As a small store, it must maintain close control of its inventory. The store keeps an inventory of music in stock. The store also keeps track of customer orders, which are placed either online or through an employee of the store. Your task is to design the database and application programs that will help manage the inventory and the day to day processing. Note that certain functions like orders with vendors, automatically-generated orders, receipt of shipments, etc. are left out in order to reduce the size of the project. There are four types of processes that are relevant: [query] This process allows store employees to query the database with regard to music in stock and music on order. [order] This process generates orders by customers for music. Orders are placed online (dont worry about online details) or called in by phone. A store employee inputs orders. [sell] This process modies the database appropriately, regarding the item(s) being sold and the employee making the sale. It is typically the operation done at the cash register. 157

158CHAPTER 23. PROJECT REPORTS: DB DESIGN AND IMPLEMENATION [billing] An invoice is generated for every order placed by a customer. A receipt is printed for every in-store sale transaction. [admin] This process modies the database information about employees, customers, vendors, etc. It may have other management report features which are left out for this project. The Allegro music store sells music in 3 dierent forms: cassette, compact disk, and sheet music. A given music title has several attributes associated with it: examples are title, artist(s), producer, musical subject, physical type of music (cassette, compact disc, or sheet music), year of recording, etc. An employee can query the music in stock by searching on any one or more attributes of the music. Music vendors can distribute several music titles; however, each music title can only be distributed by one vendor. Vendor information (name, address, phone, contact person, etc.) is also maintained in the database. Employees are in charge of several tasks: selling music to customers, ordering music from vendors, and/or generating sales reports. Employee information is also maintained: ID, name, address, pay level, job title, etc. The store can place orders to vendors when stock runs low or when a customer requests a special order. A particular order can only go to one vendor: this means that all items being ordered from a particular vendor can exist on the same order. However, if items are distributed by diering vendors, an order form for each vendor must be lled out. Example attributes of orders are the item(s) to be ordered, date of order, vendor to whom the order is going, and customer information if applicable. *Note: In Phase I you should consider vendor orders for restocking purposes as a part of the system. In Phases II and III, they will be left out. Customer information is maintained when special orders are placed for that customer. 
Customer information can also be maintained optionally when selling music: for example, customers can fill out a card to be put onto a mailing list. *Note: Advertising and promotion features are out of the scope of this project.

Sales information must be maintained whenever music items are sold: amount of purchase, each item purchased, customer information (if applicable), the employee making the sale, etc.

Statistics

Sample statistics relevant to the applications include the following. They would affect the real operational environment of the database system; here, they are offered as general information only.

CUSTOMERS:
  30,000 customers in the database
QUERIES:
  Sales reports are done once weekly by music title

MUSIC:
  50,000 different music titles
  Between 1 and 20 copies of any music title are in stock at any given time
VENDORS:
  200 different vendors supply music
CUSTOMER ORDERS/SALES:
  300 customer orders per day are submitted online (each order has on average 5 titles)
  10 clerk orders per day are taken over the phone (each order has on average 3 titles)
  500 orders are queried per day to check on the status of the order
  500 sale transactions per day in the store, with an average of 3 titles per sale
ORDERS TO VENDORS:
  Every week about xx orders are placed with vendors, each with about 25 items (each item with an average quantity of 20)
SHIPMENTS RECEIVED FROM VENDORS:
  Every day yy shipments are received from vendors. Each shipment corresponds to an order.

NOTE: Values of xx and yy above can be estimated but are left out.

Assumptions

The following assumptions will be made regarding the day-to-day activities of the Allegro Music store. More assumptions may be added at a later date. Assume that the database is implemented for transactions on or after Jan. 1, 2000.

Music
Music titles can come in 3 different forms: cassette tape, compact disc, or sheet music. At any one time the store may have all three forms in stock for a particular music title. Therefore, the database must keep track of which forms of music are in stock for each music title.

Vendors
A vendor may publish several music titles; however, a particular music title can be published by one and only one vendor.

Orders to Vendors
When placing a regular order, all music by the same vendor may exist on the same order. Music from different vendors must be placed on different orders: one vendor per order. You can create a unique order ID by using the date as part of the order number. The date should be in the format YYMMDD.
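The YYMMDD convention above could be sketched as follows. This is a minimal illustration in Python rather than the Pro*C the project requires. The function name `make_order_id`, the per-day sequence number, and the hyphen separator are all assumptions: the text only says that the date should be part of the order number.

```python
from datetime import date


def make_order_id(order_date: date, sequence: int) -> str:
    """Build an order ID from the date (YYMMDD) plus a per-day sequence number.

    The sequence suffix is a hypothetical way to keep IDs unique when
    several orders are placed on the same day.
    """
    if not 1 <= sequence <= 9999:
        raise ValueError("sequence must be between 1 and 9999")
    if order_date < date(2000, 1, 1):
        # The notes assume transactions on or after Jan. 1, 2000.
        raise ValueError("order date must be on or after Jan. 1, 2000")
    return f"{order_date:%y%m%d}-{sequence:04d}"


print(make_order_id(date(2006, 9, 29), 7))  # 060929-0007
```

Note that a two-digit year only stays unambiguous because of the assumption that no transaction predates 2000.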

Customer orders may be placed in one of two ways. The customer can use an online order form to place an order, or the customer may ask a store employee to place the order for them. *Note: you should not be concerned with how the online form is implemented. Your only concern is to make sure the order information gets recorded in the database, regardless of the method of input.

Customers
Customer information is optionally stored in the database. For orders placed online, recording customer information is mandatory. For in-store customers, it is optional. If a customer places a special order, then the customer's name and telephone number must be recorded.

Employees
Employees of the store will be the users of this database system, not the customers themselves. Information on every employee is in the database for management purposes.

Sales
Each sale is made by one employee to one customer. The sale can include multiple music titles, and should also include the quantity and price of each item and a grand total. Upon making the sale, the system should decrement the music in stock appropriately.

Applications For the Allegro Music Store

You are to develop a menu-driven application system for the Allegro Music Store database using Pro*C. The following are examples of some of the menus to be developed. All applications described below MUST be implemented. However, you may choose to add more functions. ** Note: You may add more menu screens as necessary; the screens below are intended only as a guide.

Allegro Music - Main Menu
  1. Music Applications
  2. Ordering Applications
  3. Sales Applications
  4. Administrative Applications    Access restricted to Managers ONLY
  5. Exit
  Select Item:

For item 1 (Music Applications) in the Main Menu, another menu would appear as shown below, which allows a sales clerk to add a music title to the inventory, change information about any title (based on ISBN), and query the music title information.

Music Applications - Menu
  1. Add A Music Title
  2. Change Music Information
  3. Query Music Information
  4. Return to Main Menu
  Select Item:

Each of these items would require further information about the music title to be entered by the user. Adding music requires all parameters of the title to be entered into the database. Changing music information stored in the database requires the changed field(s) to be entered, and the database must be updated appropriately. To query music information, the system must prompt the user to enter the desired search criteria and then return the results in a tabular format.

For item 2 (Ordering Applications) in the Main Menu, another menu would appear as shown below, which allows a sales clerk to query the order data (based on order number, customer, or vendor), place a new order, or change an existing order.

Order Applications - Menu
  1. Query Order Information
  2. Place New Order
  3. Change Existing Customer Order
  4. Return to Main Menu
  Select Item:

Each of these items would require further information about the order from the user. To query an order, the user must enter search criteria, and results of the query must be returned in a tabular format. To place an order, the system must prompt for the order information and store it in the database; if the order is for a customer, then customer information must be stored also. To change order information, the user must first pull up the order in question, then change the appropriate field(s). The changes are then stored in the database.

For item 3 (Sales Applications) in the Main Menu, another menu would appear as shown below. The first option allows a clerk to record in the database that a given music title(s) has been sold. Appropriate customer information is also stored. The second option produces five sales reports. Note that only managers are allowed to produce sales reports.

Sales Applications - Menu

  1. Sell A Music Title(s)
  2. Sales Reports    Access restricted to Managers ONLY
  3. Return to Main Menu
  Select Item:

The first choice is continuously used at the cash registers. No customer information is required for a sale. A sale should allow for multiple music titles to be sold during one transaction. A receipt should also be generated.

For item 2, another menu would be produced which would allow the manager to produce five sales reports. Note that the system must have some method of recognizing that the user is a valid manager (use some kind of login method).

Sales Reports - Menu
  1. Total Sales by Title
  2. Total Sales by Customer
  3. Total Sales sorted by Vendor
  4. Total Sales sorted by Subject
  5. Total Sales - Top n Titles
  6. Return to Main Menu
  Select Item:

All sales reports will prompt the manager to enter starting and ending dates for the report. Depending on which report is chosen, additional information may be needed: vendor, subject, type of music, etc. All reports should list sales in decreasing order: e.g., the highest sale first.

The first report should list total sales per day for a particular title during the time period specified by the starting and ending dates.

The second report should list all titles ordered for and sold to a particular customer. *Note that this is only applicable for those customers who have ordered music, since their information is already in the database. This does not apply to walk-in customers who are not listed in the database.

The third report shows all titles sold per vendor, sorted by vendor name. It should list a subtotal for each vendor, and a sales total at the end of the report.

The fourth report shows all titles sold per subject, sorted by subject name. It should list a subtotal for each subject, and a sales total at the end of the report.

The fifth report shows the top n sales of music titles in decreasing order. For instance, if n is 20, then the top 20 best-selling music titles will be listed with their total sales in decreasing order.
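As an illustration of the fifth report, the sketch below computes the top n titles for a date range. It uses Python with an in-memory SQLite database purely so that it is self-contained; the project itself calls for Pro*C against Oracle, whose row-limiting syntax differs from SQLite's `LIMIT`. The table names `music` and `sale_item`, their columns, and the sample rows are all hypothetical, not a prescribed schema.

```python
import sqlite3


def top_n_titles(conn, start, end, n):
    """Return (title, total_sales) for the n best-selling titles in [start, end]."""
    return conn.execute(
        """
        SELECT m.title, SUM(s.quantity * s.unit_price) AS total_sales
        FROM sale_item AS s
        JOIN music AS m ON m.music_id = s.music_id
        WHERE s.sale_date BETWEEN ? AND ?
        GROUP BY m.music_id, m.title
        ORDER BY total_sales DESC, m.title
        LIMIT ?
        """,
        (start, end, n),
    ).fetchall()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE music (music_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE sale_item (
    sale_date  TEXT NOT NULL,     -- ISO date, e.g. 2006-09-29
    music_id   INTEGER NOT NULL REFERENCES music(music_id),
    quantity   INTEGER NOT NULL,
    unit_price REAL NOT NULL
);
INSERT INTO music VALUES (1, 'Title A'), (2, 'Title B'), (3, 'Title C');
INSERT INTO sale_item VALUES
    ('2006-09-01', 1, 2, 10.0),
    ('2006-09-02', 2, 5, 10.0),
    ('2006-09-03', 3, 1, 10.0),
    ('2006-10-15', 3, 9, 10.0);   -- outside the September range
""")

for title, total in top_n_titles(conn, "2006-09-01", "2006-09-30", 2):
    print(f"{title:10s} {total:8.2f}")
```

The other four reports follow the same pattern: filter on the date range, group by the reporting attribute (title, customer, vendor, or subject), and order by the summed sales.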


For item 4 (Administrative Applications) in the Main Menu, another menu would appear as shown below. This menu allows the manager to add/change customer information, publisher information, or employee information. Again, access to employee information is restricted to managers only.

Administrative Applications - Menu
  1. Customer Information
  2. Publisher Information
  3. Employee Information    Access restricted to Managers ONLY
  4. Return to Main Menu
  Select Item:

Each of these items would require further information. For each choice, the system should ask whether the user wants to add new information or change existing information. To add information, the system should ask for input and store it appropriately. To change information, the system should first display the current information in the database for a particular record, then prompt the user to change the appropriate information, and finally, store the changes. To access Employee information, managers must provide some sort of clearance (login id and password, for instance).
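One way to enforce the "Managers ONLY" restriction is to store it alongside each menu entry and check it at selection time. The sketch below is in Python for brevity (the project requires Pro*C); the names `ADMIN_MENU` and `select_item` are hypothetical, and how `is_manager` is established (for example, by a login id and password check) is deliberately left out.

```python
# Each entry maps a menu number to (label, managers_only).
ADMIN_MENU = {
    1: ("Customer Information", False),
    2: ("Publisher Information", False),
    3: ("Employee Information", True),   # Access restricted to Managers ONLY
    4: ("Return to Main Menu", False),
}


def select_item(menu, choice, is_manager):
    """Return the label for a menu choice, or raise if it is invalid or not permitted."""
    if choice not in menu:
        raise ValueError(f"no such menu item: {choice}")
    label, managers_only = menu[choice]
    if managers_only and not is_manager:
        raise PermissionError(f"'{label}' is restricted to managers")
    return label


print(select_item(ADMIN_MENU, 1, is_manager=False))
try:
    select_item(ADMIN_MENU, 3, is_manager=False)
except PermissionError as err:
    print("denied:", err)
```

Keeping the restriction in the menu table means the same check can serve the Sales menu (Sales Reports) and the Main Menu without repeating the logic.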

23.2 Phase I Project Report

The Phase I Report must contain an analysis of the intended information system and the purpose of this phase of the project. It must describe the problems encountered in this phase and justify the solutions. It must contain all documentation produced in this phase. The Phase I Report will be graded out of 20 points total, counting as 20% of your final project grade.

Specifically, the following is expected:

1. (10 points) Analyze the entire application and come up with an extended ER diagram.
     include max cardinalities only
     show all attributes
     use some diagram editor to draw a clean diagram
2. (2 points) Make a list of semantic constraints that apply over and above what you can show in the diagram.
3. (2 points) Mention any assumptions you made in doing the above that go beyond what is given in the project description.

4. (5 points) Make a list of application areas: e.g., order processing, billing, vendor orders, inventory management, etc. (NOTE: as we said, not all of these are for implementation in Phases 2 and 3, but you should list all areas for Phase 1.) For each area, list what input documents are involved (just state the contents of the document), what outputs are produced (state contents in terms of attributes), and what database activity is involved (retrievals and updates).
5. (1 point) Besides the above, comment on the difficulties you faced in doing this conceptual design task.

NOTE: The above specification will drive your subsequent phases. As in any actual design and implementation, you will be revising what you do here as you proceed through the phases. Please be explicit and detailed when writing your report. This will help you in the long run!

23.3 Phase II Project Report

The Phase II Report must contain the goals of this phase of the project. It must contain the Phase I Report and must describe any revisions made to the specification described in the Phase I Report. It must further describe the problems encountered in Phase II and justify the solutions. It must contain all documentation produced in Phase II. The Phase II Report will be graded out of 20 points total, counting as 20% of your final project grade.

Although vendor orders are left out of Phases II and III, you will be responsible for making sure customer orders and sales are reflected in the database system. You can make the assumption that a customer order will automatically initiate a sale. E.g., when a customer places a special order, assume that the order will be delivered to them (we don't care how) and that the store collects the customer's payment (again, we don't care how). This means that both the order and sales portions of the system will be updated.

Specifically, the following is expected:

1. Outline the goals for this Phase and briefly mention any revisions made to the previous Phase.
2. (8 points) Hand in your Phase 1 report, either the original or a copy. If you've modified your ER diagram, turn in the modified copy. IMPORTANT: Make sure that the ER diagram is correct.
   First section of report: Illustrate how you translated from the ER diagram to your relational schema. It should follow the ER-to-relational algorithm.


     Show each attribute's data representation: e.g., each column's data type
     Primary keys and foreign keys should be correctly identified
     Show constraints (in words) over and above referential integrity constraints for each table. E.g., "a customer cannot have 2 orders in one day" is a constraint on the order table and must be checked before an order can be generated.
3. (8 points) Next: Show SQL for creating tables (ONLY the table creation SQL). You should write SQL scripts, as mentioned in class, that will create and populate each table. Each table is required to have at least 5 tuples. Also, keep in mind the relationships between tables and plan your data accordingly. You should be able to run these scripts successfully in SQL*Plus with no errors and no integrity violations.
4. (4 points) Last section: Identify 8 (non-trivial) tasks from your tasks (e.g., application areas) of Phase 1. The tasks that you choose should be distinct from each other. E.g., you should NOT have 5 of your tasks be all 5 of the sales reports. Your tasks should be broken down into their associated subtasks. E.g., selling a music item requires putting sales information into the sales table AND decrementing the music in stock that was just sold. For each task, identify which tables are affected by that task and its subtasks, and show explicit operations (e.g., insert, delete, or modify) that are performed on tables.
   Example: To generate an order, first you must gather order information: customer, names/IDs of music to order, employee placing the order, etc. Next, check to make sure the customer has not already placed an order that day. Next, information must be inserted into the appropriate tables: insert into Order_MusicTable[orderID, musicID, quantity]; insert into OrdersTable[cust_name, employee_name, date, orderID, totalquantity, ...], etc.
   Denote any constraints that must be met before actions are performed on the table(s).
   Note: You do not have to implement the SQL or C code for your tasks yet; that will be done for Phase 3.
5. Identify any difficulties you may have had with this phase of the project.
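The order-generation example in item 4 can be made concrete as follows. This is only a sketch under stated assumptions: it uses Python with an in-memory SQLite database instead of Pro*C and Oracle, the table and column names are adapted from the example (`orders`, `order_music`), and the function `place_order` is hypothetical. Identifying the customer by name follows the example's attribute list; a real design would more likely use a customer ID.

```python
import sqlite3

# Hypothetical tables modeled on the OrdersTable / Order_MusicTable example.
SCHEMA = """
CREATE TABLE orders (
    order_id       TEXT PRIMARY KEY,
    cust_name      TEXT NOT NULL,
    employee_name  TEXT NOT NULL,
    order_date     TEXT NOT NULL,
    total_quantity INTEGER NOT NULL
);
CREATE TABLE order_music (
    order_id TEXT NOT NULL REFERENCES orders(order_id),
    music_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, music_id)
);
"""


def place_order(conn, order_id, cust_name, employee_name, order_date, items):
    """Insert one order and its line items; items maps music_id -> quantity."""
    with conn:  # commit on success, roll back if anything below raises
        # Constraint check: a customer cannot have 2 orders in one day.
        already = conn.execute(
            "SELECT 1 FROM orders WHERE cust_name = ? AND order_date = ?",
            (cust_name, order_date),
        ).fetchone()
        if already:
            raise ValueError(f"{cust_name} already placed an order on {order_date}")
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
            (order_id, cust_name, employee_name, order_date, sum(items.values())),
        )
        conn.executemany(
            "INSERT INTO order_music VALUES (?, ?, ?)",
            [(order_id, music_id, qty) for music_id, qty in items.items()],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
place_order(conn, "060929-0001", "A. Customer", "E. Clerk", "2006-09-29", {101: 2, 205: 1})
try:
    place_order(conn, "060929-0002", "A. Customer", "E. Clerk", "2006-09-29", {310: 1})
except ValueError as err:
    print("rejected:", err)
```

Wrapping the check and both inserts in one transaction means a failed line item cannot leave behind an order header with no items, which is the kind of subtask dependency item 4 asks you to identify.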


23.4 Phase III Project Report

Deadlines:

The report is due on: the last day of class
You must also demo your project during the last week of class.

Requirements:

Phase III will be graded out of 60 points total (i.e., 60% of your project grade for this class).

Demo: 50 points

Time: You will have 20 minutes in which to set up and demo your project.

Grading strategy: We will be making a checklist of all features which we expect and will grade according to how well these features are implemented and working. We expect you to follow the sample menus provided in the project description and to provide functionality for each task listed in those menus.

What is expected:
  The program MUST run (preferably without any core dumps)
  Plan the demo ahead of time (walk through the tasks that you want to demo)
  Make sure you have ample sample data in your tables so that you can sufficiently perform all possible tasks and illustrate each expected function
  No fancy stuff (don't waste time on a fancy GUI!), but your interface should be reasonable and menu-driven
  We really hope to see all the functionality! Don't miss out important tasks like Orders and Sales. If you do miss any tasks, then make sure that their functionality is demonstrated somewhere else: e.g., we will give partial credit if you have source code which illustrates the functionality of that task.

Report: 10 points

What is expected:
  Description of implementation, problems faced, ... (less than 3 pages)
  User's guide (less than 4 pages)
  All source code (in Pro*C), including the SQL scripts that create and populate your tables