Course Schedule

Course Modules

Review and Practice

Exam Preparation

Resources

Module 3: File and database organization


Overview
This module introduces the basic concepts of files and databases, their components, and organization.
Database characteristics, advantages, and disadvantages will be reviewed, followed by a comparison of
hierarchical, network, and relational databases. You will also study database management systems and new
developments.

Test your knowledge


Begin your work on this module with a set of test-your-knowledge questions designed to help you gauge the
depth of study required.

Topic outline and learning objectives


3.1  Data organization and information
     Describe how fields, records, files, and databases are organized within a data hierarchy. (Level 1)

3.2  Database organization methods
     Describe database organization and database components. (Level 1)

3.3  Database management systems
     Describe a database management system and explain why it is needed. (Level 1)

3.4  Database storage and analysis
     Describe database storage techniques. (Level 2)

3.5  Database developments
     Describe database developments, including data warehousing, data marts, and data mining. (Level 2)

Module summary

Module 3: Test your knowledge


1. Multiple choice

a. Which of the following is the lowest level in the hierarchy of data?
   1. Entity
   2. Field
   3. File
   4. Record

b. What is a data definition language?
   1. A language used by Java to define data on the Web
   2. A language used in data communication to route data packets
   3. A language used to define and describe data and data relationships in a database
   4. A language used in decision support systems to define data

c. What is the most important characteristic of a primary key field?
   1. It is short.
   2. It is a file.
   3. It represents an entity.
   4. It is unique.

d. Which of the following is a problem of the traditional file environment?
   1. Requires administrator for data maintenance
   2. Requires fourth-generation language to program
   3. Program data dependence
   4. Storage capacity

e. Which of the following file or database models has a many-to-many relationship?
   1. Indexed
   2. Flat file
   3. Hierarchical
   4. Relational

Solution
2. Chapter 5, Review question 2, page 210
Solution
3. Chapter 5, Review question 13, page 210
Solution

3.1 Data organization and information


Learning objective

Describe how fields, records, files, and databases are organized within a data hierarchy. (Level 1)
Required reading

Chapter 5, pages 174-177


LEVEL 1

Databases can be used for business intelligence purposes such as obtaining product profitability, customer
profiles, and targeting promotions. In the opening case of Chapter 5, the Valero Energy company uses a
fully-integrated enterprise business intelligence system, called WebFocus, to make meaningful data available
and accessible throughout the organization.

Basic terms
Data is generally organized in a hierarchy that starts with a character and progresses into a database. For
illustrative purposes, let's look at the components of a student database that holds the students' names,
courses enrolled, and the students' grades. A character may be alphabetic, numeric, or a symbol, and each
character occupies a single position in a field. Each letter in a student's name is a character.
A field is a group of related characters and it is the smallest piece of information in a record. For example, in a
student file, one field could hold the first name of each student; in an accounts receivable file, one field could
hold the invoice number. A field can also hold graphical, video, or sound images. More than one field makes up
a record.
A record is a collection of related data fields. It holds all the information about an entity in the file. All the
records in a file must have the same fields.
A file is a collection of related records. Each file has a unique structure. For example, a paper-based file is
identified by a folder and all the pages it holds, organized in some fashion, perhaps with a table of contents. An
electronic file on a computer is identified by a filename, and holds all the records stored under the filename.
An entity is a generalized class of people, places, or things (objects) for which data is collected, stored, and
maintained. For example, in a student database, one entity could be a student. In general, each entity has at
least one record associated with it.
An attribute is a characteristic of an entity. In the above example, the student has a student number, name,
date of birth, and so on. Attributes are contained in the fields that are grouped by entities. Not only must each
record in a file contain the same fields, each field must hold the same type of information and have the same
attributes. An example of an attribute defined for the NAME field of a personnel file could be:
Field description:  NAME
Field type:         Character field
Field width:        30 characters
Field structure:    Last name, followed by a blank, then first name, followed by a blank, then initial.
                    The first character in the last and first names must be in upper case, subsequent
                    characters to be in lower case, unless specified otherwise. The initial is always in
                    upper case. If a name contains more than one initial, use the first initial only. If
                    the name is too long to fit in the field, drop the initial, then truncate (shorten)
                    the first name as needed.
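
The field structure rule above is effectively a small algorithm, so it can help to see it written out. The following is only a sketch: the function name format_name is hypothetical, and it assumes the plain "first letter upper case, the rest lower case" convention with no exceptions. The case where the last name alone overflows the field is not covered by the rule, so the handling shown for it is an assumption.

```python
def format_name(last, first, initial="", width=30):
    """Build a NAME field value: last name, blank, first name, blank, initial."""
    # First character in upper case, subsequent characters in lower case.
    last = last.strip().capitalize()
    first = first.strip().capitalize()
    # Use the first initial only, always in upper case.
    init = initial.strip()[:1].upper()

    full = f"{last} {first} {init}" if init else f"{last} {first}"
    if len(full) <= width:
        return full

    # Too long for the field: drop the initial first.
    full = f"{last} {first}"
    if len(full) <= width:
        return full

    # Still too long: truncate the first name to the room that is left.
    room = width - len(last) - 1
    if room <= 0:
        return last[:width]  # assumption: the rule does not describe this case
    return f"{last} {first[:room]}"
```

For example, format_name("smith", "JOHN", "qr") gives "Smith John Q", because only the first initial is kept and the case of each part is normalized.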

Each record can be seen as a row in a table and each field can be seen as a column. A database is an
organized collection of records in one or multiple tables.
All databases require that every record contain at least one key. A key is a field or set of fields that identifies
the record. A primary key is a field or set of fields that uniquely identifies each record in the table. A
secondary key is a field used for access and sorting that does not have to be unique; it is useful when the
field used to look up records does not by itself distinguish them. For example, in a file containing a student
directory, the key field could be the name, and the secondary field could be the address, so that in case of
identical names, the secondary field can be used for sorting.
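
To make the role of keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the column names, and the sample rows are all made up for illustration; student_id is assumed to be the primary key, and the address is used only to order records that share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"  # primary key: every value must be unique
    " name TEXT NOT NULL,"
    " address TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "Lee Pat", "22 Oak St"),
        (2, "Lee Pat", "10 Elm St"),  # an identical name is allowed
        (3, "Adams Kim", "5 Pine St"),
    ],
)

# Re-using a primary key value is rejected by the database.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Wong Sam', '1 Main St')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Sort by name, then by address to order records with identical names.
directory = conn.execute(
    "SELECT student_id FROM student ORDER BY name, address"
).fetchall()
```

The two "Lee Pat" records are both stored, and the address decides their order in the sorted directory.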

3.2 Database organization methods


Learning objective

Describe database organization and database components. (Level 1)


Required reading

Chapter 5, pages 179-186


LEVEL 1

Database approach
As computer applications became more complex and required the use of several related files, database
techniques were developed to meet these needs. The Data Base Task Group of the Conference on Data
Systems Languages (CODASYL) published the first formal documentation of the key features of databases in
1971. This publication, which has been updated several times, has become the model that many software
developers use to develop databases.
Unlike the file approach, the database approach allows different applications (for example, accounting,
personnel, and payroll) to access the same database. Instead of organizing the data to meet the needs of a
particular application (for example, payroll), the database approach requires the organization to analyze its
overall information requirements, and then design a common database to meet the needs of multiple
applications.
Database systems provide a centralized repository of information that is not application-specific. The data in the
database is managed centrally with respect to data integrity, primary and secondary key management, and
indexing. Various applications access the database to update information. Because the information is no longer
organized in application-specific files, it is much easier to update or change software applications as long as the
information is used as structured in the database.
The database approach requires the use of database management systems (DBMS).

Data modelling
Logical design describes logical relationships among data and groups them in a logical order, whereas
physical design takes the logical design and structures it for efficiency and effectiveness. For example, it
might be more effective to create summary totals as data are entered, rather than calculate them each time
they are required, or some data attributes could be carried in more than one entity. These are examples of
planned data redundancy, with the goal of improving system performance to meet user needs.
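
The summary-total example can be sketched in a few lines of Python. The class name OrderBook and the amounts are hypothetical; the point is only that the running total is a second, redundant copy of information that could always be recalculated from the detail rows.

```python
class OrderBook:
    """Keeps a running total alongside the detail rows (planned redundancy)."""

    def __init__(self):
        self.amounts = []
        self.running_total = 0.0  # redundant copy of sum(self.amounts)

    def add(self, amount):
        self.amounts.append(amount)
        self.running_total += amount  # maintained as data are entered

    def recomputed_total(self):
        return sum(self.amounts)  # recalculated each time it is required


book = OrderBook()
for amount in (100.0, 250.0, 75.0):
    book.add(amount)
```

Reading book.running_total is a single lookup, while recomputed_total() has to scan every row. The trade-off is that the two values must be kept consistent whenever data change.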
An important tool for database designers is a data model, which is used to show relationships between
entities. If this is done at the highest level for the organization, it is known as enterprise modelling. A
commonly used tool for modellers is an entity-relationship (ER) diagram. By using these tools, designers
can ensure that relationships are logically structured so that when databases and application programs are
developed, they will in fact meet the needs of the system's users.

Database models
The data in a database can be interrelated in many ways. Historically, databases were organized in a
hierarchical or network structure. Today, the most popular structure is a relational database. Do not be overly
concerned with the mechanics of these structures. Instead, focus on the essential differences between the
database types, and the general organization of the data.
Hierarchical database

A hierarchical database organizes information in a tree-like structure in which data elements are related to
each other in a parent (superior) to child (subordinate) relationship. A data element can be a data field, a
record, or a database file.
The hierarchical database provides a one-to-many relationship, in a top-down manner. To access an employee
of any department, you must specify the department because department is the parent of employee. If you
have no information on the parent, it is impossible to retrieve the item because you must access the item
through its parent.
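
As a rough illustration of access through the parent, the sketch below models the department and employee example with nested Python dictionaries. The department names, employee numbers, and the function find_employee are invented for the example; a real hierarchical DBMS is far more elaborate.

```python
# Hypothetical data: each department (parent) owns its employees (children).
departments = {
    "Accounting": {"E100": "Ana Ruiz", "E101": "Ben Cole"},
    "Payroll": {"E200": "Chen Wu"},
}


def find_employee(department, employee_id):
    """Reach a child record only by going through its parent."""
    return departments[department][employee_id]


# Naming the wrong parent means the record cannot be reached.
try:
    find_employee("Accounting", "E200")
    found = True
except KeyError:
    found = False
```

find_employee("Payroll", "E200") succeeds, while the same employee number under "Accounting" is not found, because the only access path runs through the correct parent.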
A hierarchical structure is particularly useful for databases containing structured information where access to
information is keyed to the structure, that is, the logical access is in the same hierarchy as the physical layout
of the database. The rigid structure of a hierarchical database enables it to be updated efficiently. Typically, it
is used in applications such as inventory management, where a large number (hundreds of thousands or
millions) of records are in the system.
Network database

A network database is similar to a hierarchical database except that a child in the system can have more
than one parent. Thus, because more than one path to a particular data element exists, the database structure
is many-to-many. Network databases are particularly efficient for looking up information because they permit
access from more than one starting point. Unlike a hierarchical database, the process of querying a network
database is less restrictive. A network database is appropriate for situations where queries of the database may
not follow a predetermined pattern. An example is a database of students and their course enrolment, where a
student can be enrolled in multiple courses and each course can have many students. The relationship between
students and courses is thus many-to-many, and a hierarchical database is inappropriate.
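
The student and course example can be sketched with two lookup structures, one for each starting point. The names and enrolment pairs below are made up, and the dictionaries only imitate the idea of multiple access paths; they are not how a network DBMS is implemented.

```python
from collections import defaultdict

# Hypothetical enrolment data: (student, course) pairs.
enrolments = [
    ("Ana", "Accounting 101"),
    ("Ana", "Databases 201"),
    ("Ben", "Databases 201"),
]

courses_of = defaultdict(set)   # starting point 1: begin from a student
students_in = defaultdict(set)  # starting point 2: begin from a course
for student, course in enrolments:
    courses_of[student].add(course)
    students_in[course].add(student)
```

Here "Databases 201" has two parents on the student side (Ana and Ben), which is exactly what a strict one-parent hierarchy cannot represent.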
Relational database

A relational database uses two-dimensional tables called relations to store data. In the relational model,
each row of a table represents an entity, with the columns representing attributes. Each attribute can have
only certain predefined values, and these allowable values are called the domain. This provides automatic
error-checking features to all applications using the table.
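
One way to picture a domain is as a rule attached to the column itself. The sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module; the enrolment table and the list of allowable grades are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment ("
    " student_id INTEGER NOT NULL,"
    " grade TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D', 'F')))"
)
conn.execute("INSERT INTO enrolment VALUES (1, 'B')")  # inside the domain

try:
    conn.execute("INSERT INTO enrolment VALUES (2, 'Z')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
```

Because the rule lives in the table definition, every application that writes to the table gets the same check without having to code it separately.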
The relational database is particularly easy to manage for answering user questions and producing reports.
Basic data manipulation includes selecting (eliminates rows), projecting (eliminates columns), and joining
and linking (creates a new table).
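
The three basic operations can be sketched in plain Python, treating a table as a list of rows. The tables, column names, and the function names select, project, and join are illustrative only; a relational DBMS provides these operations through its query language.

```python
# Hypothetical tables, each a list of rows (dicts keyed by column name).
students = [
    {"student_id": 1, "name": "Ana", "program": "Accounting"},
    {"student_id": 2, "name": "Ben", "program": "Finance"},
]
grades = [
    {"student_id": 1, "course": "MS1", "grade": "A"},
    {"student_id": 2, "course": "MS1", "grade": "B"},
]


def select(table, predicate):
    """Selecting keeps only the rows that satisfy a condition."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Projecting keeps only the named columns."""
    return [{col: row[col] for col in columns} for row in table]


def join(left, right, column):
    """Joining builds a new table from rows that match on a common column."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[column] == r_row[column]
    ]


report = project(
    select(join(students, grades, "student_id"), lambda r: r["grade"] == "A"),
    ["name", "course"],
)
```

The report joins the two tables on student_id, keeps the rows with grade "A", and then keeps only the name and course columns.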
One distinctive feature of a relational database is that you can combine any number of tables as long as they
share common fields. Two tables can be combined (joined) to form a third, provided they have at least one
column (attribute) in common, and the linked data can then be used to answer queries or produce reports.
Using a relational database, you can answer a complex query with a few simple commands, whereas the
traditional file-based approach would require several programs to be written and run against the various files
containing the required data, with a new file created after several operations.
A relational database has properties beyond two-dimensional tables. For example, there is no need for order or
sequence in a table, and the relation is a logical structure, thus users need not be concerned with physical
storage details.
Because of the flexibility provided by relational databases, they are becoming the design of choice for computer
professionals. Relational databases reduce data redundancy (facilitated by the database joining capability) and
allow data tables to be added with relative ease. With relational databases, it is relatively easy to perform
queries on the data without being constrained by the actual structure of the data. Microsoft Access is a
relational database program.

Example 3.1
Choosing a database model

Francine Ong has been assigned to design a database for a new inventory control system. The following is a
partial description of the data items and their relationship:
Product items are organized by product lines, and each product can only belong to one product line. Each
salesperson is assigned one or more product lines. A product line can have more than one salesperson
assigned. Each salesperson is assigned a sales territory. For large territories, more than one salesperson can be
assigned.
Q: Of the three database models (hierarchical, network, and relational), which model is suitable for the
information described?

Solution

Exhibit 3-1 graphically represents the network model for the inventory database, while Exhibit 3-2 depicts the
relational model. Exhibit 3-3 is a short-form notation of the relational model.

Exhibit 3-1
Network model

Exhibit 3-2
Relational model

Sales territory (partial table)

Territory code   Territory name           Territory manager   G/L profit centre
1001             Northern B.C.            J. Chrieten         901001
1002             North Vancouver Island   B. Beverly          901002
1003             South Vancouver Island   C. Cleverly         901003
1004             Lower Mainland           K.C. Leung          901004

Salesperson (partial table)

Salesperson ID   Salesperson name   Territory code   Quota for the year
810              Kelvin Longile     1001             500,000
811              Rowanda Dhaliwal   1002             450,000
812              James Jones        1003             600,000
813              Mathew Mah         1004             800,000
814              Lucien Chong       1004             800,000

Product line (partial list)

Product line code   Description
E01                 Electrical parts
E02                 Plumbing supplies

Product assignment (partial list)

Product line code   Salesperson ID
E01                 810
E01                 811
E01                 812
E02                 813
E02                 814

Product (partial table)

Product code   Product name             Manufacturer                Product line code   Unit cost
C1023          Centronics plug          Acme Manufacturing          E01                 10.57
C0143          Triplex plug             Acme Manufacturing          E01                 13.98
C1045          Universal plug           TDK Manufacturing           E01                 23.52
P4106          Peerless faucet          LEW Piping Supplies         E02                 65.23
P4107          Kitchen Sink Stainless   Kitchen Aid Manufacturing   E02                 105.21

Instead of drawing the tables as shown in Exhibit 3-2, another common practice is to list the contents of each
table in short-form notation, marking the key field with an asterisk, as shown in Exhibit 3-3.

Exhibit 3-3
Short-form notation

Tables:
Sales territory
Salesperson
Product line
Product assignment
Product

Sales territory:      Territory code*, Territory name, Territory manager, G/L profit centre
Salesperson:          Salesperson ID*, Salesperson name, Territory code, Quota for the year
Product line:         Product line code*, Description
Product assignment:   Product line code*, Salesperson ID*
Product:              Product code*, Product name, Manufacturer, Product line code, Unit cost
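
The short-form notation maps quite directly onto table definitions. The sketch below is one possible rendering using Python's sqlite3 module; the snake_case table and column names and the chosen data types are assumptions, and only a few of the sample rows are loaded. SQLite does not enforce the REFERENCES clauses unless foreign keys are switched on, so here they serve as documentation of the links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_territory (
        territory_code    INTEGER PRIMARY KEY,
        territory_name    TEXT,
        territory_manager TEXT,
        gl_profit_centre  TEXT
    );
    CREATE TABLE salesperson (
        salesperson_id   INTEGER PRIMARY KEY,
        salesperson_name TEXT,
        territory_code   INTEGER REFERENCES sales_territory (territory_code),
        quota_for_year   INTEGER
    );
    CREATE TABLE product_line (
        product_line_code TEXT PRIMARY KEY,
        description       TEXT
    );
    -- Two-part key: links product lines and salespeople many-to-many.
    CREATE TABLE product_assignment (
        product_line_code TEXT REFERENCES product_line (product_line_code),
        salesperson_id    INTEGER REFERENCES salesperson (salesperson_id),
        PRIMARY KEY (product_line_code, salesperson_id)
    );
    CREATE TABLE product (
        product_code      TEXT PRIMARY KEY,
        product_name      TEXT,
        manufacturer      TEXT,
        product_line_code TEXT REFERENCES product_line (product_line_code),
        unit_cost         REAL
    );
    """
)
conn.executemany(
    "INSERT INTO product_line VALUES (?, ?)",
    [("E01", "Electrical parts"), ("E02", "Plumbing supplies")],
)
conn.executemany(
    "INSERT INTO salesperson VALUES (?, ?, ?, ?)",
    [
        (810, "Kelvin Longile", 1001, 500000),
        (813, "Mathew Mah", 1004, 800000),
        (814, "Lucien Chong", 1004, 800000),
    ],
)
conn.executemany(
    "INSERT INTO product_assignment VALUES (?, ?)",
    [("E01", 810), ("E02", 813), ("E02", 814)],
)

# Which salespeople are assigned to the plumbing product line?
plumbing_sellers = conn.execute(
    """
    SELECT s.salesperson_name
    FROM product_assignment AS a
    JOIN salesperson AS s ON s.salesperson_id = a.salesperson_id
    JOIN product_line AS l ON l.product_line_code = a.product_line_code
    WHERE l.description = 'Plumbing supplies'
    ORDER BY s.salesperson_id
    """
).fetchall()
```

The product assignment table is what lets one product line have several salespeople and one salesperson have several product lines, and the query answers its question by joining on the shared columns.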

Many computer database programs, such as Access, FileMaker Pro, and Paradox, provide relational capabilities.
Oracle and Microsoft SQL Server are examples of fully relational databases.
Any database is only as useful as the data it contains. Data should be accurate, complete, economical, flexible,
reliable, relevant, simple, timely, verifiable, accessible, and secure. The purpose of data cleanup is to develop
processes to ensure those characteristics. Data cleanup is particularly important when moving from a file-based
system to a database or migrating from one database to another one.

3.3 Database management systems


Learning objective

Describe a database management system and explain why it is needed. (Level 1)


Required reading

Chapter 5, pages 186-194


LEVEL 1

The goals and activities of a business should be supported by the appropriate database structure. To create,
implement, and use a database, a database management system (DBMS) is required. A DBMS is a group
of programs used as an interface between the database and either the application programs or a user. Users
include end-users, who use the information from the database or enter data into the database; programmers,
who develop applications for the database; and database administrators (DBA), who create and manage the
database. All DBMSs have certain common functions, but are classified by the type of database they support.

Providing a user view


The first step in creating a database is to define the business objective or goal of the database in a measurable
manner. The next step is providing the DBMS with information about the physical structure and logical
relationships among the data to be contained in the database. This description is called a schema or
schematic. Subschemas, each of which defines a set of data that users can view, modify, or both, give users
access to only the portion of the database that they need, based on business rules and their role in the
organization. For example, the subschema for the accounts payable clerks should only allow them to have
access to the accounts payable-related information and not payroll information. The use of subschemas is
not only efficient but also helps ensure data security.
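
In SQL-based systems, a restricted user view of this kind is often approximated with a view. The sketch below uses Python's sqlite3 module with a made-up employee table. Note the assumption: SQLite has no user accounts or permissions, so the view only models what a subschema exposes; a full DBMS would also control who is allowed to query it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        department  TEXT,
        salary      REAL       -- payroll information
    );
    INSERT INTO employee VALUES (1, 'Ana Ruiz', 'Accounts payable', 52000);
    INSERT INTO employee VALUES (2, 'Ben Cole', 'Payroll', 61000);

    -- User view for a staff directory: the salary column is left out.
    CREATE VIEW employee_directory AS
        SELECT employee_id, name, department FROM employee;
    """
)

cursor = conn.execute("SELECT * FROM employee_directory")
visible_columns = [col[0] for col in cursor.description]
```

A user working only with employee_directory sees names and departments but never the salary column.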

Creating and modifying the database


A data definition language (DDL) is used to define and describe data and data relationships in a database.
The schema and subschema are applied using a DDL. When creating or modifying a database, it is also critical
to establish a data dictionary that contains a complete description of all data in the database, including
nomenclature, attributes, users, and applications. Typical uses of a data dictionary are to
provide a standard definition of terms and data elements
assist programmers in designing and writing programs
simplify database modification
A data dictionary helps achieve the advantages of the database approach by
reducing data redundancy
increasing data reliability
speeding up program development
facilitating modification of data and information
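
Many database products keep part of this information in a built-in catalogue. As a small illustration, the sketch below reads SQLite's column catalogue through Python's sqlite3 module and turns it into a simple dictionary. The personnel table is hypothetical, and this covers only column definitions; a full data dictionary as described above would also record users and applications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel ("
    " employee_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
dictionary = {
    name: {"type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(personnel)"
    )
}
```

A programmer can then look up dictionary["name"] to see the data type and whether a value is required, instead of guessing from the application code.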

Storing and retrieving data


Potential problems arise if more than one user or program attempts to access the same record in the same
database at the same time, and so there is a need for concurrency control. Data access control functions
within the DBMS ensure that two users cannot modify the same field at the same time.
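
The idea behind concurrency control can be sketched with a lock in Python. This is an analogy, not how any particular DBMS is implemented: the threads stand in for users, the dictionary stands in for a record, and the lock stands in for the DBMS's access control on that record.

```python
import threading

balance = {"value": 0}
record_lock = threading.Lock()  # stands in for the DBMS's lock on the record


def deposit(times):
    for _ in range(times):
        with record_lock:  # only one writer may update the record at a time
            current = balance["value"]
            balance["value"] = current + 1


threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, each read-then-write happens as one unit, so no update made by one thread is overwritten by another.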

Manipulating data and generating reports


When the DBMS is operational, a variety of programming languages can be used by different users to create
applications that will access the data from the database. Data manipulation language (DML) commands,
which are part of the DBMS package, are used to manipulate the data. Structured query language (SQL) is a
popular DML tool that can be used across a wide range of hardware platforms.
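
A short sketch may help separate the two kinds of command. It uses SQLite through Python's sqlite3 module, and the invoice table and its values are invented for the example. The CREATE TABLE statement plays the role of the DDL; the INSERT, UPDATE, and SELECT statements play the role of the DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the data.
conn.execute("CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY, amount REAL)")

# DML: adds, changes, and retrieves the data itself.
conn.execute("INSERT INTO invoice VALUES (1001, 250.0)")
conn.execute("INSERT INTO invoice VALUES (1002, 100.0)")
conn.execute("UPDATE invoice SET amount = 300.0 WHERE invoice_no = 1001")
total = conn.execute("SELECT SUM(amount) FROM invoice").fetchone()[0]
```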
The personal computer environment is significantly different from the corporate mainframe or networked
environment. Typical database software for personal computers, such as Microsoft Access, MySQL, and
FileMaker Pro, allows the user to interact directly with the database without needing to know or understand the
different components such as the DDL and DML.

3.4 Database storage and analysis


Learning objective

Describe database storage techniques. (Level 2)


Required reading

Chapter 5, pages 202-204


LEVEL 2

Database storage techniques


For any database, a number of database storage techniques can be used to store and manage it. Most
databases are stored in a central location. Mainframe computers and personal computers use a centralized
database storage technique. However, distributed database storage is growing in popularity.
Distributed databases

Distributed databases are technically quite complicated to implement and administer. Distributed database
storage involves storing an organization's data in several different servers that are connected via
telecommunication equipment. It is sufficient to know that such a technology exists, and that one form of
implementation is a replicated database. For the purpose of this course, the description in the text on page
203 is adequate.
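
For orientation only, the idea of a replicated database can be pictured with a toy model in Python. The class name ReplicatedStore, the site names, and the stored item are all hypothetical, and the sketch assumes every site can always be updated, which real distributed systems cannot take for granted.

```python
class ReplicatedStore:
    """Toy model: every write is copied to each site; reads use the local copy."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Assumption: all sites are updated together. Real systems must also
        # handle sites that are unreachable or temporarily out of date.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, site, key):
        return self.copies[site][key]


store = ReplicatedStore(["Vancouver", "Toronto"])
store.write("price:C1023", 10.57)
```

Each location answers reads from its own duplicate of the data, which is the benefit of replication; keeping the duplicates in step is where the complexity lies.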

3.5 Database developments


Learning objective

Describe database developments, including data warehousing, data marts, and data mining.
(Level 2)
Required reading

Chapter 5, pages 196-202 (up to "Distributed databases"), pages 204-207


LEVEL 2

Data warehouses, data marts, and data mining


The value of data ultimately lies in the decisions it enables. Companies have started developing data
warehouses and data marts to collect business information from the multiple sources within an organization
with the objective of making better business decisions. Data mining and online analytical processing (OLAP) are
information-analysis tools that help automate the identification of patterns, trends, or relationships in a data
warehouse to support decision making.
A data warehouse enables an organization to consolidate massive amounts of information extracted from
operational and production systems for analysis. Data warehousing techniques are becoming increasingly
popular with large organizations that have amassed trillions of bytes of data. Ordinary database analysis
techniques do not work well with such massive amounts of data.
A well-designed and properly built data warehouse
delivers a good return on investment
improves the company's competitive advantage by linking both internal and external information
stores data extracted from the production databases and conventional files in one place
has directories that show users what is in the database and how to access it
provides information that meets the organization's need for business intelligence
Building a data warehouse is a very time-consuming process because it is difficult to define what data are
necessary and what level of consolidation is desired. Many organizations now start with a smaller version of a
data warehouse called a data mart for departmental use. Data marts are also used by small and
medium-sized businesses. Departmental data marts can be used for online analytical processing (OLAP) within
departments and form the basis of the data warehouse for the organization.
Data mining is an information analysis tool that involves the automated discovery of patterns and
relationships in a data warehouse. Business intelligence has stimulated the interest in and the use of data
mining because of the enormous amounts of data being collected. Because of the rapid growth and potential
for data mining, the traditional DBMS vendors are incorporating data mining tools into their products.
While both online analytical processing (OLAP) and data mining support data analysis and decision making, a
data-mining tool generally does the work for the user and presents results, while OLAP requires the user to be
more knowledgeable about the data and their business context to gain insight from the data. OLAP is now
being used to store and deliver vast amounts of data warehouse information efficiently.
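
To give a feel for OLAP-style analysis, the sketch below totals some made-up sales figures along whichever dimension the user chooses. The data, the dimension names, and the function roll_up are assumptions for illustration; real OLAP tools work on far larger data sets and offer many more operations.

```python
from collections import defaultdict

# Hypothetical sales facts: (product line, salesperson, quarter, amount).
sales = [
    ("E01", "810", "Q1", 1200),
    ("E01", "811", "Q1", 800),
    ("E02", "813", "Q1", 1500),
    ("E01", "810", "Q2", 900),
]
DIMENSIONS = {"product_line": 0, "salesperson": 1, "quarter": 2}


def roll_up(facts, dimension):
    """Total the amount column along one chosen dimension."""
    index = DIMENSIONS[dimension]
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[index]] += fact[3]
    return dict(totals)
```

The user decides which question to ask, for example roll_up(sales, "quarter") or roll_up(sales, "product_line"), which reflects the point that OLAP depends on the user knowing the data and the business context.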

Business intelligence

Business intelligence (BI) is the process of getting enough of the right information in a timely manner and
usable form to support the business strategy, tactics, or operations.
Competitive intelligence is the continuous legal and ethical collection and analysis of information about
competitors for comparison purposes.
Counterintelligence is what a firm does to protect its information from the competition.
Knowledge management is a collection of techniques that captures and manages structured and
unstructured information to improve the ability of the organization to make timely and good business decisions.
Open database connectivity (ODBC) is a set of standards that supports database integration and allows
information to be shared between databases. Software developed according to these standards can be used
with any ODBC-compliant database. This is extremely important to organizations that use a variety of levels of
database applications. ODBC is frequently a standard requirement when organizations select software.

Object-oriented and object-relational database management systems


Instead of storing individual records, an object-oriented database management system (OODBMS)
stores objects which, unlike records, may not be uniform in shape and size and may exist in a variety of forms
including audio, video, and graphical data. An object-relational database management system
(ORDBMS) allows third parties to add new data types and operations to the database. The growth of
e-commerce, web-based applications, and web servers has created increasing demands for ORDBMS.
Virtual or hypermedia databases contain linked nodes of data. A web page containing hypertext links can
be viewed as a form of hypermedia database of information. On a web page, a user does not need to navigate
through the information in a sequential manner. Instead, hypertext links can be used to explore other parts of
the database. The advantage hypertext has is that, unlike traditional database manipulation languages, users
can search for and manipulate alphanumeric data in an unstructured form. Hypermedia databases are an
extension of hypertext that store and access graphics, sound, and video, as well as alphanumeric data.
One other database system of increasing importance is spatial data technology, also known as geographic
information systems (GIS). The global positioning system (GPS) is one of the applications that provide
data input to the GIS. The databases store spatial location data. In the case of NASA and Canadian satellites,
over a terabyte of data is stored every day. The cumulative data is nearing the petabyte (1,000 terabytes)
mark. For such large databases, special tools are being developed to handle the data.

Module 3 summary
File and database organization
This module introduces the basic concepts of files and databases, their components, and organization.
Database characteristics, advantages, and disadvantages will be reviewed, followed by a comparison of
hierarchical, network, and relational databases.

Describe how fields, records, files, and databases are organized within a
data hierarchy.
Data must be organized and structured so that they can be used effectively.
Data hierarchy (from largest to smallest element):
1. Database: a group of files holding related information
2. File: a collection of related information called records
3. Record: a collection of attributes of an entity in a file. For example, in a personnel file, an employee
   is an entity. Attributes of an employee include employee number, date of birth, and start date.
4. Field:
   A field is the smallest piece of information in a record, corresponding to one attribute of an entity.
   A primary key field is a field that uniquely identifies a record in a file for quicker access of data
   and sorting.
   A secondary key field is sometimes used for access and sorting but it does not uniquely identify a
   record.
5. Entity: people, places, or objects for which data is collected, stored, and maintained
6. Attribute: a characteristic of an entity
7. Character: a letter, number, or symbol

Describe database organization and database components.


A database is a collection of data organized so that they can be accessed and used by many
different applications. Data is stored and managed centrally.
Logical and physical view of data:
logical view presents what end-users see
physical view reflects the way data is actually organized and structured on
physical storage media
Some advantages of using a database approach:
data independent of application program
reduction of data redundancy and inconsistency
elimination of data confusion
consolidation of data management
ease of information access and use
Disadvantages of database approach:
Organization is more vulnerable in the event of system failures because data is
centralized.
Software and hardware requirements are higher.
Because data is centralized, errors that do enter the database may have a widespread effect.
A database administrator (DBA) is required to manage the DBMS.
Three principal database models are:
hierarchical model: organizes information in a tree-like structure
network model: the database structure is many-to-many
relational model: uses two-dimensional tables called relations to store data
Which model to use depends on
the nature of the data relationships
the need for flexibility
the volume of requests or changes to the database to be processed
the ease of use for end-users

Describe a database management system and explain why it is needed.


A database management system (DBMS) is the software that serves as an interface between a
common database and various application programs.
Three components of a DBMS are:
data definition language
data manipulation language
data dictionary
A schema describes physical structure and logical relationships of data.
A subschema provides a specific user view.
A data definition language (DDL) is used to define and describe data and data relationships in a
database.
A data dictionary contains a complete description of all data in the database.
A data dictionary reduces data redundancy, increases data reliability, and facilitates
development and modification of the database.
Data manipulation language (DML) commands are part of a DBMS package, and are used to
manipulate the data and generate reports.
Structured query language (SQL) is a popular DML tool that can be used across a wide range of
hardware platforms.

Describe database storage techniques and services.


Most databases are stored in a central location. Mainframe computers, personal computers, as
well as LANs, use a centralized database storage technique.
Distributed databases are technically quite complicated to implement and administer.
A replicated database holds a duplicate set of frequently-used data at different locations and is
one type of distributed database.

Describe database developments, including data warehousing, data marts, and data mining.
Data warehouse

A data warehouse consolidates data from various operational systems and external data.
It enables online analytical processing (OLAP) to provide information that meets the organization's
information needs.

It is difficult and costly to build; however, it provides a good return on investment if properly
designed.
Data marts

Smaller versions of a data warehouse, called data marts, may be built first. These data marts can
be used for departmental OLAP and form the basis of data warehouse for the organization.
Data mining

Data mining is the automated discovery of patterns and relationships in a data warehouse.
A data-mining tool generally does the work for the user and presents the results, whereas OLAP requires
the user to be more knowledgeable about the data and their business context.

Solution 1
a. 2)  Text, page 177
b. 3)  Module Notes, Topic 3.3
c. 4)  Text, page 179
d. 3)  Text, page 179
e. 4)  Text, page 182

Solution 2
A database is a collection of integrated and related files. A database management system is the software used
to manipulate the database and provide an interface between the database and the user or application
programs. A database management system is systems software that helps organize data for effective access
and storage by multiple applications. A DBMS provides different users with different views of the data
(subschemas), avoids redundancy, encourages program independence, offers flexible access, and provides
centralized control.

Solution 3
Data mining is the automated discovery of patterns and relationships in data warehouses. OLAP tools can tell
users what happened in their business. Data mining searches the data for statistical "whys" by seeking patterns
in the data and then developing hypotheses to predict future behaviour. Online analytical processing (OLAP)
programs are used to store and deliver data warehouse information. OLAP allows users to explore
corporate data in new and innovative ways using multiple dimensions such as products, salespeople, or time.
OLAP programs include spreadsheets, reporting and analysis tools, and custom applications.

Example 3.1 solution
The hierarchical model is inappropriate in this case because of the many-to-many relationships between
salespersons, product lines, sales territories, and inventory items. The network and relational models, however,
are both suitable. The preferred model is a relational database due to its flexibility to associate or link different
types of data.
