
CHAPTER 13

COMPUTING FOUNDATIONS

ACRONYMS

AOP     Aspect-Oriented Programming
ALU     Arithmetic and Logic Unit
API     Application Programming Interface
ATM     Asynchronous Transfer Mode
B/S     Browser-Server
CERT    Computer Emergency Response Team
COTS    Commercial Off-The-Shelf
CRUD    Create, Read, Update, Delete
C/S     Client-Server
CS      Computer Science
DBMS    Database Management System
FPU     Floating-Point Unit
I/O     Input and Output
ISA     Instruction Set Architecture
ISO     International Organization for Standardization
ISP     Internet Service Provider
LAN     Local Area Network
MUX     Multiplexer
NIC     Network Interface Card
OOP     Object-Oriented Programming
OS      Operating System
OSI     Open Systems Interconnection
PC      Personal Computer
PDA     Personal Digital Assistant
PPP     Point-to-Point Protocol
RFID    Radio Frequency Identification
RAM     Random Access Memory
ROM     Read Only Memory
SCSI    Small Computer System Interface
SQL     Structured Query Language
TCP     Transmission Control Protocol
UDP     User Datagram Protocol
VPN     Virtual Private Network
WAN     Wide Area Network

INTRODUCTION

The scope of the Computing Foundations knowledge area (KA) encompasses the development and operational environment in which software evolves and executes. Because no software can exist in a vacuum or run without a computer, the core of such an environment is the computer and its various components. Knowledge about the computer and its underlying principles of hardware and software serves as a framework on which software engineering is anchored. Thus, all software engineers must have a good understanding of the Computing Foundations KA.

It is generally accepted that software engineering builds on top of computer science. For example, “Software Engineering 2004: Curriculum Guidelines for Undergraduate Degree Programs in Software Engineering” [1] clearly states, “One particularly important aspect is that software engineering builds on computer science and mathematics” (italics added).

Steve Tockey wrote in his book Return on Software:

    Both computer science and software engineering deal with computers, computing, and software. The science of computing, as a body of knowledge, is at the core of both.

13-2 SWEBOK® Guide V3.0

    … Software engineering is concerned with the application of computers, computing, and software to practical purposes, specifically the design, construction, and operation of efficient and economical software systems.

Thus, at the core of software engineering is an understanding of computer science.

While few people will deny the role computer science plays in the development of software engineering both as a discipline and as a body of knowledge, the importance of computer science to software engineering cannot be overemphasized; thus, this Computing Foundations KA is being written.

The majority of topics discussed in the Computing Foundations KA are also topics of discussion in basic courses given in computer science undergraduate and graduate programs. Such courses include programming, data structure, algorithms, computer organization, operating systems, compilers, databases, networking, distributed systems, and so forth. Thus, when breaking down topics, it can be tempting to decompose the Computing Foundations KA according to these often-found divisions in relevant courses.

However, a purely course-based division of topics suffers serious drawbacks. For one, not all courses in computer science are related or equally important to software engineering. Thus, some topics that would otherwise be covered in a computer science course are not covered in this KA. For example, computer graphics, while an important course in a computer science degree program, is not included in this KA.

Second, some topics discussed in this guideline do not exist as standalone courses in undergraduate or graduate computer science programs. Consequently, such topics may not be adequately covered in a purely course-based breakdown. For example, abstraction is a topic incorporated into several different computer science courses; it is unclear which course abstraction should belong to in a course-based breakdown of topics.

The Computing Foundations KA is divided into seventeen different topics. A topic’s direct usefulness to software engineers is the criterion used for selecting topics for inclusion in this KA (see Figure 13.1). The advantage of this topic-based breakdown is its foundation on the belief that Computing Foundations, if it is to be grasped firmly, must be considered as a collection of logically connected topics undergirding software engineering in general and software construction in particular.

The Computing Foundations KA is related closely to the Software Design, Software Construction, Software Testing, Software Maintenance, Software Quality, and Mathematical Foundations KAs.

BREAKDOWN OF TOPICS FOR COMPUTING FOUNDATIONS

The breakdown of topics for the Computing Foundations KA is shown in Figure 13.1.

Figure 13.1. Breakdown of Topics for the Computing Foundations KA
Computing Foundations 13-3

1. Problem Solving Techniques
[2*, s3.2, c4] [3*, c5]

The concepts, notions, and terminology introduced here form an underlying basis for understanding the role and scope of problem solving techniques.

1.1. Definition of Problem Solving

Problem solving refers to the thinking and activities conducted to answer or derive a solution to a problem. There are many ways to approach a problem, and each way employs different tools and uses different processes. These different ways of approaching problems gradually expand and define themselves and finally give rise to different disciplines. For example, software engineering focuses on solving problems using computers and software.

While different problems warrant different solutions and may require different tools and processes, the methodology and techniques used in solving problems do follow some guidelines and can often be generalized as problem solving techniques. For example, a general guideline for solving a generic engineering problem is to use the three-step process given below [2*].

• Formulate the real problem.
• Analyze the problem.
• Design a solution search strategy.

1.2. Formulating the Real Problem

Gerard Voland writes, “It is important to recognize that a specific problem should be formulated if one is to develop a specific solution” [2*]. This formulation is called the problem statement, which explicitly specifies what both the problem and the desired outcome are.

Although there is no universal way of stating a problem, in general a problem should be expressed in such a way as to facilitate the development of solutions. Some general techniques to help one formulate the real problem include statement-restatement, determining the source and the cause, revising the statement, analyzing present and desired state, and using the fresh eye approach.

1.3. Analyze the Problem

Once the problem statement is available, the next step is to analyze the problem statement or situation to help structure our search for a solution. Four types of analysis include situation analysis, in which the most urgent or critical aspects of a situation are identified first; problem analysis, in which the cause of the problem must be determined; decision analysis, in which the action(s) needed to correct the problem or eliminate its cause must be determined; and potential problem analysis, in which the action(s) needed to prevent any reoccurrences of the problem or the development of new problems must be determined.

1.4. Design a Solution Search Strategy

Once the problem analysis is complete, we can focus on structuring a search strategy to find the solution. In order to find the “best” solution (here, “best” could mean different things to different people, such as faster, cheaper, more usable, different capabilities, etc.), we need to eliminate paths that do not lead to viable solutions, design tasks in a way that provides the most guidance in searching for a solution, and use various attributes of the final solution state to guide our choices in the problem solving process.

1.5. Problem Solving Using Programs

The uniqueness of computer software gives problem solving a flavor that is distinct from general engineering problem solving. To solve a problem using computers, we must answer the following questions.

• How do we figure out what to tell the computer to do?
• How do we convert the problem statement into an algorithm?
• How do we convert the algorithm into machine instructions?

The first task in solving a problem using a computer is to determine what to tell the computer to do. There may be many ways to tell the story, but all should take the perspective of a computer such

that the computer can eventually solve the problem. In general, a problem should be expressed in such a way as to facilitate the development of algorithms and data structures for solving it.

The result of the first task is a problem statement. The next step is to convert the problem statement into algorithms that solve the problem. Once an algorithm is found, the final step converts the algorithm into machine instructions that form the final solution: software that solves the problem.

Abstractly speaking, problem solving using a computer can be considered as a process of problem transformation (in other words, the step-by-step transformation of a problem statement into a problem solution). To the discipline of software engineering, the ultimate objective of problem solving is to transform a problem expressed in natural language into electrons running around a circuit. In general, this transformation can be broken into three phases:

a) Development of algorithms from the problem statement.
b) Application of algorithms to the problem.
c) Transformation of algorithms to program code.

The conversion of a problem statement into algorithms and algorithms into program code usually follows a “stepwise refinement” (a.k.a. systematic decomposition) in which we start with a problem statement, rewrite it as a task, and recursively decompose the task into a few simpler subtasks until the task is so simple that solutions to it are straightforward. There are three basic ways of decomposing: sequential, conditional, and iterative.

2. Abstraction
[3*, s5.2–5.4]

Abstraction is an indispensable technique associated with problem solving. It refers to both the process and result of generalization by reducing the information of a concept, a problem, or an observable phenomenon so that one can focus on the “big picture.” One of the most important skills in any engineering undertaking is framing the levels of abstraction appropriately.

“Through abstraction,” according to Voland, “we view the problem and its possible solution paths from a higher level of conceptual understanding. As a result, we may become better prepared to recognize possible relationships between different aspects of the problem and thereby generate more creative design solutions” [2*]. This is particularly true in computer science in general (such as hardware vs. software) and in software engineering in particular (data structure vs. data flow, and so forth).

2.1. Levels of Abstraction

When abstracting, we concentrate on one “level” of the big picture at a time with confidence that we can then connect effectively with levels above and below. Although we focus on one level, abstraction does not mean knowing nothing about the neighboring levels. Abstraction levels do not necessarily correspond to discrete components in reality or in the problem domain, but to well-defined standard interfaces such as programming APIs. The advantages that standard interfaces provide include portability, easier software/hardware integration, and wider usage.

2.2. Encapsulation

Encapsulation is a mechanism used to implement abstraction. When we are dealing with one level of abstraction, the information concerning the levels below and above that level is encapsulated. This information can be the concept, problem, or observable phenomenon; or it may be the permissible operations on these relevant entities. Encapsulation usually comes with some degree of information hiding in which some or all of the underlying details are hidden from the level above the interface provided by the abstraction. To an object, information hiding means we don’t need to know the details of how the object is represented or how the operations on those objects are implemented.

2.3. Hierarchy

When we use abstraction in our problem formulation and solution, we may use different abstractions

at different times; in other words, we work on different levels of abstraction as the situation calls for. Most of the time, these different levels of abstraction are organized in a hierarchy. There are many ways to structure a particular hierarchy, and the criteria used in determining the specific content of each layer in the hierarchy vary depending on the individuals performing the work.

Sometimes, a hierarchy of abstraction is sequential, which means that each layer has one and only one predecessor (lower) layer and one and only one successor (upper) layer, except the upmost layer (which has no successor) and the bottommost layer (which has no predecessor). Sometimes, however, the hierarchy is organized in a tree-like structure, which means each layer can have more than one predecessor layer but only one successor layer. Occasionally, a hierarchy can have a many-to-many structure, in which each layer can have multiple predecessors and successors. At no time shall there be any loop in a hierarchy.

A hierarchy often forms naturally in task decomposition. Often, a task analysis can be decomposed in a hierarchical fashion, starting with the larger tasks and goals of the organization and breaking each of them down into smaller subtasks that can again be further subdivided. This continuous division of tasks into smaller ones would produce a hierarchical structure of tasks-subtasks.

2.4. Alternate Abstractions

Sometimes it is useful to have multiple alternate abstractions for the same problem so that one can keep different perspectives in mind. For example, we can have a class diagram, a state chart, and a sequence diagram for the same software at the same level of abstraction. These alternate abstractions do not form a hierarchy but rather complement each other in helping to understand the problem and its solution. Though beneficial, it is at times difficult to keep alternate abstractions in sync.

3. Programming Fundamentals
[3*, c6–19]

Programming is composed of the methodologies or activities for creating computer programs that perform a desired function. It is an indispensable part of software construction. In general, programming can be considered as the process of designing, writing, testing, debugging, and maintaining the source code. This source code is written in a programming language.

The process of writing source code often requires expertise in many different subject areas, including knowledge of the application domain, appropriate data structures, specialized algorithms, various language constructs, good programming techniques, and software engineering.

3.1. The Programming Process

Programming involves design, writing, testing, debugging, and maintenance. Design is the conception or invention of a scheme for turning a customer requirement for computer software into operational software. It is the activity that links application requirements to coding and debugging. Writing is the actual coding of the design in an appropriate programming language. Testing is the activity to verify that the code one writes actually does what it is supposed to do. Debugging is the activity to find and fix bugs (faults) in the source code (or design). Maintenance is the activity to update, correct, and enhance existing programs. Each of these activities is a huge topic and often warrants the explanation of an entire KA in the SWEBOK Guide and many books.

3.2. Programming Paradigms

Programming is highly creative and thus somewhat personal. Different people often write different programs for the same requirements. This diversity of programming causes much difficulty in the construction and maintenance of large complex software. Various programming paradigms have been developed over the years to put some standardization into this highly creative and personal activity. When one programs, he or she can use one of several programming paradigms to write the code. The major types of programming paradigms are discussed below.

Unstructured Programming: In unstructured programming, a programmer follows his/her

hunch to write the code in whatever way he/she likes as long as the function is operational. Often, the practice is to write code to fulfill a specific utility without regard to anything else. Programs written this way exhibit no particular structure, thus the name “unstructured programming.” Unstructured programming is also sometimes called ad hoc programming.

Structured/Procedural/Imperative Programming: A hallmark of structured programming is the use of well-defined control structures, including procedures (and/or functions) with each procedure (or function) performing a specific task. Interfaces exist between procedures to facilitate correct and smooth calling operations of the programs. Under structured programming, programmers often follow established protocols and rules of thumb when writing code. These protocols and rules can be numerous and cover almost the entire scope of programming, ranging from the simplest issue (such as how to name variables, functions, procedures, and so forth) to more complex issues (such as how to structure an interface, how to handle exceptions, and so forth).

Object-Oriented Programming: While procedural programming organizes programs around procedures, object-oriented programming (OOP) organizes a program around objects, which are abstract data structures that combine both data and methods used to access or manipulate the data. The primary features of OOP are that objects representing various abstract and concrete entities are created and these objects interact with each other to collectively fulfill the desired functions.

Aspect-Oriented Programming: Aspect-oriented programming (AOP) is a programming paradigm that is built on top of OOP. AOP aims to isolate secondary or supporting functions from the main program’s business logic by focusing on the cross sections (concerns) of the objects. The primary motivation for AOP is to resolve the object tangling and scattering associated with OOP, in which the interactions among objects become very complex. The essence of AOP is the greatly emphasized separation of concerns, which separates noncore functional concerns or logic into various aspects.

Functional Programming: Though less popular, functional programming is as viable as the other paradigms in solving programming problems. In functional programming, all computations are treated as the evaluation of mathematical functions. In contrast to imperative programming, which emphasizes changes in state, functional programming emphasizes the application of functions, avoids state and mutable data, and provides referential transparency.

4. Programming Language Basics
[4*, c6]

Using computers to solve problems involves programming, which is writing and organizing instructions telling the computer what to do at each step. Programs must be written in some programming language with which and through which we describe necessary computations. In other words, we use the facilities provided by a programming language to describe problems, develop algorithms, and reason about problem solutions. To write any program, one must understand at least one programming language.

4.1. Programming Language Overview

A programming language is designed to express computations that can be performed by a computer. In a practical sense, a programming language is a notation for writing programs and thus should be able to express most data structures and algorithms. Some, but not all, people restrict the term “programming language” to those languages that can express all possible algorithms.

Not all languages have the same importance and popularity. The most popular ones are often defined by a specification document established by a well-known and respected organization. For example, the C programming language is specified by an ISO standard named ISO/IEC 9899. Other languages, such as Perl and Python, do not enjoy such treatment and often have a dominant implementation that is used as a reference.

4.2. Syntax and Semantics of Programming Languages

Just like natural languages, many programming languages have some form of written specification of their syntax (form) and semantics (meaning). Such specifications include, for example,

specific requirements for the definition of variables and constants (in other words, declaration and types) and format requirements for the instructions themselves.

In general, a programming language supports such constructs as variables, data types, constants, literals, assignment statements, control statements, procedures, functions, and comments. The syntax and semantics of each construct must be clearly specified.

4.3. Low-Level Programming Languages

Programming languages can be classified into two classes: low-level languages and high-level languages. Low-level languages can be understood by a computer with no or minimal assistance and typically include machine languages and assembly languages. A machine language uses ones and zeros to represent instructions and variables, and is directly understandable by a computer. An assembly language contains the same instructions as a machine language but the instructions and variables have symbolic names that are easier for humans to remember.

Assembly languages cannot be directly understood by a computer and must be translated into a machine language by a utility program called an assembler. There often exists a correspondence between the instructions of an assembly language and a machine language, and the translation from assembly code to machine code is straightforward. For example, “add r1, r2, r3” is an assembly instruction for adding the contents of registers r2 and r3 and storing the sum into register r1. This instruction can be easily translated into machine code “0001 0001 0010 0011.” (Assume the operation code for addition is 0001; see Figure 13.2.)

add r1, r2, r3
0001 0001 0010 0011

Figure 13.2. Assembly-to-Binary Translations

One common trait shared by these two types of language is their close association with the specifics of a type of computer or instruction set architecture (ISA).

4.4. High-Level Programming Languages

A high-level programming language has a strong abstraction from the details of the computer’s ISA. In comparison to low-level programming languages, it often uses natural-language elements and is thus much easier for humans to understand. Such languages allow symbolic naming of variables, provide expressiveness, and enable abstraction of the underlying hardware. For example, while each microprocessor has its own ISA, code written in a high-level programming language is usually portable between many different hardware platforms. For these reasons, most programmers use, and most software is written in, high-level programming languages. Examples of high-level programming languages include C, C++, C#, and Java.

4.5. Declarative vs. Imperative Programming Languages

Most programming languages (high-level or low-level) allow programmers to specify the individual instructions that a computer is to execute. Such programming languages are called imperative programming languages because one has to specify every step clearly to the computer. But some programming languages allow programmers to only describe the function to be performed without specifying the exact instruction sequences to be executed. Such programming languages are called declarative programming languages. Declarative languages are high-level languages. The actual implementation of the computation written in such a language is hidden from the programmers and thus is not a concern for them.

The key point to note is that declarative programming only describes what the program should accomplish without describing how to accomplish it. For this reason, many people believe declarative programming facilitates easier software development. Declarative programming languages include Lisp (also a functional programming language) and Prolog, while imperative programming languages include C, C++, and Java.

5. Debugging Tools and Techniques
[3*, c23]

Once a program is coded and compiled (compilation will be discussed in section 10), the next step is debugging, which is a methodical process of finding and reducing the number of bugs or faults in a program. The purpose of debugging is to find out why a program doesn’t work or produces a wrong result or output. Except for very simple programs, debugging is always necessary.

5.1. Types of Errors

When a program does not work, it is often because the program contains bugs or errors that can be either syntactic errors, logical errors, or data errors. Logical errors and data errors are also known as two categories of “faults” in software engineering terminology (see topic 1.1, Testing-Related Terminology, in the Software Testing KA).

A syntax error is any error that prevents the translator (compiler/interpreter) from successfully parsing the statement. Every statement in a program must be parse-able before its meaning can be understood and interpreted (and, therefore, executed). In high-level programming languages, syntax errors are caught during the compilation or translation from the high-level language into machine code. For example, in the C/C++ programming language, the statement “123=constant;” contains a syntax error that will be caught by the compiler during compilation.

Logic errors are semantic errors that result in incorrect computations or program behaviors. Your program is legal, but wrong! So the results do not match the problem statement or user expectations. For example, in the C/C++ programming language, the inline function “int f(int x) {return f(x-1);}” for computing factorial x! is legal but logically incorrect. This type of error cannot be caught by a compiler during compilation and is often discovered through tracing the execution of the program. (Modern static checkers do identify some of these errors. However, the point remains that these are not machine checkable in general.)

Data errors are input errors that result either in input data that is different from what the program expects or in the processing of wrong data.

5.2. Debugging Techniques

Debugging involves many activities and can be static, dynamic, or postmortem. Static debugging usually takes the form of code review, while dynamic debugging usually takes the form of tracing and is closely associated with testing. Postmortem debugging is the act of debugging the core dump (memory dump) of a process. Core dumps are often generated after a process has terminated due to an unhandled exception. All three techniques are used at various stages of program development.

The main activity of dynamic debugging is tracing, which is executing the program one piece at a time, examining the contents of registers and memory, in order to examine the results at each step. There are three ways to trace a program.

• Single-stepping: execute one instruction at a time to make sure each instruction is executed correctly. This method is tedious but useful in verifying each step of a program.
• Breakpoints: tell the program to stop executing when it reaches a specific instruction. This technique lets one quickly execute selected code sequences to get a high-level overview of the execution behavior.
• Watch points: tell the program to stop when a register or memory location changes or when it equals a specific value. This technique is useful when one doesn’t know where or when a value is changed and when this value change likely causes the error.

5.3. Debugging Tools

Debugging can be complex, difficult, and tedious. Like programming, debugging is also highly creative (sometimes more creative than programming). Thus some help from tools is in order. For dynamic debugging, debuggers are widely used and enable the programmer to monitor the execution of a program, stop the execution, restart the execution, set breakpoints, change values in memory, and even, in some cases, go back in time.

For static debugging, there are many static code analysis tools, which look for a specific set of known problems within the source code.
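To give a feel for what such a tool does, here is a deliberately tiny sketch of a static check written in Python with the standard-library ast module. It inspects source text without running it and flags functions that call themselves but contain no branching construct, the same defect as the faulty factorial function in section 5.1. The function name find_unconditional_recursion and the heuristic itself are illustrative assumptions; real analyzers such as lint apply far larger and more precise rule sets.

```python
import ast

# Constructs that could guard a base case (a rough heuristic, by assumption).
BRANCHING_NODES = (ast.If, ast.IfExp, ast.While, ast.For, ast.Try, ast.BoolOp)


def find_unconditional_recursion(source: str) -> list[str]:
    """Return names of functions that call themselves with no branching at all."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Does the function body contain a direct call to the function itself?
        calls_itself = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Name)
            and n.func.id == node.name
            for n in ast.walk(node)
        )
        # Is there any construct that could stop the recursion?
        has_branch = any(isinstance(n, BRANCHING_NODES) for n in ast.walk(node))
        if calls_itself and not has_branch:
            flagged.append(node.name)
    return flagged


# Hypothetical input: f mirrors the faulty factorial, fact is a correct one.
SAMPLE = '''
def f(x):
    return f(x - 1)

def fact(x):
    if x <= 1:
        return 1
    return x * fact(x - 1)
'''

print(find_unconditional_recursion(SAMPLE))
```

The check reports f and leaves fact alone. It is a pattern match on the program text, so it can miss defects and raise false alarms, which is consistent with the remark in section 5.1 that logic errors are not machine checkable in general.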

Both commercial and free tools exist in various languages. These tools can be extremely useful when checking very large source trees, where it is impractical to do code walkthroughs. The UNIX lint program is an early example.

6. Data Structure and Representation
[5*, s2.1–2.6]

Programs work on data. But data must be expressed and organized within computers before being processed by programs. This organization and expression of data for programs’ use is the subject of data structure and representation. Simply put, a data structure tries to store and organize data in a computer in such a way that the data can be used efficiently. There are many types of data structures and each type of structure is suitable for some kinds of applications. For example, B/B+ trees are well suited for implementing massive file systems and databases.

6.1. Data Structure Overview

Data structures are computer representations of data. Data structures are used in almost every program. In a sense, no meaningful program can be constructed without the use of some sort of data structure. Some design methods and programming languages even organize an entire software system around data structures. Fundamentally, data structures are abstractions defined on a collection of data and its associated operations.

Often, data structures are designed for improving program or algorithm efficiency. Examples of such data structures include stacks, queues, and heaps. At other times, data structures are used for conceptual unity (abstract data type), such as the name and address of a person. Often, a data structure can determine whether a program runs in a few seconds or in a few hours or even a few days.

From the perspective of physical and logical ordering, a data structure is either linear or nonlinear. Other perspectives give rise to different classifications that include homogeneous vs. heterogeneous, static vs. dynamic, persistent vs. transient, external vs. internal, primitive vs. aggregate, recursive vs. nonrecursive, passive vs. active, and stateful vs. stateless structures.

6.2. Types of Data Structure

As mentioned above, different perspectives can be used to classify data structures. However, the predominant perspective used in classification centers on physical and logical ordering between data items. This classification divides data structures into linear and nonlinear structures. Linear structures organize data items in a single dimension in which each data entry has one (physical or logical) predecessor and one successor with the exception of the first and last entry. The first entry has no predecessor and the last entry has no successor. Nonlinear structures organize data items in two or more dimensions, in which case one entry can have multiple predecessors and successors. Examples of linear structures include lists, stacks, and queues. Examples of nonlinear structures include heaps, hash tables, and trees (such as binary trees, balanced trees, B-trees, and so forth).

Another type of data structure that is often encountered in programming is the compound structure. A compound data structure builds on top of other (more primitive) data structures and, in some way, can be viewed as the same structure as the underlying structure. Examples of compound structures include sets, graphs, and partitions. For example, a partition can be viewed as a set of sets.

6.3. Operations on Data Structures

All data structures support some operations that produce a specific structure and ordering, or retrieve relevant data from the structure, store data into the structure, or delete data from the structure. Basic operations supported by all data structures include create, read, update, and delete (CRUD).

• Create: Insert a new data entry into the structure.
• Read: Retrieve a data entry from the structure.
• Update: Modify an existing data entry.
• Delete: Remove a data entry from the structure.

Some data structures also support additional operations:

• Find a particular element in the structure.
• Sort all elements according to some ordering.
• Traverse all elements in some specific order.
• Reorganize or rebalance the structure.

Different structures support different operations with different efficiencies. The difference in efficiency between operations can be significant. For example, it is easy to retrieve the last item inserted into a stack, but finding a particular element within a stack is rather slow and tedious.

7. Algorithms and Complexity
[5*, s1.1–1.3, s3.3–3.6, s4.1–4.8, s5.1–5.7, s6.1–6.3, s7.1–7.6, s11.1, s12.1]

Programs are not random pieces of code: they are meticulously written to perform user-expected actions. The guides one uses to compose programs are algorithms, which organize various functions into a series of steps and take into consideration the application domain, the solution strategy, and the data structures being used. An algorithm can be very simple or very complex.

7.1. Overview of Algorithms

Abstractly speaking, algorithms guide the operations of computers and consist of a sequence of actions composed to solve a problem. Alternative definitions include but are not limited to:

• An algorithm is any well-defined computational procedure that takes some value or set of values as input and produces some value or set of values as output.
• An algorithm is a sequence of computational steps that transform the input into the output.
• An algorithm is a tool for solving a well-specified computation problem.

Of course, different definitions are favored by different people. Though there is no universally accepted definition, some agreement exists that an algorithm needs to be correct, finite (in other words, terminate eventually or one must be able to write it in a finite number of steps), and unambiguous.

7.2. Attributes of Algorithms

The attributes of algorithms are many and often include modularity, correctness, maintainability, functionality, robustness, user-friendliness (i.e., easy for people to understand), programmer time, simplicity, and extensibility. A commonly emphasized attribute is “performance” or “efficiency,” by which we mean both time and resource-usage efficiency while generally emphasizing the time axis. To some degree, efficiency determines if an algorithm is feasible or impractical. For example, an algorithm that takes one hundred years to terminate is virtually useless and may even be considered incorrect.

7.3. Algorithmic Analysis

Analysis of algorithms is the theoretical study of computer-program performance and resource usage; to some extent it determines the goodness of an algorithm. Such analysis usually abstracts away the particular details of a specific computer and focuses on asymptotic, machine-independent analysis.

There are three basic types of analysis. In worst-case analysis, one determines the maximum time or resources required by the algorithm on any input of size n. In average-case analysis, one determines the expected time or resources required by the algorithm over all inputs of size n; in performing average-case analysis, one often needs to make assumptions on the statistical distribution of inputs. The third type of analysis is best-case analysis, in which one determines the minimum time or resources required by the algorithm on any input of size n. Among the three types of analysis, average-case analysis is the most relevant but also the most difficult to perform.

Besides the basic analysis methods, there is also amortized analysis, in which one determines the maximum time required by an algorithm over a sequence of operations; and competitive analysis, in which one determines the relative performance merit of an algorithm against the optimal algorithm (which may not be known) in the same category (for the same operations).
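
The three basic types of analysis in Section 7.3 can be illustrated with a small counting experiment. The sketch below is only an illustration, not part of the Guide: the function names `comparisons_to_find` and `case_analysis` are hypothetical, linear search is chosen simply because its costs are easy to count, and the average case assumes that the target is equally likely to sit at any position of the input.

```python
def comparisons_to_find(items, target):
    """Count the element comparisons a linear search makes before stopping."""
    count = 0
    for item in items:
        count += 1
        if item == target:
            break
    return count


def case_analysis(n):
    """Return (best, worst, average) comparison counts for inputs of size n.

    Assumption (not from the Guide): the search is always successful and
    every position is an equally likely location for the target.
    """
    items = list(range(n))
    counts = [comparisons_to_find(items, target) for target in items]
    best = min(counts)                   # best case: target is the first entry
    worst = max(counts)                  # worst case: target is the last entry
    average = sum(counts) / len(counts)  # expected cost under the assumption
    return best, worst, average


print(case_analysis(9))  # prints (1, 9, 5.0)
```

Under the stated assumption the average grows as (n + 1) / 2, which shows why average-case results depend on the assumed distribution of inputs.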

7.4. Algorithmic Design Strategies

The design of algorithms generally follows one of the following strategies: brute force, divide and conquer, dynamic programming, and greedy selection. The brute force strategy is actually a no-strategy. It exhaustively tries every possible way to tackle a problem. If a problem has a solution, this strategy is guaranteed to find it; however, the time expense may be too high. The divide and conquer strategy improves on the brute force strategy by dividing a big problem into smaller, homogeneous problems. It solves the big problem by recursively solving the smaller problems and combining the solutions to the smaller problems to form the solution to the big problem. The underlying assumption for divide and conquer is that smaller problems are easier to solve.

The dynamic programming strategy improves on the divide and conquer strategy by recognizing that some of the subproblems produced by division may be the same and thus avoids solving the same problems again and again. This elimination of redundant subproblems can dramatically improve efficiency.

The greedy selection strategy further improves on dynamic programming by recognizing that not all of the subproblems contribute to the solution of the big problem. By eliminating all but one subproblem at each step, the greedy selection strategy is typically the most efficient of these design strategies, although it yields a correct solution only for problems in which a locally best choice leads to a globally best one. Sometimes the use of randomization can improve on the greedy selection strategy by eliminating the complexity in determining the greedy choice through coin flipping or randomization.

7.5. Algorithmic Analysis Strategies

The analysis strategies of algorithms include basic counting analysis, in which one actually counts the number of steps an algorithm takes to complete its task; asymptotic analysis, in which one only considers the order of magnitude of the number of steps an algorithm takes to complete its task; probabilistic analysis, in which one makes use of probabilities in analyzing the average performance of an algorithm; amortized analysis, in which one uses the methods of aggregation, potential, and accounting to analyze the worst performance of an algorithm on a sequence of operations; and competitive analysis, in which one uses methods such as potential and accounting to analyze the performance of an algorithm relative to the optimal algorithm.

For complex problems and algorithms, one may need to use a combination of the aforementioned analysis strategies.

8. Basic Concept of a System
[6*, c10]

Ian Sommerville writes, “a system is a purposeful collection of interrelated components that work together to achieve some objective” [6*]. A system can be very simple and include only a few components, like an ink pen, or rather complex, like an aircraft. Depending on whether humans are part of the system, systems can be divided into technical computer-based systems and sociotechnical systems. Technical computer-based systems, such as televisions, mobile phones, thermostats, and some software, function without human involvement; a sociotechnical system will not function without human involvement. Examples of such systems include manned space vehicles, chips embedded inside a human, and so forth.

8.1. Emergent System Properties

A system is more than simply the sum of its parts. Thus, the properties of a system are not simply the sum of the properties of its components. Instead, a system often exhibits properties that are properties of the system as a whole. These properties are called emergent properties because they develop only after the integration of constituent parts in the system. Emergent system properties can be either functional or nonfunctional. Functional properties describe the things that a system does. For example, an aircraft’s functional properties include flotation on air, carrying people or cargo, and use as a weapon of mass destruction. Nonfunctional properties describe how the system behaves in its operational environment. These can include such qualities as consistency, capacity, weight, security, etc.

Figure 13.3. Basic Components of a Computer System Based on the von Neumann Model

8.2. Systems Engineering

“Systems engineering is the interdisciplinary approach governing the total technical and managerial effort required to transform a set of customer needs, expectations, and constraints into a solution and to support that solution throughout its life” [7]. The life cycle stages of systems engineering vary depending on the system being built but, in general, include system requirements definition, system design, subsystem development, system integration, system testing, system installation, system evolution, and system decommissioning.

Many practical guidelines have been produced in the past to aid people in performing the activities of each phase. For example, system design can be broken into smaller tasks of identification of subsystems, assignment of system requirements to subsystems, specification of subsystem functionality, definition of subsystem interfaces, and so forth.

8.3. Overview of a Computer System

Among all the systems, one that is obviously relevant to the software engineering community is the computer system. A computer is a machine that executes programs or software. It consists of a purposeful collection of mechanical, electrical, and electronic components with each component performing a preset function. Jointly, these components are able to execute the instructions that are given by the program.

Abstractly speaking, a computer receives some input, stores and manipulates some data, and provides some output. The most distinctive feature of a computer is its ability to store and execute sequences of instructions called programs. An interesting phenomenon concerning the computer is the universal equivalence in functionality. According to Turing, all computers with a certain minimum capability are equivalent in their ability to perform computation tasks. In other words, given enough time and memory, all computers, ranging from a netbook to a supercomputer, are capable of computing exactly the same things, irrespective of speed, size, cost, or anything else.

Most computer systems have a structure that is known as the “von Neumann model,” which consists of five components: a memory for storing instructions and data, a central processing unit for performing arithmetic and logical operations, a control unit for sequencing and interpreting instructions, input for getting external information into the memory, and output for producing results for the user. The basic components of a computer system based on the von Neumann model are depicted in Figure 13.3.
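
To make the stored-program idea of Section 8.3 concrete, the following is a minimal sketch of a toy machine with the five von Neumann components. It is an assumption-laden illustration rather than a model of any real computer: the function name `run`, the single accumulator register, and the instruction names `IN`, `LOAD`, `ADD`, `STORE`, `OUT`, and `HALT` are all hypothetical. The point it shows is that instructions and data share one memory, and that a control loop fetches and interprets one instruction at a time.

```python
def run(memory, inputs):
    """Execute the program stored in `memory` and return the list of outputs."""
    acc = 0                  # accumulator used by the arithmetic unit
    pc = 0                   # program counter kept by the control unit
    pending = list(inputs)   # input device: values waiting to be read
    outputs = []             # output device: values produced for the user
    while True:
        op, addr = memory[pc]          # fetch the next instruction from memory
        pc += 1
        if op == "IN":                 # input: external value into memory
            memory[addr] = pending.pop(0)
        elif op == "LOAD":             # memory to accumulator
            acc = memory[addr]
        elif op == "ADD":              # arithmetic on the accumulator
            acc += memory[addr]
        elif op == "STORE":            # accumulator to memory
            memory[addr] = acc
        elif op == "OUT":              # output: memory value to the user
            outputs.append(memory[addr])
        elif op == "HALT":
            return outputs
        else:
            raise ValueError(f"unknown instruction: {op}")


# Cells 0-6 hold instructions; cells 7-9 (appended below) hold data.
program = [
    ("IN", 7), ("IN", 8),      # read two numbers into cells 7 and 8
    ("LOAD", 7), ("ADD", 8),   # add them in the accumulator
    ("STORE", 9), ("OUT", 9),  # store the sum in cell 9 and output it
    ("HALT", 0),
]

print(run(program + [0, 0, 0], [2, 3]))  # prints [5]
```

Because the program is just data in memory, replacing the first seven cells changes what the machine computes without changing the machine itself.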

9. Computer Organization
[8*, c1–c4]

From the perspective of a computer, a wide semantic gap exists between its intended behavior and the workings of the underlying electronic devices that actually do the work within the computer. This gap is bridged through computer organization, which meshes various electrical, electronic, and mechanical devices into one device that forms a computer. The objects that computer organization deals with are the devices, connections, and controls. The abstraction built in computer organization is the computer.

9.1. Computer Organization Overview

A computer generally consists of a CPU, memory, input devices, and output devices. Abstractly speaking, the organization of a computer can be divided into four levels (Figure 13.4). The macro architecture level is the formal specification of all the functions a particular machine can carry out and is known as the instruction set architecture (ISA). The micro architecture level is the implementation of the ISA in a specific CPU; in other words, the way in which the ISA’s specifications are actually carried out. The logic circuits level is the level where each functional component of the micro architecture is built up of circuits that make decisions based on simple rules. The devices level is the level where, finally, each logic circuit is actually built of electronic devices such as complementary metal-oxide semiconductors (CMOS), n-channel metal oxide semiconductors (NMOS), or gallium arsenide (GaAs) transistors, and so forth.

Macro Architecture Level (ISA)
Micro Architecture Level
Logic Circuits Level
Devices Level

Figure 13.4. Machine Architecture Levels

Each level provides an abstraction to the level above and is dependent on the level below. To a programmer, the most important abstraction is the ISA, which specifies such things as the native data types, instructions, registers, addressing modes, the memory architecture, interrupt and exception handling, and the I/Os. Overall, the ISA specifies the ability of a computer and what can be done on the computer with programming.

9.2. Digital Systems

At the lowest level, computations are carried out by the electrical and electronic devices within a computer. The computer uses circuits and memory to hold charges that represent the presence or absence of voltage. The presence of voltage represents a 1, while the absence of voltage represents a 0. On disk, the polarity of the voltage is represented by 0s and 1s, which in turn represent the data stored. Everything, including instructions and data, is expressed or encoded using digital zeros and ones. In this sense, a computer becomes a digital system. For example, decimal value 6 can be encoded as 110, the addition instruction may be encoded as 0001, and so forth. The components of the computer, such as the control unit, ALU, memory, and I/O, use this information to carry out the instructions.

9.3. Digital Logic

Obviously, logic is needed to manipulate data and to control the operation of computers. This logic, which is behind a computer’s proper function, is called digital logic because it deals with the operations of digital zeros and ones. Digital logic specifies the rules both for building various digital devices from the simplest elements (such as transistors) and for governing the operation of digital devices. For example, digital logic spells out what the value will be if a zero and a one are ANDed, ORed, or exclusively ORed together. It also specifies how to build decoders, multiplexers (MUX), memory, and adders that are used to assemble the computer.

9.4. Computer Expression of Data

As mentioned before, a computer expresses data with electrical signals or digital zeros and ones. Since there are only two different digits used in
data expression, such a system is called a binary expression system. Due to the inherent nature of a binary system, the maximum numerical value expressible by an n-bit binary code is $2^n - 1$. Specifically, the binary number $a_n a_{n-1} \ldots a_1 a_0$ corresponds to $a_n \times 2^n + a_{n-1} \times 2^{n-1} + \cdots + a_1 \times 2^1 + a_0 \times 2^0$. Thus, the numerical value of the binary expression 1011 is $1 \times 8 + 0 \times 4 + 1 \times 2 + 1 \times 1 = 11$. To express a nonnumerical value, we need to decide the number of zeros and ones to use and the order in which those zeros and ones are arranged.

Of course, there are different ways to do the encoding, and this gives rise to different data expression schemes and subschemes. For example, integers can be expressed in the form of unsigned, one’s complement, or two’s complement. For characters, there are ASCII, Unicode, and IBM’s EBCDIC standards. For floating point numbers, there are IEEE-754 FP 1, 2, and 3 standards.

9.5. The Central Processing Unit (CPU)

The central processing unit is the place where instructions (or programs) are actually executed. The execution usually takes several steps, including fetching the program instruction, decoding the instruction, fetching operands, performing arithmetic and logical operations on the operands, and storing the result. The main components of a CPU consist of registers where instructions and data are often read from and written to, the arithmetic and logic unit (ALU) that performs the actual arithmetic (such as addition, subtraction, multiplication, and division) and logic (such as AND, OR, shift, and so forth) operations, the control unit that is responsible for producing proper signals to control the operations, and various (data, address, and control) buses that link the components together and transport data to and from these components.

9.6. Memory System Organization

Memory is the storage unit of a computer. It concerns the assembling of a large-scale memory system from smaller and single-digit storage units. The main topics covered by memory system architecture include the following:

• Memory cells and chips
• Memory boards and modules
• Memory hierarchy and cache
• Memory as a subsystem of the computer.

Memory cells and chips deal with single-digit storage and the assembling of single-digit units into one-dimensional memory arrays as well as the assembling of one-dimensional storage arrays into multi-dimensional storage memory chips. Memory boards and modules concern the assembling of memory chips into memory systems, with the focus being on the organization, operation, and management of the individual chips in the system. Memory hierarchy and cache are used to support efficient memory operations. Memory as a subsystem deals with the interface between the memory system and other parts of the computer.

9.7. Input and Output (I/O)

A computer is useless without I/O. Common input devices include the keyboard and mouse; common output devices include the disk, the screen, the printer, and speakers. Different I/O devices operate at different data rates and reliabilities. How computers connect and manage various input and output devices to facilitate the interaction between computers and humans (or other computers) is the focus of topics in I/O. The main issues that must be resolved in input and output are the ways I/O can and should be performed.

In general, I/O is performed at both hardware and software levels. Hardware I/O can be performed in any of three ways. Dedicated I/O dedicates the CPU to the actual input and output operations during I/O; memory-mapped I/O treats I/O operations as memory operations; and hybrid I/O combines dedicated I/O and memory-mapped I/O into a single holistic I/O operation mode.

Coincidentally, software I/O can also be performed in one of three ways. Programmed I/O lets the CPU wait while the I/O device is doing I/O; interrupt-driven I/O lets the CPU’s handling of I/O be driven by the I/O device; and direct memory access (DMA) lets I/O be handled by a secondary CPU embedded in a DMA device (or
channel). (Except during the initial setup, the main CPU is not disturbed during a DMA I/O operation.)

Regardless of the types of I/O scheme being used, the main issues involved in I/O include I/O addressing (which deals with the issue of how to identify the I/O device for a specific I/O operation), synchronization (which deals with the issue of how to make the CPU and I/O device work in harmony during I/O), and error detection and correction (which deals with the occurrence of transmission errors).

10. Compiler Basics
[4*, s6.4] [8*, s8.4]

10.1. Compiler/Interpreter Overview

Programmers usually write programs in high-level language code, which the CPU cannot execute; so this source code has to be converted into machine code to be understood by a computer. Due to the differences between different ISAs, the translation must be done for each ISA or specific machine language under consideration.

The translation is usually performed by a piece of software called a compiler or an interpreter. This process of translation from a high-level language to a machine language is called compilation, or, sometimes, interpretation.

10.2. Interpretation and Compilation

There are two ways to translate a program written in a higher-level language into machine code: interpretation and compilation. Interpretation translates the source code one statement at a time into machine language, executes it on the spot, and then goes back for another statement. Both the high-level-language source code and the interpreter are required every time the program is run.

Compilation, performed by a program called a compiler, translates the high-level-language source code into an entire machine-language program (an executable image). After compilation, only the executable image is needed to run the program. Most application software is sold in this form.

While both compilation and interpretation convert high-level language code into machine code, there are some important differences between the two methods. First, a compiler makes the conversion just once, while an interpreter typically converts it every time a program is executed. Second, interpreting code is slower than running the compiled code, because the interpreter must analyze each statement in the program when it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. Third, access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run time rather than at compile time.

The primary tasks of a compiler may include preprocessing, lexical analysis, parsing, semantic analysis, code generation, and code optimization. Program faults caused by incorrect compiler behavior can be very difficult to track down. For this reason, compiler implementers invest a lot of time ensuring the correctness of their software.

10.3. The Compilation Process

Compilation is a complex task. Most compilers divide the compilation process into many phases. A typical breakdown is as follows:

• Lexical Analysis
• Syntax Analysis or Parsing
• Semantic Analysis
• Code Generation

Lexical analysis partitions the input text (the source code), which is a sequence of characters, into separate comments, which are to be ignored in subsequent actions, and basic symbols, which have lexical meanings. These basic symbols must correspond to some terminal symbols of the grammar of the particular programming language. Here terminal symbols refer to the elementary symbols (or tokens) in the grammar that cannot be changed.

Syntax analysis is based on the results of the lexical analysis and discovers the structure in the program and determines whether or not a text conforms to an expected format. “Is this a textually correct C++ program?” or “Is this entry textually correct?” are typical questions that can be
answered by syntax analysis. Syntax analysis determines if the source code of a program is syntactically correct and converts it into a more structured representation (parse tree) for semantic analysis or transformation.

Semantic analysis adds semantic information to the parse tree built during the syntax analysis and builds the symbol table. It performs various semantic checks that include type checking, object binding (associating variable and function references with their definitions), and definite assignment (requiring all local variables to be initialized before use). If mistakes are found, the semantically incorrect program statements are rejected and flagged as errors.

Once semantic analysis is complete, the phase of code generation begins and transforms the intermediate code produced in the previous phases into the native machine language of the computer under consideration. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions, along with their associated addressing modes.

It is often possible to combine multiple phases into one pass over the code in a compiler implementation. Some compilers also have a preprocessing phase at the beginning or after the lexical analysis that does necessary housekeeping work, such as processing the program instructions for the compiler (directives). Some compilers provide an optional optimization phase at the end of the entire compilation to optimize the code (such as the rearrangement of instruction sequence) for efficiency and other desirable objectives requested by the users.

11. Operating Systems Basics
[4*, c3]

Every system of meaningful complexity needs to be managed. A computer, as a rather complex electrical-mechanical system, needs its own manager for managing the resources and activities occurring on it. That manager is called an operating system (OS).

11.1. Operating Systems Overview

An operating system is a collection of software and firmware that controls the execution of computer programs and provides such services as computer resource allocation, job control, input/output control, and file management in a computer system. Conceptually, an operating system is a computer program that manages the hardware resources and makes them easier for applications to use by presenting nice abstractions. This nice abstraction is often called the virtual machine and includes such things as processes, virtual memory, and file systems. An OS hides the complexity of the underlying hardware and is found on all modern computers.

The principal roles played by OSs are management and illusion. Management refers to the OS’s management (allocation and recovery) of physical resources among multiple competing users/applications/tasks. Illusion refers to the nice abstractions the OS provides.

11.2. Tasks of an Operating System

The tasks of an operating system differ significantly depending on the machine and time of its invention. However, modern operating systems have come to agreement as to the tasks that must be performed by an OS. These tasks include CPU management, memory management, disk management (file system), I/O device management, and security and protection. Each OS task manages one type of physical resource.

Specifically, CPU management deals with the allocation and release of the CPU among competing programs (called processes/threads in OS jargon), including the operating system itself. The main abstraction provided by CPU management is the process/thread model. Memory management deals with the allocation and release of memory space among competing processes, and the main abstraction provided by memory management is virtual memory. Disk management deals with the sharing of magnetic, optical, or solid state disks among multiple programs/users, and its main abstraction is the file system. I/O device management deals with the allocation and release of various I/O devices among competing processes.
Security and protection deal with the protection of computer resources from illegal use.

11.3. Operating System Abstractions

The arsenal of OSs is abstraction. Corresponding to the five physical tasks, OSs use five abstractions: process/thread, virtual memory, file systems, input/output, and protection domains. The overall OS abstraction is the virtual machine.

For each task area of an OS, there is both a physical reality and a conceptual abstraction. The physical reality refers to the hardware resource under management; the conceptual abstraction refers to the interface the OS presents to the users/programs above. For example, in the thread model of the OS, the physical reality is the CPU and the abstraction is multiple CPUs. Thus, a user doesn’t have to worry about sharing the CPU with others when working on the abstraction provided by an OS. In the virtual memory abstraction of an OS, the physical reality is the physical RAM or ROM, and the abstraction is multiple unlimited memory spaces. Thus, a user doesn’t have to worry about sharing physical memory with others or about limited physical memory size.

Abstractions may be virtual or transparent; in this context virtual applies to something that appears to be there, but isn’t (like usable memory beyond physical), whereas transparent applies to something that is there, but appears not to be there (like fetching memory contents from disk or physical memory).

11.4. Operating Systems Classification

Different operating systems can implement different functionality. In the early days of the computer era, operating systems were relatively simple. Over time, the complexity and sophistication of operating systems have increased significantly. From a historical perspective, an operating system can be classified as one of the following.

• Batching OS: organizes and processes work in batches. Examples of such OSs include IBM’s FMS, IBSYS, and University of Michigan’s UMES.
• Multiprogrammed batching OS: adds multitask capability into earlier simple batching OSs. An example of such an OS is IBM’s OS/360.
• Time-sharing OS: adds multitask and interactive capabilities into the OS. Examples of such OSs include UNIX, Linux, and Windows NT.
• Real-time OS: adds timing predictability into the OS by scheduling individual tasks according to each task’s completion deadlines. Examples of such OSs include VxWorks (WindRiver) and DART (EMC).
• Distributed OS: adds the capability of managing a network of computers into the OS.
• Embedded OS: has limited functionality and is used for embedded systems such as cars and PDAs. Examples of such OSs include Palm OS, Windows CE, and TOPPER.

Alternatively, an OS can be classified by its applicable target machine/environment into the following.

• Mainframe OS: runs on mainframe computers and includes OS/360, OS/390, AS/400, MVS, and VM.
• Server OS: runs on workstations or servers and includes such systems as UNIX, Windows, Linux, and VMS.
• Multicomputer OS: runs on multiple computers and includes such examples as Novell Netware.
• Personal computer OS: runs on personal computers and includes such examples as DOS, Windows, Mac OS, and Linux.
• Mobile device OS: runs on personal devices such as cell phones and iPads and includes such examples as iOS, Android, Symbian, etc.

12. Database Basics and Data Management
[4*, c9]

A database consists of an organized collection of data for one or more uses. In a sense, a database is a generalization and expansion of data structures. But the difference is that a database is usually external to individual programs and permanent in existence compared to data structures. Databases are used when the data volume is large or logical
relations between data items are important. The factors considered in
database design include performance, concurrency, integrity, and recovery
from hardware failures.

12.1. Entity and Schema

The things a database tries to model and store are called entities. Entities
can be real-world objects such as persons, cars, houses, and so forth, or they
may be abstract concepts such as salary, names, and so forth. An entity can be
primitive, such as a name, or composite, such as an employee that consists of
a name, identification number, salary, address, and so forth.

The single most important concept in a database is the schema, which is a
description of the entire database structure from which all other database
activities are built. A schema defines the relationships between the various
entities that compose a database. For example, a schema for a company payroll
system would consist of such things as employee ID, name, salary rate,
address, and so forth. Database software maintains the database according to
the schema.

Another important concept in databases is the database model, which describes
the type of relationship among various entities. The commonly used models
include relational, network, and object models.

12.2. Database Management Systems (DBMS)

Database Management System (DBMS) components include database applications for
the storage of structured and unstructured data and the required database
management functions needed to view, collect, store, and retrieve data from
the databases. A DBMS controls the creation, maintenance, and use of the
database and is usually categorized according to the database model it
supports, such as the relational, network, or object model. For example, a
relational database management system (RDBMS) implements features of the
relational model. An object database management system (ODBMS) implements
features of the object model.

12.3. Database Query Language

Users and applications interact with a database through a database query
language, which is a specialized programming language tailored to database
use. The database model tends to determine the query languages that are
available to access the database. One commonly used query language for
relational databases is the structured query language, more commonly
abbreviated as SQL. A common query language for object databases is the object
query language (abbreviated as OQL). There are three components of SQL: Data
Definition Language (DDL), Data Manipulation Language (DML), and Data Control
Language (DCL). An example of a DML query may look like the following:

    SELECT Component_No, Quantity
    FROM COMPONENT
    WHERE Item_No = 100

The above query selects every Component_No and its corresponding Quantity from
a database table called COMPONENT, where the Item_No equals 100.

12.4. Tasks of DBMS Packages

A DBMS provides the following capabilities:

• Database development is used to define and organize the content,
  relationships, and structure of the data needed to build a database.
• Database interrogation is used for accessing the data in a database for
  information retrieval and report generation. End users can selectively
  retrieve and display information and produce printed reports. This is the
  operation that most users know about databases.
• Database maintenance is used to add, delete, update, and correct the data in
  a database.
• Application development is used to develop prototypes of data entry screens,
  queries, forms, reports, tables, and labels for a prototyped application. It
  also refers to the use of a 4th Generation Language or application
  generators to develop or generate program code.
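
The DML query shown in Section 12.3 can be tried against an in-memory SQLite database. The following is a minimal sketch rather than part of the original text: only the table name COMPONENT and the columns Component_No, Quantity, and Item_No come from the query above, while the column types, the sample rows, and the function name components_for_item are assumptions made for illustration.

```python
import sqlite3


def components_for_item(item_no):
    """Run the Section 12.3 query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        # DDL: define the table (column types are assumed for this sketch).
        conn.execute(
            "CREATE TABLE COMPONENT ("
            "Component_No INTEGER, Quantity INTEGER, Item_No INTEGER)"
        )
        # DML: insert hypothetical sample rows.
        conn.executemany(
            "INSERT INTO COMPONENT VALUES (?, ?, ?)",
            [(1, 4, 100), (2, 10, 100), (3, 7, 200)],
        )
        # DML: the query from the text, with the item number bound as a parameter.
        cursor = conn.execute(
            "SELECT Component_No, Quantity FROM COMPONENT "
            "WHERE Item_No = ? ORDER BY Component_No",
            (item_no,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(components_for_item(100))  # [(1, 4), (2, 10)]
```

The CREATE TABLE statement is an instance of DDL, and the INSERT and SELECT statements are instances of DML; DCL statements such as GRANT are not supported by SQLite and are therefore not shown.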
Computing Foundations 13-19

12.5. Data Management

A database must manage the data stored in it. This management includes both
organization and storage.

The organization of the actual data in a database depends on the database
model. In a relational model, data are organized as tables, with different
tables representing different entities or relations among a set of entities.
The storage of data deals with the storage of these database tables on disks.
The common way to achieve this is to use files. Sequential, indexed, and hash
files are all used for this purpose, with different file structures providing
different access performance and convenience.

12.6. Data Mining

One often has to know what to look for before querying a database. This type
of “pinpointing” access does not make full use of the vast amount of
information stored in the database, and in fact reduces the database to a
collection of discrete records. To take full advantage of a database, one can
perform statistical analysis and pattern discovery on the content of a
database using a technique called data mining. Such operations can be used to
support a number of business activities that include, but are not limited to,
marketing, fraud detection, and trend analysis.

Numerous ways of performing data mining have been invented in the past decade
and include such common techniques as class description, class discrimination,
cluster analysis, association analysis, and outlier analysis.

13. Network Communication Basics
[8*, c12]

A computer network connects a collection of computers and allows users of
different computers to share resources with other users. A network facilitates
the communications between all the connected computers and may give the
illusion of a single, omnipresent computer. Every computer or device connected
to a network is called a network node.

A number of computing paradigms have emerged to benefit from the functions and
capabilities provided by computer networks. These paradigms include
distributed computing, grid computing, Internet computing, and cloud
computing.

13.1. Types of Network

Computer networks are not all the same and may be classified according to a
wide variety of characteristics, including the network’s connection method,
wired technologies, wireless technologies, scale, network topology, functions,
and speed. But the classification that is familiar to most is based on the
scale of networking.

• Personal Area Network/Home Network is a computer network used for
  communication among computer(s) and different information technology devices
  close to one person. The devices connected to such a network may include
  PCs, faxes, PDAs, and TVs. This is the base on which the Internet of Things
  is built.
• Local Area Network (LAN) connects computers and devices in a limited
  geographical area, such as a school campus, computer laboratory, office
  building, or closely positioned group of buildings.
• Campus Network is a computer network made up of an interconnection of local
  area networks (LANs) within a limited geographical area.
• Wide Area Network (WAN) is a computer network that covers a large geographic
  area, such as a city or country, or even spans intercontinental distances. A
  WAN limited to a city is sometimes called a Metropolitan Area Network.
• Internet is the global network that connects computers located in many
  (perhaps all) countries.

Other classifications may divide networks into control networks, storage
networks, virtual private networks (VPNs), wireless networks, point-to-point
networks, and the Internet of Things.

13.2. Basic Network Components

All networks are made up of the same basic hardware components, including
computers, network
interface cards (NICs), bridges, hubs, switches, and routers. All these
components are called nodes in the jargon of networking. Each component
performs a distinctive function that is essential for the packaging,
connection, transmission, amplification, controlling, unpacking, and
interpretation of the data. For example, a repeater amplifies the signals, a
switch performs many-to-many connections, a hub performs one-to-many
connections, an interface card is attached to the computer and performs data
packing and transmission, a bridge connects one network with another, and a
router is a computer itself and performs data analysis and flow control to
regulate the data from the network.

The functions performed by various network components correspond to the
functions specified by one or more levels of the seven-layer Open Systems
Interconnection (OSI) networking model, which is discussed below.

13.3. Networking Protocols and Standards

Computers communicate with each other using protocols, which specify the
format and regulations used to pack and unpack data. To facilitate easier
communication and better structure, network protocols are divided into
different layers, with each layer dealing with one aspect of the
communication. For example, the physical layer deals with the physical
connection between the parties that are to communicate, the data link layer
deals with the raw data transmission and flow control, and the network layer
deals with the packing and unpacking of data into a particular format that is
understandable by the relevant parties. The most commonly used model, the OSI
networking model, organizes network protocols into seven layers, as depicted
in Figure 13.5.

One thing to note is that not all network protocols implement all layers of
the OSI model. For example, the TCP/IP protocol implements neither the
presentation layer nor the session layer.

There can be more than one protocol for each layer. For example, UDP and TCP
both work on the transport layer above IP’s network layer, providing
best-effort, unreliable transport (UDP) vs. reliable transport (TCP). Physical
layer protocols include token ring, Ethernet, fast Ethernet, gigabit Ethernet,
and wireless Ethernet. Data link layer protocols include frame relay,
asynchronous transfer mode (ATM), and Point-to-Point Protocol (PPP).
Application layer protocols include Fibre Channel, Small Computer System
Interface (SCSI), and Bluetooth. For each layer or even each individual
protocol, there may be standards established by national or international
organizations to guide the design and development of the corresponding
protocols.

    Application Layer
    Presentation Layer
    Session Layer
    Transport Layer
    Network Layer
    Data Link Layer
    Physical Layer

Figure 13.5. The Seven-Layer OSI Networking Model

13.4. The Internet

The Internet is a global system of interconnected governmental, academic,
corporate, public, and private computer networks. For the general public,
access to the Internet is through organizations known as Internet service
providers (ISPs). An ISP maintains one or more switching centers called points
of presence, which actually connect the users to the Internet.

13.5. Internet of Things

The Internet of Things refers to the networking of everyday objects (such as
cars, cell phones, PDAs, TVs, refrigerators, and even buildings) using wired
or wireless networking technologies. The function and purpose of the Internet
of Things is to interconnect all things to facilitate autonomous and better
living. Technologies used in the Internet of Things include RFID, wireless and
wired networking, sensor technology, and, of course, a great deal of software.
As the paradigm of the Internet of Things is still taking shape, much work is
needed for it to gain widespread acceptance.
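
The idea from Section 13.3 that each protocol layer packs and unpacks data in its own format can be sketched in a few lines. The two header layouts below are hypothetical toy formats chosen for illustration and are not the real TCP, UDP, or IP header formats; the function names are likewise assumptions.

```python
import struct

# Hypothetical toy headers, NOT real protocol formats:
#   "transport" header: source port, destination port (2 bytes each)
#   "network" header:   source address, destination address (4 bytes each)
TRANSPORT = struct.Struct("!HH")
NETWORK = struct.Struct("!II")


def pack_message(payload, src_port, dst_port, src_addr, dst_addr):
    """Wrap the payload in a transport header, then in a network header."""
    segment = TRANSPORT.pack(src_port, dst_port) + payload
    return NETWORK.pack(src_addr, dst_addr) + segment


def unpack_message(packet):
    """Strip the headers in reverse order, outermost layer first."""
    src_addr, dst_addr = NETWORK.unpack_from(packet, 0)
    segment = packet[NETWORK.size:]
    src_port, dst_port = TRANSPORT.unpack_from(segment, 0)
    payload = segment[TRANSPORT.size:]
    return payload, (src_port, dst_port), (src_addr, dst_addr)


packet = pack_message(b"hello", 5000, 80, 1, 2)
print(len(packet))             # 17 = 8 (network) + 4 (transport) + 5 (payload)
print(unpack_message(packet))  # (b'hello', (5000, 80), (1, 2))
```

Each layer only needs to understand its own header, which is what allows one layer's protocol to be replaced (for example, UDP instead of TCP) without changing the layers above or below it.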

13.6. Virtual Private Network (VPN)

A virtual private network is a preplanned virtual connection between nodes in
a LAN/WAN or on the Internet. It allows the network administrator to separate
network traffic into user groups that have a common affinity for each other,
such as all users in the same organization or workgroup. This circuit type may
improve performance and security between nodes and allows for easier
maintenance of circuits when troubleshooting.

14. Parallel and Distributed Computing
[8*, c9]

Parallel computing is a computing paradigm that emerged with the development
of multiple functional units within a computer. The main objective of parallel
computing is to execute several tasks simultaneously on different functional
units and thus improve throughput or response or both. Distributed computing,
on the other hand, is a computing paradigm that emerged with the development
of computer networks. Its main objective is either to make use of multiple
computers in the network to accomplish things otherwise not possible within a
single computer or to improve computation efficiency by harnessing the power
of multiple computers.

14.1. Parallel and Distributed Computing Overview

Traditionally, parallel computing investigates ways to maximize concurrency
(the simultaneous execution of multiple tasks) within the boundary of a
computer. Distributed computing studies distributed systems, which consist of
multiple autonomous computers that communicate through a computer network.
Alternatively, distributed computing can also refer to the use of distributed
systems to solve computational or transactional problems. In the former
definition, distributed computing investigates the protocols, mechanisms, and
strategies that provide the foundation for distributed computation; in the
latter definition, distributed computing studies the ways of dividing a
problem into many tasks and assigning such tasks to various computers involved
in the computation.

Fundamentally, distributed computing is another form of parallel computing,
albeit on a grander scale. In distributed computing, the functional units are
not ALUs, FPUs, or separate cores, but individual computers. For this reason,
some people regard distributed computing as being the same as parallel
computing. Because both distributed and parallel computing involve some form
of concurrency, they are both also called concurrent computing.

14.2. Difference between Parallel and Distributed Computing

Though parallel and distributed computing resemble each other on the surface,
there is a subtle but real distinction between them: parallel computing does
not necessarily refer to the execution of programs on different computers;
instead, programs can be run on different processors within a single computer.
In fact, consensus among computing professionals limits the scope of parallel
computing to the case where a shared memory is used by all processors involved
in the computing, while distributed computing refers to computations where
private memory exists for each processor involved in the computations.

Another subtle difference between parallel and distributed computing is that
parallel computing necessitates concurrent execution of several tasks while
distributed computing does not have this necessity.

Based on the above discussion, it is possible to classify concurrent systems
as being “parallel” or “distributed” based on the existence or nonexistence of
shared memory among all the processors: parallel computing deals with
computations within a single computer; distributed computing deals with
computations within a set of computers. According to this view, multicore
computing is a form of parallel computing.

14.3. Parallel and Distributed Computing Models

Since multiple computers/processors/cores are involved in distributed/parallel
computing, some coordination among the involved parties is necessary to ensure
correct behavior of the system.
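
The distinction between shared memory and private memory drawn in Section 14.2 leads to two styles of coordination, which the following minimal sketch imitates on a single machine. It is an illustration only: Python threads stand in for processors or computers, and the function names and the choice of four workers are assumptions, not part of the original text.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Workers add into one shared total; a lock coordinates the updates."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:               # coordination: one writer at a time
            total += partial

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values, workers=4):
    """Workers keep private state and report results only as messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))  # the only interaction is the message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(workers))


data = list(range(1, 101))
print(shared_memory_sum(data), message_passing_sum(data))  # 5050 5050
```

In the first function, correctness depends on every worker respecting the lock around the shared variable; in the second, no variable is shared and correctness depends only on every message being delivered and collected.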
Different ways of coordination give rise to different computing models. The
most common models in this regard are the shared memory (parallel) model and
the message-passing (distributed) model.

In a shared memory (parallel) model, all processors have access to a shared
central memory, and local caches are used to speed up processing. These caches
use a protocol, typically the MESI protocol, to ensure that the localized data
is fresh and up to date. The algorithm designer chooses the program for
execution by each processor. Access to the central memory can be synchronous
or asynchronous, and must be coordinated such that coherency is maintained.
Different access models have been invented for such a purpose.

In a message-passing (distributed) model, all computers run some programs that
collectively achieve some purpose. The system must work correctly regardless
of the structure of the network. This model can be further classified into
client-server (C/S), browser-server (B/S), and n-tier models. In the C/S
model, the server provides services and the client requests services from the
server. In the B/S model, the server provides services and the client is the
browser. In the n-tier model, each tier (i.e., layer) provides services to the
tier immediately above it and requests services from the tier immediately
below it. In fact, the n-tier model can be seen as a chain of client-server
models. Often, the tiers between the bottommost tier and the topmost tier are
called middleware, which is a distinct subject of study in its own right.

14.4. Main Issues in Distributed Computing

Coordination among all the components in a distributed computing environment
is often complex and time-consuming. As the number of cores/CPUs/computers
increases, the complexity of distributed computing also increases. Among the
many issues faced, memory coherency and consensus among all computers are the
most difficult ones. Many computation paradigms have been invented to solve
these problems and are the main discussion issues in distributed/parallel
computing.

15. Basic User Human Factors
[3*, c8] [9*, c5]

Software is developed to meet human desires or needs. Thus, all software
design and development must take into consideration human-user factors such as
how people use software, how people view software, and what humans expect from
software. There are numerous factors in human-machine interaction, and the ISO
9241 series of documents defines all the detailed standards of such
interactions [10]. But the basic human-user factors considered here include
input/output, the handling of error messages, and the robustness of the
software in general.

15.1. Input and Output

Input and output are the interfaces between users and software. Software is
useless without input and output. Humans design software to process some input
and produce desirable output. All software engineers must consider input and
output as an integral part of the software product they engineer or develop.
Issues considered for input include (but are not limited to):

• What input is required?
• How is the input passed from users to computers?
• What is the most convenient way for users to enter input?
• What format does the computer require of the input data?

The designer should request the minimum data from human input, and only when
the data is not already stored in the system. The designer should format and
edit the data at the time of entry to reduce errors arising from incorrect or
malicious data entry.

For output, we need to consider what the users wish to see:

• In what format would users like to see output?
• What is the most pleasing way to display output?
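
The advice in Section 15.1 to format and edit input data at the time of entry can be illustrated with a small routine that normalizes what the user typed and rejects unusable values before they reach the rest of the system. This is a minimal sketch: the function name parse_quantity, the accepted range of 1 to 1000, and the wording of the messages are assumptions made for the example.

```python
def parse_quantity(raw, minimum=1, maximum=1000):
    """Format and check a quantity at the time of entry.

    Returns (value, None) on success or (None, message) on failure.
    The limits 1..1000 are hypothetical defaults for this sketch.
    """
    text = raw.strip().replace(",", "")  # accept "1,000" as well as "1000"
    if not text:
        return None, "Please enter a quantity."
    if not (text.isascii() and text.isdigit()):
        return None, f"'{raw.strip()}' is not a whole number."
    value = int(text)
    if not minimum <= value <= maximum:
        return None, f"Quantity must be between {minimum} and {maximum}."
    return value, None


for entry in [" 42 ", "1,000", "abc", "0", ""]:
    print(repr(entry), "->", parse_quantity(entry))
```

Returning a message instead of raising an error lets the calling user interface show the problem next to the input field and ask again, so that the user is asked only for the value that still needs correcting.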

If the party interacting with the software is not a human but another piece of
software, a computer, or a control system, then we need to consider the
input/output type and format that the software should produce to ensure proper
data exchange between systems.

There are many rules of thumb for developers to follow to produce good
input/output for software. These rules of thumb include simple and natural
dialogue, speaking users’ language, minimizing user memory load, consistency,
minimal surprise, and conformance to standards (whether agreed to or not:
e.g., automobiles have a standard interface for accelerator, brake, and
steering).

15.2. Error Messages

It is understandable that most software contains faults and fails from time to
time. But users should be notified if there is anything that impedes the
smooth execution of the program. Nothing is more frustrating than an
unexpected termination or behavioral deviation of software without any warning
or explanation. To be user friendly, the software should report all error
conditions to the users or upper-level applications so that some measure can
be taken to rectify the situation or to exit gracefully. There are several
guidelines that define what constitutes a good error message: error messages
should be clear, to the point, and timely.

First, error messages should clearly explain what is happening so that users
know what is going on in the software. Second, error messages should pinpoint
the cause of the error, if at all possible, so that proper actions can be
taken. Third, error messages should be displayed right when the error
condition occurs. According to Jakob Nielsen, “Good error messages should be
expressed in plain language (no codes), precisely indicate the problem, and
constructively suggest a solution” [9*]. Fourth, error messages should not
overload the users with too much information and cause them to ignore the
messages altogether.

However, messages relating to security access errors should not provide extra
information that would help unauthorized persons break in.

15.3. Software Robustness

Software robustness refers to the ability of software to tolerate erroneous
inputs. Software is said to be robust if it continues to function even when
erroneous inputs are given. Thus, it is unacceptable for software to simply
crash when encountering an input problem, as this may cause unexpected
consequences, such as the loss of valuable data. Software that exhibits such
behavior is considered to lack robustness.

Nielsen gives a simpler description of software robustness: “The software
should have a low error rate, so that users make few errors during the use of
the system and so that if they do make errors they can easily recover from
them. Further, catastrophic errors must not occur” [9*].

There are many ways to evaluate the robustness of software and just as many
ways to make software more robust. For example, to improve robustness, one
should always check the validity of the inputs and return values before
progressing further; one should always throw an exception when something
unexpected occurs; and one should never quit a program without first giving
users/applications a chance to correct the condition.

16. Basic Developer Human Factors
[3*, c31–32]

Developer human factors refer to the considerations of human factors taken
when developing software. Software is developed by humans, read by humans, and
maintained by humans. If anything is wrong, humans are responsible for
correcting those wrongs. Thus, it is essential to write software in a way that
is easily understandable by humans or, at the very least, by other software
developers. A program that is easy to read and understand exhibits
readability.

The means to ensure that software meets this objective are numerous and range
from proper architecture at the macro level to the particular coding style and
variable usage at the micro level. But the two prominent factors are structure
(or program layout) and comments (documentation).
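
The effect of these two factors, structure and comments, can be seen by comparing two functions that compute the same result. The example is a hypothetical illustration written for this purpose; the names f and compound_balance and the interest calculation are assumptions, not taken from the original text.

```python
# Hard to read: cramped layout, cryptic names, and a comment that only
# repeats the code.
def f(a,r,n):
    x=a;i=0
    while i<n: x=x+x*r;i=i+1  # add x*r to x
    return x


# Easier to read: descriptive names, one step per line, and comments that
# explain intent rather than restating the statements.
def compound_balance(principal, rate, periods):
    """Return the balance after applying `rate` once per period."""
    balance = principal
    for _ in range(periods):
        # Interest is added to the balance, so later periods earn
        # interest on earlier interest (compounding).
        balance += balance * rate
    return balance


print(f(100, 0.5, 2), compound_balance(100, 0.5, 2))  # 225.0 225.0
```

Both functions behave identically for the computer; only the second one tells a maintainer what the calculation is for.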

16.1. Structure

Well-structured programs are easier to understand and modify. If a program is
poorly structured, then no amount of explanation or comments is sufficient to
make it understandable. The ways to organize a program are numerous and range
from the proper use of white space, indentation, and parentheses to nice
arrangements of groupings, blank lines, and braces. Whatever style one
chooses, it should be consistent across the entire program.

16.2. Comments

To most people, programming is coding. These people do not realize that
programming also includes writing comments and that comments are an integral
part of programming. True, comments are not used by the computer and certainly
do not constitute final instructions for the computer, but they improve the
readability of programs by explaining the meaning and logic of the statements
or sections of code. It should be remembered that programs are not only meant
for computers; they are also read, written, and modified by humans.

The types of comments include repeat of the code, explanation of the code,
marker of the code, summary of the code, description of the code’s intent, and
information that cannot possibly be expressed by the code itself. Some
comments are good, some are not. The good ones are those that explain the
intent of the code and justify why the code looks the way it does. The bad
ones repeat the code or state irrelevant information. The best comments are
self-documenting code. If the code is written in such a clear and precise
manner that its meaning is self-proclaimed, then no comment is needed. But
this is easier said than done. Most programs are not self-explanatory and are
often hard to read and understand if no comments are given.

Here are some general guidelines for writing good comments:

• Comments should be consistent across the entire program.
• Each function should be associated with comments that explain the purpose of
  the function and its role in the overall program.
• Within a function, comments should be given for each logical section of
  coding to explain the meaning and purpose (intention) of the section.
• Comments should stipulate what freedom the maintaining programmers do (or do
  not) have with respect to making changes to that code.
• Comments are seldom required for individual statements. If a statement needs
  comments, one should reconsider the statement.

17. Secure Software Development and Maintenance
[11*, c29]

Due to increasing malicious activities targeted at computer systems, security
has become a significant issue in the development of software. In addition to
the usual correctness and reliability, software developers must also pay
attention to the security of the software they develop. Secure software
development builds security into software by following a set of established
and/or recommended rules and practices in software development. Secure
software maintenance complements secure software development by ensuring that
no security problems are introduced during software maintenance.

A generally accepted view concerning software security is that it is much
better to design security into software than to patch it in after software is
developed. To design security into software, one must take into consideration
every stage of the software development lifecycle. In particular, secure
software development involves software requirements security, software design
security, software construction security, and software testing security. In
addition, security must also be taken into consideration when performing
software maintenance, as security faults and loopholes can be and often are
introduced during maintenance.

17.1. Software Requirements Security

Software requirements security deals with the clarification and specification
of security policy and objectives into software requirements, which
lays the foundation for security considerations in the software development.
Factors to consider in this phase include software requirements and
threats/risks. The former refers to the specific functions that are required
for the sake of security; the latter refers to the possible ways that the
security of software is threatened.

17.2. Software Design Security

Software design security deals with the design of software modules that fit
together to meet the security objectives specified in the security
requirements. This step clarifies the details of security considerations and
develops the specific steps for implementation. Factors considered may include
frameworks and access modes that set up the overall security
monitoring/enforcement strategies, as well as the individual policy
enforcement mechanisms.

17.3. Software Construction Security

Software construction security concerns the question of how to write actual
programming code for specific situations such that security considerations are
taken care of. The term “Software Construction Security” could mean different
things for different people. It can mean the way a specific function is coded,
such that the coding itself is secure, or it can mean the coding of security
into software.

Most people entangle the two together without distinction. One reason for such
entanglement is that it is not clear how one can make sure that a specific
piece of code is secure. For example, in the C programming language, the
expressions i<<1 (shift the binary representation of i’s value to the left by
one bit) and 2*i (multiply the value of variable i by constant 2) mean the
same thing semantically, but do they have the same security ramifications? The
answer could be different for different combinations of ISAs and compilers.
Due to this lack of understanding, software construction security, in its
current state of existence, mostly refers to the second aspect mentioned
above: the coding of security into software.

Coding of security into software can be achieved by following recommended
rules. A few such rules follow:

• Structure the process so that all sections requiring extra privileges are
  modules. The modules should be as small as possible and should perform only
  those tasks that require those privileges.
• Ensure that any assumptions in the program are validated. If this is not
  possible, document them for the installers and maintainers so they know the
  assumptions that attackers will try to invalidate.
• Ensure that the program does not share objects in memory with any other
  program.
• The error status of every function must be checked. Do not try to recover
  unless neither the cause of the error nor its effects affect any security
  considerations. The program should restore the state of the software to the
  state it had before the process began, and then terminate.

17.4. Software Testing Security

Software testing security determines that software protects data and maintains
its security specification as given. For more information, please refer to the
Software Testing KA.

17.5. Build Security into Software Engineering Process

Software is only as secure as its development process. To ensure the security
of software, security must be built into the software engineering process. One
trend that emerges in this regard is the Secure Development Lifecycle (SDL)
concept, which is a classical spiral model that takes a holistic view of
security from the perspective of the software lifecycle and ensures that
security is inherent in software design and development, not an afterthought
later in production. The SDL process is claimed to reduce software maintenance
costs and increase the reliability of software with respect to
security-related faults.

17.6. Software Security Guidelines

Although there are no bulletproof ways for secure software development, some
general guidelines do exist that can be used to aid such effort. These
guidelines span every phase of the software development lifecycle. Some
reputable guidelines are published by the Computer Emergency Response Team
(CERT); below are its top 10 software security practices (the details can be
found in [12]):

1. Validate input.
2. Heed compiler warnings.
3. Architect and design for security policies.
4. Keep it simple.
5. Default deny.
6. Adhere to the principle of least privilege.
7. Sanitize data sent to other software.
8. Practice defense in depth.
9. Use effective quality assurance techniques.
10. Adopt a software construction security standard.
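
Three of the practices listed above (1, 5, and 7) can be illustrated together in a short sketch. It is not taken from the CERT material: the users table, the allow-list of roles, and the function name find_users are hypothetical, and an in-memory SQLite database stands in for the "other software" that receives the data.

```python
import sqlite3

ALLOWED_ROLES = {"viewer", "editor"}  # hypothetical allow-list


def find_users(conn, role):
    """Look up user names by role without building SQL from raw input."""
    # Practices 1 and 5: validate input, and deny anything that is not
    # explicitly allowed.
    if role not in ALLOWED_ROLES:
        raise ValueError("unknown role")
    # Practice 7: hand the value to the database engine as a bound
    # parameter instead of splicing it into the SQL text.
    rows = conn.execute(
        "SELECT name FROM users WHERE role = ? ORDER BY name", (role,)
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ana", "editor"), ("bo", "viewer"), ("cy", "editor")],
)

print(find_users(conn, "editor"))  # ['ana', 'cy']
try:
    find_users(conn, "editor' OR '1'='1")
except ValueError as error:
    print("rejected:", error)      # rejected: unknown role
```

The rejection message is deliberately terse, in line with the earlier remark that messages about security-related errors should not reveal information that would help an attacker.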

MATRIX OF TOPICS VS. REFERENCE MATERIAL

Reference columns: [2*] Voland 2003 | [3*] McConnell 2004 | [4*] Brookshear 2008 | [5*] Horowitz et al. 2007 | [6*] Sommerville 2011 | [8*] Null and Lobur 2006 | [9*] Nielsen 1993 | [11*] Bishop 2002

1. Problem Solving Techniques | s3.2, s4.2
1.1. Definition of Problem Solving | s3.2
1.2. Formulating the Real Problem | s3.2
1.3. Analyze the Problem | s3.2
1.4. Design a Solution Search Strategy | s4.2
1.5. Problem Solving Using Programs | c5
2. Abstraction | s5.2–5.4
2.1. Levels of Abstraction | s5.2–5.3
2.2. Encapsulation | s5.3
2.3. Hierarchy | s5.2
3. Programming Fundamentals | c6–19
3.1. The Programming Process | c6–c19
3.2. Programming Paradigms | c6–c19
3.3. Defensive Programming | c8
4. Programming Language Basics | c6
4.1. Programming Language Overview | s6.1
4.2. Syntax and Semantics of Programming Language | s6.2
4.3. Low Level Programming Language: s6.5–6.7
4.4. High Level Programming Language: s6.5–6.7
4.5. Declarative vs. Imperative Programming Language: s6.5–6.7
5. Debugging Tools and Techniques: c23
5.1. Types of Errors: s23.1
5.2. Debugging Techniques: s23.2
5.3. Debugging Tools: s23.5
6. Data Structure and Representation: s2.1–2.6
6.1. Data Structure Overview: s2.1–2.6
6.2. Types of Data Structure: s2.1–2.6
6.3. Operations on Data Structures: s2.1–2.6
7. Algorithms and Complexity: s1.1–1.3, s3.3–3.6, s4.1–4.8, s5.1–5.7, s6.1–6.3, s7.1–7.6, s11.1, s12.1
7.1. Overview of Algorithms: s1.1–1.2
7.2. Attributes of Algorithms: s1.3
7.3. Algorithmic Analysis: s1.3
7.4. Algorithmic Design Strategies: s3.3–3.6, s4.1–4.8, s5.1–5.7, s6.1–6.3, s7.1–7.6, s11.1, s12.1
7.5. Algorithmic Analysis Strategies: s3.3–3.6, s4.1–4.8, s5.1–5.7, s6.1–6.3, s7.1–7.6, s11.1, s12.1
8. Basic Concept of a System: c10
8.1. Emergent System Properties: s10.1
8.2. System Engineering: s10.2
8.3. Overview of a Computer System
9. Computer Organization: c1–4
9.1. Computer Organization Overview: s1.1–1.2
9.2. Digital Systems: c3
9.3. Digital Logic: c3
9.4. Computer Expression of Data: c2
9.5. The Central Processing Unit (CPU): s4.1–4.2
9.6. Memory System Organization: s4.6
9.7. Input and Output (I/O): s4.5
10. Compiler Basics: s6.4; s8.4
10.1. Compiler Overview: s8.4
10.2. Interpretation and Compilation: s8.4
10.3. The Compilation Process: s6.4; s8.4
11. Operating Systems Basics: c3
11.1. Operating Systems Overview: s3.2
11.2. Tasks of Operating System: s3.3
11.3. Operating System Abstractions: s3.2
11.4. Operating Systems Classification: s3.1
12. Database Basics and Data Management: c9
12.1. Entity and Schema: s9.1
12.2. Database Management Systems (DBMS): s9.1
12.3. Database Query Language: s9.2
12.4. Tasks of DBMS Packages: s9.2
12.5. Data Management: s9.5
12.6. Data Mining: s9.6
13. Network Communication Basics: c12
13.1. Types of Network: s12.2–12.3
13.2. Basic Network Components: s12.6
13.3. Networking Protocols and Standards: s12.4–12.5
13.4. The Internet
13.5. Internet of Things: s12.8
13.6. Virtual Private Network
14. Parallel and Distributed Computing: c9
14.1. Parallel and Distributed Computing Overview: s9.4.1–9.4.3
14.2. Differences between Parallel and Distributed Computing: s9.4.4–9.4.5
14.3. Parallel and Distributed Computing Models: s9.4.4–9.4.5
14.4. Main Issues in Distributed Computing
15. Basic User Human Factors: c8; c5
15.1. Input and Output: s5.1, s5.3
15.2. Error Messages: s5.2, s5.8
15.3. Software Robustness: s5.5–5.6
16. Basic Developer Human Factors: c31–32
16.1. Structure: c31
16.2. Comments: c32
17. Secure Software Development and Maintenance: c29
17.1. Two Aspects of Secure Coding: s29.1
17.2. Coding Security into Software: s29.4
17.3. Requirement Security: s29.2
17.4. Design Security: s29.3
17.5. Implementation Security: s29.5

REFERENCES

[1] Joint Task Force on Computing Curricula, IEEE Computer Society and Association for Computing Machinery, Software Engineering 2004: Curriculum Guidelines for Undergraduate Degree Programs in Software Engineering, 2004; http://sites.computer.org/ccse/SE2004Volume.pdf.
[2*] G. Voland, Engineering by Design, 2nd ed., Prentice Hall, 2003.
[3*] S. McConnell, Code Complete, 2nd ed., Microsoft Press, 2004.
[4*] J.G. Brookshear, Computer Science: An Overview, 10th ed., Addison-Wesley, 2008.
[5*] E. Horowitz et al., Computer Algorithms, 2nd ed., Silicon Press, 2007.
[6*] I. Sommerville, Software Engineering, 9th ed., Addison-Wesley, 2011.
[7] ISO/IEC/IEEE 24765:2010 Systems and Software Engineering—Vocabulary, ISO/IEC/IEEE, 2010.
[8*] L. Null and J. Lobur, The Essentials of Computer Organization and Architecture, 2nd ed., Jones and Bartlett Publishers, 2006.
[9*] J. Nielsen, Usability Engineering, Morgan Kaufmann, 1993.
[10] ISO 9241-420:2011 Ergonomics of Human-System Interaction, ISO, 2011.
[11*] M. Bishop, Computer Security: Art and Science, Addison-Wesley, 2002.
[12] R.C. Seacord, The CERT C Secure Coding Standard, Addison-Wesley Professional, 2008.

CHAPTER 14

MATHEMATICAL FOUNDATIONS

INTRODUCTION

Software professionals live with programs. In a very simple language, one can program only for something that follows a well-understood, nonambiguous logic. The Mathematical Foundations knowledge area (KA) helps software engineers comprehend this logic, which in turn is translated into programming language code. The mathematics that is the primary focus in this KA is quite different from typical arithmetic, where numbers are dealt with and discussed. Logic and reasoning are the essence of mathematics that a software engineer must address.
Mathematics, in a sense, is the study of formal systems. The word “formal” is associated with preciseness, so there cannot be any ambiguous or erroneous interpretation of the fact. Mathematics is therefore the study of any and all certain truths about any concept. This concept can be about numbers as well as about symbols, images, sounds, video, or almost anything. In short, not only numbers and numeric equations are subject to preciseness. On the contrary, a software engineer needs to have a precise abstraction on a diverse application domain.
The SWEBOK Guide’s Mathematical Foundations KA covers basic techniques to identify a set of rules for reasoning in the context of the system under study. Anything that one can deduce following these rules is an absolute certainty within the context of that system. In this KA, techniques that can represent and take forward the reasoning and judgment of a software engineer in a precise (and therefore mathematical) manner are defined and discussed. The language and methods of logic that are discussed here allow us to describe mathematical proofs to infer conclusively the absolute truth of certain concepts beyond the numbers. In short, you can write a program for a problem only if it follows some logic. The objective of this KA is to help you develop the skill to identify and describe such logic. The emphasis is on helping you understand the basic concepts rather than on challenging your arithmetic abilities.

BREAKDOWN OF TOPICS FOR MATHEMATICAL FOUNDATIONS

The breakdown of topics for the Mathematical Foundations KA is shown in Figure 14.1.

1. Set, Relations, Functions
[1*, c2]

Set. A set is a collection of objects, called elements of the set. A set can be represented by listing its elements between braces, e.g., S = {1, 2, 3}.
The symbol ∈ is used to express that an element belongs to a set, or, in other words, is a member of the set. Its negation is represented by ∉, e.g., 1 ∈ S, but 4 ∉ S.
In a more compact representation of a set using set builder notation, {x | P(x)} is the set of all x such that P(x) for any proposition P(x) over any universe of discourse. Examples for some important sets include the following:

N = {0, 1, 2, 3, …} = the set of nonnegative integers.
Z = {…, −3, −2, −1, 0, 1, 2, 3, …} = the set of integers.

Finite and Infinite Set. A set with a finite number of elements is called a finite set. Conversely, any set that does not have a finite number of elements in it is an infinite set. The set of all natural numbers, for example, is an infinite set.
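The notions of membership and set builder notation map closely onto Python's built-in sets. The sketch below is only an illustration: the finite universe `U` and the predicate `is_even` are assumptions chosen for the example, and the infinite set N is modeled lazily because a Python set must be finite.

```python
from itertools import count, islice

S = {1, 2, 3}
one_is_member = 1 in S        # 1 ∈ S
four_is_absent = 4 not in S   # 4 ∉ S

# Set builder notation {x | P(x)} over an assumed finite universe U.
U = range(0, 20)


def is_even(x):
    """Hypothetical predicate P(x) used for the example."""
    return x % 2 == 0


evens = {x for x in U if is_even(x)}

# N = {0, 1, 2, 3, ...} is infinite, so it is generated on demand.
naturals = count(0)
first_five = list(islice(naturals, 5))
```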

Figure 14.1. Breakdown of Topics for the Mathematical Foundations KA

Cardinality. The cardinality of a finite set S is the number of elements in S. This is represented |S|, e.g., if S = {1, 2, 3}, then |S| = 3.
Universal Set. In general S = {x ∈ U | P(x)}, where U is the universe of discourse in which the predicate P(x) must be interpreted. The “universe of discourse” for a given predicate is often referred to as the universal set. Alternately, one may define the universal set as the set of all elements.
Set Equality. Two sets are equal if and only if they have the same elements, i.e.:

X = Y ≡ ∀p (p ∈ X ↔ p ∈ Y).

Subset. X is a subset of set Y, or X is contained in Y, if all elements of X are included in Y. This is denoted by X ⊆ Y. In other words, X ⊆ Y if and only if ∀p (p ∈ X → p ∈ Y).
For example, if X = {1, 2, 3} and Y = {1, 2, 3, 4, 5}, then X ⊆ Y.
If X is not a subset of Y, it is denoted as X ⊈ Y.
Proper Subset. X is a proper subset of Y (denoted by X ⊂ Y) if X is a subset of Y but not equal to Y, i.e., there is some element in Y that is not in X.
In other words, X ⊂ Y if (X ⊆ Y) ∧ (X ≠ Y).
For example, if X = {1, 2, 3}, Y = {1, 2, 3, 4}, and Z = {1, 2, 3}, then X ⊂ Y, but X is not a proper subset of Z. Sets X and Z are equal sets.
If X is not a proper subset of Y, it is denoted as X ⊄ Y.
Superset. If X is a subset of Y, then Y is called a superset of X. This is denoted by Y ⊇ X, i.e., Y ⊇ X if and only if X ⊆ Y.
For example, if X = {1, 2, 3} and Y = {1, 2, 3, 4, 5}, then Y ⊇ X.
Empty Set. A set with no elements is called an empty set. An empty set, denoted by ∅, is also referred to as a null or void set.
Power Set. The set of all subsets of a set X is called the power set of X. It is represented as ℘(X).
For example, if X = {a, b, c}, then ℘(X) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. If |X| = n, then |℘(X)| = 2ⁿ.
Venn Diagrams. Venn diagrams are graphic representations of sets as enclosed areas in the plane.
For example, in Figure 14.2, the rectangle represents the universal set and the shaded region represents a set X.

Figure 14.2. Venn Diagram for Set X

1.1. Set Operations

Intersection. The intersection of two sets X and Y, denoted by X ∩ Y, is the set of common elements in both X and Y.
In other words, X ∩ Y = {p | (p ∈ X) ∧ (p ∈ Y)}.
As, for example, {1, 2, 3} ∩ {3, 4, 6} = {3}.
If X ∩ Y = ∅, then the two sets X and Y are said to be a disjoint pair of sets.
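The subset, superset, and power set definitions can be tried out on the example sets from the text. This is a small sketch using Python's set comparison operators; the helper name `power_set` is an assumption introduced here, and subsets are stored as `frozenset` objects because ordinary sets cannot be elements of a set in Python.

```python
from itertools import chain, combinations


def power_set(xs):
    """Return the power set of xs as a set of frozensets."""
    items = list(xs)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


X = {1, 2, 3}
Y = {1, 2, 3, 4, 5}
Z = {1, 2, 3}

x_subset_of_y = X <= Y               # X ⊆ Y
x_proper_subset_of_y = X < Y         # X ⊂ Y
y_superset_of_x = Y >= X             # Y ⊇ X
x_proper_subset_of_z = X < Z         # False, because X = Z

P = power_set({"a", "b", "c"})
cardinality_matches = len(P) == 2 ** 3   # |℘(X)| = 2ⁿ with n = 3
```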
A Venn diagram for set intersection is shown in Figure 14.3. The common portion of the two sets represents the set intersection.

Figure 14.3. Intersection of Sets X and Y

Union. The union of two sets X and Y, denoted by X ∪ Y, is the set of all elements either in X, or in Y, or in both.
In other words, X ∪ Y = {p | (p ∈ X) ∨ (p ∈ Y)}.
As, for example, {1, 2, 3} ∪ {3, 4, 6} = {1, 2, 3, 4, 6}.

Figure 14.4. Union of Sets X and Y

It may be noted that |X ∪ Y| = |X| + |Y| − |X ∩ Y|.
A Venn diagram illustrating the union of two sets is represented by the shaded region in Figure 14.4.
Complement. The set of elements in the universal set that do not belong to a given set X is called its complement set X'.
In other words, X' = {p | (p ∈ U) ∧ (p ∉ X)}.
The shaded portion of the Venn diagram in Figure 14.5 represents the complement set of X.

Figure 14.5. Venn Diagram for Complement Set of X

Set Difference or Relative Complement. The set of elements that belong to set X but not to set Y builds the set difference of Y from X. This is represented by X − Y.
In other words, X − Y = {p | (p ∈ X) ∧ (p ∉ Y)}.
As, for example, {1, 2, 3} − {3, 4, 6} = {1, 2}.
It may be proved that X − Y = X ∩ Y'.
Set difference X − Y is illustrated by the shaded region in Figure 14.6 using a Venn diagram.

Figure 14.6. Venn Diagram for X − Y

Cartesian Product. An ordinary pair {p, q} is a set with two elements. In a set, the order of the elements is irrelevant, so {p, q} = {q, p}.
In an ordered pair (p, q), the order of occurrences of the elements is relevant. Thus, (p, q) ≠ (q, p) unless p = q. In general (p, q) = (s, t) if and only if p = s and q = t.
Given two sets X and Y, their Cartesian product X × Y is the set of all ordered pairs (p, q) such that p ∈ X and q ∈ Y.
In other words, X × Y = {(p, q) | (p ∈ X) ∧ (q ∈ Y)}.
As, for example, {a, b} × {1, 2} = {(a, 1), (a, 2), (b, 1), (b, 2)}.

1.2. Properties of Set

Some of the important properties and laws of sets are mentioned below.

1. Associative Laws:
X ∪ (Y ∪ Z) = (X ∪ Y) ∪ Z
X ∩ (Y ∩ Z) = (X ∩ Y) ∩ Z

2. Commutative Laws:
X ∪ Y = Y ∪ X
X ∩ Y = Y ∩ X

3. Distributive Laws:
X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)

4. Identity Laws:
X ∪ ∅ = X
X ∩ U = X

5. Complement Laws:
X ∪ X' = U
X ∩ X' = ∅

6. Idempotent Laws:
X ∪ X = X
X ∩ X = X

7. Bound Laws:
X ∪ U = U
X ∩ ∅ = ∅

8. Absorption Laws:
X ∪ (X ∩ Y) = X
X ∩ (X ∪ Y) = X

9. De Morgan’s Laws:
(X ∪ Y)' = X' ∩ Y'
(X ∩ Y)' = X' ∪ Y'

1.3. Relation and Function

A relation is an association between two sets of information. For example, let’s consider a set of residents of a city and their phone numbers. The pairing of names with corresponding phone numbers is a relation. This pairing is ordered for the entire relation. In the example being considered, for each pair, either the name comes first followed by the phone number or the reverse. The set from which the first element is drawn is called the domain set and the other set is called the range set. The domain is what you start with and the range is what you end up with.
A function is a well-behaved relation. A relation R(X, Y) is well behaved if it maps every element of the domain set X to a single element of the range set Y. Let’s consider domain set X as a set of persons and let range set Y store their phone numbers. Assuming that a person may have more than one phone number, the relation being considered is not a function. However, if we draw a relation between names of residents and their dates of birth with the name set as domain, then this becomes a well-behaved relation and hence a function. This means that, while all functions are relations, not all relations are functions. In the case of a function, given an x, one gets one and exactly one y for each ordered pair (x, y).
For example, let’s consider the following two relations.

A: {(3, –9), (5, 8), (7, –6), (3, 9), (6, 3)}.
B: {(5, 8), (7, 8), (3, 8), (6, 8)}.

Are these functions as well?
In the case of relation A, the domain is all the x-values, i.e., {3, 5, 6, 7}, and the range is all the y-values, i.e., {–9, –6, 3, 8, 9}.
Relation A is not a function, as there are two different range values, –9 and 9, for the same x-value of 3.
In the case of relation B, the domain is the same as that for A, i.e., {3, 5, 6, 7}. However, the range is a single element {8}. This qualifies as an example of a function even if all the x-values are mapped to the same y-value. Here, each x-value is distinct and hence the function is well behaved. Relation B may be represented by the equation y = 8.
The characteristic of a function may be verified using a vertical line test, which is stated below:
Given the graph of a relation, if one can draw a vertical line that crosses the graph in more than one place, then the relation is not a function.

Figure 14.7. Vertical Line Test for Function

In this example, both lines L1 and L2 cut the graph for the relation thrice. This signifies that for the same x-value, there are three different y-values in each case. Thus, the relation is not a function.
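The test for whether a relation is a function can be sketched in Python by storing a relation as a set of ordered pairs. Relations A and B are the ones from the text; the helper names `domain`, `range_of`, and `is_function` are assumptions introduced for this illustration.

```python
def domain(relation):
    """All first components (x-values) of the ordered pairs."""
    return {x for x, _ in relation}


def range_of(relation):
    """All second components (y-values) of the ordered pairs."""
    return {y for _, y in relation}


def is_function(relation):
    """A relation is a function if every x-value maps to exactly one y-value."""
    images = {}
    for x, y in relation:
        images.setdefault(x, set()).add(y)
    return all(len(ys) == 1 for ys in images.values())


A = {(3, -9), (5, 8), (7, -6), (3, 9), (6, 3)}
B = {(5, 8), (7, 8), (3, 8), (6, 8)}

a_is_function = is_function(A)   # x = 3 maps to both -9 and 9
b_is_function = is_function(B)   # every x maps to the single value 8
```

The check is the programmatic counterpart of the vertical line test: an x-value with more than one image corresponds to a vertical line that crosses the graph more than once.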
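The set operations of Section 1.1 and some of the laws of Section 1.2 can also be checked mechanically on small examples. The sketch below reuses the example sets {1, 2, 3} and {3, 4, 6} from the text; the universal set `U` and the helper `complement` are assumptions made for the illustration. Passing checks on one example do not prove the laws, they only confirm that the example agrees with them.

```python
from itertools import product

U = set(range(1, 8))   # assumed universal set {1, ..., 7}
X = {1, 2, 3}
Y = {3, 4, 6}


def complement(s, universe=U):
    """X' = {p | p ∈ U and p ∉ X}."""
    return universe - s


intersection = X & Y                          # X ∩ Y
union = X | Y                                 # X ∪ Y
difference = X - Y                            # X − Y
cartesian = set(product({"a", "b"}, {1, 2}))  # {a, b} × {1, 2}

# Identities stated in the text, evaluated on this example.
difference_identity = difference == X & complement(Y)
union_cardinality = len(union) == len(X) + len(Y) - len(intersection)
de_morgan_union = complement(X | Y) == complement(X) & complement(Y)
de_morgan_intersection = complement(X & Y) == complement(X) | complement(Y)
```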

2. Basic Logic
[1*, c1]

2.1. Propositional Logic

A proposition is a statement that is either true or false, but not both. Let’s consider declarative sentences for which it is meaningful to assign either of the two status values: true or false. Some examples of propositions are given below.

1. The sun is a star.
2. Elephants are mammals.
3. 2 + 3 = 5.

However, a + 3 = b is not a proposition, as it is neither true nor false. It depends on the values of the variables a and b.
The Law of Excluded Middle: For every proposition p, either p is true or p is false.
The Law of Contradiction: For every proposition p, it is not the case that p is both true and false.
Propositional logic is the area of logic that deals with propositions. A truth table displays the relationships between the truth values of propositions.
A Boolean variable is one whose value is either true or false. Computer bit operations correspond to logical operations of Boolean variables.
The basic logical operators including negation (¬ p), conjunction (p ∧ q), disjunction (p ∨ q), exclusive or (p ⊕ q), and implication (p → q) are to be studied. Compound propositions may be formed using various logical operators.
A compound proposition that is always true is a tautology. A compound proposition that is always false is a contradiction. A compound proposition that is neither a tautology nor a contradiction is a contingency.
Compound propositions that always have the same truth value are called logically equivalent (denoted by ≡). Some of the common equivalences are:

Identity laws:
p ∧ T ≡ p
p ∨ F ≡ p

Domination laws:
p ∨ T ≡ T
p ∧ F ≡ F

Idempotent laws:
p ∨ p ≡ p
p ∧ p ≡ p

Double negation law:
¬ (¬ p) ≡ p

Commutative laws:
p ∨ q ≡ q ∨ p
p ∧ q ≡ q ∧ p

Associative laws:
(p ∨ q) ∨ r ≡ p ∨ (q ∨ r)
(p ∧ q) ∧ r ≡ p ∧ (q ∧ r)

Distributive laws:
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)

De Morgan’s laws:
¬ (p ∧ q) ≡ ¬ p ∨ ¬ q
¬ (p ∨ q) ≡ ¬ p ∧ ¬ q

2.2. Predicate Logic

A predicate is a verb phrase template that describes a property of objects or a relationship among objects represented by the variables. For example, in the sentence “The flower is red,” the template “is red” is a predicate. It describes the property of a flower. The same predicate may be used in other sentences too.
Predicates are often given a name, e.g., “Red” or simply “R” can be used to represent the predicate “is red.” Assuming R as the name for the predicate “is red,” sentences that assert an object is of the color red can be represented as R(x), where x represents an arbitrary object. R(x) reads as “x is red.”
Quantifiers allow statements about entire collections of objects rather than having to enumerate the objects by name.
The Universal quantifier ∀x asserts that a sentence is true for all values of variable x.
For example, ∀x (Tiger(x) → Mammal(x)) means all tigers are mammals.
The Existential quantifier ∃x asserts that a sentence is true for at least one value of variable x.
For example, ∃x (Tiger(x) ∧ Man-eater(x)) means there exists at least one tiger that is a man-eater.
Thus, while universal quantification uses implication, the existential quantification naturally uses conjunction.
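Logical equivalences can be checked by enumerating every row of a truth table, and quantifiers over a finite universe correspond to Python's `all` and `any`. The following is a sketch under stated assumptions: the helpers `implies` and `equivalent` and the small animal universe are invented for the example.

```python
from itertools import product


def implies(p, q):
    """Material implication p → q."""
    return (not p) or q


def equivalent(f, g, arity):
    """True if f and g agree on every row of the truth table."""
    rows = product([True, False], repeat=arity)
    return all(f(*row) == g(*row) for row in rows)


de_morgan = equivalent(
    lambda p, q: not (p and q),
    lambda p, q: (not p) or (not q),
    2,
)
distributive = equivalent(
    lambda p, q, r: p or (q and r),
    lambda p, q, r: (p or q) and (p or r),
    3,
)

# Hypothetical finite universe of discourse.
animals = [
    {"name": "tiger_1", "tiger": True, "mammal": True, "man_eater": True},
    {"name": "elephant_1", "tiger": False, "mammal": True, "man_eater": False},
]

# ∀x (Tiger(x) → Mammal(x))
all_tigers_are_mammals = all(implies(a["tiger"], a["mammal"]) for a in animals)

# ∃x (Tiger(x) ∧ Man-eater(x))
some_tiger_is_man_eater = any(a["tiger"] and a["man_eater"] for a in animals)

# With implication instead of conjunction, the existential statement holds
# vacuously even in a universe that contains no tiger at all.
no_tigers = [animals[1]]
vacuous = any(implies(a["tiger"], a["man_eater"]) for a in no_tigers)
intended = any(a["tiger"] and a["man_eater"] for a in no_tigers)
```

The last two lines show why existential statements are normally written with conjunction: the implication form is satisfied by any object that is not a tiger.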
A variable x that is introduced into a logical expression by a quantifier is bound to the closest enclosing quantifier.
A variable is said to be a free variable if it is not bound to a quantifier.
As with variable scoping in a block-structured programming language, a variable in a logical expression refers to the closest quantifier within whose scope it appears.
For example, in ∃x (Cat(x) ∧ ∀x (Black(x))), x in Black(x) is universally quantified. The expression implies that cats exist and everything is black.
Propositional logic falls short in representing many assertions that are used in computer science and mathematics. It also fails to compare equivalence and some other types of relationship between propositions.
For example, the assertion “a is greater than 1” is not a proposition because one cannot infer whether it is true or false without knowing the value of a. Thus, propositional logic cannot deal with such sentences. However, such assertions appear quite often in mathematics and we want to reason about them. Also, the pattern involved in the following two logically equivalent statements cannot be captured by propositional logic: “Not all men are smokers” and “Some men don’t smoke.” Each of these two propositions is treated independently in propositional logic. There is no mechanism in propositional logic to find out whether or not the two are equivalent to one another. Hence, in propositional logic, each equivalent proposition is treated individually rather than dealing with a general formula that covers all equivalences collectively.
Predicate logic is supposed to be a more powerful logic that addresses these issues. In a sense, predicate logic (also known as first-order logic or predicate calculus) is an extension of propositional logic to formulas involving terms and predicates.

3. Proof Techniques
[1*, c1]

A proof is an argument that rigorously establishes the truth of a statement. Proofs can themselves be represented formally as discrete structures.
Statements used in a proof include axioms and postulates that are essentially the underlying assumptions about mathematical structures, the hypotheses of the theorem to be proved, and previously proved theorems.
A theorem is a statement that can be shown to be true.
A lemma is a simple theorem used in the proof of other theorems.
A corollary is a proposition that can be established directly from a theorem that has been proved.
A conjecture is a statement whose truth value is unknown.
When a conjecture’s proof is found, the conjecture becomes a theorem. Many times conjectures are shown to be false and, hence, are not theorems.

3.1. Methods of Proving Theorems

Direct Proof. Direct proof is a technique to establish that the implication p → q is true by showing that q must be true when p is true.
For example, to show that if n is odd then n² − 1 is even, suppose n is odd, i.e., n = 2k + 1 for some integer k:

∴ n² = (2k + 1)² = 4k² + 4k + 1.

As the first two terms of the Right Hand Side (RHS) are even numbers irrespective of the value of k, the Left Hand Side (LHS) (i.e., n²) is an odd number. Therefore, n² − 1 is even.
Proof by Contradiction. A proposition p is true by contradiction if proved based on the truth of the implication ¬ p → q where q is a contradiction.
For example, to show that the sum of 2x + 1 and 2y − 1 is even, assume that the sum of 2x + 1 and 2y − 1 is odd. In other words, 2(x + y), which is a multiple of 2, is odd. This is a contradiction. Hence, the sum of 2x + 1 and 2y − 1 is even.
An inference rule is a pattern establishing that if a set of premises are all true, then it can be deduced that a certain conclusion statement is true. The inference rules of addition, simplification, and conjunction need to be studied.
Proof by Induction. Proof by induction is done in two phases. First, the proposition is established to be true for a base case, typically for the
positive integer 1. In the second phase, it is established that if the proposition holds for an arbitrary positive integer k, then it must also hold for the next greater integer, k + 1. In other words, proof by induction is based on the rule of inference that tells us that the truth of an infinite sequence of propositions P(n), ∀n ∈ [1 … ∞], is established if P(1) is true and, secondly, P(k) → P(k + 1) for all k ∈ [1 … ∞].
It may be noted here that, for a proof by mathematical induction, it is not assumed that P(k) is true for all positive integers k. Proving a theorem or proposition only requires us to establish that if it is assumed P(k) is true for any arbitrary positive integer k, then P(k + 1) is also true. The correctness of mathematical induction as a valid proof technique is beyond the scope of the current text. Let us prove the following proposition using induction.
Proposition: The sum of the first n positive odd integers P(n) is n².
Basis Step: The proposition is true for n = 1 as P(1) = 1² = 1. The basis step is complete.
Inductive Step: The induction hypothesis (IH) is that the proposition is true for n = k, k being an arbitrary positive integer.

∴ 1 + 3 + 5 + … + (2k − 1) = k²

Now, it is to be shown that P(k) → P(k + 1).

P(k + 1) = 1 + 3 + 5 + … + (2k − 1) + (2k + 1)
= P(k) + (2k + 1)
= k² + (2k + 1) [using IH]
= k² + 2k + 1
= (k + 1)²

Thus, it is shown that if the proposition is true for n = k, then it is also true for n = k + 1.
The basis step together with the inductive step of the proof show that P(1) is true and the conditional statement P(k) → P(k + 1) is true for all positive integers k. Hence, the proposition is proved.

4. Basics of Counting
[1*, c6]

The sum rule states that if a task t1 can be done in n1 ways and a second task t2 can be done in n2 ways, and if these tasks cannot be done at the same time, then there are n1 + n2 ways to do either task.

• If A and B are disjoint sets, then |A ∪ B| = |A| + |B|.
• In general if A1, A2, …, An are disjoint sets, then |A1 ∪ A2 ∪ … ∪ An| = |A1| + |A2| + … + |An|.

For example, if there are 200 athletes doing sprint events and 30 athletes who participate in the long jump event, then how many ways are there to pick one athlete who is either a sprinter or a long jumper?
Using the sum rule, the answer would be 200 + 30 = 230.
The product rule states that if a task t1 can be done in n1 ways and a second task t2 can be done in n2 ways after the first task has been done, then there are n1 * n2 ways to do the procedure.

• If A and B are disjoint sets, then |A × B| = |A| * |B|.
• In general if A1, A2, …, An are disjoint sets, then |A1 × A2 × … × An| = |A1| * |A2| * … * |An|.

For example, if there are 200 athletes doing sprint events and 30 athletes who participate in the long jump event, then how many ways are there to pick two athletes so that one is a sprinter and the other is a long jumper?
Using the product rule, the answer would be 200 * 30 = 6000.
The principle of inclusion-exclusion states that if a task t1 can be done in n1 ways and a second task t2 can be done in n2 ways at the same time with t1, then to find the total number of ways the two tasks can be done, subtract the number of ways to do both tasks from n1 + n2.

• If A and B are not disjoint, |A ∪ B| = |A| + |B| − |A ∩ B|.

In other words, the principle of inclusion-exclusion aims to ensure that the objects in the intersection of two sets are not counted more than once.
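The three counting rules can be reproduced with Python sets. The sketch assumes, as the athlete example implicitly does, that no athlete is both a sprinter and a long jumper; the athlete labels and the overlapping sets `A` and `B` are made up for the illustration.

```python
from itertools import product

# Assumed disjoint groups of athletes with invented labels.
sprinters = {f"S{i}" for i in range(200)}
long_jumpers = {f"L{i}" for i in range(30)}

# Sum rule: pick one athlete who is either a sprinter or a long jumper.
sum_rule = len(sprinters | long_jumpers)

# Product rule: pick one sprinter and one long jumper.
product_rule = sum(1 for _ in product(sprinters, long_jumpers))

# Inclusion-exclusion on two overlapping sets.
A = set(range(0, 10))   # {0, ..., 9}
B = set(range(5, 12))   # {5, ..., 11}
inclusion_exclusion = len(A) + len(B) - len(A & B)
```

When the two groups overlap, the plain sum rule overcounts, and the inclusion-exclusion value is the one that matches `len(A | B)`.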
Recursion is the general term for the practice of defining an object in terms of itself. There are recursive algorithms, recursively defined functions, relations, sets, etc.
A recursive function is a function that calls itself. For example, we define f(n) = 3 * f(n − 1) for all n ∈ N and n ≠ 0 and f(0) = 5.
An algorithm is recursive if it solves a problem by reducing it to an instance of the same problem with a smaller input.
A phenomenon is said to be random if individual outcomes are uncertain but the long-term pattern of many individual outcomes is predictable.
The probability of any outcome for a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions.
The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1. Any probability is a number between 0 and 1. If S is the sample space in a probability model, then P(S) = 1. All possible outcomes together must have a probability of 1.
Two events A and B are disjoint if they have no outcomes in common and so can never occur together. If A and B are two disjoint events, P(A or B) = P(A) + P(B). This is known as the addition rule for disjoint events.
If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities.
Permutation is an arrangement of objects in which the order matters without repetition. One can choose r objects in a particular order from a total of n objects in nPr ways, where nPr = n! / (n − r)!. Various notations like nPr and P(n, r) are used to represent the number of permutations of a set of n objects taken r at a time.
Combination is a selection of objects in which the order does not matter without repetition. This is different from a permutation because the order does not matter. If the order is only changed (and not the members) then no new combination is formed. One can choose r objects in any order from a total of n objects in nCr ways, where nCr = n! / [r! * (n − r)!].

5. Graphs and Trees
[1*, c10, c11]

5.1. Graphs

A graph G = (V, E) consists of a set of vertices (nodes) V and a set of edges E. Edges are also referred to as arcs or links.

Figure 14.8. Example of a Graph

F is a function that maps the set of edges E to a set of ordered or unordered pairs of elements of V. For example, in Figure 14.8, G = (V, E) where V = {A, B, C}, E = {e1, e2, e3}, and F = {(e1, (A, C)), (e2, (C, B)), (e3, (B, A))}.
The graph in Figure 14.8 is a simple graph that consists of a set of vertices or nodes and a set of edges connecting unordered pairs.
The edges in simple graphs are undirected. Such graphs are also referred to as undirected graphs.
For example, in Figure 14.8, (e1, (A, C)) may be replaced by (e1, (C, A)) as the pair between vertices A and C is unordered. This holds good for the other two edges too.
In a multigraph, more than one edge may connect the same two vertices. Two or more connecting edges between the same pair of vertices may reflect multiple associations between the same two vertices. Such edges are called parallel or multiple edges.
For example, in Figure 14.9, the edges e3 and e4 are both between A and B. Figure 14.9 is a multigraph where edges e3 and e4 are multiple edges.
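Returning to the counting material earlier on this page, the recursive definition of f and the formulas for nPr and nCr translate directly into code. This is a minimal sketch; the function names `n_p_r` and `n_c_r` are assumptions, and the standard-library functions `math.perm` and `math.comb` (available from Python 3.8) are used only as a cross-check.

```python
from math import comb, factorial, perm


def f(n):
    """f(0) = 5 and f(n) = 3 * f(n - 1) for every n in N with n != 0."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    if n == 0:
        return 5          # base case stops the recursion
    return 3 * f(n - 1)   # the function calls itself on a smaller input


def n_p_r(n, r):
    """Number of permutations: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


def n_c_r(n, r):
    """Number of combinations: n! / [r! * (n - r)!]."""
    return factorial(n) // (factorial(r) * factorial(n - r))


f_of_3 = f(3)                    # 5 * 3 * 3 * 3
ordered_pairs = n_p_r(5, 2)      # order matters
unordered_pairs = n_c_r(5, 2)    # order does not matter
```

Each combination of r objects can be arranged in r! orders, which is why `n_p_r(n, r)` equals `n_c_r(n, r) * factorial(r)`.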

Figure 14.9. Example of a Multigraph

In a pseudograph, edges connecting a node to itself are allowed. Such edges are called loops.

Figure 14.10. Example of a Pseudograph

For example, in Figure 14.10, the edge e4 both starts and ends at B. Figure 14.10 is a pseudograph in which e4 is a loop.

Figure 14.11. Example of a Directed Graph

A directed graph G = (V, E) consists of a set of vertices V and a set of edges E that are ordered pairs of elements of V. A directed graph may contain loops.
For example, in Figure 14.11, G = (V, E) where V = {A, B, C}, E = {e1, e2, e3}, and F = {(e1, (A, C)), (e2, (B, C)), (e3, (B, A))}.

Figure 14.12. Example of a Weighted Graph

In a weighted graph G = (V, E), each edge has a weight associated with it. The weight of an edge typically represents the numeric value associated with the relationship between the corresponding two vertices.
For example, in Figure 14.12, the weights for the edges e1, e2, and e3 are taken to be 76, 93, and 15 respectively. If the vertices A, B, and C represent three cities in a state, the weights, for example, could be the distances in miles between these cities.
Let G = (V, E) be an undirected graph with edge set E. Then, for an edge e ∈ E where e = {u, v}, the following terminologies are often used:

• u, v are said to be adjacent or neighbors or connected.
• edge e is incident with vertices u and v.
• edge e connects u and v.
• vertices u and v are endpoints for edge e.

If vertex v ∈ V, the set of vertices in the undirected graph G(V, E), then:

• the degree of v, deg(v), is its number of incident edges, except that any self-loops are counted twice.
14-10 SWEBOK® Guide V3.0

• a vertex with degree 0 is called an isolated vertex.
• a vertex of degree 1 is called a pendant vertex.

Let G(V, E) be a directed graph. If e(u, v) is an edge of G, then the following terminologies are often used:

• u is adjacent to v, and v is adjacent from u.
• e comes from u and goes to v.
• e connects u to v, or e goes from u to v.
• the initial vertex of e is u.
• the terminal vertex of e is v.

If vertex v is in the set of vertices for the directed graph G(V, E), then

• in-degree of v, deg−(v), is the number of edges going to v, i.e., for which v is the terminal vertex.
• out-degree of v, deg+(v), is the number of edges coming from v, i.e., for which v is the initial vertex.
• degree of v, deg(v) = deg−(v) + deg+(v), is the sum of v's in-degree and out-degree.
• a loop at a vertex contributes 1 to both in-degree and out-degree of this vertex.

It may be noted that, following the definitions above, the degree of a node is unchanged whether we consider its edges to be directed or undirected.
In an undirected graph, a path of length n from u to v is a sequence of n adjacent edges from vertex u to vertex v.

• A path is a circuit if u = v.
• A path traverses the vertices along it.
• A path is simple if it contains no edge more than once.

A cycle on n vertices Cn for any n ≥ 3 is a simple graph where V = {v1, v2, …, vn} and E = {{v1, v2}, {v2, v3}, …, {vn−1, vn}, {vn, v1}}.
For example, Figure 14.13 illustrates two cycles of length 3 and 4.

Figure 14.13. Example of Cycles C3 and C4

An adjacency list is a table with one row per vertex, listing its adjacent vertices. The adjacency listing for a directed graph maintains a listing of the terminal nodes for each vertex in the graph.

Vertex   Adjacency List
A        B, C
B        A, B, C
C        A, B

Figure 14.14. Adjacency Lists for Graphs in Figures 14.10 and 14.11

For example, Figure 14.14 illustrates the adjacency lists for the pseudograph in Figure 14.10 and the directed graph in Figure 14.11. As the out-degree of vertex C in Figure 14.11 is zero, there is no entry against C in the adjacency list.
Different representations for a graph, such as the adjacency matrix, incidence matrix, and adjacency lists, need to be studied.

5.2. Trees

A tree T(N, E) is a hierarchical data structure of n = |N| nodes with a specially designated root node R while the remaining n − 1 nodes form subtrees under the root node R. The number of edges |E| in a tree would always be equal to |N| − 1.
The subtree at node X is the subgraph of the tree consisting of node X and its descendants and all edges incident to those descendants. As an alternative to this recursive definition, a tree may be defined as a connected undirected graph with no simple circuits.
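The two characterizations above can be combined into a simple check: a connected undirected simple graph with |E| = |N| − 1 has no simple circuits and is therefore a tree. The following is a minimal sketch under stated assumptions: the graph is given as an adjacency list mapping each vertex to the set of its neighbors, it has no loops or multiple edges, and the function name is_tree and both sample graphs are hypothetical (they are not taken from the figures).

```python
from collections import deque


def is_tree(adj):
    """Return True if the undirected simple graph `adj` is a tree.

    `adj` is an adjacency list: a dict mapping each vertex to the set of
    its neighbours. Assumption: no self-loops and no multiple edges.
    """
    if not adj:
        return False  # assumption: the empty graph is not treated as a tree
    # Every undirected edge {u, v} is listed twice (under u and under v).
    edge_count = sum(len(neighbours) for neighbours in adj.values()) // 2
    if edge_count != len(adj) - 1:
        return False
    # Breadth-first search from an arbitrary vertex tests connectivity.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


# Hypothetical sample graphs.
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
cycle_c3 = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

print(is_tree(star))      # True
print(is_tree(cycle_c3))  # False
```

The cycle C3 fails the edge-count test because it has three edges on three vertices, which is exactly the presence of a simple circuit.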
Mathematical Foundations 14-11

Figure 14.15. Example of a Tree

However, one should remember that a tree is strictly hierarchical in nature as compared to a graph, which is flat. In case of a tree, an ordered pair is built between two nodes as parent and child. Each child node in a tree is associated with only one parent node, whereas this restriction becomes meaningless for a graph where no parent-child association exists.
An undirected graph is a tree if and only if there is a unique simple path between any two of its vertices.
Figure 14.15 presents a tree T(N, E) where the set of nodes N = {A, B, C, D, E, F, G, H, I, J, K}. The edge set E is {(A, B), (A, C), (A, D), (B, E), (B, F), (B, G), (C, H), (C, I), (D, J), (D, K)}.
The parent of a nonroot node v is the unique node u with a directed edge from u to v. Each node in the tree has a unique parent node except the root of the tree.
For example, in Figure 14.15, root node A is the parent node for nodes B, C, and D. Similarly, B is the parent of E, F, G, and so on. The root node A does not have any parent.
A node that has children is called an internal node.
For example, in Figure 14.15, nodes A and B are examples of internal nodes.
The degree of a node in a tree is the same as its number of children.
For example, in Figure 14.15, root node A and its child B are both of degree 3. Nodes C and D have degree 2.
The distance of a node from the root node in terms of number of hops is called its level. Nodes in a tree are at different levels. The root node is at level 0. Alternately, the level of a node X is the length of the unique path from the root of the tree to node X.
For example, root node A is at level 0 in Figure 14.15. Nodes B, C, and D are at level 1. The remaining nodes in Figure 14.15 are all at level 2.
The height of a tree is the maximum of the levels of nodes in the tree.
For example, in Figure 14.15, the height of the tree is 2.
A node is called a leaf if it has no children. The degree of a leaf node is 0.
For example, in Figure 14.15, nodes E through K are all leaf nodes with degree 0.
The ancestors or predecessors of a nonroot node X are all the nodes in the path from root to node X.
For example, in Figure 14.15, nodes A and D form the set of ancestors for J.
The successors or descendants of a node X are all the nodes that have X as an ancestor. For a tree with n nodes, all the remaining n − 1 nodes are successors of the root node.
For example, in Figure 14.15, node B has successors in E, F, and G.
If node X is an ancestor of node Y, then node Y is a successor of X.
Two or more nodes sharing the same parent node are called sibling nodes.
For example, in Figure 14.15, nodes E and G are siblings. However, nodes E and J, though from the same level, are not sibling nodes.
Two sibling nodes are of the same level, but two nodes in the same level are not necessarily siblings.
A tree is called an ordered tree if the relative position of occurrences of children nodes is significant.
For example, a family tree is an ordered tree if, as a rule, the name of an elder sibling appears always before (i.e., on the left of) the younger sibling.
In an unordered tree, the relative position of occurrences between the siblings does not bear any significance and may be altered arbitrarily.
A binary tree is formed with zero or more nodes where there is a root node R and all the remaining nodes form a pair of ordered subtrees under the root node.
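The tree terminology introduced for Figure 14.15 (parent, level, height, leaf, ancestor, sibling) can be recomputed directly from the edge set E quoted in the text. The sketch below is only an illustration: the edge list is copied from the text, while the helper names (ancestors, level, siblings) are assumptions made for this example.

```python
# Edge set E of the tree in Figure 14.15, as (parent, child) pairs.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "E"), ("B", "F"), ("B", "G"),
         ("C", "H"), ("C", "I"),
         ("D", "J"), ("D", "K")]

parent = {child: par for par, child in EDGES}
children = {}
for par, child in EDGES:
    children.setdefault(par, []).append(child)

nodes = set(parent) | set(children)
# The root is the only node without a parent.
root = next(n for n in nodes if n not in parent)


def ancestors(node):
    """Nodes on the path from the root down to, but excluding, `node`."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]


def level(node):
    """Number of hops from the root to `node`."""
    return len(ancestors(node))


def siblings(x, y):
    """True if x and y are distinct nodes sharing the same parent."""
    return x != y and x in parent and parent.get(x) == parent.get(y)


height = max(level(n) for n in nodes)
leaves = sorted(n for n in nodes if n not in children)

print(root, height)        # A 2
print(leaves)              # ['E', 'F', 'G', 'H', 'I', 'J', 'K']
print(ancestors("J"))      # ['A', 'D']
print(siblings("E", "G"))  # True
print(siblings("E", "J"))  # False
```

The printed values agree with the examples given for Figure 14.15: height 2, leaves E through K, ancestors A and D for J, and E and G as siblings.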

In a binary tree, no internal node can have more than two children. However, one must consider that besides this criterion in terms of the degree of internal nodes, a binary tree is always ordered. If the positions of the left and right subtrees for any node in the tree are swapped, then a new tree is derived.

Figure 14.16. Examples of Binary Trees

For example, in Figure 14.16, the two binary trees are different as the positions of occurrences of the children of A are different in the two trees.

Figure 14.17. Example of a Full Binary Tree

According to [1*], a binary tree is called a full binary tree if every internal node has exactly two children.
For example, the binary tree in Figure 14.17 is a full binary tree, as both of the two internal nodes A and B are of degree 2.
A full binary tree following the definition above is also referred to as a strictly binary tree.
A complete binary tree has all its levels, except possibly the last one, filled up to capacity. In case the last level of a complete binary tree is not full, nodes occur from the leftmost positions available.
For example, both binary trees in Figure 14.18 are complete binary trees. The tree in Figure 14.18(a) is a complete as well as a full binary tree.

Figure 14.18. Example of Complete Binary Trees

Interestingly, following the definitions above, the tree in Figure 14.18(b) is a complete but not full binary tree as node B has only one child in D. On the contrary, the tree in Figure 14.17 is a full, but not complete, binary tree, as the children of B occur in the tree while the children of C do not appear in the last level.
A binary tree of height H is balanced if all its leaf nodes occur at levels H or H − 1.
For example, all three binary trees in Figures 14.17 and 14.18 are balanced binary trees.
There are at most 2^H leaves in a binary tree of height H. In other words, if a binary tree with L leaves is full and balanced, then its height is H = ⌈log2 L⌉.
For example, this statement is true for the two trees in Figures 14.17 and 14.18(a) as both trees are full and balanced. However, the expression above does not match for the tree in Figure 14.18(b) as it is not a full binary tree.
A binary search tree (BST) is a special kind of binary tree in which each node contains a distinct key value, and the key value of each node in the tree is less than every key value in its right subtree and greater than every key value in its left subtree.
A traversal algorithm is a procedure for systematically visiting every node of a binary tree. Tree traversals may be defined recursively.
If T is a binary tree with root R and the remaining nodes form an ordered pair of nonnull left subtree TL and nonnull right subtree TR below R, then the preorder traversal function PreOrder(T) is defined as:

PreOrder(T) = R, PreOrder(TL), PreOrder(TR) … eqn. 1
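Eqn. 1 translates almost literally into a recursive function. The following is a minimal sketch, assuming a hypothetical Node class with key, left, and right fields and a made-up five-node tree; an empty (null) subtree contributes nothing to the output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node used only for this illustration."""
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(t: Optional[Node]) -> List[str]:
    """PreOrder(T) = R, PreOrder(TL), PreOrder(TR); a null subtree yields nothing."""
    if t is None:
        return []
    return [t.key] + preorder(t.left) + preorder(t.right)


# Made-up tree: root R, left child L with children LL and LR, right child X.
tree = Node("R", Node("L", Node("LL"), Node("LR")), Node("X"))
print(preorder(tree))  # ['R', 'L', 'LL', 'LR', 'X']
```

The root is emitted first, then the whole left subtree, then the whole right subtree, which is the order eqn. 1 prescribes.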

The recursive process of finding the preorder traversal of the subtrees continues till the subtrees are found to be Null. Here, commas have been used as delimiters for the sake of improved readability.
The postorder and in-order may be similarly defined using eqn. 2 and eqn. 3 respectively.

PostOrder(T) = PostOrder(TL), PostOrder(TR), R … eqn. 2
InOrder(T) = InOrder(TL), R, InOrder(TR) … eqn. 3

Figure 14.19. A Binary Search Tree

For example, the tree in Figure 14.19 is a binary search tree (BST). The preorder, postorder, and in-order traversal outputs for the BST are given below in their respective order.

Preorder output: 9, 5, 2, 1, 4, 7, 6, 8, 13, 11, 10, 15
Postorder output: 1, 4, 2, 6, 8, 7, 5, 10, 11, 15, 13, 9
In-order output: 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15

Further discussion on trees and their usage has been included in section 6, Data Structure and Representation, of the Computing Foundations KA.

6. Discrete Probability
[1*, c7]

Probability is the mathematical description of randomness. The basic definitions of probability and randomness are given in section 4 of this KA. Here, let us start with the concepts behind probability distribution and discrete probability.
A probability model is a mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events. The sample space defines the set of all possible outcomes, whereas an event is a subset of a sample space representing a possible outcome or a set of outcomes.
A random variable is a function or rule that assigns a number to each outcome. Basically, it is just a symbol that represents the outcome of an experiment.
For example, let X be the number of heads when the experiment is flipping a coin n times. Similarly, let S be the speed of a car as registered on a radar detector.
The values for a random variable could be discrete or continuous depending on the experiment.
A discrete random variable can hold all possible outcomes without missing any, although it might take an infinite amount of time.
A continuous random variable is used to measure an uncountable number of values even if an infinite amount of time is given.
For example, if a random variable X represents an outcome that is a real number between 1 and 100, then X may have an infinite number of values. One can never list all possible outcomes for X even if an infinite amount of time is allowed. Here, X is a continuous random variable. On the contrary, for the same interval of 1 to 100, another random variable Y can be used to list all the integer values in the range. Here, Y is a discrete random variable.
An upper-case letter, say X, will represent the name of the random variable. Its lower-case counterpart, x, will represent the value of the random variable.
The probability that the random variable X will equal x is:

P(X = x) or, more simply, P(x).

A probability distribution (density) function is a table, formula, or graph that describes the values of a random variable and the probability associated with these values.
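The coin-flipping example can be turned into such a table by enumerating the sample space. The sketch below assumes a fair coin (the text does not say so) and uses a hypothetical helper heads_distribution; it lists P(X = x) for X, the number of heads in n flips, using exact fractions.

```python
from fractions import Fraction
from itertools import product


def heads_distribution(n):
    """Return {x: P(X = x)} for X = number of heads in n flips.

    Assumption: the coin is fair, so all 2**n outcomes are equally likely.
    """
    outcomes = list(product("HT", repeat=n))  # the sample space S
    weight = Fraction(1, len(outcomes))
    dist = {}
    for outcome in outcomes:
        x = outcome.count("H")                # value taken by X on this outcome
        dist[x] = dist.get(x, 0) + weight
    return dist


dist = heads_distribution(3)
for x in sorted(dist):
    print(x, dist[x])
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

Each outcome of the sample space is mapped to a number by X, and the probabilities of the outcomes sharing the same number are added up.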

Probabilities associated with discrete random variables have the following properties:

i. 0 ≤ P(x) ≤ 1 for all x
ii. ΣP(x) = 1

A discrete probability distribution can be represented as a discrete random variable.

X      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6

Figure 14.20. A Discrete Probability Function for a Rolling Die

The mean μ of a probability distribution model is the sum of the product terms for individual events and its outcome probability. In other words, for the possible outcomes x1, x2, …, xn in a sample space S, if pk is the probability of outcome xk, the mean of this probability would be μ = x1p1 + x2p2 + … + xnpn.
For example, the mean of the probability density for the distribution in Figure 14.20 would be

1 * (1/6) + 2 * (1/6) + 3 * (1/6) + 4 * (1/6) + 5 * (1/6) + 6 * (1/6)
= 21 * (1/6) = 3.5

Here, the sample space refers to the set of all possible outcomes.
The variance σ² of a discrete probability model is: σ² = (x1 − μ)²p1 + (x2 − μ)²p2 + … + (xn − μ)²pn. The standard deviation σ is the square root of the variance.
For example, for the probability distribution in Figure 14.20, the variance σ² would be

σ² = [(1 − 3.5)² * (1/6) + (2 − 3.5)² * (1/6) + (3 − 3.5)² * (1/6) + (4 − 3.5)² * (1/6) + (5 − 3.5)² * (1/6) + (6 − 3.5)² * (1/6)]
= (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) * (1/6)
= 17.5 * (1/6)
≈ 2.92

∴ standard deviation σ = √2.92 ≈ 1.71

These numbers indeed aim to derive the average value from repeated experiments. This is based on the single most important phenomenon of probability, i.e., the average value from repeated experiments is likely to be close to the expected value of one experiment. Moreover, the average value is more likely to be closer to the expected value of any one experiment as the number of experiments increases.

7. Finite State Machines
[1*, c13]

A computer system may be abstracted as a mapping from state to state driven by inputs. In other words, a system may be considered as a transition function T: S × I → S × O, where S is the set of states and I, O are the input and output sets.
If the state set S is finite (not infinite), the system is called a finite state machine (FSM).
Alternately, a finite state machine (FSM) is a mathematical abstraction composed of a finite number of states and transitions between those states. If the domain S × I is reasonably small, then one can specify T explicitly using diagrams similar to a flow graph to illustrate the way logic flows for different inputs. However, this is practical only for machines that have a very small information capacity.
An FSM has a finite internal memory, an input feature that reads symbols in a sequence and one at a time, and an output feature.
The operation of an FSM begins from a start state, goes through transitions depending on input to different states, and can end in any valid state. However, only a few of all the states mark a successful flow of operation. These are called accept states.
The information capacity of an FSM is C = log2 |S|. Thus, if we represent a machine having an information capacity of C bits as an FSM, then its state transition graph will have |S| = 2^C nodes.
A finite state machine is formally defined as M = (S, I, O, f, g, s0).

S is the state set;
I is the set of input symbols;
O is the set of output symbols;
f is the state transition function;

g is the output function;
and s0 is the initial state.

Given an input x ∈ I on state Sk, the FSM makes a transition to state Sh following state transition function f and produces an output y ∈ O using the output function g.
The state transition and output values for different inputs on different states may be represented using a state table. The state table for the FSM in Figure 14.21 is shown in Figure 14.22. Each pair against an input symbol represents the new state and the output symbol.
For example, Figures 14.22(a) and 14.22(b) are two alternate representations of the FSM in Figure 14.21.
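The transition function f and output function g listed for the FSM of Figure 14.21 can be exercised with a short simulation. The sketch below copies the values of f and g from the text into two lookup tables; the function name run_fsm and the sample input strings are assumptions made for illustration.

```python
# State transition function f of the FSM in Figure 14.21: (state, input) -> next state.
F = {
    ("S0", "0"): "S2", ("S0", "1"): "S1",
    ("S1", "0"): "S2", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S0",
}

# Output function g of the same FSM: (state, input) -> output symbol.
G = {
    ("S0", "0"): "3", ("S0", "1"): "2",
    ("S1", "0"): "3", ("S1", "1"): "2",
    ("S2", "0"): "2", ("S2", "1"): "3",
}


def run_fsm(inputs, start="S0"):
    """Feed the input symbols one at a time; return (final state, output string)."""
    state = start
    outputs = []
    for symbol in inputs:
        outputs.append(G[(state, symbol)])  # output produced on this step
        state = F[(state, symbol)]          # transition to the new state
    return state, "".join(outputs)


print(run_fsm("1"))     # ('S1', '2')
print(run_fsm("0110"))  # ('S2', '3323')
```

With S1 as the final state, the input "1" ends in an accept state while "0110" does not.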

Figure 14.21. Example of an FSM

For example, Figure 14.21 illustrates an FSM with S0 as the start state and S1 as the final state. Here, S = {S0, S1, S2}; I = {0, 1}; O = {2, 3}; f(S0, 0) = S2, f(S0, 1) = S1, f(S1, 0) = S2, f(S1, 1) = S2, f(S2, 0) = S2, f(S2, 1) = S0; g(S0, 0) = 3, g(S0, 1) = 2, g(S1, 0) = 3, g(S1, 1) = 2, g(S2, 0) = 2, g(S2, 1) = 3.

Current State   Input 0   Input 1
S0              S2        S1
S1              S2        S2
S2              S2        S0

(a)

                Output              State
Current State   Input 0   Input 1   Input 0   Input 1
S0              3         2         S2        S1
S1              3         2         S2        S2
S2              2         3         S2        S0

(b)

Figure 14.22. Tabular Representation of an FSM

8. Grammars
[1*, c13]

The grammar of a natural language tells us whether a combination of words makes a valid sentence. Unlike natural languages, a formal language is specified by a well-defined set of rules for syntaxes. The valid sentences of a formal language can be described by a grammar with the help of these rules, referred to as production rules.
A formal language is a set of finite-length words or strings over some finite alphabet, and a grammar specifies the rules for formation of these words or strings. The entire set of words that are valid for a grammar constitutes the language for the grammar. Thus, the grammar G is any compact, precise mathematical definition of a language L as opposed to just a raw listing of all of the language’s legal sentences or examples of those sentences.
A grammar implies an algorithm that would generate all legal sentences of the language.
There are different types of grammars.
A phrase-structure or Type-0 grammar G = (V, T, S, P) is a 4-tuple in which:

• V is the vocabulary, i.e., set of words.
• T ⊆ V is a set of words called terminals.
• S ∈ N is a special word called the start symbol.
• P is the set of production rules for substituting one sentence fragment for another.

There exists another set N = V − T of words called nonterminals. The nonterminals represent concepts like noun. Production rules are applied on strings containing nonterminals until no more nonterminal symbols are present in the string. The start symbol S is a nonterminal.
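The idea that a grammar implies an algorithm generating all legal sentences can be sketched as a search over sentence fragments. The example below is an illustration under stated assumptions: every symbol is a single character, the small grammar with productions S → aA, S → b, A → aa is used as sample input, and the function name generate and the length bound max_len (added only to guarantee termination) are hypothetical.

```python
from collections import deque


def generate(productions, start, nonterminals, max_len=10):
    """Enumerate the terminal strings derivable from `start`.

    `productions` is a list of (lhs, rhs) string pairs. Fragments longer
    than `max_len` symbols are not explored (an assumption of this sketch).
    """
    language = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(symbol in nonterminals for symbol in form):
            language.add(form)  # no nonterminal left: a word of the language
            continue
        for lhs, rhs in productions:
            idx = form.find(lhs)
            while idx != -1:    # apply the rule at every position where lhs occurs
                new_form = form[:idx] + rhs + form[idx + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                idx = form.find(lhs, idx + 1)
    return language


productions = [("S", "aA"), ("S", "b"), ("A", "aa")]
print(sorted(generate(productions, "S", {"S", "A"})))  # ['aaa', 'b']
```

Rules are applied until no nonterminal remains, so for this sample grammar only the two words aaa and b are produced.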

The language generated by a formal grammar G, denoted by L(G), is the set of all strings over the set of alphabets V that can be generated, starting with the start symbol, by applying production rules until all the nonterminal symbols are replaced in the string.
For example, let G = ({S, A, a, b}, {a, b}, S, {S → aA, S → b, A → aa}). Here, the set of nonterminals is N = {S, A}, where S is the start symbol. The three production rules for the grammar are given as P1: S → aA; P2: S → b; P3: A → aa.
Applying the production rules in all possible ways, the following words may be generated from the start symbol.

S → aA (using P1 on start symbol)
  → aaa (using P3)
S → b (using P2 on start symbol)

Nothing else can be derived for G. Thus, the language of the grammar G consists of only two words: L(G) = {aaa, b}.

8.1. Language Recognition

Formal grammars can be classified according to the types of productions that are allowed. The Chomsky hierarchy (introduced by Noam Chomsky in 1956) describes such a classification scheme.

Figure 14.23. Chomsky Hierarchy of Grammars

As illustrated in Figure 14.23, we infer the following on different types of grammars:

1. Every regular grammar is a context-free grammar (CFG).
2. Every CFG is a context-sensitive grammar (CSG).
3. Every CSG is a phrase-structure grammar (PSG).

Context-Sensitive Grammar: All fragments in the RHS are either longer than the corresponding fragments in the LHS or empty, i.e., if β → α, then |β| < |α| or α = ∅.
A formal language is context-sensitive if a context-sensitive grammar generates it.
Context-Free Grammar: All fragments in the LHS are of length 1, i.e., if A → α, then |A| = 1 for all A ∈ N.
The term context-free derives from the fact that A can always be replaced by α, regardless of the context in which it occurs.
A formal language is context-free if a context-free grammar generates it. Context-free languages are the theoretical basis for the syntax of most programming languages.
Regular Grammar: All fragments in the RHS are either single terminals or a pair built by a terminal and a nonterminal; i.e., if A → α, then either α ∈ T, or α = cD, or α = Dc for c ∈ T, D ∈ N.
If α = cD, then the grammar is called a right linear grammar. On the other hand, if α = Dc, then the grammar is called a left linear grammar. Both the right linear and left linear grammars are regular or Type-3 grammars.
The language L(G) generated by a regular grammar G is called a regular language.
A regular expression A is a string (or pattern) formed from the following six pieces of information: a ∈ Σ, the set of alphabets, ε, 0, and the operations OR (+), PRODUCT (.), CONCATENATION (*). The language of A, L(A), is equal to all those strings that match A, i.e., L(A) = {x ∈ Σ* | x matches A}.

For any a ∈ Σ, L(a) = {a}; L(ε) = {ε}; L(0) = ∅.
+ functions as an or, L(A + B) = L(A) ∪ L(B).
. creates a product structure, L(AB) = L(A) . L(B).
* denotes repeated concatenation, L(A*) = {x1x2…xn | xi ∈ L(A) and n ≥ 0}

For example, the regular expression (ab)* matches the set of strings: {ε, ab, abab, ababab, abababab, …}.
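The (ab)* example can be checked mechanically. The sketch below uses Python's standard re module, which is an assumption of this illustration rather than part of the notation above; re.fullmatch requires the whole string to match, and the empty string "" plays the role of ε.

```python
import re

# The regular expression (ab)* in Python's re syntax.
PATTERN = re.compile(r"(ab)*")


def matches(s):
    """True if the whole string s belongs to L((ab)*)."""
    return PATTERN.fullmatch(s) is not None


for s in ["", "ab", "abab", "aba", "ba"]:
    print(repr(s), matches(s))
# '' True
# 'ab' True
# 'abab' True
# 'aba' False
# 'ba' False
```

Note that the OR operation written as + in the text is written as | in Python's re syntax, where + has a different meaning (one or more repetitions).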

For example, the regular expression (aa)* matches the set of strings on one letter a that have even length.
For example, the regular expression (aaa)* + (aaaaa)* matches the set of strings of length equal to a multiple of 3 or 5.

9. Numerical Precision, Accuracy, and Errors
[2*, c2]

The main goal of numerical analysis is to develop efficient algorithms for computing precise numerical values of functions, solutions of algebraic and differential equations, optimization problems, etc.
In fact, all digital computers can only store finite numbers. In other words, there is no way that a computer can represent an infinitely large number, be it an integer, rational number, or any real or all complex numbers (see section 10, Number Theory). So the mathematics of approximation becomes very critical to handle all the numbers in the finite range that a computer can handle.
Each number in a computer is assigned a location or word, consisting of a specified number of binary digits or bits. A k bit word can store a total of N = 2^k different numbers.
For example, a computer that uses 32 bit arithmetic can store a total of N = 2^32 ≈ 4.3 × 10^9 different numbers, while another one that uses 64 bits can handle N' = 2^64 ≈ 1.84 × 10^19 different numbers. The question is how to distribute these N numbers over the real line for maximum efficiency and accuracy in practical computations.
One evident choice is to distribute them evenly, leading to fixed-point arithmetic. In this system, the first bit in a word is used to represent a sign and the remaining bits are treated for integer values. This allows representation of the integers from 1 − ½N = 1 − 2^(k−1) to ½N = 2^(k−1). As an approximating method, this is not good for noninteger numbers.
Another option is to space the numbers closely together, say with a uniform gap of 2^(−n), and so distribute the total N numbers uniformly over the interval −2^(−n−1)N < x ≤ 2^(−n−1)N. Real numbers lying between the gaps are represented by either rounding (meaning the closest exact representative) or chopping (meaning the exact representative immediately below the number, or above it if negative).
Numbers lying beyond the range must be represented by the largest (or largest negative) number that can be represented. This becomes a symbol for overflow. Overflow occurs when a computation produces a value larger than the maximum value in the range.
When processing speed is a significant bottleneck, the use of the fixed-point representations is an attractive and faster alternative to the more cumbersome floating-point arithmetic most commonly used in practice.
Let’s define a couple of very important terms: accuracy and precision as associated with numerical analysis.
Accuracy is the closeness with which a measured or computed value agrees with the true value.
Precision, on the other hand, is the closeness with which two or more measured or computed values for the same physical substance agree with each other. In other words, precision is the closeness with which a number represents an exact value.
Let x be a real number and let x* be an approximation. The absolute error in the approximation x* ≈ x is defined as |x* − x|. The relative error is defined as the ratio of the absolute error to the size of x, i.e., |x* − x| / |x|, which assumes x ≠ 0; otherwise, relative error is not defined.
For example, 1000000 is an approximation to 1000001 with an absolute error of 1 and a relative error of about 10^(−6), while 10 is an approximation of 11 with an absolute error of 1 and a relative error of about 0.1. Typically, relative error is more intuitive and the preferred determiner of the size of the error. The present convention is that errors are always ≥ 0, and are = 0 if and only if the approximation is exact.
An approximation x* has k significant decimal digits if its relative error is < 5 × 10^(−k−1). This means that the first k digits of x* following its first nonzero digit are the same as those of x.
Significant digits are the digits of a number that are known to be correct. In a measurement, one uncertain digit is included.
For example, measurement of length with a ruler of 15.5 mm with ±0.5 mm maximum

allowable error has 2 significant digits, whereas a measurement of the same length using a caliper and recorded as 15.47 mm with ±0.01 mm maximum allowable error has 3 significant digits.

10. Number Theory
[1*, c4]

Number theory is one of the oldest branches of pure mathematics and one of the largest. Of course, it concerns questions about numbers, usually meaning whole numbers and fractional or rational numbers. The different types of numbers include integer, real number, natural number, complex number, rational number, etc.

10.1. Divisibility

Let’s start this section with a brief description of each of the above types of numbers, starting with the natural numbers.
Natural Numbers. This group of numbers starts at 1 and continues: 1, 2, 3, 4, 5, and so on. Zero is not in this group. There are no negative or fractional numbers in the group of natural numbers. The common mathematical symbol for the set of all natural numbers is N.
Whole Numbers. This group has all of the natural numbers in it plus the number 0.
Unfortunately, not everyone accepts the above definitions of natural and whole numbers. There seems to be no general agreement about whether to include 0 in the set of natural numbers.
Many mathematicians consider that, in Europe, the sequence of natural numbers traditionally started with 1 (0 was not even considered to be a number by the Greeks). In the 19th century, set theoreticians and other mathematicians started the convention of including 0 in the set of natural numbers.
Integers. This group has all the whole numbers in it and their negatives. The common mathematical symbol for the set of all integers is Z, i.e., Z = {…, −3, −2, −1, 0, 1, 2, 3, …}.
Rational Numbers. These are any numbers that can be expressed as a ratio of two integers. The common symbol for the set of all rational numbers is Q.
Rational numbers may be classified into three types, based on how the decimals act. The decimals either do not exist, e.g., 15, or, when decimals do exist, they may terminate, as in 15.6, or they may repeat with a pattern, as in 1.666... (which is 5/3).
Irrational Numbers. These are numbers that cannot be expressed as an integer divided by an integer. These numbers have decimals that never terminate and never repeat with a pattern, e.g., π or √2.
Real Numbers. This group is made up of all the rational and irrational numbers. The numbers that are encountered when studying algebra are real numbers. The common mathematical symbol for the set of all real numbers is R.
Imaginary Numbers. These are all based on the imaginary number i. This imaginary number is equal to the square root of −1. Any real number multiple of i is an imaginary number, e.g., i, 5i, 3.2i, −2.6i, etc.
Complex Numbers. A complex number is a combination of a real number and an imaginary number in the form a + bi. The real part is a, and b is called the imaginary part. The common mathematical symbol for the set of all complex numbers is C.
For example, 2 + 3i, 3 − 5i, 7.3 + 0i, and 0 + 5i are complex numbers. Consider the last two examples:
7.3 + 0i is the same as the real number 7.3. Thus, all real numbers are complex numbers with zero for the imaginary part.
Similarly, 0 + 5i is just the imaginary number 5i. Thus, all imaginary numbers are complex numbers with zero for the real part.
Elementary number theory involves divisibility among integers. Let a, b ∈ Z with a ≠ 0. The expression a|b, read as "a divides b", holds if ∃c ∈ Z: b = ac, i.e., there is an integer c such that c times a equals b.
For example, 3|−12 is true, but 3|7 is false.
If a divides b, then we say that a is a factor of b or a is a divisor of b, and b is a multiple of a. b is even if and only if 2|b.
Let a, d ∈ Z with d > 1. Then a mod d denotes the remainder r from the division algorithm with dividend a and divisor d, i.e., the remainder when a is divided by d. We can compute (a mod d) by: a − d * ⌊a/d⌋, where ⌊a/d⌋ represents the floor of the real number a/d.
Let Z+ = {n ∈ Z | n > 0} and a, b ∈ Z, m ∈ Z+, then a is congruent to b modulo m, written as a ≡ b (mod m), if and only if m | a−b.
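The definitions of divisibility, a mod d, and congruence above map directly onto integer arithmetic. The following is a small sketch in which the function names divides, mod, and congruent are chosen only for illustration; it relies on Python's // operator, which performs floor division on integers.

```python
def divides(a, b):
    """a | b: there is an integer c with b = a * c (requires a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return b % a == 0


def mod(a, d):
    """a mod d computed as a - d * floor(a / d), for a positive divisor d."""
    return a - d * (a // d)


def congruent(a, b, m):
    """a ≡ b (mod m) if and only if m | (a - b)."""
    return divides(m, a - b)


print(divides(3, -12), divides(3, 7))            # True False
print(mod(17, 5), mod(-7, 3))                    # 2 2
print(congruent(17, 2, 5), congruent(17, 3, 5))  # True False
```

Because the floor is used, the remainder for a negative dividend is still nonnegative: −7 = 3 * (−3) + 2.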

Alternately, a is congruent to b modulo m if and 11.1. Group


only if (a−b) mod m = 0.
A set S closed under a binary operation • forms a
10.2. Prime Number, GCD group if the binary operation satisfies the follow-
ing four criteria:
An integer p > 1 is prime if and only if it is not
the product of any two integers greater than 1, i.e., • Associative: ∀a, b, c ∈ S, the equation (a • b)
p is prime if p > 1 ∧ ∃ ¬ a, b ∈ N: a > 1, b > 1, a • c = a • (b • c) holds.
* b = p. • Identity: There exists an identity element I ∈
The only positive factors of a prime p are 1 and S such that for all a ∈ S, I • a = a • I = a.
p itself. For example, the numbers 2, 13, 29, 61, • Inverse: Every element a ∈ S, has an inverse
etc. are prime numbers. Nonprime integers a' ∈ S with respect to the binary operation,
greater than 1 are called composite numbers. A i.e., a • a' = I; for example, the set of integers
composite number may be composed by multi- Z with respect to the addition operation is a
plying two integers greater than 1. group. The identity element of the set is 0 for
There are many interesting applications of the addition operation. ∀x ∈ Z, the inverse
prime numbers; among them are the public- key of x would be –x, which is also included in Z.
cryptography scheme, which involves the • Closure property: ∀a, b ∈ S, the result of the
exchange of public keys containing the product operation a • b ∈ S.
p*q of two random large primes p and q (a private • A group that is commutative, i.e., a • b = b • a,
key) that must be kept secret by a given party. is known as a commutative or Abelian group.
The greatest common divisor gcd(a, b) of inte-
gers a, b is the greatest integer d that is a divisor The set of natural numbers N (with the opera-
both of a and of b, i.e., tion of addition) is not a group, since there is no
inverse for any x > 0 in the set of natural numbers.
d = gcd(a, b) for max(d: d|a ∧ d|b) Thus, the third rule (of inverse) for our operation
is violated. However, the set of natural number
For example, gcd(24, 36) = 12. has some structure.
Integers a and b are called relatively prime or Sets with an associative operation (the first
coprime if and only if their GCD is 1. condition above) are called semigroups; if they
For example, neither 35 nor 6 are prime, but also have an identity element (the second condi-
they are coprime as these two numbers have no tion), then they are called monoids.
common factors greater than 1, so their GCD is 1. Our set of natural numbers under addition is
A set of integers X = {i1, i2, …} is relatively then an example of a monoid, a structure that is
prime if all possible pairs ih, ik, h ≠ k drawn from not quite a group because it is missing the
the set X are relatively prime. requirement that every element have an inverse
under the operation.
11. Algebraic Structures A monoid is a set S that is closed under a single
associative binary operation • and has an identity
This section introduces a few representations element I ∈ S such that for all a ∈ S, I • a = a • I
used in higher algebra. An algebraic structure = a. A monoid must contain at least one element.
consists of one or two sets closed under some For example, the set of natural numbers N forms
operations and satisfying a number of axioms, a commutative monoid under addition with
including none. identity element 0. The same set of natural num-
For example, group, monoid, ring, and lattice bers N also forms a monoid under multiplication
are examples of algebraic structures. Each of with identity element 1. The set of positive inte-
these is defined in this section. gers P forms a commutative monoid under multi-
plication with identity element 1.
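The monoid axioms can be spot-checked on a finite sample of natural numbers. The sketch below is only an illustration under stated assumptions: the helper name `looks_like_monoid` is hypothetical, the sample is assumed to include 0, and a finite check can refute the axioms but cannot prove them (or closure) for the infinite set N.

```python
import operator
from itertools import product


def looks_like_monoid(sample, op, identity) -> bool:
    """Spot-check the identity law and associativity on a finite sample."""
    # Identity law: I • a = a • I = a
    if any(op(identity, a) != a or op(a, identity) != a for a in sample):
        return False
    # Associativity: (a • b) • c = a • (b • c)
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(sample, repeat=3))


naturals = range(0, 8)  # small sample of N, assumed to include 0
print(looks_like_monoid(naturals, operator.add, 0))  # True
print(looks_like_monoid(naturals, operator.mul, 1))  # True
print(looks_like_monoid(naturals, operator.sub, 0))  # False
```

Subtraction fails because 0 is not a two-sided identity for it (0 − a ≠ a for a > 0), and it is not associative either.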
It may be noted that, unlike those in a group,
elements of a monoid need not have inverses. A
monoid can also be thought of as a semigroup with an identity element.

A subgroup is a group H contained within a bigger one, G, such that the identity element of G is contained in H, and whenever h1 and h2 are in H, then so are h1 • h2 and h1^(−1). Thus, the elements of H, equipped with the group operation on G restricted to H, indeed form a group.

Given any subset S of a group G, the subgroup generated by S consists of products of elements of S and their inverses. It is the smallest subgroup of G containing S.

For example, let G be the Abelian group whose elements are G = {0, 1, 2, 3, 4, 5, 6, 7} and whose group operation is addition modulo 8. This group has a pair of nontrivial subgroups: J = {0, 4} and H = {0, 2, 4, 6}, where J is also a subgroup of H.

In group theory, a cyclic group is a group that can be generated by a single element, in the sense that the group has an element a (called the generator of the group) such that, when written multiplicatively, every element of the group is a power of a.

A group G is cyclic if G = {a^n for any integer n}. Since any group generated by an element in a group is a subgroup of that group, showing that the only subgroup of a group G that contains a is G itself suffices to show that G is cyclic.

For example, the group G = {0, 1, 2, 3, 4, 5, 6, 7}, with respect to the addition modulo 8 operation, is cyclic. The subgroups J = {0, 4} and H = {0, 2, 4, 6} are also cyclic.

11.2. Rings

If we take an Abelian group and define a second operation on it, a new structure is found that is different from just a group. If this second operation is associative and is distributive over the first, then we have a ring.

A ring is a triple of the form (S, +, •), where (S, +) is an Abelian group, (S, •) is a semigroup, and • is distributive over +; i.e., ∀ a, b, c ∈ S, the equation a • (b + c) = (a • b) + (a • c) holds. Further, if • is commutative, then the ring is said to be commutative. If there is an identity element for the • operation, then the ring is said to have an identity.

For example, (Z, +, *), i.e., the set of integers Z, with the usual addition and multiplication operations, is a ring. As (Z, *) is commutative, this ring is a commutative or Abelian ring. The ring has 1 as its identity element.

Let's note that the second operation may not have an identity element, nor do we need to find an inverse for every element with respect to this second operation. As for what distributive means, intuitively it is what we do in elementary mathematics when performing the following change: a * (b + c) = (a * b) + (a * c).

A field is a ring for which the elements of the set, excluding 0, form an Abelian group with the second operation.

A simple example of a field is the field of rational numbers (Q, +, *) with the usual addition and multiplication operations. Its elements are the numbers of the form a/b ∈ Q, where a, b are integers and b ≠ 0. The additive inverse of such a fraction is simply −a/b, and the multiplicative inverse is b/a provided that a ≠ 0.
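The subgroup and cyclic-group examples for addition modulo 8 can be checked mechanically. The sketch below is a minimal illustration; the names `is_subgroup` and `generated_by` are hypothetical helpers introduced here, not notation from the text.

```python
MODULUS = 8
G = set(range(MODULUS))  # {0, 1, ..., 7} under addition modulo 8


def add(a: int, b: int) -> int:
    """Group operation: addition modulo 8."""
    return (a + b) % MODULUS


def is_subgroup(h: set) -> bool:
    """Check that h is a subset of G containing 0, closed, with inverses."""
    if 0 not in h or not h <= G:
        return False
    closed = all(add(a, b) in h for a in h for b in h)
    has_inverses = all((-a) % MODULUS in h for a in h)
    return closed and has_inverses


def generated_by(a: int) -> set:
    """Subgroup generated by a: all multiples of a modulo 8."""
    elements, x = {0}, a % MODULUS
    while x not in elements:
        elements.add(x)
        x = add(x, a)
    return elements


print(is_subgroup({0, 4}))          # True
print(is_subgroup({0, 2, 4, 6}))    # True
print(is_subgroup({0, 1, 2}))       # False, because 1 + 2 = 3 is missing
print(sorted(generated_by(4)))      # [0, 4]
print(sorted(generated_by(2)))      # [0, 2, 4, 6]
print(generated_by(1) == G)         # True, so 1 generates G and G is cyclic
```

Each of J and H is the set generated by a single element (4 and 2, respectively), which is what makes them cyclic.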
MATRIX OF TOPICS VS. REFERENCE MATERIAL

Topic                                          Rosen 2011 [1*]   Cheney and Kincaid 2007 [2*]
1. Sets, Relations, Functions                  c2
2. Basic Logic                                 c1
3. Proof Techniques                            c1
4. Basic Counting                              c6
5. Graphs and Trees                            c10, c11
6. Discrete Probability                        c7
7. Finite State Machines                       c13
8. Grammars                                    c13
9. Numerical Precision, Accuracy, and Errors                     c2
10. Number Theory                              c4
11. Algebraic Structures
REFERENCES

[1*] K. Rosen, Discrete Mathematics and Its Applications, 7th ed., McGraw-Hill, 2011.

[2*] E.W. Cheney and D.R. Kincaid, Numerical Mathematics and Computing, 6th ed., Brooks/Cole, 2007.

ACKNOWLEDGMENTS

The author thankfully acknowledges the contribution of Prof. Arun Kumar Chatterjee, Ex-Head, Department of Mathematics, Manipur University, India, and Prof. Devadatta Sinha, Ex-Head, Department of Computer Science and Engineering, University of Calcutta, India, in preparing this chapter on Mathematical Foundations.

CHAPTER 15

ENGINEERING FOUNDATIONS

ACRONYMS

CAD    Computer-Aided Design
CMMI   Capability Maturity Model Integration
pdf    Probability Density Function
pmf    Probability Mass Function
RCA    Root Cause Analysis
SDLC   Software Development Life Cycle

INTRODUCTION

IEEE defines engineering as “the application of a systematic, disciplined, quantifiable approach to structures, machines, products, systems or processes” [1]. This chapter outlines some of the engineering foundational skills and techniques that are useful for a software engineer. The focus is on topics that support other KAs while minimizing duplication of subjects covered elsewhere in this document.

As the theory and practice of software engineering mature, it is increasingly apparent that software engineering is an engineering discipline that is based on knowledge and skills common to all engineering disciplines. This Engineering Foundations knowledge area (KA) is concerned with the engineering foundations that apply to software engineering and other engineering disciplines. Topics in this KA include empirical methods and experimental techniques; statistical analysis; measurement; engineering design; modeling, prototyping, and simulation; standards; and root cause analysis. Application of this knowledge, as appropriate, will allow software engineers to develop and maintain software more efficiently and effectively. Completing their engineering work efficiently and effectively is a goal of all engineers in all engineering disciplines.

BREAKDOWN OF TOPICS FOR ENGINEERING FOUNDATIONS

The breakdown of topics for the Engineering Foundations KA is shown in Figure 15.1.

1. Empirical Methods and Experimental Techniques
[2*, c1]

An engineering method for problem solving involves proposing solutions or models of solutions and then conducting experiments or tests to study the proposed solutions or models. Thus, engineers must understand how to create an experiment and then analyze the results of the experiment in order to evaluate the proposed solution. Empirical methods and experimental techniques help the engineer to describe and understand variability in their observations, to identify the sources of variability, and to make decisions.

Three different types of empirical studies commonly used in engineering efforts are designed experiments, observational studies, and retrospective studies. Brief descriptions of the commonly used methods are given below.

1.1. Designed Experiment

A designed or controlled experiment is an investigation of a testable hypothesis where one or more independent variables are manipulated to measure their effect on one or more dependent variables. A precondition for conducting an experiment is the existence of a clear hypothesis. It is important for an engineer to understand how to formulate clear hypotheses.

Figure 15.1. Breakdown of Topics for the Engineering Foundations KA

Designed experiments allow engineers to determine in precise terms how the variables are related and, specifically, whether a cause-effect relationship exists between them. Each combination of values of the independent variables is a treatment. The simplest experiments have just two treatments representing two levels of a single independent variable (e.g., using a tool vs. not using a tool). More complex experimental designs arise when more than two levels, more than one independent variable, or any dependent variables are used.

1.2. Observational Study

An observational or case study is an empirical inquiry that makes observations of processes or phenomena within a real-life context. While an experiment deliberately ignores context, an observational or case study includes context as part of the observation. A case study is most useful when the focus of the study is on how and why questions, when the behavior of those involved in the study cannot be manipulated, and when contextual conditions are relevant and the boundaries between the phenomena and context are not clear.

1.3. Retrospective Study

A retrospective study involves the analysis of historical data. Retrospective studies are also known as historical studies. This type of study uses data (regarding some phenomenon) that has been archived over time. This archived data is then analyzed in an attempt to find a relationship between variables, to predict future events, or to identify trends. The quality of the analysis results will depend on the quality of the information contained in the archived data. Historical data may be incomplete, inconsistently measured, or incorrect.

2. Statistical Analysis
[2*, c9s1, c2s1] [3*, c10s3]

In order to carry out their responsibilities, engineers must understand how different product and process characteristics vary. Engineers often come across situations where the relationship between different variables needs to be studied. An important point to note is that most of the studies are carried out on the basis of samples and so the observed results need to be understood with respect to the full population. Engineers must, therefore, develop an adequate understanding of statistical techniques for collecting reliable data in terms of sampling and analysis to arrive at results that can be generalized. These techniques are discussed below.

2.1. Unit of Analysis (Sampling Units), Population, and Sample

Unit of analysis. While carrying out any empirical study, observations need to be made on chosen units called the units of analysis or sampling units. The unit of analysis must be identified and must be appropriate for the analysis. For example, when a software product company wants to find the perceived usability of a software product, the user or the software function may be the unit of analysis.

Population. The set of all respondents or items (possible sampling units) to be studied forms the population. As an example, consider the case of studying the perceived usability of a software product. In this case, the set of all possible users forms the population.

While defining the population, care must be exercised to understand the study and target population. There are cases when the population studied and the population for which the
results are being generalized may be different. For example, when the study population consists of only past observations and generalizations are required for the future, the study population and the target population may not be the same.

Sample. A sample is a subset of the population. The most crucial issue in the selection of a sample is its representativeness, including size. The samples must be drawn in a manner so as to ensure that the draws are independent, and the rules of drawing the samples must be predefined so that the probability of selecting a particular sampling unit is known beforehand. This method of selecting samples is called probability sampling.

Random variable. In statistical terminology, the process of making observations or measurements on the sampling units being studied is referred to as conducting the experiment. For example, if the experiment is to toss a coin 10 times and then count the number of times the coin lands on heads, each set of 10 tosses of the coin is a sampling unit and the number of heads for a given sample is the observation or outcome for the experiment. The outcome of an experiment is obtained in terms of real numbers and defines the random variable being studied. Thus, the attribute of the items being measured at the outcome of the experiment represents the random variable being studied; the observation obtained from a particular sampling unit is a particular realization of the random variable. In the example of the coin toss, the random variable is the number of heads observed for each experiment. In statistical studies, attempts are made to understand population characteristics on the basis of samples.

The set of possible values of a random variable may be finite or infinite but countable (e.g., the set of all integers or the set of all odd numbers). In such a case, the random variable is called a discrete random variable. In other cases, the random variable under consideration may take values on a continuous scale and is called a continuous random variable.

Event. A subset of possible values of a random variable is called an event. Suppose X denotes some random variable; then, for example, we may define different events such as X ≥ x or X < x and so on.

Distribution of a random variable. The range and pattern of variation of a random variable is given by its distribution. When the distribution of a random variable is known, it is possible to compute the chance of any event. Some distributions are found to occur commonly and are used to model many random variables occurring in practice in the context of engineering. A few of the more commonly occurring distributions are given below.

• Binomial distribution: used to model random variables that count the number of successes in n trials carried out independently of each other, where each trial results in success or failure. We make an assumption that the chance of obtaining a success remains constant [2*, c3s6].
• Poisson distribution: used to model the count of occurrence of some event over time or space [2*, c3s9].
• Normal distribution: used to model continuous random variables, or discrete random variables that take a very large number of values [2*, c4s6].

Concept of parameters. A statistical distribution is characterized by some parameters. For example, for a fixed number of trials, the proportion of success in any given trial is the only parameter characterizing a binomial distribution. Similarly, the Poisson distribution is characterized by a rate of occurrence. A normal distribution is characterized by two parameters: namely, its mean and standard deviation.

Once the values of the parameters are known, the distribution of the random variable is completely known and the chance (probability) of any event can be computed. The probabilities for a discrete random variable can be computed through the probability mass function, called the pmf. The pmf is defined at discrete points and gives the point mass, i.e., the probability that the random variable will take that particular value. Likewise, for a continuous random variable, we have the probability density function, called the pdf. The pdf is very much like density and needs to be integrated over a range to obtain the probability that the continuous random variable lies between certain values. Thus, if the pdf
or pmf is known, the chances of the random variable taking a certain set of values may be computed theoretically.

Concept of estimation [2*, c6s2, c7s1, c7s3]. The true values of the parameters of a distribution are usually unknown and need to be estimated from the sample observations. The estimates are functions of the sample values and are called statistics. For example, the sample mean is a statistic and may be used to estimate the population mean. Similarly, the rate of occurrence of defects estimated from the sample (rate of defects per line of code) is a statistic and serves as the estimate of the population rate of defects per line of code. The statistic used to estimate some population parameter is often referred to as the estimator of the parameter.

A very important point to note is that the results of the estimators themselves are random. If we take a different sample, we are likely to get a different estimate of the population parameter. In the theory of estimation, we need to understand different properties of estimators, particularly how much the estimates can vary across samples and how to choose between different alternative ways to obtain the estimates. For example, if we wish to estimate the mean of a population, we might use as our estimator a sample mean, a sample median, a sample mode, or the midrange of the sample. Each of these estimators has different statistical properties that may impact the standard error of the estimate.

Types of estimates [2*, c7s3, c8s1]. There are two types of estimates: namely, point estimates and interval estimates. When we use the value of a statistic to estimate a population parameter, we get a point estimate. As the name indicates, a point estimate gives a point value of the parameter being estimated.

Although point estimates are often used, they leave room for many questions. For instance, we are not told anything about the possible size of error or statistical properties of the point estimate. Thus, we might need to supplement a point estimate with the sample size as well as the variance of the estimate. Alternatively, we might use an interval estimate. An interval estimate is a random interval with the lower and upper limits of the interval being functions of the sample observations as well as the sample size. The limits are computed on the basis of some assumptions regarding the sampling distribution of the point estimate on which the limits are based.

Properties of estimators. Various statistical properties of estimators are used to decide about the appropriateness of an estimator in a given situation. The most important properties are that an estimator is unbiased, efficient, and consistent with respect to the population.

Tests of hypotheses [2*, c9s1]. A hypothesis is a statement about the possible values of a parameter. For example, suppose it is claimed that a new method of software development reduces the occurrence of defects. In this case, the hypothesis is that the rate of occurrence of defects has been reduced. In tests of hypotheses, we decide, on the basis of sample observations, whether a proposed hypothesis should be accepted or rejected.

For testing hypotheses, the null and alternative hypotheses are formed. The null hypothesis is the hypothesis of no change and is denoted as H0. The alternative hypothesis is written as H1. It is important to note that the alternative hypothesis may be one-sided or two-sided. For example, if we have the null hypothesis that the population mean is not less than some given value, the alternative hypothesis would be that it is less than that value and we would have a one-sided test. However, if we have the null hypothesis that the population mean is equal to some given value, the alternative hypothesis would be that it is not equal and we would have a two-sided test (because the true value could be either less than or greater than the given value).

In order to test some hypothesis, we first compute some statistic. Along with the computation of the statistic, a region is defined such that in case the computed value of the statistic falls in that region, the null hypothesis is rejected. This region is called the critical region (also known as the rejection region). In tests of hypotheses, we need to accept or reject the null hypothesis on the basis of the evidence obtained. We note that, in general, the alternative hypothesis is the hypothesis of interest. If the computed value of the statistic does not fall inside the critical region, then we cannot reject the null hypothesis. This indicates that there is not enough evidence to believe that the alternative hypothesis is true.
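A one-sided test of the kind described above can be sketched with the binomial pmf. This is a minimal illustration under stated assumptions, not a method taken from the source: the figures (20 inspected modules, a historical defective proportion of 0.30, 2 defective modules observed) are hypothetical, as are the function names. The null hypothesis H0 is that the proportion is not less than 0.30; the alternative H1 is that it is less.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k): the pmf summed over 0..k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def critical_value(n: int, p0: float, alpha: float) -> int:
    """Largest c with P(X <= c) <= alpha under H0 (returns -1 if none)."""
    c = -1
    while c + 1 <= n and lower_tail(c + 1, n, p0) <= alpha:
        c += 1
    return c


# Hypothetical numbers, for illustration only.
n, p0, alpha = 20, 0.30, 0.05  # sample size, H0 proportion, type I error bound
observed = 2                   # defective modules seen with the new method

c = critical_value(n, p0, alpha)
p_value = lower_tail(observed, n, p0)
print(f"critical region: X <= {c}")                       # critical region: X <= 2
print(f"P(X <= {observed} | H0) = {p_value:.4f}")         # P(X <= 2 | H0) = 0.0355
print("reject H0" if observed <= c else "cannot reject H0")  # reject H0
```

Here the test statistic is the count of defective modules, and the critical region is chosen so that the probability of rejecting a true H0 stays within alpha.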
As the decision is being taken on the basis of sample observations, errors are possible; the types of such errors are summarized in the following table.

                 Statistical Decision
Nature           Accept H0                         Reject H0
H0 is true       OK                                Type I error (probability = α)
H0 is false      Type II error (probability = β)   OK

In tests of hypotheses, we aim at maximizing the power of the test (the value of 1−β) while ensuring that the probability of a type I error (the value of α) is maintained within a particular value, typically 5 percent.

It is to be noted that construction of a test of hypothesis includes identifying statistic(s) to estimate the parameter(s) and defining a critical region such that if the computed value of the statistic falls in the critical region, the null hypothesis is rejected.

2.2. Concepts of Correlation and Regression
[2*, c11s2, c11s8]

A major objective of many statistical investigations is to establish relationships that make it possible to predict one or more variables in terms of others. Although it is desirable to predict a quantity exactly in terms of another quantity, it is seldom possible and, in many cases, we have to be satisfied with estimating the average or expected values.

The relationship between two variables is studied using the methods of correlation and regression. Both these concepts are explained briefly in the following paragraphs.

Correlation. The strength of linear relationship between two variables is measured using the correlation coefficient. While computing the correlation coefficient between two variables, we assume that these variables measure two different attributes of the same entity. The correlation coefficient takes a value between −1 and +1. The values −1 and +1 indicate a situation when the association between the variables is perfect, i.e., given the value of one variable, the other can be estimated with no error. A positive correlation coefficient indicates a positive relationship; that is, if one variable increases, so does the other. On the other hand, when the variables are negatively correlated, an increase in one is accompanied by a decrease in the other.

It is important to remember that correlation does not imply causation. Thus, if two variables are correlated, we cannot conclude that one causes the other.

Regression. The correlation analysis only measures the degree of relationship between two variables. The analysis to find the relationship between two variables is called regression analysis. The strength of the relationship between two variables is measured using the coefficient of determination. This is a value between 0 and 1. The closer the coefficient is to 1, the stronger the relationship between the variables. A value of 1 indicates a perfect relationship.

3. Measurement
[4*, c3s1, c3s2] [5*, c4s4] [6*, c7s5] [7*, p442–447]

Knowing what to measure and which measurement method to use is critical in engineering endeavors. It is important that everyone involved in an engineering project understand the measurement methods and the measurement results that will be used.

Measurements can be physical, environmental, economic, operational, or some other sort of measurement that is meaningful for the particular project. This section explores the theory of measurement and how it is fundamental to engineering. Measurement starts as a conceptualization, then moves from abstract concepts to definitions of the measurement method to the actual application of that method to obtain a measurement result. Each of these steps must be understood, communicated, and properly employed in order to generate usable data. In traditional engineering, direct measures are often used. In software engineering, a combination of both direct and derived measures is necessary [6*, p273].

The theory of measurement states that measurement is an attempt to describe an underlying
real empirical system. Measurement methods define activities that allocate a value or a symbol to an attribute of an entity.

Attributes must then be defined in terms of the operations used to identify and measure them, that is, the measurement methods. In this approach, a measurement method is defined to be a precisely specified operation that yields a number (called the measurement result) when measuring an attribute. It follows that, to be useful, the measurement method has to be well defined. Arbitrariness in the method will reflect itself in ambiguity in the measurement results.

In some cases, particularly in the physical world, the attributes that we wish to measure are easy to grasp; however, in an artificial world like software engineering, defining the attributes may not be that simple. For example, the attributes of height, weight, distance, etc. are easily and uniformly understood (though they may not be very easy to measure in all circumstances), whereas attributes such as software size or complexity require clear definitions.

Operational definitions. The definition of attributes, to start with, is often rather abstract. Such definitions do not facilitate measurements. For example, we may define a circle as a line forming a closed loop such that the distance between any point on this line and a fixed interior point called the center is constant. We may further say that the fixed distance from the center to any point on the closed loop gives the radius of the circle. It may be noted that though the concept has been defined, no means of measuring the radius has been proposed. The operational definition specifies the exact steps or method used to carry out a specific measurement. This can also be called the measurement method; sometimes a measurement procedure may be required to be even more precise.

The importance of operational definitions can hardly be overstated. Take the case of the apparently simple measurement of height of individuals. Unless we specify various factors like the time when the height will be measured (it is known that the height of individuals varies across various time points of the day), how the variability due to hair would be taken care of, whether the measurement will be with or without shoes, what kind of accuracy is expected (correct up to an inch, 1/2 inch, centimeter, etc.), even this simple measurement will lead to substantial variation. Engineers must appreciate the need to define measures from an operational perspective.

3.1. Levels (Scales) of Measurement
[4*, c3s2] [6*, c7s5]

Once the operational definitions are determined, the actual measurements need to be undertaken. It is to be noted that measurement may be carried out in four different scales: namely, nominal, ordinal, interval, and ratio. Brief descriptions of each are given below.

Nominal scale: This is the lowest level of measurement and represents the most unrestricted assignment of numerals. The numerals serve only as labels, and words or letters would serve as well. The nominal scale of measurement involves only classification and the observed sampling units are put into any one of the mutually exclusive and collectively exhaustive categories (classes). Some examples of nominal scales are:

• Job titles in a company
• The software development life cycle (SDLC) model (like waterfall, iterative, agile, etc.) followed by different software projects

In nominal scale, the names of the different categories are just labels and no relationship between them is assumed. The only operations that can be carried out on nominal scale are counting the number of occurrences in the different classes and determining if two occurrences have the same nominal value. However, statistical analyses may be carried out to understand how entities belonging to different classes perform with respect to some other response variable.

Ordinal scale: Refers to the measurement scale where the different values obtained through the process of measurement have an implicit ordering. The intervals between values are not specified and there is no objectively defined zero element. Typical examples of measurements in ordinal scales are:

• Skill levels (low, medium, high)
• Capability Maturity Model Integration (CMMI) maturity levels of software development organizations
• Level of adherence to process as measured in measured in interval scale, as it is not neces-


a 5-point scale of excellent, above average, sary to define what zero intelligence would
average, below average, and poor, indicating mean.
the range from total adherence to no adher-
ence at all If a variable is measured in interval scale, most
of the usual statistical analyses like mean, stan-
Measurement in ordinal scale satisfies the tran- dard deviation, correlation, and regression may be
sitivity property in the sense that if A > B and B carried out on the measured values.
> C, then A > C. However, arithmetic operations Ratio scale: These are quite commonly encoun-
cannot be carried out on variables measured in tered in physical science. These scales of mea-
ordinal scales. Thus, if we measure customer sat- sures are characterized by the fact that operations
isfaction on a 5-point ordinal scale of 5 implying exist for determining all 4 relations: equality, rank
a very high level of satisfaction and 1 implying a order, equality of intervals, and equality of ratios.
very high level of dissatisfaction, we cannot say Once such a scale is available, its numerical val-
that a score of four is twice as good as a score of ues can be transformed from one unit to another
two. So, it is better to use terminology such as by just multiplying by a constant, e.g., conversion
excellent, above average, average, below aver- of inches to feet or centimeters. When measure-
age, and poor than ordinal numbers in order to ments are being made in ratio scale, existence of
avoid the error of treating an ordinal scale as a ratio scale. It is important to note that ordinal scale measures are commonly misused and that such misuse can lead to erroneous conclusions [6*, p274]. A common misuse of ordinal scale measures is to present a mean and standard deviation for the data set, both of which are meaningless. However, we can find the median, as computation of the median involves counting only.

Interval scales: With the interval scale, we come to a form that is quantitative in the ordinary sense of the word. Almost all the usual statistical measures are applicable here, unless they require knowledge of a true zero point. The zero point on an interval scale is a matter of convention. Ratios do not make sense, but the difference between levels of attributes can be computed and is meaningful. Some examples of interval scale measurement follow:

• Measurement of temperature in different scales, such as Celsius and Fahrenheit. Suppose T1 and T2 are temperatures measured in some scale. We note that the fact that T1 is twice T2 does not mean that one object is twice as hot as another. We also note that the zero points are arbitrary.
• Calendar dates. While the difference between dates to measure the time elapsed is a meaningful concept, the ratio does not make sense.
• Many psychological measurements aspire to create interval scales. Intelligence is often

a nonarbitrary zero is mandatory. All statistical measures are applicable to the ratio scale; the use of logarithms is valid only when these scales are used, as in the case of decibels. Some examples of ratio measures are

• the number of statements in a software program
• temperature measured on the Kelvin (K) scale, whose zero point is not arbitrary (unlike the Celsius and Fahrenheit scales, which are interval scales).

An additional measurement scale, the absolute scale, is a ratio scale with uniqueness of the measure; i.e., a measure for which no transformation is possible (for example, the number of programmers working on a project).

3.2. Direct and Derived Measures
[6*, c7s5]

Measures may be either direct or derived (sometimes called indirect measures). An example of a direct measure would be a count of how many times an event occurred, such as the number of defects found in a software product. A derived measure is one that combines direct measures in some way that is consistent with the measurement method. An example of a derived measure would be calculating the productivity of a team as the number of lines of code developed per developer-month. In both cases, the measurement method determines how to make the measurement.
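The distinctions between ordinal, interval, and ratio scales described in section 3.1 can be illustrated with a minimal Python sketch. The severity labels, their ordering, and the temperature readings are made-up illustration data, not values from the Guide. The sketch shows that the median of ordinal data needs only ranking and counting, and that the ratio of two temperature readings changes with the unit on interval scales (Celsius, Fahrenheit).

```python
# Hypothetical ordinal data: defect severities ranked low < medium < high < critical.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]
observed = ["low", "high", "medium", "critical", "medium", "low", "medium"]


def ordinal_median(values, order):
    """Median of ordinal data: sort by rank, then count to the middle item."""
    ranked = sorted(values, key=order.index)
    # Lower median, so the result is always one of the observed categories.
    return ranked[(len(ranked) - 1) // 2]


def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert Celsius to Kelvin (a ratio scale with a nonarbitrary zero)."""
    return celsius + 273.15


median_severity = ordinal_median(observed, SEVERITY_ORDER)

# The same two physical temperatures, expressed in three units.
t1_c, t2_c = 20.0, 10.0
ratio_celsius = t1_c / t2_c                      # depends on the Celsius zero
ratio_fahrenheit = c_to_f(t1_c) / c_to_f(t2_c)   # a different number, same objects
ratio_kelvin = c_to_k(t1_c) / c_to_k(t2_c)       # ratio taken from a true zero

# Differences, by contrast, agree across interval scales up to the unit factor.
diff_celsius = t1_c - t2_c
diff_fahrenheit = c_to_f(t1_c) - c_to_f(t2_c)    # equals diff_celsius * 9 / 5
```

Because the Celsius and Fahrenheit ratios disagree for the same pair of objects, neither can be read as "twice as hot"; only the difference (after unit conversion) and the Kelvin ratio are stable.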
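The productivity example from section 3.2 can be sketched in a few lines of Python. The `TeamPeriod` record, the `productivity` function, and the sample numbers are hypothetical names and values chosen for illustration; the sketch only shows how a derived measure is computed from direct measures (counts).

```python
from dataclasses import dataclass


@dataclass
class TeamPeriod:
    lines_of_code: int   # direct measure: a count of lines developed
    developers: int      # direct measure: a count of people
    months: float        # direct measure: elapsed time in months


def productivity(period: TeamPeriod) -> float:
    """Derived measure: lines of code developed per developer-month."""
    developer_months = period.developers * period.months
    if developer_months <= 0:
        raise ValueError("developer-months must be positive")
    return period.lines_of_code / developer_months


# Illustration data only: 12,000 lines by 4 developers over 3 months.
sample = TeamPeriod(lines_of_code=12_000, developers=4, months=3)
loc_per_developer_month = productivity(sample)
```

The measurement method still has to define what counts as a line of code and as a developer-month; the formula only combines the direct measures once those definitions are fixed.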

3.3. Reliability and Validity
[4*, c3s4, c3s5]

A basic question to be asked for any measurement method is whether the proposed measurement method is truly measuring the concept with good quality. Reliability and validity are the two most important criteria to address this question.

The reliability of a measurement method is the extent to which the application of the measurement method yields consistent measurement results. Essentially, reliability refers to the consistency of the values obtained when the same item is measured a number of times. When the results agree with each other, the measurement method is said to be reliable. Reliability usually depends on the operational definition. It can be quantified by using the index of variation, which is computed as the ratio between the standard deviation and the mean. The smaller the index, the more reliable the measurement results.

Validity refers to whether the measurement method really measures what we intend to measure. Validity of a measurement method may be looked at from three different perspectives: namely, construct validity, criterion validity, and content validity.

3.4. Assessing Reliability
[4*, c3s5]

There are several methods for assessing reliability; these include the test-retest method, the alternative form method, the split-halves method, and the internal consistency method. The easiest of these is the test-retest method. In the test-retest method, we simply apply the measurement method to the same subjects twice. The correlation coefficient between the first and second set of measurement results gives the reliability of the measurement method.

4. Engineering Design
[5*, c1s2, c1s3, c1s4]

A product’s life cycle costs are largely influenced by the design of the product. This is true for manufactured products as well as for software products. The design of a software product is guided by the features to be included and the quality attributes to be provided. It is important to note that software engineers use the term “design” within their own context; while there are some commonalities, there are also many differences between engineering design as discussed in this section and software engineering design as discussed in the Software Design KA. The scope of engineering design is generally viewed as much broader than that of software design. The primary aim of this section is to identify the concepts needed to develop a clear understanding regarding the process of engineering design.

Many disciplines engage in problem solving activities where there is a single correct solution. In engineering, most problems have many solutions and the focus is on finding a feasible solution (among the many alternatives) that best meets the needs presented. The set of possible solutions is often constrained by explicitly imposed limitations such as cost, available resources, and the state of discipline or domain knowledge. In engineering problems, sometimes there are also implicit constraints (such as the physical properties of materials or laws of physics) that also restrict the set of feasible solutions for a given problem.

4.1. Engineering Design in Engineering Education

The importance of engineering design in engineering education can be clearly seen by the high expectations held by various accreditation bodies for engineering education. Both the Canadian Engineering Accreditation Board and the Accreditation Board for Engineering and Technology (ABET) note the importance of including engineering design in education programs.

The Canadian Engineering Accreditation Board includes requirements for the amount of engineering design experience/coursework that is necessary for engineering students as well as qualifications for the faculty members who teach such coursework or supervise design projects. Their accreditation criteria state:

    Design: An ability to design solutions for complex, open-ended engineering problems and to design systems, components or processes that meet specified needs with appropriate attention to health and safety risks, applicable standards, and economic, environmental, cultural and societal considerations. [8, p12]

In a similar manner, ABET defines engineering design as

    the process of devising a system, component, or process to meet desired needs. It is a decision-making process (often iterative), in which the basic sciences, mathematics, and the engineering sciences are applied to convert resources optimally to meet these stated needs. [9, p4]

Thus, it is clear that engineering design is a vital component in the training and education for all engineers. The remainder of this section will focus on various aspects of engineering design.

4.2. Design as a Problem Solving Activity
[5*, c1s4, c2s1, c3s3]

Engineering design is primarily a problem solving activity. Design problems are open ended and often vaguely defined. There are usually several alternative ways to solve the same problem. Design is generally considered to be a wicked problem, a term first coined by Horst Rittel in the 1960s when design methods were a subject of intense interest. Rittel sought an alternative to the linear, step-by-step model of the design process being explored by many designers and design theorists and argued that most of the problems addressed by the designers are wicked problems. As explained by Steve McConnell, a wicked problem is one that could be clearly defined only by solving it or by solving part of it. This paradox implies, essentially, that a wicked problem has to be solved once in order to define it clearly and then solved again to create a solution that works. This has been an important insight for software designers for several decades [10*, c5s1].

4.3. Steps Involved in Engineering Design
[7*, c4]

Engineering problem solving begins when a need is recognized and no existing solution will meet that need. As part of this problem solving, the design goals to be achieved by the solution should be identified. Additionally, a set of acceptance criteria must be defined and used to determine how well a proposed solution will satisfy the need. Once a need for a solution to a problem has been identified, the process of engineering design has the following generic steps:

a) define the problem
b) gather pertinent information
c) generate multiple solutions
d) analyze and select a solution
e) implement the solution

All of the engineering design steps are iterative, and knowledge gained at any step in the process may be used to inform earlier tasks and trigger an iteration in the process. These steps are expanded in the subsequent sections.

a. Define the problem. At this stage, the customer’s requirements are gathered. Specific information about product functions and features is also closely examined. This step includes refining the problem statement to identify the real problem to be solved and setting the design goals and criteria for success.

The problem definition is a crucial stage in engineering design. A point to note is that this step is deceptively simple. Thus, enough care must be taken to carry out this step judiciously. It is important to identify needs and link the success criteria with the required product characteristics. It is also an engineering task to limit the scope of a problem and its solution through negotiation among the stakeholders.

b. Gather pertinent information. At this stage, the designer attempts to expand his/her knowledge about the problem. This is a vital, yet often neglected, stage. Gathering pertinent information can reveal facts leading to a redefinition of the

problem; in particular, mistakes and false starts may be identified. This step may also involve the decomposition of the problem into smaller, more easily solved subproblems.

While gathering pertinent information, care must be taken to identify how a product may be used as well as misused. It is also important to understand the perceived value of the product/service being offered. Included in the pertinent information is a list of constraints that must be satisfied by the solution or that may limit the set of feasible solutions.

c. Generate multiple solutions. During this stage, different solutions to the same problem are developed. It has already been stated that design problems have multiple solutions. The goal of this step is to conceptualize multiple possible solutions and refine them to a sufficient level of detail that a comparison can be done among them.

d. Analyze and select a solution. Once alternative solutions have been identified, they need to be analyzed to identify the solution that best suits the current situation. The analysis includes a functional analysis to assess whether the proposed design would meet the functional requirements. Physical solutions that involve human users often include analysis of the ergonomics or user friendliness of the proposed solution. Other aspects of the solution (such as product safety and liability, an economic or market analysis to ensure a return (profit) on the solution, performance predictions and analysis to meet quality characteristics, opportunities for incorrect data input or hardware malfunctions, and so on) may be studied. The types and amount of analysis used on a proposed solution are dependent on the type of problem and the needs that the solution must address as well as the constraints imposed on the design.

e. Implement the solution. The final phase of the design process is implementation. Implementation refers to development and testing of the proposed solution. Sometimes a preliminary, partial solution called a prototype may be developed initially to test the proposed design solution under certain conditions. Feedback resulting from testing a prototype may be used either to refine the design or drive the selection of an alternative design solution. One of the most important activities in design is documentation of the design solution as well as of the tradeoffs for the choices made in the design of the solution. This work should be carried out in a manner such that the solution to the design problem can be communicated clearly to others.

The testing and verification take us back to the success criteria. The engineer needs to devise tests such that the ability of the design to meet the success criteria is demonstrated. While designing the tests, the engineer must think through different possible failure modes and then design tests based on those failure modes. The engineer may choose to carry out designed experiments to assess the validity of the design.

5. Modeling, Simulation, and Prototyping
[5*, c6] [11*, c13s3] [12*, c2s3.1]

Modeling is part of the abstraction process used to represent some aspects of a system. Simulation uses a model of the system and provides a means of conducting designed experiments with that model to better understand the system, its behavior, and relationships between subsystems, as well as to analyze aspects of the design. Modeling and simulation are techniques that can be used to construct theories or hypotheses about the behavior of the system; engineers then use those theories to make predictions about the system. Prototyping is another abstraction process where a partial representation (that captures aspects of interest) of the product or system is built. A prototype may be an initial version of the system but lacks the full functionality of the final version.

5.1. Modeling

A model is always an abstraction of some real or imagined artifact. Engineers use models in many ways as part of their problem solving activities. Some models are physical, such as a made-to-scale miniature construction of a bridge or building. Other models may be nonphysical representations, such as a CAD drawing of a cog or a mathematical model for a process. Models help engineers reason and understand aspects of

a problem. They can also help engineers understand what they do know and what they don’t know about the problem at hand.

There are three types of models: iconic, analogic, and symbolic. An iconic model is a visually equivalent but incomplete 2-dimensional or 3-dimensional representation (for example, maps, globes, or built-to-scale models of structures such as bridges or highways). An iconic model actually resembles the artifact modeled.

In contrast, an analogic model is a functionally equivalent but incomplete representation. That is, the model behaves like the physical artifact even though it may not physically resemble it. Examples of analogic models include a miniature airplane for wind tunnel testing or a computer simulation of a manufacturing process.

Finally, a symbolic model is a higher level of abstraction, where the model is represented using symbols such as equations. The model captures the relevant aspects of the process or system in symbolic form. The symbols can then be used to increase the engineer’s understanding of the final system. An example is an equation such as F = ma. Such mathematical models can be used to describe and predict properties or behavior of the final system or product.

5.2. Simulation

All simulation models are a specification of reality. A central issue in simulation is to abstract and specify an appropriate simplification of reality. Developing this abstraction is of vital importance, as misspecification of the abstraction would invalidate the results of the simulation exercise. Simulation can be used for a variety of testing purposes.

Simulation is classified based on the type of system under study. Thus, simulation can be either continuous or discrete. In the context of software engineering, the emphasis will be primarily on discrete simulation. Discrete simulations may model event scheduling or process interaction. The main components in such a model include entities, activities and events, resources, the state of the system, a simulation clock, and a random number generator. Output is generated by the simulation and must be analyzed.

An important problem in the development of a discrete simulation is that of initialization. Before a simulation can be run, the initial values of all the state variables must be provided. As the simulation designer may not know what initial values are appropriate for the state variables, these values might be chosen somewhat arbitrarily. For instance, it might be decided that a queue should be initialized as empty and idle. Such a choice of initial condition can have a significant but unrecognized impact on the outcome of the simulation.

5.3. Prototyping

Constructing a prototype of a system is another abstraction process. In this case, an initial version of the system is constructed, often while the system is being designed. This helps the designers determine the feasibility of their design.

There are many uses for a prototype, including the elicitation of requirements, the design and refinement of a user interface to the system, validation of functional requirements, and so on. The objectives and purposes for building the prototype will determine its construction and the level of abstraction used.

The role of prototyping is somewhat different between physical systems and software. With physical systems, the prototype may actually be the first fully functional version of a system or it may be a model of the system. In software engineering, prototypes are also an abstract model of part of the software but are usually not constructed with all of the architectural, performance, and other quality characteristics expected in the finished product. In either case, prototype construction must have a clear purpose and be planned, monitored, and controlled; it is a technique to study a specific problem within a limited context [6*, c2s8].

In conclusion, modeling, simulation, and prototyping are powerful techniques for studying the behavior of a system from a given perspective. All can be used to perform designed experiments to study various aspects of the system. However, these are abstractions and, as such, may not model all attributes of interest.
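The components of a discrete simulation listed in section 5.2, and the initialization problem in particular, can be made concrete with a small event-scheduling sketch in Python. This is a minimal illustration under stated assumptions, not a method prescribed by the Guide: the single-server queue, the exponential arrival and service times, and every name and parameter value (`simulate_queue`, `initial_queue`, `horizon`, the means, the seed) are hypothetical choices.

```python
import heapq
import random


def simulate_queue(initial_queue=0, horizon=1_000.0, mean_interarrival=1.0,
                   mean_service=0.8, seed=42):
    """Simulate one server with a waiting line; return the time-averaged
    number of waiting entities over the simulated horizon."""
    rng = random.Random(seed)     # random number generator
    clock = 0.0                   # simulation clock
    waiting = initial_queue       # state: entities waiting for the resource
    busy = False                  # state: whether the server (resource) is in use
    # Future event list, ordered by time: (event time, event kind).
    events = [(rng.expovariate(1 / mean_interarrival), "arrival")]
    if waiting > 0:               # a non-empty initial queue starts service at once
        waiting -= 1
        busy = True
        heapq.heappush(events, (rng.expovariate(1 / mean_service), "departure"))
    area = 0.0                    # integral of the queue length over time

    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        area += waiting * (time - clock)   # accumulate output before changing state
        clock = time
        if kind == "arrival":
            next_arrival = clock + rng.expovariate(1 / mean_interarrival)
            heapq.heappush(events, (next_arrival, "arrival"))
            if busy:
                waiting += 1
            else:
                busy = True
                heapq.heappush(
                    events, (clock + rng.expovariate(1 / mean_service), "departure"))
        else:  # departure: start the next service or free the server
            if waiting > 0:
                waiting -= 1
                heapq.heappush(
                    events, (clock + rng.expovariate(1 / mean_service), "departure"))
            else:
                busy = False

    area += waiting * (horizon - clock)    # tail between the last event and the horizon
    return area / horizon


# Same model, two arbitrary initial conditions, short horizon.
empty_start = simulate_queue(initial_queue=0, horizon=100.0)
loaded_start = simulate_queue(initial_queue=50, horizon=100.0)
```

Over a short horizon the two runs report very different average queue lengths although the model is identical, which is the unrecognized impact of initialization the text warns about. Running longer, or discarding an initial warm-up period before collecting output, are common ways to reduce that influence.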

6. Standards
[5*, c9s3.2] [13*, c1s2]

Moore states that a

    standard can be: (a) an object or measure of comparison that defines or represents the magnitude of a unit; (b) a characterization that establishes allowable tolerances for categories of items; and (c) a degree or level of required excellence or attainment. Standards are definitional in nature, established either to further understanding and interaction or to acknowledge observed (or desired) norms of exhibited characteristics or behavior. [13*, p8]

Standards provide requirements, specifications, guidelines, or characteristics that must be observed by engineers so that the products, processes, and materials have acceptable levels of quality. The qualities that various standards provide may be those of safety, reliability, or other product characteristics. Standards are considered critical to engineers and engineers are expected to be familiar with and to use the appropriate standards in their discipline.

Compliance or conformance to a standard lets an organization say to the public that they (or their products) meet the requirements stated in that standard. Thus, standards divide organizations or their products into those that conform to the standard and those that do not. For a standard to be useful, conformance with the standard must add value, real or perceived, to the product, process, or effort.

Apart from the organizational goals, standards are used for a number of other purposes such as protecting the buyer, protecting the business, and better defining the methods and procedures to be followed by the practice. Standards also provide users with a common terminology and expectations.

There are many internationally recognized standards-making organizations including the International Telecommunication Union (ITU), the International Electrotechnical Commission (IEC), IEEE, and the International Organization for Standardization (ISO). In addition, there are regional and governmentally recognized organizations that generate standards for that region or country. For example, in the United States, there are over 300 organizations that develop standards. These include organizations such as the American National Standards Institute (ANSI), the American Society for Testing and Materials (ASTM), the Society of Automotive Engineers (SAE), and Underwriters Laboratories, Inc. (UL), as well as the US government. For more detail on standards used in software engineering, see Appendix B on standards.

There is a set of commonly used principles behind standards. Standards makers attempt to have consensus around their decisions. There is usually an openness within the community of interest so that once a standard has been set, there is a good chance that it will be widely accepted. Most standards organizations have well-defined processes for their efforts and adhere to those processes carefully. Engineers must be aware of the existing standards but must also update their understanding of the standards as those standards change over time.

In many engineering endeavors, knowing and understanding the applicable standards is critical and the law may even require use of particular standards. In these cases, the standards often represent minimal requirements that must be met by the endeavor and thus are an element in the constraints imposed on any design effort. The engineer must review all current standards related to a given endeavor and determine which must be met. Their designs must then incorporate any and all constraints imposed by the applicable standard. Standards important to software engineers are discussed in more detail in an appendix specifically on this subject.

7. Root Cause Analysis
[4*, c5, c3s7, c9s8] [5*, c9s3, c9s4, c9s5] [13*, c13s3.4.5]

Root cause analysis (RCA) is a process designed to investigate and identify why and how an undesirable event has happened. Root causes are underlying causes. The investigator should attempt to identify specific underlying causes of the event that has occurred. The primary objective

of RCA is to prevent recurrence of the undesirable event. Thus, the more specific the investigator can be about why an event occurred, the easier it will be to prevent recurrence. A common way to identify specific underlying cause(s) is to ask a series of why questions.

7.1. Techniques for Conducting Root Cause Analysis
[4*, c5] [5*, c3]

There are many approaches used for both quality control and root cause analysis. The first step in any root cause analysis effort is to identify the real problem. Techniques such as statement-restatement, why-why diagrams, the revision method, present state and desired state diagrams, and the fresh-eye approach are used to identify and refine the real problem that needs to be addressed.

Once the real problem has been identified, then work can begin to determine the cause of the problem. Ishikawa is known for the seven tools for quality control that he promoted. Some of those tools are helpful in identifying the causes for a given problem. Those tools are check sheets or checklists, Pareto diagrams, histograms, run charts, scatter diagrams, control charts, and fishbone or cause-and-effect diagrams. More recently, other approaches for quality improvement and root cause analysis have emerged. Some examples of these newer methods are affinity diagrams, relations diagrams, tree diagrams, matrix charts, matrix data analysis charts, process decision program charts, and arrow diagrams. A few of these techniques are briefly described below.

A fishbone or cause-and-effect diagram is a way to visualize the various factors that affect some characteristic. The main line in the diagram represents the problem and the connecting lines represent the factors that led to or influenced the problem. Those factors are broken down into subfactors and sub-subfactors until root causes can be identified.

A very simple approach that is useful in quality control is the use of a checklist. A checklist is a list of key points in a process with tasks that must be completed. As each task is completed, it is checked off the list. If a problem occurs, then sometimes the checklist can quickly identify tasks that may have been skipped or only partially completed.

Finally, relations diagrams are a means for displaying complex relationships. They give visual support to cause-and-effect thinking. The diagram relates the specific to the general, revealing key causes and key effects.

Root cause analysis aims at preventing the recurrence of undesirable events. Reduction of variation due to common causes requires utilization of a number of techniques. An important point to note is that these techniques should be used offline and not necessarily in direct response to the occurrence of some undesirable event. Some of the techniques that may be used to reduce variation due to common causes are given below.

1. Cause-and-effect diagrams may be used to identify the sub and sub-sub causes.
2. Fault tree analysis is a technique that may be used to understand the sources of failures.
3. Designed experiments may be used to understand the impact of various causes on the occurrence of undesirable events (see Empirical Methods and Experimental Techniques in this KA).
4. Various kinds of correlation analyses may be used to understand the relationship between various causes and their impact. These techniques may be used in cases when conducting controlled experiments is difficult but data may be gathered (see Statistical Analysis in this KA).
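The breakdown of a fishbone (cause-and-effect) diagram into subfactors and sub-subfactors can be represented as a tree whose leaves are the candidate root causes. The following Python sketch is one possible representation, assuming a hypothetical `Cause` record and an invented example problem; each path from the problem to a leaf reads like a chain of answers to successive why questions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    subcauses: list[Cause] = field(default_factory=list)


def root_cause_chains(node: Cause, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return one chain per leaf: problem -> factor -> subfactor -> root cause."""
    trail = path + (node.description,)
    if not node.subcauses:          # a leaf cannot be broken down further
        return [trail]
    chains = []
    for sub in node.subcauses:      # descend into every subfactor
        chains.extend(root_cause_chains(sub, trail))
    return chains


# Invented example diagram: the main line is the problem, branches are factors.
problem = Cause("Release delayed", [
    Cause("Testing", [
        Cause("Integration tests fail intermittently", [
            Cause("Shared test database is not reset between runs"),
        ]),
    ]),
    Cause("Requirements", [
        Cause("Change requests accepted after design freeze"),
    ]),
])

chains = root_cause_chains(problem)
candidate_root_causes = [chain[-1] for chain in chains]
```

Whether a leaf really is a root cause remains a judgment for the investigator; the tree only records how far the why questions have been pursued.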
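As an illustration of the correlation analyses mentioned in the list of techniques for section 7.1, the Pearson correlation coefficient between a suspected cause and its observed impact can be computed directly from gathered data. The sketch below is a plain Python version under stated assumptions: the function name `pearson_r` and the review-hours and defect counts are hypothetical illustration data, not results from the Guide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariation / math.sqrt(variation_x * variation_y)


# Invented observations per module: hours of code review and defects found later.
review_hours = [1, 2, 3, 4, 5]
later_defects = [9, 7, 6, 4, 2]

r = pearson_r(review_hours, later_defects)  # strongly negative for this data
```

A coefficient close to -1 or +1 indicates a strong linear relationship in the gathered data; it is a starting point for investigation where controlled experiments are difficult, not a substitute for them.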

MATRIX OF TOPICS VS. REFERENCE MATERIAL

Reference columns:
[2*] Montgomery and Runger 2007
[3*] Null and Lobur 2006
[4*] Kan 2002
[5*] Voland 2003
[6*] Fairley 2009
[7*] Tockey 2004
[10*] McConnell 2004
[11*] Cheney and Kincaid 2007
[12*] Sommerville 2011
[13*] Moore 2006

1. Empirical Methods and Experimental Techniques: c1
1.1. Designed Experiment
1.2. Observational Study
1.3. Retrospective Study
2. Statistical Analysis: c9s1, c2s1; c10s3
2.1. Concept of Unit of Analysis (Sampling Units), Sample, and Population: c3s6, c3s9, c4s6, c6s2, c7s1, c7s3, c8s1, c9s1
2.2. Concepts of Correlation and Regression: c11s2, c11s8
3. Measurement: c3s1, c3s2; c4s4; c7s5
3.1. Levels (Scales) of Measurement: c3s2; c7s5; p442–447
3.2. Direct and Derived Measures
3.3. Reliability and Validity: [4*] c3s4, c3s5
3.4. Assessing Reliability: [4*] c3s5
4. Engineering Design: [5*] c1s2, c1s3, c1s4
4.1. Design in Engineering Education
4.2. Design as a Problem Solving Activity: [5*] c1s4, c2s1, c3s3; [10*] c5s1
4.3. Steps Involved in Engineering Design: [7*] c4
5. Modeling, Prototyping, and Simulation: [5*] c6; [11*] c13s3; [12*] c2s3.1
5.1. Modeling
5.2. Simulation
5.3. Prototyping
6. Standards: [5*] c9s3.2; [13*] c1s2
7. Root Cause Analysis: [4*] c5, c3s7, c9s8; [5*] c9s3, c9s4, c9s5; [13*] c13s3.4.5
7.1. Techniques for Conducting Root Cause Analysis: [4*] c5; [5*] c3

FURTHER READINGS
A. Abran, Software Metrics and Software Metrology. [14]

This book provides very good information on the proper use of the terms measure, measurement method and measurement outcome. It provides strong support material for the entire section on Measurement.

W.G. Vincenti, What Engineers Know and How They Know It. [15]

This book provides an interesting introduction to engineering foundations through a series of case studies that show many of the foundational concepts as used in real world engineering applications.