CBDT3103
Introduction to Distributed System
Member: Dr Vanessa S C Ng
OUHK
Copyright © The Open University of Hong Kong and Open University Malaysia,
February 2011, CBDT3103
All rights reserved. No part of this work may be reproduced in any form or by any means
without the written permission of the President, Open University Malaysia (OUM).
INTRODUCTION
Welcome to CBDT3103 Introduction to Distributed System. This course is a one-semester, three-credit, undergraduate-level course for OUM students seeking a Bachelor's degree in Information Technology with Network Computing.
The assignments and tests in this module will help you master the topics over the course of one semester.
Think of your study module as reading the lecture instead of hearing it from a lecturer. Basically, in the open distance mode of education, the module replaces your live lecture notes. However, the module still requires you to think for yourself and to practise key skills. In the same way that a lecturer in a
conventional full-time mode of study might give you an in-class exercise, your
study module will have activities for you to do at appropriate points. You will
also find self-test questions in each unit. These activities and self-tests give you
practice in the skills that you need to achieve the objectives of the course and to
complete assignments and pass the final examination. You are also strongly
advised to discuss with your tutors, during the tutorial sessions, the difficult
points or topics you may encounter in the module.
COURSE OBJECTIVES
By the end of this course, you should be able to:
• Differentiate between networks and distributed systems.
• Explain the role of a network in a distributed system.
• Outline the challenges of designing and implementing distributed systems.
• Describe the architectural models of distributed systems.
• Identify the fundamental models of distributed systems.
• Identify different network services.
• Outline the quality of services required in a network.
MODULE STRUCTURES
There are FIVE major topics in this module. A brief summary of the five major topics is given below.
The network software protocols you study include network services, quality of service (QoS), networking requirements for distributed systems, LAN (Local Area Network), and the devices used in internetworking (repeater, bridge and router).
• LAN: LAN (Local Area Network) is described in this topic, with two case studies of common configurations – Ethernet and Token Ring. Quite a large portion of this unit is devoted to studying LANs, since LANs support most distributed systems (which are then connected to a WAN).
PDU of the TCP layer) header, services, and multiplexing function. TCP is
followed by a brief discussion of UDP (User Datagram Protocol). Some simple
Internet applications, such as DNS (Domain Name System), email, Telnet and
FTP (File Transfer Protocol), are introduced briefly.
At the end of this unit is a summary followed by the solutions to the self-test
questions. The self-tests are spread throughout the topic at appropriate points so
that you can test your understanding of the material. You are advised to complete
the self-tests before checking the answers.
[Figure: a distributed system – Client A and Client B connected to Server 1, Server 2 and Server 3]
Topic 4 first introduces the concept of IPC and then deals with the related
concepts of marshalling and unmarshalling. A discussion of synchronization
follows, and two kinds of communication mode are introduced: synchronous and
asynchronous communications. Then you study some design and
So, you should immediately begin to sense that the problems of handling group communication are quite different from those of point-to-point communication, and we'll investigate the problems and suggest some solutions. Topic 5 will:
1. Define group communication along with its characteristics.
2. Define important concepts for group communication such as atomicity and
ordering.
3. Discuss the design and implementation issues of group communication.
Topic  Title                                                       No. of Hours  Assessment Activities
1      Introduction to the network and distributed system          10            Self-tests 1.1–1.5
2      Networking and internetworking                              13            Self-tests 2.1–2.6
3      Transmission Control Protocol/Internet Protocol (TCP/IP)    13            Self-tests 3.1–3.5
4      Interprocess communication (IPC) and
       remote procedure calls (RPC)                                12            Self-tests 4.1–4.4
5      Multicast group communication                               12            Self-tests 5.1–5.2
Textbook
George Coulouris, Jean Dollimore and Tim Kindberg (2005) Distributed Systems:
Concepts and Design, 4th edition, Reading, MA: Addison Wesley Longman.
Non-Print Media
OUM will also provide you with e-materials to support you in your learning.
These e-materials are available on the OUM portal, in particular on the OUM Learning Management System, known as myLMS. You are required to access this Learning Management System. The Faculty website is also available on the portal.
COURSE ASSESSMENT
Formal assessment consists of two components:
• Continuous assessment, which contributes 50% to your final mark
• Course examination, which contributes 50% to your final mark.
• Assignment
For this course you are required to do one or two assignments. The objectives of the assignment are:
– To provide a mechanism for you to check your progress and make sure that you have met certain learning objectives;
– To provide you with the chance to demonstrate your understanding of the
materials in the module.
– To provide an opportunity for you to apply what you have learned.
COURSE EXAMINATION
The course examination will contribute 50% of the final mark. The examination is divided into two parts: part one, which will be conducted at mid-term, and part two, which will be conducted at the end of the course. Each part contributes 25% of the total mark of 50%. Part one will examine the first few topics and part two will examine the last few topics of the module.
TUTORIALS
The course includes five tutorial meetings of two hours each. The tutorials are conducted to provide an opportunity for you to meet your tutors and discuss important or difficult points, topics or concepts in the module. In addition, you have an opportunity to discuss the self-tests with your tutors or share your study experiences and difficulties in your peer-to-peer group discussions. Although the
tutorials are not compulsory, you are encouraged to attend the tutorial meetings
as far as possible. It is strongly recommended that you attend all tutorials, as they
will provide considerable assistance in your study of this course. Moreover, you
will have the chance to meet with other distance learners who are taking the same
course.
GROUP PROJECT
Please do the group project if it is specified in the course. The group project provides you with the opportunity to show your ability to work in a group, namely to do group problem solving and to share and communicate your ideas with group members. You are required to use myVLE in this group project, i.e. to communicate and share your ideas with the group members.
LEARNING OUTCOMES
By the end of Topic 1, you should be able to:
1. Differentiate between networks and distributed systems;
2. Explain the role of a network in a distributed system;
3. Outline the challenges of designing and implementing distributed
systems;
4. Describe the architectural models of distributed systems; and
5. Identify the fundamental models of distributed systems.
INTRODUCTION
The latter part of the 20th century set the stage for the age of information, during which technologies for information gathering, processing and distribution were
developed. One of the most important technologies developed for information
management was the computer. Computers were introduced after the Second
World War. Within half a century, computer technology and the scope of its
applications have developed very fast, and computers are now as common as
cars and TV sets.
In the early years of their evolution, computers were large and expensive. Only governments and large organisations had them for computation. Since the mid-1980s, two important improvements have been introduced that radically changed the face of computers (and computing technologies).
1.1 NETWORKS
Networks, or computer networks, have been growing rapidly. They are now an
essential part of our computer systems. According to Tanenbaum (1995), a
network is an inter-connected collection of autonomous computers. Most
computers in your organisation – whether PCs or servers – are most probably
connected to a network. A computer network consists of a series of computers
that are connected together so that they can communicate with each other.
Computer networks can also share peripheral devices like printers.
1. A wide area network (WAN) spans the longest distance, such as a city, a
country or even all over the world. The most common example of a WAN is
the Internet. Almost all networked computers are connected to it.
To Share Resources
Another important goal of networking is to enable resource sharing. Some
devices (such as printers) are expensive so many users should be able to access
them. An obvious solution is to connect the devices to the network, and every
user who is connected to the network can share the devices. For example, many
users can share an expensive high-speed colour laser printer by connecting it to a
network with other computers. Sharing resources does not need to be limited to
physical (hardware) devices – data files and application software can also be
shared.
Transmission Media
Transmission media are used for the actual transmission (transportation) of data
or information. At the lowest level, computer networks encode data into a form
of signals, such as electromagnetic or optical signals, and send them through a
transmission medium. For example, copper wires are used to transmit data in the
form of electromagnetic waves from a sender to its corresponding receiver.
Each transmission medium has its own characteristics, such as bandwidth, delay,
cost and ease of installation and maintenance. You are introduced to different
transmission media in Topic 2.
Network Hardware
Let us consider which hardware should be included in a network. Figure 1.1
shows a simple high-level model of a network.
The sender generates a message and puts it into the network. The network
receives the message and then transfers it to the receiver. The receiver takes the
message out and gives it to its application program. Note that there may be many
small networks (called sub-networks) connected to each other to form a big
network.
users run application programs in those machines and thus, we can call them
hosts or host computers (they can also be called nodes, end-stations, machines or
end-users).
There are many ways in which nodes and links are inter-connected to form sub-
networks. Usually, we classify them into two types of transmission technology:
• Local Area Networks (LANs): These are small networks that are usually implemented for use in offices, buildings and campuses up to a few square kilometres in size. They are widely used to connect personal computers, workstations and devices in company offices to exchange information and share resources. The three common kinds of network topology for LANs are star, bus and ring. The size of a LAN is small (the coverage area should be within a few kilometres) so the transmission delay (the average time taken between sending and receiving a message) is short (less than 10 ms). The data-transmission rate (the average speed of transmitting a message from a sender to a receiver) is high (from 10⁷ to 10⁹ bits per second). We focus on LANs in our study, because LANs are usually built as distributed systems.
Some of the following businesses in Hong Kong are likely to have their own
MAN: Park'n Shop, Marks & Spencer, G2000, and so on. Each of these firms
has a chain of shops in Hong Kong, and each shop's computer network is
connected to a broader MAN system.
• Wide Area Networks (WANs): They span a large geographical area such as a country or a continent – or even the world. They connect many machines together (at least thousands) and usually the transmission delay is long (up to a few seconds) and the transmission rate is low (so far it may be up to 10⁶ bits per second, still lower than LANs). One of the most common examples of a WAN is the Internet.
• Wireless Networks (or Mobile Networks) are another type of network. They use a wireless transmission medium. Many users who have desktop machines on LANs and WANs want to work outside with their computers. This is impossible if their computers are wired; thus, there is a lot of interest in exploring the use of wireless networks. The terminals used in wireless networks are mobile computers or personal digital assistants (PDAs). Wireless networks are necessary, especially when the environment is difficult for cabling or when users are always "on the move".
SELF-TEST 1.1
1. Give three reasons for using computer networks. Briefly explain
each.
2. State the names of the different transmission media that you
know.
3. Suppose there are n computers and you want to have a
communication path between any two of them. This is to be
achieved by direct (point-to-point) connection with links only (no
switching nodes or routers). How many links are required? What
implication can you draw from your answer?
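For question 3, you can check your reasoning with a few lines of code. This is a Python sketch (not part of the module): a full mesh of n computers needs one direct link per pair of computers, i.e. n(n − 1)/2 links.

def full_mesh_links(n: int) -> int:
    # every pair of computers needs its own link: C(n, 2) = n(n - 1)/2
    return n * (n - 1) // 2

for n in (2, 5, 10, 100):
    print(n, "computers ->", full_mesh_links(n), "links")
# 2 -> 1, 5 -> 10, 10 -> 45, 100 -> 4950: the link count grows
# quadratically with n, which is why real networks use switching
# nodes instead of a full mesh of direct links.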
A layer is defined as a service provided for its upper layer. The number of layers and the functions of each one differ from network to network. Each layer is independent of the others, but there is a communication interface between two adjacent layers.
Consider two computers, host 1 and host 2, that communicate with each other.
Both have the same number of layers. Note that the number of layers in two
computers does not need to be the same, but when a sender wants to
communicate with a receiver, each host must have corresponding layers in its
system.
For example, for layer 2 of host 1 to communicate with host 2, host 2 must have
the corresponding (peer) layer 2 in its system. The set of rules governing the
message exchange between two machines in layer n is called n-peer protocol or
simply layer n protocol. The messages exchanged between these two layers are
called n-PDUs (Protocol Data Units of layer n). The format and the meaning of
the fields in n-PDUs are specified in the layer n protocol.
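As an illustration of this layering, here is a minimal Python sketch (not any real protocol stack): each layer prepends its own header to the PDU it receives from the layer above, and the receiver's peer layers strip the headers in reverse order. The byte-string header values are purely illustrative.

def encapsulate(message: bytes, headers: list[bytes]) -> bytes:
    # headers are listed from the highest layer down to the lowest;
    # each layer prepends its header, so the lowest layer's header
    # ends up outermost on the wire
    pdu = message
    for header in headers:
        pdu = header + pdu
    return pdu

def decapsulate(pdu: bytes, headers: list[bytes]) -> bytes:
    # the receiver strips headers starting from the lowest layer
    for header in reversed(headers):
        assert pdu.startswith(header), "malformed PDU"
        pdu = pdu[len(header):]
    return pdu

layers = [b"TCP|", b"IP|", b"ETH|"]      # transport, network, link
wire = encapsulate(b"hello", layers)
print(wire)                              # b'ETH|IP|TCP|hello'
print(decapsulate(wire, layers))         # b'hello'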
By repeating the above steps, the highest layer of host 2 will receive the original
message and then pass it to the corresponding application program. The whole
process, which sends messages from one side to the other, is called the network protocol.
Around the early 1980s, the International Organization for Standardization (ISO) proposed the Open Systems Interconnection (OSI) reference model. The model aimed to
standardise network components to allow multi-vendor development and
support. This OSI reference model was expected to become the dominant
standard in the computer network market. It is a layered reference model with
seven layers - the physical layer, data link layer, network layer, transport layer,
session layer, presentation layer, and application layer. They are all well defined,
well structured, and each layer has its own networking function(s).
However, the ISO OSI reference model is too complicated (too many layers). In
the 1970s, the United States (US) Department of Defense developed a research
network called ARPANET. Then, after further development, it became the
TCP/IP reference model (some texts refer to TCP/IP as a suite of network
protocols) and was released in the commercial market in the 1980s. Then it
quickly became the dominant model or standard in the computer networks
market, which is a major reason why the Internet developed so rapidly - there
was a common network protocol suite. The most important reason why the
TCP/IP reference model succeeded (over the ISO OSI reference model) was its
simplicity and ease of operation. The TCP/IP reference model, shown in Figure
1.3, has only four layers - host-to-network layer, network or IP (Internet Protocol)
layer, transport or TCP (Transmission Control Protocol) layer, and application
layer.
[Figure 1.3: The TCP/IP reference model – application, TCP (transport), IP (network) and host-to-network layers]
(a) To define a unique and well-defined IP address for each machine and
to define the format of its PDU (i.e. datagram); and
(b) To provide services to route a datagram from a sender to its
corresponding receiver through a network.
Because of the popularity of the TCP/IP reference model, you are given a
complete picture of it in Topic 3. In Topic 3, you learn the functions of IP
and TCP, and how they support the application layers. You also learn some
simple application services such as FTP and email services.
SELF-TEST 1.2
You might say that operating systems could be upgraded to support the remote
services, but it is always easier and cheaper to install additional software to
handle remote services than to upgrade the original ones. Therefore, distributed
systems software is the additional software to support remote services for a set of
computers connected by a computer network.
The key difference between the two systems is that, in a distributed system, the existence of multiple autonomous computers is transparent to the users: the whole system appears to the users as a single computer. Users can use the services provided by
the distributed system, input some data (parameters or files) to the system and
wait for the output from the system. Users do not need to know exactly how and
where the remote services are in the system.
For a network, users must explicitly log on to a machine, explicitly know what
the machine can do, explicitly submit data to the correct location, and explicitly
tell the machine how to return their results (e.g. give their own logical addresses
to the machine).
In fact, a distributed system is built on top of a network. Networks are just one of
the resources of distributed systems, and distributed systems use them to deliver
and receive data. For example, both distributed systems and networks support
file movement, but users in networks need to know the locations of the sender
and receiver, the network configuration and which network protocol is used,
whereas users in distributed systems do not need to know these things. In fact,
they should not know these details.
Consider the following example. A process requires ten hours for execution
by a high-speed computer. But in a distributed system, we can use ten cheap and slow CPUs, each with a speed ten times slower than the high-speed one, in parallel to finish the same process. Both may use the same amount of
time to finish the job, but the cost of 10 slow CPUs will be much lower than
that of the high-speed one.
READING
SELF-TEST 1.3
Note that all of the above characteristics should be considered, but they do not all need to be implemented at the same time. Sometimes, the characteristics that should be implemented depend on the nature of the application services provided
(or desired). Thus, we can say the above points bring challenges when building a
distributed system.
Resource Sharing
As mentioned, resource sharing is the most important characteristic or advantage
of a distributed system, and thus, all distributed systems should deal with this
issue. The term "resource" is abstract, since it can represent hardware (e.g.
printers, CPUs) or data (e.g. a shared database, shared executable files). To manage resources effectively, a program called a resource manager is required to
provide an interface between the resource and users. The resource manager
should provide the resource name, identify the resource location, map the
resource name to a communication address, and coordinate concurrent accesses
to ensure consistency.
Heterogeneity
Heterogeneity applies here to a variety of different hardware and software
components operating together in the different levels in a distributed system:
• Networks
• Computer hardware
• Operating systems
• Programming languages
• Implementations by different developers.
Openness
Openness is the characteristic that determines whether a distributed system can
be extended or expanded in various ways. For hardware, we should be
concerned about whether additional peripherals, memory or communication
interfaces can be put into the system or not. For software, additional operating
system functions, communication protocols and resource-sharing devices should
be able to join the system without any modification to the system.
Security
Many of the information resources are maintained in a distributed system for the
users to share. However, some critical resources should not be shared by
unauthorised users, but need to be protected. There are two kinds of protection.
Scalability
Distributed systems can operate effectively at many different scales. A system is
scalable if it remains stable when the number of users and the amount of
resources are increased significantly - in other words, adding users does not
adversely affect the way the system works. Usually there are three kinds of scale:
2. The middle and most common one is a distributed system within a LAN.
Hundreds of workstations and several file servers and printer servers might
be interconnected; and
Fault-Handling
Sometimes, distributed systems fail. Some output results might be incorrect,
some incoming requests might be lost, a server might be down, or some services
stop before they complete the computation. A good distributed system should be
Generally there are two kinds of failures – hardware and software. Hardware redundancy is used to handle hardware failures, i.e. redundant components replace the failed ones. Programs should be designed to tolerate or automatically recover from software failures.
Concurrency
Since there are many clients (users) and several servers in a distributed system, it
is possible to have more than one process executing in parallel. Concurrency is
one of the intrinsic characteristics of distributed systems. There are two reasons
for parallel executions to occur:
Transparency
This characteristic is hidden from the user and the (application) programmer. A
distributed system is transparent if it achieves the image of being a single system
to make everyone think that the collection of independent components is simply
a single time-sharing system.
• Failure transparency enables the concealment of faults, and allows users and
application programs to complete their tasks despite the failure of hardware
or software components. The power of failure transparency is highly
dependent on how many resources are held in reserve within this fault-
tolerance scheme.
READING
SELF-TEST 1.4
[Figure: the layered organisation of a distributed system – applications, run-time support, operating system and hardware]
System Architecture
In a distributed system, processes are arranged together to perform useful tasks.
The following are the four types of system architecture.
1. The client-server model is the most widely used model. Clients send invocations (requests to an authority) to the servers (the authority) for remote services. The server then executes the remote services based on the invocations and sends the results back to the clients.
3. Proxy servers and caches. The cache is a fast secondary storage device that records the most recently used data objects. When a client requests an object, the caching service first checks the cache and supplies the object from the cache if it is available. If not, a search is required through the Web servers. The cache is updated when the object has been found – this most recently sought object is added to the cache, as sketched after this list.
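The cache lookup logic in item 3 can be traced in a few lines of Python. This is a toy illustration, not the module's code; fetch_from_web_server() is a hypothetical stand-in for the slow search through the Web servers.

cache: dict[str, bytes] = {}

def fetch_from_web_server(url: str) -> bytes:
    # hypothetical stand-in for the slow request to the origin Web server
    return b"object for " + url.encode()

def get_object(url: str) -> bytes:
    if url in cache:
        return cache[url]                # cache hit: supply the object directly
    obj = fetch_from_web_server(url)     # cache miss: search the Web servers
    cache[url] = obj                     # update the cache with the found object
    return obj

print(get_object("http://example.org/a"))   # miss: fetched and cached
print(get_object("http://example.org/a"))   # hit: served from the cache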
Design Requirements
There are four requirements in the design of a distributed system:
(a) Message transmission delay (the time taken to send a message from a
sender to its corresponding receiver).
(b) Throughput (the data transfer rate).
2. Quality of Service (QoS). The QoS experienced by clients and users covers reliability, security and performance. The concern here is whether fault-
tolerance can be achieved in a distributed system to maintain its reliability
and availability. As for security, a reasonable degree of security should be
applied to the data that are stored and transmitted within a distributed
system.
3. Use of caching and replication. Both caches and replicated servers should be used to improve the performance and availability within a distributed system. The concern is how to validate a cached response, how to refresh the cache and how to maintain the consistency of caches and replicated servers.
• Security model. Since resources are shared within a distributed system, this model defines how to protect those resources from being accessed by unauthorised users, and it provides a secure way for authorised users to
access the shared resources.
READING
SELF-TEST 1.5
• You should also know the definition of distributed systems and be aware of
their advantages over standalone computer systems. Moreover, you should
now understand the differences between networks and distributed systems.
• You have also learned the basic characteristics of distributed systems. The
section dealing with architectural models demonstrates very clearly how the
layered software in distributed systems is significantly different from the
layered software in conventional standalone computer systems.
• You should also understand the four system architectures and their design
requirements, and the fundamental models of distributed systems.
LEARNING OUTCOMES
By the end of Topic 2, you should be able to:
1. Identify different network services;
2. Outline the quality of services required in a network;
3. Describe the basic concept of Local Area Networks (LANs);
4. Give examples of LANs; and
5. Identify the functions of internetworking facilities.
INTRODUCTION
Most of the world's computers are now connected to some form of network. In
fact, the idea of computer networking was introduced soon after the invention of
computers, and now you can find computer networks all over the world. A
computer network consists of a series of computers that are connected together to
allow them to communicate with each other. Through a network, computers can
share peripheral devices like printers, databases and, perhaps most importantly,
information. In fact, the truly standalone PC is rapidly becoming a thing of the
past.
The main issues involved in connecting computers are network hardware and
network software. Let us briefly go through the network hardware issues in this
section, and later you can concentrate more on the network software. The main
focus of this course is on the software side of computers and systems, and distributed systems are built on top of the network software.
Transmission Media
Transmission media are used to provide a connection between two machines to
exchange information. These media can be classified into two groups - guided
and unguided. Guided media define a physical and tangible property through
which the signals are transmitted between the communication points, such as
twisted pair, coaxial cable, and optical fibre. Unguided media have no physical
(tangible) connection between two points, such as satellite, microwave, and
infrared.
Guided Media
A twisted pair medium is made up of pairs of copper wires that are insulated and
twisted together. They are widely used in telephone networks and are good for a
data rate (bit rate) in the order of 10–100 Mbps over 100 metres, and at lower bit
rates over longer distances. Unshielded twisted pair cable (UTP), shown in Figure
2.1, is a four-pair wire medium that is commonly used in LANs because of its low
price and easy installation.
Coaxial cable, shown in Figure 2.2, contains a central wire inside an outer circular
copper wire mesh. The space between these two conductors is filled with a
dielectric insulating material. The data rate is typically 10–100 Mbps over a
maximum cable length of 500 metres. Coaxial cable is better than twisted pair in
protecting electronic signals against external noise.
Optical fibres, shown in Figure 2.3 below, are plastic or glass fibres with a light
source on one end and a light detector on the other. They are immune to electrical
noise and support very high data rates (up to 5 Gbps) over a cable length of 2–3
km.
Unguided Media
Satellites are used for very high frequency (GHz) radio communication. A
satellite travels in space and moves synchronously with the rotation of the earth -
that is, in a geosynchronous orbit. A transmitting antenna sends its signal to the
satellite and then the satellite reflects the signal to the earth. Its data rate is high
but the satellite and transmitting antenna are very costly. Thus, satellites are
mainly used for video broadcasting.
Table 2.1 provides a quick reference tabulation of the transmission media. You
can find the advantages and disadvantages from the table.
Short-distance Communication
Computers use binary digits (bits) to represent data. Thus, transmitting data
means sending bits from one end to the other through a transmission medium.
The simplest way to transmit bits is to use a small electric current to encode data.
A negative voltage is usually used to represent a logical '1' and a positive voltage is used for a logical '0'. To transmit a '0' bit, the hardware device of a sender inputs a well-defined positive voltage into a wire for a specified short time. Then the hardware device of its corresponding receiver will receive the signal and interpret it as a '0' bit. This is the physical way for a sender to send a logical bit and a receiver to receive a logical bit in a short-distance communication.
Long-distance Communication
The hardware for long-distance communication provides us with another
problem. An electrical current cannot be transmitted too far, because the current
becomes weaker as it travels. This is known as signal attenuation (loss of
communication signal energy). Thus, we need to use modulation to send data.
Modulation is a process by which the characteristics of electrical signals are
transformed to represent the data. Instead of transmitting an electric current,
long-distance communication systems send a continuously oscillating signal as a
data-bearing signal, usually in the form of a sine wave called a carrier, as shown
in Figure 2.6.
However, the carrier signal alone is not sent in the form shown in Figure 2.6
above. The data are encoded in an analog format into the carrier signal to become
what is called a modulated signal. The data are then transmitted through the
modulated carrier. That means, based on the binary data that you want to send,
you modify the signal a little - but the modifications are based on the values of
binary data and the modulation techniques required.
Phase modulation uses different phase changes (0° or 180°) to represent different
bit values. Figure 2.9 shows how phase modulation works.
When a modulated signal arrives at the receiver, the receiver demodulates the signal and converts it into binary data.
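To make the idea concrete, here is a rough Python sketch of binary phase modulation: each bit selects the carrier phase (0° or 180°), and the receiver recovers the bit by inspecting the phase. The carrier frequency and sampling values are illustrative assumptions, not figures from the module.

import math

CARRIER_HZ = 1000.0        # illustrative carrier frequency
SAMPLES_PER_BIT = 8        # one carrier cycle per bit, 8 samples each

def modulate(bits):
    """Phase modulation: bit 0 -> phase 0, bit 1 -> phase 180 degrees."""
    samples = []
    for i, bit in enumerate(bits):
        phase = math.pi if bit else 0.0
        for k in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + k) / (SAMPLES_PER_BIT * CARRIER_HZ)
            samples.append(math.sin(2 * math.pi * CARRIER_HZ * t + phase))
    return samples

def demodulate(samples):
    """Recover each bit from the slope at the start of its bit interval."""
    bits = []
    for i in range(0, len(samples), SAMPLES_PER_BIT):
        # with phase 0 the waveform starts rising (positive second sample);
        # with phase 180 it starts falling (negative second sample)
        bits.append(0 if samples[i + 1] > 0 else 1)
    return bits

print(demodulate(modulate([0, 1, 1, 0])))    # [0, 1, 1, 0]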
What is provided above is only a very brief description (and concepts) of the
computer hardware associated with data transmissions. The next section
introduces the software components of computer networks.
SELF-TEST 2.1
2. Sending long continuous streams of bits may cause the fairness problem (some might call it an "unfairness" problem). If one communication occupies a communication channel for a very long time to send its data, others will suffer from long transmission delays. Sending data in packets can solve this problem. Everyone can send data, but the computer networks will take turns in handling them.
Figure 2.10 shows the general structure of a packet. The header of a packet
includes the source address, destination address, and options. The source address
identifies who has sent this packet, and the destination address identifies who
should receive this packet. Options are usually for network control and
management. Since a packet is assumed to be small, the size of a packet is
variable but it does have an upper bound (an upper limit).
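As a concrete (and purely illustrative) example of the layout in Figure 2.10, the Python sketch below packs a header – source address, destination address and an options field – in front of the data. The field sizes are assumptions made for the sketch, not the module's format.

import struct

HEADER = struct.Struct("!4s4sH")   # 4-byte source, 4-byte destination, 2-byte options

def make_packet(src: bytes, dst: bytes, options: int, data: bytes) -> bytes:
    return HEADER.pack(src, dst, options) + data

def parse_packet(packet: bytes):
    src, dst, options = HEADER.unpack_from(packet)
    return src, dst, options, packet[HEADER.size:]

pkt = make_packet(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02", 0, b"hello")
print(parse_packet(pkt))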
Another analogy, which links better to the connection type described in the next
sub-section, deals with one form of mail delivery. Figure 2.11 shows the
guaranteed and dedicated delivery channel associated with one form of mailing a
letter.
Figure 2.11: An analogy for connected packet service – sending a registered letter
Source: Cisco Systems Networking Academy
Note that two computers do not need to exchange their data continuously. They
can sometimes stop their data communication and then resume it later. The
connection path will not be removed until it is no longer needed. To release the
connection path, one of the two computers will send a disconnection request to
the other computer(s). It then disconnects the path, which is called connection
termination (analogous to hanging up the telephone, or the registered letter being
signed for and delivered).
Figure 2.12: We will do our best to get it there – general mail delivery
Source: Cisco Systems Networking Academy
Then the network routes the packet from the source computer to the destination
computer according to its destination address (analogous to mailing a letter),
though the path is not dedicated or guaranteed. The computer may send more
than one packet to the other computer, but not all of the packets follow the same
path to the destination. Based on the routing mechanism of the network, each
packet will find its own way to travel. For example, a general letter posted in
Wan Chai to Kwun Tong today, might get there through Mongkok; one posted
tomorrow might get there through Quarry Bay. Another just might not get there;
but Hong Kong Post is very reliable.
Subtracting the efficiency percentage from 100% gives a measure of the overhead:

overhead = 100% − efficiency

For example, if the efficiency of your data transmission were 80%, then the overhead in that system would be 20%.
There is no connection set-up procedure for connectionless services, and thus, the
overhead does not include the connection set-up or termination time. However,
since each packet has to be delivered with full source and destination addresses,
the transmission overhead is the address information, and it is fixed regardless of whether the communication time is long or short. Obviously, the quality of service cannot be
negotiated, since each packet is routed independently and thus, the overall
performance is not stable. Also, the sequencing of the packet delivery cannot be
maintained (they can arrive in any order) and there is usually no assurance of
delivery. Connectionless service is suitable for short communication transactions.
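The two overhead patterns can be compared with a small sketch. These helper formulas are one interpretation of the discussion above (efficiency as useful data over total time or bits), so treat them as illustrative rather than definitive.

def efficiency_connection_oriented(setup_s: float, data_s: float) -> float:
    # overhead is the connection set-up (and termination) time
    return data_s / (setup_s + data_s)

def efficiency_connectionless(header_bits: int, packet_bits: int) -> float:
    # overhead is the fixed per-packet header carrying full addresses
    return (packet_bits - header_bits) / packet_bits

print(efficiency_connection_oriented(2.0, 8.0))   # 0.8
print(efficiency_connectionless(20, 100))         # 0.8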
SELF-TEST 2.2
1. Suggest two reasons why packets are used to transfer data
instead of sending logical bits continuously.
2. What is the efficiency of a connection-oriented service if it spends
four seconds to establish the connection setup and ten seconds
for data communication?
3. What is the efficiency of a connectionless service if the size of the
header of a packet is 24-bits and the average packet size is 120-
bits? Assume the header contains the only additional bits needed
to send the packet.
4. If we want the efficiency of the above connection-oriented
service (Question 2) to be not less than the above connectionless
service (Question 3), how long should the data communication
time be maintained in the above connection-oriented service?
Throughput
Throughput is defined as the actual packet delivery rate: the number of packets that are successfully transmitted over the network from one end to the other in one second. The unit of
throughput is bits per second or packets per second. There are many other names
for this term, such as:
• Data transport rate,
• Data transmission rate,
• Data transfer rate, or simply
• Data rate.
normalised throughput = throughput / transmission rate, where 0 ≤ normalised throughput ≤ 1.
Delay
Delay is also known as latency, transmission delay or data transfer delay. Delay
means the time taken from the moment the sender wants to send a packet until the receiver has completely received the packet. It is different from the propagation delay, which is defined as the time taken between the first bit of a packet leaving the sender and that bit arriving at the receiver, or the time taken to send an empty or null message.
It should be obvious that the longer the transmission delay, the worse the QoS, and you should always want a short transmission delay in any system. The
delay might depend on the length of a transmission medium and its properties,
but sometimes it also depends on how a communication protocol handles the
packet transmission procedure. A good communication protocol should find a
good way to minimise this delay factor.
Error Rate
Error rate represents the number of error bits over the total number of bits sent. If the error rate is 0.01, it means there is one bit of error for every 100 bits of data. Error rate is one of the parameters of QoS. In order to provide better QoS, the error rate must be kept as low as possible. However, this factor depends greatly on the communication environment and the transmission media. Thus, what should be used to judge a communication protocol is not the value of the error rate itself but how errors are handled. The greater the efficiency in handling errors, the better the network performs.
SELF-TEST 2.3
1. If a network has a 3 Mbps transmission link and 20% is wasted
during data transmission:
Sharing a transmission link can reduce costs, but shared transmission links
should only be considered in local communications. The two reasons why shared
transmission links are not considered for long-distance communications are as
follows.
Bus Topology
In bus topology, all computers (also called stations, hosts, end stations, or
terminals) share a common (broadcast) transmission medium in a multi-point
configuration. Figure 2.13 shows a general bus topology. Here is how it works.
2. The data signal, shown by the light arrow, then travels in two directions through the bus – one from the computer to the left terminator, the other from the computer to the right terminator. The function of the two terminators is to terminate (or absorb) the data signal at the ends of the bus.
3. Then the bus interface of the receiver checks the data signal on the bus. If the signal belongs to it, the bus interface will copy the signal from the bus to its computer. Note that a bus interface is passive, which means that a computer (e.g. Host n) that is not the receiver of the data signal cannot stop the signal from reaching its bus interface – every interface on the bus sees every signal.
4. Obviously, if more than one computer wants to transmit data at the same
time, it would cause a signal-overlapping problem and no receivers would
receive error-free signals. This particular problem is further discussed and
solved when we study the Ethernet, which is a common LAN commercial
product that invokes (uses) a bus topology.
Ring Topology
In ring topology, all computers are connected into a ring, which is a shared
common transmission link in a multi-point configuration similar to bus topology.
Figure 2.14 below shows the general structure of a ring topology. The operation
of the ring topology is similar to that of the bus topology, except for three
differences.
• The electronic data signal travels in one direction only, not in two directions – it goes clockwise or anti-clockwise around the ring.
• Ring topology has no terminators. When the data signal travels from a sender (e.g. Host 1), it travels around the ring and ultimately comes back to the sender, where the sender finally absorbs it. This means that, when the data signal arrives at the sender, the sender will not allow it to pass around the ring again – it will absorb its own data signal (it does its own housekeeping by acting as the terminator for its own signals).
• When a ring interface (such as the Host 2 interface) receives a data signal, the interface will copy it.
• If the data signal belongs to its computer (i.e. the receiver, e.g. Host 2), the ring interface will copy it into the Host 2 computer and then pass the data signal to the next computer through the ring.
• If the data signal is not destined for Host 2, the Host 2 ring interface will simply copy and send it to the next computer (Host 3) through the ring. If the message is destined for Host 3, the interface will copy the signal to the computer and then pass the message through the ring to Host 4.
• If the message is not intended for Host 4, Host 4 passes it along (back) to Host 1. And you know what Host 1 should do with it, don't you? 'Kill' it, right! When the message is sent back to Host 1, Host 1 will know that the message has been received by Host 3. It will then absorb ('kill') the message and release the free token again for circulation. This behaviour is traced in the sketch after this list.
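The forwarding rules in the list above can be traced with a small sketch (illustrative Python, not the module's code): every interface passes the frame around the ring, the destination also copies it, and the sender finally absorbs it.

def circulate(sender: int, destination: int, n_hosts: int) -> None:
    # hosts are numbered 1..n_hosts around the ring
    host = sender % n_hosts + 1
    while host != sender:
        if host == destination:
            print(f"Host {host}: copies the frame, then passes it on")
        else:
            print(f"Host {host}: passes the frame on")
        host = host % n_hosts + 1
    print(f"Host {sender}: frame has circulated once - absorb ('kill') it")

circulate(sender=1, destination=3, n_hosts=4)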
Obviously, if more than one computer wants to transmit data at the same time,
there is a problem of how control of the ring is assigned to a particular computer -
two signals cannot be active on the ring at the same time. The problem is further
discussed and solved when we study Token Ring later in this topic. Token Ring is a common commercial LAN product built on (using) the ring topology.
Star Topology
In the star topology, all computers are connected to a central node and all data
transmissions go through that node. Note that the central node, which can be
active or passive, could be a:
• Mainframe computer,
• Hub,
• Switch, or
• Router.
In the past, this star-type of LAN topology was usually used for centralised
computer systems. In other words, the node was usually a central computer (i.e. a
mainframe computer) and all of the hosts were terminals (i.e. a terminal
consisting of a monitor with keyboard and a mouse if any). The computation
power and the secondary storage resided in the central computer. When a user
on a terminal wanted to do some operations (e.g. run an application program),
the user sent commands from the terminal to the central computer. After the
computer executed the commands and finished the corresponding operations, the
output was transmitted back to the terminal and displayed on the terminal's monitor.
• Loading. Since the node is involved in all data communications, its speed has
to be very high to process each data communication; otherwise, the node will
become a bottleneck and all data communications might suffer very long
transmission delays if too many packets travel too slowly through the node.
Despite the above disadvantages, the star topology is often applied in LANs, but
bus and ring topologies are still very commonly used.
SELF-TEST 2.4
1. What are the differences between the bus and ring topologies?
3. Although the star topology has some disadvantages, can you point
out any advantages?
2.5.1 Rules
The algorithm used to manage the data transmission in Ethernets is called
CSMA/CD (Carrier Sense Multiple Access with Collision Detect). The algorithm
is a kind of media access control protocol that corresponds to the functions of
Data Link Layer in the ISO OSI model. The CSMA/CD rules are quite simple and
are shown below:
1. When a station (computer) has data to send, it first listens to the channel
(segment) to "see" if anyone else is transmitting at that moment.
4. If a collision occurs, the station aborts the transmission, sends a jam signal,
waits a random amount of time, and then starts all over again from step 1.
In rule one, "listens to the channel" means "measures the voltage of the channel".
Since coaxial cable is used as the shared transmission link, the data signal is an
electronic signal. When the channel is idle, a specified pattern of an electronic
signal can be found in the cable. When a station wants to send data, it measures
the change of the voltage of the channel. If the pattern matches the idle state, the
channel is idle; otherwise, it is busy.
Now, when data can be safely transmitted over the bus, how does the receiver
receive data? The receiver will check the destination address field in the header of
the frame. If the destination address is its own address, it will copy the frame
from the bus through the bus interface.
In rule four (you may refer again to pp. 116–17 in your text), you might wonder
why or how a collision could occur if the first three steps were followed. In fact,
there is still a chance of having collisions. Let us look at Figure 2.16 again.
Let δ be a very short time duration. When t = 0, Host 2 inspects the bus and finds it idle. Thus Host 2 thinks there is no data transmission and then sends its data at t = δ. At t = 2δ, Host 2 is sending data and tries to put the data over the whole bus so that everyone will know the bus is occupied. However, at that same time, Host n starts to detect the status of the bus. At that moment, since the data signal sent from Host 2 has not yet reached Host n, Host n wrongly believes that the bus is idle. At t = 3δ, Host n starts to send its data and thus a collision occurs.
When a station detects a collision, the station will abort the transmission and send
a jam signal. The jam signal is a random signal so that every other station knows
there is a collision in the channel and no one should start any transmission before
the jam signal is completely transmitted. Note that more than one station might
detect the collision and thus, more than one jam signal might be transmitted. This
is allowed in Ethernets.
Now, after the collision is detected and a jam signal is transmitted, all stations
involved in the collision will need to retransmit their data. It is obvious that they
might all retransmit at the same time. If that happens, the data signals will collide
again.
So now the question is: "We know they need to wait for a while and then retransmit, but how can we arrange their random waiting time?" To answer this question, we have the Binary Exponential Backoff algorithm, which is shown below:

After the nth collision, each station chooses a random integer between 0 and 2ⁿ − 1 and waits that number of slot times. If 10 ≤ n ≤ 16, the randomisation interval is frozen at a maximum of 1023 slots. If n > 16, the network is assumed to have crashed.
After the first collision, each station waits either 0 or 1 slot times before trying again. One slot time is equal to the worst case of the round-trip propagation delay, i.e. 2 × the end-to-end propagation delay. End-to-end propagation delay is defined as the time required for the first bit of a frame transmitted from one end to reach the other end. To accommodate the worst case allowed by the IEEE standard, the slot time has been set to 512 bit times (the time taken to send 512 bits), or 512 bits / 10 Mbps = 51.2 μs. If more than one station wants to retransmit after the same number of slot times, they will collide again and then randomly wait for the next retransmission. Next time they will have four slots to select from (0, 1, 2 and 3); that is, the second collision yields four random time slots (between 0 and 2² − 1 = 3). The third collision would yield eight random slots, and so on.
When more than ten collisions have occurred, it is meaningless to increase the randomisation interval further, since the interval is already sufficiently large; it is therefore frozen. If, finally, the messages still cannot be delivered after the 16th collision, the system assumes that:
• The applied load is so high that the network cannot support the transmission of data, so the system announces that the network has crashed.
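The backoff rule can be written directly in code. A minimal Python sketch (illustrative, not the module's implementation); SLOT_TIME_US follows the 51.2 μs slot time computed above.

import random

SLOT_TIME_US = 51.2      # 512 bit times at 10 Mbps, as computed above

def backoff_slots(n_collisions: int) -> int:
    """Binary exponential backoff: slots to wait after the nth collision."""
    if n_collisions > 16:
        raise RuntimeError("network is assumed to have crashed")
    k = min(n_collisions, 10)            # interval frozen at 1023 slots from n = 10
    return random.randint(0, 2 ** k - 1)

for n in (1, 2, 3, 10, 16):
    slots = backoff_slots(n)
    print(f"collision {n}: wait {slots} slots = {slots * SLOT_TIME_US:.1f} us")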
• Length of data field (two bytes): This is the length of the data field in units of bytes; the maximum length of the data field is 1,500 bytes.
• Checksum (four bytes): This field is used by the receiver to check whether the received frame is correct or not. Usually, the Cyclic Redundancy Check (CRC) method is used for error checking. CRC is an error-checking technique in which the receiver calculates a remainder by dividing the frame contents by a polynomial divisor and compares the calculated remainder to the value stored in the checksum of the frame by the sender. Note that the design and implementation of CRC is not included in this course because of its complexity.
First, let us consider the term "applied load". The applied load of a LAN is defined as the total number of frames per second generated by all stations. It includes both newly generated frames and re-attempted frames (retransmitted after collisions). When the applied load is low, the bus is almost always idle and almost no collisions occur. Thus, every station wanting to transmit its frame can start its transmission almost immediately without any delay, and the overall access delay is very small (close to 0). When the applied load is high, many frame transmissions collide and the access delay grows rapidly.
S ≤ 1 / (1 + 4.44a), where a = τ / tₓ.    (Equation 2.1)
Let us follow an example to show how to use the above formula to compute the
upper bound of the normalised throughput.
Consider a 10 Mbps bus LAN using CSMA/CD. Suppose the bus spans 5 km and
the frame size is 1,000-bits. What is the maximum throughput of the LAN (in bps)
if the propagation delay of the bus is 5 μs/km?
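Working the example through (a Python sketch of the arithmetic; the answer follows directly from Equation 2.1, with τ the end-to-end propagation delay and t_x the frame transmission time):

tau = 5 * 5e-6               # propagation delay: 5 km x 5 us/km = 25 us
t_x = 1_000 / 10e6           # transmission time: 1,000 bits / 10 Mbps = 100 us
a = tau / t_x                # a = 0.25
s_max = 1 / (1 + 4.44 * a)   # Equation 2.1: upper bound on normalised throughput
print(s_max)                 # ~0.474
print(s_max * 10e6)          # maximum throughput ~4.74 Mbps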
Figure 2.19 shows an Ethernet with dedicated media full-duplex technology and
a switch. There is no collision in this Ethernet, because stations can transmit a
frame to the switch and receive a frame from the switch. If more than one station
wants to send frames to the same destination, the switch will buffer them and
send them to the destination in sequence.
SELF-TEST 2.5
1. Describe the principle of CSMA/CD protocol.
2.6.1 Rules
The rules of a Token Ring network are quite simple and are shown below:
1. At ring initialisation, a special packet called a token is injected into the ring
and circulates on (around) the ring.
2. The receiving station picks up the frame from the ring and changes a one-bit field for acknowledgements, while the other (non-receiving) stations simply pass the frame along to the next station.
3. The sender station absorbs the transmitted frame when it circulates back.
4. Then the sender station releases the token (the sender then switches to listen
mode).
In rule one, a token circulates around the ring and waits for someone to catch it. Note that, at that moment, all stations are idle and the token travelling around is called a "free" token.
In rule two, when a station wanting to transmit sees a "free" token, it seizes the token by inverting a specified single bit in the three-byte token and changes it (the token) into the first three bytes of a normal data frame (the token and frame formats are shown later). The data to be sent will then follow the token. When other stations "see" this modified token, they know this token is occupied.
To transmit a data frame, the ring interface of the station changes from listen mode to transmit mode. Listen mode is shown in Figure 2.20(a). In this mode, the input frame is simply copied to the output with a one-bit delay. A one-bit delay is the time taken for the ring interface to transmit one bit. For example, if the ring transmission capacity is 1 Mbps, the one-bit delay is 1 μs. In transmit mode [see Figure 2.20(b)], which is entered when the free token is obtained, the ring interface breaks the connection between input and output. The input "token frame" is copied to the station, and the station regenerates its own data frame to the output.
In rule three, "absorb" means that when the transmitted frame circulates back after one cycle around the ring, the sender will receive the frame but will not regenerate it to the next station. The sender does not do the regeneration because the transmitted frame has already circulated the ring once, so the corresponding receiver should already have copied the frame.
In rule four, since every station knows the bit pattern of the "free" token, it is easy for the sender to regenerate a "free" token.
What you would have to weigh or judge, however, is whether waiting is more of
a disadvantage than having systems crash through collisions, which is likely a
major disadvantage of Ethernets.
• Access control (AC) (one byte): It contains the token bit, the monitor bit, priority bits and reservation bits.
• Frame control (FC) (one byte): This distinguishes data frames from various possible control frames.
• Checksum (four bytes): This field is used by the receiver to check whether the received frame is correct or not. The Cyclic Redundancy Check (CRC) method is usually used for error checking.
• Frame status (FS) (one byte): It is for network control, and contains the acknowledgement bits.
The access delay is quite different from that of an Ethernet. When the applied load is low, a station wanting to transmit simply waits for the token's arrival before data transmission can begin. Thus, the range of the access delay is from zero to the ring latency (say L), which is the total delay incurred by a bit (signal) circulating once around the ring, and the average access delay is L/2. When the applied load becomes higher and higher, the access delay becomes longer and longer, until it reaches the maximum access delay. The maximum access delay is reached when all stations want to transmit data and each one that holds the token holds it as long as possible. If the time that a station can hold the token (MTHT = Maximum Token Holding Time) is bounded (limited), the maximum access delay is also bounded, which makes Token Ring suitable for real-time applications.
Suppose there are n stations. The worst situation is that a station wanting to transmit just misses the token and every other station then holds the token as long as possible, i.e. the maximum access delay is approximately (n − 1) × MTHT plus the ring latency L.
The throughput increases as the applied load increases and levels off as the
transmission capacity of the ring is approached. When the access delay is
maximized, its throughput is also maximized.
Assume each station spends all of its token-holding time sending data (ignoring the transmission overhead, including the processing time to seize and release the token). When all stations want to transmit their data – and hold the token as long as possible – the total actual data transmission time is nTₕ, while the ring latency L is the total transmission overhead, i.e.
S ≤ nTₕ / (L + nTₕ).    (Equation 2.3)
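Equation 2.3 is easy to evaluate for illustrative values (the numbers below are assumptions for the sketch, not figures from the module):

def max_normalised_throughput(n: int, t_h: float, latency: float) -> float:
    """Equation 2.3: S <= n*T_h / (L + n*T_h)."""
    return n * t_h / (latency + n * t_h)

# e.g. 20 stations, 10 ms token-holding time each, 1 ms ring latency
print(max_normalised_throughput(20, 0.010, 0.001))   # ~0.995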
From the above table, you may think the overall performance of Token Ring is
better than that of Ethernet. However, Ethernet is more popular than Token Ring
in the commercial world for the following reasons:
You might argue that the performance of Ethernet is not good compared with Token Ring when the applied load is very high. But usually
we do not allow the applied load to be so high. We know that both Ethernet
and Token Ring have large access delays when the applied load is high, and
users would not like either network, because they need to wait for a long
time to do a few remote operations. So, in the commercial world, since a
LAN is inexpensive, companies will increase the transmission rate or even
buy one more LAN to reduce the applied load along with the access delay.
But for Token Ring, it is not easy to add a new user. You need to disconnect the ring, put the new ring interface into the ring and connect them together. Disconnection means stopping the operation of the network, which will disturb other users. Also, one more user adds one more one-bit access delay, so a new user affects the performance of a Token Ring. Moreover, when you want to remove a user, you either need to disconnect the ring and then remove it, or set the ring interface to bypass all packets.
On the other hand, Token Ring does have some advantages over Ethernet:
(a) Reason 1
Token Ring is fairer than an Ethernet. In an Ethernet, when a station has a
lot of data to transmit and its first frame is successfully transmitted, the rest
of the data will easily follow the first frame and be transmitted. This comes
about because this one station always occupies the bus, and other stations would find it difficult to access the bus while that station uses the bus to send its frames continuously – the other stations cannot detect an idle
status on the bus. However, for Token Ring, when a station finishes
sending its first data frame, it has to wait for the token to circulate round
the ring once before it can access the token again. So, if other stations want
to transmit, they have a fair chance to get the token to do their data
transmission.
(b) Reason 2
It is easy to set priorities in a Token Ring system but not in an Ethernet. In
Ethernet, the schedule is first-come, first-served. Thus, even if you have a
high priority, if another station has already occupied the bus, you have to
wait until it finishes. But for Token Ring, if a station has a higher priority, it
can set the token in such a way that it can only be accessed by high-priority
stations; stations with lower priority cannot hold the token to send data
(until it is released).
However, the above two advantages of Token Ring over Ethernet are not
very important in the commercial world and thus, Ethernet is more popular
than Token Ring.
SELF-CHECK 2.6
ACTIVITY 2.1
If you work in an office with a local area network (LAN), try to check
the type of the LAN (Ethernet or Token Ring?) in your office and
check its transmission rate and other specifications.
Do you believe your LAN system is the "best" for your particular
working environment? Think about how you could improve its
usefulness or performance.
2.7 INTER-NETWORKING
LANs can be found everywhere but they are seldom large in size. Companies or
organisations typically like to build a LAN for each department rather than a
large LAN serving all departments. Three reasons to explain why it is better to
build several small LANs and interconnect them rather than building a large one:
Repeater
A repeater is used to regenerate transmission signals (so that they can travel
farther) and so extend the length of a LAN. It operates in the physical layer of the
ISO OSI model. It is usually used when a travelling signal is too weak to travel further.
When a repeater receives a frame, it re-generates the frame and strengthens the
signal. A hub is a multi-port form of a repeater. The main advantage of a repeater
is its low cost, because its function is simple. But its disadvantage is that if a
repeater connects two LANs, the two LANs will actually become one LAN. For
example, if a repeater connects two Ethernets, when a station in one of the two
LANs sends a frame, all stations in both LANs can read this frame - the two have
merged into one. Thus, even though a repeater is the simplest interconnection
device, it is not suitable if you want to build separate LANs.
Bridge
A bridge is a more complicated inter-connection device and has more functions
than a repeater. It operates in the data link layer of the ISO OSI model; it can
connect different LANs together, and it can also filter the traffic between the
LANs. Basically, a bridge serves three functions.
In this topic, you have been introduced to some network concepts. The first
section of the topic deals with some of the hardware issues associated with
connecting "standalone" computers into networks. The important issues in your
choice of transmission medium for a network include the cost of inter-connecting
the computers, but perhaps more importantly it includes the capability of each
medium to transmit your data effectively and efficiently. Different media have
different transmission rates and different spans or ranges over which they are
effective. You should now be able to choose the right one for your network's
needs.
The next two sections deal with two types of network service - connection-
oriented and connectionless - with the definition and parameters to determine the
Quality of Service (QoS). To a large extent, a loose relationship exists between
network services, quality of service and the type of transmission media.
The last sections, which deal with the networking requirements for distributed
systems and two examples (case studies) of LANs, should have provided you
with a good understanding of how the parts fit together to make effective
networks. You were shown the characteristics of Local Area Networks (LANs)
and the different topologies that can be used to develop such networks. Each of
the case studies provides you with information about the rules, frame structure
and performance analysis for Ethernet and Token Ring, and you should also
understand the strengths and weaknesses of these two systems.
Finally, you were exposed to the ways in which LANs could be inter-networked
to form larger units but, at the same time, you were alerted to the limitations
associated with joining LANs into larger units. Perhaps the key issue here
revolved around the fact that even though inter-networking is highly desirable, it
is still advantageous that each individual LAN retains a degree of separateness.
"Linked separateness", if such a thing exists, could be achieved by using bridges
(rather than other devices) to connect the separate LANs.
You now have the background concerning the physical aspects of networks and
distributed systems to put your learning in Topic 3 into context. This next topic
focuses on the protocols required for data transfer through and across networks
and systems.
Topic 3: Transmission Control Protocol/Internet Protocol (TCP/IP)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Describe the concept of layered network architecture;
2. Outline the layered structure of the TCP/IP reference model;
3. Describe the functions of Internet Protocol (IP);
4. Describe the functions of Transmission Control Protocol (TCP);
and
5. Identify different Internet applications.
INTRODUCTION
Now that many computers are connected to the Internet, many users use Internet
application services every day. This important topic introduces the
communication protocol used in the Internet - TCP/IP (Transmission Control
Protocol/Internet Protocol). In fact, the Internet is just one type of WAN (Wide
Area Network). Because the Internet is so popular, everyone knows about it, but
not about WANs in general. So, to make this topic easier to understand, we first
introduce the broad concept of a WAN and its switching technology.
The general structure of a WAN is shown in Figure 3.2. When a packet arrives at
a node, it is stored in the nodeÊs memory and the destination address (included in
the header of the packet) is examined. By searching its routing table, the node
then makes a routing decision about where to forward the packet.
• If the packet is destined to a host that is attached to the local network of the
node, the node will deliver the packet to the network and then to the
destination host computer.
To forward the message, the node puts it on the outgoing queue of the link. The
link transmits packets in its queue on a first-come-first-served basis. Since each
packet requires storing and forwarding procedures during its transmission,
packet switching is also called store-and-forward switching.
For better buffer management and fairness among computers, packets have a
fixed maximum (or bounded) size. If the packet size were unbounded, a large
buffer would be required for each node to store large packets and it would be
difficult to know how many packets a node could handle. Moreover, if a packet is
too large (long), it will occupy the nodes that the packet must travel through for a
long time. Other packets using the same nodes will suffer a long-transmission
delay, and hence it is not fair to them.
Now the question is: What happens if the transmitted data from the application
program is larger than the fixed maximum size allowed in the whole network
protocol (i.e. the whole network software)? The application program should
divide the data into several parts, and then put them into the network protocol.
Then the application program at the destination will recombine them. Note that
this is different from the fragmentation and reassembling in the IP (Internet
Protocol) layer (described later).
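A rough C sketch of that sender-side division follows (ours, not the module's code; MAX_PART is an assumed bound, not a value from the text):

#include <stdio.h>
#include <string.h>

#define MAX_PART 8   /* assumed maximum part size, in bytes */

int main(void) {
    const char *data = "a message larger than one packet";
    size_t len = strlen(data);

    /* Divide the application data into parts no larger than MAX_PART. */
    for (size_t off = 0; off < len; off += MAX_PART) {
        size_t n = (len - off < MAX_PART) ? len - off : MAX_PART;
        printf("part %zu: \"%.*s\"\n", off / MAX_PART, (int)n, data + off);
    }
    /* The application program at the destination recombines the parts. */
    return 0;
}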
READING
Chapter 3, section 3.3, "Network principles", 73-77.
SELF-TEST 3.1
(The seven layers of the ISO OSI reference model, from top to bottom:
Application, Presentation, Session, Transport, Network, Data link and Physical.)
• Physical layer: This is the base layer, which defines the digital interface for the
physical transmission, e.g. communication modes, modulation, multiplexing
and so on. It also defines how to send logical '1' and '0' signals and the
configuration of connectors.
• Data link layer: This layer includes two sub-layers. One is the Medium Access
Control (MAC) sub-layer, which is specific to LAN protocols; CSMA/CD and
Token Ring are two examples of this sub-layer. The other is the Logical Link
Control (LLC) sub-layer, whose role is to implement an error-free
communication link. The functions of this sub-layer include error control,
flow control and link control (link set-up and release procedures).
• Transport layer: This layer isolates the user from the details of the network
access (i.e. the user is not required to know any details about the network).
From this layer up to the application layer, users are not involved in any
physical details of a communication network. The functions of this layer
include Quality of Service (QoS) negotiation, end-to-end reliable message
transport service, and multiplexing.
Figure 3.4 shows an example of how to deliver a packet of data from a sender to
its corresponding receiver in an ISO OSI reference model.
Based on Figure 3.4, when the sender wants to send data to a receiver, the sender
passes the data to the application layer. The application layer then takes some
action on the data and attaches a header (i.e. AH = Application Header) to the
data. The data with the AH header then pass to the presentation layer, and the
presentation layer treats the whole thing received from the application layer as its "data".
This process is repeated until it reaches the data link layer. The data link layer
adds both the header (DH) and the trailer (DT) to the message and passes it to the
physical layer. The physical layer then passes the whole thing - a long series of
logical bits - to the transmission machine, and the machine sends the
corresponding physical signals to the receiver through the transmission medium.
On the receiver's side, the above process is reversed and finally the application
layer on the receiver's side gets the data. The key idea of the process is that every
layer has its own procedure and every layer is independent of the others.
came up with a solution - establish a network to connect computers for data
exchange, i.e. ARPANET. In the beginning, only a few researchers agreed with
this idea; the rest were worried that it would not be successful. Finally it worked,
and it has now become a commercial success - the Internet.
The TCP/IP reference model is shown in Figure 3.5. (The four layers, from top to
bottom: Process/Application, Host-to-Host, Internetworking and Network
Access.) A brief description of each layer follows:
• Network access layer: This layer includes the functions of the physical and
data link layers in the ISO OSI reference model. That means it manages the
communication mode, device specification and low-level network protocol
(e.g. Ethernet, Token Ring).
• Internetworking layer: This layer specifies the format of packets sent across
the Internet and the mechanisms used to forward packets from a host through
one or more routers to the destination.
• Host-to-host layer: This layer specifies an end-to-end protocol for the reliable
transfer of data between two application programs.
The process of the packet delivery in this model is similar to that in the ISO OSI
reference model, except that the former has four layers only, and the latter has
seven layers.
In the ISO OSI reference model, some layers - such as the session and
presentation layers - are insignificant; it is difficult to put suitable functions into
these two layers. The data link layer, however, is too complicated, so it was
divided into two sub-layers (MAC and LLC). Thus, the design is not ideal and
the loading of the layers is not well balanced.
The TCP/IP reference model is easy to implement because only two layers - the
Internet and transport layers - are standardised. The network access layer can
come from any low-level network protocol; any existing low-level network
protocol can serve as the lowest part of the TCP/IP reference model. Also, any
application services can be implemented on top of the Internet and transport
layers. Thus, to design a complete network software solution, you can use any
network components and low-level network protocol for your network, and you
can build any application services you like on top of TCP/IP. All you must do
is include the standard TCP/IP functions. Therefore, you should be able to see
that the TCP/IP model is very simple and efficient when compared with the ISO
OSI reference model.
Your next reading is a long one, and you are advised to make brief notes as you
read the material about protocols, addressing, packet delivery, routing,
congestion control and internetworking. Much of the material here is not really
new, as you were introduced to many of the general concepts in Topic 1 and
Topic 2.
READING
SELF-TEST 3.2
The conversion between bit pattern and dotted decimal notation is simple. To
convert a 32-bit IP address into its dotted decimal notation, we first divide it into
four eight-bit binary integers and then convert each binary integer into a decimal
integer. The conversion is as follows:

ABCDEFGH(2) = A × 2^7 + B × 2^6 + C × 2^5 + D × 2^4 + E × 2^3 + F × 2^2 + G × 2^1 + H × 2^0 (10)

For example,

00011110(2) = 1 × 2^4 + 1 × 2^3 + 1 × 2^2 + 1 × 2^1 (10) = 16 + 8 + 4 + 2 (10) = 30(10).
To convert a decimal integer into binary form, we use long division. For example,
if you wanted to convert 149 into a binary integer, you have:
Thus, 149(10) = 10010101(2). The four eight-bit binary integers are then combined
into one 32-bit IP address.
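As a quick illustration (our sketch, not the module's code), the same conversion in C uses shifts and masks to extract the four eight-bit integers:

#include <stdio.h>
#include <stdint.h>

/* Print a 32-bit IP address in dotted decimal notation. */
static void print_dotted_decimal(uint32_t addr) {
    printf("%u.%u.%u.%u\n",
           (unsigned)(addr >> 24) & 0xFFu,   /* first eight-bit integer  */
           (unsigned)(addr >> 16) & 0xFFu,   /* second eight-bit integer */
           (unsigned)(addr >> 8) & 0xFFu,    /* third eight-bit integer  */
           (unsigned)addr & 0xFFu);          /* fourth eight-bit integer */
}

int main(void) {
    /* 0x90D60614 corresponds to 144.214.6.20, the address used in
     * Self-test 3.3 later in this topic. */
    print_dotted_decimal(0x90D60614u);
    return 0;
}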
3.3.2 Class
A 32-bit IP address has two components - a network identifier and a host
identifier, as shown below.
The network identifier identifies the network that the machine is connected to,
and the host identifier identifies the machine itself on the network. At first, you
might think that optimal values should exist for the sizes of both identifiers and
thus that their sizes should be fixed. For example, we could assign 16 bits to the
network identifier and 16 bits to the host identifier. However, such a fixed
assignment is not flexible. A trade-off exists between the sizes of the network
and the host identifiers:
• If the size of the network identifier were larger, this would allow for a larger
possible number of networks in the Internet, with each network having a
smaller number of hosts.
• If the size of the host identifier were larger, the number of hosts in a network
would be larger but the possible number of networks would be smaller.
It should be obvious that there are different needs in the real world. An
international organisation expects to be able to assign a large number of hosts to
its network, whereas a local trading company would find a smaller number of
hosts is adequate.
Class A
The first bit of an IP address in Class A is 0, and the next seven bits are its
network identifier. Thus, Class A has 126 networks (2^7 - 2 = 126 seven-bit
network identifiers; two special addresses are excluded).
• One special address is all 0s (0.0.0.0). This special IP address is allowed only at
system startup and is not a valid destination address. When a system starts
up or a new machine joins the network, they do not have any IP address and
they need to ask the network administrator to assign one. However, when
they send the request, they do not have their source addresses. At this
moment, they use 0.0.0.0 as their source address. Once they learn their correct
IP address, all 0s will no longer be used.
A Class A network has about 16.8 million hosts (2^24 - 2), based on its
24-bit host identifier and excluding two special addresses. An IP address
with all 0s in its host identifier (xx.0.0.0) represents the network itself.
If an IP address has all 1s in its host identifier (xx.255.255.255), it is a
broadcast address: all hosts in the network xx will receive the message.
Class B
Similarly, Class B has two specified leading bits ('10'), 16,382 networks (14-bit
network identifier: 2^14 - 2) and each network can have 65,534 hosts (16-bit host
identifier: 2^16 - 2).
Class C
Class C has three specified leading bits ('110'), about two million networks (21-bit
network identifier: 2^21 - 2) and each network can have 254 hosts (eight-bit host
identifier: 2^8 - 2).
Class D
Class D is for packet broadcasting. It has four specified leading bits ('1110') and
the remaining 28 bits are used to specify a multicast group. Multicasting is
defined as communication with one sender and many receivers. If a machine is
in a multicast group, that machine will receive any message sent to the multicast
group. Note that a Class D address can be used only as a destination address. It
cannot be used as a source address, because there should be only one sender of a
message. If a Class D IP address were put in the source address field, we would
not know exactly which machine in the multicast group was the sender.
Class E
Class E has five specified leading bits ('11110') and is reserved for research and
development.
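The leading-bit rules above are easy to express in code. A minimal C sketch follows (ours, for illustration only; classful addressing is shown exactly as described above):

#include <stdio.h>
#include <stdint.h>

/* Classify an IP address by its leading bits. */
static char ip_class(uint32_t addr) {
    if ((addr >> 31) == 0x0) return 'A';   /* leading bit  0     */
    if ((addr >> 30) == 0x2) return 'B';   /* leading bits 10    */
    if ((addr >> 29) == 0x6) return 'C';   /* leading bits 110   */
    if ((addr >> 28) == 0xE) return 'D';   /* leading bits 1110  */
    return 'E';                            /* leading bits 11110 */
}

int main(void) {
    /* 144.214.6.20 = 0x90D60614; its leading bits are '10', so Class B. */
    printf("Class %c\n", ip_class(0x90D60614u));
    return 0;
}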
Thus an alternative solution was considered - establish physical links and routers
to connect networks, and apply the same higher-level communication protocol
for each machine so that receivers can understand the content of packets sent
from senders. The most suitable choice of the common high-level communication
protocol is TCP/IP, because it is simple enough (only two layers) for everyone to
implement.
• Bad classification: Originally, having more than one class in the design of IP
addresses was a good idea for different network groups, but the classification
was unrealistic. The number of hosts in a Class A network is unrealistically
large (16.8 million hosts); no network will accommodate such a large number
of host machines, so much of the address space in this class is wasted. Class
B has the best design, since the sizes of both the network and host identifiers
are appropriate. No one uses Class C to assign IP addresses, because the
number of hosts is too small (254 hosts); that number is insufficient for
the network of a large organisation.
• Header length (HLEN) (four bits): This four-bit field defines the length of
the datagram header in four-byte words. This field is required because the
length of an IP header is not fixed. If there are no options, the minimum
header length is 20 bytes and thus the value shown in this field is 5 (5 × 4 =
20). The maximum value of this field is 15, so the maximum header length is
60 bytes.
• Type of service (eight bits): This field lets the sender tell the routers (the
inter-networking devices used to route this datagram) how to handle this
datagram. Figure 3.7 shows the structure of the eight bits.
The TOS bits are a four-bit field used to describe a datagram. However, since
IP provides a connectionless service only (see Topic 2 for the characteristics
of a connectionless service), these descriptions are of little practical use for
the communication.
• Total length (16 bits): This field indicates the total length of a datagram
(including the header). The unit of this field is the byte (1 byte = eight bits).
The maximum value of this field is 2^16 - 1, i.e. 65,535, and thus the maximum
datagram length is 65,535 bytes.
Now we know the maximum size of a datagram. Then you might have a
question: How are the data handled if the data size is larger than the
maximum datagram size excluding the IP and TCP headers? That is easy to
answer: The data should be divided into several parts so that the size of all
parts is less than the allowable maximum. This job should be done by the
application layer since the maximum datagram size is well known, as is the
size of the IP and TCP headers.
A second question is not easy to answer: If a datagram size is larger than the
maximum packet size of a physical network, how should this be handled?
The solution is fragmentation and reassembling. Fragmentation is a way to
divide a datagram into small datagrams (fragments), whereas reassembling is
recombining all fragments into a datagram. Fragmentation of IP datagrams is
necessary because this feature allows networks with different maximum
packet sizes to be connected, especially networks whose maximum packet
size is less than the maximum size of IP datagrams. Another minor advantage
is that short packets are preferred because long packets make other stations
suffer long transmission delay.
When the size of a datagram is larger than the maximum size, a router breaks
the datagram up into a number of small fragments. The IP header of the
datagram is removed first. Then the data field is cut into several small parts
and each part has an IP header attached to form a new datagram. We call this
kind of new datagram a "fragment". The IP layer of the destination can then
reassemble the fragments into the complete datagram before passing it up to
the upper layer protocol (say TCP) entity. The reassembling procedure is also
simple - collect all fragments, remove their headers and combine them all in
sequence.
• DF (one bit): When it is set to one, it tells the Internet (router) not to fragment
the datagram. If it is equal to 0, the contents can be fragmented.
• Fragment offset (13 bits): This tells where this fragment belongs in the
containing (original) datagram. To reassemble, the destination host must
obtain all fragments, starting with the fragment that has offset 0 through the
fragment with the highest offset.
However, the receiver does not necessarily get them in order. Suppose the first
one received is the first fragment (MF = 1 and fragment offset = 0), and the
receiver then waits for the rest. The second one received is the fourth fragment
(MF = 0 and fragment offset = 3); the receiver knows it is the last one, but the
second and the third ones have not been received yet. Later, the second and the
third arrive and the receiver starts to do the reassembling. It combines all
fragments in order and then sends the complete datagram to the upper layer.
• Time to live (eight bits): This specifies how long the datagram is allowed to
remain in the Internet. When a source sends the datagram, it stores a number
in this field; usually, it is set to twice the maximum number of routers
between the source and destination. For example, the maximum number of
routers in Figure 3.2 is five (refer to Self-test 3.1), so the value of "time to
live" will be set to 10. When a router receives the datagram, it decrements
the value of this field by one. If a router receives a datagram whose time-to-
live value has been reduced to zero, the router discards the datagram.
• Protocol (eight bits): This tells the network access layer in the destination host
which upper protocol process to give the datagram to. Usually, it is TCP or
UDP.
• Header checksum (16 bits): The receiver uses this field to check whether the
received header is correct or not. The standard Internet checksum (a 16-bit
ones' complement sum) is used for error-checking.
• Source address (32 bits): This 32-bit field defines the IP address of the source.
• Destination address (32 bits): This 32-bit field defines the IP address of the
destination.
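To make the field layout concrete, the fixed 20-byte header can be sketched as a C struct (our sketch; the field names are ours, and real implementations use explicit shifts and masks rather than relying on struct layout):

#include <stdint.h>

/* Sketch of the fixed 20-byte IPv4 header described above. */
struct ip_header {
    uint8_t  version_hlen;    /* 4-bit version, 4-bit HLEN (4-byte words) */
    uint8_t  type_of_service; /* how routers should handle this datagram */
    uint16_t total_length;    /* total datagram length in bytes (max 65,535) */
    uint16_t identification;  /* groups fragments of the same datagram */
    uint16_t flags_fragoff;   /* DF/MF flags (3 bits) + 13-bit fragment offset */
    uint8_t  time_to_live;    /* decremented by each router; 0 => discard */
    uint8_t  protocol;        /* upper protocol, e.g. 6 = TCP, 17 = UDP */
    uint16_t header_checksum; /* Internet checksum of the header */
    uint32_t source_addr;     /* 32-bit source IP address */
    uint32_t dest_addr;       /* 32-bit destination IP address */
};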
3.3.5 IP Routing
This sub-section demonstrates how to route a datagram from a source to its
corresponding destination.
Consider the network shown in Figure 3.8. Suppose H1 wants to send a packet to
H3. We know H1 is an end-station in network NetA, and H2, H3, and H4 are end-
stations in network NetD. H1 communicates with other stations by using the
native protocol of the network NetA (say PrA, e.g. Ethernet). Similarly, H2, H3,
and H4 communicate with other stations with the native protocol of the network
NetD (say PrD, e.g. Token Ring). Note that it is possible that PrA, PrB, PrC, and
PrD do not use the same protocol.
Thus, H1 will transmit the packet by using an IP protocol which H1, R(ABD) (a
router connected to NetA, NetB, and NetD), and H3 understand and agree on. At
the start of the transmission process, in the IP layer, H1 puts H3Ês IP address in
the destination address and its own IP address in the source address. Then, in the
Host-to-Network layer (PrA), H1 puts its own low-level network address in the
source address field of the header of the PrA-PDU. However, at this time, H1 puts
the low-level network address of R(ABD) in the destination address field. The
reason is that H3's low-level network address cannot be identified by PrA, as the
address belongs to NetD and its format is in PrD. Therefore, instead of sending
the packet directly to H3, H1 will send the packet to router R(ABD) and expect
the router to redirect the packet to H3. The packet formats in the different layers -
and the travelling path of the packet - are shown in Figure 3.9.
When NetA routes the PrA-PDU to the destination R(ABD), R(ABD) will extract
the IP datagram from the PrA-PDU, look at the destination address and decide
that the destination is on NetD. So, R(ABD) sends the datagram to station H3
through NetD, embedding the datagram in a PrD-PDU. This time, the source
address of the PrD-PDU is the low-level network address of R(ABD), and the
destination address is the low-level network address of H3. R(ABD) knows the
low-level network address of H3 because they are in the same network, i.e. NetD.
When H3 receives the PrD-PDU, it will extract the IP datagram and obtain the
data.
2. R(AC) might also receive this packet, but when it checks the destination
address of the IP datagram, it would find that the destination station does
not belong to NetA or NetC. So, R(AC) will take action according to its
routing table - if the destination network address is unknown, it may
forward it directly to the default route; and
3. All routers have at least two layers - IP and host-to-network. The reason is
that, routers need an IP layer to extract IP datagrams and perform IP
routing, and use the host-to-network layer to do the actual switching and
forwarding functions.
corresponding mask is "ANDed" with the destination. Let us check the first entry
of Table 3.1. When the destination address is "ANDed" with 255.0.0.0 (the network
mask in the first entry of Table 3.1), we have 144.0.0.0. Therefore, the destination-
network address should be 144.0.0.0. From the first entry of the routing table in
Table 3.1, the destination-network address is 20.0.0.0, and thus the first entry is
not matched. The "AND" operation of the above process is shown below:
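The original figure is not reproduced here, but the operation itself is just a bitwise AND. A small C sketch (the destination address 144.214.6.20 is an assumption, chosen to be consistent with the fifth-entry match described below):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t dest = 0x90D60614u;   /* assumed destination: 144.214.6.20 */
    uint32_t mask = 0xFF000000u;   /* 255.0.0.0 */
    uint32_t net  = dest & mask;   /* bitwise AND */

    /* Prints 144.0.0.0, which does not equal 20.0.0.0, so the first
     * entry of the routing table is not matched. */
    printf("%u.%u.%u.%u\n",
           (unsigned)(net >> 24) & 0xFFu, (unsigned)(net >> 16) & 0xFFu,
           (unsigned)(net >> 8) & 0xFFu, (unsigned)net & 0xFFu);
    return 0;
}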
Note that it is easy to design a mask for a network identifier. We first check how
long the network identifier is, and then we set all the bits in the network
identifier to one, and the rest of the bits (the host identifier) to zero - and that
becomes the mask of the network. That is, the network-id mask is a bit
combination used to describe which portion of an address refers to the network
and which part refers to the host.
The above process is repeated for each entry in the router table, and finally we
find that the fifth entry matches. Thus, we know the network identifier of the
destination address is 144.214.0.0. From the table, we find that we need to send it
to 192.4.10.8, the IP address of router R3 (the next hop). Note that although R3 has
two IP addresses, we send the packet to 192.4.10.8 but not to 144.214.0.5. The
reason is that, R2 and R3 are in the same network 192.4.10.0. Therefore, R2 knows
the low-level network address of R3 and one of the IP addresses of R3 (the one
with the same network identifier of R2). Therefore, the packet will be sent from
192.4.10.9 through the network 192.4.10.0 to router R3 with the IP address
192.4.10.8. Follow this description very carefully with regard to Figure 3.10 below.
Note that "direct deliver" in the routing table in R2 means that the packet has
already arrived at the destination network. Thus the packet will be directly
delivered to the destination station, since the router knows the low-level network
address and the IP address of the destination.
Sometimes, there might be more than one path from one end to the other.
However, a routing table provides only one next hop for each destination
network; the next hop is usually chosen as the one whose path from the router
offers the shortest transmission delay.
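Putting the mask-and-compare rule together with the routing table, the lookup can be sketched in C as follows (ours, for illustration; only two entries are shown, the next hop of the first entry is assumed, and the second entry mirrors the 144.214.0.0 to 192.4.10.8 case described above):

#include <stdio.h>
#include <stdint.h>

struct route { uint32_t network, mask, next_hop; };

static void print_ip(uint32_t a) {
    printf("%u.%u.%u.%u\n", (unsigned)(a >> 24) & 0xFFu,
           (unsigned)(a >> 16) & 0xFFu, (unsigned)(a >> 8) & 0xFFu,
           (unsigned)a & 0xFFu);
}

int main(void) {
    struct route table[] = {
        { 0x14000000u, 0xFF000000u, 0xC0040A01u }, /* 20.0.0.0, mask 255.0.0.0 */
        { 0x90D60000u, 0xFFFF0000u, 0xC0040A08u }, /* 144.214.0.0, mask 255.255.0.0 */
    };
    uint32_t dest = 0x90D60614u;                   /* 144.214.6.20 */

    /* AND the destination with each entry's mask and compare. */
    for (int i = 0; i < 2; i++) {
        if ((dest & table[i].mask) == table[i].network) {
            print_ip(table[i].next_hop);           /* prints 192.4.10.8 */
            break;
        }
    }
    return 0;
}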
signalling errors or unusual situations. The error messages can be classified into
five groups:
1. Source quench: Quench might be a new word for you. Two synonyms for
quench, as it is used here, are to reduce or to put out (like a fire) - in other
words, to stop or minimize something. A router sends a source quench
whenever it has already received too many datagrams, and no more buffer
space is available to receive more datagrams. The router will be temporarily
out of space and start to discard any incoming datagrams. When it discards
datagrams, it sends a source quench message to the sender of the discarded
datagrams and expects it to reduce the transmission rate.
2. Time exceeded: There are two situations in which this message is sent to the
sender when a router receives its datagram:
(a) When the time-to-live of the datagram has expired (i.e. the datagram
has stayed in the network for so long that the value of time-to-live
has been reduced to zero). Can you remember what determines
the "length" of time-to-live?
(b) When the reassembly timer has expired. When a datagram has to be
fragmented during the transmission, it is divided into small fragments
and all of them will continue to be sent to the receiver. When the
receiver gets the first fragments, it will start a reassembly timer and
then wait for the rest of fragments. If the timer has expired but the
receiver has not received all of the fragments, the received fragments
will be discarded and an ICMP message will be sent to inform the
sender.
READING
SELF-TEST 3.3
1. An IP address is 144.214.6.20.
(a) What is the class of the IP address?
(b) What are the network and host identifiers of the IP
address?
You might already have a question in mind: Why do we need TCP to provide a
reliable point-to-point service? Almost all low-level network protocols such as
Ethernet and Token Ring are able to provide reliable services, so TCP should not
be necessary in order to handle reliability again. Moreover, the efficiency of TCP
is lowered by processing this duplicated function.
The above description also explains why TCP can handle multiplexing. Here,
multiplexing means the way to handle more than one connection in a single
computer. For example, you are allowed to visit the CNN Web page, download a
file through FTP services and send email at the same time. Since each connection
has a unique port number, it is easy to identify which packets belong to which
connection.
Segment Header
PDUs (Protocol Data Units) of TCP are called segments. A segment header has a
fixed size (20-bytes) and is shown in Figure 3.11.
• Source port (16 bits): This 16-bit field defines the TCP port number of the
source application program.
• Destination port (16 bits): This 16-bit field defines the TCP port number of the
destination application program.
• Sequence number (SEQ) (32 bits): This identifies the position in the sender's
byte stream of the data in the segment. It is used for the positive
acknowledgement time-out retransmission mechanism, described later.
• TCP header length (four bits): This shows the length of the header of the TCP
segment (in units of 32-bit words).
• URG (one bit): When it is set to one, the urgent pointer is in use. It is used to
draw the attention of the receiver.
• PSH (one bit): When it is set to one, it indicates to the receiver that it should
deliver the data (and any already buffered) to the application program;
otherwise, the receiver may buffer (and only deliver when the buffer is full) for
efficiency. This bit is used when the sender temporarily has nothing to send, or
at the end of data transmission, so that the receiver has time to handle the
received data.
• FIN (one bit): This bit is used for connection release. When it is set to one, the
sender has reached the end of its byte stream.
• Window size (16 bits): This field is used for flow control. The mechanism of
the flow control is described later.
• Checksum (16 bits): This field is used by the receiver to check whether the
received segment is correct or not. The standard Internet checksum (a 16-bit
ones' complement sum) is used for error-checking.
• Urgent pointer (16 bits): This is used to specify the position in the segment at
which urgent data ends. The urgent data are in the data field of the segment.
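As with the IP header, the fixed 20-byte layout can be sketched as a C struct (our sketch; field names are ours, and the flag bits are shown as masks since C bit-field layout is compiler-dependent):

#include <stdint.h>

/* Sketch of the fixed 20-byte TCP segment header described above. */
struct tcp_header {
    uint16_t source_port;     /* sender's TCP port number */
    uint16_t dest_port;       /* receiver's TCP port number */
    uint32_t seq_number;      /* position in the sender's byte stream */
    uint32_t ack_number;      /* ACKN: next byte expected from the peer */
    uint16_t hlen_flags;      /* 4-bit header length + flag bits below */
    uint16_t window_size;     /* flow control: bytes the peer may send */
    uint16_t checksum;        /* Internet checksum of the segment */
    uint16_t urgent_pointer;  /* where urgent data ends (if URG is set) */
};

/* Flag masks within hlen_flags (low-order bits). */
#define TCP_FIN 0x0001   /* sender has reached end of its byte stream */
#define TCP_SYN 0x0002   /* connection set-up */
#define TCP_RST 0x0004   /* connection resetting */
#define TCP_PSH 0x0008   /* deliver buffered data to the application */
#define TCP_ACK 0x0010   /* ACKN field is valid */
#define TCP_URG 0x0020   /* urgent pointer is in use */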
1. To get things going, the sender sends a connection set-up request to the
receiver. The request segment includes SYN = 1, ACK = 0, and SEQ = x,
where x is an arbitrary positive integer less than 2^32 - 1.
2. When the receiver receives the message and wants to accept the request, it
replies with a connection set-up accept to the sender. The reply segment
includes SYN = 1, ACK = 1, SEQ = y and ACKN = x + 1. The ACKN is set to
x + 1 because the receiver wants to indicate to the sender that it correctly
received its message with sequence number x and expects to receive
message x + 1. The SEQ is set to y, where y is another arbitrary positive
integer.
3. After the sender receives the accept message, it sends a connection set-up
confirm message to the receiver to confirm the connection, and then data
transmission will begin. The SEQ and ACKN are set to x + 1 and y + 1
respectively, because they indicate to the receiver that the sender has
correctly received the receiverÊs message with sequence number y. Note
that once the connection is established, both sides can send and receive
segments simultaneously.
Note that a new set of starting sequence numbers is used on connection set-up.
This is to avoid any segment from a previous connection session between the
same processes confusing the current connection.
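The arithmetic of the handshake is easy to check with a short trace. The sketch below is ours, using assumed (arbitrary) initial sequence numbers x = 100 and y = 300:

#include <stdio.h>

int main(void) {
    unsigned x = 100, y = 300;   /* assumed initial sequence numbers */

    /* The three segments of the connection set-up described above. */
    printf("1. sender   -> receiver: SYN=1 ACK=0 SEQ=%u\n", x);
    printf("2. receiver -> sender:   SYN=1 ACK=1 SEQ=%u ACKN=%u\n", y, x + 1);
    printf("3. sender   -> receiver: ACK=1 SEQ=%u ACKN=%u\n", x + 1, y + 1);
    return 0;
}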
Connection Termination
TCP connection termination is a four-way handshaking mechanism, as shown in
Figure 3.13.
Connection Resetting
TCP may make a request for connection resetting. A connection must be reset if
the current connection is destroyed. There are three possible situations in which a
connection could be destroyed:
1. The sender requests a connection to a port that does not exist or is occupied
by other users. Then the receiver will send a segment with RST = 1 to reject
the request;
3. One side might find the other side is idle for a long time; it sends a segment
with RST = 1 to destroy the connection.
Data Transfer
Normal Operation
TCP is a reliable transport protocol. Damaged and lost segments are handled by a
positive acknowledgement time-out retransmission mechanism.
For example, a sender sends a 100-byte segment with SEQ = 1,000 to its
corresponding receiver. If the receiver receives it correctly, the receiver will reply
with a positive acknowledgement (a TCP segment with no data and ACK = 1)
with SEQ = 1,100, i.e. the sequence number that the next segment sent to the
receiver should carry. You might wonder why the sequence number of the
positive acknowledgement is 1,100, and not 1,001. In fact, this is how TCP is
implemented: TCP adds the data size to the received SEQ number and
acknowledges with

SEQ = SEQ(received) + data size(received) = 1,000 + 100 = 1,100.

Figure 3.14 shows the normal operation of TCP segment transmission.
The error correction mechanism is simple. When a sender sends a segment out, it
starts a timer for the segment. When the timer expires (the time-out period is
over) but no positive acknowledgement has been received, the sender assumes
the sent segment is lost or damaged, and thus it will be retransmitted. Then a
timer for the retransmitted segment will be started. The sender hopes the receiver
will receive the segment correctly this time, and that the sender will receive a
positive acknowledgement within the time-out period.
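The mechanism can be sketched as a loop. The C below is a toy simulation of ours, not real TCP; the "lost" first attempt is contrived simply to show a retransmission happening:

#include <stdio.h>
#include <stdbool.h>

/* Pretend the first transmission is lost; a true return value stands
 * for "positive ACK received before the time-out". */
static bool send_segment(unsigned seq, int attempt) {
    printf("send segment SEQ=%u (attempt %d)\n", seq, attempt);
    return attempt > 1;
}

int main(void) {
    unsigned seq = 1000;
    int attempt = 1;

    /* Keep retransmitting until a positive ACK arrives. In real TCP
     * the wait is governed by a timer started for each segment. */
    while (!send_segment(seq, attempt)) {
        printf("time-out: no ACK for SEQ=%u, retransmitting\n", seq);
        attempt++;
    }
    printf("ACK received for SEQ=%u\n", seq);
    return 0;
}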
Damaged Segment
Figure 3.15 shows a damaged segment arriving at the destination. The sender
sends three segments to the receiver, each of 100 bytes. The sequence number of
the first segment is 1,000. The receiver receives the first and second segments,
checks them and finds there is no error. Then it acknowledges the two segments
by sending an acknowledgment with the sequence number 1,200 to the sender.
The sequence number 1,200 means that all bytes up to 1,199 have been received
successfully and the receiver is expecting the next segment with sequence
number 1,200. This is called a cumulative (accumulative) acknowledgement system.
Note that sometimes the receiver is not required to acknowledge every segment.
When a sender sends many segments to its corresponding receiver, the receiver
may send back fewer acknowledgements rather than the same number of
acknowledgements, because this saves network resources. But of course the most
important thing is to make sure that the timer in the sender does not expire, i.e.
every successfully received segment should be acknowledged before time-out.
Now the third segment with sequence number 1,200 is sent, but it is damaged
before arriving at the receiver. This time the receiver does nothing and lets the
timer in the sender expire. After the timer expires, the sender retransmits the
third one, and this time the transmission is successfully completed.
Lost Segment
Figure 3.16 shows a lost segment in the data transmission. This is similar to the
situation of a damaged segment. The third segment with sequence number 1,200
is sent but lost before arriving at the receiver. Since the receiver does not receive
anything, it does nothing and the timer at the senderÊs end expires. After the
timer expires, the sender retransmits the third one. This time, the transmission is
successfully completed.
Lost Acknowledgement
Figure 3.17 shows a lost acknowledgement sent by the receiver. Sometimes the
sender does not even notice a lost acknowledgement. In this example, the first
acknowledgement with sequence number 1,200 is lost but the second one with
sequence number 1,300 is received properly. Since this is an accumulative
acknowledgement system, if the second acknowledgement (SEQ = 1,300) is
received and the timers of the three segments have not expired, all three
segments are successfully acknowledged.
Even if the second acknowledgement is not received, the system can still recover
from this error. In this situation, the timer of the first segment will expire first,
and then the first segment will be retransmitted. According to the sequence
number of the segment, the receiver knows it is a duplicate segment. The receiver
will discard the segment and send the corresponding acknowledgement back to
the sender. The rest of the segments are handled in the same way.
Piggybacking
TCP offers full-duplex service, in which data can flow in both directions
simultaneously. When a TCP connection is established between A and B, A can
send data to B and B can send data to A. When a segment is sent from A to B, it
can also carry an acknowledgement of packets received from B. Similarly, a
segment sent from B to A can carry an acknowledgement of packets received
from A. This is called "piggybacking", because acknowledgements can be sent
along with the data. Note that if one side does not have any data to send, it can
just send an acknowledgement without data. Figure 3.18 shows TCP
communication with piggybacking.
Flow control
A network is stable if the input rate is the same as the output rate.
When a sender injects packets into a network, the network routes the packets to
its corresponding receiver and the receiver retrieves the packets from the
network. If the number of packets sent is not large, flow control is not important,
because the receiver can retrieve the packets slowly from the network, and the
input packets will not cause network congestion. However, if the number of
packets sent is sufficiently large or the packet transmission time is sufficiently
long, the input rate must be controlled - especially if the input rate is greater than
the packet-retrieving rate of the receiver. Such a discrepancy (difference) causes
network congestion.
The easiest way to solve this problem is to let the receiver control the input rate.
This means that the receiver tells the sender how many packets it can send. This
kind of control is known as flow control. The receiver uses flow control to control
the rate of packets that it is receiving. This is analogous to a conversation between
a young man and an old man. The young man speaks so fast that the old man
cannot follow what he is saying. The old man will ask the younger one to speak
slowly so that he can keep up with him.
The flow control protocol used in TCP is dynamic window protocol. The rules of
the dynamic window protocol are simple:
2. The sender keeps a send window size variable, which is the number of
packets it can send. On sending a packet, the send window size is reduced
by one. If the senderÊs window becomes zero, the sender stops sending
packets.
3. On receiving a window advertisement, the sender sets its send window size
to the value of the window size contained in the window advertisement.
If the receiver cannot handle any more packets for now (e.g. the computer is too
busy to handle other things or suddenly there is no buffer to receive data), it can
send a window advertisement whose window size is zero to stop the sender.
Figure 3.19 shows an example of flow control. The maximum segment size of the
sender is 1,000 octets, and the maximum window advertisement is 2,500 octets.
To start the process, the receiver sends an advertisement window of 2,500 to the
sender to indicate that the receiver allows the sender to send packets with a
maximum of 2,500 octets. Then the sender starts to send packets. First, it sends
packets of 1,000 octets with sequence numbers up to 1,000. Later, it sends packets
of 1,000 octets again but with the sequence number up to 2,000. Finally, the third
group of packets of 500 octets is sent and then the sender is blocked, because the
send window size is 0. The receiver correctly receives all packets and
acknowledges all of them. The sender receives all acknowledgements, but it waits
because the send window size is still zero. Later, the application program of the
receiver reads 2,000 octets and thus 2,000 octets of buffer space become free
(available). The receiver sends a window advertisement of 2,000 to the sender to
inform it. Then the sender receives the message and starts to send more packets
(2,000 octets) until the send window size becomes zero again. The receiver
correctly receives all packets and acknowledges them.
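A small C sketch of ours can replay the window bookkeeping of this example (the octet counts follow the description above; note the example counts octets, while the simplified rules earlier counted packets):

#include <stdio.h>

int main(void) {
    int window = 2500;                 /* send window after the advertisement */
    int sizes[] = { 1000, 1000, 500 }; /* the three groups of packets sent */

    for (int i = 0; i < 3; i++) {
        window -= sizes[i];            /* shrink the send window on each send */
        printf("sent %d octets, send window now %d\n", sizes[i], window);
    }
    printf("window is %d: sender blocked\n", window);

    window = 2000;                     /* new window advertisement arrives */
    printf("advertisement received, send window now %d\n", window);
    return 0;
}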
The maximum send window size (also the maximum value of window
advertisement), can have a significant effect on the performance of the protocol in
the maximum data transfer rate. The following numerical example shows the
effect of the window size.
To complete a packet transmission, the sender first sends a packet to the receiver.
The packet arrives at the receiver, after the end-to-end propagation delay. Then,
after the packet transmission time, the receiver correctly receives the whole
packet and then processes the packet (e.g. does the error checking, places it in the
receiverÊs buffer) and then sends an acknowledgement back. After the sender
receives the acknowledgement, the packet transmission is completed.
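The module's numerical example is not reproduced here, but the underlying relation is that at most one maximum window of data can be outstanding per round trip, so the maximum transfer rate is roughly W / RTT. A short sketch with assumed figures:

#include <stdio.h>

int main(void) {
    double W   = 2500.0;    /* max window advertisement in octets (as above) */
    double RTT = 0.050;     /* assumed round-trip time: 50 ms */

    double rate = W / RTT;  /* octets per second */
    printf("max transfer rate = %.0f octets/s (%.1f kbit/s)\n",
           rate, rate * 8 / 1000.0);
    return 0;
}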
Congestion Control
A packet sent from the sender might need to pass through several routers to
reach the receiver. Each router has a buffer that stores the incoming packets,
processes them (e.g. extracts the destination IP address of a packet, searches the
routing table), and forwards them. If the packet-receiving rate is faster than the
processing rate, congestion occurs and some packets may be discarded. The worst
possibility is that when some packets are discarded, they cannot reach their
destination, and therefore, their senders will not receive any positive
acknowledgement. Then, the senders will retransmit their packets, thereby
creating more congestion.
TCP needs to solve this problem. In the real world, if there is traffic congestion,
we usually need someone (e.g. a police officer) to monitor the traffic. However,
since the Internet does not belong to any organisation or company, it is difficult
to find someone doing the same thing. Thus a distributed congestion control
mechanism is needed (i.e. everyone takes some of the responsibility for solving
the congestion problem). The name of the congestion control method in TCP is
the slow-start algorithm. Let us go through the method; then we explain why it
is called slow-start.
The sender uses the smaller of the two for actual transmission. A threshold, T, is
an integer such that the congestion window increases exponentially until it
reaches the threshold. Usually, T is initially set to 64 Kbytes. The procedure of
the slow-start algorithm is listed below:
1. Wc = 1 (usually 1 Kbyte).
2. When (a) a window is sent, (b) there is no time-out, and (c) Wc is smaller
than T (Wc < T), then Wc = min(2 × Wc, T) (growth rate is exponential).
3. When (a) a window is sent, (b) there is no time-out, and (c) Wc is not
smaller than T (Wc >= T), then Wc = Wc + 1 (growth rate is linear).
Here is an example to show the process of congestion control. The initial value of
T is 64 (i.e. 64-Kbytes), and time-out has occurred when the transmission number
is 12. Table 3.2 and Figure 3.21 show an example of congestion control. Study
Figure 3.21 carefully.
Table 3.2: Congestion Control Algorithm (Time-out Occurred at Trans. No. 12)

Trans. No.   0   1   2   3   4   5   6   7   8   9  10  11
Wc (KB)      1   2   4   8  16  32  64  65  66  67  68  69

Trans. No.  12  13  14  15  16  17  18  19  20  21  22
Wc (KB)     70   1   2   4   8  16  32  35  36  37  38
Figure 3.21 shows that the congestion window increases very slowly at the very
beginning (that is why it is called slow-start) but then increases quickly later.
When it increases up to 64, the rate of increase changes from exponential to
linear. When the 12th group of packets is sent, its acknowledgement times out
(the timer expires but the acknowledgement has not been received) and thus
the threshold is reduced from 64 to 70 / 2, i.e. 35. The congestion window is
reset to one. Then the congestion window increases again at the exponential
rate until it meets the new threshold, 35.
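The procedure can be checked against Table 3.2 with a short simulation (our sketch; the time-out at transmission number 12 is hard-coded to reproduce the table):

#include <stdio.h>

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

int main(void) {
    unsigned Wc = 1;   /* congestion window, in Kbytes */
    unsigned T  = 64;  /* threshold, in Kbytes */

    for (int trans = 0; trans <= 22; trans++) {
        printf("trans %2d: Wc = %2u KB\n", trans, Wc);
        if (trans == 12) {             /* time-out: halve threshold, reset */
            T  = Wc / 2;               /* 70 / 2 = 35 */
            Wc = 1;
        } else if (Wc < T) {
            Wc = min_u(2 * Wc, T);     /* exponential growth phase */
        } else {
            Wc = Wc + 1;               /* linear growth phase */
        }
    }
    return 0;
}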
From the above example, we see that there are two application programs, X and
Y, in a machine connected through the Internet to another two application
programs, M and N respectively, in another machine. Although X and Y are in
the same machine with the IP address 144.214.12.38, X and Y will not have any
conflict because they have different TCP port numbers (X is 290; Y is 23). Also,
they will not communicate with the wrong application programs, because their
partners also have different TCP port numbers (M is 1326 while N is 2529).
SELF-TEST 3.4
UDP is a simple process-to-process protocol that adds only port addresses,
checksum error control, and length information to the data from the upper layer. There
is no positive acknowledgement time-out retransmission scheme, since there is
no sequence number in the UDP header. There is no flow control since there are
no window advertisements and window size fields. There is no congestion
control, because no acknowledgements are expected. That is why it is a
connectionless protocol.
Basically, UDP is a simplified TCP that is suitable for applications requiring short
communication exchanges. For short communications, the TCP connection setup
time is a heavy overhead and thus, UDP is more suitable or efficient. Also, since
the communication is short, flow control and congestion control do not need to be
applied in the communication.
The following short reading deals with features of UDP and TCP. Please pay
particular attention to the transmission parameters that are ideally served by
UDP, and note the reliability guarantees offered by TCP.
READING
DNS specifies only the top level. The authority of assigning domain names under
each node in the top level is delegated to the organisations responsible for that
node. For example, the Hong Kong Internet Organization is responsible for the
hk domain. Another example is the Open University of Hong Kong. It has the
authority to assign any domain name with ouhk.edu.hk as the suffix, e.g.
learn.ouhk.edu.hk. The meaning of domain names is shown in Table 3.3.
DNS Server
A DNS server provides domain name mapping services to its clients. When a
client sends a service request to map the domain name of a machine, the DNS
server replies with the IP address of the machine. This process is called name
resolving. Each machine on the Internet has a piece of software for resolving
names. It is often known as a resolver. For example, in UNIX, this is accessed by
calling gethostbyname. A resolver is configured with the IP address of a local
DNS server. When called, it packages a request to that DNS server. When the
DNS server returns the result, the resolver relays the result to the caller.
When a request reaches a DNS server (usually the closest DNS server), the name
is extracted. If the server is an authority for the name, the name appears in its
mapping database and a lookup of the database will return the IP address.
Otherwise, this DNS server will become the client of another DNS server and will
send a request to that DNS server. When the reply comes back, it in turn replies
to the resolver.
1. Replication: Each root server is replicated; many copies of the server exist
around the world. In practice, the geographically closest server usually
responds best, and the many duplicated root servers share the load.
READING
3.5.2 Email
Simple Message Transfer Protocol (SMTP)
Email is one of the most common Internet applications in the world. Figure 3.25
shows a general architectural model of an email system. This system is called
SMTP (Simple Message Transfer Protocol).
The addressing of SMTP is simple. Each electronic mailbox has a unique address,
which is divided into two parts - the first part identifies a user's mailbox and the
second part identifies the computer on which the mailbox resides. For example,
the email address mt368@learn.ouhk.edu.hk represents "an email account called
mt368 on the learn server of The Open University of Hong Kong". Email software
on the sender's computer uses the second part to select the destination. The email
software on the recipient's computer uses the first part to select the particular
mailbox.
Email Format
Now, let us investigate the message format of email. An email has two parts:
header and body. The header includes the following fields:
• To: email address(es) of primary recipient(s)
• Cc: email address(es) of secondary recipient(s)
• Bcc: email address(es) of blind copies (the same as Cc, but the primary
recipient(s) cannot see who else has received this email)
• From: the person who created the message
• Sender: email address of the actual sender
• Date: the date and time that the message was sent
• Subject: subject matter of the message
• Message-id: a unique number for reference
• Reply to: email address to which replies should be sent.
The body is simply the message itself.
Security
The current Internet email systems have significant security weaknesses.
1. Senders can be faked. Anyone can fake an email address as the sender.
SMTP can be accessed without any protection.
2. Messages can be tapped. Anyone who can tap into the path of the message
can get a copy easily.
One of the most common ways to handle this is MIME (Multipurpose Internet
Mail Extension). The basic idea of MIME is to add structure to the message body
and define encoding rules for non-ASCII data, i.e. binary code → ASCII text
→ e-mail → ASCII text → binary code. We encode every 6 bits as one of 64 base
characters: A, B, C, ..., Z, a, b, c, ..., z, 0, 1, 2, ..., 9, + and /.
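A sketch of this 6-bit encoding in C follows (ours; the table is the standard base64 alphabet just listed, and the '=' padding rule is part of standard base64 rather than something spelled out in the text above):

#include <stdio.h>
#include <string.h>

static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes: every 6 bits of input select one of the 64 characters. */
static void base64_encode(const unsigned char *in, size_t len) {
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = in[i] << 16;              /* pack up to 3 bytes = 24 bits */
        if (i + 1 < len) v |= in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];

        putchar(tbl[(v >> 18) & 0x3F]);        /* first 6 bits  */
        putchar(tbl[(v >> 12) & 0x3F]);        /* second 6 bits */
        putchar(i + 1 < len ? tbl[(v >> 6) & 0x3F] : '=');
        putchar(i + 2 < len ? tbl[v & 0x3F] : '=');
    }
    putchar('\n');
}

int main(void) {
    const char *msg = "OUM";
    base64_encode((const unsigned char *)msg, strlen(msg)); /* prints T1VN */
    return 0;
}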
3.5.3 Telnet
Telnet is a remote terminal access protocol, shown in Figure 3.26. It allows a user
to access a remote computer as if he or she were directly interfaced to the
computer through the keyboard (for input) and display (for output).
When the Telnet operation is started, the Telnet client makes a TCP connection to
the remote server. After the connection is established, the keystroke input of the
client is transmitted to the server through the connection. The server handles the
keystroke input as local input and executes any commands from the input. The
response then appears on the server's screen display. At this moment, this
display is transmitted to the client and thus the client has the same screen
display. Usually the screen display is in text format (i.e. text output). Graphical
output is allowed for some Telnet software, but the communication is more
complicated.
FTP commands will be listed when you type "?" and then press the "enter" key at
the ftp prompt.
The two file types for FTP are text files and binary files. The content of a text file
is transferred as ASCII text. The content of a binary file is transferred as a byte
stream. Note that the default mode of FTP is for text files. If you want to change to
binary file mode, you need to type "b" or "binary" and then press the "return" or
"enter" key. If you want to change back, you just need to type "a" or
"ascii" and then press "return" or "enter". You can use the binary file mode to
transfer text files, but there will be a "^M" at the end of each line. However,
transferred files will be in the wrong format if you use text file mode to transfer
binary files.
SELF-TEST 3.5
ACTIVITY 3.1
If you work in an office with a network that is connected to the
Internet, please check the TCP/IP setting of your machine (e.g. IP
address). Also check what Internet applications are used in your
machine. Based on what you have learned in this topic, do you
believe your system is as efficient and effective as it could be?
This topic looked at some concepts of WANs, TCP/IP, and some Internet
applications.
By now you should know what a WAN is, and know about its packet-switching
technology. You should also understand the two reference models associated
with layered network architecture - the ISO OSI reference model and TCP/IP
reference models - and their differences.
Topic 4: Interprocess Communication (IPC) and Remote Procedure Calls (RPC)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the concepts of marshalling and unmarshalling;
2. Describe synchronous and asynchronous communication;
3. Outline the steps in a remote procedure call (RPC); and
4. Write network programs for RPC.
INTRODUCTION
What is interprocess communication (IPC)? IPC is a communication method that
allows communication among processes that might be located in different
machines. The second question is: Why do we need IPC? We know that the
components of a distributed system are both logically and physically separated.
We need IPC to let them communicate in order to interact.
2. For group communication, the exchange of messages takes place from the
server to a group of clients (members). We need a server to collect messages
from clients (members) and broadcast them to all members in the group.
For client-server communication, we use the two transport-level protocols - TCP
and UDP (User Datagram Protocol). TCP provides a two-way stream communication
between senders and receivers. It includes error control, flow control, and
congestion control. UDP provides a simple message-passing abstraction, which
means that it simply passes messages from a sender to its corresponding receiver,
and leaves the higher system layers to handle the controls.
This topic goes through the details of client-server communication. You study the
concepts of marshalling and unmarshalling with a common external data form -
External Data Representation (XDR). Later, we introduce the concept of
synchronisation between clients and servers, and investigate two kinds of
synchronisation - synchronous communication and asynchronous
communication. Then we discuss the design and implementation problems of
client-server communication. After that, we introduce a high-level model for
client-server communication - remote procedure call (RPC). We show its design
and implementation problems. Finally, we give the C source code of RPC and one
simple example to show how to use RPC programming to implement simple
client-server communications.
struct Data {
    int  length;
    char flag;
    char buffer[20];
};
To flatten this data structure, you would need to pack the 4-byte integer variable
length, then the 1-byte character flag, and then the next 20 bytes of character
array buffer to become the flattened external data form.
To send a data stream, you need to flatten the data first while, on receiving the
data stream, the data structure must be rebuilt for the receiver to receive the data
correctly.
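As a sketch of this flattening in C (ours; it assumes no byte-order or alignment conversion is needed, i.e. both machines are of the same type, as discussed next):

#include <string.h>

/* struct Data as defined above. */
struct Data { int length; char flag; char buffer[20]; };

/* Marshal struct Data into a flat 25-byte external data form:
 * 4 bytes of length, 1 byte of flag, then the 20-byte buffer. */
static void marshal_data(const struct Data *d, unsigned char out[25]) {
    memcpy(out, &d->length, 4);       /* pack the 4-byte integer   */
    out[4] = (unsigned char)d->flag;  /* pack the 1-byte character */
    memcpy(out + 5, d->buffer, 20);   /* pack the 20-byte array    */
}

/* Unmarshalling simply reverses the process at the receiver. */
static void unmarshal_data(const unsigned char in[25], struct Data *d) {
    memcpy(&d->length, in, 4);
    d->flag = (char)in[4];
    memcpy(d->buffer, in + 5, 20);
}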
Ć For two computers of the same type, the conversion to external data form
may be omitted. The sender may send the data bit by bit in its own way, and
the receiver can receive and rebuild the data easily, since the receiver knows
how to handle it.
Ć However, if two computers are not the same type, another way can be used
by converting data into an external data form (or format) · data values are
transmitted in their native form (such as character and integer format) using
special identifiers.
In IPC, the process of flattening the data is called marshalling, and the resulting
sequence of bits is called the external data form. The process of rebuilding the
data from the external data form is called unmarshalling.
The idea behind XDR is quite simple: each message consists of a sequence of
4-byte objects. The three types of object are cardinal/integer, character and
others. The first two are the most common types of data object, so we focus on
the last one. Everything other than the integer and character types is put into
the type 'others', which represents data as sequences of bytes with their length
specified. The representation of each string consists of an unsigned long
giving its length in bytes, followed by the characters of the string. The length
is stored in a 4-byte object, and for simplicity the characters of a string are
assumed to occupy one byte each. If a string cannot completely fill its last
4-byte object, the unused bytes are padded with zeros.
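The Sun XDR library automates exactly this layout. The short sketch below encodes
one string into its external data form in a memory buffer; xdrmem_create,
xdr_string and xdr_getpos are genuine Sun RPC library routines, while the buffer
size of 100 bytes is an arbitrary choice for the example:

#include <stdio.h>
#include <rpc/rpc.h>                /* brings in the Sun XDR routines */

int main(void)
{
    char  buf[100];                 /* will hold the external data form */
    char *msg = "Networks";         /* 8 characters */
    XDR   xdrs;

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (xdr_string(&xdrs, &msg, sizeof(buf)))
        /* buf now holds a 4-byte length (8) followed by "Netw" and
         * "orks" as two 4-byte objects (zero-padded if short). */
        printf("encoded %u bytes\n", xdr_getpos(&xdrs));
    return 0;
}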
4.2 SYNCHRONISATION
Now you know how computers communicate with each other and how they send
data structures using the marshalling process. However, we have said nothing
about how servers synchronise with clients. If a client starts to send a request
to its server, how does the server know when to receive the request? In our
discussion so far, the server was always waiting for the client's request. Is
that the only way to implement a server? What if the server has its own
background job running and takes care of an incoming request only when it
detects that one is arriving? This is the main issue discussed in this section.
The two types of communication operation are send and receive. Combined with
the above synchronisation operations, we have four types of synchronisation
operation for communication:
1. Blocking send: The issuing process blocks all processing in its own system
(i.e. control is not passed back) until the message sent to the receiving
process has been received. When a process uses the blocking send operation,
a message (request) is delivered to its corresponding remote process. Once
the remote process confirms it has received the message, the issuing process
continues executing its own processing. This is similar to phoning a friend
and waiting for your friend to answer - you don't leave the phone
unattended while you go and do something else.
2. Blocking receive: The issuing process blocks its own processing until a
message has arrived. In other words, if your phone is not being used (the
phone is blocked because it is not processing anything), then you can be
contacted by phone.
The following system calls are used for synchronous communication with error
handling:
• Send (B, msg, TO): A sender, say Machine A, sends a message msg to
Machine B, and blocks. If the message is not received within TO (time-out)
seconds (or no acknowledgement has been received after TO seconds), the
process is unblocked and an error code is returned to inform the sender.
If the client wants to retry after the time-out, it simply calls this
function again in a loop.
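As an illustration, the following sketch shows one plausible way to realise
Send (B, msg, TO) on top of a UDP socket: transmit the message, then wait for
the acknowledgement with select() and return an error code on time-out. The
socket and address parameters and the error-code names are our own assumptions,
not part of the module's primitive:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>

#define SEND_OK       0
#define SEND_TIMEOUT  -1            /* assumed error code for a time-out */

/* Blocking send with a TO-second time-out: transmit msg, then block
 * until an acknowledgement arrives or the timer expires. */
int Send(int sock, const struct sockaddr *dest, socklen_t dlen,
         const char *msg, size_t len, int TO)
{
    char ack[16];
    fd_set rfds;
    struct timeval tv;

    sendto(sock, msg, len, 0, dest, dlen);

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    tv.tv_sec = TO;                 /* the time-out, in seconds */
    tv.tv_usec = 0;

    if (select(sock + 1, &rfds, NULL, NULL, &tv) <= 0)
        return SEND_TIMEOUT;        /* unblock and report the error */

    recvfrom(sock, ack, sizeof(ack), 0, NULL, NULL);
    return SEND_OK;                 /* acknowledgement received */
}

A client wishing to retry after the time-out simply calls Send in a loop until
it returns SEND_OK or a retry limit is reached.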
The processing time of a client process is usually very short compared with the
round-trip time between the client and server. If the non-blocking send operation
is used, the network transmission time and the processing time of the remote
service can be ignored, since the client process executes its own processing
while the remote server processes its request.
1. It is difficult to inform the client process that the reply message has been
received.
2. If the client process is informed, it is not easy to switch from the current job
to handle the incoming message.
To overcome the first problem, you should consider using either polling or an
interrupt to receive the message. To implement polling, a child process is
created to wait for the incoming message. When the child process receives the
message, it stores it in a specified location which the parent process also
knows. After creating the child process, the parent process examines this
location from time to time. When the parent process finds that the message has
been received, it takes action to handle it.
Implementing the interrupt solution is more complicated. Just like polling, a
child process must be created to do the non-blocking receive operation. But this
time, the child process raises an interrupt to inform the parent process that
the message has arrived. The interrupt stops the execution of the parent
process, and the parent process then takes action to handle the message.
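A minimal sketch of the polling scheme, under the assumption that messages
arrive on a socket: the child blocks in recv and forwards the message through a
pipe (the "specified location"), while the parent polls the non-blocking read
end of the pipe between bursts of its own work. The routine do_own_work stands
in for the parent's background job:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

static void do_own_work(void) { /* stands in for the parent's own job */ }

void poll_for_message(int sock)
{
    int  pfd[2];
    char msg[512];
    ssize_t n;

    pipe(pfd);                          /* the shared "specified location" */
    if (fork() == 0) {                  /* child: blocks on the receive */
        n = recv(sock, msg, sizeof(msg), 0);
        if (n > 0)
            write(pfd[1], msg, n);      /* store the message for the parent */
        _exit(0);
    }

    fcntl(pfd[0], F_SETFL, O_NONBLOCK); /* parent polls without blocking */
    for (;;) {
        n = read(pfd[0], msg, sizeof(msg));
        if (n > 0) {                    /* message has been received */
            printf("got %ld bytes\n", (long) n);
            break;                      /* take action to handle it */
        }
        do_own_work();                  /* nothing yet: continue own work */
    }
}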
READING
SELF-TEST 4.1
1. When is it unnecessary to use the external data form in
marshalling?
procedure assumes the system has crashed and the procedure doOperation
will be aborted.
2. Loss of reply message (Case 2): A reply message might also be lost because
of communication link failure, network switch failure, and/or crash of the
receiver. The way to handle the loss of the reply message is the same as for
the loss of a request message.
3. Unsuccessful execution of the request (Case 3): This might happen if the
server crashes while executing the request. When it happens, the server will
shut down and have to be restarted. Usually, all uncompleted executions
before the server crash will be aborted and re-executed after the server
restarts.
from the history, the server process will copy it out, put it into the reply
message, and send the reply message back. The size of the history can be
controlled by a simple timer: when a result has been stored for longer than a
certain time limit (time-out period), it is discarded.
Compared with a duplicate request message, a duplicate reply message is easy to
handle. The procedure doOperation is closed when the reply message is received.
Thus, if the first reply message is received, the procedure is closed and the
duplicate reply message is simply ignored. If the first reply message is
actually lost, the procedure retransmits the request message; later, when the
duplicate reply message is received, the procedure just treats it as the first
reply message.
1. The request (R) protocol: Figure 4.4 shows the R protocol. A client issues a
procedure Send (ServerPort, RequestMessage) and continues its own
processing. It is suitable when no reply is required from the server and
when the client requires no confirmation that the request has been carried
out.
2. The request-reply (RR) protocol: Figure 4.5 shows the RR protocol, the most
commonly used. The reply message from the server also acts as an
acknowledgement of the original request message. A subsequent request
from the client may be regarded as an acknowledgement of the server's
reply message.
READING
Section 4.4, 155–64. This reading provides another perspective on
client-server communication.
SELF-TEST 4.2
1. What are the advantages of doOperation-getRequest-sendReply
communication (synchronous communication) over send-receive
communication (asynchronous communication)?
Figure 4.7 shows a typical RPC model using the RR protocol. When a process on
Machine A calls a procedure on Machine B, all processing on A is suspended
(blocked) and the execution of the called procedure takes place on B. When B
finishes executing the procedure, it sends the result to A. After A receives the
result, it resumes its execution (it unblocks). Note that information can be
transported from the caller to the callee in the parameters and can come back in
the procedure result. No message passing or I/O at all is visible to the
programmer.
Figure 4.8 provides another perspective on the 11 steps listed above. Note,
however, that Stevens's model describes the RPC process using ten steps (not
11), but the overall logic should still be clear to you.
Several things need to be clarified. First, both client and server stubs are
automatically generated by rpcgen. rpcgen is an interface processor (compiler)
that integrates the RPC mechanism with client and server programs in
conventional programming languages. It has four functions:
1. to generate a client stub procedure;
2. to generate a server stub procedure;
3. to use the signatures of the procedures in the interface to generate
marshalling and unmarshalling operations in the above stub procedures;
and
4. to generate procedure headings for each procedure in the service from the
interface definition.
The details of rpcgen are discussed later, in the section on RPC implementation.
Programmers don't need to marshal and unmarshal their request and reply
messages, since both client and server stub procedures are generated
automatically. Programmers also don't need to consider socket (TCP/IP)
programming for their message passing, because the stub procedures handle that
aspect of the communication as well. Thus, all they need to do is develop
their own client process and remote services; the rest is handled by the RPC
mechanism.
On the client side, the client stub receives the procedure call from the client
process. Thus when the client process calls a remote procedure, it just calls a
local procedure in the normal way, and that procedure is stored in the client
stub. When the client stub receives the call, it does the rest, because control
remains in the client stub after the local procedure call. Any error handling is
included in the client stub; the client process doesn't need to take care of it.
When the reply arrives at the client, the client stub unmarshals the result and
passes it to the client process. From the point of view of the client process,
it just receives the result of a local procedure; where the procedure was
actually performed is totally invisible to it.
On the server side, a remote service is just a function or procedure, not a
process; that is, the remote service is not a complete program. When a request
message arrives, the server stub receives the message. It then chooses an
appropriate remote service and makes a call (a local procedure call) to it in
order to serve the request. Since the server stub takes action first, it is a
process, while the remote service it calls is just a function or procedure. On
the client side, the situation is reversed: since the client process executes
first, it is a complete process, and since the client stub is called by the
client process, the client stub is just a function or procedure. In other
words, the client stub is passive and the server stub is active, while the
client process is active and the server's remote service is passive.
Characteristics
After discussing the RPC mechanism in general, we now investigate its strengths:
• Simple call syntax: RPC is designed to have exactly the same syntax as a local
procedure call; in fact, from the view of the client process, an RPC is exactly
the same as a local procedure. Thus programmers don't need to learn anything
new to call remote procedures.
• Well-defined interface: Since the RPC generator is open to the public and the
way to implement RPC is well defined, we have a well-defined interface.
• Ease of use and efficiency: Since the communication part and the marshalling
and unmarshalling procedures are generated automatically by rpcgen,
programmers find RPC easy to use and very efficient.
Although RPC has the above good features, it also has some limitations:
• Speed: Remote procedure call (and return) time (i.e. overhead) can be
significantly (1–3 orders of magnitude) slower than that of a local procedure.
A local procedure usually takes 10 to 100 μs to complete, whereas an RPC
involves network transmission, which takes at least several milliseconds. This
might affect real-time designs, and the programmer should be aware of the
effect. Later, in an activity of this topic, you are required to investigate
this effect by measuring the time taken to complete a remote procedure call.
1. 'Maybe' call semantics: After an RPC time-out (or when a client crashed and
then restarted), the client is not sure whether the remote procedure has been
called or not. This is a situation in which no fault tolerance is built into
the RPC mechanism. Clearly, this call semantics is not desirable, as we have
no way to guarantee whether the RPC was successful.
2. At-least-once call semantics: With this call semantics, the client can assume
that the remote procedure has been executed at least once (on return from
the remote procedure). That means the client does not mind if the remote
procedure is executed more than once, but does not allow it never to be
executed. This call semantics can be implemented by retransmitting the call
request message when a time-out occurs. Clearly, the limitation of this call
semantics is that the server's operation must be idempotent. Can you
remember why?
1. Interface processing integrates the RPC mechanism with client and server
programs in conventional programming languages. The interface compiler
(called the RPC generator in Sun RPC) processes interface definitions written
in an interface definition language. When executed, the interface compiler
generates client and server stub procedures with marshalling and
unmarshalling operations. Procedure headings for each procedure in the
service are also generated from the interface definition, along with a
process for dispatching request messages to the appropriate procedure in the
server.
Servers use two binder interfaces, Register and Withdraw, and clients
use one binder interface, Portlookup. The procedure Register (String
ServiceName, Port ServerPort, int version) is used to register the
name of a server process ServiceName together with its server's port
ServerPort. Note that the integer version records the version of the
service being registered. The procedure Withdraw (String ServiceName,
Port ServerPort, int version) is used to withdraw the registration. The
procedure Portlookup (String ServiceName, int version) is used
by a client to look up the corresponding ServerPort for a given
ServiceName.
READING
Section 5.3, 197–201. This short section describes remote procedure
calls quite concisely and provides a brief description of binding.
SELF-TEST 4.3
4. How does the client transfer its call request (the procedure
name) and the arguments to the server via the network?
5. How does the server react to a request from the client? How
is the procedure selected? How are the arguments
interpreted?
The communication mechanism used in Sun RPC is TCP or UDP, implemented
through socket programming. The example shown in this section runs under
RPCSRC 3.9 on 4.3BSD UNIX.
In the simple example shown in this section, the client calls remote services
using RPC. The server provides the following two functions:
• bin_date_1: This returns the current time as the number of seconds since
00:00:00 GMT, January 1, 1970.
• str_date_1: This takes a long integer value from the above function and
converts it into an ASCII string in human-readable format.
date.x is an RPC specification file that specifies the signatures of the remote
server procedures in date_server.c. The content of date.x is listed below:
/*
 * date.x - specification of remote date and time service.
 */

/*
 * Define 2 procedures:
 *   bin_date_1() returns the binary time and date (no arguments).
 *   str_date_1() takes a binary time and returns a human-readable
 *                string.
 */

program DATE_PROG {
    version DATE_VERS {
        long   BIN_DATE(void) = 1;    /* procedure number = 1 */
        string STR_DATE(long) = 2;    /* procedure number = 2 */
    } = 1;                            /* version number = 1 */
} = 0x31234567;                       /* program number = 0x31234567 */
The file declares both of the procedures and specifies the argument and return
values for each. It also assigns a procedure number to each function (1 and 2),
along with a program number (0x31234567) and a version number (1). The
program numbers are 32-bit integers assigned as follows:
• 0x00000000 – 0x1fffffff is defined by Sun.
• 0x20000000 – 0x3fffffff is defined by users.
• 0x40000000 – 0x5fffffff is for transient use.
• 0x60000000 – 0xffffffff is reserved for future use.
Procedure numbers start at 0. Every remote program and version must define
procedure number 0 as the 'null procedure'. It requires no arguments and
returns nothing, and the rpcgen compiler generates it automatically. Its
purpose is to allow a client to call it to verify that a particular program and
version exist. It is also useful for measuring the round-trip time: since the
null procedure does nothing and returns a null reply, the time taken to call an
RPC with procedure number 0 is just the round-trip communication time.
To execute the rpcgen compiler, you should type the following command:
>rpcgen date.x
Then the rpcgen compiler generates three different files from date.x:
date_svc.c, date.h, and date_clnt.c. The beginning of the file date.h is
shown below:
/*
* Please do not edit this file.
* It was generated using rpcgen.
*/
#include <rpc/types.h>
date_clnt.c and date_svc.c are the client and server stub procedures
respectively. They are compiled together with the client and server programs.
To create the client program from the client main function chkdate.c on the
client side, you should type the following commands in order:
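A plausible form of this command, compiling the client main program together
with the generated client stub and linking the rpc run-time library (the exact
library name varies between systems), is:

>cc -o chkdate chkdate.c date_clnt.c -lrpc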
Note that rpc is the RPC run-time library. On the server side, you should type
the following commands in the following order:
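A plausible equivalent, compiling the remote procedures together with the
generated server stub, is:

>cc -o date_server date_server.c date_svc.c -lrpc

The client main program, chkdate.c, is listed below.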
/*
 * chkdate.c - client program for remote date service.
 */
#include <stdio.h>
#include <rpc/rpc.h>    /* standard RPC include file */
#include "date.h"       /* this file is generated by rpcgen */

main(argc, argv)
int  argc;
char *argv[];
{
    CLIENT *cl;         /* RPC handle */
    char   *server;
    long   *lresult;    /* return value from bin_date_1() */
    char  **sresult;    /* return value from str_date_1() */

    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        exit(1);
    }
    server = argv[1];

    /*
     * Create the client "handle".
     */
    if ((cl = clnt_create(server, DATE_PROG, DATE_VERS, "tcp")) == NULL) {
        /* couldn't establish a connection with the server */
        clnt_pcreateerror(server);
        exit(2);
    }

    /*
     * First call the remote procedure "bin_date".
     */
    if ((lresult = bin_date_1(NULL, cl)) == NULL) {
        clnt_perror(cl, server);
        exit(3);
    }

    /*
     * Now call the remote procedure "str_date".
     */
    if ((sresult = str_date_1(lresult, cl)) == NULL) {
        clnt_perror(cl, server);
        exit(4);
    }
    printf("time on host %s = %s", server, *sresult);

    clnt_destroy(cl);   /* done with the handle */
    exit(0);
}
The program flow is simple. First, check for the existence of the second
argument, which should contain the name of the remote server. Then call
clnt_create to create an RPC handle for the specified program (the second
argument) and version (the third argument) on a host. We also need to specify
which communication protocol is used, usually either TCP or UDP; in our example
we use TCP, so the fourth argument is "tcp". Note that the first argument is the
name of the remote server. After executing the function call clnt_create, we
have the handle, and we can call the remote procedures bin_date_1 and
str_date_1 for that particular program and version. When we have finished
executing the two remote procedures, we call clnt_destroy to destroy the RPC
handle. The server program, date_server.c, is listed below.
/*
 * date_server.c - remote procedures; called by server stub.
 */
#include <rpc/rpc.h>    /* standard RPC include file */
#include "date.h"       /* this file is generated by rpcgen */

/*
 * Return the binary date and time.
 */
long *
bin_date_1()
{
    static long timeval;        /* must be static */
    time_t      time();         /* Unix function */

    timeval = time((time_t *) 0);   /* seconds since 1 Jan 1970 */
    return (&timeval);
}

/*
 * Convert a binary time and return a human readable string.
 */
char **
str_date_1(bintime)
long *bintime;
{
    static char *ptr;           /* must be static */
    char        *ctime();       /* Unix function */

    ptr = ctime(bintime);       /* convert to local-time string */
    return (&ptr);
}
As mentioned, the server program is just a set of functions, not a main
program. The flow is very simple: the first function calls the time function to
get the current time, and the second function converts it into a human-readable
format. Note that the return values must be static variables; if they were not
static, their values would be undefined after the return statement passes
control back to the server stub that called our remote procedure.
On the server side, we execute the server program as follows:
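>date_server &

(A plausible form: the server runs in the background, waiting for incoming
requests.)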
On the client side, we execute the client program and obtain the following
daytime result from the server:
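>chkdate bsdserver
time on host bsdserver = Sat Feb  9 11:12:45 1991

(The hostname bsdserver and the timestamp are illustrative only.)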
Finally, let's go through the way Sun RPC operates in the Sun environment.
Figure 4.10 shows the steps in an RPC.
1. When the server program date_server is started on the remote side, it
calls a function in the RPC library, svc_register, to register its program
number and version with the remote system. This function contacts the port
mapper process to register itself.
2. The client program calls clnt_create to contact the port mapper and find
out the port number of the server.
Note that the dispatcher of Sun RPC, i.e. the port mapper, is quite different
from the dispatcher we introduced earlier. When we introduced the RPC
mechanism, the function of the dispatcher was to select an appropriate server
procedure to serve each incoming request. In Sun RPC, the port mapper does not
select a server process for incoming requests. Instead, it looks up the remote
procedure required by the client and sends its port number back to the client.
Once the client knows the port number of the remote procedure, it contacts the
remote procedure directly from then on, and the port mapper serves that client
no further.
READING
Pages 700–8 from Stevens (1994) Unix Network Programming, at the end
of this topic. This supplementary reading provides you with another
perspective on remote procedure calls in Sun RPC. Please read it to
improve your understanding of this very important network concept.
SELF-TEST 4.4
ACTIVITY 4.1
It is very easy to do the measurement. Just before you start to call the
remote procedure, call the time function time and store the value in a
variable, say start_time. Then, after the remote procedure has executed
and control is back with you, call time again and store the value in
another variable, say finish_time. The difference between these two
variables is the time taken to complete the remote procedure.
You might find that the time taken is too short to measure. We suggest
you measure it over 100 executions. That means you record the initial
time before executing the RPC, then use a for loop to repeat the call
100 times. After the loop has finished, record the final time and take
the difference. Divide it by 100, and the result is the average time
taken to finish the RPC.
It is also suggested that you do the same thing with a local procedure,
which means you cut and paste the remote procedure into your main
program. Just remove the client and server stub procedures (i.e. do not
compile your client program with date_clnt.c) and execute the procedure
directly in your main program. Use the above method to measure the time
taken to execute the local procedure, but this time repeat the execution
10,000 times, because a local procedure takes much less time than a
remote one.
Compare the time taken to execute a local procedure call and a remote
procedure call. You will find the difference is more than 100 times:
executing a remote procedure call is significantly slower than a local
procedure call.
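A sketch of this measurement, reusing the chkdate.c example above: it times
100 calls to bin_date_1 and averages them. We use gettimeofday rather than
time for sub-second resolution; the function name avg_rpc_time is our own:

#include <stdio.h>
#include <sys/time.h>       /* gettimeofday() */
#include <rpc/rpc.h>
#include "date.h"

/* Average the cost of 100 remote calls; cl is an RPC handle
 * already created with clnt_create() as in chkdate.c. */
double avg_rpc_time(CLIENT *cl)
{
    struct timeval start_time, finish_time;
    double elapsed;
    int i;

    gettimeofday(&start_time, (struct timezone *) 0);
    for (i = 0; i < 100; i++)
        if (bin_date_1(NULL, cl) == NULL)   /* one remote call */
            clnt_perror(cl, "bin_date_1");
    gettimeofday(&finish_time, (struct timezone *) 0);

    elapsed = (finish_time.tv_sec - start_time.tv_sec)
            + (finish_time.tv_usec - start_time.tv_usec) / 1e6;
    return elapsed / 100.0;                 /* seconds per RPC */
}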
Topic 4 took you on a guided tour of the design and implementation problems of
interprocess communication (IPC) and remote procedure calls (RPC), and
provided you with a small example of how they are actually implemented. Now
that you have completed the topic, you should be able to explain the role of
marshalling and unmarshalling in IPC. You should also be able to describe and
differentiate between synchronous and asynchronous communication.
Topic 5  Multicast Group Communication
LEARNING OUTCOMES
By the end of Topic 5, you should be able to:
1. Describe the advantages of group communication;
2. Explain the definitions of atomicity, group and ordering; and
3. Discuss the design and implementation issues of group
communication.
INTRODUCTION
Group communication refers to the ability of a set of more than two processes in
a communication network to communicate with each other simultaneously.
Multicast communication refers to a process sending a message to all members of
a group of processes. In this topic, we will refer to group communication as
multicast group communication.
Now, regardless of what answer you came up with for the above situation, if the
number of acknowledgements is not equal to the number of messages sent (i.e. a
message is lost, an acknowledgement is lost, or a process fails), a problem
occurs: who did not get the message? If the source simply retransmits the
message to all members, a lot of bandwidth is wasted, because probably only a
small portion of the destination group (e.g. one out of 100) failed to reply.
This puts an even greater load on the source - and on the system.
However, each time the data change, the modified data must be broadcast to the
processes managing the replicas so that data consistency is maintained across
all processes.
There are six important issues that need to be addressed in the design of group
communication within a distributed system. The following descriptions of these
six issues have been extracted from a paper by Kaashoek [1993].
Addressing
This issue is about addressing methods for a group of members.
Reliability
This issue concerns whether the communication is reliable or unreliable.
Ordering
This issue is about the order among messages in the communication.
To illustrate the difference between FIFO and total ordering, consider a service
that stores records for client processes. Assume that the service replicates the
records on each server to increase availability and reliability and that it
guarantees that all replicas are consistent. If a client may only update its own
records, then it is sufficient that all messages from the same client will be ordered.
Thus, in this case FIFO ordering can be used. If a client may update any of the
records, then FIFO ordering is not sufficient. A total ordering on the updates,
however, is sufficient to ensure consistency among the replicas. To see this,
assume that two clients, C1 and C2, send an update for record X at the same time.
As these two updates will be totally ordered, all servers either (1) receive first the
update from C1 and then the update from C2 or (2) receive first the update from
C2 and then the update from C1. In either case, the replicas will stay consistent,
because every server applies the updates in the same order. If in this case FIFO
(or causal) ordering had been used, it might have happened that the servers
applied the updates in different orders, resulting in inconsistent replicas.
Delivery Semantics
This issue is about how many processes (i.e. group members) must receive the
message successfully.
Response Semantics
This issue is about how to respond to a broadcast message.
"Item five, response semantics, deals with what the sending process expects
from the receiving processes. There are four broad categories of what the
sender can expect - no responses, a single response, many responses and all
responses. Operating systems that integrate group communication and RPC
completely support all four choices."
Group Structure
This issue is about the semantics of a group such as dynamic versus static and
open versus closed.
Logical Clock
Lamport [1978] suggested a method that can be used to order events in a
distributed system. The following brief descriptions have been extracted from the
textbook and another book about distributed systems.
1. If a and b are events in the same process, and a occurs before b, then a → b
is true.
2. If a is the event of a message being sent by one process, and b is the event of
the message being received by another process, then a → b is also true. A
message cannot be received before it is sent, or even at the same time it is
sent, since it takes a finite, non-zero amount of time to arrive (Tanenbaum
and van Steen 2002, 252).
• LC1: Li is incremented before each event is issued at process pi:
Li := Li + 1.
• LC2: (a) When a process pi sends a message m, it piggybacks on m the value
t = Li.
• LC2: (b) On receiving (m, t), a process pj computes Lj := max(Lj, t) and then
applies LC1 before timestamping the event receive(m).
The first rule states that when an event happens, the logical clock is
incremented by one, so that we know the sequence in which events happen within
a process: an event with a smaller logical clock happened earlier than one with
a larger clock. The second rule makes the logical clock be carried in every
message. The third rule arranges the ordering at a process when it receives a
message sent by another process.
Consider Figure 11.6 in your textbook (pg. 447). There are three processes p1,
p2 and p3, and two messages m1 and m2 in the system. All processes have their
logical clocks initialised to 0, and their timestamps are assigned as in the
figure. Event a happens first, so its logical clock is one. Event b happens
next, so its logical clock is two. The message m1, carrying logical clock two,
arrives at process p2. Since process p2 has logical clock 0, which is smaller
than the logical clock of m1, the clock is first set to two and then
incremented by one on receiving the message, so the logical clock of event c
(receiving message m1) is three. Similarly, events d and f have logical clocks
four and five respectively. Note that even though event f immediately follows
event e in process p3, the logical clock of event f is much larger than that of
event e. However, the difference is meaningless: we only care about which
logical clock value is larger, to identify the order of events - not about the
difference.
Moreover, not all events are related by the relation →. For example, events b
and e have taken place in different processes and there is no chain of messages
connecting them. Thus, they are not ordered; we say they are concurrent, i.e.
b || e. Note that the definition of concurrent here is different from that used
in operating systems.
In Figure 11.6 in the textbook, L(b) = 2 and L(e) = 1. But whichever event
happened before or after event b, the value of L(e) would still equal 1. Thus,
even though L(b) > L(e), since the events b and e are in different processes we
still have b || e, and the comparison between their timestamps is meaningless.
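The three rules translate almost directly into code. Here is a minimal sketch
for a single process, with the message transport omitted; the function names
are ours:

/* Lamport's logical clock rules, sketched in C for one process;
 * L is this process's clock, and every message carries the sender's
 * timestamp t. */

static long L = 0;                  /* logical clock, initially 0 */

long local_event(void)              /* LC1: tick before each event */
{
    return ++L;
}

long send_event(long *t_out)        /* LC2(a): piggyback t = L on m */
{
    *t_out = ++L;                   /* the send is itself an event */
    return *t_out;
}

long receive_event(long t)          /* LC2(b): L := max(L, t), then LC1 */
{
    if (t > L)
        L = t;
    return ++L;                     /* timestamp of receive(m) */
}

Tracing Figure 11.6 with these rules: p2 starts with L = 0 and receives m1
carrying t = 2, so receive_event(2) yields max(0, 2) + 1 = 3, the timestamp of
event c.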
Vector Clock
Vector clocks are an improvement on Lamport's logical clock. By using vector
clocks, it is possible to determine, by examining their vector timestamps,
whether two events are ordered by happened-before or are concurrent. The
following brief description has been extracted from the textbook.
Mattern [1989] and Fidge [1991] developed vector clocks to overcome the
shortcoming of Lamport's clock - the fact that from L(e) < L(e′) we cannot
conclude that e → e′. A vector clock for a system of N processes is an array of
N integers. Each process keeps its own vector clock Vi, which it uses to
timestamp local events. Like Lamport timestamps, processes piggyback vector
timestamps on the messages they send to one another, and there are simple rules
for updating the clocks:
• VC1: Initially, Vi[j] = 0, for i, j = 1, 2, ..., N.
• VC2: Just before pi timestamps an event, it sets Vi[i] := Vi[i] + 1.
• VC3: pi includes the value t = Vi in every message it sends.
• VC4: When pi receives a timestamp t in a message, it sets
Vi[j] := max(Vi[j], t[j]), for j = 1, 2, ..., N (a component-wise maximum, or
merge), and then applies VC2 before timestamping the event receive(m).
For a vector clock Vi, Vi[i] is the number of events that pi has timestamped,
and Vi[j] (j ≠ i) is the number of events that have occurred at pj by which pi
has potentially been affected. (Process pj may have timestamped more events by
this point, but no information about them has yet flowed to pi in messages.)
(Coulouris, Dollimore and Kindberg 2001, 399).
Figure 11.7 (pg. 448, textbook) shows the vector timestamps of the events.
Consider the events a and f. Let V(q) be the vector timestamp of an event q;
then we have V(a) = (1, 0, 0) and V(f) = (2, 2, 2). By comparing V(a) and V(f),
we see that V(a) < V(f) and thus a → f. We also know b || e, since V(b) is not
larger than V(e), V(e) is not larger than V(b), and V(b) is not equal to V(e).
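Under the same assumptions as the Lamport sketch above, the four rules for
process i might be coded as follows (N is fixed at three to match the figure):

#define N 3                         /* number of processes, as in the figure */

/* Vector clock rules VC1-VC4 for process i, sketched in C.
 * V is process i's own vector; t is a timestamp carried in a message. */

static int V[N];                    /* VC1: all entries start at 0 */

void vc_event(int i)                /* VC2: tick own entry before an event */
{
    V[i]++;
}

void vc_send(int i, int t_out[N])   /* VC3: piggyback a copy of V */
{
    int j;
    vc_event(i);                    /* the send is itself an event */
    for (j = 0; j < N; j++)
        t_out[j] = V[j];
}

void vc_receive(int i, const int t[N])  /* VC4: merge, then VC2 */
{
    int j;
    for (j = 0; j < N; j++)
        if (t[j] > V[j])
            V[j] = t[j];            /* component-wise maximum */
    vc_event(i);                    /* timestamp the receive event */
}

Tracing event f in Figure 11.7: p3 is at (0, 0, 1) after event e and receives
m2 with t = (2, 2, 0); the merge gives (2, 2, 1) and the tick gives (2, 2, 2).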
READING
SELF-TEST 5.1
1. Give an example of how to use the multicast update, which is
one of the applications of using group communication.
There are two different approaches to implementing group communication - the
centralised approach and the distributed approach.
We will only consider the centralised approach. Compared with the distributed
approach, the centralised approach is simple and efficient in establishing
total ordering. If every multicast message were delivered by its source
directly, it would be difficult to maintain total ordering, because every
member would have to know the ordering of all members. Moreover, in the
centralised approach, the list of all members can be maintained simply by the
centre - not by all members. Each member wanting to broadcast a message just
sends it to the centre, and the centre knows which members should receive the
broadcast; there is no consistency problem in maintaining the list. Finally,
the load of broadcasting messages is transferred from each source to the
centre: even if the number of members in a group is very large (e.g. > 1,000),
each source member sends a broadcast message only once, to the centre, and the
centre does the rest - which matters especially if we want to provide atomicity
(discussed later).
• The centre has to be reliable, or some backup centres are required in case
the centre fails. The centre is so important that we have to keep it alive to
maintain the proper operation of any group communication.
Characteristic                                 Centralised Approach    Distributed Approach
The process that broadcasts messages          The centre              The sender
Membership database management                The centre              All members
Implementation complexity of total ordering   Low                     High
Transmission overhead                         Small                   Large
Reliability                                   Depends on the centre   No need to rely on the centre
Performance bottleneck                        The centre              The sender
Normal Operation
Figure 5.3 shows the normal operation of a group communication. The operation
is simple; we use two multicast messages m and m′ to illustrate it. At the
beginning, S1 and Sn send multicast messages m and m′ to the centre Si. Let's
assume that message m is delivered first and then message m′. Since centre Si
receives both messages, it knows the order of these two messages and thus
assigns them the sequence numbers m.1 and m′.2. After that, the two messages
are broadcast to all processes (members), including the two original sources S1
and Sn. All members receive these two messages and, regardless of the order in
which the messages arrive, they can identify the correct order from the
numbering.
Consider a group with N members (including the centre) in which positive
acknowledgement is used (remember this from the "Introduction" to this topic?).
If there is no transmission error, the expected number of messages transmitted
for one multicast message is 1 + (N - 1) + (N - 1), i.e. 2N - 1. [Your guess
before (for N = 100) was probably 200; 199 is pretty close.] However, if we use
negative acknowledgement, the expected number of messages transmitted is
1 + (N - 1), i.e. N. Now assume that one retransmission is enough for error
handling. Then the expected number of messages transmitted with positive
acknowledgement is 1 + (N - 1) + (N - 2) + 1 + 1, i.e. 2N (this is the 200 you
might have guessed earlier), whereas with negative acknowledgement the number
is 1 + (N - 1) + 1 + 1, i.e. N + 2 (as one member sends a negative
acknowledgement back). Therefore, if N is sufficiently large, the expected
number of messages transmitted with positive acknowledgement is almost twice
that with negative acknowledgement.
Figure 5.5 shows the operation to achieve atomicity of m and m′. Two multicast
messages m and m′ are sent by S1 and Sn respectively. Message m is delivered
first and then message m′. Centre Si then broadcasts the messages to all
members. Each member sends a positive acknowledgement message (dashed line in
the figure) to the centre to confirm that messages m and m′ have been correctly
received. When the centre has received all acknowledgements and confirmed that
all members received the messages correctly, it sends a message (dotted line in
the figure) to inform each member that the atomicity of m and m′ has been
achieved. When a member receives this message, it delivers (commits) messages m
and m′ to its client process.
Note that to achieve this atomicity, many messages must be transmitted for one
multicast message. Consider a group with N members (including the centre). If
there is no transmission error, the expected number of messages transmitted for
one multicast message is 1 + (N - 1) + (N - 1) + (N - 1), i.e. 3N - 2. Also,
until the atomicity message arrives at a member, the member cannot deliver the
multicast message to its client process, because of the rule of atomicity: "A
message transmitted by atomic multicast is either received by all of the
processes that are members of the receiving group or else it is received by
none of them". Thus it may happen that, even though a process has correctly
received the multicast message and sent back a positive acknowledgement, the
message cannot be delivered to its client process and must be removed from the
buffer, because atomicity could not be achieved - i.e. some other processes did
not correctly receive and acknowledge the message.
Note that only members wanting to broadcast messages (S1 and Sn here) are able
to detect the failure of the centre, because they expect to receive their own
messages back from the centre. Therefore, if a member sends a message and does
not receive any message back, it concludes that the centre has crashed. A
non-sending member of the group cannot detect the failure, since it cannot
distinguish between a centre failure and the simple absence of messages to the
group - it was not expecting anything.
Figure 5.9 shows the operation to recover from a centre failure by election.
Assume Sj is the first to invite the other members to elect it as the new
centre; Sj would usually be the backup server. When members learn that the
centre is down, they request Sj to take action. Sj then broadcasts an
invitation (a self-nomination) to every member of the group. When S2, ..., Sn
receive the invitation, they send back a positive acknowledgement (a confirm
message) to Sj, confirming that they agree to elect Sj as the new centre. When
Sj has received all the confirm messages, it becomes the new centre.
Although it seems quite easy to handle this failure, some problems remain.
Sometimes we might decide (in our design) that one backup server is not
reliable enough and that there should be more. However, if we have more than
one backup server, they might compete in the election of the new centre. This
is the problem we address next.
Note that if the centre does not want to accept the application, it simply does
nothing; after process S has tried several times, it stops trying. Also, if
some members do not want S to be a member of the group, they simply do not
acknowledge the "S joins" notification, which means the centre cannot accept S
as a member, since it did not receive all the acknowledgements.
1. If the centre cannot get a majority of members (i.e. j < n / 2), it blocks.
That means the members in partition a will not have group communication.
Even if they want to form a group and the centre establishes group
communication for them in partition a, it is a new group, independent of
group G.
2. In partition b, since the centre (Si) is not in this partition, the members
need to elect a new centre. It can be elected by competition, depending on
which recovery mechanism is used. After the election, the new centre does
the same thing as in partition a: it examines the size of the partition. If
it has a majority of members, the partition becomes group G; otherwise,
there will be no group in partition b and its members will not have group
communication.
Note that if the number of members in partition a is equal to that in partition
b (i.e. each gets half the total membership of group G, n / 2), then the
partition that includes centre Si wins and the other partition blocks. Here,
only one partition is allowed to take the name of the group, because each group
should be unique: if more than one group had the same identifier, they would
either belong to the same group or be in error.
SELF-TEST 5.2
This topic looked at the design and implementation issues of multicast group
communication, focusing initially on the characteristics of atomicity and
ordering in such systems. You should now understand the simple group
management functions.
Suggested Solutions to
Self-test Questions
TOPIC 1: INTRODUCTION TO NETWORK AND DISTRIBUTED SYSTEM
Self-test 1.1
(b) Resource sharing: A network can be used to connect users and shared
resources. Physical devices such as printers, and logical devices such
as databases and program libraries, can be shared.
(n - 1) + (n - 2) + ... + 2 + 1, i.e. n(n - 1) / 2.
Self-test 1.2
1. (a) MAN or LAN, because the size of a shopping centre is fixed. If its size
is large, a MAN or several connected LANs may be required;
otherwise, a LAN is sufficient.
(b) LAN. The staff of the department are usually located together and
thus, the area to be covered will not be too big.
(c) WAN. The WWW provides services to the whole world, so a WAN is
suitable.
(b) IP. The second main function of IP is to "teach" the message how to
travel from a sender, through different routers, to its corresponding
receiver.
(d) TCP, because TCP provides end-to-end reliable transmission for users;
thus correcting any errors in the data bits is a part of its job.
Self-test 1.3
Distributed services: SUN RPC (Remote Procedure Call), NFS (Network File
System), AFS (Andrew File System).
Self-test 1.4
Self-test 1.5
Security model: Internet Banking System (they need to protect remote user
information).
Self-test 2.1
1. (a) 1
(b) 0 (0 is treated as an even number)
(c) 1
2.
Self-test 2.2
(b) Sending a long continuous stream of bits can cause a fairness problem.
If one communication occupies a channel for a very long time, others
suffer long transmission delays. Sending data in packets solves this
problem: everyone can send data, and the network handles the data in
turns. For example, if one computer sends 10,000 bits that require 10
seconds to complete, other computers may suffer a 10-second delay.
However, if the data are sent as ten packets, other stations may suffer
only a 1-second delay, after which their packets may start to transmit.
Self-test 2.3
2. If the loading of a network is low, the throughput is not related to the
delay: even if the delay to send a packet is long, the network can still
sustain the low data transmission rate.
However, if the loading is high and every computer wants to send packets,
the throughput and the delay are closely related. If the average delay to
deliver packets is long, the time needed to complete each transmission is
long, and so the throughput cannot be very high. Sometimes it even falls
below the input packet arrival rate, which causes network congestion.
Self-test 2.4
(a) The data signal in the ring does not travel in both directions on the
shared transmission link, but in one direction only - clockwise or
anticlockwise.
(b) There are no terminators in the ring topology. When the data signal
travels from a sender, it goes through the ring and then comes back to
the sender, and the sender will absorb the data signal. That means that
when the sender receives the data signal, the sender will not send it to
the ring again.
(a) It is a collision-free topology. Since all hosts are connected to the
node with dedicated links, they do not share transmission links (they
share only the node), so there are no collisions on the transmission
links.
(b) There is no control procedure for each host. Hosts just send their
messages or data to the node and the node handles them.
(d) Also, the reliability of the network depends on the node; thus it is
easy to maintain the network - just maintain the node.
Self-test 2.5
(a) When a station (computer) has data to send, it first listens to the
channel (segment) to see if anyone else is transmitting at that moment.
(c) If the channel is busy, the station waits until it becomes idle.
(d) If a collision occurs, the station aborts its transmission, sends a jam
signal, waits a random amount of time, and then starts all over again.
S ≤ 1 / (1 + 4.44a), where a = τ / tx

For a WAN spanning a long distance, the propagation delay τ is large relative
to the frame transmission time tx. Hence a will be large and the maximum
throughput S will be small. Thus, CSMA/CD is unsuitable for WANs.
Also, since the shared transmission link is long, the minimum frame size is
very large, which is not practical in WANs.
Self-test 2.6
(a) At ring initialisation, a special packet called a token is injected into the
ring and circulates on the ring.
(b) A station (wanting to transmit) waits until it sees a token. It holds the
token and transmits its frame (packet) on the ring.
(c) The station absorbs the transmitted frame when it circulates back.
Self-test 3.1
1. 1 → 5
   1 → 4 → 5
   1 → 2 → 3 → 5
   1 → 2 → 3 → 4 → 5
   1 → 4 → 3 → 5
Self-test 3.2
(a) The minimum Ethernet frame size is 2τC, where τ is the end-to-end
propagation delay and C is the link transmission capacity. If we applied
CSMA/CD to a WAN, the value of τ would be very large, and so would the
minimum frame size. For example, a reasonable propagation delay is
10 μs/km, and an appropriate length for a WAN is 100 km; then τ equals
1 ms. A reasonable link transmission capacity is 10 Mbps, so the minimum
frame size under this configuration is 2 × 1 ms × 10 Mbps = 20 kbits,
which is not a reasonable size for a frame.
2. Fixed maximum packet size gives better buffer management and is fairer
for other packets. If we have the fixed maximum size, we know how many
packets a buffer can store, and thus we know how to manage the buffer and
estimate whether it is adequate or not. Also, a fixed maximum packet size
gives the upper bound of transmission delay. Thus no one suffers a long
transmission delay, and packets can take turns being processed within a
node.
3. The similarity is that both handle packet routing. The difference is that
the network layer in the ISO OSI reference model handles network
congestion, whereas the Internet layer in the TCP/IP reference model
defines the IP address format and ignores network congestion.
Self-test 3.3
3.
Destination     Mask             Next Hop
20.0.0.0        255.0.0.0        Direct deliver
40.0.0.0        255.0.0.0        Direct deliver
128.1.0.0       255.255.0.0      Direct deliver
192.4.10.0      255.255.255.0    128.1.0.9
144.214.0.0     255.255.0.0      128.1.0.9
4. ARP is used to find the low-level network address (physical address) of a
host computer when we do not have a mapping between it and its
corresponding IP address. Without ARP, a network administrator would need
to update the mapping manually from time to time, and if the mapping table
were ever out of date, some packets could not be routed correctly to their
destinations.
Self-test 3.4
4.
Trans. No.    Wc (kbytes)
24 (TO)       40
25            1
26            2
27            4
28            8
29            16
30            20
31            21
32            22
33            23
Self-test 3.5
1. A DNS server provides domain-name mapping services to its clients. When a
client sends a request to map the domain name of a machine, the DNS server
replies with the IP address of that machine. This process is called name
resolving. Each machine on the Internet has a piece of software, often
known as a resolver, for resolving names. For example, in UNIX this is
accessed by calling gethostbyname. A resolver is configured with the IP
address of a local DNS server. When called, it packages a request to that
DNS server, and when the DNS server returns the result, the resolver
relays it to the caller.
When a request reaches a DNS server (usually the closest one), the name is
extracted. If the server is an authority for the name, then the name
appears in its mapping database and a lookup returns the IP address.
Otherwise, this DNS server becomes the client of other DNS servers and
sends them a request; when the reply comes back, it in turn replies to the
resolver.
Self-test 4.1
1. External data form is included in order to define the type of each data item
in the message to enable the sender and recipient to interpret them in the
same way. There is no need to include it if the sender and recipient agree on
the number of data items and the type of each one before the message is
transmitted.
2.
8
"Netw"
"orks"
3
"and_"
11
"Dist"
"ribu"
"ted_"
7
"Syst"
"ems_"
3. A blocking send delays the sending process until the message is received.
The delay is considerable when different processes in different computers
are involved. The advantage of a non-blocking send is that it avoids
delaying the sending process, allowing the sender to proceed with work in
parallel with the receiving process.
The disadvantage of a non-blocking send is that the sender must make an
effort to ensure that the message is really received by the receiver,
which causes high implementation complexity.
4. The client would be delayed for a long time waiting for service from the
server, as the server may be busy serving other clients.
Self-test 4.2
In part (a), it is idempotent because you can repeat it any number of times
with the same effect.
In part (b), the operation to write data to a file can be considered in two
different situations. It can be defined as in UNIX, in which each write
operation is applied at the read-write pointer, so the operation is not
idempotent. It can also be defined as in several file servers in which the
write operations are applied to a specified sequence of locations, so the
operation is idempotent because it can be repeated any number of times
with the same effect.
In part (c), the operation to append data to a file is not idempotent, because
this operation puts something new into the file each time.
Self-test 4.3
1. The client provides the arguments and calls the client stub in the
normal way.
2. The client stub builds (marshals) a message (call request) and traps to
Operating System and network kernel.
3. The kernel sends the message to the remote kernel.
4. The remote kernel receives the message and gives it to the server
dispatcher.
5. The dispatcher selects the appropriate server stub.
6. The server stub unpacks (unmarshals) the parameters and calls the
corresponding server procedure.
7. The server procedure does the work and returns the result to the
server stub.
8. The server stub packs (marshals) it in a message (call return) and traps
it to Operating System and network kernel.
9. The remote (receiver) kernel sends the message to the client kernel.
10. The client kernel gives the message to the client stub.
11. The client stub unpacks (unmarshals) the result and returns to client.
2. Assume a high level Interface Definition Language (IDL) exists; the heading
of a stub procedure is derived directly from the corresponding procedure
signature in the IDL. The same names and types are used for the arguments.
Moreover, marshalling requires a library of marshalling and unmarshalling
procedures for the simple data types. In the client stub procedure, the input
arguments are marshalled one by one into the request message and the
output arguments are unmarshalled one by one from the reply message.
Finally, the IDL signature can determine the order of the arguments. All of
the above tasks can be generated automatically from an interface definition.
3. The server uses a server interface definition. An RPC interface specifies
the characteristics of the procedures provided by a server that are visible
to its clients. The characteristics include the names of the procedures and
the types of their parameters; each parameter is defined as an input or
output parameter. In summary, an interface contains a list of procedure
signatures - the names and types of their input and output arguments.
Later, when we talk about the implementation of Sun RPC, we give an example
of such an interface.
4. The client needs to marshal the request and then communicate with the
server. As mentioned, RPC provides a standard mechanism to handle
everything except the contents of the client and server processes. Here, for
each remote procedure call, a client stub procedure is generated and
attached to the client program. The client stub replaces the remote
procedure call with a local call to the stub procedure. The stub procedure is
used to marshal the input arguments and place them into a message with
the procedure identifier of the remote procedure. The client stub uses IPC
primitive to send the call request message to the server and to wait for the
reply message.
5. The server receives the request and unmarshals it. For each server, a
dispatcher is provided in the remote kernel. It receives the call request
message from the client and uses the procedure identifier in the message to
select the appropriate server procedure and pass the arguments to it. For
each procedure in the server that is declared at the server interface as
callable remotely, a server stub procedure is generated. The task of a
server stub procedure is to unmarshal the arguments and call the
corresponding local service procedure.
6. To allow multiple instances of the same service, the binder must associate a
unique (server) identifier with each service name and server port. The
unique identifier may be chosen by the server before Register, or by the
binder, so the unique identifier is returned to the server. The following are
the new versions of the binding service:
/* first alternative - binder adds name, port, version and unique identifier
to its table */
/* binder removes the entry, port, version and unique identifier from its
table */
/* binder looks up the service with this name and version and returns the
server port */
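One plausible set of signatures consistent with these comments, with the
unique identifier uid added to each call (the exact forms are our own
assumptions), is:

Register (String ServiceName, Port ServerPort, int version) returns int uid
Withdraw (String ServiceName, Port ServerPort, int version, int uid)
Portlookup (String ServiceName, int version) returns (Port ServerPort, int uid)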
T′RPC = F + N′ = F + 0.1N
Since N = 0.8 TRPC (network transmission time accounts for 80% of the total),
we have F = 0.2 TRPC, and
T′RPC = 0.2 TRPC + 0.1 × 0.8 TRPC = 0.28 TRPC
Thus the percentage change is (TRPC - T′RPC) / TRPC × 100%, i.e. 72%.
That is, the time to complete an RPC will be shortened by 72% when the
network is upgraded to 100 Mbps.
Self-test 4.4
Self-test 5.1
2. Ordering does not matter if the data sent by two clients are independent of
each other. In other words, there is no time or logical sequence attached to
the order in which messages should be acted on. (There will be no problems
with data inconsistency, because one action is not dependent on a previous
action.)
Although the messages, once sent, will reach their destinations reliably, the
sender may fail after sending to some, but not all, members of the group.
That is, the sending process may fail while it is transmitting the message
to all members, so some members receive the message and others do not.
Self-test 5.2
A total of nine acknowledgements are sent from all members to the centre
(18 acknowledgements would be required if one acknowledgement were used
for one message only).
A total of nine messages are sent from the centre to all members to confirm
that atomicity has been achieved.
6. If the processes in the partition that cannot form a group want to join the
group again, we consider them new members. If the processes in the group
do not want this, they simply ignore the newcomers.
OR
Thank you.