CBDT3103
Introduction to Distributed System
Member: Dr Vanessa S C Ng
OUHK
Copyright © The Open University of Hong Kong and Open University Malaysia,
February 2011, CBDT3103
All rights reserved. No part of this work may be reproduced in any form or by any means
without the written permission of the President, Open University Malaysia (OUM).
INTRODUCTION
Welcome to CBDT3103 Introduction to Distributed System. This course is a one-semester, three-credit, undergraduate-level course for OUM students seeking a Bachelor's degree in Information Technology with Network Computing.
The assignments and tests in this module will help you master the topics over the course of one semester.
Think of your study module as reading the lecture instead of hearing it from a lecturer. Basically, in the open distance mode of education, the module replaces your live lecture notes. However, the module still requires you to think for yourself and to practise key skills. In the same way that a lecturer in a
conventional full-time mode of study might give you an in-class exercise, your
study module will have activities for you to do at appropriate points. You will
also find self-test questions in each unit. These activities and self-tests give you
practice in the skills that you need to achieve the objectives of the course and to
complete assignments and pass the final examination. You are also strongly
advised to discuss with your tutors, during the tutorial sessions, the difficult
points or topics you may encounter in the module.
COURSE OBJECTIVES
By the end of this course, you should be able to:
• Differentiate between networks and distributed systems.
• Explain the role of a network in a distributed system.
• Outline the challenges of designing and implementing distributed systems.
• Describe the architectural models of distributed systems.
• Identify the fundamental models of distributed systems.
• Identify different network services.
• Outline the quality of services required in a network.
MODULE STRUCTURES
There are FIVE major topics in this module. A brief summary of the five major topics is given below.
The network software protocols you study include network services, quality of service (QoS), networking requirements for distributed systems, LAN (Local Area Network), and the devices used in internetworking (repeater, bridge and router).
• LAN: LAN (Local Area Network) is described in this topic, with two case studies of common configurations – Ethernet and Token Ring. Quite a large portion of this unit is devoted to studying LANs, since LANs support most distributed systems (which are then connected to a WAN).
PDU of the TCP layer) header, services, and multiplexing function. TCP is
followed by a brief discussion of UDP (User Datagram Protocol). Some simple
Internet applications, such as DNS (Domain Name System), email, Telnet and
FTP (File Transfer Protocol), are introduced briefly.
At the end of this unit is a summary followed by the solutions to the self-test
questions. The self-tests are spread throughout the topic at appropriate points so
that you can test your understanding of the material. You are advised to complete
the self-tests before checking the answers.
[Figure: a distributed system – Client A and Client B connected to Server 1, Server 2 and Server 3]
Topic 4 first introduces the concept of IPC and then deals with the related
concepts of marshalling and unmarshalling. A discussion of synchronization
follows, and two kinds of communication mode are introduced: synchronous and
asynchronous communications. Then you study some design and
So, you should immediately begin to sense that the problems of handling group communication are quite different from those of point-to-point communication, and we'll investigate the problems and suggest some solutions. Topic 5 will:
1. Define group communication along with its characteristics.
2. Define important concepts for group communication such as atomicity and
ordering.
3. Discuss the design and implementation issues of group communication.
Topic  Title                                                       No. of Hours  Assessment Activities
1      Introduction to the network and distributed system          10            Self-tests 1.1–1.5
2      Networking and internetworking                              13            Self-tests 2.1–2.6
3      Transmission Control Protocol/Internet Protocol (TCP/IP)    13            Self-tests 3.1–3.5
4      Interprocess communication (IPC) and
       remote procedure calls (RPC)                                12            Self-tests 4.1–4.4
5      Multicast group communication                               12            Self-tests 5.1–5.2
Textbook
George Coulouris, Jean Dollimore and Tim Kindberg (2005) Distributed Systems:
Concepts and Design, 4th edition, Reading, MA: Addison Wesley Longman.
Non-Print Media
OUM will also provide you with e-materials to support you in your learning.
These e-materials are available on the OUM portal, in particular on the OUM Learning Management System, known as myLMS. You are required to access this Learning Management System. The Faculty website is also available on the portal.
COURSE ASSESSMENT
Formal assessment consists of two components:
• Continuous assessment, which contributes 50% to your final mark
• Course examination, which contributes 50% to your final mark.
• Assignment
For this course you are required to do one or two assignments. The objectives of the assignment are:
– To provide a mechanism for you to check your progress and make sure that you have met certain learning objectives;
– To provide you with the chance to demonstrate your understanding of the
materials in the module.
– To provide an opportunity for you to apply what you have learned.
COURSE EXAMINATION
The course examination will contribute 50% of the final mark. The examination is divided into two parts: part one, which will be conducted at mid-term, and part two, which will be conducted at the end of the course. Each part contributes 25% of the total mark of 50%. Part one will examine the first few topics and part two will examine the last few topics of the module.
TUTORIALS
The course includes five tutorial meetings of two hours each. The tutorials are conducted to provide an opportunity for you to meet your tutors and discuss important or difficult points, topics or concepts in the module. In addition, you have an opportunity to discuss the self-tests with your tutors or share your study experiences and difficulties in your peer-to-peer group discussions. Although the
tutorials are not compulsory, you are encouraged to attend the tutorial meetings
as far as possible. It is strongly recommended that you attend all tutorials, as they
will provide considerable assistance in your study of this course. Moreover, you
will have the chance to meet with other distance learners who are taking the same
course.
GROUP PROJECT
Please do the group project if it is specified in the course. The group project provides you with the opportunity to show your ability to work in a group, namely to do group problem solving and to share and communicate your ideas with group members. You are required to use myVLE in this group project, i.e. to communicate and share your ideas with the group members.
LEARNING OUTCOMES
By the end of Topic 1, you should be able to:
1. Differentiate between networks and distributed systems;
2. Explain the role of a network in a distributed system;
3. Outline the challenges of designing and implementing distributed
systems;
4. Describe the architectural models of distributed systems; and
5. Identify the fundamental models of distributed systems.
INTRODUCTION
The latter part of the 20th century set the stage for the age of information, during which technologies for information gathering, processing and distribution were
developed. One of the most important technologies developed for information
management was the computer. Computers were introduced after the Second
World War. Within half a century, computer technology and the scope of its
applications have developed very fast, and computers are now as common as
cars and TV sets.
In the early years of their evolution, computers were large and expensive. Only governments and large organisations had them for computation. Since the mid-1980s, two important improvements have been introduced that radically changed the face of computers (and computing technologies).
1.1 NETWORKS
Networks, or computer networks, have been growing rapidly. They are now an
essential part of our computer systems. According to Tanenbaum (1995), a
network is an inter-connected collection of autonomous computers. Most
computers in your organisation – whether PCs or servers – are most probably
connected to a network. A computer network consists of a series of computers
that are connected together so that they can communicate with each other.
Computer networks can also share peripheral devices like printers.
1. A wide area network (WAN) spans the longest distance, such as a city, a
country or even all over the world. The most common example of a WAN is
the Internet. Almost all networked computers are connected to it.
To Share Resources
Another important goal of networking is to enable resource sharing. Some
devices (such as printers) are expensive so many users should be able to access
them. An obvious solution is to connect the devices to the network, and every
user who is connected to the network can share the devices. For example, many
users can share an expensive high-speed colour laser printer by connecting it to a
network with other computers. Sharing resources does not need to be limited to
physical (hardware) devices – data files and application software can also be
shared.
Transmission Media
Transmission media are used for the actual transmission (transportation) of data
or information. At the lowest level, computer networks encode data into a form
of signals, such as electromagnetic or optical signals, and send them through a
transmission medium. For example, copper wires are used to transmit data in the
form of electromagnetic waves from a sender to its corresponding receiver.
Each transmission medium has its own characteristics, such as bandwidth, delay,
cost and ease of installation and maintenance. You are introduced to different
transmission media in Topic 2.
Network Hardware
Let us consider which hardware should be included in a network. Figure 1.1
shows a simple high-level model of a network.
The sender generates a message and puts it into the network. The network
receives the message and then transfers it to the receiver. The receiver takes the
message out and gives it to its application program. Note that there may be many
small networks (called sub-networks) connected to each other to form a big
network.
users run application programs in those machines and thus, we can call them
hosts or host computers (they can also be called nodes, end-stations, machines or
end-users).
There are many ways in which nodes and links are inter-connected to form sub-
networks. Usually, we classify them into two types of transmission technology:
• Local Area Networks (LANs): These are small networks that are usually implemented for use in offices, buildings and campuses up to a few square kilometres in size. They are widely used to connect personal computers, workstations and devices in company offices to exchange information and share resources. The three common kinds of network topology for LANs are star, bus and ring. The size of a LAN is small (the coverage area should be within a few kilometres) so the transmission delay (the average time taken between sending and receiving a message) is short (less than 10 ms). The data-transmission rate (the average speed of transmitting a message from a sender to a receiver) is high (from 10⁷ to 10⁹ bits per second). We focus on LANs in our study, because LANs are usually built as distributed systems.
Some of the following businesses in Hong Kong are likely to have their own
MAN: Park'n Shop, Marks & Spencer, G2000, and so on. Each of these firms
has a chain of shops in Hong Kong, and each shop's computer network is
connected to a broader MAN system.
• Wide Area Networks (WANs): They span a large geographical area such as a country or a continent – or even the world. They connect many machines together (at least thousands) and usually the transmission delay is long (up to a few seconds) and the transmission rate is low (so far it may be up to 10⁶ bits per second, still lower than LANs). One of the most common examples of a WAN is the Internet.
• Wireless Networks (or Mobile Networks) are another type of network. They use a wireless transmission medium. Many users who have desktop machines on LANs and WANs want to work outside with their computers. This is impossible if their computers are wired; thus, there is a lot of interest in exploring the use of wireless networks. The terminals used in wireless networks are mobile computers or personal digital assistants (PDAs). Wireless networks are necessary, especially when the environment is difficult for cabling or when users are always "on the move".
SELF-TEST 1.1
1. Give three reasons for using computer networks. Briefly explain
each.
2. State the names of the different transmission media that you
know.
3. Suppose there are n computers and you want to have a
communication path between any two of them. This is to be
achieved by direct (point-to-point) connection with links only (no
switching nodes or routers). How many links are required? What
implication can you draw from your answer?
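For question 3, you can check your reasoning with a few lines of code. This is a Python sketch (not part of the module): a full mesh of n computers needs one direct link per pair of computers, i.e. n(n − 1)/2 links.

def full_mesh_links(n: int) -> int:
    # every pair of computers needs its own link: C(n, 2) = n(n - 1)/2
    return n * (n - 1) // 2

for n in (2, 5, 10, 100):
    print(n, "computers ->", full_mesh_links(n), "links")
# 2 -> 1, 5 -> 10, 10 -> 45, 100 -> 4950: the link count grows
# quadratically with n, which is why real networks use switching
# nodes instead of a full mesh of direct links.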
A layer is defined as a service provided for its upper layer. The number of layers and the functions of each one differ from network to network. Each layer is independent of the others, but there is a communication interface between two adjacent layers.
Consider two computers, host 1 and host 2, that communicate with each other.
Both have the same number of layers. Note that the number of layers in two
computers does not need to be the same, but when a sender wants to
communicate with a receiver, each host must have corresponding layers in its
system.
For example, for layer 2 of host 1 to communicate with host 2, host 2 must have
the corresponding (peer) layer 2 in its system. The set of rules governing the
message exchange between two machines in layer n is called n-peer protocol or
simply layer n protocol. The messages exchanged between these two layers are
called n-PDUs (Protocol Data Units of layer n). The format and the meaning of
the fields in n-PDUs are specified in the layer n protocol.
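As an illustration of this layering, here is a minimal Python sketch (not any real protocol stack): each layer prepends its own header to the PDU it receives from the layer above, and the receiver's peer layers strip the headers in reverse order. The byte-string header values are purely illustrative.

def encapsulate(message: bytes, headers: list[bytes]) -> bytes:
    # headers are listed from the highest layer down to the lowest;
    # each layer prepends its header, so the lowest layer's header
    # ends up outermost on the wire
    pdu = message
    for header in headers:
        pdu = header + pdu
    return pdu

def decapsulate(pdu: bytes, headers: list[bytes]) -> bytes:
    # the receiver strips headers starting from the lowest layer
    for header in reversed(headers):
        assert pdu.startswith(header), "malformed PDU"
        pdu = pdu[len(header):]
    return pdu

layers = [b"TCP|", b"IP|", b"ETH|"]      # transport, network, link
wire = encapsulate(b"hello", layers)
print(wire)                              # b'ETH|IP|TCP|hello'
print(decapsulate(wire, layers))         # b'hello'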
By repeating the above steps, the highest layer of host 2 will receive the original
message and then pass it to the corresponding application program. The whole
process, which sends messages from one side to the other, is called the network protocol.
Around the early 1980s, the International Organization for Standardization (ISO) proposed the Open Systems Interconnection (OSI) reference model. The model aimed to
standardise network components to allow multi-vendor development and
support. This OSI reference model was expected to become the dominant
standard in the computer network market. It is a layered reference model with
seven layers - the physical layer, data link layer, network layer, transport layer,
session layer, presentation layer, and application layer. They are all well defined,
well structured, and each layer has its own networking function(s).
However, the ISO OSI reference model is too complicated (too many layers). In
the 1970s, the United States (US) Department of Defense developed a research
network called ARPANET. Then, after further development, it became the
TCP/IP reference model (some texts refer to TCP/IP as a suite of network
protocols) and was released in the commercial market in the 1980s. Then it
quickly became the dominant model or standard in the computer networks
market, which is a major reason why the Internet developed so rapidly - there
was a common network protocol suite. The most important reason why the
TCP/IP reference model succeeded (over the ISO OSI reference model) was its
simplicity and ease of operation. The TCP/IP reference model, shown in Figure
1.3, has only four layers - host-to-network layer, network or IP (Internet Protocol)
layer, transport or TCP (Transmission Control Protocol) layer, and application
layer.
[Figure 1.3: The TCP/IP reference model – application, TCP (transport), IP (network) and host-to-network layers]
(a) To define a unique and well-defined IP address for each machine and
to define the format of its PDU (i.e. datagram); and
(b) To provide services to route a datagram from a sender to its
corresponding receiver through a network.
Because of the popularity of the TCP/IP reference model, you are given a
complete picture of it in Topic 3. In Topic 3, you learn the functions of IP
and TCP, and how they support the application layers. You also learn some
simple application services such as FTP and email services.
SELF-TEST 1.2
You might say that operating systems could be upgraded to support the remote
services, but it is always easier and cheaper to install additional software to
handle remote services than to upgrade the original ones. Therefore, distributed
systems software is the additional software to support remote services for a set of
computers connected by a computer network.
The key difference between the two systems is that, in a distributed system, the existence of multiple autonomous computers is transparent to the users: the whole system appears to the users as a single computer. Users can use the services provided by
the distributed system, input some data (parameters or files) to the system and
wait for the output from the system. Users do not need to know exactly how and
where the remote services are in the system.
For a network, users must explicitly log on to a machine, explicitly know what
the machine can do, explicitly submit data to the correct location, and explicitly
tell the machine how to return their results (e.g. give their own logical addresses
to the machine).
In fact, a distributed system is built on top of a network. Networks are just one of
the resources of distributed systems, and distributed systems use them to deliver
and receive data. For example, both distributed systems and networks support
file movement, but users in networks need to know the locations of the sender
and receiver, the network configuration and which network protocol is used,
whereas users in distributed systems do not need to know these things. In fact,
they should not know these details.
Consider the following example. A process requires ten hours for execution
by a high-speed computer. But in a distributed system, we can use ten cheap and slow CPUs, each with a speed ten times slower than the high-speed one, in parallel to finish the same process. Both may use the same amount of
time to finish the job, but the cost of 10 slow CPUs will be much lower than
that of the high-speed one.
READING
SELF-TEST 1.3
Note that all of the above characteristics should be considered, but they do not all need to be implemented at the same time. Sometimes, the characteristics that should be implemented depend on the nature of the application services provided
(or desired). Thus, we can say the above points bring challenges when building a
distributed system.
Resource Sharing
As mentioned, resource sharing is the most important characteristic or advantage
of a distributed system, and thus, all distributed systems should deal with this
issue. The term "resource" is abstract, since it can represent hardware (e.g.
printers, CPUs) or data (e.g. a shared database, shared executable files). To manage resources effectively, a program called a resource manager is required to
provide an interface between the resource and users. The resource manager
should provide the resource name, identify the resource location, map the
resource name to a communication address, and coordinate concurrent accesses
to ensure consistency.
Heterogeneity
Heterogeneity applies here to a variety of different hardware and software
components operating together in the different levels in a distributed system:
• Networks
• Computer hardware
• Operating systems
• Programming languages
• Implementations by different developers.
Openness
Openness is the characteristic that determines whether a distributed system can
be extended or expanded in various ways. For hardware, we should be
concerned about whether additional peripherals, memory or communication
interfaces can be put into the system or not. For software, additional operating
system functions, communication protocols and resource-sharing devices should
be able to join the system without any modification to the system.
Security
Many of the information resources are maintained in a distributed system for the
users to share. However, some critical resources should not be shared by
unauthorised users, but need to be protected. There are two kinds of protection.
Scalability
Distributed systems can operate effectively at many different scales. A system is
scalable if it remains stable when the number of users and the amount of
resources are increased significantly - in other words, adding users does not
adversely affect the way the system works. Usually there are three kinds of scale:
2. The middle and most common one is a distributed system within a LAN.
Hundreds of workstations and several file servers and printer servers might
be interconnected; and
Fault-Handling
Sometimes, distributed systems fail. Some output results might be incorrect,
some incoming requests might be lost, a server might be down, or some services
stop before they complete the computation. A good distributed system should be
Generally there are two kinds of failures – hardware and software. Hardware redundancy is used to handle hardware failures, i.e. redundant components replace the failed ones. Programs should be designed to tolerate or automatically recover from software failures.
Concurrency
Since there are many clients (users) and several servers in a distributed system, it
is possible to have more than one process executing in parallel. Concurrency is
one of the intrinsic characteristics of distributed systems. There are two reasons
for parallel executions to occur:
Transparency
This characteristic is hidden from the user and the (application) programmer. A
distributed system is transparent if it achieves the image of being a single system
to make everyone think that the collection of independent components is simply
a single time-sharing system.
• Failure transparency enables the concealment of faults, and allows users and
application programs to complete their tasks despite the failure of hardware
or software components. The power of failure transparency is highly
dependent on how many resources are held in reserve within this fault-
tolerance scheme.
READING
SELF-TEST 1.4
[Figure: the layered organisation of a distributed system – applications, run-time support, operating system and hardware]
System Architecture
In a distributed system, processes are arranged together to perform useful tasks.
The following are the four types of system architecture.
1. The client-server model is the most widely used model. Clients send invocations (requests to an authority) to the servers (the authority) for remote services. The server then executes the remote services based on the invocations and sends the results back to the clients.
3. Proxy servers and caches. The cache is a fast secondary storage device that records the most recently used data objects. When a client requests an object, the caching service first checks the cache and supplies the object from the cache if it is available. If not, a search is required through the Web servers. The cache is updated when the object has been found – this most recently sought object is added to the cache, as sketched after this list.
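The cache lookup logic in item 3 can be traced in a few lines of Python. This is a toy illustration, not the module's code; fetch_from_web_server() is a hypothetical stand-in for the slow search through the Web servers.

cache: dict[str, bytes] = {}

def fetch_from_web_server(url: str) -> bytes:
    # hypothetical stand-in for the slow request to the origin Web server
    return b"object for " + url.encode()

def get_object(url: str) -> bytes:
    if url in cache:
        return cache[url]                # cache hit: supply the object directly
    obj = fetch_from_web_server(url)     # cache miss: search the Web servers
    cache[url] = obj                     # update the cache with the found object
    return obj

print(get_object("http://example.org/a"))   # miss: fetched and cached
print(get_object("http://example.org/a"))   # hit: served from the cache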
Design Requirements
There are four requirements in the design of a distributed system:
(a) Message transmission delay (the time taken to send a message from a
sender to its corresponding receiver).
(b) Throughput (the data transfer rate).
2. Quality of Service (QoS). The QoS experienced by clients and users covers reliability, security and performance. The concern here is whether fault-
tolerance can be achieved in a distributed system to maintain its reliability
and availability. As for security, a reasonable degree of security should be
applied to the data that are stored and transmitted within a distributed
system.
3. Use of caching and replication. Both caches and replicated servers should be used to improve the performance and availability within a distributed system. The concern is how to validate a cached response, how to refresh the cache and how to maintain the consistency of caches and replicated servers.
• Security model. Since resources are shared within a distributed system, this model defines how to protect those resources from being accessed by unauthorised users, and it provides a secure way for authorised users to
access the shared resources.
READING
SELF-TEST 1.5
• You should also know the definition of distributed systems and be aware of
their advantages over standalone computer systems. Moreover, you should
now understand the differences between networks and distributed systems.
• You have also learned the basic characteristics of distributed systems. The
section dealing with architectural models demonstrates very clearly how the
layered software in distributed systems is significantly different from the
layered software in conventional standalone computer systems.
• You should also understand the four system architectures and their design
requirements, and the fundamental models of distributed systems.
LEARNING OUTCOMES
By the end of Topic 2, you should be able to:
1. Identify different network services;
2. Outline the quality of services required in a network;
3. Describe the basic concept of Local Area Networks (LANs);
4. Give examples of LANs; and
5. Identify the functions of internetworking facilities.
INTRODUCTION
Most of the world's computers are now connected to some form of network. In
fact, the idea of computer networking was introduced soon after the invention of
computers, and now you can find computer networks all over the world. A
computer network consists of a series of computers that are connected together to
allow them to communicate with each other. Through a network, computers can
share peripheral devices like printers, databases and, perhaps most importantly,
information. In fact, the truly standalone PC is rapidly becoming a thing of the
past.
The main issues involved in connecting computers are network hardware and
network software. Let us briefly go through the network hardware issues in this
section, and later you can concentrate more on the network software. The main
focus of this course is on the software side of computers and systems, and distributed systems are built on top of the network software.
Transmission Media
Transmission media are used to provide a connection between two machines to
exchange information. These media can be classified into two groups - guided
and unguided. Guided media define a physical and tangible property through
which the signals are transmitted between the communication points, such as
twisted pair, coaxial cable, and optical fibre. Unguided media have no physical
(tangible) connection between two points, such as satellite, microwave, and
infrared.
Guided Media
A twisted pair medium is made up of pairs of copper wires that are insulated and
twisted together. They are widely used in telephone networks and are good for a
data rate (bit rate) in the order of 10–100 Mbps over 100 metres, and at lower bit
rates over longer distances. Unshielded twisted pair cable (UTP), shown in Figure
2.1, is a four-pair wire medium that is commonly used in LANs because of its low
price and easy installation.
Coaxial cable, shown in Figure 2.2, contains a central wire inside an outer circular
copper wire mesh. The space between these two conductors is filled with a
dielectric insulating material. The data rate is typically 10–100 Mbps over a
maximum cable length of 500 metres. Coaxial cable is better than twisted pair in
protecting electronic signals against external noise.
Optical fibres, shown in Figure 2.3 below, are plastic or glass fibres with a light
source on one end and a light detector on the other. They are immune to electrical
noise and support very high data rates (up to 5 Gbps) over a cable length of 2–3
km.
Unguided Media
Satellites are used for very high frequency (GHz) radio communication. A
satellite travels in space and moves synchronously with the rotation of the earth -
that is, in a geosynchronous orbit. A transmitting antenna sends its signal to the
satellite and then the satellite reflects the signal to the earth. Its data rate is high
but the satellite and transmitting antenna are very costly. Thus, satellites are
mainly used for video broadcasting.
Table 2.1 provides a quick reference tabulation of the transmission media. You
can find the advantages and disadvantages from the table.
Short-distance Communication
Computers use binary digits (bits) to represent data. Thus, transmitting data
means sending bits from one end to the other through a transmission medium.
The simplest way to transmit bits is to use a small electric current to encode data.
A negative voltage is usually used to represent a logical '1' and a positive voltage is used for a logical '0'. To transmit a '0' bit, the hardware device of a sender inputs a well-defined positive voltage into a wire for a specified short time. Then the hardware device of its corresponding receiver will receive the signal and interpret it as a '0' bit. This is the physical way for a sender to send a logical bit and a receiver to receive a logical bit in a short-distance communication.
Long-distance Communication
The hardware for long-distance communication provides us with another
problem. An electrical current cannot be transmitted too far, because the current
becomes weaker as it travels. This is known as signal attenuation (loss of
communication signal energy). Thus, we need to use modulation to send data.
Modulation is a process by which the characteristics of electrical signals are
transformed to represent the data. Instead of transmitting an electric current,
long-distance communication systems send a continuously oscillating signal as a
data-bearing signal, usually in the form of a sine wave called a carrier, as shown
in Figure 2.6.
However, the carrier signal alone is not sent in the form shown in Figure 2.6
above. The data are encoded in an analog format into the carrier signal to become
what is called a modulated signal. The data are then transmitted through the
modulated carrier. That means, based on the binary data that you want to send,
you modify the signal a little - but the modifications are based on the values of
binary data and the modulation techniques required.
Phase modulation uses different phase changes (0° or 180°) to represent different
bit values. Figure 2.9 shows how phase modulation works.
When a modulated signal arrives at the receiver, the receiver demodulates the signal and converts it into binary data.
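To make the idea concrete, here is a rough Python sketch of binary phase modulation: each bit selects the carrier phase (0° or 180°), and the receiver recovers the bit by inspecting the phase. The carrier frequency and sampling values are illustrative assumptions, not figures from the module.

import math

CARRIER_HZ = 1000.0        # illustrative carrier frequency
SAMPLES_PER_BIT = 8        # one carrier cycle per bit, 8 samples each

def modulate(bits):
    """Phase modulation: bit 0 -> phase 0, bit 1 -> phase 180 degrees."""
    samples = []
    for i, bit in enumerate(bits):
        phase = math.pi if bit else 0.0
        for k in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + k) / (SAMPLES_PER_BIT * CARRIER_HZ)
            samples.append(math.sin(2 * math.pi * CARRIER_HZ * t + phase))
    return samples

def demodulate(samples):
    """Recover each bit from the slope at the start of its bit interval."""
    bits = []
    for i in range(0, len(samples), SAMPLES_PER_BIT):
        # with phase 0 the waveform starts rising (positive second sample);
        # with phase 180 it starts falling (negative second sample)
        bits.append(0 if samples[i + 1] > 0 else 1)
    return bits

print(demodulate(modulate([0, 1, 1, 0])))    # [0, 1, 1, 0]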
What is provided above is only a very brief description (and concepts) of the
computer hardware associated with data transmissions. The next section
introduces the software components of computer networks.
SELF-TEST 2.1
2. Sending long continuous streams of bits may cause the fairness problem (some might call it an "unfairness" problem). If one communication occupies a communication channel for a very long time to send its data, others will suffer from long transmission delays. Sending data in packets can solve this problem. Everyone can send data, but the computer networks will take turns in handling them.
Figure 2.10 shows the general structure of a packet. The header of a packet
includes the source address, destination address, and options. The source address
identifies who has sent this packet, and the destination address identifies who
should receive this packet. Options are usually for network control and
management. Since a packet is assumed to be small, the size of a packet is
variable but it does have an upper bound (an upper limit).
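As a concrete (and purely illustrative) example of the layout in Figure 2.10, the Python sketch below packs a header – source address, destination address and an options field – in front of the data. The field sizes are assumptions made for the sketch, not the module's format.

import struct

HEADER = struct.Struct("!4s4sH")   # 4-byte source, 4-byte destination, 2-byte options

def make_packet(src: bytes, dst: bytes, options: int, data: bytes) -> bytes:
    return HEADER.pack(src, dst, options) + data

def parse_packet(packet: bytes):
    src, dst, options = HEADER.unpack_from(packet)
    return src, dst, options, packet[HEADER.size:]

pkt = make_packet(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02", 0, b"hello")
print(parse_packet(pkt))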
Another analogy, which links better to the connection type described in the next
sub-section, deals with one form of mail delivery. Figure 2.11 shows the
guaranteed and dedicated delivery channel associated with one form of mailing a
letter.
Figure 2.11: An analogy for connected packet service – sending a registered letter
Source: Cisco Systems Networking Academy
Note that two computers do not need to exchange their data continuously. They
can sometimes stop their data communication and then resume it later. The
connection path will not be removed until it is no longer needed. To release the
connection path, one of the two computers will send a disconnection request to
the other computer(s). It then disconnects the path, which is called connection
termination (analogous to hanging up the telephone, or the registered letter being
signed for and delivered).
Figure 2.12: We will do our best to get it there – general mail delivery
Source: Cisco Systems Networking Academy
Then the network routes the packet from the source computer to the destination
computer according to its destination address (analogous to mailing a letter),
though the path is not dedicated or guaranteed. The computer may send more
than one packet to the other computer, but not all of the packets follow the same
path to the destination. Based on the routing mechanism of the network, each
packet will find its own way to travel. For example, a general letter posted in
Wan Chai to Kwun Tong today, might get there through Mongkok; one posted
tomorrow might get there through Quarry Bay. Another just might not get there;
but Hong Kong Post is very reliable.
Subtracting the efficiency percentage from 100% gives a measure of the overhead:

overhead = 100% − efficiency

For example, if the efficiency of your data transmission were 80%, then the overhead in that system would be 20%.
There is no connection set-up procedure for connectionless services, and thus, the
overhead does not include the connection set-up or termination time. However,
since each packet has to be delivered with full source and destination addresses,
the transmission overhead is the address information, and it is fixed regardless of whether the communication time is long or short. Obviously, the quality of service cannot be
negotiated, since each packet is routed independently and thus, the overall
performance is not stable. Also, the sequencing of the packet delivery cannot be
maintained (they can arrive in any order) and there is usually no assurance of
delivery. Connectionless service is suitable for short communication transactions.
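The two overhead patterns can be compared with a small sketch. These helper formulas are one interpretation of the discussion above (efficiency as useful data over total time or bits), so treat them as illustrative rather than definitive.

def efficiency_connection_oriented(setup_s: float, data_s: float) -> float:
    # overhead is the connection set-up (and termination) time
    return data_s / (setup_s + data_s)

def efficiency_connectionless(header_bits: int, packet_bits: int) -> float:
    # overhead is the fixed per-packet header carrying full addresses
    return (packet_bits - header_bits) / packet_bits

print(efficiency_connection_oriented(2.0, 8.0))   # 0.8
print(efficiency_connectionless(20, 100))         # 0.8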
SELF-TEST 2.2
1. Suggest two reasons why packets are used to transfer data
instead of sending logical bits continuously.
2. What is the efficiency of a connection-oriented service if it spends
four seconds to establish the connection setup and ten seconds
for data communication?
3. What is the efficiency of a connectionless service if the size of the
header of a packet is 24-bits and the average packet size is 120-
bits? Assume the header contains the only additional bits needed
to send the packet.
4. If we want the efficiency of the above connection-oriented
service (Question 2) to be not less than the above connectionless
service (Question 3), how long should the data communication
time be maintained in the above connection-oriented service?
Throughput
Throughput is defined as the actual packet delivery rate: the number of packets that are successfully transmitted over the network from one end to the other in one second. The unit of
throughput is bits per second or packets per second. There are many other names
for this term, such as:
• Data transport rate,
• Data transmission rate,
• Data transfer rate, or simply
• Data rate.
normalised throughput = throughput / transmission rate, where 0 ≤ normalised throughput ≤ 1.
Delay
Delay is also known as latency, transmission delay or data transfer delay. Delay
means the time taken from the moment the sender wants to send a packet until the receiver has completely received the packet. It is different from the propagation delay, which is defined as the time taken between the first bit of a packet leaving the sender and that bit arriving at the receiver, or the time taken to send an empty or null message.
It should be obvious that the longer the transmission delay, the worse the QoS, and you should always want a short transmission delay in any system. The
delay might depend on the length of a transmission medium and its properties,
but sometimes it also depends on how a communication protocol handles the
packet transmission procedure. A good communication protocol should find a
good way to minimise this delay factor.
Error Rate
Error rate represents the number of error bits over the total number of bits sent. If the error rate is 0.01, it means there is one bit of error for every 100 bits of data. Error rate is one of the parameters of QoS. In order to provide better QoS, the error rate must be kept as low as possible. However, this factor depends greatly on the communication environment and the transmission media. Thus, what should be used to judge a communication protocol is not the value of the error rate itself but how errors are handled. The greater the efficiency in handling errors, the better the network performs.
SELF-TEST 2.3
1. If a network has a 3 Mbps transmission link and 20% is wasted
during data transmission:
Sharing a transmission link can reduce costs, but shared transmission links
should only be considered in local communications. The two reasons why shared
transmission links are not considered for long-distance communications are as
follows.
Bus Topology
In bus topology, all computers (also called stations, hosts, end stations, or
terminals) share a common (broadcast) transmission medium in a multi-point
configuration. Figure 2.13 shows a general bus topology. Here is how it works.
2. The data signal, shown by the light arrow, then travels in two directions through the bus – one from the computer to the left terminator, the other from the computer to the right terminator. The function of the two terminators is to terminate (or absorb) the data signal at the ends of the bus.
3. Then the bus interface of the receiver checks the data signal on the bus. If the signal belongs to it, the bus interface will copy the signal from the bus to its computer. Note that a bus interface is passive, which means that a computer (e.g. Host n) that is not the receiver of the data signal cannot stop the signal from reaching its bus interface – every interface on the bus sees every signal.
4. Obviously, if more than one computer wants to transmit data at the same
time, it would cause a signal-overlapping problem and no receivers would
receive error-free signals. This particular problem is further discussed and
solved when we study the Ethernet, which is a common LAN commercial
product that invokes (uses) a bus topology.
Ring Topology
In ring topology, all computers are connected into a ring, which is a shared
common transmission link in a multi-point configuration similar to bus topology.
Figure 2.14 below shows the general structure of a ring topology. The operation
of the ring topology is similar to that of the bus topology, except for three
differences.
• The electronic data signal travels in one direction only, not in two directions – it goes clockwise or anti-clockwise around the ring.
• Ring topology has no terminators. When the data signal travels from a sender (e.g. Host 1), it travels around the ring and ultimately comes back to the sender, where the sender finally absorbs it. This means that, when the data signal arrives at the sender, the sender will not allow it to pass around the ring again – it will absorb its own data signal (it does its own housekeeping by acting as the terminator for its own signals).
• When a ring interface (such as the Host 2 interface) receives a data signal, the interface will copy it.
• If the data signal belongs to its computer (i.e. the receiver, e.g. Host 2), the ring interface will copy it into the Host 2 computer and then pass the data signal to the next computer through the ring.
• If the data signal is not destined for Host 2, the Host 2 ring interface will simply copy and send it to the next computer (Host 3) through the ring. If the message is destined for Host 3, the interface will copy the signal to the computer and then pass the message through the ring to Host 4.
• If the message is not intended for Host 4, Host 4 passes it along (back) to Host 1. And you know what Host 1 should do with it, don't you? 'Kill' it, right! When the message is sent back to Host 1, Host 1 will know that the message has been received by Host 3. It will then absorb ('kill') the message and release the free token again for circulation. This behaviour is traced in the sketch after this list.
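The forwarding rules in the list above can be traced with a small sketch (illustrative Python, not the module's code): every interface passes the frame around the ring, the destination also copies it, and the sender finally absorbs it.

def circulate(sender: int, destination: int, n_hosts: int) -> None:
    # hosts are numbered 1..n_hosts around the ring
    host = sender % n_hosts + 1
    while host != sender:
        if host == destination:
            print(f"Host {host}: copies the frame, then passes it on")
        else:
            print(f"Host {host}: passes the frame on")
        host = host % n_hosts + 1
    print(f"Host {sender}: frame has circulated once - absorb ('kill') it")

circulate(sender=1, destination=3, n_hosts=4)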
Obviously, if more than one computer wants to transmit data at the same time,
there is a problem of how control of the ring is assigned to a particular computer -
two signals cannot be active on the ring at the same time. The problem is further
discussed and solved when we study Token Ring later in this topic. Token Ring is a common commercial LAN product built on (using) the ring topology.
Star Topology
In the star topology, all computers are connected to a central node and all data
transmissions go through that node. Note that the central node, which can be
active or passive, could be a:
• Mainframe computer,
• Hub,
• Switch, or
• Router.
In the past, this star-type of LAN topology was usually used for centralised
computer systems. In other words, the node was usually a central computer (i.e. a
mainframe computer) and all of the hosts were terminals (i.e. a terminal
consisting of a monitor with keyboard and a mouse if any). The computation
power and the secondary storage resided in the central computer. When a user
on a terminal wanted to do some operations (e.g. run an application program),
the user sent commands from the terminal to the central computer. After the
computer executed the commands and finished the corresponding operations, the
output was transmitted back to the terminal and displayed on the terminal's monitor.
• Loading. Since the node is involved in all data communications, its speed has
to be very high to process each data communication; otherwise, the node will
become a bottleneck and all data communications might suffer very long
transmission delays if too many packets travel too slowly through the node.
Despite the above disadvantages, the star topology is often applied in LANs, but
bus and ring topologies are still very commonly used.
SELF-TEST 2.4
1. What are the differences between the bus and ring topologies?
3. Although the star topology has some disadvantages, can you point
out any advantages?
2.5.1 Rules
The algorithm used to manage the data transmission in Ethernets is called
CSMA/CD (Carrier Sense Multiple Access with Collision Detect). The algorithm
is a kind of media access control protocol that corresponds to the functions of
Data Link Layer in the ISO OSI model. The CSMA/CD rules are quite simple and
are shown below:
1. When a station (computer) has data to send, it first listens to the channel
(segment) to "see" if anyone else is transmitting at that moment.
4. If a collision occurs, the station aborts the transmission, sends a jam signal,
waits a random amount of time, and then starts all over again from step 1.
In rule one, "listens to the channel" means "measures the voltage of the channel".
Since coaxial cable is used as the shared transmission link, the data signal is an
electronic signal. When the channel is idle, a specified pattern of an electronic
signal can be found in the cable. When a station wants to send data, it measures
the change of the voltage of the channel. If the pattern matches the idle state, the
channel is idle; otherwise, it is busy.
Now, when data can be safely transmitted over the bus, how does the receiver
receive data? The receiver will check the destination address field in the header of
the frame. If the destination address is its own address, it will copy the frame
from the bus through the bus interface.
In rule four (you may refer again to pp. 116–17 in your text), you might wonder
why or how a collision could occur if the first three steps were followed. In fact,
there is still a chance of having collisions. Let us look at Figure 2.16 again.
Let δ be a very short time duration. When t = 0, Host 2 inspects the bus and finds it idle. Thus Host 2 thinks there is no data transmission and then sends its data at t = δ. At t = 2δ, Host 2 is sending data and tries to put the data over the whole bus so that everyone will know the bus is occupied. However, at that same time, Host n starts to detect the status of the bus. At that moment, since the data signal sent from Host 2 has not yet reached Host n, Host n wrongly believes that the bus is idle. At t = 3δ, Host n starts to send its data and thus a collision occurs.
When a station detects a collision, the station will abort the transmission and send
a jam signal. The jam signal is a random signal so that every other station knows
there is a collision in the channel and no one should start any transmission before
the jam signal is completely transmitted. Note that more than one station might
detect the collision and thus, more than one jam signal might be transmitted. This
is allowed in Ethernets.
Now, after the collision is detected and a jam signal is transmitted, all stations
involved in the collision will need to retransmit their data. It is obvious that they
might all retransmit at the same time. If that happens, the data signals will collide
again.
So now the question is: "We know they need to wait for a while and then retransmit, but how can we arrange their random waiting time?" To answer this question, we have the Binary Exponential Backoff algorithm, which is shown below:

After the nth collision, each station chooses a random integer between 0 and 2ⁿ − 1 and waits that number of slot times. If 10 ≤ n ≤ 16, the randomisation interval is frozen at a maximum of 1023 slots. If n > 16, the network is assumed to have crashed.
After the first collision, each station waits either 0 or 1 slot times before trying again. One slot time is equal to the worst case of the round-trip propagation delay, i.e. 2 × the end-to-end propagation delay. End-to-end propagation delay is defined as the time required for the first bit of a frame transmitted from one end to reach the other end. To accommodate the worst case allowed by the IEEE standard, the slot time has been set to 512 bit times (the time taken to send 512 bits), or 512 bits / 10 Mbps = 51.2 μs. If more than one station wants to retransmit after the same number of slot times, they will collide again and then randomly wait for the next retransmission. Next time they will have four slots to select from (0, 1, 2 and 3); that is, the second collision yields four random time slots (between 0 and 2² − 1 = 3). The third collision would yield eight random slots, and so on.
When more than ten collisions have occurred, it is meaningless to increase the randomisation interval further, since the interval is already sufficiently large; it is therefore frozen. If, finally, the messages still cannot be delivered after the 16th collision, the system assumes that:
• The applied load is so high that the network cannot support the transmission of data, so the system announces that the network has crashed.
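The backoff rule can be written directly in code. A minimal Python sketch (illustrative, not the module's implementation); SLOT_TIME_US follows the 51.2 μs slot time computed above.

import random

SLOT_TIME_US = 51.2      # 512 bit times at 10 Mbps, as computed above

def backoff_slots(n_collisions: int) -> int:
    """Binary exponential backoff: slots to wait after the nth collision."""
    if n_collisions > 16:
        raise RuntimeError("network is assumed to have crashed")
    k = min(n_collisions, 10)            # interval frozen at 1023 slots from n = 10
    return random.randint(0, 2 ** k - 1)

for n in (1, 2, 3, 10, 16):
    slots = backoff_slots(n)
    print(f"collision {n}: wait {slots} slots = {slots * SLOT_TIME_US:.1f} us")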
• Length of data field (two bytes): This is the length of the data field in units of bytes; the maximum length of the data field is 1,500 bytes.
• Checksum (four bytes): This field is used by the receiver to check whether the received frame is correct or not. Usually, the Cyclic Redundancy Check (CRC) method is used for error checking. CRC is an error-checking technique in which the receiver calculates a remainder by dividing the frame contents by a polynomial divisor and compares the calculated remainder to the value stored in the checksum of the frame by the sender. Note that the design and implementation of CRC is not included in this course because of its complexity.
First, let us consider the term "applied load". The applied load of a LAN is defined as the total number of frames per second generated by all stations. It includes both newly generated frames and re-attempted frames (retransmitted after collisions). When the applied load is low, the bus is almost always idle and almost no collisions occur. Thus, every station wanting to transmit its frame can start its transmission almost immediately without any delay, and the overall access delay is very small (close to 0). When the applied load is high, many frame transmissions collide and the access delay grows rapidly.
S ≤ 1 / (1 + 4.44a), where a = τ / tₓ.    (Equation 2.1)
Let us follow an example to show how to use the above formula to compute the
upper bound of the normalised throughput.
Consider a 10 Mbps bus LAN using CSMA/CD. Suppose the bus spans 5 km and
the frame size is 1,000-bits. What is the maximum throughput of the LAN (in bps)
if the propagation delay of the bus is 5 μs/km?
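Working the example through (a Python sketch of the arithmetic; the answer follows directly from Equation 2.1, with τ the end-to-end propagation delay and t_x the frame transmission time):

tau = 5 * 5e-6               # propagation delay: 5 km x 5 us/km = 25 us
t_x = 1_000 / 10e6           # transmission time: 1,000 bits / 10 Mbps = 100 us
a = tau / t_x                # a = 0.25
s_max = 1 / (1 + 4.44 * a)   # Equation 2.1: upper bound on normalised throughput
print(s_max)                 # ~0.474
print(s_max * 10e6)          # maximum throughput ~4.74 Mbps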
Figure 2.19 shows an Ethernet with dedicated media full-duplex technology and
a switch. There is no collision in this Ethernet, because stations can transmit a
frame to the switch and receive a frame from the switch. If more than one station
wants to send frames to the same destination, the switch will buffer them and
send them to the destination in sequence.
SELF-TEST 2.5
1. Describe the principle of CSMA/CD protocol.
2.6.1 Rules
The rules of a Token Ring network are quite simple and are shown below:
1. At ring initialisation, a special packet called a token is injected into the ring
and circulates on (around) the ring.
2. The receiving station picks up the frame from the ring and changes a one-bit field for acknowledgements, while the other (non-receiving) stations simply pass the frame along to the next station.
3. The sender station absorbs the transmitted frame when it circulates back.
4. Then the sender station releases the token (the sender then switches to listen
mode).
In rule one, a token circulates around the ring and waits for someone to catch it. Note that, at that moment, all stations are idle and the token travelling around is called a "free" token.
In rule two, when a station wanting to transmit sees a "free" token, it seizes the token by inverting a specified single bit in the three-byte token and changes it (the token) into the first three bytes of a normal data frame (the token and frame formats are shown later). The data to be sent will then follow the token. When other stations "see" this modified token, they know this token is occupied.
To transmit a data frame, the ring interface of the station changes from listen mode to transmit mode. Listen mode is shown in Figure 2.20(a). In this mode, the input frame is simply copied to the output with a one-bit delay. A one-bit delay is the time taken for the ring interface to transmit one bit. For example, if the ring transmission capacity is 1 Mbps, the one-bit delay is 1 μs. In transmit mode [see Figure 2.20(b)], which is entered when the free token is obtained, the ring interface breaks the connection between input and output. The input "token frame" is copied to the station, and the station regenerates its own data frame to the output.
In rule three, "absorb" means that when the transmitted frame circulates back after one cycle around the ring, the sender will receive the frame but will not regenerate it to the next station. The sender does not do the regeneration because the transmitted frame has already circulated the ring once, so the corresponding receiver should already have copied the frame.
In rule four, since every station knows the bit pattern of the "free" token, it is easy for the sender to regenerate a "free" token.
What you would have to weigh or judge, however, is whether waiting is more of
a disadvantage than having systems crash through collisions, which is likely a
major disadvantage of Ethernets.
• Access control (AC) (one byte): It contains the token bit, the monitor bit, priority bits and reservation bits.
• Frame control (FC) (one byte): This distinguishes data frames from various possible control frames.
• Checksum (four bytes): This field is used by the receiver to check whether the received frame is correct or not. The Cyclic Redundancy Check (CRC) method is usually used for error checking.
• Frame status (FS) (one byte): It is for network control, and contains the acknowledgement bits.
The access delay is quite different from that of an Ethernet. When the applied load is low, a station wanting to transmit simply waits for the token's arrival before data transmission can begin. Thus, the range of the access delay is from zero to the ring latency (say L), which is the total delay incurred by a bit (signal) circulating once around the ring, and the average access delay is L/2. When the applied load becomes higher and higher, the access delay becomes longer and longer, until it reaches the maximum access delay. The maximum access delay is reached when all stations want to transmit data and each one that holds the token holds it as long as possible. If the time that a station can hold the token (MTHT = Maximum Token Holding Time) is bounded (limited), the maximum access delay is also bounded, which makes Token Ring suitable for real-time applications.
Suppose there are n stations. The worst situation is that a station wanting to transmit just misses the token and every other station then holds the token as long as possible, i.e. the maximum access delay is approximately (n − 1) × MTHT plus the ring latency L.
The throughput increases as the applied load increases and levels off as the
transmission capacity of the ring is approached. When the access delay is
maximized, its throughput is also maximized.
Assume each station spends all of its token-holding time sending data (ignoring the transmission overhead, including the processing time to seize and release the token). When all stations want to transmit their data – and hold the token as long as possible – the total actual data transmission time is nTₕ, while the ring latency L is the total transmission overhead, i.e.
S ≤ nTₕ / (L + nTₕ).    (Equation 2.3)
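Equation 2.3 is easy to evaluate for illustrative values (the numbers below are assumptions for the sketch, not figures from the module):

def max_normalised_throughput(n: int, t_h: float, latency: float) -> float:
    """Equation 2.3: S <= n*T_h / (L + n*T_h)."""
    return n * t_h / (latency + n * t_h)

# e.g. 20 stations, 10 ms token-holding time each, 1 ms ring latency
print(max_normalised_throughput(20, 0.010, 0.001))   # ~0.995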
From the above table, you may think the overall performance of Token Ring is
better than that of Ethernet. However, Ethernet is more popular than Token Ring
in the commercial world for the following reasons:
You might argue that the performance of Ethernet is not good compared with Token Ring when the applied load is very high. But usually
we do not allow the applied load to be so high. We know that both Ethernet
and Token Ring have large access delays when the applied load is high, and
users would not like either network, because they need to wait for a long
time to do a few remote operations. So, in the commercial world, since a
LAN is inexpensive, companies will increase the transmission rate or even
buy one more LAN to reduce the applied load along with the access delay.
But for Token Ring, it is not easy to add a new user. You need to disconnect the ring, put the new ring interface into the ring and connect them together. Disconnection means stopping the operation of the network, which will disturb other users. Also, one more user adds one more one-bit access delay, so a new user affects the performance of a Token Ring. Moreover, when you want to remove a user, you either need to disconnect the ring and then remove it, or set the ring interface to bypass all packets.
On the other hand, Token Ring does have some advantages over Ethernet:
(a) Reason 1
Token Ring is fairer than an Ethernet. In an Ethernet, when a station has a
lot of data to transmit and its first frame is successfully transmitted, the rest
of the data will easily follow the first frame and be transmitted. This comes
about because this one station always occupies the bus, and other stations would find it difficult to access the bus while that station uses the bus to send its frames continuously – the other stations cannot detect an idle
status on the bus. However, for Token Ring, when a station finishes
sending its first data frame, it has to wait for the token to circulate round
the ring once before it can access the token again. So, if other stations want
to transmit, they have a fair chance to get the token to do their data
transmission.
(b) Reason 2
It is easy to set priorities in a Token Ring system but not in an Ethernet. In
Ethernet, the schedule is first-come, first-served. Thus, even if you have a
high priority, if another station has already occupied the bus, you have to
wait until it finishes. But for Token Ring, if a station has a higher priority, it
can set the token in such a way that it can only be accessed by high-priority
stations; stations with lower priority cannot hold the token to send data
(until it is released).
However, the above two advantages of Token Ring over Ethernet are not
very important in the commercial world and thus, Ethernet is more popular
than Token Ring.
SELF-CHECK 2.6
ACTIVITY 2.1
If you work in an office with a local area network (LAN), try to check
the type of the LAN (Ethernet or Token Ring?) in your office and
check its transmission rate and other specifications.
Do you believe your LAN system is the "best" for your particular
working environment? Think about how you could improve its
usefulness or performance.
2.7 INTER-NETWORKING
LANs can be found everywhere but they are seldom large in size. Companies or
organisations typically like to build a LAN for each department rather than a
large LAN serving all departments. Three reasons to explain why it is better to
build several small LANs and interconnect them rather than building a large one:
Repeater
A repeater is used to regenerate transmission signals (so that they can travel
farther) and so extend the length of a LAN. It operates in the physical layer of the
ISO OSI model. It is usually used when a travelling signal is too weak to travel further.
When a repeater receives a frame, it re-generates the frame and strengthens the
signal. A hub is a multi-port form of a repeater. The main advantage of a repeater
is its low cost, because its function is simple. But its disadvantage is that if a
repeater connects two LANs, the two LANs will actually become one LAN. For
example, if a repeater connects two Ethernets, when a station in one of the two
LANs sends a frame, all stations in both LANs can read this frame - the two have
merged into one. Thus, even though a repeater is the simplest interconnection
device, it is not suitable if you want to build separate LANs.
Bridge
A bridge is a more complicated inter-connection device and has more functions
than a repeater. It operates in the data link layer of the ISO OSI model; it can
connect different LANs together, and it can also filter the traffic between the
LANs. Basically, a bridge serves three functions.
In this topic, you have been introduced to some network concepts. The first
section of the topic deals with some of the hardware issues associated with
connecting "standalone" computers into networks. The important issues in your
choice of transmission medium for a network include the cost of inter-connecting
the computers, but perhaps more importantly it includes the capability of each
medium to transmit your data effectively and efficiently. Different media have
different transmission rates and different spans or ranges over which they are
effective. You should now be able to choose the right one for your network's
needs.
The next two sections deal with two types of network service - connection-
oriented and connectionless - with the definition and parameters to determine the
Quality of Service (QoS). To a large extent, a loose relationship exists between
network services, quality of service and the type of transmission media.
The last sections, which deal with the networking requirements for distributed
systems and two examples (case studies) of LANs, should have provided you
with a good understanding of how the parts fit together to make effective
networks. You were shown the characteristics of Local Area Networks (LANs)
and the different topologies that can be used to develop such networks. Each of
the case studies provides you with information about the rules, frame structure
and performance analysis for Ethernet and Token Ring, and you should also
understand the strengths and weaknesses of these two systems.
Finally, you were exposed to the ways in which LANs could be inter-networked
to form larger units but, at the same time, you were alerted to the limitations
associated with joining LANs into larger units. Perhaps the key issue here
revolved around the fact that even though inter-networking is highly desirable, it
is still advantageous that each individual LAN retains a degree of separateness.
"Linked separateness", if such a thing exists, could be achieved by using bridges
(rather than other devices) to connect the separate LANs.
You now have the background concerning the physical aspects of networks and
distributed systems to put your learning in Topic 3 into context. This next topic
focuses on the protocols required for data transfer through and across networks
and systems.
Topic 3: Transmission Control Protocol/Internet Protocol (TCP/IP)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Describe the concept of layered network architecture;
2. Outline the layered structure of the TCP/IP reference model;
3. Describe the functions of Internet Protocol (IP);
4. Describe the functions of Transmission Control Protocol (TCP);
and
5. Identify different Internet applications.
INTRODUCTION
Now that many computers are connected to the Internet, many users use Internet
application services every day. This important topic introduces the
communication protocol used in the Internet - TCP/IP (Transmission Control
Protocol/Internet Protocol). In fact, the Internet is just one type of WAN (Wide
Area Network). Because the Internet is so popular, everyone knows about it, but
not about WANs in general. So, to make this topic easier to understand, we first
introduce the broad concept of a WAN and its switching technology.
The general structure of a WAN is shown in Figure 3.2. When a packet arrives at
a node, it is stored in the nodeÊs memory and the destination address (included in
the header of the packet) is examined. By searching its routing table, the node
then makes a routing decision about where to forward the packet.
• If the packet is destined to a host that is attached to the local network of the
node, the node will deliver the packet to the network and then to the
destination host computer.
To forward the message, the node puts it on the outgoing queue of the link. The
link transmits packets in its queue on a first-come-first-served basis. Since each
packet requires storing and forwarding procedures during its transmission,
packet switching is also called store-and-forward switching.
For better buffer management and fairness among computers, packets have a
fixed maximum (or bounded) size. If the packet size were unbounded, a large
buffer would be required for each node to store large packets and it would be
difficult to know how many packets a node could handle. Moreover, if a packet is
too large (long), it will occupy the nodes that the packet must travel through for a
long time. Other packets using the same nodes will suffer a long-transmission
delay, and hence it is not fair to them.
Now the question is: What happens if the transmitted data from the application
program is larger than the fixed maximum size allowed in the whole network
protocol (i.e. the whole network software)? The application program should
divide the data into several parts, and then put them into the network protocol.
Then the application program at the destination will recombine them. Note that
this is different from the fragmentation and reassembling in the IP (Internet
Protocol) layer (described later).
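A rough C sketch of that sender-side division follows (ours, not the module's code; MAX_PART is an assumed bound, not a value from the text):

#include <stdio.h>
#include <string.h>

#define MAX_PART 8   /* assumed maximum part size, in bytes */

int main(void) {
    const char *data = "a message larger than one packet";
    size_t len = strlen(data);

    /* Divide the application data into parts no larger than MAX_PART. */
    for (size_t off = 0; off < len; off += MAX_PART) {
        size_t n = (len - off < MAX_PART) ? len - off : MAX_PART;
        printf("part %zu: \"%.*s\"\n", off / MAX_PART, (int)n, data + off);
    }
    /* The application program at the destination recombines the parts. */
    return 0;
}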
READING
Chapter 3, section 3.3, "Network principles", 73-77.
SELF-TEST 3.1
(The seven layers of the ISO OSI reference model, from top to bottom:
Application, Presentation, Session, Transport, Network, Data link and Physical.)
• Physical layer: This is the base layer, which defines the digital interface for the
physical transmission, e.g. communication modes, modulation, multiplexing
and so on. It also defines how to send logical '1' and '0' signals and the
configuration of connectors.
• Data link layer: This layer includes two sub-layers. One is the Medium Access
Control (MAC) sub-layer, which is specific to LAN protocols; CSMA/CD and
Token Ring are two examples of this sub-layer. The other is the Logical Link
Control (LLC) sub-layer, whose role is to implement an error-free
communication link. The functions of this sub-layer include error control,
flow control and link control (link set-up and release procedures).
• Transport layer: This layer isolates the user from the details of the network
access (i.e. the user is not required to know any details about the network).
From this layer up to the application layer, users are not involved in any
physical details of a communication network. The functions of this layer
include Quality of Service (QoS) negotiation, end-to-end reliable message
transport service, and multiplexing.
Figure 3.4 shows an example of how to deliver a packet of data from a sender to
its corresponding receiver in an ISO OSI reference model.
Based on Figure 3.4, when the sender wants to send data to a receiver, the sender
passes the data to the application layer. The application layer then takes some
action on the data and attaches a header (i.e. AH = Application Header) to the
data. The data with the AH header then pass to the presentation layer, and the
presentation layer treats the whole thing received from the application layer as its "data".
This process is repeated until it reaches the data link layer. The data link layer
adds both the header (DH) and the trailer (DT) to the message and passes it to the
physical layer. The physical layer then passes the whole thing - a long series of
logical bits - to the transmission machine, and the machine sends the
corresponding physical signals to the receiver through the transmission medium.
On the receiver's side, the above process is reversed and finally the application
layer on the receiver's side gets the data. The key idea of the process is that every
layer has its own procedure and every layer is independent of the others.
came up with a solution - establish a network to connect computers for data
exchange, i.e. ARPANET. In the beginning, only a few researchers agreed with
this idea; the rest were worried that it would not be successful. Finally it worked,
and it has now become a commercial success - the Internet.
The TCP/IP reference model is shown in Figure 3.5. (The four layers, from top to
bottom: Process/Application, Host-to-Host, Internetworking and Network
Access.) A brief description of each layer follows:
• Network access layer: This layer includes the functions of the physical and
data link layers in the ISO OSI reference model. That means it manages the
communication mode, device specification and low-level network protocol
(e.g. Ethernet, Token Ring).
• Internetworking layer: This layer specifies the format of packets sent across
the Internet and the mechanisms used to forward packets from a host through
one or more routers to the destination.
• Host-to-host layer: This layer specifies an end-to-end protocol for the reliable
transfer of data between two application programs.
The process of the packet delivery in this model is similar to that in the ISO OSI
reference model, except that the former has four layers only, and the latter has
seven layers.
In the ISO OSI reference model, some layers - such as the session and
presentation layers - are insignificant; it is difficult to put suitable functions into
these two layers. The data link layer, however, is too complicated, so it was
divided into two sub-layers (MAC and LLC). Thus, the design is not ideal and
the loading of the layers is not well balanced.
The TCP/IP reference model is easy to implement because only two layers - the
Internet and transport layers - are standardised. The network access layer can
come from any low-level network protocol; any existing low-level network
protocol can serve as the lowest part of the TCP/IP reference model. Also, any
application services can be implemented on top of the Internet and transport
layers. Thus, to design a complete network software solution, you can use any
network components and low-level network protocol for your network, and you
can build any application services you like on top of TCP/IP. All you must do
is include the standard TCP/IP functions. Therefore, you should be able to see
that the TCP/IP model is very simple and efficient when compared with the ISO
OSI reference model.
Your next reading is a long one, and you are advised to make brief notes as you
read the material about protocols, addressing, packet delivery, routing,
congestion control and internetworking. Much of the material here is not really
new, as you were introduced to many of the general concepts in Topic 1 and
Topic 2.
READING
SELF-TEST 3.2
The conversion between bit pattern and dotted decimal notation is simple. To
convert a 32-bit IP address into its dotted decimal notation, we first divide it into
four eight-bit binary integers and then convert each binary integer into a decimal
integer. The conversion is as follows:

ABCDEFGH(2) = A × 2^7 + B × 2^6 + C × 2^5 + D × 2^4 + E × 2^3 + F × 2^2 + G × 2^1 + H × 2^0 (10)

For example,

00011110(2) = 1 × 2^4 + 1 × 2^3 + 1 × 2^2 + 1 × 2^1 (10) = 16 + 8 + 4 + 2 (10) = 30(10).
To convert a decimal integer into binary form, we use long division. For example,
if you wanted to convert 149 into a binary integer, you have:
Thus, 149(10) = 10010101(2). The four eight-bit binary integers are then combined
into one 32-bit IP address.
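As a quick illustration (our sketch, not the module's code), the same conversion in C uses shifts and masks to extract the four eight-bit integers:

#include <stdio.h>
#include <stdint.h>

/* Print a 32-bit IP address in dotted decimal notation. */
static void print_dotted_decimal(uint32_t addr) {
    printf("%u.%u.%u.%u\n",
           (unsigned)(addr >> 24) & 0xFFu,   /* first eight-bit integer  */
           (unsigned)(addr >> 16) & 0xFFu,   /* second eight-bit integer */
           (unsigned)(addr >> 8) & 0xFFu,    /* third eight-bit integer  */
           (unsigned)addr & 0xFFu);          /* fourth eight-bit integer */
}

int main(void) {
    /* 0x90D60614 corresponds to 144.214.6.20, the address used in
     * Self-test 3.3 later in this topic. */
    print_dotted_decimal(0x90D60614u);
    return 0;
}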
3.3.2 Class
A 32-bit IP address has two components - a network identifier and a host
identifier, as shown below.
The network identifier identifies the network that the machine is connected to,
and the host identifier identifies the machine itself on the network. At first, you
might think that optimal values should exist for the sizes of both identifiers and
thus that their sizes should be fixed. For example, we could assign 16 bits to the
network identifier and 16 bits to the host identifier. However, such a fixed
assignment is not flexible. A trade-off exists between the sizes of the network
and the host identifiers:
• If the size of the network identifier were larger, this would allow for a larger
possible number of networks in the Internet, with each network having a
smaller number of hosts.
• If the size of the host identifier were larger, the number of hosts in a network
would be larger but the possible number of networks would be smaller.
It should be obvious that there are different needs in the real world. An
international organisation expects to be able to assign a large number of hosts to
its network, whereas a local trading company would find a smaller number of
hosts is adequate.
Class A
The first bit of an IP address in Class A is 0, and the next seven bits are its
network identifier. Thus, Class A has 126 networks (2^7 - 2 = 126 seven-bit
network identifiers; two special addresses are excluded).
• One special address is all 0s (0.0.0.0). This special IP address is allowed only at
system startup and is not a valid destination address. When a system starts
up or a new machine joins the network, they do not have any IP address and
they need to ask the network administrator to assign one. However, when
they send the request, they do not have their source addresses. At this
moment, they use 0.0.0.0 as their source address. Once they learn their correct
IP address, all 0s will no longer be used.
A Class A network has about 16.8 million hosts (2^24 - 2), based on its
24-bit host identifier and excluding two special addresses. An IP address
with all 0s in its host identifier (xx.0.0.0) represents the network itself.
If an IP address has all 1s in its host identifier (xx.255.255.255), it is a
broadcast address: all hosts in the network xx will receive the message.
Class B
Similarly, Class B has two specified leading bits ('10'), 16,382 networks (14-bit
network identifier: 2^14 - 2) and each network can have 65,534 hosts (16-bit host
identifier: 2^16 - 2).
Class C
Class C has three specified leading bits ('110'), about two million networks (21-bit
network identifier: 2^21 - 2) and each network can have 254 hosts (eight-bit host
identifier: 2^8 - 2).
Class D
Class D is for packet broadcasting. It has four specified leading bits ('1110') and
the remaining 28 bits are used to specify a multicast group. Multicasting is
defined as communication with one sender and many receivers. If a machine is
in a multicast group, that machine will receive any message sent to the multicast
group. Note that a Class D address can be used only as a destination address. It
cannot be used as a source address, because there should be only one sender of a
message. If a Class D IP address were put in the source address field, we would
not know exactly which machine in the multicast group was the sender.
Class E
Class E has five specified leading bits ('11110') and is reserved for research and
development.
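The leading-bit rules above are easy to express in code. A minimal C sketch follows (ours, for illustration only; classful addressing is shown exactly as described above):

#include <stdio.h>
#include <stdint.h>

/* Classify an IP address by its leading bits. */
static char ip_class(uint32_t addr) {
    if ((addr >> 31) == 0x0) return 'A';   /* leading bit  0     */
    if ((addr >> 30) == 0x2) return 'B';   /* leading bits 10    */
    if ((addr >> 29) == 0x6) return 'C';   /* leading bits 110   */
    if ((addr >> 28) == 0xE) return 'D';   /* leading bits 1110  */
    return 'E';                            /* leading bits 11110 */
}

int main(void) {
    /* 144.214.6.20 = 0x90D60614; its leading bits are '10', so Class B. */
    printf("Class %c\n", ip_class(0x90D60614u));
    return 0;
}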
Thus an alternative solution was considered - establish physical links and routers
to connect networks, and apply the same higher-level communication protocol
for each machine so that receivers can understand the content of packets sent
from senders. The most suitable choice of the common high-level communication
protocol is TCP/IP, because it is simple enough (only two layers) for everyone to
implement.
• Bad classification: Originally, having more than one class in the design of IP
addresses was a good idea for different network groups, but the classification
was unrealistic. The number of hosts in a Class A network is unrealistically
large (16.8 million hosts); no network will accommodate such a large number
of host machines, so much of the address space in this class is wasted. Class
B has the best design, since the sizes of both the network and host identifiers
are appropriate. No one uses Class C to assign IP addresses, because the
number of hosts is too small (254 hosts); that number is insufficient for
the network of a large organisation.
• Header length (HLEN) (four bits): This four-bit field defines the length of
the datagram header in four-byte words. This field is required because the
length of an IP header is not fixed. If there are no options, the minimum
header length is 20 bytes and thus the value shown in this field is 5 (5 × 4 =
20). The maximum value of this field is 15, so the maximum header length is
60 bytes.
• Type of service (eight bits): This field lets the sender tell the routers (the
inter-networking devices used to route this datagram) how to handle this
datagram. Figure 3.7 shows the structure of the eight bits.
The TOS bits are a four-bit field used to describe a datagram. However, since
IP provides a connectionless service only (see Topic 2 for the characteristics
of a connectionless service), these descriptions are of little practical use for
the communication.
• Total length (16 bits): This field indicates the total length of a datagram
(including the header). The unit of this field is the byte (1 byte = eight bits).
The maximum value of this field is 2^16 - 1, i.e. 65,535, and thus the maximum
datagram length is 65,535 bytes.
Now we know the maximum size of a datagram. Then you might have a
question: How are the data handled if the data size is larger than the
maximum datagram size excluding the IP and TCP headers? That is easy to
answer: The data should be divided into several parts so that the size of all
parts is less than the allowable maximum. This job should be done by the
application layer since the maximum datagram size is well known, as is the
size of the IP and TCP headers.
A second question is not easy to answer: If a datagram size is larger than the
maximum packet size of a physical network, how should this be handled?
The solution is fragmentation and reassembling. Fragmentation is a way to
divide a datagram into small datagrams (fragments), whereas reassembling is
recombining all fragments into a datagram. Fragmentation of IP datagrams is
necessary because this feature allows networks with different maximum
packet sizes to be connected, especially networks whose maximum packet
size is less than the maximum size of IP datagrams. Another minor advantage
is that short packets are preferred because long packets make other stations
suffer long transmission delay.
When the size of a datagram is larger than the maximum size, a router breaks
the datagram up into a number of small fragments. The IP header of the
datagram is removed first. Then the data field is cut into several small parts
and each part has an IP header attached to form a new datagram. We call this
kind of new datagram a "fragment". The IP layer of the destination can then
reassemble the fragments into the complete datagram before passing it up to
the upper layer protocol (say TCP) entity. The reassembling procedure is also
simple - collect all fragments, remove their headers and combine them all in
sequence.
• DF (one bit): When it is set to one, it tells the Internet (router) not to fragment
the datagram. If it is equal to 0, the contents can be fragmented.
• Fragment offset (13 bits): This tells where this fragment belongs in the
containing (original) datagram. To reassemble, the destination host must
obtain all fragments, starting with the fragment that has offset 0 through the
fragment with the highest offset.
However, the receiver does not necessarily get them in order. Suppose the first
one received is the first fragment (MF = 1 and fragment offset = 0), and the
receiver then waits for the rest. The second one received is the fourth fragment
(MF = 0 and fragment offset = 3); the receiver knows it is the last one, but the
second and the third ones have not been received yet. Later, the second and the
third arrive and the receiver starts to do the reassembling. It combines all
fragments in order and then sends the complete datagram to the upper layer.
• Time to live (eight bits): This specifies how long the datagram is allowed to
remain in the Internet. When a source sends the datagram, it stores a number
in this field; usually, it is set to twice the maximum number of routers
between the source and destination. For example, the maximum number of
routers in Figure 3.2 is five (refer to Self-test 3.1), so the value of "time to
live" will be set to 10. When a router receives the datagram, it decrements
the value of this field by one. If a router receives a datagram whose time-to-
live value has been reduced to zero, the router discards the datagram.
• Protocol (eight bits): This tells the network access layer in the destination host
which upper protocol process to give the datagram to. Usually, it is TCP or
UDP.
• Header checksum (16 bits): The receiver uses this field to check whether the
received header is correct or not. The standard Internet checksum (a 16-bit
ones' complement sum) is used for error-checking.
• Source address (32 bits): This 32-bit field defines the IP address of the source.
• Destination address (32 bits): This 32-bit field defines the IP address of the
destination.
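To make the field layout concrete, the fixed 20-byte header can be sketched as a C struct (our sketch; the field names are ours, and real implementations use explicit shifts and masks rather than relying on struct layout):

#include <stdint.h>

/* Sketch of the fixed 20-byte IPv4 header described above. */
struct ip_header {
    uint8_t  version_hlen;    /* 4-bit version, 4-bit HLEN (4-byte words) */
    uint8_t  type_of_service; /* how routers should handle this datagram */
    uint16_t total_length;    /* total datagram length in bytes (max 65,535) */
    uint16_t identification;  /* groups fragments of the same datagram */
    uint16_t flags_fragoff;   /* DF/MF flags (3 bits) + 13-bit fragment offset */
    uint8_t  time_to_live;    /* decremented by each router; 0 => discard */
    uint8_t  protocol;        /* upper protocol, e.g. 6 = TCP, 17 = UDP */
    uint16_t header_checksum; /* Internet checksum of the header */
    uint32_t source_addr;     /* 32-bit source IP address */
    uint32_t dest_addr;       /* 32-bit destination IP address */
};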
3.3.5 IP Routing
This sub-section demonstrates how to route a datagram from a source to its
corresponding destination.
Consider the network shown in Figure 3.8. Suppose H1 wants to send a packet to
H3. We know H1 is an end-station in network NetA, and H2, H3, and H4 are end-
stations in network NetD. H1 communicates with other stations by using the
native protocol of the network NetA (say PrA, e.g. Ethernet). Similarly, H2, H3,
and H4 communicate with other stations with the native protocol of the network
NetD (say PrD, e.g. Token Ring). Note that it is possible that PrA, PrB, PrC, and
PrD do not use the same protocol.
Thus, H1 will transmit the packet by using an IP protocol which H1, R(ABD) (a
router connected to NetA, NetB, and NetD), and H3 understand and agree on. At
the start of the transmission process, in the IP layer, H1 puts H3Ês IP address in
the destination address and its own IP address in the source address. Then, in the
Host-to-Network layer (PrA), H1 puts its own low-level network address in the
source address field of the header of the PrA-PDU. However, at this time, H1 puts
the low-level network address of R(ABD) in the destination address field. The
reason is that H3's low-level network address cannot be identified by PrA, as the
address belongs to NetD and its format is in PrD. Therefore, instead of sending
the packet directly to H3, H1 will send the packet to router R(ABD) and expect
the router to redirect the packet to H3. The packet formats in the different layers -
and the travelling path of the packet - are shown in Figure 3.9.
When NetA routes the PrA-PDU to the destination R(ABD), R(ABD) will extract
the IP datagram from the PrA-PDU, look at the destination address and decide
that the destination is on NetD. So, R(ABD) sends the datagram to station H3
through NetD, embedding the datagram in a PrD-PDU. This time, the source
address of the PrD-PDU is the low-level network address of R(ABD), and the
destination address is the low-level network address of H3. R(ABD) knows the
low-level network address of H3 because they are in the same network, i.e. NetD.
When H3 receives the PrD-PDU, it will extract the IP datagram and obtain the
data.
2. R(AC) might also receive this packet, but when it checks the destination
address of the IP datagram, it would find that the destination station does
not belong to NetA or NetC. So, R(AC) will take action according to its
routing table - if the destination network address is unknown, it may
forward it directly to the default route; and
3. All routers have at least two layers - IP and host-to-network. The reason is
that, routers need an IP layer to extract IP datagrams and perform IP
routing, and use the host-to-network layer to do the actual switching and
forwarding functions.
corresponding mask is "ANDed" with the destination. Let us check the first entry
of Table 3.1. When the destination address is "ANDed" with 255.0.0.0 (the network
mask in the first entry of Table 3.1), we have 144.0.0.0. Therefore, the destination-
network address should be 144.0.0.0. From the first entry of the routing table in
Table 3.1, the destination-network address is 20.0.0.0, and thus the first entry is
not matched. The "AND" operation of the above process is shown below:
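The original figure is not reproduced here, but the operation itself is just a bitwise AND. A small C sketch (the destination address 144.214.6.20 is an assumption, chosen to be consistent with the fifth-entry match described below):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t dest = 0x90D60614u;   /* assumed destination: 144.214.6.20 */
    uint32_t mask = 0xFF000000u;   /* 255.0.0.0 */
    uint32_t net  = dest & mask;   /* bitwise AND */

    /* Prints 144.0.0.0, which does not equal 20.0.0.0, so the first
     * entry of the routing table is not matched. */
    printf("%u.%u.%u.%u\n",
           (unsigned)(net >> 24) & 0xFFu, (unsigned)(net >> 16) & 0xFFu,
           (unsigned)(net >> 8) & 0xFFu, (unsigned)net & 0xFFu);
    return 0;
}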
Note that it is easy to design a mask for a network identifier. We first check how
long the network identifier is, and then we set all the bits in the network
identifier to one, and the rest of the bits (the host identifier) to zero - and that
becomes the mask of the network. That is, the network-id mask is a bit
combination used to describe which portion of an address refers to the network
and which part refers to the host.
The above process is repeated for each entry in the router table, and finally we
find that the fifth entry matches. Thus, we know the network identifier of the
destination address is 144.214.0.0. From the table, we find that we need to send it
to 192.4.10.8, the IP address of router R3 (the next hop). Note that although R3 has
two IP addresses, we send the packet to 192.4.10.8 but not to 144.214.0.5. The
reason is that, R2 and R3 are in the same network 192.4.10.0. Therefore, R2 knows
the low-level network address of R3 and one of the IP addresses of R3 (the one
with the same network identifier of R2). Therefore, the packet will be sent from
192.4.10.9 through the network 192.4.10.0 to router R3 with the IP address
192.4.10.8. Follow this description very carefully with regard to Figure 3.10 below.
Note that "direct deliver" in the routing table in R2 means that the packet has
already arrived at the destination network. Thus the packet will be directly
delivered to the destination station, since the router knows the low-level network
address and the IP address of the destination.
Sometimes, there might be more than one path from one end to the other.
However, a routing table provides only one next hop for each destination
network; the next hop is usually chosen as the one whose path from the router
offers the shortest transmission delay.
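Putting the mask-and-compare rule together with the routing table, the lookup can be sketched in C as follows (ours, for illustration; only two entries are shown, the next hop of the first entry is assumed, and the second entry mirrors the 144.214.0.0 to 192.4.10.8 case described above):

#include <stdio.h>
#include <stdint.h>

struct route { uint32_t network, mask, next_hop; };

static void print_ip(uint32_t a) {
    printf("%u.%u.%u.%u\n", (unsigned)(a >> 24) & 0xFFu,
           (unsigned)(a >> 16) & 0xFFu, (unsigned)(a >> 8) & 0xFFu,
           (unsigned)a & 0xFFu);
}

int main(void) {
    struct route table[] = {
        { 0x14000000u, 0xFF000000u, 0xC0040A01u }, /* 20.0.0.0, mask 255.0.0.0 */
        { 0x90D60000u, 0xFFFF0000u, 0xC0040A08u }, /* 144.214.0.0, mask 255.255.0.0 */
    };
    uint32_t dest = 0x90D60614u;                   /* 144.214.6.20 */

    /* AND the destination with each entry's mask and compare. */
    for (int i = 0; i < 2; i++) {
        if ((dest & table[i].mask) == table[i].network) {
            print_ip(table[i].next_hop);           /* prints 192.4.10.8 */
            break;
        }
    }
    return 0;
}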
signalling errors or unusual situations. The error messages can be classified into
five groups:
1. Source quench: Quench might be a new word for you. Two synonyms for
quench, as it is used here, are to reduce or to put out (like a fire) - in other
words, to stop or minimize something. A router sends a source quench
whenever it has already received too many datagrams, and no more buffer
space is available to receive more datagrams. The router will be temporarily
out of space and start to discard any incoming datagrams. When it discards
datagrams, it sends a source quench message to the sender of the discarded
datagrams and expects it to reduce the transmission rate.
2. Time exceeded: There are two situations in which this message is sent to the
sender when a router receives its datagram:
(a) When the time-to-live of the datagram has expired (i.e. the datagram
has stayed in the network for so long that the value of time-to-live
has been reduced to zero). Can you remember what determines
the "length" of time-to-live?
(b) When the reassembly timer has expired. When a datagram has to be
fragmented during the transmission, it is divided into small fragments
and all of them will continue to be sent to the receiver. When the
receiver gets the first fragments, it will start a reassembly timer and
then wait for the rest of fragments. If the timer has expired but the
receiver has not received all of the fragments, the received fragments
will be discarded and an ICMP message will be sent to inform the
sender.
READING
SELF-TEST 3.3
1. An IP address is 144.214.6.20.
(a) What is the class of the IP address?
(b) What are the network and host identifiers of the IP
address?
You might already have a question in mind: Why do we need TCP to provide a
reliable point-to-point service? Almost all low-level network protocols such as
Ethernet and Token Ring are able to provide reliable services, so TCP should not
be necessary in order to handle reliability again. Moreover, the efficiency of TCP
is lowered by processing this duplicated function.
The above description also explains why TCP can handle multiplexing. Here,
multiplexing means the way to handle more than one connection in a single
computer. For example, you are allowed to visit the CNN Web page, download a
file through FTP services and send email at the same time. Since each connection
has a unique port number, it is easy to identify which packets belong to which
connection.
Segment Header
PDUs (Protocol Data Units) of TCP are called segments. A segment header has a
fixed size (20-bytes) and is shown in Figure 3.11.
• Source port (16 bits): This 16-bit field defines the TCP port number of the
source application program.
• Destination port (16 bits): This 16-bit field defines the TCP port number of the
destination application program.
• Sequence number (SEQ) (32 bits): This identifies the position in the sender's
byte stream of the data in the segment. It is used for the positive
acknowledgement time-out retransmission mechanism, described later.
• TCP header length (four bits): This shows the length of the header of the TCP
segment (in units of 32-bit words).
• URG (one bit): When it is set to one, the urgent pointer is in use. It is used to
draw the attention of the receiver.
• PSH (one bit): When it is set to one, it indicates to the receiver that it should
deliver the data (and any already buffered) to the application program;
otherwise, the receiver may buffer (and only deliver when the buffer is full) for
efficiency. This bit is used when the sender temporarily has nothing to send, or
at the end of data transmission, so that the receiver has time to handle the
received data.
• FIN (one bit): This bit is used for connection release. When it is set to one, the
sender has reached the end of its byte stream.
• Window size (16 bits): This field is used for flow control. The mechanism of
the flow control is described later.
• Checksum (16 bits): This field is used by the receiver to check whether the
received segment is correct or not. The standard Internet checksum (a 16-bit
ones' complement sum) is used for error-checking.
• Urgent pointer (16 bits): This is used to specify the position in the segment at
which urgent data ends. The urgent data are in the data field of the segment.
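As with the IP header, the fixed 20-byte layout can be sketched as a C struct (our sketch; field names are ours, and the flag bits are shown as masks since C bit-field layout is compiler-dependent):

#include <stdint.h>

/* Sketch of the fixed 20-byte TCP segment header described above. */
struct tcp_header {
    uint16_t source_port;     /* sender's TCP port number */
    uint16_t dest_port;       /* receiver's TCP port number */
    uint32_t seq_number;      /* position in the sender's byte stream */
    uint32_t ack_number;      /* ACKN: next byte expected from the peer */
    uint16_t hlen_flags;      /* 4-bit header length + flag bits below */
    uint16_t window_size;     /* flow control: bytes the peer may send */
    uint16_t checksum;        /* Internet checksum of the segment */
    uint16_t urgent_pointer;  /* where urgent data ends (if URG is set) */
};

/* Flag masks within hlen_flags (low-order bits). */
#define TCP_FIN 0x0001   /* sender has reached end of its byte stream */
#define TCP_SYN 0x0002   /* connection set-up */
#define TCP_RST 0x0004   /* connection resetting */
#define TCP_PSH 0x0008   /* deliver buffered data to the application */
#define TCP_ACK 0x0010   /* ACKN field is valid */
#define TCP_URG 0x0020   /* urgent pointer is in use */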
1. To get things going, the sender sends a connection set-up request to the
receiver. The request segment includes SYN = 1, ACK = 0, and SEQ = x,
where x is an arbitrary positive integer less than 2^32 - 1.
2. When the receiver receives the message and wants to accept the request, it
replies with a connection set-up accept to the sender. The reply segment
includes SYN = 1, ACK = 1, SEQ = y and ACKN = x + 1. The ACKN is set to
x + 1 because the receiver wants to indicate to the sender that it correctly
received its message with sequence number x and expects to receive
message x + 1. The SEQ is set to y, where y is another arbitrary positive
integer.
3. After the sender receives the accept message, it sends a connection set-up
confirm message to the receiver to confirm the connection, and then data
transmission will begin. The SEQ and ACKN are set to x + 1 and y + 1
respectively, because they indicate to the receiver that the sender has
correctly received the receiverÊs message with sequence number y. Note
that once the connection is established, both sides can send and receive
segments simultaneously.
Note that a new set of starting sequence numbers is used on connection set-up.
This is to avoid any segment from a previous connection session between the
same processes confusing the current connection.
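The arithmetic of the handshake is easy to check with a short trace. The sketch below is ours, using assumed (arbitrary) initial sequence numbers x = 100 and y = 300:

#include <stdio.h>

int main(void) {
    unsigned x = 100, y = 300;   /* assumed initial sequence numbers */

    /* The three segments of the connection set-up described above. */
    printf("1. sender   -> receiver: SYN=1 ACK=0 SEQ=%u\n", x);
    printf("2. receiver -> sender:   SYN=1 ACK=1 SEQ=%u ACKN=%u\n", y, x + 1);
    printf("3. sender   -> receiver: ACK=1 SEQ=%u ACKN=%u\n", x + 1, y + 1);
    return 0;
}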
Connection Termination
TCP connection termination is a four-way handshaking mechanism, as shown in
Figure 3.13.
Connection Resetting
TCP may make a request for connection resetting. A connection must be reset if
the current connection is destroyed. There are three possible situations in which a
connection could be destroyed:
1. The sender requests a connection to a port that does not exist or is occupied
by other users. Then the receiver will send a segment with RST = 1 to reject
the request;
3. One side might find the other side is idle for a long time; it sends a segment
with RST = 1 to destroy the connection.
Data Transfer
Normal Operation
TCP is a reliable transport protocol. Damaged and lost segments are handled by a
positive acknowledgement time-out retransmission mechanism.
For example, a sender sends a 100-byte segment with SEQ = 1,000 to its
corresponding receiver. If the receiver receives it correctly, the receiver will reply
with a positive acknowledgement (a TCP segment with no data and ACK = 1)
with SEQ = 1,100, i.e. the sequence number that the next segment sent to the
receiver should carry. You might wonder why the sequence number of the
positive acknowledgement is 1,100, and not 1,001. In fact, this is how TCP is
implemented: TCP adds the data size to the received SEQ number and
acknowledges with

SEQ = SEQ(received) + data size(received) = 1,000 + 100 = 1,100.

Figure 3.14 shows the normal operation of TCP segment transmission.
The error correction mechanism is simple. When a sender sends a segment out, it
starts a timer for the segment. When the timer expires (the time-out period is
over) but no positive acknowledgement has been received, the sender assumes
the sent segment is lost or damaged, and thus it will be retransmitted. Then a
timer for the retransmitted segment will be started. The sender hopes the receiver
will receive the segment correctly this time, and that the sender will receive a
positive acknowledgement within the time-out period.
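The mechanism can be sketched as a loop. The C below is a toy simulation of ours, not real TCP; the "lost" first attempt is contrived simply to show a retransmission happening:

#include <stdio.h>
#include <stdbool.h>

/* Pretend the first transmission is lost; a true return value stands
 * for "positive ACK received before the time-out". */
static bool send_segment(unsigned seq, int attempt) {
    printf("send segment SEQ=%u (attempt %d)\n", seq, attempt);
    return attempt > 1;
}

int main(void) {
    unsigned seq = 1000;
    int attempt = 1;

    /* Keep retransmitting until a positive ACK arrives. In real TCP
     * the wait is governed by a timer started for each segment. */
    while (!send_segment(seq, attempt)) {
        printf("time-out: no ACK for SEQ=%u, retransmitting\n", seq);
        attempt++;
    }
    printf("ACK received for SEQ=%u\n", seq);
    return 0;
}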
Damaged Segment
Figure 3.15 shows a damaged segment arriving at the destination. The sender
sends three segments to the receiver, each of 100 bytes. The sequence number of
the first segment is 1,000. The receiver receives the first and second segments,
checks them and finds there is no error. Then it acknowledges the two segments
by sending an acknowledgment with the sequence number 1,200 to the sender.
The sequence number 1,200 means that all bytes up to 1,199 have been received
successfully and the receiver is expecting the next segment with sequence
number 1,200. This is called a cumulative (accumulative) acknowledgement system.
Note that sometimes the receiver is not required to acknowledge every segment.
When a sender sends many segments to its corresponding receiver, the receiver
may send back fewer acknowledgements rather than the same number of
acknowledgements, because this saves network resources. But of course the most
important thing is to make sure that the timer in the sender does not expire, i.e.
every successfully received segment should be acknowledged before time-out.
Now the third segment with sequence number 1,200 is sent, but it is damaged
before arriving at the receiver. This time the receiver does nothing and lets the
timer in the sender expire. After the timer expires, the sender retransmits the
third one, and this time the transmission is successfully completed.
Lost Segment
Figure 3.16 shows a lost segment in the data transmission. This is similar to the
situation of a damaged segment. The third segment with sequence number 1,200
is sent but lost before arriving at the receiver. Since the receiver does not receive
anything, it does nothing and the timer at the senderÊs end expires. After the
timer expires, the sender retransmits the third one. This time, the transmission is
successfully completed.
Lost Acknowledgement
Figure 3.17 shows a lost acknowledgement sent by the receiver. Sometimes the
sender does not even notice a lost acknowledgement. In this example, the first
acknowledgement with sequence number 1,200 is lost but the second one with
sequence number 1,300 is received properly. Since this is an accumulative
acknowledgement system, if the second acknowledgement (SEQ = 1,300) is
received and the timers of the three segments have not expired, all three
segments are successfully acknowledged.
Even if the second acknowledgement is not received, the system can still recover
from this error. In this situation, the timer of the first segment will expire first,
and then the first segment will be retransmitted. According to the sequence
number of the segment, the receiver knows it is a duplicate segment. The receiver
will discard the segment and send the corresponding acknowledgement back to
the sender. The rest of the segments are handled in the same way.
Piggybacking
TCP offers full-duplex service, in which data can flow in both directions
simultaneously. When a TCP connection is established between A and B, A can
send data to B and B can send data to A. When a segment is sent from A to B, it
can also carry an acknowledgement of packets received from B. Similarly, a
segment sent from B to A can carry an acknowledgement of packets received
from A. This is called "piggybacking", because acknowledgements can be sent
along with the data. Note that if one side does not have any data to send, it can
just send an acknowledgement without data. Figure 3.18 shows TCP
communication with piggybacking.
Flow control
A network is stable if the input rate is the same as the output rate.
When a sender injects packets into a network, the network routes the packets to
its corresponding receiver and the receiver retrieves the packets from the
network. If the number of packets sent is not large, flow control is not important,
because the receiver can retrieve the packets slowly from the network, and the
input packets will not cause network congestion. However, if the number of
packets sent is sufficiently large or the packet transmission time is sufficiently
long, the input rate must be controlled - especially if the input rate is greater than
the packet-retrieving rate of the receiver. Such a discrepancy (difference) causes
network congestion.
The easiest way to solve this problem is to let the receiver control the input rate.
This means that the receiver tells the sender how many packets it can send. This
kind of control is known as flow control. The receiver uses flow control to control
the rate of packets that it is receiving. This is analogous to a conversation between
a young man and an old man. The young man speaks so fast that the old man
cannot follow what he is saying. The old man will ask the younger one to speak
slowly so that he can keep up with him.
The flow control protocol used in TCP is dynamic window protocol. The rules of
the dynamic window protocol are simple:
2. The sender keeps a send window size variable, which is the number of
packets it can send. On sending a packet, the send window size is reduced
by one. If the senderÊs window becomes zero, the sender stops sending
packets.
3. On receiving a window advertisement, the sender sets its send window size
to the value of the window size contained in the window advertisement.
If the receiver cannot handle any more packets for now (e.g. the computer is too
busy to handle other things or suddenly there is no buffer to receive data), it can
send a window advertisement whose window size is zero to stop the sender.
Figure 3.19 shows an example of flow control. The maximum segment size of the
sender is 1,000 octets, and the maximum window advertisement is 2,500 octets.
To start the process, the receiver sends an advertisement window of 2,500 to the
sender to indicate that the receiver allows the sender to send packets with a
maximum of 2,500 octets. Then the sender starts to send packets. First, it sends
packets of 1,000 octets with sequence numbers up to 1,000. Later, it sends packets
of 1,000 octets again but with the sequence number up to 2,000. Finally, the third
group of packets of 500 octets is sent and then the sender is blocked, because the
send window size is 0. The receiver correctly receives all packets and
acknowledges all of them. The sender receives all acknowledgements, but it waits
because the send window size is still zero. Later, the application program of the
receiver reads 2,000 octets and thus 2,000 octets of buffer space become free
(available). The receiver sends a window advertisement of 2,000 to the sender to
inform it. Then the sender receives the message and starts to send more packets
(2,000 octets) until the send window size becomes zero again. The receiver
correctly receives all packets and acknowledges them.
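A small C sketch of ours can replay the window bookkeeping of this example (the octet counts follow the description above; note the example counts octets, while the simplified rules earlier counted packets):

#include <stdio.h>

int main(void) {
    int window = 2500;                 /* send window after the advertisement */
    int sizes[] = { 1000, 1000, 500 }; /* the three groups of packets sent */

    for (int i = 0; i < 3; i++) {
        window -= sizes[i];            /* shrink the send window on each send */
        printf("sent %d octets, send window now %d\n", sizes[i], window);
    }
    printf("window is %d: sender blocked\n", window);

    window = 2000;                     /* new window advertisement arrives */
    printf("advertisement received, send window now %d\n", window);
    return 0;
}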
The maximum send window size (also the maximum value of window
advertisement), can have a significant effect on the performance of the protocol in
the maximum data transfer rate. The following numerical example shows the
effect of the window size.
To complete a packet transmission, the sender first sends a packet to the receiver.
The packet arrives at the receiver, after the end-to-end propagation delay. Then,
after the packet transmission time, the receiver correctly receives the whole
packet and then processes the packet (e.g. does the error checking, places it in the
receiverÊs buffer) and then sends an acknowledgement back. After the sender
receives the acknowledgement, the packet transmission is completed.
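The module's numerical example is not reproduced here, but the underlying relation is that at most one maximum window of data can be outstanding per round trip, so the maximum transfer rate is roughly W / RTT. A short sketch with assumed figures:

#include <stdio.h>

int main(void) {
    double W   = 2500.0;    /* max window advertisement in octets (as above) */
    double RTT = 0.050;     /* assumed round-trip time: 50 ms */

    double rate = W / RTT;  /* octets per second */
    printf("max transfer rate = %.0f octets/s (%.1f kbit/s)\n",
           rate, rate * 8 / 1000.0);
    return 0;
}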
Congestion Control
A packet sent from the sender might need to pass through several routers to
reach the receiver. Each router has a buffer that stores the incoming packets,
processes them (e.g. extracts the destination IP address of a packet, searches the
routing table), and forwards them. If the packet-receiving rate is faster than the
processing rate, congestion occurs and some packets may be discarded. The worst
possibility is that when some packets are discarded, they cannot reach their
destination, and therefore, their senders will not receive any positive
acknowledgement. Then, the senders will retransmit their packets, thereby
creating more congestion.
TCP needs to solve this problem. In the real world, if there is traffic congestion,
we usually need someone (e.g. a police officer) to monitor the traffic. However,
since the Internet does not belong to any organisation or company, it is difficult
to find someone doing the same thing. Thus a distributed congestion control
mechanism is needed (i.e. everyone takes some of the responsibility for solving
the congestion problem). The name of the congestion control method in TCP is
the slow-start algorithm. Let us go through the method; then we explain why it
is called slow-start.
The sender uses the smaller of the two for actual transmission. A threshold, T, is
an integer such that the congestion window increases exponentially until it
reaches the threshold. Usually, T is initially set to 64 Kbytes. The procedure of
the slow-start algorithm is listed below:
1. Wc = 1 (usually 1 Kbyte).
2. When (a) a window is sent, (b) there is no time-out, and (c) Wc is smaller
than T (Wc < T), then Wc = min(2 × Wc, T) (growth rate is exponential).
3. When (a) a window is sent, (b) there is no time-out, and (c) Wc is not
smaller than T (Wc >= T), then Wc = Wc + 1 (growth rate is linear).
Here is an example to show the process of congestion control. The initial value of
T is 64 (i.e. 64-Kbytes), and time-out has occurred when the transmission number
is 12. Table 3.2 and Figure 3.21 show an example of congestion control. Study
Figure 3.21 carefully.
Table 3.2: Congestion Control Algorithm (Time-out Occurred at Trans. No. 12)

Trans. No.   0   1   2   3   4   5   6   7   8   9  10  11
Wc (KB)      1   2   4   8  16  32  64  65  66  67  68  69

Trans. No.  12  13  14  15  16  17  18  19  20  21  22
Wc (KB)     70   1   2   4   8  16  32  35  36  37  38
Figure 3.21 shows that the congestion window increases very slowly at the very
beginning (that is why it is called slow-start) but then increases quickly later.
When it increases up to 64, the rate of increase changes from exponential to
linear. When the 12th group of packets is sent, its acknowledgement times out
(the timer expires but the acknowledgement has not been received) and thus
the threshold is reduced from 64 to 70 / 2, i.e. 35. The congestion window is
reset to one. Then the congestion window increases again at the exponential
rate until it meets the new threshold, 35.
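The procedure can be checked against Table 3.2 with a short simulation (our sketch; the time-out at transmission number 12 is hard-coded to reproduce the table):

#include <stdio.h>

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

int main(void) {
    unsigned Wc = 1;   /* congestion window, in Kbytes */
    unsigned T  = 64;  /* threshold, in Kbytes */

    for (int trans = 0; trans <= 22; trans++) {
        printf("trans %2d: Wc = %2u KB\n", trans, Wc);
        if (trans == 12) {             /* time-out: halve threshold, reset */
            T  = Wc / 2;               /* 70 / 2 = 35 */
            Wc = 1;
        } else if (Wc < T) {
            Wc = min_u(2 * Wc, T);     /* exponential growth phase */
        } else {
            Wc = Wc + 1;               /* linear growth phase */
        }
    }
    return 0;
}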
From the above example, we see that there are two application programs, X and
Y, in a machine connected through the Internet to another two application
programs, M and N respectively, in another machine. Although X and Y are in
the same machine with the IP address 144.214.12.38, X and Y will not have any
conflict because they have different TCP port numbers (X is 290; Y is 23). Also,
they will not communicate with the wrong application programs, because their
partners also have different TCP port numbers (M is 1326 while N is 2529).
SELF-TEST 3.4
UDP is a simple process-to-process protocol that adds only port addresses,
checksum error control, and length information to the data from the upper layer. There
is no positive acknowledgement time-out retransmission scheme, since there is
no sequence number in the UDP header. There is no flow control since there are
no window advertisements and window size fields. There is no congestion
control, because no acknowledgements are expected. That is why it is a
connectionless protocol.
Basically, UDP is a simplified TCP that is suitable for applications requiring short
communication exchanges. For short communications, the TCP connection setup
time is a heavy overhead and thus, UDP is more suitable or efficient. Also, since
the communication is short, flow control and congestion control do not need to be
applied in the communication.
The following short reading deals with features of UDP and TCP. Please pay
particular attention to the transmission parameters that are ideally served by
UDP, and note the reliability guarantees offered by TCP.
READING
DNS specifies only the top level. The authority of assigning domain names under
each node in the top level is delegated to the organisations responsible for that
node. For example, the Hong Kong Internet Organization is responsible for the
hk domain. Another example is the Open University of Hong Kong. It has the
authority to assign any domain name with ouhk.edu.hk as the suffix, e.g.
learn.ouhk.edu.hk. The meaning of domain names is shown in Table 3.3.
DNS Server
A DNS server provides domain name mapping services to its clients. When a
client sends a service request to map the domain name of a machine, the DNS
server replies with the IP address of the machine. This process is called name
resolving. Each machine on the Internet has a piece of software for resolving
names. It is often known as a resolver. For example, in UNIX, this is accessed by
calling gethostbyname. A resolver is configured with the IP address of a local
DNS server. When called, it packages a request to that DNS server. When the
DNS server returns the result, the resolver relays the result to the caller.
When a request reaches a DNS server (usually the closest DNS server), the name
is extracted. If the server is an authority for the name, the name appears in its
mapping database and a lookup of the database will return the IP address.
Otherwise, this DNS server will become the client of another DNS server and will
send a request to that DNS server. When the reply comes back, it in turn replies
to the resolver.
1. Replication: Each root server is replicated; many copies of the server exist
around the world. In practice, the geographically closest server usually
responds best, and the many duplicated root servers share the load.
READING
3.5.2 Email
Simple Message Transfer Protocol (SMTP)
Email is one of the most common Internet applications in the world. Figure 3.25
shows a general architectural model of an email system. This system is called
SMTP (Simple Message Transfer Protocol).
The addressing of SMTP is simple. Each electronic mailbox has a unique address,
which is divided into two parts - the first part identifies a user's mailbox and the
second part identifies the computer on which the mailbox resides. For example,
the email address mt368@learn.ouhk.edu.hk represents "an email account called
mt368 on the learn server of The Open University of Hong Kong". Email software
on the sender's computer uses the second part to select the destination. The email
software on the recipient's computer uses the first part to select the particular
mailbox.
Email Format
Now, let us investigate the message format of email. An email has two parts:
header and body. The header includes the following fields:
• To: email address(es) of primary recipient(s)
• Cc: email address(es) of secondary recipient(s)
• Bcc: email address(es) of blind copies (the same as Cc, but the primary
recipient(s) cannot see who else has received this email)
• From: the person who created the message
• Sender: email address of the actual sender
• Date: the date and time that the message was sent
• Subject: subject matter of the message
• Message-id: a unique number for reference
• Reply to: email address to which replies should be sent.
The body is simply the message itself.
Security
The current Internet email systems have significant security weaknesses.
1. Senders can be faked. Anyone can fake an email address as the sender.
SMTP can be accessed without any protection.
2. Messages can be tapped. Anyone who can tap into the path of the message
can get a copy easily.
One of the most common ways to handle this is MIME (Multipurpose Internet
Mail Extension). The basic idea of MIME is to add structure to the message body
and define encoding rules for non-ASCII data, i.e. binary code → ASCII text
→ e-mail → ASCII text → binary code. We encode every 6 bits as one of 64 base
characters: A, B, C, ..., Z, a, b, c, ..., z, 0, 1, 2, ..., 9, + and /.
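A sketch of this 6-bit encoding in C follows (ours; the table is the standard base64 alphabet just listed, and the '=' padding rule is part of standard base64 rather than something spelled out in the text above):

#include <stdio.h>
#include <string.h>

static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes: every 6 bits of input select one of the 64 characters. */
static void base64_encode(const unsigned char *in, size_t len) {
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = in[i] << 16;              /* pack up to 3 bytes = 24 bits */
        if (i + 1 < len) v |= in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];

        putchar(tbl[(v >> 18) & 0x3F]);        /* first 6 bits  */
        putchar(tbl[(v >> 12) & 0x3F]);        /* second 6 bits */
        putchar(i + 1 < len ? tbl[(v >> 6) & 0x3F] : '=');
        putchar(i + 2 < len ? tbl[v & 0x3F] : '=');
    }
    putchar('\n');
}

int main(void) {
    const char *msg = "OUM";
    base64_encode((const unsigned char *)msg, strlen(msg)); /* prints T1VN */
    return 0;
}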
3.5.3 Telnet
Telnet is a remote terminal access protocol, shown in Figure 3.26. It allows a user
to access a remote computer as if he or she were directly interfaced to the
computer through the keyboard (for input) and display (for output).
When the Telnet operation is started, the Telnet client makes a TCP connection to
the remote server. After the connection is established, the keystroke input of the
client is transmitted to the server through the connection. The server handles the
keystroke input as local input and executes any commands from the input. The
response then appears on the server's screen display. At this moment, this
display is transmitted to the client and thus the client has the same screen
display. Usually the screen display is in text format (i.e. text output). Graphical
output is allowed for some Telnet software, but the communication is more
complicated.
FTP commands will be listed when you type "?" and then press the "enter" key at
the ftp prompt.
The two file types for FTP are text files and binary files. The content of a text file
is transferred as ASCII text. The content of a binary file is transferred as a byte
stream. Note that the default mode of FTP is for text files. If you want to change to
binary file mode, you need to type "b" or "binary" and then press the "return" or
"enter" key. If you want to change back, you just need to type "a" or
"ascii" and then press "return" or "enter". You can use the binary file mode to
transfer text files, but there will be a "^M" at the end of each line. However,
transferred files will be in the wrong format if you use text file mode to transfer
binary files.
SELF-TEST 3.5
ACTIVITY 3.1
If you work in an office with a network that is connected to the
Internet, please check the TCP/IP setting of your machine (e.g. IP
address). Also check what Internet applications are used in your
machine. Based on what you have learned in this topic, do you
believe your system is as efficient and effective as it could be?
This topic looked at some concepts of WANs, TCP/IP, and some Internet
applications.
By now you should know what a WAN is, and know about its packet-switching
technology. You should also understand the two reference models associated
with layered network architecture - the ISO OSI reference model and TCP/IP
reference models - and their differences.
Topic 4: Interprocess Communication (IPC) and Remote Procedure Calls (RPC)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the concepts of marshalling and unmarshalling;
2. Describe synchronous and asynchronous communication;
3. Outline the steps in a remote procedure call (RPC); and
4. Write network programs for RPC.
INTRODUCTION
What is interprocess communication (IPC)? IPC is a communication method that
allows communication among processes that might be located in different
machines. The second question is: Why do we need IPC? We know that the
components of a distributed system are both logically and physically separated.
We need IPC to let them communicate in order to interact.
2. For group communication, the exchange of messages takes place from the
server to a group of clients (members). We need a server to collect messages
from clients (members) and broadcast them to all members in the group.
For client-server communication, we use the two transport-level protocols - TCP
and UDP (User Datagram Protocol). TCP provides a two-way stream communication
between senders and receivers. It includes error control, flow control, and
congestion control. UDP provides a simple message-passing abstraction, which
means that it simply passes messages from a sender to its corresponding receiver,
and leaves the higher system layers to handle the controls.
This topic goes through the details of client-server communication. You study the
concepts of marshalling and unmarshalling with a common external data form -
External Data Representation (XDR). Later, we introduce the concept of
synchronisation between clients and servers, and investigate two kinds of
synchronisation - synchronous communication and asynchronous
communication. Then we discuss the design and implementation problems of
client-server communication. After that, we introduce a high-level model for
client-server communication - remote procedure call (RPC). We show its design
and implementation problems. Finally, we give the C source code of RPC and one
simple example to show how to use RPC programming to implement simple
client-server communications.
struct Data {
    int  length;
    char flag;
    char buffer[20];
};
To flatten this data structure, you would need to pack the 4-byte integer variable
length, then the 1-byte character flag, and then the next 20 bytes of character
array buffer to become the flattened external data form.
To send a data stream, you need to flatten the data first while, on receiving the
data stream, the data structure must be rebuilt for the receiver to receive the data
correctly.
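As a sketch of this flattening in C (ours; it assumes no byte-order or alignment conversion is needed, i.e. both machines are of the same type, as discussed next):

#include <string.h>

/* struct Data as defined above. */
struct Data { int length; char flag; char buffer[20]; };

/* Marshal struct Data into a flat 25-byte external data form:
 * 4 bytes of length, 1 byte of flag, then the 20-byte buffer. */
static void marshal_data(const struct Data *d, unsigned char out[25]) {
    memcpy(out, &d->length, 4);       /* pack the 4-byte integer   */
    out[4] = (unsigned char)d->flag;  /* pack the 1-byte character */
    memcpy(out + 5, d->buffer, 20);   /* pack the 20-byte array    */
}

/* Unmarshalling simply reverses the process at the receiver. */
static void unmarshal_data(const unsigned char in[25], struct Data *d) {
    memcpy(&d->length, in, 4);
    d->flag = (char)in[4];
    memcpy(d->buffer, in + 5, 20);
}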
Ć For two computers of the same type, the conversion to external data form
may be omitted. The sender may send the data bit by bit in its own way, and
the receiver can receive and rebuild the data easily, since the receiver knows
how to handle it.
Ć However, if two computers are not the same type, another way can be used
by converting data into an external data form (or format) · data values are
transmitted in their native form (such as character and integer format) using
special identifiers.
In IPC, the process of flattening the data is called marshalling, and the resulting
sequence of bits is called the external data form. The process of rebuilding the
data from the external data form is called unmarshalling.
The idea behind XDR is quite simple: each message consists of a sequence of
4-byte objects. The three types of object are cardinal/integer, character and
others. The first two are the most common types of data object, so we focus on
the last one. Everything other than the integer and character types is put into
the type 'others', which represents data as sequences of bytes with their length
specified. The representation of each string consists of an unsigned long
giving its length in bytes, followed by the characters of the string. The length
is stored in a 4-byte object, and for simplicity the characters of a string are
assumed to occupy one byte each. If a string cannot completely fill its last
4-byte object, the unused bytes are padded with zeros.
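The Sun XDR library automates exactly this layout. The short sketch below encodes
one string into its external data form in a memory buffer; xdrmem_create,
xdr_string and xdr_getpos are genuine Sun RPC library routines, while the buffer
size of 100 bytes is an arbitrary choice for the example:

#include <stdio.h>
#include <rpc/rpc.h>                /* brings in the Sun XDR routines */

int main(void)
{
    char  buf[100];                 /* will hold the external data form */
    char *msg = "Networks";         /* 8 characters */
    XDR   xdrs;

    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (xdr_string(&xdrs, &msg, sizeof(buf)))
        /* buf now holds a 4-byte length (8) followed by "Netw" and
         * "orks" as two 4-byte objects (zero-padded if short). */
        printf("encoded %u bytes\n", xdr_getpos(&xdrs));
    return 0;
}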
4.2 SYNCHRONISATION
Now you know how computers communicate with each other and how they send
data structures using the marshalling process. However, we have said nothing
about how servers synchronise with clients. If a client starts to send a request
to its server, how does the server know when to receive the request? In our
discussion so far, the server was always waiting for the client's request. Is
that the only way to implement a server? What if the server has its own
background job running and takes care of an incoming request only when it
detects that one is arriving? This is the main issue discussed in this section.
The two types of communication operation are send and receive. Combined with
the above synchronisation operations, we have four types of synchronisation
operation for communication:
1. Blocking send: The issuing process blocks all processing in its own system
(i.e. control is not passed back) until the message sent to the receiving
process has been received. When a process uses the blocking send operation,
a message (request) is delivered to its corresponding remote process. Once
the remote process confirms it has received the message, the issuing process
continues executing its own processing. This is similar to phoning a friend
and waiting for your friend to answer - you don't leave the phone
unattended while you go and do something else.
2. Blocking receive: The issuing process blocks its own processing until a
message has arrived. In other words, if your phone is not being used (the
phone is blocked because it is not processing anything), then you can be
contacted by phone.
The following system calls are used for synchronous communication with error
handling:
• Send (B, msg, TO): A sender, say Machine A, sends a message msg to
Machine B, and blocks. If the message is not received within TO (time-out)
seconds (or no acknowledgement has been received after TO seconds), the
process is unblocked and an error code is returned to inform the sender.
If the client wants to retry after the time-out, it simply calls this
function again in a loop.
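As an illustration, the following sketch shows one plausible way to realise
Send (B, msg, TO) on top of a UDP socket: transmit the message, then wait for
the acknowledgement with select() and return an error code on time-out. The
socket and address parameters and the error-code names are our own assumptions,
not part of the module's primitive:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>

#define SEND_OK       0
#define SEND_TIMEOUT  -1            /* assumed error code for a time-out */

/* Blocking send with a TO-second time-out: transmit msg, then block
 * until an acknowledgement arrives or the timer expires. */
int Send(int sock, const struct sockaddr *dest, socklen_t dlen,
         const char *msg, size_t len, int TO)
{
    char ack[16];
    fd_set rfds;
    struct timeval tv;

    sendto(sock, msg, len, 0, dest, dlen);

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    tv.tv_sec = TO;                 /* the time-out, in seconds */
    tv.tv_usec = 0;

    if (select(sock + 1, &rfds, NULL, NULL, &tv) <= 0)
        return SEND_TIMEOUT;        /* unblock and report the error */

    recvfrom(sock, ack, sizeof(ack), 0, NULL, NULL);
    return SEND_OK;                 /* acknowledgement received */
}

A client wishing to retry after the time-out simply calls Send in a loop until
it returns SEND_OK or a retry limit is reached.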
The processing time of a client process is usually very short compared with the
round-trip time between the client and server. If the non-blocking send operation
is used, the network transmission time and the processing time of the remote
service can be ignored, since the client process executes its own processing
while the remote server processes its request.
1. It is difficult to inform the client process that the reply message has been
received.
2. If the client process is informed, it is not easy to switch from the current job
to handle the incoming message.
To overcome the first problem, you should consider using either polling or an
interrupt to receive the message. To implement polling, a child process is
created to wait for the incoming message. When the child process receives the
message, it stores it in a specified location which the parent process also
knows. After creating the child process, the parent process examines this
location from time to time. When the parent process finds that the message has
been received, it takes action to handle it.
Implementing the interrupt solution is more complicated. Just like polling, a
child process must be created to do the non-blocking receive operation. But this
time, the child process raises an interrupt to inform the parent process that
the message has arrived. The interrupt stops the execution of the parent
process, and the parent process then takes action to handle the message.
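A minimal sketch of the polling scheme, under the assumption that messages
arrive on a socket: the child blocks in recv and forwards the message through a
pipe (the "specified location"), while the parent polls the non-blocking read
end of the pipe between bursts of its own work. The routine do_own_work stands
in for the parent's background job:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

static void do_own_work(void) { /* stands in for the parent's own job */ }

void poll_for_message(int sock)
{
    int  pfd[2];
    char msg[512];
    ssize_t n;

    pipe(pfd);                          /* the shared "specified location" */
    if (fork() == 0) {                  /* child: blocks on the receive */
        n = recv(sock, msg, sizeof(msg), 0);
        if (n > 0)
            write(pfd[1], msg, n);      /* store the message for the parent */
        _exit(0);
    }

    fcntl(pfd[0], F_SETFL, O_NONBLOCK); /* parent polls without blocking */
    for (;;) {
        n = read(pfd[0], msg, sizeof(msg));
        if (n > 0) {                    /* message has been received */
            printf("got %ld bytes\n", (long) n);
            break;                      /* take action to handle it */
        }
        do_own_work();                  /* nothing yet: continue own work */
    }
}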
READING
SELF-TEST 4.1
1. When is it unnecessary to use the external data form in
marshalling?
procedure assumes the system has crashed and the procedure doOperation
will be aborted.
2. Loss of reply message (Case 2): A reply message might also be lost because
of communication link failure, network switch failure, and/or crash of the
receiver. The way to handle the loss of the reply message is the same as for
the loss of a request message.
3. Unsuccessful execution of the request (Case 3): This might happen if the
server crashes while executing the request. When it happens, the server will
shut down and have to be restarted. Usually, all uncompleted executions
before the server crash will be aborted and re-executed after the server
restarts.
from the history, the server process will copy it out, put it into the reply
message, and send the reply message back. The size of the history can be
controlled by a simple timer: when a result has been stored for longer than a
certain time limit (time-out period), it is discarded.
Compared with a duplicate request message, a duplicate reply message is easy to
handle. The procedure doOperation is closed when the reply message is received.
Thus, if the first reply message is received, the procedure is closed and the
duplicate reply message is simply ignored. If the first reply message is
actually lost, the procedure retransmits the request message; later, when the
duplicate reply message is received, the procedure just treats it as the first
reply message.
1. The request (R) protocol: Figure 4.4 shows the R protocol. A client issues a
procedure Send (ServerPort, RequestMessage) and continues its own
processing. It is suitable when no reply is required from the server and
when the client requires no confirmation that the request has been carried
out.
2. The request-reply (RR) protocol: Figure 4.5 shows the RR protocol, the most
commonly used. The reply message from the server also acts as an
acknowledgement of the original request message. A subsequent request
from the client may be regarded as an acknowledgement of the server's
reply message.
READING
Section 4.4, 155–64. This reading provides another perspective on
client-server communication.
SELF-TEST 4.2
1. What are the advantages of doOperation-getRequest-sendReply
communication (synchronous communication) over send-receive
communication (asynchronous communication)?
Figure 4.7 shows a typical RPC model using the RR protocol. When a process on
Machine A calls a procedure on Machine B, all processing on A is suspended
(blocked) and the execution of the called procedure takes place on B. When B
finishes executing the procedure, it sends the result to A. After A receives the
result, it resumes its execution (it unblocks). Note that information can be
transported from the caller to the callee in the parameters and can come back in
the procedure result. No message passing or I/O at all is visible to the
programmer.
Figure 4.8 provides another perspective on the 11 steps listed above. Note,
however, that Stevens's model describes the RPC process using ten steps (not
11), but the overall logic should still be clear to you.
Several things need to be clarified. First, both client and server stubs are
automatically generated by rpcgen. rpcgen is an interface processor (compiler)
that integrates the RPC mechanism with client and server programs in
conventional programming languages. It has four functions:
1. to generate a client stub procedure;
2. to generate a server stub procedure;
3. to use the signatures of the procedures in the interface to generate
marshalling and unmarshalling operations in the above stub procedures;
and
4. to generate procedure headings for each procedure in the service from the
interface definition.
The details of rpcgen are discussed later, in the section on RPC implementation.
Programmers don't need to marshal and unmarshal their request and reply
messages, since both client and server stub procedures are generated
automatically. Programmers also don't need to consider socket (TCP/IP)
programming for their message passing, because the stub procedures handle that
aspect of the communication as well. Thus, all they need to do is develop
their own client process and remote services; the rest is handled by the RPC
mechanism.
On the client side, the client stub receives the procedure call from the client
process. Thus when the client process calls a remote procedure, it just calls a
local procedure in the normal way, and that procedure is stored in the client
stub. When the client stub receives the call, it does the rest, because control
remains in the client stub after the local procedure call. Any error handling is
included in the client stub; the client process doesn't need to take care of it.
When the reply arrives at the client, the client stub unmarshals the result and
passes it to the client process. From the point of view of the client process,
it just receives the result of a local procedure; where the procedure was
actually performed is totally invisible to it.
On the server side, a remote service is just a function or procedure, not a
process; that is, the remote service is not a complete program. When a request
message arrives, the server stub receives the message. It then chooses an
appropriate remote service and makes a call (a local procedure call) to it in
order to serve the request. Since the server stub takes action first, it is a
process, while the remote service it calls is just a function or procedure. On
the client side, the situation is reversed: since the client process executes
first, it is a complete process, and since the client stub is called by the
client process, the client stub is just a function or procedure. In other
words, the client stub is passive and the server stub is active, while the
client process is active and the server's remote service is passive.
Characteristics
After discussing the RPC mechanism in general, we now investigate its strengths:
• Simple call syntax: RPC is designed to have exactly the same syntax as a local
procedure call; in fact, from the view of the client process, an RPC is exactly
the same as a local procedure. Thus programmers don't need to learn anything
new to call remote procedures.
• Well-defined interface: Since the RPC generator is open to the public and the
way to implement RPC is well defined, we have a well-defined interface.
• Ease of use and efficiency: Since the communication part and the marshalling
and unmarshalling procedures are generated automatically by rpcgen,
programmers find RPC easy to use and very efficient.
Although RPC has the above good features, it also has some limitations:
• Speed: Remote procedure call (and return) time (i.e. overhead) can be
significantly (1–3 orders of magnitude) slower than that of a local procedure.
A local procedure usually takes 10 to 100 μs to complete, whereas an RPC
involves network transmission, which takes at least several milliseconds. This
might affect real-time designs, and the programmer should be aware of the
effect. Later, in an activity of this topic, you are required to investigate
this effect by measuring the time taken to complete a remote procedure call.
1. 'Maybe' call semantics: After an RPC time-out (or when a client crashed and
then restarted), the client is not sure whether the remote procedure has been
called or not. This is a situation in which no fault tolerance is built into
the RPC mechanism. Clearly, this call semantics is not desirable, as we have
no way to guarantee whether the RPC was successful.
2. At-least-once call semantics: With this call semantics, the client can assume
that the remote procedure has been executed at least once (on return from
the remote procedure). That means the client does not mind if the remote
procedure is executed more than once, but does not allow it never to be
executed. This call semantics can be implemented by retransmitting the call
request message when a time-out occurs. Clearly, the limitation of this call
semantics is that the server's operation must be idempotent. Can you
remember why?
1. Interface processing integrates the RPC mechanism with client and server
programs in conventional programming languages. The interface compiler
(called the RPC generator in Sun RPC) processes interface definitions written
in an interface definition language. When executed, the interface compiler
generates client and server stub procedures with marshalling and
unmarshalling operations. Procedure headings for each procedure in the
service are also generated from the interface definition, along with a
process for dispatching request messages to the appropriate procedure in the
server.
Servers use two binder interfaces, Register and Withdraw, and clients
use one binder interface, Portlookup. The procedure Register (String
ServiceName, Port ServerPort, int version) is used to register the
name of a server process ServiceName together with its server's port
ServerPort. Note that the integer version records the version of the
service being registered. The procedure Withdraw (String ServiceName,
Port ServerPort, int version) is used to withdraw the registration. The
procedure Portlookup (String ServiceName, int version) is used
by a client to look up the corresponding ServerPort for a given
ServiceName.
READING
Section 5.3, 197–201. This short section describes remote procedure
calls quite concisely and provides a brief description of binding.
SELF-TEST 4.3
4. How does the client transfer its call request (the procedure
name) and the arguments to the server via the network?
5. How does the server react to a request from the client? How
is the procedure selected? How are the arguments
interpreted?
The communication mechanism used in Sun RPC is TCP or UDP, implemented
through socket programming. The example shown in this section runs under
RPCSRC 3.9 on 4.3BSD UNIX.
In the simple example shown in this section, the client calls remote services
using RPC. The server provides the following two functions:
• bin_date_1: This returns the current time as the number of seconds since
00:00:00 GMT, January 1, 1970.
• str_date_1: This takes a long integer value from the above function and
converts it into an ASCII string in human-readable format.
date.x is an RPC specification file that specifies the signatures of the remote
server procedures in date_server.c. The content of date.x is listed below:
/*
 * date.x - specification of remote date and time service.
 */

/*
 * Define 2 procedures:
 *   bin_date_1() returns the binary time and date (no arguments).
 *   str_date_1() takes a binary time and returns a human-readable
 *                string.
 */

program DATE_PROG {
    version DATE_VERS {
        long   BIN_DATE(void) = 1;    /* procedure number = 1 */
        string STR_DATE(long) = 2;    /* procedure number = 2 */
    } = 1;                            /* version number = 1 */
} = 0x31234567;                       /* program number = 0x31234567 */
The file declares both of the procedures and specifies the argument and return
values for each. It also assigns a procedure number to each function (1 and 2),
along with a program number (0x31234567) and a version number (1). The
program numbers are 32-bit integers assigned as follows:
• 0x00000000 – 0x1fffffff is defined by Sun.
• 0x20000000 – 0x3fffffff is defined by users.
• 0x40000000 – 0x5fffffff is for transient use.
• 0x60000000 – 0xffffffff is reserved for future use.
Procedure numbers start at 0. Every remote program and version must define
procedure number 0 as the 'null procedure'. It requires no arguments and
returns nothing, and the rpcgen compiler generates it automatically. Its
purpose is to allow a client to call it to verify that a particular program and
version exist. It is also useful for measuring the round-trip time: since the
null procedure does nothing and returns a null reply, the time taken to call an
RPC with procedure number 0 is just the round-trip communication time.
To execute the rpcgen compiler, you should type the following command:
>rpcgen date.x
Then the rpcgen compiler generates three different files from date.x:
date_svc.c, date.h, and date_clnt.c. The beginning of the file date.h is
shown below:
/*
* Please do not edit this file.
* It was generated using rpcgen.
*/
#include <rpc/types.h>
date_clnt.c and date_svc.c are the client and server stub procedures
respectively. They are compiled together with the client and server programs.
To create the client program from the client main function chkdate.c on the
client side, you should type the following commands in order:
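A plausible form of this command, compiling the client main program together
with the generated client stub and linking the rpc run-time library (the exact
library name varies between systems), is:

>cc -o chkdate chkdate.c date_clnt.c -lrpc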
Note that rpc is the RPC run-time library. On the server side, you should type
the following commands in the following order:
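A plausible equivalent, compiling the remote procedures together with the
generated server stub, is:

>cc -o date_server date_server.c date_svc.c -lrpc

The client main program, chkdate.c, is listed below.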
/*
 * chkdate.c - client program for remote date service.
 */
#include <stdio.h>
#include <rpc/rpc.h>    /* standard RPC include file */
#include "date.h"       /* this file is generated by rpcgen */

main(argc, argv)
int  argc;
char *argv[];
{
    CLIENT *cl;         /* RPC handle */
    char   *server;
    long   *lresult;    /* return value from bin_date_1() */
    char  **sresult;    /* return value from str_date_1() */

    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        exit(1);
    }
    server = argv[1];

    /*
     * Create the client "handle".
     */
    if ((cl = clnt_create(server, DATE_PROG, DATE_VERS, "tcp")) == NULL) {
        /* couldn't establish a connection with the server */
        clnt_pcreateerror(server);
        exit(2);
    }

    /*
     * First call the remote procedure "bin_date".
     */
    if ((lresult = bin_date_1(NULL, cl)) == NULL) {
        clnt_perror(cl, server);
        exit(3);
    }

    /*
     * Now call the remote procedure "str_date".
     */
    if ((sresult = str_date_1(lresult, cl)) == NULL) {
        clnt_perror(cl, server);
        exit(4);
    }
    printf("time on host %s = %s", server, *sresult);

    clnt_destroy(cl);   /* done with the handle */
    exit(0);
}
The program flow is simple. First, check for the existence of the second
argument, which should contain the name of the remote server. Then call
clnt_create to create an RPC handle for the specified program (the second
argument) and version (the third argument) on a host. We also need to specify
which communication protocol is used, usually either TCP or UDP; in our example
we use TCP, so the fourth argument is "tcp". Note that the first argument is the
name of the remote server. After executing the function call clnt_create, we
have the handle, and we can call the remote procedures bin_date_1 and
str_date_1 for that particular program and version. When we have finished
executing the two remote procedures, we call clnt_destroy to destroy the RPC
handle. The server program, date_server.c, is listed below.
/*
 * date_server.c - remote procedures; called by server stub.
 */
#include <rpc/rpc.h>    /* standard RPC include file */
#include "date.h"       /* this file is generated by rpcgen */

/*
 * Return the binary date and time.
 */
long *
bin_date_1()
{
    static long timeval;        /* must be static */
    time_t      time();         /* Unix function */

    timeval = time((time_t *) 0);   /* seconds since 1 Jan 1970 */
    return (&timeval);
}

/*
 * Convert a binary time and return a human readable string.
 */
char **
str_date_1(bintime)
long *bintime;
{
    static char *ptr;           /* must be static */
    char        *ctime();       /* Unix function */

    ptr = ctime(bintime);       /* convert to local-time string */
    return (&ptr);
}
As mentioned, the server program is just a set of functions, not a main
program. The flow is very simple: the first function calls the time function to
get the current time, and the second function converts it into a human-readable
format. Note that the return values must be static variables; if they were not
static, their values would be undefined after the return statement passes
control back to the server stub that called our remote procedure.
On the server side, we execute the server program as follows:
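>date_server &

(A plausible form: the server runs in the background, waiting for incoming
requests.)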
On the client side, we execute the client program and obtain the following
daytime result from the server:
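>chkdate bsdserver
time on host bsdserver = Sat Feb  9 11:12:45 1991

(The hostname bsdserver and the timestamp are illustrative only.)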
Finally, let's go through the way Sun RPC operates in the Sun environment.
Figure 4.10 shows the steps in an RPC.
1. When the server program date_server is started on the remote side, it
calls a function in the RPC library, svc_register, to register its program
number and version with the remote system. This function contacts the port
mapper process to register itself.
2. The client program calls clnt_create to contact the port mapper and find
out the port number of the server.
Note that the dispatcher of Sun RPC, i.e. the port mapper, is quite different
from the dispatcher we introduced earlier. When we introduced the RPC
mechanism, the function of the dispatcher was to select an appropriate server
procedure to serve each incoming request. In Sun RPC, the port mapper does not
select a server process for incoming requests. Instead, it looks up the remote
procedure required by the client and sends its port number back to the client.
Once the client knows the port number of the remote procedure, it contacts the
remote procedure directly from then on, and the port mapper serves that client
no further.
READING
Pages 700–8 from Stevens (1994) Unix Network Programming, at the end
of this topic. This supplementary reading provides you with another
perspective on remote procedure calls in Sun RPC. Please read it to
improve your understanding of this very important network concept.
SELF-TEST 4.4
ACTIVITY 4.1
It is very easy to do the measurement. Just before you start to call the
remote procedure, call the time function time and store the value in a
variable, say start_time. Then, after the remote procedure has executed
and control is back with you, call time again and store the value in
another variable, say finish_time. The difference between these two
variables is the time taken to complete the remote procedure.
You might find that the time taken is too short to measure. We suggest
you measure it over 100 executions. That means you record the initial
time before executing the RPC, then use a for loop to repeat the call
100 times. After the loop has finished, record the final time and take
the difference. Divide it by 100, and the result is the average time
taken to finish the RPC.
It is also suggested that you do the same thing with a local procedure,
which means you cut and paste the remote procedure into your main
program. Just remove the client and server stub procedures (i.e. do not
compile your client program with date_clnt.c) and execute the procedure
directly in your main program. Use the above method to measure the time
taken to execute the local procedure, but this time repeat the execution
10,000 times, because a local procedure takes much less time than a
remote one.
Compare the time taken to execute a local procedure call and a remote
procedure call. You will find the difference is more than 100 times:
executing a remote procedure call is significantly slower than a local
procedure call.
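A sketch of this measurement, reusing the chkdate.c example above: it times
100 calls to bin_date_1 and averages them. We use gettimeofday rather than
time for sub-second resolution; the function name avg_rpc_time is our own:

#include <stdio.h>
#include <sys/time.h>       /* gettimeofday() */
#include <rpc/rpc.h>
#include "date.h"

/* Average the cost of 100 remote calls; cl is an RPC handle
 * already created with clnt_create() as in chkdate.c. */
double avg_rpc_time(CLIENT *cl)
{
    struct timeval start_time, finish_time;
    double elapsed;
    int i;

    gettimeofday(&start_time, (struct timezone *) 0);
    for (i = 0; i < 100; i++)
        if (bin_date_1(NULL, cl) == NULL)   /* one remote call */
            clnt_perror(cl, "bin_date_1");
    gettimeofday(&finish_time, (struct timezone *) 0);

    elapsed = (finish_time.tv_sec - start_time.tv_sec)
            + (finish_time.tv_usec - start_time.tv_usec) / 1e6;
    return elapsed / 100.0;                 /* seconds per RPC */
}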
Topic 4 took you on a guided tour of the design and implementation problems of
interprocess communication (IPC) and remote procedure calls (RPC), and
provided you with a small example of how they are actually implemented. Now
that you have completed the topic, you should be able to explain the role of
marshalling and unmarshalling in IPC. You should also be able to describe and
differentiate between synchronous and asynchronous communication.
Topic 5  Multicast Group Communication
LEARNING OUTCOMES
By the end of Topic 5, you should be able to:
1. Describe the advantages of group communication;
2. Explain the definitions of atomicity, group and ordering; and
3. Discuss the design and implementation issues of group
communication.
INTRODUCTION
Group communication refers to the ability of a set of more than two processes in
a communication network to communicate with each other simultaneously.
Multicast communication refers to a process sending a message to all members of
a group of processes. In this topic, we will refer to group communication as
multicast group communication.
Now, regardless of what answer you came up with for the above situation, if the
number of acknowledgements is not equal to the number of messages sent (i.e. a
message is lost, an acknowledgement is lost, or a process fails), a problem
occurs: who did not get the message? If the source simply retransmits the
message to all members, a lot of bandwidth is wasted, because probably only a
small portion of the destination group (e.g. one out of 100) failed to reply.
This puts an even greater load on the source - and on the system.
However, each time the data change, the modified data must be broadcast to the
processes managing the replicas so that data consistency is maintained across
all processes.
There are six important issues that need to be addressed in the design of group
communication within a distributed system. The following descriptions of these
six issues have been extracted from a paper by Kaashoek [1993].
Addressing
This issue is about addressing methods for a group of members.
Reliability
This issue concerns whether the communication is reliable or unreliable.
Ordering
This issue is about the order among messages in the communication.
To illustrate the difference between FIFO and total ordering, consider a service
that stores records for client processes. Assume that the service replicates the
records on each server to increase availability and reliability and that it
guarantees that all replicas are consistent. If a client may only update its own
records, then it is sufficient that all messages from the same client will be ordered.
Thus, in this case FIFO ordering can be used. If a client may update any of the
records, then FIFO ordering is not sufficient. A total ordering on the updates,
however, is sufficient to ensure consistency among the replicas. To see this,
assume that two clients, C1 and C2, send an update for record X at the same time.
As these two updates will be totally ordered, all servers either (1) receive first the
update from C1 and then the update from C2 or (2) receive first the update from
C2 and then the update from C1. In either case, the replicas will stay consistent,
because every server applies the updates in the same order. If in this case FIFO
(or causal) ordering had been used, it might have happened that the servers
applied the updates in different orders, resulting in inconsistent replicas.
Delivery Semantics
This issue is about how many processes (i.e. group members) must receive the
message successfully.
Response Semantics
This issue is about how to respond to a broadcast message.
"Item five, response semantics, deals with what the sending process expects
from the receiving processes. There are four broad categories of what the
sender can expect - no responses, a single response, many responses and all
responses. Operating systems that integrate group communication and RPC
completely support all four choices."
Group Structure
This issue is about the semantics of a group such as dynamic versus static and
open versus closed.
Logical Clock
Lamport [1978] suggested a method that can be used to order events in a
distributed system. The following brief descriptions have been extracted from the
textbook and another book about distributed systems.
1. If a and b are events in the same process, and a occurs before b, then a → b
is true.
2. If a is the event of a message being sent by one process, and b is the event of
the message being received by another process, then a → b is also true. A
message cannot be received before it is sent, or even at the same time it is
sent, since it takes a finite, non-zero amount of time to arrive (Tanenbaum
and van Steen 2002, 252).
• LC1: Li is incremented before each event is issued at process pi:
Li := Li + 1.
• LC2: (a) When a process pi sends a message m, it piggybacks on m the value
t = Li.
• LC2: (b) On receiving (m, t), a process pj computes Lj := max(Lj, t) and then
applies LC1 before timestamping the event receive(m).
The first rule states that when an event happens, the logical clock is
incremented by one, so that we know the sequence in which events happen within
a process: an event with a smaller logical clock happened earlier than one with
a larger clock. The second rule makes the logical clock be carried in every
message. The third rule arranges the ordering at a process when it receives a
message sent by another process.
Consider Figure 11.6 in your textbook (pg. 447). There are three processes p1,
p2 and p3, and two messages m1 and m2 in the system. All processes have their
logical clocks initialised to 0, and their timestamps are assigned as in the
figure. Event a happens first, so its logical clock is one. Event b happens
next, so its logical clock is two. The message m1, carrying logical clock two,
arrives at process p2. Since process p2 has logical clock 0, which is smaller
than the logical clock of m1, the clock is first set to two and then
incremented by one on receiving the message, so the logical clock of event c
(receiving message m1) is three. Similarly, events d and f have logical clocks
four and five respectively. Note that even though event f immediately follows
event e in process p3, the logical clock of event f is much larger than that of
event e. However, the difference is meaningless: we only care about which
logical clock value is larger, to identify the order of events - not about the
difference.
Moreover, not all events are related by the relation →. For example, events b
and e have taken place in different processes and there is no chain of messages
connecting them. Thus, they are not ordered; we say they are concurrent, i.e.
b || e. Note that the definition of concurrent here is different from that used
in operating systems.
In Figure 11.6 in the textbook, L(b) = 2 and L(e) = 1. But whichever event
happened before or after event b, the value of L(e) would still equal 1. Thus,
even though L(b) > L(e), since the events b and e are in different processes we
still have b || e, and the comparison between their timestamps is meaningless.
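The three rules translate almost directly into code. Here is a minimal sketch
for a single process, with the message transport omitted; the function names
are ours:

/* Lamport's logical clock rules, sketched in C for one process;
 * L is this process's clock, and every message carries the sender's
 * timestamp t. */

static long L = 0;                  /* logical clock, initially 0 */

long local_event(void)              /* LC1: tick before each event */
{
    return ++L;
}

long send_event(long *t_out)        /* LC2(a): piggyback t = L on m */
{
    *t_out = ++L;                   /* the send is itself an event */
    return *t_out;
}

long receive_event(long t)          /* LC2(b): L := max(L, t), then LC1 */
{
    if (t > L)
        L = t;
    return ++L;                     /* timestamp of receive(m) */
}

Tracing Figure 11.6 with these rules: p2 starts with L = 0 and receives m1
carrying t = 2, so receive_event(2) yields max(0, 2) + 1 = 3, the timestamp of
event c.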
Vector Clock
Vector clocks are an improvement on Lamport's logical clock. By using vector
clocks, it is possible to determine, by examining their vector timestamps,
whether two events are ordered by happened-before or are concurrent. The
following brief description has been extracted from the textbook.
Mattern [1989] and Fidge [1991] developed vector clocks to overcome the
shortcoming of Lamport's clock - the fact that from L(e) < L(e′) we cannot
conclude that e → e′. A vector clock for a system of N processes is an array of
N integers. Each process keeps its own vector clock Vi, which it uses to
timestamp local events. Like Lamport timestamps, processes piggyback vector
timestamps on the messages they send to one another, and there are simple rules
for updating the clocks:
• VC1: Initially, Vi[j] = 0, for i, j = 1, 2, ..., N.
• VC2: Just before pi timestamps an event, it sets Vi[i] := Vi[i] + 1.
• VC3: pi includes the value t = Vi in every message it sends.
• VC4: When pi receives a timestamp t in a message, it sets
Vi[j] := max(Vi[j], t[j]), for j = 1, 2, ..., N (a component-wise maximum, or
merge), and then applies VC2 before timestamping the event receive(m).
For a vector clock Vi, Vi[i] is the number of events that pi has timestamped,
and Vi[j] (j ≠ i) is the number of events that have occurred at pj by which pi
has potentially been affected. (Process pj may have timestamped more events by
this point, but no information about them has yet flowed to pi in messages.)
(Coulouris, Dollimore and Kindberg 2001, 399).
Figure 11.7 (pg. 448, textbook) shows the vector timestamps of the events.
Consider the events a and f. Let V(q) be the vector timestamp of an event q;
then we have V(a) = (1, 0, 0) and V(f) = (2, 2, 2). By comparing V(a) and V(f),
we see that V(a) < V(f) and thus a → f. We also know b || e, since V(b) is not
larger than V(e), V(e) is not larger than V(b), and V(b) is not equal to V(e).
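Under the same assumptions as the Lamport sketch above, the four rules for
process i might be coded as follows (N is fixed at three to match the figure):

#define N 3                         /* number of processes, as in the figure */

/* Vector clock rules VC1-VC4 for process i, sketched in C.
 * V is process i's own vector; t is a timestamp carried in a message. */

static int V[N];                    /* VC1: all entries start at 0 */

void vc_event(int i)                /* VC2: tick own entry before an event */
{
    V[i]++;
}

void vc_send(int i, int t_out[N])   /* VC3: piggyback a copy of V */
{
    int j;
    vc_event(i);                    /* the send is itself an event */
    for (j = 0; j < N; j++)
        t_out[j] = V[j];
}

void vc_receive(int i, const int t[N])  /* VC4: merge, then VC2 */
{
    int j;
    for (j = 0; j < N; j++)
        if (t[j] > V[j])
            V[j] = t[j];            /* component-wise maximum */
    vc_event(i);                    /* timestamp the receive event */
}

Tracing event f in Figure 11.7: p3 is at (0, 0, 1) after event e and receives
m2 with t = (2, 2, 0); the merge gives (2, 2, 1) and the tick gives (2, 2, 2).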
READING
SELF-TEST 5.1
1. Give an example of how to use the multicast update, which is
one of the applications of using group communication.
There are two different approaches to implementing group communication - the
centralised approach and the distributed approach.
We will only consider the centralised approach. Compared with the distributed
approach, the centralised approach is simple and efficient in establishing
total ordering. If every multicast message were delivered by its source
directly, it would be difficult to maintain total ordering, because every
member would have to know the ordering of all members. Moreover, in the
centralised approach, the list of all members can be maintained simply by the
centre - not by all members. Each member wanting to broadcast a message just
sends it to the centre, and the centre knows which members should receive the
broadcast; there is no consistency problem in maintaining the list. Finally,
the load of broadcasting messages is transferred from each source to the
centre: even if the number of members in a group is very large (e.g. > 1,000),
each source member sends a broadcast message only once, to the centre, and the
centre does the rest - which matters especially if we want to provide atomicity
(discussed later).
• The centre has to be reliable, or some backup centres are required in case
the centre fails. The centre is so important that we have to keep it alive to
maintain the proper operation of any group communication.
Characteristic                                 Centralised Approach    Distributed Approach
The process that broadcasts messages          The centre              The sender
Membership database management                The centre              All members
Implementation complexity of total ordering   Low                     High
Transmission overhead                         Small                   Large
Reliability                                   Depends on the centre   No need to rely on the centre
Performance bottleneck                        The centre              The sender
Normal Operation
Figure 5.3 shows the normal operation of a group communication. The operation
is simple; we use two multicast messages m and m′ to illustrate it. At the
beginning, S1 and Sn send multicast messages m and m′ to the centre Si. Let's
assume that message m is delivered first and then message m′. Since centre Si
receives both messages, it knows the order of these two messages and thus
assigns them the sequence numbers m.1 and m′.2. After that, the two messages
are broadcast to all processes (members), including the two original sources S1
and Sn. All members receive these two messages and, regardless of the order in
which the messages arrive, they can identify the correct order from the
numbering.
Consider a group with N members (including the centre) in which positive
acknowledgement is used (remember this from the "Introduction" to this topic?).
If there is no transmission error, the expected number of messages transmitted
for one multicast message is 1 + (N - 1) + (N - 1), i.e. 2N - 1. [Your guess
before (for N = 100) was probably 200; 199 is pretty close.] However, if we use
negative acknowledgement, the expected number of messages transmitted is
1 + (N - 1), i.e. N. Now assume that one retransmission is enough for error
handling. Then the expected number of messages transmitted with positive
acknowledgement is 1 + (N - 1) + (N - 2) + 1 + 1, i.e. 2N (this is the 200 you
might have guessed earlier), whereas with negative acknowledgement the number
is 1 + (N - 1) + 1 + 1, i.e. N + 2 (as one member sends a negative
acknowledgement back). Therefore, if N is sufficiently large, the expected
number of messages transmitted with positive acknowledgement is almost twice
that with negative acknowledgement.
Figure 5.5 shows the operation to achieve atomicity of m and m′. Two multicast
messages m and m′ are sent by S1 and Sn respectively. Message m is delivered
first and then message m′. Centre Si then broadcasts the messages to all
members. Each member sends a positive acknowledgement message (dashed line in
the figure) to the centre to confirm that messages m and m′ have been correctly
received. When the centre has received all acknowledgements and confirmed that
all members received the messages correctly, it sends a message (dotted line in
the figure) to inform each member that the atomicity of m and m′ has been
achieved. When a member receives this message, it delivers (commits) messages m
and m′ to its client process.
Note that to achieve this atomicity, many messages must be transmitted for one
multicast message. Consider a group with N members (including the centre). If
there is no transmission error, the expected number of messages transmitted for
one multicast message is 1 + (N - 1) + (N - 1) + (N - 1), i.e. 3N - 2. Also,
until the atomicity message arrives at a member, the member cannot deliver the
multicast message to its client process, because of the rule of atomicity: "A
message transmitted by atomic multicast is either received by all of the
processes that are members of the receiving group or else it is received by
none of them". Thus it may happen that, even though a process has correctly
received the multicast message and sent back a positive acknowledgement, the
message cannot be delivered to its client process and must be removed from the
buffer, because atomicity could not be achieved - i.e. some other processes did
not correctly receive and acknowledge the message.
Note that only members wanting to broadcast messages (S1 and Sn here) are able
to detect the failure of the centre, because they expect to receive their own
messages back from the centre. Therefore, if a member sends a message and does
not receive any message back, it concludes that the centre has crashed. A
non-sending member of the group cannot detect the failure, since it cannot
distinguish between a centre failure and the simple absence of messages to the
group - it was not expecting anything.
Figure 5.9 shows the operation to recover from a centre failure by election.
Assume Sj is the first to invite the other members to elect it as the new
centre; Sj would usually be the backup server. When members learn that the
centre is down, they request Sj to take action. Sj then broadcasts an
invitation (a self-nomination) to every member of the group. When S2, ..., Sn
receive the invitation, they send back a positive acknowledgement (a confirm
message) to Sj, confirming that they agree to elect Sj as the new centre. When
Sj has received all the confirm messages, it becomes the new centre.
Although it seems quite easy to handle this failure, some problems remain.
Sometimes we might decide (in our design) that one backup server is not
reliable enough and that there should be more. However, if we have more than
one backup server, they might compete in the election of the new centre. This
is the problem we address next.
Note that if the centre does not want to accept the application, it simply does
nothing; after process S has tried several times, it stops trying. Also, if
some members do not want S to be a member of the group, they simply do not
acknowledge the "S joins" notification, which means the centre cannot accept S
as a member, since it did not receive all the acknowledgements.
1. If the centre cannot get a majority of members (i.e. j < n / 2), it blocks.
That means the members in partition a will not have group communication.
Even if they want to form a group and the centre establishes group
communication for them in partition a, it is a new group, independent of
group G.
2. In partition b, since the centre (Si) is not in this partition, the members
need to elect a new centre. It can be elected by competition, depending on
which recovery mechanism is used. After the election, the new centre does
the same thing as in partition a: it examines the size of the partition. If
it has a majority of members, the partition becomes group G; otherwise,
there will be no group in partition b and its members will not have group
communication.
Note that if the number of members in partition a is equal to that in partition
b (i.e. each gets half the total membership of group G, n / 2), then the
partition that includes centre Si wins and the other partition blocks. Here,
only one partition is allowed to take the name of the group, because each group
should be unique: if more than one group had the same identifier, they would
either belong to the same group or be in error.
SELF-TEST 5.2
This topic looked at the design and implementation issues of multicast group
communication, focusing initially on the characteristics of atomicity and
ordering in such systems. You should now understand the simple group
management functions.
Suggested Solutions to
Self-test Questions
TOPIC 1: INTRODUCTION TO NETWORK AND DISTRIBUTED SYSTEM
Self-test 1.1
(b) Resource sharing: A network can be used to connect users and shared
resources. Physical devices such as printers, and logical devices such
as databases and program libraries, can be shared.
(n - 1) + (n - 2) + ... + 2 + 1, i.e. n(n - 1) / 2.
Self-test 1.2
1. (a) MAN or LAN, because the size of a shopping centre is fixed. If its size
is large, a MAN or several connected LANs may be required;
otherwise, a LAN is sufficient.
(b) LAN. The staff of the department are usually located together and
thus, the area to be covered will not be too big.
(c) WAN. The WWW provides services to the whole world, so a WAN is
suitable.
(b) IP. The second main function of IP is to "teach" the message how to
travel from a sender, through different routers, to its corresponding
receiver.
(d) TCP, because TCP provides end-to-end reliable transmission for users;
thus correcting any errors in the data bits is a part of its job.
Self-test 1.3
Distributed services: SUN RPC (Remote Procedure Call), NFS (Network File
System), AFS (Andrew File System).
Self-test 1.4
Self-test 1.5
Security model: Internet Banking System (they need to protect remote user
information).
Self-test 2.1
1. (a) 1
(b) 0 (0 is treated as an even number)
(c) 1
2.
Self-test 2.2
(b) Sending a long continuous stream of bits can cause a fairness problem.
If one communication occupies a channel for a very long time, others
suffer long transmission delays. Sending data in packets solves this
problem: everyone can send data, and the network handles the data in
turns. For example, if one computer sends 10,000 bits that require 10
seconds to complete, other computers may suffer a 10-second delay.
However, if the data are sent as ten packets, other stations may suffer
only a 1-second delay, after which their packets may start to transmit.
Self-test 2.3
2. If the loading of a network is low, the throughput is not related to the
delay: even if the delay to send a packet is long, the network can still
sustain the low data transmission rate.
However, if the loading is high and every computer wants to send packets,
the throughput and the delay are closely related. If the average delay to
deliver packets is long, the time needed to complete each transmission is
long, and so the throughput cannot be very high. Sometimes it even falls
below the input packet arrival rate, which causes network congestion.
Self-test 2.4
(a) The data signal in the ring does not travel in both directions on the
shared transmission link, but in one direction only - clockwise or
anticlockwise.
(b) There are no terminators in the ring topology. When the data signal
travels from a sender, it goes through the ring and then comes back to
the sender, and the sender will absorb the data signal. That means that
when the sender receives the data signal, the sender will not send it to
the ring again.
(a) It is a collision-free topology. Since all hosts are connected to the
node with dedicated links, they do not share transmission links (they
share only the node), so there are no collisions on the transmission
links.
(b) There is no control procedure for each host. Hosts just send their
messages or data to the node and the node handles them.
(d) Also, the reliability of the network depends on the node; thus it is
easy to maintain the network - just maintain the node.
Self-test 2.5
(a) When a station (computer) has data to send, it first listens to the
channel (segment) to see if anyone else is transmitting at that moment.
(c) If the channel is busy, the station waits until it becomes idle.
(d) If a collision occurs, the station aborts its transmission, sends a jam
signal, waits a random amount of time, and then starts all over again.
S ≤ 1 / (1 + 4.44a), where a = τ / tx

For a WAN spanning a long distance, the propagation delay τ is large relative
to the frame transmission time tx. Hence a will be large and the maximum
throughput S will be small. Thus, CSMA/CD is unsuitable for WANs.
Also, since the shared transmission link is long, the minimum frame size is
very large, which is not practical in WANs.
Self-test 2.6
(a) At ring initialisation, a special packet called a token is injected into the
ring and circulates on the ring.
(b) A station (wanting to transmit) waits until it sees a token. It holds the
token and transmits its frame (packet) on the ring.
(c) The station absorbs the transmitted frame when it circulates back.
Self-test 3.1
1. 1 → 5
   1 → 4 → 5
   1 → 2 → 3 → 5
   1 → 2 → 3 → 4 → 5
   1 → 4 → 3 → 5
Self-test 3.2
(a) The minimum Ethernet frame size is 2τC, where τ is the end-to-end
propagation delay and C is the link transmission capacity. If we applied
CSMA/CD to a WAN, the value of τ would be very large, and so would the
minimum frame size. For example, a reasonable propagation delay is
10 μs/km, and an appropriate length for a WAN is 100 km; then τ equals
1 ms. A reasonable link transmission capacity is 10 Mbps, so the minimum
frame size under this configuration is 2 × 1 ms × 10 Mbps = 20 kbits,
which is not a reasonable size for a frame.
2. Fixed maximum packet size gives better buffer management and is fairer
for other packets. If we have the fixed maximum size, we know how many
packets a buffer can store, and thus we know how to manage the buffer and
estimate whether it is adequate or not. Also, a fixed maximum packet size
gives the upper bound of transmission delay. Thus no one suffers a long
transmission delay, and packets can take turns being processed within a
node.
3. The similarity is that both handle packet routing. The difference is that
the network layer in the ISO OSI reference model handles network
congestion, whereas the Internet layer in the TCP/IP reference model
defines the IP address format and ignores network congestion.
Self-test 3.3
3.
Destination     Mask             Next Hop
20.0.0.0        255.0.0.0        Direct deliver
40.0.0.0        255.0.0.0        Direct deliver
128.1.0.0       255.255.0.0      Direct deliver
192.4.10.0      255.255.255.0    128.1.0.9
144.214.0.0     255.255.0.0      128.1.0.9
4. ARP is used to find the low-level network address (physical address) of a
host computer when we do not have a mapping between it and its
corresponding IP address. Without ARP, a network administrator would need
to update the mapping manually from time to time, and if the mapping table
were ever out of date, some packets could not be routed correctly to their
destinations.
Self-test 3.4
4.
Trans. No.    Wc (kbytes)
24 (TO)       40
25            1
26            2
27            4
28            8
29            16
30            20
31            21
32            22
33            23
Self-test 3.5
1. A DNS server provides domain-name mapping services to its clients. When a
client sends a request to map the domain name of a machine, the DNS server
replies with the IP address of that machine. This process is called name
resolving. Each machine on the Internet has a piece of software, often
known as a resolver, for resolving names. For example, in UNIX this is
accessed by calling gethostbyname. A resolver is configured with the IP
address of a local DNS server. When called, it packages a request to that
DNS server, and when the DNS server returns the result, the resolver
relays it to the caller.
When a request reaches a DNS server (usually the closest one), the name is
extracted. If the server is an authority for the name, then the name
appears in its mapping database and a lookup returns the IP address.
Otherwise, this DNS server becomes the client of other DNS servers and
sends them a request; when the reply comes back, it in turn replies to the
resolver.
Self-test 4.1
1. External data form is included in order to define the type of each data item
in the message to enable the sender and recipient to interpret them in the
same way. There is no need to include it if the sender and recipient agree on
the number of data items and the type of each one before the message is
transmitted.
2.
8
"Netw"
"orks"
3
"and_"
11
"Dist"
"ribu"
"ted_"
7
"Syst"
"ems_"
3. A blocking send delays the sending process until the message is received.
The delay is considerable when different processes in different computers
are involved. The advantage of a non-blocking send is that it avoids
delaying the sending process, allowing the sender to proceed with work in
parallel with the receiving process.
The disadvantage of a non-blocking send is that the sender must make an
effort to ensure that the message is really received by the receiver,
which causes high implementation complexity.
4. The client would be delayed for a long time waiting for service from the
server, as the server may be busy serving other clients.
Self-test 4.2
In part (a), it is idempotent because you can repeat it any number of times
with the same effect.
In part (b), the operation to write data to a file can be considered in two
different situations. It can be defined as in UNIX, in which each write
operation is applied at the read-write pointer, so the operation is not
idempotent. It can also be defined as in several file servers in which the
write operations are applied to a specified sequence of locations, so the
operation is idempotent because it can be repeated any number of times
with the same effect.
In part (c), the operation to append data to a file is not idempotent, because
this operation puts something new into the file each time.
Self-test 4.3
1. The client provides the arguments and calls the client stub in the
normal way.
2. The client stub builds (marshals) a message (call request) and traps to
Operating System and network kernel.
3. The kernel sends the message to the remote kernel.
4. The remote kernel receives the message and gives it to the server
dispatcher.
5. The dispatcher selects the appropriate server stub.
6. The server stub unpacks (unmarshals) the parameters and calls the
corresponding server procedure.
7. The server procedure does the work and returns the result to the
server stub.
8. The server stub packs (marshals) it in a message (call return) and traps
it to Operating System and network kernel.
9. The remote (receiver) kernel sends the message to the client kernel.
10. The client kernel gives the message to the client stub.
11. The client stub unpacks (unmarshals) the result and returns to client.
2. Assume a high level Interface Definition Language (IDL) exists; the heading
of a stub procedure is derived directly from the corresponding procedure
signature in the IDL. The same names and types are used for the arguments.
Moreover, marshalling requires a library of marshalling and unmarshalling
procedures for the simple data types. In the client stub procedure, the input
arguments are marshalled one by one into the request message and the
output arguments are unmarshalled one by one from the reply message.
Finally, the IDL signature can determine the order of the arguments. All of
the above tasks can be generated automatically from an interface definition.
3. The server uses a server interface definition. An RPC interface specifies
the characteristics of the procedures provided by a server that are visible
to its clients. The characteristics include the names of the procedures and
the types of their parameters; each parameter is defined as an input or
output parameter. In summary, an interface contains a list of procedure
signatures - the names and types of their input and output arguments.
Later, when we talk about the implementation of Sun RPC, we give an example
of such an interface.
4. The client needs to marshal the request and then communicate with the
server. As mentioned, RPC provides a standard mechanism to handle
everything except the contents of the client and server processes. Here, for
each remote procedure call, a client stub procedure is generated and
attached to the client program. The client stub replaces the remote
procedure call with a local call to the stub procedure. The stub procedure is
used to marshal the input arguments and place them into a message with
the procedure identifier of the remote procedure. The client stub uses IPC
primitive to send the call request message to the server and to wait for the
reply message.
5. The server receives the request and unmarshals it. For each server, a
dispatcher is provided in the remote kernel. It receives the call request
message from the client and uses the procedure identifier in the message to
select the appropriate server procedure and pass the arguments to it. For
each procedure in the server that is declared at the server interface as
callable remotely, a server stub procedure is generated. The task of a
server stub procedure is to unmarshal the arguments and call the
corresponding local service procedure.
6. To allow multiple instances of the same service, the binder must associate a
unique (server) identifier with each service name and server port. The
unique identifier may be chosen by the server before Register, or by the
binder, so the unique identifier is returned to the server. The following are
the new versions of the binding service:
/* first alternative - binder adds name, port, version and unique identifier
to its table */
/* binder removes the entry, port, version and unique identifier from its
table */
/* binder looks up the service with this name and version and returns the
server port */
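One plausible set of signatures consistent with these comments, with the
unique identifier uid added to each call (the exact forms are our own
assumptions), is:

Register (String ServiceName, Port ServerPort, int version) returns int uid
Withdraw (String ServiceName, Port ServerPort, int version, int uid)
Portlookup (String ServiceName, int version) returns (Port ServerPort, int uid)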
T′RPC = F + N′ = F + 0.1N
Since N = 0.8 TRPC (network transmission time accounts for 80% of the total),
we have F = 0.2 TRPC, and
T′RPC = 0.2 TRPC + 0.1 × 0.8 TRPC = 0.28 TRPC
Thus the percentage change is (TRPC - T′RPC) / TRPC × 100%, i.e. 72%.
That is, the time to complete an RPC will be shortened by 72% when the
network is upgraded to 100 Mbps.
Self-test 4.4
Self-test 5.1
2. Ordering does not matter if the data sent by two clients are independent of
each other. In other words, there is no time or logical sequence attached to
the order in which messages should be acted on. (There will be no problems
with data inconsistency, because one action is not dependent on a previous
action.)
Although the messages, once sent, will reach their destinations reliably, the
sender may fail after sending to some, but not all, members of the group.
That is, the sending process may fail while it is transmitting the message
to all members, so some members receive the message and others do not.
Self-test 5.2
A total of nine acknowledgements are sent from all members to the centre
(18 acknowledgements would be required if one acknowledgement were used
for one message only).
A total of nine messages are sent from the centre to all members to confirm
that atomicity has been achieved.
6. If the processes in the partition that cannot form a group want to join the
group again, we consider them new members. If the processes in the group
do not want this, they simply ignore the newcomers.
OR
Thank you.