Unit-I
The computers that are in a distributed system can be physically close together and
connected by a local network, or they can be geographically distant and connected by a
wide area network.
2. No global clock: When programs need to cooperate they coordinate their actions by
exchanging messages. Close coordination often depends on a shared idea of the time
at which the programs’ actions occur. But it turns out that there are limits to the
accuracy with which the computers in a network can synchronize their clocks – there
is no single global notion of the correct time. This is a direct consequence of the fact
that the only communication is by sending messages through a network.
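Since physical clocks cannot be perfectly synchronized, ordering is usually recovered logically rather than from wall-clock time. As an illustration (not from the text), here is a minimal sketch of a Lamport logical clock, a standard technique for ordering events purely by message exchange; the class and variable names are hypothetical:

```python
# Minimal Lamport logical clock sketch. Each process keeps a counter,
# increments it on every local event, and attaches it to outgoing
# messages. On receipt, the counter jumps past the sender's timestamp,
# so causally related events are ordered without any global clock.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Timestamp carried on an outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Advance past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()          # a.time == 1
b.local_event()       # b.time == 1
b.receive(t)          # b.time == max(1, 1) + 1 == 2
```

Note that this gives only a partial, causal ordering, which is exactly what the passage above says is achievable when the sole means of communication is message passing.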
3. Independent failures of components: All computer systems can fail, and the system
designer is responsible for planning for the consequences of possible failures. Each
component of a distributed system can fail independently, leaving the others still
running. The failure may be due to a crash or a slow response.
Hardware Concepts
Although all distributed systems consist of multiple CPUs, there are different ways of
interconnecting them and different ways in which they can communicate.
Flynn (1972) identified two essential characteristics to classify multiple CPU computer systems:
the number of instruction streams and the number of data streams
Uniprocessors are SISD.
Array processors are SIMD - the processors cooperate on a single problem.
MISD - no known computer fits this model.
Distributed systems are MIMD - a group of independent computers, each with its own program
counter, program, and data.
MIMD can be split into two classifications
Multiprocessors - CPUs share a common memory
Multicomputers - CPUs have separate memories
Can be further subclassified as
Bus - all machines connected by a single medium (e.g., LAN, bus, backplane, cable)
Switched - a single wire from machine to machine, with possibly different wiring patterns (e.g.,
the Internet)
1. Tightly coupled systems. In these systems, there is a single systemwide primary memory
(address space) that is shared by all the processors [Fig. 1.1(a)]. If any processor writes, for
example, the value 100 to the memory location x, any other processor subsequently reading
from location x will get the value 100. Therefore, in these systems, any communication
between the processors usually takes place through the shared memory.
2. Loosely coupled systems. In these systems, the processors do not share memory, and
each processor has its own local memory [Fig. 1.1(b)]. If a processor writes the value 100 to
the memory location x, this write operation will only change the contents of its local
memory and will not affect the contents of the memory of any other processor.
Hence, if another processor reads the memory location x, it will get whatever value was
there before in that location of its own local memory. In these systems, all physical
communication between the processors is done by passing messages across the network
that interconnects the processors.
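The difference between the two couplings can be sketched in a few lines (the dictionaries and the queue below are illustrative stand-ins, not a real system): a write in a loosely coupled system changes only local memory, and the update reaches another processor only as a message.

```python
# Sketch: in a loosely coupled system, a write only changes local
# memory; the value reaches another node only via an explicit message.
import queue

local_mem_a = {"x": 0}
local_mem_b = {"x": 0}
network = queue.Queue()          # stands in for the interconnection network

# Processor A writes 100 to x: only its own memory changes.
local_mem_a["x"] = 100
# Processor B still sees its old local value for x (0).

# The only way to share the update is to send a message across the network.
network.put(("x", local_mem_a["x"]))
key, value = network.get()
local_mem_b[key] = value         # now both agree: x == 100
```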
Distributed Computing Models
The various models that are used for building distributed computing systems can be
classified into five categories:
1. Minicomputer Model
2. Workstation Model
The workstation model is a network of personal workstations, each having its own disk and
a local file system.
3. Workstation–Server Model
A workstation with its own local disk is usually called a diskful workstation, and a
workstation without a local disk is called a diskless workstation. Diskless workstations
have become more popular in network environments than diskful workstations, making
the workstation-server model more popular than the workstation model for building
distributed computing systems.
A distributed computing system based on the workstation-server model consists of a
few minicomputers & several workstations interconnected by a communication
network.
In this model, a user logs onto a workstation called his or her home workstation. Normal
computation activities required by the user's processes are performed at the user's
home workstation, but requests for services provided by special servers are sent to a
server providing that type of service, which performs the user's requested activity and
returns the result of the request processing to the user's workstation.
Therefore, in this model, the user's processes need not be migrated to the server machines
to get the work done by those machines.
Example: The V-System.
4.Processor–Pool Model:
The processor-pool model is based on the observation that most of the time a user
does not need any computing power but once in a while the user may need a very
large amount of computing power for a short time.
Therefore, unlike the workstation-server model in which a processor is allocated to
each user, in processor-pool model the processors are pooled together to be shared
by the users as needed.
The pool of processors consists of a large number of microcomputers &
minicomputers attached to the network.
Each processor in the pool has its own memory to load & run a system program or
an application program of the distributed computing system.
In this model no home machine is present & the user does not log onto any machine.
This model has better utilization of processing power & greater flexibility.
Example: Amoeba & the Cambridge Distributed Computing System.
5.Hybrid Model:
The workstation-server model has a large number of computer users only
performing simple interactive tasks & executing small programs.
In a working environment that has groups of users who often perform jobs needing
massive computation, the processor-pool model is more attractive & suitable.
To combine Advantages of workstation-server & processor-pool models, a hybrid
model can be used to build a distributed system.
The processors in the pool can be allocated dynamically for computations that are
too large or require several computers for execution.
The hybrid model gives guaranteed response to interactive jobs by allowing them to be
processed on the local workstations of the users.
5. Failure handling
Computer systems sometimes fail. When faults occur in hardware or software, programs
may produce incorrect results or may stop before they have completed the intended
computation. Failures in a distributed system are partial – that is, some components fail
while others continue to function. Therefore the handling of failures is particularly difficult.
6. Concurrency
When several applications or services access a shared resource concurrently, their
operations may conflict with one another and produce inconsistent results.
Each resource must be designed to be safe in a concurrent environment.
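As a minimal illustration of designing a resource to be safe in a concurrent environment, here is a sketch using a mutual-exclusion lock; the `Counter` class is hypothetical and not from the text:

```python
# Sketch: making a shared resource safe under concurrency with a lock.
import threading

class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:        # only one thread updates at a time
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# c.value is 4000 regardless of how the four threads interleave
```

Without the lock, the read-modify-write in `increment` could interleave between threads and lose updates, which is exactly the inconsistency the passage above warns about.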
7. Transparency
One of the main goals of a distributed operating system is to make the existence of multiple
computers invisible (transparent) and provide a single system image to its users. That is, a
distributed operating system must be designed in such a way that a collection of distinct
machines connected by a communication subsystem appears to its users as a virtual
uniprocessor. Achieving complete transparency is a difficult task and requires that several
different aspects of transparency be supported by the distributed operating system.
a) Access Transparency- Access transparency means that users should not need or be able
to recognize whether a resource (hardware or software) is remote or local. This implies that
the distributed operating system should allow users to access remote resources in the same
way as local resources.
b) Location transparency: The users cannot tell where resources are located
The two main aspects of location transparency are as follows:
1. Name transparency- This refers to the fact that the name of a resource (hardware or
software) should not reveal any hint as to the physical location of the resource. That is,
the name of a resource should be independent of the physical connectivity or topology
of the system or the current location of the resource.
2. User mobility- This refers to the fact that no matter which machine a user is logged
onto, he or she should be able to access a resource with the same name. That is, the
user should not be required to use different names to access the same resource from
two different nodes of the system.
c) Migration transparency: Resources can move at will without changing their names.
d) Replication transparency: For better performance and reliability, almost all distributed
operating systems have the provision to create replicas (additional copies) of files and other
resources on different nodes of the distributed system. In these systems, both the existence
of multiple copies of a replicated resource and the replication activity should be transparent
to the users.
e) Failure transparency: Failure transparency deals with masking from the users partial
failures in the system, such as a communication link failure, a machine failure, or a storage
device crash. A distributed operating system having the failure transparency property will
continue to function, perhaps in a degraded form, in the face of partial failures. For
example, if the file service of a distributed operating system is made failure transparent,
users can continue to use files even when some of the file servers fail.
f) Concurrency transparency: It enables several processes to operate concurrently using
shared resources without interference between them.
g) Parallelism transparency: Activities can happen in parallel without users knowing.
h) Scaling transparency: The aim of scaling transparency is to allow the system to expand in
scale without disrupting the activities of the users.
i) Performance transparency: It allows the system to be reconfigured to improve
performance as loads vary.
8. Quality of service
Once users are provided with the functionality that they require of a service, such as the file
service in a distributed system, we can go on to ask about the quality of the service
provided. The main non-functional properties of systems that affect the quality of the
service experienced by clients and users are reliability, security and performance.
Adaptability to meet changing system configurations and resource availability has been
recognized as a further important aspect of service quality.
9. Reliability
One of the original goals of building distributed systems was to make them more reliable
than single-processor systems. The idea is that if a machine goes down, some other machine
takes over the job. A highly reliable system must be highly available, but that is not enough.
Data entrusted to the system must not be lost or garbled in any way, and if files are stored
redundantly on multiple servers, all the copies must be kept consistent. In general, the more
copies that are kept, the better the availability, but the greater the chance that they will be
inconsistent, especially if updates are frequent.
10. Performance
If a distributed system is to be used, its performance must be at least as good as a
centralized system. That is, when a particular application is run on a distributed system, its
overall performance should be better than or at least equal to that of running the same
application on a single-processor system. However, to achieve this goal, it is important that
the various components of the operating system of a distributed system be designed
properly; otherwise, the overall performance of the distributed system may turn out to be
worse than a centralized system. Unfortunately, achieving this is easier said than done.
3.Ring-Based Multiprocessors:
· A single address space is partitioned into a private area and a shared area.
· The private area is divided up into regions so that each machine has a piece for its stack.
· The shared area is divided into 32-byte blocks.
· All machines are connected via a token-passing ring. All components are interconnected
via a Memnet device.
· There is no centralized global memory.
4.Switched Multiprocessors:
· Two approaches can be taken to attack the problem of not enough bandwidth.
· Reduce the amount of communication. E.g. Caching.
· Increase the communication capacity. E.g. Changing topology.
· One method is to build the system as a hierarchy. Build the system as multiple clusters
and connect the clusters using an intercluster bus. As long as most CPUs communicate
primarily within their own cluster, there will be relatively little intercluster traffic. If still
more bandwidth is needed, collect a bus, tree, or grid of clusters together into a
supercluster, and break the system into multiple superclusters.
Design and Implementation issues of DSM
1. Granularity. Granularity refers to the block size of a DSM system, that is, to the unit of
sharing and the unit of data transfer across the network when a network block fault occurs.
Possible units are a few words, a page, or a few pages. Selecting proper block size is an
important part of the design of a DSM system because block size is usually a measure of the
granularity of parallelism explored and the amount of network traffic generated by network
block faults.
2. Structure of shared-memory space. Structure refers to the layout of the shared data in
memory. The structure of the shared-memory space of a DSM system is normally
dependent on the type of applications that the DSM system is intended to support.
3. Memory coherence and access synchronization. In a DSM system that allows replication
of shared data items, copies of shared data items may simultaneously be available in the
main memories of a number of nodes. In this case, the main problem is to solve the memory
coherence problem that deals with the consistency of a piece of shared data lying in the
main memories of two or more nodes.
4. Data location and access. To share data in a DSM system, it should be possible to locate
and retrieve the data accessed by a user process. Therefore, a DSM system must implement
some form of data block locating mechanism in order to service network data block faults to
meet the requirement of the memory coherence semantics being used.
5. Replacement strategy. If the local memory of a node is full, a cache miss at that node
implies not only a fetch of the accessed data block from a remote node but also a
replacement. That is, a data block of the local memory must be replaced by the new data
block. Therefore, a cache replacement strategy is also necessary in the design of a DSM
system.
6. Thrashing. In a DSM system, data blocks migrate between nodes on demand. Therefore,
if two nodes compete for write access to a single data item, the corresponding data block
may be transferred back and forth at such a high rate that no real work can get done. A DSM
system must use a policy to avoid this situation (usually known as thrashing).
The problem of thrashing may occur when data items in the same data block are
being updated by multiple nodes at the same time, causing large numbers of data block
transfers among the nodes without much progress in the execution of the application.
While a thrashing problem may occur with any block size, it is more likely with larger block
sizes, as different regions in the same block may be updated by processes on different
nodes, causing data block transfers that are not necessary with smaller block sizes.
7. Heterogeneity. The DSM systems built for homogeneous systems need not address the
heterogeneity issue. However, if the underlying system environment is heterogeneous, the
DSM system must be designed to take care of heterogeneity so that it functions properly
with machines having different architectures.
False sharing- False sharing occurs when two different processes access two unrelated
variables that reside in the same data block (Fig. 5.2). In such a situation, even though the
original variables are not shared, the data block appears to be shared by the two processes.
The larger the block size, the higher the probability of false sharing, because the
same data block may contain different data structures that are used independently.
Notice that false sharing of a block may lead to a thrashing problem.
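The block arithmetic behind false sharing can be sketched as follows, assuming a 32-byte block size as in the ring-based example earlier; the addresses are made up for illustration:

```python
# False-sharing sketch: two unrelated variables that happen to fall in
# the same DSM block force the whole block to bounce between nodes.
BLOCK_SIZE = 32

def block_of(address):
    return address // BLOCK_SIZE   # which DSM block an address belongs to

addr_a = 4     # variable used only by process P1
addr_b = 20    # variable used only by process P2
addr_c = 40    # variable used only by process P2, in the next block

# addr_a and addr_b share block 0, so P1 and P2 falsely share that
# block even though the variables are unrelated; addr_c is in block 1,
# so accesses to it cause no conflict with addr_a.
```

With a smaller block size (or careful data layout that pads variables into separate blocks), `addr_a` and `addr_b` would land in different blocks and the thrashing described above would disappear.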
3. Structuring as a database:
Structure the shared memory like a database.
The shared-memory space is ordered as an associative memory called a tuple space.
To perform an update, the old data item in the DSM is replaced by a new data item.
Processes select tuples by specifying the number of their fields and their values or types.
In this model, access to shared data is non-transparent, whereas in most systems it is
transparent.
Consistency models
Consistency requirements vary from application to application.
A consistency model basically refers to the degree of consistency that has to be
maintained for the shared-memory data.
It is defined as a set of rules that applications must obey if they want the DSM system to
provide the degree of consistency guaranteed by the consistency model.
If a system supports a stronger consistency model, then the weaker consistency models
are automatically supported, but the converse is not true.
Types:
1. Strict consistency model
This is the strongest form of memory coherence having the most stringent consistency
requirement
Value returned by a read operation on a memory address is always same as the value
written by the most recent write operation to that address
All writes instantaneously become visible to all processes
Implementation of the strict consistency model requires the existence of an absolute
global time
Absolute synchronization of clock of all the nodes of a distributed system is not possible
Implementation of strict consistency model for a DSM system is practically impossible
If the three operations read(r1), write(w1), and read(r2) are performed on a memory
location in that order, the only acceptable ordering for a strictly consistent memory is
(r1, w1, r2).
THRASHING
Thrashing is said to occur when the system spends a large amount of time transferring
shared data blocks from one node to another, compared to the time spent doing the useful
work of executing application processes. It is a serious performance problem with DSM
systems that allow data blocks to migrate from one node to another.
Such situations indicate poor (node) locality in references. If not properly handled, thrashing
degrades system performance considerably. Therefore, steps must be taken to solve this
problem. The following methods may be used to solve the thrashing problem in DSM
systems:
1. Providing application-controlled locks. Locking data to prevent other nodes from
accessing that data for a short period of time can reduce thrashing. An application-
controlled lock can be associated with each data block to implement this method.
2. Nailing a block to a node for a minimum amount of time. Another method to reduce
thrashing is to disallow a block to be taken away from a node until a minimum amount of
time t elapses after its allocation to that node. The time t can either be fixed statically or be
tuned dynamically on the basis of access patterns. The main drawback of this scheme is that
it is very difficult to choose the appropriate value for the time t. If the value is fixed
statically, it is liable to be inappropriate in many cases.
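The nailing scheme above can be sketched as follows; `BlockManager`, the block and node names, and the hold time are all illustrative, and the `now` parameter makes the example deterministic where a real system would use the clock:

```python
# Sketch of "nailing" a data block to a node: a block may not migrate
# away until a minimum hold time t has elapsed since its allocation.
import time

class BlockManager:
    def __init__(self, min_hold_seconds):
        self.min_hold = min_hold_seconds
        self.owner = {}        # block id -> node currently holding it
        self.acquired_at = {}  # block id -> time of allocation

    def allocate(self, block, node, now=None):
        now = time.monotonic() if now is None else now
        self.owner[block] = node
        self.acquired_at[block] = now

    def try_migrate(self, block, new_node, now=None):
        now = time.monotonic() if now is None else now
        if now - self.acquired_at[block] < self.min_hold:
            return False               # denied: the block is still nailed
        self.allocate(block, new_node, now)
        return True

m = BlockManager(min_hold_seconds=5)
m.allocate("x", "node1", now=0)
m.try_migrate("x", "node2", now=2)   # denied: only 2s have elapsed
m.try_migrate("x", "node2", now=6)   # allowed: hold time satisfied
```

The sketch makes the drawback noted above concrete: `min_hold_seconds` is a single static value, and choosing it well for all access patterns is difficult.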
3. Tailoring the coherence algorithm to the shared-data usage patterns- Thrashing can also
be minimized by using different coherence protocols for shared data having different
characteristics.
File models
Unstructured files
This is the simplest model.
A file is an unstructured sequence of data with no substructure; its contents are an
uninterpreted sequence of bytes.
Used by UNIX and MS-DOS.
Most modern operating systems use this model because sharing a file is easier with it
than with the structured file model.
Since a file has no structure, different applications can interpret the contents of a file
in many different ways.
Structured files
This model is rarely used.
A file is an ordered sequence of records. Files can be of different types, with records of
different sizes and different properties.
A record is the smallest unit of data that can be accessed.
There are two categories:
Files with non-indexed records: a record is accessed by specifying its position within
the file, for example, the fifth record from the beginning or the second record from the end.
Files with indexed records: records have one or more key fields and are addressed by
specifying the values of their key fields.
File attributes
Attributes are information describing a file.
Each attribute has a name and a value.
Attributes contain information such as the owner, size, access permissions, date of
creation, date of last modification, and date of last access.
A user can read the value of any attribute but cannot change or modify it directly.
Attributes are maintained and used by the directory service because they are subject to
different access controls.
Mutable files
Used by most existing operating systems.
An update performed on a file overwrites its old contents to produce the new contents.
A file is represented as a single stored sequence that is altered by each update
operation.
Immutable files
Used by the Cedar File System.
A file cannot be modified once it has been created, except to be deleted.
A file-versioning approach is used: a new version of the file is created when a change is
made, rather than updating the same file.
In practice, storage space may be reduced by keeping only the differences between
versions rather than creating the whole file again.
Sharing is much easier because the model supports caching and replication while
eliminating the problem of keeping multiple copies consistent.
It suffers from two issues:
Increased use of disk space.
Increased disk allocation activity.
The Cedar File System (CFS) uses a keep parameter that specifies the number of most
recent versions of a file to be retained.
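The versioning-with-keep idea can be sketched as follows; `VersionedFile` is a hypothetical illustration of the concept, not the actual Cedar implementation (which stores deltas rather than whole copies):

```python
# Sketch of an immutable, versioned file store in the spirit of CFS's
# "keep" parameter: every change creates a new version, and only the
# `keep` most recent versions are retained.
class VersionedFile:
    def __init__(self, name, keep=3):
        self.name = name
        self.keep = keep
        self.versions = []      # oldest first; each entry is immutable

    def write(self, contents):
        self.versions.append(contents)   # never overwrite in place
        if len(self.versions) > self.keep:
            # prune: retain only the `keep` most recent versions
            self.versions = self.versions[-self.keep:]

    def read(self, version=-1):
        return self.versions[version]    # default: most recent version

f = VersionedFile("report.txt", keep=2)
f.write("v1")
f.write("v2")
f.write("v3")
# f.read() is "v3"; "v1" has been pruned, only two versions remain
```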
File-sharing semantics
Multiple users may access a shared file simultaneously. An important design issue for any
file system is to define when modifications of file data done by a user are visible to other
users. This is defined by the file-sharing semantics used by the file system.
1. UNIX semantics
Absolute time ordering is enforced on operations which ensure that read operation on a file
sees the effects of all previous write operations performed on that file. Writes to an open
file immediately become visible to users accessing the file at the same time.
2. Session semantics
A session is a series of file accesses made between the open and close file operations. The
changes made to a file are visible only to the client process that opened the session and
are invisible to other remote processes that have the same file open simultaneously. The
changes made to the file become visible to the remote processes only after the session is
closed.
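Session semantics can be sketched as follows; `SessionFile` and the dict standing in for the file server are illustrative names, not part of any real system:

```python
# Session-semantics sketch: a client works on a private copy made at
# open(); its changes become visible to others only at close().
class SessionFile:
    def __init__(self, store, name):
        self.store, self.name = store, name
        self.copy = store.get(name, "")   # private working copy

    def write(self, text):
        self.copy += text                 # invisible to other sessions

    def close(self):
        self.store[self.name] = self.copy # changes published on close

store = {"f": "hello"}
s1 = SessionFile(store, "f")
s2 = SessionFile(store, "f")
s1.write(" world")
# s2.copy is still "hello": s1's change is not yet visible
s1.close()
# store["f"] is now "hello world"; sessions opened from now on see it
```

Note that `s2` keeps its stale copy even after `s1.close()`, which is exactly the behaviour session semantics permits for sessions that were already open.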
3. Immutable shared-files semantics
This is based on the use of immutable file model where an immutable file cannot be
modified once it is created. Changes to the file are handled by creating a new updated
version of the file. The semantics allows the file to be shared only in the read-only mode.
With this approach, shared files cannot be modified at all; they can only be replaced by
new versions.
4. Transaction-like semantics This is based on the use of transaction mechanism which
ensures that partial changes made to the shared data by a transaction will not be visible to
other concurrently executing transactions until the transaction ends.
A file-caching scheme for a distributed file system contributes to its scalability and reliability
as it makes it possible to cache remotely located data on a client node. Every distributed
file system uses some form of file caching.
The following can be used:
1.Cache Location
Cache location is the place where the cached data is stored. There can be three possible
cache locations
i. Server's main memory:
A cache located in the server's main memory eliminates the disk access cost on a cache hit,
which increases performance compared to no caching.
The reasons for locating the cache in the server's main memory are:
Easy to implement.
Totally transparent to clients.
Easy to keep the original file and the cached data consistent.
ii. Client's disk:
If the cache is located on the client's disk, it eliminates the network access cost but requires
a disk access on a cache hit. This is slower than having the cache in the server's main
memory. Having the cache in the server's main memory is also simpler.
Advantages:
Provides reliability.
Large storage capacity.
Contributes to scalability and reliability.
Disadvantages:
Does not work if the system is to support diskless workstations.
Access time is considerably large.
iii. Client's main memory
A cache located in a client's main memory eliminates both the network access cost and the
disk access cost. This technique is not preferred to a client's disk cache when a large cache
size and increased reliability of cached data are desired.
Advantages:
Maximum performance gain.
Permits workstations to be diskless.
Contributes to reliability and scalability.
2.Modification propagation
When the cache is located on client nodes, a file's data may simultaneously be cached on
multiple nodes. The caches can become inconsistent when the file data is changed by one of
the clients and the corresponding data cached at other nodes is not changed or discarded.
The modification propagation scheme used has a critical effect on the system's performance
and reliability.
Techniques used include –
i.Write-through scheme
When a cache entry is modified, the new value is immediately sent to the server for
updating the master copy of the file.
Advantage:
High degree of reliability and suitability for UNIX-like semantics.
The risk of updated data getting lost in the event of a client crash is low.
Disadvantage:
Poor Write performance.
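A minimal sketch of the write-through scheme follows; the class names are hypothetical and a dict stands in for the server's master copies:

```python
# Write-through sketch: every modification of a cache entry is
# immediately propagated to the server's master copy, so the server is
# never stale, at the cost of one network write per update.
class Server:
    def __init__(self):
        self.master = {}           # master copies of file blocks

class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.server.master[key] = value   # immediate propagation

    def read(self, key):
        if key not in self.cache:         # cache miss: fetch from server
            self.cache[key] = self.server.master[key]
        return self.cache[key]

s = Server()
c = WriteThroughCache(s)
c.write("block0", b"data")
# s.master["block0"] holds b"data" immediately after the write
```

Because the master copy is updated before `write` returns, at most the single in-flight write can be lost on a client crash, which is the reliability property claimed above.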
ii. Delayed-write scheme
To reduce network traffic for writes, the delayed-write scheme is used. When an entry is
modified, the new data value is written only to the cache; all updated cache entries are
sent to the server together at a later time.
There are three commonly used delayed-write approaches:
Write on ejection from cache:
Modified data in cache is sent to server only when the cache-replacement policy has
decided to eject it from client’s cache. This can result in good performance but there can be
a reliability problem since some server data may be outdated for a long time.
Periodic write:
The cache is scanned periodically and any cached data that has been modified since the last
scan is sent to the server.
Write on close:
Modification to cached data is sent to the server when the client closes the file. This does
not help much in reducing network traffic for those files that are open for very short periods
or are rarely modified.
Advantages:
Write accesses complete more quickly, resulting in a performance gain.
Disadvantage:
Reliability can be a problem.
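The delayed-write idea, in its periodic-write flavour, can be sketched like this; in a real system `flush` would be triggered by a timer or by cache ejection, and all names here are illustrative:

```python
# Delayed-write sketch: writes go only to the local cache and are
# marked dirty; a flush (normally run periodically) sends all dirty
# entries to the server in one batch.
class DelayedWriteCache:
    def __init__(self, server_store):
        self.server = server_store   # dict standing in for the server
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)          # not yet visible at the server

    def flush(self):
        for key in self.dirty:       # batch the accumulated writes
            self.server[key] = self.cache[key]
        self.dirty.clear()

server = {}
c = DelayedWriteCache(server)
c.write("a", 1)
c.write("a", 2)   # overwrites in cache; only the final value will be sent
c.flush()
# server now holds {"a": 2}, and "a" cost only one network write
```

The sketch also shows why reliability suffers: if the client crashes between `write` and `flush`, the dirty entries are lost.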
3.Cache validation schemes
The modification propagation policy only specifies when the master copy of a file on the
server node is updated upon modification of a cache entry. It does not tell anything about
when the file data residing in the caches of other nodes is updated. File data may
simultaneously reside in the caches of multiple nodes. A client's cache entry becomes stale as
soon as some other client modifies the data corresponding to the cache entry in the master
copy of the file on the server. It becomes necessary to verify if the data cached at a client
node is consistent with the master copy. If not, the cached data must be invalidated and the
updated version of the data must be fetched again from the server.
There are two approaches to verify the validity of cached data:
i.Client-initiated approach
The client contacts the server and checks whether its locally cached data is consistent with
the master copy.
Checking before every access- This defeats the purpose of caching because the
server needs to be contacted on every access.
Periodic checking- A check is initiated every fixed interval of time.
Check on file open- Cache entry is validated on a file open operation.
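The client-initiated check can be sketched with version numbers standing in for modification timestamps; all classes and names here are hypothetical:

```python
# Client-initiated validation sketch: on every file open, the client
# compares its cached copy's version with the server's current version
# and refetches if they differ (the "check on file open" policy).
class FileServer:
    def __init__(self):
        self.data = {}       # name -> (contents, version)

    def stat(self, name):
        return self.data[name][1]    # current version number

    def fetch(self, name):
        return self.data[name]

class ValidatingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> (contents, version)

    def open(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.stat(name):
            self.cache[name] = self.server.fetch(name)   # revalidate
        return self.cache[name][0]

srv = FileServer()
srv.data["f"] = ("v1", 1)
cl = ValidatingClient(srv)
cl.open("f")                  # caches ("v1", 1)
srv.data["f"] = ("v2", 2)     # another client updates the master copy
cl.open("f")                  # stale entry detected and refetched
```

Note that even the cheap "check on open" variant still costs one `stat` round trip per open, which is why the server-initiated approach below is sometimes preferred.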
ii.Server-initiated approach
A client informs the file server when opening a file, indicating whether a file is being
opened for reading, writing, or both. The file server keeps a record of which client
has which file open and in what mode. The server monitors file usage modes being
used by different clients and reacts whenever it detects a potential for inconsistency.
E.g. if a file is open for reading, other clients may be allowed to open it for reading,
but opening it for writing cannot be allowed. So also, a new client cannot open a file
in any mode if the file is open for writing.
When a client closes a file, it sends intimation to the server along with any
modifications made to the file. Then the server updates its record of which client has
which file open in which mode.
When a new client makes a request to open an already open file and if the server
finds that the new open mode conflicts with the already open mode, the server can
deny the request, queue the request, or disable caching by asking all clients having
the file open to remove that file from their caches.
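The server's bookkeeping can be sketched as follows; the conflict rules mirror the example above (concurrent readers are allowed, a writer excludes everyone), and the class is an illustration, not a real protocol:

```python
# Server-initiated sketch: the server records each file's open modes
# and denies an open that conflicts with the existing ones.
class ModeTrackingServer:
    def __init__(self):
        self.open_modes = {}     # filename -> list of open modes

    def open(self, name, mode):
        assert mode in ("read", "write")
        modes = self.open_modes.setdefault(name, [])
        if "write" in modes:
            return False         # a writer is present: deny any open
        if mode == "write" and modes:
            return False         # readers are present: deny writing
        modes.append(mode)
        return True

    def close(self, name, mode):
        self.open_modes[name].remove(mode)

s = ModeTrackingServer()
s.open("f", "read")      # granted: first reader
s.open("f", "read")      # granted: concurrent readers are fine
s.open("f", "write")     # denied: conflicts with the open readers
```

A fuller version would also queue denied requests or tell caching clients to invalidate, as described above; the sketch shows only the deny path.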
File Replication : High availability is a desirable feature of a good distributed file system and
file replication is the primary mechanism for improving file availability. Replication is a key
strategy for improving reliability, fault tolerance, and availability. Therefore, duplicating
files on multiple machines improves availability and performance.
Replicated file : A replicated file is a file that has multiple copies, with each copy located on
a separate file server. Each copy of the set of copies that comprises a replicated file is
referred to as a replica of the replicated file.
Multi Copy Update Problem
Maintaining consistency among copies when a replicated file is updated is the major issue of
a file system that supports replication of files. Some commonly used approaches to handle
this issue are described below:
1. Read-Only Replication
2. Read-Any-Write-All Protocol
3. Available-Copies Protocol
4. Primary-Copy Protocol
5. Quorum-Based Protocol
1. Read-Only Replication: This approach allows the replication of only immutable files.
Since immutable files are used only in the read-only mode, no consistency problem arises,
but the approach is too restrictive in the sense that mutable files cannot be replicated.
2. Read-Any-Write-All Protocol: This approach allows the replication of mutable files. In this
method, a read operation on a replicated file is performed by reading any copy of the file,
and a write operation is performed by writing to all copies of the file. Some form of locking
has to be used to carry out a write operation. That is, before updating any copy, all copies
are locked; then they are updated; and finally the locks are released to complete the write
operation. The protocol is used for implementing UNIX-like semantics. The main problem
with this approach is that a write operation cannot be performed if any of the servers
having a copy of the replicated file is down at the time of the write operation.
3. Available-Copies Protocol: This approach allows a write operation to be carried out
even when some of the servers having a copy of the replicated file are down. In this method,
a read operation is performed by reading any available copy, but a write operation is
performed by writing to all available copies. When a server recovers after a failure, it
brings itself up to date by copying from other servers before accepting any user request.
4. Primary-Copy Protocol: Another simple method to solve the multi-copy update problem is
the primary-copy protocol. In this protocol, for each replicated file, one copy is designated
as the primary copy and all the others are secondary copies. A write operation is performed
only on the primary copy, while a read operation can be performed using any copy, primary
or secondary. Each server having a secondary copy updates its copy either by receiving
notification of changes from the server having the primary copy or by requesting the
updated copy from it.
5. Quorum-Based Protocol: This protocol is capable of handling the network partition
problem and can increase the availability of write operations at the expense of read
operations.
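A simplified quorum sketch follows. It is illustrative only: with N replicas, a read quorum Nr and a write quorum Nw are chosen so that Nr + Nw > N, which guarantees every read quorum overlaps the latest write quorum. For brevity the sketch determines the new version by scanning all replicas, where a real protocol would consult a read quorum first.

```python
# Quorum-based replication sketch: a write stamps Nw replicas with a
# new version; a read polls Nr replicas and returns the value with the
# highest version, which must overlap the latest write quorum because
# Nr + Nw > N.
import random

class QuorumStore:
    def __init__(self, n, nr, nw):
        assert nr + nw > n, "read and write quorums must overlap"
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]
        self.n, self.nr, self.nw = n, nr, nw

    def write(self, value):
        quorum = random.sample(range(self.n), self.nw)
        # Simplification: scan all replicas for the current version.
        version = max(r["version"] for r in self.replicas) + 1
        for i in quorum:
            self.replicas[i] = {"version": version, "value": value}

    def read(self):
        quorum = random.sample(range(self.n), self.nr)
        best = max((self.replicas[i] for i in quorum),
                   key=lambda r: r["version"])
        return best["value"]

store = QuorumStore(n=5, nr=3, nw=3)
store.write("x1")
store.write("x2")
# read() always returns "x2": any 3 replicas intersect the write quorum
```

Shrinking Nr (and growing Nw) makes reads cheaper and writes more expensive, which is the read/write availability trade-off stated above.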
Fault Tolerance
A system fails when it does not meet its promises. An error in a system can lead to a
failure. The cause of an error is called a fault.
Faults are generally classified as transient, intermittent, or permanent. A transient
fault occurs once and disappears, an intermittent fault keeps recurring, and a
permanent fault continues until the system is repaired.
Fault tolerance is strongly related to dependable systems. Dependability includes
availability, reliability, safety, and maintainability.
The ability of a system to continue functioning in the event of partial failure is known
as fault tolerance.
It is an important goal in distributed system design to construct a system that can
automatically recover from partial failures without affecting overall performance.
The primary file properties that directly influence the ability of a distributed file system to
tolerate faults are as follows:
i. Availability
This refers to the fraction of time for which the file is available for use. This property
depends on the location of the clients and the location of the file's copies.
Example: If a network is partitioned due to a communication failure, a file may be
available to the clients on some nodes but not to the clients on other
nodes. Replication is the primary mechanism for improving availability.
ii. Robustness
This refers to the ability of a file to survive crashes of the storage device and decay of
the storage medium on which it is stored. Storage devices implemented using
redundancy techniques are often used to store robust files.
A robust file may not be available until the faulty component has been recovered.
Robustness is independent of both the location of the file and the location of the clients.
iii. Recoverability
This refers to the ability to roll back to the previous stable and consistent state when
an operation on a file is aborted by the client. Atomic update techniques such as
transaction mechanism are used.
The main advantage of stateless servers is that they can easily recover from failure. Because
there is no state that must be restored, a failed server can simply restart after a crash and
immediately provide services to clients as though nothing had happened. Furthermore, if a
client crashes, the server is not stuck with abandoned open or locked files. Another benefit is that
the server implementation remains simple because it does not have to implement the state
accounting associated with opening, closing, and locking of files.
7)Operations
In a Stateful server for byte stream files the following operations take place:
i. Open (filename, mode): This operation opens the file named filename in the specified mode.
An entry for the file is created in the file table, which maintains its state information. When
the file is opened, the R/W pointer is set to zero and the client receives the file identifier (Fid).
ii. Read (Fid, m, buffer): This operation gets m bytes of data from the Fid file into the buffer.
When this operation is executed the client receives m bytes of file data starting from the
byte addressed by R/W pointer and then the pointer is incremented by m.
iii. Write (Fid, m, buffer): When this operation is executed, m bytes of data are taken from
the specified buffer and written into the file Fid at the byte position addressed by the
R/W pointer; the pointer is then incremented by m.
iv. Seek (Fid, position): This operation changes the value of read-write pointer of the file Fid
to a new value specified as position.
v. Close (Fid): This operation is used to delete file state information of the file Fid from its
file-table.
For a stateless byte-stream file server, the operations are:
i. Read (filename, position, m, buffer): For this operation the server returns m bytes of
data from the file identified by filename and places them in the buffer. The byte position
in the file at which reading starts is specified by the position parameter.
ii. Write (filename, position, m, buffer): This operation takes m bytes of data from the buffer
and writes it into the file named filename. The position to start writing in the file is specified
by position parameter.
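The two interfaces can be contrasted with a minimal in-memory sketch. The operation names follow the text; the Python classes, the Fid numbering scheme and the byte-array file store are assumptions made purely for illustration:

```python
# Illustrative stateful vs. stateless byte-stream file servers (in-memory).
# The Fid scheme and storage representation are assumptions, not from the text.

class StatefulServer:
    def __init__(self):
        self.files = {}       # filename -> bytearray
        self.table = {}       # Fid -> {"name": filename, "pos": R/W pointer}
        self.next_fid = 0

    def open(self, filename, mode="rw"):
        self.files.setdefault(filename, bytearray())
        fid, self.next_fid = self.next_fid, self.next_fid + 1
        self.table[fid] = {"name": filename, "pos": 0}  # pointer starts at 0
        return fid

    def read(self, fid, m):
        e = self.table[fid]
        data = bytes(self.files[e["name"]][e["pos"]:e["pos"] + m])
        e["pos"] += len(data)               # the server advances the pointer
        return data

    def write(self, fid, data):
        e = self.table[fid]
        f = self.files[e["name"]]
        f[e["pos"]:e["pos"] + len(data)] = data
        e["pos"] += len(data)

    def seek(self, fid, position):
        self.table[fid]["pos"] = position

    def close(self, fid):
        del self.table[fid]                 # state information discarded

class StatelessServer:
    def __init__(self):
        self.files = {}

    def read(self, filename, position, m):
        return bytes(self.files.get(filename, b"")[position:position + m])

    def write(self, filename, position, data):
        f = self.files.setdefault(filename, bytearray())
        f[position:position + len(data)] = data

s = StatefulServer()
fid = s.open("notes.txt")
s.write(fid, b"hello")
s.seek(fid, 0)
print(s.read(fid, 5))       # b'hello'
s.close(fid)
```

Note how the stateful server keeps the R/W pointer in its file table between requests, while the stateless server must receive the position in every request, which is why its request messages are longer.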
8) Disadvantages
Stateful server: If the server crashes and restarts, the state information it was holding may
be lost, and the client may obtain inconsistent results. If a client process crashes and
restarts, the server is left holding stale information about that client.
Stateless server: The stateless service paradigm suffers from longer request messages and
slower processing of requests. Request messages are longer because every request must
carry all the parameters necessary to successfully carry out the desired operation.
Naming
The naming facility of a distributed operating system enables users and programs to
assign character-string names to objects and subsequently use these names to refer
to those objects.
The locating facility, which is an integral part of the naming facility, maps an object's
name to the object's location in a distributed system.
The naming and locating facilities jointly form a naming system that provides the
users with an abstraction of an object that hides the details of how and where an
object is actually located in the network.
It provides a further level of abstraction when dealing with object replicas. Given an
object name, it returns a set of the locations of the object's replicas.
The naming system plays a very important role in achieving location
transparency and in facilitating transparent migration and replication of objects and
object sharing.
Several object-locating mechanisms have been proposed and are being used by various
distributed operating systems.
1. Broadcasting
2. Expanding-ring broadcast
3. Encoding the location of the object within its UID
4. Searching the creator node first and then broadcasting
5. Using forward location pointers
6. Using a hint cache and broadcasting
Unit 5
Distributed DataBase
Users access the distributed database via applications. Applications are classified as those
that do not require data from other sites (local applications) and those that do require data
from other sites (global applications). We require a DDBMS to have at least one global
application.
Features
Types:
1. Homogeneous Database:
In a homogeneous distributed database, all sites store data identically.
The operating system, database management system and data structures used
are the same at all sites.
All sites are aware of each other and agree to cooperate in processing user requests.
Each site surrenders part of its autonomy in terms of the right to change schemas or
software.
The system appears to the user as a single system.
Homogeneous databases are easy to design and manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schema
and software.
Difference in schema is a major problem for query processing
Difference in software is a major problem for transaction processing
Sites may not be aware of each other and may provide only limited facilities for
cooperation in query processing
Translations are required for different sites to communicate.
Heterogeneous databases are difficult to design and manage.
Disadvantages
1. Complexity of management and control.
2. Applications must recognize data location, and they must be able to stitch together data
from various sites.
3. Security concerns
4. Increased storage and infrastructure requirements.
5. Multiple copies of data have to be kept at different sites, so additional disk storage space
is required.
6. The probability of security lapses increases when data are located at multiple sites.
Multimedia applications generate and consume continuous streams of data in real time.
They contain large quantities of audio, video and other time-based data elements, and the
timely processing and delivery of the individual data elements is essential. In a distributed
system, data transmission is a prerequisite, so the central issue in a distributed multimedia
system is how to transfer multimedia data at the demanded quality. The existing
distributed-system standards and platforms, such as RM-ODP, CORBA and DCE,
focus mainly on discrete data transmission. The introduction of multimedia computing
therefore places a large number of new requirements on distributed systems.
Firstly, the distributed multimedia system should be able to provide support for continuous
media types, such as audio, video and animation. The introduction of such continuous
media data to distributed systems demands the need for continuous data transfers over
relatively long periods of time. For example, playing a video from a remote website implies
that the timeliness of such media transmission must be maintained in the course of the
continuous media presentation.
The second requirement of distributed multimedia applications is the need for sophisticated
quality of service (QoS) management. In most traditional computing environments, requests
for a particular service are either met or ignored. In a multimedia system, however, there is
more to manage; QoS management can be classified into static QoS management and
dynamic QoS management.
Yet another requirement of distributed multimedia applications is the need for a rich set of
real-time synchronization mechanisms about continuous media transmission. Such real-time
synchronizations can be divided into two categories: intra-media synchronization and inter-
media synchronization.
A further requirement is to support multiparty communications. Many distributed
multimedia applications are concerned with interactions between dispersed groups of
users, for example, a remote conference application. So it is important for distributed
multimedia system to support multiparty communication.
QoS Managers
Software that runs on network nodes which have two main functions:
QoS negotiation: get requirements from apps and checks feasibility versus
available resources.
Admission control: If negotiation succeeds, provides a "resource contract"
that guarantees reservation of resources for a certain time.
Ways to achieve QoS
Buffering (on both ends)
Compression (puts more load on the nodes, but that is acceptable)
Bandwidth reservation
Resource scheduling
Traffic shaping
Flow specifications
Stream adaptation
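Of the techniques listed, traffic shaping is commonly implemented with a token bucket. The following sketch is illustrative; the rate and capacity values are arbitrary assumptions, with one token standing for one byte:

```python
# Token-bucket traffic shaper (illustrative sketch; parameters are assumptions).
# Tokens accumulate at `rate` per second up to `capacity`; a packet of size n
# may be sent only when n tokens are available, which smooths out bursts.

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # bucket starts full
        self.last = 0.0             # timestamp of the last refill (seconds)

    def _refill(self, now):
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, now, packet_size):
        """Return True if the packet may be sent now, else False (queue it)."""
        self._refill(now)
        if self.tokens >= packet_size:
            self.tokens -= packet_size
            return True
        return False

bucket = TokenBucket(rate=1000, capacity=1500)  # ~1 KB/s, 1500-byte bursts
print(bucket.try_send(0.0, 1500))   # True: the bucket starts full
print(bucket.try_send(0.1, 1500))   # False: only ~100 tokens refilled so far
print(bucket.try_send(1.6, 1500))   # True: enough time has passed to refill
```

A sender that respects the bucket transmits at the negotiated rate in the long run, which is the kind of guarantee a resource contract from the QoS manager relies on.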
Amoeba
The Amoeba distributed operating system project is a research effort aimed at
understanding how to connect computers together in a seamless way.
• Amoeba is an OS that performs all the standard functions of any OS, but it performs
them with a collection of machines.
• One of the main goals of the Amoeba development team was to design a transparent
distributed system that allows users to log into the system as a whole.
• Amoeba is also a parallel system.
• On an Amoeba system, a single program or command can use multiple processors to
increase performance.
• Special development tools have been developed for an Amoeba environment that take
advantage of the inherent parallelism.
• When a user logs into the Amoeba system, they can access the entire system and
are not limited to operations on their home machine.
• The Amoeba architecture is designed as a collection of micro-kernels.
• Amoeba implements a standard distributed client / server model, where user processes
and applications (the clients) communicate with servers that perform the kernel
operations.
• An Amoeba system consists of four principal components: user workstations, pool
processors, specialized servers, and gateways.
Its origin
• Amoeba was originally designed and implemented at the Vrije Universiteit in
Amsterdam (the Netherlands) in 1981.
• Now, it is being jointly developed there and at the Center for Mathematics and
Computer Science, also in Amsterdam.
• Currently used in various EU countries
• Built from the ground up. UNIX emulation added later
Four basic design goals were apparent in Amoeba: Distribution, Parallelism, Transparency,
and Performance
Unit 4
Load Balancing
In every distributed system there is always a possibility that some nodes are heavily
loaded while some are lightly loaded or are even idle.
Load balancing is a technique in which the workload is distributed across multiple computers
that are connected to a network, and may be distributed across the globe, to obtain optimal
resource utilization, minimize time delay, maximize throughput and avoid overloading
any single node.
A node can be classified by its current load as:
Heavily loaded: enough jobs are waiting for execution.
Lightly loaded: fewer jobs are waiting for execution.
Idle: no jobs are executing on the node.
The basic aim is to make every processor equally busy and to finish the tasks at approximately
the same time.
Types of load balancing algorithm
Dynamic load balancing algorithms make changes to the distribution of work among
workstations at run-time; they use current or recent load information when making
distribution decisions.
Dynamic load balancing algorithms can provide a significant improvement in
performance over static algorithms.
However, this comes at the additional cost of collecting and maintaining load
information, so it is important to keep these overheads within reasonable limits
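A dynamic policy of the kind described can be sketched as follows. The queue-length threshold, node names and load figures are illustrative assumptions, with a node's current load taken as the number of jobs queued on it:

```python
# Sketch of a dynamic load-balancing decision. Assumptions: load = number of
# queued jobs, and the HEAVY threshold is an arbitrary illustrative value.

HEAVY = 5   # queue length at or above which a node counts as heavily loaded

def classify(load):
    """Classify a node by its current load, as in the text."""
    if load == 0:
        return "idle"
    return "lightly loaded" if load < HEAVY else "heavily loaded"

def pick_node(loads):
    """Dynamic decision: send the next job to the currently least-loaded node."""
    return min(loads, key=loads.get)

loads = {"n1": 7, "n2": 0, "n3": 3}   # recent load reports (jobs queued)
print(classify(loads["n1"]))          # heavily loaded
target = pick_node(loads)
print(target)                         # n2, the idle node
loads[target] += 1                    # update load information after assignment
```

A static algorithm would fix the assignment in advance; here each decision uses the latest load reports, which is where the overhead of collecting and maintaining load information comes from.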
TASK MIGRATION
Introduction
Singhal and Shivaratri define process/task migration as the transfer of partially
executed tasks to another node in a distributed system - in other words, preemptive
task transfer. Some references to task migration include the transfer of processes
before execution begins, but the most difficult issues are those related to
preemption.
Terminology:
o Task placement: non-preemptive transfer of a process that has never run
o Task migration: preemptive transfer of a partially executed process
o Home node: site where the process originates
o Foreign process: process executing on a node other than its home node
o Freezing: when a process is migrated, its execution is interrupted (frozen) for
the time it takes to move the process to another site.
Migration is useful for several reasons:
o load distribution, to equalize workload due to
transient periods of high load at some network nodes (transfer
processes from temporarily overloaded node to another node)
or in a system where process origination is always unbalanced (a few
workstations consistently generate many processes )
o to return a workstation to its owner. (migrate a foreign process back to its
home node when owner returns)
o to improve communication (processes that communicate frequently can be
moved to the same node; processes that access resources at a remote site
may be moved to that site)
Two steps in task migration:
o State transfer: collect all information crucial to process execution and
transfer to another machine
Process control block contains register contents, memory map
information, etc.
Process address space includes stack, heap, plus virtual pages
currently in memory
At some point during state collection, the process must be frozen
o Unfreeze: after installing the process state at the new site, the process is put
in the Ready queue to be scheduled
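The two steps above can be sketched as follows. The Process class, the use of pickle to build the state image, and the dictionary-based nodes are assumptions made purely for illustration:

```python
# Sketch of the two migration steps: state transfer (with freezing) and
# unfreezing at the destination. All structures here are illustrative.

import pickle

class Process:
    def __init__(self, pid, registers, memory):
        self.pid = pid
        self.registers = registers   # PCB contents: register values, etc.
        self.memory = memory         # address space: stack, heap, resident pages
        self.state = "running"

def migrate(proc, src_node, dst_node):
    # Step 1: freeze the process and collect its state for transfer.
    proc.state = "frozen"
    image = pickle.dumps((proc.pid, proc.registers, proc.memory))
    src_node["procs"].remove(proc)
    # (The image would cross the network here; this sketch just hands it over.)
    # Step 2: install the state at the new site and unfreeze into Ready.
    pid, regs, mem = pickle.loads(image)
    moved = Process(pid, regs, mem)
    moved.state = "ready"            # placed on the Ready queue to be scheduled
    dst_node["ready_queue"].append(moved)
    return moved

home = {"procs": []}
remote = {"ready_queue": []}
p = Process(1, {"pc": 100}, [0] * 4)
home["procs"].append(p)
m = migrate(p, home, remote)
print(m.state)   # ready
```

A real implementation must also handle residual dependencies such as open files and pending messages, which this sketch omits.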
2. Deadlock Avoidance: Resources are granted as and when requested by processes,
provided the resulting system state is safe. A system state is said to be safe if there exists at
least one sequence of execution for all the processes such that all of them can run to
completion without entering a deadlock.
Deadlock avoidance methods use some advance knowledge of the resource usage of
processes to predict the future state of the system for avoiding allocations that can
eventually lead to a deadlock. Deadlock avoidance algorithms are usually in the following
steps:
1. When a process requests for a resource, even if the resource is available for allocation, it
is not immediately allocated to the process. Rather, the system simply assumes that the
request is granted.
2. Using the assumption made in step 1 and advance knowledge of the resource usage of
processes, the system performs an analysis to decide whether granting the process's
request is safe or unsafe.
3. The resource is allocated to the process only when the analysis of step 2 shows that it is
safe to do so; otherwise the request is deferred.
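The three steps above are classically realized by the banker's algorithm safety check (the algorithm is not named in the text). This sketch assumes a single resource type and advance knowledge of each process's maximum remaining need:

```python
# Banker's-style safety check for deadlock avoidance (illustrative sketch).
# Assumption: single resource type; need[i] is process i's maximum remaining
# need, known in advance. A state is safe if some order lets every process
# obtain its need and run to completion.

def is_safe(available, allocation, need):
    """available: free units; allocation[i], need[i]: per-process figures."""
    work = available
    finished = [False] * len(need)
    progressed = True
    while progressed:
        progressed = False
        for i in range(len(need)):
            if not finished[i] and need[i] <= work:
                work += allocation[i]   # process i completes and releases
                finished[i] = True
                progressed = True
    return all(finished)

def request(available, allocation, need, i, amount):
    """Steps 1-3: tentatively grant, analyse safety, then allocate or defer."""
    # Step 1: assume the request is granted (do not actually allocate yet).
    trial_alloc = allocation[:]
    trial_alloc[i] += amount
    trial_need = need[:]
    trial_need[i] -= amount
    # Step 2: analyse the resulting state.
    if is_safe(available - amount, trial_alloc, trial_need):
        return True     # Step 3: safe, so the allocation goes ahead
    return False        # unsafe, so the request is deferred

alloc, need = [1, 2], [2, 1]            # two processes, single resource type
print(request(3, alloc, need, 0, 2))    # True: a safe completion order exists
print(request(1, alloc, need, 0, 1))    # False: would leave no safe sequence
```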
3. Deadlock Detection and Recovery: In this strategy, the resources are allocated to the
processes as and when requested. However, the deadlock is detected by deadlock
detection algorithm. If deadlock is detected, the system recovers from it by aborting
one or more deadlocked processes.
In this approach for deadlock handling, the system does not make any attempt to prevent
deadlocks and allows processes to request resources and to wait for each other in an
uncontrolled manner. Rather, it uses an algorithm that keeps examining the state of the
system to determine whether a deadlock has occurred. When a deadlock is detected, the
system takes some action to recover from the deadlock.
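For the single-resource-unit case, the examination of the system state reduces to finding a cycle in the wait-for graph. The sketch below assumes each process waits for at most one other process (the dict representation is an assumption; an edge p -> q means p waits for a resource held by q):

```python
# Deadlock detection via cycle search in a wait-for graph (illustrative).
# wait_for maps each blocked process to the process it is waiting on.

def find_deadlock(wait_for):
    """Return the set of processes on a cycle (deadlocked), or an empty set."""
    for start in wait_for:
        path, seen, node = [], set(), start
        while node in wait_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = wait_for[node]          # follow the wait-for edge
        if node in path:                   # walked back onto the path: a cycle
            return set(path[path.index(node):])
    return set()

# p1 waits for p2, p2 for p3, p3 for p1: a circular wait, hence a deadlock.
print(find_deadlock({"p1": "p2", "p2": "p3", "p3": "p1"}))
print(find_deadlock({"p1": "p2", "p2": "p3"}))   # no cycle: empty set
```

Recovery then aborts one process in the returned cycle, which breaks the circular wait and lets the remaining processes proceed.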
Advantages: overhead is low, the approach is simple to implement, and no false deadlocks are detected.