
University of Pennsylvania

Distributed Systems


Introduction to Distributed Systems


Why do we develop distributed systems?
• Availability of powerful yet cheap microprocessors (PCs, workstations) and continuing advances in communication technology.

What is a distributed system?


A distributed system is a collection of independent computers that appears to its users as a single system.
Examples:
• Network of workstations
• Distributed manufacturing system (e.g., automated assembly
line)
• Network of branch office computers

Advantages of Distributed Systems over Centralized Systems

• Economics: A collection of microprocessors offers a better price/performance ratio than mainframes; a low price/performance ratio is a cost-effective way to increase computing power.
• Speed: A distributed system may have more total computing power than a mainframe. Ex.: 10,000 CPU chips, each running at 50 MIPS. It is not possible to build a single 500,000-MIPS processor, since that would require a 0.002 nsec instruction cycle. Performance is enhanced by distributing the load.
• Inherent distribution: Some applications are inherently distributed. Ex.: a supermarket chain.
• Reliability: If one machine crashes, the system as a whole can still survive. Higher availability and improved reliability.
• Incremental growth: Computing power can be added in small increments. Modular expandability.
• Another driving force: the existence of large numbers of personal computers and the need for people to collaborate and share information.
Advantages of Distributed Systems over Independent PCs


• Data sharing: Allow many users to access a common database
• Resource Sharing: Expensive peripherals like color
printers
• Communication: Enhance human-to-human
communication, e.g., email, chat
• Flexibility: Spread the workload over the available
machines

Disadvantages of Distributed Systems

• Software: It is difficult to develop software for distributed systems
• Network: The network can saturate or cause other problems
• Security: Easy access also applies to secret data


Hardware Concepts
All distributed systems consist of multiple CPUs. There are several different ways the hardware can be organized, in terms of how the CPUs are interconnected and how they communicate.
Flynn classified computer systems by two characteristics: the number of instruction streams and the number of data streams.
A computer with a single instruction stream and a single data stream is called SISD. All traditional uniprocessor computers fall into this category.
The next category is SIMD: single instruction stream, multiple data streams. This type refers to array processors with one instruction unit that fetches an instruction and then commands many data units to carry it out in parallel, each with its own data. Some supercomputers are SIMD.
The next category is MISD: multiple instruction streams, single data stream. No known computers fit this model.
Finally, MIMD means multiple instruction streams and multiple data streams, which essentially describes a group of independent computers, each with its own program counter, program, and data. All distributed systems are MIMD.
[Figure: a taxonomy of parallel and distributed computer systems.]

MIMD (Multiple-Instruction Multiple-Data)


Tightly Coupled versus Loosely Coupled
• Tightly coupled systems (multiprocessors)
o shared memory
o intermachine delay short, data rate high
• Loosely coupled systems (multicomputers)
o private memory
o intermachine delay long, data rate low


Bus versus Switched MIMD

• Bus: a single network, backplane, bus, cable, or other medium that connects all machines. E.g., cable TV.
• Switched: individual wires from machine to machine, with
many different wiring patterns in use.
Multiprocessors (shared memory)
– Bus
– Switched
Multicomputers (private memory)
– Bus
– Switched


Switched Multiprocessors

[Figures: a crossbar switch; an omega switching network.]

• For connecting a large number (say, over 64) of processors
• crossbar switch: n**2 switch points
• omega network: 2×2 switches; for n CPUs and n memories, log2 n switching stages, each with n/2 switches, for a total of (n log2 n)/2 switches
• delay problem: e.g., with n = 1024 there are 10 switching stages from CPU to memory and another 10 back, a total of 20. At 100 MIPS the instruction execution time is 10 nsec, so a 0.5 nsec switching time is needed (see the sketch below)
• NUMA (Non-Uniform Memory Access): placement of program and data matters
• building a large, tightly-coupled, shared-memory multiprocessor is possible, but it is difficult and expensive
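The switch counts and the delay figure above can be reproduced with a short calculation. The following is a minimal sketch (the function name is ours, not from the slides), assuming an n × n omega network and the 100-MIPS CPUs of the example:

    import math

    def omega_network_stats(n_cpus, mips):
        """Switch counts and per-switch timing budget for an omega network."""
        stages = int(math.log2(n_cpus))      # log2(n) switching stages one way
        switches = stages * n_cpus // 2      # each stage holds n/2 2x2 switches
        instr_time_ns = 1000.0 / mips        # 100 MIPS -> 10 nsec per instruction
        hops = 2 * stages                    # CPU -> memory -> CPU crosses the net twice
        return stages, switches, hops, instr_time_ns / hops

    # n = 1024 at 100 MIPS: 10 stages, 5120 switches, 20 hops, 0.5 nsec per switch
    print(omega_network_stats(1024, 100))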

Multicomputers

Bus-Based Multicomputers

[Figure: a multicomputer consisting of workstations on a LAN.]


• easy to build
• communication volume is much smaller
• relatively slow LAN speed (10-100 Mbps, compared to 300 Mbps and up for a backplane bus)


Switched Multicomputers

• interconnection networks: e.g., grid, hypercube
• hypercube: an n-dimensional cube
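As a small illustration of the hypercube topology (a sketch of ours, not part of the slides): in an n-dimensional hypercube each of the 2**n nodes is wired to the n nodes whose binary addresses differ from its own in exactly one bit.

    def hypercube_neighbors(node, n):
        """Neighbors of a node in an n-dimensional hypercube: flip one address bit."""
        return [node ^ (1 << bit) for bit in range(n)]

    # 3-dimensional cube (8 nodes): node 5 = 101 links to 100, 111, and 001.
    print(hypercube_neighbors(5, 3))   # [4, 7, 1]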


Software Concepts

• For users, the software is even more important than the hardware
• Three types:
1. Network Operating Systems
2. (True) Distributed Systems
3. Multiprocessor Time Sharing


Network Operating Systems


• loosely-coupled software on loosely-coupled hardware
• A network of workstations connected by a LAN
• Each machine has a high degree of autonomy
o rlogin machine
o rcp machine1:file1 machine2:file2
• File servers: client-server model
• Clients mount directories on file servers
• Best known network OS:
o Sun's NFS (Network File System) for shared file systems
• A few system-wide requirements: the format and meaning of all the messages exchanged


NFS
NFS Architecture
• Server exports directories
• Clients mount exported directories
NFS Protocols
• For handling mounting
• For read/write: no open/close, stateless
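To make "stateless" concrete, here is a minimal sketch (hypothetical names, not the actual NFS protocol or API): every read request carries the file handle, offset, and count, so the server never has to remember which files a client has open.

    def nfs_style_read(file_handle, offset, count, file_store):
        """Stateless read: each request carries everything the server needs."""
        data = file_store[file_handle]
        return data[offset:offset + count]

    # The client, not the server, keeps track of its position in the file.
    store = {"fh42": b"hello, distributed world"}
    print(nfs_style_read("fh42", 0, 5, store))    # b'hello'
    print(nfs_style_read("fh42", 7, 11, store))   # b'distributed'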
NFS Implementation


(True) Distributed Systems

• tightly-coupled software on loosely-coupled hardware
• provides a single-system image or a virtual uniprocessor
• a single, global interprocess communication mechanism, process management, and file system; the same system call interface everywhere
• Ideal definition:
“A distributed system runs on a collection of computers that do not have shared memory, yet looks like a single computer to its users.”

Multiprocessor Operating Systems

Tightly-coupled software on tightly-coupled hardware


• Examples: high-performance servers
• shared memory
• single run queue
• traditional file system as on a single-processor system: central block cache

Comparison of three different ways of organizing n CPUs

Item                                                | Network OS   | Distributed OS | Multiprocessor OS
Does it look like a virtual uniprocessor?           | No           | Yes            | Yes
Do all have to run the same operating system?       | No           | Yes            | Yes
How many copies of the operating system are there?  | N            | N              | 1
How is communication achieved?                      | Shared files | Messages       | Shared memory
Are agreed-upon network protocols required?         | Yes          | Yes            | No
Is there a single run queue?                        | No           | No             | Yes
Does file sharing have well-defined semantics?      | Usually no   | Yes            | Yes

Design Issues of Distributed Systems

• Transparency
• Flexibility
• Reliability
• Performance
• Scalability


1. Transparency
• How to achieve the single-system image, i.e., how to make a
collection of computers appear as a single computer.
• Hiding all the distribution from the users as well as the
application programs can be achieved at two levels:
1) hide the distribution from users
2) at a lower level, make the system look transparent to
programs.
Both 1) and 2) require uniform interfaces, e.g., for access to files and for communication.


Types of transparency

– Location Transparency: users cannot tell where hardware and software resources such as CPUs, printers, files, and databases are located.
– Migration Transparency: resources must be free to move from one location to another without their names changing. E.g., /usr/lee, /central/usr/lee
– Replication Transparency: the OS can make additional copies of files and resources without users noticing.
– Concurrency Transparency: users are not aware of the existence of other users. The system must allow multiple users to concurrently access the same resource, using lock and unlock for mutual exclusion (see the sketch below).
– Parallelism Transparency: automatic use of parallelism without having to program it explicitly. The holy grail for distributed and parallel system designers.
Users do not always want complete transparency: e.g., a fancy printer 1000 miles away.
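A minimal sketch of the lock/unlock idea behind concurrency transparency, using Python's standard threading module (the counter example is ours, not from the slides): the lock serializes access to the shared resource so concurrent updates are not lost.

    import threading

    counter = 0
    lock = threading.Lock()

    def deposit(times):
        global counter
        for _ in range(times):
            with lock:               # lock ... unlock around the critical section
                counter += 1

    threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # always 200000 with the lock; updates may be lost without it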


2. Flexibility

• Make it easier to change.
• Monolithic Kernel: system calls are trapped and executed by the kernel; all system calls are served by the kernel, e.g., UNIX.
• Microkernel: the kernel provides only minimal services:
1) IPC
2) some memory management
3) some low-level process management and scheduling
4) low-level I/O
E.g., Mach can support multiple file systems and multiple system interfaces.

3. Reliability

• A distributed system should be more reliable than a single system. Example: 3 machines, each up with probability 0.95; the probability that at least one is up is 1 - 0.05**3 = 0.999875 (see the sketch below).
– Availability: fraction of time the system is usable. Redundancy improves it.
– Need to maintain consistency
– Need to be secure
– Fault tolerance: need to mask failures, recover from errors.
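A small sketch of the availability arithmetic above (the helper function is ours): with n independent machines, each up with probability p, the system is usable whenever at least one machine is up.

    def availability(p_up, n):
        """Probability that at least one of n independent machines is up."""
        return 1 - (1 - p_up) ** n

    print(availability(0.95, 1))   # 0.95 for a single machine
    print(availability(0.95, 3))   # ~0.999875 = 1 - 0.05**3 for three machines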


4. Performance

• Without a gain here, why bother with distributed systems?
• Performance loss due to communication delays:
– fine-grain parallelism: a high degree of interaction, so delays hurt most
– coarse-grain parallelism: little interaction, so delays matter less
• Performance loss due to making the system fault tolerant.


5. Scalability

• Systems grow with time or become obsolete. Techniques that require resources that grow linearly with the size of the system are not scalable; e.g., broadcast-based queries won't work for large distributed systems.
• Examples of bottlenecks
o Centralized components: a single mail server
o Centralized tables: a single URL address book
o Centralized algorithms: routing based on complete
information
