
Shared Memory Architecture

Shared Memory Systems

All processing elements (PEs) share a global memory. Communication between tasks is performed through writing to and reading from the global memory. All interprocessor coordination and synchronization is also accomplished via the global memory.

Figure 4.1 Shared memory systems. (Memory modules M attached to an interconnection network.)

Design Problems
Performance degradation due to:
  Contention
  Coherence
The degradation might happen when multiple processors are trying to access the shared memory simultaneously.

Typical Design
Use caches to solve the contention problem.
However, having multiple copies of data might lead to a coherence problem.

What will we study?
  A variety of shared memory systems
  Their solutions to the cache coherence problem

Fundamentals of Systems

Figure 4.2 Shared memory via two ports. (One memory M accessed by two processors, P1 and P2.)

Classification
  Uniform Memory Access (UMA)
  Nonuniform Memory Access (NUMA)
  Cache-Only Memory Architecture (COMA)

UMA
Figure 4.3 Bus-based UMA (SMP) shared memory system.

NUMA
Figure 4.4 NUMA shared memory system.

COMA
Figure 4.5 COMA shared memory system.

Bus-based Symmetric Multiprocessors

Pros:
  The bus is the simplest network for shared memory systems.
  The bus/cache architecture reduces:
    the need for expensive multiported memories and interface circuitry
    the need to adopt a message-passing paradigm when developing software

Cons:
  The bus may become saturated if multiple processors try to access the shared memory through it simultaneously.

Bus/Cache
High-speed caches are used to relieve bus contention.
Caches sit between the bus and the processors.
The hit rate should be as high as possible. It depends on factors ranging from the application programs being run to the manner in which the cache hardware is implemented.

High hit rates are needed because every cache miss becomes a request on the central bus. Otherwise, processor speed will be limited by bus bandwidth. We define the variables for hit rate, number of processors, processor speed, bus speed, and processor duty cycle rates as follows:

  $N$ = number of processors;
  $h$ = hit rate of each cache, assumed to be the same for all caches;
  $(1 - h)$ = miss rate of all caches;
  $B$ = bandwidth of the bus, measured in cycles/second;
  $I$ = processor duty cycle, assumed to be identical for all processors, in fetches/cycle; and
  $V$ = peak processor speed, in fetches/second.

The effective bandwidth of the bus is $BI$ fetches/second. If each processor is running at a speed of $V$, then misses are being generated at a rate of $V(1 - h)$. For an $N$-processor system, misses are simultaneously being generated at a rate of $N(1 - h)V$. This leads to saturation of the bus when $N$ processors simultaneously try to access the bus, that is, when $N(1 - h)V > BI$. The maximum number of processors with cache memories that the bus can support is given by the relation

  $N \le \frac{BI}{(1 - h)V}$

Example 1 Suppose a shared memory system is constructed from processors that can execute $V = 10^7$ instructions/s and the processor duty cycle is $I = 1$. The caches are designed to support a hit rate of 97%, and the bus supports a peak bandwidth of $B = 10^6$ cycles/s. Then $(1 - h) = 0.03$, and the maximum number of processors is $N \le 10^6 / (0.03 \times 10^7) = 3.33$. Thus, the system we have in mind can support only three processors! We might ask what hit rate is needed to support a 30-processor system. In this case, $h = 1 - BI/(NV) = 1 - (10^6 \cdot 1)/(30 \cdot 10^7) = 1 - 1/300$, so for the system we have in mind, $h = 0.9967$. Increasing $h$ by 2.8% results in supporting a factor of ten more processors.
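The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch: the function names `max_processors` and `required_hit_rate` are made up for illustration, and the formulas are the saturation bound and its rearrangement for h as given in the text.

```python
def max_processors(B, I, h, V):
    """Upper bound on N from N * (1 - h) * V <= B * I."""
    return (B * I) / ((1.0 - h) * V)


def required_hit_rate(B, I, N, V):
    """Hit rate at which N processors exactly saturate the bus."""
    return 1.0 - (B * I) / (N * V)


# Values from Example 1: V = 10^7 fetches/s, I = 1, h = 0.97, B = 10^6 cycles/s
n = max_processors(B=1e6, I=1, h=0.97, V=1e7)
h30 = required_hit_rate(B=1e6, I=1, N=30, V=1e7)

print(f"bound on N: {n:.2f} (whole processors: {int(n)})")
print(f"hit rate needed for 30 processors: {h30:.4f}")
```

The bound is very sensitive to h near 1, because the miss rate (1 - h) sits in the denominator.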

4.3 BASIC CACHE COHERENCY METHODS

Multiple copies of data, spread throughout the caches, lead to a coherence problem. Cache coherence algorithms are needed to maintain a level of consistency throughout the parallel system. Coherence is considered at three levels:
  Cache-Memory Coherence
  Cache-Cache Coherence
  Shared Memory System Coherence

Cache-Memory Coherence
One of two policies is used:
  Write-through
  Write-back

TABLE 4.1 Write-Through vs. Write-Back

                         Write-Through        Write-Back
Serial  Event            Memory   Cache       Memory   Cache
1                        X                    X
2       P reads X        X        X           X        X
3       P updates X      X'       X'          X        X'

4.3.2 Cache-Cache Coherence

In a multiprocessing system, when a task running on processor P requests the data in global memory location X, for example, the contents of X are copied to processor P's local cache, where it is passed on to P. Now, suppose processor Q also accesses X. What happens if Q wants to write a new value over the old value of X? There are two fundamental cache coherence policies: (1) write-invalidate, and (2) write-update.

Write-invalidate maintains consistency by reading from local caches until a write occurs. When any processor updates the value of X through a write, posting a dirty bit for X invalidates all other copies. For example, processor Q invalidates all other copies of X when it writes a new value into its cache. This sets the dirty bit for X. Q can continue to change X without further notifications to other caches because Q has the only valid copy of X. However, when processor P wants to read X, it must wait until X is updated and the dirty bit is cleared.

Write-update maintains consistency by immediately updating all copies in all caches. All dirty bits are set during each write operation. After all copies have been updated, all dirty bits are cleared. Table 4.2 shows the write-update versus write-invalidate policies.

TABLE 4.2 Write-Update vs. Write-Invalidate

                          Write-Update            Write-Invalidate
Serial  Event             P's Cache  Q's Cache    P's Cache  Q's Cache
1       P reads X         X                       X
2       Q reads X         X          X            X          X
3       Q updates X       X'         X'           INV        X'
4       Q updates X'      X''        X''          INV        X''
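The difference between the two policies can be made concrete with a toy replay of the four events of Table 4.2. The sketch below is illustrative only: `replay_policy` is a hypothetical helper, only two caches (P and Q) are modelled, global memory is never written, and a read miss is assumed to be served by a valid remote copy when one exists.

```python
def replay_policy(policy, events):
    """Return (P's copy, Q's copy) after each event under the given policy."""
    caches = {"P": None, "Q": None}      # None means the block is not cached
    memory = "X"                         # memory writes are not modelled here
    trace = []
    for proc, op, value in events:
        other = "Q" if proc == "P" else "P"
        if op == "read":
            if caches[proc] in (None, "INV"):
                remote = caches[other]
                # assumption: prefer a valid remote copy, else read memory
                caches[proc] = remote if remote not in (None, "INV") else memory
        else:  # write
            caches[proc] = value
            if caches[other] is not None:
                # write-update pushes the new value; write-invalidate kills the copy
                caches[other] = value if policy == "update" else "INV"
        trace.append((caches["P"], caches["Q"]))
    return trace


EVENTS = [("P", "read", None), ("Q", "read", None),
          ("Q", "write", "X'"), ("Q", "write", "X''")]

for policy in ("update", "invalidate"):
    print(policy, replay_policy(policy, EVENTS))
```

Under write-update both caches always hold the latest value, at the cost of a broadcast per write; under write-invalidate only the first write to a shared block needs a bus transaction.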

Shared Memory System Coherence

The four combinations to maintain coherence among all caches and global memory are:
  Write-update and write-through
  Write-update and write-back
  Write-invalidate and write-through
  Write-invalidate and write-back


4.4 SNOOPING PROTOCOLS

Snooping protocols are based on watching bus activities and carrying out the appropriate coherency commands when necessary.
Global memory is moved in blocks, and each block has a state associated with it.
Any state might change as a result of one of four operations: read-miss (RM), read-hit (RH), write-miss (WM), and write-hit (WH).
A cache miss means that the requested block is not in the cache, or it is in the cache but has been invalidated.
Snooping protocols differ in whether they update or invalidate shared copies in remote caches in case of a write operation.
They also differ as to where to obtain the new data in the case of a cache miss.

Write-Invalidate & Write-Through

TABLE 4.3 Write-Invalidate Write-Through Protocol

State              Description
Valid [VALID]      The copy is consistent with global memory.
Invalid [INV]      The copy is inconsistent.

Event              Actions
Read-Hit           Use the local copy from the cache.
Read-Miss          Fetch a copy from global memory. Set the state of this copy to Valid.
Write-Hit          Perform the write locally. Broadcast an Invalid command to all caches. Update the global memory.
Write-Miss         Get a copy from global memory. Broadcast an Invalid command to all caches. Update the global memory. Update the local copy and set its state to Valid.
Block replacement  Since memory is always consistent, no write-back is needed when a block is replaced.

The block states and protocol are summarized in Table 4.3.

Example 2 Consider a bus-based shared memory with two processors P and Q as shown in Figure 4.6. Let us see how the cache coherence is maintained using the Write-Invalidate Write-Through protocol. Assume that X in memory was originally set to 5 and the following operations were performed in the order given: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X. Table 4.4 shows the contents of memory and the two caches after the execution of each operation when Write-Invalidate Write-Through was used for cache coherence. The table also shows the state of the block containing X in P's cache and Q's cache.

Figure 4.6 A bus-based shared memory system with two processors P and Q.

TABLE 4.4 Example 2 (Write-Invalidate Write-Through)

                                    Memory      P's Cache           Q's Cache
Serial  Event                       Location X  Location X  State   Location X  State
0       Original value              5
1       P reads X (Read-Miss)       5           5           VALID
2       Q reads X (Read-Miss)       5           5           VALID   5           VALID
3       Q updates X (Write-Hit)     10          5           INV     10          VALID
4       Q reads X (Read-Hit)        10          5           INV     10          VALID
5       Q updates X (Write-Hit)     15          5           INV     15          VALID
6       P updates X (Write-Miss)    20          20          VALID   15          INV
7       Q reads X (Read-Miss)       20          20          VALID   20          VALID
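The rows of Table 4.4 can be reproduced by a small simulation of the rules in Table 4.3. This is a minimal sketch under stated assumptions: `simulate_wi_wt` is a hypothetical name, each block is a single word, and only the two caches P and Q exist. The written values 10, 15 and 20 are taken from Table 4.4.

```python
def simulate_wi_wt(ops, initial=5):
    """Replay ops under write-invalidate/write-through; one result row per op."""
    memory = initial
    # value and state stay None until a cache first holds the block
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    rows = []
    for proc, op, value in ops:
        mine = caches[proc]
        if op == "read":
            if mine["state"] != "VALID":            # Read-Miss: fetch from memory
                mine["value"], mine["state"] = memory, "VALID"
        else:                                       # Write-Hit or Write-Miss
            for name, cache in caches.items():      # broadcast Invalid
                if name != proc and cache["state"] is not None:
                    cache["state"] = "INV"
            memory = value                          # write through to memory
            mine["value"], mine["state"] = value, "VALID"
        rows.append((memory,
                     caches["P"]["value"], caches["P"]["state"],
                     caches["Q"]["value"], caches["Q"]["state"]))
    return rows


OPS_EX2 = [("P", "read", None), ("Q", "read", None), ("Q", "write", 10),
           ("Q", "read", None), ("Q", "write", 15), ("P", "write", 20),
           ("Q", "read", None)]

for serial, row in enumerate(simulate_wi_wt(OPS_EX2), start=1):
    print(serial, row)   # (memory, P value, P state, Q value, Q state)
```

Because every write goes to memory, a read miss never needs to consult another cache, which is why the read branch only looks at `memory`.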

Write-Invalidate & Write-Back (Ownership Protocol)

TABLE 4.5 Write-Invalidate Write-Back Protocol

State                        Description
Shared (Read-Only) [RO]      Data is valid and can be read safely. Multiple copies can be in this state.
Exclusive (Read-Write) [RW]  Only one valid cache copy exists and can be read from and written to safely. Copies in other caches are invalid.
Invalid [INV]                The copy is inconsistent.

Event              Action
Read-Hit           Use the local copy from the cache.
Read-Miss          If no Exclusive (Read-Write) copy exists, then supply a copy from global memory. Set the state of this copy to Shared (Read-Only). If an Exclusive (Read-Write) copy exists, make a copy from the cache that holds the Exclusive (Read-Write) copy, and update global memory and the local cache with the copy. Set the state to Shared (Read-Only) in both caches.
Write-Hit          If the copy is Exclusive (Read-Write), perform the write locally. If the state is Shared (Read-Only), then broadcast an Invalid command to all caches. Set the state to Exclusive (Read-Write).
Write-Miss         Get a copy from either a cache with an Exclusive (Read-Write) copy, or from global memory itself. Broadcast an Invalid command to all caches. Update the local copy and set its state to Exclusive (Read-Write).
Block replacement  If a copy is in an Exclusive (Read-Write) state, it has to be written back to main memory if the block is being replaced. If the copy is in Invalid or Shared (Read-Only) states, no write-back is needed when a block is replaced.
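The rules of Table 4.5 can be exercised with a short simulation. The sketch below is an illustration, not a reference implementation: `simulate_wi_wb` is a hypothetical name, blocks are one word wide, only caches P and Q exist, and block replacement is not modelled. It replays the sequence P reads, Q reads, Q writes 10, Q reads, Q writes 15, P writes 20, Q reads, starting from X = 5 in memory.

```python
def simulate_wi_wb(ops, initial=5):
    """Replay ops under write-invalidate/write-back; one result row per op."""
    memory = initial
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    rows = []
    for proc, op, value in ops:
        mine = caches[proc]
        others = [c for name, c in caches.items() if name != proc]
        owner = next((c for c in others if c["state"] == "RW"), None)
        if op == "read":
            if mine["state"] not in ("RO", "RW"):      # Read-Miss
                if owner is not None:                  # owner supplies the block,
                    memory = owner["value"]            # memory is updated,
                    owner["state"] = "RO"              # and both end up Shared
                mine["value"], mine["state"] = memory, "RO"
        else:
            if mine["state"] != "RW":                  # Write-Hit on RO, or Write-Miss
                for c in others:                       # broadcast Invalid
                    if c["state"] is not None:
                        c["state"] = "INV"
            # with one-word blocks, fetching the old block on a miss has no
            # visible effect; memory is deliberately left untouched
            mine["value"], mine["state"] = value, "RW"
        rows.append((memory,
                     caches["P"]["value"], caches["P"]["state"],
                     caches["Q"]["value"], caches["Q"]["state"]))
    return rows


OPS_EX3 = [("P", "read", None), ("Q", "read", None), ("Q", "write", 10),
           ("Q", "read", None), ("Q", "write", 15), ("P", "write", 20),
           ("Q", "read", None)]

for serial, row in enumerate(simulate_wi_wb(OPS_EX3), start=1):
    print(serial, row)   # (memory, P value, P state, Q value, Q state)
```

Memory stays at 5 until the last read miss forces the owner to supply its copy, which is the saving in bus traffic that write-back buys over write-through.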

The block states and protocol are summarized in Table 4.5.

Example 3 Consider the shared memory system of Figure 4.6 and the following operations: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X. Table 4.6 shows the contents of memory and the two caches after the execution of each operation when Write-Invalidate Write-Back was used for cache coherence. The table also shows the state of the block containing X in P's cache and Q's cache.

TABLE 4.6 Example 3 (Write-Invalidate Write-Back)

                                    Memory      P's Cache           Q's Cache
Serial  Event                       Location X  Location X  State   Location X  State
0       Original value              5
1       P reads X (Read-Miss)       5           5           RO
2       Q reads X (Read-Miss)       5           5           RO      5           RO
3       Q updates X (Write-Hit)     5           5           INV     10          RW
4       Q reads X (Read-Hit)        5           5           INV     10          RW
5       Q updates X (Write-Hit)     5           5           INV     15          RW
6       P updates X (Write-Miss)    5           20          RW      15          INV
7       Q reads X (Read-Miss)       20          20          RO      20          RO

4.4.3 Write-Once

This write-invalidate protocol, which was proposed by Goodman in 1983, uses a combination of write-through and write-back. Write-through is used the very first time a block is written. Subsequent writes are performed using write-back.

TABLE 4.7 Write-Once Protocol

State              Description
Invalid [INV]      The copy is inconsistent.
Valid [VALID]      The copy is consistent with global memory.
Reserved [RES]     Data have been written exactly once and the copy is consistent with global memory. There is only one copy of the global memory block in one local cache.
Dirty [DIRTY]      Data have been updated more than once and there is only one copy in one local cache. When a copy is dirty, it must be written back to global memory.

Event              Actions
Read-Hit           Use the local copy from the cache.
Read-Miss          If no Dirty copy exists, then supply a copy from global memory. Set the state of this copy to Valid. If a Dirty copy exists, make a copy from the cache that holds the Dirty copy, and update global memory and the local cache with the copy. Set the state to Valid in both caches.
Write-Hit          If the copy is Dirty or Reserved, perform the write locally, and set the state to Dirty. If the state is Valid, then broadcast an Invalid command to all caches. Update the global memory and set the state to Reserved.
Write-Miss         Get a copy from either a cache with a Dirty copy or from global memory itself. Broadcast an Invalid command to all caches. Update the local copy and set its state to Dirty.
Block replacement  If a copy is in a Dirty state, it has to be written back to main memory if the block is being replaced. If the copy is in Valid, Reserved, or Invalid states, no write-back is needed when a block is replaced.
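A rough simulation of Table 4.7 shows how the first write goes through to memory while later writes stay local. Everything here is a sketch: `simulate_write_once` is a hypothetical name, blocks are one word wide, only caches P and Q exist, and replacement is omitted. One transition is an assumption not spelled out in the table: a remote Reserved copy is demoted to Valid when another cache reads the block. The operation sequence is P reads, Q reads, Q writes 10, Q reads, Q writes 15, P writes 20, Q reads, starting from X = 5.

```python
def simulate_write_once(ops, initial=5):
    """Replay ops under the write-once protocol; one result row per op."""
    memory = initial
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    rows = []
    for proc, op, value in ops:
        mine = caches[proc]
        others = [c for name, c in caches.items() if name != proc]
        if op == "read":
            if mine["state"] in (None, "INV"):             # Read-Miss
                for c in others:
                    if c["state"] == "DIRTY":              # dirty copy updates memory
                        memory = c["value"]
                    if c["state"] in ("DIRTY", "RES"):     # RES -> VALID is an assumption
                        c["state"] = "VALID"
                mine["value"], mine["state"] = memory, "VALID"
        elif mine["state"] in ("RES", "DIRTY"):            # second and later writes
            mine["value"], mine["state"] = value, "DIRTY"  # write-back: local only
        else:                                              # first write, or Write-Miss
            first_write = mine["state"] == "VALID"
            for c in others:                               # broadcast Invalid
                if c["state"] is not None:
                    c["state"] = "INV"
            if first_write:
                memory = value                             # write-through, once
            mine["value"] = value
            mine["state"] = "RES" if first_write else "DIRTY"
        rows.append((memory,
                     caches["P"]["value"], caches["P"]["state"],
                     caches["Q"]["value"], caches["Q"]["state"]))
    return rows


OPS_EX4 = [("P", "read", None), ("Q", "read", None), ("Q", "write", 10),
           ("Q", "read", None), ("Q", "write", 15), ("P", "write", 20),
           ("Q", "read", None)]

for serial, row in enumerate(simulate_write_once(OPS_EX4), start=1):
    print(serial, row)   # (memory, P value, P state, Q value, Q state)
```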

The block states and protocol are summarized in Table 4.7.

Example 4 Consider the shared memory system of Figure 4.6 and the following operations: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X. Table 4.8 shows the contents of memory and the two caches after the execution of each operation when Write-Once was used for cache coherence. The table also shows the state of the block containing X in P's cache and Q's cache.

TABLE 4.8 Example 4 (Write-Once Protocol)

                                    Memory      P's Cache           Q's Cache
Serial  Event                       Location X  Location X  State   Location X  State
0       Original value              5
1       P reads X (Read-Miss)       5           5           VALID
2       Q reads X (Read-Miss)       5           5           VALID   5           VALID
3       Q updates X (Write-Hit)     10          5           INV     10          RES
4       Q reads X (Read-Hit)        10          5           INV     10          RES
5       Q updates X (Write-Hit)     10          5           INV     15          DIRTY
6       P updates X (Write-Miss)    10          20          DIRTY   15          INV
7       Q reads X (Read-Miss)       20          20          VALID   20          VALID

4.4.4 Write-Update and Partial Write-Through

In this protocol an update to one cache is written to memory at the same time it is broadcast to other caches sharing the updated block. These caches snoop on the bus and perform updates to their local copies. There is also a special bus line, which is asserted to indicate that at least one other cache is sharing the block.

TABLE 4.9 Write-Update Partial Write-Through Protocol

State                    Description
Valid Exclusive [VAL-X]  This is the only cache copy and is consistent with global memory.
Shared [SHARE]           There are multiple cache copies shared. All copies are consistent with memory.
Dirty [DIRTY]            This copy is not shared by other caches and has been updated. It is not consistent with global memory. (Copy ownership.)

Event              Action
Read-Hit           Use the local copy from the cache. State does not change.
Read-Miss          If no other cache copy exists, then supply a copy from global memory. Set the state of this copy to Valid Exclusive. If a cache copy exists, make a copy from the cache. Set the state to Shared in both caches. If the cache copy was in a Dirty state, the value must also be written to memory.
Write-Hit          Perform the write locally and set the state to Dirty. If the state is Shared, then broadcast data to memory and to all caches and set the state to Shared. If other caches no longer share the block, the state changes from Shared to Valid Exclusive.
Write-Miss         The block copy comes from either another cache or from global memory. If the block comes from another cache, perform the update and update all other caches that share the block and global memory. Set the state to Shared. If the copy comes from memory, perform the write and set the state to Dirty.
Block replacement  If a copy is in a Dirty state, it has to be written back to main memory if the block is being replaced. If the copy is in Valid Exclusive or Shared states, no write-back is needed when a block is replaced.
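The following sketch walks the state machine of Table 4.9 through reads, writes and a replacement. It is illustrative only: `simulate_wu_pwt` is a hypothetical name, blocks are one word wide, and only caches P and Q exist. The "special bus line" is approximated by checking which other caches currently hold the block, and a sole remaining Shared copy is promoted to Valid Exclusive at the moment the other copy is replaced (an assumption about timing). The sequence replayed is P reads, P writes 10, Q reads, Q writes 15, Q reads, P's block is replaced, Q writes 20, P writes 25, starting from X = 5.

```python
def simulate_wu_pwt(ops, initial=5):
    """Replay ops under write-update/partial write-through; one row per op."""
    memory = initial
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    rows = []
    for proc, op, value in ops:
        mine = caches[proc]
        # stands in for the shared bus line: other caches holding the block
        sharers = [c for name, c in caches.items()
                   if name != proc and c["state"] is not None]
        if op == "read":
            if mine["state"] is None:                   # Read-Miss
                if sharers:
                    src = sharers[0]
                    if src["state"] == "DIRTY":         # dirty supplier also updates memory
                        memory = src["value"]
                    mine["value"] = src["value"]
                    for c in sharers + [mine]:
                        c["state"] = "SHARE"
                else:
                    mine["value"], mine["state"] = memory, "VAL-X"
        elif op == "write":
            mine["value"] = value
            if sharers:                                 # shared hit, or miss served by a cache
                for c in sharers:
                    c["value"], c["state"] = value, "SHARE"
                memory = value                          # partial write-through
                mine["state"] = "SHARE"
            else:                                       # exclusive hit, or miss served by memory
                mine["state"] = "DIRTY"
        else:                                           # replace
            if mine["state"] == "DIRTY":
                memory = mine["value"]                  # write back on eviction
            mine["value"], mine["state"] = None, None
            if len(sharers) == 1 and sharers[0]["state"] == "SHARE":
                sharers[0]["state"] = "VAL-X"           # no longer shared
        rows.append((memory,
                     caches["P"]["value"], caches["P"]["state"],
                     caches["Q"]["value"], caches["Q"]["state"]))
    return rows


OPS_EX5 = [("P", "read", None), ("P", "write", 10), ("Q", "read", None),
           ("Q", "write", 15), ("Q", "read", None), ("P", "replace", None),
           ("Q", "write", 20), ("P", "write", 25)]

for serial, row in enumerate(simulate_wu_pwt(OPS_EX5), start=1):
    print(serial, row)   # (memory, P value, P state, Q value, Q state)
```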

The block states and protocol are summarized in Table 4.9.

Example 5 Consider the shared memory system of Figure 4.6 and the following operations: (1) P reads X; (2) P updates X; (3) Q reads X; (4) Q updates X; (5) Q reads X; (6) Block X is replaced in P's cache; (7) Q updates X; (8) P updates X. Table 4.10 shows the contents of memory and the two caches after the execution of each operation when Write-Update Partial Write-Through was used for cache coherence. The table also shows the state of the block containing X in P's cache and Q's cache.

TABLE 4.10 Example 5 (Write-Update Partial Write-Through)

                                                     Memory      P's Cache           Q's Cache
Serial  Event                                        Location X  Location X  State   Location X  State
0       Original value                               5
1       P reads X (Read-Miss)                        5           5           VAL-X
2       P updates X (Write-Hit)                      5           10          DIRTY
3       Q reads X (Read-Miss)                        10          10          SHARE   10          SHARE
4       Q updates X (Write-Hit)                      15          15          SHARE   15          SHARE
5       Q reads X (Read-Hit)                         15          15          SHARE   15          SHARE
6       Block X is replaced in P's cache (Replace)   15                              15          VAL-X
7       Q updates X (Write-Hit)                      15                              20          DIRTY
8       P updates X (Write-Miss)                     25          25          SHARE   25          SHARE

4.4.5 Write-Update and Write-Back

This protocol is similar to the previous one except that instead of writing through to the memory whenever a shared block is updated, memory updates are done only when the block is being replaced. The block states and protocol are summarized in Table 4.11.

Write-Update & Write-Back


TABLE 4.11 Write-Update Write-Back Protocol

State                    Description
Valid Exclusive [VAL-X]  This is the only cache copy and is consistent with global memory.
Shared Clean [SH-CLN]    There are multiple cache copies shared.
Shared Dirty [SH-DRT]    There are multiple shared cache copies. This is the last one being updated. (Ownership.)
Dirty [DIRTY]            This copy is not shared by other caches and has been updated. It is not consistent with global memory. (Ownership.)

Event              Action
Read-Hit           Use the local copy from the cache. State does not change.
Read-Miss          If no other cache copy exists, then supply a copy from global memory. Set the state of this copy to Valid Exclusive. If a cache copy exists, make a copy from the cache. Set the state to Shared Clean. If the supplying cache copy was in a Valid Exclusive or Shared Clean state, its new state becomes Shared Clean. If the supplying cache copy was in a Dirty or Shared Dirty state, its new state becomes Shared Dirty.
Write-Hit          If the state was Valid Exclusive or Dirty, perform the write locally and set the state to Dirty. If the state is Shared Clean or Shared Dirty, perform the update and change the state to Shared Dirty. Broadcast the updated block to all other caches. These caches snoop the bus, update their copies, and set their state to Shared Clean.
Write-Miss         The block copy comes from either another cache or from global memory. If the block comes from another cache, perform the update, set the state to Shared Dirty, and broadcast the updated block to all other caches. Other caches snoop the bus, update their copies, and change their state to Shared Clean. If the copy comes from memory, perform the write and set the state to Dirty.
Block replacement  If a copy is in a Dirty or Shared Dirty state, it has to be written back to main memory if the block is being replaced. If the copy is in Valid Exclusive, no write-back is needed when a block is replaced.
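A sketch of Table 4.11 in the same style makes the ownership hand-off between Shared Dirty and Shared Clean visible. The name `simulate_wu_wb`, the one-word blocks and the two-cache setup are all illustrative assumptions. Table 4.11 does not say what happens to the last remaining sharer when the other copy is replaced; this sketch assumes a Shared Clean survivor becomes Valid Exclusive and a Shared Dirty survivor becomes Dirty. The sequence replayed is P reads, P writes 10, Q reads, Q writes 15, Q reads, Q's block is replaced, P writes 20, Q writes 25, starting from X = 5.

```python
def simulate_wu_wb(ops, initial=5):
    """Replay ops under write-update/write-back; one result row per op."""
    memory = initial
    caches = {name: {"value": None, "state": None} for name in ("P", "Q")}
    rows = []
    for proc, op, value in ops:
        mine = caches[proc]
        sharers = [c for name, c in caches.items()
                   if name != proc and c["state"] is not None]
        if op == "read":
            if mine["state"] is None:                    # Read-Miss
                if sharers:
                    mine["value"], mine["state"] = sharers[0]["value"], "SH-CLN"
                    for c in sharers:                    # a modified supplier keeps ownership
                        dirty = c["state"] in ("DIRTY", "SH-DRT")
                        c["state"] = "SH-DRT" if dirty else "SH-CLN"
                else:
                    mine["value"], mine["state"] = memory, "VAL-X"
        elif op == "write":
            mine["value"] = value
            if sharers:                                  # broadcast update, take ownership
                for c in sharers:
                    c["value"], c["state"] = value, "SH-CLN"
                mine["state"] = "SH-DRT"
            else:                                        # memory is not updated
                mine["state"] = "DIRTY"
        else:                                            # replace
            if mine["state"] in ("DIRTY", "SH-DRT"):
                memory = mine["value"]                   # owner writes back
            mine["value"], mine["state"] = None, None
            if len(sharers) == 1:                        # assumption: promote the survivor
                last = sharers[0]
                if last["state"] == "SH-CLN":
                    last["state"] = "VAL-X"
                elif last["state"] == "SH-DRT":
                    last["state"] = "DIRTY"
        rows.append((memory,
                     caches["P"]["value"], caches["P"]["state"],
                     caches["Q"]["value"], caches["Q"]["state"]))
    return rows


OPS_EX6 = [("P", "read", None), ("P", "write", 10), ("Q", "read", None),
           ("Q", "write", 15), ("Q", "read", None), ("Q", "replace", None),
           ("P", "write", 20), ("Q", "write", 25)]

for serial, row in enumerate(simulate_wu_wb(OPS_EX6), start=1):
    print(serial, row)   # (memory, P value, P state, Q value, Q state)
```

Memory changes only once in this run, at the replacement of the Shared Dirty copy, whereas the partial write-through variant updates memory on every write to a shared block.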

Example 6 Consider the shared memory system of Figure 4.6 and the following operations: (1) P reads X; (2) P updates X; (3) Q reads X; (4) Q updates X; (5) Q reads X; (6) Block X is replaced in Q's cache; (7) P updates X; (8) Q updates X. Table 4.12 shows the contents of memory and the two caches after the execution of each operation when Write-Update Write-Back was used for cache coherence. The table also shows the state of the block containing X in P's cache and Q's cache.

TABLE 4.12 Example 6 (Write-Update Write-Back)

                                                     Memory      P's Cache            Q's Cache
Serial  Event                                        Location X  Location X  State    Location X  State
0       Original value                               5
1       P reads X (Read-Miss)                        5           5           VAL-X
2       P updates X (Write-Hit)                      5           10          DIRTY
3       Q reads X (Read-Miss)                        5           10          SH-DRT   10          SH-CLN
4       Q updates X (Write-Hit)                      5           15          SH-CLN   15          SH-DRT
5       Q reads X (Read-Hit)                         5           15          SH-CLN   15          SH-DRT
6       Block X is replaced in Q's cache (Replace)   15          15          VAL-X
7       P updates X (Write-Hit)                      15          20          DIRTY
8       Q updates X (Write-Miss)                     15          25          SH-CLN   25          SH-DRT
