You are on page 1of 34

US005627992A

Umted States Patent [19] [11] Patent Number: 5,627,992


Baror [45] Date of Patent: May 6, 1997

[54] ORGANIZATION OF AN INTEGRATED 4,471,430 9/1984 BoWden et a1. 364/200


CAC HE UNIT F0R FLEXIBLE USAGE IN 4,493,02 6 1/1985 01n ow1c
' h .............................. .. 364/200
MICROPROCESSOR (List continued on next page.)
FOREIGN PATENT DOCUMENTS
[75] Inventor: Gigy Baror. Austin, Tex.
0075714A2 4/1983 European Pat. O?'. .
- _ - - 0082949A3 6/1983 European Pat. Off. .
[73] Ass1gnee. glyfanced Micro Devices. Sunnyvale. 0325421A2 7/1989 European Pat. OE
' 0325419A2 7/1989 European Pat. O?. .
0325422A2 7/1989 European Pat. OE. .
[21] Appl. No.: 434,494 0325420A2 7/1989 European Pat. 01f. .
_ 2344093A1 3/1976 France .
[22] P119113 May 4, 1995 8728494 12/1987 United Kingdom ................. .. 364/200
WO81/
Related U.S. Application Data 01894A1 12/1980 WLPO .

[63] Continuation of Ser. No. 559,550, Jul. 23, 1990, abandoned, OTHER PUBLICATIONS
mggggge?fonnnuauon of Ser' No 146076 Jam-2O 1988 IBM Technical Disclosure Bulletin, Shared Castout
Buffer, vol. 28, No. 3. Au . 1985, . 1169-1174.
[5 Int- Cl~6 .................................................... .. Bursky, Intelligent Cache_nemory%hip Speeds Access to
[52] U.S. Cl. ........................ .. 395/460; 395/461; 395/469; Most CPUs_ Elcctl'onic Design, Mar. 5, 1987, vol. 35, N0,
395/470; 395/471; 395/472; 395/415 5. p. 30.
[58] Field of Search ................................... .. 364/200. 900; . .
395/469. 470, 471, 403. 472, 275. 461. (Llst ntmued on next Page)
402. 415. 892 Primary ExaminerChrist0pher B. Shin
Attorney, A ent, or Firm-Conle , Rose & Ta on EC; B.
[56] References Cited N061 Kivljlf y y
U.S. PATENT DOCUMENTS [57] ABSTRACT
3,761,833 9/1973 Alvarez et a1. ......................... .. 331/45 A computer system having a cache mamOry subsysmm
3,898,624 8/1975 Tobias .. 395/258 . . . . . .
WhlCh allows ?exible setting of caching p0l1c1es on a page
4,156,906 5/1979 Ryan .......... .. 395/455 b . d ?n b . A h bl k tam ? 1d . .d d
4,157,586 6/1979 Gannon et a1. .. .. 364/200 as an a e 1515' _a_ e S S 6 15 Pro 6
4 >228 ,503 10/1980 Waite et aL . I . . . . . . _ ' ' _ ' _ a 395,448 for each cache block to lndicate the cache blocks state, such
4,357,656 11/1932 Sam et a1_ _ . _ , , _ . _ _ _ _ I _ _ _ _ __ 364/900 as shared or exclusive. The cache block status ?eld controls
4,394,731 7/1983 Flusche et a1. .... .. 395/472 whether the cache control unit operates in a write-through
4,400,770 8/1983 Chan et a1. ........ .. 395/403 write mode or in a copy-back write mode when a write hit
4,423,433 12/1933 Tague at 81 395/375 access to the block occurs. The cache block status ?eld may
4,435,759 3/1984 Baum et a1. 364/200 be updated by either a TLB Write policy ?eld contained
$3312??? at al' within a translation look-aside buffer entry which corre
4:442:487 4/1984 Flei?hl' et a1. ..... .. 395/449 .sponds to the Page of the acceis or by a Sec? "1pm
4,445,174 4/1984 Fletcher ......... .. 395/448 mdePcndant of the TLB entry Whlch may be Provlded from
4,455,606 6/1984 Cushing et a1. . 395/307 the System on a line basis
4,464,7l7 8/1984 Keeley et a1. .. .... . . . . . . .. 395/449

4,471,429 9/1984 Porter et a1. .......................... .. 364/200 32 Claims, 4 Drawing Sheets

PROCESSOR
BUS UNIT
I CACHE UNIT
gi??sm?lclziil
I
I MEMORY BUS UNIT
pm /312 313 314 l
TAG VALID 58111100; LRU
I ARRAY ARRAY ARRAY ARRAY I
CBSF /305
301i 512x20 512x4 +5CTEEX%'+ 25ex1 I XIEQISERSE
A0451 I I25 11 {9 Is | LOG":
: 332 ADDRESS BUS 111 310 I32 315'
MBJORY ARRAY SPECML
\ m: ~ REsisTms I
DATA BLOCK

00-031
1502 \
DATA _ 52
{I 32
I
I
OTSH'F'ED :2 DATA BUS 1

355
"-5
i'_"3o3_ ---- m"I__-'zo1_'1|
PROCESSOR \{320 CACHE 321 I MEMORY MEMORY
.,,,,- I \ 1 I @111?
W155- T"AEE-wRFE___< _____ __I_ __
POLICY FIELD
5,627,992
Page 2

US. PATENT DOCUMENTS Bell et al. An Investigation of Alternative Cache Organi


zations, IEEE Transactions on Computers, vol. C-23, No.
4,506,323 3/1985 Pusic et a1. ........................... .. 395/488
4,513,367 4/1985 Chan et al. .. 364/200 4, Apr. 1974, pp. 346-351.
4,530,047 7/1985 Roger et a1. . 364/200 Dubois et al., E?ects of Cache Coherency in Multiproces
4,553,201 11/1985 Pollack, Jr. .. 364/200 sors, IEEE Transactions on Computers, vol. C31, No. 11,
4,608,631 8/1986 Sti?ler et al. 395/293 Nov. 1982, pp. 1083-1099.
4,616,310 10/1986 Dill et a1. 395/250
Papamaroos et al., A Low-Overhead Coherence Solution
4,622,631 11/1986 Frank et a1. . 395/800
4,648,029 3/1987 Cooper et a1. 364/200 for Multiproces sors with Private Cache Memories. The 1 1"
4,654,819 3/1987 Stif?er et a1. ......................... .. 395/489 Annual International Symposium on Computer Architec
4,669,043 5/1987 Kaplinsky. ture. Ann Arbor, Michigan, Jun. 05-07/ 1984, pp. 348-354.
4,713,755 12/1987 Worley, Jr. et al. .................. .. 364/200 Patel, Analysis of Multiprocessors with Private Cache
4,755,930 7/1988 Wilson, Jr. et a1. . Memories, lEEETransactions on Computers, vol. C31, No.
4,758,982 7/1988 Price ..................................... .. 395/435
4,766,534 8/1988 De Bemedictis 364/200
4. Apr, 1982, pp. 296-304.
4,775,955 10/1988 Liu ................. .. 364/200 Rao, Performance Analysis of Cache Memories, Journal
4,787,028 11/1988 Finfrock et a1. . 364/200 of the Association for Computng Machinery, vol. 25, No. 3,
4,794,521 12/1988 Ziegler et al. 364/200 JuL, 1978, pp. 378-395.
4,797,813 1/1989 Igarashi ...... .. 395/445
Rudolph et a1. Dynamic Decentralized Cache Schemes for
4,811,208 3/1989 Myers et al. . 395/800
4,811,209 3/1989 Rubinstein ..... .. 395/471
M]MD Parallel Processors, The 11 Annual International
4,825,360 4/1989 Knight, Jr. ..... .. 364/200 Symposium on Computer Architecture, Ann Arbor, Michi
4,843,542 6/1989 Dashiell et al. . 364/200 gan, Jun. 05-07/1984. pp. 340-347.
4,847,804 7/1989 Shaffer et al. 364/900 Smith, Cache Memories, Computing Surveys, vol. 14, No.
4,851,990 7/1989 Johnson et a1. .. 395/280 3, Sep., 1982, pp. 473-530.
4,853,846 8/1989 Johnson et a1. .. 364/200
4,860,192 8/1989 Sachs et a1. . 364/200 Censier et al, A New Solution to Coherence Problems in
5,008,813 4/1991 Crane et al. .. 364/200 Multicache Systems, IEEE Transactions on Computers,
5,025,366 6/1991 Baror ......... .. 395/455 vol. C-27. No. 12., Dec., 1978, pp. 1112-1118.
5,091,846 2/1992 Sachs et al. 395/250 Tang, Cache System Design in the Tightly Coupled Mul
5,136,691 8/1992 Baror ......... .. 395/466
tiprocessor System, National Computer Conference, 1976,
5,185,878 2/1993 Baror et al. .. 395/450
pp. 749-753.
OTHER PUBLICATIONS Yen et al., Analysis of Multiprocessor Cache Organizations
R.H. Katz et al., Implementing a Cache Consistency Pro
iwth Alternative Main Memory Update Policies, The 8"
tocol. 12" Annual International Symposium on Computer Annual Symposium on Computer Architecture, Minneapolis
Architecture, Boston Mass., Jun. 17-19/1985, pp. 276-283.
Minn. May 12-14/1981, pp. 89-105.
Ian Wilson, Extending 80386 Performance. New Elec Goodman, Using Cache Memory to Reduce Proces
tronics. vol. 20. No. 7, Mar. 31, 1995, pp. 30-33. sor-Memory Tra?ic,pp. 124-130.
Sachs. The Fairchild Clipper Microprocessor Family, A Implementing a Cache Consistency Protocol, Intema
High-Performance 32-Bit Processor. 8080 Wescon Pro tional Symposium on Computer Architecture (12thz1985;
ceedings. 1985 Session. NY, US. Nov. 19-22/1985. Boston, Mass.).
US. Patent May 6, 1997 Sheet 1 of 4 5,627,992

\/
m2\25E6
Em0a5m:
<M150
A
v./
m$\<m$4E6w5z238
3Hvm2gm. 02wt
A\G
\\M201:03pm;
/
Q:\ /3A
2$?z035\8:ES?w
8SD

OnTm8:2295 ;
\EOE:

DH
5 1627.992
1 2
ORGANIZATION OF AN INTEGRATED The ?exible implementation of a plurality of multipro
CACHE UNIT FOR FLEXIBLE USAGE IN cessor support schemes in the cache system is achieved,
SUPPORTING MICROPROCESSOR according to the preferred embodiment of the invention, via
OPERATIONS methods and apparatus which allow the user to specify a
desired multiprocessor support scheme through the setting
This application is a continuation of application Ser. No. of appropriate option bits in on-chip special registers. This
07/559550 ?led Jul. 23. 1990. now abandoned. which is a can be performed under software control and allows a high
continuation-application of Ser. No. 07/ 146.076. ?led Jan. performance multiprocessor cache system to be designed
20. 1988. now abandoned. with few parts. at low cost and with the ability to perform
10 with high e?iciency.
BACKGROUND OF THE INVENTION The preferred embodiment of the invention is described in
1. Field of the Invention the context of an integrated cache unit (ICU) which includes
The invention relates generally to methods and apparatus 8 k bytes of data. 512-20 bit words of tags. and the necessary
for organizing cache units and more particularly relates to control to fully implement cache functions in both RISC and
methods and apparatus for organizing an integrated cache non-RISC environments. The preferred embodiment of the
unit which may be used to implement a variety of multi ICU has two buses. one for the processor interface and the
processor support schemes in a ?exible manner. The result other for a memory interface. An exemplary processor and
ing cache unit is programmable. can be operated in both high speed interface supported by the ICU is described in
single and multiprocessor modes. and achieves cache data copending application Ser. No. 012,226 ?led Feb. 9. 1987,
consistency in multiprocessor mode by allowing a plurality 20 assigned to the same assignee as the instant invention,
of user selectable multiprocessor support schemes to be hereby incorporated by reference.
implemented and tailored to cache system application. The For the purposes of this disclosure. the processor bus is a
novel system supports high speed instruction and data non-multiplexed 32 bit address and data bus. The processor
processing in a Reduced Instruction Set Computer (RISC) bus supports burst and pipeline accesses as taught in the
environment and is capable of supporting all of the aforesaid 25 copending application. Additionally. for the purposes of this
functions with an architecture suitable for integration on a disclosure. the memory bus provides the interface to main
single chip. memory and accommodates multiprocessor organizations.
2. Description of the Related Art According to the preferred embodiment of the invention,
Cache memories and controllers for cache memories are 30 the ICU is capable of operating at frequencies in excess of
well known. Devices integrating both the memory and I 25 megahertz. achieving processor access times of two
control features on a single chip are also known. These cycles for the ?rst access in a sequence. and one cycle for
include the commercially available: 43608 manufactured by burst-mode or pipelined accesses. It can be used as either an
NEC. Such devices are hereinafter referred to as Integrated instruction cache or data cache with ?exible internal cache
Cache Units (ICUs). organization. A RISC processor and two ICUs (for instruc
35
Prior art ICU devices utilize predetermined algorithms for tion cache and data cache) implements a very high perfor
caching data and instructions i.e., the devices are not pro mance processor with 16 k bytes of cache. Larger caches can
grammable. Heretofore. integrating cache memory. a cache be designed by using additional ICUs.
controller and programmability features on a single chip has In one embodiment, a computer system having a cache
not been achieved due in part to circuit density and data path 40
memory subsystem in accordance with the invention allows
requirement. In addition to not being programmable, no ?exible setting of caching policies on a page basis and a line
known ICU architecture has overcome the circuit density basis. A cache block status ?eld is provided for each cache
and data path requirement problems associated with sup block to indicate the cache blocks state, such as shared or
porting high speed RISC systems having multiprocessor exclusive. The cache block status ?eld controls whether the
capabilities. 45
cache control unit operates in a write-through write mode or
A programmable integrated cache unit would be desirable in a copy-back write mode when a write hit access to the
since it would have the inherent ?exibility to permit the block occurs. The cache block status ?eld may be updated by
selection and/or modi?cation of caching algorithms. either a TLB write policy ?eld contained within a translation
Additionally, a programmable ICU which incorporates a look-aside buffer entry which corresponds to the page of the
plurality of user selectable multiprocessor support schemes access. or by a second input independent of the TLB entry
would allow cache data consistency to be assured in a which may be provided from the system on a line basis.
?exible manner. It is an object of the invention to provide methods and
A single chip ICU architecture with the aforementioned apparatus for realizing a programmable cache unit which has
features would also be desirable to minimize space and unit enough inherent ?exibility to support a wide variety of user
power requirements. Still further, it would be desirable to be 55 speci?ed multiprocessor support schemes, while imposing
able to use such an integrated cache unit to support high only minimal restrictions on the multiprocessor system
speed processing operations in both single and multiproces organization.
sor modes. for both RISC and non-RISC environments. It is a further object of the invention to provide an ICU
architecture including means for choosing from a plurality
SUMMARY OF THE INVENTION of multiprocessor support schemes to assure cache data
Methods and apparatus are disclosed for realizing an consistency.
integrated cache unit which may be used to ?exibly imple It is still a further object of the invention to provide
ment a plurality of multiprocessor support schemes. The methods and apparatus for realizing a programmable inte
preferred embodiment of the invention comprises both a grated cache unit suitable for supporting high speed data and
cache memory and a cache controller on a single chip, is 65 instruction processing applications, in a multiprocessing
programmable and includes the other aforementioned desir system. in both RISC and non-RISC architecture environ
able features. ments.
5,627,992
3 4
It is yet another object of the invention to realize the typical system con?guration to. for example, allow com
aforementioned objectives with a cache architecture which mercially available peripheral devices having speeds much
can be integrated on a single chip. lower then the high speed processor, to be attached to the
The disclosed ICU can be used as either a data cache or system without limiting the performance of the processor,
instruction cache. cache units. etc.
Further features include ?exible and extensive multipro A typical arithmetic accelerator is described in copending
cessor support hardware. modularity, low power application Ser. No. 771,385. ?led Aug. 30, 1985. also
requirements. etc. A combination of bus watching. owner assigned to the same assignee as the instant invention.
ship schemes. software control and hardware control are also Before proceeding with the detailed description. the novel
10
used to achieve cache consistency. integrated cache architecture is described hereinafter using
These and other objects and features of the present general cache terms and multiprocessor cache terms which
invention will become apparent to those skilled in the art themselves need to be de?ned ?rst for the sake of clarity.
upon consideration of the following detailed description and The general cache terms used herein are de?ned as follows:
the accompanying Drawing. in which like reference desig 15
BlockA cache block is a group of sequential words
nations represent like features throughout the ?gures. associated with a tag. A cache block is allocated and
replaced as a group whenever required. In the preferred
BRIEF DESCRIPTION OF THE DRAWING embodiment of the ICU the block size is four words (one tag
FIG. 1 depicts. in block diagram form. a computing per four words).
system that includes two of the novel ICUs. Block statusStatus bits which are associated with a
FIG. 2 depicts the pin package and the various inputs and cache block. According to the preferred embodiment of the
outputs. to/from the novel ICU. ICU, there are two status bits per block. They specify the
modi?ed and shared status of a block.
FIG. 3 is a data ?ow diagram depicting data ?ow between
the processor bus unit and memory bus unit of FIG. 1. via Block status array-An on chip random access memory
25 array that contains the block status bits.
the novel ICU.
FIG. 4 depicts a simpli?ed shared bus multiprocessor Copy-backA write policy in which a write access is
system with two ICUs per processor. one used as an instruc performed only in the cache, for the case of hit. The block
tion cache. the other used as a data cache. which includes the written data is marked as modi?ed. The
data is written (copied back) to the main memory only when
DETAILED DESCRIPTION the modi?ed block is replaced.
Data cache-A cache which is used for caching fre
FIG. 1 depicts a typical computing system con?guration
quently used processor data variables.
that would include the novel ICU.
Direct-mappedThis is an alternate term for a one-way
For the sake of illustration only. the invention will be
set-associative organization. A speci?c address can be
described in the context of a RISC processing system having 35
cached only in one speci?c location (directly mapped by the
both single and multiprocessor modes of operation. It will be
clear to those skilled in the art that the ICU. to be described address) in the cache.
in detail hereinafter. may be used in non-RISC environments Hit--The word speci?ed by the access address is present
as well. in the cache memory array. An address match is found in the
40 tag array, and the corresponding valid bit is set.
FIG. 1 depicts two of the new ICUs. ICU 101 is shown
being used as an instruction cache. while ICU 102 is shown InvalidateAn operation that removes valid data from
being used as a data cache. the cache. One or more valid bits are reset, so that the
corresponding words become invalid.
An example of the depicted, and suitable, high speed
interface coupling the RISC streamlined instruction proces Instruction cacheA cache which is used for caching
sor (SIP) to the caches is described in detail in the copending 45 frequently used processor instructions.
application Ser. No. 012,226 previously incorporated herein Least recently used (LRU)A replacement algorithm in
by reference. which the block to be replaced in chosen according to the
The incorporated application also teaches the various history of its usage. The least recently used block is
operating modes of the SIP processor inputs and outputs, etc. replaced.
which will be referenced herein and will be shown as Lock in cacheData variables or instructions can be
supported by the novel ICU. locked in the cache. They will not be replaced even if they
FIG. 1 goes on to show the processor bus comprised of are chosen by the replacement algorithm. Note that the
Address Bus 120 and Data Bus 121. Instructions in Instruc *LOCK input to the ICU, to be described hereinafter,
tion ROM 150 and the instruction cache. ICU 101, are 55 speci?es interlock operations, and it is not used for lock in
addressed by processor 110 via Address Bus 120. Instruc cache operations. A lock bit is associated with each block to
tions fetched are shown transmitted to processor 110 via bus facilitate locking the block.
125. Memory array0n chip random access memory array
The data cache. ICU 102, is also addressed via Address that contains the cached data or instructions.
Bus 120. Bidirectional data ?ow is shown possible between Memory bus-The bus that connects the cache to the
processor 110 and ICU 102 via Data Bus 121. main memory.
Memory Bus 175 is shown as the bus coupling main MissThe word speci?ed by the access address is not
memory 190 with ICU 101 and ICU 102. For the sake of present in the cache memory array.
completeness. arithmetic accelerator 195, which would be Modi?ed blockA block is marked as modi?ed when it
part of a typical RISC system con?guration, is shown 65 is written in a copy-back write policy. This indicates that the
coupled to buses 120 and 121. Also, Data Transfer Control block is modi?ed relative to the main memory, and contains
lers (DTCs) shown as DTC 198, may be used as part of a the more recent version of the data.
5,627,992
5 6
Non-cacheableAn instruction or data variable which Update memoryAn operation that causes the update of
cannot be cached. A non-cacheable operation is transferred the main memory from the cache. A modi?ed block is
by the ICU to the memory bus. The cache is not searched for written by the cache to the main memory so that the memory
it. and no cache block is allocated. is updated with the most current version of the data.
PrefetchThe operation of fetching data variables or Valid arrayOn chip random access memory array that
instructions into the cache before they are requested contains the valid bits.
Prefetch buffer-A buffer which holds prefetched data Valid bitA bit that indicates the validity of the cached
variables or instructions read from the memory bus. before data. In the preferred embodiment of the ICU a valid bit is
they are written into the cache. In the ICU the read buffer is
used for the function of a prefetch buffer. associated with each cached word.
Preload-A special prefetch operation that loads the WayA group of tags associated with a tag array module.
cache with speci?c data variables or instructions. The Only one tag speci?ed by the modules address decoder is
addresses for the preload operation are speci?ed by the user read and compared from each way. In the preferred embodi
as opposed to other prefetch operations in which the ment of the ICU there are two ways. each containing 256
prefetched addresses are determined by the cache. tags.
15
Processor busThe bus that connects the cache to the Write-allocate-In case of cache miss for write operation.
processor. a cache block is allocated for the block that contains the
Random replacement-A replacement algorithm in which written address. A reload operation is performed for the
the block to be replaced is chosen randomly. required block. For non-write-allocate, a cache block is not
Read bu?erA buffer which holds data variables or
allocated, and the write is performed only in the memory.
instructions read ?om the memory bus. before they are Write bu?erA buifer which holds write accesses infor
written into the cache. In the preferred embodiment of the mation (address, data, and control) until the write access is
ICU the read buffer is four words deep. and it is also used performed on the memory bus. The preferred embodiment of
for the function of a prefetch buffer. the ICU includes a four deep write buffer that can hold up
Read-throughIn the case of cache miss. the required 25 to four write accesses.
data or instruction is transferred to the processor as soon as Write-through-A write policy in which every write
it is accepted from the memory. The reload operation is access is performed into the main memory. In the case of
completed in parallel. In a cache with no read-through. the cache hit, the data is also written into the cache.
reload operation is completed before the required data or The multiprocessor cache terms used herein are de?ned as
instruction is transferred to the processor. 30 follows:
ReloadThe operation that is performed in the case of Bus watching-The memory bus is monitored (watched)
cache miss for fetching the required data or instructions by the slave caches. They compare the transferred address to
from the main memory. tag bu?er addresses. A special operation can be performed
Replacement algorithrnThe algorithm that determines in the case that a match is found. The terms bus snooping or
the block to be replaced when a new block is allocated in the 35 snooping cache are equivalent.
cache. One block from the set that contains the required new Cache consistency-This is a different term used to
address is chosen. describe data consistency in a multiprocessor cache systems.
ScopeThis term is used in the context of cache instruc Data consistency-This is the main problem in multipro
tion to de?ne the scope of the instruction e?tect. The instruc cessor cache systems. If a variable is shared by multiple
40
tion can affect one speci?c ICU. multiple instruction caches, processors it can be cached in multiple caches. The most
multiple data caches, or multiple instruction and data caches. up-to-date version of the variable should be supplied when
Set-A group of tags and the associated cache blocks, ever the variable is accessed. This insures the consistency of
which are read and compared concurrently to the requested the data variables, throughout the system.
address. A match can be found to any of the sets tags. The Data interventionAn operation that can be performed
set is speci?ed according to some of the address bits. The by a slave cache when a match is found in the case of
number of tags in the set is equal to the number of ways. memory-bus read access. If the slave cache contains modi
(to be de?ned hereinafter), in the cache organization. ?ed data (the most current version of the data), it intervenes
Set-associative-A cache organization which allows in the access and supplies the data. In this case the main
caching of a speci?c address in a number of possible 50
memory should not supply the data.
locations in the cache. This number is referred to as the Exclusive--Indicates that a variable or a cache block is
degree of associativity. It speci?es the number of ways inpresent exclusively in one cache. It can be used exclusively
the cache organization and the number of tags which are by one processor, or used by more processors but exists only
read and compared concurrently. The disclosed ICU sup in one cache. In the preferred embodiment of the ICU a
ports two-way and one-way set-associative organizations. 55 LOW in the shared block status bit indicates an exclusive
Sub-blockA group of one or more words which are block. The block can be either exclusive non-modi?ed or
fetched from main memory together with the required word exclusive modi?ed.
for the reload operation. The sub-block size de?nes the InterlockInterlock operations are used for temporarily
maximum number of words that are fetched. The sub-block locking a variable (interlock variable) for exclusive usage of
size is lower or equal to the block size. one processor. No other processors or caches are allowed to
Tag-The tag identi?es the address of the data or instruc use the variable while it is interlocked. The *LOCK input to
tion which is currently present in the cache. A cache tag is the ICU, to be described hereinafter, indicates an interlock
associated with each cache block. and it is stored in the tag operation.
array. In the preferred embodiment of the ICU each tag Master cacheA cache which is a master of the memory
corresponds to a four-word block. 65 bus. It issues the request and expects a response.
Tag arrayOn chip random access memory array that MatchThe address of the memory bus access matches
contains the address tags for the cached data or instructions. one of the addresses which is present in the tag buffer, and
5 ,627,992
7 8
the corresponding word is valid. This term is used for the *CBACK. Cache Burst Acknowledge, is an ICU output.
memory bus address compare, and it is equivalent to the synchronous, and active LOW. This output is asserted when
term hit which is used for the processor bus. ever a burst-mode cache access has been established on
OwnershipThis is a scheme to guarantee data consis Processor Bus 120.
tency. The most current value of a variable is owned by one *CBREQ, Cache Burst Request, is an ICU input,
cache or the main memory. It is the responsibility of the synchronous, and active LOW. This input is used to establish
a burst-mode cache access on the processor bus and to
owner to maintain the consistency of the variable. There are
request the next transfer during a burst-mode cache access.
several ownership schemes which differ in the number of
This signal can become valid late in the cycle compared to
states that are attributed to a variable and the algorithms for
*DREQ or *IREQ, to be described hereinafter.
ownership and state transitions.
*CERR, Cache Error, is an ICU output. synchronous. and
SharedIndicates that a variable or a cache block is active LOW. This output indicates that an error occurred
shared by more than one processor. A shared variable can be during the current cache access.
present in more than one cache. In the preferred embodiment *CRDY, Cache Ready. is an ICU output. synchronous.
of the ICU a HIGH in the shared block status bit indicates and active LOW. For processor bus cache reads this output
a shared block. The block can be either shared non-modi?ed 15 indicates that valid data is on the Cache Bus. For cache
or shared modi?ed. writes it indicates that data need no longer be driven on the
Shared modi?edA block status in the ICU which indi Cache Bus.
cates that a cache block is shared and modi?ed. It also CREQTO-CREQTI. Cache Request Type, is a synchro
indicates that the block is owned by the cache. and that it is nous input. This signals specify the address space for the
the most current value of the block in the system. cache access on the processor bus as follows:
Slave cacheA cache which is not the master of the
memory bus. A slave cache can monitor the memory bus for
CREQTl CREQTO Meaning
data consistency purposes.
Snooping-A ditferent term which is used instead of bus 25
Data Cache Usage
watching. Bus snooping or snooping cache are also equiva 0 0 Memory access
lent. 0 1 Input/Output access
Having de?ned the terms to be used herein. a detailed 1 x coprocessor transfer
description of the preferred embodiment of the novel ICU (ignored by the ICU)
Instruction Cache Usage
will be set out immediately hereinafter in terms of its pin 30
description and functional organization. x 0 Memory access
With respect to the pin description it should be noted that 1 Instruction read-only
memory access
the term three state is used to describe signals which may
be placed in the high impedance state during normal opera
tion. All outputs (except MSERR. which will be described 35 For instruction cache usage CREQTl has a special func
hereinafter) may be placed in the high impedance state by tion. It is sampled during RESET, and if the sampled value
the *TEST input (also to be described hereinafter). is HIGH, the ICU responds to instruction ROM accesses. If
The preferred embodiment of the novel ICU is a CMOS; the sampled value is LOW, the ICU does not respond to
169 Pin Grid Array package as shown in FIG. 2. instruction ROM accesses. After RESET the CREQTl input
40 is ignored for instruction cache usage.
The processor bus interface will ?rst be described and
includes the following: *CSEL, Chip Select, is a synchronous input, active LOW.
An active level on the *CSEL input selects the ICU for
The Address Bus, pins Ail-A31 of FIG. 2, is an ICU input, processor-bus cache instruction accesses. It is not used in
is synchronous, and transfers the byte address for the cache normal memory accesses. The *CSEL input can be disabled
access on the processor bus. via the Chip Select Mapping Register, a portion of the ICU
45
The Access Status Control signals, shown as ASTCO and to be described hereinafter. When *CSEL is enabled and not
ASTCl in FIG. 2. are synchronous inputs which specify the asserted the ICU does not respond to processor-bus cache
status control associated with an access. They are encoded, instruction accesses.
according to the preferred embodiment of the invention, as *CSM, Chip Select for Memory access, is a synchronous
follows: ICU input, active LOW. An active level on the *CSM input
selects the ICU for memory accesses. It can be used for
cache extensions and cache address space selection. The
ASTCl AS'ICO Meaning
*CSM input can be enabled via the Chip Select Mapping
0 0 Exclusive write-through Register. When *CSM is enabled, The ICU responds to
O 1 Exclusive copy-back 55 memory accesses only if the *CSM is asserted and the
1 0 Shared address matches corresponding enabled bits in the prese
1 l Non-Cacheable
lected ?eld of the Chip Select Mapping Register (to be
described in detail hereinafter).
These inputs are normally connected to processor 101s CBO-CB31. the Cache Bus, is bidirectional, synchronous
MPGMO-MPGMI outputs as described in the incorporated and three state. The Cache Bus transfers instructions or data
application. Speci?cally, the MMU programmable, to and from the ICU on the processor bus.
MPGMO-MPGMl outputs re?ect the value of two bits in an *DREQ, Data Request, is a synchronous ICU input,
entry of a translation look-aside buifer (illustrated as TLB active LOW. This input requests a data access on the
350 in FIG. 3) that is associated with the access. processor bus. When it is active, the address for the access
*BINV. Bus Invalid, is a synchronous ICU input, active 65 appears on the Address Bus. For instruction cache usage of
LOW. indicating that the address bus and related controls are the ICU. *DREQ is used for processor-bus cache instruction
invalid. It de?nes an idle cycle on Processor Bus 120. transfers.
5,627,992
10
*IREQ. Instruction Request. is a synchronous ICU input. signalled by the assertion of *DREQ for data cache and
active LOW. This input requests an instruction cache access *IREQ for instruction cache.
on the processor bus. When it is active the address for the *RESET. Reset. is an asynchronous input, active LOW.
access appears on the Address Bus. This input has a special This input resets the ICU.
function during reset operation of the ICU. It is sampled by
the rising edge of SYSCLK. when RESEI (to be described R/*W, Read/Write. is a synchronous input. This input
hereinafter) is active. The last sampled value determines the indicates whether the cache access is a transfer from the ICU
ICU operation as a data cache (*IREQ LOW). or as an to the processor (R/*W High), or from the processor to the
instruction cache (*IREQ HIGH). For data cache operation. ICU (R/*W Low).
*IREQ should be tied LOW (it is ignored during normal 10
SUP/*US. Supervisor/User Mode. is a synchronous input.
operation). For instruction cache operation. it should be This input indicates the program mode of the illustrative
connected to the processor *IREQ output. which is deas processor (Supervisor mode or User mode) during the
serted during RESET. Note that if the processor is placed in access. The ICU internal registers and the execution of cache
Test Mode during RESET. external logic should drive the instructions are protected from User mode accesses.
IREQ HIGH (the processor described in the copending SYSCLK. System Clock is an external clock input. at the
application regarding the high performance interface does operating frequency of the ICU.
not drive IREQ HIGH in its Test Mode). *TEST, Test Mode. is an asynchronous input, active
*LOCK. Lock. is an ICU input. synchronous and active LOW. When this input is active the ICU is in the Test mode.
LOW. This input indicates that the processor cache access is All outputs and bi-directional lines. except MSERR. are
to an interlocked variable. The ICU handles this access in a forced to the high-impedance state.
special way to be described with reference to the interlock WREP. Way for Replacement. is a synchronous input.
facility set forth in detail hereinafter. This input forces the way number for replacement in a case
MSERR. Master/Slave Error. is a synchronous. ICU of a cache miss. It is sampled during the ?rst cycle of a valid
output. active HIGH. This output shows the result of the cache access. A miss in a two-way set associative organi
comparison of the ICU outputs with the signals provided 25 zation (if the replacement mode is external). causes replace
internally to the off-chip drivers. If there is a difference for ment as determined by WREP: WREP LOW forces way 0 to
any enabled driver. this signal is asserted. be replaced. WREP HIGH forces way 1 to be replaced.
OPIO-OPTZ. Option Control. is a synchronous ICU Next. the memory bus interface will be described. making
input. These signals re?ect the option control bits associated continued reference to FIG. 2. The memory bus may be seen
with a cache access. They are used for specifying the data to include:
length as well as special access information. The interpre BSTCO-BSTCl. Block Status Control. which is
tation of these signals is dependent on the usage of the ICU bi-directional. synchronous and three state. These signals are
as data or instruction cache. The encoding and interpretation used to inspect and update the cache block status informa
of these inputs. according to the preferred embodiment of tion. When required by a memory bus instruction. the ICU
the invention. is as follows: 35
uses them to indicate the block status bits associated with the
supplied address. These signals are also used for supplying
OPIZ-O the block status from the memory bus for a Write Block
value Meaning (Data Cache) Meaning (Instruction Cache) Status instruction. The encoding of this signal for both of the
above functions is as follows:
000 32-bit access no access
001 8-bit access no access
010 16-bit access no access
BSTCl BSTCO Meaning
011 no access no access
100 instruction memory no access
0 0 Exclusive non modi?ed
access (as data) - \ 0 1 Exclusive modi?ed
101 cache operand cache operand 45 1 0 Shared non modi?ed
transfer transfer
1 1 Shared modi?ed
110 debug module access no access
111 reserved reserved
*DI. Data Intervention is a synchronous output, three state
The OP'I' inputs are ignored if DREQTl is HIGH (treated and active LOW. This output is used for the indication of a
as no access). Codes 100. 101 are treated as no access if data intervention operation on the memory bus. The data
DREQTO is HIGH. Code 100 is used for reading the intervention operation is used in some multiprocessor
instruction ROM as data. A data cache responds to this code con?gurations, to supply the most updated version of the
only if the ROM Enable bit in Moda Register, to be de?ned variable from the appropriate cache (as opposed to memory).
hereinafter. is HIGH. In this case the request is treated as a 55 The ICU master precharges the *DI signal during the
non cacheable access. Code 101 indicates an operand trans address cycle of a memory bus read access. It then places it
fer for the ICU processor-bus cache instructions. According in three state mode. The ICUs which are not the bus masters,
to the preferred embodiment of the invention, code 110 is discharge the *DI signal if they respond with data interven
used for a special debug module access. A data cache tion.
responds with *CRDY HIGH for four cycles. and then *GRI. Memory Bus Grant, is a synchronous input, active
*CRDY is asserted for one cycle. LOW. This input signals that the memory bus is granted for
*PCA. Pipelined Cache Access. is a synchronous ICU the ICU use.
input. active LOW. If *DREQ for data cache or *IREQ for *HIT, Hit, is bidirectional, synchronous, three state and
instruction cache is not active, this input indicates that the active LOW. This signal can be programmed to be either an
cache access is pipelined with another. in-progress. cache 65 output only or an input/output signal. As an output it is used
access. The pipelined access cannot complete until the ?rst for hit indication. It is asserted when a hit is detected in the
access is complete. The completion of the ?rst access is tag bu?er for the address presented on the memory bus. It is
5.627.992
11 12
also used in some of the memory bus instruction. to indicate is a multiplexed Address/Data bus used for the memory
the validity of a Word or a block. interface. When *MASTB is asserted. this bus holds the byte
When programmed to be an input/output signal it can be address of the Memory Bus access. When the ICU is the bus
used. in addition to the above output functions. as a signal master. it outputs the address. When the ICU is not the bus
for the detection of hit in any other cache. The ICU master master the bus is an input and the address is latched by the
precharges the *HlT signal during the address cycle and then ICU for its internal use. When MASTB is not asserted. the
memory bus is used to transfer data to and from the ICU.
places it in input three state mode. The ICUs which are not
the bus masters discharge the *HIT signal only when a hit *MERR. Memory Error. is a synchronous ICU input.
active LOW. This input indicates that an error occurred
is detected in their tag buifer. during the current memory access. The ICU also uses this
*MASTB. Memory Address Strobe. is bidirectional, signal for data consistency operations.
synchronous. three state and active LOW. When the ICU is *MLOCK. Memory Lock. is bidirectional. synchronous.
the Memory Bus master. this signal is asserted by the ICU three state and active LOW. This signal indicates that the
to indicate that a byte address is present on the memory bus. memory access is an interlocked access. The ICU master
When the ICU is not a master. this signal indicates that a asserts this output when an interlocked access is presented
byte address from another bus master is present on the on the memory bus. When the ICU is not the bus master. this
memory bus. Note that if both instruction cache and data signal is used as an input. If a match is found for a write
cache are present in the system. tWo *MASTB signals are access with *MLOCK asserted. the associated word is
available. The two signals can be used to distinguish invalidated. This feature is used for schemes that enable
between instruction and data accesses. caching of interlock variables.
*MBACK. Memory Burst Acknowledge. is a synchro *MRDY. Memory Ready. is bidirectional. synchronous
nous input. active LOW. This input is active whenever a and active LOW. When the ICU is the bus master. this signal
burst-mode cache access has been established on the is used as an input. For memory bus reads. this input
memory bus. indicates that valid data is present on the memory bus. For
MBPO-MBP3. Memory Byte Parity. is bidirectional. syn memory bus writes. it indicates that the data need no longer
chronous and three state. This is the byte parity bus for be driven on the memory bus. When the ICU is not the bus
transfers on the memory bus. Even or odd parity can be master. this signal is used as an input for data consistency
speci?ed. MBPO is the byte parity for operations. It is used as an output for data intervention and
MEMADO-MEMAD7. MBPl is the byte parity for memory bus special operations. The ICU asserts *MRDY to
MEMAD8-MEMAD15. and so on. For transfers from the indicate that valid data is present on the memory bus.
30
ICU to the memory, the ICU generates parity. For transfers MREQTO-MREQTI. Memory Request Type. is
to the ICU. it checks the byte parity. bidirectional. synchronous and three state. These signals
If a parity error is detected. and data is to be transferred specify the address space for the access on the memory bus.
on the processor bus. the *CERR signal is asserted. If the When the ICU is the bus master. it uses these signals as
data need not be transferred to the processor (e.g. block 35
outputs. When it is not the bus master. the MREQT signals
reload). an error bit is set in the Status Register (to be are used as inputs for data consistency operations. The
described hereinafter) and the data is ignored. The parity encoding. according to the preferred embodiment of the
generation and checking can be disabled. Memory bus data invention. is as follows:
timing are relaxed if parity generation and veri?cation is
disabled.
MREQTl MREQTD Meaning
*MBREQ. Memory Burst Request. is bidirectional.
synchronous. three state and active LOW. This signal is used 0 0 Data memory access
to establish a burst-mode access on the memory bus and to 0 1 Data input/output access
request the next transfer during the burst-mode access. 1 0 Instruction memory access
1 1 Instruction Rom access
When the ICU is the bus master. this signal is an output. 45
When the ICU is not a master, it is an input. used by the ICU
for data consistency operations. When the ICU is not a bus master the MREQT signals are
MDLNO-MDLNl. Memory Data Length. is also used (together with the MRW signals) for specifying a
bidirectional. synchronous. and three state. These signals memory bus cache instruction for the ICU.
re?ect the data length for the memory bus data accesses. 50 MRWOMRW1. Memory Read Write. is bidirectional
They are ignored for instruction accesses. For data cache synchronous and three state. These signals are used to
usage the ICU supports 8. 16. and 32 bit transfers. The specify the type of read and write operations on the memory
encoding of these signals are. according to the preferred bus. When the ICU is the bus master. it uses these signals to
embodiment of the invention. are as follows: indicate the required operation. When the ICU is not the bus
55 master these signals are inputs, used for data consistency
operations. The encoding of these signals. according to the
MDLNl MDLNO Meaning preferred embodiment of the invention. is as follows:
0 0 32-bit access
0 l 8-bit access
1 0 16-bit access MRWl MRWO Meaning
60
1 1 Invalid
0 0 Write
0 1 Read
It should be noted that MDLNO and MDLN 1 encoding 1 0 Write broadcast
l 1 Read for modify
corresponds to OPIO and OPIl for the processor used with -
the illustrative embodiment of the invention. 65
MEMADO-MEMAD31. Memory Address/Data Bus. is The read and write operations referred to hereinabove are
bidirectional. synchronous. and three state. The Memory bus relative to the ICU. e. g. read is from the memory to the ICU.
5,627,992
13 14
When the ICU is not a bus master the MRW signals are registers are implemented for programmable option
also used (together with the MREQT signals) for specifying selection, cache instructions implementation and status
the memory bus cache instruction for the ICU. A detailed report.
description of a memory bus instruction set suitable for use Cache policies can be selected by using programmable
with the ICU being described herein. will be set forth in options. The cache write policy can be programmed as
detail hereinafter. write-through. copy-back or ?exible on a per access basis. A
MS/*MU. Memory Supervisor/User Mode. is a synchro write allocate or nonwrite allocate option can be selected. A
nous ICU output and is three state. This output indicates the four word write buffer is incorporated for e?icient imple
program mode of the processor (Supervisor mode or User mentation of write accesses. The write bulfer can be enabled
mode) during the memory access. The ICU transfers the 10 or disabled. The replacement algorithm can be programmed
SUP/*US value presented on the processor bus to the as LRU, random or external. A ?exible prefetch policy can
MS/*MU value on the memory bus. for the appropriate be selected. Read through option can be enabled. A four
transactions. word read bu?er is incorporated to support e?icient
*REQ Memory Bus Request. is a synchronous ICU prefetching and read operations.
output. active LOW. This output is used by the ICU to The multiprocessor support policy can be tailored to the
request the memory bus. system. The level of multiprocessor support can vary from
*VSI. Valid Status or Instruction. is a synchronous, ICU a simple software controlled organization through an exten
input. active LOW. When the ICU is the bus slave. an sive ownership scheme. A bus watch capability can be
asserted *VSI indicates a memory bus cache instruction enabled or disabled. The ownership algorithm can be con
20 trolled to support the required scheme. Caching interlock
access. When the ICU is the bus master and it issues a read
request for a reload operation. the assertion of *VSI indi variables can be enabled or disabled
cates that a special Write Block Status instruction should be Two chip select inputs and a chip select mapping register
executed. A detailed description of a suitable memory bus allows easy cache extensions as Well as multi cache orga
cache instruction set and the use of *VSI will be set forth nizations. The reload function can be tailored to the system
25
hereinafter. by selecting the appropriate access control options such as
Having described the various inputs and outputs to and reload size. starting and stopping addresses. burst and wrap
from the novel ICU with reference to the pinout diagram of around.
FIG. 2. a complete understanding of the novel methods and Again, the preferred embodiment of the ICU operates at
apparatus may be had with reference to the detailed descrip the same frequency of the RISC processor in the illustrative
30
tion of the ICUs functional organization. This description example being set forth. i.e., at a 25 MHZ. nominal fre
will be set forth in several parts. First. an overview of how quency with possible higher frequencies. It achieves access
the ICU ?ts into a computing system. particularly the times of two cycles for the ?rst hit access and one cycle for
exemplary RISC system being used to demonstrate the the next burst-mode or pipeline hit accesses.
utility and operability of the invention. will be described 35
A three IC con?guration, one containing the RISC pro
Second. the data ?ow through the ICU will be described in cessor and two for ICUs (one for an instruction cache and
detail. Third. a register level description of a register set one for a data cache) is a very high performance cache
suitable for implementing the invention will be set forth. A system with 16 k bytes of cache.
suitable cache instruction set, a description of data formats As indicated hereinbefore with reference to FIG. 1, the
and handling. cache accesses and prefetch operations will ICU has two interface busses; the Processor Bus and the
also be set forth. Memory Bus. It can be connected directly to the RISC
Further details on the use of an ICU write buffer. initial processor Without any interface logic. The ICU cache bus is
ization and reset operation will be described hereinafter as connected to the processors data or instruction bus, for data
well. . or instruction cache respectively. Pipelined and burst-mode
Finally, multiprocessor support by the ICU and the special 45 accesses are supported for maximum utilization of the
ICU interlock facility will be described to complete the processor channel. The Memory Bus is a separate interface
detailed description of the invention. to the memory, other processors, and the system bus. It is a
As indicated hereinbefore. the novel ICU is described in multiplexed address and data bus with support for burst
the context of a RISC architecture including a SIP. The mode accesses. It also incorporates multiprocessor support
preferred embodiment of the invention is described in such 50 functions. In shared memory multiprocessor environments
a way as to support this architecture. Those skilled in the art the memory bus can be used effectively as a shared multi
will readily appreciate that modi?cations may be made to processor bus. For single processor systems it can be used as
the preferred embodiment of the novel ICU, without depart the system bus or as a local bus.
ing from the scope of spirit of the invention. to support, for The preferred embodiment of the ICU contains special
example. non-RISC processors. The description to follow is, 55 hardware for fault tolerance support. It supports master/
accordingly, set forth for the purpose of illustration only. slave checking and byte parity generation and checking on
The core of the preferred embodiment of the ICU is an 8 the memory bus. For master/slave checking two or more
k byte memory array with the associated tag and valid ICUs are connected in parallel with one or more caches (the
arrays. The arrays are organized as a two-way set slaves) checking the outputs of the master. The byte parity
associative cache with 4 words per tag (block size=4 words) generation and checking can be used on the memory bus for
and a valid bit per word. This basic organization also reliable bus transfers.
supports direct mapped cache. variable block and sub-block The preferred embodiment of the invention is, as indi
size as well as a ?exible reload scheme. Ablock status array cated hereinbefore, fabricated in CMOS technology and has
and LRU array are also incorporated. They are used for a maximum power dissipation of 1.5 W.
cache replacement. locking data in cache, and data consis 65 The ICU internal data ?ow organization is shown in FIG.
tency policies. The ICU contains all the control logic for the 3. The following description refers to the functional com
different cache policies. algorithms and instructions. Special ponents on this data ?ow diagram. The ICU is partitioned
5,627,992
15 16
into three main functional units: Processor Bus Unit, Each tag corresponds to a block of 4 consecutive cached
Memory Bus Unit and Cache Unit. each depicted in FIG. 3. words. For each cache access, two tags are accessed simul
The description to follow will from time to time make taneously using bits 11-4 of the address. The tags are
reference to specific bit locations and ?elds of various words compared to bits 31-12 (in one embodiment of the
being used for any number of purposes. One skilled in the invention) of the address. The hit signals are generated from
art will readily appreciate that these speci?c references are the comparison results and con?guration bits. The Tag array
not intended to be limiting and can be modi?ed to suit a is written whenever a miss occurs and a cache block is
desired application. The speci?c references are made for the allocated. The tag is selected according to bits 11-4 of the
sake of clarity only in setting out a workable, illustrative address and the replacement algorithm. Address bits 31-12
preferred embodiment of the invention. are written to the Tag array. The Tag array is shown in FIG.
10
3 as unit 311.
The Processor Bus Unit controls all the Processor Bus
The Valid array. unit 312 of FIG. 3. is a 2 k bit array that
activity. It supports all RISC/SIP channel protocols; single, stores a valid bit for each cached word It is organized as two
burst and pipelined. It incorporates the Address Incrementer. 1024><l arrays. A valid bit is set for each word as it is written
the Data Shifter and the Processor Bus Control. each to be into the cache. The valid bit is selected by address bits 11-2
described immediately hereinafter. (as is the memory array). and the matching way. 'Iwo valid
The Address Incrementer (AI) latches the address bus bits corresponding to the two ways are checked on every
input. It can be incremented on every cycle. The AI output cache access. Ahit signal is generated only if the appropriate
is the address for a cache access. The AI is shown in FIG. valid bit is set.
3 as unit 301. The Block Status array, unit 313 of FIG. 3. is a 1536 bit
The Data Shifter (DSH) is used for data alignment. It array that incorporates 3 block status bits per each cache
shifts bytes and halfwords and holds the data for cache write block. It is organized as two 256>6 arrays (this corresponds
accesses. It is also used for the appropriate byte and half to the Tag array organization). The three bits are: Modi?ed,
word shift operation in the case of byte and half-word reads. Shared and Locked. The Modi?ed bit indicates that the
The DSH is shown in FIG. 3 as unit 302. block is modi?ed and should be written back to memory
The Processor Bus Control (PBC) controls the di?erent 25
before replacement. It is used for copy-back operations and
Processor Bus operations. The PBC is shown in FIG. 3 as data consistency operations. The Shared bit indicates that the
unit 303. _ block is shared. It is used for data consistency operations.
The Memory Bus Unit controls Memory Bus activity. It The Locked bit indicates that the block is locked and cannot
incorporates the Write Buffer. the Memory Address logic. be replaced. It is used for locking important data or instruc
the Memory Read Buffer and Memory Bus Control. each to tions in the cache.
be described immediately hereinafter. The LRU array, unit 314 of FIG. 3, is a 256 bit array
The Write Buffer (WB) used in the preferred embodiment which stores the LRU bits. It is organized as a 256><l array.
of the invention. includes two four word ?rst-in. ?rst-out Each LRU bit corresponds to a set of two tags. The LRU
(FIFO) buffers (one for address and one for data). The WB array is used if the Least Recently Used replacement algo
35
can buffer all ICU write operations. For write through rithm is selected. The appropriate LRU bit is updated to
operations it bu?fers up to 4 byte. half word or word writes. re?ect the least recently used block of the two blocks in a set.
For copy-back operations it buffers a 4 word block. The WB When required. the LRU bit determines which block is
is shown in FIG. 3 as unit 304. replaced from the set.
The Memory Address Logic (MAL) includes two address The Special Registers block. shown as unit 315 of FIG. 3,
incrementers. The ?rst latches and increments the memory incorporates all the special registers of the ICU. They are
bus addresses for operations from the bus to the ICU. The used for programming the ICU options, controlling special
second latches and increments the addresses for read opera~ operations. and holding status information.
tions initiated by the ICU. The MAL is shown in FIG. 3 as Finally, the Cache Control block. unit 316 of FIG. 3,
unit 305. 45 includes control logic for the cache operations.
The Memory Read Buffer (MRB) is a four word data Having set forth and described data ?ow through the
buffer. It buffers the data from the memory bus, until the novel ICU with reference to FIG. 3. attention will now be
cache is available for update operation. The MRB is used as turned to a register structure to support the data ?ow and
a prefetch buffer when prefetching is enabled. The MRB is control of the ICU.
shown in FIG. 3 as unit 306. 50 The preferred embodiment of the ICU contains 8 special
Finally. the Memory Bus Control (MBC), which controls registers. These registers select programmable options, sup
memory bus operations, is shown in FIG. 3 as unit 307. port cache control operations, and indicate cache status
The cache unit performs all the cache functions. It incor information. Each register can be read or written by the
porates the Memory array, Tag array. Valid array, Block processor via the processor bus or the memory bus. A read
Status array, LRU array, Special Registers. and Cache 55 register or write register cache instruction is transferred to
Control. each to be described immediately hereinafter. the ICU using the appropriate cache instructions protocol on
The Memory array is a 64 k bits storage array for cached one of the buses. For the processor-bus instructions, the
instructions or data. It is organized as two ways of 1024 register number is speci?ed by the three least signi?cant bits
words. For read operations the two ways are accessed of the opcode. The instruction transfer protocol and the ICU
simultaneously. according to the preferred embodiment of response are controlled by the Read Registers Control and
the invention. using bits 11-2 of the address. The appropriate Instruction Transfer Protocol Control in the Modb Register
word is selected according to the hit signals from the Tag respectively. For the memory-bus instruction the register
array. For write operations the correct word in the array is number is speci?ed by the three least signi?cant bits of the
written after the Tag access is completed and the hit signals address. A detailed description of the these illustrative
generated. The Memory array is shown in FIG. 3 as unit 310. 65 protocols and responses will be set forth hereinafter.
The Tag array stores cache tags in a two way set asso For processor-bus accesses, all ICU special registers are
ciative organization. Each way is organized as 256x20 bits. protected. They can be accessed only if the SUP/*US input
5,627,992
17 18
is HIGH. User mode accesses are not executed. A *CERR instruction. A valid cache instruction is detected when the
response signals an invalid User mode access. The registers *CSEL input is asserted and enabled or when the address
are not protected for the memory bus cache insn'uction. inputs A31-A14 and CREQTO match the Cache Instructions
According to the preferred embodiment of the invention, Address and Cache Instructions Address Space ?elds in the
each register is assigned a number as follows: Chip Select Mapping Register. In these cases, the instruction
is copied from the address inputs A7-A0.
The Instruction Register can be accessed as a special
reg # reg name
register for saving and restoring the ICU state in the case of
Chip Select Mapping a processor interrupt during the instruction transfer protocol,
Instruction or if a multiple cache instruction is interrupted. When it is
Address Operand accessed as a special register. a Read or Write Register cache
Count
Error Address
instruction is transferred to the ICU. Shadow registers are
Status incorporated in order to be able to latch the Read Register
Moda instruction and optional operand. without destroying the
Modb required old contents of the Instruction and Operand Reg
ister. The shadow Instruction Register is ?rst loaded with an
Only the general function of each register will be instruction. For all the instructions except the Read Register
described hereinafter since one skilled in the art will appre and the Write Register instructions, the Instruction Register
ciate that speci?c bit assignment may be made for applica is updated when the instruction execution is started. For the
tion dependent purposes. Read Register and Write Register instructions the Instruction
The Chip Select Mapping Register is special register 0. It Register is updated only if there is no other valid instruction
speci?es the address and the conditions for the ICU chip in the Instruction Register. If a valid instruction is present in
select functions. The ICU has two independent chip-select the Instruction Register. the Read Register instruction is
functions: for normal cache accesses and for cache instruc 25 executed without atfecting the Instruction and Operand
tion accesses.
registers. This feature is also used for checking the status of
the ICU without a?ecting the execution of a cache instruc
For normal cache accesses. the chip-select function can be
tion.
used for cache extensions. cache address space assignments.
and multiple-cache con?gurations. In the preferred embodi An instruction is executed by the ICU whenever all the
ment of the invention. up to 32 ICUs (16 instruction caches required operands are valid. A detailed description of exem
and 16 data caches) can be used without external chip-select plary instructions. required operands, and cache-instructions
hardware. Memory access (MA) selection is also a?ected by accesses will be set forth hereinafter.
the *CSM input and a Memory Bit Enable ?eld of the Moda The Operand Register is special register 2. It speci?es an
Register. If the *CSM input is enabled. it must be asserted operand for certain ICU instructions. The register contains a
to enable a memory access. If the *CSM input is disabled, 35 bit (OV bit) to indicate the validity of the Operand Register
it is ignored. The Memory Bit Enable Field can selectively value. A shadow Operand Register is incorporated in the
enable the comparison of an appropriate address bit to a ICU for correct execution of Read Register and Write
corresponding (MA) ?eld bit in the chip-select memory Register instructions. The shadow Operand Register is ?rst
register. When a bit is disabled for comparison. it is ignored loaded with an operand. For all the instructions, except Read
If all the bits are disabled. a match is forced. Note that when 40
Register and Write Register instructions, the Operand Reg
the *CSM is enabled. it should be asserted and the MA?eld ister is updated if the validity bit is reset. If the 0V bit is set,
comparison match. for a memory access to be serviced. the ICU delays the *CRDY response until the 0V bit is reset
For cache instruction accesses. the chip-select function is (the previous instruction completion), and then loads the
used for selecting the appropriate ICU for the access. A register. For Read Register or Write Register instructions the
cache instruction access is enabled when *CSEL is asserted Operand Register is updated only if there is no other valid
and enabled, or when the address inputs A31-A14 and instruction in the Instruction Register. If a valid instruction
CREQIO match Cache Instructions. Address and Cache is present in the Instruction Register, the Read Register and
Instruction Address Space ?elds in the Chip Select Mapping Write Register instructions are executed without affecting
Register. In the second case. all the caches with a match are the Instruction and Operand registers. This enables saving
selected. If the instruction scope is multiple caches, all the 50
and restoring the Operand Register, when required. When
selected caches respond. If the instruction scope is one required, the Operand Register can be read or written by
cache. only the cache where the address inputs A13-A8 using Read Register or Write Register instructions.
match a Speci?c Cache Number ?eld of the Chip Select The Count Register is special register 3. It speci?es the
Mapping Register. responds. number of words to be operated on by certain ICU instruc
During initialization the *CSEL input is used to program 55 tions. The Count Register is not affected by the instruction
the Chip Select Mapping Registers of a given ICU. In a execution.
typical con?guration. a different address bit can be con The Exception Address Register is special register 4. It is
nected to the *CSEL of di?erent ICUs. When *CSEL is used for reporting the address associated with some excep
asserted and enabled. the ICU treats data accesses as cache tions. The Exception Address register is loaded with the
instruction accesses. After the Chip Select Mapping Register exception address. The exception type can be found in the
is programmed. the *CSEL input can be disabled, and the Status Register.
mapping speci?ed by the Chip Select Mapping Register The Status Register is special register 5. It is used for
applies. reporting the status of the ICU, and for information transfer
The Instruction Register is special register 1. It is used for between the ICU and the processor in the Read Tag and
specifying an instruction to the ICU. The Instruction Reg 65 Write Tag cache instructions (to be explained hereinafter).
ister can be read or written as any other special register. Bits are reserved for reporting the tag value for the Read Tag
however. it is also loaded automatically for a valid cache instruction and for transferring the tag for the Write Tag
5,627,992
19 20
instruction; for the reporting of the WAY for cached data; for The Moda Register also contains a ?eld used to select the
reporting the valid bits for a block and for transferring the block replacement policy. The selection is not used for
valid bits for the Write Tag instruction; for reporting the direct-mapped organization. In a two-way set-associative
locked bit for the block and for transferring the locked bit for organization, if one of the blocks is not valid, it is chosen for
the Write Tag instruction; for reporting the shared bit for the the new block. The replacement policy options, according to
block and for transferring the shared bit for the Write Tag the preferred embodiment of the invention, are: Least
instruction; for reporting the modi?ed bit for the block and Recently Used (LRU), random, or external.
for transferring the modi?ed bit for the Write Tag instruc When the LRU policy is selected, the LRU array is used
tion; for indicating if a hit is found in the cache; and for for the selection of the block to be replaced. The LRU array
indicating protection violations; illegal instructions; is updated on each cache access. One bit is associated with
memory errors and parity error. each set. It points to the least recently used block from the
The Moda Register is special register 6. It is used for two blocks in a set.
selecting various ICU options. The Moda register is reset When the random policy is selected, a pseudo random
during initialization. logic selects the block to be replaced. The logic is a simple
Fields are set aside to: control global cache operation; to ?ip ?op which change state every clock cycle. and used for
lock all cache locations; to disable the write buffer; and to all the sets.
disable the read through option. When the external policy is selected, the WREP input is
A ROM enable bit is included in this register which has latched with the processor address. The latched value forces
different functions for instruction cache and data cache the replaced block selection. This option may be used for
usage (ROME bit). cache testing and multi-level cache organizations.
For instruction cache usage when this bit is set. the ICU Finally. the Moda Register contains ?elds for selecting the
responds to ROM accesses and caches them. When it is 0. cache organization (two-way set-associative or direct
ROM accesses are ignored. mapped); for selecting the cache operation as Instruction
Cache or Data Cache; for controlling parity generation and
For data cache usage. when this bit is set, the data cache 25
checlq'ng options and memory bit enable information for
is enabled for instruction memory accesses (as data). When enabling the corresponding MA bits in the Chip Select
the OPT inputs indicate an instruction memory access, the Mapping Register for address comparison on Memory
ICU treats it as a non-cacheable transaction. The speci?ed
accesses.
address is read on the memory bus and transferred to the
processor, without loading it into the cache. When the The Modb Register is special register 7. It is used for the
selection of various ICU options. One ?eld of the Modb
ROME bit is 0. the ICU ignores this type of transaction.
Register is associated with multiprocessor organizations. A
Further ?elds are provided in the register to enable the detailed description of multiprocessor organizations and the
prefetch option for single accesses; to disable an address usage of this ?eld will be described hereinafter.
Wrap around option; to control the ICU operation on a single The Modb Register also contains a bit for controlling the
access miss; to control the ICU operation when a burst mode 35 ICU for Input/Output data accesses on the processor bus (it
read request is terminated by the processor; to enable the is ignored in instruction cache usage); a bit which speci?es
prefetch option for burst accesses; and to select sub-block the mode of reading the ICU registers; a bit which speci?es
size (SBS). the protocol for cache instruction transfer on the processor
It should be understood that the sub-block size is used in bus; and a bit which controls the byte and half-Word ordering
40
the control of cache reload operations. The number of words within a word, for the relevant ICU operations.
to be reloaded into the cache for a single cache miss is The multiprocessor oriented information in the Modb
de?ned by the de?ned sub-block size and the information Register includes: a Cache Interlocked Enable bit; a Read
stored in the ?eld controlling ICU operation for an access Bus Watch Enable bit; a Write Bus Watch Enable; a ?eld
miss (for example, control information such as start and stop which controls the operation of the ICU in the case of a
45
on a sub-block boundary). The SBS ?eld is also used, match on memory bus read by another master; a ?eld which
together with the burst mode control information, in the controls the operation on match for memory bus write
burst end control. The inherent block size of the ICU is 4 operation by another master; a Write Miss Memory Access
words. one tag is associated with four data words. The Control which controls the memory bus operation of the
sub-block size is either 1, 2 or 4 words. ICU for a copy-back write miss with write-allocate; a Write
Still further, the Moda Register contains ?elds for con Shared Hit Control ?eld which controls the operation on a
trolling the extension of memory bus address transfers; for write hit to a shared block; a Processor Shared Bit Control
indicating whether a write allocate option is on; and for which causes the block status share bit to be modi?ed for
selecting the write policy options on a global basis. every processor access when set; and an External Shared Bit
According to the preferred embodiment of the invention, 55 Control which controls the assignment of the shared block
the write policies are: ?exible, write-through or copy-back. status by external control. For external shared control, the
When a write-through policy is selected. every processor *HIT signal is used as an input for memory bus read and/or
write to the ICU is also written to the memory. When a write accesses. If it is asserted, then the data is also present
copy-back is selected, every processor write hit is written in in other caches, and the block is assigned a shared status. If
the cache and the modi?ed bit set. The block copies back to the *IHT input is not asserted, the variable is not present in
memory only when the block is replaced. When the ?exible any cache and the block is assigned an exclusive status.
policy is selected, a write-through or copy-back operation Detailed description of this feature is set forth hereinafter
can be selected on an access by access basis. The write with reference to ICU multiprocessor support.
policy for an access. is also affected by the ASTC inputs, the Turning to the ICU instruction set.
block status shared information, and the Write Shared Hit 65 The preferred embodiment of the ICU implements 20
Control and the Processor Status Bit Control ?elds of the processor bus cache instructions. and 9 memory bus cache
Modb Register (to be explained hereinafter). instructions. The processor bus instructions are issued by the
5,627,992
21 22
processor for special register accesses and for special cache The ?rst part of each instruction is the detection of a
operation requests. The memory bus instructions are issued processor bus cache instruction request. A processor bus
by special logic on the memory bus for special cache cache instruction request can be either a read or write
operation requests. This section sets forth examples of processor bus access. It is detected by the ICU in two cases:
different instruction transfer protocols and sets forth an 1) The *CSEL is asserted and enabled or 2) the address
example of a typical processor bus instruction and a memory inputs and CREQTO match the Cache Instruction Address
bus instruction set. in detail. and Cache Instruction Address Space ?elds in the Chip
The processor bus cache instructions and operands are Select Mapping Register.
transferred to the ICU from the processor by a special When no operand is required for the instruction
sequence of processor bus transactions. The instruction is execution. the transfer is completed in one cycle. In this case
executed whenever all the required operands are transferred the value on the cache bus is irrelevant and the instruction
to the ICU. execution starts immediately.
Some processor bus cache instructions have four optional If operands are required. there are two optional transfer
scopes of operation. The instruction scope speci?es the protocols which are supported by the ICU. The protocol is
number of ICUs which are affected by a cache instruction. selected according to the Instruction Transfer Protocol Con
The scope is designated by (s) in the instruction table. The trol bit of the Modb Register.
instructions without the (s) indication can operate only on
one speci?c ICU. The instruction scope can be designated as When ITPC is l. the cache bus is used for the operand
follows: transfer. The processor uses a normal write operation, and
the processor address and data buses are both used. The ICU
latches the operand value from the cache bus. For ICU used
speci?ed Opcode 2 as an instruction cache, external transceivers are required for
(s) signi?cant bits Meaning transferring the data from the ICU data bus to the cache bus.
0 00 One speci?c ICU atfected
This option is the natural selection for ICU used as a data
I 01 All instruction caches a?ected 25 cache. A more e?icient protocol is also achieved for ICU
D 10 All data caches a?ected used as an instruction cache in the expense of external
A 11 All caches affected transceivers.
When ITPC is 0, the cache bus in not used. Operands are
This ?exibility allows a system designer to issue cache transferred on the processor address bus. A special two cycle
30
instructions which operate on the desired addresses in the protocol is required for instruction and operand transfer.
appropriate caches. In a multiple ICU environment. it is After a valid instruction is latched, the ICU expects a special
more e?icient to issue one instruction that will operate on all data transaction on the processor bus. The CREQT inputs
the caches simultaneously. For example, the Invalidate should specify a memory data transaction. the option inputs
Block instruction with (s)=A (INVBA), invalidates the should specify a cache operand transfer, and the address bus
35
speci?ed block in all the ICUs (where the block is valid) in contains the operand. This special write transaction is
parallel. detected by the ICU and the operand on the address bus
Most instructions are executed in two cycles. However. latched. Note that the above transaction should be ignored
there may be exceptions as will be indicate hereinafter. by all other elements connected to the processor bus. This
During processor bus cache instruction execution, the ICU option is less e?icient than the other, but is does not require
can accept other processor transactions, (including new transceivers for ICU used as an instruction cache.
instruction transfers) but in most cases (unless otherwise Both options can be selected independently for ICU used
speci?ed in the instruction description) they are serviced as a data cache or an instruction cache. However, from
only after the processor bus cache instruction completes. system programming point of view, it might be desirable to
Memory bus transactions are serviced during the processor 45 access both caches in a similar way. If this is the case, both
bus cache instruction execution. However. since they are not caches can be programmed to respond to the same protocol.
synchronized to instruction execution. the ICU operation
re?ects the current state of the cache. If a count operand is required for the instruction
execution, a Write Register Instruction should be executed
All processor bus cache instructions are privileged. If the for writing it to the Count Register before the instruction
SUP/*US input is LOW. when the processor bus cache execution starts. The instruction is executed by the ICU
instruction access is performed, it is ignored and the *CERR whenever the instruction and all the required operands are
asserted as a response. The Status Register is updated with valid.
status information for the relevant instructions. In case of an
exception the Exception Address Register and the Status Processor interrupts can happen during the cache instruc
Register are updated with the exception information. 55
tion transfer protocol. The preferred embodiment of the ICU
There are several options for the communication between includes logic to recover from such cases. The Instruction
the ICU and the processor during cache instruction execu Register, Operand Register, and Count Register can be read
tion. The protocol is de?ned according to the Instruction for saving their content without affecting it, using the Write
Transfer Protocol Control (ITPC) and Read Register Control Register instruction. The Instruction Valid, Operand Valid,
(RRC) bits of the Modb Register as well as the cache and Count Valid bits indicate the validity of the correspond
instruction type and scope. All instructions. except Read ing register. The save and restore operations enable the
Register, are transferred by using a write operation. The instruction transfer protocol to continue from the point of
Read Registers instruction is transferred by using a read or interruption.
write operation as de?ned by the Read Register Control bit A set of Processor bus cache instructions found useful in
of the Modb Register. A description of these options can be one embodiment of the invention. in terms of Mnemonic,
found in Read Register instruction description set forth description and brief explanation of purpose, are set forth
hereinafter. immediately hereinafter in tabular form:
5,627,992
23 24
Update Memory and Invalidate (UPMIO. UPMII.
UPMID. UPMIA)if the four word block speci?ed by the
Mnemonic Description address operand is modi?ed, all the valid words are written
to the corresponding addresses in the memory. Then. all the
NOP No Operation
INVW(s) Invalidate word valid bits are reset. The write operation is performed by
lNVB(s) Invalidate block using a burst mode or single memory bus write access
]NVM(s) Invalidate multiple (depending on the number of valid Words). after the memory
]NVA(s) Invalidate all bus is granted to the ICU.
SWM(s) Send word if modi?ed
SBM(s) Send block if modi?ed
Update Memory and Invalidate Multiple (UPIMO.
UPMKS) Update memory and invalidate 10 UPIMI. UPJMD. UPIMA)multiple sequential Update
UPIM(s) Update memory and invalidate multiple Memory and Invalidate operations are executed by the ICU.
LCK(s) Lock block Every single operation is similar to the Update Memory and
ULCK(s) Unlock block Invalidate instruction. The instruction is executed in mul
RBST(s) Read block status tiple cycles (two cycles for initialization plus one cycle and
WBST(s) Write block status
SWH Send word if hit optional memory bus four word write access per four word
RTAG Read tag block). For this instruction. the Count Register must contain
WIAG Write tag a valid count. The Count Register is loaded by using the
PRLD Preload cache WRR instruction. The Count Valid bit remains set after the
RST Reset instruction execution. To guarantee correct execution. the
RDR(i) Read register Count Register should either be loaded before the UPIM
W'RR(i) Write register
instruction is requested or should contain the correct value
from a previous operation. This instruction can be used
It should be noted that the (s) symbol in the instruction in-order to update memory and invalidate a page frame. The
mnemonic indicates the instruction scope as previously Count Register should contain the number of blocks in a
de?ned herein and the (i) symbol in the RDR and WR page (page size in bytes divided by 16).
instructions indicates the register number which is speci?ed 25 Lock block (LCKO, LCKI. LCKD. LCKA)if the four
by the 3 least signi?cant bits of the opcode. word block (or part of it) speci?ed by the address operand
It should also be noted that speci?c operands (although is present in the cache. the corresponding block status Lock
not discussed herein) are speci?ed according to instruction bit is set.
requirements. If more than one operand is to be speci?ed. a
WR instruction is required for loading the second speci?ed Unlock block (ULCKO. ULCKI. ULCKD, ULCKA)if
operand. the four word block (or part of it) speci?ed by the address
On a functional level. the execution of the instructions
operand is present in the cache, the corresponding block
status lock bit is reset.
result in the following actions:
No Operation (NOP). no operation is executed by the ICU Read Block Status (RBSTO. RBSTI. RBSTD. RBSTA)
for this instruction. the speci?ed Word address is checked for presence in the
35 cache and the status register is updated accordingly.
Invalidate Word (INVWO. lNVWI. INVWD. INVWA)
(note different scopes)the word speci?ed by an address Write Block Status (WBSTO, WBSTI, WBSTD,
operand is invalidated (the corresponding valid bit is reset). WBSTA)the block which includes the speci?ed word
Invalidate Block (INVBO. INVBI. lNVBD, INVBA) address is checked for presence in the cache. If the block is
the four word block speci?ed by the address operand is present in the cache then the block status bits are updated
40
invalidated (all the corresponding valid bits are reset). based on the value of the bits in the status register.
Invalidate Multiple (INVMO. INVMI, INVMD, Send Word if Hit (SWH)if the word speci?ed by the
INVMA)multiple sequential four word blocks are invali address operand is present in the cache. it is written to the
dated by the ICU. Every single operation is similar to the corresponding address in the memory.
INVB instruction. The instruction is executed in multiple 45
Read Tag (RIAG)this instruction is used for reading a
cycles (two cycles for initialization+one cycle per four word speci?c tag in the cache.
block). For this instruction. the Count Register must contain Write Tag (WTAG)this instruction is used for writing a
a valid count. The Count Register is loaded by using the speci?c tag in the cache.
Write Register instruction. The Count Valid bit remains set Preload (PRLD)preload is a special operation that can
after the instruction execution. To guarantee correct 50 be performed in order to load a cache with speci?c data
execution, the Count Register should either be loaded before variables or instructions before they are needed The opera
the INVM instruction is requested or should contain the tion is done under software control. The addresses of the
correct value from a previous operation. This instruction can required variables or instructions are supplied by the user.
be used in order to invalidate a page frame. The Count Note that the preload operation is diiferent from simple
Register should contain the number of four word blocks in 55 prefetch operations. Prefetch is usually done under hardware
a page (page size in bytes divided by 16). control, and the prefetched addresses are in the close neigh
Invalidate All (INVAO. INVAI, INVAD, INVAA)this borhood of the address for the original memory request.
instruction invalidates all words in the cache. All valid bits User and compiler lmowledge of the program should be
are reset. used in order to predict the most valuable instructions or data
Send Word if Modi?ed (SWMO. SWMI, SWMD, variables. The appropriate addresses are preloaded into the
SWMA)-if the word speci?ed by the address operand is instruction and data caches before the program execution is
present in the cache and the modi?ed bit for the block is set, started. The operation can be done under the cache control
it is written to the corresponding address in memory. without the processor interference. This allows better utili
Send Block if Modi?ed (SBMO. SBMI, SBMD. SBMA) zation of the cache and higher overall performance.
if the four word block speci?ed by the address operand is 65 The preload operation, novel unto itself, is described in
modi?ed. all the valid words are written to the correspond the context of the incorporated RISC/SIP architecture. It can
ing addresses in the memory. be used in other cache systems as well.
5,627,992
25 26
The instruction is initiated by the processor. by transfer speci?ed as the instruction operand. The WRR instruction is
ring the opcode and the operands to the appropriate cache executed immediately as it is accepted. If valid instruction
unit. Two operands are required: an address operand and a and operand are present in the Instruction and Operand
count operand. Multiple sequential words are loaded to the registers, it is executed without affecting their content. This
cache. using memory bus read burst transaction. under the feature can be used for restoring the ICU registers.
ICU control. The address operand speci?es the starting word
address. The Count Register speci?es the number of four As indicated hereinbefore. the memory bus cache instruc
word blocks. When the highest possible address is reached. tions are issued by special logic on the memory bus. This
a wrap around is performed. logic should be able to direct the instruction to the appro
The instruction is executed in multiple cycles (two cycles 10 priate ICU. All the ICUs in the system which recognize a
for initialization plus one memory bus burst read access per valid instruction on their inputs execute it. There are no
word). For this instruction. the Count Register must also equivalent concepts to the processor bus instruction scope
contain a valid count. The preload instruction can be used in and privileged instructions on the memory bus.
various methods. Two basic options are as follows:
All the memory bus cache instructions can be transferred
A simple method of using the preload instruction does not to the ICU. when it is the bus slave. by using one cache
require special cache con?guration and can be used with any instruction transfer memory bus transaction. The Write
cache system. In this method, the preload instruction is Block Status instruction can be also transferred when the
issued under software control before the program execution.
The instruction can be issued as part of the context switch
ICU is the bus master. by asserting the *VSI input. The
procedure. Since the caches are preloaded the cold start instruction is executed whenever the required cache
effect is minimized. The main disadvantage of this method resources are available. after the instruction and the oper
is that the preload operation can interfere with other cache ands are accepted. All the instruction are executed internally
operations which are required by the processor. in two cycles. In the case that a memory bus operation is
A more complex cache con?guration can overcome the required for the instruction execution. the number of cycle
above limitation. This con?guration (referred to as switch 25 for the bus operation should be added.
able caches) require more than one cache per each processor, The memory bus cache instructions are executed inde
and a method to switch between the caches. The switchable pendently of other ICU operations on the processor bus
caches con?guration allows the operation of multiple caches (including the processor bus cache instructions). The
on the same address space. Two or more caches are placed memory bus instructions and the processor bus operations
in parallel. but only one of the caches is enabled for are executed according to the order in which they are
processor memory accesses response. The other cache or
received. The result of the memory bus instruction re?ects
caches can be programmed to preload the variables or
the current state of the cache. If a memory bus cache
instructions which are required for the next program (or
procedure). Before the execution of the next program starts, instruction is accepted during the execution of a multiple
the appropriate cache (which includes the preloaded data) is 35 processor bus cache instruction, it is executed without
enabled. and the other caches disabled. affecting the multiple instruction execution.
The implementation of this scheme is fully supported by All the memory bus cache instruction. except Write Word
the ICU. All the necessary support is implemented on the in Cache. has an equivalent processor bus cache instruction
single chip. An individual ICU can. be disabled or enabled with a similar name. These instructions have a similar e?tect
by programming a mode register. The preload instruction is on the internal cache, however, the processor bus and
implemented so that this scheme is supported. Speci?cally memory bus operations are diiferent.
the ICU can perform the preload instruction. even if it is
disabled for normal cache operations. Two or more ICUs can All the memory bus cache instructions are transferred
be placed in parallel and only one can be enabled for normal using one cache instruction transfer memory bus transaction.
45
cache operations, while the other performs preload opera When the ICU is not the bus master, the assertion of *VSI
tions (no glue logic is required). causes a cache instruction transaction on the memory bus.
This con?guration can have a signi?cant performance During the ?rst cycle of the transaction, the MRW, MREQT,
advantage. since the preload operations are performed with BSTC signals, and the address are latched. The MRW and
minimal impact on the current program execution. 50 MREQT signals specify the instruction. The BSTC signals
Reset (RST)-the Reset instruction performs the same specify the block status for the relevant instructions. The
function as performed by asserting the *RESET input. The address is used as the address operand for the relevant
ICU is initialized. The RST instruction is executed imme instruction. If the instruction requires data operand, the data
diately as it is accepted. A multiple instruction (INVM. is latched during the second cycle.
UPIM. PRLD) execution is terminated. 55
Read Register (RDR)the Read Register instruction is When the ICU is the bus master, and it issues a read
used for reading the special registers of the ICU. The special request for reload operation. the V81 input has a special
register number is speci?ed by the opcode. The ICU function. If it is asserted before the transaction is completed
responds in different ways to this instruction depending on (the last *MRDY has not been accepted), then the BSTC
the state of a Read Register Control (RRC) bit in the Modb signals are latched, and a Write Block Status instruction is
Register. When RRC is l, a read transaction with the executed. The address operand is the block address which
instruction on the address bus initiates the instruction. When correspond to the address that was transferred by the ICU for
RRC is 0. the main memory is used in order to read the the reload operation.
registers. A set of memory bus cache instructions found useful in
Write Register (WRR)the Write Register instruction is 65 one embodiment of the invention. in terms of Mnemonic,
used for writing the special registers of the ICU. The special description and brief explanation of purpose, are set forth
register number is speci?ed by the opcode. The data is immediately hereinafter in tabular form:
5 ,627,992
27 28
(the last *MRDY has not been accepted), then the BSTC
signals are latched. and the Write Block Status instruction is
MRW MREQT Mnemonic description executed. The address operand is the address that was
00 O0 NOP No operation transferred by the ICU for the reload operation. In this case,
00 Ol SWH Send Word if hit the block which includes the speci?ed word address is
00 1O SWM Send word if modi?ed always present in the cache (since it is being reloaded). The
00 ll SBM Send block if modi?ed block status bits of this block are copied from the BSTC
01 0O INVW Invalidate word
01 01 INVB Invalidate block signals. The Status Register is not alfected.
01 10 WEST Write block status Write Register (WRR)the Write Register instruction is
01 11 WRR Write register used for writing the special registers of the ICU.
10 0O UPMI Update memory and invalidate
1O 01 RBST Read block status Update Memory and Invalidate (UPMI)if the four word
l0 l0 RDR Read register block speci?ed by the address operand is modi?ed. the *HII
ll 10 WRW Write word in cache and the *MRDY signals are asserted and the block status
shared and modi?ed bits are driven on the BSTC signals.
It should be noted that the instruction codes are arranged Then. all the valid words are Written to the corresponding
so that MRWO is consistent with other memory bus opera addresses in the memory, and then the valid bits are reset.
tions. For MRWO LOW. the dissection is from the ICU to Read Block Status (RBST)the speci?ed word address is
the memory bus and for MRWO HIGH. the direction is from checked for presence in the cache. The block status bits are
the memory bus to the ICU. 20
driven on the BSTC signals. The *H1T and *MRDY are also
On the functional level. the execution of the instructions driven accordingly.
result in the following actions: Read Register (RDR)the Read Register instruction is
No Operation (NOP)this instruction has no affect on the used for reading the special registers of the ICU.
ICU. The memory bus cache instruction transaction protocol Write Word in Cache (WRW)--if the word speci?ed by
is performed as for any instruction. 25 the address operand is present in the cache. the *HIT signal
Send Word if Hit (SWH)if the word speci?ed by the is asserted and the supplied data is written into the cache.
address operand is present in the cache. the *HH signal is The two least signi?cant bits of the address are ignored
asserted. The word is driven on the memory address bus. the Note that the address cycle is speci?ed by the *VSI input
block status shared and modi?ed bits are driven on the and not by the *MASTB signal. The data is speci?ed by the
BST C signals. and the *MRDY is asserted. value on the memory bus during the cycle following the
Send Word if Modi?ed (SWM)if the word speci?ed by address cycle. If the word is not present in the cache the
the address operand is present in the cache and the modi?ed *H1T signal is deasserted and the Hit and Valid bits of the
bit for the block is set, the *H1T and the *MRDY signals are Status Register are reset. The *MRDY signal is asserted
asserted. and the block status shared and modi?ed bits are when the ICU latches the data from the memory bus.
driven on the BSTC signals. Then. the word is written to the 35 Having completed the description of a suitable and useful
corresponding address in the memory. instruction set for use with the preferred embodiment of the
Send Block if Modi?ed (SBM)if the four word block invention, a brief description of data formats and data
speci?ed by the address operand is modi?ed, the *H1T and manipulation mechanisms supported by the novel ICU will
the *MRDY signals are asserted. and the block status shared now be set forth.
and modi?ed bits are driven on the BSTC signals. Then, all As indicated hereinbefore. a word is de?ned as 32 bits of
the valid words are written to the corresponding addresses in data. Ahalf word consists of 16 bits. Abyte consists of 8 bits.
the memory. The ICU has direct support for word. half word and byte
Invalidate Word (INVW)the word speci?ed by the accesses. On the processor bus. the access length is deter
address operand is invalidated (the corresponding valid bit is mined according to the OPT inputs. The access length is
reset). The *H1T signal is driven according to the hit or miss e?ective only for single data memory accesses. It is ignored
conditions and the *MRDY asserted after the instruction and a word length is assumed. in all other processor bus
execution is completed. accesses (including burst mode memory accesses). On the
Invalidate Block (1NVB)the four word block speci?ed memory bus, the access length is determined according to
by the address operand is invalidated (all the corresponding the MDLN signals.
50
valid bits are reset). The *H1T signal is driven according to The numbering conventions for data units Within a word
the hit or miss conditions and the *MRDY asserted after the for the preferred embodiment of the invention. are consistent
instruction execution is completed. with the RISC/SIP de?nitions set forth in the incorporated
Write Block Status (WBST)this instruction can be application describing a RISC processor. Bits are numbered
executed either when the ICU is the bus slave or the bus 55 in increasing order from right to left. Bytes and half words
master. When the ICU is the slave the address operand is can be ordered either from right-to-left or from left-to-right,
speci?ed by the memory bus transaction. In this case. the as controlled by the Byte Order bit of the Modb Register.
block which includes the speci?ed word address is checked The preferred embodiment of the ICU contains the nec
for presence in the cache. If the block is present then the essary hardware to fully support byte. half word and word
*H1T and *MRDY signals are asserted, and the block status accesses. The di?erent data types should be aligned on their
bits are copied from the BSTC signals. If the block is not natural address boundaries. Accesses which are not on their
present in the cache then the *HII signal is deasserted and natural boundary are reported on the Error and Status
the *MRDY asserted. The Hit and Valid bits of the Status registers; but the access is serviced as if it was correctly
Register are reset. aligned (i.e. the appropriate address bits are ignored).
When the ICU is the bus master and it issues a read For memory byte read operations on the processor bus,
request for a reload operation the *VSI input has a special alignment hardware shifts the byte to the low order
function. If it is asserted before the transaction is completed (rightmost) location within a word.