[Block diagram. Recoverable labels: address decoder (index), Tag+Data arrays for ways 0 to assoc-1, tag comparators with Hit outputs, replacement policy, cache controller (CacheReq, Ack, write, init, Req, Bank2Rep), Cacti power consumption, to/from main memory.]
Figure 3. Internal structure of the cache memory
The cache uses the write-allocate policy to deal with write misses. Power consumption is evaluated by attaching the Cacti model to the cache controller: when the cache is declared in the SystemC description, the cache configuration parameters are used to evaluate the access time and the energy consumption of each access to the cache. These two values are stored by the cache module. Together with the activity statistics of the cache module (number of accesses with hits, misses, external bus accesses, etc.), they are used to evaluate the total execution time in cycles and the total energy consumed by the cache at the end of the simulation.

4. Using the cache module in a SystemC SoC description

Our cache modules can be used in two different ways. First, they can be used separately to analyze the cache performance of a given application. In this case, the cache is activated by the following command:
sc-cacheAnal -f <trace file> -config <config_file>
where sc-cacheAnal is the name of the SystemC cache, and <trace file> is the file containing the list of memory access addresses generated by memory tracing during functional simulation. The parameter <config_file> is the cache configuration file. The configuration file contains the following parameters:
-nlines <num. of cache lines>
-bsize <block size in bytes>
-assoc <cache associativity>
-readPorts <num. of cache read ports>
-writePorts <num. of cache write ports>
-readWritePorts <num. of cache read_write ports>
-techno <technology size in micron>
-memLat <main memory access latency in cycles>
-cpu2cache <cpu to cache bus width in bytes>
-cache2mem <cache to main memory bus width in bytes>

The second use of our cache module is as a SystemC module in a SoC description. In this case, the cache declaration must simply be added to the SoC description as follows:
Cache<cpu2cache, cache2mem> *dcache =
new Cache<cpu2cache, cache2mem>("dcache", nAssoc, nLines, bSize, techSize, rwPort, rPort, wPort, memLat);

Figures 4, 5 and 6 depict the experimental results for our cache when its performance is analyzed separately (first method). Figure 4 shows the output of our SystemC cache description for a trace file. In this example, the merge sort program was run on a vector of 20 000 elements. The simulation results shown here give the statistics after the first 121118 memory references. The outputs contain two sets of statistics.
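As a rough illustration of how the configuration parameters relate to the cache geometry that Cacti reports, the following C++ sketch derives the cache size and the number of sets from the line count, block size and associativity. The struct CacheConfig and both helper functions are hypothetical names introduced for this example and are not part of the Cache class described here; the value nLines = 256 is not stated in the text, it is inferred from the 8192-byte size and 32-byte blocks reported in Figure 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container mirroring three of the constructor arguments
// (nAssoc, nLines, bSize); this is NOT the Cache class of the paper.
struct CacheConfig {
    std::size_t nAssoc;  // associativity (ways per set)
    std::size_t nLines;  // total number of cache lines
    std::size_t bSize;   // block size in bytes
};

// Total data capacity in bytes: each line holds one block.
inline std::size_t cacheSizeBytes(const CacheConfig& c) {
    return c.nLines * c.bSize;
}

// Number of sets: the lines are divided evenly among the ways.
inline std::size_t numSets(const CacheConfig& c) {
    assert(c.nAssoc > 0 && c.nLines % c.nAssoc == 0);
    return c.nLines / c.nAssoc;
}
```

With CacheConfig{2, 256, 32}, cacheSizeBytes gives 8192 and numSets gives 128, which agrees with the "Size in bytes" and "Number of sets" lines of Figure 4.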
start .....
Cacti Statistics:
Main Memory configuration: latency = 2
Cache configuration:
Size in bytes: 8192
Number of sets: 128
Associativity: 2
Block Size (bytes): 32
Read/Write Ports: 1
Read Ports: 0 Write Ports: 0
Technology Size: 0.35um Vdd: 2.6V
Access Time (ns): 2.19856
Power (nJ): 3.37432
Best Ndwl (L1): 1 Best Ndbl (L1): 2
Time Components:
data side (with Output driver) (ns): 1.70219
tag side (with Output driver) (ns): 2.19856
decode_data (ns): 0.405051
(nJ): 0.075142
wordline and bitline data (ns): 0.601265
……etc….
compare (ns): 0.557825
(nJ): 0.0110586
*******************
SYSTEMC CACHE POWER AWARE SIMULATOR
****************
Cache Configuration :
LSU to Dcache Bus width in bytes : 4
Dcache to Mem bus width in bytes : 8
Statistics :
Load / Store Instruction Nbr : 121118
SystemC: simulation stopped by user.
simulation time : 5.53403 seconds
#cycles : 131330
#Miss: 1733 #Hit : 119385
#Cache Bloc Read : 121118
#Cache Bloc Write: 38107
Power per access : 3.37432e-09
Total power in Cache (J) = 0.000537276
Figure 4. Statistics report for an application example
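The application statistics in Figure 4 can be cross-checked with a few lines of arithmetic. The C++ sketch below assumes that the total energy is the per-access energy multiplied by the number of block reads plus block writes; this relation is inferred from the reported numbers and is not stated explicitly by the simulator. The names RunStats, totalEnergyJ and missRate are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the counters printed in the statistics report.
struct RunStats {
    std::uint64_t hits;
    std::uint64_t misses;
    std::uint64_t blockReads;
    std::uint64_t blockWrites;
};

// Assumed energy model: every block read or write costs the same
// per-access energy (in joules) as the value reported by Cacti.
inline double totalEnergyJ(const RunStats& s, double energyPerAccessJ) {
    return energyPerAccessJ *
           static_cast<double>(s.blockReads + s.blockWrites);
}

// Fraction of memory references that missed in the cache.
inline double missRate(const RunStats& s) {
    const std::uint64_t refs = s.hits + s.misses;
    assert(refs > 0);
    return static_cast<double>(s.misses) / static_cast<double>(refs);
}
```

For the values of Figure 4 (119385 hits, 1733 misses, 121118 block reads, 38107 block writes, 3.37432e-09 J per access), totalEnergyJ returns about 5.37e-04 J, in line with the reported 0.000537276 J, and missRate is about 1.4%.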
Figure 6. Total energy consumption

The first set corresponds to the statistics given by Cacti; they depend only on the cache configuration, not on the application. Cacti also reports the power and access time contribution of each cache component (decoder, wordline, bitline, etc.).
The second set of statistics corresponds to application performance. It consists of the number of memory references, the number of cycles needed to execute these memory references, the number of hits and misses in the cache, and the total energy consumed by the cache.
Figures 5 and 6 present, respectively, the total execution time (in millions of cycles) and the total energy consumption in millijoules (mJ) for executing the merge sort program on an array of 20 000 elements. This program generates 1 409 836