10/27/16
Contents
Introduction
Architecture
Clock distribution
FIFO
Leakage optimization
Core voltage scaling & power gating
Array design
Conclusion
Introduction
Architecture
Allowing dependent instructions within a packet increases the number of instructions per packet and the utilization of the functional units.
Eight 16-bit multiply-accumulate operations per cycle.
Dynamic multithreading with real-time thread performance.
Communicates with external memory over an asynchronous bus.
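As a rough illustration of what eight 16-bit multiply-accumulate operations per cycle means for a kernel, the sketch below models a dot product on eight lanes and counts modeled cycles. The lane layout, the helper names (`to_int16`, `mac_cycle`, `dot_product`) and the zero padding are assumptions made for illustration, not details from the slides.

```python
LANES = 8  # eight 16-bit MACs per cycle, as stated on the slide


def to_int16(v):
    """Wrap an integer into the signed 16-bit range."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def mac_cycle(acc, a, b):
    """One modeled cycle: each lane does acc += a * b on 16-bit operands."""
    return [s + to_int16(x) * to_int16(y) for s, x, y in zip(acc, a, b)]


def dot_product(xs, ys):
    """Dot product on LANES accumulators; returns (result, modeled cycles)."""
    acc = [0] * LANES
    cycles = 0
    for i in range(0, len(xs), LANES):
        a = list(xs[i:i + LANES])
        b = list(ys[i:i + LANES])
        # Pad the last partial group with zeros (an assumption of this sketch).
        a += [0] * (LANES - len(a))
        b += [0] * (LANES - len(b))
        acc = mac_cycle(acc, a, b)
        cycles += 1
    return sum(acc), cycles
```

In this model a 16-element dot product takes two cycles instead of sixteen, which is the kind of throughput gain the slide refers to.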
Processor
Architecture
Clock Distribution
The clock enters the core through a voltage level shifter optimized for low clock insertion delay and low duty-cycle distortion across the DSP's voltage range.
The circuit can apply different delays to the rising and falling edges to adjust the duty cycle and improve the timing of phase paths.
Each clock bay consists of a long metal-8 wire (low resistance) driven from the middle by an inverter.
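The effect of delaying the two clock edges by different amounts can be worked through with simple arithmetic. The sketch below is an idealized model, assuming a periodic clock described by its period and high time in picoseconds; the function names and the example numbers are hypothetical.

```python
def duty_cycle(period_ps, high_ps, rise_delay_ps, fall_delay_ps):
    """Duty cycle after delaying rising and falling edges independently.

    Delaying the falling edge lengthens the high phase; delaying the
    rising edge shortens it.
    """
    new_high = high_ps + fall_delay_ps - rise_delay_ps
    if not 0 < new_high < period_ps:
        raise ValueError("edge delays leave no valid high or low phase")
    return new_high / period_ps


def fall_delay_for_target(period_ps, high_ps, rise_delay_ps, target_duty):
    """Falling-edge delay needed to reach target_duty (idealized)."""
    return target_duty * period_ps - high_ps + rise_delay_ps
```

For example, a 1000 ps clock that arrives with a 450 ps high phase reaches a 50% duty cycle in this model when the falling edge is delayed by 50 ps and the rising edge is left alone.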
Levels
Global clock
Pulsed Latches
Reduce power.
Improve robustness.
FIFO
When a read or write operation completes, the pointer value is updated and the new value is sent to the receiver, which then treats the entry as valid and readable.
The handshaking protocol must include enough margin to cover any variation between the write pointer and the array timing.
The FIFO includes a write-pointer tracking circuit that mimics the path from the input register to the latch array.
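The pointer handshake described above can be sketched as a small behavioral model. Each side owns one pointer and only sees a delayed copy of the other side's pointer, so an entry becomes readable only after the updated write pointer has reached the receiver. The class name `PointerFifo`, the explicit `sync()` step and the use of unbounded integer counters are assumptions of this sketch, not the circuit from the slides.

```python
class PointerFifo:
    """Behavioral model of a FIFO whose two sides exchange pointer values."""

    def __init__(self, depth):
        self.depth = depth
        self.array = [None] * depth
        self.wr_ptr = 0        # owned by the writer
        self.rd_ptr = 0        # owned by the reader
        self.wr_ptr_seen = 0   # reader's copy of the write pointer
        self.rd_ptr_seen = 0   # writer's copy of the read pointer

    def write(self, value):
        """Store value, then advance the write pointer. False if it looks full."""
        if self.wr_ptr - self.rd_ptr_seen >= self.depth:
            return False
        self.array[self.wr_ptr % self.depth] = value
        self.wr_ptr += 1
        return True

    def sync(self):
        """Deliver the current pointer values to the opposite side."""
        self.wr_ptr_seen = self.wr_ptr
        self.rd_ptr_seen = self.rd_ptr

    def read(self):
        """Return the next entry, or None if no valid entry is visible yet."""
        if self.rd_ptr == self.wr_ptr_seen:
            return None
        value = self.array[self.rd_ptr % self.depth]
        self.rd_ptr += 1
        return value
```

Because the data is stored before the pointer is advanced and delivered, the reader never sees an entry as valid before it has been written. In hardware that ordering is what the timing margin and the tracking circuit are meant to guarantee.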
Asynchronous FIFO
Leakage optimization
The leakage recovery process provides other benefits:
reduced crosstalk
improved delay timing
fewer delay circuits
The maximum frequency of the core improves after leakage recovery.
When the core is powered down, all the tiles are turned off simultaneously.
A proper sequence is important to power down the core safely.
If data retention is desired, L1 cache data is moved to the L2 cache.
The signal controlling the sleep state of the L2 cache must be latched onto the memory supply, and outputs to the VDDQ6 power domain must be isolated.
This includes the core outputs as well as the internal interface to the L2 cache.
Isolation prevents the flow of short-circuit currents and the propagation of unknown logic values to the power domain.
The BHS is turned off last. The power-up sequence resets the core before removing the isolation.
In low-power mode, the LDO (low-dropout regulator) can reduce the voltage of the DSP while the rest of the system stays at a higher voltage.
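The ordering constraints of the power-down and power-up sequences can be captured in a small model that refuses out-of-order steps. This is a sketch only: the class `CorePower` and its method names are hypothetical, and the checks encode just the constraints stated on the slides (latch the L2 sleep signal and isolate before the BHS turns off; reset before isolation is removed).

```python
class CorePower:
    """Toy model of the power sequencing rules; not the actual controller."""

    def __init__(self):
        self.powered = True
        self.isolated = False
        self.l2_sleep_latched = False
        self.reset_done = True

    # Power-down steps
    def latch_l2_sleep(self):
        self.l2_sleep_latched = True

    def isolate(self):
        self.isolated = True

    def bhs_off(self):
        # The BHS is the last thing to switch off.
        if not (self.l2_sleep_latched and self.isolated):
            raise RuntimeError("latch L2 sleep and isolate before BHS off")
        self.powered = False
        self.reset_done = False  # core state is lost while unpowered

    # Power-up steps
    def bhs_on(self):
        self.powered = True

    def reset(self):
        if not self.powered:
            raise RuntimeError("cannot reset an unpowered core")
        self.reset_done = True

    def remove_isolation(self):
        # Unknown values must not leave the core, so reset comes first.
        if not (self.powered and self.reset_done):
            raise RuntimeError("reset the core before removing isolation")
        self.isolated = False
```

Calling `bhs_off()` before `isolate()`, or `remove_isolation()` before `reset()`, raises an error in this model, mirroring the reason the sequence matters.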
During power-down, the BHS can cut the DSP leakage to practically zero.
The LDO does not require an external capacitor.
When the current demand of the core increases, the lack of a large capacitor to supply charge quickly forces the LDO itself to be faster.
Increasing the LDO's gate capacitance increases the time required to change its gate voltage, which degrades the LDO's ability to respond quickly to transient events.
Switching between the LDO and the BHS is allowed while the DSP is running.
The transition from LDO to BHS is made progressively; afterwards the LDO no longer regulates its output voltage and can be turned off.
For the transition from BHS to LDO, the digital controller forces the LDO into its minimum-impedance state and the BHS is turned off. The controller then gradually increases the impedance of the LDO until the output voltage drops to its target value.
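The BHS-to-LDO hand-over can be illustrated with a toy resistive model in which the output voltage is the input voltage minus the drop across the LDO impedance. Everything numeric here is an assumption for illustration (constant load current, a purely resistive LDO, integer millivolt/milliamp/milliohm units, the default step sizes); the function name `bhs_to_ldo_trace` is hypothetical.

```python
def bhs_to_ldo_trace(v_in_mv, i_load_ma, v_target_mv,
                     r_min_mohm=20, r_step_mohm=20):
    """Output voltage (mV) at each controller step of a BHS-to-LDO hand-over.

    The LDO starts at its minimum impedance, then the controller raises
    the impedance step by step until the output reaches the target.
    """
    if i_load_ma <= 0 or r_step_mohm <= 0:
        raise ValueError("load current and impedance step must be positive")

    def v_out(r_mohm):
        # mA * mOhm = microvolts; integer-divide by 1000 to get millivolts.
        return v_in_mv - (i_load_ma * r_mohm) // 1000

    r = r_min_mohm
    trace = [v_out(r)]
    while trace[-1] > v_target_mv:
        r += r_step_mohm
        trace.append(v_out(r))
    return trace
```

With a 1100 mV input, a 500 mA load and a 900 mV target, the modeled output starts at 1090 mV and falls in 10 mV steps until it reaches 900 mV, so the core never sees an abrupt supply change.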
Array Design
An access takes 2 cycles.
This eliminates unnecessary accesses to the data array when a read miss occurs.
The L1 data cache is stored in an SRAM array.
The DSP is multithreaded, and each thread operates in its own virtual memory region with a different virtual page number.
With a CAM (content-addressable memory) matched on the VPN (virtual page number), the area is significantly large and the power dissipation is high.
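The benefit of the two-cycle access can be seen in a small model that checks the tag first and touches the data array only on a hit. The direct-mapped organization, the 32-byte line size and the class name `SerialTagCache` are assumptions chosen to keep the sketch short.

```python
class SerialTagCache:
    """Cache model with a tag check (cycle 1) before the data read (cycle 2)."""

    def __init__(self, num_sets, line_bytes=32):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags = [None] * num_sets
        self.data = [None] * num_sets
        self.data_reads = 0  # number of data-array accesses performed

    def _split(self, addr):
        """Return (set index, tag) for a byte address."""
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def fill(self, addr, payload):
        """Install a line, as would happen after a miss is serviced."""
        idx, tag = self._split(addr)
        self.tags[idx] = tag
        self.data[idx] = payload

    def read(self, addr):
        """Return the cached payload, or None on a miss."""
        idx, tag = self._split(addr)
        if self.tags[idx] != tag:
            return None           # miss: the data array is never accessed
        self.data_reads += 1      # hit: second cycle reads the data array
        return self.data[idx]
```

In this model a miss leaves `data_reads` unchanged, which is the saving the slide describes: the data array, typically the larger and more power-hungry structure, is only activated when the tag check succeeds.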
The tag and data arrays of the L1 cache must be accessed in a single cycle.
This allows much of the dynamic power consumption and leakage of the instruction cache to scale down with the core voltage.
Conclusion
QUESTIONS
Thank you.