
TMS320C54XX

BY: Dr. Sudhir N. Shelke


Principal
Guru Nanak Institute of Technology, Nagpur
Architecture of TMS320C54XX

9/14/2017 Dr. Sudhir N Shelke


INTRODUCTION
This unit provides an architectural overview of the TMS320C54XX, covering:

• CPU
• On Chip Memory
• On Chip Peripherals
• Addressing Modes
• Interrupts
• Program Control
• Internal Memory Bus Organization
• Buses
• Pipelining



INTRODUCTION
• The C54XX DSP uses a modified Harvard architecture that maximizes processing power with eight buses.

• Separate program and data buses allow simultaneous access to program and data, providing a high degree of parallelism.

• Data can be transferred between program and data memory.



Efficient data/program flow
#1: CPU designed for efficient DSP processing
• MAC unit, two accumulators, additional adder, barrel shifter
#2: Multiple buses for efficient data and program flow
• Four bus pairs (each a data bus with its address bus, eight buses in all) and large on-chip memory that result in sustained performance near peak
#3: Highly tuned instruction set for powerful DSP computing
• Sophisticated instructions that execute in fewer cycles, with less code and low power demands
TMS320C54x Internal Block Diagram





Buses in C54XX
• The C54XX architecture is built around eight major 16-bit buses: four program/data buses and four address buses.

• The program bus (PB) carries the instruction code and immediate operands from program memory.

• Three data buses (CB, DB, EB) interconnect various elements such as the CPU, data-address generation logic, on-chip peripherals, and data memory.

• CB and DB carry the data operands that are read from memory.

• EB carries the data to be written to memory.

• Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.



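Buses usage

[Figure: bus usage. Recoverable labels: internal memory, external memory, multiplexers (MUX), program control (CNTL, PC), auxiliary registers (ARs), and the central arithmetic logic unit (T register, MAC, accumulators A and B, ALU, shifter), connected by the P, C, D, and E buses.]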





Buses

• The C54x DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1).

• The PB can carry data operands stored in program space to the multiplier and adder for multiply/accumulate operations, or to a destination in data space for data move instructions.

• The C54x DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface.



Internal Memory Organization
• The C54XX DSP memory is organized into three individually selectable spaces: program, data, and I/O.
• C54x devices can contain random-access memory (RAM) and read-only memory (ROM).
• Across the device family, the following types of RAM are represented: dual-access RAM (DARAM), single-access RAM (SARAM), and two-way shared RAM.
• The DARAM or SARAM can be shared within subsystems of a multiple-CPU core device.
• The DARAM and SARAM can be configured as data memory or as program/data memory.
• The C54x DSP also has 26 CPU registers, plus peripheral registers, that are mapped in data-memory space.



On Chip ROM
• The on-chip ROM is part of the program memory space and, in some cases, part of the data memory space.

• The amount of on-chip ROM available varies from device to device.

• On most devices, the ROM contains a boot loader that is useful for booting to faster on-chip or external RAM.

• On devices with large amounts of ROM, a portion of the ROM may be mapped into both data and program space.



On Chip DARAM

• The amount of on-chip DARAM available varies from device to device.
• The DARAM is composed of several blocks. Because each DARAM block can be accessed twice per machine cycle, the CPU and peripherals, such as a buffered serial port (BSP) and host-port interface (HPI), can read from and write to a DARAM memory address in the same cycle.
• The DARAM is always mapped in data space and is primarily intended to store data values. It can also be mapped into program space and used to store program code.



On Chip SARAM
• The amount of on-chip SARAM available varies from device to device.

• The SARAM is composed of several blocks. Each block is accessible once per machine cycle for either a read or a write.

• The SARAM is always mapped in data space and is primarily intended to store data values.

• It can also be mapped into program space and used to store program code.



MEMORY MAPPED REGISTERS (MMR)
• The data memory space contains memory-mapped registers for the CPU and the on-chip peripherals.

• These registers are located on data page 0, which simplifies access to them.

• Memory-mapped access provides a convenient way to save and restore the registers for context switches and to transfer information between the accumulators and the other registers.



Central Processing Unit (CPU)
40-bit ALU

Two 40-bit accumulators

Barrel shifter

17 × 17-bit multiplier

40-bit adder

16-bit temporary register (T)

Compare, select, and store unit (CSSU)
STATUS REGISTERS

ST0: Contains the status flags (OVA, OVB, C, TC) produced by arithmetic operations and bit manipulations.

ST1: Contains the status of various conditions and modes.

Bits of ST0 and ST1 can be set with the SSBX instruction and cleared with the RSBX instruction.



ST0

• DP: Data-memory page pointer. When CPL = 0, the 9-bit DP is concatenated with the 7 LSBs of the instruction word to form a 16-bit direct memory address.

• OVB: Overflow flag for accumulator B.

• OVA: Overflow flag for accumulator A.

• C: Carry. Set to 1 if an addition generates a carry and cleared to 0 if a subtraction generates a borrow; otherwise it is cleared to 0 after an addition and set to 1 after a subtraction.

• TC: Test/control flag. Stores the result of ALU test-bit operations.

• ARP: Auxiliary register pointer. Selects one of AR0–AR7 for indirect single-operand addressing.
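The DP-based address formation described above can be sketched in a few lines of Python. This is only an illustrative model, not TI code: the function name `direct_address` is hypothetical, and the 9-bit width of DP is inferred from the 16-bit address minus the 7-bit offset.

```python
def direct_address(dp, offset7):
    """Form a 16-bit direct data-memory address for the CPL = 0 case.

    dp      -- 9-bit data page pointer (assumed width: 16 - 7 bits)
    offset7 -- 7 least significant bits taken from the instruction word
    """
    if not 0 <= dp < (1 << 9):
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset7 < (1 << 7):
        raise ValueError("offset must fit in 7 bits")
    # DP supplies the upper 9 address bits, the instruction the lower 7.
    return (dp << 7) | offset7


if __name__ == "__main__":
    print(hex(direct_address(3, 0x10)))
```

Each value of DP therefore selects one 128-word page of data memory.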
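ST1

• Bit 15, BRAF: Block-repeat active flag. BRAF is set to 1 when an RPTB instruction starts a block repeat and cleared to 0 when the block-repeat counter (BRC) decrements below zero.

• Bit 14, CPL: Compiler mode. CPL = 0: direct addressing uses DP; CPL = 1: direct addressing uses SP.

• Bit 13, XF: External flag, a general-purpose output pin that can be used in multiprocessor configurations. Set with SSBX, reset with RSBX.

• Bit 12, HM: Hold mode. Determines whether the CPU stops or continues execution when acknowledging an active HOLD signal.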



• Bit 11, INTM: Interrupt mode.
INTM = 0: all unmasked interrupts are enabled.
INTM = 1: all maskable interrupts are disabled.

• Bit 10: Reserved; always reads as 0.

• Bit 9, OVM: Overflow mode. Enables (1) or disables (0) saturation of the accumulator on overflow.

• Bit 8, SXM: Sign-extension mode. Enables (1) or disables (0) sign extension in arithmetic operations.
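The effect of OVM can be illustrated with a small Python model. It is a sketch under stated assumptions: overflow is judged against the signed 32-bit range, the saturation values are the largest positive and most negative 32-bit numbers, and the function name `accumulate` is hypothetical. The 40-bit wrap-around of the real accumulator is not modeled.

```python
MAX32 = 0x7FFFFFFF    # largest positive 32-bit value
MIN32 = -0x80000000   # most negative 32-bit value


def accumulate(acc, value, ovm):
    """Add value to acc and return (result, overflow_flag).

    With ovm true the result is clamped to the 32-bit range on overflow.
    With ovm false the result is allowed to grow into the guard bits.
    """
    total = acc + value
    overflow = total > MAX32 or total < MIN32
    if ovm and overflow:
        total = MAX32 if total > 0 else MIN32
    return total, overflow


if __name__ == "__main__":
    print(accumulate(MAX32, 1, ovm=True))
    print(accumulate(MAX32, 1, ovm=False))
```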



• Bit 7, C16: Dual 16-bit/double-precision arithmetic mode.
C16 = 0: the ALU operates in double-precision mode.
C16 = 1: the ALU operates in dual 16-bit arithmetic mode.

• Bit 6, FRCT: Fractional mode (multiplication). If FRCT = 1, the multiplier output is left-shifted by 1 bit to compensate for the extra sign bit.

• Bit 5, CMPT: Compatibility mode for ARP. CMPT = 0: ARP is not updated; CMPT = 1: ARP is updated.

• Bits 4–0, ASM: Accumulator shift mode. Specifies a shift value in the range -16 to +15, coded as a 5-bit two's-complement value.
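The two's-complement coding of the ASM field can be sketched as follows. The helper names `decode_asm` and `encode_asm` are hypothetical and only illustrate how a 5-bit field maps to the -16 to +15 range.

```python
def decode_asm(field):
    """Interpret the 5-bit ASM field as a signed shift count (-16..+15)."""
    field &= 0x1F
    # Bit 4 is the sign bit of the 5-bit two's-complement value.
    return field - 32 if field & 0x10 else field


def encode_asm(shift):
    """Encode a shift count in the range -16..+15 as a 5-bit field."""
    if not -16 <= shift <= 15:
        raise ValueError("ASM shift must be in -16..+15")
    return shift & 0x1F


if __name__ == "__main__":
    for raw in (0x00, 0x0F, 0x10, 0x1F):
        print(format(raw, "05b"), decode_asm(raw))
```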
ALU
• The 40-bit ALU implements a wide range of arithmetic and logical functions, most of which execute in a single clock cycle.

• After an operation is performed in the ALU, the result is usually transferred to a destination accumulator (accumulator A or B).

• The ALU can also function as two 16-bit ALUs and perform two 16-bit operations simultaneously.
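The dual 16-bit mode can be pictured as two independent 16-bit additions packed into one 32-bit word. The sketch below is a simplified model (hypothetical function name `dual_add16`, no flags or saturation) showing that no carry passes from the low half to the high half.

```python
def dual_add16(x, y):
    """Add two packed 32-bit words as two independent 16-bit halves."""
    hi = (((x >> 16) & 0xFFFF) + ((y >> 16) & 0xFFFF)) & 0xFFFF
    lo = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF
    # The carry out of the low half is discarded instead of rippling up.
    return (hi << 16) | lo


if __name__ == "__main__":
    x, y = 0x0001FFFF, 0x00010001
    print(hex(dual_add16(x, y)))           # dual 16-bit result
    print(hex((x + y) & 0xFFFFFFFF))       # ordinary 32-bit result
```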



ACCUMULATORS
• The C54XX devices have two 40-bit accumulators, A and B.

• Accumulators A and B store the output from the ALU or the multiplier/adder block. They can also provide a second input to the ALU.

• Accumulator A can also be an input to the multiplier/adder.

• Each accumulator is divided into three parts:
   • Guard bits (bits 39–32)
   • High-order word (bits 31–16)
   • Low-order word (bits 15–0)
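The three-part layout of an accumulator can be made concrete with a short Python sketch. The function name `split_accumulator` and the dictionary keys are illustrative choices, not TI register names.

```python
def split_accumulator(acc40):
    """Split a 40-bit accumulator value into guard, high, and low parts."""
    acc40 &= (1 << 40) - 1              # keep only the 40 accumulator bits
    return {
        "guard": (acc40 >> 32) & 0xFF,  # bits 39-32
        "high": (acc40 >> 16) & 0xFFFF, # bits 31-16
        "low": acc40 & 0xFFFF,          # bits 15-0
    }


if __name__ == "__main__":
    parts = split_accumulator(0x123456789A)
    print({name: hex(value) for name, value in parts.items()})
```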



BARREL SHIFTER

• The C54x DSP barrel shifter has a 40-bit input connected to the accumulators
or to data memory (using CB or DB), and a 40-bit output connected to the ALU
or to data memory (using EB).

• The barrel shifter can produce a left shift of 0 to 31 bits and a right shift of 0
to 16 bits on the input data.

• The shift count is specified in the shift-count field of the instruction,
in the ASM (accumulator shift mode) field of status register ST1, or in
temporary register T (when it is designated as a shift-count register).

• The shift count determines how many bits to shift. Positive shift values
correspond to left shifts, whereas negative values correspond to right shifts.
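The sign convention for the shift count can be sketched as below. This is a simplified model that assumes a sign-extending (arithmetic) right shift and a 40-bit signed result; the names `barrel_shift` and `to_signed40` are hypothetical.

```python
MASK40 = (1 << 40) - 1


def to_signed40(value):
    """Wrap an integer to a signed 40-bit value."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def barrel_shift(value, count):
    """Shift left for positive counts (0..31), right for negative (-1..-16)."""
    if not -16 <= count <= 31:
        raise ValueError("shift count must be in -16..31")
    value = to_signed40(value)
    if count >= 0:
        return to_signed40(value << count)
    return value >> -count  # Python's >> on ints is an arithmetic shift


if __name__ == "__main__":
    print(barrel_shift(1, 4), barrel_shift(-256, -4))
```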



MAC UNIT
• The C54xx CPU has a 17-bit × 17-bit hardware multiplier coupled to a 40-bit dedicated adder.

• This multiplier/adder unit provides multiply-and-accumulate (MAC) capability in a single pipeline phase cycle.

• The multiplier supports signed and unsigned multiplication.

• First input to the multiplier:
   • Temporary register (T)
   • Data-memory operand from DB
   • Accumulator A (bits 32–16)

• Second input to the multiplier:
   • Data-memory operand from CB
   • Data-memory operand from DB
   • Program-memory operand from PB
   • Accumulator A (bits 32–16)
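A multiply-and-accumulate step, including the 1-bit left shift applied in fractional mode (FRCT = 1), can be sketched as follows. The model uses plain Python integers and hypothetical function names; it does not reproduce the 40-bit wrap-around or the exact operand routing of the hardware.

```python
def mac(acc, x, y, frct=False):
    """Return acc + x * y, shifting the product left by 1 if frct is set."""
    product = x * y
    if frct:
        product <<= 1   # removes the extra sign bit of a Q15 x Q15 product
    return acc + product


def dot_product(xs, ys, frct=False):
    """Accumulate a sum of products, one MAC per sample pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y, frct)
    return acc


if __name__ == "__main__":
    half_q15 = 0x4000   # 0.5 in Q15 format
    print(hex(mac(0, half_q15, half_q15, frct=True)))
```

With FRCT set, 0.5 × 0.5 in Q15 yields 0x20000000, whose upper 16 bits (0x2000) are 0.25 in Q15.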



REGISTERS
Temporary register (T):
• It can hold one of the multiplicands for multiply instructions.

• It can hold a dynamic shift count for instructions with a shift operation, such as ADD and SUB.

• It can hold branch metrics for Viterbi decoding.

• In addition, the EXP instruction stores the computed exponent value in T, and the NORM instruction uses the value in T to normalize the number.
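The EXP/NORM pairing can be illustrated with a rough Python model. It assumes, based on the usual description of EXP, that the exponent is the number of redundant sign bits in the 40-bit accumulator minus 8, and that a zero accumulator gives 0; the function names and this exact formula are assumptions for illustration, not a verified reproduction of the instruction.

```python
MASK40 = (1 << 40) - 1


def to_signed40(value):
    """Wrap an integer to a signed 40-bit value."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def exp(acc):
    """Exponent that EXP is assumed to place in T."""
    value = to_signed40(acc)
    if value == 0:
        return 0
    if value < 0:
        value = ~value                    # leading ones become leading zeros
    redundant = 39 - value.bit_length()   # sign copies below bit 39
    return redundant - 8


def norm(acc, t):
    """Shift acc by t, as NORM would: left if t >= 0, right otherwise."""
    value = to_signed40(acc)
    return to_signed40(value << t) if t >= 0 else value >> -t


if __name__ == "__main__":
    t = exp(1)
    print(t, hex(norm(1, t)))
```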



REGISTERS

Transition register (TRN):

• The 16-bit transition register holds the transition decisions for the path to new metrics in the Viterbi algorithm.

• The CMPS instruction updates the contents of TRN based on a comparison between the high word and the low word of an accumulator.



REGISTERS

Auxiliary registers (AR0–AR7):

• The eight 16-bit auxiliary registers (AR0–AR7) can be accessed by the CPU and modified by the auxiliary register arithmetic units (ARAUs).

• Their primary function is to generate 16-bit addresses for data space.

• However, these registers can also act as general-purpose registers.



REGISTERS

Stack pointer (SP):

• The 16-bit stack pointer (SP) contains the address of the top of the stack.

• SP always points to the last element pushed onto the stack.

• The stack is manipulated by interrupts, traps, calls, returns, and the PSHD, PSHM, POPD, and POPM instructions.
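Because SP points at the last element pushed, a push must move SP before storing and a pop must read before moving SP. The sketch below models this, assuming the stack grows toward lower addresses; the class name `Stack` and its methods are hypothetical.

```python
class Stack:
    """Toy model of a stack whose SP points at the last pushed word."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFF
        self.mem = {}                       # sparse 16-bit data memory

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF    # pre-decrement (assumed direction)
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF    # post-increment
        return value


if __name__ == "__main__":
    stack = Stack(0x0100)
    stack.push(0x1234)
    print(hex(stack.sp), hex(stack.pop()), hex(stack.sp))
```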



COMPARE, SELECT, AND STORE UNIT (CSSU)
The compare, select, and store unit (CSSU) is an
application-specific hardware unit dedicated to
add/compare/select (ACS) operations of the Viterbi operator.



CSSU
• The CSSU allows the C54x device to support various Viterbi butterfly algorithms used in equalizers and channel decoders.
• The add function of the Viterbi operator is performed by the ALU. This function consists of a double addition (Met1 ± D1 and Met2 ± D2).
• The double addition is completed in one machine cycle if the ALU is configured for dual 16-bit mode by setting the C16 bit in ST1.
• With the ALU configured in dual 16-bit mode, all long-word (32-bit) instructions become dual 16-bit arithmetic instructions.



Working of CSSU
1. The CSSU implements the compare and select operation via the CMPS instruction, a comparator, and the 16-bit transition register (TRN).

2. This operation compares two 16-bit parts of the specified accumulator and shifts the decision into bit 0 of TRN.

3. This decision is also stored in the TC bit of ST0.

4. Based on the decision, the corresponding 16-bit part of the accumulator is stored in data memory.
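The four steps above can be sketched in Python. The decision polarity used here (0 when the high word is greater, 1 otherwise) and the signed comparison are assumptions about CMPS made for illustration; the function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def cmps(acc, trn):
    """Return (stored_word, new_trn, tc) for one compare-select-store step."""
    high = (acc >> 16) & 0xFFFF
    low = acc & 0xFFFF
    if to_signed16(high) > to_signed16(low):
        stored, decision = high, 0      # high word selected (assumed polarity)
    else:
        stored, decision = low, 1       # low word selected
    new_trn = ((trn << 1) | decision) & 0xFFFF   # decision enters bit 0
    return stored, new_trn, decision


if __name__ == "__main__":
    print(cmps(0x00050003, 0x0000))
    print(cmps(0x00030005, 0x8001))
```

Repeating the step for each butterfly leaves the most recent 16 decisions in TRN, which is what a traceback routine would read.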



PROGRAM CONTROL

The program control unit of the TMS320C54XX processors contains:

Program Counter
Hardware Stack
Repeat Counters
Status Registers



Program Counter
The PC addresses program memory, either on chip or off chip, and is loaded in one of several ways:

Code operation: address loaded into PC
• Reset: PC is loaded with FF80h.
• Sequential execution: PC is loaded with PC + 1.
• Branch: PC is loaded with the 16-bit immediate value directly following the branch instruction.
• Branch from accumulator: PC is loaded with the lower 16-bit word of accumulator A or B.
• Block repeat loop: PC is loaded with the repeat start address (RSA) when PC + 1 equals the repeat end address (REA) + 1, provided that BRAF = 1.
• Subroutine call: PC + 2 is pushed onto the stack, and PC is loaded with the 16-bit immediate value following the CALL instruction. The return instruction pops the top of the stack into PC.
• Interrupt: PC is pushed onto the stack, and PC is loaded with the address of the appropriate interrupt vector. The return instruction pops the top of the stack into PC.
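The block-repeat rule in the list above can be sketched as a small state update. This is a simplified model: every instruction is treated as one word, the counter stops at 0 instead of decrementing below zero, and the names `next_pc` and `count_passes` are hypothetical.

```python
def next_pc(pc, rsa, rea, brc, braf):
    """Return (pc, brc, braf) after the instruction at pc completes."""
    if braf and pc == rea:              # same test as PC + 1 == REA + 1
        if brc > 0:
            return rsa, brc - 1, True   # loop back to the block start
        return pc + 1, brc, False       # final pass done: BRAF cleared
    return pc + 1, brc, braf            # ordinary sequential execution


def count_passes(rsa, rea, n):
    """Count how often the block runs when the repeat counter holds n."""
    pc, brc, braf = rsa, n, True        # state assumed right after RPTB
    passes = 0
    while braf:
        if pc == rsa:
            passes += 1
        pc, brc, braf = next_pc(pc, rsa, rea, brc, braf)
    return passes, pc


if __name__ == "__main__":
    print(count_passes(0x100, 0x102, 2))
```

Loading the counter with N gives N + 1 passes through the block, after which execution continues at REA + 1.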
PROGRAM CONTROL
• The program-address generation logic (PAGEN), the hardware associated with the program counter, provides the options above.

• Hardware stack: The stack is used to save and restore the PC value during subroutine calls and interrupts.

• Repeat counters: A single instruction can be repeated N+1 times by loading the value N into the repeat counter; likewise, a block of instructions can be repeated N+1 times by loading N into the block-repeat counter.

• Status registers: The TMS320C54XX contains two status registers:
   • ST0
   • ST1



INTERRUPTS
• Often, while the CPU is in the middle of executing a program, a peripheral device requires service from the CPU.

• In such a situation, the main program may be interrupted by a signal generated by the peripheral device.

• The processor then suspends the main program in order to execute another program, called an interrupt service routine (ISR), to service the peripheral.

• On completion of the ISR, the processor returns to the main program and continues from where it left off.

• An interrupt may be generated by an internal or an external device.


INTERRUPTS
• An interrupt may also be generated by software.

• Not all interrupts are serviced when they occur; only nonmaskable interrupts are always serviced when they occur.

• The other interrupts, called maskable interrupts, are serviced only if they are enabled.

• A priority scheme determines which interrupt is serviced first if more than one interrupt occurs simultaneously.
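The selection rules above can be sketched as a small function. Everything specific here is an assumption made for illustration: interrupts are identified by integers, a lower number means higher priority, `enabled` stands in for a per-interrupt mask, and `intm` is a global disable for maskable interrupts.

```python
def select_interrupt(pending, enabled, intm, nonmaskable=frozenset()):
    """Pick the interrupt to service next, or None if none qualifies.

    pending     -- set of interrupt numbers currently requested
    enabled     -- set of maskable interrupts that are individually enabled
    intm        -- True if maskable interrupts are globally disabled
    nonmaskable -- set of interrupts that are always serviced
    """
    candidates = [
        irq for irq in pending
        if irq in nonmaskable or (not intm and irq in enabled)
    ]
    # Assumed priority rule: the smallest number wins.
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(select_interrupt({3, 5}, {3, 5}, intm=False))
    print(select_interrupt({3, 5}, {3, 5}, intm=True, nonmaskable={5}))
```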



PIPELINE OPERATION of TMS320C54XX

• The C54xx DSP has a six-level-deep instruction pipeline.

• The six stages of the pipeline are independent of each other, which allows overlapping execution of instructions.

• During any given cycle, from one to six different instructions can be active, each at a different stage of completion.



PIPELINE OPERATION of TMS320C54XX

• The six levels of the pipeline and their functions are:

   • Program prefetch: The program address bus (PAB) is loaded with the address of the next instruction to be fetched.

   • Program fetch: An instruction word is fetched from the program bus (PB) and loaded into the instruction register (IR). This completes an instruction fetch sequence that consists of this and the previous cycle.

   • Decode: The contents of the instruction register (IR) are decoded to determine the type of memory access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.

   • Access: DAGEN outputs the read operand's address on the data address bus, DAB. If a second operand is required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary registers in indirect addressing mode and the stack pointer (SP) are also updated.

   • Read: The read data operand(s), if any, are read from the data buses, DB and CB. This completes the two-stage operand read sequence. At the same time, the two-stage operand write sequence begins. The data address of the write operand, if any, is loaded into the data write address bus (EAB).

   • Execute: The operand write sequence is completed by writing the data using the data write bus (EB). The instruction is executed in this phase.



Show the pipeline operation of the following sequence of
instructions if the initial value of AR3 is 80 and the values stored in
memory locations 80, 81, and 82 are 1, 2, and 3.

LD *AR3+, A
ADD #1000h, A
STL A, *AR3+
-----------
-----------



Cycle  Prefetch  Fetch  Decode  Access  Read    Execute & Write  AR3  A
1      LD                                                        80   X
2      ADD       LD                                              80   X
3      STL       ADD    LD                                       80   X
4                STL    ADD     LD                               81   X
5                       STL     ADD     LD                       82   1
6                               STL     ------  LD               82   0001h
7                                       STL     ADD              82   1001h
8                                               STL              82   1001h
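The stage occupancy in a table like the one above can be generated with a short Python sketch. It models an ideal pipeline only: each instruction is assumed to advance one stage per cycle with no stalls, and register contents are not tracked. The function name `pipeline_table` is hypothetical.

```python
STAGES = ["Prefetch", "Fetch", "Decode", "Access", "Read", "Execute"]


def pipeline_table(instructions):
    """Return one row per cycle listing which instruction is in each stage."""
    rows = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            # Instruction i enters stage s in cycle i + s.
            i = cycle - stage_index
            row.append(instructions[i] if 0 <= i < len(instructions) else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("Cycle", *STAGES)
    for number, row in enumerate(pipeline_table(["LD", "ADD", "STL"]), start=1):
        print(number, *[cell or "-" for cell in row])
```

Three instructions in a six-stage pipeline occupy 3 + 6 - 1 = 8 cycles, which matches the eight rows of the table.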



ON CHIP PERIPHERALS
General-purpose I/O pins: XF and BIO

Timer

Host port interface (HPI)

Synchronous serial port

Buffered serial port (BSP)

Multichannel buffered serial port (McBSP)

Time-division multiplexed (TDM) serial port

Software-programmable wait-state generator

Programmable bank-switching module


GENERAL-PURPOSE I/O
• The C54xx DSP offers general-purpose I/O through two dedicated, software-controlled pins: the branch control input pin (BIO) and the external flag output pin (XF).

• BIO can be used to monitor the status of peripheral devices.

• XF can be used to signal external devices. The XF pin is controlled by software.

• XF is driven high by setting the XF bit (in ST1) and driven low by clearing the XF bit. The set status register bit (SSBX) and reset status register bit (RSBX) instructions can be used to set and clear XF, respectively.



SOFTWARE-PROGRAMMABLE WAIT-STATE GENERATOR

• The software-programmable wait-state generator extends external bus cycles by up to seven machine cycles to interface with slower off-chip memory and devices.

• The wait-state generator requires no external hardware.

• For off-chip memory accesses, from zero to seven wait states can be specified in the software wait-state register.



HOST PORT INTERFACE
• The host port interface (HPI) is an 8-bit parallel port that provides an interface to a host processor.

• Information is exchanged between the C54xx and the host processor through C54xx on-chip memory that is accessible to both.



HARDWARE TIMER
• The on-chip timer is a software-programmable timer that consists of three
registers and can be used to periodically generate interrupts.

• The timer resolution is the CPU clock rate of the processor.

• The high dynamic range of the timer is achieved with a 16-bit counter with
a 4-bit prescaler.

• Timer Registers:-
The on-chip timer consists of three memory-mapped registers (TIM, PRD,
and TCR).



• Timer register (TIM): The 16-bit memory-mapped timer register is loaded with the period register (PRD) value and decremented.

• Timer period register (PRD): The 16-bit memory-mapped timer period register is used to reload TIM.

• Timer control register (TCR): The 16-bit memory-mapped timer control register contains the control and status bits of the timer.
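The interrupt period produced by a 16-bit counter with a 4-bit prescaler can be estimated as sketched below. The formula assumes that both the counter and the prescaler count down from their reload values through zero, so each contributes "reload value + 1" steps; the parameter name `tddr` for the prescaler reload value and the function name are assumptions for illustration.

```python
def timer_interrupt_period(cpu_clock_hz, prd, tddr=0):
    """Return the time in seconds between timer interrupts.

    cpu_clock_hz -- CPU clock rate, which sets the timer resolution
    prd          -- 16-bit period value reloaded into TIM
    tddr         -- 4-bit prescaler reload value (assumed name)
    """
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("PRD must fit in 16 bits")
    if not 0 <= tddr <= 0xF:
        raise ValueError("prescaler value must fit in 4 bits")
    return (tddr + 1) * (prd + 1) / cpu_clock_hz


if __name__ == "__main__":
    # Example: 100 MHz clock, PRD = 9999, prescaler reload = 9.
    print(timer_interrupt_period(100e6, 9999, 9))
```

Under these assumptions, the longest period is 16 × 65536 clock cycles, which is where the timer's wide dynamic range comes from.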

