Speaker: Lung-Hao Chang
Advisor: Prof. Andy Wu
Graduate Institute of Electronics Engineering, National Taiwan University
Modified from National Chiao-Tung University IP Core Design course
Outline
ARM processor core
Memory hierarchy
Software development
Summary
09/21/2003
[Figure: ARM7TDMI organization. Blocks: EmbeddedICE, processor core, bus splitter, scan chain 0, scan chain 1, other signals.]
[Figure: ARM7TDMI core interface signals. Labelled groups: memory interface, bus control, debug, TAP information, JTAG controls.]
Debug: dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
TAP information: tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]
Boundary scan: drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs
JTAG controls: TRST, TCK, TMS, TDI, TDO
Memory interface
32-bit address A[31:0]
Bidirectional data bus D[31:0], with separate data out Dout[31:0] and data in Din[31:0]
seq indicates that the memory address will be sequential to the one used in the previous cycle
MMU interface
\trans (translation control): 0 = user mode, 1 = privileged mode
\mode[4:0]: bottom 5 bits of the CPSR (inverted)
Abort: disallow access
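As a rough illustration of how a memory management block might interpret these outputs, the sketch below undoes the inversion on the mode lines and looks up the processor mode. The mode encodings are the standard CPSR M[4:0] values from the ARM architecture; the function and parameter names are hypothetical and do not come from the slides.

```python
# CPSR mode encodings M[4:0] as defined by the ARM architecture.
CPSR_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_mmu_signals(ntrans: int, nmode: int) -> tuple[bool, str]:
    # ntrans: level on the translation-control output (0 = user, 1 = privileged).
    # nmode:  level on the five mode outputs, i.e. the inverted CPSR bits [4:0].
    mode_bits = ~nmode & 0x1F          # undo the inversion, keep 5 bits
    privileged = ntrans == 1
    return privileged, CPSR_MODES.get(mode_bits, "reserved")


if __name__ == "__main__":
    print(decode_mmu_signals(0, 0b01111))   # user-mode access
    print(decode_mmu_signals(1, 0b01100))   # privileged access
```

A real MMU would combine this information with the Abort input to reject accesses that the current mode is not allowed to make.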
State
T bit, whether the processor is currently executing ARM or Thumb instructions
Configuration
Bigend, big-endian or little-endian
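To make the effect of the Bigend configuration concrete, the following sketch shows how the same 32-bit word would be laid out at ascending byte addresses under each setting. It is only an illustration using Python's standard struct module; the function name is an assumption.

```python
import struct


def word_bytes(value: int, bigend: bool) -> list[int]:
    """Bytes of a 32-bit word at ascending addresses for a given Bigend setting."""
    fmt = ">I" if bigend else "<I"     # '>' big-endian, '<' little-endian
    return list(struct.pack(fmt, value))


if __name__ == "__main__":
    for setting in (False, True):
        layout = [hex(b) for b in word_bytes(0x11223344, setting)]
        print("Bigend =", int(setting), layout)
```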
Initialization
\reset: starts the processor from a known state, executing from address 0x00000000
ARM7TDMI characteristics
Memory Access
The ARM7 is a Von Neumann, load/store architecture, i.e.,
A single 32-bit data bus carries both instructions and data.
Only the load/store instructions (and SWP) access memory.
Memory is addressed as a 32-bit address space.
Data types can be 8-bit bytes, 16-bit half-words, or 32-bit words; memory may be seen as a byte line folded into 4-byte words.
Words must be aligned to 4-byte boundaries, and half-words to 2-byte boundaries.
Always ensure that the memory controller supports all three access sizes.
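The alignment rules above can be sketched as a small check. This is a minimal illustration, not part of any ARM tool; the helper names are hypothetical. The second function shows the "byte line folded into 4-byte words" view by splitting a byte address into a word index and a byte position within that word.

```python
# Access sizes in bytes for the three data types named above.
ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """True if the address is a multiple of the access size."""
    return address % ACCESS_SIZES[kind] == 0


def word_and_lane(address: int) -> tuple[int, int]:
    """Split a byte address into (word index, byte position within the word)."""
    return address >> 2, address & 0x3


if __name__ == "__main__":
    for addr in (0x1000, 0x1002, 0x1007):
        status = {kind: is_aligned(addr, kind) for kind in ACCESS_SIZES}
        print(hex(addr), status, word_and_lane(addr))
```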
Non-sequential (N cycle)
(nMREQ, SEQ) = (0, 0): the ARM core requests a transfer to or from an address that is unrelated to the address used in the preceding cycle.
Internal (I cycle)
(nMREQ, SEQ) = (1, 0): the ARM core does not require a transfer, as it is performing an internal function and no useful prefetching can be performed at the same time.
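A memory controller model could classify each bus cycle from the two control signals, as in the sketch below. The N and I encodings are the ones given above; the S (sequential) and C (coprocessor register transfer) encodings are not in this excerpt and are taken from the ARM7TDMI documentation, so treat them as an assumption here. The function names are hypothetical.

```python
# (nMREQ, SEQ) -> cycle type.
CYCLE_TYPES = {
    (0, 0): "N",  # non-sequential, as described above
    (1, 0): "I",  # internal, as described above
    (0, 1): "S",  # sequential (assumed from ARM7TDMI documentation)
    (1, 1): "C",  # coprocessor register transfer (same assumption)
}


def cycle_type(nmreq: int, seq: int) -> str:
    return CYCLE_TYPES[(nmreq, seq)]


def count_cycles(trace):
    """Count cycle types in a trace of (nMREQ, SEQ) samples."""
    counts = {"N": 0, "S": 0, "I": 0, "C": 0}
    for nmreq, seq in trace:
        counts[cycle_type(nmreq, seq)] += 1
    return counts


if __name__ == "__main__":
    # One non-sequential access, two sequential ones, then an internal cycle.
    print(count_cycles([(0, 0), (0, 1), (0, 1), (1, 0)]))
```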
[Figure: block diagram. Labels: virtual address, MMU, write buffer.]
ARM710T
8 KB unified write-through cache
Full memory management unit supporting virtual memory
Write buffer
ARM720T
As ARM710T, but with WinCE support
ARM740T
8 KB unified write-through cache
Memory protection unit
Write buffer
CPU Core
Consists of the ARM processor core and some tightly coupled function blocks
Cache and memory management blocks
E.g.: ARM710T, ARM720T, ARM740T, ARM920T, ARM922T, ARM940T, ARM946E-S, and ARM966E-S
[Figure: ARM710T organization. Labels: ARM7TDMI, EmbeddedICE & JTAG, MMU, CP15, virtual address, physical address.]
ARM8
Higher performance than ARM7
By increasing the clock rate
By reducing the CPI
Higher memory bandwidth: 64-bit wide memory (ARM8)
Separate memories for instruction and data accesses (ARM9TDMI, ARM10TDMI)
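The two levers named above, clock rate and CPI, combine through the usual relation: execution time = instruction count × CPI / clock frequency. The sketch below works this through. The numbers in the example are made up for illustration and are not measurements of any ARM core.

```python
def execution_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """Execution time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


def mips(cpi: float, clock_mhz: float) -> float:
    """Millions of instructions per second for a given CPI and clock in MHz."""
    return clock_mhz / cpi


if __name__ == "__main__":
    # Hypothetical baseline core and two improved variants.
    print(mips(cpi=2.0, clock_mhz=50.0))    # baseline
    print(mips(cpi=2.0, clock_mhz=100.0))   # faster clock only
    print(mips(cpi=1.5, clock_mhz=100.0))   # faster clock and lower CPI
```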
Core Organization
The prefetch unit is responsible for fetching instructions from memory and buffering them, exploiting the double-bandwidth memory.
It is also responsible for branch prediction, using static prediction based on the branch direction (backward: predicted taken; forward: predicted not taken).
[Figure: ARM8 core organization. Blocks: prefetch unit, integer unit, coprocessor(s), double-bandwidth memory. Signals: addresses, PC, instructions, CPinst., CPdata.]
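The static prediction rule described for the prefetch unit (backward branches predicted taken, forward branches predicted not taken) can be sketched as follows. This is a behavioural illustration only, not a model of the ARM8 hardware; the function names and the sample branch addresses are hypothetical.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static rule: a backward branch (target below the branch) is predicted taken."""
    return target_pc < branch_pc


def accuracy(branches) -> float:
    """branches: iterable of (branch_pc, target_pc, actually_taken)."""
    branches = list(branches)
    correct = sum(predict_taken(pc, tgt) == taken for pc, tgt, taken in branches)
    return correct / len(branches)


if __name__ == "__main__":
    # A loop-closing backward branch: taken 9 times, then falls through once.
    loop = [(0x120, 0x100, True)] * 9 + [(0x120, 0x100, False)]
    # A forward branch over an error path that is never taken.
    skip = [(0x110, 0x140, False)] * 10
    print(accuracy(loop + skip))
```

The rule works well for loops because the loop-closing branch is mispredicted only on the final iteration.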
Pipeline Organization
5-stage pipeline: the prefetch unit occupies the first stage, and the integer unit occupies the remainder
(1) Instruction prefetch (prefetch unit)
(2) Instruction decode and register read (integer unit)
(3) Execute (shift and ALU) (integer unit)
(4) Data memory access (integer unit)
(5) Write back results (integer unit)
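To see how the five stages overlap, the sketch below builds the stage occupancy table for an idealized pipeline with no stalls, where instruction i enters stage s in cycle i + s. Under that assumption, n instructions finish in 5 + n - 1 cycles. The stage names are shortened from the list above and the function names are hypothetical.

```python
STAGES = ["prefetch", "decode", "execute", "memory", "write-back"]


def schedule(n_instructions: int) -> dict:
    """Return {cycle: {stage: instruction index}} for a stall-free pipeline."""
    table = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s, {})[stage] = i
    return table


def total_cycles(n_instructions: int) -> int:
    """Cycles needed to drain n instructions through the pipeline, ignoring stalls."""
    return len(STAGES) + n_instructions - 1 if n_instructions else 0


if __name__ == "__main__":
    for cycle, occupancy in sorted(schedule(4).items()):
        print(cycle, occupancy)
    print("total cycles:", total_cycles(4))
```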
[Figure: pipeline datapath. Labels: register read, coprocessor data, multiplier, execute, memory.]
ARM8 Macrocell
ARM810
8 KB unified instruction and data cache
Copy-back
Double-bandwidth
MMU
Coprocessor
Write buffer
[Figure: ARM810 organization. Labels: prefetch unit, virtual address, CP15, JTAG, write buffer, physical address, MMU, address buffer, data in, data out, address.]
StrongARM
The first ARM processor to use a modified-Harvard architecture (separate instruction and data caches); now available from Intel
Features
A 5-stage pipeline with register forwarding
Single-cycle execution of all common instructions except 64-bit multiplies
Instruction cache and copy-back data cache
Write buffer
Pseudo-static operation with low power consumption
[Figure: StrongARM core pipeline. Stages: fetch, register read, execute, buffer/data, register write. Labels: I-cache, +4, + disp, branch target (B, BL, MOV pc, LDM/STM), post-index, pre-index, reg shift, forwarding paths, mux, SUBS pc, load/store address, write-back.]
StrongARM Processor
SA-1110/SA-1111
Intel SA-1 core
16-Kbyte instruction and 8-Kbyte data cache
MMU, read and write buffers
512-byte mini-data cache
ARM9TDMI
Harvard architecture
Increases available memory bandwidth
Instruction memory interface
Data memory interface
ARM9TDMI Organization
[Figure: ARM9TDMI pipeline organization. Stages: fetch, register read, execute, D-cache access, register write. Labels: next pc, +4, I-cache, pc + 4, mul, LDM/STM, post-index, pre-index, shift, ALU, reg shift, forwarding paths, mux, B, BL, MOV pc, SUBS pc, load/store address, D-cache, rot/sgn ex, LDR pc, write-back.]
[Figure: pipeline stage comparison]
Decode: Thumb decompress, ARM decode, register read
Execute: shift/ALU, register write
ARM9TDMI:
Fetch: instruction fetch
Decode: register read, decode
Execute: shift/ALU
Memory: data memory access
Write: register write
The ARM9TDMI pipeline is much tighter and does not have sufficient slack time to allow Thumb instructions to be first translated into ARM instructions and then decoded.
It has hardware to decode both ARM and Thumb instructions directly.
ARM Platform Design, SOC Consortium Course Material
On-chip debugger
Additional features compared to ARM7TDMI
Hardware single stepping
Breakpoints can be set on exceptions
ARM9TDMI characteristics
ARM922T
ARM9TDMI
8 KB instruction cache, 8 KB data cache
Full Memory Management Unit, Write Buffer
ARM940T
ARM9TDMI
4 KB instruction cache, 4 KB data cache
Protection Unit
Architecture v5TE
ARM946E-S
ARM10TDMI (1/2)
High-end ARM processor core
Performance on the same IC process: ARM10TDMI is about 2x ARM9TDMI, which is about 2x ARM7TDMI
[Figure: ARM10TDMI pipeline. Stages: Fetch, Issue, Decode, Execute, Memory, Write. Labels: decode, shift/ALU, multiply.]
ARM10TDMI (2/2)
Reduce CPI
Branch prediction
Non-blocking load and store execution
64-bit data memory: transfer 2 registers in each cycle
ARM1020T Overview
Architecture v5T
ARM1020E will be v5TE
CPI ~ 1.3
6-stage pipeline
Static branch prediction
32 KB instruction and 32 KB data caches
hit-under-miss support
64 bits per cycle LDM/STM operations
EmbeddedICE Logic RT-II
Support for new VFPv1 architecture
ARM10200 test chip: ARM1020T, VFP10, SDRAM memory interface, PLL
ARM1136J(F)-S
First Implementations of ARMv6 Architecture
ARM1136J-S: integer-only core
ARM1136JF-S: with integrated floating point
8 pipeline stages
Availability
Delivering to first licensees in December 2002
The ARM11 core has been developed and integrated in parallel with the ARM11 PrimeXsys Platform to ensure a fully compatible, high performance, extendable system solution
Memory Hierarchy
[Figure: memory hierarchy. Registers at the top (small, fast, expensive), then main memory, then hard disk at the bottom (large capacity, cheap).]
Caches (1/2)
A cache memory is a small, very fast memory that retains copies of recently used memory values. It is usually implemented on the same chip as the processor.
Caches work because programs normally display the property of locality: at any particular time they tend to execute the same instructions many times on the same areas of data.
An access to an item that is in the cache is called a hit, and an access to an item that is not in the cache is a miss.
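The effect of locality on hits and misses can be seen with a very small cache model. The sketch below assumes a direct-mapped organization with 8 lines of 16 bytes, chosen only to keep the example short; the slides do not specify the organization, and the class and method names are hypothetical.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model that only tracks hits and misses."""

    def __init__(self, n_lines: int = 8, line_bytes: int = 16):
        self.n_lines = n_lines
        self.line_bytes = line_bytes
        self.tags = [None] * n_lines   # tag stored in each line, None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        block = address // self.line_bytes   # which memory block
        index = block % self.n_lines         # which cache line it maps to
        tag = block // self.n_lines          # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag               # fill the line on a miss
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = DirectMappedCache()
    # A four-instruction loop body at 0x100..0x10C executed ten times.
    for _ in range(10):
        for pc in range(0x100, 0x110, 4):
            cache.access(pc)
    print("hits:", cache.hits, "misses:", cache.misses)
```

Only the first fetch of the loop body misses; every later fetch finds the line already in the cache, which is the behaviour locality predicts.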
Caches (2/2)
A processor can have one of the following two organizations:
A unified cache
This is a single cache for both instructions and data
Separate instruction and data caches
[Figure: cache organization diagrams. Labels: cache, address, instructions, data, memory.]
Copy-back (write-back)
Not kept coherent with main memory
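The difference between a copy-back cache and a write-through cache can be sketched with a one-line cache model that counts writes to main memory. This is a simplified illustration under stated assumptions (a single cache line, no reads, no write buffer); the class and method names are hypothetical.

```python
class TinyCache:
    """One-line cache model that counts main-memory writes."""

    def __init__(self, copy_back: bool):
        self.copy_back = copy_back
        self.memory = {}          # main memory contents
        self.line = None          # (address, value) currently cached
        self.dirty = False
        self.memory_writes = 0

    def _write_memory(self, address: int, value: int) -> None:
        self.memory[address] = value
        self.memory_writes += 1

    def write(self, address: int, value: int) -> None:
        if self.line is not None and self.line[0] != address:
            self.flush()                      # evict the old line first
        self.line = (address, value)
        if self.copy_back:
            self.dirty = True                 # memory is now stale
        else:
            self._write_memory(address, value)  # write-through

    def flush(self) -> None:
        if self.copy_back and self.dirty:
            self._write_memory(*self.line)
        self.dirty = False


if __name__ == "__main__":
    for policy in (False, True):
        cache = TinyCache(copy_back=policy)
        for v in range(5):
            cache.write(0x40, v)
        print("copy_back =", policy, "memory writes before flush:", cache.memory_writes)
```

With copy-back, main memory holds a stale value until the line is flushed or evicted, which is why the cache is described as not coherent with main memory.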
Software Development
ARM C Compiler
The compiler is compliant with the ANSI standard for C
Supported by the appropriate library of functions
Uses the ARM Procedure Call Standard (APCS) for all external functions
For procedure entry and exit
ARM Linker
Takes one or more object files and combines them
Resolves symbolic references between the object files and extracts the object modules from libraries
Normally the linker includes debug tables in the output file
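The linking steps listed above can be illustrated with a toy model: combine the given object files, then pull in only those library modules needed to resolve outstanding symbol references. This is a conceptual sketch and says nothing about how armlink is implemented; the data layout and the file and symbol names are all hypothetical.

```python
def link(objects, libraries=()):
    """Each module is a dict: {"name": str, "defines": [symbols], "refs": [symbols]}.

    Returns (names of modules in the output, {symbol: defining module}).
    """
    modules = list(objects)
    defined = {}

    def add(module):
        for sym in module["defines"]:
            if sym in defined:
                raise ValueError(f"duplicate symbol: {sym}")
            defined[sym] = module["name"]

    for module in modules:
        add(module)

    pending = [ref for module in modules for ref in module["refs"]]
    while pending:
        sym = pending.pop()
        if sym in defined:
            continue                      # already resolved
        provider = next((lib for lib in libraries if sym in lib["defines"]), None)
        if provider is None:
            raise KeyError(f"undefined symbol: {sym}")
        modules.append(provider)          # extract the module from the library
        add(provider)
        pending.extend(provider["refs"])  # it may need further modules
    return [m["name"] for m in modules], defined


if __name__ == "__main__":
    objs = [
        {"name": "main.o", "defines": ["main"], "refs": ["printf", "helper"]},
        {"name": "util.o", "defines": ["helper"], "refs": []},
    ]
    lib = [
        {"name": "printf.o", "defines": ["printf"], "refs": ["putc"]},
        {"name": "putc.o", "defines": ["putc"], "refs": []},
        {"name": "unused.o", "defines": ["qsort"], "refs": []},
    ]
    print(link(objs, lib))
```

Library modules that nothing refers to, such as the hypothetical unused.o here, are left out of the output.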
ARM Integrator
A motherboard with some extensions to support the development of applications
Provides core modules, logic modules (Xilinx Virtex FPGA, Altera APEX FPGA), OS, input/output resources, bus arbitration, interrupt handling
Summary (1/2)
ARM Processor Family
Processor family: ARM6, ARM7, ARM8, ARM9, ARM10, StrongARM, ARM11
Number of pipeline stages: 3, 3, 5, 5, 6, 5, 8
Memory organization: Von Neumann, Von Neumann, Von Neumann, Harvard, Harvard, Harvard
Clock rate: 25 MHz, 66 MHz, 72 MHz, 200 MHz, 400 MHz, 233 MHz
MIPS/MHz: 0.9, 1.2, 1.1, 1.25, 1.15, 1.2
Summary (2/2)
Memory hierarchy
Unified cache / separate instruction and data caches
Write-through with buffered write
Software Development
CodeWarrior IDE
armcc/tcc/armcpp/tcpp
armasm
armlink
armprof
ARMulator
ARM Integrator
References
[1] http://twins.ee.nctu.edu.tw/courses/ip_core_02/index.html
[2] S. Furber, ARM System-on-Chip Architecture, Second Edition, Addison Wesley Longman, ISBN 0-201-67519-6.
[3] D. Seal (ed.), ARM Architecture Reference Manual, Second Edition, Addison Wesley Longman, ISBN 0-201-73719-1.
[4] www.arm.com