
INTRODUCTION

ARM is a RISC processor.


It is used in applications that need small size and high performance.
Its simple architecture gives low power consumption.

ARM

System - On

TIMELINE (1/2)

1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: A collaboration between Acorn and Apple leads to the founding of Advanced RISC Machines (A.R.M.).
1991: ARM6, the first embeddable RISC microprocessor.
1992-1994: Various companies use ARM (Sharp, Samsung), while in 1993 ARM7, the first multimedia microprocessor, is introduced.


TIMELINE (2/2)

1995: Introduction of Thumb and ARM8.
1996-2000: Alcatel, Hyundai, Philips and Sony use ARM, while in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000-2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.


THE ARM
ARCHITECTURE

GENERAL INFO (1/2)


AIM: Simple design

Load-store architecture
32-bit data bus
3 addressing modes


GENERAL INFO (2/2)


Simple architecture + simple instruction set + code density
=> small size and low power consumption

Registers

37 registers in total: 31 general-purpose 32-bit registers and 6 status registers (the CPSR plus 5 SPSRs)
7 modes of operation
Each mode sees a different set of registers and has a different level of control over the CPSR.


ARM Programming Model


Usable in user mode: r0-r14, r15 (PC), CPSR

Banked registers (system modes only):

Mode         Banked registers      Saved status
fiq          r8_fiq-r14_fiq        SPSR_fiq
svc          r13_svc, r14_svc      SPSR_svc
abort        r13_abt, r14_abt      SPSR_abt
irq          r13_irq, r14_irq      SPSR_irq
undefined    r13_und, r14_und      SPSR_und

ARM CPSR format

bits 31-28    N Z C V
bits 27-8     unused
bit 7         I
bit 6         F
bit 5         T
bits 4-0      mode

N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)
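The CPSR layout can be illustrated with a short Python sketch that splits a 32-bit value into its fields. This is only a sketch: the mode encodings in `MODES` are the usual ARM values and are an assumption here, since the slide shows only a 5-bit mode field, and `decode_cpsr` is a hypothetical helper name.

```python
# Mode encodings are assumed (standard ARM values); the slide only shows
# that the mode occupies bits 4-0.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into condition flags, control bits and mode."""
    return {
        "N": (cpsr >> 31) & 1,   # Negative
        "Z": (cpsr >> 30) & 1,   # Zero
        "C": (cpsr >> 29) & 1,   # Carry
        "V": (cpsr >> 28) & 1,   # Overflow
        "I": (cpsr >> 7) & 1,    # bit 7
        "F": (cpsr >> 6) & 1,    # bit 6
        "T": (cpsr >> 5) & 1,    # bit 5
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }

fields = decode_cpsr(0x600000D3)
print(fields)
```

The sample value 0x600000D3 has Z and C set, I and F set, T clear and the mode field equal to 10011.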

Memory Organization
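Address bus: 32 bits
1 word = 32 bits

[Figure: byte-addressed memory drawn as 32-bit rows, bit 31 on the left and bit 0 on the right. From the bottom row up: byte3, byte2, byte1, byte0 at byte addresses 3 to 0; byte6 and half-word4; word8; half-word14 and half-word12; word16. Byte addresses run up to 23.]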


Instruction Set

Three instruction types

Data processing
Data transfer
Control flow


Supervisor mode

In user mode, operations outside the user's privileges are handled by the operating system.
Using supervisor calls, a user program enters system level, where system functions can be performed.


I/O System

ARM handles peripherals as memory-mapped devices with interrupt support.
Interrupts:

IRQ: normal interrupt
FIQ: fast interrupt


Exceptions

Exceptions:

Interrupts
Supervisor Call
Traps

When an exception takes place:

The value of the PC is copied to r14_exc and the CPSR is saved in SPSR_exc.
The operating mode changes to the respective exception mode.
The PC is set to the exception handler's vector address.
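The exception entry steps can be sketched as a small Python model. This is a simplified illustration, not a description of any particular core: the vector addresses and mode suffixes in `EXCEPTIONS` are the usual ARM values and are assumptions (the slides do not list them), the exact return address saved in r14 is passed in by the caller, and `take_exception` is a hypothetical name.

```python
# (banked-register suffix, vector address) per exception type.
# These values are assumed; they are not given on the slides.
EXCEPTIONS = {
    "undefined_instruction": ("und", 0x04),
    "supervisor_call":       ("svc", 0x08),
    "prefetch_abort":        ("abt", 0x0C),
    "data_abort":            ("abt", 0x10),
    "irq":                   ("irq", 0x18),
    "fiq":                   ("fiq", 0x1C),
}

def take_exception(cpu, kind, return_address):
    """Apply the exception entry steps to a dictionary-based CPU model."""
    mode, vector = EXCEPTIONS[kind]
    cpu["r14_" + mode] = return_address       # PC value copied to r14_exc
    cpu["spsr_" + mode] = dict(cpu["cpsr"])   # old CPSR saved in SPSR_exc
    cpu["cpsr"]["mode"] = mode                # switch to the exception mode
    cpu["pc"] = vector                        # PC takes the vector address
    return cpu

cpu = {"pc": 0x8000, "cpsr": {"mode": "usr", "flags": "nzcv"}}
take_exception(cpu, "supervisor_call", return_address=0x8004)
print(cpu)
```

Because each exception mode has its own r14 and SPSR, the handler can later restore the interrupted program's PC and CPSR without disturbing the user-mode registers.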



THE ARM
INSTRUCTION SET

Data Processing Instructions (1/2)

Arithmetic Operations
ADD r0, r1, r2 ; r0 := r1 + r2, flags not updated
ADDS r0, r1, r2 ; r0 := r1 + r2, flags updated

Logical Operations
AND r0, r1, r2 ; r0:= r1 AND r2

Register Movement
MOV r0, r2 ; r0 := r2

Comparison
CMP r1, r2 ; set flags on r1 - r2, result discarded
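The difference between ADD and ADDS is only whether the condition flags are updated. As a rough illustration, the following Python sketch computes what a 32-bit ADDS would produce; `adds` is a hypothetical helper, and registers are modelled as plain integers.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Return (result, flags) for a 32-bit addition that updates N, Z, C, V."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, used to detect carry out
    result = full & MASK32
    flags = {
        "N": result >> 31,                              # sign bit of the result
        "Z": int(result == 0),                          # result is zero
        "C": int(full > MASK32),                        # unsigned carry out
        # Signed overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

print(adds(0x7FFFFFFF, 1))
print(adds(0xFFFFFFFF, 1))
```

The first call overflows the signed range (N and V set); the second wraps around to zero (Z and C set).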


Data Processing Instructions (2/2)

Operands:

Immediate operands
ADD r3, r3, #1 ; r3 := r3 + 1

Shifted register operands:
ADD r3, r2, r1, LSL #3 ; r3 := r2 + 8 * r1

Miscellaneous data processing instructions:

Multiplication:
MUL r4, r3, r2 ; r4 := r3 * r2 (least significant 32 bits)


Data transfer instructions

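Load and store instructions:

LDR r0, [r1] ; r0 := mem32[r1]
STR r0, [r1] ; mem32[r1] := r0
Offset: LDR r0, [r1,#4] ; r0 := mem32[r1 + 4], r1 unchanged
Post indexed: LDR r0, [r1], #16 ; r0 := mem32[r1], then r1 := r1 + 16
Auto indexed: LDR r0, [r1,#16]! ; r0 := mem32[r1 + 16], then r1 := r1 + 16

Multiple data transfers:

LDMIA r1, {r0,r2,r5} ; r0 := mem32[r1], r2 := mem32[r1 + 4], r5 := mem32[r1 + 8]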

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1, #4]!


POST:

r0 = 0x02020202
r1 = 0x00009004

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1, #4]


POST:

r0 = 0x02020202
r1 = 0x00009000

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1], #4


POST:

r0 = 0x01010101
r1 = 0x00009004
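The three LDR examples above differ only in which address is used for the load and whether the base register is written back. A minimal Python sketch, assuming memory is modelled as a dictionary of word addresses (`ldr` and `fresh` are hypothetical helpers), reproduces the POST states:

```python
def ldr(regs, mem, rd, rn, offset=0, mode="offset"):
    """Model LDR rd, [rn, #offset] in one of three addressing modes.

    mode "offset": [rn, #offset]   base register unchanged
    mode "pre":    [rn, #offset]!  load from rn + offset, then write back
    mode "post":   [rn], #offset   load from rn, then write back rn + offset
    """
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset
    return regs

def fresh():
    """The PRE state shared by the three examples."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    return regs, mem

for mode in ("pre", "offset", "post"):
    regs, mem = fresh()
    ldr(regs, mem, "r0", "r1", offset=4, mode=mode)
    print(mode, hex(regs["r0"]), hex(regs["r1"]))
```

Only the post-indexed form loads from the unmodified base address, which is why it is the one that returns 0x01010101.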

Examples
PRE:

mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010

LDMIA r0!, {r1-r3}

POST:

r0 = 0x0008001c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003


Examples
PRE:

mem32[0x8001c] = 0x04
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010

LDMIB r0!, {r1-r3}

POST:

r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
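The LDMIA and LDMIB examples can be checked with a short Python sketch. It assumes word-sized transfers, registers loaded in ascending register-number order, and a base register that is not in the register list; `ldm` is a hypothetical helper.

```python
def ldm(regs, mem, rn, reglist, before=False, writeback=True):
    """Model LDMIA (before=False) or LDMIB (before=True) with optional '!'."""
    address = regs[rn]
    # The lowest-numbered register is loaded from the lowest address.
    for name in sorted(reglist, key=lambda r: int(r[1:])):
        if before:
            address += 4          # increment before each transfer
        regs[name] = mem[address]
        if not before:
            address += 4          # increment after each transfer
    if writeback:
        regs[rn] = address
    return regs

mem = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}

ia = ldm({"r0": 0x80010}, mem, "r0", ["r1", "r2", "r3"], before=False)
ib = ldm({"r0": 0x80010}, mem, "r0", ["r1", "r2", "r3"], before=True)
print(ia)
print(ib)
```

Both forms leave r0 at 0x8001c, but LDMIB skips the word at the original base address, so its loaded values are shifted by one word.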


Conditional execution
Instructions can be executed
conditionally without branches
CMP r2, r3 ; subtract and set flags
ADDGE r4, r5, r6 ; if r2 >= r3 (signed)
SUBLT r4, r5, r6 ; else (r2 < r3)
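The GE and LT conditions are evaluated from the N and V flags left by CMP. The following Python sketch models that sequence; the flag equations are the standard ARM definitions (stated here as an assumption, since the slide with the mnemonic table is not reproduced), and `cmp_flags` and `slide_example` are hypothetical names.

```python
M = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags set by CMP a, b, which computes a - b and discards the result."""
    a &= M
    b &= M
    result = (a - b) & M
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                       # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,    # signed overflow
    }

CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GE": lambda f: f["N"] == f["V"],   # signed greater than or equal
    "LT": lambda f: f["N"] != f["V"],   # signed less than
}

def slide_example(r2, r3, r5, r6):
    """CMP r2, r3 / ADDGE r4, r5, r6 / SUBLT r4, r5, r6; returns r4."""
    flags = cmp_flags(r2, r3)
    r4 = None
    if CONDITIONS["GE"](flags):
        r4 = (r5 + r6) & M
    if CONDITIONS["LT"](flags):
        r4 = (r5 - r6) & M
    return r4

print(slide_example(5, 3, 10, 4))
print(slide_example(3, 5, 10, 4))
```

Exactly one of the two conditional instructions takes effect, so the pair behaves like an if/else without any branch.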


Conditional execution
mnemonics


Control flow instructions

Branch instruction: B label
Conditional branch: BNE label
Branch and Link: BL label

        BL      Loop            ; call Loop, return address saved in r14
        ...
Loop    ...
        MOV     PC, r14         ; return to the instruction after the BL


Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start
        MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file


Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd
        ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file


ARM ORGANIZATION AND IMPLEMENTATION

3-stage pipeline (ARM7, 80 MHz): Fetch, Decode, Execute
Throughput: 1 instruction / cycle

[Figure: 3-stage ARM datapath. Address register and incrementer driving A[31:0], register bank including the PC, instruction decode and control, multiply register, barrel shifter, ALU, and the data in and data out registers on D[31:0], connected by the A, B and ALU buses.]

5 stage pipeline (1/2)

Program execution time:

$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$

Ways to reduce $T_{prog}$:

Increase $f_{clk}$: logic simplification
Reduce CPI: reduce the number of multicycle instructions.
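The execution time formula can be tried out with a few lines of Python. The instruction count and CPI values below are made up for illustration; only the 80 MHz and 150 MHz clock rates come from the pipeline slides, and `execution_time` is a hypothetical helper.

```python
def execution_time(n_inst, cpi, f_clk_hz):
    """T_prog = N_inst * CPI / f_clk, in seconds."""
    return n_inst * cpi / f_clk_hz

# Hypothetical workload: one million instructions.
n_inst = 1_000_000

# Assumed CPI values, chosen only to show the effect of each term.
t_3_stage = execution_time(n_inst, cpi=2.0, f_clk_hz=80e6)
t_5_stage = execution_time(n_inst, cpi=1.5, f_clk_hz=150e6)

print(f"3-stage: {t_3_stage * 1e3:.2f} ms")
print(f"5-stage: {t_5_stage * 1e3:.2f} ms")
```

Raising the clock rate and lowering the CPI both shrink the numerator-to-denominator ratio, which is the motivation for moving from a 3-stage to a 5-stage pipeline.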


5 stage pipeline (2/2)

5-stage pipeline (ARM9, 150 MHz): Fetch, Decode, Execute, Buffer / Data, Write Back

[Figure: 5-stage datapath. Fetch: I-cache, next pc, +4 incrementer. Decode: instruction decode, register read, immediate fields. Execute: shift, ALU, multiplier, forwarding paths, pre-index and post-index address paths. Buffer / data: D-cache, load/store address, byte replication, rotate / sign extend. Write back: register write.]

ARM coprocessor interface

ARM supports up to 16 coprocessors, which can be software emulated.
Each coprocessor has up to 16 general-purpose registers.
ARM is a load and store architecture.
Coprocessors usually handle on-chip functions, such as cache and memory management.

ARCHITECTURAL SUPPORT
FOR HIGH LEVEL
LANGUAGES

Floating-point accelerator (1/2)

For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator.
FPA10 includes:

Coprocessor interface
Load / store unit
Register bank (8 registers, 80 bits each)
ALU (adder, mult, div)


Floating-point accelerator (2/2)

[Figure: FPA10 organization. Coprocessor interface and coprocessor handshake to the ARM, instruction issuer, pipeline control, load/store unit on the data bus, register bank, and arithmetic unit (add, mult, div).]

APCS (1/2)

APCS (ARM Procedure Call Standard) is a set of rules concerning C procedure input and output.
Specific use of general purpose registers (r0-r3: arguments, r4-r8: variables, r10: stack limit, etc.).
Procedure I/O:

        BL      Loop            ; call
        ...
Loop    ...
        MOV     pc, lr          ; return


APCS (2/2)
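C code:

void f1(int a)
{
    f2(a);
}

Assembly code (written assuming a full ascending stack, as the SUB in the original implies):

f1      LDR     r0, [r13]           ; load a from the top of the stack
        STR     r14, [r13, #4]!     ; push the return address
        STR     r0, [r13, #4]!      ; push a as the argument of f2
        BL      f2
        SUB     r13, r13, #4        ; discard the argument
        LDR     r15, [r13], #-4     ; pop the return address into the pc

[Figure: stack layout with offsets 0, 4, 8 and 16 and the stack pointer.]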


THUMB PROGRAMMERS
MODEL

General information

Thumb objective: code density.
Thumb has a 16-bit instruction set.
A subset of the ARM instruction set is encoded in a 16-bit space.
With appropriate use, great benefits can be achieved in terms of:

Power efficiency
Enhanced performance

Going in and out of Thumb mode

Using the BX instruction in ARM state, e.g. BX r0.
Instructions are assembled as 16-bit instructions with the appropriate directive.
If r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
Using the BX instruction from Thumb state, with bit 0 of the register clear, we return to ARM state.
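The effect of BX on the T bit and the PC can be sketched in Python as follows. This is a simplified model that ignores alignment checks for ARM-state targets; `bx` is a hypothetical helper name.

```python
def bx(rm):
    """Model BX Rm: bit 0 selects the state, the remaining bits give the PC."""
    t_bit = rm & 1                    # 1 selects Thumb state, 0 selects ARM state
    pc = rm & 0xFFFFFFFE              # target address with bit 0 cleared
    return {"T": t_bit, "pc": pc}

to_thumb = bx(0x00008009)   # e.g. the address of a Thumb label plus 1
to_arm = bx(0x00008000)
print(to_thumb, to_arm)
```

This is why code that enters Thumb state typically computes its target as a label address plus 1 before executing BX.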


The Thumb programmer's model

Thumb registers:

Lo registers: r0-r7
Hi registers: r8-r12, SP (r13), LR (r14), PC (r15)
CPSR

Shaded registers in the original figure have restricted access.


ARM vs. Thumb (1/3)

Thumb:

Up to 70% code size reduction
40% more instructions
45% faster code with 16-bit memory
Requires about 30% less external memory

ARM:

40% faster code when coupled with a 32-bit memory

ARM vs. Thumb (2/3)

If performance is critical: ARM
If cost and power consumption are critical: Thumb

ARM and Thumb interaction

A 32-bit ARM system can go into Thumb mode for specific routines, in order to meet power and memory constraints.
A 16-bit system can use an on-chip 32-bit memory for ARM state routines, and a 16-bit off-chip memory and Thumb code for the rest of the application.


Example 3

        AREA    ThumbSub, CODE, READONLY    ; Name this block of code
        ENTRY                               ; Mark first instruction to execute
        CODE32                              ; Subsequent instructions are ARM
header  ADR     r0, start + 1               ; Processor starts in ARM state,
        BX      r0                          ; so small ARM code header used
                                            ; to call Thumb main program
        CODE16                              ; Subsequent instructions are Thumb
start
        MOV     r0, #10                     ; Set up parameters
        MOV     r1, #3
        BL      doadd                       ; Call subroutine
stop
        MOV     r0, #0x18                   ; angel_SWIreason_ReportException
        LDR     r1, =0x20026                ; ADP_Stopped_ApplicationExit
        SWI     0xAB                        ; Thumb semihosting SWI
doadd
        ADD     r0, r0, r1                  ; Subroutine code
        MOV     pc, lr                      ; Return from subroutine
        END                                 ; Mark end of file


Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which is more efficient in terms of execution time and which in terms of code size?

if r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
else
    r3 = r4 - r5
    r6 = r4 + r5


Example 5

Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged.
Test it using 0xAD as input data.
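When testing a solution to this exercise it helps to know the expected result in advance. The Python sketch below computes it with the same two steps an assembly solution would probably use (an OR with one mask, then a bit clear with another); it is a reference check only, not the assembly answer, and the names `SET_MASK`, `CLEAR_MASK` and `transform` are illustrative.

```python
SET_MASK = 0b00111000     # bits 3 to 5
CLEAR_MASK = 0b00000111   # bits 0 to 2

def transform(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK                  # corresponds to an ORR with the set mask
    value &= ~CLEAR_MASK & 0xFFFFFFFF  # corresponds to a BIC with the clear mask
    return value

expected = transform(0xAD)
print(hex(expected))  # 0xb8
```

For the test input, 0xAD (1010 1101) becomes 0xB8 (1011 1000): bits 6 and 7 are untouched, bits 3 to 5 are all ones, and bits 0 to 2 are all zeros.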


ARCHITECTURAL
SUPPORT FOR SYSTEM
DEVELOPMENT

The ARM memory interface

[Figure: a basic ARM memory system. The ARM drives A[31:0] and D[31:0]; control logic generates ROMoe, RAMoe and RAMwe0 to RAMwe3. Four byte-wide SRAMs are addressed by A[n+2:2] and four byte-wide ROMs by A[m+2:2]; the D[7:0] pins of each device connect to one byte lane of the data bus: D[31:24], D[23:16], D[15:8] or D[7:0].]

AMBA (1/4)

Advanced Microcontroller Bus Architecture:

Advanced High Performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)

AMBA objectives:
Technology independence
To encourage modular system design

AMBA (2/4)

A typical AMBA based system


AMBA (3/4)

AHB bus

Burst transaction
Split transaction
Data bus: 64 to 128 bit

[Figure: AHB interconnect with an arbiter, masters 1 to 3, slaves 1 to 3, a decoder, and the address, write data and read data paths.]

AMBA (4/4)

AMBA Design Kit (ADK)

An environment that assists designers in developing AMBA-based components and SoC designs.

Signal Processing Support (1/2)

Piccolo DSP coprocessor.
Various data memories for maximizing throughput.


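Signal Processing Support (2/2)

[Figure: Piccolo organization. The ARM7TDMI with its I-cache, and the Piccolo coprocessor containing decode and control, input buffer, register bank, ALU, multiplier and output buffer; both connect to AMBA through AMBA interfaces.]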

MEMORY HIERARCHY

Memory hierarchy

Moving down the hierarchy: larger size, lower speed.

Memory type       Size              Speed
Registers         32 bit            A few nsec
On-chip cache     8-32 kbytes       10 nsec
Off-chip cache    100-200 kbytes    10-30 nsec
RAM               Mbytes            100 nsec

On chip memory

Necessary for performance.
Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.

Cache types

Cache types:

Unified cache.
Separate instruction and data caches.

Performance: hit rate $h$, miss rate $1 - h$

$t_{av} = h \, t_{cache} + (1 - h) \, t_{main}$

Compulsory miss: the first time an address is accessed
Capacity miss: when the cache is full
Conflict miss: two addresses compete for the same place in the cache
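The average access time formula is easy to explore numerically. In the Python sketch below the 10 nsec cache and 100 nsec main memory times are taken from the memory hierarchy table, while the hit rates are assumed values chosen only for illustration; `average_access_time` is a hypothetical helper.

```python
def average_access_time(hit_rate, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main (same time unit in and out)."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_main

T_CACHE_NS = 10.0    # on-chip cache
T_MAIN_NS = 100.0    # RAM

for h in (0.80, 0.95, 0.99):   # assumed hit rates
    t_av = average_access_time(h, T_CACHE_NS, T_MAIN_NS)
    print(f"h = {h:.2f}: t_av = {t_av:.1f} nsec")
```

Even a few percent of misses matters: with these times, dropping from a 99% to a 95% hit rate raises the average access time from about 10.9 nsec to about 14.5 nsec.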


Replacement policy and implementation

Replacement policy:
Least Recently Used (LRU)
Least Frequently Used (LFU)
Data prediction

Implementation:
Fully-associative
Direct-mapped
Set-associative

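Direct mapped cache (1/2)

[Figure: direct-mapped cache. The address indexes a tag RAM and a data RAM; the stored tag is compared with the tag bits of the address to produce hit, and a mux selects the data. Each line of data is stored together with its tag.]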

Direct mapped cache (2/2)

Each memory location has a specific place in the cache.
Tag and data can be accessed at the same time.
The tag RAM is smaller than the data RAM and has a shorter access time, allowing the tag comparison to complete within the data RAM access time.
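A direct-mapped lookup can be sketched in a few lines of Python. The model below keeps only the tag RAM, since that is enough to decide hit or miss; the cache geometry (4 lines of 16 bytes) and the class name `DirectMappedCache` are arbitrary choices for illustration.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (no data storage)."""

    def __init__(self, num_lines=4, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # the tag RAM, one entry per line

    def access(self, address):
        """Return True on a hit; on a miss, replace the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # the one place this address can go
        tag = line_number // self.num_lines    # identifies which address is there
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        return hit

cache = DirectMappedCache()
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)]
print(results)  # [False, True, False, False]
```

Addresses 0x00 and 0x40 map to the same line in this 64-byte cache, so they keep evicting each other. The first access to each is a compulsory miss; the final miss on 0x00 is a conflict miss.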

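2-way set associative cache (1/3)

[Figure: 2-way set-associative cache. The address indexes two tag RAM / data RAM pairs in parallel; each stored tag is compared with the tag bits of the address, and the comparison results produce hit and control the mux that selects the data.]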

Set associative cache (2/3)

A set-associative cache is built from n sets accessed in parallel, yielding an n-way set-associative cache.
Two addresses that would compete for the same spot in a direct-mapped cache can be stored in different locations and accessed independently.

Set associative (3/3)

Set selection:

Random allocation
Least recently used (LRU)
Round robin (cyclic)


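Fully associative (1/2)

[Figure: fully associative cache. The address is presented to a tag CAM; a matching entry produces hit and selects the line in the data RAM, and a mux selects the data.]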

Write strategies

Write through:
All write operations are passed to main memory.

Write through with buffered write:
Write operations are passed to main memory through the write buffer.

Copy back (write back):
Write operations update only the cache; main memory is updated when the modified line is replaced.

Cache feature summary

Organizational feature    Options
Cache-MMU relationship    Physical cache | Virtual cache
Cache contents            Unified instruction and data cache | Separate instruction and data caches
Associativity             Direct-mapped (RAM-RAM) | Set-associative (RAM-RAM) | Fully associative (CAM-RAM)
Replacement strategy      Cyclic | Random | LRU
Write strategy            Write-through | Write-through with write buffer | Copy-back

Perfect cache performance

Cache form                    Performance
No cache                      1
Instruction-only cache        1.95
Instruction and data cache    2.5
Data-only cache               1.13

MMU (1/3)

Two memory management approaches:

Segmentation
Paging

MMU (2/3)

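Segmented memory management:

[Figure: the segment selector indexes the segment descriptor table to obtain a base and a limit. The base is added to the logical address to form the physical address; the logical address is compared with the limit (>?) and an access fault is raised if the limit is exceeded.]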

MMU (3/3)

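Paging memory management:

[Figure: two-level page translation. Bits 31-22 of the logical address index the page directory, bits 21-12 index the page table, and bits 11-0 select the data within the page frame.]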
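The bit boundaries in the paging figure (31-22, 21-12, 11-0) imply a 10-bit page directory index, a 10-bit page table index and a 12-bit offset within a 4 kbyte page frame. A minimal Python sketch of that split, with `split_logical_address` as a hypothetical helper name:

```python
def split_logical_address(address):
    """Split a 32-bit logical address into the three fields used for paging."""
    directory_index = (address >> 22) & 0x3FF   # bits 31-22
    table_index = (address >> 12) & 0x3FF       # bits 21-12
    offset = address & 0xFFF                    # bits 11-0
    return directory_index, table_index, offset

parts = split_logical_address(0x00403004)
print(parts)  # (1, 3, 4)
```

The first two fields are looked up in turn (page directory, then page table) to find the page frame, and the offset selects the data within that frame.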


ARCHITECTURAL
SUPPORT FOR
OPERATING SYSTEMS
[Figure: example SoC built around an ARM1136JF core. A bus matrix connects 8 64-bit AHBs (ARM Periph, ARM D Write, ARM D Read, ARM I, ARM DMA, CLCD, DMA 1 and DMA 2) to the MPMC (PL176) for SDRAM and DDR, the SMC (PL093) for static memory, and AHB/APB bridges. Other blocks: ETM with trace port analyser, VIC (PL192) with 14 external interrupts, DMAC (PL080) with 8 external DMA requests, CLCD (PL110) display controller, timers and RTC (PL031), watchdog, system control (external clock, external reset and battery fail), GPIO (PL061) with 32 GPIO lines, SSP (PL022), 2 UARTs (PL011) and SCI (PL131) smart card interface (UICC compliant).]

CP15

On-chip coprocessor for MMU, cache and protection unit control.
Control takes place through registers, using instructions executed in supervisor mode.

Protection Unit

Simpler alternative to the MMU.
Requires simpler software and hardware.
Does not use translation tables, but 8 protection regions instead.

ARM DEVELOPER SUITE

ARMULATOR (1/2)

ARMulator: emulator of various ARM processors.
Allows project development in C, C++ or Assembly.
Together with the debugger, compilers and assembler, it forms the ARM Developer Suite (ADS).

ARMULATOR (2/2)

Possible project options:

ARM and Thumb Interworking
Mixing C, C++ and Assembly
Code for ROM
Exception handlers

ARMULATOR TUTORIAL

CODEWARRIOR ENVIRONMENT
