
The ARM Processor

CSC 522 Embedded Systems Summer, 2006

ARM What is it?

ARM stands for Advanced RISC Machines.
An ARM processor is any 16/32-bit microprocessor designed and licensed by ARM Ltd, a microprocessor design company headquartered in England and founded in 1990 by Hermann Hauser.
A characteristic feature of ARM processors is their low electric power consumption, which makes them particularly suitable for use in portable devices.
It is one of the most widely used processor families currently on the market.

Examples of ARM Based Products

The Toshiba 46HM94 46-inch Television

The iPod Nano

Samsung S3FJ9SK Smartcard IC

The Motorola E680i is one of the latest mobile handsets

History of ARM

Acorn Computers was a British computer company founded in Cambridge, England, in 1978 by Hermann Hauser and Chris Curry.
The company produced a number of computers that were especially popular in the UK, including the Acorn Electron, the BBC Micro and the Acorn Archimedes.
Acorn's BBC Micro dominated the UK educational computer market during the 1980s and early 1990s.
VLSI Technology, Inc. produced the first ARM processor, based on Acorn designs.
ARM-based PCs did not sell well, and Acorn was acquired by Olivetti in 1985.
ARM was contracted by Apple to develop a processor for the Apple Newton handheld, built by VLSI.
Acorn was broken up into several independent operations in 2000, one of which, notably, was ARM Holdings.
ARM Holdings' primary business model is to license its RISC-based designs to other manufacturers.

General Computer Architecture Idealized Baseline

A stored-program digital computer keeps its instructions and data in the same memory system, allowing the instructions to be treated as data when necessary.
Sometimes the stored-program property shows up as a desktop machine on which the user runs different programs at different times.
Other times it shows up as the same processor being used in a range of different applications, each with a fixed program, i.e. an embedded system.

General Computer Architecture Idealized Baseline (cont.)

MU0, from the University of Manchester (basically a simplified MARIE computer)
Basic components
Program Counter (PC)
Accumulator (ACC)
Arithmetic-Logic Unit (ALU)
Instruction Register (IR)
An instruction set

MU0 data path example

MU0 instruction format

MU0 Instruction Set
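
The instruction format and instruction set figures are not reproduced in this text, so the following is only a minimal sketch of an MU0-style machine. It assumes the commonly published MU0 encoding (a 16-bit instruction made of a 4-bit opcode and a 12-bit address S, with opcodes 0 to 7 for LDA, STO, ADD, SUB, JMP, JGE, JNE and STP); the function names `encode` and `run_mu0` are hypothetical. It also illustrates the stored-program idea, since the program and its data sit in the same memory list.

```python
# Minimal MU0-style interpreter (a sketch; the encoding below is assumed,
# because the original slide figures are not reproduced in this text).
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)

def encode(opcode, address=0):
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    return ((opcode & 0xF) << 12) | (address & 0xFFF)

def run_mu0(memory, max_steps=1000):
    """Run from address 0 until STP; instructions and data share `memory`."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                    # fetch into the instruction register
        pc = (pc + 1) & 0xFFF
        opcode, s = ir >> 12, ir & 0xFFF   # decode opcode and address fields
        if opcode == LDA:
            acc = memory[s]
        elif opcode == STO:
            memory[s] = acc
        elif opcode == ADD:
            acc = (acc + memory[s]) & 0xFFFF
        elif opcode == SUB:
            acc = (acc - memory[s]) & 0xFFFF
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:               # sign bit clear, so ACC >= 0
                pc = s
        elif opcode == JNE:
            if acc != 0:
                pc = s
        elif opcode == STP:
            return acc
        else:
            raise ValueError(f"undefined opcode {opcode}")
    raise RuntimeError("program did not stop")

# Program: ACC = mem[4] + mem[5], store the sum in mem[6], then stop.
program = [encode(LDA, 4), encode(ADD, 5), encode(STO, 6), encode(STP), 20, 22, 0]
result = run_mu0(program)
```

After the run, both `result` and `program[6]` hold the sum of the two data words.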

General Computer Architecture Definitions

Semantic Gap: the distance, in implementation terms, between a high-level language construct and a machine instruction.
Compiler: a computer program that translates a high-level language program into a sequence of machine instructions.
Processor Design Trade-offs: the goal of processor design is to define an instruction set that supports the functions useful to the programmer while allowing an implementation that is as efficient as possible.
A good processor design defines the instruction set to be a good compiler target rather than something the programmer will use directly.

General Computer Architecture Two Types of Instruction Sets

Complex Instruction Set Computers (CISC)
Intended to reduce the semantic gap
Single-instruction procedure entries and exits
Variable-length instruction sets with many formats
Complex sequences of operations over many clock cycles
Processors based on CISC were sold on the sophistication and number of their addressing modes, data types, etc.
Developed in the 1970s, when computers had slow main memory, so processors were controlled by faster ROMs
Frequently used operations are drawn from ROM as microcode sequences rather than having instructions pulled from main memory

Reduced Instruction Set Computers (RISC)
Pipelined execution: starting a second instruction before the first one has finished
A fixed (32-bit) instruction size with few formats
A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
A large register bank of 32-bit registers, all of which can be used for any purpose, to allow the load-store architecture to operate efficiently
Hard-wired instruction decode logic
Single-cycle execution

RISC Architecture Advantages/Disadvantages

Advantages
A smaller die size: a simpler processor requires fewer transistors and less silicon area.
A shorter development time: less design effort and therefore a lower cost.
A higher performance: simpler instructions are executed faster.

Disadvantages
Poor code density compared with CISCs
Doesn't execute x86 code

RISC Power-efficient Processing

Principles of low-power circuit design

Minimize the power supply voltage
Minimize the circuit activity
Minimize the number of gates
Minimize the clock frequency
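
Each of these principles lines up with one term of the usual first-order CMOS switching-power model, P = 1/2 · f · Vdd² · Σ(A · C) summed over the gates. That formula is not stated on the slide and is used here as an assumption; the function name `dynamic_power` and all the numbers are hypothetical, chosen only to show how the terms scale.

```python
def dynamic_power(freq_hz, vdd, gates):
    """First-order CMOS switching power: 0.5 * f * Vdd^2 * sum(A * C).

    `gates` is a list of (activity, load_capacitance_farads) pairs, where
    activity is the average fraction of clock cycles in which the gate switches.
    """
    switched_capacitance = sum(a * c for a, c in gates)
    return 0.5 * freq_hz * vdd ** 2 * switched_capacitance

# Hypothetical figures: 1000 gates, 10 fF load each, switching in 20% of cycles.
gates = [(0.2, 10e-15)] * 1000
base = dynamic_power(100e6, 3.3, gates)
half_voltage = dynamic_power(100e6, 1.65, gates)   # voltage enters squared
half_clock = dynamic_power(50e6, 3.3, gates)       # frequency enters linearly
```

Under this model, halving the supply voltage cuts switching power to a quarter, while halving the clock frequency only halves it, which is why voltage is the first thing to minimize.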

RISC Power-efficient Processing (cont.)

Strategy of low-power circuit design

Minimize voltage
Choose the lowest clock frequency that delivers the required performance, then set the power supply voltage as low as is practical.

Minimize off-chip activity
Off-chip capacitances are much higher than on-chip loads.

Minimize on-chip activity
Avoid clocking unnecessary circuit functions and employ sleep modes where possible.

ARM Architecture
RISC features incorporated by ARM

A load-store architecture
Fixed-length 32-bit instructions
3-address instruction formats
Pipelining

RISC features not incorporated into ARM

Delayed branches

Single-cycle execution of all instructions

ARM Architecture Instruction Set Foundation

Visible Registers

User Addressable System Addressable

ARM Architecture Instruction Set Foundation

Current Program Status Register

Used in user-level programs to store the condition code bits.

N: Negative; the last ALU operation which changed the flags produced a negative result.
Z: Zero; the last ALU operation which changed the flags produced a zero result.
C: Carry; the last ALU operation which changed the flags generated a carry-out.
V: Overflow; the last arithmetic ALU operation which changed the flags generated an overflow into the sign bit.
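
As a rough illustration of the four flags, the sketch below models a 32-bit addition that updates them. The helper name `add_with_flags` is hypothetical, and the code only mirrors the flag definitions above rather than any particular ARM implementation.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values and return (result, flags) for a flag-setting add."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, used to detect the carry-out
    result = full & MASK32       # what actually fits in a 32-bit register
    flags = {
        "N": result >> 31 == 1,                          # sign bit of the result
        "Z": result == 0,                                # result is zero
        "C": full > MASK32,                              # carry out of bit 31
        # Overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31 == 1,
    }
    return result, flags

# Largest positive signed value plus one overflows into the sign bit.
overflow_result, overflow_flags = add_with_flags(0x7FFFFFFF, 1)
# All-ones plus one wraps to zero with a carry-out but no signed overflow.
wrap_result, wrap_flags = add_with_flags(0xFFFFFFFF, 1)
```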

ARM Architecture Instruction Set Foundation

The Memory System

The ARM system has memory state

Viewed as a linear array of bytes numbered from 0 to 2^32 - 1
Data items may be 8-bit bytes, 16-bit half-words or 32-bit words

Words are always aligned on a 4-byte boundary
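
A small sketch of what 4-byte word alignment means for byte addresses. The helper names are hypothetical, and the code checks only the arithmetic, not how a real memory system responds to an unaligned access.

```python
ADDRESS_SPACE = 2 ** 32   # byte addresses run from 0 to 2^32 - 1

def is_word_aligned(address):
    """A word address is aligned when it is a multiple of 4."""
    return address % 4 == 0

def word_address(address):
    """Round a byte address down to the start of the word that contains it."""
    return address & ~0x3 & (ADDRESS_SPACE - 1)

aligned = is_word_aligned(0x1000)     # True: 0x1000 is a multiple of 4
unaligned = is_word_aligned(0x1003)   # False: three bytes into a word
containing = word_address(0x1003)     # 0x1000, the word holding byte 0x1003
```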

ARM Architecture Instruction Set Foundation

Load-store Architecture

The instruction set will only process (add, subtract, etc.) values which are in registers (or specified directly within the instruction itself), and will always place the results of such processing into a register.
The only operations which apply to memory state are ones which copy memory values into registers (load instructions) or copy register values into memory (store instructions).
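
To make the load-store idea concrete, here is a minimal sketch of adding two values that live in memory. The function name, the dictionary-based memory and the addresses are hypothetical; the comments use informal assembly-like notation, with LDR, ADD and STR being the ARM mnemonics for load, add and store.

```python
def add_in_memory(memory, addr_a, addr_b, addr_out):
    """Compute memory[addr_out] = memory[addr_a] + memory[addr_b], load-store style."""
    regs = {}
    regs["r0"] = memory[addr_a]                          # load:    LDR r0, [addr_a]
    regs["r1"] = memory[addr_b]                          # load:    LDR r1, [addr_b]
    regs["r2"] = (regs["r0"] + regs["r1"]) & 0xFFFFFFFF  # process: ADD r2, r0, r1
    memory[addr_out] = regs["r2"]                        # store:   STR r2, [addr_out]
    return regs

memory = {0x100: 7, 0x104: 35, 0x108: 0}
registers = add_in_memory(memory, 0x100, 0x104, 0x108)
```

The addition itself never touches memory: it takes four steps (two loads, one register-only add, one store) where a CISC machine might offer a single memory-to-memory add.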

ARM instructions fall into three categories

Data processing instructions

These use and change only register values.

Data transfer instructions

These copy memory values into registers (load instructions) or copy register values into memory (store instructions). An additional form, useful only in systems code, exchanges a memory value with a register value.

Control flow instructions

Control flow instructions cause execution to switch to a different address, either permanently (branch instructions), or saving a return address to resume the original sequence (branch and link instructions), or trapping into system code (supervisor calls).

ARM Architecture Instruction Set Foundation

Supervisor mode

The ARM processor supports a protected supervisor mode. The protection mechanism ensures that user code cannot gain supervisor privileges without appropriate checks being carried out to ensure that the code is not attempting illegal operations.

ARM Architecture Instruction Set Foundation

The ARM Instruction Set

Instruction Set Features

The load-store architecture
3-address data processing instructions
All ARM instructions are 32 bits wide and aligned on 4-byte boundaries; the exception is the compressed 16-bit Thumb instructions
Conditional execution of every instruction
Inclusion of load and store multiple register instructions
Ability to perform a general shift operation and a general ALU operation in a single instruction that executes in a single clock cycle
Open instruction set extension through the coprocessor instruction set, including adding new registers and data types
A very dense 16-bit compressed representation of the instruction set in the Thumb architecture
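
Conditional execution means each instruction carries a condition that is tested against the N, Z, C and V flags before the instruction takes effect. The sketch below is a minimal model of that test. The mnemonics and flag combinations follow the standard ARM condition code table rather than anything shown on these slides, and the names `CONDITIONS` and `condition_passed` are hypothetical.

```python
# Each condition is a test on the flags (n, z, c, v), given as booleans.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                    # equal
    "NE": lambda n, z, c, v: not z,                # not equal
    "CS": lambda n, z, c, v: c,                    # carry set / unsigned >=
    "CC": lambda n, z, c, v: not c,                # carry clear / unsigned <
    "MI": lambda n, z, c, v: n,                    # negative
    "PL": lambda n, z, c, v: not n,                # positive or zero
    "VS": lambda n, z, c, v: v,                    # overflow set
    "VC": lambda n, z, c, v: not v,                # overflow clear
    "HI": lambda n, z, c, v: c and not z,          # unsigned >
    "LS": lambda n, z, c, v: (not c) or z,         # unsigned <=
    "GE": lambda n, z, c, v: n == v,               # signed >=
    "LT": lambda n, z, c, v: n != v,               # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,   # signed >
    "LE": lambda n, z, c, v: z or n != v,          # signed <=
    "AL": lambda n, z, c, v: True,                 # always
}

def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` should execute."""
    return bool(CONDITIONS[cond](n, z, c, v))

# Flags after comparing 3 with 5 (3 - 5): negative, non-zero, borrow, no overflow.
flags_3_vs_5 = dict(n=True, z=False, c=False, v=False)
runs_if_less = condition_passed("LT", **flags_3_vs_5)
runs_if_greater_equal = condition_passed("GE", **flags_3_vs_5)
```

An instruction whose condition fails simply has no effect, which lets short if-then sequences be written without branches.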

ARM Architecture Instruction Set Foundation

The I/O System

The ARM handles I/O peripherals as memory-mapped devices with interrupt support.
The internal registers in these devices appear as addressable locations within the ARM's memory map and may be read and written using the same (load-store) instructions as any other memory location.
Peripherals may attract the processor's attention by making an interrupt request using either the normal interrupt (IRQ) or the fast interrupt (FIQ) input.
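
The following is a minimal sketch of memory-mapped I/O, showing that the same load and store operations reach either ordinary memory or a device register depending only on the address. The `Uart` peripheral, its register layout and the base address 0x10000000 are all hypothetical; interrupts (IRQ/FIQ) are not modeled.

```python
class Uart:
    """Hypothetical peripheral: data register at offset 0, status at offset 4."""
    def __init__(self):
        self.sent = []

    def read(self, offset):
        return 1 if offset == 4 else 0     # status register: always "ready"

    def write(self, offset, value):
        if offset == 0:                    # data register: transmit one byte
            self.sent.append(value & 0xFF)

class Bus:
    """Routes loads and stores to RAM or to a mapped device by address."""
    def __init__(self):
        self.ram = {}
        self.devices = []                  # list of (base, size, device)

    def map_device(self, base, size, device):
        self.devices.append((base, size, device))

    def _find(self, address):
        for base, size, device in self.devices:
            if base <= address < base + size:
                return device, address - base
        return None, address

    def load(self, address):
        device, offset = self._find(address)
        return device.read(offset) if device else self.ram.get(address, 0)

    def store(self, address, value):
        device, offset = self._find(address)
        if device:
            device.write(offset, value)
        else:
            self.ram[address] = value

bus = Bus()
uart = Uart()
bus.map_device(0x10000000, 8, uart)
bus.store(0x2000, 123)            # ordinary memory location
bus.store(0x10000000, ord("A"))   # same store operation, reaches the device
```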

ARM Organization and Implementation

3-stage pipeline organization

Principal components

The register bank
The barrel shifter: can shift or rotate one operand by any number of bits
The ALU
The address register and incrementer: select and hold all memory addresses and generate sequential addresses
The data registers
The instruction decoder and associated control logic
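
As an illustration of the barrel shifter feeding the ALU, the sketch below models the four basic 32-bit shift and rotate operations and an add whose second operand is shifted first. The function names are hypothetical, and shift amounts are assumed to lie between 0 and 31; the special cases a real ARM defines for other amounts are not modeled.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32               # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving the right re-enter on the left."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def add_shifted(a, b, shift, amount):
    """A + shift(B): the B operand passes through the shifter, then the ALU."""
    return (a + shift(b, amount)) & MASK32

# Equivalent in spirit to ADD r0, r1, r2, LSL #2 with r1 = 10 and r2 = 3.
total = add_shifted(10, 3, lsl, 2)
```

Shifting left by two multiplies the second operand by four, so array indexing by word (base + index * 4) fits in one instruction.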

Process Instruction Flow

In a single-cycle data processing instruction, two register operands are accessed, the value on the B bus is shifted and combined with the value on the A bus in the ALU, then the result is written back into the register bank. The program counter value is in the address register, from where it is fed into the incrementer, then the incremented value is copied back into r15 in the register bank and also into the address register to be used as the address for the next instruction fetch

ARM Organization and Implementation

ARM processors employ a simple 3-stage pipeline with the following pipeline stages

Fetch: the instruction is fetched from memory and placed in the instruction pipeline.
Decode: the instruction is decoded and the data path control signals prepared for the next cycle. In this stage the instruction owns the decode logic but not the data path.
Execute: the instruction owns the data path; the register bank is read, an operand shifted, the ALU result generated and written back into a destination register.
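
The overlap between the three stages can be sketched as a simple schedule. This is an idealized model that assumes every instruction spends exactly one cycle in each stage, with no branches or multi-cycle instructions; the function name `pipeline_schedule` and the instruction list are hypothetical.

```python
def pipeline_schedule(instructions):
    """Return one row per cycle: the instruction in (fetch, decode, execute)."""
    stages = 3
    rows = []
    for cycle in range(len(instructions) + stages - 1):
        row = []
        for stage in range(stages):
            index = cycle - stage      # instruction that entered `stage` cycles ago
            in_range = 0 <= index < len(instructions)
            row.append(instructions[index] if in_range else None)
        rows.append(tuple(row))
    return rows

schedule = pipeline_schedule(["ADD", "SUB", "MOV", "CMP"])
for cycle, (fetch, decode, execute) in enumerate(schedule, start=1):
    print(cycle, fetch, decode, execute)
```

Once the pipeline is full, one instruction completes every cycle even though each instruction takes three cycles from fetch to the end of execute.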

Example ARM Instruction Set

The ARM processor has a rich history both in academia and in the commercial space. It uses innovative architectural design to achieve high performance with low power consumption. It is heavily used in mobile and embedded devices because of its power characteristics and is one of the most widely used processors today. It achieves this performance with a RISC instruction set, combined with organizational techniques such as pipelining. The ARM processor is a robust development platform that will be in use for many years to come.