ARM Cortex Coding

Getting started with the Cortex-A8 instruction set
Instruction Sets
32-bit ARM instruction set
16-bit Thumb instruction set
Thumb-2 instruction set: a trade-off between the two above that mixes 16-bit and 32-bit encodings. Unlike ARM instructions, most 32-bit Thumb-2 instructions are unconditional.
Advanced SIMD architecture

Enables the same operation to be performed on multiple data items in parallel. Instructions operate on vectors held in 64-bit or 128-bit registers.
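The idea of one operation applied to several items at once can be sketched in plain C. This is only a model, not NEON code: the type vec8x8_t and the function vec_add_u8 are hypothetical names that treat one 64-bit register as eight independent byte lanes.

```c
#include <stdint.h>

/* Hypothetical model: one 64-bit vector register viewed as 8 byte lanes. */
typedef struct { uint8_t lane[8]; } vec8x8_t;

/* Lane-wise add: the same operation is applied to every lane.
   Each lane wraps modulo 256 on its own; no carry crosses into a neighbour. */
static vec8x8_t vec_add_u8(vec8x8_t a, vec8x8_t b)
{
    vec8x8_t r;
    for (int i = 0; i < 8; ++i)
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    return r;
}
```

On real hardware the loop body would be a single vector instruction; the loop here only spells out what happens in each lane.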
Other instruction sets

ThumbEE instruction set
Jazelle extension
Register Set (ARM and Neon)

33 general-purpose 32-bit registers
In user mode only R0 to R15 are available
R14 -> Link register: holds the return address when a branch with link (BL) is executed
R15 -> Program counter
Seven 32-bit status registers

Status flags / processor mode
NEON Register Bank

View 1: 32 x 64-bit doubleword registers, D0-D31
View 2: 16 x 128-bit quadword registers, Q0-Q15
Both views map onto the same storage, so code can use any combination of the 128-bit and 64-bit registers, Q0-Q15 and D0-D31.
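The two register views alias each other: quadword register Qn occupies the same storage as doubleword registers D(2n) and D(2n+1). A minimal C model of that mapping follows; the names neon_bank_t, write_q and read_d are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the NEON register bank: 32 doublewords of storage. */
typedef struct { uint64_t d[32]; } neon_bank_t;

/* Writing Qn stores the low 64 bits in D(2n) and the high 64 bits in D(2n+1). */
static void write_q(neon_bank_t *bank, unsigned n, uint64_t lo, uint64_t hi)
{
    bank->d[2 * n]     = lo;
    bank->d[2 * n + 1] = hi;
}

/* Reading Dn returns whatever the last D or Q write left in that slot. */
static uint64_t read_d(const neon_bank_t *bank, unsigned n)
{
    return bank->d[n];
}
```

For example, after a write to Q1 the two halves can be read back as D2 and D3.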
ARM Instruction set

All ARM instructions are 32 bits long
Branch instructions
Data processing instructions
Register load and store instructions
Multiple register load and store instructions
Status register access instructions (out of scope here)
Coprocessor instructions (out of scope here)

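Branch Instructions
Branch backwards to form loops
Branch forwards in conditional structures
Branch to subroutines
e.g.
B label1 ; unconditional branch
BL label1 ; branch with link: the return address is saved in r14
BEQ {pc}+4 ; branch if the Z flag is set, target given relative to the pc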

Data processing instructions
Add or multiply two registers
Add a register and a constant
Bitwise operations on 8-bit, 16-bit and 32-bit data
Long multiply instructions give a 64-bit result in two registers
e.g.
ADD r2, r1, r3 ; r2 = r1 + r3
SUBS r8, r6, #240 ; r8 = r6 - 240, sets the flags on the result
RSB r4, r4, #1280 ; r4 = 1280 - r4 (reverse subtract)
AND r9, r2, #0xFF00 ; r9 = r2 AND 0xFF00
ORREQ r2, r0, r5 ; if Z is set then r2 = r0 OR r5
MOVS r3, r2, LSR #3 ; r3 = r2 shifted right logically by 3, sets the flags
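The arithmetic behind some of these instructions can be written out in C as a rough guide. The helper names below (arm_rsb, arm_lsr, arm_and, arm_umull) are hypothetical and model only the result value, not the flags.

```c
#include <stdint.h>

/* RSB rd, rn, op2: reverse subtract, rd = op2 - rn. */
static uint32_t arm_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* MOV rd, rm, LSR #n: logical shift right; n is assumed to be 1..31 here. */
static uint32_t arm_lsr(uint32_t rm, unsigned n) { return rm >> n; }

/* AND rd, rn, op2: bitwise AND. */
static uint32_t arm_and(uint32_t rn, uint32_t op2) { return rn & op2; }

/* Unsigned long multiply: the 64-bit product comes back as two 32-bit words,
   mirroring a result delivered in two registers. */
static void arm_umull(uint32_t rm, uint32_t rs, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}
```

With r4 = 1000, the RSB example above would leave 280 in r4, which matches arm_rsb(1000, 1280).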

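Register load and store instructions
Load or store a single register: 8, 16 or 32 bits
Load doublewords
Byte and halfword loads can be zero filled or sign extended
e.g.
STMFD r13!, {r0-r5} ; store r0-r5 on a full descending stack, r13 is updated
LDMFD r13!, {r0-r5} ; load r0-r5 back from the stack, r13 is updated
PUSH {r5-r7,lr} ; save r5-r7 and the link register
POP {r5-r7,pc} ; restore r5-r7 and return by loading the pc
LDR r3, [r0], #4 ; load from r0, then r0 is incremented by 4
LDR r3, [r0], r4 ; load from r0, then r0 is incremented by r4
LDR r3, [r0, #0x2C] ; load from r0 + 0x2C, r0 is unchanged
LDR r3, [r0, r4, LSL #2] ; load from r0 + (r4 << 2), r0 is unchanged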
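The addressing modes in these examples map fairly directly onto C pointer operations. The sketch below uses hypothetical helper names (ldr_post, ldr_scaled, ldrsb, ldrb) and assumes 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Post-indexed load, as in LDR rd, [rn], #4: load from rn, then advance rn by one word. */
static uint32_t ldr_post(const uint32_t **rn)
{
    uint32_t v = **rn;
    *rn += 1; /* one uint32_t element = 4 bytes */
    return v;
}

/* Scaled register offset, as in LDR rd, [rn, rm, LSL #2]:
   the address is rn + (rm << 2), which is plain array indexing on words. */
static uint32_t ldr_scaled(const uint32_t *rn, size_t rm)
{
    return rn[rm];
}

/* Sign-extending byte load; the cast to int8_t relies on the usual
   two's-complement conversion that ARM compilers provide. */
static int32_t ldrsb(const uint8_t *p) { return (int8_t)*p; }

/* Zero-extending byte load. */
static uint32_t ldrb(const uint8_t *p) { return *p; }
```

The byte 0xF0 reads back as -16 through ldrsb and as 240 through ldrb, which is the difference between sign extension and zero fill.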

Conditional Execution Flags
N Set when the result of the operation was Negative.
Z Set when the result of the operation was Zero.
C Set when the operation resulted in a Carry.
V Set when the operation caused oVerflow.
Most ARM instructions can be executed conditionally, e.g.

ADD r0, r1, r2 ; r0 = r1 + r2, don't update flags
ADDS r0, r1, r2 ; r0 = r1 + r2, and update flags
ADDSCS r0, r1, r2 ; if C flag set then r0 = r1 + r2, and update flags
CMP r0, r1 ; update flags based on r0 - r1
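How an ADDS-style addition derives the four flags can be sketched in C. The struct flags_t and the function adds are illustrative names, not part of any ARM header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* Model of a 32-bit add that also updates N, Z, C and V. */
static uint32_t adds(uint32_t a, uint32_t b, flags_t *f)
{
    uint32_t r = a + b;                        /* wraps modulo 2^32 */
    f->n = (r >> 31) != 0;                     /* bit 31 of the result */
    f->z = (r == 0);                           /* result is zero */
    f->c = (r < a);                            /* unsigned carry out of bit 31 */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;  /* operands share a sign, result differs */
    return r;
}
```

Adding 1 to 0xFFFFFFFF sets Z and C, while adding 1 to 0x7FFFFFFF sets N and V, which shows that carry and overflow are independent conditions.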
Why are conditional instructions needed when branch instructions are available? They replace short forward branches, which avoids the cost of a mispredicted branch.
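A common illustration of this question is a subtraction-based GCD loop. The C below is a sketch with a hypothetical function name; the comments show one possible ARM rendering in which the two subtractions are conditional and only the loop branch remains. Both inputs are assumed to be non-zero.

```c
#include <stdint.h>

/* Subtraction-based GCD; a and b must both be non-zero.
   One possible ARM form of the loop body, with r0 = a and r1 = b:
       loop:  CMP   r0, r1
              SUBHI r0, r0, r1   ; executed only if a > b (unsigned)
              SUBLO r1, r1, r0   ; executed only if a < b (unsigned)
              BNE   loop
*/
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}
```

The if/else inside the loop needs no branch in the assembly form, so only the backward loop branch is left for the predictor.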

Suffix details
Neon Instruction set

Vector Duplicate
VDUP{cond}.size Qd, Dm[x]
cond is an optional condition code
size must be 8, 16, or 32
Qd specifies the destination register for a quadword operation
Dm[x] specifies the NEON scalar
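The effect of a duplicate from a scalar such as VDUP.16 Qd, Dm[x] can be modelled in C as copying one lane of the source into every lane of the destination. The types d_u16_t and q_u16_t and the function vdup_lane_u16 are hypothetical names for this sketch.

```c
#include <stdint.h>

/* Hypothetical models: a 64-bit register as 4 halfword lanes,
   a 128-bit register as 8 halfword lanes. */
typedef struct { uint16_t lane[4]; } d_u16_t;
typedef struct { uint16_t lane[8]; } q_u16_t;

/* Copy lane x of dm (x must be 0..3) into all 8 lanes of the result. */
static q_u16_t vdup_lane_u16(d_u16_t dm, unsigned x)
{
    q_u16_t qd;
    for (int i = 0; i < 8; ++i)
        qd.lane[i] = dm.lane[x];
    return qd;
}
```

This is typically used to broadcast a coefficient so that it can be combined with a whole vector of data in one instruction.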
VADD.datatype {Qd}, Qn, Qm
VADD.datatype {Dd}, Dn, Dm

Datatype -> I8, I16, I32, I64 or F32 for VADD and VSUB
Datatype -> S8 to S64 or U8 to U64 for the saturating VQADD and VQSUB (the allowed types depend on the instruction, refer to the TRM)
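The difference between the plain and the saturating (Q) forms shows up when a lane overflows. The per-lane helpers below are a C sketch with made-up names; they model a single 8-bit lane only.

```c
#include <stdint.h>

/* Plain add on one 8-bit lane: the result wraps modulo 256. */
static uint8_t vadd_i8_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Saturating unsigned add: the result is clamped to 255. */
static uint8_t vqadd_u8_lane(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (s > 255u) ? 255u : (uint8_t)s;
}

/* Saturating signed add: the result is clamped to the range -128..127. */
static int8_t vqadd_s8_lane(int8_t a, int8_t b)
{
    int s = a + b;
    if (s > 127)  return 127;
    if (s < -128) return -128;
    return (int8_t)s;
}
```

The wrapping add does not care about signedness, which is why VADD takes the I types, whereas saturation needs to know the range and so takes S or U types.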
Neon Instruction set (e.g.)
Effective Assembly coding

Branch prediction
Maximize usage of conditional instructions instead of branches
a 512-entry, 2-way set-associative Branch Target Buffer (BTB)
a 4096-entry Global History Buffer (GHB)
an 8-entry return stack
Pipeline model and instruction cycle timing

Fetch, decode, execute >> 13-stage integer pipeline
Execute pipelines: load/store, MAC, ALU
NEON pipeline >> 10 stages
Removing interlocks/stalls
Maximize usage of SIMD/NEON instructions
Maximize dual issue

How to read the ARM instruction tables
ADDEQ R0, R1, R2, LSL #10

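Interlock, e.g. (refer to the table on the next slide)
SMLAL R0, R1, R2, R3
ADD R7, R8, R0 ; needs R0 from the SMLAL >> four cycles wasted
Alternate approach: fill the gap with independent instructions
SMLAL R0, R1, R2, R3
MOV r4, #0x6
ADD r5, r4, r5
MOV r6, #0x6
LDR r5, [r6, #0x2C]
ADD R7, R8, R0 ; R0 is ready by now, no stall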

Dual Issue
Two basic pipelines -> Pipeline 0 and Pipeline 1
LS pipeline, multiply pipeline, ALU pipeline
Multiply instructions always issue in Pipeline 0
The first instruction always issues in Pipeline 0 and the second instruction, if present, issues in Pipeline 1
Instructions with the same destination cannot be issued in the same cycle
Refer to the next slide for more examples
Dual issue (contd.)
General ARM optimization techniques

Loop unrolling
Use fixed-point arithmetic
Use shifts instead of multiplications and divisions
See if complex calculations can be avoided using table lookup
Minimize the number of arguments of a function
Avoid branches in low-level functions
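Three of the techniques listed above can be sketched in C. The function names are hypothetical, the Q15 format (values scaled by 2^15) is an assumed choice of fixed-point format, and the right shift of a negative product relies on the arithmetic shift that ARM compilers provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-point arithmetic: Q15 multiply without saturation.
   0.5 * 0.5 is 16384 * 16384 >> 15 = 8192, i.e. 0.25 in Q15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Shift instead of divide: unsigned division by 8. */
static uint32_t div8(uint32_t v)
{
    return v >> 3;
}

/* Loop unrolling by four with separate accumulators, plus a tail loop
   for lengths that are not a multiple of four. */
static int32_t sum_unrolled4(const int16_t *x, size_t n)
{
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)
        s0 += x[i];
    return s0 + s1 + s2 + s3;
}
```

Separate accumulators keep the four additions independent of each other, which gives the scheduler more freedom to avoid interlocks.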
Assembly functions/files, e.g.

The first four arguments go in r0, r1, r2 and r3
Example of an assembly function
General/NEON optimization techniques

Code vectorization in C itself
Use word arrays instead of halfword or byte arrays
Cache-friendly coding
Put code belonging to the same module in the same code section
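Vectorization in C usually starts with a loop the compiler can prove safe to vectorize. The sketch below assumes a C99 compiler; add_arrays is a hypothetical name, and whether NEON instructions are actually generated depends on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise add over word arrays.
   restrict tells the compiler that dst, a and b do not overlap, and the
   simple counted loop with unit stride is a shape auto-vectorizers handle well.
   The caller must keep the sums within int32_t range. */
static void add_arrays(int32_t *restrict dst,
                       const int32_t *restrict a,
                       const int32_t *restrict b,
                       size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Sequential access through the arrays also follows the cache-friendly coding advice, since every loaded cache line is fully used.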
Code Vectorization
CodeWarrior demo / hands-on
