Graduate Institute of Electronics Engineering, NTU
Blackfin Processor Architecture
Instructor: Prof. Andy Wu
ACCESS IC LAB

- Introduction
- Blackfin Processor
- Blackfin Processor Product Highlights

Introduction

Berkeley incorporated a Reduced Instruction Set Computer (RISC) architecture. It has the following key features:
- A fixed (32-bit) instruction size with few formats
  - CISC processors typically had variable-length instruction sets with many formats
- A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
  - CISC processors typically allowed values in memory to be used as operands in data processing instructions
- A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently
  - CISC register sets were getting larger, but none was this large and most had different registers for different purposes

- Hard-wired instruction decode logic
  - CISC processors used large microcode ROMs to decode their instructions
- Pipelined execution
  - CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)
- Single-cycle execution
  - CISC processors typically took many clock cycles to complete a single instruction

- Single memory space for program and data
- Shared global bus

- Separate program and data memory spaces
- Usually refers to separate program and data buses

- Program bus can be used for coefficient loading for MAC
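As a rough illustration of why a second bus helps a MAC loop, the sketch below counts operand fetches for a multiply-accumulate over a set of taps. It is a toy model under stated assumptions, not a description of any real processor: the function names `mac` and `operand_fetch_cycles` are hypothetical, each bus is assumed to deliver one operand per cycle, and instruction fetches are ignored.

```python
def mac(samples, coeffs):
    """Multiply-accumulate: sum of sample * coefficient over all taps."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c  # one MAC per tap needs two operands: x and c
    return acc


def operand_fetch_cycles(num_taps, separate_buses):
    """Toy cycle count for operand fetches only.

    Assumption: each bus delivers one operand per cycle. With a single
    shared bus the sample and the coefficient are fetched one after the
    other; with separate program and data buses they are fetched together.
    """
    fetches = 2 * num_taps            # one sample + one coefficient per tap
    buses = 2 if separate_buses else 1
    return fetches // buses


if __name__ == "__main__":
    print(mac([1, 2, 3], [4, 5, 6]))
    print(operand_fetch_cycles(8, separate_buses=False))
    print(operand_fetch_cycles(8, separate_buses=True))
```

Under these assumptions, loading coefficients over the program bus while samples arrive over the data bus halves the operand-fetch time per tap.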
Blackfin Processor

- Made by Analog Devices, Inc.
- A new breed of embedded media processor designed specifically for today's embedded audio, video, and communication applications
- Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality
- Performs equally well in signal processing and control processing applications, in many cases eliminating the need for separate heterogeneous processors

- Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs
- Support for 8/16/32-bit integer and 16/32-bit fractional data types
- Concurrent fetch of one instruction and two unique data elements
- Two loop counters that allow for nested zero-overhead looping
- A modified Harvard architecture in combination with a hierarchical memory

- Arbitrary bit and bit-field manipulation, insertion, and extraction
- Two data address generator (DAG) units with circular and bit-reversed addressing
- The data address generators contain two 32-bit address ALUs and an address register file
- The address register file consists of six 32-bit general-purpose pointer registers and four 32-bit circular buffer addressing registers

- Unified 4 GB memory space
- Mixed 16/32-bit instruction encoding for best code density
- Memory protection for support of OS operation

Three modes of operation
- User mode
  - User mode has restricted access to a subset of system resources, thus providing a protected software
    environment
  - User mode is considered the domain of application programs
- Supervisor mode and Emulation mode
  - Supervisor mode and Emulation mode have unrestricted access to the core resources
  - Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system

Blackfin Architecture Support (Single Cycle)

The following parallel operations can be processed in one clock cycle:
- Execution of a single instruction operating on both MACs or ALUs
- Execution of two 32-bit data moves (2 reads, or 1 read and 1 write)
- Execution of two pointer updates
- Execution of hardware loop updates

Blackfin Processor Compute Unit

BF533 Memory Access
- Under the right conditions, 4 memory accesses at the same time
  - 64-bit instruction fetch, 2 x 32-bit data loads, 32-bit data store
- PLUS up to 2 ALU (32-bit) and 2 MAC (16-bit) operations at the same time
- PLUS background DMA activity

Compute Unit Architecture

Register File
- Data register syntax
  - R0, R1, etc. refer to 32-bit registers
  - R0.L refers to the low 16 bits of the 32-bit R0 register
  - R0.H refers to the high 16 bits of the R0 register
- Accumulator syntax
  - A0.L => low 16 bits
  - A0.H => next 16 bits
  - A0.W => least significant 32-bit word
  - A0.X => most significant 8-bit extension
- SHARC: 16 32-bit data registers, integer and float.
  There is a pair of SHARC accumulator registers too
- 8 x 32-bit OR 16 x 16-bit
- 2 x 40-bit accumulators

68K
    MOVE.L R2, R0
    ADD.L  R1, R0

    MOVE.W R2, R0
    ADD.W  R1, R0

    MOVE.L R2, R0
    ASR.L  #16, R0
    MOVE.L R1, R3
    ASR.L  #16, R3
    ADD.W  R3, R0
    ASL.L  #16, R0
    MOVE.W R2, R0
    ADD.W  R1, R0

SHARC
    R0 = R1 + R2;
    Closest: R0 = R1 + R2, R4 = R1 - R2;

Blackfin
    R0 = R1 + R2;
    R0.L = R1.L + R2.H;
    R0 = R1 +|- R2;
    Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H

- The A and B registers must stay on the same side of the | for both instructions
- For dual and quad 16-bit operations, the (CO) option causes the destination registers to cross

- Multiplies are signed fractional by default
- A signed fractional multiply result is automatically left-shifted by 1 bit
- Signed fractional multiply != signed integer multiply
- Rounding is available on fractional multiplies and, as a special option, on integer multiplies

Two cases
- Rounding adds 0x8000 to the 32-bit multiplier result or accumulator value before extracting a 16-bit value to the destination register

- When extracting a 16-bit fractional value from an accumulator, the high 16 bits are taken
- Where the value goes in the destination register depends on which accumulator is being extracted from

- When extracting a 16-bit integer value from an accumulator, the low 16 bits are taken
- Where the 16-bit value goes in the destination register depends on which accumulator is being extracted from
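The fractional multiply, rounding, and extraction rules above can be sketched in plain Python. This is a minimal model rather than Blackfin code: all function names are hypothetical, the accumulator is treated as a signed 32-bit value (the 8-bit A0.X/A1.X extension is ignored), and saturating on overflow is an assumption of the sketch.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def saturate32(value):
    """Clamp to the signed 32-bit range (assumption of this sketch)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def frac_mul(a, b):
    """Signed fractional multiply of two 16-bit values.

    The integer product is left-shifted by 1 bit, so 0.5 * 0.5
    (0x4000 * 0x4000) gives 0x20000000, which is 0.25 in 32-bit
    fractional form.
    """
    return saturate32((a * b) << 1)


def int_mul(a, b):
    """Signed integer multiply: same product, no left shift."""
    return a * b


def extract_frac(acc, rounding=True):
    """Take the high 16 bits, optionally adding 0x8000 first."""
    if rounding:
        acc = saturate32(acc + 0x8000)
    return acc >> 16  # Python's >> on ints is an arithmetic shift


def extract_int(acc):
    """Take the low 16 bits and reinterpret them as a signed value."""
    low = acc & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


if __name__ == "__main__":
    product = frac_mul(0x4000, 0x4000)
    print(hex(product), hex(int_mul(0x4000, 0x4000)))
    print(hex(extract_frac(product)))
    print(extract_frac(0x00018000), extract_frac(0x00018000, rounding=False))
```

The last line shows the effect of rounding: 0x00018000 extracts to 2 with rounding and to 1 without it, because adding 0x8000 carries into the high half before the low 16 bits are dropped.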
- In general there are 16- and 32-bit versions of the arithmetic instructions
- Most of the 32-bit instructions can be executed in parallel with 2 x 16-bit memory/index operations
  - Exceptions are DIVS, DIVQ, and MULTIPLY with 32-bit operands
- || means parallel
- Examples:
    A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
    R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];

Blackfin Processor Memory Architecture

- A single, unified 4 GB address space using 32-bit addresses
- The L1 memory system is the highest-performance memory available to the core and is faster than the L2 memory system
- The L2 memory system is off-chip and has longer access latencies

Blackfin Processor Peripherals

- Parallel Peripheral Interface (PPI)
- Serial Ports (SPORTs)
- Serial Peripheral Interface (SPI)
- General-purpose timers
- Universal Asynchronous Receiver Transmitter (UART)
- Real-Time Clock (RTC)
- Watchdog timer
- General-purpose I/O (programmable flags)

Blackfin Processor Product Highlights

ADSP-BF535 EZ-KIT Lite
Key features
- Attributes
  - ADSP-BF535 Blackfin Processor
  - 4M x 32-bit SDRAM
  - 272K x 16-bit FLASH memory
  - AD1885 48 kHz AC'97 SoundMAX codec
  - Power management capability
  - JTAG ICE 14-pin header
  - Evaluation suite of VisualDSP++
  - Three 90-pin connectors for analyzing and interfacing with the processor's
    peripheral interfaces
  - CE certified
- System requirements
  - Pentium 166 MHz or higher
  - Minimum of 32 MB of RAM
  - Windows 98, Windows 2000, or Windows XP
  - One USB port

Analog Devices CROSSCORE Tools
- CROSSCORE, Analog Devices' development tools product line, provides easier and more robust methods for engineers to develop and optimize systems by shortening product development cycles for faster time-to-market
- VisualDSP++ software development and debugging environment
  - An integrated software development and debugging environment allowing for fast and easy development, debug, and deployment
- EZ-KIT Lite evaluation systems
  - Provide an easy way to investigate the power of ADI's family of embedded processors and DSPs to develop applications
- Emulators
  - Emulators are available for PCI and USB host platforms

ADSP-BF535 Blackfin Processor
Key features
- High-performance 16-bit dual-MAC processor core up to 350 MHz
- Flexible, software-controlled Dynamic Power Management
- Optimized RISC instruction set for high code density and C/C++ programming
- Enhanced media instructions to process audio, image, and video for multimedia applications
- Integrated system peripherals including USB device, PCI, serial ports, UARTs, SPIs, 32-bit timers, and more
- Blackfin processors utilize
  - Single processor core
  - Single instruction set
  - Single programming model
  - Single set of development tools

ADSP-BF535 Blackfin Processor
Target applications
- Automotive
- Broadband access
- Central office/network switch
- Digital imaging and printing
- Global positioning systems
- Industrial signal processing
- Instrumentation/telemetry
- Internet appliances
- Modem solutions
- Personal branch exchanges (PBX)
- POS terminals
- Telecommunications
- Video
  conferencing
- VoIP phone solutions

ADSP-BF535 Blackfin Processor
Blackfin Processor System Environment

ADSP-BF535 Blackfin Processor
Blackfin Processor Memory Hierarchy
- L1 instruction and data memories can be dynamically configured as SRAM, cache, or a combination of both
- L2 for larger instruction and data storage needs

ADSP-BF535 Blackfin Processor
Portable Low Power Architecture
- Dynamic power management

ADSP-BF535 Blackfin Processor
ADSP-BF535 Block Diagram

ADSP-BF561 Blackfin Symmetric Multi-Processor
ADSP-BF561 Symmetric Multi-Processor Block Diagram

ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
- Blackfin Symmetric Multi-Processor
  - Dual high-performance Blackfin processors up to 756 MHz
  - Capable of over 3000 MMACs
  - Independent processor cores for image processing and system control functions
  - RISC-like register and instruction model for ease of programming and C/C++ compiler-friendly support
  - Enhanced media instructions process audio, image, and video data for multimedia applications
  - Software-controlled Dynamic Power Management with on-chip voltage regulation minimizes power consumption

ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
- Highest level of integration
  - 328 Kbytes of total on-chip memory
  - Dual Parallel
    Peripheral Interface supporting ITU-R 656 video data formats
  - External memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, FLASH, or ROM memory
  - High-bandwidth, two-dimensional internal DMA controllers
  - UART with support for IrDA
  - Integrated on-chip voltage regulator
  - 256-ball Pb-free mini-BGA and 297-ball sparse PBGA package options

ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
- Target applications
  - Digital still cameras
  - Digital video cameras
  - Hybrid digital video/still cameras
  - Video security/surveillance systems
  - Portable multimedia players

ADSP-BF531/BF532/BF533 Blackfin Processor Series
Key features
- Blackfin processors offer features attractive to a broad application base
  - Performance to 756 MHz/1512 MMACs enables multichannel audio plus VGA/D1 video processing in multimedia applications
  - Enhanced Dynamic Power Management with on-chip voltage regulation allows operation down to 0.8 V, extending battery life in portable applications
  - Application-tuned peripherals provide glueless connectivity to general-purpose converters in data acquisition applications
  - Multiple low-cost, pin- and code-compatible derivatives enable software differentiation in cost-sensitive consumer applications

ADSP-BF531/BF532/BF533 Blackfin Processor Series
Key features
- High level of integration
  - Up to 148 Kbytes of on-chip SRAM
  - Parallel Peripheral Interface supporting ITU-R 656 video data formats
  - Two dual-channel, full-duplex synchronous serial ports supporting eight stereo I2S channels
  - 12 DMA channels supporting one- and two-dimensional data transfers
  - Memory controller providing glueless connection to multiple banks of
    external SDRAM, SRAM, flash, or ROM
  - Three timers supporting PWM and pulse-width/event-count modes
  - UART with support for IrDA
  - SPI-compatible port
  - Real-time clock
  - Watchdog timer
  - PLL capable of 1x to 63x frequency multiplication
  - 160-ball mini-BGA, 169-ball Pb-free PBGA, and 176-lead LQFP packages
  - Commercial and industrial temperature ranges

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
- Two 16-bit multipliers
- Two 40-bit accumulators
- Two 40-bit arithmetic logic units (ALUs)
- Four 8-bit video ALUs
- One 40-bit shifter
- Compute register file
  - Contains eight 32-bit registers
  - Can be operated as 16 independent 16-bit registers
- MAC
  - Can perform a 16- by 16-bit multiply per cycle, with accumulation to a 40-bit result
  - Signed and unsigned formats, rounding, and saturation are supported

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
- Program sequencer
  - Controls the instruction execution flow, including instruction alignment and decoding
  - For program flow control, the sequencer supports PC-relative and indirect conditional jumps (with static branch prediction) and subroutine calls
  - Hardware is provided to support zero-overhead looping
  - The architecture is fully interlocked, meaning there are no visible pipeline effects when executing instructions with data dependencies
- Address arithmetic unit
  - Provides two addresses for simultaneous dual fetches from memory
  - Contains a multiported register file consisting of four sets of 32-bit Index, Modify, Length, and Base registers (for circular buffering) and eight additional 32-bit pointer registers (for C-style indexed stack manipulation)

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
- Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure
  - Level 1 (L1) memories typically operate at the full processor speed with little or no latency
  - At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information
- Three modes of operation
  - User mode has restricted access to a subset of system resources, thus providing a protected software environment
  - Supervisor and Emulation modes have unrestricted access to the system core resources

[1] Analog Devices Web Site, http://www.analog.com/
[2] Blackfin Processor, http://www.analog.com/processors/processors/blackfin/
[3] ADSP-BF533 Blackfin Processor Hardware Reference, Rev 1.0, December 2003, Analog Devices. Section 2
[4] Blackfin Processor Instruction Set Reference, Rev 3, June 2004, Analog Devices. Sections 8 ~ 10, 14 & 15

I suggest that students who want to become familiar with the Blackfin Processor read references 3 and 4 thoroughly.