
Notes

Design of a General Purpose 8-bit RISC Processor for Computer Architecture Learning

 Processor originally designed to teach computer architecture
 Purpose: Provide students with a flexible but consistent tool for understanding computer
architecture from its basics

Introduction

 Processor's instruction set: basic instructions realized directly by hardware
o The instruction set is used for creating structured software programs
 8-bit computers are the most sold in the world because they are used to control simple
computational tasks.
 Operating mode: Multi-Cycle execution

Computer Architecture Education

 There are 5 different approaches used to explain computer architecture.


o Paper
o Simple Hardware
o Simulator
o HDL
o Logic Blocks

Computer Architecture Primer

 Section introduces some basic concepts


Computer Classification
 Computers can be classified according to:
o The amount of information they process
 Information is grouped into bytes (8 bits)
 A pair of bytes is called a word (16 bits)
 A pair of words is called a double word (32 bits)
 Four words are called a quad word (64 bits)
o Type of operations they execute and number of instructions they are able to
execute
 Instruction Type
 Reduced Instruction Set Computer (RISC)
 Complex Instruction Set Computer (CISC)
 Specific Instruction Set Computer (SISC)
 Number of instructions
 Single Instruction-Single Data (SISD)
 Single Instruction-Multiple Data (SIMD)
 Multiple Instruction-Single Data (MISD)
 Multiple Instruction-Multiple Data (MIMD)
o Their architecture
 Von Neumann
 Shares program and data memory in a single bus connection
 Harvard
 Two separate memories for program and data (Independent
buses)

Instruction Execution

 Processor executes source program in a predefined sequence


o Each instruction delivers the result needed by the next instruction
 Every instruction can be divided into basic tasks that can be applied to any
instruction
o FETCH: Retrieves instruction from the program memory and loads it into
instruction registers
o DECODE: Decomposes the instruction into sub-instructions that tell the
components of the processor what to do
o EXECUTE: The processor obtains a result from the instruction
o WRITEBACK: Saves the data that was generated during the execute stage.
(NOT ALL THE INSTRUCTIONS HAVE A WRITEBACK STAGE)
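
The four stages above can be sketched as a small loop. This is only a minimal illustration in Python, not the processor described in these notes: the tuple encoding, the instruction names LOADI, ADD and JMP, and the function name run are all hypothetical. It also shows why a jump has no writeback step.

```python
# Minimal sketch of a fetch/decode/execute/writeback loop.
# The instruction names and the tuple encoding are hypothetical.

def run(program, num_regs=8, max_steps=100):
    regs = [0] * num_regs
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        # FETCH: read the instruction addressed by the program counter
        instruction = program[pc]
        # DECODE: split it into an operation and its operands
        op, a, b = instruction
        next_pc = pc + 1
        result = None
        # EXECUTE: compute a result (or a new program counter)
        if op == "LOADI":      # a = destination register, b = constant
            result = b & 0xFF
        elif op == "ADD":      # a = destination register, b = source register
            result = (regs[a] + regs[b]) & 0xFF
        elif op == "JMP":      # a = target address, b unused
            next_pc = a
        else:
            raise ValueError(f"unknown operation: {op}")
        # WRITEBACK: only instructions that produced a result store it
        if result is not None:
            regs[a] = result
        pc = next_pc
        steps += 1
    return regs


program = [
    ("LOADI", 0, 5),    # R0 <- 5
    ("LOADI", 1, 7),    # R1 <- 7
    ("JMP", 4, 0),      # skip the next instruction
    ("LOADI", 0, 99),   # never executed
    ("ADD", 0, 1),      # R0 <- R0 + R1
]
print(run(program)[:2])  # [12, 7]
```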

Definition of the Didactic RISC Soft Processor

 Processor has the following characteristics


o Based on the Harvard architecture
o RISC instruction set (29 instructions)
o Single Instruction-Single Data (SISD) execution order
o Eight 8-bit general purpose registers
o 256 locations of 16-bit wide ROM program memory
o 256 locations of 8-bit wide RAM data memory
o ALU with two basic arithmetic and six logical operations
 To obtain a functional processor
o 1. Define the memory structure and ISA of the processor
o 2. Define the instruction formats
o 3. Design and construct all the functional blocks that comprise the processor

Memory Organization

 Program Memory
o ROM TYPE
o Used to store sequence of instruction (Program)
o Organized as a linear sequence of 256 deep x 16-bit wide memory locations
 8-bit wide address
 8-bit PC (program counter) to locate next instruction
o NON-VOLATILE
o Reading and writing operation
 READING: When a program is being executed
 WRITING: When a program is going to be loaded into the processor
 Data Memory
o RAM type memory segment
 Used to store data generated by the main program
 Can be variable value or a constant value
 256 allocations, 8 bit wide
 Address is 8 bit wide
 VOLATILE MEMORY
 2 operation modes
 READ:
o Only needs to set the desired address and the data is
available at the output right away
 WRITE: Must follow the steps
o 1. Set desired address at the address bus
o 2. Set the desired data in the input data bus
o 3.Data is stored at the desired location and is available at
the output bus

General Purpose Registers

 RAM-type memories made of flip-flops
 Location is close to the ALU (faster transfer)
 Almost all instructions use a register
 Two possible operations:
o READ: Two registers are selected simultaneously
 Set desired address for source and/or destination register, and the
contents are transferred to the outputs right away
o WRITE: Only the destination register can be written, following these
steps
 1. Set the register address on the destination address bus
 2. Set the data in the input data bus
 3. Introduce a pulse as the control signal
 4. Data is stored at the desired register and is available at the
output
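
The read and write behaviour above can be sketched as a small Python class. It is an illustrative model only: the class name RegisterFile, the method names and the write_pulse parameter are assumptions, not names from the original design.

```python
class RegisterFile:
    """Sketch of eight 8-bit registers with two read ports and one write port."""

    def __init__(self, count=8):
        self.regs = [0] * count

    def read(self, rs, rd):
        # READ: both selected registers appear at the outputs right away
        return self.regs[rs], self.regs[rd]

    def write(self, rd, data, write_pulse):
        # WRITE: the address and data are set first; nothing is stored
        # until the control pulse arrives
        if write_pulse:
            self.regs[rd] = data & 0xFF   # keep only 8 bits
        return self.regs[rd]              # stored value is visible at the output


rf = RegisterFile()
rf.write(3, 0x1AB, write_pulse=False)     # no pulse: register 3 is unchanged
rf.write(3, 0x1AB, write_pulse=True)      # pulse: the low 8 bits are stored
print(rf.read(3, 0))                      # (171, 0)
```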

Instruction set

 Made up of the assembly code and the machine language binary format
 Must be simple and robust
 Simple instructions are selected so that the processor can execute them in the fewest
steps possible.
 Instructions can be classified according to their purpose
o Operations: Affects the register Value
 Arithmetic operations
 ADD/Subtract registers and immediates
 Logic Operations
 AND, OR, NOT, SHIFT RIGHT, SHIFT LEFT, SWAP
o Program Control: Affect the execution order
 Branch, Jumps, Conditional branches
o Data Transferring: Affect the memory contents
 Load and storage

Addressing Modes

 The way the processor connects to the different memories (to interchange information)
o A rule for interpreting or modifying the address field.
 Processor includes six addressing modes
o Program Memory Direct
 When a new value is introduced to the PC, causing it to change address
o Immediate
 Use a constant value K to affect a register
o Data Memory Direct
 Instructions use a constant value to select address to the data memory
(can read or write)
o Register Direct, Two Registers
 Operates on two registers (arithmetic or logic) and saves it in one of the
registers (destination register)
o Data Memory Indirect Through Register
 Data memory is addressed by the contents of a register, written [RS], and a
read or write operation is executed on a second register. RD<=[RS] means: the
contents at the address held in RS are saved to RD
o Register Direct, Single Register
 1 register is used from GPR to affect its contents according to an
arithmetic or logical operation
Instruction Format
Refers to the order of the bits of the instruction (organization)
 Every instruction uses a different identifier or operation code (OPCODE).
o Length of OPCODE depends on the total number of instructions (5 bits for this
processor)
 The instructions can be classified according to their format type
o Type J: Used for jumps
o Type I: Use a register and an immediate
o Type R: Use registers to perform operations
o Type D: Carry out operations without the need for parameters
 Not all instructions use all of the bits but it is desirable to have a universal instruction
length
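
As a sketch of how a fixed 16-bit instruction might be split into fields, the Python below decodes a Type I instruction. The notes give only the field sizes (a 5-bit OPCODE, eight registers, an 8-bit constant); the exact bit positions used here are an assumption, and the function names are hypothetical.

```python
def encode_type_i(opcode, rd, k):
    """Pack (opcode, rd, k) into a 16-bit word using the assumed layout."""
    return ((opcode & 0x1F) << 11) | ((rd & 0b111) << 8) | (k & 0xFF)


def decode_type_i(word):
    """Split a 16-bit instruction word into (opcode, rd, k).

    Assumed layout (bit positions are not given in the notes):
      bits 15..11  OPCODE (5 bits, enough for 29 instructions)
      bits 10..8   register RD (3 bits, eight registers)
      bits 7..0    constant K (8 bits)
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    upper = (word >> 8) & 0xFF   # upper byte: OPCODE and register
    lower = word & 0xFF          # lower byte: constant K
    opcode = upper >> 3
    rd = upper & 0b111
    return opcode, rd, lower


word = encode_type_i(0b10101, 3, 0x2A)
print(hex(word))             # 0xab2a
print(decode_type_i(word))   # (21, 3, 42)
```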
Functional Units
A processor is made up of many hardware blocks that are necessary for data processing. All the
blocks have to satisfy certain logic design criteria to accomplish their task. Sometimes a designer
has to make their own block or sometimes they can get it from a library (commonly used blocks) .
The control unit must use the control signals that every block has to coordinate the functioning of
the blocks.
 Program Counter
o Binary counter that produces the address of an instruction in the program
memory. This is how the computer keeps track of its location in a program. A
program counter must also be capable of loading a pre-defined address if
required.
 A common 8-bit binary counter with parallel load is used
 Instruction Register
o Divided into two 8 bit registers
 Instruction Register (IR)
 Stores the 8 most significant bits, which contain the OPCODE and
a register parameter (INSTRUCTION PARAMETERS)
 Instruction Data Register (IDR)
 Stores the 8 Least significant bits which contain the constant or
immediate used by the instruction (Instruction data)
o Made up of parallel array of D-Type Flip-Flops
o An 8-bit register (ADDRESS REGISTER) is used to store the PC while the instruction is
executed.
 Instruction decoder
o In charge of decoding the data stored in the instruction registers
 Splits the 8 MSBs and 8 LSBs of the instruction into the OPCODE, RS, RD,
and K constant.
 The split data is sent to the units that require it
o The decoder is made up of buffers inside a block to sort the signals to separate
buses
 General Purpose Registers
o Registers Used to store and save operands or results during the program
execution.
o Can share data directly with the ALU and data memory (high speed calculations)
o Control unit controls read and write
o Internally
 Consists of eight 8bit registers
 Pair of 8 bit multiplexers
 8 bit output decoder to control which register is read or written
 Reads two registers at a time but only writes one register at a time
 Arithmetic-Logic Unit (ALU)
o Executes arithmetic and logical calculations
o Executes simple operations (these can be used to make more complex operations)
 Control Unit
o State machine that synchronizes the operation of all the other functional blocks. It
sets the functioning order according to the OPCODE of the instruction. The state
diagram used to design the Control Unit must take in to account the following
considerations:
 Must have a reset state present at start up (considering initial conditions
of blocks)
 Second state must be the FETCH state (retrieved from program memory
and loaded into IR and IDR register)
 Instruction decoding happens right away once the instruction is fetched
 Next state is the execute state
 The final stage involves storage of the result into the respective functional
unit, then the PC is incremented
o Design of the control unit is the most challenging part of the processor design;
it must take every functional block into account.
 MULTI CYCLE VS SINGLE CYCLE
o This processor is a multi cycle processor
o Multi Cycle:
 Can be 1.27 times faster than single cycle
 Higher clock speed
 Less hardware
o Single Cycle: All instructions have the same clock cycle length (meaning wasted time
for shorter instructions)
 Clock cycle is determined by the longest path
 More hardware

CHARACTERISTICS OF SOPHIA PROCESSOR

AN EMBEDDED REAL-TIME PROCESSOR IMPLEMENTED ON FPGA DEVICES


 Since this processor is based on the 8051, it is a CISC processor with a lot of instructions
 The uRT51 processor was designed for embedded real-time control applications
Introduction
 Real-time systems: A real-time system has tasks that need to be executed by a processor
before a certain time. (Mobile phones, entertainment devices, cars, medical devices, etc.)
 Temporal parameters can be defined independently of the functionality of each task.
 RTOS: REAL TIME OPERATING SYSTEM

THE URT51 ARCHITECTURE

 Includes an 8051 Core


o Implements a subset of the 8051 instruction set that can recognize special real-
time instructions to configure the real time manager. IN/OUT devices can be
driven by the 8051 core.
 Real time Manager
o Controls the real-time behavior of the URT51 processor
 Continually checks whether there is a real-time action to be performed. If no
task is to be performed, the uRT51 activities are halted (reducing
power consumption)
 Checks whether there is an event that forces a change in the state of the 8051 core
 This is not a processor unit! It is a circuit optimized to carry out the real-time
functions of the processor
o Stores all of the real time information that is required to execute the real time
tasks onto the RAM system
o Continually checks whether there is an event that needs to be activated or a task
whose state should be modified
 Debugging unit:
o Allows for programming suite integration.
 All the activities can be controlled and supervised
 Memory Controller:
o Is there to allow the uRT51 processor to connect to external memories
 Interrupt manager:
o Gives support to asynchronous real-time events and can release real-time tasks
 In this processor the RAM memory is shared between the 8051 core and the Real-Time
Manager.
o The memory sharing is synchronized by the memory controller

Time and events in the uRT51

 An event is an occurrence or happening, usually significant to the performance of a
function, operation, or task.
 The processor keeps the system time in internal registers of the real-time manager
 The maximum number of events supported depends on the amount of real-time
memory assigned to event structures

TASKS AND PRIORITIES

 A real time processor should support multitasking


 To store the parameters of the real time task the processor uses what is called “Class
structures”
o These are stored in the RAM memory of the system
 In multitasking systems, a priority is assigned to each task in order to schedule the set of
tasks that are ready to be executed.
o The priority of each task can be modified when the program is running (Depending
on the technique applied)
 In a traditional real-time processor the priority of the tasks is predetermined
o This processor chooses the highest-priority task and executes it first
o The priority of each task is held in its task structure and can be modified
while the program is running
o A task's priority can change if an action taken on time events or interrupts is
configured to do so
o If actions do not modify the task's priority, then a fixed-priority discipline is
implemented
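
The selection rule above (run the highest-priority ready task, with priorities that may change at run time) can be sketched in Python. This is a software illustration only, not the uRT51 hardware mechanism; the Task class, the pick_next function and the convention that a larger number means a higher priority are all assumptions.

```python
class Task:
    """Hypothetical task record: a name, a priority and a ready flag."""

    def __init__(self, name, priority, ready=True):
        self.name = name
        self.priority = priority   # assumption: larger number = higher priority
        self.ready = ready


def pick_next(tasks):
    """Return the ready task with the highest priority, or None."""
    ready = [t for t in tasks if t.ready]
    if not ready:
        return None                # nothing to run: the processor could halt
    return max(ready, key=lambda t: t.priority)


tasks = [Task("log", 1), Task("control", 5), Task("display", 3, ready=False)]
print(pick_next(tasks).name)       # control

tasks[0].priority = 9              # priority modified while the program runs
print(pick_next(tasks).name)       # log
```

If no action ever changes a priority field, the same function behaves as a fixed-priority scheduler.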

PIPELINED 8 BIT RISC PROCESSOR DESIGN USING VERILOG HDL ON FPGA 07808194

 8-bit RISC processor with a Harvard architecture with pipelining


 It has 34 instructions
 8-bit ALU, two 8-bit I/O ports, serial-in/serial-out ports, eight 8-bit general purpose registers,
a 4-bit flag register and three priority-based vectored interrupts
 Can execute programs up to 262,144 instructions long
 Implemented on a Spartan-3E FPGA board, with a reported timing of 0.0517 microseconds

Introduction

 Due to improvements in field programmable gate arrays, we have reached a point where
the architecture of processors can be modified by programming in HDL
 The main difference between traditional processors and FPGA-based processors is that with
FPGA processors one can make significant changes to the datapath itself
 In RISC processors, load and store are the only operations used to access memory
o The rest are performed on a register-to-register basis
 Clock gating: A method to reduce clock power; it dynamically switches off the clock signals in
unused modules of the hardware
 Universal Asynchronous Receiver Transmitter (UART): A type of serial communication
protocol, mostly used for short-distance, low-speed, low-cost data exchange
between computers and peripherals
 Asynchronous serial communication: offers high reliability, fewer transmission lines and
long-distance transmission; extensively used as a mode of communication between computers
and peripherals. This is usually implemented by a UART

8 BIT RISC PROCESSOR ARCHITECTURE

 Characteristics of the processor


o 8 bit RISC processor
o Harvard Architecture
o 8 bit ALU
o Two 8 bit I/O ports
o Eight 8 bit general registers
o 3 interrupts
o Serial-in/ Serial-out ports
o 4 bit flag register
 Zero Flag (Z)
 Carry Flag (C)
 Borrow flag (B)
 Parity flag(P)
 Proposed architecture of the system
 Will work at 2.5 volts and at a frequency of 25 MHz
 Accumulator: Register for short-term, intermediate storage of arithmetic and logic data in
a computer's CPU. It is used as the source of one of the operands to the ALU
 An accumulator-based CPU architecture is a register-based CPU architecture that only has
one general purpose register (accumulator).
o Main advantage of having general purpose registers: It is easier for the CPU to do
more independent instructions in parallel (uses the stack less)
 The interrupts are priority based and one of the interrupts is maskable
o Maskable interrupts are those that can be disabled by the programmer (even if
the interrupt happens, the CPU just ignores it)
 The processor is very simple, containing only 34 instructions
o Making it easier to program, small die area
 Data memory and general-purpose registers are the modules where clock gating is used
o Loading of the registers takes place on the falling edge of the clock
o Control signals are generated during the rising edge of the clock
PIPELINING ARCHITECTURE

 Pipelining
o Designed to improve performance; provides a way to reduce the average
execution time per instruction (by decreasing the number of clock cycles per
instruction)
o While executing one instruction, the next instruction is fetched
 The pipelining architecture used for the processor is:

 The PIPELINING feature is applied in the tstate counter module

DESCRIPTION OF FUNCTIONAL MODULES

 Program Counter Unit (PCU)


o Consists of Program Counter (PC)
 18 bit wide register containing address location of the instruction being
executed
 As the instruction is executed, the address is increased by one
o Program Counter Save (PCS)
 Stores the program counter incremented by two when the SAV PC instruction is
executed
o Program Counter Register (PCR)
 Used to store the address of the instruction when a jump instruction is
executed
o Stack PC
 Used to store the current PC value during the execution of an interrupt
 Instruction Memory (IM)
o The IM is 16 bits wide and has 262,144 address locations
o In the fetch cycle the IM DATABUS transfers the instruction memory content to the
instruction register. This happens when IM ADDRBUS provides an address location to the IM at
the same instant
 Control Unit
o Receives inputs from the flag register, serial module, and interrupt module
o Takes the FPGA source clock and generates 59 control signals and 4 clock signals
to control all of the modules
o The control unit contains the following components: Instruction Register (IR),
Instruction RegisterX (IRX), tstate counter, Low Power Unit (LPU) and instruction
decoder.
 Instruction Register:
 Gets instruction for decoding during fetch cycle
 Moved to IRX during every rising edge of clock of execution cycle
 Instruction RegisterX(IRX)
 The contents of IR are stored here because a new instruction is fetched
while the current one is being executed. Keeping them in IRX lets the
current instruction finish executing.
 Tstate Counter
 THE PIPELINING FEATURE IS APPLIED IN THE TSTATE COUNTER
MODULE
 Generates fetch and execution cycles required for proper
functioning of the processor
 The states that are generated at the rising edge of the clock are:
o TF1 (Fetch cycle)
o TX1 (Execution State)
o TX2 (Execution cycle 2, ONLY FOR BRANCHING
INSTRUCTIONS)
 Low Power Unit (LPU)
 Takes care of Clock Gating for Data Memory and General Purpose
Register
 Instruction Decoder:
 Generates control signals whenever IRX is loaded with a valid
instruction
 Control signals are also generated every rising edge of the clock
(Like IRX)
 DATA MEMORY
o 8 bit wide with 4096 address locations
o Gets the required address location by 12-bit address (DM ADDRBUS) from control
unit
o Has read and write control
o Accessed by 8 bit Data Bus known as SYSTEM_DATABUS
 ALU UNIT
o ALU is connected to Accumulator and general-purpose registers by its 8 bit buses
ALU DATABUS A and ALU DATABUS B.
o Consists of following operations: AND, OR, XOR, ADD, SUB.
 The result of the operation is stored in the accumulator
 The result sets the Zero flag (Z), Carry flag (C), Borrow flag (B) and the
Parity flag (P) => ALL STORED IN A 4 BIT REGISTER
 Accumulator
o 8 bit wide register
o Data transfer, ALU operations and I/O operations take place through it
o Supports increment, decrement, rotate right, rotate left and complement
operations
o The accumulator is connected to the serial module for sending and receiving data
through the serial port
 Register Set
o Eight 8 bit registers used for storing data that is frequently used
o Connected to the ALU for fast operations,
o Also connected to system data bus for loading and storing data
 Interrupt Module
o Contains 2 interrupts (I0 and I1)
 They are priority based
 The I1 is maskable
o Has a 3-bit register INTCON to control the functions in the interrupt module
 Bit 2 : enables and disables timer interrupt
 Bit 1: Used for enabling and disabling external interrupts
 Bit 0: used for masking I1
o The timer module has a 10-bit register for counting the predefined interval
 When the maximum value is reached TMF0 gets set
 I/O Module
o 8 bit input and output ports for communication with external environment
 Directly connected to accumulator
o The module will transfer data to and from accumulator when a specific control
signal is enabled
 Serial Module
o Rxin is serial in port
o Txout is serial out port
o The serial module can transmit and receive simultaneously
o When the data is sent, a start bit of 0 is sent first then the actual data bits. The LSB
is sent first. A stop bit is added at the end (1)
o The module consists of two 8-bit registers, TBUFF and RBUFF
 They store data during transmission and reception
 The data stored in TBUFF is shifted out during serial data transmission and
RBUFF is shifted out during reception
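
The frame format described for the serial module (a start bit of 0, the data bits with the LSB first, then a stop bit of 1) can be sketched in Python. This only illustrates the bit order; it assumes 8 data bits and no parity bit, and the function names frame_byte and unframe are hypothetical.

```python
def frame_byte(value):
    """Return the bits of one serial frame in transmission order."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    data_bits = [(value >> i) & 1 for i in range(8)]   # LSB first
    return [0] + data_bits + [1]                        # start, data, stop


def unframe(bits):
    """Recover the byte from a 10-bit frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("bad frame")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = frame_byte(0x53)
print(frame)                 # [0, 1, 1, 0, 0, 1, 0, 1, 0, 1]
print(hex(unframe(frame)))   # 0x53
```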

INSTRUCTION SET ARCHITECTURE

 Contains 4 types of instructions


o Data Transfer Instructions
o Arithmetic and Logic Instructions
o Branching Instructions
o Machine Control and I/O instructions
 Contains 34 basic instructions and 83 total opcodes
o MAIN ADVANTAGE IS THE USE OF THE SAV PC INSTRUCTION
 Saves the PC value incremented by two in PCS
 THIS INSTRUCTION CAN BE USED BEFORE A JUMP INSTRUCTION.
THEN RES PC (loads PCS to PC) can be used to come back to the
instruction after the jump
 This works like a CALL instruction but doesn't increase the
number of instructions
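
The SAV PC / RES PC pairing can be traced with a short Python sketch. It is a toy model under stated assumptions: each instruction occupies one address, the mnemonics SAV_PC, JMP, RES_PC, NOP and HALT and the tuple encoding are hypothetical, and only the program counter behaviour is modelled.

```python
def run(program, max_steps=50):
    """Execute a toy program and return the sequence of addresses visited."""
    pc, pcs = 0, 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        trace.append(pc)
        if op == "SAV_PC":
            pcs = pc + 2       # address of the instruction after the jump
            pc += 1
        elif op == "JMP":
            pc = arg
        elif op == "RES_PC":
            pc = pcs           # return to the saved address
        elif op == "HALT":
            break
        else:                  # "NOP" stands in for any other instruction
            pc += 1
    return trace


program = [
    ("SAV_PC", None),   # 0: PCS <- 0 + 2
    ("JMP", 4),         # 1: jump to the routine at address 4
    ("NOP", None),      # 2: execution resumes here after RES_PC
    ("HALT", None),     # 3
    ("NOP", None),      # 4: routine body
    ("RES_PC", None),   # 5: PC <- PCS
]
print(run(program))     # [0, 1, 4, 5, 2, 3]
```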

SOFT-CORE PROCESSORS FOR EMBEDDED SYSTEMS

 Soft-core processor: Hardware description language (HDL) model of a specific processor
o Can be customized for a given application and synthesized for an ASIC or FPGA
target
o Advantages:
 Reduced Cost
 Flexibility
 Platform independence
 Immunity to obsolescence

EXAMINE SOFT-CORE PROCESSORS FOR EMBEDDED SYSTEMS 2013


 When designing an embedded system using FPGAs, a controller is needed. There are
various options available
o Using an off-the-shelf microprocessor: mounted on the board and connected
to the FPGA using a standard bus
 Sometimes this approach does not meet the requirements; some
examples:
 Application that requires peripheral functionality that is not
available in a discrete solution
 When board real estate is limited
o Hard Processor
 Dedicated silicon area on the FPGA
 Able to operate at high frequencies (Similar to that of a discrete
processor)
 Examples: PowerPC, ARM Cortex-A9 dual-core, Etc
 Problems with this approach
 Does not provide the ability to adjust it to better meet the needs
of the application
 Not flexible
o Soft Processor:
 A processor that is made using the FPGA logic fabric
 Processor does not operate at the same clock frequencies
 Does not have the same performance
 In many embedded solutions the high performance is not needed
 Performance is traded for functionality, reduced cost, and
flexibility
 There are FPGA vendor soft processors and independently developed soft-
core processors
 The independent soft-core processors are platform independent
therefore can be implemented in any FPGA design
 THE LEON3
o VHDL model of the 32 bit processor
 SPARC V8 architecture
o Suitable for system on a chip (SOC) design
o Pros
 No licences are required for research and education use
 All of the source code is available
 Linux and RTOS can be installed
o Cons
 Not all FPGA development boards are supported
 Not used a lot
 MicroBlaze
o 32 bit by Xilinx, RISC architecture
o Pros:
 Can be used in all Xilinx FPGA families
 Abundant configuration options
 Standard buses
o Cons:
 Can ONLY be used in Xilinx FPGAs
 EDK needs a license
 Source Code not available
 OpenRISC
o An open-source RISC microprocessor
o Pros:
 Everything is open source
 Large user community
o Cons:
 Only a few development boards are supported
 Complicated debugging solutions
 Outdated bus
 Many IP blocks are not maintained
 THE NIOS II
o Proprietary 32-bit RISC architecture processor by ALTERA for their FPGAs
o Pros:
 Development environment is easy to use
 No license needed when using QUARTUS II Web Edition
o Cons:
 Can only be used in Altera FPGAs
 Some cores have time-limited licenses (will stop working after
some time)

SOFT PROCESSORS AS A PROSPECTIVE PLATFORM OF THE FUTURE

 FPGAS provide the highest degree of flexibility and are almost fully application neutral
o Sacrifices: Higher Usage of basic Logic gates
o Decrease in circuit operation frequency
 Caused by the use of switched interconnect fabric
o Soft-CPUs allow
 Improve or fully replace the CPU architecture in the field
 Implement hard logic solutions with maximum reliability
 Implement functions that can be efficiently implemented only with hard
logic (Example: Coding/Decoding)
 Implement functions that require hard deterministic and fixed timing
 Full control over technologies used
 Applicability of Moores law for general purpose FPGA devices
o MAIN METRICS
 Metrics used to compare ASIC and FPGA-Based soft CPUS
 Number of gates (Transistors)
 Maximum Internal Clock Frequency
 Maximum clock frequency for an FPGA-based soft-CPU is usually 3.5 times slower
than for ASIC implementations
 The clock signal has to pass through several statically controlled
FET switches with higher resistance than an ASIC's metal
interconnects
 Economics of soft-CPU adoption
o ASIC development and NRE costs are much higher
 Unit cost for huge production volumes is much lower
o FPGA development and NRE costs are much lower
 Final unit cost is high even for large volumes
o During development, the cost of the different approaches should be considered. There
are 3 types of cost
 Unit cost for final product (Cprod)
 The cost of producing one device unit; it includes the cost of
preparing for production, i.e. the one-time non-recurring engineering (NRE)
cost
o One-time cost to research, design, develop and test a new
product
 Development cost (Cdevel)

 Cost of time to market delay (Cttm)

W65C02S 8-bit MICROPROCESSOR

INTRODUCTION

THE PROCESSOR
 Low power
 Low cost
 8-bit microprocessor
 Fully static core
o The main processing unit can be stopped by stopping the system clock oscillator
that is driving it
 It maintains this state until the clock is introduced again, and then
processing resumes where it stopped without a problem
 While stopped, it consumes very little power
 Useful in designs where the MPU remains in standby mode until
needed
 Features:
o 8 bit data bus
o 16 bit address bus
o 8 bit ALU
o 16 bit PC
o 69 Instructions
o 16 addressing modes
o 212 operation codes (OpCodes)
o Variable length instructions
 Provides for lower power and smaller code optimization over fixed length
instruction set processors
 FUNCTIONAL DESCRIPTION
o The organization of the core is divided into two parts. The register section and the
control section
 Instructions from program memory are executed within the register
section
 The signals that cause the data transfers are generated within the control
section
 Instruction Register (IR) and Decode
o The OPCODE portion of the instruction is loaded into the instruction register and is
latched during the OpCode fetch cycle
o The instruction is then decoded to generate various control signals for program
execution
 Timing Control unit (TCU)
o Provides timing for each instruction cycle that is executed. It is set to zero for each
instruction fetch, and it is advanced until the instruction is completed
o The data transfers between the registers depend on decoding the contents of
both the IR and the TCU
 Arithmetic and Logic Unit (ALU)
o All arithmetic and logical operations take place within the ALU
o Also calculates the effective address for relative and indexed addressing modes
o The result of the operation can be stored in memory or registers
o The flags (Carry, Negative, Overflow and zero) are updated after the ALU has done
the operation
 Accumulator Register (A)
o 8 bit general purpose register that holds one of the operands and the result of the
ALU
 Index Registers (X and Y)
o Two 8 bit index registers
 Can be used as general purpose registers or to provide an index value for
calculation of the effective address
 When executing an instruction with indexed addressing
 Processor fetches the OPCODE and a base address
o The address is modified by taking the contents of the
index register and adding them to the address prior to
performing the desired operation
 Processor Status Register (P)
o Contains status flags to report to the ALU
o On top of the status flags, the status register also contains mode bits for user
input
 Program Counter Register (PC)
o A 16 bit register which provides the addresses that are used to execute a program
 Every time an instruction or operand is fetched from the program
memory, the register is incremented
 Stack Pointer Register (S)
o 8-bit register that indicates the next available address in the stack memory
 16 addressing modes
o An aspect of the instruction set architecture
o The addressing modes define how the instruction is read (what bits are the
OPCODE, What bits are the registers, constants, etc.)
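
The indexed addressing described under the index registers above (a base address fetched with the OPCODE, plus the contents of an index register) can be sketched in Python. This is a simplified illustration that assumes a full 16-bit base address and wraps the result to 16 bits; the function name effective_address and the memory contents are hypothetical.

```python
def effective_address(base, index):
    """Base address plus an 8-bit index register, wrapped to 16 bits."""
    return (base + (index & 0xFF)) & 0xFFFF


memory = bytearray(0x10000)    # 64 KiB reachable with a 16-bit address bus
memory[0x2005] = 0x7E          # example data, chosen arbitrarily

x = 0x05                       # contents of index register X
value = memory[effective_address(0x2000, x)]
print(hex(value))              # 0x7e
```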

BUILDING EMBEDDED SYSTEMS USING SOFT IP CORES “HANDS-ON EXPERIENCE WITH ALTERA
FPGA DEVELOPMENT BOARDS”

 The NIOS II is the most widely used soft processor in the FPGA industry
 Soft IP (synthesizable IP) provides customers with a lot of design flexibility because it
allows the customer to alter the design at the functional level
 Soft Core Processors for Embedded Systems
o Embedded systems: Hardware and software combination to achieve a desired task
 Designing an embedded product is challenging as it has to meet
constraints on area usage, size, power consumption, time to market
o A small change in the requirements of an embedded system might mean that the
design needs to start from scratch
 Pre-designed and tested IP cores are an alternative to solve this problem
o Advantages of using soft IPs in embedded systems
 Flexible ( can be changed without needing to redesign everything)
 Since they are described in HDL, the overall design is easier to understand
 Protection from becoming obsolete (they can be synthesized for any target
device)
 A Survey of Soft Core Processors
o NIOS II - Altera
 Easy to begin to use (can be instantiated by a simple selection process in
the SOPC Builder)
 RISC and supports HARVARD MEMORY ARCHITECTURE
 32 general purpose registers, 32-bit ISA
o MicroBlaze and PicoBlaze - Xilinx
 MicroBlaze is 32-bit
 HARVARD RISC architecture
 MicroBlaze is targeted at Virtex and Spartan FPGAs only
 The PicoBlaze is an 8-bit MICROCONTROLLER targeted at low-end FPGAs
like the Spartan-3, Virtex-II and Virtex-II Pro families
 Mostly used for simple data processing applications
o Tensilica Inc. soft processors
 Has a number of low cost, power optimized soft IP cores aimed at
embedded system design
 Mostly used for DSP application
 They are "reconfigurable" in the sense that they have pre-defined
parameters that can be changed (allows the designer to tune the
processor to an intended application)
o Open-Source Cores
 Mostly used by academia for research and development of their
embedded system-based products
 Examples include: UT NIOS open-core, OpenSPARC, LEON, OpenRISC
 Comparison of Soft Core Processors
o NIOS II and MicroBlaze are designed to be implemented on FPGAs
o The other cores are not meant for a specific target technology
 Different versions of the NIOS soft processors
o The fast version of the NIOS II uses more FPGA resources, but this results in better
system performance
o The standard version is designed to balance system performance and cost.
It uses fewer FPGA resources than the fast version but takes a hit on performance
o The economy version of the NIOS processor uses a minimum amount of resources and
has fewer features. This version is suitable for low-cost applications

 Big-endian and little-endian modes
o These terms describe the order in which the bytes of a multi-byte value are stored in
memory.
 Big endian = the most significant byte is stored first (at the lowest address)
 Little endian = the least significant byte is stored first (at the lowest address)
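The two byte orders can be illustrated with a short Python sketch. The helper name `store` and the 32-bit example value are assumptions made only for this illustration; the list index stands in for the memory address.

```python
def store(value: int, width_bytes: int, byteorder: str) -> list:
    """Return the bytes of value in increasing address order."""
    # int.to_bytes lays the bytes out exactly as they would sit in memory.
    return list(value.to_bytes(width_bytes, byteorder))

value = 0x12345678  # example value, chosen for illustration

big = store(value, 4, "big")        # most significant byte at the lowest address
little = store(value, 4, "little")  # least significant byte at the lowest address

print([hex(b) for b in big])     # ['0x12', '0x34', '0x56', '0x78']
print([hex(b) for b in little])  # ['0x78', '0x56', '0x34', '0x12']
```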

DESIGNING SOFT-CORE PROCESSORS for FPGAs JAMES BALL ALTERA,INC.

 In the mid to late 1990s, soft-core processors on FPGAs were mainly used for research because
they were expensive and had low performance. They also occupied most of the space within the
FPGA, meaning that the FPGA could hold little else
o Nowadays a soft core processor does not occupy a lot of space within the FPGA
 Efficiency is a ratio of performance to cost

 CONFIGURABLE PROCESSORS
o FPGA processors and ASIC processors support generation-time configuration
 This allows designers to trade off cost and performance as needed
 Examples of configurable options: pipelining, cache size, multiplier implementation
o FPGA designers have an advantage over ASIC designers because the FPGA itself is
reconfigurable. FPGA designers can test their designs in real hardware without a
problem, whereas ASIC designers test their designs in simulation, which is not as accurate.
o FPGA designers can also tune their designs much more easily to meet requirements
 CHALLENGES OF FPGA PROCESSOR DESIGN
o When working with FPGAs, designers need to develop solutions appropriate for FPGAs
rather than adopting solutions that work for other forms of implementation
o Some techniques used by ASIC processors to increase performance might not work in
FPGAs because the underlying resources differ.
o Designers also need to accommodate the low efficiency of FPGA resources relative to
ASICs.
 An efficient soft processor needs to have a simple instruction set on a simple
pipeline.
 Higher levels of application performance are available by using
multiple FPGA processors, adding custom instructions, and/or adding
custom accelerators
 OPPORTUNITIES OF FPGA PROCESSOR DESIGN
o Despite the disadvantages relative to ASIC, the flexibility offered provides unique
opportunities in processor design.
 An FPGA designer can change their processor configuration at any time
 They can make their own custom peripherals
 When using ASICs if the system requirements change then the entire ASIC
needs to be changed
 ASICs are usually made to provide more performance than required
o This tends to increase the cost and size of the processor
o The end user is the one that pays for the increased
performance that is not being used. THERE ALSO EXISTS THE
POTENTIAL THAT THE REQUIREMENTS SURPASS EVEN THE
EXTRA PERFORMANCE GAP.
o FPGAS CAN AVOID ADDITIONAL COST BY BEING CONFIGURED
WITH MINIMAL OR NO PERFORMANCE MARGIN
 FPGAs also offer debug facilities that allow software
developers to control the processor and observe its state
 The extent of the debug facilities for an ASIC processor is fixed once it is
produced
o Some debug facilities include:
 Stepping, breakpointing, watchpointing, tracing, and
examining/modifying memory and registers
o IN AN FPGA DEBUG FACILITIES CAN BE ADDED AS NEEDED
AND REMOVED WHEN THE DESIGNER IS DONE USING THEM
 FPGA LOGIC OVERVIEW
 In order to make good designs, the processor designer must have a
good understanding of FPGA devices
o FPGAS are composed of logic elements, RAM blocks,
Multiplier blocks, and routing
 Routing occupies most of the die area
 The resources are configured through SRAM cells
that get loaded with configuration information every
time power is applied to the FPGA. The configuration
file is typically held in non-volatile memory
 LOGIC ELEMENTS
o A typical logic element consists of a 4-input lookup table, carry chain logic, and a
flip-flop.
 SRAM holds the contents of the lookup table. The inputs
are connected to the address lines of the SRAM, so the
lookup table produces its result by reading the entry
selected by the inputs
 The flip-flop stores the output of the carry logic or the
lookup table
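The lookup-table idea can be sketched in Python: a 16-entry table plays the role of the SRAM, and the four inputs form the read address. The function names `make_lut4` and `lut4_read` and the bit ordering of the address are assumptions made for this illustration, not a description of any particular FPGA family.

```python
def make_lut4(func):
    """Fill a 16-entry table (the 'SRAM') by evaluating func on every input combination."""
    table = []
    for addr in range(16):
        a = (addr >> 0) & 1
        b = (addr >> 1) & 1
        c = (addr >> 2) & 1
        d = (addr >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table

def lut4_read(table, a, b, c, d):
    """The four inputs form the address; the stored bit is the output."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return table[addr]

# Configure the table as a 4-input XOR (any 4-input function fits in 16 bits).
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(lut4_read(xor4, 1, 0, 0, 0))  # 1
print(lut4_read(xor4, 1, 1, 0, 0))  # 0
```

Changing the function only changes the table contents, which is why the same logic element can implement any function of four inputs.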
o RAM BLOCKS
 FPGAs usually provide dedicated RAM blocks
 These blocks typically support simple dual
port (one read and one write port)
 The RAM blocks are typically only available as
synchronous SRAMs (synchronous refers to
the fact that the data transfer is controlled by
a clock, on either the falling edge or the rising
edge of the clock)
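A simple dual-port synchronous RAM can be modelled as below. This is a behavioural sketch only: the class name, the method `clock_edge`, and the choice that a read of an address being written in the same cycle returns the old contents are all assumptions made for illustration. Real RAM blocks differ in this last detail.

```python
class SimpleDualPortRam:
    """One read port and one write port, both sampled on the same clock edge."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.read_data = 0  # registered read output

    def clock_edge(self, read_addr, write_enable=False, write_addr=0, write_data=0):
        """Model one active clock edge and return the registered read data."""
        # The read port captures the stored value before the write takes effect
        # (old-data behaviour, assumed here for simplicity).
        self.read_data = self.mem[read_addr]
        if write_enable:
            self.mem[write_addr] = write_data
        return self.read_data

ram = SimpleDualPortRam(depth=8)
first = ram.clock_edge(read_addr=3, write_enable=True, write_addr=3, write_data=42)
second = ram.clock_edge(read_addr=3)
print(first, second)  # 0 42
```

Nothing changes between calls to `clock_edge`, which is the sense in which the data transfer is controlled by the clock.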
o MULTIPLIER BLOCKS
 Composed of several small multipliers
 Some FPGAs offer dedicated circuitry to
combine smaller multipliers
o The ones that offer dedicated circuitry
usually offer higher frequency
 On top of multiplication, these blocks can also provide
other features, like:
 Saturated arithmetic, accumulators, or barrel
shifters
o FPGA ROUTING
 The internals of an FPGA are made up of resources that
are organized in a two-dimensional array. Between
these resources there are wires of varying lengths and
speeds. These wires are organized into a hierarchy
 The wires are connected with Multiplexers
 The resources are connected with different kinds of
wires that have different speeds. Routing is organized
so that resources that are close together have
smaller delays than resources that are further apart,
which are connected through multiple wires and switches
that add delay
 Specialized wiring is provided for carry chains (the
carry out of one logic element is directly connected to
the carry in of the next logic element). This is the
reason why adders in FPGAs are fast relative to other
FPGA logic
 FPGA DESIGN ISSUES
o To make an efficient processor design in an FPGA one must
recognize the differences between FPGA resources and ASIC
resources
o ROUTING
 When designing an ASIC the placement and the wiring
is optimized for that specific application, therefore the
time it takes for signals to travel is minimized
 FPGAs have large wiring delays because the wiring is
fixed. The switches used to choose the wires also add
more delay
 Routing delays between multiplier blocks and RAM
blocks are particularly large due to the scarcity of these
elements compared to logic elements
o SUGGESTED TECHNIQUES TO DEAL WITH WIRE DELAYS
 Minimize the logic elements between registers and
use all of the inputs of the lookup table for each logic
element as much as possible
 Take advantage of registers
 Configure the multiplier block to have registered
inputs and outputs
 Ensure that pipeline stalling signals are driven early
in the cycle and that they come directly from a register
o CONTROL LOGIC
 Consists of flipflops and combinational logic
 In FPGAS the control logic should be kept as simple as
possible, this eliminates critical timing paths
o ADDERS
 Adders can be used freely because they
are fast and low cost
 Each logic element also supports a 1 bit adder
 Sharing FPGA adders should be avoided (the multiplexers
needed to share an adder cost more than a second adder)
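How a wide adder is built from 1-bit adders linked by a carry chain can be sketched as follows. The function names and the 8-bit default width are assumptions for this illustration; each call to `full_adder` stands for the 1-bit adder of one logic element, and the `carry` variable stands for the dedicated carry wire between neighbouring elements.

```python
def full_adder(a, b, cin):
    """1-bit adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, width=8):
    """Add two width-bit values by chaining 1-bit adders, least significant bit first."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # the final carry is the overflow out of the top bit

print(ripple_add(3, 4))      # (7, 0)
print(ripple_add(200, 100))  # (44, 1): 300 does not fit in 8 bits
```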
o MULTIPLIER BLOCKS
 They are totally reconfigurable just like any other
resource in the FPGA
 Registers can be used in the input and the output of
the block to avoid critical paths
 CRITICAL PATHS: paths between the input
values and the output values that have the
longest delay
 SOME SUGGESTED TECHNIQUES WHEN USING
MULTIPLIER BLOCKS
 High performance FPGA processors
o Can achieve a throughput of one
result per cycle because they delay
the availability of the results by two
cycles
 Requires a pipeline long
enough to absorb the latency
of the multiplication
operation
 Mid performance FPGA processor with a
shorter pipeline
o Can achieve a throughput of one
result every three cycles by stalling
the multiply instruction for 2 cycles
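The difference between the two approaches can be made concrete with a small cycle count. This is a simplified sketch that assumes a run of back-to-back, independent multiply instructions and ignores every other pipeline effect. The function names are invented for the illustration; the 2-cycle figures come from the notes above.

```python
def cycles_pipelined(n, latency=2):
    """One multiply issues per cycle; the last result appears latency cycles after it issues."""
    return n + latency if n > 0 else 0

def cycles_stalling(n, stall=2):
    """Each multiply occupies its issue cycle plus stall cycles before the next can start."""
    return n * (1 + stall)

for n in (1, 10, 100):
    print(n, cycles_pipelined(n), cycles_stalling(n))
# 1 3 3
# 10 12 30
# 100 102 300
```

For a single multiply the two designs take the same time. The longer pipeline only pays off when multiplies follow each other closely, where it approaches one result per cycle instead of one result every three cycles.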
o EQUALITY COMPARISONS
 Equality comparisons are implemented differently
based on the type of platform that is being used to
implement the design. FPGA implementation and ASIC
implementation work differently
 They still work if they are implemented in a
different way, but they will not be as fast as
possible
o INSTRUCTION DECODING
 Instructions are decoded so that the processor knows
what to do (what blocks to activate and what blocks
to not activate). In a pipelined processor, the pipeline
stages are also controlled by control signals
 Pipeline control signals should not be used in
the same stage that the instruction is decoded
(can create critical paths)
 SUGGESTED TECHNIQUES TO DEAL WITH POTENTIAL
INSTRUCTION DECODING PROBLEMS
 Unused bits can be used to provide pre-
decoded control signals to reduce critical
paths
o These are computed when an
instruction is fetched from memory
and written into the instruction cache
 Decode instructions earlier in the pipeline
than required and then pipeline them to the
required stage. (uses more flip-flops but those
are abundant in FPGAs)
 Create as many pipeline signals as possible in
the same pipeline stage (allows the synthesis
tool more opportunities to create optimal
decoding logic)
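The pre-decoding idea can be sketched in Python. Everything about the instruction format here is hypothetical (a 16-bit word with a 4-bit opcode in the top bits, and the three opcode values), chosen only to show control bits being computed once at cache-fill time and stored next to the instruction, so the pipeline stage that needs them reads a ready-made bit instead of decoding the opcode.

```python
# Hypothetical opcode values, invented for this illustration.
OP_LOAD, OP_STORE, OP_ADD = 0x1, 0x2, 0x3

def predecode(instr):
    """Compute control bits from the opcode once, when the instruction is fetched from memory."""
    opcode = (instr >> 12) & 0xF  # assumed 4-bit opcode in the top bits of a 16-bit word
    return {
        "is_mem": opcode in (OP_LOAD, OP_STORE),
        "writes_reg": opcode in (OP_LOAD, OP_ADD),
    }

def fill_cache_line(icache, addr, instr):
    """Store the instruction together with its pre-decoded bits in the instruction cache."""
    icache[addr] = (instr, predecode(instr))

icache = {}
fill_cache_line(icache, 0, 0x1234)  # opcode 0x1: a load in this made-up encoding
fill_cache_line(icache, 1, 0x3000)  # opcode 0x3: an add

instr, ctrl = icache[0]
print(ctrl)  # {'is_mem': True, 'writes_reg': True}
```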
o MULTIPLEXERS
 ASICs have muxes which are composed of gates that
are fast and have a low cost. ASICs can also have a
higher wire density when required so muxes can be
packed closer together
 In FPGAs a mux is made of logic elements, which are
slower and more expensive than the ASIC
implementation. They are also further apart because
of the fixed routing
 Processors naturally contain circuits with high
concentration of MUX for various uses
 SUGGESTED TECHNIQUES TO DEAL WITH
MULTIPLEXERS
 Omit some stages from the bypassing (The
data at that stage is passed to the stage
required)
 Only bypass one of the 2 input operands
 Provide early signals to the bypass muxes to
allocate most of the cycle for the muxing and
routing delays
 Add a pipeline stage before the GPR file write
(provides extra time to mux the write value)
o CONSTANTS
 Logic elements are more efficient at storing a constant
value than a variable value
 Whenever possible, convert run-time
parameters to generation-time constants
o BARREL SHIFTERS

FEATURES OF AN EMBEDDED PROCESSOR- SPRINGER LINK INTERNATIONAL PUBLISHING SWITZERLAND 2015

The basic elements of embedded systems are discussed below

2.1 THE COMPONENTS OF EMBEDDED SYSTEM

 An embedded system is a computer controlled device designed to perform specific tasks
that revolve around the real time control of machines or processes. They are cheaper than
general purpose systems

2.1.1 PROCESSOR

 The processor in an embedded system can be a microcontroller or a generic
microprocessor. These are programmed to do the specific tasks for which the system has
been designed

2.1.2 Memory

 There are 3 types of memory that are found within an embedded system
o RAM: A hardware component within the system that is used to temporarily store
data during the execution of the program
o ROM: Is also a hardware component that stores the information needed for the
system to work (the program)
o Cache: Used to store information from slower memory to speed up processing
times

2.1.3 System Clock

 The clock is used to synchronize all of the computer's operations. There are two ways of
doing this: an instruction or operation can be triggered on the rising edge of the clock or
on the falling edge of the clock

2.1.4 Peripherals
 Peripherals are devices that are connected to the CPU. They are not part of the computer
itself but supply the CPU with information that is needed to execute the processes

2.2 CHARACTERISTICS AND EXAMPLES OF EMBEDDED SYSTEMS

 It is quite difficult to define standards for embedded systems because the application
dictates the design choices

2.3 HARDWARE and SOFTWARE DESIGN

 A good embedded design will optimize between various metrics of design (again the
importance of the metrics is dictated by the application)
o Unit cost and NRE cost
o Size and weight
o Performance and power consumption
o Flexibility and maintainability
o Time-to-market
o Correctness
o Security of the system
 The platform on which an embedded system can be developed varies and depends on
the complexity and cost (practically the metrics mentioned)
