
VHDL as a Design Tool for VLSI Digital Circuits

Fabio Campi, STMicroelectronics, Agrate Brianza (MB)

Seminar for the course "Progetto di Sistemi Elettronici L/A" (Electronic Systems Design), academic year 2009/2010

Contacts

Fabio Campi
ST, Central CAD & Design Solutions / SignOff Team
c/o Lab. STAR, Università di Bologna, Viale C. Pepoli 3/2
Tel. 051/2093834
fabio.campi@st.com, fabio.campi@unibo.it

Opportunities (for a limited number of students):

Degree thesis in Bologna or Agrate Brianza
Recognized as a company internship (stage/tirocinio)

Outline
Signal Processing trends
The Morpheus Signal Processor:
Introduction
Re-configurable cores
System-level Interconnect Strategy
Measurements

Applications:
Motion Detection for surveillance systems
Network Routing Systems
High-resolution Film Grain Removal

STMicroelectronics: Product Portfolio

STMicroelectronics (1)

ASIC Design Flow

HDL RTL

HDL Simulation

Synthesis

Floorplan Place & Route


Parasitic Extraction / Timing Analysis

Power Integrity

LVS / DRC

ASIC Design Flow: Basic Steps


From Wikipedia:
In electronics, logic synthesis is a process by which an abstract form of desired circuit behavior (typically register transfer level, RTL) is turned into a design implementation in terms of logic gates.

Place and route is a stage in the design of integrated circuits. The first step, placement, involves deciding where to place all electronic components, circuitry, and logic elements in a generally limited amount of space. This is followed by routing, which decides the exact design of all the wires needed to connect the placed components. This step must implement all the desired connections while following the rules and limitations of the manufacturing process.

In the design of integrated circuits, signoff is the collective name given to a series of verification steps that must pass before the design can be taped out. This implies an iterative process involving incremental fixes across the board in one or more check types, followed by retesting of the design.

System-on-Chip Template Architecture

[Figure: System-on-Chip template: µP cores (ARM/PowerPC), memories, DSPs, ASIC cores and I/O connected through an interconnect (bus or Network-on-Chip)]

Full-chip re-design is too costly for almost any design project, except those with huge volumes such as general-purpose microprocessors or GPUs.
Logic synthesis has enabled re-use of elementary layout blocks (logic gates / standard cells), making architecture design essentially independent of technology.
Most SoC building blocks are technology-independent soft IP written in an HDL, which can be re-used and re-synthesized for the desired target technology.

RTL - HDL
RTL HDL is a description style of Hardware Description Languages (essentially VHDL and Verilog) that is friendly to logic synthesis. RTL is based on a strict separation between combinatorial blocks and sequential parts:
Sequential parts are mapped onto the chosen flip-flop structure.
Combinatorial parts are heavily optimized by complex reduction algorithms onto the most efficient set of available standard cells.
Synthesis is partitioned into a logic pre-processing step and a technology-aware refinement.
The basic constraints driving logic synthesis are timing, area and power. The user can tune the priorities among the three, with significantly different results.

RTL Coding

Sequential

Combinatorial
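The code examples of this slide did not survive extraction. As a hedged illustration only (a hypothetical 8-bit accumulator, not the slide's original code), the sequential/combinatorial split can be written as two processes: a combinatorial one computing the next state, and a clocked one that only holds registers. The entity name, the synchronous active-high reset and the use of ieee.numeric_std (instead of the std_logic_arith package used elsewhere in these slides) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical accumulator in two-process RTL style.
entity rtl_acc is
  port ( clk, reset : in  std_logic;             -- reset: synchronous, active-high (assumption)
         add_en     : in  std_logic;
         din        : in  unsigned(7 downto 0);
         acc        : out unsigned(7 downto 0) );
end rtl_acc;

architecture rtl of rtl_acc is
  signal acc_q, acc_d : unsigned(7 downto 0);
begin
  -- Combinatorial part: no clock, every input in the sensitivity list.
  -- Synthesis maps it onto standard cells.
  comb : process(acc_q, add_en, din)
  begin
    if add_en = '1' then
      acc_d <= acc_q + din;              -- wraps modulo 256
    else
      acc_d <= acc_q;
    end if;
  end process comb;

  -- Sequential part: only registers, updated on the rising clock edge.
  -- Synthesis maps it onto flip-flops.
  seq : process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        acc_q <= (others => '0');
      else
        acc_q <= acc_d;
      end if;
    end if;
  end process seq;

  acc <= acc_q;
end rtl;
```

Keeping all logic out of the clocked process makes it obvious which signals become flip-flops (acc_q only) and which become gates (acc_d).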


Structure of a Typical RISC Microprocessor

RISC microprocessors are very successful IPs for SoC design because their simple structure is very portable across different technologies and synthesis styles. They also offer small area and good compiler flexibility. The Harvard memory model (concurrent access to separate data and instruction memories) is almost always exploited to ensure higher performance.


Basic Building Blocks: (1) Generic Register


[Figure: register symbol with inputs D, EN and output Q]

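library ieee;
use ieee.std_logic_1164.all;

-- 32-bit register with asynchronous reset and active-low enable
entity Data_Reg is
  generic ( reset_active : std_logic := '0' );  -- reset polarity (assumed active-low;
                                                -- the original takes it from a project package)
  port ( clk, reset, EN : in  std_logic;
         D              : in  std_logic_vector(31 downto 0);
         Q              : out std_logic_vector(31 downto 0) );
end Data_Reg;

architecture behavioral of Data_Reg is
begin
  process(CLK, reset)
  begin
    if reset = reset_active then
      Q <= (others => '0');              -- asynchronous clear
    elsif CLK'event and CLK = '1' then   -- rising clock edge
      if EN = '0' then                   -- enable is active-low
        Q <= D;
      end if;
    end if;
  end process;
end behavioral;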


Basic Building Blocks: (2) Register File


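-- Architecture body fragment. Declarations not shown: reg_in and reg_out are
-- arrays (1 to 31) of std_logic_vector(31 downto 0); Conv_Integer and unsigned
-- come from ieee.std_logic_arith.
Registers: for i in 1 to 31 generate
  rx : data_reg port map (clk, reset, enable, reg_in(i), reg_out(i));
end generate Registers;

-- Read port A: register 0 always reads as zero
READ_A_MUX: process(reg_out, ra)
begin
  if Conv_Integer(unsigned(ra)) = 0 then
    a_out <= (others => '0');
  else
    a_out <= reg_out(Conv_Integer(unsigned(ra)));
  end if;
end process;

-- Read port B: register 0 always reads as zero
READ_B_MUX: process(reg_out, rb)
begin
  if Conv_Integer(unsigned(rb)) = 0 then
    b_out <= (others => '0');
  else
    b_out <= reg_out(Conv_Integer(unsigned(rb)));
  end if;
end process;

-- Write port: only register rd1 loads d1_in, all others reload their own value
-- (rd1 = 0 selects no register, so writes to r0 are discarded)
WRITE_D_MUX: process(rd1, d1_in, reg_out)
begin
  for i in 1 to 31 loop
    if i = Conv_Integer(unsigned(rd1)) then
      reg_in(i) <= d1_in;
    else
      reg_in(i) <= reg_out(i);
    end if;
  end loop;
end process;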
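The fragment above builds the register file from 31 Data_Reg instances plus explicit multiplexers. For comparison, here is a hedged, self-contained sketch of the same behavior (register 0 reads as zero, writes to register 0 are discarded) written as an inferred register array. The entity and port names, the active-high write enable `we` (the original uses a shared active-low enable) and the use of ieee.numeric_std are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 32 x 32-bit register file: two read ports, one write port.
entity rfile_sketch is
  port ( clk          : in  std_logic;
         we           : in  std_logic;                      -- active-high write enable (assumption)
         ra, rb       : in  std_logic_vector(4 downto 0);   -- read addresses
         rd1          : in  std_logic_vector(4 downto 0);   -- write address
         d1_in        : in  std_logic_vector(31 downto 0);
         a_out, b_out : out std_logic_vector(31 downto 0) );
end rfile_sketch;

architecture rtl of rfile_sketch is
  type reg_array is array (1 to 31) of std_logic_vector(31 downto 0);
  signal regs : reg_array := (others => (others => '0'));

  -- Read helper: register 0 is hard-wired to zero.
  function read_reg(r : reg_array; addr : std_logic_vector) return std_logic_vector is
    variable i : natural := to_integer(unsigned(addr));
  begin
    if i = 0 then
      return x"00000000";
    end if;
    return r(i);
  end function;
begin
  -- Synchronous write; address 0 is ignored.
  write_port : process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' and unsigned(rd1) /= 0 then
        regs(to_integer(unsigned(rd1))) <= d1_in;
      end if;
    end if;
  end process;

  -- Combinatorial (asynchronous) reads.
  a_out <= read_reg(regs, ra);
  b_out <= read_reg(regs, rb);
end rtl;
```

Both descriptions should synthesize to 31 x 32 flip-flops plus read multiplexers; the array form is shorter, while the explicit form makes the structure visible.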

[Figure: register file R1 ... R31 with write data input d1_in and read outputs a_out, b_out]


Basic Building Blocks: (3) ALU


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;   -- signed/unsigned operators and EXT
-- Risc_Alucode, the xi_alu_* codes and word_width come from a project package (not shown)

entity Main_alu is
  port ( in_a     : in  std_logic_vector(31 downto 0);
         in_b     : in  std_logic_vector(31 downto 0);
         op       : in  Risc_Alucode;
         result   : out std_logic_vector(31 downto 0);
         overflow : out std_logic );
end Main_alu;

architecture structural of Main_alu is
begin
  process(in_a, in_b, op)
  begin
    if (op = xi_alu_add) or (op = xi_alu_addu) then
      result <= signed(in_a) + signed(in_b);
    elsif (op = xi_alu_sub) or (op = xi_alu_subu) then
      result <= signed(in_a) - signed(in_b);
    elsif op = xi_alu_slt then          -- set on less than (signed)
      if signed(in_a) < signed(in_b) then
        result <= EXT("1", word_width);
      else
        result <= EXT("0", word_width);
      end if;
    elsif op = xi_alu_sltu then         -- set on less than (unsigned)
      if unsigned(in_a) < unsigned(in_b) then
        result <= EXT("1", word_width);
      else
        result <= EXT("0", word_width);
      end if;
    elsif op = xi_alu_and then
      result <= in_a and in_b;
    elsif op = xi_alu_or then
      result <= in_a or in_b;
    elsif op = xi_alu_xor then
      result <= in_a xor in_b;
    elsif op = xi_alu_nor then
      result <= in_a nor in_b;
    else
      result <= in_a;
    end if;
  end process;
end structural;

[Figure: ALU symbol: operations (slt, nor, ...) selected by OP, output result]


ALU Detail: Overflow Logic Specification


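OUTPUT_OVERFLOW: process(in_a, in_b, sum, diff, op)
begin
  if op = xi_alu_add then
    -- no overflow if the operands have different signs, or the sum keeps their sign
    if (in_a(31) /= in_b(31)) or (in_a(word_width-1) = sum(word_width-1)) then
      overflow <= '0';
    else
      overflow <= '1';
    end if;
  elsif op = xi_alu_sub then
    -- no overflow if the operands have the same sign, or the difference keeps the sign of in_a
    if (in_a(31) = in_b(31)) or (in_a(31) = diff(31)) then
      overflow <= '0';
    else
      overflow <= '1';
    end if;
  else
    overflow <= '0';   -- addu, subu and all other operations never signal overflow
  end if;
end process;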
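The two conditions in the process above are the usual two's-complement overflow rules. Writing a and b for in_a and in_b, s for sum, d for diff and taking bit 31 as the sign bit, the overflow flag computed by the process is:

```latex
\begin{aligned}
V_{\mathrm{add}} &= \overline{a_{31} \oplus b_{31}} \cdot \left(a_{31} \oplus s_{31}\right), & s &= a + b \\
V_{\mathrm{sub}} &= \left(a_{31} \oplus b_{31}\right) \cdot \left(a_{31} \oplus d_{31}\right), & d &= a - b
\end{aligned}
```

Worked examples: for a = 0x7FFFFFFF and b = 0x00000001, s = 0x80000000, so a31 = b31 = 0 and s31 = 1, giving V_add = 1 · 1 = 1. For a = 0x80000000 and b = 0x00000001, d = 0x7FFFFFFF, so a31 = 1, b31 = 0 and d31 = 0, giving V_sub = 1 · 1 = 1.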

Definition of RISC Instruction Set


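Instruction formats (bit positions):

Register/register:   | OP 31..26 | RS 25..21 | RT 20..16          | RD 15..11 | (shift amount 10..6) | OPX 5..0 |
Register/immediate:  | OP 31..26 | RS 25..21 | RT 20..16          | Immediate 15..0                           |
Branch:              | OP 31..26 | RS 25..21 | Branch type 20..16 | Immediate 15..0                           |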
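Since every format places its fields at fixed bit positions, field extraction is pure wiring and costs no logic. A minimal sketch, assuming a hypothetical entity `instr_fields` with one output per field (the decode process on the next slide does the same slicing with variables). For the example word x"00851820" it yields op = 0, rs = 4, rt = 5, rd = 3, shamt = 0, opx = 0x20 and immed16 = 0x1820; which fields are meaningful depends on the opcode.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical field extractor for the three instruction formats.
entity instr_fields is
  port ( instr   : in  std_logic_vector(31 downto 0);
         op      : out std_logic_vector(5 downto 0);    -- bits 31..26
         rs      : out std_logic_vector(4 downto 0);    -- bits 25..21
         rt      : out std_logic_vector(4 downto 0);    -- bits 20..16 (branch type in branches)
         rd      : out std_logic_vector(4 downto 0);    -- bits 15..11
         shamt   : out std_logic_vector(4 downto 0);    -- bits 10..6
         opx     : out std_logic_vector(5 downto 0);    -- bits 5..0
         immed16 : out std_logic_vector(15 downto 0) ); -- bits 15..0
end instr_fields;

architecture rtl of instr_fields is
begin
  -- Only wires: synthesis generates no gates for these assignments.
  op      <= instr(31 downto 26);
  rs      <= instr(25 downto 21);
  rt      <= instr(20 downto 16);
  rd      <= instr(15 downto 11);
  shamt   <= instr(10 downto 6);
  opx     <= instr(5 downto 0);
  immed16 <= instr(15 downto 0);
end rtl;
```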


Instruction decode Logic: ALU Operation


process(instr)
  -- variable declarations (op, opx, op_branch, rs, rt, rd, shamt, immed16) omitted
begin
  op        := instr(31 downto 26);
  opx       := instr( 5 downto 0);
  op_branch := instr(20 downto 16);
  -- Addressed registers:
  rs := instr(25 downto 21);
  rt := instr(20 downto 16);
  rd := instr(15 downto 11);
  -- Shift amount field:
  shamt := instr(10 downto 6);
  -- 16-bit immediate operand
  -- (used for register/immediate ALU operations):
  immed16 := instr(15 downto 0);

  if op = xi_add then
    writeback_reg  <= rd;
    alu_command    <= (op => opx, isel => '1', immed => (others => '0'), hrdwit => '1');
    shift_op       <= xi_shift_sll(2 downto 0);
    exe_outsel     <= exeout_alu;
    mem_command    <= (mr => '1', mw => '1', mb => '1', mh => '1', sign => '1');  -- no memory access (strobes appear active-low)
    illegal_opcode <= '1';               -- appears active-low: opcode is legal
    jump_type      <= xi_branch_carryon;
    mul_command    <= xi_mul_nop;
  [..]

Instruction Decode Logic: Memory access operations


[Figure: memory-access datapath: ALU controlled by Alu_command, REGISTER FILE, DATA MEMORY, and the Exe_outsel selection]

  elsif op = xi_lw then
    writeback_reg  <= rt;                -- loads write into rt
    alu_command    <= (op => xi_alu_add, isel => '0', immed => SXT(immed16, word_width));  -- address = rs + sign-extended offset
    shift_op       <= xi_shift_sll(2 downto 0);
    exe_outsel     <= exeout_mem;        -- write-back value comes from the data memory
    mem_command    <= (mr => '0', mw => '1', mb => '1', mh => '1', sign => '0');  -- memory read enabled
    illegal_opcode <= '1';
    jump_type      <= xi_branch_carryon;
    mul_command    <= xi_mul_nop;
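For a load, the ALU adds the base register to the sign-extended 16-bit offset, which the decode selects with SXT(immed16, word_width). A minimal sketch of that address computation, assuming a hypothetical entity `load_addr` and using ieee.numeric_std, where resize() on a signed value performs the same sign extension as SXT:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical effective-address adder for loads: addr = base + SXT(offset).
entity load_addr is
  port ( base    : in  std_logic_vector(31 downto 0);   -- contents of rs
         immed16 : in  std_logic_vector(15 downto 0);   -- offset field of the instruction
         addr    : out std_logic_vector(31 downto 0) ); -- effective address
end load_addr;

architecture rtl of load_addr is
begin
  -- resize() replicates the sign bit of the signed offset up to 32 bits
  addr <= std_logic_vector(signed(base) + resize(signed(immed16), 32));
end rtl;
```

For example, base x"00001000" with offset x"FFFC" (-4) gives x"00000FFC", and with offset x"0010" (+16) gives x"00001010".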

Instruction decode Logic: BRANCH Operations


[Figure: branch datapath: +4 incrementer, Instruction Memory, ALU controlled by Alu_command with the Immediate operand, +Immed adder, Jump_type]
[..]
  if op = xi_branch then
    writeback_reg  <= r0;                -- no register write-back
    alu_command    <= (op => xi_alu_add, isel => '0', immed => SXT(immed16, word_width-2) & "00");  -- word offset scaled to bytes
    shift_op       <= xi_shift_sll(2 downto 0);
    exe_outsel     <= exeout_alu;
    mem_command    <= (mr => '1', mw => '1', mb => '1', mh => '1', sign => '1');  -- no memory access
    illegal_opcode <= '1';
    jump_type      <= EXT(op_branch, 6); -- branch type taken from bits 20..16
    mul_command    <= xi_mul_nop;
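The branch immediate built by the decode, SXT(immed16, word_width-2) & "00", is the 16-bit word offset sign-extended and multiplied by 4, i.e. a byte offset to a word-aligned target. The figure shows a +4 incrementer and a +Immed adder; the sketch below assumes, without confirmation from the slide, that the offset is added to the incremented program counter. The entity `branch_target` and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical branch-target adder: target = pc_plus4 + 4 * SXT(immed16).
entity branch_target is
  port ( pc_plus4 : in  std_logic_vector(31 downto 0);  -- assumed base: PC + 4
         immed16  : in  std_logic_vector(15 downto 0);  -- signed word offset
         target   : out std_logic_vector(31 downto 0) );
end branch_target;

architecture rtl of branch_target is
  signal offset : std_logic_vector(31 downto 0);
begin
  -- sign-extend to 30 bits, then append "00" (multiply by 4), as in the decode code
  offset <= std_logic_vector(resize(signed(immed16), 30)) & "00";
  -- two's-complement addition: negative offsets wrap correctly modulo 2**32
  target <= std_logic_vector(unsigned(pc_plus4) + unsigned(offset));
end rtl;
```

For example, pc_plus4 = x"00000100" with immed16 = x"FFFE" (-2 words) gives x"000000F8", and with immed16 = x"0003" gives x"0000010C".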

5-Stage Pipeline Concept

In order to increase the operating frequency, different computation steps, or stages, can be performed concurrently: in each clock cycle every stage works on a different instruction, so the clock period only has to cover the slowest stage instead of the whole computation. The most common structures have 3 stages (Decode, Execute, WB) or 5/6 stages (Fetch, Decode, Execute, Memory Access, WB1, WB2). Advanced DSP architectures for embedded systems feature about 10 to 12 pipeline stages, and general-purpose microprocessor chips can reach 30.
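A minimal illustration of pipeline separation registers, assuming a hypothetical entity `pipe2_add3` that computes y = a + b + c in two stages. Each stage has a single adder on its register-to-register path, a new operand set can enter every cycle, and each result appears two clock edges after its operands. Note that c must travel through a pipeline register together with the partial sum, just as control and operand fields travel with an instruction in a processor pipeline.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined three-operand adder.
entity pipe2_add3 is
  port ( clk     : in  std_logic;
         a, b, c : in  unsigned(15 downto 0);
         y       : out unsigned(15 downto 0) );
end pipe2_add3;

architecture rtl of pipe2_add3 is
  -- pipeline separation registers between stage 1 and stage 2
  signal sum1_q, c1_q : unsigned(15 downto 0) := (others => '0');
  -- output register of stage 2
  signal y_q          : unsigned(15 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sum1_q <= a + b;            -- stage 1: first addition
      c1_q   <= c;                -- c moves along with its operand set
      y_q    <= sum1_q + c1_q;    -- stage 2: uses the values registered one cycle earlier
    end if;
  end process;
  y <= y_q;
end rtl;
```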

Data Hazard Handling


[Figure: example instruction sequences in the pipeline: Add r3,r4; Lw r3; Nop]

In a pipelined architecture, an instruction may need a result before it is actually available. In this case, specific stall/bypass logic is built to preserve data consistency.
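One case that a bypass alone cannot solve is a load followed by an instruction that uses the loaded register. Assuming, as in a classic five-stage pipeline, that load data is only available at the end of the memory stage, the dependent instruction must wait one cycle (the figure appears to show such a Nop after the load). A hedged sketch of the detection logic; the entity and port names are hypothetical, and comparing rt unconditionally is conservative, since for some formats rt is a destination rather than a source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical load-use interlock: stall Decode for one cycle when the load in
-- Execute writes a register that the instruction in Decode reads.
entity load_use_stall is
  port ( ex_is_load : in  std_logic;                    -- '1' if Execute holds a load
         ex_rd      : in  std_logic_vector(4 downto 0); -- destination of that load
         dec_rs     : in  std_logic_vector(4 downto 0); -- registers read in Decode
         dec_rt     : in  std_logic_vector(4 downto 0);
         stall      : out std_logic );                  -- '1': hold Decode, insert a nop
end load_use_stall;

architecture rtl of load_use_stall is
begin
  process(ex_is_load, ex_rd, dec_rs, dec_rt)
  begin
    -- register 0 is constant zero and never causes a hazard
    if ex_is_load = '1' and ex_rd /= "00000"
       and (ex_rd = dec_rs or ex_rd = dec_rt) then
      stall <= '1';
    else
      stall <= '0';
    end if;
  end process;
end rtl;
```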

Data Hazard Handling: Bypass

[Figure: bypass datapath: RFILE, bypass selection mux, Decode, Execute (ALU), DATA MEM (Memory stage); boxes mark the pipeline separation registers]

if (source_reg = ex_rd) and (ex_rd /= r0) then
  byp_control <= Main_exe;      -- most recent result: Execute stage
elsif (source_reg = mem_rd) and (mem_rd /= r0) then
  byp_control <= Main_mem;      -- older result: Memory stage
else
  byp_control <= no_bypass;     -- operand read from the register file
end if;
[..]
-- BYPASS SELECTION MULTIPLEXERS
BYP_OP_MUXA: BYPASS_MUX port map (byp_controlA, rfile_out1, in_regM, rfile_in, dpath_rega);
BYP_OP_MUXB: BYPASS_MUX port map (byp_controlB, rfile_out2, in_regM, rfile_in, dpath_regb);
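The body of BYPASS_MUX is not shown. A minimal, self-contained sketch of one bypass channel, combining the priority selection above with the multiplexer: when both stages write the source register, the Execute stage wins because it holds the more recent result, and register 0 is never bypassed. The entity `bypass_channel`, its ports and the data inputs ex_result and mem_result are hypothetical; they do not reproduce the original BYPASS_MUX port order.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bypass channel for one source operand.
entity bypass_channel is
  port ( source_reg : in  std_logic_vector(4 downto 0);   -- register read in Decode
         ex_rd      : in  std_logic_vector(4 downto 0);   -- destination in Execute
         mem_rd     : in  std_logic_vector(4 downto 0);   -- destination in Memory
         rfile_out  : in  std_logic_vector(31 downto 0);  -- value from the register file
         ex_result  : in  std_logic_vector(31 downto 0);  -- result leaving Execute
         mem_result : in  std_logic_vector(31 downto 0);  -- result leaving Memory
         operand    : out std_logic_vector(31 downto 0) );
end bypass_channel;

architecture rtl of bypass_channel is
  type byp_type is (no_bypass, main_exe, main_mem);
  signal byp_control : byp_type;
begin
  select_logic : process(source_reg, ex_rd, mem_rd)
  begin
    if source_reg = ex_rd and ex_rd /= "00000" then
      byp_control <= main_exe;
    elsif source_reg = mem_rd and mem_rd /= "00000" then
      byp_control <= main_mem;
    else
      byp_control <= no_bypass;
    end if;
  end process;

  with byp_control select
    operand <= ex_result  when main_exe,
               mem_result when main_mem,
               rfile_out  when no_bypass;
end rtl;
```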



Conclusion

The introduction of HDLs, and in particular of logic synthesis, has immensely accelerated the design of digital circuits. There is a price in performance, but it is largely made up for by the simplification in:
dealing with increasingly complex DRC rules;
making technology porting a trivial task.

The immediate consequences of this breakthrough in the mid-1990s were:
the development of innovative design architectures;
the emergence of the concept of IP reuse;
the emergence and success of FPGAs.
