Contacts
Fabio Campi
ST, Central CAD & Design Solutions / SignOff Team
c/o Lab. STAR, Università di Bologna, Viale C. Pepoli 3/2
Tel. 051/2093834
fabio.campi@st.com, fabio.campi@unibo.it
Outline
Signal Processing trends
The Morpheus Signal Processor:
  Introduction
  Re-configurable cores
  System-level Interconnect Strategy
  Measurements
Applications:
  Motion Detection for surveillance systems
  Network Routing Systems
  High-resolution Film Grain Removal
ST Microelectronics (1)
[Figure: design flow steps: HDL RTL, HDL simulation, synthesis, power integrity, LVS / DRC]
[Figure: SoC building blocks: µP (ARM/PowerPC), memory, DSPs, ASIC cores, I/O]
Full-chip redesign is too costly for almost any design project; the only exceptions are very high-volume products such as general-purpose µPs or GPUs.
Logic synthesis has enabled the re-use of elementary layout blocks (logic gates / standard cells), making architecture design essentially independent of technology.
Most SoC building blocks are technology-independent soft IP written in HDL, which can be re-used and re-synthesized for the desired target technology.
RTL - HDL
RTL HDL is a description style for hardware description languages (essentially VHDL and Verilog) that is friendly to logic synthesis.
RTL is based on the strict separation between combinatorial blocks and sequential parts.
Sequential parts are mapped onto the chosen flip-flop (FF) structure.
Combinatorial parts are heavily optimized by complex reduction algorithms and mapped onto the most efficient set of standard cells available.
Synthesis is partitioned into a logic pre-processing step and a technology-aware refinement step.
The basic constraints that drive logic synthesis are timing, area and power. The user can tune the priorities among the three, which leads to significantly different results.
RTL Coding
Sequential
Combinatorial
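The separation between combinatorial and sequential parts described above can be illustrated with a minimal two-process sketch. The entity name `accumulator`, its ports and the active-high reset are assumptions made for this example and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: an 8-bit accumulator in two-process RTL style.
entity accumulator is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;  -- asynchronous, active high (assumption)
    din   : in  std_logic_vector(7 downto 0);
    acc   : out std_logic_vector(7 downto 0)
  );
end entity accumulator;

architecture rtl of accumulator is
  signal acc_reg  : unsigned(7 downto 0);  -- state held in flip-flops
  signal acc_next : unsigned(7 downto 0);  -- next state, purely combinatorial
begin
  -- Combinatorial part: no clock, depends only on the current inputs.
  comb : process (acc_reg, din)
  begin
    acc_next <= acc_reg + unsigned(din);
  end process comb;

  -- Sequential part: only stores the value computed above.
  seq : process (clk, reset)
  begin
    if reset = '1' then
      acc_reg <= (others => '0');
    elsif rising_edge(clk) then
      acc_reg <= acc_next;
    end if;
  end process seq;

  acc <= std_logic_vector(acc_reg);
end architecture rtl;
```

With this style a synthesis tool can map `seq` directly onto flip-flops and optimize `comb` as plain logic.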
RISC microprocessors are very successful IPs for SoC design because their simple structure is very portable across different technologies and synthesis styles.
They also offer small area and good compiler flexibility.
The Harvard memory model (concurrent access to separate data and instruction memories) is almost always exploited to ensure higher performance.
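library ieee;
use ieee.std_logic_1164.all;

entity Data_Reg is
  port (
    clk, reset, EN : in  std_logic;
    D              : in  std_logic_vector(31 downto 0);
    Q              : out std_logic_vector(31 downto 0)
  );
end Data_Reg;

architecture behavioral of Data_Reg is
  -- Assumption: reset_active is not defined on the slide;
  -- an active-low reset is used here for illustration.
  constant reset_active : std_logic := '0';
begin
  process (clk, reset)
  begin
    if reset = reset_active then
      Q <= x"00000000";          -- asynchronous reset
    elsif clk'event and clk = '1' then
      if EN = '0' then           -- enable is active low, as on the slide
        Q <= D;
      end if;
    end if;
  end process;
end behavioral;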
[Figure: register file: input d1_in, registers R1 ... R31, outputs a_out and b_out]
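A register file with the ports named in the figure (`d1_in`, `a_out`, `b_out`) could be sketched as below. The address ports, the write enable and its polarity are hypothetical, and treating register 0 as a constant zero is an assumption suggested by the figure showing only R1 to R31; none of this is taken from the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: 32 x 32-bit register file, one write and two read ports.
entity rfile_sketch is
  port (
    clk    : in  std_logic;
    we     : in  std_logic;                      -- write enable, active high (assumption)
    wr_adr : in  std_logic_vector(4 downto 0);   -- write address (assumed name)
    ra_adr : in  std_logic_vector(4 downto 0);   -- read address A (assumed name)
    rb_adr : in  std_logic_vector(4 downto 0);   -- read address B (assumed name)
    d1_in  : in  std_logic_vector(31 downto 0);
    a_out  : out std_logic_vector(31 downto 0);
    b_out  : out std_logic_vector(31 downto 0)
  );
end entity rfile_sketch;

architecture rtl of rfile_sketch is
  -- Entry 0 is declared only to keep indexing simple; it is never written.
  type reg_array is array (0 to 31) of std_logic_vector(31 downto 0);
  signal regs : reg_array;
begin
  -- Sequential part: synchronous write, ignored for register 0.
  write_port : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' and unsigned(wr_adr) /= 0 then
        regs(to_integer(unsigned(wr_adr))) <= d1_in;
      end if;
    end if;
  end process write_port;

  -- Combinatorial part: two read ports, register 0 always reads as zero.
  a_out <= (others => '0') when unsigned(ra_adr) = 0
           else regs(to_integer(unsigned(ra_adr)));
  b_out <= (others => '0') when unsigned(rb_adr) = 0
           else regs(to_integer(unsigned(rb_adr)));
end architecture rtl;
```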
[Figure: ALU: operations such as slt and nor, selected by OP, producing result]
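As a rough illustration of the ALU in the figure, the sketch below selects among a few operations, including `slt` and `nor`, through an `op` input. The 2-bit operation encoding and the entity name are invented for this example; the encoding used by the actual processor is not shown on the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of a purely combinatorial ALU.
entity alu_sketch is
  port (
    op     : in  std_logic_vector(1 downto 0);   -- assumed encoding, see below
    a, b   : in  std_logic_vector(31 downto 0);
    result : out std_logic_vector(31 downto 0)
  );
end entity alu_sketch;

architecture rtl of alu_sketch is
begin
  process (op, a, b)
  begin
    case op is
      when "00" =>                               -- add
        result <= std_logic_vector(signed(a) + signed(b));
      when "01" =>                               -- and
        result <= a and b;
      when "10" =>                               -- nor
        result <= a nor b;
      when others =>                             -- slt: set on (signed) less than
        result <= (others => '0');
        if signed(a) < signed(b) then
          result(0) <= '1';
        end if;
    end case;
  end process;
end architecture rtl;
```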
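Register/register format:
  Bits    31-26   25-21   20-16   15-11   10-6    5-0
  Field   OP      RS      RT      RD      SHAMT   OPX

Register/immediate format:
  Bits    31-26   25-21   20-16   15-0
  Field   OP      RS      RT      Immediate

Branch format:
  Bits    31-26   25-21   20-16         15-0
  Field   OP      RS      Branch type   Immediate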
op        := instr(31 downto 26);
opx       := instr( 5 downto  0);
op_branch := instr(20 downto 16);
-- Addressed Registers:
rs := instr(25 downto 21);
rt := instr(20 downto 16);
rd := instr(15 downto 11);
-- Shift Amount Field:
shamt := instr(10 downto 6);
-- 16-bit immediate operand:
-- (Used for Register/Immediate Alu operations)
immed16 := instr(15 downto 0);

if op = xi_add then
  writeback_reg  <= rd;
  alu_command    <= (op=>opx, isel=>'1', immed=>(others => '0'), hrdwit=>'1');
  shift_op       <= xi_shift_sll(2 downto 0);
  exe_outsel     <= exeout_alu;
  mem_command    <= (mr=>'1', mw=>'1', mb=>'1', mh=>'1', sign=>'1');
  illegal_opcode <= '1';
  jump_type      <= xi_branch_carryon;
  mul_command    <= xi_mul_nop;
[ ..]
[Figure: datapath blocks: ALU, Exe_outsel, register file, data memory]
elsif op = xi_lw then
  writeback_reg  <= rt;
  alu_command    <= (op=>xi_alu_add, isel=>'0', immed=>SXT(immed16,word_width));
  shift_op       <= xi_shift_sll(2 downto 0);
  exe_outsel     <= exeout_mem;
  mem_command    <= (mr=>'0', mw=>'1', mb=>'1', mh=>'1', sign=>'0');
  illegal_opcode <= '1';
  jump_type      <= xi_branch_carryon;
  mul_command    <= xi_mul_nop;
[Figure: branch datapath: Alu_command, Immediate, ALU, Jump_type, +Immed]
[..]
if op = xi_branch then
  writeback_reg  <= r0;
  alu_command    <= (op=>xi_alu_add, isel=>'0', immed=>SXT(immed16,word_width-2)&("00"));
  shift_op       <= xi_shift_sll(2 downto 0);
  exe_outsel     <= exeout_alu;
  mem_command    <= (mr=>'1', mw=>'1', mb=>'1', mh=>'1', sign=>'1');
  illegal_opcode <= '1';
  jump_type      <= EXT(op_branch,6);
  mul_command    <= xi_mul_nop;
To increase the operating frequency, the computation can be split into steps, or stages, that work concurrently on successive instructions.
The most common structures have 3 stages (Decode, Execute, WB) or 5/6 stages (Fetch, Decode, Execute, Mem Access, WB1, WB2).
Advanced DSP architectures for embedded systems feature about 10 to 12 pipeline stages; general-purpose µP chips can reach 30.
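The idea of splitting a computation into concurrent stages can be shown on a much smaller scale than a processor. The sketch below, with an invented entity `mac_pipe`, computes a*b + c in two pipeline stages; it is only an illustration of the principle and is not part of the processor described in these slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: y = a*b + c computed in two pipeline stages.
entity mac_pipe is
  port (
    clk     : in  std_logic;
    a, b, c : in  unsigned(7 downto 0);
    y       : out unsigned(16 downto 0)
  );
end entity mac_pipe;

architecture rtl of mac_pipe is
  signal prod_r : unsigned(15 downto 0);  -- stage 1 result register
  signal c_r    : unsigned(7 downto 0);   -- c delayed to stay aligned with prod_r
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: multiply, and carry c along to the next stage.
      prod_r <= a * b;
      c_r    <= c;
      -- Stage 2: add, using the values registered one cycle earlier.
      y <= resize(prod_r, 17) + resize(c_r, 17);
    end if;
  end process;
end architecture rtl;
```

Each result appears two clock cycles after its operands, but a new set of operands can be accepted every cycle, and the critical path is only a multiplier or an adder rather than both in series.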
In a pipelined architecture, a result may be required by a later instruction before it has been written back and is actually available.
In this case, dedicated stall/bypass logic is built to preserve data consistency.
[Figure: bypass paths between RFILE, Decode and ALU]
if (source_reg = ex_rd) and (ex_rd /= r0) then
  byp_control <= Main_exe;
elsif (source_reg = mem_rd) and (mem_rd /= r0) then
  byp_control <= Main_mem;
else
  byp_control <= no_bypass;
end if;
[..]
-- BYPASS SELECTION MULTIPLEXERS
BYP_OP_MUXA: BYPASS_MUX
  port map ( byp_controlA, rfile_out1, in_regM, rfile_in, dpath_rega );
BYP_OP_MUXB: BYPASS_MUX
  port map ( byp_controlB, rfile_out2, in_regM, rfile_in, dpath_regb );
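The body of `BYPASS_MUX` is not shown on the slides. A possible sketch follows, under the assumption that the control signal is an enumerated type with the three values used above, that `Main_exe` selects the value leaving the execute stage, and that `Main_mem` selects the value leaving the memory stage. The package, entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the bypass control type.
package bypass_pkg is
  type byp_control_t is (no_bypass, Main_exe, Main_mem);
end package bypass_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.bypass_pkg.all;

-- Hypothetical sketch of a bypass multiplexer for one operand.
entity bypass_mux_sketch is
  port (
    byp_control : in  byp_control_t;
    rfile_out   : in  std_logic_vector(31 downto 0);  -- value read from the register file
    exe_result  : in  std_logic_vector(31 downto 0);  -- result still in the execute stage
    mem_result  : in  std_logic_vector(31 downto 0);  -- result still in the memory stage
    operand     : out std_logic_vector(31 downto 0)   -- operand sent to the datapath
  );
end entity bypass_mux_sketch;

architecture rtl of bypass_mux_sketch is
begin
  -- Pick the most recent copy of the register value.
  with byp_control select
    operand <= exe_result when Main_exe,
               mem_result when Main_mem,
               rfile_out  when no_bypass;
end architecture rtl;
```

The execute-stage match is tested first in the control logic because it holds the most recent result when both stages target the same register.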
Conclusion
The introduction of HDLs, and in particular of logic synthesis, has immensely shortened the design time of digital circuits.
There is a price in performance, but it is largely made up for by the simplification in:
  Dealing with increasingly complex DRC rules
  Making technology porting a trivial task