
RISC-V “Rocket Chip” SoC Generator in Chisel


Yunsup Lee
UC Berkeley
yunsup@eecs.berkeley.edu
What is the Rocket Chip SoC Generator?

• Parameterized SoC generator written in Chisel
• Generates Tiles
  - (Rocket) Core + Private Caches
• Generates Uncore (Outer Memory System)
  - Coherence Agent
  - Shared Caches
  - DMA Engines
  - Memory Controllers
• Glues all the pieces together

“Rocket Chip” SoC Generator

• Generates n Tiles
  - (Rocket) Core
  - RoCC Accelerator
  - L1 I$
  - L1 D$
• Generates HTIF
  - Host DMA Engine
• Generates Uncore
  - L1 Crossbar
  - Coherence Manager
  - Exports MemIO Interface

[Figure: two Rocket Tiles, each with a Core, RoCC Accel., FPU, and L1 Inst and L1 Data caches (configurable sets, ways); HTIF; L1 Network; Coherence Manager; TileLink / MemIO Converter.]

Why SoC Generators?

• Helps tune the design under different performance, power, and area constraints, and across diverse technology nodes
• Parameters include:
  - number of cores
  - instantiation of floating-point units, vector units
  - cache sizes, associativity, number of TLB entries, cache-coherence protocol
  - number of floating-point pipeline stages
  - width of off-chip I/O, and more

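The parameter list above can be pictured as a single configuration record from which design points are derived. The following is a minimal Python sketch of that idea only; it is not the generator's Chisel parameter API. Every field name and default value here is hypothetical and chosen for illustration, not taken from Rocket Chip.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SoCConfig:
    """Hypothetical knobs mirroring the slide's parameter list (illustrative values)."""
    n_cores: int = 1
    has_fpu: bool = True
    has_vector_unit: bool = False
    l1d_sets: int = 64
    l1d_ways: int = 4
    cache_block_bytes: int = 64
    n_tlb_entries: int = 8
    coherence_protocol: str = "MESI"
    fp_pipeline_stages: int = 3
    offchip_io_width_bits: int = 16

    def l1d_bytes(self) -> int:
        """L1 data cache capacity implied by sets x ways x block size."""
        return self.l1d_sets * self.l1d_ways * self.cache_block_bytes

    def validate(self) -> None:
        """Reject configurations a cache generator could not index cleanly."""
        if self.n_cores < 1:
            raise ValueError("need at least one core")
        for name in ("l1d_sets", "cache_block_bytes"):
            value = getattr(self, name)
            if value < 1 or value & (value - 1):
                raise ValueError(f"{name} must be a power of two, got {value}")


# Two design points derived from the same description.
small = SoCConfig()
big = replace(small, n_cores=4, l1d_sets=128, has_vector_unit=True)

for cfg in (small, big):
    cfg.validate()
    print(cfg.n_cores, "core(s),", cfg.l1d_bytes() // 1024, "KiB L1 D$")
```

Deriving `big` from `small` with `replace` mirrors the motivation on the slide: one description, many design points tuned for different constraints.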
Why Chisel?

• RTL generator written in Chisel
  - HDL embedded in Scala
• Full power of Scala for writing generators
  - object-oriented programming
  - functional programming

[Figure: a Chisel Program runs on Scala/JVM and emits C++ code, FPGA Verilog, and ASIC Verilog. C++ code goes through a C++ Compiler to a Software Simulator; FPGA Verilog goes through FPGA Tools to FPGA Emulation; ASIC Verilog goes through ASIC Tools to a GDS Layout.]

Rocket Scalar Core

[Figure: Rocket Pipeline. Stage labels: PC, IF, ID, EX, MEM, WB. Block labels: PC Gen., ITLB, I$ Access, Inst. Decode, Int.RF, Int.EX, DTLB, D$ Access, Commit, FP.RF, FP.EX1, FP.EX2, FP.EX3, with outputs to the RoCC Accelerator and to Hwacha. Bypass paths omitted for simplicity.]
[Figure: Hwacha Pipeline, fed from Rocket. Block labels: PC Gen., VITLB, VI$ Access, VInst. Decode, Sequencer, Expand, Bank1 ... Bank8 R/W.]

- 64-bit 5-stage single-issue in-order pipeline
- Design minimizes impact of long clock-to-output delays of compiler-generated RAMs
- 64-entry BTB, 256-entry BHT, 2-entry RAS
- MMU supports page-based virtual memory
- IEEE 754-2008-compliant FPU
  - Supports SP, DP FMA with hardware support for subnormals

ARM Cortex-A5 vs. RISC-V Rocket

Category               ARM Cortex-A5            RISC-V Rocket
ISA                    32-bit ARM v7            64-bit RISC-V v2
Architecture           Single-Issue In-Order    Single-Issue In-Order 5-stage
Performance            1.57 DMIPS/MHz           1.72 DMIPS/MHz
Process                TSMC 40GPLUS             TSMC 40GPLUS
Area w/o Caches        0.27 mm²                 0.14 mm²
Area with 16K Caches   0.53 mm²                 0.39 mm²
Area Efficiency        2.96 DMIPS/MHz/mm²       4.41 DMIPS/MHz/mm²
Frequency              >1 GHz                   >1 GHz
Dynamic Power          <0.08 mW/MHz             0.034 mW/MHz

- PPA reporting conditions
  - 85% utilization, Dhrystone used as the benchmark, frequency/power at TT 0.9 V 25 °C, all regular-VT transistors
- Rocket is 10% higher in DMIPS/MHz and 49% more area-efficient

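The two summary percentages follow from the table rows. As a quick check, the sketch below recomputes area efficiency as DMIPS/MHz divided by the area with 16K caches, then derives the relative gains. All numbers come from the table; the variable names are only illustrative.

```python
# Figures copied from the comparison table (area is the "with 16K caches" row).
a5 = {"dmips_per_mhz": 1.57, "area_mm2": 0.53}
rocket = {"dmips_per_mhz": 1.72, "area_mm2": 0.39}


def area_efficiency(core: dict) -> float:
    """DMIPS/MHz per square millimetre."""
    return core["dmips_per_mhz"] / core["area_mm2"]


a5_eff = area_efficiency(a5)
rocket_eff = area_efficiency(rocket)

# Relative advantage of Rocket over the Cortex-A5.
perf_gain = rocket["dmips_per_mhz"] / a5["dmips_per_mhz"] - 1
eff_gain = rocket_eff / a5_eff - 1

print(f"A5 area efficiency:     {a5_eff:.2f} DMIPS/MHz/mm^2")
print(f"Rocket area efficiency: {rocket_eff:.2f} DMIPS/MHz/mm^2")
print(f"DMIPS/MHz gain:         {perf_gain:.1%}")
print(f"Area-efficiency gain:   {eff_gain:.1%}")
```

The results round to the table's 2.96 and 4.41 DMIPS/MHz/mm² and to the quoted 10% and 49%.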
HTIF: Host-Target Interface

• UC Berkeley-specific block, mainly used to emulate devices for simple test chips
  - Emulates system calls, console, block devices, frame buffer, network devices
  - No need for this block once the SoC has actual devices on the target machine
• Consider it a “host DMA engine”
• A port for the host system to read/write
  - Core CSRs (control and status registers)
  - Target Memory

Important Interfaces in the Rocket Chip

• ROCCIO
  - Interface between Rocket/Accelerator
• HTIFIO
  - Read/Write CSRs
• TileLinkIO
  - Coherence Fabric
• MemIO
  - Simple AXI-like memory interface
• HostIO
  - Host Interface to HTIF

[Figure: two Tiles (Rocket Core, ROCC Accel. attached via ROCCIO, FPU, L1 Inst and L1 Data caches) and HTIF (attached via HTIFIO, with HostIO to the outside) act as TileLink clients on the L1 Network. Behind an arbiter, the Coherence Manager is the TileLinkIO manager; the TileLink / MemIO Converter exports MemIO.]

TileLinkIO

[Figure: two Clients, each with a Cache, connected to a Manager through the Acquire, Probe, Release, Grant, and Finish channels.]

- TileLinkIO consists of Acquire, Probe, Release, Grant, Finish

UncachedTileLinkIO

[Figure: two Clients, only one of which has a Cache, connected to a Manager. Channel labels: Acquire, Probe, Release, Grant, Finish.]

- UncachedTileLinkIO consists of Acquire, Grant, Finish
- Converters for TileLinkIO/UncachedTileLinkIO are in the uncore library

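The relationship between the two interfaces can be stated as data: UncachedTileLinkIO keeps three of the five TileLinkIO channels. The sketch below records that subset. The channel directions are an assumption taken from the usual reading of TileLink (Acquire, Release, and Finish travel from client to manager; Probe and Grant from manager to client) and are not spelled out on the slides. The Python names are illustrative and do not come from the uncore library.

```python
from enum import Enum


class Direction(Enum):
    CLIENT_TO_MANAGER = "client -> manager"
    MANAGER_TO_CLIENT = "manager -> client"


# Five channels of the full (cached) interface, with assumed directions.
TILELINK_CHANNELS = {
    "Acquire": Direction.CLIENT_TO_MANAGER,
    "Probe": Direction.MANAGER_TO_CLIENT,
    "Release": Direction.CLIENT_TO_MANAGER,
    "Grant": Direction.MANAGER_TO_CLIENT,
    "Finish": Direction.CLIENT_TO_MANAGER,
}

# The uncached interface keeps only the channels named on the slide.
UNCACHED_CHANNELS = {
    name: TILELINK_CHANNELS[name] for name in ("Acquire", "Grant", "Finish")
}


def channels_dropped_when_uncached() -> list:
    """Channels that only a client holding cached copies needs."""
    return sorted(set(TILELINK_CHANNELS) - set(UNCACHED_CHANNELS))


print(channels_dropped_when_uncached())
for name, direction in UNCACHED_CHANNELS.items():
    print(f"{name:8s} {direction.value}")
```

Probe and Release are the two channels that drop out, which fits a client that never holds a cached copy the manager would need to recall.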
MemIO

[Figure: Master and Slave. From Master to Slave: MemReqCmd (with MemReqCmd.valid / MemReqCmd.ready) and Decoupled(MemData). From Slave to Master: Decoupled(MemResp).]

- MemReqCmd consists of addr, rw (write=true), tag
- MemData consists of a 128-bit data payload
- MemResp consists of a 128-bit data payload, tag
- Decoupled(interface) means an interface with ready/valid signals

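A ready/valid (Decoupled) interface transfers one item in exactly those cycles where the sender asserts valid and the receiver asserts ready. The following is a small cycle-by-cycle Python model of that rule, not Chisel code. The `MemReqCmd` fields follow the slide; the `simulate` helper and the ready pattern are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MemReqCmd:
    addr: int
    rw: bool  # True means write, as stated on the slide
    tag: int


def simulate(items: List[MemReqCmd],
             ready_pattern: List[bool]) -> List[Tuple[int, MemReqCmd]]:
    """Return (cycle, item) for each cycle in which valid and ready are both high.

    The sender holds valid high, and keeps the same item, until it is accepted.
    """
    transfers = []
    next_item = 0
    for cycle, ready in enumerate(ready_pattern):
        valid = next_item < len(items)
        if valid and ready:  # the handshake "fires" this cycle
            transfers.append((cycle, items[next_item]))
            next_item += 1
    return transfers


requests = [
    MemReqCmd(addr=0x1000, rw=False, tag=0),  # read
    MemReqCmd(addr=0x1040, rw=True, tag=1),   # write
]
# The receiver stalls in cycles 0 and 2.
fired = simulate(requests, [False, True, False, True, True])
for cycle, req in fired:
    kind = "write" if req.rw else "read"
    print(f"cycle {cycle}: {kind} addr={req.addr:#x} tag={req.tag}")
```

In this run the read is accepted in cycle 1 and the write in cycle 3. Nothing transfers in cycle 4 because the sender has no more requests, so valid is low.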
ROCCIO

• Rocket sends coprocessor instruction via the Cmd interface
• Accelerator responds through Resp interface
• Accelerator sends memory requests to L1D$ via CacheIO
• busy bit for fences
• IRQ, S (supervisor), and exception bits used for virtualization
• UncachedTileLinkIO for instruction cache on accelerator
• PTWIO for page-table walker ports on accelerator

[Figure: signals between Rocket and the ROCC Accel.: Decoupled(Cmd), Decoupled(Resp), CacheIO, busy, IRQ, supervisor bit, exception, UncachedTileLinkIO, PTWIO.]

HTIFIO

[Figure: signals between HTIF and a Tile: reset, core_id, Decoupled(CSRReq), Decoupled(CSRResp), Decoupled(IPIReq), Decoupled(IPIResp).]

- The reset signal and core_id are routed from HTIF (for historical reasons, nothing technical)
- CSR read/write requests go through CSRReq/CSRResp
- IPI requests go through IPIReq/IPIResp
- HTIFIO is likely to be modified in the near future

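Rocket Chip C++ Emulator Setup

[Figure: the Rocket Chip runs inside the simulator (labelled “Verilog Simulator”). Its HostIO connects to the RISC-V Frontend Server and its MemIO connects to DRAMSim2; both are labelled “pthread”.]
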
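Rocket Chip FPGA Setup

[Figure: inside the ZYNQ FPGA, the Rocket Chip's HostIO goes through a HostIO/AXI Converter to the AXI Master port, and its MemIO goes through a MemIO/AXI HP Converter to the AXI HP Slave port of the ARM Processing System. The ARM Processing System runs the RISC-V Frontend Server and connects to DDR3 DRAM.]
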
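Rocket Chip Berkeley Test Chip Setup

[Figure: the Test Chip contains the Rocket Chip and a MemIO Serializer. HostIO and MemSerialIO connect the Test Chip to a ZYNQ FPGA. On the FPGA, a MemIO Deserializer feeds a MemIO/AXI HP Converter to the AXI HP Slave port, and a HostIO/AXI Converter connects to the AXI Master port of the ARM Processing System. The ARM Processing System runs the RISC-V Frontend Server and connects to DDR3 DRAM.]
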
Rocket Chip “SoC” Setup

[Figure: the Rocket Chip receives Interrupts and connects to Devices through TileLinkIO and UncachedTileLinkIO. Its MemIO connects to an LPDDR3 Memory Controller, which drives LPDDR3 DRAM.]

Who should use the Rocket Chip Generator?

People who would like to develop …
• A RISC-V SoC
  - Look into Chisel parameters
• New Accelerators
  - Drop in at ROCCIO level
• Own RISC-V Core
  - Drop in at TileLinkIO level or MemIO level
• Own Device
  - Drop in at TileLinkIO or UncachedTileLinkIO

[Figure: the interface diagram from “Important Interfaces in the Rocket Chip”: Tiles and HTIF as TileLink clients, L1 Network, Coherence Manager, TileLink / MemIO Converter, with ROCCIO, HTIFIO, HostIO, TileLinkIO, and MemIO marked.]

New Features: L2$ with Directory Bits

• Shared L2$ with multiple banks
• Each L2$ will act as a coherence manager with directory bits (snoop filter)
• These caches can be composed to build outer-level caches such as an L3$

[Figure: Tiles and HTIF are TileLink clients of the L1 Network. Several L2Cache banks (configurable sets, ways) take the place of the Coherence Manager; each bank is a manager toward the L1 Network and a client toward the outer TileLink, which ends at the TileLink / MemIO Converter.]

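Directory bits let a manager send probes only to the clients that actually hold a block, instead of broadcasting to every client. The sketch below models that snoop-filter behaviour with one sharer bitmask per block. It is a deliberately simplified, hypothetical model: the class and method names are illustrative, it tracks only a shared/exclusive distinction, and it is not the L2Cache implementation from the generator.

```python
from typing import Dict, List, Tuple


class DirectoryManager:
    """Toy coherence manager keeping one sharer bit per client for each block."""

    def __init__(self, n_clients: int) -> None:
        self.n_clients = n_clients
        # block address -> (sharer bitmask, held exclusively by a single client?)
        self.entries: Dict[int, Tuple[int, bool]] = {}

    def _holders(self, mask: int, exclude: int) -> List[int]:
        return [c for c in range(self.n_clients)
                if (mask >> c) & 1 and c != exclude]

    def acquire(self, client: int, addr: int, exclusive: bool) -> List[int]:
        """Update the directory and return the clients that must be probed."""
        mask, held_exclusively = self.entries.get(addr, (0, False))
        if exclusive:
            # Every other holder must give up its copy.
            probes = self._holders(mask, client)
            self.entries[addr] = (1 << client, True)
        else:
            # Only an exclusive owner needs a probe (to downgrade its copy).
            probes = self._holders(mask, client) if held_exclusively else []
            self.entries[addr] = (mask | (1 << client), False)
        return probes

    def release(self, client: int, addr: int) -> None:
        """Clear the client's directory bit when it gives up the block."""
        mask, held_exclusively = self.entries.get(addr, (0, False))
        mask &= ~(1 << client)
        if mask:
            self.entries[addr] = (mask, held_exclusively)
        else:
            self.entries.pop(addr, None)


directory = DirectoryManager(n_clients=4)
probes_read_0 = directory.acquire(0, 0x80, exclusive=False)
probes_read_2 = directory.acquire(2, 0x80, exclusive=False)
probes_write_1 = directory.acquire(1, 0x80, exclusive=True)
probes_read_3 = directory.acquire(3, 0x80, exclusive=False)
probes_cold = directory.acquire(3, 0x100, exclusive=True)
print(probes_read_0, probes_read_2, probes_write_1, probes_read_3, probes_cold)
```

In this run the write by client 1 probes only clients 0 and 2, and the write to the untouched block at 0x100 probes nobody, whereas a broadcast scheme would probe all three other clients each time.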
New Features: ROCC interfaces with L2$

• ROCC talks directly to the L2$ to address more data

[Figure: same system as the previous slide, but the ROCC Accel. of one Tile sits outside the core's L1 Data cache path and attaches to the L1 Network as its own TileLink client, reaching the L2Cache banks directly.]

New Features on the Deck

• Dual-issue Rocket Core
• Hwacha Vector Unit (check out hwacha.org)
• Dump MemIO and use AXI
