
VLSI System Design

EE577B
2009 Fall Semester

High Speed Memories: Evolution of Dynamic Memories


DRAM, DDR2 and DDR3
Naehyuck Chang
Visiting Professor
naehyuck.chang@usc.edu
Dept. of Electrical Engineering
Computer Engineering Division
Viterbi School of Engineering
University of Southern California

DRAM Architecture

General memory cell interconnection structure


DRAM (Dynamic RAM) cell

Capacitor keeps the value


The capacitor is leaky

Reliable for 64 ms after charging or discharging

Repeat charging or discharging every 64 ms


Volatile storage

Cheap memory


Wordline

Bitline

Cell values are transferred via the bitlines


Parallel architecture

Address encoding

Selection (addressing) of cells connected to the same wordline

Exclusively selected by a decoder

Addressing

Non-redundant address value encoding using binary numbers

Address decoding

Large binary decoder: a 28-to-268,435,456 binary decoder for a 256 Mb memory
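
To get a feel for the decoder size quoted above, the short sketch below counts the select lines of a flat decoder and compares it with decoding rows and columns separately. The even 14/14 split of the 28 address bits is an assumption made for illustration; the slide does not give the array geometry.

```python
def decoder_outputs(address_bits: int) -> int:
    """Number of select lines an n-to-2^n binary decoder must drive."""
    return 2 ** address_bits


def split_decoder_outputs(row_bits: int, col_bits: int) -> int:
    """Total select lines when rows and columns are decoded separately."""
    return 2 ** row_bits + 2 ** col_bits


flat = decoder_outputs(28)               # 256 Mb addressed bit by bit
split = split_decoder_outputs(14, 14)    # assumed square 16K x 16K array
print(f"flat decoder:  {flat:,} outputs")
print(f"split decoder: {split:,} outputs")
```

Under that assumption the two smaller decoders need far fewer select lines than the single flat decoder.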


NOR structure

Fast random access capable

Discharged
Charged

Multiplexed addressing


Sense amplifier

Memory cell is a small transistor (with a small storage capacitor)

Bitline is a big capacitor

Connecting a memory cell to the bitline causes only a small voltage change
Differential sense amplifier
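
The small voltage change follows from charge sharing between the cell capacitor and the bitline capacitance. The sketch below evaluates the standard charge-sharing relation; the capacitance and voltage values are illustrative assumptions, not figures from the slides.

```python
def bitline_swing(v_cell: float, v_precharge: float,
                  c_cell: float, c_bitline: float) -> float:
    """Voltage change on the bitline after the cell is connected.

    Charge sharing: dV = (V_cell - V_pre) * C_cell / (C_cell + C_bitline)
    """
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)


# Assumed example values: 1.8 V supply, bitline precharged to half supply,
# 30 fF cell capacitor, 300 fF bitline capacitance.
VDD = 1.8
C_CELL = 30e-15
C_BITLINE = 300e-15

swing_one = bitline_swing(VDD, VDD / 2, C_CELL, C_BITLINE)   # cell stores 1
swing_zero = bitline_swing(0.0, VDD / 2, C_CELL, C_BITLINE)  # cell stores 0
print(f"stored 1: {swing_one * 1000:+.1f} mV")
print(f"stored 0: {swing_zero * 1000:+.1f} mV")
```

With these assumed values the swing is well under 100 mV in either direction, which is why a differential sense amplifier is needed to resolve it.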


DRAM bus interface

Multiplexed addressing

Row address and column address share the same pins


DRAM Operations

Precharge


Row access

Assert RAS
(row address strobe)


Column access

Assert CAS
(column address strobe)


Refresh

DRAM cells are leaky capacitors

Can hold charge for 64 ms
Read operations restore correct data for the whole wordline
Dummy read operations for refresh
Equal time intervals of 15.6 µs
Refresh overhead

Time for the refresh operation

Need to close the currently open row if it is not precharged already

If uniform read operations over the whole cell array are guaranteed, no dummy read is needed

Video frame buffer


DRAM Evolution

VRAM

Serial out

FPM: Fast Page Mode


EDO: Extended Data Out
P/B EDO: Pipelined Burst EDO
SDRAM: Synchronous DRAM
ESDRAM: Enhanced Synchronous DRAM
DDR: Double Data Rate
VCDRAM: Virtual Channel DRAM
FCRAM: Fast Cycle RAM
MoSys: Memory on System, distributed DRAMs for SoCs and FPGAs


Conventional DRAM

Protocol

RAS*: row address strobe that latches the row address

CAS*: column address strobe that latches the column address

RAS* and CAS* go high to precharge and get ready for the next cycle
Read or write is determined by WE* when CAS* is asserted

Variations are allowed such as read-modify-write


Read operation

WE* should be high while CAS* is asserted


OE* can be enabled whenever data is needed

Three state buffer enable


DRAM read operation


Write operation

WE* should be low when CAS* is asserted

Data is latched with CAS*

OE* is don't care


DRAM write operation


Refresh

RAS* only refresh

Refresh address should be given


Selective refresh is feasible

Refresh overhead

In general, every 15.6 µs


DRAM RAS*-only refresh


CAS* before RAS* refresh (CBR)

Uses a signal sequence that is prohibited in the normal DRAM protocol


Use of internal refresh address counter

Makes the refresh control circuit simpler

DRAM CAS* before RAS* operation


Hidden refresh

CBR refresh is hidden after a read/write operation


WE* should be high when the second (hidden) CAS* is asserted
Data is available while RAS* is asserted
Reduces refresh overhead
DRAM hidden refresh


Self refresh

Suspends operation of the DRAM controller to save power without losing the data stored in the DRAM

DRAM self refresh


Pseudo SRAM

Pseudo SRAM

Consists of a DRAM macro core with a traditional SRAM interface


On-chip refresh circuit
Higher density, higher speed, smaller die size than SDRAM
DRAM compatible process
Around 70 ns access time or 133 MHz synchronous interface
Low active standby current

E.g., < 100 µA

Deep power down mode

E.g., < 5 µA

Mobile phone applications


Fast Page Mode (FPM)

Keep RAS* active for a while

E.g., 10 µs

Repeat CAS* assertions while changing the column addresses
Keep the page or row open after a CAS* cycle completes
Continue to access different cells connected to the same wordline

Significant reduction of the access latency if spatial locality exists

Precharge and RAS delay


Extended Data Out (EDO)

Motivation

During the FPM, data disappears when CAS* becomes inactive


Need to keep CAS* until the CPU latches the output data

EDO

Extend the output data with a latch even after CAS* becomes inactive for the next column address processing
Very simple modification, but it enhances throughput by up to 27% (5-2-2-2 @66 MHz)

Roughly 96 MB/s with a 32 bit bus
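
The 96 MB/s and 27% figures can be checked with a few lines of arithmetic. The sketch below turns a burst timing such as 5-2-2-2 into a bandwidth figure. The 5-3-3-3 timing used for FPM is an assumption for comparison (the slide does not state it), and `burst_bandwidth_mb_s` is a hypothetical helper name.

```python
def burst_bandwidth_mb_s(timing, bus_mhz, bus_bytes):
    """Bandwidth of one burst: one bus-width transfer per timing entry.

    timing    -- clock cycles per transfer, e.g. (5, 2, 2, 2)
    bus_mhz   -- bus clock in MHz
    bus_bytes -- bus width in bytes
    """
    cycles = sum(timing)
    bytes_moved = len(timing) * bus_bytes
    return bytes_moved * bus_mhz / cycles


edo = burst_bandwidth_mb_s((5, 2, 2, 2), 66, 4)
fpm = burst_bandwidth_mb_s((5, 3, 3, 3), 66, 4)  # assumed FPM timing
print(f"EDO: {edo:.1f} MB/s")
print(f"FPM: {fpm:.1f} MB/s (assumed 5-3-3-3)")
print(f"gain: {100 * (edo / fpm - 1):.0f}%")
```

With the assumed FPM timing, the gain matches the 27% quoted above.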


70ns, 60ns and 50ns speeds

Longer CAS* duration

The same data out time

Shorter CAS* duration

The same data out time


Burst EDO

EDO + built-in column address counter

Spatial locality changes as cache appears

2 bit binary counter w/o carry (wrap around)

Do not need to supply new column address if the next column address is a simple
increment within the burst boundary
Initiates the memory controller concept

Page mode does not always enhance speed: wrong prediction


Cache line fill is guaranteed spatial locality

5-1-1-1 access can improve throughput by up to 50% over EDO
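
A minimal sketch of the 2-bit wrap-around column counter described above: only the two low address bits count, and no carry propagates into the upper bits, so the burst stays inside its 4-column boundary. The function name is hypothetical.

```python
def burst_columns(start_col: int, burst_len: int = 4) -> list:
    """Column addresses produced by a wrap-around burst counter.

    The low bits count modulo burst_len without carry; the upper bits
    (the burst boundary) stay fixed. burst_len must be a power of two.
    """
    mask = burst_len - 1
    base = start_col & ~mask          # upper bits, unchanged during the burst
    return [base | ((start_col + i) & mask) for i in range(burst_len)]


print(burst_columns(8))   # aligned start
print(burst_columns(6))   # unaligned start wraps inside the boundary
```

An unaligned start such as column 6 yields 6, 7, 4, 5, which is the order a cache line fill would request when the critical word comes first.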

Burst EDO


Pipelined burst EDO


VRAM (Video DRAM)

EDO does not provide enough bandwidth for frame buffers

132 MB/s with a 32 bit bus (5-1-1-1 @66 MHz)


1280 by 1024 with 32 bpp @80 Hz refresh requires 400 MB/s only for refresh
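
The display refresh figure is plain arithmetic, sketched below. The result comes out to exactly 400 when a megabyte is counted as 2^20 bytes, which appears to be the convention behind that number.

```python
def display_refresh_bytes_per_s(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second read from the frame buffer just to refresh the screen."""
    return width * height * (bits_per_pixel // 8) * refresh_hz


rate = display_refresh_bytes_per_s(1280, 1024, 32, 80)
print(f"{rate:,} bytes/s")
print(f"{rate / 2**20:.0f} MB/s (1 MB = 2^20 bytes)")
```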

Dual-port DRAM for video frame buffers

Two ports that can be used simultaneously

DRAM port: random access port like standard DRAMs


Video port: serial port with SLK

Typical DRAM arrays normally access a full row of bits (i.e., a wordline) at a time, up to 1024 bits


Synchronous DRAM (SDRAM)

Motivation

Asynchronous bus protocols exhibit significantly low efficiency

CAS* is actually synchronized to the 66 MHz bus clock

Synchronous bus protocol

Handshake protocol: active and inactive

Microprocessors now have synchronous bus protocols
Burst mode EDO is ready to go to a synchronous interface

Non-zero CAS duration

All inputs are latched at the edge of the clock


No need to repeat the active/inactive handshake, which occupies a minimum of two clock cycles

Multiplexed addressing with CAS* and RAS* is preserved

Temporal redundancy still exists due to the structure of the memory organization

Pin counts are important for the chip cost


Command scheme is introduced

Patterns of RAS*, CAS* and WE* determine the commands

On-chip circuit for control and timing
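
As an illustration of the command scheme, the sketch below decodes the three control pins, sampled at a clock edge, into a command name. The encodings follow the commonly published SDR SDRAM truth table as best I recall it and assume chip select is asserted. Treat the table as an assumption to check against a specific device data sheet.

```python
# Keys are (RAS*, CAS*, WE*) levels at the clock edge; 0 = low (asserted).
# Chip select is assumed asserted. Encodings are an assumption to verify
# against the data sheet of the actual part.
COMMANDS = {
    (1, 1, 1): "NOP",
    (0, 1, 1): "ACTIVE",             # open a row in a bank
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (1, 1, 0): "BURST TERMINATE",
    (0, 1, 0): "PRECHARGE",          # close the open row
    (0, 0, 1): "AUTO REFRESH",
    (0, 0, 0): "LOAD MODE REGISTER",
}


def decode_command(ras_n: int, cas_n: int, we_n: int) -> str:
    """Map the sampled pin levels to a command name."""
    return COMMANDS[(ras_n, cas_n, we_n)]


for pins in [(0, 1, 1), (1, 0, 1), (0, 1, 0)]:
    print(pins, "->", decode_command(*pins))
```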


Generally, a dedicated memory controller is used

Converts microprocessor-compatible signals and timing to SDRAM signals and timing


SDRAM protocol

SDRAM 4 beat burst read


Command and data are synchronized with the clock

Bus clock and the SDRAM delay do not exactly match

Bus clock used to be synchronous (integer multiple) to the CPU core clock


Read/write efficiency

It takes time to reverse the direction of the data bus


Read to write switch costs 2 cycles

Write to read switch costs 4 cycles


Extra NOPs inserted between requests

Switch frequency dependent on traffic
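
The effect of the traffic pattern can be sketched with a toy model: every request moves one burst of data, and each direction switch adds the NOP cycles listed above. The 4-cycle burst and the function name are assumptions for illustration; activate and precharge delays are ignored.

```python
READ_TO_WRITE_NOPS = 2   # from the slide
WRITE_TO_READ_NOPS = 4   # from the slide


def bus_efficiency(requests: str, burst_cycles: int = 4) -> float:
    """Fraction of data-bus cycles that carry data.

    requests -- sequence of 'R' and 'W', e.g. "RRWW"
    """
    data = idle = 0
    prev = None
    for op in requests:
        if prev == "R" and op == "W":
            idle += READ_TO_WRITE_NOPS
        elif prev == "W" and op == "R":
            idle += WRITE_TO_READ_NOPS
        data += burst_cycles
        prev = op
    return data / (data + idle) if data else 0.0


for pattern in ["RRRRWWWW", "RWRWRWRW"]:
    print(pattern, f"{bus_efficiency(pattern):.2f}")
```

Grouping reads and writes together keeps the bus busier than alternating them, which is why controllers often reorder requests.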


SDRAM refresh

It is possible to refresh a RAM chip by opening and closing (activating and precharging) each row in each bank: RAS* only refresh
To simplify the memory controller, SDRAM chips support an auto refresh command

Performs auto refresh to one row in each bank simultaneously

Maintains an internal refresh counter


Memory controller simply issues a sufficient number of auto refresh commands

Typical tREF = 64 ms with 4096 rows, so one refresh every 15.6 µs

A refresh command executes in 75 ns on a DDR2-400 256 Mb device

This corresponds to roughly 1% of the time
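
The refresh interval and overhead can be worked out as below. With the 15.6 µs interval, a 75 ns refresh occupies about 0.5% of the time. The roughly 1% figure matches an interval of about 7.8 µs (8192 rows per 64 ms), which is an assumption here and not something the slide states.

```python
def refresh_interval_us(t_ref_ms: float, rows: int) -> float:
    """Time between refresh commands when they are spread evenly."""
    return t_ref_ms * 1000 / rows


def refresh_overhead(t_refresh_ns: float, interval_us: float) -> float:
    """Fraction of time the device is busy refreshing."""
    return t_refresh_ns / (interval_us * 1000)


for rows in (4096, 8192):   # 8192 rows is an assumed alternative
    interval = refresh_interval_us(64, rows)
    overhead = refresh_overhead(75, interval)
    print(f"{rows} rows: every {interval:.3f} us, overhead {overhead:.2%}")
```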


On chip multi-banking

Multiple semi-independent banks


Typical configuration

4 to 8 banks

16K rows/bank
1024 columns/row

4 to 16 bits/column
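
For one such configuration, the sketch below computes the device capacity and splits a linear column-word address into bank, row and column fields. The choice of 4 banks and the row:bank:column bit ordering are assumptions; real controllers use various mappings.

```python
BANK_BITS = 2    # 4 banks (assumed)
ROW_BITS = 14    # 16K rows/bank
COL_BITS = 10    # 1024 columns/row


def capacity_bits(banks: int, rows: int, cols: int, bits_per_col: int) -> int:
    """Total number of storage bits in the device."""
    return banks * rows * cols * bits_per_col


def split_address(addr: int):
    """Split a linear address into (bank, row, column).

    Assumed layout, high to low bits: row | bank | column.
    """
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col


print(capacity_bits(4, 16 * 1024, 1024, 16) // 2**20, "Mb")
print(split_address((5 << 12) | (3 << 10) | 7))
```

Placing the bank bits just above the column bits means consecutive rows of addresses land in different banks, which helps the bank overlap that multi-banking is meant to enable.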


Timing and bus clock frequency

SDRAM has a DRAM architecture and synchronous bus interface


Internal timing requirements have nothing to do with the bus clock
To satisfy the internal delays, a sufficient integer number of wait clock cycles should be applied
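
Since the internal delay is fixed in nanoseconds, the number of wait cycles depends on the bus clock. The sketch below rounds a delay up to whole clock cycles; the 20 ns delay is a hypothetical value chosen for illustration.

```python
import math


def wait_cycles(delay_ns: float, clock_mhz: float) -> int:
    """Smallest whole number of clock cycles that covers delay_ns."""
    return math.ceil(delay_ns * clock_mhz / 1000)


DELAY_NS = 20  # hypothetical internal timing requirement
for mhz in (33, 66, 133):
    print(f"{mhz} MHz: {wait_cycles(DELAY_NS, mhz)} cycle(s)")
```

The same device therefore needs more wait cycles as the bus clock rises, even though its internal timing has not changed.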


Pipelined memory access

SDRAM interface has a separate data and command bus


Overlapping of different actions in different banks through the command scheme

When data is transferred to or from a bank other banks are activated and precharged
(bank preparation)

Pipelining memory accesses increases efficiency and throughput


Timing and bus clock frequency

Write to precharge at 33 MHz bus clock

Write to precharge at 66 MHz bus clock


Enhanced SDRAM (ESDRAM)

Made by Enhanced Memory Systems

Includes a small static RAM in the SDRAM chip

A wide bus between the SRAM and the SDRAM

Many accesses will be from the faster SRAM


On-chip bus

Belongs to the category of cache DRAM, used mainly for L1 and L2 caches


Two Directions

Structural modification

Reduce the access latency


Break down the classical structure
of DRAM

Get rid of CAS* and RAS* scheme

Interface modification

Improve the bus protocol


High speed signaling

Serialization of Data
Special IO drives

Source synchronous signaling


Virtual Channel DRAM (VCDRAM)

Designed by NEC

Contains SRAM caches


Contain 16 virtual channels, or 16 1 KB SRAM caches
While the ESDRAM module handles caching internally, the VC SDRAM cache is
managed by the chipset


Fast Cycle RAM

Developed by the Fujitsu Corporation

Approaches the problem of DRAM/Processor speed in a different way

Various technologies such as EDO and SDRAM have attacked the problem with enhanced
logic circuitry and peripherals that accessed the DRAM core

FCRAM seeks to change the DRAM core itself

Core segmentation and pipeline operation


Ability to send row and column information at the same time


Memory on System (MoSys)

Surrounds the bit cell with control circuitry that makes the memory
functionally equivalent to SRAM

Controller hides all DRAM-specific operations such as precharging and refresh

Closer in size and density to embedded DRAM


SoC applications


Double Data Rate (DDR) DRAM memories

Double Data Rate (DDR) memories employ

Stub series terminated logic (SSTL_2) IO drivers

Signal swing of 2.5 Volt


JEDEC Standard JESD8-9B

Synchronous bus

Sender of the data sends a reference strobe signal along with data

The edges of the strobe are used to capture the valid data

Double data rate

The data is transferred on both positive and negative edges of the clock

Data rate of 266 Mbps/pin


DDR2 SDRAM

Employs an I/O buffer between the memory and the data bus

Data bus can be run at twice the speed of the memory clock

The two factors combine to achieve a total of 4 data transfers per memory clock cycle

For a 64 bit bus and a 100 MHz memory clock

Peak transfer rate = (memory clock rate) × 2 (for bus clock multiplier) × 2 (for dual rate) × 64 (number of bits transferred) / 8 (number of bits/byte) = 3200 MB/s


DDR3 SDRAM

Double-data-rate three synchronous dynamic random access memory

An improvement over DDR2 SDRAM


For a 64 bit bus and a 100 MHz memory clock

Peak transfer rate = (memory clock rate) × 4 (for bus clock multiplier) × 2 (for dual rate) × 64 (number of bits transferred) / 8 (number of bits/byte) = 6400 MB/s
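
The peak transfer rate formula can be written as a small function so that the DDR2 and DDR3 cases differ only in the bus clock multiplier. The function name and the DDR row (multiplier 1) are additions for illustration.

```python
def peak_transfer_mb_s(memory_clock_mhz: float, bus_multiplier: int,
                       bus_bits: int = 64) -> float:
    """Peak rate = memory clock x bus multiplier x 2 (dual rate) x bytes/transfer."""
    return memory_clock_mhz * bus_multiplier * 2 * bus_bits / 8


for name, multiplier in [("DDR", 1), ("DDR2", 2), ("DDR3", 4)]:
    print(f"{name}: {peak_transfer_mb_s(100, multiplier):.0f} MB/s")
```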


SDRAM/DDR2/DDR3 CAS latency


Direct Rambus DRAM

A wide internal bus connected via a high-speed interface to a narrow external bus

An 18-bit-wide bidirectional data field


An 8-bit-wide field carrying commands and row and column addresses
Narrow on-chip bus is serialized and deserialized to provide a 144-/128-bit data
path into the core, which provides 16 bytes every 10 ns internally
A 2 byte external 1.25 ns bus yields a 1,600-Mbyte/s bandwidth

Transfers are accomplished on the rising and falling edges of the clock


Deep pipeline

High throughput
High latency
