
DRAM Memory System: Lecture 2, Spring 2003. Bruce Jacob and David Wang, University of Maryland.

DRAM Circuit and Architecture Basics


Overview: terminology, access protocol, architecture


[Figure: a DRAM cell. The word line controls a switching element (access transistor) that connects the storage element (a capacitor) to the bit line.]


DRAM Circuit Basics


DRAM Cell
[Figure: DRAM chip block diagram. The row decoder drives the word lines of the memory array; sense amps sit at the ends of the bit lines and feed the column decoder and the data in/out buffers. Inset: one cell, with the word line gating the switching element and a capacitor as the storage element.]

Rows, bit lines, and word lines


DRAM Circuit Basics


Row Defined
[Figure: one word line crossing many bit lines; the cells it selects form one row of DRAM.]

Row size: 8 Kb at the 256 Mb SDRAM node; 4 Kb at the 256 Mb RDRAM node
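To put those row sizes in perspective, a quick arithmetic sketch (the 4-bank split is an assumption for illustration; the slide gives only device and row totals):

```python
# Rows in a 256 Mb SDRAM device with 8 Kb rows (illustrative arithmetic).
device_bits = 256 * 2**20    # 256 Mb device
row_bits    = 8 * 2**10      # 8 Kb per row

rows_total = device_bits // row_bits
print(rows_total)            # 32768 rows across the device

# Assuming a 4-bank organization (common for this generation):
print(rows_total // 4)       # 8192 rows per bank -> 13 row-address bits
```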


DRAM Circuit Basics


Sense Amplifier I

[Figure: six rows (1 through 6) crossing a set of bit lines; a sense amplifier at the foot of the bit lines performs the sense-and-amplify step.]


DRAM Circuit Basics


Sense Amplifier II: Precharged

[Figure: the same six rows; before an access, the bit lines are precharged to Vcc/2. Vcc is logic 1, Gnd is logic 0.]


DRAM Circuit Basics


Sense Amplifier III: Destructive Read

[Figure: one word line is driven; each cell on it shares its charge with a bit line precharged to Vcc/2, pulling the line slightly toward Vcc (logic 1) or Gnd (logic 0). The sense amps then amplify the swing back to full rail, which also restores the cells: the read is destructive.]
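A toy numerical model of what the sense amp sees during this destructive read. The capacitances and supply voltage are invented, textbook-style values (roughly a 10:1 bitline-to-cell ratio), not numbers from this lecture:

```python
# Charge sharing between a DRAM cell and a bit line precharged to Vcc/2.
VCC    = 2.5       # supply voltage in volts (assumed)
C_BIT  = 300e-15   # bit-line capacitance in farads (assumed)
C_CELL = 30e-15    # cell capacitance in farads (assumed)

def read_cell(v_cell):
    """Drive the word line: share charge, sense the swing, restore full rail."""
    v_bit = (C_BIT * VCC / 2 + C_CELL * v_cell) / (C_BIT + C_CELL)
    bit = 1 if v_bit > VCC / 2 else 0     # the sense amp's decision
    v_restored = VCC if bit else 0.0      # amplification rewrites the cell
    return v_bit, bit, v_restored

print(read_cell(VCC))  # (~1.364 V, 1, 2.5): a swing of only ~114 mV to sense
print(read_cell(0.0))  # (~1.136 V, 0, 0.0)
```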


DRAM Access Protocol


ROW ACCESS
[Figure: the DRAM block diagram from before, now driven by a CPU, memory bus, and memory controller. The row access path runs from the row decoder through the memory array to the sense amps.]

AKA: OPEN a DRAM Page/Row or ACT (Activate a DRAM Page/Row) or RAS (Row Address Strobe)

Once the data is valid on ALL of the bit lines, you can select a subset of the bits and send them to the output buffers; CAS picks that subset. Big point: you cannot do another RAS or precharge the lines until you have finished reading the column data, since the values on the bit lines and the output of the sense amps can't change until the memory controller has read them.

DRAM Circuit Basics


Column Defined
Column: the smallest addressable quantity of DRAM on a chip.
SDRAM*: column size == chip data bus width (4, 8, 16, or 32 bits)
RDRAM: column size != chip data bus width (fixed at 128 bits)
SDRAM*: get n columns per access, n = 1, 2, 4, or 8
RDRAM: get 1 column per access

[Figure: one row of DRAM divided into 4-bit-wide columns #0 through #5.]

* SDRAM here means SDRAM and its variants, e.g. DDR SDRAM.
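A sketch of the column arithmetic, combining this slide with the earlier row sizes (the x16 SDRAM part is an assumed example):

```python
# Columns per row, SDRAM style vs RDRAM style.
sdram_row_bits = 8 * 2**10   # 8 Kb row at the 256 Mb SDRAM node
rdram_row_bits = 4 * 2**10   # 4 Kb row at the 256 Mb RDRAM node

sdram_col_bits = 16          # column size == data bus width (x16 part assumed)
rdram_col_bits = 128         # fixed 128-bit columns

print(sdram_row_bits // sdram_col_bits)  # 512 columns -> 9 column-address bits
print(rdram_row_bits // rdram_col_bits)  # 32 columns  -> 5 column-address bits
```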


DRAM Access Protocol


COLUMN ACCESS I
[Figure: same system block diagram as above.]

READ Command or CAS: Column Address Strobe

Then the data is valid on the data bus. Depending on what you are using for in/out buffers, you might be able to overlap a little or a lot of the data transfer with the next CAS to the same page (this is PAGE MODE).

DRAM Access Protocol


Column Access II
[Figure: same system block diagram as above.]

Data out, with optional additional CAS (Column Address Strobe).

Note: page mode enables overlapping data transfer with the next CAS.


DRAM Speed Part I


How fast can I move data from DRAM cell to sense amp?
[Figure: same system block diagram as above.]

tRCD
RCD (Row Command Delay)


DRAM Speed Part II


How fast can I get data out of the sense amps and back into the memory controller? tCAS, aka tCASL, aka tCL
[Figure: same system block diagram as above.]

CAS: Column Address Strobe
CASL: Column Address Strobe Latency
CL: Column Address Strobe Latency


DRAM Speed Part III


How fast can I move data from DRAM cell into memory controller?
[Figure: same system block diagram as above.]

tRAC = tRCD + tCAS


RAC (Random Access Delay)


DRAM Speed Part IV


How fast can I precharge the DRAM array so I can engage another RAS?
[Figure: same system block diagram as above.]

tRP
RP (Row Precharge Delay)


DRAM Speed Part V


How fast can I read from different rows?
[Figure: same system block diagram as above.]

tRC = tRAS + tRP


RC (Row Cycle Time)
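Parts I through V reduce to two sums. A minimal worked example with invented, roughly PC133-class values:

```python
# Core DRAM timing parameters in nanoseconds (values assumed for illustration).
tRCD = 20   # row command delay: cell contents reach the sense amps
tCAS = 20   # column access: sense amps to the memory controller
tRAS = 40   # row active time before precharge may begin
tRP  = 20   # row precharge

tRAC = tRCD + tCAS   # random access time
tRC  = tRAS + tRP    # row cycle time

print(f"tRAC = {tRAC} ns, tRC = {tRC} ns")   # tRAC = 40 ns, tRC = 60 ns
```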


DRAM Speed Summary I


What do I care about? tRCD, tCAS, tRP, tRC = tRAS + tRP, and tRAC = tRCD + tCAS.

Who cares: embedded systems designers, DRAM manufacturers, and computer architects with latency-bound code (e.g. linked-list traversal). tRAC is the number seen in ads: easy to explain, easy to sell.

RAS: Row Address Strobe
CAS: Column Address Strobe
RCD: Row Command Delay
RAC: Random Access Delay
RP: Row Precharge Delay
RC: Row Cycle Time


DRAM Speed Summary II


DRAM Type     Freq (MHz)  Bus Width (per chip)  Peak BW (per chip)  tRAC   tRC
PC133 SDRAM   133         16                    200 MB/s            45 ns  60 ns
DDR 266       133 * 2     16                    532 MB/s            45 ns  60 ns
PC800 RDRAM   400 * 2     16                    1.6 GB/s            60 ns  70 ns
FCRAM         200 * 2     16                    0.8 GB/s            25 ns  25 ns
RLDRAM        300 * 2     32                    2.4 GB/s            25 ns  25 ns

DRAM is slow, but it doesn't have to be: tRC < 10 ns is achievable. The cost: a larger die, so it is not a commodity part, was not adopted into the standard, and stays expensive.
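The peak-bandwidth column is just clock rate times transfers per clock times bus width. A quick check of the double-data-rate rows (a sketch; values as in the table):

```python
# Peak bandwidth per chip = clock * transfers per clock * bus width in bytes.
def peak_bw_bytes(mhz, transfers_per_clock, width_bits):
    return mhz * 1e6 * transfers_per_clock * width_bits / 8

print(peak_bw_bytes(133, 2, 16) / 1e6)   # DDR 266:     ~532 MB/s
print(peak_bw_bytes(400, 2, 16) / 1e9)   # PC800 RDRAM: ~1.6 GB/s
print(peak_bw_bytes(200, 2, 16) / 1e9)   # FCRAM:       ~0.8 GB/s
print(peak_bw_bytes(300, 2, 32) / 1e9)   # RLDRAM:      ~2.4 GB/s
```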

DRAM latency isn't deterministic, because an access may need only a CAS or a full RAS + CAS, and there may be significant queuing delays within the CPU and the memory controller. Each transaction has some overhead, and some types of overhead cannot be pipelined. This means that, in general, longer bursts are more efficient.

DRAM latency
[Figure: the request path from the CPU through the memory controller to the DRAM and back, with stages labeled A through F; E1/E2/E3 occur at the DRAM.]

A: transaction request may be delayed in a queue
B: transaction request sent to the memory controller
C: transaction converted to command sequences (may be queued)
D: command(s) sent to the DRAM
E1: requires only a CAS, or
E2: requires RAS + CAS, or
E3: requires PRE + RAS + CAS
F: transaction sent back to the CPU
DRAM latency = A + B + C + D + E + F
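E1/E2/E3 are just the three row-buffer states. A sketch of the whole sum; every component cost here is an invented placeholder, not a measured number:

```python
# DRAM latency = A + B + C + D + E + F; E depends on the row-buffer state.
# All values in nanoseconds, purely illustrative.
A, B, C, D, F = 5, 2, 4, 2, 6          # queue, bus, decode, command, return
tCAS, tRCD, tRP = 20, 20, 20

def access_latency(state):
    E = {
        "page_hit":   tCAS,               # E1: CAS only
        "page_empty": tRCD + tCAS,        # E2: RAS + CAS
        "page_miss":  tRP + tRCD + tCAS,  # E3: PRE + RAS + CAS
    }[state]
    return A + B + C + D + E + F

for state in ("page_hit", "page_empty", "page_miss"):
    print(state, access_latency(state), "ns")   # 39 / 59 / 79 ns
```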


DRAM Architecture Basics


PHYSICAL ORGANIZATION
[Figure: three per-bank organizations side by side, x2, x4, and x8 DRAM. Each has a row decoder, memory array, sense amps, and column decoder; they differ only in the width of the data buffers (2, 4, or 8 bits).]

This is per bank; typical DRAMs have 2+ banks.
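One consequence of part width: it sets how many chips are needed to fill a memory bus. A small sketch, assuming a 64-bit non-ECC bus:

```python
# Chips needed per rank to fill a 64-bit data bus.
BUS_WIDTH = 64
for chip_width in (2, 4, 8, 16):
    print(f"x{chip_width}: {BUS_WIDTH // chip_width} chips per rank")
# x2: 32 chips, x4: 16, x8: 8, x16: 4
```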

Let's look at the interface another way, the way the data sheets portray it. Main point: the RAS\ and CAS\ signals directly control the latches that hold the row and column addresses.

DRAM Architecture Basics


Read Timing for Conventional DRAM

[Timing diagram: RAS falls and the row address is latched (row access); CAS falls and the column address is latched (column access); valid data then appears on DQ (data transfer). The whole RAS/CAS sequence repeats for the next access.]

Since DRAM's inception there has been a stream of changes to the design, from FPM to EDO to Burst EDO to SDRAM. The changes are largely structural modifications, minor ones, that target THROUGHPUT. [Discuss FPM up to SDRAM.] Everything up to and including SDRAM has been relatively inexpensive, especially considering the pay-off: FPM was essentially free, EDO cost a latch, PBEDO cost a counter, SDRAM cost a slight re-design. However, we've run out of free ideas, and now all changes are considered expensive, so there is no consensus on new directions and a myriad of choices has appeared. [Do the LATENCY mods starting with ESDRAM, and then the INTERFACE mods.]

DRAM Evolutionary Tree


[Figure: DRAM evolutionary tree. From Conventional DRAM, a chain of (mostly) structural modifications targeting throughput: FPM -> EDO -> P/BEDO -> SDRAM -> ESDRAM. Off to the side, structural modifications targeting latency: MoSys, FCRAM, VCDRAM ($ marks an on-chip cache). Below SDRAM, interface modifications targeting throughput: Rambus and DDR/2, the future trends.]


DRAM Evolution
Read Timing for Conventional DRAM
[Timing diagram: the same conventional read, now annotated with four phases: row access, column access, transfer overlap, and data transfer. Each access repeats the full RAS/CAS sequence with a new row and column address.]

FPM allows you to keep the sense amps active for multiple CAS commands: much better throughput. Problem: you cannot latch a new value in the column address buffer until the read-out of the data is complete.

DRAM Evolution
Read Timing for Fast Page Mode
[Timing diagram: RAS stays low while CAS toggles three times; the address bus carries one row address, then three column addresses; each CAS produces valid data on DQ. The column access and data transfer phases repeat without a new row access.]
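Why keeping the sense amps active pays off, in round numbers. The cycle counts below are assumptions for illustration, not figures from the lecture:

```python
# n reads from the same row: conventional DRAM vs fast page mode.
tRC_cycles  = 12   # full RAS/CAS/precharge cycle per access (assumed)
tROW_cycles = 4    # one-time row access portion (assumed)
tPC_cycles  = 4    # page-mode CAS cycle (assumed)

def conventional(n):
    return n * tRC_cycles                # every access pays the full row cycle

def fast_page_mode(n):
    return tROW_cycles + n * tPC_cycles  # one row access, then n CAS cycles

for n in (1, 4, 8):
    print(n, conventional(n), fast_page_mode(n))
# 1: 12 vs 8 cycles; 4: 48 vs 20; 8: 96 vs 36
```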

Solution to that problem: instead of simple tri-state buffers, use a latch as well. By putting a latch after the column mux, the next column address command can begin sooner.

DRAM Evolution
Read Timing for Extended Data Out
[Timing diagram: as in FPM, RAS stays low while CAS toggles with three column addresses, but the output latch holds each data word valid on DQ while the next CAS begins, extending the data-out window and increasing overlap.]

By driving the column-address latch from an internal counter rather than an external signal, the minimum cycle time for driving the output bus was reduced by roughly 30%.

DRAM Evolution
Read Timing for Burst EDO
[Timing diagram: a single row address and a single column address are presented; subsequent CAS toggles advance an internal counter, so four valid data words stream out on DQ.]

"Pipeline" refers to the setting up of the read pipeline: the first CAS\ toggle latches the column address, and all following CAS\ toggles drive data out onto the bus. Therefore data stops coming when the memory controller stops toggling CAS\.

DRAM Evolution
Read Timing for Pipeline Burst EDO
[Timing diagram: as in Burst EDO, one row and one column address; the first CAS\ toggle latches the column address and each following toggle drives the next of four valid data words onto DQ.]

Main benefit: it frees the CPU or memory controller from having to control the DRAM's internal latches directly; the controller/CPU can go off and do other things during the idle cycles instead of waiting. Even though the time-to-first-word latency actually gets worse, the scheme increases system throughput.

DRAM Evolution
Read Timing for Synchronous DRAM
[Timing diagram: everything is synchronous to a clock. An ACT command with the row address starts the row access; a READ command with the column address starts the column access; after the CAS latency, four valid data words appear on DQ on successive clock edges. RAS + CAS + OE and friends have become a command bus.]
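A sketch of the synchronous schedule implied by the diagram, assuming CAS latency 2, a tRCD of 2 clocks, and a burst of 4 (all assumptions):

```python
# Cycle-by-cycle schedule for one SDRAM burst read.
tRCD_clk, CL, BURST = 2, 2, 4       # clocks (assumed)

act_cycle  = 0                      # ACT command with the row address
read_cycle = act_cycle + tRCD_clk   # READ command with the column address
data_cycles = [read_cycle + CL + i for i in range(BURST)]
print(read_cycle, data_cycles)      # 2 [4, 5, 6, 7]: one word per clock edge
```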

The output latch on EDO allowed you to start CAS sooner for the next access (to the same row). ESDRAM latches the whole row, which allows you to start precharge and RAS sooner for the next page access: HIDE THE PRECHARGE OVERHEAD.

DRAM Evolution
Inter-Row Read Timing for ESDRAM
Regular CAS-2 SDRAM, R/R to the same bank:
[Timing diagram: ACT with the row address, READ with the column address, then PRE, ACT, and READ for the second row. The second four-word burst cannot begin until the first burst has been read out and the bank precharged and re-activated.]

ESDRAM, R/R to the same bank:
[Timing diagram: the same ACT, READ, PRE, ACT, READ sequence, but the first row is held in the row latch, so PRE and ACT for the second row overlap the first burst's data transfer and the second burst begins several clocks earlier.]
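A deliberately crude cycle model of the difference, under the note's rule that plain SDRAM cannot precharge until the sense amps have been read out, while ESDRAM's row latch lets PRE/ACT overlap the burst. Every number is an assumption:

```python
# First data of the second burst, read/read to different rows of one bank.
CL, tRCD_clk, tRP_clk, BURST = 2, 2, 2, 4   # clocks (assumed)

# SDRAM: read out the burst, then PRE, ACT, and CAS latency in sequence.
sdram = BURST + tRP_clk + tRCD_clk + CL

# ESDRAM: PRE + ACT overlap the burst read-out from the row latch.
esdram = max(BURST, tRP_clk + tRCD_clk) + CL

print(sdram, esdram)   # 10 vs 6 clocks after the first burst begins
```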

A neat feature of this type of buffering: write-around.

DRAM Evolution
Write-Around in ESDRAM
Regular CAS-2 SDRAM, R/W/R to the same bank, rows 0/1/0:
[Timing diagram: every row change costs PRE + ACT, so the sequence is ACT, READ (row 0), PRE, ACT, WRITE (row 1), PRE, ACT, READ (row 0), with bursts of valid data in between.]

ESDRAM, R/W/R to the same bank, rows 0/1/0:
[Timing diagram: row 0 stays in the row latch while the write to row 1 goes around it, so the second READ needs no PRE/ACT; its burst overlaps the write data and finishes far sooner.]

(Can the second READ be this aggressive?)

Main thing: it is like having a bunch of open row buffers (a la Rambus), but the problem is that you must deal with the cache directly (move data into and out of it), not the DRAM banks, which adds an extra couple of cycles of latency. However, you get good bandwidth if the data you want is in the cache, and you can prefetch into the cache ahead of when you want it. Originally targeted at reducing latency; now that SDRAM is CAS-2 and RCD-2, this makes sense only in a throughput way.

DRAM Evolution

Internal Structure of Virtual Channel


[Figure: Virtual Channel internals. Two banks (A and B) sit behind the sense amps; between the sense amps and the input/output buffer are 16 channels (segments), each 2 Kbit. ACTIVATE moves a row onto the sense amps; PREFETCH and RESTORE move 2 Kbit segments between the sense amps and the channels; SEL/DEC plus READ/WRITE move data between a segment and the DQs.]

The segment cache is software-managed, which reduces energy.

FCRAM opts to break up the data array: only activate a portion of the word line.

DRAM Evolution
Internal Structure of Fast Cycle RAM
[Figure: SDRAM vs FCRAM arrays side by side. SDRAM: 13 address bits drive the row decoder of an 8M array (8K rows x 1 Kbit), with sense amps below; tRCD = 15 ns (two clocks). FCRAM: 15 address bits drive the row decoder of an 8M array (organization not specified), with sense amps below; tRCD = 5 ns (one clock).]

8K rows requires 13 bits to select; FCRAM uses 15 (assuming the array is 8K x 1K; the data sheet does not specify).

Reduces access time and energy/access.
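The address-bit arithmetic in the note, spelled out (the 4-way word-line split is the same guess the note itself makes):

```python
import math

rows = 8 * 2**10                               # 8K rows
print(math.ceil(math.log2(rows)))              # 13 bits to select a row (SDRAM)

subarrays = 4                                  # assumed word-line split (FCRAM)
print(math.ceil(math.log2(rows * subarrays)))  # 15 bits, matching FCRAM's count
```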

MoSys takes this one step further: DRAM with an SRAM interface and speed, but DRAM energy. Physical partitioning: 72 banks. Auto refresh: how do you do this transparently? The logic moves through the arrays, refreshing them when not active. But what if one bank gets repeated access for a long duration? All other banks will be refreshed, but that one will not. Solution: they have a bank-sized CACHE of lines; in theory, you should never have a problem (magic).

DRAM Evolution
Internal Structure of MoSys 1T-SRAM

[Figure: the address feeds a bank select that routes each access to one of many small banks; an auto-refresh unit and a bank-sized cache ($) sit alongside the banks and feed the DQs.]
