
Can an Information Theorist Be Happy in a Center for Information Storage?


Jack Keil Wolf CMRR, UCSD

Padovani Lecture 2010, School of Information Theory, USC

This talk is dedicated to David Slepian, who taught me all that I know about information theory and a lot more.

Introduction

A very short pictorial history of my relationship with Roberto Padovani and how I ended up teaching at UCSD.

The Univ. of Pennsylvania 1952-1956

Princeton University 1956-1959

USAF 1960-1963

NYU (Uptown) 1963-1965

Brooklyn Poly 1965-1973

UMASS 1973-1984

Roberto Padovani and Me


Roberto Padovani was one of my graduate students at the University of Massachusetts. His M.S. thesis was on the performance of error-detecting codes and his Ph.D. thesis was on the design and performance of trellis codes. He joined Linkabit Corporation upon graduation from UMass.

Roberto Padovani and Me


He joined Qualcomm shortly after it was founded. One of the first Qualcomm products was a TCM chip based upon a pragmatic coding scheme he co-developed. He was the principal architect of Qualcomm's high-speed cellular data system. He is presently CTO of Qualcomm.

He is a great friend and a terrific boss.

Advertisement for TCM Chip

Roberto Prior to Giving an Invited Lecture at UMass in 2008

Roberto Presenting the Lecture

Roberto Answering Questions About the Future of Communications

UCSD
In 1983 a new interdisciplinary research center was being formed at UCSD. It was called the Center for Magnetic Recording Research (CMRR) and was concerned with educating students and pursuing research in magnetic recording. It sounded interesting to me because:
Our kids had all left Amherst and we were looking for something new.
I had worked with Gottfried Ungerboeck at IBM Zurich on coding for a partial response channel, which I learned was a model for the magnetic recording channel.
UCSD was in San Diego.

Location, Location, Location

A Minor Problem
I knew nothing about magnetic recording. Not only did I not know how to spell coercivity, but the first time I mentioned it in a talk I mispronounced it. But UCSD reluctantly made me an offer as the first faculty member in CMRR.

Advice from Others to Me


Berlekamp had written:
"Communication links transmit information from here to there. Computer memories transmit information from now to then."

That sounded very good to me.

But many of my very smart friends said:


Magnetic recording is boring. Not only is it boring but it is a dead end! All the advances have been made. The future lies in:
Optical recording
Holographic recording
Etc., etc., etc.

But

Very smart people are sometimes wrong.

1956: IBM RAMAC, the First Magnetic Hard Drive

Total Capacity = 5 Mbyte

Gigabyte Drive Circa 1983 IBM 3380

A 2010 2 Terabyte Drive

$119.00 Amazon.com

Compressed History of HDDs


                    1956 IBM RAMAC (5 MByte)     2010 HDD (2 TByte)
Areal density       2 kbits/in2                  500 Gbits/in2
Cost per MByte      $10,000                      $0.00007*

* 7 cents per gigabyte

Historical Areal Density Increase of Hard Disk Drives

Areal density has increased more than 250 million times since the first RAMAC in 1956, from 0.002 Mbit/in2 to 500 Gbit/in2 today. We expect much higher areal densities in the future, i.e., 1 Tbit/in2 and 10 Tbit/in2.

* CAGR = Compound Annual Growth Rate

Jack Wolf Arrives at CMRR


Question: What is This? Answer: A 1975 HDD Factory Floor

Facts About This Factory Floor

The total capacity of all of the drives shown on this factory floor was less than 20 Gigabytes. The total selling price of all of these drives was about $4,000,000!

More About the IBM 3380 HDD

1980s: IBM 3380 Drive

Hard Drives vs. Flash Memory

Lower-capacity consumer products (e.g., iPods) have transitioned to flash memory. But flash is still more expensive than HDDs for computer applications. Flash memory is NOT a direct replacement for HDDs because of several differences to be discussed later.

$$ per Megabyte Hard Disk vs. Solid State Drives

Because of the success of flash, the 1-inch HDD was discontinued a few years ago.

Coat of Arms of an Information Theorist

[Block diagram: Source* → Source Encoder → Channel Encoder → Channel* → Channel Decoder → Source Decoder → Sink*.
* Blocks an information theorist can't control; the diagram also marks the blocks a digital recording designer can't control.]

Coat of Arms of an Information Theorist Working in Digital Recording

[Block diagram: Error Correction Encoder → Modulation Encoder → Write Equalization (the channel encoder) → Channel → Equalization and Detection → Modulation Decoder → Error Correction Decoder (the channel decoder).]

Note that the channel is controllable (but by physicists).

Some Items of Interest to an Information Theorist

The channel (which is made up of the magnetic media and the write and read heads) keeps changing.
Big improvements in recording density have been achieved here!!

The error-correcting code used for the last 25 years is a Reed-Solomon code in conjunction with a hard-input decoder.
But LDPC codes and iterative decoding are on the way!!

The purpose of the modulation code is to prevent certain bad sequences from being written.
To an information theorist, this is coding for the noiseless channel.

Information Theorists Like Simple Models

The write signal is plain vanilla +1 / -1 baseband binary data. (No QAM, M-PSK, etc.)

An AWGN channel is often used as a first-order approximation for the channel model. But the actual channel is really much more complex. At low recording densities there is essentially no ISI, so matched-filter (bit-by-bit) detection is optimal (for an AWGN channel).

Some Items of Interest to an Information Theorist

However, at higher recording densities, the ISI cannot be ignored. To achieve higher recording densities, in 1984 the industry abandoned bit-by-bit detection and adopted partial-response signaling with Viterbi sequence detection to combat ISI. IBM called it PRML (since they wanted to avoid the use of Viterbi's name). Every disk drive today uses some form of Viterbi detection.

Why Such Amazing Progress?


It depends upon who you talk to.
Physicists credit advanced materials for heads and disks. Mechanical engineers credit advanced mechanics. Information theorists credit applications of Shannon theory.

One estimate is that about 20% of the progress was due to advances in signal processing. However, advances in all fields were required to make the system work.

Progress as Seen By a Physicist

2007 Nobel Prize in Physics was awarded to the inventors of the GMR head

Progress as Seen by an Information Theorist

[Figure: timeline of coding and signal-processing advances. Detection: peak detection → PRML (partial response with Viterbi detection), PR4, EPR4, E2PR4, NPML, MSN, post-processing. Modulation codes: FM, MFM, (2,7), (1,7), (0,G/I), TMTR. ECC: Reed-Solomon codes; LDPC codes??]

The True Sources of Progress


Many different technological advances led to this amazing progress. New inventions were the enabling technology. However, the constant progress between the introductions of these new inventions was the result of scaling (i.e., shrinking the dimensions of everything).

Longitudinal vs. Perpendicular Recording


Longitudinal magnetic recording (LMR) technology: the limit was around 150 Gbit/in2.

Perpendicular magnetic recording (PMR) technology:
- High-anisotropy material
- Vertical alignment of magnetization
- Much smaller bits are possible

Writing is possible due to flux leaking from the write head to the disk. Reading is due to flux leaking from the disk to the read head.

About 500 Gbit/in2 has been achieved with PMR today.

Shingled Write Process


Gap is 100 nm but bits are 25 nm. How can this be??

[Figure: shingled writing of overlapping 100-nm-wide tracks on the disk.]

Modulation Codes
The purpose of modulation codes is to prohibit the occurrence of certain troublesome sequences, such as sequences which cause excessive ISI or which make timing recovery difficult. The most well-known example of a modulation code is the so-called (d,k) code, where no run of 0s longer than k or shorter than d is permitted. d and k are nonnegative integers for which k > d. In early Gbyte drives (circa 1980), (2,7) and (1,7) codes were used. Today, variations on (0,k) codes are used. Shannon discussed such codes at the very beginning of his 1948 paper, in a section entitled "The Discrete Noiseless Channel."

Shannon Statue at CMRR

Claude Shannon
In his classic 1948 paper, Shannon showed that for large n, the number of length-n constrained sequences, N(n), is approximately 2^(Cn). The quantity C is called the capacity of the constrained system. Said in another way, C = lim_{n→∞} (1/n) log2 N(n).

The rate of a code, R, is the (average) ratio of the number of unconstrained digits to constrained digits. Shannon showed that there exist codes of rate R if and only if R ≤ C.

Computing the Capacity

Shannon (1948) gave two equivalent methods for computing the capacity which are applicable to (d,k) codes.

First method: For finite k, N(n) satisfies the linear difference equation:

N(n) = N(n-(d+1)) + N(n-(d+2)) + ... + N(n-(k+1)).

Computing the Capacity

By standard methods of solving linear difference equations, Shannon showed that C is equal to the base-2 logarithm of the largest real root of the equation:

x^(k+2) - x^(k+1) - x^(k-d+1) + 1 = 0.

Second method:

Shannon showed that the capacity is equal to the base-2 logarithm of the largest eigenvalue of the adjacency matrix of a graph which generates the code symbols. We illustrate these two methods for a (1,2) code (i.e., d=1 and k=2).

Computing the Shannon Capacity of Binary (1,2) Codes

First method:

If d=1 and k=2, the equation x^(k+2) - x^(k+1) - x^(k-d+1) + 1 = 0 becomes x^4 - x^3 - x^2 + 1 = 0. The largest real root of this equation is 1.3247 and its base-2 logarithm is 0.4057.

Computing the Shannon Capacity of Binary (1,2) Codes

Second method: A constraint graph that generates code words in a (1,2) code has three states (tracking the current run of 0s), with each edge labeled by the written bit, 0 or 1.

The adjacency matrix of this graph is:

0 1 0
1 0 1
1 0 0

Computing the Shannon Capacity of Binary (1,2) Codes

The largest real eigenvalue of this matrix is 1.3247 and its base-2 logarithm is 0.4057. Thus the capacity of this constraint is C = 0.4057. Said in another way, for any (1,2) code, the ratio of unconstrained binary digits to constrained (i.e., coded) binary digits is at most 0.4057.

Example: A Rate-1/2 (2,7) Code

Using either of Shannon's methods, the capacity, C, of a (2,7) code is found to be 0.5174. However, Shannon did not tell us how to construct codes at rates near or at capacity. A variable-length, fixed-rate, R = 1/2, (2,7) code:

Information phrases    Code words
10                     0100
11                     1000
000                    000100
010                    100100
011                    001000
0010                   00100100
0011                   00001000

The code words form a prefix-free code and so can be decoded.

Example: (2,7) Codes

This code was used to combat ISI in systems using bit-by-bit detection:

No coding: channel bit spacing = T
1 0 1 1 0 0 0
Minimum separation between 1s = T

With rate-1/2 (2,7) coding: channel bit spacing = T/2
0 1 0 0 1 0 0 0 0 0 0 1 0 0
Minimum separation between 1s = 3T/2
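A small sketch (not from the slides; the helper names are mine) that applies the variable-length mapping above to the slide's data sequence 1 0 1 1 0 0 0 and checks the (2,7) run-length constraint on the result:

```python
# Information phrases -> code words, as in the table above
TABLE = {"10": "0100", "11": "1000", "000": "000100", "010": "100100",
         "011": "001000", "0010": "00100100", "0011": "00001000"}

def encode(data):
    """Greedily parse the data into information phrases (the phrase set is prefix-free)."""
    out, i = [], 0
    while i < len(data):
        for length in (2, 3, 4):                    # phrase lengths used by the code
            if data[i:i + length] in TABLE:
                out.append(TABLE[data[i:i + length]])
                i += length
                break
        else:
            break                                   # trailing bits shorter than any phrase
    return "".join(out)

def zero_runs_between_ones(seq):
    inner = seq.strip("0")                          # ignore runs before the first / after the last 1
    return [len(run) for run in inner.split("1")[1:-1]]

coded = encode("1011000")                           # the data sequence on the slide
print(coded)                                        # 01001000000100
print(all(2 <= r <= 7 for r in zero_runs_between_ones(coded)))   # True: (2,7) satisfied
```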

Constrained Codes (2-Dimensions)

Coding theorists are also interested in 2-dimensional constrained binary codes: i.e., constrained binary arrays where the binary digits are arranged in an array of rows and columns. Such codes might have application in 2-dimensional storage.

Constrained Codes (2-Dimensions)


We will use as an example a 2-dimensional array where every row and every column satisfies a 1-dimensional (d,k) constraint. Other interesting constraints exist.

A 2-Dimensional (1,2) Array

0 1 0 0 1 0 1 0 0 1 0 0 1 . . .
1 0 0 1 0 1 0 0 1 0 0 1 0 . . .
0 0 1 0 1 0 0 1 0 0 1 0 1 . . .
0 1 0 1 0 0 1 0 0 1 0 1 0 . . .
1 0 1 0 0 1 0 0 1 0 1 0 1 . . .
. . .

Constrained Codes (2-Dimensions)


Let the array have m rows and n columns, and let N(m,n) be the number of arrays that satisfy the 2-dimensional constraint. Then the 2-dimensional capacity, C2, is defined as:

C2 = lim_{m,n→∞} log2 N(m,n) / (mn).

Capacity of (d,k) Constrained Arrays

For 2-dimensional (d,k) constraints, C2 exists but Shannon didn't tell us how to compute it. To this day, for rectangular constraints, the exact value of the capacity is unknown except for the trivial cases where C2 = 0 or C2 = 1.

ECC Codes
Reed-Solomon codes are used in today's hard disk drives. We are on the verge of seeing the introduction of LDPC codes with iterative decoding in HDDs.

Press Release August 4, 2010


... announces its low-density parity check (LDPC)-based device is currently shipping in mainstream 2.5-inch mobile hard disk drive products. Today's HDD data recovery architectures are mostly based on concatenated coding schemes which use Reed-Solomon error correction codes, invented almost 50 years ago. Now, by using LDPC-based solutions, HDD vendors can continue to double the storage capacity of their drives every 18 months. ... current LDPC-based device reduces the number of errors read from a disk from 1 in 100 to 1 in 100 Million bits of data, relative to the previously used concatenated coding schemes.

The Future of HDDs


It is possible that the areal density will saturate very soon using the present technology. As the size of the stored bit shrinks, the present magnetic material will not hold its magnetization. This is called the superparamagnetic effect. It is believed that a radically new system will be required to overcome this effect.

The Future of Disk Drives


Two solutions are being pursued to overcome the superparamagnetic effect.

One solution is to use a magnetic material with a much higher coercivity. The problem with this solution is that you cannot write on the material at room temperature, so you need to heat the media to write. This is done with a laser. The second approach is called patterned media, where bits are stored on physically separated magnetic islands separated by a sea of non-magnetic material.

Future Technology?
HAMR: Heat-Assisted Magnetic Recording

Patterned Media

Writing on Patterned Media

[Figure: ordinary media vs. patterned media]

In ordinary media, one can write a bit anywhere on the magnetic surface. In patterned media one must write each bit on a magnetic island. This is a difficult task since one cannot read and write simultaneously.

Shingled Recording for Patterned Media

[Figure: islands and recorded bits at times 0 through 5. The data bits 0, 1, 1, 0, 0 are written, with some bits written one island late.]

Note that if the data bit written late is the same as the previous bit, there is no error in the recorded bit!!!

Mathematical Model

At time i = 1, 2, 3, . . .
Xi = data bit ∈ {0,1}
Yi = recorded bit ∈ {0,1}
Zi = state of the channel ∈ {0,1}
Zi = 0 if the data bit is written on the correct island
Zi = 1 if the data bit is written late

Then:
Yi = Xi if Zi = 0
Yi = Xi-1 if Zi = 1.

Thus:
Yi = Xi ⊕ (Xi ⊕ Xi-1) Zi

Previous Example

X: 0 1 1 0 0 1 0
Z: 0 0 0 1 1 0 1
Y: 0 1 1 1 0 1 1
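A minimal simulation sketch of the write channel Yi = Xi ⊕ (Xi ⊕ Xi-1) Zi (the function name and the assumption X0 = 0 are mine, not from the slides); it reproduces the example above:

```python
def write_channel(x, z, x0=0):
    """Bit-patterned-media write channel: y_i = x_i XOR (x_i XOR x_{i-1}) * z_i.
    z_i = 1 means the i-th data bit is written one island late."""
    y, prev = [], x0
    for xi, zi in zip(x, z):
        y.append(prev if zi == 1 else xi)           # equivalent to xi ^ ((xi ^ prev) & zi)
        prev = xi
    return y

X = [0, 1, 1, 0, 0, 1, 0]
Z = [0, 0, 0, 1, 1, 0, 1]
print(write_channel(X, Z))                          # [0, 1, 1, 1, 0, 1, 1], as above
```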

A Simple Rate-1/2 Code and Channel Capacity

Consider the trivial binary rate-1/2 code where each data bit Mi is recorded twice. That is, assume that X2i-1 = X2i = Mi ∈ {0,1}. Then, since

Y2i = X2i ⊕ (X2i ⊕ X2i-1) Z2i,

Y2i = Mi independent of the value of Z2i.

A decoder can decode this rate-1/2 code with zero error probability just by observing the values of Y with even indices, and thus the zero-error capacity of this channel is at least 1/2. As a result, a lower bound of 1/2 on the capacity of the channel holds independent of the statistical model assumed for the Z process.
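A quick exhaustive check (a sketch with my own helper names; it re-declares the same toy channel so it runs on its own) that the rate-1/2 repetition code is decoded without error for every possible Z realization:

```python
import itertools, random

def write_channel(x, z, x0=0):                      # same model as the sketch above
    y, prev = [], x0
    for xi, zi in zip(x, z):
        y.append(prev if zi == 1 else xi)
        prev = xi
    return y

def encode_rep(msg):                                # X_{2i-1} = X_{2i} = M_i
    return [b for m in msg for b in (m, m)]

def decode_rep(y):                                  # keep only the even-indexed outputs Y_{2i}
    return y[1::2]

msg = [random.randint(0, 1) for _ in range(4)]
x = encode_rep(msg)
print(all(decode_rep(write_channel(x, list(z))) == msg
          for z in itertools.product([0, 1], repeat=len(x))))   # True for every Z pattern
```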

Two Different Models for Z

Random Z: {Zi} is Bernoulli with parameter p: B(p). That is, {Zi} are i.i.d. and p = Pr[Zi = 1].

2-state Gilbert model Z: G(p0,1, p1,0).

[State diagram: state Zi = 0 moves to state Zi = 1 with probability p0,1 and stays with probability 1 - p0,1; state Zi = 1 moves to state Zi = 0 with probability p1,0 and stays with probability 1 - p1,0.]

Channel Capacity
For any model for the Z process, the channel capacity is defined in the usual manner. We call the capacity of the Bernoulli state model with parameter p CB(p), and we call the capacity of the Gilbert state model with parameters (p0,1, p1,0) CG(p0,1, p1,0). For the Bernoulli state model one can prove that CB(p) = CB(1-p), and for the Gilbert state model one can prove that CG(p0,1, p1,0) = CG(p1,0, p0,1).

Proof that CB(p) = CB(1-p)

Details are given in the paper "Write Channel Model for Bit-Patterned Media Recording," which will appear in the IEEE Transactions on Magnetics.

Bernoulli Model
The parameter space can therefore be reduced to the interval p ∈ [0, 1/2]. Furthermore, the same symmetry argument holds not just for the rate-maximizing input distribution, but for all input distributions. The capacity of the Bernoulli model is upper bounded by the achievable rate for a genie-aided decoder, i.e., one with the {Z} process realization known.

Upper Bound on Capacity for the Bernoulli Model

Given the realization of the Z process, whenever Zi-1 = 1 and Zi = 0, the value of Xi-1 cannot be determined from the Y process. Thus the Bernoulli state channel is equivalent to a correlated symmetric erasure channel with average erasure rate Pr{Zi-1 = 1, Zi = 0} = p(1-p). The resulting erasure channel is correlated since, erasures being dependent on 1-to-0 transitions in the Z process, two consecutive bits cannot be erased.

Upper Bound on Capacity for the Bernoulli Model

The capacity of a correlated symmetric erasure channel is the same as that of a memoryless symmetric erasure channel with the same erasure probability. Therefore, CB(p) ≤ 1 - p(1-p).
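A small numeric illustration (not from the slides): tabulating the genie-aided upper bound 1 - p(1-p) next to the 1/2 lower bound from the repetition code.

```python
for p in [i / 20 for i in range(11)]:               # p from 0 to 0.5
    upper = 1 - p * (1 - p)                         # correlated-erasure (genie) upper bound
    print(f"p = {p:.2f}   upper bound = {upper:.3f}   repetition lower bound = 0.500")
```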

Example

Given the Z realization (the genie), the decoder recovers X from Y:

Z: 0 0 0 1 1 0
Y: 0 1 1 1 0 1
X: 0 1 1 0 ? 1

Since Z5 = 1 and Z6 = 0, the value of X5 never appears in the Y sequence and cannot be determined: it is erased.

Bernoulli Model
The input distribution that maximizes the mutual information is unknown, so no closed-form expression has been found for the capacity of the Bernoulli model. However, lower bounds on the capacity can be found by assuming particular forms for the input distribution (e.g., an i.i.d. input process). Very tight upper and lower bounds have been found for the symmetric information rate, i.e., when the input is uniform and i.i.d. An accurate estimate of the symmetric information rate has been obtained using the BCJR algorithm.

Bernoulli Model Symmetric Information Rate

[Figure: upper and lower bounds on the symmetric information rate as a function of p.]

Bernoulli Model Symmetric Information Rate

Note that on the previous slide, for some values of p, an upper bound to the symmetric information rate is strictly less than 1/2. But we know that the true capacity is greater than or equal to 1/2 for all values of p. This shows that for these values of p, the capacity-achieving input process is not i.u.d.

Bernoulli Model
To explore the loss in achievable rate due to an i.i.d. input, we considered a symmetric first-order binary Markov input with a single transition parameter Pr{Xi = 1 | Xi-1 = 0} = Pr{Xi = 0 | Xi-1 = 1}. Upper and lower bounds were found for the mutual information for a Markovian input as a function of this transition parameter and p.

Bernoulli Model with Markovian Input

Comparison of the Symmetric Information Rate and the Information Rate for a Markovian Source

Summary for Bernoulli State Model

It was found that for the Bernoulli model for Z, considerable gains in the reliable transfer rate are possible by using an input with memory.

Gilbert Model
We know less about computing the capacity for this model than for the Bernoulli model. By using a genie to inform the decoder of the Z process we can obtain an upper bound to the capacity. Again the result is a correlated erasure channel, with average erasure probability Pr{Zi-1 = 1, Zi = 0} = p1,0 p0,1 / (p1,0 + p0,1), resulting in the upper bound on the capacity: CG(p0,1, p1,0) ≤ 1 - p1,0 p0,1 / (p1,0 + p0,1).

Insertions and Deletions

For any model for Z one can interpret the effects of the channel in terms of insertions and deletions. If Zi-1 = 0 and Zi = 1, then Yi-1 = Yi = Xi-1, so there is an insertion of Xi-1 in the Y sequence. If on the other hand Zj-1 = 1 and Zj = 0, then Yj-1 = Xj-2 and Yj = Xj, so there is a deletion of Xj-1 in the Y sequence. Note that in this model, insertions and deletions alternate in occurrence and insertions are a repeat of the previous data digit.

Insertions and Deletions

Example:

X: 0 1 1 0 0 1 0
Z: 0 1 1 1 1 0 0
Y: 0 0 1 1 0 1 0

Here Z1 = 0 and Z2 = 1, so X1 is inserted (repeated) in the Y sequence, and Z5 = 1 and Z6 = 0, so X5 is deleted from the Y sequence.

Some Final Remarks on HDDs

People have been predicting the death of magnetic hard disk drives for many years. Lacking a prognostiscope, it is difficult to predict how long the HDD will remain the storage device of choice. However, the magnetic hard disk drive seems to be a cat with nine lives, having beaten out all competitors in the past.

Coding for Flash Memories


Flash Memory
Flash is a non-volatile memory which is fast, power efficient and has no moving parts. It is electrically programmed and erased. It is used in:
Digital cameras
Low-capacity iPods
Mobile phones
Laptop computers
Hybrid drives

Flash Memories Structure


Array of cells made from oa*ng gate transistors. Cells are subdivided into blocks and then into pages. The cells are programmed by pulsing electrons via hot-electron injec*on.


Flash Memories Structure

Each cell can have q levels, represented by different amounts of electrons. In today's products, q = 2, 4, 8 or 16.

In order to reduce a cell's level, all the cells in that block must be reset to level 0 before rewriting. A VERY EXPENSIVE OPERATION.

Flash Memory Structure

The memory consists of blocks. The size of each block is 128 (or 256) KB. Each block consists of 64 (or 128) pages. The size of each page is 2 KB.

[Figure: a block drawn as a stack of pages, Page 1, Page 2, Page 3, ..., Page 63, Page 64.]

Writing: write sequentially to the next available page.
Erasing: can only erase an entire block!

Flash Memory Constraints


The endurance of ash memories is related to to the number of *mes the blocks are erased. In single level ash with q=2, a block can tolerate ~104-105 erasures before it starts producing excessive errors.
SLC: Single Level Cell

The larger the value of q, the less the endurance.


MLC: Mul* Level Cell

The Goal: Represen*ng the data eciently such that block erasures are postponed as much as possible.
97

Experiment Description

For each block the following steps were repeated:
The block was erased. Pseudo-random data was written to the block. The data was read and compared to find errors.

Remark:
The experiment was done under lab conditions. Other factors such as temperature change, intervals between erasures and multiple readings before erasures were not considered.

Raw BER for SLC Block

[Figure: raw BER vs. number of block erasures for an SLC block (scale roughly 10^-6 to 10^-4); the lifetime guaranteed by the manufacturer is marked.]

Raw BER for MLC Block

[Figure: raw BER vs. number of block erasures for an MLC block (scale roughly 10^-5 to 10^-3); the lifetime guaranteed by the manufacturer is marked.]

An Introduction to WOM-codes
WOM-codes allow us to write several times to the same block of memory without erasing. Example: In 1982, Rivest and Shamir found a way to write 2 bits of information twice using only 3 cells. We denote a WOM-code that writes k times to n cells as a <V1, V2, ..., Vk>/n code, where Vi is the number of messages written on the ith write. Thus the Rivest-Shamir code is a <4,4>/3 code with k = 2.

The Rivest-Shamir Code

Data bits   First write   Second write
00          000           111
01          001           110
10          010           101
11          100           011

Example 1: First write: want to store data 01: write 001 to memory. Second write: want to store data 10: write 101 to memory.
Example 2: First write: want to store data 01: write 001 to memory. Second write: want to store data 01: leave 001 in memory.

If we want to write the same data on the second write, we do not change what is written on the first write.

Note that when going from the first write to the second write, no 1s are erased.
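A minimal sketch of the <4,4>/3 Rivest-Shamir code built from the table above (the function names are mine); the assert documents that the second write never turns a 1 back into a 0:

```python
FIRST  = {"00": "000", "01": "001", "10": "010", "11": "100"}
SECOND = {"00": "111", "01": "110", "10": "101", "11": "011"}

def decode(state):
    """The first- and second-write codeword sets are disjoint, so decoding is unambiguous."""
    for table in (FIRST, SECOND):
        for data, word in table.items():
            if word == state:
                return data

def wom_write(data, state=None):
    """First write if state is None, otherwise a second write on top of `state`."""
    if state is None:
        return FIRST[data]
    if decode(state) == data:                       # same data: leave the memory unchanged
        return state
    new = SECOND[data]
    assert all(not (s == "1" and b == "0") for s, b in zip(state, new))   # no 1 -> 0
    return new

cells = wom_write("01")                             # first write:  '001'
cells = wom_write("10", cells)                      # second write: '101'
print(cells, decode(cells))                         # 101 10
```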

Rate of a WOM-code
The rate of the ith write is

Ri = (bits of information on the ith write) / (total number of bits) = log2(Vi) / n.

The total rate of a WOM-code is R = Σ Ri. The Rivest-Shamir code has R1 = R2 = 2/3 and R = 4/3.

WOM Capacity Region

The capacity region of a binary WOM-code with two writes is

CWOM = {(R1, R2) | p ∈ [0, 0.5], R1 ≤ h(p), R2 ≤ 1 - p},  where h(p) = -p log2(p) - (1-p) log2(1-p).

Thus R = R1 + R2 ≤ h(p) + 1 - p. The right-hand side is maximized for p = 1/3, yielding Rmax = log2(3) = 1.58.
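A quick numerical check (not from the slides) that the sum rate h(p) + 1 - p peaks near p = 1/3 at log2(3) ≈ 1.585:

```python
from math import log2

def h(p):                                           # binary entropy function
    return -p * log2(p) - (1 - p) * log2(1 - p)

best_rate, best_p = max((h(p) + 1 - p, p) for p in (i / 10000 for i in range(1, 5000)))
print(best_rate, best_p)                            # ~1.585 at p ~ 0.333, i.e. Rmax = log2(3)
```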

Our Construction for a 2-Write WOM Code

Choose a linear (n, k) code with parity check matrix H. Let r = n - k, so that H is an r-row by n-column matrix of rank r. For a vector v ∈ {0,1}^n, let H(v) be the matrix H with 0s replacing the columns that correspond to the positions of the 1s in v. On the first write, write only those vectors v such that rank(H(v)) = r. Let V1 = {v ∈ {0,1}^n | rank(H(v)) = r}. Then R1 = log2|V1| / n.

Our Construction for a 2-Write WOM Code

Assume that e1 is the vector written on the first write. Second write:
Consider a data vector s2 of r bits. Find e2 such that H(e1) e2 = s1 ⊕ s2, where s1 = H e1. A solution e2 exists since rank(H(e1)) = r. Write e1 ⊕ e2 to memory.

Decoding on the second write:

Multiply the stored vector e1 ⊕ e2 by H: H(e1 ⊕ e2) = H e1 ⊕ H e2 = s1 ⊕ (s1 ⊕ s2) = s2.

Thus, R2 = (n-k)/n and R = R1 + R2 = [log2|V1| + (n-k)] / n.

WOM-Code Construction: An Example

Let H be the parity check matrix of a (7,4) Hamming code:

      1 1 1 0 1 0 0
H =   1 0 1 1 0 1 0        n = # of columns = 7,  r = # of rows = 3
      1 1 0 1 0 0 1

On the first write, we program only vectors v such that rank(H(v)) = 3, i.e. V1 = {v ∈ {0,1}^n | rank(H(v)) = 3}. For H as shown above, |V1| = 1 + 7 + 21 + 35 + (35 - 7) = 92. Thus, we can write one of 92 messages on the first write. Encoding and decoding of the first write are done with a lookup table. Say we write e1 = 0 1 0 1 1 0 0.

(Recall: for a vector v ∈ {0,1}^n, H(v) is the matrix H with 0s in the columns that correspond to the positions of the 1s in v.)

WOM-Code Construction: An Example

e1 = 0 1 0 1 1 0 0, and note that s1 = H e1 = 0 1 0. Insert 0s in the columns of H that correspond to 1s in the first write. This new matrix is H(e1):

          1 0 1 0 0 0 0
H(e1) =   1 0 1 0 0 1 0
          1 0 0 0 0 0 1

We can now write a message of length r = 3 bits. Say we want to store s2 = 0 1 1 on the second write. We want to find a vector e2 such that H(e1) e2 = s1 ⊕ s2 = 0 0 1. Choose e2 = 0 0 0 0 0 0 1. Then we write e1 ⊕ e2 = 0 1 0 1 1 0 1.

WOM-Code Construction: An Example

Note that we can decode by multiplying the stored vector by H:

H · (0 1 0 1 1 0 1)^T = (0 1 1)^T = s2.

We can write 92 messages on the first write, so |V1| = 92 and R1 = log2(92)/7 = 0.932, R2 = 3/7 = 0.429 and R = 1.361, which is better than the Rivest-Shamir construction. However, R1 = R2 for the Rivest-Shamir construction.
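A sketch (my own helper names; GF(2) arithmetic done with a small elimination routine) that reproduces the example: it counts |V1| = 92 for the Hamming parity-check matrix above and carries out the second write with e1 = 0101100 and s2 = 011.

```python
import itertools
import numpy as np

H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])              # parity-check matrix of the (7,4) Hamming code
n, r = H.shape[1], H.shape[0]

def gf2_rank(M):
    """Rank of a binary matrix over GF(2), by Gaussian elimination."""
    M, rank = M.copy() % 2, 0
    for col in range(M.shape[1]):
        piv = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] ^= M[rank]
        rank += 1
    return rank

def H_punctured(v):
    """H with 0s in the columns where v has 1s, i.e. H(v)."""
    Hv = H.copy()
    Hv[:, np.nonzero(np.asarray(v))[0]] = 0
    return Hv

# First write: count the vectors that keep rank(H(v)) = r.
V1 = [v for v in itertools.product([0, 1], repeat=n) if gf2_rank(H_punctured(v)) == r]
print(len(V1))                                      # 92, so R1 = log2(92)/7 ~ 0.932

# Second write, following the example above: e1 = 0101100, s2 = 011.
e1 = np.array([0, 1, 0, 1, 1, 0, 0])
s1 = H @ e1 % 2                                     # [0 1 0]
s2 = np.array([0, 1, 1])
target = (s1 + s2) % 2                              # s1 xor s2 = [0 0 1]
e2 = next(np.array(c) for c in itertools.product([0, 1], repeat=n)
          if not np.any(np.array(c) & e1)           # keep e2 off the positions where e1 = 1
          and np.array_equal(H_punctured(e1) @ np.array(c) % 2, target))
stored = e1 ^ e2                                    # disjoint supports: no 1 is erased
print(stored, H @ stored % 2)                       # [0 1 0 1 1 0 1], decodes to s2 = [0 1 1]
```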

Some More Results

The best value of R previously achieved for two writes was by Wu, who obtained R = 1.371. We obtained many codes that bettered this result. For the Golay (24,12) code, we obtained R = 1.4547. For the Golay (23,11) code, we obtained R = 1.4632. Choosing the code as the dual of the Reed-Muller(4,2) code, we obtained R = 1.4566. If R1 > R2, we can limit the number of messages used on the first write so that R1 = R2 and R = 2 R1. Doing this for the dual of the Reed-Muller(4,2) code, we obtained R = 1.375.

Computer Search for Good 2-Write WOM Codes

Construct a random matrix of size r × n and rank r. Cycle through all vectors of length n and Hamming weight at most n - r. For each vector v, zero out the columns of the matrix where 1s exist in v, and compute the rank of the resulting matrix. If it is the same as the original rank, add one to |V1|. Once we know |V1|, we can compute the rate with R = (1/n)(log2|V1| + r).
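A sketch of the search loop described above (the parameters n = 8, r = 3, the 20 trials, and the function names are my choices, not the authors'):

```python
import itertools, math
import numpy as np

def gf2_rank(M):
    """Rank over GF(2) by elimination (same routine as in the earlier sketch)."""
    M, rank = M.copy() % 2, 0
    for col in range(M.shape[1]):
        piv = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] ^= M[rank]
        rank += 1
    return rank

def random_wom_rate(n, r, rng):
    """One trial: draw a random rank-r binary r x n matrix H, count the vectors v
    (Hamming weight at most n - r) with rank(H(v)) = r, and return the total rate."""
    while True:
        H = rng.integers(0, 2, size=(r, n))
        if gf2_rank(H) == r:
            break
    V1 = 0
    for v in itertools.product([0, 1], repeat=n):
        if sum(v) > n - r:
            continue                                # heavier vectors cannot keep rank r
        Hv = H.copy()
        Hv[:, np.nonzero(np.asarray(v))[0]] = 0
        if gf2_rank(Hv) == r:
            V1 += 1
    return (math.log2(V1) + r) / n                  # R = (1/n)(log2|V1| + r)

rng = np.random.default_rng(0)
print(max(random_wom_rate(8, 3, rng) for _ in range(20)))   # best rate found in 20 trials
```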

Time Sharing
If we know two codes with rate pairs (R1, R2) and (R3, R4), we can achieve any rate pair (t·R1 + (1-t)·R3, t·R2 + (1-t)·R4) for t a rational number between 0 and 1.

Some Achievable Rate Pairs and Capacity for WOM With Two Writes

More Achievable Rate Pairs and Capacity for WOM With Two Writes


Codes for More than 2 Writes

t (# of writes)   Lower bound   Our construction   Upper bound
3                 1.58          1.66               log2 4  = 2
4                 1.75          1.95               log2 5  = 2.32
5                 1.75          1.99               log2 6  = 2.58
6                 1.75          2.14               log2 7  = 2.8
7                 1.82          2.15               log2 8  = 3
8                 1.88          2.23               log2 9  = 3.17
9                 1.95          2.23               log2 10 = 3.32
10                2.01          2.27               log2 11 = 3.46

If you haven't guessed already, the answer to the question "Can an Information Theorist Be Happy in a Center for Information Storage?" is a resounding yes.

My Collaborators on this Talk

Paul Siegel

Aravind Iyengar

Eitan Yaakobi*

* with Scott Kayser

Some of the Rest of the Cast at CMRR

And a Special Thanks to All of My Ph.D. Students


Altekar, Shirish A. Armstrong, Alan J. Aviran, Sharon Baggen, Constant P. M. J. Barndt, Richard D. Bender, Paul Bernal, Robert W. Bridwell, John D. Bunin, Barry J. Caroselli, Joseph P. Chiang, Chung-Yaw Demirkan, Ismail Dorfman, Vladimir Eddy, Thomas W. Ergul, Faruk R. Fitzpatrick, James Fredrickson, L. J. French, Catherine Ann Friedmann, Arnon A. Goldberg, Jason S Gupta, Dev Vart Hartman, Paul D. Ho, Kelvin K. Y. Karakulak, Seyhan Kerdock, Anthony M. Kim, Byung Guk Klein, Theodore J. Knudson, Kelly J. Kurkoski, Brian M. Lee, Patrick Levie, Karl Li, Allan I. Lin, Yinyi Ma, Howard H. Ma, Joong S. MacDonald, Charles E. Mangano, Dennis T. Marrow, Marcus Masnick, Burt McEwen, Peter A. Miller, John Milstein, Laurence B. Padovani, Roberto Panwar, Shivendra S. Pasternack, Gerald Paeerson, John D. Philips, Thomas K. Prohazka, Craig G. Raghavan, Sreenivasa A. Ritz, Mordechai Rodriguez, Manoel A. Schi, Leonard Souvignier, Thomas V. Trismen, Robert Wainberg, Stanley Walvick, Edward Weathers, Anthony D. Zehavi, Ephraim Zhang, Wenlong

Thank you for your kind attention.
