Data Compression and Encryption, Chapter 1


Karishma Raut

DATA COMPRESSION

Data compression is the art of representing information in a compact form. We create these compact representations by identifying and using structures that exist in the data. The data can be characters in a text file, numbers that are samples of speech or image waveforms, or sequences of numbers that are generated by other processes.

We need data compression because more and more of the information that we generate and use is in digital form, i.e., numbers represented by bytes of data, and the number of bytes required to represent multimedia data can be huge.

Example: To represent 1 second of video digitally without compression, we need more than 20 MB, or 160 Mbits.

This leads to the basic principle of compression, which is to "assign short codes to common events and long codes to rare events". Compression is performed by changing the representation of the data from inefficient (long) to efficient (short).

Compression performance

1. Compression ratio: the ratio of the size of the output (compressed) stream to the size of the input stream. A value of 0.6 means that the data occupies 60% of its original size after compression. Related units are bits per second (bps) and bits per pixel (bpp).

2. Compression factor: the inverse of the compression ratio, i.e., the size of the input stream divided by the size of the output stream. A value greater than one indicates compression. Saying "60% compression" means that the output stream occupies 40% of the original size, a saving of 60%.

3. Compression gain: defined as 100 log_e(reference size / compressed size), where the reference size is either the size of the input stream or the size of the compressed stream produced by some standard lossless compression method.

4. Speed: measured in cycles per byte (CPB), the average number of machine cycles it takes to compress one byte. This measure is important when compression is done by special hardware.

5. Other quantities: such as mean square error (MSE) and peak signal-to-noise ratio (PSNR), used to measure the distortion caused by lossy compression of images and movies.

6. Relative compression: used to measure the compression gain in lossless audio compression methods such as MLP (Meridian Lossless Packing). It expresses the quality of compression by the number of bits by which each audio sample is reduced. (The acronym MLP also denotes a multilevel progressive method for image compression.)
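The first two measures and the distortion measures above are simple to compute. The following sketch illustrates them in Python; the sample sizes and pixel values are hypothetical:

```python
import math

def compression_ratio(orig_bits, comp_bits):
    """Size of output stream / size of input stream (0.6 -> 60% of original)."""
    return comp_bits / orig_bits

def compression_factor(orig_bits, comp_bits):
    """Inverse of the compression ratio; values > 1 indicate compression."""
    return orig_bits / comp_bits

def mse(a, b):
    """Mean square error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB for samples with maximum value `peak`."""
    return 10 * math.log10(peak ** 2 / mse(a, b))

print(compression_ratio(1000, 600))   # 0.6 -> output is 60% of the original
print(compression_factor(1000, 600))  # > 1, so compression took place
print(round(psnr([100, 110], [101, 111]), 2))  # distortion of 1 per sample
```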

COMPRESSION TECHNIQUES

Lossless compression:
1. There is no loss of information.
2. The original data can be recovered exactly from the compressed data.
3. Used where a difference between the original and reconstructed data cannot be tolerated.
4. Higher compression ratios are generally not possible.
5. Applications: text compression, image processing in medical and satellite signal processing.

Lossy compression:
1. Some information is lost.
2. The original data cannot be recovered exactly from the compressed data.
3. Used where a difference between the original and reconstructed data is tolerable.
4. Higher compression ratios are possible.
5. Applications: storage and transmission of speech, video communication.


Statistical methods

Statistical methods use variable-size codes for the symbols (characters or pixels) they operate on, with the shorter codes assigned to symbols or groups of symbols that appear more often in the data (have a higher probability of occurrence). Designers and implementers of variable-size codes have to deal with the two problems of (1) assigning codes that can be decoded unambiguously and (2) assigning codes with the minimum average length. Some important variable-size codes are:

Prefix codes

Tunstall codes

Shannon-Fano coding

Huffman coding

Arithmetic coding, etc.

Shannon-Fano Coding

1. The symbols are first arranged in descending order of their probabilities.

2. The set of symbols is partitioned into two subsets with the same (or almost the same) total probabilities, i.e., as close to equiprobable as possible.

3. All symbols in the upper subset get assigned codes that start with a 1, while the codes of the symbols in the lower subset start with a 0.

4. Each subset is then recursively partitioned into two subsets of roughly equal probabilities, and the procedure is repeated.

5. The process continues until no subset can be partitioned further, i.e., every subset contains a single symbol.
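The recursive procedure above can be sketched as follows. The four-symbol alphabet and its probabilities are an assumed example, not taken from the notes:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, already sorted in
    descending order of probability. Returns {symbol: code}."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        acc, cut, best = 0.0, 1, None
        # find the split point making the two parts closest to equiprobable
        for i in range(1, len(group)):
            acc += group[i - 1][1]
            diff = abs(2 * acc - total)
            if best is None or diff < best:
                best, cut = diff, i
        split(group[:cut], prefix + "1")   # upper (more probable) set gets '1'
        split(group[cut:], prefix + "0")   # lower set gets '0'

    split(symbols, "")
    return codes

probs = [("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]
print(shannon_fano(probs))   # prefix-free: no code is a prefix of another
```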


HUFFMAN CODING

The Huffman algorithm starts by arranging the symbols in descending order of their probabilities. It then constructs a code tree, with a symbol at every leaf, from the bottom up.

This is done in steps: at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing both of them. When the list is reduced to just one auxiliary symbol, the tree is complete.
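The merging steps above can be sketched with a priority queue. The probabilities are an assumed example; a counter is used only to break ties in the heap:

```python
import heapq
import itertools

def huffman(probs):
    """probs: {symbol: probability}. Repeatedly merges the two
    lowest-probability nodes, building the code tree bottom up."""
    counter = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability
        p2, _, c2 = heapq.heappop(heap)   # second smallest
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
codes = huffman(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, "average length:", avg)   # average length 1.9 bits/symbol here
```

Note that the two least probable symbols (c and d) receive codes of the same length, as stated above.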


Huffman codes are prefix codes, and prefix codes are uniquely decodable.

1. In the optimum code, symbols that occur more frequently will have shorter code words than symbols that occur less frequently.

2. In the optimum code, the two symbols that occur least frequently will have code words of the same length.

Standard Huffman

Minimum-variance Huffman

Non-binary Huffman

Adaptive Huffman

Prof. Karishma Raut

Two different Huffman codes for the same source can have the same average code word length and hence the same redundancy; however, the variance of the lengths of their code words can be significantly different. It is then reasonable to select the second code, i.e., the minimum-variance Huffman code, since its code word lengths have a smaller variance compared to the standard Huffman code.

The binary Huffman coding procedure can be easily extended to the non-binary case, i.e., an m-ary alphabet (m ≠ 2). In the general case of an m-ary code and an M-letter alphabet, let m′ be the number of letters that are combined in the first phase. Then m′ is the number between 2 and m that is congruent to M modulo (m − 1).

The Huffman procedure requires knowledge of the probabilities of the source sequence. If these probabilities are unknown, Huffman coding becomes a two-pass procedure: in the first pass the statistics are collected, and in the second pass the source is coded.

For adaptive Huffman coding, two extra parameters are attached to the nodes of the binary tree.

Weight: the weight of each external node (leaf) is the number of times the symbol corresponding to that leaf has been encountered. The weight of each internal node is the sum of the weights of its offspring.

Node number: a unique number is assigned to every node. If the alphabet size is n, there are 2n − 1 internal and external nodes, which can be numbered y1 to y2n−1.

1. If xj is the weight of node yj, then the weights satisfy x1 ≤ x2 ≤ x3 ≤ … ≤ x2n−1.

2. Nodes y2j−1 and y2j are offspring of the same parent node for 1 ≤ j < n, and the node number of the parent is greater than the node numbers of its offspring.

Together these two conditions are called the sibling property of Huffman codes.

In adaptive Huffman coding, neither the transmitter (Tx) nor the receiver (Rx) knows anything about the source statistics at the start of transmission.

PROCEDURE

At the beginning, the tree at both Tx and Rx consists of a single node that corresponds to all the symbols not yet transmitted (NYT) and has zero weight. As transmission progresses, nodes for the transmitted symbols are added to the tree, and the tree is reconfigured using an update procedure.

Before transmission starts, a fixed code for each symbol is agreed upon between Tx and Rx. Suppose the source has an alphabet (a1, a2, …, am) of size m; then pick e and r such that m = 2^e + r and 0 ≤ r < 2^e.

The symbol ak is encoded as the (e + 1)-bit binary representation of k − 1 if 1 ≤ k ≤ 2r; otherwise, ak is encoded as the e-bit binary representation of k − r − 1.

When a symbol is encountered for the first time, the NYT code is transmitted, followed by the fixed code for that symbol.


The Tx and Rx start with the same tree structure, and both use the same update procedure, so the encoding and decoding processes remain synchronized with each other.

As the received binary string is read, the tree is traversed in a manner identical to that used in the encoding procedure. Once a leaf is reached, the symbol corresponding to that leaf is decoded. If the leaf is the NYT node, we check the next e bits to see whether the resulting number is less than r. If it is less than r, we read one more bit to complete the code for the symbol. The index of the symbol is then obtained either by adding 1 to the decimal number corresponding to the (e + 1)-bit binary string, or by adding r + 1 to the decimal number corresponding to the e-bit binary string.

Once the symbol has been decoded, the tree is updated, and the next bit is used to start another traversal down the tree.

In summary:

1. If the traversal leads to an internal node, go on reading bits until a leaf is reached.

2. If the leaf corresponds to a symbol already on the tree, decode it and update the weights.

3. Otherwise it is the NYT node. Read the next e bits; if the decimal equivalent of those e bits is less than r, read one more bit. The index of symbol k is then the decimal equivalent of the (e + 1)-bit string plus 1. Update the tree. k = ((e + 1)-bit string)10 + 1

4. If the decimal equivalent of the e bits is greater than or equal to r, the index of symbol k is the decimal equivalent plus r plus 1. Update the tree. k = (e-bit string)10 + r + 1
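The fixed (e, r) code used for first occurrences can be sketched as follows. The alphabet size m = 26 is an assumed example, giving e = 4 and r = 10, since 26 = 2^4 + 10:

```python
def fixed_encode(k, e, r):
    """Fixed code for symbol a_k (1-based index), where m = 2**e + r."""
    if k <= 2 * r:
        return format(k - 1, "0{}b".format(e + 1))  # (e+1)-bit code of k-1
    return format(k - r - 1, "0{}b".format(e))      # e-bit code of k-r-1

def fixed_decode(bits, e, r):
    """Reads one fixed code from the front of `bits`; returns (k, rest)."""
    v = int(bits[:e], 2)
    if v < r:                        # first e bits < r: read one more bit
        return int(bits[:e + 1], 2) + 1, bits[e + 1:]
    return v + r + 1, bits[e:]       # otherwise index = value + r + 1

e, r = 4, 10                         # m = 2**4 + 10 = 26 symbols
for k in range(1, 27):               # every symbol must decode back to itself
    code = fixed_encode(k, e, r)
    assert fixed_decode(code, e, r) == (k, "")
print(fixed_encode(1, e, r), fixed_encode(22, e, r))  # '00000' and '1011'
```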

ARITHMETIC CODING

In arithmetic coding, a single code word is generated for an entire sequence of symbols rather than a separate code word for each symbol in the sequence. (Extending Huffman coding to blocks of symbols is impractical, since it causes exponential growth in the size of the codebook.)

A unique tag (identifier) is generated for the sequence to be encoded. This tag corresponds to a binary fraction, which becomes the binary code for the sequence. To generate the tag, the cumulative distribution function (cdf) of the source is used. The source symbols are mapped to the numbers

X(ai) = i,  ai ∈ A,

where X is a random variable, so that

P(X = i) = P(ai)

The procedure for generating the tag works by reducing the size of the interval in which the tag resides as more and more elements of the sequence are received.

We start by dividing the unit interval [0, 1) into subintervals of the form [FX(i − 1), FX(i)). As each symbol xn of the sequence is observed, the boundaries of the interval containing the tag are updated as

l(n) = l(n−1) + (u(n−1) − l(n−1)) FX(xn − 1)
u(n) = l(n−1) + (u(n−1) − l(n−1)) FX(xn)

and the tag is taken as the midpoint of the final interval:

Tag = (l(n) + u(n)) / 2
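The interval-update procedure above can be sketched directly. The three-symbol source with P(1) = 0.8, P(2) = 0.02, P(3) = 0.18 and the sequence (1, 3, 2) are an assumed example:

```python
def arithmetic_tag(sequence, probs):
    """Returns the tag (midpoint of the final interval) for `sequence`,
    a list of 1-based symbol indices with probabilities `probs`.
    The cdf is F(i) = P(1) + ... + P(i), with F(0) = 0."""
    F = [0.0]
    for p in probs:
        F.append(F[-1] + p)
    lo, hi = 0.0, 1.0                # the unit interval [0, 1)
    for x in sequence:
        width = hi - lo
        hi = lo + width * F[x]       # u(n) = l(n-1) + (u - l) * F(x_n)
        lo = lo + width * F[x - 1]   # l(n) = l(n-1) + (u - l) * F(x_n - 1)
    return (lo + hi) / 2             # midpoint of the final interval

probs = [0.8, 0.02, 0.18]            # P(1), P(2), P(3): assumed example source
print(arithmetic_tag([1, 3, 2], probs))   # tag ≈ 0.77264
```

Each symbol shrinks the interval to the subinterval assigned to it, so rare symbols (narrow subintervals) cost more bits in the final binary fraction.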


Deciphering the tag

There are two ways for the decoder to know that the entire sequence has been decoded: the decoder may know the length of the sequence in advance, or a special symbol may be used to indicate the end of transmission.


DICTIONARY METHODS

There are many applications in which the output of the source consists of recurring patterns. A typical example is a text source, in which certain words occur repeatedly. To encode such a source, a list, or dictionary, is maintained that holds a collection of frequently occurring patterns from the source.

The source output is parsed into patterns, and each pattern is looked up for a matching entry in the dictionary. If the pattern has a matching entry, the source pattern is simply encoded with the index of the matching entry instead of being coded explicitly.

This achieves compression, and the technique is called the dictionary coding method. The encoder and decoder maintain identical dictionaries.

STATIC DICTIONARY

The static dictionary scheme is used when considerable prior knowledge about the source is available in advance. The static dictionary contains those patterns that are anticipated to occur most frequently in the source output.

For sources whose output has this recurring nature, a highly efficient compression technique with a static dictionary is obtained.

Such schemes work well only for the applications and data for which they were designed. A static scheme cannot readily be used with other applications, and if it is, it may result in expansion rather than compression.

DYNAMIC (ADAPTIVE) DICTIONARY

An adaptive dictionary technique adapts to the characteristics of the source output rather than remaining fixed as in the static approach. The dynamic dictionary method does not require initial knowledge of the source output for the purpose of compression. New patterns output by the source that do not exist in the dictionary are added to it dynamically.

LZ-77 (Lempel-Ziv, 1977)

LZ-78 (Lempel-Ziv, 1978)

LZW (Lempel-Ziv-Welch)


LZ-77 METHOD

In this method the dictionary is a portion of the previously encoded sequence viewed through a sliding window. The window consists of two parts: a search buffer, which contains a portion of the recently encoded sequence, and a look-ahead buffer, which contains the next portion of the sequence to be encoded.

To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the offset. The encoder then examines the symbols following the pointer location to see whether they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match.

The encoder searches for the longest match and, once it is found, encodes it with a triple <o, l, c>, where o is the offset, l is the length of the match, and c is the code of the symbol in the look-ahead buffer following the match.

DECODING

Suppose the sequence "cabraca" has already been decoded, and we receive the triples <0, 0, c(d)>, <7, 4, c(r)>, and <3, 5, c(d)>.

1. <0, 0, c(d)>: there is no match, so we simply decode the symbol d: cabracad

2. <7, 4, c(r)>: move the pointer back 7 symbols, copy 4 symbols (abra), then append r: cabracadabrar

3. <3, 5, c(d)>: move the pointer back 3 symbols and copy 5 symbols; the copy extends past the end of the already-decoded text (rarra), then append d: cabracadabrarrarrad
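The decoding steps above can be sketched as follows, using the triples and the initial buffer from the example:

```python
def lz77_decode(triples, decoded=""):
    """Decodes LZ77 triples (offset, length, char), appending to `decoded`."""
    out = list(decoded)
    for offset, length, char in triples:
        start = len(out) - offset
        for i in range(length):          # copy one symbol at a time so a
            out.append(out[start + i])   # copy may overlap its own output
        out.append(char)
    return "".join(out)

# the three triples from the example, starting from the decoded "cabraca"
print(lz77_decode([(0, 0, "d"), (7, 4, "r"), (3, 5, "d")], "cabraca"))
```

Copying symbol by symbol is what lets the third triple copy 5 symbols with an offset of only 3: the copy runs into text it has just produced.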


Disadvantages:

1. LZ77 implicitly assumes that recurring patterns occur close together in the data.

2. Large buffers are required, and efficient search algorithms are needed.

3. The method becomes inefficient if the period of repetition is larger than the size of the search buffer.


LZ-78 METHOD

The LZ78 method avoids this problem by dropping the reliance on the search buffer and keeping an explicit dictionary. This dictionary has to be built at both the encoder and the decoder in an identical manner. The inputs are coded as a double <i, c>, where

i is the index of the dictionary entry that was the longest match to the input, and

c is the code for the character in the input following the matched portion (the first unmatched character).

While the LZ78 algorithm has the ability to capture patterns and hold them indefinitely, it also has a rather serious drawback: the dictionary keeps growing without bound, so in practice its size has to be limited, which affects compression.

Index 0 is used for the unmatched portion. Each encoded double becomes a new entry in the dictionary; thus, each new dictionary entry is one new symbol concatenated with an existing dictionary entry.
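The parsing rule above can be sketched as follows; the input string "wabbawabba" is an assumed example:

```python
def lz78_encode(text):
    """Encodes `text` as pairs (i, c): i is the index of the longest
    dictionary entry matching the input (0 = no match), c is the next
    character. Each pair becomes a new dictionary entry."""
    dictionary = {}              # pattern -> index (1-based)
    pairs, pattern = [], ""
    for ch in text:
        if pattern + ch in dictionary:
            pattern += ch        # keep extending the match
        else:
            pairs.append((dictionary.get(pattern, 0), ch))
            dictionary[pattern + ch] = len(dictionary) + 1
            pattern = ""
    if pattern:                  # flush a trailing match with no new char
        pairs.append((dictionary[pattern], ""))
    return pairs

print(lz78_encode("wabbawabba"))
```

Note how every pair is "an existing entry plus one new symbol", exactly as described above.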


LZW (Lempel-Ziv-Welch)

The LZW method eliminates the need to encode the second element of the pair <i, c>; that is, the encoder sends only the index into the dictionary. For this to work, the dictionary has to be primed with all the letters of the source alphabet.

The input to the encoder is accumulated in a pattern p as long as p is contained in the dictionary. If the addition of another letter a results in a pattern p·a that is not in the dictionary, then the index of p is transmitted to the receiver, the pattern p·a is added to the dictionary, and we start another pattern with the letter a.

DECODING

The encoder output sequence becomes the decoder input sequence. The decoder starts with the same initial dictionary as the encoder.

With reference to the current example, the index value 5 corresponds to the letter w, so we decode w as the first element of our sequence. At the same time, in order to mimic the dictionary construction procedure of the encoder, we begin construction of the next element of the dictionary. We start with the letter w. This pattern exists in the dictionary, so we do not add it to the dictionary and continue with the decoding process. The next decoder input is 2, which is the index corresponding to the letter a. We decode an a and concatenate it with our current pattern to form the pattern wa. As this does not exist in the dictionary, we add it as the next entry of the dictionary and start a new pattern beginning with the letter a.
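The encode and decode procedures can be sketched together. The priming of the dictionary as 1 = ' ', 2 = 'a', 3 = 'b', 4 = 'o', 5 = 'w' and the message "wabba wabba" are assumptions chosen to be consistent with the indices 5 = w and 2 = a used above:

```python
def lzw_encode(text, alphabet):
    """Accumulate pattern p while p is in the dictionary; when p+a is not,
    send index(p), add p+a to the dictionary, and restart from a."""
    d = {ch: i + 1 for i, ch in enumerate(alphabet)}
    out, p = [], ""
    for a in text:
        if p + a in d:
            p += a
        else:
            out.append(d[p])
            d[p + a] = len(d) + 1
            p = a
    out.append(d[p])
    return out

def lzw_decode(indices, alphabet):
    """Rebuilds the encoder's dictionary on the fly while decoding."""
    d = {i + 1: ch for i, ch in enumerate(alphabet)}
    prev = d[indices[0]]
    out = [prev]
    for i in indices[1:]:
        entry = d.get(i, prev + prev[0])  # index may name the entry in progress
        out.append(entry)
        d[len(d) + 1] = prev + entry[0]   # same growth rule as the encoder
        prev = entry
    return "".join(out)

alphabet = [" ", "a", "b", "o", "w"]      # assumed priming: 1..5
codes = lzw_encode("wabba wabba", alphabet)
print(codes)                              # starts with 5 (w), 2 (a), ...
assert lzw_decode(codes, alphabet) == "wabba wabba"
```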


AUDIO COMPRESSION

Waveform codecs: these try to produce audio samples that are as close to the original samples as possible. Examples: PCM, DPCM, ADPCM, subband coding, and adaptive transform coding. Subband coding splits the audio into different frequency bands and codes each subband separately with ADPCM or a similar quantization method. Adaptive transform coding transforms the signal into the frequency domain, e.g., with the DCT.

Source codecs: these use a mathematical model of the source of the data. The model depends on certain parameters, and the encoder uses the input data to compute those parameters. Once the parameters are obtained, they are written on the compressed stream. The decoder reads the parameters and employs the mathematical model to reconstruct the original data.

There are different MPEG standards for audio compression, MPEG-1 and MPEG-2, with three layers known as Layer I, II, and III. Layer III is referred to as MP3.

The standard defines the bit stream that should be presented to the decoder, leaving the design of the encoder to individual vendors. The basic strategy is the same in all three layers.

The input, consisting of 16-bit PCM words, is first transformed into the frequency domain. The frequency coefficients are quantized, coded, and packed into the MPEG bit stream. Each layer builds on the previous layer and also provides higher compression. The three layers are backward compatible, i.e., a decoder for Layer III should be able to decode Layers I and II, and a decoder for Layer II should be able to decode Layer I encoded audio.

MPEG-1 provides about 4:1 compression. In Layer I coding, the time-frequency mapping is accomplished using a bank of 32 subband filters. The output of each subband filter is critically sampled, i.e., down-sampled by 32. The samples are divided into groups of 12 samples each.

Twelve samples from each of the 32 subband filters make up one frame of the Layer I coder. Once the frequency components are obtained, the algorithm determines a scalefactor for each group. The subband output is divided by the scalefactor before being linearly quantized. There are a total of 63 scalefactors specified in the MPEG standard, and the specification of each scalefactor requires 6 bits.

The outputs of the quantization and bit-allocation steps are combined into a frame. The header is made up of 32 bits. If the protection information is known, all 16 bits can be used for frame synchronization; 2 bits indicate the mode.


JPEG Standard

JPEG (Joint Photographic Experts Group) is one of the most widely used standards for lossy image compression. It is a joint collaboration of ISO and ITU.

1. The DCT is used in the JPEG scheme. In this scheme, the image to be compressed is level shifted by 2^(P−1).

2. The level shifting is performed by subtracting the value 2^(P−1) from each pixel value of the given image, where P is the number of bits required to represent one pixel.

3. Therefore, for 8-bit images, 2^(8−1) = 128, and the shifted pixel values vary between −128 and 127. The whole image is then divided into 8x8 blocks of pixels.

4. These 8x8 blocks are then transformed using the DCT. If the dimension of the image is not a multiple of 8, the encoder replicates the last column or row until the final size becomes a multiple of 8.

5. These additional rows and columns are removed during the decoding process. The DCT of each level-shifted block gives the DCT coefficients.

6. The algorithm then uses a uniform midtread quantizer to quantize the DCT coefficients. The quantized value of each DCT coefficient is represented by a label.

7. The reconstructed value is obtained from this label by multiplying it by the corresponding entry in the quantization table that is provided.

8. The step size generally increases as we move from the DC coefficient to the higher-order coefficients. More quantization error is therefore introduced at the higher-frequency coefficients than at the lower-frequency coefficients, since the quantization error is an increasing function of the step size.

9. The labels for the DC and AC coefficients are coded differently by JPEG. This results in higher compression but a more complex operation.
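Steps 1-7 can be sketched as follows. The flat 8x8 test block and the uniform step size of 16 are assumptions for illustration only; the real JPEG scheme uses a full 8x8 quantization table with per-coefficient step sizes:

```python
import math

def dct2(block):
    """2-D DCT-II of an 8x8 level-shifted block, as used in JPEG."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y] *
                    math.cos((2 * x + 1) * u * math.pi / (2 * n)) *
                    math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

P = 8                                        # bits per pixel
block = [[128] * 8 for _ in range(8)]        # assumed flat 8x8 block
shifted = [[p - 2 ** (P - 1) for p in row] for row in block]   # level shift
coeffs = dct2(shifted)                       # DCT coefficients
step = 16                                    # assumed uniform quantizer step
labels = [[round(c / step) for c in row] for row in coeffs]    # midtread labels
recon = [[l * step for l in row] for row in labels]            # dequantize
print(labels[0][0])   # a flat 128-block shifts to all zeros, so DC label is 0
```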
