
Discrete memoryless source

Thursday, June 23, 2011 4:31 PM

Suppose we have a binary discrete memoryless source (DMS).


It emits symbols 0 and 1 at regular intervals; this interval is called the symbol period Ts. The symbol rate is Rs = 1/Ts.

In one second Rs symbols are emitted, or equivalently one symbol is emitted every Ts seconds. The symbols 0 and 1 are emitted with equal probability.
[Histogram: counts of the 100 emitted symbols, about 50 occurrences each of 0 and 1]

With L = 100 samples we have about 50 samples of 0 and 50 samples of 1 being emitted. The symbols emitted are discrete in nature, so we need a statistical characterization of the information content of the symbols emitted by the DMS. Entropy is a statistical measure which defines the average information content emitted by a discrete memoryless source. The entropy of the above source is 1 bit.
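A minimal sketch of this calculation (the helper below is illustrative, not from the notes): entropy is computed directly from the symbol probabilities, and for the equiprobable binary source it comes out to 1 bit.

```python
import math

def entropy(probs):
    """Average information content H(X) = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary DMS emitting 0 and 1 with equal probability:
print(entropy([0.5, 0.5]))  # 1.0 bit
```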

If 0 and 1 are emitted with probabilities 0.25 and 0.75, then on average 1 will be transmitted more often than 0. The uncertainty in this output is lower, since the a priori probabilities tell us that 1 is expected to be the likely outcome.

[Histogram: counts of emitted symbols for p(0) = 0.25, p(1) = 0.75, roughly one quarter 0's and three quarters 1's]

Out of 10000 samples, ideally 2500 are 0's and 7500 are 1's. The relative frequencies approach these ideal values only when the number of samples is large, ideally infinite.

As the number of samples becomes large, the estimated entropy approaches the ideal entropy. On average we require about 0.8 bits, which is less than 1 bit, since the probability of occurrence of 1 is greater than that of 0. The information, and hence the uncertainty, in the message is less, and this is reflected in the reduced value of the entropy.

Below it is seen that as the number of samples becomes very large, the relative frequency distribution approaches the a priori probabilities, and thus the calculated entropy approaches the ideal entropy.

[Screen clipping: estimated entropy approaching the ideal value as the number of samples grows]
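A rough sketch of this experiment (the simulation below is an assumption, not taken from the notes): draw N symbols with p(1) = 0.75, estimate the probabilities by relative frequency, and watch the estimated entropy approach the ideal value of about 0.81 bits as N grows.

```python
import math
import random

def estimate_entropy(n_samples, p1=0.75):
    """Estimate H(X) from the relative frequencies of a simulated binary DMS."""
    ones = sum(random.random() < p1 for _ in range(n_samples))
    freqs = [1 - ones / n_samples, ones / n_samples]
    return -sum(f * math.log2(f) for f in freqs if f > 0)

for n in (100, 1000, 100000):
    print(n, round(estimate_entropy(n), 4))  # tends towards the ideal 0.8113 bits
```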

For a small number of samples the estimated entropy is not stationary: it may give different values even if the same experiment is repeated. In all the experiments the maximum entropy is 1 bit, corresponding to maximum uncertainty, Hmax = log2(M), where M is the number of distinct symbols emitted by the discrete memoryless source. The minimum entropy is found when one of the two symbols has probability 1. This represents the condition of no uncertainty: every symbol period the same symbol is transmitted, so there is no ambiguity or uncertainty at the output of the DMS as to which symbol was transmitted.
Entropy provides the statistical characterization of information. It represents the average information content of the signal, and it is bounded as 0 <= H <= log2(M).

Here 0 represents no uncertainty and log2(M) represents maximum uncertainty. The entropy of the DMS is log2(M) when all the symbols are equiprobable.
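The two extreme cases of this bound can be checked with the same illustrative helper as above:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))   # 0 bits: one symbol is certain, no uncertainty (may print as -0.0)
print(entropy([0.25] * 4))   # 2.0 bits = log2(4): all M = 4 symbols equiprobable
```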


Extension of discrete memoryless source


Thursday, June 23, 2011 6:19 PM

Consider the output of a discrete memoryless source with output symbols 0 and 1. The entropy of the DMS when both symbols are equiprobable is 1 bit. Let p1 be the probability of occurrence of 0 and p2 the probability of occurrence of 1. Consider the stream of symbols 000110111101010101001011010101010. We process the information in n-tuples. Let n = 2: we group the incoming symbols in groups of two and consider each n-tuple as a new symbol.

Symbol   Probability
00       p1*p1
01       p1*p2
10       p2*p1
11       p2*p2

Thus if we have M distinct symbols and group them in n-tuples we get M^n different combinations of the symbols; for n = 2 binary symbols this gives 4 new symbols. The new set of symbols can be considered as the output of a DMS. This is called the extended source, i.e. the 2nd extension of the source. Let us calculate the entropy of the extended source.
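A small sketch of this calculation (the 0.25/0.75 probabilities are assumed only for illustration): build the four 2-tuples, assign them the product probabilities from the table above, and check that the entropy doubles.

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = {'0': 0.25, '1': 0.75}                                   # p1 = P(0), p2 = P(1)
ext = {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}  # 2nd extension: 00, 01, 10, 11

print(ext)                    # {'00': 0.0625, '01': 0.1875, '10': 0.1875, '11': 0.5625}
print(entropy(p.values()))    # ~0.8113 bits
print(entropy(ext.values()))  # ~1.6226 bits = 2 * H(X)
```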

M = 4 for the extended source. The histogram below is for 100 samples.

The calculated entropy is less than the actual entropy. Also we can see that the entropy of the extended source is H(X^2) = 2*H(X).

[Histogram: counts of the four extended-source symbols]


As the number of samples increases, the estimated entropy of the extended source approaches its ideal value.


Average code word length


Thursday, June 23, 2011 8:10 PM

As seen, the entropy represents the average information content per symbol.

It is a measure of the average information content of the signal. If the entropy of the source is 1 bit, the information gained by observing the output of the DMS over one symbol period is 1 bit. It also tells us that if the input symbols are equiprobable, the entropy is log2(M).
If we have M equiprobable messages, the number of bits required to uniquely decode the M messages is log2(M). Hence the average code word length is log2(M). If, however, some symbols are more probable than others and we still represent every symbol using a fixed code length of log2(M):

Messages which are more probable are then assigned the same code word length as messages which are less probable. A more efficient coding mechanism is to use a variable length code, where we use fewer bits to represent symbols which are more probable and longer code words to represent symbols that are less probable. This ensures that most of the time the encoded message is transferred using as few bits as possible, and only occasionally are long code words transferred. The average code word length is also a statistical measure: it represents the average code word length required to represent a symbol over the entire source alphabet. Coding efficiency is the ratio of the minimum code word length to the average code word length.
If the minimum code word length is equal to the average code word length, η = 1.

Maximum efficiency has then been achieved. It indicates that when a large number of symbols are transmitted, most of the time the code word length used is equal to the average code word length.
We may sometimes deliberately use code word lengths that are longer than required and introduce redundancy into the signal, but that is taken care of by a different module after the source encoding step. Hence here we consider the optimization of representing the desired symbols in the least number of bits possible.

We consider the case of maximum uncertainty: when all the symbols are equiprobable we use fixed length codes.
The average code word length of fixed length codes is log2(M). Consider an encoding scheme as follows.


Prefix coding
Thursday, June 23, 2011 8:48 PM

Information theory is about abstraction: instead of sending the actual message, we send an encoded form of the message. Since the source symbols are chosen from a finite set, we can encode the source symbols as binary digits. Some symbols may be more probable than others, so an efficient way of coding is to represent the symbols with high probability with short code words and those with lower probability with long code words.

Prefix coding is an instantaneous coding method. Usually, as in the case of fixed block codes, we need to wait until all the bits representing a symbol are read before we make an estimate of which symbol was transmitted. If, however, we design an encoding scheme such that no code word is the prefix of any other code word, we can perform instantaneous decoding. Prefix codes may be generated in many ways; one such scheme is:

Symbol   Code word
s0       0
s1       10
s2       110
s3       111

The decoding is instantaneous in the sense that if 10, 0, 111, 110, 0, 0 is received, it decodes to s1 s0 s3 s2 s0 s0.
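A minimal sketch of such an instantaneous decoder (the function and variable names are illustrative): because no code word is a prefix of another, a symbol can be emitted the moment the buffered bits match a code word.

```python
def decode_prefix(bits, codebook):
    """Decode a bit string symbol by symbol using a prefix-free codebook."""
    inverse = {code: sym for sym, code in codebook.items()}
    decoded, buffer = [], ''
    for b in bits:
        buffer += b
        if buffer in inverse:                # match found: decode immediately,
            decoded.append(inverse[buffer])  # no need to look at later bits
            buffer = ''
    return decoded

codebook = {'s0': '0', 's1': '10', 's2': '110', 's3': '111'}
print(decode_prefix('10011111000', codebook))  # ['s1', 's0', 's3', 's2', 's0', 's0']
```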



As soon as 10 is received we can make the decision that s1 was transmitted. As soon as 0 is received we can make the decision that s0 was transmitted. Once we receive 11 there are two possible code words, 110 and 111, and we wait for one more bit. Suppose a prefix code has been constructed for a discrete memoryless source with symbols sk and probabilities pk. The code word lengths will always satisfy an inequality known as the Kraft-McMillan inequality.

Let the lengths of the code words be inversely related to the probabilities: lk = ceil(log2(1/pk)), so lk >= log2(1/pk). Then 2^lk >= 1/pk, i.e. 2^-lk <= pk. Summing over all symbols: sum_k 2^-lk <= sum_k pk = 1. If a code word corresponds to a symbol of high probability, it will have a smaller code word length.
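A quick sketch of this check (the probabilities in the second part are assumed for illustration): verify the Kraft-McMillan sum for the code word lengths of the prefix code above, and for lengths chosen as ceil(log2(1/pk)).

```python
import math

def kraft_sum(lengths):
    """Kraft-McMillan sum: a prefix code with these lengths exists iff the sum is <= 1."""
    return sum(2.0 ** -l for l in lengths)

# Code word lengths of the prefix code s0..s3 above:
print(kraft_sum([1, 2, 3, 3]))       # 0.5 + 0.25 + 0.125 + 0.125 = 1.0 <= 1

# Lengths l_k = ceil(log2(1/p_k)) for some assumed probabilities also satisfy it:
probs = [0.45, 0.30, 0.15, 0.10]
lengths = [math.ceil(math.log2(1 / p)) for p in probs]
print(lengths, kraft_sum(lengths))   # [2, 2, 3, 4] -> 0.6875 <= 1
```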

For Shannon-Fano codes the code word length is l = log2(1/p), i.e. p = 2^-l: if p is the probability with which a symbol is emitted, the length of the code word assigned to it is -log2(p).

Average code word length: L = sum_k pk*lk

Entropy of the source: H(X) <= L, so the minimum possible average code word length is Lmin = H(X).

Thus the coding efficiency is η = Lmin / L. If Lmin = L we get optimum coding efficiency (η = 1).
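A sketch of these quantities (the probabilities are assumed dyadic so that the prefix code above is a perfect match):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average code word length L = sum_k p_k * l_k over the source alphabet."""
    return sum(p * l for p, l in zip(probs, lengths))

probs   = [0.5, 0.25, 0.125, 0.125]   # assumed probabilities of s0..s3
lengths = [1, 2, 3, 3]                # code word lengths of the prefix code above

L = average_length(probs, lengths)
H = entropy(probs)                    # Lmin is bounded by the entropy
print(L, H, H / L)                    # 1.75 1.75 1.0 -> coding efficiency η = 1
```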


If we have an arbitrary discrete memoryless source, then for optimum efficiency we need to match the source code to the source so that the average code word length approaches the entropy. For a discrete memoryless source, H(X) represents the entropy of the DMS; it represents the average information content of the signal in bits. Suppose we represent the symbols using code words of length l.

We consider the second extension of the source. We group the incoming symbols into n-tuples 00, 01, 10, 11: instead of mapping individual symbols, we map the incoming symbols in groups of 2.

Then we encode the symbols into bits depending on the source probabilities. Consider the second extension of the source; we know the entropy of the extended source is nH(X).

Consider the 3rd extension of the source.





As we take n-tuples of symbols, represent each n-tuple as a symbol and encode it, we can see that the source entropy is H(X^n) = nH(X), and the average code word length for the extended source is bounded accordingly. The number of symbols is M = 2^n (if we take n-tuples of binary symbols). For a fixed block length, the number of bits required to encode a message is log2(M) if all the symbols are equiprobable. If different symbols have different a priori probabilities we can use prefix coding. The bound for the prefix code is given as H(X^n) <= L_n < H(X^n) + 1, and therefore H(X) <= L_n/n < H(X) + 1/n.

As n becomes very large, the upper and lower bounds converge.
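A sketch of this convergence under assumed probabilities, using code word lengths ceil(log2(1/p)) for the n-th extension (these satisfy the prefix-code bound used above):

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.25, 0.75]          # assumed source probabilities
H = entropy(p)            # ~0.8113 bits

for n in (1, 2, 4, 8):
    ext = [math.prod(t) for t in product(p, repeat=n)]        # n-th extension
    L_n = sum(q * math.ceil(math.log2(1 / q)) for q in ext)   # prefix-code lengths
    print(n, round(L_n / n, 4))   # per-original-symbol length approaches H ~ 0.8113
```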


Extended source
Friday, June 24, 2011 12:29 AM

We calculate the statistics for the 2nd extension of the source.

The a priori probabilities of the symbols of the second extension of the source are smaller, and the number of symbols is larger. In the original source only 0 and 1 could be emitted; if we consider the extended source as the output of the DMS, the symbols 0, 1, 2, 3 can be emitted.

In the original source the two symbols contribute 0.3113 and 0.5000 bits of entropy respectively, giving H(X) = 0.8113 bits. In the extension of the source there are more symbols, each with smaller probability. Thus, as N becomes large:


The number of possible code words increases; N = 64 for the 6th extension of the source. Since some symbols are more probable than others, they are represented by code words of smaller length than the others.

Thus we consider the extension of the source and use prefix coding. Between code word lengths 5 and 6 we can incorporate 32 symbols. The difference between the entropy and the length of fixed length codes increases as M increases, i.e. as we consider higher extensions of the source, since H increases more slowly than L. Entropy is the information content of the signal: symbols with low probability provide the most information and are encoded using more bits, while symbols with high probability provide the least information and are encoded using fewer bits.

We can see that the entropy is less than the average code word length that would be required if the symbols were equiprobable.
This tells us that we can design a scheme where the average code word length is reduced as N increases.

According to the source coding theorem we can make the average code word length as small as, but no smaller than, the entropy of the source.

If we divide the output of the DMS into groups and encode the groups, i.e. instead of encoding binary symbols we encode M-ary symbols, we can achieve optimum coding efficiency by making M large.

By making the order of the extended source large enough, we can make the average code word length of the extended source as close to the entropy of the source as we wish.

The price is increased decoding complexity due to the increase in the order of the prefix code.


Source coding theorem


Friday, June 24, 2011 2:26 AM

The source coding theorem states that the average code word length can be made as small as the entropy of the source, but no smaller.


Data compression
Friday, June 24, 2011 2:39 AM

A source code C for a random variable X is a mapping from X, the range of X, to D*, the set of finite length strings of symbols from a D-ary alphabet. Let c(x) denote the codeword corresponding to x and let l(x) denote the length of c(x).
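A tiny sketch of this definition (the alphabet and code words are made up for illustration): the mapping c assigns each value of X a finite binary (D = 2) string, and l(x) is simply its length.

```python
# Illustrative binary (D = 2) source code for X taking values in {'a', 'b', 'c', 'd'}:
c = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}   # code words c(x) drawn from D*
l = {x: len(code) for x, code in c.items()}         # code word lengths l(x)
print(l)                                            # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```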

