
HUFFMAN CODING

Sadia Yunas Butt (MS 140400147)

Abstract: This paper describes the use of the Huffman coding method to compress data. In 1952 David A. Huffman, then a student at MIT, discovered this algorithm while working on a term paper assigned by his professor Robert M. Fano. His idea was that a frequency-sorted binary tree can be used to find the most efficient binary code [1]. Huffman avoided the major flaw of Shannon-Fano coding by building the tree from the bottom up instead of from the top down. There are many compression algorithms, but we focus on lossless compression, and in this regard the Huffman algorithm is a simple and efficient algorithm that appears in a large number of applications. Huffman coding is used for image compression, for example in PNG and JPG, and in MPEG: whenever you watch a movie or video on a computer, chances are that it is stored in MPEG format. For text compression there are static and dynamic Huffman coding techniques, and most of the time images and movies are also compressed with lossless techniques. It is a general technique that we use on our computers every day: PKZIP (of which WinZip is one implementation) and GZIP are general-purpose tools that compress text files using Huffman coding. Another application is audio: the MP3 and AAC formats both rely on it, so whenever we use an iPod we use MP3, which is encoded and compressed with Huffman coding. There is lossy compression, but there is lossless compression as well; AAC is widely used on YouTube, and right now the Huffman algorithm is at work in iPhones and iPads, which use AAC. Huffman showed that compression can be carried out efficiently using a frequency-sorted binary tree built in a bottom-up fashion.

INTRODUCTION

Huffman coding is a great way to compress a document, shrinking it down by bits and chunks. To give an example of how it works, write down the letters A, B, C, D and E and suppose the document only contains these five letters, and we want to compress the data. Huffman coding can be a great way of doing it, and the reason why is this: if we store the data in regular text format, the way systems usually store data, each character is represented by a full byte, so you are looking at 8 bits for every letter. That is wasteful, and a better way of doing it is exactly what Huffman coding does. It only works on specific data, so if you know the range of symbols you need, this is something that will work for you. With only five distinct letters you could already get by with a fixed-length code of 4, or even just 3, bits per letter, so why will Huffman coding be better than ASCII or even than that?

If you are wondering why this is an important strategy in computer science, it is because you are dealing with dynamic amounts of data that need to be compressed, and Huffman coding is one of the best ways of doing that. The tree generated here is a little different from the other binary trees we have seen, because its internal nodes do not hold characters of their own; each internal node only holds the sum of the nodes below it. We start with the two lowest values: we take E and say it is equal to 8 (e:8), draw a box around it, and we know that D is the same, so together they are equal to 16. Continuing with the low values first, we take c:10 and b:12, which together make 22; now we connect 22 and 16, which makes 38, and then we connect that to a:24. After summing everything up, the root node, which does not contain any character itself, holds the sum of all the data we are looking at.

We then assign the value 0 to the left-hand side and 1 to the right-hand side at every branch. Reading from the root we see the value 0 for A, and if we follow the binary tree from the root to B we read the binary value 100; in the same way all the other nodes get their values, and we store the data in this compressed form. Instead of 8 bits we now give A a single bit and the other letters three bits each; all we are doing is shrinking down the number of bits that are required. If we look at the original representation it is 5 times 8, so we need a total of 40 bits, compared with the new system: with Huffman coding we have 1 bit plus 4 code words of three bits, so we are looking at 13 bits.

With Huffman coding we save well over half of the amount of space, and that is one of the big reasons Huffman coding is so powerful: simply by shrinking each letter down to the bits that are actually necessary, it does even more for long sequences of data. A short sketch of this comparison is given below.
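As a rough illustration, here is a minimal sketch of the space comparison for a message made of these five letters; the letter counts (a:24, b:12, c:10, d:8, e:8) and the resulting code lengths are the ones read off the example above, so treat them as illustrative assumptions rather than part of a formal result.

# Bit counts for the five-letter example above. The counts and code lengths
# are assumptions read off the example tree: a gets a 1-bit code, the rest 3 bits.
counts = {'a': 24, 'b': 12, 'c': 10, 'd': 8, 'e': 8}
code_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 3}

total = sum(counts.values())
print(8 * total)                                     # ASCII, 8 bits per character: 496 bits
print(3 * total)                                     # fixed 3-bit code: 186 bits
print(sum(counts[s] * code_len[s] for s in counts))  # Huffman code: 138 bits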
In this paper we first discuss what Huffman coding is and how the algorithm works. In section 3 we look at adaptive Huffman coding, Huffman encoding and decoding, and image and text compression; finally we discuss the conclusion and the future scope of this work.

WHAT IS HUFFMAN CODING

There are many ways to store information. Computer scientists are always looking for new and better ways to store strings of data in as little space as possible. Huffman coding is a method of storing strings of data as binary code in an efficient manner. Huffman coding uses "lossless data compression", which means no information is lost from the data you are coding [3]. Huffman coding also uses "variable length coding" [6], which means that the symbols in the data are encoded, or converted to binary symbols, based on how often each symbol is used. For example, if the character 'a' is used in your data a lot, the binary symbol representing it is short; if it is used rarely, the symbol representing it is longer. This way all the data will take less physical space when encoded. There is a way to decide what binary code to give each character, using trees.

char   frequency
A      1
B      6
C      7
D      2
E      8

In this figure we have 5 different letters, each with a frequency of how often it is used; a and d in this example are the least frequent. The frequency represents how often a character appears in a string of data. Now imagine these as 5 separate trees and combine the two smallest trees at each step; we can combine them slowly, bit by bit. First we combine a and d, which are the smallest; after combining them we have a new tree with the greater frequency 3. Now we go to the next least number, which is b:6, and we get a new tree with the greater frequency 9. Next we take the two remaining least characters, c and e, and get the frequency 15. Finally we combine these two trees and get one large tree containing all the characters. We can now assign a binary code to each symbol by going down the tree: each left branch receives a '0' and each right branch receives a '1'. Reading down from the top we get the binary strings a=000, b=01, c=10, d=001, e=11. These are what a, b, c, d and e will each be converted to; the construction is sketched in code below.
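A minimal sketch of this construction, using Python's heapq module; with these particular frequencies, and with the smaller subtree always placed on the '0' side, it reproduces the code words listed above.

import heapq

def huffman_codes(freq):
    # Build a Huffman code (symbol -> bit string) from a frequency map.
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {s: ''}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)    # smallest weight: prefix '0'
        w2, _, high = heapq.heappop(heap)   # next smallest: prefix '1'
        merged = {s: '0' + c for s, c in low.items()}
        merged.update({s: '1' + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, nxt, merged))
        nxt += 1
    return heap[0][2]

freq = {'a': 1, 'b': 6, 'c': 7, 'd': 2, 'e': 8}   # frequencies from the figure
print(huffman_codes(freq))   # {'a': '000', 'd': '001', 'b': '01', 'c': '10', 'e': '11'}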
Since the point of Huffman coding is to encode data so that it takes up less space, wouldn't it make sense to give some character a binary code only 1 digit long ('0' or '1')? Or shouldn't more characters be given codes 2 digits long instead of 3? Consider how you would read the code: it is important for each representation to be unique, so that you can easily tell which character each part of the code is supposed to represent. If '1' or '0' on its own represented a character, any other representation containing 1 or 0 could be read as a different character. Now let us encode a, b, c, d, e using the Huffman code: we take each character and replace it with its binary code, so

abcbe = 000 01 10 01 11 = 00001100111
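A small sketch of this encoding step, using the code table derived above:

code = {'a': '000', 'b': '01', 'c': '10', 'd': '001', 'e': '11'}

def encode(text, code):
    # Concatenate the code word of every character in the text.
    return ''.join(code[ch] for ch in text)

print(encode('abcbe', code))   # -> 00001100111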

Decode 1011001000011101: we compare the binary code bit by bit with the representations above until only one possible result fits.

1011001000011101 = c 1001000011101 (only c starts with 1, then 0), so it must start with 'c';
after that it starts with 'e':
= c e 001000011101 (only e starts with 1, then 1)
= c e d 000011101 (only d starts with 0, then 0, then 1)
= c e d a 011101 (only a starts with 0, then 0, then 0)
= c e d a b 1101 (only b starts with 0, then 1)
= c e d a b e 01 (only e starts with 1, then 1)
= c e d a b e b, so the decoded string is cedabeb.
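The same decoding loop as a short sketch; because the code has the prefix property, reading bits until they match a code word always gives a unique answer.

code = {'a': '000', 'b': '01', 'c': '10', 'd': '001', 'e': '11'}
inverse = {bits: ch for ch, bits in code.items()}

def decode(bits, inverse):
    # Collect bits until they form exactly one code word, emit it, repeat.
    out, current = [], ''
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ''
    return ''.join(out)

print(decode('1011001000011101', inverse))   # -> cedabeb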
To decode, then, you start from the front of the encoded bit string and work forward. Suppose we want to know the average code word length achieved by Huffman coding; there is a special formula for that:

average code word length = (1/f(T)) * sum over i of d(i) * f(i), for i = 1 to n

where n = number of characters, f(T) = total frequency, f(i) = frequency of character i, and d(i) = length of the code word of character i.

If we apply it to our example [2]:

average length = (1/(1+6+7+2+8)) * (3*1 + 2*6 + 2*7 + 3*2 + 2*8)
               = (1/24) * (3 + 12 + 14 + 6 + 16)
               = (1/24) * 51
               = 2.125 digits long

(the average letter in the code) [8]. So on average each letter of the message is encoded in 2.125 binary digits.
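A two-line check of this calculation:

freq = {'a': 1, 'b': 6, 'c': 7, 'd': 2, 'e': 8}
code = {'a': '000', 'b': '01', 'c': '10', 'd': '001', 'e': '11'}

f_T = sum(freq.values())                               # total frequency f(T) = 24
weighted = sum(freq[s] * len(code[s]) for s in freq)   # sum of d(i)*f(i) = 51
print(weighted / f_T)                                  # -> 2.125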

Adaptive Huffman coding

In adaptive Huffman coding we should remember two rules: the nodes should be kept in increasing order of frequency, and the frequency in a parent node, being the sum of its children, must be greater than or equal to the frequencies of the nodes beside it. Taking these two facts into account, the Huffman tree is grown while the data arrives, using the adaptive Huffman procedure [3],[4]. After each update of a parent node we discard its previous frequency and write the new one, obtained by adding to the previous value.

For the string (MISSISSIPPI) we proceed as follows. First of all we write an empty node; we always insert an empty node first. After doing this we insert m: we put two empty nodes here and write the frequency of m, that is m:1, into one of them. We should always update the parent node, so it becomes 1+0 = 1. Now we insert i, so again we put two empty nodes, insert i with the frequency i:1, and update its parent node, which was empty, so after this it becomes 1. Now we insert s: we again make two empty nodes, insert s into the right node with the frequency s:1, and update its parent node; we also update the parent node above i, which now becomes 1+1 = 2.

Now we have to add the next s. There is already an s, so we just update its frequency, but here we see a problem with updating: if we simply add 1 more to the previous s frequency, the left-hand-side nodes become greater than the right-hand-side ones, so the tree is no longer in proper order and we have to re-order it. We go back and interchange the nodes [5]: we interchange this 1:s with 1:m, so for the string (MISSISSIPPI) the part MISS ends up on the right-hand side and PPI will go to the left-hand side; after the interchange we get a proper tree again. Then we continue the operation: s is already here, so we update it and it becomes 2, and the tree looks healthy again.

Next we add a new frequency for i; there is already an i, so we just update it, it becomes 2, and all the parent nodes on the way up are increased by 1 as well. Now we take the next s from the string and update it, so it becomes 3, and we interchange these two nodes so that the s frequency goes to the right side; this is the updated tree. Next comes i again, so we update it and it becomes 3. Then we take p from the string; there is no p yet, so we put this new frequency into an empty node and update the parent node above it.

Now we insert the next p, and since we already have a p we update its frequency, so it becomes 2 and its parent node becomes 1+2 = 3, and the parent nodes above it change their frequencies as well. But here we see that 5 > 4, so we interchange those two nodes with each other, because a node should be either less than or equal to the node on its right-hand side; due to this interchange the whole tree becomes a healthy tree again. Now we add the last character i: we already have i, so we update its frequency, it becomes 4, and then we update its parent nodes up the tree until the root node holds the total frequency of the whole string, which for MISSISSIPPI is 11. If we check the integrity of the tree, it is in increasing order, so it is a healthy tree: all the parent nodes are greater than their child nodes. A simplified sketch of this adaptive process is given below.
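The full adaptive algorithm performs the node interchanges described above in place as each symbol arrives. As a rough illustration of the idea only, not the swap-based procedure itself, the sketch below recounts the symbols seen so far and rebuilds the code after every character of MISSISSIPPI, so the code words can be watched adapting as the frequencies grow.

import heapq
from collections import Counter

def huffman_codes(freq):
    # Static Huffman code builder (same as the earlier sketch).
    heap = [(w, i, {s: ''}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    if len(heap) == 1:
        return {s: '0' for s in heap[0][2]}
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in low.items()}
        merged.update({s: '1' + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, nxt, merged))
        nxt += 1
    return heap[0][2]

# Rebuild-after-every-symbol simulation of the adaptive idea.
counts = Counter()
for ch in "MISSISSIPPI":
    counts[ch] += 1
    print(ch, dict(counts), huffman_codes(counts))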

Huffman encode decode [2]

Take the string "sally sell sea shells". We assign a frequency to each of its characters, then repeatedly add the two smallest frequencies together to make parent nodes, and so on until we reach the root node; then we label the branches with 0 and 1 and read off the binary values for this string:

s   10
a   1111
l   01
y   1110
_   001
e   110
h   0001
!   0000

A short sketch of encoding the string with this table follows.
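A minimal encoding sketch with this table; treating '_' as the space character (and leaving the unused '!' code in place) is an assumption about the table's notation.

code = {'s': '10', 'a': '1111', 'l': '01', 'y': '1110',
        ' ': '001', 'e': '110', 'h': '0001', '!': '0000'}

message = "sally sell sea shells"
bits = ''.join(code[ch] for ch in message)
print(len(bits), "bits, versus", 8 * len(message), "bits in 8-bit ASCII")   # 56 versus 168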

Text compression With Huffman Coding

The Huffman coding technique is used for compression. The idea is that more common letters get fewer bits, and similarly less common letters require more bits to represent them [7]. In this case we use the string MISSISSIPPI RIVER, which is 17 characters. If we store it in the common ASCII representation we require 8 bits for each character, for a total of 136 bits. Now we write down how many times each letter is present in the string and sort the letters by frequency: I:5, S:4, P:2, R:2, M:1, V:1, E:1 and the space:1. Then we make a tree, starting with the smallest nodes.

This gives the complete construction of the tree. To find the binary code of a letter, just start at the top of the tree and follow the branches: the encoding of I is 00, for S it is 01, for P the encoding is 100, and we do the same for every other letter. We see that the letters with the greatest frequency have the shortest representation and the lower frequencies have the longest representation, which is exactly what we wanted [9].

Now we write down the bits for the message. This is the Huffman-coded binary message, and if we count it, the message is equal to 46 bits, which is much less than the original 136 bits. If we divide them we get about 33% of the original size, which saves about 67% of the original message. In this way we can compress a text message through Huffman coding very easily; a sketch of the calculation is given below.
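A sketch of this count, assuming 8-bit ASCII for the original; any optimal Huffman code for these letter frequencies gives the same 46-bit total, even though the individual code words may differ from the ones quoted above depending on how ties are broken.

import heapq
from collections import Counter

def huffman_code_lengths(freq):
    # Return symbol -> code word length for an optimal Huffman code.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, nxt, merged))
        nxt += 1
    return heap[0][2]

text = "MISSISSIPPI RIVER"
freq = Counter(text)                                  # I:5 S:4 P:2 R:2 M:1 V:1 E:1 space:1
lengths = huffman_code_lengths(freq)
compressed = sum(freq[s] * lengths[s] for s in freq)  # 46 bits
print(compressed, "bits, versus", 8 * len(text), "bits in ASCII")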
CONCLUSION

In this paper we have shown how Huffman coding is a simple and efficient algorithm in the field of compression. Although more techniques are available in the field of compression, such as arithmetic coding and Lempel-Ziv, in our daily life we mostly see the Huffman algorithm at work, in iPhones, iPods and so on, because it is not complex and it is optimal among the family of symbol codes. Huffman coding is a greedy algorithm, and for the compression of data it is considered good that it uses longer code words for the less frequent symbols and shorter ones for the more frequent symbols. Arithmetic coding, however, gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

REFERENCES

[1] Gary Stix. Profile: David A. Huffman. Scientific American, 265(3):54, 58, September 1991.

[2] Roberto Togneri, Christopher J. S. deSilva. Fundamentals of Information Theory and Coding Design.

[3] Manjeet Gupta (Assistant Professor, Department of CSE, JMIT Radaur), Brijesh Kumar (Associate Professor, Department of IT, Lingyaya's University). Web Page Compression using Huffman Coding Technique.

[4] Ida Mengyi Pu. Fundamental Data Compression.

[5] A fast adaptive Huffman coding algorithm. IEEE Transactions on Communications, Volume 41, Issue 4.

[6] Rezaul Alam Chowdhury (Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh), Irwin King (Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, People's Republic of China). An efficient decoding technique for Huffman codes.

[7] Asadollah Shahbahrami, Ramin Bahrampour, Mobin Sabbaghi Rostami, Mostafa Ayoubi Mobarhan (Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran). Evaluation of Huffman and Arithmetic Algorithms for Multimedia Compression Standards.

[8] Shivani Pathak, Shradha Singh, Smita Singh, Mamta Jain, Anand Sharma. Data Compression Scheme of Dynamic Huffman Code for Different Languages.

[9] Joseph Lee. Huffman Data Compression. May 23, 2007.
