SlideShare a Scribd company logo
Huffman Coding
The idea behind Huffman coding is to find a way to compress
the storage of data using variable length codes. Our standard
model of storing data uses fixed length codes. For example,
each character in a text file is stored using 8 bits. There are
certain advantages to this system. When reading a file, we
know to ALWAYS read 8 bits at a time to read a single
character. But as you might imagine, this coding scheme is
inefficient. The reason for this is that some characters are
more frequently used than other characters. Let's say that the
character 'e' is used 10 times more frequently than the
character 'q'. It would then be advantageous for us to use a 7
bit code for e and a 9 bit code for q instead because that could
shorten our overall message length.
Huffman coding finds the optimal way to take advantage of
varying character frequencies in a particular file. On average,
using Huffman coding on standard files can shrink them
anywhere from 10% to 30% depending to the character
distribution. (The more skewed the distribution, the better
Huffman coding will do.)
The idea behind the coding is to give less frequent characters
and groups of characters longer codes. Also, the coding is
constructed in such a way that no two constructed codes are
prefixes of each other. This property about the code is crucial
with respect to easily deciphering the code.
Building a Huffman Tree
The easiest way to see how this algorithm works is to work
through an example. Let's assume that after scanning a file we
find the following character frequencies:
Character Frequency
'a' 12
'b' 2
'c' 7
'd' 13
'e' 14
'f' 85
Now, create a binary tree for each character that also stores
the frequency with which it occurs.
The algorithm is as follows: Find the two binary trees in the list
that store minimum frequencies at their nodes. Connect these
two nodes at a newly created common node that will store NO
character but will store the sum of the frequencies of all the
nodes connected below it. So our picture looks like follows:
9 12 'a' 13 'd' 14 'e' 85 'f'
/ 
2 'b' 7 'c'
Now, repeat this process until only one tree is left:
21
/ 
9 12 'a' 13 'd' 14 'e' 85 'f'
/ 
2 'b' 7 'c'
21 27
/  / 
9 12 'a' 13 'd' 14 'e' 85 'f'
/ 
2 'b' 7 'c'
48
/ 
21 27
/  / 
9 12 'a' 13 'd' 14 'e' 85 'f'
/ 
2 'b' 7 'c'
133
/ 
48 85 'f'
/ 
21 27
/  / 
9 12 'a' 13 'd' 14 'e'
/ 
2 'b' 7 'c'
Once the tree is built, each leaf node corresponds to a letter
with a code. To determine the code for a particular node, walk
a standard search path from the root to the leaf node in
question. For each step to the left, append a 0 to the code and
for each step right append a 1. Thus for the tree above we get
the following codes:
Letter Code
'a' 001
'b' 0000
'c' 0001
'd' 010
'e' 011
'f' 1
Why are we guaranteed that one code is NOT the prefix of
another?
Find a set of valid Huffman codes for a file with the given
character frequencies:
Character Frequency
'a' 15
'b' 7
'c' 5
'd' 23
'e' 17
'f' 19
Calculating Bits Saved
All we need to do for this calculation is figure out how many
bits are originally used to store the data and subtract from that
how many bits are used to store the data using the Huffman
code.
In the first example given, since we have six characters, let's
assume each is stored with a three bit code. Since there are 133
such characters, the total number of bits used is 3*133 = 399.
Now, using the Huffman coding frequencies we can calculate
the new total number of bits used:
Letter Code Frequency Total Bits
'a' 001 12 36
'b' 0000 2 8
'c' 0001 7 28
'd' 010 13 39
'e' 011 14 42
'f' 1 85 85
Total 238
Thus, we saved 399 - 238 = 161 bits, or nearly 40% storage
space. Of course there is a small detail we haven't taken into
account here. What is that?
Huffman Coding is an Optimal Prefix Code
Of all prefix codes for a file, Huffman coding produces an
optimal one. In all of our examples from class on Monday, we
found that Huffman coding saved us a fair percentage of
storage space. But, we can show that no other prefix code can
do better than Huffman coding.
First, we will show the following:
Let x and y be two characters with the least frequencies in a
file. Then there exists an optimal prefix code for C in which the
codewords for x and y have the same length and differ only in
the last bit.
Here is how we will prove this:
Assume that a tree T stores an optimal prefix code. Let and
characters a and b be sibling nodes stored at the maximum
depth of the tree. We will show that we can create T' with x
and y as siblings at the lowest depth of the tree such that the
number of bits used for the coding with T' is the same as with
T. (Let f(a) denote the frequency of the character a. Without
loss of generality, assume f(x) ≤ f(y) and f(a) ≤ f(b). It also
follows that f(x) ≤ f(a) and f(y) ≤ f(b). Let h be the height of the
tree T. Let x have a depth of dx in T and y have a depth of dx in
T.)
Create T' as follows: swap the nodes storing a and x, and then
swap the nodes storing b and y. Now, we have that the depth of
x and y in T' is h, the depth of a is dx and the depth of b is dy in
T'.
Now, let's calculate the change in the number of bits used for
the coding with tree T' with the coding in tree T. (Note: Since
all other codes remain unchanged, we only need to analyze the
total number of bits it takes to code a, b, x and y.)
# bits for tree T (for a,b,x and y) = hf(a) + hf(b) + dxf(x) dyf(y)
# bits for tree T' (for a, b, x, and y) = dxf(a) + dyf(b) + hf(x) +
hf(y).
Difference =
hf(a) + hf(b) + dxf(x) dyf(y) - (dxf(a) + dyf(b) + hf(x) + hf(y)) =
hf(a) + hf(b) + dxf(x) dyf(y) - dxf(a) - dyf*b) - hf(x) - hf(y) =
h(f(a) - f(x)) + h(f(b)-f(y)) + dx(f(x) - f(a)) + dy(f(y) - f(b)) =
h(f(a) - f(x)) + h(f(b)-f(y)) - dx(f(a) - f(x)) - dy(f(b) - f(y)) =
(h - dx)(f(a) - f(x)) + (h - dy)(f(b) - f(y))
Notice that all four of the terms above must be non-negative
since we know that h ≥ dx, h ≥ dy, f(a) ≥ f(x), and f(b) ≥ f(y).
Thus, it follows that this difference must be 0. Thus, the
number of bits to used in a code where x and y (the two
characters with lowest frequency) are siblings at maximum
depth of the coding tree is optimal.
In layman's terms, give me what you think is an optimal coding
tree, and I can create a new one from it with the two nodes
corresponding to low frequencies at the bottom of the tree.
To complete the proof, you'll notice that by construction,
Huffman coding ALWAYS makes sure that the nodes with the
lowest frequencies are at the bottom of the coding tree, all the
way through the construction. (You can't find any pair of
nodes for which this isn't true.) Technically, to carry out the
proof, you'd use induction, but we'll skip that for now...
Ad

More Related Content

What's hot (17)

Huffman coding || Huffman Tree
Huffman coding || Huffman TreeHuffman coding || Huffman Tree
Huffman coding || Huffman Tree
SatishKumarInumarthi
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
Muhammad Saqib Rehan
 
Huffman coding || Huffman Tree
Huffman coding || Huffman TreeHuffman coding || Huffman Tree
Huffman coding || Huffman Tree
Gurunadh Guru
 
Huffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysisHuffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysis
Ramakant Soni
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
Ehtisham Ali
 
Huffman's Alforithm
Huffman's AlforithmHuffman's Alforithm
Huffman's Alforithm
Roohaali
 
Shannon Fano
Shannon FanoShannon Fano
Shannon Fano
anithabalaprabhu
 
Multimedia lossless compression algorithms
Multimedia lossless compression algorithmsMultimedia lossless compression algorithms
Multimedia lossless compression algorithms
Mazin Alwaaly
 
Huffman Codes
Huffman CodesHuffman Codes
Huffman Codes
Md. Shafiuzzaman Hira
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
George Ang
 
Huffman coding
Huffman coding Huffman coding
Huffman coding
Nazmul Hyder
 
Text compression
Text compressionText compression
Text compression
Sammer Qader
 
Lossless
LosslessLossless
Lossless
anithabalaprabhu
 
Data Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length EncodingData Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length Encoding
MANISH T I
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
Nihal Gupta
 
Arithmetic Coding
Arithmetic CodingArithmetic Coding
Arithmetic Coding
anithabalaprabhu
 
Data Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano codingData Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano coding
Dr Rajiv Srivastava
 
Huffman coding || Huffman Tree
Huffman coding || Huffman TreeHuffman coding || Huffman Tree
Huffman coding || Huffman Tree
Gurunadh Guru
 
Huffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysisHuffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysis
Ramakant Soni
 
Huffman's Alforithm
Huffman's AlforithmHuffman's Alforithm
Huffman's Alforithm
Roohaali
 
Multimedia lossless compression algorithms
Multimedia lossless compression algorithmsMultimedia lossless compression algorithms
Multimedia lossless compression algorithms
Mazin Alwaaly
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
George Ang
 
Data Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length EncodingData Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length Encoding
MANISH T I
 
Data Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano codingData Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano coding
Dr Rajiv Srivastava
 

Similar to Huffman coding01 (20)

Huffman analysis
Huffman analysisHuffman analysis
Huffman analysis
Abubakar Sultan
 
Lec5 Compression
Lec5 CompressionLec5 Compression
Lec5 Compression
anithabalaprabhu
 
Huff
HuffHuff
Huff
Reetika Singh
 
Huffman
HuffmanHuffman
Huffman
keerthi vasan
 
Huffman
HuffmanHuffman
Huffman
keerthi vasan
 
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
SHIVAM691605
 
INSTRUCTIONS For this assignment you will be generating all code on y.pdf
 INSTRUCTIONS For this assignment you will be generating all code on y.pdf INSTRUCTIONS For this assignment you will be generating all code on y.pdf
INSTRUCTIONS For this assignment you will be generating all code on y.pdf
adayarboot
 
Lossless
LosslessLossless
Lossless
Vishal Suri
 
Lecture 15_Strings and Dynamic Memory Allocation.pptx
Lecture 15_Strings and  Dynamic Memory Allocation.pptxLecture 15_Strings and  Dynamic Memory Allocation.pptx
Lecture 15_Strings and Dynamic Memory Allocation.pptx
JawadTanvir
 
Lecft3data
Lecft3dataLecft3data
Lecft3data
ali Hussien
 
The assigment is overdue now. I will up the price I am willing to pa.docx
The assigment is overdue now. I will up the price I am willing to pa.docxThe assigment is overdue now. I will up the price I am willing to pa.docx
The assigment is overdue now. I will up the price I am willing to pa.docx
rtodd17
 
Komdat-Kompresi Data
Komdat-Kompresi DataKomdat-Kompresi Data
Komdat-Kompresi Data
mursalinfajri007
 
Mysql Performance Optimization Indexing Algorithms and Data Structures
Mysql Performance Optimization Indexing Algorithms and Data StructuresMysql Performance Optimization Indexing Algorithms and Data Structures
Mysql Performance Optimization Indexing Algorithms and Data Structures
Abhijit Mondal
 
Day5 String python language for btech.pptx
Day5 String python language for btech.pptxDay5 String python language for btech.pptx
Day5 String python language for btech.pptx
mrsam3062
 
Hufman coding basic
Hufman coding basicHufman coding basic
Hufman coding basic
radthees
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data Records
Workhorse Computing
 
16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt
DrAliKMattar
 
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
Shanmuganathan C
 
16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt
talwandibhindran
 
Basics of coding theory
Basics of coding theoryBasics of coding theory
Basics of coding theory
Madhumita Tamhane
 
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
12_HuffmanhsjsjsjjsiejjssjjejsjCoding_pdf.pdf
SHIVAM691605
 
INSTRUCTIONS For this assignment you will be generating all code on y.pdf
 INSTRUCTIONS For this assignment you will be generating all code on y.pdf INSTRUCTIONS For this assignment you will be generating all code on y.pdf
INSTRUCTIONS For this assignment you will be generating all code on y.pdf
adayarboot
 
Lecture 15_Strings and Dynamic Memory Allocation.pptx
Lecture 15_Strings and  Dynamic Memory Allocation.pptxLecture 15_Strings and  Dynamic Memory Allocation.pptx
Lecture 15_Strings and Dynamic Memory Allocation.pptx
JawadTanvir
 
The assigment is overdue now. I will up the price I am willing to pa.docx
The assigment is overdue now. I will up the price I am willing to pa.docxThe assigment is overdue now. I will up the price I am willing to pa.docx
The assigment is overdue now. I will up the price I am willing to pa.docx
rtodd17
 
Mysql Performance Optimization Indexing Algorithms and Data Structures
Mysql Performance Optimization Indexing Algorithms and Data StructuresMysql Performance Optimization Indexing Algorithms and Data Structures
Mysql Performance Optimization Indexing Algorithms and Data Structures
Abhijit Mondal
 
Day5 String python language for btech.pptx
Day5 String python language for btech.pptxDay5 String python language for btech.pptx
Day5 String python language for btech.pptx
mrsam3062
 
Hufman coding basic
Hufman coding basicHufman coding basic
Hufman coding basic
radthees
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data Records
Workhorse Computing
 
16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt16_Greedy_Algorithms.ppt
16_Greedy_Algorithms.ppt
DrAliKMattar
 
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
16_Greedy_Algorithms Greedy_AlgorithmsGreedy_Algorithms
Shanmuganathan C
 
Ad

More from Nv Thejaswini (12)

Mea notes
Mea notesMea notes
Mea notes
Nv Thejaswini
 
Appendix b 2
Appendix b 2Appendix b 2
Appendix b 2
Nv Thejaswini
 
Chapter9 4
Chapter9 4Chapter9 4
Chapter9 4
Nv Thejaswini
 
Lecture11
Lecture11Lecture11
Lecture11
Nv Thejaswini
 
Ch8
Ch8Ch8
Ch8
Nv Thejaswini
 
Branch and bound
Branch and boundBranch and bound
Branch and bound
Nv Thejaswini
 
Unit 5 jwfiles
Unit 5 jwfilesUnit 5 jwfiles
Unit 5 jwfiles
Nv Thejaswini
 
Unit 4 jwfiles
Unit 4 jwfilesUnit 4 jwfiles
Unit 4 jwfiles
Nv Thejaswini
 
Unit 3 daa
Unit 3 daaUnit 3 daa
Unit 3 daa
Nv Thejaswini
 
Unit 2 in daa
Unit 2 in daaUnit 2 in daa
Unit 2 in daa
Nv Thejaswini
 
Ch8 of OS
Ch8 of OSCh8 of OS
Ch8 of OS
Nv Thejaswini
 
Presentation solar
Presentation solarPresentation solar
Presentation solar
Nv Thejaswini
 
Ad

Recently uploaded (20)

AI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in RetailAI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in Retail
IJDKP
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation RateModeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Journal of Soft Computing in Civil Engineering
 
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
ssuserd9338b
 
Control Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptxControl Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptx
vvsasane
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFTDeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
Kyohei Ito
 
David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
UNIT 3 Software Engineering (BCS601) EIOV.pdf
UNIT 3 Software Engineering (BCS601) EIOV.pdfUNIT 3 Software Engineering (BCS601) EIOV.pdf
UNIT 3 Software Engineering (BCS601) EIOV.pdf
sikarwaramit089
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Journal of Soft Computing in Civil Engineering
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning ModelsMode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Journal of Soft Computing in Civil Engineering
 
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Lecture - 7 Canals of the topic of the civil engineering
Lecture - 7  Canals of the topic of the civil engineeringLecture - 7  Canals of the topic of the civil engineering
Lecture - 7 Canals of the topic of the civil engineering
MJawadkhan1
 
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
Guru Nanak Technical Institutions
 
AI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in RetailAI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in Retail
IJDKP
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
ssuserd9338b
 
Control Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptxControl Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptx
vvsasane
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFTDeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
Kyohei Ito
 
David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
UNIT 3 Software Engineering (BCS601) EIOV.pdf
UNIT 3 Software Engineering (BCS601) EIOV.pdfUNIT 3 Software Engineering (BCS601) EIOV.pdf
UNIT 3 Software Engineering (BCS601) EIOV.pdf
sikarwaramit089
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Lecture - 7 Canals of the topic of the civil engineering
Lecture - 7  Canals of the topic of the civil engineeringLecture - 7  Canals of the topic of the civil engineering
Lecture - 7 Canals of the topic of the civil engineering
MJawadkhan1
 

Huffman coding01

  • 1. Huffman Coding The idea behind Huffman coding is to find a way to compress the storage of data using variable length codes. Our standard model of storing data uses fixed length codes. For example, each character in a text file is stored using 8 bits. There are certain advantages to this system. When reading a file, we know to ALWAYS read 8 bits at a time to read a single character. But as you might imagine, this coding scheme is inefficient. The reason for this is that some characters are more frequently used than other characters. Let's say that the character 'e' is used 10 times more frequently than the character 'q'. It would then be advantageous for us to use a 7 bit code for e and a 9 bit code for q instead because that could shorten our overall message length. Huffman coding finds the optimal way to take advantage of varying character frequencies in a particular file. On average, using Huffman coding on standard files can shrink them anywhere from 10% to 30% depending to the character distribution. (The more skewed the distribution, the better Huffman coding will do.) The idea behind the coding is to give less frequent characters and groups of characters longer codes. Also, the coding is constructed in such a way that no two constructed codes are prefixes of each other. This property about the code is crucial with respect to easily deciphering the code.
  • 2. Building a Huffman Tree The easiest way to see how this algorithm works is to work through an example. Let's assume that after scanning a file we find the following character frequencies: Character Frequency 'a' 12 'b' 2 'c' 7 'd' 13 'e' 14 'f' 85 Now, create a binary tree for each character that also stores the frequency with which it occurs. The algorithm is as follows: Find the two binary trees in the list that store minimum frequencies at their nodes. Connect these two nodes at a newly created common node that will store NO character but will store the sum of the frequencies of all the nodes connected below it. So our picture looks like follows: 9 12 'a' 13 'd' 14 'e' 85 'f' / 2 'b' 7 'c'
  • 3. Now, repeat this process until only one tree is left: 21 / 9 12 'a' 13 'd' 14 'e' 85 'f' / 2 'b' 7 'c' 21 27 / / 9 12 'a' 13 'd' 14 'e' 85 'f' / 2 'b' 7 'c' 48 / 21 27 / / 9 12 'a' 13 'd' 14 'e' 85 'f' / 2 'b' 7 'c' 133 / 48 85 'f' / 21 27 / / 9 12 'a' 13 'd' 14 'e' / 2 'b' 7 'c' Once the tree is built, each leaf node corresponds to a letter with a code. To determine the code for a particular node, walk
  • 4. a standard search path from the root to the leaf node in question. For each step to the left, append a 0 to the code and for each step right append a 1. Thus for the tree above we get the following codes: Letter Code 'a' 001 'b' 0000 'c' 0001 'd' 010 'e' 011 'f' 1 Why are we guaranteed that one code is NOT the prefix of another? Find a set of valid Huffman codes for a file with the given character frequencies: Character Frequency 'a' 15 'b' 7 'c' 5 'd' 23 'e' 17 'f' 19
  • 5. Calculating Bits Saved All we need to do for this calculation is figure out how many bits are originally used to store the data and subtract from that how many bits are used to store the data using the Huffman code. In the first example given, since we have six characters, let's assume each is stored with a three bit code. Since there are 133 such characters, the total number of bits used is 3*133 = 399. Now, using the Huffman coding frequencies we can calculate the new total number of bits used: Letter Code Frequency Total Bits 'a' 001 12 36 'b' 0000 2 8 'c' 0001 7 28 'd' 010 13 39 'e' 011 14 42 'f' 1 85 85 Total 238 Thus, we saved 399 - 238 = 161 bits, or nearly 40% storage space. Of course there is a small detail we haven't taken into account here. What is that?
  • 6. Huffman Coding is an Optimal Prefix Code Of all prefix codes for a file, Huffman coding produces an optimal one. In all of our examples from class on Monday, we found that Huffman coding saved us a fair percentage of storage space. But, we can show that no other prefix code can do better than Huffman coding. First, we will show the following: Let x and y be two characters with the least frequencies in a file. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit. Here is how we will prove this: Assume that a tree T stores an optimal prefix code. Let and characters a and b be sibling nodes stored at the maximum depth of the tree. We will show that we can create T' with x and y as siblings at the lowest depth of the tree such that the number of bits used for the coding with T' is the same as with T. (Let f(a) denote the frequency of the character a. Without loss of generality, assume f(x) ≤ f(y) and f(a) ≤ f(b). It also follows that f(x) ≤ f(a) and f(y) ≤ f(b). Let h be the height of the tree T. Let x have a depth of dx in T and y have a depth of dx in T.) Create T' as follows: swap the nodes storing a and x, and then swap the nodes storing b and y. Now, we have that the depth of x and y in T' is h, the depth of a is dx and the depth of b is dy in T'.
  • 7. Now, let's calculate the change in the number of bits used for the coding with tree T' with the coding in tree T. (Note: Since all other codes remain unchanged, we only need to analyze the total number of bits it takes to code a, b, x and y.) # bits for tree T (for a,b,x and y) = hf(a) + hf(b) + dxf(x) dyf(y) # bits for tree T' (for a, b, x, and y) = dxf(a) + dyf(b) + hf(x) + hf(y). Difference = hf(a) + hf(b) + dxf(x) dyf(y) - (dxf(a) + dyf(b) + hf(x) + hf(y)) = hf(a) + hf(b) + dxf(x) dyf(y) - dxf(a) - dyf*b) - hf(x) - hf(y) = h(f(a) - f(x)) + h(f(b)-f(y)) + dx(f(x) - f(a)) + dy(f(y) - f(b)) = h(f(a) - f(x)) + h(f(b)-f(y)) - dx(f(a) - f(x)) - dy(f(b) - f(y)) = (h - dx)(f(a) - f(x)) + (h - dy)(f(b) - f(y)) Notice that all four of the terms above must be non-negative since we know that h ≥ dx, h ≥ dy, f(a) ≥ f(x), and f(b) ≥ f(y). Thus, it follows that this difference must be 0. Thus, the number of bits to used in a code where x and y (the two characters with lowest frequency) are siblings at maximum depth of the coding tree is optimal. In layman's terms, give me what you think is an optimal coding tree, and I can create a new one from it with the two nodes corresponding to low frequencies at the bottom of the tree. To complete the proof, you'll notice that by construction, Huffman coding ALWAYS makes sure that the nodes with the
  • 8. lowest frequencies are at the bottom of the coding tree, all the way through the construction. (You can't find any pair of nodes for which this isn't true.) Technically, to carry out the proof, you'd use induction, but we'll skip that for now...
  翻译: