Naila (Psycoacoustics)

Seminar Report 2004
1. Introduction
Basics of Audio Compression
Advances in digital audio technology are fueled by two sources: hardware

developments and new signal processing techniques. When processors dissipated tens of
watts of power and memory densities were on the order of kilobits per square inch, portable
playback devices like an MP3 player were not possible. Now, however, power dissipation,
memory densities, and processor speeds have improved by several orders of magnitude.
Advancements in signal processing are exemplified by Internet broadcast applications:

if the desired sound quality for an internet broadcast used 16-bit PCM encoding at 44.1
KHz, such an application would require a 1.4 Mbps (2 x 16 x 44k) channel for a stereo
signal! Fortunately new bit rate reduction techniques in signal processing for audio of this
quality are constantly being released.
Increasing hardware efficiency and an expanding array of digital audio representation

formats are giving rise to a wide variety of new digital audio applications. These
applications include portable music playback devices, digital surround sound for cinema,
high-quality digital radio and television broadcast, Digital Versatile Disc (DVD), and many
others.
This paper introduces digital audio signal compression, a technique essential to the
implementation of many digital audio applications. Digital audio signal compression is the
removal of redundant or otherwise irrelevant information from a digital audio signal, a
process that is useful for conserving both transmission bandwidth and storage space. We
Dept. of ECE -1- Govt. Engg. College,

Thrissur
Seminar Report 2004
begin by defining some useful terminology. We then present a typical “encoder” (as
compression algorithms are often called) and explain how it functions. Finally consider
some standards that employ digital audio signal compression, and discuss the future of the
field.
Psychoacoustics is the study of subjective human perception of sounds. Effectively, it is

the study of acoustical perception. Psychoacoustic modeling has long-since been an integral
part of audio compression. It exploits properties of the human auditory system to remove
the redundancies inherent in audio signals that the human ear cannot perceive. More
powerful signals at certain frequencies ‘mask’ less powerful signals at nearby frequencies
by de-sensitizing the human ear’s basilar membrane (which is responsible for resolving the
frequency components of a signal). The entire MP3 phenomenon is made possible by the
confluence of several distinct but interrelated elements: a few simple insights into the nature
of human psychoacoustics, a whole lot of number crunching, and conformance to a tightly
specified format for encoding and decoding audio into compact bitstreams.
2. Terminology
Audio Compression vs. Speech Compression
This paper focuses on audio compression techniques, which differ from those used in
speech compression. Speech compression uses a model of the human vocal tract to express
particular signal in a compressed format. This technique is not usually applied in the field of
audio compression due to the vast array of sounds that can be generated – models that
represent audio generation would be too complex to implement. So instead of modeling the
source of sounds, modern audio compression models the receiver, i.e., the human ear.
Dept. of ECE 2 Govt. Engg. College,

Thrissur
Seminar Report 2004
Lossless vs. Lossy
When we speak of compression, we must distinguish between two different types:

lossless and lossy. Lossless compression retains all the information in a given signal, i.e., a
decoder can perfectly reconstruct a compressed signal. In contrast, lossy compression
eliminates information from the original signal. As a result, a reconstructed signal may
differ from the original. With audio signals, the differences between the original and
reconstructed signals only matter if they are detectable by the human ear. As we will
explore shortly, audio compression employs both lossy and lossless techniques.
3. Basic Building Blocks
Figure 1 shows a generic encoder or “compressor that takes blocks of sampled audio
signal as its input. These blocks typically consist of between 500 and 1500 samples per
channel, depending on the encoder specification. For example, the MPEG-1 layer III (MP3)
specification takes 576 samples per channel per input block. The output is a compressed
representation of the input block (a “frame”) that can be transmitted or stored for
subsequent decoding.

Thrissur
Seminar Report 2004
4. Anatomy and Physiology of the human ear
No matter what you do, your ears are always working. They are constantly detecting,
deciphering and analyzing sounds and communicating them to the brain. In a comparatively
tiny area of our body the ear is performing many highly technical and intricate functions.
There are three distinct portions to the ear: the outer ear containing the fleshy skin and the
canal that leads to the inner ear, the middle ear containing the three smallest bones in the
human body the malleus, incus and stapes (commonly called the hammer, anvil and stirrup)

Thrissur
Seminar Report 2004
and the inner ear, made up of a cluster of three semicircular canals and the snail shaped
cochlea. Let’s take a look at them one at a time…
The Outer Ear

The outer ear collects sound waves in the air and channels them to the inner parts of the
ear. The outer ear along with its canal has been shown to enhance sounds within a certain
frequency range. That range just happens to be the same range that most of the
characteristics of human speech sounds fall into. This allows the sounds to be boosted to
twice their original intensity.
The Middle Ear

The middle ear transforms the acoustical vibration of the sound wave into mechanical
vibration and passes it onto the inner ear. The three tiny bones of the middle ear act as a
lever to bridge the eardrum with the oval window. Incoming forces are magnified by about
30%. This increased force allows the fluid in the cochlea of the inner ear to be activated.
The Inner Ear

The semi-circular canals in the inner ear allow us to maintain balance and coordination.
The cochlea, which is a bundle of three fluid filled canals coiled up in a spiral, is set in
motion by the stirrup in the middle ear. Moving in and out it sets up hydraulic pressure in
the fluid. As these waves travel to and from the apex of the spirals, they cause the walls
separating the canals to undulate. Along one of these walls is a sensitive organ called the
Corti. It is made up of many thousands of sensory hair cells. From here thousands of nerve
fibers carry information about the frequency, intensity and timbre of all these sounds to the
brain, where the sensation of hearing occurs.
Scientists cannot fully explain just how the signals are transmitted to the brain. They do
know that the signals sent by all the hair cells are about the same in duration and strength.

Thrissur
Seminar Report 2004
This has led them to believe that it is not the content of the signals but rather the signals
themselves that convey some sort of message to the brain.
Our ears, so often taken for granted, thus are a marvel of intricacy and design that leaves
anything that man can produce in the shade as a cheap imitation. Your hearing can never be
replaced. Don’t take it for granted.
5. Psychoacoustics
How do we reduce the size of the input data? The basic idea is to eliminate information
that is inaudible to the ear. This type of compression is often referred to as perceptual
encoding. To help determine what can and cannot be heard, compression algorithms rely on
the field of psychoacoustics, i.e., the study of human sound perception. Waves vibrating at
different frequencies manifest themselves differently, all the way from the astronomically
slow pulsations of the universe itself to the inconceivably fast vibration of matter (and
beyond). Somewhere in between these extremes are wavelengths that are perceptible to
human beings as light and sound. Just beyond the realms of light and sound are sub- and
ultrasonic vibration, the infrared and ultraviolet light spectra, and zillions of other
frequencies imperceptible to humans (such as radio and microwave). Our sense organs are
tuned only to very narrow bandwidths of vibration in the overall picture. In fact, even our
own musical instruments create many vibrational frequencies that are imperceptible to our
ears. Frequencies are typically described in units called Hertz (Hz), which translates simply
as "cycles per second." In general, humans cannot hear frequencies below 20Hz (20 cycles
per second), nor above 20kHz (20,000 cycles per second), as shown in Figure 2.

Thrissur
Seminar Report 2004
While hearing capacities vary from one individual to the next, it's generally true that
humans perceive midrange frequencies more strongly than high and low frequencies,[2] and
that sensitivity to higher frequencies diminishes with age and prolonged exposure to loud
volumes. In fact, by the time we're adults, most of us can't hear much of anything above
16kHz (although women tend to preserve the ability to hear higher frequencies later into life
than do men). The most sensitive range of hearing for most people hovers between 2kHz to

Thrissur
Seminar Report 2004
4kHz, a level probably evolutionarily related to the normal range of the human voice, which
runs roughly from 500Hz to 2kHz.
Specifically, audio compression algorithms exploit the conditions under which signal
characteristics obscure or mask each other. This phenomenon occurs in three different
ways: threshold cut-off, frequency masking and temporal masking. The remainder of this
section explains the nature of these concepts; subsequent sections explain how they are
typically applied to audio signal compression.
Threshold Cut-off
The human ear detects sounds as a local variation in air pressure measured as the Sound
Pressure Level (SPL). If variations in the SPL are below a certain threshold in amplitude,
the ear cannot detect them. This threshold, shown in Figure 3, is a function of the sound’s
frequency. Notice in Figure 3 that because the lowest-frequency component is below the
threshold, it will not be heard.

Thrissur
Seminar Report 2004
Frequency Masking
Even if a signal component exceeds the hearing threshold, it may still be masked by
louder components that are near it in frequency. This phenomenon is known as frequency
masking or simultaneous masking. Each component in a signal can cast a “shadow” over
neighbouring components. If the neighbouring components are covered by this shadow,
they will not be heard. The effective result is that one component, the masker, shifts the
hearing threshold. Figure 4 shows a situation in which this occurs.

Thrissur
Seminar Report 2004
Temporal Masking
Just as tones cast shadows on their neighbors in the frequency domain, a sudden
increase in volume can mask quieter sounds that are temporally close. This phenomenon is
known as temporal masking. Interestingly, sounds that occur both after and before the
volume increase can be masked! Figure 5 illustrates a typical temporal masking scenario:
events below the indicated threshold will not be heard. The idea behind temporal masking is
that humans also have trouble hearing distinct sounds that are close to one another in time.

Thrissur
Seminar Report 2004
For example, if a loud sound and a quiet sound are played simultaneously, you won't be
able to hear the quiet sound. If, however, there is sufficient delay between the two sounds,
you will hear the second, quieter sound. The key to the success of temporal masking is in
determining (quantifying) the length of time between the two tones at which the second
tone becomes audible, i.e., significant enough to keep it in the bitstream rather than
throwing it away. This distance, or threshold, turns out to be around five milliseconds when
working with pure tones, though it varies up and down in accordance with different audio
passages.
6. Spectral Analysis
Of the three masking phenomena explained above, two are best described in the
frequency domain. Thus, a frequency domain representation, also called the “spectrum” of a

Thrissur
Seminar Report 2004
signal, is a useful tool for analyzing the signal’s frequency characteristics and determining
thresholds. There are several different techniques for converting a finite time sequence into
its spectral representation, and these typically fall into one of two categories: transforms
and filter banks. Transforms calculate the spectrum of their inputs in terms of a set of basis
sequences; e.g., the Fourier Transform uses basic sequences that are complex exponentials.
Filter banks apply several different band pass filters to the input. Typically the result is
several time sequences, each of which corresponds to a particular frequency band. Taking
the spectrum of a signal has two purposes:
 To derive the masking thresholds in order to determine which portion of the
signal can be dropped.
 To generate a representation of the signal to which the masking threshold can
be applied.
Some compression schemes use different techniques for these two tasks.
The most popular transform in signal processing is the Fast Fourier Transform (FFT).
Given a finite time sequence, the FFT produces a complex-value frequency domain
representation. Encoders often use FFTs as a first step toward determining masking thresholds.
Another popular transform is the Discrete Cosine Transform (DCT), which outputs a real-
valued frequency domain representation. Both the FFT and the DCT suffer from distortion
when transforms are taken from contiguous blocks of time data. To solve this problem, inputs
and outputs can be overlapped and windowed in such a way that, in the absence of lossy
compression techniques, entire time signals can be perfectly reconstructed. For this reason,
most transform-based encoding schemes employ an overlapped and windowed DCT known as
the Modified Discrete Cosine Transform (MDCT). Some compression algorithms that use the
MDCT are MPEG-1 layer-III, MPEG-2 AAC, and Do
Dolby AC-3. Filter banks pass a block of time samples through several band pass filters to
generate

Thrissur
Seminar Report 2004
different signals corresponding to different sub-bands in frequency. After filtering, masking

thresholds can be applied to each sub-band. Two popular filter bank structures are the poly-
phase filter bank and the wavelet filter bank. The poly-phase filter bank uses parallel band
pass filters of equal width whose outputs are down-sampled to create one (shorter) signal per
sub-band. In the absence of lossy compression techniques, a decoder can achieve perfect
reconstruction by up-sampling, filtering, and adding each sub-band. This type of structure is
used in all of the MPEG-1 audio encoders.
7. Examples & Applications
The purpose of this section is to discuss some existing standards in digital audio
compression, in particular the MPEG-1 layer III. Features of interest for each standard include
which compression techniques are used, special details or unique characteristics, and target
applications.
7.1 MPEG-1 Layer III
7.1.1 History
In 1987, the Fraunhofer IIS started to work on perceptual audio coding in the
framework of the EUREKA project EU147, Digital Audio Broadcasting (DAB). In a joint
cooperation with the University of Erlangen (Prof. Dieter Seitzer), the Fraunhofer IIS finally
devised a very powerful algorithm that is standardized as ISO-MPEG Audio Layer-3 (IS
11172-3 and IS 13818-3).
7.1.2 About MP3

Thrissur
Seminar Report 2004
Without data reduction, digital audio signals typically consist of 16 bit samples recorded
at a sampling rate more than twice the actual audio bandwidth (e.g. 44.l1 KHz for Compact
Discs). So
you end up with more than 1.4 Mbit to represent just one second of stereo music in CD quality.
By using MPEG audio coding, you may shrink down the original sound data from a CD by a
factor of 12, without losing sound quality. Basically, this is realized by perceptual coding
techniques addressing the perception of sound waves by the human ear.
Using MPEG audio, one may achieve a typical data reduction of
1: 4 by Layer 1 (corresponds to 384 kbps for a stereo signal)
1: 6 to 1: 8 by Layer 2 (corresponds to 256 to 192 kbps for a stereo signal)
1: 10 to 1: 12 by Layer 3 (corresponds to 128 to 112 kbps for a stereo signal)
still maintaining the original CD sound quality.
By exploiting stereo effects and by limiting the audio bandwidth, the coding schemes
may achieve an acceptable sound quality at even lower bit rates. MPEG Layer-3 is the most
powerful member of the MPEG audio coding family. For a given sound quality level, it
requires the lowest bit rate or for a given bit rate, it achieves the highest sound quality.
7.1.3 Breaking It Down
MP3 uses two compression techniques to achieve its size reduction ratios over
uncompressed audio-one lossy and one lossless. First it throws away what humans can't hear
anyway (or at least it makes acceptable compromises), and then it encodes the redundancies to

Thrissur
Seminar Report 2004
achieve further compression. However, it's the first part of the process that does most of the
grunt work, requires most of the complexity.
Perceptual codecs are highly complex beasts, and all of them work a little differently.
However, the general principles of perceptual coding remain the same from one codec to the
next. In
brief, the MP3 encoding process can be subdivided into a handful of discrete tasks (not
necessarily in this order):
• Break the signal into smaller component pieces called " frames," each typically lasting a
fraction of a second. You can think of frames much as you would the frames in a movie
film.
• Analyze the signal to determine its "spectral energy distribution." In other words, on the
entire spectrum of audible frequencies, find out how the bits will need to be distributed
to best account for the audio to be encoded. Because different portions of the frequency
spectrum are most efficiently encoded via slight variants of the same algorithm, this step
breaks the signal into sub-bands, which can be processed independently for optimal
results (but note that all sub-bands use the algorithm-they just allocate the number of
bits differently, as determined by the encoder).
• The encoding bitrate is taken into account, and the maximum number of bits that can be
allocated to each frame is calculated. For instance, if you're encoding at 128 kbps, you
have an upper limit on how much data can be stored in each frame (unless you're
encoding with variable bitrates, but we'll get to that later). This step determines how
much of the available audio data will be stored, and how much will be left on the cutting
room floor.
• The frequency spread for each frame is compared to mathematical models of human
psychoacoustics, which are stored in the codec as a reference table. From this model, it

Thrissur
Seminar Report 2004
can be determined which frequencies need to be rendered accurately, since they'll be

perceptible to humans, and which ones can be dropped or allocated fewer bits, since we
wouldn't be able to hear them anyway. Why store data that can't be heard?
• The bitstream is run through the process of " Huffman coding," which compresses
redundant information throughout the sample. The Huffman coding does not work with
a psychoacoustic model, but achieves additional compression via more traditional
means. Thus, you can see the entire MP3 encoding process as a two-pass system: First
you run all of the psychoacoustic models, discarding data in the process, and then you
compress what's left
to shrink the storage space required by any redundancies. This second step, the Huffman
coding, does not discard any data-it just lets you store what's left in a smaller amount of
space.
• The collection of frames is assembled into a serial bitstream, with header information
preceding each data frame. The headers contain instructional "meta-data" specific to
that frame.
Along the way, many other factors enter into the equation, often as the result of options
chosen prior to beginning the encoding. In addition, algorithms for the encoding of an
individual frame often rely on the results of an encoding for the frames that precede or follow
it. The entire process usually includes some degree of simultaneity; the preceding steps are not
necessarily run in order.
7.1.4 Sound Quality
Some typical performance data of MPEG Layer-3 are:

Thrissur
Seminar Report 2004
Sound Quality Bandwidth Mode Bit Rate Reduction Ratio
Telephone sound 2.5 KHz mono 8 kbps* 96: 1
Near CD 15 KHz stereo 96 kbps 16: 1
CD >15 KHz stereo 112 to 128 kbps 14 to 12: 1
* Fraunhofer IIS uses a non-ISO extension of MPEG Layer-3 for enhanced performance
(“MPEG 2.5”)
7.1.5 MP3 Encoding (Block Diagram)

Thrissur
Seminar Report 2004
Filter Bank
The filter bank used in MPEG Layer-3 is a hybrid filter bank which consists of a poly-phase
filter bank and a Modified Discrete Cosine Transform (MDCT). This hybrid form was chosen
for reasons of compatibility to its predecessors, Layer-1 and Layer-2.
Perceptual Model

Thrissur
Seminar Report 2004
The perceptual model mainly determines the quality of a given encoder implementation. It
uses either a separate filter bank or combines the calculation of energy values (for the masking
calculations) and the main filter bank. The output of the perceptual model consists of values for
the masking threshold or the allowed noise for each coder partition. If the quantization noise
can be kept below the making threshold, then the compression results should be
indistinguishable from the original signal.
Joint Stereo
Joint stereo coding takes advantage of the fact that both channels of a stereo channel pair
contain far the same information. These stereophonic irrelevancies and redundancies are
exploited to reduce the total bit rate. Joint stereo is used in cases where only low bit rates are
available but stereo signals are desired.
Quantization and Coding

A system of two nested iteration loops is the common solution for quantization and coding
in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values are automatically
coded with less accuracy and some noise shaping is already built into the quantization process.
The quantized values are coded by Huffman coding. As a specific method for entropy
coding, Huffman coding is lossless. This is called noiseless coding because no noise is added to
the audio signal.
The process to find the optimum gain and scale factors for a given block, bit-rate and output
from the perceptual model is usually done by two nested iteration loops in an analysis-by-
synthesis way:
 Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words to (more frequent) smaller
quantized values. If the number of bits resulting from the coding operation exceeds
the number of bits available to code a given block of data, this can be corrected by
adjusting the global

Thrissur
Seminar Report 2004
gain to result in a larger quantization step sizes until the resulting bit demand for
Huffman coding is small enough.
 Outer iteration loop (noise control/distortion loop)

To shape the quantization noise according to the masking threshold, scale factors are
applied to each scale factor band. The system starts with a default factor of 1.0 for
each band. If the quantization noise in a given band is found to exceed the masking
threshold (allowed noise) as supplied by the perceptual model, the scale factor for
this band is adjusted to reduce the quantization noise. Since achieving a smaller
quantization noise requires a larger number of quantization steps and thus a higher
bit rate, the rate adjustment loop has to be repeated every time new scale factors are
used. In other words, the rate loop is nested within the noise control loop. The outer
(noise control) loop is executed until the actual noise (computed from the difference
of the original spectral values minus the quantized spectral values) is below the
masking threshold for every scale factor band (i.e. critical band).
7.1.6 MP3 Decoding
The great bulk of the work in the MP3 system as a whole is placed on the encoding
process. Since one typically plays files more frequently than one encodes them, this makes
sense. Decoders do not need to store or work with a model of human psychoacoustic principles,
nor do they require a bit allocation procedure. All the MP3 player has to worry about is
examining the bitstream of header and data frames for spectral components and the side
information stored alongside them, and then reconstructing this information to create an audio
signal. The player is nothing but an (often) fancy interface onto your collection of MP3 files
and playlists and your sound card, encapsulating the relatively straightforward rules of
decoding the MP3 bitstream format.

Thrissur
Seminar Report 2004
While there are measurable differences in the efficiency-and audible differences in the
quality-of various MP3 decoders, the differences are largely negligible on computer hardware
manufactured in the last few years. That's not to say that decoders just sit in the background
consuming no resources. In fact, on some machines and some operating systems you'll notice a
slight (or even pronounced) sluggishness in other operations while your player is running. This
is particularly true on operating systems that don't feature a finely grained threading model,
such as MacOS and most versions of Windows. Linux and, to an even greater extent, BeOS are
largely exempt from MP3 skipping problems, given decent hardware. And of course, if you're
listening to MP3 audio streamed over the Internet, you'll get skipping problems if you don't
have enough bandwidth to handle the bitrate/sampling frequency of the stream.
Some MP3 decoders chew up more CPU time than others, but the differences between
them in terms of efficiency are not as great as the differences between their feature sets, or
between the efficiency of various encoders. Choosing an MP3 player becomes a question of
cost, extensibility, audio quality, and appearance.
7.2 Other Audio Formats Using Psychoacoustic Coding
 Windows Media Audio (WMA) from Microsoft

 AC-3, used in Dolby Digital and DVD
 MPEG-4 AAC- Advanced Audio Compression
8. Future of Digital Music

Thrissur
Seminar Report 2004
Today's music technologies have turned passive listeners into active participants that
can capture, record, transform, edit, and save their music in a variety of digital formats. An
emerging technology that can significantly reduce the size of digital music files while
maintaining their original sound quality is mp3PRO.
A coding scheme for compressing audio signals, MPEG reduces the size of audio
files using three coding schemes or layers. The third layer, commonly known as MP3, uses
audio coding and psychoacoustic compression to remove the information or sounds that can't be
perceived by the human ear. The size of the original sound recording is subsequently reduced
by a factor of 12 without sacrificing sound quality.
Music compressed with MP3 is very similar to the original. However, when you
start to reduce the bit rate — thereby reducing the file size — the music begins to sound dull. In
addition, a 3-minute, satisfactory quality MP3 song takes about 15 minutes to download using a
56K modem.
The solution is spectral band replication (SBR). Developed by Coding

Technologies, this technology maintains the sound quality of a digital music file while reducing
the bit rate. The resulting audio format, known as mp3PRO, is composed of two components:
the mp3 part for the low frequencies and the SBR or "PRO" part for the high frequencies. Since
the "PRO" part requires only a few kbps, the format could be done in a way that it is still
compatible with the original mp3 format. In addition, existing mp3 players can be used to play
mp3PRO files. They simply ignore the PRO part.
It takes tremendous computing power to encode and decode music, especially to

MP3 or mp3PRO files. The speed in which these tasks can be accomplished is tied to both the
speed of a processor and whether the application being used is optimized for a specific
processor.

Thrissur
Seminar Report 2004
9. Conclusion
By eliminating audio information that the human ear cannot detect, modern
audio coding standards are able to compress a typical 1.4 Mbps signal by a factor of
about twelve. This is done by employing several different methodologies, including
noise allocation techniques based on psychoacoustic models.
Future goals for the field of audio compression are quite broad. Several
initiatives are focused on establishing a format for digital encryption (watermarking) to
protect copyrighted audio content. Improvements in psychoacoustic models are
expected to drive bit rates lower. Finally, entirely new avenues are being explored in an
effort to compress audio based on how it is produced rather than how it is perceived.
This last approach was integral in the development of the MPEG-4 standard.

Thrissur
Seminar Report 2004
References
 B. Cavagnolo and J. Bier “Introduction to Digital Audio Compression”, Berkely

Design Inc.
 Madisetti, Vijay K. and Williams, Douglass B. “The Digital Signal Processing
Handbook”, Section IX. CRC Press LLC, 1998.
 Massie, Dana, and Strawn, John, et al., Digital Audio: Applications, Algorithms
and Implementation. Seminar presented at Embedded Processor Forum by Berkely
Design Technology, Inc., 12 June 2000.
 Dylan T F Carline, Reuben Edwards, Paul Coulton “Psychoacoustic properties of
multi-channel audio signals” Lancaster University.

Thrissur

Naila (Psycoacoustics)

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Naila (Psycoacoustics)

Hochgeladen von

Copyright:

Verfügbare Formate

Seminar Report 2004

Basics of Audio Compression

Advances in digital audio technology are fueled by two sources: hardware

Advancements in signal processing are exemplified by Internet broadcast applications:

Increasing hardware efficiency and an expanding array of digital audio representation

Dept. of ECE -1- Govt. Engg. College,

Psychoacoustics is the study of subjective human perception of sounds. Effectively, it is

Audio Compression vs. Speech Compression

Dept. of ECE 2 Govt. Engg. College,

Lossless vs. Lossy

When we speak of compression, we must distinguish between two different types:

3. Basic Building Blocks

Dept. of ECE 3 Govt. Engg. College,

4. Anatomy and Physiology of the human ear

Dept. of ECE 4 Govt. Engg. College,

The Outer Ear

The Middle Ear

The Inner Ear

Dept. of ECE 5 Govt. Engg. College,

Dept. of ECE 6 Govt. Engg. College,

Dept. of ECE 7 Govt. Engg. College,

Dept. of ECE 8 Govt. Engg. College,

Dept. of ECE 9 Govt. Engg. College,

Dept. of ECE 10 Govt. Engg. College,

Dept. of ECE 11 Govt. Engg. College,

Dept. of ECE 12 Govt. Engg. College,

different signals corresponding to different sub-bands in frequency. After filtering, masking

7. Examples & Applications

7.1 MPEG-1 Layer III

7.1.2 About MP3

Dept. of ECE 13 Govt. Engg. College,

Using MPEG audio, one may achieve a typical data reduction of

1: 4 by Layer 1 (corresponds to 384 kbps for a stereo signal)

1: 6 to 1: 8 by Layer 2 (corresponds to 256 to 192 kbps for a stereo signal)

1: 10 to 1: 12 by Layer 3 (corresponds to 128 to 112 kbps for a stereo signal)

still maintaining the original CD sound quality.

7.1.3 Breaking It Down

Dept. of ECE 14 Govt. Engg. College,

Dept. of ECE 15 Govt. Engg. College,

can be determined which frequencies need to be rendered accurately, since they'll be

7.1.4 Sound Quality

Some typical performance data of MPEG Layer-3 are:

Dept. of ECE 16 Govt. Engg. College,

Sound Quality Bandwidth Mode Bit Rate Reduction Ratio

Telephone sound 2.5 KHz mono 8 kbps* 96: 1

Near CD 15 KHz stereo 96 kbps 16: 1

CD >15 KHz stereo 112 to 128 kbps 14 to 12: 1

7.1.5 MP3 Encoding (Block Diagram)

Dept. of ECE 17 Govt. Engg. College,

Dept. of ECE 18 Govt. Engg. College,

Quantization and Coding

Dept. of ECE 19 Govt. Engg. College,

 Outer iteration loop (noise control/distortion loop)

7.1.6 MP3 Decoding

Dept. of ECE 20 Govt. Engg. College,

7.2 Other Audio Formats Using Psychoacoustic Coding

 Windows Media Audio (WMA) from Microsoft

8. Future of Digital Music

Dept. of ECE 21 Govt. Engg. College,

The solution is spectral band replication (SBR). Developed by Coding

It takes tremendous computing power to encode and decode music, especially to

Dept. of ECE 22 Govt. Engg. College,