
EE1432 Pengolahan Sinyal Multimedia (Multimedia Signal Processing)

Audio Theory and Characteristics


Endang Widjiati ewidjiati@telkom.net

Multimedia Telecommunications Group, Department of Electrical Engineering, Faculty of Industrial Technology, Institut Teknologi Sepuluh Nopember

Introduction
Sound within the human hearing range is called audio, and waves in this frequency range are called acoustic signals. Speech is an acoustic signal produced by humans.
Typical audio signal classes: telephone speech, wideband speech, and wideband audio. They differ in bandwidth, dynamic range, and in the quality listeners expect.
Some important concepts:
- sampling the analog signal in the time dimension
- quantizing the analog signal in the amplitude dimension
- the Nyquist theorem: the sampling rate must be at least twice the highest frequency present in the signal
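A minimal sketch of these three concepts in Python (the function and test signal are illustrative, not from any standard API):

```python
import numpy as np

def sample_and_quantize(signal_fn, duration_s, fs, n_bits):
    """Sample a continuous-time signal at rate fs, then quantize to n_bits."""
    t = np.arange(0, duration_s, 1.0 / fs)   # sampling: discretize time
    x = signal_fn(t)
    step = 2.0 / 2 ** n_bits                 # quantization: discretize amplitude,
    xq = np.clip(np.round(x / step) * step,  # assuming the signal lies in [-1, 1)
                 -1.0, 1.0 - step)
    return t, xq

# Nyquist theorem: fs must be at least twice the highest frequency present.
# A 3.4 kHz tone (top of the telephone band) needs fs >= 6800 Hz; telephone
# systems use 8 kHz, which satisfies this.
t, xq = sample_and_quantize(lambda t: np.sin(2 * np.pi * 3400 * t), 0.01, 8000, 8)
```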

Introduction
The frequency range is divided into:
Infrasound: 0 Hz to 20 Hz
Human hearing range: 20 Hz to 20 kHz
Ultrasound: 20 kHz to 1 GHz
Hypersound: 1 GHz to 10 THz

Multimedia systems typically use sound only within the human hearing range, sampled at rates between 8 kHz and 48 kHz. The amplitude of a sound wave is the property heard as loudness.

Introduction
SNR: the ratio of the power of the correct signal to the power of the noise; it measures the quality of the signal and is usually expressed in decibels (dB). The levels of sound we hear are also described in dB, as a ratio to the quietest sound we are able to hear. Magnitudes of common sounds, in decibels:

Threshold of hearing       0
Rustle of leaves          10
Very quiet room           20
Average room              40
Conversation              60
Busy street               70
Loud radio                80
Train through a station   90
Riveter                  100
Threshold of discomfort  120
Threshold of pain        140
Eardrum damage           160

Other related measures: SQNR (signal-to-quantization-noise ratio) and segmental SNR.
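A sketch of both measures in Python (the frame length is an arbitrary illustrative choice):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR: ratio of signal power to noise power, in decibels."""
    p_signal = np.mean(np.asarray(signal, float) ** 2)
    p_noise = np.mean(np.asarray(noise, float) ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

def segmental_snr_db(signal, noise, frame_len=256):
    """Segmental SNR: the average of per-frame SNRs, which tracks perceived
    quality better for non-stationary signals such as speech."""
    n = (len(signal) // frame_len) * frame_len
    s = np.reshape(signal[:n], (-1, frame_len))
    e = np.reshape(noise[:n], (-1, frame_len))
    per_frame = 10.0 * np.log10(np.sum(s ** 2, axis=1) / np.sum(e ** 2, axis=1))
    return float(np.mean(per_frame))
```

Since dB is logarithmic, every 20 dB step corresponds to a 10x amplitude (100x power) ratio; a 60 dB conversation therefore carries a million times the power of a sound at the 0 dB threshold of hearing.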



Introduction
An audio coder achieves its compression without making assumptions about the nature of the audio source. Instead, it exploits the perceptual limitations of the human auditory system: much of the compression comes from removing perceptually irrelevant parts of the audio signal. Removing such parts causes only inaudible distortions, so the coder can compress any signal meant to be heard by the human ear.

Introduction
Audio format: audio quality vs. data rate

Quality      Sample rate   Bits per    Mono/          Data rate             Frequency band
             [kHz]         sample      Stereo         (uncompressed)        [Hz]
Telephone    8             8           Mono           8 KB/s                200-3,400
AM radio     11.025        8           Mono           11.0 KB/s             100-5,500
FM radio     22.05         16          Stereo         88.2 KB/s             20-11,000
CD           44.1          16          Stereo         176.4 KB/s            20-20,000
DAT          48            16          Stereo         192.0 KB/s            20-20,000
DVD Audio    192 (max)     24 (max)    up to 6 ch.    1,200.0 KB/s (max)    0-96,000 (max)
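The data-rate column follows directly from sample rate, sample width, and channel count; a quick check in Python (using 1 KByte = 1000 bytes, as the table does):

```python
def data_rate_kbytes_per_s(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed data rate = samples/s * bytes/sample * channels."""
    return sample_rate_hz * (bits_per_sample / 8) * channels / 1000.0

assert data_rate_kbytes_per_s(8000, 8, 1) == 8.0       # telephone
assert data_rate_kbytes_per_s(44100, 16, 2) == 176.4   # CD
assert data_rate_kbytes_per_s(48000, 16, 2) == 192.0   # DAT
```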

Popular audio file formats: .au (Unix workstations), .aiff (Mac), .wav (PC, DEC workstations)

MIDI
MIDI (Musical Instrument Digital Interface): a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.
Two components of MIDI:
- Hardware: connects the equipment; it specifies the physical connection between musical instruments
- Data format: the messages that MIDI devices use to communicate with each other
An instrument that satisfies both components of the MIDI standard is capable of communicating with other MIDI devices through channels. The MIDI standard specifies 16 channels.

MIDI
The heart of any MIDI system is the MIDI device, e.g. a synthesizer; its main property is the maximum number of simultaneously played notes per channel, which can range from 3 to 16 notes per channel.
Common components of most synthesizers:
- Sound generators: produce the audio signal that becomes sound. By varying the voltage oscillation of the audio signal, a generator changes the quality of the sound (its pitch, loudness, and tone color) to create a wide variety of sounds and notes
- Microprocessor: sends note and sound commands to the sound generators
- Keyboard: gives the musician direct control of the synthesizer; it should span at least 5 octaves, with 61 keys

MIDI
- Control panel: controls functions that are not directly concerned with notes and durations, e.g. setting the volume
- Auxiliary controllers: modify the notes played on the keyboard; two common variables are pitch bend and modulation
- Memory: stores patches for the sound generators and settings of the control panel

MIDI messages transmit information between MIDI devices and determine what kinds of musical events can be passed from device to device. A MIDI message consists of a status byte (the first byte of any message, which describes the kind of message) followed by data bytes.
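A minimal sketch of this framing, assuming the standard channel voice status values (Note On = 0x9n, Note Off = 0x8n, where n is the channel number 0-15):

```python
def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: one status byte (high bit set),
    two data bytes (high bit clear, values 0-127)."""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Note Off uses status 0x8n with the same data-byte layout."""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x80 | channel, note, velocity])

msg = note_on(0, 60, 100)   # middle C (note 60) on channel 1, velocity 100
```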

MIDI
Classification of MIDI messages:
Channel messages: messages transmitted on individual channels rather than globally to all devices in the MIDI network.
- Channel voice messages: instruct the receiving instrument to assign particular sounds to its voices, turn notes on and off, or alter the sound of the currently active note or notes; e.g. note on, note off, control change
- Channel mode messages: determine how a receiving MIDI device responds to channel voice messages. They set the MIDI channel receiving modes for different MIDI devices, stop spurious notes from playing, and affect local control of a device; e.g. local control, all notes off, omni mode off


MIDI
System messages: carry information that is not channel specific, such as timing signals for synchronization, positioning information in prerecorded MIDI sequences, and detailed setup information for the destination device.
- System real-time messages: messages related to synchronization, e.g. system reset, timing clock (MIDI clock)
- System common messages: commands that prepare sequencers and synthesizers to play a song, e.g. song select, tune request
- System exclusive messages: messages related to things that cannot be standardized, plus additions to the original MIDI specification. Each is a stream of bytes that starts with a system-exclusive message identifying the manufacturer and ends with an end-of-exclusive message.

MIDI
General MIDI
Requirements for General MIDI compatibility:
- Support all 16 channels
- Each channel can play a different instrument/program (multitimbral)
- Each channel can play many voices (polyphony)
- Minimum of 24 fully dynamically allocated voices

MIDI + Instrument Patch Map + Percussion Key Map: a piece of MIDI music sounds the same anywhere it is played
- The instrument patch map is a standard program list consisting of 128 patch types
- The percussion key map specifies 47 percussion sounds
- Key-based percussion is always transmitted on MIDI channel 10


Psychoacoustic model
Threshold in quiet: put a person in a quiet room, raise the level of a 1 kHz tone until it is just barely audible, then vary the frequency and plot the resulting levels.

The threshold levels are frequency dependent; the human ear is most sensitive around 2-4 kHz.
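This curve is often approximated analytically; a sketch using Terhardt's formula for the threshold in quiet (the form given, e.g., in the Painter and Spanias survey listed in the references):

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The threshold dips to its minimum near 3.3 kHz, where the ear is most
# sensitive, and rises steeply at the low and high ends of the range.
print(threshold_in_quiet_db([100, 1000, 3300, 16000]))
```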

Psychoacoustic model
Frequency masking: play a 1 kHz tone (the masking tone) at a fixed level (60 dB). Play a test tone at a nearby frequency (e.g. 1.1 kHz) and raise its level until it is just distinguishable. Vary the frequency of the test tone and plot the threshold at which it becomes audible.


Psychoacoustic model
Near the masking frequency, the threshold for the test tone is much higher than the threshold in quiet. Repeating the experiment for various masking-tone frequencies yields:

Critical bands: the widths of the masking bands differ for different masking tones, increasing with the frequency of the masking tone: about 100 Hz for masking frequencies below 500 Hz, growing progressively wider above 500 Hz.

Psychoacoustic model
Temporal masking: if we hear a loud sound that then stops, it takes a little while before we can hear a soft tone nearby. Play a 1 kHz masking tone at 60 dB plus a test tone at 1.1 kHz at 40 dB; the test tone can't be heard (it is masked). Stop the masking tone, then stop the test tone after a short delay. Adjust the delay to the shortest time at which the test tone can be heard (e.g. 5 ms). Repeat with different levels of the test tone and plot:


Psychoacoustic model
Temporal masking: try other frequencies for the test tone (keeping the masking-tone duration constant). The total effect of temporal masking:


Psychoacoustic model
Perceptual audio coding
Quantization:
- The maximum quantization error of a uniform quantizer with stepsize Q is Q/2
- Removing 1 bit per sample (equivalently, doubling the stepsize) adds about 6 dB of quantization noise; a small experiment below illustrates this
Subband coding:
- Decompose the signal into separate frequency bands using a filter bank
- Quantize the samples in different bands with accuracy proportional to perceptual sensitivity
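A numerical sketch of the 6 dB-per-bit rule (the uniform test signal and bit depths are arbitrary choices):

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Uniform quantizer over [-1, 1); maximum error is about step/2."""
    step = 2.0 / 2 ** n_bits
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)

for n_bits in (8, 7, 6):
    err = x - uniform_quantize(x, n_bits)
    sqnr = 10.0 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))
    print(n_bits, round(sqnr, 1))   # SQNR drops by ~6 dB for each bit removed
```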


Psychoacoustic model
Perceptual audio coding: the quantization step size for each frequency band is set so that the quantization noise stays just below the masking level, which is determined by taking all three masking effects into account.


MPEG
MPEG (Moving Picture Experts Group): an ISO standard for high-fidelity compression of digital audio. The MPEG audio coder achieves its compression without making assumptions about the nature of the audio source; it exploits the perceptual limitations of the human auditory system.
MPEG-1 standard: defines coding standards for both audio and video, and how to packetize the coded audio and video bits to provide time synchronization.
- Total rate: 1.5 Mbps for audio and video together
- Video (352x240 pels/frame, 30 frames/s): about 30 Mbps raw, compressed to 1.2 Mbps
- Audio (2 channels, 48 ksamples/s, 16 bits/sample): 2 x 768 kbps raw, compressed to <= 0.3 Mbps


MPEG
MPEG-2: for better-quality audio and video (720x480 pels/frame). Supports one or two audio channels in one of four modes:
- Monophonic mode: a single audio channel
- Dual-monophonic mode: two independent audio channels (similar to stereo)
- Stereo mode: stereo channels with bit sharing between the channels, but no joint-stereo coding
- Joint-stereo mode: takes advantage of correlations between the stereo channels, of the irrelevancy of the phase difference between channels, or both


MPEG
MPEG-1 Audio coding block diagram:


MPEG
MPEG layers: MPEG defines 3 layers for audio. The basic model is the same, but codec complexity increases with each layer.
- The input sequence is separated into 32 frequency bands; each subband filter produces 1 output sample for every 32 input samples
- Layer 1 processes 12 samples at a time in each subband
- Layers 2 and 3 process 36 samples at a time
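The frame sizes follow from this filter bank structure; a quick computation (the 48 kHz rate used for the durations is just one of the allowed sampling rates):

```python
BANDS = 32
for layer, samples_per_band in (("Layer 1", 12), ("Layers 2/3", 36)):
    frame = BANDS * samples_per_band    # input samples per frame
    dur_ms = 1000.0 * frame / 48000     # frame duration at 48 kHz
    print(layer, frame, dur_ms)         # 384 samples / 8 ms, 1152 / 24 ms
```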


MPEG
Subband filtering and framing:


MPEG
Basic steps in the algorithm:
- Use convolution filters to divide the audio signal into frequency subbands that approximate the 32 critical bands (subband filtering)
- Determine the amount of masking for each band based on its frequency (threshold in quiet) and the energy of its neighboring bands (frequency masking); this is the psychoacoustic model
- If the energy in a band is below its masking threshold, don't encode it
- Otherwise, determine the number of bits needed to represent the coefficients in the band such that the noise introduced by quantization stays below the masking effect (recall that each bit of quantization removed adds about 6 dB of noise); a sketch of this allocation follows the list
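A hypothetical sketch of that allocation rule (allocate_bits and its 6 dB-per-bit approximation are illustrative, not the normative MPEG bit-allocation procedure):

```python
import math

def allocate_bits(band_level_db, mask_db, max_bits=16):
    """Skip bands the psychoacoustic model marks as inaudible; otherwise use
    just enough bits that the ~6 dB-per-bit quantization noise stays below
    the masking threshold."""
    if band_level_db <= mask_db:
        return 0                               # masked: don't encode this band
    snr_needed_db = band_level_db - mask_db    # noise must sit this far down
    return min(max_bits, math.ceil(snr_needed_db / 6.0))

print(allocate_bits(35, 15))   # needs 20 dB of SNR -> 4 bits
print(allocate_bits(10, 12))   # below the mask -> 0 bits
```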


MPEG
Basic steps in the algorithm (continued): format the bitstream: insert the proper headers, code the side information (e.g. the quantization scale factors for the different bands), and finally code the quantized coefficient indices, generally with variable-length encoding, e.g. Huffman coding.


MPEG
Example: assume that the levels in 16 of the 32 bands are:

Band:       1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Level (dB): 0   8   12  10  6   2   10  60  35  20  15  2   3   5   3   1

Assume that the 60 dB level in the 8th band gives 12 dB of masking in the 7th band and 15 dB in the 9th.
- The level in the 7th band is 10 dB (< 12 dB), so it is ignored
- The level in the 9th band is 35 dB (> 15 dB), so it is sent; it can be encoded with up to 2 bits' worth (= 12 dB < 15 dB) of quantization error. If the original sample is represented with 8 bits, we can therefore reduce it to 6 bits. The same check is sketched in code below.
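A sketch of that check (levels and masking values are taken from the example above):

```python
levels = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]  # dB, bands 1-16
mask = {7: 12, 9: 15}   # masking in neighboring bands from the 60 dB tone in band 8

# Band 7: 10 dB < 12 dB mask -> inaudible, not coded.
assert levels[7 - 1] < mask[7]

# Band 9: 35 dB > 15 dB mask -> coded, but noise up to 15 dB stays inaudible.
assert levels[9 - 1] > mask[9]
bits_saved = mask[9] // 6       # each dropped bit adds ~6 dB of noise
assert bits_saved == 2          # so an 8-bit sample can be sent with 6 bits
```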

MPEG
MPEG-1 audio layers: performance comparison
MPEG defines 3 layers for audio. The basic model is the same (as described thus far), but coding efficiency increases with each layer, at the expense of codec complexity.

Layer     Target bit rate   Ratio   Quality at 64 kbps   Quality at 128 kbps
Layer 1   192 kbps          4:1     -                    -
Layer 2   128 kbps          6:1     2.1 to 2.6           4+
Layer 3   64 kbps           12:1    3.6 to 3.8           4+

Quality scale: 5 = perfect, 4 = just noticeable, ..., 1 = very annoying.
Raw data rate per audio channel: 48 ksamples/s x 16 bits/sample = 768 kbps.


MPEG
At the time MPEG-1 audio was developed (finalized in 1992), Layer 3 was considered too complex to be practically useful. Today, however, Layer 3 is the most widely deployed audio coding method (known as MP3), because it provides good quality at an acceptable bit rate, and also because code for Layer 3 was distributed freely.


MPEG
Technical differences between the audio layers: the input sequence is separated into 32 frequency bands, and the output is grouped into frames of 384 samples (12 samples from each subband).
- Layer 1: DCT-type filter with one frame and an equal frequency spread per band; the psychoacoustic model uses frequency masking only
- Layer 2: uses three frames in the filter (previous, current, and next, 1152 samples in total); this models a little of the temporal masking
- Layer 3 (MP3): uses a better critical-band filter (non-equal frequency bands), a psychoacoustic model that includes temporal masking effects, takes stereo redundancy into account, and uses a Huffman coder


MPEG
MPEG-4: a newer standard, which became international in early 1999, that takes into account that a growing part of information is read, seen, and heard in interactive ways. It supports new forms of communication, in particular the Internet, multimedia, and mobile communications. MPEG-4 represents an audiovisual scene as a composition of (potentially meaningful) objects and supports the evolving ways in which audiovisual material is produced, delivered, and consumed; e.g. computer-generated content can become part of the production of an audiovisual scene. In addition, interaction with the objects in a scene is possible.
The future: MPEG-7 and MPEG-21.

References
Z.N. Li and M.S. Drew, Fundamentals of Multimedia, Pearson Prentice Hall, 2004
S. Furui, Digital Speech Processing, Synthesis, and Recognition, Marcel Dekker, Inc., 1989
R. Steinmetz and K. Nahrstedt, Multimedia: Computing, Communications & Applications, Prentice Hall PTR, 1995
B. Gold and N. Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, John Wiley & Sons, Inc., 2000
D. Pan, "A Tutorial on MPEG/Audio Compression", IEEE Multimedia, pp. 60-74, Summer 1995
P. Noll, "Digital Audio for Multimedia", Proc. Signal Processing for Multimedia, NATO Advanced Study Institute, 1999

References
T. Painter and A. Spanias, "Perceptual Coding of Digital Audio", Proc. of the IEEE, vol. 88, no. 4, April 2000
Audio Compression, http://www.cs.sfu.ca/undergrad/CourseMaterials/CMPT479/material/notes/Chap4/Chap4.3/Chap4.3.html
Multimedia Data Representation, http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap3/Chap3.1/Chap3.1.html
ISO, Overview of the MPEG-4 Standard, http://www.chiariglione.org/mpeg/standards/mpeg-4/mpeg-4.html

