

Telephony Systems
By: Dr. Raed Al-Zubi
Information: audio, image, data, and video signal
Federal Communications Commission (FCC) for long
distance traffic
Public utilities Commissions of individual states for
local services
British Telecom and Mercury communications
provide local and trunk services
In Jordan
Telecommunications Regulatory Commission (TRC)
1. International telecommunications union (ITU)
The ITU Telecommunication sector (ITU-T)
Formerly the Comité Consultatif International Télégraphique et
Téléphonique (CCITT)
Studies technical questions, operating methods and
specifications for telephony, telegraphy, and data
The ITU Radiocommunications sector (ITU-R)
Formerly the Comité Consultatif International des
Radiocommunications (CCIR)
It studies all technical and operating questions related
to radio communications, including point-to-point
communications, mobile services and broadcasting
2. International Frequency Registration Board
Associated with ITU-R.
Assignment of radio frequencies to prevent
3. International Organization for Standardization (ISO)
For many fields including Telecommunications
4. Institute of Electrical and Electronics Engineers (IEEE)
Networks Classification
1. Body Network
2. Personal Area Network (PAN): in office
3. Local Area Network (LAN): in Building (100 m),
campus (1 km)
4. Metropolitan Area Network (MAN): in city (10 km)
5. Wide Area Network (WAN): in country (100 km),
continent (1000 km)
6. The Internet: Planet
Example of MAN: cable television network
Major backbone operators are companies like AT&T and
Sprint. They Operate large international backbone
networks with thousands of routers connected by high-
bandwidth fiber optics
Routers in Regional ISP are located in different cities
Regional Cable Network (RCN) project between Jordan,
Syria, Turkey, Saudi Arabia, and UAE to establish an
optics fiber network for internet. Started in late 2010 and
expected to be launched during the third quarter of 2012.
(Jordan Times in Apr 06, 2012)
Manufacturers of IP-based telephone
systems: Aastra, Alcatel-Lucent, Avaya
Communications, Cisco Systems, Inc.,
Microsoft, Mitel Networks, NEC, Polycom,
ShoreTel, Inc., Siemens ICN, Vertical
Amazon, Apple, Google, Microsoft, and Netflix
have all developed innovative products and
software that have had enormous influence on
the Internet, consumer services, and business
Public Switched Telecommunication (Telephone) network- PSTN
CT: centre de transit
Connecting all the exchanges as a mesh
network is called junction network
Connecting all the exchanges as a star
network is done via a central switching
center called a tandem exchange
Connecting Tandem exchanges is called
trunk network (it is used between
In practice, a mixture of mesh and star
network is used
British term        North American term
Trunk network       Toll network
Trunk exchange      Toll office
Trunk               Junctor
Junction            Inter-office trunk
Local exchange      End office
Exchange            Central office
Local network       Customer loop
OSI Reference Model
The Open Systems Interconnection (OSI)
model was proposed by ISO to achieve
compatibility between different communication
devices produced by different manufacturers
The function of each device is divided into
different layers
Each layer has its own protocols, which
perform a well-defined function
OSI Model
MAC address: Media Access Control address.
Linked to the hardware of network adapters. For
this reason, the MAC address is sometimes
called the hardware address, the burned-in
address (BIA), or the physical address. It is assigned
by the manufacturer of a network interface card
(NIC) and stored in its hardware, the card's
read-only memory
IP address: is associated with software
TCP/IP model
OSI model is more general but rarely used
any more. TCP/IP model has the opposite
TCP/IP model
Internet layer: permits hosts to inject packets into any
network and have them travel independently to the
destination (potentially on different network)
It contains more complicated protocols than OSI model
ARPANET (a research network sponsored by the DoD)
connected hundreds of universities and government
installations using leased telephone lines. When satellite
and radio networks were added later, a new reference
model was required to connect them
Fiber-optic cabling, multi-core processors,
and low-cost memory are the building blocks of
modern networks. They enable networks to carry
more information, faster
The introduction of fiber cabling by MCI (long-
distance company for fiber) was in 1983 for intercity
The most significant advantage of fiber-optic cabling
is its enormous capacity. Also, Non-electric signals
(light) can travel 80 miles before having to be
But data over copper cabling are subject to fading over relatively short distances
(1.5 mile). Consequently, amplifiers are needed every mile and a half to boost
the electrical signals carried on copper-based networks
Also, copper cabling is heavier, and has less capacity than fiber cabling. Also,
Signals transmitted via copper react to electrical interference or noise on the
line. Interference from nearby wires is called crosstalk and can be reduced by
twisting each insulated copper wire of a two-wire pair
Fiber optics and its associated electronics have evolved to the point where a
consortium of companies including Google, Japanese carrier KDDI, Singapore
Telecommunications, and India's Reliance Globalcom is constructing and will
operate a six-pair fiber undersea cable with a capacity of 17 terabits per second
(Tbps). (One terabit equals 1,000 Gb.) That's fast enough to transmit every book
in the British Library 20 times per second
Wavelength-division multiplexing further expanded
fiber's capacity. These multiplexers essentially split a
single fiber into numerous channels, each able to
transmit a high-speed stream of light pulses, as
shown in Figure 1-1. The current generation of
multiplexers is capable of transmitting up to 88
channels of information, each operating at 100 Gbps
The undersea cable will run from Singapore to Japan, with
extensions to Hong Kong, Indonesia, the Philippines, Thailand,
and Guam. At the time of this writing, it was scheduled to begin
operation sometime in 2012
- The development of the erbium-doped fiber amplifier (EDFA) made DWDM
- C band EDFAs operate from 1530 nm to 1560 nm.
- The bandwidth of a C Band system is 4 trillion Hz
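The 4 trillion Hz figure can be checked from the band edges quoted above. A minimal sketch, assuming vacuum wavelengths of 1530 nm and 1560 nm; the function name is illustrative, not taken from the slides.

```python
# Sketch: optical bandwidth of a wavelength band (assumes vacuum wavelengths).
C_M_PER_S = 299_792_458  # speed of light in vacuum

def band_bandwidth_hz(lambda_short_nm, lambda_long_nm):
    """Frequency span between two wavelengths given in nanometres."""
    f_high = C_M_PER_S / (lambda_short_nm * 1e-9)  # shorter wavelength, higher frequency
    f_low = C_M_PER_S / (lambda_long_nm * 1e-9)
    return f_high - f_low

c_band_hz = band_bandwidth_hz(1530, 1560)
print(f"C band bandwidth: {c_band_hz / 1e12:.2f} THz")
```

The result is a little under 4 THz, which matches the rounded value on the slide.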
Faster, Lower-Priced Processors
They enable networks to process multiple
streams of light signals simultaneously
95 percent of mobile devices sold worldwide are
equipped with ARM chips. This architecture now
incorporates 32-bit processing (the ability to
process data in chunks of 32 bits), which means
that they process data faster. Moreover, they are
small and inexpensive, and they use only small
amounts of power.
Channel Access Methods
FDMA: Frequency division multiple access
TDMA: Time division multiple access
CDMA: Code division multiple access
SDMA: (Space-Division Multiple Access) using directional antenna,
power control.
PDMA: (Polarization division multiple access) Separate antennas are
used in this type, each with different polarization and followed by
separate receivers, allowing simultaneous regional access of satellites.
Each participating earth station with an antenna that has dual
PAMA: Pulse-address multiple access. It enables the ability of a
communication satellite to receive signals from several Earth terminals
simultaneously and to amplify, translate, and relay the signals back to
Earth, based on the addressing of each station by an assignment of a
unique combination of time and frequency slots.
Random Access: CSMA, ALOHA (pure ALOHA and slotted ALOHA)
Multiplexing combines traffic from multiple devices
or people into one stream so that they can share a
channel or path through a network
Types of Multiplexing:
Space-division multiplexing
Frequency-division multiplexing
Time-division multiplexing
Code-division multiplexing
4 KHz Human Voice Channel
Note: In the United States, AT&T designed its FDM systems to
handle the band of signals between 200 and 3400 Hz (Bw = 3200 Hz)
12 channels per group
5 groups per supergroup
5 super groups per mastergroup
3 master groups per supermastergroup
900 Hz guardband between channels
Single Sideband suppressed carrier modulation SSB
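The multipliers in the hierarchy above compound from level to level. A small sketch of the arithmetic, assuming a nominal 4 kHz per channel and ignoring any extra guard bands between groups:

```python
# Sketch: voice channels at each level of the FDM hierarchy listed above.
CHANNEL_BW_HZ = 4_000  # nominal bandwidth of one voice channel

# (level name, how many units of the previous level it combines)
levels = [("group", 12), ("supergroup", 5), ("mastergroup", 5), ("supermastergroup", 3)]

channels = 1
capacity = {}
for name, multiplier in levels:
    channels *= multiplier
    capacity[name] = channels
    print(f"{name}: {channels} channels, {channels * CHANNEL_BW_HZ / 1e3:.0f} kHz nominal")
```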
Typically, there are 64 or 128 chips per bit
Each station is assigned a unique m-bit code
called a chip sequence
if station A is assigned the chip sequence
00011011, it sends a 1 bit by sending 00011011
and a 0 bit by sending 11100100
It is known how to generate such
orthogonal chip sequences using a method
known as Walsh codes.
Inner product of two chip codes is zero
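The chip-sequence idea can be sketched in a few lines. Station A's code is the one given above; the codes for B and C are illustrative choices that happen to be orthogonal to A and to each other. Chips are mapped to +1/-1 so that the inner product of different codes is zero.

```python
# Sketch of CDMA encoding and decoding with bipolar chip sequences.
# Station A's code is from the text; B and C are illustrative assumptions.

def bipolar(bits):
    """Map a chip string such as '00011011' to +1/-1 values (0 -> -1, 1 -> +1)."""
    return [1 if b == "1" else -1 for b in bits]

CODES = {
    "A": bipolar("00011011"),
    "B": bipolar("00101110"),
    "C": bipolar("01011100"),
}

def inner(u, v):
    """Normalized inner product of two chip vectors."""
    return sum(x * y for x, y in zip(u, v)) / len(u)

def transmit(bits_by_station):
    """Add up all chips on the shared channel.
    Bit 1 sends the code, bit 0 sends the negated code, None means silent."""
    m = len(next(iter(CODES.values())))
    total = [0] * m
    for station, bit in bits_by_station.items():
        if bit is None:
            continue
        sign = 1 if bit == 1 else -1
        total = [t + sign * c for t, c in zip(total, CODES[station])]
    return total

def decode(channel, station):
    """Recover one station's bit: +1 -> 1, -1 -> 0, 0 -> station was silent (None)."""
    score = inner(channel, CODES[station])
    return {1: 1, -1: 0}.get(round(score))

channel = transmit({"A": 1, "B": 0, "C": None})
print(decode(channel, "A"), decode(channel, "B"), decode(channel, "C"))
```

Because the codes are orthogonal, each receiver sees only the contribution of the station whose code it correlates with.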
In RX: sampling pulses should be synchronized with that
in TX
In Telephone system: sampling rate is 8 kHz (2 x 4 kHz,
the Nyquist rate). So 8k pulses/second
Possible modulation methods: PAM-pulse amplitude
modulation, PWM-pulse width modulation , PPM-pulse
position modulation
Problems: attenuation and delay cause dispersion of
the transmitted pulses. So pulses interfere with other
pulses of adjacent channels. Hence, inter-channel
Solution: Pulse Code Modulation
A/D: an analog level of voltage is
converted to a group of bits (word=32 bits
or byte= 8 bits)
In telephone system: 8 bit encoding is
used (256 levels).
8 K pulse /sec ( 1 pulse = 8 bits)
Bit rate = 8 k x 8 bits/sec = 64 kbps (= baud
rate, since each bit is one symbol).
Nyquist showed that the minimum bandwidth
needed to transmit a digital signal at B bauds is
B/2 Hz.
So, in telephone system: minimum BW = 32 kHz
(but for analog, BW = 4 kHz)
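The PCM figures above follow from three numbers. A short sketch of the arithmetic, assuming binary signalling (one bit per symbol); the variable names are illustrative:

```python
# Sketch: PCM bit rate and Nyquist minimum bandwidth for one telephone channel.
VOICE_BW_HZ = 4_000   # nominal analog voice channel
BITS_PER_SAMPLE = 8   # 8-bit encoding

sampling_rate = 2 * VOICE_BW_HZ              # samples per second
quantization_levels = 2 ** BITS_PER_SAMPLE   # distinct amplitude levels
bit_rate = sampling_rate * BITS_PER_SAMPLE   # bits per second
min_bandwidth_hz = bit_rate / 2              # Nyquist: B bauds need B/2 Hz

print(sampling_rate, quantization_levels, bit_rate, min_bandwidth_hz)
```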
PCM introduces quantization distortion which is
not found in analog transmission.
If quantization is done using uniform size steps,
then the quantization error is high relative to
low-level signals. So, non-uniform size steps are used.
Alternatively, we can use uniform quantization together
with companding (compression and expanding)
Compression: reduces the dynamic range of the
analog signal such that the quantization process
results in a good SNR.
Two companding laws were standardized by
CCITT: A-law in European system, mu-law used
in America and Japan system
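The effect of companding on a quiet sample can be sketched with the continuous mu-law curve. This is an illustration under stated assumptions, not the segmented 8-bit encoding used by real codecs: mu = 255, samples normalized to [-1, 1], and a simple mid-rise uniform quantizer. All function names are illustrative.

```python
import math

MU = 255  # assumed value, commonly used with 8-bit mu-law telephony

def compress(x):
    """Continuous mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_uniform(x, bits=8):
    """Mid-rise uniform quantizer on [-1, 1)."""
    step = 2.0 / (2 ** bits)
    half = 2 ** (bits - 1)
    index = min(max(math.floor(x / step), -half), half - 1)
    return (index + 0.5) * step

quiet = 0.01  # a low-level sample, where uniform steps are relatively coarse
err_uniform = abs(quantize_uniform(quiet) - quiet)
err_companded = abs(expand(quantize_uniform(compress(quiet))) - quiet)
print(f"uniform error:   {err_uniform:.6f}")
print(f"companded error: {err_companded:.6f}")
```

For this quiet sample the companded path gives a noticeably smaller error with the same 8 bits, which is the motivation for companding.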
TDM and VoIP
Once a connection is established, capacity
is saved even when the device is not
sending information. But there are small
slices of silence in voice (wasting network
capacity). This is the reason TDM is being
gradually replaced in high-traffic portions
of networks by Voice over Internet
Protocol (VoIP) technologies
Statistical Multiplexers
Unlike TDMs, statistical multiplexers do not guarantee
capacity for each device connected to them. Rather, they
transmit voice, data, and images on a first-come, first-
served basis, as long as there is capacity
Statistical multiplexers support more devices than TDMs
because they don't need to save capacity when a device
is not active. They can be used in a Wide Area Network
(WAN) to connect customers to the Internet. Customers
who contract for more costly, high-priority service can
obtain higher speeds than customers with lower-priority
service during traffic spikes
PCM Primary Multiplex Group
European 30-channel system (E1-
European system 1)
24- channel system (T1-transmission
system 1) used in North America and
Note: T0 and E0- are used for one channel
(64 Kbps)
What about E2, E3, or T2, T3
E1 System
Uses A-law companding
Bit rate = 32*8 bits / 125 microseconds = 2.048 Mbps
T1 System
Uses mu-law companding
Frame = 1/fs = 1/8 kHz = 125 microseconds
Number of bits = 1 (synch) + 8*24 = 193
Bit rate = 193 bits / 125 microseconds = 1.544 Mbps
Digit 8 of each channel in every sixth frame is used
for signaling for that channel
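Both primary rates come from the same calculation: bits per frame divided by the 125-microsecond frame time. A minimal sketch, assuming 32 eight-bit time slots for E1 (30 speech slots plus 2 for synchronization and signaling) and 24 eight-bit slots plus one framing bit for T1; the function name is illustrative.

```python
# Sketch: primary multiplex line rates from the frame layout.
FRAME_US = 125  # one frame per sampling interval at 8 kHz

def line_rate_mbps(time_slots, bits_per_slot=8, framing_bits=0):
    """Bits per frame divided by the frame duration in microseconds gives Mbit/s."""
    bits_per_frame = time_slots * bits_per_slot + framing_bits
    return bits_per_frame / FRAME_US

e1_mbps = line_rate_mbps(32)                   # 256 bits per frame
t1_mbps = line_rate_mbps(24, framing_bits=1)   # 193 bits per frame
print(f"E1: {e1_mbps:.3f} Mbps, T1: {t1_mbps:.3f} Mbps")
```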
Digital Hierarchy in E1 and T1
The plesiochronous digital hierarchy-PDH (old)
The synchronous digital hierarchy-SDH (new)
The synchronous optical network SONET
PDH: the timing and clocking information are
contained within the digital bit stream and thus
this system is self-synchronized or
SDH/SONET: the timing and clocking information
are obtained from a highly accurate master clock
European PDH (Ex)
North American PDH (Tx or DSx)
Some bits are added for frame
alignment and justification
In North America and Japan, T3 carries
672 conversations over one line at a
speed of 45Mbps. T3 is used for large
enterprises, call centers, and Internet
access. Small and midsize organizations
commonly use T1 for Internet access
PDH Multiplex mountain
They are standardized protocols that
transfer multiple digital bit streams over
optical fiber
Developed to replace the PDH system for
transporting large amounts of telephone
calls and data traffic over the same fiber
without synchronization problems
(synchronization sources of various
circuits were different)
SONET in the United States and Canada,
and SDH in the rest of the world. Although
the SONET standards were developed
before SDH, it is considered a variation of
SDH because of SDH's greater worldwide
market penetration
STM-1: Synchronous Transport Module, level 1
STS-1: Synchronous Transport Signal, level 1
Advantages of Digital Transmission
The same digital equipment can be used to
process all types of digital sources
Digital signals are highly resistant to crosstalk.
The crosstalk is most annoying when the two
parties of a call are not talking and can hear and
understand another call. In digital, it is random.
Signaling is made simpler and cheaper
Low cost
Advantages of Digital Transmission
Easy to multiplex and the ability to mix
voice, video, photographs, and e-mail on
the same transmission enables networks
to transmit more data
Better performance in the presence of
Higher speeds: It is faster to re-create
binary digital ones and zeros than more
complex analog wavelengths
Disadvantages of Digital
Higher bandwidth
The information capacity of digital system
is limited
Shannon limit for information capacity
C = B log2(1 + S/N) = 3.32 B log10(1 + S/N), C: bps, B:
bandwidth in Hz, S/N: signal-to-noise power ratio
A/D and D/A are required
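The Shannon limit above can be evaluated directly. A small sketch; the 3100 Hz bandwidth and 30 dB SNR are assumed example values for a voice-grade line, not figures from the slides. The second function uses the 3.32 x log10 form to show it is a close approximation of log2.

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N), with S/N converted from dB to a power ratio."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

def shannon_capacity_approx_bps(bandwidth_hz, snr_db):
    """Same limit written as 3.32 * B * log10(1 + S/N)."""
    snr_linear = 10 ** (snr_db / 10)
    return 3.32 * bandwidth_hz * math.log10(1 + snr_linear)

c_exact = shannon_capacity_bps(3100, 30)
c_approx = shannon_capacity_approx_bps(3100, 30)
print(f"exact: {c_exact:.0f} bps, 3.32*log10 form: {c_approx:.0f} bps")
```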
Selected Topics As Assignment
Simulation of TCP (using MATLAB or other programs)
Simulation of statistical multiplexer (using MATLAB,
Arena, or other programs)
Cognitive radio
Routing protocols
Simulation of wireless network
Selected Topic in Routing
Dijkstra Algorithm (Step 1)
Dijkstra Algorithm (Step 2)
Dijkstra Algorithm (Step 3)
Dijkstra Algorithm (Step 4)
Dijkstra Algorithm (Step 5)
Dijkstra Algorithm (Step 6)
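The step-by-step figures are not reproduced in this text, so the following is a generic sketch of Dijkstra's shortest-path algorithm using a priority queue. The five-router topology and its link costs are hypothetical and do not correspond to the graph on the slides.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, prev): shortest distances from source and predecessor links."""
    dist = {node: float("inf") for node in graph}
    prev = {node: None for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < dist[v]:
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev

def path_to(prev, target):
    """Walk predecessor links back from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]

# Hypothetical topology with symmetric link costs.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8, "E": 10},
    "D": {"B": 5, "C": 8, "E": 2},
    "E": {"C": 10, "D": 2},
}
dist, prev = dijkstra(graph, "A")
print(dist)
print(path_to(prev, "E"))
```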
Network Performance Metrics
Packet loss: This refers to the packets that are dropped
when there is network congestion. Voice conversations
break up when packet loss is too high
Latency: This term refers to delays (in milliseconds) that
are incurred when voice packets traverse the network.
Latency results in long pauses within conversations, and
clipped words
Throughput: measures actual user data transmitted
over a fixed period of time
Bit error rate
Fairness: to determine whether users or applications
are receiving a fair share of system resources
Jain's fairness index (1/n worst , 1 best)
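The bounds quoted for Jain's index follow from its usual definition, J = (sum of x_i)^2 / (n * sum of x_i^2), where x_i is the throughput or resource share of user i. A minimal sketch with made-up allocations:

```python
def jain_index(allocations):
    """Jain's fairness index: 1/n when one user gets everything, 1 when all are equal."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    return (total * total) / (n * squares)

print(jain_index([5, 5, 5, 5]))    # all users equal
print(jain_index([10, 0, 0, 0]))   # one user takes everything
print(jain_index([8, 4, 2, 2]))    # somewhere in between
```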
Network Performance Metrics
Jitter: results in noisy calls that contain pops
and clicks or crackling sounds
The instants at which pulses are retransmitted by a repeater
are determined by a local oscillator synchronized to the digit
rate, which must be extracted from the received waveform.
Variations in the extracted frequency can cause periodic
variations of the times of the regenerated pulses, which is
called jitter
Wander: the variation in the times of the
regenerated pulses due to changes in
propagation time
Network Performance Metrics
Power levels: usually SNR in dB is used
Network Performance Metrics
Power levels
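Gain in decibels: $G = 10 \log_{10}(P_2 / P_1)$ dB
dBm: $G = 10 \log_{10}(P / 1\,\mathrm{mW})$ dBm,
used to indicate power levels relative to 1 mW
Example: 1 W = 30 dBm
dBW: $G = 10 \log_{10}(P / 1\,\mathrm{W})$ dBW,
used to indicate power levels relative to 1 W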
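The decibel units above differ only in their reference power. A short sketch of the conversions, using 10 log10 of a power ratio; the function names are illustrative:

```python
import math

def gain_db(p_out_w, p_in_w):
    """Power gain in dB between two power levels in watts."""
    return 10 * math.log10(p_out_w / p_in_w)

def watts_to_dbm(p_w):
    """Power level relative to 1 mW."""
    return 10 * math.log10(p_w / 1e-3)

def watts_to_dbw(p_w):
    """Power level relative to 1 W."""
    return 10 * math.log10(p_w / 1.0)

print(watts_to_dbm(1.0), watts_to_dbw(1.0), gain_db(2.0, 1.0))
```

A level in dBm is always 30 dB higher than the same level in dBW, since 1 W is 1000 mW.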
Network Performance Metrics
Relative level of a signal at any point in the
system with respect to its level at the
reference point is denoted by dBr
Signal level in terms of the corresponding
level at the reference point is denoted by
dBm0: dBm0 = dBm - dBr
Network Performance Metrics
Echo: is the annoying effect of hearing
your voice repeated. This is usually
corrected during installation by special
echo-canceling software
In analog system: Echo is produced due to
amplification in two directions.
Network Performance Metrics
2-wire / 4-wire circuit
Using Amplifier in one direction cannot
pass signal in the second direction
Using two amplifiers causes continuous
oscillation called singing
So, we use hybrid transformer, but the
price is 3 dB loss in each direction
Imperfect line balance causes part of the
signal transmitted in one direction to return
in the other. This is called echo.
2-wire / 4-wire circuit
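Total attenuation from one two-wire circuit
to the other is
$L_2 = 6 - G_4$ dB
where $G_4$ is the net gain of one side of the 4-wire circuit.
Transhybrid loss (TL): is the attenuation
through the hybrid transformer from one
side of the 4-wire circuit to the other.
TL = 6 + B dB, where
$B = 20 \log_{10} \left| \frac{N + Z}{N - Z} \right|$ dB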
2-wire / 4-wire circuit
B: balance return loss due to impedance
mismatch between 2-wire line and the
balance network
Z: impedance of the 2 wire line
N: impedance of the balance network
The attenuation of the echo that reaches
the talker's 2-wire line
$L_t = 3 - G_4 + (B + 6) - G_4 + 3 = 2L_2 + B$ dB
2-wire / 4-wire circuit
The attenuation of the echo that reaches the
listener's 2-wire line
$L_l = (B + 6) - G_4 + (B + 6) - G_4 = 2L_2 + 2B$ dB
We can control echo by applying loss when 2T4
< 40 msec (increasing L2 by increasing the
length but this increases delay)
For 2T4 > 40 msec, we use echo suppressor
(electronic device)
Echo suppressor
2-wire / 4-wire circuit
Singing point of a circuit is the maximum gain S
that can be obtained from 2-wire to 2-wire line
without producing singing
Stability margin is the maximum amount of
additional gain M that can be introduced (equally
and simultaneously) in each direction of the
transmission without causing singing
$L_s - 2M = 0$ (where $L_s$ is the net loss around the singing loop)
$2(B + L_2) - 2M = 0$
$M = B + L_2$ dB
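The echo and stability relations for the 2-wire / 4-wire circuit can be put together numerically. This sketch assumes the usual textbook forms: L2 = 6 - G4, B = 20 log10 |(N + Z)/(N - Z)|, talker echo loss 2*L2 + B, listener echo loss 2*L2 + 2B, and stability margin M = B + L2. The impedances and the 3 dB net gain are made-up example values.

```python
import math

def balance_return_loss_db(z, n):
    """B for line impedance Z and balance-network impedance N (real or complex)."""
    return 20 * math.log10(abs((n + z) / (n - z)))

def two_wire_loss_db(g4_db):
    """L2: 3 dB in each hybrid minus the net gain of one side of the 4-wire circuit."""
    return 6 - g4_db

def talker_echo_loss_db(b_db, g4_db):
    """Echo path back to the talker: one reflection at the far hybrid."""
    return 2 * two_wire_loss_db(g4_db) + b_db

def listener_echo_loss_db(b_db, g4_db):
    """Echo path to the listener: reflections at both hybrids."""
    return 2 * two_wire_loss_db(g4_db) + 2 * b_db

def stability_margin_db(b_db, g4_db):
    """Extra gain per direction that can be added before singing starts."""
    return b_db + two_wire_loss_db(g4_db)

# Assumed example: 900-ohm line, 600-ohm balance network, 3 dB net 4-wire gain.
B = balance_return_loss_db(z=900, n=600)
G4 = 3
print(f"B = {B:.2f} dB, L2 = {two_wire_loss_db(G4)} dB")
print(f"talker echo loss   = {talker_echo_loss_db(B, G4):.2f} dB")
print(f"listener echo loss = {listener_echo_loss_db(B, G4):.2f} dB")
print(f"stability margin   = {stability_margin_db(B, G4):.2f} dB")
```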
Transmission Performance In
Telephone System
A rating system standardized by CCITT to
grade customer satisfaction is called the Overall
Loudness Rating (OLR) or Overall Reference
Equivalent (ORE), in dB
ORE= TRE + RRE + losses dB
TRE: transmit reference equivalent
RRE: receive reference equivalent
-ve dB: the system is better than the reference
Transmission Performance In
Telephone System
TRE and RRE are measured using a reference
system called NOSFER in the ITU lab in Geneva
How the Telephone Works !!!
The first Telephone
In 1876, Alexander Graham Bell made the
first Telephone, called the Bell Telephone
The same equipment is used at TX and RX
Types of switching:
Circuit switching (space-division switching or
analog switching)
Message switching (delay or queuing system)
Packet switching (time-division switching or
digital switching)
Circuit Switching
A physical path is established in advance
between the sender and the receiver and
this path is reserved for only one call (so,
for voice network)
Circuit Switching
It is an example of a lost-call system (the
call cannot be stored as in message switching)
In old systems: manual switching was
done by operator
Then, automatic systems were used:
The step-by-step switching system
The crossbar switching system
Step-by-step Switching System
Uses the two-motion selector which was invented by
Almon B. Strowger
Had a lifetime of nearly 100 years
It is the first automated switching system
Crossbar Switching System
Packet Switching
Dividing the long messages into packets is useful to solve the
problem of queuing messages with different lengths
Used in internet
Developed in 1969 for ARPANET (Advanced Research Projects
Agency Network). So, ARPANET was the forerunner of today's Internet
The Department of Defense wanted a more reliable network with
route diversity capability. In a national emergency such as the
September 11, 2001 attacks in the United States on the Pentagon in
Washington, DC, and the World Trade Center in New York City, the
Internet still functioned when many portions of the public voice and
cellular networks were either out of service or so overwhelmed with
traffic that people could not make calls
Packet Switching
Timing of events in circuit switching, message switching,
packet switching
Deep Packet Inspection
DPI is software that can be installed as part of routers or in
stand-alone switches connected to a network in order to
manage and understand network traffic
For the most part, DPI examines the content in the headers of
packets rather than user content. But also it can examine the
data and take a copy of it
DPI can be used to: Prioritize traffic, Maintain control over
proprietary information, Protect networks against hackers,
Block traffic to certain sites, Plan network capacity
requirements, track terrorists or people critical of the
It can distinguish Voice over Internet Protocol (VoIP) traffic
from data, gaming, and video traffic so that video or gaming
traffic within a carrier network can be given the treatment
required to maintain optimum performance
Deep Packet Inspection
DPI software develops a database of patterns, also
referred to as signatures. Each signature or pattern is
associated with a particular application such as peer-to-
peer music sharing or protocols such as VoIP or traffic
from certain hackers
In 2009, several music companies sued a 25-year-old
graduate student for illegal music downloads and won a
$670,000 judgment. In another case, music companies
won a $1.2 million judgment case against a single
DPI equipment can be used to monitor e-mail messages
by detecting keywords. For example, during 2009
presidential elections in Iran
Types of Services
Connection-oriented service is modeled after
the telephone system. The user first establishes
a connection, uses the connection, and then
releases the connection
Connectionless service is modeled after the
postal system. Each message (letter) carries the
full destination address, and each one is routed
through the system independent of all the others
Other Types of Services
Refers to the exchange of control information
associated with the establishment of a telephone
call on a telecommunications circuit.
Example: digits dialed by the caller
Two types of signaling:
Channel Associated Signaling (CAS): when the
signaling is performed on the same circuit that carries
the conversation of the call. This is the case for earlier
analog trunks
Common Channel Signaling (CCS): the path and
facility used by the signaling is separate and distinct
from the telecommunication channels that carry the
telephone call. (controls multiple data channels)
Channel Associated Signaling
Common Channel Signaling
Advantages of CCS:
Information can be exchanged more rapidly
Signals relating to a call can be sent while the call is
in progress. This enables customers to alter the
connections after they have been set up. For
example, a customer can transfer a call elsewhere, or
request a third party to be connected into an existing
Signals can be exchanged between processors for
functions other than call processing, for example for
maintenance or network-management purposes.
In CAS, the successful exchange of signals over a circuit
proves that the circuit is working. CCS does not
inherently provide this checking facility, so a separate
means (e.g., automatic routine testing) must be provided
to ensure the integrity of the speech circuits.
In CCS, each message must contain a label called the
circuit identity code that indicates to which speech circuit
and thus to which call it belongs. So, no connection is
required to an incoming junction before an address
signal is received. The address signal can therefore be
the first message sent.
A signaling link operating at 64 Kbps normally provides
signaling for up to 1000 to 1500 speech circuits.
Signaling Networks
Associated Signaling
Signaling Networks
Non-associated Signaling:
Signaling Networks
Quasi-associated signaling: used when there
are few circuits between A and B and thus little signaling
traffic between them. It is normally provided in case the
associated-signaling link fails
Customer Line Signaling
Pulse dialing: rapidly disconnecting and
re-connecting the calling party's telephone
line. Similar to flicking a light switch on and
off. Used to determine the dialed number
Dual-tone multi-frequency signaling
(DTMF): used by push-button telephone.
Sends each digit as a combination of two
frequencies, why?
It generates a sinusoidal tone which is mixture of the row and column frequencies
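The row-plus-column idea can be sketched as follows. The frequency table is the standard DTMF keypad assignment; the 50 ms duration, 8 kHz sample rate, and equal amplitudes are assumptions for illustration, and the function names are not from the slides.

```python
import math

ROWS_HZ = (697, 770, 852, 941)        # low-group (row) frequencies
COLS_HZ = (1209, 1336, 1477, 1633)    # high-group (column) frequencies
KEYPAD = ("123A", "456B", "789C", "*0#D")

def dtmf_frequencies(key):
    """Return the (row, column) frequency pair for one keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROWS_HZ[r], COLS_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")

def dtmf_tone(key, duration_s=0.05, sample_rate=8000):
    """Samples of the two-tone signal: sum of the row and column sinusoids."""
    f_row, f_col = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * f_row * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * f_col * t / sample_rate)
        for t in range(n)
    ]

samples = dtmf_tone("5")
print(dtmf_frequencies("5"), len(samples))
```

Using one frequency from each group means a receiver can reject a digit unless exactly one row tone and one column tone are present, which makes accidental detection from speech less likely.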
FDM Signaling
Out-band signaling
is done on a channel that is dedicated for the purpose
and separate from the channels used for the
telephone call. Out-of-band signaling is used in
Signaling System 7 (SS7), the latest standard for the
signaling that controls the world's phone calls.
Example: send signaling information in the guard
band in the telephone channel
Disadvantage: all the routes in the network must use
out-band signaling (practical problem)
Out-band Signaling
FDM Signaling
In-band Signaling:
called voice frequency signaling (VF)
Signaling is done within the band of the data
Advantage: independent of the transmission
system and can work over any circuit
Example: DTMF tones
Voice Frequency Signaling System
VF Receiver
PCM Signaling
30-channel system:
PCM Signaling
24-channel system:
In every sixth frame, the eighth bit for each
channel is used for signaling instead of
speech. This has been found to cause a
negligible increase in quantization
Signalling System No. 7 (SS7)
Before SS7:
CCITT R1 (region1): national level, CAS system, Multi-frequency, USA
CCITT R2 (region2): national level, CAS system, Multi-frequency, Europe
SS5: both national and international, CAS system, In-band signaling
SS6: both national and international, out-band signaling and it is the first CCS
system. Had a restricted 28-bit signal unit that was both limited in function and
not suitable for digital systems
Uses out-band signaling and it is CCS system
Called CCITT signaling system No. 7
Called Common Channel Signaling System 7
Is a set of telephony signaling protocols which are used to set up most of the
world's PSTN telephone calls
Its main purpose is to set up and tear down telephone calls, but it is also used
for other services such as prepaid billing and SMS
Signalling System No. 7 (SS7)
Signal messages are passed from the central processor
of the sending exchange to the CCS system.
The Signaling-control subsystem structures the
messages in the appropriate format and queues them for
transmission. When there are no messages to send, it
generates filler messages to keep the link active.
Messages then pass to the signaling termination
subsystem, where complete signal units (SU) are
assembled using sequence numbers and check bits
generated by the error-control subsystem.
At the receiving terminal, the reverse sequence is carried out.
SS7 and OSI Model
SS7 can be modeled as a stack of
protocols like OSI model
SS7 consists of:
Level 1: the physical level
Level 2: The data-link level
Level 3: The signaling-network level
Level 4: The user part
SS7 and OSI Model
Level 1: performs the functions of sending bit
streams over a physical path
Level 2: performs the functions of error control,
error rate monitoring, flow control and
delineation of messages
Level 3: provides the functions required for a
signaling network. Each node in the network has
a signal-point code, which is a 14-bit address.
Every message contains the point code of the
transmitter and receiver.
Level 4: provides the functions required to
provide different services to the user.
MTP: Message Transfer Part
ISDN User Part (ISUP) : The ISDN User Part (ISUP) defines the
protocol used to set-up, manage, and release trunk circuits that carry
voice and data between terminating line exchanges (e.g., between a
calling party and a called party).
Telephone User Part (TUP): In some parts of the world (e.g., China,
Brazil), the Telephone User Part (TUP) is used to support basic call
setup and tear-down. TUP handles analog circuits only. In many
countries, ISUP has replaced TUP for call management.
Signaling Connection Control Part (SCCP): SCCP provides
connectionless and connection-oriented network services. SCCP is used
as the transport layer for TCAP-based services.
Transaction Capabilities Applications Part (TCAP): TCAP supports
the exchange of non-circuit related data between applications across the
SS7 network using the SCCP connectionless service. For example,
TCAP carries Mobile Application Part (MAP) messages sent between
mobile switches and databases to support user authentication,
equipment identification, and roaming.
Speech Coding
Narrowband speech codecs: used to give an
efficient digital representation of telephone
bandwidth speech
An ideal speech codec will represent this speech
with as few bits as possible, while producing
reconstructed speech which sounds identical, or
almost identical, to the uncoded speech
In practice there is always a trade-off between
the bit rate of the codec and the quality of its
reconstructed speech
The Basic Properties of Speech
Speech is produced when air is forced from the lungs
through the vocal cords and along the vocal tract.
An important part of many speech codecs is the
modelling of the vocal tract as a short term filter
The spectral peaks of the sound spectrum |P(f)| are
called formants
formants are controlled by the shape of the tract and
they are the poles of the short term filter
As the shape of the vocal tract varies relatively slowly,
the transfer function of its modelling filter needs to be
updated only relatively infrequently (typically every 20
ms or so)
The Basic Properties of Speech
Speech sounds can be broken into three
classes depending on their mode of
excitation :
1. Voiced sounds are produced when the vocal
cords vibrate open and closed, thus
interrupting the flow of air from the lungs to
the vocal tract and producing quasi-periodic
pulses of air as the excitation. The rate of the
opening and closing gives the pitch of the sound.
-Voiced sounds show a high degree of periodicity at the pitch period,
which is typically between 2 and 20 ms.
- These figures show a segment of voiced speech sampled at 8 kHz.
Here the pitch period is about 8 ms or 64 samples.
The Basic Properties of Speech
2. Unvoiced sounds result when the excitation is a
noise-like turbulence produced by forcing air at high
velocities through a constriction in the vocal tract
while the glottis is held open. Such sounds show
little long-term periodicity as can be seen from next
figures, although short-term correlations due to the
vocal tract are still present
3. Plosive sounds result when a complete closure is
made in the vocal tract, and air pressure is built up
behind this closure and released suddenly
The Basic Properties of Speech
The shape of the vocal tract and its mode
of excitation change relatively slowly, and
so speech can be considered to be quasi-
stationary over short periods of time (of
the order of 20 ms)
Speech coders attempt to exploit this
predictability in order to reduce the data
rate necessary for good quality voice
Commonly Used Speech Codecs
Divided into three classes: waveform
codecs, source codecs, and hybrid codecs
Waveform Codecs
They are signal independent and work well with
non-speech signals
They are low complexity codecs which produce
high quality speech at rates above about 16 kbits/s
Time-domain waveform codecs
PCM (Pulse Code Modulation):
64 kbps; this is the simplest waveform codec.
PCM codecs have the advantages of low complexity and delay with
high quality reproduced speech, but require a relatively high
bit rate.
Waveform Codecs
DPCM (Differential PCM):
It utilizes the correlations present in speech samples due to
the effects of the vocal tract and the vibrations of the vocal
cords.
It quantizes the difference between the original and predicted
signals.
48 kbps
ADPCM (Adaptive DPCM):
Adapts the quantization step to the difference (small step for
small difference and large step for big difference)
32 kbps (speech quality very similar to the 64 kbits/s PCM codecs)
Later ADPCM codecs operating at 16, 24 and 40 kbits/s were
also standardized.
Used in VoIP
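Quantizing the difference between each sample and its prediction can be sketched with the simplest possible predictor: the previously reconstructed sample. This is an illustrative first-order scheme with a fixed step size, not a standardized codec. An adaptive version would change `step` from sample to sample depending on the size of recent differences. The signal values are made up.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the predicted value."""
    codes = []
    predicted = 0.0  # encoder tracks the decoder's reconstruction
    for x in samples:
        code = round((x - predicted) / step)
        codes.append(code)
        predicted += code * step
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out = []
    value = 0.0
    for code in codes:
        value += code * step
        out.append(value)
    return out

signal = [0.0, 0.12, 0.26, 0.37, 0.41, 0.38, 0.30, 0.21]
step = 0.05
codes = dpcm_encode(signal, step)
decoded = dpcm_decode(codes, step)
print(codes)
print([round(v, 2) for v in decoded])
```

Because neighbouring speech samples are correlated, the differences are small and need fewer bits than the samples themselves, while the reconstruction error stays within half a step.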
Waveform Codecs
Frequency-domain waveform codecs:
Sub-Band Coding (SBC):
The input speech is split into a number of frequency bands, or
sub-bands, and each is coded independently using, for example, an
ADPCM-like coder. At the receiver the sub-band signals are
decoded and recombined to give the reconstructed speech signal.
The advantage of doing this comes from the fact that the noise in
each sub-band depends only on the coding used in that sub-band.
Therefore we can allocate more bits to perceptually important
sub-bands so that the noise in these frequency regions is low, while
in other sub-bands we may be content to allow a high coding noise
because noise at these frequencies is less perceptually important.
Typical bit rates: 16-32 kbits/s
Due to the filtering necessary to split the speech into sub-bands,
SBC codecs are more complex than simple DPCM coders, and introduce
more coding delay. However, the complexity and delay are still
relatively low when compared to most hybrid codecs.
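The effect of unequal bit allocation on the total rate can be checked with simple arithmetic. The sketch below assumes an 8 kHz input split into four equal sub-bands, each downsampled to 2000 samples/s, and a purely hypothetical allocation of 5, 4, 2 and 1 bits per sample; none of these numbers come from the slides.

```python
def subband_bit_rate(band_sample_rates_hz, bits_per_sample):
    """Total bit rate when each sub-band is coded with its own word length."""
    if len(band_sample_rates_hz) != len(bits_per_sample):
        raise ValueError("one bit allocation is needed per sub-band")
    return sum(rate * bits for rate, bits in zip(band_sample_rates_hz, bits_per_sample))


band_rates = [2000, 2000, 2000, 2000]  # assumed: 4 bands, each decimated by 4
allocation = [5, 4, 2, 1]              # hypothetical: more bits where the ear is sensitive
total = subband_bit_rate(band_rates, allocation)
print(total)  # 24000 bits/s
```

The result lies inside the 16-32 kbits/s range quoted for SBC, while the perceptually important bands still receive most of the bits.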
Adaptive Transform Coding (ATC):
Uses a fast transformation (such as the discrete
cosine transform) to split blocks of the speech
signal into a large number of frequency bands.
The number of bits used to code each
transformation coefficient is adapted depending on
the spectral properties of the speech.
Toll quality reproduced speech can be achieved at
bit rates as low as 16 kbits/s.
Source Codecs
Source coders operate using a model of how the source
was generated, and attempt to extract, from the signal
being coded, the parameters of the model. It is these
model parameters which are transmitted to the decoder
Source coders for speech are called vocoders
The vocal tract is represented as a time-varying filter and
is excited with either a white noise source, for unvoiced
speech segments, or a train of pulses separated by the
pitch period for voiced speech. Therefore the information
which must be sent to the decoder is the filter
specification, a voiced/unvoiced flag, the necessary
variance of the excitation signal, and the pitch period for
voiced speech. This is updated every 10-20 ms to follow
the non-stationary nature of speech.
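The decoder side of this model can be sketched in a few lines. The sketch assumes an all-pole filter as the vocal tract model (a common choice, but not stated on the slide) and uses `gain` both as the pulse amplitude for voiced frames and as the standard deviation of the noise for unvoiced frames. All function names and the 160-sample frame (20 ms at an assumed 8 kHz) are illustrative.

```python
import random


def make_excitation(n, voiced, pitch_period=0, gain=1.0, rng=None):
    """Pulse train (voiced; pitch_period must be positive) or white noise (unvoiced)."""
    if voiced:
        return [gain if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = rng or random.Random(0)
    return [rng.gauss(0.0, gain) for _ in range(n)]


def all_pole_filter(excitation, a):
    """y[n] = x[n] - sum_k a[k] * y[n-k], with k starting at 1."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * out[n - k]
        out.append(acc)
    return out


def synthesize_frame(a, voiced, gain, pitch_period=0, n=160):
    """Rebuild one frame from the transmitted parameters only."""
    return all_pole_filter(make_excitation(n, voiced, pitch_period, gain), a)


voiced_frame = synthesize_frame([-0.5], voiced=True, gain=1.0, pitch_period=80)
unvoiced_frame = synthesize_frame([-0.5], voiced=False, gain=1.0)
print(len(voiced_frame), len(unvoiced_frame))
```

Only the filter coefficients, the voiced/unvoiced flag, the gain and the pitch period cross the channel, which is why the bit rate can be so low.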
Vocoders tend to operate at around 2.4 kbits/s
or below, and produce speech which although
intelligible is far from natural sounding.
Increasing the bit rate much beyond 2.4 kbits/s
is not worthwhile because of the inbuilt limitation
in the coder's performance due to the simplified
model of speech production used. The main use
of vocoders has been in military applications
where natural sounding speech is not as
important as a very low bit rate to allow heavy
protection and encryption
Hybrid Codecs
The most successful and commonly used
are time domain Analysis-by-Synthesis
(AbS) codecs
AbS codecs work by splitting the input
speech to be coded into frames, typically
about 20 ms long. For each frame
parameters are determined for a synthesis
filter, and then the excitation to this filter is
chosen so that the synthesized speech matches
the input speech as closely as possible.
This closed-loop choice of excitation allows
AbS codecs to produce good quality speech at
low bit rates.
However, the numerical complexity involved in
passing every possible excitation signal through
the synthesis filter is huge
Types of AbS codecs:
MPE: Multi-Pulse Excited codec
RPE: Regular Pulse Excited codec
CELP: Code Excited Linear Prediction codec
The distinguishing feature of AbS codecs is how
the excitation waveform u(n) for the synthesis
filter is chosen
Multi-Pulse Excited (MPE) codecs:
u(n) is given by a fixed number of non-zero pulses for every
frame of speech. The positions of these non-zero pulses within
the frame, and their amplitudes, must be determined by the
encoder and transmitted to the decoder.
In theory it would be possible to find the very best values for all
the pulse positions and amplitudes, but this is not practical due
to the excessive complexity it would entail. In practice some sub-
optimal method of finding the pulse positions and amplitudes
must be used.
Typically about 4 pulses per 5 ms are used, and this leads to
good quality reconstructed speech at a bit-rate of around 10 kbits/s.
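To see why an exhaustive search is impractical, note that 5 ms at an assumed 8 kHz sampling rate is 40 samples, so there are already 91390 ways just to place 4 pulses, before any amplitudes are considered. One common sub-optimal approach places the pulses one at a time, each at the position that most reduces the remaining error. The sketch below illustrates that greedy idea only; it omits the perceptual weighting a real codec would use, and `impulse_response` and `greedy_mpe` are illustrative names.

```python
import math


def impulse_response(a, n):
    """First n samples of the all-pole synthesis filter's impulse response."""
    h = []
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k, coeff in enumerate(a, start=1):
            if i - k >= 0:
                acc -= coeff * h[i - k]
        h.append(acc)
    return h


def greedy_mpe(target, h, n_pulses):
    """Place pulses one by one; each minimizes the current squared error."""
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(n_pulses):
        best_gain, best_pos, best_amp = -1.0, 0, 0.0
        for pos in range(n):
            span = n - pos
            corr = sum(residual[pos + k] * h[k] for k in range(span))
            energy = sum(h[k] * h[k] for k in range(span))
            gain = corr * corr / energy  # error reduction for a pulse at pos
            if gain > best_gain:
                best_gain, best_pos, best_amp = gain, pos, corr / energy
        pulses.append((best_pos, best_amp))
        for k in range(n - best_pos):
            residual[best_pos + k] -= best_amp * h[k]
    return pulses, residual


print(math.comb(40, 4))  # number of ways to place 4 pulses in 40 samples
```

Each greedy step costs one pass over the 40 positions, instead of one filter run per combination of positions and amplitudes.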
Regular Pulse Excited (RPE) codecs:
Like the MPE codec, the RPE codec uses a number of non-zero pulses to give the excitation signal u(n).
However in RPE codecs the pulses are regularly spaced at some fixed
interval, and the encoder needs only to determine the position of the
first pulse and the amplitude of all the pulses.
Therefore less information needs to be transmitted about pulse
positions, and so for a given bit rate the RPE codec can use many more
non-zero pulses than MPE codecs.
For example at a bit rate of about 10 kbits/s around 10 pulses per 5 ms
can be used in RPE codecs, compared to 4 pulses for MPE codecs.
This allows RPE codecs to give slightly better reconstructed
speech quality than MPE codecs.
However, RPE codecs also tend to be more complex.
The European GSM mobile telephone system uses a simplified RPE
codec, with long-term prediction, operating at 13 kbits/s to provide toll
quality speech.
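The saving in position information can be made concrete with a small calculation. The sketch assumes a 40-sample sub-frame (5 ms at an assumed 8 kHz), a pulse spacing of 4 for RPE, and that each MPE pulse position is coded independently, which is a simplification. The function names are illustrative.

```python
def position_bits_mpe(frame_len, n_pulses):
    """Bits to send every pulse position independently (simplifying assumption)."""
    return n_pulses * (frame_len - 1).bit_length()


def position_bits_rpe(spacing):
    """Bits to send only the position of the first pulse on a regular grid."""
    return (spacing - 1).bit_length()


def rpe_excitation(first_pos, spacing, amplitudes, frame_len):
    """Build u(n): pulses at first_pos, first_pos + spacing, ..."""
    u = [0.0] * frame_len
    for i, amp in enumerate(amplitudes):
        u[first_pos + i * spacing] = amp
    return u


print(position_bits_mpe(40, 4))  # 24 bits for 4 free pulse positions
print(position_bits_rpe(4))      # 2 bits for the grid offset of 10 pulses
u = rpe_excitation(first_pos=3, spacing=4, amplitudes=[1.0] * 10, frame_len=40)
```

The bits freed from position coding are what let RPE spend its budget on the amplitudes of more pulses.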
Code Excited Linear Prediction (CELP) codecs:
The excitation is given by an entry from a large vector quantizer
codebook, and a gain term to control its power
Typically the codebook index is represented with about 10 bits (to give a
codebook size of 1024 entries) and the gain is coded with about 5 bits
CELP codecs give toll quality speech at bit rates between 4.8 and 16 kbits/s
The complexity of the original CELP codec was much too high for it to
be implemented in real-time
With the large advances made since then, it is now relatively easy to
implement a real-time CELP codec on a single, low cost DSP chip
Several important speech coding standards have been defined based
on the CELP principle, for example the US Department of
Defense (DoD) 4.8 kbits/s codec, and the CCITT low-delay 16 kbits/s
codec.
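A codebook search of this kind can be sketched as below. The sketch is a plain analysis-by-synthesis loop under several assumptions: a random Gaussian codebook of 1024 vectors of 40 samples, a one-tap all-pole synthesis filter, no perceptual weighting and no quantization of the gain. For each entry the best gain has a closed form, so only the index and gain need to be sent. All names are illustrative.

```python
import random


def synthesize(excitation, a):
    """All-pole synthesis filter: y[n] = x[n] - sum_k a[k] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * out[n - k]
        out.append(acc)
    return out


def celp_search(target, codebook, a):
    """Return (index, gain) of the entry whose scaled synthesis best matches target."""
    best_error, best_index, best_gain = float("inf"), -1, 0.0
    target_energy = sum(t * t for t in target)
    for index, code_vector in enumerate(codebook):
        y = synthesize(code_vector, a)
        corr = sum(t * s for t, s in zip(target, y))
        energy = sum(s * s for s in y)
        if energy == 0.0:
            continue
        error = target_energy - corr * corr / energy  # error at the optimal gain
        if error < best_error:
            best_error, best_index, best_gain = error, index, corr / energy
    return best_index, best_gain


rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(1024)]
index_bits = (len(codebook) - 1).bit_length()
print(index_bits)  # 10 bits address 1024 entries
```

With a 10-bit index and a 5-bit gain, each excitation vector costs about 15 bits, regardless of how many samples it covers.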
Standard Speech Codecs
64 kbits/s PCM Codecs
The 32 kbits/s G.721 ADPCM Codec
The 16 kbits/s G.728 Low Delay CELP Codec
The 13 kbits/s GSM Codec
The 4.8 kbits/s DoD CELP Codec
PCM Codecs
If linear quantization is used then about 12 bits per
sample are needed, giving a bit rate of about 96 kbits/s.
With non-linear quantization 8 bits per sample are
sufficient for speech quality which is almost
indistinguishable from the original. This gives a bit rate of
64 kbits/s, and two such non-linear PCM codecs were
standardized in the 1960s.
Because of their simplicity, excellent quality and low
delay both these codecs are still widely used today. For
example the .au audio files that are often used to convey
sounds over the Web are in fact just PCM files.
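The two bit rates follow from an 8 kHz sampling rate (assumed here): 8000 x 12 = 96 kbits/s and 8000 x 8 = 64 kbits/s. The benefit of non-linear quantization can be illustrated with the continuous mu-law curve (mu = 255). This is an idealized sketch for samples scaled to the range -1 to 1, not the bit-exact segmented encoding of the standardized codecs.

```python
import math

MU = 255.0
SAMPLE_RATE_HZ = 8000  # assumed telephony sampling rate


def mu_compress(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_uniform(v, bits):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((v + 1.0) / step))
    return -1.0 + (index + 0.5) * step


def pcm_companded(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))


quiet = 0.01  # a low-level sample, where linear 8-bit PCM is coarse
print(abs(quantize_uniform(quiet, 8) - quiet))  # linear 8-bit error
print(abs(pcm_companded(quiet) - quiet))        # companded 8-bit error
print(SAMPLE_RATE_HZ * 12, SAMPLE_RATE_HZ * 8)
```

Companding spends the 8 bits unevenly, using fine steps for quiet samples and coarse steps for loud ones, which is how it approaches the quality of a 12-bit linear quantizer.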
Listen to an audio file compressed by
different codecs: