
Coding algorithm based on loss compression using scalar quantization switching
technique and logarithmic companding

ZORAN PERIC, ALEKSANDAR MOSIC AND STEFAN PANIC
Department of Telecommunications, Faculty of Electrical Engineering
University of Nis
18000 Nis, Serbia


This paper proposes a novel coding algorithm based on lossy compression using a
scalar quantization switching technique. Switching is performed by estimating the
input variance and then coding with a nonuniform switched scalar compandor (NSSC).
An accurate estimate of the input signal variance is needed to find the best
compressor function for the compandor implementation, since it enables the quantizers
to be adapted to the maximal amplitudes of the input signals. Additionally, we discuss
the performance of coding schemes designed according to the G.711 and G.712 waveform
coding standards and of a recently presented codec for wideband speech and audio
coding. We point out the benefits that can be achieved by using our algorithm: higher
quality and higher compression. The main contribution of this model is that it achieves
lossy compression while attaining a higher signal-to-quantization-noise ratio (SQNR)
over a wide range of signal volumes (variances), with the necessary robustness over a
broad range of input variances. The model can be applied in VoIP applications and for
effective coding of signals that, like speech signals, follow a Gaussian distribution
and have time-varying characteristics.

Keywords: coding algorithm, loss compression, nonuniform scalar quantization
switching technique, logarithmic companding


1. INTRODUCTION

Coding is a procedure for representing a digitized signal with as few bits as
possible while maintaining a reasonable level of quality (usually measured by SQNR).
A less popular name with the same meaning is compression. Coding has matured to the
point where it now constitutes an important application area of signal processing. Due
to the increasing demand for communication, coding technology has received growing
interest from the research, standardization, and business communities [1], [2]. With
the widespread availability of low-cost, high-performance digital signal processors,
many signal processing tasks formerly done with analog circuitry are now predominantly
executed in the digital domain. The advantages of going digital are many: first,
minimizing the communication capacity needed for high-quality transmission of signals
such as pictures or speech; second, minimizing the memory capacity needed to store this
information on fast media or in databases; third, simplifying the description of the
processed signal in order to simplify the signal processing algorithm. Numerous
studies, for example [3], have been conducted in recent years with the goal of
developing a coding algorithm that minimizes the bit rate of the digital representation
of a signal without an objectionable loss of signal quality. Although a great number of
coding algorithms have been developed, there is still a reasonable need to continue
research in this field. We focus our research on waveform coders, since they provide
the highest level of quality. It is well known that waveform coders attempt to code the
exact shape of the speech signal waveform, without considering in detail the nature of
human speech production and perception. The simplest and most commonly used waveform
coding algorithm, defined by the G.711 standard [4], converts 12-bit samples to an
8-bit code (a bit rate of 64 kb/s) by using companded 8-bit Pulse Code Modulation
(PCM). Much work has since been done to reduce the bit rate further below that of PCM
while preserving the original quality of the digitized speech signal.
There is a desire, and sometimes an absolute need, in many VoIP applications to
send voice data over IP networks to the end systems in the identical PCM format in
which it was presented to the VoIP system by a PSTN/GSTN interworking device [5], [6].
Most examples of this need arise because many voice coders do not transport the signal
with the fidelity required by the application using the channel (e.g., DTMF/TTY/TDD
pass-through and modem or fax pass-through). As many VoIP transport providers [7] do
not wish to degrade the audio quality beyond the distortion already created by G.711
companding, there is often a need to transport the signal in the identical PCM format
presented to the PSTN/GSTN interworking device. This motivated us to investigate
alternative schemes. Consequently, our work is motivated by the problem of finding an
alternative coding scheme that has the lowest possible complexity and provides a
reasonable level of data quality (near-optimal SQNR) while attaining the highest
possible compression. We address this problem with a coding algorithm based on the
scalar quantization switching technique. One simple technique of this kind is
switched-codebook adaptive scalar quantization. Switching is performed by coding with
a nonuniform switched scalar compandor (NSSC).
An accurate estimate of the input signal variance is needed to find the best
compressor function for the compandor implementation. We begin with the distortion
analysis in Section 2. We then present our novel algorithm model in Section 3 and test
the performance of the proposed coding algorithm. In order to point out the benefits of
the proposed algorithm, in Section 4 we compare its performance with the simplest and
most commonly used waveform coding algorithm, defined by the G.711 standard, and with
the G.712 standard, which defines the smallest SQNR that has to be achieved for
high-quality transmission. Similar conclusions have been given in [8-10]. After our
lossy compression algorithm, a method of lossless compression for G.711 [11] might be
applied at the beginning of end-to-end media transmission. This is why our algorithm
can provide better performance than lossless compression alone: lossless compression
can still be performed after our algorithm. A μ-law companded implementation that
achieves a compromise between high-rate digitalization and variance adaptation was
also considered in [12]. In a number of papers the quantization of a Gaussian source
was analyzed, since the pdf of the instantaneous speech signal values for a lower
number of digitalization samples is better represented by the Gaussian than by the
Laplacian function [1], [2]. The volume of the speech signal has a Gaussian
distribution too. Also, with an appropriate filter design, every continuous signal that
is filtered will have an approximately Gaussian distribution [13]. We point out three
main advantages over the new codec presented in [5], [6], which also has higher quality
than the G.711 and G.712 standards: 1) higher quality (of almost 16 dB and 10 dB at bit
rates of nearly 9 and 8 bits per sample, respectively), 2) bit compression (of almost 1
or 2 bits, respectively), and 3) simpler realization. The main contribution of this
model is that it achieves lossy compression while attaining a higher SQNR over a wide
range of signal volumes (variances), with the necessary robustness over a broad range
of input variances. The proposed model can be applied in VoIP applications and for
effective coding of signals that, like speech signals, follow a Gaussian distribution
and have time-varying characteristics.
2. DISTORTION ANALYSES
In situations such as speech coding, the exact value of the input variance is not
known in advance and, in addition, tends to change with time. In such situations, a
signal-to-quantization-noise ratio that is constant over a broad range of input
variances can be obtained by using a logarithmic companding law.
The μ-law companding is used for PCM telephone systems in the USA, Canada and
Japan, with the standard value of μ = 255. The μ-law compression characteristic is
given by:

$$g(x) = x_{\max}\,\frac{\ln\left(1 + \mu\,\frac{|x|}{x_{\max}}\right)}{\ln(1+\mu)}\,\operatorname{sgn}(x). \qquad (1)$$
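As an illustration of the compressor characteristic (1), the following minimal Python sketch evaluates g(x) for a few sample amplitudes. The function name `mu_law_compress`, the default x_max = 1, and the clipping of samples that exceed x_max are assumptions made for this example and are not part of the original formulation.

```python
import math


def mu_law_compress(x, x_max=1.0, mu=255.0):
    """Evaluate the mu-law compressor g(x) of equation (1).

    Samples with |x| > x_max are clipped to x_max; this overload
    handling is an assumption of the sketch.
    """
    mag = min(abs(x), x_max)
    g = x_max * math.log(1.0 + mu * mag / x_max) / math.log(1.0 + mu)
    # Restore the sign of the input, i.e. the sgn(x) factor in (1).
    return math.copysign(g, x)


# Small amplitudes are expanded, large amplitudes are compressed.
samples = [0.0, 0.01, 0.1, 0.5, 1.0]
compressed = [mu_law_compress(s) for s in samples]
```

The characteristic is odd-symmetric and maps [-x_max, x_max] onto itself, which is why a uniform quantizer can be applied to g(x) afterwards.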

An N-point nonuniform scalar quantizer for a source characterized as a continuous
random variable with probability density p(x) has distortion defined as the expected
mean square error between the original and the quantized signal.
In this paper we consider Gaussian input signals with unrestricted amplitude
range. Determining the support region enables the quantizers to be adapted to the
amplitudes of the input signals. The Gaussian probability density function of the
original random variable x with variance σ² can be expressed as:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}. \qquad (2)$$
The total distortion consists of two components, the granular and the overload
distortion, which are defined as:

$$D_g = \sum_{i=2}^{N-1} \int_{t_{i-1}}^{t_i} (x - y_i)^2\, p(x)\, dx, \qquad (3)$$

$$D_o = 2\int_{t_{N-1}}^{\infty} (x - y_N)^2\, p(x)\, dx, \qquad (4)$$

where $t_i$, $t_{i-1}$, $y_i$ and $y_N$ are defined as in [12]. After some mathematical
transformations, with a small loss of accuracy, similarly to [12], we can derive the
following expressions for the granular and overload distortion:


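$$D_g = \frac{\ln^2(1+\mu)}{3N^2}\left(\sigma^2 + 2\sqrt{\frac{2}{\pi}}\,\frac{\sigma x_{\max}}{\mu} + \frac{x_{\max}^2}{\mu^2}\right), \qquad (5)$$

$$D_o = \left(\sigma^2 + x_{\max}^2\right)\left[1 - \operatorname{erf}\left(\frac{x_{\max}}{\sqrt{2}\,\sigma}\right)\right] - \sqrt{\frac{2}{\pi}}\,\sigma x_{\max}\, e^{-\frac{x_{\max}^2}{2\sigma^2}}, \qquad (6)$$
where erf(x) denotes the error function.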
Since we now know how to calculate the distortion for robust quantization of a
Gaussian source whose average power varies over a wide range, we can find the ratio of
the signal power to the total distortion $D_t = D_g + D_o$ in dB, which is denoted as
the signal-to-quantization-noise ratio (SQNR):

$$\mathrm{SQNR} = 10\log_{10}\frac{\sigma^2}{D_t} = 10\log_{10}\frac{1}{\frac{\ln^2(1+\mu)}{3N^2}\left(1 + 2\sqrt{\frac{2}{\pi}}\,\frac{c}{\mu} + \frac{c^2}{\mu^2}\right) + \left(1 + c^2\right)\left[1 - \operatorname{erf}\left(\frac{c}{\sqrt{2}}\right)\right] - \sqrt{\frac{2}{\pi}}\,c\,e^{-\frac{c^2}{2}}}, \qquad (7)$$
where $c = x_{\max}/\sigma$.
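The closed-form expressions (5) to (7) can be evaluated directly. The sketch below is a minimal Python version under stated assumptions: the function names are hypothetical, `math.erfc` is used in place of 1 - erf(.) purely for numerical accuracy at large c, and the example values N = 512, μ = 5.5 and c = 4.6 are those quoted for curve 1 in Fig. 2.

```python
import math


def granular_distortion(sigma, x_max, mu, n_levels):
    """Granular distortion D_g, equation (5)."""
    scale = math.log(1.0 + mu) ** 2 / (3.0 * n_levels ** 2)
    return scale * (sigma ** 2
                    + 2.0 * math.sqrt(2.0 / math.pi) * sigma * x_max / mu
                    + (x_max / mu) ** 2)


def overload_distortion(sigma, x_max):
    """Overload distortion D_o, equation (6)."""
    c = x_max / sigma
    # erfc(z) = 1 - erf(z), computed without cancellation for large z.
    tail = math.erfc(c / math.sqrt(2.0))
    return ((sigma ** 2 + x_max ** 2) * tail
            - math.sqrt(2.0 / math.pi) * sigma * x_max * math.exp(-c * c / 2.0))


def sqnr_db(sigma, x_max, mu, n_levels):
    """SQNR in dB, equation (7), with D_t = D_g + D_o."""
    d_total = (granular_distortion(sigma, x_max, mu, n_levels)
               + overload_distortion(sigma, x_max))
    return 10.0 * math.log10(sigma ** 2 / d_total)


# Example: unit-variance input, x_max = c * sigma with c = 4.6.
example_sqnr = sqnr_db(sigma=1.0, x_max=4.6, mu=5.5, n_levels=512)
```

For these values the sketch yields an SQNR between 48 and 49 dB, and the result depends on σ and x_max only through their ratio c, as equation (7) states.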
The optimization of the total distortion is carried out in two steps. First, we
adapt to the maximal amplitude of the input signal, that is, we optimize the parameter
c for a given μ, which is described as:

$$\frac{\partial D_t}{\partial c} = 0 \;\Rightarrow\; c = c_{opt}(\mu). \qquad (8)$$
Then, in the second step, we find the required $\mu_{opt}$, for which the total
distortion has its minimum, which is described as:

$$\left.\frac{\partial D_t}{\partial \mu}\right|_{c = c_{opt}} = 0 \;\Rightarrow\; D_{t,\min} = D_t\left(\mu = \mu_{opt},\ c = c_{opt}\right). \qquad (9)$$
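The two-step optimization in (8) and (9) can be approximated numerically. The following Python sketch is an illustration under stated assumptions: it minimizes the normalized distortion D_t/σ² from equation (7) with a golden-section search, which is a substitute chosen for this example and not necessarily the procedure used by the authors; the function names and the search intervals [1, 10] for c and [0.5, 50] for μ are likewise assumptions. The values it returns need not coincide exactly with the c_opt and μ_opt quoted later in the paper.

```python
import math


def normalized_distortion(c, mu, n_levels):
    """D_t / sigma^2 as in the denominator of equation (7), c = x_max / sigma."""
    granular = (math.log(1.0 + mu) ** 2 / (3.0 * n_levels ** 2)
                * (1.0 + 2.0 * math.sqrt(2.0 / math.pi) * c / mu + (c / mu) ** 2))
    overload = ((1.0 + c * c) * math.erfc(c / math.sqrt(2.0))
                - math.sqrt(2.0 / math.pi) * c * math.exp(-c * c / 2.0))
    return granular + overload


def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            b = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, b]
    return 0.5 * (a + b)


def c_opt(mu, n_levels):
    """Step 1, equation (8): best c for a given mu."""
    return golden_min(lambda c: normalized_distortion(c, mu, n_levels), 1.0, 10.0)


def mu_opt(n_levels):
    """Step 2, equation (9): best mu, with c re-optimized for every candidate."""
    return golden_min(
        lambda mu: normalized_distortion(c_opt(mu, n_levels), mu, n_levels),
        0.5, 50.0)


best_mu = mu_opt(512)
best_c = c_opt(best_mu, 512)
```

Nesting the search for c inside the search for μ mirrors the order of the two steps: every candidate μ is judged at its own c_opt(μ), so the outer minimum is the joint minimum of D_t over both parameters.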
3. A NOVEL SPEECH CODING ALGORITHM
Every speech coding scheme can be defined as an algorithm, since speech coding is
performed in a number of steps or operations. The focus of our analysis is a speech
coding algorithm based on lossy compression using the scalar quantization switching
technique. At the beginning we must know the corresponding values of $\mu_{opt}$ and
$c_{opt}$.

Hence, the first step of the considered algorithm (see Figure 1) is buffering.
Namely, a finite number M of input samples (a frame) is used for variance computation,
where M ≥ 1 is known as the frame length. The estimated variance is quantized by the
j-th scalar quantizer and used to scale the current input sample $x_k$. For this block
we must define and store in memory the vector $[\sigma_{0j}, \sigma_{1j}, \sigma_{2j}]$
for the j-th quantizer, where $\sigma_{0j}^2 \in [\sigma_{1j}^2, \sigma_{2j}^2)$ and
$\bigcup_{j=1}^{K} [\sigma_{1j}^2\,[\mathrm{dB}],\ \sigma_{2j}^2\,[\mathrm{dB}]) = [-20, 20)$, as follows:

$$\sigma_{1j}\,[\mathrm{dB}] = -20 + (j-1)\Delta; \quad \sigma_{2j}\,[\mathrm{dB}] = -20 + j\Delta; \quad \sigma_{0j}\,[\mathrm{dB}] = -20 + \left(j - \tfrac{1}{2}\right)\Delta, \qquad (10)$$

with $\Delta = 2x_{\max}/N$ being the quantization step size; for the K cells in (10) to
cover the range [-20, 20) dB, this step must equal 40/K dB. The index J specifying the
class is used to select a particular codebook from a predesigned set of K codebooks.
One frame has length M, and the index identifying the class is sent at the end of the
block. If each of the K codebooks has size N, the bit rate per sample is
$R = \log_2 N + (\log_2 K)/M$. The codebook size N depends on the number of bits n used
for encoding: $N = 2^n$, where n is the number of bits per sample. The scaled input
$x_k$ is then coded with a nonadaptive (fixed) scalar coder that consists of a scalar
encoder and a scalar decoder in cascade. The encoder converts the input signal to the
index of the interval into which the input signal falls. The decoder converts that
index back to a reconstructed value of the input signal.
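The buffering, variance estimation and class selection described above, together with the bit-rate formula, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the frame is assumed to be zero-mean, levels are measured as 20 log10(σ/σ_ref) with σ_ref = 1, and frames whose level falls outside [-20, 20) dB are clamped to the nearest class, a case the text does not discuss.

```python
import math


def variance_class(frame, num_codebooks=16, lo_db=-20.0, hi_db=20.0, sigma_ref=1.0):
    """Return (j, sigma_0j) for one frame, following the thresholds in (10).

    j runs from 1 to K; sigma_0j is the representative standard deviation
    at the centre (in dB) of the j-th cell.
    """
    step_db = (hi_db - lo_db) / num_codebooks          # Delta = 40 / K dB
    variance = sum(x * x for x in frame) / len(frame)  # zero mean assumed
    if variance <= 0.0:
        j = 1                                          # silent frame: lowest class
    else:
        level_db = 10.0 * math.log10(variance / sigma_ref ** 2)
        j = int(math.floor((level_db - lo_db) / step_db)) + 1
        j = max(1, min(num_codebooks, j))              # clamp out-of-range frames
    sigma0_db = lo_db + (j - 0.5) * step_db
    return j, sigma_ref * 10.0 ** (sigma0_db / 20.0)


def bit_rate(n_levels, num_codebooks, frame_len):
    """Bits per sample: R = log2(N) + log2(K) / M."""
    return math.log2(n_levels) + math.log2(num_codebooks) / frame_len


# A frame of M = 80 samples with unit variance (0 dB).
class_index, sigma_0 = variance_class([1.0, -1.0] * 40)
rate = bit_rate(n_levels=512, num_codebooks=16, frame_len=80)
```

With K = 16 and M = 80 the side information costs 4/80 = 0.05 bits per sample, which is why the rate for N = 512 is 9.05 bits per sample.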

Fig. 1. Speech coding algorithm based on forward adaptive technique using nonlinear scalar
switching compandor (NSSQ)
[Figure 1: block diagram with the blocks Buffering, Variance estimation, Nonuniform
scalar quantizing, Encoding by NSSQ, Channel transmission, Transmission of index J,
Decoding by nonuniform SQ and Decoding by NSSQ. The input is the speech signal $x_k$
(with $\mu_{opt}$, $c_{opt}$), the transmitted indices are $I_k$ and J, and the output
signal is $\hat{x}_k$.]
The index, denoted by $I_k$, is mapped to (the codeword that represents) the
reconstruction level, denoted by $\hat{g}_j$, corresponding to the interval in the
decoder. Note that the index J represents the side information needed for decoding with
the j-th scalar decoder. If we have K codebooks, then for the transmission of the side
information we will need $N_0$ bits, where $K = 2^{N_0}$. Namely, the switching
technique can accurately control the variance level of the input sequence to be
quantized, but the side information must be transmitted to the decoder. As mentioned,
the encoder converts the input signal to the index of the interval into which the input
signal falls. This operation consists of the following steps. In the first step, we
compute $x_{\max j}$ of the j-th compandor as $x_{\max j} = c_{opt}\,\sigma_{0j}$. Then
we perform the companding given by (11):

$$g_j(x_k) = x_{\max j}\,\frac{\ln\left(1 + \mu\,\frac{|x_k|}{x_{\max j}}\right)}{\ln(1+\mu)}\,\operatorname{sgn}(x_k). \qquad (11)$$

At the end, we convert the input signal to the index of the interval into which it
falls. The index, denoted by $I_k$, is mapped to (the codeword that represents) the
reconstruction level if the following condition holds:

$$-x_{\max j} + I_k\,\Delta_j \;\le\; g_j(x_k) \;<\; -x_{\max j} + (I_k + 1)\,\Delta_j, \quad k = 1, \ldots, N-1, \qquad (12)$$
where $\Delta_j = 2x_{\max j}/N$ is the quantization step size. After channel
transmission, we need to convert the index of the interval into which the input signal
fell back to a value of the input signal. First, we compute $x_{\max j}$ of the j-th
decoder as $x_{\max j} = c_{opt}\,\sigma_{0j}$. Then we convert the index to the
quantized value of the companded input signal. The codeword $\hat{g}_j(x_k)$ that
represents the input signal is obtained from the index $I_k$ as:

$$\hat{g}_j(x_k) = -x_{\max j} + \left(I_k + \frac{1}{2}\right)\Delta_j, \quad k = 1, \ldots, N-1. \qquad (13)$$
Finally, we reach the reconstructed value $\hat{x}_k$. The expanding in step four is
given by (14):

$$\hat{x}_k = \frac{x_{\max j}}{\mu}\left(e^{\frac{\left|\hat{g}_j(x_k)\right|\,\ln(1+\mu)}{x_{\max j}}} - 1\right)\operatorname{sgn}\left(\hat{g}_j(x_k)\right). \qquad (14)$$
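The encoder and decoder steps (11) to (14) can be traced end to end for a single sample. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the interval index is taken to run from 0 to N - 1, samples beyond x_max are clipped before companding, and the parameter values (x_max = 4.2, μ = 4.5, N = 256) correspond to curve 2 of Fig. 2 for a class with σ_0j = 1.

```python
import math


def encode_sample(x, x_max, mu, n_levels):
    """Compress x with (11) and return the index of its uniform cell, as in (12)."""
    mag = min(abs(x), x_max)  # overload samples are clipped (assumption)
    g = math.copysign(
        x_max * math.log(1.0 + mu * mag / x_max) / math.log(1.0 + mu), x)
    step = 2.0 * x_max / n_levels
    index = int(math.floor((g + x_max) / step))
    # Keep the index in 0..N-1; g == x_max would otherwise give N.
    return max(0, min(n_levels - 1, index))


def decode_sample(index, x_max, mu, n_levels):
    """Map the index to the cell midpoint (13), then expand it with (14)."""
    step = 2.0 * x_max / n_levels
    g_hat = -x_max + (index + 0.5) * step
    # expm1(a) computes exp(a) - 1 accurately for small a.
    mag = (x_max / mu) * math.expm1(abs(g_hat) * math.log(1.0 + mu) / x_max)
    return math.copysign(mag, g_hat)


x_max, mu, n_levels = 4.2, 4.5, 256
x = 0.3
index = encode_sample(x, x_max, mu, n_levels)
x_hat = decode_sample(index, x_max, mu, n_levels)
error = x - x_hat
```

The expander in `decode_sample` is the exact inverse of the compressor in `encode_sample`, so the only reconstruction error comes from replacing g_j(x_k) by the midpoint of its cell.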
4. RESULTS AND DISCUSSIONS
Relying on the analysis of the proposed speech coding scheme (algorithm), we now
provide some numerical results. To obtain the results we consider the compandor model
with K = 16 codebooks and assume the optimal value of μ for each codebook of our NSSQ
model. Furthermore, we assume a dynamic range of 40 dB, which corresponds to the speech
signal modeled by the Gaussian distribution. In Fig. 2 we compare the performance of
our NSSQ algorithm with that achieved by the G.711 standard. The performance measure is
the signal quality given by the SQNR.
[Figure 2: SQNR [dB] (vertical axis, 22 to 50 dB) versus 20 log(σ/σ_0) [dB]
(horizontal axis, -20 to 20 dB). Curves: 1) NSSQ, N = 512, K = 16, c_opt = 4.6,
μ_opt = 5.5; 2) NSSQ, N = 256, K = 16, c_opt = 4.2, μ_opt = 4.5; 3) NSSQ, N = 128,
K = 16, c_opt = 4, μ_opt = 4; 4) G.711; 5) G.712.]
Fig. 2. Comparison of the G.711 and G.712 standards with our NSSQ algorithm in terms of
SQNR.
In order to show that the proposed speech coding algorithm provides high-quality
speech coding, we use the G.712 standard, which defines the smallest SQNR that has to
be achieved for high-quality transmission. The comparison of these performances is also
given in Fig. 2. Here we can see how the SQNR of NSSQ varies within each input power
range $[\sigma_{1j}^2/\sigma_0^2,\ \sigma_{2j}^2/\sigma_0^2)$, where
$\bigcup_{j=1}^{K} [\sigma_{1j}^2/\sigma_0^2\,[\mathrm{dB}],\ \sigma_{2j}^2/\sigma_0^2\,[\mathrm{dB}]) = [-20, 20)$,
in the case of 16 codebooks and codebook sizes of 512, 256 and 128. Our bit rate per
sample is $R = \log_2 N + (\log_2 K)/M$, where M denotes the frame length. In our case,
with 16 codebooks and a frame length of 80, the bit rate is 9.05, 8.05 and 7.05 bits
per sample, respectively. We are using codebook sizes of 512 and 256 to reach
very high standards of quality. Compared with the G.711 standard, which has an average
quality of 32.18 dB, we exceed its average quality by almost 16 dB and 10 dB at bit
rates of 9.05 and 8.05 bits per sample, respectively. Even at 7.05 bits per sample our
model performs better, by almost 4.6 dB, than the G.711 standard at 8 bits per sample.
Also, taking the SQNR as the measure of quality, we can see that our speech coding
algorithm ensures an average SQNR approximately 2.4 dB higher than the maximal value of
the G.712 standard, even at 7.05 bits per sample. From this we conclude that, if we
want to satisfy the same standard for the variation of SQNR in the twice larger input
power range of [-40 dB, 40 dB], we have to use the same codebook size with 32
codebooks. If we want to satisfy less restrictive standards of SQNR variation within
each input power range $[\sigma_{1j}^2/\sigma_0^2,\ \sigma_{2j}^2/\sigma_0^2)$, we can
use a smaller number of codebooks, and if a smaller peak value of SQNR suffices, we can
use a smaller size of each codebook for each input power range. We can also compare our
model with the codec proposed in [5], [6]. First, we clearly exceed the quality of the
G.711 standard by almost 16 dB and 10 dB in cases when we are trying to reach very high
standards of quality for wideband speech and audio coding, at bit rates of nearly 9 and
8 bits per sample, respectively, while the codec proposed in [5], [6] exceeds the
quality of the G.711 standard by only 0.5 dB. This advantage of our model is due to the
use of variance adaptation. Another big advantage is compression. Namely, the codec
proposed in [5], [6] requires a bit rate of 10 bits per sample. This means that our
model provides not only higher quality, by nearly 15.5 dB or 9.5 dB compared with the
codec proposed in [5], [6], but also a bit compression of 1 or 2 bits, respectively.
Finally, we found that our nonuniform switched scalar compandor is much simpler to
realize in practice than the proposed codec using vector companding, although both have
the same delay. The general conclusion is that our method performs better than the
codec proposed in [6]. Also, the G.711 standard is clearly exceeded here and could be
replaced by our method in a wide range of transmission applications, including VoIP.
5. CONCLUSION
In this paper a novel speech coding algorithm is proposed. Our switching algorithm
is performed by estimating the variance and then coding and decoding with a nonuniform
switched scalar compandor (NSSC). We have discussed the performance of signal coding
schemes designed according to the G.711 and G.712 standards and of a recently presented
codec for wideband speech and audio coding. We have pointed out the benefits that can
be achieved by using our algorithm. Taking the SQNR as the measure of quality, we can
see that our speech coding algorithm clearly exceeds the average quality of the G.711
and G.712 standards. We have also shown three main advantages over the newly proposed
codec: higher quality, bit compression, and simpler realization. We have shown how to
achieve a high SQNR that does not vary much over the input volume range. We have also
shown that this novel speech coding algorithm, based on lossy compression using the
nonlinear scalar compandor switching technique, can be applied in VoIP applications and
for effective coding of signals that, like speech signals, follow a Gaussian
distribution and have time-varying characteristics. The main benefit of our algorithm
is that it achieves lossy compression without relying on lossless compression, which
can still be performed after the proposed algorithm in order to further improve the
characteristics of transmission systems.
ACKNOWLEDGMENT
The authors would like to thank the Editor and the anonymous reviewers for their
careful reading and for many interesting remarks. The authors also wish to thank
Prof. Mihajlo C. Stefanovic for his suggestions, which significantly improved the
quality of this paper.


REFERENCES
1. N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, New Jersey,
1984.
2. G. Lukatela, D. Drajic, G. Petrovic, and R. Petrovic, Digitalne Telekomunikacije,
Građevinska knjiga, Beograd, 1981.
3. M. H. Johnson and A. Alwan, "Speech coding: Fundamentals and applications," Wiley
Encyclopedia of Telecommunications, Wiley, 2003.
4. ITU-T, Recommendation G.711: Pulse Code Modulation (PCM) of Voice
Frequencies, International Telecommunication Union, 1972.
5. Y. Hiwasaki, H. Ohmuro, T. Mori, S. Kurihara, and A. Kataoka, "A G.711 Embedded
Wideband Speech Coding for VoIP Conferences," IEICE Transactions on Information and
Systems, Vol. E89-D, 2006, pp. 2542-2552.
6. Y. Hiwasaki, T. Mori, S. Sasaki, H. Ohmuro, and A. Kataoka, "A Wideband Speech
and Audio Coding Candidate for ITU-T G.711 WBE Standardization," ICASSP, 2008.
7. O. Hersent, J. P. Petit, and D. Gurle, Beyond VoIP Protocols: Understanding Voice
Technology and Networking Techniques for IP Telephony, Wiley, 2005.
8. Z. Peric, A. Jovanovic, M. Stefanovic, and S. Bogosavljevic, "Switched Nonuniform
Polar Quantization of Sources with a Wide Dynamic Range of Power," Journal of
Communications Technology and Electronics, Vol. 52, 2007, pp. 1340-1349.
9. Z. Peric, "Quantization Optimizations of Speech Signal in Wide Volume Range,"
Electronics and Electrical Engineering, Vol. 45, 2003, pp. 41-48.
10. Z. Peric, I. Djordjevic, M. Stefanovic, and S. Bogosavljevic, "Combined Source and
Channel Coding of Speech Signal in Wide Volume Range," Proceedings of the
IASTED International Conference Signal and Image Processing (SIP'98), October
28-31, 1998, Las Vegas, Nevada, USA.
11. Ramalho G.711 Losless (RGL) Codec Whitepapper.html, http://216.252.110.31
12. Z. Peric, A. Mosic, and S. Panic, "Robust and Switched Nonuniform Scalar
Quantization of Gaussian Source in a Wide Dynamic Range of Power," Automatic
Control and Computer Sciences, Springer, Vol. 42, 2008, pp. 334-341.
13. K. Popat and K. Zeger, "Robust quantization of memoryless sources using
dispersive FIR filters," IEEE Trans. Commun., Vol. 40, 1992, pp. 1670-1674.
Zoran H. Peric was born in Nis, Serbia, in 1964. He received the B.Sc. degree in
electronics and telecommunications from the Faculty of Electronic Engineering, Nis,
Serbia, Yugoslavia, in 1989, and the M.Sc. degree in telecommunications from the
University of Nis in 1994. He received the Ph.D. degree from the University of Nis in
1999. He is currently a Professor at the Department of Telecommunications and
Vice-Dean of the Faculty of Electronic Engineering, University of Nis, Serbia. His
current research interests include information theory, source and channel coding, and
signal processing. He works in particular on scalar and vector quantization techniques
in speech and image coding. He is the author or coauthor of over 100 papers in digital
communications. Dr. Peric has been a reviewer for IEEE Transactions on Information
Theory. He is a member of the Editorial Board of the journal Electronics and Electrical
Engineering.
Aleksandar V. Mosić was born in Leskovac, Serbia, in 1983. He received the M.S.
degree in electrical engineering from the Faculty of Electronic Engineering, Niš,
Serbia, in 2007. He joined the Department of Telecommunication, Faculty of Electronic
Engineering, Niš, in 2008 as a Research Assistant on a joint project between the
Faculty of Electronic Engineering and the Ministry of Science of the Republic of
Serbia. His current research interests include information theory, source and channel
coding, and signal processing. He has published several papers on the above subjects.

Stefan R. Panić was born in Pirot, Serbia, in 1983. He received the M.S. degree in
electrical engineering from the Faculty of Electronic Engineering, Niš, Serbia, in
2007. He joined the Department of Telecommunication, Faculty of Electronic
Engineering, Niš, in 2008 as a Research Assistant on a joint project between the
Faculty of Electronic Engineering and the Ministry of Science of the Republic of
Serbia. His current research interests include information theory, source and channel
coding, and signal processing. He has published several papers on the above subjects.
