$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{2}$$
The total distortion $D_t$ consists of two components, the granular distortion $D_g$ and the
overload distortion $D_o$, which are defined as:
$$D_g = \sum_{i=2}^{N-1} \int_{t_{i-1}}^{t_i} (x - y_i)^2\, p(x)\, dx, \tag{3}$$
$$D_o = 2 \int_{t_{N-1}}^{\infty} (x - y_N)^2\, p(x)\, dx, \tag{4}$$
where $t_i$, $t_{i-1}$, $y_i$ and $y_N$ are defined as in [12]. After some mathematical
transformations, with a small loss of accuracy, similarly to [12], we can derive the following
expressions for the granular and overload distortion:
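$$D_g = \frac{\ln^2(1+\mu)}{3N^2}\left(\sigma^2 + \frac{2\sigma x_{\max}}{\mu}\sqrt{\frac{2}{\pi}} + \frac{x_{\max}^2}{\mu^2}\right). \tag{5}$$
$$D_o = \left(\sigma^2 + x_{\max}^2\right)\left(1 - \operatorname{erf}\left(\frac{x_{\max}}{\sqrt{2}\,\sigma}\right)\right) - \sqrt{\frac{2}{\pi}}\,\sigma\, x_{\max}\, e^{-\frac{x_{\max}^2}{2\sigma^2}}, \tag{6}$$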
where erf(x) denotes the error function.
Since we now know how to calculate the distortion for robust quantization of a Gaussian
source whose average power varies over a wide range, we can find the ratio of signal power to
total distortion $D_t = D_g + D_o$ in dB, denoted as the signal-to-quantization-noise ratio
(SQNR):
$$\mathrm{SQNR} = 10\log_{10}\frac{\sigma^2}{D_t} = 10\log_{10}\frac{1}{\dfrac{\ln^2(1+\mu)}{3N^2}\left(1 + \dfrac{2c}{\mu}\sqrt{\dfrac{2}{\pi}} + \dfrac{c^2}{\mu^2}\right) + \left(1 + c^2\right)\left(1 - \operatorname{erf}\left(\dfrac{c}{\sqrt{2}}\right)\right) - \sqrt{\dfrac{2}{\pi}}\, c\, e^{-c^2/2}}, \tag{7}$$
where $c = x_{\max}/\sigma$.
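As a rough numerical illustration of (5)-(7), the following sketch evaluates the normalized distortions and the SQNR for given N, $\mu$ and c. It assumes the reconstructed forms of (5) and (6) with $\sigma = 1$ (so that $x_{\max} = c$); the function names are ours, and `math.erfc` is used in place of $1 - \operatorname{erf}$ only to reduce rounding error for large c.

```python
import math

# sqrt(2/pi) = E|x| / sigma for a zero-mean Gaussian source
K_ABS = math.sqrt(2.0 / math.pi)


def granular_distortion(N, mu, c):
    """Normalized granular distortion D_g / sigma^2, following (5) with c = x_max / sigma."""
    scale = math.log(1.0 + mu) ** 2 / (3.0 * N ** 2)
    return scale * (1.0 + 2.0 * c * K_ABS / mu + (c / mu) ** 2)


def overload_distortion(c):
    """Normalized overload distortion D_o / sigma^2, following (6)."""
    # erfc(z) equals 1 - erf(z) but keeps precision when erf(z) is close to 1
    return (1.0 + c * c) * math.erfc(c / math.sqrt(2.0)) - K_ABS * c * math.exp(-c * c / 2.0)


def sqnr_db(N, mu, c):
    """SQNR in dB, following (7)."""
    return -10.0 * math.log10(granular_distortion(N, mu, c) + overload_distortion(c))


if __name__ == "__main__":
    # Parameter triples (N, c, mu) taken from the legend of Fig. 2
    for N, c, mu in [(512, 4.6, 5.5), (256, 4.2, 4.5), (128, 4.0, 4.0)]:
        print(f"N={N}: SQNR = {sqnr_db(N, mu, c):.2f} dB")
```

These values correspond to a source whose variance matches the design variance of the quantizer; the averages over a variance range discussed later are expected to be somewhat lower.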
Optimization of the total distortion is carried out in two steps. First, we adapt to the
maximal amplitude of the input signal, that is, we optimize the parameter $c$ for a given
$\mu$, which is described as:
$$\frac{\partial D_t}{\partial c} = 0 \;\Rightarrow\; c = c_{opt}(\mu). \tag{8}$$
Then, in the second step, we find the required $\mu_{opt}$ for which the total distortion
has its minimum, which is described as:
$$\left.\frac{\partial D_t}{\partial \mu}\right|_{c = c_{opt}(\mu)} = 0 \;\Rightarrow\; D_{t,\min} = D_t\left(\mu = \mu_{opt},\, c = c_{opt}\right). \tag{9}$$
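The two-step procedure in (8) and (9) can be approximated numerically. The sketch below replaces the derivative conditions with a plain grid search, which is our simplification and not necessarily how the optimum was obtained in the paper. It assumes the normalized total distortion $D_t/\sigma^2$ implied by (5) and (6); the grid limits and step sizes are arbitrary illustrative choices.

```python
import math

K_ABS = math.sqrt(2.0 / math.pi)  # E|x| / sigma for a zero-mean Gaussian


def total_distortion(N, mu, c):
    """Normalized total distortion D_t / sigma^2 = (D_g + D_o) / sigma^2."""
    granular = (math.log(1.0 + mu) ** 2 / (3.0 * N ** 2)) * (
        1.0 + 2.0 * c * K_ABS / mu + (c / mu) ** 2
    )
    overload = (1.0 + c * c) * math.erfc(c / math.sqrt(2.0)) - K_ABS * c * math.exp(-c * c / 2.0)
    return granular + overload


def frange(start, stop, step):
    """Inclusive list of evenly spaced floats (hypothetical helper)."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def best_c(N, mu, c_grid):
    """Step 1, cf. (8): the c that minimizes D_t for a fixed mu."""
    return min(c_grid, key=lambda c: total_distortion(N, mu, c))


def optimise(N, mu_grid=None, c_grid=None):
    """Step 2, cf. (9): the mu that minimizes D_t(mu, c_opt(mu)).

    Returns (mu_opt, c_opt, minimal normalized distortion).
    """
    mu_grid = mu_grid or frange(0.5, 20.0, 0.1)
    c_grid = c_grid or frange(2.0, 8.0, 0.05)
    best = None
    for mu in mu_grid:
        c = best_c(N, mu, c_grid)
        d = total_distortion(N, mu, c)
        if best is None or d < best[2]:
            best = (mu, c, d)
    return best


if __name__ == "__main__":
    for N in (128, 256, 512):
        mu_opt, c_opt, d_min = optimise(N)
        print(f"N={N}: mu_opt~{mu_opt:.1f}, c_opt~{c_opt:.2f}, "
              f"SQNR~{-10.0 * math.log10(d_min):.2f} dB")
```

Because the distortion surface is fairly flat around its minimum, a coarse grid locates the optimum only approximately; a finer grid or a derivative-based solver would be needed for more digits.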
3. A NOVEL SPEECH CODING ALGORITHM
Every speech coding scheme can be defined as an algorithm, since speech coding is
performed in a number of steps or operations. The focus of our analysis is a speech
coding algorithm based on lossy compression using a switched scalar quantization
technique. At the beginning we must know the corresponding values of $\mu_{opt}$ and
$c_{opt}$. The first step of the considered algorithm (see Fig. 1) is buffering.
Namely, a finite number M of input samples (a frame) is used for variance computation,
where $M \ge 1$ is the frame length. The estimated variance is quantized by the j-th
scalar quantizer and used to scale the current input sample $x_k$. For this block we must
define and store in memory the vector $[\sigma_{0j}, \sigma_{1j}, \sigma_{2j}]$ for the
j-th quantizer, where $\sigma_{0j}^2 \in [\sigma_{1j}^2, \sigma_{2j}^2)$ and
$\bigcup_{j=1}^{K} [\sigma_{1j}\,[\mathrm{dB}], \sigma_{2j}\,[\mathrm{dB}]) = [-20, 20)$, as follows:
$$\sigma_{1j}\,[\mathrm{dB}] = -20 + (j-1)\Delta; \quad \sigma_{2j}\,[\mathrm{dB}] = -20 + j\Delta; \quad \sigma_{0j}\,[\mathrm{dB}] = -20 + \left(j - \tfrac{1}{2}\right)\Delta, \tag{10}$$
with $\Delta = 2x_{\max}/N$ being the quantization step size. The index J specifying the class
is used to select a particular codebook from a predesigned set of K codebooks. One frame has
length M. The index identifying the class is sent at the end of the block. If each of the K
codebooks has size N, the bit rate per sample is $R = \log_2 N + (\log_2 K)/M$. The codebook
size N depends on the number of bits n used for encoding: $N = 2^n$, where n is the number
of bits per sample. The scaled input $x_k$ is then coded with a nonadaptive (fixed) scalar
coder that consists of a scalar encoder and a scalar decoder in cascade. The encoder converts
the input signal to the index of the interval into which the input signal falls. The decoder
converts that index back to a quantized value of the input signal.
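The buffering and class-selection step, together with the bit-rate formula, can be sketched as follows. This is a minimal illustration under our own assumptions: the input is zero-mean, the frame level is measured as $10\log_{10}(\sigma^2/\sigma_0^2)$ with a hypothetical reference `sigma_ref`, and the dB step in (10) is taken as 40/K so that the K classes cover [-20, 20) dB.

```python
import math


def class_bounds_db(j, K, lo_db=-20.0, hi_db=20.0):
    """Return (sigma_1j, sigma_2j, sigma_0j) in dB for class j = 1..K, following (10)."""
    step = (hi_db - lo_db) / K  # assumed dB width of one variance class
    return (
        lo_db + (j - 1) * step,
        lo_db + j * step,
        lo_db + (j - 0.5) * step,
    )


def select_class(frame, K, sigma_ref=1.0, lo_db=-20.0, hi_db=20.0):
    """Estimate the frame variance and return the class index J in 1..K."""
    variance = sum(x * x for x in frame) / len(frame)  # zero-mean assumption
    # Guard against log10(0) for an all-zero frame
    level_db = 10.0 * math.log10(max(variance, 1e-30) / sigma_ref ** 2)
    step = (hi_db - lo_db) / K
    j = int(math.floor((level_db - lo_db) / step)) + 1
    return min(max(j, 1), K)  # frames outside [-20, 20) dB use the edge classes


def bit_rate(N, K, M):
    """Bits per sample: R = log2(N) + log2(K) / M."""
    return math.log2(N) + math.log2(K) / M


if __name__ == "__main__":
    frame = [0.3, -0.1, 0.25, -0.4] * 20  # hypothetical frame of M = 80 samples
    J = select_class(frame, K=16)
    print("class index J:", J, "bounds [dB]:", class_bounds_db(J, 16))
    print("bit rate [bits/sample]:", bit_rate(512, 16, len(frame)))
```

The side-information term $(\log_2 K)/M$ shrinks as the frame grows, so longer frames cost fewer bits per sample but track variance changes more slowly.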
Fig. 1. Speech coding algorithm based on the forward adaptive technique using a nonlinear
scalar switching compandor (NSSQ). [Block diagram. Labeled blocks: buffering, variance
estimation, nonuniform scalar quantizing, encoding by NSSQ, channel transmission,
transmission of index J, decoding by nonuniform SQ, decoding by NSSQ. Labeled signals:
speech signal $x_k$, parameters $\mu_{opt}$ and $c_{opt}$, indices $I_k$ and $J$, output
signal $\hat{x}_k$.]
The index, denoted by $I_k$, is mapped to (the codeword that represents) the
reconstruction level, denoted by $\hat{g}_j$, corresponding to the interval in the decoder.
Note that the index J represents the side information needed for decoding with the j-th
scalar decoder. If we have K codebooks, transmission of the side information requires
$N_0$ bits, where $K = 2^{N_0}$. Namely, the switching technique can accurately control
the variance level of the input sequence to be quantized, but the side information must be
transmitted to the decoder. As mentioned, the encoder converts the input signal to the
index of the interval into which the input signal falls. This operation consists of the
following steps. In the first step, we compute $x_{\max j}$ of the j-th compandor as
$x_{\max j} = c_{opt}\,\sigma_{0j}$. Then we perform companding given by (11):
$$g_j(x_k) = x_{\max j}\,\frac{\ln\left(1 + \mu\,\dfrac{|x_k|}{x_{\max j}}\right)}{\ln(1+\mu)}\,\operatorname{sgn}(x_k). \tag{11}$$
At the end, we convert the input signal to the index of the interval into which it falls.
The index, denoted by $I_k$, is mapped to (the codeword that represents) the
reconstruction level if the following condition holds:
$$-x_{\max j} + I_k\,\Delta_j \le g_j(x_k) < -x_{\max j} + (I_k + 1)\,\Delta_j, \quad k = 1, \ldots, N-1, \tag{12}$$
where $\Delta_j = 2x_{\max j}/N$ is the quantization step size. After channel transmission,
we need to convert the index of the interval into which the input signal falls back to a
value of the input signal. First, we compute $x_{\max j}$ of the j-th decoder as
$x_{\max j} = c_{opt}\,\sigma_{0j}$. Then we convert the index to the quantized value of the
companded input signal. The codeword that represents the input signal, denoted by
$\hat{g}_j(x_k)$, is obtained from the index $I_k$ as:
$$\hat{g}_j(x_k) = -x_{\max j} + \left(I_k + \frac{1}{2}\right)\Delta_j, \quad k = 1, \ldots, N-1. \tag{13}$$
Finally, we reach the reconstructed value $\hat{x}_k$. Expanding in step four is given by (14):
$$\hat{x}_k = \frac{x_{\max j}}{\mu}\left(e^{\frac{|\hat{g}_j(x_k)|\,\ln(1+\mu)}{x_{\max j}}} - 1\right)\operatorname{sgn}\left(\hat{g}_j(x_k)\right). \tag{14}$$
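The encoder and decoder steps (11)-(14) for a single sample can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: indices run from 0 to N-1, samples outside $[-x_{\max j}, x_{\max j}]$ are clipped to the outermost cells, and the parameter values in the usage part ($\sigma_{0j} = 1$, $c_{opt} = 4$, $\mu = 4$, N = 128) are only examples.

```python
import math


def compress(x, x_max, mu):
    """mu-law compressor g_j(x), following (11)."""
    magnitude = x_max * math.log(1.0 + mu * abs(x) / x_max) / math.log(1.0 + mu)
    return math.copysign(magnitude, x)


def encode(x, x_max, mu, N):
    """Return the index I_k of the uniform cell containing g_j(x), cf. (12)."""
    x = max(-x_max, min(x_max, x))  # clip overload samples (our assumption)
    step = 2.0 * x_max / N
    index = int(math.floor((compress(x, x_max, mu) + x_max) / step))
    return min(max(index, 0), N - 1)  # x == x_max would otherwise give index N


def decode(index, x_max, mu, N):
    """Map an index back to the reconstructed sample, following (13) and (14)."""
    step = 2.0 * x_max / N
    g_hat = -x_max + (index + 0.5) * step  # cell midpoint in the companded domain
    magnitude = (x_max / mu) * (math.exp(abs(g_hat) * math.log(1.0 + mu) / x_max) - 1.0)
    return math.copysign(magnitude, g_hat)


if __name__ == "__main__":
    sigma_0j, c_opt, mu, N = 1.0, 4.0, 4.0, 128  # example values only
    x_max = c_opt * sigma_0j
    for x in (-3.2, -0.5, 0.01, 0.7, 2.9):
        i = encode(x, x_max, mu, N)
        print(f"x={x:+.3f} -> I_k={i:3d} -> x_hat={decode(i, x_max, mu, N):+.4f}")
```

Reconstructing at the cell midpoint in the companded domain means the reconstruction error grows with $|x|$, which is the intended behavior of a logarithmic compandor: roughly constant relative error across amplitudes.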
4. RESULTS AND DISCUSSIONS
Relying on the analysis of the proposed speech coding scheme (algorithm), we provide
some numerical results here. To obtain the results we consider the compandor model with
K = 16 codebooks and assume the optimal value of $\mu$ for each codebook of our NSSQ
model. Furthermore, we assume a dynamic range of 40 dB, which corresponds to the speech
signal modeled by the Gaussian distribution. In Fig. 2 we compare the performance of our
NSSQ algorithm with that achieved by the G.711 standard. Performance is measured by the
signal quality expressed as SQNR.
Fig. 2. Comparison of the SQNR of the G.711 and G.712 standards and of our NSSQ algorithm.
[Plot. Horizontal axis: $20\log(\sigma/\sigma_0)$ [dB], from -20 to 20. Vertical axis:
SQNR [dB], from 22 to 50. Curves: 1) NSSQ N=512, K=16, $c_{opt}$=4.6, $\mu_{opt}$=5.5;
2) NSSQ N=256, K=16, $c_{opt}$=4.2, $\mu_{opt}$=4.5; 3) NSSQ N=128, K=16, $c_{opt}$=4,
$\mu_{opt}$=4; 4) G.711; 5) G.712.]
To show that the proposed speech coding algorithm provides high-quality speech coding,
we use the G.712 standard, which defines the smallest SQNR that has to be achieved for
high-quality transmission. This comparison is also given in Fig. 2. Here we can see how the
SQNR of NSSQ varies within each input power range
$[\sigma_{1j}^2/\sigma_0^2,\ \sigma_{2j}^2/\sigma_0^2)$, where
$\bigcup_{j=1}^{K} [\sigma_{1j}^2/\sigma_0^2\,[\mathrm{dB}],\ \sigma_{2j}^2/\sigma_0^2\,[\mathrm{dB}]) = [-20, 20)$,
for 16 codebooks and codebook sizes of 512, 256 and 128. The bit rate per sample is
$R = \log_2 N + (\log_2 K)/M$, where M denotes the frame length. In our case, with 16
codebooks and a frame length of 80, the bit rate is 9.05, 8.05 and 7.05 bits per sample,
respectively. We use codebook sizes of 512 and 256 to reach very high quality. Compared
with the G.711 standard, which has an average SQNR of 32.18 dB, our model exceeds this
average by almost 16 dB and 10 dB at 9.05 and 8.05 bits per sample, respectively. Even at
7.05 bits per sample, our model performs almost 4.6 dB better than the G.711 standard at
8 bits per sample. Also, taking SQNR as the measure of quality, our speech coding
algorithm ensures an average SQNR approximately 2.4 dB higher than the maximal value of
the G.712 standard, even at 7.05 bits per sample. It follows that, to satisfy the same
limit on SQNR variation over a twice larger input power range of [-40 dB, 40 dB], we would
have to use 32 codebooks of the same size. If a less restrictive limit on SQNR variation
within each input power range $[\sigma_{1j}^2/\sigma_0^2,\ \sigma_{2j}^2/\sigma_0^2)$ is
acceptable, we can use a smaller number of codebooks, and if a smaller peak value of SQNR is
acceptable, we can use a smaller codebook size for each input power range.
We can also compare our model with the codec proposed in [5], [6]. First, our model
exceeds the quality of the G.711 standard by almost 16 dB and 10 dB when targeting very
high quality for wideband speech and audio coding, at nearly 9 and 8 bits per sample,
respectively, while the codec proposed in [5], [6] exceeds the quality of the G.711 standard
by only 0.5 dB. This advantage of our model is due to variance adaptation. Another big
advantage is compression: the codec proposed in [5], [6] requires 10 bits per sample. Our
model therefore provides not only higher quality, by nearly 15.5 dB or 9.5 dB compared with
the codec proposed in [5], [6], but also a saving of 1 or 2 bits per sample, respectively.
Finally, our nonuniform switched scalar compandor is much simpler to realize in practice
than the proposed codec, which uses vector companding, although both have the same delay.
In general, our method performs better than the codec proposed in [6]. The G.711 standard
is also clearly outperformed and could be replaced by our method in a wide range of
transmission applications, including VoIP.
5. CONCLUSION
In this paper a novel speech coding algorithm is proposed. Our switching algorithm
estimates the variance and then codes and decodes with a nonuniform switched scalar
compandor (NSSC). We have discussed the performance of signal coding schemes designed
according to the G.711 and G.712 standards and of a recently presented codec for wideband
speech and audio coding. We have pointed out the benefits that can be achieved by using
our algorithm. Taking SQNR as the measure of quality, our speech coding algorithm clearly
exceeds the average quality of the G.711 and G.712 standards. We have also shown three
main advantages over the newly proposed codec: higher quality, bit compression, and simpler
realization. It is shown how to achieve a high SQNR that does not vary much over the input
volume range. It is also shown that this novel speech coding algorithm, based on lossy
compression using a switched nonlinear scalar compandor, can be applied in VoIP
applications and for effective coding of signals that, like speech signals, follow the
Gaussian distribution and have time-varying characteristics. The main benefit of our
algorithm is that this compression is reached without lossless compression, which can be
applied after the proposed algorithm in order to further improve the characteristics of
transmission systems.
ACKNOWLEDGMENT
The authors would like to thank the Editor and the anonymous reviewers for their
careful reading and for many interesting remarks. The authors also wish to thank
Prof. Mihajlo C. Stefanovic for his suggestions, which significantly improved the
quality of this paper.
REFERENCES
1. N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, New Jersey,
1984.
2. G. Lukatela, D. Drajic, G. Petrovic, and R. Petrovic, Digitalne Telekomunikacije,
Gradjevinska knjiga, Beograd, 1981.
3. M. H. Johnson and A. Alwan, Speech Coding: Fundamentals and Applications, in Wiley
Encyclopedia of Telecommunications, Wiley, 2003.
4. ITU-T, Recommendation G.711: Pulse Code Modulation (PCM) of Voice
Frequencies, International Telecommunication Union, 1972.
5. Y. Hiwasaki, H. Ohmuro, T. Mori, S. Kurihara, and A. Kataoka, A G.711 Embedded
Wideband Speech Coding for VoIP Conferences, IEICE Transactions on Info and
Systems, Vol. E89-D, 2006, pp. 2542-2552.
6. Y. Hiwasaki, T. Mori, S. Sasaki, H. Ohmuro, and A. Kataoka, A Wideband Speech
and Audio Coding Candidate for ITU-T G.711 WBE Standardization, ICASSP, 2008.
7. O. Hersent, J. P. Petit, and D. Gurle, Beyond VOIP Protocols - Understanding Voice
Technology and Networking Techniques for IP technology, Wiley, 2005.
8. Z. Peric, A. Jovanovic, M. Stefanovic, and S. Bogosavljevic, Switched Nonuniform
Polar Quantization of Sources with a Wide Dynamic Range of Power, Journal of
Communications Technology and Electronics, Vol. 52, 2007, pp. 1340-1349.
9. Z. Peric, Quantization Optimizations of Speech Signal in Wide Volume Range,
Electronics and Electrical Engineering, Vol. 45, 2003, pp. 41-48.
10. Z. Peric, I. Djordjevic, M. Stefanovic, and S. Bogosavljevic, Combined Source and
Channel Coding of Speech Signal in Wide Volume Range, Proceedings of the
IASTED International Conference Signal and Image Processing (SIP'98), October
28-31, 1998, Las Vegas, Nevada, USA.
11. Ramalho G.711 Losless (RGL) Codec Whitepapper.html, http://216.252.110.31
12. Z. Peric, A. Mosic, and S. Panic, Robust and Switched Nonuniform Scalar
Quantization of Gaussian Source in a Wide Dynamic Range of Power, Automatic
Control and Computer Sciences, Springer, Vol. 42, 2008, pp. 334-341.
13. K. Popat and K. Zeger, Robust Quantization of Memoryless Sources Using
Dispersive FIR Filters, IEEE Trans. Commun., Vol. 40, 1992, pp. 1670-1674.
Zoran H. Peric was born in Nis, Serbia, in 1964. He received the B.Sc. degree in
electronics and telecommunications from the Faculty of Electronic Engineering, Nis,
Serbia, Yugoslavia, in 1989, and the M.Sc. degree in telecommunications from the
University of Nis in 1994. He received the Ph.D. degree from the University of Nis in
1999. He is currently a Professor at the Department of Telecommunications and vice-dean
of the Faculty of Electronic Engineering, University of Nis, Serbia. His current
research interests include information theory, source and channel coding, and signal
processing. He works in particular on scalar and vector quantization techniques in
speech and image coding. He is the author or coauthor of over 100 papers in digital
communications. Dr. Peric has been a reviewer for IEEE Transactions on Information
Theory. He is a member of the Editorial Board of the journal Electronics and Electrical
Engineering.
Aleksandar V. Mosic was born in Leskovac, Serbia, in 1983. He received the M.S. degree
in electrical engineering from the Faculty of Electronic Engineering, Nis, Serbia, in
2007. He joined the Department of Telecommunication, Faculty of Electronic Engineering,
Nis, in 2008 as a Research Assistant on a joint project between the Faculty of Electronic
Engineering and the Ministry of Science of the Republic of Serbia. His current research
interests include information theory, source and channel coding, and signal
processing. He has published several papers on the above subjects.
Stefan R. Panic was born in Pirot, Serbia, in 1983. He received the M.S. degree in
electrical engineering from the Faculty of Electronic Engineering, Nis, Serbia, in 2007.
He joined the Department of Telecommunication, Faculty of Electronic Engineering, Nis,
in 2008 as a Research Assistant on a joint project between the Faculty of Electronic
Engineering and the Ministry of Science of the Republic of Serbia. His current research
interests include information theory, source and channel coding, and signal processing.
He has published several papers on the above subjects.