Measurement
Description
Manual
February 2009
SwissQual License AG
Baarerstrasse 78
CH-6301 Zug
Switzerland
Internet: http://www.swissqual.com
Office: +41 32 686 65 65
Fax: +41 32 686 65 66
Part Number: 16-100-200047-3 Rev 2.20
Contents
Examples
  Evaluation of the transmitted signal
  Evaluation of the transmitted signal
Round Trip
  Introduction
  The Round Trip Method
  Results
  References
Appendix
  Abbreviations
Figures
Figure 2-1 Block Diagram of the SQuad
Figure 2-2 Main outcomes of SQuad-LQ
Figure 2-3 Typical MOS-LQ values for Different Codecs
Figure 2-4 Frequency Shift
Figure 2-5 Histogram of Noised Speech Sample
Figure 2-6 Level Chart with AGC
Figure 2-7 Similarity Chart with Impulsive Noise
Figure 2-8 Background Noise
Figure 2-9 Level Chart with Handover
Figure 2-10 Time Clipping
Figure 2-11 Variable Delay, Voice Jitter
Figure 2-12 Example for Variable Delay, which shows that Block B is delayed for 244 samples
to the left (arrives earlier when compared with the same reference block). Block B arrives later by
244 samples.
Figure 2-13 Frequency Shift
Tables
Table 2-1 Example for Variable Delay where five blocks are delayed at different offsets regarding
the Reference Speech Sample
Table 3-1 Energy Windows used in Calculation of SPLR and PLR
Table 4-1 DTMF Result Code
Table 7-1 One Way Delay Quality Classes
Introduction
This document describes the parameters that are measured with the SwissQual QoS
Measurement System. It also briefly describes the algorithms used and gives some background
information on the causes of different kinds of quality degradation. The screenshots
are taken from NQDI, SwissQual's post-processing system.
| Chapter 1
Introduction
For network operators and equipment manufacturers, it is important to know where and why
speech quality is degraded. Since speech quality is a major factor in customer
satisfaction, encoding techniques must be designed for optimal speech quality. To assess
the quality of speech encoding techniques, large-scale auditory tests are commonly employed.
However, results obtained in this way are practically impossible to reproduce. Furthermore, such
results depend on the motivation of the individual test candidates. It is therefore a
big advantage to have an instrumental method that physically measures speech quality
parameters and produces results that correlate as closely as possible with subjectively
acquired results. The perfect transmission of speech via a telecommunications channel with a
bandwidth of 0.3 - 3.4 kHz results in a sentence intelligibility of approx. 98%. The speech coders
introduced for handsets in digital mobile radio networks impair intelligibility further.
Speech quality is a vague term compared with bit rate, echo or loudness. Since customer
satisfaction can be measured directly by the quality of the transmitted speech, encoding
techniques must be selected and optimized based on their speech quality.
SQuad Method
SQuad consists of three main parts. First, a pre-processing unit adjusts the reference and coded
samples. Then, an auditory model reduces both samples to their perceptually relevant
features. Finally, an assessment unit evaluates the perceptual difference between the reference and
coded samples and outputs the result as a MOS value.
A speech sample is transmitted over a line with a generally unknown combination of speech coders.
This speech sample is available in digital form. The sampling frequency is 8 kHz and the digital
quantization is 16 bits. As an initial step, the source speech signal is read into the vector x(i) and
the coded speech signal into the vector y(i). These speech signals are synchronized with respect
to both time and level. The DC offset is removed from every sample. In addition, the signals
are normalized to a common RMS (Root Mean Square) level, so that a constant
amplification factor is not taken into account.
The signals are split into processing units of 32 ms duration, also called frames. The units overlap
by 50%. During the first processing step, each frame is multiplied by a Hamming window. The
source signal x(t) in the time domain is then transformed to the frequency domain using a discrete
Fourier transform, followed by computation of the squared-magnitude FFT spectrum. Both signals
are filtered using a filter equivalent to the receiving curve of the corresponding telephone handset.
A rough approximation of time masking is already achieved through the frame overlapping
during signal pre-processing. The comparison method of SQuad is based on the following
principle: signal parts with high energy are more important for the perceived speech quality. A
similarity coefficient for the reference and impaired signals is computed for 4 different energy
thresholds. Only the parts of the signal exceeding the respective threshold are considered. This
can be viewed as a multi-resolution analysis with respect to signal energy. The overall
similarity is then computed using the coefficients from all thresholds. A polynomial is used to
transform the comparison result to the ITU MOS scale. The length of the speech sample varies
between 4 and 30 seconds.
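The pre-processing and framing steps described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the SQuad implementation: the function names are hypothetical, unit RMS is chosen arbitrarily as the common level, and the handset receive filter and the auditory model are left out.

```python
import numpy as np

FS = 8000          # sampling frequency in Hz, as stated in the text
FRAME_MS = 32      # frame duration in ms, as stated in the text


def normalize(signal):
    """Remove the DC offset and scale to unit RMS (unit RMS is an assumption)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    return x / rms if rms > 0 else x


def power_spectra(signal, fs=FS, frame_ms=FRAME_MS):
    """Return the squared-magnitude spectrum of each Hamming-windowed frame."""
    x = np.asarray(signal, dtype=float)
    n = int(fs * frame_ms / 1000)       # 256 samples per frame at 8 kHz
    hop = n // 2                        # 50% overlap between frames
    window = np.hamming(n)
    starts = range(0, len(x) - n + 1, hop)
    frames = np.array([x[s:s + n] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

For a one-second signal at 8 kHz this yields 61 frames with 129 frequency bins each, that is, a bin spacing of 31.25 Hz.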
Figure 2-1 Block Diagram of the SQuad (blocks: network under test; IRS filtering and background
noise detection; time and level alignment; frequency equalization; psychoacoustic modelling of the
reference and the degraded signal; listening-only quality estimation; other measured data such as
round-trip delay, jitter, echo and call setup quality; overall audio quality)
MOS Rating
Speech quality is defined as a measure of a listener's satisfaction and is generally expressed as
a Mean Opinion Score (MOS). SQuad delivers the MOS rating as one number, ranging from 1 to 4.5,
in accordance with the Listening Scale defined in ITU's P.800 recommendation. This is not exactly
the same range as the MOS, which is defined from 1 to 5. This is acceptable because, in the subjective
tests used for the validation of SQuad-LQ, values above 4.5 almost never appeared.
As described in Annex B.4.5 of ITU's P.800 recommendation, various five-point category-judgment
scales may be used for different purposes. The Listening Only quality scale is the most
frequently used for ITU-T applications:
Score   Quality of the speech
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad
The following picture gives an overview about the obtained results in the main section of NQDI:
| Chapter 2
Codec            Typical MOS Value   Typical SQuadLQ
G.711            4.3                 4.4
G.729            3.8                 3.7
G.723.1 (6.3)    3.5                 3.5
GSM-EFR          4.0                 3.9
GSM-HR           3.4                 3.3
AMR 12.2         4.0                 3.9
AMR 7.4          3.8                 3.7
AMR 4.75         3.4                 3.4
Channel Gain
This is a value in dBr that shows the power level of the received signal relative to the
reference (input) signal. Because SQuad-LQ is applied to the electrical interfaces of the
connection, the terminal-dependent Send Loudness Rating (SLR) and Receive Loudness
Rating (RLR) are modelled in SQuad-LQ itself. In principle, SQuad-LQ is connected to the
so-called 0 dBr point of the network's input. At this 0 dBr point, a nominal level of -26 dBov
(corresponding to -20 dBm at a four-wire 600 Ohm interface) is inserted.
The Channel Gain reflects only gains or attenuations caused by the network (exception: attenuating
PSTN subscriber loops). It is close to the so-called JLR (Junction Loudness Rating) but does not
apply any spectral weighting.
In a transparent ISDN connection, the Channel Gain should be around 0 dB. In principle, this value
should also be around 0 dB in a Mobile-to-ISDN or Mobile-to-Mobile connection. It may differ
because cellular network providers apply individual signal amplification. Mostly they
amplify the signal, so a gain in the positive range can be observed. If an overall gain of 6 dB is
exceeded, amplitude clipping may occur. As in a real call, this affects quality and
results in a lower SQuad-LQ score.
Conversely, an attenuating PSTN subscriber loop may lead to negative Channel
Gains because it is part of the evaluated transmission chain. Like a PSTN phone, which is more
sensitive, SQuad-LQ internally amplifies such attenuated signals to a nominal level of -26 dBov
(corresponding to 79 dB sound pressure level at the subscriber's ear).
To inform the user, NQDI highlights Channel Gains outside the expected range.
The expected range is +6 dB to -9 dB, with an extended range down to -15 dB.
The Channel Gain is available as a single overall value in dBr (total gain) and also as a series of
values in the time domain (every 16 ms), like an attenuation profile. Based on this attenuation
profile, a chart can be created that provides information on:
AGC (Adaptive Gain Control) Elements that are not working correctly
Level Jumps (for example after a handover)
Level Interruptions (for example interruptions in the audio path or during handovers)
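As an illustration of the overall gain and the 16 ms attenuation profile, the following Python sketch computes the power ratio of the received signal to the reference in dB. It assumes that both signals are already time-aligned and sampled at 8 kHz. The function names are hypothetical, and the sketch does not model the SLR/RLR handling or any other internal step of SQuad-LQ.

```python
import numpy as np


def gain_db(reference, received):
    """Power of the received signal relative to the reference, in dB."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(received, dtype=float)
    eps = 1e-12                              # avoids log of zero on silent input
    return 10.0 * np.log10((np.mean(rec ** 2) + eps) / (np.mean(ref ** 2) + eps))


def gain_profile_db(reference, received, fs=8000, step_ms=16):
    """Gain per 16 ms segment, similar in spirit to an attenuation profile."""
    n = int(fs * step_ms / 1000)             # 128 samples per segment at 8 kHz
    count = min(len(reference), len(received)) // n
    return np.array([
        gain_db(reference[i * n:(i + 1) * n], received[i * n:(i + 1) * n])
        for i in range(count)
    ])
```

A received signal with twice the amplitude of the reference gives a gain of about +6 dB in every segment; a level jump or an interruption would show up as a step or a deep notch in the profile.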
Clipping
Temporal speech clipping (also called front-end clipping) is the loss of speech frames. It may
occur when voice activity detection is used, when Digital Circuit Multiplication Equipment
(DCME) is used, or during uncontrolled slips. Time clipping is presented as clipped frames as a
function of time.
Clipping is an annoying phenomenon that cuts off a bit of speech in the instant it takes for the
transmitter to detect the presence of speech. It is almost impossible to eliminate clipping in a
traditional circuit-switched voice conversation. With circuit switching, the transmitter is not turned
on until sound is detected, and by then a piece of the speech has been clipped off. SQuad
detects this clipping and reports the results as a distribution over time. The resolution of the
clipping measurement is 8 milliseconds. First, the mean energy per 8 milliseconds is calculated.
The energy values are then saved for each frame (both reference and coded). After the whole
speech sample has been processed, the time clipping data is post-processed. A few
simple rules apply during this post-processing:
Time clipping can only occur during pause-to-speech transitions.
A minimum pause length must be reached. In our case, it is 64 milliseconds.
The difference Energy(ref) - Energy(cod) must be at least 10 dB.
Clipped frames are consecutive frames.
The clipping measurement values are given as an average percentage per sample (number of
clipped speech frames / number of active speech frames) and as a time domain distribution. Time
clipping in SQuad-LQ is calculated every 8 ms, but only the average value of two consecutive
frames is reported in the output file.
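The post-processing rules can be illustrated with a short Python sketch. It is a simplified reading of the rules, not the SQuad code: the speech/pause decision uses a caller-supplied energy threshold (`speech_thr_db`, a hypothetical parameter), both signals are assumed to be time-aligned at 8 kHz, and all function names are invented for this example.

```python
import numpy as np

FRAME = 64                # 8 ms at 8 kHz
MIN_PAUSE_FRAMES = 8      # 64 ms minimum pause
MIN_DIFF_DB = 10.0        # required Energy(ref) - Energy(cod)


def frame_energy_db(signal, frame=FRAME):
    """Mean energy per 8 ms frame, in dB."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // frame
    energy = np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1)
    return 10.0 * np.log10(energy + 1e-12)


def clipped_frames(reference, coded, speech_thr_db):
    """Return boolean arrays (clipped, active), one entry per 8 ms frame."""
    e_ref = frame_energy_db(reference)
    e_cod = frame_energy_db(coded)
    n = min(len(e_ref), len(e_cod))
    active = e_ref[:n] > speech_thr_db        # assumed speech/pause decision
    clipped = np.zeros(n, dtype=bool)
    pause_run = 0
    for i in range(n):
        if not active[i]:
            pause_run += 1
            continue
        if pause_run >= MIN_PAUSE_FRAMES:     # pause-to-speech transition
            j = i
            # clipped frames are consecutive frames right after the pause
            while j < n and active[j] and e_ref[j] - e_cod[j] >= MIN_DIFF_DB:
                clipped[j] = True
                j += 1
        pause_run = 0
    return clipped, active


def clipping_percent(reference, coded, speech_thr_db):
    """Clipped frames as a percentage of the active speech frames."""
    clipped, active = clipped_frames(reference, coded, speech_thr_db)
    return 100.0 * clipped.sum() / max(int(active.sum()), 1)
```

In this sketch, a coded signal that loses the first 24 ms of a 128 ms speech burst following a long pause yields three clipped frames out of sixteen active frames.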
DC-Offset
This number shows the DC offset of the coded signal as a percentage. This is an important piece of
information if the measured speech quality is lower than expected. Various interface problems
(impedance, coding technique, HW) can produce DC-offset discrepancies.
The DC offset is calculated as
100 * average_audio_voltage / Max_audio_voltage
Max_audio_voltage for 16 bit digital resolution is equal to 2^15 (32768).
For example: average_audio_voltage=300 results in DC_Offset=100*300/32768, which is approximately 0.92%
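The formula translates directly into code. The sketch below is a minimal Python version for 16-bit samples; the function name is hypothetical.

```python
import numpy as np

MAX_AUDIO_VOLTAGE = 2 ** 15   # full scale for 16 bit digital resolution


def dc_offset_percent(samples):
    """DC offset as a percentage of full scale."""
    return 100.0 * float(np.mean(samples)) / MAX_AUDIO_VOLTAGE
```

Calling it on a block whose average sample value is 300 reproduces the example above (a little over 0.9%).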
Frequency-Shift
A low bit rate encoder can move the formants (spectral peaks) of the speech. This degradation
can be described as a frequency shift of one or more components of the source signal. This
drift is measured as a percentage of moved frequency components in the speech active phases.
The result is the number of positively and negatively shifted frames in %, evaluated on a compressed
frequency scale (Bark). Figure 2-4 shows a typical situation for one processing buffer of the voice signal
(32 ms).
For the detection of the frequency shift, the peaks above the loudness threshold in both
the reference and degraded signals are analyzed. The threshold for compressed loudness is set to
10. The position of each peak in the reference is compared with the position of the peak in the
coded signal (within +/- 1 Bark). A frequency shift is found if the two peaks are not at
the same location. The amplitudes of the coded and reference loudness need not be equal, but both
must be above the threshold value. This is permissible because the level and frequency alignment is
done beforehand in a separate module.
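The peak comparison for a single frame might look like the following Python sketch. It is an illustration under several assumptions that the text does not specify: the loudness values are given on a Bark-spaced grid with `bins_per_bark` points per Bark (a hypothetical parameter), a peak is a simple local maximum, and a frame is classified by the first shifted peak that is found.

```python
import numpy as np

LOUDNESS_THRESHOLD = 10.0   # compressed-loudness threshold from the text


def peaks_above(loudness, threshold=LOUDNESS_THRESHOLD):
    """Indices of local maxima whose loudness exceeds the threshold."""
    l = np.asarray(loudness, dtype=float)
    return [i for i in range(1, len(l) - 1)
            if l[i] > threshold and l[i] > l[i - 1] and l[i] >= l[i + 1]]


def frame_shift(ref_loudness, cod_loudness, bins_per_bark=4):
    """Return +1 or -1 for a shifted frame, 0 if no shift is found."""
    cod_peaks = peaks_above(cod_loudness)
    for p in peaks_above(ref_loudness):
        # only coded peaks within +/- 1 Bark of the reference peak are compared
        near = [q for q in cod_peaks if abs(q - p) <= bins_per_bark]
        if near:
            q = min(near, key=lambda c: abs(c - p))
            if q != p:
                return 1 if q > p else -1
    return 0
```

Counting the +1 and -1 results over all speech active frames and dividing by the number of such frames gives percentages of positively and negatively shifted frames.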
Typical Network Elements that are responsible for frequency shift are:
Very low bit rate vocoders
Speech enhancer (Noise suppressors)
Non linear filter elements
Speech Threshold
This is a value in dBov that shows the level of the speech in a coded signal. The measurement
is based on building r.m.s. histograms for both the coded and reference signals. dBov means
decibels relative to the digital overload point. The range for this value is -90 to 0 dBov. For signals
containing background noise, this value is between -55 and -40 dBov.
A histogram counts how often the values of a data set fall into each of a set of data bins. The result
is the number of occurrences per bin. A histogram table presents the energy-grade boundaries and
the number of scores between the lowest bound and the current bound.
Figure 2-5 Histogram of Noised Speech Sample (energy histogram for a noisy signal; x-axis: RMS of
the coded signal in dB, from -15.0 to -53.0; y-axis: count, 0 to 25; marked positions: noise position,
bound position, speech level)
Figure 2-5 shows an example of a histogram for a noisy speech signal of 10 seconds duration.
In SQuad-LQ, the histogram is internally represented with 50 bins between the minimum and
maximum r.m.s. values. There are two maxima, one for speech pauses and one for speech active
intervals. In our example, the first maximum is found at -45.1 dB, which is the level of the silent intervals.
The second peak is at about -26 dB, which is the speech active level. The speech threshold measured in
SQuad-LQ is defined as the boundary between these two peaks (Bound position).
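A histogram-based threshold of this kind can be sketched in Python as follows. The text does not say how the boundary between the two peaks is chosen, so this sketch makes two assumptions: the noise peak lies in the lower half of the bins and the speech peak in the upper half, and the boundary is the least populated bin between the two peaks. The function name is hypothetical.

```python
import numpy as np


def speech_threshold_db(rms_db, bins=50):
    """Estimate the boundary between the noise peak and the speech peak."""
    counts, edges = np.histogram(np.asarray(rms_db, dtype=float), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    half = bins // 2
    noise_peak = int(np.argmax(counts[:half]))            # assumed: lower half
    speech_peak = half + int(np.argmax(counts[half:]))    # assumed: upper half
    # assumed rule: least populated bin between the two maxima
    valley = noise_peak + int(np.argmin(counts[noise_peak:speech_peak + 1]))
    return float(centers[valley])
```

For frame levels clustered around -45 dB (pauses) and -26 dB (speech), the returned value lies between the two clusters.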
Degradations
The list below presents some possible causes of a degraded Listening Quality value when
a clean reference sample is used:
AGC (Adaptive Gain Control) Elements
Speech Enhancer / Noise Suppressors
Impulsive Noise
Background Noise
Interruptions
VAD (Voice Activity Detectors)
Variable Delay or Jitter in Packet Networks
AGC Problems
Indications: LQ less than expected. Level Chart indicates an abnormal level trend.
Example of an AGC of a mobile handset that attenuates too strongly toward the end of a sample:
Impulsive Noise
Indications: LQ less than expected. Similarity Chart shows many large degradation peaks.
Background Noise
Indications: LQ less than expected. Signal Envelope Chart shows some additional energy
during the speech pause.
Example:
Interruptions
Indications: LQ less than expected. Similarity Chart shows blue bars and Signal Envelope
indicates a peak drop.
Example of an interruption due to a handover (the interruption is indicated in blue):
$$\mathrm{Interruption\_result} = \frac{\mathrm{Nr\_Of\_IntFrames}}{16}$$
The interruption result lies in the range 0 to 1 with a step of 1/16. If only one sub-frame of the signal
is lost (interrupted), then Interruption = 1/16 = 0.0625. If the signal is lost in all sub-frames,
then Interruption = 1.
Delays Deviation
DelaysDeviation is placed in the section !SQuad_LQ_AVG (in the SQuad result file) and is
defined as the absolute value of the standard deviation of the block delays (D), divided by the average
of the block delays [in samples]. The duration of one sample at a sampling frequency of 8000 Hz is 125
µs. DelaysDeviation shows the smoothness of an array of delays. A small DelaysDeviation
value means that the delay distribution is uniform, whereas a large value indicates a large delay jitter,
as in IP networks. For a single delay, this value is equal to zero.
$$\mathrm{DelaysDeviation} = \mathrm{fabs}\left(\frac{\mathrm{stdev}(D)}{\mathrm{average}(D)}\right)$$
Example: The coded file has a fixed offset of 1024 samples relative to the reference file. Six blocks with
different variable delays are found with SQuad-LQ:
Table 2-1 Example for Variable Delay where five blocks are delayed at different offsets regarding the Reference
Speech Sample
Block   Delays (D) in samples
All     1024
        780                     -244
        1024                    244
        1536                    512
        1024                    -512
        1304                    280
$$\mathrm{Stdev}(D) = \sqrt{\frac{N \sum D^2 - \left(\sum D\right)^2}{N^2}} = 241.53$$
$$\mathrm{average}(D) = \frac{1}{N} \sum D = 1115.3$$
DelaysDeviation=241.53/1115.3=0.217
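The numbers of this example can be checked with a few lines of Python. The quoted value of 241.53 corresponds to the population standard deviation (division by N) over all six delay values, which is what `np.std` computes by default. The function name below is hypothetical.

```python
import numpy as np

# The six delay values of the example, in samples
D = np.array([1024, 780, 1024, 1536, 1024, 1304], dtype=float)


def delays_deviation(delays):
    """Absolute value of stdev(D) / average(D), with the population stdev."""
    d = np.asarray(delays, dtype=float)
    return abs(np.std(d) / np.mean(d))


print(round(float(np.std(D)), 2), round(float(np.mean(D)), 1),
      round(float(delays_deviation(D)), 3))
```

For a single delay the standard deviation is zero, so the result is zero, as stated in the definition.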
Delay Spread is another important parameter, which describes the maximum delay
amplitude calculated over all single group delays. Based on the example above, we can calculate
new delay values, which are the D values scaled by subtracting a fixed delay from all other
values.
Figure 2-12. Example for Variable Delay, which shows that Block B is delayed for 244 samples to the left
(arrives earlier when compared with the same reference block). Block B arrives later by 244 samples.
Delay Spread is calculated as the distance between the minimum and the maximum block delay. In
our example, the minimum value is -512 samples and the maximum is +512 samples. So the distance
between max and min equals 1024 samples. This is then converted to time in ms.
$$\mathrm{DelaySpread} = \left(D_{\max} - D_{\min}\right) \cdot \frac{1}{F_s}, \qquad F_s = 8000\ \mathrm{Hz}$$
In the calculation for our example, we get DelaySpread = 1024/8000 s = 0.128 s = 128 ms.
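As a quick check of this conversion, the following Python sketch computes the spread from the relative delay values of the example (-244, 244, 512, -512 and 280 samples). The function name is hypothetical.

```python
def delay_spread_ms(relative_delays, fs=8000):
    """Distance between maximum and minimum delay, converted to milliseconds."""
    return 1000.0 * (max(relative_delays) - min(relative_delays)) / fs


spread = delay_spread_ms([-244, 244, 512, -512, 280])   # 1024 samples at 8 kHz
```

With a span of 1024 samples at 8000 Hz, `spread` evaluates to 128 ms.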
Frequency Shifts
The distribution of the frequency shifts is shown in the histogram below, with the number of
frames in which a shift at a certain frequency occurred. The diagram covers the whole range of
frequencies in steps of 31.25 Hz.
Quality Code
The thresholds for each degradation descriptor are as follows:
MOS-drops
Quality distribution is unsteady, such as during handovers or interruptions.
Received signal level out of recommended range
The level difference to the reference level exceeds +9 dB or falls below -12 dB.
Signal interruptions
Temporal clipping for more than 8 ms.
High DC-Offset
Malfunction of terminal or interface card. DC-Offset > 0.2%.
Variable delay
Indicates possible packet-switched transmission.
Variable delay during speech
Same as Variable delay, but occurring during speech active intervals.
Background noise
High level of circuit noise. Higher than -50 dBov.
Impulse noise
Relay/switching problems detected. More than 1 pulse / second.
Low bitrate coding / coding artefacts
A low bit rate coding scheme has been used (e.g. less than 8 kbit/s) or residual errors from
decoding are introduced (e.g. by frame loss concealment).
Not Specified
Signals that the speech quality is degraded but no outstanding reason for the degradation
could be classified.
OK
Shows that the speech quality is nearly non-degraded.
Furthermore, special problems in the audio path are reported:
Silence/Audio Level Too Low
There is no signal activity in the audio path or the signal level is below -45 dBov. SQuad-LQ is
not calculated, since it would lead to misleading results.
Corrupted Signal/Wrong Reference
Here the received audio signal is heavily corrupted (e.g. only partly transmitted, or the audio
stream was lost completely). Such behaviour can be observed, for example, during a call drop. Normally,
SQuad scores those signals close to 1.0. For statistical reasons, NQDI allows such results to be
excluded from the reporting.
This indicator also signals that a wrong reference signal was used for SQuad.
Both results are based on the same algorithm. The transformation according to P.862.1
describes only a scale mapping.
Figure: mapping of P.862 scores (x-axis, -0.5 to 5.0) to P.862.1 scores (y-axis, 1.0 to 5.0), with the
scale limit at 4.5.
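For reference, the scale mapping published in ITU-T P.862.1 is a logistic function of the raw P.862 score. The Python sketch below uses the coefficients from that recommendation; the function name is hypothetical, and the sketch says nothing about how NQDI applies the mapping internally.

```python
import math


def p862_1_mos_lqo(p862_score):
    """Map a raw P.862 score (-0.5 to 4.5) to the P.862.1 MOS-LQO scale."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * p862_score + 4.6607))
```

The mapping is monotonic: the lowest raw score of -0.5 maps to about 1.02 and the highest raw score of 4.5 to about 4.55.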
Note:
The P.862 results tend to be slightly lower than those of SQuad-LQ, especially in the range from
3.0 to 4.0. This is mainly caused by the high sensitivity of P.862 to clipping and time-variant filtering.
It also has to be taken into account that P.862 does not rate linear distortions such as frequency
responses. P.862 itself compensates those linear distortions completely before the
quality prediction starts.
The P.862 option has to be enabled by the test-type 'Speech-P.862' and requires a special
software key.
Introduction
Noise suppression is a feature designed to enhance speech quality in
environments with significant (acoustic) background noise. The noise suppression
function is a pre-processing module that improves the signal-to-noise ratio of a speech
signal prior to voice coding.
For noise suppressors, there are certain requirements that need to be fulfilled:
The noise suppression function must not have a statistically significant distorting effect on
clean speech in comparison with the performance of the speech codec without noise
suppression applied.
The noise suppression function must not introduce any degradation of speech and no
undesirable effects in the residual noise when there is (acoustic) background noise in the
speech signal.
DTMF and other signalling tones transmission performance during the application of noise
suppression shall be no worse than when noise suppression is turned off.
The above requirements are all checked with the SQuad Noise Suppression test.
The algorithm measures the Noise Power Level Reduction (NPLR) and the Signal-to-Noise Ratio
Improvement (SNRI), similar to the definitions in the ETSI STC SMG11 (GSM 06.77) document. A
comparison of the SNRI and NPLR measures is used to obtain an indication of possible
speech distortion produced by the tested NS method.
For the Noise Suppression test, two reference signals are used:
Clean speech reference
Clean speech with background noise
The sample with background noise is sent as a test sample.
Listening Quality
Speech quality is measured according to ITU's P.800, where the coded file and the clean
reference are the inputs for the SQuad-LQ algorithm. The algorithm is elaborated in Section 2.
The Listening Quality evaluation runs twice: the LQ of the noisy input signal is estimated,
and so is the LQ of the de-noised output signal. The change in speech
quality is derived from both results.
| Chapter 3
First, the internal reference (MOS_ref) is calculated. The degraded signal is assessed by comparing it
with the clean reference. The result is presented on the CCR scale by subtracting MOS_ref from the
measured MOS.
Speech quality measurement in a noisy environment is done by sending the noisy reference
through the network under test. The noisy reference is made by adding a noise signal to the clean
reference. Comparing the clean reference with the coded signal would not produce stable results,
since the SNR of the noisy reference affects the results. To make this measurement
independent of the reference properties, MOS_ref is calculated first. MOS_ref defines the
reference speech quality, that is, the quality that would be measured in the degraded signal if the
network introduced neither degradations nor improvements. This value is mostly lower than 4.5 (excellent
quality) because of the influence of the noise. The range of MOS generated by SQuad-NS is -3.5 to +3.5,
which is slightly different from the ITU definition.
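The subtraction can be illustrated with a small Python sketch. The difference itself follows the description above; the helper that maps the difference to the nearest CCR category of ITU-T P.800 is an assumption added for illustration only, and both function names are hypothetical.

```python
# Category labels of the CCR scale in ITU-T P.800
CCR_LABELS = {
    3: "Much Better", 2: "Better", 1: "Slightly Better", 0: "About the Same",
    -1: "Slightly Worse", -2: "Worse", -3: "Much Worse",
}


def ccr_result(mos_measured, mos_ref):
    """Result on the CCR scale: measured MOS minus the internal reference."""
    return mos_measured - mos_ref


def nearest_ccr_label(value):
    """Assumed helper: round to the nearest category and clamp to -3..+3."""
    return CCR_LABELS[max(-3, min(3, round(value)))]
```

Because both MOS values lie between 1 and 4.5, the difference can range from -3.5 to +3.5.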
Comparison Category Rating (CCR)
The range of the Comparison Category Rating (CCR) scale as defined in ITU P.800:
 3: Much Better
 2: Better
 1: Slightly Better
 0: About the Same
-1: Slightly Worse
-2: Worse
-3: Much Worse
The CCR methods are particularly useful for assessing the performance of telecommunications
systems when the input has been corrupted by background noise. An advantage of the CCR
method over the other scales is the possibility to assess speech processing that either degrades
or improves the quality of the speech.
Range Description
Level class
noise
pause frames
-2
unused
Figure 3-2 The Five Energy Windows for 16 bit Digital System (90.3 dB dynamics)
Range   Description / Quality
< 0     worse
> 0     better
= 0     no improvement

Range   Description / Quality
= 0     no noise reduction
< 0     good
> 0     bad
The definition of the windows is given in Table 3-1. The aim of this measurement is to detect the
influence of noise reduction circuits on the speech parts of the signal.
Five SPLR values are calculated: SPLRh, SPLRm, SPLRl, SPLRn and SPLRp. SPLRn is
equal to the NPLR value. Good noise reduction would generate SPLRh close to zero and
SPLRp below -10 dB. The downward trend curve through these five values shows the quality and
ability of the noise reduction circuit to reduce only the noisy frames and to leave the speech
active frames unchanged. In other words, the first coefficient (a) of the trend curve y=ax+b must be negative
(see example in Figure 3-4). The SPLR measure in the SQuad-NS algorithm is equal to this coefficient
(a) of the trend curve.
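A linear trend through the five values can be fitted with a least-squares line, as in the Python sketch below. The x positions 0 to 4 for the high, medium, low, noise and pause windows are an assumption, since the text does not state the spacing used for the trend curve; the function name is hypothetical.

```python
import numpy as np


def splr_trend(splr_h, splr_m, splr_l, splr_n, splr_p):
    """Fit y = a*x + b through the five SPLR values; return (a, b)."""
    y = np.array([splr_h, splr_m, splr_l, splr_n, splr_p], dtype=float)
    x = np.arange(5, dtype=float)      # assumed equal spacing of the windows
    a, b = np.polyfit(x, y, 1)
    return float(a), float(b)


# Illustrative values in dB: speech untouched, pauses strongly attenuated
a, b = splr_trend(0.0, -2.0, -5.0, -8.0, -12.0)
```

For these illustrative values the slope `a` is -3.0, that is, negative as required for good noise reduction.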
Figure 3-4 SPLR Calculation out of Five Values Calculated in Five Different Energy Windows.
The bottom picture shows good noise reduction, whereas on the right is shown poor noise
reduction.
SPLR is then mapped to a new range of 1 to 4.5 (like the MOS scale). This mapping from SPLR to
SPLRm is shown in Figure 16. SPLRm > 2.5 should be achieved for good noise reduction.
Overall NS Quality
Quality Index
The Quality Index is calculated from four input parameters: NPLR, SNRI, SPLR
and MOS_acr.
The Quality Index was first introduced in SW Release 2.2. Four values (SMOS, SNRI, NPLR and
SPLR) are combined into one objective number. SMOS is measured with SQuad-LQ, where the
clean reference is compared with the coded signal. The range of the Quality Index is 1 to 4.5 (as for
MOS). A rating of 1 stands for bad quality and 4.5 for excellent quality. The following equation
shows the calculation of the Quality Index based on the four input parameters, previously scaled into the
range 1-4.5.
=1
Note:
The Quality Index describes the performance of the noise
reduction system in combination with the network and not the Listening
Quality of the de-noised signal.
The following table shows some measurement examples for different network conditions including
noise reduction effects:
The Quality Index correlates much better with MOS_CCR than MOSobj based only on speech
quality evaluation.
Convergence Time
To measure the Convergence Time in a noisy signal, the algorithm examines the first
two seconds of the given signal. For the calculations it uses the filtered difference between the
coded and the reference signal (red color, see Figure 3-10).
Figure 3-10 Example of Convergence Time Evaluation
First it checks whether the signal belongs to the noise or pause group, and then it compares the data
with the set threshold. The threshold is calculated as NPLR plus 25 percent (default, use PERCENT to
change) of the difference between the maximum value of the filtered signal in these first 2
seconds and the noise level afterwards (NPLR). If the filtered data is lower than the threshold, the first
condition for convergence is fulfilled (see Figure 3-11).
Figure 3-11 Filtered Difference Envelope is Compared with the Threshold Value (plotted: filtered
difference, convergence point and threshold)
The second condition is that the signal has a falling tendency. To verify this, we check 5 (default,
use CT_NR_POINTS to change) equally spaced points over the tested convergence time. For
a falling signal, the difference in values between every two consecutive points has to be less
than zero. In Figure 3-12, the difference between the signal values at the third and fourth points is
greater than 0, which signifies a rising tendency of the signal. Here we perform an additional check to
clarify what is actually going on.
Figure 3-12 Five Point Analysis of the Difference Envelope during Decision on Noise Reduction State
This test is based on the average level of the signal before and after the first convergence
criterion is met. If the average level of the signal after falling below the threshold is less than that
threshold, and the average level of the signal before that point is higher than the same threshold,
we say that the signal has converged. If not, the algorithm continues searching for convergence
until the end of the 2-second buffer.
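The convergence search described in this section can be sketched in Python as follows. This is a simplified reading, not the SQuad-NS code: the input is assumed to be the filtered difference envelope in dB over the first two seconds, the noise/pause classification of frames is omitted, and the function names are hypothetical. The parameters `percent` and `points` play the role of the PERCENT and CT_NR_POINTS settings.

```python
import numpy as np


def convergence_threshold(filtered, nplr, percent=25.0):
    """NPLR plus a percentage of the distance from NPLR to the envelope maximum."""
    return nplr + percent / 100.0 * (float(np.max(filtered)) - nplr)


def is_falling(filtered, end, points=5):
    """True if the envelope decreases between equally spaced check points."""
    idx = np.linspace(0, end, points).astype(int)
    return bool(np.all(np.diff(np.asarray(filtered, dtype=float)[idx]) < 0))


def convergence_index(filtered, nplr, percent=25.0, points=5):
    """Index of the first sample accepted as converged, or None."""
    f = np.asarray(filtered, dtype=float)
    thr = convergence_threshold(f, nplr, percent)
    for i in range(1, len(f)):
        if f[i] >= thr:                       # first condition not met
            continue
        # additional check: average level before and after the crossing
        settled = f[:i].mean() > thr and f[i:].mean() < thr
        if is_falling(f, i, points) or settled:
            return i
    return None
```

The returned index refers to the time resolution of the envelope and can be multiplied by the envelope step to obtain the convergence time.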
Noise suppression and noise reduction are both used to enhance speech quality in
environments with significant (audible) background noise (see Fig. 24).
Noise suppression reduces the noise in the pause and noise power classes and has very little or
no influence on the higher power classes (speech active intervals). By contrast, noise
reduction reduces the noise equally in all power classes. We have therefore based our algorithm
on measuring the difference between the signal power level reduction in the high and medium
speech power classes. If the difference between the two levels is less than a calculated threshold, we
say that noise suppression was applied.
The level of noise in the reference signal plays an important role. If the signal-to-noise ratio of the
reference signal is higher than 30 dB, the reference signal is not suitable for the
measurement because its noise level is too low. The same reference SNR is used for calculating the
threshold offset between the SPLRs of the high and medium power classes.
Examples
The Signal Envelope shows that the noise is really reduced and the speech part is more or less
the same as for the reference signal.
Besides the pure LQ, the speech and noise levels are also shown. Furthermore, the clipping
value can be used to evaluate non-linear processing by the NS device.
DTMF Tests
Introduction
In telecommunications today, the most used signalling system is DTMF signalling. DTMF stands
for Dual Tone Multi-Frequency. As the name suggests, the DTMF signal consists of two
superimposed sinusoidal waveforms with frequencies chosen from a set of eight standardized
frequencies.
When a DTMF signal is sent over a network it can be degraded, especially when it is encoded.
For an operator of a network, it is of interest to know if the receiver of the DTMF signals can
convert the DTMF signal back into a digit or a symbol. The objective is to measure the percentage
of detected and undetected DTMF digits.
In the first part of SwissQual's DTMF test algorithm, the algorithm scans through a given signal
and detects the locations of DTMF signals. Once a DTMF signal is found, the algorithm calculates
its characteristics and decides whether the signal is valid. If the tone is invalid, the DTMF-Test
reports which condition was not met. The algorithm collects all characteristics and saves them
in a file.
The DTMF signal used for the tests consists of two frequencies. According to CCITT
Recommendations Q.23 [5] and Q.24, there are two frequency groups, each with four frequencies:
Low group: 697 Hz, 770 Hz, 852 Hz, 941 Hz
High group: 1209 Hz, 1336 Hz, 1477 Hz, 1633 Hz
The figure below shows how the frequencies are allocated to the various digits and symbols of a
push-button set. Every digit and symbol consists of a frequency from the low and the high group.
Figure 4-1 Allocation of Frequencies to the Various Digits and Symbols of a Push-button Set
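As an illustration of this allocation, the following sketch maps a push-button symbol to its frequency pair and synthesizes the corresponding two-tone signal. The frequencies and keypad layout are those of Q.23. The function names, the default duration and the amplitude are assumptions chosen for the example.

```python
import math

LOW_GROUP_HZ = (697, 770, 852, 941)       # one row of the keypad per frequency
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)  # one column of the keypad per frequency
KEYPAD = ("123A", "456B", "789C", "*0#D")

def dtmf_frequencies(symbol):
    """Return (low_hz, high_hz) for a single keypad symbol."""
    if len(symbol) != 1:
        raise ValueError("expected exactly one symbol")
    for row, keys in enumerate(KEYPAD):
        column = keys.find(symbol)
        if column >= 0:
            return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[column]
    raise ValueError("not a DTMF symbol: %r" % symbol)

def dtmf_tone(symbol, duration_s=0.05, sample_rate_hz=8000, amplitude=0.5):
    """Return float samples of the two superimposed sinusoids."""
    low_hz, high_hz = dtmf_frequencies(symbol)
    count = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * 0.5 * (math.sin(2 * math.pi * low_hz * n / sample_rate_hz)
                           + math.sin(2 * math.pi * high_hz * n / sample_rate_hz))
        for n in range(count)
    ]
```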
DTMF-Test Overview
One or more DTMF signals are sent over a network. The coded signal is then used for the DTMF-Test. This signal is available in digital form; the data format is PCM (without compression). The
sampling frequency is 8 kHz or 16 kHz. The digital quantization of the signal can be 8 bit
(unsigned or signed) or 16 bit (big or little endian). Internally, the algorithm works with 16 bit
resolution. The figure below illustrates the basic algorithm of the DTMF-Test. The DTMF-Test
saves its result in a comma-delimited text file.
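The input formats listed above can be brought to a common 16 bit resolution along the following lines. This is a sketch only: the function name, the format labels and the scaling of 8 bit samples by 256 are assumptions and not a description of the actual DTMF-Test code.

```python
import struct

def pcm_to_int16(raw, sample_format):
    """Convert raw PCM bytes to a list of signed 16 bit sample values.

    sample_format is one of the hypothetical labels
    "u8", "s8", "s16le" or "s16be".
    """
    if sample_format == "u8":
        # Unsigned 8 bit: remove the offset of 128, then scale to 16 bit.
        return [(value - 128) * 256 for value in raw]
    if sample_format == "s8":
        return [value * 256 for value in struct.unpack("%db" % len(raw), raw)]
    if sample_format in ("s16le", "s16be"):
        if len(raw) % 2:
            raise ValueError("16 bit data needs an even number of bytes")
        order = "<" if sample_format == "s16le" else ">"
        return list(struct.unpack("%s%dh" % (order, len(raw) // 2), raw))
    raise ValueError("unknown sample format: %r" % sample_format)
```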
Criteria
The objective of the SwissQual model for DTMF testing is to measure the percentage of
undetected DTMF digits processed through the network. The DTMF signals are generated at the
frequencies specified in the ITU-T Rec. Q.23.
The algorithm follows the ETSI guidelines defined in "TS 101 235-1" "Technical Specification of
Dual Tone Multi-Frequency (DTMF)".
The received DTMF signal shall be detected as valid when:
Only two of the signalling frequencies are present, one from the high group and one from the low
group, fulfilling the conditions as described above
Each of these signalling frequencies is within ±(1.5 % + 2 Hz) of the nominal value
The level of each of these two signalling frequencies is within the range -27 dBV to -5 dBV
The difference in level of these two signalling frequencies is not more than 6 dB.
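The frequency, level and twist criteria listed above can be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, and the detector is assumed to have already isolated exactly one component from each group, so the first criterion (only two frequencies present) is not tested here.

```python
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)

def frequency_in_tolerance(measured_hz, group_hz):
    """True if measured_hz lies within ±(1.5 % + 2 Hz) of the nearest nominal value."""
    nominal = min(group_hz, key=lambda f: abs(f - measured_hz))
    return abs(measured_hz - nominal) <= 0.015 * nominal + 2.0

def is_valid_dtmf(freq_low_hz, freq_high_hz, level_low_dbv, level_high_dbv):
    """Apply the frequency, level and twist criteria to one detected tone pair."""
    frequencies_ok = (frequency_in_tolerance(freq_low_hz, LOW_GROUP_HZ)
                      and frequency_in_tolerance(freq_high_hz, HIGH_GROUP_HZ))
    levels_ok = all(-27.0 <= level <= -5.0
                    for level in (level_low_dbv, level_high_dbv))
    twist_ok = abs(level_low_dbv - level_high_dbv) <= 6.0
    return frequencies_ok and levels_ok and twist_ok
```

For the 697 Hz tone, for example, the permitted deviation is 0.015 x 697 + 2, which is about 12.5 Hz.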
Results
Table 4-1 DTMF Result Code
Code
Description
Tone Length
Pause Length
Measured Level
Level Deviation
Freq. Low
Freq. High
DevFreqLow [Hz]
DevFreqHigh [Hz]
DevFreqLow [%]
DevFreqHigh [%]
Twist [dB]
Signal Valid
Signal Match
If a tone in the coded file is too short, or wrong in any other aspect, it
is not matched with the reference tone and the field 'SignalMatch' is set
to the code 'A'
Cause: AdditionalNotRegular (B)
If there are one or more irregular tones with no reference, the field
'SignalMatch' is set to the code 'B'
Cause: MissingTone (C)
If a reference tone has no match but there are two or more irregular
tones, the field 'SignalMatch' is set to the code 'C'
Cause: MultipleMissingTones (D)
If there are two or more reference tones with no match but three or
more irregular tones, the field 'SignalMatch' is set to the code 'D'
Cause: MultipleMissingTones (E)
If there are more reference tones with no match than irregular tones,
the field 'SignalMatch' is set to the code 'E'
Cause: MultipleTone (F)
If a reference tone has two or more matching tones in the coded file,
the field 'SignalMatch' is set to the code 'F'
Cause: AdditionalTone (G)
If there is no reference for a tone in the coded file, the field
'SignalMatch' is set to the code 'G'
Cause: MissingTone (H)
If for a reference tone there is no tone in the coded file, the field
'SignalMatch' is set to the code 'H'
Cause: Disparity (I)
If the number and order for a string of tones in reference and coded
file cannot be matched, the field 'SignalMatch' is set to the code 'I'
Introduction
The Acoustic Echo Check measurement application (SQuad-AEC) can be applied to a SwissQual
measurement probe at the far-end side as well as to any number to which an automatic hook-up
device is connected. SQuad-AEC does not require artificial test signals; it is optimized to
detect echoes using human speech as the measuring signal. It therefore works for all
technologies that serve voice communications, which makes the algorithm ready for
in-service live monitoring.
The SQuad-AEC measurement detects echoes in an active connection by sending a speech
signal to the far-end side and observing the receiving direction for any reflections. If no signal is
inserted at the far-end side, the procedure measures during this single talk situation only.
Because commonly used non-linear processors such as VADs suppress low-power send signals,
echoes will not occur in such connections either. The SQuad-AEC algorithm is therefore also
designed to detect echoes during double talk situations. Such a double talk situation can be
simulated by an actively playing answering station or by using a real phone at the far-end side
and talking during the measurement. This double talk at the far-end side switches the sending
path through, so that the echo can be transmitted as well.
The SQuad-AEC test is designed to detect electrical as well as acoustical echoes. It is able to
detect 'dry' and 'hollow' acoustical echoes as well as hybrid echoes and is more robust
against double talk. If the far-end station is connected by 4-wire, the echoes introduced by
the network are found. By using real (echo-producing) terminals, the echoes caused by the
network and the terminal together can be calculated.
Echo Measurement
The Advanced Echo Check passive test (AEC passive) does not simulate anything on the B-Side. The A-Side starts a call and, after the B-Side has answered the call, the collection of the downlink (B->A) audio stream begins. When the recording of the stream has finished, the search for an
echo signal starts by comparing the recorded signal with the reference signal.
On the B-Side, we can use any (self-answering) voice terminal or a SwissQual Diversity
measurement probe. The AEC algorithm is able to detect echo in presence of background noise
and double talk.
The algorithm runs in two steps:
Observing a wide range of echo delay for possible echoes (scan procedure)
Analysing accepted echo regions in detail for calculating the echo loss and the other results
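To illustrate the idea of a scan over a range of delays, the following sketch uses a plain least-squares gain estimate at each candidate delay. This is a simplified stand-in technique, not the SQuad-AEC algorithm: it assumes a single linear echo, and the function name and parameters are hypothetical. The delay is returned in samples and can be converted to milliseconds with the sampling frequency.

```python
import math

def scan_echo(reference, received, max_delay):
    """Return (delay_in_samples, echo_loss_db) of the strongest linear echo, or None."""
    best_delay, best_gain = None, 0.0
    for delay in range(max_delay + 1):
        count = min(len(reference), len(received) - delay)
        if count <= 0:
            break
        energy = sum(x * x for x in reference[:count])
        if energy == 0.0:
            continue
        cross = sum(reference[i] * received[i + delay] for i in range(count))
        gain = cross / energy  # least-squares gain of the reference at this delay
        if abs(gain) > abs(best_gain):
            best_delay, best_gain = delay, gain
    if best_delay is None:
        return None
    # Echo loss: level of the original signal relative to the received echo.
    return best_delay, -20.0 * math.log10(abs(best_gain))
```

In this sketch a second, detailed analysis step would then be restricted to the region around the delay that was found.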
Measurement Results
The AEC algorithm generates the following results:
Signal type
Echo Delay in milliseconds
Echo Loss during Single Talk acc. ITU-T G.122
Echo Loss for the complete signal (incl. Double Talk)
Echo Objection Rate acc. ITU-T G.131 in %
The Echo Status shows the degree of annoyance of the echo signal. There are three possible
values: GOOD stands for good echo performance, FAIR for acceptable echo performance and
POOR for annoying echoes. The Echo Status is derived using the Echo Objection curves given in
ITU-T G.131. For this purpose, the TELR is estimated by adding 10 dB to the Echo Loss, and the
crossing point between this TELR and half of the Echo Delay (= one-way transmission time of
the echo) is determined.
Please note that in pure single talk situations a powerful echo region might be classified
as Double Talk, and a Double Talk Ratio of a few percent will be shown if the adaptive Double
Talk Threshold is exceeded. This might be observed especially for time-varying echo paths.
Furthermore, a large amount of noise in the receiving part may also be classified as double talk.
The GSM 03.50 test is defined in ETSI GSM 03.50 (Section 3.4) and derives the required
Terminal Coupling Loss (TCL) from the G.131 TELR (talker echo loudness rating) chart. Under
the assumption of a lossless 4-wire connection from the measuring point to the terminal, the ERL
can be interpreted directly as the TCL, because the terminal itself is the only source of
echoes. SQuad-AEC therefore measures the TCL value directly in this case, and we can set:
TCL = TELR - (SLR + RLR) dB, where typically SLR + RLR = 10 dB
TCL = TELR - 10 dB
Ideally, the TCL should be 40 dB to 46 dB. The value of 46 dB is derived from the 1% EOR curve
of G.131 at maximum delay (about 400 ms). If a TCL higher than 46 dB is reached, the 1% EOR
curve is never crossed, even for high delays (provided that no echo sources other than the
terminal exist).
If the measured TELR = TCL + 10 dB is higher than the 1% EOR curve, the GSM 03.50 test
reports a pass. A lower TCL is considered to be in the failed range.
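The relations between Echo Loss, TELR and TCL given above amount to simple offsets, sketched below. The function names are hypothetical. The TELR value of the 1% EOR curve at the relevant one-way delay is not tabulated in this document, so it is assumed to be read from the ITU-T G.131 chart and supplied by the caller.

```python
SLR_PLUS_RLR_DB = 10.0  # typical sum of send and receive loudness ratings

def telr_from_echo_loss(echo_loss_db):
    """Estimate the TELR by adding 10 dB to the measured echo loss."""
    return echo_loss_db + SLR_PLUS_RLR_DB

def tcl_from_telr(telr_db):
    """TCL = TELR - (SLR + RLR)."""
    return telr_db - SLR_PLUS_RLR_DB

def gsm_0350_passed(tcl_db, telr_on_1_percent_curve_db):
    """Pass if TELR = TCL + 10 dB lies above the 1% EOR curve value.

    telr_on_1_percent_curve_db must be taken from the G.131 chart for the
    one-way delay in question (not computed here).
    """
    return tcl_db + SLR_PLUS_RLR_DB > telr_on_1_percent_curve_db
```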
EOR (Echo Objection Rate) in % is an estimate of the percentage of listeners who perceive
a talker echo when listening to a given telephone setup. ITU-T G.131 shows two
different curves, one for 1% EOR and one for 10% EOR. It is assumed that a set of equally
shaped curves describes every EOR between 0% and 100%. Based on the crossing point
of the estimated TELR and half of the Echo Delay, a (theoretical) corresponding curve can be
derived and the assumed EOR can be read off.
If this EOR is less than 1%, the Echo Status is GOOD; if it is above 10%, the Echo Status is
POOR. Between the two values the echo is rated as FAIR.
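The mapping from EOR to Echo Status described above can be written as a small function. The function name is hypothetical, and the treatment of the exact boundary values (1% and 10% are rated FAIR) is an assumption, since the text only specifies "less than" and "above".

```python
def echo_status(eor_percent):
    """Map an Echo Objection Rate in percent to GOOD, FAIR or POOR."""
    if not 0.0 <= eor_percent <= 100.0:
        raise ValueError("EOR must be between 0 and 100 percent")
    if eor_percent < 1.0:
        return "GOOD"
    if eor_percent > 10.0:
        return "POOR"
    return "FAIR"
```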
The Distance to 1% EOR is also calculated directly from the chart given in ITU-T G.131. This
value gives the distance to the 1% EOR curve for the calculated echo delay. All negative values
are in the 'green region,' values above 10dB are in the 'red region.'
Echo Loss profile: The Echo Loss is shown graphically versus the delay time. This figure provides
detailed information and visualizes the echo region found. The Echo Loss profile is one
result of the scan process and represents the situation during single talk only. The profile is
used for pinpointing echoes only. The detailed echo analysis itself is done in a separate step,
and therefore the echo loss cannot be derived directly from this curve.
The Level of Received Signal in dBov gives information only about the overall r.m.s. level of
the received signal. It covers echoes, double talk sequences and noise.
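An r.m.s. level in dBov can be computed as sketched below. The sketch assumes 16 bit samples and the common dBov convention in which 0 dBov corresponds to a full-scale square wave, so that a full-scale sine wave lies at about -3 dBov. Whether SQuad-AEC uses exactly this convention is not stated here, and the function name is hypothetical.

```python
import math

def level_dbov(samples, full_scale=32768.0):
    """Return the r.m.s. level of the samples relative to the overload point."""
    if not samples:
        raise ValueError("no samples given")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / full_scale)
```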
Introduction
The active variant of the Acoustic Echo Check measurement application can be applied only to a
SwissQual measurement probe at the far end, because active actions have to be performed
there: the probe generates an echo at the far-end side.
Echo Measurement
The SQuad-AEC active measurement uses the same echo detection approach as the passive
measurement described above. In contrast to the passive measurement, where the far-end side
is silent, in the active mode the far-end side actively creates an echo. The SQuad-AEC active
measurement includes an in-band synchronization between both sides. In a first transmission,
the incoming signal is recorded at the far-end side. Based on that signal, an echo is
generated by applying a selectable echo path response to it. If required, the generated echo
can be interlaced with double talk. While the signal is received a second time, the
pre-processed echo is played back to the sending side for evaluation.
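The echo generation step described above can be sketched as a convolution with an echo path response followed by attenuation and optional mixing with a double talk signal. The function name and parameters are hypothetical, and interpreting "interlaced" as sample-wise addition is an assumption.

```python
def generate_echo(recorded, echo_path, attenuation_db, double_talk=None):
    """Return the synthetic echo, truncated to the length of the recording."""
    gain = 10.0 ** (-attenuation_db / 20.0)
    echo = []
    for n in range(len(recorded)):
        # Direct-form convolution of the recording with the echo path response.
        acc = 0.0
        for k, coefficient in enumerate(echo_path):
            if k > n:
                break
            acc += coefficient * recorded[n - k]
        echo.append(gain * acc)
    if double_talk is not None:
        # Assumption: double talk is simply added to the generated echo.
        echo = [value + (double_talk[n] if n < len(double_talk) else 0.0)
                for n, value in enumerate(echo)]
    return echo
```

For realistic signal lengths a library convolution would be the more practical choice; the explicit loops are kept here only to make the operation visible.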
This measurement is designed to detect and rate echo cancellers or suppressors in the
network. The generated echo challenges these echo cancellers, and any integrated level-switching devices are forced by an inserted double talk signal.
The echo detection is more reliable if the remaining echo has linear components. During
double talk in particular, only linearly dependent echoes can be detected. In connections that
include low bit-rate codecs and/or non-linear processors, residual or low-level echoes might not
be detectable by the measurement. For more reliable detection, choose higher echo levels to
increase their differentiation from double talk and other non-linear components.
The results presented in the figure are used here for discussion of the results as well. The
measurement was made on a Mobile to PSTN connection. The PSTN side was the echo-generating
loop. At this side the incoming signal was convolved with the echo path response M1
from ITU-T G.168 (G168_M1). Afterwards it was attenuated by 20 dB and interlaced with a double
talk signal containing 50% active speech (dt_50_08kHz.wav). No additional delay was chosen
at the PSTN side.
The results show that this echo was detected at the mobile side. The echo path delay of 224 ms is
typical for a Mobile to PSTN connection. The echo loss over the complete signal is 21 dB, which
reflects the range of the defined echo at the PSTN side quite well. The echo loss during single
talk is a bit lower, which indicates that there is an active component reducing the echo at
least slightly in speech pauses.
Using these results, the corresponding Echo Objection Rate is calculated (here: 54%), as is the
distance to the G.131 1% curve (12 dB). This means that the echo loss would have to increase by
12 dB to reach the 1% curve and therefore the Echo Status GOOD.
Consequently, with the results obtained, the Echo Status is rated as POOR.
Additionally, the Double Talk Ratio is 47%, which is caused by the defined signal at the far-end
side.
Also for SQuad-AEC in the active mode the echo loss profile is displayed.
If no echo is found or if it was not detectable, only the status messages and the level of the
received signal will be displayed:
Please note that the channel gain also influences the measured echo loss. Basically, the
channel attenuation in both directions has to be added to the echo loss defined at the far-end
side. The measuring signal is attenuated by the transmission from A to B, is attenuated there
again (by the defined loss value), and is attenuated once more by the transmission from B to A.
The echo loss reflects the level of the received echo compared to the original measuring
signal.
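As a small worked example of this addition, the following sketch sums the three attenuation stages. The function name and the numeric values in the usage line are illustrative assumptions, not measured data.

```python
def expected_echo_loss_db(attenuation_a_to_b_db, defined_echo_loss_db,
                          attenuation_b_to_a_db):
    """Echo loss seen at the A-side: both channel attenuations plus the defined loss."""
    return attenuation_a_to_b_db + defined_echo_loss_db + attenuation_b_to_a_db

# Illustrative values: 1 dB loss towards B, 20 dB defined echo loss, no loss back to A.
total_db = expected_echo_loss_db(1.0, 20.0, 0.0)
```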
A simple Listening Quality measurement can therefore be done in parallel. The interesting
question here is how the Listening Quality is affected by double talk or echo in the other
direction. By comparing both Listening Quality values, the double talk capability can be
evaluated. If a network is fully duplex, both Listening Quality values should be the same even
when a double talk signal is chosen.
The left value gives the Listening Quality for the first transmission, where no echo or double
talk is played back. The right value gives the LQ while the echo / double talk is sent at the
same time. In addition, the channel gain and the clipping of the received signal are given.
Please note that a strong side-tone at the B-side may affect the SQuad-LQ measurement,
because it is superimposed on the received and evaluated signal.
Round Trip
Introduction
The Round Trip Time is the time a signal needs to travel from the near-end side to the far-end
side and back. The Round Trip Time is usually close to the delay of the latest possible echo. The
time speech needs to travel from one talker to the other (One Way Signal Delay) is an
important indicator of the conversational quality of a call. A travel time that is too long leads
to the annoying effect that the talkers interrupt each other unintentionally.
Results
The measurable Round Trip Time ranges from a minimum of 4 ms to a maximum of 3000 ms; the
maximum delay jitter between the three repetitions within one measurement has to be below
500 ms. The results of the measurement are presented in the following table. In addition, the
lowest one-way time and the lowest round trip time of the measurements, in milliseconds, are
shown as the final results.
4 (BEST)    3 (HIGH)    2 (MEDIUM)    1 (BEST EFFORT)
< 100 ms    < 100 ms    < 150 ms      < 400 ms
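The limits on the measurable Round Trip Time and on the jitter between repetitions can be checked as sketched below. The function name is hypothetical, and defining the jitter as the difference between the largest and smallest repetition is an assumption.

```python
MIN_RTT_MS = 4.0
MAX_RTT_MS = 3000.0
MAX_JITTER_MS = 500.0

def final_round_trip_ms(repetitions_ms):
    """Return the lowest round trip time of the repetitions, or None if invalid."""
    if not repetitions_ms:
        raise ValueError("at least one repetition is required")
    if any(not MIN_RTT_MS <= value <= MAX_RTT_MS for value in repetitions_ms):
        return None  # outside the measurable range
    jitter_ms = max(repetitions_ms) - min(repetitions_ms)  # assumed jitter definition
    if jitter_ms >= MAX_JITTER_MS:
        return None  # repetitions differ too much
    return min(repetitions_ms)
```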
References
ETSI TS 101 329-2 V1.1.1 (2000-07), Part 2: Definition of Quality of Service (QoS) Classes
Appendix
Abbreviations
Abbreviation    Description
ACR
CELP
DCR
DMOS
MOS
dBov
ADPCM
BFI
CCITT
CDMA
CRC
DAC
DMR
DTMF
DTX
EPROM
ETR
ETS
ETSI
FER
FR              Full Rate
GMSK
GSM
GSM MS
HANDO           Handover
HDLC
HR              Half Rate
IEC
ISDN
ISO
ITU
LAN
MSC
OSI
PABX
PDN
PSPDN
PSTN
QOS             Quality Of Service
RXLEV
RXQUAL
S/W             Software
SIM
SS7
TDMA
TE              Terminal Equipment
VAD