Am Be

PERFORMANCE ASSESSMENT OF 4.8 KBIT/S AMBE CODING UNDER AERONAUTICAL ENVIRONMENTAL CONDITIONS*

Simao F. Campos Neto, Franklin L. Corcoran
COMSAT Laboratories, 22300 Comsat Drive, Clarksburg, MD 20871, USA
&
John Phipps
Compression Telecommunications, 19540 Amaranth Drive, Germantown, MD 20874, USA
&
Spiros Dimolitsas
Lawrence Livermore National Laboratory, University of California, Livermore, CA 94551, USA

ABSTRACT

The quality delivered by low-rate parametric speech coding deployed in commercial mobile satellite systems has to be assessed by formal subjective assessment methods because of the nonlinear nature of such coding. Aeronautical systems, especially those for deployment in small aircraft, pose yet more stringent conditions due to high environmental noise levels. In this paper the results of an evaluation of the voice quality of DVSI's 4.8 kbit/s Advanced Multi-Band Excited (AMBE) coding under aeronautical channel conditions are presented for listening and conversational experiments. From these assessments it was concluded that the voice quality of the Inmarsat Aeronautical system will be perceptibly improved when the current full-rate codec is replaced by the AMBE codec.

1. INTRODUCTION

Digital air-to-ground and ground-to-air communications systems employing satellite networks have been in commercial use for more than six years now, mostly on large aircraft. In the meantime, extending aeronautical systems to aircraft smaller than the wide-body jets where these systems are presently installed has become a commercial imperative, including providing for the use of the aeronautical system on ATR-style commuter jets. This extension requires the voice system to operate under much worse ambient noise conditions than the currently available aeronautical communication systems.
Recently developed speech coding technology, and codecs delivering higher quality at lower rates, offer the potential of improved efficiency and performance for systems currently in use. To realize this potential, however, improved voice performance is required under these high ambient-noise conditions. Adequately characterizing this performance, given the increasingly non-linear nature of low-rate speech codecs, demands the use of direct subjective evaluation methods.

This paper presents the results of an evaluation of the voice quality of the 4.8 kbit/s Advanced Multi-Band Excited (AMBE) codec (developed by DVSI) under aeronautical channel conditions [1]. The performance of this system was compared against that of 9.6 kbit/s CELP, which is presently used in the Inmarsat Aeronautical system, and against a newer-generation speech codec consisting of a modified version of the full-rate codec but operating at 4.8 kbit/s. The 4.8 kbit/s AMBE system was recently selected for use in the Inmarsat mini-M (land-mobile) notebook-sized satellite terminal [2].

These characterizations, which were functionally divided into four subjective experiments (three listening experiments in Phase I, and one conversational experiment in Phase II), included the performance assessment under various input speech levels, transmission errors, interconnectivity with other signal processing devices, and background noise. In the second phase, only the two best codecs from Phase I were considered.

* This work was sponsored by Inmarsat.
0-7803-3192-3/96 $5.00 © 1996 IEEE

2. EXPERIMENTAL DESIGN

In defining the experiment design for measuring codec voice performance, several factors were taken into consideration, including knowledge of human psychology, statistics, experiment size, and the objective of the evaluation in terms of the system performance parameters sought.

2.1 Phase I Listening Tests.
The first phase of tests was implemented to parametrically determine which of two low-bit-rate speech coding schemes would better suit the evolution of the Inmarsat Aeronautical system. Three listening-opinion tests were designed.

Two of the listener-opinion tests were conducted using an Absolute Category Rating (ACR) (single-stimulus) 5-point Mean Opinion Score (MOS) transmission quality scale [3] to quantify the performance under different input levels, bit errors (random and burst), and tandem combinations, for unweighted (flat) and IRS [3] weighted speech. ACR assessments are usually conducted by arranging for a listener to hear a succession of groups of typically two to three sentences (stimuli), with each stimulus or group of stimuli being reproduced over a different circuit condition. After each sample is heard, listeners express an opinion of the perceived quality of the processed speech as an excellent, good, fair, poor, or bad (5, 4, 3, 2, 1) rating. Each opinion is based on exposure to the most recently heard sample only, and listeners are typically given 5 to 10 seconds in which to cast a vote before the next sample is heard.

The other listener-opinion test used a Degradation Category Rating (DCR) (dual-stimulus) 5-point Degradation MOS (DMOS) scale to assess the quality degradation for IRS-weighted speech in the presence of vehicle and aeronautical environment background noise. The DCR procedure is similar to the ACR procedure, with the exception that votes are cast for a pair of stimuli, of which the first stimulus is the unprocessed speech and the second is the speech processed under a given circuit condition. The listener is asked how the second stimulus is degraded in relation to the unprocessed speech: no degradation, audible but not annoying, slightly annoying, annoying, and very annoying degradation (5, 4, 3, 2, 1).
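As an illustration of how category votes on the 5-point ACR or DCR scales described above can be reduced to a MOS or DMOS value, the following Python sketch computes the mean vote and an approximate 95% confidence half-width for one test condition. The function name `opinion_score`, the normal-approximation factor 1.96, and the vote data are assumptions made for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, variance

def opinion_score(votes, z=1.96):
    """Return (mean score, ~95% confidence half-width) for 1-5 category votes.

    Hypothetical helper: the same arithmetic applies to ACR votes (MOS)
    and DCR votes (DMOS); z=1.96 assumes a normal approximation.
    """
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    if any(v not in (1, 2, 3, 4, 5) for v in votes):
        raise ValueError("votes must be integers from 1 to 5")
    score = mean(votes)
    # Standard error of the mean from the sample variance of the votes.
    std_err = math.sqrt(variance(votes) / len(votes))
    return score, z * std_err

# Hypothetical ACR votes for one codec condition (5 = excellent, 1 = bad).
acr_votes = [4, 3, 4, 5, 3, 4, 2, 4]
mos, half_width = opinion_score(acr_votes)
print(f"MOS = {mos:.3f} +/- {half_width:.3f}")
```

In a real experiment the votes for a condition would be pooled over all listeners and talkers before this calculation is applied.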
Each opinion is based on exposure to the most recently heard pair of samples (stimuli) only, and the listener is again given 5 to 10 seconds to cast a vote before the next pair of samples is heard.

The experimental designs of the three listening tests were based on a balanced block structure, and provided for arranging the conditions in presentation blocks, where each block contained a complete set of randomized codec-condition combinations. In these tests, eight talkers were used for the ACR experiments and four talkers were used for the DCR experiment (equal number of male and female talkers).

The conditions, which were evaluated by 40 non-expert listeners in each experiment (120 in total), included the network configurations whose assessment was sought, as well as a number of reference systems, including a number of Modulated Noise Reference Units (MNRU, ITU-T Rec. P.80). The two ACR tests also used as reference conditions: the IS-54 8 kbit/s Vector-Sum Excited Linear Prediction (VSELP) codec; the G.728 16 kbit/s Low-Delay Code-Excited Linear Prediction (LD-CELP) codec; the Inmarsat-M 6.4 kbit/s Improved Multi-Band Excitation (IMBE) speech codec; and four interconnected 32 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM) devices (whose cumulative distortion is accepted as perceptually equivalent to the maximum end-to-end quantization distortion permitted in wireline connections).

2.2 Phase II Conversational Test. The second phase of tests consisted of one conversational experiment to characterize the best lower-bit-rate codec from Phase I in relation to the full-rate aeronautical codec and to two other reference codecs in a dynamic simulation of actual network conditions, involving air-to-ground and ground-to-air connections. Environmental noise conditions consisted of one of the booths having aeronautical noise, and the other booth having either office babble noise or simulating a quiet room (40 dBA).
Four scales were used, two five-point (quality and ease-of-interruption) and two binary (difficulty-in-conversing, acceptability). The quality scale was identical to the ACR scale, while the ease-of-interruption scale ranged uniformly from "no effort" to interrupt (5) to "extreme effort" to interrupt (1). The difficulty-in-conversing scale consisted simply of asking the subjects whether any difficulty was felt during the conversations (yes or no). The acceptability scale, similarly, consisted of asking whether the connection was considered acceptable. Binary scale scores are computed based on the number of "yes" answers.

The design followed a 16x16 Latin Square, where test conditions (quiet, babble, and aeronautical noise environments) alternated among the 64 pairs of non-expert conversation participants. Channel conditions included both error (0.1% random bit errors) and error-free situations for the full-rate codec and the best of the two lower-rate codecs, plus two reference codecs (8 kbit/s VSELP and Inmarsat-M system 6.4 kbit/s IMBE) in an error-free situation. The conversational test also included a 270 ms delay simulating a one-hop satellite configuration.

3. RESULTS & ANALYSIS

3.1 Phase I. The results of Phase I, summarized in Table 1, indicated a substantial overall advantage of the AMBE codec over the half-rate version of the aeronautical codec. All statistical analyses were conducted at a 95% confidence level. In Experiment 1, where unweighted speech was assessed in the presence of codec input level variation, transmission errors, and double transcodings ("tandem"), the AMBE codec and the half-rate codec were statistically equivalent, and better in performance than the full-rate codec. For IRS-weighted input speech in the presence of the same circuit conditions (Experiment 2), however, the full- and half-rate aeronautical codecs delivered an equivalent overall performance, while the AMBE codec performed substantially better than both of them.
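To make concrete what a comparison "at a 95% confidence level" between two codec conditions can look like, the following Python sketch applies a simple two-sample comparison of mean scores. This is only an assumed stand-in: the paper does not state which statistical procedure was used at this point, and the function name `differ_at_95`, the critical value 1.96, and the vote data are all hypothetical.

```python
import math
from statistics import mean, variance

def differ_at_95(votes_a, votes_b, z_crit=1.96):
    """Return True if two mean scores differ at roughly the 95% level.

    Hypothetical helper using a normal approximation with unpooled
    variances; a formal analysis of variance would be the more usual
    tool when many conditions are compared at once.
    """
    diff = mean(votes_a) - mean(votes_b)
    std_err = math.sqrt(variance(votes_a) / len(votes_a)
                        + variance(votes_b) / len(votes_b))
    if std_err == 0:
        return diff != 0
    return abs(diff) > z_crit * std_err

# Hypothetical votes for two codec conditions on the 5-point scale.
codec_x = [4, 4, 5, 4, 3, 4, 5, 4]
codec_y = [3, 2, 3, 3, 2, 3, 4, 3]
print(differ_at_95(codec_x, codec_y))   # means are far apart
print(differ_at_95(codec_x, codec_x))   # identical data, no difference
```

Under this reading, two codecs are reported as "statistically equivalent" when the function returns False, that is, when no significant difference is detected.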
In Experiment 3, IRS-weighted speech contaminated either by an interfering talker or by Gaussian, vehicular, aeronautical, lorry, or babble background noise was assessed over error-free channel conditions. In this experiment, the full-rate (9.6 kbit/s) aeronautical codec had the best performance, which was statistically better than that of the half-rate codec, while the AMBE codec performed statistically better than the half-rate codec for all the tested conditions. The test variance was approximately 0.9 for the two ACR experiments and 1.7 for the DCR experiment. The average standard error was 0.06 for the ACR tests and 0.08 for the DCR test, which were within the target experiment accuracies.

The AMBE codec was hence chosen for further testing, since it performed equivalently to the full-rate and the half-rate aeronautical codecs for unweighted speech, and better than the latter in the presence of IRS-weighted speech, as well as in the presence of background noise.

3.2 Phase II. In Phase II, only the full-rate (9.6 kbit/s) aeronautical codec and the 4.8 kbit/s AMBE codec were tested, using two five-point and two binary scales. Table 2 reports the results only for the Quality and the Acceptability scales, which allowed for a more insightful understanding of the dynamic performance of both codecs. The test variance was 0.9 and 0.2 respectively for the Quality and the Acceptability scales. Standard errors were 0.9 and 4% respectively for those two scales.

Examining Table 2, it can be seen that the overall performance of the AMBE codec was equivalent to that of the VSELP codec, while the full-rate aeronautical codec was either worse than or in the same range of quality as the Inmarsat-M system IMBE codec. Another observation, derived by means of an analysis of variance, is that the codec performance was affected more by the room noise than by the circuit condition impairments used.

4. CONCLUSIONS

From the results obtained it was determined that the overall performance of the 4.8 kbit/s AMBE codec is, in general, better than that of the half-rate version of the Inmarsat Aeronautical codec (both having outperformed the 9.6 kbit/s full-rate aeronautical codec), and it was thus selected for further evaluation using conversational methods. From the conversational test, it was subsequently confirmed that 4.8 kbit/s AMBE coding performed equivalently to or better than the other codecs under aeronautical-noise conditions. Thus, it was concluded that the voice quality of the Inmarsat Aeronautical system will be perceptibly improved when the current full-rate codec is replaced by the AMBE codec, and that this replacement will allow for expansion of the Inmarsat Aeronautical service to smaller aircraft while maintaining good cellular speech quality.

Table 1(a): Results for the DCR Listening Experiment (Performance with Background Noise)
Codecs: AMBE, 9.6k Aero, 4.8k Aero, IMBE
Factors: ATR, takeoff, S/N = 6 dB; WBJ, cruise, S/N = 20 dB; CJ noise 1, S/N = 9 dB; CJ noise 2, S/N = 15 dB; babble noise, S/N = 20 dB; talker noise, S/N = 24 dB; Gaussian noise, S/N = 20 dB; lorry noise, S/N = 15 dB
WBJ: wide-body jet; CJ: corporate jet; ATR: turbo-propeller

Table 1(b): Results for the ACR Listening Experiment (Effect of Tandem, Input Level Variation ...)
Table 1(c): Results for the ACR Listening Experiment (Effect of Tandem, Input Level Variation ...)
Codecs: AMBE, 9.6k Aero, 4.8k Aero; references: VSELP, 4xG.726, IMBE (BER = 0, -20 dBm0)
Factors: BER = 0 at -20, -10, and -30 dBm0; Coder/Coder, -20 dBm0; G.726/Coder, -20 dBm0; BER = 1%, -20 dBm0; C/M = 12 dB, -20 dBm0

Table 2: Results for the Conversational Experiment (Environmental noise and channel errors ...)
Y(Q): scores for the Quality scale
Y(A): percentage of "Yes" for Acceptability

5. REFERENCES

[1] S. F. Campos Neto et al., "Performance Assessment of the Inmarsat Aeronautical Codec," Final Report, Inmarsat Contract INM/94/1367/ES.
[2] S. Dimolitsas et al., "Evaluation of Voice Codec Performance for the Inmarsat Mini-M System," Proceedings, 10th Int. Conf. on Digital Satellite Communications (ICDSC-10), Brighton, England, May 1995.
[3] ITU-T, "Methods for Subjective Determination of Transmission Quality," Rec. P.80, March 1993.
[4] CCITT, "Specification of an Intermediate Reference System," Rec. P.48, Blue Book, Vol. V, pp. 81-86, Melbourne, 1988.
