
Enhanced Silence Detection in Variable Rate Coding Systems Using Voice Extraction

David R. Paoletti
Clarity LLC 1301 West Long Lake Rd Suite 360 Troy MI

Gail Erten
IC-Tech Inc Science Parkway Center 4295 Okemos Rd Suite 100 Okemos MI 48864

Abstract: Voice extraction (VE) is a signal processing technique to isolate a single voice from a mixture of sounds. VE works very well with silence detection algorithms of variable rate coding systems. This paper describes how VE can be used in digital telecommunications for improved bandwidth utilization.

I. INTRODUCTION

The primary barrier to the proliferation and user acceptance of voice based command and communications technologies has been noise sources that contaminate the speech signal and degrade the quality of speech processing results. The consequences are poor voice signal quality, especially with far field microphones, and low speech recognition accuracy for voice based command applications. The current commercial remedies, such as noise cancellation filters and noise canceling microphones, have been inadequate to deal with a multitude of real world situations, at best providing limited improvement, and at times making matters worse.

Rather than suppressing noise, as many competing noise cancellation technologies do, the voice extraction techniques described in this paper separate out a single voice signal of interest from a mixture of sounds using time-domain signal processing. Voice extraction (VE) is achieved by exploiting inter-microphone differential information and the statistical properties of independent signal sources [1]. Two microphones are used to record mixtures of sound sources. These microphone signals are then processed to separate out a single voice signal of interest from the mixture. In the most general terms, the algorithms used embody multiple nonlinear mathematical equations that capture the nonlinear characteristics and inherent ambiguity in distinguishing between mixed signals in real environments.

In (1), N_pole is the pole capacity, W is the bandwidth, R is the data rate, E_b/(N_0 + I_0) is the target SNR, and F is the frequency reuse factor [3]. From (1) we can see that the pole capacity is inversely related to the voice activity factor v, so that a reduction in v, through improved silence detection, will give increased pole capacity. Because v is determined by silence detection in the handset, this gain in bandwidth can be achieved through changes to the end user's cell phone rather than to the existing system infrastructure.

In a code division multiple access (CDMA) system, four rates are available to encode speech, typically broken down into full, half, quarter, and eighth rate packets. A silence detector analyzes the microphone signal and decides which rate to use for encoding the current packet. It is desirable to have as many eighth rate packets as possible without degrading the voice quality, since this uses less bandwidth in the cell.
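The inverse relation between pole capacity and v can be illustrated numerically. The sketch below assumes only that capacity is proportional to 1/v with all other terms held fixed; it does not reproduce equation (1). The function name is hypothetical, and the two values of v are simply the ends of the 35-40% range reported in [2].

```python
def relative_pole_capacity(v_old, v_new):
    """Return N_new / N_old when only v changes, assuming N is proportional to 1/v."""
    if v_old <= 0 or v_new <= 0:
        raise ValueError("voice activity factors must be positive")
    return v_old / v_new


# Lowering v from 0.40 to 0.35 (the range quoted from [2]).
gain = relative_pole_capacity(0.40, 0.35)
print(f"capacity ratio: {gain:.3f}")  # prints "capacity ratio: 1.143"
```

Under this assumption, a drop in voice activity from 40% to 35% corresponds to roughly 14% more pole capacity.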
The silence detector works well in quiet environments where the signal to noise ratio of speech is high. In noisy environments, especially when the user moves the communication device away from their lips, the pauses between speech segments get filled with background noise. The signal to noise ratio of the voice signal drops and the silence detector starts to fail. Using VE greatly reduces the noise between speech segments and improves the signal to noise ratio of the voice signal when speech is present. The pauses in speech, which before voice extraction had too much noise content to be identified as silence by the silence detector, contain far less noise after voice extraction. Fig. 1 and Fig. 2 illustrate how much noise is eliminated. The elimination of noise allows the silence detector to be effective again and identify the silence periods both more often and more accurately. Because silence is coded at a lower rate than speech, the result is a reduced number of bits that need to be sent over a channel. The economic impact of these savings can be substantial.
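The failure mode described above can be sketched with a toy energy-based rate selector. This is not the rate determination algorithm of any actual CDMA vocoder. The frame length (160 samples, assuming 20 ms frames at 8 kHz), the quiet-room floor, the decibel margins, and all function names are hypothetical choices made for illustration. Quarter rate is omitted for brevity.

```python
import math
import random

FRAME_LEN = 160  # assumed: 20 ms frames at 8 kHz sampling


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB; the floor avoids log10(0)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def choose_rate(frame, quiet_floor_db, half_margin_db=6.0, full_margin_db=12.0):
    """Pick a packet rate from how far the frame energy exceeds a quiet-room floor."""
    excess = frame_energy_db(frame) - quiet_floor_db
    if excess >= full_margin_db:
        return "full"
    if excess >= half_margin_db:
        return "half"
    return "eighth"


rng = random.Random(0)  # fixed seed so the demo is repeatable


def noise_frame(std):
    """One frame of Gaussian noise with the given standard deviation."""
    return [rng.gauss(0.0, std) for _ in range(FRAME_LEN)]


QUIET_FLOOR_DB = -40.0             # hypothetical floor calibrated in a quiet room
noisy_pause = noise_frame(0.1)     # speech pause filled with background noise, about -20 dB
clean_pause = noise_frame(0.005)   # same pause with most noise removed, about -46 dB

print(choose_rate(noisy_pause, QUIET_FLOOR_DB))  # prints "full"
print(choose_rate(clean_pause, QUIET_FLOOR_DB))  # prints "eighth"
```

In this sketch the noisy pause carries enough energy to be coded at full rate, while the same pause with the noise attenuated falls back to eighth rate, which is the effect the paper attributes to VE. A practical detector would adapt its noise floor over time rather than use a fixed value.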
III. TESTING

II. USING VE FOR BETTER SILENCE DETECTION

Besides improving the quality of the voice signal for better human and machine intelligibility, VE adds another significant benefit to variable rate coding communications. In variable rate coding systems, silence is coded at a lower rate than speech. This saves bandwidth since the two speakers over a full duplex channel take turns speaking and also because speech naturally contains pauses between words and phrases. Research at Bell Labs shows that a speaker is active approximately 35-40% of the time [2]. This is referred to as the voice activity factor, v.

In cellular communications, the capacity of the pole can be estimated by

Before collecting data, a script was designed which was one half of a 90 second cell phone conversation. Thus the test subject would speak a phrase, such as "Hello?", and wait for an imagined response before continuing. The test was run with native speakers of English and of other languages, to show that the VE system is not dependent on language in any way. The effect of using VE during cellular communications was tested under various noise conditions: in a mall food court during the lunchtime rush and in a moving vehicle. The mall test was run with the microphones on a table, at approximately the viewing distance of a PDA device (12-14"), with three speakers, two male and one female. One of the male speakers spoke in Hindi. In the vehicle, the microphones were mounted on the headliner, and the test was run at three different noise levels. All of the tests were run at 65 mph, with the low noise test adding no additional noise, the medium noise test turning the fan on high, and the high noise test opening the front windows 3" while leaving the fan on high. This test was run with four speakers, two male and two female. One of the male speakers spoke in Spanish and one of the female speakers spoke in Turkish.

Proc. 43rd IEEE Midwest Symp. on Circuits and Systems, Lansing MI, Aug 8-11, 2000. 0-7803-6475-9/00/$10.00 © IEEE 2000

TABLE I
VEHICLE TEST DATA
Columns: Noise Level | Type (Original, Extracted) | Full rate packets | Half rate packets
Surviving values (row and column not recoverable from this copy): 3931.00, 364.75

Fig. 1. Before voice extraction

Fig. 2. After voice extraction

Fig. 3. Vehicle: full rate packets (bar chart of packet counts, 0 to 4000, at High, Medium, and Low noise levels)

The results were recorded and then brought back to the laboratory for analysis. Although the system runs in real time, recording the tests allows the same data to be used both before and after processing. The recordings were tested using a Qualcomm QCP 2760 digital cellular phone with the Qualcomm DM packet analyzer software running to display packet counts. In all tests, the quarter rate packet counts were zero. The eighth rate packet counts are not shown, because there is a slight human factor delay after the call is established but before the recording is played over the link, and this delay is filled with eighth rate packets. Any packet count decrease in full and/or half rate packets has produced a corresponding increase in eighth rate packets, as desired.

A. Vehicle Tests
Fig. 4. Vehicle: half rate packets (bar chart of packet counts, 0 to 500, at High, Medium, and Low noise levels; legend: Original)


The data for the vehicle tests were averaged over the four test subjects, and are presented in Table I. The full rate packet counts are shown in Fig. 3, and the half rate packet counts in Fig. 4. The packet count decrease is significant, from 16.4% savings in the low noise test up to 38.1% in the high noise test. It can also be observed that the packet counts for both full and half rate packets are relatively independent of the noise level.

B. Mall Test

The data for the mall test were averaged over the three test subjects, and are presented in Table II. The full rate packet counts are shown in Fig. 5. In this test, the half rate packets increased from an average of 4.33 to 181.33. Ignoring this change in half rate packets, there is a savings of 24.4% in full rate packets. If we take the pessimistic approach that only those packets converted to eighth rate are significant and deduct these 177 half rate packets from the full rate packet savings, we still see a 21.1% decrease in full rate packets that were converted to eighth rate packets. The packet counts are higher than for the vehicle test because the Hindi speaker took over two minutes, as compared to the approximately ninety seconds of the other speakers.
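The pessimistic mall figure can be retraced from the numbers quoted in the text. The sketch below is a rough check rather than the authors' calculation. It starts from the rounded 24.4% savings, because the extracted full rate count itself is not available here, so it lands within a rounding step of the reported 21.1% rather than exactly on it. Variable names are illustrative.

```python
original_full = 5502.33   # average full rate packets before VE (Table II)
half_before = 4.33        # average half rate packets before VE
half_after = 181.33       # average half rate packets after VE
full_savings = 0.244      # reported fractional drop in full rate packets (rounded)

saved_full = full_savings * original_full   # full rate packets removed by VE
added_half = half_after - half_before       # packets that became half rate instead
to_eighth = saved_full - added_half         # remainder, assumed to have moved to eighth rate
pessimistic = to_eighth / original_full     # savings counting only eighth rate conversions

print(f"half rate increase: {added_half:.0f} packets")
print(f"pessimistic savings: {pessimistic:.1%}")
```

The half rate increase comes out at 177 packets, matching the text, and the pessimistic savings is about 21%.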


TABLE II
MALL TEST DATA
Type        Full rate packets   Half rate packets
Original    5502.33             4.33

In CDMA, the real savings in bandwidth results from a decrease in transmitted power, and the lower rate packets require less power to transmit [3]. The decreased transmission power required by the overall lower data rate may improve battery lifetime, another concern of both cell phone manufacturers and consumers. This could be tested by transmitting the original data in a continuous loop until the battery runs dead, performing the same test with the extracted data, and comparing the battery lifetime in each case. The intelligibility has been tested internally, both with human listeners and speech recognition software. A more rigorous test would be running a Mean Opinion Score (MOS), which uses 100 trained professionals to grade the sound quality. However, bandwidth savings and intelligibility are not the only considerations, as there is one obvious problem with adopting this technology. On the downside, it requires modifications to the cell phone design to include the extra microphone and processing capabilities.

Fig. 5. Mall: full rate packets (bar chart of packet counts, axis up to 6000)

ACKNOWLEDGMENT

This work was supported in part by the SBIR Phase II Contract F33615-98-C-1230 from the Department of Defense Ballistic Missile Defense Organization (BMDO).

IV. CONCLUSIONS AND FUTURE RESEARCH

Averaging the full rate packet savings over all four tests shows a savings of 27.2%. This is an appreciable savings, and should be attractive to cellular service carriers. The tests show savings over three of the five most prevalent languages in the world, by number of speakers [4], which indicates its applicability wherever cell phone use is high.
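The 27.2% average can be cross-checked against the per-test savings quoted in the text. Only three of the four values appear there (16.4% and 38.1% for the low and high noise vehicle tests, 24.4% for the mall test). The sketch below assumes the average is a simple unweighted mean and backs out the medium noise value that this would imply. That value is an inference for a consistency check, not a figure reported in the paper.

```python
reported_mean = 27.2  # percent, average full rate savings over the four tests

# Per-test full rate savings quoted in the text, in percent.
known_savings = {"vehicle_low": 16.4, "vehicle_high": 38.1, "mall": 24.4}

# With an unweighted mean over four tests, the missing value follows directly.
implied_medium = 4 * reported_mean - sum(known_savings.values())

print(f"implied medium noise savings: {implied_medium:.1f}%")
```

The implied value, about 29.9%, falls between the low and high noise vehicle results, which is consistent with savings that grow with the noise level.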

REFERENCES
[1] G. Erten and F. M. Salam, "Voice extraction by on-line signal separation and recovery," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 7, pp. 915-922, July 1999.
[2] P. T. Brady, "A statistical analysis of on-off patterns in 16 conversations," Bell System Technical Journal, vol. 47, pp. 73-91, 1968.
[3] A. H. M. Ross, "The CDMA revolution," http://www.cdP.org/tech/a ross/CDMARevolution.html, 1999.
[4] B. F. Grimes (editor), Ethnologue, 13th ed., 1996.
