Suman Samui
(Roll no: 13AT91R03)
UNDER THE GUIDANCE OF
Dr. Indrajit Chakrabarti
(Dept of E&ECE)
&
Prof. Soumya Kanti Ghosh
(School of Information Technology)
COURSE WORK: I have completed my course work (as decided by the honourable DSC) successfully. A list of subjects credited by me during my course work is as follows.

Subject Code    Subject    Grade Obtained
IT60116                    EX
IT60108
CS60052
HS63002
CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Objectives
1.3 Literature Survey
CHAPTER 2 MULTI-BAND COMPLEX SPECTRAL SUBTRACTION
2.1 Introduction
2.2 Problem Statement and Proposed Algorithm
2.3 Implementation
2.4 Experimental Results
2.5 Future Work
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 Motivation
Hearing impairment is undoubtedly a serious issue. Throughout the world, the number of people suffering from hearing loss continues to increase rapidly. Over the last few decades, the design of an effective hearing aid has turned out to be a challenging problem [1]. Despite the tremendous endeavour in hearing aid technology in terms of smaller size, higher processing speed and lower power consumption, the most distressing fact is that only 60-70 percent of active hearing aid users are satisfied with their devices [2].
The performance of a hearing aid degrades severely in the presence of acoustic background noise. Noise reduction algorithms must be incorporated in the hearing aid in order to enhance the quality of the perceived speech signal without compromising its intelligibility. Over the long history of speech enhancement, numerous schemes have been proposed. A broad classification divides the approaches into time-domain and transform-domain methods. Filtering performed directly on the time sequence includes techniques such as LPC-based digital filtering [3], Kalman filtering [4] and hidden Markov model (HMM) based filtering [5]. In the transform-domain approach, noise attenuation is performed on the transform coefficients. The transform used can be the discrete Fourier transform (DFT), the discrete cosine transform (DCT) [6], the Karhunen-Loeve transform (KLT) or even the wavelet transform (WT). Though transform-domain approaches suffer from an annoying artefact called residual musical tones, these methods seem to be more popular among researchers.
1.2 Objectives
While current hearing aids offer intelligibility improvements in some environments, in complex noisy situations, where the potential benefit to users is greatest, that potential is largely unrealized. Feedback control and noise reduction performance have been identified as two performance bottlenecks. Hearing aids have recently made the shift from analog to digital devices; this shift has enabled the introduction of digital signal processing algorithms for feedback and noise control. However, the traditional speech processing algorithms employed are largely designed for telephony applications, and the assumptions invoked in their development do not reflect the reality of current and future hearing aids. Our aim is therefore to improve the state of the art of speech enhancement for hearing aids by developing algorithms that take into account the challenges faced in real environments.
Speech enhancement is a continuously evolving research field with a wide variety of existing techniques.
Moreover, no current speech enhancement algorithm is universally recognized as the "best" solution in every
aspect and every context.
Practically speaking, each method has its strengths and weaknesses, and in general enhancement algorithms can be characterized by:
- the amount of noise reduction;
- the amount of distortion (or "damage") inflicted on the speech at the output of the enhancer;
- the effect of the algorithm on intelligibility;
- the amount and the nature of artefacts introduced in the enhanced speech;
- the flexibility of the method (i.e., for what range of noises or speakers will it perform as intended, and conversely, when will it fail?);
- the computational complexity of the enhancement system.
In general, one algorithm cannot excel simultaneously in all of the above categories; in particular, it is well known that one of the central issues in speech enhancement is the trade-off between the amount of noise reduction and the naturalness of the output.
Our objective will be to design a speech enhancement algorithm that handles real-world situations. To be more precise, "real-world situations" means realistic noisy speech conditions, which are characterized by the following:
- non-stationary noise;
- low signal-to-noise ratio;
- reduced intelligibility, i.e. the nature and/or level of the speech or noise adversely affects the decipherability of spoken words or sentences.
Fig 1.1: The basic block diagram of a typical filter bank based digital hearing aid.
The basic block diagram of a typical filter-bank based digital hearing aid is shown in Fig. 1.1 [9]. The continuous audio signal, collected through the microphone, is converted to a digital signal by an ADC and then passes through the acoustic feedback cancellation (AFC) block to reduce the effect of the feedback signal on the current input samples. The feedback-free signal is then decomposed into a specific number of frequency sub-bands and passed through the noise reduction (NR) block. This NR block attenuates the background noise and improves the quality of speech.
Simple speech enhancement approaches work in the STFT domain by attenuating frequency bins with low SNRs, leaving bins with high SNRs unmodified. Fig. 1.2 presents a block diagram of an STFT speech enhancement system. The clean speech signal x[n] is mixed with additive measurement noise v[n] to give the measurement signal z[n]:

z[n] = x[n] + v[n]    (1.1)

The noisy signal is enhanced in the transform domain using an estimate of the
noise spectrum and the clean signal estimate is reconstructed from the inverse transformed frames using
overlap-add (OLA) synthesis. STFT-based enhancement approaches exploit the fact that additive statistically
independent noise in the time-domain remains additive statistically independent noise in the frequency
domain; and while speech and noise overlap in time, they may not overlap in all time-frequency bins. STFT
speech enhancement algorithms are commonly used because they offer consistently high noise attenuation
and the availability of the FFT makes them computationally cheap to implement.
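As an illustration of this processing chain, the sketch below implements a bare-bones STFT enhancer with overlap-add synthesis in Python (NumPy). The binary-mask gain and the "first five frames are noise" estimate are placeholder assumptions for demonstration only, not the method developed in this thesis.

```python
import numpy as np

def stft_enhance(z, frame_len=256, hop=128, gain_fn=None):
    """Bare-bones STFT-domain enhancer with overlap-add (OLA) synthesis.
    gain_fn maps (magnitude spectra, noise magnitude estimate) to
    per-bin gains; the defaults below are illustrative placeholders."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(z) - frame_len) // hop
    frames = np.stack([z[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    Z = np.fft.rfft(frames, axis=1)

    # Crude noise estimate: mean magnitude of the first 5 frames,
    # assumed to be speech-free (an assumption, not the thesis method).
    noise_mag = np.abs(Z[:5]).mean(axis=0)

    if gain_fn is None:
        # Placeholder gain: keep bins above the noise floor, zero the rest.
        gain_fn = lambda mag, nmag: (mag > nmag).astype(float)

    X_hat = gain_fn(np.abs(Z), noise_mag) * Z        # noisy phase is reused

    # Inverse FFT of each frame, then weighted overlap-add reconstruction.
    x_frames = np.fft.irfft(X_hat, n=frame_len, axis=1)
    x = np.zeros(len(z))
    norm = np.zeros(len(z))
    for i in range(n_frames):
        x[i * hop : i * hop + frame_len] += x_frames[i] * win
        norm[i * hop : i * hop + frame_len] += win ** 2
    return x / np.maximum(norm, 1e-12)
```

With an all-ones gain, the analysis/synthesis chain reconstructs the interior of the signal exactly, which is a useful sanity check before plugging in a real suppression rule.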
For a single frame, with Z(k), X(k) and V(k) denoting the FFT coefficients of the noisy speech, clean speech and noise respectively,

|Z(k)|² = |X(k)|² + |V(k)|² + 2|X(k)||V(k)| cos(∠X(k) − ∠V(k))    (1.2)
If the signal and noise are in phase, then cos(∠X(k) − ∠V(k)) = 1 and

|Z(k)|² = (|X(k)| + |V(k)|)²,  i.e.  |Z(k)| = |X(k)| + |V(k)|    (1.3)

Alternatively, if the cross-term in (1.2) is neglected (speech and noise assumed uncorrelated), then |Z(k)|² ≈ |X(k)|² + |V(k)|².
In the first case, the clean signal magnitude can be recovered by subtracting an estimate of the noise magnitude |V(k)|. In the second case, the clean signal power spectrum can be recovered by subtracting an estimate of the noise power |V(k)|². These two approaches are known as magnitude and power spectral subtraction respectively. Since the subtraction can produce negative spectral components, half-wave rectification is used to ensure a valid spectrum.
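A minimal single-frame sketch of the two subtraction rules, with half-wave rectification and noisy-phase reuse (function and variable names are illustrative):

```python
import numpy as np

def spectral_subtract(Z, noise_mag, power=False):
    """Magnitude (power=False) or power (power=True) spectral
    subtraction on a single FFT frame Z, given an estimate of the
    noise magnitude spectrum. Half-wave rectification clamps
    negative results to zero; the noisy phase is reused."""
    p = 2 if power else 1
    mag = np.abs(Z) ** p - noise_mag ** p
    mag = np.maximum(mag, 0.0) ** (1.0 / p)   # half-wave rectification
    return mag * np.exp(1j * np.angle(Z))
```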
Spectral subtraction is appealing because it is both conceptually and computationally simple, as it requires
only a subtraction, and because it can offer high levels of noise reduction. However, the performance of
spectral subtraction relies heavily upon the accuracy of the noise estimate, |V(k)|. The original
implementation in [10] used a voice activity detector (VAD) to estimate the noise spectrum during speech
pauses, invoking the assumption of quasi-stationary, slowly varying noise. Since the noise estimate is a
smoothed average, there will generally be a mismatch between it and the true spectrum. In addition,
ignoring the cross terms in the governing equation (1.2) will introduce additional error even if the noise
magnitude estimate is exact. This leads to under or over-subtraction of the noise resulting in speech
distortion or residual noise.
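The VAD-gated noise estimation described above can be sketched as a simple recursive (smoothed) average; the smoothing constant and the binary `is_speech` flag are illustrative assumptions:

```python
import numpy as np

def update_noise_estimate(noise_mag, frame_mag, is_speech, alpha=0.9):
    """Sketch of VAD-gated noise estimation: during speech pauses the
    noise magnitude spectrum is updated as a recursive average of the
    current frame magnitude; during speech it is held constant, which
    is why it lags behind truly non-stationary noise."""
    if is_speech:
        return noise_mag                          # freeze estimate during speech
    return alpha * noise_mag + (1 - alpha) * frame_mag
```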
The main drawback of the spectral subtraction algorithm is the nature of the residual noise. In time-frequency regions where the noisy signal spectral amplitude is close to the estimated noise amplitude,
successive over and under-estimation of the true noise spectrum leads to fluctuating, narrowband residual
noise components in the enhanced speech known as musical noise or musical tones. Musical noise is
problematic because it occurs in time-frequency regions where speech energy is low, so it is not masked, and
its un-natural quality makes it disturbing to listeners, so that the original noisy speech is often preferable to
enhanced speech with musical noise [11].
In [12], a number of modifications are proposed to reduce the level of musical noise introduced by the basic
spectral subtraction algorithm. An over-estimate of the noise is used to attenuate the spurious spectral peaks
that lead to musical noise, and a spectral floor is introduced to mask any residual peaks. A generalized
exponent is also considered to complement the magnitude and power spectral subtraction algorithms. These modifications combined give the generalized spectral subtraction estimate of the clean speech FFT coefficient as:
|X̂(k)|^p = |Z(k)|^p − α|V̂(k)|^p,   if |Z(k)|^p − α|V̂(k)|^p > β|V̂(k)|^p
|X̂(k)|^p = β|V̂(k)|^p,              otherwise                              (1.4)
where α is the over-subtraction factor, which is generally SNR dependent, and β is the spectral floor, which is commonly frequency dependent. Due to the variance of the noise magnitude about its mean value, α has to be large to fully prevent the musical noise phenomenon; values of 3 to 6 are used in [12], corresponding to over-subtraction of up to 8 dB for power subtraction. However, this same noise spectral variance means that when the instantaneous noise spectrum is below its expected value, the desired speech will be severely distorted and low-energy speech can be removed completely. The over-subtraction factor can therefore be used to trade off between speech distortion, musical noise artefacts and residual noise level, and different algorithms have been proposed to control the over-subtraction and spectral floor parameters to manage this trade-off. Despite these attempts, while spectral subtraction can reduce the perceptual impact of noise, the modifications required to prevent musical noise result in a processed speech signal which does not improve, and can actually reduce, intelligibility compared to unprocessed speech [11].
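A sketch of the generalized subtraction rule of eq. (1.4), with fixed, illustrative values for the over-subtraction factor and spectral floor (in [12] the over-subtraction factor depends on the segmental SNR):

```python
import numpy as np

def berouti_subtract(Z, noise_mag, alpha=4.0, beta=0.02, p=2.0):
    """Generalized (Berouti-style) spectral subtraction on one FFT
    frame: over-subtract the noise by a factor alpha, and replace any
    bin that falls below the spectral floor beta*|V|^p with that floor.
    Parameter values here are illustrative defaults."""
    Zp = np.abs(Z) ** p
    Vp = noise_mag ** p
    sub = Zp - alpha * Vp
    floor = beta * Vp
    mag = np.where(sub > floor, sub, floor) ** (1.0 / p)
    return mag * np.exp(1j * np.angle(Z))   # noisy phase is reused
```

The spectral floor trades residual broadband noise for fewer isolated peaks: low-energy bins are clamped to a small multiple of the noise estimate instead of being zeroed, which masks the fluctuating components that would otherwise be heard as musical tones.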
Spectral subtraction algorithms enhance only the speech magnitude or power spectrum; the noisy signal phase spectrum is used to reconstruct the clean signal phase. This is often justified by noting that improving the phase estimate has a relatively small impact on overall speech quality compared to improving the magnitude spectrum estimate, and that the perceptual impact of phase distortion is lower than that of magnitude distortion [13]. However, the presence of cross-terms in equation (1.2) means that a perfect estimate of the clean signal spectrum cannot be obtained by subtracting the noise spectrum from the noisy signal spectrum, even if a perfect estimate of the noise spectrum is available.
CHAPTER 2
Multi-band Complex Spectral Subtraction Technique
2.1 INTRODUCTION
The performance of hearing aid degrades severely in the presence of acoustic back-ground noise. Noise
reduction algorithms must be incorporated in the hearing aid device in order to enhance the quality of
perceived speech signal without compromising the intelligibility of speech. The spectral subtraction method is an extensively used noise-reduction approach, though it suffers from an artefact called musical noise [11][14]. To address this problem, several other versions of spectral subtraction have been proposed, starting with Boll's original work [10]. By introducing over-estimation and a spectral floor factor in the subtraction process, Berouti et al. [12] suggested a method to reduce the musical noise by subtracting an over-estimate of the noise spectrum from the original noisy speech spectrum, while preventing the resultant spectral components from going below a pre-set minimum value. A frequency-adaptive subtraction factor based approach has been proposed [15][16] based on the assumption that, in general, noise may not affect the speech signal uniformly over the whole spectrum: some frequencies are affected more severely than others depending on the spectral characteristics of the noise. Lockwood and Boudy [16] proposed the non-linear spectral subtraction (NSS) method, where the over-subtraction factor is frequency dependent in each frame of speech; larger values of the estimated noise are subtracted at frequencies with low SNR levels, and smaller values are subtracted at frequencies with high SNR levels. Kamath and Loizou [15] extended this concept and developed a multi-band spectral subtraction method that divides the speech spectrum into N non-overlapping bands, with the over-subtraction factor for each band calculated independently. In [17], a method is proposed to reduce the musical noise in silence and unvoiced regions by dividing each silence and unvoiced frame of spectrally subtracted speech into several sub-frames and randomizing the phases of each sub-frame over a uniform interval. Hasan et al. [18] introduced a procedure based on a self-adaptive averaging factor to estimate the a priori SNR, which is applied to the conventional spectral subtraction algorithm.
However, the performance of the above methods has not been satisfactory in adverse environments, particularly when the SNR is very low. The reason is that in very low SNR conditions it is very difficult to suppress noise without degrading intelligibility and introducing residual noise and speech distortion. Moreover, in most of the research work reported so far, only the estimated magnitude or power of the noise is subtracted from the original noisy speech signal in the spectral domain, while the phase of the signal remains unaltered, assuming that human perception is phase-deaf, i.e. the phase of the speech signal has a very minor effect on human sound perception [19]. A few recent experimental results [20] have shown that phase may play a significant role in the speech perception mechanism, particularly at low SNR. Therefore, intelligibility and listening fatigue may be improved further if phase information is incorporated in the algorithm.
Here we propose a multi-band complex spectral subtraction method that incorporates the phase in the subtraction process, based on the following assumption:
Although the phase of the original speech signal cannot be recovered from the noisy audio signal, incorporating phase estimation in the spectral subtraction procedure is immensely important for accurately estimating the magnitude of the clean speech signal, particularly at low SNR. Using the proposed algorithm, an average improvement of 0.87% in PESQ has been observed over Kamath's [15] multi-band spectral subtraction technique and almost 10.03% over Berouti's [12] spectral subtraction method for 0 dB non-stationary noise.
2.2 PROBLEM STATEMENT AND PROPOSED ALGORITHM
The noisy speech signal can be modelled in the time domain as

Y(n) = S(n) + N(n)    (2.1)

where Y(n), S(n) and N(n) represent the corrupted speech samples, clean speech samples and noise samples respectively. In the spectral domain, the noisy speech vector can be expressed as the resultant of the noise and clean speech vectors:
Y(k, j) = S(k, j) + N(k, j)    (2.2)

where k denotes the frequency bin and j is the frame index. Again, using the vector magnitude property, it can be written:

|Y(k, j)|² = |S(k, j)|² + |N(k, j)|² + 2|S(k, j)||N(k, j)| cos θ(k, j)    (2.3)

where θ(k, j) denotes the phase difference between the clean speech spectrum and the noise spectrum. If speech and noise are in the same phase, then θ(k, j) = 0, i.e.

|Y(k, j)| = |S(k, j)| + |N(k, j)|    (2.4)
Hence, the clean speech signal can be estimated from the noisy speech signal if we estimate the background noise:

|Ŝ(k, j)| = |Y(k, j)| − |N̂(k, j)|    (2.5)
In a real-world scenario, the clean speech signal and noise are not co-linear, and some error E is introduced due to the cos θ(k, j) factor in equation (2.3). This is best illustrated in Fig. 2.1, which shows that there is a significant difference between the projected |S(k, j)| and the estimated |S(k, j)|, which in turn produces the error E. This error increases further as the SNR decreases. In Fig. 2.2, using a single-frequency phasor diagram, it is demonstrated that if the clean speech signal magnitude and θ are kept constant and the noise level increases, then the error changes from E1 to E2, where E2 > E1. Hence, the computational error due to the phase difference between S(k) and N(k) is increased for an input speech signal having a low SNR value.
Fig. 2.1: Phase difference between clean speech and noise spectrum introduces magnitude error in
estimating the clean speech magnitude spectra from noisy signal spectrum.
Fig. 2.2: Phasor diagram showing that the estimation error increases with decreasing SNR.
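The geometric argument above can be checked numerically: for a fixed phase difference, the magnitude-subtraction error grows as the noise level rises. The magnitudes and angle below are made-up illustrative values.

```python
import numpy as np

def magnitude_error(S_mag, N_mag, theta):
    """Error E = | |S| - (|Y| - |N|) | incurred by magnitude
    subtraction when the speech/noise phase difference is theta
    (cf. eqs. (2.3)-(2.5))."""
    Y_mag = np.sqrt(S_mag**2 + N_mag**2 + 2 * S_mag * N_mag * np.cos(theta))
    S_est = Y_mag - N_mag          # what plain magnitude subtraction recovers
    return abs(S_mag - S_est)

theta = np.pi / 3                            # fixed 60-degree phase difference
E1 = magnitude_error(1.0, 0.5, theta)        # higher SNR (less noise)
E2 = magnitude_error(1.0, 2.0, theta)        # lower SNR (more noise)
```

With θ = 0 the error vanishes, matching eq. (2.4); with any non-zero θ, increasing the noise magnitude increases E, which is the E1-to-E2 behaviour shown in Fig. 2.2.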
To get rid of this problem, in this work a multi-band complex spectral subtraction technique has been proposed to estimate the clean speech spectrum. Within each frequency band, the subtraction is performed on the complex spectrum:

Ŝi(k, j) = Yi(k, j) − αi δi |N̂i(k, j)| e^(j φ̂n(k, j)),   bi ≤ k ≤ ei    (2.6)

where k indicates the frequency bin of the ith frequency band of the jth frame, and bi and ei are the beginning and ending frequency bins of the ith band respectively. |N̂i(k, j)| and φ̂n are the estimated magnitude and phase spectrum of the noise respectively. The parameter αi is the over-subtraction factor of the ith band and is a function of the segmental SNR. Moreover, δi is an additional band-subtraction factor (tweaking factor) [15], which can provide an additional degree of control within each frequency band.
The band-specific segmental SNRi can be given as:

SNRi(j) = 10 log10 ( Σ_{k=bi..ei} |Yi(k, j)|² / Σ_{k=bi..ei} |N̂i(k, j)|² )    (2.7)

Following [15], the over-subtraction factor αi is a piecewise-linear function of the segmental SNR:

αi = 4.75,               SNRi < −5 dB
αi = 4 − (3/20) SNRi,    −5 dB ≤ SNRi ≤ 20 dB    (2.8)
αi = 1,                  SNRi > 20 dB

and the band-subtraction factor δi depends on the frequency range of the band:

δi = 1,      fi ≤ 1 kHz
δi = 2.5,    1 kHz < fi ≤ Fs/2 − 2 kHz    (2.9)
δi = 1.5,    fi > Fs/2 − 2 kHz

where fi is the upper frequency of the ith band and Fs is the sampling frequency.
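A per-band sketch of the segmental SNR of eq. (2.7) together with the over-subtraction and tweaking factors; the piecewise constants follow the values published in Kamath and Loizou [15] and should be checked against that paper:

```python
import numpy as np

def band_factors(Y_band, N_band, f_lo, fs=8000):
    """Compute the segmental SNR of one band (eq. (2.7)) and derive
    the over-subtraction factor alpha_i and the band tweaking factor
    delta_i from it; piecewise constants as given in [15]."""
    snr = 10 * np.log10(np.sum(np.abs(Y_band) ** 2)
                        / max(np.sum(np.abs(N_band) ** 2), 1e-12))
    # Over-subtraction factor alpha_i as a function of segmental SNR.
    if snr < -5:
        alpha = 4.75
    elif snr <= 20:
        alpha = 4.0 - (3.0 / 20.0) * snr
    else:
        alpha = 1.0
    # Band tweaking factor delta_i, depending on the band frequency.
    if f_lo <= 1000:
        delta = 1.0
    elif f_lo <= fs / 2 - 2000:
        delta = 2.5
    else:
        delta = 1.5
    return snr, alpha, delta
```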
Fig. 2.3: (a) Time-domain representation of the input noisy signal; (b) variation of the over-subtraction factor alpha with frame number; (c) variation of delta with frame number.
Fig. 2.4: The segmental SNR of four linearly spaced frequency bands of speech corrupted by noise.
2.3 IMPLEMENTATION
The block diagram of the proposed multi-band complex spectral subtraction (MBCSS) technique is shown in Fig. 2.5. The signal, acquired from the AFC, is first pre-emphasised and Hamming-windowed using a 25-ms window with 50% overlap. The magnitude spectrum is estimated using a 256-point FFT block. After that, the speech spectrum is divided into N non-overlapping bands. In each band, the segmental SNR (SSNR) is evaluated, and based on its value, the over-subtraction factor and all other empirical factors involved in the subtraction process, explained in section 2.2, are generated for each band. The phase is also estimated by taking inputs from the noise spectrum estimator. After updating the noise spectrum and
getting the phase spectrum of the noise and the noisy speech, the complex subtraction process starts. Finally, the processed bands are recombined and the signal is reconstructed using the modified magnitude and phase spectra. The synthesis procedure is carried out with the overlap-add method [12].
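The band-wise complex subtraction step of this pipeline can be sketched as follows; the per-band `alphas` and `deltas` are assumed to come from the segmental-SNR rules of section 2.2, and the noise magnitude and phase from the noise spectrum estimator:

```python
import numpy as np

def mbcss_frame(Y, noise_mag, noise_phase, band_edges, alphas, deltas):
    """One analysis frame of multi-band complex spectral subtraction:
    within each band (b, e), a scaled *complex* noise estimate is
    subtracted from the noisy spectrum Y, so the phase is corrected
    along with the magnitude (a sketch of eq. (2.6))."""
    S_hat = np.copy(Y)
    N_complex = noise_mag * np.exp(1j * noise_phase)   # complex noise estimate
    for i, (b, e) in enumerate(band_edges):
        S_hat[b:e + 1] = Y[b:e + 1] - alphas[i] * deltas[i] * N_complex[b:e + 1]
    return S_hat
```

Frames produced this way would then be inverse-transformed and recombined with overlap-add, as described above.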
Figs. 2.6-2.9: Time-domain noisy speech signal with SNR = 0 dB, along with its spectrogram.
REFERENCES
[1] Kochkin, S., "MarkeTrak VIII: Consumer satisfaction with hearing aids is slowly increasing," Hearing Journal, vol. 63, no. 1, pp. 4-48, Jan. 2010.
[2] Kochkin, S., "MarkeTrak VII: Hearing loss population tops 31 million," Hearing Review, vol. 12, no. 7, pp. 16-29, 2005.
[3] Lim, J.S.; Oppenheim, A.V., "All-pole modeling of degraded speech," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 3, pp. 197-210, Jun. 1978.
[4] Paliwal, K.K.; Basu, A., "A speech enhancement method based on Kalman filtering," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87).
[5] Zhao, D.Y.; Kleijn, W.B., "HMM-based gain modelling for enhancement of speech in noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 882-892, Mar. 2007.
[6] Ding, H.; Soon, I.Y.; Yeo, C.K., "A DCT-based speech enhancement system with pitch synchronous analysis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2614-2623, Nov. 2011.
[7] Hamacher, V.; Fischer, E.; Kornagel, U.; Puder, H., "Applications of adaptive signal processing methods in high-end hearing aids," in Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing (E. Hansler and G. Schmidt, eds.), Springer, 2006.
[8] Hamacher, V.; Chalupper, J.; Eggers, J.; Fischer, E.; Kornagel, U.; Puder, H.; Rass, U., "Signal processing in high-end hearing aids: state of the art, challenges, and future trends," EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 2915-2929, 2005.
[9] Chen, Y.J.; Wei, C.W.; FanChiang, Y.; Meng, Y.L.; Huang, Y.; Jou, S., "Neuromorphic pitch based noise reduction for monosyllable hearing aid system application," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 2, pp. 463-475, Feb. 2014.
[10] Boll, S., "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.
[11] Loizou, P.C., Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC Press, Taylor and Francis Group, 2007.
[12] Berouti, M.; Schwartz, R.; Makhoul, J., "Enhancement of speech corrupted by acoustic noise," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), vol. 4, pp. 208-211, Apr. 1979.
[26] Rangachari, S.; Loizou, P.C.; Hu, Y., "A noise estimation algorithm with rapid adaptation for highly nonstationary environments," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 1, pp. 305-308, May 2004.
[27] Ma, J.; Hu, Y.; Loizou, P., "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," Journal of the Acoustical Society of America, vol. 125, no. 5, pp. 3387-3405, 2009.