Presentation Outline
Problem Definition and Background
Proposed System
Double-Talk Detection
Speaker Identification
Performance Evaluation
R. Saeidi et al.
Fundamentals
Recognize BOTH of the speakers present in a MIXED audio file
The novelty of this work is the inclusion of a double-talk detector (DTD) as a pre-processor for a previously proposed speaker identification back-end
R. Saeidi, P. Mowlaee, T. Kinnunen, Z.-H. Tan, M. G. Christensen, S. H. Jensen, and P. Fränti, "Signal-to-signal ratio independent speaker identification for co-channel speech signals," IEEE 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4565-4568, Istanbul, Turkey, August 2010.
There are SOME studies that recognize BOTH of the speakers, but they need at least TWO microphones [1]
There are FEW studies that recognize BOTH of the speakers when only ONE microphone is available [2]
Building a stand-alone speaker identification system as a computationally less intensive alternative to the Super Human Iroquois system [2]
Bringing in single-talk/double-talk information at the frame level to improve monaural speaker identification
[1] Y. E. Kim, J. M. Walsh, and T. M. Doll, "Comparison of a joint iterative method for multiple speaker identification with sequential blind source separation and speaker identification," in Odyssey 2008: The Speaker and Language Recognition Workshop, Jan. 2008.
[2] J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson, "Super-human multi-talker speech recognition: A graphical modeling approach," Elsevier Computer Speech and Language, vol. 24, no. 1, pp. 45-66, Jan. 2010.
System structure
[Figure: panel labels Mixed signal, Speaker 1, Speaker 2; horizontal axis Time (sec)]
Figure: Double-talk detection results for a mixture of a male and a female speaker mixed at 3 dB SSR. (Labels are -1 for no speech, 0 for mixed signal, 1 for speaker 1 and 2 for speaker 2.)
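The label convention in the caption can be used to split a frame sequence into the groups that the later slides call T0 (mixed), T1 (speaker 1) and T2 (speaker 2). A minimal Python sketch, assuming the detector outputs one integer label per frame; the function name and the example label sequence are hypothetical and are not taken from the authors' MATLAB code:

```python
import numpy as np

def split_frames_by_label(labels):
    """Group frame indices by double-talk label.

    Label convention from the figure caption:
    -1 = no speech, 0 = mixed signal, 1 = speaker 1, 2 = speaker 2.
    """
    labels = np.asarray(labels)
    return {
        "silence": np.flatnonzero(labels == -1),
        "mixed": np.flatnonzero(labels == 0),      # T0 frames
        "speaker1": np.flatnonzero(labels == 1),   # T1 frames
        "speaker2": np.flatnonzero(labels == 2),   # T2 frames
    }

# Hypothetical label sequence for eight frames
groups = split_frames_by_label([-1, 1, 1, 0, 0, 0, 2, -1])
counts = {name: len(idx) for name, idx in groups.items()}
```

Here `counts` holds the number of frames per group, which is what the bonus term in the score fusion step needs.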
R. Saeidi et al. Monaural Speaker Identification
Speaker Identification
Frame Level Likelihood (FLL)
The main idea is to use GMMs trained in the mixed-speech domain
We use the T0 frames of the input feature stream that are recognized as mixed speech
We compute the FLL as: $s_{ig}^{t} = \log p(x_t \mid \lambda_{ig}) - \log p(x_t \mid \lambda_{\mathrm{UBM}})$ (1)
After finding the most probable speaker for each frame, we count the number of winning frames per speaker and normalize it
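The frame-level scoring described above can be sketched in Python as follows. This is only an illustration under assumptions not stated on the slide: all models are diagonal-covariance GMMs given as `(weights, means, variances)` tuples, each entry of `speaker_models` stands for one candidate model, and the winning-frame count is normalized by the number of frames. The function names and toy models are hypothetical.

```python
import numpy as np

def gmm_frame_loglik(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    x: (T, D) frames; weights: (M,); means, variances: (M, D).
    Returns an array of shape (T,).
    """
    diff = x[:, None, :] - means[None, :, :]                            # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)     # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components, computed stably
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def fll_scores(x, speaker_models, ubm):
    """Normalized count of winning frames per model (ASSUMED normalization: / T)."""
    ubm_ll = gmm_frame_loglik(x, *ubm)
    # Frame-level log-likelihood ratio against the UBM, shape (num_models, T)
    llr = np.stack([gmm_frame_loglik(x, *m) - ubm_ll for m in speaker_models])
    winners = llr.argmax(axis=0)                  # most probable model per frame
    wins = np.bincount(winners, minlength=len(speaker_models))
    return wins / x.shape[0]

# Toy example: two single-component 1-D models and a broad UBM
model_a = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
model_b = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
ubm = (np.array([1.0]), np.array([[2.5]]), np.array([[9.0]]))
frames = np.array([[0.1], [-0.3], [4.8], [0.2]])
scores = fll_scores(frames, [model_a, model_b], ubm)
```

Because the UBM term is the same for every model at a given frame, it does not change which model wins that frame; it is kept here only to mirror Eq. (1).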
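Speaker Identification
Kullback-Leibler divergence (KLD)
We use the T0 frames of the input feature stream that are recognized as mixed speech
We compute the KLD as: $\mathrm{KLD}_{ig} = \frac{1}{2} \sum_{m=1}^{M} w_m (\mu_{me} \cdots$ [remainder of the equation missing in the source]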
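The equation above is cut off in the source, so the following is only an illustration and not the authors' formula. It assumes the commonly used approximation of the KL divergence between two GMMs that share weights and diagonal covariances and differ only in their means (as with mean-only MAP adaptation from a common UBM): half the weighted sum of squared, variance-normalized mean differences. All names and numbers are hypothetical.

```python
import numpy as np

def kld_mean_adapted(weights, variances, means_a, means_b):
    """Approximate KL divergence between two mean-adapted GMMs.

    ASSUMPTION: both GMMs share `weights` (M,) and diagonal `variances` (M, D)
    and differ only in their means, each of shape (M, D).
    """
    diff = means_a - means_b
    per_component = np.sum(diff**2 / variances, axis=1)   # (M,)
    return 0.5 * float(np.sum(weights * per_component))

# Toy example with M = 2 components and D = 2 dimensions
weights = np.array([0.5, 0.5])
variances = np.array([[1.0, 4.0], [1.0, 1.0]])
means_test = np.array([[0.0, 0.0], [1.0, 1.0]])
means_model = np.array([[1.0, 2.0], [1.0, 1.0]])
kld = kld_mean_adapted(weights, variances, means_test, means_model)
```

A smaller value means the two models are closer, so a divergence of this kind has to be mapped to a "larger is better" scale before it is combined with a likelihood-based score.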
Speaker Identification
Score Fusion
For the T0 frames of the input feature stream that are recognized as mixed speech, we form the score per speaker as: score = 0.5 KLD + 0.5 FLL
The T1 (T2) frames of the input that are recognized as belonging to speaker 1 (2) are passed to the KLD module to find the best match
If idx is the speaker identified from the single-talk frames, we add a bonus to its decision score: score[idx] = score[idx] + T1/T (or T2/T)
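The fusion and bonus rules above can be sketched as follows. This is a minimal Python illustration under assumptions not stated on the slide: the KLD and FLL inputs are per-speaker arrays already mapped to a common scale where larger means a better match, and T is taken to be the total number of frames in the utterance. Function and variable names are hypothetical.

```python
import numpy as np

def fuse_scores(kld_mixed, fll_mixed, single_talk_ids, single_talk_counts, total_frames):
    """Combine mixed-frame scores and add single-talk bonuses.

    kld_mixed, fll_mixed: (num_speakers,) scores from the T0 mixed frames.
        ASSUMPTION: both are already normalized so that larger is better.
    single_talk_ids: speaker index found by the KLD module for the T1 and T2 frames.
    single_talk_counts: the frame counts T1 and T2.
    total_frames: T (ASSUMPTION: total number of frames in the utterance).
    """
    score = 0.5 * np.asarray(kld_mixed, dtype=float) + 0.5 * np.asarray(fll_mixed, dtype=float)
    for idx, count in zip(single_talk_ids, single_talk_counts):
        if count > 0:                      # no bonus if no single-talk frames were found
            score[idx] += count / total_frames
    return score

# Hypothetical scores for three candidate speakers
score = fuse_scores(
    kld_mixed=[0.2, 0.6, 0.4],
    fll_mixed=[0.1, 0.5, 0.4],
    single_talk_ids=[2, 1],
    single_talk_counts=[20, 10],
    total_frames=100,
)
ranking = np.argsort(score)[::-1]          # best-scoring speaker first
```

The bonus grows with the share of single-talk frames, so a speaker who is heard alone for a larger part of the utterance gets a larger boost.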
Evaluation Corpus
Grid corpus
Number of sentences per talker: 1000
Number of speakers: 34 (18 male and 16 female)
Corpus size: 34,000
Number of distinct sentences: 2048
File duration: typically 1-2 sec
Speaker Identification
Results
Table: Speaker identification performance (% error) where both speakers are correctly found in the top-3 list. Yes/No indicates whether the proposed DTD method is included. For the ST scenario both systems provide no error.
                   SSR
Scenario  DTD    -9 dB   -6 dB   -3 dB    0 dB    3 dB    6 dB   Average
SG        No      7.26    3.35    0.56    1.68    2.79    6.15      3.64
SG        Yes     6.70    3.35    0.56    1.68    2.23    5.59      3.35
DG        No     17.50    6.00    2.50    1.00    6.50    9.50      7.17
DG        Yes    13.03    5.00    2.00    2.00    5.00   10.50      6.17
Average   No      8.00    3.00    1.00    0.83    3.00    5.00      3.47
Average   Yes     5.32    2.29    0.61    0.61    1.89    4.37      2.57
Conclusion
Successful ideas from speaker verification are applied to monaural speaker identification
Mixed speech with different SSRs is used to train the speaker GMMs
Speaker models are created by MAP adaptation rather than conventional ML training
Double-talk detection is introduced to improve speaker identification performance
MATLAB code
will be made available on my webpage: cs.joensuu.fi/pages/saeidi
Contact: rahim@cs.joensuu.fi