1 Codebook training

The codebook must be trained with speech data before it can be used for speech enhancement.
The more data used for the training, the better the codebook represents speech. This also
means that the codebook must be trained with many different speakers and a wide variety of
speech sounds. Training becomes even more difficult if the codebook is to be used
for several languages, since many languages have unique sounds that differ from those of other languages.
Making a codebook that covers all ages and several languages is a demanding task,
both in generating and finding speech samples that represent each language well and in the computing power needed to calculate the codebook. Computing
power is a big issue, since the calculation time seems to grow roughly exponentially with the amount of input data for the training.
Computing power and time were the main limitations for generating a codebook in this
project. As a result, the codebooks only represent four persons: two women and
two men. One codebook was made with all four persons, and two further codebooks with the men and
the women respectively. For each person approximately five sound files were chosen.
The names of the files used for training the codebook are listed below. The three-letter
prefix describes the content of the speech: NDS are numbers, PDS are phrases and
SDS are sentences. The next two letters are the initials of the person.
Men
NDS_EJN1001016k_original_short.wav
NDS_EJN2001116k_original_short.wav
NDS_EJN3001216k_original_short.wav
NDS_KTN1001116k_original_short.wav
NDS_KTN2001216k_original_short.wav
NDS_KTN3001316k_original_short.wav
PDS_EJK2000416k_original_short.wav
PDS_EJK3000516k_original_short.wav
PDS_KTD2000516k_original_short.wav
PDS_KTD3000616k_original_short.wav
SDS_EJK5000716k_original_short.wav
SDS_KTD5000816k_original_short.wav
Sound duration men: 296 s

Women
NDS_HGN1000716k_original_short.wav
NDS_HGN2000816k_original_short.wav
NDS_HGN3000916k_original_short.wav
NDS_JDN1026516k_original_short.wav
NDS_JDN2026616k_original_short.wav
PDS_HGF1000016k_original_short.wav
PDS_HGF2000116k_original_short.wav
PDS_JDG2000316k_original_short.wav
PDS_JDG3000416k_original_short.wav
PDS_JDG4000516k_original_short.wav

Sound duration women: 237 s
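The naming convention described above can be illustrated with a short Matlab sketch. It assumes that every file name starts with the three-letter content code, an underscore and the two-letter initials, as the listed names do. The variable names are chosen for illustration only and are not part of the project code.

```matlab
% Illustrative sketch: split a training file name into its parts.
% Assumed layout: <3-letter content code>_<2-letter initials><rest>.wav
fname = 'NDS_EJN1001016k_original_short.wav';

content_code = fname(1:3);   % 'NDS', 'PDS' or 'SDS'
initials     = fname(5:6);   % initials of the speaker, here 'EJ'

switch content_code
    case 'NDS'
        content = 'numbers';
    case 'PDS'
        content = 'phrases';
    case 'SDS'
        content = 'sentences';
    otherwise
        content = 'unknown';
end

fprintf('%s: speaker %s, content %s\n', fname, initials, content);
```

Applied to the lists above, this gives the four speakers EJ and KT (men) and HG and JD (women).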

The total duration of the speech files for training the codebook is 533 seconds. It would
have been desirable to have a longer duration, but the limited computing power restricted the
amount of data. Another way of indirectly getting more data could be to narrow down
the number of persons, so that the codebook gives an even better representation of a single
person used for the training.

1.1 Matlab code for codebook training


Figure 1.1 below shows a simple block diagram of the functions for codebook training. The K-means function is described in chapter ?? on page ??, and a
more detailed description of the error correction and the idea behind the codebook is found in
chapter ?? on page ??.
[Block diagram. Blocks: Wideband training data, Framing, Bandpass filter (Telefilter), K-means, Index, Narrowband codebook (2048 x 10), Error calculation, Wideband codebook (2048 x 20), Final codebook.]

Figure 1.1: Overview of the different processes for training the codebook.
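The framing block can be sketched in a few lines of Matlab. This is a minimal illustration, not the project's framing function: the sampling rate of 8 kHz, the placeholder signal and all variable names are assumptions, while the 20 ms frame length and the 10 ms step are the values used in the listing in section 1.2.

```matlab
% Illustrative framing sketch with overlapping frames.
fs = 8000;                 % assumed sampling rate [Hz]
y  = (1:800)';             % placeholder signal, 0.1 s at 8 kHz

framelength  = 20e-3;      % frame length [s]
framestep    = 10e-3;      % step between the starts of two frames [s]
framesamples = round(framelength * fs);   % samples per frame
stepsamples  = round(framestep * fs);     % samples per step

% Number of complete frames that fit into the signal
nframes = floor((length(y) - framesamples) / stepsamples) + 1;

frames = zeros(framesamples, nframes);    % one frame per column
for k = 1:nframes
    first = (k - 1) * stepsamples + 1;
    frames(:, k) = y(first : first + framesamples - 1);
end
```

With these values each frame shares half of its samples with the next one, so every frame adds 10 ms of new signal.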

The three final codebooks are calculated from the training data of the men, the women, and a combination of both. The number of clusters in each of these codebooks is 2048.
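To give a feeling for how much data stands behind each cluster, the following sketch estimates the number of training vectors per codebook entry. It assumes one training vector for every 10 ms of speech, which is the frame step used in the listing in section 1.2, and it ignores the few frames lost at the end of each file. The variable names are illustrative.

```matlab
% Rough estimate of training vectors per cluster (illustrative only).
dur_ms    = [296 237 533] * 1000;   % men, women, combined [ms]
step_ms   = 10;                     % assumed frame step [ms]
nclusters = 2048;

nvectors    = dur_ms / step_ms;     % approximate number of training vectors
per_cluster = nvectors / nclusters; % average vectors per codebook entry

labels = {'men', 'women', 'combined'};
for k = 1:3
    fprintf('%-8s: %6d vectors, %.1f per cluster\n', ...
            labels{k}, nvectors(k), per_cluster(k));
end
```

Under these assumptions the codebooks are built from roughly 14 (men), 12 (women) and 26 (combined) vectors per cluster on average, which illustrates why a longer training duration would have been desirable.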

1.2 Matlab Source Code


function func_codebooktraining(names, size_of_codebook)
% Function for training the codebook. The input variables are an array with the names of the
% training data, and "size_of_codebook", which gives the number of clusters in the codebook.
lsf = [];
etotal = [];
for j = 1:length(names)
    used_wav_file = char(names(j));
    [y, fs] = wavread(used_wav_file);    % Read wave file (audioread in newer Matlab releases)
    y = y(:,1);
    global HP LP
    load filtre
    y = func_telefilter2(y);             % "Channel" filtering
    fs = fs/2;

    framelength = 20*10^-3;              % length of frame from input signal (even number) [unit: second]
    framelengthoverlap = 10*10^-3;       % length of overlap between two frames [unit: second]
    framelengthwindow = framelength;     % + 2*framelengthoverlap; total length of frames [unit: second]

    framesamples = framelengthwindow/(1/fs);          % length of frame from input signal [unit: samples]
    framesamplesoverlap = framelengthoverlap/(1/fs);  % length of overlap between two frames [unit: samples]
    maxframes = length(y)/framesamplesoverlap;        % used for framing [samples in frame, number of frames]

    tic
    for frame = 1:maxframes-1
        % framing functions
        signal = func_frame_in_data(y, framesamples, framesamplesoverlap, frame);
        windowedsignal = func_windowing(signal);
        [aLPC, e] = func_lpc_coeff(windowedsignal, 10);
        etotal = [etotal e];
        lsf = [lsf poly2lsf(aLPC)];      % one column of LSFs per frame
    end
    toc
end
% K-means function. The parameter 'sqEuclidean' controls the distance measurement from each
% cluster to the vector. 'EmptyAction' sets the way the function handles an empty cluster.
% When 'singleton' is chosen, an empty cluster will be deleted, and a new one created. The
% new cluster is located where the longest distance between a cluster and a vector is
% found.
% kmeans expects one observation per row, hence the transpose of lsf.
[cidx, ctrsfiltered] = kmeans(lsf', size_of_codebook, 'Distance', 'sqEuclidean', ...
    'Replicates', 1, 'Display', 'iter', 'EmptyAction', 'singleton');
save('filtered_telefilt', 'ctrsfiltered', 'cidx')

clear lsf; clear etotal; clear y;
lsf = [];
etotal = [];
energy = [];

% Second pass over the unfiltered (wideband) signals.
% framesamples, framesamplesoverlap and maxframes keep the values from the last file of the first pass.
for j = 1:length(names)
    used_wav_file = char(names(j));
    [y, fs] = wavread(used_wav_file);
    y = y(:,1);
    tic
    for frame = 1:maxframes-1
        signal = func_frame_in_data(y, framesamples, framesamplesoverlap, frame);
        windowedsignal = func_windowing(signal);
        [aLPC, e] = func_lpc_coeff(windowedsignal, 20);
        energy = [energy sum(abs(windowedsignal).^2)];
        etotal = [etotal e];
        lsf = [lsf poly2lsf(aLPC)];
    end
    toc
end

% Wideband centroid of each cluster: mean of the wideband LSF vectors that share the
% narrowband cluster index.
ctrsfull = zeros(size_of_codebook, 20);
for i = 1:size_of_codebook
    I = find(cidx == i);
    for j = 1:length(I)
        ctrsfull(i,:) = lsf(:,I(j))' + ctrsfull(i,:);
    end
    ctrsfull(i,:) = ctrsfull(i,:)/length(I);
end
save('full_telefilt', 'ctrsfull', 'cidx')

% Calculation of the error distance.
ctrsErrorWB = [ctrsfull(:,1:10) - ctrsfiltered(:,:)/2, ctrsfull(:,11:20)];
% Note: frameenergy and errorpower are not assigned in this function.
save('errorwb_telefilt', 'ctrsErrorWB', 'frameenergy', 'errorpower', 'cidx')
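The last part of the listing can be illustrated on toy data. The sketch below assumes that each wideband centroid is the mean of the wideband LSF vectors whose frames share a narrowband cluster index, and that the error is the difference between the lower half of the wideband centroid and the narrowband centroid divided by two. All numbers and variable names are invented for illustration; the sizes are far smaller than the 2048-entry codebooks.

```matlab
% Toy example: wideband centroids from a narrowband cluster index.
nb_centroids = [0.4 1.0;            % 2 clusters x 2 narrowband LSFs
                0.8 2.0];
cidx = [1; 2; 1; 2];                % cluster index of each frame
wb_lsf = [0.2 0.5 1.8 2.4;          % 4 frames x 4 wideband LSFs
          0.4 1.0 2.0 2.6;
          0.4 0.7 2.0 2.8;
          0.6 1.2 2.2 3.0];

nclusters = size(nb_centroids, 1);
wb_centroids = zeros(nclusters, size(wb_lsf, 2));
for i = 1:nclusters
    % Mean of all wideband vectors assigned to cluster i
    wb_centroids(i, :) = mean(wb_lsf(cidx == i, :), 1);
end

% Lower half: difference to the halved narrowband centroids.
% Upper half: wideband centroids kept unchanged.
half = size(nb_centroids, 2);
err_cb = [wb_centroids(:, 1:half) - nb_centroids / 2, ...
          wb_centroids(:, half+1:end)];
disp(err_cb)
```

Using the logical mask `cidx == i` together with `mean` gives the same result as summing the vectors in an inner loop and dividing by their count, but in a single step per cluster.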
