
To: Amisha Agarwal, Head DSP Manager

From: Daniel Miller, Lead DSP Engineer


Subject: Recommendation for Modification of Verax Speech Enhancer
Date: 1 April 2012
Foreword
The Verax Speech Enhancer is designed to improve the intelligibility of human speech in audio from other media sources. In the current implementation, the reproduced sound is not of the quality that we expect. I have been tasked with investigating what improvements could be made to the device to raise the overall sound quality it produces. This memo recommends a modification to the Verax Speech Enhancer.
Summary
The Verax Speech Enhancer uses dynamic range compression and equalization to improve the intelligibility of speech. This means that the dynamics of the speech are smoothed out and raised in level so that both the quiet and loud parts of speech are more easily heard. The device also analyzes the incoming signal and determines appropriate parameters for the sound modifications described above.
Part of the process for this modification to the sound is sampling: taking the value of the signal at specific times. By taking these values more often, we are able to reproduce higher-frequency sounds. Humans can hear up to about 20 kHz at the high end. In our current implementation of the device we sample at 16 kHz, which can only reproduce frequencies up to 8 kHz (half the sampling rate). Because of this sampling rate, everything from 8 kHz to 20 kHz, more than half of the audible frequency range, is being cut off.
The reduced sampling rate was chosen to allow more time for computation between successive samples. This decision was made before we knew how long our computation would take. Now that the algorithms for compression and filtering have been completed, it is clear that increasing the sampling rate is a viable option.
Other potential improvements, such as developing more analysis methods and using larger filters, would require either significantly more development time or significantly higher product cost. The sampling rate increase would require minimal development and can run on the same hardware that has already been approved. For these reasons, I recommend that we increase the sampling rate of the Verax Speech Enhancer from 16 kHz to 48 kHz.
Discussion
The Verax Speech Enhancer is a device designed to be used between any stereo sound source and any speakers. By using dynamic range compression and frequency equalization, the device smooths out the dynamics of voices and brings up the level of the vocal frequencies. The device also determines appropriate parameters for the compression and equalization based on the frequency characteristics of the incoming signal. This process is implemented by sampling the incoming signal, filtering it into frequency bands, compressing it, recombining it, and converting it back to analog, while also analyzing the signal to choose appropriate parameters. The purpose of this memo is to discuss this process and potential improvements to it.
Sampling
The first step in our process is sampling the audio. In the current implementation we have chosen to sample at 16 kHz. This sampling rate allows us to accurately reproduce frequencies below 8 kHz, the Nyquist frequency (half the sampling rate). The range of human hearing extends from about 20 Hz to 20 kHz. Clearly, this sampling rate does not cover the complete range of human hearing, which limits the maximum sound quality our device can reproduce. We chose this sampling rate mainly because of computation time: all of our filtering and compression algorithms must complete in the time between samples, so lowering the sampling rate increases the time available for computation. One potential sound quality improvement is increasing the sampling rate, but this is only possible if all of the computation is fast enough.
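As a rough illustration of this trade-off, the Python sketch below computes the Nyquist frequency and the time available per sample at the current 16 kHz rate and at the board's 48 kHz rate. It assumes ideal sampling and ignores the transition band a real anti-aliasing filter needs, so the usable bandwidth is somewhat below the printed Nyquist figure. The function names are hypothetical.

```python
def nyquist_hz(fs_hz: float) -> float:
    """Highest frequency representable without aliasing at sampling rate fs_hz."""
    return fs_hz / 2.0


def budget_us(fs_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / fs_hz


for fs in (16_000, 48_000):  # current rate and the board's default rate
    print(f"fs = {fs / 1000:.0f} kHz: Nyquist = {nyquist_hz(fs) / 1000:.0f} kHz, "
          f"budget = {budget_us(fs):.1f} us per sample")
```

Tripling the sampling rate triples the Nyquist frequency (8 kHz to 24 kHz) but cuts the per-sample budget to a third (62.5 µs to about 20.8 µs), which is why the rate can only rise once the per-sample work is known to fit.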
Filtering
The next step in the process is filtering the signal. There are many different ways of separating the incoming signal into frequency bands, including IIR filters, FIR filters, and FFT filterbanks. Each of these technologies has benefits, but only some of these benefits are applicable to our project.
FFT filterbanks work by taking the fast Fourier transform (FFT) of a block of samples to get a coefficient for each frequency in the spectrum of the signal. To filter the signal, the coefficients are separated into groups so that there are multiple bands of frequencies. Once the groups have been determined, an inverse FFT (IFFT) is taken of each group, with zeros in place of the unused coefficients. Filterbanks allow for extremely steep filtering because each coefficient covers a narrow slice of the spectrum: the spacing between FFT coefficients is the sampling rate divided by the number of samples used for the FFT. This technology is not very useful to us, because we are trying to operate in near real time, taking FFTs and IFFTs is very time consuming, and a whole block of samples must be collected before it can be processed. If we reduce the size of the FFTs to reduce this delay, the frequency resolution is reduced and any spectral leakage that results from taking the FFT is accentuated. For these reasons we have decided not to use filterbanks for the device.
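A minimal NumPy sketch of the filterbank idea described above: one block is transformed, the bins are grouped into bands, and each band is inverse-transformed with zeros in the other bins. The band edges and the 512-sample block size are hypothetical, and a real streaming filterbank would also need windowing and overlap-add, which this sketch omits.

```python
import numpy as np


def fft_filterbank(block: np.ndarray, fs: float, edges_hz: list) -> list:
    """Split one block into frequency bands by zeroing FFT bins outside each band.

    edges_hz: ascending band edges in Hz, e.g. [0, 300, 3500, fs / 2].
    Returns one time-domain signal per band; the bands sum back to the block.
    """
    n = len(block)
    spectrum = np.fft.rfft(block)                 # one coefficient per bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # bins are spaced fs / n apart
    last = len(edges_hz) - 2
    bands = []
    for i, (lo, hi) in enumerate(zip(edges_hz[:-1], edges_hz[1:])):
        # the top band has no upper limit, so it keeps the Nyquist bin
        in_band = (freqs >= lo) & ((freqs < hi) | (i == last))
        bands.append(np.fft.irfft(spectrum * in_band, n=n))  # zeros elsewhere
    return bands


fs, n = 16_000.0, 512
print(f"bin spacing = {fs / n:.2f} Hz, block length = {1000 * n / fs:.0f} ms")
```

The printed figures show the trade-off: at 16 kHz a 512-point FFT gives 31.25 Hz bins, but the filterbank must wait for 32 ms of audio before it can process anything. Halving the block halves that wait and doubles the bin spacing.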
FIR (finite impulse response) filters compute each output from only the current and past input samples, with no feedback. As a result they are always stable, and in this form they are causal. Stability helps with fixed-point signals: because no output is fed back, rounding or overflow errors cannot recirculate, so the filter never gets stuck in internal overflow states. FIR filters with symmetric coefficients also have linear phase and a constant group delay. This means the filter does not shift the frequencies within the signal in time relative to each other, which could otherwise cause phase cancellation issues. FIR filters also have disadvantages for our situation. Because we are trying to operate in real time, delay is a real problem for our system, and a linear-phase FIR filter delays the signal by half its order in samples. FIR filters must be of very high order for the slopes at the cutoffs to be steep enough to separate the frequency bands adequately: to get the steepness we want, we would require FIR filters of at least order 100, which means a delay of 50 samples (about 3.1 ms at 16 kHz). Also, the phase linearity and constant group delay of FIR filters are of little value for our application, because the human auditory system is largely insensitive to phase differences between frequency components, so changes to the phases of the frequencies in the signal generally go unnoticed and constant group delay is unnecessary for our purposes.
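The sketch below, assuming NumPy and a hypothetical 1 kHz cutoff, designs an order-100 FIR low-pass by the windowed-sinc method (one standard design technique, not necessarily the one we would use) and filters with direct-form convolution. Its symmetric taps give linear phase, and the printed line shows the resulting delay of half the order.

```python
import numpy as np


def windowed_sinc_lowpass(order: int, cutoff_hz: float, fs: float) -> np.ndarray:
    """Linear-phase FIR low-pass with order + 1 taps (Hamming-windowed sinc)."""
    n = np.arange(order + 1) - order / 2          # tap indices centred on zero
    fc = cutoff_hz / fs                            # normalised cutoff, cycles/sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(order + 1)
    return h / h.sum()                             # unity gain at DC


def fir_filter(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Direct-form FIR: each output uses only current and past inputs, no feedback."""
    return np.convolve(x, h)[: len(x)]


order, fs = 100, 16_000.0
h = windowed_sinc_lowpass(order, cutoff_hz=1_000.0, fs=fs)
assert np.allclose(h, h[::-1])                     # symmetric taps -> linear phase
delay_samples = order / 2                          # group delay of a linear-phase FIR
print(f"{order + 1} taps, delay = {delay_samples:.0f} samples = "
      f"{1000 * delay_samples / fs:.3f} ms at {fs / 1000:.0f} kHz")
```

This prints a delay of 50 samples (3.125 ms) at 16 kHz; at 48 kHz, a filter with the same steepness in hertz needs roughly three times the order, so the delay in milliseconds stays about the same while the per-sample cost triples.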
IIR filters compute each output from current and past inputs as well as past outputs. Because the current output depends on past outputs, which in turn depended on earlier inputs, an impulse can in principle affect the output forever, which is where IIR filters get the name infinite impulse response. IIR filters do not have the overflow and constant group delay benefits of FIR filters, but they require a far lower order to achieve the same steepness in the transition band of the filter. Although IIR filters can have issues with overflow and group delay, these issues can be dealt with easily. As explained in the FIR section, variable group delay is not a significant issue for audio signals because the ear is largely insensitive to phase relations. To prevent internal overflows in an IIR filter, the filter can be broken into second-order sections called biquads. Each biquad is a six-coefficient IIR filter (three feedforward and three feedback coefficients). By choosing which zeros and poles to assign to each biquad, a transfer function can be determined for each one. From the coefficients, a "gain" value can be computed for each biquad; the input is multiplied by this gain before being sent through the biquad, preventing any overflow that could arise in that particular biquad. By applying these gains to every biquad, the whole IIR filter is protected from internal overflows and is safe to use with fixed-point signals.
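Below is a sketch, in plain Python floating point, of a biquad cascade with a per-section input gain. The coefficients come from the low-pass formulas in Robert Bristow-Johnson's Audio EQ Cookbook, and the gain uses L1-norm scaling (one divided by the sum of the absolute impulse-response values), which guarantees |output| ≤ 1 for any input with |input| ≤ 1. Both choices, and the 1 kHz, Q = 4 example section, are assumptions for illustration; a fixed-point version would also quantize the coefficients and state, which this sketch does not do.

```python
import math


def rbj_lowpass(f0: float, q: float, fs: float) -> tuple:
    """Six biquad coefficients (b0, b1, b2, a0, a1, a2), RBJ cookbook low-pass."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    return ((1 - c) / 2, 1 - c, (1 - c) / 2, 1 + alpha, -2 * c, 1 - alpha)


class Biquad:
    """Direct Form I second-order section (two zeros, two poles)."""

    def __init__(self, coeffs: tuple):
        b0, b1, b2, a0, a1, a2 = coeffs
        self.b = (b0 / a0, b1 / a0, b2 / a0)   # normalise so that a0 == 1
        self.a = (a1 / a0, a2 / a0)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x: float) -> float:
        b0, b1, b2 = self.b
        a1, a2 = self.a
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, x           # shift input history
        self.y2, self.y1 = self.y1, y           # shift output history (feedback)
        return y


def l1_gain(coeffs: tuple, n: int = 4096) -> float:
    """Input gain so that |y| <= 1 whenever |x| <= 1 (1 / sum of |impulse response|)."""
    probe = Biquad(coeffs)
    total = sum(abs(probe.step(1.0 if i == 0 else 0.0)) for i in range(n))
    return 1.0 / total


def run_cascade(samples, sections):
    """sections: list of (gain, Biquad); each section's input is scaled first."""
    out = []
    for x in samples:
        for gain, section in sections:
            x = section.step(gain * x)
        out.append(x)
    return out


fs = 16_000.0
coeffs = rbj_lowpass(1_000.0, 4.0, fs)          # hypothetical resonant section
cascade = [(l1_gain(coeffs), Biquad(coeffs))]
```

L1 scaling is conservative: it lowers the overall level more than typical signals require, so the lost gain has to be restored after the cascade, which is where the precision concerns raised in the compression section come in.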
Compression
Dynamic range compression is the process of detecting when a signal is over a specified threshold and bringing it down closer to the threshold. The amount by which the signal exceeds the threshold is multiplied by a ratio less than one, so the output sits at the threshold plus this reduced excess. To make the process sound more natural, this gain reduction is applied gradually over a set "attack" time. When the signal falls below the threshold again, the compression is "released" over some amount of time as well, to smooth the transition between compressed and uncompressed.
In our compression algorithm, we use the RMS (root mean square) value of the signal to determine whether it should be compressed. Using RMS detection instead of peak detection makes the compression more gradual and less harsh on every individual peak in the signal. To calculate the RMS value we must store the squares of previous inputs, and perform a division and a square root for each output. This requires computation that simply using peak values does not, but the result sounds more appropriate for our application. Using the RMS value also requires that we save a large number of previous samples, so that we can remove each one from the calculation once the window moves past it. Right now, the RMS value is calculated over only 10 ms (160 samples at 16 kHz), so the memory required to store those values is not prohibitively large.
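A floating-point sketch of the RMS detector described above, assuming a rectangular window: the squares of recent inputs are kept in a queue, the running sum is updated with the newest square and the square that leaves the window, and each output needs one division and one square root. The class and function names are hypothetical. A fixed-point implementation would keep the running sum in an integer accumulator, which avoids the rounding drift guarded against here.

```python
import math
from collections import deque


class RunningRMS:
    """Sliding-window RMS using a stored history of squares and a running sum."""

    def __init__(self, window_samples: int):
        self.n = window_samples
        self.squares = deque()
        self.total = 0.0

    def update(self, x: float) -> float:
        sq = x * x
        self.squares.append(sq)
        self.total += sq
        if len(self.squares) > self.n:
            self.total -= self.squares.popleft()   # drop the sample leaving the window
        # max() guards against tiny negative sums from floating-point rounding
        return math.sqrt(max(self.total, 0.0) / len(self.squares))


def window_samples(window_s: float, fs: float) -> int:
    """Number of samples covering window_s seconds at rate fs."""
    return int(round(window_s * fs))


print(window_samples(0.010, 16_000), window_samples(0.010, 48_000))
```

The last line prints `160 480`: the same 10 ms window needs three times as much storage at 48 kHz, still small for a queue of this kind.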
The actual gain reduction of our compression algorithm uses the standard definition: the output level is the threshold plus the difference between the signal and the threshold, multiplied by a ratio less than one. This is simple to implement, but becomes more complicated once attack and release times are added. The biggest issue with our compression is the potential loss of precision from the multiplication. This is one of the areas with no significant room for improvement.
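Written in decibels, as compressor curves conventionally are (the units are an assumption here), the static curve described above, with input level L_in, threshold T and ratio r, is:

```latex
L_{\mathrm{out}} =
\begin{cases}
L_{\mathrm{in}}, & L_{\mathrm{in}} \le T \\
T + r\,(L_{\mathrm{in}} - T), & L_{\mathrm{in}} > T
\end{cases}
\qquad 0 < r < 1

G = L_{\mathrm{out}} - L_{\mathrm{in}} = (r - 1)\,\max(L_{\mathrm{in}} - T,\ 0) \le 0,
\qquad g = 10^{G/20} \le 1
```

For example, with hypothetical values T = -20 dB, r = 0.25 (a 4:1 compressor) and an input level of -8 dB, the output level is -20 + 0.25 × 12 = -17 dB, a gain of G = -9 dB. Because G is never positive, the linear multiplier g applied to the samples never exceeds one.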
Implementing attack and release times for the compressor is fairly straightforward, but there are a few issues that could potentially be improved. The first issue is that, because of the definition of compression and the way release works, the signal below the threshold would actually be amplified until the release is finished. This is not the intended function of a compressor, so we have made sure that the factor by which samples are multiplied to achieve the appropriate gain reduction is only updated if the RMS value of the signal is above the threshold. There could be a better way of preventing the multiplier from going over one. Attack times are fairly straightforward. In the current implementation, however, the attack and release times are related by a factor, so that one must be an integer multiple of the other. Implementing a system in which each time can be set completely independently is one potential improvement to the current implementation.
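One possible shape for both improvements is sketched below in Python: the threshold-and-ratio curve is computed on the RMS level in decibels, and the resulting gain (not the signal) is smoothed with separate one-pole attack and release coefficients, so the two times are fully independent. Because the gain in decibels only moves toward targets at or below 0 dB, the linear multiplier cannot exceed one; the final clamp is only a safeguard. The class name, the one-pole smoothing law, and the parameter values are assumptions, not a description of the current firmware.

```python
import math


def time_coeff(time_s: float, fs: float) -> float:
    """One-pole smoothing coefficient for a time constant of time_s seconds."""
    return math.exp(-1.0 / (time_s * fs))


class GainSmoother:
    """Static compression curve with independent attack and release smoothing."""

    def __init__(self, threshold_db, ratio, attack_s, release_s, fs):
        self.t = threshold_db
        self.r = ratio                         # slope above threshold, 0 < r < 1
        self.att = time_coeff(attack_s, fs)    # independent of release
        self.rel = time_coeff(release_s, fs)   # independent of attack
        self.gain_db = 0.0                     # current gain, 0 dB = unity

    def target_db(self, level_db: float) -> float:
        over = level_db - self.t
        return (self.r - 1.0) * over if over > 0 else 0.0

    def step(self, rms: float) -> float:
        level_db = 20 * math.log10(max(rms, 1e-9))
        target = self.target_db(level_db)
        # more reduction needed -> attack speed; less needed -> release speed
        coeff = self.att if target < self.gain_db else self.rel
        self.gain_db = coeff * self.gain_db + (1 - coeff) * target
        self.gain_db = min(self.gain_db, 0.0)  # safeguard: never amplify
        return 10 ** (self.gain_db / 20)       # linear multiplier for the sample


comp = GainSmoother(threshold_db=-20.0, ratio=0.25, attack_s=0.005,
                    release_s=0.050, fs=16_000.0)
# per sample: y = x * comp.step(rms), with rms taken from the RMS detector
```

Smoothing in the gain domain means the release simply glides back toward unity when the level drops below the threshold, so no sample-by-sample check on the RMS value is needed to keep the multiplier at or below one.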
Signal Analysis
The audio analysis section of the project does not have to operate in real time, and can therefore use larger FFTs and more complex computations, because it is not limited by the time between samples. In the current implementation, we use three different methods to help determine appropriate parameters. By taking several FFTs over the course of multiple seconds and comparing all of the sub-decisions, we assign the incoming signal to one of three categories.
The first method of determining spectral quality compares the average value of the coefficients in the 20-80 Hz frequency range to the average value of the coefficients in the 80-300 Hz range. This comparison essentially looks for the presence of bass drums or other very low instruments, compared to the range that contains the fundamental frequency of most voices.
The second method determines whether the highest peaks in the 20-80 Hz frequency range are higher than the highest peaks in the 80-300 Hz range. This provides another comparison between very low instruments and voices, but considers only peaks, because the averages could be artificially high or low depending on the content of the signal.
The third comparison is between the average level of frequencies above 3500 Hz and the average level of frequencies in the 1000-3500 Hz range. Because most of the energy of the voice lies below about 3500 Hz, music and other full-range media types should have more high-frequency energy than purely vocal signals.
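The three comparisons can be sketched on a single FFT frame as follows, assuming NumPy, a Hann window, and a 4096-point frame (0.256 s at 16 kHz, affordable because this path is not real-time). The function and dictionary key names are hypothetical. The sketch stops at the sub-decisions, and for the third method it returns the ratio of the two averages rather than a yes/no answer, because the decision threshold and the rule that combines sub-decisions across frames into the three categories are not described here.

```python
import numpy as np


def band(freqs: np.ndarray, mags: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Magnitudes of the bins whose centre frequency lies in [lo, hi)."""
    return mags[(freqs >= lo) & (freqs < hi)]


def spectral_subdecisions(frame: np.ndarray, fs: float) -> dict:
    """The three spectral comparisons for one analysis frame."""
    mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = band(freqs, mags, 20.0, 80.0)           # bass drums, very low instruments
    voice = band(freqs, mags, 80.0, 300.0)        # fundamentals of most voices
    mid = band(freqs, mags, 1000.0, 3500.0)
    high = band(freqs, mags, 3500.0, fs / 2 + 1.0)  # +1 Hz keeps the Nyquist bin
    return {
        "low_avg_above_voice_avg": bool(low.mean() > voice.mean()),    # method 1
        "low_peak_above_voice_peak": bool(low.max() > voice.max()),    # method 2
        "high_to_mid_avg_ratio": float(high.mean() / mid.mean()),      # method 3
    }


fs = 16_000.0
t = np.arange(4096) / fs
print(spectral_subdecisions(np.sin(2 * np.pi * 50 * t), fs))  # a 50 Hz "bass" tone
```

A frame this long gives bins about 3.9 Hz apart, enough to place roughly fifteen bins in the narrow 20-80 Hz band; a real-time-sized FFT could not resolve that band at all.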
More methods could be developed to categorize the signals, and more categories could be defined to tune the parameters more precisely to each signal type, if this area of the project were to be improved.
Potential Improvements
There are many options for improving the sound quality of the device. Increasing the power of the hardware with a faster processor and more memory would allow for larger filters and more computation between samples, but upgrading the hardware increases the cost of the device significantly. Changing filter types provides no real benefit on the hardware we are using because of its computation time and memory limitations. Spending time improving the attack and release algorithms, as well as the gain reduction algorithm, could provide sound quality improvements, but these would likely be subtle changes that would not be widely appreciated. Spending time developing new methods and categories for our media type detection algorithms would help tailor the compression to the specific input signal, but those sound quality gains would also likely be subtle and hard for the average consumer to appreciate. The largest and easiest way to improve the sound quality of our device is to increase the sampling rate.
Increasing the sampling rate would be a very easy change. The current implementation of the device samples at the default sampling rate of the board, which is 48 kHz, and simply throws away two of every three samples. All that would be required is to keep all of the samples and spend some time redesigning our IIR filters for a 48 kHz sampling rate. Based on the current computation time used for filtering, RMS computation, and compression, all of our time-sensitive computations will fit within the interval between samples at 48 kHz. The FFT and signal analysis run independently of the compression and can be paused and resumed using interrupts, so they do not prevent the higher sampling rate. This increase in sampling rate raises the maximum reproducible frequency from 8 kHz to 24 kHz. This brings roughly 1.3 octaves of audible frequencies (8 kHz to 20 kHz) into the range that we are able to reproduce and allows us to fully reproduce the range of human hearing.
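The arithmetic behind this recommendation can be checked with a short Python sketch. `current_decimation` is a hypothetical stand-in for keeping one sample in three; discarding samples this way is only alias-free if the signal has first been low-pass filtered below 8 kHz, and the sketch assumes such a filter is applied upstream. The octave figures compare the old and new Nyquist limits, and the part of that gain that lies within human hearing.

```python
import math

BOARD_FS = 48_000   # default sampling rate of the board, per this memo


def current_decimation(samples: list) -> list:
    """Keep one sample in three (48 kHz -> 16 kHz), assuming prior low-pass filtering."""
    return samples[::3]


def octaves(f_lo: float, f_hi: float) -> float:
    """Width of the interval [f_lo, f_hi] in octaves."""
    return math.log2(f_hi / f_lo)


print(BOARD_FS // 3)                        # effective rate today
print(round(octaves(8_000, 24_000), 2))     # gain in Nyquist limit
print(round(octaves(8_000, 20_000), 2))     # gain within human hearing
```

The three lines print 16000, 1.58 and 1.32: the Nyquist limit rises by about 1.6 octaves, of which about 1.3 octaves fall inside the audible range.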
Outline
1.2
OP: Low sound quality
TA: Identify possible improvements
IP: Recommend a way of increasing sound quality within our limitations
Recommendation: Increase the sampling rate to allow for better high-frequency representation
A higher sampling rate allows higher frequencies to be represented, allowing for a more natural sound.
Increasing the sampling rate will only require the redesign of filters. Other potential improvements require new hardware.
