
Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Origin of Speech Coding

"Watson, if I can get a mechanism which will make a current of electricity vary its intensity as the air varies in density when sound is passing through it, I can telegraph any sound, even the sound of speech."
  A. G. Bell, 1875 (analog communication)

Entropy formula of a discrete source:

  H(X) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)

  C. E. Shannon, 1948 (digital communication)

Digitization of Speech Signals

  continuous-time speech signal x_c(t) --> Sampler --> x(n) = x_c(nT) --> Quantizer --> quantized samples

x(n): discrete sequence of speech samples

Sampling

Sampling Theorem: when sampling a signal (e.g., converting from an analog signal to digital), the sampling frequency must be greater than twice the bandwidth of the input signal in order to be able to reconstruct the original perfectly from the sampled version.

Sampling frequency: > 8K samples/second (human speech is roughly band-limited at 4 KHz).

Quantization

In physics: to limit the possible values of a magnitude or quantity to a discrete set of values by quantum mechanical rules.

In speech coding: to limit the possible values of a speech sample or prediction residue to a discrete set of values by information theoretic rules (tradeoff between Rate and Distortion).

Quantization Examples

Examples
  Continuous-to-discrete: round your tax return to integers.
  Discrete-to-discrete: "the mileage of my car is about 60K."

Play with bits
  Precision is finite: the more precise, the more bits you need (to resolve the uncertainty).
  Keep a card in secret and ask your partner to guess. He/she can only ask Yes/No questions: "Is it bigger than 7?" "Is it less than 4?" ...

However, not every bit has the same impact
  A quart of milk, two gallons of gas, normal temperature is 98.6 F, my height is 5 foot 9 inches.
  How much did you pay for your car? ("two thousand" vs. $2016.78)

Scalar vs. Vector Quantization

Scalar: for a given sequence of speech samples, we will process (quantize) each sample independently.
  Input: N samples --> output: N codewords

Vector: we will process (quantize) a block of speech samples each time.
  Input: N samples --> output: N/d codewords (block size is d)

SQ is a special case of VQ (d = 1).

Scalar Quantization

In SQ, quantizing N samples is not fundamentally different from quantizing one sample (since they are processed independently).

  original value x --f--> quantization index s in S --f^{-1}--> quantized value x^

  Quantizer: x --> x^

A quantizer is defined by a codebook (collection of codewords) and a mapping function (straightforward in the case of SQ).

Rate-Distortion Tradeoff

Rate: how many codewords (bits) are used?
  Example: 16-bit audio vs. 8-bit PCM speech

Distortion: how much distortion is introduced?
  Example: mean absolute difference (L1), mean square error (L2)

[Figure: rate-distortion curves of SQ and VQ, distortion vs. rate (bps)]

Question: which quantizer is better?

Uniform Quantization

A scalar quantizer is called a uniform quantizer (UQ) if all its codewords are uniformly distributed (equally distanced).

Example (quantization stepsize = 16): codewords at ..., 24, 40, ..., 248, ...

Uniform distribution, denoted by U[-A, A]:

  f(x) = 1/(2A)  for x in [-A, A]
       = 0       else

[Figure: pdf f(x) with constant height 1/(2A) over [-A, A]]
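
A minimal MATLAB sketch of such a quantizer (the step size 16 comes from the example above and the mid-rise reconstruction levels ..., 24, 40, ... match it; the test samples are my own assumption):

  % Minimal uniform (mid-rise) quantizer sketch
  Delta = 16;                               % quantization step size
  x     = 100*randn(1, 5);                  % assumed test samples
  s     = floor(x / Delta);                 % quantization index
  x_hat = Delta * (s + 0.5);                % codewords ..., -24, -8, 8, 24, 40, ...
  max(abs(x - x_hat))                       % quantization error is at most Delta/2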

6dB/Bit Rule

[Figure: pdf f(e) of the quantization error, uniform with height 1/Delta over (-Delta/2, Delta/2)]

Note: the quantization noise of UQ on a uniform distribution is also uniformly distributed.
For a uniform source, adding one bit/sample can reduce the MSE, i.e., increase the SNR, by 6 dB.
(The derivation of this 6dB/bit rule will be given in the class.)
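
A quick numerical sanity check of the rule (my own sketch, assuming a uniform source on [-1, 1] and mid-rise uniform quantizers with R and R+1 bits):

  % Empirical check of the 6dB/bit rule on a uniform source
  A = 1; x = A*(2*rand(1, 1e5) - 1);          % uniform source on [-A, A]
  R = 6; snr = zeros(1, 2);
  for k = 1:2
      Delta = 2*A / 2^(R + k - 1);            % step size for R and R+1 bits
      x_hat = Delta*(floor(x/Delta) + 0.5);   % mid-rise uniform quantizer
      snr(k) = 10*log10(mean(x.^2) / mean((x - x_hat).^2));
  end
  snr(2) - snr(1)                             % close to 6 dB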

Non-uniform Quantization

Motivation
  Speech signals have the characteristic that small-amplitude samples occur more frequently than large-amplitude ones.
  The human auditory system exhibits a logarithmic sensitivity:
    more sensitive in the small-amplitude range (e.g., 0 might sound different from 0.1);
    less sensitive in the large-amplitude range (e.g., 0.7 might not sound much different from 0.8).

[Figure: histogram of typical speech signals]

From Uniform to Non-uniform

F: nonlinear compressing function
F^{-1}: nonlinear expanding function
F and F^{-1}: nonlinear compander

  x --F--> y --UQ--> y^ --F^{-1}--> x^

Example:  F: y = log(x),  F^{-1}: x = exp(y)

We will study non-uniform quantization by the PCM example next.

Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Pulse Code Modulation

Basic idea: assign a smaller quantization step size to small-amplitude regions and a larger quantization step size to large-amplitude regions.

Two types of nonlinear compressing functions:
  Mu-law: adopted by North American telecommunications systems
  A-law: adopted by European telecommunications systems

Mu-Law (µ-law)

[Figure: µ-law compressing function y = F(x)]

Mu-Law Examples

[Figure: µ-law companding curves y for several values of µ]

A-Law

[Figure: A-law compressing function y = F(x)]

A-Law Examples

[Figure: A-law companding curves y for several values of A]

Comparison

[Figure: µ-law and A-law compressing curves compared]

PCM Speech

Mu-law (A-law) compresses the signal to 8 bits/sample, or 64 Kbits/second (without the compander, we would need 12 bits/sample).
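
A sketch of µ-law companding followed by 8-bit uniform quantization (the standard µ-law formula with µ = 255 used in North American systems is assumed here; it is not written out on these slides):

  % Mu-law companding + 8-bit uniform quantization (sketch)
  mu = 255;
  x  = 2*rand(1, 10) - 1;                                    % samples normalized to [-1, 1]
  y  = sign(x) .* log(1 + mu*abs(x)) / log(1 + mu);          % compress (F)
  Delta = 2 / 2^8;                                           % 8-bit uniform quantizer on [-1, 1]
  y_hat = Delta * (floor(y/Delta) + 0.5);
  x_hat = sign(y_hat) .* ((1 + mu).^abs(y_hat) - 1) / mu;    % expand (F^{-1})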

A Look Inside WAV Format

MATLAB function: [x, fs] = wavread(filename)

Change the Gear

Strictly speaking, PCM is merely digitization of speech signals; it involves no coding (compression) at all.
By speech coding, we refer to representing speech signals at a bit rate of < 64 Kbps.
To understand how speech coding techniques work, I will cover some basics of data compression.

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

Shannon's Picture on Communication (1948)

  source --> source encoder --> channel encoder --> channel --> channel decoder --> source decoder --> destination

The goal of communication is to move information from here to there and from now to then.

Examples of source: human speeches, photos, text messages, computer programs
Examples of channel: storage media, telephone lines, wireless networks

Information

What do we mean by information?
  "A numerical measure of the uncertainty of an experimental outcome." (Webster Dictionary)

How to quantitatively measure and represent information?
  Shannon proposes a probabilistic approach.

How to achieve the goal of compression?
  Represent different events by codewords with varying code lengths.

Information = Uncertainty

Zero information
  WVU lost to FSU in the Gator Bowl 2005 (past news, no uncertainty)
  Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)

Little information
  It is very cold in Chicago in wintertime (not much uncertainty since it is known to most people)
  Dozens of hurricanes form in the Atlantic Ocean every year (not much uncertainty since it is pretty much predictable)

Large information
  Hurricane xxx is going to hit Houston (since Katrina, we all know how difficult it is to predict the trajectory of hurricanes)
  There will be an earthquake in LA around Xmas (are you sure? an unlikely event)

Quantifying Uncertainty of an Event

Self-information:

  I(p) = -log2(p)

where p is the probability of the event x (e.g., x can be X = H or X = T).

  p = 1: the event must happen (no uncertainty), so I(p) = 0
  p --> 0: the event is unlikely to happen, and I(p) --> infinity (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty associated with event x.

Discrete Source

A discrete source is characterized by a discrete random variable X.

Examples
  Coin flipping: P(X=H) = P(X=T) = 1/2
  Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
  Playing card drawing: P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4

How to quantify the uncertainty of a discrete source?

Weighted Self-information

  I_w(p) = p * I(p) = -p log2(p)

  p = 0:   I(p) = infinity,  I_w(p) = 0
  p = 1/2: I(p) = 1,         I_w(p) = 1/2
  p = 1:   I(p) = 0,         I_w(p) = 0

As p evolves from 0 to 1, the weighted self-information I_w(p) = -p log2(p) first increases and then decreases.

Question: which value of p maximizes I_w(p)?

Maximum of Weighted Self-information

  p* = 1/e,   I_w(p*) = 1/(e ln 2)
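
One way to verify this (a standard calculus step left implicit on the slide): setting the derivative of I_w(p) = -p log2(p) = -p ln(p)/ln(2) to zero gives -(ln(p) + 1)/ln(2) = 0, i.e., ln(p) = -1, so p = 1/e; substituting back, I_w(1/e) = (1/e) log2(e) = 1/(e ln 2).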

Uncertainty of a Discrete Source

A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1:

  X is a discrete random variable, x in {1, 2, ..., N}
  p_i = Prob(x = i), i = 1, 2, ..., N
  sum_{i=1}^{N} p_i = 1

To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set.

Shannon's Source Entropy Formula

  H(X) = sum_{i=1}^{N} I_w(p_i) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)
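
A minimal MATLAB sketch of this formula (the function name and test vectors are my own; save the function as source_entropy.m; the 1.75-bps value matches the 4-way random walk example below):

  function H = source_entropy(p)
  % Entropy of a discrete source with probability vector p (zero entries are skipped)
  p = p(p > 0);
  H = -sum(p .* log2(p));

  source_entropy([0.5 0.25 0.125 0.125])   % 1.75 bits/sample (4-way random walk)
  source_entropy([0.5 0.5])                % 1.00 bit/sample  (fair coin)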

Source Entropy Examples

Example 1: (binary Bernoulli source)
Flipping a coin with the probability of head being p (0 < p < 1):

  p = Prob(x = 0),  q = 1 - p = Prob(x = 1)

  H(X) = -p log2(p) - q log2(q)

Check the two extreme cases:
  As p goes to zero, H(X) goes to 0 bps: compression gains the most.
  As p goes to a half, H(X) goes to 1 bps: no compression can help.

Entropy of Binary Bernoulli Source

[Figure: the binary entropy function H(p) versus p]

Source Entropy Examples

Example 2: (4-way random walk)

  Prob(x = S) = 1/2,  Prob(x = N) = 1/4,  Prob(x = E) = Prob(x = W) = 1/8

  H(X) = -(1/2)log2(1/2) - (1/4)log2(1/4) - (1/8)log2(1/8) - (1/8)log2(1/8) = 1.75 bps

Source Entropy Examples (Cont'd)

Example 3: (source with geometric distribution)
A jar contains the same number of balls with two different colors: blue and red. Each time, a ball is randomly picked out from the jar and then put back. Consider the event that the k-th picking is the first time a red ball is seen. What is the probability of such an event?

  p = Prob(x = red) = 1/2,  1 - p = Prob(x = blue) = 1/2

  Prob(event) = Prob(blue in the first k-1 picks) * Prob(red in the k-th pick)
              = (1/2)^(k-1) * (1/2) = (1/2)^k
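
(The slide stops at the probability; as my own completing note, the self-information of outcome k is -log2((1/2)^k) = k bits, so the entropy of this geometric source is H(X) = sum_{k>=1} (1/2)^k * k = 2 bits.)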

Morse Code (1838)

  Letter:    A    B    C    D    E    F    G    H    I    J    K    L    M
  Frequency: .08  .01  .03  .04  .12  .02  .02  .06  .07  .00  .01  .04  .02

  Letter:    N    O    P    Q    R    S    T    U    V    W    X    Y    Z
  Frequency: .07  .08  .02  .00  .06  .06  .09  .03  .01  .02  .00  .02  .00

  [Each letter's dot-dash codeword appears below it on the original slide; the codes did not survive extraction.]

Average Length of Morse Codes

Not the average of the lengths of the letters:
  (2 + 4 + 4 + 3 + ...)/26 = 82/26 ≈ 3.2

We want the average a to be such that in a typical real sequence of, say, 1,000,000 letters, the number of dots and dashes should be about a * 1,000,000.

The weighted average:
  (freq of A)(length of code for A) + (freq of B)(length of code for B) + ...
  = .08*2 + .01*4 + .03*4 + .04*3 + ... ≈ 2.4

Question: is this the entropy of English texts?

Entropy Summary

Self-information of an event x is defined as I(x) = -log2 p(x) (rare events --> large information).

Entropy is defined as the weighted summation of self-information over all possible events:

  H(X) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)

for any discrete source with sum_{i=1}^{N} p_i = 1 and 0 <= p_i <= 1.

How to achieve source entropy?

  discrete source X --(P(X))--> entropy coding --> binary bit stream

Note: the above entropy coding problem is based on the simplifying assumptions that the discrete source X is memoryless and P(X) is completely known. Those assumptions often do not hold for real-world data such as speech, and we will re-check them later.

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

Variable-Length Codes (VLC)

Recall self-information:  I(p) = -log2(p)

It follows from the above formula that a small-probability event contains much information and is therefore worth many bits to represent. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events' probabilities:

  Assign a long codeword to an event with small probability.
  Assign a short codeword to an event with large probability.

4-way Random Walk Example

  symbol k   p_k      fixed-length codeword   variable-length codeword
  S          0.5      00                      0
  N          0.25     01                      10
  E          0.125    10                      110
  W          0.125    11                      111

  symbol stream:    SSNWSENNWSSSNESS
  fixed length:     00000111001001011100000001100000   (32 bits)
  variable length:  0010111011010101110001011000       (28 bits)

4 bits of savings achieved by VLC (redundancy eliminated)
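
A small MATLAB check of this example (my own sketch; it simply concatenates the codewords from the table above):

  % Encode the symbol stream with the variable-length code and count bits
  alphabet = 'SNEW';
  vlc      = {'0', '10', '110', '111'};        % variable-length codewords
  stream   = 'SSNWSENNWSSSNESS';               % 16 symbols
  bits = '';
  for k = 1:length(stream)
      bits = [bits vlc{stream(k) == alphabet}]; %#ok<AGROW> % append this symbol's codeword
  end
  fprintf('VLC: %d bits, fixed-length: %d bits\n', length(bits), 2*length(stream));  % 28 vs. 32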

Toy Example (Cont'd)

source entropy:
  H(X) = -sum_{k=1}^{4} p_k log2(p_k)
       = 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3
       = 1.75 bits/symbol

average code length (bps):
  l = (total number of bits Nb) / (total number of symbols Ns)

  fixed length:    l = 2 bps > H(X)
  variable length: l = 1.75 bps = H(X)

Problems with VLC

When codewords have fixed lengths, the boundary of codewords is always identifiable. For codewords with variable lengths, their boundary could become ambiguous.

  symbol   VLC
  S        0
  N        1
  E        10
  W        11

  encode:  SSNWSE --> 00111010
  decode:  00111010 --> SSWNSE  or  SSNWSE  (ambiguous)

Uniquely Decodable Codes

To avoid the ambiguity in decoding, we need to enforce certain conditions with VLC to make them uniquely decodable.

Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition:

  No codeword is allowed to be the prefix of any other codeword.

Example: p, pr, pre, pref, prefi, prefix (each word is the prefix of the next).

We will graphically illustrate this condition with the aid of a binary codeword tree.

Binary Codeword Tree

  root
  Level 1: nodes 1, 0                (2 codewords)
  Level 2: nodes 11, 10, 01, 00      (2^2 codewords)
  ...
  Level k:                           (2^k codewords)

Prefix Condition Examples

  symbol x   codeword 1   codeword 2
  S          0            0
  N          1            10
  E          10           110
  W          11           111

  [Figure: codeword set 1 places 0, 1, 10, 11 on the tree; node 1 is the prefix of 10 and 11, so the prefix condition is violated. Codeword set 2 places 0, 10, 110, 111 on the tree and satisfies the prefix condition.]

How to satisfy the prefix condition?

Basic rule: if a node is used as a codeword, then all of its descendants cannot be used as codewords.

  Example: [Figure: tree with codewords 0, 10, 110, 111; the internal nodes 1 and 11 are not used as codewords]

VLC Summary

Rule #1: short codeword for a large-probability event, long codeword for a small-probability event.
Rule #2: no codeword can be the prefix of any other codeword.

Question: given P(X), how to systematically assign the codewords that always satisfy these two rules?

Answer: Huffman coding, arithmetic coding (entropy coding).

Huffman Codes (Huffman 1952)

Coding procedure for an N-symbol source:

Source reduction
  List all probabilities in a descending order.
  Merge the two symbols with the smallest probabilities into a new compound symbol.
  Repeat the above two steps for N-2 steps.

Codeword assignment
  Start from the smallest source and work backward to the original source.
  Each merging point corresponds to a node in the binary codeword tree.
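
A compact MATLAB sketch of this procedure (my own illustration, not part of the slides; save it as huffman_codes.m): each iteration merges the two least probable nodes and prepends one bit to every symbol inside each merged node.

  function codes = huffman_codes(p)
  % Huffman codewords for a probability vector p; returns a cell array of '0'/'1' strings.
  n = numel(p);
  codes  = repmat({''}, 1, n);    % codeword built up for each original symbol
  groups = num2cell(1:n);         % each tree node starts as a single symbol
  prob   = p(:)';                 % probabilities of the current nodes
  while numel(prob) > 1
      [~, idx] = sort(prob, 'ascend');                      % two least probable nodes
      i1 = idx(1); i2 = idx(2);
      for s = groups{i1}, codes{s} = ['0' codes{s}]; end    % one branch gets a 0
      for s = groups{i2}, codes{s} = ['1' codes{s}]; end    % the other branch gets a 1
      groups{i1} = [groups{i1}, groups{i2}];                % merge into a compound symbol
      prob(i1)   = prob(i1) + prob(i2);
      groups(i2) = []; prob(i2) = [];
  end

For instance, with the vowel probabilities of Example II below:

  p = [0.4 0.2 0.2 0.1 0.1];                  % e, a, i, o, u
  c = huffman_codes(p);                       % one valid (non-unique) assignment
  sum(p .* cellfun(@numel, c))                % average length = 2.2 bits/symbol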

Example I
Step 1: Source reduction

  symbol x   p(x)     round 1         round 2
  S          0.5      0.5             0.5
  N          0.25     0.25            0.5 (NEW)
  E          0.125    0.25 (EW)
  W          0.125

  compound symbols: (EW), (NEW)

Example I (Cont'd)
Step 2: Codeword assignment

  symbol x   p(x)     codeword
  S          0.5      0
  N          0.25     10
  E          0.125    110
  W          0.125    111

  [Figure: working backward through the reduced sources, each merging point (S vs. NEW, N vs. EW, E vs. W) is assigned a 0/1 pair of branches]

Example I (Cont'd)

  [Figure: two equivalent codeword trees; one assigns S = 0, N = 10, E = 110, W = 111, the other assigns S = 1, N = 01, E = 001, W = 000]

The codeword assignment is not unique. In fact, at each merging point (node) we can arbitrarily assign 0 and 1 to the two branches (the average code length is the same).

Example II
Step 1: Source reduction

  symbol x   p(x)    round 1        round 2       round 3
  e          0.4     0.4            0.4           0.6 (aiou)
  a          0.2     0.2            0.2           0.4
  i          0.2     0.2            0.4 (iou)
  o          0.1     0.2 (ou)
  u          0.1

  compound symbols: (ou), (iou), (aiou)

Example II (Cont'd)
Step 2: Codeword assignment

  symbol x   p(x)    codeword
  e          0.4     1
  a          0.2     01
  i          0.2     000
  o          0.1     0010
  u          0.1     0011

Example II (Cont'd)

  [Figure: binary codeword tree representation: the root splits into e (1) and (aiou) (0); (aiou) splits into a (01) and (iou) (00); (iou) splits into i (000) and (ou) (001); (ou) splits into o (0010) and u (0011)]

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

What is Source Modeling

Entropy coding alone:

  discrete source X --(P(X))--> entropy coding --> binary bit stream

With a modeling process:

  discrete source X --> Modeling Process --> Y --> entropy coding --> binary bit stream
                                             (P(Y) obtained by probability estimation)

Examples of Modeling Process

Run-length coding
  Count the run-lengths of identical symbols (suitable for binary/graphic images).

Dictionary-based coding
  Record repeating patterns in a dictionary updated on the fly (e.g., the Lempel-Ziv algorithm in WinZip).

Transform coding
  Apply a linear transform to a block of symbols (e.g., the discrete cosine transform in JPEG).

Predictive coding
  Apply linear prediction to the sequence of symbols (e.g., DPCM and ADPCM in wired transmission of speech).

Predictive Coding

  discrete source X --> Linear Prediction --> Y --> entropy coding --> binary bit stream
                                              (P(Y) obtained by probability estimation)

The prediction residue sequence Y usually contains less uncertainty (entropy) than the original sequence X.
WHY? Because the redundancy is assimilated into the LP model.

Two Extreme Cases

Tossing a fair coin: H, H, T, H, T, H, T, T, T, H, T, T, ...
  P(X=H) = P(X=T) = 1/2 (maximum uncertainty)
  No prediction can help (we have to spend 1 bit/sample).

Tossing a coin with two identical sides ("Head or Tail?"): H H H H ... or T T T T ... (pure duplication)
  P(X=H) = 1, P(X=T) = 0 (minimum uncertainty)
  Prediction is always right (1 bit is enough to code all).

Differential PCM

Basic idea
  Since speech signals are slowly varying, it is possible to eliminate the temporal redundancy by prediction.
  Instead of transmitting the original speech, we code the prediction residues instead, which typically have smaller energy.

Linear prediction
  Fixed: the same predictor is used again and again.
  Adaptive: the predictor is adjusted on the fly.

First-order Prediction

Encoding: x1 x2 ... xN --> e1 e2 ... eN
  e1 = x1
  en = xn - x(n-1),  n = 2, ..., N

Decoding: e1 e2 ... eN --> x1 x2 ... xN
  x1 = e1
  xn = en + x(n-1),  n = 2, ..., N

  [Figure: the encoder subtracts the delayed sample x(n-1) (delay element D) from xn to form en; the decoder adds en back to x(n-1) (the DPCM loop)]
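
A minimal MATLAB sketch of this first-order predictor without any quantization (the test samples are taken from the numerical example a few slides below):

  % First-order prediction: encode then decode losslessly
  x = [90 92 91 93 93 95];            % example samples
  e = [x(1), diff(x)];                % e1 = x1, en = xn - x(n-1)
  x_dec = cumsum(e);                  % x1 = e1, xn = en + x(n-1)
  isequal(x_dec, x)                   % true: prediction alone is invertible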

Prediction Meets Quantization

Open loop
  Prediction is based on unquantized samples.
  Since the decoder only has access to quantized samples, we run into a so-called drifting problem, which is really bad for compression.

Closed loop
  Prediction is based on quantized samples.
  It still suffers from error propagation (i.e., quantization errors of the past will affect the efficiency of prediction), but no drifting is involved.

Open-loop DPCM

  [Figure: the encoder forms en = xn - x(n-1) from unquantized samples and quantizes en to e^n outside the loop; the decoder reconstructs x^n from e^n and x^(n-1)]

Notes: prediction is based on the past unquantized sample; quantization is located outside the DPCM loop.

Closed-loop DPCM

  [Figure: the encoder forms en = xn - x^(n-1), quantizes it to e^n inside the loop, and locally reconstructs x^n = e^n + x^(n-1); the decoder performs the same reconstruction]

  xn, en: unquantized samples and prediction residues
  x^n, e^n: decoded samples and quantized prediction residues

Notes: prediction is based on the past decoded sample; quantization is located inside the DPCM loop.

Numerical Example

Quantizer: Q(x) = 3 * round(x/3)  (round to the nearest multiple of 3)

  xn  (input samples):           90    92    91    93    93    95
  en  (prediction residues):     90     2    -2     3     0     2
  e^n (quantized residues):      90     3    -3     3     0     3
  x^n (reconstructed samples):   90    93    90    93    93    96

(The first sample is coded directly; every later residue is computed against the previously reconstructed sample x^(n-1).)
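
A sketch of the closed-loop encoder that reproduces the table above (quantizer and samples as in the example; the variable names are mine):

  % Closed-loop DPCM with Q(x) = 3*round(x/3)
  x = [90 92 91 93 93 95];
  Q = @(v) 3*round(v/3);
  N = numel(x);
  e_q   = zeros(1, N);              % quantized residues (what gets transmitted)
  x_hat = zeros(1, N);              % locally decoded samples
  prev  = 0;                        % predictor state (first sample coded directly)
  for n = 1:N
      e        = x(n) - prev;       % residue against the decoded past
      e_q(n)   = Q(e);              % quantize inside the loop
      x_hat(n) = prev + e_q(n);     % local reconstruction = decoder output
      prev     = x_hat(n);
  end
  e_q                               % 90  3 -3  3  0  3
  x_hat                             % 90 93 90 93 93 96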

Closed-loop DPCM Analysis

  Encoder (A):  en = xn - x^(n-1)
  Decoder (B):  x^n = e^n + x^(n-1)

Combining the two relations gives

  xn - x^n = en - e^n

i.e., the distortion introduced to the prediction residue en is identical to that introduced to the original sample xn.

High-order Linear Prediction

original samples: x1 x2 ... x(n-1) xn x(n+1) ... xN

Encoding: x1 x2 ... xN --> e1 e2 ... eN
  initialize: e1 = x1, e2 = x2, ..., ek = xk
  prediction: en = xn - sum_{i=1}^{k} a_i x(n-i),  n = k+1, ..., N

Decoding: e1 e2 ... eN --> x1 x2 ... xN
  initialize: x1 = e1, x2 = e2, ..., xk = ek
  prediction: xn = en + sum_{i=1}^{k} a_i x(n-i),  n = k+1, ..., N

The key question is: how to select the prediction coefficients?

Recall: LP Analysis of Speech

Minimize the prediction MSE:

  MSE = sum_n e^2(n) = sum_n [ x(n) - sum_{k=1}^{K} a_k x(n-k) ]^2

Setting the derivatives with respect to a_1, ..., a_K to zero gives the normal equations:

  [ Rn(0)    Rn(1)   ...  Rn(K-1) ] [ a1 ]   [ Rn(1) ]
  [ Rn(1)    Rn(0)   ...  Rn(K-2) ] [ a2 ] = [ Rn(2) ]
  [  ...      ...    ...    ...   ] [ .. ]   [  ...  ]
  [ Rn(K-1)  Rn(K-2) ...  Rn(0)   ] [ aK ]   [ Rn(K) ]

Note that in fixed prediction, the autocorrelation is calculated over the whole segment of speech (NOT short-time features).
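
A MATLAB sketch of solving these normal equations for a fixed predictor (the AR test signal and the order K = 2 are my own assumptions); the last line also evaluates the prediction gain defined a few slides below:

  % Solve R*a = r for the LP coefficients of a fixed predictor
  K = 2;
  x = filter(1, [1 -0.9 0.5], randn(1, 1000));   % assumed AR(2) test signal
  N = length(x);
  R = zeros(1, K+1);
  for k = 0:K                                    % autocorrelation over the whole segment
      R(k+1) = sum(x(1:N-k) .* x(1+k:N));
  end
  a = toeplitz(R(1:K)) \ R(2:K+1)';              % prediction coefficients a1..aK
  e = filter([1, -a'], 1, x);                    % prediction residue sequence
  Gp = 10*log10(var(x)/var(e))                   % prediction gain in dB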

Quantized Prediction Residues

Further entropy coding is possible:
  Variable-length codes, e.g., Huffman codes, Golomb codes
  Arithmetic coding

However, the current practice of speech coding assigns fixed-length codewords to the quantized prediction residues:
  The assigned code lengths are already nearly optimal for achieving the first-order entropy.
  Good for robustness to channel errors.

DPCM can achieve a compression factor of two (i.e., 32 kbps) without noticeable speech quality degradation.

Adaptive DPCM

Adaptation

  en = xn - sum_{i=1}^{K} a_i x(n-i),  n = K+1, ..., N

with the coefficients obtained from the normal equations

  [ Rn(0)    Rn(1)   ...  Rn(K-1) ] [ a1 ]   [ Rn(1) ]
  [ Rn(1)    Rn(0)   ...  Rn(K-2) ] [ a2 ] = [ Rn(2) ]
  [  ...      ...    ...    ...   ] [ .. ]   [  ...  ]
  [ Rn(K-1)  Rn(K-2) ...  Rn(0)   ] [ aK ]   [ Rn(K) ]

To track the slowly varying property of speech signals, the estimated short-time autocorrelation is updated on the fly.

Prediction Gain

  Gp = 10 log10( sigma_x^2 / sigma_e^2 )   (dB)

Forward and Backward Adaptation

Forward adaptation
  The autocorrelation is estimated from the current frame, and the quantized prediction coefficients are transmitted to the decoder as side information (we will discuss how to quantize such a vector in the discussion of CELP coding).

Backward adaptation
  The autocorrelation is estimated from the causal past, and therefore no overhead needs to be transmitted; the decoder duplicates the encoder's operation.

Illustration of Forward Adaptation

  [Figure: frames F1 ... F5; for each frame, a set of LPCs is transmitted]

Illustration of Backward Adaptation

  [Figure: for each frame, the LPCs are learned from the past frame]

Comparison

Forward adaptive prediction
  Asymmetric complexity allocation (encoder > decoder)
  Non-negligible overhead
  Robust to errors
  More suitable for low bit rate coding

Backward adaptive prediction
  Symmetric complexity allocation (encoder = decoder)
  No overhead
  Sensitive to errors
  More suitable for high bit rate coding

AR vs. MA

Autoregressive (AR) model
  Essentially an IIR filter (all poles)

Moving average (MA) model
  Essentially an FIR filter (all zeros)

Autoregressive Moving Average (ARMA) model

ITU-T G.726 Adaptive Differential Pulse Code Modulation (ADPCM)

  [Figure: block diagrams of the G.726 encoder and decoder]

Waveform Coding Demo

  http://wwwlns.tf.unikiel.de/demo/demo_speech.htm

Data Compression Paradigm

  discrete source X --> Modeling Process --> Y --> entropy coding --> binary bit stream
                                             (P(Y) obtained by probability estimation)

Question: why do we call DPCM/ADPCM waveform coding?
Answer: because Y (the prediction residues) is also a kind of waveform, just like the original speech X; they have the same dimensionality.

Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Introduction to Model-based Coding

  signal space {x1, ..., xN} in R^N  --Analysis-->  model space {alpha_1, ..., alpha_K} in R^K  --Synthesis-->  signal space

  K << N

Toy Example

Model-based coding of sinusoid signals:

  x(n) = sin(2*pi*f*n + phi),  n = 1, 2, ..., N

  input signal {x1, ..., xN}  --analysis-->  alpha_1 = f, alpha_2 = phi  --synthesis-->  reconstructed signal

Question (test your understanding of entropy):
If a source is produced by the following model: x(n) = sin(50n + p), n = 1, 2, ..., 100, where p is a random phase with equal probabilities of being 0, 45, 90, 135, 180, 225, 270, 315 degrees, what is the entropy of this source?

Building Models for Human Speech

Waveform coders can also be viewed as a kind of model where K = N (not much compression).
Usually, model-based speech coders target a very high compression ratio (K << N).
There is no free lunch: a high compression ratio comes along with severe quality degradation.
We will see how to achieve a better tradeoff in Part II (CELP coder).

Channel Vocoder: Encoding

  [Figure: the input speech passes through a bank of bandpass filters; each band is rectified, lowpass filtered, and A/D converted at 3-4 bits per channel; a voicing detector (1 bit) and a pitch detector (6 bits) run in parallel; all outputs feed the encoder]

Frame size: 20 ms
Bit rate = 2400 - 3200 bps
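
As a rough sanity check of these numbers (the channel count is my assumption; the slide does not give it): with, say, 16 bands at 3 bits each, plus 6 bits for the pitch period and 1 bit for the voicing decision, a 20 ms frame carries 16*3 + 6 + 1 = 55 bits, i.e., 55 / 0.02 s = 2750 bps, inside the quoted 2400-3200 bps range.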

Channel Vocoder: Decoding

  [Figure: the decoder selects an excitation source according to the voicing information (a pulse generator driven by the pitch period for voiced frames, a noise generator for unvoiced frames); the excitation is passed through the bank of bandpass filters, scaled by the per-channel gains (D/A converted), and summed to form the output speech]

Analysis-by-Synthesis

Motivation
  The optimality of some parameters is easy to determine (e.g., pitch), but not that of others (e.g., gain parameters across different bands in the channel vocoder).
  The interaction among parameters is difficult to analyze but important to the synthesis part.

What is AbyS?
  Do the complete analysis and synthesis in encoding.
  The decoder is embedded in the encoder for the purpose of optimizing the extracted parameters.

AbyS is a Closed Loop

  [Figure: the input speech x = [x1, ..., xN] enters the Analysis block, which produces the parameters alpha = [alpha_1, ..., alpha_K]; the Synthesis block reconstructs a signal from alpha; the error e = [e1, ..., eN] between input and reconstruction is minimized (MMSE) around the loop]

Toy Example Revisited

  input signal {x1, ..., xN}  --analysis-->  alpha_1 = f, alpha_2 = phi  --synthesis-->  reconstructed signal

function dist = MSE_AbyS(x, f, phi)
% Synthesis error for candidate parameters (f, phi) of the sinusoid model
n = 1:length(x);
x_rec = sin(2*pi*f*n + phi);     % synthesize with the candidate parameters
e = x - x_rec;                   % reconstruction error
dist = sum(e.*e);                % squared error to be minimized

MATLAB provides various tools for solving optimization problems:
>> help fminsearch
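
For example, the closed-loop search could be run with fminsearch (a hypothetical usage sketch; the test signal, true parameters, and initial guess are my own, and a reasonable initial guess matters because the error surface over f has local minima):

  % Hypothetical AbyS search for the sinusoid parameters
  f0 = 0.01; phi0 = pi/4;                      % assumed "true" parameters
  n  = 1:200;
  x  = sin(2*pi*f0*n + phi0);                  % observed signal
  cost = @(theta) MSE_AbyS(x, theta(1), theta(2));
  theta_hat = fminsearch(cost, [0.012, 0])     % estimates close to [f0, phi0]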

Harmonic Models

For speech within a frame:

  s(n) = sum_j A_j cos(omega_j n + phi_j)

For voiced signals
  The phase is controlled by the pitch period.
  The pitch is often modeled by a slowly varying quadratic model.

For unvoiced signals
  Random phase.

Less accurate for transition signals (e.g., plosives, voice onsets, etc.)
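
A minimal synthesis sketch of this model (the sampling rate, pitch, and harmonic amplitudes are my own assumptions, chosen only to illustrate the formula):

  % Harmonic synthesis of one 20 ms frame (assumed parameters)
  fs = 8000; f0 = 120; N = 160;               % 8 kHz sampling, 120 Hz pitch, 20 ms frame
  J  = floor((fs/2)/f0);                      % harmonics below the Nyquist frequency
  A  = 1./(1:J);                              % assumed decaying harmonic amplitudes
  phi = zeros(1, J);                          % fixed phases (random for unvoiced frames)
  n = 0:N-1; s = zeros(1, N);
  for j = 1:J
      s = s + A(j)*cos(2*pi*j*f0/fs*n + phi(j));
  end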

AbyS Harmonic Vocoder

  [Figure: block diagram of the analysis-by-synthesis harmonic vocoder]

Bit Allocation

  [Table: bit allocation among the model parameters]
  Frame size = 20 ms, bit rate = 4 Kbps (i.e., 80 bits per frame)

Towards the Fundamental Limit

How much information can one convey in one minute?
  It depends on how fast one speaks.
  It depends on which language one speaks.
  It surely also depends on the speech content.

Model-based speech coders
  Can compress speech down to 300-500 bits/second, but you cannot tell who speaks it; there is no intonation, stress, or gender difference.
  A theoretically optimal approach: speech recognition + speaker recognition + speaker-dependent speech synthesis.
