
Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Origin of Speech Coding

"Watson, if I can get a mechanism which will make a current of electricity vary its intensity as the air varies in density when sound is passing through it, I can telegraph any sound, even the sound of speech."
  A. G. Bell, 1875 (analog communication)

Entropy formula of a discrete source:

  H(X) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)

  C. E. Shannon, 1948 (digital communication)

Digitization of Speech Signals

  continuous-time speech signal x_c(t) --> Sampler --> x(n) = x_c(nT) --> Quantizer --> quantized samples

x(n): discrete sequence of speech samples

Sampling

Sampling Theorem: when sampling a signal (e.g., converting from an analog signal to digital), the sampling frequency must be greater than twice the bandwidth of the input signal in order to be able to reconstruct the original perfectly from the sampled version.

Sampling frequency: > 8K samples/second (human speech is roughly band-limited at 4 KHz).

Quantization

In physics: to limit the possible values of a magnitude or quantity to a discrete set of values by quantum mechanical rules.

In speech coding: to limit the possible values of a speech sample or prediction residue to a discrete set of values by information theoretic rules (tradeoff between Rate and Distortion).

Quantization Examples

Examples
  Continuous-to-discrete: round your tax return to integers.
  Discrete-to-discrete: "the mileage of my car is about 60K."

Play with bits
  Precision is finite: the more precise, the more bits you need (to resolve the uncertainty).
  Keep a card in secret and ask your partner to guess. He/she can only ask Yes/No questions: "Is it bigger than 7?" "Is it less than 4?" ...

However, not every bit has the same impact
  A quart of milk, two gallons of gas, normal temperature is 98.6 F, my height is 5 foot 9 inches.
  How much did you pay for your car? ("two thousand" vs. $2016.78)

Scalar vs. Vector Quantization

Scalar: for a given sequence of speech samples, we will process (quantize) each sample independently.
  Input: N samples --> output: N codewords

Vector: we will process (quantize) a block of speech samples each time.
  Input: N samples --> output: N/d codewords (block size is d)

SQ is a special case of VQ (d = 1).

Scalar Quantization

In SQ, quantizing N samples is not fundamentally different from quantizing one sample (since they are processed independently).

  original value x --f--> quantization index s in S --f^{-1}--> quantized value x^

  Quantizer: x --> x^

A quantizer is defined by a codebook (collection of codewords) and a mapping function (straightforward in the case of SQ).

Rate-Distortion Tradeoff

Rate: how many codewords (bits) are used?
  Example: 16-bit audio vs. 8-bit PCM speech

Distortion: how much distortion is introduced?
  Example: mean absolute difference (L1), mean square error (L2)

[Figure: rate-distortion curves of SQ and VQ, distortion vs. rate (bps)]

Question: which quantizer is better?

Uniform Quantization

A scalar quantizer is called a uniform quantizer (UQ) if all its codewords are uniformly distributed (equally distanced).

Example (quantization stepsize = 16): codewords at ..., 24, 40, ..., 248, ...

Uniform distribution, denoted by U[-A, A]:

  f(x) = 1/(2A)  for x in [-A, A]
       = 0       else

[Figure: pdf f(x) with constant height 1/(2A) over [-A, A]]
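
A minimal MATLAB sketch of such a quantizer (the step size 16 comes from the example above and the mid-rise reconstruction levels ..., 24, 40, ... match it; the test samples are my own assumption):

  % Minimal uniform (mid-rise) quantizer sketch
  Delta = 16;                               % quantization step size
  x     = 100*randn(1, 5);                  % assumed test samples
  s     = floor(x / Delta);                 % quantization index
  x_hat = Delta * (s + 0.5);                % codewords ..., -24, -8, 8, 24, 40, ...
  max(abs(x - x_hat))                       % quantization error is at most Delta/2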

6dB/Bit Rule

[Figure: pdf f(e) of the quantization error, uniform with height 1/Delta over (-Delta/2, Delta/2)]

Note: the quantization noise of UQ on a uniform distribution is also uniformly distributed.
For a uniform source, adding one bit/sample can reduce the MSE, i.e., increase the SNR, by 6 dB.
(The derivation of this 6dB/bit rule will be given in the class.)
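
A quick numerical sanity check of the rule (my own sketch, assuming a uniform source on [-1, 1] and mid-rise uniform quantizers with R and R+1 bits):

  % Empirical check of the 6dB/bit rule on a uniform source
  A = 1; x = A*(2*rand(1, 1e5) - 1);          % uniform source on [-A, A]
  R = 6; snr = zeros(1, 2);
  for k = 1:2
      Delta = 2*A / 2^(R + k - 1);            % step size for R and R+1 bits
      x_hat = Delta*(floor(x/Delta) + 0.5);   % mid-rise uniform quantizer
      snr(k) = 10*log10(mean(x.^2) / mean((x - x_hat).^2));
  end
  snr(2) - snr(1)                             % close to 6 dB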

Non-uniform Quantization

Motivation
  Speech signals have the characteristic that small-amplitude samples occur more frequently than large-amplitude ones.
  The human auditory system exhibits a logarithmic sensitivity:
    more sensitive in the small-amplitude range (e.g., 0 might sound different from 0.1);
    less sensitive in the large-amplitude range (e.g., 0.7 might not sound much different from 0.8).

[Figure: histogram of typical speech signals]

From Uniform to Non-uniform

F: nonlinear compressing function
F^{-1}: nonlinear expanding function
F and F^{-1}: nonlinear compander

  x --F--> y --UQ--> y^ --F^{-1}--> x^

Example:  F: y = log(x),  F^{-1}: x = exp(y)

We will study non-uniform quantization by the PCM example next.

Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Pulse Code Modulation

Basic idea: assign a smaller quantization step size to small-amplitude regions and a larger quantization step size to large-amplitude regions.

Two types of nonlinear compressing functions:
  Mu-law: adopted by North American telecommunications systems
  A-law: adopted by European telecommunications systems

Mu-Law (µ-law)

[Figure: µ-law compressing function y = F(x)]

Mu-Law Examples

[Figure: µ-law companding curves y for several values of µ]

A-Law

[Figure: A-law compressing function y = F(x)]

A-Law Examples

[Figure: A-law companding curves y for several values of A]

Comparison

[Figure: µ-law and A-law compressing curves compared]

PCM Speech

Mu-law (A-law) compresses the signal to 8 bits/sample, or 64 Kbits/second (without the compander, we would need 12 bits/sample).
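
A sketch of µ-law companding followed by 8-bit uniform quantization (the standard µ-law formula with µ = 255 used in North American systems is assumed here; it is not written out on these slides):

  % Mu-law companding + 8-bit uniform quantization (sketch)
  mu = 255;
  x  = 2*rand(1, 10) - 1;                                    % samples normalized to [-1, 1]
  y  = sign(x) .* log(1 + mu*abs(x)) / log(1 + mu);          % compress (F)
  Delta = 2 / 2^8;                                           % 8-bit uniform quantizer on [-1, 1]
  y_hat = Delta * (floor(y/Delta) + 0.5);
  x_hat = sign(y_hat) .* ((1 + mu).^abs(y_hat) - 1) / mu;    % expand (F^{-1})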

A Look Inside WAV Format

MATLAB function: [x, fs] = wavread(filename)

Change the Gear

Strictly speaking, PCM is merely digitization of speech signals; it involves no coding (compression) at all.
By speech coding, we refer to representing speech signals at a bit rate of < 64 Kbps.
To understand how speech coding techniques work, I will cover some basics of data compression.

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

Shannon's Picture on Communication (1948)

  source --> source encoder --> channel encoder --> channel --> channel decoder --> source decoder --> destination

The goal of communication is to move information from here to there and from now to then.

Examples of source: human speeches, photos, text messages, computer programs
Examples of channel: storage media, telephone lines, wireless networks

Information

What do we mean by information?
  "A numerical measure of the uncertainty of an experimental outcome." (Webster Dictionary)

How to quantitatively measure and represent information?
  Shannon proposes a probabilistic approach.

How to achieve the goal of compression?
  Represent different events by codewords with varying code lengths.

Information = Uncertainty

Zero information
  WVU lost to FSU in the Gator Bowl 2005 (past news, no uncertainty)
  Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)

Little information
  It is very cold in Chicago in wintertime (not much uncertainty since it is known to most people)
  Dozens of hurricanes form in the Atlantic Ocean every year (not much uncertainty since it is pretty much predictable)

Large information
  Hurricane xxx is going to hit Houston (since Katrina, we all know how difficult it is to predict the trajectory of hurricanes)
  There will be an earthquake in LA around Xmas (are you sure? an unlikely event)

Quantifying Uncertainty of an Event

Self-information:

  I(p) = -log2(p)

where p is the probability of the event x (e.g., x can be X = H or X = T).

  p = 1: the event must happen (no uncertainty), so I(p) = 0
  p --> 0: the event is unlikely to happen, and I(p) --> infinity (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty associated with event x.

Discrete Source

A discrete source is characterized by a discrete random variable X.

Examples
  Coin flipping: P(X=H) = P(X=T) = 1/2
  Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
  Playing card drawing: P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4

How to quantify the uncertainty of a discrete source?

Weighted Self-information

  I_w(p) = p * I(p) = -p log2(p)

  p = 0:   I(p) = infinity,  I_w(p) = 0
  p = 1/2: I(p) = 1,         I_w(p) = 1/2
  p = 1:   I(p) = 0,         I_w(p) = 0

As p evolves from 0 to 1, the weighted self-information I_w(p) = -p log2(p) first increases and then decreases.

Question: which value of p maximizes I_w(p)?

Maximum of Weighted Self-information

  p* = 1/e,   I_w(p*) = 1/(e ln 2)
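
One way to verify this (a standard calculus step left implicit on the slide): setting the derivative of I_w(p) = -p log2(p) = -p ln(p)/ln(2) to zero gives -(ln(p) + 1)/ln(2) = 0, i.e., ln(p) = -1, so p = 1/e; substituting back, I_w(1/e) = (1/e) log2(e) = 1/(e ln 2).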

Uncertainty of a Discrete Source

A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1:

  X is a discrete random variable, x in {1, 2, ..., N}
  p_i = Prob(x = i), i = 1, 2, ..., N
  sum_{i=1}^{N} p_i = 1

To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set.

Shannon's Source Entropy Formula

  H(X) = sum_{i=1}^{N} I_w(p_i) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)
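
A minimal MATLAB sketch of this formula (the function name and test vectors are my own; save the function as source_entropy.m; the 1.75-bps value matches the 4-way random walk example below):

  function H = source_entropy(p)
  % Entropy of a discrete source with probability vector p (zero entries are skipped)
  p = p(p > 0);
  H = -sum(p .* log2(p));

  source_entropy([0.5 0.25 0.125 0.125])   % 1.75 bits/sample (4-way random walk)
  source_entropy([0.5 0.5])                % 1.00 bit/sample  (fair coin)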

Source Entropy Examples

Example 1: (binary Bernoulli source)
Flipping a coin with the probability of head being p (0 < p < 1):

  p = Prob(x = 0),  q = 1 - p = Prob(x = 1)

  H(X) = -p log2(p) - q log2(q)

Check the two extreme cases:
  As p goes to zero, H(X) goes to 0 bps: compression gains the most.
  As p goes to a half, H(X) goes to 1 bps: no compression can help.

Entropy of Binary Bernoulli Source

[Figure: the binary entropy function H(p) versus p]

Source Entropy Examples

Example 2: (4-way random walk)

  Prob(x = S) = 1/2,  Prob(x = N) = 1/4,  Prob(x = E) = Prob(x = W) = 1/8

  H(X) = -(1/2)log2(1/2) - (1/4)log2(1/4) - (1/8)log2(1/8) - (1/8)log2(1/8) = 1.75 bps

Source Entropy Examples (Cont'd)

Example 3: (source with geometric distribution)
A jar contains the same number of balls with two different colors: blue and red. Each time, a ball is randomly picked out from the jar and then put back. Consider the event that the k-th picking is the first time a red ball is seen. What is the probability of such an event?

  p = Prob(x = red) = 1/2,  1 - p = Prob(x = blue) = 1/2

  Prob(event) = Prob(blue in the first k-1 picks) * Prob(red in the k-th pick)
              = (1/2)^(k-1) * (1/2) = (1/2)^k
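
(The slide stops at the probability; as my own completing note, the self-information of outcome k is -log2((1/2)^k) = k bits, so the entropy of this geometric source is H(X) = sum_{k>=1} (1/2)^k * k = 2 bits.)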

Morse Code (1838)

  Letter:    A    B    C    D    E    F    G    H    I    J    K    L    M
  Frequency: .08  .01  .03  .04  .12  .02  .02  .06  .07  .00  .01  .04  .02

  Letter:    N    O    P    Q    R    S    T    U    V    W    X    Y    Z
  Frequency: .07  .08  .02  .00  .06  .06  .09  .03  .01  .02  .00  .02  .00

  [Each letter's dot-dash codeword appears below it on the original slide; the codes did not survive extraction.]

Average Length of Morse Codes

Not the average of the lengths of the letters:
  (2 + 4 + 4 + 3 + ...)/26 = 82/26 ≈ 3.2

We want the average a to be such that in a typical real sequence of, say, 1,000,000 letters, the number of dots and dashes should be about a * 1,000,000.

The weighted average:
  (freq of A)(length of code for A) + (freq of B)(length of code for B) + ...
  = .08*2 + .01*4 + .03*4 + .04*3 + ... ≈ 2.4

Question: is this the entropy of English texts?

Entropy Summary

Self-information of an event x is defined as I(x) = -log2 p(x) (rare events --> large information).

Entropy is defined as the weighted summation of self-information over all possible events:

  H(X) = -sum_{i=1}^{N} p_i log2(p_i)   (bits/sample, or bps)

for any discrete source with sum_{i=1}^{N} p_i = 1 and 0 <= p_i <= 1.

How to achieve source entropy?

  discrete source X --(P(X))--> entropy coding --> binary bit stream

Note: the above entropy coding problem is based on the simplifying assumptions that the discrete source X is memoryless and P(X) is completely known. Those assumptions often do not hold for real-world data such as speech, and we will re-check them later.

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

Variable-Length Codes (VLC)

Recall self-information:  I(p) = -log2(p)

It follows from the above formula that a small-probability event contains much information and is therefore worth many bits to represent. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events' probabilities:

  Assign a long codeword to an event with small probability.
  Assign a short codeword to an event with large probability.

4-way Random Walk Example

  symbol k   p_k      fixed-length codeword   variable-length codeword
  S          0.5      00                      0
  N          0.25     01                      10
  E          0.125    10                      110
  W          0.125    11                      111

  symbol stream:    SSNWSENNWSSSNESS
  fixed length:     00000111001001011100000001100000   (32 bits)
  variable length:  0010111011010101110001011000       (28 bits)

4 bits of savings achieved by VLC (redundancy eliminated)
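
A small MATLAB check of this example (my own sketch; it simply concatenates the codewords from the table above):

  % Encode the symbol stream with the variable-length code and count bits
  alphabet = 'SNEW';
  vlc      = {'0', '10', '110', '111'};        % variable-length codewords
  stream   = 'SSNWSENNWSSSNESS';               % 16 symbols
  bits = '';
  for k = 1:length(stream)
      bits = [bits vlc{stream(k) == alphabet}]; %#ok<AGROW> % append this symbol's codeword
  end
  fprintf('VLC: %d bits, fixed-length: %d bits\n', length(bits), 2*length(stream));  % 28 vs. 32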

Toy Example (Cont'd)

source entropy:
  H(X) = -sum_{k=1}^{4} p_k log2(p_k)
       = 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3
       = 1.75 bits/symbol

average code length (bps):
  l = (total number of bits Nb) / (total number of symbols Ns)

  fixed length:    l = 2 bps > H(X)
  variable length: l = 1.75 bps = H(X)

Problems with VLC

When codewords have fixed lengths, the boundary of codewords is always identifiable. For codewords with variable lengths, their boundary could become ambiguous.

  symbol   VLC
  S        0
  N        1
  E        10
  W        11

  encode:  SSNWSE --> 00111010
  decode:  00111010 --> SSWNSE  or  SSNWSE  (ambiguous)

Uniquely Decodable Codes

To avoid the ambiguity in decoding, we need to enforce certain conditions with VLC to make them uniquely decodable.

Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition:

  No codeword is allowed to be the prefix of any other codeword.

Example: p, pr, pre, pref, prefi, prefix (each word is the prefix of the next).

We will graphically illustrate this condition with the aid of a binary codeword tree.

Binary Codeword Tree

  root
  Level 1: nodes 1, 0                (2 codewords)
  Level 2: nodes 11, 10, 01, 00      (2^2 codewords)
  ...
  Level k:                           (2^k codewords)

Prefix Condition Examples

  symbol x   codeword 1   codeword 2
  S          0            0
  N          1            10
  E          10           110
  W          11           111

  [Figure: codeword set 1 places 0, 1, 10, 11 on the tree; node 1 is the prefix of 10 and 11, so the prefix condition is violated. Codeword set 2 places 0, 10, 110, 111 on the tree and satisfies the prefix condition.]

How to satisfy the prefix condition?

Basic rule: if a node is used as a codeword, then all of its descendants cannot be used as codewords.

  Example: [Figure: tree with codewords 0, 10, 110, 111; the internal nodes 1 and 11 are not used as codewords]

VLC Summary

Rule #1: short codeword for a large-probability event, long codeword for a small-probability event.
Rule #2: no codeword can be the prefix of any other codeword.

Question: given P(X), how to systematically assign the codewords that always satisfy these two rules?

Answer: Huffman coding, arithmetic coding (entropy coding).

Huffman Codes (Huffman 1952)

Coding procedure for an N-symbol source:

Source reduction
  List all probabilities in a descending order.
  Merge the two symbols with the smallest probabilities into a new compound symbol.
  Repeat the above two steps for N-2 steps.

Codeword assignment
  Start from the smallest source and work backward to the original source.
  Each merging point corresponds to a node in the binary codeword tree.
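
A compact MATLAB sketch of this procedure (my own illustration, not part of the slides; save it as huffman_codes.m): each iteration merges the two least probable nodes and prepends one bit to every symbol inside each merged node.

  function codes = huffman_codes(p)
  % Huffman codewords for a probability vector p; returns a cell array of '0'/'1' strings.
  n = numel(p);
  codes  = repmat({''}, 1, n);    % codeword built up for each original symbol
  groups = num2cell(1:n);         % each tree node starts as a single symbol
  prob   = p(:)';                 % probabilities of the current nodes
  while numel(prob) > 1
      [~, idx] = sort(prob, 'ascend');                      % two least probable nodes
      i1 = idx(1); i2 = idx(2);
      for s = groups{i1}, codes{s} = ['0' codes{s}]; end    % one branch gets a 0
      for s = groups{i2}, codes{s} = ['1' codes{s}]; end    % the other branch gets a 1
      groups{i1} = [groups{i1}, groups{i2}];                % merge into a compound symbol
      prob(i1)   = prob(i1) + prob(i2);
      groups(i2) = []; prob(i2) = [];
  end

For instance, with the vowel probabilities of Example II below:

  p = [0.4 0.2 0.2 0.1 0.1];                  % e, a, i, o, u
  c = huffman_codes(p);                       % one valid (non-unique) assignment
  sum(p .* cellfun(@numel, c))                % average length = 2.2 bits/symbol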

Example I
Step 1: Source reduction

  symbol x   p(x)     round 1         round 2
  S          0.5      0.5             0.5
  N          0.25     0.25            0.5 (NEW)
  E          0.125    0.25 (EW)
  W          0.125

  compound symbols: (EW), (NEW)

Example I (Cont'd)
Step 2: Codeword assignment

  symbol x   p(x)     codeword
  S          0.5      0
  N          0.25     10
  E          0.125    110
  W          0.125    111

  [Figure: working backward through the reduced sources, each merging point (S vs. NEW, N vs. EW, E vs. W) is assigned a 0/1 pair of branches]

Example I (Cont'd)

  [Figure: two equivalent codeword trees; one assigns S = 0, N = 10, E = 110, W = 111, the other assigns S = 1, N = 01, E = 001, W = 000]

The codeword assignment is not unique. In fact, at each merging point (node) we can arbitrarily assign 0 and 1 to the two branches (the average code length is the same).

Example II
Step 1: Source reduction

  symbol x   p(x)    round 1        round 2       round 3
  e          0.4     0.4            0.4           0.6 (aiou)
  a          0.2     0.2            0.2           0.4
  i          0.2     0.2            0.4 (iou)
  o          0.1     0.2 (ou)
  u          0.1

  compound symbols: (ou), (iou), (aiou)

Example II (Cont'd)
Step 2: Codeword assignment

  symbol x   p(x)    codeword
  e          0.4     1
  a          0.2     01
  i          0.2     000
  o          0.1     0010
  u          0.1     0011

Example II (Cont'd)

  [Figure: binary codeword tree representation: the root splits into e (1) and (aiou) (0); (aiou) splits into a (01) and (iou) (00); (iou) splits into i (000) and (ou) (001); (ou) splits into o (0010) and u (0011)]

Data Compression Basics

Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy

Variable-length codes
  Motivation
  Prefix condition
  Huffman coding algorithm

Data compression = source modeling

What is Source Modeling

Entropy coding alone:

  discrete source X --(P(X))--> entropy coding --> binary bit stream

With a modeling process:

  discrete source X --> Modeling Process --> Y --> entropy coding --> binary bit stream
                                             (P(Y) obtained by probability estimation)

Examples of Modeling Process

Run-length coding
  Count the run-lengths of identical symbols (suitable for binary/graphic images).

Dictionary-based coding
  Record repeating patterns in a dictionary updated on the fly (e.g., the Lempel-Ziv algorithm in WinZip).

Transform coding
  Apply a linear transform to a block of symbols (e.g., the discrete cosine transform in JPEG).

Predictive coding
  Apply linear prediction to the sequence of symbols (e.g., DPCM and ADPCM in wired transmission of speech).

Predictive Coding

  discrete source X --> Linear Prediction --> Y --> entropy coding --> binary bit stream
                                              (P(Y) obtained by probability estimation)

The prediction residue sequence Y usually contains less uncertainty (entropy) than the original sequence X.
WHY? Because the redundancy is assimilated into the LP model.

Two Extreme Cases

Tossing a fair coin: H, H, T, H, T, H, T, T, T, H, T, T, ...
  P(X=H) = P(X=T) = 1/2 (maximum uncertainty)
  No prediction can help (we have to spend 1 bit/sample).

Tossing a coin with two identical sides ("Head or Tail?"): H H H H ... or T T T T ... (pure duplication)
  P(X=H) = 1, P(X=T) = 0 (minimum uncertainty)
  Prediction is always right (1 bit is enough to code all).

Differential PCM

Basic idea
  Since speech signals are slowly varying, it is possible to eliminate the temporal redundancy by prediction.
  Instead of transmitting the original speech, we code the prediction residues instead, which typically have smaller energy.

Linear prediction
  Fixed: the same predictor is used again and again.
  Adaptive: the predictor is adjusted on the fly.

First-order Prediction

Encoding: x1 x2 ... xN --> e1 e2 ... eN
  e1 = x1
  en = xn - x(n-1),  n = 2, ..., N

Decoding: e1 e2 ... eN --> x1 x2 ... xN
  x1 = e1
  xn = en + x(n-1),  n = 2, ..., N

  [Figure: the encoder subtracts the delayed sample x(n-1) (delay element D) from xn to form en; the decoder adds en back to x(n-1) (the DPCM loop)]
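
A minimal MATLAB sketch of this first-order predictor without any quantization (the test samples are taken from the numerical example a few slides below):

  % First-order prediction: encode then decode losslessly
  x = [90 92 91 93 93 95];            % example samples
  e = [x(1), diff(x)];                % e1 = x1, en = xn - x(n-1)
  x_dec = cumsum(e);                  % x1 = e1, xn = en + x(n-1)
  isequal(x_dec, x)                   % true: prediction alone is invertible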

Prediction Meets Quantization

Open loop
  Prediction is based on unquantized samples.
  Since the decoder only has access to quantized samples, we run into a so-called drifting problem, which is really bad for compression.

Closed loop
  Prediction is based on quantized samples.
  It still suffers from error propagation (i.e., quantization errors of the past will affect the efficiency of prediction), but no drifting is involved.

Open-loop DPCM

  [Figure: the encoder forms en = xn - x(n-1) from unquantized samples and quantizes en to e^n outside the loop; the decoder reconstructs x^n from e^n and x^(n-1)]

Notes: prediction is based on the past unquantized sample; quantization is located outside the DPCM loop.

Closed-loop DPCM

  [Figure: the encoder forms en = xn - x^(n-1), quantizes it to e^n inside the loop, and locally reconstructs x^n = e^n + x^(n-1); the decoder performs the same reconstruction]

  xn, en: unquantized samples and prediction residues
  x^n, e^n: decoded samples and quantized prediction residues

Notes: prediction is based on the past decoded sample; quantization is located inside the DPCM loop.

Numerical Example

Quantizer: Q(x) = 3 * round(x/3)  (round to the nearest multiple of 3)

  xn  (input samples):           90    92    91    93    93    95
  en  (prediction residues):     90     2    -2     3     0     2
  e^n (quantized residues):      90     3    -3     3     0     3
  x^n (reconstructed samples):   90    93    90    93    93    96

(The first sample is coded directly; every later residue is computed against the previously reconstructed sample x^(n-1).)
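
A sketch of the closed-loop encoder that reproduces the table above (quantizer and samples as in the example; the variable names are mine):

  % Closed-loop DPCM with Q(x) = 3*round(x/3)
  x = [90 92 91 93 93 95];
  Q = @(v) 3*round(v/3);
  N = numel(x);
  e_q   = zeros(1, N);              % quantized residues (what gets transmitted)
  x_hat = zeros(1, N);              % locally decoded samples
  prev  = 0;                        % predictor state (first sample coded directly)
  for n = 1:N
      e        = x(n) - prev;       % residue against the decoded past
      e_q(n)   = Q(e);              % quantize inside the loop
      x_hat(n) = prev + e_q(n);     % local reconstruction = decoder output
      prev     = x_hat(n);
  end
  e_q                               % 90  3 -3  3  0  3
  x_hat                             % 90 93 90 93 93 96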

Closed-loop DPCM Analysis

  Encoder (A):  en = xn - x^(n-1)
  Decoder (B):  x^n = e^n + x^(n-1)

Combining the two relations gives

  xn - x^n = en - e^n

i.e., the distortion introduced to the prediction residue en is identical to that introduced to the original sample xn.

High-order Linear Prediction

original samples: x1 x2 ... x(n-1) xn x(n+1) ... xN

Encoding: x1 x2 ... xN --> e1 e2 ... eN
  initialize: e1 = x1, e2 = x2, ..., ek = xk
  prediction: en = xn - sum_{i=1}^{k} a_i x(n-i),  n = k+1, ..., N

Decoding: e1 e2 ... eN --> x1 x2 ... xN
  initialize: x1 = e1, x2 = e2, ..., xk = ek
  prediction: xn = en + sum_{i=1}^{k} a_i x(n-i),  n = k+1, ..., N

The key question is: how to select the prediction coefficients?

Recall: LP Analysis of Speech

Minimize the prediction MSE:

  MSE = sum_n e^2(n) = sum_n [ x(n) - sum_{k=1}^{K} a_k x(n-k) ]^2

Setting the derivatives with respect to a_1, ..., a_K to zero gives the normal equations:

  [ Rn(0)    Rn(1)   ...  Rn(K-1) ] [ a1 ]   [ Rn(1) ]
  [ Rn(1)    Rn(0)   ...  Rn(K-2) ] [ a2 ] = [ Rn(2) ]
  [  ...      ...    ...    ...   ] [ .. ]   [  ...  ]
  [ Rn(K-1)  Rn(K-2) ...  Rn(0)   ] [ aK ]   [ Rn(K) ]

Note that in fixed prediction, the autocorrelation is calculated over the whole segment of speech (NOT short-time features).
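
A MATLAB sketch of solving these normal equations for a fixed predictor (the AR test signal and the order K = 2 are my own assumptions); the last line also evaluates the prediction gain defined a few slides below:

  % Solve R*a = r for the LP coefficients of a fixed predictor
  K = 2;
  x = filter(1, [1 -0.9 0.5], randn(1, 1000));   % assumed AR(2) test signal
  N = length(x);
  R = zeros(1, K+1);
  for k = 0:K                                    % autocorrelation over the whole segment
      R(k+1) = sum(x(1:N-k) .* x(1+k:N));
  end
  a = toeplitz(R(1:K)) \ R(2:K+1)';              % prediction coefficients a1..aK
  e = filter([1, -a'], 1, x);                    % prediction residue sequence
  Gp = 10*log10(var(x)/var(e))                   % prediction gain in dB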

Quantized Prediction Residues

Further entropy coding is possible:
  Variable-length codes, e.g., Huffman codes, Golomb codes
  Arithmetic coding

However, the current practice of speech coding assigns fixed-length codewords to the quantized prediction residues:
  The assigned code lengths are already nearly optimal for achieving the first-order entropy.
  Good for robustness to channel errors.

DPCM can achieve a compression factor of two (i.e., 32 kbps) without noticeable speech quality degradation.

Adaptive DPCM

Adaptation

  en = xn - sum_{i=1}^{K} a_i x(n-i),  n = K+1, ..., N

with the coefficients obtained from the normal equations

  [ Rn(0)    Rn(1)   ...  Rn(K-1) ] [ a1 ]   [ Rn(1) ]
  [ Rn(1)    Rn(0)   ...  Rn(K-2) ] [ a2 ] = [ Rn(2) ]
  [  ...      ...    ...    ...   ] [ .. ]   [  ...  ]
  [ Rn(K-1)  Rn(K-2) ...  Rn(0)   ] [ aK ]   [ Rn(K) ]

To track the slowly varying property of speech signals, the estimated short-time autocorrelation is updated on the fly.

Prediction Gain

  Gp = 10 log10( sigma_x^2 / sigma_e^2 )   (dB)

Forward and Backward Adaptation

Forward adaptation
  The autocorrelation is estimated from the current frame, and the quantized prediction coefficients are transmitted to the decoder as side information (we will discuss how to quantize such a vector in the discussion of CELP coding).

Backward adaptation
  The autocorrelation is estimated from the causal past, and therefore no overhead needs to be transmitted; the decoder duplicates the encoder's operation.

Illustration of Forward Adaptation

  [Figure: frames F1 ... F5; for each frame, a set of LPCs is transmitted]

Illustration of Backward Adaptation

  [Figure: for each frame, the LPCs are learned from the past frame]

Comparison

Forward adaptive prediction
  Asymmetric complexity allocation (encoder > decoder)
  Non-negligible overhead
  Robust to errors
  More suitable for low bit rate coding

Backward adaptive prediction
  Symmetric complexity allocation (encoder = decoder)
  No overhead
  Sensitive to errors
  More suitable for high bit rate coding

AR vs. MA

Autoregressive (AR) model
  Essentially an IIR filter (all poles)

Moving average (MA) model
  Essentially an FIR filter (all zeros)

Autoregressive Moving Average (ARMA) model

ITU-T G.726 Adaptive Differential Pulse Code Modulation (ADPCM)

  [Figure: block diagrams of the G.726 encoder and decoder]

Waveform Coding Demo

  http://wwwlns.tf.unikiel.de/demo/demo_speech.htm

Data Compression Paradigm

  discrete source X --> Modeling Process --> Y --> entropy coding --> binary bit stream
                                             (P(Y) obtained by probability estimation)

Question: why do we call DPCM/ADPCM waveform coding?
Answer: because Y (the prediction residues) is also a kind of waveform, just like the original speech X; they have the same dimensionality.

Speech Coding Techniques (I)

Introduction to Quantization
  Scalar quantization
  Uniform quantization
  Non-uniform quantization

Waveform-based coding
  Pulse Coded Modulation (PCM)
  Differential PCM (DPCM)
  Adaptive DPCM (ADPCM)

Model-based coding
  Channel vocoder
  Analysis-by-Synthesis techniques
  Harmonic vocoder

Introduction to Model-based Coding

  signal space {x1, ..., xN} in R^N  --Analysis-->  model space {alpha_1, ..., alpha_K} in R^K  --Synthesis-->  signal space

  K << N

Toy Example

Model-based coding of sinusoid signals:

  x(n) = sin(2*pi*f*n + phi),  n = 1, 2, ..., N

  input signal {x1, ..., xN}  --analysis-->  alpha_1 = f, alpha_2 = phi  --synthesis-->  reconstructed signal

Question (test your understanding of entropy):
If a source is produced by the following model: x(n) = sin(50n + p), n = 1, 2, ..., 100, where p is a random phase with equal probabilities of being 0, 45, 90, 135, 180, 225, 270, 315 degrees, what is the entropy of this source?

Building Models for Human Speech

Waveform coders can also be viewed as a kind of model where K = N (not much compression).
Usually, model-based speech coders target a very high compression ratio (K << N).
There is no free lunch: a high compression ratio comes along with severe quality degradation.
We will see how to achieve a better tradeoff in Part II (CELP coder).

Channel Vocoder: Encoding

  [Figure: the input speech passes through a bank of bandpass filters; each band is rectified, lowpass filtered, and A/D converted at 3-4 bits per channel; a voicing detector (1 bit) and a pitch detector (6 bits) run in parallel; all outputs feed the encoder]

Frame size: 20 ms
Bit rate = 2400 - 3200 bps
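
As a rough sanity check of these numbers (the channel count is my assumption; the slide does not give it): with, say, 16 bands at 3 bits each, plus 6 bits for the pitch period and 1 bit for the voicing decision, a 20 ms frame carries 16*3 + 6 + 1 = 55 bits, i.e., 55 / 0.02 s = 2750 bps, inside the quoted 2400-3200 bps range.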

Channel Vocoder: Decoding

  [Figure: the decoder selects an excitation source according to the voicing information (a pulse generator driven by the pitch period for voiced frames, a noise generator for unvoiced frames); the excitation is passed through the bank of bandpass filters, scaled by the per-channel gains (D/A converted), and summed to form the output speech]

Analysis-by-Synthesis

Motivation
  The optimality of some parameters is easy to determine (e.g., pitch), but not that of others (e.g., gain parameters across different bands in the channel vocoder).
  The interaction among parameters is difficult to analyze but important to the synthesis part.

What is AbyS?
  Do the complete analysis and synthesis in encoding.
  The decoder is embedded in the encoder for the purpose of optimizing the extracted parameters.

AbyS is a Closed Loop

  [Figure: the input speech x = [x1, ..., xN] enters the Analysis block, which produces the parameters alpha = [alpha_1, ..., alpha_K]; the Synthesis block reconstructs a signal from alpha; the error e = [e1, ..., eN] between input and reconstruction is minimized (MMSE) around the loop]

Toy Example Revisited

  input signal {x1, ..., xN}  --analysis-->  alpha_1 = f, alpha_2 = phi  --synthesis-->  reconstructed signal

function dist = MSE_AbyS(x, f, phi)
% Synthesis error for candidate parameters (f, phi) of the sinusoid model
n = 1:length(x);
x_rec = sin(2*pi*f*n + phi);     % synthesize with the candidate parameters
e = x - x_rec;                   % reconstruction error
dist = sum(e.*e);                % squared error to be minimized

MATLAB provides various tools for solving optimization problems:
>> help fminsearch
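
For example, the closed-loop search could be run with fminsearch (a hypothetical usage sketch; the test signal, true parameters, and initial guess are my own, and a reasonable initial guess matters because the error surface over f has local minima):

  % Hypothetical AbyS search for the sinusoid parameters
  f0 = 0.01; phi0 = pi/4;                      % assumed "true" parameters
  n  = 1:200;
  x  = sin(2*pi*f0*n + phi0);                  % observed signal
  cost = @(theta) MSE_AbyS(x, theta(1), theta(2));
  theta_hat = fminsearch(cost, [0.012, 0])     % estimates close to [f0, phi0]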

Harmonic Models

For speech within a frame:

  s(n) = sum_j A_j cos(omega_j n + phi_j)

For voiced signals
  The phase is controlled by the pitch period.
  The pitch is often modeled by a slowly varying quadratic model.

For unvoiced signals
  Random phase.

Less accurate for transition signals (e.g., plosives, voice onsets, etc.)
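
A minimal synthesis sketch of this model (the sampling rate, pitch, and harmonic amplitudes are my own assumptions, chosen only to illustrate the formula):

  % Harmonic synthesis of one 20 ms frame (assumed parameters)
  fs = 8000; f0 = 120; N = 160;               % 8 kHz sampling, 120 Hz pitch, 20 ms frame
  J  = floor((fs/2)/f0);                      % harmonics below the Nyquist frequency
  A  = 1./(1:J);                              % assumed decaying harmonic amplitudes
  phi = zeros(1, J);                          % fixed phases (random for unvoiced frames)
  n = 0:N-1; s = zeros(1, N);
  for j = 1:J
      s = s + A(j)*cos(2*pi*j*f0/fs*n + phi(j));
  end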

AbyS Harmonic Vocoder

  [Figure: block diagram of the analysis-by-synthesis harmonic vocoder]

Bit Allocation

  [Table: bit allocation among the model parameters]
  Frame size = 20 ms, bit rate = 4 Kbps (i.e., 80 bits per frame)

Towards the Fundamental Limit

How much information can one convey in one minute?
  It depends on how fast one speaks.
  It depends on which language one speaks.
  It surely also depends on the speech content.

Model-based speech coders
  Can compress speech down to 300-500 bits/second, but you cannot tell who speaks it; there is no intonation, stress, or gender difference.
  A theoretically optimal approach: speech recognition + speaker recognition + speaker-dependent speech synthesis.
