Submitted in partial fulfillment of the requirements for the award of the degree of
Ref:__________
Dated: 27/04/2012
Certificate
Certified that this project entitled "Speech Compression and Decompression", submitted by Saurabh Lohani (10808634) and Pankaj Singh Negi (10807965), students of the Electronics & Communication Engineering Department, Lovely Professional University, Phagwara, Punjab, in partial fulfillment of the requirements for the award of the Bachelor of Technology (Electronics & Communication Engineering) degree of LPU, is a record of the students' own study carried out under my supervision and guidance.
This report has not been submitted to any other university or institution for the award of any degree.
Acknowledgement
We would like to express our deep sense of gratitude and indebtedness to Miss Lovleen Kaur, who guided us at all stages in the preparation of this dissertation. This project would not have been possible without her valuable suggestions and encouragement. It would not be out of place to mention here that our revered parents have always been a great source of inspiration to us; our heads bow in obeisance to them. We are highly appreciative of all others who directly or indirectly contributed to its completion. Last but not least, all that we are capable of doing we owe to THE ALMIGHTY.
Abstract
The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. The compression technique represents each digitized sample by an equivalent 4- to 8-bit sample. In decompression, the compressed samples are expanded back to the original sample size and converted back to analog signals. While the main focus of a speech recognition system (SRS) is to facilitate and improve direct audible man-machine communication and provide an alternative means of access to machines, a speech compression system (SCS) focuses on reducing the amount of redundant data while preserving the integrity of the signal. The compression of speech signals has many practical applications. One example is digital cellular technology, where many users share the same frequency bandwidth; compression allows more users to share the system than would otherwise be possible. Another example is digital voice storage (e.g. answering machines), where, for a given memory size, compression allows longer messages to be stored.
TABLE OF CONTENTS

1. Certificate
2. Acknowledgement
3. Abstract
4. Introduction
5. Speech representations
6. Hardware requirements
7. Functioning
8. Application
9. Software implementation
10. MATLAB source code
11. Sampled speech signal
12. Calculating thresholds
13. Voiced, unvoiced and mixed speech frames
14. Performance measures
    1. Signal to noise ratio
    2. Peak signal to noise ratio
    3. Normalised root mean square error
    4. Retained signal energy
    5. Compression ratios
15. Future work
    1. Enhancing quality
    2. Improving compression ratio
16. Conclusion
Introduction
Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world, as if the person were in the same room. As a result, a greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today the applications of speech coding and compression have become very numerous. Many applications involve the real-time coding of speech signals, for use in mobile satellite communications, cellular telephony, and audio for videophones or video teleconferencing systems. Other applications include the storage of speech for speech synthesis and playback, or for the transmission of voice at a later time. Some examples include voice mail systems, voice memo wristwatches, voice logging recorders and interactive PC software. Traditionally, speech coders are classified into two categories: waveform coders and analysis/synthesis vocoders (from "voice coders"). Waveform coders attempt to copy the actual shape of the signal produced by the microphone and its associated analogue circuits [9]. A popular waveform coding technique is pulse code modulation (PCM), which is used in telephony today. Vocoders use an entirely different approach to speech coding, known as parameter coding, or analysis/synthesis coding, where no attempt is made to reproduce the exact speech waveform at the receiver, only a signal perceptually equivalent to it. These systems provide much lower data rates by using a functional model of the human speaking mechanism at the receiver. One of the most popular techniques for analysis/synthesis coding of speech is Linear Predictive Coding (LPC). Some higher-quality vocoders include RELP (Residual Excited Linear Prediction) and CELP (Code Excited Linear Prediction).
This project looks at a new technique for analysing and compressing speech signals using wavelets. Very simply, wavelets are mathematical functions of finite duration with an average value of zero that are useful in representing data or other functions. Any signal can be represented by a set of scaled and translated versions of a basic function called the "mother wavelet". This set of wavelet functions forms the wavelet coefficients at different scales and positions, and results from taking the wavelet transform of the original signal. The coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients. Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterised by numerous transitory drifts, trends and abrupt changes. The localisation feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals.
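The idea of representing a signal by scaled and translated wavelet functions can be illustrated with the simplest wavelet, the Haar wavelet. The project itself uses the Daubechies 10 wavelet; Haar and the Python/NumPy setting are chosen here only to keep the sketch short and self-contained.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.
    Assumes len(x) is even; Haar is used only as the simplest possible wavelet."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass: local averages
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass: local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse single-level Haar DWT: perfectly reconstructs the signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt(signal)       # the signal in the wavelet domain
rec = haar_idwt(a, d)         # operating on coefficients loses nothing
```

Because the transform is orthogonal, the coefficient energy equals the signal energy, which is why thresholding small coefficients (as done later in the report) discards little signal energy.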
SPEECH REPRESENTATIONS
Extracting information from a speech signal for use in a recognition engine or for compression purposes usually relies on transforming the signal to a domain other than its original one. Although processing a signal in the time domain can be useful for obtaining measures such as the zero-crossing rate, the most important properties of the signal reside in the time-frequency and time-scale domains. This section contains a review and comparison of the different methods and techniques that allow such extraction. Here, x(t) represents the continuous speech signal to be analyzed. In order to digitally process x(t), it has to be sampled at a certain rate; 20000 Hz is a standard sampling frequency for digits and the English alphabet. To distinguish the digitized signal in the notation, it is referred to as x(m). Most speech processing schemes assume slow changes in the properties of speech with time, usually every 10-30 milliseconds. This assumption influenced the creation of short-time processing, in which speech is processed in short but periodic segments called analysis frames, or just frames. Each frame is then represented by one number or a set of numbers, and the speech signal then has a new time-dependent representation. In many speech recognition systems, frames of 200 samples at a sampling rate of 8000 Hz (i.e., 200 x 1000/8000 = 25 milliseconds) are used. This segmentation is not error free, since it creates blocking effects that make a rough transition between the representations (or measurements) of two consecutive frames. To remedy this rough transition, a window is usually applied to data of twice the size of the frame, overlapping the consecutive analysis window by 50%. This multiplication of the frame data by a window favors the samples near the center of the window over those at the ends, resulting in a smooth representation.
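The framing and windowing described above can be sketched as follows. This is a minimal Python/NumPy illustration assuming a Hamming window and a window length equal to the 200-sample frame with a 100-sample hop, a common variant of the 50% overlap described in the text.

```python
import numpy as np

def frame_signal(x, frame_len=200, hop=100):
    """Split x into overlapping analysis frames (50% overlap when hop is half
    the frame length) and apply a Hamming window so that samples near the
    center of each window are favored over those at the ends."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames

fs = 8000                                           # sampling rate (Hz), as in the text
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)    # 1 s of a 440 Hz tone as stand-in speech
frames = frame_signal(x)
# each 200-sample frame covers 200/8000 s = 25 ms of signal
```

Each row of `frames` is one windowed analysis frame; a short-time representation of the signal is then any per-row measurement (energy, spectrum, coefficients).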
If the window length is not too long, the signal properties inside it remain constant. Taking the Fourier transform of the data samples in the window, after adjusting their length to a power of 2 so that the Fast Fourier Transform can be applied, results in the time-dependent Fourier transform, which reveals the frequency domain properties of the signal. The spectrogram is a plot estimate of the short-term frequency content of the signal, in which a three-dimensional representation of the speech intensity, in different frequency bands, over time is portrayed. The vertical dimension corresponds to frequency and the horizontal dimension to time. The darkness of the pattern is proportional to the energy of the signal. The resonance frequencies of the vocal tract appear as dark bands in the spectrogram. Mathematically, the spectrogram of a speech signal is the magnitude squared of the Short Time Fourier Transform of that signal. In the literature one can find many different windows that can be applied to the frames of speech signals for short-term frequency analysis; three of them are depicted in Figure
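The spectrogram as defined above, the magnitude squared of the short-time Fourier transform, can be sketched as follows. Python/NumPy is assumed; the 256-point Hanning window and the hop size are illustrative choices, with the frame length a power of 2 so the FFT is fast.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """|STFT|^2 of x: rows are analysis frames (time), columns are frequency
    bins. Each entry is the energy of the signal in that frequency band."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spec = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        spec[i] = np.abs(np.fft.rfft(frame)) ** 2   # magnitude squared per bin
    return spec

fs = 8000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 1000 * t)        # a pure 1 kHz tone as a test signal
S = spectrogram(x)
peak_bin = S.mean(axis=0).argmax()       # strongest frequency band
peak_hz = peak_bin * fs / 256            # bin index converted to Hz
```

For the 1 kHz test tone the energy concentrates in the band at 1000 Hz, the discrete analogue of the dark bands a real spectrogram shows at the vocal tract resonances.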
Figure 2 shows a small part of a recorded audio stream. Analog audio input samples (PCM values) and the differences between successive samples (DPCM values) are compared in the two diagrams in Figure 2. The range of the PCM values is between 26 and 203, a span of 177 steps. The encoded DPCM values lie within a range of -44 to 46, a span of 90 steps. Even with a quantizer step size of one, this DPCM encoding already compresses the input data. The range of the encoded DPCM values could be decreased further by selecting a larger quantizer step size.
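The observation that sample differences occupy a smaller range than the samples themselves can be reproduced with a short sketch. The sample values below are hypothetical stand-ins, not the actual data of Figure 2.

```python
import numpy as np

# Hypothetical 8-bit PCM samples standing in for the recorded stream of
# Figure 2; the actual values from the figure are not reproduced here.
pcm = np.array([120, 135, 155, 160, 150, 128, 100, 90, 95, 110, 130, 145])

dpcm = np.diff(pcm)                  # DPCM: difference of successive samples
pcm_range = pcm.max() - pcm.min()    # span of the raw PCM values
dpcm_range = dpcm.max() - dpcm.min() # span of the differences

# Decoding is just a running sum starting from the first raw sample:
rec = np.concatenate(([pcm[0]], pcm[0] + np.cumsum(dpcm)))
```

The differences span a much smaller range than the raw samples, so they can be coded with fewer bits, while the running sum recovers the original stream exactly when the step size is one.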
The ADPCM encoder calculates the signal estimate (Se) by decoding the ADPCM code. This means that the decoder is part of the ADPCM encoder, and hence the encoded audio data stream can only be replayed using the decoder: the decoder must track the encoder. The initial encoder and decoder signal estimate levels, as well as the step-size adaptation level, must be defined before encoding or decoding starts; otherwise, the encoded or decoded value could exceed the scale.

HARDWARE

The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. The compression technique represents each digitized sample by an equivalent 4- to 8-bit sample. In decompression, the compressed samples are expanded back to the original sample size and converted back to analog signals. The hardware consists of the DSP processor ADSP-2105/2115 as CPU, a CODEC, EPROM, RAM, amplifier sections, a Mic and a speaker. The CODEC is interfaced to the ADSP processor through its serial port. The optional hardware includes a PC serial port interface consisting of a serial I/O port and an RS-232 level converter. The TTL logic levels of the serial port are converted to RS-232 levels using the level converter, so that the system can communicate directly with the standard serial port (COM1/COM2) of a personal computer.
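The decoder-in-the-encoder loop described above can be sketched as follows. This is a deliberately simplified illustration with an invented step-size adaptation rule; it is not the ADSP-2105 firmware and not the ITU-T G.726 algorithm.

```python
def adpcm_encode(samples, n_levels=16):
    """Toy ADPCM encoder: quantize the prediction error against a running
    signal estimate and adapt the step size, mirroring exactly what the
    decoder will later do (the embedded decoder updates the estimate)."""
    estimate, step = 0.0, 4.0        # both sides must start from the same state
    codes, decoded = [], []
    for s in samples:
        err = s - estimate
        code = max(-(n_levels // 2), min(n_levels // 2 - 1, round(err / step)))
        codes.append(code)
        estimate += code * step      # embedded decoder tracks the signal
        decoded.append(estimate)
        # crude adaptation: grow the step on large codes, shrink it on small ones
        step = min(max(step * (1.5 if abs(code) >= n_levels // 4 else 0.9), 1.0), 1024.0)
    return codes, decoded

def adpcm_decode(codes, n_levels=16):
    """Stand-alone decoder; it tracks the encoder only because it starts from
    the same initial estimate and step size and applies the same rule."""
    estimate, step = 0.0, 4.0
    out = []
    for code in codes:
        estimate += code * step
        out.append(estimate)
        step = min(max(step * (1.5 if abs(code) >= n_levels // 4 else 0.9), 1.0), 1024.0)
    return out

samples = [0.0, 10.0, 25.0, 30.0, 20.0, 5.0]
codes, decoded = adpcm_encode(samples)
```

The stand-alone decoder reproduces the encoder's internal estimates exactly, which is the point made in the text: change either side's initial state and the two drift apart.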
[Block diagram: the Mic feeds an amplifier and the CODEC; the CODEC connects to the serial port of the ADSP-2105/2115; EPROM, RAM and a latch sit on the system bus; a buffer and RS-232 level converter link the serial port to the PC bus.]
FUNCTIONING:

The system is operated through the reset and interrupt switches. Once the system is reset, it is ready to accept speech signals through the Mic and CODEC. The analog speech signals are amplified by the pre-amplifier and fed to the CODEC for analog-to-digital conversion. The CODEC transmits the digitized signal to the ADSP-2105/2115 processor, which compresses the speech data using the ADPCM technique and stores it in RAM. When the processor is interrupted, it reads the compressed data from RAM, expands the data and sends it to the CODEC. The CODEC converts the digital data to an analog signal, which is amplified and output through the speaker.

APPLICATION:

Speech compression and decompression techniques are implemented in applications such as:
1. Cellular phones
2. Voice mail transmission
3. Speech recognition systems
4. Voice storage
5. IVRS (Interactive Voice Response Systems)
Software Implementation
In this project we make use of MATLAB software to implement the functionality of the project. MATLAB stands for Matrix Laboratory. According to The MathWorks, its producer, it is a "technical computing environment"; we take the more mundane view that it is a programming language. This section covers much of the language, but by no means all. We aspire at the least to promote a reasonable proficiency in reading the procedures written in the language, and address this material to those who wish to use our procedures and write their own programs. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and "knows" how big it is. Moreover, the fundamental operators (e.g. addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for signal processing involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation. MATLAB is a programming environment for algorithm development, data analysis, visualization, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages such as C, C++, and Fortran. You can use MATLAB in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. For a million engineers and scientists in industry and academia, MATLAB is the language of technical computing.
MATLAB source code:

close all; clear all;
disp('load speech data');
load speech.dat;
lg = length(speech);
t = [0:1:lg-1]/8000;
disp('loading finished');

disp('mu-law companding');
nspeech = speech/(2^15);                     % normalise the 15-bit samples
mu = input('input mu =>');
for x = 1:lg
    munspeech(x) = mulaw(nspeech(x),1,mu);   % mu-law compression
end
disp('finished mu-law companding');

disp('start quantization');
bits = input('input bits=>');
for x = 1:lg
    [pq uindx(x)]  = midtread(bits,1,nspeech(x));
    [pq muindx(x)] = midtread(bits,1,munspeech(x));
end

% transmission

disp('expander');
for x = 1:lg
    qunspeech(x)  = mtrdec(bits,1,uindx(x));
    qmunspeech(x) = mtrdec(bits,1,muindx(x));
end
for x = 1:lg
    expnspeech(x) = muexpand(qmunspeech(x),1,mu);
end
quspeech = qunspeech.*2^15;
qspeech  = expnspeech.*2^15;
disp('finished');

qerr = speech - qspeech;
subplot(2,1,1), plot(t, speech, 'w', t, qspeech, 'c', t, qspeech-speech, 'r'); grid
subplot(2,1,2), plot(t, speech, 'w', t, quspeech, 'b', t, quspeech-speech, 'r'); grid
disp('speech: original data, 15 bits');
disp('quspeech: quantized PCM');
disp('qspeech: mu-law decoded');
disp('SNR between speech and qspeech');
calcsnr(speech, qspeech);
disp('SNR between speech and quspeech');
calcsnr(speech, quspeech);
function qvalue = mulaw(vin, vmax, mu)
% mu-law compressor
vin = vin/vmax;
qvalue = vmax*sign(vin)*log(1+mu*abs(vin))/log(1+mu);

function rvalue = muexpand(y, vmax, mu)
% mu-law expander
y = y/vmax;
rvalue = sign(y)*(vmax/mu)*((1+mu)^abs(y) - 1);

function [pq, indx] = midtread(NoBits, Xmax, value)
% Simulation of a uniform quantizer (mid-tread method).
% NoBits: number of bits used in quantization
% Xmax:   overload value
% value:  input to be quantized
% pq:     quantized output value
% indx:   codeword integer
if NoBits == 0
    pq = 0;
    indx = 0;
else
    delta = 2*abs(Xmax)/(2^NoBits-1);
    Xrmax = delta*(2^NoBits/2-1);
    if abs(value) >= Xrmax
        tmp = Xrmax;
    else
        tmp = abs(value);
    end
    indx = round(tmp/delta);
    pq = round(tmp/delta)*delta;
    if value < 0
        pq = -pq;
        indx = -indx;
    end
end

function pq = mtrdec(NoBits, Xmax, indx)
% Decoder for the uniform mid-tread quantizer.
% NoBits: number of bits used in quantization
% Xmax:   overload value
% indx:   codeword integer
% pq:     decoded output value
if NoBits == 0
    pq = 0;
else
    delta = 2*abs(Xmax)/(2^NoBits-1);
    pq = indx*delta;
end

function snr = calcsnr(speech, qspeech)
% Calculates the SNR in dB between the original and quantized speech.
% speech:  original speech waveform
% qspeech: quantized speech
qerr = speech - qspeech;
snr = 10*log10(sum(speech.*speech)/sum(qerr.*qerr));

% Waveform coding using DCT and MDCT for a block size of 16 samples
% main program
close all; clear all;
load speech.dat
% create scale factors
N = 16;   % block size
scalef4bits = sqrt(2*N)*[1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768];
scalef3bits = sqrt(2*N)*[256 512 1024 2048 4096 8192 16384 32768];
scalef2bits = sqrt(2*N)*[4096 8192 16384 32768];
scalef1bit  = sqrt(2*N)*[16384 32768];
scalef = scalef2bits;
nbits = 3;
% ensure the block size to be 16 samples
x = [speech zeros(1,16-mod(length(speech),16))];
Nblock = length(x)/16;
DCT_code = [];
scale_code = [];
% encoder
for i = 1:Nblock
    xblock_DCT = dct(x((i-1)*16+1:i*16));
    diff = scalef - max(abs(xblock_DCT));
    iscale(i) = min(find(diff == min(diff(find(diff>=0)))));  % find a scale factor
    xblock_DCT = xblock_DCT/scalef(iscale(i));                % scale the input vector
    for j = 1:16
        [DCT_coeff(j) pp] = biquant(nbits,-1,1,xblock_DCT(j));
    end
    DCT_code = [DCT_code DCT_coeff];
end
% decoder
Nblock = length(DCT_code)/16;
xx = [];
for i = 1:Nblock
    DCT_coefR = DCT_code((i-1)*16+1:i*16);
    for j = 1:16
        xrblock_DCT(j) = biqtdec(nbits,-1,1,DCT_coefR(j));
    end
    xrblock = idct(xrblock_DCT.*scalef(iscale(i)));
    xx = [xx xrblock];
end

% Transform coding using MDCT
xm = [zeros(1,8) speech zeros(1,8-mod(length(speech),8)) zeros(1,8)];
Nsubblock = length(x)/8;
MDCT_code = [];
% encoder
for i = 1:Nsubblock
    xsubblock_DCT = wmdct(xm((i-1)*8+1:(i+1)*8));
    diff = scalef - max(abs(xsubblock_DCT));
    iscale(i) = min(find(diff == min(diff(find(diff>=0)))));  % find a scale factor
    xsubblock_DCT = xsubblock_DCT/scalef(iscale(i));          % scale the input vector
    for j = 1:8
        [MDCT_coeff(j) pp] = biquant(nbits,-1,1,xsubblock_DCT(j));
    end
    MDCT_code = [MDCT_code MDCT_coeff];
end
% decoder
% recover the first subblock
Nsubblock = length(MDCT_code)/8;
xxm = [];
MDCT_coeffR = MDCT_code(1:8);
for j = 1:8
    xmrblock_DCT(j) = biqtdec(nbits,-1,1,MDCT_coeffR(j));
end
xmrblock = wimdct(xmrblock_DCT*scalef(iscale(1)));
xxr_pre = xmrblock(9:16);   % recovered first block for overlap and add
for i = 2:Nsubblock
    MDCT_coeffR = MDCT_code((i-1)*8+1:i*8);
    for j = 1:8
        xmrblock_DCT(j) = biqtdec(nbits,-1,1,MDCT_coeffR(j));
    end
    xmrblock = wimdct(xmrblock_DCT*scalef(iscale(i)));
    xxr_cur = xxr_pre + xmrblock(1:8);   % overlap and add
    xxm = [xxm xxr_cur];
    xxr_pre = xmrblock(9:16);            % set up for the next overlap
end
subplot(3,1,1); plot(x,'k'); grid;
axis([0 length(x) -10000 10000]);
ylabel('Original signal');
subplot(3,1,2); plot(xx,'k'); grid; axis([0 length(xx) -10000 10000]);
ylabel('DCT coding');
subplot(3,1,3); plot(xxm,'k'); grid; axis([0 length(xxm) -10000 10000]);
ylabel('W-MDCT coding');
xlabel('Sample number');

function [tdac_coef] = wmdct(ipsig)
% Transforms the signal vector using the W-MDCT.
% ipsig:     input signal block of N samples (N = even number)
% tdac_coef: W-MDCT coefficients (N/2 coefficients)
N = length(ipsig);
NN = N;
for i = 1:NN
    h(i) = sin((pi/NN)*(i-1+0.5));
end
for k = 1:N/2
    tdac_coef(k) = 0.0;
    for n = 1:N
        tdac_coef(k) = tdac_coef(k) + ...
            h(n)*ipsig(n)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));
    end
end
tdac_coef = 2*tdac_coef;

function [opsig] = wimdct(tdac_coef)
% Transforms the W-MDCT coefficients back to the signal.
% tdac_coef: N/2 W-MDCT coefficients
% opsig:     output signal block with N samples
N = length(tdac_coef);
tmp_coef = ((-1)^(N+1))*tdac_coef(N:-1:1);
tdac_coef = [tdac_coef tmp_coef];
N = length(tdac_coef);
NN = N;
for i = 1:NN
    f(i) = sin((pi/NN)*(i-1+0.5));
end
for n = 1:N
    opsig(n) = 0.0;
    for k = 1:N
        opsig(n) = opsig(n) + ...
            tdac_coef(k)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));
    end
    opsig(n) = opsig(n)*f(n)/N;
end
Openfile.m

function sdata = openfile(fName)
% openfile : reads a speech file with a .od extension
% call syntax: sdata = openfile(fName);
% Read sound file data into a column vector
sdata = dlmread(fName);

Play.m

function play(M)
% play : plays a sound file which is stored as a vector
% call syntax: play(M);
soundsc(M, 8000, 8);

Main.m

% Speech Compression Simulation Program
% User inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
% Compress speech
[tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Decompress speech
rS = decompress(tC, tL, wavelet);
% Performance calculations
[SNR, PSNR, NRMSE] = pefcal(fileName, rS);

Compress.m

function [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet)
% Compress : compresses a speech signal's wavelet coefficients
% Inputs:  speech signal file name, wavelet
% Outputs: compressed coefficients, length vector, compression score
%          and retained energy
% Call syntax: [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Initialise variables
N = 5;          % level of decomposition
ALPHA = 1.5;    % compression parameter
SORH = 'h';     % hard thresholding
% Read speech file
sdata = openfile(fileName);
% Compute the DWT to level N
[C,L] = wavedec(sdata,N,wavelet);
% Calculate level-dependent thresholds
[THR,NKEEP] = lvlThr(C,L,ALPHA);
% Compress signal using hard thresholding
[XC,CXC,LXC,PERF0,PERFL2] = Trunc('lvd',C,L,wavelet,N,THR,SORH);
% Encode coefficients
cC = encode(CXC);
% Transmitted coefficients
tC = cC;
% Transmitted coefficients vector length
tL = L;
% Percentage of zeros
PZEROS = PERF0;
% Retained energy
PNORMEN = PERFL2;
% Compression ratio with encoding
CompRatio = length(sdata)/length(tC)

Decompress.m

function rSignal = decompress(tC, tL, wavelet)
% Decompress : decodes DWT coefficients and reconstructs the signal
% Inputs:  encoded wavelet coefficients, coeff vector length
% Output:  reconstructed signal
% Call syntax: rSignal = decompress(tC,tL, wavelet);
% Decode coefficients
rC = decode(tC);
% Reconstruct signal from coefficients
rSignal = waverec(rC,tL,wavelet);

Encode.m

function cC = encode(C)
% Encode : run-length encodes sequences of zero-valued coefficients
% (each run of zeros is stored as a zero followed by the run length)
% Initialise variables
zeroseq = false;   % true if the previous array entry was a zero
zerocount = 0;     % count of the number of zeros in a sequence
j = 1;             % start index value for compressed coefficients
compC = [];        % compressed coefficients vector
% Iterate through the array
for m = 1:length(C)
    if (C(m) == 0) & (zeroseq == false)
        % First zero
        compC = [compC C(m)];
        j = j+1;
        zeroseq = true;
        zerocount = 1;
        % Reached end of array and the last value is zero
        if m == length(C)
            compC = [compC zerocount];
        end
    elseif (C(m) == 0) & (zeroseq == true)
        % Sequence of zeros
        zerocount = zerocount + 1;
        % Reached end of array and the last value is zero
        if m == length(C)
            compC = [compC zerocount];
        end
    elseif (C(m) ~= 0) & (zeroseq == true)
        % End of zeros
        compC = [compC zerocount C(m)];
        j = j+2;
        zeroseq = false;
        zerocount = 0;
    else
        % Non-zero entry
        compC = [compC C(m)];
        j = j+1;
    end
end
cC = compC;

Decode.m

function rC = decode(cC)
% Decode : decodes consecutive zero-valued coefficients
% Call syntax: rC = decode(cC);
% Initialise variables
dcompC = [];   % empty reconstructed coefficients array
i = 1;         % initial index of loop
% Iterate through the array
while i <= length(cC)
    if cC(i) ~= 0
        % Non-zero entry
        dcompC = [dcompC cC(i)];
        i = i + 1;
    else
        % Zero entry: next value is the run length
        count = cC(i+1);
        for m = 1:count
            dcompC = [dcompC 0];
        end
        i = i + 2;
    end
end
rC = dcompC;

Pefcal.m

function [SNR, PSNR, NRMSE] = pefcal(fileName, rS)
% Pefcal : performance calculations
% Calculates Signal to Noise Ratio, Peak Signal to Noise Ratio
% and Normalised Root Mean Square Error
% Get original speech signal
origdata = openfile(fileName);
% Resize reconstructed signal for the mathematics to work
rS = rS(1:length(origdata));
% Signal to Noise Ratio
sqdata = origdata.^2;           % square of original speech signal
msqdata = mean(sqdata);         % mean square of speech signal
sqdiff = (origdata - rS).^2;    % squared difference
msqdiff = mean(sqdiff);         % mean squared difference
SNR = 10*log10(msqdata/msqdiff);
% Peak Signal to Noise Ratio
N = length(rS);          % length of reconstructed signal
X = max(abs(sqdata));    % maximum absolute square of the original signal
diff = origdata - rS;    % difference signal
endiff = (norm(diff))^2; % energy of the difference between the
                         % original and reconstructed signals
PSNR = 10*log10((N*(X^2))/endiff);
% Normalised Root Mean Square Error
diffsq = diff.^2;                    % difference squared
mdiffsq = mean(diffsq);              % mean of difference squared
mdata = mean(origdata);              % mean of original speech signal
scaledsqS = (origdata - mdata).^2;   % squared, mean-removed data
mscaledsqS = mean(scaledsqS);        % mean of squared, mean-removed data
NRMSE = sqrt(mdiffsq/mscaledsqS);

Comp.m

function [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Comp : simulates real-time compression of speech signals
% Inputs:  speech signal file name, wavelet, level and frame size
%          (if frame size is 0, no frames are used)
% Outputs: compressed coefficients and compression ratio
% Call syntax: [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Calculate the number of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Initialise other variables
tXC = [];       % uncompressed coefficients vector
PERF0V = [];    % vector of % truncation for each frame
PERFL2V = [];   % vector of % retained energy for each frame
for i = 1:numFrames
    % Read a frame from the speech file
    sdata = FrameSelect(i,frameSize,fileName,fileSize);
    % Compute the DWT to level N
    [C,L] = wavedec(sdata,N,wavelet);
    % Calculate default thresholds
    [THR, SORH, KEEPAPP] = gblThr('cmp','wv',sdata);
    SORH = 'h';
    KEEPAPP = 0;   % can threshold approximation coefficients also
    % Compress signal using hard thresholding
    [XC,CXC,LXC,PERF0,PERFL2] = Trunc('gbl',C,L,wavelet,N,THR,SORH,KEEPAPP);
    % Encode coefficients
    cC = encode(CXC);
    % Transmitted coefficients
    tXC = [tXC cC];
    % Truncation % vector
    PERF0V = [PERF0V PERF0];
    % Retained energy vector
    PERFL2V = [PERFL2V PERFL2];
end
% Return values
tC = tXC;
tL = L;
PZEROS = mean(PERF0V);
PNORMEN = mean(PERFL2V);
cScore = fileSize/length(tC);
nFrames = numFrames;

Decomp.m

function rSignal = decomp(tC,tL,wavelet,numFrames,frameSize)
% Decomp : simulates real-time decoding of signals
% Inputs:  encoded wavelet coefficients, coeff vector length
% Outputs: reconstructed signal
% Call syntax: rSignal = decomp(tC,tL,wavelet,numFrames,frameSize);
% Initialise other variables
rS = [];                           % reconstructed signal
frameSize = sum(tL) - frameSize;   % frame size of DWT coefficients
% Decode coefficients
rC = decode(tC);
for i = 1:numFrames
    % Range of frame
    R1 = (i-1)*frameSize + 1;
    R2 = i*frameSize;
    % Read coefficients in frame
    fC = rC(R1:R2);
    % Reconstruct frame signal
    X = waverec(fC,tL,wavelet);
    % Total reconstructed signal
    rS = [rS; X];
end
% Return output
rSignal = rS;

Filesize.m

function fSize = FileSize(fName)
% FileSize : counts the number of samples in a speech file
% Call syntax: fSize = FileSize(fName);
data = openfile(fName);
fSize = length(data);

Frameselect.m

function v = FrameSelect(fNum,fSize,fileName,fileSize)
% FrameSelect : reads a frame of data from a speech file into a column vector
% Call syntax: v = FrameSelect(fNum,fSize,fileName,fileSize);
% Read the corresponding frame from the sound file into a column vector
% range = [R1 C1 R2 C2], C1 = C2 = 0 since there is only one column
% R1 = first value, R2 = last value
R1 = fSize*(fNum-1);
R2 = (fSize*fNum - 1);
R3 = R2;
% Adjust the range value for the last frame
if R2 >= fileSize
    R2 = fileSize - 1;
end
range = [R1 0 R2 0];
v = dlmread(fileName,'',range);
% If the data for the last frame is smaller than the frame size,
% zero-pad the frame
if R3 ~= R2
    N = (R3-R2);
    for i = 1:N
        v = [v; 0];
    end
end

Optimal.m

% Optimal Wavelet for Speech Compression
% This script determines the percentage of speech frame energy
% concentrated by a wavelet in the first N/2 coefficients
% Inputs:  speech signal file name, wavelet and frame size
% Outputs: retained energy per frame
% User inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
frameSize = 160;
% Calculate the number of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Vector to store the retained energy of each frame
PREV = [];
% Step through each frame and calculate the retained energy
%for i=1:numFrames
    % Read a frame from the speech file
    sdata = FrameSelect(8,frameSize,fileName,fileSize);
    % Compute the DWT to level 5
    [C,L] = wavedec(sdata,5,wavelet);
    % Calculate the energy retained in the first N/2 coefficients
    xC = C(1:(length(C)/2));
    RE = 100*(norm(xC))^2/(norm(C))^2;
    PREV = [PREV ; RE];
%end
PREV

Voiced.m

% Voiced, Unvoiced and Mixed Frames
% This script plots a speech frame and its DWT coefficients
% User inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
frameSize = 1024;
% Calculate the number of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Read frame i from the speech file
i = 9;
sdata = FrameSelect(i,frameSize,fileName,fileSize);
% Compute the DWT to level 5
[C,L] = wavedec(sdata,5,wavelet);
% Calculate the energy retained in the first N/2 coefficients
xC = C(1:(length(C)/2));
RE = 100*(norm(xC))^2/(norm(C))^2;
% Plot the frame and its wavelet transform coefficients
subplot(2,1,1); plot(sdata,'r');
title('Mixed Speech Segment');
subplot(2,1,2); plot(C);
title('DWT Coefficients Using the Db10 Wavelet');
Result
Calculating Thresholds
For the truncation of small-valued transform coefficients, two different thresholding techniques are used: global thresholding and by-level thresholding. The aim of global thresholding is to retain the largest absolute-value coefficients, regardless of the scale in the wavelet decomposition tree. Global thresholds are calculated by setting the percentage of coefficients to be truncated. Level-dependent thresholds are calculated using the Birge-Massart strategy [15]. This thresholding scheme is based on an approximation result from Birge and Massart and is well suited to signal compression. The strategy keeps all of the approximation coefficients at the decomposition level J. The number of detail coefficients to be kept at level i, for i from 1 to J, is given by the formula:

n_i = M / (J + 2 - i)^ALPHA

ALPHA is a compression parameter and its value is typically 1.5. The value of M reflects how sparsely the wavelet coefficients are distributed in the transform vector. If L denotes the length of the coarsest approximation coefficients, then M takes on the values in Table 4.1, depending on the signal being analysed.
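The Birge-Massart keeping rule, n_i = M/(J+2-i)^ALPHA detail coefficients retained at level i, can be evaluated with a short sketch. The coarsest-approximation length of 32 and the choice M = 1.5L are illustrative assumptions, not values fixed by the report.

```python
def birge_massart_counts(L_coarse, J, alpha=1.5, m_factor=1.5):
    """Number of detail coefficients kept at each level i = 1..J under the
    Birge-Massart strategy: n_i = M / (J + 2 - i)**alpha, with M = m_factor * L,
    where L is the length of the coarsest approximation coefficients."""
    M = m_factor * L_coarse
    return [int(round(M / (J + 2 - i) ** alpha)) for i in range(1, J + 1)]

# J = 5 decomposition levels, coarsest approximation of length 32 (assumed)
keep = birge_massart_counts(32, 5)
# the count grows with the level index i, so some levels keep far fewer
# coefficients than others, which is where the compression comes from
```

With these inputs the kept counts per level come out to [3, 4, 6, 9, 17], a strictly increasing schedule across the five levels.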
Thus this approach to thresholding selects the highest absolute valued coefficients at each level.
This run-length encoding scheme, discussed in the Literature Review, is the primary means of achieving signal compression. In MATLAB, however, coding this compression algorithm using vectors results in relatively slow performance, with unacceptable delays for real-time speech coding. The encoding process can be sped up significantly by programming it in another language such as C++.
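The zero run-length scheme referred to above, where a run of zeros is stored as a zero followed by the run length, can be sketched as follows. Python is used here purely for illustration, since the point of the passage is that a compiled language implements the same loop faster than vectorized MATLAB.

```python
def encode_zeros(coeffs):
    """Run-length encode runs of zeros as the pair (0, run_length),
    mirroring the scheme used by the report's encode/decode routines."""
    out, i = [], 0
    while i < len(coeffs):
        if coeffs[i] != 0:
            out.append(coeffs[i])   # non-zero values pass through unchanged
            i += 1
        else:
            run = 0
            while i < len(coeffs) and coeffs[i] == 0:
                run += 1
                i += 1
            out.extend([0, run])    # one zero marker plus the run length
    return out

def decode_zeros(coded):
    """Inverse of encode_zeros: expand each (0, run_length) pair back to zeros."""
    out, i = [], 0
    while i < len(coded):
        if coded[i] != 0:
            out.append(coded[i])
            i += 1
        else:
            out.extend([0] * coded[i + 1])
            i += 2
    return out

c = [5, 0, 0, 0, 2, 0, 7, 0, 0, 3]   # thresholded coefficients are mostly zeros
roundtrip = decode_zeros(encode_zeros(c))
```

The scheme pays off exactly when thresholding has produced long zero runs: a run of fifty zeros collapses to just two numbers.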
Performance Measures:

A number of quantitative parameters can be used to evaluate the performance of the wavelet-based speech coder, in terms of both reconstructed signal quality after decoding and compression scores. The following parameters are compared:
- Signal to Noise Ratio (SNR)
- Peak Signal to Noise Ratio (PSNR)
- Normalised Root Mean Square Error (NRMSE)
- Retained Signal Energy
- Compression Ratios
The results obtained for the above quantities are calculated using the following formulas:

1. Signal to Noise Ratio

SNR = 10 log10( sx^2 / se^2 )

where sx^2 is the mean square of the speech signal and se^2 is the mean square difference between the original and reconstructed signals.

2. Peak Signal to Noise Ratio

PSNR = 10 log10( N X^2 / ||x - r||^2 )

where N is the length of the reconstructed signal, X is the maximum absolute square value of the signal x, and ||x - r||^2 is the energy of the difference between the original and reconstructed signals.

3. Normalised Root Mean Square Error

NRMSE = sqrt( mean[(x(n) - r(n))^2] / mean[(x(n) - ux)^2] )

where x(n) is the speech signal, r(n) is the reconstructed signal, and ux is the mean of the speech signal.

4. Retained Signal Energy

RSE = 100 * ||r(n)||^2 / ||x(n)||^2

where ||x(n)|| is the norm of the original signal and ||r(n)|| is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets the retained energy is equal to the L2-norm recovery performance.

5. Compression Ratio

CR = length(x(n)) / length(cWC(n))

where cWC(n) is the vector of encoded (compressed) wavelet coefficients.
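A minimal sketch of these performance measures, assuming Python/NumPy and a synthetic test signal in place of real speech:

```python
import numpy as np

def perf_measures(x, r):
    """Quality measures for original x and reconstruction r, following the
    formulas above: SNR, PSNR, NRMSE and retained signal energy (percent)."""
    err = x - r
    snr = 10 * np.log10(np.mean(x**2) / np.mean(err**2))
    N = len(r)
    X = np.max(np.abs(x))**2                   # maximum absolute square of x
    psnr = 10 * np.log10(N * X**2 / np.sum(err**2))
    nrmse = np.sqrt(np.mean(err**2) / np.mean((x - np.mean(x))**2))
    retained = 100 * np.linalg.norm(r)**2 / np.linalg.norm(x)**2
    return snr, psnr, nrmse, retained

x = np.sin(2 * np.pi * np.arange(200) / 50)   # synthetic "speech": 4 sine periods
r = x + 0.01 * np.ones_like(x)                # small constant reconstruction error
snr, psnr, nrmse, retained = perf_measures(x, r)
```

With the constant 0.01 error, the mean square of x is 0.5 and the mean square error is 1e-4, giving an SNR of 10 log10(5000), about 37 dB, and a retained energy just above 100% since the error adds a little energy.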
Conclusion:
Speech coding is currently an active topic for research in the areas of Very Large Scale Integrated (VLSI) circuit technology and Digital Signal Processing (DSP). The Discrete Wavelet Transform performs very well in the compression of recorded speech signals; for real-time speech processing, however, its performance is not as good. Therefore, for real-time speech coding it is recommended to use a wavelet with a small number of vanishing moments at a decomposition level of 5 or less. The wavelet-based compression software designed here reaches a signal to noise ratio of 17.45 dB at a compression ratio of 3.88 using the Daubechies 10 wavelet. The performance of the wavelet scheme, in terms of compression scores and signal quality, is comparable with other good techniques such as code excited linear predictive (CELP) coding for speech, with much less computational burden. In addition, using wavelets the compression ratio can be varied easily, while most other compression techniques have fixed compression ratios.