INTRODUCTION
Large storage requirements limit the amount of audio data that can be stored on compact discs, flash memory, and other media. Large file sizes also give rise to long download times for retrieving songs from the internet. For these reasons (and others), there is considerable interest in shrinking the storage requirements of sampled sound.
plot(f,w*c)
% Pick a cut-off value and split the coefficients into low- and
% high-precision sets:
cutoff = 0.00075;
mask = (abs(w*c)<cutoff);
low = mask.*c;
high = (1-mask).*c;
% This plot nicely illustrates the cut-off region:
plot(f,w*high,'-r',f,w*low,'-b')
% Now pick a precision (in bits) for the low-precision data set:
lowbits = 8;
% We won't quantize the high-precision set of coefficients (high), only the
% low-precision part (requires quantize.m):
m = max(abs(low));
y = low/m;
y = floor((2^lowbits - 1)*y/2);
y = 2*y/(2^lowbits - 1);
low = m*y;
% Finally, let's reconstruct our compressed audio sample and listen to it!
z = idct(low+high);
sound(z,R)
The psychoacoustic model is based on many studies of human perception. The two main properties of the human auditory system that make up the psychoacoustic model are:
Masking
Many portions of an audio stream cannot actually be heard. Any sound with intensity below a certain threshold (called the threshold in quiet) cannot be heard, due to the limits of the ear's sensitivity. Sometimes, sounds above the threshold in quiet cannot be heard because other sounds cover them up. This is due to a psychoacoustic phenomenon known as masking. If two separate tones are close enough in frequency, one tone may cover up the other. The tone that is heard is called the masker, and the tone that is not heard is called the maskee.
The above phenomenon is known as simultaneous masking. There is another phenomenon known as temporal masking. Temporal masking is the masking of a sound before or after the masker event occurs.
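As a rough illustration of the threshold in quiet, the sketch below evaluates the analytic approximation that also appears in the coder's source listing (in the `allocate` function). The helper name `threshQuiet` and the example frequencies are assumptions made for this illustration only.

```matlab
% Threshold in quiet (dB SPL) as a function of frequency in Hz.
% Analytic approximation; only meaningful for f > 0.
threshQuiet = @(f) 3.64*(f/1000).^(-0.8) ...
                 - 6.5*exp(-0.6*(f/1000 - 3.3).^2) ...
                 + 1e-3*(f/1000).^4;

f = [100 1000 3300 10000 16000];   % example frequencies in Hz
T = threshQuiet(f);                % threshold at each frequency

% The threshold dips below 0 dB SPL near 3.3 kHz and rises steeply
% toward both ends of the audible range.
for i = 1:numel(f)
    fprintf('%6d Hz : %7.2f dB SPL\n', f(i), T(i));
end
```

Any spectral component whose level falls below this curve can be discarded without an audible effect, even when no masker is present.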
2. DESIGN
2.1 Frames
The first step in designing an audio coder is to segment the audio stream into frames. A frame is a short section of audio, typically less than 50 ms long. At a sampling frequency of 44.1 kHz, a frame of 2048 samples is about 46 ms long. Enframing the audio stream allows the engineer to treat each frame as a relatively stationary sound. Frame lengths longer than 50 ms are typically not used, since pleasant-sounding audio is non-stationary. The coder presented in this paper uses a fixed frame length of 2048 samples.
2.2 Signal-to-Mask Ratio
There are many ways to calculate the masking threshold. In general, the masking threshold varies depending on the frequency and intensity of the masker signal. To calculate the masking threshold, the first step is to compute the FFT of the frame and find the spectral peaks. To find the peaks, simply search for every point where the slope changes from positive to negative. Each of these peaks corresponds to an individual frequency component in the signal.
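The peak search described above can be sketched in a few lines. The test signal, the variable names, and the small threshold used to discard round-off ripples are assumptions for this illustration; the slope-change test itself mirrors the `diff(sign(diff(...)))` expression used in the coder's listing.

```matlab
Fs = 44100;                  % sampling frequency (Hz)
N  = 2048;                   % frame length in samples
frameMs = 1000*N/Fs;         % frame duration, about 46.4 ms

% Synthetic test frame: two tones placed exactly on FFT bins 100 and 300
n = 0:N-1;
x = sin(2*pi*100*n/N) + 0.5*sin(2*pi*300*n/N);

P = abs(fft(x)).^2;          % power spectrum
P = P(1:N/2);                % keep the non-redundant half

% A peak is a bin where the slope changes from positive to negative:
% sign(diff(P)) goes from +1 to -1, so its difference equals -2.
peaks = find(diff(sign(diff(P))) == -2) + 1;   % 1-based indices

% Ignore tiny ripples caused by round-off noise
peaks = peaks(P(peaks) > 1e-6*max(P));
peakHz = (peaks - 1)*Fs/N;   % 1-based bin index -> frequency in Hz
```

For this frame the search returns indices 101 and 301, which are bins 100 and 300 in zero-based terms. Real audio produces many more local maxima, so a practical coder would also need some rule for ignoring insignificant ones.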
2.3 Bit Allocation
From the SMR, it can be determined which frequency bands should receive the most bits. As a general rule, each bit increases the signal-to-noise ratio by about 6 dB. Therefore, allocating one bit for each 6 dB of SMR would ensure that quantization noise stays below the masking threshold and is thus inaudible. However, there may not be enough bits available to do this, so bits must be allocated where they are needed most. The water-filling bit allocation algorithm does this by looking for the maximum value of the SMR, allocating a bit to that subband, subtracting 6 dB from the SMR at that frequency, and repeating as long as bits are available to allocate.
2.4 Quantization
After determining where bits should be allocated, the next step is to quantize the audio signal to the appropriate number of bits. This audio coder is based on the Modified Discrete Cosine Transform (MDCT), so the MDCT coefficients are quantized. The MDCT of the original time-domain frame must first be computed. The coefficients must then be attenuated, because MDCT values are typically far larger than the range the quantizer accepts. An attenuation factor equal to the maximum value found in the MDCT is therefore chosen, which reduces the maximum value that needs to be quantized to unity. After the coefficients are attenuated, they are quantized according to the bit allocation scheme determined earlier.
2.5 Reading/Writing the files
Once the MDCT coefficients are quantized, they can be written to a file. In addition to the MDCT coefficients, the gain factor must be specified, as well as the number of bits allocated to each band. In this coder, a file header is also included, which contains the sampling frequency, frame length, bit rate, number of bits used for writing the gain factor, and the number of frames in the file. Because only a few bits are used to represent the gain factor, the logarithm of the gain is written to the file.
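The water-filling procedure of Section 2.3 can be sketched as a short greedy loop. The SMR values and the bit pool below are hypothetical, and this simplified version charges one bit per subband per step, whereas the `allocate` function in the listing charges one bit for every coefficient in the chosen subband.

```matlab
smr  = [24 13 7 -3 18];      % hypothetical SMR per subband (dB)
pool = 8;                    % hypothetical number of bits available
bits = zeros(size(smr));     % bits allocated to each subband

while pool > 0
    [worst, band] = max(smr);      % subband with the largest remaining SMR
    if worst <= 0
        break;                     % remaining noise is already below the mask
    end
    bits(band) = bits(band) + 1;   % give that subband one more bit
    smr(band)  = smr(band) - 6;    % each bit buys about 6 dB of SNR
    pool = pool - 1;
end
```

With these inputs the loop ends with `bits = [3 2 1 0 2]`: the subband whose SMR is negative receives nothing, because its quantization noise is already masked.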
sig=sin(2*pi*1000*[1/Fs:1/Fs:(N/2)/Fs]);
win=(0.5 - 0.5*cos((2*pi*([1:(N/2)]-0.5))/(N/2)));
fftmax = max(abs(fft(sig.*win))); % defined as 96dB

% Enframe Audio
FRAMES = enframe(tone,N,N/2);

% Write File Header
fid = fopen(coded_filename,'w');
fwrite(fid, Fs, 'ubit16'); % Sampling Frequency
fwrite(fid, N, 'ubit12'); % Frame Length
fwrite(fid, bitrate, 'ubit18'); % Bit Rate
fwrite(fid, scalebits, 'ubit4'); % Number of Scale Bits per Sub-Band
fwrite(fid, length(FRAMES(:,1)), 'ubit26'); % Number of frames

% Computations
for frame_count=1:length(FRAMES(:,1))
    if mod(frame_count,10) == 0
        outstring = sprintf('Now Encoding Frame %i of %i', frame_count, length(FRAMES(:,1)));
        disp(outstring);
    end
    fft_frame = fft(FRAMES(frame_count,:));
    if fft_frame == zeros(1,N)
        Gain = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
        bit_alloc = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
    else
        len = length(fft_frame);
        peak_width = zeros(1,len);
        peak_points = cell(len,len);
        peak_min_value = zeros(1,len);

        % Find Peaks
        centers = find(diff(sign(diff( abs(fft_frame).^2) )) == -2) + 1;
        spectral_density = zeros(1,length(centers));
        for k=1:length(centers)
            peak_max(k) = centers(k) + 2;
            peak_min(k) = centers(k) - 2;
            peak_width(k) = peak_max(k) - peak_min(k);
            for j=peak_min(k):peak_max(k)
                if (j > 0) & (j < N)
                    spectral_density(k) = spectral_density(k) + abs(fft_frame(j))^2;
                end
            end
        end

        % This gives the amplitude squared of the original signal
        modified_SD = spectral_density / ((N^2)/8);
        SPL = 96 + 10*log10(modified_SD);

        % TRANSFORM FFT'S TO SPL VALUES
        fft_spl = 96 + 20*log10(abs(fft_frame)/fftmax);

        % Threshold in Quiet
        f_kHz = [1:Fs/N:Fs/2];
        f_kHz = f_kHz/1000;
        A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4;

        % Masking Spectrum
        big_mask = max(A,Schroeder(centers(1)*(Fs/2)/N,fft_spl(centers(1)),...
            14.5+bark(centers(1)*(Fs/2)/N)));
        for peak_count=2:length(centers)
            try
                big_mask = max(big_mask,Schroeder(centers(peak_count)*(Fs/2)/N,fft_spl(centers(peak_count)),...
                    14.5+bark(centers(peak_count)*(Fs/2)/N)));
            catch
                peak_count=peak_count;
            end
        end

        % Signal Spectrum - Masking Spectrum (with max of 0dB)
        New_FFT = fft_spl(1:N/2)-big_mask;
        New_FFT_indices = find(New_FFT > 0);
        New_FFT2 = zeros(1,N/2);
        for i=1:length(New_FFT_indices)
            New_FFT2(New_FFT_indices(i)) = New_FFT(New_FFT_indices(i));
        end

        if frame_count == 55
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],fft_spl(1:N/2),'b');
            hold on;
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],big_mask,'m');
            hold off;
            title('Signal (blue) and Masking Spectrum (pink)');
            figure;
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],New_FFT2);
            title('SMR');
            figure;
            stem(allocate(New_FFT2,bits_per_frame,N,Fs));
            title('Bits perceptually allocated');
        end

        bit_alloc = allocate(New_FFT2,bits_per_frame,N,Fs);
        [Gain,Data] = p_encode(mdct(FRAMES(frame_count,:)),Fs,N,bit_alloc,scalebits);
    end % end of If-Else Statement

    % Write Audio Data to File
    qbits = sprintf('ubit%i', scalebits);
    fwrite(fid, Gain, qbits);
    fwrite(fid, bit_alloc, 'ubit4');
    for i=1:25
        indices = find((floor(fftbark([1:N/2],N/2,Fs))+1)==i);
        qbits = sprintf('ubit%i', bit_alloc(i)); % bits(floor(fftbark(i,framelength/2,48000))+1)
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            fwrite(fid, Data(indices(1):indices(end)) ,qbits);
        end
    end
end % end of frame loop
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFTBARK %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b=fftbark(bin,N,Fs)
% b=fftbark(bin,N,Fs)
% Converts fft bin number to bark scale
% N is the fft length
% Fs is the sampling frequency
f = bin*(Fs/2)/N;
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);
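The listing hard-codes 25 subbands in several places. The following sketch, which assumes a sampling frequency of 48 kHz (the value hard-coded in `Schroeder`) and uses illustrative variable names, shows where that number comes from: it maps every FFT bin of a 2048-sample frame to a critical band using the same Hertz-to-Bark formula.

```matlab
% Hertz-to-Bark conversion, same formula as the listing's bark/fftbark
bark = @(f) 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

Fs = 48000;                       % assumed sampling frequency (Hz)
N  = 2048;                        % frame length in samples
fBin = (1:N/2)*Fs/N;              % frequency of each bin (Hz)
band = floor(bark(fBin)) + 1;     % 1-based critical-band index per bin

numBands    = max(band);                 % number of critical bands in use
binsPerBand = accumarray(band(:), 1)';   % how many bins fall in each band
```

Under these assumptions `numBands` is 25. The low bands contain only a handful of bins each, while the highest bands contain hundreds, which is why the bit cost of a subband grows with frequency.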
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SCHROEDER %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function m=Schroeder(freq,spl,downshift)
% Calculate the Schroeder masking spectrum for a given frequency and SPL
N = 2048;
f_kHz = [1:48000/N:48000/2];
f_kHz = f_kHz/1000;
A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4;
f_Hz = f_kHz*1000;
% Schroeder Spreading Function
dz = bark(freq)-bark(f_Hz);
mask = 15.81 + 7.5*(dz+0.474) - 17.5*sqrt(1 + (dz+0.474).^2);
New_mask = (mask + spl - downshift);
m = New_mask;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BARK %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b=bark(f)
% b=bark(f)
% Converts frequency to bark scale
% Frequency should be specified in Hertz
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ALLOCATE %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=allocate(y,b,N,Fs)
% x=allocate(y,b,N,Fs)
% Allocates b bits to the 25 subbands
% of y (a length N/2 MDCT, in dB SPL)
bits(floor(bark( (Fs/2)*[1:N/2]/(N/2) )) +1) = 0;
for i=1:N/2
    bits(floor(bark( (Fs/2)*i/(N/2) )) +1) = max(bits(floor(bark( (Fs/2)*i/(N/2) )) +1) , ceil( y(i)/6 ));
end
indices = find(bits(1:end) < 2);
bits(indices(1:end)) = 0;
% NEED TO CALCULATE SAMPLES PER SUBBAND
n = 0:N/2-1;
f_Hz = n*Fs/N;
f_kHz = f_Hz / 1000;
A_f = 3.64*f_kHz.^-.8 - 6.5*exp(-.6*(f_kHz-3.3).^2) + 1e-3*f_kHz.^4; % *** Threshold in Quiet
z = 13*atan(0.76*f_kHz) + 3.5*atan((f_kHz/7.5).^2); % *** bark frequency scale
crit_band = floor(z)+1;
num_crit_bands = max(crit_band);
num_crit_band_samples = zeros(num_crit_bands,1);
for i=1:N/2
    num_crit_band_samples(crit_band(i)) = num_crit_band_samples(crit_band(i)) + 1;
end
x=zeros(1,25);
bitsleft=b;
[blah,i]=max(bits);
while bitsleft > num_crit_band_samples(i)
    [blah,i]=max(bits);
    x(i) = x(i) + 1;
    bits(i) = bits(i) - 1;
    bitsleft=bitsleft-num_crit_band_samples(i);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% P_ENCODE %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Quantized_Gain,quantized_words]=p_encode(x2,Fs,framelength,bit_alloc,scalebits)
for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
    indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
    % Per-band gain: next power of two above the largest coefficient
    % (the small offset avoids log2(0) for silent bands)
    Gain(i) = 2^(ceil(log2((max(abs(x2(indices(1):indices(end))+1e-10))))));
    if Gain(i) < 1
        Gain(i) = 1;
    end
    x2(indices(1):indices(end)) = x2(indices(1):indices(end)) / (Gain(i)+1e-10);
    Quantized_Gain(i) = log2(Gain(i));
end
for i=1:length(x2)
    quantized_words(i) = midtread_quantizer(x2(i), max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0)+1e-10); % 03/20/03
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MIDTREAD_QUANTIZER %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_quantizer(x,R)
Q = 2 / (2^R - 1);
q = quant(x,Q);
s = q<0;
ret_value = uint16(abs(q)./Q + s*2^(R-1));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MIDTREAD_DEQUANTIZER %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_dequantizer(x,R)
sign = (2 * (x < 2^(R-1))) - 1;
Q = 2 / (2^R - 1);
x_uint = uint32(x);
x = bitset(x_uint,R,0);
x = double(x);
ret_value = sign * Q .* x;
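The codeword layout used by the quantizer pair above (magnitude in the low bits, sign in the top bit) can be checked with a small round trip. This is a sketch under stated assumptions: it uses `round` in place of the toolbox function `quant`, the variable names are illustrative, and the test inputs are chosen so that no magnitude rounds up to level 2^(R-1), which would collide with the sign bit.

```matlab
R = 4;                         % bits per codeword (example value)
Q = 2/(2^R - 1);               % step size for inputs in [-1, 1]
x = [-0.9 -0.5 0 0.3 0.9];     % normalized test coefficients

% Encode: round to the nearest level, then pack sign and magnitude
level = round(x/Q);                        % signed integer level
code  = abs(level) + (level < 0)*2^(R-1);  % sign stored in the top bit

% Decode: split the top bit off again and rescale
neg  = code >= 2^(R-1);                    % top bit set -> negative value
mag  = code - neg*2^(R-1);                 % remaining R-1 magnitude bits
xhat = (1 - 2*neg) .* mag * Q;             % reconstructed coefficients

maxErr = max(abs(xhat - x));               % at most Q/2 for these inputs
```

For these inputs the codewords are `[15 12 0 2 7]` and the reconstruction error stays within half a quantization step.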
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% P_DECODE %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fs=p_decode(coded_filename,decoded_filename)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% READ FILE HEADER %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fid = fopen(coded_filename,'r');
Fs = fread(fid,1,'ubit16'); % Sampling Frequency
framelength = fread(fid,1,'ubit12'); % Frame Length
bitrate = fread(fid,1,'ubit18'); % Bit Rate
scalebits = fread(fid,1,'ubit4' ); % Number of Scale Bits per Sub-Band
num_frames = fread(fid,1,'ubit26'); % Number of frames
for frame_count=1:num_frames
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % READ FILE CONTENTS %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    qbits = sprintf('ubit%i', scalebits);
    gain = fread(fid,25,qbits);
    bit_alloc = fread(fid,25,'ubit4');
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            qbits = sprintf('ubit%i', bit_alloc(i));
            InputValues(indices(1):indices(end)) = fread(fid, length(indices) ,qbits);
        else
            InputValues(indices(1):indices(end)) = 0;
        end
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % DEQUANTIZE VALUES %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:length(InputValues)
        if InputValues(i) ~= 0
            if max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0) ~= 0
                InputValues(i) = midtread_dequantizer(InputValues(i),...
                    max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0));
            end
        end
    end
    for i=1:25
        gain2(i) = 2^gain(i);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % APPLY GAIN %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        InputValues(indices(1):indices(end)) = InputValues(indices(1):indices(end)) * gain2(i);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % INVERSE MDCT %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    x2((frame_count-1)*framelength+1:frame_count*framelength) = imdct(InputValues(1:framelength/2));
end
status = fclose(fid);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% RECOMBINE FRAMES %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Overlap-add: consecutive decoded frames overlap by half a frame
x3 = zeros(1,(length(x2)-1)/2+1);
for i=0:0.5:floor(length(x2)/(2*framelength))-1
    x3(i*framelength+1 : (i+1)*framelength) = x3(i*framelength+1 : (i+1)*framelength) + x2((2*i)*framelength+1 : (2*i+1)*framelength);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% WRITE FILE %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Note: wavwrite has been removed from recent MATLAB releases;
% audiowrite(decoded_filename, x3/2, Fs) is the replacement there.
wavwrite(x3/2,Fs,decoded_filename);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MDCT %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = mdct(x)
x=x(:);
N=length(x);
n0 = (N/2+1)/2;
wa = sin(([0:N-1]'+0.5)/N*pi);
y = zeros(N/2,1);
x = x .* exp(-j*2*pi*[0:N-1]'/2/N) .* wa;
X = fft(x);
y = real(X(1:N/2) .* exp(-j*2*pi*n0*([0:N/2-1]'+0.5)/N));
y=y(:);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% IMDCT %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = imdct(X)
X=X(:);
N = 2*length(X);
ws = sin(([0:N-1]'+0.5)/N*pi);
n0 = (N/2+1)/2;
Y = zeros(N,1);
Y(1:N/2) = X;
Y(N/2+1:N) = -1*flipud(X);
Y = Y .* exp(j*2*pi*[0:N-1]'*n0/N);
y = ifft(Y);
y = 2*ws .* real(y .* exp(j*2*pi*([0:N-1]'+n0)/2/N));
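The decoder's frame recombination relies on the fact that the aliasing introduced by each MDCT frame cancels when half-overlapping frames are added together. The sketch below demonstrates this with a direct matrix form of the transform rather than the FFT-based routines above. The frame size, test signal, variable names, and the 2/M scaling convention are assumptions of this illustration, so its scaling need not match the listing's `mdct`/`imdct` pair.

```matlab
M = 8;                       % MDCT coefficients per frame (hop size)
N = 2*M;                     % frame length in samples
n = (0:N-1)';                % time index (column vector)
k = 0:M-1;                   % frequency index (row vector)
C = cos(pi/M * (n + 0.5 + M/2) * (k + 0.5));   % N-by-M cosine basis
w = sin(pi*(n + 0.5)/N);     % sine window, used for analysis and synthesis

x = cos(0.37*(1:4*M))';      % arbitrary test signal, four hops long
numFrames = length(x)/M - 1; % number of half-overlapping frames
y = zeros(size(x));          % overlap-add output buffer

for t = 0:numFrames-1
    idx = t*M + (1:N)';                         % samples covered by frame t
    X = C' * (w .* x(idx));                     % forward MDCT: M coefficients
    y(idx) = y(idx) + w .* ((2/M) * (C * X));   % inverse MDCT + overlap-add
end

% Samples covered by two frames are reconstructed exactly; the first
% and last M samples are covered only once and remain aliased.
err = max(abs(y(M+1:end-M) - x(M+1:end-M)));
```

Each frame turns N input samples into only M coefficients, yet the interior of the signal is recovered to within round-off error because the sine window satisfies w(n)^2 + w(n+M)^2 = 1.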