$$FE = \log\left(\int_{0}^{\omega_0} \lvert X(\omega) \rvert^{2}\, d\omega\right) \qquad (1)$$

where $X(\omega)$ denotes the Fourier coefficients and $\omega_0$ is half the sampling rate.
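As a minimal sketch of Equation (1), the integral can be approximated by summing the squared magnitudes of discrete Fourier coefficients from 0 Hz up to half the sampling rate (bins 0 to N/2). The naive DFT, the natural logarithm and the small `eps` guard against silent frames are assumptions made for illustration, not details taken from the paper.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frequency_energy(frame, eps=1e-12):
    """Discrete approximation of Equation (1).

    Sums |X[k]|^2 over bins k = 0 .. N/2 (0 Hz up to half the sampling rate)
    and takes the natural log. ASSUMPTION: `eps` is added so that an all-zero
    frame does not produce log(0).
    """
    spectrum = dft(frame)
    half = len(frame) // 2
    energy = sum(abs(x) ** 2 for x in spectrum[: half + 1])
    return math.log(energy + eps)
```

For a constant frame `[1.0] * 8`, all energy sits in bin 0 (|X[0]| = 8), so `frequency_energy` returns approximately log(64); doubling the amplitude raises the result, as expected for an energy feature.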
The flowchart for extracting the audio fingerprint is shown in Fig. 1.
energy, which does not vary under noise and other forms of distortion. The FFT converts the signal s(k,j) to the frequency domain, S(k,j) = FFT(s(k,j)); after weighted processing, the local energy aggregation properties can be obtained, as given in Equation (3).

SNR = 5; the figure shows that noise interference can generate redundant peaks. This paper therefore introduces a threshold comparison method to find the peaks, which keeps the selected peaks sparse and uniformly distributed. The threshold update is given in Equation (4).
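The framing, transform and threshold-comparison steps described above can be sketched as follows. Because Equations (3) and (4) are not reproduced in this section, the weighting step is omitted and the threshold update is a swapped-in assumption: an exponential moving average of each frame's mean magnitude, scaled by a factor. The parameters `size`, `hop`, `alpha` and `factor` are hypothetical, and a naive DFT stands in for the FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|S(k, j)| for bins k = 0 .. N/2 of one frame (naive DFT standing in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def pick_peaks(signal, size=64, hop=32, alpha=0.8, factor=3.0):
    """Return (frame j, bin k) pairs that are local maxima above an adaptive threshold.

    ASSUMPTION: the threshold is an exponential moving average of each frame's
    mean magnitude, scaled by `factor`; it stands in for the paper's
    Equation (4), which is not reproduced here.
    """
    peaks = []
    threshold = None
    starts = range(0, len(signal) - size + 1, hop)
    for j, start in enumerate(starts):
        mag = magnitude_spectrum(signal[start:start + size])
        level = factor * sum(mag) / len(mag)
        # Update the threshold before comparing; the first frame initialises it.
        if threshold is None:
            threshold = level
        else:
            threshold = alpha * threshold + (1 - alpha) * level
        # Keep only interior bins that are local maxima and exceed the threshold.
        for k in range(1, len(mag) - 1):
            is_local_max = mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]
            if is_local_max and mag[k] > threshold:
                peaks.append((j, k))
    return peaks
```

For a sinusoid completing exactly 8 cycles per 64-sample frame, `pick_peaks` over 256 samples yields one peak at bin 8 in each of the 7 frames, while the near-zero leakage bins stay below the threshold.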
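Table 1. Results of comparing Shazam with the proposed approach

| Approach          | ID  | Matching number | Start frame | Value (i=1) |
|-------------------|-----|-----------------|-------------|-------------|
| Shazam approach   | 481 | 11              | 3357        | 0           |
| Proposed approach | 481 | 48              | 3358        | -1          |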
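Table 1 reports, for each approach, a matched ID, a matching number and a start frame. One common way to obtain such quantities is the offset-histogram voting used by Shazam: every hash shared by the query and a database song votes for the time offset between them, and the tallest bin gives the song, the number of matches and the start frame. The sketch below assumes that reading of the columns; the `(hash, frame)` data format and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each hash to the (song_id, frame) pairs where it occurs.

    `songs` maps a song id to a list of (hash, frame) pairs (hypothetical format).
    """
    index = defaultdict(list)
    for song_id, pairs in songs.items():
        for h, frame in pairs:
            index[h].append((song_id, frame))
    return index


def best_match(index, query):
    """Vote on (song_id, offset); return (song_id, match_count, start_frame) or None.

    For a true match, many hashes agree on the same offset between the
    database frame and the query frame, so one histogram bin stands out.
    """
    votes = Counter()
    for h, q_frame in query:
        for song_id, db_frame in index.get(h, []):
            votes[(song_id, db_frame - q_frame)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, count, offset
```

With a query whose six hashes all appear in song 1 starting at frame 100, `best_match` returns `(1, 6, 100)`; scattered chance matches in other songs each land in different offset bins and cannot outvote it.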
So far, the proposed scheme has not been evaluated under a variety of noises and degradations. It is possible that the minimal bit error rate will rise significantly and the recognition accuracy will decline correspondingly. In future work we will focus on testing with degraded audio samples and improving the proposed algorithm accordingly.
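The minimal bit error rate mentioned above can be illustrated with a short sketch. It assumes binary fingerprints stored as 32-bit sub-fingerprint words, as in Haitsma and Kalker's scheme: the query block is slid over the reference, the fraction of differing bits is computed at each offset, and the lowest value is kept. The word size and function names are assumptions, not the paper's implementation.

```python
def bit_error_rate(a, b, bits=32):
    """Fraction of differing bits between two equal-length lists of sub-fingerprints."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty blocks of equal length")
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (bits * len(a))


def minimal_bit_error_rate(query, reference, bits=32):
    """Slide the query over the reference; return (lowest BER, offset), or None if it does not fit."""
    best = None
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        ber = bit_error_rate(query, window, bits)
        if best is None or ber < best[0]:
            best = (ber, offset)
    return best
```

A query `[0xFFFFFFFF, 0x0F0F0F0E]` against the reference `[0, 0xFFFFFFFF, 0x0F0F0F0F, 1]` differs in only one bit at offset 1, giving `(1/64, 1)`; degrading the query audio would raise this minimum, which is the effect the planned experiments would measure.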
7. Acknowledgements
This work was supported in part by Shanghai’s Key
Discipline Development Program under Grant No.
J50104. It was also supported in part by Datasentric
Inc., 38660 Lexington Street No. 440, Fremont, CA,
USA.