
International Journal of Scientific Research Engineering & Technology (IJSRET)

Volume 1, Issue 2, pp. 011-015, May 2012, www.ijsret.org, ISSN 2278-0882

Musical Instrument Transcription Using Neural Network


V. S. Shelar¹, D. G. Bhalke²
¹,² JSPM's Rajarshi Shahu College of Engineering, Tathwade, Pune, Maharashtra, India
¹Email: varsha3005@gmail.com  ²Email: bhalkedg2000@yahoo.co.in

ABSTRACT
Automatic music transcription (AMT) is a challenging topic in audio signal processing. Music transcription is the process of converting a musical recording into a musical score (i.e. a symbolic representation). The system discussed in this paper takes as input the sound (.wav file) of recorded monophonic music and produces a conventional musical representation as output. In monophonic music, notes of a single instrument are played one by one. The system is composed of two phases: a training phase and a testing phase. In the training phase a reference feature vector is obtained from various extracted features, and in the testing phase transcription is achieved with the help of an artificial neural network. In this paper we work on the C note of musical instruments.

Keywords: Feature extraction, music transcription.

I. INTRODUCTION

Musical instrument transcription is the process of converting audio signals of performed music into a symbolic representation of music scores; it consists of transforming the musical content of audio data into a symbolic notation. Musical instrument recognition is also useful for automatic music transcription (AMT). Automatic music transcription is used in audio watermarking and in visualizations for media players, where the music score is displayed during playback. It can also be used to monitor students playing musical instruments, by transcribing the music played by the student and evaluating it against the standard score. Transcription also plays an indispensable role in content-based music retrieval, such as query by humming. The automatic transcription of real-world music is an extremely challenging task in audio signal processing.

We must distinguish two cases in which the behaviour of automatic transcription systems differs: monophonic music and polyphonic music. In monophonic music, notes of a single instrument are played one by one; in polyphonic music, two or more notes of one or several instruments can be played simultaneously. The system discussed in this paper takes as input the sound of recorded monophonic music and produces a conventional musical representation as output. Over the years, considerable work has been done in music transcription, and various approaches to automatic music transcription have been reported. [1] describes a transcription system consisting of two main components: musical-feature extractors and probabilistic models; the musical-feature extractors include pitch, voicing, phenomenal-accent and metrical-accent estimators. [4] reviews an automatic music transcription system in which two main characteristics are considered for each note event: the attack instant and the pitch; onset detection is obtained through a time-frequency representation of the audio signal, and note classification is based on the constant Q transform (CQT) and support vector machines (SVMs).



Matija Marolt [5] introduces artificial neural networks acting as pattern recognisers to extract notes from the output of an oscillator network. The system consists of three main stages: first, a cochlear model based on the gammatone filter bank transforms an audio signal of a piano performance into time-frequency space; second, a network of coupled adaptive oscillators extracts partial tracks from the output of the cochlear model; and third, artificial neural networks acting as pattern recognisers extract notes from the output of the oscillator network. A. Taylan Cemgil, Hilbert J. Kappen and David Barber [10] present a graphical model for polyphonic music transcription. It provides a clear framework in which high-level prior information on music structure can be coupled with low-level information in a principled manner to perform the analysis. The model is a special case of the generally intractable switching Kalman filter model, and a piano-roll representation is used for the transcription. From this literature survey we conclude that automatic music transcription basically consists of time-frequency analysis, from which features are extracted to form a feature vector; finally, with the help of a neural network, a note sequence can be obtained.
II. METHODOLOGY

Music transcription is the process of converting audio signals into a symbolic representation of music scores. The project work is divided into two phases: a training phase and a testing phase. Fig. 1 shows the music transcription process.

Fig. 1: Music Transcription Process

III. PROPOSED METHOD

Fig. 2 shows the block diagram of the proposed music transcription system.


Fig. 2: Block diagram of the music transcription system

3.1 Pre-Processing: The pre-processing is simply a normalization, scaling the sampled sound-file data to fall within the range -1 to 1. The normalized audio signal is then used for feature extraction: musical features are extracted, the feature vector is formed from the extracted features, and this feature vector is given to the neural network to obtain the transcription of the music.

3.2 Feature Extraction: The purpose of feature extraction is to obtain the relevant information from the input data needed to execute a certain task using a desired set of features.
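As a minimal sketch (not from the paper), the pre-processing step can be written in MATLAB as follows; the file name is a placeholder:

% Pre-processing: scale the sampled sound file to the range [-1, 1].
% 'note_C4.wav' is a placeholder name for a recorded monophonic note.
[x, fs] = audioread('note_C4.wav');   % samples and sampling rate (Hz)
x = x(:, 1);                          % keep one channel if stereo
x = x / max(abs(x));                  % peak normalization to [-1, 1]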


Fig. 3: Description of the feature extraction process

3.2.1 Spectral Features (Timbral Features):

3.2.1.1 Spectral Flux: Spectral flux reflects the rate of change of the power spectrum; it is a measure of how quickly the power spectrum changes from frame to frame. It is calculated by comparing the power spectrum of one frame with the power spectrum of the previous frame, for example as the Euclidean distance between the normalized spectra. Spectral flux can be used to characterize the timbre of an audio signal.
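A hedged MATLAB sketch of frame-to-frame spectral flux follows; the frame length, hop size and Hamming window are illustrative choices, and x is the normalized signal from the pre-processing step:

% Spectral flux: Euclidean distance between the normalized magnitude
% spectra of consecutive frames (parameters are assumptions).
N = 1024; hop = 512;                              % frame and hop sizes
nFrames = floor((length(x) - N) / hop) + 1;
flux = zeros(max(nFrames - 1, 0), 1);
prev = [];
for k = 1:nFrames
    frame = x((k-1)*hop + (1:N)) .* hamming(N);   % windowed frame
    mag = abs(fft(frame));
    mag = mag(1:N/2) / (norm(mag(1:N/2)) + eps);  % normalized spectrum
    if k > 1
        flux(k-1) = norm(mag - prev);             % change from last frame
    end
    prev = mag;
end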




3.2.1.2 Spectral Centroid: This parameter characterizes the spectrum of the signal: it indicates the location of the centre of gravity of the magnitude spectrum and can be evaluated as the magnitude-weighted mean of the spectral frequencies. Perceptually, it gives an impression of the brightness of a sound. We take the FFT of the signal segment and find the average energy distribution in the steady-state portion of the tone. The centroid is a measure of spectral shape; higher centroid values correspond to brighter textures with more high frequencies.

3.2.1.3 Spectral Roll-off: Spectral roll-off is defined as the frequency bin M below which 85% of the magnitude distribution is concentrated. This is one more measure of spectral shape.

3.2.2 Time Domain Features (Temporal Features): Time domain features are related to the shape of the envelope of a note; they are measured as the attack time (A), decay time (D), sustain time (S) and release time (R). Fig. 4 below shows the ADSR envelope.
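As a rough illustration (not the paper's implementation), the amplitude envelope from which the ADSR segments are read can be estimated in MATLAB, assuming the Signal Processing Toolbox for hilbert; the smoothing window length is an assumption:

% Amplitude envelope of a note; the ADSR segments can be read off the
% smoothed envelope.
env = abs(hilbert(x));                     % analytic-signal magnitude
env = filter(ones(512, 1) / 512, 1, env);  % moving-average smoothing
[~, pkIdx] = max(env);                     % envelope peak
attackTime = pkIdx / fs;                   % rough attack-time estimate (s)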

Fig. 4: ADSR envelope

3.3 Feature Vector: Choosing the right features is an essential task in the instrument transcription system, and this can be accomplished through trial and error. In our proposed system we employ a feature selection technique to find the ideal feature set for the transcription system. First, a single feature is extracted for N samples; then the feature values of the N samples are compared with each other, and if the difference is measurable the feature is selected for the transcription system. This process continues until all features have been tested, and the combination of all selected features forms the desired feature set. The extracted features are normalized by their means and standard deviations, and a sequential forward selection method is then used to select the best feature subset: the best single feature is selected first, based on the transcription accuracy it can provide.

3.4 Neural Network: The feed-forward neural network begins with an input layer, which must be connected to a hidden layer. This hidden layer can then be connected to another hidden layer or directly to the output layer. There can be any number of hidden layers as long as at least one is provided; in common use most neural networks have only one hidden layer, and it is very rare for a neural network to have more than two.
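A minimal sketch of such a one-hidden-layer network using MATLAB's Neural Network Toolbox follows; the hidden-layer size of 10 and the variables X and T are assumptions:

% One-hidden-layer feed-forward network (illustrative).
% X: features-by-samples matrix of feature vectors (assumed).
% T: corresponding target note labels, e.g. one-hot columns (assumed).
net = feedforwardnet(10);        % single hidden layer with 10 neurons
net = train(net, X, T);          % backpropagation training
Y = net(X);                      % network response to the inputs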

IV. RESULT

We use the standard McGill University Master Samples database. Three musical instruments are used for automatic music transcription, namely piano, guitar and violin.

4.1 Spectral Features Extraction: All programming is done in MATLAB. Here we extract the spectral features: spectral roll-off, spectral flux and spectral centroid. The following figures show the spectra of musical notes, from which the spectral roll-off, spectral flux and spectral centroid are calculated.
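As an illustrative MATLAB sketch (continuing from the normalized signal x and sampling rate fs above; taking a single 1024-sample frame is an assumption), the centroid and 85% roll-off of one frame could be computed as:

% Spectral centroid and 85% roll-off of one windowed frame.
N = 1024;
frame = x(1:N) .* hamming(N);              % one frame (assumed position)
mag = abs(fft(frame));
mag = mag(1:N/2);                          % positive frequencies only
f = (0:N/2-1)' * fs / N;                   % frequency axis in Hz
centroid = sum(f .* mag) / sum(mag);       % magnitude-weighted mean
c = cumsum(mag);
rolloff = f(find(c >= 0.85 * c(end), 1));  % 85% roll-off frequency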


Fig. 5: Spectrum of Violin C4

Fig. 6: Spectrum of Guitar C4

Fig. 7: Spectrum of Piano C4

4.2 Temporal Feature Extraction: Here we extract the temporal features, i.e. the time domain features: attack time (A), decay time (D), sustain time (S) and release time (R). The following figures show the ADSR envelopes; from the shape of the ADSR envelope we can identify the correct family of musical instrument.

Fig. 8: ADSR envelope of Piano C4

Fig. 9: ADSR envelope of Guitar C4

Fig. 10: ADSR envelope of Violin C4

4.3 Fundamental Frequency F0 Using FFT: The algorithm to find the fundamental frequency F0 is as follows (a MATLAB sketch is given after the list):
i. Read the .wav file.
ii. Plot the waveform.
iii. Take the FFT of the signal.
iv. Plot the spectrum of the signal.
v. Read F0 from the spectrum.
The values of the extracted features are compared, and the results are shown in Table I below.
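A hedged MATLAB sketch of steps i-v; the file name is a placeholder, and picking the strongest spectral peak as F0 assumes the fundamental dominates the spectrum:

% F0 estimation following steps i-v above.
[x, fs] = audioread('piano_C4.wav');   % i. read the .wav file (placeholder)
x = x(:, 1);
plot((0:length(x)-1)/fs, x);           % ii. plot the waveform
X = abs(fft(x));                       % iii. FFT of the signal
f = (0:length(x)-1) * fs / length(x);  % frequency axis in Hz
half = floor(length(x)/2);
figure; plot(f(1:half), X(1:half));    % iv. plot the spectrum
[~, idx] = max(X(2:half));             % strongest peak, skipping DC
F0 = f(idx + 1);                       % v. fundamental frequency (Hz)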

TABLE I Identification of Notes Using Standard Fundamental Frequency F0



Table I shows the fundamental frequency f0 of piano, violin and guitar compared with the standard fundamental frequency f0; this comparison is used to identify the exact notes of the instrument.
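As an illustrative sketch of this comparison (assuming equal temperament with A4 = 440 Hz; F0 is the measured fundamental from the previous step):

% Map a measured F0 to the nearest equal-tempered note name.
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
midi = 69 + round(12 * log2(F0 / 440));  % nearest MIDI note number
note = names{mod(midi, 12) + 1};         % pitch class
octave = floor(midi / 12) - 1;           % octave (C4 = MIDI 60)
fprintf('Nearest note: %s%d\n', note, octave);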

V. CONCLUSION

We have presented an automatic music transcription system for three musical instruments, namely piano, guitar and violin, working on the C note of each instrument. It is necessary to extract all the features to obtain the reference feature vector, but here only the results for the spectral and temporal features are shown. The temporal features are related to the shape of the envelope of a note, and from the spectral and temporal features we can find the correct family of musical instrument. For pitch estimation we find the fundamental frequency F0, and from F0 we can find the correct note of the musical instrument. As the results above show, the system identifies the correct note of the musical instrument with 100% accuracy. The whole system is implemented with the help of an artificial neural network.

REFERENCES

[1] Md. Omar Faruqe, S. Ahmad, Md. Al-Mehedi Hasan and Farazul H. Bhuiyan, "Template Music Transcription for Different Types of Musical Instruments," IEEE, 2010.
[2] Mahdi Triki and Dirk T. M. Slock, "Perceptually Motivated Quasi-Periodic Signal Selection for Polyphonic Music Transcription," ICASSP, 2009.
[3] Filipe C. da C. B. Diniz, Luiz W. P. Biscainho and Sergio L. Netto, "Practical Design of Filter Banks for Automatic Music Transcription," ISPA, 2007.
[4] Giovanni Costantini, Massimiliano Todisco, Renzo Perfetti, Roberto Basili and Daniele Casali, "SVM Based Transcription System with Short-Term Memory Oriented to Polyphonic Piano Music," IEEE, 2010.
[5] Matija Marolt, "Transcription of Polyphonic Piano Music with Neural Networks," IEEE Mediterranean Electrotechnical Conference (MELECON), Vol. 11, 2000.
[6] Zheng Guibin and Liu Sheng, "Automatic Transcription Method for Polyphonic Music Based on Adaptive Comb Filter and Neural Network," IEEE Conference, 2007.
[7] Matija Marolt, "A Connectionist Approach to Automatic Transcription of Polyphonic Piano Music," IEEE Conference, 2004.

[8] Kenichi Miyamoto, Hirokazu Kameoka, Haruto Takeda, Takuya Nishimoto and Shigeki Sagayama, "Probabilistic Approach to Automatic Music Transcription from Audio Signals," IEEE Conference, 2007.
[9] G. Costantini, A. Rizzi and D. Casali, "Recognition of Musical Instruments by Generalized Min-Max Classifiers," IEEE Workshop on Neural Networks for Signal Processing, 2003.
[10] A. Taylan Cemgil, Hilbert J. Kappen and David Barber, "A Generative Model for Music Transcription," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 2, March 2006.
[11] Shaila D. Apte, Speech Signal Processing, Wiley India, 2012.
