

AUTO GENERATION OF MUSIC USING GAN
Dharineesh Ram T.P., Durga, Sampath Raju, Shivangi Shukla

under the guidance of

Mr. Anshuman Agarwal, Mr. Naman

Abstract—Music is beauty produced using instruments, and it defines a sense of harmony and emotion. People create their own music and songs and enjoy the pleasantness it creates. When it comes to music generation, it is the famous composers who come to mind. But it is not only humans who can imagine composing music; computers and machines can also define a new meaning for music generation. In this study we use a specialized neural network known as the Generative Adversarial Network (GAN). GANs have created a new approach to designing and creating real-life structures such as images, music and so on. In this paper we will see how to generate music using GANs, where we exclusively use MuseGAN.

I. INTRODUCTION

Music evolved with a dual sense, as we need both ears and mouth to feel its essence. Humans have found music to be the ultimate means of reducing stress and calming the mind. Music can broadly be classified as Western and Classical. Both of these classifications have their own sets of instruments that can be used for their generation, namely the guitar, violin, piano, flute and so on. Some have the passion to generate music, while others are more interested in simply listening to it. To generate music we require the coordination of every person in the group so that the resultant music is pleasant to hear. But fundamentally, music is time dependent: the frequencies at a particular moment decide what kind of tone/beat is produced at that instant. We know that the human range of hearing is 20 Hz to 20 kHz; anything beyond or below this range cannot be sensed by us. Therefore, to us, music is not only an invisible sense but also a matter of frequency.

Humans have the ability to generate music using instruments and vocals. But a computer cannot do so using instruments, as it relies only on frequency; computers understand only the math behind the music. That is the reason why we are able to hear music played on computers. Now, to generate music, we need to impart a composer into the machine, i.e. the machine works like a pseudo-composer. To do so, the first and foremost step is to feed data to the machine such that it understands well how to generate music, and then it takes steps to generate it. We make use of the Generative Adversarial Network (GAN), a special neural network, where we train on real music and then generate samples of music. A neural network basically consists of neurons organized layer-wise, and the fundamental duty of the neurons is to mimic the working of the brain: based on the data and the purpose to be served, the neurons learn and perform the required task. In our case, the task is to generate music.

II. RELATED WORK

Many studies in recent history have shown that computer-based generation has improved in accuracy as a result of changes in the architecture of neural networks, fine-tuning of hyperparameters, and even new networks created specially to serve a particular purpose. In the field of music generation, the RNN (Recurrent Neural Network) played a major role, as RNNs were used extensively for sequential data generation. But after the introduction of GANs by Ian J. Goodfellow, a new way of generating/creating objects started to pick up. GANs were originally introduced for image generation, a different application domain altogether, but researchers later found that GANs could be applied to almost anything related to generation. Since RNNs have already played a major role in handling sequential data, we can make use of an RNN and infuse it into the GAN structure for better results.

The LSTM (Long Short-Term Memory), an RNN variant, was used to generate small pieces of music samples. This is a plain method where we just use the previous state to define the next state. Similarly, GANSynth is a methodology which uses spectral information for the generation of music
and it is found to have high accuracy. It uses image-based spectral information to train the network and to understand the frequency changes, the evolution of crests and troughs, and the cycles of the waves. But in this paper we use MuseGAN, another GAN-based approach for music generation. This approach has a more complex structure: it trains the model using data collected from the Lakh Pianoroll Dataset to generate pop song phrases consisting of bass, drums, guitar, piano and strings tracks.

III. METHODOLOGY

GAN (Generative Adversarial Network):
The GAN consists of two neural networks, namely the Generator and the Discriminator. These two networks define the overall structure of the GAN. Their structures and functions are totally different, but they are interlinked, i.e. they depend on each other for training, as shown in Fig 1.

Generator:
The Generator is the network that produces the result as per our requirement. The Generator receives an input vector (noise) that is passed through the Generator's series of layers and produces some values/entities based on that input vector.

Discriminator:
The Discriminator is the network that teaches the Generator to produce the original entity. Once values are generated by the Generator, they are passed to the Discriminator. The Discriminator is trained on both the ground-truth values and the fake values, so as to make the Generator understand more about the original values. Hence the Discriminator learns the difference between real and fake samples and drives the update of the Generator's weights.

The Generator then produces a new output, which is again fed to the Discriminator, and this procedure is repeated until we achieve a proper output from the Generator (one which closely resembles the ground-truth data).

Fig 1: Generative Adversarial Neural Network

GAN Models:
There are basically three GAN models for music generation.

a) Jamming Model:
As shown in Fig 2(a), multiple generators work independently, each generating music for its own track from a private random vector z(i), i = 1, 2, ..., M, where M denotes the number of generators (or tracks). Therefore, to generate M tracks we use M generators and M discriminators.

Fig 2(a): Jamming Model

b) Composer Model:
One single generator creates a multichannel piano-roll, with each channel representing a specific track, as shown in Fig 2(b). Hence an entire sequence is generated by a single generator and is evaluated by a single discriminator.

Fig 2(b): Composer Model

c) Hybrid Model:
Combining the ideas of jamming and composing, we further propose the hybrid model. As illustrated in Fig 2(c), each of the M generators takes as input an inter-track random vector z and an intra-track random vector z(i). This model relies on a single discriminator for all of the tracks generated by the generators.

Fig 2(c): Hybrid Model
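The alternating Generator/Discriminator training described above can be sketched in a minimal, self-contained toy example. This is an illustrative sketch only, not the MuseGAN implementation: we use a 1-D "feature" distribution in place of music, a linear generator, a logistic discriminator, and hand-derived gradients; all names and constants below are our own assumptions.

```python
import math
import random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Toy "real data": a 1-D feature drawn from N(3, 1).
def real_batch(n):
    return [random.gauss(3.0, 1.0) for _ in range(n)]

# Generator g(z) = a*z + b; Discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch, steps = 0.05, 64, 2000

for _ in range(steps):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    fakes = [a * random.gauss(0.0, 1.0) + b for _ in range(batch)]
    gw = gc = 0.0
    for x in real_batch(batch):
        p = sigmoid(w * x + c)
        gw += -(1.0 - p) * x        # d/dw of -log D(x)
        gc += -(1.0 - p)
    for g in fakes:
        p = sigmoid(w * g + c)
        gw += p * g                 # d/dw of -log(1 - D(g))
        gc += p
    w -= lr * gw / (2 * batch)
    c -= lr * gc / (2 * batch)

    # Generator step: push D(fake) toward 1 (non-saturating loss).
    ga = gb = 0.0
    for _ in range(batch):
        z = random.gauss(0.0, 1.0)
        p = sigmoid(w * (a * z + b) + c)
        ga += -(1.0 - p) * w * z    # d/da of -log D(g(z))
        gb += -(1.0 - p) * w
    a -= lr * ga / batch
    b -= lr * gb / batch

# After training, generated samples should drift toward the real mean.
fake_mean = sum(a * random.gauss(0.0, 1.0) + b for _ in range(1000)) / 1000
```

MuseGAN replaces the scalars here with piano-roll tensors and convolutional networks, but the alternating two-player update is the same.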

MUSEGAN:

We now present MuseGAN, an integration and extension of the proposed multi-track and temporal models. As shown in Fig 3, the input to MuseGAN, denoted z, is composed of four parts: an inter-track time-independent random vector z, intra-track time-independent random vectors z(i), an inter-track time-dependent random vector z(t), and intra-track time-dependent random vectors z(i,t). For track i (i = 1, 2, ..., M), the shared temporal structure generator Gtemp and the private temporal structure generator Gtemp,i take the time-dependent random vectors z(t) and z(i,t), respectively, as their inputs, and each of them outputs a series of latent vectors containing inter-track and intra-track temporal information, respectively. The output series of latent vectors, together with the time-independent random vectors z and z(i), are concatenated and fed to the bar generator Gbar, which then generates piano-rolls sequentially. For the track-conditional scenario, an additional encoder E is responsible for extracting useful inter-track features from the user-provided track; this can be done analogously, so we omit the details due to space limitation.
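The composition of the four latent parts and the two temporal generators can be sketched at the shape level. The stub generators, latent dimension, and track/bar counts below are illustrative assumptions of ours, not the actual MuseGAN networks; only the wiring (four latent parts, shared vs. private temporal generators, per-bar generation) follows the description above.

```python
import random

random.seed(1)
M, T = 5, 4        # number of tracks and number of bars (illustrative)
DIM = 8            # length of each latent part (illustrative)
H, W = 128, 96     # piano-roll height (pitches) and width (time steps per bar)

def noise(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

# The four latent parts that make up MuseGAN's input z:
z    = noise(DIM)                          # inter-track, time-independent
z_i  = [noise(DIM) for _ in range(M)]      # intra-track, time-independent
z_t  = noise(DIM)                          # inter-track, time-dependent seed
z_it = [noise(DIM) for _ in range(M)]      # intra-track, time-dependent seeds

# Stub temporal generators: map a seed to one latent vector per bar.
def G_temp(seed):                          # shared across all tracks
    return [[v + 0.1 * t for v in seed] for t in range(T)]

def G_temp_i(seed):                        # private to one track
    return [[v - 0.1 * t for v in seed] for t in range(T)]

shared_series  = G_temp(z_t)               # T latent vectors
private_series = [G_temp_i(s) for s in z_it]   # M x T latent vectors

# Stub bar generator: a concatenated latent -> one binary H x W piano-roll bar.
def G_bar(latent):
    rng = random.Random(sum(latent))
    return [[1 if rng.random() < 0.02 else 0 for _ in range(W)] for _ in range(H)]

song = []
for i in range(M):                         # for each track...
    track = []
    for t in range(T):                     # ...generate its bars one by one
        latent = z + z_i[i] + shared_series[t] + private_series[i][t]
        track.append(G_bar(latent))
    song.append(track)
```

The result is an M x T collection of 128 x 96 binary bars, i.e. one multi-track piano-roll phrase.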
Fig 4(a): Piano-roll representation of Bernoulli sampling

Fig 4(b): Piano-roll representation of hard thresholding

Fig 3: MuseGAN Architecture
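The two post-processing strategies named in Fig 4 — Bernoulli sampling and hard thresholding — turn the generator's real-valued piano-roll outputs into the binary rolls used throughout this work. A minimal sketch, with a toy 4-pitch x 6-step grid of note-on probabilities that we made up for illustration:

```python
import random

random.seed(2)

# Toy real-valued generator output: note-on probabilities per (pitch, step).
probs = [[0.9, 0.1, 0.6, 0.4, 0.05, 0.8],
         [0.2, 0.7, 0.5, 0.9, 0.30, 0.1],
         [0.8, 0.2, 0.4, 0.1, 0.95, 0.6],
         [0.1, 0.9, 0.3, 0.7, 0.20, 0.4]]

def hard_threshold(roll, thresh=0.5):
    # Deterministic: a note is on wherever its value exceeds the threshold.
    return [[1 if p > thresh else 0 for p in row] for row in roll]

def bernoulli_sample(roll, rng=random):
    # Stochastic: each cell is on with probability equal to its value.
    return [[1 if rng.random() < p else 0 for p in row] for row in roll]

binary_ht = hard_threshold(probs)
binary_bs = bernoulli_sample(probs)
```

Hard thresholding always yields the same roll for a given output, while Bernoulli sampling yields a different plausible roll on each draw, which is why the two variants in Fig 4 look different.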

DATASET:

The piano-roll dataset we use in this work is derived from the Lakh MIDI Dataset (LMD) (Raffel 2016), a large collection of 176,581 unique MIDI files. We convert the MIDI files to multi-track piano-rolls. For each bar, we set the height to 128 and the width (time resolution) to 96 for modeling common temporal patterns such as triplets and 16th notes. We use the Python library pretty_midi (Raffel and Ellis 2014) to parse and process the MIDI files. These MIDI files are converted into piano-rolls which consist of only binary values. Hence our final output is also a binary piano-roll, which has to be converted back into MIDI files.

IV. EXPERIMENTAL RESULTS

The model was trained starting from a pretrained model whose weights and model files were imported. Based on this pretrained model we obtained different samples of music files. The resultant music was produced using either Bernoulli sampling or hard thresholding, and the generated music was found to closely resemble human-composed music. Fig 4(a), (b), (c) and (d) show the piano-roll representations of the music generated by MuseGAN.

Fig 4(c): Piano-roll representation of hard thresholding

A web application was developed where we can click the generate button to generate music. This produces the music generated by the GAN, and the user can experience the music performed by the machine. The Flask framework was used for this purpose.

V. CONCLUSION

The objective metrics and the subjective user study show that the proposed models can start to learn something about music. Although musically and aesthetically it may still fall behind the level of human musicians, the proposed model has a few desirable properties, and we hope follow-up research can further improve it.

We have produced music without any prior knowledge of music or instruments; this is the remarkable part of neural networks. We never learnt how to generate music, yet the whole thing was done by our neural network. We have found that not only humans but also machines can be trained to compose music, and no one other than a machine can compose a piece of music within a few minutes. This is the major advantage of neural networks: they can produce music to our liking. This work can be further extended to generate music based on genre. Music could also be generated based on the music in our playlists, so that the GANs produce music to our taste.

VI. ACKNOWLEDGMENT

We thank Bennett University for providing the opportunity and faculty support for the completion of this project.

Fig 4(d): Piano-roll representation of Bernoulli sampling



REFERENCES

[1] Dong, Hao-Wen, et al. "MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment." Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[2] Dong, Hao-Wen, and Yi-Hsuan Yang. "Convolutional generative adversarial networks with binary neurons for polyphonic music generation." arXiv preprint arXiv:1804.09399, 2018.

[3] Dong, Hao-Wen, et al. "MuseGAN: Demonstration of a convolutional GAN based model for generating multi-track piano-rolls." Proceedings of the International Society for Music Information Retrieval Conference, 2017.

[4] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems, 2014.

[5] Choi, Keunwoo, et al. "Convolutional recurrent neural networks for music classification." 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017.

[6] Engel, Jesse, et al. "GANSynth: Adversarial neural audio synthesis." arXiv preprint arXiv:1902.08710, 2019.
