Griffin Barich
QAC 241
For my final project in this class I looked at the network of artists based
on the related artists from Spotify. Spotify is a music streaming company that
any occasion. It also allows smaller artists to get their music listened to more. I
have used Spotify for many years and loved the freedom it has given me to
listen to all types of music from Bach to Britney. When listening to music, I
find that Spotify does a really good job of finding related music so I can always
find new and interesting music. This sparked my interest in how Spotify finds
these artists. Do they use genres? Do they use user data? Do they use data
from the artist’s songs? Using a network of related artists, I took the first step
toward finding out.
The first step in creating this network was finding the data. Luckily,
Spotify has an easy-to-use API. It works by sending GET requests, authorized
with a token, to specific endpoints in order to get data about artists,
their albums, songs, users, and some relevant metadata. All the data is
returned in JSON format and is relatively easy to parse. I had some trouble
found that many people had created packages in R that contain functions
specifically for Spotify querying. The package that was best for my uses was
Rspotify by Tiago Dantas. The package allows the user to authenticate their
token with a function and do many searches through functions. While this
package initially made life easier, the functions inside are very rigid. The
functions dealt with the majority of the data I needed well, but broke down at
any slight weirdness in the dataset. The Rspotify package was an essential, but
I wanted to gain the network of related artists for a single artist, but I
process into a function. The function uses the getRelated2() function to get the
related artists of the given artist, iterates over the related artists to collect the
name, id, and popularity of each, then repeats the process for the requested number of
layers. This left me with a data frame of all the artists and their data. From
here I was able to make a network that connected artists and related artists. At
this point, I iterated over the data frame and added a node characteristic which
where the artists were connected based on genre. I then exported both graphs
into Cytoscape and created a custom style that would fit all of the graphs.
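The layer-by-layer collection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the function from the appendix: `toy_related`, `fetch_related()`, and `crawl_related()` are hypothetical names, and the fetcher reads from a small in-memory table instead of calling the Spotify API, so it runs without a token. The artist label column is called `artist` here only to keep it separate from the vertex name that igraph derives from the first column.

```r
library(igraph)

# Hypothetical stand-in for an API call such as getRelated2(): a tiny
# in-memory table mapping each artist id to its related artist ids.
toy_related <- list(
  a = c("b", "c"),
  b = c("a", "d"),
  c = c("d"),
  d = character(0)
)

# Returns a data frame of related artists for one id (possibly with 0 rows).
fetch_related <- function(id) {
  ids <- toy_related[[id]]
  if (is.null(ids)) ids <- character(0)
  data.frame(id = ids, artist = toupper(ids),
             popularity = rep(50, length(ids)), stringsAsFactors = FALSE)
}

# Expand every artist in the current layer, record the edges, and use the
# newly seen artists as the next layer.
crawl_related <- function(seed_id, layers, fetch = fetch_related) {
  nodes <- data.frame(id = seed_id, artist = toupper(seed_id), popularity = 50,
                      stringsAsFactors = FALSE)
  edges <- data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE)
  frontier <- seed_id
  for (layer in seq_len(layers)) {
    next_frontier <- character(0)
    for (id in frontier) {
      rel <- fetch(id)
      if (nrow(rel) == 0) next
      edges <- rbind(edges, data.frame(from = id, to = rel$id,
                                       stringsAsFactors = FALSE))
      # Only artists not seen before become new nodes and get expanded later.
      new_rows <- rel[!(rel$id %in% nodes$id), ]
      nodes <- rbind(nodes, new_rows)
      next_frontier <- c(next_frontier, new_rows$id)
    }
    frontier <- unique(next_frontier)
  }
  graph_from_data_frame(d = edges, directed = TRUE, vertices = nodes)
}

g_toy <- crawl_related("a", layers = 2)
```

Checking for already-seen ids while crawling keeps the node table free of duplicates, so no separate deduplication pass is needed before building the graph. With a real API the fetcher would be replaced by an authenticated request.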
to make some interesting conclusions. The first and most obvious conclusion is
that genre is very closely tied to related artists. In Cytoscape, the Force
diffusion layout couldn’t separate the nodes. The connections were so strong
that the nodes were on top of each other. This showed that genre and related
artists are very heavily correlated. The real answer for how related artists are
found on the Spotify for Artists FAQ page. “These artists are determined
the internet with Spotify user listening data” (Link). The other interesting
finding came from comparing the shapes of different graphs from different
much larger. This can be easily compared to the graph of Phish, a less popular,
more niche artist. They have a small cluster of similar artists, tightly held
together by their similar appeal. Only when you get to the outside do the more
mainstream bands like Crosby, Stills, Nash and Young appear. The graph of Mozart’s
related artists really surprised me. While there is a cluster in the middle, the
range extends much further than I would have expected. My guess for why the
network is so large and dense is that many classical artists have multiple
names associated with their work. Sometimes only the composer is credited,
This leads to many nodes in a similar location in the network. Different types of
While I am happy with the results of my project, there are definitely areas
in the Rspotify package. I would not use it and would instead write the functions myself.
This would allow me to add error handling so that a single failed request does not stop
the code after hours of running. Another feature of the API that I didn’t
use was the ability to query multiple ids at a time. Rspotify only has support
for that. Doing many requests at a time would allow me to make the code faster
and more efficient. Finally, I would like to analyze the networks in more depth
and maybe see how other aspects of the artist (e.g. track popularity, key, tone of
songs) play into their relations. I enjoyed this project because it allowed me to
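The two improvements mentioned above, catching errors and querying several ids per request, might look something like the sketch below. Everything here is illustrative: `chunk_ids()`, `safely()`, and `flaky_fetch()` are hypothetical helpers, no real request is sent, and the batch size of 50 is an assumption based on the documented limit of Spotify's several-artists endpoint.

```r
# Split ids into batches of at most `size` ids each (50 is assumed here).
chunk_ids <- function(ids, size = 50) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Wrap a fetch function so that an error produces a warning and NULL
# instead of stopping a long-running loop.
safely <- function(fetch) {
  function(...) {
    tryCatch(fetch(...), error = function(e) {
      warning("request failed: ", conditionMessage(e), call. = FALSE)
      NULL
    })
  }
}

# Hypothetical fetcher that fails on one input, for demonstration only.
flaky_fetch <- function(batch) {
  if ("bad" %in% batch) stop("simulated API error")
  paste(batch, collapse = ",")
}

batches <- chunk_ids(c(paste0("id", 1:120), "bad"), size = 50)
results <- suppressWarnings(lapply(batches, safely(flaky_fetch)))
```

Here the third batch fails and yields NULL while the first two complete, so the failed ids can be retried later without losing the earlier work.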
Appendix:
#Final Work
setwd(dir = "Final Project")
library(httr)
# Read the credentials from environment variables instead of hard-coding them.
clientID = Sys.getenv("SPOTIFY_CLIENT_ID")
secret = Sys.getenv("SPOTIFY_CLIENT_SECRET")
library(devtools)
install_github("tiagomendesdantas/Rspotify")
library(Rspotify)
keys <- spotifyOAuth("QAC Final",clientID, secret)
library(igraph)
return(relatedArtists)
}
# stringsAsFactors=F)
# df = rbind(df, temp_df)
#}
#
# links_df = data.frame(from=rep(df$id[1], 20),
# to = df$id[2:21], stringsAsFactors=F)
# g = graph_from_data_frame(d=links_df, directed=T, vertices=df[, c("id", "name", "popularity")])
# plot(g, vertex.size= (V(g)$popularity/3))
# n=nrow(df)
# for (k in 2:n) {
#
# ## take the id and construct a new URL for Spotify request
# artistfull=getArtistinfo(df$id[k], token = keys)
# artist=artistfull[,1:3]
# tmp_a = as.character(artist$name)
# tmp_a = enc2utf8(tmp_a)
# relatedartists= getRelated2(artist$id, token = keys)
# #if (nrow(relatedartists) == 0) { next }
#
# ## reuse the code of for loop from before
# for (j in 1:nrow(relatedartists)) {
# temp_df = data.frame(name = relatedartists$name[j],
# id = relatedartists$id[j],
# popularity = relatedartists$popularity[j],
# stringsAsFactors=F)
# df = rbind(df, temp_df)
#
#
# links_df = rbind(links_df,
# data.frame(from=df$id[k],
# to=relatedartists$id[j], stringsAsFactors=F))
# }
#}
#
# k= duplicated(df$id)
# df2=df[!k,]
#
# g = graph_from_data_frame(d=links_df, directed=T, vertices=df2[, c("id", "name", "popularity")])
# plot(g, vertex.label=V(g)$name, vertex.size=(V(g)$popularity/10), edge.arrow.size=0.5)
links_df = rbind(links_df,
data.frame(from=df$id[k],
to=relatedartists$id[j], stringsAsFactors=F))
}
}
k= duplicated(df$id)
df=df[!k,]
i=i+1
}
g = graph_from_data_frame(d=links_df, directed=T, vertices=df[, c("id", "name", "popularity")])
return(g)
}
r = sapply(V(g2)$genres, FUN=length)
genres_vec = unlist(V(g2)$genres)
genres_vec = paste("#", genres_vec, sep="")
xx = rep(V(g2)$name, times=r)
g2_genre_net = data.frame(from=xx,
to = genres_vec, stringsAsFactors = F)
genre_net = graph_from_data_frame(g2_genre_net, directed=F)
V(genre_net)$type = grepl("^#", V(genre_net)$name)
bands = bipartite.projection(genre_net, which="false")
length((V(gf)))
V(gf)$genres=vector(mode= "character", length = 1)
y=2 # start index; set to the last printed x to resume after a failure
for (x in (y-1):length(V(gf))){
tmp=searchArtist(gsub(pattern="[^a-zA-Z0-9\\s]", replacement= " ",
x=enc2utf8(as.character(V(gf)$name[x]))), token = keys)
if (length(tmp)==0) {next}
cat(x)
y=x
V(gf)$genres[x]=strsplit(as.character(getArtistinfo(tmp$id[1], token = keys)$genres), split = ",",
fixed = T)
}
r = sapply(V(gf)$genres, FUN=length)
genres_vec = unlist(V(gf)$genres)
genres_vec = paste("#", genres_vec, sep="")
xx = rep(V(gf)$name, times=r)
gf_genre_net = data.frame(from=xx,
to = genres_vec, stringsAsFactors = F)
genre_net = graph_from_data_frame(gf_genre_net, directed=F)
V(genre_net)$type = grepl("^#", V(genre_net)$name)
bands2 = bipartite.projection(genre_net, which="false")
V(bands2)$name[1:20]
plot(bands2, vertex.label=V(bands2)$name, vertex.label.cex=.5, vertex.size=2)
length((V(ghuge)))
V(ghuge)$genres=vector(mode= "character", length = 1)
y=2
for (x in (y-1):length(V(ghuge))){
tmp=searchArtist(gsub(pattern="[^a-zA-Z0-9\\s]", replacement= " ",
x=enc2utf8(as.character(V(ghuge)$name[x]))), token = keys)
if (length(tmp)==0) {next}
cat(x)
y=x
V(ghuge)$genres[x]=strsplit(as.character(getArtistinfo(tmp$id[1], token = keys)$genres), split
= ",", fixed = T)
}
r = sapply(V(ghuge)$genres, FUN=length)
genres_vec = unlist(V(ghuge)$genres)
genres_vec = paste("#", genres_vec, sep="")
xx = rep(V(ghuge)$name, times=r)
ghuge_genre_net = data.frame(from=xx,
to = genres_vec, stringsAsFactors = F)
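# Same projection steps as for g2 and gf above
genre_net = graph_from_data_frame(ghuge_genre_net, directed=F)
V(genre_net)$type = grepl("^#", V(genre_net)$name)
bands3 = bipartite.projection(genre_net, which="false")
V(bands3)$name[1:20]
plot(bands3, vertex.label=V(bands3)$name, vertex.label.cex=.5, vertex.size=2)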