
Player timelines with ggplot | PremierSoccerStats http://www.premiersoccerstats.com/wordpress/?p=1124

PremierSoccerStats
Your fix on the facts

Player timelines with ggplot


Timelines can be quite a handy way of getting an overview of a player's career in terms of when they played, with which team, and who their contemporaries were.
As is often the case, I turned to Stack Overflow to set me on my way to an R solution. In this instance, I did not take the accepted answer but rather the ggplot variation.
I used the RODBC package to extract records of all EPL appearances from my database into a dataframe, allGames.


head(allGames)
  FIRSTNAME LASTNAME PLAYERID POSITION TEAMID PLAYER_TEAM   TEAMNAME       DATE START ON
1     Steve    Jones  JONESS1        F    WHU        2054 West Ham U 1993-11-01     0  0

The data is pretty self-evident. POSITION shows that Steve Jones is a forward, and START and ON show that for the game in question he neither started nor was used as a substitute. As I am basically trying to show when players were in the team squad, I will still include these data in the analysis. To obtain a player's career length at a particular club, I need to find the earliest and latest dates. It is probably overkill, but I am used to using the plyr package.


library(plyr)
allGames.summary <- ddply(allGames,.(PLAYERID,TEAMID),function(x) c(start=min(x$DATE),end=max(x$DATE)))
# Here is Steve Jones's line at West Ham
subset(allGames.summary,TEAMID=="WHU"&PLAYERID=="JONESS1")
     PLAYERID TEAMID      start        end
2574  JONESS1    WHU 1993-08-14 1997-02-01
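For readers without access to the database, the same earliest/latest-date summary can be tried on a few rows of toy data. The sketch below uses base R only (split, lapply and rbind) rather than plyr, so it is an alternative route and not the code used for the post. The JONESS1 dates are taken from the output above; PLAYERX and its end date, and the names toyGames and toySummary, are invented purely for illustration.

```r
# Toy stand-in for allGames: only the columns the summary needs.
# JONESS1 dates come from the post; PLAYERX is an invented placeholder.
toyGames <- data.frame(
  PLAYERID = c("JONESS1", "JONESS1", "JONESS1", "PLAYERX", "PLAYERX"),
  TEAMID   = c("WHU", "WHU", "WHU", "MNU", "MNU"),
  DATE     = as.POSIXct(c("1993-08-14", "1993-11-01", "1997-02-01",
                          "1992-08-15", "1995-05-14"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split into one piece per player/team pair, then keep first and last date
pieces <- split(toyGames, list(toyGames$PLAYERID, toyGames$TEAMID), drop = TRUE)
toySummary <- do.call(rbind, lapply(pieces, function(x) {
  data.frame(PLAYERID = x$PLAYERID[1], TEAMID = x$TEAMID[1],
             start = min(x$DATE), end = max(x$DATE),
             stringsAsFactors = FALSE)
}))
rownames(toySummary) <- NULL

print(toySummary)
```

Building each piece as a small data frame keeps start and end as POSIXct, which matters later when the dates are placed on a datetime axis.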

OK. Now we can get to some graphing. Let's go way back to the beginning of the Premier League and look at the squad of that season's champions, Manchester United (id MNU).


library(ggplot2)
q <- ggplot(subset(allGames.summary,TEAMID=="MNU"&start==as.POSIXct(min(allGames.summary$start)))) +
geom_segment(aes(x=start, xend=end, y=PLAYERID, yend=PLAYERID), size=3)
print(q)

Note the use of the min function again to get the first date, and the geom_segment function of ggplot, which is perfect for producing the required lines. There are two gotchas to watch out for. The dates are of POSIXct datatype, and unless they are coerced to that type an error arises. Also, if the + is placed at the start of the second line rather than at the end of the first, the layer does not get added and no plot appears.
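The second gotcha follows from how R parses source: a line that already forms a complete expression ends at the newline, so a + at the start of the next line begins a new statement. Here is a small sketch with plain numbers instead of ggplot layers (the names good and bad are just for illustration):

```r
# '+' at the end of the line: R keeps reading, so one expression results
good <- parse(text = "q <- 1 +\n  2")

# '+' at the start of the next line: the first line is already complete,
# so R sees two separate expressions, `q <- 1` and `+ 2`
bad <- parse(text = "q <- 1\n  + 2")

length(good)  # 1
length(bad)   # 2
```

The same thing happens with ggplot(...) on one line and + geom_segment(...) on the next: the first line is evaluated on its own, without the layer.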

So what have we got?


As can be seen, the data looks reasonable. All the lines start at one point and show different end points. To those in the know, Giggs's line correctly extends to the current day; he is the only player appearing 20 years ago still to pull on a shirt.
However, it is not that aesthetically pleasing. Improvements that could be made include:

Change axes labels and add a title
Make players' names more apparent
Show other EPL teams appeared for, if any
Give some indication of relative appearances
Utilize the full width of the graph
and finally
Wrap it in a function

Some of these amendments need more analysis; others are just additions to the ggplot code.


library(stringr) # str_sub() comes from stringr, not plyr

# we need the player's name (surname and initial) from the original dataframe
allGames$player <- paste(allGames$LASTNAME,str_sub(allGames$FIRSTNAME, end=1),sep=" ")

# the allGames.summary needs to be reworked to carry the name and an appearance count
allGames.summary <- ddply(allGames,.(PLAYERID,PLAYER_TEAM,TEAMID,player),function(x) c(start=min(x$DATE),end=max(x$DATE),apps=length(x$player)))

# create a function which takes the team id and game date as parameters
tlPlot <- function(theTeam,theDate) {

  # to cover all clubs a player appeared for we need to obtain a list of their ids
  squad <- subset(allGames.summary,TEAMID==theTeam&start==as.POSIXct(theDate))$PLAYERID

  # order the data by the number of appearances whilst with the team (and reversed for graph)
  playerOrder <- arrange(subset(allGames.summary,TEAMID==theTeam&PLAYERID %in% squad),desc(apps))$player
  playerOrder <- rev(playerOrder)

  # create the title (full team name and date would be shown with more space)
  theTitle <- paste("Careers for players appearing for",theTeam,"on",theDate,sep=" ")

  # Now create the graph object
  # subset to selected players but for all their teams, indicated by colour
  q <- ggplot(subset(allGames.summary,PLAYERID %in% squad), aes(colour=TEAMID)) +
    # show player surname and initial
    geom_segment(aes(x=start, xend=end, y=player, yend=player), size=3) +
    # order players in terms of apps for team
    scale_y_discrete(limits=playerOrder) +
    # get rid of axis labels and add the title
    xlab("") + ylab("") + ggtitle(theTitle) +
    # extend lines to full width
    scale_x_datetime(expand = c(0, 0))
  return(q)
}

# make selection. In a production version, tests for valid teams and dates would be performed
tlPlot("MNU","1992-08-15")
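The closing comment in the code mentions that a production version would test for valid teams and dates. One possible sketch of such a check is given here; checkTlArgs is a hypothetical helper that is not part of the original post, and it assumes the summary dataframe has TEAMID and POSIXct start columns stored in UTC. The toy rows exist only to exercise the checks.

```r
# Hypothetical helper, not part of the original post: stop early with a
# readable message if the team id or date cannot match anything.
checkTlArgs <- function(summaryData, theTeam, theDate) {
  # With an explicit format, as.POSIXct() returns NA for unparsable strings
  parsed <- as.POSIXct(theDate, format = "%Y-%m-%d", tz = "UTC")
  if (is.na(parsed)) {
    stop("theDate must look like YYYY-MM-DD: ", theDate)
  }
  if (!theTeam %in% summaryData$TEAMID) {
    stop("unknown team id: ", theTeam)
  }
  if (!any(summaryData$TEAMID == theTeam & summaryData$start == parsed)) {
    stop("no player of ", theTeam, " has a first appearance on ", theDate)
  }
  invisible(TRUE)
}

# Toy summary, just to exercise the checks (timezone assumed to be UTC)
toy <- data.frame(
  TEAMID = c("MNU", "WHU"),
  start  = as.POSIXct(c("1992-08-15", "1993-08-14"), tz = "UTC"),
  stringsAsFactors = FALSE
)

checkTlArgs(toy, "MNU", "1992-08-15")   # passes silently
res <- try(checkTlArgs(toy, "XXX", "1992-08-15"), silent = TRUE)
inherits(res, "try-error")              # TRUE: the unknown team is rejected
```

A call such as checkTlArgs(allGames.summary, theTeam, theDate) could then sit on the first line of tlPlot, provided the stored dates really are in the assumed timezone.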

Voila!


Not perfect, but certainly more informative and now replicable. The analysis can easily be extended. For instance, one could select the players with the top ten appearances for a club or show all those who were on squads whilst a particular player was there. The position factor could be identified by colour whilst using an alpha scale for apps.
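As a rough sketch of the first of those extensions, the players with the most appearances for a club can be picked out of a summary dataframe with order and head. Everything here is illustrative: topApps is a hypothetical helper, and the toy players and appearance counts are invented.

```r
# Toy per-player summary for one club; names and counts are invented
toyApps <- data.frame(
  player = paste("Player", LETTERS[1:12]),
  TEAMID = "MNU",
  apps   = c(40, 12, 33, 5, 28, 19, 41, 7, 22, 30, 3, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n players with most appearances for a club
topApps <- function(summaryData, theTeam, n = 10) {
  club <- summaryData[summaryData$TEAMID == theTeam, ]
  head(club[order(club$apps, decreasing = TRUE), ], n)
}

top10 <- topApps(toyApps, "MNU")
top10$player
```

The resulting player vector, reversed with rev as in tlPlot, could serve as the limits for scale_y_discrete.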
But that's all for now.

Short URL: http://tinyurl.com/mkrl24c



This entry was posted in R, Soccer er Football on October 21, 2012 [http://www.premiersoccerstats.com/wordpress/?p=1124] .
