
The Largest Vocabulary in Hip Hop


UPDATE June '14: I've released v2 of this project
(http://mfdaniels.tumblr.com/post/93313634355/updated-rappers-sorted-bysize-of-vocabulary-20-new), with 20 new rappers added. You can now also
purchase the chart as a poster from Pop Chart Lab!
(http://www.popchartlab.com/collections/prints/products/the-hip-hop-flowchart).

Matt Daniels (http://www.mdaniels.com/) is a designer, coder, and data
scientist at Undercurrent (http://www.undercurrent.com/) in New York City.
His past works include the Etymology of "Shorty"
(http://www.mdaniels.com/shorty) and Outkast, in graphs and charts
(http://www.mdaniels.com/outkast). He decided to examine the vocabulary of
hip hop artists, and this is what he found.
May 2014

Literary elites love to rep Shakespeare's vocabulary: across his entire corpus,
he uses 28,829 unique words (http://www.opensourceshakespeare.org/stats/),
suggesting he knew over 100,000 words and arguably had the largest
vocabulary, ever.
I decided to compare this data point against the most famous artists in hip
hop. I used each artist's first 35,000 words of lyrics. That way, prolific artists, such as
Jay-Z, could be compared to newer artists, such as Drake.

[Chart: # of unique words used within each artist's first 35,000 lyrics, on a scale from 2,900 to 6,400 words, viewable for all artists or by region. For reference, Shakespeare would be at 5,170 and Moby Dick at 6,022.]

Notes/sources:
(1)(2) I used the first 5,000 words for 7 of Shakespeare's works:
Hamlet, Romeo and Juliet, Othello, Macbeth, As You Like It,
Winter's Tale, and Troilus and Cressida. For Melville, I used the
first 35,000 words of Moby Dick.
All lyrics are provided by Rap Genius, but are only current to
2012. My lack of recent data prevented me from using quite a
few current artists.
This data viz uses code by Amelia Bellamy-Royds from this
jsfiddle (http://fiddle.jshell.net/6cW9u/8/).

35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the
artist was just short of the 35,000 words. Quite a few rappers don't have
enough official material to be included (e.g., Biggie, Kendrick Lamar). As a
benchmark, I included data points for Shakespeare and Herman Melville,
using the same approach (35,000 words across several plays for Shakespeare,
first 35,000 of Moby Dick).
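The sampling rule described above can be sketched roughly as follows. This is only an illustration, not the code behind the chart: the function names, the whitespace-based word splitting, and the idea of feeding works in release order are all assumptions made for the example.

```python
from typing import Iterable, List, Optional

WORD_BUDGET = 35_000  # sample size stated in the article


def build_sample(works: Iterable[str], budget: int = WORD_BUDGET) -> Optional[List[str]]:
    """Concatenate works (assumed to be in release order) until the budget is met.

    Returns None when the artist does not have enough material,
    which is the case the article describes for Biggie or Kendrick Lamar.
    """
    sample: List[str] = []
    for text in works:
        sample.extend(text.split())  # assumption: words are whitespace-separated
        if len(sample) >= budget:
            return sample[:budget]
    return None


def build_benchmark(works: Iterable[str], words_per_work: int) -> List[str]:
    """Take the same number of leading words from each work.

    With 7 plays and 5,000 words per play this yields a 35,000-word sample.
    """
    sample: List[str] = []
    for text in works:
        sample.extend(text.split()[:words_per_work])
    return sample
```

Called with seven play texts and `words_per_work=5000`, `build_benchmark` would produce a Shakespeare sample of the same size as each rapper's sample, which is what makes the comparison like for like.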
I used a research methodology called token analysis
(http://www.nltk.org/book/ch01.html#counting-vocabulary) to determine
each artist's vocabulary. Each distinct word form is counted once, so pimps, pimp, pimping,
and pimpin are four unique words. To avoid issues with apostrophes (e.g.,
pimpin' vs. pimpin), they're removed from the dataset. It still isn't perfect. Hip
hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty),
compound words (e.g., king shit), featured vocalists, and repetitive choruses.
It's still directionally interesting. Of the 85 artists in the dataset, let's take a
look at who is on top.
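For readers who want to try the counting step themselves, here is a minimal sketch of it. It is not the original analysis code: the function names, the lowercasing, and the rule of splitting on anything that is not a letter or digit are assumptions. Only the apostrophe removal, the 35,000-word cutoff, and counting each distinct word form once come from the description above.

```python
import re
from typing import List


def tokenize(lyrics: str) -> List[str]:
    """Lowercase the text, drop apostrophes, and split into word tokens."""
    # Remove straight and curly apostrophes so pimpin' and pimpin match.
    cleaned = lyrics.lower().replace("'", "").replace("\u2019", "")
    # Assumption: a token is a run of ASCII letters or digits.
    return re.findall(r"[a-z0-9]+", cleaned)


def unique_word_count(lyrics: str, budget: int = 35_000) -> int:
    """Count distinct word forms within the first `budget` tokens."""
    tokens = tokenize(lyrics)[:budget]
    return len(set(tokens))
```

With this sketch, `unique_word_count("Pimps pimp pimping pimpin' pimpin")` counts four distinct forms, matching the example in the text, while shorty and shawty still count as two separate words.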

#1

Aesop Rock

When I first published this analysis, I excluded Aesop Rock, figuring he was too
obscure. The Reddit hip hop community was in an uproar, claiming Aesop would
absolutely be #1. Sure enough, Aesop Rock is well above every artist in my
dataset and I was obliged to add him to the chart. In fact, his data point is so
far to the right that he should be off the chart (I'm lazy and didn't adjust the
scale).

#2, #6, #7, #9, #20, and #23

Wu-Tang Clan Ain't Nothin' ta Fuck Wit

Wu-Tang Clan at #6 is fucking impressive given that 10 members, with vastly
different styles, are equally contributing lyrics. Add to that the fact that the solo works of GZA,
Ghostface, Raekwon, and Method Man are also in the top 20,
notably GZA at #2. Perhaps their countless hours of studio time together (and
RZA's mentorship) exposed the rappers to one another's vocabulary.
Let's take a deeper look at Wu-Tang's five studio albums to better understand
each member's contribution. Here's a breakdown of the number and percent
of words used by each member.

To understand each rapper's vocabulary (# of unique words) in Wu-Tang's first
five albums, I chose a 3,500 word threshold so that each person was on an
equal footing. That way, we could include GZA, but unfortunately had to
exclude Ol' Dirty Bastard, Cappadonna, and Masta Killa, who have too few
verses across Wu-Tang's corpus.
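The per-member comparison can be sketched along these lines. Treat it as an assumption-laden illustration: the `member_vocabulary` function and its input shape (a mapping from member name to that member's words in order) are hypothetical, and only the 3,500-word threshold and the exclusion of members below it come from the text.

```python
from typing import Dict, List

THRESHOLD = 3_500  # per-member word threshold stated in the article


def member_vocabulary(
    words_by_member: Dict[str, List[str]], threshold: int = THRESHOLD
) -> Dict[str, int]:
    """Count unique words in each member's first `threshold` words.

    Members with fewer words than the threshold are left out, so every
    remaining member is measured over a sample of the same size.
    """
    result: Dict[str, int] = {}
    for member, words in words_by_member.items():
        if len(words) < threshold:
            continue  # too few verses to compare on an equal footing
        result[member] = len(set(words[:threshold]))
    return result
```

Cutting every member off at the same word count matters because unique-word totals grow with sample size, so a member with more verses would otherwise look more diverse simply by having rapped more.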

U-God and GZA clearly bolster the group's average. Raekwon and Method
Man's contributions have a lower average compared to other members, but
note that their data points would still exceed those of most artists in hip hop.

#3 - 5

Kool Keith, Canibus, CunninLynguists

Moving past Wu-Tang's dominance, the next three artists are comparatively less
well known. Of the three, Kool Keith (http://en.wikipedia.org/wiki/Kool_Keith)
has the most diverse vocabulary. For a taste of his work, check out his album
with the largest vocab: Dr. Octagonecologyst
(http://open.spotify.com/album/0GAqyZFjgaz6V5ozTS0dfW). #4 and #5 are
two relatively underground (yet accomplished) acts: Jamaican-born rapper
Canibus (http://en.wikipedia.org/wiki/Canibus) and southern-based group
CunninLynguists (http://en.wikipedia.org/wiki/CunninLynguists).

#14 - 15

Outkast and E-40

Of course E-40 (http://en.wikipedia.org/wiki/E-40) is in the top 20; he's
considered to be the inventor of much slang. Just a few terms that he's been
responsible for: "all good," "pop ya collar," "shizzle," and "you feel me."
At #15, Outkast's deep vocabulary is definitely a function of their style:
frequent use of portmanteaus (e.g., ATLiens, Stankonia), southern drawl (e.g.,
nahmsayin, eryday), and made-up slang (e.g., flawsky-wawsky).
As expected, other southern-based acts aren't in Outkast's league. Take a look
at the regional break-out below:

The South has the lowest average (4,268) and the East Coast the highest
(4,804). In fact, only 4 of the 17 southern-based artists in the dataset are above
average. My guess is that this is a function of crunk music's call-and-response
style, resulting in more repetition of words.
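A regional summary like the one above could be computed with something like the sketch below. The `regional_summary` function and its input format are hypothetical, and it assumes "above average" means above the average of the whole dataset, which the text does not spell out.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def regional_summary(artists: List[Tuple[str, str, int]]) -> Dict[str, dict]:
    """Summarize unique-word counts per region.

    `artists` holds (name, region, unique_word_count) tuples.
    """
    # Assumption: the comparison baseline is the mean over all artists.
    overall = mean(count for _, _, count in artists)

    by_region: Dict[str, List[int]] = defaultdict(list)
    for _, region, count in artists:
        by_region[region].append(count)

    return {
        region: {
            "average": mean(counts),
            "above_overall_average": sum(c > overall for c in counts),
            "artists": len(counts),
        }
        for region, counts in by_region.items()
    }
```

Fed the real dataset, the "above_overall_average" and "artists" fields for the South would correspond to the "4 of the 17" figure quoted above, provided the baseline assumption holds.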

#26 and #33

Busta Rhymes and Twista

Since both rappers are known for their speed, it's nice to see that their verses
are just as lyrically diverse as their peers'.

And skipping ahead to the bottom of the dataset...

#67, #68, #71, and #72

Snoop Dogg, 2Pac, Kanye West, and Lil Wayne

Some of the biggest names in hip hop were in the bottom 20%. Let's take
another look at the data:

While Lil Wayne has never been celebrated for the complexity of his word
choices, I expected 2Pac, Snoop, and Kanye to be well above average.
It's also worth noting that Drake, one of the most popular artists of late, is #83
on this list.

#85

DMX

At #85 and in last place: DMX. But this shouldn't undermine an artist whose
raw energy and honesty were the most memorable qualities of his music.

So what's all this mean?


io9 writer Robert Gonzalez (http://io9.com/rappers-ranked-by-vocabularysize-1571623387) blew my mind with this point: "On The Black Album track
'Moment of Clarity,' Jay-Z contrasts his lyricism with that of Common and
Talib Kweli (both of whom 'rank' higher than him, when it comes to the
diversity of their vocabulary):
I dumbed down for my audience to double my dollars
They criticized me for it, yet they all yell 'holla'
If skills sold, truth be told, I'd probably be
Lyrically Talib Kweli
Truthfully I wanna rhyme like Common Sense
But I did 5 mil - I ain't been rhyming like Common since"

@matthew_daniels (https://twitter.com/matthew_daniels)
Want more hip hop data analysis? Show love below.
Or buy this glorious project as a poster for your wall
(http://www.popchartlab.com/collections/prints/products/the-hip-hop-flowchart) :)