TM and Stylo

RPubs: https://rpubs.com/sgeletta/95577

Text Mining With R and the "tm" Package
Data Science Capstone

Simon Geletta
Saturday, July 25, 2015

Milestone Report
Introduction and Objectives
The main goal of this report is to demonstrate the level of competency achieved in turning
unstructured data into a structured set of records that can then be used for statistical
modeling. The first step in any such task is to learn, as far as possible, what the raw data
(the document corpus) contains and to separate the useful information from the not-so-useful.
Because running the code while preparing the document for publication on RPubs took an
unreasonably long time, this report is based on a 10% sample of the data that was provided. It
is offered as evidence of what I will do with the entire data set at the end of the capstone
project.
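
The report does not show how the 10% sample was drawn. A minimal sketch of one way to do it in base R follows, assuming the lines of a file have already been read into a character vector. The helper name sample_lines, the fixed seed, and the toy input are hypothetical and are not taken from the original report.

```r
# Hypothetical helper: keep a random fraction of the lines in a character vector.
sample_lines <- function(lines, fraction = 0.10, seed = 1234) {
  stopifnot(fraction > 0, fraction <= 1)
  if (length(lines) == 0) return(lines)
  set.seed(seed)  # fixed seed so the same sample is drawn on every run
  n_keep <- max(1L, as.integer(round(length(lines) * fraction)))
  # sort the sampled indices so the kept lines stay in their original order
  lines[sort(sample.int(length(lines), n_keep))]
}

# Toy stand-in for something like readLines("en_US.blogs.txt")
toy_lines <- sprintf("line %d", 1:50)
toy_sample <- sample_lines(toy_lines)
length(toy_sample)
```

Sampling whole lines keeps each blog post, news item, or tweet intact, which matters if word sequences are analyzed later. Note that calling set.seed inside the helper resets the global random number state.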

Methods
The first task is to download the raw resources used for the analytics tasks. The main ones
are the three data sources en_US.blogs.txt, en_US.news.txt, and en_US.tweets.txt. In addition, a
list of bad/profane words was obtained, to be used later to exclude those words from the analysis.
The raw data were downloaded in compressed form from
http://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip and uncompressed
locally. The bad/profane words were downloaded from
https://raw.githubusercontent.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/master/en
and stored locally as en_bws.txt. The following chunk of code shows how the files were acquired.

## download and unpack the capstone data set unless the archive is already present
dtsrc <- "http://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip"
if (!file.exists("coursera-swiftkey.zip")) {
  download.file(dtsrc, destfile = "coursera-swiftkey.zip")
  unzip("coursera-swiftkey.zip")
}
## list of bad/profane words downloaded from GitHub
bwsrc1<-"https://raw.githubusercontent.com/shutterstock/List-of-Dirty-Naughty-Obscen
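
As a rough illustration of how the bad-word list could later be used to exclude terms from the analysis, the sketch below filters tokens in base R. It is not the author's code: the helper name remove_bad_words, the crude tokenizer, and the placeholder word list are all assumptions, and the two-word list stands in for the contents of en_bws.txt.

```r
# Hypothetical helper: lower-case each text, split it into tokens,
# and drop every token that appears in the bad-word list.
remove_bad_words <- function(text, bad_words) {
  # crude tokenizer: anything that is not a letter or apostrophe separates tokens
  tokens <- strsplit(tolower(text), "[^a-z']+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]  # discard empty strings left by leading separators
    paste(tok[!tok %in% bad_words], collapse = " ")
  }, character(1))
}

# Placeholder list; in practice this might come from readLines("en_bws.txt")
bad_words <- c("darn", "heck")
docs <- c("Well, darn it!", "What the heck is this?")
cleaned <- remove_bad_words(docs, bad_words)
print(cleaned)
```

This matches single tokens only, so any multi-word entries in the list would need separate handling. Punctuation and case are also discarded as a side effect of the tokenizer.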
