Professional documents
Culture documents
Required packages:
1. textir
2. MARSS
3. tm
4. SnowballC
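# Load the libraries used in these labs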
library(e1071)
library(textir)
library(tm)
library(VGAM)
=================================Lab 1========================================
# From the tokenized texts, construct an Excel sheet (frequency.csv) with 4
# columns: a row index, the document number (ID), the word (Var1), and the
# frequency (Freq), i.e. how many times that word appears in that document.
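# A minimal sketch of how frequency.csv could be built with tm (assumptions:
# the raw texts sit as .txt files in a "corpus/" folder, and documents are
# identified by file name rather than by number):
library(tm)
library(SnowballC)
docs <- VCorpus(DirSource("corpus/"))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stemDocument)            # stemming via SnowballC
dtm <- as.matrix(DocumentTermMatrix(docs))
dtm <- dtm / rowSums(dtm) * 100               # counts -> relative frequencies (%)
freqs.long <- as.data.frame(as.table(dtm))    # one row per (document, word) pair
names(freqs.long) <- c("ID", "Var1", "Freq")
write.csv(freqs.long, "frequency.csv")        # the row index becomes the 4th column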
freqs.df <- read.csv("frequency.csv", stringsAsFactors = FALSE)
# Create a cross-tabulation where the rows are the documents and the columns
# are all possible words.
result <- xtabs(Freq ~ ID+Var1, data=freqs.df)
dim(result)       # documents x unique words
colnames(result)  # the word (feature) names
# To be useful in the next step, convert result into a matrix object by
# coercing every entry to numeric.
final.m <- apply(result, 2, as.numeric)
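# Note: apply() returns a plain matrix and drops the row (document) names;
# restoring them keeps the dendrogram labeled by document (a small fix, not
# in the original lab):
rownames(final.m) <- rownames(result)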
# Limit the feature list to only those words that appear across the entire
# corpus with a mean relative frequency above some threshold.
# apply(final.m, 2, mean) >= .25 creates a boolean vector, one entry per
# column; TRUE keeps the word, FALSE drops it.
smaller.m <- final.m[,apply(final.m,2,mean)>=.25]
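# Quick sanity check: how many words survive the threshold?
dim(smaller.m)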
# Create a distance object (Euclidean distance, dist's default)
dm <- dist(smaller.m)
# Perform a hierarchical cluster analysis on the distance object
cluster <- hclust(dm)
plot(cluster)
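# Optional follow-up (a sketch; k = 2 is an arbitrary choice): cut the
# dendrogram into a fixed number of document groups.
groups <- cutree(cluster, k = 2)
groups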
=================================Lab 2========================================
# Take text samples from different novels and predict the author of the
# anonymous ones. Every row corresponds to a novel; each column after the
# author.v column is the relative frequency of a word.
novels <- read.csv("novels.csv", stringsAsFactors = FALSE)
# Find the rows whose author is unknown (labeled "anonymous")
anon.v <- which(novels$author.v == "anonymous")
# Training set: the word-frequency columns of every row with a known author
train <- novels[-anon.v, 2:ncol(novels)]
# Class labels (the known author names) for the training rows
class.f <- novels[-anon.v, "author.v"]
# Train a support vector machine on the labeled rows
library(e1071)
model.svm <- svm(train, factor(class.f))
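# A sketch of the prediction step this lab builds toward (assuming the
# anonymous rows share the same column layout as the training rows):
test <- novels[anon.v, 2:ncol(novels)]
predict(model.svm, test)   # predicted author for each anonymous text
# Optional: inspect the fit on the training data
table(predict(model.svm, train), class.f)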