
String Comparisons in R

Reuben McCreanor

Stat 521 - Data Mining and Predictive Modeling

Thursday, September 2, 2015

Outline
Motivation
R stringdist
An example
References


Motivation: Why would you want to compare strings?

"No one should ever claim to be a data analyst until he or she has done string manipulation." - Gaston Sanchez

Comparison operators on strings in R (<, >, ==) are lexicographic: strings are compared character by character, using the collating sequence of the current locale.

String comparisons can be used for:
Cleaning dirty data
Web search
Biomedical research
Matching in data frames
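A minimal base R sketch of lexicographic comparison follows. It also shows why exact comparison falls short on messy data. The strings are illustrative. Because the relative order of upper- and lower-case letters depends on the locale, the example uses only lower-case ASCII letters for the ordering checks.

```r
# Relational operators compare strings lexicographically,
# using the collating sequence of the current locale
"apple" < "banana"                    # TRUE
"cherry" > "banana"                   # TRUE
sort(c("banana", "cherry", "apple"))  # "apple" "banana" "cherry"

# Case ordering is locale-dependent: "a" < "B" is TRUE in an en_US locale
# but FALSE in the C locale, where all upper-case letters sort first

# Exact comparison is all-or-nothing: one pair of swapped letters breaks equality
"Reuben" == "Rueben"                  # FALSE
```

This all-or-nothing behaviour is the gap that approximate string matching fills.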
R stringdist: How do you compare strings?

stringdist is a package that calculates distances between strings.
It extends R with approximate string matching.
It is very flexible: the user chooses the distance metric (the method argument) and the largest distance that still counts as a match (the maxDist argument of amatch and ain).
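The sketch below shows how the choice of metric and threshold changes what counts as a match. It assumes the stringdist package is installed from CRAN, and the two names are illustrative. Swapping two adjacent letters costs two edits under Levenshtein distance ("lv"). Under optimal string alignment ("osa"), which allows transpositions, it costs only one edit.

```r
# Assumes the stringdist package is installed: install.packages("stringdist")
library(stringdist)

# "Rueben" is "Reuben" with two adjacent letters swapped
stringdist("Reuben", "Rueben", method = "lv")   # Levenshtein: 2 (two substitutions)
stringdist("Reuben", "Rueben", method = "osa")  # optimal string alignment: 1 (one transposition)
stringdist("Reuben", "Rueben", method = "jw")   # Jaro distance (default p = 0): between 0 and 1

# maxDist sets how far apart two strings may be and still count as a match
amatch("Rueben", "Reuben", method = "lv", maxDist = 1)  # NA: distance 2 exceeds maxDist
amatch("Rueben", "Reuben", method = "lv", maxDist = 2)  # 1: matched at position 1
```

Raising maxDist, or switching to a metric that treats transpositions as a single edit, makes matching more permissive.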

Key Functions
amatch returns the position of the closest string match
aint indicates wether an element approximately matches
stringdist computes distances between different strings
phonetic translates text into phonetic codes
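Here is a short sketch of ain, amatch and phonetic on a toy lookup table. The words are illustrative, and the example assumes stringdist is installed. ain behaves like %in%, and amatch behaves like match(), except that both accept near misses within maxDist.

```r
library(stringdist)

fruit <- c("apple", "banana", "cherry")

# ain(): TRUE when an element lies within maxDist of some table entry
ain(c("aple", "grape"), fruit, method = "osa", maxDist = 1)     # TRUE FALSE

# amatch(): index of the closest table entry, or NA if none is close enough
amatch(c("aple", "grape"), fruit, method = "osa", maxDist = 1)  # 1 NA

# phonetic(): soundex codes; similar-sounding names share a code
phonetic(c("Robert", "Rupert"))  # both names get the same code
```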
An example: Using stringdist to match similar words
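The sketch below applies this to data cleaning. The survey responses and the reference list of city names are hypothetical, and the example assumes stringdist is installed. Each response is mapped to the closest clean name within two edits. Both sides are lower-cased first so that differences in capitalisation do not count as edits.

```r
library(stringdist)

# Hypothetical messy survey responses and a clean reference list
survey <- data.frame(
  city = c("Durham", "durham", "Durhm", "Raleigh", "Raliegh", "Chapel Hil"),
  stringsAsFactors = FALSE
)
cities <- c("Durham", "Raleigh", "Chapel Hill")

# Closest city within 2 edits (optimal string alignment), ignoring case
idx <- amatch(tolower(survey$city), tolower(cities), method = "osa", maxDist = 2)

# Replace each response by its matched clean name (NA where nothing is close)
survey$city_clean <- cities[idx]
survey
```

Responses with no city within maxDist receive NA. These rows can be flagged for manual review instead of being forced onto a wrong match.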
References and further reading

Want to know more?
Handling and Processing Strings in R, Gaston Sanchez. http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf

References
Relational Operators in R. https://stat.ethz.ch/R-manual/R-devel/library/base/html/Comparison.html
R Tutorial - Characters. http://www.r-tutor.com/r-introduction/basic-data-types/character
Package stringdist. https://cran.r-project.org/web/packages/stringdist/stringdist.pdf
