
Data Analytics Technical Details with references to R packages

I. What we have and know how to use:
1. Clustering: hclust (shipped with base R) for hierarchical clustering, kmeans, and dbscan.
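As a minimal sketch (base R only, using the built-in iris data purely for illustration), k-means and hierarchical clustering can be run as follows:

  ## standardize the numeric features of iris
  x <- scale(iris[, 1:4])

  ## k-means with 3 clusters; compare against the known species labels
  km <- kmeans(x, centers = 3)
  table(km$cluster, iris$Species)

  ## hierarchical clustering on Euclidean distances, cut into 3 groups
  hc <- hclust(dist(x), method = "ward.D2")
  plot(hc)
  groups <- cutree(hc, k = 3)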
2. Association Rules: Package arules provides both data structures for efficient handling of sparse binary data as well as interfaces to implementations of Apriori and Eclat for mining frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
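A minimal sketch with arules, assuming the Groceries transaction data that ships with the package (thresholds are illustrative):

  library(arules)
  data(Groceries)                                   # sparse transactions object
  ## mine association rules with Apriori
  rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))
  inspect(sort(rules, by = "lift")[1:5])            # top 5 rules by lift
  ## frequent itemsets via Eclat
  itemsets <- eclat(Groceries, parameter = list(supp = 0.02))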
II. What we can do (tested with hypothetical data):
3. Recursive Partitioning: Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book, are implemented in rpart (shipped with base R) and tree. Package rpart is recommended for computing CART-like trees. A rich toolbox of partitioning algorithms is available in Weka; package RWeka provides an interface to this implementation, including the J4.8 variant of C4.5 and M5. The Cubist package fits rule-based models (similar to trees) with linear regression models in the terminal leaves, instance-based corrections and boosting. The C50 package can fit C5.0 classification trees, rule-based models, and boosted versions of these. Two recursive partitioning algorithms with unbiased variable selection and a statistical stopping criterion are implemented in package party: function ctree() is based on non-parametric conditional inference procedures for testing independence between the response and each input variable, whereas mob() can be used to partition parametric models. Extensible tools for visualizing binary trees and node distributions of the response are available in package party as well. An adaptation of rpart for multivariate responses is available in package mvpart. For problems with binary input variables the package LogicReg implements logic regression. Graphical tools for the visualization of trees are available in package maptree. An approach to deal with the instability problem via extra splits is available in package TWIX. Trees for modelling longitudinal data by means of random effects are offered by packages REEMtree and longRPart. Partitioning of mixture models is performed by RPMM. Computational infrastructure for representing trees and unified methods for prediction and visualization is implemented in partykit; this infrastructure is used by package evtree to implement evolutionary learning of globally optimal trees. Oblique trees are available in package oblique.tree.
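A minimal sketch contrasting rpart and ctree() from party, again on the built-in iris data:

  library(rpart)
  library(party)

  ## CART-like tree; printcp() shows the cross-validated complexity table
  fit_rpart <- rpart(Species ~ ., data = iris)
  printcp(fit_rpart)

  ## conditional inference tree with unbiased variable selection
  fit_ctree <- ctree(Species ~ ., data = iris)
  plot(fit_ctree)                            # tree with node distributions
  predict(fit_ctree, newdata = iris[1:5, ])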
4. Neural Networks: Single-hidden-layer neural networks are implemented in package nnet (shipped with base R). Package RSNNS offers an interface to the Stuttgart Neural Network Simulator (SNNS).
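A minimal sketch with nnet (single hidden layer); the tuning values are illustrative:

  library(nnet)
  set.seed(1)
  ## 5 hidden units and a small weight-decay penalty
  fit <- nnet(Species ~ ., data = iris, size = 5,
              decay = 5e-4, maxit = 200, trace = FALSE)
  pred <- predict(fit, iris, type = "class")
  mean(pred == iris$Species)                 # training accuracy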
5. Optimization using Genetic Algorithms: Packages rgp and rgenoud offer optimization routines based on genetic algorithms.
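A minimal sketch with genoud() from rgenoud, minimizing a standard two-dimensional test function (the function and bounds are chosen only for illustration):

  library(rgenoud)
  ## Rastrigin test function: global minimum 0 at the origin
  rastrigin <- function(x) sum(x^2 - 10 * cos(2 * pi * x) + 10)
  res <- genoud(rastrigin, nvars = 2, max = FALSE,
                Domains = matrix(c(-5, -5, 5, 5), ncol = 2),  # lower / upper bounds
                print.level = 0)
  res$par    # best parameters found
  res$value  # objective value at that point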
III. What we can do in the future (learning and testing in progress):
6. Regularized and Shrinkage Methods: Regression models with some constraint on the parameter estimates can be fitted with the lasso2 and lars packages. Lasso with simultaneous updates for groups of parameters (groupwise lasso) is available in package grplasso; the grpreg package implements a number of other group penalization models, such as group MCP and group SCAD. The L1 regularization path for generalized linear models and Cox models can be obtained from functions available in package glmpath; the entire lasso or elastic-net regularization path (also in elasticnet) for linear regression, logistic and multinomial regression models can be obtained from package glmnet. The penalized package provides an alternative implementation of lasso (L1) and ridge (L2) penalized regression models (both GLM and Cox models). Semiparametric additive hazards models under lasso penalties are offered by package ahaz. A generalisation of the lasso shrinkage technique for linear regression, called the relaxed lasso, is available in package relaxo. The shrunken centroids classifier and utilities for gene expression analyses are implemented in package pamr. An implementation of multivariate adaptive regression splines is available in package earth. Variable selection in SVMs via penalized models (SCAD or L1 penalties) is implemented in package penalizedSVM. Various forms of penalized discriminant analysis are implemented in packages hda, rda, sda, and SDDA. Package LiblineaR offers an interface to the LIBLINEAR library. The ncvreg package fits linear and logistic regression models under the SCAD and MCP regression penalties using a coordinate descent algorithm.
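A minimal sketch of a lasso fit with glmnet on simulated data (the data are made up purely for illustration):

  library(glmnet)
  set.seed(1)
  x <- matrix(rnorm(100 * 20), nrow = 100)   # 100 observations, 20 predictors
  y <- x[, 1] - 2 * x[, 2] + rnorm(100)      # only two predictors truly matter
  fit <- glmnet(x, y, alpha = 1)             # alpha = 1 gives the lasso penalty
  plot(fit, xvar = "lambda")                 # coefficient paths
  cvfit <- cv.glmnet(x, y)                   # cross-validated choice of lambda
  coef(cvfit, s = "lambda.min")              # coefficients at the selected lambda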
7. Boosting: Various forms of gradient boosting are implemented in package gbm (tree-based functional gradient descent boosting). The hinge loss is optimized by the boosting implementation in package bst. Package GAMBoost can be used to fit generalized additive models by a boosting algorithm. An extensible boosting framework for generalized linear, additive and nonparametric models is available in package mboost. Likelihood-based boosting for Cox models is implemented in CoxBoost and for mixed models in GMMBoost. GAMLSS models can be fitted using boosting by gamboostLSS.
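A minimal sketch with gbm (tree-based gradient boosting) on simulated binary data; the data and tuning values are illustrative:

  library(gbm)
  set.seed(1)
  n <- 500
  dat <- data.frame(x1 = runif(n), x2 = runif(n))
  dat$y <- as.numeric(dat$x1 + dat$x2 + rnorm(n, sd = 0.3) > 1)   # 0/1 response
  fit <- gbm(y ~ x1 + x2, data = dat, distribution = "bernoulli",
             n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
             cv.folds = 5)
  best <- gbm.perf(fit, method = "cv")       # CV-optimal number of trees
  summary(fit, n.trees = best)               # relative variable influence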
8. Support Vector Machines and Kernel Methods: The function svm() from e1071 offers an interface to the LIBSVM library, and package kernlab implements a flexible framework for kernel learning (including SVMs, RVMs and other kernel learning algorithms). An interface to the SVMlight implementation (only for one-against-all classification) is provided in package klaR. The relevant dimension in kernel feature spaces can be estimated using rdetools, which also offers procedures for model selection and prediction.
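A minimal sketch of svm() from e1071 and ksvm() from kernlab on the built-in iris data:

  library(e1071)
  library(kernlab)

  ## radial-basis SVM via the LIBSVM interface
  fit_svm <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
  table(predict(fit_svm, iris), iris$Species)

  ## the same task in kernlab's kernel-learning framework
  fit_ksvm <- ksvm(Species ~ ., data = iris, kernel = "rbfdot", C = 1)
  fit_ksvm                                   # prints kernel parameters and training error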
9. Bayesian Methods: Bayesian Additive Regression Trees (BART), where the final model is defined in terms of the sum over many weak learners (not unlike ensemble methods), are implemented in package BayesTree. Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes, including Bayesian CART and treed linear models, are made available by package tgp.
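A minimal sketch with bart() from BayesTree on simulated regression data (data and MCMC settings are illustrative; fitting takes a moment):

  library(BayesTree)
  set.seed(1)
  x <- matrix(runif(200 * 5), ncol = 5)
  y <- 2 * x[, 1] + sin(pi * x[, 2]) + rnorm(200, sd = 0.2)
  ## sum-of-trees model; ndpost posterior draws after nskip burn-in iterations
  fit <- bart(x.train = x, y.train = y, ndpost = 500, nskip = 100)
  summary(fit$yhat.train.mean - y)           # in-sample residuals of the posterior mean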
10. Model selection and validation: Package e1071 has function tune() for hyperparameter tuning, and function errorest() (ipred) can be used for error rate estimation. The cost parameter C for support vector machines can be chosen utilizing the functionality of package svmpath. Functions for ROC analysis and other visualisation techniques for comparing candidate classifiers are available from package ROCR. Package caret provides miscellaneous functions for building predictive models, including parameter tuning and variable importance measures; the package can be used with various parallel implementations (e.g. MPI, NWS, etc.).
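A minimal sketch of grid search with tune() from e1071, wrapping the svm() fit on iris (the parameter grids are illustrative):

  library(e1071)
  set.seed(1)
  tuned <- tune(svm, Species ~ ., data = iris,
                ranges = list(cost = 10^(-1:2), gamma = c(0.1, 0.5, 1)))
  summary(tuned)            # cross-validated error for each parameter combination
  tuned$best.parameters     # selected cost and gamma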
11. Elements of Statistical Learning: Data sets, functions and examples from the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman have been packaged and are available as ElemStatLearn.
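A minimal sketch, assuming the prostate data set is among those shipped with ElemStatLearn:

  library(ElemStatLearn)
  data(prostate)    # prostate cancer data used in the book
  str(prostate)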
