
Object-Oriented programming in R

Susana Eyheramendy

Introduction
Object-oriented programming (OOP) has become a widely used and valuable tool for software engineering. Its value derives from the fact that it is often easier to design, write and maintain software when there is some clear separation of the data representation from the operations that are to be performed on it.

Introduction
In an OOP system, real physical things are generally represented by classes, and methods (functions) are written to handle the different manipulations that need to be performed on the objects.

Many people's view of OOP is based on a class-centric system, in which classes define objects and there are repositories for the methods that can act on those objects.

R separates the class specification from the specification of generic functions; it is a function-centric system.

Introduction
R supports two internal OOP systems: S3 and S4.

S3 is easy to use but can be made unreliable through nothing other than bad luck, or a poor choice of names.

S4 is better suited for developing large software projects but has an increased complexity of use.

Introduction

Four general elements that an OOP language should support:

Objects: encapsulate state information and control behavior.
Classes: describe general properties for groups of objects.
Inheritance: new classes can be defined in terms of existing classes.
Polymorphism: a (generic) function has different behaviors, although similar outputs, depending on the class of one or more of its arguments.

Introduction
In S3, there is no formal specification for classes and hence there is a weak control of objects and inheritance. The emphasis of the S3 system was on generic functions and polymorphism.

In S4, formal class definitions were included in the language and based on these, more controlled software tools and paradigms for the creation of objects and the handling of inheritance were introduced.

The basics of OOP

Classes describe the objects that will be represented in computer code. A class specification details all the properties that are needed to describe an object. An object is an instance of exactly one class and it is the class definition and representation that determine the properties of the object. Instances of a class differ only in their state. New classes can be defined in terms of existing classes through an operation called inheritance.

The basics of OOP

Inheritance allows new classes to extend existing ones, often by adding new slots or by combining two or more existing classes into a single composite entity.

If a class A extends the class B, then we say that B is a superclass of A, and equivalently that A is a subclass of B.

No class can be its own subclass. A class is a subclass of each of its superclasses.

The basics of OOP

If the language only allows a class to extend, at most, one class, then we say that language has single inheritance.

Computing the class hierarchy is then very simple, since the resulting hierarchy is a tree and there is a single unique path from any node to the root of the tree. This path yields the class linearization.

In the S3 system, the class of an instance is determined by the values in the class attribute, which is a vector.

The basics of OOP

If the language allows a class to directly extend several classes, then we say that the language supports multiple inheritance and computing the class linearization is more difficult.

S4 supports multiple inheritance (as well as multiple dispatch).

The basics of OOP

A method is a type of function that is invoked depending on the class of one or more of its arguments and this process is called dispatch. While in some systems, such as S3, methods can be invoked directly, it is more common for them to be invoked via a generic function. When a generic function is invoked, the set of methods that might apply must be sorted into a linear order, with the most specific method first and the least specific method last. This is often called method linearization and computing it depends on being able to linearize the class hierarchy.

The basics of OOP

If the language supports dispatching on a single argument, then we say it has single dispatch. The S3 system uses single dispatch. When the language supports dispatching on several arguments, we say that the language supports multiple dispatch and the set of specific classes of the arguments for each formal parameter of the generic function is called the signature. S4 supports multiple dispatch. With multiple dispatch, the additional complication of precedence of the arguments arises. In particular, when method selection depends on inheritance, there may be more than one superclass for which a method has been defined. In this case, a concept of the distance between the class and its superclasses is used to guide selection.
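As a rough illustration of multiple dispatch, the sketch below defines an S4 generic whose method is chosen from the classes of both arguments. The classes Circle and Square and the generic pairLabel are hypothetical names invented for this example; they are not part of the original material.

```r
library(methods)

# Hypothetical classes, used only for this illustration
setClass("Circle", representation(r = "numeric"))
setClass("Square", representation(side = "numeric"))

# The formal arguments (a, b) of the generic form its signature
setGeneric("pairLabel", function(a, b) standardGeneric("pairLabel"))

# One method per combination of argument classes
setMethod("pairLabel", signature("Circle", "Square"),
          function(a, b) "circle-square")
setMethod("pairLabel", signature("Square", "Circle"),
          function(a, b) "square-circle")

# Least specific method: used when no closer signature matches
setMethod("pairLabel", signature("ANY", "ANY"),
          function(a, b) "generic pair")

ci <- new("Circle", r = 1)
sq <- new("Square", side = 2)

pairLabel(ci, sq)  # selects the Circle, Square method
pairLabel(sq, ci)  # selects the Square, Circle method
pairLabel(ci, ci)  # falls back to the ANY, ANY method
```

Swapping the argument order changes the selected method, which single dispatch on the first argument alone could not express.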

The basics of OOP

The evaluation process for a call to a generic function is roughly as follows:

The actual classes of supplied arguments that match the signature of the generic function are determined. Based on these, the available methods are ordered from most specific to least. Then, after evaluating any code supplied in the generic, control is transferred to the most specific method.

In S4, a generic function has a fixed set of named formal arguments and these form the basis of the signature. Any call to the generic will be dispatched with respect to its signature.

The basics of OOP: inheritance

Consider as an example of the S4 system the modeling of airline passengers. The FreqFlyer class has every slot that an instance of the Passenger class has. The relationship between a subclass and its superclasses should be an "is a" relationship. Every frequent flyer is a passenger, but not all passengers are frequent flyers. Sometimes the notion of subclass and superclass can be confusing. One reason that the more specialized class is called a subclass is that the set of objects that can be used exchangeably with the FreqFlyer class is a subset of those that can be used exchangeably with the Passenger class. In the example below, we provide a very basic S4 implementation of the Passenger and FreqFlyer classes.

> setClass("Passenger", representation(name = "character",
+     origin = "character", destination = "character"))
[1] "Passenger"
> setClass("FreqFlyer", representation(ffnumber = "numeric"),
+     contains = "Passenger")
[1] "FreqFlyer"

We then say that FreqFlyer is a subclass of Passenger and that Passenger is a superclass of FreqFlyer.

Exercise
Define a class for passenger names that has slots for the first name, middle initial and last name. Change the definition of the Passenger class to reflect your new class. Does this change the inheritance properties of the Passenger class or the FreqFlyer class?

The basics of OOP: inheritance

> getClass("FreqFlyer")
Slots:

Name:     ffnumber        name      origin destination
Class:     numeric   character   character   character

Extends: "Passenger"

> subClassNames("Passenger")
[1] "FreqFlyer"
> superClassNames("FreqFlyer")
[1] "Passenger"
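To make the subclass relationship concrete, the following sketch creates instances of the two classes and checks the "is a" relationship with is(). The class definitions repeat the ones from the slides so the block is self-contained; the instance values are illustrative only.

```r
library(methods)

setClass("Passenger", representation(name = "character",
    origin = "character", destination = "character"))
setClass("FreqFlyer", representation(ffnumber = "numeric"),
    contains = "Passenger")

p <- new("Passenger", name = "Josephine Biologist",
         origin = "SEA", destination = "YXY")
ff <- new("FreqFlyer", name = "Josephine Physicist",
          origin = "SEA", destination = "YVR", ffnumber = 10)

is(ff, "Passenger")   # TRUE: every frequent flyer is a passenger
is(p, "FreqFlyer")    # FALSE: not every passenger is a frequent flyer
ff@destination        # inherited slot, accessed with @

# Slot types are checked when an object is created:
# a numeric name is rejected for the character slot.
bad <- try(new("Passenger", name = 42), silent = TRUE)
inherits(bad, "try-error")
```

The failed call to new() shows the kind of control over object creation that the formal class definitions provide.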

The basics of OOP: Dispatch

A method is a specialized function that can be applied to instances of one or more classes (objects).

The process of determining the appropriate method to invoke is called dispatch.

A call to a function, such as plot, will invoke a method that is determined by the class of the first argument in the call to plot.

The basics of OOP: Dispatch

When a generic function is called, it must examine the supplied arguments and determine the applicable methods. All applicable methods are arranged from most specific to least specific and the most specific method is invoked.

During evaluation, control may be passed to less specific methods by calling NextMethod in S3 and via callNextMethod for S4.

The basics of OOP: Dispatch

For example, consider a print method for passengers that prints their names and flight details. A print method for frequent flyers could simply invoke the passenger method, and then add a line indicating the frequent flyer number. Using this approach, very little additional code is needed; and if the printing of passenger information is changed, the update is automatically applied to printing of frequent flyer information.
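One possible way to realize this idea in S4 is sketched below, using show methods and callNextMethod. The output wording is an assumption made for illustration; only the class definitions come from the slides.

```r
library(methods)

setClass("Passenger", representation(name = "character",
    origin = "character", destination = "character"))
setClass("FreqFlyer", representation(ffnumber = "numeric"),
    contains = "Passenger")

# Display method for passengers: name and flight details
setMethod("show", "Passenger", function(object) {
  cat("Passenger:", object@name, "\n")
  cat("Flight:", object@origin, "to", object@destination, "\n")
})

# Display method for frequent flyers: reuse the passenger method,
# then add one line with the frequent flyer number
setMethod("show", "FreqFlyer", function(object) {
  callNextMethod()
  cat("Frequent flyer number:", object@ffnumber, "\n")
})

ff <- new("FreqFlyer", name = "Josephine Physicist",
          origin = "SEA", destination = "YVR", ffnumber = 10)
out <- capture.output(show(ff))
out
```

If the Passenger method is later changed, the FreqFlyer display picks up the change without any further edits.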

The basics of OOP: Dispatch

With both S3 and S4, dispatching is implemented through the use of generic functions.

Many programmers believe that object-oriented programming (OOP) makes for clearer, more reusable code. Though very different from the familiar OOP languages like C++, Java, and Python, R is very much OOP in outlook.

The following themes are key to R:

Everything you touch in R, ranging from numbers to character strings to matrices, is an object.
R promotes encapsulation, which is packaging separate but related data items into one class instance. Encapsulation helps you keep track of related variables, enhancing clarity.
R classes are polymorphic, which means that the same function call leads to different operations for objects of different classes. For instance, a call to print() on an object of a certain class triggers a call to a print function tailored to that class. Polymorphism promotes reusability.
R allows inheritance, which allows extending a given class to a more specialized class.

The S3 system

S3 is the original R structure for classes.
S3 is still the dominant class paradigm in R use today.
Most of R's built-in classes are of the S3 type.
An S3 class consists of a list, with a class name attribute and dispatch capability added, which enables the use of generic functions.
S4 classes were developed later with the goal of adding safety (you cannot accidentally access a class component that is not already in existence).

The S3 system

Generic functions and methods are widely used but there is little use of inheritance and classes are quite loosely defined.
Some classes are internal or implicit and others are specified explicitly, typically by using the class attribute.
One determines the class of an object using the function class().

The S3 system

The class attribute is a vector of character values, each of which specifies a particular class. The most specific class comes first, followed by any less specific classes.

> x = 1:10
> attr(x, "class")
NULL
> class(x)
[1] "integer"

One can determine the class of an object using the function class, and for most purposes this is sufficient; however, there are some important exceptions that arise with respect to internal functions. While there is no formal mechanism for organizing or representing instances of a class, they are typically lists, where the different slots are represented as named elements in the list. Using setOldClass will register an S3 class as an S4 class.

For our frequent flyer example, the class vector should always have FreqFlyer first and Passenger second. The recommended way of testing whether an S3 object is an instance of a particular class is to use the inherits function. Direct inspection of the class attribute is not recommended since implicit classes, such as matrix and array, are not listed in the class attribute. Notice in the code below that the class of x changes when a dimension attribute is added, that there is no class attribute, and that once x is a matrix it is no longer considered to be an integer. (In R 4.0.0 and later, class(x) for a matrix returns both "matrix" and "array".)

> dim(x) = c(2, 5)
> class(x)
[1] "matrix"
> attr(x, "class")
NULL
> inherits(x, "integer")
[1] FALSE

The S3 system

A way of testing whether an S3 object is an instance of a particular class is to use the inherits function. In the next example we return to our FreqFlyer example and provide an S3 implementation.

Example

> x = list(name = "Josephine Biologist", origin = "SEA",
+     destination = "YXY")
> class(x) = "Passenger"
> y = list(name = "Josephine Physicist", origin = "SEA",
+     destination = "YVR", ffnumber = 10)
> class(y) = c("FreqFlyer", "Passenger")
> inherits(x, "Passenger")
[1] TRUE
> inherits(x, "FreqFlyer")
[1] FALSE
> inherits(y, "Passenger")
[1] TRUE

The S3 system

A major problem with this approach is that there is no mechanism programmers can use to ensure that all instances of the Passenger or FreqFlyer classes have the correct slots, the correct types of values in those slots, and the correct class attribute. One can easily produce an object with these classes that has none of the slots we have defined. As a result, one typically has to do a great deal of checking of arguments in every S3 method.

The function is.object tests whether or not an R object has a class attribute. This is somewhat important as the help page for class indicates that some dispatch is restricted to objects for which is.object is true.

> x = 1:10
> is.object(x)
[1] FALSE

R Programming for Bioinformatics

> class(x) = "myint"
> is.object(x)
[1] TRUE
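Because S3 itself enforces nothing, a common workaround is to route object creation through a constructor function that performs the checks once. The sketch below assumes a hypothetical helper named new_passenger; it is not part of the original material, and S3 does not require it.

```r
# Hypothetical constructor: validates its arguments, then sets the class
new_passenger <- function(name, origin, destination) {
  stopifnot(is.character(name), length(name) == 1L,
            is.character(origin), length(origin) == 1L,
            is.character(destination), length(destination) == 1L)
  structure(list(name = name, origin = origin, destination = destination),
            class = "Passenger")
}

p <- new_passenger("Josephine Biologist", "SEA", "YXY")
inherits(p, "Passenger")      # TRUE

# The constructor rejects a wrongly typed argument ...
res <- try(new_passenger(42, "SEA", "YXY"), silent = TRUE)
inherits(res, "try-error")    # TRUE

# ... but nothing stops code that bypasses it: this object claims the
# class while having none of the expected elements.
fake <- structure(list(), class = "Passenger")
inherits(fake, "Passenger")   # TRUE
```

The last lines show why S3 methods often still need their own argument checks: the class attribute is only a label.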

The S3 system: Implicit classes

The earliest versions of the S language predate the widespread use of object-oriented programming and hence the class representations for some of the more primitive or basic classes do not use the class attribute.

For example, functions and closures are implicitly of class function while matrices and arrays are implicitly of classes matrix and array, respectively.
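A short sketch of what "implicit" means in practice: class() reports a class even though no class attribute is stored on the object. The objects f and m are arbitrary examples.

```r
f <- function(x) x + 1
m <- matrix(1:6, nrow = 2)

class(f)                 # "function", although ...
attr(f, "class")         # ... no class attribute is stored (NULL)

"matrix" %in% class(m)   # TRUE: the implicit class is reported
attr(m, "class")         # NULL
oldClass(m)              # NULL: oldClass only reads the attribute
is.object(m)             # FALSE: no class attribute has been set
```

The membership test with %in% is used for the matrix because the exact value of class(m) differs between R versions.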

The S3 system

> v <- 1:10
> v
 [1]  1  2  3  4  5  6  7  8  9 10
> attributes(v)
NULL
> class(v)
[1] "integer"
> class(v) <- "character"
> attributes(v)
NULL
> class(v)
[1] "character"

OOP in the lm() Linear Model function

As an example, let's look at a simple regression analysis run via R's lm() function. First, let's see what lm() does:

> ?lm

The output of this help query will tell you, among other things, that this function returns an object of class "lm". Let's try creating an instance of this object and then printing it:

> x <- c(1,2,3)
> y <- c(1,3,8)
> lmout <- lm(y ~ x)
> class(lmout)
[1] "lm"
> lmout

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       -3.0          3.5

OOP in the lm() Linear Model function

# S3 classes
library(car) # for data
mod.prestige <- lm(prestige ~ income + education + women, data=Prestige)
attributes(mod.prestige)
$names
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values"
 [6] "assign"        "qr"            "df.residual"   "xlevels"       "call"
[11] "terms"         "model"
$class
[1] "lm"
class(mod.prestige)
[1] "lm"

S3 generic functions and methods

The generic function is responsible for setting up the evaluation environment and for initiating dispatch.

A generic function does this through a call to UseMethod that initiates the dispatch on a single argument, usually the first argument to the generic function.

The generic is typically a very simple function with only two formal arguments, one often named x and the other the ... argument.

OOP in the lm() Linear Model function

What happened here?

In R terminology, the call to the generic function print() was dispatched to the method print.lm() associated with the class "lm".

S3 generic functions and methods

Because the generic passes along the ... argument, methods can add new arguments that are appropriate to the computations they will perform. A disadvantage of this approach is that mistakes in naming arguments will be silently ignored. The mistyped name will not match any formal argument and hence is placed in the ... argument, where it is never used.

In R, UseMethod dispatches on the class as returned by class, not that returned by oldClass. Not all method dispatch honors implicit classes. In particular, group generics and internal generics do not. Group generics dispatch on the oldClass for efficiency reasons, and internal generics only dispatch on objects for which is.object is TRUE. An internal generic is a function that calls directly to C code (a primitive or internal function), and there checks to see if it should dispatch. To make use of these, you will need to explicitly set the class attribute. You can do that using class<-, oldClass<- or by setting the attribute directly using attr<-.

For most generic functions, a default method will be needed. The default method is invoked if no applicable methods are found, or if the least specific method makes a call to NextMethod.

Methods are regular functions and are identified by their name, which is a concatenation of the name of the generic and the name of the class that they are intended to apply to, separated by a dot. A simple generic function named fun and a default method are shown below. The string default is used as if it were a class and indicates that the method is a default method for the generic.

> fun = function(x, ...) UseMethod("fun")
> fun.default = function(x, ...) print("In the default method")
> fun(2)
[1] "In the default method"
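The silent-ignore problem with the ... argument can be seen in a small sketch. The generic area, the class circle and the argument scale are hypothetical names chosen for this illustration.

```r
# Hypothetical generic and method, for illustration only
area <- function(shape, ...) UseMethod("area")
area.default <- function(shape, ...) stop("don't know how to compute area")

# The method adds its own argument, scale, on top of the generic's formals
area.circle <- function(shape, scale = 1, ...) scale * pi * shape$r^2

c1 <- structure(list(r = 1), class = "circle")

area(c1, scale = 2)   # intended call: scale is matched
area(c1, scael = 2)   # typo: 'scael' lands in ... and scale stays at 1
```

The second call returns the unscaled area without any warning, which is exactly the kind of mistake that is hard to notice.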
S3 generic functions and methods

Next, consider a class system with two classes, Foo which extends Bar. Then we define two methods: fun.Foo and fun.Bar. We have them print out a message, call the function NextMethod and then print out a second message.

> fun.Foo = function(x) {
+     print("start of fun.Foo")
+     NextMethod()
+     print("end of fun.Foo")
+ }
> fun.Bar = function(x) {
+     print("start of fun.Bar")
+     NextMethod()
+     print("end of fun.Bar")
+ }

S3 generic functions and methods

Now we can show how dispatch occurs by creating an instance that has both classes and calling fun with that instance as the first argument.

> x = 1
> class(x) = c("Foo", "Bar")
> fun(x)
[1] "start of fun.Foo"
[1] "start of fun.Bar"
[1] "In the default method"
[1] "end of fun.Bar"
[1] "end of fun.Foo"

Notice that the call to NextMethod transfers control to the next most specific method. This is one of the benefits of using an OOP paradigm. Typically, less code needs to be written, and it is easier to maintain.

OOP in the lm() Linear Model function

Here, we printed out the object lmout. (Remember that by simply typing the name of an object in interactive mode, the object is printed.) The R interpreter then saw that lmout was an object of class "lm" and thus called print.lm(), a special print method for the "lm" class. In R terminology, the call to the generic function print() was dispatched to the method print.lm() associated with the class "lm". Let's take a look at the generic function and the class method in this case:

> print
function(x, ...) UseMethod("print")
<environment: namespace:base>
> print.lm
function (x, digits = max(3, getOption("digits") - 3), ...)
{
    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")
    if (length(coef(x))) {
        cat("Coefficients:\n")
        print.default(format(coef(x), digits = digits), print.gap = 2,
            quote = FALSE)
    }
    else cat("No coefficients\n")
    cat("\n")
    invisible(x)
}
<environment: namespace:stats>

You may be surprised to see that print() consists solely of a call to UseMethod().

OOP in the lm() Linear Model function

Don't worry about the details of print.lm(). The main point is that the printing depends on context, with a special print function called for the "lm" class. Now let's see what happens when we print this object with its class attribute removed:

> unclass(lmout)
$coefficients
(Intercept)           x
       -3.0         3.5

$residuals
   1    2    3
 0.5 -1.0  0.5

$effects
(Intercept)           x
  -6.928203   -4.949747    1.224745

$rank
[1] 2
...

I've shown only the first few lines here; there's a lot more. The author of lm() decided to make print.lm() much more concise, limiting it to printing a few key quantities.

S3 generic functions and methods

Finding methods

Due to the somewhat simple nature of the S3 system, there is very little introspection or reflection possible. The function methods reports on all available methods for a given generic function but it does this simply by looking at the names. We demonstrate its use on the S3 generic function mean in the code below.

> methods("mean")
[1] mean.Date       mean.POSIXct    mean.POSIXlt
[4] mean.data.frame mean.default    mean.difftime

One can also use methods to find all available methods for a given class. In the code below we find all methods for the class glm.

> methods(class = "glm")
 [1] add1.glm*           anova.glm
 [3] confint.glm*        cooks.distance.glm*
 [5] deviance.glm*       drop1.glm*
 [7] effects.glm*        extractAIC.glm*
 [9] family.glm*         formula.glm*
[11] influence.glm*      logLik.glm*
[13] model.frame.glm     predict.glm
[15] print.glm           residuals.glm
[17] rstandard.glm       rstudent.glm
[19] summary.glm         vcov.glm*
[21] weights.glm*

   Non-visible functions are asterisked

The S3 system

# S3 generic functions and methods
print # the print generic
print.lm # print method for "lm" objects
mod.prestige
print(mod.prestige) # equivalent
print.lm(mod.prestige) # equivalent, but bad form
methods("print") # print methods
methods(class="lm") # methods for objects of class "lm"
 [1] add1.lm*           alias.lm*          anova.lm           case.names.lm*
 [5] confint.lm*        cooks.distance.lm* deviance.lm*       dfbeta.lm*
 [9] dfbetas.lm*        drop1.lm*          dummy.coef.lm*     effects.lm*
[13] extractAIC.lm*     family.lm*         formula.lm*        hatvalues.lm
[17] influence.lm*      kappa.lm           labels.lm*         logLik.lm*
[21] model.frame.lm     model.matrix.lm    plot.lm            predict.lm
[25] print.lm           proj.lm*           residuals.lm       rstandard.lm
[29] rstudent.lm        simulate.lm*       summary.lm         variable.names.lm*
[33] vcov.lm*

   Non-visible functions are asterisked

The S3 system

You can find the invisible functions via the function getAnywhere(). For example, print.aspell() is in the utils namespace, and it can be called by adding such a qualifier:

> utils:::print.aspell(aspout)
mispelled
wrds:1:15

You can see all the generic methods this way:

> methods(class="default")
...
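As a small sketch of locating a method without relying on it being visible, the block below looks up print.lm with getAnywhere() and retrieves the same function with getS3method(). Both functions belong to the utils package; the choice of print.lm as the target is only an example.

```r
# Where does a function called "print.lm" live?
found <- utils::getAnywhere("print.lm")
found$where          # locations in which an object of that name was found

# Retrieve the method for generic "print" and class "lm" directly,
# whether or not it is exported from its namespace
pm <- utils::getS3method("print", "lm")
is.function(pm)      # TRUE
```

Using getS3method() avoids hard-coding a namespace with the ::: operator when all that is needed is the method itself.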

Writing S3 classes

A class instance is created by forming a list, with the components of the list being the member variables of the class.

The class attribute is set by hand by using the attr() or class() function.

Writing S3 classes

S3 classes have a rather cobbled-together structure. A class instance is created by forming a list, with the components of the list being the member variables of the class. (Readers who know Perl may recognize this ad hoc nature in Perl's own OOP system.) The "class" attribute is set by hand by using the attr() or class() function, and then various implementations of generic functions are defined. We can see this in the case of lm() by inspecting the function:

> lm
...
z <- list(coefficients = if (is.matrix(y))
            matrix(,0,3) else numeric(0L), residuals = y,
            fitted.values = 0 * y, weights = w, rank = 0L,
            df.residual = if (is.matrix(y)) nrow(y) else length(y))
}
...
class(z) <- c(if(is.matrix(y)) "mlm", "lm")
...

Again, don't mind the details; the basic process is there. A list was created and assigned to z, which will serve as the framework for the "lm" class instance (and which will eventually be the value returned by the function). Some components of that list, such as residuals, were already assigned when the list was created. In addition, the class attribute was set to "lm" (and possibly to "mlm", as will be explained in the next section).

Writing S3 classes

As an example of how to write an S3 class, let's switch to something simpler. Continuing with an employee example, we could write this:

> j <- list(name="Joe", salary=55000, union=TRUE)
> class(j) <- "employee"
> attributes(j) # let's check
$names
[1] "name"   "salary" "union"
$class
[1] "employee"

Before we write a print method for this class, let's see what happens when we call the default print():

> j
$name
[1] "Joe"
$salary
[1] 55000
$union
[1] TRUE
attr(,"class")
[1] "employee"

Essentially, j was treated as a list for printing purposes.

Writing S3 classes

Now let's write our own print method:

print.employee <- function(wrkr) {
   cat(wrkr$name,"\n")
   cat("salary",wrkr$salary,"\n")
   cat("union member",wrkr$union,"\n")
}

So, any call to print() on an object of class "employee" should now be referred to print.employee(). We can check that formally:

> methods(,"employee")
[1] print.employee

Or, of course, we can simply try it out:

> j
Joe
salary 55000
union member TRUE

Using inheritance

The idea of inheritance is to form new classes as specialized versions of old ones. In our previous employee example, for instance, we could form a new class devoted to hourly employees, "hrlyemployee", as a subclass of "employee", as follows:

k <- list(name="Kate", salary= 68000, union=FALSE, hrsthismonth= 2)
class(k) <- c("hrlyemployee","employee")

Our new class has one extra variable: hrsthismonth. The name of the new class consists of two character strings, representing the new class and the old class. Our new class inherits the methods of the old one. For instance, print.employee() still works on the new class:

> k
Kate
salary 68000
union member FALSE

Given the goals of inheritance, that is not surprising.
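If the subclass should print its extra variable as well, one option is to give it its own print method that first delegates to the inherited one through NextMethod(). The method print.hrlyemployee below is a hypothetical addition for illustration, and print.employee is restated with a ... argument so the block is self-contained.

```r
# Print method for the parent class (restated from the employee example)
print.employee <- function(x, ...) {
  cat(x$name, "\n")
  cat("salary", x$salary, "\n")
  cat("union member", x$union, "\n")
  invisible(x)
}

# Hypothetical method for the subclass: reuse the parent's output,
# then add the one field that only hourly employees have
print.hrlyemployee <- function(x, ...) {
  NextMethod()
  cat("hours this month", x$hrsthismonth, "\n")
  invisible(x)
}

k <- list(name = "Kate", salary = 68000, union = FALSE, hrsthismonth = 2)
class(k) <- c("hrlyemployee", "employee")

out <- capture.output(print(k))
out
```

Dispatch now finds print.hrlyemployee first, and NextMethod() passes control on to print.employee, the next class in the class vector.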

S3 system: group generic

The S3 object system has the capability for defining methods for groups of functions simultaneously. These tools are mainly used to define methods for three defined sets of operators.

This means that operators such as == or < can have their behavior modified for members of special classes.

The functions and operators have been grouped into three categories and group methods can be written for each of these categories.

Group generic functions

Group     Functions
Math      abs, acos, acosh, asin, asinh, atan, atanh, ceiling, cos, cosh, cumsum, exp, floor, gamma, lgamma, log, log10, round, signif, sin, sinh, tan, tanh, trunc
Summary   all, any, max, min, prod, range, sum
Ops       +, -, *, /, ^, <, >, <=, >=, !=, ==, %%, %/%, &, |, !

Table 3.1: Group generic functions.

Group generic functions

It is possible to write methods specific to any function within a group, and a method defined for a single member of a group takes precedence over the group method.
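A minimal sketch of a group method for the Ops group follows. The class money and its constructor are hypothetical; the method handles binary operators only and uses .Generic, which R sets to the name of the operator that was actually called.

```r
# Hypothetical class: a number tagged with the class "money"
money <- function(x) structure(x, class = "money")

# One method covers every binary operator in the Ops group
Ops.money <- function(e1, e2) {
  # Apply the operator that was called to the underlying numbers
  value <- get(.Generic)(unclass(e1), unclass(e2))
  # Arithmetic results stay "money"; comparisons return plain logicals
  if (.Generic %in% c("+", "-", "*", "/")) money(value) else value
}

a <- money(5)
b <- money(7)

total <- a + b
class(total)      # "money"
unclass(total)    # 12
a < b             # TRUE
a == b            # FALSE
```

A single definition thus changes the behavior of +, <, == and the other operators in the group for this class.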

The S3 system

# S3 "inheritance"
mod.mroz <- glm(lfp ~ ., family=binomial, data=Mroz)
class(mod.mroz)

The S3 system

# Example: a logistic-regression function
lreg3 <- function(X, y, predictors=colnames(X), max.iter=10,
        tol=1E-6, constant=TRUE) {
    if (!is.numeric(X) || !is.matrix(X))         # data checks
        stop("X must be a numeric matrix")
    if (!is.numeric(y) || !all(y == 0 | y == 1))
        stop("y must contain only 0s and 1s")
    if (nrow(X) != length(y))
        stop("X and y contain different numbers of observations")
    if (constant) {                              # attach constant?
        X <- cbind(1, X)
        colnames(X)[1] <- "Constant"
    }
    b <- b.last <- rep(0, ncol(X))
    it <- 1
    while (it <= max.iter){
        p <- as.vector(1/(1 + exp(-X %*% b)))
        var.b <- solve(crossprod(X, p * (1 - p) * X))
        b <- b + var.b %*% crossprod(X, y - p)
        if (max(abs(b - b.last)/(abs(b.last) + 0.01*tol)) < tol) break
        b.last <- b
        it <- it + 1
    }
    if (it > max.iter) warning("maximum iterations exceeded")
    dev <- -2*sum(y*log(p) + (1 - y)*log(1 - p))
    result <- list(coefficients=as.vector(b), var=var.b,
        deviance=dev, converged= it <= max.iter,
        predictors=predictors)
    class(result) <- "lreg3"                     # assign class
    result
}

Mroz$lfp <- with(Mroz, ifelse(lfp == "yes", 1, 0))
Mroz$wc <- with(Mroz, ifelse(wc == "yes", 1, 0))
Mroz$hc <- with(Mroz, ifelse(hc == "yes", 1, 0))
mod.mroz.3 <- with(Mroz, lreg3(cbind(k5, k618, age, wc, hc, lwg, inc), lfp))
class(mod.mroz.3)
mod.mroz.3   # whoops!

print.lreg3 <- function(x, ...)   # print method for class "lreg3"
{
    coef <- x$coefficients
    names(coef) <- x$predictors
    print(coef)
    if (!x$converged) cat("\n *** lreg did not converge ***\n")
    invisible(x)   # note: returns its argument invisibly
}
mod.mroz.3

summary   # summary generic

summary.lreg3 <- function(object, ...)   # summary method for class "lreg3"
{
    b <- object$coefficients
    se <- sqrt(diag(object$var))
    z <- b/se
    table <- cbind(b, se, z, 2*(1-pnorm(abs(z))))   # two-sided p-values
    colnames(table) <- c("Estimate", "Std.Err", "Z value", "Pr(>|z|)")
    rownames(table) <- object$predictors
    result <- list(coef=table, deviance=object$deviance,
                   converged=object$converged)
    class(result) <- "summary.lreg3"   # creates an object of class "summary.lreg3"
    result
}

print.summary.lreg3 <- function(x, ...)   # print method for class "summary.lreg3"
{
    printCoefmat(x$coef)
    cat("\nDeviance =", x$deviance,"\n")
    if (!x$converged) cat("\n Note: *** lreg did not converge ***\n")
    invisible(x)   # print methods conventionally return their argument invisibly
}
summary(mod.mroz.3)

# writing a generic function
names(summary(mod.prestige))

rsq <- function(model, ...) {
    UseMethod("rsq")
}

rsq.lm <- function(model, adjusted=FALSE, ...) {
    summary <- summary(model)
    if (adjusted) summary$adj.r.squared else summary$r.squared
}

rsq(mod.prestige)
rsq(mod.prestige, adjusted=TRUE)
rsq(mod.mroz)   # dispatches to rsq.lm via inheritance, but doesn't work:
                # the summary of a glm has no r.squared component

names(summary(mod.mroz))
names(mod.mroz)

rsq.glm <- function(model, ...) {
    1 - model$deviance/model$null.deviance
}
rsq(mod.mroz)

caused UseMethod() to search for a print method on the first of k's two class names, "hrlyemployee". That search failed, so UseMethod() tried the other name, "employee", and found print.employee(). It executed the latter. Recall that in inspecting the code for "lm", you saw this line:

class(z) <- c(if(is.matrix(y)) "mlm", "lm")

You can now see that "mlm" is a subclass of "lm" for vector-valued response variables.

S3 example: A class for storing upper triangular matrices

Now it's time for a more involved example, in which we will write an R class "ut" for upper-triangular matrices. These are square matrices whose elements below the diagonal are zeros, such as shown in Equation 9.1.

\[
\begin{pmatrix} 1 & 5 & 12 \\ 0 & 6 & 9 \\ 0 & 0 & 2 \end{pmatrix} \tag{9.1}
\]

Our motivation here is to save storage space (though at the expense of a little extra access time) by storing only the nonzero portion of the matrix.

NOTE: The R class "dist" also uses such storage, though in a more focused context and without the class functions we have here.

Only the diagonal and above-diagonal elements will be stored, in column-major order. Storage for the matrix (9.1), for instance, consists of the vector (1,5,6,12,9,2), and the component mat has that value:

> mat <- c(1,5,6,12,9,2)

We will include a component ix in this class, to show where in mat the various columns begin. For the preceding case, ix is c(1,2,4), meaning that column 1 begins at mat[1], column 2 begins at mat[2], and column 3 begins at mat[4]. This allows for handy access to individual elements or columns of the matrix. The following is the code for our class.

# class "ut", compact storage of upper-triangular matrices

# utility function, returns 1+...+i
sum1toi <- function(i) return(i*(i+1)/2)

# create an object of class "ut" from the full matrix inmat (0s included);
# this function is a constructor
ut <- function(inmat) {
   n <- nrow(inmat)
   rtrn <- list()  # start to build the object
   class(rtrn) <- "ut"
   rtrn$mat <- vector(length=sum1toi(n))
   # ix: vector that contains where in mat each column begins
   rtrn$ix <- sum1toi(0:(n-1)) + 1
   for (i in 1:n) {
      # store column i
      ixi <- rtrn$ix[i]
      rtrn$mat[ixi:(ixi+i-1)] <- inmat[1:i,i]
   }
   return(rtrn)
}

# uncompress utmat to a full matrix
expandut <- function(utmat) {
   n <- length(utmat$ix)  # numbers of rows and cols of matrix
   fullmat <- matrix(nrow=n,ncol=n)
   for (j in 1:n) {
      # fill jth column
      start <- utmat$ix[j]
      fin <- start + j - 1
      abovediagj <- utmat$mat[start:fin]  # above-diag part of col j
      fullmat[,j] <- c(abovediagj,rep(0,n-j))
   }
   return(fullmat)
}

# print matrix
print.ut <- function(utmat)
   print(expandut(utmat))

# multiply one ut matrix by another, returning another ut instance;
# implement as a binary operation
"%mut%" <- function(utmat1,utmat2) {
   n <- length(utmat1$ix)  # numbers of rows and cols of matrix
   utprod <- ut(matrix(0,nrow=n,ncol=n))
   for (i in 1:n) {  # compute col i of product
      # let a[j] and bj denote columns j of utmat1 and utmat2, respectively,
      # so that, e.g. b2[1] means element 1 of column 2 of utmat2
      # then column i of product is equal to
      #    bi[1]*a[1] + ... + bi[i]*a[i]
      # find index of start of column i in utmat2
      startbi <- utmat2$ix[i]
      # initialize vector that will become bi[1]*a[1] + ... + bi[i]*a[i]
      prodcoli <- rep(0,i)
      for (j in 1:i) {  # find bi[j]*a[j], add to prodcoli
         startaj <- utmat1$ix[j]
         bielement <- utmat2$mat[startbi+j-1]
         prodcoli[1:j] <- prodcoli[1:j] +
            bielement * utmat1$mat[startaj:(startaj+j-1)]
      }
      # now need to tack on the lower 0s
      startprodcoli <- sum1toi(i-1)+1  # same index as startbi
      utprod$mat[startbi:(startbi+i-1)] <- prodcoli
   }
   return(utprod)
}

In this code, the call to ut() allocates space for the product matrix, and the ith column of the product is computed as a linear combination of the columns of utmat1, with coefficients taken from column i of utmat2.

Column i of the product can be expressed as a linear combination of the columns of the first factor. It will help to see a specific example of this property, shown in Equation 9.2.

\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 5 \end{pmatrix}
\begin{pmatrix} 4 & 3 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 4 & 5 & 9 \\ 0 & 1 & 4 \\ 0 & 0 & 5 \end{pmatrix} \tag{9.2}
\]

The comments say that, for instance, column 3 of the product is equal to the following:

\[
2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ 2 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 3 \\ 2 \\ 5 \end{pmatrix}
\]

Inspection of Equation 9.2 confirms the relation.

Couching the multiplication problem in terms of columns of the two input matrices enables us to compact the code and to likely increase the speed. The latter again stems from vectorization, a benefit discussed in detail in Chapter 14. This approach is used in the loop beginning at line 53 of the original listing (the inner loop over j). (Arguably, in this case, the increase in speed comes at the expense ...
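The column relation in Equation 9.2 can be checked directly with ordinary full matrices. This is only a small verification sketch; the variable names are chosen here for illustration and it does not use the "ut" class.

```r
# The two factors from Equation 9.2, stored as ordinary full matrices.
a <- matrix(c(1, 2, 3,
              0, 1, 2,
              0, 0, 5), nrow = 3, byrow = TRUE)
b <- matrix(c(4, 3, 2,
              0, 1, 2,
              0, 0, 1), nrow = 3, byrow = TRUE)

full_product <- a %*% b

# Column 3 of the product as a linear combination of the columns of a,
# weighted by the entries of column 3 of b.
col3 <- b[1, 3] * a[, 1] + b[2, 3] * a[, 2] + b[3, 3] * a[, 3]

all(col3 == full_product[, 3])
```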

S3 Example: A procedure for polynomial regression


Consider a statistical regression setting with one predictor variable. In principle, you can get better and better models by fitting polynomials of higher and higher degrees. However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value.

The class "polyreg" aims to deal with this issue. It fits polynomials of various degrees but assesses fits via cross-validation to reduce the risk of overfitting. In this form of cross-validation, known as the leaving-one-out method, for each point we fit the regression to all the data except this observation, and then we predict that observation from the fit. An object of this class consists of outputs from the various regression models, plus the original data.
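The leaving-one-out idea can be sketched independently of the "polyreg" class. The following is a minimal illustration on simulated data; the data, the helper name loo_mspe, and the use of lm.fit() on an explicit matrix of powers are assumptions made here and are not the slides' implementation.

```r
set.seed(1)                        # simulated data, for illustration only
n <- 30
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Leave-one-out mean squared prediction error for a polynomial of given degree.
loo_mspe <- function(degree) {
  X <- outer(x, 0:degree, "^")     # columns 1, x, x^2, ..., x^degree
  errs <- numeric(n)
  for (i in seq_len(n)) {
    # fit without observation i, then predict observation i from that fit
    beta <- lm.fit(X[-i, , drop = FALSE], y[-i])$coefficients
    errs[i] <- y[i] - sum(X[i, ] * beta)
  }
  mean(errs^2)
}

mspe <- sapply(1:6, loo_mspe)      # one cross-validated error per degree
best_degree <- which.min(mspe)
```

Comparing the entries of mspe shows how the cross-validated error changes with the degree, rather than relying on the in-sample fit, which always improves as the degree grows.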

The following is the code for the "polyreg" class.

# "polyreg," S3 class for polynomial regression in one predictor variable

# polyfit(y,x,maxdeg) fits all polynomials up to degree maxdeg; y is
# vector for response variable, x for predictor; creates an object of
# class "polyreg"
polyfit <- function(y,x,maxdeg) {
   # form powers of predictor variable, ith power in ith column
   pwrs <- powers(x,maxdeg)  # could use orthog polys for greater accuracy
   lmout <- list()  # start to build class
   class(lmout) <- "polyreg"  # create a new class
   for (i in 1:maxdeg) {
      lmo <- lm(y ~ pwrs[,1:i])
      # extend the lm class here, with the cross-validated predictions
      lmo$fitted.cvvalues <- lvoneout(y,pwrs[,1:i,drop=FALSE])
      lmout[[i]] <- lmo
   }
   lmout$x <- x
   lmout$y <- y
   return(lmout)
}

# print() for an object fits of class "polyreg": print
# cross-validated mean-squared prediction errors
print.polyreg <- function(fits) {
   maxdeg <- length(fits) - 2
   n <- length(fits$y)
   tbl <- matrix(nrow=maxdeg,ncol=1)
   colnames(tbl) <- "MSPE"
   for (i in 1:maxdeg) {
      fi <- fits[[i]]
      errs <- fits$y - fi$fitted.cvvalues
      spe <- crossprod(errs,errs)  # sum of squared prediction errors
      tbl[i,1] <- spe/n
   }
   cat("mean squared prediction errors, by degree\n")
   print(tbl)
}

# forms matrix of powers of the vector x, through degree dg
powers <- function(x,dg) {
   pw <- matrix(x,nrow=length(x))
   prod <- x
   if (dg >= 2) {  # guard: 2:dg would count down when dg is 1
      for (i in 2:dg) {
         prod <- prod * x
         pw <- cbind(pw,prod)
      }
   }
   return(pw)
}

# finds cross-validated predicted values; could be made much faster via
# matrix-update methods
lvoneout <- function(y,xmat) {
   n <- length(y)
   predy <- vector(length=n)
   for (i in 1:n) {
      # regress, leaving out ith observation
      lmo <- lm(y[-i] ~ xmat[-i,])
      betahat <- as.vector(lmo$coef)
      # the 1 accommodates the constant term
      predy[i] <- betahat %*% c(1,xmat[i,])
   }
   return(predy)
}

# polynomial function of x, coefficients cfs
# (note: defining poly() here masks stats::poly)
poly <- function(x,cfs) {
   val <- cfs[1]
   prod <- 1
   dg <- length(cfs) - 1
   for (i in 1:dg) {
      prod <- prod * x
      val <- val + cfs[i+1] * prod
   }
   return(val)  # without this, the function would return NULL
}

# plot() method for class "polyreg"; plots fits against raw data
plot.polyreg <- function(fits) {
   plot(fits$x,fits$y,xlab="X",ylab="Y")  # plot data points as background
   maxdg <- length(fits) - 2
   cols <- c("red","green","blue")
   dg <- curvecount <- 1
   while (dg < maxdg) {
      prompt <- paste("RETURN for XV fit for degree",dg,"or type degree",
         "or q for quit ")
      rl <- readline(prompt)
      dg <- if (rl == "") dg else if (rl != "q") as.integer(rl) else break
      lines(fits$x,fits[[dg]]$fitted.values,col=cols[curvecount%%3 + 1])
      dg <- dg + 1
      curvecount <- curvecount + 1
   }
}

S4 classes
Some programmers feel that S3 does not provide the safety normally associated with OOP. For example, consider our earlier employee database example where our class "employee" had three fields: name, salary, and union. Here are some possible mishaps:

- We forget to enter the union status.
- We misspell union as onion.
- We create an object of some class other than "employee" but accidentally set its class attribute to "employee".

In each of these cases, R will not complain. The goal of S4 is to elicit a complaint and prevent such accidents.
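The three mishaps can be reproduced in a few lines. This is an illustrative sketch with made-up data (the object names and values are assumptions); none of the statements below raises an error or a warning.

```r
# Mishap 1: the union status is forgotten.
joe <- list(name = "Joe", salary = 55000)
class(joe) <- "employee"

# Mishap 2: "union" is misspelled as "onion".
kate <- list(name = "Kate", salary = 68000, onion = TRUE)
class(kate) <- "employee"

# Mishap 3: an unrelated object is labeled "employee".
not_an_employee <- 1:3
class(not_an_employee) <- "employee"

is.null(joe$union)                      # TRUE: the field simply does not exist
is.null(kate$union)                     # TRUE: only "onion" was stored
inherits(not_an_employee, "employee")   # TRUE: R trusts the class attribute
```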

Overview of the differences between S3 and S4 classes

S4 structures are considerably richer than S3 structures, but here we present just the basics. Table 9-1 shows an overview of the differences between the two classes.

Table 9-1: Basic R Operators

Operation                   S3                            S4
Define class                Implicit in constructor code  setClass()
Create object               Build list, set class attr    new()
Reference member variable   $                             @
Implement generic f()       Define f.classname()          setMethod()
Declare generic             UseMethod()                   setGeneric()

Writing S4 classes

You define an S4 class by calling setClass(). Continuing our employee example, we could write the following:
> setClass("employee",
+    representation(
+    name="character",
+    salary="numeric",
+    union="logical")
+ )
[1] "employee"

This defines a new class, "employee", with three member variables of the specified types.

Now let's create an instance of this class, for Joe, using new(), a built-in constructor function for S4 classes:
> joe <- new("employee",name="Joe",salary=55000,union=TRUE)
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 55000

Slot "union":
[1] TRUE

Note that the member variables are called slots, referenced via the @ symbol. Here's an example:
> joe@salary
[1] 55000

We can also use the slot() function, say, as another way to query Joe's salary:
> slot(joe,"salary")
[1] 55000

We can assign components similarly. Let's give Joe a raise:
> joe@salary <- 65000
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 65000

Slot "union":
[1] TRUE

Nah, he deserves a bigger raise than that:
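The slide does not show the code for the bigger raise. A plausible sketch, assuming the replacement form of slot() is used and that the new salary is 88000 (the value that appears in later output), is the following; the class is re-created so the example is self-contained.

```r
library(methods)

# Re-create the class and the object so the sketch runs on its own.
setClass("employee",
         representation(name = "character",
                        salary = "numeric",
                        union = "logical"))
joe <- new("employee", name = "Joe", salary = 65000, union = TRUE)

# Replacement form of slot(); equivalent to joe@salary <- 88000.
slot(joe, "salary") <- 88000
joe@salary
```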

Writing S4 classes
As noted, an advantage of using S4 is safety. To illustrate this, suppose we were to accidentally spell salary as salry, like this:
> joe@salry <- 48000
Error in checkSlotAssignment(object, name, value) :
  "salry" is not a slot in class "employee"

By contrast, in S3 there would be no error message. S3 classes are just lists, and you are allowed to add a new component (deliberately or not) at any time.

Implementing a generic function on an S4 class

To define an implementation of a generic function on an S4 class, use setMethod(). Let's do that for our class "employee" here. We'll implement the show() function, which is the S4 analog of S3's generic "print".

As you know, in R, when you type the name of a variable while in interactive mode, the value of the variable is printed out:
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 88000

Slot "union":
[1] TRUE

Since joe is an S4 object, the action here is that show() is called. In fact, we would get the same output by typing this:
> show(joe)

Let's override that, with the following code:
setMethod("show", "employee",
   function(object) {
      inorout <- ifelse(object@union,"is","is not")
      cat(object@name,"has a salary of",object@salary,
         "and",inorout, "in the union", "\n")
   }
)

The first argument gives the name of the generic function for which we will define a class-specific method, and the second argument gives the class name. We then define the new function. Let's try it out:
> joe
Joe has a salary of 88000 and is in the union

S4 system

The S4 system was designed to overcome some of the deficiencies of the S3 system as well as to provide other functionality that was simply missing from the S3 system.

Among the major changes between S3 and S4 are the explicit representation of classes, together with tools that support programmatic inspection of the class definitions and properties.

Multiple dispatch is supported in S4, but not in S3, and S4 methods are registered directly with the appropriate generic.

These changes greatly increase the stability of the system and make it much more likely that code will perform as intended by its authors.

This comes with some costs; code is slightly slower and it is more difficult to design and modify a system interactively.
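Multiple dispatch means that the method is chosen from the classes of more than one argument. A minimal sketch, using hypothetical classes "Cat" and "Dog" and a hypothetical generic meet() that are not part of the slides:

```r
library(methods)

# Hypothetical classes, used only for illustration.
setClass("Cat", representation(name = "character"))
setClass("Dog", representation(name = "character"))

# The generic dispatches on the classes of BOTH arguments.
setGeneric("meet", function(a, b) standardGeneric("meet"))

setMethod("meet", signature("Cat", "Dog"),
          function(a, b) paste(a@name, "hisses at", b@name))
setMethod("meet", signature("Dog", "Cat"),
          function(a, b) paste(a@name, "chases", b@name))
setMethod("meet", signature("Cat", "Cat"),
          function(a, b) paste(a@name, "ignores", b@name))

tom <- new("Cat", name = "Tom")
rex <- new("Dog", name = "Rex")

meet(tom, rex)   # method chosen for the signature (Cat, Dog)
meet(rex, tom)   # method chosen for the signature (Dog, Cat)

# Methods are registered with the generic and can be inspected programmatically.
existsMethod("meet", signature("Dog", "Dog"))
```

The last call returns FALSE, since no method was registered for two dogs; calling meet() on two "Dog" objects would signal an error rather than silently picking some other function.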

S4 system: classes
A class definition specifies the structure, inheritance and initialization of instances of that class. A class is defined by a call to the function setClass. The following arguments can be specified in the call to setClass:

Class: a character string naming the class.
representation: a named vector of types or classes. The names correspond to the slot names in the class and the types indicate what type of value can be stored in the slot.
contains: a character vector of class names, indicating the classes extended or subclassed by the new class.
prototype: an object (usually a list) providing the default data for the slots specified in the representation.
validity: a function that checks the validity of instances of the class. It must return either TRUE or a character vector describing how the object is invalid.
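The arguments listed above can be combined as in the following sketch. The class "Interval" and its slots are hypothetical, chosen only to show representation, prototype and validity working together.

```r
library(methods)

# Hypothetical class "Interval" with two numeric slots.
setClass("Interval",
         representation(lo = "numeric", hi = "numeric"),
         prototype = prototype(lo = 0, hi = 1),
         validity = function(object) {
           # return TRUE, or a character vector describing the problem
           if (length(object@lo) != 1 || length(object@hi) != 1)
             return("lo and hi must each have length 1")
           if (object@lo > object@hi)
             return("lo must not exceed hi")
           TRUE
         })

unit <- new("Interval")                   # slots take the prototype values
ok   <- new("Interval", lo = 2, hi = 5)   # passes the validity check

# An invalid object is rejected when it is created.
bad <- tryCatch(new("Interval", lo = 5, hi = 2),
                error = function(e) conditionMessage(e))
```

After running this, bad holds the error message, which includes the string returned by the validity function.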

S4 system: classes

Once a class has been defined by a call to setClass, it is possible to create instances of the class through calls to new. The prototype argument can be used to define default values to use for the different components of the class. Prototype values can be overridden by expressly setting the value for the slot in the call to new.

In the code below, we create a new class named A that has a single slot, s1, that contains numeric data and we set the prototype for that slot to be 0.

Example

> setClass("A", representation(s1 = "numeric"),
+     prototype = prototype(s1 = 0))
[1] "A"
> myA = new("A")
> myA
An object of class "A"
Slot "s1":
[1] 0

> m2 = new("A", s1 = 10)
> m2
An object of class "A"
Slot "s1":
[1] 10

S4 system: classes

We can create a second class B that contains A, so that B is a direct subclass of A or, put another way, B inherits from class A. Any instance of the class B will have all the slots in the A class and any additional ones defined specifically for B. Duplicate slot names are not allowed, so the slot names for B must be distinct from those for A.

Example

> setClass("B", contains = "A", representation(s2 = "character"),
+     prototype = list(s2 = "hi"))
[1] "B"
> myB = new("B")
> myB
An object of class "B"
Slot "s2":
[1] "hi"

Slot "s1":
[1] 0

S4 system: classes

Classes can be removed using the function removeClass. However, this is not especially useful since you cannot remove classes from attached packages. The removeClass function is most useful when experimenting with class creation interactively. But in most cases, users are developing classes within packages, and the simple expedient of removing the class definition and rebuilding the package is generally used instead. We demonstrate the use of this function on a user-defined class in the code below.

Example

> setClass("Ohno", representation(y = "numeric"))
[1] "Ohno"
> getClass("Ohno")
Slots:

Name:        y
Class: numeric
> removeClass("Ohno")
[1] TRUE
> tryCatch(getClass("Ohno"), error = function(x) "Ohno is gone")
[1] "Ohno is gone"

S4 system: classes

Introspection

Once a class has been defined, there are a number of software tools that can be used to find out about that class. These include getSlots, which will report the slot names and types, and the function slotNames, which will report only the slot names. These functions are demonstrated using the class A defined above.

Example

> getSlots("A")
       s1
"numeric"
> slotNames("A")
[1] "s1"

S4 system: classes

The class itself can be retrieved using getClass.

The function extends can be called with either the name of a single class, or two class names.

If called with two class names, it returns TRUE if its first argument is a subclass of its second argument. If called with a single class name, it returns the names of all the classes that the class extends (its superclasses), including the class itself.

Additional helper functions have been defined in the RBioinf package, superClassNames and subClassNames, to print the names of the superclasses and of the subclasses, respectively. The use of these functions is shown in the code below.

Example

> extends("B")
[1] "B" "A"
> extends("B", "A")
[1] TRUE
> extends("A", "B")
[1] FALSE
> superClassNames("B")
[1] "A"
> subClassNames("A")
[1] "B"

These functions also provide information about built-in classes that have been converted via setOldClass.

> getClass("matrix")
No Slots, prototype of class "matrix"

Extends:
Class "array", directly
Class "structure", by class "array", distance 2
Class "vector", by class "array", distance 3, with explicit coerce

Known Subclasses:
Class "array", directly, with explicit test and coerce
> extends("matrix")
[1] "matrix"    "array"     "structure" "vector"

S4 system: classes

To determine whether or not a class has been defined, use isClass.

You can test whether or not an R object is an instance of an S4 class using isS4.

All S4 objects should also return TRUE for is.object, but so will any object with a class attribute.
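The three tests can be compared side by side. The class "Point" and the S3 object below are hypothetical examples introduced only for this sketch.

```r
library(methods)

setClass("Point", representation(x = "numeric", y = "numeric"))  # hypothetical S4 class
p <- new("Point", x = 1, y = 2)

# An ordinary S3 object: a list with a class attribute.
s3obj <- structure(list(x = 1, y = 2), class = "point3")

isClass("Point")         # TRUE: the S4 class has been defined
isClass("NoSuchClass")   # FALSE
isS4(p)                  # TRUE
isS4(s3obj)              # FALSE
is.object(p)             # TRUE
is.object(s3obj)         # TRUE as well: it has a class attribute
is.object(1:3)           # FALSE: no class attribute
```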

S4 system: classes

The standard mechanism for coercing objects from one class to another is the function as, which has two forms. One form is coercion where an instance of one class is coerced to the other class, and the second form is an assignment version, where a portion of the object supplied is coerced. The second form is really only applicable to situations where one class is a subclass of the other.

In the example below, we first create an instance of B, then coerce it to be an instance of A. The method for this is automatically available since the classes are nested, and in fact you can also coerce from the superclass to the subclass, with missing slots being filled in from the prototype.

Example

> myb = new("B")
> as(myb, "A")
An object of class "A"
Slot "s1":
[1] 0

The second form is the assignment form where we replace the A part of myb with the new values in mya.

> mya = new("A", s1 = 20)
> as(myb, "A") <- mya
> myb
An object of class "B"
Slot "s2":
[1] "hi"

Slot "s1":
[1] 20

S4 system: classes

When classes are not nested, the user must provide an explicit version of the coercion function, and optionally of the replacement function.

The syntax is setAs(from, to, def, replace), where from and to are the names of the classes between which coercion is being defined.

The coercion function is supplied as the argument def, and it must be a function of one argument, an instance of the from class, and return an instance of the to class.
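A minimal sketch of setAs for two classes that are not nested follows. The classes "Celsius" and "Fahrenheit" are hypothetical; only the from, to and def arguments are used.

```r
library(methods)

# Two hypothetical, unrelated classes.
setClass("Celsius", representation(deg = "numeric"))
setClass("Fahrenheit", representation(deg = "numeric"))

# def takes one argument, an instance of the from class,
# and returns an instance of the to class.
setAs("Celsius", "Fahrenheit",
      function(from) new("Fahrenheit", deg = from@deg * 9 / 5 + 32))

boiling <- new("Celsius", deg = 100)
f <- as(boiling, "Fahrenheit")   # uses the coercion method defined above
f@deg
```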

S4 system: classes

Once a class has been defined, users will want to create instances of that class. The creation of instances is controlled by three separate but related tools: the specification of a prototype for the class, the creation of an initialize method, or through values supplied in the call to new.

... to the initialize method hierarchy. Provided any user-supplied initialize methods have a call to callNextMethod, this hierarchy will be traversed until the default method is encountered. In this method the value is modified according to the arguments supplied to new and the result is returned. The prototype can be set using either a list or a call to prototype. In the example below, we define a class, Ex1, whose prototype has a random sample of values from the N(0, 1) distribution in its s1 slot.

Example

> setClass("Ex1", representation(s1 = "numeric"),
+     prototype = prototype(s1 = rnorm(10)))
[1] "Ex1"
> b = new("Ex1")
> b
An object of class "Ex1"
Slot "s1":
 [1] -1.3730 -0.5483  0.2648  0.0487  1.4423  0.0283  1.1793
 [8] -1.6695 -0.0536  0.0729

Exercise 3.6
What happens if you generate a second instance of the Ex1 class? Why might this not be desirable? Examine the prototype for the class and see if you can understand what has happened. Will changing the prototype to list(s1=quote(rnorm(10))) fix the problem?

When a subclass, such as B from our previous example, is defined, then a prototype is constructed from the prototypes of the superclasses for slots that are not specified in the prototype for the subclass. We see, below, that the prototype for B has a value for the s1 slot, even though none was formally supplied, and that value is the one for the superclass A.

> bb = getClass("B")
> bb@prototype
<S4 Type Object>
attr(,"s2")
[1] "hi"
attr(,"s1")
[1] 0

If desired, one can define an initialize method for a class. The default initialize method takes either named arguments, where the names are those of slots, or one or more unnamed arguments that correspond to instances of the superclasses.

Example

In the example below, we define two new classes, one a simple class, W, and then a class that is a subclass of both A, defined earlier, and W. When creating new instances of W and A, we made use of named arguments to the initialize method, but when creating a new instance of the WA class, we used the unnamed variant and supplied instances of the superclasses.

> setClass("W", representation(c1 = "character"))
[1] "W"
> setClass("WA", contains = (c("A", "W")))
[1] "WA"
> a1 = new("A", s1 = 20)
> w1 = new("W", c1 = "hi")
> new("WA", a1, w1)
An object of class "WA"
Slot "s1":
[1] 20

Slot "c1":
[1] "hi"
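A user-supplied initialize method of the kind described above might look like the following sketch. The class Interval and its slots are hypothetical names chosen for illustration; the only point is that the method calls callNextMethod so that the default method can still fill the slots from the prototype and from the arguments given to new.

```r
library(methods)

# Hypothetical class used only for this sketch
setClass("Interval",
         representation(lower = "numeric", upper = "numeric"),
         prototype = prototype(lower = 0, upper = 1))

# callNextMethod() hands control to the default initialize method,
# which fills the slots from the named arguments given to new()
setMethod("initialize", "Interval", function(.Object, ...) {
    .Object <- callNextMethod(.Object, ...)
    # swap the endpoints if they were supplied in the wrong order
    if (.Object@lower > .Object@upper) {
        tmp <- .Object@lower
        .Object@lower <- .Object@upper
        .Object@upper <- tmp
    }
    .Object
})

iv1 <- new("Interval")                        # slots come from the prototype
iv2 <- new("Interval", lower = 5, upper = 2)  # slots come from the call to new
```

Here iv1 keeps the prototype values, while iv2 is built from the supplied values and then adjusted by the user-defined method.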

Types of classes
A class can be instantiable or virtual. Direct instances of virtual classes cannot be created.

One can test whether or not a class is virtual using isVirtualClass().
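As a small sketch of the distinction, the code below defines a virtual class and an instantiable subclass. The class names Shape and Circle are assumptions made for this illustration only.

```r
library(methods)

# "VIRTUAL" in the representation marks the class as virtual
setClass("Shape", representation("VIRTUAL", name = "character"))
setClass("Circle", contains = "Shape", representation(r = "numeric"))

vShape  <- isVirtualClass("Shape")
vCircle <- isVirtualClass("Circle")

# a direct instance of the virtual class cannot be created
attempt <- tryCatch(new("Shape"), error = function(e) "cannot instantiate")

# the subclass is instantiable and inherits the name slot
circ <- new("Circle", name = "c1", r = 2)
```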


Using S3 classes with S4 classes

S3 classes can be used to describe the contents of a slot in an S4 class, and they can be used for dispatch in S4 methods by first creating an S4 virtualization of the class. This is done with a call to setOldClass, and many such classes are created when the methods package is attached. The resulting S4 classes are virtual classes, so that instances cannot be created directly. All classes created with a call to setOldClass inherit from the class oldClass.

> setOldClass("mymatrix")
> getClass("mymatrix")
Virtual Class
No Slots, prototype of class "S4"
Extends: "oldClass"
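The two uses mentioned above, as a slot type and for dispatch, can be sketched as follows. The class Holder and the generic nCells are hypothetical names introduced only for this example.

```r
library(methods)

setOldClass("mymatrix")   # register the S3 class with the S4 system

# an S3 instance: an ordinary matrix carrying a class attribute
m <- structure(matrix(1:4, nrow = 2), class = "mymatrix")

# use 1: the registered class describes the contents of a slot
setClass("Holder", representation(data = "mymatrix"))
h <- new("Holder", data = m)

# use 2: the registered class appears in the signature of an S4 method
setGeneric("nCells", function(x) standardGeneric("nCells"))
setMethod("nCells", "mymatrix", function(x) length(unclass(x)))
cells <- nCells(m)
```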

S4 generic functions and methods


Generic functions are created by calls to setGeneric and, once created, methods can be associated with them through calls to setMethod.

The arguments of the method must conform, to some extent, with those of the generic function. The method definition indicates the class of each of the formal arguments and this is called the signature of the method. There can be, at most, one method with any signature.

S4 generic functions and methods


In most cases the call to setGeneric will follow a very simple pattern. There are a number of arguments that can be specified when calling setGeneric:

the name argument specifies the name of the generic function

the def argument provides the definition for the generic function.

S4 generic functions and methods


In almost all cases the body of the function supplied as the def argument will be a call to standardGeneric since this function is used to dispatch to methods based on the supplied arguments to the generic function, and it also establishes a default method that will be used if no function with matching signature is found.

setMethod()

setMethod(f, signature=character(), definition, where = topenv(parent.frame()), valueClass = NULL, sealed = FALSE)

f
    A generic function or the character-string name of the function.

signature
    A match of formal argument names for f with the character-string names of corresponding classes. See the details below; however, if the signature is not trivial, you should use method.skeleton to generate a valid call to setMethod.

definition
    A function definition, which will become the method called when the arguments in a call to f match the classes in signature, directly or through inheritance.

where
    The environment in which to store the definition of the method. For setMethod, it is recommended to omit this argument and to include the call in source code that is evaluated at the top level; that is, either in an R session by something equivalent to a call to source, or as part of the R source code for a package. For removeMethod, the default is the location of the (first) instance of the method for this signature.

valueClass
    Obsolete and unused, but see the same argument for setGeneric.

sealed
    If TRUE, the method so defined cannot be redefined by another call to setMethod (although it can be removed and then re-assigned).

The syntax is quite straightforward. The def argument is a function, each named argument can be dispatched on, and the ... argument should be used if other arguments to the generic will be permitted. These arguments cannot be dispatched on, however. So in the code below, the generic function has two named arguments, object and x, and methods can be defined that indicate different signatures for these two arguments.

Example

> setGeneric("foo", function(object, x) standardGeneric("foo"))
[1] "foo"
> setMethod("foo", signature("numeric", "character"), function(object, x) print("Hi, I'm method one"))
[1] "foo"

Exercise 3.9 Define another method for the generic function foo defined above, with a different signature. Test that the correct method is dispatched to for different arguments.

S4 generic functions and methods


Any argument passed through the ... argument cannot be dispatched on.

It is possible to have named arguments that are not part of the signature of the generic function. This is achieved by explicitly stating the signature for the generic function using the signature argument in the call to setGeneric.

The use of the signature argument in the call to setGeneric is demonstrated below. In that case it may make sense for a method to provide default values for the arguments not in the signature.

Example

> setGeneric("genSig", signature = c("x"), function(x, y = 1) standardGeneric("genSig"))
[1] "genSig"
> setMethod("genSig", signature("numeric"), function(x, y = 20) print(y))
[1] "genSig"
> genSig(10)
[1] 20

S4 generic functions and methods


Whether or not a function is a generic function can be determined using isGeneric.

Generic functions can be removed using removeGeneric, but this is not too useful since only generic functions defined in the user's workspace are easily removed.

To find all generic functions that are defined, and the packages that they are defined in, use the function getGenerics, with no arguments.
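A minimal sketch of these three functions follows. The generic name scratch is a throwaway name assumed for this illustration, and the code assumes it is run at the top level of an R session so that the generic lives in the user's workspace.

```r
library(methods)

# define a throwaway generic in the workspace
setGeneric("scratch", function(x) standardGeneric("scratch"))
before <- isGeneric("scratch")

# getGenerics() with no arguments returns an ObjectsWithPackage object:
# generic names in the data part, package names in the package slot
allG <- getGenerics()

# removal works here because the generic lives in the user's workspace
removed <- removeGeneric("scratch")
stillThere <- exists("scratch", mode = "function")
```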

Example

> getClass("ObjectsWithPackage")
Class "ObjectsWithPackage" [package "methods"]

Slots:

Name:      .Data   package
Class: character character

Extends:
Class "character", from data part
Class "vector", by class "character", distance 2
Class "data.frameRowLabels", by class "character", distance 2
Class "characterORMIAME", by class "character", distance 2

In the example below, we load the Biobase package and then try to find all generic functions that are defined in it.

> library("Biobase")
> allG = getGenerics()
> allGs = split(allG@.Data, allG@package)
> allGBB = allGs[["Biobase"]]
> length(allGBB)
[1] 78

Evaluation model for generic functions


When the generic function is invoked, the supplied arguments are matched to the arguments of the generic function; those that correspond to arguments in the signature of the generic are evaluated.

Once evaluation of the generic function begins, all methods registered with the generic function are inspected and the applicable methods are determined.

Evaluation model for generic functions


A method is applicable if for all arguments in its signature, the class specified in the method either matches the class of the supplied argument or is a superclass of the class of the supplied argument.

The applicable methods are ordered from most specific to least specific. Dispatch is entirely determined by the signature and the registered methods at the time evaluation of the generic function begins.
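The notion of an applicable method can be illustrated with a short sketch. The classes Base, Derived and Other and the generic whoAmI are hypothetical names used only here.

```r
library(methods)

setClass("Base", representation(id = "numeric"))
setClass("Derived", contains = "Base")
setClass("Other", representation(id = "numeric"))

setGeneric("whoAmI", function(obj) standardGeneric("whoAmI"))
setMethod("whoAmI", "Base", function(obj) "method for Base")

r1 <- whoAmI(new("Base", id = 1))      # exact match on the class
r2 <- whoAmI(new("Derived", id = 2))   # applicable: Base is a superclass of Derived
r3 <- tryCatch(whoAmI(new("Other", id = 3)),   # no applicable method
               error = function(e) "no applicable method")

# once a more specific method is registered, it is preferred
setMethod("whoAmI", "Derived", function(obj) "method for Derived")
r4 <- whoAmI(new("Derived", id = 2))
```

The last call shows that the outcome depends on which methods are registered at the time the generic is invoked.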

The syntax of method declaration

Methods are declared and assigned to generic functions through calls to setMethod.

They can be removed through a call to either removeMethod or removeMethods.

The method should have one argument matching each argument in the signature of the generic function.

These arguments can correspond to any defined class or they can be either of the two special classes: ANY and missing.

The syntax of method declaration


Use ANY if the method will accept any value for that argument.

The class missing is appropriate when the method will handle some, but not all, of the arguments in the signature of the generic.
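A brief sketch of both special classes is given below. The generic combine2 is a hypothetical name chosen for this example; the method with signature ANY, ANY acts as a catch-all, and the method with missing handles calls where the second argument is not supplied.

```r
library(methods)

setGeneric("combine2", function(x, y) standardGeneric("combine2"))

# both arguments supplied and numeric
setMethod("combine2", signature("numeric", "numeric"),
          function(x, y) x + y)

# second argument not supplied in the call
setMethod("combine2", signature("numeric", "missing"),
          function(x, y) x)

# catch-all for any other combination of classes
setMethod("combine2", signature("ANY", "ANY"),
          function(x, y) "fallback")

r1 <- combine2(1, 2)
r2 <- combine2(5)
r3 <- combine2("a", TRUE)
```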

The syntax of method declaration


When ... is an argument to the generic function, you can define methods with named arguments that will be handled by the ... argument to the generic function. But some care is needed because these arguments, in some sense, do not count. There can be only one method with any given signature (set of classes defined for the formal arguments to the generic), regardless of whether or not other argument names match.

Example

> setGeneric("bar", function(x, y, ...) standardGeneric("bar"))
[1] "bar"
> setMethod("bar", signature("numeric", "numeric"), function(x, y, d) print("Method1"))
[1] "bar"
> ##removes the method above
> setMethod("bar", signature("numeric", "numeric"), function(x, y, z) print("Method2"))
[1] "bar"
> bar(1,1,z=20)
[1] "Method2"
> bar(2,2,30)
[1] "Method2"
> tryCatch(bar(2,4,d=20), error=function(e) print("no method1"))

Example

> setGeneric("foo", function(object, x) standardGeneric("foo"))
[1] "foo"
> setMethod("foo", signature("numeric", "character"), function(object, x) print("Hi, I'm method one"))
[1] "foo"
> foo(5,"l")
[1] "Hi, I'm method one"
> foo(5,3)
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "foo", for signature "numeric","numeric"

S4 system: Accessor functions

Accessing slots directly using the @ operator relies on the implementation details of the class, and such access will make it very difficult to change that implementation. In many cases it will be advantageous to provide accessor functions for some, or all, of the components of an object. Suppose that the class Foo has a slot named a. To create an accessor function for this slot, we create a generic function named a and a method for instances of the class Foo.

> setClass("Foo", representation(a = "ANY"))
[1] "Foo"
> setGeneric("a", function(object) standardGeneric("a"))
[1] "a"
> setMethod("a", "Foo", function(object) object@a)
[1] "a"
> b = new("Foo", a = 10)
> a(b)
[1] 10
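To illustrate why accessors make it easier to change an implementation, the sketch below gives two hypothetical classes, TempC and TempK, that store a temperature differently but share one accessor generic, celsius. All of these names are assumptions introduced for this example. Code that uses only the accessor works with either representation.

```r
library(methods)

# two representations of the same quantity
setClass("TempC", representation(celsius = "numeric"))
setClass("TempK", representation(kelvin = "numeric"))

# one accessor generic, one method per representation
setGeneric("celsius", function(object) standardGeneric("celsius"))
setMethod("celsius", "TempC", function(object) object@celsius)
setMethod("celsius", "TempK", function(object) object@kelvin - 273.15)

# client code relies on the accessor only, never on @
isFreezing <- function(obj) celsius(obj) <= 0

f1 <- isFreezing(new("TempC", celsius = -5))
f2 <- isFreezing(new("TempK", kelvin = 300))
```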

The S4 system

# definition of S4 classes
setClass("lreg4",
    representation(coefficients="numeric", var="matrix",
        iterations="numeric", deviance="numeric",
        predictors="character"))

The S4 system
lreg4 <- function(X, y, predictors=colnames(X), constant=TRUE,
                  max.iter=10, tol=1E-6) {
    if (!is.numeric(X) || !is.matrix(X))
        stop("X must be a numeric matrix")
    if (!is.numeric(y) || !all(y == 0 | y == 1))
        stop("y must contain only 0s and 1s")
    if (nrow(X) != length(y))
        stop("X and y contain different numbers of observations")
    if (constant) {
        X <- cbind(1, X)
        colnames(X)[1] <- "Constant"
    }
    b <- b.last <- rep(0, ncol(X))
    it <- 1
    while (it <= max.iter){
        p <- as.vector(1/(1 + exp(-X %*% b)))
        var.b <- solve(crossprod(X, p * (1 - p) * X))
        b <- b + var.b %*% crossprod(X, y - p)
        if (max(abs(b - b.last)/(abs(b.last) + 0.01*tol)) < tol) break
        b.last <- b
        it <- it + 1
    }
    if (it > max.iter) warning("maximum iterations exceeded")
    # create an instance of the "lreg4" class:
    result <- new("lreg4", coefficients=as.vector(b), var=var.b,
                  iterations=it,
                  deviance=-2*sum(y*log(p) + (1 - y)*log(1 - p)),
                  predictors=predictors)
    result
}

The S4 system
mod.mroz.4 <- with(Mroz, lreg4(cbind(k5, k618, age, wc, hc, lwg, inc), lfp))
class(mod.mroz.4)
mod.mroz.4

The S4 system
show # the S4 generic function show

# defining an S4 method
setMethod("show", signature(object="lreg4"),
    definition=function(object) {
        coef <- object@coefficients
        names(coef) <- object@predictors
        print(coef)
    }
)

mod.mroz.4 # invokes show method

The S4 system
setMethod("summary", signature(object="lreg4"),
    definition=function(object, ...) {
        b <- object@coefficients
        se <- sqrt(diag(object@var))
        z <- b/se
        table <- cbind(b, se, z, 2*(1-pnorm(abs(z))))
        colnames(table) <- c("Estimate", "Std.Err", "Z value", "Pr(>z)")
        rownames(table) <- object@predictors
        printCoefmat(table)
        cat("\nDeviance =", object@deviance,"\n")
    }
)

summary(mod.mroz.4)

The S4 system
# Lexical scope
f <- function (x) x + a
a <- 10
x <- 5
f(2)    # x bound to 2 in frame of f(), a to 10 in global frame
x       # global x is undisturbed

f <- function (x) {
    a <- 5
    g(x)
}
g <- function(y) y + a
f(2)    # a bound to 10 in global frame
a       # global a is undisturbed

f <- function (x) {
    a <- 5
    g <- function (y) y + a
    g(x)
}
f(2)    # a is bound to 5, x to 2 in frame of f(), y to 2 in frame of g()

The S4 system
# a function that returns a closure (function + environment)
makePower <- function(power) {
    function(x) x^power
}

square <- makePower(2)
square      # power bound to 2
square(4)

cuberoot <- makePower(1/3)
cuberoot    # power bound to 1/3
cuberoot(64)
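One way to see the "function + environment" pairing is to inspect the environment attached to each closure. The sketch below repeats the definition of makePower so that it is self-contained; the name cube and the variables holding the results are assumptions made for this illustration.

```r
makePower <- function(power) {
    function(x) x^power
}

square <- makePower(2)
cube <- makePower(3)

# each closure carries its own environment holding its own binding of power
pSquare <- environment(square)$power
pCube <- environment(cube)$power
separate <- !identical(environment(square), environment(cube))
```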

The semantics of method invocation

When a generic function is invoked, the classes of all supplied arguments that are in the signature of the generic function form the target signature.

A method is said to be applicable for this target signature if for every argument in the signature the class specified by the method is the same as the class of the corresponding supplied argument, a superclass of that class, or has class ANY.

To order the applicable methods, we need a metric on the classes.

The semantics of method invocation


A simple metric is the following:

if the classes are the same, the distance is zero; if the class in the signature of the method is a direct superclass of the class of the supplied argument, then the distance is one, and so on. The distance from a class to ANY is chosen to be larger than any other distance.

The distance between an applicable method and the target signature can then be computed by summing up the distances over all arguments in the signature of the generic function, and these distances can then be used to order the methods.
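The metric can be worked through on a small hierarchy. In the sketch below the classes Animal, Dog and Puppy and the generic meet are hypothetical. For the target signature (Puppy, Puppy), the method for (Animal, Animal) has total distance 2 + 2 = 4, while the method for (Dog, Animal) has total distance 1 + 2 = 3, so the latter is the more specific one.

```r
library(methods)

setClass("Animal", representation(name = "character"))
setClass("Dog", contains = "Animal")
setClass("Puppy", contains = "Dog")

# per-class distances are recorded in the class definition
ext <- getClass("Puppy")@contains
dDog <- ext[["Dog"]]@distance        # direct superclass
dAnimal <- ext[["Animal"]]@distance  # superclass of the superclass

setGeneric("meet", function(a, b) standardGeneric("meet"))
setMethod("meet", signature("Animal", "Animal"),
          function(a, b) "Animal,Animal")
setMethod("meet", signature("Dog", "Animal"),
          function(a, b) "Dog,Animal")

p <- new("Puppy", name = "Rex")
chosen <- meet(p, p)   # the method with the smaller summed distance is used
```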

The semantics of method invocation


Once the ordered list of methods has been computed, control is passed to the most specific method.

In S4, control will return to the generic function, so post-processing is possible.
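Post-processing in the generic can be sketched with a generic whose body does more than call standardGeneric. The generic area and the classes Sq and Bad are hypothetical names for this illustration; the generic checks the value returned by whichever method was dispatched to.

```r
library(methods)

# the generic dispatches, then validates the value the method returned
setGeneric("area", function(shape) {
    value <- standardGeneric("area")
    if (!is.numeric(value) || length(value) != 1L)
        stop("area methods must return a single number")
    value
})

setClass("Sq", representation(side = "numeric"))
setMethod("area", "Sq", function(shape) shape@side^2)

# a deliberately faulty method, to show the check in the generic firing
setClass("Bad", representation(side = "numeric"))
setMethod("area", "Bad", function(shape) "oops")

ok <- area(new("Sq", side = 3))
bad <- tryCatch(area(new("Bad", side = 1)),
                error = function(e) conditionMessage(e))
```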

Finding methods
We will often need to be able to determine which methods are registered with a particular generic function.

At other times we will want to be able to determine whether a particular signature will be handled by a generic.

Functionality of this sort is provided by the functions listed next.

Finding methods

showMethods shows the methods for one or more generic functions. The class argument can be used to ask for all methods that have a particular class in their signature. The output is printed to stdout by default and cannot easily be captured for programmatic use.

getMethod returns the method for a specific generic function whose signature is congruent with the specified signature. An error is thrown if no such method exists.

findMethod returns the packages in the search path that contain a definition for the generic and signature specified.

Finding methods

selectMethod returns the method for a specific generic function and signature, but differs from getMethod in that inheritance is used to identify a method.

existsMethod tests for a method with a congruent signature (to that provided) registered with the specified generic function. No inheritance is used. Returns either TRUE or FALSE.

hasMethod tests for a method with a congruent signature for the specified generic function. It seems that this would always return TRUE (since there must be a default method). It does return FALSE if there is no generic function, but it seems that there are better ways to handle that.
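The difference between lookups that use inheritance and those that do not can be sketched as follows. The classes Vehicle and Car and the generic honk are hypothetical; a method is registered for the superclass only and then queried with the subclass.

```r
library(methods)

setClass("Vehicle", representation(wheels = "numeric"))
setClass("Car", contains = "Vehicle")

setGeneric("honk", function(v) standardGeneric("honk"))
setMethod("honk", "Vehicle", function(v) "beep")

# no inheritance: there is no method defined for Car itself
direct <- existsMethod("honk", "Car")
viaGet <- tryCatch(getMethod("honk", "Car"), error = function(e) NULL)

# with inheritance: the Vehicle method applies to Car
inherited <- hasMethod("honk", "Car")
viaSelect <- selectMethod("honk", "Car")
res <- viaSelect(new("Car", wheels = 4))
```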

Finding Documentation
Either a direct call to help or the use of the ? operator will obtain the help page for most functions.

To find out about classes an infix syntax is used; for example, the syntax for displaying the help page for the graph class, from the graph package, is:

class?graph
help("graph-class")
Finding Documentation
Help for generic functions requires no special syntax; one just looks for help on the name of the generic function.

The syntax for two different ways to find the help page for a method for the nodes generic function, for an argument of class graphNEL:

method?nodes("graphNEL")
help("nodes,graphNEL-method")

Finding Documentation
library(RBioinf)
S4Help()

The function takes the name of either an S4 generic function or an S4 class and provides a selection menu to choose a help page.

Managing S3 and S4 together


Testing for inheritance is done differently between S3 and S4. The former uses the function inherits while the latter uses is.

> setOldClass(c("C1", "C2"))
> is(x, "C2")
[1] TRUE

The function isS4 returns TRUE for an instance of an S4 class. For primitive functions that support dispatch, S4 methods are restricted to S4 objects. The function asS4 can be used to allow an instance of an S3 class to be passed to an S4 method. In the next example we show that when x is an S3 instance, we do not dispatch to the S4 method, but once we use asS4, then dispatch to the S4 method occurs.

> x = 1
> setClass("A", representation(s1 = "numeric"))
[1] "A"
> setMethod("+", c("A", "A"), function(e1, e2) print("howdy"))
[1] "+"
> class(x) = "A"
> x + x
[1] 2
attr(,"class")
[1] "A"
> asS4(x) + x
[1] "howdy"
[1] "howdy"
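The two inheritance tests and isS4 can be compared side by side in a short sketch. The S3 object x is built by hand here, and the S4 classes P1 and P2 are hypothetical names; both are assumptions made for this illustration.

```r
library(methods)

# an S3 object with a two-level class attribute, registered for S4 use
setOldClass(c("C1", "C2"))
x <- structure(list(), class = c("C1", "C2"))

s3way <- inherits(x, "C2")   # S3-style test
s4way <- is(x, "C2")         # S4-style test, works after setOldClass

# a genuine S4 instance
setClass("P1", representation(v = "numeric"))
setClass("P2", contains = "P1")
obj <- new("P2", v = 1)
s4obj <- is(obj, "P1")

# isS4 distinguishes the two kinds of object
flags <- c(isS4(x), isS4(obj))
```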