
Extracting information from data:

Principal Component Analysis, Robustness, and Outlier Detection

Introduction to R. Graphical Representation of Multivariate Observations
Laboratory Guide I

1. Start by opening R and changing the working directory using the menu: File > Change dir...
Choose an appropriate path in which to save your work, e.g.

C:\ROLIVEIRA\MODCLIM\Project1

2. Try some simple commands by typing them in the command window:

1+1
10*3
c(1,2,3)
c(1,2,3)*10
x <- 5
x*x
exp(1)

3. Create a vector and a matrix of your choice, e.g.

lx<-c(15:1, 3:11, 7*2, 8)
m<-matrix(lx,13,2,byrow=TRUE)

Compute the natural logarithm (base e), the base-10 logarithm, and the square root of the vector lx.

log(lx)
log(lx,10)
sqrt(lx)

Multiply m by the first two entries of lx. Then multiply m by its transpose.

m%*%lx[1:2]
m%*%t(m)
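
As a quick check on the two products above, the following sketch inspects the dimensions of each result and contrasts the matrix product %*% with the element-wise product *. It redefines lx and m so that it runs on its own; the names p1, p2 and e are chosen here only for illustration.

```r
lx <- c(15:1, 3:11, 7*2, 8)          # 26 values
m  <- matrix(lx, 13, 2, byrow=TRUE)  # 13 rows, 2 columns

p1 <- m %*% lx[1:2]   # (13 x 2) times (2 x 1) gives a 13 x 1 matrix
p2 <- m %*% t(m)      # (13 x 2) times (2 x 13) gives a 13 x 13 matrix
dim(p1)
dim(p2)
isSymmetric(p2)       # a matrix times its own transpose is symmetric

# element-wise product: each entry of m is multiplied by itself
e <- m * m
dim(e)

# the first row of m is (15, 14), so the first entry of p1 is 15*15 + 14*14
p1[1,1]
```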

4. The Iris data is a famous multivariate dataset and will be used to illustrate the
use of various commands. Start by typing

iris
dim(iris)
colnames(iris)

These commands display the dataset, the size of the data matrix, and the names of
each column. Data can be stored in different structures. Verify that the
object iris is not a matrix but a data.frame, using the commands:

is.matrix(iris)
is.data.frame(iris)
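
To see why iris is stored as a data.frame, it may help to look at the class of each column and at what happens on conversion to a matrix. This is a small sketch; the object names X and Y are chosen here only for illustration.

```r
# iris mixes four numeric columns with one factor, so it is stored as a data.frame
str(iris)
sapply(iris, class)

# the four numeric columns can be converted to a numeric matrix
X <- as.matrix(iris[,1:4])
is.matrix(X)
is.numeric(X)

# converting all five columns forces every entry to character,
# because a matrix can only hold one type of value
Y <- as.matrix(iris)
is.character(Y)
```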

5. Calculate summary statistics for each of the first 4 columns of your dataset. Do
it in two different ways:

summary(iris[,1])
summary(iris[,2])
summary(iris[,3])
summary(iris[,4])
sd(iris[,1])
sd(iris[,2])
sd(iris[,3])
sd(iris[,4])

apply(iris[,1:4],2,summary)
apply(iris[,1:4],2,sd)
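
A third possibility, sketched below, is to work column by column with sapply() and colMeans(), which avoids the conversion to a matrix that apply() performs. The names X, s1 and s2 are illustrative only.

```r
X <- iris[,1:4]

# apply() converts the data.frame to a matrix first; sapply() works column by column
s1 <- apply(X, 2, sd)
s2 <- sapply(X, sd)
all.equal(s1, s2)

# mean, median and standard deviation of each column in one small table
rbind(mean=colMeans(X), median=sapply(X, median), sd=s2)
```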

Also use the package psych to obtain more complete summary statistics:

library(psych)

describe(iris[,1:4])
describeBy(iris[,1:4],group=iris[,5])

6. The last column gives the species of iris under study. Check how many
species there are and how many flowers belong to each category.

table(iris[,5])
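
Once the groups are known, the same summaries can be computed within each species. The sketch below uses aggregate() and tapply() from base R; the names counts and means are illustrative only.

```r
counts <- table(iris$Species)
counts

# mean of each of the four variables within each species
means <- aggregate(. ~ Species, data=iris, FUN=mean)
means

# the same for a single variable, using tapply()
tapply(iris$Petal.Length, iris$Species, mean)
```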

7. Calculate the covariance matrix and the correlations among the 4 variables that
characterize the flowers.

var(iris[,1:4])
cor(iris[,1:4])

Calculate the covariance matrix between the variables Sepal.Length, Sepal.Width
and Petal.Width.

cov(iris[,1:2],iris[,4])
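
The covariance and correlation matrices are linked: dividing each covariance by the two corresponding standard deviations gives the correlation. The following sketch checks this numerically and also confirms that the cross-covariance computed above is simply a block of the full covariance matrix. The names S, R, D and R2 are illustrative only.

```r
X <- iris[,1:4]
S <- var(X)   # covariance matrix
R <- cor(X)   # correlation matrix

# rescale S by the inverse standard deviations on both sides
D  <- diag(1/sqrt(diag(S)))
R2 <- D %*% S %*% D
all.equal(R2, R, check.attributes=FALSE)

# cov2cor() performs the same rescaling
all.equal(cov2cor(S), R)

# the cross-covariance of columns 1:2 with column 4 is a block of S
all.equal(cov(X[,1:2], X[,4]), S[1:2, 4, drop=FALSE], check.attributes=FALSE)
```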

8. Plot the observations using histograms, box plots, dot plots, etc. When possible,
assign the same color and symbol to the same species (category).

par(mfrow=c(2,2))
hist(iris[,1],prob=TRUE,xlab="Sepal.Length")
hist(iris[,2],prob=TRUE,xlab="Sepal.Width")
hist(iris[,3],prob=TRUE,xlab="Petal.Length")
hist(iris[,4],prob=TRUE,xlab="Petal.Width")

par(mfrow=c(1,1))
boxplot(iris[,1:4],xlab="")

error.bars(iris[,1:4])

boxplot(iris[,1:4])
error.bars(iris[,1:4],add=TRUE,col="red",lwd=2)

par(mfrow=c(3,2))
plot(iris[,1:2],col=iris$Species,xlab="Sepal.Length",ylab="Sepal.Width")
plot(iris[,1],iris[,3],col=iris$Species,xlab="Sepal.Length",ylab="Petal.Length")
plot(iris[,1],iris[,4],col=iris$Species,xlab="Sepal.Length",ylab="Petal.Width")
plot(iris[,2],iris[,3],col=iris$Species,xlab="Sepal.Width",ylab="Petal.Length")
plot(iris[,2],iris[,4],col=iris$Species,xlab="Sepal.Width",ylab="Petal.Width")
plot(iris[,3],iris[,4],col=iris$Species,xlab="Petal.Length",ylab="Petal.Width")

par(mfrow=c(1,1))
pairs(iris[,1:4],col=iris[,5])

pairs.panels(iris[,1:4], smooth = FALSE, scale = FALSE, density=TRUE,
ellipses=FALSE,digits = 2,col=iris[,5],hist.col="green")

par(mfrow=c(2,2))
plot(iris$Petal.Length ~ iris$Species, col="cyan")
plot(iris[,2] ~ iris$Species, col="cyan")
plot(iris[,3] ~ iris$Species, col="cyan")
plot(iris[,4] ~ iris$Species, col="cyan")
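
The scatter plots above distinguish the species by colour only. A minimal sketch of how to assign both a colour and a plotting symbol to each species, and to add a legend, is given below. The particular colours and symbols, and the names sp, cols and pchs, are arbitrary choices made for this illustration.

```r
sp   <- iris$Species
cols <- c("black", "red", "green3")  # one colour per species (arbitrary choice)
pchs <- c(1, 2, 3)                   # one plotting symbol per species (arbitrary choice)

# as.integer(sp) turns the factor into 1, 2, 3, which is used to index cols and pchs
plot(iris$Petal.Length, iris$Petal.Width,
     col=cols[as.integer(sp)], pch=pchs[as.integer(sp)],
     xlab="Petal.Length", ylab="Petal.Width")
legend("topleft", legend=levels(sp), col=cols, pch=pchs)
```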

9. Plot the density function of a univariate normal distribution with expected value
zero and unit variance. Start by using the help to see how the command curve works.

?curve
par(mfrow=c(1,1))
curve(dnorm(x),from=-4,to=4)
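
As a possible extension, a second normal density can be overlaid with add=TRUE, and two basic properties of the standard normal density can be checked numerically. The mean 1 and standard deviation 1.5 of the second curve are arbitrary values chosen for this sketch.

```r
# standard normal density
curve(dnorm(x), from=-4, to=4, ylab="density")
# normal density with mean 1 and standard deviation 1.5, overlaid in red
curve(dnorm(x, mean=1, sd=1.5), from=-4, to=4, add=TRUE, col="red", lty=2)

# the standard normal density at zero equals 1/sqrt(2*pi)
dnorm(0)
1/sqrt(2*pi)

# the area under the density is 1 (numerical integration)
area <- integrate(dnorm, -Inf, Inf)$value
area
```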

10. Draw Chernoff faces and star plots for the sparrows dataset. Start by reading
the file sparrows.txt.

sparrows<-read.table("sparrows.txt",header=TRUE)

library(graphics)
stars(sparrows)
library(aplpack)
faces(sparrows)

11. Check the objects you have in memory by typing ls(). Use rm(xpto) if you
want to delete the object xpto from memory. Save the workspace you are working
with, e.g. in the file class1.RDATA, using the command

save.image("class1.RDATA")

To read these data back, type

load("class1.RDATA")
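
save.image() stores the whole workspace. When only some objects are needed, save() or saveRDS() can be used instead. The sketch below writes to temporary files so that it does not touch the working directory; the names x, y, f and g are illustrative only.

```r
x <- c(1, 2, 3)
f <- tempfile(fileext=".RData")   # temporary file, used here only for illustration

save(x, file=f)   # save a single object instead of the whole workspace
rm(x)             # remove it from memory
load(f)           # restores x under its original name
x

g <- tempfile(fileext=".rds")
saveRDS(x, g)     # stores one object without its name
y <- readRDS(g)   # the object can be read back under any name
identical(x, y)
```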

12. Run the following demos, which illustrate some of the graphics
capabilities of R.

demo(image)
demo(graphics)
demo(persp)
demo(plotmath)
library(lattice)
demo(lattice)
library(vcd)
demo(mosaic)

13. Build a function to draw Andrews curves using the sparrows dataset. Remember
that the individual $x_i = (x_1, \ldots, x_p)$ is represented by the following function:

$$f_{x_i}(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin(t) + x_3 \cos(t) + x_4 \sin(2t) + x_5 \cos(2t) + \ldots, \quad -\pi < t < \pi$$

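# Draw the Andrews curves of the first five (standardized) columns of data.
Andrews <- function(data)
{
  n <- nrow(data)
  data <- scale(data)   # centre and scale each column

  # the first observation opens the plot
  i <- 1
  curve(data[i,1]/sqrt(2) + data[i,2]*sin(x) + data[i,3]*cos(x) +
        data[i,4]*sin(2*x) + data[i,5]*cos(2*x),
        from=-pi, to=pi, ylab="Andrews curves", add=FALSE, ylim=c(-8,8))

  # the remaining observations are added to the same plot
  for (i in 2:n)
  {
    # rows 1 to 21 are drawn in colour 1, all later rows in colour 2
    colour <- if (i <= 21) 1 else 2

    curve(data[i,1]/sqrt(2) + data[i,2]*sin(x) + data[i,3]*cos(x) +
          data[i,4]*sin(2*x) + data[i,5]*cos(2*x),
          from=-pi, to=pi, add=TRUE, col=colour)
  }
}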

sparrows<-read.table("sparrows.txt",header=TRUE)
names(sparrows)

Andrews(sparrows)
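
The function above is written for exactly five variables. One possible generalization, sketched here under the assumption that all columns are numeric, evaluates the Andrews function on a grid of t values for any number of columns and draws all curves with matplot(). The helper andrews_matrix and its argument tgrid are hypothetical names introduced for this illustration, and the built-in iris data are used so that the example runs without sparrows.txt.

```r
# Hypothetical helper, not part of the guide: Andrews curves for any number of columns.
andrews_matrix <- function(data, tgrid=seq(-pi, pi, length.out=200))
{
  X <- scale(as.matrix(data))   # standardize each column
  p <- ncol(X)

  # basis functions: 1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ...
  B <- matrix(0, nrow=length(tgrid), ncol=p)
  B[,1] <- 1/sqrt(2)
  for (j in seq_len(p)[-1])
  {
    k <- j %/% 2   # frequency: 1, 1, 2, 2, 3, ...
    B[,j] <- if (j %% 2 == 0) sin(k*tgrid) else cos(k*tgrid)
  }

  B %*% t(X)   # one row per grid point, one column per observation
}

tgrid <- seq(-pi, pi, length.out=200)
A <- andrews_matrix(iris[,1:4], tgrid)

# one curve per flower, coloured by species
matplot(tgrid, A, type="l", lty=1, col=as.integer(iris$Species),
        xlab="t", ylab="Andrews curves")
```

Building the basis matrix once and multiplying it by the transposed data replaces the explicit loop over observations, so the same code works for datasets with more or fewer than five variables.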

The R web page is available at:

http://cran.r-project.org/

For more information about the graphical capabilities of R, see:

https://cran.r-project.org/web/views/Graphics.html

And for more examples of graphical representations see:

http://www.visualcomplexity.com/vc/
