
A Report on R

NAME-KAVEENA
ROLL NO-12EE46
1. Introduction
1.1. What is R? R is an integrated suite of software facilities for data manipulation, calculation and
graphical display. It is open-source software. Among other things, it has
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
- a well-developed, simple and effective programming language (called S) which includes conditionals, loops, user-defined recursive functions and input and output facilities. (Indeed, most of the
  system-supplied functions are themselves written in the S language.)
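As a small illustration of the last two points, the sketch below defines a user-defined recursive function and applies two of R's operators on matrices. The name `fact` is chosen here for illustration only; R already ships a built-in `factorial()`.

```r
# A user-defined recursive function: a conditional plus recursion
fact <- function(n) {
  if (n <= 1) return(1)   # base case
  n * fact(n - 1)         # recursive case
}
fact(5)        # 120

# Operators on arrays: matrix() fills its data column by column
m <- matrix(1:4, nrow = 2)   # rows (1, 3) and (2, 4)
m %*% m                      # matrix product: rows (7, 15) and (10, 22)
m * m                        # element-wise product: rows (1, 9) and (4, 16)
```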
R is very much a vehicle for newly developing methods of interactive data analysis. It has developed
rapidly, and has been extended by a large collection of packages. However, most programs written in R
are essentially ephemeral, written for a single piece of data analysis.

Figure 1. R Environment

2. Objects
In every computer language, variables provide a means of accessing the data stored in memory. R does
not provide direct access to the computer's memory but rather provides a number of specialized data
structures we will refer to as objects. These objects are referred to through symbols or variables. In R,
however, the symbols are themselves objects and can be manipulated in the same way as any other object.
This is different from many other languages and has wide-ranging effects. The different types of objects include
NULL, symbol, pairlist, closure, char, logical, integer, double, etc. For example:
> x <- 1:3
> typeof(x)
[1] "integer"
> mode(x)
[1] "numeric"
> storage.mode(x)
[1] "integer"
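The claim that symbols are themselves objects can be checked directly. The sketch below uses `quote()` to obtain a symbol without evaluating it and `eval()` to look it up, and asks `typeof()` about a few other object types from the list above; the variable names are arbitrary.

```r
x <- 1:3
s <- quote(x)              # the symbol x itself, not its value
typeof(s)                  # "symbol"
eval(s)                    # looks x up in the current environment: 1 2 3
typeof(NULL)               # "NULL"
typeof(function(a) a)      # "closure": a user-defined function
typeof(pairlist(a = 1))    # "pairlist"
typeof(2.5)                # "double"
```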

2.1. Basic Types.


2.1.1. Vector Objects. Vectors can be thought of as contiguous cells containing data. Cells are
accessed through indexing operations such as x[5]. R has six basic (atomic) vector types: logical, integer,
real (double), complex, string (or character) and raw. The modes and storage modes for the different vector types
are listed in the following table.
typeof      mode        storage.mode
logical     logical     logical
integer     numeric     integer
double      numeric     double
complex     complex     complex
character   character   character
raw         raw         raw
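The table can be reproduced by asking R about one example value of each atomic type. This is a sketch; the example values are arbitrary.

```r
# One arbitrary example value for each atomic vector type
vals <- list(TRUE, 1L, 2.5, 1 + 2i, "a", as.raw(1))
# One row per value: its typeof, mode and storage.mode
tab <- t(sapply(vals, function(v)
  c(typeof = typeof(v), mode = mode(v), storage.mode = storage.mode(v))))
tab
```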

Single numbers, such as 4.2, and strings, such as "four point two", are still vectors, of length 1; there are
no more basic types. Vectors with length zero are possible (and useful). String vectors have mode and
storage mode character. A single element of a character vector is often referred to as a character string.
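A quick check of these statements; a sketch with arbitrary values. Note that `length()` counts vector elements, while `nchar()` counts the characters inside a string.

```r
length(4.2)               # 1: a single number is a length-1 vector
length("four point two")  # 1: so is a single string
z <- character(0)         # a zero-length character vector
length(z)                 # 0
mode(z)                   # "character"
nchar("four point two")   # 14 characters in the one string
```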
operation                   description
> a = c(5, 5.6, 1, 4, -5)   build the object a containing a numeric vector of dimension 5 with elements 5, 5.6, 1, 4, -5
> a[1]                      display the first element of a
> b = a[2:4]                build the numeric vector b of dimension 3 with elements 5.6, 1, 4
> d = a[c(1, 3, 5)]         build the numeric vector d of dimension 3 with elements 5, 1, -5
> 2 * a                     multiply each element of a by 2
> e = 3 / d                 build the numeric vector e of dimension 3 with elements 0.6, 3, -0.6
> log(d * e)                multiply the vectors d and e term by term and transform each term into its natural logarithm
> sum(d)                    calculate the sum of d
> length(d)                 calculate the length of d
> t(d)                      transpose d; the result is a row vector (a 1 × 3 matrix)
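The commands in the table can be run as a script; the sketch below records the values they produce. The vector `e` is an assumption made here (chosen as `3 / d`) so that `log(d * e)` is defined; with that choice every term of `d * e` equals 3.

```r
a <- c(5, 5.6, 1, 4, -5)
b <- a[2:4]            # 5.6 1.0 4.0
d <- a[c(1, 3, 5)]     # 5 1 -5
2 * a                  # 10.0 11.2 2.0 8.0 -10.0
e <- 3 / d             # assumption: e = 3/d, i.e. 0.6 3.0 -0.6
log(d * e)             # d * e is 3 in every position, so log(3) three times
sum(d)                 # 1
length(d)              # 3
dim(t(d))              # 1 3: t() returns a 1 x 3 matrix
```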

2.1.2. List Objects. A list in R is a rather loose object made of a collection of other arbitrary objects
known as its components. For instance, a list can be derived from n existing objects using the function
list:
a = list(name_1 = object_1, ..., name_n = object_n)
This command creates a list with n components, using object_1, ..., object_n as the components, each associated
with its argument name. For instance, a$name_1 will be equal to object_1. (It can also be represented as
a[[1]], but this is less practical, as it requires some bookkeeping of the order of the objects contained in the
list.) Lists are very useful in preserving information about the values of variables used within R functions,
in the sense that all relevant values can be put within a list that is the output of the corresponding
function.
> li = list(num = 1:5, y = "color", a = TRUE)
create a list with three components
> a = matrix(c(6, 2, 0, 2, 6, 0, 0, 0, 36), nrow = 3)
create a (3, 3) matrix
> res = eigen(a, symmetric = TRUE)
diagonalize a
> names(res)
display the names of the two components of res: values (the eigenvalues) and vectors (the eigenvectors)
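The sketch below accesses list components by name and by position and, following the remark that lists make convenient function outputs, bundles several results in one list. The helper `describe()` is hypothetical, written only for this illustration.

```r
li <- list(num = 1:5, y = "color", a = TRUE)
li$y          # "color": access by name
li[[1]]       # 1 2 3 4 5: access by position

a <- matrix(c(6, 2, 0, 2, 6, 0, 0, 0, 36), nrow = 3)
res <- eigen(a, symmetric = TRUE)
names(res)    # "values" "vectors"
res$values    # 36 8 4: eigenvalues in decreasing order

# Hypothetical helper that returns several results bundled in one list
describe <- function(x) list(mean = mean(x), sd = sd(x), n = length(x))
out <- describe(c(2, 4, 6))
out$mean      # 4
out$sd        # 2
```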

2.1.3. Matrix and Array Objects. The matrix class provides the R representation of matrices. A typical
entry is, for instance,
> x = matrix(vec, nrow = n, ncol = p)
which creates an n × p matrix whose elements are those of the vector vec, assuming this vector is of dimension
np. An important feature of this entry is that, in a somewhat unusual way, the components of vec are
stored by column, which means that x[1,1] is equal to vec[1], x[2,1] is equal to vec[2], and so on. If the
length of vec is not np, its components are recycled to fill the matrix, and R may issue a warning, for instance:
> matrix(1:4, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    1
[2,]    2    4    2
Warning message:
data length [4] is not a submultiple or multiple of the number of columns [3] in matrix in: matrix(1:4, ncol = 3)
> x1 = matrix(1:20, nrow = 5)
build the numeric matrix x1 of dimension 5 × 4 with first row 1, 6, 11, 16
> x2 = matrix(1:20, nrow = 5, byrow = TRUE)
build the numeric matrix x2 of dimension 5 × 4 with first row 1, 2, 3, 4
> dim(x1)
display the dimensions of x1
> x1[, 2]
select the second column of x1
> x1[c(3, 4), ]
select the third and fourth rows of x1
> x1[-2, ]
return x1 without its second row
> rbind(x1, x2)
vertical merging of x1 and x2
> cbind(x1, x2)
horizontal merging of x1 and x2
> apply(x1, 1, sum)
calculate the sum of each row of x1
> as.matrix(1:10)
turn the vector 1:10 into a 10 × 1 matrix
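The sketch below checks the column-by-column storage rule and the dimensions produced by the commands above; the values in the comments follow from that rule (row i of x1 is i, i + 5, i + 10, i + 15).

```r
x1 <- matrix(1:20, nrow = 5)                 # filled column by column
x2 <- matrix(1:20, nrow = 5, byrow = TRUE)   # filled row by row
x1[1, ]                 # 1 6 11 16
x2[1, ]                 # 1 2 3 4
x1[2, 1] == 2           # TRUE: x[2,1] is the second element of the data
dim(x1)                 # 5 4
dim(x1[-2, ])           # 4 4: the second row is dropped
dim(rbind(x1, x2))      # 10 4
dim(cbind(x1, x2))      # 5 8
apply(x1, 1, sum)       # 34 38 42 46 50
dim(as.matrix(1:10))    # 10 1
```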

Figure 2. matrix representation

A factor is a vector of characters or integers used to specify a discrete classification of the components
of other vectors with the same length. Its main difference from a standard vector is that it comes with a
levels attribute used to specify the possible values of the factor. This structure is therefore appropriate to
represent qualitative variables.
> state = c("tas", "tas", "sa", "sa", "wa")
create a vector with five values
> statef = factor(state)
distinguish entries by group
> levels(statef)
give the groups
> incomes = c(60, 59, 40, 42, 23)
create a vector of incomes
> tapply(incomes, statef, mean)
average the incomes for each group
> statef = factor(state,
+ levels = c("tas", "sa", "wa", "yo"))
define a new level, with one more group than observed
> table(statef)
count the entries in each level, including the empty level "yo"
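Running the factor commands above gives the results recorded below; the numbers follow directly from the data. Unless `levels =` is supplied, `factor()` sorts the levels alphabetically. The name `statef2` is used here only to keep both versions of the factor side by side.

```r
state <- c("tas", "tas", "sa", "sa", "wa")
statef <- factor(state)
levels(statef)                 # "sa" "tas" "wa": sorted alphabetically
incomes <- c(60, 59, 40, 42, 23)
avg <- tapply(incomes, statef, mean)
avg                            # sa 41, tas 59.5, wa 23
statef2 <- factor(state, levels = c("tas", "sa", "wa", "yo"))
table(statef2)                 # tas 2, sa 2, wa 1, yo 0
```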
2.2. Evaluation of Expressions. When a user types a command at the prompt (or when an expression
is read from a file) the first thing that happens to it is that the command is transformed by the parser
into an internal representation. The evaluator executes parsed R expressions and returns the value of the
expression. All expressions have a value. This is the core of the language.
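The two stages described above, parsing and evaluation, can be made explicit with `parse()` and `eval()`. A minimal sketch:

```r
# Parser: turn text into an internal representation (an expression object)
ex <- parse(text = "1 + 2 * 3")
class(ex)          # "expression"
ex[[1]]            # the call 1 + 2 * 3, not yet evaluated
# Evaluator: execute the parsed expression and return its value
eval(ex[[1]])      # 7
```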
2.2.1. Simple Evaluation.
2.2.2. Constants. Any number typed directly at the prompt is a constant and is evaluated.
> 1
[1] 1
Perhaps unexpectedly, the number returned from the expression 1 is a numeric (double). In most cases, the
difference between an integer and a numeric value will be unimportant as R will do the right thing when
using the numbers. There are, however, times when we would like to explicitly create an integer value
for a constant. We can do this by calling the function as.integer or using various other techniques. But
perhaps the simplest approach is to qualify our constant with the suffix character L. For example, to
create the integer value 1, we might use
> 1L
[1] 1
We can use the L suffix to qualify any number with the intent of making it an explicit integer. So 0x10L
creates the integer value 16 from the hexadecimal representation. The constant 1e3L gives 1000 as an
integer rather than a numeric value and is equivalent to 1000L. (Note that the L is treated as qualifying
the term 1e3 and not the 3.) If we qualify a value with L that is not an integer value, e.g. 1e-3L, we get
a warning and the numeric value is created. A warning is also created if there is an unnecessary decimal
point in the number, e.g. 1.L. We get a syntax error when using L with complex numbers, e.g. 12iL gives
an error. Constants are fairly boring and to do more we need symbols.
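The difference between the two kinds of constants can be seen with `typeof()`; a short sketch:

```r
typeof(1)                  # "double": a plain constant is numeric
typeof(1L)                 # "integer": the L suffix makes it an integer
0x10L                      # 16, written in hexadecimal
identical(1e3L, 1000L)     # TRUE
typeof(as.integer(1))      # "integer": the function-call alternative
identical(1, 1L)           # FALSE: same value, different types
1 == 1L                    # TRUE: the comparison converts the types
```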

2.3. Function call. Most of the computations carried out in R involve the evaluation of functions. We
will also refer to this as function invocation. Functions are invoked by name with a list of arguments
separated by commas.
> mean(1 : 10)
[1] 5.5
In this example the function mean was called with one argument, the vector of integers from 1 to 10. R
contains a huge number of functions with different purposes. Most are used for producing a result which
is an R object, but others are used for their side effects, e.g., printing and plotting functions. Function
calls can have tagged (or named) arguments, as in plot(x, y, pch = 3). Arguments without tags are known
as positional since the function must distinguish their meaning from their sequential positions among the
arguments of the call, e.g., that x denotes the abscissa variable and y the ordinate. The use of tags/names
is an obvious convenience for functions with a large number of optional arguments.
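To illustrate positional versus tagged arguments, here is a sketch with `seq()`, whose first three arguments are `from`, `to` and `by`:

```r
seq(1, 10, 3)                   # positional: from = 1, to = 10, by = 3
seq(by = 3, to = 10, from = 1)  # tagged: the order no longer matters
seq(1, 10, length.out = 4)      # mixed: positional first, then a tagged option
```

All three calls return 1 4 7 10; the tagged form stays readable even when the argument order is unfamiliar.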
2.4. Operators. R allows the use of arithmetic expressions using operators similar to those of the C
programming language, for instance
> 1 + 2
[1] 3
Expressions can be grouped using parentheses, mixed with function calls, and assigned to variables in a
straightforward manner:
> y <- 2 * (a + log(x))
R contains a number of operators. Some of them are listed in the table below.
operator   meaning
+          plus, can be unary or binary
-          minus, can be unary or binary
!          unary not
~          tilde, used for model formulae, can be either unary or binary
?          help
:          sequence, binary (in model formulae: interaction)
/          division, binary
>          greater than, binary
<          less than, binary
==         equal to, binary
>=         greater than or equal to, binary
<=         less than or equal to, binary
||         or, binary, not vectorized
<-         left assignment, binary
->         right assignment, binary
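Several entries of the operator table can be tried directly; the sketch below uses an arbitrary vector `x`.

```r
x <- c(1, 5, 10)
x > 4                # FALSE TRUE TRUE: comparisons are vectorized
x == 5               # FALSE TRUE FALSE
!(x > 4)             # TRUE FALSE FALSE: unary not
-x                   # -1 -5 -10: unary minus
2:5                  # 2 3 4 5: the sequence operator
TRUE || stop("not evaluated")  # TRUE: || stops after the first operand
f <- y ~ x           # ~ builds a model formula; y is not evaluated
class(f)             # "formula"
10 / 4 -> z          # right assignment: z is 2.5
```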

Figure 3. operators illustration


3. Probability Distributions
R provides functions for many probability distributions, for example the normal, t, binomial and chi-squared distributions. For every distribution there are four commands. The commands for each distribution are prefixed with a letter to
indicate the functionality:
prefix   description
d        returns the height of the probability density function
p        returns the cumulative distribution function
q        returns the inverse cumulative distribution function (quantiles)
r        returns randomly generated numbers
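In R the normal distribution uses the suffix `norm`, the t distribution `t`, the binomial `binom` and the chi-squared `chisq`, so the four prefixes give `dnorm`, `pnorm`, `qnorm`, `rnorm`, `dbinom`, and so on. A sketch with arbitrary parameter values:

```r
dbinom(3, size = 10, prob = 0.5)   # P(X = 3) for X ~ Binomial(10, 0.5): 120/1024
pbinom(3, size = 10, prob = 0.5)   # P(X <= 3)
pt(2, df = 5)                      # P(T <= 2) for a t distribution with 5 df
qchisq(0.95, df = 1)               # 95% quantile of chi-squared(1): about 3.84
qnorm(pnorm(1.5))                  # 1.5: q inverts p
set.seed(1)                        # make the random draws reproducible
rbinom(5, size = 10, prob = 0.5)   # five random binomial counts
```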

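3.1. Normal Distribution. There are four functions that can be used to generate the values associated with the
normal distribution. You can get a full list of them and their options using the help command:
> help(Normal)
The first function we examine is dnorm. Given a set of values, it returns the height of the probability density
function at each point. If only the points are given, it uses a mean of zero and a standard deviation of one:
> dnorm(0)
> dnorm(0) * sqrt(2 * pi)
> dnorm(0, mean = 4)
> dnorm(0, mean = 4, sd = 10)
> v <- c(0, 1, 2)
> dnorm(v)
> x <- seq(-20, 20, by = .1)
> y <- dnorm(x)
> plot(x, y)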
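The first dnorm commands can be checked against the normal density f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)): at x = 0 with the defaults this is 1/sqrt(2 pi), which is why multiplying by sqrt(2 * pi) gives 1. A sketch:

```r
dnorm(0)                          # 0.3989...: equals 1 / sqrt(2 * pi)
dnorm(0) * sqrt(2 * pi)           # 1
# Changing mean and sd is the same as standardizing and rescaling
dnorm(0, mean = 4, sd = 10)       # equals dnorm((0 - 4) / 10) / 10
dnorm((0 - 4) / 10) / 10
# A density integrates to 1 over the real line
integrate(dnorm, -Inf, Inf)$value # 1, up to numerical error
```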
Figure 4. Illustration of dnorm

The second function we examine is pnorm. Given a number or a vector, it computes the probability that a
normally distributed random number will be less than that number. This function also goes by the rather
ominous title of the Cumulative Distribution Function. It accepts the same options as dnorm:
> pnorm(0)
> pnorm(1)
> pnorm(0, mean = 2)
> pnorm(0, mean = 2, sd = 3)
> v <- c(0, 1, 2)
> pnorm(v)
> x <- seq(-20, 20, by = .1)
> y <- pnorm(x)
> plot(x, y)
> y <- pnorm(x, mean = 3, sd = 4)
> plot(x, y)
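Two properties of pnorm are worth checking: a normal with any mean and sd reduces to the standard one by standardizing, and the probability of an interval is a difference of two CDF values. A sketch (`lower.tail` is a documented argument of `pnorm`):

```r
pnorm(0)                          # 0.5: half the mass lies below the mean
pnorm(0, mean = 2, sd = 3)        # same as pnorm((0 - 2) / 3)
pnorm((0 - 2) / 3)
pnorm(1.96) - pnorm(-1.96)        # P(-1.96 < Z < 1.96), about 0.95
pnorm(1, lower.tail = FALSE)      # upper tail, equals 1 - pnorm(1)
```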

Figure 5. Illustration of pnorm

The next function we look at is qnorm, which is the inverse of pnorm. The idea behind qnorm is that you
give it a probability, and it returns the number whose cumulative distribution matches the probability.
For example, if you have a normally distributed random variable with mean zero and standard deviation
one, then if you give the function a probability it returns the associated Z-score:

> qnorm(0.5)
> qnorm(0.5, mean = 1)
> qnorm(0.5, mean = 1, sd = 2)
> qnorm(0.5, mean = 2, sd = 2)
> qnorm(0.5, mean = 2, sd = 4)
> qnorm(0.25, mean = 2, sd = 2)
> qnorm(0.333)
> qnorm(0.333, sd = 3)
> qnorm(0.75, mean = 5, sd = 2)
> v = c(0.1, 0.3, 0.75)
> qnorm(v)
> x <- seq(0, 1, by = .05)
> y <- qnorm(x)
> plot(x, y)
> y <- qnorm(x, mean = 3, sd = 2)
> plot(x, y)
> y <- qnorm(x, mean = 3, sd = 0.1)
> plot(x, y)
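Note that seq(0, 1, by = .05) includes 0 and 1, for which qnorm returns -Inf and Inf. The sketch below checks that qnorm inverts pnorm and how mean and sd shift and scale the quantiles:

```r
qnorm(0.5)                        # 0: the median of the standard normal
qnorm(0.5, mean = 1)              # 1
qnorm(0.25, mean = 2, sd = 2)     # 2 + 2 * qnorm(0.25), about 0.651
qnorm(pnorm(1.3))                 # 1.3: qnorm undoes pnorm
qnorm(0.975)                      # about 1.96, the usual two-sided 95% value
qnorm(c(0, 1))                    # -Inf Inf
```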

Figure 6. illustration of qnorm


The last function we examine is the rnorm function, which can generate random numbers whose distribution is normal. The argument that you give it is the number of random numbers that you want, and it
has optional arguments to specify the mean and standard deviation:
> rnorm(4)
> rnorm(4, mean = 3)
> rnorm(4, mean = 3, sd = 3)
> rnorm(4, mean = 3, sd = 3)
> y <- rnorm(200)
> hist(y)
> y <- rnorm(200, mean = -2)
> hist(y)
> y <- rnorm(200, mean = -2, sd = 4)
> hist(y)
> qqnorm(y)
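Because rnorm draws fresh random numbers on each call, two identical rnorm(4, mean = 3, sd = 3) commands print different values. `set.seed()` makes a run reproducible, and with a large sample the sample mean and standard deviation come close to the requested parameters. A sketch; the seeds and the sample size are arbitrary choices.

```r
set.seed(1); a <- rnorm(4, mean = 3, sd = 3)
set.seed(1); b <- rnorm(4, mean = 3, sd = 3)
identical(a, b)                   # TRUE: same seed, same draws

set.seed(42)
y <- rnorm(100000, mean = -2, sd = 4)
mean(y)                           # close to -2
sd(y)                             # close to 4
```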

Figure 7. Histogram of Y

Figure 8. Normal Q-Q plot

4. Merits and Demerits


4.1. Merits. The merits (advantages) of the R language are as follows:
- R is the most comprehensive statistical analysis package available. It incorporates all of the standard statistical tests, models, and analyses, as well as providing a comprehensive language for
  managing and manipulating data. New technology and ideas often appear first in R.
- R is a programming language and environment developed for statistical analysis by practising
  statisticians and researchers. It reflects well on a very competent community of computational
  statisticians.
- The graphical capabilities of R are outstanding, providing a fully programmable graphics language that surpasses most other statistical and graphical packages.
- The validity of the R software is ensured through open validation. Because R is open source,
  unlike closed-source software, it has been reviewed by many internationally renowned statisticians
  and computational scientists.
- R is free and open-source software, allowing anyone to use and, importantly, to modify it. R is
  licensed under the GNU General Public License, with copyright held by The R Foundation for
  Statistical Computing.
- R has no license restrictions (other than ensuring our freedom to use it at our own discretion),
  and so we can run it anywhere and at any time, and even sell it under the conditions of the license.
- Anyone is welcome to provide bug fixes, code enhancements, and new packages, and the wealth
  of quality packages available for R is a testament to this approach to software development and
  sharing.
- R has over 4800 packages available from multiple repositories specializing in topics like econometrics, data mining, spatial analysis, and bioinformatics.
- R is cross-platform. R runs on many operating systems and different hardware. It is popularly
  used on GNU/Linux, Macintosh, and Microsoft Windows, running on both 32-bit and 64-bit processors.
- R plays well with many other tools, importing data, for example, from CSV files, SAS, and SPSS, or
  directly from Microsoft Excel, Microsoft Access, Oracle, MySQL, and SQLite. It can also produce
  graphics output in PDF, JPG, PNG, and SVG formats, and table output for LaTeX and HTML.


4.2. Demerits. The demerits (disadvantages) of the R language are as follows:

- R has a steep learning curve: it does take a while to get used to the power of R, but no steeper
  than for other statistical languages. R is not so easy to use for the novice. There are several
  simple-to-use graphical user interfaces (GUIs) for R that encompass point-and-click interactions,
  but they generally do not have the polish of the commercial offerings.
- Documentation is sometimes patchy and terse, and impenetrable to the non-statistician. However, some very high-standard books are increasingly plugging the documentation gaps.
- The quality of some packages is less than perfect, although if a package is useful to many people,
  it will quickly evolve into a very robust product through collaborative efforts.
- There is, in general, no one to complain to if something doesn't work. R is a software application
  that many people freely devote their own time to developing. Problems are usually dealt with
  quickly on the open mailing lists, and bugs disappear with lightning speed. Users who do require
  it can purchase support from a number of vendors internationally.
- Many R commands give little thought to memory management, and so R can very quickly consume all available memory. This can be a restriction when doing data mining. There are various
  solutions, including using 64-bit operating systems that can access much more memory than 32-bit
  ones.

5. Application
6. Conclusion
7. Personal Opinion
