R Notes

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

A 'vector' is an ordered set of numbers (or character or logical values), comparable to a column in a table.
A 'vector' can only contain elements of the same mode (type).
A 'data frame' is a table whose columns are vectors of equal length; different columns may have different modes.
A 'matrix' is a table in which every element has the same mode. It can also be viewed as a set of vectors of the same length and mode.
R is a function-based language: arguments and options go inside parentheses, separated by commas.
'function(data, options)'
e.g.
q() # quit R
help() # show help
help('if') # show help for a specific topic, here 'if'
..
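The definitions of vector, matrix and data frame given at the top of these notes can be tried out directly. The following is a minimal sketch; the variable names (v, m, df) and the values are made up for illustration.

```r
# A vector holds elements of a single mode.
v <- c(10, 20, 30)
mode(v)                      # "numeric"

# A matrix is two-dimensional and, like a vector, holds a single mode.
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column
dim(m)                       # 2 3

# A data frame is a table whose columns may have different modes,
# but every column must have the same length.
df <- data.frame(id = c("x", "y", "z"), value = v)
sapply(df, class)            # class of each column
nrow(df)                     # 3
```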
RKWard is a GUI IDE for R
RKWard is not confined to working with just one piece of data at a time; rather, you have a "Workspace" where all your different variables, tables, etc. are accessible.
To create a new spreadsheet (data.frame), choose:
File->New->Dataset
Assign a name to this table
To open pre-existing data, choose:
File->Import->Load data #Loads data in CSV format
..
R from terminal
R
>x <- c(1,2,3) #Function c combines values into a vector consisting of the numbers 1, 2 & 3, & assigns it to variable 'x'
>x #displays the vector 'x'
[1] 1 2 3
e.g.
>a <- c(1,2,5.3,6,-2,4) # numeric vector
>b <- c("one","two","three") # character vector
>d <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
>f <- c( 1, a , T ) # Mixture
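Because a vector can only hold one mode, a "mixture" passed to c() is converted to a single common mode. A small sketch of that behaviour, with made-up values (x and y are illustrative names, not from the notes above):

```r
x <- c(1, "a", TRUE)    # numeric, character and logical mixed together
x                       # all three elements become character strings
mode(x)                 # "character"

y <- c(1, TRUE, FALSE)  # logical values mixed with numbers
y                       # TRUE becomes 1, FALSE becomes 0
mode(y)                 # "numeric"
```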
>library() #Displays all available libraries, found in e.g. '/usr/lib/R/site-library'
'<-' is the assignment operator. Can also be the other way around ('->').
>5 -> a # assigns scalar 5 to variable 'a'
>a = 5 # assigns scalar 5 to variable 'a'
>a <- 5 # assigns scalar 5 to variable 'a'
>a <- c(1,2,3,4,5) # assigns vector to variable 'a'

>print(a) # prints value of 'a'
>a = c(1:10) #creates a vector with values ranging from 1 to 10
>a = seq(1,10) #creates a vector with values ranging from 1 to 10

> a = seq(1, 40, by=0.5) # creates a sequence with values ranging from 1 to 40, incrementing by 0.5
> a
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
[16] 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5
[31] 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0
[46] 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5
[61] 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0
[76] 38.5 39.0 39.5 40.0
>a = seq(length=40, from=4.6, by=0.5) #creates a sequence with 40 entries, starting from 4.6, with an increment of 0.5
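The seq() call with a start value, an increment and a number of entries can be checked as follows. This is a sketch only; 'length.out' is the full name of the argument abbreviated as 'length' in these notes, and s1 is an illustrative variable name.

```r
s1 <- seq(from = 4.6, by = 0.5, length.out = 40)
length(s1)                  # 40 entries
s1[1]                       # first entry: 4.6
s1[40]                      # last entry: 4.6 + 39 * 0.5
all.equal(s1[40], 24.1)     # compare floating-point values with all.equal
seq_len(5)                  # shortcut for the integers 1 2 3 4 5
```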
> a = seq(length=100, from=0, by=10)
> a
[1] 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170
[19] 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350
[37] 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530
[55] 540 550 560 570 580 590 600 610 620 630 640 650 660 670 680 690 700 710
[73] 720 730 740 750 760 770 780 790 800 810 820 830 840 850 860 870 880 890
[91] 900 910 920 930 940 950 960 970 980 990
> 1/a #Divides 1 by every element of the sequence 'a'
[1] Inf 0.100000000 0.050000000 0.033333333 0.025000000 0.020000000
[7] 0.016666667 0.014285714 0.012500000 0.011111111 0.010000000 0.009090909
[13] 0.008333333 0.007692308 0.007142857 0.006666667 0.006250000 0.005882353
[19] 0.005555556 0.005263158 0.005000000 0.004761905 0.004545455 0.004347826
[25] 0.004166667 0.004000000 0.003846154 0.003703704 0.003571429 0.003448276
[31] 0.003333333 0.003225806 0.003125000 0.003030303 0.002941176 0.002857143
[37] 0.002777778 0.002702703 0.002631579 0.002564103 0.002500000 0.002439024
[43] 0.002380952 0.002325581 0.002272727 0.002222222 0.002173913 0.002127660
[49] 0.002083333 0.002040816 0.002000000 0.001960784 0.001923077 0.001886792
[55] 0.001851852 0.001818182 0.001785714 0.001754386 0.001724138 0.001694915
[61] 0.001666667 0.001639344 0.001612903 0.001587302 0.001562500 0.001538462
[67] 0.001515152 0.001492537 0.001470588 0.001449275 0.001428571 0.001408451
[73] 0.001388889 0.001369863 0.001351351 0.001333333 0.001315789 0.001298701
[79] 0.001282051 0.001265823 0.001250000 0.001234568 0.001219512 0.001204819
[85] 0.001190476 0.001176471 0.001162791 0.001149425 0.001136364 0.001123596
[91] 0.001111111 0.001098901 0.001086957 0.001075269 0.001063830 0.001052632
[97] 0.001041667 0.001030928 0.001020408 0.001010101
#sequences can be divided by sequences
e.g.
> b = seq(length=40, from=1, by=0)
> a = seq(length=40, from=0, by=0.1)
> a
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
[20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
[39] 3.8 3.9
> b
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1
> b/a
[1] Inf 10.0000000 5.0000000 3.3333333 2.5000000 2.0000000
[7] 1.6666667 1.4285714 1.2500000 1.1111111 1.0000000 0.9090909
[13] 0.8333333 0.7692308 0.7142857 0.6666667 0.6250000 0.5882353
[19] 0.5555556 0.5263158 0.5000000 0.4761905 0.4545455 0.4347826
[25] 0.4166667 0.4000000 0.3846154 0.3703704 0.3571429 0.3448276
[31] 0.3333333 0.3225806 0.3125000 0.3030303 0.2941176 0.2857143
[37] 0.2777778 0.2702703 0.2631579 0.2564103
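In the divisions above the first result is Inf because the first element of 'a' is 0. A minimal sketch of how such values could be detected and dropped, using a shorter made-up sequence (r is an illustrative name):

```r
a <- seq(from = 0, by = 10, length.out = 5)   # 0 10 20 30 40
r <- 1 / a              # element-wise division
r[1]                    # Inf, because 1/0 is infinite in R
is.finite(r)            # FALSE TRUE TRUE TRUE TRUE
r[is.finite(r)]         # keeps only the four finite results
```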
>a*8 # multiplies every element of the sequence by 8
>a+a #adds each element of 'a' to itself
>mean(a) #computes mean of 'a'
[1] 1.95
>var(a) #computes variance of 'a'
[1] 1.366667
> sum(a) #computes sum of all elements of 'a'
[1] 78
>prod(a) #computes product of all elements of 'a'
[1] 0
> sqrt(a) #computes square root of all members of sequence a
[1] 0.0000000 0.3162278 0.4472136 0.5477226 0.6324555 0.7071068 0.7745967
[8] 0.8366600 0.8944272 0.9486833 1.0000000 1.0488088 1.0954451 1.1401754
[15] 1.1832160 1.2247449 1.2649111 1.3038405 1.3416408 1.3784049 1.4142136
[22] 1.4491377 1.4832397 1.5165751 1.5491933 1.5811388 1.6124515 1.6431677
[29] 1.6733201 1.7029386 1.7320508 1.7606817 1.7888544 1.8165902 1.8439089
[36] 1.8708287 1.8973666 1.9235384 1.9493589 1.9748418
> length(a) #length of a
[1] 40
> min(a) # minimum val of 'a'
[1] 0
>max(a) # maximum val of 'a'
[1] 3.9
> sort(a) #sorts sequence
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
[20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
[39] 3.8 3.9
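The mean and variance reported above can be reproduced by hand from sum() and length(). This is a sketch under the assumption that 'a' is the 40-element sequence from 0 to 3.9 used in this part of the notes; n, m and v are illustrative names.

```r
a <- seq(from = 0, by = 0.1, length.out = 40)
n <- length(a)
m <- sum(a) / n                  # arithmetic mean, same as mean(a)
v <- sum((a - m)^2) / (n - 1)    # sample variance (divides by n - 1), same as var(a)
all.equal(m, mean(a))            # TRUE
all.equal(v, var(a))             # TRUE
sd(a)                            # standard deviation, the square root of var(a)
```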
#Adding vectors of different lengths
> a <- c(1,2,3)
> b <- c(4)
> c <- c(1,2)
> a+b
[1] 5 6 7 # the single element of 'b' is recycled and added to every element of 'a'. Recycling is silent when the longer length is a multiple of the shorter one; otherwise R still returns a result but issues a warning.
> a+c
[1] 2 4 4
Warning message:
In a + c : longer object length is not a multiple of shorter object length
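The recycling of the shorter vector can be made explicit with rep_len(). A small sketch with made-up values; the names 'long' and 'short' are illustrative, chosen to avoid reusing 'c' as a variable name.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# 'short' is reused from its start until it matches the length of 'long'
long + short                               # 11 22 13 24

# the same computation with the recycling written out
recycled <- rep_len(short, length(long))   # 10 20 10 20
long + recycled                            # 11 22 13 24
```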
Logical values
TRUE & FALSE
> TRUE
[1] TRUE
> FALSE
[1] FALSE
> (a<1)
[1] FALSE FALSE FALSE #displays the result of the logic test for each element of vector 'a'. A logic test can also be assigned to a variable.
e.g.
>logic <- (a<1)
Relational operators '== != < > <= >= ' #Used in logic tests
Boolean operators:
& = logical AND
| = logical OR
! = logical NOT
> a
[1] 1 2 3
> a[a>1] #displays elements of vector 'a' whose values are >1
[1] 2 3
> d
[1] 2 4 4
> b <- d[1:3] #assigns elements of vector 'd', from position 1 to position 3, to variable 'b'
> b
[1] 2 4 4
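Relational and Boolean operators can be combined to select elements. A minimal sketch using the same three-element vector as above; 'keep' is an illustrative name.

```r
a <- c(1, 2, 3)
keep <- a > 1 & a < 3    # element-wise AND of two logic tests
keep                     # FALSE TRUE FALSE
a[keep]                  # 2
a[!keep]                 # 1 3, the elements that fail the test
which(a > 1)             # positions of the TRUE values: 2 3
any(a > 2)               # TRUE, at least one element is greater than 2
all(a > 0)               # TRUE, every element is greater than 0
```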
Bioconductor: a collection of packages for the analysis and comprehension of genomic data. It has hundreds of packages for microarray analysis and NGS (next-generation sequencing) data.
..........................
R tutorial at http://www.cyclismo.org/tutorial/R/
Reading a CSV file
>heisenberg <- read.csv(file="simple.csv",header=TRUE,sep=",")
> heisenberg
  trial mass velocity
1     A 10.0       12
2     A 11.0       14
3     B  5.0        8
4     B  6.0       10
5     A 10.5       13
6     B  7.0       11
> summary(heisenberg)
 trial      mass          velocity
 A:3   Min.   : 5.00   Min.   : 8.00
 B:3   1st Qu.: 6.25   1st Qu.:10.25
       Median : 8.50   Median :11.50
       Mean   : 8.25   Mean   :11.33
       3rd Qu.:10.38   3rd Qu.:12.75
       Max.   :11.00   Max.   :14.00
Each column is assigned a name based on the header (the first line in the file). If you are not sure what columns are contained in the variable you can use the names command:
> names(heisenberg)
[1] "trial" "mass" "velocity"
You can now access each individual column using a "$" to separate the variable name and the column name:
> heisenberg$mass
[1] 10.0 11.0 5.0 6.0 10.5 7.0
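The steps above can be tried without a file on disk by passing the CSV content as text. This is a sketch only: the file 'simple.csv' itself is not shown in these notes, so the text below is an assumption rebuilt from the printed table.

```r
# Assumed content of simple.csv, reconstructed from the table printed above
csv_text <- "trial,mass,velocity
A,10.0,12
A,11.0,14
B,5.0,8
B,6.0,10
A,10.5,13
B,7.0,11"

heisenberg <- read.csv(text = csv_text, header = TRUE, sep = ",")
names(heisenberg)                        # "trial" "mass" "velocity"
heisenberg$mass                          # the 'mass' column as a numeric vector
mean(heisenberg$mass)                    # 8.25, matching the summary above
heisenberg[heisenberg$trial == "A", ]    # only the rows of trial A
```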
Assigning a CSV file to a variable called 'tree':
> tree <- read.csv(file="trees91.csv",header=TRUE,sep=",")
If you are not sure what kind of variable you have then you can use the attributes command. This will list all of the things that R uses to describe the variable:
> attributes(tree)
$names
[1] "C" "N" "CHBR" "REP" "LFBM" "STBM" "RTBM" "LFNCC"
[9] "STNCC" "RTNCC" "LFBCC" "STBCC" "RTBCC" "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC" "STKCC" "RTKCC" "LFMGCC" "STMGCC" "RTMGCC" "LFPCC" "STPCC"
[25] "RTPCC" "LFSCC" "STSCC" "RTSCC"
$class
[1] "data.frame"
$row.names
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
[46] "46" "47" "48" "49" "50" "51" "52" "53" "54"
The first thing that R stores is a list of names which refer to each column of the data.
'tree' is of type data.frame.
It is common to come across data that is organized in flat files and delimited at preset locations on each line. This is often called a "fixed width file." The command to deal with this kind of file is read.fwf. For info, use 'help(read.fwf)'.
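A minimal sketch of read.fwf on fixed-width data. The two input lines, the column widths and the column names are all made up for illustration; a text connection stands in for a real file.

```r
# Hypothetical fixed-width records: 1 character for trial,
# 4 characters for mass, 2 characters for velocity
fixed_lines <- c("A10.012",
                 "B05.008")

fw <- read.fwf(textConnection(fixed_lines),
               widths = c(1, 4, 2),
               col.names = c("trial", "mass", "velocity"))
fw            # a data frame with 2 rows and 3 columns
fw$mass       # numeric column parsed from characters 2 to 5
```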
> ls() #used to get a list of the variables defined in a session
[1] "a" "b"
> a <- c(1,2,3,4,5)
> a[0]
numeric(0) #Note: index 0 returns an empty vector; its type (here numeric) indicates how the data is stored.
> b <- c('Hello there')
> b
[1] "Hello there"
> b[1]
[1] "Hello there"
> b <- c('Hello','there')
> b[2]
[1] "there"
> b[0]
character(0)
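Indexing in R starts at 1, and a few other index forms are worth trying next to the zero index shown above. A small sketch with the same five-element vector:

```r
a <- c(1, 2, 3, 4, 5)
a[2]            # second element: 2
a[c(1, 3)]      # first and third elements: 1 3
a[-1]           # everything except the first element: 2 3 4 5
a[0]            # numeric(0), an empty vector
length(a[0])    # 0
class(a[0])     # "numeric", the empty result keeps the type of 'a'
```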
Factors in R
Often, an experiment includes trials for different levels of some variable. For example, when looking at the impact of CO2 on the growth rate of a tree, you might observe how different trees respond when exposed to different preset concentrations of CO2. Such a variable is called a factor, and its preset values are its levels.
> summary(tree$CHBR)
A1 A2 A3 A4 A5 A6 A7 B1 B2 B3 B4 B5 B6 B7 C1 C2 C3 C4 C5 C6
3 1 1 3 1 3 1 1 3 3 3 3 3 3 1 3 1 3 1 1
C7 CL6 CL7 D1 D2 D3 D4 D5 D6 D7
1 1 1 1 1 3 1 1 1 1
In this data set, several of the columns are factors, but the researchers used numbers to indicate the different levels. For example, the first column, labeled "C," is a factor. Each tree was grown in an environment with one of four different possible levels of carbon dioxide. The researchers quite sensibly labeled these four environments as 1, 2, 3, and 4. Unfortunately, R cannot determine that these are factors and must assume that they are regular numbers.
This is a common problem and there is a way to tell R to treat the "C" column as a factor. You specify that a variable is a factor using the factor command. In the following example we convert tree$C into a factor:
> tree$C
[1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
> summary(tree$C)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   2.000   2.519   3.000   4.000
> tree$C <- factor(tree$C)
> tree$C
[1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> summary(tree$C)
1 2 3 4
8 23 10 13
> levels(tree$C)
[1] "1" "2" "3" "4"
Once a vector is converted into a factor, R treats it differently than when it is a set of numbers. A factor has a discrete set of possible values, and it does not make sense to try to find averages or other numerical descriptions. What matters is the number of times each level appears, called its "frequency," which is printed by the summary command.
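The conversion to a factor can be tried on a small made-up vector when trees91.csv is not at hand. This is a sketch; 'co2' and its values are hypothetical and only mimic the four numbered levels described above.

```r
co2 <- c(1, 1, 2, 2, 2, 3, 4, 4)   # hypothetical level labels stored as numbers
summary(co2)                       # numeric summary: quartiles and mean

f <- factor(co2)                   # tell R to treat the values as levels
levels(f)                          # "1" "2" "3" "4"
table(f)                           # frequency of each level: 2 3 1 2
summary(f)                         # the same frequencies, as in the notes above

as.numeric(f)                      # internal integer codes of the levels
as.numeric(as.character(f))        # recovers the original numbers
```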
