> library() # Lists all installed packages, which are found in library directories such as '/usr/lib/R/site-library'
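'<-' is the assignment operator. Each of the following assigns the value 5 to 'a':
> 5 -> a # rightward assignment
> a = 5 # '=' also works as assignment at the top level
> a <- 5 # the conventional form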
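A minimal sketch showing that the three forms bind the same value. The names x1, x2 and x3 are illustrative, chosen so that 'a' is not overwritten:

```r
# Three equivalent ways to bind the value 5 to a name (names are illustrative)
x1 <- 5    # leftward assignment, the conventional R style
5 -> x2    # rightward assignment
x3 = 5     # '=' also assigns when used at the top level
identical(x1, x2) && identical(x2, x3)   # TRUE

# Inside a function call, '=' names an argument instead of assigning:
m <- mean(x = c(1, 2, 3))   # 'x' is mean()'s argument; no variable x is created
```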
[Garbled console output. Recoverable fragments: values from 3.5 to 38.0 in steps of 0.5 (partly lost); the multiples of 10 from 0 to 990, printed six per line with index markers [1], [7], ..., [97]; and the values 2.0000000 0.9090909 0.5882353 0.4347826 and 0.7745967 1.1401754 1.4142136 1.6431677 1.8439089. The surviving command was:]
> 1/a
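The command > 1/a relies on R applying arithmetic to every element of a vector. A minimal sketch of that behaviour, together with seq() for building regular sequences; the values of s and a here are illustrative and are not taken from the lost output:

```r
# seq() builds a regular sequence; 'by' sets the step size
s <- seq(0, 990, by = 10)
length(s)        # 100 elements: 0, 10, 20, ..., 990

# Arithmetic on a vector is applied to each element in turn
a <- c(0.5, 1.1, 1.7, 2.3)   # illustrative values
1 / a            # 2.0000000 0.9090909 0.5882353 0.4347826
sqrt(a)          # square root of each element
a * 2 + 1        # operators combine element by element too
```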
> FALSE
[1] FALSE
> (a<1)
[1] FALSE FALSE FALSE #displays the result of the logic test for each element of vector a. The result of a logic test can also be assigned to a variable.
e.g.
>logic <- (a<1)
Relational operators (used in logic tests): == != < > <= >=
Boolean operators:
& = logical AND
| = logical OR
! = logical NOT
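A minimal sketch of the three operators applied to a short illustrative vector; & and | work element by element:

```r
x <- c(1, 2, 3)      # illustrative vector
(x > 1) & (x < 3)    # AND, element by element: FALSE  TRUE FALSE
(x < 2) | (x > 2)    # OR:                       TRUE FALSE  TRUE
!(x > 1)             # NOT:                      TRUE FALSE FALSE

# The doubled forms && and || are meant for a single TRUE/FALSE value,
# for example in an if() condition
if (length(x) == 3 && x[1] == 1) print("first element is 1")
```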
> a
[1] 1 2 3
> a[a>1] #displays elements of vector 'a' whose values are >1
[1] 2 3
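A short sketch of logical indexing with a hypothetical vector v, whose values differ from its positions so that the output of which() is easy to tell apart:

```r
v <- c(10, 20, 30)      # illustrative values, so positions and values differ
v[v > 15]               # 20 30: elements where the test is TRUE
v[v > 15 & v < 25]      # 20: tests combined with &
which(v > 15)           # 2 3: positions of the matches, not their values
v[v > 100]              # numeric(0): nothing matched
```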
> d
[1] 2 4 4
> b <- d[1:3] #assigns the elements of vector 'd' from position 1 to position 3 to variable 'b'
> b
[1] 2 4 4
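A short sketch of other ways to pick elements by position, reusing the vector d from above:

```r
d <- c(2, 4, 4)
d[1:3]       # 2 4 4: positions 1 through 3 (1:3 is the vector 1 2 3)
d[2:3]       # 4 4
d[c(1, 3)]   # 2 4: any set of positions
d[-1]        # 4 4: a negative index drops that position
b <- d[1:3]
identical(b, d)   # TRUE, since 1:3 covers every element of d
```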
Bioconductor: a collection of packages for the analysis and comprehension of genomic data. It has hundreds of packages for microarray analysis and NGS (next-generation sequencing) data.
..........................
R tutorial at http://www.cyclismo.org/tutorial/R/
Reading a CSV file
> heisenberg <- read.csv(file="simple.csv", header=TRUE, sep=",")
> heisenberg
  trial mass velocity
1     A 10.0       12
2     A 11.0       14
3     B  5.0        8
4     B  6.0       10
5     A 10.5       13
6     B  7.0       11
> summary(heisenberg)
 trial      mass          velocity
 A:3   Min.   : 5.00   Min.   : 8.00
 B:3   1st Qu.: 6.25   1st Qu.:10.25
       Median : 8.50   Median :11.50
       Mean   : 8.25   Mean   :11.33
       3rd Qu.:10.38   3rd Qu.:12.75
       Max.   :11.00   Max.   :14.00
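A minimal sketch that reproduces the heisenberg example without needing simple.csv: the text argument (passed through to read.table) reads CSV data from a string, here copied from the output above. One caveat: since R 4.0.0, read.csv() no longer converts text columns to factors by default, so summary() would describe 'trial' as character rather than printing A:3 and B:3; stringsAsFactors = TRUE restores the behaviour shown above.

```r
# Inline copy of the six rows shown above, so the sketch runs without simple.csv
csv_text <- "trial,mass,velocity
A,10.0,12
A,11.0,14
B,5.0,8
B,6.0,10
A,10.5,13
B,7.0,11"

# text = reads from a string instead of a file; stringsAsFactors = TRUE
# makes 'trial' a factor, so summary() reports counts per level
heisenberg <- read.csv(text = csv_text, header = TRUE, sep = ",",
                       stringsAsFactors = TRUE)
summary(heisenberg)
```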
Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a "$" to separate the two names (the data frame name, then the column name).
If you are not sure what columns are contained in the variable, you can use the names function:
> names(heisenberg)
[1] "trial"    "mass"     "velocity"
> heisenberg$mass
[1] 10.0 11.0 5.0 6.0 10.5 7.0
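A short sketch of column access, rebuilding the same six rows with data.frame() so it runs on its own. $ needs a literal column name, while [[ ]] also accepts a name stored in a variable (col is an illustrative name):

```r
heisenberg <- data.frame(
  trial    = factor(c("A", "A", "B", "B", "A", "B")),
  mass     = c(10.0, 11.0, 5.0, 6.0, 10.5, 7.0),
  velocity = c(12, 14, 8, 10, 13, 11)
)
names(heisenberg)            # "trial" "mass" "velocity"
heisenberg$mass              # column by name with $
heisenberg[["mass"]]         # the same column with [[ ]]
col <- "velocity"
mean(heisenberg[[col]])      # 11.33333: the name can come from a variable
heisenberg$mass[heisenberg$trial == "A"]   # masses for trial A: 10.0 11.0 10.5
```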
Assigning the contents of a CSV file to a variable called 'tree':
> tree <- read.csv(file="trees91.csv", header=TRUE, sep=",")
If you are not sure what kind of variable you have, you can use the attributes function. This will list all of the things that R uses to describe the variable:
> attributes(tree)
$names
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

$class
[1] "data.frame"

$row.names
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
[46] "46" "47" "48" "49" "50" "51" "52" "53" "54"
The first thing that R stores is a list of names which refer to each column of the data.
The $class entry shows that 'tree' is of type data.frame.
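A minimal sketch on a small, made-up data frame (df is not the trees91 data) showing attributes() next to a few related functions that describe a variable. In current R versions the row names of such a data frame may print as integers rather than quoted strings:

```r
# A small illustrative data frame (not the trees91 data)
df <- data.frame(C = c(1, 1, 2), N = c(1, 2, 1))
attributes(df)   # lists $names, $class and $row.names
class(df)        # "data.frame"
names(df)        # "C" "N"
dim(df)          # 3 2: rows, then columns
str(df)          # one line per column with its type and first values
```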
It is common to come across data that is organized in flat files, with each field at a preset position on each line. This is often called a "fixed width file." The function for reading these files is read.fwf; for more information, use 'help(read.fwf)'.
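A minimal sketch of read.fwf() on two hypothetical fixed-width lines written to a temporary file. The layout (1 character for trial, 4 for mass, 3 for velocity) is an assumption made for illustration; widths lists the field lengths from left to right, and strip.white = TRUE (passed on to read.table) trims the padding spaces:

```r
# Hypothetical fixed-width lines: 1 char trial, 4 chars mass, 3 chars velocity
lines <- c("A10.0 12",
           "B 5.0  8")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# widths gives the number of characters in each field, left to right
fw <- read.fwf(tmp, widths = c(1, 4, 3),
               col.names = c("trial", "mass", "velocity"),
               strip.white = TRUE)
fw
unlink(tmp)   # remove the temporary file
```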
> ls() #used to get a list of the variables defined in a session
[1] "a" "b"
> a <- c(1,2,3,4,5)
> a[0]
numeric(0) #Note: a[0] returns an empty vector of the same type, which shows how the data is stored (here, numeric).
> b <- c('Hello there')
> b
[1] "Hello there"
> b[1]
[1] "Hello there"
> b <- c('Hello','there')
> b[2]
[1] "there"
> b[0]
character(0)
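A short sketch of what happens at and beyond the edges of a vector; the vectors mirror the ones above:

```r
a <- c(1, 2, 3, 4, 5)
a[1]          # 1: indexing starts at 1, not 0
a[0]          # numeric(0): empty, but still numeric
a[6]          # NA: position past the end
typeof(a[0])  # "double"
b <- c("Hello", "there")
b[0]          # character(0)
length(b)     # 2: c() made a two-element character vector
```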
Factors in R
Often, an experiment includes trials for different levels of some variable. For example, when looking at the impact of CO2 on the growth rate of a tree, you might try to observe how different trees respond when exposed to different preset concentrations of CO2. A variable like this, with a fixed set of possible values, is called a factor, and its possible values are called levels.
> summary(tree$CHBR)
A1 A2 A3 A4 A5 A6 A7 B1 B2 B3 B4 B5 B6 B7 C1 C2 C3 C4 C5 C6
3 1 1 3 1 3 1 1 3 3 3 3 3 3 1 3 1 3 1 1
C7 CL6 CL7 D1 D2 D3 D4 D5 D6 D7
1 1 1 1 1 3 1 1 1 1
In this data set, several of the columns are factors, but the researchers used numbers to indicate the different levels. For example, the first column, labeled "C", is a factor. Each tree was grown in an environment with one of four different possible levels of carbon dioxide. The researchers quite sensibly labeled these four environments as 1, 2, 3, and 4. Unfortunately, R cannot determine that these are factors and must assume that they are regular numbers.
This is a common problem, and there is a way to tell R to treat the "C" column as a set of factors. You specify that a variable is a factor using the factor function. In the following example we convert tree$C into a factor:
> tree$C
[1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
> summary(tree$C)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   2.000   2.519   3.000   4.000
> tree$C <- factor(tree$C)
> tree$C
[1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> summary(tree$C)
1 2 3 4
8 23 10 13
> levels(tree$C)
[1] "1" "2" "3" "4"
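A minimal sketch of the conversion using hypothetical CO2 concentrations (ppm is an illustrative name, not a column of trees91). It also shows a common pitfall when turning a factor back into numbers:

```r
# Hypothetical CO2 concentrations recorded as plain numbers
ppm <- c(350, 350, 700, 700, 1000)
mean(ppm)              # 620: meaningful only while they are numbers

ppm_f <- factor(ppm)
levels(ppm_f)          # "350" "700" "1000"
nlevels(ppm_f)         # 3

# as.numeric() on a factor returns the internal level codes, not the labels
as.numeric(ppm_f)                # 1 1 2 2 3
as.numeric(as.character(ppm_f))  # 350 350 700 700 1000
```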
Once a vector is converted into a factor, R treats it differently than when it is a set of numbers. A factor has a discrete set of possible values, and it does not make sense to try to find averages or other numerical descriptions. What matters is the number of times that each level appears, called its "frequency", which is printed using the summary command.
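A minimal sketch of counting levels with made-up labels; table() gives the same counts as summary(), and prop.table() turns them into proportions:

```r
f <- factor(c("A", "B", "A", "C", "A", "B"))   # illustrative labels
summary(f)              # counts per level: A 3, B 2, C 1
table(f)                # the same counts as a table object
prop.table(table(f))    # relative frequencies, summing to 1
```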