Week 3: Basic concepts II: data structures

ACTL 1101 Introduction to Actuarial Studies


R software tutorial series
Xinda Yang¹

¹ The material is adapted from earlier slides created by Pierre Lafaye de Micheaux.


Reference: LaDroLi 3.2.2

1 Goals
2 Data type and data structure
3 Vectors
4 Matrices and arrays
5 Lists
6 Data frames
7 Factors

Goals

Goals of this tutorial

By the end of this tutorial, you should be able to
- recognise and create different data structures in R;
- understand the features of each data structure.

Data type and data structure

The various data types in R - page 50


Last week we discussed the different data types (of a single data point) in R:

Data type                    | Type in R | Display
real number (integer or not) | numeric²  | 3.27
complex number               | complex   | 3+2i
logical (true/false)         | logical   | TRUE or FALSE
missing                      | logical   | NA
text (string)                | character | "text"
binary                       | raw       | 1c

This week, we will discuss how to structure data (of multiple data points) in R.

² The numeric type is the same as the double type.
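As a quick refresher, typeof() reports how R stores a value and class() reports how R presents it. The short sketch below uses the example values from the table. The integer literal 3L is an extra illustration that does not appear in the table. Any recent R version should behave this way.

```r
# typeof() gives the internal storage type; class() the user-facing one
typeof(3.27)        # "double"
class(3.27)         # "numeric": numeric values are stored as doubles
typeof(3L)          # "integer": the L suffix requests an integer
typeof(3+2i)        # "complex"
typeof(TRUE)        # "logical"
typeof(NA)          # "logical": a bare NA is a logical missing value
typeof("text")      # "character"
as.raw(28)          # displays 1c, the byte 28 written in hexadecimal
typeof(as.raw(28))  # "raw"
```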

Vectors

Vectors - page 51
A vector represents a sequence of data points of the same type. You can create a vector in different ways:
- by using the function c();
- by using the function seq();
- by using a colon (:).
Run the following code and see what you get:

> c(3,1,7)
> seq(from=0,to=0.9,by=0.1)
> seq(from=0,to=20,length=5)
> (vec <- 2:15)

Results

> c(3,1,7)
[1] 3 1 7
> seq(from=0,to=0.9,by=0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> seq(from=0,to=20,length=5)
[1]  0  5 10 15 20
> (vec <- 2:15) # [12] is the index of the first element on that line.
 [1]  2  3  4  5  6  7  8  9 10 11 12
[12] 13 14 15
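The three construction methods can be compared by building the same vector with each one. This is a minimal sketch, and v1, v2 and v3 are illustrative names. The colon operator returns integers, while c() with plain numbers returns doubles. The values therefore match even though the storage types differ.

```r
v1 <- c(1, 2, 3, 4, 5)               # explicit list of values (double)
v2 <- seq(from = 1, to = 5, by = 1)  # regular sequence with a step
v3 <- 1:5                            # colon shortcut (integer)
all(v1 == v2)   # TRUE: same values
all(v2 == v3)   # TRUE: same values
typeof(v1)      # "double"
typeof(v3)      # "integer"
length(v1)      # 5
# length.out fixes the number of points instead of the step
seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```

The exercise writes length=5, which works because R partially matches it to the argument length.out. Spelling out length.out in full is the safer habit.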

Note:
- Although you can mix data of different types in a vector, R will implicitly convert them into a single data type.
- You can also "name" the elements of a vector using the function names().
Run the following code and see what you get:

> c(3,TRUE,7)
> vec <- 1:10
> names(vec) <- letters[1:10]
> vec

Results

> c(3,TRUE,7) # Automatic conversion occurs.
[1] 3 1 7
> vec <- 1:10 # Stored as integers.
> names(vec) <- letters[1:10]
> vec
 a  b  c  d  e
 1  2  3  4  5
 f  g  h  i  j
 6  7  8  9 10

Here, letters is a built-in constant holding the 26 lower-case letters of the alphabet, so letters[1:10] returns the first 10 of them.
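The conversion follows a fixed order: logical < integer < double < complex < character. Every element is converted to the most general type present in the vector. The sketch below (x1 and x2 are illustrative names) demonstrates this and also shows how names can be used to pick out elements.

```r
x1 <- c(3, TRUE, 7)    # TRUE is converted to the number 1
x2 <- c(1, "a", TRUE)  # every element becomes a character string
typeof(x1)   # "double"
typeof(x2)   # "character"
x2           # "1" "a" "TRUE"

vec <- 1:10
names(vec) <- letters[1:10]
vec["c"]          # the element named "c", whose value is 3
names(vec)[1:3]   # "a" "b" "c"
```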

Matrices and arrays

Matrices and arrays - page 52


Matrices and arrays are generalisations of vectors:
- a matrix has two dimensions (hence you need two indices to access a data point);
- an array can have multiple dimensions (so you need one index per dimension).
Try the following code:

> (X <- matrix(1:12,nrow=4,ncol=3,byrow=TRUE))
> (X <- matrix(1:12,nrow=4,ncol=3,byrow=FALSE))
> class(X)
> (X <- array(1:12,dim=c(2,2,3)))

> (X <- matrix(1:12,nrow=4,ncol=3,byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> (X <- matrix(1:12,nrow=4,ncol=3,byrow=FALSE))
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> class(X)
[1] "matrix" "array"
(In R versions before 4.0.0, only "matrix" is printed.)

Matrices and arrays - page 53


> (X <- array(1:12,dim=c(2,2,3)))
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12

How do you interpret a three-dimensional array?
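One way to read the 2 x 2 x 3 array is as three 2 x 2 matrices ("slices") stacked one behind the other. Each element is then indexed as X[row, column, slice]. The sketch below illustrates this reading; the matrix M is an illustrative name used to contrast the two-index case.

```r
X <- array(1:12, dim = c(2, 2, 3))
dim(X)        # 2 2 3: rows, columns, slices
X[, , 2]      # second slice: the 2 x 2 matrix holding 5, 6, 7, 8
X[2, 1, 3]    # row 2, column 1 of slice 3: 10
# Values fill column by column, then slice by slice, so here
# X[i, j, k] equals (k - 1) * 4 + (j - 1) * 2 + i
X[1, 2, 3] == (3 - 1) * 4 + (2 - 1) * 2 + 1   # TRUE (both are 11)

# A matrix needs exactly two indices: row and column
M <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)
M[2, 3]       # row 2, column 3: 6
M[, 1]        # whole first column: 1 4 7 10
```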

Lists

Lists - page 53

Lists can group data of different types together in one structure without altering them.

> A <- list(TRUE,-1:3,my.matrix=matrix(1:4,
+           nrow=2),c(1+2i,3),"A character string")

Run the code and answer the following questions:
- How many elements do we have in the object A?
- Does each element have its own name?
- Do all elements have the same data type?

> A
[[1]]
[1] TRUE

[[2]]
[1] -1  0  1  2  3

$my.matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[4]]
[1] 1+2i 3+0i

[[5]]
[1] "A character string"

Lists - page 54

Results
- There are 5 elements.
- Only the third element has a name, my.matrix. The other elements are unnamed, so R displays them by their position in the list ([[1]], [[2]], ...).
- No: this is one advantage of using a list. In fact, each element can be a vector, a matrix, an array or even another list.
Note: naming the elements makes it easier to extract data from a list; we will discuss this further in later sessions.
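The answers above can be checked in code. The sketch below rebuilds A, inspects its names and element types, and shows the two usual ways of extracting an element. B is an illustrative name for a list that contains another list.

```r
A <- list(TRUE, -1:3, my.matrix = matrix(1:4, nrow = 2),
          c(1+2i, 3), "A character string")
length(A)          # 5 elements
names(A)           # "" "" "my.matrix" "" "": only the third is named
sapply(A, typeof)  # "logical" "integer" "integer" "complex" "character"
A$my.matrix        # extract an element by its name
A[[2]]             # extract an element by its position: -1 0 1 2 3
B <- list(info = "nested", inner = A)   # a list inside a list
length(B$inner)    # 5
```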

Data frames

A data.frame in R is a table where
- each row represents a single observation (e.g. an individual);
- each column represents a single variable, whose values must all be of the same data type.
Data frames are very widely used in R:
- they offer the flexibility of having multiple data types (one per column);
- in many cases, a dataset can be formulated as a data.frame.

> (BMI <- data.frame(
+   Gender=c("M","F","M","F"),
+   Height=c(1.83,1.76,1.82,1.60),
+   Weight=c(67,58,66,48),
+   row.names=c("Jack","Julia","Henry","Emma")))
      Gender Height Weight
Jack       M   1.83     67
Julia      F   1.76     58
Henry      M   1.82     66
Emma       F   1.60     48

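> str(BMI) # Structure of each column.
'data.frame': 4 obs. of 3 variables:
 $ Gender: chr "M" "F" "M" "F"
 $ Height: num 1.83 1.76 1.82 1.6
 $ Weight: num 67 58 66 48
Note: since R 4.0.0, data.frame() no longer converts character columns to factors by default. With stringsAsFactors=TRUE (the default before R 4.0.0), the Gender line reads
 $ Gender: Factor w/ 2 levels "F","M": 2 1 2 1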
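A data frame behaves both like a table (rows and columns) and like a list of equal-length columns. The sketch below re-creates BMI and explores it. The added Index column assumes that Height is in metres and Weight in kilograms, which the slides do not state; it only illustrates how a new column can be computed from existing ones.

```r
BMI <- data.frame(
  Gender = c("M", "F", "M", "F"),
  Height = c(1.83, 1.76, 1.82, 1.60),
  Weight = c(67, 58, 66, 48),
  row.names = c("Jack", "Julia", "Henry", "Emma")
)
dim(BMI)            # 4 3: four observations, three variables
BMI$Height          # a column is a vector of a single type
BMI["Julia", ]      # a row is one observation across all variables
sapply(BMI, class)  # "character" "numeric" "numeric" (R >= 4.0.0)
# Assumption: Height is in metres and Weight in kilograms, so that
# body mass index = Weight / Height^2 (rounded to one decimal place)
BMI$Index <- round(BMI$Weight / BMI$Height^2, 1)
BMI
```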

Factors

A factor can be used to store character strings:
- each distinct value is treated as a category, called a level (even if the input is a real number);
- some functions require data structured as a factor.
Try the following example:

> x <- factor(c("blue","green","blue","red",
+               "blue","green","green"))
> x
> levels(x)
> class(x)

> x <- factor(c("blue","green","blue","red",
+               "blue","green","green"))
> x
[1] blue  green blue  red   blue  green green
Levels: blue green red
> levels(x)
[1] "blue"  "green" "red"
> class(x)
[1] "factor"
Note: you do not need to count the distinct values (levels) yourself; R determines them automatically.
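Behind the scenes, a factor stores one integer code per element plus the table of levels. The sketch below shows this and also covers numbers stored as a factor; y is an illustrative name. Converting such a factor back to numbers must go through the labels, not the codes.

```r
x <- factor(c("blue", "green", "blue", "red", "blue", "green", "green"))
nlevels(x)      # 3 distinct levels, counted for you
table(x)        # frequency of each level: blue 3, green 3, red 1
as.integer(x)   # internal integer codes: 1 2 1 3 1 2 2

# Numbers can also be stored as a factor: each distinct value is a level
y <- factor(c(10, 20, 10, 30))
levels(y)                     # "10" "20" "30" (kept as character labels)
as.numeric(as.character(y))   # 10 20 10 30: the original numbers
as.numeric(y)                 # 1 2 1 3: the codes, usually not wanted
```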

Other structures
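You can use the as.Date() function for dates:

> dates <- c("92/27/02","92/02/27")
> as.Date(dates,"%y/%m/%d")
[1] NA           "1992-02-27"

The first string gives NA because it does not fit the "%y/%m/%d" format: 27 is not a valid month.
You can also create a time series structure with ts():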

> ts(1:10,frequency=4,start=c(1959,2))
     Qtr1 Qtr2 Qtr3 Qtr4
1959         1    2    3
1960    4    5    6    7
1961    8    9   10
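Once parsed, dates and time series keep their time information, which R can compute with. The sketch below illustrates this; d and q are illustrative names. Dates in ISO form ("YYYY-MM-DD") need no format string.

```r
d <- as.Date(c("1992-02-27", "1992-03-05"))
diff(d)                                  # Time difference of 7 days (1992 is a leap year)
format(d[1], "%d/%m/%Y")                 # "27/02/1992"
seq(d[1], by = "month", length.out = 3)  # 1992-02-27 1992-03-27 1992-04-27

# The quarterly series from the slide: values carry their time index
q <- ts(1:10, frequency = 4, start = c(1959, 2))
start(q)       # 1959 2 (second quarter of 1959)
end(q)         # 1961 3 (third quarter of 1961)
frequency(q)   # 4 observations per year
```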

Summary: the various data structures in R

Data structure | Instruction in R | Description
vector         | c()              | Sequence of elements of the same type.
matrix         | matrix()         | Two-dimensional table of elements of the same type.
array          | array()          | More general than a matrix; table with several dimensions.
list           | list()           | Sequence of structures of any (and possibly different) type.
data frame     | data.frame()     | Two-dimensional table. The columns can be of different types, but must have the same length.
factor         | factor()         | Vector of character strings associated with a table of levels (categories).
dates          | as.Date()        | Vector of dates.
time series    | ts()             | Values of a variable observed at several time points.
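To practise recognising structures (the first goal of this tutorial), class() can be applied to one small example of each structure in the table. In the sketch below, the list name structures and its element names are illustrative.

```r
structures <- list(
  vector      = c(3, 1, 7),
  matrix      = matrix(1:4, nrow = 2),
  array       = array(1:12, dim = c(2, 2, 3)),
  list        = list(TRUE, "a"),
  data.frame  = data.frame(x = 1:2, y = c("a", "b")),
  factor      = factor(c("blue", "red")),
  dates       = as.Date("1992-02-27"),
  time.series = ts(1:4, frequency = 4, start = c(1959, 2))
)
# class() identifies each structure; [1] keeps only the first class
# (a matrix reports c("matrix", "array") in R >= 4.0.0)
classes <- sapply(structures, function(s) class(s)[1])
classes
```

A plain numeric vector reports "numeric" rather than "vector". This is because class() describes the type of the elements for basic vectors.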
