R

VGU

R exercises Set 4
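Load the data before proceeding. Let's name the dataset flights:
flights = read.csv(file.choose())   # select the file 2008.csv in the dialog
If 2008.csv is in the working directory, flights = read.csv("2008.csv") works as well.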
Look over the help pages for the functions which, mean, median, impute (from the Hmisc package), lm and predict.
1. Print the structure of the data. What do you think about it?
2. Print the summary statistics of the data. What do you think about the values (format,
consistency, completeness)?
3. Print the dimensionality of the data (number of rows and columns).
4. Print the number of rows. This may seem like a silly command, but it is quite useful for loops
and if statements.
5. Print the number of columns.
6. Print the names of the variables.
7. Print whether the first column has missing values (NAs). Try to answer this question in two
ways. Hint: %in%
8. Print the number of variables that contain missing values.
9. Find the proportion of the variables that contain missing values. What do you think about it?
10. Print the names of the variables that contain missing values.
11. Impute the missing values of flights$ArrTime with the mean using which.
12. Impute the missing values of flights$CRSArrTime with the median using which.
library(Hmisc)   # provides impute(); install it once with install.packages("Hmisc")
13. Impute the missing values of flights$ArrDelay with the median using the impute function.
14. Impute the missing values of flights$CRSElapsedTime with the median using the impute function.
15. Make a linear regression model named lm_dep_time_delay with dependent variable flights$DepDelay
and independent variables: flights$ArrTime, flights$AirTime, flights$ArrDelay, flights$DepTime.
16. Create an object pred_dep_time_delay and assign to it the predicted values of the model.
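Questions 1 to 10 can be answered with a handful of base R functions. The sketch below is one possible approach, not the official solution. So that it runs without 2008.csv, it builds a small hypothetical data frame with invented values. With the real data, skip the data.frame() step and use the flights object loaded above.

```r
# Hypothetical toy data frame (invented values) standing in for 2008.csv;
# NAs are placed by hand so the missing-value questions have something to find
flights = data.frame(
  DepTime    = c(1232, NA, 1918, 2004),
  ArrTime    = c(1341, NA, 2045, NA),
  CRSArrTime = c(1340, 1350, 2040, 2110),
  AirTime    = c(58, 64, 71, 50)
)

# 1. Structure: number of observations, column types, first values
str(flights)

# 2. Summary statistics; columns with NAs also show an NA's count
print(summary(flights))

# 3.-6. Dimensions, row count, column count, variable names
print(dim(flights))
print(nrow(flights))
print(ncol(flights))
print(names(flights))

# 7. Does the first column contain NAs? Two equivalent answers
print(any(is.na(flights[, 1])))
print(NA %in% flights[, 1])

# 8. is.na() on a data frame gives a logical matrix; colSums() counts the
# NAs per column, so has_na is TRUE for columns with at least one NA
has_na = colSums(is.na(flights)) > 0
print(sum(has_na))

# 9. Proportion of variables with NAs (the mean of a logical vector)
print(mean(has_na))

# 10. Names of those variables
print(names(flights)[has_na])
```

Printing colSums(is.na(flights)) on its own is also useful, since it shows how many values are missing in each column.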
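For questions 11 and 12, which() turns the logical vector returned by is.na() into row indices, and those indices select the entries to overwrite. What follows is a minimal sketch on hypothetical toy columns with invented values. With the real data, skip the data.frame() step.

```r
# Hypothetical toy columns (invented values) standing in for
# flights$ArrTime and flights$CRSArrTime
flights = data.frame(
  ArrTime    = c(1341, NA, 2045, NA, 1500),
  CRSArrTime = c(1340, NA, 2040, 2110, NA)
)

# 11. Mean imputation: which() gives the positions of the NAs,
# and na.rm = TRUE computes the mean from the observed values only
na_rows = which(is.na(flights$ArrTime))
flights$ArrTime[na_rows] = mean(flights$ArrTime, na.rm = TRUE)

# 12. Median imputation, same pattern
na_rows = which(is.na(flights$CRSArrTime))
flights$CRSArrTime[na_rows] = median(flights$CRSArrTime, na.rm = TRUE)

print(flights)
```

If the time columns are coded as hhmm clock times, a mean such as the 1628.667 produced here is not a valid time of day. This is worth keeping in mind when answering question 2.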
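Questions 13 and 14 use impute() from the Hmisc package. It replaces NAs with the value of a summary function computed on the non-missing entries, and median is its default for numeric vectors. The sketch runs on hypothetical toy columns with invented values. It wraps the calls in requireNamespace() so that it still runs, doing nothing, when Hmisc is not installed.

```r
# Hypothetical toy columns (invented values) standing in for
# flights$ArrDelay and flights$CRSElapsedTime
flights = data.frame(
  ArrDelay       = c(-14, 2, NA, 34, NA),
  CRSElapsedTime = c(150, NA, 90, 110, 255)
)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
  # 13. Fill the NAs with the median of the observed values
  flights$ArrDelay = impute(flights$ArrDelay, median)
  # 14. Same for CRSElapsedTime
  flights$CRSElapsedTime = impute(flights$CRSElapsedTime, median)
  # print() on an "impute" object marks the filled-in entries with *
  print(flights$ArrDelay)
  print(flights$CRSElapsedTime)
}
```

The imputed columns keep the class "impute"; as.numeric() turns one back into a plain numeric vector if needed.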
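For questions 15 and 16, lm() fits the model from a formula, and predict() returns the fitted values when no new data are supplied. The sketch writes the formula with bare column names plus data = flights. This gives the same fit as the flights$... notation in the exercise and also lets predict() accept new data later. The sketch runs on simulated, hypothetical data. The na.action = na.exclude argument is an optional choice, not part of the exercise.

```r
# Hypothetical simulated data: 50 flights with the five columns named in
# question 15 (not the real 2008.csv values)
set.seed(42)
n = 50
flights = data.frame(
  ArrTime  = round(runif(n, 600, 2300)),
  AirTime  = round(runif(n, 40, 300)),
  ArrDelay = round(rnorm(n, mean = 5, sd = 20)),
  DepTime  = round(runif(n, 500, 2200))
)
flights$DepDelay = round(0.9 * flights$ArrDelay + rnorm(n, sd = 5))
flights$AirTime[3] = NA   # one NA to show how incomplete rows are handled

# 15. Regress DepDelay on the four predictors; na.exclude drops incomplete
# rows for fitting but keeps their positions (as NA) in the predictions
lm_dep_time_delay = lm(DepDelay ~ ArrTime + AirTime + ArrDelay + DepTime,
                       data = flights, na.action = na.exclude)
print(summary(lm_dep_time_delay))

# 16. Predicted (fitted) values, one per row of flights
pred_dep_time_delay = predict(lm_dep_time_delay)
print(head(pred_dep_time_delay))
```

With the default na.action (na.omit), rows with an NA in any model variable are dropped, so the predictions are shorter than nrow(flights). The na.exclude setting keeps one prediction per row, with NA where the row was incomplete.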

Instructor: Dr. Son P. Nguyen

January 2, 2017
