First, I set the working directory with setwd() to my preferred local folder before solving the
problems below.
input.data = read.csv("data/Bank500.csv")
Note: read.csv2() is not used because the data set does not contain semicolon-separated
values; had that been the case, read.csv2() would have been used instead.
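The difference between the two readers can be checked on inline text. This is a minimal sketch; the two strings are made-up sample data, not rows from Bank500.csv.

```r
# Hypothetical inline data, not taken from Bank500.csv.
comma_text <- "age,balance\n30,1500.5\n45,200.0"
semi_text  <- "age;balance\n30;1500,5\n45;200,0"

# read.csv() expects "," as the field separator and "." as the decimal mark.
comma_df <- read.csv(text = comma_text)

# read.csv2() expects ";" as the field separator and "," as the decimal mark.
semi_df <- read.csv2(text = semi_text)

# Compare the two parsed data frames.
print(all.equal(comma_df, semi_df))
```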
_______________________________________________
2. Use following commands to learn about the data you have read into input.data – Write
code for each;
a. Number of rows :
500 rows
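The answer above states only the count. A minimal sketch of code that would return it follows; toy is a made-up data frame standing in for input.data, so it reports 3 rows rather than 500.

```r
# Hypothetical stand-in for input.data; the real file has 500 rows.
toy <- data.frame(age = c(25, 41, 38), balance = c(100, 2500, -40))

n1 <- nrow(toy)    # number of rows
n2 <- dim(toy)[1]  # first element of dim() is the row count
n3 <- NROW(toy)    # like nrow(), but also works for plain vectors
print(c(n1, n2, n3))
```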
_____________________
b. Number of columns
[AB]: Three ways that I know of:
1. dim(input.data)[2]
2. ncol(input.data)
3. NCOL(input.data)
21 columns
_____________________
c. Display Top 6 rows
[AB]: Three ways that I know of:
head(input.data) or
input.data[1:6,] or
input.data[c(1:6),]
_____________________
d. Display bottom 6 rows
[AB]: Three ways that I know of:
tail(input.data)
bottom6thRow = nrow(input.data)-5
print(bottom6thRow)
input.data[bottom6thRow:nrow(input.data),]
input.data[c(bottom6thRow:nrow(input.data)),]
_____________________
e. Display first 20 rows
[AB]:
head(input.data,20) or
input.data[1:20,] or
input.data[c(1:20),]
_____________________
5. What is the datatype of variable “contact”? What are the different levels or values that the
variable contact can assume?
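One way to approach this question is sketched below. The values in toy are hypothetical; the actual type and levels of contact in Bank500.csv have to be read from the real data.

```r
# Hypothetical values; the real levels of contact in Bank500.csv may differ.
toy <- data.frame(contact = c("cellular", "telephone", "cellular", "unknown"))

# Since R 4.0.0, read.csv() and data.frame() keep strings as character
# unless stringsAsFactors = TRUE is set.
contact_class <- class(toy$contact)

# levels() only applies to factors, so convert first.
contact_levels <- levels(factor(toy$contact))

# unique() gives the distinct values in order of first appearance.
contact_values <- unique(toy$contact)

print(contact_class)
print(contact_levels)
```

On the real data the same calls would be applied to input.data$contact.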
___________________________________________
[AB]:
The question seems incomplete to me. If it asks only for the distinct values, then
table(input.data$age) is suitable, since it lists each distinct age together with its count;
otherwise,
input.data$age displays the whole age variable.
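A small sketch of the two readings of the question, using a made-up age vector in place of input.data$age:

```r
# Hypothetical ages, standing in for input.data$age.
age <- c(30, 45, 30, 52, 45, 30)

distinct_ages <- sort(unique(age))  # distinct values only
age_counts <- table(age)            # distinct values with their frequencies

print(distinct_ages)
print(age_counts)
```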
___________________________________________
colnames(input.data)
names(input.data)
___________________________________________
8. Create a new variable “age_below_40” which is a subset of all input.data where all the rows
of age <= 40. Use subset function.
[AB]: age_below_40 = subset(input.data, age <= 40)
*Note: subset() can be slower than dplyr's filter() on large data sets
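For comparison, the same filter can be written with plain logical indexing. This is a sketch on a made-up data frame; it uses base R only, so it does not demonstrate the dplyr alternative mentioned in the note.

```r
# Hypothetical stand-in for input.data, including one missing age.
toy <- data.frame(age = c(25, 41, 40, NA), balance = c(100, 2500, -40, 10))

by_subset <- subset(toy, age <= 40)

# Base-R indexing equivalent; which() drops the NA comparison,
# just as subset() does.
by_index <- toy[which(toy$age <= 40), ]

print(all.equal(by_subset, by_index))
```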
___________________________________________
____________________________________________
12. From the output above, What is the balance of the customer listed on the top row?
[AB]:3571
____________________________________________
Basic Graphs in R
In this section, we will learn the following:
13. Generate a Histogram for ‘age’ field in input.data – Write code below:
[AB]: hist(input.data$age,main = "Age Histogram",ylab = "Count",xlab = "Age Range")
________________________________
14. Do you observe anything unusual from the histogram generated above? If yes, state your
observation
[AB]: No
________________________________
15. Generate a Barplot to understand the distribution of customers based on marital status –
Write code below:
[AB]: The question does not say whether the bars should be horizontal or vertical, so both
versions are given below.
1. VerticalBarplot = barplot(table(input.data$marital), col = rainbow(3))
2. horizontalBarPlot = barplot(table(input.data$marital), col = rainbow(3), horiz = TRUE)
________________________________
16. Generate a Scatter Plot to understand the correlation between Duration on x-axis and
Balance on y-axis. Write code below:
[AB]:
Normal plot :
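The code for this answer is not present in the text. A minimal base-R sketch of such a scatter plot might look as follows; the column names duration and balance and the values in toy are assumptions, not taken from Bank500.csv.

```r
# Hypothetical stand-in for input.data; column names are assumed.
toy <- data.frame(duration = c(60, 180, 240, 400, 90),
                  balance  = c(500, 1200, -100, 3000, 750))

# Draw to a null device so the sketch also runs non-interactively;
# drop pdf(NULL) and dev.off() in an interactive session.
pdf(NULL)
plot(toy$duration, toy$balance,
     main = "Balance vs Duration",
     xlab = "Duration", ylab = "Balance")
invisible(dev.off())
```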
________________________________
17. Generate a Box Plot for the variable ‘balance’. Make sure to specify the title and labels for X
and Y axis. Write the code below:
[AB]: boxplot(input.data$balance, main = "Balance", ylab = "balance",xlab="Records Count")
_____________________________________________________
_____________________________________________________
18. Generate a Box Plot for balance versus job. Write code below:
[AB]: library(ggplot2)
ggplot(input.data, aes(x = job, y = balance)) + geom_boxplot()
_____________________________________________________
_____________________________________________________
19. From the Box Plot generated above, which job category has the highest median balance?
[AB]: Unknown
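Instead of reading the medians off the plot, they can be computed directly. This is a sketch on made-up data, so the job names and balances below are hypothetical and say nothing about the real answer.

```r
# Hypothetical stand-in for input.data.
toy <- data.frame(job = c("admin", "admin", "retired", "retired", "student"),
                  balance = c(100, 300, 2000, 1000, 50))

# Median balance per job category.
medians <- aggregate(balance ~ job, data = toy, FUN = median)

# Job category with the largest median.
top_job <- medians$job[which.max(medians$balance)]

print(medians)
print(top_job)
```

On the real data, toy would be replaced by input.data.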
_____________________________________________________
20. Using ggplot2 package, generate a point plot where x-axis is education and y-axis is balance.
Write code below:
[AB]: ggplot(input.data, aes(x = education, y = balance)) +
geom_point()
_____________________________________________________
21. Using ggplot2 package, generate a point plot where x-axis is age and y-axis is balance. Write
code below: - Is there a correlation between age and balance?
[AB]: ggplot(input.data, aes(x = age, y = balance)) + geom_point()
There is no clear correlation between age and balance. A loose relationship can be read into
the plot, but it is probably too weak to be of practical use.
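The visual impression can be backed by a correlation coefficient. This is a sketch with made-up numbers; note that Pearson's r captures linear association only.

```r
# Hypothetical stand-in for input.data.
toy <- data.frame(age = c(25, 32, 41, 50, 63),
                  balance = c(300, 1500, 200, 2500, 900))

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r <- cor(toy$age, toy$balance, use = "complete.obs")
print(r)
```

Values near 0 indicate little linear association; values near 1 or -1 indicate a strong one.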
_____________________________________________________
_____________________________________________________
22. Enhance the point plot above so that we can identify those who have housing loan based on
colour. Write code below:
[AB]: ggplot(input.data, aes(x = age, y = balance, colour = loan)) + geom_point()
(This assumes the loan column records the housing loan.)
__________________________________________________________
23. Let us try to understand the overall Monthly Average Balance. Run the code below and
answer relevant questions: