Dataset: School drop-out data from schools in Andhra Pradesh. The data was
collected to predict which children were about to drop out,
so that preventive measures could be taken. File name: studentDropIndia_20161215.csv
PART-I
-------
Histograms and Density Plots
1. Plot a histogram of 'science_marks'. Use bins = 10.
2. Plot a density plot of 'science_teacher'
3. Plot a density plot of 'science_teacher', 'continue_drop'-wise,
and use alpha = 0.2. We will draw more density plots in Part II
and interpret them.
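The three plot types above can be sketched on a small invented data frame. This is only an illustration under assumptions: the values are simulated, the column names are borrowed from the exercise, and the 'continue'/'drop' labels are assumed. Swap in your own data frame and columns.

```r
library(ggplot2)

# Toy data, simulated for illustration only; NOT the assignment dataset
set.seed(1)
toy <- data.frame(
  science_marks = c(rnorm(50, mean = 0.6, sd = 0.1),
                    rnorm(50, mean = 0.4, sd = 0.1)),
  continue_drop = factor(rep(c("continue", "drop"), each = 50))
)

# Histogram with 10 bins
p_hist <- ggplot(toy, aes(x = science_marks)) +
  geom_histogram(bins = 10)

# Density of the whole column
p_dens <- ggplot(toy, aes(x = science_marks)) +
  geom_density()

# One density per group; alpha = 0.2 keeps overlapping curves visible
p_group <- ggplot(toy, aes(x = science_marks, fill = continue_drop)) +
  geom_density(alpha = 0.2)

# In a script, plots must be printed explicitly
print(p_hist)
print(p_dens)
print(p_group)
```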
Bar charts
4. Draw a stacked bar chart of continue_drop vs gender.
5. In the bar chart that you have drawn above, make only one
change: write geom_bar() as geom_bar(position = "fill").
The type of bar graph changes. Can you explain why? And interpret it?
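A minimal sketch of the two geom_bar() variants, using a tiny invented data frame (the rows and the 'F'/'M' and 'continue'/'drop' labels are assumptions made up for illustration):

```r
library(ggplot2)

# Toy data, invented for illustration only: 6 girls and 4 boys
toy <- data.frame(
  gender = c(rep("F", 6), rep("M", 4)),
  continue_drop = c("continue", "continue", "continue", "continue",
                    "drop", "drop",
                    "continue", "continue", "continue", "drop")
)

# Default position = "stack": bar heights are counts
p_stack <- ggplot(toy, aes(x = gender, fill = continue_drop)) +
  geom_bar()

# position = "fill": each bar is rescaled to height 1,
# so the segments show proportions within each gender
p_fill <- ggplot(toy, aes(x = gender, fill = continue_drop)) +
  geom_bar(position = "fill")

print(p_stack)
print(p_fill)
```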
Box plots
6. Draw boxplots of gender vs mathematics_marks. Who is
better in maths: males or females? Can you think of
possible reasons?
7. Does the guardian matter in the performance of a child?
Draw boxplots of guardian vs total marks.
(Within ggplot itself you can write the sum directly:
mathematics_marks + english_marks + science_marks)
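The hint about writing the sum inside ggplot can be sketched as follows. The data frame is invented for illustration, and the guardian labels are assumptions rather than values taken from the real file.

```r
library(ggplot2)

# Toy data, invented for illustration only
set.seed(7)
toy <- data.frame(
  guardian          = rep(c("father", "mother", "other"), each = 10),
  mathematics_marks = runif(30),
  english_marks     = runif(30),
  science_marks     = runif(30)
)

# An expression is allowed inside aes(); no extra 'total' column is needed
p_box <- ggplot(toy, aes(x = guardian,
                         y = mathematics_marks + english_marks + science_marks)) +
  geom_boxplot() +
  labs(y = "total marks")

print(p_box)
```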
Scatterplots
8. Are those who are good in mathematics also good in English?
Does this observation generally hold? Draw a scatterplot
of mathematics_marks vs english_marks.
9. Smooth the above graph and then observe.
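One way to smooth a scatterplot is to add geom_smooth() on top of geom_point(). The sketch below uses simulated marks (an assumption, not the real data) and picks a loess smoother as one possible choice.

```r
library(ggplot2)

# Toy data, simulated for illustration only
set.seed(42)
toy <- data.frame(mathematics_marks = runif(80))
toy$english_marks <- 0.5 * toy$mathematics_marks + rnorm(80, sd = 0.1)

# Plain scatterplot
p_scatter <- ggplot(toy, aes(x = mathematics_marks, y = english_marks)) +
  geom_point(alpha = 0.5)

# Same plot with a loess smoother and its confidence band added
p_smooth <- p_scatter +
  geom_smooth(method = "loess", formula = y ~ x)

print(p_scatter)
print(p_smooth)
```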
Since the data is real, if you are inquisitive, you can try to dig
into why girls are dropping out more than boys. Is it a lack of
toilets? Or are there other reasons?
PART-II
-------
Exercise on Feature Plotting
===========================
One important analysis in predictive analytics is determining which
features matter most for correctly predicting the target.
This can be assessed visually, as shown below.
ALL PLOTS ARE TO BE PASTED INTO AN MS-WORD FILE. SAVE THE MS-WORD FILE AS PDF
AND UPLOAD IT TO MOODLE. EACH PLOT MUST CARRY YOUR INTERPRETATION.
library(caret)
# 1. Box plots
# Change 'data' to your dataset name and also change the column numbers.
# Change 'target' to your target name; the target should be a factor.
featurePlot(x = data[, c(1:4, 7, 10, 13)],
            y = as.factor(data$target),
            plot = "box",
            scales = list(y = list(relation = "free"),
                          x = list(rot = 90)),
            layout = c(3, 3),              # 3 X 3 grid
            auto.key = list(columns = 4))
# 2. Density plots
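A density version of featurePlot() might look like the sketch below. It uses R's built-in iris data purely as a stand-in (an assumption, since the source gives no code for this step). Replace iris, the column range and the target with your own.

```r
library(caret)

# iris is only a stand-in dataset for illustration
p_density <- featurePlot(x = iris[, 1:4],
                         y = iris$Species,        # target must be a factor
                         plot = "density",
                         scales = list(x = list(relation = "free"),
                                       y = list(relation = "free")),
                         adjust = 1.5,            # smoothing of the density curves
                         pch = "|",               # rug marks along the x axis
                         layout = c(4, 1),        # 4 panels in one row
                         auto.key = list(columns = 3))

# featurePlot() returns a lattice object; print it to draw
print(p_density)
```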
*************************************************************************
YOU ARE WELCOME TO EMAIL ME OR PROF DHANYA FOR SEEKING ANY CLARIFICATIONS
*************************************************************************