International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842

Issue 12, Volume 4 (December 2017) www.irjcs.com

REGRESSION ANALYSIS ON STUDENTS LIFESTYLE


Anushka Ranjan
Information Technology Department, NMIMS MPSTME
Shirpur, India
Anushkaranjan203@gmail.com
Pratiksha Meshram
Information Technology Department, NMIMS MPSTME
Shirpur, India
Manuscript History
Number: IRJCS/RS/Vol.04/Issue12/DCCS10085
DOI: 10.26562/IRJCS.2017.DCCS10085
Received: 18, November 2017
Final Correction: 27, November 2017
Final Accepted: 07, December 2017
Published: December 2017
Citation: Ranjan, A. & Meshram, P. (2017). REGRESSION ANALYSIS ON STUDENTS LIFESTYLE. International
Research Journal of Computer Science, Volume IV, 21-27. doi: 10.26562/IRJCS.2017.DCCS10085
Editor: Dr.A.Arul L.S, Chief Editor, IRJCS, AM Publications, India
Copyright: ©2017 This is an open access article distributed under the terms of the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited.
Abstract-- Big Data is a fast-growing field of research that can be used to interpret many trends in data. In
this research I analyse how several lifestyle attributes relate to the Mathematics grade of Portuguese
secondary-school students, using regression analysis in Excel. The analysis shows which attributes are significant
and suggests patterns that could be taken into account in school curricula. The regression analysis comprises
t-statistics, p-values and an ANOVA test, evaluated at the 95% confidence level.

I. INTRODUCTION
Big Data is a pool of data that becomes useful only when patterns are drawn from it and knowledge and information
are extracted. A student's lifestyle has different aspects that affect their grades, which makes it an area of great
interest. Such an analysis can be used to study how a student population evolves and what changes can be made to the
course curriculum to make students more efficient and interactive. Big data is a vast topic of research; I propose a
way of analysing data on students in secondary education at two Portuguese schools. The data set records the grades
G1 and G2 (half-yearly grades) and G3 (final grade), together with the time students devote to various day-to-day
activities. G1 and G2 are strongly correlated with G3, as the final grade is the cumulative outcome of these two.
Other aspects of the students are coded numerically and analysed by multiple regression. Some aspects show a clear
relation with the grades, which I explain in this paper. Regression is used to verify the relation of different
attributes with each other. It involves one dependent variable and several independent variables whose correlation
is tested in the analysis. It is a basic technique of data mining and is implemented by various tools such as Excel,
R and Python. The data sets can be studied further with the help of graphs, specifically scatter plots.
II. REGRESSION ANALYSIS

Regression analysis is used to find trends in data and is widely used in statistics. For example, to examine how
profit over three years depends on the cost of raw material, you can fit a linear regression line and use its slope
to determine how much the profit changes per unit change in cost. Because the fitted relationship can be drawn as a
line or curve, it is easy to draw conclusions and predict the nature of the trend.
__________________________________________________________________________________________________
IRJCS: Impact Factor Value – SJIF: Innospace, Morocco (2016): 4.281
Indexcopernicus: (ICV 2016): 88.80
© 2014- 17, IRJCS- All Rights Reserved Page -21

Fig 1 Linear Regression Equation
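
The least-squares line described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_line` and the cost and profit figures are made up and do not come from the study.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical raw-material cost (x) and profit (y) for four periods.
cost = [1.0, 2.0, 3.0, 4.0]
profit = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(cost, profit)
print(f"profit = {intercept:.2f} + {slope:.2f} * cost")
```

The slope is the quantity of interest in the example: it states how much the profit changes for each one-unit change in cost.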


A) Multiple Regression
It involves one dependent variable and several independent variables, and it is used to determine the relation among
them and the effect of each independent variable on the dependent variable. For example, if the cost of cloth is to
be studied, we take the cost of raw material, machine cost and logistics cost as independent variables. After the
regression analysis a scatter plot can be drawn. Suppose the regression coefficient of raw material is 0.14; then the
price changes by 0.14 units for every one-unit change in raw-material cost, with the other variables held constant.
These are the conclusions drawn, and the value of R² tells what proportion of the variation in the dependent variable
is explained by the model.

Fig 2 Equation of Multiple Regressions
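
As a rough sketch of the cloth-price example, a multiple regression can be fitted by solving the normal equations. Everything here is an assumption made for illustration: the helper names `solve` and `fit_multiple`, the six data rows, and the coefficients used to generate the prices are hypothetical, and Excel's own routine may work differently.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns (coefficients, r_squared)."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    return beta, 1.0 - sse / sst


# Hypothetical (raw material, machine, logistics) costs per batch.
costs = [(1, 2, 1), (2, 1, 3), (3, 4, 2), (4, 3, 5), (5, 6, 4), (6, 5, 7)]
# Prices generated from assumed coefficients so the fit can be checked by eye.
price = [2.0 + 0.14 * a + 0.5 * b + 0.3 * c for a, b, c in costs]

beta, r_squared = fit_multiple(costs, price)
print("intercept and coefficients:", [round(b, 4) for b in beta])
print("R squared:", round(r_squared, 4))
```

Because the prices were generated without noise, the fit recovers the assumed coefficients and R² is essentially 1; real data would give a lower R².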


B) P-Statistics
The "p-value" of a test is the probability of obtaining results at least as extreme as those observed purely by
random chance, assuming the null hypothesis is true. The p-value is therefore a probability and lies between zero and
one. A value below 0.05 is usually required for a result to be considered significant.

Fig 3 Equation for calculation P-value


C) The confidence level
It indicates how often an interval computed from sample data is expected to contain the true value of a parameter. If
the same population is sampled a number of times and an interval is constructed from every sample, then at the 95%
confidence level about 95% of those intervals would contain the true population parameter.

Fig 4 Normal Distribution
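
A 95% confidence interval for a mean can be sketched as follows. The sample grades are made up, the function name `mean_confidence_interval` is hypothetical, and the normal quantile is used as an approximation; for a small sample such as this one a t quantile would give a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    centre = mean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width


# Made-up grades on the 0 to 20 scale used in the data set.
grades = [10, 12, 9, 15, 11, 13, 8, 14, 12, 10]

low, high = mean_confidence_interval(grades)
print(f"95% CI for the mean grade: ({low:.2f}, {high:.2f})")
```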


D) T-stats
It is a test statistic. It compares the observed data with what is expected under the null hypothesis. The p-value is
calculated from the t-statistic; it indicates whether the effect observed in the sample is significant for the
general population or not. The degrees of freedom (df) are closely related to the sample size and determine the shape
of the t-distribution graph.

Fig 5 Equation of t-statistics

Fig 6 T-Distribution Curve


This represents the t-distribution curve for 20 degrees of freedom, which corresponds to a sample size of 21. The
curve is the probability density function of the distribution.
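
The link between a t-statistic and its p-value can be sketched as below. The coefficient and standard error are hypothetical, and the normal distribution is used as a stand-in for the t-distribution, which is a reasonable approximation only when the degrees of freedom are large (for 20 degrees of freedom the exact t-distribution gives a slightly larger p-value).

```python
from statistics import NormalDist


def t_statistic(coefficient, standard_error):
    """Ratio of an estimated coefficient to its standard error."""
    return coefficient / standard_error


def two_sided_p_value(t):
    """Two-sided p-value using the normal approximation to the t-distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical regression coefficient and standard error.
t = t_statistic(0.5, 0.2)
p = two_sided_p_value(t)
print(f"t = {t:.2f}, approximate p-value = {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "not significant at the 5% level")
```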
Output of Multiple Regression Analysis
• The multiple correlation coefficient R
• The coefficient of determination R²
• Error estimation
III. DESCRIPTION OF DATA SET
To predict student performance, a data set from two Portuguese schools was compiled from questionnaires and students'
profile records. The data set consists of 33 attributes, described as follows:
1. school - name of the student's school ('GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2. sex - student's sex ('F' - female or 'M' - male)
3. age - student's age (from 15 to 22)
4. address - student's home address type ('U' - urban or 'R' - rural)
5. famsize - family size ('LE3' - less than or equal to 3 or 'GT3' - greater than 3)
6. Pstatus - parents' marital status ('T' - living together or 'A' - apart)
7. Medu - mother's education (0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary
education or 4 - higher education)
8. Fedu - father's education (0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary
education or 4 - higher education)
9. Mjob - mother's job ('teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or
'other')
10. Fjob - father's job ('teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or
'other')
11. reason - reason for choosing the respective school (close to 'home', school 'reputation', 'course' preference or
'other')
12. guardian - student's guardian ('mother', 'father' or 'other')
13. traveltime - home-to-school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1
hour)
14. studytime - weekly study time (1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16. schoolsup - extra educational support (yes or no)
17. famsup - family educational support (yes or no)
18. paid - extra paid classes in the course subject (Math) (yes or no)
19. activities - extra-curricular activities (yes or no)
20. nursery - attended nursery school (yes or no)
21. higher - wants to take higher education (yes or no)
22. internet - Internet access (yes or no)
23. romantic - in a romantic relationship (yes or no)
24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25. freetime - free time (from 1 - very low to 5 - very high)
26. goout - time spent with friends outside home (from 1 - very low to 5 - very high)
27. Dalc - school-day alcohol consumption (numeric: from 1 - very low to 5 - very high)
28. Walc - weekend alcohol consumption (from 1 - very low to 5 - very high)
29. health - health status (from 1 - very bad to 5 - very good)
30. absences - number of school absences (numeric: from 0 to 93)
31. G1 - first period grade (numeric: from 0 to 20)
32. G2 - second period grade (numeric: from 0 to 20)
33. G3 - final grade (numeric: from 0 to 20, output target) obtained in Maths.
IV. REGRESSION ANALYSIS ON DATA SET
The first regression explained little of the variance, with a low value of R². I took the following attributes into
consideration for the regression analysis:
• G3
• studytime
• failures
• internet
• freetime
• famrel

Fig 7 Sample Output of Regression Data Set


• health
• absences
• traveltime
• goout
• Dalc
Here the dependent variable is Y = G3 (the final grade) and the independent variables X are attributes 2 to 11 of the
list above. All these attributes contribute to the analysis and have different coefficients. The p-value gives the
probability that an effect arose by chance rather than being a true occurrence; it is obtained from the t-statistic
and helps in determining significance. The regression was run on the data set with the Excel Data Analysis tool. The
first result (Fig 7) is not used for interpretation, because attributes whose p-value exceeds the cut-off (given as
1.5) do not help to predict the outcome, i.e. they are not significant. Therefore we run the regression again without
them to obtain a significant result.

Fig 8 Sample Output of Regression after omitting attributes with p-value above 1.5


Now that the model is modified, the p-values are much closer to 0.05 and our predictions will be more accurate. The
result shown above has a section called ANOVA, which stands for analysis of variance and reports the regression,
residual and total rows. The regression degrees of freedom equal the number of independent variables, k = 3, i.e.
travel time, time going out with friends and free time. The number of observations is n = 395, so the residual
degrees of freedom are n - k - 1 = 391. The rest of the table tells us why the attributes do not contribute equally
to the grade outcomes of students, or technically why there is variation. The total variation is the SST, the total
sum of squares, equal to 8269.90. To understand whether the regression did a good job we take the sums of squares and
run the F-test, which compares the explained variance with the unexplained variance. Since the explained variance per
degree of freedom is about 4 times the unexplained variance per degree of freedom, the F value is large. The
corresponding Significance F (the p-value of the F-test) is below 0.05, so the regression as a whole is significant.
The result shown above gives a multiple R of 0.18, i.e. 18%, which is the coefficient of correlation between the
observed and the predicted final grade. R² for the data set is 3%, which means that the model explains little of the
variation. However, as researchers have noted, human behaviour and attributes are mostly unpredictable, so R² is
usually found to be less than 50% in such studies, and R² also increases as the number of independent variables
increases. The intercept is 12.5, which is where the regression line crosses the Y-axis.
Now let us look at each attribute individually. Time spent travelling between home and school leaves a student tired
and exhausted; the coefficient is -0.74, i.e. with an increase of one unit in travel time the student's final grade
(the output target) changes by -0.736801421. Similarly, a one-unit increase in free time raises the output target by
0.231304228, whereas a one-unit increase in going out with friends reduces it by 0.592448826. The lower and upper 95%
values are the bounds of the 95% confidence interval for each coefficient and vary according to the attribute.
Looking at the individual p-values, travel time and going out have values less than 0.05 and are therefore
significant, whereas free time is not significant.
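
The arithmetic behind the ANOVA table and the fitted equation can be retraced in a short Python sketch. It uses only the figures reported in the text (n, k, multiple R, SST, intercept and coefficients); because multiple R is quoted in rounded form, the sums of squares and the F value computed here are approximations, and the function name `predict_g3` is hypothetical.

```python
n, k = 395, 3          # observations and independent variables, as reported
multiple_r = 0.18      # reported multiple R (rounded)
sst = 8269.90          # reported total sum of squares

df_regression = k
df_residual = n - k - 1            # 395 - 3 - 1 = 391
r_squared = multiple_r ** 2        # about 0.03
ssr = r_squared * sst              # explained sum of squares (approximate)
sse = sst - ssr                    # unexplained sum of squares (approximate)

# F compares explained and unexplained variance per degree of freedom.
f_value = (ssr / df_regression) / (sse / df_residual)


def predict_g3(traveltime, freetime, goout):
    """Fitted final grade from the reported intercept and coefficients."""
    return (12.5
            - 0.736801421 * traveltime
            + 0.231304228 * freetime
            - 0.592448826 * goout)


print(f"residual df = {df_residual}, R squared = {r_squared:.4f}, F = {f_value:.2f}")
# Example: shortest travel time (1), medium free time (3), medium going out (3).
print(f"predicted G3 = {predict_g3(1, 3, 3):.2f}")
```

The F value of roughly 4 obtained this way agrees with the statement that the explained variance per degree of freedom is about 4 times the unexplained one.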

V. SEPARATE ANALYSIS OF INDEPENDENT VARIABLES


Travel Time

[Scatter plot of traveltime (vertical axis, 0 to 4.5; horizontal axis, 0 to 25) with trendline
y = -0.0178x + 1.6338, R² = 0.0137]

Fig 9 Graph of Travel Time


Free Time

[Scatter plot of freetime (vertical axis, 0 to 6; horizontal axis, 0 to 500) with trendline
y = 3E-05x + 3.2287, R² = 2E-05]

Fig 10 Graph of Free Time
Going out time

[Scatter plot of goout (vertical axis, 0 to 6; horizontal axis, 0 to 25) with trendline
y = -0.0323x + 3.4449, R² = 0.0176]

Fig 11 Graph of Going Out time

VI. CONCLUSION
As we have seen in the regression analysis of the data set, the final grade of a student decreases with each one-unit
increase in travel time and in going out with friends, but there is a slight increase of 0.23 in the grade for each
one-unit increase in free time, as it lets the student do some recreational activities and helps in clearing his or
her mind. Therefore the introduction of recess is good not only for health but also for refreshing the brain and
removing the saturation caused by studying. From this research we can conclude that students should be given free
time to make them more efficient and energetic. In future, more attributes of students' lives could be examined to
analyse their scores, as students are the future of every nation, and more powerful tools such as R or Python could
also be used for the analysis.
VII. LITERATURE REVIEW
Few research papers were found that discuss the behaviour of students. Abdul Rauf Baig and Hajira Jabeen wrote a
paper entitled "Big data analytics for behaviour monitoring of students". The paper gives a new meaning to
behavioural analytics and expands it from the limited realm of websites and e-commerce. They argue that enough data
is available in a university environment that can be harnessed with the help of a Big Data model and accompanying
technologies to monitor and predict deviant behaviour in students. Sarah Mohamed Hassan and Muna S. Al-Razgan, in
their research paper "Pre-University Exams Effect on Students GPA: A case Study in IT Department", observe that many
students have very high scores in high school but do not enter the college they want because of their scores in
competitive exams. Ghada Badr, Afnan Algobail, Hanadi Almutairi and Manal Almutery, in their paper "Predicting
Students' Performance in University Courses: A Case Study and Tool in KSU Mathematics Department", predict the
performance of students in programming courses based on their performance in English and Mathematics courses. An
application is designed based on the CBA rule generation algorithm. It shows that the English course has a
significant effect on the programming course. "Predicting Critical Courses Affecting Students Performance: A Case
Study" by Yasmeen Altujjar, Wejdan Altamimi, Isra Al-Turaiki and Muna Al-Razgan explains the effect of various
subjects of the IT department on the students' future selection of courses and their effectiveness. They used
educational data mining to draw patterns and predict the data accurately.

REFERENCES
1. Ghada Badr, Afnan Algobail, Hanadi Almutairi, Manal Almutery. "Predicting Students' Performance in University
Courses: A Case Study and Tool in KSU Mathematics Department". Uses educational data mining for the analysis.
2. Abdul Rauf Baig, Hajira Jabeen. "Big data analytics for behavior monitoring of students". Analyses the deviation
of students towards terrorism.
3. Sarah Mohamed Hassan, Muna S. Al-Razgan. "Pre-University Exams Effect on Students GPA: A case Study in IT
Department". Studies GPA and high school marks and draws conclusions.
4. Hatem AbdulKader, Emad ElAbd, Waleed Ead. "Protecting online social networks profiles by hiding sensitive data
attributes". Studies today's generation's interaction with social media and its sensitive data.
5. Linah Aburahmah, Hajar AlRawi, Yamamah Izz, Liyakathunisa Syed. "Online Social Gaming and Social Networking
Sites". Studies the impact of online social media and gaming on the current generation.
6. Yasmeen Altujjar, Wejdan Altamimi, Isra Al-Turaiki, Muna Al-Razgan. "Predicting Critical Courses Affecting
Students Performance: A Case Study". Studies difficult subjects.
7. Source of the data set: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student
Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th Future Business Technology Conference
(FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
