
Generalized Poisson Regression Models for Drop Out of Senior High School in East Java – 2019

Ir. Sri Pingit Wulandari, M.Si¹ and Lina Izzah Mazidah²

¹,² Business Statistics, Sepuluh Nopember Institute of Technology, INDONESIA.
(E-mail: sripingitwulandari@gmail.com, linamzd11@gmail.com)

Abstract. Education is one of the important sectors that contribute to the quality of human resources in Indonesia, and it enters the human development index through the mean years of schooling and the expected years of schooling. One of the education problems in Indonesia is the high number of dropouts: in 2017 Indonesia had 31,123 dropouts at the senior high school level. A dropout is a student who leaves school before graduating from a certain level of education; the lower the number of dropouts, the better the education. East Java is the province with the largest number of student dropouts after West Java, with 3,850 senior high school dropouts in 2017. This research uses Generalized Poisson Regression to find the factors that affect the number of students dropping out of school in East Java. The results show that the variables that significantly affect the number of dropouts in East Java are the percentage of poor people and the expected years of schooling. If the percentage of poor people rises, the number of dropouts increases; if the expected years of schooling rises, the number of dropouts decreases.

Keywords: East Java, Drop Out, Generalized Poisson Regression

1. Introduction
Education is an important sector that directly contributes to improving the quality of human resources in Indonesia. According to Law No. 20 of 2003, education is a conscious and planned effort to develop the potential and abilities of children so that they can benefit themselves as individuals and as citizens in the future. Education is also one of the indicators of the human development index, through the mean years of schooling and the expected years of schooling. Government programs supporting education in Indonesia are very diverse, including the Indonesia Smart Card, which has been distributed to 18,991,972 students, the construction of 2,269 libraries, and the construction of 271 new school units (Kemdikbud, 2016). The 12-year compulsory education program is now considered fairly successful, but the number of compulsory-school-age children who only reach elementary or junior high school is still large.
One of the problems of education in Indonesia is the high number of pupils dropping out of school. The dropout rate is the percentage of students who leave school before graduating from a certain education level; the lower the dropout rate, the better. The dropout rate should not exceed 1% of the number of students enrolled in school. East Java has the second-largest number of students dropping out of school in Indonesia, with 3,850 senior high school dropouts. This high number forces the government to work harder and makes it more difficult to achieve the goal of 12-year compulsory education. This motivates research on the factors that affect student dropouts in East Java.
Previous research by Astari (2013) on school dropouts found that the significant variables were the ratio of pupils to schools, the ratio of students to teachers, and school enrollment rates [2]. Research by Nasra (2017) on modeling the dropout rate of compulsory-school-age children in South Sulawesi found that the factors affecting the dropout rate were the ratio of pupils to teachers, the number of poor people, population density, and the mean years of schooling [8].
The number of students dropping out of school in East Java is count data, so an appropriate regression analysis is Poisson regression, which models a response variable assumed to follow a Poisson distribution [4]. Poisson regression assumes that there is no multicollinearity among the predictor variables and that the variance equals the mean. In applications, however, the mean and variance are often not equal; the case in which the variance is greater than the mean is called overdispersion. Under overdispersion, Poisson regression becomes invalid (its standard errors are underestimated, so its significance tests are misleading), so a method is needed to overcome overdispersion in Poisson regression. The Generalized Poisson Regression (GPR) model is appropriate for count data that violate the Poisson assumption that the mean equals the variance, in other words when there is over- or underdispersion; for this purpose GPR contains α as a dispersion parameter [5].

2. Theory
2.1 Generalized Poisson Regression
The Generalized Poisson Regression (GPR) model is appropriate for count data in which the Poisson assumption that the mean equals the variance is violated, in other words when there is over- or underdispersion, so in addition to the regression coefficients the Generalized Poisson model contains α as a dispersion parameter. The GPR model is similar to the Poisson regression model, which is a generalized linear model (GLM), but it assumes that the random component follows a Generalized Poisson distribution. Let y_i = 0, 1, 2, ... be the response variable. The Generalized Poisson distribution is as follows (Famoye, et al., 2004).

$$f(y_i;\mu_i,\alpha)=\left(\frac{\mu_i}{1+\alpha\mu_i}\right)^{y_i}\frac{(1+\alpha y_i)^{y_i-1}}{y_i!}\exp\left[-\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}\right]$$

where y_i = 0, 1, 2, .... The mean and variance of the GP model are as follows:

$$E(y_i\mid\mathbf{x}_i)=\mu_i\qquad\text{and}\qquad V(y_i\mid\mathbf{x}_i)=\mu_i(1+\alpha\mu_i)^2$$
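As a numerical sanity check on the distribution above, the following Python sketch evaluates the GP probability mass function in log space and sums it over a long range of counts, so that the total probability, the mean μ and the variance μ(1 + αμ)² can be compared with the formulas. It is a minimal sketch that assumes α ≥ 0 (the overdispersed or Poisson case, where the support is unbounded); the helper names `gp_logpmf` and `gp_moments` and the values μ = 3, α = 0.1 are illustrative choices, not taken from the paper.

```python
import math

def gp_logpmf(y, mu, alpha):
    """Log of the Generalized Poisson pmf f(y; mu, alpha) in the form shown above.

    Sketch assumption: mu > 0 and alpha >= 0 (overdispersed or Poisson case).
    """
    if mu <= 0 or alpha < 0:
        raise ValueError("this sketch assumes mu > 0 and alpha >= 0")
    return (y * math.log(mu / (1 + alpha * mu))
            + (y - 1) * math.log(1 + alpha * y)
            - math.lgamma(y + 1)                      # ln(y!)
            - mu * (1 + alpha * y) / (1 + alpha * mu))

def gp_moments(mu, alpha, y_max=500):
    """Total probability, mean and variance obtained by summing the pmf up to y_max."""
    probs = [math.exp(gp_logpmf(y, mu, alpha)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = gp_moments(mu=3.0, alpha=0.1)
print(total, mean, var)  # compare with 1, mu and mu * (1 + alpha * mu) ** 2

# With alpha = 0 the pmf coincides with the Poisson pmf exp(-mu) mu^y / y!
poisson_p4 = math.exp(-3.0) * 3.0 ** 4 / math.factorial(4)
print(math.isclose(math.exp(gp_logpmf(4, 3.0, 0.0)), poisson_p4))
```

Working in log space with `math.lgamma` avoids overflow of y! and (1 + αy)^(y−1) for large counts; far-tail terms simply underflow to zero.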
If α = 0, the GP regression model reduces to Poisson regression. If α > 0, the GP regression model represents overdispersion, and if α < 0 it represents underdispersion. The Generalized Poisson regression model has the same form as the Poisson regression model:

$$\mu_i=\exp(\mathbf{x}_i^T\boldsymbol{\beta})=\exp(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_kx_{ik})$$

2.2 Parameter Estimation of the GPR Model


The parameters of the GPR model are estimated by maximum likelihood estimation (MLE). The likelihood function of the GPR model is as follows (Putu, 2013).

$$L(\mu_i,\alpha)=\prod_{i=1}^{n}f(y_i;\mu_i,\alpha)=\prod_{i=1}^{n}\left(\frac{\mu_i}{1+\alpha\mu_i}\right)^{y_i}\prod_{i=1}^{n}\frac{(1+\alpha y_i)^{y_i-1}}{y_i!}\exp\left[-\sum_{i=1}^{n}\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}\right]$$

Taking the natural logarithm of the likelihood function gives

$$\ln L(\mu_i,\alpha)=\sum_{i=1}^{n}\left[y_i\ln\mu_i-y_i\ln(1+\alpha\mu_i)+(y_i-1)\ln(1+\alpha y_i)-\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}-\ln(y_i!)\right]$$
Substituting μ_i = exp(x_i^T β) gives

$$\ln L(\boldsymbol\beta,\alpha)=\sum_{i=1}^{n}\left[y_i\mathbf{x}_i^T\boldsymbol\beta-y_i\ln\left(1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)\right)+(y_i-1)\ln(1+\alpha y_i)-\ln(y_i!)-\frac{\exp(\mathbf{x}_i^T\boldsymbol\beta)(1+\alpha y_i)}{1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)}\right]$$
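The substituted log-likelihood can be evaluated directly as a function of β and α. The sketch below (Python with NumPy; `gp_loglik` and the six-observation data set are hypothetical, not code or data from the paper) computes it term by term and checks that with α = 0 it coincides with the Poisson log-likelihood, as stated in Section 2.1.

```python
import math
import numpy as np

def gp_loglik(beta, alpha, X, y):
    """GPR log-likelihood ln L(beta, alpha) with mu_i = exp(x_i^T beta).

    X: (n, k+1) design matrix whose first column is 1 (intercept).
    y: length-n array of non-negative counts.
    """
    y = np.asarray(y, dtype=float)
    eta = X @ beta                      # x_i^T beta
    mu = np.exp(eta)
    d = 1.0 + alpha * mu
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])   # ln(y_i!)
    terms = (y * eta                            # y_i x_i^T beta
             - y * np.log(d)                    # - y_i ln(1 + alpha mu_i)
             + (y - 1.0) * np.log1p(alpha * y)  # + (y_i - 1) ln(1 + alpha y_i)
             - log_fact                         # - ln(y_i!)
             - mu * (1.0 + alpha * y) / d)      # - mu_i (1 + alpha y_i) / (1 + alpha mu_i)
    return float(terms.sum())

# Small illustrative data set (hypothetical, not the East Java data)
X = np.column_stack([np.ones(6), [0.1, 0.4, 0.5, 0.7, 0.8, 1.0]])
y = np.array([0, 2, 1, 4, 3, 7])
beta = np.array([0.2, 1.1])

poisson_ll = float(np.sum(y * (X @ beta) - np.exp(X @ beta)
                          - [math.lgamma(v + 1.0) for v in y]))
print(math.isclose(gp_loglik(beta, 0.0, X, y), poisson_ll))
print(gp_loglik(beta, 0.2, X, y))
```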

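The log-likelihood is then differentiated with respect to β^T and set equal to zero to obtain the estimator of β. The first derivative is:

$$\frac{\partial\ln L(\boldsymbol\beta,\alpha)}{\partial\boldsymbol\beta^T}=\sum_{i=1}^{n}\left[y_i\mathbf{x}_i^T-\frac{\alpha y_i\exp(\mathbf{x}_i^T\boldsymbol\beta)\mathbf{x}_i^T}{1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)}-(1+\alpha y_i)\left(\frac{\exp(\mathbf{x}_i^T\boldsymbol\beta)\mathbf{x}_i^T}{1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)}-\frac{\alpha\left(\exp(\mathbf{x}_i^T\boldsymbol\beta)\right)^2\mathbf{x}_i^T}{\left(1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)\right)^2}\right)\right]$$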
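To obtain the estimator of α, the log-likelihood is differentiated with respect to α and set equal to zero:

$$\frac{\partial\ln L(\boldsymbol\beta,\alpha)}{\partial\alpha}=\sum_{i=1}^{n}\left[-\frac{y_i\exp(\mathbf{x}_i^T\boldsymbol\beta)}{1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)}+\frac{y_i(y_i-1)}{1+\alpha y_i}-\exp(\mathbf{x}_i^T\boldsymbol\beta)\left(\frac{y_i}{1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)}-\frac{(1+\alpha y_i)\exp(\mathbf{x}_i^T\boldsymbol\beta)}{\left(1+\alpha\exp(\mathbf{x}_i^T\boldsymbol\beta)\right)^2}\right)\right]$$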
Setting these derivatives equal to zero gives equations that are implicit in β and α and have no closed-form solution, so a numerical method, the Newton-Raphson iteration, is used to obtain the estimates.
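Collecting terms over the common denominator (1 + αμ_i)², the two derivatives above simplify (an algebraic rewriting, equivalent to the expressions above) to

$$\frac{\partial\ln L}{\partial\boldsymbol\beta}=\sum_{i=1}^{n}\frac{(y_i-\mu_i)\,\mathbf{x}_i}{(1+\alpha\mu_i)^2},\qquad\frac{\partial\ln L}{\partial\alpha}=\sum_{i=1}^{n}\left[\frac{y_i(y_i-1)}{1+\alpha y_i}-\frac{y_i\mu_i}{1+\alpha\mu_i}-\frac{\mu_i(y_i-\mu_i)}{(1+\alpha\mu_i)^2}\right]$$

so at α = 0 the β-equations reduce to the Poisson equations Σ(y_i − μ_i)x_i = 0. The Python/NumPy code below is an illustrative sketch of the Newton-Raphson iteration, not the authors' implementation. It uses this analytic score, approximates the Hessian by central differences of the score, and halves the Newton step whenever the log-likelihood would decrease. The function names, the starting values (β₀ = ln ȳ, other coefficients 0, α = 0), the tolerances and the toy counts are all assumptions.

```python
import math
import numpy as np

def gp_loglik(theta, X, y):
    """GPR log-likelihood ln L(beta, alpha); theta = (beta_0, ..., beta_k, alpha)."""
    beta, alpha = theta[:-1], theta[-1]
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        eta = X @ beta
        mu = np.exp(eta)
        d = 1.0 + alpha * mu
        terms = (y * eta - y * np.log(d) + (y - 1.0) * np.log1p(alpha * y)
                 - mu * (1.0 + alpha * y) / d)
    return float(np.sum(terms)) - sum(math.lgamma(v + 1.0) for v in y)

def gp_score(theta, X, y):
    """Analytic gradient of ln L in the simplified form shown above."""
    beta, alpha = theta[:-1], theta[-1]
    mu = np.exp(X @ beta)
    d = 1.0 + alpha * mu
    s_beta = X.T @ ((y - mu) / d**2)
    s_alpha = np.sum(y * (y - 1.0) / (1.0 + alpha * y) - y * mu / d - mu * (y - mu) / d**2)
    return np.append(s_beta, s_alpha)

def fit_gpr_newton(X, y, max_iter=200, tol=1e-10, h=1e-6):
    """Newton-Raphson with step halving; returns (theta_hat, ln L at theta_hat)."""
    y = np.asarray(y, dtype=float)
    p = X.shape[1] + 1
    theta = np.zeros(p)
    theta[0] = math.log(y.mean())             # Poisson-style start, alpha = 0
    ll = gp_loglik(theta, X, y)
    for _ in range(max_iter):
        g = gp_score(theta, X, y)
        H = np.empty((p, p))                   # numerical Hessian of ln L
        for j in range(p):
            e = np.zeros(p)
            e[j] = h
            H[:, j] = (gp_score(theta + e, X, y) - gp_score(theta - e, X, y)) / (2 * h)
        try:
            step = -np.linalg.solve(H, g)      # Newton-Raphson direction
        except np.linalg.LinAlgError:
            step = g
        if step @ g <= 0:                      # not an ascent direction: use gradient
            step = g
        t = 1.0
        while t > 1e-12:                       # halve until ln L does not decrease
            cand = theta + t * step
            ll_cand = gp_loglik(cand, X, y)
            if np.isfinite(ll_cand) and ll_cand >= ll:
                break
            t /= 2
        else:
            break                              # no acceptable step: stop
        theta, ll = cand, ll_cand
        if np.max(np.abs(t * step)) < tol:
            break
    return theta, ll

# Intercept-only example on hypothetical overdispersed counts
y = np.array([0, 0, 1, 1, 2, 3, 5, 8, 0, 12, 2, 1, 0, 6, 4])
X = np.ones((y.size, 1))
theta_hat, ll_hat = fit_gpr_newton(X, y)
print("mu_hat =", math.exp(theta_hat[0]), "alpha_hat =", theta_hat[1])
```

In the intercept-only case the β-equation forces μ̂ = ȳ exactly, which gives a simple check on the iteration. For a model like the one in Section 3.2, X would instead hold a column of ones plus the predictor columns.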


2.2 Parameter Testing of the GPR Model
The parameters of the GPR model are tested using the Maximum Likelihood Ratio Test (MLRT), as in parameter testing for Poisson regression models [1]. The simultaneous test uses the following hypotheses.
H0: β1 = β2 = ... = βk = 0
H1: at least one βj ≠ 0; j = 1, 2, ..., k
The test statistic used is as follows.

$$D(\hat{\boldsymbol\beta})=-2\ln\left(\frac{L(\hat\omega)}{L(\hat\Omega)}\right)=2\left[\ln L(\hat\Omega)-\ln L(\hat\omega)\right]$$

where L(Ω̂) is the likelihood value of the complete model involving the predictor variables and L(ω̂) is the likelihood value of the simple model without the predictor variables.
Reject H0 if D(β̂) > χ²(α; k), where α here denotes the significance level and k the number of predictor variables. If H0 is rejected, at least one β̂j ≠ 0, which shows that the corresponding xj significantly influences the model. Testing then continues with the partial test, with the hypotheses:
H0: βj = 0
H1: βj ≠ 0; j = 1, 2, ..., k
The test statistic used is

$$Z=\frac{\hat\beta_j}{SE(\hat\beta_j)}$$

H0 is rejected if |Z_calc| > Z_{α/2}, where α is the significance level used.
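Both test rules can be applied with a few lines of standard-library Python. The sketch below is illustrative: `chi2_sf` (a hypothetical helper) uses the closed-form upper-tail probabilities of the χ² distribution for integer degrees of freedom (for even df a finite Poisson-type sum, for odd df a normal tail plus a finite sum), and `statistics.NormalDist` supplies Z_{α/2}. The significance level is called `level` to avoid a clash with the dispersion parameter α. The usage lines plug in the D value and the X3 coefficient reported in Section 3.2; the standard error is only the one implied by the reported Z, not a value given in the paper.

```python
import math
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1 (closed forms)."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r              # (x/2)^r / r!
            total += term
        return math.exp(-x / 2.0) * total
    chi = math.sqrt(x)
    std_normal = NormalDist()
    tail = 2.0 * (1.0 - std_normal.cdf(chi))
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                          # chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2.0 * std_normal.pdf(chi) * total

def likelihood_ratio_test(loglik_full, loglik_null, k, level=0.05):
    """D = 2[ln L(full) - ln L(null)]; reject H0 when P(chi2_k > D) < level."""
    D = 2.0 * (loglik_full - loglik_null)
    p_value = chi2_sf(D, k)
    return D, p_value, p_value < level

def wald_z_test(beta_hat, se, level=0.05):
    """Z = beta_hat / SE; reject H0 when |Z| > Z_{level/2}."""
    z = beta_hat / se
    z_crit = NormalDist().inv_cdf(1.0 - level / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, z_crit, abs(z) > z_crit, p_value

# Simultaneous test: reported D = 369 with k = 2 predictors
print(chi2_sf(369.0, 2) < 0.05)
# Partial test for X3: coefficient 0.1468, SE implied by the reported Z = 2.23
print(wald_z_test(beta_hat=0.1468, se=0.1468 / 2.23))
```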

2.3 Selection of the Best Model


Modeling aims to obtain a relationship that describes the response variable in terms of the predictor variables. There are several methods for determining the best Generalized Poisson regression model, one of which is the Akaike Information Criterion (AIC) [3]. AIC is defined as follows:

$$AIC=-2\ln L(\hat{\boldsymbol\beta})+2k$$

where L(β̂) is the likelihood value and k is the number of parameters estimated in the model. The best Generalized Poisson regression model is the one with the smallest AIC value.
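A minimal Python sketch of the selection rule: `aic` implements the formula above, the enumeration counts the candidate subsets of seven predictors (2⁷ − 1 non-empty subsets), and the argmin picks the smallest AIC among the values reported in Table 2. Only the AIC numbers come from the paper; the helper names are illustrative.

```python
from itertools import combinations

def aic(loglik, n_params):
    """Akaike Information Criterion: -2 ln L + 2k, k = number of estimated parameters."""
    return -2.0 * loglik + 2.0 * n_params

predictors = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
subsets = [c for r in range(1, len(predictors) + 1) for c in combinations(predictors, r)]
print(len(subsets))

# AIC values of the GPR models as listed in Table 2
table2 = {
    ("X3",): 378.8,
    ("X3", "X4"): 377.0,
    ("X3", "X4", "X7"): 377.4,
    ("X3", "X4", "X5", "X7"): 377.7,
    ("X1", "X3", "X4", "X5", "X7"): 379.7,
    ("X1", "X3", "X4", "X5", "X6", "X7"): 381.0,
    tuple(predictors): 383.0,
}
best = min(table2, key=table2.get)
print(best, table2[best])
```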

3. Results and Discussion


3.1 Characteristics of the Number of Student Dropouts in East Java in 2017
The characteristics of high school dropouts in East Java can be seen in Figure 1 below.
Figure 1. Map of the Distribution of High School Dropouts

Figure 1 shows that the average number of high school dropouts falls in the range of 40-65 students. Numbers of dropouts above the average are found in Malang, Lumajang, Banyuwangi, Bondowoso, Situbondo, Probolinggo, Mojokerto, Jombang, Nganjuk, Bojonegoro, Bangkalan, and Sumenep, while the lowest numbers, below the average, are found in Pacitan, Tulungagung, Sidoarjo, Madiun, Magetan, Lamongan, Kota Blitar, Malang, Probolinggo, Pasuruan, Mojokerto, Madiun, and Surabaya. Dropouts in East Java are caused by several factors, ranging from family problems to economic problems.
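The characteristics of school dropouts and of the suspected factors are shown in Table 1 below.
Table 1 Characteristics of Dropouts and Suspected Affecting Factors
Variable                                        Mean     Std. Deviation   Minimum   Maximum
Y   Number of dropouts (students)               52.66    39.98            0         150
X1  Population density (inhabitants/km²)        938      1068             138       4051
X2  School enrollment rate (%)                  73.35    10.81            50.61     92.17
X3  Percentage of poor people (%)               11.625   4.72             4.172     23.562
X4  Expected years of schooling (years)         13.084   0.921            11.38     15.39
X5  Student-teacher ratio                       11.404   3.739            3.84      20.79
X6  Student-school ratio                        339.4    124.5            90.4      539.1
X7  Number of students repeating a grade        11.97    17.47            0         76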
Table 1 summarizes the number of students who dropped out at the senior high school level. A dropout is a student in the 16-18 age group who is no longer attending school and has not graduated from senior high school (SMA). In 2017 the number of dropouts in East Java averaged 52.66 ≈ 53 students per regency/city, with a standard deviation of 39.98 students; this large standard deviation shows great variation in dropouts across East Java. The lowest number of dropouts is in Kota Batu, with 0 students, that is, no student dropped out of school. The highest number is in Bojonegoro, with 150 students.
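The size of this standard deviation relative to the mean can be summarized by the variance-to-mean ratio, which is about 1 for Poisson counts. The short Python sketch below computes it from the Y row of Table 1; `dispersion_index` is an illustrative helper, and the ratio is only a rough descriptive check, since it ignores the predictor variables.

```python
def dispersion_index(mean, sd):
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 under overdispersion."""
    return sd ** 2 / mean

ratio = dispersion_index(mean=52.66, sd=39.98)  # Y row of Table 1
print(round(ratio, 2))  # → 30.35
```

A ratio near 30 is far above 1, which is consistent with using a model that allows overdispersion.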
The average population density (X1) of the regencies/cities of East Java in 2017 was 938 inhabitants/km², with a standard deviation of 1063 inhabitants/km², indicating considerable variation across regencies/cities. Surabaya has the highest population density, at 4051.41 inhabitants/km², while Banyuwangi has the lowest, at 138 inhabitants/km².
The school enrollment rate (X2) is the proportion of children in the age group corresponding to a given education level, here senior high school, who are in school. In 2017 the school enrollment rate in the regencies/cities of East Java averaged 73.35%, with a standard deviation of 10.81%. Lumajang has the lowest school enrollment rate, at 50.61%, and Kota Blitar the highest, at 92.17%.
The percentage of poor people (X3) measures poverty in East Java; the poor are residents whose average monthly per capita expenditure is below the poverty line. In 2017 the percentage of poor people in the regencies/cities of East Java averaged 11.625%, with a standard deviation of 4.72%. The lowest percentage is in Malang, at 4.172%, and the highest in Sampang Regency, at 23.56%.
The expected years of schooling (X4) is defined as the length of schooling (in years) that children of a certain age are expected to experience in the future. It can be used to assess the development of the education system, as indicated by the length of education expected to be reached, up to senior high school. In 2017 the expected years of schooling in East Java averaged 13.084 years, with a standard deviation of 0.921, a fairly small variation across regencies/cities. The region with the highest expected years of schooling is Malang, at 15.39 years.
The student-teacher ratio (X5) is the ratio of the number of students at a school level to the number of teachers. It describes the teaching workload and indicates the quality of teaching in the classroom. In 2017 the student-teacher ratio of senior high schools in East Java averaged 11.404, with a standard deviation of 3.789, meaning that a senior high school teacher in East Java teaches 11.404 students on average. Madiun Regency has the lowest student-teacher ratio, at 3.84, while Pasuruan has the highest, at 20.79.
The student-school ratio (X6) is the number of students divided by the number of schools at the senior high school level. In 2017 it averaged 339.4 in the regencies/cities of East Java, with a standard deviation of 124.5, meaning that there are on average 339 students per senior high school. The lowest ratio is in Pamekasan, at 90.4, and the highest in Pasuruan, at 539.1.
The number of students repeating a grade (X7) is the number of students who must repeat a grade because they do not reach the minimum mastery standard. In 2017 senior high schools in East Java had an average of 11.97 ≈ 12 students repeating a grade per regency/city, with a standard deviation of 17.47. The largest number, 76 students, is in Sumenep, while in Kota Batu and Madiun no senior high school student repeated a grade.

3.2 Generalized Poisson Regression (GPR) Model

GPR can handle overdispersion because its distribution function contains a dispersion parameter. GPR models were fitted for all possible combinations of the predictor variables and compared using Akaike's Information Criterion (AIC), where the smallest AIC indicates the best model. The 7 predictor variables give 127 combinations, and the results are summarized in Table 2 below.
Table 2 AIC Values of the GPR Models
Variables AIC
X3 378.8
X3, X4 377 *
X3, X4, X7 377.4
X3, X4, X5, X7 377.7
X1, X3, X4, X5, X7 379.7
X1, X3, X4, X5, X6, X7 381
X1, X2, X3, X4, X5, X6, X7 383
Table 2 shows that the combination of predictor variables with the smallest AIC value, 377, is the combination of the two predictor variables X3 and X4. Fitting the GPR model with these 2 predictor variables (Appendix 13) gives the following model.

$$\hat\mu=\exp(7.0658+0.1468X_3-0.3717X_4)$$
The significance of the parameters of this GPR model is then tested simultaneously and partially. The simultaneous test determines whether the predictor variables jointly affect the response variable, with the following hypotheses.
H0: β1 = β2 = 0 (no predictor variable has a significant effect on the model)
H1: at least one βj ≠ 0, j = 1, 2 (at least one variable significantly influences the model)
The value of D(β̂) is 369. At a 5% significance level, D(β̂) is greater than χ²(0.05; 2) = 5.991, so H0 is rejected, meaning that at least one variable significantly affects the model.
Next, to determine which variables significantly affect the model, the parameters are tested partially with the following hypotheses.
H0: βj = 0, j = 1, 2 (variable j does not have a significant effect on the model)
H1: βj ≠ 0, j = 1, 2 (variable j has a significant effect on the model)

Table 3 Partial Test of the GPR Parameters
Variable   Coefficient   Z         exp(β)
X3         0.1468        2.23 *    1.158
X4         -0.3717       -2.23 *   0.689
(* significant at the 5% level)
Using a 5% significance level gives Z_{0.05/2} = 1.96. Table 3 shows that |Z| for each predictor variable is greater than Z_{0.05/2}, meaning that X3 and X4 have a significant effect on the model. These variables are the percentage of poor people (X3) and the expected years of schooling (X4).
The model shows that for each additional percentage point of poor people (X3), the expected number of high school dropouts in East Java is multiplied by exp(0.1468) ≈ 1.158, an increase of about 15.8%, holding the other predictor variable constant. The poverty rate is the percentage of people whose average expenditure is below the poverty line, so the higher the poverty rate, the higher the number of dropouts. Because the number of dropouts is higher in areas with a high percentage of poor people, the government needs to reduce poverty through various policies, including creating job opportunities and facilitating access to education, for example through free education.
Likewise, for each additional year of expected schooling (X4), the expected number of dropouts is multiplied by exp(−0.3717) ≈ 0.689, a decrease of about 31%, holding the other predictor variable constant. The expected years of schooling is the length of schooling that children of a certain age are expected to experience in the future; it reflects the development of the education system in an area, so the higher the expected years of schooling, the lower the number of dropouts.
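The multiplicative reading of the coefficients can be checked with a short Python sketch. It assumes the intercept 7.0658 of the fitted model and the slopes reported in Table 3 (0.1468 for X3, −0.3717 for X4); `predicted_dropouts` is an illustrative helper, not from the paper. As a rough plausibility check, the prediction at the Table 1 means of X3 and X4 can be compared with the observed mean of 52.66 dropouts.

```python
import math

# Reported estimates: intercept of the fitted model, slopes from Table 3
B0, B_X3, B_X4 = 7.0658, 0.1468, -0.3717

def predicted_dropouts(x3, x4):
    """Expected number of dropouts, mu_hat = exp(B0 + B_X3*X3 + B_X4*X4)."""
    return math.exp(B0 + B_X3 * x3 + B_X4 * x4)

# Rate ratios: multiplicative change in mu_hat per one-unit increase
rr_x3 = math.exp(B_X3)   # per percentage point of poor people
rr_x4 = math.exp(B_X4)   # per additional expected year of schooling
print(round(rr_x3, 3), round(rr_x4, 3))

# Prediction at the Table 1 means of X3 (11.625 %) and X4 (13.084 years)
print(round(predicted_dropouts(11.625, 13.084), 1))
```

Because the model is log-linear, the ratio of two predictions that differ by one unit in a single predictor equals that predictor's rate ratio, whatever the value of the other predictor.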
The model yields an estimated dispersion parameter α̂ = 0.09705. Because α̂ > 0, the data are overdispersed; the GPR model accounts for this through the dispersion parameter, so the significance tests are not distorted by overdispersion as they would be under Poisson regression.
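A small value of α̂ does not by itself mean little overdispersion, because the GP variance μ(1 + αμ)² grows with μ. The Python sketch below plugs α̂ into that variance formula at μ = 52.66, the mean count in Table 1, which is used here only as an illustrative value of μ. `gp_variance` is an illustrative helper. The resulting inflation factor is of the same order as the variance-to-mean ratio of about 30 implied by Table 1's mean and standard deviation (39.98² / 52.66).

```python
def gp_variance(mu, alpha):
    """Variance of the GP model, mu * (1 + alpha * mu) ** 2 (Section 2.1)."""
    return mu * (1.0 + alpha * mu) ** 2

alpha_hat = 0.09705   # estimated dispersion parameter reported above
mu = 52.66            # mean number of dropouts per regency/city (Table 1)
inflation = gp_variance(mu, alpha_hat) / mu   # Poisson regression would give 1
print(round(inflation, 1))  # → 37.3
```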

4. Conclusion

Based on the analysis and discussion above, the best model obtained with the Generalized Poisson Regression method is

$$\hat\mu=\exp(7.0658+0.1468X_3-0.3717X_4)$$

The factors that influence the number of high school dropouts in East Java in 2017, according to Generalized Poisson Regression, are the percentage of poor people (X3) and the expected years of schooling (X4). If the percentage of poor people increases, the number of dropouts increases, and if the expected years of schooling increases, the number of dropouts decreases.

5. References

[1] Agresti, A. (2002). Categorical Data Analysis, Second Edition. New York: John Wiley and Sons.
[2] Astari, G. A. (2013). Paper: Modeling Number of School Children in Bali with Semi-Parametric Approach Geographically Weighted Poisson Regression. Bali: Udayana University, Department of Mathematics.
[3] Bozdogan, H. (2000). Akaike's Information Criterion and Recent Developments in Information Complexity. Journal of Mathematical Psychology, 44, 62-91.
[4] Cameron, A. C., & Trivedi, P. K. (1998). Regression Analysis of Count Data. New York: Cambridge University Press.
[5] Famoye, F., Wulu, J. T., & Singh, K. P. (2004). On the Generalized Poisson Regression Model with an Application to Accident Data. Journal of Data Science, 2, 287-295.
[6] Kemendikbud. (2017). High School Education Statistics. Jakarta: Kemendikbud.
[7] McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models, Second Edition. London: Chapman & Hall.
[8] Nasra, N. (2017). Modeling Attrition for Childhood Compulsory in the Province of South Sulawesi. Makassar: Makassar State University, Mathematics, FMIPA.
[9] Putu, I. (2013). Application of Generalized Poisson Regression to Overcome the Overdispersion Phenomenon in Poisson Regression. Bukit Jimbaran: Universitas Udayana.
