Nat Hazards (2015) 79:291–316 DOI 10.1007/s11069-015-1842-3
ORIGINAL PAPER 
Comparative performance of six supervised learning methods for the development of models of hard rock pillar stability prediction
Jian Zhou^{1,2} · Xibing Li^{1} · Hani S. Mitri^{2,3}
Received: 16 May 2014 / Accepted: 30 May 2015 / Published online: 10 June 2015
© Springer Science+Business Media Dordrecht 2015
Abstract The prediction of pillar stability (PS) in hard rock mines is a crucial task for which many techniques and methods have been proposed in the literature, including machine learning classification. In order to make the best use of the large variety of statistical and machine learning classification methods available, it is necessary to assess their performance before selecting a classifier and suggesting improvements. The objective of this paper is to compare different classification techniques for PS detection in hard rock mines. The data of this study consist of six features, namely pillar width, pillar height, the ratio of pillar width to its height, uniaxial compressive strength of the rock, pillar strength, and pillar stress. A total of 251 pillar cases between 1972 and 2011 are analyzed. Six supervised learning algorithms, including linear discriminant analysis, multinomial logistic regression, multilayer perceptron neural networks, support vector machine (SVM), random forest (RF), and gradient boosting machine, are evaluated for their ability to learn PS based on different input parameter combinations. In this study, the available data set is randomly split into two parts: a training set (70 %) and a test set (30 %). A repeated tenfold cross-validation procedure (ten repeats) is applied to determine the optimal parameter values during modeling, and an external testing set is employed to validate the prediction performance of the models. Two performance measures, namely classification accuracy rate and Cohen's kappa, are employed. The analysis of the accuracy together with kappa for the
✉ Jian Zhou csujzhou@hotmail.com
Xibing Li
xbli@mail.csu.edu.cn
Hani S. Mitri hani.mitri@mcgill.ca
^{1} School of Resources and Safety Engineering, Central South University, Changsha 410083, China
^{2} Department of Mining and Materials Engineering, McGill University, Montreal, QC H3A 2A7, Canada
^{3} School of Civil Engineering, Henan Polytechnic University, Jiaozuo 454150, China
PS data set demonstrates that SVM and RF achieve comparable median classification accuracy rates and Cohen's kappa values. All models are fitted with "R" programs using the libraries and functions described in this study.
Keywords Pillar stability · Pillar design · Hard rock mine · Supervised learning · Classification · Repeated cross-validation · R system
1 Introduction
Underground mining almost invariably involves leaving portions of the ore in the form of pillars, which act as key structural columns (Brady and Brown 2003; Deng et al. 2003; Zhou et al. 2011). Pillar stability is an essential prerequisite for safe working conditions in room-and-pillar mines (Salamon 1970; Ghasemi et al. 2014a). Unstable pillars can result in rock sloughing from the pillar and can lead to collapse of the roof if one or more pillars fail (Mortazavi et al. 2009). As mining goes deeper, pillar failure becomes more frequent and more critical owing to the increase in ambient stresses. Consequently, pillar design and stability are among the most complicated and important problems in mining related to rock mechanics and ground control. Because of their significance for the safe and economical extraction of underground ores, a great deal of valuable work on pillar design and layout in rock has been reported over the past decades. Various researchers have proposed empirical design methods for pillar strength determination that are often applied in practice and that have been reviewed and summarized in the literature (Hustrulid 1976; Lunder 1994; Brady and Brown 2003; Mark 2006; Mitri 2007; Jawed et al. 2013), i.e., the linear shape effect formula (Bieniawski and van Heerden 1975; York 1998), the power shape effect formula (Salamon and Munro 1967; Bieniawski 1968; Hedley and Grant 1972), the size effect formula (Hustrulid 1976), the Hoek–Brown formula (Hoek and Brown 1980), and the analysis of retreat mining pillar stability method (Mark and Barton 1997; Ghasemi and Shahriar 2012). In an underground pillar design, it is difficult to determine the actual stress that will be acting on a pillar.
However, the three main methods of calculating pillar stress are tributary area theory, numerical modeling (Lunder 1994), and the neural network method (Monjezi et al. 2011). The stability of a pillar can then be evaluated by calculating a safety factor (SF), which is the ratio of the average strength to the average stress in the pillar (Zhou et al. 2011). Theoretically, an SF value greater than 1 means that the pillar is stable, while an SF value lower than 1 means it is unstable. More often than not, these methods are questionable because pillar failures did occur even though the failed pillars had been considered stable, i.e., SF > 1 (Deng et al. 2003; Zhou et al. 2011). Moreover, empirical methods are based on the interpretation of available databases, which are compiled from ongoing or completed projects; it is therefore difficult to generalize the obtained results beyond the scope of the original site characteristics. Meanwhile, considerable work related to the prediction of PS has been undertaken by means of numerical simulation methods that allow for the consideration of complex boundary conditions and material behavior. For example, a design methodology was proposed by York (1998) using the fast Lagrangian analysis of continua (FLAC) code to enable the yield point of the foundation of deep-level stabilizing pillars to be predicted in terms of the
cohesion, friction angle, and depth. Hutchinson et al. (2002) recommended the use of simulation methods for crown pillar stability risk assessment in mine planning. Jaiswal et al. (2004) used the three-dimensional boundary element method (BEM) to model asymmetry in the induced stresses over coal mine pillars with complex geometries, enabling the successful simulation of mining conditions. Griffiths et al. (2007) combined random field theory with an elastoplastic finite element method (FEM) algorithm in a Monte Carlo framework to estimate the stability of pillars. Using the explicit finite difference program FLAC3D, Li et al. (2007) established a numerical model for a deep mining pillar with dynamic disturbance under high stress. Numerical modeling was carried out by Jaiswal and Shrivastva (2009) using a three-dimensional FEM code to study the stress–strain behavior of coal pillars. Mortazavi et al. (2009) delved into the mechanisms involved in pillar failure and investigated the nonlinear behavior of rock pillars within the FLAC model. Elmo and Stead (2010) investigated the use of the hybrid FEM/DEM code ELFEN in studying the failure modes of jointed pillars. Recently, Li et al. (2013) established 3D numerical models based on FLAC3D to determine the minimum thickness of the crown pillar for a subsea gold mine. Each of the numerical methods has its advantages and disadvantages; however, the estimation of reliable values of the model input parameters remains a difficult task. Besides the numerical modeling approach, statistical and analytical methods, probabilistic methods, and artificial intelligence-based methods or their hybrids have been investigated in recent years and successfully used for designing pillars in coal or hard rock.
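The conventional safety-factor check described above can be sketched in a few lines of base R; the strength and stress values used here are hypothetical, for illustration only:

```r
# Safety-factor check: SF = pillar strength / pillar stress; SF > 1 is
# conventionally read as stable, although failed pillars with SF > 1
# have been reported. Strength/stress values are hypothetical.
sf_class <- function(strength_mpa, stress_mpa) {
  sf <- strength_mpa / stress_mpa
  ifelse(sf > 1, "stable", "unstable")
}

sf_class(95, 60)  # SF ~ 1.58 -> "stable"
sf_class(40, 55)  # SF ~ 0.73 -> "unstable"
```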
Esterhuizen (1993) showed that variability in rock mass properties and mining factors could be taken into consideration for hard rock pillar design by statistical methods and the point estimate method. Griffiths et al. (2002) and Cauvin et al. (2009) investigated underground pillar stability based on probabilistic methods. Ghasemi et al. (2010) studied the effect of variability in parameters such as the uniaxial compressive strength of the coal specimen, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a Monte Carlo simulation. Zhou et al. (2011) presented two models for predicting pillar stability, applying support vector machine and Fisher discriminant analysis techniques. Wattimena et al. (2013) developed a logistic regression model for predicting the probability of stability of a coal pillar. On the other hand, different types of artificial neural networks, based on combining different learning techniques such as hybrid or ensemble techniques, have been reported for pillar stability analysis in recent years. Deng et al. (2003) proposed a pillar design based on Monte Carlo simulation combining finite element methods, neural networks, and reliability analysis. Four ANNs based on two different architectures, the multilayer perceptron (MLP) and the radial basis function (RBF), were constructed by Tawadrous and Katsabanis (2007) to predict the stability of surface crown pillars. Monjezi et al. (2011) developed an MLP neural network methodology to predict the pillar stress concentration in the bord-and-pillar method and compared the results with a BEM numerical solution. Recently, Ghasemi et al. (2014a, b) developed two models for the evaluation and prediction of global stability in room-and-pillar coal mines under retreat mining conditions by employing logistic regression and fuzzy logic techniques. In these studies, all the data are separated into training and testing sets.
However, a cross-validation process is not implemented, and thus the accuracy of the predictive models is not fully understood. Hence, the issue of pillar stability prediction still poses a considerable challenge for underground mines. Supervised learning (SL) has become steadily more mathematical and more successful in applications over the past 20 years. The use of SL algorithms for the development of predictive and descriptive data mining models has become widely accepted in mining and
geotechnical applications, promising powerful new tools for practicing engineers (Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Tesfamariam and Liu 2010; Zhou et al. 2011, 2012, 2013; González-Rufino et al. 2013; Liu et al. 2013, 2014). Numerous approaches for PS prediction have been developed based on different SL techniques during recent decades (Tawadrous and Katsabanis 2007; Zhou et al. 2011; Wattimena et al. 2013; Ghasemi et al. 2014a, b). However, no comparison of SL techniques for PS estimation has been reported. Based on these considerations, the main objective of this study is to investigate the suitability of different SL algorithms for the prediction of pillar stability (PS) in underground engineering. To achieve this goal, a research methodology is developed for comparing the performance of different SL algorithms, including linear discriminant analysis (LDA), multinomial logistic regression (MLR), multilayer perceptron neural networks (MLPNN), support vector machine (SVM), random forest (RF), and gradient boosting machine (GBM). These methods are specifically chosen because they are being increasingly used in engineering, yet have not been compared with one another exhaustively, and because open-source implementations are available. The rest of this paper is organized as follows: Sect. 2 briefly presents the hard rock pillar data set and provides an overview of the SL techniques. In Sect. 3, these methods are applied to PS prediction, and in Sect. 4, the results are discussed in terms of the performance criteria. Finally, conclusions are provided in Sect. 5.
2 Materials and methods
2.1 Data sources and parameters
To measure the performance of the developed SL approaches, the data utilized in this study comprise 251 pillar case histories collected from more than 10 published research works. The sources are reliable and include references published over the period 1972–2011. Field data are obtained from ten different databases of hard rock mines: (1) Elliot Lake uranium mines in Canada (Hedley and Grant 1972); (2) Selebi-Phikwe mines in South Africa (Von Kimmelman et al. 1984); (3) open stope mines in Canada (Hudyma 1988); (4) Zinkgruvan mine in Sweden (Sjoberg 1992); (5) Westmin Resources Ltd.'s H-W mine in Canada (Lunder 1994); (6) Dawenkou gypsum mine in China (Liu and Zhai 2000); (7) Shizishan copper mine in China (Zheng 2002); (8) a marble mine in Spain (González-Nicieza et al. 2006); (9) Gaoshan gypsum mine in China (Sheng et al. 2010); and (10) stone mines in the USA (Esterhuizen et al. 2011). The general database is a continuation of the existing databases developed by Lunder (1994) and Zhou et al. (2011). Additional projects are added to the original sets from other available sources found in the literature, and an effort is also made to complete missing data fields within the pillar database by checking many sources and the published literature.
2.2 Data visualization
Figure 1a shows the number of cases used in this study after 1972 from different countries. The distribution of PS data is shown in Fig. 1b as a pie chart illustrating the proportion of the three types of PS in hard rock mines, categorized as stable (S, 108 cases), unstable (U, 54 cases), and failed (F, 89 cases). The boxplot of the original data set is given in Fig. 2.
Fig. 1 Distribution of observed hard rock pillar events
Fig. 2 Boxplot of each variable for the three conditions of PS
The circles represent outliers (observations greater than the third quartile plus 1.5 times the interquartile range or smaller than the first quartile minus 1.5 times the interquartile range). For most of the data groups, the median is not in the center of the box, which indicates that the distributions of most data groups are not symmetric (Fig. 2). In addition, all dependent variables have some outliers except pillar stress and pillar strength for all PS types, UCS for the S and U types, and pillar width for the U type. As shown in Fig. 3, the scatterplot matrix in the lower panel describes the pairwise relationships between parameters, with the corresponding correlation coefficients shown in the upper panel, whereas the marginal frequency distribution of each parameter is shown on the diagonal. It can be seen that the parameter UCS is notably correlated with pillar strength and pillar stress and that pillar height is notably correlated with pillar width.
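The outlier rule encoded by the boxplot circles can be reproduced in base R; the data below are a synthetic stand-in for any of the six variables:

```r
# Flag observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], the same rule
# the boxplot circles in Fig. 2 encode.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

set.seed(1)
x <- c(rnorm(50, mean = 100, sd = 5), 150)  # one planted outlier
iqr_outliers(x)
```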
2.3 Selection of input and output variables
As mentioned above, Zhou et al. (2011) developed their models based on ﬁve parameters including pillar width, pillar height, the ratio of the pillar width to its height, the uniaxial compressive strength of the rock, and pillar stress. Wattimena et al. (2013) applied their model for predicting coal pillar stability for given geometry (pillar width to height ratio) and stress conditions (pillar strength to stress ratio). Ghasemi et al. (2010) investigated the effect of variability in parameters such as uniaxial compressive strength of coal specimen, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a
Fig. 3 PS parameter interaction matrix. Pairwise relationships are in the lower triangle; correlation coefﬁcients are in the upper triangle; marginal distributions are on the diagonal
Monte Carlo simulation. Ghasemi et al. (2014a, b) used the major parameters contributing to pillar stability, such as mining height, depth of cover, entry width, panel width, pillar width, pillar length, crosscut angle, roof strength rating, and loading condition. Moreover, the UCS of the intact rock is used because it is an index that can be utilized in a simple way without carrying out a pillar strength estimation (Wattimena 2014). Pillar shape is expressed by the ratio of pillar width to pillar height (K), which accounts for the increased strength provided by the shape and confinement of the pillar. As noted above, various researchers have determined the strength of hard rock pillars from empirical formulas based on pillar width, pillar height, and UCS (Hedley and Grant 1972; Lunder 1994; Martin and Maybee 2000; Esterhuizen et al. 2011). On the other hand, the pillar stability graph typically involves two parameters: (1) the pillar stress to UCS ratio and the pillar width to height ratio K (Lunder and Pakalnis 1994; Martin and Maybee 2000; Wattimena et al. 2014) or (2) the pillar strength to stress ratio (SF) and K (Wattimena et al. 2013). Based on these considerations, six relevant indicators are adopted in this study. These are as follows:
pillar width, pillar height, K, the uniaxial compressive strength (UCS) of the rock, pillar strength, and pillar stress. These indicators are recognized as the major parameters for quantitatively characterizing pillar behavior. Theoretically, there may be additional indicators, but collecting the corresponding data poses a massive challenge to their applicability. Hence, the six indicators above are adopted, and combinations of these indicators are investigated to discover the effects of varying the inputs in the current study. In terms of pillar shape, pillar strength, and pillar load, a number of experiments are performed using different combinations of input parameters to assess the performance of the SL methods for PS, as listed in Table 1. Numerous scholars have proposed a variety of PS classification schemes. Across the cases in the combined database, pillar stability assessments range from a simple "Stable/Failed" assessment to more rigorous approaches based upon five- or six-stage stability classification methods (Hedley and Grant 1972; Von Kimmelman et al. 1984; Lunder 1994). Following a review of the combined database and the suggestion of Lunder (1994), the PS is classified into three stages that provide adequate resolution for the combined database, i.e., stable (S), unstable (U), and failed (F), as depicted in Table 2.
Table 1 Different models for PS prediction with different input parameters

Models A–D are each built from a different combination of the six input parameters: pillar height, pillar width, K, UCS, pillar strength, and pillar stress.
Table 2 Description of hard rock pillar stability classification for the combined database (modified from Lunder 1994)

Failed: Crushed (Hedley and Grant 1972); severe spalling, pronounced opening of joints, deformation of drill holes (Von Kimmelman et al. 1984); disintegration of pillar, blocks falling out, fractures through pillar with fracture apertures wider than 10 mm (Lunder 1994); pillars with manifested failure (González-Nicieza et al. 2006)

Unstable: Partially failed (Hedley and Grant 1972); prominent spalling (Von Kimmelman et al. 1984); fractures also appear on the central parts of the pillar (Krauland and Soder 1987); sloughing (Hudyma 1988); showing one or more of the following signs: cracking and spalling in development and raises within the rib pillar, audible noise in the pillar, deformed drill holes, excess muck being pulled from stopes, cracking of pillars, major displacements within the pillar (Potvin et al. 1989); pre-failure, severe spalling (Sjoberg 1992); corner breaking only, up to fracturing in pillar walls with fracture aperture up to 10 mm (Lunder 1994); stable pillars with incipient damage (González-Nicieza et al. 2006)

Stable: Minor spalling, no joint opening (Von Kimmelman et al. 1984); no sign of stress-induced fracturing (Lunder 1994)
2.4 Supervised learning classiﬁcation methods
As noted by Hastie et al. (2009), machine learning problems can be roughly categorized as either unsupervised learning or supervised learning. From a theoretical point of view, supervised and unsupervised learning differ only in the causal structure of the model (Kordon 2010). In unsupervised learning, all observations are assumed to be caused by latent variables; there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures. In supervised learning, the model defines the effect that one set of observations (the inputs) has on another, fully labeled set of observations (the outputs), and the goal is to predict the value of an outcome measure based on a number of input measures. Six classification techniques are considered in the current study. They share characteristics that make them interesting for the current analysis: (a) all of these methods are being increasingly used; (b) some of them have been used in PS classification tasks with good results and are known to enable the analysis of more complex, nonlinear relationships;
(c) they have efficient implementations; and (d) the resulting models allow for fast classification processing. The following six methods are compared with respect to their predictive performance. In this study, the feature vector X consists of six PS performance modifiers {X_1, X_2, X_3, X_4, X_5, X_6}, which correspond to the variables discussed in Sect. 3. The set of all feature vectors is denoted as H. Three PS states are defined, i.e., stable (S), unstable (U), and failed (F), as Z_i (i = 1, 2, 3), and each class Z_i is associated with a discriminant function f_i(x). Several published articles compare multiple SL techniques (e.g., Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Zhou et al. 2011; González-Rufino et al. 2013). Based on these articles and the focus herein on PS classification, only a brief description of each classification technique is presented; for a more in-depth discussion, the reader is referred to the relevant references.
2.4.1 Linear discriminant analysis (LDA)
LDA, proposed by Fisher (1936), is derived from the Bayes rule under the assumption that the patterns belonging to class $Z$ follow a normal (Gaussian) distribution with mean $\mu_Z$ and a nonsingular covariance matrix $\Sigma$ common to all classes (Zhou et al. 2010, 2011; González-Rufino et al. 2013). Under these hypotheses, the Bayes rule assigns a test pattern $x$ to the class $Z$ with the highest posterior probability $w(Z|x)$, given by

$$\log[w(Z|x)] = \log p_Z - \frac{1}{2}\left[\log|\Sigma| + (x - \mu_Z)^{T}\Sigma^{-1}(x - \mu_Z)\right] \qquad (1)$$

where $|\Sigma|$ is the determinant of $\Sigma$, or, equivalently, to the class $Z$ that maximizes the linear function

$$L_Z(x) = 2\log p_Z - \mu_Z^{T}\Sigma^{-1}\mu_Z + 2\mu_Z^{T}\Sigma^{-1}x \qquad (2)$$

The matrix $\Sigma$ is approximated by the within-class covariance matrix

$$W = (X - QM)^{T}(X - QM)/(N - M) \qquad (3)$$

where $X$ is the $N \times n$ training-set matrix, $Q$ is the $N \times M$ matrix of class indicators, and $M$ is the $M \times n$ matrix of class means; $n$, $N$, and $M$ denote the number of inputs, training patterns, and classes, respectively.
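A minimal sketch of fitting LDA with the lda function from the MASS package listed in Table 3, using R's built-in three-class iris data as a stand-in for the PS database:

```r
library(MASS)  # ships with R; provides lda()

# Fit LDA on a built-in three-class data set as a stand-in for the
# three PS classes; priors default to the class proportions.
fit <- lda(Species ~ ., data = iris)

# Posterior probabilities w(Z|x) and the maximum-posterior class:
pred <- predict(fit, iris)
head(pred$posterior)
mean(pred$class == iris$Species)  # resubstitution accuracy
```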
2.4.2 Multinomial logistic regression (MLR)
MLR is found to work well for multiclass classification (Sadat-Hashemi et al. 2005; Krishnapuram et al. 2005; Pandya et al. 2014). Let $x$ and $y$ be the matrices of predictors and the response with $Z$ levels, respectively. If $w_i = \Pr(y = i)$, $i = 1, 2, \ldots, Z$, then, considering a multinomial distribution for $y$, we have $\sum_{i=1}^{Z} w_i = 1$, so the $w_i$ can be predicted with an MLR consisting of $(Z-1)$ equations on $(Z-1)$ dummy variables $y_i$:

$$y_i = \begin{cases}1 & \text{if } y = i\\ 0 & \text{otherwise}\end{cases}, \qquad \ln\frac{w_i}{w_Z} = X_i\beta_i, \quad i = 1, 2, \ldots, Z-1 \qquad (4)$$

where $\beta_i$ is the regression coefficient for the $i$th equation and $w_i$ is the probability of obtaining the $i$th outcome. We can compute $w_1, \ldots, w_{Z-1}$ and $w_Z$ for each subject using these $(Z-1)$ equations. Each case is then allocated to the $j$th category of $y$ if $w_j = \max(w_1, w_2, \ldots, w_Z)$.
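A minimal sketch with the multinom function from the nnet package listed in Table 3, again using iris as a stand-in for the PS data:

```r
library(nnet)  # recommended package; provides multinom()

# Multinomial logistic regression: (Z - 1) logit equations against a
# baseline class, as in Eq. (4).
fit <- multinom(Species ~ ., data = iris, trace = FALSE)

# Class probabilities w_1 ... w_Z for each case; rows sum to 1.
w <- predict(fit, iris, type = "probs")

# Each case is allocated to the class with the largest probability.
cls <- predict(fit, iris)
mean(cls == iris$Species)  # resubstitution accuracy
```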
2.4.3 Random forest (RF)
The RF technique is based on the use of a large series of low-dimensional regression trees. Its theoretical development is presented by Breiman (2001), and the RF algorithm used here is implemented by Liaw and Wiener (2002). The main idea of the RF algorithm is to reduce the correlation among the trees, and thus improve the variance reduction of bagging, by growing trees that perform random selection on the input variables.

RF is very user-friendly in the sense that it has only two parameters, the number of variables in the random subset at each node (mtry) and the number of trees in the forest (ntree), and it is usually not very sensitive to their values (Kuhn and Johnson 2013).
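The de-correlation idea can be sketched with the recommended rpart package: bagged classification trees, each grown on a bootstrap sample with a random subset of mtry predictors. This simplifies RF's per-node variable selection to per-tree selection; the randomForest package in Table 3 is the implementation actually used in the paper.

```r
library(rpart)  # recommended package; single CART trees

# A deliberately tiny "random forest" sketch: bootstrap + random
# variable subsets de-correlate the trees, majority vote aggregates.
tiny_rf <- function(y_name, data, ntree = 25, mtry = 2) {
  preds <- setdiff(names(data), y_name)
  lapply(seq_len(ntree), function(b) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    vars <- sample(preds, mtry)  # random variable subset for this tree
    rpart(reformulate(vars, y_name), data = boot, method = "class")
  })
}

tiny_rf_predict <- function(trees, newdata) {
  votes <- sapply(trees, function(t)
    as.character(predict(t, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))  # majority vote
}

set.seed(42)
forest <- tiny_rf("Species", iris)
mean(tiny_rf_predict(forest, iris) == iris$Species)
```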
2.4.4 Artiﬁcial neural network (ANN)
ANN is a computational paradigm that provides a great variety of nonlinear mathematical models, useful for tackling different statistical problems (Haykin 1998; Pino-Mejías et al. 2010). In this work, the multilayer perceptron neural network (MLPNN) is employed (Pino-Mejías et al. 2010). Defining $G = (G_1, \ldots, G_M)$ as the vector of all $M$ coefficients of the net, and given $n$ $q$-sized target vectors $y_1, \ldots, y_n$, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) procedure (Bishop 1995) can be applied to the following nonlinear least-squares problem (Pino-Mejías et al. 2010; Kuhn and Johnson 2013):

$$\min_{G}\; g\sum_{i=1}^{M} G_i^{2} + \sum_{i=1}^{n} \left\| y_i - \hat{y}_i \right\|^{2} \qquad (5)$$

where $\hat{y}_i$ denotes the network output for the $i$th pattern. The R implementation of an MLPNN model requires the specification of two parameters: the size of the hidden layer ($H$) and the decay parameter ($g$); it must be remarked that greater values of $H$ could not be attempted owing to the limited memory resources of our personal computers (Hastie et al. 2009; González-Rufino et al. 2013).
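A minimal nnet fit showing the two tuning parameters (size = H, decay = g), again with the built-in iris data standing in for the PS database:

```r
library(nnet)  # recommended package; single-hidden-layer perceptron

# MLPNN with hidden-layer size H and weight-decay penalty g, the two
# tuning parameters discussed above.
set.seed(7)
fit <- nnet(Species ~ ., data = iris,
            size = 5,      # H: number of hidden neurons
            decay = 0.01,  # g: weight-decay penalty of Eq. (5)
            maxit = 200, trace = FALSE)

cls <- predict(fit, iris, type = "class")
mean(cls == iris$Species)  # resubstitution accuracy
```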
2.4.5 Support vector machine (SVM)
SVM is among the most recent significant developments in the field of discriminatory analysis (Vapnik 1995). In the case of non-separable data, the "ideal boundary" must be adapted to tolerate errors $\xi_i$ for some objects $i$:

$$\text{minimize } \frac{1}{2}\|d\|^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{subject to } y_i(b + d\cdot x_i) + \xi_i \geq 1 \text{ and } \xi_i \geq 0 \qquad (6)$$

where $C$ is the penalty parameter, $d$ and $b$ are, respectively, the normal vector and the bias of the hyperplane, and each $\xi_i$ corresponds to the distance between object $i$ and the corresponding margin hyperplane (Cortes and Vapnik 1995; Devos et al. 2009; Zhou et al. 2011, 2012, 2013).

To learn nonlinearly separable functions, the data are implicitly mapped to a higher-dimensional space by means of Mercer kernels, which can be decomposed into a dot product, $K(x_i, x_j) = \varphi(x_i)\cdot\varphi(x_j)$ (Zhou et al. 2012). The extensively used radial basis function kernel is given by

$$K(x_i, x_j) = \exp\left(-\sigma\left\| x_i - x_j \right\|^{2}\right) \qquad (7)$$

where $\sigma$ is the kernel parameter.
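The radial basis function kernel of Eq. (7) is easy to evaluate directly in base R; the sample matrix below is synthetic:

```r
# RBF kernel of Eq. (7): K(xi, xj) = exp(-sigma * ||xi - xj||^2).
rbf_kernel <- function(xi, xj, sigma = 1) {
  exp(-sigma * sum((xi - xj)^2))
}

rbf_kernel(c(1, 2), c(1, 2))  # identical points give exactly 1
rbf_kernel(c(1, 2), c(4, 6))  # distant points decay toward 0

# Kernel (Gram) matrix for a small synthetic sample:
set.seed(5)
x <- matrix(rnorm(12), nrow = 4)
K <- outer(seq_len(4), seq_len(4),
           Vectorize(function(i, j) rbf_kernel(x[i, ], x[j, ], sigma = 0.5)))
```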
2.4.6 Gradient boosting machine (GBM)
Friedman (2001) proposed the gradient boosting machine (GBM) using the connection between boosting and optimization. GBM builds the model in a stagewise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function (Guelman 2012; Kuhn and Johnson 2013). Using this technique, function approximation is viewed from the perspective of numerical optimization in the function space, rather than in the parameter space.
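The stagewise idea can be sketched in base R for the squared-error loss, using small rpart trees as base learners; this is an illustrative sketch on synthetic data only, and the paper's models use the gbm package listed in Table 3:

```r
library(rpart)

# Stagewise gradient boosting for L2 loss: each stage fits a small tree
# to the current residuals (the negative gradient) and adds a shrunken
# copy of its prediction to the running model.
set.seed(3)
x <- data.frame(x1 = runif(200), x2 = runif(200))
y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(200, sd = 0.1)

ntree <- 100; shrinkage <- 0.1        # shrinkage v as in Table 3
f_hat <- rep(mean(y), 200)            # initial constant model
for (b in seq_len(ntree)) {
  resid_b <- y - f_hat                # negative gradient of the L2 loss
  tree_b <- rpart(resid_b ~ x1 + x2, data = x,
                  control = rpart.control(maxdepth = 2))
  f_hat <- f_hat + shrinkage * predict(tree_b, x)
}
mean((y - f_hat)^2)  # training error shrinks as stages are added
```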
2.5 R Software
R (R Development Core Team 2013) is a popular open-source software environment for statistical computing and data visualization available for most mainstream platforms (http://www.R-project.org/). All data processing is performed using R software (version 3.0.2). R provides the most common SL classification methods; herein, we list only some of them. Further details about input parameters, implementation, and references can be found in the package documentation manuals. The packages required for each method and the functions utilized to build the models are summarized in Table 3.
3 Pillar stability assessment model development
3.1 Preparing training and testing data sets
In supervised learning, the performance of a classifier needs to be assessed on a given data set before it is used to predict the class of a new project. To do so, the original PS data set with known classes is randomly divided into two subsets: a training set and a test set. The training set is required to estimate model parameters and construct each PS classifier model. In this study, approximately 70 % of the available data (177 cases of the database) are used as the training data set. The test set serves as an external validation set for testing the performance and predictive power of each final model; here, the reserved 74 cases are used as the testing data set.
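The 70/30 split can be reproduced in base R; ps_data below is a hypothetical placeholder for the actual database:

```r
# Random split of the 251 pillar cases into a training set (177 cases,
# ~70 %) and a testing set (74 cases, ~30 %).
set.seed(123)
n <- 251
train_idx <- sample(n, size = 177)
ps_data <- data.frame(case = seq_len(n))  # placeholder for the real columns
train_set <- ps_data[train_idx, , drop = FALSE]
test_set  <- ps_data[-train_idx, , drop = FALSE]
c(train = nrow(train_set), test = nrow(test_set))  # 177 and 74
```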
To guarantee comparability, all classifiers are generated using the same training set and are validated by applying them to the same test data. Utilizing these classifiers, the classes of the
Table 3 R packages and functions used to run methods described in Sect. 3

Method   R package       R function      Tuning parameters   References
LDA      MASS            lda             None                Fisher (1936), Venables and Ripley (2002)
MLR      nnet            multinom        {g}                 Venables and Ripley (2002), Ripley (2009)
ANN      nnet            nnet            {L, g}              Haykin (1998), Pino-Mejías et al. (2010)
SVM      kernlab         svmRadial       {C, σ}              Vapnik (1995), Karatzoglou et al. (2004)
RF       randomForest    randomForest    {ntree, mtry}       Breiman (2001), Liaw and Wiener (2002)
GBM      gbm             gbm             {ntree, ν, J}       Friedman (2001), Ridgeway (2007)

Default function parameters are considered, optimized by tenfold CV whenever possible; mtry = number of variables to choose the best split; ntree = number of trees; L = number of hidden neurons in the MLP; g = decay parameter; C = penalization error coefficient; σ = width of the radial basis in the SVM function; ν = shrinkage; J = interaction.depth
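The repeated tenfold cross-validation indices mentioned in the table note can be constructed in base R as follows; packages such as caret automate this, and the seed here is illustrative:

```r
# Index construction for repeated tenfold cross-validation (10 folds,
# 10 repeats) over the 177 training cases.
make_folds <- function(n, k = 10) split(sample(n), rep_len(seq_len(k), n))

set.seed(2015)
repeats <- replicate(10, make_folds(177, 10), simplify = FALSE)

# Each repeat partitions all 177 cases into 10 disjoint validation folds:
lengths(repeats[[1]])
```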
PS are predicted and subsequently the misclassiﬁcation error rate is calculated with the
preferred classiﬁer being the one with the lowest misclassiﬁcation error rate. Scaling of the
input–output data is generally required prior to processing. All input variables are scaled
with function preProcess, which can be used to impute data sets based only on information
in the training set (Kuhn 2008).
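A base R sketch of this training-set-only scaling, which is what caret's preProcess does for its "center" and "scale" methods; the matrices below are hypothetical stand-ins for the PS inputs:

```r
# Center and scale using statistics computed on the training set only,
# then apply the same transform to the test set.
set.seed(9)
train_x <- matrix(rnorm(60, mean = 50, sd = 10), ncol = 3)
test_x  <- matrix(rnorm(15, mean = 50, sd = 10), ncol = 3)

mu <- colMeans(train_x)
sdev <- apply(train_x, 2, sd)
train_scaled <- scale(train_x, center = mu, scale = sdev)
test_scaled  <- scale(test_x,  center = mu, scale = sdev)  # training stats reused

round(colMeans(train_scaled), 10)  # ~0 by construction
```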
3.2 Evaluation of classiﬁer’s performance
There is no generally accepted measure of performance for multiclass models. Thus, the
predictive power of SL algorithms on PS data is evaluated by means of the overall
accuracy (OA, Foody 2002) of the classiﬁcation and the Cohen’s kappa coefﬁcient (Kappa,
Cohen 1960) in this study. For each classiﬁcation, a confusion matrix is presented, along
with its user’s and producer’s accuracy (Congalton and Green 2009). Let r _{i}_{j} (i and j = 1, 2,
…, m) is the joint frequency of observations assigned to class i by prediction and to class j
by observation data, r _{i}_{?} is the total frequency of class i as derived from the prediction, and
r _{?} _{j} is the total frequency of class j as derived from the observation data, as indicated in
Table 4. The OA, which is deﬁned as the percentage of records that is correctly predicted
by the model relative to the total number of records among the classiﬁcation models, is a
primary evaluation criterion. The OA can be obtained by
OA ¼
1
n
R ^{m}
i¼1 ^{r} ii
100%
ð8Þ
Cohen's kappa coefficient measures the proportion of correctly classified units after the probability of chance agreement has been removed; it is a robust index that takes into account the probability that a case is classified by chance (Kuhn and Johnson 2013). The kappa is therefore always slightly lower than the classification accuracy rate and can be obtained from the following expression
$$\mathrm{Kappa} = \frac{n\sum_{i=1}^{m} r_{ii} - \sum_{i=1}^{m} r_{i+}\, r_{+i}}{n^{2} - \sum_{i=1}^{m} r_{i+}\, r_{+i}} \qquad (9)$$
where $r_{ii}$ is the cell count on the main diagonal, $n$ is the number of examples, $m$ is the number of class values, and $r_{+i}$ and $r_{i+}$ are the column and row total counts, respectively.
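Equations (8) and (9) can be checked with a short plain-Python sketch; the confusion matrix below is an illustrative example, not one of the paper's results:

```python
# Overall accuracy (Eq. 8) and Cohen's kappa (Eq. 9) from a confusion
# matrix r, where r[i][j] counts cases predicted as class i and observed
# as class j. Pure-Python sketch; the example matrix is illustrative.

def overall_accuracy(r):
    """OA = (1/n) * sum of diagonal counts, in percent."""
    n = sum(sum(row) for row in r)
    return 100.0 * sum(r[i][i] for i in range(len(r))) / n

def cohens_kappa(r):
    """Kappa = (n * sum r_ii - sum r_i+ r_+i) / (n^2 - sum r_i+ r_+i)."""
    m = len(r)
    n = sum(sum(row) for row in r)
    diag = sum(r[i][i] for i in range(m))
    row_tot = [sum(r[i][j] for j in range(m)) for i in range(m)]  # r_{i+}
    col_tot = [sum(r[i][j] for i in range(m)) for j in range(m)]  # r_{+j}
    chance = sum(row_tot[i] * col_tot[i] for i in range(m))
    return (n * diag - chance) / (n * n - chance)

r = [[23, 1, 2],
     [3, 9, 4],
     [0, 6, 26]]
```

For a perfectly diagonal matrix, `cohens_kappa` returns exactly 1, and for agreement at chance level it returns 0, matching the interpretation given in the text.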
Table 4 Population confusion matrix, with r_ij representing the proportion of cases in prediction category i and observation category j

Predicted   Observed
            1                    2                    …   m                    Total   UA (%)
1           r_11                 r_12                 …   r_1m                 r_1+    (r_11/r_1+) × 100 %
2           r_21                 r_22                 …   r_2m                 r_2+    (r_22/r_2+) × 100 %
…           …                    …                    …   …                    …       …
m           r_m1                 r_m2                 …   r_mm                 r_m+    (r_mm/r_m+) × 100 %
Total       r_+1                 r_+2                 …   r_+m                 n
PA (%)      (r_11/r_+1) × 100 %  (r_22/r_+2) × 100 %  …   (r_mm/r_+m) × 100 %

PA producer's accuracy, UA user's accuracy
Table 5 Relative strength of agreement associated with the kappa statistic

Kappa statistic    Strength of agreement
0.81–1.00          Almost perfect
0.61–0.80          Substantial
0.41–0.60          Moderate
0.21–0.40          Fair
0.00–0.20          Slight
−1.00–0.00         Poor
The kappa measures the correct classification rate after the probability of chance agreement has been removed (Congalton 1991). Landis and Koch (1977) proposed a scale to describe the degree of concordance (Table 5): the kappa ranges from −1 (total disagreement) through 0 (random classification) to 1 (perfect agreement). As can be seen from Table 5, a kappa below 0.4 indicates poor agreement, whereas a value of 0.4 or above indicates good agreement (Landis and Koch 1977; Sakiyama et al. 2008).
According to Congalton and Green (2009), the producer's accuracy of class $i$ ($\mathrm{PA}_i$) can be computed by
$$\mathrm{PA}_{i} = \frac{p_{ii}}{p_{+i}} \times 100\,\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ji}} \times 100\,\% \qquad (10)$$
and the user's accuracy of class $i$ ($\mathrm{UA}_i$) can be computed by
$$\mathrm{UA}_{i} = \frac{p_{ii}}{p_{i+}} \times 100\,\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ij}} \times 100\,\% \qquad (11)$$
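Equations (10) and (11) translate directly into code; again the matrix values below are illustrative only:

```python
# Producer's accuracy (Eq. 10) and user's accuracy (Eq. 11) per class,
# from a confusion matrix p with rows = predicted, columns = observed.
# Pure-Python sketch; the example matrix is illustrative.

def producers_accuracy(p, i):
    """PA_i = p_ii / column total of class i, in percent."""
    col_total = sum(p[j][i] for j in range(len(p)))
    return 100.0 * p[i][i] / col_total

def users_accuracy(p, i):
    """UA_i = p_ii / row total of class i, in percent."""
    row_total = sum(p[i])
    return 100.0 * p[i][i] / row_total

p = [[23, 1, 2],
     [3, 9, 4],
     [0, 6, 26]]

pa = [producers_accuracy(p, i) for i in range(3)]
ua = [users_accuracy(p, i) for i in range(3)]
```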
3.3 Validation method of the proposed models
Several adjustable ‘‘tuning parameters’’ used by each of the SL method to optimize
classiﬁcation performance are examined using repeated tenfold crossvalidation (CV) in
terms of computation time and variance, which is the number of folds recommended by
Kohavi (1995) when comparing the performance of machine learning algorithms (Kohavi
1995; LeThiThu et al. 2011; Clark 2013; Kuhn and Johnson 2013). In this procedure,
compounds of the training data are randomly divided into 10 subsets. Nine subsets are used
as novel ‘‘training data’’ to develop each SL method, and the holdout set is used for
‘‘predict’’ the performance of the ﬁtted model. This process is repeated 10 times on
different training subsets, and at the end, every instance has been used exactly once for
testing, and ﬁnally, the CV estimate of overall accuracy is calculated by simply averaging
the 10 individual accuracy measures for CV accuracy, and the whole tenfold CV process is
also repeated 10 times (‘‘folds’’) to obtain the reliable results. This procedure is used for
the selection of parameters and to avoid overﬁtting of models. The test set is never used in
the development of the model, but it is used to test the predictive power of the ﬁnal model.
Thus, the repeated tenfold CV resampling technique (Molinaro et al. 2005) is used to
create and optimize SL models for hard rock pillars classiﬁcation in the present work. We
construct the predictive models using selected variables and training set and applied to test
set as shown in Fig. 4.
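The repeated tenfold CV procedure described above can be sketched as follows (plain Python; the fold-assignment scheme and the `evaluate` callback are illustrative assumptions, not caret's exact implementation):

```python
# Repeated tenfold cross-validation sketch: the training indices are
# shuffled and split into 10 folds; each fold serves once as the hold-out
# set, and the whole procedure is repeated (here 10 times) with fresh
# shuffles. `evaluate` stands in for fitting a model on the training folds
# and scoring it on the hold-out fold.
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]     # k roughly equal folds
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f
                     for i in fold]
            yield train, test

def cv_accuracy(evaluate, n_samples, k=10, repeats=10, seed=0):
    """Average the per-fold scores over all k * repeats splits."""
    scores = [evaluate(tr, te)
              for tr, te in repeated_kfold(n_samples, k, repeats, seed)]
    return sum(scores) / len(scores)
```

Within one repetition, every index appears in exactly one hold-out fold, which is the property the text relies on when averaging the 10 fold scores.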
Fig. 4 Overall procedure ﬂowchart for performance evaluation for PS classiﬁcation using SL methods
3.4 SL method development and parameter optimization
This study examines the suitability of six common classification algorithms on the PS data set: the LDA, MLR, MLPNN, SVM, GBM, and RF algorithms. Most classifiers (the MLR, ANN, SVM, RF, and GBM methods) include parameters that have to be tuned. The "train" function from the caret (classification and regression training) package within R (Kuhn 2008) evaluates a grid of tuning parameters for a number of classification routines, which allows a single consistent environment for training each of the SL methods and tuning their associated parameters. After assessing the optimal parameters, the whole training data set is used to build the final model for PS prediction; the term "PS" refers to the classification task. A desired "tune length" variable can be passed to the "train" function in the caret package (Kuhn 2012; Kuhn and Johnson 2013). "Optimal" values for the tuning parameters are selected using repeated tenfold CV on the original training data set, with the test set removed completely from the CV process. Tuning parameters are considered optimized when the classification model achieves the largest value of kappa during the CV process; in this way, the parameter combination with the highest accuracy and kappa is identified. Specific details on the tuning parameters used by the six SL algorithms examined in the current study are listed below, and the final results (classification rate, kappa, and tuning parameters for each algorithm) are presented in Table 6.
• LDA: This classiﬁer needs no tuning of hyperparameters.
• MLR: The parameter for weight decay (decay) is tuned for 10 values (0, 1e−04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1) to find the optimal model.
• RF: Tuning of the RF method involves finding optimum values for the number of classification trees (n_tree) and the number of variables (m_try) randomly selected at each split in the tree-building process. It has been observed that the OA is more sensitive to m_try and not much affected by n_tree (Breiman and Cutler 2007). Therefore, n_tree is fixed at the default value of 500, and m_try is tested for t values, where t is the number of input variables in each classification setup.

Table 6 Tuning parameters of each model for an optimal classification

Method   Model A                         Model B                          Model C                         Model D
LDA      None                            None                             None                            None
MLR      g = 0.1000                      g = 0.1000                       g = 0.0001                      g = 0.1000
RF       {m_try, n_tree} = {2, 500}      {m_try, n_tree} = {1, 500}       {m_try, n_tree} = {1, 500}      {m_try, n_tree} = {4, 500}
ANN      {L, g} = {15, 0.0421}           {L, g} = {19, 0.0075}            {L, g} = {17, 0.0074}           {L, g} = {19, 0.0075}
SVM      {C, r} = {16, 0.552}            {C, r} = {16, 0.563}             {C, r} = {64, 1.187}            {C, r} = {16, 0.622}
GBM      {n_tree, v, J} = {400, 0.1, 2}  {n_tree, v, J} = {150, 0.1, 10}  {n_tree, v, J} = {50, 0.1, 10}  {n_tree, v, J} = {500, 0.1, 6}
• ANN: The number L of hidden neurons is tuned over ten values in the range 1 ≤ L ≤ 19, together with 10 values of the weight decay parameter: decay = {0, 1e−04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1}.
• SVM: The parameter C is tuned for 12 values (2^−2, 2^−1, 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, and 2^9) to find the optimal model. The "caret" package initially estimates an approximate value for the sigma parameter using the "sigest" function based on the training data.
• GBM: The GBM has three tuning parameters: the total number of iterations (n.trees), the learning rate (shrinkage parameter v), and the complexity of the tree (indexed by interaction.depth J). The tuning parameter "shrinkage" is held constant at a value of 0.1, with n.trees = 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 and J = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
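The kappa-driven grid search that caret's train performs over such grids can be sketched generically; the `cv_kappa` callback and the toy scorer below are placeholders, not the paper's actual models:

```python
# Select tuning parameters by cross-validated kappa, as caret's train()
# does: score every grid point and keep the combination with the largest
# mean kappa. `cv_kappa` stands in for "fit the model with these
# parameters and return the mean kappa over repeated tenfold CV"; the
# grid mimics the paper's GBM grid (shrinkage fixed at 0.1).
from itertools import product

def tune(grid, cv_kappa):
    """grid: dict of parameter name -> list of candidate values."""
    names = sorted(grid)
    best_params, best_kappa = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        kappa = cv_kappa(params)
        if kappa > best_kappa:
            best_params, best_kappa = params, kappa
    return best_params, best_kappa

gbm_grid = {"n_trees": [50, 100, 150, 200, 250, 300, 350, 400, 450, 500],
            "interaction_depth": list(range(1, 11)),
            "shrinkage": [0.1]}

# Toy stand-in scorer just to exercise the search; it peaks at 400 trees
# and J = 2, roughly matching the model A optimum reported in Table 6.
toy = lambda p: (0.7 - abs(p["n_trees"] - 400) / 1000
                 - abs(p["interaction_depth"] - 2) / 100)

best, kappa = tune(gbm_grid, toy)
```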
4 Results and discussions
4.1 Classiﬁcation results achieved by classiﬁers
Average values obtained from the 10 repetitions of tenfold CV are used for all comparisons between methods. A visual comparison of the performance of all SL classification methods is given in Figs. 5 and 6. The boxplots in Fig. 5 report the performance of the six classifiers with different input variables in the repeated tenfold CV phase, in terms of average accuracy and variance on the training set. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually as dotted points. Since the notches in the box plots overlap, we cannot conclude, at the usual confidence level, that the true medians differ (see Fig. 5). The difference between the lower and upper quartiles for RF (Fig. 5a, c) is comparatively smaller than for the other methods, indicating a relatively low variance of accuracies across iterations. Density plots can be used to visualize the resampling distributions (Kuhn and Johnson 2013).
Fig. 5 Boxplot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure
Fig. 6 Density plot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure
Table 7 Performance metrics of each model for test data

Method   Model A           Model B           Model C           Model D
         OA (%)   Kappa    OA (%)   Kappa    OA (%)   Kappa    OA (%)   Kappa
LDA      67.8     0.461    59.5     0.345    63.5     0.389    64.9     0.424
MLR      66.2     0.436    63.5     0.392    63.5     0.388    63.5     0.400
RF       82.4     0.723    75.7     0.609    71.6     0.543    73.0     0.576
ANN      81.1     0.703    75.7     0.611    75.7     0.613    75.7     0.615
SVM      82.4     0.726    68.9     0.502    71.6     0.550    79.7     0.684
GBM      79.7     0.678    77.0     0.636    66.2     0.474    75.7     0.616

Bolded values indicate the highest value for each model
Figure 6 illustrates density plots of the 200 bootstrap estimates of accuracy and kappa for the final models. Table 7 summarizes the overall predictability on the test set by comparing two measures, OA and kappa, across the six classifiers. The kappa is a measure of true accuracy that takes into account the agreement that may have occurred by chance; it is considered preferable when it is larger than 0.4 (Landis and Koch 1977). Not surprisingly, the linear methods LDA and MLR did not do well here, likely because of their inability to handle nonlinear class boundaries. For model A, in terms of average accuracy rate on the training set, the SVM predictor achieved the highest OA (84.3 %), followed by ANN, GBM, and RF with average accuracy rates of 83.2, 80.9, and 77.8 %, respectively; MLR performed relatively poorly with an average accuracy rate of 69.3 %, and LDA had the lowest average accuracy rate of 67.0 %. On the test set, however, RF and SVM achieved the highest OA (82.4 %), ANN performed slightly worse with an OA of 81.1 %, and MLR had the lowest OA (66.2 %). On the other hand, the kappa values of the LDA, MLR, RF, ANN, SVM, and GBM techniques range from moderate to substantial for both the training and test sets on the basis of the scale of concordance presented by Landis and Koch (1977). For model B, the ANN predictor achieves the highest OA (80.3 %) on the training set, followed by RF, GBM, and SVM with average accuracy rates of 79.1, 78.4, and 76.9 %, respectively; LDA performs relatively poorly with an average accuracy rate of 65.7 %, and MLR has the lowest average accuracy rate of 65.3 %. On the test set, however, GBM achieves the highest OA (77.0 %), followed by RF and ANN (both 75.7 %), while LDA has the lowest OA (59.5 %). The kappa values of the six techniques for the training set range from moderate to substantial, and the accuracies of all modeling techniques for the test set range from fair to substantial. Similar results for models C and D can be seen in Fig. 5 and Table 7. Overall, for the six SL techniques, the performance (in terms of accuracy) on the training set falls into the range of 65.3–84.3 % across the four models, while the performance on the test set falls into the range of 59.5–82.4 %; the predictive accuracy of the six techniques on the training set ranges from moderate to substantial, and that on the test set from fair to substantial.
4.2 Comparison of SL classiﬁcation techniques
Out of the four models in Table 7, the results indicate the best performance for SVM with model A, which uses five parameters (pillar height, pillar width, UCS, pillar strength, and pillar stress) to train the SVM. A classification accuracy of 84.3 % is achieved with the RBF kernel; out of the 74 test data records, 17 are incorrectly classified (Table 8). As can be seen from Table 8, model A also provides the best test results with RF when all input parameters are used, and RF produced the best outcome in terms of OA and kappa for the test set (Table 8). Higher classification accuracy on the training set compared with the test set does not necessarily indicate better generalization capability; it may be due to overtraining (memorizing the training set). A comparison of the results from models A and D suggests that the contribution of pillar strength is somewhat sensitive (Fig. 5; Table 8). However, there are very few significant differences in OA and kappa between models B and C, and the combinations of five (model A) or four (model D) input parameters give better results than the combinations of three input parameters (models B and C), suggesting that the five-input combination (model A) is the best choice.
To quantify the accuracy of the classifications, an accuracy assessment based on a confusion (error) matrix is carried out using the 74 independent reference samples. Four metrics (Congalton and Green 2009) are calculated for each PS class of model A from the confusion matrix presented in Table 8: (1) OA, (2) kappa, (3) producer's accuracy (PA), and (4) user's accuracy (UA). The PA and UA indicate that some classes are better classified than others. As can be seen from Table 8, "Stable" (PA 90.6–96.9 % and UA 66.0–93.5 %) is classified more accurately than "Unstable" (0–56.3 % and 0–69.2 %), and "Failed" also receives a relatively low PA (69.2–92.3 %) and UA (69.2–88.5 %). "Unstable" is often confused with "Stable" and "Failed," likely owing to overlapping classification rules and the small number of samples in this class. The results show that some SL methods substantially outperform others for this classification problem. It is clear that SVM and RF are both capable of achieving high accuracy for all classes despite the heavily unbalanced data set. In particular, SVM and RF classification with the combination of five features is very effective and yields the best classification results, with an OA of 82.4 %.
As can be observed in Fig. 6, the kappa distributions of LDA and MLR appear similar to one another but different from those of the ANN, SVM, RF, and GBM methods. There are very few statistically significant differences between the latter methods (ANN, SVM, RF, and GBM) according to the resampling results; given this, any of them would be a reasonable choice. Since the models are fit on the same versions of the training data, it makes sense to make inferences on the differences between models. From the above observations, the predictions vary considerably across the different SL methods, and none of the methods is excellent with respect to every test measure; however, SVM and RF, and particularly SVM, perform better than ANN and GBM. There is no significant difference in terms of generalization performance, but if one method must be chosen, SVM is the best. On the other hand, GBM and ANN are the most computationally intensive techniques and take the longest time to train; the high classification accuracy of ANN likely results from the computationally intensive backpropagation process, during which the feature weights are modified iteratively. In addition, LDA performs the worst on the training set, with predictive performance below even that of MLR, whereas MLR performs the worst on the test set, with predictive performance below that of LDA.
Table 8 Confusion matrices and associated classifier accuracies for best model predictions based on test data of hard rock pillars (classes S = Stable, U = Unstable, F = Failed; observed totals S = 32, U = 16, F = 26 of the 74 test records). Overall metrics: SVM OA = 82.4 %, kappa = 0.726; RF OA = 82.4 %, kappa = 0.723; ANN OA = 81.1 %, kappa = 0.703; GBM OA = 79.7 %, kappa = 0.678; LDA OA = 67.8 %, kappa = 0.461; MLR OA = 62.2 %, kappa = 0.436. Diagonal elements (correct decisions) are marked in bold. OA overall classification accuracy, PA producer's accuracy, UA user's accuracy
Furthermore, Lunder and Pakalnis (1997) proposed that pillar stability can be adequately expressed by two safety factor (SF) lines. Pillars with SF > 1.4 are stable, while those with SF < 1 have failed; the transition zone from the stable to the failed condition (1 < SF < 1.4) is referred to as unstable, and pillars in this region are prone to spalling and slabbing but have not completely failed (Lunder and Pakalnis 1997; Martin and Maybee 2001). Similarly, González-Nicieza et al. (2006) established a classification of pillars in marble mines, suggesting SF > 1.25 (stable), 0.90 < SF < 1.25 (unstable), and SF < 0.90 (failed). The predictive accuracies of the two methods on the original data are 68.9 % (Lunder and Pakalnis 1997) and 68.5 % (González-Nicieza et al. 2006), respectively. Judging from these predictive accuracies, it is obvious that the empirical methods cannot generate satisfactory predictions for these pillar instances.
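The two empirical safety-factor rules quoted above translate directly into simple classification functions (a sketch of the published thresholds only, not of the SL models):

```python
# Empirical pillar stability rules as classification functions.
# Lunder and Pakalnis (1997): SF > 1.4 stable, SF < 1.0 failed,
# and the transition zone in between is unstable.
# Gonzalez-Nicieza et al. (2006): SF > 1.25 stable, SF < 0.90 failed,
# otherwise unstable.

def lunder_pakalnis(sf):
    if sf > 1.4:
        return "stable"
    if sf < 1.0:
        return "failed"
    return "unstable"

def gonzalez_nicieza(sf):
    if sf > 1.25:
        return "stable"
    if sf < 0.90:
        return "failed"
    return "unstable"
```

Because each rule partitions pillars by a single scalar SF, it cannot capture the nonlinear class boundaries that the SL methods learn from multiple features, which is consistent with the lower accuracies (68.9 and 68.5 %) reported above.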
4.3 Strengths and limitations
The primary strength of this study is the systematic assessment of pillar stability classification in hard rock mines using six SL methods. A powerful statistical programming system is needed to implement this computing task; a good, inexpensive choice is the R system, which offers free implementations of the SL classification methods.
Although this study reveals important findings, it also has limitations. First, training any classifier with a data set unbalanced in favor of negative instances makes it difficult to learn the positive instances; the unbalanced distribution of prior probabilities of the three classes in both the training and test sets affects the reliability of the predictor in all SL approaches. Second, other relevant pillar stability variables could be collected in an effort to increase the prediction accuracy of the models. Third, regarding the comparison of more SL methods, other newly developed classification methodologies could be employed and their results compared with those studied in this paper. Additionally, discontinuities and joint factors have been omitted in this study. Every approach has its advantages and disadvantages; Table 9 summarizes the strengths and weaknesses of the six SL methods described above.
4.4 Relative importance of variables
The generic function varImp() in the caret package can be used to characterize the general effect of predictors on the model (Kuhn and Johnson 2013). varImp also works with objects produced by train, for which it is a simple wrapper around the model-specific methods listed previously. In this work, we illustrate how to determine the relative importance of discriminating features with the RF method. For most classification methods, each predictor has a separate variable importance for each class, and the default model-independent variable importance metric is the area under the ROC curve (AUC) computed with regard to each predictor.
Variables are sorted by average importance across the classes in this work. Figure 7 shows how important the variables are for PS classification with the RF method. Figure 7a demonstrates that pillar stress is the most sensitive factor among the indicators for the prediction of PS classification; not surprisingly, the indicator pillar strength takes second place in sensitivity. The index of pillar width is somewhat sensitive, while the factors UCS and pillar height are not as sensitive as the former three. Figure 7b, c also provides the results of the RF method using the function "varImp()" in the "caret" package and displays the relative variable importance for each of the three predictor variables: pillar stress is again the most sensitive factor among all response variables for models B and C, followed by pillar strength, K, and UCS. Figure 7d shows the differing results between models A and D for the variable importance of UCS and pillar width. Overall, these results demonstrate that pillar stress is the most relevant predictor among the indicators for the prediction of PS classification.

Table 9 Summary of strengths and weaknesses of the various SL methods

LDA
Strengths: simple, fast, efficient; strong theoretical basis; linear; interpretable.
Weaknesses: performs well only when all classes are strictly homogeneous; cannot be used if the number of variables is higher than the total number of samples.

MLR
Strengths: extracts more information from the data and prevents the loss of information due to collapsing.
Weaknesses: computationally intensive.

RF
Strengths: works well with high-dimensional small sample sizes; some tolerance to correlated inputs; fast computation.
Weaknesses: difficult to interpret; prone to overfitting on certain data sets; does not handle large numbers of irrelevant features as well as other ensemble methods.

SVM
Strengths: can be used to classify complex nonlinear (e.g., biological) data; not prone to local minima; works well with high-dimensional small data sets; avoids overfitting; robust to noise.
Weaknesses: very much a black box; limited computational scalability; lack of transparency; restricted to pairwise classification; cannot be used directly for feature selection.

ANN
Strengths: nonlinear adaptability; no assumptions required on probability density or distribution; certain configurations have been proven to be universal approximators.
Weaknesses: difficult to design an optimal architecture; high computational cost; not robust to outliers; loss of generality; risk of overfitting; prone to suboptimal local minima; inability to extract the features responsible for results; black-box nature presents uncertainties for mission-critical applications.

GBM
Strengths: good theoretical properties; identification of outliers.
Weaknesses: high computational cost; not interpretable.
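A model-independent way to approximate the kind of ranking varImp reports is permutation importance: shuffle one predictor at a time and measure the drop in accuracy. A minimal sketch with a toy scorer (not the paper's caret/RF computation; all names and data here are illustrative):

```python
# Permutation-style variable importance: shuffle one column at a time and
# record how much the model's accuracy drops. `score` stands in for
# "accuracy of the fitted model on (X, y)".
import random

def permutation_importance(X, y, score, seed=0):
    rng = random.Random(seed)
    base = score(X, y)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [col[i]] + row[j + 1:]
                  for i, row in enumerate(X)]
        importances.append(base - score(X_perm, y))
    return importances

# Toy model: the class is decided entirely by feature 0, so permuting
# feature 0 should hurt accuracy while permuting feature 1 should not.
X = [[0, 5], [0, 7], [1, 5], [1, 7], [0, 6], [1, 6]]
y = [0, 0, 1, 1, 0, 1]
score = lambda X, y: sum(int(row[0] == lab)
                         for row, lab in zip(X, y)) / len(y)
imp = permutation_importance(X, y, score)
```

An irrelevant predictor gets an importance of exactly zero under this scheme, which mirrors the intuition behind the ranking in Fig. 7.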
5 Summary and conclusions
A data set of 251 pillar cases compiled from published research work is utilized to construct the proposed models, and the performances of six SL classifiers for the prediction of PS in hard rock mines are compared. Based on the analysis results, the following conclusions can be drawn. First, none of the SL classification methods should be used blindly, as none of them is a fully automatic classification method. For the PS data set, the use of a repeated tenfold CV strategy for selecting appropriate parameters in the tuning process is necessary, and multiple random splits into training and test sets are also needed for a reliable model comparison. Among the six SL methods, SVM and RF are found to be the best; the nonlinear classifiers ANN and GBM have slightly higher performance and reliability than the linear classifiers LDA and MLR. The comparisons indicate that model A, consisting of five input variables (pillar width, pillar height, UCS, pillar strength, and pillar stress) with the SVM and RF methods, is more reliable for evaluating pillar stability than the other models. RF demonstrates that pillar stress is the most relevant PS predictor for all models A, B, C, and D. Finally, for the six SL classifiers studied, the performance (in terms of accuracy) on the training set falls in the range of 65.3–84.3 % across the four models with different input parameter combinations, while the performance on the test set falls into the range of 59.5–82.4 %.

Fig. 7 Variable importance assessment in each model for predicting PS with RF method
Acknowledgments This research was partially supported by the National Natural Science Foundation Project (Grant Nos. 11472311 and 41272304) of China, the Graduated Students' Research and Innovation Fund Project (Grant No. CX2011B119) of Hunan Province of China, the project (Grant No. 134376140000022) supported by the Scholarship Award for Excellent Doctoral Students of the Ministry of Education of China, and the Valuable Equipment Open Sharing Fund of Central South University. The authors would like to express thanks to these foundations. The first author would like to thank the Chinese Scholarship Council for financial support of the joint PhD at McGill University, Canada. We also would like to thank the three anonymous referees and the editors for their valuable comments and suggestions, which improved a previous version of this manuscript.
References
Berrueta LA, Alonso-Salces RM, Héberger K (2007) Supervised pattern recognition in food analysis. J Chromatogr A 1158(1–2):196–214
Bieniawski ZT (1968) The effects of specimen size on the compressive strength of coal. Int J Rock Mech Min Sci Geomech Abstr 5(4):325–335
Bieniawski ZT, Van Heerden WL (1975) The significance of in situ tests on large rock specimens. Int J Rock Mech Min Sci Geomech Abstr 12(4):101–113
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, New York
Brady B, Brown ET (2003) Rock mechanics for underground mining, 2nd edn. Chapman and Hall, London
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Cutler A (2007) Random forests—classification description. http://statwww.berkeley.edu/users/breiman/RandomForests/cc_home.htm. Accessed 15 Jan 2014
Cauvin M, Verde T, Salmon R (2009) Modeling uncertainties in mining pillar stability analysis. Risk Anal 29(10):1371–1380
Clark M (2013) An introduction to machine learning: with applications in R. http://www3.nd.edu/
Congalton RG, Green K (2009) Assessing the accuracy of remotely sensed data: principles and practices, 2nd edn. Lewis, Boca Raton
Deng J, Yue ZQ, Tham LG, Zhu HH (2003) Pillar design by combining finite element methods, neural networks and reliability: a case study of the Feng Huangshan copper mine, China. Int J Rock Mech Min Sci 40(4):585–599
Elmo D, Stead D (2010) An integrated numerical modelling–discrete fracture network approach applied to the characterisation of rock mass strength of naturally fractured pillars. Rock Mech Rock Eng 43(1):3–19
Esterhuizen GS (1993) Variability considerations in hard rock pillar design. In: Proceedings of the SANGORM symposium: rock engineering problems related to hard rock mining at shallow to intermediate depth, Rustenburg, South Africa
Esterhuizen GS, Dolinar DR, Ellenberger JL (2011) Pillar strength in underground stone mines in the United States. Int J Rock Mech Min Sci 48(1):42–50
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Garzón MB, Blazek R, Neteler M, de Dios RS, Ollero HS, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 197:383–393
Ghasemi E, Shahriar K (2012) A new coal pillars design method in order to enhance safety of retreat mining in room and pillar mines. Saf Sci 50:579–585
Ghasemi E, Shahriar K, Sharifzadeh M, Hashemolhosseini H (2010) Quantifying the uncertainty of pillar safety factor by Monte Carlo simulation—a case study. Arch Min Sci 55:623–635
Ghasemi E, Ataei M, Shahriar K (2014a) An intelligent approach to predict pillar sizing in designing room and pillar coal mines. Int J Rock Mech Min Sci 65:86–95
Ghasemi E, Ataei M, Shahriar K (2014b) Prediction of global stability in room and pillar coal mines. Nat Hazards 1–18
González-Nicieza C, Álvarez-Fernández MI, Menéndez-Díaz A, Álvarez-Vigil AE (2006) A comparative analysis of pillar design methods and its application to marble mines. Rock Mech Rock Eng 39(5):421–444
González-Rufino E, Carrión P, Cernadas E, Fernández-Delgado M, Domínguez-Petit R (2013) Exhaustive comparison of colour texture features and classification methods to discriminate cells categories in histological images of fish ovary. Pattern Recogn 46(9):2391–2407
Griffiths DV, Fenton GA, Lemons CB (2002) Probabilistic analysis of underground pillar stability. Int J Numer Anal Meth Geomech 26(8):775–791
Griffiths DV, Fenton GA, Lemons CB (2007) The random finite element method (RFEM) in mine pillar stability analysis. Probabilist Methods Geotech Eng 491:271–294
Guelman L (2012) Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Syst Appl 39:3659–3667
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, New Jersey
Hedley DGF, Grant F (1972) Stope-and-pillar design for the Elliot Lake uranium mines. Bull Can Inst Min Metall 65:37–44
Hoek E, Brown ET (1980) Underground excavation in rock. The Institute of Mining and Metallurgy, London
Hudyma MR (1988) Rib pillar design in open stope mining. MASc thesis, University of British Columbia
Hustrulid WA (1976) A review of coal pillar strength formulas. Rock Mech 8(2):115–145
Hutchinson DJ, Phillips C, Cascante G (2002) Risk considerations for crown pillar stability assessment for mine closure planning. Geotech Geol Eng 20(1):41–63
Jaiswal A, Shrivastva BK (2009) Numerical simulation of coal pillar strength. Int J Rock Mech Min Sci 46(4):779–788
Jaiswal A, Sharma SK, Shrivastva BK (2004) Numerical modeling study of asymmetry in the induced stresses over coal mine pillars with advancement of the goaf line. Int J Rock Mech Min Sci 41(5):859–864
Jawed M, Sinha RK, Sengupta S (2013) Chronological development in coal pillar design for bord and pillar workings: a critical appraisal. J Geol Min Res 5(1):1–11
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11(9):1–20
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI'95 proceedings of the 14th international joint conference on artificial intelligence, vol 2. Morgan Kaufmann, San Francisco, pp 1137–1143
Kordon AK (2010) Applying computational intelligence: how to create value. The Dow Chemical Company, Freeport, p 459
Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ (2005) Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans Pattern Anal Mach Intell 27(6):957–968
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
Kuhn M (2012) caret package (R package version 5.15-023). R Foundation for Statistical Computing, Vienna
Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York
Landis J, Koch G (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Li XB, Li DY, Guo L, Ye ZY (2007) Study on mechanical response of highly stressed pillars in deep mining under dynamic disturbance. Chin J Rock Mech Eng 26(5):922–928
Li XB, Li DY, Liu ZX, Zhao GY, Wang WH (2013) Determination of the minimum thickness of crown pillar for safe exploitation of a subsea gold mine based on numerical modelling. Int J Rock Mech Min Sci 57:42–56
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22
Liu XZ, Zhai DY (2000) The reliability design of pillar. Chin J Rock Mech Eng 18(6):85–88
Liu ZB, Shao JF, Xu WY, Meng YD (2013) Prediction of rock burst classification using the technique of cloud models with attribution weight. Nat Hazards 68(2):549–568
Liu ZB, Shao JF, Xu WY, Chen HJ, Zhang Y (2014) An extreme learning machine approach for slope stability evaluation and prediction. Nat Hazards 1–18
Lunder PJ (1994) Hard rock pillar strength estimation: an applied empirical approach. MASc thesis, University of British Columbia, Vancouver
Lunder PJ, Pakalnis R (1997) Determination of the strength of hard rock mine pillars. Bull Can Inst Min Metall 90(1013):51–55
Mark C (2006) The evolution of intelligent coal pillar design: 1981–2006. In: Proceedings of the 25th international conference on ground control in mining. West Virginia University, Morgantown, pp 325–334
Mark C, Barton TM (1997) Pillar design and coal strength. In: Proceedings of the New Technology for Ground Control in Retreat Mining. US Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Pittsburgh, PA, pp 49–59, DHHS (NIOSH) Publication No. 97-122
Martin CD, Maybee WG (2000) The strength of hard-rock pillars. Int J Rock Mech Min Sci 37:1239–1246
Mitri HS (2007) Assessment of horizontal pillar burst in deep hard rock mines. Int J Risk Assess Manag 7(5):695–707
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Monjezi M, Hesami SM, Khandelwal M (2011) Superiority of neural networks for pillar stress prediction in bord and pillar method. Arab J Geosci 4(5–6):845–853
Mortazavi A, Hassani FP, Shabani M (2009) A numerical investigation of rock pillar failure mechanism in underground openings. Comput Geotech 36(5):691–697
Pandya DH, Upadhyay SH, Harsha SP (2014) Fault diagnosis of rolling element bearing by using multinomial logistic regression and wavelet packet transform. Soft Comput 18(2):255–266
Pino-Mejías R, Carrasco-Mairena M, Pascual-Acosta A, Cubiles-De-La-Vega M, Muñoz-García J (2008) A comparison of classification models to identify the Fragile X Syndrome. J Appl Stat 35(3):233–244
Pino-Mejías R, Cubiles-de-la-Vega MD, Anaya-Romero M, Pascual-Acosta A, Jordán-López A, Bellinfante-Crocci N (2010) Predicting the potential habitat of oaks with data mining models and the R system. Environ Model Softw 25(7):826–836
Potvin Y, Hudyma M, Miller HDS (1989) Rib pillar design in open stope mining. Bull Can Inst Min Metall 82(927):31–36
Pozdnoukhov A, Foresti L, Kanevski M (2009) Data-driven topo-climatic mapping with machine learning methods. Nat Hazards 50(3):497–518
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/index.html
Ripley B (2009) nnet: feed-forward neural networks and multinomial log-linear models. http://cran.r-project.org/web/packages/nnet/index.html
Sadat-Hashemi SM, Kazemnejad A, Lucas C, Badie K (2005) Predicting the type of pregnancy using artificial neural networks and multinomial logistic regression: a comparison study. Neural Comput Appl 14(3):198–202
Sakiyama Y, Yuki H, Moriya T, Hattori K, Suzuki M, Shimada K, Honma T (2008) Predicting human liver microsomal stability with machine learning techniques. J Mol Graph Model 26:907–915
Salamon MDG (1970) Stability, instability and design of coal pillar workings. Int J Rock Mech Min Sci Geomech Abstr 7(6):613–631
Salamon MDG, Munro AH (1967) A study of the strength of coal pillars. J South Afr Inst Min Metall 68:55–67
Sheng JH, Liao WJ, Li WM (2010) Analysis of pillar safety factor in Gaoshan gypsum mine. Metal Mine (suppl.):791–793
Sjöberg JS (1992) Failure modes and pillar behaviour in the Zinkgruvan mine. In: Tillerson JA, Wawersik WR (eds) Proceedings of the 33rd U.S. rock mechanics symposium, Santa Fe. A.A. Balkema, Rotterdam, pp 491–500
Tawadrous AS, Katsabanis PD (2007) Prediction of surface crown pillar stability using artificial neural networks. Int J Numer Anal Meth Geomech 31(7):917–931
Tesfamariam S, Liu Z (2010) Earthquake induced damage classification for reinforced concrete buildings. Struct Saf 32(2):154–164
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York
Von Kimmelman MR, Hyde B, Madgwick RJ (1984) The use of computer applications at BCL Limited in planning pillar extraction and design of mining layouts. In: Proceedings of the ISRM symposium: design and performance of underground excavations. British Geotechnical Society, London, pp 53–63
Wattimena RK (2014) Predicting the stability of hard rock pillars using multinomial logistic regression. Int J Rock Mech Min Sci 71:33–40
Wattimena RK, Kramadibrata S, Sidi ID, Azizi MA (2013) Developing coal pillar stability chart using logistic regression. Int J Rock Mech Min Sci 58:55–60
York G (1998) Numerical modelling of the yielding of a stabilizing pillar/foundation system and a new design consideration for stabilizing pillar foundations. J S Afr Inst Min Metall 98:281–293
Zhou J, Shi XZ, Dong L, Hu HY, Wang HY (2010) Fisher discriminant analysis model and its application for prediction of classification of rockburst in deep-buried long tunnel. J Coal Sci Eng 16(2):144–149
Zhou J, Li XB, Shi XZ, Wei W, Wu BB (2011) Predicting pillar stability for underground mine using Fisher discriminant analysis and SVM methods. Trans Nonferrous Met Soc China 21(12):2734–2743
Zhou J, Li XB, Shi XZ (2012) Long-term prediction model of rockburst in underground openings using heuristic algorithms and support vector machines. Saf Sci 50(4):629–644
Zhou J, Li XB, Mitri HS, Wang SM, Wei W (2013) Identification of large-scale goaf instability in underground mine using particle swarm optimization and support vector machine. Int J Min Sci Technol 23(5):701–707