Nat Hazards (2015) 79:291–316 DOI 10.1007/s11069-015-1842-3
ORIGINAL PAPER 
Comparative performance of six supervised learning methods for the development of models of hard rock pillar stability prediction
Jian Zhou^{1,2} · Xibing Li^{1} · Hani S. Mitri^{2,3}
Received: 16 May 2014 / Accepted: 30 May 2015 / Published online: 10 June 2015
© Springer Science+Business Media Dordrecht 2015
Abstract The prediction of pillar stability (PS) in hard rock mines is a crucial task for which many techniques and methods have been proposed in the literature, including machine learning classification. In order to make the best use of the large variety of statistical and machine learning classification methods available, it is necessary to assess their performance before selecting a classifier and suggesting improvements. The objective of this paper is to compare different classification techniques for PS detection in hard rock mines. The data of this study consist of six features, namely pillar width, pillar height, the ratio of pillar width to its height, uniaxial compressive strength of the rock, pillar strength, and pillar stress. A total of 251 pillar cases between 1972 and 2011 are analyzed. Six supervised learning algorithms, including linear discriminant analysis, multinomial logistic regression, multilayer perceptron neural networks, support vector machine (SVM), random forest (RF), and gradient boosting machine, are evaluated for their ability to learn PS based on different input parameter combinations. In this study, the available data set is randomly split into two parts: a training set (70 %) and a test set (30 %). A repeated tenfold cross-validation procedure (ten repeats) is applied to determine the optimal parameter values during modeling, and an external testing set is employed to validate the prediction performance of the models. Two performance measures, namely classification accuracy rate and Cohen's kappa, are employed. The analysis of the accuracy together with kappa for the
✉ Jian Zhou csujzhou@hotmail.com
Xibing Li
xbli@mail.csu.edu.cn
Hani S. Mitri hani.mitri@mcgill.ca
^{1} School of Resources and Safety Engineering, Central South University, Changsha 410083, China
^{2} Department of Mining and Materials Engineering, McGill University, Montreal, QC H3A 2A7, Canada
^{3} School of Civil Engineering, Henan Polytechnic University, Jiaozuo 454150, China
PS data set demonstrates that SVM and RF achieve comparable median classification accuracy rates and Cohen's kappa values. All models are fitted with "R" programs using the libraries and functions described in this study.
Keywords Pillar stability · Pillar design · Hard rock mine · Supervised learning · Classification · Repeated cross-validation · R system
1 Introduction
Underground mining almost invariably involves leaving portions of the ore in the form of pillars, which act as key structural columns (Brady and Brown 2003; Deng et al. 2003; Zhou et al. 2011). Pillar stability is an essential prerequisite for safe working conditions in room-and-pillar mines (Salamon 1970; Ghasemi et al. 2014a). Unstable pillars can result in rock sloughing from the pillar and can lead to collapse of the roof if one or more pillars fail (Mortazavi et al. 2009). As mining goes deeper, pillar failure becomes more frequent and more critical owing to the increase in ambient stresses. Consequently, pillar design and stability are among the most complicated and important problems in mining related to rock mechanics and ground control. Because of their significance for the safe and economical extraction of underground ores, a great deal of valuable work on pillar design and layout in rock has been reported over the past decades. Various researchers have proposed empirical design methods for pillar strength determination that are often applied in practice and that have been reviewed and summarized in the literature (Hustrulid 1976; Lunder 1994; Brady and Brown 2003; Mark 2006; Mitri 2007; Jawed et al. 2013), i.e., the linear shape effect formula (Bieniawski and van Heerden 1975; York 1998), the power shape effect formula (Salamon and Munro 1967; Bieniawski 1968; Hedley and Grant 1972), the size effect formula (Hustrulid 1976), the Hoek–Brown formula (Hoek and Brown 1980), and the analysis of retreat mining pillar stability method (Mark and Barton 1997; Ghasemi and Shahriar 2012). In an underground pillar design, it is difficult to determine the actual stress that will be acting on a pillar.
However, the three main methods of calculating pillar stress are tributary area theory, numerical modeling (Lunder 1994), and the neural network method (Monjezi et al. 2011). The stability of a pillar can then be evaluated by calculating a safety factor (SF), which is the ratio of the average strength to the average stress in the pillar (Zhou et al. 2011). Theoretically, an SF value greater than 1 means that the pillar is stable, while an SF value lower than 1 means it is unstable. More often than not, these methods are questionable because pillar failures did occur even though the failed pillars had been considered stable, i.e., SF > 1 (Deng et al. 2003; Zhou et al. 2011). Moreover, empirical methods are based on the interpretation of available databases, which are compiled from ongoing or completed projects; it is therefore difficult to generalize the obtained results beyond the scope of the original site characteristics. Meanwhile, considerable work related to the prediction of PS has been undertaken by means of numerical simulation methods that allow for the consideration of complex boundary conditions and material behavior. For example, a design methodology was proposed by York (1998) using the fast Lagrangian analysis of continua (FLAC) code to enable the yield point of the foundation of deep-level stabilizing pillars to be predicted in terms of the
cohesion, friction angle, and depth. Hutchinson et al. (2002) recommended the use of simulation methods for crown pillar stability risk assessment in mine planning. Jaiswal et al. (2004) used the three-dimensional boundary element method (BEM) to model asymmetry in the induced stresses over coal mine pillars with complex geometries, enabling the successful simulation of mining conditions. Griffiths et al. (2007) combined random field theory with an elastoplastic finite element method (FEM) algorithm in a Monte Carlo framework to estimate the stability of pillars. Using the explicit finite difference program FLAC3D, Li et al. (2007) established a numerical model for a deep mining pillar with dynamic disturbance under high stress. Numerical modeling was carried out by Jaiswal and Shrivastva (2009) using a three-dimensional FEM code to study the stress–strain behavior of coal pillars. Mortazavi et al. (2009) delved into the mechanisms involved in pillar failure and investigated the nonlinear behavior of rock pillars within the FLAC model. Elmo and Stead (2010) investigated the use of the hybrid FEM/DEM code ELFEN in studying the failure modes of jointed pillars. Recently, Li et al. (2013) established 3D numerical models based on FLAC3D to determine the minimum thickness of the crown pillar for a subsea gold mine. Each of the numerical methods has its advantages and disadvantages; however, the estimation of reliable values of the model input parameters remains a difficult task. Besides the numerical modeling approach, statistical and analytical methods, probabilistic methods, and artificial intelligence-based methods or their hybrids have been investigated in recent years and successfully used for designing pillars in coal or hard rock.
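The conventional safety-factor check described above can be sketched in a few lines of base R; the strength and stress values used here are hypothetical, for illustration only:

```r
# Safety-factor check: SF = pillar strength / pillar stress; SF > 1 is
# conventionally read as stable, although failed pillars with SF > 1
# have been reported. Strength/stress values are hypothetical.
sf_class <- function(strength_mpa, stress_mpa) {
  sf <- strength_mpa / stress_mpa
  ifelse(sf > 1, "stable", "unstable")
}

sf_class(95, 60)  # SF ~ 1.58 -> "stable"
sf_class(40, 55)  # SF ~ 0.73 -> "unstable"
```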
Esterhuizen (1993) showed that variability in rock mass properties and mining factors could be taken into consideration for hard rock pillar design by statistical methods and the point estimate method. Griffiths et al. (2002) and Cauvin et al. (2009) investigated underground pillar stability based on probabilistic methods. Ghasemi et al. (2010) studied the effect of variability in parameters such as the uniaxial compressive strength of the coal specimen, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a Monte Carlo simulation. Zhou et al. (2011) presented two models for predicting pillar stability, applying support vector machine and Fisher discriminant analysis techniques. Wattimena et al. (2013) developed a logistic regression model for predicting the probability of stability of a coal pillar. On the other hand, different types of artificial neural networks, based on combining different learning techniques such as hybrid or ensemble techniques, have been reported for pillar stability analysis in recent years. Deng et al. (2003) proposed a pillar design based on Monte Carlo simulation combining finite element methods, neural networks, and reliability analysis. Four ANNs based on two different architectures, the multilayer perceptron (MLP) and the radial basis function (RBF), were constructed by Tawadrous and Katsabanis (2007) to predict the stability of surface crown pillars. Monjezi et al. (2011) developed an MLP neural network methodology to predict the pillar stress concentration in the bord-and-pillar method and compared the results with a BEM numerical solution. Recently, Ghasemi et al. (2014a, b) developed two models for the evaluation and prediction of global stability in room-and-pillar coal mines under retreat mining conditions by employing logistic regression and fuzzy logic techniques. In these studies, all the data are separated into training and testing sets.
However, a cross-validation process is not implemented, and thus the accuracy of the predictive models is not fully understood. Hence, the issue of pillar stability prediction still poses a considerable challenge for underground mines. Supervised learning (SL) has become steadily more mathematical and more successful in applications over the past 20 years. The use of SL algorithms for the development of predictive and descriptive data mining models has become widely accepted in mining and
geotechnical applications, promising powerful new tools for practicing engineers (Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Tesfamariam and Liu 2010; Zhou et al. 2011, 2012, 2013; González-Rufino et al. 2013; Liu et al. 2013, 2014). Numerous approaches for PS prediction have been developed based on different SL techniques during recent decades (Tawadrous and Katsabanis 2007; Zhou et al. 2011; Wattimena et al. 2013; Ghasemi et al. 2014a, b). However, no comparison of SL techniques for PS estimation has been reported. Based on these considerations, the main objective of this study is to investigate the suitability of different SL algorithms for the prediction of pillar stability (PS) in underground engineering. To achieve this goal, a research methodology is developed for comparing the performance of different SL algorithms, including linear discriminant analysis (LDA), multinomial logistic regression (MLR), multilayer perceptron neural networks (MLPNN), support vector machine (SVM), random forest (RF), and gradient boosting machine (GBM). These methods are specifically chosen because they are being increasingly used in engineering, yet have not been compared with one another exhaustively, and because open-source implementations are available. The rest of this paper is organized as follows: Sect. 2 briefly presents the hard rock pillar data set and provides an overview of the SL techniques. In Sect. 3, these methods are applied to PS prediction, and in Sect. 4, the results are discussed in terms of the performance criteria. Finally, conclusions are provided in Sect. 5.
2 Materials and methods
2.1 Data sources and parameters
To measure the performance of the developed SL approaches, the data utilized in this study comprise 251 pillar case histories collected from more than 10 published research works. The sources are reliable and include references published over the period 1972–2011. Field data are obtained from ten different databases of hard rock mines: (1) Elliot Lake uranium mines in Canada (Hedley and Grant 1972); (2) Selebi-Phikwe mines in South Africa (Von Kimmelman et al. 1984); (3) open stope mines in Canada (Hudyma 1988); (4) Zinkgruvan mine in Sweden (Sjoberg 1992); (5) Westmin Resources Ltd.'s H-W mine in Canada (Lunder 1994); (6) Dawenkou gypsum mine in China (Liu and Zhai 2000); (7) Shizishan copper mine in China (Zheng 2002); (8) a marble mine in Spain (González-Nicieza et al. 2006); (9) Gaoshan gypsum mine in China (Sheng et al. 2010); and (10) stone mines in the USA (Esterhuizen et al. 2011). The general database is a continuation of the existing databases developed by Lunder (1994) and Zhou et al. (2011). Additional projects are added to the original sets from other available sources found in the literature, and an effort is also made to complete missing data fields within the pillar database by checking many sources and the published literature.
2.2 Data visualization
Figure 1a shows the number of cases used in this study after 1972 from different countries. The distribution of PS data is shown in Fig. 1b as a pie chart illustrating the proportion of the three types of PS in hard rock mines, categorized as stable (S, 108 cases), unstable (U, 54 cases), and failed (F, 89 cases). The boxplot of the original data set is given in Fig. 2.
Fig. 1 Distribution of observed hard rock pillar events
Fig. 2 Boxplot of each variable for the three conditions of PS
The circles represent outliers (observations greater than the third quartile plus 1.5 times the interquartile range or smaller than the first quartile minus 1.5 times the interquartile range). For most of the data groups, the median is not in the center of the box, which indicates that the distributions of most data groups are not symmetric (Fig. 2). In addition, all dependent variables have some outliers except pillar stress and pillar strength for all PS types, UCS for the S and U types, and pillar width for the U type. As shown in Fig. 3, the scatterplot matrix in the lower panel describes the pairwise relationships between parameters, with the corresponding correlation coefficients shown in the upper panel, whereas the marginal frequency distribution of each parameter is shown on the diagonal. It can be seen that the parameter UCS is notably correlated with pillar strength and pillar stress and that pillar height is notably correlated with pillar width.
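The outlier rule encoded by the boxplot circles can be reproduced in base R; the data below are a synthetic stand-in for any of the six variables:

```r
# Flag observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], the same rule
# the boxplot circles in Fig. 2 encode.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

set.seed(1)
x <- c(rnorm(50, mean = 100, sd = 5), 150)  # one planted outlier
iqr_outliers(x)
```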
2.3 Selection of input and output variables
As mentioned above, Zhou et al. (2011) developed their models based on ﬁve parameters including pillar width, pillar height, the ratio of the pillar width to its height, the uniaxial compressive strength of the rock, and pillar stress. Wattimena et al. (2013) applied their model for predicting coal pillar stability for given geometry (pillar width to height ratio) and stress conditions (pillar strength to stress ratio). Ghasemi et al. (2010) investigated the effect of variability in parameters such as uniaxial compressive strength of coal specimen, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a
Fig. 3 PS parameter interaction matrix. Pairwise relationships are in the lower triangle; correlation coefﬁcients are in the upper triangle; marginal distributions are on the diagonal
Monte Carlo simulation. Ghasemi et al. (2014a, b) used the major parameters contributing to pillar stability, such as mining height, depth of cover, entry width, panel width, pillar width, pillar length, crosscut angle, roof strength rating, and loading condition. Moreover, the UCS of the intact rock is used because it is an index that can be utilized in a simple way without carrying out a pillar strength estimation (Wattimena 2014). Pillar shape is expressed by the ratio of pillar width to pillar height (K), which accounts for the increased strength provided by the shape and confinement of the pillar. As noted above, various researchers have determined the strength of hard rock pillars from empirical formulas based on pillar width, pillar height, and UCS (Hedley and Grant 1972; Lunder 1994; Martin and Maybee 2000; Esterhuizen et al. 2011). On the other hand, the pillar stability graph typically involves two parameters: (1) the pillar stress to UCS ratio and the pillar width to height ratio K (Lunder and Pakalnis 1994; Martin and Maybee 2000; Wattimena et al. 2014) or (2) the pillar strength to stress ratio (SF) and K (Wattimena et al. 2013). Based on these considerations, six relevant indicators are adopted in this study. These are as follows:
pillar width, pillar height, K, the uniaxial compressive strength (UCS) of the rock, pillar strength, and pillar stress. These indicators are recognized as the major parameters for quantitatively characterizing pillar behavior. Theoretically, there may be additional indicators, but collecting the corresponding data poses a massive challenge to their applicability. Hence, the six indicators above are adopted, and combinations of these indicators are investigated to discover the effects of varying the inputs in the current study. In terms of pillar shape, pillar strength, and pillar load, a number of experiments are performed using different combinations of input parameters to assess the performance of the SL methods for PS, as listed in Table 1. Numerous scholars have proposed a variety of PS classification schemes. Across the cases in the combined database, pillar stability assessments range from a simple "Stable/Failed" assessment to more rigorous approaches based upon five- or six-stage stability classification methods (Hedley and Grant 1972; Von Kimmelman et al. 1984; Lunder 1994). Following a review of the combined database and the suggestion of Lunder (1994), the PS is classified into three stages that provide adequate resolution for the combined database, i.e., stable (S), unstable (U), and failed (F), as depicted in Table 2.
Table 1 Different models for PS prediction with different input parameters

Models A–D are each built from a different combination of the six input parameters: pillar height, pillar width, K, UCS, pillar strength, and pillar stress.
Table 2 Description of hard rock pillar stability classification for the combined database (modified from Lunder 1994)

Failed: Crushed (Hedley and Grant 1972); severe spalling, pronounced opening of joints, deformation of drill holes (Von Kimmelman et al. 1984); disintegration of pillar, blocks falling out, fractures through pillar with fracture apertures wider than 10 mm (Lunder 1994); pillars with manifested failure (González-Nicieza et al. 2006)

Unstable: Partially failed (Hedley and Grant 1972); prominent spalling (Von Kimmelman et al. 1984); fractures also appear on the central parts of the pillar (Krauland and Soder 1987); sloughing (Hudyma 1988); showing one or more of the following signs: cracking and spalling in development and raises within the rib pillar, audible noise in the pillar, deformed drill holes, excess muck being pulled from stopes, cracking of pillars, major displacements within the pillar (Potvin et al. 1989); pre-failure, severe spalling (Sjoberg 1992); corner breaking only, up to fracturing in pillar walls with fracture aperture up to 10 mm (Lunder 1994); stable pillars with incipient damage (González-Nicieza et al. 2006)

Stable: Minor spalling, no joint opening (Von Kimmelman et al. 1984); no sign of stress-induced fracturing (Lunder 1994)
2.4 Supervised learning classiﬁcation methods
As noted by Hastie et al. (2009), machine learning problems can be roughly categorized as either unsupervised learning or supervised learning. From a theoretical point of view, supervised and unsupervised learning differ only in the causal structure of the model (Kordon 2010). In unsupervised learning, all observations are assumed to be caused by latent variables; there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures. In supervised learning, the model defines the effect that one set of observations (the inputs) has on another, fully labeled set of observations (the outputs), and the goal is to predict the value of an outcome measure based on a number of input measures. Six classification techniques are considered in the current study. They share characteristics that make them interesting for the current analysis: (a) all of these methods are being increasingly used; (b) some of them have been used in PS classification tasks with good results and are known to enable the analysis of more complex, nonlinear relationships;
(c) they have efficient implementations; and (d) the resulting models allow for fast classification processing. The following six methods are compared with respect to their predictive performance. In this study, the feature vector X consists of six PS performance modifiers {X_1, X_2, X_3, X_4, X_5, X_6}, which correspond to the variables discussed in Sect. 3. The set of all feature vectors is denoted as H. Three PS states are defined, i.e., stable (S), unstable (U), and failed (F), as Z_i (i = 1, 2, 3), and each class Z_i is associated with a discriminant function f_i(x). Several published articles compare multiple SL techniques (e.g., Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Zhou et al. 2011; González-Rufino et al. 2013). Based on these articles and the focus herein on PS classification, only a brief description of each classification technique is presented; for a more in-depth discussion, the reader is referred to the relevant references.
2.4.1 Linear discriminant analysis (LDA)
LDA, proposed by Fisher (1936), is derived from the Bayes rule under the assumption that the patterns belonging to class $Z$ follow a normal (Gaussian) distribution with mean $\mu_Z$ and a nonsingular covariance matrix $\Sigma$ common to all classes (Zhou et al. 2010, 2011; González-Rufino et al. 2013). Under these hypotheses, the Bayes rule assigns a test pattern $x$ to the class $Z$ with the highest posterior probability $w(Z|x)$, given by

$$\log[w(Z|x)] = \log p_Z - \frac{1}{2}\left[\log|\Sigma| + (x - \mu_Z)^{T}\Sigma^{-1}(x - \mu_Z)\right] \qquad (1)$$

where $|\Sigma|$ is the determinant of $\Sigma$, or, equivalently, to the class $Z$ that maximizes the linear function

$$L_Z(x) = 2\log p_Z - \mu_Z^{T}\Sigma^{-1}\mu_Z + 2\mu_Z^{T}\Sigma^{-1}x \qquad (2)$$

The matrix $\Sigma$ is approximated by the within-class covariance matrix

$$W = (X - QM)^{T}(X - QM)/(N - M) \qquad (3)$$

where $X$ is the $N \times n$ training-set matrix, $Q$ is the $N \times M$ matrix of class indicators, and $M$ is the $M \times n$ matrix of class means; $n$, $N$, and $M$ denote the number of inputs, training patterns, and classes, respectively.
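A minimal sketch of fitting LDA with the lda function from the MASS package listed in Table 3, using R's built-in three-class iris data as a stand-in for the PS database:

```r
library(MASS)  # ships with R; provides lda()

# Fit LDA on a built-in three-class data set as a stand-in for the
# three PS classes; priors default to the class proportions.
fit <- lda(Species ~ ., data = iris)

# Posterior probabilities w(Z|x) and the maximum-posterior class:
pred <- predict(fit, iris)
head(pred$posterior)
mean(pred$class == iris$Species)  # resubstitution accuracy
```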
2.4.2 Multinomial logistic regression (MLR)
MLR is found to work well for multiclass classification (Sadat-Hashemi et al. 2005; Krishnapuram et al. 2005; Pandya et al. 2014). Let $x$ and $y$ be the matrices of predictors and the response with $Z$ levels, respectively. If $w_i = \Pr(y = i)$, $i = 1, 2, \ldots, Z$, then, considering a multinomial distribution for $y$, we have $\sum_{i=1}^{Z} w_i = 1$, so the $w_i$ can be predicted with an MLR consisting of $(Z-1)$ equations on $(Z-1)$ dummy variables $y_i$:

$$y_i = \begin{cases}1 & \text{if } y = i\\ 0 & \text{otherwise}\end{cases}, \qquad \ln\frac{w_i}{w_Z} = X_i\beta_i, \quad i = 1, 2, \ldots, Z-1 \qquad (4)$$

where $\beta_i$ is the regression coefficient for the $i$th equation and $w_i$ is the probability of obtaining the $i$th outcome. We can compute $w_1, \ldots, w_{Z-1}$ and $w_Z$ for each subject using these $(Z-1)$ equations. Each case is then allocated to the $j$th category of $y$ if $w_j = \max(w_1, w_2, \ldots, w_Z)$.
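A minimal sketch with the multinom function from the nnet package listed in Table 3, again using iris as a stand-in for the PS data:

```r
library(nnet)  # recommended package; provides multinom()

# Multinomial logistic regression: (Z - 1) logit equations against a
# baseline class, as in Eq. (4).
fit <- multinom(Species ~ ., data = iris, trace = FALSE)

# Class probabilities w_1 ... w_Z for each case; rows sum to 1.
w <- predict(fit, iris, type = "probs")

# Each case is allocated to the class with the largest probability.
cls <- predict(fit, iris)
mean(cls == iris$Species)  # resubstitution accuracy
```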
2.4.3 Random forest (RF)
The RF technique is based on the use of a large series of low-dimensional regression trees. Its theoretical development is presented by Breiman (2001), and the RF algorithm used here is implemented by Liaw and Wiener (2002). The main idea of the RF algorithm is to reduce the correlation among the trees, and thus improve the variance reduction of bagging, by growing trees that perform random selection on the input variables.

RF is very user-friendly in the sense that it has only two parameters, the number of variables in the random subset at each node (mtry) and the number of trees in the forest (ntree), and it is usually not very sensitive to their values (Kuhn and Johnson 2013).
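The de-correlation idea can be sketched with the recommended rpart package: bagged classification trees, each grown on a bootstrap sample with a random subset of mtry predictors. This simplifies RF's per-node variable selection to per-tree selection; the randomForest package in Table 3 is the implementation actually used in the paper.

```r
library(rpart)  # recommended package; single CART trees

# A deliberately tiny "random forest" sketch: bootstrap + random
# variable subsets de-correlate the trees, majority vote aggregates.
tiny_rf <- function(y_name, data, ntree = 25, mtry = 2) {
  preds <- setdiff(names(data), y_name)
  lapply(seq_len(ntree), function(b) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    vars <- sample(preds, mtry)  # random variable subset for this tree
    rpart(reformulate(vars, y_name), data = boot, method = "class")
  })
}

tiny_rf_predict <- function(trees, newdata) {
  votes <- sapply(trees, function(t)
    as.character(predict(t, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))  # majority vote
}

set.seed(42)
forest <- tiny_rf("Species", iris)
mean(tiny_rf_predict(forest, iris) == iris$Species)
```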
2.4.4 Artiﬁcial neural network (ANN)
ANN is a computational paradigm that provides a great variety of nonlinear mathematical models, useful for tackling different statistical problems (Haykin 1998; Pino-Mejías et al. 2010). In this work, the multilayer perceptron neural network (MLPNN) is employed (Pino-Mejías et al. 2010). Defining $G = (G_1, \ldots, G_M)$ as the vector of all $M$ coefficients of the net, and given $n$ $q$-sized target vectors $y_1, \ldots, y_n$, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) procedure (Bishop 1995) can be applied to the following nonlinear least-squares problem (Pino-Mejías et al. 2010; Kuhn and Johnson 2013):

$$\min_{G}\; g\sum_{i=1}^{M} G_i^{2} + \sum_{i=1}^{n} \left\| y_i - \hat{y}_i \right\|^{2} \qquad (5)$$

where $\hat{y}_i$ denotes the network output for the $i$th pattern. The R implementation of an MLPNN model requires the specification of two parameters: the size of the hidden layer ($H$) and the decay parameter ($g$); it must be remarked that greater values of $H$ could not be attempted owing to the limited memory resources of our personal computers (Hastie et al. 2009; González-Rufino et al. 2013).
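A minimal nnet fit showing the two tuning parameters (size = H, decay = g), again with the built-in iris data standing in for the PS database:

```r
library(nnet)  # recommended package; single-hidden-layer perceptron

# MLPNN with hidden-layer size H and weight-decay penalty g, the two
# tuning parameters discussed above.
set.seed(7)
fit <- nnet(Species ~ ., data = iris,
            size = 5,      # H: number of hidden neurons
            decay = 0.01,  # g: weight-decay penalty of Eq. (5)
            maxit = 200, trace = FALSE)

cls <- predict(fit, iris, type = "class")
mean(cls == iris$Species)  # resubstitution accuracy
```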
2.4.5 Support vector machine (SVM)
SVM is among the most recent significant developments in the field of discriminatory analysis (Vapnik 1995). In the case of non-separable data, the "ideal boundary" must be adapted to tolerate errors $\xi_i$ for some objects $i$:

$$\text{minimize } \frac{1}{2}\|d\|^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{subject to } y_i(b + d\cdot x_i) + \xi_i \geq 1 \text{ and } \xi_i \geq 0 \qquad (6)$$

where $C$ is the penalty parameter, $d$ and $b$ are, respectively, the normal vector and the bias of the hyperplane, and each $\xi_i$ corresponds to the distance between object $i$ and the corresponding margin hyperplane (Cortes and Vapnik 1995; Devos et al. 2009; Zhou et al. 2011, 2012, 2013).

To learn nonlinearly separable functions, the data are implicitly mapped to a higher-dimensional space by means of Mercer kernels, which can be decomposed into a dot product, $K(x_i, x_j) = \varphi(x_i)\cdot\varphi(x_j)$ (Zhou et al. 2012). The extensively used radial basis function kernel is given by

$$K(x_i, x_j) = \exp\left(-\sigma\left\| x_i - x_j \right\|^{2}\right) \qquad (7)$$

where $\sigma$ is the kernel parameter.
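The radial basis function kernel of Eq. (7) is easy to evaluate directly in base R; the sample matrix below is synthetic:

```r
# RBF kernel of Eq. (7): K(xi, xj) = exp(-sigma * ||xi - xj||^2).
rbf_kernel <- function(xi, xj, sigma = 1) {
  exp(-sigma * sum((xi - xj)^2))
}

rbf_kernel(c(1, 2), c(1, 2))  # identical points give exactly 1
rbf_kernel(c(1, 2), c(4, 6))  # distant points decay toward 0

# Kernel (Gram) matrix for a small synthetic sample:
set.seed(5)
x <- matrix(rnorm(12), nrow = 4)
K <- outer(seq_len(4), seq_len(4),
           Vectorize(function(i, j) rbf_kernel(x[i, ], x[j, ], sigma = 0.5)))
```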
2.4.6 Gradient boosting machine (GBM)
Friedman (2001) proposed the gradient boosting machine (GBM) using the connection between boosting and optimization. GBM builds the model in a stagewise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function (Guelman 2012; Kuhn and Johnson 2013). Using this technique, function approximation is viewed from the perspective of numerical optimization in the function space, rather than in the parameter space.
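The stagewise idea can be sketched in base R for the squared-error loss, using small rpart trees as base learners; this is an illustrative sketch on synthetic data only, and the paper's models use the gbm package listed in Table 3:

```r
library(rpart)

# Stagewise gradient boosting for L2 loss: each stage fits a small tree
# to the current residuals (the negative gradient) and adds a shrunken
# copy of its prediction to the running model.
set.seed(3)
x <- data.frame(x1 = runif(200), x2 = runif(200))
y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(200, sd = 0.1)

ntree <- 100; shrinkage <- 0.1        # shrinkage v as in Table 3
f_hat <- rep(mean(y), 200)            # initial constant model
for (b in seq_len(ntree)) {
  resid_b <- y - f_hat                # negative gradient of the L2 loss
  tree_b <- rpart(resid_b ~ x1 + x2, data = x,
                  control = rpart.control(maxdepth = 2))
  f_hat <- f_hat + shrinkage * predict(tree_b, x)
}
mean((y - f_hat)^2)  # training error shrinks as stages are added
```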
2.5 R Software
R (R Development Core Team 2013) is a popular open-source software environment for statistical computing and data visualization available for most mainstream platforms (http://www.R-project.org/). All data processing is performed using R software (version 3.0.2). R provides the most common SL classification methods; herein, we list only some of them. Further details about input parameters, implementation, and references can be found in the package documentation manuals. The packages required for each method and the functions utilized to build the models are summarized in Table 3.
3 Pillar stability assessment model development
3.1 Preparing training and testing data sets
In supervised learning, the performance of a classifier needs to be assessed on a given data set before it is used to predict the class of a new project. To do so, the original PS data set with known classes is randomly divided into two subsets: a training set and a test set. The training set is required to estimate model parameters and construct each PS classifier model. In this study, approximately 70 % of the available data (177 cases of the database) are used as the training data set. The test set serves as an external validation set for testing the performance and predictive power of each final model; here, the reserved 74 cases are used as the testing data set.
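The 70/30 split can be reproduced in base R; ps_data below is a hypothetical placeholder for the actual database:

```r
# Random split of the 251 pillar cases into a training set (177 cases,
# ~70 %) and a testing set (74 cases, ~30 %).
set.seed(123)
n <- 251
train_idx <- sample(n, size = 177)
ps_data <- data.frame(case = seq_len(n))  # placeholder for the real columns
train_set <- ps_data[train_idx, , drop = FALSE]
test_set  <- ps_data[-train_idx, , drop = FALSE]
c(train = nrow(train_set), test = nrow(test_set))  # 177 and 74
```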
To guarantee comparability, all classifiers are generated using the same training set and are validated by applying them to the same test data. Utilizing these classifiers, the classes of the
Table 3 R packages and functions used to run methods described in Sect. 3

Method   R package       R function      Tuning parameters   References
LDA      MASS            lda             None                Fisher (1936), Venables and Ripley (2002)
MLR      nnet            multinom        {g}                 Venables and Ripley (2002), Ripley (2009)
ANN      nnet            nnet            {L, g}              Haykin (1998), Pino-Mejías et al. (2010)
SVM      kernlab         svmRadial       {C, σ}              Vapnik (1995), Karatzoglou et al. (2004)
RF       randomForest    randomForest    {ntree, mtry}       Breiman (2001), Liaw and Wiener (2002)
GBM      gbm             gbm             {ntree, ν, J}       Friedman (2001), Ridgeway (2007)

Default function parameters are considered, optimized by tenfold CV whenever possible; mtry = number of variables to choose the best split; ntree = number of trees; L = number of hidden neurons in the MLP; g = decay parameter; C = penalization error coefficient; σ = width of the radial basis in the SVM function; ν = shrinkage; J = interaction.depth
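The repeated tenfold cross-validation indices mentioned in the table note can be constructed in base R as follows; packages such as caret automate this, and the seed here is illustrative:

```r
# Index construction for repeated tenfold cross-validation (10 folds,
# 10 repeats) over the 177 training cases.
make_folds <- function(n, k = 10) split(sample(n), rep_len(seq_len(k), n))

set.seed(2015)
repeats <- replicate(10, make_folds(177, 10), simplify = FALSE)

# Each repeat partitions all 177 cases into 10 disjoint validation folds:
lengths(repeats[[1]])
```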
PS are predicted and subsequently the misclassiﬁcation error rate is calculated with the
preferred classiﬁer being the one with the lowest misclassiﬁcation error rate. Scaling of the
input–output data is generally required prior to processing. All input variables are scaled
with function preProcess, which can be used to impute data sets based only on information
in the training set (Kuhn 2008).
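A base R sketch of this training-set-only scaling, which is what caret's preProcess does for its "center" and "scale" methods; the matrices below are hypothetical stand-ins for the PS inputs:

```r
# Center and scale using statistics computed on the training set only,
# then apply the same transform to the test set.
set.seed(9)
train_x <- matrix(rnorm(60, mean = 50, sd = 10), ncol = 3)
test_x  <- matrix(rnorm(15, mean = 50, sd = 10), ncol = 3)

mu <- colMeans(train_x)
sdev <- apply(train_x, 2, sd)
train_scaled <- scale(train_x, center = mu, scale = sdev)
test_scaled  <- scale(test_x,  center = mu, scale = sdev)  # training stats reused

round(colMeans(train_scaled), 10)  # ~0 by construction
```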
3.2 Evaluation of classiﬁer’s performance
There is no generally accepted measure of performance for multiclass models. Thus, the
predictive power of SL algorithms on PS data is evaluated by means of the overall
accuracy (OA, Foody 2002) of the classiﬁcation and the Cohen’s kappa coefﬁcient (Kappa,
Cohen 1960) in this study. For each classiﬁcation, a confusion matrix is presented, along
with its user’s and producer’s accuracy (Congalton and Green 2009). Let r _{i}_{j} (i and j = 1, 2,
…, m) is the joint frequency of observations assigned to class i by prediction and to class j
by observation data, r _{i}_{?} is the total frequency of class i as derived from the prediction, and
r _{?} _{j} is the total frequency of class j as derived from the observation data, as indicated in
Table 4. The OA, which is deﬁned as the percentage of records that is correctly predicted
by the model relative to the total number of records among the classiﬁcation models, is a
primary evaluation criterion. The OA can be obtained by
OA ¼
1
n
R ^{m}
i¼1 ^{r} ii
100%
ð8Þ
Cohen's kappa coefficient measures the proportion of correctly classified units after the probability of chance agreement has been removed; it is a robust index that takes into account the probability that a case is classified by chance (Kuhn and Johnson 2013). The kappa is therefore always slightly lower than the classification accuracy rate and can be obtained from the following expression
$$\mathrm{Kappa} = \frac{n\sum_{i=1}^{m} r_{ii} - \sum_{i=1}^{m} r_{i+}\, r_{+i}}{n^{2} - \sum_{i=1}^{m} r_{i+}\, r_{+i}} \qquad (9)$$
where $r_{ii}$ is the cell count on the main diagonal, $n$ is the number of examples, $m$ is the number of class values, and $r_{+i}$ and $r_{i+}$ are the column and row total counts, respectively.
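Equations (8) and (9) can be checked with a short plain-Python sketch; the confusion matrix below is an illustrative example, not one of the paper's results:

```python
# Overall accuracy (Eq. 8) and Cohen's kappa (Eq. 9) from a confusion
# matrix r, where r[i][j] counts cases predicted as class i and observed
# as class j. Pure-Python sketch; the example matrix is illustrative.

def overall_accuracy(r):
    """OA = (1/n) * sum of diagonal counts, in percent."""
    n = sum(sum(row) for row in r)
    return 100.0 * sum(r[i][i] for i in range(len(r))) / n

def cohens_kappa(r):
    """Kappa = (n * sum r_ii - sum r_i+ r_+i) / (n^2 - sum r_i+ r_+i)."""
    m = len(r)
    n = sum(sum(row) for row in r)
    diag = sum(r[i][i] for i in range(m))
    row_tot = [sum(r[i][j] for j in range(m)) for i in range(m)]  # r_{i+}
    col_tot = [sum(r[i][j] for i in range(m)) for j in range(m)]  # r_{+j}
    chance = sum(row_tot[i] * col_tot[i] for i in range(m))
    return (n * diag - chance) / (n * n - chance)

r = [[23, 1, 2],
     [3, 9, 4],
     [0, 6, 26]]
```

For a perfectly diagonal matrix, `cohens_kappa` returns exactly 1, and for agreement at chance level it returns 0, matching the interpretation given in the text.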
Table 4 Population confusion matrix, with r_ij representing the proportion of cases in prediction category i and observation category j

Predicted   Observed
            1                    2                    …   m                    Total   UA (%)
1           r_11                 r_12                 …   r_1m                 r_1+    (r_11/r_1+) × 100 %
2           r_21                 r_22                 …   r_2m                 r_2+    (r_22/r_2+) × 100 %
…           …                    …                    …   …                    …       …
m           r_m1                 r_m2                 …   r_mm                 r_m+    (r_mm/r_m+) × 100 %
Total       r_+1                 r_+2                 …   r_+m                 n
PA (%)      (r_11/r_+1) × 100 %  (r_22/r_+2) × 100 %  …   (r_mm/r_+m) × 100 %

PA producer's accuracy, UA user's accuracy
Table 5 Relative strength of agreement associated with the kappa statistic

Kappa statistic    Strength of agreement
0.81–1.00          Almost perfect
0.61–0.80          Substantial
0.41–0.60          Moderate
0.21–0.40          Fair
0.00–0.20          Slight
−1.00–0.00         Poor
The kappa measures the correct classification rate after the probability of chance agreement has been removed (Congalton 1991). Landis and Koch (1977) proposed a scale to describe the degree of concordance (Table 5): the kappa ranges from −1 (total disagreement) through 0 (random classification) to 1 (perfect agreement). As can be seen from Table 5, a kappa below 0.4 indicates poor agreement, whereas a value of 0.4 or above indicates good agreement (Landis and Koch 1977; Sakiyama et al. 2008).
According to Congalton and Green (2009), the producer's accuracy of class $i$ ($\mathrm{PA}_i$) can be computed by
$$\mathrm{PA}_{i} = \frac{p_{ii}}{p_{+i}} \times 100\,\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ji}} \times 100\,\% \qquad (10)$$
and the user's accuracy of class $i$ ($\mathrm{UA}_i$) can be computed by
$$\mathrm{UA}_{i} = \frac{p_{ii}}{p_{i+}} \times 100\,\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ij}} \times 100\,\% \qquad (11)$$
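Equations (10) and (11) translate directly into code; again the matrix values below are illustrative only:

```python
# Producer's accuracy (Eq. 10) and user's accuracy (Eq. 11) per class,
# from a confusion matrix p with rows = predicted, columns = observed.
# Pure-Python sketch; the example matrix is illustrative.

def producers_accuracy(p, i):
    """PA_i = p_ii / column total of class i, in percent."""
    col_total = sum(p[j][i] for j in range(len(p)))
    return 100.0 * p[i][i] / col_total

def users_accuracy(p, i):
    """UA_i = p_ii / row total of class i, in percent."""
    row_total = sum(p[i])
    return 100.0 * p[i][i] / row_total

p = [[23, 1, 2],
     [3, 9, 4],
     [0, 6, 26]]

pa = [producers_accuracy(p, i) for i in range(3)]
ua = [users_accuracy(p, i) for i in range(3)]
```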
3.3 Validation method of the proposed models
Several adjustable ‘‘tuning parameters’’ used by each of the SL method to optimize
classiﬁcation performance are examined using repeated tenfold crossvalidation (CV) in
terms of computation time and variance, which is the number of folds recommended by
Kohavi (1995) when comparing the performance of machine learning algorithms (Kohavi
1995; LeThiThu et al. 2011; Clark 2013; Kuhn and Johnson 2013). In this procedure,
compounds of the training data are randomly divided into 10 subsets. Nine subsets are used
as novel ‘‘training data’’ to develop each SL method, and the holdout set is used for
‘‘predict’’ the performance of the ﬁtted model. This process is repeated 10 times on
different training subsets, and at the end, every instance has been used exactly once for
testing, and ﬁnally, the CV estimate of overall accuracy is calculated by simply averaging
the 10 individual accuracy measures for CV accuracy, and the whole tenfold CV process is
also repeated 10 times (‘‘folds’’) to obtain the reliable results. This procedure is used for
the selection of parameters and to avoid overﬁtting of models. The test set is never used in
the development of the model, but it is used to test the predictive power of the ﬁnal model.
Thus, the repeated tenfold CV resampling technique (Molinaro et al. 2005) is used to
create and optimize SL models for hard rock pillars classiﬁcation in the present work. We
construct the predictive models using selected variables and training set and applied to test
set as shown in Fig. 4.
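The repeated tenfold CV procedure described above can be sketched as follows (plain Python; the fold-assignment scheme and the `evaluate` callback are illustrative assumptions, not caret's exact implementation):

```python
# Repeated tenfold cross-validation sketch: the training indices are
# shuffled and split into 10 folds; each fold serves once as the hold-out
# set, and the whole procedure is repeated (here 10 times) with fresh
# shuffles. `evaluate` stands in for fitting a model on the training folds
# and scoring it on the hold-out fold.
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]     # k roughly equal folds
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f
                     for i in fold]
            yield train, test

def cv_accuracy(evaluate, n_samples, k=10, repeats=10, seed=0):
    """Average the per-fold scores over all k * repeats splits."""
    scores = [evaluate(tr, te)
              for tr, te in repeated_kfold(n_samples, k, repeats, seed)]
    return sum(scores) / len(scores)
```

Within one repetition, every index appears in exactly one hold-out fold, which is the property the text relies on when averaging the 10 fold scores.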
Fig. 4 Overall procedure ﬂowchart for performance evaluation for PS classiﬁcation using SL methods
3.4 SL method development and parameter optimization
This study examines the suitability of six common classification algorithms on the PS data set: the LDA, MLR, MLPNN, SVM, GBM, and RF algorithms. Most classifiers (the MLR, ANN, SVM, RF, and GBM methods) include parameters that have to be tuned. The "train" function from the caret (classification and regression training) package within R (Kuhn 2008) evaluates a grid of tuning parameters for a number of classification routines, which allows a single consistent environment for training each of the SL methods and tuning their associated parameters. After assessing the optimal parameters, the whole training data set is used to build the final model for PS prediction; the term "PS" refers to the classification task. A desired "tune length" variable can be passed to the "train" function in the caret package (Kuhn 2012; Kuhn and Johnson 2013). "Optimal" values for the tuning parameters are selected using repeated tenfold CV on the original training data set, with the test set removed completely from the CV process. Tuning parameters are considered optimized when the classification model achieves the largest value of kappa during the CV process; in this way, the parameter combination with the highest accuracy and kappa is identified. Specific details on the tuning parameters used by the six SL algorithms examined in the current study are listed below, and the final results (classification rate, kappa, and tuning parameters for each algorithm) are presented in Table 6.
• LDA: This classiﬁer needs no tuning of hyperparameters.
• MLR: The parameter for weight decay (decay) is tuned for 10 values (0, 1e−04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1) to find the optimal model.
• RF: Tuning of the RF method involves finding optimum values for the number of classification trees (n_tree) and the number of variables (m_try) randomly selected at each split in the tree-building process. It has been observed that the OA is more sensitive to m_try and not much affected by n_tree (Breiman and Cutler 2007). Therefore, n_tree is fixed at the default value of 500, and m_try is tested for t values, where t is the number of input variables in each classification setup.

Table 6 Tuning parameters of each model for an optimal classification

Method   Model A                         Model B                          Model C                         Model D
LDA      None                            None                             None                            None
MLR      g = 0.1000                      g = 0.1000                       g = 0.0001                      g = 0.1000
RF       {m_try, n_tree} = {2, 500}      {m_try, n_tree} = {1, 500}       {m_try, n_tree} = {1, 500}      {m_try, n_tree} = {4, 500}
ANN      {L, g} = {15, 0.0421}           {L, g} = {19, 0.0075}            {L, g} = {17, 0.0074}           {L, g} = {19, 0.0075}
SVM      {C, r} = {16, 0.552}            {C, r} = {16, 0.563}             {C, r} = {64, 1.187}            {C, r} = {16, 0.622}
GBM      {n_tree, v, J} = {400, 0.1, 2}  {n_tree, v, J} = {150, 0.1, 10}  {n_tree, v, J} = {50, 0.1, 10}  {n_tree, v, J} = {500, 0.1, 6}
• ANN: The number L of hidden neurons is tuned over ten values in the range 1 ≤ L ≤ 19, together with 10 values of the weight decay parameter: decay = {0, 1e−04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1}.
• SVM: The parameter C is tuned for 12 values (2^−2, 2^−1, 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, and 2^9) to find the optimal model. The "caret" package initially estimates an approximate value for the sigma parameter using the "sigest" function based on the training data.
• GBM: The GBM has three tuning parameters: the total number of iterations (n.trees), the learning rate (shrinkage parameter v), and the complexity of the tree (indexed by interaction.depth J). The tuning parameter "shrinkage" is held constant at a value of 0.1, with n.trees = 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 and J = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
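The kappa-driven grid search that caret's train performs over such grids can be sketched generically; the `cv_kappa` callback and the toy scorer below are placeholders, not the paper's actual models:

```python
# Select tuning parameters by cross-validated kappa, as caret's train()
# does: score every grid point and keep the combination with the largest
# mean kappa. `cv_kappa` stands in for "fit the model with these
# parameters and return the mean kappa over repeated tenfold CV"; the
# grid mimics the paper's GBM grid (shrinkage fixed at 0.1).
from itertools import product

def tune(grid, cv_kappa):
    """grid: dict of parameter name -> list of candidate values."""
    names = sorted(grid)
    best_params, best_kappa = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        kappa = cv_kappa(params)
        if kappa > best_kappa:
            best_params, best_kappa = params, kappa
    return best_params, best_kappa

gbm_grid = {"n_trees": [50, 100, 150, 200, 250, 300, 350, 400, 450, 500],
            "interaction_depth": list(range(1, 11)),
            "shrinkage": [0.1]}

# Toy stand-in scorer just to exercise the search; it peaks at 400 trees
# and J = 2, roughly matching the model A optimum reported in Table 6.
toy = lambda p: (0.7 - abs(p["n_trees"] - 400) / 1000
                 - abs(p["interaction_depth"] - 2) / 100)

best, kappa = tune(gbm_grid, toy)
```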
4 Results and discussions
4.1 Classiﬁcation results achieved by classiﬁers
Average values obtained from the 10 repetitions of tenfold CV are used for all comparisons between methods. A visual comparison of the performance of all SL classification methods is given in Figs. 5 and 6. The boxplots in Fig. 5 report the performance of the six classifiers with different input variables in the repeated tenfold CV phase, in terms of average accuracy and variance on the training set. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually as dotted points. Since the notches in the box plots overlap, we cannot conclude, at the usual confidence level, that the true medians differ (see Fig. 5). The difference between the lower and upper quartiles for RF (Fig. 5a, c) is comparatively smaller than for the other methods, indicating a relatively low variance of accuracies across iterations. Density plots can be used to visualize the resampling distributions (Kuhn and Johnson 2013).
Fig. 5 Boxplot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure
Fig. 6 Density plot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure
Table 7 Performance metrics of each model for test data

Method   Model A           Model B           Model C           Model D
         OA (%)   Kappa    OA (%)   Kappa    OA (%)   Kappa    OA (%)   Kappa
LDA      67.8     0.461    59.5     0.345    63.5     0.389    64.9     0.424
MLR      66.2     0.436    63.5     0.392    63.5     0.388    63.5     0.400
RF       82.4     0.723    75.7     0.609    71.6     0.543    73.0     0.576
ANN      81.1     0.703    75.7     0.611    75.7     0.613    75.7     0.615
SVM      82.4     0.726    68.9     0.502    71.6     0.550    79.7     0.684
GBM      79.7     0.678    77.0     0.636    66.2     0.474    75.7     0.616

Bolded values indicate the highest value for each model
Figure 6 illustrates density plots of the 200 bootstrap estimates of accuracy and kappa for the final models. Table 7 summarizes the overall predictability on the test set by comparing two measures, OA and kappa, across the six classifiers. The kappa is a measure of true accuracy that takes into account the agreement that may have occurred by chance; it is considered preferable when it is larger than 0.4 (Landis and Koch 1977). Not surprisingly, the linear methods LDA and MLR did not do well here, likely because of their inability to handle nonlinear class boundaries. For model A, in terms of average accuracy rate on the training set, the SVM predictor achieved the highest OA (84.3 %), followed by ANN, GBM, and RF with average accuracy rates of 83.2, 80.9, and 77.8 %, respectively; MLR performed relatively poorly with an average accuracy rate of 69.3 %, and LDA had the lowest average accuracy rate of 67.0 %. On the test set, however, RF and SVM achieved the highest OA (82.4 %), ANN performed slightly worse with an OA of 81.1 %, and MLR had the lowest OA (66.2 %). On the other hand, the kappa values of the LDA, MLR, RF, ANN, SVM, and GBM techniques range from moderate to substantial for both the training and test sets on the basis of the scale of concordance presented by Landis and Koch (1977). For model B, the ANN predictor achieves the highest OA (80.3 %) on the training set, followed by RF, GBM, and SVM with average accuracy rates of 79.1, 78.4, and 76.9 %, respectively; LDA performs relatively poorly with an average accuracy rate of 65.7 %, and MLR has the lowest average accuracy rate of 65.3 %. On the test set, however, GBM achieves the highest OA (77.0 %), followed by RF and ANN (both 75.7 %), while LDA has the lowest OA (59.5 %). The kappa values of the six techniques for the training set range from moderate to substantial, and the accuracies of all modeling techniques for the test set range from fair to substantial. Similar results for models C and D can be seen in Fig. 5 and Table 7. Overall, for the six SL techniques, the performance (in terms of accuracy) on the training set falls into the range of 65.3–84.3 % across the four models, while the performance on the test set falls into the range of 59.5–82.4 %; the predictive accuracy of the six techniques on the training set ranges from moderate to substantial, and that on the test set from fair to substantial.
4.2 Comparison of SL classiﬁcation techniques
Out of the four models in Table 7, the results indicate the best performance for SVM with model A, which uses five parameters (pillar height, pillar width, UCS, pillar strength, and pillar stress) to train the SVM. A classification accuracy of 84.3 % is achieved with the RBF kernel; out of the 74 test data records, 17 are incorrectly classified (Table 8). As can be seen from Table 8, model A also provides the best test results with RF when all input parameters are used, and RF produced the best outcome in terms of OA and kappa for the test set (Table 8). Higher classification accuracy on the training set compared with the test set does not necessarily indicate better generalization capability; it may be due to overtraining (memorizing the training set). A comparison of the results from models A and D suggests that the contribution of pillar strength is somewhat sensitive (Fig. 5; Table 8). However, there are very few significant differences in OA and kappa between models B and C, and the combinations of five (model A) or four (model D) input parameters give better results than the combinations of three input parameters (models B and C), suggesting that the five-input combination (model A) is the best choice.
To quantify the accuracy of the classifications, an accuracy assessment based on a confusion (error) matrix is carried out using the 74 independent reference samples. Four metrics (Congalton and Green 2009) are calculated for each PS class of model A from the confusion matrix presented in Table 8: (1) OA, (2) kappa, (3) producer's accuracy (PA), and (4) user's accuracy (UA). The PA and UA indicate that some classes are better classified than others. As can be seen from Table 8, "Stable" (PA 90.6–96.9 % and UA 66.0–93.5 %) is classified more accurately than "Unstable" (0–56.3 % and 0–69.2 %), and "Failed" also receives a relatively low PA (69.2–92.3 %) and UA (69.2–88.5 %). "Unstable" is often confused with "Stable" and "Failed," likely owing to overlapping classification rules and the small number of samples in this class. The results show that some SL methods substantially outperform others for this classification problem. It is clear that SVM and RF are both capable of achieving high accuracy for all classes despite the heavily unbalanced data set. In particular, SVM and RF classification with the combination of five features is very effective and yields the best classification results, with an OA of 82.4 %.
As can be observed in Fig. 6, the kappa distributions of LDA and MLR appear similar to one another but different from those of the ANN, SVM, RF, and GBM methods. There are very few statistically significant differences between the latter methods (ANN, SVM, RF, and GBM) according to the resampling results; given this, any of them would be a reasonable choice. Since the models are fit on the same versions of the training data, it makes sense to make inferences on the differences between models. From the above observations, the predictions vary considerably across the different SL methods, and none of the methods is excellent with respect to every test measure; however, SVM and RF, and particularly SVM, perform better than ANN and GBM. There is no significant difference in terms of generalization performance, but if one method must be chosen, SVM is the best. On the other hand, GBM and ANN are the most computationally intensive techniques and take the longest time to train; the high classification accuracy of ANN likely results from the computationally intensive backpropagation process, during which the feature weights are modified iteratively. In addition, LDA performs the worst on the training set, with predictive performance below even that of MLR, whereas MLR performs the worst on the test set, with predictive performance below that of LDA.
Table 8 Confusion matrices and associated classifier accuracies for best model predictions based on test data of hard rock pillars (classes S = Stable, U = Unstable, F = Failed; observed totals S = 32, U = 16, F = 26 of the 74 test records). Overall metrics: SVM OA = 82.4 %, kappa = 0.726; RF OA = 82.4 %, kappa = 0.723; ANN OA = 81.1 %, kappa = 0.703; GBM OA = 79.7 %, kappa = 0.678; LDA OA = 67.8 %, kappa = 0.461; MLR OA = 62.2 %, kappa = 0.436. Diagonal elements (correct decisions) are marked in bold. OA overall classification accuracy, PA producer's accuracy, UA user's accuracy
Furthermore, Lunder and Pakalnis (1997) proposed that pillar stability can be adequately expressed by two safety factor (SF) lines. Pillars with SF > 1.4 are stable, while those with SF < 1 have failed; the transition zone from the stable to the failed condition (1 < SF < 1.4) is referred to as unstable, and pillars in this region are prone to spalling and slabbing but have not completely failed (Lunder and Pakalnis 1997; Martin and Maybee 2001). Similarly, González-Nicieza et al. (2006) established a classification of pillars in marble mines, suggesting SF > 1.25 (stable), 0.90 < SF < 1.25 (unstable), and SF < 0.90 (failed). The predictive accuracies of the two methods on the original data are 68.9 % (Lunder and Pakalnis 1997) and 68.5 % (González-Nicieza et al. 2006), respectively. Judging from these predictive accuracies, it is obvious that the empirical methods cannot generate satisfactory predictions for these pillar instances.
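The two empirical safety-factor rules quoted above translate directly into simple classification functions (a sketch of the published thresholds only, not of the SL models):

```python
# Empirical pillar stability rules as classification functions.
# Lunder and Pakalnis (1997): SF > 1.4 stable, SF < 1.0 failed,
# and the transition zone in between is unstable.
# Gonzalez-Nicieza et al. (2006): SF > 1.25 stable, SF < 0.90 failed,
# otherwise unstable.

def lunder_pakalnis(sf):
    if sf > 1.4:
        return "stable"
    if sf < 1.0:
        return "failed"
    return "unstable"

def gonzalez_nicieza(sf):
    if sf > 1.25:
        return "stable"
    if sf < 0.90:
        return "failed"
    return "unstable"
```

Because each rule partitions pillars by a single scalar SF, it cannot capture the nonlinear class boundaries that the SL methods learn from multiple features, which is consistent with the lower accuracies (68.9 and 68.5 %) reported above.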
4.3 Strengths and limitations
The primary strength of this study is the systematic assessment of pillar stability classification in hard rock mines using six SL methods. A powerful statistical programming system is needed to implement this computing task; a good, inexpensive choice is the R system, which offers free implementations of the SL classification methods.
Although this study reveals important findings, it also has limitations. First, training any classifier with a data set unbalanced in favor of negative instances makes it difficult to learn the positive instances; the unbalanced distribution of prior probabilities of the three classes in both the training and test sets affects the reliability of the predictor in all SL approaches. Second, other relevant pillar stability variables could be collected in an effort to increase the prediction accuracy of the models. Third, regarding the comparison of more SL methods, other newly developed classification methodologies could be employed and their results compared with those studied in this paper. Additionally, discontinuities and joint factors have been omitted in this study. Every approach has its advantages and disadvantages; Table 9 summarizes the strengths and weaknesses of the six SL methods described above.
4.4 Relative importance of variables
The generic function varImp() in the caret package can be used to characterize the general effect of predictors on the model (Kuhn and Johnson 2013). varImp also works with objects produced by train, for which it is a simple wrapper around the model-specific methods listed previously. In this work, we illustrate how to determine the relative importance of discriminating features with the RF method. For most classification methods, each predictor has a separate variable importance for each class, and the default model-independent variable importance metric is the area under the ROC curve (AUC) computed with regard to each predictor.
Variables are sorted by average importance across the classes in this work. Figure 7 shows how important the variables are for PS classification with the RF method. Figure 7a demonstrates that pillar stress is the most sensitive factor among the indicators for the prediction of PS classification; not surprisingly, the indicator pillar strength takes second place in sensitivity. The index of pillar width is somewhat sensitive, while the factors UCS and pillar height are not as sensitive as the former three. Figure 7b, c also provides the results of the RF method using the function "varImp()" in the "caret" package and displays the relative variable importance for each of the three predictor variables: pillar stress is again the most sensitive factor among all response variables for models B and C, followed by pillar strength, K, and UCS. Figure 7d shows the differing results between models A and D for the variable importance of UCS and pillar width. Overall, these results demonstrate that pillar stress is the most relevant predictor among the indicators for the prediction of PS classification.

Table 9 Summary of strengths and weaknesses of the various SL methods

LDA
Strengths: simple, fast, efficient; strong theoretical basis; linear; interpretable.
Weaknesses: performs well only when all classes are strictly homogeneous; cannot be used if the number of variables is higher than the total number of samples.

MLR
Strengths: extracts more information from the data and prevents the loss of information due to collapsing.
Weaknesses: computationally intensive.

RF
Strengths: works well with high-dimensional small sample sizes; some tolerance to correlated inputs; fast computation.
Weaknesses: difficult to interpret; prone to overfitting on certain data sets; does not handle large numbers of irrelevant features as well as other ensemble methods.

SVM
Strengths: can be used to classify complex nonlinear (e.g., biological) data; not prone to local minima; works well with high-dimensional small data sets; avoids overfitting; robust to noise.
Weaknesses: very much a black box; limited computational scalability; lack of transparency; restricted to pairwise classification; cannot be used directly for feature selection.

ANN
Strengths: nonlinear adaptability; no assumptions required on probability density or distribution; certain configurations have been proven to be universal approximators.
Weaknesses: difficult to design an optimal architecture; high computational cost; not robust to outliers; loss of generality; risk of overfitting; prone to suboptimal local minima; inability to extract the features responsible for results; black-box nature presents uncertainties for mission-critical applications.

GBM
Strengths: good theoretical properties; identification of outliers.
Weaknesses: high computational cost; not interpretable.
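A model-independent way to approximate the kind of ranking varImp reports is permutation importance: shuffle one predictor at a time and measure the drop in accuracy. A minimal sketch with a toy scorer (not the paper's caret/RF computation; all names and data here are illustrative):

```python
# Permutation-style variable importance: shuffle one column at a time and
# record how much the model's accuracy drops. `score` stands in for
# "accuracy of the fitted model on (X, y)".
import random

def permutation_importance(X, y, score, seed=0):
    rng = random.Random(seed)
    base = score(X, y)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [col[i]] + row[j + 1:]
                  for i, row in enumerate(X)]
        importances.append(base - score(X_perm, y))
    return importances

# Toy model: the class is decided entirely by feature 0, so permuting
# feature 0 should hurt accuracy while permuting feature 1 should not.
X = [[0, 5], [0, 7], [1, 5], [1, 7], [0, 6], [1, 6]]
y = [0, 0, 1, 1, 0, 1]
score = lambda X, y: sum(int(row[0] == lab)
                         for row, lab in zip(X, y)) / len(y)
imp = permutation_importance(X, y, score)
```

An irrelevant predictor gets an importance of exactly zero under this scheme, which mirrors the intuition behind the ranking in Fig. 7.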
5 Summary and conclusions
A data set of 251 pillar cases compiled from published research work is utilized to construct the proposed models, and the performances of six SL classifiers for the prediction of PS in hard rock mines are compared. Based on the analysis results, the following conclusions can be drawn. First, none of the SL classification methods should be used blindly, as none of them is a fully automatic classification method. For the PS data set, the use of a repeated tenfold CV strategy for selecting appropriate parameters in the tuning process is necessary, and multiple random splits into training and test sets are also needed for a reliable model comparison. Among the six SL methods, SVM and RF are found to be the best; the nonlinear classifiers ANN and GBM have slightly higher performance and reliability than the linear classifiers LDA and MLR. The comparisons indicate that model A, consisting of five input variables (pillar width, pillar height, UCS, pillar strength, and pillar stress) with the SVM and RF methods, is more reliable for evaluating pillar stability than the other models. RF demonstrates that pillar stress is the most relevant PS predictor for all models A, B, C, and D. Finally, for the six SL classifiers studied, the performance (in terms of accuracy) on the training set falls in the range of 65.3–84.3 % across the four models with different input parameter combinations, while the performance on the test set falls into the range of 59.5–82.4 %.

Fig. 7 Variable importance assessment in each model for predicting PS with RF method
Acknowledgments This research was partially supported by the National Natural Science Foundation Project (Grant Nos. 11472311 and 41272304) of China, the Graduated Students' Research and Innovation Fund Project (Grant No. CX2011B119) of Hunan Province of China, the project (Grant No. 134376140000022) supported by the Scholarship Award for Excellent Doctoral Students of the Ministry of Education of China, and the Valuable Equipment Open Sharing Fund of Central South University. The authors would like to express thanks to these foundations. The first author would like to thank the Chinese Scholarship Council for financial support of the joint PhD at McGill University, Canada. We also would like to thank the three anonymous referees and the editors for their valuable comments and suggestions, which improved a previous version of this manuscript.
References
Berrueta LA, Alonso-Salces RM, Héberger K (2007) Supervised pattern recognition in food analysis. J Chromatogr A 1158(1–2):196–214
Bieniawski ZT (1968) The effects of specimen size on the compressive strength of coal. Int J Rock Mech Min Sci Geomech Abstr 5(4):325–335
Bieniawski ZT, Van Heerden WL (1975) The significance of in situ tests on large rock specimens. Int J Rock Mech Min Sci Geomech Abstr 12(4):101–113
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, New York
Brady B, Brown ET (2003) Rock mechanics for underground mining, 2nd edn. Chapman and Hall, London
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Cutler A (2007) Random forests—classification description. http://statwww.berkeley.edu/users/breiman/RandomForests/cc_home.htm. Accessed 15 Jan 2014
Cauvin M, Verde T, Salmon R (2009) Modeling uncertainties in mining pillar stability analysis. Risk Anal 29(10):1371–1380
Clark M (2013) An introduction to machine learning: with applications in R. http://www3.nd.edu/
Congalton RG, Green K (2009) Assessing the accuracy of remotely sensed data: principles and practices, 2nd edn. Lewis, Boca Raton
Deng J, Yue ZQ, Tham LG, Zhu HH (2003) Pillar design by combining finite element methods, neural networks and reliability: a case study of the Feng Huangshan copper mine, China. Int J Rock Mech Min Sci 40(4):585–599
Elmo D, Stead D (2010) An integrated numerical modelling–discrete fracture network approach applied to the characterisation of rock mass strength of naturally fractured pillars. Rock Mech Rock Eng 43(1):3–19
Esterhuizen GS (1993) Variability considerations in hard rock pillar design. In: Proceedings of the SANGORM symposium: rock engineering problems related to hard rock mining at shallow to intermediate depth, Rustenburg, South Africa
Esterhuizen GS, Dolinar DR, Ellenberger JL (2011) Pillar strength in underground stone mines in the United States. Int J Rock Mech Min Sci 48(1):42–50
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Garzón MB, Blazek R, Neteler M, de Dios RS, Ollero HS, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 197:383–393
Ghasemi E, Shahriar K (2012) A new coal pillars design method in order to enhance safety of retreat mining in room and pillar mines. Saf Sci 50:579–585
Ghasemi E, Shahriar K, Sharifzadeh M, Hashemolhosseini H (2010) Quantifying the uncertainty of pillar safety factor by Monte Carlo simulation—a case study. Arch Min Sci 55:623–635
Ghasemi E, Ataei M, Shahriar K (2014a) An intelligent approach to predict pillar sizing in designing room and pillar coal mines. Int J Rock Mech Min Sci 65:86–95
Ghasemi E, Ataei M, Shahriar K (2014b) Prediction of global stability in room and pillar coal mines. Nat Hazards 1–18
González-Nicieza C, Álvarez-Fernández MI, Menéndez-Díaz A, Álvarez-Vigil AE (2006) A comparative analysis of pillar design methods and its application to marble mines. Rock Mech Rock Eng 39(5):421–444
González-Rufino E, Carrión P, Cernadas E, Fernández-Delgado M, Domínguez-Petit R (2013) Exhaustive comparison of colour texture features and classification methods to discriminate cells categories in histological images of fish ovary. Pattern Recogn 46(9):2391–2407
Griffiths DV, Fenton GA, Lemons CB (2002) Probabilistic analysis of underground pillar stability. Int J Numer Anal Meth Geomech 26(8):775–791
Griffiths DV, Fenton GA, Lemons CB (2007) The random finite element method (RFEM) in mine pillar stability analysis. Probabilist Methods Geotech Eng 491:271–294
Guelman L (2012) Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Syst Appl 39:3659–3667
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, New Jersey
Hedley DGF, Grant F (1972) Stope-and-pillar design for the Elliot Lake uranium mines. Bull Can Inst Min Metall 65:37–44
Hoek E, Brown ET (1980) Underground excavation in rock. The Institute of Mining and Metallurgy, London
Hudyma MR (1988) Rib pillar design in open stope mining. MASc thesis, University of British Columbia
Hustrulid WA (1976) A review of coal pillar strength formulas. Rock Mech 8(2):115–145
Hutchinson DJ, Phillips C, Cascante G (2002) Risk considerations for crown pillar stability assessment for mine closure planning. Geotech Geol Eng 20(1):41–63
Jaiswal A, Shrivastva BK (2009) Numerical simulation of coal pillar strength. Int J Rock Mech Min Sci 46(4):779–788
Jaiswal A, Sharma SK, Shrivastva BK (2004) Numerical modeling study of asymmetry in the induced stresses over coal mine pillars with advancement of the goaf line. Int J Rock Mech Min Sci 41(5):859–864
Jawed M, Sinha RK, Sengupta S (2013) Chronological development in coal pillar design for bord and pillar workings: a critical appraisal. J Geol Min Res 5(1):1–11
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11(9):1–20
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI'95 proceedings of the 14th international joint conference on artificial intelligence, vol 2. Morgan Kaufmann, San Francisco, pp 1137–1143
Kordon AK (2010) Applying computational intelligence: how to create value. The Dow Chemical Company, Freeport, p 459
Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ (2005) Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans Pattern Anal Mach Intell 27(6):957–968
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
Kuhn M (2012) caret package (R package version 5.15-023). R Foundation for Statistical Computing, Vienna
Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York
Landis J, Koch G (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Li XB, Li DY, Guo L, Ye ZY (2007) Study on mechanical response of highly stressed pillars in deep mining under dynamic disturbance. Chin J Rock Mech Eng 26(5):922–928
Li XB, Li DY, Liu ZX, Zhao GY, Wang WH (2013) Determination of the minimum thickness of crown pillar for safe exploitation of a subsea gold mine based on numerical modelling. Int J Rock Mech Min Sci 57:42–56
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22
Liu XZ, Zhai DY (2000) The reliability design of pillar. Chin J Rock Mech Eng 18(6):85–88
Liu ZB, Shao JF, Xu WY, Meng YD (2013) Prediction of rock burst classification using the technique of cloud models with attribution weight. Nat Hazards 68(2):549–568
Liu ZB, Shao JF, Xu WY, Chen HJ, Zhang Y (2014) An extreme learning machine approach for slope stability evaluation and prediction. Nat Hazards 1–18
Lunder PJ (1994) Hard rock pillar strength estimation: an applied empirical approach. MASc thesis, University of British Columbia, Vancouver
Lunder PJ, Pakalnis R (1997) Determination of the strength of hard rock mine pillars. Bull Can Inst Min Metall 90(1013):51–55
Mark C (2006) The evolution of intelligent coal pillar design: 1981–2006. In: Proceedings of the 25th international conference on ground control in mining. West Virginia University, Morgantown, pp 325–334
Mark C, Barton TM (1997) Pillar design and coal strength. In: Proceedings of the New Technology for Ground Control in Retreat Mining. US Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Pittsburgh, PA, pp 49–59, DHHS (NIOSH) Publication No. 97-122
Martin CD, Maybee WG (2000) The strength of hard-rock pillars. Int J Rock Mech Min Sci 37:1239–1246
Mitri HS (2007) Assessment of horizontal pillar burst in deep hard rock mines. Int J Risk Assess Manag 7(5):695–707
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Monjezi M, Hesami SM, Khandelwal M (2011) Superiority of neural networks for pillar stress prediction in bord and pillar method. Arab J Geosci 4(5–6):845–853
Mortazavi A, Hassani FP, Shabani M (2009) A numerical investigation of rock pillar failure mechanism in underground openings. Comput Geotech 36(5):691–697
Pandya DH, Upadhyay SH, Harsha SP (2014) Fault diagnosis of rolling element bearing by using multinomial logistic regression and wavelet packet transform. Soft Comput 18(2):255–266
Pino-Mejías R, Carrasco-Mairena M, Pascual-Acosta A, Cubiles-De-La-Vega M, Muñoz-García J (2008) A comparison of classification models to identify the Fragile X Syndrome. J Appl Stat 35(3):233–244
Pino-Mejías R, Cubiles-de-la-Vega MD, Anaya-Romero M, Pascual-Acosta A, Jordán-López A, Bellinfante-Crocci N (2010) Predicting the potential habitat of oaks with data mining models and the R system. Environ Model Softw 25(7):826–836
Potvin Y, Hudyma M, Miller HDS (1989) Rib pillar design in open stope mining. Bull Can Inst Min Metall 82(927):31–36
Pozdnoukhov A, Foresti L, Kanevski M (2009) Data-driven topo-climatic mapping with machine learning methods. Nat Hazards 50(3):497–518
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/index.html
Ripley B (2009) nnet: feed-forward neural networks and multinomial log-linear models. http://cran.r-project.org/web/packages/nnet/index.html
Sadat-Hashemi SM, Kazemnejad A, Lucas C, Badie K (2005) Predicting the type of pregnancy using artificial neural networks and multinomial logistic regression: a comparison study. Neural Comput Appl 14(3):198–202
Sakiyama Y, Yuki H, Moriya T, Hattori K, Suzuki M, Shimada K, Honma T (2008) Predicting human liver microsomal stability with machine learning techniques. J Mol Graph Model 26:907–915
Salamon MDG (1970) Stability, instability and design of coal pillar workings. Int J Rock Mech Min Sci Geomech Abstr 7(6):613–631
Salamon MDG, Munro AH (1967) A study of the strength of coal pillars. J South Afr Inst Min Metall 68:55–67
Sheng JH, Liao WJ, Li WM (2010) Analysis of pillar safety factor in Gaoshan gypsum mine. Metal Mine (suppl.):791–793
Sjöberg JS (1992) Failure modes and pillar behaviour in the Zinkgruvan mine. In: Tillerson JA, Wawersik WR (eds) Proceedings of the 33rd U.S. rock mechanics symposium, Santa Fe. A.A. Balkema, Rotterdam, pp 491–500
Tawadrous AS, Katsabanis PD (2007) Prediction of surface crown pillar stability using artificial neural networks. Int J Numer Anal Meth Geomech 31(7):917–931
Tesfamariam S, Liu Z (2010) Earthquake induced damage classification for reinforced concrete buildings. Struct Saf 32(2):154–164
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York
Von Kimmelman MR, Hyde B, Madgwick RJ (1984) The use of computer applications at BCL Limited in planning pillar extraction and design of mining layouts. In: Proceedings of the ISRM symposium: design and performance of underground excavations. British Geotechnical Society, London, pp 53–63
Wattimena RK (2014) Predicting the stability of hard rock pillars using multinomial logistic regression. Int J Rock Mech Min Sci 71:33–40
Wattimena RK, Kramadibrata S, Sidi ID, Azizi MA (2013) Developing coal pillar stability chart using logistic regression. Int J Rock Mech Min Sci 58:55–60
York G (1998) Numerical modelling of the yielding of a stabilizing pillar/foundation system and a new design consideration for stabilizing pillar foundations. J S Afr Inst Min Metall 98:281–293
Zhou J, Shi XZ, Dong L, Hu HY, Wang HY (2010) Fisher discriminant analysis model and its application for prediction of classification of rockburst in deep-buried long tunnel. J Coal Sci Eng 16(2):144–149
Zhou J, Li XB, Shi XZ, Wei W, Wu BB (2011) Predicting pillar stability for underground mine using Fisher discriminant analysis and SVM methods. Trans Nonferrous Met Soc China 21(12):2734–2743
Zhou J, Li XB, Shi XZ (2012) Long-term prediction model of rockburst in underground openings using heuristic algorithms and support vector machines. Saf Sci 50(4):629–644
Zhou J, Li XB, Mitri HS, Wang SM, Wei W (2013) Identification of large-scale goaf instability in underground mine using particle swarm optimization and support vector machine. Int J Min Sci Technol 23(5):701–707