
Nat Hazards (2015) 79:291–316 DOI 10.1007/s11069-015-1842-3


ORIGINAL PAPER

 

Comparative performance of six supervised learning methods for the development of models of hard rock pillar stability prediction

Jian Zhou 1,2 · Xibing Li 1 · Hani S. Mitri 2,3

Received: 16 May 2014 / Accepted: 30 May 2015 / Published online: 10 June 2015
© Springer Science+Business Media Dordrecht 2015

Abstract  The prediction of pillar stability (PS) in hard rock mines is a crucial task for which many techniques and methods have been proposed in the literature, including machine learning classification. In order to make the best use of the large variety of statistical and machine learning classification methods available, it is necessary to assess their performance before selecting a classifier and suggesting improvements. The objective of this paper is to compare different classification techniques for PS detection in hard rock mines. The data of this study consist of six features, namely pillar width, pillar height, the ratio of pillar width to its height, uniaxial compressive strength of the rock, pillar strength, and pillar stress. A total of 251 pillar cases between 1972 and 2011 are analyzed. Six supervised learning algorithms, including linear discriminant analysis, multinomial logistic regression, multilayer perceptron neural networks, support vector machine (SVM), random forest (RF), and gradient boosting machine, are evaluated for their ability to predict PS based on different input parameter combinations. In this study, the available data set is randomly split into two parts: a training set (70 %) and a test set (30 %). A repeated tenfold cross-validation procedure (ten repeats) is applied to determine the optimal parameter values during modeling, and an external testing set is employed to validate the prediction performance of the models. Two performance measures, namely classification accuracy rate and Cohen's kappa, are employed. The analysis of the accuracy together with kappa for the PS data set demonstrates that SVM and RF achieve comparable median classification accuracy rate and Cohen's kappa values. All models are fitted by "R" programs with the libraries and functions described in this study.

Jian Zhou (corresponding author): csujzhou@hotmail.com
Xibing Li: xbli@mail.csu.edu.cn
Hani S. Mitri: hani.mitri@mcgill.ca

1 School of Resources and Safety Engineering, Central South University, Changsha 410083, China
2 Department of Mining and Materials Engineering, McGill University, Montreal, QC H3A 2A7, Canada
3 School of Civil Engineering, Henan Polytechnic University, Jiaozuo 454150, China

Keywords  Pillar stability · Pillar design · Hard rock mine · Supervised learning · Classification · Repeated cross-validation · R system

1 Introduction

Underground mining almost invariably involves leaving portions of the ore in the form of pillars, which are key structural columns (Brady and Brown 2003; Deng et al. 2003; Zhou et al. 2011). Pillar stability is an essential prerequisite for safe working conditions in room-and-pillar mines (Salamon 1970; Ghasemi et al. 2014a). Unstable pillars can result in rock sloughing from the pillar and can lead to the collapse of the roof if one or more pillars fail (Mortazavi et al. 2009). As mining goes deeper, pillar failure becomes more frequent and critical due to the increase in ambient stresses. Consequently, pillar design and stability are among the most complicated and important problems in mining rock mechanics and ground control. Because of their significance for the safe and economical extraction of underground ores, a great deal of valuable work on this topic has been reported over the past decades, covering many aspects of pillar design and layout in rock. Various researchers have proposed empirical design methods for pillar strength determination that are often applied in practice and that have been reviewed and summarized in the literature (Hustrulid 1976; Lunder 1994; Brady and Brown 2003; Mark 2006; Mitri 2007; Jawed et al. 2013), e.g., the linear shape effect formula (Bieniawski and van Heerden 1975; York 1998), the power shape effect formula (Salamon and Munro 1967; Bieniawski 1968; Hedley and Grant 1972), the size effect formula (Hustrulid 1976), the Hoek–Brown formula (Hoek and Brown 1980), and the analysis of retreat mining pillar stability method (Mark and Barton 1997; Ghasemi and Shahriar 2012). In an underground pillar design, it is difficult to determine the actual stress that will be acting on a pillar.
The three main methods of calculating pillar stress are tributary area theory, numerical modeling (Lunder 1994), and the neural network method (Monjezi et al. 2011). The stability of a pillar can then be evaluated by calculating a safety factor (SF), which is the ratio of the average strength to the average stress in the pillar (Zhou et al. 2011). Theoretically, an SF value greater than 1 means that the pillar is stable, while an SF value lower than 1 means it is unstable. In practice, however, these methods can be questionable, because pillar failures have occurred even though the failed pillars had been considered stable, i.e., SF > 1 (Deng et al. 2003; Zhou et al. 2011). Moreover, empirical methods are based on the interpretation of available databases, which are compiled from ongoing or completed projects; it is therefore difficult to generalize the obtained results beyond the scope of the original site characteristics. Meanwhile, considerable work related to the prediction of PS has been undertaken by means of numerical simulation methods, which allow for the consideration of complex boundary conditions and material behavior. For example, a design methodology was proposed by York (1998) using the fast Lagrangian analysis of continua (FLAC) code to enable the yield point of the foundation of deep-level stabilizing pillars to be predicted in terms of the


cohesion, friction angle, and depth. Hutchinson et al. (2002) recommended the use of simulation methods for crown pillar stability risk assessment in mine planning. Jaiswal et al. (2004) used the three-dimensional boundary element method (BEM) to model asymmetry in the induced stresses over coal mine pillars with complex geometries, enabling successful simulation of mining conditions. Griffiths et al. (2007) combined random field theory with an elastoplastic finite element method (FEM) algorithm in a Monte Carlo framework to estimate the stability of pillars. Using the explicit finite difference program FLAC3D, a numerical model was established by Li et al. (2007) for a deep mining pillar with dynamic disturbance under high stress. Numerical modeling was carried out by Jaiswal and Shrivastva (2009) using a three-dimensional FEM code to study the stress–strain behavior of coal pillars. Mortazavi et al. (2009) delved into the mechanisms involved in pillar failure and investigated the nonlinear behavior of rock pillars within the FLAC model. Elmo and Stead (2010) investigated the use of the hybrid FEM/DEM code ELFEN to study the failure modes of jointed pillars. Recently, Li et al. (2013) established 3D numerical models based on FLAC3D to determine the minimum thickness of the crown pillar for a subsea gold mine. Each of the numerical methods has its advantages and disadvantages; however, the estimation of reliable values of the model input parameters is found to be an increasingly difficult task. Besides the numerical modeling approach, statistical and analytical methods, probabilistic methods, and artificial intelligence-based methods or their hybrids have been investigated in recent years and successfully used for designing pillars in coal or hard rock.
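The tributary area stress and the safety factor SF defined earlier in this section can be sketched in a few lines. Since the paper fits all of its models in R, the following is only an illustrative Python sketch, and all numeric inputs (unit weight, depth, geometry, strength) are hypothetical:

```python
# Illustrative sketch of tributary area theory and the safety factor (SF).
# For a square pillar in a square layout, the pillar carries the overburden
# column over its own footprint plus the surrounding half-openings.
# All numeric inputs below are hypothetical.

def tributary_pillar_stress(unit_weight, depth, pillar_width, opening_width):
    """Average pillar stress (MPa): sigma_v * ((W_p + W_o) / W_p)^2.

    unit_weight   overburden unit weight (MN/m^3)
    depth         depth below surface (m)
    pillar_width  pillar width W_p (m)
    opening_width room/opening width W_o (m)
    """
    vertical_stress = unit_weight * depth
    return vertical_stress * ((pillar_width + opening_width) / pillar_width) ** 2

def safety_factor(pillar_strength, pillar_stress):
    """SF = average pillar strength / average pillar stress."""
    return pillar_strength / pillar_stress

stress = tributary_pillar_stress(0.027, 300.0, 10.0, 6.0)  # 8.1 MPa * 2.56
sf = safety_factor(55.0, stress)                           # assumed 55 MPa strength
print(round(stress, 2), round(sf, 2))
```

With these hypothetical inputs the pillar stress is about 20.7 MPa and SF is about 2.65; as the text cautions, SF > 1 does not by itself guarantee stability in practice.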
Esterhuizen (1993) showed that variability in rock mass properties and mining factors could be taken into consideration for hard rock pillar design by statistical methods and the point estimate method. Griffiths et al. (2002) and Cauvin et al. (2009) investigated underground pillar stability based on probabilistic methods. Ghasemi et al. (2010) studied the effect of variability in parameters such as the uniaxial compressive strength of coal specimens, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a Monte Carlo simulation. Zhou et al. (2011) presented two models for predicting pillar stability applying support vector machine and Fisher discriminant analysis techniques. Wattimena et al. (2013) developed a logistic regression model for predicting the probability of stability of a coal pillar. On the other hand, different types of artificial neural networks, based on combining different learning techniques such as hybrid or ensemble techniques, have been reported for pillar stability analysis in recent years. Deng et al. (2003) proposed a pillar design based on Monte Carlo simulation by combining finite element methods, neural networks, and reliability analysis. Four ANNs, based on two different architectures, the multilayer perceptron (MLP) and the radial basis function (RBF), were constructed by Tawadrous and Katsabanis (2007) to predict the stability of surface crown pillars. Monjezi et al. (2011) developed an MLP neural network methodology to predict the pillar stress concentration in the bord and pillar method and compared the results with a BEM numerical solution. Recently, Ghasemi et al. (2014a, b) developed two models for the evaluation and prediction of global stability in room-and-pillar coal mines considering retreat mining conditions by employing logistic regression and fuzzy logic techniques. In these studies, all the data are separated into training and testing sets.
However, a cross-validation process is not implemented in these studies, and thus the accuracy of the predictive models is not fully understood. Hence, the issue of pillar stability prediction still poses a considerable challenge for underground mines. Supervised learning (SL) has become steadily more mathematical and more successful in applications over the past 20 years. The use of SL algorithms for the development of predictive and descriptive data mining models has become widely accepted in mining and geotechnical applications, promising powerful new tools for practicing engineers (Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Tesfamariam and Liu 2010; Zhou et al. 2011, 2012, 2013; González-Rufino et al. 2013; Liu et al. 2013, 2014). Numerous approaches for PS prediction have been developed based on different SL techniques during recent decades (Tawadrous and Katsabanis 2007; Zhou et al. 2011; Wattimena et al. 2013; Ghasemi et al. 2014a, b). However, there has been no comparison of SL techniques for PS estimation. Based on these considerations, the main objective of this study is to investigate the suitability of different SL algorithms for the prediction of pillar stability (PS) in underground engineering. To achieve this goal, a research methodology is developed for the comparison of the performance of different SL algorithms, including linear discriminant analysis (LDA), multinomial logistic regression (MLR), multilayer perceptron neural networks (MLPNN), support vector machine (SVM), random forest (RF), and gradient boosting machine (GBM). These methods are specifically chosen because they are being increasingly used in engineering, yet have not been compared with one another exhaustively, and also because of the availability of open-source software. The rest of this paper is organized as follows: Sect. 2 briefly presents the hard rock pillar data set and provides an overview of the SL techniques. In Sect. 3, these methods are applied to PS prediction, and in Sect. 4, the results are discussed using performance criteria. Finally, the conclusion is provided in Sect. 5.

2 Materials and methods

2.1 Data sources and parameters

To measure the performance of the developed SL approaches, the data utilized in this study are composed of 251 pillar case histories collected from more than 10 published research works. The sources are reliable and include references published over the period 1972–2011. Field data are obtained from ten different databases of hard rock mines, which are: (1) Elliot Lake uranium mines in Canada (Hedley and Grant 1972); (2) Selebi-Phikwe mines in South Africa (Von Kimmelman et al. 1984); (3) open stope mines in Canada (Hudyma 1988); (4) Zinkgruvan mine in Sweden (Sjoberg 1992); (5) Westmin Resources Ltd.'s H-W mine in Canada (Lunder 1994); (6) Dawenkou gypsum mine in China (Liu and Zhai 2000); (7) Shizishan copper mine in China (Zheng 2002); (8) a marble mine in Spain (González-Nicieza et al. 2006); (9) Gaoshan gypsum mine in China (Sheng et al. 2010); and (10) stone mines in the USA (Esterhuizen et al. 2011). The general database is a continuation of the existing databases developed by Lunder (1994) and Zhou et al. (2011). Additional projects are added to the original sets from other available sources found in the literature. An effort is also made to complete missing data fields within the pillar database by checking many sources and published literature.

2.2 Data visualization

Figure 1a shows the number of cases used in this study after 1972 from different countries. The distribution of PS data is shown in Fig. 1b as a pie chart illustrating the proportion of the three types of PS in hard rock mines, categorized as stable (S, 108 cases), unstable (U, 54 cases), and failed (F, 89 cases). The boxplot of the original data set is given in Fig. 2.


Fig. 1 Distribution of observed hard rock pillar events


Fig. 2 Boxplot of each variable for the three conditions of PS

The circles represent outliers (observations greater than the third quartile plus 1.5 times the interquartile range, or smaller than the first quartile minus 1.5 times the interquartile range). For most of the data groups, the median is not in the center of the box, which indicates that the distribution of most data groups is not symmetric (Fig. 2). In addition, all dependent variables have some outliers except pillar stress and pillar strength for all PS types, UCS for the S and U types, and pillar width for the U type. As shown in Fig. 3, the scatterplot matrix in the upper panel describes the pairwise relationships between parameters, with the corresponding correlation coefficients shown in the lower panel, whereas the marginal frequency distribution of each parameter is displayed on the diagonal. It can be seen that the parameter UCS is notably correlated with pillar strength and pillar stress, and that pillar height is notably correlated with pillar width.
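The 1.5 × IQR fence that defines the circles in Fig. 2 is easy to state in code. A minimal Python sketch, using one of several common quartile conventions; the sample values are hypothetical and not drawn from the PS database:

```python
# The boxplot outlier rule of Fig. 2: flag observations beyond
# Q3 + 1.5*IQR or below Q1 - 1.5*IQR.
import statistics

def quartiles(data):
    """Q1 and Q3 via the median-of-halves convention (one of several)."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower, upper = s[:half], s[n - half:]
    return statistics.median(lower), statistics.median(upper)

def boxplot_outliers(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

# Hypothetical pillar-width sample (m); 45.0 lies far above the upper fence.
sample = [4.5, 5.1, 5.8, 6.0, 6.3, 6.9, 7.2, 7.8, 8.1, 45.0]
print(boxplot_outliers(sample))   # -> [45.0]
```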

2.3 Selection of input and output variables

Fig. 3 PS parameter interaction matrix. Pairwise relationships are in the lower triangle; correlation coefficients are in the upper triangle; marginal distributions are on the diagonal

As mentioned above, Zhou et al. (2011) developed their models based on five parameters, including pillar width, pillar height, the ratio of the pillar width to its height, the uniaxial compressive strength of the rock, and pillar stress. Wattimena et al. (2013) applied their model for predicting coal pillar stability for given geometry (pillar width to height ratio) and stress conditions (pillar strength to stress ratio). Ghasemi et al. (2010) investigated the effect of variability in parameters such as the uniaxial compressive strength of coal specimens, pillar width, pillar height, entry width, and depth of cover on pillar safety factors using a Monte Carlo simulation. Ghasemi et al. (2014a, b) used major contributing parameters for pillar stability such as mining height, depth of cover, entry width, panel width, pillar width, pillar length, crosscut angle, roof strength rating, and loading condition. Moreover, the UCS of the intact rock is used because it is an index that can be utilized in a simple way without carrying out pillar strength estimation (Wattimena 2014). Pillar shape is expressed by the ratio of pillar width to pillar height (K), which accounts for the increased strength provided by the shape and confinement of the pillar. As noted above, various researchers have determined the strength of hard rock pillars based on empirical formulas, which rely on pillar width, pillar height, and UCS (Hedley and Grant 1972; Lunder 1994; Martin and Maybee 2000; Esterhuizen et al. 2011). On the other hand, the pillar stability graph typically involves two parameters: (1) the pillar stress to UCS ratio and the pillar width to height ratio K (Lunder and Pakalnis 1994; Martin and Maybee 2000; Wattimena et al. 2014) or (2) the pillar strength to stress ratio (SF) and K (Wattimena et al. 2013). Based on these conditions, six relevant indicators are adopted in this study. These are as follows:

pillar width, pillar height, K, the uniaxial compressive strength (UCS) of the rock, pillar strength, and pillar stress. These indicators are recognized as the major parameters for quantitatively characterizing pillar behavior. Theoretically, there may be additional indicators, but collecting such data at scale is a major challenge for their applicability. Hence, the six indicators are adopted, and combinations of the indicators are investigated in the current study to discover the effects of varying the inputs. In terms of pillar shape, pillar strength, and pillar load, a number of experiments are performed using different combinations of input parameters to assess the performance of the SL methods for PS, as listed in Table 1. Numerous scholars have conducted a variety of PS classification methods. In all of the cases in the combined database, pillar stability assessments, which range from a simple "Stable/Failed" assessment to a more rigorous approach based on a five- or six-stage stability classification method (Hedley and Grant 1972; Von Kimmelman et al. 1984; Lunder 1994), have been investigated. Following a review of the combined database and the suggestion of Lunder (1994), the PS is classified into three stages, which provide adequate resolution for the combined database, i.e., stable (S), unstable (U), and failed (F), as depicted in Table 2.


Table 1 Different models for PS prediction with different input parameters (models A, B, C, and D each use a different combination of the six inputs: pillar height, pillar width, K, UCS, pillar strength, and pillar stress; check marks in the original table indicate which parameters each model includes)

Table 2 Description of hard rock pillar stability classification for the combined database (modified from Lunder 1994)

Failed: Crushed (Hedley and Grant 1972); severe spalling, pronounced opening of joints, deformation of drill holes (Von Kimmelman et al. 1984); disintegration of pillar, blocks falling out, fractures through pillar with fracture apertures wider than 10 mm (Lunder 1994); pillars with manifested failure (González-Nicieza et al. 2006)

Unstable: Partially failed (Hedley and Grant 1972); prominent spalling (Von Kimmelman et al. 1984); fractures also appear on the central parts of the pillar (Krauland and Soder 1987); sloughing (Hudyma 1988); showing one or more of the following signs: cracking and spalling in development and raises within the rib pillar, audible noise in the pillar, deformed drill holes, excess muck being pulled from stopes, cracking of pillars, major displacements within the pillar (Potvin et al. 1989); pre-failure, severe spalling (Sjoberg 1992); corner breaking only, up to fracturing in pillar walls with fracture aperture up to 10 mm (Lunder 1994); stable pillars with incipient damage (González-Nicieza et al. 2006)

Stable: Minor spalling, no joint opening (Von Kimmelman et al. 1984); no sign of stress-induced fracturing (Lunder 1994)

2.4 Supervised learning classification methods

As noted by Hastie et al. (2009), machine learning problems can be roughly categorized as either unsupervised learning or supervised learning. From a theoretical point of view, supervised and unsupervised learning differ only in the causal structure of the model (Kordon 2010). In unsupervised learning, all the observations are assumed to be caused by latent variables, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures. In supervised learning, the model defines the effect that one set of observations (called inputs) has on another set of observations, which is fully labeled (called outputs), and the goal is to predict the value of an outcome measure based on a number of input measures. Six classification techniques are considered in the current study. These share characteristics that make them interesting for the current analysis: (a) all these methods are being increasingly used; (b) some of them have been used in PS classification tasks with good results and are known to enable the analysis of more complex, nonlinear relationships; (c) they have efficient implementations; and (d) the resulting models allow for fast classification processing. The following six methods are compared with respect to their predictive performance. In this study, the feature vector X consists of six PS performance modifiers {X_1, X_2, X_3, X_4, X_5, X_6}, which correspond to the variables discussed in Sect. 3. The set of all feature vectors is denoted as H. Three PS states are defined, i.e., stable (S), unstable (U), and failed (F), denoted Z_i (i = 1, 2, 3). Each class Z_i is associated with a discriminant function f_i(x). Several articles have compared multiple SL techniques (e.g., Garzón et al. 2006; Berrueta et al. 2007; Sakiyama et al. 2008; Pino-Mejías et al. 2008, 2010; Pozdnoukhov et al. 2009; Zhou et al. 2011; González-Rufino et al. 2013). Based on these articles and the focus herein on PS classification, only a brief description of each classification technique is presented; for a more in-depth discussion, the reader is referred to the relevant references.

2.4.1 Linear discriminant analysis (LDA)

The goal of LDA, proposed by Fisher (1936), is derived from the Bayes rule, assuming that patterns belonging to class Z follow a normal (Gaussian) distribution with mean $\mu_Z$ and a non-singular covariance matrix $\Sigma$ common to all the classes (Zhou et al. 2010, 2011; González-Rufino et al. 2013). Under these hypotheses, the Bayes rule assigns a test pattern x to the class Z with the highest posterior probability $w(Z\mid x)$, given by

$$\log w(Z\mid x) = \log \pi_Z - \tfrac{1}{2}\left[\log\lvert\Sigma\rvert + (x-\mu_Z)^{T}\Sigma^{-1}(x-\mu_Z)\right] \quad (1)$$

where $\lvert\Sigma\rvert$ is the determinant of $\Sigma$, or, equivalently, to the class Z which maximizes the linear function

$$L_Z(x) = 2\log \pi_Z - \mu_Z^{T}\Sigma^{-1}\mu_Z + 2\mu_Z^{T}\Sigma^{-1}x \quad (2)$$

The matrix $\Sigma$ is approximated by the within-class covariance matrix

$$W = (X - QM)^{T}(X - QM)/(N - M) \quad (3)$$

where X is the N × n training set matrix, Q is the N × M matrix of class indicators, and M is the M × n matrix of class means; n, N, and M denote the number of inputs, training patterns, and classes, respectively.
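Equations (1)-(3) can be made concrete in a few lines of code. The following is a purely illustrative pure-Python sketch on toy two-feature data (not the MASS::lda implementation used later in the paper); the class values and priors are hypothetical:

```python
# Minimal two-feature LDA following Eqs. (1)-(3): class means, a pooled
# within-class covariance, and the linear score L_Z(x) of Eq. (2).
import math

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(classes):
    """W of Eq. (3): summed outer products of deviations, divided by N - M."""
    n_feat = len(next(iter(classes.values()))[0])
    N = sum(len(v) for v in classes.values())
    M = len(classes)
    W = [[0.0] * n_feat for _ in range(n_feat)]
    for rows in classes.values():
        mu = mean_vec(rows)
        for r in rows:
            d = [r[j] - mu[j] for j in range(n_feat)]
            for a in range(n_feat):
                for b in range(n_feat):
                    W[a][b] += d[a] * d[b]
    return [[W[a][b] / (N - M) for b in range(n_feat)] for a in range(n_feat)]

def inv2(S):
    """Inverse of a 2x2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def lda_score(x, mu, Sinv, prior):
    """L_Z(x) = 2 log pi_Z - mu^T Sinv mu + 2 mu^T Sinv x, Eq. (2)."""
    Smu = [Sinv[a][0] * mu[0] + Sinv[a][1] * mu[1] for a in range(2)]
    return 2 * math.log(prior) - (mu[0] * Smu[0] + mu[1] * Smu[1]) \
           + 2 * (x[0] * Smu[0] + x[1] * Smu[1])

# Hypothetical classes, e.g. 'stable' vs 'failed' in a (K, stress-ratio) plane.
classes = {"S": [[2.0, 0.3], [2.2, 0.4], [1.8, 0.35]],
           "F": [[0.8, 0.9], [1.0, 1.1], [0.9, 0.95]]}
Sinv = inv2(pooled_covariance(classes))
x = [2.1, 0.32]
scores = {z: lda_score(x, mean_vec(rows), Sinv, 0.5) for z, rows in classes.items()}
print(max(scores, key=scores.get))   # -> S
```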

2.4.2 Multinomial logistic regression (MLR)

MLR is found to work well for multiclass classification (Sadat-Hashemi et al. 2005; Krishnapuram et al. 2005; Pandya et al. 2014). Let x and y be the matrices of predictors and the response with Z levels, respectively; hence, if $w_i = \Pr(y = i)$ and $i = 1, 2, 3, \ldots, Z$, then by considering a multinomial distribution for y we have $\sum_{i=1}^{Z} w_i = 1$, so that a method to predict the $w_i$ is to use an MLR as $(Z-1)$ equations on $(Z-1)$ dummy variables $y_i$, as follows:

$$y_i = \begin{cases} 1 & \text{if } y = i \\ 0 & \text{otherwise} \end{cases}, \qquad \ln\frac{w_i}{w_Z} = X_i\beta_i, \quad i = 1, 2, 3, \ldots, Z-1 \quad (4)$$

where $\beta_i$ is the regression coefficient vector for the ith equation and $w_i$ is the probability of obtaining the ith outcome. We can compute $w_1, \ldots, w_{Z-1}$ and $w_Z$ for each subject using these $(Z-1)$ equations. Each case is allocated to the jth category of y if $w_j = \mathrm{Max}(w_1, w_2, \ldots, w_Z)$.
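Recovering the class probabilities from the baseline-category equations of Eq. (4) is mechanical: exponentiate each linear predictor and normalize. A short illustrative Python sketch with toy coefficients (not fitted to the PS database):

```python
# Class probabilities from the (Z-1) equations of Eq. (4):
# ln(w_i / w_Z) = eta_i  =>  w_i = exp(eta_i) / (1 + sum_j exp(eta_j)),
# with the baseline class w_Z = 1 / (1 + sum_j exp(eta_j)).
import math

def mlr_probabilities(x, betas):
    """betas holds one coefficient vector per non-baseline class (Z-1 of them)."""
    etas = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)          # baseline class w_Z
    return probs

x = [1.0, 0.4, 1.6]                    # intercept plus two predictors (hypothetical)
betas = [[0.2, -1.5, 0.8], [-0.4, 2.0, -0.6]]
w = mlr_probabilities(x, betas)
print([round(p, 3) for p in w])        # three probabilities summing to 1
```

The case is then assigned to the category with the largest probability, as in the text.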


2.4.3 Random forest (RF)

The RF technique is based on the use of a large series of low-dimensional regression trees. Its theoretical development is presented by Breiman (2001), and the RF algorithm used here is implemented by Liaw and Wiener (2002). The main idea of the RF algorithm is to reduce the correlation among the trees, in order to improve the variance reduction of bagging, by growing trees that perform random selection on the input variables.

RF is very user-friendly in the sense that it has only two parameters: the number of variables in the random subset at each node (m_try) and the number of trees in the forest (n_tree). It is usually not very sensitive to their values (Kuhn and Johnson 2013).
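The interplay of the two knobs can be sketched compactly: draw n_tree bootstrap samples, try m_try randomly chosen features per split, and predict by majority vote. The following toy Python sketch uses depth-1 trees (stumps) in place of full trees and hypothetical data; it illustrates the concept only and is not the randomForest implementation:

```python
# Toy random forest: bootstrap sampling (bagging) plus a random feature
# subset (m_try) per tree, with majority voting over n_tree stumps.
import random
from collections import Counter

def fit_stump(X, y, feat_idx):
    """Best single-feature threshold split among the candidate features."""
    best = None
    for j in feat_idx:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            maj_l = Counter(left).most_common(1)[0][0]
            maj_r = Counter(right).most_common(1)[0][0]
            acc = sum(yi == maj_l for yi in left) + sum(yi == maj_r for yi in right)
            if best is None or acc > best[0]:
                best = (acc, j, t, maj_l, maj_r)
    if best is None:                        # degenerate bootstrap sample
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, j, t, lab_l, lab_r = best
    return lambda row: lab_l if row[j] <= t else lab_r

def random_forest(X, y, n_tree=25, m_try=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        feats = rng.sample(range(p), m_try)             # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]

# Hypothetical 2-feature data; the first feature separates the classes.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.9, 2], [1.0, 6], [1.1, 3]]
y = ["F", "F", "F", "S", "S", "S"]
predict = random_forest(X, y)
print(predict([0.15, 9]), predict([1.05, 0]))
```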

2.4.4 Artificial neural network (ANN)

ANN is a computational paradigm that provides a great variety of nonlinear mathematical models useful for tackling different statistical problems (Haykin 1998; Pino-Mejías et al. 2010). In this work, the multilayer perceptron neural network (MLPNN) is employed (Pino-Mejías et al. 2010). Defining $G = (G_1, \ldots, G_M)$ as the vector of all M coefficients of the net, and given n q-sized target vectors $y_1, \ldots, y_n$, the Broyden–Fletcher–Goldfarb–Shanno procedure (Bishop 1995) can be applied to the following nonlinear least-squares problem (Pino-Mejías et al. 2010; Kuhn and Johnson 2013):

$$\min_{G}\; \lambda\sum_{i=1}^{M} G_i^{2} + \sum_{i=1}^{n}\lVert \hat{y}_i - y_i\rVert^{2} \quad (5)$$

The R implementation of an MLPNN model requires the specification of two parameters: the size of the hidden layer (H) and the decay parameter ($\lambda$). It must be remarked that greater values of H could not be attempted due to the limited memory resources of our personal computers (Hastie et al. 2009; González-Rufino et al. 2013).

2.4.5 Support vector machine (SVM)

SVM is among the most recent significant developments in the field of discriminatory analysis (Vapnik 1995). In the case of non-separable data, the "ideal boundary" must be adapted to tolerate errors for some objects i:

$$\begin{cases} \text{minimize} & \tfrac{1}{2}\lVert d\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\ \text{subject to} & y_i(b + d\cdot x_i) + \xi_i \ge 1, \quad \xi_i \ge 0 \end{cases} \quad (6)$$

where C is the penalty parameter, d and b are, respectively, the normal vector and the bias of the hyperplane, and each $\xi_i$ corresponds to the distance between object i and the corresponding margin hyperplane (Cortes and Vapnik 1995; Devos et al. 2009; Zhou et al. 2011, 2012, 2013).

To learn nonlinearly separable functions, the data are implicitly mapped to a higher-dimensional space by means of Mercer kernels, which can be decomposed into a dot product, $K(x_i, x_j) = u(x_i)\cdot u(x_j)$ (Zhou et al. 2012). The radial basis function kernel, which is extensively used, is given by

$$K(x_i, x_j) = \exp\left(-\sigma\lVert x_i - x_j\rVert^{2}\right) \quad (7)$$

where $\sigma$ is the kernel parameter.
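The RBF kernel of Eq. (7) and the Gram matrix it induces are a few lines of code. An illustrative Python sketch (kernlab uses this same exp(-sigma·||x_i - x_j||²) parameterization; the data points are hypothetical):

```python
# RBF (Gaussian) kernel of Eq. (7) and the kernel (Gram) matrix used by
# the SVM dual problem.
import math

def rbf_kernel(xi, xj, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sigma * sq_dist)

def gram_matrix(X, sigma=1.0):
    return [[rbf_kernel(xi, xj, sigma) for xj in X] for xi in X]

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(X, sigma=0.5)
print(round(K[0][1], 4))   # exp(-0.5 * 1) = 0.6065
```

Note the two defining properties visible in K: the diagonal is 1 (each point is identical to itself) and the matrix is symmetric.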


2.4.6 Gradient boosting machine (GBM)

Friedman (2001) proposed the gradient boosting machine (GBM) using the connection between boosting and optimization. GBM builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function (Guelman 2012; Kuhn and Johnson 2013). Using this technique, function approximation is viewed from the perspective of numerical optimization in the function space, rather than in the parameter space.
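The stage-wise idea can be sketched for squared-error loss, where the negative gradient is simply the residual: each stage fits a small tree to the current residuals and is added with a shrinkage factor. A toy 1-D Python sketch (stumps stand in for trees; data and settings are hypothetical, and this is not the gbm package implementation):

```python
# Gradient boosting sketch for squared-error loss, mirroring the gbm knobs
# n_tree (number of stages), v (shrinkage), and J (tree size, fixed here
# at depth-1 stumps).

def fit_residual_stump(x, r):
    """Threshold split minimizing squared error of the two leaf means."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def gbm_predict(xi, f0, stumps, shrinkage=0.1):
    return f0 + shrinkage * sum(s(xi) for s in stumps)

def gbm_fit(x, y, n_tree=200, shrinkage=0.1):
    f0 = sum(y) / len(y)                   # initial constant model
    stumps = []
    for _ in range(n_tree):
        residuals = [yi - gbm_predict(xi, f0, stumps, shrinkage)
                     for xi, yi in zip(x, y)]   # negative gradient of SSE
        stumps.append(fit_residual_stump(x, residuals))
    return f0, stumps

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 4.1, 3.9, 4.2]         # a step around x = 3.5
f0, stumps = gbm_fit(x, y)
print(round(gbm_predict(2, f0, stumps), 2), round(gbm_predict(5, f0, stumps), 2))
```

With a small shrinkage each stage moves the fit only a fraction of the way toward the residuals, which is the regularizing design choice that distinguishes boosting from simply averaging trees.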

2.5 R software

R (R Development Core Team 2013) is a popular open-source software environment for statistical computing and data visualization, available for most mainstream platforms (http://www.R-project.org/). All data processing is performed using R software (version 3.0.2). R provides the most common SL classification methods; herein, we list only some of them. Further details about input parameters, implementation, and references can be found in the package documentation manuals. The packages necessary for each method and the functions utilized to build the models are summarized in Table 3.

3 Pillar stability assessment model development

3.1 Preparing training and testing data sets

In supervised learning, the performance of a classifier needs to be assessed on a given data set before using it to predict the class of a new project. To do so, the original PS data set with known classes is randomly divided into two subsets: a training set and a test set. The training set is required to estimate the model parameters and construct each PS classifier model. In this study, approximately 70 % of the available data (177 cases) are used as the training set. The test set serves as an external validation set for testing the performance and predictive power of each final model; in this work, the reserved 74 cases are used as the testing set.
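The index bookkeeping behind the split and the repeated tenfold cross-validation can be sketched in Python (illustrative only; the paper does this in R, and caret-style stratified partitioning yields the 177/74 split reported above, whereas plain rounding gives 176/75):

```python
# Random 70/30 split and repeated ten-fold CV fold assignment, mirroring
# the paper's protocol (10 repeats of 10-fold CV on the training set).
import random

def train_test_split(n_cases, train_frac=0.7, seed=42):
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_train = round(n_cases * train_frac)
    return idx[:n_train], idx[n_train:]

def repeated_kfold(n_cases, k=10, repeats=10, seed=42):
    """Yield (repeat, fold, held-out indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n_cases))
        rng.shuffle(idx)
        for fold in range(k):
            yield rep, fold, idx[fold::k]   # every k-th shuffled index

train_idx, test_idx = train_test_split(251)
print(len(train_idx), len(test_idx))        # 176 / 75 with simple rounding
folds = list(repeated_kfold(len(train_idx)))
print(len(folds))                           # 10 repeats x 10 folds = 100
```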

To guarantee comparability, all classifiers are generated using the same training set and

are validated by applying them to the same PS. Utilizing these classifiers, the classes of the

Table 3 R packages and functions used to run the methods described in Sect. 3

| Method | R package | R function | Tuning parameters | References |
| LDA | MASS | lda | None | Fisher (1936), Venables and Ripley (2002) |
| MLR | nnet | multinom | {λ} | Venables and Ripley (2002), Ripley (2009) |
| ANN | nnet | nnet | {H, λ} | Haykin (1998), Pino-Mejías et al. (2010) |
| SVM | kernlab | svmRadial | {C, σ} | Vapnik (1995), Karatzoglou et al. (2004) |
| RF | randomForest | randomForest | {n_tree, m_try} | Breiman (2001), Liaw and Wiener (2002) |
| GBM | gbm | gbm | {n_tree, ν, J} | Friedman (2001), Ridgeway (2007) |


PS test cases are predicted, and subsequently the misclassification error rate is calculated, with the preferred classifier being the one with the lowest misclassification error rate. Scaling of the input–output data is generally required prior to processing. All input variables are scaled with the function preProcess, which can be used to transform data sets based only on information in the training set (Kuhn 2008).

3.2 Evaluation of classifier’s performance

There is no generally accepted measure of performance for multiclass models. Thus, the

predictive power of SL algorithms on PS data is evaluated by means of the overall

accuracy (OA, Foody 2002) of the classification and the Cohen’s kappa coefficient (Kappa,

Cohen 1960) in this study. For each classification, a confusion matrix is presented, along

with its user’s and producer’s accuracy (Congalton and Green 2009). Let r ij (i and j = 1, 2,

, m) is the joint frequency of observations assigned to class i by prediction and to class j

by observation data, r i? is the total frequency of class i as derived from the prediction, and

r ? j is the total frequency of class j as derived from the observation data, as indicated in

Table 4. The OA, which is defined as the percentage of records that is correctly predicted

by the model relative to the total number of records among the classification models, is a

primary evaluation criterion. The OA can be obtained by

OA ¼

1

n

R m

i¼1 r ii

100%

ð8Þ

The Cohen’s kappa coefficient measures the proportion of correctly classified units after

the probability of chance agreement has been removed, which is a robust index which takes

into account the probability that a pixel is classified by chance (Kuhn and Johnson 2013).

The kappa is, therefore, always slightly lower than the classification accuracy rate mea-

surement and can be obtained using the following expression

Kappa ¼

nR m

i¼1 r ii R m

i¼1 ðr iþ r þi Þ

n 2 R m

i¼1 ðr iþ r þi Þ

ð9Þ

where x ii is the cell count in the main diagonal, n is the number of examples, Z is the

number of class values, and r ? i , r i? are the columns and rows total counts, respectively.
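Equations (8) and (9) can be checked with a short, library-free sketch (Python here for illustration; the study itself was carried out in R):

```python
# Overall accuracy (Eq. 8) and Cohen's kappa (Eq. 9) computed from a
# confusion matrix r[i][j] (rows: predicted class i, columns: observed class j).
def overall_accuracy(r):
    n = sum(sum(row) for row in r)
    return 100.0 * sum(r[i][i] for i in range(len(r))) / n

def cohens_kappa(r):
    m = len(r)
    n = sum(sum(row) for row in r)
    diag = sum(r[i][i] for i in range(m))
    # sum over i of r_{i+} * r_{+i}: row (prediction) total times column (observation) total
    chance = sum(sum(r[i]) * sum(row[i] for row in r) for i in range(m))
    return (n * diag - chance) / (n ** 2 - chance)

r = [[20, 5], [5, 20]]            # toy 2-class confusion matrix, n = 50
print(overall_accuracy(r))        # 80.0
print(cohens_kappa(r))            # 0.6 (lower than the raw 80 % accuracy, as expected)
```

As the paper notes, kappa sits below the raw accuracy because the chance-agreement term is subtracted out.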

Table 4 Population confusion matrix with r_{ij} representing the proportion of cases in the prediction category i and the observation category j

Predicted | Observed 1 | Observed 2 | ... | Observed m | Total | UA (%)
1 | r_{11} | r_{12} | ... | r_{1m} | r_{1+} | (r_{11}/r_{1+}) × 100
2 | r_{21} | r_{22} | ... | r_{2m} | r_{2+} | (r_{22}/r_{2+}) × 100
... | ... | ... | ... | ... | ... | ...
m | r_{m1} | r_{m2} | ... | r_{mm} | r_{m+} | (r_{mm}/r_{m+}) × 100
Total | r_{+1} | r_{+2} | ... | r_{+m} | n |
PA (%) | (r_{11}/r_{+1}) × 100 | (r_{22}/r_{+2}) × 100 | ... | (r_{mm}/r_{+m}) × 100 | |


Table 5 Relative strength of agreement associated with the kappa statistic

Kappa statistic | Strength of agreement
0.81–1.00 | Almost perfect
0.61–0.80 | Substantial
0.41–0.60 | Moderate
0.21–0.40 | Fair
0.00–0.20 | Slight
-1.00–0.00 | Poor

The kappa measures the correct classification rate after the probability of chance agreement has been removed (Congalton 1991). Landis and Koch (1977) proposed a scale to describe the degree of concordance (Table 5): the kappa ranges from -1 (total disagreement) through 0 (random classification) to 1 (perfect agreement). A value of kappa below 0.4 indicates poor agreement, while a value of 0.4 and above indicates good agreement (Landis and Koch 1977; Sakiyama et al. 2008).
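The Landis and Koch scale of Table 5 can be encoded as a small lookup; the function name below is ours, for illustration only:

```python
# Landis and Koch (1977) verbal scale for the kappa statistic (Table 5).
def agreement_strength(kappa):
    if kappa < 0.0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                         (0.80, "Substantial"), (1.00, "Almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa must lie in [-1, 1]")

print(agreement_strength(0.723))   # Substantial (e.g. RF on model A test data)
```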

According to Congalton and Green (2009), the producer's accuracy of class i (PA_i) can be computed by

PA_i = \frac{p_{ii}}{p_{+i}} \times 100\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ji}} \times 100\%    (10)

and the user's accuracy of class i (UA_i) can be computed by

UA_i = \frac{p_{ii}}{p_{i+}} \times 100\% = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ij}} \times 100\%    (11)
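Under the convention above (matrix rows as predictions, columns as observations), Eqs. (10) and (11) reduce to simple column and row sums. A Python sketch with toy values (not the paper's exact matrix):

```python
# Producer's accuracy (Eq. 10) and user's accuracy (Eq. 11) per class,
# with rows of p as predicted classes and columns as observed classes.
def producers_accuracy(p, i):
    col_total = sum(row[i] for row in p)   # p_{+i}: observed total of class i
    return 100.0 * p[i][i] / col_total

def users_accuracy(p, i):
    row_total = sum(p[i])                  # p_{i+}: predicted total of class i
    return 100.0 * p[i][i] / row_total

p = [[29, 3, 0],   # predicted Stable
     [2, 8, 3],    # predicted Unstable
     [1, 5, 23]]   # predicted Failed (toy counts for illustration)
print(producers_accuracy(p, 0))   # 90.625 (29 of 32 observed Stable recovered)
print(users_accuracy(p, 0))       # 90.625 (29 of 32 predicted Stable correct)
```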

3.3 Validation method of the proposed models

The adjustable "tuning parameters" used by each SL method to optimize classification performance are examined using repeated tenfold cross-validation (CV), ten being the number of folds recommended by Kohavi (1995) as a good compromise between computation time and variance when comparing the performance of machine learning algorithms (Kohavi 1995; Le-Thi-Thu et al. 2011; Clark 2013; Kuhn and Johnson 2013). In this procedure, the training data are randomly divided into 10 subsets. Nine subsets are used as "training data" to fit each SL method, and the holdout subset is used to estimate the performance of the fitted model. This process is repeated 10 times on different training subsets so that every instance is used exactly once for testing, and the CV estimate of overall accuracy is calculated by averaging the 10 individual accuracy measures. The whole tenfold CV process is itself repeated 10 times to obtain reliable results. This procedure is used for the selection of parameters and to avoid over-fitting of the models. The test set is never used in the development of the model; it is used only to assess the predictive power of the final model. Thus, the repeated tenfold CV resampling technique (Molinaro et al. 2005) is used to create and optimize the SL models for hard rock pillar classification in the present work. We construct the predictive models using the selected variables and the training set and then apply them to the test set, as shown in Fig. 4.
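The repeated tenfold CV procedure described above can be sketched as follows. This Python outline stands in for caret's built-in resampling; fit_and_score is a placeholder for training a classifier on the training folds and scoring it on the holdout fold:

```python
# Sketch of repeated 10-fold cross-validation: each repeat shuffles the
# training indices, splits them into 10 folds, and every fold serves once
# as the holdout set; the CV estimate is the mean over all k * repeats scores.
import random

def repeated_kfold(n_samples, fit_and_score, k=10, repeats=10, seed=0):
    rng = random.Random(seed)
    indices = list(range(n_samples))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]
        for f in range(k):
            holdout = set(folds[f])
            train = [i for i in indices if i not in holdout]
            scores.append(fit_and_score(train, folds[f]))
    return sum(scores) / len(scores)

# trivial scorer: fraction of data used for training (always 0.9 with 10 folds)
est = repeated_kfold(100, lambda tr, te: len(tr) / 100)
print(round(est, 6))   # 0.9
```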


[Flowchart: the hard rock pillar stability database (classes Stable, Unstable, Failed) is split into a training set (70 %) and a test set (30 %); the LDA, MLR, ANN, SVM, GBM, and RF classifiers are tuned by repeated 10-fold cross-validation to obtain the optimal predictive SL model, whose performance is then evaluated (accuracy and kappa) in the R environment.]

Fig. 4 Overall procedure flowchart for performance evaluation for PS classification using SL methods

3.4 SL method development and parameter optimization

This study examines the suitability of six common classification algorithms on the PS data set: the LDA, MLR, MLPNN, SVM, GBM, and RF algorithms. Most of the classifiers (the MLR, ANN, SVM, RF, and GBM methods) include parameters that have to be tuned. The "train" function from the caret (classification and regression training) package within R (Kuhn 2008) evaluates a grid of tuning parameters for a number of classification routines, which provides a single consistent environment for training each of the SL methods and tuning their associated parameters. After assessing the optimal parameters, the whole training data set is used to build the final model for PS prediction. The term "PS" refers to the classification task. A desired "tune length" variable can be passed to the "train" function in the caret package (Kuhn 2012; Kuhn and Johnson 2013). "Optimal" values for the tuning parameters are selected using repeated tenfold CV on the original training data set, with the test set removed completely from the CV process. Tuning parameters are considered optimized for the classification model that achieves the largest value of kappa during the CV process; the setting with the highest accuracy and kappa is thus identified as the optimal solution. Specific details on the tuning parameters used by the six SL algorithms examined in the current study are listed in the following sections. The final results (classification rate, kappa, and tuning parameters for each algorithm) are presented in Table 6.
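The tuning logic that caret's train applies, evaluating every candidate parameter setting on resamples and keeping the one with the largest kappa, amounts to a maximization over a grid. A schematic Python sketch, where cv_kappa stands in for a repeated-CV evaluation and the kappa surface below is purely hypothetical:

```python
# Schematic grid search: score every candidate setting and keep the best.
# `cv_kappa` is a placeholder for a repeated-CV kappa estimate of one setting.
def tune(grid, cv_kappa):
    best = max(grid, key=cv_kappa)
    return best, cv_kappa(best)

# hypothetical kappa surface over an ANN-style grid of (hidden units, decay)
grid = [(L, d) for L in (5, 10, 15) for d in (0.001, 0.01, 0.1)]
best, score = tune(grid, lambda p: 0.7 - abs(p[0] - 15) * 0.01 - abs(p[1] - 0.01))
print(best)   # (15, 0.01)
```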

LDA: This classifier needs no tuning of hyperparameters.
MLR: The weight decay parameter (decay) is tuned over 10 values (0, 1e-04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1) to find the optimal model.

RF: Tuning of the RF method involves finding optimum values for the number of classification trees (n_tree) and the number of variables (m_try) randomly selected at each split in the tree-building process. It has been observed that the OA is more sensitive to m_try and not much affected by n_tree (Breiman and Cutler 2007). Therefore, n_tree is fixed at its default value of 500, and m_try is tested over t values, where t is the number of input variables in each classification setup.

Table 6 Tuning parameters of each model for an optimal classification

Method | Model A | Model B | Model C | Model D
LDA | None | None | None | None
MLR | g = 0.1000 | g = 0.1000 | g = 0.0001 | g = 0.1000
RF | {m_try, n_tree} = {2, 500} | {1, 500} | {1, 500} | {4, 500}
ANN | {L, g} = {15, 0.0421} | {19, 0.0075} | {17, 0.0074} | {19, 0.0075}
SVM | {C, σ} = {16, 0.552} | {16, 0.563} | {64, 1.187} | {16, 0.622}
GBM | {n_tree, v, J} = {400, 0.1, 2} | {150, 0.1, 10} | {50, 0.1, 10} | {500, 0.1, 6}

ANN: The number of hidden neurons L is tuned over ten values in the range 1 < L < 19, together with ten values of the weight decay: decay = {0, 1e-04, 0.000237, 0.000562, 0.00133, 0.00316, 0.0075, 0.0178, 0.0422, and 0.1}.
SVM: The parameter C is tuned over 12 values (2^-2, 2^-1, 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, and 2^9) to find the optimal model. The "caret" package initially estimates an approximate value for the sigma parameter using the "sigest" function based on the training data.
GBM: The GBM has three tuning parameters: the total number of iterations (n.trees), the learning rate (shrinkage parameter v), and the complexity of the tree (indexed by interaction.depth J). The tuning parameter "shrinkage" is held constant at a value of 0.1, with n.trees = 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 and J = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
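The SVM and GBM grids listed above can be written out explicitly; the values below are the ones quoted in the text (sigma for the SVM is not part of the grid, since it is pre-estimated by kernlab's sigest):

```python
# The tuning grids quoted in the text, written out explicitly.
svm_C = [2.0 ** e for e in range(-2, 10)]   # 12 values: 2^-2 ... 2^9

gbm_grid = [(n, 0.1, J)                     # shrinkage v fixed at 0.1
            for n in range(50, 501, 50)     # n.trees = 50, 100, ..., 500
            for J in range(1, 11)]          # interaction.depth = 1 ... 10

print(len(svm_C), len(gbm_grid))   # 12 100
```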

4 Results and discussions

4.1 Classification results achieved by classifiers

Average values obtained from 10 repetitions of tenfold CV are used for all the comparisons between methods. A visual comparison of the performance of all SL classification methods is given in Figs. 5 and 6. The boxplots in Fig. 5 report the performance of the six classifiers with different input variables during the repeated tenfold CV phase, in terms of training set average accuracy and variance. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually as dotted points. Since the notches in the box plots overlap, we can conclude, with a certain confidence, that the true medians do not differ (see Fig. 5). The difference between the lower and upper quartiles for RF (Fig. 5a, c) is comparatively smaller than for the other methods, showing a relatively low variance of accuracies across iterations. Density plots can be used to visualize the resampling distributions (Kuhn and Johnson 2013).

123

Nat Hazards (2015) 79:291–316

305

Fig. 5 Boxplot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure

(a) Model A, (b) Model B, (c) Model C, (d) Model D


Fig. 6 Density plot distributions of training set in terms of ‘‘accuracy’’ and ‘‘kappa’’ for six SL methods—resulting from repeated tenfold CV procedure

(a) Model A, (b) Model B, (c) Model C, (d) Model D


Table 7 Performance metrics of each model for test data

Method | Model A OA (%) / Kappa | Model B OA (%) / Kappa | Model C OA (%) / Kappa | Model D OA (%) / Kappa
LDA | 67.8 / 0.461 | 59.5 / 0.345 | 63.5 / 0.389 | 64.9 / 0.424
MLR | 66.2 / 0.436 | 63.5 / 0.392 | 63.5 / 0.388 | 63.5 / 0.400
RF | 82.4* / 0.723 | 75.7 / 0.609 | 71.6 / 0.543 | 73.0 / 0.576
ANN | 81.1 / 0.703 | 75.7 / 0.611 | 75.7* / 0.613* | 75.7 / 0.615
SVM | 82.4* / 0.726* | 68.9 / 0.502 | 71.6 / 0.550 | 79.7* / 0.684*
GBM | 79.7 / 0.678 | 77.0* / 0.636* | 66.2 / 0.474 | 75.7 / 0.616

* Highest value for each model

Figure 6 illustrates density plots of the 200 bootstrap estimates of accuracy and kappa for the final model. Table 7 summarizes the overall predictability of the test set by comparing two measures, OA and kappa, across the six classifiers. The kappa is a measure of true accuracy that takes into account the agreement that may have occurred by chance; it is considered preferable when larger than 0.4 (Landis and Koch 1977). Not surprisingly, the linear methods LDA and MLR did not do well here, likely because they cannot handle nonlinear class boundaries. For model A, in terms of average accuracy on the training set, the SVM predictor achieved the highest OA (84.3 %), followed by ANN, GBM, and RF with average accuracy rates of 83.2, 80.9, and 77.8 %, respectively; MLR performed relatively poorly with an average accuracy rate of 69.3 %, and LDA had the lowest average accuracy rate of 67.0 %. On the test set, however, RF and SVM achieved the highest OA (82.4 %), ANN performed slightly worse (OA 81.1 %), and LDA had the lowest OA (66.2 %). On the other hand, the kappa values of the LDA, MLR, RF, ANN, SVM, and GBM techniques for model A range from moderate to substantial on the scale of concordance of Landis and Koch (1977), for both the training and the test set. For model B, the ANN predictor achieves the highest OA (80.3 %) on the training set, followed by RF, GBM, and SVM with average accuracy rates of 79.1, 78.4, and 76.9 %, respectively; LDA performs relatively poorly with an average accuracy rate of 65.7 %, and MLR has the lowest average accuracy rate of 65.3 %. On the test set, however, GBM achieves the highest OA (77.0 %), RF and ANN follow with an OA of 75.7 %, and LDA has the lowest OA (59.5 %). The kappa values for the model B training set again range from moderate to substantial, while the accuracies for the test set range from fair to substantial. Similar results for models C and D can be seen in Fig. 5 and Table 7. Overall, for the six SL techniques, the training set accuracy falls in the range of 65.3–84.3 % across the four models, while the test set accuracy falls in the range of 59.5–82.4 %. The predictive accuracy of the LDA, MLR, RF, ANN, SVM, and GBM techniques for the training set ranges from moderate to substantial, and that of all modeling techniques for the test set ranges from fair to substantial.


4.2 Comparison of SL classification techniques

Out of the four models in Table 7, the results indicate the better performance of SVM with model A, which uses five parameters (pillar height, pillar width, UCS, pillar strength, and pillar stress) to train the SVM; a classification accuracy of 84.3 % is achieved with the RBF kernel. Out of 74 test data records, 17 are incorrectly classified (Table 8). As can be seen from Table 8, model A provides the best results for the test data, with RF producing the best outcome in terms of OA and kappa. A higher classification accuracy on the training set than on the test set does not necessarily indicate a better generalization capability; it may be due to over-training (memorizing the training set). A comparison of the results from models A and D suggests that the contribution of pillar strength is somewhat sensitive (Fig. 5; Table 8). However, there are very few significant differences in OA and kappa between models B and C, and the combinations of five (model A) or four (model D) input parameters give better results than the combinations of three input parameters (models B and C), suggesting that the five-parameter combination (model A) is the best choice.

For quantifying the accuracy of the classifications, an accuracy assessment based on a confusion (error) matrix is carried out using the 74 independent test samples. Four metrics (Congalton and Green 2009) are calculated for each PS class of model A from the confusion matrix presented in Table 8: (1) OA, (2) kappa, (3) producer's accuracy (PA), and (4) user's accuracy (UA). The PA and UA indicate that some classes are better classified than others. As can be seen from Table 8, "Stable" (PA 90.6–96.9 % and UA 66.0–93.5 %) is classified more accurately than "Unstable" (0–56.3 % and 0–69.2 %). "Failed" also receives a relatively low PA (69.2–92.3 %) and UA (69.2–88.5 %). "Unstable" is often confused with "Stable" and "Failed," which is likely the effect of overlapping classification rules and the small number of samples in this class. The results show that some SL methods substantially outperform others for this classification problem. It is clear that SVM and RF are both capable of achieving high accuracy for all classes despite the heavily unbalanced data set. In particular, the five-feature SVM and RF classifications are very effective and yield the best classification results, with an OA of 82.4 %.

As can be observed in Fig. 6, the kappa distributions of LDA and MLR appear similar to one another but different from those of the ANN, SVM, RF, and GBM methods. There are very few statistically significant differences between the latter four methods based on the resampling results; given this, any of them would be a reasonable choice. Since the models are fit on the same versions of the training data, it makes sense to make inferences on the differences between models. From the above observations, the predictions vary considerably across the different SL methods, and none of the methods is excellent with respect to every test measure; however, SVM and RF, and particularly SVM, perform better than ANN and GBM. There is no significant difference in generalization performance, but if we have to choose the best, SVM is the one. However, GBM and ANN are the most computationally intensive techniques and take the longest time to train; the high classification accuracy of ANN likely results from the computationally intensive back-propagation process, during which the feature weights are modified iteratively. In addition, LDA performs the worst on the training set, where its predictive performance is also worse than that of MLR, whereas MLR performs the worst on the test set, where its predictive performance is worse than that of LDA.


Table 8 Confusion-matrix summary and associated classifier accuracies for the best model (model A) predictions based on the test data of hard rock pillars (74 cases: 32 Stable, 16 Unstable, 26 Failed)

Method | OA (%) | Kappa | PA (%) S / U / F | UA (%) S / U / F
LDA | 67.8 | 0.461 | 96.9 / 6.3 / 69.2 | 67.4 / 50.0 / 69.2
MLR | 66.2 | 0.436 | 96.9 / 0.0 / 69.2 | 66.0 / 0.0 / 69.2
RF | 82.4 | 0.723 | 90.6 / 56.3 / 88.5 | 82.9 / 69.2 / 88.5
ANN | 81.1 | 0.703 | 90.6 / 50.0 / 88.5 | 85.3 / 57.1 / 88.5
SVM | 82.4 | 0.726 | 90.6 / 56.3 / 88.5 | 93.5 / 64.3 / 79.3
GBM | 79.7 | 0.678 | 90.6 / 37.5 / 92.3 | 82.9 / 54.5 / 85.7

OA overall classification accuracy, PA producer's accuracy, UA user's accuracy; classes: S Stable, U Unstable, F Failed

Furthermore, Lunder and Pakalnis (1997) proposed that pillar stability can be adequately expressed by two SF lines: pillars with SF > 1.4 are stable, while those with SF < 1 are failed, and the transition zone from the stable to the failed condition (1 < SF < 1.4) is referred to as unstable; pillars in this region are prone to spalling and slabbing but have not completely failed (Lunder and Pakalnis 1997; Martin and Maybee 2001). Similarly, a classification of pillars in marble mines has been established by González-Nicieza et al. (2006), who suggested: SF > 1.25 (stable), 0.90 < SF < 1.25 (unstable), and SF < 0.90 (failed). The predictive accuracies of the two methods on the original data are 68.9 % (Lunder and Pakalnis 1997) and 68.5 % (González-Nicieza et al. 2006), respectively. Judging from these predictive accuracies, it is obvious that the empirical methods cannot generate satisfactory predictions for these pillar instances.
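The two empirical safety-factor rules quoted above are simple threshold classifiers; written out as functions (the thresholds are from the cited papers, the function names are ours):

```python
# Empirical SF-based pillar stability rules, as quoted in the text.
def lunder_pakalnis(sf):
    """Lunder and Pakalnis (1997): SF > 1.4 stable, SF < 1 failed, else unstable."""
    if sf > 1.4:
        return "stable"
    if sf < 1.0:
        return "failed"
    return "unstable"      # transition zone: prone to spalling and slabbing

def gonzalez_nicieza(sf):
    """Gonzalez-Nicieza et al. (2006): SF > 1.25 stable, SF < 0.90 failed."""
    if sf > 1.25:
        return "stable"
    if sf < 0.90:
        return "failed"
    return "unstable"

print(lunder_pakalnis(1.3), gonzalez_nicieza(1.3))   # unstable stable
```

Note that the two rules disagree in the band 1.25 < SF < 1.4, one reason their accuracies on this data set differ despite similar thresholds.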

4.3 Superiority and limitations

The primary strength of this study is its systematic assessment of pillar stability classification in hard rock mines using six SL methods. A powerful statistical programming language is necessary to implement this computing task; a good and inexpensive choice is the R system, which offers free implementations of the SL classification methods.

Although there are important findings revealed by this study, there are also limitations. First, training any classifier with an unbalanced data set makes it difficult to learn the under-represented classes; the unbalanced prior probabilities of the three classes in both the training and test sets affect the reliability of the predictor in all SL approaches. Second, other relevant variables of pillar stability could be collected in an effort to increase the prediction accuracy of the models. Third, regarding the comparison of further SL methods, other newly developed classification methodologies could be employed and their results compared with those studied in this paper. Additionally, discontinuities and joint factors have been omitted in this study. Finally, every approach has its advantages and disadvantages; Table 9 summarizes the strengths and weaknesses of the six SL methods described above.

4.4 Relative importance of variables

The generic function varImp() in the caret package can be used to characterize the general effect of the predictors on the model (Kuhn and Johnson 2013). varImp also works with objects produced by train, for which it is a simple wrapper around the model-specific methods listed previously. In this work, we illustrate how to determine the relative importance of the discriminating features with the RF method. For most classification methods, each predictor has a separate variable importance for each class, and the default model-independent importance metric is based on the area under the ROC curve (AUC) computed for each predictor.

Variables are sorted by their average importance across the classes in this work. Figure 7 shows how important the variables are for PS classification with the RF method. Figure 7a demonstrates that pillar stress is the most sensitive factor among the indicators for the prediction of PS classification. Not surprisingly, the indicator pillar strength takes second place in sensitivity; pillar width is somewhat sensitive, while UCS and pillar height are not as sensitive as the former three factors. Figure 7b, c also presents the results of the RF method obtained with the function varImp() in the caret package and displays the relative importance of each of the three predictor variables: pillar stress is again the most sensitive factor among the predictor variables for models B and C, followed by pillar strength, K, and UCS. Figure 7d shows the differing results between models A and D for the variable importance of UCS and pillar width. Overall, these results demonstrate that pillar stress is the most relevant predictor among the indicators for the prediction of PS classification.

Table 9 Summary of strengths and weaknesses of the various SL methods

LDA
Strengths: simple, fast, efficient; strong theoretical basis; linear; interpretable.
Weaknesses: performs well only when all classes are fairly homogeneous; cannot be used if the number of variables exceeds the total number of samples.

MLR
Strengths: extracts more information from the data and prevents the loss of information due to collapsing categories.
Weaknesses: computationally intensive.

RF
Strengths: works well with high-dimensional small sample sizes; some tolerance to correlated inputs; fast computation.
Weaknesses: difficult to interpret; prone to overfitting on certain data sets; does not handle large numbers of irrelevant features as well as other ensemble methods.

SVM
Strengths: can classify complex nonlinear data; not prone to local minima; works well with high-dimensional small data sets; avoids overfitting; robust to noise.
Weaknesses: black box; limited computational scalability; lack of transparency; restricted to pairwise classification; cannot be used directly for feature selection.

ANN
Strengths: nonlinear adaptability; no assumptions required on probability density and distribution; certain configurations have been proven to be universal approximators.
Weaknesses: difficult to design an optimal architecture; high computational cost; not robust to outliers; loss of generality; risk of overfitting; prone to sub-optimal local minima; inability to extract the features responsible for the results; its black-box nature presents uncertainties for mission-critical applications.

GBM
Strengths: good theoretical properties; identification of outliers.
Weaknesses: high computational cost; not interpretable.

Fig. 7 Variable importance assessment in each model for predicting PS with the RF method: (a) Model A, (b) Model B, (c) Model C, (d) Model D

5 Summary and conclusions

A data set of 251 pillar cases compiled from published research work of recent years is utilized to construct the proposed models, and the performances of six SL classifiers for the prediction of PS in hard rock mines are compared. Based on the analysis results, the following conclusions can be drawn. First, none of the SL classification methods should be used blindly, as none of them is a fully automatic classification method. For the PS data set, the use of a repeated tenfold CV strategy for selecting appropriate parameters in the tuning process is necessary, and multiple random splits into training and test sets are also needed for a reliable model comparison. Among the six SL methods, SVM and RF are found to be the best; the nonlinear classification methods ANN and GBM have slightly higher performance and reliability than the linear classifiers LDA and MLR. The comparisons indicate that model A, consisting of five input variables (pillar width, pillar height, UCS, pillar strength, and pillar stress), with the SVM and RF methods is more reliable for evaluating pillar stability than the other models. RF demonstrates that pillar stress is the most relevant PS predictor for all of models A, B, C, and D. Finally, for the six SL classifiers studied, the training set accuracy falls in the range of 65.3–84.3 % across the four models with different input parameter combinations, while the test set accuracy falls in the range of 59.5–82.4 %.


Acknowledgments This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 11472311 and 41272304), the Graduate Students' Research and Innovation Fund Project of Hunan Province of China (Grant No. CX2011B119), the Scholarship Award for Excellent Doctoral Students of the Ministry of Education of China (Grant No. 1343-76140000022), and the Valuable Equipment Open Sharing Fund of Central South University. The authors would like to express their thanks to these foundations. The first author would like to thank the Chinese Scholarship Council for financial support of his joint PhD at McGill University, Canada. We also thank the three anonymous referees and the editors for their valuable comments and suggestions, which improved a previous version of this manuscript.

References

Berrueta LA, Alonso-Salces RM, Heberger K (2007) Supervised pattern recognition in food analysis. J Chromatogr A 1158(1–2):196–214 Bieniawski ZT (1968) The effects of specimen size on the compressive strength of coal. Int J Rock Mech Min Sci Geomech Abstr 5(4):325–335 Bieniawski ZT, Van Heerden WL (1975) The significance of in situ tests on large rock specimens. Int J Rock Mech Min Sci Geomech Abstr 12(4):101–113. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, New York Brady B, Brown ET (2003) Rock mechanics for underground mining, 2nd edn. Chapman and Hall, London Breiman L (2001) Random forests. Mach Learn 45:5–32 Breiman L, Cutler A (2007) Randomforests—classification description: randomforests. http://stat-www. berkeley.edu/users/breiman/RandomForests/cc_home.htm. Accessed 15 Jan 2014 Cauvin M, Verde T, Salmon R (2009) Modeling uncertainties in mining pillar stability analysis. Risk Anal

29(10):1371–1380

Clark M (2013) An introduction to machine learning: with applications in R. http://www3.nd.edu/

Congalton RG, Green K (2009) Assessing the accuracy of remotely sensed data: principles and practices, 2nd edn. Lewis, Boca Raton Deng J, Yue ZQ, Tham LG, Zhu HH (2003) Pillar design by combining finite element methods, neural networks and reliability: a case study of the Feng Huangshan copper mine, China. Int J Rock Mech Min Sci 40(4):585–599 Elmo D, Stead D (2010) An integrated numerical modelling–discrete fracture network approach applied to the characterisation of rock mass strength of naturally fractured pillars. Rock Mech Rock Eng

43(1):3–19

Esterhuizen GS (1993) Variability considerations in hard rock pillar design. In: Proceedings of the SAN- GORM symposium: rock engineering problems related to hard rock mining at shallow to intermediate depth, Rustenburg, South Africa Esterhuizen GS, Dolinar DR, Ellenberger JL (2011) Pillar strength in underground stone mines in the United States. Int J Rock Mech Min Sci 48(1):42–50 Fisher DH (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(7):179–188 Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat

29(5):1189–1232

Garzo´ n MB, Blazek R, Neteler M, de Dios RS, Ollero HS, Furlanello C (2006) Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol Model 97:383–393 Ghasemi E, Shahriar K (2012) A new coal pillars design method in order to enhance safety of retreat mining in room and pillar mines. Saf Sci 50:579–585 Ghasemi E, Shahriar K, Sharifzadeh M, Hashemolhosseini H (2010) Quantifying the uncertainty of pillar safety factor by Monte Carlo simulation—a case study. Arch Min Sci 55:623–635 Ghasemi E, Ataei M, Shahriar K (2014a) An intelligent approach to predict pillar sizing in designing room and pillar coal mines. Int J Rock Mech Min Sci 65:86–95 Ghasemi E, Ataei M, Shahriar K (2014b) Prediction of global stability in room and pillar coal mines. Nat Hazards 1–18 Gonza´ lez-Nicieza C, Alvarez-Fernandez MI, Mene´ ndez-Dı´az A, Alvarez-Vigil AE (2006) A comparative analysis of pillar design methods and its application to marble mines. Rock Mech Rock Eng

39(5):421–444

123

314

Nat Hazards (2015) 79:291–316

González-Rufino E, Carrión P, Cernadas E, Fernández-Delgado M, Domínguez-Petit R (2013) Exhaustive comparison of colour texture features and classification methods to discriminate cells categories in histological images of fish ovary. Pattern Recogn 46(9):2391–2407
Griffiths DV, Fenton GA, Lemons CB (2002) Probabilistic analysis of underground pillar stability. Int J Numer Anal Meth Geomech 26(8):775–791
Griffiths DV, Fenton GA, Lemons CB (2007) The random finite element method (RFEM) in mine pillar stability analysis. Probabilist Methods Geotech Eng 491:271–294
Guelman L (2012) Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Syst Appl 39:3659–3667
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, New Jersey
Hedley DGF, Grant F (1972) Stope-and-pillar design for the Elliot Lake Uranium Mines. Bull Can Inst Min Metall 65:37–44
Hoek E, Brown ET (1980) Underground excavation in rock. The Institute of Mining and Metallurgy, London
Hudyma MR (1988) Rib pillar design in open stope mining. MASc thesis, University of British Columbia
Hustrulid WA (1976) A review of coal pillar strength formulas. Rock Mech 8(2):115–145
Hutchinson DJ, Phillips C, Cascante G (2002) Risk considerations for crown pillar stability assessment for mine closure planning. Geotech Geol Eng 20(1):41–63
Jaiswal A, Shrivastva BK (2009) Numerical simulation of coal pillar strength. Int J Rock Mech Min Sci 46(4):779–788

Jaiswal A, Sharma SK, Shrivastva BK (2004) Numerical modeling study of asymmetry in the induced stresses over coal mine pillars with advancement of the goaf line. Int J Rock Mech Min Sci 41(5):859–864

Jawed M, Sinha RK, Sengupta S (2013) Chronological development in coal pillar design for bord and pillar workings: a critical appraisal. J Geol Min Res 5(1):1–11
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11(9):1–20
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI'95 Proceedings of the 14th international joint conference on artificial intelligence, vol 2. Morgan Kaufmann Publishers Inc., San Francisco, pp 1137–1143
Kordon AK (2010) Applying computational intelligence: how to create value. Springer, Berlin
Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ (2005) Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans Pattern Anal Mach Intell 27(6):957–968
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
Kuhn M (2012) caret package (R package version 5.15-023). R Foundation for Statistical Computing, Vienna, Austria
Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York
Landis J, Koch G (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

Li XB, Li DY, Guo L, Ye ZY (2007) Study on mechanical response of highly-stressed pillars in deep mining under dynamic disturbance. Chin J Rock Mech Eng 26(5):922–928
Li XB, Li DY, Liu ZX, Zhao GY, Wang WH (2013) Determination of the minimum thickness of crown pillar for safe exploitation of a subsea gold mine based on numerical modelling. Int J Rock Mech Min Sci 57:42–56
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22
Liu XZ, Zhai DY (2000) The reliability design of pillar. Chin J Rock Mech Eng 18(6):85–88
Liu ZB, Shao JF, Xu WY, Meng YD (2013) Prediction of rock burst classification using the technique of cloud models with attribution weight. Nat Hazards 68(2):549–568
Liu ZB, Shao JF, Xu WY, Chen HJ, Zhang Y (2014) An extreme learning machine approach for slope stability evaluation and prediction. Nat Hazards 1–18
Lunder PJ (1994) Hard rock pillar strength estimation: an applied empirical approach. MASc thesis, University of British Columbia, Vancouver
Lunder PJ, Pakalnis R (1997) Determination of the strength of hard rock mine pillars. Bull Can Inst Min Metall 90(1013):51–55
Mark C (2006) The evolution of intelligent coal pillar design: 1981–2006. In: Proceedings of the 25th international conference on ground control in mining. West Virginia University, Morgantown, pp 325–334


Mark C, Barton TM (1997) Pillar design and coal strength. In: Proceedings of the new technology for ground control in retreat mining. US Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Pittsburgh, PA, pp 49–59. DHHS (NIOSH) Publication No. 97–122
Martin CD, Maybee WG (2000) The strength of hard-rock pillars. Int J Rock Mech Min Sci 37:1239–1246
Mitri HS (2007) Assessment of horizontal pillar burst in deep hard rock mines. Int J Risk Assess Manag 7(5):695–707

Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Monjezi M, Hesami SM, Khandelwal M (2011) Superiority of neural networks for pillar stress prediction in bord and pillar method. Arab J Geosci 4(5–6):845–853
Mortazavi A, Hassani FP, Shabani M (2009) A numerical investigation of rock pillar failure mechanism in underground openings. Comput Geotech 36(5):691–697
Pandya DH, Upadhyay SH, Harsha SP (2014) Fault diagnosis of rolling element bearing by using multinomial logistic regression and wavelet packet transform. Soft Comput 18(2):255–266
Pino-Mejías R, Carrasco-Mairena M, Pascual-Acosta A, Cubiles-De-La-Vega M, Muñoz-García J (2008) A comparison of classification models to identify the Fragile X Syndrome. J Appl Stat 35(3):233–244
Pino-Mejías R, Cubiles-de-la-Vega MD, Anaya-Romero M, Pascual-Acosta A, Jordán-López A, Bellinfante-Crocci N (2010) Predicting the potential habitat of oaks with data mining models and the R system. Environ Model Softw 25(7):826–836
Potvin Y, Hudyma M, Miller HDS (1989) Rib pillar design in open stope mining. Bull Can Inst Min Metall 82(927):31–36

Pozdnoukhov A, Foresti L, Kanevski M (2009) Data-driven topo-climatic mapping with machine learning methods. Nat Hazards 50(3):497–518
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. http://cran.r-project.org/web/packages/gbm/index.html
Ripley B (2009) nnet: feed-forward neural networks and multinomial log-linear models. http://cran.r-project.org/web/packages/nnet/index.html

Sadat-Hashemi SM, Kazemnejad A, Lucas C, Badie K (2005) Predicting the type of pregnancy using artificial neural networks and multinomial logistic regression: a comparison study. Neural Comput Appl 14(3):198–202
Sakiyama Y, Yuki H, Moriya T, Hattori K, Suzuki M, Shimada K, Honma T (2008) Predicting human liver microsomal stability with machine learning techniques. J Mol Graph Model 26:907–915
Salamon MDG (1970) Stability, instability and design of coal pillar workings. Int J Rock Mech Min Sci Geomech Abstr 7(6):613–631
Salamon MDG, Munro AH (1967) A study of the strength of coal pillars. J South Afr Inst Min Metall 68:55–67
Sheng JH, Liao WJ, Li WM (2010) Analysis of pillar safety factor in Gaoshan gypsum mine. Metal Mine (suppl.):791–793

Sjoberg JS (1992) Failure modes and pillar behaviour in the Zinkgruvan mine. In: Tillerson JA, Wawersik WR (eds) Proceedings of the 33rd U.S. rock mechanics symposium, Santa Fe. A.A. Balkema, Rotterdam, pp 491–500
Tawadrous AS, Katsabanis PD (2007) Prediction of surface crown pillar stability using artificial neural networks. Int J Numer Anal Meth Geomech 31(7):917–931
Tesfamariam S, Liu Z (2010) Earthquake induced damage classification for reinforced concrete buildings. Struct Saf 32(2):154–164
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York
Von Kimmelman MR, Hyde B, Madgwick RJ (1984) The use of computer applications at BCL Limited in planning pillar extraction and design of mining layouts. In: Proceedings of the ISRM symposium: design and performance of underground excavations. British Geotechnical Society, London, pp 53–63
Wattimena RK (2014) Predicting the stability of hard rock pillars using multinomial logistic regression. Int J Rock Mech Min Sci 71:33–40
Wattimena RK, Kramadibrata S, Sidi ID, Azizi MA (2013) Developing coal pillar stability chart using logistic regression. Int J Rock Mech Min Sci 58:55–60
York G (1998) Numerical modelling of the yielding of a stabilizing pillar/foundation system and a new design consideration for stabilizing pillar foundations. J South Afr Inst Min Metall 98:281–293


Zhou J, Shi XZ, Dong L, Hu HY, Wang HY (2010) Fisher discriminant analysis model and its application for prediction of classification of rockburst in deep-buried long tunnel. J Coal Sci Eng 16(2):144–149
Zhou J, Li XB, Shi XZ, Wei W, Wu BB (2011) Predicting pillar stability for underground mine using Fisher discriminant analysis and SVM methods. Trans Nonferrous Metals Soc China 21(12):2734–2743
Zhou J, Li XB, Shi XZ (2012) Long-term prediction model of rockburst in underground openings using heuristic algorithms and support vector machines. Saf Sci 50(4):629–644
Zhou J, Li XB, Mitri HS, Wang SM, Wei W (2013) Identification of large-scale goaf instability in underground mine using particle swarm optimization and support vector machine. Int J Min Sci Technol 23(5):701–707
