
Computers & Geosciences 39 (2012) 64–76


Support vector regression to predict porosity and permeability: Effect of sample size

A.F. Al-Anazi, I.D. Gates*
Department of Chemical and Petroleum Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4
* Corresponding author. Tel.: +1 403 220 5752; fax: +1 403 284 4852. E-mail address: ian.gates@ucalgary.ca (I.D. Gates).

Article history: Received 3 December 2010; received in revised form 21 May 2011; accepted 21 June 2011; available online 2 July 2011.

Keywords: Porosity prediction; Permeability prediction; Small sample size; Loss function; Support vector machines; Artificial neural networks; Log data; Core data

Abstract

Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. In particular, the impact of Vapnik's ε-insensitivity loss function and the least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean squared error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. Also, the performance of SVR depends on both the kernel function type and the loss function used.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

One of the most important tasks in modeling geoscience data is the development of robust and accurate correlation models calibrated to small sample size problems. For permeability estimation, correlation models between porosity and permeability are often built based on the relationship between geophysical logs and core-measured porosity and permeability. In typical practice, core plugs are extracted from a few key wells during drilling whereas geophysical logs are run for all wells in the oil/gas field. The limited number of core plug data poses a challenging problem to existing empirical techniques that employ the empirical risk minimization (ERM) principle, such as linear regression and neural networks. Statistical learning theory (SLT) shows that these techniques can be safely used as a measure of the true risk when the sample size is sufficiently large. SLT calls for introducing a structure to match the complexity of the predictive learning technique to the available training data. The structure is composed of elements of increasing complexity that need to be chosen to imitate the response of the learning problem using a limited number of data (Cherkassky and Mulier, 2007).

An artificial neural network (ANN) is a universal approximator that is capable of approximating any nonlinear function to any degree of accuracy provided that there are a sufficient number of neurons in the network (Hornik et al., 1989). The structure implemented by an ANN may be captured by the number and size of the hidden layers, which are controlled explicitly by the user. This structure may also lead to an overfitting problem during learning, particularly in the presence of a small sample size, which potentially yields a poor generalization model. Although ANNs have shown some successful applications to porosity and permeability prediction (Helle and Ursin, 2001; Huang et al., 2001), the underlying learning algorithm has been developed for learning problems with large sample sizes. Hence, for a given small sample size, extensive experiments with several different learning techniques are required to devise an accurate ANN-based regression model (Kaviani et al., 2008).

Nomenclature

b      bias constant
c      regularization parameter
DT     sonic porosity log
f      an unknown function
GR     gamma ray log
h      Vapnik–Chervonenkis dimension
ILD    deep inductive laterolog
K      kernel function / permeability
L      Lagrangian of the dual programming problem, or loss function
MLP    multilayer perceptron neural network
NPHI   neutron porosity log
R      structural risk
Remp   empirical risk
RBF    radial basis function
RHOB   bulk density log
SVM    support vector machine
SVR    support vector regression
w      weight vector
x      input variable
y      output variable
ŷ      estimated output value

Greek symbols

α, α*  Lagrange multipliers to be determined
ε      error accuracy
η, η*  Lagrange multipliers
κ, ϑ   sigmoid function parameters
ξ, ξ*  slack variables

Subscripts and superscripts

k, l   indices
N      number of samples
n      input space dimension

Recently, support vector machines (SVMs) have been gaining popularity in regression and classification due to their excellent generalization performance. The SVM approach has been successfully applied to several different applications such as face recognition, object detection, handwriting recognition, text detection, speech recognition and prediction, lithology identification, and porosity and permeability determination from log data (Li et al., 2000; Lu et al., 2001; Choisy and Belaid, 2001; Gao et al., 2001; Kim et al., 2001; Ma et al., 2001; Van Gestel et al., 2001; Al-Anazi and Gates, 2010a, 2010b, 2010c, 2010d). The SVM formulation is based on the structural risk minimization (SRM) inductive principle, in which the empirical risk and the Vapnik–Chervonenkis (VC) confidence interval are simultaneously minimized (Vapnik and Chervonenkis, 1974; Vapnik, 1982, 1995). The SRM principle introduces a structure where each element of the structure is indexed by a measure of complexity defined by the margin size between two classes in a classification learning problem and by the insensitivity zone size in a regression problem (Cherkassky and Mulier, 2007). The SVM optimization formulation implicitly matches a suitable structure of certain complexity to the available small-size sample. This type of structure is controlled independently of the dimension of the problem, which is an advantage over classical learning techniques. In regression applications, the empirical error (the training error) is minimized by Vapnik's ε-insensitivity loss function rather than the quadratic error and absolute-value loss functions used in neural networks and classical regression methods. To generalize to nonlinear regression, kernel functions are used to project the input space into a feature space where a linear or nearly linear regression hypersurface results. A regularization term is used to determine a trade-off between the training error and the VC confidence term. The learning problem is formulated as a constrained convex optimization problem whose solution is used to construct the mapping function between the empirical input and output data (Kecman, 2005).

Previously, our research demonstrated the generalization capability of SVMs in lithology classification and porosity and permeability prediction, with sensitivity analysis of kernel function types and SVM regularization parameters. In this research, the sensitivity of SVM-based prediction of porosity and permeability to sample size and to the empirical loss function is examined and compared to a multilayer perceptron network. The empirical evaluation of the generalization performance under a small sample setting is conducted for two loss functions: first, the ε-insensitivity loss function, and second, the least-modulus (or absolute value) loss function.

2. Background

2.1. Multilayer perceptron neural network model

ANNs have been frequently used as intelligent regression techniques in petrophysical property estimation (Rogers et al., 1995; Huang et al., 1996, 2001; Fung et al., 1997; Helle and Ursin, 2001; Helle and Bhatt, 2002). Backpropagation multilayer perceptron neural networks are considered to be universal approximators: it has been mathematically proven that a network with a hidden layer of an arbitrarily large number of nonlinear neurons can approximate any continuous nonlinear function over a compact subset to any desirable accuracy (Hornik et al., 1989). In our study, a backpropagation conjugate gradient learning algorithm is used to train the multilayer perceptron (MLP) network by minimizing a squared residual cost function. During training, input patterns are propagated forward through the hidden layers toward the output, while the output error signal is backpropagated toward the input layer to adjust the weights of the hidden and output layers in order to approximate the target hypersurface. One well-known problem with such a training algorithm is that it can get trapped in local minima because the algorithm performance is sensitive to the selection of the starting weight values (Hastie et al., 2001). To overcome this issue, the range of initial weight values is chosen by the Nguyen–Widrow algorithm (Nguyen and Widrow, 1990). The conjugate gradient algorithm is then used to optimize the weights. Optimization is done several times with different (randomly chosen) starting weight values to improve the chances of converging to the global solution. The MLP network can also overfit the training data, leading to poor generalization to new data. In this study, a cross-validation technique was used to terminate training and select the best model (Sherrod, 2009).
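The paper trains its MLP with a conjugate-gradient backpropagation algorithm, Nguyen–Widrow initialization, and cross-validated stopping (via DTREG). As a rough, non-authoritative sketch of this kind of workflow, the Python snippet below trains several small single-hidden-layer MLP regressors from different random starts and keeps the one with the lowest validation error; the quasi-Newton solver, hidden-layer sizes, and variable names are assumptions standing in for features (conjugate gradient, Nguyen–Widrow) that scikit-learn does not expose.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

def train_mlp(X, y, hidden_sizes=(5, 10, 15), n_restarts=5, seed=0):
    """Train single-hidden-layer MLPs from several random starts and keep
    the model with the lowest validation MSE (a stand-in for the
    cross-validated stopping described in the paper)."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    best_model, best_mse = None, np.inf
    for h in hidden_sizes:
        for restart in range(n_restarts):
            model = make_pipeline(
                StandardScaler(),
                MLPRegressor(hidden_layer_sizes=(h,), activation="logistic",
                             solver="lbfgs", max_iter=2000,
                             random_state=seed + restart),
            )
            model.fit(X_tr, y_tr)
            mse = np.mean((model.predict(X_val) - y_val) ** 2)
            if mse < best_mse:
                best_model, best_mse = model, mse
    return best_model, best_mse
```

Restarting from multiple random initializations plays the same role as the repeated optimizations with random starting weights mentioned above: it reduces the chance of reporting a model stuck in a poor local minimum.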

2.2. Statistical learning theory

A major goal of the Vapnik–Chervonenkis (VC) theory is to characterize the generalization error instead of the validation error adopted by conventional statistical learning techniques such as neural networks. The generalization error is defined by

R(w) = \int L(y, f(x,w)) \, p(x,y) \, dx \, dy,

where L(y, f(x,w)) is a loss function and p(x,y) is the joint probability distribution. Since the probability distribution is unknown, one can bound the true risk based on VC theory. Vapnik and Chervonenkis (1971) introduced a complexity measure, called the VC dimension of the learning model, to control the convergence of the ERM to the true risk. In a given binary classification learning problem, a set of functions has VC dimension h if there exists a maximum number of training points, h, upon which all possible separations can be correctly generated by members of the set of functions (Van Gestel et al., 2001; Cherkassky and Mulier, 2007). Based on SLT, the empirical risk can be safely used as a measure of the true risk when the sample size (really, the ratio of the number of training points to the VC dimension of the approximating functions) is large. However, for a small sample size there exists a large deviation between the empirical risk and the true risk that cannot be neglected. This leads to controlled learning processes, where the complexity of the approximating functions is matched to the available finite samples, and hence to the SRM inductive principle, in which a structure is defined on a set of approximating functions and an element of optimal capacity (that is, VC dimension) with low empirical error is selected (Cherkassky and Mulier, 2007). The concept of SRM can be illustrated with the following regression generalization bounds on the true risk formulated by SLT (Vapnik, 1995). The multiplicative analytic bound on the generalization ability of a learning machine holds with probability of at least 1 - \eta for all functions, including the function minimizing the empirical risk:

R(w) \le \frac{R_{emp}(w)}{1 - c\sqrt{\varepsilon}},   (1)

where

\varepsilon = a_1 \, \frac{h\left(\ln(a_2 N / h) + 1\right) - \ln(\eta/4)}{N}.

With a_1 = a_2 = 1, N the sample size, and setting \eta = \min(4/\sqrt{N}, 1), the VC bound on the true risk is then given by (Vapnik, 1995; Cherkassky and Mulier, 2007)

R(w) \le \frac{R_{emp}(w)}{\left(1 - \sqrt{p - p\ln p + (\ln N / 2N)}\right)_{+}},   (2)

where p = h/N and h is the VC dimension, which characterizes the capacity of the set of functions. The notation (1/x)_{+} indicates that (1/x)_{+} = 1/x if x > 0 and 0 otherwise. The denominator of this multiplicative bound is the confidence term that depends on the VC dimension. Similar observations to the additive bound can be drawn with respect to sample size. For a large sample size, keeping the other parameters fixed, the value of ε (Eq. (1)) becomes so small that the empirical risk converges to the true risk. For small samples, on the other hand, the ratio must be minimized through the use of the SRM principle (Cherkassky and Mulier, 2007). Therefore, the generalization bounds lead to the development of the SRM inductive principle and can also be used, like other classical bounds, for model selection, such as resampling methods (i.e., cross-validation).

Since the VC dimension is difficult to measure in real-life problems, a structure based on the margin-based loss function was introduced by Vapnik (1995) to capture the complexity of approximating functions efficiently. For both classification and regression problems, this structure is controlled independently of the dimension of the input space. This learning characteristic is also well pronounced when the input space must be transformed to a higher dimensional space to solve nonlinear problems.

2.3. SVM methodology

Although SVMs were originally developed to solve pattern recognition problems, with the introduction of Vapnik's ε-insensitive loss function they have also been used to solve nonlinear regression estimation problems; these methods are referred to as support vector regression (Vapnik, 1995). Here, a brief review of the SVM method is presented. More details can be found in Al-Anazi and Gates (2010a, 2010b, 2010c, 2010d).

A linear regression function can be expressed by (Suykens et al., 2002)

f(x) = w^T x + b   (3)

with input vectors x_k \in R^n and output values y_k \in R for the N given training points. The empirical risk is given by

R_{emp} = \frac{1}{N} \sum_{k=1}^{N} \left| y_k - w^T x_k - b \right|_{\varepsilon}   (4)

with Vapnik's ε-insensitive loss function defined as

|y - f(x)|_{\varepsilon} = \begin{cases} 0 & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon & \text{otherwise.} \end{cases}   (5)

This problem can be reformulated in a dual space as

\max J_D(\alpha, \alpha^*) = -\frac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*) \, x_k^T x_l - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*)   (6)

subject to

\sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \quad \alpha_k, \alpha_k^* \in [0, c],

where \alpha_k, \alpha_k^* \ge 0 are positive Lagrange multipliers. Once the optimal Lagrange multiplier pairs are determined, the linear hypersurface regression is

f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) \, x_k^T x + b   (7)

with

w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) \, x_k.   (8)

The training points with nonzero \alpha_k values correspond to the free support vectors, which allow the calculation of the bias term b. The safest practice to compute the bias term is to solve the regression function for the bias term and then average over all the free support vectors (Kecman, 2005). The solution reveals the merits of the support vector machine over conventional statistical learning algorithms: it is both global and unique. The dual problem formulation is independent of the input space dimension and depends only on the number of training patterns (Suykens et al., 2002). Generalization to nonlinear regression estimation can be done by the application of kernel functions. Typical kernel functions are the linear, Gaussian radial basis (RBF), polynomial, and sigmoid functions (Al-Anazi and Gates, 2010a, 2010b, 2010c, 2010d). Both the RBF and sigmoid kernel functions, as defined in Table 1, were used to model porosity and permeability in this study.

Table 1
Common kernel functions and corresponding mathematical expressions.

Gaussian radial basis function:  k(x_i, x) = \exp\left(-\|x_i - x\|^2 / 2\sigma^2\right)
Sigmoid:                         k(x_i, x) = \tanh\left(\kappa \langle x_i, x \rangle + \vartheta\right)
The nonlinear hypersurface regression model is given by

f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) \, K(x, x_k) + b,   (9)

where \alpha_k, \alpha_k^* are the solution of the quadratic programming problem above and the bias term b is calculated as an average value over the free support vectors. Again, the solution is global and unique as long as the kernel function is positive definite.
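The ε-SVR described by Eqs. (3)–(9) with the kernels of Table 1 is available in standard libraries. Below is a minimal, illustrative sketch (not the authors' implementation) using scikit-learn's SVR, which solves the dual problem above; the parameter values shown for C (the regularization constant c), ε, and the kernel parameters are placeholders to be tuned as described in Section 3.2.

```python
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative epsilon-SVR models with the two kernels of Table 1.
# C plays the role of the regularization constant c in Eq. (6) and
# epsilon is the width of the insensitivity zone in Eq. (5).
svr_rbf = make_pipeline(StandardScaler(),
                        SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale"))
svr_sigmoid = make_pipeline(StandardScaler(),
                            SVR(kernel="sigmoid", C=10.0, epsilon=0.01,
                                gamma="scale", coef0=0.0))

# X: (N, n) matrix of log readings, y: core-measured target
# svr_rbf.fit(X, y); y_hat = svr_rbf.predict(X_new)
```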

3. Input parameters for the learning algorithms

3.1. Multilayer perceptron neural network

In the MLP neural networks, the topology of the network consists of an input layer, a single hidden layer, and an output layer. The size of the hidden layer is measured by the number of processing neurons, for which a sigmoid activation function is used. The size of the hidden layer determines the complexity of the regression model. Increasing the complexity may lead to overfitting of the training data and, in turn, to potentially poor generalization to testing data. Therefore, the size of the hidden layer must be optimized. A 10-fold cross-validation resampling technique was used to strike the right trade-off between overfitting and underfitting.

3.2. SVM parameter and model selection

The accuracy of the SVM depends strongly on the values of the SVM model and kernel function parameters. Here, following Al-Anazi and Gates (2010b), grid and pattern search methods were used to determine the optimal set of SVM input parameters. Specifically, first, the search starts out with the grid search to determine a region close to the global optimal point. Second, a pattern search is carried out over that region to find a global optimum. To select the best prediction model, 10-fold cross-validation was used.
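As a rough sketch of the two-stage search just described (not the authors' exact procedure), the snippet below first scans a coarse grid of C, ε, and kernel parameters with 10-fold cross-validation and then refines locally around the best grid point; the grid values and the simple multiplicative refinement used as a stand-in for the pattern search are assumptions.

```python
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

def select_svr(X, y, kernel="rbf"):
    """Coarse grid search followed by a finer local search around the
    best coarse point, both scored by 10-fold cross-validated MSE."""
    coarse = {"C": [0.1, 1, 10, 100],
              "epsilon": [0.001, 0.01, 0.1],
              "gamma": [0.01, 0.1, 1.0]}
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    search = GridSearchCV(SVR(kernel=kernel), coarse,
                          scoring="neg_mean_squared_error", cv=cv)
    search.fit(X, y)
    best = search.best_params_

    # Local refinement around the best coarse point (a simple stand-in
    # for the pattern search stage).
    fine = {name: [value / 2, value, value * 2] for name, value in best.items()}
    refine = GridSearchCV(SVR(kernel=kernel), fine,
                          scoring="neg_mean_squared_error", cv=cv)
    refine.fit(X, y)
    return refine.best_estimator_, refine.best_params_
```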
4. Raw data

4.1. Reservoir description

The raw input data were obtained from three wells completed in a heterogeneous sandstone oil reservoir deposited in a fluvial-dominated deltaic environment of Middle Cretaceous (Albian) age. The reservoir, consisting of a sequence of sandstone, siltstone, and shale with thin intervals of limestone, coal, and varying amounts of ironstone, can be divided into three main lithologic units. The first is mainly shale and sands that are thinly interbedded with small amounts of siltstone, green mud, and small siderite nodules; this unit represents a shallow marine and prodelta environment. The second is a thick, homogeneous, clean sandstone deposited as distributary mouth bars and distributary and fluvial channels. This sand is the thickest and best part of the reservoir, with good vertical and lateral continuity, occasionally interbedded with thin shale. The third contains thin shale and sand layers and some thin beds of shallow marine limestone. Fig. 1 shows a semilog plot of core permeability versus core porosity measurements from the three wells. The data show a rough trend between permeability and porosity; however, large scatter is evident, suggesting a high degree of heterogeneity in this reservoir.

4.2. Porosity and permeability data

Porosity and permeability as petrophysical properties can be derived from log or core measurements or both. Each reservoir has unique discriminating features embedded in its petrophysical properties. Therefore, beyond the relationship between porosity and permeability and the input variables as presented in the corresponding empirical equations, there are still some formation signals that may contribute to petrophysical property modeling. The database used here consists of 701 samples (log inputs, core-based porosity and permeability outputs). For porosity modeling, each training pattern consists of four logs, gamma ray (GR), neutron porosity (NPHI), sonic porosity (DT), and bulk density (RHOB), as the input vector and the core-based porosity, φ, as a scalar output. For permeability, each training pattern consists of gamma ray (GR), neutron porosity (NPHI), sonic porosity (DT), bulk density (RHOB), and formation resistivity (ILD) in the input vector and core-based permeability, log(k), as a scalar output.
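As an illustrative (assumed, not from the paper) data-preparation step, the input matrices and targets for the two models can be assembled as follows; the file name and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical layout for the well database described above.
df = pd.read_csv("well_logs_with_core.csv")

# Porosity model: four log inputs, core porosity target.
X_phi = df[["GR", "NPHI", "DT", "RHOB"]].to_numpy()
y_phi = df["CORE_PHI"].to_numpy()

# Permeability model: five log inputs, log10 of core permeability target.
X_k = df[["GR", "NPHI", "DT", "RHOB", "ILD"]].to_numpy()
y_k = np.log10(df["CORE_PERM_mD"].to_numpy())
```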

Fig. 1. Crossplot of core permeability versus core porosity (modified from Al-Anazi and Gates, 2010b).
5. Small sample size testing procedure

The modeling data were prepared as follows:

1. The data were randomly subdivided into divisions as percentages of the data, as shown in Table 2, in nine cases.
2. A total of 50 realizations were randomly generated for each case. Since the regression model itself depends on a particular (random) realization of a fixed-size training sample, its estimated prediction accuracy is also a random variable. Therefore, the experiment calls for repeating the procedure with different realizations of the training data to determine the average prediction accuracy and compare the predictions from the models (a sketch of this resampling procedure follows Table 2 below).
3. The small training dataset (the number of data points ranges from 15 to 71 depending on the case) is used to learn the regression model parameters, and the testing dataset is used to examine the prediction capability of the model. The risk is quantified by the mean squared error (MSE).
4. The SVR models, one using the ε-insensitivity loss function and the other using the least-modulus loss function, and the MLP model are used to predict porosity and permeability for the test datasets.
5. A notched box plot was generated for each prediction model with the 50 realizations of the test datasets using only the ε-insensitivity loss function.
6. To compare the results of support vector regression to neural networks, larger datasets comprising 20% and 30% of the total data were used to construct porosity and permeability regression models.

Table 2
Training and testing data allocation. For each case, 50 realizations of the training sets were constructed with points randomly chosen from the full dataset.

Case  Data, %  Points in training set  Points in testing set
1     2        15                      686
2     3        22                      679
3     4        29                      672
4     5        36                      665
5     6        43                      658
6     7        50                      651
7     8        57                      644
8     9        64                      637
9     10       71                      630
10    20       141                     560
11    30       211                     490
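A minimal sketch of the Monte-Carlo procedure of Section 5 is given below (the splitting utility and plotting call are illustrative assumptions, not the authors' code): it draws 50 random training sets of a given fraction, fits a model, and collects the test-set MSE values that feed the notched box plots discussed next.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def mse_realizations(model_factory, X, y, train_fraction, n_realizations=50):
    """Test-set MSE over n_realizations random splits at a given training fraction."""
    errors = []
    for r in range(n_realizations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction, random_state=r)
        model = model_factory()   # e.g., lambda: an SVR or MLP pipeline
        model.fit(X_tr, y_tr)
        errors.append(mean_squared_error(y_te, model.predict(X_te)))
    return np.array(errors)

# Example for Case 1 (2% training data), assuming svr_rbf from the sketch above:
# mse_svr = mse_realizations(lambda: svr_rbf, X_phi, y_phi, 0.02)
# plt.boxplot([mse_svr], notch=True, labels=["SVR-RBF"]); plt.yscale("log"); plt.show()
```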

6. Small sample size analysis with the ε-insensitivity loss function

To analyze the results, notched box plots of the mean squared error from the Gaussian-based SVR (SVR–RBF), the sigmoid-based SVR (SVR–Sigmoid), and the MLP models are compared. The notched box plot is an exploratory data analysis tool that provides statistical summaries of the underlying prediction error distribution. The top and bottom of each box represent the 75th and 25th percentiles of the error, respectively. The red line in each box is the median of the error. The whiskers extend from each end of the box to 1.5 times the interquartile range (the range between the upper and lower quartile values, which contains 50% of the distribution) above and below each box. Data with values beyond the ends of the whiskers (the outliers) are marked by plus-sign symbols. The notches in the box plot reflect the variability of the error median between regression models (Mathworks, 2007). Each plot presents a summary of the errors of the predictions obtained from each regression model using 50 realizations of the testing data (sampled from the full data set). Table 2 lists the number of points used for the small datasets (for training) and the large datasets (for evaluating the prediction accuracy).

6.1. Porosity prediction

The notched box plots in Figs. 2–10 present the prediction errors of the porosity for the SVM and MLP regression models. In Fig. 2, Case 1 (2% of the full data used to train the regression models), the results show that the SVM models perform better than the MLP. Similar results occur for the other cases shown in Figs. 3–10. The plots show that the prediction error distribution of the SVM models has consistently lower variability than that of the MLP models (the variability is measured by the length of the interquartile range enclosed by the upper and lower quartiles). As the box becomes smaller relative to the whiskers, the kurtosis of the error distribution becomes more positive; that is, the error distribution has a sharper peak. This kurtosis is well pronounced in Figs. 3, 6, 8, and 10 for the SVM models.

Fig. 2. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 2% of the full data set as the training set and the remaining 98% as the testing set.

Fig. 3. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 3% of the full data set as the training set and the remaining 97% as the testing set.
Fig. 4. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 4% of the full data set as the training set and the remaining 96% as the testing set.

Fig. 5. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 5% of the full data set as the training set and the remaining 95% as the testing set.

Fig. 6. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 6% of the full data set as the training set and the remaining 94% as the testing set.

Fig. 7. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 7% of the full data set as the training set and the remaining 93% as the testing set.

Fig. 8. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 8% of the full data set as the training set and the remaining 92% as the testing set.

Fig. 9. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 9% of the full data set as the training set and the remaining 91% as the testing set.

Both SVM models have similar prediction error medians, although the RBF kernel function appears to have lower error and smaller variability than the sigmoid kernel function for cases with 2–7% of the data used for training (Cases 1–6), whereas for larger training datasets (Cases 7–9) the sigmoid kernel function has smaller variability. Since the notches in the box plots do not overlap, the MSE medians of the SVM models differ from those of the MLP models with 95% confidence over all small sample sizes. In addition, for all cases, the RBF–SVM models yield the minimum number of outliers when compared to the sigmoid–SVM and MLP models. When notched box plots (not shown here for space limitations) were constructed for the cases with 20% and 30% of the data used for training (Cases 10 and 11), the variability of the error prediction of both the SVM models and the MLP network collapses to a much smaller variability relative to that of the previous smaller cases.

Fig. 11 plots the median of the MSE for the porosity predictions from the group of realizations for each case for the SVM and MLP regression models. The results show that the MLP has significantly larger medians than the SVM models. However, the median of the MSE for the MLP drops and reaches a value similar to that of the SVM models when the sample size increases. The SVM results are relatively unchanged over all of the cases, demonstrating that the method is robust. The results demonstrate that the RBF and sigmoid SVM regression models statistically outperform the MLP model in predicting porosity with small sample sizes. However, the prediction accuracy of the MLP network converges to that of the SVM models at larger sample sizes of 20% and 30% of the data, which might be drawn from the assertion that the employed empirical risk converges asymptotically to the expected risk.
Fig. 11 also shows that the variability of the SVM prediction error appears to be sensitive to the kernel function used in nonlinear porosity prediction, although the median of the MSE error distribution is similar.

To demonstrate how the porosity predictions using the ε-insensitivity loss function correlate with the actual core-measured data, the means of the correlation coefficients of the predictions obtained from the SVM regression with RBF and sigmoid kernel functions and from the MLP models over all sample sizes, obtained from 50 random realizations, are plotted in Fig. 12. The SVM predictions with RBF and sigmoid kernel functions correlate better than those obtained from the MLP model. However, the SVM with the RBF kernel function performs much better than with the sigmoid kernel function, particularly under sparse sample size settings (the 2% up to 7% cases). On the other hand, there exists an increasing trend of prediction accuracy for all models as the number of samples increases. This might be attributed to the fact that the underlying asymptotic condition that ensures the optimality of the least-squares loss function used in the MLP network is approached.

Fig. 10. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP porosity predictions from 50 random realizations, each one consisting of 10% of the full data set as the training set and the remaining 90% as the testing set.

Fig. 11. Comparison of the medians of the mean squared error (MSE) of porosity predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the ε-insensitivity loss function.

Fig. 12. Comparison of the mean of the correlation coefficient of porosity predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the ε-insensitivity loss function.
6.2. Permeability prediction

The notched box plots in Figs. 13–21 present the prediction errors of the permeability for the RBF and sigmoid SVM and MLP regression models. The results reveal that the predictions from the SVM models outperform the MLP ones for small sample sizes. The notched box plots show that the median values of the MSEs of the SVM models are consistently less than those of the MLP model over all small sample sizes. The variability of the MSE prediction distribution is a distinctive characteristic of the SVM and MLP regression models. The SVM models, particularly the RBF-based one, are characterized by a thin box relative to the whiskers, indicating that a very high number of predictions are contained within a very small error margin. This in turn signifies an error distribution with a thinner peak, that is, higher kurtosis. The MLP model, on the other hand, is characterized by a wider box relative to the whiskers, indicating a wider error peak. For the 2% and 3% prediction cases in Figs. 13 and 14, respectively, the sigmoid-based SVM model shows error variability similar to that of the MLP model, yet with lower median error values. Similar to the porosity prediction, the RBF–SVR prediction model produced the minimum number of outliers compared to the sigmoid–SVM and MLP models. The results also reveal that the notched box plots (not shown here for space limitations) constructed for the cases with 20% and 30% of the data used for training (Cases 10 and 11) show that the variability of the error prediction of both the SVM models and the MLP network collapses to a much smaller variability relative to that of the smaller cases.

Fig. 13. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 2% of the full data set as the training set and the remaining 98% as the testing set.

Fig. 14. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 3% of the full data set as the training set and the remaining 97% as the testing set.

Fig. 15. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 4% of the full data set as the training set and the remaining 96% as the testing set.

Fig. 16. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 5% of the full data set as the training set and the remaining 95% as the testing set.

Fig. 17. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 6% of the full data set as the training set and the remaining 94% as the testing set.

Fig. 18. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 7% of the full data set as the training set and the remaining 93% as the testing set.
Fig. 22 plots the median of the MSE for the permeability predictions from the group of realizations for each case for the SVM and MLP regression models. The results demonstrate that the MLP has significantly larger medians than the SVM models. The medians of the RBF–SVM model are consistently lower than those of the sigmoid-based SVM. The median of the MSE for the MLP falls as the sample size increases but is still larger than that of the SVM models when 10% of the full dataset is used. The RBF–SVR results are relatively unchanged over all of the cases, demonstrating its robustness. The results demonstrate that the SVM regression models statistically outperform the MLP models in predicting permeability with small sample sizes. Fig. 22, however, shows that the prediction accuracy of the MLP network converges to that of the SVM models with larger sample sizes of 20% and 30% of the data. Similar to the porosity prediction, the SVM prediction error variability depends on the type of kernel function used in permeability prediction.

One interesting result is that the support vector regression models, in particular the RBF–SVR, are relatively insensitive to the size of the porosity and permeability sample populations. This powerful characteristic of the SVM might be attributed to the underlying SRM inductive principle. On the other hand, the multilayer perceptron neural network is highly sensitive to the sample size. This may be explained by the fact that classical models employing the ERM principle converge to the true risk only under the asymptotic condition where the sample size is sufficiently large.

To demonstrate how the permeability predictions using the ε-insensitivity loss function correlate with the actual core-measured data, the means of the correlation coefficients of the predictions obtained from the SVM regression with RBF and sigmoid kernel functions and from the MLP models over all sample sizes, obtained from 50 random realizations, are plotted in Fig. 23. Similar to the porosity predictions, the SVM predictions with RBF and sigmoid kernel functions correlate better than those obtained from the MLP. However, the SVM with the RBF kernel function outperforms the predictions obtained with the sigmoid kernel function. The prediction performance of the SVM and MLP networks demonstrates that there exists an apparent shift in the correlation prediction curves from the lower prediction capability of the MLP network to the higher prediction capability of the SVM with an RBF kernel function. The correlation coefficient of the prediction performance of the MLP network and of the SVM with the sigmoid kernel function increases as the number of samples increases.

Fig. 19. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 8% of the full data set as the training set and the remaining 92% as the testing set.

Fig. 20. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 9% of the full data set as the training set and the remaining 91% as the testing set.

Fig. 21. Comparison of the mean squared error (MSE) statistics from the SVMs and MLP permeability predictions from 50 random realizations, each one consisting of 10% of the full data set as the training set and the remaining 90% as the testing set.

Fig. 22. Comparison of the medians of the mean squared error (MSE) of permeability predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the ε-insensitivity loss function.

Fig. 23. Comparison of the mean of the correlation coefficient of permeability predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the ε-insensitivity loss function.

7. Small sample size analysis with the least-modulus loss function

Although it has been demonstrated that the prediction accuracy of the SVM using the ε-insensitivity loss function outperforms ordinary least-squares and least-modulus methods for linear regression problems with small-size, high-dimensional datasets (Cherkassky and Ma, 2004), it is of practical importance to investigate the impact of different loss functions on the prediction accuracy of nonlinear SVM regression with Gaussian RBF and sigmoid kernel functions. As previously stated, the SVM simultaneously minimizes the confidence interval and the empirical term controlled by a loss function. The analysis here replaces the ε-insensitivity loss function with the least-modulus loss function. The analysis of prediction accuracy was repeated using nonlinear SVM regression with the least-modulus loss function. The SVM functional minimization was carried out by setting ε = 0, which simulates the least-modulus loss function, as can be observed from Eq. (5).
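In scikit-learn terms (an assumed mapping, not the authors' software), the least-modulus variant is obtained simply by setting the insensitivity zone to zero:

```python
from sklearn.svm import SVR

# epsilon = 0 reduces the epsilon-insensitive loss of Eq. (5) to the
# least-modulus (absolute error) loss; C and gamma remain to be tuned.
svr_least_modulus = SVR(kernel="rbf", C=10.0, epsilon=0.0, gamma="scale")
```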
7.1. Porosity prediction

Fig. 24 plots the median of the MSE of the porosity predictions from the group of realizations for each case obtained from the SVM and MLP regression models. The results show that the MLP has significantly larger medians than the SVM models. However, the median of the MSE for the MLP drops and reaches a value similar to that of the SVM models at larger sample sizes. The SVM results are relatively unchanged over all of the cases, demonstrating that the method is robust. The results demonstrate that the RBF and sigmoid SVM regression models statistically outperform the MLP model in predicting porosity with small sample sizes. The prediction performance of the two SVM kernels is nearly the same, with the sigmoid function being slightly more sensitive at very small sample sizes. Fig. 24 also reveals that the prediction of the MLP network converges to the SVM predictions for the larger datasets of 20% and 30% of the available data.
To compare the relative porosity prediction accuracy of the SVM regressions with RBF and sigmoid kernel functions using the ε-insensitivity and least-modulus loss functions, the mean of the correlation coefficient of the predictions was estimated over all sample sizes from 50 random realizations, as listed in Table 3. The mean correlation coefficients of the MLP regression predictions were also tabulated. The correlation coefficients of the predictions of the SVM porosity models statistically outperform those of the MLP models over all cases, even when the asymptotic setting for statistical optimality of the least-squares loss function used in the MLP network is approached. The correlation coefficient of the SVM prediction using the least-modulus loss function is slightly larger than that obtained from the SVM models with the ε-insensitivity loss function, except for Cases 5 and 10 of the SVR–RBF model and Cases 8 and 10 of the SVR–Sigmoid model. On the other hand, the correlation coefficient of the SVR–RBF prediction is larger than that of the SVR–Sigmoid prediction, indicating that the sensitivity of the SVM porosity prediction capability to the kernel function seems to be larger than that to the type of loss function used. Table 3 also shows that the SVM prediction demonstrates consistent performance, indicating its robustness to sample size.

Fig. 22. Crossplot of core permeability versus core porosity is shown in Fig. 1; Fig. 22 and Fig. 23 referenced above are captioned with the other permeability figures in Section 6.2.

7.2. Permeability prediction

Fig. 25 plots the median of the MSE for the permeability predictions from the group of realizations for each case for the SVM and MLP regression models. The results demonstrate that the MLP has significantly larger medians than the SVM models. The medians of the RBF–SVM model are consistently lower than those of the sigmoid-based SVM. The median of the MSE for the MLP falls as the sample size increases but is still larger than that of the SVM models when the asymptotic optimality of the least-squares loss function used in the MLP is approached.
Fig. 24. Comparison of the medians of the mean squared error (MSE) of porosity predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the least-modulus loss function.

Table 3
Comparison of the mean of the correlation coefficients of porosity predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations.

Case  Data, %  MLP     SVM, ε-insensitivity loss          SVM, least-modulus loss
                       RBF kernel   Sigmoid kernel        RBF kernel   Sigmoid kernel
1     2        0.3442  0.6511       0.5444                0.6533       0.5726
2     3        0.3095  0.5669       0.4345                0.5767       0.4715
3     4        0.3456  0.5919       0.5337                0.6171       0.5716
4     5        0.3940  0.6066       0.5699                0.6124       0.6012
5     6        0.4214  0.6261       0.5860                0.6124       0.6012
6     7        0.4389  0.6196       0.5445                0.6343       0.6143
7     8        0.4509  0.6261       0.6360                0.6420       0.6520
8     9        0.4833  0.6378       0.6342                0.6499       0.6184
9     10       0.5574  0.6415       0.6470                0.6486       0.6582
10    20       0.6415  0.6780       0.6828                0.6770       0.6811
11    30       0.6578  0.6835       0.6838                0.6849       0.6892

The RBF–SVR results are relatively unchanged over all of the cases, demonstrating its robustness, whereas the SVR–Sigmoid has some sensitivity to small sample size. The results demonstrate that the SVM models statistically outperform the MLP models at predicting permeability with small sample sizes. The results also reveal that the prediction of the MLP network converges to the SVM predictions for the larger datasets of 20% and 30% of the available data. Similar to the porosity prediction, the SVM prediction error variability depends on the type of kernel function used in permeability prediction.

To compare the relative permeability prediction accuracy of the SVM regressions with RBF and sigmoid kernel functions under the ε-insensitivity and least-modulus loss functions, the means of the correlation coefficients of the predictions were estimated over all sample sizes from 50 random realizations, as given in Table 4. The mean correlation coefficients of the MLP regression predictions were also tabulated. The correlation coefficients of the predictions of the SVM permeability models statistically outperform those of the MLP models over all cases, even though the statistical optimality of the least-squares loss function used in the MLP network is approached. The results suggest that the use of the least-modulus loss function in the SVM empirical risk results in higher prediction frequency over all cases than the use of the ε-insensitivity loss function. Similar to the porosity prediction, the SVM prediction capability is more sensitive to the kernel function than to the loss function for the unknown underlying permeability relationship.

The overall analysis suggests that the selection of an appropriate kernel function that captures the signature of the underlying function may result in drastic prediction improvement of the nonlinear SVM regression. The results also indicate that the generalization ability of the SVM regression is superior to that of the MLP network implementing the least-squares empirical loss. In real-life applications, porosity and permeability are often estimated from small samples from key wells with unknown noise density type and level. Therefore, it is important to give attention to the selection of the kernel function type and its parameters, as well as the SVM regularization hyper-parameters, to improve the generalization of prediction models. During training of the SVM and MLP network prediction models, the results show
Fig. 25. Comparison of the medians of the mean squared error (MSE) of permeability predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations using the least-modulus loss function.

Table 4
Comparison of the mean of the correlation coefficients of permeability predictions obtained from the RBF and sigmoid-based SVM and MLP models versus sample size, obtained from 50 random realizations.

Case  Data, %  MLP     SVM, ε-insensitivity loss          SVM, least-modulus loss
                       RBF kernel   Sigmoid kernel        RBF kernel   Sigmoid kernel
1     2        0.4374  0.6891       0.5277                0.6832       0.5107
2     3        0.3529  0.6297       0.5110                0.6512       0.5174
3     4        0.3582  0.6428       0.5624                0.6520       0.5854
4     5        0.4886  0.6778       0.5959                0.6835       0.6097
5     6        0.4865  0.6701       0.6008                0.6592       0.6102
6     7        0.4895  0.6565       0.5883                0.6326       0.6090
7     8        0.5416  0.6909       0.6482                0.7024       0.6544
8     9        0.6048  0.7407       0.7046                0.7443       0.7092
9     10       0.5967  0.7425       0.6817                0.7364       0.6734
10    20       0.7131  0.7649       0.7566                0.7586       0.7614
11    30       0.7414  0.7676       0.7658                0.7696       0.7716

that the SVM models require larger computation time than the MLP network for the larger datasets (e.g., Cases 10 and 11), which in turn suggests that the MLP network should be selected for larger datasets and the SVM models should be used for small datasets.

8. Conclusions

The capability of support vector regression to model porosity and permeability in a heterogeneous reservoir with finite sample size has been investigated. The prediction results of the SVM models with Gaussian radial basis and sigmoid kernel functions have been compared to those of multilayer perceptron neural networks. The results reveal that

1. the SVM based on radial basis and sigmoid kernel functions is capable of modeling small datasets of porosity and permeability and outperforms the MLP method;
2. the SVM predictions, in particular those using the radial basis kernel function, are relatively consistent over all the sample sizes considered in the study (the smallest sample size was 2% of the full dataset whereas the largest was 30%);
3. the median of the mean squared error of the predictions of the SVM models differs from that of the MLP with 95% confidence;
4. the SVM prediction error distribution has lower variability than that of the MLP;
5. of the SVM models, the Gaussian radial basis kernel function outperforms the sigmoid kernel function;
6. of all the models, the Gaussian radial basis kernel function outputs the fewest extreme outliers compared to the MLP and the sigmoid kernel function;
7. the SVM regression models, in particular the RBF–SVR, are relatively insensitive to the size of the porosity and permeability sample populations as compared to the MLP network, which might be attributed to the application of the structural risk minimization inductive principle in the former and the empirical risk minimization principle in the latter;
8. the SVM nonlinear regression with the least-modulus loss function shows relatively better performance compared to regression with the ε-insensitivity loss function;
9. the prediction of the SVM nonlinear regression appears to be more sensitive to the kernel function than to the loss function; and
10. although the SVM prediction models outperform those from the MLP network, less computational time is required by the MLP for models estimated from larger datasets.

References

Al-Anazi, A., Gates, I.D., 2010a. On the capability of support vector machines to classify lithology. Natural Resources Research 19 (2), 125–139.
Al-Anazi, A., Gates, I.D., 2010b. Support vector regression for permeability prediction in a heterogeneous reservoir: a comparative study. SPE Reservoir Evaluation and Engineering 13 (3), 485–495.
Al-Anazi, A., Gates, I.D., 2010c. A support vector machine algorithm to classify lithofacies and model permeability in heterogeneous reservoirs. Engineering Geology 114 (3–4), 267–277.
Al-Anazi, A.F., Gates, I.D., 2010d. Support vector regression for porosity prediction in a heterogeneous reservoir: a comparative study. Computers and Geosciences 36 (12), 1494–1503.
Cherkassky, V., Ma, Y., 2004. Selecting of the loss function for robust linear regression. Neural Computation 1, 395–400.
Cherkassky, V., Mulier, F., 2007. Learning from Data: Concepts, Theory, and Methods, 2nd ed. John Wiley & Sons Inc., Hoboken, New Jersey, 538 pp.
Choisy, C., Belaid, A., 2001. Handwriting recognition using local methods for normalization and global methods for recognition. In: Proceedings of the Sixth International Conference on Document Analysis and Recognition, pp. 23–27.
Fung, C., Wong, K., Eren, H., 1997. Modular artificial neural network for prediction of petrophysical properties from well log data. IEEE Transactions on Instrumentation and Measurement 46 (6), 1295–1299.
Gao, D., Zhou, J., Xin, L., 2001. SVM-based detection of moving vehicles for automatic traffic monitoring. In: Proceedings of the IEEE Intelligent Transportation Systems Conference, Oakland, CA, pp. 745–749.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
Helle, H., Bhatt, A., 2002. Fluid saturation from well logs using committee neural networks. Petroleum Geoscience 8, 109–118.
Helle, H., Ursin, B., 2001. Porosity and permeability prediction from wireline logs using artificial neural networks: a North Sea case study. Geophysical Prospecting 49, 431–444.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366.
Huang, Y., Gedeon, T.D., Wong, P.M., 2001. An integrated neural-fuzzy-genetic-algorithm using hyper-surface membership functions to predict permeability in petroleum reservoirs. Engineering Applications of Artificial Intelligence 14, 15–21.
Huang, Z., Shimeld, J., Williamson, M., Katsube, J., 1996. Permeability prediction with artificial neural network modelling in the Venture gas field, offshore eastern Canada. Geophysics 61, 422–436.
Kaviani, D., Bui, T.D., Jensen, J.L., Hanks, C.L., 2008. The application of artificial neural networks with small data sets: an example for analysis of fracture spacing in the Lisburne formation, Northeastern Alaska. SPE Reservoir Evaluation and Engineering 11 (3), 598–605.
Kecman, V., 2005. Support vector machines—an introduction. In: Wang, L. (Ed.), Support Vector Machines: Theory and Applications. Springer-Verlag, Berlin Heidelberg, pp. 1–47 (Chapter 1).
Kim, K., Jung, K., Park, S., Kim, H.J., 2001. Support vector machine-based text detection in digital video. Pattern Recognition 34, 527–529.
Li, Z., Weida, Z., Licheng, J., 2000. Radar target recognition based on support vector machine. In: Proceedings of the Fifth International Conference on Signal Processing, vol. 3, pp. 1453–1456.
Lu, J., Plataniotis, K., Ventesanopoulos, A., 2001. Face recognition using feature optimization and v-support vector machine. In: Proceedings of the IEEE Signal Processing Society Workshop, North Falmouth, MA, vol. 11, pp. 373–382.
Ma, C., Randolph, M., Drish, J., 2001. A support vector machines-based rejection technique for speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, vol. 1, pp. 381–384.
Mathworks, 2007. Matlab User's Guide, Statistics Toolbox. Matlab CD-ROM, Mathworks, Inc.
Nguyen, D., Widrow, B., 1990. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In: Proceedings of the International Joint Conference on Neural Networks, vol. 3, pp. 21–26.
Rogers, S.J., Chen, H.C., Kopaska-Merkel, D.C., Fang, J.H., 1995. Predicting permeability from porosity using artificial neural networks. AAPG Bulletin 79, 1786–1797.
Sherrod, P.H., 2009. DTREG Predictive Modeling Software User's Manual, Version 9.1, http://www.dtreg.com.
Suykens, J.A.K., Van Gestel, T., Brabanter, J., De Moor, B., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific, Singapore.
Van Gestel, T., Suykens, J., Baestaens, D., Lambrechts, A., Lanckriet, G., Vandaele, B., De Moor, B., Vandewalle, J., 2001. Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks 12 (4), 809–821.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V., Chervonenkis, A., 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16 (2), 264–280.
Vapnik, V., Chervonenkis, A., 1974. Theory of Pattern Recognition [in Russian]. Nauka, Moscow (German translation: Wapnik, W., Tscherwonenkis, A., Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979).
Vapnik, V.N., 1982. Estimation of Dependences Based on Empirical Data. Springer, Berlin.
