Sie sind auf Seite 1von 10

MULTIPLE POINT GEOSTATISTICS : OPTIMAL TEMPLATE SELECTION

AND IMPLEMENTATION IN MULTI-THREADED COMPUTATIONAL


ENVIRONMENTS

ALVARO E. BARRERA, JOSEPHUS NI and SANJAY SRINIVASAN


Department of Petroleum & Geosystems Engineering,
University of Texas at Austin, USA TX 78712-0228

Abstract.

Sequential Indictor Simulation is a classic stochastic simulation approach that is pixel-


based and utilizes kriging/cokriging to obtain estimates of the necessary conditional
distributions and generate various analogous reservoir realizations of the target
reservoir. However, reproduction of complex 3D patterns is not possible using
traditional two-point statistics (variogram) based approaches. Therefore, new stochastic
simulation approaches based on multiple point statistics have surfaced in the literature.
Since computational efficiency is a significant consideration in implementation of
multiple point statistics based modeling approaches, parallel computational techniques
and multithreading have to be implemented in order to render the process efficient for
field scale reservoir characterization. The implementation of such a multi-threaded,
multiple point simulation algorithm is discussed. The selection of an optimal spatial
template is critical for capturing and reproducing complex spatial patterns observed in
analogous systems. An efficient template optimization algorithm is also presented.

1. Introduction

Geostatistics has become a widespread tool for reservoir modeling and uncertainty
assessment. Multiple equiprobable reservoir models constrained to data of different
types and volume supports (geologic, geophysical and production data) can be generated
while preserving the apparent geological structure. Traditional geostatistics is anchored
on two basic concepts: the variogram model as representation of the spatial
heterogeneity or continuity (more statistical than geologic), and kriging for spatial
interpolation. Variogram-based geostatistics is mathematically consistent and
convenient (mathematical representation of physical concepts), and its application is
appropriate considering the lack of conditioning information in typical reservoir
modeling scenarios and given the computational limitations facing the earlier generation
of earth modelers.

Practitioners and particularly geologists have found the variogram suitable to describe
geological heterogeneity within a single facies, but too limiting in describing and
reproducing more organized and sharp geological features such as faults, fractures, and

621

O. Leuangthong and C. V. Deutsch (eds.), Geostatistics Banff 2004, 621–630.


© 2005 Springer. Printed in the Netherlands.
622 A E. BARRERA, J. NI AND S. SRINIVASAN

facies distributions among others. These geological features usually have the largest
impact on the flow response. The traditional variogram-based geostatistics fail to
appropriately retain the required conditional information to reproduce these features;
generating amorphous realizations that exhibit maximum entropy instead of
systematically organized structures and patterns as expected from prior geological
knowledge. These geological features call for a different approach utilizing multiple
point statistics instead of variogram models (two point statistics) to capture the required
conditional information to generate fields that exhibit lower entropy, more structural
organization and preserve reservoir heterogeneity.

This approach requires a training image, a numerical representation of the spatial law
and an explicit non-conditional conceptual description of the geological structures and
patterns to be reproduced in the field. Training images can be obtained from outcrops
and photographs. Although it seems easier to define a variogram model than develop
training model depicting the critical reservoir heterogeneity, however, it is important to
realize that variogram models are not any less subjective or constraining, and moreover,
they are more limiting and less intuitive in describing geological heterogeneity. Patterns
to be reproduced can be explicitly depicted in the training model as compared to the
hidden higher-order statistics implicit within traditional variogram-based geostatistical
models.

The general objective of this paper is to implement a new simulation approach that is
based on geological feature identification and reproduction using a unique growth-based
simulation algorithm. Features are grown starting from conditional data locations based
on multiple point statistics inferred using optimized spatial templates. The simulation is
implemented on a multi-threaded computational environment and consequently can be
used to efficiently simulate 3D models comprising of over 10 million cells. The
robustness of the simulation methodology and the reproduction of realistic geological
features depend on the optimality of the selected spatial template used to capture the
pattern characteristics. A unique approach for optimization of the spatial template is also
presented.

2. Literature Survey

Stochastic simulation was introduced by Matheron (1973) and Journel (1974) to correct
for the smoothing effects and other artifacts of kriging, allowing the reproduction of the
spatial variance predicted by the variogram model. Different algorithms were developed
including sequential simulation (Journel, 1983, Isaaks, 1990; Srivastava, 1992;
Goovaerts, 1997; Chiles and Delfiner, 1999), which has become the workhorse for many
current geostatistical applications. Stochastic simulation provides the capability to
generate multiple equiprobable realizations, giving birth to the idea of assessing spatial
uncertainty (Journel and Huijbregts, 1978).

Stochastic simulation stepped away from the variogram and kriging based core for the
first time with the Boolean object-based algorithms, introduced by Stoyan, Kendal and
Mecke (1987), and Haldorsen and Damsleth (1990), in an attempt to reproduce
geological features, like channels and fractures, by fitting parametric shapes. Initially
MULTIPLE POINT GEOSTATISTICS 623

Srivastava (1992), and later Caers (1998) and Strebelle (2000), proposed the idea of
borrowing conditional probabilities directly from a training image, allowing the use of
higher order or multiple point statistics to reproduce geological structures and patterns.
Advantages of this approach over the Boolean object-based method include the pixel-
based non-iterative process and the ease of data integration of different types.

Training images and multiple point statistics methods remained largely untried until
developments in computational capabilities and multiple point scanning techniques took
place. The search tree, a training image scanning technique proposed by Strebelle
(2000), has allowed for wider application of the multiple point statistics approach. In the
search tree approach, the training proportions are recorded in a dynamic data structure
that renders it convenient to store nested information such as number of outcomes of
data events (joint probabilities). However, only “exact” data events directly found in the
training image can be stored and later retrieved as the conditioning information in the
estimation. A second approach allows the reproduction of “similar” data events by
modeling and interpolating between found data events. This second approach was
supported by the classification methods proposed by Arpat (2003) and Breiman et al.
(1984); and neural net based models.

Currently, the basic components that characterize the application of multiple point
geostatistics remain the major focus of continuous research and development efforts,
with the objective of defining a more consolidated and practical methodology. These
components include, among others, the generation or acquisition of numerical spatial
representations to be used as training images; the optimal definition of scanning
templates to capture the proper conditional information from the training image; and
development of computational schemes that render the application more practical and
convenient. The two last components are addressed in this paper.

3. Proposed Methodology

3.1 FEATURE IDENTIFICATION AND SIMULATION

The purpose of the multiple point geostatistics approach is borrowing conditional


proportions (joint probabilities) from a numerical representation of the spatial law
(training image) describing the random function. Training images require some prior
information. When this prior information is uncertain, alternative training images can be
used to narrow the range of uncertainty and a Bayesian approach can be adopted to
incorporate that uncertainty in resultant geological models (Liu et al., 2004). The
training image requires the effort of geologists to generate one or more numerical
representations of spatial heterogeneity, reflecting prior information about complex
shapes and geometrical patterns.

In order to capture the relevant details of reservoir heterogeneity, the size of the training
data as well as the size and complexity of the spatial template can be large.
Consequently the process of scanning and storing multiple point conditional
probabilities can have a rather high computational requirement. Taking advantage of
recent advances in computational technology using multiple cpu processors, the task of
624 A E. BARRERA, J. NI AND S. SRINIVASAN

scanning training images is performed using multiple processing threads. The training
image is segmented into multiple domains and the task of scanning each sub-domain
using the specified spatial template(s) is assigned to a dedicated cpu thread. At the end
of the scanning process, the statistics computed by each thread is compiled into a single
statistical summary for the entire training image.

The objective of the scanning process is to obtain the joint probabilities:


Prob{Z (ui ) zk i 1,.., N ; k 1,..., K} (1)
N is the number of nodes in the template, ui is the position of the i node, zk is the kth
th

threshold of the random variable Z (u) . During the simulation phase, the spatial
template is placed at a location u o . Some nodes u j , j 1,.., N ' where N '  N is the
subset of template nodes, already have simulated values. The requirement is of the
probability of a simulation event A(uo ) conditional to the pattern A(u j , j  N ' ) in the
surrounding nodes of the template. The joint probability expression in Expression (1)
can be used to calculate the requisite conditional probabilities:
Prob{ A(u o ) ‰ A(u j )}
Prob{ A(u o ) A u j , j  N '  N } (2)
Prob{ A(u j )}
In terms of the notation in Expression (1), the numerator in Expression (2) is simply the
joint probability: Prob{Z (ui ) zk i  N ' ; Z (u j ) zk 'j  N ' ; k , k ' 1,..., K } . The
denominator is the joint probability in the numerator summed over all occurrences of
patterns A(uo ) :
Prob{ A(u j )} ¦ Prob{ A(uo ) ‰ A(u j )} (3)
A(u o )
Knowing the conditional probability given by expression (2), a simulation pattern can be
obtained by drawing from the conditional probability distribution.

Remark: The simulation event A(uo ) in traditional multiple point statistics


implementations consists of a single point event i.e. the outcome at the central node of
the template given the multiple point event in the surrounding nodes. In the
implementation presented here, the simulation event is allowed to be a multiple point
event. Thus both the conditioning and simulation events are multiple point events. This
renders the simulation process to be fast and more important, the simulated patterns
exhibit better continuity.

Following scanning, the process of stochastic simulation is commenced where the


inferred multiple point histogram is used in conjunction with reservoir specific data
distributions to obtain realizations with realistic spatial heterogeneity. In the traditional
implementations of multiple point geostatistics, the simulation nodes are visited along a
random path and the probability of the central node given the configuration of pattern in
the surrounding nodes is obtained from the search tree that contains a catalog of the
scanned patterns. At the beginning of the simulation when only a few nodes have
assigned values, considerable cpu time may be spent searching for unusual or infrequent
patterns. In order to alleviate this problem, patterns or structures are grown from data
MULTIPLE POINT GEOSTATISTICS 625

locations in the method implemented in this study. This approach improves the
continuity of the simulated patterns while reducing the artifacts in the simulated image.

Dividing the simulation domain into regions based on the density of conditional data,
the conditioning data locations are visited along a random path. A node surrounding the
data location is marked as “simulatable” based on whether the spatial template centered
at that location contains at least a single conditional data. Corresponding to a randomly
picked conditioning data location, a “simulatable” node in the vicinity of that data
location is also randomly picked. The multiple point simulation event A(uo ) is picked
from the conditional probability distribution (Expression (2)) at that location. After all
the conditioning data locations are visited, the list of “simulatable” nodes is updated and
the next node is selected from this updated list. The simulation is continued until all the
simulation nodes have been assigned a value.

3.2 PROGRAM IMPLEMENTATION

The program is implemented in Java to facilitate cross platform compatibility. There are
several inherent performance hindrances with the Java platform, however. The biggest
hindrance to achieve equal performance to C++ is that of memory allocation and
garbage collection. De-allocation of memory in Java is done automatically by an
automated “garbage collector.” Because almost everything in Java is an object, the
creation of objects can be expensive, because creating an object requires calls not only
to its own constructor, but also its parent class’s constructor. For these two reasons,
much of the reusable data types in this implementation are first created by Object
Oriented design and then optimized using basic data types. Much attention was paid to
choosing thread-safe data types. Other methods of optimization such as inline methods
and reducing in-loop instructions and instantiations were used in conjunction with
previously mentioned optimizations. Currently, the second most costly operation is an
object creation step that is looped through nearly every single simulated node. This
object creation overhead can be reduced by introducing a library system for objects with
check-in, checkout feature. The most costly operation is repeating a search operation on
the Vector data type. This problem can be alleviated by reducing the usage of such
operations or by rewriting the library manually. The more detailed implementation
algorithm is described below.

For the scanning process, the template generates a marginal offset at the edges of the
training image in order to operate within the bounds. This marginal offset is defined by
the template configuration and has an important impact in the required size of the
training image when large-scale templates are used. The scanning process is described
by the following steps:

1. Select a cell location in the training image and superimpose the central node of the
template at that location.
2. Capture the cell values corresponding to the template nodes
3. Identify the pattern and store it. The pattern is stored in a long integer data type, I,
by converting the N individual template node values, vi, to integers in the range [1-9]
626 A E. BARRERA, J. NI AND S. SRINIVASAN

and looping over all the template nodes with the formula: I i = Ii-1 + vi * 10(i-1);
for i = 1,2,…,N; and I0 = 0.
4. Repeat steps 1 through 3 until the center of the template has been located in all
training image cells excluding the marginal offset.
As the training image is being scanned, each obtained pattern integer representing a data
event is put into an array that is large enough to hold all possible occurrences for that
training image. The array is sorted and scanned for number of occurrences of each data
event. The occurrences are stored in an alternate array at the same index as the first
occurrence of the data event. Both arrays are then compressed by eliminating the
repetitions and zeros.

In order to optimize the performance and utilization of computational resources, the


scanning process is parallelized. The first degree of parallelization is multithreading.
Multithreading is much more preferable to forking a process due to much lower
computing overhead in thread allocation. By applying multithreading, the scanning
process time was reduced significantly – to a few seconds, by only two threads. This
means that further degrees of parallelization are not required for this process including
automated thread control to spawn more threads according to number of physical
processors present.

The conventional stochastic simulation algorithms can be rather slow and cumbersome
since they only simulate one point at a time. The scanning process defined above allows
the simulation algorithm employed here to simultaneously simulate all cells under the
template. All simulation nodes except the conditioning data locations are assigned a
negative one (-1) value initially. Next, during the process of simulation:
1. Select a cell as center cell of the spatial template
2. Detect the pattern t(n) on the nodes of the spatial template by matching only non-
negative values against the ones previously recorded during the scanning phase.
Only when all these positive values and their positions are completely matched,
then the record is chosen. Retrieve the subset of scanned patterns that exhibit the
pattern t(n).
3. Take the probability of each pattern in the subset and construct a step CDF with
probability on y-axis and pattern index number on x-axis.
4. Randomly sample from that CDF.
5. Get the corresponding pattern by its index and fills the cells under the template that
are marked as empty.

In order to speed up the simulation process, the simulation domain is divided into a
number of regions, each corresponds to a thread task. The conditioning data points are
also divided into the simulation sub-domains, with each set containing a subset of
simulated cells and a subset of candidate (“simulatable”) cells, which are defined as
uninformed cells located near previously informed cells. If any of the cells in a template
during simulation is within the border of the image but beyond the quadrant border
defined by the thread, the value is put down into that cell. The procedure is safe and
will not conflict with another thread’s execution since this portion of the code is
synchronized between the threads.
MULTIPLE POINT GEOSTATISTICS 627

3.3. RESULTS

The following are slice one through four of the simulation image, which is of size
100x130x10. The training image consists of a 3-D volume made of fluvial channels
trending in the North-South direction. Conditioning data along vertical wells are
assumed.

Figure 1. Slices of a 3D simulation image obtained by application of the mp statistics


based algorithm. The training image (Left) consists of channels trending in the North-
South direction.

The single thread operation took 2566881 milliseconds, where the multithreaded
operation with four concurrent threads took on average 688000 milliseconds, which is
nearly four times faster. This means that our application scaled very well with the
increased number of processors present. It should be noted however, that the operating
system in use is Redhat Linux 7.3 with kernel version 2.4.20 and that this kernel does
not have a hyperthreading optimized scheduler, thus, the performance could be further
enhanced by upgrading to a more modern kernel version.

It is also to be noted that for this particular case, the best simulation images resulted
with 2-D templates. This is because the channels change in orientation from one layer of
the training image to the next. This causes the 3-D templates to introduce much noise
into the simulation. Despite the 2-D templates used, the slices exhibit correlation from
one slice to the next due to the profusion of conditioning data in the vertical direction
(due to the presence of vertical wells). Further optimization of 3-D templates to account
for vertical variations will be attempted in the future.
628 A E. BARRERA, J. NI AND S. SRINIVASAN

3.4. TEMPLATE SELECTION

A suitable scanning template is essential whose size and geometry define the search
neighborhood and the detection of multiple point data events describing the spatial law.
Consider a grid u  (4  :) where 4 is the size of the grid and : is the size of the
simulation domain. A stationary attribute Z (u) is assumed over the grid. Suppose a
spatial template of size N is desired. The objective is to identify a template
t (ui ), i 1,.., N within the grid 4 such that the selected template optimally represents
the dominant pattern of reservoir heterogeneity. The size and the scale of the template
N are user-specified and are dependent on the available cpu, complexity of the
geological image etc. Designating the central node in the grid 4 as u o , the covariance
C (u j , u o ) between pairs of nodes u j 4 : u j z uo and u o is calculated on the basis of
the particular training image. Under stationarity, the required covariance is calculated by
translating a two-point template h jo | u j  u o | over the training image. The
4  1 covariance values C (u j , uo ), j 1,.., 4  1 are ranked and the top N values and
the corresponding locations ui , i 1,.., N define the optimal spatial template t (u).
Spatial templates at multiple scales can be obtained by choosing the grid 4 of different
resolution (multiple grids), accounting for geological patterns at different scales.

In order to test the efficacy of the template optimization algorithm, multiple training
images were generated exhibiting similar features but with variations in spatial
orientation and anisotropy. Some examples of optimal templates obtained for these
training images comprised of modified geological feature (lens) rotated at different
angles are shown in Figure 2.

Figure 2. Template configurations obtained with the Template Selection Algorithm for
similar geological features (lens) with different orientations and anisotropy.
MULTIPLE POINT GEOSTATISTICS 629

3.4.1 Impact of template geometry on mp statistics captured by the scanning process.

In order to evaluate the results of the template selection algorithm and the importance of
the template selection, a sensitivity study was performed. In this study, a training image
exhibiting ellipsoidal lens with North-South orientation (Figure 3) was generated and
scanned with six templates of the same size but different geometry. All the templates
were produced by the template selection algorithm, considering different numerical
representations of lens with the same size but different orientations.

Figure 3. Training image exhibiting North-South ellipsoidal lens.

The study objective is to demonstrate the importance of an appropriate template for


capturing more information about the multiple point statistics describing a particular
geological feature. Hence, the multiple point statistics inference process was performed
using a variety of spatial templates. It can be noticed in Figure 4, enhanced performance
is obtained corresponding to Template 1, which has a North-South orientation. This
template captured more relevant data events (with frequency higher than 10) from the
training image (approximately 11% more than the second best Template) and
consequently, the total number of occurrences retained as multiple point conditional
information is increased. The difference in configuration between Templates 2 and 3 is
the location of a single node; however this single node causes the number of relevant
data events to drop in approximately 11%.

4 Conclusions

A unique stochastic simulation approach based on growth of objects within the


simulation domain is presented. The object growth is controlled by the multiple point
statistics inferred on training images. In order to render the simulation computationally
efficient, the process is implemented on a multi-threaded environment. The scanning as
well as the simulation processes both take advantage of multiple cpus. The
computational process is demonstrated to scale well when going from 1 to 4 processors.

The selection of an optimal spatial template for retrieving the multiple point statistics is
an important aspect of the proposed simulation algorithm. A fast and robust approach to
derive optimal spatial templates is presented. The results show that the number of
template nodes and their geometry influence the robustness of the retrieved statistics.
630 A E. BARRERA, J. NI AND S. SRINIVASAN

Figure 4. Number of relevant data events (frequency greater than 10 in training image)
plotted against the template configuration.

5 References

Arpat, B., A pattern recognition approach to multiple-point simulation, Report no.16, Stanford Center for
Reservoir Forecasting, Stanford University, 2003.
Breiman, L., Friedman, J., Olshen, R. and Stone, C., Classification and regression trees, publ. Wadworth,
Monterrey, 1984.
Caers, J., Stochastic simulation using neural networks, Report no. 11, Stanford Center for Reservoir
Forecasting, Stanford University, 1998.
Caers, J. and Journel, A.G., Stochastic reservoir simulation using neural networks trained on outcrop data, SPE
paper no. 49026, 1998.
Chiles, J.P. and Delfiner, P., Geostatistics: Modeling spatial uncertainty, publ. Wiley, N.Y, 1999.
Goovaerts, P., Geostatistics for Natural Resources Evaluation, publ. Oxford Press, N.Y., 1997.
Haldorsen, H. and Damsleth, E., Stochastic modelling, J. of Pet. Technology, April 1990, pp.404-412, 1990.
Isaaks, E., The application of Monte Carlo methods to the analysis of spatially correlated data, Ph.D. thesis,
Stanford University, 1990.
Journel, A.G., Geostatistics for conditional simulation of orebodies, in Economic Geology, vol. 69, pp. 673-
687, 1974.
Journel, A.G., Non-parametric estimation of spatial distributions, Math. Geology, vol. 15, no. 3, pp. 445-468,
1983.
Liu, X. and Srinivasan, S., Field Scale Stochastic Modeling of Fracture Networks – Combining pattern
statistics with geomechanical criteria for fracture growth, In this volume.
Matheron, G., The intrinsic random functions and their application, Advances in Applied Probability, vol. 5,
pp. 439-468, 1973.
Srivastava, R.M., Reservoir characterization with probability field simulation, SPE paper no. 24753, 1992.
Stoyan, D., Kendall, W. and Mecke, J., Stochastic geometry and its applications, publ. Wiley, N.Y., 1987.
Strebelle, S., Conditional simulation of complex geological structures using multiple-point statistics, Math.
Geology, vol. 34, no. 1, pp. 1-21, 2002.

Das könnte Ihnen auch gefallen