
SPE 77422

Sums and Products of Distributions: Rules of Thumb and Applications


James A. Murtha, Consultant

Copyright 2002, Society of Petroleum Engineers Inc.

This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in San Antonio, Texas, 29 September–2 October 2002.

This paper was selected for presentation by an SPE Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Papers presented at SPE meetings are subject to publication review by Editorial Committees of the Society of Petroleum Engineers. Electronic reproduction, distribution, or storage of any part of this paper for commercial purposes without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of where and by whom the paper was presented. Write Librarian, SPE, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., fax 01-972-952-9435.

Abstract
When n probability distributions are added, the Central Limit Theorem provides estimates of the mean and standard deviation of the result. One useful consequence is that the relative uncertainty, as measured by the coefficient of variation (the ratio of standard deviation to mean), shrinks by a factor of roughly $\sqrt{n}$. What can be said about the mean and standard deviation of a product of distributions? Both sum and product formulas abound in the oil and gas business. While Monte Carlo simulation generates the appropriate output distribution, some rules of thumb are useful in advance.

Introduction
"Sums are normal" and "products are lognormal" are useful generalizations when doing uncertainty analysis. Both assertions are widely recognized, if not always fully understood or appreciated. The oil and gas industry uses product models extensively for resource and reserve estimation and sum models to aggregate production and reserves. The starting point for our investigation is the oft-mentioned Central Limit Theorem, which tells us how sums behave in terms of their summands. The challenge is to find a suitable analog for products and then to test the approximation on field examples.

Problem Statement
The Central Limit Theorem tells us that under rather restrictive hypotheses (the summands are identical, independent, and normal), the sum of n distributions is normal. More importantly, we can relax the assumptions and demonstrate by simulation the following remarkable facts:

1. The sum of n distributions of any type is approximately normal.
2. Its mean is equal to the sum of the means of the components.
3. Its standard deviation is approximately equal to the sum of the standard deviations of the components divided by $\sqrt{n}$.
4. As a consequence of 2 and 3, the Coefficient of Variation, which we denote here as R, shrinks by a factor of $\sqrt{n}$.

Can we find similar facts about the product of n distributions?

Investigation
Regarding 4, recall that R, the Coefficient of Variation or Variability, is the ratio of standard deviation to mean. This dimensionless statistic is a common measure of dispersion. Rather than say that a variable has a standard deviation of 32 units of whatever we're measuring, we usually use the definition of R to say instead that the standard deviation of the variable is 5% of the mean (very tight) or 40% of the mean (widely dispersed). Recall that for normal distributions, the ranges of one, two, and three standard deviations to each side of the mean account respectively for 68%, 95%, and 99.7% of all the data. Thus when someone says "plus or minus 10%" (by which we understand the entire range to be between 90% and 110% of the mean) they are referring to a variable with a standard deviation of slightly more than 3.3% of the mean, or an R of about 0.033.

When do we add distributions?
We add distributions (sometimes called aggregation) whenever we a) add line item costs to get the total cost of drilling a well or constructing a pipeline, b) add reserves from different fields, c) add production from different wells. Thus, adding distributions is a common operation in analyzing uncertainty. So according to the remarkable facts, these aggregations will generate an estimate of the statistics we are seeking that is far more precise than the components. In other words, if we add 10 line items, each represented by a distribution whose min and max can be 30% from the mean, then the aggregate will be a distribution whose min and max may deviate from the mean by only about 10%. While this fact is generally regarded as helpful in arriving at estimates, most people don't realize that it can also uncover inconsistencies in our logic.

Using these facts to discover errors in logic
One case in point is the pitfall of assigning conservative

reserves, say P10 values, for individual fields only to learn that the aggregation of these values can grossly underestimate the reserves for a business unit made up of these fields. Another pitfall is when line item costs are represented by skewed right distributions and their modes (peaks) are taken as the base cost and summed to get a base value for the total cost: the result is much too conservative, and can have less than a 1% chance of occurrence.

What About Products?
What can we say about taking products of distributions, which is another common operation in uncertainty analysis? Before we present more Remarkable Facts, recall two things about products:

• the logarithm of a product is the sum of the logs of the factors, and
• a lognormal distribution is simply one whose logarithm is normal.

Thus if
$Y = X_1 X_2 \cdots X_n$,
then
$\log(Y) = \log(X_1) + \log(X_2) + \cdots + \log(X_n)$.
The Central Limit Theorem assures us that the latter is approximately normally distributed, so that Y must be approximately lognormal. Thus,

1. The product of distributions is approximately lognormal.

What about the other two aspects of the Central Limit Theorem?

• Can we say that the product of the means is the mean of the product?
• In addition, what can be said about the coefficient of variation of the product?

The first of these is true, is detailed in the Appendix, and can be found in many statistics books, PROVIDED there is NO CORRELATION among the factors $X_i$.

2. Mean of Y = Product (means of $X_i$)

The analog for the standard deviation, unfortunately, is more complicated. Whereas the R of a sum shrinks, the R of products expands. We prove in the Appendix that:

3. REstimate1: $1 + R_Y^2 \approx \prod_{i=1}^{n} (1 + R_i^2)$

This can be simplified somewhat with a little loss in accuracy to

REstimate2: $R_Y \approx \sqrt{\sum_{i=1}^{n} R_i^2}$

When the Rs are similar, we get

REstimate3: $R_Y \approx \sqrt{n}\,R$

In other words,

• Adding similar distributions shrinks the R by about a factor of $\sqrt{n}$;
• Multiplying similar distributions expands the R by about a factor of $\sqrt{n}$.

Only experience will reveal how good these approximations are in practice, when the components to be added or multiplied are dissimilar. We concentrate on cases where the output is a product of distributions, since they are so popular in the petroleum literature.

Applications
The most common example of a product of distributions is a volumetric estimate of hydrocarbon in place or reserves. We illustrate the approximation method with four examples from the literature. The distributions include triangular; triangular specified by P10, mode, and P90; normal; lognormal; and truncated normal.

Whereas the normal and lognormal distributions are typically specified with their mean and standard deviation, making the R simple to compute, all the other distribution types require either fancier formulas or simulation to obtain estimates of their mean and standard deviations.

For triangles, we have the simple formulas (where L, M, H are min, mode, max)
$\text{Mean} = (L + M + H)/3$
$\text{Variance} = (L^2 + M^2 + H^2 - LM - LH - MH)/18$
$\text{Stdev} = \sqrt{\text{Variance}}$

When P10 and P90 are specified instead of L and H, there is no closed form equation for L and H. One can either use an iterative procedure outlined by Murtha and Janusz¹ or simulate to estimate the values of mean and Stdev.

Because our approximation assumes that the output is a product of the inputs, it is sometimes necessary to transform some of the variables. For instance, if we start with Sw, then we must convert to So or Sg, which is (1-Sw). If we start with Bo, we must convert to 1/Bo, and so on. A simple way to find the appropriate distribution for the transformed variable is to run a simulation. The following examples illustrate how closely REstimate1 and REstimate2 approximate the simulated value. To use REstimate3, the inputs must have similar Rs.

Example 1. Cronquist² used triangular distributions by specifying the P10, mode, and P90 values rather than the conventional P0 (min), mode, and P100 (max). We used a utility program (ref, Murtha) to get the actual P0 and P100, then calculated the Rs and confirmed them with simulation. The inputs had Rs ranging from about 0.1 to 0.3. The simulated output R was 0.534 and the approximations are 0.551 and 0.525. Details are in Table 1.

Example 2. Davis³ uses extensive data to model a Lansing-Kansas City prospect in the Central Kansas Uplift. Using lognormal distributions for area and net pay, Davis switches to normal distributions for porosity, saturation and recovery factor, truncating each of these. Porosity is truncated at the low end to honor a judgment that less than 2% porosity is nonproductive and at the high end (30%) in keeping with a report on carbonate reservoirs in the US. While the low cutoff does alter the distribution a little, the high cutoff is four (!) standard deviations above the mean. Likewise, his cutoffs of 0% and 100%, intended to guarantee that the sampled values are meaningful, in fact do little to alter the mean and standard deviation. Area is a highly skewed distribution, with R of 1.04, and thereby dominates the product. Here the simulated output R is 1.36, compared to the approximations of 1.40 and 1.21 (Table 2).

Example 3. Caldwell and Heather⁴ used a product of five triangular distributions to estimate gas reserves in a coal seam under a desorption driving mechanism. Two of the distributions were sharply skewed left. Of course, the product is skewed right. The inputs have Rs ranging from 0.01 to 0.39. Using 2000 trials, the output has an R of 0.561 and the approximations are 0.560 and 0.535.

Example 4. Murtha⁵ used three distribution types for his five inputs: normal, lognormal and triangular, with Rs ranging from 0.067 to 0.400. The output R is 0.445 and REstimates 1 and 2 are 0.445 and 0.439. Note that the original description used a triangular distribution (0.2, 0.3, 0.45) for Sw, which easily converts to another triangular (with opposite skewness) for So, namely (0.55, 0.7, 0.8). We also had to convert the distribution Normal(1.34, 0.06) to an appropriate distribution for 1/Bo. Simulation showed this reciprocal to be approximately Normal(0.75, 0.034).

Correlation effects
None of these examples of products considered the complicating effects of correlation among inputs. When variables are correlated both the mean and the standard deviation can be affected. In general, volumetric estimates of reserves are more likely to have positive correlations between pairs of inputs. For instance, typical pairs of correlated variables are

• area and net pay
• porosity and hydrocarbon saturation
• net pay and recovery efficiency
• net-to-gross ratio and porosity.

It is much easier to imagine geologic reasons for positive than negative correlations between these paired variables. When all the correlations are positive, then it is easy to predict the direction but not the extent of the impact of correlation. We can make the following generalization (but remember what Oliver Wendell Holmes said: "No generalization is worth a damn, including this one") based on numerous simulations in a classroom setting.

• Positive correlation between one or more pairs of inputs to a product can increase the mean of the product by as much as 8 or 10% and the standard deviation by as much as 40 or 50%, depending on the strength of the correlation.
• The impact on R is to increase it beyond the uncorrelated product, by as much as 30%.

For Sums, the remarkable thing is that the mean is not affected by correlation, but the standard deviation is. Thus, we can generalize:

• Positive correlation between one or more pairs of inputs to an aggregation can increase the standard deviation and thus will increase the R by the same percentage. While the impact depends on the number and strength of the correlations, the result would typically be less than a 50% increase in R.

One MUST regard each case separately. It is often true that correlation makes little difference in the outputs.

Summary
Let $Y = X_1 X_2 \cdots X_n$. Then

1. Y is approximately lognormal
2. Mean of Y ≈ Product (means of $X_i$), with equality prevailing when the factors are independent
3. $1 + R_Y^2 \approx \prod_{i=1}^{n} (1 + R_i^2)$

which can be further approximated by
REstimate2: $R_Y \approx \sqrt{\sum_{i=1}^{n} R_i^2}$ in general
and
REstimate3: $R_Y \approx \sqrt{n}\,R$, when the Rs are similar.

Acknowledgment
Wilton Adams was kind enough as usual to lend his critical eye. Several years ago, Dave Morgan and I had some correspondence relating to the product of medians and modes for lognormal distributions, in which he used this $R^2 + 1$ term. I had forgotten all about it until I was in the thick of writing this paper. So I suspected he knew this stuff all along. At any rate, he deserves some credit for planting some seeds in my brain. Now as the paper goes to press, Dave tells me he and I should both be indebted to his former colleagues at BP, P. J. Smith, D. J. Hendry and A. R. Crowther⁶. As I read their paper, I realize there is some overlap and urge the reader to have a look at both.

References
1. Murtha, J.A. and Janusz, G.J.: "Spreadsheets Generate and Validate Uncertainty Distributions," Oil and Gas J., 13 March 1995.
2. Cronquist, C.: "Reserves and Probabilities -- Synergism or Anachronism?," JPT, Oct. 1991, p. 1258-1264.
3. Davis, John C.: "Risk Assessment for the Independent: The Monte Carlo Approach," Geobyte, vol. 7, No. 6, December 1992, January 1993, p. 57-65.
4. Caldwell, R.H. and Heather, D.I.: "How To Evaluate Hard-To-Evaluate Reserves," JPT, Aug. 1991, p. 998-1003.
5. Murtha, J.A.: "Monte Carlo Simulation: Its Status and Future" (Distinguished Author Series), JPT, April 1997, p. 361; also presented as paper SPE 37932 at the 1997 Annual Technical Conference and Exhibition, San Antonio, 5-8 October 1997.
6. Smith, P.J., Hendry, D.J., and Crowther, A.R.: "The Quantification and Management of Uncertainty in Reserves," paper SPE 26056 presented at the Western Regional Meeting held in Anchorage, Alaska, U.S.A., 26-28 May 1993.

Appendix
We state here without proof two well known facts.
Let X and Y be random variables and let Z be their sum. Then
$\mu_Z = \mu_X + \mu_Y$  (1)
Moreover, if X and Y are independent, then their variances are additive:
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$  (2)

Let $A = \mathrm{Lognormal}(\mu, \sigma)$, a lognormal distribution with mean $\mu$ and standard deviation $\sigma$.
By definition, the natural logarithm of A is a normal distribution, specifically
$\ln(A) = \mathrm{Normal}(\mu_1, \sigma_1)$, a normal distribution with mean $\mu_1$ and standard deviation $\sigma_1$, where⁶
$\mu_1 = \ln\left(\mu^2 / \sqrt{\mu^2 + \sigma^2}\right)$
$\sigma_1^2 = \ln\left((\mu^2 + \sigma^2) / \mu^2\right)$
Putting $R = \sigma/\mu$, the coefficient of variation, we have
$\sigma_1^2 = \ln(R^2 + 1)$

Now let $Y = \prod A_i = A_1 A_2 \cdots A_n$,
where $A_i = \mathrm{Lognormal}(\mu_i, \sigma_i)$, and assume further that they are independent.
Put $Z = \ln(Y) = \ln(A_1) + \ln(A_2) + \cdots + \ln(A_n) = \sum \ln(A_i)$
By the Central Limit Theorem, Z is normal and hence Y is lognormal. Moreover, from (1),
$\mu_Z = \sum \ln\left(\mu_i^2 / \sqrt{\mu_i^2 + \sigma_i^2}\right) = \ln\left(\prod \mu_i^2 / \sqrt{\mu_i^2 + \sigma_i^2}\right)$
But since Z = Ln(Y),
$\mu_Z = \ln\left(\mu_Y^2 / \sqrt{\mu_Y^2 + \sigma_Y^2}\right)$
Thus, exponentiating both sides,
$\prod \mu_i^2 / \sqrt{\mu_i^2 + \sigma_i^2} = \mu_Y^2 / \sqrt{\mu_Y^2 + \sigma_Y^2}$  (A)
From (2),
$\sigma_Z^2 = \sum \ln(R_i^2 + 1) = \ln\left(\prod (R_i^2 + 1)\right)$
But also $\sigma_Z^2 = \ln(R_Y^2 + 1)$, so that
$\prod (R_i^2 + 1) = R_Y^2 + 1$  (3)
or
$\prod \left((\mu_i^2 + \sigma_i^2) / \mu_i^2\right) = (\mu_Y^2 + \sigma_Y^2) / \mu_Y^2$  (B)
We have now shown that for special cases (products of independent lognormal distributions), the statistic $R^2 + 1$ is preserved by products.

Multiplying the left hand side (LHS) of A by the LHS of B, setting that equal to the product of the right hand side (RHS) of A with the RHS of B, then squaring both sides gives
$\prod (\mu_i^2 + \sigma_i^2) = \mu_Y^2 + \sigma_Y^2$  (A′)
Setting the product of the square of the LHS of A and the LHS of B equal to the product of the square of the RHS of A and the RHS of B, then taking square roots of each side, yields
$\prod \mu_i = \mu_Y$  (4)
which says that the product of the means is the mean of the product (assuming the factors to be lognormal and independent).

Finally, when the $R_i$ are approximately the same, we can approximate
$R_Y^2 + 1 \approx (R^2 + 1)^n = 1 + nR^2 + \cdots + R^{2n}$
Ignoring everything after the first two terms,
$R_Y^2 \approx nR^2$
$R_Y \approx \sqrt{n}\,R$

To generalize then, for any distributions $A_i$, letting
$Y = \prod A_i = A_1 A_2 \cdots A_n$
we have the approximations
$\prod (R_i^2 + 1) \approx R_Y^2 + 1$  (REstimate1)  (5)
and when the $R_i$ are similar (to R, say),
$R_Y \approx \sqrt{n}\,R$  (REstimate3)
Even if the $R_i$ are not similar, we can go a step further and argue
$R_Y^2 + 1 \approx \prod (R_i^2 + 1) = 1 + \sum R_i^2 + \cdots + \prod R_i^2$
Since, in general, the powers of $R_i^2$ are small, we again ignore all but the first two terms and get
$R_Y^2 \approx \sum R_i^2$
so that
$R_Y \approx \sqrt{\sum R_i^2}$  (REstimate2)
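The three approximations can be written as a few lines of code. This is a minimal sketch; the function names `r_estimate1`, `r_estimate2`, and `r_estimate3` are ours, chosen to follow the labels used in the body of the paper and in Tables 1 through 4, and the input values are the R values of the five inputs in Example 4.

```python
import math

def r_estimate1(rs):
    """Solve 1 + Ry^2 = prod(1 + Ri^2) for Ry."""
    return math.sqrt(math.prod(1.0 + r * r for r in rs) - 1.0)

def r_estimate2(rs):
    """Keep only the first-order terms: Ry ~ sqrt(sum(Ri^2))."""
    return math.sqrt(sum(r * r for r in rs))

def r_estimate3(r, n):
    """n similar inputs, each with coefficient of variation r: Ry ~ sqrt(n) * r."""
    return math.sqrt(n) * r

# R values for area, pay, porosity, So, 1/Bo (Example 4)
murtha_rs = [0.400, 0.067, 0.143, 0.075, 0.045]
print(f"REstimate1 = {r_estimate1(murtha_rs):.3f}")
print(f"REstimate2 = {r_estimate2(murtha_rs):.3f}")
```

REstimate2 can never exceed REstimate1, since it drops only non-negative cross terms from the expanded product.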

Cronquist: Triangular with P10, mode, and P90 specified

Units  Variable                    P10   Mode  P90   Mean      Std       Ractual  REstimate1  REstimate2
ac     Area                        20    40    50    35.70     11.12     0.312
ft     Net pay                     15    25    30    22.85     5.56      0.243
frac   So                          0.5   0.6   0.65  0.578     0.056     0.096
frac   Porosity                    0.1   0.2   0.25  0.178     0.056     0.312
frac   E (function of h, phi, Sw)                    0.217     0.025     0.114
STB    Reserves                                      435632.5  232543.3  0.534    0.551       0.525

Table 1. Comparison of R-values for actual and two approximations, Cronquist paper.
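When a triangle is specified by P10, mode, and P90, the min and max must be found numerically before the mean and standard deviation formulas apply. The sketch below uses a plain fixed-point iteration on the two percentile conditions. It is not necessarily the procedure of Murtha and Janusz or the utility program used for Table 1; the function names are ours, and it assumes that P10 lies below the mode and P90 above it.

```python
import math

def triangle_from_percentiles(p10, mode, p90, iterations=200):
    """Find (low, high) of a triangular distribution with the given P10, mode, P90.

    Assumes p10 < mode < p90. Plain fixed-point iteration on the two CDF
    conditions; convergence is not guaranteed for every input.
    """
    low = p10 - (mode - p10)     # crude starting guesses
    high = p90 + (p90 - mode)
    for _ in range(iterations):
        # CDF(p10) = (p10 - low)^2 / ((high - low)(mode - low)) = 0.1
        low = p10 - math.sqrt(0.1 * (high - low) * (mode - low))
        # 1 - CDF(p90) = (high - p90)^2 / ((high - low)(high - mode)) = 0.1
        high = p90 + math.sqrt(0.1 * (high - low) * (high - mode))
    return low, high

def triangle_moments(low, mode, high):
    """Mean and standard deviation of Triangular(low, mode, high)."""
    mean = (low + mode + high) / 3.0
    var = (low**2 + mode**2 + high**2
           - low * mode - low * high - mode * high) / 18.0
    return mean, math.sqrt(var)

low, high = triangle_from_percentiles(20, 40, 50)    # Cronquist area, acres
mean, std = triangle_moments(low, 40, high)
print(f"low={low:.2f} high={high:.2f} mean={mean:.2f} std={std:.2f} R={std / mean:.3f}")
```

The resulting mean and standard deviation can be compared with the Area row of Table 1.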

Davis: Lognormal and truncated Normal

Dist type  Variable   Mean  Std    Trunc low  Trunc high  trueMean  trueStd   Ractual  REstimate1  REstimate2
Lognormal  Area       92    96                            92        96        1.043
Lognormal  Thickness  12    4                             12        4         0.333
TNorm      Porosity   0.1   0.05   0.02       0.30        0.106     0.045     0.422
TNorm      So         0.7   0.1    0.00       1.00        0.700     0.099     0.142
TNorm      RecFac     0.3   0.075  0.00       1.00        0.300     0.075     0.250
STB        Reserves                                       188121.5  255948.4  1.36     1.40        1.21

Table 2. Comparison of R-values for actual and two approximations, Davis, Harbaugh book.
Caldwell & Heather: Triangular

Units      Variable     Symbol  Min   Mode  Max   Mean    Std    Ractual  REstimate1  REstimate2
acre       Area         A       20    320   640   326.7   126.6  0.39
ft         NetPay       h       30    60    70    53.3    8.5    0.16
scf/ton    GasContent   C       200   350   550   366.7   71.7   0.20
ton/ac-ft  Density      rho     1800  1825  1850  1825.0  10.2   0.01
fraction   RecFac       Er      0.10  0.50  0.60  0.400   0.11   0.27
Bcf        Reserves     G                         4.655   2.610  0.561    0.560       0.535

Table 3. Comparison of R-values for actual and two approximations, Caldwell-Heather paper.
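The Caldwell-Heather case is simple enough to re-simulate directly. The following sketch samples the five triangular inputs of Table 3 independently and compares the simulated R with the two estimates. The trial count, seed, and variable names are our choices, not those of the original study, which used 2000 trials.

```python
import math
import random
import statistics

# (min, mode, max) for each input, as listed in Table 3
inputs = {
    "area_acre": (20, 320, 640),
    "net_pay_ft": (30, 60, 70),
    "gas_content_scf_per_ton": (200, 350, 550),
    "density_ton_per_acft": (1800, 1825, 1850),
    "recovery_factor": (0.10, 0.50, 0.60),
}

def simulate_reserves_bcf(trials, seed=0):
    """Sample independent triangular inputs and return reserves in Bcf."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        product = 1.0
        for low, mode, high in inputs.values():
            # Argument order of random.triangular is (low, high, mode).
            product *= rng.triangular(low, high, mode)
        results.append(product / 1e9)    # scf -> Bcf
    return results

def triangle_r(low, mode, high):
    """Coefficient of variation of Triangular(low, mode, high)."""
    mean = (low + mode + high) / 3.0
    var = (low**2 + mode**2 + high**2
           - low * mode - low * high - mode * high) / 18.0
    return math.sqrt(var) / mean

reserves = simulate_reserves_bcf(50_000)
mean = statistics.fmean(reserves)
r_simulated = statistics.pstdev(reserves) / mean
rs = [triangle_r(*spec) for spec in inputs.values()]
r_est1 = math.sqrt(math.prod(1 + r * r for r in rs) - 1)
r_est2 = math.sqrt(sum(r * r for r in rs))
print(f"mean={mean:.3f} Bcf  R simulated={r_simulated:.3f}  "
      f"REstimate1={r_est1:.3f}  REstimate2={r_est2:.3f}")
```

For independent factors, the second moment of a product is the product of the second moments, so any gap between the simulated R and REstimate1 in a run like this is sampling noise. Correlated inputs would open a genuine gap.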

Murtha: Lognormal, normal, triangular

Dist type   Variable      Min   Mode  Max  Mean   Std    Ractual  REstimate1  REstimate2
Lognormal   Area          N/A   N/A   N/A  2000   800    0.400
Normal      Pay           N/A   N/A   N/A  45     3      0.067
Normal      Porosity      N/A   N/A   N/A  0.14   0.02   0.143
Triangular  So            0.55  0.7   0.8  0.683  0.05   0.075
Normal      1/Bo          N/A   N/A   N/A  0.75   0.034  0.045
MMSTB       Oil in Place                   50.07  22.28  0.445    0.445       0.439

Table 4. Comparison of R-values for actual and two approximations, Murtha paper.
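Two of the inputs in Table 4 are transformed versions of the originally published variables: So = 1 - Sw and 1/Bo. The sketch below shows one way to obtain them, assuming the Sw triangle and the Normal(1.34, 0.06) for Bo quoted in Example 4; the function names, trial count, and seed are ours.

```python
import random
import statistics

def complement_triangle(low, mode, high):
    """Triangular(low, mode, high) for Sw maps to a triangular for So = 1 - Sw."""
    return (1.0 - high, 1.0 - mode, 1.0 - low)

def reciprocal_moments(mean, std, trials=100_000, seed=0):
    """Simulate 1/X for X ~ Normal(mean, std); return the sample mean and std."""
    rng = random.Random(seed)
    samples = [1.0 / rng.gauss(mean, std) for _ in range(trials)]
    return statistics.fmean(samples), statistics.pstdev(samples)

print(complement_triangle(0.2, 0.3, 0.45))            # Sw -> So
inv_mean, inv_std = reciprocal_moments(1.34, 0.06)    # Bo -> 1/Bo
print(f"1/Bo: mean={inv_mean:.3f} std={inv_std:.3f} R={inv_std / inv_mean:.3f}")
```

Reflecting a triangle swaps its min and max, which is why the skewness changes sign. The reciprocal has no such shortcut, so simulation is the simplest check.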
