
"Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers,
particularly the use of statistics to bolster weak arguments. It is also sometimes colloquially
used to doubt statistics used to prove an opponent's point.
The term was popularised in the United States by Mark Twain (among others), who attributed
it to the British Prime Minister Benjamin Disraeli: "There are three kinds of lies: lies,
damned lies, and statistics." However, the phrase is not found in any of Disraeli's works and
the earliest known appearances were years after his death. Several other people have been
listed as originators of the quote, and it is often erroneously attributed to Twain himself.

a) Parameter
b) Statistic

What's a Probability Distribution?


Things happen all the time: dice are rolled, it rains, buses arrive.

After the fact, the specific outcomes are certain: the dice came up 3 and 4, there was half
an inch of rain today, the bus took 3 minutes to arrive.
Before, we can only talk about how likely the outcomes are.

Probability distributions describe what we think the probability of each outcome is, which
is sometimes more interesting to know than simply which single outcome is most likely.
They come in many shapes, but in only one size: probabilities in a distribution always add
up to 1.

Bernoulli: a coin toss, giving 1s or 0s (not necessarily equiprobable if someone has a weighted coin).
Uniform: rolling a fair die; all of the multiple outcomes are equally likely.
Binomial: the sum of the outcomes of a series of Bernoulli events, such as picking red or black balls from a large bucket of 50:50 mixed balls, then replacing the ball and picking another.
Hypergeometric: similar to Binomial, but you affect the next result at each step by not replacing the last ball selected.
Poisson: incorporates a frequency component and expresses the probability of a given number of events occurring in a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event. For example, the number of decay events per second from a radioactive source.
Geometric: related to the binomial distribution; it reflects the number of failures before the first success, e.g. a newly-wed couple plans to have children and will continue until the first girl. What is the probability that there are zero boys before the first girl, one boy before the first girl, two boys before the first girl, and so on?
Negative Binomial: similar to the above, but now we are measuring the number of successes in a sequence before a specified number of failures. For example, measuring the number of days a certain machine works before it breaks down for a week.
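The discrete distributions listed above can each be written as a probability mass function. The following is a minimal Python sketch using only the standard library; the function names are illustrative, and the parameters in the usage line (a 50:50 chance of a girl at each birth) are assumptions, not data from the text.

```python
import math


def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for one trial that gives 1 with probability p and 0 otherwise."""
    return p if k == 1 else 1.0 - p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent trials, i.e. sampling with replacement)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def hypergeometric_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """P(k successes in `draws` picks made without replacement)."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, draws - k)
        / math.comb(population, draws)
    )


def poisson_pmf(k: int, rate: float) -> float:
    """P(k events in a fixed interval when events occur at an average `rate`)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def geometric_pmf(k: int, p: float) -> float:
    """P(k failures before the first success)."""
    return (1.0 - p) ** k * p


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(k successes before the r-th failure), with success probability p per trial."""
    return math.comb(k + r - 1, k) * p**k * (1.0 - p) ** r


# Boys before the first girl, assuming P(girl) = 0.5 on every birth.
for boys in range(3):
    print(boys, geometric_pmf(boys, 0.5))  # 0.5, 0.25, 0.125
```

Each function returns probabilities that add up to 1 over all possible values of k, as the text requires of any distribution.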

The normal distribution is sometimes called a Gaussian Distribution, after the German mathematician Carl Friedrich Gauss.

In statistics, the standard deviation (SD, also represented by the Greek letter sigma or the
Latin letter s) is a measure that is used to quantify the amount of variation or dispersion of
a set of data values. A low standard deviation indicates that the data points tend to be
close to the mean (also called the expected value) of the set, while a high standard
deviation indicates that the data points are spread out over a wider range of values.
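As a sketch of the definition above, the sample standard deviation can be computed directly from its formula and checked against Python's `statistics.stdev`. The two porosity samples are made-up values chosen to share a mean of 0.175 while differing in spread; `sample_std` is an illustrative helper name.

```python
import math
import statistics


def sample_std(values: list) -> float:
    """Sample standard deviation s = sqrt(sum((x - mean)^2) / (n - 1))."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))


# Hypothetical porosity samples (fractions) with the same mean but different spread.
tight = [0.170, 0.180, 0.175, 0.172, 0.178]
spread = [0.100, 0.250, 0.140, 0.210, 0.175]

for name, data in (("tight", tight), ("spread", spread)):
    print(name, statistics.mean(data), sample_std(data), statistics.stdev(data))
```

`statistics.stdev` divides by n - 1 (a sample estimate); `statistics.pstdev` divides by n and treats the data as the whole population.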
Some Hydrocarbon Geoscience variables commonly found to display normal distributions
are:
Core Porosity
Mineral Abundances in specific rocks
Chemical Abundances in specific rocks

By plotting Core Porosity values on Normal Probability paper, a normal distribution would plot as a straight line.

Deviation from this line at the low-porosity end of the data may indicate a lack of normality or a measurement error. However, it is a reasonable fit and the extrapolation of a best-fit line is satisfactory. A fit of this quality from an input dataset of only 42 points is good and helps to confirm the validity of the dataset.
We can easily read off the percentiles for P10 (~13.5%), P50, which is also the Mean (~17.5%), and P90 (~21.5%).
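Because a normal distribution is symmetric, P10 and P90 sit the same distance either side of the P50/Mean (here 4 porosity units each way), and that distance fixes the standard deviation. A small sketch with Python's `statistics.NormalDist`, using only the three values read from the plot; the implied sigma is derived in the sketch, not quoted in the text.

```python
from statistics import NormalDist

# Percentiles read from the probability plot (porosity, %).
p10, p50, p90 = 13.5, 17.5, 21.5

# For a normal distribution, P90 - P10 = 2 * z(0.9) * sigma, with z(0.9) ~ 1.2816.
z90 = NormalDist().inv_cdf(0.90)
sigma = (p90 - p10) / (2 * z90)  # implied standard deviation, roughly 3.1 porosity units

fitted = NormalDist(mu=p50, sigma=sigma)
print(f"sigma = {sigma:.2f}")
print([round(fitted.inv_cdf(q), 2) for q in (0.10, 0.50, 0.90)])
```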


Mode is the most likely outcome.
Median has 50% of outcomes above it and 50% below it.
Mean is the average outcome if you have enough samples. It is not the most likely outcome in any single case.
Some Hydrocarbon Geoscience variables commonly found to display log-normal distributions
are:
Permeability
Net:Gross
Fracture Distribution
Field Sizes
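For a log-normal variable, mode, median and mean separate in a fixed order. If ln(X) is normal with mean mu and standard deviation sigma, then mode = exp(mu - sigma^2), median = exp(mu) and mean = exp(mu + sigma^2 / 2). A minimal sketch; `lognormal_summary` is an illustrative name and the permeability parameters (median 100 mD, sigma = 1) are hypothetical.

```python
import math


def lognormal_summary(mu: float, sigma: float) -> tuple:
    """Return (mode, median, mean) of X, where ln(X) ~ Normal(mu, sigma)."""
    mode = math.exp(mu - sigma**2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma**2 / 2)
    return mode, median, mean


# Hypothetical permeability distribution: median 100 mD, sigma of ln(k) = 1.0.
mode, median, mean = lognormal_summary(math.log(100.0), 1.0)
print(f"mode {mode:.1f} mD, median {median:.1f} mD, mean {mean:.1f} mD")
# mode 36.8 mD, median 100.0 mD, mean 164.9 mD
```

The larger sigma is, the further the mean is pulled above the median by the long upper tail.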

Aside: Using Permeability Data.

When using permeability in Darcy's equation to calculate fluid flow, the best value of permeability to use is the Median value, as this best represents the reservoir heterogeneities. However, most core reports which list an average permeability are actually listing the arithmetic average, i.e. the Mean. In a Log-Normal distribution the Mean is higher than the Median, and hence using this value will lead to higher calculated flow rates...
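Because Darcy's equation, q = k A dP / (mu L), is linear in k, any overstatement of permeability passes straight through to the calculated flow rate. A sketch with a made-up set of core-plug permeabilities, in Darcy units (k in darcies, A in cm², dP in atm, mu in cP, L in cm, q in cm³/s); the helper `darcy_flow` and the geometry and fluid values are illustrative assumptions.

```python
import statistics

# Hypothetical core-plug permeabilities in millidarcies (skewed, log-normal-like).
perms_md = [5, 12, 20, 35, 60, 90, 150, 400, 1200]


def darcy_flow(k_darcy: float, area_cm2: float, dp_atm: float,
               viscosity_cp: float, length_cm: float) -> float:
    """Linear Darcy flow rate q = k * A * dP / (mu * L) in cm^3/s (Darcy units)."""
    return k_darcy * area_cm2 * dp_atm / (viscosity_cp * length_cm)


k_median = statistics.median(perms_md)  # 60 mD
k_mean = statistics.fmean(perms_md)     # arithmetic mean, pulled up by the 1200 mD plug

# Illustrative geometry and fluid: A = 100 cm^2, dP = 2 atm, mu = 1 cP, L = 10 cm.
q_median = darcy_flow(k_median / 1000, 100.0, 2.0, 1.0, 10.0)
q_mean = darcy_flow(k_mean / 1000, 100.0, 2.0, 1.0, 10.0)
print(f"median k: {k_median} mD -> q = {q_median:.2f} cm3/s")
print(f"mean k:   {k_mean:.1f} mD -> q = {q_mean:.2f} cm3/s")
```

The ratio of the two flow rates is exactly the ratio of mean to median permeability, so a skewed dataset inflates the flow estimate by the same factor.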


Arithmetic mean is good where you have a large sample size.
Statistical Mean is accurate, but complex to compute and prone to bias from a few high values (remember that, theoretically, the upper tail of a Log-Normal distribution extends to infinity).
Truncated Statistical Mean removes the bias from a few large values, but remains relatively complex to compute.
Swanson's Mean removes the upside bias and is simple to compute.
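Swanson's Mean, in its commonly quoted 30-40-30 form, weights the three percentiles as 0.3 P10 + 0.4 P50 + 0.3 P90. A sketch comparing it with the exact mean of a log-normal distribution; the distribution parameters are assumed for illustration.

```python
import math
from statistics import NormalDist


def swanson_mean(p10: float, p50: float, p90: float) -> float:
    """Swanson's 30-40-30 approximation to the mean from three percentiles."""
    return 0.3 * p10 + 0.4 * p50 + 0.3 * p90


# Hypothetical log-normal variable: ln(X) ~ Normal(mu=0, sigma=0.5).
mu, sigma = 0.0, 0.5
z = NormalDist().inv_cdf
p10 = math.exp(mu + sigma * z(0.10))
p50 = math.exp(mu + sigma * z(0.50))
p90 = math.exp(mu + sigma * z(0.90))

exact_mean = math.exp(mu + sigma**2 / 2)
approx_mean = swanson_mean(p10, p50, p90)
print(f"exact {exact_mean:.4f}, Swanson {approx_mean:.4f}")
```

Because the weights are symmetric, the result does not depend on whether P10 is taken as the low or the high case. In this sketch the gap between the two means widens if sigma is increased.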


Remember that only the Means of distributions can be summed to get an understanding of the value of a portfolio. Summing Means gives the Mean of the total set. Summing P10s or P90s is not valid: each addition moves your value further away from the original percentile.
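A quick Monte Carlo check of this rule, assuming three independent prospects that each follow the same hypothetical log-normal distribution, and using the convention of this document in which P10 is the low (10th percentile) value:

```python
import random
import statistics

random.seed(42)
N = 100_000           # Monte Carlo trials
N_PROSPECTS = 3
MU, SIGMA = 0.0, 1.0  # hypothetical parameters of ln(value) for each prospect

prospects = [[random.lognormvariate(MU, SIGMA) for _ in range(N)]
             for _ in range(N_PROSPECTS)]
portfolio = [sum(trial) for trial in zip(*prospects)]  # total value in each trial


def p10(values: list) -> float:
    """10th percentile (first of the nine decile cut points)."""
    return statistics.quantiles(values, n=10)[0]


sum_of_means = sum(statistics.fmean(p) for p in prospects)
mean_of_portfolio = statistics.fmean(portfolio)
sum_of_p10s = sum(p10(p) for p in prospects)
p10_of_portfolio = p10(portfolio)

print(f"sum of means {sum_of_means:.3f} vs portfolio mean {mean_of_portfolio:.3f}")
print(f"sum of P10s  {sum_of_p10s:.3f} vs portfolio P10  {p10_of_portfolio:.3f}")
```

The means agree (up to rounding), while the sum of the individual P10s falls well below the P10 of the portfolio, because it is unlikely that all three prospects come in at their low case at the same time.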


The uniform distribution is sometimes also called a random distribution.

It is used where you have no reasonable idea of how the probability of the parameter varies across its range; for
example:
What day-rate will be charged for a drilling rig? The market is charging between
$200,000/day and $300,000/day, but you have not yet begun negotiations with a
contractor.
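In a Monte Carlo model this day-rate would simply be sampled uniformly between the two market quotes. A minimal sketch using Python's `random.uniform`; the sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)
LOW, HIGH = 200_000, 300_000  # market range for the rig day-rate ($/day)

day_rates = [random.uniform(LOW, HIGH) for _ in range(50_000)]

# Every value in the range is equally likely, so the mean sits at the midpoint
# and the deciles are evenly spaced across the range.
deciles = statistics.quantiles(day_rates, n=10)
print(f"mean  {statistics.fmean(day_rates):,.0f} $/day")
print(f"P10   {deciles[0]:,.0f} $/day, P90 {deciles[-1]:,.0f} $/day")
```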


Triangular distributions are not natural distributions, but they allow you to input a limited evaluation of a variable and use this to control inputs to Monte Carlo analyses.
For example, in our drilling costs case from the uniform distribution, we have now begun discussions with a contractor, and our negotiators believe they can narrow the range of costs down to somewhere between $220,000/day and $350,000/day, with a most likely cost of $250,000/day.

In some cases geologists use this distribution to represent porosity or net:gross values in Monte Carlo analyses where they have very little data to guide them on a realistic distribution but, in general, this is not good practice.
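The narrowed drilling-cost estimate maps directly onto Python's `random.triangular(low, high, mode)`. A sketch; note that the mean of a triangular distribution, (low + mode + high) / 3, sits above the most likely value here because the range is skewed towards higher costs.

```python
import random
import statistics

random.seed(7)
LOW, MODE, HIGH = 220_000, 250_000, 350_000  # negotiated range and most likely rate ($/day)

day_rates = [random.triangular(LOW, HIGH, MODE) for _ in range(50_000)]

analytic_mean = (LOW + MODE + HIGH) / 3  # about 273,333 $/day
print(f"analytic mean {analytic_mean:,.0f}, sampled mean {statistics.fmean(day_rates):,.0f}")
print(f"P50 {statistics.median(day_rates):,.0f} $/day")
```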


Using the Bernoulli process to calculate the likelihood of a success, or the number of expected successes, in a 5-well drilling campaign is relatively straightforward, but it depends on three fundamental assumptions:
Only two possible outcomes
Each trial is an independent event
The probability of each outcome remains constant over repeated trials.

Of these, independence is the assumption most likely to cause problems in assessing our drilling programme, as few 5-well programmes are likely to be testing completely independent opportunities. At least one of the Play elements of Source, Reservoir, Trap, Seal and Migration is likely to be common to several wells, and then the independence breaks down...
It also assumes replacement of opportunities, which is an unrealistic assumption.
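Under those three assumptions, the number of discoveries in the 5-well campaign is binomial. A sketch assuming a hypothetical chance of success of 20% per well (the text gives no value):

```python
import math

N_WELLS = 5
P_SUCCESS = 0.2  # hypothetical chance of success per well


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent wells, each with chance p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


outcomes = {k: binomial_pmf(k, N_WELLS, P_SUCCESS) for k in range(N_WELLS + 1)}
expected_successes = N_WELLS * P_SUCCESS
p_at_least_one = 1 - outcomes[0]

for k, prob in outcomes.items():
    print(f"{k} discoveries: {prob:.4f}")
print(f"expected discoveries {expected_successes:.1f}, P(at least one) {p_at_least_one:.4f}")
```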

A slight improvement in this calculation may come from using a Hypergeometric Distribution, as it assumes non-replacement of opportunities.
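To see the effect of non-replacement, assume (hypothetically) a pool of 20 prospects of which exactly 4 would be discoveries, a 20% rate, and drill 5 of them. A sketch comparing the hypergeometric and binomial chances:

```python
import math

POOL, DISCOVERIES_IN_POOL, WELLS = 20, 4, 5  # hypothetical prospect inventory
P = DISCOVERIES_IN_POOL / POOL               # 0.2, the equivalent binomial chance


def hypergeometric_pmf(k: int) -> float:
    """P(k discoveries in WELLS picks from the pool, without replacement)."""
    return (math.comb(DISCOVERIES_IN_POOL, k)
            * math.comb(POOL - DISCOVERIES_IN_POOL, WELLS - k)
            / math.comb(POOL, WELLS))


def binomial_pmf(k: int) -> float:
    """P(k discoveries in WELLS independent trials with chance P, with replacement)."""
    return math.comb(WELLS, k) * P**k * (1 - P) ** (WELLS - k)


hyper = [hypergeometric_pmf(k) for k in range(WELLS + 1)]
binom = [binomial_pmf(k) for k in range(WELLS + 1)]
for k, (h, b) in enumerate(zip(hyper, binom)):
    print(f"{k} discoveries: hypergeometric {h:.4f}, binomial {b:.4f}")
```

Both give the same expected number of discoveries (1.0), but without replacement the chance of a completely dry campaign drops from about 0.33 to about 0.28, because each dry hole leaves a slightly richer pool behind.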

Unfortunately, both are over-simplistic in their treatment of the additional information coming from each successive well and how that impacts the probability of success in all subsequent wells...


Monte Carlo simulation is a computerized mathematical technique that allows people to account for risk in quantitative analysis and decision making.
The technique was first used by scientists working on the atom bomb; it was named for
Monte Carlo, the Monaco resort town renowned for its casinos. Since its introduction in
World War II, Monte Carlo simulation has been used to model a variety of physical and
conceptual systems.

Monte Carlo simulation performs risk analysis by building models of possible results by
substituting a range of values (a probability distribution) for any factor that has inherent
uncertainty. It then calculates results over and over, each time using a different set of
random values from the probability functions. Depending upon the number of
uncertainties and the ranges specified for them, a Monte Carlo simulation could involve
thousands or tens of thousands of recalculations before it is complete. Monte Carlo
simulation produces distributions of possible outcome values.
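A small Monte Carlo sketch reusing the rig day-rate example from this document: total rig cost = day-rate × drilling days. The triangular day-rate follows the negotiated range; the drilling-duration distribution (30 to 60 days, most likely 40) is a hypothetical assumption, the two inputs are treated as independent, and `one_trial` is an illustrative name.

```python
import random
import statistics

random.seed(2024)
N_TRIALS = 50_000


def one_trial() -> float:
    """One recalculation of the model with freshly sampled inputs."""
    day_rate = random.triangular(220_000, 350_000, 250_000)  # $/day, mode 250k
    days = random.triangular(30, 60, 40)                      # hypothetical duration
    return day_rate * days


costs = [one_trial() for _ in range(N_TRIALS)]
deciles = statistics.quantiles(costs, n=10)

# For independent inputs, the mean cost equals mean day-rate x mean duration.
analytic_mean = (220_000 + 350_000 + 250_000) / 3 * (30 + 60 + 40) / 3

print(f"P10 ${deciles[0]:,.0f}  P50 ${deciles[4]:,.0f}  P90 ${deciles[8]:,.0f}")
print(f"mean ${statistics.fmean(costs):,.0f} (analytic ${analytic_mean:,.0f})")
```

The output is itself a distribution of possible costs, from which percentiles and the mean can be read just as from a measured dataset.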


For instance, suppose you draw a card from a deck of 52, without showing it to me.
Assuming the deck has been well shuffled, I should believe that the probability that the
card is a jack, P(A), is 4/52, or 1/13, since there are four jacks in the deck. But now suppose
you tell me that the card is a face card. The probability that the card is a jack, given that it is
a face card, is 4/12, or 1/3, since there are 12 face cards in the deck. We represent this
conditional probability as P(A|B), meaning the probability that the card is a jack given
that it is a face card.
The idea is that P(A|B) represents the probability assigned to A after taking into account
the new piece of evidence, B.

To calculate this we need, in addition to the prior probability P(A), two further conditional
probabilities indicating how probable our piece of evidence is depending on whether our
theory is or is not true.
We can represent these as P(B|A) and P(B|~A), where ~A is the negation of A, i.e. the
proposition that A is false.
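Putting these pieces together gives Bayes' theorem. Applied to the card example (A: the card is a jack, B: it is a face card), every jack is a face card, so P(B|A) = 1, while 8 of the 48 non-jacks are face cards, so P(B|~A) = 1/6. The result reproduces the 1/3 found by counting:

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid {\sim}A)\,P({\sim}A)}
            = \frac{1 \cdot \frac{1}{13}}{1 \cdot \frac{1}{13} + \frac{1}{6} \cdot \frac{12}{13}}
            = \frac{\frac{1}{13}}{\frac{3}{13}} = \frac{1}{3}
```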

We can use Bayes' theorem to understand the way that probabilities change during a drilling campaign, and so make decisions as quickly as possible. It tells us how our understanding of a drilling portfolio should change as we progress through the campaign.
We set an original probability of finding oil in the whole campaign, then start drilling wells. Each new well is a test of the predicted success rate and will change the chances of the successor wells. We can also use the system to predict how many dry holes would indicate that our initial estimates were incorrect and that we may need to re-assess.
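A minimal sketch of that updating loop, under deliberately simple, hypothetical assumptions: the play either works (prior probability 0.6) or does not; if it works each well has a 30% chance of success, and if it does not every well is dry. Each dry hole is fed through Bayes' theorem to revise the belief that the play works, and the loop reports how many consecutive dry holes push that belief below a chosen re-assessment threshold (0.2, also an assumption). The function names are illustrative.

```python
from typing import Optional


def update_after_dry_hole(p_play_works: float, p_success_if_works: float,
                          p_success_if_not: float = 0.0) -> float:
    """Posterior P(play works | dry hole) via Bayes' theorem."""
    p_dry_if_works = 1.0 - p_success_if_works
    p_dry_if_not = 1.0 - p_success_if_not
    numerator = p_dry_if_works * p_play_works
    return numerator / (numerator + p_dry_if_not * (1.0 - p_play_works))


def dry_holes_to_reassess(prior: float, p_success_if_works: float,
                          threshold: float, max_wells: int = 50) -> Optional[int]:
    """Number of consecutive dry holes after which P(play works) < threshold."""
    belief = prior
    for n in range(1, max_wells + 1):
        belief = update_after_dry_hole(belief, p_success_if_works)
        # Chance that the next well succeeds (assumes wells are always dry if the play fails).
        print(f"after {n} dry hole(s): P(play works) = {belief:.3f}, "
              f"P(next well succeeds) = {belief * p_success_if_works:.3f}")
        if belief < threshold:
            return n
    return None


print("re-assess after", dry_holes_to_reassess(0.6, 0.3, 0.2), "dry holes")
```

Each dry hole lowers the chance of success for the remaining wells, which is exactly the information that the Binomial and Hypergeometric calculations ignore.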


To calculate the Expected Monetary Value (EMV) in project risk management, you need to:
Step 1: Assign a probability of occurrence for the risk.
Step 2: Assign a monetary value to the impact of the risk when it occurs.
Step 3: Multiply the values from Step 1 and Step 2.
The value you get after performing Step 3 is the Expected Monetary Value. This value is positive for opportunities (positive risks) and negative for threats (negative risks). Project risk management requires you to address both types of project risks.
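The three steps reduce to one multiplication per risk. A sketch with two hypothetical register entries, one threat and one opportunity; the `Risk` class is illustrative, and summing the entries into a net figure is a common convention assumed here rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # Step 1: chance the risk occurs (0 to 1)
    impact: float       # Step 2: monetary impact if it occurs (negative = threat)

    @property
    def emv(self) -> float:
        """Step 3: Expected Monetary Value = probability x impact."""
        return self.probability * self.impact


# Hypothetical register entries.
register = [
    Risk("Rig arrives late (threat)", 0.30, -500_000),
    Risk("Early rig release (opportunity)", 0.20, 200_000),
]

for risk in register:
    print(f"{risk.name}: EMV {risk.emv:+,.0f} $")
print(f"net EMV {sum(r.emv for r in register):+,.0f} $")
```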
