
Confidence Intervals for Validating Simulation Models [1]

Summary. This paper summarizes the processes for building and using confidence intervals to evaluate the validity of simulation models. [2]
We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select an acceptable probability of inclusion, e.g., 98%. From that probability of inclusion (P) we can calculate the probability of exclusion (α), and with that and the sample size n calculate the interval (in terms of the experimental average t̄ and variance S²) within which the simulation data must fall to be considered representative of real life.
The basic principle is to use live data sampling to establish a statistically valid estimate of the population mean and variance for a particular parameter, as well as a range of values about each (called confidence intervals) that is determined by the percent confidence the operator requires. Given these intervals, the task is then to decide whether the corresponding mean and variance statistics of the simulation data fall within the acceptable confidence interval about the sample statistics.
Parameters other than mean and variance can be used for comparison. For example, the Universal Naval
Task List (UNTL) contains a host of measures of effectiveness (MOEs) that can be used for making
comparisons between systems and processes. We have restricted ourselves to the mean and variance in
this paper only for illustrative purposes.
Confidence Interval. A confidence interval is the region in the vicinity of a specified value of a
phenomenon’s parameter within which another value may lie with a given probability [100(1-α)%], based
on the results of observing n samples of the phenomenon:

Figure 1 - Generic Confidence Interval Graph


In general, experimental sample data are assumed to be only representative of some larger (and
generally infinite) set of data elements. Suppose we are measuring the time t to detect a target after it
arrives within a specified range of the sensor. We can observe the detection phenomenon a number of
times and calculate the average time to detect. But we might be more interested in knowing what the
average detection time would be if we observed an infinite number of trials. This infinite set is said to be
the global “population” of data.
Sample and Population Statistics. A "statistic" is some number calculated from data that is used to characterize the data set. There are many different statistics; the more commonly used are the arithmetic mean ("average") and the variance. [3]
• The population arithmetic mean is generally indicated by the Greek "m" letter µ ("mu"), while the sample mean is indicated by a letter with a bar over the top, for example t̄ (called "t-bar"). The sample mean is calculated as the sum of the parameters divided by their count: [4]

[1] C. Andrews La Varre, Booz Allen & Hamilton Inc., Newport, RI, October 2000.
[2] Douglas C. Montgomery, Statistical Quality Control, John Wiley & Sons, 1985, ISBN 0-471-80870-9, Section 2-3, Chapter 3.
[3] Others include the standard deviation, the t statistic, and many, many others. We will highlight those of immediate interest for this problem.
[4] For example, the mean of 2, 4, and 6 = (2+4+6)/3 = 4.

$$\bar{t} = \frac{\sum_{i=1}^{n} t_i}{n}$$
• The population variance is generally indicated by the squared Greek "s" letter σ² ("sigma squared"), while the sample variance is indicated by the squared capital "S" letter S² ("S-squared"). The sample variance is the average squared distance that the parameter varies from its average: [5]

$$S^2 = \frac{\sum_{i=1}^{n} \left(t_i - \bar{t}\,\right)^2}{n-1}$$
• The square root of the variance is called the "standard deviation," for both the population (σ) and the sample (S). (A short numerical sketch of these two calculations follows this list.)
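As a quick numerical check of these two formulas, the footnote example {2, 4, 6} can be worked through in a few lines of Python (a minimal sketch; nothing beyond the standard language is assumed):

```python
# Sample mean and sample variance for the footnote example {2, 4, 6}.
data = [2.0, 4.0, 6.0]
n = len(data)

# Sample mean: sum of the observations divided by their count.
t_bar = sum(data) / n                      # (2 + 4 + 6) / 3 = 4.0

# Sample variance: squared deviations from the mean, divided by n - 1.
s_squared = sum((t - t_bar) ** 2 for t in data) / (n - 1)   # (4 + 0 + 4) / 2 = 4.0

# Sample standard deviation is the square root of the variance.
s = s_squared ** 0.5                       # 2.0
print(t_bar, s_squared, s)
```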
It turns out [6] that the sample and population statistics are related mathematically by expressions that depend on the sample size. For the normal distribution discussed below these relationships are that:

• the sample mean t̄ is distributed normally about the true population mean µ, but with a variance equal to the population variance σ² divided by the sample size n (that is, the variance of t̄ is σ²/n).
Similarly, the variance of the sample set is not the same as that of the population; here the sample variance is related to the population variance through a particular curve called the Chi-squared distribution (specifically, the quantity (n − 1)S²/σ² follows a Chi-squared distribution with n − 1 degrees of freedom). The formula for that distribution is rather arcane, so it is bypassed here for simplicity.
Confidence intervals are used to evaluate how close sample statistics are to the population statistics.
Effectively, they let us scale the observed data to a common (“standard”) interval in order to reach
consistent conclusions about experimental data. The formulas are different for different kinds of events.
We will describe the formulas for the most common type of event, that whose data conform to the familiar
bell-shaped curve, the “Normal” distribution.
Normal Distribution. A "Normal" distribution curve is a symmetric curve centered on an average, the "arithmetic mean" value. It gives a graph (a continuous histogram) of the expected number of times a Normally distributed parameter is likely to occur. The edges (or "skirts") of the graph fall off in a manner defined by the "variance," such that about 68% of the values occur within plus or minus one standard deviation (±σ) of the mean. The mean can be any value, as can the variance.

Figure 2 - "Normal" Distribution


Standard Normal Distribution. The "Standard" Normal distribution is a specially scaled version of the ordinary "Normal" distribution. This curve is centered on zero and shaped so that its variance is 1.0; consequently, about 68% of the observations fall between the values −1 and +1:

[5] For the same example, 2, 4, and 6 vary from their average (4) by −2, 0, and 2 respectively, the squared amounts being 4, 0, and 4, respectively. The sum of these squared amounts (8) divided by n − 1 = 2 gives a sample variance of 4. The terms are squared to nullify the effect of negative numbers, since we are interested in just the size of the distance. The sum is divided by n − 1 rather than n to ensure the result more closely approaches the population variance.
[6] Ibid., Section 3-1.

Figure 3 - "Standard" Normal Distribution
The process of converting a normal to a standard normal distribution is called standardization and
involves using a conversion parameter z:
$$z = \frac{t - \mu}{\sigma} \qquad (1)$$
These characteristics are very useful as a "standard," since any "Normal" curve can be scaled to the "Standard Normal" to allow making comparisons and drawing conclusions. [7]
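For instance, if live detection times were Normally distributed with µ = 50 minutes and (purely for illustration) σ = 10 minutes, an observed time of t = 30 minutes would standardize to z = (30 − 50)/10 = −2, that is, two standard deviations below the mean.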
Normal and Standard Normal Probabilities. It turns out that the probability of a value being less than or equal to some value t₀ in the Normal distribution is calculated as the integral of the Normal distribution over the range from the far left edge up to the value t₀. [8]
The value of this integral is mathematically equal to the probability of the standardized variable z being less than or equal to the corresponding standardized value z₀, or:

$$P(t \le t_0) \equiv P(z \le z_0), \qquad z_0 = \frac{t_0 - \mu}{\sigma} \qquad (2)$$
These “cumulative probability” values are tabulated in a number of different texts for various values of z.
It also turns out that the probability of a value being less than some other is equal to one minus the
probability of it being bigger than the other:
$$P(u \le q) = 1 - P(u > q) \qquad (3)$$

Equations (2) and (3) can be used to calculate the probability that some value is between two other values. For example, the probability that t is between a and b is equal to the probability that it is less than or equal to b minus the probability that it is less than or equal to a:

$$P(a \le t \le b) = P(t \le b) - P(t \le a) \qquad (4)$$

Confidence Intervals. So with (2) and (4) we can use the tabulated values for $z_a = \frac{a-\mu}{\sigma}$ and $z_b = \frac{b-\mu}{\sigma}$ to calculate the probability of a ≤ t ≤ b:

$$P(a \le t \le b) = P\left(z \le \frac{b-\mu}{\sigma}\right) - P\left(z \le \frac{a-\mu}{\sigma}\right) = P_{z_b} - P_{z_a} \qquad (5)$$

[7] The standard normal and cumulative standard normal distributions are widely published in tables. Additionally they are readily available in Microsoft Excel with the functions NORMDIST() and NORMSDIST(). The first computes the values for either a normal or standard normal distribution, the latter only for the cumulative standard normal distribution.
[8] The "integral" is simply a very precise way of adding things up. It "integrates" a range of incremental probabilities into a cumulative probability value. In a discrete problem it is represented by the summation symbol, the Greek capital S: Σ. The difference between a summation and an integral is essentially only the size of the samples being added up. So if we add up the incremental probabilities of a number being just so, we get the total probability of a number being less than or equal to the last "just so" value.

where Pza and Pzb are obtained from the Cumulative Standard Normal Distribution tables. What this
means is that for any Normally distributed set of data we can calculate the likelihood of its mean value
being between two arbitrarily chosen values a and b.
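As a minimal sketch of how equation (5) might be evaluated in software rather than from tables (assuming the SciPy library is available; its norm.cdf function stands in for the cumulative standard normal tables, and the values of µ, σ, a, and b below are purely illustrative):

```python
from scipy.stats import norm

# Illustrative values (not from the paper): population mean and standard deviation,
# and the two bounds a and b of interest.
mu, sigma = 50.0, 10.0
a, b = 40.0, 60.0

# Standardize the bounds (equations 1 and 2).
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

# Equation (5): P(a <= t <= b) = P(z <= z_b) - P(z <= z_a),
# with norm.cdf() standing in for the cumulative standard normal tables.
p = norm.cdf(z_b) - norm.cdf(z_a)
print(p)   # about 0.683 for these values (plus or minus one sigma)
```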
Application to Evaluating Simulated Data. We can collect data on a simulation engine for a range of
parameters, collect data on live events for the same parameters, and use the process above to compare
the closeness of the simulation data to the live data. For example, if we have a mean time to detect of 30
minutes in the simulation data and a mean time of 50 minutes from the live data, we can use the
confidence interval to determine whether the simulation mean is within range of the population mean as established by the sampling of live data; that is, whether 30 minutes falls within the 100(1−α)% interval about the sample mean of 50 minutes. If it does not fall in that interval, then we need to ask why not, and what it would take to bring it into that range.
This does not, however, immediately answer the question about sample size needed to reach these
conclusions. At this point the literature gets really arcane.
However, we can simplify it. Recall that the sample and population statistics are related by a normal
distribution for the mean and a Chi-squared distribution for the variance. These relationships allow us, for
a given probability of containment, to calculate a confidence interval for the mean under different
conditions. Specifically (and this is turn-the-crank stuff for an Operations Research / Systems Analysis
[ORSA] person):
Unknown population distribution, known population variance, unknown mean.
a. Select a desired probability of containment P.
b. Calculate the resulting probability of exclusion: α/2 = (1 − P)/2.
c. Calculate the sample mean t̄.
d. Determine the sample size n.

e. Obtain from the Cumulative Standard Normal Distribution tables the value z_{α/2} for that exclusion probability, i.e., the value for which P(z > z_{α/2}) = α/2.
f. Calculate the interval within which the population mean is contained with a probability of P = (1 − α) (a short code sketch follows this list):

$$\bar{t} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{t} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$$
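The steps above might be coded roughly as follows (a minimal sketch, assuming SciPy's norm.ppf in place of the standard normal tables; the sample values and the assumed known σ are illustrative only):

```python
from scipy.stats import norm

# Illustrative live-data sample (detection times in minutes) -- not from the paper.
sample = [48.0, 52.0, 47.0, 55.0, 50.0, 49.0, 53.0, 46.0]
n = len(sample)
sigma = 4.0                      # assumed known population standard deviation

P = 0.98                         # a. desired probability of containment
alpha = 1.0 - P                  # b. probability of exclusion
t_bar = sum(sample) / n          # c./d. sample mean and sample size

# e. z value that leaves alpha/2 in the upper tail (from tables or norm.ppf).
z = norm.ppf(1.0 - alpha / 2.0)

# f. 100(1 - alpha)% confidence interval for the population mean.
half_width = z * sigma / n ** 0.5
lower, upper = t_bar - half_width, t_bar + half_width
print(lower, upper)
```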
Normal population distribution, unknown population mean and variance.
a. Select a desired probability of containment P.
b. Calculate the resulting probability of exclusion: α/2 = (1 − P)/2.
c. Calculate the sample mean t̄ and variance S².
d. Calculate the degrees of freedom = n – 1 from a sample size n.
e. Use t-distribution tables to get the percentage point ω of the t-distribution with n − 1 degrees of freedom and exclusion probability α/2.
f. Use the Chi-squared tables to get the percentage points χ²_{α/2} and χ²_{1−α/2} of the Chi-squared distribution with n − 1 degrees of freedom, for the exclusion probabilities α/2 and (1 − α/2).
g. Calculate the interval within which the population mean is contained with a probability of P = (1 − α) from (note that the sample standard deviation S replaces the unknown σ):

$$\bar{t} - \frac{\omega\, S}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{t} + \frac{\omega\, S}{\sqrt{n}}$$
h. Calculate the interval within which the population variance is contained with a probability of P = (1 − α) from (a short code sketch of steps a–h follows this list):

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$$
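The unknown-variance case can be sketched in the same way, with SciPy's t.ppf and chi2.ppf standing in for the t and Chi-squared tables (again, the sample values are illustrative only):

```python
from scipy.stats import t, chi2

# Illustrative live-data sample (detection times in minutes) -- not from the paper.
sample = [48.0, 52.0, 47.0, 55.0, 50.0, 49.0, 53.0, 46.0]
n = len(sample)
dof = n - 1                                        # d. degrees of freedom

P = 0.98                                           # a. probability of containment
alpha = 1.0 - P                                    # b. probability of exclusion

t_bar = sum(sample) / n                            # c. sample mean
s2 = sum((x - t_bar) ** 2 for x in sample) / dof   # c. sample variance

# e. percentage point of the t-distribution (upper alpha/2 tail).
omega = t.ppf(1.0 - alpha / 2.0, dof)

# g. confidence interval for the population mean (uses S, the sample std deviation).
half_width = omega * s2 ** 0.5 / n ** 0.5
print(t_bar - half_width, t_bar + half_width)

# f./h. Chi-squared percentage points and confidence interval for the variance.
chi2_upper = chi2.ppf(1.0 - alpha / 2.0, dof)   # leaves alpha/2 in the upper tail
chi2_lower = chi2.ppf(alpha / 2.0, dof)         # leaves alpha/2 in the lower tail
print(dof * s2 / chi2_upper, dof * s2 / chi2_lower)
```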

Evaluating the Validity of Simulation Data. We can use these methods to evaluate the validity of the
simulation data. Specifically:
a. Calculate the simulation data statistics t̄ and S².
b. Use standard methods to extrapolate to the population µ and σ².
c. Postulate live sample sizes n and the live sample mean t̄ and variance S².
d. Calculate the population mean and variance confidence intervals as described.
e. If the simulation data mean and variance do not fall in those intervals, then take steps to calibrate the program so that its results reflect the real data (a short sketch of this check follows the list).
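One way this check might look in code (a minimal sketch; the function name and all numbers below are hypothetical, with the interval endpoints taken to be roughly those produced by the sketches above):

```python
def within(value, lower, upper):
    """True if the value falls inside the closed confidence interval."""
    return lower <= value <= upper

# Illustrative simulation statistics (step a) -- hypothetical numbers.
sim_mean, sim_variance = 30.0, 20.0

# Confidence intervals for the live-data population mean and variance (step d),
# i.e. the (lower, upper) pairs produced by the earlier sketches.
mean_ci = (46.7, 53.3)
var_ci = (3.7, 54.9)

# Step e: flag the parameters that need calibration.
if not within(sim_mean, *mean_ci):
    print("Simulation mean outside the live-data confidence interval; recalibrate.")
if not within(sim_variance, *var_ci):
    print("Simulation variance outside the live-data confidence interval; recalibrate.")
```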
Alternative Approaches. There are, of course, many other methods of comparing data. “Curve Fitting”
(regression theory) can be used to devise a closed form equation that describes the data. The equations
can then be compared to evaluate their similarities. Another interesting approach is the Kolmogorov-Smirnov test. [9] The Kolmogorov-Smirnov D is a particularly simple measure: it is defined as the maximum value of the absolute difference between two cumulative distribution functions. This allows direct comparison between two sets of data after constructing a synthetic cumulative distribution function for each set. The appeal of this approach is that no assumptions need be made about the actual distribution of the sample sets; you simply perform an absolute distance test between the curves that represent each set.
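As a sketch of this alternative (assuming SciPy is available, whose ks_2samp function computes D and an associated p-value for two samples; the data below are purely illustrative):

```python
from scipy.stats import ks_2samp

# Illustrative detection-time samples from the simulation and from live events.
simulated = [28.0, 31.0, 29.5, 33.0, 27.0, 30.5, 32.0]
live = [48.0, 52.0, 47.0, 55.0, 50.0, 49.0, 53.0]

# D is the maximum absolute difference between the two empirical CDFs;
# a small p-value suggests the two samples come from different distributions.
result = ks_2samp(simulated, live)
print(result.statistic, result.pvalue)
```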
Conclusion. We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select an acceptable probability of inclusion, e.g., 98%. From that probability of inclusion (P) we can calculate the probability of exclusion (α), and with that and the sample size n calculate the interval (in terms of the experimental average t̄ and variance S²) within which the simulation data must fall to be representative of real life. We can also use a variety of other methods to establish a confidence of similarity between separate data sets.

[9] http://www.ulib.org/webRoot/Books/Numerical_Recipes/bookcpdf/c14-3.pdf
