Simulation Input Data Analysis

Graduate Program in Engineering and Technology Management
Simulation-4
INPUT MODELING
Asl Sencer
STEPS OF INPUT MODELING
1) Collect data from the real system of interest
   - Requires substantial time and effort
   - Use expert opinion when sufficient data are not available
2) Identify a probability distribution to represent the input process
   - Draw frequency distributions and histograms
   - Choose a family of theoretical distributions
3) Estimate the parameters of the selected distribution
4) Apply goodness-of-fit tests to evaluate the chosen distribution and its parameters
   - Chi-square test
   - Kolmogorov-Smirnov test
5) If the tests reject the fit, choose a new theoretical distribution and go to step 3.
   If all theoretical distributions fail, either use an empirical distribution or recollect the data.
STEP 1: DATA COLLECTION INCLUDES LOTS OF DIFFICULTIES
Nonhomogeneous interarrival time distribution: the distribution changes with the time of day, the day of the week, etc. You cannot merge all these data for distribution fitting!
Two arrival processes might be dependent, like the demand for washing machines and dryers. You should not treat them separately!
The start and end of service durations might not be clear. You should split the service into well-defined processes!
Machines may break down randomly. You should collect data for both up times and down times!
STEP 2.1: IDENTIFY THE PROBABILITY DISTRIBUTION
Histogram with Discrete Data

Raw Data (arrivals per period, 100 observations)
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Arrivals per period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1

[Figure: Histogram of Arrivals per Period (x-axis: arrivals per period, 0 to 11; y-axis: frequency)]
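The frequency table above can be rebuilt from the raw data in a few lines. The following is a minimal sketch using only the Python standard library; the function names `frequency_table` and `text_histogram` are illustrative and not part of the original slides.

```python
from collections import Counter

# Raw data from the slide: arrivals per period, 100 observations.
arrivals = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]


def frequency_table(data):
    """Return {value: count} for every integer between min(data) and max(data)."""
    counts = Counter(data)
    return {v: counts.get(v, 0) for v in range(min(data), max(data) + 1)}


def text_histogram(table):
    """Render the frequency table as a simple text histogram."""
    return "\n".join(f"{v:>3} | {'*' * c} ({c})" for v, c in table.items())


table = frequency_table(arrivals)
print(text_histogram(table))
```

The shape of the printed bars (a single peak near 2 to 3 with a tail to the right) is what motivates trying a Poisson model later in the slides.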
STEP 2.1: IDENTIFY THE PROBABILITY DISTRIBUTION
Histogram with Continuous Data

Raw Data (component life in days, 50 observations)
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562

Component Life (days)   Frequency
[0, 3)                  23
[3, 6)                  10
[6, 9)                   5
[9, 12)                  1
[12, 15)                 1
[15, 18)                 2
[18, 21)                 0
[21, 24)                 1
[24, 27)                 1
[27, 30)                 0
[30, 33)                 1
[33, 36)                 1
...                     ...
[42, 45)                 1
...                     ...
[57, 60)                 1
...                     ...
[78, 81)                 1
...                     ...
[144, 147)               1

[Figure: Histogram of Component Life (x-axis: component life in days, class intervals of width 3; y-axis: frequency)]
STEP 2.2: SELECTING THE FAMILY OF DISTRIBUTIONS
The purpose of preparing a histogram is to infer a known pdf or pmf.
This theoretical distribution is used to generate random variates, such as interarrival times and service times, during simulation runs.
Exponential, normal and Poisson distributions are frequently encountered and are not difficult to analyze.
The beta, gamma and Weibull families provide a wider variety of shapes.
Applications of Exponential Distribution
Used to model the time between independent events, like arrivals or breakdowns
Inappropriate for modeling process delay times
Applications of Poisson Distribution
Discrete distribution, used to model the number of independent events occurring per unit time
E.g., batch sizes of customers and items
If the time between successive events is exponential, then the number of events in a fixed time interval is Poisson.
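The link between exponential interarrival times and Poisson counts can be checked by simulation. This is a minimal sketch, assuming a mean interarrival time of 0.5 time units (a made-up value, not from the slides); for a Poisson count the mean and the variance should both be close to (interval length) / (mean interarrival time), here 2.

```python
import random


def count_events_per_interval(mean_interarrival, interval_length, n_intervals, rng):
    """Generate exponential interarrival times and count the events
    that fall into each of n_intervals consecutive fixed-length intervals."""
    counts = [0] * n_intervals
    horizon = interval_length * n_intervals
    t = rng.expovariate(1.0 / mean_interarrival)  # time of the first event
    while t < horizon:
        counts[int(t // interval_length)] += 1
        t += rng.expovariate(1.0 / mean_interarrival)
    return counts


rng = random.Random(42)  # fixed seed so the run is repeatable
counts = count_events_per_interval(
    mean_interarrival=0.5, interval_length=1.0, n_intervals=20000, rng=rng
)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean = {mean_count:.3f}, variance = {var_count:.3f}")
```

A sample mean and variance that nearly coincide are a quick informal indication that a Poisson model is plausible for count data.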
Applications of Beta Distribution
Often used as a rough model in the absence of data
Represents random proportions
A beta sample X on [0, 1] can be transformed into a scaled beta sample on [a, b]:
$Y = a + (b - a)X$
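The scaling $Y = a + (b - a)X$ can be sketched as follows. The shape parameters (2 and 5) and the range [2, 10] are illustrative assumptions, not values from the slides; `random.betavariate` is the standard-library beta generator.

```python
import random


def scaled_beta(a, b, alpha, beta, rng):
    """Draw X ~ Beta(alpha, beta) on [0, 1] and rescale it to [a, b]."""
    x = rng.betavariate(alpha, beta)
    return a + (b - a) * x


rng = random.Random(1)
a, b, alpha, beta = 2.0, 10.0, 2.0, 5.0  # hypothetical task-time range and shapes
samples = [scaled_beta(a, b, alpha, beta, rng) for _ in range(20000)]

sample_mean = sum(samples) / len(samples)
# E[Y] = a + (b - a) * alpha / (alpha + beta)
theoretical_mean = a + (b - a) * alpha / (alpha + beta)
print(f"sample mean = {sample_mean:.3f}, theoretical mean = {theoretical_mean:.3f}")
```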
Applications of Erlang Distribution
Used to represent the time required to complete a task that can be represented as the sum of k exponentially distributed durations.
For large k, the Erlang approaches the normal distribution.
For k = 1, the Erlang is the exponential distribution with rate $1/\beta$, where $\beta$ is the mean of each exponential duration.
Special case of the gamma distribution in which the shape parameter of the gamma distribution, $\alpha$, equals k.
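The "sum of k exponential durations" description translates directly into a generator. This is a minimal sketch with assumed values k = 3 and a phase mean of 2 (not from the slides); the resulting sample should have mean k * phase_mean and variance k * phase_mean^2.

```python
import random


def erlang_sample(k, phase_mean, rng):
    """One Erlang(k) variate: the sum of k independent exponential
    durations, each with mean phase_mean."""
    return sum(rng.expovariate(1.0 / phase_mean) for _ in range(k))


rng = random.Random(7)
k, phase_mean = 3, 2.0  # hypothetical: a task made of 3 phases, 2 time units each
samples = [erlang_sample(k, phase_mean, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"mean = {mean:.3f} (expected {k * phase_mean})")
print(f"variance = {var:.3f} (expected {k * phase_mean ** 2})")
```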
Applications of Gamma Distribution
Used to represent the time required to complete a task
Same as the Erlang distribution when the shape parameter is an integer.
Applications of Johnson Distribution
Its flexible domain, which can be bounded or unbounded, allows it to fit many data sets.
If the shape parameter $\delta > 0$, the domain is bounded
If $\delta < 0$, the domain is unbounded
Applications of Lognormal Distribution
Used to represent quantities that are the product of a large number of random quantities
Used to represent task times that are skewed to the right.
If $X \sim \mathrm{LOGN}(\mu_l, \sigma_l)$, then $\ln X \sim \mathrm{NORM}(\mu, \sigma)$
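The two parameter pairs are related by the standard lognormal moment formulas $\sigma^2 = \ln(1 + \sigma_l^2/\mu_l^2)$ and $\mu = \ln \mu_l - \sigma^2/2$. The sketch below assumes that $(\mu_l, \sigma_l)$ are the mean and standard deviation of X itself and that $(\mu, \sigma)$ are those of ln X; the function names are illustrative, and the numbers 145 and 67.9 are used only as sample inputs.

```python
import math


def lognormal_to_normal_params(mean_l, std_l):
    """Convert the mean/std of a lognormal X into the mean/std of ln X."""
    sigma2 = math.log(1.0 + (std_l / mean_l) ** 2)
    mu = math.log(mean_l) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def normal_to_lognormal_params(mu, sigma):
    """Inverse conversion: mean/std of ln X back to the mean/std of X."""
    mean_l = math.exp(mu + 0.5 * sigma ** 2)
    std_l = mean_l * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean_l, std_l


mu, sigma = lognormal_to_normal_params(145.0, 67.9)
back_mean, back_std = normal_to_lognormal_params(mu, sigma)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"round trip: mean = {back_mean:.3f}, std = {back_std:.3f}")
```

Keeping the two parameterizations apart matters in practice, because simulation packages and statistics libraries do not all use the same one.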
Applications of Weibull Distribution
Widely used in reliability models to represent lifetimes.
If the system consists of a large number of parts that fail independently, the time between successive failures can be Weibull.
Used to model nonnegative task times that are skewed to the left.
It reduces to the exponential distribution when its shape parameter equals 1.
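The reduction to the exponential distribution can be verified numerically from the two cdfs. This sketch assumes the common scale/shape form of the Weibull cdf, $F(x) = 1 - e^{-(x/\text{scale})^{\text{shape}}}$; the scale value 94.7 is only a sample input.

```python
import math


def weibull_cdf(x, scale, shape):
    """Weibull cdf in the scale/shape parameterization, for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x >= 0 else 0.0


def exponential_cdf(x, mean):
    """Exponential cdf with the given mean, for x >= 0."""
    return 1.0 - math.exp(-x / mean) if x >= 0 else 0.0


scale = 94.7
points = [0.0, 10.0, 50.0, 94.7, 300.0]
# With shape = 1 the Weibull cdf should coincide with the exponential cdf.
max_gap = max(abs(weibull_cdf(x, scale, 1.0) - exponential_cdf(x, scale)) for x in points)
print(f"largest difference with shape = 1: {max_gap:.2e}")
```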
Applications of Continuous Empirical Distribution
Used to incorporate empirical data as an alternative to a theoretical distribution when there are multiple modes, significant outliers, etc.
Applications of Discrete Empirical Distribution
Used for discrete assignments such as job type, visitation sequence or batch size
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
A theoretical distribution is specified by its parameters, which are defined on the whole population.
Ex: Let V, W, X, Y, Z be random variables; then
  V ~ N($\mu$, $\sigma^2$), where $\mu$ is the mean and $\sigma^2$ is the variance
  W ~ Poisson($\lambda$), where $\lambda$ is the mean
  X ~ Exponential($\beta$), where $\beta$ is the mean
  Y ~ Triangular(a, m, b), where a, m, b are the minimum, mode and maximum of the data
  Z ~ Uniform(a, b), where a and b are the minimum and maximum of the data
These parameters are estimated by using point estimators defined on the sample data.
The sample mean and the sample variance are the point estimators for the population mean and the population variance.
Let $X_i$, $i = 1, 2, \dots, n$, be iid random variables (the raw data are known); then the sample mean $\bar{X}$ and the sample variance $S^2$ are calculated as

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$

Discrete Raw Data
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Continuous Raw Data
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562
If the data are discrete and have been grouped in a frequency distribution, i.e., the raw data are not known, then

$\bar{X} = \frac{1}{n}\sum_{j=1}^{k} f_j X_j, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$

where k is the number of distinct values of X and $f_j$, $j = 1, 2, \dots, k$, is the observed frequency of the value $X_j$ of X.

Arrivals per period   Frequency     Arrivals per period   Frequency
0                     12            6                     7
1                     10            7                     5
2                     19            8                     5
3                     17            9                     3
4                     10            10                    3
5                      8            11                    1
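Applied to the frequency table above, the grouped formulas can be evaluated as in the following minimal sketch (the function name `grouped_mean_variance` is illustrative). The sample mean it returns, 3.64, is the value used later for the Poisson fit.

```python
# Frequency table from the slide: value of X -> observed frequency f_j.
frequency = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
             6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}


def grouped_mean_variance(freq):
    """Sample mean and sample variance from a {value: frequency} table."""
    n = sum(freq.values())
    sum_fx = sum(f * x for x, f in freq.items())
    sum_fx2 = sum(f * x * x for x, f in freq.items())
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)
    return n, mean, variance


n, mean, variance = grouped_mean_variance(frequency)
print(f"n = {n}, sample mean = {mean:.2f}, sample variance = {variance:.2f}")
```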
If the data are discrete or continuous and have been grouped in class intervals, i.e., the raw data are not known, then

$\bar{X} \approx \frac{1}{n}\sum_{j=1}^{c} f_j m_j, \qquad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$

where $f_j$, $j = 1, 2, \dots, c$, is the observed frequency of the jth class interval and $m_j$ is the midpoint of the jth interval.

Component Life (days)   Frequency
[0, 3)                  23
[3, 6)                  10
[6, 9)                   5
[9, 12)                  1
[12, 15)                 1
[15, 18)                 2
[18, 21)                 0
[21, 24)                 1
[24, 27)                 1
[27, 30)                 0
[30, 33)                 1
[33, 36)                 1
...                     ...
[42, 45)                 1
...                     ...
[57, 60)                 1
...                     ...
[78, 81)                 1
...                     ...
[144, 147)               1
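The midpoint approximation can be sketched in the same way. The code below assumes that every class interval hidden behind "..." in the table has frequency zero, so only the non-empty intervals are listed; the function name is illustrative.

```python
# (lower bound, upper bound, frequency) for the non-empty class intervals.
# Assumption: intervals elided with "..." in the table have frequency 0.
intervals = [
    (0, 3, 23), (3, 6, 10), (6, 9, 5), (9, 12, 1), (12, 15, 1), (15, 18, 2),
    (21, 24, 1), (24, 27, 1), (30, 33, 1), (33, 36, 1),
    (42, 45, 1), (57, 60, 1), (78, 81, 1), (144, 147, 1),
]


def interval_mean_variance(classes):
    """Approximate sample mean and variance from class intervals,
    using the midpoint m_j of each interval in place of the raw values."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lo + hi) / 2.0, f) for lo, hi, f in classes]
    mean = sum(f * m for m, f in midpoints) / n
    variance = (sum(f * m * m for m, f in midpoints) - n * mean ** 2) / (n - 1)
    return n, mean, variance


n, mean, variance = interval_mean_variance(intervals)
print(f"n = {n}, approx. mean = {mean:.2f}, approx. variance = {variance:.2f}")
```

Because each observation is replaced by its class midpoint, these values only approximate the statistics that would be obtained from the raw data.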
The minimum, mode (i.e., the data value with the highest frequency) and maximum of the population data are estimated from the sample data as

$\hat{a} = \min_i X_i, \qquad \hat{m} = X_t, \qquad \hat{b} = \max_i X_i$

where $X_t$ is the data value that has the highest frequency.
STEP 4: GOODNESS OF FIT TEST
Goodness of fit tests (GFTs) provide helpful guidance for evaluating the suitability of the selected input model as a simulation input.
GFTs check the discrepancy between the empirical and the selected theoretical distribution to decide whether or not the sample is taken from that theoretical distribution.
The role of sample size, n:
If n is small, GFTs are unlikely to reject any theoretical distribution, since the discrepancy is attributed to sampling error!
If n is large, GFTs are likely to reject almost all distributions.
STEP 4: GOODNESS OF FIT TESTS
CHI-SQUARE TEST
The chi-square test is valid for large sample sizes and for both discrete and continuous distributional assumptions when the parameters are estimated by maximum likelihood.
Hypothesis test:
Ho: The random variable X conforms to the theoretical distribution with the estimated parameters
Ha: The random variable X does NOT conform to the theoretical distribution with the estimated parameters
We need a test statistic to either reject or fail to reject Ho. This test statistic should measure the discrepancy between the theoretical and the empirical distribution.
If this test statistic is high, then Ho is rejected.
Otherwise we fail to reject Ho (and in practice proceed as if Ho holds).
CHI-SQUARE TEST
Test statistic:
Arrange the n observations into a set of k class intervals or cells.
The test statistic is given by

$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

where $O_i$ is the observed frequency in the ith class interval and $E_i$ is the expected frequency in the ith class interval,

$E_i = n\,p_i$

where $p_i$ is the theoretical probability associated with the ith class, i.e., $p_i$ = P(random variable X belongs to the ith class).
CHI-SQUARE TEST
Recommendations for the number of class intervals for continuous data

Sample Size, n   Number of Class Intervals, k
20               Do not use the chi-square test
50               5 to 10
100              10 to 20
>100             $\sqrt{n}$ to n/5

It is suggested that $E_i \ge 5$. If an expected frequency is smaller, that class should be combined with the adjacent classes. The corresponding $O_i$ values should be combined as well, and k should be reduced by one for every combined cell.
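One possible way to automate the cell-combining rule is a left-to-right greedy merge, sketched below. The greedy order is an assumption of this sketch (other groupings can also satisfy the rule), and the sample inputs are the observed frequencies of the discrete data together with expected frequencies rounded to one decimal.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Greedily merge adjacent cells from left to right until every merged
    cell has an expected frequency of at least min_expected.
    A leftover tail that is still too small is added to the last merged cell."""
    merged_o, merged_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0:  # tail cells that never reached the threshold
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
expected = [2.6, 9.6, 17.4, 21.1, 19.2, 14.0, 8.5, 4.4, 2.0, 0.8, 0.3, 0.1]
obs_m, exp_m = merge_small_cells(observed, expected)
print(len(obs_m), "cells after merging")
for o, e in zip(obs_m, exp_m):
    print(f"O = {o:5.1f}   E = {e:5.1f}")
```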
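CHI-SQUARE TEST
Evaluation
Let $\alpha$ = P(rejecting Ho when it is true) be the significance level, e.g., $\alpha = 0.05$ (5%).
Under Ho, the test statistic $\chi_0^2$ approximately follows the chi-square distribution with $k - s - 1$ degrees of freedom, where s is the number of estimated parameters.
[Figure: chi-square density showing the "Fail to Reject Ho" and "Reject Ho" regions]
If the p-value of the test statistic is less than $\alpha$ (equivalently, if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$), reject Ho and the fitted distribution; otherwise, fail to reject Ho.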
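CHI-SQUARE DISTRIBUTION TABLE
[Table: critical values $\chi^2_{\alpha,\,k-s-1}$ of the chi-square distribution with $k - s - 1$ degrees of freedom]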
STEP 4: GFT - CHI-SQUARE TEST
EX: POISSON DISTRIBUTION
Consider the discrete data we analyzed in step 2.
Ho: number of arrivals, X ~ Poisson($\lambda$ = 3.64)
Ha: otherwise
$\lambda$ is the mean rate of arrivals, estimated by the sample mean, $\hat{\lambda} = 3.64$.
The following probabilities are found by using the pmf:

P(0) = 0.026    P(6) = 0.085
P(1) = 0.096    P(7) = 0.044
P(2) = 0.174    P(8) = 0.020
P(3) = 0.211    P(9) = 0.008
P(4) = 0.192    P(10) = 0.003
P(5) = 0.140    P($\ge$11) = 0.001
EX: POISSON DISTRIBUTION
Calculation of the chi-square test statistic with k - s - 1 = 7 - 1 - 1 = 5 degrees of freedom and $\alpha$ = 0.05.
So, Ho is rejected!
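The calculation behind this conclusion can be reproduced as follows. The grouping into 7 cells (values 0 and 1 merged, values 7 and above merged) is an assumption chosen to match k = 7 and the $E_i \ge 5$ guideline; the critical value 11.07 is the tabulated $\chi^2_{0.05,\,5}$. Because the code uses unrounded Poisson probabilities, its statistic can differ slightly from a hand calculation with the rounded values.

```python
import math

# Observed frequencies of the discrete data (arrivals per period).
observed_by_value = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
                     6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
n = sum(observed_by_value.values())
lam = 3.64  # estimated Poisson mean (sample mean of the data)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Assumed grouping: {0, 1}, {2}, {3}, {4}, {5}, {6} and an open cell "7 or more".
closed_cells = [[0, 1], [2], [3], [4], [5], [6]]
observed = [sum(observed_by_value[v] for v in cell) for cell in closed_cells]
probs = [sum(poisson_pmf(v, lam) for v in cell) for cell in closed_cells]
observed.append(n - sum(observed))  # everything at 7 or above
probs.append(1.0 - sum(probs))      # P(X >= 7)

expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 11.07  # chi-square table value for alpha = 0.05 and 5 degrees of freedom
print(f"chi-square statistic = {chi2:.2f}, critical value = {critical}")
print("Reject Ho" if chi2 > critical else "Fail to reject Ho")
```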
EX: ARENA INPUT ANALYZER

Distribution Summary
Distribution: Normal
Expression: NORM(225, 89)
Square Error: 0.037778

Chi Square Test
Number of intervals = 12
Degrees of freedom = 9
Test Statistic = 1.22e+004
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 27009
Min Data Value = 1
Max Data Value = 1.88e+003
Sample Mean = 225
Sample Std Dev = 89

Histogram Summary
Histogram Range = 0.999 to 1.88e+003
Number of Intervals = 40

Fit all summary
Function      Sq Error
-----------------------
Normal        0.0506
Gamma         0.0625
Beta          0.0639
Erlang        0.0673
Weibull       0.079
Lognormal     0.0926
Exponential   0.286
Triangular    0.311
Uniform       0.36

Reject Normal distribution at 5% significance level!
Distribution: Lognormal
Expression: 2 + LOGN(145, 67.9)
Square Error: 0.000271

Chi Square Test
Number of intervals = 4
Degrees of freedom = 1
Test Statistic = 207

Data Summary
Number of Data Points = 21547
Min Data Value = 2
Max Data Value = 6.01e+003
Sample Mean = 146
Sample Std Dev = 79.5

Histogram Summary
Histogram Range = 2 to 6.01e+003
Number of Intervals = 40

Reject Lognormal distribution at 5% significance level!
Distribution: Weibull
Expression: 0.999 + WEIB(94.7, 0.928)
Square Error: 0.002688

Chi Square Test
Number of intervals = 20
Degrees of freedom = 17
Test Statistic = 838

Data Summary
Number of Data Points = 12418
Min Data Value = 1
Max Data Value = 1.47e+003
Sample Mean = 108
Sample Std Dev = 135

Histogram Summary
Histogram Range = 0.999 to 1.47e+003
Number of Intervals = 40

Reject Weibull distribution at 5% significance level!
DRAWBACKS OF CHI-SQUARE GFT
The chi-square test uses estimates of the parameters obtained from the sample, which decreases the degrees of freedom.
The chi-square test requires continuous data to be placed in class intervals; the choice of these classes is arbitrary and affects the value of the chi-square test statistic.
The distribution of the chi-square test statistic is known only approximately, and the power of the test (the probability of rejecting an incorrect theoretical distribution) is sometimes low.
Hence other GFTs are also needed!
KOLMOGOROV-SMIRNOV TEST
Useful when the sample size is small and when no parameters are estimated from the sample data.
Compares the cdf of the theoretical distribution, $F(x)$, with the empirical cdf, $S_N(x)$, of the sample of N observations.
Hypothesis test:
Ho: Data follow the selected pdf
Ha: Data do NOT follow the selected pdf
Test Statistic:
The largest deviation, D, between $F(x)$ and $S_N(x)$.
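Steps of K-S Test:
1. Rank the data so that $x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$.
2. Calculate the maximum discrepancy D between F and $S_N$, where F is evaluated at the ordered observations, $F(x_{(i)})$, and

$S_N(x_{(i)}) = \frac{\#\{x_j \le x_{(i)}\}}{N} = \frac{i}{N}$

If F is discrete, $D = \max\{D^+, D^-\}$, where

$D^+ = \max_{1 \le i \le N}\{S_N(x_{(i)}) - F(x_{(i)})\} = \max_{1 \le i \le N}\left\{\frac{i}{N} - F(x_{(i)})\right\}$

$D^- = \max_{1 \le i \le N}\{F(x_{(i)}) - S_N(x_{(i-1)})\} = \max_{1 \le i \le N}\left\{F(x_{(i)}) - \frac{i-1}{N}\right\}$

If F is continuous,

$D = \max_{x}\,|F(x) - S_N(x)|$

3. Evaluation
If $D > D_{\alpha,N}$, reject Ho.
If $D \le D_{\alpha,N}$, fail to reject Ho,
where $D_{\alpha,N}$ is the critical value of the K-S statistic for significance level $\alpha$ and sample size N.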
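EXAMPLE: KOLMOGOROV-SMIRNOV TEST
Consider the data:
0.44, 0.81, 0.14, 0.05, 0.93
Ho: Data are uniform between (0,1)
Ha: otherwise

i                              1      2      3      4      5
$x_{(i)}$                      0.05   0.14   0.44   0.81   0.93
$F(x_{(i)}) = x_{(i)}$         0.05   0.14   0.44   0.81   0.93
$S_N(x_{(i)}) = i/N$           0.20   0.40   0.60   0.80   1.00
$i/N - F(x_{(i)})$             0.15   0.26   0.16   -      0.07
$F(x_{(i)}) - (i-1)/N$         0.05   -      0.04   0.21   0.13

(A dash marks a negative difference, which cannot be the maximum.)
Since $D = 0.26 < D_{0.05,5} = 0.565$, Ho is not rejected!
The data are consistent with a uniform distribution between (0,1).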
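The table above can be reproduced with a short function. This is a minimal sketch of the $D^+$ / $D^-$ computation for the five data points under the Uniform(0,1) hypothesis; the function names are illustrative, and the critical value 0.565 is the one quoted in the example.

```python
def ks_statistic(data, cdf):
    """Return (D_plus, D_minus, D) for the K-S test of data against cdf."""
    xs = sorted(data)
    n = len(xs)
    # enumerate() is zero-based, so rank i corresponds to index i - 1.
    d_plus = max((idx + 1) / n - cdf(x) for idx, x in enumerate(xs))
    d_minus = max(cdf(x) - idx / n for idx, x in enumerate(xs))
    return d_plus, d_minus, max(d_plus, d_minus)


def uniform01_cdf(x):
    """cdf of the Uniform(0, 1) distribution."""
    return min(max(x, 0.0), 1.0)


data = [0.44, 0.81, 0.14, 0.05, 0.93]
d_plus, d_minus, d = ks_statistic(data, uniform01_cdf)

critical = 0.565  # D_{0.05, 5} as quoted in the example
print(f"D+ = {d_plus:.2f}, D- = {d_minus:.2f}, D = {d:.2f}")
print("Reject Ho" if d > critical else "Fail to reject Ho")
```

Passing a different `cdf` function allows the same routine to be used for any fully specified continuous distribution.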
