
Graduate Program in Engineering and Technology Management
Simulation-4

INPUT MODELING
Aslı Sencer
STEPS OF INPUT MODELING
1) Collect data from the real system of interest
   - Requires substantial time and effort
   - Use expert opinion when there are not enough data
2) Identify a probability distribution to represent the input process
   - Draw frequency distributions and histograms
   - Choose a family of theoretical distributions
3) Estimate the parameters of the selected distribution
4) Apply goodness-of-fit tests to evaluate the chosen distribution and the parameters
   - Chi-square test
   - Kolmogorov-Smirnov test
5) If the tests reject the fit, choose a new theoretical distribution and go to step 3! If all theoretical distributions fail, then either use an empirical distribution or recollect data.
STEP 1: DATA COLLECTION INCLUDES LOTS OF DIFFICULTIES

Nonhomogeneous interarrival time distribution: the distribution changes with the time of day, the day of the week, etc. You can't merge all these data for distribution fitting!
Two arrival processes might be dependent, like the demand for washing machines and dryers. You shouldn't treat them separately!
The start and end of service durations might not be clear. You should split the service into well-defined processes!
Machines may break down randomly. You should collect data for up and down times!
STEP 2.1: IDENTIFY THE PROBABILITY DISTRIBUTION
Histogram with Discrete Data

Raw Data
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Arrivals per period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1

[Figure: Histogram of Arrivals per Period; x-axis: arrivals per period (0 to 11), y-axis: frequency]
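The frequency table for discrete data can be rebuilt from the raw observations by tallying each distinct value. The following is a minimal sketch in Python (the language choice and the variable names `arrivals` and `frequency` are assumptions made for illustration, not part of the slides):

```python
from collections import Counter

# Raw data from the slide: arrivals observed in each of 100 periods.
arrivals = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]

# Tally how often each distinct value occurs.
frequency = Counter(arrivals)

# Text histogram: one row per value from 0 up to the largest observation.
for value in range(max(arrivals) + 1):
    count = frequency[value]  # Counter returns 0 for values never observed
    print(f"{value:>2}  {count:>2}  {'*' * count}")
```

The row of asterisks is only a rough stand-in for the plotted histogram; its shape (a peak at 2 to 3 arrivals with a long right tail) is what suggests a candidate family.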
STEP 2.1: IDENTIFY THE PROBABILITY DISTRIBUTION
Histogram with Continuous Data

Raw Data
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562

Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                    5
[9-12)                   1
[12-15)                  1
[15-18)                  2
[18-21)                  0
[21-24)                  1
[24-27)                  1
[27-30)                  0
[30-33)                  1
[33-36)                  1
...                     ...
[42-45)                  1
...                     ...
[57-60)                  1
...                     ...
[78-81)                  1
...                     ...
[144-147)                1

[Figure: Histogram of Component Life; x-axis: component life in days (class upper bounds 3 to 36), y-axis: frequency]
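For continuous data the observations first have to be placed in class intervals. The sketch below bins the raw component lives into intervals of width 3 days, as on the slide; the names `component_life`, `WIDTH` and `bin_counts` are assumptions for illustration. Tallies computed from the raw values as printed may differ by a count or two from the slide's table in some classes, so treat the output as a check rather than a replacement.

```python
import math
from collections import Counter

# Raw data from the slide: 50 component lives in days.
component_life = [
    79.919, 3.081, 0.062, 1.961, 5.845,
    3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009,
    18.387, 0.141, 43.565, 24.420, 0.433,
    144.695, 2.663, 17.967, 0.091, 9.003,
    0.941, 0.878, 3.148, 2.157, 7.579,
    0.624, 5.380, 3.371, 7.078, 23.960,
    0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219,
    3.217, 14.382, 1.008, 2.336, 4.562,
]

WIDTH = 3  # class width in days, as on the slide

# Key j stands for the class interval [j * WIDTH, (j + 1) * WIDTH).
bin_counts = Counter(math.floor(x / WIDTH) for x in component_life)

# Print only the non-empty classes, in increasing order.
for j in range(max(bin_counts) + 1):
    if bin_counts[j]:
        print(f"[{j * WIDTH}-{(j + 1) * WIDTH})  {bin_counts[j]}")
```

Changing `WIDTH` shows how strongly the apparent shape depends on the chosen class width, which is one reason the number of classes matters later in the chi-square test.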
STEP 2.2: SELECTING THE FAMILY OF DISTRIBUTIONS

The purpose of preparing a histogram is to infer a known pdf or pmf.
This theoretical distribution is used to generate random variates, such as interarrival times and service times, during simulation runs.
The exponential, normal, and Poisson distributions are frequently encountered and are not difficult to analyze.
The beta, gamma, and Weibull families, in turn, provide a wide variety of shapes.
Applications of Exponential Distribution

Used to model the time between independent events, like arrivals or breakdowns.
Inappropriate for modeling process delay times.
Applications of Poisson Distribution

Discrete distribution, used to model the number of independent events occurring per unit time.
E.g., batch sizes of customers and items.
If the time between successive events is exponential, then the number of events in a fixed time interval is Poisson.
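The link between exponential interarrival times and Poisson counts can be checked with a small simulation. This is only a sketch under assumed values: the rate of 2 events per unit time, the number of periods and the seed are arbitrary choices, not taken from the slides.

```python
import random

rate = 2.0          # assumed event rate per unit time (illustrative)
n_periods = 20_000  # assumed number of unit-length periods to simulate

rng = random.Random(12345)  # fixed seed so the run is repeatable

# Generate event times by summing exponential interarrival times,
# and count how many events land in each unit-length period.
counts = [0] * n_periods
t = rng.expovariate(rate)
while t < n_periods:
    counts[int(t)] += 1          # event falls in period [int(t), int(t) + 1)
    t += rng.expovariate(rate)

# For a Poisson count, the mean and the variance both equal the rate.
mean_count = sum(counts) / n_periods
var_count = sum((c - mean_count) ** 2 for c in counts) / (n_periods - 1)
print(f"mean = {mean_count:.3f}, variance = {var_count:.3f}, rate = {rate}")
```

A sample mean and variance that are both close to the rate are consistent with Poisson counts; this is a sanity check, not a formal goodness-of-fit test.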
Applications of Beta Distribution

Often used as a rough model in the absence of data.
Represents random proportions.
A beta sample X on [0, 1] can be transformed into a scaled beta sample on [a, b]:
Y = a + (b - a)X
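The scaling Y = a + (b - a)X can be sketched in a few lines. The helper name `scaled_beta`, the shape parameters and the range [10, 30] are hypothetical values chosen for illustration; `random.betavariate` is the standard-library beta generator on [0, 1].

```python
import random

def scaled_beta(a, b, alpha, beta, rng):
    """Draw Y = a + (b - a) * X, where X ~ Beta(alpha, beta) on [0, 1]."""
    x = rng.betavariate(alpha, beta)
    return a + (b - a) * x

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical task time known only to lie between 10 and 30 minutes.
samples = [scaled_beta(10.0, 30.0, 2.0, 5.0, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Every draw stays inside [a, b], which is what makes the scaled beta convenient as a rough bounded model when little data is available.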
Applications of Erlang Distribution

Used to represent the time required to complete a task that can be represented as the sum of k exponentially distributed durations.
For large k, the Erlang distribution approaches the normal distribution.
For k = 1, the Erlang distribution is the exponential distribution with rate $1/\beta$, where $\beta$ is the mean of each exponential duration.
Special case of the gamma distribution in which the shape parameter, $\alpha$, is the integer k.
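Because an Erlang variate is a sum of k exponential durations, it can be generated directly from that definition. The sketch below assumes k = 3 stages with a mean of 2 time units each; the function name `erlang_variate` and these values are illustrative only.

```python
import random

def erlang_variate(k, stage_mean, rng):
    """Sum of k independent exponential durations, each with mean stage_mean."""
    return sum(rng.expovariate(1.0 / stage_mean) for _ in range(k))

rng = random.Random(2024)  # fixed seed so the run is repeatable
k, stage_mean = 3, 2.0     # assumed number of stages and mean per stage

samples = [erlang_variate(k, stage_mean, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)

# The theoretical mean of the sum is k * stage_mean.
print(f"sample mean = {sample_mean:.3f}, theoretical mean = {k * stage_mean}")
```

Setting `k = 1` in the same function gives plain exponential variates, matching the special case stated above.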
Applications of Gamma Distribution

Used to represent the time required to complete a task.
Same as the Erlang distribution when the shape parameter is an integer.
Applications of Johnson Distribution
Its flexible domain, which can be bounded or unbounded, allows it to fit many data sets.
If $\delta > 0$, the domain is bounded.
If $\delta < 0$, the domain is unbounded.
Applications of Lognormal Distribution
Used to represent a quantity that is the product of a large number of random quantities.
Used to represent task times that are skewed to the right.
If X ~ LOGN($\mu_l$, $\sigma_l$), then ln X ~ NORM($\mu$, $\sigma$).
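The two parameter pairs of the lognormal are related by closed-form expressions. The sketch below assumes that $\mu_l$ and $\sigma_l$ are the mean and standard deviation of X itself, while $\mu$ and $\sigma$ are those of ln X; the function names and the numerical inputs are illustrative assumptions.

```python
import math

def lognormal_to_normal(mean_l, std_l):
    """Mean/std of X  ->  mean/std of ln X, for lognormal X."""
    sigma2 = math.log(1.0 + (std_l / mean_l) ** 2)
    mu = math.log(mean_l) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def normal_to_lognormal(mu, sigma):
    """Mean/std of ln X  ->  mean/std of X, for lognormal X."""
    mean_l = math.exp(mu + 0.5 * sigma ** 2)
    std_l = mean_l * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean_l, std_l

# Illustrative values only.
mu, sigma = lognormal_to_normal(145.0, 67.9)
back_mean, back_std = normal_to_lognormal(mu, sigma)
print(mu, sigma, back_mean, back_std)
```

The round trip recovers the original pair, which is a quick way to confirm which parameterization a given tool expects before entering fitted values.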
Applications of Weibull Distribution

Widely used in reliability models to represent lifetimes.
If the system consists of a large number of parts that fail independently, the time between successive failures can be Weibull.
Used to model nonnegative task times that are skewed to the left.
It reduces to the exponential distribution when the shape parameter $\alpha = 1$.
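The reduction of the Weibull to the exponential can be verified numerically by comparing the two cdfs. In this sketch the function names, the scale of 2.0 and the evaluation points are assumptions chosen for illustration.

```python
import math

def weibull_cdf(x, scale, shape):
    """F(x) = 1 - exp(-(x / scale) ** shape) for x > 0, else 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def exponential_cdf(x, mean):
    """F(x) = 1 - exp(-x / mean) for x > 0, else 0."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

points = [0.1, 0.5, 1.0, 2.5, 10.0]

# With shape = 1, the Weibull cdf should match the exponential cdf
# whose mean equals the Weibull scale parameter.
max_gap = max(
    abs(weibull_cdf(x, 2.0, 1.0) - exponential_cdf(x, 2.0)) for x in points
)
print(max_gap)
```

Repeating the comparison with a shape other than 1 gives a nonzero gap, showing how the shape parameter moves the Weibull away from the exponential.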
Applications of Continuous Empirical Distribution

Used to incorporate empirical data as an alternative to a theoretical distribution when there are multiple modes, significant outliers, etc.
Applications of Discrete Empirical Distribution

Used for discrete assignments such as job type, visitation sequence, or batch size.
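Sampling from a discrete empirical distribution amounts to drawing a category with its observed relative frequency. A minimal sketch, in which the job types and their probabilities are hypothetical:

```python
import random

# Hypothetical job types with hypothetical observed proportions.
job_types = ["A", "B", "C"]
probabilities = [0.5, 0.3, 0.2]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw 10,000 job types according to the empirical proportions.
draws = rng.choices(job_types, weights=probabilities, k=10_000)

share_a = draws.count("A") / len(draws)
print(f"share of type A = {share_a:.3f}")
```

The observed share of each type in a long run should be close to its input proportion.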
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
A theoretical distribution is specified by its parameters, which are defined on the whole population.
Ex: Let V, W, X, Y, Z be random variables; then
V ~ N($\mu$, $\sigma^2$), where $\mu$ is the mean and $\sigma^2$ is the variance.
W ~ Poisson($\lambda$), where $\lambda$ is the mean.
X ~ Exponential($\beta$), where $\beta$ is the mean.
Y ~ Triangular(a, m, b), where a, m, b are the minimum, mode, and maximum of the data.
Z ~ Uniform(a, b), where a and b are the minimum and maximum of the data.

These parameters are estimated by using point estimators defined on the sample data.
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
The sample mean and the sample variance are the point estimators for the population mean and the population variance.
Let $X_i$, $i = 1, 2, \ldots, n$, be iid random variables (the raw data are known); then the sample mean $\bar{X}$ and the sample variance $s^2$ are calculated as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$

Discrete Raw Data
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Continuous Raw Data
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562
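The point estimators can be computed directly from the raw observations. The sketch below applies the usual sample mean and the (n - 1)-denominator sample variance to the discrete raw data from the slides; the variable names are assumptions for illustration.

```python
# Discrete raw data from the slide: arrivals in each of 100 periods.
arrivals = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]

n = len(arrivals)

# Sample mean: sum of observations divided by n.
sample_mean = sum(arrivals) / n

# Sample variance: (sum of squares - n * mean^2) / (n - 1).
sample_var = (sum(x * x for x in arrivals) - n * sample_mean ** 2) / (n - 1)

print(f"n = {n}, mean = {sample_mean:.2f}, variance = {sample_var:.2f}")
# prints: n = 100, mean = 3.64, variance = 7.63
```

The mean of 3.64 is the value used as the Poisson parameter estimate in the chi-square example later in these slides.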
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
If the data are discrete and have been grouped in a frequency distribution, i.e., the raw data are not known, then

$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad s^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$

where k is the number of distinct values of X and $f_j$, $j = 1, 2, \ldots, k$, is the observed frequency of the value $X_j$ of X.

Arrivals per period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1
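As a worked check of the grouped-data estimators, the sums can be evaluated from the frequency table of arrivals per period (n = 100, k = 12). The arithmetic below uses only the frequencies listed in the table and the usual (n - 1)-denominator sample variance:

```latex
\begin{align*}
\sum_{j=1}^{k} f_j X_j   &= 0(12) + 1(10) + 2(19) + 3(17) + 4(10) + 5(8) \\
                         &\quad + 6(7) + 7(5) + 8(5) + 9(3) + 10(3) + 11(1) = 364 \\
\sum_{j=1}^{k} f_j X_j^2 &= 0 + 10 + 76 + 153 + 160 + 200 + 252 + 245 + 320 + 243 + 300 + 121 = 2080 \\
\bar{X} &= \frac{364}{100} = 3.64 \\
s^2     &= \frac{2080 - 100\,(3.64)^2}{99} = \frac{755.04}{99} \approx 7.63
\end{align*}
```

These values agree with what the raw data give, as expected when the grouping keeps every distinct value.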
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
If the data are discrete or continuous and have been grouped in class intervals, i.e., the raw data are not known, then

$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad s^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$, $j = 1, 2, \ldots, c$, is the observed frequency of the jth class interval and $m_j$ is the midpoint of the jth interval.

Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                    5
[9-12)                   1
[12-15)                  1
[15-18)                  2
[18-21)                  0
[21-24)                  1
[24-27)                  1
[27-30)                  0
[30-33)                  1
[33-36)                  1
...                     ...
[42-45)                  1
...                     ...
[57-60)                  1
...                     ...
[78-81)                  1
...                     ...
[144-147)                1
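With class intervals, each observation is approximated by the midpoint of its class. The sketch below applies the midpoint formulas to the component-life table. It assumes that the classes elided with "..." have zero frequency (the listed frequencies already sum to 50); the variable names are illustrative.

```python
# (lower bound of class interval, observed frequency), from the slide's table.
# Assumption: classes elided with "..." on the slide have zero frequency.
classes = [
    (0, 23), (3, 10), (6, 5), (9, 1), (12, 1), (15, 2), (18, 0), (21, 1),
    (24, 1), (27, 0), (30, 1), (33, 1), (42, 1), (57, 1), (78, 1), (144, 1),
]
WIDTH = 3  # class width in days

n = sum(f for _, f in classes)

# Approximate every observation in a class by the class midpoint m_j.
sum_fm = sum(f * (lo + WIDTH / 2) for lo, f in classes)
sum_fm2 = sum(f * (lo + WIDTH / 2) ** 2 for lo, f in classes)

approx_mean = sum_fm / n
approx_var = (sum_fm2 - n * approx_mean ** 2) / (n - 1)

print(f"n = {n}, mean ~ {approx_mean:.2f}, variance ~ {approx_var:.2f}")
```

Because midpoints replace the actual values, these are approximations; they generally differ somewhat from the estimates computed on the raw data.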
STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION

The minimum, mode (i.e., the data value with the highest frequency), and maximum of the population are estimated from the sample data as

$$\hat{a} = \min_{1 \le i \le n} X_i, \qquad \hat{m} = X_t, \qquad \hat{b} = \max_{1 \le i \le n} X_i$$

where $X_t$ is the data value that has the highest frequency.
STEP 4: GOODNESS OF FIT TEST

Goodness of fit tests (GFTs) provide helpful guidance for evaluating the suitability of the selected input model as a simulation input.
GFTs check the discrepancy between the empirical and the selected theoretical distribution to decide whether or not the sample is taken from that theoretical distribution.
The role of sample size, n:
- If n is small, GFTs are unlikely to reject any theoretical distribution, since the discrepancy is attributed to sampling error!
- If n is large, GFTs are likely to reject almost all distributions.
STEP 4: GOODNESS OF FIT TESTS
CHI-SQUARE TEST
The chi-square test is valid for large sample sizes, for both discrete and continuous distributional assumptions, when parameters are estimated by maximum likelihood.

Hypothesis test:
Ho: The random variable X conforms to the theoretical distribution with the estimated parameters
Ha: The random variable X does NOT conform to the theoretical distribution with the estimated parameters

We need a test statistic to either reject or fail to reject Ho. This test statistic should measure the discrepancy between the theoretical and the empirical distribution.
If this test statistic is large, then Ho is rejected.
Otherwise we fail to reject Ho! (We then proceed with the fitted distribution, although failing to reject Ho does not prove it true.)
STEP 4: GOODNESS OF FIT TESTS
CHI-SQUARE TEST
Test statistic:
Arrange the n observations into a set of k class intervals or cells.
The test statistic is given by

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency in the ith class interval and $E_i$ is the expected frequency in the ith class interval,

$$E_i = n p_i$$

where $p_i$ is the theoretical probability associated with the ith class, i.e., $p_i$ = P(random variable X belongs to the ith class).
STEP 4: GOODNESS OF FIT TESTS
CHI-SQUARE TEST
Recommendations for the number of class intervals for continuous data

Sample Size, n   Number of Class Intervals, k
20               Do not use chi-square test
50               5 to 10
100              10 to 20
>100             $\sqrt{n}$ to n/5

It is suggested that $E_i \ge 5$ for every class. If an $E_i$ is smaller, that class should be combined with the adjacent classes. The corresponding $O_i$ values should be combined as well, and k should be reduced by one for every combined cell.
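The cell-combining rule can be sketched as a simple left-to-right merge. The helper `merge_small_cells` and its greedy strategy are assumptions made for illustration (other merging orders are possible). The observed frequencies and rounded Poisson probabilities are those of the Poisson example in these slides, with n = 100.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Greedily combine adjacent cells until each has E >= min_expected."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:       # cell is large enough, close it
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0:                       # leftover tail is too small on its own
        if merged_e:
            merged_o[-1] += acc_o       # fold it into the last closed cell
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e

# Observed frequencies and expected frequencies (100 * rounded probability)
# for 0, 1, ..., 10 and 11-or-more arrivals per period.
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
expected = [2.6, 9.6, 17.4, 21.1, 19.2, 14.0, 8.5, 4.4, 2.0, 0.8, 0.3, 0.1]

merged_o, merged_e = merge_small_cells(observed, expected)
print(len(merged_o), merged_o)
```

For these inputs the merge leaves k = 7 cells: {0, 1}, 2, 3, 4, 5, 6, and {7 or more}.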
STEP 4: GOODNESS OF FIT TESTS
CHI-SQUARE TEST
Evaluation
Let $\alpha$ = P(rejecting Ho when it is true) be the significance level, e.g., 5%.

Under Ho, $\chi_0^2$ approximately follows the chi-square distribution with k-s-1 degrees of freedom, where s is the number of estimated parameters.

[Figure: chi-square density showing the "Fail to Reject Ho" and "Reject Ho" regions]

If the p-value of the test statistic is less than $\alpha$ (equivalently, if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$), reject Ho and the distribution; otherwise, fail to reject Ho.
CHI-SQUARE DISTRIBUTION TABLE

[Table: critical values $\chi^2_{\alpha,\,k-s-1}$ of the chi-square distribution]
STEP 4: GFT - CHI-SQUARE TEST
EX: POISSON DISTRIBUTION
Consider the discrete data we analyzed in step 2.
Ho: number of arrivals, X ~ Poisson($\lambda = 3.64$)
Ha: otherwise
$\lambda$ is the mean rate of arrivals, estimated by the sample mean: $\lambda = 3.64$.
The following probabilities are found by using the pmf

$$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

P(0)=0.026   P(6)=0.085
P(1)=0.096   P(7)=0.044
P(2)=0.174   P(8)=0.020
P(3)=0.211   P(9)=0.008
P(4)=0.192   P(10)=0.003
P(5)=0.140   P(≥11)=0.001
STEP 4: GFT - CHI-SQUARE TEST
EX: POISSON DISTRIBUTION
Calculation of the chi-square test statistic with k-s-1 = 7-1-1 = 5 degrees of freedom and $\alpha = 0.05$.

So, Ho is rejected!
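The calculation behind this conclusion can be sketched as follows. The code recomputes the Poisson probabilities from the pmf rather than using the rounded values, so the statistic may differ slightly from a hand calculation. The cell grouping ({0, 1}, 2, 3, 4, 5, 6, {7 or more}) is an assumption consistent with k = 7, and the critical value 11.07 is the standard chi-square table entry for $\alpha = 0.05$ and 5 degrees of freedom.

```python
import math

lam = 3.64  # estimated Poisson mean from the slide
n = 100     # number of observed periods

# Observed frequencies for 0, 1, ..., 11 arrivals per period (from the slide).
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Assumed cells after merging so that every expected count is at least 5.
# (lo, hi) is an inclusive range of values; hi = None means "lo or more".
cells = [(0, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, None)]

chi_square = 0.0
for lo, hi in cells:
    if hi is None:
        o = sum(observed[lo:])
        p = 1.0 - sum(poisson_pmf(x, lam) for x in range(lo))
    else:
        o = sum(observed[lo:hi + 1])
        p = sum(poisson_pmf(x, lam) for x in range(lo, hi + 1))
    e = n * p                       # expected frequency E_i = n * p_i
    chi_square += (o - e) ** 2 / e  # contribution (O_i - E_i)^2 / E_i

dof = len(cells) - 1 - 1  # k - s - 1, with s = 1 estimated parameter
critical = 11.07          # chi-square table value for alpha = 0.05, 5 dof
reject = chi_square > critical

print(f"chi-square = {chi_square:.2f}, dof = {dof}, reject Ho: {reject}")
```

The largest contributions come from the first and last cells, i.e., the data contain more very small and very large counts than a Poisson distribution with this mean would predict.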
STEP 4: GFT - CHI-SQUARE TEST
EX: ARENA INPUT ANALYZER
Distribution Summary
Distribution: Normal
Expression: NORM(225, 89)
Square Error: 0.037778

Chi Square Test
Number of intervals = 12
Degrees of freedom = 9
Test Statistic = 1.22e+004
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 27009
Min Data Value = 1
Max Data Value = 1.88e+003
Sample Mean = 225
Sample Std Dev = 89

Histogram Summary
Histogram Range = 0.999 to 1.88e+003
Number of Intervals = 40

Fit All Summary
Function      Sq Error
-----------------------
Normal        0.0506
Gamma         0.0625
Beta          0.0639
Erlang        0.0673
Weibull       0.079
Lognormal     0.0926
Exponential   0.286
Triangular    0.311
Uniform       0.36

Reject the Normal distribution at the 5% significance level!
STEP 4: GFT - CHI-SQUARE TEST
EX: ARENA INPUT ANALYZER
Distribution Summary
Distribution: Lognormal
Expression: 2 + LOGN(145, 67.9)
Square Error: 0.000271

Chi Square Test
Number of intervals = 4
Degrees of freedom = 1
Test Statistic = 207
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 21547
Min Data Value = 2
Max Data Value = 6.01e+003
Sample Mean = 146
Sample Std Dev = 79.5

Histogram Summary
Histogram Range = 2 to 6.01e+003
Number of Intervals = 40

Reject the Lognormal distribution at the 5% significance level!
STEP 4: GFT - CHI-SQUARE TEST
EX: ARENA INPUT ANALYZER
Distribution Summary
Distribution: Weibull
Expression: 0.999 + WEIB(94.7, 0.928)
Square Error: 0.002688

Chi Square Test
Number of intervals = 20
Degrees of freedom = 17
Test Statistic = 838
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 12418
Min Data Value = 1
Max Data Value = 1.47e+003
Sample Mean = 108
Sample Std Dev = 135

Histogram Summary
Histogram Range = 0.999 to 1.47e+003
Number of Intervals = 40

Reject the Weibull distribution at the 5% significance level!
STEP 4: GOODNESS OF FIT TESTS
DRAWBACKS OF CHI-SQUARE GFT
The chi-square test uses parameter estimates obtained from the sample, which decreases the degrees of freedom.
The chi-square test requires continuous data to be placed in class intervals; the choice of classes is arbitrary and affects the value of the chi-square test statistic.
The distribution of the chi-square test statistic is known only approximately, and the power of the test (the probability of rejecting an incorrect theoretical distribution) is sometimes low.
Hence other GFTs are also needed!
STEP 4: GOODNESS OF FIT TESTS
KOLMOGOROV-SMIRNOV TEST
Useful when the sample size is small and when no parameters are estimated from the sample data.
Compares the cdf of the theoretical distribution, $F(x)$, with the empirical cdf, $S_N(x)$, of the sample of N observations.

Hypothesis test:
Ho: Data follow the selected pdf
Ha: Data do NOT follow the selected pdf

Test Statistic:
The largest deviation, D, between $F(x)$ and $S_N(x)$.
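STEP 4: GOODNESS OF FIT TESTS
KOLMOGOROV-SMIRNOV TEST
Steps of K-S Test:

1. Rank the data so that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$.

2. Calculate the maximum discrepancy D between $F$ and $S_N$, where

$$F(x_{(i)}) = P(X \le x_{(i)})$$

$$S_N(x_{(i)}) = \frac{\#\{x_j \le x_{(i)}\}}{N} = \frac{i}{N}$$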
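STEP 4: GOODNESS OF FIT TESTS
KOLMOGOROV-SMIRNOV TEST

If F is discrete, $D = \max\{D^+, D^-\}$, where

$$D^+ = \max_{1 \le i \le N}\left\{S_N(x_{(i)}) - F(x_{(i)})\right\} = \max_{1 \le i \le N}\left\{\frac{i}{N} - F(x_{(i)})\right\}$$

$$D^- = \max_{1 \le i \le N}\left\{F(x_{(i)}) - S_N(x_{(i-1)})\right\} = \max_{1 \le i \le N}\left\{F(x_{(i)}) - \frac{i-1}{N}\right\}$$

If F is continuous,

$$D = \max_{x}\left|F(x) - S_N(x)\right|$$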
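STEP 4: GOODNESS OF FIT TESTS
KOLMOGOROV-SMIRNOV TEST
3. Evaluation
If $D > D_{\alpha,N}$, reject Ho.
If $D \le D_{\alpha,N}$, fail to reject Ho.
Here $D_{\alpha,N}$ is the critical value of the K-S statistic at significance level $\alpha$ for sample size N.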
STEP 4: GOODNESS OF FIT TESTS
EXAMPLE: KOLMOGOROV-SMIRNOV TEST
Consider the data:
0.44, 0.81, 0.14, 0.05, 0.93

Ho: Data are uniform between (0,1)
Ha: otherwise

i                        1     2     3     4     5
x_(i)                    0.05  0.14  0.44  0.81  0.93
F(x_(i)) = x_(i)         0.05  0.14  0.44  0.81  0.93
i/N                      0.20  0.40  0.60  0.80  1.00
i/N - F(x_(i))           0.15  0.26  0.16  -     0.07
F(x_(i)) - (i-1)/N       0.05  -     0.04  0.21  0.13

Since D = 0.26 < $D_{0.05,5}$ = 0.565, Ho is not rejected!
The data can be treated as uniform between (0,1).
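The example can be reproduced with a few lines of code. This is a sketch only: the function names `uniform_cdf` and `ks_statistic` are assumptions for illustration, and the critical value 0.565 is the one quoted on the slide for $\alpha = 0.05$ and N = 5.

```python
data = [0.44, 0.81, 0.14, 0.05, 0.93]

def uniform_cdf(x):
    """cdf of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)

def ks_statistic(sample, cdf):
    """Return (D, D+, D-) for the sample against the hypothesized cdf."""
    xs = sorted(sample)  # step 1: rank the data
    n = len(xs)
    # D+ = max over i of i/N - F(x_(i)), with i = 1..N (enumerate is 0-based)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D- = max over i of F(x_(i)) - (i - 1)/N
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus, d_minus

d, d_plus, d_minus = ks_statistic(data, uniform_cdf)

critical = 0.565  # D_{0.05,5} as quoted on the slide
reject = d > critical

print(f"D+ = {d_plus:.2f}, D- = {d_minus:.2f}, D = {d:.2f}, reject Ho: {reject}")
# prints: D+ = 0.26, D- = 0.21, D = 0.26, reject Ho: False
```

Passing a different cdf function to `ks_statistic` tests the same sample against another fully specified distribution.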
