
Elsevier Editorial System(tm) for Expert Systems With Applications

Manuscript Draft

Manuscript Number: ESWA-D-08-00701


Title: An Ordinal Optimization Theory Based Algorithm for a Class of Simulation Optimization Problems and
Application
Article Type: Full Length Article
Keywords: ordinal optimization, stochastic simulation optimization, artificial neural network, genetic algorithm,
wafer probe testing
Corresponding Author: Assistant Professor Shih-Cheng Horng, Ph.D.
Corresponding Author's Institution: Chaoyang University of Technology
First Author: Shih-Cheng Horng, Ph.D.
Order of Authors: Shih-Cheng Horng, Ph.D.; Shieh-Shing Lin, Ph.D.

Cover Letter

Dear Editor,
We would like to submit the enclosed manuscript entitled "An Ordinal Optimization Theory Based Algorithm for a Class of Simulation Optimization Problems and Application," which we wish to be considered for publication in Expert Systems with Applications.
Correspondence and phone calls about the paper should be directed to Shih-Cheng Horng at the following address, phone and fax number, and e-mail address:

Institute: Department of Computer Science & Information Engineering, Chaoyang University of Technology
Address: 168 Jifong E. Rd., Wufong Township, Taichung County, 41349, Taiwan, R.O.C.
Phone: +886-4-23323000 ext 7633
Fax: +886-4-23742375
e-mail: schong.ece90g@nctu.edu.tw
Thank you very much for considering our manuscript for potential publication. I am looking forward to hearing from you soon.
Sincerely yours,
Shih-Cheng Horng


An Ordinal Optimization Theory Based Algorithm for a Class of Simulation Optimization Problems and Application

Shih-Cheng Horng (schong@cyut.edu.tw) and Shieh-Shing Lin (sslin@mail.sju.edu.tw)

Submitted to Expert Systems with Applications as a REGULAR PAPER

Correspondent: Assistant Professor Shih-Cheng Horng
Institute: Department of Computer Science & Information Engineering, Chaoyang University of Technology
Address: 168 Jifong E. Rd., Wufong Township, Taichung County, 41349, Taiwan, R.O.C.
Phone: +886-4-23323000 ext 7801
Fax: +886-4-23742375
e-mail: schong@cyut.edu.tw

Shih-Cheng Horng is currently an assistant professor of the Department of Computer Science and
Information Engineering at Chaoyang University of Technology, Taiwan, R.O.C. Shieh-Shing Lin is
now a professor of the Department of Electrical Engineering at St. John's University, Taiwan, R.O.C.
This work was partially supported by National Science Council in Taiwan, R.O.C. under Grant
NSC96-2622-E-129-005-CC3.

Abstract
In this paper, we propose an ordinal optimization theory based two-stage algorithm to solve for a good enough solution of the stochastic simulation optimization problem with a huge input-variable space Θ. In the first stage, we construct a crude but effective model for the considered problem based on an artificial neural network. This crude model is then used as a fitness-function evaluation tool in a genetic algorithm to select N excellent settings from Θ. In the second stage, starting from the selected N excellent settings, we proceed with the existing goal softening search procedures to search for a good enough solution of the considered problem.
We applied the proposed algorithm to the reduction of overkills and retests in a wafer probe testing process, which is formulated as a stochastic simulation optimization problem whose huge input-variable space is formed by the vector of threshold values in the testing process. The vector of good enough threshold values obtained by the proposed algorithm is promising in the aspects of solution quality and computational efficiency. We also justify the performance of the proposed algorithm in the wafer probe testing process based on the ordinal optimization theory.
Key Words: ordinal optimization, stochastic simulation optimization, artificial neural
network, genetic algorithm, wafer probe testing.

1. Introduction

Simulation optimization problems can be viewed as optimization problems of a system whose outputs can only be evaluated by simulations (Fu et al., 2005). Thus, the objective of simulation optimization is to find the optimal settings of the input variables of the simulated system that bring the output variables to their best or optimal conditions. Various methods have been developed for this purpose, such as the Gradient Search based methods (Nocedal & Wright, 2006; Kim, 2006), the Stochastic Approximation methods (Theiler & Alper, 2006; Spall, 2003), the Sample Path methods (Hunt, 2005), the Response Surface methods (Myers et al., 2004), and Heuristic search methods. These methods have been thoroughly discussed in (April et al., 2003). Among them, the Heuristic search methods, including the Genetic Algorithm (GA) (Haupt & Haupt, 2004), the Simulated Annealing (SA) method (Suman & Kumar, 2006), and the Tabu Search (TS) method (Hedar & Fukushima, 2006), are frequently used in simulation optimization (Blum & Roli, 2003; Tekin & Sabuncuoglu, 2004). According to an empirical comparison of these algorithms (Lacksonen, 2001), GA showed the capacity to robustly solve large problems and outperformed the others in solving a wide variety of simulation problems. Despite the success of several applications of the above heuristic methods (Ahmed, 2007; Fattahi et al., 2007), many technical hurdles and barriers to broader application remain, as indicated in (Dréo et al., 2006). Chief among these is speed: using simulation to evaluate the output variables for a given setting of the input variables is already computationally expensive, let alone searching for the best setting when the input-variable space is huge. Furthermore, simulation often faces situations where variability is an integral part of the problem, so stochastic noise further complicates the simulation optimization problem. The purpose of this paper is to resolve this challenging stochastic simulation optimization problem effectively.
The considered stochastic simulation optimization problem is stated in the following:

    min_{θ∈Θ} J(θ)                                                    (1)

where Θ is the input-variable space, θ denotes a setting of the input variables, and J(θ) is the objective function, which may be an expected output or a function of expected outputs of the simulated system. To cope with the computational complexity of this problem, we will employ the Ordinal Optimization (OO) theory based goal softening strategy (Lau & Ho, 1997; Ho, 1999), which seeks a good enough solution with high probability instead of searching for the best for sure, based on the expectation that the performance order of the input-variable settings is likely to be preserved even when evaluated by a crude model. A crude model is defined as a model that is tolerant of a large modeling noise. From here on, we will use the word setting to represent the setting of input variables.
The basic idea of the OO theory based goal softening strategy is to reduce the search space gradually, and its existing search procedures can be summarized as follows (Lau & Ho, 1997): (i) Uniformly select N, say 1000, settings from Θ. (ii) Evaluate and order the N settings using a crude model of the considered problem, then pick the top s, say 50, settings to form the Selected Subset (SS), which is the estimated Good Enough Subset (GS). A Good Enough Subset is defined as the subset consisting of the top n% solutions in the input-variable space. (iii) Evaluate and order all the s settings in SS using the exact model, then pick the top k (≥ 1) settings. In OO theory (Lau & Ho, 1997), the model noise is used to describe the degree of roughness of the crude model. The OO theory has shown that for N = 1000 in (i) and a crude model with significant noise in (ii), the top setting (i.e., k = 1) selected in (iii) with s = 50 must belong to the GS with probability 0.95, where GS represents the collection of the top 5% actually good enough settings among the N. This means the actual top setting in SS selected in (iii) is among the actual top 5% of the N settings with probability 0.95. However, the good enough solution of problem (1) that we are searching for should be a good enough setting in Θ instead of the N settings, unless Θ is as small as

N (Chen et al., 1999; Ho et al., 2007). As indicated in a recent paper by Lin and Ho (Lin & Ho, 2002), under a moderate modeling noise, the top 3.5% of the uniformly selected N settings will be among the top 5% settings of a huge Θ with a very high probability (≥ 0.99), and in the best case they can be among the top 3.5% settings of Θ provided that there is no modeling error. However, for a Θ of size 10^30, a top 3.5% setting is a setting among the top 3.5 × 10^28 ones. This hardly seems to be a good enough solution in the sense of practical optimization; it is acceptable only when Θ consists of lots of good settings, so that even if the performance order of the selected setting is not practically good enough, the corresponding objective value is. As a matter of fact, most practical stochastic simulation optimization problems do not have lots of good settings; otherwise, finding a good enough solution would not be difficult. Therefore, to apply the existing goal softening search procedures, we need to develop a new scheme to select N excellent settings from Θ to replace (i), so as to ensure that the final selected setting is a good enough solution of (1) from the practical viewpoint.
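For illustration, steps (i)-(iii) can be sketched on a toy one-dimensional problem; the quadratic objective, the noise level of the crude model, and the sizes N = 1000 and s = 50 below are illustrative assumptions, not part of the formulation above:

```python
import random

random.seed(0)

def exact_J(theta):
    # toy exact objective (illustrative); the true minimizer is theta = 0.3
    return (theta - 0.3) ** 2

def crude_J(theta):
    # crude model = exact objective plus a large estimation noise
    return exact_J(theta) + random.gauss(0.0, 0.1)

# (i) uniformly select N settings from Theta = [0, 1]
N, s = 1000, 50
settings = [random.random() for _ in range(N)]

# (ii) order by the crude model; the top-s form the Selected Subset (SS)
SS = sorted(settings, key=crude_J)[:s]

# (iii) order SS by the exact model and keep the top k = 1 setting
best = min(SS, key=exact_J)
```

Despite the heavy noise in step (ii), the ordinal comparison tends to keep genuinely good settings inside SS, which is the point of goal softening.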
Heuristic methods for obtaining N excellent settings may depend on how good one's knowledge about the considered system is. For instance, in the optimal power flow problems with discrete control variables, Lin et al. proposed an algorithm based on the OO theory and engineering intuition to select N excellent discrete control vectors (Lin et al., 2004). However, engineering intuition may work only for specific systems. Thus, in this paper, we will propose an OO theory based systematic approach to select N excellent settings from Θ and combine it with the existing goal softening search procedures to find a good enough solution of (1). The presentation of this OO theory based two-stage algorithm to solve (1) for a good enough solution is a novel approach in the area of simulation optimization and is one of the contributions of this paper.
Reducing overkills and retests is an important issue in the semiconductor wafer probe testing process. Taking the chip demand into account, we have formulated this problem as a stochastic simulation optimization problem, which possesses a huge input-variable space and is most suitable for demonstrating the validity of the proposed OO theory based two-stage algorithm. This novel formulation, as well as the novel solution methodology for this important and practical stochastic optimization problem, is another contribution of this paper.
We organize our paper in the following manner. In Section 2, we will describe the OO
theory based two-stage approach and present the proposed two-stage algorithm. In Section 3,
we will introduce the stochastic optimization problem of reducing overkills and retests in
semiconductor wafer probe testing process and present the application of the proposed
algorithm. In Section 4, we will show the test results of applying the proposed algorithm on a
real case and demonstrate the solution quality and the computational efficiency by comparing
with a vast number of randomly generated solutions and competing methods, respectively.
We have also justified the performance of the proposed algorithm in a wafer probe testing
process based on the ordinal optimization theory. Finally, we will make a conclusion in
Section 5.
2. The OO Theory Based Two-Stage Approach

Apparently, the optimization problem (1) is a stochastic simulation optimization problem with a huge discrete input-variable space Θ. However, to evaluate the true objective value of a setting θ, we would need to perform a stochastic simulation with infinitely many test samples for that θ. Although infinitely many test samples would make the objective value of (1) stable, this is practically impossible. Thus, a sufficiently large number of test samples is used in place of infinitely many test samples to make the objective value of (1), J(θ), sufficiently stable.
The proposed OO theory based approach consists of two stages to solve (1) for a good enough setting. The first stage is an exploration stage. In this stage, we employ a Genetic Algorithm (GA) to search through Θ, using an off-line trained Artificial Neural Network (ANN) as a crude model for fitness evaluation, and select N (= 1024) excellent settings. The heuristic choice of N (= 1024) is based on the OO theory (Lau & Ho, 1997). The second stage is an exploitation stage to find a good enough setting from the N settings obtained in the first stage, using more refined crude models. A more refined crude model is defined as a model that is tolerant of only a small modeling noise. If we used the exact model to evaluate all the N settings, we could obtain the best setting among the N, however at the cost of too much computation time, which is against our objective. Therefore, we divide the second stage into multiple subphases. The more refined crude models for estimating J(θ) of a setting θ employed in these subphases are stochastic simulations of various lengths, ranging from very short (crude model) to very long (exact model). The candidate solution set in each subphase (i.e., the estimated good enough subset resulting from the previous subphase) is reduced gradually. In the last subphase, we use the exact model to evaluate all the settings in the most updated candidate solution set, and the one with the smallest J(θ) is the good enough setting that we seek. Therefore, the computational complexity can be drastically decreased, because the size of the candidate solution set has been largely reduced by the time the crude model becomes more refined. In the following, we present the details of the OO theory based two-stage approach.
2.1 The First Stage Approach
Since the order of settings is relatively immune to the effects of estimation noise, the performance order of the settings is likely to be preserved even when evaluated using a crude model. Thus, to select N excellent settings from Θ without consuming much computation time, we need to construct a crude but effective model to evaluate the objective value J(θ) for a given setting θ, and use a selection scheme to select N excellent settings. Our crude model is constructed based on an ANN (Graupe, 2007), and our selection scheme is a GA (Haupt & Haupt, 2004).

2.1.1 The Artificial Neural Network (ANN) Based Model
An ANN is considered to be a universal function approximator due to its generic and convenient ability to model complicated nonlinear input-output relationships. Considering the inputs and outputs as the settings θ and the corresponding objective values J(θ), respectively, we can use an ANN to implement the mapping from the inputs to the outputs (Graupe, 2007). To construct such an ANN, first of all, we select a training data set by randomly sampling M settings without replacement from Θ. The formula to calculate the number of random samples (RS) for a given input-variable space Θ is as follows (Moore & McCabe, 1999):

    RS = [p(1 − p)(z/CI)^2] / (1 + (1/|Θ|)·[p(1 − p)(z/CI)^2])        (2)

where z is 1.96 and 2.57 for the 95% and 99% confidence levels, respectively; p is the percentage picking a choice, for which 0.5 is used when calculating the sample size; and CI is the confidence interval, expressed as a decimal. The confidence level is the estimated probability that a population estimate lies within a given margin of error. The confidence interval measures the precision with which an estimate from a single sample approximates the population value. Considering an input-variable space Θ with |Θ| = 10^30, the number of random samples determined by (2) is 16641 for a confidence level of 99% and a confidence interval of 1%.
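As a sanity check, (2) can be evaluated directly; the sketch below is a minimal implementation. Note that the quoted figure of 16641 corresponds to z ≈ 2.58, another commonly tabulated value for the 99% level:

```python
def sample_size(z, ci, pop, p=0.5):
    """Number of random samples per Eq. (2), with finite-population correction."""
    ss = p * (1 - p) * (z / ci) ** 2   # uncorrected sample size
    return ss / (1 + ss / pop)         # the correction is negligible for huge pop

# |Theta| = 10^30, 99% confidence level, 1% confidence interval
rs = sample_size(z=2.58, ci=0.01, pop=10**30)
print(round(rs))  # -> 16641
```

For a population as large as 10^30 the finite-population correction term is numerically invisible, so the sample size is driven entirely by z and CI.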
We then evaluate the objective values of these M = 16641 settings using an exact model, which can be a stochastic simulation with a sufficiently large number of test samples, as indicated in (Chen et al., 1999). These collected M input-output pairs (θ, J(θ)) are used to train the ANN, i.e., to adjust its arc weights. Once the ANN is trained, we can input any setting θ to obtain an estimate of the corresponding J(θ) from the output of the ANN; in this manner, we avoid an accurate but lengthy stochastic simulation to evaluate J(θ) for a given θ. This forms our crude model to roughly estimate the objective value of (1) for a given setting θ. The effectiveness of this crude model is justified by the OO theory as mentioned above, because what we care about here is the relative order of the θ's, not the values of the J(θ)'s.
2.1.2 The Genetic Algorithm (GA)
GA is a stochastic search algorithm based on the mechanisms of natural selection and natural genetics. With the aid of the above effective objective-value (or, in GA terminology, fitness-value) evaluation model, we can select N excellent settings from Θ using a GA, which is briefly described as follows. Assuming an initial random population has been produced and evaluated, genetic evolution takes place by means of three basic genetic operators: (a) parent selection; (b) crossover; (c) mutation. A chromosome in GA terminology represents a setting in our problem, and each chromosome is encoded as a string of 0s and 1s. Parent selection is a simple procedure whereby two chromosomes are selected from the parent population based on their fitness values. Solutions with high fitness values have a high probability of contributing new offspring to the next generation. The selection rule we use in our approach is simple roulette-wheel selection. Crossover is an extremely important operator for the GA. It is responsible for structure recombination (information exchange between mating chromosomes) and for the convergence speed of the GA, and it is usually applied with a relatively high probability, say 0.7. The chromosomes of the two selected parents are combined to form new chromosomes that inherit segments of information stored in the parent chromosomes. There are many crossover schemes; we employ single-point crossover in our approach. While crossover is the main genetic operator exploring the information included in the current generation, it does not produce new information. Mutation is the operator responsible for the injection of new information. With a small probability, random bits of the offspring chromosomes flip from 0 to 1 and vice versa, giving new characteristics that do not exist in the parent chromosomes. In our approach, the mutation operator is applied with a relatively small probability of 0.02 to every bit of the chromosome.

There are two criteria for the convergence of the GA. One is when the fitness value of the best chromosome does not improve from the previous generation, and the other is when enough generations have evolved. The initial population of the GA employed in our first stage approach consists of I, say 5000, randomly selected settings from Θ. After the applied GA converges, we rank the final generation of these I chromosomes based on their fitness values and pick the top N chromosomes, which form the N excellent settings that we look for.
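For illustration, the GA described above can be sketched as follows; the bit-string length, population size, generation count, and the quadratic stand-in playing the role of the trained ANN fitness model are illustrative assumptions:

```python
import random

random.seed(1)
BITS, POP, GENS, PC, PM = 16, 60, 40, 0.7, 0.02  # illustrative GA parameters

def fitness(bits):
    # stand-in for the ANN crude model: decode the bit string to [0, 1] and score
    x = int("".join(map(str, bits)), 2) / (2 ** BITS - 1)
    return 1.0 - (x - 0.3) ** 2  # higher is better; peak near x = 0.3

def roulette(pop, fits):
    # simple roulette-wheel parent selection
    r = random.uniform(0, sum(fits))
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    fits = [fitness(ind) for ind in pop]
    nxt = []
    while len(nxt) < POP:
        p1, p2 = roulette(pop, fits), roulette(pop, fits)
        if random.random() < PC:          # single-point crossover
            cut = random.randint(1, BITS - 1)
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        # bit-wise mutation with probability PM
        nxt += [[b ^ (random.random() < PM) for b in c] for c in (p1, p2)]
    pop = nxt[:POP]

best = max(pop, key=fitness)
```

In the first stage proper, the final population would be ranked by fitness and the top N chromosomes kept, rather than only the single best individual.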
2.2 The Second Stage Approach
Starting from the selected N excellent settings, in the second stage we proceed directly with step (ii) of the existing goal softening search procedures described in Section 1. In this stage, we evaluate the objective value of each setting using a more refined model than the crude one employed in the first stage. This more refined model uses stochastic simulations of various lengths (i.e., numbers of test samples) L. We let L_s = 100000 represent a sufficiently large L. In the sequel, we define the exact model of (1) as the case when the simulation length L = L_s. For simplicity of expression, we let J_s(θ) denote the objective value of a setting θ computed by the exact model, i.e., with L = L_s.
First, we define a basic simulation length L_0 = 500. We set the simulation length of subphase i, denoted by L_i, to be L_i = kL_{i-1} (i.e., L_i = k^i L_0), i = 1, 2, ..., where the positive integer k (≥ 2) is the parameter controlling the simulation length L_i. We let N_1 = N and set the size of the selected estimated good enough subset in subphase i to be N_i = N_{i-1}/k (i.e., N_i = N_1/k^{i-1}), i = 2, 3, .... We let n_k denote the total number of subphases, and n_k is determined by

    n_k = arg min { n : L_0 k^{n-1} ≤ L_s ≤ L_0 k^n  or  1 ≤ N_n ≤ 10 },

where L_s = 100000. The above formula determines n_k to be the minimum n satisfying either of the following: (i) the simulation length L_0 k^n reaches the length of the exact model, L_s; or (ii) the size of the selected estimated good enough subset resulting from subphase n is small enough, i.e., 1 ≤ N_n ≤ 10. Once n_k is determined, we set L_{n_k} = L_s, which implies that in the last subphase (i.e., subphase n_k), the crude model is in fact the exact model of (1), and the setting with the smallest J(θ) is the good enough setting that we seek. If k were very large, so that L_1 = kL_0 ≥ L_s, there would be only one subphase, and each of the N settings would be evaluated by the exact model, which would consume too much computation time even though the resulting setting would be exactly the best among the N. However, it is not easy to quantify the tradeoff between the computation time and the goodness of the obtained good enough setting in an analytical formula. In fact, the best k is really problem dependent, because some problems care more about computation time and others about the goodness of the obtained solution. Therefore, we will show the computation time and the goodness of the obtained good enough solution of our problem for various k in Section 4.
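The subphase schedule implied by these definitions can be computed directly; the following sketch uses the values L_0 = 500, L_s = 100000, and N = 1024 from the text:

```python
def subphase_schedule(k, L0=500, Ls=100000, N=1024):
    """Simulation lengths L_i and candidate-set sizes N_i per Section 2.2."""
    n = 1
    # n_k: smallest n with L0*k^(n-1) <= Ls <= L0*k^n, or 1 <= N_n <= 10
    while not (L0 * k ** (n - 1) <= Ls <= L0 * k ** n
               or 1 <= N / k ** (n - 1) <= 10):
        n += 1
    lengths = [min(k ** i * L0, Ls) for i in range(1, n + 1)]
    lengths[-1] = Ls                      # last subphase uses the exact model
    sizes = [N // k ** (i - 1) for i in range(1, n + 1)]
    return lengths, sizes

lengths, sizes = subphase_schedule(k=2)
```

For k = 2 this yields eight subphases, with candidate sets shrinking 1024, 512, ..., 8 while the simulation length doubles from 1000 up to the exact-model length 100000.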
2.3 The Two-Stage Algorithm
Now, our OO theory based two-stage algorithm can be stated as follows.
Step 1: Randomly select M settings θ from Θ. Compute the corresponding J_s(θ) for each θ using simulation length L_s. Train an ANN by adjusting its vector of arc weights using the obtained M input-output pairs, i.e., the M pairs (θ, J_s(θ)). Let f(θ) denote the functional output of the trained ANN.
Step 2: Randomly select I settings from Θ as the initial population. Apply a GA with the following setup: simple roulette-wheel selection, single-point crossover with probability p_c, and mutation probability p_m, to these chromosomes with the aid of the fitness-value evaluation model 1/f(θ). After the algorithm converges, rank all the final I chromosomes based on their fitness values and select the best N chromosomes (i.e., θ's).

Steps 1 and 2 constitute the first stage approach.
Step 3: For i = 1, ..., n_k − 1, use stochastic simulation with simulation length L_i = k^i L_0 to estimate J(θ) for the candidate N/k^{i-1} θ's; rank these candidates based on their estimated J(θ) and select the best N/k^i θ's as the candidate solution set for subphase i + 1.
Step 4: Use stochastic simulation with simulation length L_s to compute J_s(θ) for the candidate N/k^{n_k−1} θ's. The θ with the smallest J_s(θ) is the good enough setting that we look for.
Steps 3 and 4 represent the procedures of the second stage approach.
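For illustration, Steps 3 and 4 amount to the following elimination loop on a toy problem; the noisy simulator and the choice k = 2 are illustrative stand-ins for a real stochastic simulation:

```python
import random

random.seed(2)

def simulate_J(theta, length):
    """Toy stochastic simulation of J(theta): noise shrinks as the length L grows."""
    return (theta - 0.3) ** 2 + random.gauss(0.0, 1.0 / length ** 0.5)

N, k, L0, Ls = 1024, 2, 500, 100000
candidates = [random.random() for _ in range(N)]  # stand-in for Step 2 output

L = k * L0
while len(candidates) > k and L < Ls:             # Step 3: refine and shrink
    candidates.sort(key=lambda t: simulate_J(t, L))
    candidates = candidates[: len(candidates) // k]
    L *= k

best = min(candidates, key=lambda t: simulate_J(t, Ls))  # Step 4: exact model
```

Each pass spends a longer simulation budget on a smaller candidate set, so most of the expensive, long simulations are reserved for the few settings that survive the cheap, short ones.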
3. Application to Reduction of Overkills in Wafer Probe Testing Process

3.1 Wafer Probe Testing Process


The wafer fabrication process is a sequence of hundreds of different process steps, which results in an unavoidable variability accumulated from the small variations of each process step. Chips are tested multiple times throughout the design and manufacturing process to ensure the integrity of the chip design and the quality of the manufacturing process. Thus, to avoid incurring the significant expense of assembling and packaging chips that do not meet specifications, wafer probing in the manufacturing process becomes an essential step for identifying flaws early. The primary components of a wafer probe testing system include probes, a probe card, a probe station, and test equipment. Wafer probing establishes a temporary electrical contact between the test equipment and each individual die (or chip) on a wafer to determine the goodness of the die. In general, an 8-inch wafer may consist of 500 to 15000 dies, and each die is a chip of integrated circuits. Although there exist techniques such as statistical methods and machine learning methods (Chen et al., 2003; Barnett et al., 2005) for monitoring the operations of the wafer probes, probing errors may still occur in many aspects and cause some good dies to be overkilled; consequently, the profit is diminished. Figure 1 shows the Cause-and-Effect diagram of overkills.
Thus, reducing the number of overkills is always one of the main objectives in the wafer probe testing process. The key tool to identify or save overkills is the retest, which is an additional wafer probing. However, retests are a major factor in decreasing the throughput. Thus, overkills and retests are inherently conflicting factors, because reducing the former gains more profit, but at the expense of increasing the latter, which degrades the throughput and increases the cost. This implies that drawing a fine line for deciding whether to go for a retest to save possible overkills is an important research issue in this optimization problem of the wafer probe testing process. Considering the economic situation regarding the throughput requirement, it is most beneficial to use the trade-off method (Collette & Siarry, 2003) to solve the current problem, that is, to minimize the overkills subject to a tolerable level of retests provided by the decision maker.
[Figure 1 here: a Cause-and-Effect (fishbone) diagram of overkills, with branches for Probe Station, Tester, Method (Setup, Test Program), Probes, Probe card, Device, Material, Operator, Customer request, Engineering mistake, and Others.]

Figure 1: Cause-and-Effect diagram of overkills.


Testing procedures may differ among chip manufacturers. After the wafer probing, a bin number is used to label each bad die on the wafer. A bin number denotes a classification of circuitry-defect failure in a die. The bin numbers go from 1 to a certain number defined by engineers. But no matter what testing procedures are used, the decision on carrying out the retest should be based on whether the number of good dies and the number of bins in a wafer exceed the corresponding threshold values. Thus, determining these threshold values so as to minimize the overkills under a tolerable level of retests is the main theme of the optimization problem considered here. Furthermore, since the goodness of a die and the probing errors are of a stochastic nature, the considered problem becomes a stochastic simulation optimization problem. Thus, this computationally intractable problem is most suitable for the application of our OO theory based two-stage algorithm to seek good enough threshold values.
3.2 Problem Statement and Mathematical Formulation
In this section, we employ typical testing procedures used by a renowned wafer foundry in Taiwan, which are briefly described in the following.
For every wafer, the wafer probing is performed twice. The second probing applies only to those dies that failed in the first one. A die is considered to be good if it is good in either probing. We let w_i (w̄_i) denote the number of good (bad) dies in wafer i, and let B_ij denote the number of dies in bin j of wafer i. Assuming there are J types of bins in a wafer, then w̄_i = Σ_{j=1}^{J} B_ij and w_i = TD_i − w̄_i, where TD_i denotes the total number of dies in wafer i. Following the two wafer probings, a two-stage check on the number of good dies is performed to determine the necessity of carrying out a retest, i.e., an additional wafer probing. We let W_min denote the threshold value on the number of good dies in a wafer for deciding whether to pass or hold the wafer, and we let b_{j max}, j = 1, ..., J, denote the threshold value on the number of dies of bin j in a held wafer for deciding whether to perform a retest. The mechanism of the two-stage check can be summarized as follows. If w_i ≥ W_min, we pass wafer i; otherwise, we hold this wafer and check its bins. For each held wafer, if B_ij > b_{j max}, we perform retests on all dies of bin j to check whether there are probing errors that cause overkills. This particular class of policies for deciding on retests based on threshold values is commonly practiced in wafer fabrication processes.
Thus, the relationship between the inputs and the outputs of the considered problem can be described as in Figure 2, in which W_min and b_{j max}, j = 1, ..., J, are the input variables, V̄ and R̄ are the output variables, and the tested wafers are part of the testing procedures. Here

    V̄ = (1/L) Σ_{i=1}^{L} V_i   and   R̄ = (1/L) Σ_{i=1}^{L} R_i

represent the average overkills and retests per wafer, respectively, in which V_i and R_i denote the number of overkills and retests in wafer i, respectively, and L denotes the total number of tested wafers as shown in Figure 2.

[Figure 2 here: the input variables W_min and b_{j max}, j = 1, ..., J, feed the wafer probe testing procedures applied to L tested wafers, which produce the output variables V̄ and R̄.]

Figure 2: Relationship between the inputs and the outputs of wafer probe testing procedures.
Details of the testing procedures for a wafer are shown in the flow chart of Figure 3, in which the calculations of the numbers of overkills and retests are also included. For the purpose of simulation, we randomly generate B_ij based on a Poisson probability distribution with mean λ_j to represent the results of the two wafer probings, which are not performed in the computer simulation and are thus shown in the dashed-line box in Figure 3. Once B_ij is generated, we randomly generate the number of overkills in B_ij, denoted by v_ij^o, based on a Poisson probability distribution with mean γ_j B_ij, where γ_j is the proportional coefficient for bin j. The number of overkills in a bin is, in general, proportional to the number of dies in that bin; that is, the former will be smaller provided that the latter is smaller. The values of λ_j and γ_j can be found from the real manufacturing data.

[Figure 3 here: flow chart of the testing procedures for wafer i. Two wafer probings generate B_ij and the corresponding v_ij^o for all J bin types, from which w̄_i = Σ_{j=1}^{J} B_ij and w_i = TD_i − w̄_i are calculated. If w_i ≥ W_min, wafer i is passed with V_i = Σ_{j=1}^{J} v_ij^o and R_i = 0. Otherwise, each bin j is checked in turn: if B_ij ≤ b_{j max}, bin j is passed with v_ij = v_ij^o and r_ij = 0; if B_ij > b_{j max}, retests are performed on all dies of bin j, giving v_ij = 0 and r_ij = B_ij. After all J bins are checked, V_i = Σ_{j=1}^{J} v_ij and R_i = Σ_{j=1}^{J} r_ij, and the procedure moves to the next wafer.]

Figure 3: Flow chart of the wafer probe testing procedures.



In contrast to v_ij^o, we let v_ij denote the number of overkills for bin j of wafer i after completing the testing procedures, and let r_ij denote the corresponding number of retests. In these testing procedures, although we may pass a wafer when the threshold-value test succeeds, there may still be overkills. As indicated in Figure 3, for a passed wafer i, the number of overkills is V_i = Σ_{j=1}^{J} v_ij^o and the number of retests is R_i = 0. The same logic applies to a passed bin j of a held wafer i, for which v_ij = v_ij^o and r_ij = 0. However, for any retested bin, the probability of any unidentified overkill is extremely small, because the dies have been probed three times, including the two wafer probings before the retest. Thus, for any retested bin j, we have v_ij = 0 and r_ij = B_ij, as indicated in Figure 3. The resulting values of V_i and R_i for wafer i shown in Figure 3 are used to calculate V̄ and R̄.
From Figure 3, we see that if we increase W_min while decreasing the b_{j max}, there will be more retests and fewer overkills. Thus, to reduce overkills under a tolerable level of retests, we set minimizing the average number of overkills per wafer, V̄, as our objective, while keeping the average number of retests per wafer, R̄, under a satisfactory level. Using the trade-off method (Collette & Siarry, 2003), this optimization problem can be formulated as the following constrained stochastic simulation optimization problem:

    min_{x∈X} V̄ = (1/L) Σ_{i=1}^{L} V_i
    subject to R̄ = (1/L) Σ_{i=1}^{L} R_i ≤ r_T,                      (3)

where x = [W_min, b_{j max}, j = 1, ..., J] denotes the vector of threshold values, that is, the vector of input variables; X denotes the input-variable space; and r_T denotes the tolerable average number of retests per wafer.

16

Remark 1: The value of r_T is determined by the decision maker based on the economic situation. When chip demand is weak, throughput is generally not critical in the manufacturing process; therefore, we can allow a larger r_T so as to avoid more overkills and gain more profit. On the other hand, if chip demand is strong, then throughput is more important, and we should set r_T smaller. Taking the chip demand into account is a distinguishing feature of our formulation.

This constrained stochastic simulation optimization problem (3) is to find an optimal vector of threshold values, x, that minimizes V subject to the employed testing procedures and the constraint on R. We can therefore use a penalty function to transform (3) into the following unconstrained stochastic simulation optimization problem:

    min_{x ∈ X}  F(x) = V + P(R − r_T)(R − r_T),                (4)

where P(R − r_T) denotes a continuous penalty function for the constraint R ≤ r_T, such that P(R − r_T) > 0 for R > r_T and P(R − r_T) = 0 for R ≤ r_T.
3.3 Application of the Two-Stage Algorithm
The stochastic simulation optimization problem (4) has the same form as (1), with x playing the role of the decision vector, X the decision space, and F(x) = V + P(R − r_T)(R − r_T) the objective function J(·). The size of the input-variable space X is huge; for example, an 8-inch wafer consists of a typical number of 588 dies, so the possible ranges of the integer values W_min and b_j^max are both [1, 588]. Consequently, for a typical number of J = 10 bin types, the size of X exceeds 2.9 × 10^30. This stochastic simulation optimization problem (4) is thus well suited to our two-stage algorithm.
3.3.1 Applying Step 1
To apply Step 1 of the two-stage algorithm to problem (4), we first need to construct the crude model based on ANNs, which consists of two parts: (A) collecting the training data set, and (B) training the ANNs. We employ two three-layer feed-forward back-propagation ANNs (Graupe, 2007). Assuming there are J types of bins in a wafer, J + 1, 2(J + 1), and 1 neurons are used in the input, hidden, and output layers, respectively. The activation functions of the neurons in the hidden and output layers are the hyperbolic tangent sigmoid and linear functions, respectively. The inputs of both ANNs are x ∈ X; for the outputs, one ANN produces the corresponding V and the other R. We obtain the training data for the two ANNs in the following two steps. (a) Narrow down the input-variable space X by excluding irrational threshold values, and denote the reduced input-variable space by X′. In general, the yield rate and the statistical distribution of the count of each bin for typical products can be collected from a wafer foundry. Thus the threshold values W_min and b_j^max should lie in a reasonable range determined from the corresponding mean values of w_i and B_ij, respectively. (b) Randomly select M = 16641 vectors from X′ and compute the corresponding outputs V and R using a stochastic simulation over a large number of test wafers (Chen et al., 1999), that is, by performing the simulations of the testing procedures shown in Figure 3 for L_s = 100000 wafers. This constitutes part (A) of constructing the crude model.
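For concreteness, the simulation of the testing procedures of Figure 3 used to generate the training outputs can be sketched as follows. This is a minimal sketch under assumed distributions (Poisson bin counts, a fixed overkill rate per failed die); the function names and distributional choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_wafer(w_min, b_max, bin_means, overkill_rate=0.03, n_dies=206):
    """Simulate the testing procedure of Figure 3 for one wafer.

    Returns (V_i, R_i): overkills and retests for this wafer.  Bin counts
    B_ij are drawn Poisson(bin_means[j]); the overkills before retest
    v_ij^o are drawn Binomial(B_ij, overkill_rate)."""
    B = rng.poisson(bin_means)            # failed dies per bin, B_ij
    v = rng.binomial(B, overkill_rate)    # overkills before retest, v_ij^o
    w = n_dies - B.sum()                  # passed dies w_i
    if w >= w_min:                        # wafer-level threshold test succeeds
        return v.sum(), 0                 # pass wafer: V_i = sum_j v_ij^o, R_i = 0
    V_i, R_i = 0, 0
    for j in range(len(B)):               # hold wafer: test each bin
        if B[j] <= b_max[j]:
            V_i += v[j]                   # pass bin: v_ij = v_ij^o, r_ij = 0
        else:
            R_i += B[j]                   # retest bin: v_ij = 0, r_ij = B_ij
    return V_i, R_i

def estimate_VR(x, bin_means, n_wafers=1000):
    """Estimate (V, R): averages of V_i and R_i over n_wafers wafers."""
    w_min, b_max = x[0], np.asarray(x[1:])
    V = R = 0.0
    for _ in range(n_wafers):
        Vi, Ri = simulate_wafer(w_min, b_max, bin_means)
        V += Vi; R += Ri
    return V / n_wafers, R / n_wafers
```

Each training pair is then (x, estimate_VR(x, ...)) for a randomly drawn threshold vector x.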
We denote the M randomly selected input vectors by x_i, i = 1,...,M, and the corresponding simulated outputs by V_i and R_i, i = 1,...,M. The training problems for adjusting the arc weights of the above two ANNs are:

    min_{c_1} Σ_{i=1}^{M} [V_i − f_1(x_i | c_1)]^2                (5)

and

    min_{c_2} Σ_{i=1}^{M} [R_i − f_2(x_i | c_2)]^2,                (6)

where c_1 and c_2 denote the vectors of arc weights of the ANN for V and the ANN for R, respectively, and f_1(x_i | c_1) and f_2(x_i | c_2) denote the actual outputs of the corresponding ANNs when the input vector is x_i. The training problems thus adjust the weight vectors c_1 and c_2 to bring the actual outputs f_1(x_i | c_1) and f_2(x_i | c_2) as close to the desired outputs V_i and R_i as possible. To speed up convergence of the back-propagation training, we employed the BFGS quasi-Newton method (Gill et al., 1981; Stanevski & Tsvetkov, 2004) and the one-step secant method (Battiti, 1992; Fiore et al., 2004) to solve (5) and (6), respectively. Training stops when either of the following two conditions occurs: (i) the sum of the mean squared errors, i.e., the objective value of the training problem, falls below 10^{-3}, or (ii) the number of epochs exceeds 300. This constitutes part (B) of constructing the crude model. Once these two ANNs are trained, we can input any vector x to the two ANNs to estimate the corresponding V and R, which will be used to estimate F(x). This forms our crude but effective model to estimate F(x) for a given input vector x.
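A scaled-down sketch of this surrogate training is given below. For brevity it uses plain batch gradient descent rather than the BFGS or one-step secant updates, and the training targets are synthetic placeholders for the simulated V_i; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 10
M = 200                                      # small stand-in for the paper's M = 16641

# Hypothetical training set: random input vectors scaled to [0, 1],
# with a smooth synthetic target standing in for the simulated V_i.
X = rng.random((M, J + 1))
y = X.sum(axis=1) / (J + 1)

# Three-layer net: J+1 inputs, 2(J+1) tanh hidden neurons, 1 linear output.
H = 2 * (J + 1)
W1 = rng.normal(0, 0.5, (J + 1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1));     b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                 # hidden layer (tanh sigmoid)
    return h, (h @ W2 + b2).ravel()          # linear output layer

mse0 = np.mean((forward(X)[1] - y) ** 2)     # error before training

# Plain batch gradient descent on the squared error of (5); the paper
# instead uses BFGS / one-step secant updates, which converge faster.
lr = 0.05
for epoch in range(300):
    h, out = forward(X)
    err = out - y                            # f1(x_i | c1) - V_i
    if np.mean(err ** 2) < 1e-3:             # stopping criterion (i)
        break
    g_out = 2 * err[:, None] / M             # d(MSE)/d(output)
    g_h = (g_out @ W2.T) * (1 - h ** 2)      # backprop through tanh
    W2 -= lr * h.T @ g_out;  b2 -= lr * g_out.sum(0)
    W1 -= lr * X.T @ g_h;    b1 -= lr * g_h.sum(0)

mse = np.mean((forward(X)[1] - y) ** 2)      # error after training
```

The trained network then serves as the crude model: a cheap estimate of V (and, with a second network, R) for any candidate x.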
3.3.2 Applying Step 2
With the above crude but effective objective-value (or, in GA terminology, fitness-value) evaluation model, we are ready to apply Step 2 of the two-stage algorithm to select N (= 1024) excellent input vectors from X′ using a GA. The coding scheme we employed for the vectors in X′ is straightforward, because each component of the vector x is an integer. We start from I (= 5000) vectors randomly selected from X′ as our initial population. The fitness value of each vector is calculated from F(x) based on the outputs of the two ANNs. We apply a GA with the following setup: simple roulette-wheel selection, single-point crossover with p_c = 0.7, and mutation with p_m = 0.02. After the GA evolves for 20 generations, we rank the final generation of the I (= 5000) chromosomes by fitness and pick the top N (= 1024) chromosomes to serve as the N (= 1024) input vectors needed in Step 3.
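A scaled-down sketch of this selection step is given below, with a hypothetical stand-in for the ANN-based fitness F(x); population size, N, and the fitness function are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
J, I, N, GENS = 10, 200, 32, 20           # scaled down from I = 5000, N = 1024
PC, PM = 0.7, 0.02                        # crossover / mutation probabilities

def fitness(x):
    # Hypothetical stand-in for F(x) as estimated by the two trained ANNs
    # (lower is better); the real evaluation feeds x through both ANNs.
    return float(np.abs(x - 100).sum())

pop = rng.integers(1, 207, size=(I, J + 1))        # integer chromosomes

for _ in range(GENS):
    f = np.array([fitness(x) for x in pop])
    # Roulette-wheel selection for a minimization problem: weight each
    # chromosome by (max - f) so smaller F gets a larger wheel slice.
    w = f.max() - f + 1e-9
    mates = pop[rng.choice(I, size=I, p=w / w.sum())]
    children = mates.copy()
    for a in range(0, I - 1, 2):                   # single-point crossover
        if rng.random() < PC:
            cut = int(rng.integers(1, J + 1))
            children[a, cut:], children[a + 1, cut:] = (
                mates[a + 1, cut:].copy(), mates[a, cut:].copy())
    mut = rng.random(children.shape) < PM          # pointwise mutation
    children[mut] = rng.integers(1, 207, size=int(mut.sum()))
    pop = children

# Rank the final generation by fitness and keep the top N chromosomes.
f = np.array([fitness(x) for x in pop])
top_N = pop[np.argsort(f)[:N]]
```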


3.3.3 Applying Step 3
Starting from the N (= 1024) input vectors obtained in Step 2, we compute F(x) for each input vector using a model more refined than the ANNs, namely a stochastic simulation with a varying number of test wafers. The basic number of test wafers is L_0 = 500, and n_k is determined by n_k = arg min_{n_k} { L_0 k^{n_k − 1} ≤ L_s ≤ L_0 k^{n_k}, 1 ≤ N / k^{n_k} ≤ 10 }, where L_s = 100000. From i = 1 to n_k − 1, we use a stochastic simulation with L_i = k^i L_0 test wafers to estimate F(x) for the candidate N / k^{i−1} x's; we rank these candidates based on their estimated F(x) and select the best N / k^i x's as the candidate solution set for subphase i + 1.
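Under this reconstruction of the rule for n_k, the subphase schedule can be computed as follows; for k = 2 it reproduces the schedule of Table 1.

```python
# Subphase schedule of Step 3 under our reconstruction of the rule for n_k:
# n_k is the smallest n with L0*k**(n-1) <= Ls <= L0*k**n and 1 <= N/k**n <= 10.
def schedule(k, N=1024, L0=500, Ls=100_000):
    n_k = next(n for n in range(1, 64)
               if L0 * k ** (n - 1) <= Ls <= L0 * k ** n
               and 1 <= N / k ** n <= 10)
    # Subphase i starts with the best N / k**(i-1) candidates, each simulated
    # with L_i = k**i * L0 wafers; the last subphase uses the full Ls wafers.
    sizes = [N // k ** (i - 1) for i in range(1, n_k + 1)]
    lengths = [min(k ** i * L0, Ls) for i in range(1, n_k + 1)]
    return n_k, sizes, lengths

n_k, sizes, lengths = schedule(k=2)
# For k = 2: n_k = 8, sizes = [1024, 512, ..., 8],
# lengths = [1000, 2000, ..., 64000, 100000], matching Table 1.
```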

3.3.4 Applying Step 4
In this step, we compute the objective value of (4) for each of the N / k^{n_k} input vectors obtained in Step 3 using the exact model, i.e., a stochastic simulation with a number of test wafers (L_s = 100000) large enough to make the estimated objective value sufficiently stable. The input vector among these N / k^{n_k} candidates associated with the smallest F(x) is then the good enough solution that we seek.

4. Test Results and Performance Evaluation


4.1 Test Results and Comparisons
Our simulations are based on the following data, collected for a practical product of a renowned wafer foundry in Taiwan. The product is made on 6-inch wafers. Each wafer consists of 206 dies. There are 10 bins in the wafers of this product, and their means, j = 1,...,10, are respectively the following 10 positive real numbers: 0.5, 0.5, 1.1, 1.3, 0.8, 3.7, 3.5, 40, 45, and 13. The yield rate of this product is 46.6%. The mean number of overkills occurring in bin j is 0.03 B_ij for j = 1,...,10; that is, the overkill rate is 0.03 for all j. The input-variable space is X = {x = [W_min, b_j^max, j = 1,...,10] | W_min ∈ [1, 206], b_j^max ∈ [1, 206], j = 1,...,10}.
We used a sigmoid-type function as the penalty function P(R − r_T) in (4), i.e.,

    P(R − r_T) = ρ / (1 + e^{−(R − r_T)})  for R > r_T,

where ρ (= 0.0617) is a normalizing coefficient such that ρ = max_{i∈{1,...,M}} V_i / max_{i∈{1,...,M}} R_i, and P(R − r_T) = 0 for R ≤ r_T. We simulated three cases of different r_T's: 10, 40, and 80.
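A sketch of this penalty and of the penalized objective (4), assuming the product form F(x) = V + P(R − r_T)(R − r_T), is:

```python
import math

RHO = 0.0617   # normalizing coefficient from the paper (max_i V_i / max_i R_i)

def penalty(R_bar, r_T, rho=RHO):
    """Sigmoid-type penalty P(R - r_T): zero while the retest constraint
    holds, and approaching rho as the violation R - r_T grows."""
    if R_bar <= r_T:
        return 0.0
    return rho / (1.0 + math.exp(-(R_bar - r_T)))

def F(V_bar, R_bar, r_T):
    # Penalized objective of (4), assuming the product form
    # F(x) = V + P(R - r_T) * (R - r_T).
    return V_bar + penalty(R_bar, r_T) * (R_bar - r_T)
```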


Remark 2: We use 6-inch wafer products because they allow easier identification of the bins and overkills in experiments. In fact, our results apply to wafers of any size.

Specific data for applying the two-stage algorithm to this product are given in the following. In Step 1 of the first-stage approach, (a) X is narrowed down rationally but conservatively to X′ = {x = [W_min, b_j^max, j = 1,...,10] | W_min ∈ [50, 206], b_j^max ∈ [1, 6m_j], j = 1,...,10}, where m_j denotes the mean count of bin j; and (b) M = 16641 and L_s = 100000 wafers. In Step 2, I = 5000, N = 1024, and the convergence criterion we employed for our GA is that the number of evolved generations exceeds 20. It should be noted that all the test results shown in this section were simulated on a Pentium IV PC.
In the second-stage approach, we tested the computation time and the quality of the obtained good enough solution for various k in order to choose a suitable value. Figure 4 shows the F(x_g) (vertical axis) of the good enough solutions x_g obtained, and the corresponding CPU time (horizontal axis) consumed, by our algorithm with k = 2, 3, 4, 5, 6, and 200 for the case r_T = 10. In general, smaller k corresponds to less CPU time because fewer simulation replications are performed. However, there is no guarantee that larger k leads to smaller F(x_g). Nonetheless, for sufficiently large k, such as k = 200, the corresponding F(x_g) is the smallest and the CPU time the longest among all tested k's, as expected. Therefore, the choice of k is problem dependent, reflecting how fast one needs the solution and how much one cares about its quality. As can be observed in Figure 4, the CPU time consumed for k = 2 in this test is within 2 minutes, so k = 2 is suitable for real-time application. The parameters in the second stage of our algorithm are therefore set as follows: k = 2, L_0 = 500, L_n = 2^n L_0, n_2 = 8, and N_n = 1024 / 2^{n−1}.

Figure 4: The F(x_g) obtained and the corresponding CPU time consumed by our algorithm with k = 2, 3, 4, 5, 6, and 200 for the case r_T = 10.
Table 1 shows the simulation length and the size of the candidate solution set in each subphase of the second stage. In the last subphase, we use the stochastic simulation with simulation length L_s = 100000 to compute F(x) for the N_{n_2} = 8 candidate solutions. The x with the smallest F(x) is the good enough vector of threshold values x_g that we look for.

Table 1: Number of candidate solutions and simulation length in each subphase of the second stage.

subphase |    1 |    2 |    3 |    4 |     5 |     6 |     7 |      8
N_n      | 1024 |  512 |  256 |  128 |    64 |    32 |    16 |      8
L_n      | 1000 | 2000 | 4000 | 8000 | 16000 | 32000 | 64000 | 100000

The good enough vectors of threshold values and the average overkill percentages obtained from the two-stage algorithm for the three cases r_T = 10, 40, and 80 are shown in Table 2. From this table, we can observe that as r_T increases, the value of W_min increases (row 2), and the values of the leading b_j^max, j = 8 and 9, which account for most of the retests, decrease (rows 10 and 11, respectively). This indicates that if we allow more retests (i.e., increase r_T), we can set more stringent threshold values (i.e., increase W_min and decrease the leading b_j^max) so as to avoid more overkills (i.e., decrease the average overkill percentage), as indicated in the last row of Table 2. This also demonstrates the conflicting nature of the two objectives, reducing overkills and reducing retests. We used 590 real test wafers, whose bin counts B_ij and overkills before retest v_ij^o are known, to test the performance of the vectors of threshold values obtained by our algorithm for the three cases shown in Table 2. The corresponding pairs of the average overkills per wafer, V = (1/590) Σ_{i=1}^{590} V_i, and the average retests per wafer, R = (1/590) Σ_{i=1}^{590} R_i, over these 590 test wafers are shown in Figure 5 as three distinctly marked points, with the corresponding r_T indicated in the top right corner of the figure. We also used 2000 randomly selected vectors of threshold values to test the same 590 test wafers; the resulting (V, R) pairs are likewise plotted in Figure 5.

Table 2: The good enough vector of threshold values and the average overkill percentage for three different r_T's.

                            r_T = 10   r_T = 40   r_T = 80
Good enough    W_min             132        146        157
vector x_g     b_1^max             2          2          3
               b_2^max             1          1          1
               b_3^max             5          2          3
               b_4^max             5          4          4
               b_5^max             5          5          3
               b_6^max             3          8          5
               b_7^max             6          4          4
               b_8^max            64         55         14
               b_9^max            78         62         52
               b_10^max           29         11         13
(V / 206) × 100%               1.36%      0.85%      0.23%

Figure 5: The resulting pairs of (V, R) obtained by our algorithm and by randomly generated vectors of threshold values.

We have also used a typical GA and a simulated annealing (SA) algorithm to solve (4) for the case r_T = 40. As indicated at the beginning of Section 1, global search techniques are computationally expensive for solving (4). We stopped the GA and SA once they had consumed 30 times the CPU time consumed by the two-stage algorithm; the objective values of (4) they obtained were still 11.8% and 19.9% larger, respectively, than the final objective value obtained by the two-stage algorithm. Using the threshold values they obtained to test the 590 wafers, the resulting (V, R) pairs from GA and SA are also marked in Figure 5. In other words, using the two-stage algorithm we avoid 11.8% and 19.9% more overkills than using GA and SA, respectively, for R ≤ 40. In addition, neither GA nor SA reaches the optimal solution: the best-so-far solutions they obtained after one hour of CPU time are still far from the optimal solution of (4).
We see that for R ≤ 40, the V produced by the good enough vector of threshold values obtained by our algorithm is almost the minimum compared with the randomly selected vectors of threshold values. Similar conclusions can be drawn for the cases r_T = 10 and 80. From Figure 5, we can see that the results obtained for r_T = 10, 40, and 80 lie almost on the boundary of the region formed by the randomly generated vectors of threshold values; this implicit boundary represents the (V, R) pairs produced by the optimal vectors of threshold values. These results imply that our algorithm not only controls the level of retests but also obtains a near-optimal solution.
4.2 Performance Evaluation
It is of interest to assess how good the N selected vectors are for various types of input-variable space X, so as to demonstrate the validity of our first-stage approach. Although in-depth analyses exist of the approximation errors incurred when ANNs approximate continuous functions, the accuracy of approximating the input-output relationship of a discrete-event simulated system is usually assessed empirically. Thus, it is not surprising that we have no analytical result for the quality of the N vectors selected in our first-stage approach. Since the input-variable space for the test product is X = {x = [W_min, b_j^max, j = 1,...,10] | W_min ∈ [1, 206], b_j^max ∈ [1, 206], j = 1,...,10}, the size of the input-variable space is |X| = 206^11.
The methodology for our performance evaluation is to simulate based on the Ordered Performance Curves (OPCs) (Lau & Ho, 1997) and the employed crude model. The Ordered Performance Curve (OPC) of all the ordered vectors x_1, x_2, ..., x_|X| in X is determined by the spread of the ordered performances F_[1], F_[2], ..., F_[|X|], where F_[i] denotes F(x_i). Without loss of generality, the F_[i]'s can be normalized into the range [0, 1], i.e., y_i = (F_[i] − F_[1]) / (F_[|X|] − F_[1]) for i = 1, 2, ..., |X|. Meanwhile, the ordered |X| vectors, spaced equally, are also mapped into the range [0, 1] such that z(x_i) = z_[i] = (i − 1) / (|X| − 1) for i = 1, 2, ..., |X|. There are five broad categories of OPC models: (i) lots of good vectors; (ii) lots of intermediate but few good and bad vectors; (iii) equally distributed good, bad, and intermediate vectors; (iv) lots of good and lots of bad but few intermediate vectors; and (v) lots of bad vectors. Figure 6 shows a graphical expression of these five types of OPCs. More precisely, a standardized OPC can be described by the two-parameter smooth curve B^{−1}(z | α, β) = B(z | 1/α, 1/β), where B(z | α, β) is the incomplete Beta function with parameters (α, β). In general, α < 1, β > 1 corresponds to the OPC of type (i); α > 1, β > 1 to type (ii); α = 1, β = 1 to type (iii); α < 1, β < 1 to type (iv); and α > 1, β < 1 to type (v). As indicated in Section 1, we need not consider the types of X consisting of lots of good vectors in this evaluation; thus we take only the three OPC types (ii), (iii), and (v) into account.


Figure 6: Five types of standardized OPCs.

The roughness of the ANN model can be described by adding a uniform noise to the normalized performances y_i (Lau & Ho, 1997; Ho, 1999). That is, the ANN model can be described by the noisy model y_i + ε_i, where the random noise ε_i, representing a large modeling error, is generated from a uniform distribution. We assume various magnitudes of uniformly distributed modeling noise to represent the approximation errors of the proposed ANN-based model, and we perform the following simple experiments to compare the quality of the N vectors selected by the GA based on the ANN model with those selected at random from the solution space. We let U[−0.1, 0.1] denote the uniform distribution of a random noise ranging from −0.1 to 0.1 that is added to the normalized performance, i.e., the normalized objective value, of the exact model. The normalized performances of all solutions in a solution space are equally spaced from 0 to 1, with 0 as the top performance.
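The noisy-selection experiment can be sketched as follows. For illustration, a simple monotone power curve stands in for the incomplete-Beta OPC, plain top-N selection under noise stands in for the GA, and all names and parameters are illustrative, so the numbers this sketch produces are not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def opc_alignment(alpha, beta, noise, n=10_000, n_select=100, trials=100):
    """Monte Carlo sketch of the alignment evaluation: order n solutions by a
    noisy normalized performance and record, per trial, the true rank of the
    best solution among the selected top n_select."""
    best_true_ranks = []
    z = np.linspace(0.0, 1.0, n)                    # equally spaced ordered solutions
    y = z ** (alpha / beta)                         # monotone stand-in OPC shape
    for _ in range(trials):
        noisy = y + rng.uniform(-noise, noise, n)   # crude model: y_i + eps_i
        selected = np.argsort(noisy)[:n_select]     # top n_select under the noisy model
        best_true_ranks.append(int(selected.min())) # true rank of the best pick
    return float(np.quantile(best_true_ranks, 0.95))
```

With zero noise the crude model ranks solutions perfectly, so the best selected solution is always the true best; increasing the noise degrades, but only mildly, the true rank of the best selected solution.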
We studied a total of 28 OPCs distributed uniformly among the three broadly generic types, (ii), (iii), and (v), formed from the following parameters: α = 1.0, 2.0, 4.0, 5.0 and β = 0.2, 0.4, 0.8, 1.0, 2.0, 4.0, 5.0. We carried out a Monte Carlo study over a vast number of OPCs, similar to that in (Lau & Ho, 1997), for an assumed noise distribution, and picked the top N vectors using the GA. In all our Monte Carlo calculations, we simulated 10000 realizations of noisy OPCs. For the three modeling-noise distributions U[−0.01, 0.01], U[−0.05, 0.05], and U[−0.1, 0.1], the top 5% of the N solutions selected by the GA are at least a top 2.37 × 10^{−6}%, top 8.95 × 10^{−4}%, and top 1.19 × 10^{−3}% solution in X with probability 0.95, respectively. However, the top 5% of the N solutions selected at random are at best (i.e., with no modeling error) a top 5% solution in X. Therefore, we have greatly improved the quality of the N vectors by replacing the uniform selection procedure.

Remark 3: Though we do not investigate the actual order of the N vectors for the OPC types (i) and (iv), our first-stage approach can still be applied to problems whose OPCs are of these two types. Even if the order of the obtained N vectors for these two OPC types is not as good as for the other three, owing to the sharp sensitivity of the performance order to noise in these types, their actual objective values will still be good enough because of the abundance of good vectors. That is, in OPC types (i) and (iv) there can be a big difference in the order of good vectors while the differences in their objective values are very small. Thus, no matter what type of OPC we face, our first-stage approach works the same way.
5. Conclusions
To cope with computationally intractable stochastic simulation optimization problems, we have proposed an ordinal optimization theory based two-stage algorithm that finds a good enough solution in reasonable computational time. To demonstrate the applicability of the proposed algorithm, we have used it to solve for a good enough vector of threshold values to reduce overkills and retests in the wafer probe testing process of a wafer foundry. We have tested the performance of the obtained solution on real data and found that the resulting average numbers of overkills and retests per wafer lie almost on the boundary produced by the optimal vectors of threshold values of the considered stochastic optimization problem. This indicates that the proposed algorithm not only controls the tolerable level of retests, by taking the varying chip demand into account, but also provides a near-optimal vector of threshold values. We have demonstrated the computational efficiency of the proposed algorithm by comparison with the genetic algorithm and the simulated annealing method: even when the latter two methods consume more than 30 times the CPU time consumed by the proposed algorithm, the best-so-far objective values they obtain are still no better than that obtained by the proposed algorithm. We have also justified the performance of the proposed algorithm on the wafer probe testing process based on ordinal optimization theory.

References
Ahmed, M.A. (2007). A modification of the simulated annealing algorithm for discrete
stochastic optimization. Engineering Optimization, 39(6), 701-714.
April, J., Glover, F., Kelly, J.P. & Laguna, M. (2003). Practical introduction to simulation
optimization. In: Proceedings of the 2003 Winter Simulation Conference, vol.1 (pp.71-78).
New Orleans, LA.
Barnett, T.S., Grady, M., Purdy, K. & Singh, A.D. (2005). Exploiting prediction defect
clustering for yield and reliability. IEE Proceedings-Computers and Digital Techniques,
152(4), 407-413.
Battiti, R. (1992). First and second order methods for learning: Between steepest descent and
Newton's method. Neural Computation, 4(3), 141-166.
Blum, C. & Roli, A. (2003). Metaheuristics in combinatorial optimization: overview and
conceptual comparison. ACM Computing Surveys, 35(4), 268-308.
Chen, C.-H., Wu, S.D. & Dai, L. (1999). Ordinal comparison of heuristic algorithms using
stochastic optimization. IEEE Transactions on Robotics and Automation, 15(1), 44-56.
Chen, F.L., Lin, S.C., Doong, Y.Y. & Young, K.L. (2003). LOGIC product yield analysis by
wafer bin map pattern recognition supervised neural network. In: Proceedings of the 2003
IEEE International Symposium on Semiconductor Manufacturing (pp. 501-504). San Jose,


CA.
Collette, Y. & Siarry, P. (2003). Multiobjective optimization: principles and case studies. New
York: Springer-Verlag.
Dréo, J., Pétrowski, A., Siarry, P. & Taillard, E. (2006). Metaheuristics for hard optimization:
methods and case studies. Berlin: Springer-Verlag.
Fattahi, P., Mehrabad, M.S. & Jolai, F. (2007). Mathematical modeling and heuristic
approaches to flexible job shop scheduling problems. Journal of Intelligent Manufacturing,
18(4), 331-342.
Fiore, C.D., Fanelli, S. & Zellini, P. (2004). An efficient generalization of Battiti-Shanno's
Quasi-Newton algorithm for learning in MLP-networks. In: Proceedings of ICONIP 2004,
Lecture Notes in Computer Science, Vol. 3316 (pp. 483-488). Berlin: Springer.
Fu, M.C., Glover, F.W. & April, J. (2005). Simulation optimization: a review, new
developments, and applications. In: Proceedings of the 2005 Winter Simulation
Conference (pp. 83-95). Orlando, FL.
Gill, P.E., Murray, W. & Wright, M.H. (1981). Practical optimization. New York: Academic
Press.
Graupe, D. (2007). Principles of artificial neural networks. 2nd ed. Hackensack, NJ: World Scientific.
Haupt, R.L. & Haupt, S.E. (2004). Practical genetic algorithms. 2nd ed. New York : John
Wiley.
Hedar, A.R. & Fukushima, M. (2006). Tabu Search directed by direct search methods for
nonlinear global optimization. European Journal of Operational Research, 170(3),
329-349.
Ho, Y.C. (1999). An explanation of ordinal optimization: Soft computing for hard problems.
Information Sciences, 113(3-4), 169-192.


Ho, Y.C., Zhao, Q.C. & Jia, Q.S. (2007). Ordinal optimization: Soft optimization for hard
problems. New York: Springer-Verlag.
Hunt, F.Y. (2005). Sample path optimality for a Markov optimization problem. Stochastic
Processes and Their Applications, 115(6), 769-779.
Kim, S. (2006). Gradient-based simulation optimization. In: Proceedings of the 2006 Winter
Simulation Conference (pp. 159-167). Monterey, CA.
Lacksonen, T. (2001). Empirical comparison of search algorithms for discrete event
simulation. Computers & Industrial Engineering, 40(12), 133-148.
Lau, T.W.E. & Ho, Y.C. (1997). Universal alignment probability and subset selection for
ordinal optimization. Journal of Optimization Theory and Applications, 39(4), 455-489.
Lin, S.-Y. & Ho, Y.C. (2002). Universal alignment probability revisited. Journal of
Optimization Theory and Applications, 113(3), 399-407.
Lin, S.-Y., Ho, Y.C. & Lin, C.-H. (2004). An ordinal optimization theory based algorithm for
solving the optimal power flow problem with discrete control variables. IEEE
Transactions on Power Systems, 19(1), 276-286.
Moore, D. & McCabe, G. (1999). Introduction to the practice of statistics. 3rd ed. New York:
W.H. Freeman and Company.
Myers, R.H., Montgomery, D.C., Vining, G.G., Borror, C.M. & Kowalski, S.M. (2004).
Response surface methodology: A retrospective and literature survey. Journal of Quality
Technology, 36(1), 53-77.
Nocedal, J. & Wright, S.J. (2006). Numerical Optimization. 2nd ed. New York: Springer
Verlag.
Spall, J.C. (2003). Introduction to stochastic search and optimization estimation, simulation,
and control. New Jersey: John Wiley & Sons.
Suman, B. & Kumar, P. (2006). A survey of simulated annealing as a tool for single and

multiobjective optimization. Journal of the Operational Research Society, 57(10),


1143-1160.
Tekin, E. & Sabuncuoglu, I. (2004). Simulation optimization: A comprehensive review on
theory and applications. IIE Transactions, 36(11), 1067-1081.
Stanevski, N. & Tsvetkov, D. (2004). On the quasi-Newton training method for feed-forward
neural networks. In: Proceedings of the International Conference on Computer Systems
and Technologies (pp. II.12-1-5). Rousse, Bulgaria.
Theiler, J. & Alper, J. (2006). On the choice of random directions for stochastic
approximation algorithms. IEEE Transactions on Automatic Control, 51(4), 476-481.


List of Figures

Figure 1 Cause-and-Effect diagram of overkills.
Figure 2 Relationship between the inputs and the outputs of wafer probe testing procedures.
Figure 3 Flow chart of the wafer probe testing procedures.
Figure 4 The F(x_g) obtained and the corresponding CPU time consumed by our algorithm with k = 2, 3, 4, 5, 6, and 200 for the case r_T = 10.
Figure 5 The resulting pairs of (V, R) obtained by our algorithm and by randomly generated vectors of threshold values.
Figure 6 Five types of standardized OPCs.

List of Tables

Table 1 Number of candidate solutions and simulation length in each subphase of the second stage.
Table 2 The good enough vector of threshold values and the average overkill percentage for three different r_T's.