
EE 658

Diagnosis and Design of Reliable Digital Systems

Lecture 1:
Introduction to testing of digital circuits and
systems

University of Southern California


Viterbi School of Engineering
Ming Hsieh Dept. of Electrical Engineering
Moe Tabar
Fall 2020
References: Dr. Breuer’s lecture slides, books listed in the syllabus, and online resources
Key to Slides
Question: try to answer this question before class
Animation: reminder for me
Read only: not discussed in class
Advanced: material related to subject but not on exams

Dr. Moe Tabar - EE 658 2


What you should get from this lecture

▪ A good idea of the subject matter of this
course
▪ The importance of VLSI test
▪ Basic terminology and concepts related to
VLSI test
▪ How this material fits into the VLSI/CAD
curricula



Outline
▪ 1. The VLSI fabrication process
▪ 2. Defects
▪ 3. Process variation
▪ 4. Errors in computation
▪ 5. Yield
▪ 6. Burn-in
▪ 7. Wafer testing
▪ 8. Testing
▪ Introduction
▪ Modeling defects as faults
▪ Quality of a test
▪ Generating and applying a test
▪ Three test methods
▪ 9. Summary



1. The VLSI fabrication process
Read only
Building circuitry
Layer-by-layer
⚫ Murphy’s Law:
◦ If anything can go wrong, it
will.
⚫ Currently above 10 layers
of metal
⚫ 20-30 masks
⚫ Some problem areas:
▪ Planarization is difficult; filler is needed
in empty spaces
▪ Add 10 more items to this list



Read only

Semiconductor Manufacturing
⚫ They're everywhere. From appliances to space ships, semiconductors have
pervaded every fabric of our society. They have transformed the world so
drastically that we've practically gone through hundreds of industrial revolutions
during the last five decades. So overwhelming is the power of computing and
signal processing today that it's difficult to believe how these can come from
sand.
⚫ Indeed, this world was reinvented simply by purifying sand, making it flat, and
adding materials to it. This magical process of building integrated circuits from
sand is now referred to as semiconductor manufacturing.

Semiconductor manufacturing consists of the following steps:


1) production of silicon wafers from very pure silicon ingots;
2) fabrication of integrated circuits onto these wafers;
3) assembly of every integrated circuit on the wafer into a finished product; and
4) testing and back-end processing of the finished products.



Read only

Wafer Fabrication
⚫ Wafer fabrication generally refers to the process of building integrated circuits
on silicon wafers. Prior to wafer fabrication, the raw silicon wafers to be used
for this purpose are first produced from very pure silicon ingots, through either
the Czochralski (CZ) or the Float Zone (FZ) method. The ingots are shaped
then sliced into thin wafers through a process called wafering.
⚫ The semiconductor industry has advanced so far that there now
exist many distinct wafer fab processes, allowing the device designer to
optimize a design by selecting the best fab process for the device.
Nonetheless, all existing fab processes today simply consist of a series of steps
to deposit special material layers on the wafers one at a time in precise amounts
and patterns.
⚫ The first step might be to grow a p-type epitaxial layer on the silicon substrate
through chemical vapor deposition. A nitride layer may then be deposited over
the epi-layer, then masked and etched according to specific patterns, leaving
behind exposed areas on the epi-layer, i.e., areas no longer covered by the
nitride layer. These exposed areas may then be masked again in specific
patterns before being subjected to diffusion or ion implantation to receive
dopants such as phosphorus, forming n-wells.



Read only

Epitaxy
⚫ Epitaxy or epitaxial growth is the process of depositing a
thin layer (0.5 to 20 microns) of single crystal material over a
single crystal substrate, usually through chemical vapor
deposition (CVD). In semiconductors, the deposited film is
often the same material as the substrate, and the process is
known as homoepitaxy, or simply, epi. An example of this is
silicon deposition over a silicon substrate.

⚫ Silicon epitaxy is done to improve the performance of bipolar
devices. By growing a lightly doped epi layer over a
heavily-doped silicon substrate, a higher breakdown voltage
across the collector-substrate junction is achieved while
maintaining low collector resistance. Lower collector
resistance allows a higher operating speed with the same
current.



Read only

Diffusion and polysilicon


⚫ Diffusion and ion implant are the two major processes by which chemical
species or dopants are introduced into a semiconductor such as silicon to form
the electronic structures that make integrated circuits useful (although ion
implant is now much more widely used for this purpose than thermal diffusion).
⚫ Diffusion is the movement of a chemical species from an area of high
concentration to an area of lower concentration. The controlled diffusion of
dopants into silicon is the foundation of forming a p-n junction and fabrication of
devices during wafer fabrication.
⚫ Diffusion is used primarily to alter the type and level of conductivity of
semiconductor materials. It is used to form bases, emitters, and resistors in
bipolar devices, as well as drains and sources in MOS devices. It is also used to
dope polysilicon layers.
⚫ Thin films of polycrystalline silicon, or polysilicon (also known as poly-Si or
poly), are widely used as MOS transistor gate electrodes and for
interconnection in MOS circuits. They are also used as resistors, as well as to ensure
ohmic contacts for shallow junctions. When polysilicon is used as a gate electrode, a metal
(such as tungsten) or metal silicide (such as tantalum silicide) may be deposited
over it to enhance its conductivity.



Read only

Lithography/Etch
⚫ The fabrication of circuits on silicon wafers requires that several different layers,
each with a different pattern, be deposited on the surface one at a time, and
that doping of the active regions be done in very controlled amounts over tiny
regions of precise areas. The various patterns used in depositing layers and
doping regions on the substrate are defined by a process called lithography.
⚫ Simply put, the lithography process generally consists of the following steps. A
layer of photoresist (PR) material is first spin-coated on the surface of the
wafer. The resist layer is then selectively exposed to radiation such as ultraviolet
light, electrons, or X-rays, with the exposed areas defined by the exposure tool,
mask, or computer data.
⚫ After exposure, the PR layer is subjected to development which destroys
unwanted areas of the PR layer, exposing the corresponding areas of the
underlying layer. Depending on the resist type, the development stage may
destroy either the exposed or unexposed areas. The areas with no resist
material left on top of them are then subjected to additive or subtractive
processes, allowing the selective deposition or removal of material on the
substrate.



Read only

Optical Lithography
⚫ The fabrication of circuits on a wafer requires a process by which specific
patterns of various materials can be deposited on or removed from the wafer's
surface. The process of defining these patterns on the wafer is known as
lithography. Lithography uses photoresist materials to cover areas on the wafer
that will not be subjected to material deposition or removal.
⚫ Optical Lithography refers to a lithographic process that uses visible or
ultraviolet light to form patterns on the photoresist through printing. Printing is
the process of projecting the image of the patterns onto the wafer surface using
a light source and a photo mask. There are three types of printing - contact,
proximity, and projection printing. The machines used for printing are known as
printers or aligners.
⚫ Patterned masks, usually composed of glass or chromium, are used during
printing to cover areas of the photoresist layer that shouldn't get exposed to
light. Development of the photoresist in a developer solution after its exposure
to light produces a resist pattern on the wafer, which defines which areas of the
wafer are exposed for material deposition or removal.



Read only

Electron Beam Lithography 


⚫ Electron Beam Lithography (EBL) refers to a lithographic process that uses a
focused beam of electrons to form the circuit patterns needed for material
deposition on (or removal from) the wafer, in contrast with optical lithography
which uses light for the same purpose. Electron lithography offers higher
patterning resolution than optical lithography because of the shorter wavelength
possessed by the 10-50 keV electrons that it employs.
⚫ Given the availability of technology that allows a small-diameter focused beam of
electrons to be scanned over a surface, an EBL system doesn't need masks
anymore to perform its task (unlike optical lithography, which uses photomasks
to project the patterns). An EBL system simply 'draws' the pattern over the
resist-coated wafer using the electron beam as its drawing pen. Thus, EBL systems
produce the resist pattern in a 'serial' manner, making them slow compared to
optical systems.
⚫ A typical EBL system consists of the following parts: 1) an electron gun or
electron source that supplies the electrons; 2) an electron column that 'shapes'
and focuses the electron beam; 3) a mechanical stage that positions the wafer
under the electron beam; 4) a wafer handling system that automatically feeds
wafers to the system and unloads them after processing; and 5) a computer
system that controls the equipment.



Read only

Photoresist
⚫ During development, the unwanted areas in the PR are dissolved by the
developer. In the case wherein the exposed areas become soluble in the
developer, a positive image of the mask pattern is produced on the resist. Such a
resist is therefore called a positive photoresist. Negative photoresist layers
result in negative images of the mask pattern, wherein the exposed areas are
made less soluble in the developer. Wafer fabrication may employ both positive
and negative photoresists, although positive resists are preferred because they
offer higher resolution capabilities.
⚫ Photoresist materials consist of three components: 1) a matrix material (also
known as resin), which provides body for the photoresist; 2) the inhibitor (also
referred to as sensitizer), which is the photoactive ingredient; and 3) the solvent,
which keeps the resist liquid until it is applied to the substrate.
⚫ Etching is the process of removing regions of the underlying material that are
no longer protected by photoresist after development.  The rate at which the
etching process occurs is known as the etch rate.  The etching process is said to
be isotropic if it proceeds in all directions at the same rate.  If it proceeds in only
one direction, then it is completely anisotropic. 



Read only
Etching
⚫ Since etching processes generally fall between being completely isotropic and completely
anisotropic, an etching process needs to be described in terms of its level of isotropy. Wet
etching, or etching with the use of chemicals, is generally isotropic. On the other hand, dry
etching processes that employ reactive plasmas are generally anisotropic. Wet Etching is
an etching process that utilizes liquid chemicals or etchants to remove materials from the
wafer, usually in specific patterns defined by photoresist masks on the wafer.  Materials not
covered by these masks are 'etched away' by the chemicals while those covered by the
masks are left almost intact. These masks were deposited on the wafer in an earlier wafer
fab step known as 'lithography'.
⚫ A simple wet etching process may just consist of dissolution of the material to be removed
in a liquid solvent, without changing the chemical nature of the dissolved material. In general,
however, a wet etching process involves one or more chemical reactions that consume the
original reactants and produce new species.
⚫ Reactive plasma etching involves the removal of surface material not protected by
lithographic masks using chemically active species. These species are usually oxidizing and
reducing agents produced from process gases that have been ionized and fragmentized by a
glow discharge. The species react with the exposed surface material, removing it from
the substrate while forming volatile byproducts in the process.



A survey
▪ Consider a fabrication facility mass
producing a large state of the art
processor chip using their newest and
most advanced nano-CMOS technology.
What fraction of newly manufactured die
(chips) are bad and need to be discarded?



2. Defects
Spot defects: resistive opens and shorts
Figure: an impurity resulting in two open wires and one partially cut wire
Figure: sputtering in deposition of material leading to a short
Example of 2-dimensional defect due
to a particulate

Khare and Maly, p. 28: Courtesy Siemens A. G., Munich, Germany



Extra metal spot defect

Khare and Maly, p. 3: Courtesy Siemens A. G., Munich, Germany



Another Defect



3. Process variation
Clean room



Sources of process disturbances
▪ Human errors and equipment failures
▪ Instabilities in the process conditions
▪ Random fluctuations in the process environment such as turbulent
flow of gases, inaccuracies in control of furnace temperature, etc.
▪ Material instabilities - variations in the physical parameters of
chemical compounds and other materials such as fluctuations in the
purity and physical characteristics of chemical compounds, density
and viscosity of photo-resist, etc.
▪ Substrate and surface inhomogeneities
▪ Local disturbances in the properties of substrate wafers that typically fall
into three categories: 1) point defects, 2) dislocations, and 3) surface
imperfections
▪ Spots: mainly lithography related disturbances caused by mask
fabrication process



The normal distribution
▪ Most process steps conform to the normal distribution
▪ Thickness of deposition of a material
▪ Width of a line
Self-study: Look into the normal distribution, including its average and variance
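
As a starting point for the self-study, the sketch below computes what fraction of parts falls inside an acceptance window when a process parameter is normally distributed. The mean, standard deviation, and window limits are hypothetical values chosen only for illustration; they are not taken from the lecture.

```python
import math

def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def fraction_within(lower, upper, mean, sigma):
    """Fraction of parts whose parameter falls inside [lower, upper]."""
    return normal_cdf(upper, mean, sigma) - normal_cdf(lower, mean, sigma)

# Hypothetical line width: mean 100 nm, standard deviation 5 nm,
# acceptance window of +/- 2 sigma around the mean (assumed numbers).
mean_nm, sigma_nm = 100.0, 5.0
accepted = fraction_within(90.0, 110.0, mean_nm, sigma_nm)
print(f"fraction accepted: {accepted:.4f}")
```

Widening or narrowing the window shows how quickly the rejected fraction changes with the variance of the process step.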



Normal distribution and acceptability

Figure: normal distribution of metal 2 wire thickness (Ω/□), with the acceptable range and the unacceptable parameter values marked
Why might thicker wires be more acceptable than thinner ones?



Read only

Variation of solder thickness for a PCB



4. Errors in computation
Defects, faults and errors
⚫ A die defect is an unintended variation in the
physical aspects of a die, such as shown earlier.
⚫ A fault is a defect that can create an error in
computation. This can occur when a logic 1
erroneously becomes a logic 0 or vice versa, or
when a transition in logic value occurs too early or
too late. Faults are usually associated with a common
form of defect, such as an open.
⚫ An error is when a signal has the wrong logic value
for a certain period of time. Not all errors create
wrong values in state variables.



Sources of error producing entities
▪ Design error - e.g., the logic design has an
error, or the layout has an error, etc.
▪ These problems should have been caught during
design verification
▪ Producing a new chip consists primarily of design,
verification and test
▪ Such errors occur in almost all large VLSI designs
▪ Such errors are analogous to a bug in a computer
program
▪ Design errors are not considered in this course
▪ All copies of the component have this same error



More sources of error producing entities

⚫ Fabrication error/mistake/fault - e.g.,
someone wired up a system improperly, or
inserted the wrong component
◦ These problems only occur in some components
◦ Fabrication errors are not dealt with in this
course



More sources of error producing entities
▪ Fabrication defects - these are defects due to the
manufacturing process. They are primarily due to
▪ Process variations
▪ Spot defects due to deposition, etching or impurities

▪ Physical failures - these occur during the normal
lifetime of a system due to wear-out mechanisms like
metal migration and environmental factors
(temperature, vibration, etc.)

Yes - these are the guys we will focus on in this
class



Characteristics associated with defects
⚫ Variation of defect occurrence with respect
to time
● Permanent – always present
● Intermittent – comes and goes, due to an internal
problem, e.g., vibration or temperature. Very common
and hard to diagnose. Semi-predictable, e.g., always
occurs if T > 100 °C.
● Transient – usually a random (non-predictable)
occurrence due to an external source, e.g., a high
energy alpha particle in space, or a power surge.
(Soft-error rate (SER) is currently a major field of
study)
This class will only deal with permanent defects



Effect of defect on logic and interconnect
⚫ The defect might affect the logical function of a circuit
component, e.g., a NAND gate operating as an Inverter. That
is, F = NOT(A•B) operates as F = NOT(A).
⚫ The defect might affect the interconnection pattern (topology)
of a circuit, e.g. creating an open circuit or a short.
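
The NAND-as-inverter example can be checked by comparing truth tables. The sketch below is only an illustration (the function names are made up for this example); it lists the input combinations for which the defective gate produces an error at its output.

```python
from itertools import product

def good_nand(a, b):
    """Fault-free gate: F = NOT(A AND B)."""
    return 1 - (a & b)

def faulty_as_inverter(a, b):
    """Defective gate from the slide: F = NOT(A), input B is ignored."""
    return 1 - a

# Input combinations on which the two functions disagree, i.e. the
# only inputs that expose this defect as an error at the output.
detecting = [(a, b) for a, b in product((0, 1), repeat=2)
             if good_nand(a, b) != faulty_as_inverter(a, b)]
print(detecting)  # [(1, 0)]
```

Only one of the four input combinations reveals the defect, which is why choosing test inputs matters.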



More effects due to defects
⚫ The defect might affect the operating speed of a
circuit, e.g., circuit has too much delay. May be
possible to operate the circuit “correctly” if you
slow down the clock. (See speed binning)
⚫ The defect might affect the voltage and/or
current levels of the signals, e.g., the threshold
voltage levels.
⚫ While not a behavioral effect, the number of
distinct defects that are present simultaneously
is important. (Single vs. multiple faults)



5. Yield
The problem is process variations and
defects
⚫ As scaling approaches molecular limits
◦ Process variation is very significant
◦ It is harder to control spot defects just by having
clean rooms
◦ Wavelength of light used for lithographic patterning is larger
than feature size -- lithography is currently a
fundamental limitation to scaling



Yield and other terms
⚫ Yield (Y): The yield of a manufacturing process is the fraction of
manufactured components (printed circuit board, chips, radios,
etc.) that have “no failures,” i.e., no error-producing
manufacturing/packaging anomaly. That is, the component meets all
the specifications. Usually we cannot determine this value of yield,
so we back-off on the definitions and focus on detected
error-producing anomalies and hence try to identify components
that appear to meet all specs.
⚫ Defect-level (DL or d): The fraction of bad components
certified as good after testing. These are sent to unhappy
customers. We like to minimize the defect level.
◦ Example: DL = 0.000050 implies that, on average, there are
about 50 bad components per million (10^6) components.
⚫ Yield-loss (YL): The fraction of good components certified as
bad and thus discarded. This is nice to minimize too.
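
To make the three terms concrete, the sketch below computes them from the four possible test outcomes. It rests on assumptions that the slide does not spell out: defect level is taken relative to the components that pass the test (the ones shipped), and yield loss relative to the truly good components. The counts are hypothetical, and in practice the true good/bad split is not directly observable.

```python
def outcome_metrics(good_pass, good_fail, bad_pass, bad_fail):
    """Yield, defect level, and yield loss from the four test outcomes.

    Assumed denominators (not stated in the lecture):
      defect level -> components certified as good (shipped)
      yield loss   -> components that are truly good
    """
    total = good_pass + good_fail + bad_pass + bad_fail
    shipped = good_pass + bad_pass
    good = good_pass + good_fail
    return {
        "yield": good / total,
        "defect_level": bad_pass / shipped,
        "yield_loss": good_fail / good,
    }

# Hypothetical lot of one million components.
metrics = outcome_metrics(good_pass=796_000, good_fail=4_000,
                          bad_pass=40, bad_fail=199_960)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these made-up counts the defect level comes out near 50 bad parts per million shipped, matching the scale of the example above.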



Why does yield-loss occur?
⚫ Testing is costly, so we do a less than perfect job; we usually try to err
on the side of caution, i.e., we are pessimistic. The engineering concept
is called “guard banding”.
⚫ Example: say a chip is good if it runs at 500 MHz, at a temperature of …,
humidity of …
◦ At what frequency should we test our chips?
◦ User’s clock might vary from 480-510 MHz?
◦ We must do this fast to keep the cost of testing low, so the
chip will not cost too much.
⚫ Assume a chip responds erroneously to a sequence of binary patterns
that can never occur under actual functional operation- is this chip
good or bad?
⚫ IDDQ testing usually has some yield-loss (quiescent Idd, or
quiescent power-supply current)
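
The guard-banding trade-off can be illustrated with a small sketch. It assumes a simplified model in which each chip has a single maximum operating frequency and passes the test exactly when that maximum is at least the test frequency; the chip frequencies and the 520 MHz test point are hypothetical.

```python
def guard_band_outcome(max_freqs_mhz, spec_mhz, test_mhz):
    """Classify chips tested at test_mhz against a specification of spec_mhz.

    Returns (passed, yield_loss, escapes):
      passed     - chips that pass the test
      yield_loss - chips that meet the spec but fail the stricter test
      escapes    - chips that miss the spec but pass a test that is too lenient
    """
    passed = [f for f in max_freqs_mhz if f >= test_mhz]
    yield_loss = [f for f in max_freqs_mhz if spec_mhz <= f < test_mhz]
    escapes = [f for f in max_freqs_mhz if test_mhz <= f < spec_mhz]
    return passed, yield_loss, escapes

# Hypothetical maximum operating frequencies of five chips, in MHz.
chips = [495, 505, 515, 525, 540]

# Testing above the 500 MHz spec (guard band) rejects some good chips.
passed, yield_loss, escapes = guard_band_outcome(chips, spec_mhz=500, test_mhz=520)
print("passed:", passed, "yield loss:", yield_loss, "escapes:", escapes)
```

Testing below the specification would instead let bad chips escape, which is why the pessimistic choice costs yield rather than quality.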



Yield learning
Figure: yield (up to 100%) versus time in years for a process at X nm and its successor at 0.7X nm, with the begin and end of mass production marked



Read only

Future fabrication issues


⚫ Global distributions and spot defects contribute to yield. As a
process matures, better control is achieved over the process
parameters and variations. Spot defects are also reduced,
but not to the same degree. Thus, in the early 1990’s,
spot defects were the dominant cause for yield loss in
mature processes.
⚫ In future CMOS technologies, it appears that process
variations will be a greater problem than spot defects.
⚫ What do you think will be the major manufacturing issues
related to
◦ Quantum computing devices
◦ Biological computing devices
◦ DNA computing



Metal migration
▪ Metal migration occurs due to a high current density in a conductor.
Molecules actually get stripped away. The conductor gets thinner and
eventually becomes an open circuit.



Read only

The bathtub curve


⚫ The operating life cycle distribution of a population of devices may be
modeled as a bathtub curve, if the failures are plotted on the y-axis against
the operating life on the x-axis. The bathtub curve shows that the highest
failure rates experienced by a population of devices occur during the early
stage of the life cycle, or early life, and during the wear-out period of the
life cycle. Between the early life and wear-out stages is a long period wherein
the devices fail very sparingly.
Figure: bathtub curve with the label "Manufacturing failures"; x-axis ticks at 0 hrs, 100 hrs, 200 hrs



6. Burn-in
What is Burn-in?
⚫ Wafer Burn-in is a form of accelerated aging
◦ Increase temperature
◦ Increase voltage
◦ Shake-rattle and roll!
◦ Power on device
⚫ Try to break “weak” parts of a die
⚫ Try NOT to break “good” parts of a die



Read only
Burn-in
⚫ Burn-in is an electrical stress test that employs voltage and
temperature to accelerate in time the electrical failure of a
device. Burn-in essentially simulates the operating life of the
device, since the electrical excitation applied during burn-in may
mirror the worst-case bias that the device will be subjected to
in the course of its useable life. Depending on the burn-in
duration used, the reliability information obtained may pertain
to the device's early life or its wear-out. Burn-in may be used as
a reliability monitor or as a production screen to weed
out potential infant mortalities from the lot.



Read only
Burn-in (cont.)
⚫ Burn-in is usually done at 125 deg C, with electrical
excitation applied to the samples. The burn-in process is
facilitated by using burn-in boards (see Fig. 1) where the
samples are loaded. These burn-in boards are then
inserted into the burn-in oven (see Fig. 2), which supplies
the necessary voltages to the samples while maintaining
the oven temperature at 125 deg C. The electrical bias
applied may either be static or dynamic, depending on
the failure mechanism being accelerated.

Figure 1: Bare and socket-populated burn-in boards



Burn in units
Figure 2: Burn-in ovens



7. Wafer testing
A wafer and test structures

Wires, oscillators,
transistors
Gates, F/Fs, etc.



Wafer testing
From Wikipedia, the free encyclopedia

⚫ Wafer testing is a step performed during semiconductor
device fabrication. During this step, performed before a
wafer is sent to die preparation, all individual integrated
circuits that are present on the wafer are tested for
functional defects by applying special test patterns to
them. The wafer testing is performed by a piece of test
equipment called a prober, and the process is sometimes
referred to as a probe test or wafer sort.
⚫ When all test patterns pass for a specific die, its position is
remembered for later use during IC packaging. A die that
does not pass all test patterns is usually considered to be
faulty and is thrown away. Non-passing circuits are typically
marked with a small dot of paint in the middle of the die.



Wafer testing (cont.)
From Wikipedia, the free encyclopedia

⚫ In some very specific cases, a die that passes some but not all test
patterns can still be used as a product, typically with limited
functionality. The most common example of this is a microprocessor
for which only one part of the on-die cache memory is functional. In
this case, the processor can sometimes still be sold as a lower cost part
with a smaller amount of memory and thus lower performance.
⚫ The contents of all test patterns and the sequence by which they are
applied to an integrated circuit are called the test program.
⚫ After IC packaging, a packaged chip will be tested again during the
IC testing phase, usually with the same or very similar test patterns. For
this reason, one might think that wafer testing is an unnecessary,
redundant step. In reality this is not the case, since the removal of
defective dies saves the considerable cost of packaging faulty devices.
However, when the production yield is so high that wafer testing
is more expensive than the packaging cost of defective devices, the wafer
testing step can be skipped altogether and dies will undergo blind
assembly.
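
The decision between wafer testing and blind assembly is a cost comparison, sketched below. The model is a simplification made for illustration: costs are per die, the wafer test is assumed to catch every defective die, and the dollar figures and yields are hypothetical.

```python
def wafer_test_pays_off(yield_fraction, wafer_test_cost, packaging_cost):
    """True if testing every die costs less than packaging the bad ones.

    Without wafer test, a fraction (1 - yield) of the packaging spend
    is wasted on defective dies; wafer test avoids that waste at a
    fixed cost per die (simplified model, perfect test assumed).
    """
    wasted_packaging = (1.0 - yield_fraction) * packaging_cost
    return wafer_test_cost < wasted_packaging

# Hypothetical per-die costs: 0.10 to wafer-test, 1.00 to package.
for y in (0.60, 0.98):
    decision = "wafer test" if wafer_test_pays_off(y, 0.10, 1.00) else "blind assembly"
    print(f"yield {y:.2f}: {decision}")
```

At low yield the wasted packaging dominates, so wafer test wins; at very high yield the test itself becomes the larger cost.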



Probing a few pads of a die on a wafer



Cutting a wafer into dice



Probing a die



A bare die test system
• ATE
• Die under test
• Test fixture
• Test head electronics



Summary (till part 7)
▪ Manufacturing VLSI chips having billions of transistors and wires
▪ A sizeable fraction of these manufactured chips are bad
▪ Separating the good from the bad is the role of testing
▪ In practice chips are often partitioned, via testing, into 20-30 bins,
depending on what parts of them are good
▪ Test equipment is expensive, hence testing is costly if it takes too long
▪ Determining tests that result in low defect levels and low yield loss is
very difficult and is a major focus of this course
▪ Designing chips so that test development is easier, or that the chip
actually tests itself, is also a major part of this course



8. Testing – Introduction
Various stages of testing
1. During wafer processing many checks take place
▪ Thickness of layers, chemicals, etc.
2. Wafer testing using probes
▪ Test simple test structures next to scribe lines, like ring
oscillators, simple logic cells, etc.
▪ Burn-in and stress testing
▪ Die on wafers usually powered on, and sometimes tests are run as
temperature and humidity are carefully controlled; in the ovens results
are not observed - infant mortality
▪ Test each die - not extensively because of head impedance and
lack of access points
▪ Label die accordingly
3. Package what you think are good die
4. Package testing-comprehensive
▪ Check packaging (interconnect) and logic
5. If applicable, system testing-usually functional



Read only

Terminology
▪ Defects and Abnormal Processing Situations (DAPS): Something
unintended in the manufacturing process, e.g., extra metal, insufficient material
in a via, a pin hole in a conductor, etc.
▪ Defects and abnormal processing situations can affect the behavior and/or
performance of a chip, e.g., make it run slow, create errors at outputs, etc.
▪ But some DAPS have no noticeable effect on the operation of a circuit
▪ Errors: An error (usually binary) is a response that does not meet the
specification for the device.
▪ Failure: A physical part of a circuit affected by a DAPS that can result in
errors.
▪ Fault model: An abstract characterization of a failure, e.g., a line is stuck-at
1.
▪ Fault (failure) detection: Determining whether or not a circuit appears to
have a fault - actually a failure
▪ Fault (failure or defect) diagnosis: Determining the details of the DAPS
that resulted in the failure



Animation

More terminology
▪ Testing: A way to identify a faulty system, component or circuit.
Testing is an experiment in which a system is exercised (inputs
applied) and resulting responses analyzed to determine if the
behavior is correct. If not, then this implies a failure. (We assume
the design is correct.)
▪ Design-for-test and built-in self-test: Modifications made to a
design to make testing (fault detection and diagnosis) easier.
▪ One or the other or both are universally applied
▪ Automatic test pattern generation: A program that generates
test data to apply to a chip during fault detection and diagnosis.



Important aspects of testing
⚫ Quality – works properly
⚫ Reliability – will continue to work properly for
at least a reasonable amount of time (usually
7-10 years)
⚫ Maintainability – is easy to fix if it fails
◦ But we are moving into a throw-away society where reparability is
becoming less important

This course deals primarily with the issue of
Quality, a little with Maintainability, and
not at all with Reliability
Discussion

Some food for thought


▪ If the behavior during testing is correct,
does this imply there is no failure?
▪ Yes - what are your arguments?
▪ No - why not?

▪ If the behavior during testing is incorrect,
is the component useable?
▪ Yes- how so?
▪ No - why not?



Discussion

Buying a new car


⚫ Car is minimally tested upon assembly
⚫ Cars come with a 5 year and 50,000 mile warranty
⚫ After 3000 miles you are expected to bring it to the
dealer for an oil change and to have your list of problems
fixed, for free
◦ The horn stopped working after 3 weeks
◦ The cigarette lighter doesn't work
◦ Etc.
⚫ After 4 months, the transmission goes out and they
fix it for free

How and why is this scenario different from buying a new high
performance processor chip?



Discussion

Failures create errors


Figure: a circuit example annotated with signal values, including a 0/1 value and a D value
⚫ failures ≈ faults - are assumed to create errors, i.e.,
eventually under the right input conditions, the
response is wrong.
⚫ What should a system do if a failure first occurs in the
field?
⚫ How can we detect errors?
⚫ How can we detect failures?
⚫ Etc.
8. Testing – Modeling Defects as faults
Modeling defects: Three schools of thought
1. Model physical anomalies (DAPS) as logical and/or timing faults,
e.g., a wire that might be open is modeled as a signal line always at the
value of 0. Look at the logic design. Develop a test to exercise this
fault, that is, a test that will create an observable error if the defect is present.
This must be repeated for every fault in the circuit that corresponds
to this model.
2. Consider only “functional” faults, e.g., does a decoder decode
properly; does a FF reset properly; etc. Look at small building blocks
such as adders, decoders and multiplexers and test their functionality.
3. Carry out functional or behavioral testing – does the circuit
carry out its primary functions
◦ Ignore the actual hardware implementation, hence ignore defects
and faults
◦ For a processor, execute programs



1. Classical physical fault model
A defective NAND gate: Case 1

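Figure: CMOS NAND gate with pull-up transistors P1 and P2, pull-down transistors N1 and N2, inputs In 1 and In 2, output Out, and a resistor R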

⚫ A manufacturing defect has shorted the
drain and source of P1 together
A defective NAND gate: Case 1
Steady-state behavior: In1=In2=Vdd

rsh: resistance of short
Figure: plots of V(Out) and V(Out2), with one region labeled "Behavior similar to SA1 at line Out" and another labeled "Affects transient behavior"
A defective NAND gate: Case 1
Transient behavior: In1=GND->VDD, In2=Vdd

rsh = 2k
rsh = 5k
rsh = 8k

The defect causes extra delay



A defective NAND gate: Case 2

A resistive open drain of N1



A defective NAND gate: Case 2

[Plot: V(Out) and In1 waveforms for rsh = 0, rsh = 4k, and rsh = 8k]

The defect causes substantial extra delay



Read only

More on classical physical faults


▪ If either input (A or B) is disconnected, the input node
initially charges to VCC. Thus the circuit acts as an inverter
w.r.t. the other input. Hence a disconnected input in this
technology “looks like” a stuck-at-1.
▪ An “open” in the 4 kΩ bias resistor causes Y to appear to
be stuck-at-1.
▪ A short in this same resistor does not result in the output
being stuck-at-1 or stuck-at-0.

[Figure: A transistor-level diagram of a 7400 NAND gate relating
physical defects to logic faults]



Read only

More on classical physical faults


[Figure: circuit diagram of a 3-input NAND gate with inputs A, B, C, output Z, diode D1, resistors R1, R2, R3, RL, and nodes V1, V2, VH, VL]

⚫ If diode D1 is open then this circuit operates as a 2
input NAND gate with inputs B and C. This is
equivalent to A = 1, so we can model this fault as A
stuck-at-1 (A s-a-1)



Read only

More on physical faults and shorts


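[Figure: the 3-input NAND gate with D1 shorted (left) and its equivalent circuit (right), with inputs B and C, output Z, resistors R1, R2, R3, RL, and nodes V1, V2, VH, VL]

▪ If D1 is shorted then we get the equivalent circuit shown
above

How would you model this fault?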



Observations!
▪ Effect of actual defects is technology
dependent
▪ There are an infinite number of possible
defects (type and degree, like shorts and
their resistance)
▪ In general, we abstract these defects into
gross categories



2. Functional fault modes and models
▪ What does a FF do?
▪ It sets, resets, holds
▪ We expect the Q and Qbar outputs to be of opposite values, and a certain
transfer rate from clock edge to output change
▪ What about a decoder?
▪ We expect the correct output to be high and all others low
▪ So, d0 = d2 = 1 is an error
▪ So X0=X1 = 1 and d3 = 0 is an error
▪ A functional test would attempt to verify that a device carries out its intended
function
▪ What about testing for illegal or non-functional inputs?
▪ Would this have to be an exhaustive test?

[Figure: 2X4 decoder with inputs X0, X1 and outputs d0, d1, d2, d3]
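The decoder expectation above (the correct output high, all others low) can be sketched as a small functional test. This is only an illustration: the bit ordering (X1 as the most significant bit), the function names, and the "d2 stuck high" defect are assumptions, not part of the slides.

```python
from itertools import product


def decoder_2x4(x1, x0):
    """Reference 2X4 decoder: exactly one output d0..d3 is high.

    Assumption: X1 is the most significant select bit.
    """
    index = 2 * x1 + x0
    return tuple(1 if i == index else 0 for i in range(4))


def functional_test(decoder):
    """Apply all 4 input patterns and compare against the reference."""
    failures = []
    for x1, x0 in product((0, 1), repeat=2):
        expected = decoder_2x4(x1, x0)
        actual = tuple(decoder(x1, x0))
        if actual != expected:
            failures.append(((x1, x0), actual))
    return failures


def faulty_decoder(x1, x0):
    """Hypothetical defective part: output d2 is always high."""
    d = list(decoder_2x4(x1, x0))
    d[2] = 1
    return tuple(d)


print(functional_test(decoder_2x4))     # empty list: the good part passes
print(functional_test(faulty_decoder))  # every pattern where d2 should be low
```

For a 2-input decoder the functional test happens to be exhaustive; for wider blocks the number of patterns grows as 2^n, which is the question the slide raises.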



3. Behavioral fault modes and models
⚫ No fault model is used
⚫ Ignore implementation issues
⚫ There are many different implementations, say C1, C2, … , Cp, for a
function f(x1, x2, …, xn). Is a “good” test for C1 necessarily a good test
for C2?
⚫ Exhaustive testing!
⚫ Let’s test f or C1 exhaustively!
◦ What is an exhaustive test for f?
◦ What effects are detected?
◦ How many patterns are in this test?
◦ What types of effects might not be detected?
◦ Is there a test that is non-exhaustive and is good for all the Ci’s?
⚫ If you did test a component in a circuit exhaustively, could you observe
the response? How?



Read only
More on exhaustive testing
▪ Consider a finite state machine (FSM) M having n input wires, m
output wires, and q flip-flops. (You better know what an FSM is!)
▪ What is meant by an exhaustive test for M?
▪ How long is this test (in patterns or clock cycles)?
▪ What is the function or behavior of a microprocessor?
▪ Does a behavioral test consist of executing all possible operations
(op codes) with all possible operands (data)? If so, a 32-bit adder
would be tested using 2^32 x 2^32 = 2^64 data values!
▪ How long would this take if an 800 MHz μp could carry out 4
instructions per clock cycle?
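A rough back-of-the-envelope sketch for the last question, assuming (this is an assumption, not stated on the slide) that each instruction applies exactly one operand pair to the adder and that the processor sustains 4 instructions on every clock cycle:

```python
# Exhaustive test of a 32-bit adder: every pair of 32-bit operands.
patterns = 2**32 * 2**32            # = 2**64 operand pairs

clock_hz = 800e6                    # 800 MHz processor
instructions_per_cycle = 4          # from the slide
rate = clock_hz * instructions_per_cycle   # operand pairs applied per second

seconds = patterns / rate
years = seconds / (365.25 * 24 * 3600)

print(f"{patterns:.3e} patterns")
print(f"{seconds:.3e} seconds, about {years:.0f} years")
```

Under these assumptions the test runs for well over a century, which is why exhaustive behavioral testing of even a single wide datapath block is impractical.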



I assume you know all about:
⚫ Gates, flip-flops and latches
⚫ MUXes, decoders, encoders, ALUs, etc.
⚫ Review EE 101- a Freshman course
⚫ Some theory, such as DeMorgan’s law, K-maps, BSFs,
prime implicants, etc.



8. Testing – Quality of a Test
Determining the quality of a test
⚫ Innumerable failures producing DAPS (defects) can occur that
make a device produce erroneous results. Normally we attempt to
characterize the various failures into categories via a failure/fault
model. We attempt to use models that relate well to the most common
defects that might occur, but it appears that our classical fault
models, such as single stuck-at and bridging faults, are becoming
inadequate for capturing the effect of new fault mechanisms, such as
crosstalk, current leakage, delay, and intermediate voltage levels.
⚫ We still use these models, however, because
◦ They are well understood
◦ We have lots of tools that support these models
◦ They give us reasonable results
Boy-these seem like weak excuses!
What do we mean by Quality?



Fault coverage

▪ Let D be the universe of all failures in a circuit C.
▪ Let D* ⊆ D be those failures that correspond to model M,
e.g., a bridge or open.
▪ Let T be a test, and assume T detects all the faults
corresponding to M in D**, where D** ⊆ D. Note - T might also
detect other faults not in D*.
▪ T is associated with a fault coverage value, FC, w.r.t. T, M
and C, where
FC = (|D** ∩ D*| / |D*|) × 100%
▪ Given T, we can usually determine FC(T) with respect to
various models M.
▪ Sometimes we generate different tests for different models,
such as T_delay, T_s-a, T_short, etc.

[Figure: diagram of the sets D, D*, and D**]
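The fault coverage definition maps directly onto set operations. The sketch below is illustrative only: the fault names (for example "A/0" for line A stuck-at-0) and the two example sets are made up, and `fault_coverage` is a hypothetical helper name.

```python
def fault_coverage(modeled_faults, detected_faults):
    """FC = |D** ∩ D*| / |D*| * 100%, with D* the modeled faults
    and D** the faults detected by the test T."""
    d_star = set(modeled_faults)
    if not d_star:
        raise ValueError("the fault model contains no faults")
    d_double_star = set(detected_faults)
    return 100.0 * len(d_double_star & d_star) / len(d_star)


# Hypothetical single stuck-at fault list D* for a tiny circuit.
d_star = {"A/0", "A/1", "B/0", "B/1", "Z/0", "Z/1"}

# Hypothetical set D** detected by T. The bridge is detected too, but it is
# outside the model, so it does not count toward FC for this model.
d_double_star = {"A/0", "B/1", "Z/0", "Z/1", "bridge(A,B)"}

print(f"FC = {fault_coverage(d_star, d_double_star):.1f}%")
```

Taking the intersection with D* is what keeps faults outside the model from inflating the coverage number.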



Observations regarding the next graph
⚫ Let T be a test procedure.
⚫ Defect-coverage (DC): The defect coverage of a test T is the
probability that T detects the existence of any failure (defect that
induces an error) in a circuit. In other words if a failure exists in a
circuit the defect coverage is the probability that T will detect the
failure.
⚫ Defect-level (DL): The defect level of a manufactured product
(after testing) is the probability of shipping a defective product.
With Y the yield and DC the defect coverage,

DL ≈ 1 – Y^(1–DC) (2)

⚫ If you don’t test, then DC=0 and DL = 1-Y.
⚫ If testing is perfect, then DC=1 and DL = 0



Defect level vs. defect coverage

⚫ For DC = 0.8 and Y = 0.75,
DL ≈ 0.06. So 6% of the parts
shipped are bad! This is
unacceptable.
⚫ IC manufacturers want to ship
less than 50 bad parts per
million, i.e., 50 DPM, therefore
if Y = 0.75, what value of DC is
required?
⚫ What is the relationship
between DL and DPM?
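A small numeric sketch of these questions, reading equation (2) as DL ≈ 1 − Y^(1−DC). That exponent form is the one consistent with the boundary cases (DC = 0 gives 1 − Y, DC = 1 gives 0) and with the DL ≈ 0.06 example. The function names are illustrative, and DPM is taken here simply as DL expressed per million shipped parts.

```python
import math


def defect_level(y, dc):
    """DL = 1 - Y**(1 - DC): probability that a shipped part is defective."""
    return 1.0 - y ** (1.0 - dc)


def required_defect_coverage(y, dl):
    """Invert DL = 1 - Y**(1 - DC) for DC, given yield Y and target DL."""
    return 1.0 - math.log(1.0 - dl) / math.log(y)


def dpm(dl):
    """Defective parts per million shipped."""
    return dl * 1e6


print(defect_level(0.75, 0.8))        # roughly 0.056, i.e. about 6%
print(dpm(defect_level(0.75, 0.8)))   # roughly 56,000 DPM

dc_needed = required_defect_coverage(0.75, 50e-6)
print(f"DC needed for 50 DPM at Y = 0.75: {dc_needed:.4%}")
```

At a yield of 0.75, getting from about 56,000 DPM down to 50 DPM requires the defect coverage to move from 80% to roughly 99.98%.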



Defect level as function of defect coverage and yield

[Plot: defect level versus defect coverage of test strategy (%)]

Example: If our goal is a quality level of 100 DPM and our
yield is 70%, then the defect coverage should be 99.99%.

But all the books talk about fault-coverage (FC).
What’s up Doc!



Animation

How good is enough


⚫ Clearly it is important to obtain very high quality
tests.
⚫ Normally we seek a fault coverage FC (FC ≤ DC)
that is above 98%.
⚫ Unlike some other products where statistical
sampling and testing is done, e.g. test every 100th
part, because of the concept of spot defects and
process variations, usually all IC’s must be tested.
Thus testing is a large part of the recurring cost
of an IC. In some cases it represents 50% of this
cost and this % is growing year by year.



8. Testing – Generating and applying a test



An example of a test for a fault
We say that:

A=x
B=0
C=x
D=1

is a test for the fault “line E stuck-at-0” in the circuit shown.

[Figure: gate-level circuit with labels A, B, C, D, E, F, G and signal values x, 0, 1, 1/0, 0/1]

▪ We activated the fault, i.e., we created the initial error - this
requires controllability
▪ We propagate (sensitize) the error
▪ We observed an error at an output - observability

How many binary patterns are actual tests for this fault?

If we apply the test pattern 0001 and observe a 1 at F,
does this tell us that E is stuck-at 0?
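The ideas of activation, propagation, and counting the binary patterns that are tests can be sketched with a tiny fault simulator. The circuit below is hypothetical: it is NOT the circuit on the slide (whose gates cannot be recovered here). It was chosen only so that A=x, B=0, C=x, D=1 is a test for "E stuck-at-0".

```python
from itertools import product


def circuit(a, b, c, d, e_stuck_at=None):
    """Hypothetical circuit (an assumption, not the slide's circuit):
    E = NAND(B, C), G = AND(A, B), F = NOR(G, AND(E, D)).
    """
    e = 1 - (b & c)
    if e_stuck_at is not None:
        e = e_stuck_at              # inject the stuck-at fault on line E
    g = a & b
    return 1 - (g | (e & d))        # primary output F


def tests_for_e_stuck_at_0():
    """Exhaustive search: a pattern is a test iff good and faulty F differ."""
    found = []
    for a, b, c, d in product((0, 1), repeat=4):
        good = circuit(a, b, c, d)
        faulty = circuit(a, b, c, d, e_stuck_at=0)
        if good != faulty:
            found.append((a, b, c, d))
    return found


for pattern in tests_for_e_stuck_at_0():
    print(pattern)
```

In this made-up circuit the cube A=x, B=0, C=x, D=1 expands to four binary tests, and the search finds one more (0, 1, 0, 1) that the cube does not cover. A pattern qualifies only when it both sets E to 1 in the good circuit (activation) and makes F depend on E (propagation).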



Testing a circuit
A test (X, Z):
X – input sequence
Z – correct output sequence

[Figure: X → Unit under test (UUT) / Circuit under test (CUT) → Z* (actual response)]

1. Z* = Z iff UUT is good
2. Z* ≠ Z iff UUT is faulty

▪ Conditions (1) and (2) are equivalent.
▪ Finding Z given X and UUT is usually not too hard (simulate the
design or run golden part).
▪ Finding Z* given X, the UUT and a specific fault model is usually
easy - fault simulation.
▪ Finding X given UUT so that (1) is true is very hard.
Generating X automatically given the UUT description (net list) and a
fault model is called Automatic Test Pattern Generation (ATPG).
An ATPG system
[Figure: block diagram of the ATPG system (software)]

Inputs:
▪ New circuit description
▪ ATE description
▪ Fault models to address and coverage desired

Outputs:
▪ Fault dictionaries
▪ Test statistics
▪ Test program for ATE

One of our goals is to design and build an ATPG system



What is Design-for-Test (DFT)?

[Figure: X (from ATPG system) → UUT with DFT added, fault f (maybe) → Z* → ATE]

▪ X and Z derived from ATPG system
▪ DFT makes it easier to compute X, and to apply X and observe
Z*
▪ Compare Z and Z*
▪ If same - error free ⇒ UUT is good!
▪ Else discard UUT or use DFT to help diagnose fault/defect



What is Built-In Self-Test (BIST)?

[Figure: ATE and UUT with BIST added, fault f (maybe); labels X, X’ (from ATPG system), and Z’]

⚫ Built-in hardware generates most of the test stimuli internally, and
knows most of the correct responses Z’. X’ is minimal.
⚫ Usually, no ATPG needed.
⚫ No similarity between X (ATPG), x the internal fault, and X’.
⚫ The ATE is not too complex and only needed for some simple
functions.



8. Testing – Three test methods



Animation

1. External Testing
⚫ ATE – Automatic Test Equipment
⚫ Go/ No-Go – Detection
⚫ Diagnostic dictionary – location
⚫ Probe information – location
⚫ Bed-of-Nails tester – I/O access
◦ In-circuit component testing

Hey Doc - what if the ATE is broken?

[Figure: ATE connected to the UUT / CUT / DUT]



Read only
ATE
⚫ Electrical testing is the identification and segregation
of electrical failures from a population of devices. An
electrical failure is any unit that does not meet the
electrical specifications defined for the device. In
simplified terms, electrical testing consists of providing
a series of electrical excitations to the device under
test (DUT) and measuring the response of the DUT.
⚫ For every set of electrical stimuli, the measured
response is compared to the expected response, which
is usually defined in terms of a lower and an upper limit.
Any DUT that exhibits a response outside of the
expected range of response is considered a failure.



Read only
ATE cont.
⚫ In production mode, electrical testing is usually performed
using a test system or platform, consisting of a tester (see Fig.
1) and a handler (see Fig. 2). Such a test system is also
referred to as automatic (or automated) test equipment,
or ATE. The tester performs the electrical testing itself,
while the handler takes care of transferring the unit to the
test site and positioning it for proper testing, as well as
reloading it back into another tube after the testing process is
completed.
⚫ The testing process executed by the tester is controlled by
the test program or test software. The test program is
usually written in a high-level language such as C++ or
Pascal. It consists of a series of test blocks, each of
which tests the DUT for a certain parameter. Every test
block sets up the DUT fixtures for proper testing of the
DUT for the corresponding parameter. It also tells the tester
what electrical excitation needs to be applied to the DUT, as
well as the correct timing for applying it.



Read only
ATE cont.

Figure 1: Tester

Figure 2: Test handler

There are usually two versions of the test program. One is a
production (stringent) version and the other is a quality assurance (QA)
version. The production version has stricter limits compared to the QA
version, while the QA version more or less tests the DUT to the
datasheet specification limits. The differences in production and QA
limits, or the guardbands, should be large enough to take into account
errors attributed to overall testing variability and noise, but not large
enough to result in over-rejection. If the guardband is chosen properly,
any unit passing the production test is almost sure to pass the
datasheet limits, regardless of which test equipment on the floor is
used.
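The relationship between QA limits, production limits, and the guardband can be sketched as follows. All numbers (a 4.75 V to 5.25 V datasheet window and a 0.05 V guardband) and the function names are hypothetical, chosen only to illustrate the idea.

```python
def passes(measured, lower, upper):
    """A unit passes a parametric test if the response is within limits."""
    return lower <= measured <= upper


def tighten(lower, upper, guardband):
    """Derive production limits by moving both QA limits inward."""
    return lower + guardband, upper - guardband


# Hypothetical datasheet (QA) limits for an output voltage, in volts.
qa_limits = (4.75, 5.25)
# Hypothetical guardband covering tester-to-tester variability and noise.
production_limits = tighten(*qa_limits, guardband=0.05)

for measured in (5.00, 5.22, 5.30):
    print(
        measured,
        "production:", passes(measured, *production_limits),
        "QA:", passes(measured, *qa_limits),
    )
```

A reading such as 5.22 V meets the datasheet but is rejected in production. That is the price of the guardband, and it is why an oversized guardband leads to over-rejection.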
Read only
ATE cont.
⚫ The test program usually consists of two types of test blocks,
namely, parametric and functional. Functional testing checks if the
device is able to perform its basic operation. Parametric testing
checks if the device exhibits the correct voltage, current, or power
characteristics, regardless of whether the unit is functional or not.
Parametric testing usually consists of forcing a constant voltage at a
node and measuring the current response
(force-voltage-measure-current, or FVMC) at that node, or forcing a
constant current at a node and measuring the voltage response
(force-current-measure-voltage, or FCMV).
⚫ Electrical testing is normally done at ambient temperature, but
testing at other temperatures is also done depending on the
screening requirements. For instance, latch-up problems have
a better chance of being detected at an elevated temperature, while
hot carrier failures are more easily detected at low temperatures. Aside
from 25 °C, other standard test temperatures include -40 °C, 0 °C, 70 °C,
85 °C, 100 °C, and 125 °C.



Read only

ATE cont.
⚫ Automatic Test Equipment (ATE), or testers (see Fig. 1),
are used in the process of automatically testing the electrical
characteristics and performance of finished devices.
⚫ ATEs vary widely in accordance with the types of products
they test. In general, however, an ATE consists of an elaborate
controller- or microprocessor-based system that controls: 1)
boards or modules that can supply electrical excitation to the
device under test (DUT) and 2) boards or modules that can
measure the electrical characteristics and behavior of the DUT
in response to the applied excitation. Additional paraphernalia
such as family boards and DUT boards are attached to the
tester to configure it to the specific needs of the DUT, since
the testers themselves are often designed to be as generic as
possible.



Read only

Test Handler
⚫ Mass production electrical testing is only possible by attaching a test handler to an
ATE. A test handler (see Fig. 2) refers to the equipment used in presenting the unit to be
tested to the test site of the ATE, allowing the ATE to test the unit. After testing, the
handler moves the unit to the appropriate output location based on the ATE test results.
⚫ Test handlers vary widely in configuration. Some use gravity to bring the device under
test (DUT) to the test site and to reload them back into tubes. Others use special
electromechanical or pick-and-place systems to accomplish this. Some handlers can only
be assigned to one tester, yet some can be allocated to eight or more testers. A typical
test handler is equipped with a loading or input stage, a test site, a sort shuttle, an
unloading or output stage, various sensors, and interfaces to the tester.
⚫ For gravity-fed handlers, the input stage usually consists of input tracks into which the
input tubes containing the units to be tested are inserted. The units slide down the input
track into the test site for testing. After testing, the unit is then transported by the sort
shuttle to the appropriate output track based on whether the unit is good or bad.
Pick-and-place handlers usually pick the units for testing from a tray and present them to
the test site for testing. After testing, the pick-and-place system takes the unit and puts it
into the appropriate output tray.



2. Self-testing
⚫ Here, the patient is the doctor! For
example, store test in memory of a μp and
have the μp execute the test.
⚫ What happens if the μp is faulty? Can a
designer know, consider, comprehend all
such faults when writing the test? What must
be working to execute a self-test?
⚫ If the power is turned off, will the machine
still display the error message “my power is
turned off ” ?

This leads to the concept of BIST


Animation

BIST
Normal mode of operation (part of a pipeline):

[Figure: n-bit register R1 → block C → n-bit register R2]

Test mode of operation:

[Figure: R1* → block C → R2* → comparator]

▪ R1* – count generator: 0, 1, 2, ...
▪ R2* – counts the number of 1’s it sees: 0, 1, 2, ...
▪ Comparator – checks the count against the signature
▪ Unequal implies an error due to C, hence a fault!
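The test mode above can be sketched in a few lines: a counter plays the role of R1*, a ones counter plays the role of R2*, and the result is compared with a stored signature. The block C used here (a 3-input majority function) and its injected fault are hypothetical stand-ins, not taken from the slides.

```python
def circuit_c(x, ab_stuck_at_0=False):
    """Hypothetical block C: majority of the 3 bits of x.

    The optional fault forces the internal AND(a, b) term to 0.
    """
    a, b, c = (x >> 2) & 1, (x >> 1) & 1, x & 1
    ab = 0 if ab_stuck_at_0 else (a & b)
    return ab | (a & c) | (b & c)


def bist_signature(block, n_bits=3):
    """R1* counts 0, 1, 2, ... through all inputs; R2* counts the 1's out."""
    ones = 0
    for pattern in range(2 ** n_bits):   # count generator (R1*)
        ones += block(pattern)           # ones counter (R2*)
    return ones


golden = bist_signature(circuit_c)   # signature of the fault-free block
observed = bist_signature(lambda x: circuit_c(x, ab_stuck_at_0=True))

print("golden signature:", golden)
print("observed signature:", observed)
print("fault detected" if observed != golden else "pass")
```

A ones count is a very lossy compressor: a fault that turns one output from 1 to 0 and another from 0 to 1 leaves the count unchanged and escapes detection. That limitation motivates the slide questions about what makes a good compressor.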
Read only
More on BIST issues
▪ How would you determine the fault coverage of this technique with respect
to some model M?

▪ What constitutes, in terms of attributes, a good test pattern generator (R1*)?
How would you design it? What are the area and performance overheads?

▪ What constitutes, in terms of attributes, a good compressor (R2*)? How
would you design it? What are the area and performance overheads?

▪ How do we determine the correct signature, S?

▪ What are other good BIST architectures?

▪ How are high levels of controllability and observability achieved?



Self-study

3. System level testing (diagnostic)

[Figure: test graph over machines M1, M2, M3, M4 with test arcs a11, a12, a23, a32, a33, a41, a42]

▪ Mi -- a machine
▪ aik implies that Mi tests Mk
▪ aik = 1 implies that Mi concludes that Mk is good (pass)
▪ aik = 0 implies that Mi concludes that Mk is bad (fail)



Self-study

More on system level diagnosis


▪ If Mi is fault-free then its conclusion (test
outcome) is correct; if Mi is actually faulty,
then its conclusion may be correct or
incorrect.
▪ Problem 1: Given {aij}, referred to as the
test result signature, determine which
processors are good and which are faulty.
▪ Problem 2: How should we interconnect the
test arcs to get reliable diagnosis?



Self-study

An example of system level Diagnosis

a12
M1 M2
a21

▪ So, if a12 = 0 and a21 = 0, what do we fix?


▪ Would it help if we added self-testing arcs, akk ?
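One way to explore the first question is to enumerate every set of faulty machines that is consistent with the observed test results, under the rule that a fault-free tester is always right while a faulty tester may report anything. The sketch below does this by brute force; the function name and data layout are illustrative assumptions.

```python
from itertools import combinations


def consistent_fault_sets(machines, syndrome):
    """Return every set of faulty machines consistent with the syndrome.

    syndrome maps (tester, tested) -> 1 (pass) or 0 (fail).
    Only outcomes produced by fault-free testers constrain the answer.
    """
    result = []
    for size in range(len(machines) + 1):
        for candidate in combinations(machines, size):
            faulty = set(candidate)
            ok = all(
                outcome == (0 if tested in faulty else 1)
                for (tester, tested), outcome in syndrome.items()
                if tester not in faulty
            )
            if ok:
                result.append(faulty)
    return result


# The two-machine example: a12 = 0 and a21 = 0.
syndrome = {("M1", "M2"): 0, ("M2", "M1"): 0}
for fault_set in consistent_fault_sets(["M1", "M2"], syndrome):
    print(sorted(fault_set))
```

For a12 = a21 = 0 the search reports three consistent explanations (only M1 faulty, only M2 faulty, or both), so this syndrome alone does not say what to fix.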



Self-study

More on system level diagnosis


[Figure: test graph over machines M1, M2, M3, M4, M5, M6]

If only M3 is bad, can this system properly
diagnose the problem?



What are we doing about yield loss?

[Figure: five memories with defects marked by *]

▪ Error-free (minor defects): memory with DT and DFM
▪ Error-free memory after reconfiguration (FT)
▪ Error-free memory after reconfiguration and reduction in capacity
▪ Discard memory
▪ Error producing memory: live with the errors if you can!
This is called Error-Tolerance (ET)

* A defect that is masked
* A defect that is not masked



8. Testing – Summary



Let’s summarize the important points
▪ Defects, process variation, old age and yield
▪ Failures, faults and errors
▪ Testing at various levels
▪ Wafer, die, packaged chip, system
▪ Test generation (ATPG), design-for-test and
built-in self-test and ATE
▪ Fault models
▪ Stuck-at, delay, shorts, opens, resistive
▪ Permanent, transient, intermittent
▪ Change in logic, interconnect, timing



More important points
⚫ Modes of testing
◦ Off-line vs. on-line testing (occasionally vs. real-time)
⚫ Self-checking
◦ Dual-redundant, system level testing
⚫ Fault-tolerance--mask errors
⚫ Defect-tolerance -- compensate for defects that
create errors, e.g., use redundant vias
⚫ Error-tolerance -- learn to live with errors (Breuer
& Gupta)

And don’t ever forget the difference between a
Fault and an Error, and the MOST important
concepts of Observability and Controllability

