Fault Diagnosis Overview

David Lavo
UC Santa Cruz
January 13, 2005
Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really
work?
• Research: Why does (or doesn’t) it work?
How should it work?
©2005 David Lavo Fault Diagnosis Overview 2

What is Fault Diagnosis?
• A guess as to what’s wrong with a
malfunctioning circuit
• Narrows the search for physical root cause
• Makes inferences based on observed
behavior
• Usually based on the logical operation of the
circuit

VLSI Fault Diagnosis
(in One Slide)
[Figure: Tests are applied to the Defective Circuit; its Observed Behavior is passed to the Diagnosis Algorithm, which reports a Diagnosis (Location or Fault) to guide Physical Analysis.]

Two Types of Diagnosis
• Circuit Partitioning (“Effect-Cause” Diagnosis)
– Identify fault-free or possibly-faulty portions
– Identify suspect components, logic blocks,
interconnects
• Model-Based Diagnosis (“Cause-Effect”
Diagnosis)
– Assume one or more specific fault models
– Compare behavior to fault simulations

Circuit Partitioning
• Separate known-good portions of circuit from
likely areas of failure
• Simplest method: identify failing flip-flops
– Tester can identify failing flops or outputs
– Input cone of logic is suspect
– Intersection of multiple cones is highly
suspect
– Single clock pulse with scan can be used
for sequential/functional fails
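
The cone reasoning above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FANIN` netlist, its node names, and the helper functions are hypothetical and do not come from any particular tool.

```python
from functools import reduce

# Hypothetical netlist: each node maps to the nodes that drive it (its fan-in).
FANIN = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "g1": ["a", "n1"],
    "g2": ["n1", "c"],
    "n1": ["b"],
    "a": [], "b": [], "c": [],
}

def input_cone(node, fanin):
    """Return every node that can influence `node`, including the node itself."""
    cone, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in cone:
            cone.add(current)
            stack.extend(fanin[current])
    return cone

def suspects(failing, fanin):
    """Return (suspect, highly_suspect): the union and intersection of the cones."""
    cones = [input_cone(node, fanin) for node in failing]
    union = set().union(*cones)
    common = reduce(set.intersection, cones) if cones else set()
    return union, common

# Both flops fail, so the logic shared by their input cones is highly suspect.
union, common = suspects(["ff1", "ff2"], FANIN)
print(sorted(common))
```

Everything outside `union` can be treated as known-good for this failure, which is the partitioning step itself.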

Back-Tracing Failures
aka Effect-Cause Diagnosis
• Reasoning based on observed behavior and
expected (good-circuit) functions
• Commonly used at system and board-levels
• Tries to separate good and suspect areas
• Advantage: Simple and general
• Disadvantage: Not very precise, often gives
no indication of defect mechanism

Cause-Effect Diagnosis
• Start from possible causes (fault models),
compare to observed effects
• A simulator is used to predict behavior of the
circuit in the presence of various faults
• Match prediction(s) against observed behavior
• Advantage: Implicates a mechanism as well as a
location
• Disadvantage: Can be fooled by unmodeled
defects
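
As a rough sketch of the matching step, the observed pass/fail signature can be compared against each simulated candidate signature. The signatures and fault names below are made up for illustration, and Hamming distance is just one possible comparison.

```python
def hamming(a, b):
    """Count the positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))

def rank_candidates(observed, candidates):
    """Order candidate names from closest to farthest from the observed signature."""
    return sorted(candidates, key=lambda name: (hamming(observed, candidates[name]), name))

# Hypothetical simulated signatures (one bit per test: 1 = fail, 0 = pass).
CANDIDATES = {
    "A/sa1": "01000101",
    "B/sa0": "01010011",
    "C/sa1": "10100010",
}
OBSERVED = "01000111"

ranking = rank_candidates(OBSERVED, CANDIDATES)
print(ranking[0], hamming(OBSERVED, CANDIDATES[ranking[0]]))
```

No candidate matches exactly here, which is what an unmodeled defect tends to look like: the closest fault is reported, but with a nonzero distance.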

Cause-Effect Diagnosis
[Figure: Tests are applied to the Defective Circuit to produce a Behavior Signature (010001010100010101010 …). The Fault Simulator produces Candidate Signatures (010100110000101010100 …, 101000100001011101100 …, 010100010100011101100 …, 000111000101010011110 …). The Diagnosis Algorithm performs Comparison & Conclusion and reports the Diagnosis.]

Components of Fault Diagnosis
• Fault models
• Fault simulators
• Fault dictionaries
• Diagnosis algorithms

Fault Models
• A fault model is an abstraction of a type of
defect behavior
• A fault instance is the application of a model
to a circuit wire, node, gate, etc.
• Used to create and evaluate test sets
• For diagnosis, they can be used to simulate
and predict faulty behaviors

Stuck-at Fault Model
• The most-used fault model (by far)
• Simple to simulate and enumerate
• Effective for testing, fault grading, and diagnosis of
some defects
• Many defects are not well represented by the stuck-at
model
[Figure: Node A stuck-at 1, with nodes A and B annotated by fault-free/faulty logic values such as 0/1.]

Bridging Fault Model
• Shorts are a common defect type in CMOS
• Different bridging fault models have varying
accuracy and precision, from simplistic to very
sophisticated
• Difficult or impractical to enumerate
[Figure: Nodes X and Y bridged. Node X forces Y to a value of 0, so Y is annotated 1/0 (fault-free/faulty).]
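
Two of the simpler bridging models, and the reason enumeration is hard, can be sketched as follows. The function names are hypothetical, and the 100,000-net circuit size is an arbitrary assumption chosen only to show the scale.

```python
import math

def wired_and(x, y):
    """Wired-AND bridge: both nodes take the AND of their fault-free values."""
    value = x & y
    return value, value

def x_dominates_y(x, y):
    """Dominance bridge: node X keeps its value and forces it onto node Y."""
    return x, x

def candidate_counts(num_nets):
    """Return (single stuck-at faults, two-net bridge pairs) for a circuit."""
    return 2 * num_nets, math.comb(num_nets, 2)

# With X = 0 and Y = 1, both models drive Y to 0 (fault-free/faulty = 1/0).
print(wired_and(0, 1), x_dominates_y(0, 1))

# Stuck-at faults grow linearly with net count; bridge pairs grow quadratically.
print(candidate_counts(100_000))
```

The quadratic growth in pairs is why bridging diagnosis depends so heavily on choosing a realistic subset of candidates.
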
Some Diagnostic Fault Models
• Gate Fault
• Net Fault
• Bridging Fault
• Path Fault

Fault Simulators
• A fault simulator can simulate instances of a
particular fault model
• Inputs:
– Circuit (netlist)
– Test set
– Faultlist (list of fault instances)
• Output: circuit response
• Usually, simulates the presence of a single
fault instance (“single-fault assumption”)
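
A toy single-fault simulator makes the inputs and output concrete. This is a minimal sketch, not how a production simulator is built: the two-gate netlist, the test set, and all names are assumptions made for the example.

```python
# Hypothetical three-input circuit, listed in topological order:
# (output node, gate type, input nodes)
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("out", "OR", ("n1", "c")),
]
PRIMARY_INPUTS = ("a", "b", "c")
OBSERVED_OUTPUT = "out"

GATES = {
    "AND": lambda ins: int(all(ins)),
    "OR": lambda ins: int(any(ins)),
    "NOT": lambda ins: 1 - ins[0],
}

def simulate(vector, fault=None):
    """Evaluate one test vector; `fault` is (node, stuck_value) or None."""
    values = {}

    def assign(node, value):
        # Single-fault assumption: at most one node is forced to a fixed value.
        if fault is not None and fault[0] == node:
            value = fault[1]
        values[node] = value

    for node, value in zip(PRIMARY_INPUTS, vector):
        assign(node, value)
    for output, kind, inputs in NETLIST:
        assign(output, GATES[kind]([values[i] for i in inputs]))
    return values[OBSERVED_OUTPUT]

TEST_SET = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
FAULT_LIST = [(node, v) for node in ("a", "b", "c", "n1", "out") for v in (0, 1)]

def pass_fail_signature(fault):
    """One character per test: '1' if the faulty output differs from the good one."""
    return "".join(str(int(simulate(t) != simulate(t, fault))) for t in TEST_SET)

for fault in FAULT_LIST:
    print(f"{fault[0]} stuck-at {fault[1]}: {pass_fail_signature(fault)}")
```

Several faults produce identical signatures on this test set (for example `a` stuck-at 0 and `n1` stuck-at 0), so no diagnosis based on these tests alone can tell them apart.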

Fault Dictionaries
• A fault dictionary is a database of the
simulated responses for all faults in faultlist
• Used by some diagnosis algorithms for
convenience:
– Fast: no simulation at time of diagnosis
– Self-contained: netlist, simulator, and test
set not needed after dictionary creation
• Can be very large, however!

The Full-Response Dictionary
• For each fault ( f ), store the response to each
test vector ( v )
• For each vector, store the expected output
response ( o ), one bit per observed output
• Total storage requirement: f × v × o bits

The Pass-Fail Dictionary
• For each fault, store only the test vector
responses
• One bit per vector, pass ( 0 ) or fail ( 1 )
• Total storage requirement: f × v bits
• Much smaller than full-response, and often
practical for even very large circuits
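
To get a feel for the two storage formulas, here is a small worked calculation. The circuit size used (one million faults, 5,000 vectors, 10,000 observed outputs) is a hypothetical example, not a figure from any real design.

```python
def full_response_bits(f, v, o):
    """Full-response dictionary: one bit per fault, per vector, per output."""
    return f * v * o

def pass_fail_bits(f, v):
    """Pass-fail dictionary: one bit per fault, per vector."""
    return f * v

# Hypothetical circuit: 1,000,000 faults, 5,000 vectors, 10,000 observed outputs.
f, v, o = 1_000_000, 5_000, 10_000

full_bytes = full_response_bits(f, v, o) // 8
pass_fail_bytes = pass_fail_bits(f, v) // 8

print(f"full-response: {full_bytes / 1e12:.2f} TB")
print(f"pass-fail:     {pass_fail_bytes / 1e6:.0f} MB")
```

The pass-fail dictionary is smaller by exactly the factor o, the number of observed outputs.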

Dynamic Diagnosis
• Alternative to dictionary-based diagnosis
• Fault simulation is only done for certain faults,
based on test results
– Only simulate faults in input cones of failing
flip-flops/outputs
• Dictionary is eliminated, but requires complete
netlist and test pattern file
• Used by most commercial ATPG tools: Mentor
Fastscan, Synopsys, Cadence, etc.


Algorithm Details
• Role of a diagnosis algorithm
• Scoring methods
• Types of diagnosis algorithms

Diagnosis Algorithms
• Algorithms compare observed behavior to
predicted behaviors
• An algorithm attempts to “explain” the
observed failures with fault candidates
• The job of a diagnosis algorithm is to report
the best fault candidate(s)
• “Best” is determined by scoring method

Fault Candidate Scoring
• Two common scoring methods
– Match/mismatch points
– Fault candidate probability
• Other common scorings:
– Hamming distance
– Set intersection/overlap
– Nearest neighbor

Match/mismatch Point Scoring
• Award points for matching observed failures
• Optionally deduct points for not predicting fails
• Nonprediction: A behavior not predicted by
candidate
• Misprediction: A prediction not fulfilled by
behavior
• Commercial tools (e.g. Fastscan) are usually
biased to lowest nonprediction
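
The terms above can be made concrete with a small sketch. The point values, candidate names, and failing-test sets are assumptions for illustration. The ranking rule shown (fewest nonpredictions first, then fewest mispredictions) is one simple way to express a bias toward lowest nonprediction, not the rule used by any specific tool.

```python
def score(observed_fails, predicted_fails):
    """Return (matches, nonpredictions, mispredictions) for one candidate."""
    observed, predicted = set(observed_fails), set(predicted_fails)
    matches = len(observed & predicted)
    nonpredictions = len(observed - predicted)   # observed fails the candidate misses
    mispredictions = len(predicted - observed)   # predicted fails that never occurred
    return matches, nonpredictions, mispredictions

def points(observed_fails, predicted_fails, miss_penalty=0):
    """Award one point per match, optionally deducting for each nonprediction."""
    matches, nonpredictions, _ = score(observed_fails, predicted_fails)
    return matches - miss_penalty * nonpredictions

def rank(observed_fails, candidates):
    """Sort candidate names by fewest nonpredictions, then fewest mispredictions."""
    def key(name):
        _, nonpred, mispred = score(observed_fails, candidates[name])
        return (nonpred, mispred, name)
    return sorted(candidates, key=key)

# Hypothetical data: indices of failing test vectors.
OBSERVED = {1, 3, 5, 7}
CANDIDATES = {
    "A/sa1": {1, 3, 5, 7, 9},   # explains everything, plus one misprediction
    "B/sa0": {1, 3, 5},         # leaves one observed failure unexplained
    "C/sa1": {1, 3, 5, 7},      # exact match
}

print(rank(OBSERVED, CANDIDATES))
```

Under this ordering a candidate that over-predicts ranks above one that leaves an observed failure unexplained.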

Probabilistic Scoring
• Probability score based on matches and
mismatches and error assumptions
– Weights for non- and mis-prediction
– Different prediction probabilities for different
fault candidates (bridges vs. stuck-at)
• Usually normalized so that total of all
candidates equals 1.0
• UCSC method uses probabilities to compare
stuck-at candidates to bridges in same
diagnosis
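
A generic version of this idea is sketched below. It is not the UCSC method: the independence assumption across test vectors, the error probabilities, and the candidate data are all illustrative assumptions. It only shows how per-model error weights and normalization let stuck-at and bridging candidates share one ranked list.

```python
import math

# Assumed error probabilities per fault model (hypothetical values):
#   nonpred = chance of an observed fail where the candidate predicts a pass
#   mispred = chance of an observed pass where the candidate predicts a fail
MODEL_ERRORS = {
    "stuck-at": {"nonpred": 0.05, "mispred": 0.05},
    "bridge": {"nonpred": 0.05, "mispred": 0.30},
}

def likelihood(observed, predicted, model):
    """P(observed | candidate), treating test vectors as independent."""
    errors = MODEL_ERRORS[model]
    result = 1.0
    for obs, pred in zip(observed, predicted):
        if pred == "1":
            result *= (1 - errors["mispred"]) if obs == "1" else errors["mispred"]
        else:
            result *= errors["nonpred"] if obs == "1" else (1 - errors["nonpred"])
    return result

def normalize(scores):
    """Scale scores so that all candidates together sum to 1.0."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate has nonzero likelihood")
    return {name: value / total for name, value in scores.items()}

OBSERVED = "1010"  # one bit per test vector: 1 = fail, 0 = pass
CANDIDATES = {
    "A/sa1": ("stuck-at", "1010"),
    "X-Y bridge": ("bridge", "1110"),
    "B/sa0": ("stuck-at", "0010"),
}

probabilities = normalize(
    {name: likelihood(OBSERVED, pred, model) for name, (model, pred) in CANDIDATES.items()}
)
best = max(probabilities, key=probabilities.get)
print(best, round(probabilities[best], 3))
```

The bridge candidate is charged less for its one misprediction than a stuck-at candidate would be, which is what giving different fault models different prediction probabilities means in practice.
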
Types of Diagnosis Algorithms
• Stuck-at
– Most common, best supported by tools
– Surprisingly effective (~60% exact matches)
– Very fast
• IDDQ
– Orthogonal set of failing data
– Requires interpretation of tester results
– Not well supported by tools

IDDQ Threshold Setting
[Figure: IDDQ threshold-setting plot; x-axis 0 to 200, y-axis 0 to 180. Axis labels and data are not recoverable.]

Types of Diagnosis Algorithms
(Cont)
• Bridging-fault
– May better represent common CMOS faults
– More complicated fault model
– Biggest problem: candidate selection
• Other possible (future) directions:
– Functional fails
– Delay fails
– Parametric failures


Diagnosis in Practice
• Using a diagnosis
• Translating the results: circuit navigation
• Evaluating diagnosis quality
• Commercial diagnosis tools

Using a Diagnosis
• Fault diagnosis is used to aid physical
inspection and root-cause identification
• Diagnosis output is logical, not physical:
– Abstract faults (such as stuck-at)
– Gates, ports (nodes), and nets
– No information about location or size
• Translation to physical location requires
navigation of circuit

Types of Circuit Navigation
• Netlist
– Examine RTL (Verilog/VHDL etc) for gates
and data paths
• Schematic
– Symbolic view of gates and wires
• Layout/artwork
– Graphical view of metal lines, poly, vias,
cell boundaries, etc.

Circuit Netlist
module TOP (CLK, Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin, Wr_RAM, Wr_Rreg,
            RAM_Addr, ATG_TESTMODE, BIST_TESTMODE, SDout, TwoOnes, OneOne, NoOnes, TwoZeros,
            OneZero, NoZeros);
  input CLK;
  inout Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin, Wr_RAM;
  inout [2:0] RAM_Addr;
  inout ATG_TESTMODE;
  inout BIST_TESTMODE;
  inout SDout, OneZero, NoZeros;
  inout TwoOnes, OneOne, NoOnes, TwoZeros, Wr_Rreg;

  // Tie off cells
  TLOW tielow1 (.Q(tielow));
  THIGH tiehigh1 (.Q(tiehigh));

  // Inverted CLK
  wire CLK_N;
  INVFF clkinv (.Q(CLK_N), .A(CLK));

  // PADS
  PADNMIOSCM0H08N05B50 PAD001_StartOut (.PUEN(tiehigh),
      .PDE(tielow),
      .IEN(tielow), .I(StartOut_I), .SIGNAME(StartOut),
      .INMODE(in_mode_avail), .TESTI(jumper001),
      .TESTIEN(tiehigh), .SCANIN(jumper001),
      .OUTMODE(out_mode_avail), .TESTO(tiehigh), .TESTOEN(tiehigh),
      .O(tielow), .OEN(tiehigh));

Netlist Navigation
• Either use text editor on netlist, or use
browser function in simulator
• Browsers allow you to trace forward and
backward and see logic values
• Can be used to view hierarchy and functional
blocks
• Can be tedious

Circuit Schematic
Schematic Navigation
• Either hand-drawn (from netlist navigation) or
tool-generated gate symbols and wires
• Schematic tools in simulators also allow
forward and backward traversal and display
of logic values
• Used to verify fault propagation
• Does not reflect physical distances

Circuit Artwork
Layout (Artwork) Navigation
• Use routing/floorplanning tools to view artwork
• Can usually input cell or wire name and tool will
highlight the object
• Useful for determining (x,y) values
• Also good for evaluating physical implications of
a set of fault candidates
– Faults clustered in a small area are good
– Faults/nets spread around large die areas are
bad
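
One rough way to quantify "clustered" versus "spread out" is the bounding box of the candidate locations relative to the die. The coordinates, die size, and function names here are hypothetical, and real layout tools may expose this information quite differently.

```python
def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) locations."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def spread_fraction(points, die_width, die_height):
    """Fraction of the die area covered by the candidates' bounding box."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    return ((x_max - x_min) * (y_max - y_min)) / (die_width * die_height)

# Hypothetical candidate locations in microns on a 5000 x 5000 die.
DIE = (5000, 5000)
clustered = [(1200, 3400), (1210, 3390), (1250, 3420)]
scattered = [(100, 200), (4800, 4500), (2500, 100)]

print(f"clustered: {spread_fraction(clustered, *DIE):.5f}")
print(f"scattered: {spread_fraction(scattered, *DIE):.4f}")
```

A small fraction suggests a region that can realistically be inspected, while a value near 1 means the candidates span most of the die.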

Fault Proximity
[Figure: layout views with two callouts]
• Net runs across die: physical examination is almost
impossible
• Faults contained in small area: physical examination
is possible

Evaluating a Diagnosis
• A diagnosis without one or a few strong (high-scoring)
candidates is usually poor
• Can indicate:
– Multiple defects
– Unmodeled (complex) behavior
– Inappropriate algorithm
• If the diagnosis is poor, either try another
algorithm or look for more data (failures)

Evaluating a Diagnosis (cont)
• Many diagnoses (~60%) implicate a single
stuck-at fault
• Usually a good sign, but you must consider
equivalent faults
• Many defects can mimic a stuck-at fault,
without being a short to Vdd or Gnd
• Consider nearby nodes also, if practical

Dominance Bridging Fault
[Figure labels: Strong inverter, Weak inverter, FIB short; Candidate #1, Candidate #2, Candidate #3. Callouts: “Top candidate is stuck-at fault on this node.” and “Candidate #2 is Best”.]

Commercial Tool:
Mentor Graphics
• ATPG tool: Fastscan
• Stuck-at diagnosis only
• No IDDQ capability
• Orders candidates by number of matched
failures (biased to lowest non-prediction)
• Also has netlist & schematic browser
• Based on Waicukauski & Lindbloom (D&T ’89)

Commercial Tool: Synopsys
• ATPG tool: TetraMAX
• J. Waicukauski moved to Synopsys after
writing Fastscan
• Diagnosis capability unknown: assumed to be
similar to Fastscan

Commercial Tool: Cadence
• ATPG tool: Encounter Test
• Test and diagnosis tools purchased from IBM
• IBM has had good diagnosis research, but
Encounter’s capabilities are unknown
• Also of interest: Silicon Ensemble (routing tool)
– Graphical artwork viewer
– Good for highlighting nets and cells based on
diagnosis results
– Good for determining (x,y) and producing screen
shots


Prior Art
• Waicukauski & Lindbloom, IEEE Design & Test, Aug. ’89
– Most widely used algorithm for commercial tools
– Finds candidates to match individual tests, attempts to “explain”
all failing tests
• Abramovici & Breuer, IEEE Trans. Computers, June ’80
– Effect-cause diagnosis
– Permanent stuck-at fault assumption
• Aitken & Maxwell, HP Journal, Feb. ’95
– Analysis of relative importance of models vs. algorithms
• Lavo, Larrabee, et al., Proceedings of ITC ’98
– Probabilistic scoring
– Mixed-model diagnosis
• Bartenstein et al., Proceedings of ITC ’01
– SLAT: Single Location At-a-Time diagnosis
– Focus on matching per-vector results

Prior Art (cont)
• Jee & Ferguson, Proceedings of ISTFA ’93
– Carafe – Inductive Fault Analysis (IFA)
– Examine circuit to determine likely failure locations
• Aitken, Proceedings of ITC ’95
– Using FIBs to insert defects
– Calibrate/evaluate diagnosis methods
• Henderson & Soden, Proceedings of ITC ’97
– Probabilistic physical failure analysis
• Nigh, Vallett, et al., Proceedings of ITC ’98
– Large-scale, multi-company SEMATECH experiment
– Failure analysis of timing and IDDQ fails

Research Directions
• Complex defect behaviors
– Beyond stuck-at and 2-line bridges
– Intermittent faults
– Delay and timing-related defects
– Parametric & process-related defects
– Multiple simultaneous defects
– Is there a simple, inductive way to infer
complex defects?

Research Directions (cont)
• Diagnosability
– What makes a particular circuit easy or
hard to diagnose?
– What can we do to make diagnosis easier?
• Evaluation of diagnoses
– What makes a good diagnosis?
– Can we quantify our confidence in a
diagnosis?

Research Directions (cont)
• Integration with physical FA & yield improvement
– Can we incorporate process information?
– Can we produce a “physical diagnosis”?
– On-line (or even on-chip) diagnosis
• Commercial toolflow integration
– Can diagnosis tools use industry-standard data
formats?
– Can commercial tools be scripted or
programmed to do better diagnosis?
