
Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs

Hardened By Design vs. Design-Level Hardening


Gary M. Swift and Ramin Roosta
Jet Propulsion Laboratory / California Institute of Technology

The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

Swift and Roosta

144_C4 / MAPLD04

In the beginning was Actel


Leveraging a commercial product line
ONO anti-fuse based, one-time programmable (OTP)
beginning = 1993
Reference: Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.; SEU hardening of field programmable gate arrays (FPGAs) for space applications and device characterization, IEEE Transactions on Nuclear Science, Dec. 1994


Later, Xilinx
Leveraging a commercial product line
SRAM based, reconfigurable
later = 1998
Reference: Guertin, S.M.; Swift, G.M.; Nguyen, D.; Single-event upset test results for the Xilinx XQ1701L PROM, Radiation Effects Data Workshop Record, 1999

Quote:
(Xilinx SRAM-based FPGAs) do appear suited to a broad range of other (non-critical) applications, such as sensor and camera controllers.


OUTLINE
FPGAs: A key enabling technology for modern spacecraft
Background in radiation testing of FPGAs
  - Earlier, Katz/Swift collaboration
  - Recently, Xilinx Consortium
Feature Comparison
Triple Modular Redundancy (TMR)
  - hardware approach vs. software approach
Concluding Remarks

FPGAs: A key enabling technology


Like custom ASICs, FPGAs can replace whole boards
  - Saving mass, volume, power
  - Achieving extra functionality
FPGAs are much cheaper than ASICs
Design efforts can be later in the schedule
Design mistakes don't require a re-spin through the foundry


MER Pyro-Controller

Used self-checking of configuration to initiate a reconfiguration after spotting an upset
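
The slide does not say how the self-check was implemented. As a minimal sketch of the general idea, assuming a CRC comparison against a known-good image (the function names, image contents, and the CRC choice are all illustrative assumptions, not the MER design), a check-then-reload step might look like this:

```python
import zlib

def config_is_intact(config: bytes, golden_crc: int) -> bool:
    """Return True when the configuration image still matches its reference CRC."""
    return zlib.crc32(config) == golden_crc

def check_and_reconfigure(config: bytearray, golden: bytes) -> bool:
    """Reload the golden image if a mismatch is found.

    Returns True if a reload happened. The slice assignment is only a
    stand-in for a real reconfiguration from external storage.
    """
    if config_is_intact(bytes(config), zlib.crc32(golden)):
        return False
    config[:] = golden
    return True

# Made-up configuration contents, for illustration only.
golden_image = bytes(range(256)) * 4
live_config = bytearray(golden_image)
live_config[100] ^= 0x08          # simulate a single-bit upset
reloaded = check_and_reconfigure(live_config, golden_image)
```

A CRC-32 detects every single-bit error, so one flipped configuration bit is enough to trigger the reload in this sketch.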



MER Pyro-Controller: Nearing Mars

[Figure: number of upsets (0 to 30) vs. days after launch (0 to 150) for the Xilinx XQR4062XL, showing the predicted curve alongside MER-A and MER-B; Oct. 28 and Nov. 23 are marked for both MER-A and MER-B.]

My Background
Actel experience is older
  - No direct involvement in radiation tests since the ONO anti-fuse was replaced
  - Results here are from others' work
Xilinx experience is recent
  - Active participant in Xilinx Rad Test Consortium
  - Currently, finishing two+ year test campaign targeting the Virtex II family


Currently Available Devices

Actel RT54SX-S family (-SU)

vs.

Xilinx Virtex II family

Note: both are essentially immune to single-event latchup and have good total ionizing dose tolerance, [ Actel > 135 krad(Si); Xilinx > 200 krad(Si) ]


Main Feature Comparison


                   Actel RT54SX72S    Xilinx XQR2V6000
Gates:             72,000             ~6M ( /~3.2 )
Flip-flops:        2012               67,584 / 3.2 = ~20k
I/O pins:          360                824 / 3 = 274
Speed, external:   230 MHz            622 Mb/s (I-mode LVDS)
Speed, internal:   310 MHz            360 MHz


Extra Features Comparison


                   Actel RT54SX72S    Xilinx XQR2V6000
Block RAM:         no                 2.5 Mb
I/O standards:     many               many
Others:            hardwired TMR      Clock Manager, Multipliers


Actel: What bits can upset?


User flip-flops only
  - Direct hits of same flip-flop in multiple domains: very unlikely due to layout
  - Clock domain hits
SEFI modes essentially eliminated


Xilinx: What bits can upset?


Configuration bits
  - Logical function
  - Routing
  - User options
Block RAM
User flip-flops
Control registers

Examples called out on the slide: NAND, Ex-OR, flip-flop type, etc.; type of I/O, mode of Block RAM access, Clock Manager, etc.

Xilinx: Heavy Ion Test Results


[Figure: cross section per bit (cm^2), log scale from 1.E-11 to 1.E-07, vs. LET (MeV-cm^2/mg), 0 to 70. Data: X-2V1000 configuration bits with a Weibull curve fit. Annotations: low threshold (soft); low susceptibility (hard).]

Resulting in fairly low in-space rates: ~6 per day for 2V6000 in GCRmin.
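
The slide names a Weibull curve fit but does not give its parameters. As a sketch, the four-parameter Weibull form commonly used for SEU cross-section data can be written as below. The parameter values are placeholders chosen only so the code runs; they are not the fitted values for the X-2V1000.

```python
import math

def weibull_cross_section(let, sigma_sat, let_th, width, shape):
    """Four-parameter Weibull form often used for SEU cross-section fits.

    let          : LET in MeV-cm^2/mg
    sigma_sat    : saturated cross section per bit (cm^2)
    let_th       : threshold (onset) LET
    width, shape : Weibull width and shape parameters
    """
    if let <= let_th:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - let_th) / width) ** shape)))

# Placeholder parameters, NOT the fitted values from the slide.
params = dict(sigma_sat=1e-8, let_th=1.0, width=20.0, shape=1.5)

for let in (1, 5, 20, 70):
    print(let, weibull_cross_section(let, **params))
```

An in-space rate estimate then folds a curve of this kind with the LET spectrum of the environment (here, GCRmin) and multiplies by the number of bits; that step needs an environment model and is not shown.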

Actel: Heavy Ion Test Results


Where's the threshold?

[Figure: cross section (cm^2), log scale from 1.E-10 to 1.E-09, vs. LET (MeV-cm^2/mg), 0 to 120. Data for two RTAX2000S prototypes (labeled 307 and 315) at 1 MHz using a checkerboard pattern; from Fig. 12, J.J. Wang et al., NSREC 2003 [Ref. 1]. Annotation: low susceptibility (~100x harder).]

Very low in-space rates (assume LETth > 40 achieved): ~1 per 6800 years for SX72-S in GCRmin.

Actel-style TMR
[Figure: the SX-A R cell is triplicated to form the RTSX-S R cell.]


Actel-style TMR
Actel-style TMR is fairly straightforward:
  - Each flip-flop is replaced by three plus feedback voter
  - Triplicated elements spread out physically
  - Uses one clock/inverse-clock domain
  - No external parts needed
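
To make "three plus feedback voter" concrete, here is a small behavioral sketch. It is an illustration under assumptions, not a model of the actual RTSX-S cell: the class name, the `enable` input, and the hold-by-reloading-the-voted-value behavior are all hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)

class TmrFlipFlop:
    """One logical flip-flop built from three copies plus a feedback voter."""

    def __init__(self, value: int = 0):
        self.copies = [value, value, value]

    @property
    def q(self) -> int:
        # The visible output is always the voted value.
        return majority(*self.copies)

    def clock(self, d: int, enable: bool = True) -> None:
        # When not enabled, every copy reloads the voted output, which
        # repairs a copy that was upset since the last clock edge.
        nxt = d if enable else self.q
        self.copies = [nxt, nxt, nxt]

ff = TmrFlipFlop(1)
ff.copies[1] ^= 1            # upset one of the three copies
voted_after_upset = ff.q     # still 1: the other two copies outvote it
ff.clock(0, enable=False)    # a hold cycle restores all three copies
```

A single upset never reaches the output because two copies still agree, and the feedback path keeps a second, later upset from combining with the first.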


Xilinx-style TMR
Xilinx-style TMR is more complicated:
  - First, it's not too useful without configuration scrubbing
  - Whole functional blocks are triplicated, not individual flip-flops
  - Three voters are used
  - Three clock domains
  - Elimination of:
      - Weak keepers (aka half latches)
      - Use of configuration cells as part of the design
          - For example, SRL16
  - Needs some external circuitry (at least, a watchdog timer + PROMs)




Xilinx-style TMR
In Xilinx-style TMR, I/Os use three pins tied externally:

[Figure: three minority voters, one per domain, each controlling an output pin (D0, D1, D2); board traces tie the three pins together into a single signal D.]
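
The slide shows the minority voters only as a diagram. A hedged behavioral sketch of the idea follows: each domain compares its own value with the other two and stops driving its pin when it is the odd one out, so the tied-together board net is driven only by domains that agree. The function names and the use of `None` for a tri-stated buffer are assumptions made for illustration.

```python
from typing import Optional, Sequence

def in_minority(own: int, other_a: int, other_b: int) -> bool:
    """True when this domain's bit disagrees with both other domains."""
    return own != other_a and own != other_b

def pin_drivers(values: Sequence[int]) -> list:
    """For each domain, the bit it drives, or None if its buffer is tri-stated."""
    drivers = []
    for i, own in enumerate(values):
        others = [v for j, v in enumerate(values) if j != i]
        drivers.append(None if in_minority(own, *others) else own)
    return drivers

def board_net(drivers) -> Optional[int]:
    """Resolve the tied-together net; None means nothing is driving it."""
    active = {d for d in drivers if d is not None}
    if len(active) > 1:
        raise ValueError("contention: drivers disagree")
    return active.pop() if active else None

# Domain 1 has been upset and disagrees with domains 0 and 2.
drivers = pin_drivers([1, 0, 1])
net_value = board_net(drivers)
```

With single-bit values, a domain that disagrees with both others is always outvoted, so the remaining drivers never fight each other on the shared trace.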



Xilinx TMRtool
Xilinx-style TMR done by hand is difficult and tedious
An automated tool which integrates into the design flow has been developed (now available)
In-beam testing shows the tool is very effective

[Figure: TMRtool design flow. Blocks: Design Entry; Simulation; XTMR; XILINX Implementation (Translate, Map, Floorplan, Par, BitGen); XILINX Back-Annotation (Timing, ncd2edif, ncd2vhdl, ncd2verilog); FPGA. File types on the connecting arrows: EDIF, NGC, EDIF TMR, NGO, NCD, BIT.]

Upset Comparison
ATMR now has eliminated:
  - Upsets of static storage elements, and
  - SEFIs
ATMR upsets from:
  - Transients that are clocked into storage
  - Clock tree hits
Xilinx FPGAs have a small susceptibility to two types of SEFIs:
  - Reset (sometimes only partial)
  - Disable scrub port
XTMR in combination with scrubbing can lower system upset rates below the SEFI rate

Rate Comparison
Actel
  - Dominated by transients
  - Roughly one system error per thousand years (GCRmin)
Xilinx
  - Dominated by SEFI rate
  - Expect one SEFI per ~65 years in GCRmin
  - Expect one system error ~5-20x less often
GCR = Galactic Cosmic Ray background (interplanetary space), almost identical to geosynchronous orbit
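
To put these intervals in mission terms, the sketch below converts a mean time between events into the chance of seeing at least one event during a mission. It assumes events arrive as a Poisson process, assumes a 10-year mission (not a number from the slides), and reads "~5-20x less often" as relative to the ~65-year SEFI interval, which is an interpretation.

```python
import math

def prob_at_least_one(mean_years_between_events: float, mission_years: float) -> float:
    """Chance of one or more events, assuming a Poisson arrival process."""
    return 1.0 - math.exp(-mission_years / mean_years_between_events)

SEFI_INTERVAL_YEARS = 65.0           # from the slide (Xilinx, GCRmin)
ACTEL_ERROR_INTERVAL_YEARS = 1000.0  # from the slide (Actel, GCRmin)
MISSION_YEARS = 10.0                 # assumption for illustration only

# Interpretation: system errors are 5x to 20x rarer than SEFIs.
xilinx_error_interval_years = (5 * SEFI_INTERVAL_YEARS, 20 * SEFI_INTERVAL_YEARS)

for label, years in [
    ("Actel system error", ACTEL_ERROR_INTERVAL_YEARS),
    ("Xilinx SEFI", SEFI_INTERVAL_YEARS),
    ("Xilinx system error (5x)", xilinx_error_interval_years[0]),
    ("Xilinx system error (20x)", xilinx_error_interval_years[1]),
]:
    p = prob_at_least_one(years, MISSION_YEARS)
    print(f"{label}: one per {years:.0f} years, P(>=1 in mission) = {p:.4f}")
```

Under that reading, the Xilinx system-error interval spans roughly 325 to 1300 years, which brackets the Actel figure of about one per thousand years.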

CONCLUSIONS
For the present:
Both can achieve very acceptable radiation tolerance
Actel wins on:
  - Less burden on the designer
  - No auxiliary components
  - Lower SEFI susceptibility
Xilinx wins on:
  - Designer control of the resources vs. hardness tradeoff
  - On-chip feature set
  - Re-configurability
Competition is good.

Acronyms
FPGA - Field Programmable Gate Array
ASIC - Application Specific Integrated Circuit
SEU - Single Event Upset
SEFI - Single Event Functionality Interrupt
TMR - Triple Modular Redundancy
ATMR - Actel-style TMR
XTMR - Xilinx-style TMR
LET - Linear Energy Transfer (proportional to deposited charge per micron for a heavy ion strike on an active node)
GCRmin - Galactic Cosmic Ray background (highest during solar minimum period of ~11-yr cycle of sunspots)
MER - Mars Exploration Rovers (i.e., Spirit and Opportunity)

Additional References
[1] J.J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, Single event upset and hardening in 0.15 µm antifuse-based field programmable gate array, IEEE Transactions on Nuclear Science, Dec. 2003
[2] Jih-Jong Wang, R.B. Katz, F. Dhaoui, J.L. McCollum, W. Wong, B.E. Cronquist, R.T. Lambertson, E. Hamdy, I. Kleyner, W. Parker, Clock buffer circuit soft errors in antifuse-based field programmable gate arrays, IEEE Transactions on Nuclear Science, Dec. 2000
[3] R. Katz, J.J. Wang, R. Koga, K.A. LaBel, J. McCollum, R. Brown, R.A. Reed, B. Cronquist, S. Crain, T. Scott, W. Paolini, B. Sin, Current radiation issues for programmable elements and devices, IEEE Transactions on Nuclear Science, Dec. 1998
