
A 32-Bit RISC Processor With Concurrent Error Detection

A. Maamar, G. Russell
Department of Electrical and Electronic Engineering, University of Newcastle, Newcastle upon Tyne, UK

This paper describes the design and implementation of a 32-bit RISC Processor with a Concurrent Error Detection (CED) capability. The CED scheme uses Dong's Code, in which the error detection capability depends upon the number of checkbits used and not upon the number of data bits, and hence can be made application specific. The equations used for check symbol prediction of both arithmetic and logical functions are outlined, and their incorporation in a 32-bit Fault-Tolerant RISC processor is described.

1. Introduction
As the scale of integration increases, circuits become more susceptible to sources of transient or intermittent faults. The characteristics of intermittent faults, and the increased use of complex VLSI circuits in safety critical applications, necessitate the use of a test strategy which continuously monitors the operation of circuits and compares it with some known reference. This approach is usually referred to as Concurrent Error Detection (CED). One approach to incorporating a CED capability into VLSI circuits, which has been shown to be viable, is the use of information redundancy (coding techniques); several RISC processors incorporating information redundancy schemes have been designed and fabricated[1,2]. Invariably, the incorporation of CED schemes incurs penalties on a design in terms of area overheads resulting from the additional hardware and routing space necessary to implement the scheme; the area overhead incurred is a function of the number of checkbits used in the coding scheme. Amongst all of the separable codes used in CED schemes, Berger Code[3] is the least redundant separable code capable of detecting all unidirectional errors. There are, however, many applications where the detection of all possible unidirectional errors is unnecessary, and this has led to the development of several versions of the modified Berger Code. One of these versions is Dong's Code[4]; although it requires fewer check bits, it has slightly reduced error detection capabilities. Within the code the number of checkbits used is a function of the error detection capability required, and not simply of the number of information bits in the data word as in the full Berger Code. This gives a degree of flexibility of application depending upon the type of system into which it is incorporated, trading the area overhead of its implementation against error detection capability. In this paper the mathematical equations for the prediction of the check symbols in Dong's Code are outlined, together with a brief description of the implementation of the check symbol prediction hardware in the design of a 32-bit RISC Processor with a Concurrent Error Detection (CED) capability.

2. Code Construction
To construct Dong's Code, it is first necessary to set the maximum weight (m) of the unidirectional errors that the code must detect, regardless of the number of information bits. The check symbol of the code consists of two parts, referred to as Ck1 and Ck2. The number of bits in Ck1 is j, where $j = \lceil \log_2(m+1) \rceil$. Ck1 is equal to the number of zeros in the information bits taken modulo (m+1), represented in binary in j bits. To obtain Ck2, Dong[4] simply complements Ck1 bit by bit. In this paper, however, Ck2 is generated by counting the number of zeros in Ck1 and representing the result in binary form. This reduces the number of bits in Ck2 by at least one bit, and the saving increases as the number of bits in Ck1 increases, without affecting the error detection capability.

The code detects all unidirectional errors except those which affect only the information bits and have weight equal to (m+1) or its multiples[4]. For example, if m is set to 7, the number of information bits is 32, and the errors affect only the information bits, then the code detects every unidirectional error except those of weight 8, 16, 24 and 32.

The code can also detect some other types of errors. For example, if the checkbits are affected by any number of unidirectional errors, then the code can detect all types of errors (unidirectional or bi-directional) affecting the information bits. This follows from the fact that the check symbols of the code form a set of unordered words, in which no check symbol can be changed into another by any unidirectional error; this is an advantage over the Berger Code itself. The error detection capability of the code is summarised in Table(1).

1089-6503/98 $10.00 © 1998 IEEE

Table(1) Error detection capability of Dong's Code. Recoverable headings and entries: Type of error affecting the Information Bits; 0→1 OR 1→0; 0→1 AND 1→0; All Errors; (m+1) or its multiples.

3. Check Symbol Prediction (CSP)

Check symbol prediction is one of the schemes used to perform concurrent error detection in arithmetic and logic functions, and it has been considered for some codes. In this section the mathematical equations for check symbol prediction using Dong's Code, for arithmetic and logic functions, are outlined. The equations are based on the mathematical foundation for the prediction of Berger Codes described in [5]. These equations are used to design the check symbol prediction circuit needed to implement concurrent error detection in the ALU of the processor.

Given an arithmetic or logic operation S = X op Y, where the operands X and Y are coded into Dong's Code, let $X_{Ck1}$ and $Y_{Ck1}$ be the Ck1 check symbols of the operands. The predicted Ck1 of the result ($S_{Ck1}$) can be computed from X, Y, $X_{Ck1}$ and $Y_{Ck1}$, i.e. $S_{Ck1} = f(X, Y, X_{Ck1}, Y_{Ck1})$. Only Ck1 of the result needs to be predicted; Ck2 is then generated from Ck1. The equations for computing the check symbols for a range of arithmetic and logic operations are given below:

3.2 - CSP for Subtraction

$Ck1 = (2^{j+1} + X_{Ck1} - Y_{Ck1} - C_{Ck1} - C_{in} + C_{out}) \bmod (m+1)$

3.3 - CSP for Arithmetic Shift Left

$Ck1 = (2^{j+1} + X_{Ck1} + C_{out}) \bmod (m+1)$

3.4 - CSP for Arithmetic Shift Right

$Ck1 = (2^{j+1} + X_{Ck1} + C_{out} - X_{n-1}) \bmod (m+1)$

where $X_{n-1}$ is the sign bit of X.

3.5 - CSP for Rotate Operation

$Ck1 = X_{Ck1}$

3.6 - CSP for Complement Operation

$Ck1 = (2^{j+1} - X_{Ck1}) \bmod (m+1)$

3.7 - CSP for OR Operation

$Ck1 = (2^{j+1} + X_{Ck1} + Y_{Ck1} - (X \cap Y)_{Ck1}) \bmod (m+1)$

3.8 - CSP for AND Operation

$Ck1 = (2^{j+1} + X_{Ck1} + Y_{Ck1} - (X \cup Y)_{Ck1}) \bmod (m+1)$

3.9 - CSP for XOR Operation

$Ck1 = (2^{j+1} + (X \cup Y)_{Ck1} - (X \cap Y)_{Ck1} - 1) \bmod (m+1)$

3.10 - CSP for EX-NOR Operation

$Ck1 = (2^{j+1} + (X \cap Y)_{Ck1} + (\bar{X} \cap \bar{Y})_{Ck1} + 1) \bmod (m+1)$

3.11 - CSP for Increment Operation

$Ck1 = (2^{j+1} + X_{Ck1} + C_{out} - C_{Ck1} - C_{in}) \bmod (m+1)$    (11)

3.12 - CSP for Decrement Operation

$Ck1 = (2^{j+1} + X_{Ck1} + C_{out} - C_{Ck1}) \bmod (m+1)$    (12)
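As an illustration only, the following Python sketch checks a subset of the prediction equations in this section (subtraction, increment, decrement, the two arithmetic shifts, rotate, complement, OR and AND) against the Ck1 value computed directly from the result. It rests on several assumptions that are not stated in the text: m = 7 and 32-bit operands; $C_{Ck1}$ is taken to be the Ck1 of the 32-bit word of carries out of each bit position (including the carry-out); subtraction is modelled as X plus the complement of Y with $C_{in}$ = 1; decrement is modelled as X plus the all-ones word with $C_{in}$ = 0; and the rotate is a one-bit left rotate. All function names are hypothetical.

```python
import random

N = 32                    # operand width
MASK = (1 << N) - 1
M = 7                     # assumed m, so check symbols are modulo m + 1 = 8
MOD = M + 1
J = M.bit_length()        # j = ceil(log2(m + 1)) = 3
BIAS = 1 << (J + 1)       # the 2^(j+1) term; a multiple of MOD for m = 7


def ck1(value):
    """Ck1 of an N-bit value: number of zeros modulo (m + 1)."""
    return (N - bin(value & MASK).count("1")) % MOD


def add_with_carries(x, y, c_in):
    """Ripple-carry add. Returns (sum, carries, c_out), where bit i of
    `carries` holds the carry out of bit position i (assumed convention)."""
    s, carries, c = 0, 0, c_in
    for i in range(N):
        a, b = (x >> i) & 1, (y >> i) & 1
        s |= (a ^ b ^ c) << i
        c = (a & b) | (a & c) | (b & c)
        carries |= c << i
    return s, carries, c


def predictions(x, y):
    """Yield (operation, actual result, predicted Ck1) tuples."""
    msb, lsb = x >> (N - 1), x & 1

    s, c, c_out = add_with_carries(x, ~y & MASK, 1)      # X - Y, C_in = 1
    yield "sub", s, (BIAS + ck1(x) - ck1(y) - ck1(c) - 1 + c_out) % MOD

    s, c, c_out = add_with_carries(x, 0, 1)              # X + 1, C_in = 1
    yield "inc", s, (BIAS + ck1(x) + c_out - ck1(c) - 1) % MOD

    s, c, c_out = add_with_carries(x, MASK, 0)           # X - 1, C_in = 0
    yield "dec", s, (BIAS + ck1(x) + c_out - ck1(c)) % MOD

    # Shifts: C_out is the bit shifted out of the word.
    yield "asl", (x << 1) & MASK, (BIAS + ck1(x) + msb) % MOD
    yield "asr", (x >> 1) | (msb << (N - 1)), (BIAS + ck1(x) + lsb - msb) % MOD
    yield "rot", ((x << 1) | msb) & MASK, ck1(x)
    yield "not", ~x & MASK, (BIAS - ck1(x)) % MOD
    yield "or", x | y, (BIAS + ck1(x) + ck1(y) - ck1(x & y)) % MOD
    yield "and", x & y, (BIAS + ck1(x) + ck1(y) - ck1(x | y)) % MOD


def mismatches(trials=1000, seed=1):
    """Return every (operation, x, y) whose predicted Ck1 is wrong."""
    rng = random.Random(seed)
    bad = []
    for _ in range(trials):
        x, y = rng.getrandbits(N), rng.getrandbits(N)
        bad += [(op, x, y) for op, result, pred in predictions(x, y)
                if ck1(result) != pred]
    return bad


if __name__ == "__main__":
    print(mismatches())
```

Under these assumptions the list of mismatches is empty, because for m = 7 both $2^{j+1}$ and the word length 32 are multiples of (m+1). For values of m where (m+1) is not a power of two this sketch would not apply as written.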

4. RISC Architecture
A functional block diagram of the RISC processor is shown in Figure(1). The RISC processor is divided into two main blocks, the information processing block and the check symbol prediction block. The function and composition of the information processing and check symbol prediction blocks are outlined below.

4.1 Information Processing Block

The information processing block fetches and executes arithmetic and logic operations, and performs data transfers from/to the register file and the external memory. The block consists of the ALU, the Data Register File (DRF), and the Control Unit.

a) ALU: The ALU is designed as three separate units, namely the Arithmetic Unit, the Logic Unit, and the Shifter-Rotator Unit. The reason for the separation is that internal carry generation is the time-consuming part of the arithmetic unit; hence, other operations which do not require internal carry generation can be performed faster if they are separated from the arithmetic operations. To speed up the arithmetic unit, the carry generator should be designed to be as fast as possible. There are several different methods of generating the internal carries, and choosing one of them is a trade-off between speed and silicon area. In this design the carry generation circuitry initially adopted is that proposed by Brent and Kung[7], since it is very suitable for a VLSI implementation and takes only 16 gate levels to generate the internal carries for a 32-bit operand. However, during the implementation it was discovered that the design can be modified to reduce the number of gate levels to 12 without affecting its regularity for VLSI implementation.

b) Control Unit: The control unit is the most complex module in the RISC processor. Its function is to fetch and execute the instructions, and to provide the sequences of operations required by the other blocks in the circuit to perform their functions. The control unit uses several special purpose registers, such as the program counter and the status register, to perform its task. All the instructions are executed in one clock cycle; however, the number of internal cycles depends upon the operation to be performed. To overcome the processor-memory bottleneck problem, memory access operations are kept to a minimum: only Load/Store instructions are used to communicate with the memory. The Load instruction reads data from the memory and stores it in the register file. The Store instruction reads data from the register file and writes it into the memory. All other instructions are Register-Register instructions.

c) Register File: The register file consists of two independent register files: the Data Register File (DRF) for storing data, and the Check Symbol Register File (CSRF) to hold the check symbols of the data. Each code word stored in the register file is divided into two parts, the information part stored in register $R_i$ of the DRF, and the check symbol part stored in the corresponding register $R_i$ of the CSRF. Each file has its own address decoding hardware. Thus addressing errors can be detected by a mismatch between the new check symbol generated for the data word and its stored check symbol, since the probability of both registers being in error simultaneously with the same fault is very low. The DRF consists of 32 × 32-bit general purpose registers available to the user, and the CSRF consists of 32 registers, each of 5 bits, used to store the check symbols.
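The way independent address decoding exposes an addressing error can be sketched as follows. This is a behavioural model only, assuming m = 7; the class `CheckedRegisterFile`, the `drf_decoder_fault` parameter and the packing of Ck1 and Ck2 into one 5-bit value are hypothetical illustrations, not details taken from the design.

```python
N_BITS = 32
M = 7
J = M.bit_length()            # 3-bit Ck1 for m = 7


def check_symbol(word):
    """5-bit check symbol: 3-bit Ck1 followed by 2-bit Ck2 (assumed packing)."""
    zeros = N_BITS - bin(word & 0xFFFFFFFF).count("1")
    ck1 = zeros % (M + 1)
    ck2 = J - bin(ck1).count("1")      # number of zeros in the 3-bit Ck1
    return (ck1 << 2) | ck2


class CheckedRegisterFile:
    """DRF and CSRF with separate (independently failing) address decoders."""

    def __init__(self, size=32):
        self.drf = [0] * size
        self.csrf = [check_symbol(0)] * size

    def write(self, addr, word):
        self.drf[addr] = word & 0xFFFFFFFF
        self.csrf[addr] = check_symbol(word)

    def read(self, addr, drf_decoder_fault=None):
        """Return (word, error). `drf_decoder_fault`, if given, is the wrong
        register that a faulty DRF decoder selects instead of `addr`."""
        data_addr = addr if drf_decoder_fault is None else drf_decoder_fault
        word = self.drf[data_addr]
        stored = self.csrf[addr]       # CSRF decoder assumed fault free
        return word, check_symbol(word) != stored


if __name__ == "__main__":
    rf = CheckedRegisterFile()
    rf.write(1, 0xFFFFFFFF)            # 0 zeros
    rf.write(2, 0xFFFFFFFE)            # 1 zero
    rf.write(3, 0x00FFFFFF)            # 8 zeros, same Ck1 as register 1
    print(rf.read(1))
    print(rf.read(1, drf_decoder_fault=2))
    print(rf.read(1, drf_decoder_fault=3))
```

In this model a wrong-register read is flagged whenever the two registers hold words with different check symbols. The last call shows the residual case in which the wrongly selected word happens to share the same check symbol, so the mismatch test alone does not flag it.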

4.2. Check Symbol Prediction Block

Figure(2) shows a block diagram of the circuitry which generates the predicted check symbols Ck1 and Ck2; the circuit also generates the actual check symbols Ck1 and Ck2 from the result. To predict the check symbol of the result of most operations, the internal carries of the two operands are used. Using the internal carries generated in the ALU would reduce the cost of the hardware, but it is risky: if the internal carries are affected by an error, the error will affect both the ALU and the Prediction Unit. To date there is no straightforward way of detecting errors affecting the internal carries; therefore, a decision was made to have a separate internal carry generator for the check symbol prediction circuit.

To generate the predicted check symbol of the result of an arithmetic operation, the prediction circuit uses the actual check symbols of the operands and the internal carries generated within the prediction circuitry. To generate the predicted check symbol of the result of a logic operation, the prediction circuit uses only the operand (or operands) and the actual check symbols. When the result becomes available at the output of the ALU, the actual check symbol is generated; at the same time the predicted check symbol of the result of the given operation becomes ready at the output of the predicted check symbol unit. The predicted and the actual check symbols are compared using Totally Self-Checking (TSC) checkers; if they match then the result is error free, otherwise an error signal is generated and the execution sequence is halted.

Figure(1) Block Diagram of the RISC Processor with CED (Note: Control Unit not shown for clarity of the circuit)

Figure(2) Check Symbol Prediction Circuitry (labelled blocks: Operand X, Operand Y, Zeros Counter, Adder/Subtracter, Result from ALU)

5. Error Detection Capability of the RISC

The discussion of the error detection capability of the coding scheme used in the RISC Processor is divided into two parts: a) errors occurring in transferring data from the register file to the ALU, and b) errors occurring in the ALU.

a) Data Transfer Errors: Before any data word is moved from the DRF to the ALU or I/O port, a new check symbol for the data word is generated and compared with the check symbol stored for that data word in the CSRF. If they match, the data word and its check symbol can be moved to the ALU or to the I/O port; if an error is detected, an error signal is activated and the transfer is halted. To move the result from the ALU to its destination (DRF or I/O port), its actual check symbol is first generated and compared with the predicted check symbol produced by the check symbol prediction circuitry. If no error is detected, the data word and its corresponding check symbol are moved; otherwise an error signal is activated. When an error signal is activated, the control unit is informed so that it stops processing the instruction.

b) ALU Errors: The detection of errors in the ALU is discussed in relation to 4 separate cases.

Case 1: Errors which only affect the information bits: Any unidirectional error of weight not equal to (m+1) or its multiples occurring in the Arithmetic Unit (affecting the result only, but not the predicted check symbol) will increase or decrease the number of zeros in the result. The actual check symbol generated from the information bits will then not match the predicted check symbol, and the error can be detected.

Case 2: Errors which only affect the predicted check symbol: When the check symbol prediction circuit generates an incorrect check symbol due to the occurrence of any number of unidirectional errors, the predicted check symbol will not match the actual check symbol, and an error signal will be generated. Any unidirectional error in the prediction circuitry can therefore be detected, since no predicted check symbol can be changed into another predicted check symbol by any number of unidirectional errors.

Case 3: Errors which affect both information bits and check symbol: If the check symbol is affected by unidirectional errors at the same time as the information bits are affected by either unidirectional or bi-directional errors, then the predicted check symbol will not match the actual check symbol, and any type of error of any weight can be detected, since no unidirectional error can change one check symbol into another.

Case 4: Errors affecting the internal carries: Two internal carry generators are used, one to generate the carries needed by the arithmetic unit, the other to generate the internal carries needed by the prediction circuit. If an error occurs in the internal carries in the arithmetic unit, it will affect the actual check symbol but not the predicted check symbol, which uses internal carries that are generated separately; the actual check symbol and the predicted check symbol will therefore not match, and the error can be detected. The only case where an error affecting the internal carries cannot be detected arises if both the internal carries used by the arithmetic unit and the internal carries used by the prediction circuit are affected by the same error; however, the probability of this occurring is very low.

6. Performance

The total delay of the unchecked processor is the delay of reading an operand (or operands) from the register file, the delay through the ALU, and the delay of writing the result back into the register file. In the checked RISC processor the total time for any operation is equal to the time needed to read the operand from the register file, the time to check the operand, the ALU time, the time to check the result, and the time to write the result back into the register file. In other words, the extra time needed by the checked processor is the time for checking the operand and the time for checking the result.

To check the operand, a new check symbol must be generated and then compared with the stored check symbol of the operand; if no error is detected, the operand is transferred to the ALU and its check symbol is transferred to the prediction block. While the ALU is performing the operation on the operand, the prediction circuit is generating the predicted check symbol; the time taken by the ALU to obtain the result is about the same as the time taken by the prediction circuit to generate the predicted check symbol. If the prediction circuit requires more time than the ALU, this does not affect the overall delay, as the check symbol produced by the prediction circuit cannot be used before the check symbol of the result obtained from the ALU has been generated. Simulation results have shown that the circuit delay through the prediction circuitry is much less than that through the ALU and Check Symbol Generator.

From Figure(3) it is seen that the time penalty from introducing Concurrent Error Detection is t1 + t3, where t1 is the time taken by the Check Symbol Generator to generate the check symbol of the operand plus the delay through the TSC checker, and t3 is the time taken by the Check Symbol Generator to generate the check symbol of the result plus the delay through the TSC checker. The total delay overall is equal to 2 × (delay in Check Symbol Generator + delay in TSC). The Check Symbol Generator is a 32-bit mod-8 zeros counter, and its total delay is about 10 gate levels; the TSC delay is 2 gate levels. Therefore the total time penalty is about 24 gate levels. It follows that the ALU has to wait for 12 gate levels before it can start performing the operation, and another 12 gate levels must elapse before the result can be moved back to the register file. To eliminate the extra delay a pipeline structure is used in the RISC design: the execution of the current instruction in the ALU can be overlapped with checking the result of the previous instruction, hence the extra time required to check the result can be neglected.

Figure(3) Delay through RISC Processor (labelled stages: operand reading time; Check Symbol Generator checking time t1, with t1 << t2; critical path processing time t2; Check Symbol Generator; writing time)

7. Hardware Complexity

The penalty of implementing Concurrent Error Detection using information redundancy is not only with respect to time but also area overheads resulting from extra hardware. The extra hardware used is summarised below:

1- Check Symbol Register File (CSRF): This file comprises 32 registers, each 5 bits wide. The CSRF uses three busses to communicate with the other RISC units. Three separate address decoders are used in order to detect any error in reading from the wrong register, or writing to the wrong register, which could not be detected if a common address decoder were used.

2- Checkers: There are three checkers, one for each bus. Each checker is made of a 32-bit mod-8 zeros counter and a two-rail Totally Self-Checking (TSC) checker.

3- Prediction Block: This block generates the predicted check symbol for the result. It consists of a Ck1 generator (comprising the internal carry generator and 3 adders of 3 bits each), a Ck2 generator (comprising one full adder and three inverters), and a 5-bit output latch.

Qualitatively, the area overheads incurred are much less than those of duplication, or of an implementation of Concurrent Error Detection using full Berger Code.
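The Ck2 generator listed under the Prediction Block (one full adder and three inverters) can be modelled at the gate level in a few lines. The sketch below assumes a 3-bit Ck1 (m = 7), so that Ck2 is the 2-bit count of zeros in Ck1; the function names are hypothetical. Inverting the three Ck1 bits turns each zero into a one, and a full adder then sums the three inverted bits into a 2-bit count.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ck2_from_ck1(ck1):
    """Ck2 = number of zeros in the 3-bit Ck1, built from three inverters
    feeding one full adder."""
    n0 = ((ck1 >> 0) & 1) ^ 1      # inverter on Ck1 bit 0
    n1 = ((ck1 >> 1) & 1) ^ 1      # inverter on Ck1 bit 1
    n2 = ((ck1 >> 2) & 1) ^ 1      # inverter on Ck1 bit 2
    s, carry = full_adder(n0, n1, n2)
    return (carry << 1) | s        # 2-bit zero count


if __name__ == "__main__":
    for value in range(8):
        print(format(value, "03b"), format(ck2_from_ck1(value), "02b"))
```

With a 3-bit Ck1 and a 2-bit Ck2, the complete check symbol occupies 5 bits, which is consistent with the 5-bit CSRF registers and the 5-bit output latch described above.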

[1] Russell, G. and Elliot, I.D., "Design of Highly Reliable VLSI Processors Incorporating Concurrent Error Detection and Correction", Proceedings EURO ASIC 91, May 1991, Paris.
[2] Sayers, I.L. and Russell, G., "A Unified Error Detection Scheme for ASIC Design", Chapter 15 in 'Test Techniques for VLSI and WSI Circuits', Massara, R. (Editor), Peter Peregrinus Ltd, 1989.
[3] J.M. Berger, "A note on error detection codes for Asymmetric Channels", Information and Control, vol. 4, March 1961, pp 68-73.
[4] H. Dong, "Modified Berger codes for detection of unidirectional errors", 12th Int. Symp. Fault-Tolerant Comp., June 1982, pp 317-320.
[5] J. Lo, S. Thanawastien, T. R. Rao, and M. Nicolaidis, "An SFS Berger Check Prediction ALU and its Application to Self-checking Processor Designs", IEEE Trans on CAD, vol 11 no 4, April 1992, pp 525-540.
[6] A. Maamar and G. Russell, "Checkbit Prediction using Dong's Code for Arithmetic Functions", Proc. of 3rd IEEE Int. On-line Testing Workshop, Greece, July 1997, pp 254-258.
[7] R.P. Brent and H.T. Kung, "A Regular Layout of Parallel Adders", IEEE Trans. Computer, vol. C-31, March 1982, pp 260-264.

8. Conclusions
The capability of Dong's Code to predict the check symbol for arithmetic and logic operations has been demonstrated. The design of a 32-bit RISC processor with a Concurrent Error Detection capability, in which all the processor units (the ALU, Register File and Control Unit) incorporate Dong's Code for error detection, has been presented. From a qualitative analysis of the error detection capability of the technique, as discussed in Section 2, the code detects all single errors and all unidirectional errors except those which affect only the information bits and have weight equal to (m+1) or its multiples. The code can also detect some other types of errors, as shown previously in Table(1).