An Undiscovered Safety Related Fault in CAN

Fuyu Yang
Chongqing Institute of Instrumentation and Automation, Chongqing, China. Email: yfy812@163.com

Abstract: A form error in the passive error delimiter can cause a quasi bus-off state and priority inversion of the messages sent by the error passive node. This paper discusses the scenarios and a method to cure the problem.

Keywords: CAN; Error Delimiter; Priority Inversion

LIMITED DISTRIBUTION NOTICE
This report may have been submitted for publication. In view of copyright protection in case it is accepted for publication, its distribution is limited to peer communications and specific requests.

CAN is the most successful field bus, especially in the automotive industry. In 2006 alone more than 500 million CAN controllers were sold [1]. Over the past 16 years much research has dealt with the safety issues of using CAN in safety-critical systems. Most of it concerns timeliness, due to the event-triggered nature of CAN, and message consistency, which is affected by the last-but-one-bit error. Some work proposes schemes for treating the babbling idiot error in CAN systems. But no literature has been published concerning a fatal error rooted in the specification itself. This paper discusses that problem. The first part of the paper shows the problem and its catastrophic aftermath. The second part shows exemplary message transmission scenarios that could cause the problem and that go beyond the original estimation. The third section shows the consequences of the failure, taking the SAE benchmark test suite as an example. It is followed by the solution, which could be used in a redesign of the CAN communication controller. Last comes a short conclusion.

1. PROBLEM CASE
Section 3.1.3 of the CAN specification 2.0 A [2] says: "In order to terminate an ERROR FRAME correctly, an 'error passive' node may need the bus to be 'bus idle' for at least 3 bit times (if there is a local error at an 'error passive' receiver). Therefore the bus should not be loaded to 100%." This is a cause of faults in applications. Because no time synchronization between nodes is established, the bus idle time cannot be guaranteed to be distributed as required above. Even when the bus utilization is low, pending messages start immediately after the intermission (I.M.). This is stipulated by the specification: "A message, which is pending for transmission during the transmission of another message, is started in the first bit following intermission".

[Figure 1. Local fault in error passive receiver. Timing diagram not reproduced: over the fields CRC, ACK, EOF, I.M. and new frame, the error active node finds no error and sees the SOF of the new frame, while the error passive node finds an error, sends its passive error flag and overlapping passive error delimiter (P.E.Del), and then finds a new error.]

Let us consider a case where an error passive receiving node detects an error because of a local fault caused by EMI. It sends a passive error flag (P.E.Flag). The other nodes do not detect an error and do not respond to this passive error flag. The passive error flag is confirmed after the ACK delimiter, within the EOF field of the message (Fig. 1). The following passive error delimiter (P.E.Del) lasts beyond the last bit of EOF and the intermission. If there is a bus idle time of 3 bits, the Start Of Frame bit (SOF) of the new frame falls in the first bit of the error passive node's intermission and is seen as a request for an overload frame. The overload frame causes a bit stuffing error in the other nodes, which then start their error frames. Their error frames are superposed on this overload frame. Although the bit stream is interpreted differently by the other nodes and by the error passive node, at the end of the active error delimiter or overload delimiter all nodes reset the state machine of the CAN protocol to a state that is ready to start an intermission.
Hence synchronization is re-established among all nodes. This explains the statement in the Bosch CAN specification. But what happens if a bus idle time of 3 bits, as mentioned above, is not guaranteed?

The error check in the error delimiter is clearly defined in ISO 16845, "Road Vehicles - Controller Area Network (CAN) - Conformance test plan" [3]. Any dominant bit in an error delimiter is a form error; items 7.5.6 and 8.5.13 describe the test method for a form error in the passive error delimiter for receiver and transmitter respectively. Items 7.6.12 and 8.6.9 test that the receive error counter and the transmit error counter are incremented when a form error occurs in the passive error delimiter. Hence when another node starts sending a new message, its SOF bit is seen as an error in this error passive node's error delimiter. The node starts a new passive error frame during the other node's new frame. The passive error flag is confirmed in the new frame's EOF field. The passive error delimiter again lasts beyond the last bit of EOF and the intermission. As long as messages are pending, this process repeats again and again. During this interval the error passive node cannot receive or transmit any message, because it is busy sending passive error frames. All messages that should be sent by this node are delayed, no matter how high their priority is. This causes priority inversion. No schedulability analysis is possible for such a system, due to the statistical nature of EMI.

There are two possible ways for this node to escape the trap. If an active error frame occurs somewhere during the transmission of the following frames, the passive error frame of this node overlaps with the active error frame, and the repetition ends. But in normal applications errors are rare. Hence the more normally the following messages are transmitted, the longer this node stays in the quasi bus-off state. The other possibility is that after some transmissions no frame is pending, and the idle time is long enough to allow the error delimiter to finish correctly, as the specification expected.

2. POSSIBLE SCENARIO
Besides the local error of an error passive receiver that causes the quasi bus-off failure, there are other scenarios in which the error passive node may lose synchronization with the other nodes in the cluster. The scenarios go beyond the assumption in the Bosch CAN specification in three respects. First, not only an error passive receiver but also an error passive transmitter may lose synchronization. Second, in some cases more bus idle time (at least 10 bits) is needed to regain synchronization. Third, not only a local fault of the error passive node causes the problem; in rare cases a local fault of an error active node may cause it too. Here are some examples; the problem cases are not limited to them.

In Fig. 2 an error passive receiver misses a global error on the bus due to a local fault, e.g. EMI. But it detects a bit stuffing error introduced by the active error flag of the other nodes. Its passive error frame ends later than the active error frame of the other nodes. Hence synchronization is lost.

In Fig. 3 a local fault occurs in the EOF field. Because of the nature of the passive error frame, the other nodes do not see any error. They finish transmitting or receiving in the normal way. The error passive node ends its passive error frame much later than the other nodes. In this case the bus idle time needed in the worst case for this passive node to synchronize is 10 bit times. When the bus idle time is 10 bit times, the SOF of the new frame is seen by this node as a request for an overload frame. Though the node is synchronized at the end of the overload frame, the overload frame introduces unnecessary bandwidth overhead. Here the ideal bus idle time is 13 bits.

In Fig. 4 the error passive transmitter reads the ACK bit wrongly due to a local fault, say EMI. Its passive error frame is seen as ACK delimiter and EOF field by the other nodes. The node loses synchronization as in the previous cases of the error passive receiver.

[Figure 2. Global bit stuffing error with local fault in error passive receiver. Timing diagram not reproduced: the error active node finds the error and sends its active error flag (A.E.Flag) and active error delimiter (A.E.Del), followed by I.M. and the SOF of a new frame; the error passive node finds an error, sends P.E.Flag and P.E.Del, and then finds a new error.]

[Figure 3. Local fault in error passive receiver only. Timing diagram not reproduced: the error active nodes find no error over CRC, ACK, EOF and I.M. and start a new frame with SOF; the passive error frame (P.E.Frame) of the error passive node starts within the EOF field, bits 1 to 6.]
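The idle-time values quoted for the Fig. 3 scenario (10 and 13 bit times) can be checked with a small bit-count model. The following Python sketch is only an illustration under assumptions made here: the function name `classify_sof`, the bit numbering (bit 1 is the first EOF bit of the disturbed frame) and the outcome labels are hypothetical, the fault is taken to be detected at EOF bit 6, and all three intermission bits of the disturbed node are treated alike, which is a simplification. The field lengths are the standard CAN values (7-bit EOF, 3-bit intermission, 6-bit passive error flag, 8-bit error delimiter).

```python
# Bit positions are counted from the first EOF bit (= 1) of the frame in
# which the local fault occurs. Field lengths are standard CAN values.
EOF_BITS = 7               # End Of Frame
INTERMISSION_BITS = 3      # Intermission
PASSIVE_FLAG_BITS = 6      # passive error flag
ERROR_DELIMITER_BITS = 8   # error delimiter


def classify_sof(error_bit: int, idle_bits: int) -> str:
    """Return how the disturbed error passive node sees the next SOF.

    error_bit: EOF bit at which the node wrongly detects an error
               (assumption of this sketch: the flag starts one bit later).
    idle_bits: bus idle time, in bits, left by the other nodes after
               their own intermission.
    """
    flag_start = error_bit + 1
    delimiter_start = flag_start + PASSIVE_FLAG_BITS
    delimiter_end = delimiter_start + ERROR_DELIMITER_BITS - 1
    intermission_end = delimiter_end + INTERMISSION_BITS

    # The other nodes finish EOF and intermission undisturbed, wait
    # idle_bits, and then one of them sends SOF.
    sof = EOF_BITS + INTERMISSION_BITS + idle_bits + 1

    if sof < delimiter_start:
        return "inside flag"     # flag not complete yet; not modelled further
    if sof <= delimiter_end:
        return "form error"      # dominant bit in the passive error delimiter
    if sof <= intermission_end:
        return "overload"        # dominant bit in the node's own intermission
    return "synchronized"        # node is already waiting for a new frame


for idle in (3, 9, 10, 13):
    print(idle, classify_sof(error_bit=6, idle_bits=idle))
```

Under these assumptions, a fault detected at EOF bit 6 yields a form error for 2 to 9 idle bits, an overload request for 10 to 12 idle bits, and a clean start from 13 idle bits, which agrees with the 10-bit and 13-bit values stated for Fig. 3.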
[Figure 4. Local fault of error passive transmitter only. Timing diagram not reproduced: the error active nodes find no error over ACK, EOF and I.M. and start a new frame with SOF; the error passive transmitter finds an ACK error after CRC, sends P.E.Flag and P.E.Del, and finds a form error before its I.M.]

[Figure 5. Local fault of error passive transmitter only. Timing diagram not reproduced: the error active nodes find no error over ACK, EOF and I.M. and start a new frame with SOF; the error passive transmitter finds a form error after CRC, sends P.E.Flag and P.E.Del, and finds a second form error before its I.M.]

[Figure 6. Local fault of error passive transmitter only. Timing diagram not reproduced: the error active nodes find no error over ACK, EOF and I.M. and start a new frame with SOF; the error passive transmitter finds a form error after CRC and ACK, sends P.E.Flag and P.E.Del, and finds a second form error.]

[Figure 7. Local fault in error active node causes failure in error passive node. Timing diagram not reproduced: the error active node finds an error and sends A.E.Flag and A.E.Del, followed by I.M. and the SOF of a new frame; the error passive node finds an error, sends P.E.Flag and P.E.Del, and then finds a new error.]

In Fig. 5 a local fault of the error passive transmitter occurs at the CRC delimiter. The first bit of its passive error flag is overwritten by the ACK of the other nodes. The remaining parts of its passive error frame are seen as ACK delimiter and EOF field. Hence synchronization is lost here too.

In Fig. 6 an error passive transmitter detects a form error at the ACK delimiter due to a local fault. It starts a passive error frame. The recessive bits of its passive error frame are read as a normal EOF field by the other nodes. In this case it also loses synchronization with the other nodes.

When a small system has only one error active node left, a local fault of that node causes an active error frame. The active error flag is seen by all the error passive nodes as either a bit stuffing error or a form error, and it triggers their passive error frames (see Fig. 7). As a result the error active node finishes its error frame earlier than the error passive nodes. Synchronization is lost again.

3. AFTERMATH
When the error passive node is trapped in the situation analyzed above, it loses the chance to receive or send any message. It returns to the normal state only when the bus is idle or when some error active node sends an error frame. Because error occurrence is random, the interval between errors may be quite long in a normal application environment. The worst case for the error passive node to escape the quasi bus-off state is when it has to wait for the bus to be idle. The time needed equals the maximum worst-case response time over all messages in the system. Taking the SAE benchmark test suite as modified by Tindell [4] as an example, the maximum worst-case response time is 49.192 ms at a bus rate of 125 kbps and 14.404 ms at 250 kbps (p. 13 in [4]). The bus utilization is 94.2% and 47.1% respectively. If the fault occurs in a node that sends or receives the brake pressure message (whose period is 5 ms), the car is out of control for almost 10 message periods at 125 kbps or 3 periods at 250 kbps (49.192 ms / 5 ms ≈ 9.8 and 14.404 ms / 5 ms ≈ 2.9). If unwanted brake forces act on the wheels during that interval, the car risks an unpredictable result. A car travelling at 100 km/h (about 27.8 m/s) is out of control over a distance of 1.4 m or 0.4 m.

The loss of state synchronization of the passive node may end earlier than in the worst case mentioned above. But this depends entirely on the statistical properties of the errors. This is outside the assumptions of CAN schedulability analysis and makes CAN design tools inaccurate. When an error passive node has lost synchronization, even if there is some unknown bus idle time, it might invoke an overload frame that is not considered in the schedulability analysis. Hence the tool will not give an accurate result. This endangers the safety of the designed system. The overload frame also reduces the throughput of the bus.

4. SOLUTION
After a thorough analysis of all possible error cases, global and local, the relation between the scenario and the passive error delimiter length that the error passive node needs in order to synchronize correctly with the other nodes is clear. In some cases the length of the passive error delimiter should be 8 bits, its original value. In some cases it should be 2 bits. In some cases it should be 1 bit. When a local error occurs in the EOF field of a data frame or remote frame, the best choice for correct synchronization is to reset the state machine of the CAN protocol at the last bit of the original EOF. To make a suitable decision one must know the position where the error occurred and the pattern of the bit stream.
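For a local fault in the EOF field, the effect of the delimiter length on the idle time that must follow can be estimated by counting bits. The sketch below is an illustration under assumptions made here and is not the patent-pending solution: `min_idle_bits` is a hypothetical helper, bit 1 is the first EOF bit of the disturbed frame, the passive error flag is assumed to start one bit after the fault is detected, and only the form error in the delimiter is considered (an SOF falling into the node's own intermission is ignored).

```python
# Standard CAN field lengths; positions count from the first EOF bit (= 1).
EOF_BITS = 7
INTERMISSION_BITS = 3
PASSIVE_FLAG_BITS = 6


def min_idle_bits(error_bit: int, delimiter_bits: int) -> int:
    """Smallest bus idle time (in bits) for which the next SOF no longer
    falls inside the passive error delimiter of the disturbed node."""
    # Last bit of the delimiter: fault position + flag + delimiter.
    delimiter_end = error_bit + PASSIVE_FLAG_BITS + delimiter_bits
    # First bit after the other nodes' EOF and intermission.
    first_idle_bit = EOF_BITS + INTERMISSION_BITS + 1
    # SOF is sent at first_idle_bit + idle and must come after delimiter_end.
    return max(0, delimiter_end + 1 - first_idle_bit)


for length in (8, 2, 1):
    print(length, min_idle_bits(error_bit=6, delimiter_bits=length))
```

With the original 8-bit delimiter and a fault detected at EOF bit 6, the sketch gives 10 idle bits, matching the value stated for Fig. 3; fixed delimiters of 2 bits or 1 bit lower the requirement to 4 or 3 bits but not to zero. Resetting at the last bit of the original EOF would need no idle time at all, assuming the node then starts its intermission together with the other nodes.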
Based on the error position and the bit stream pattern, a judgment is made whether the current error is a local error or a global error. Then the corresponding error delimiter length or reset action is applied. A total solution is patent pending.

Simply changing the passive error delimiter to another fixed value may reduce the complexity of implementation. In some cases it can avoid a long quasi bus-off state when an error passive node has a local fault, but this remedy cannot cope with all possible cases of loss of synchronization. It also introduces other disadvantages, for example: an increased chance that the error passive node fails to receive; less predictable scheduling results due to repeated passive error frames or the invoking of overload frames; possible priority inversion in this node or other nodes; and bandwidth wasted by overload frames. The degradation of performance is not repaid by the cost reduction in implementation.

In this way the modified passive error frame cannot pass the full test cases in ISO 16845 items 7.5.6 or 8.5.13. The standards ISO 11898 and ISO 16845 should be revised to consider the safety requirements of applications.

5. CONCLUSION
The consequence of a form error in the passive error delimiter has long been neglected. But in some applications this quasi bus-off state and priority inversion cannot be permitted. A node does not deserve such a penalty for only 1 bit of EMI. To use CAN in safety-critical applications, the problem must be cured.

References:
[1] "CAN in Automation celebrates first 15 years", 15-year-cia.pdf
[2] Robert Bosch GmbH. CAN Specification Version 2.0, September 1991.
[3] ISO/TC 22/SC 3. International standard ISO 16845, "Road Vehicles - Controller Area Network (CAN) - Conformance test plan", 2004.
[4] K. W. Tindell and A. Burns. "Guaranteed Message Latencies for Distributed Safety-Critical Hard Real-Time Control Networks". Technical Report YCS229, Dept. of Computer Science, University of York, June 1994. YCS-94-229.pdf