Beruflich Dokumente
Kultur Dokumente
James S. Bennett
Heuristic Programming Project, C o m p u t e r Science Department
Stanford University, Stanford, CA 94305
C l i f f o r d R. Hollander
I B M Scientific Center, 1530 Page M i l l Road
Palo Alto. CA 94304
often unable to attack the the failure at the systemic level. Expert associated with that segment of the diagnostic activity. These subsystems
assistance is then required, increasing both the time and cost required to currently correspond to various i n p u t / o u t p u t facilities (e.g.. DISK.
determine and repair the fault. The design of D A R T reflects the expert's T A P E . TP) or the CPU-complex itself, l o r each selected subsystem, the
ability to take a systemic viewpoint on problems and to use that viewpoint user is asked to identify one or more logical pathways which might be
to indict a specific components, thus making more effective use of the involved in the situation. Kach of these logical pathways correspond to a
existing maintenance capabilities. line of communication between a peripheral and an application program.
On the basis of this information and details of the basic composition of
For our initial design, we chose to concentrate on problems the network, the appropriate communications protocol can be selected.
occurring w i t h i n the teleprocessing (TP) subsystems for the I B M 370-class The user is also asked to indicate which diagnostic tools (e.g., traces,
computers. This subsystem includes various network controllers, dumps, logic probes) arc available for examining each logical pathway.
terminals, remote-job entry facilities, modems, and several software access
Once the logical pathway and protocol have been determined,
methods. In addition to these well-defined components there are
descriptions arc gathered of the often multiple physical pathways that
numerous available test points the program can use d u r i n g diagnosis. We
actually implement the logical pathway, it is on this level that diagnostic
have focussed our effort on handling two of the most frequent TP
test results arc presented and actual component indictments occur. For
problems, (1) when a user is unable to log on to the system from a remote
D A R T to be useful at this level, the field engineer must be familiar with
terminal, and (2) when the system operator is unable to initialize the TP
the diagnostic equipment and software testing and tracing facilities which
network itself. In a new system configuration, these two problems
can be requested, and, of course, must also have access to information
constitute a significant percentage of service calls received.
about the specific system hardware and softwate configuration of the
Interviews with field-service experts made it apparent that much of installation. Finally, at the end of the consultation session, D A R T
their problem-solving expertise is derived from their knowledge of several summarizes its findings and recommends additional tests and procedures
to follow. Figure 1 below depicts the major steps of the diagnostic
process outlined above.
843
Report Findings and Recommendations The TP subsystem f o r PROBLEM-1 w i l l be c a l l e d :
for the Session --TP.SUBSYSTEM-1 -
t
Complete one l i n e f o r each remote t e r m i n a l .
Determine Protocol Violations and applic. communic.
Make Specific Component Indictments terminal cluster program controller
for each Physical Pathway LU-name PU-name LU-name address
t HI- logical unit, PU-physical unit
Select Appropriate Protocol and 13) •• TERM1 CLUS01 IMS 069
Available Diagnostic Tools 14) •• DONE
for each logical Pathway The p a t h to t e r m i n a l TERM1 w i l l be c a l l e d :
t SNA.PATH-1
15) What is the name of t h e LINE c o n n e c t i n g
Identify Logical Pathways
CLUS01 to 059 ( i t s TP c o n t r o l l e r ) ?
for each Subsystem
• • LINE01
t
Infer Suspected CPU-I/O Subsystems 18) Please e n t e r the name of the access
for each Problem method used by IMS to communicate w i t h
t TERM1. (ACF/VTAM VTAM ACF/TCAM TCAM)
Gather Initial Symptom and •• ACF/VTAM
Configuration Information 19) What t y p e of TP c o n t r o l l e r is used
for each System f o r t h i s t e r m i n a l ? (3704 3705)
•• 370G
Figure 1. The DART Diagnostic Inference Process 20) What c o n t r o l program i s r u n n i n g i n t h i s
TP c o n t r o l l e r ? (NCP PEP)
The following transcript illustrates DARI's interactions with an • • NCP
engineer and DART's conclusions about possible faulty components. The
engineers responses arc emboldened; annotations appear in italics. 22) Please i n d i c a t e ALL of t h e nodes which
are a b l e t o a c h i e v e and m a i n t a i n
( t h r o u g h o u t the a c t i v a t i o n p r o c e s s ) a n
SESSION-201 active status.
1) What Is y o u r user-number? * • I MS There is a console command a vailable
•• 009776 which can be used to interrogate node status.
2) Are you f a m i l i a r w i t h t h e use o f t h i s t o o l ?
•• YES T o most e f f e c t i v e l y i d e n t i f y t h e cause o f t h i s
3) B r i e f l y d e s c r i b e t h e reason f o r t h i s s e s s i o n : p r o b l e m w i l l r e q u i r e a t r a c e showing the
•• HAVING DIFFICULTY ACTIVATING TP NETWORK a c t i v a t i o n of t h e LINE, the 3705 and NCP.
23) Which VTAM t r a c e t y p e s arc a v a i l a b l e to
The f i r s t system f o r SESSION-201 w i l l b e c a l l e d : a i d i n p u r s u i n g t h i s problem?
--SYSTEM-1--- •• VTAM.10
5) Is SYSTEM-1 a l o o s e l y c o u p l e d m u l t i - p r o c e s s o r ? 24) Does your VTAM 10 (RNIO) t r a c e show
• • NO t h a t a n ACTLINK ( a c t i v a t e l i n k )
The f i r s t problem w i t h SYSTEM 1 w i l l be c a l l e d : command was sent to l i n e LINE01?
PROBLEM-1 • • YES
Please enter the following information. 25) What does your VTAM 10 (RNIO) t r a c e
CPU SCP i n d i c a t e was the response to t h e
machine SCP running ACTLINK? (POSITIVE NEGATIVE NONE)
type name under VM? •• NEGATIVE
SCP is the operating system type;
VM is an IBM-supplied virtual machine operating system The most l i k e l y s o u r c e ( s ) o f the failure
6) •• 3081 MVS/SP NO d i s c u s s e d i n PROBLEM-1 i s :
7) What symptom is e x h i b i t e d by PROBLEM-1?
• • ?? LINE-LINE01 ( 6)
Expected responses a r e : UNABLE.TO.LOGON, The VTAM I 0 (RNIO) t r a c e i n d i c a t e s t h a t
UNABLE.TO.ACTIVATE.TP.NEIWORK. MISSING.DATA, an ACTLINK was sent to NCP f o r LINE01
HANG. LOOP, e t c . and t h a t a n e g a t i v e response was
•• UNABLE.TO.ACTIVATE.TP.NETWORK r e c e i v e d , The ACTLINK s h o u l d have
The user employed KM YC/N's automatic completion feature caused NCP to i s s u e an ENABLE to
8) Is t h e NETWORK.ACTIVATION problem o c c u r r i n g t h e line a n d t h i s , i n t u r n , s h o u l d
in a SNA network? (YES NO) have cause d t h e modem to respond
• • YES SNA is a particular network discipline w i t h DATA SET READY (DSR). You
9 ) I s t h e t e r m i n a l ( s ) i n v o l v e d i n PROBLEM-1 m i g h t want to check t h e modem
l o c a l l y (channel) attached, remotely i n t e r f a c e , If t h e DSR is a c t i v e t h e n
a t t a c h e d or both? (LOCAL REMOTE BOTH) i t i s 1 i k e l y t h a t t h e l i n e i s not
•• REMOTE t h e s o u r c e of the f a i l u r e . A LINE
( o r PT2) t race c o u l d be used to
At this point the TP subsystem becomes the focus of the f u r t h e r i n v e s t i g a t e the problem.
diagnostic activity
3705-059 (.2) (See e v i d e n c e f o r L I N E 0 1 ) .
844
After DART has indicated the components most likely to be at
fault, the responsibility for performing a detailed determination and
repair of the actual component failures (i.e.. microcode bugs, integrated
circuit failure, etc.) would then shift to the appropriate maintenance
groups for those components.
D. Concluding Remarks
Acknowledgements
References
845