
Active Response Gravity Offload System (ARGOS) Subject Drop Mishap JSC 13-0006 IRIS S-2013-022-00002 Close Call

Mishap Date of Mishap: January 16, 2013 Date of Report: March 22, 2013

NASA/JSC

ACTIVE RESPONSE GRAVITY OFFLOAD SYSTEM (ARGOS) INVESTIGATION BOARD REPORT

NR

Final Report Findings and Recommendation


3/22/2013

Gail Chapline, Joe Anderson, Mary Cerimele, Mike Cooke, Mike Foreman, John Haas, Art Knell, Asher Lieberman, John Ruppert

Table of Contents
1 Executive Summary
2 Acknowledgments
3 Background
4 Investigation Board Objectives
5 ARGOS Description
6 Investigation
  6.1 Interviews
  6.2 Mechanical System
  6.3 Hardware Inspection
    6.3.1 Running Torque Measurements
    6.3.2 Incremental Disassembly and Sample Collection
    6.3.3 Detailed Inspection of Major Components
  6.4 Fault Tree
  6.5 Control System
  6.6 Electronic System
    6.6.1 E-Stop System
    6.6.2 Z-Axis Motor Controller
    6.6.3 CAN Bus Interface
    6.6.4 Computer Running Trick Simulation
  6.7 Software
    6.7.1 Background
    6.7.2 ARGOS Hoist Control System Background
    6.7.3 Software Validation
    6.7.4 ARGOS Software Configuration Management
    6.7.5 Software Regression Testing
    6.7.6 Fault Detection Software Logic
    6.7.7 Test Data Review
  6.8 Safety and Hazard Analysis
    6.8.1 Safety
    6.8.2 Hazard Analysis
  6.9 Engineering Processes, Roles and Responsibilities
7 Findings and Recommendations
  7.1 General Findings
  7.2 Proximate/Root Causes and Contributing Factors
    7.2.1 Proximate Cause
    7.2.2 Intermediate Cause
    7.2.3 Root Cause
    7.2.4 Contributing Factors
  7.3 Specific Findings
    7.3.1 Findings Specific to Mechanical System Design
    7.3.2 Findings Specific to the Z-axis Controller System
    7.3.3 Findings Specific to Software Design
    7.3.4 Findings Specific to Safety and Hazards
8 References
  8.1 Appointment Letter
  8.2 Materials Chemical Analysis Report
  8.3 Materials Metallurgical Analysis Report
  8.4 ARGOS Startup Checklist

List of Figures
Figure 1: Handrail Involved in the Incident
Figure 2: ARGOS System Picture
Figure 3: Inline Lifting Components
Figure 4: Yates Shock Absorber
Figure 5: Spring/Damper Festo Muscle
Figure 6: STI Load Cell
Figure 7: VNCHI Gimbal Assembly
Figure 8: Genie Assembly without Rope
Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview
Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View
Figure 11: Torque Measurement Site
Figure 12: Torque Measurements for Various Knob Positions
Figure 13: ARGOS Project Created Fault Tree
Figure 14: High Level Control Loop
Figure 15: Electronics System Block Diagram
Figure 16: ARGOS Control System Block Diagram
Figure 17: Slide from EMU TRR Software
Figure 18: ARGOS Startup Checklist from day of incident
Figure 19: ARGOS Startup Checklist Path Verification (step 54)
Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident


List of Tables
Table 1: ARGOS Evolution of Mechanical System
Table 2: Listing of Major Components within Focus Area
Table 3: Trick Simulation Source Code Files
Table 4: Fault Detection for the Hazard of Un-commanded Motion
Table 5: Test Parameters Recorded by the ARGOS Trick Simulation


1 Executive Summary
On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped approximately 12 to 18 inches (30.5 to 45.7 cm) in the vertical (-Z) direction. Although the test subject did not suffer significant injuries, the potential for a serious injury was present. Serious injury can occur at shorter distances as the test subject has less time to react. The unintended drop could have been as much as 4 to 5 feet (1.2 to 1.5 m), based on the length of cable released. Slight damage to the test structure and a handrail mock-up was also sustained.

The Software, Robotics and Simulation Division (ER) initiated an internal investigation, which included running two tests on the hardware with no test subject, interviews with test participants, and a preliminary report. A close call was filed to record the incident. On January 21, 2013, an Engineering Investigation Board was convened by the Director of Engineering to investigate the close call and identify its causes and any contributing factors. The team was also charged with developing recommendations to prevent a similar incident.

The Investigation Board found that the incident was most likely caused by partial gearbox binding/jamming causing an undesired motor controller response. The motor controller is commercial off the shelf (COTS) hardware, and the vendor provided little to no information on how it performs its function, making it essentially a black box in the system control loop. In combination with the hardware design issues found during the investigation, the controller commanded a high velocity downward motion, resulting in the test subject free falling onto a test mockup (handrail).

2 Acknowledgments
The Board would like to acknowledge the many discussions with Larry Dungan and Paul Valle regarding the ARGOS. Without their help, answers to endless questions, and thorough knowledge of the ARGOS, the Board could not have completed the investigation. The Board would also like to thank several consultants to the Board: Monty Carroll and Ray Morales for their invaluable expertise on control systems; Duane Pierson and Linda Shackelford, from the Institutional Review Board (IRB), for their discussions regarding human safety and hazardous environments; and Irene Piatek and Charlene Curtis for discussions on Engineering Project management work instructions.

3 Background
On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped approximately 12 to 18 inches (30.5 to 45.7 cm) in the vertical (-Z) direction. Although the test participant fell only that distance, 4 to 5 feet (1.2 to 1.5 m) of wire rope was driven off the drum in the vertical (-Z) direction during this event. The test began normally, and the drop occurred approximately nine minutes into the test. At the time, the participant was translating along a handrail in a horizontal position (body parallel to the ground) to simulate microgravity; see the cover page image. The handrail was mounted on pallets, raising it approximately 24 to 36 inches (61 to 91 cm) above the floor. The participant landed on the handrail, permanently bending it (see Figure 1).

Figure 1: Handrail Involved in the Incident

The drop of the participant was followed by a slight roll to the right, and then the gimbal mechanism fell on top of the participant's back. The entire event took approximately 0.5 seconds. The participant had minor injuries (bruising) that required no medical attention, and the test facility incurred minor damage (the bent handrail). The incident was classified as a close call. The test was terminated, and the suit and ARGOS personnel assisted the participant out of the ACES suit.

The ARGOS team initiated a preliminary investigation. Two tests were conducted, both approved by the Test Safety Officer and Software, Robotics and Simulation Division Management. With no weight on the system, the ARGOS Configuration GUI was launched and the system was enabled to check the motor controller. After enabling, the system drifted downward slowly about 2 inches (5 cm) during the first two seconds. Then it suddenly moved downward rapidly for 15 inches (38 cm) before an operator initiated a manual emergency stop. Next, the cable drum was turned by hand and was subjectively noted to be stiffer and more difficult to rotate than normal. The ARGOS was then manually jogged, unloaded, at a motor velocity of 30 rpm, both up and down, using the ARGOS Configuration GUI in the unsuited gear ratio. Again, the ARGOS engineers noted that during the jog commands the system exhibited abnormal behavior as follows:
1. Decreased ability to hold constant velocity
2. Sluggish acceleration
3. Sluggish deceleration

During the event, all of the enabled safety stops failed to stop the test subject from impacting the handrail. If the handrail had not stopped the drop, the test subject could have been dropped as much as 4 to 5 feet (1.2 to 1.5 m).
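The drop distances quoted above can be put in perspective with a short worked calculation. The sketch below assumes an ideal free fall from rest (no cable drag, no suit or harness resistance), which is a simplification and not a reconstruction of the actual event; the function name is illustrative only.

```python
import math

G = 9.80665  # standard gravity, m/s^2
INCH_TO_M = 0.0254


def free_fall(height_m):
    """Return (fall time in s, impact speed in m/s) for an ideal free fall from rest."""
    fall_time = math.sqrt(2.0 * height_m / G)
    impact_speed = math.sqrt(2.0 * G * height_m)
    return fall_time, impact_speed


# Drop distances reported for the incident: 12 to 18 inches.
for inches in (12, 18):
    height = inches * INCH_TO_M
    t, v = free_fall(height)
    print(f"{inches} in ({height:.3f} m): fall time {t:.2f} s, impact speed {v:.2f} m/s")
```

Under this assumption the fall itself lasts roughly 0.25 to 0.31 seconds and ends at about 2.4 to 3.0 m/s, which is consistent with the report's point that a short drop leaves the subject very little time to react.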

4 Investigation Board Objectives


The Board's primary objective is to gather the facts, identify the cause(s) and contributing factors relating to the ARGOS incident, and recommend appropriate actions to prevent a similar incident from occurring again. The Board comprised members from the Engineering Directorate, Safety and Mission Assurance Directorate, Mission Operations Directorate and Crew Office; see appointment letter EA-13-001.

5 ARGOS Description
The goal of the Active Response Gravity Offload System (ARGOS), shown in Figure 2, is to develop the technology for a facility to simulate reduced gravity environments found in low earth orbit, in proximity of asteroids, and on lunar and Martian surfaces. ARGOS is used to evaluate unsuited and suited human performance of ambulation and exploration EVA tasks at different offloads and with different interfaces, including the use of various gimbals and harnesses. The various tasks intended to characterize human performance on ARGOS include treadmill walking, incline walking and jogging; over ground walking; jumping; exploration type EVA tasks; and other dynamic movements of the human.

The project started initially with an X (horizontal translation) and Z-axis (vertical translation/offload) commercial off the shelf (COTS) mechanical system that expanded to a large X and Y (horizontal translation) and Z-axis (vertical translation/offload) system. A custom Z-axis mechanical system was then designed, prototyped, and tested, and was then combined with the COTS X and Y-axis mechanical system. The evolution of ARGOS mechanical systems is shown in Table 1.

The NASA Standard for Lifting Devices and Equipment, Doc #: NASA-STD-8719.9, and other industry standards (ASME B30.2, Overhead and Gantry Cranes, and ASME B30.5, Mobile and Locomotive Cranes) have been used as guidelines, but there are no Voluntary Consensus Standards that specify the design or operation of a ground based human rated robotic system. Testing of the system started with simple activation of the motor, progressed to static weight testing, then utilization of a Stewart platform, and finally a human in the loop. Performance data collected from ARGOS Generation 1 led to development of an improved Generation 2 ARGOS.
Mechanical System            Date        Testing
COTS X and Z                 8/26/2008   Human Interaction Testing
COTS XYZ                     2/13/2009   Human Interaction Testing
Generation 1 Custom Z        4/9/2009    Non-Human Interaction Testing (Stewart 6-DOF platform gait simulator)
Generation 1 Custom Z        7/24/2009   Human Interaction Testing
Generation 1 Custom Z        9/28/2009   Human Attached Testing
COTS XY, Gen 1 Custom Z      1/19/2010   Human Interaction Testing
COTS XY, Gen 1 Custom Z      4/8/2010    Human Attached Testing
Generation 2 Custom XY       6/20/2011   Human Interaction Testing
Generation 2 Custom Z        9/30/2011   Human Interaction Testing
Generation 2 Custom XYZ      11/7/2011   Human Interaction Testing
Generation 2 Custom XYZ      3/1/2012    Human Attached Testing

Table 1: ARGOS Evolution of Mechanical System

The Generation 2 ARGOS system has two different gear ratios (Unsuited and Suited). The Unsuited gear ratio provides the capability to offload up to 300 lbf (1334 N) with high dynamic capabilities. The Suited gear ratio provides the capability to offload up to 750 lbf (3336 N) with low dynamic capabilities. The system works by providing a constant force offload through an overhead motion control system. The Generation 2 ARGOS system provides a wider range of capabilities for robotic, rover, and human space flight testing. The following sections provide descriptions of the major sub-systems.
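The two offload capacities described above can be written down as a simple lookup. This is only an illustrative sketch: the function name and the rule of picking the lower-capacity ratio whenever it suffices are assumptions, not the ARGOS team's procedure, and the Suited/Unsuited labels refer to the gear ratio rather than to how the test object is configured.

```python
LBF_TO_N = 4.4482216152605  # newtons per pound-force

# Offload capacities stated in the report for the Generation 2 system.
UNSUITED_MAX_LBF = 300.0  # high dynamic capability
SUITED_MAX_LBF = 750.0    # low dynamic capability


def select_gear_ratio(offload_lbf):
    """Return the name of the lowest-capacity gear ratio that can carry the offload.

    Hypothetical helper for illustration; raises ValueError when the requested
    offload is negative or exceeds the Suited capacity.
    """
    if offload_lbf < 0:
        raise ValueError("offload must be non-negative")
    if offload_lbf <= UNSUITED_MAX_LBF:
        return "Unsuited"
    if offload_lbf <= SUITED_MAX_LBF:
        return "Suited"
    raise ValueError("offload exceeds the 750 lbf capacity of the Suited gear ratio")


for load in (180.0, 450.0):
    print(f"{load:.0f} lbf ({load * LBF_TO_N:.0f} N): {select_gear_ratio(load)} gear ratio")
```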

Figure 2: ARGOS System Picture

E-stop System
An emergency shutdown can be activated by the following:
- Manual activation of the e-stop by the test team.
- Automatic activation of the e-stop by the motor controller in the event of a system fault requiring an emergency stop.
- Automatic activation of the e-stop by the limit switch system. In each direction of travel the system is equipped with two limit switches, as required by the NASA crane standard. This e-stop can only occur if the first limit switch has failed.
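The three activation paths listed above combine as a logical OR. The sketch below is a minimal model of that logic only; the class and field names are hypothetical, and it assumes the second limit switch in a direction of travel is the one wired to the e-stop, as the description implies. It does not represent the actual ARGOS wiring or software.

```python
from dataclasses import dataclass


@dataclass
class EStopInputs:
    """Hypothetical snapshot of the signals that can request an emergency stop."""
    manual_pressed: bool        # test team pressed the e-stop
    controller_fault: bool      # motor controller flagged a fault needing an e-stop
    second_limit_tripped: bool  # backup limit switch reached (first one failed to stop travel)


def estop_active(inputs):
    """Return True when any of the three activation paths requests an e-stop."""
    return (
        inputs.manual_pressed
        or inputs.controller_fault
        or inputs.second_limit_tripped
    )


print(estop_active(EStopInputs(False, False, False)))  # no path active
print(estop_active(EStopInputs(False, True, False)))   # controller fault path
```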

Subject Force Input
Due to the dynamic movement capabilities of ARGOS, forces can be induced into the person whose weight is being offloaded in the event that an emergency shutdown is required. These forces will not be seen during normal operations. The analysis of potential forces is very difficult, and a very conservative worst-case analysis indicates forces could reach approximately 2698 lbf (12 kN). The ARGOS team worked with human performance experts to determine the force levels of a world class athlete jumping upward with the system e-stop activation occurring at the worst time, just after leaving the ground. The probability of this is very small, and most people or systems could not achieve the required kinetic energy. However, this case was considered and the hazard controlled. Figure 3 illustrates the components in the lifting path with the exception of the gimbal assembly.

Figure 3: Inline Lifting Components

The OSHA limit for fall protection at the hook attachment point is 1800 lbf (8 kN) (OSHA 29 CFR Parts 1910 and 1926). To prevent these forces from transferring into the human, a Yates shock absorber (shown in Figure 4), a COTS product utilized in climbing fall protection, is installed in line with the lifting cable. The Yates part number is 602. The shock absorbers deploy when forces exceed 450 lbf (2 kN). The 450 lbf (2 kN) value is based on the manufacturer's design and data, which was confirmed with deployment tests. The forces into the human or robot would therefore not exceed 450 lbf (2 kN), which is one-fourth of the allowed OSHA force. Over the past four years of testing there has not been a deployment of these devices during human testing.
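The force figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only converts the stated pound-force values to kilonewtons and computes the deployment-to-limit ratio; the variable names are illustrative.

```python
LBF_TO_N = 4.4482216152605  # newtons per pound-force

# Values taken from the text above.
forces_lbf = {
    "worst-case analysis": 2698.0,
    "OSHA limit at hook": 1800.0,
    "shock absorber deployment": 450.0,
}

for name, lbf in forces_lbf.items():
    print(f"{name}: {lbf:.0f} lbf = {lbf * LBF_TO_N / 1000.0:.1f} kN")

# Fraction of the OSHA limit at which the shock absorber deploys.
ratio = forces_lbf["shock absorber deployment"] / forces_lbf["OSHA limit at hook"]
print(f"deployment force is {ratio:.2f} of the OSHA limit")
```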

Figure 4: Yates Shock Absorber

Series Elastic Actuator (SEA)
A series elastic actuator (SEA) provides spring and damping in the load path. SEAs add a spring with a known spring constant in series with manipulators to increase compliance and decrease natural frequency. This spongier manipulator results in better force control, allowing improved tuning of the system and increased stability. A COTS product from Festo Inc, shown in Figure 5, is utilized. This product is actually a pneumatic muscle being used in a constant pressure application. This device was evaluated and determined to not be a pressure system. The Festo muscle is used in the load path with a load rated choker in parallel. There are two Festo muscle lengths that can be utilized, and any combination may be placed above and below the load cell. ARGOS currently uses two Festo muscles in line with the load cell (one above and one below).
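The effect of adding a spring in series can be illustrated with the standard formulas for series stiffness and the undamped natural frequency of a mass on a spring. All numeric values below (cable stiffness, SEA stiffness, suspended mass) are hypothetical placeholders chosen for illustration; the report does not give the Festo muscle's spring constant.

```python
import math


def series_stiffness(k1, k2):
    """Effective stiffness (N/m) of two springs connected in series."""
    return (k1 * k2) / (k1 + k2)


def natural_frequency_hz(stiffness, mass):
    """Undamped natural frequency (Hz) of a mass (kg) on a spring (N/m)."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


# Hypothetical values, not taken from the report.
k_cable = 200_000.0  # N/m, stiff load path without the SEA
k_sea = 20_000.0     # N/m, compliant element added in series
mass = 100.0         # kg, suspended load

k_eff = series_stiffness(k_cable, k_sea)
print(f"without SEA: {natural_frequency_hz(k_cable, mass):.2f} Hz")
print(f"with SEA:    {natural_frequency_hz(k_eff, mass):.2f} Hz (k_eff = {k_eff:.0f} N/m)")
```

With these placeholder numbers the series combination is softer than either spring alone, and the natural frequency falls from about 7.1 Hz to about 2.1 Hz, which is the compliance increase and frequency decrease the paragraph describes.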

Figure 5: Spring/Damper Festo Muscle

Load Cell
An STI load cell, shown in Figure 6, with an amplified output provides the force measurement. The cable is double shielded and the electronics are housed in a metal box to decrease electromagnetic interference. A programmable anti-aliasing filter is utilized as a low pass filter to eliminate aliasing issues between the load cell and the A/D converter. The force measurement is sampled every millisecond for input to the control logic, which adjusts the motor output velocity as needed to maintain the desired offload force through load disturbances.
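A 1 ms sample period corresponds to a 1 kHz sample rate, so the anti-aliasing filter must attenuate content above the 500 Hz Nyquist frequency. The sketch below shows that calculation together with a purely illustrative proportional force-to-velocity law. The report does not describe the actual control law in this section, so the gain, the velocity limit, and the sign convention (positive velocity reels cable in) are all assumptions.

```python
SAMPLE_PERIOD_S = 0.001  # force is sampled every millisecond


def nyquist_hz(sample_period_s):
    """Highest frequency representable without aliasing at this sample period."""
    return 0.5 / sample_period_s


def velocity_command(measured_force_n, target_force_n, gain, v_max):
    """Illustrative proportional law: map force error (N) to a clamped velocity (m/s).

    Assumed convention: positive velocity reels cable in, which raises cable tension.
    """
    error = target_force_n - measured_force_n
    command = gain * error
    return max(-v_max, min(v_max, command))


print(f"Nyquist frequency: {nyquist_hz(SAMPLE_PERIOD_S):.0f} Hz")

# Hypothetical numbers: cable tension is 50 N below the target offload.
print(velocity_command(measured_force_n=950.0, target_force_n=1000.0, gain=0.002, v_max=0.5))
```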

Figure 6: STI Load Cell

Gimbal
The Versatile Neutral Capability Horizontal Interface (VNCHI), shown in Figure 7, is attached to ARGOS via the Festo muscle and to the suited subject. Other gimbals and harness setups are available and utilized depending upon test objectives. The VNCHI gimbal design is intended to connect a human test participant to the ARGOS in the horizontal position for microgravity simulation. The intent of the VNCHI gimbal assembly is to provide roll, pitch, and yaw rotations about the test participant's center of gravity (CG) while connected to ARGOS in the horizontal position. The gimbal attaches rigidly to the test participant's hang-gliding harness, in which the participant lies securely. There are adjustments to align the participant's CG with the lifting path, so the CG is always centered under the ARGOS cable. The gimbal consists of custom Aluminum 6061-T6 and 15-5 PH Stainless Steel parts with COTS bearings and fasteners.

Figure 7: VNCHI Gimbal Assembly

Emergency Egress
In the event of a power outage or system failure that prevents the function of ARGOS, the test participant will be removed from the system by a rolling staircase ladder. If the treadmill is being used at this time, a small staircase ladder will be placed on the treadmill deck and the participant will walk down the ladder. For a power outage with a robotic system, the load will be treated as a suspended load and removed after power has returned to the facility. In the event the test participant becomes injured and is unable to walk down the staircase ladder, a Sky Genie variable descent device will be deployed to lower the person to the ground. The Sky Genie was used by the Space Shuttle program for crew member emergency egress from the orbiter. It is shown below in Figure 8. Prior to each use, the Sky Genie hardware, rope, and cables are inspected for cuts, frays, broken strands, or other visual damage. The rope is changed out after two years of use and has a shelf life of 5 years. The attachment point onto the z-axis is rated for a 4945 lbf (22 kN) load as required by OSHA and the vendor documentation. The Sky Genie is attached to the z-axis and lifting path by locking carabiners. The Sky Genie is a controlled descent device and is not intended for use as a fall protection system.

Figure 8: Genie Assembly without Rope

For testing with individuals in space suits or other loads where the preference may be to remove the load with the man basket instead of the Sky Genie, a 4 x 8 feet (1.2 x 2.4 m) COTS man lift basket attached to the fork lift is used to lower the load to the ground. Personnel riding in the man basket are required to wear fall protection equipment. When required, this equipment and a certified operator must be in the ARGOS area during the testing.

Controller
See Controller System, Section 6.5

Electronics
See Electronics System, Section 6.6

Software
See Software, Section 6.7

Mockups
In the ARGOS test area several floor mockups are used to simulate space station hand rails, bolt torquing, different rock surfaces and interactions. These mockups are moved in and out of the test area as needed. These mockups do include rocks and the hazards associated with handling rocks. The use of hand tools and battery powered drills is part of the tasks conducted with these mockups.

6 Investigation
6.1 Interviews
Limited interviews were conducted, as witness statements were taken by the ARGOS team immediately after the close call. Two interviews were conducted.

The first interview was with the subject crew member in the close call. This test run was the test subject's first experience in the ARGOS, so there were no comparisons he could draw on as far as how the system behaved. He also stated that since he was in a modified ACES suit, with headphones on, he was insulated both physically and from external noises. He said that although he was dropped 12 to 18 inches (30.5 to 45.7 cm) onto the handrail, and the harness fell on top of him, he was not injured. He did experience a fairly good impact when the face plate of the helmet hit his jaw. He also stated that it was difficult to tell where resistance came from when translating, whether the suit (pressurized at 4.3 psi (29,650 N/m²)) or ARGOS. Just prior to the incident, he was translating along the handrail, using both pull and push, but was not commanding a downward motion.

The second interview was with the Safety and Test Operations Division subject matter expert on lifting requirements in NASA Standard 8719.9. Most of the requirements in this document were believed to have been met by the ARGOS team, but there are some exceptions, specifically with the control system design and the limit switch configurations. The Board members and the subject matter expert did agree that ARGOS has unique performance requirements and that Chapter 4 was the closest fit lifting system in NASA Standard 8719.9 in terms of providing guidance to the design team.

6.2 Mechanical System

For the purposes of this investigation, only the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) will be discussed. See Figure 9. The Heavy Lift Assembly is a NASA designed electromechanical system whose basic function is to raise and lower a suspended object or human in response to commands issued from a force feedback control system. The object is suspended via a Hoist Cable wrapped around a spiral cut Drum which can rotate and translate. The rotation of the Drum provides the change in object elevation, while the translation (synchronized to the spiral lead) maintains a constant cable exit point and prevents cable layering. The assembly contains redundant fail safe brakes and an integral servomotor brake that will engage to prevent Drum rotation when power is removed.

Connected in series to the Drum is a constantly meshed two-speed transmission. The transmission makes use of helical cut gear sets to reduce vibration and driveline noise so that disturbance inputs to the force feedback control system are minimized. The transmission contains two manually selectable gear ratios:
1. Unsuited Gear Ratio: This ratio is used for objects weighing less than 300 lbf (1334 N)
2. Suited Gear Ratio: This ratio is used for objects weighing less than 750 lbf (3336 N)
Note: The use of Suited/Unsuited does not describe the configuration of the test object.

The gear ratios have no synchronization and require complete offload before selection. The selection mechanism is comprised of a Shift Fork connected to a Clutch Plate with anti-friction nylon pads. The Shift Fork moves the Clutch Plate between the desired gear ratios by use of a spline drive, driven externally and manually by a Gear Selector Knob. Positive indication of transmission engagement is accomplished visually by locking the Gear Selector Knob into position and electronically by end of travel limit switches.

Connected to the transmission is an AC servomotor manufactured by Kollmorgen, driven by an off the shelf motor controller and commanded by a NASA designed control system. The control system receives object position data from an absolute encoder geared off the Drum's rotation shaft, an integral AC motor encoder, and two Drum end of travel limit switches.
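The relationship between Drum rotation, cable payout, and Drum translation described above can be sketched as follows. The drum diameter and groove lead used here are hypothetical placeholders, since the report does not state them in this section; only the 1.2 m figure comes from the incident description.

```python
import math


def drum_kinematics(revolutions, drum_diameter_m, groove_lead_m):
    """Return (cable payout in m, axial drum translation in m) for a spiral cut drum.

    Cable length per revolution is the length of one helical wrap; the drum
    translates one groove lead per revolution so the cable exit point stays fixed.
    """
    wrap_length = math.hypot(math.pi * drum_diameter_m, groove_lead_m)
    return wrap_length * revolutions, groove_lead_m * revolutions


# Hypothetical geometry, for illustration only.
DRUM_DIAMETER_M = 0.15
GROOVE_LEAD_M = 0.006

payout, shift = drum_kinematics(1.0, DRUM_DIAMETER_M, GROOVE_LEAD_M)
print(f"one revolution: {payout:.3f} m of cable, drum shifts {shift * 1000:.1f} mm")

# Revolutions needed to release 1.2 m of cable with this assumed geometry.
print(f"revolutions for 1.2 m of cable: {1.2 / payout:.2f}")
```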


Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview

6.3 Hardware Inspection

On January 28, 2013, the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) was removed from the Heavy Lift X-Axis Assembly (ARGOSSTE502) and placed on a disassembly bench in NASA JSC Building 9. The hardware inspection team was comprised of all Board members and representatives from the Software, Robotics, and Simulation Division (ER). The goal of the inspection was to evaluate the hardware for any signs of binding, seizing or jamming using the following approach:
- Prior to disassembly, measure the system's running torque under no load
- Perform a complete visual inspection of all accessible rotating parts
- Develop a focus area, comprised of major components most likely to cause mechanical failures
- Disassemble the items within the focus area incrementally to allow for visual inspection and photography

Figure 10 shows an exploded view of the Heavy Lift Assembly and identifies the focus area and its major contributors. Each major component is identified by an item number for reference in subsequent discussions.


Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View

6.3.1 Running Torque Measurements

Initial inspection of the hardware using external torque measurements was performed with the unit intact. Using a calibrated dial-type torque wrench, both breakaway and running torque measurements were taken on the output shaft (Drum rotation axis) for various positions of the Gear Selector Knob (see Figure 11). Measurements were performed by Board member Joe Anderson, with care taken to minimize inertial loading on the measurement device.

Figure 11: Torque Measurement Site

The position of the Gear Selector Knob was varied between different states to evaluate drag from the selector mechanism's rigging. Measurements for each knob position were repeated a minimum of three times, and the averages are presented in Figure 12. The results show that the lowest torque was achieved in the neutral position (fewest rotating components) and that the Suited and Unsuited locked gear selections resisted with approximately 30 in-lbf (3.4 N·m). A noticeable change in torque was seen when the Unsuited selection was toggled between locked and unlocked. The cause was a rigging method that allowed the internal Shift Fork (Item 11) to be preloaded against the rotating Clutch Plate (Item 5), introducing frictional drag into the gear train. Other than the friction effect noted, no anomalies were discovered, and the gear train rotated smoothly under no load.
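The averaging and unit conversion behind the quoted torque value can be reproduced with a few lines. This is a sketch under stated assumptions: the three readings are hypothetical (the individual measurements are not listed in this section), and the function names are invented here. Only the conversion factor, 1 in-lbf = 0.112984829 N·m, is a fixed physical constant.

```python
from statistics import mean

IN_LBF_TO_NM = 0.112984829  # 1 in-lbf expressed in N*m


def inlbf_to_nm(torque_inlbf: float) -> float:
    """Convert a torque from inch-pounds-force to newton-metres."""
    return torque_inlbf * IN_LBF_TO_NM


def average_torque(readings_inlbf) -> float:
    """Average repeated readings, requiring the minimum of three used by the Board."""
    if len(readings_inlbf) < 3:
        raise ValueError("at least three readings are required per knob position")
    return mean(readings_inlbf)


if __name__ == "__main__":
    # Hypothetical repeat readings for one locked gear selection, in in-lbf.
    readings = [29.0, 30.0, 31.0]
    avg = average_torque(readings)
    print(f"{avg:.1f} in-lbf = {inlbf_to_nm(avg):.1f} N*m")
```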

Figure 12: Torque Measurements for Various Knob Positions

6.3.2 Incremental Disassembly and Sample Collection

Following the torque measurements, the Board tasked the ER design team (Paul Valle and Dian Poncia) to start disassembly. Disassembly followed an incremental process of component removal, visual inspection and material sampling. The following collection of images (Sites 1 through 10) shows the areas of the gearbox that were noted as critical inspection points and where specific material samples were taken. See Section 8.2 for a detailed chemical analysis of the collected samples. Refer to Figure 10 for item number references.


Inspection Site 1: This site contained excess lubricant and particulates on the Clutch Plate (Item 5). This area was of particular interest due to the increased running torque recorded during the pre-disassembly torque tests.

Inspection Site 2: This site was used to obtain a fresh grease sample for use in setting a baseline for subsequent materials evaluation.

Inspection Site 3: This site contained additional particulate debris on the Clutch Plate (Item 5). The Clutch Plate area is of particular interest as it is used to transmit motor loads to the two available gear sets. Due to the close proximity of rotating components and their inherent misalignments, the probability for mechanical interference and debris generation is increased in this area.


Inspection Site 4: This site contained grease and residue from the interaction of the Output Gear (Item 1) and the Unsuited Gear (Item 2).

Inspection Site 5: This site contained grease and metallic debris caused by unintended contact between the Suited Gear (Item 4) and the Snap Ring (Item 10).

Inspection Site 6: This site contained grease and particulates from the interaction of the Suited Gear (Item 4) and its adjacent Thrust Washer (Item 7).


Inspection Site 7: This site contained grease and metallic particles generated from dithering action between the Shifting Shaft (Item 6), its drive gear and a closeout snap ring. By design these items are keyed to permit torque transmission; however, excess clearance and a hardness mismatch led to galling and wear. The design calls for a spiral retaining ring to be installed in the indicated position, but the actual hardware had an open snap ring.

Inspection Site 8: This site contained a small piece of plastic debris (Delrin) located on the RH Torque Spline (Item 12). This debris was most likely dislodged from the spline nut located on the Shift Fork (Item 11). Inspection of the nut shows signs of wear, but no significant failures.

Inspection Site 9: This site contained a piece of plastic debris (PVC and Kapton) located on the bottom of the gearbox housing. The debris generation site is unknown and is not seen as an incident contributor.


Inspection Site 10: This site contained metallic debris generated by the interaction of the Hoist Cable (Item 13) and the spiral cut drum. Post inspection of the drum and cable showed no signs of detrimental wear or erosion.


6.3.3 Detailed Inspection of Major Components

After disassembly, the major components from the focus area were sent to the Structural Engineering Division (ES) for a closer examination:

Table 2: Listing of Major Components within Focus Area

As mentioned earlier, items not listed in the table above, such as the Drum, Linear Guides, Motor, and Radial Ball Bearings, were deemed non-contributors to any gearbox faults. The ER division was left in control of the non-listed items; however, it was asked not to perform any side investigations. A summary of the major findings from the examination of the listed items is presented below. See Section 8.3 for the complete listing of findings. Refer to Figure 10 for item number references.

Item 1, Output Shaft 36T Gear, ARGOSZAD471: The face and outer teeth edges of the Output Gear show significant signs of wearing and chipping due to unintentional contact with the Unsuited Gear's Dog Plate (Item 2).


Item 2, 18T Gear Assembly, ARGOSZAD448 (Unsuited Gear): The face and outer edges of the Unsuited Gear's Dog Plate showed signs of unintentional contact with the Output Gear (Item 1).

Item 3, Rush Gear 36T, ARGOSZAD455: The face and outer teeth edges of the Rush Gear 36T show significant signs of wearing and chipping due to unintentional contact with the Suited Gear's Dog Plate (Item 4). Furthermore, the gear's shaft exhibited 0.03 in. (0.076 cm) of axial free play, further increasing the contact potential.

Item 4, 15T Gear Assembly, ARGOSZAD446 (Suited Gear): The face and outer edges of the Suited Gear's Dog Plate showed signs of unintentional contact with the Rush Gear 36T (Item 3).


Item 5, Clutch Plate, ARGOSZAD450: The Clutch Plate's annular-sector-shaped cutouts (6X) show signs of uneven loading. Load contact patterns generated by the Dog Plate Teeth are located on the radial face, the inner diameter surface and the outer diameter surface; ideally, all six radial surfaces would be equally loaded. Uneven loading causes overturning moment loading on both the Unsuited (Item 2) and Suited (Item 4) Gears. Unaccounted-for moment loading reduces needle bearing life and causes misalignments, leading to the mechanical interferences seen on the Unsuited and Suited Gears (Items 2 & 4), the Output Gear (Item 1) and the Rush Gear 36T (Item 3).

Item 6, Shifting Shaft, ARGOSZAD442: The Shifting Shaft shows signs of the following:

- Uneven loading from the Unsuited (Item 2) and the Suited (Item 4) Gear Needle Bearings due to incompatible diameter sizing
- Surface brinelling due to needle bearing edge loading
- Surface wear due to inadequate surface hardness


Items 7 and 8, Thrust Washers, 7421K26 & 7421K29: The Thrust Washers used to isolate the Unsuited (Item 2) and the Suited (Item 4) Gears from the Shifting Shaft (Item 6) experienced wear from exposed Dog Plate fasteners.

Item 9, Key, ARGOSZAD494: The Key is used to anti-rotate the Shifting Shaft (Item 6) with respect to its drive gear. The key was hand fit during assembly to a length that allowed it to become lodged under the Unsuited Gear (Item 2). The interference is not a contributor to the incident, since no relative motion occurs between the Shifting Shaft and Unsuited Gear during Unsuited Gear operations. The interference will only be problematic for Suited Gear operations.


Item 10, Snap Ring, VS-100: Excessive clearance between the Suited Gear (Item 4) needle bearings and the Shifting Shaft (Item 6) caused the snap ring to be side loaded with relative motion.

Item 11, Shift Fork, ARGOSZAD465: The Shift Fork contains anti-friction pads (nylon) to engage the Clutch Plate (Item 5). During pre-disassembly running torque measurements, it was noted that the Shift Fork was preloaded into the Clutch Plate during Unsuited operations. The effect of this preload is apparent when examining the nylon pad wear patterns.


Item 12, RH Torq Spline, ARGOSZAD467: The Torq Spline, which is used to drive the Shift Fork (Item 11) between the Unsuited (Item 2) and the Suited (Item 4) gear selections, showed no signs of failure or wear. However, examination of the Spline mount design shows that an unintentional clamp-up at Location A and an interference at Location B are possible.

Item 13, Hoist Cable, AI 4FZC: The Hoist Cable, which interfaces with the Drum and the test participant, showed no signs of failure or wear.


6.4 Fault Tree

The Board was not chartered to create an independent fault tree for this incident. However, the Board did review the fault tree that the ARGOS Project generated as shown in Figure 13.
Figure 13: ARGOS Project Created Fault Tree

Fault (top event): Rapid Descent of Crewmember in ACES space suit while testing in micro-g. Crew member impacted hand rail.

Binding path (box text from the figure):

- Binding of Gear Box Came Free
- Output drum could not rotate
- Based on data: position data indicates no motion in the system
- Feedback from the motor encoder indicated no motion
- Motor controller increased current to the motor
- Motor controller operated as designed/programmed
- Could Trick software be modified? Could motor controller be tuned differently to prevent?
- High friction, binding, burr, or failure of shift fork pads on shift fork to clutch interface
- Manual turning of the drive determined that higher forces than expected were required
- Shift fork is misaligned
- Inspection of gear box required
- Visual inspection of gear box, damaged metal finish

Other candidate causes and their dispositions:

- Other unknown or new failure mode. Stop: (none stated)
- Incorrect Software. Stop: System software was verified correct during startup and verified again after incident (screen shot available).
- Z-axis control loop went unstable. Stop: Software froze correctly; data indicates it continued commanding until frozen. After event, software was restarted and operated nominally.
- Motor commanded downward by software outside of the control loop. Stop: Software froze correctly; data indicates it continued commanding until frozen. After event, software was restarted and operated nominally.
- Trick Software Failure. Stop: Software froze correctly; data indicates it continued commanding until frozen. After event, software was restarted and operated nominally.
- ARGOS Came Out of Gear. Stop: After event, shift knob was still locked in place. Microswitches did not indicate an out-of-gear condition. Out-of-gear check between encoders was not activated.
- CAN network communication failed. Stop: NODE Guard error checking did not detect an error. All data is correct.
- Motor controller and motor failed due to EMI. Electronics had been tested with known EMI sources in the building.
- Z-axis electronics box failure; electronics failed due to EMI. Electronics had been tested with known EMI sources in the building, and a custom load cell was developed to prevent EMI interference. EMI filters are present on all power input lines. Stop: All data is correct. Load cell and encoder data was correct when checked after the event.
- Output Shaft Encoder Failure. Stop: All data is correct. Encoder data was correct when checked after the event.
- Load Cell Failure. Stop: All data is correct. Load cell data was correct when checked after the event.
- Power Outage or Sag. No F16 was received on motor controller, and data indicate motor operation.
- Safety System Failure. ARGOS safety system performed as designed during the event.
- Shutdown of the motor controller output stage. Data indicates the motor controller was active during the free fall.
- Failure of the ARGOS Brakes. ARGOS brakes were not activated during the free fall. They did lock when commanded.
- Faulty Cables. No motor controller faults were present indicating cable failure. No CAN bus failure was present. Data transmission was correct after event. Motor powered properly after event.

The following is a list of general findings from the Project's fault tree:

- The fault tree correctly identified the fault: Rapid Descent of Crewmember in ACES space suit while testing in micro-g
- There are 14 level-1 causes in the tree
- Of the 14 level-1 causes, the project decided to work on only the 2 paths that are related to binding
- Interviews indicated that the project is biased toward binding being the causal path
- The project needs some expert facilitation with the development of a fault tree and the associated root cause analysis


6.5 Control System

The Board was chartered to review the ARGOS z-axis control system and controller to determine whether the incident resulted from a controller failure or a non-modeled event. Findings and recommendations are to be reported; however, the Board is not chartered to resolve issues or to design a controller. To satisfy these goals, the Board met with the ARGOS control engineers and researched pertinent documentation. The investigation results follow.

In summary, the Board concludes that there is insufficient data to state that the controller response was erroneous or that the controller was unstable. A major contributing factor is that the control logic for the motor controller/motor is proprietary; therefore, no knowledge of what was occurring in this unit during the incident could be ascertained. Also, no control system simulation was developed, so off-nominal conditions such as binding and their effects could not be analyzed. Finally, the recorded test data was insufficient: it did not include the outputs required to characterize the control response. In the absence of sufficient testing, modeling and vendor information, the rationale for a rapid downward controller command is indeterminate, and is discussed in this section.

The investigation of the ARGOS z-axis control system and controller is based on the criteria of meeting design, development, testing and evaluation (DDT&E) processes. The DDT&E process elements include: 1) a detailed block diagram of the control system; 2) simulations for time domain and stability analysis; 3) defined performance and stability requirements; 4) a test matrix to analyze stability and performance and to verify requirements; 5) documentation of all work. After meeting with the ARGOS control engineers, it was determined that there is no dedicated control systems document, no detailed block diagram, and no simulation of the control system.

A high-level description of the control system is presented in document SRSD-11-016, Failure Modes and Effects Analysis (FMEA) Active Response Gravity Offload System Generation 2. The control system (Figure 14) consists of a proportional/derivative (PD) outer loop, an inner loop consisting of the motor controller and motor using proportional-integral-derivative (PID) logic, and a load cell registering the force on the cable. The inner loop is a black box: its PID controller is proprietary, so there is little insight into its makeup. Additional elements include the gearbox, cable drum, encoders, A/D converters, saturation limiters, hysteresis, latency, and converters.

Figure 14: High Level Control Loop

A detailed block diagram of the control system is a critical early design step and is required for analysis prior to human rating the hardware. It should have been produced whether or not the hardware was to be human rated. The block diagram defines all critical control loop parameters needed to complete the design of a system; without it, the control loop parameters cannot be determined correctly. Even with a black box in the loop, it is possible to characterize the system (at least to prove stability) with testing. It can then be determined whether the safety systems are fast enough to protect the test subject and whether the bus speed is sufficient to provide communications between the blocks without causing delays.

During discussions with the ARGOS control engineers, it was learned that the control system was tuned by lowering the gain on the inner loop (black box) and adjusting the outer loop gains until the desired performance was attained. A concern is that, because the inner loop gain is low, the outer loop gain has to be higher to drive the controller during rapid changes in speed. This can saturate the amplifier on the previous stage, resulting in a nonlinear response. To fully characterize the control system, knowledge of the inner loop PID is required or, at a minimum, construction of a transfer function. The ARGOS team stated that they tried but could not develop a model due to the non-linear response. It is the Board's recommendation that the ARGOS team contact control system engineers in other Engineering Divisions (Aeroscience and Flight Mechanics, and Structural Engineering) in an effort to develop a model.

An ARGOS simulation of the control system is required to conduct performance analysis with off-nominal conditions and failures, Monte Carlo runs, and stability and frequency response analysis. The simulation will support the ARGOS certification and provide insight into system response for off-nominal conditions. If a simulation had been developed, a reenactment of the incident could have been run to observe the system behavior, pointing to the cause and a potential workaround. A simulation will require a detailed ARGOS block diagram with representative modeling of the elements. The ARGOS team decided instead to build the actual unit and test with it. There is a limit to what can be tested on the ARGOS unit, to the frequency of testing, and to data gathering. It is the Board's recommendation that the ARGOS team develop a simulation that can characterize the system by performing frequency response, stability analysis, constrained motion testing, interaction between the horizontal and vertical controllers, and Monte Carlo runs.

It was difficult to find documentation of how the control system was developed and finally verified. In some cases detailed documentation did not exist. There was a test matrix for the ARGOS unit; however, since it was applied to the actual unit, there were limitations on what could be tested. It is uncertain whether all the control system performance and stability requirements were tested on the ARGOS unit, which again argues for a simulation. Detailed documentation for all levels of the DDT&E process should be completed. Without this documentation it is nearly impossible to reconstruct the control system and its expected performance.

Applying the above observations to the incident explains the indecisive result. What can be backed out from the available data is that the motor was being commanded to move but no motion was seen (possible binding). It is assumed that the inner loop PID controller (black box) continued to increase current to the motor until it broke loose. It is possible that there was windup on the integrator term, so that once the binding was overcome it took the system time to respond. This is a guess, since the PID block diagram is proprietary. It is also plausible that the controller became unstable under a binding condition; however, there is no way of determining this. Implementing a DDT&E process as outlined above will reduce the possibility of this type of event occurring.
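The integrator windup hypothesis can be made concrete with a toy simulation of the kind the Board recommends. The sketch below is not the ARGOS controller and not the proprietary Kollmorgen logic: it assumes a simple proportional-integral velocity loop driving a unit inertia, with hypothetical gains, a hypothetical breakaway threshold, and a binding model in which the load stays stationary until the command exceeds that threshold. It only illustrates the general mechanism: while the load is bound, the integral term keeps growing, and when the load breaks free the stored integral drives the velocity well past the target before it recovers.

```python
def simulate(breakaway: float, steps: int = 1000, dt: float = 0.001) -> list:
    """Simulate a PI velocity loop on a unit inertia and return the velocity history.

    breakaway: command magnitude needed to free a bound load (0.0 means no binding).
    All numeric values are hypothetical and in arbitrary units.
    """
    kp, ki = 40.0, 400.0      # assumed PI gains (critically damped closed loop)
    target = 1.0              # commanded velocity
    integral = 0.0            # integrator state
    velocity = 0.0            # load velocity
    bound = breakaway > 0.0   # True while the load is mechanically stuck
    history = []
    for _ in range(steps):
        error = target - velocity
        integral += error * dt              # integrator accumulates even while bound
        command = kp * error + ki * integral
        if bound and abs(command) >= breakaway:
            bound = False                   # binding comes free
        if not bound:
            velocity += command * dt        # unit inertia: acceleration equals command
        history.append(velocity)
    return history


if __name__ == "__main__":
    nominal_peak = max(simulate(breakaway=0.0))
    bound_peak = max(simulate(breakaway=200.0))
    print(f"peak velocity, no binding:   {nominal_peak:.2f}")
    print(f"peak velocity, with binding: {bound_peak:.2f}")
```

In this toy model the peak velocity after release is several times the commanded value, whereas the unbound case overshoots only slightly. A real analysis would need the actual loop structure, saturation limits and plant dynamics, which is the information the Board found to be unavailable.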


6.6 Electronic System
The ARGOS Z-Axis electronic system consists of 3 subsystems, as can be seen in Figure 15 below.

Figure 15: Electronics System Block Diagram
(Block labels: Power Distribution; 3 Phase 208VAC; 24VDC; 5VDC; E-Stop System; Safety System; Fault Detection; Motor; Motor Controller; Control Loop; Gear Box; Brakes; Limit Switches; Node Guard; WatchDog Timer; CAN bus; Gear Selector Switches; Encoder; A/D Converter and CAN Converter; Node Guard Heartbeat via CAN bus; Computer running Trick Simulation; Interface to Load; Data Collection; Load Cell; Load Cell Amplifier, A/D Converter, and CAN Converter; RS232; Horizontal System.)

The following sections give a brief description of each of the 3 subsystems within the Z-axis system.

6.6.1 E-Stop System

This system is a dedicated safety system which monitors fault status from the Motor Controller and senses the upper and lower crane limit switches and position encoder and performs safety hazard controls (i.e. outputs to the lifting system brakes and disables the motor controller).


6.6.2 Z-Axis Motor Controller

This system is a COTS system supplied by the vendor of the Z-axis motor. It consists of 2 closed-loop control systems implemented via complex electronics: a PID control loop and a motor current control loop. The COTS system does not provide any electronic mechanism for time synchronizing the 2 internal control loops with the outside world. The system provides the following major external status/control interfaces, which the overall ARGOS electronic system uses:

Discrete fault output and enable input: This interface is used by the E-Stop System to perform emergency stops via the brakes and to disable the motor controller via the enable input.

6.6.3 CAN Bus Interface

This interface is used by the Trick Simulation to send motor control commands and to receive available status from the motor controller. This interface is also used to disable the motor controller when faults are detected. NOTE: Even though the CAN Bus is a very deterministic interface (i.e. time synchronized), the motor control loops are not synchronized with the Trick simulation computer or software.

6.6.4 Computer Running Trick Simulation

The computer that runs the ARGOS software is a COTS computer. The Trick Simulation software uses 5 interfaces to perform the outermost ARGOS control system and data acquisition functions:

- Z-axis Motor Controller Interface: This is the CAN Bus interface described above in the Motor Controller section.
- Z-axis Load Cell Interface: This interface is accomplished via a CAN Bus enabled A/D converter and a load cell amplifier.
- Z-axis Gear Selection Switches Position Interface: This interface is accomplished via a CAN Bus enabled A/D converter to read the position of the gear selector.
- Z-axis Drum Encoder Interface: The ARGOS system has a position encoder separate from the one internal to the Z-axis Motor Controller. This interface is also a CAN Bus.
- X-axis and Y-axis Horizontal System Interface: This computer also interfaces to the X-axis and Y-axis control systems via asynchronous RS-232 digital interfaces.

The next section, on software, addresses any computing resource limitations for this computer.

No electronic system design deficiencies were found by the Board. However, the lack of time synchronization between the two internal control loops of the Z-axis Motor Controller and the overall Trick Simulation control loop does present a challenge to the overall control system modeling effort. Please see the Control System section for related control system modeling findings.


6.7 Software

6.7.1 Background

This section of the report investigates the case of a software fault causing the motor to unintentionally drive down at maximum velocity. The ARGOS software is under development by NASA using the Trick Simulation environment to provide force feedback control system functionality as well as certain system safety parameters required to operate the ARGOS. This software is used by the ARGOS console operator to perform system setup, operation, and some of the emergency fault detection and response. The scope of this analysis was to determine the following:

- Approved software was in use on ARGOS during the incident
- Approved software followed the ARGOS configuration management plan, and all modifications were approved per the plan
- Regression testing was performed on ARGOS software safety functions
- Software fault detection logic
- Test data recorded during the incident was representative of the system parameters being measured
- Findings and recommendations to apply to the ARGOS software development process (see Section 8.2.3)

6.7.2 ARGOS Hoist Control System Background

The overall ARGOS Hoist control system works to maintain a target offload force in the lifting cable, which results in a reduced gravity (or microgravity) simulation for the test participant. The two key components of this control system are the Trick Simulation ARGOS controller and the Kollmorgen Servostar S620 motor controller, which work in conjunction with various sensors to make up the overall ARGOS control system (Figure 16). All control system calculations outside of the motor controller make up the ARGOS Controller, written in the Trick Simulation environment.

Figure 16: ARGOS Control System Block Diagram

The ARGOS controller is implemented with the NASA Trick Simulation environment running on a CentOS Linux workstation. The computer runs a one-millisecond control cycle, commanding a Kollmorgen Servostar S620 motor controller over a CAN Bus network. The Trick Simulation provides most of the system integration: it reads the cable tension, output drum encoder and gear selection switches, and communicates with the S620 motor controller. Figure 16 is to be viewed as a high-level loop of the controlling components in ARGOS. Multiple controllers are embedded in the ARGOS controller and the motor controller. These are discussed in the controls analysis of the investigation report.


6.7.3 Software Validation

The software validation evaluates the ARGOS Controller block described in Figure 16. This software is the NASA-developed Trick Simulation performing the force feedback control logic and a number of fault detection scenarios. The ARGOS software falls under the requirements of the Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The plan outlines requirements for the use of version control software to manage released versions of production code. All software modifications on ARGOS are approved through a Test Readiness Review (TRR) prior to any human off-load testing. The software version being run is verified on a daily checklist performed prior to each day of operations on ARGOS.

The approach to verifying that approved production software was in use on the ARGOS during the incident included evaluating whether the configuration management plan steps were being followed, documenting that the approved software was running, and ensuring that source code modifications were approved by a TRR. The requirements per the ARGOS CM plan (SRSD-08-005.A) include the following:

- Software changes are approved through a TRR
- Each software release is given a version description and control number and is managed with a software version control application
- A change request is processed by the ARGOS Configuration Control Board

An ARGOS operations daily checklist (Reference 9) ensures that the production software executable is selected when running the ARGOS Control Software. A common Linux application was utilized to perform a difference check between the ARGOS source code prior to and after the most recent TRR that approved software modifications.
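The kind of difference check described above can be sketched as follows. The report does not name the Linux application that was used, so this example substitutes Python's standard `difflib` module. The file contents and the helper name `changed_lines` are invented for illustration and are not ARGOS source code.

```python
import difflib


def changed_lines(before: str, after: str, name: str = "source") -> list:
    """Return only the added (+) and removed (-) lines between two file versions."""
    diff = list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (pre-TRR)",
        tofile=f"{name} (post-TRR)",
        lineterm="",
    ))
    # The first two entries are the '---' and '+++' file headers; skip them,
    # then keep only lines that were added or removed.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical file contents, invented for this example only.
    before = "synccount = 10\nread_position()\n"
    after = "synccount = 50\nread_position()\nread_velocity()\n"
    for line in changed_lines(before, after, "example.cpp"):
        print(line)
```

Each reported line can then be compared against the list of changes approved at the TRR, so that any unapproved modification stands out.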

6.7.4 ARGOS Software Configuration Management

Based on an interview with the ARGOS software developer, the most recent change to the ARGOS software was the addition of the output drum encoder velocity as a biasing term in the controller to maintain the current velocity. This modification was approved by the ARGOS EMU TRR conducted on 11/26/2012. The information provided in the TRR slides is shown in Figure 17. This provided a baseline for checking that the ARGOS daily checklist was updated to the current software version and that this version was operating on the ARGOS Control Computer. The daily checklist (Figure 18 and Figure 19) shows that the operator confirmed the verification step. The ARGOS computer screen capture taken after the incident also indicates that the correct software path was opened (Figure 20).
Figure 17: Slide from EMU TRR Software

Slide header: NASA Johnson Space Center, Engineering Directorate. Subject: Software. Name: Larry K. Dungan. Date: November 2012. Page: 22.

Slide text:

- The z-axis software has been updated to improve the realism of the offload simulation
- New variable allows motor velocity to influence continued motion of the system, i.e., allows the load to coast until the equal and opposite force is received
- All safety systems and controls are unchanged
- Motor velocity graph was changed from motor RPM to linear velocity
- Software has been fully tested with load
- Software has been revised and released per the ARGOS CM plan
- Procedure has been updated for new steps and revision number

Figure 18: ARGOS Startup Checklist from day of incident

Figure 19: ARGOS Startup Checklist Path Verification (step 54)


Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident The source code used to build the production software was reviewed to determine if the modifications to the software were consistent with the TRR approval. This required a review of all source files to identify changes that were implemented from the previous software version. The changes being evaluated were for the addition of a velocity based variable to be included into the ARGOS Control software and a change to how the motor velocity would be displayed to the operator in terms of load linear velocity rather than motor RPM. The files reviewed were the following: Filename ARGOSApplication.java ControlApplication.java ATM60.hh ATM60.cpp S620.cpp adaptive.tv data_record.dr input.py S_define Date Modified 11/15/2012 11/15/2012 11/13/2012 11/14/2012 11/14/2012 11/19/2012 11/28/2012 12/13/2012 11/20/2012 Variables saved to log file Setup parameters Main TRICK program Description GUI for velocity control variable GUI for velocity control variable Header file for external encoder Configure external encoder to provide velocity data Motor controller

Table 3: Trick Simulation Source Code Files Each of the modified files were consistent with the modifications approved by the ARGOS EMU TRR. The file, ARGOSApplication.java, included changes to allow for showing the new version number on the screen, the setting a velocity gain variable Kv, and limiting the minimum and maximum values for Kv. The file, 33

ControlApplication.java, allowed for the screen layout to include the new variable. The files, ATM60.hh and ATM60.cpp, control what information is collected from the ARGOS output drum encoder. This sensor is able to provide position and velocity over the CAN bus network and the files were modified to configure the encoder to add the velocity output from the encoder to the CAN network data being used by the Trick Simulation ARGOS controller. The function readposition() used to read the encoder position was updated to also read the velocity variable. The file, S620.cpp, was updated in increase a variable synccount used in the MessageInfo() object from 10 to 50. This changed the rate that the s620 motor amplifier would provide actual motor RPM to the Trick Simulation. This motor velocity variable is not used in the ARGOS control algorithm and is only used for troubleshooting. The Velocity variable used in the controller comes from the output drum encoder (ATM60). The file, data_record.dr, controls what the Trick Simulation environment records to a data file. The modifications all reflect the new velocity controller variable. The velocity variable from the motor controller was removed and replaced with the velocity variable from the ATM60 output drum encoder. A conversion to linear velocity is a calculated variable that is logged to the data file. A calculated variable to convert the motor RPM command to a motor linear velocity command is logged to the data file. The velocity gain variable is logged to the data file. The purpose of converting rotational velocity to linear velocity was that the data is more intuitive to ARGOS customers and operators when operating the system and reviewing test data. The file, input.py, is a test parameter file. This file includes ARGOS system limit parameters that are configured based on the test configuration. 
These parameters are modified after the TRR; they define the operational motion limits for the system in the current configuration and set fault detection thresholds. Monitored virtual physical limits include a virtual soft stop motion limit and a virtual hard limit. The soft limit commands zero velocity to the motor controller and allows the ARGOS operator to back out of the limit position without a system fault. The virtual hard limit causes the ARGOS Trick Simulation to command a motor controller disable, initiating the emergency zero velocity ramp command to approach zero velocity and throw the external brakes. Parameters that control fault detection for the load cell measurement include the magnitude of unacceptable error between the target off-load force and the measured force, along with a duration. Also, an unacceptable minimum or maximum force value will fault the system and result in the brakes locking the system. These parameters were modified due to changes in the lifting path components and changes in the ARGOS system height.

The file, S_define, is the main Trick Simulation control loop. This source code was modified to include the velocity component of the control system algorithm.

To conclude, the ARGOS software configuration management was followed as defined in the approved Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The software has existed as a component of the overall ARGOS development and has not been identified as a specific software project by the Engineering Directorate. Under current NASA process at JSC, the software developed within the Engineering Directorate would be required to follow the process described by EA-WI-035 Software Project Management and Development.
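The virtual soft and hard limits configured through input.py can be pictured with a minimal Python sketch. It is illustrative only and is not the ARGOS or Trick source: the function name, the parameter names, and the sign convention (positive velocity moves toward the upper limit) are all assumptions.

```python
def apply_virtual_limits(position, velocity_cmd, soft_lo, soft_hi, hard_lo, hard_hi):
    """Return (velocity_command, action) for one control cycle.

    Hypothetical helper: every name and the sign convention are assumptions.
    """
    # Hard limit: request a motor controller disable (ramp to zero, brakes).
    if position <= hard_lo or position >= hard_hi:
        return 0.0, "shutdown"
    # Soft limit: zero only those commands that drive further into the limit,
    # so the operator can still back out without a system fault.
    if position <= soft_lo and velocity_cmd < 0.0:
        return 0.0, "soft_limit"
    if position >= soft_hi and velocity_cmd > 0.0:
        return 0.0, "soft_limit"
    return velocity_cmd, "ok"
```

In this sketch a command that backs away from a soft limit passes through unchanged, which matches the description of the operator being able to back out of the limit position.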

6.7.5 Software Regression Testing

Per the ARGOS CM plan, SRSD-08-005.A, software regression testing is based on the requirements determined by the ARGOS project lead and approved by the Test Readiness Review. The most recent change to the ARGOS software, which added a velocity gain parameter, did not document requirements for regression testing of the safety functions that perform fault detection logic in the system. Testing was limited to controller stability under nominal drive conditions, evaluated with test inputs that included high velocity and impulse inputs into the force feedback system. This determination was made based on the logic that none of the fault detection logic or system interfaces were modified. There was no evidence of previous testing being done to evaluate over-constrained operation of the software.

6.7.6 Fault Detection Software Logic

The software was reviewed in collaboration with the ARGOS Controls Engineer to establish whether the close call occurred due to a failure of the existing fault detection logic in the software. The fault detection logic is listed in the following Table 4. The Hazard of Un-commanded motion was evaluated against the causes shown in the following table. The cause that most likely resulted in the close call was not recognized when developing the software and is likely to be a case of over-constraining the physical system such that the motor controller integration term increased motor torque until breakaway occurred and the system ran away without having a chance to recover.

The processing capacity of the Trick Simulation computer is greater than the demands of the control system algorithm and fault detection. Trick has a built-in capability to monitor the control cycles and log when a control frame is delayed beyond the cycle time of the simulation. The ARGOS simulation cycle time is one millisecond. The ARGOS team has stated that a designed rate of missed frames occurs because devices on the CAN bus network periodically respond in slightly more than one millisecond, which results in three out of 1000 frames being delayed. This is not due to processing capacity, but rather to asynchronous hardware clocks. If the processing capacity of any device in the Trick Simulation causes more frames to be missed, the Trick Simulation will report these delays. There was no identified evidence of overloaded computer capacity in the close call incident.
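The over-constraint mechanism described above, in which an integration term keeps raising torque against a bind until breakaway, can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of the S620 or of ARGOS: the proportional-integral law, the gains, the friction values and the units are all arbitrary and hypothetical.

```python
import math

def peak_velocity(breakaway_torque, v_cmd=1.0, kp=1.0, ki=5.0,
                  kinetic_friction=2.0, inertia=1.0, dt=0.001, t_end=6.0):
    """Simulate a PI velocity loop driving a load that sticks until breakaway.

    All values are arbitrary illustrative units, not ARGOS parameters.
    Returns the peak velocity reached during the run.
    """
    integ = 0.0   # accumulated velocity error (the integration term)
    v = 0.0       # load velocity
    stuck = True  # load is held by the bind until torque exceeds breakaway
    peak = 0.0
    for _ in range(int(t_end / dt)):
        err = v_cmd - v
        integ += err * dt
        torque = kp * err + ki * integ
        if stuck and abs(torque) > breakaway_torque:
            stuck = False  # the bind releases
        if not stuck:
            # Kinetic friction opposes motion (or the applied torque at rest).
            direction = v if v != 0.0 else torque
            friction = math.copysign(kinetic_friction, direction)
            v += (torque - friction) / inertia * dt
        peak = max(peak, v)
    return peak

free = peak_velocity(breakaway_torque=2.0)    # no extra bind
bound = peak_velocity(breakaway_torque=10.0)  # induced bind
print(f"peak velocity without bind: {free:.2f}, with bind: {bound:.2f} (command 1.00)")
```

With the higher breakaway torque the integrator has accumulated more error by the time the load releases, so the velocity overshoots the command by a larger margin than in the unbound case.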


Hazard Cause: Gear Slippage/Out of Gear/Encoder Failure
Hazard Control: Drive input/output position detection
Description: Gear slip detection from a mismatch between output drum encoder and motor encoder position data
Criteria: If the motor encoder and output encoder differ by more than 1.2 revolutions (in terms of the motor)
Result: Shutdown command is sent to motor. Brakes commanded to engage. Trick Simulation enters freeze loop. Output gear slippage message to console.
Effect on Load: Depending on the rate of the gears, the minimum drop of the test subject is 1.2 rotations of the motor. This distance is increased by the time it takes for the position data to come in from the motor controller. The output drum position is measured every 4 msec.

Hazard Cause: Comes Out of Gear
Hazard Control: Gear indication switch
Description: Gear ratio selector indication switches show which gear ratio is engaged by the shift fork
Criteria: If both switches are depressed, neither switch is depressed, or if the opposite one of the expected gear ratio (set when initially shifted) is indicated
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output gear indication message to console.
Effect on Load: Trick Simulation monitors switch positions every 0.25 seconds. If the system goes out of gear completely and the load starts to drop, the gear-slippage logic is more likely to react first based on a higher sample rate, and this gear indication logic cycles every 0.25 seconds. If the gear is partially engaged, the switch breaks contact before completely out-of-gear with the dog-teeth remaining fully engaged, and the gear indication switch logic would command the system to stop and engage the brakes.

Hazard Cause: Drive moves past motion allowed
Hazard Control: Virtual Soft Limit
Description: The virtual soft limit is designed to prevent a test subject from reaching a hard limit
Criteria: Absolute output encoder information indicates soft limit position has been reached
Result: If output velocity calculation results in a commanded velocity further into the limit, the Trick Simulation sends a zero velocity to motor controller instead. The software will output a soft limit message to the console.
Effect on Load: This position is initialized during the ARGOS daily checklist. Positions are set to allow full motion in the vertical direction which will not prevent an impact to the floor. The logic is based on encoder data sampling at 4 msec and will output the appropriate velocity command on the next one millisecond control cycle once data is received.

Hazard Cause: Drive moves past motion allowed
Hazard Control: Virtual Hard Limit
Description: The virtual hard limit is the first hard limit (located before the physical hard limit switch)
Criteria: Absolute output encoder information indicates hard limit position has been reached
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output hard limit message to console.
Effect on Load: Trick Simulation logic is based on encoder data sampled at 4 millisecond intervals. The simulation freeze and brakes are commanded to engage on the next one millisecond control cycle.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable load cell force
Criteria: Trick Simulation will detect if the raw load cell force measurement is ever less than -100 lbf (-445 N) or greater than 1000 lbf (4448 N)
Result: The Trick Simulation will send a shutdown command to the motor controller on the next one millisecond control cycle. Brakes engage. Trick enters freeze loop. Software will output the load cell disconnect message to console.
Effect on Load: The Trick Simulation logic will enter a freeze loop and send a shutdown command to the motor controller on the next one millisecond control cycle. The motor controller will ramp to zero velocity from the current velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable delta between two consecutive data points
Criteria: If the raw load cell force changes by 125 lbf (556 N) in one millisecond (between data points)
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console.
Effect on Load: The Trick Simulation logic will take two millisecond control cycles to detect this fault and will command shutdown on the next one millisecond control cycle. The Trick Simulation logic will enter a freeze loop and send a shutdown command to the motor controller on the next one millisecond control cycle. The motor controller will ramp to zero velocity from the current velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable force error over time. The values in this loop were empirically developed during human testing with the ARGOS team.
Criteria: A fast check and a slow check: if the force error exceeds 100 lbf (445 N) and remains above 100 lbf (445 N) for 300 msec, or if the force error exceeds 35 lbf (156 N) and remains above 35 lbf (156 N) for 500 msec. This loop does not run if the participant is inside of a soft limit.
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console.
Effect on Load: When the load cell is disconnected, the filter and analog to digital converter output a noisy force between 0 and 20 lbf (89 N). The effect is a low cable tension sent to the control system, and the hoist will rapidly rise for 300 msec to 500 msec prior to entering the Trick Simulation freeze loop and commanding the motor controller to ramp to zero velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Negative Force
Description: The filtered force feeds into the proportional term of the controller. A negative force can result in undesirable behavior.
Criteria: If the filtered force is less than zero
Result: Set the filtered force to zero
Effect on Load: This scenario occurs when the load cell is measuring impacts (i.e., foot impact while jumping) that result in impulse measurements. This Trick Simulation logic limits the control system response but is more of a stability control than fault detection, as it does not engage the brakes or stop the simulation.

Hazard Cause: Bad input to control system
Hazard Control: High force error
Description: To maintain stability during impacts (foot strikes, jumps, etc.), cap max filtered force error (feeds into proportional term)
Criteria: If the filtered force error exceeds 20 lbf (89 N)
Result: Set the filtered force error to 20 lbf (89 N)
Effect on Load: Proportional term causes under-damped ringing. Limiting this error reduces the amplitude of persistent oscillation.

Hazard Cause: Control System Failure/Software Exception
Hazard Control: Node Guard
Description: Exceptions such as floating point exceptions, memory exceptions, etc. in Trick
Criteria: Trick has exception handling
Result: Shutdown command sent to motor. Brakes engage.
Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Computer Failure
Hazard Control: Node Guard
Description: Computer shuts down or software fault causes abnormal exit without executing a normal Trick shutdown routine
Criteria: S620 motor controller has node guarding and expects a heartbeat within 100 ms
Result: Motor controller throws n04 warning and shuts down. Brakes engage.
Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Communications check. Break/error in CAN network or failure of Trick software CAN
Hazard Control: Node Guard
Description: S620 motor controller receives velocity commands via CAN network
Criteria: S620 has node guarding and expects a heartbeat within 100 ms
Result: Motor controller throws n04 warning and shuts down. Brakes engage.
Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Motor controller velocity control loop gains incorrect
Hazard Control: Software input variable sanitization routine
Description: Trick software checks S620 motor controller gain settings at software start
Criteria: Check if KP=0.2 (proportional gain) and Tn=140 (integral time constant)
Result: Software will not start if settings not correct
Effect on Load: This is pre-operational fault logic that will prevent the system from starting.

Table 4: Fault Detection for the Hazard of Un-commanded Motion
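The three load cell checks in Table 4 (range, sample-to-sample delta, and force error over time) can be sketched in Python as follows. This is a hedged illustration, not the Trick Simulation code: the class, method and constant names are hypothetical, the check is assumed to be stepped once per one millisecond control cycle, the force error is computed here from the raw force for simplicity, and resetting the timers while inside a soft limit is an assumption. Only the threshold values are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as listed in Table 4 (lbf and milliseconds).
MIN_FORCE, MAX_FORCE = -100.0, 1000.0
MAX_STEP = 125.0
FAST_ERR, FAST_MS = 100.0, 300
SLOW_ERR, SLOW_MS = 35.0, 500

@dataclass
class LoadCellMonitor:
    """Hypothetical monitor, stepped once per 1 ms control cycle."""
    prev_raw: Optional[float] = None
    fast_ms: int = 0
    slow_ms: int = 0

    def step(self, raw: float, target: float,
             in_soft_limit: bool = False) -> Optional[str]:
        """Return the name of the tripped check, or None if no fault."""
        prev, self.prev_raw = self.prev_raw, raw
        # Check 1: raw force outside the plausible sensor range.
        if raw < MIN_FORCE or raw > MAX_FORCE:
            return "range"
        # Check 2: implausible jump between consecutive samples.
        if prev is not None and abs(raw - prev) >= MAX_STEP:
            return "step"
        if in_soft_limit:
            # Assumption: timers reset while the force-error check is suspended.
            self.fast_ms = self.slow_ms = 0
            return None
        # Check 3: force error persisting beyond the fast or slow duration.
        err = abs(target - raw)
        self.fast_ms = self.fast_ms + 1 if err > FAST_ERR else 0
        self.slow_ms = self.slow_ms + 1 if err > SLOW_ERR else 0
        if self.fast_ms >= FAST_MS or self.slow_ms >= SLOW_MS:
            return "force_error"
        return None
```

A caller would treat any non-None return as a request to send the shutdown command and enter the freeze loop. Note that the force-error check cannot trip sooner than 300 cycles after the error first appears, which is the source of the 300 msec to 500 msec delay noted in the table.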


6.7.7 Test Data Review

Per the ARGOS overview given by the ARGOS test team, a number of test parameters are logged to a data file every 10 msec for analysis after the ARGOS operations are complete. The parameters include:

Variable                     Units     Description
Simulation Time              seconds   Time since start of simulation
Raw Load Cell Force          lbf       Tension in lifting cable
Filtered Load Cell Force     lbf       Force after nonlinear filter, fed into the control loop
Target Offload               lbf       Force control loop tries to match
Output Encoder Position      counts    Position of ATM60 absolute encoder on cable drum
Output Encoder Velocity      RPM       Filtered velocity of absolute encoder on cable drum
Linear Velocity              in/s      Calculated linear velocity of cable
Commanded Linear Velocity    in/s      Control loop commanded velocity as linear cable motion
Kpv                          -         Velocity Gain Variable
Kpf                          -         Proportional Gain Variable
Kdf                          -         Derivative Gain Variable

Table 5: Test Parameters Recorded by the ARGOS Trick Simulation

Part of the software review was to develop confidence that the recorded parameters from the ARGOS software represented the system response and that all values were within the expected range and capability of the sensing hardware and software data-types. The value for Raw Load Cell Force (lbf) represents the cable tension between the ARGOS hoist and the load. The load cell is a 1000 lbf full-scale strain gauge sensor with a high-level 10 Vdc full-scale output, and it is digitized through a 16-bit signed CAN bus signal conditioner. The force measurement was within sensor limits throughout the incident and never appears to saturate during the event. The Output Encoder Position is provided by an ATM60 Sick brand absolute encoder. The encoder position is reported as an unsigned 32-bit integer and has been verified to be well within the rotational limits of the device during the close call operations. The position is initialized during the system start-up to zero the start position. The system position is a signed 32-bit integer that takes the unsigned current position minus the unsigned initial position. The output encoder velocity is a 32-bit signed integer provided in units of revolutions per minute. All of the encoder variables are properly typecast in Trick to prevent overflow of the variables. Both the rotational position and rotational velocity are converted to linear inch units for purposes of the data recording file. The Kollmorgen Servostar S620 motor amplifier provides motor position as an unsigned integer to the Trick Simulation, but it is not recorded. Additionally, the S620 amplifier is commanded with an RPM command to the motor controller. For purposes of data presentation and data recording, this value is converted to linear inches per second to provide units familiar to the ARGOS operator.
The ARGOS gear selection switches are triggered by the gear selection and send a discrete value to the Trick Simulation through the CAN bus. All of the parameters identified for use in the control algorithm and being recorded in the data file were proven to correspond to appropriate programming data-types.
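Two of the data-type points above lend themselves to a short worked example: taking a signed 32-bit difference of two unsigned 32-bit encoder readings, and converting drum RPM to linear cable speed. The Python sketch below is illustrative only. The function names are hypothetical, and the drum radius is a placeholder parameter because the report does not state the actual drum geometry.

```python
import math

def relative_position(current_u32: int, initial_u32: int) -> int:
    """Signed 32-bit difference of two unsigned 32-bit encoder readings."""
    diff = (current_u32 - initial_u32) & 0xFFFFFFFF  # wrap modulo 2**32
    # Reinterpret the upper half of the range as negative values.
    return diff - 0x1_0000_0000 if diff >= 0x8000_0000 else diff

def rpm_to_inches_per_second(rpm: float, drum_radius_in: float) -> float:
    """Cable speed for a drum turning at `rpm`; the radius is a placeholder."""
    return rpm * 2.0 * math.pi * drum_radius_in / 60.0
```

Handling the wraparound explicitly is what keeps the relative position correct when the raw count passes through zero between initialization and the current reading.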


6.8 Safety and Hazard Analysis

6.8.1 Safety

The ARGOS Systems Requirements Document makes no reference to JPR 1700.1, JSC Safety and Health Handbook, or to EA-WI-023, Project Management of Government Furnished Equipment Flight Projects. ARGOS was considered a development project that could operate with flexible adherence to requirements. This culture was accepted by Safety and Mission Assurance oversight. The primary NASA safety document applied to ARGOS was NASA Standard 8719.9, Standard for Lifting Devices and Equipment. This standard is heavily referenced, and many people viewed ARGOS as a high tech Critical Lift crane. The standard may have been the best fit, but ARGOS was more than a Critical Lift, and it lifted people. Although this was known at the time, no standard exists for a complex human robotic system, and restricting adherence to the best-fit standard kept the team from looking for needed requirements and Hazard Controls. The velocity at which the system operates when approaching obstacles is well outside the realm of normal lifting operations.

6.8.2 Hazard Analysis

SRSD-12-007 Hazard Analysis for Gen 2 ARGOS Facility Testing documented the hazards of the ARGOS suspension system. It evolved from its initial use to document the standalone Gen 1 ARGOS system to the current Gen 2 configuration, which eventually included humans in the loop. The ARGOS team regularly updated that facility HA to reflect changes to the system that introduced new hazards. The status of the completed HA was presented at each Test Readiness Review (TRR). The same is true of the hazards unique to wearing the harness and/or a pressure suit, which were documented separately by the Crew and Thermal Systems in SRSD-12-008 Hazard Analysis for ARGOS for Test Participant Providing an Input into Gen 2 ARGOS. This test subject HA was focused solely on the hazards to human physiology of being restrained in the harness and/or pressure suit at various orientations. It did not address the ARGOS system performance beyond the harness.

6.9 Engineering Processes, Roles and Responsibilities

This section discusses how ARGOS evolved and the Engineering processes, roles and responsibilities that were observed during the investigation. It is not a technical discussion, but rather a set of observations of the environment, culture, roles and expectations.

During the period in which ARGOS was initiated, EA-WI-023 was written to cover GFE flight projects; in 2012 it was revised to be much more easily used by projects at all levels. Development, research or low-TRL projects are typically not projectized, as they are viewed within Engineering as not needing the rigor because they are more risky and undergo constant change as the hardware is developed. This approach allows for rapid development, a build-a-little, test-a-little philosophy to obtain quick results at low cost. ARGOS specifically was an internal Engineering project that had no program or external customer; it was initiated on a small amount of internal funds. It was not categorized as a project or facility, and did not initially involve test subjects. Safety and Mission Assurance support was included from near the start of ARGOS, with buy-in to the engineering development approach.

Throughout the development, the ARGOS team researched the design and selection of the components to a very detailed level, to the very best of their ability. Safety was requested to assist in identifying the right lifting requirements. However, beyond Safety, it was noted that there was limited involvement from outside the division and organizations, and that the Institutional Review Board (IRB) was considered the external review for both human safety and engineering oversight. Each version, addition or change to ARGOS over the 6 to 7 years was reviewed at a Test Readiness Review (TRR) Board; in fact, 44 TRRs were found for ARGOS. In addition, 19 ER CCBs were found from 2008 to the present, and one Engineering Leadership Council topic at the Engineering Directorate, during the same period.
Within ER all TRRs are chaired by one branch chief for the entire division. That branch chief also happens to lead the branch that developed ARGOS. No reviews beyond a TRR were held. A PDR was held at about 50%, at which time some external review was provided. There were no CDR or other reviews beyond the one PDR. At some point ARGOS went from a development effort to using humans in the test. The test subjects were varied and diverse, from NASA engineers to retired NASA astronauts, and even outside visitors. It also grew in capability from simulating 1/6 gravity (Lunar environment) to microgravity, and from upright human testing to horizontal microgravity testing. All of these changes added risk and should have triggered additional safety concerns.

On March 15, 2012, there was a similar close call event of an un-intended drop of a test subject, also 12 to 18 inches (30.5 to 45.7 cm). The root cause of this event was attributed to developmental software that was incorrectly executed for the test, rather than the baseline software. In this case the motor amplifier threw an F32 (Software Failure) fault and the brakes were fully engaged to stop the fall.

From the Board's perspective, there were signs that were missed: the start of human testing, the lack of proper outside review, the lack of a proper level of review, and a similar event occurring less than one year prior to the close call in this report.


7 Findings and Recommendations


7.1 General Findings:

1. ARGOS was an engineering development project that was treated as if it were operational. The ARGOS team was beset with funding issues from the outset. This drove the project to experience Groupthink and seek additional non-developmental users which gave the appearance that the system was operational. ARGOS Project defined requirements were never validated. ARGOS Project was a Safety-critical system which was signed off as ready for human subjects at the Branch Chief level. ARGOS Project had an earlier incident approximately one year prior to the incident being investigated here. In the earlier incident, a human subject was dropped approximately 12-18 inches but no formal investigation was conducted.

Recommendation(s): Human in the loop testing should elevate safety oversight for both EA and NA.

2. Engineering and Safety and Mission Assurance directorates do not explicitly require or define when a development project becomes large enough or critical enough to warrant additional reviews for safety.

Recommendation(s): Add clear definitions in EPD 1205.1 or EA WI-023. Suggest clear thresholds, such as dollar investment, human in the loop testing, etc., would require a project to comply with rigorous oversight reviews from independent group. Safety and Mission Assurance needs processes or documentation that clearly defines when safety oversight must be elevated. For example, define when a URR or ORI are required.

7.2 Proximate/Root Causes and Contributing Factors

7.2.1 Proximate Cause

The ARGOS hoist exhibited Un-Intended Motion that drove the hoist downward fast enough to cause the hoist cable to go slack, dropping the human test subject in free fall. The System Requirements Document for ARGOS requires the electrical and control system to be two-fault tolerant to loss of control. The requirement, as demonstrated by this Close Call, was not met. Recommendations for the Intermediate Cause, Root Cause, Contributing Factors and Specific Findings address the Proximate Cause.

7.2.2 Intermediate Cause

The system, by design, is insufficient to meet the requirement to avoid loss of control or come to a safe stop. The response of the Control System to a non-linear input from mechanical binding is the most likely cause, but is indeterminate. Analysis of the event data and follow up tests showed that an induced bind in a similar system caused a similar un-intended motion.

Recommendation 1 - Review the system design and modify it as necessary to meet the safety critical requirements. This review may result in new or modified requirements. The emphasis is to control the hazards and manage the risks.


7.2.3 Root Cause

The management oversight was insufficient to control the hazards to humans and other test payloads. Neither the lack of satisfied requirements nor the missed hazards were found by management and safety oversight or during formal and informal design reviews, Test Readiness Reviews or Institutional Review Board proceedings.

Recommendation 2 - Review Standards, Policies, Procedures and Administrative Controls to assure the safety critical requirements are identified, met and validated. These assurances should provide for impartial design reviews by scope-appropriate subject matter experts. The application of Product Peer Review, Independent Review Team, URR and ORI processes should be reviewed.

7.2.4 Contributing Factors

1. Since the ARGOS team regards ARGOS as a development system, they do not believe that satisfying the System Requirements Document for ARGOS is incumbent upon them.

Recommendation 3 - Review Standards, Policies and Administrative Controls to assure the safety critical requirements are recognized, met and validated regardless of system nomenclature.

2. Un-Intended Motion was not identified in the Hazard Analysis as a hazard to humans, only to equipment.

Recommendation 4 - Review and correct the Hazard Analysis. Not only was the hazard to humans from Un-Intended Motion not identified, but interviews with Subject Matter Experts indicate that hazards such as the one from a 12 to 18 inch (30.5 to 45.7 cm) fall should have had a Severity I instead of Severity II.

7.3 Specific Findings:

7.3.1 Findings Specific to Mechanical System Design

The following findings and recommendations summarize the review of the Z-Axis hardware, CAD models and design methods to date. The recommendations are made in a manner that maximizes the use of the existing hardware. Modifications to the baseline design as recommended below will require an update to the Stress Report, ESCG-4450-10-STAN-DOC-0064. Refer to Figure 10 for item number references.

1. Excessive clearance and incompatible race hardness between the Unsuited and Suited needle bearings (embedded within Items 2 and 4) and the Shift Shaft (Item 6) caused uneven inner race loading and wear.
• Current misalignment based on drawing least material conditions is .004 in/in (.004 cm/cm)
• Current inner race hardness is 36 to 43 HRC

Recommendation 5:
• Size shaft and housing fits per manufacturer's recommendations
• Ion nitride shaft raceways per NASA/JSC PRC-2004 to obtain HRC 58


2. Excessive clearance between the Suited needle bearing (embedded within Item 4) and the Shift Shaft (Item 6) causes the Snap Ring (Item 10) to be side loaded with relative motion.

Recommendation 6:
• Replace snap ring with spiral retaining ring
• Add a washer, keyed to Shift Shaft, to handle thrust loading and to isolate the snap ring from relative motion

3. Excessive clearance between the Unsuited and Suited needle bearings (embedded within Items 2 and 4) and the Shift Shaft (Item 6) allows gear misalignments and inadvertent contact with the Output Gear (Item 1) and the Rush Gear 36T (Item 3) during meshing.

Recommendation 7:
• Perform tolerance analysis to understand minimum and maximum clearance
• Increase offset from Dog Plates to adjacent gearing as applicable
• Add assembly inspection point to measure gear set clearances
• Add a second row of needle bearings to Suited and Unsuited Gears for increased stability
• Break edges of all gearing to prevent gouging and personnel injury
• Chamfer outer edge of all gears

4. The Gear Selector Knob for the Unsuited/locked position causes the Shift Fork to be preloaded against the Clutch Plate (Item 5).

Recommendation 8:
• During assembly ensure clearance is present between Shift Fork pads (Item 11) and Clutch Plate
• Add shims between the Shift Fork and RH Torq Spline Nut (Item 12) for adjustment
• Re-tolerance RH Torq Spline to prevent binding at Location A (from fastener over tightening) and to ensure clearance at Location B. Refer to the Detailed Inspection section.

5. Shifting Shaft's (Item 6) drive gear was retained with an open snap ring. The design calls for a spiral retaining ring.

Recommendation 9:
• Retain spiral retaining ring, WST-100
• Add an isolation washer between spiral retaining ring and the drive gear to isolate retaining ring from the drive gear to shaft dithering action

6. Unsuited and Suited Dog Plate fasteners (embedded within Items 2 and 4) are exposed and contact Thrust Washers (Items 7 & 8).

Recommendation 10: Increase countersink to ensure fasteners are recessed and ensure rounded edges are present on Dog Plates to improve thrust loading surface

7. Key (Item 9) has worn corners due to contact with the Unsuited Gear (Item 2).

Recommendation 11: Reduce Shifting Shaft's (Item 6) groove length and Key length to remove interferences

8. Clutch Plate (Item 5) experiences non-uniform loading when engaged with either the Unsuited (Item 2) or the Suited Gear (Item 4).

Recommendation 12:
• Perform tolerance analysis to understand the Dog Plate teeth's inner and outer diametrical clearances
• Modify Dog Plate dimensions as applicable to obtain positive clearance when considering worst case tolerance combinations including bearing clearance
• Size Suited and Unsuited needle bearings by assuming that only one Dog Plate tooth is loaded as it revolves around the point of gear contact

9. Binding and increased frictional drag is probable in three areas:
a. Unsuited Gear (Item 2) meshing to Output Gear (Item 1)
b. Suited Gear (Item 4) meshing to Rush Gear 36T (Item 3)
c. Clutch Plate (Item 5) to Shift Fork (Item 11) interface

Recommendation 13: Create a kinematic model to understand the sensitivity of friction and disturbances. This model is to be used in conjunction with the control system model and correlated such that actual hardware performance is predicted.

10. The Rush Gear's (Item 3) shaft has .03 inch (.076 cm) of axial free play. Clearances increase the opportunity for inadvertent contact with the Suited Gear's (Item 4) Dog Plate.

Recommendation 14:
• Perform tolerance analysis to understand minimum and maximum free play and resulting Dog Plate clearances
• Add outer race shims to reduce axial free play to .002 to .005 inch (.005 to .013 cm)

11. The ARGOS Heavy System (X, Y and Z) is designed to have close clearance components rotating with respect to one another in a safety critical application where coupled failure modes are present.

Recommendation 15:
• Apply mechanism requirements of NASA-STD-5017 to all ARGOS mechanisms
• Examine the effects of the Drum translational spline drive nut failure on the run away potential of the X-Y mechanisms. Consider adding rotational stops to the Drum to prevent a nut failure during an accidental hardstop encounter

12. The ARGOS Heavy System (X, Y and Z) does not provide fastener torque callouts on assembly drawings. Torque limits are required to ensure that events such as joint separation, fastener overload and joint slip are prevented.

Recommendation 16:
• Add drawing flag notes that specifically call out the running torque limits (as applicable for self-locking applications) and the minimum/maximum fastener torque above the actual (measured) running torque
• Ensure torque selections are based on a supporting stress analysis and account for fastener lubrication as applicable

13. The System Requirements Document (SRSD-08-007, Rev B) imposes independent strength verification testing for fasteners used in the critical load path (Requirement 3.3.2.1). After reviewing the verification methods, it was found that the fasteners were tested using a batch approach in conjunction with system proof load testing. This method fails to meet the independent verification requirement.

Recommendation 17:
• Exchange fasteners with lot traceable versions
• Update engineering drawings to show TL (Trace Lot) in the bill of materials


7.3.2
1.

Findings Specific to the Z-axis Controller System


ARGOS Project engineers were never able to fully model their control system and hence, were not able to fully validate its performance for all variable operating conditions and all off-nominal conditions. Recommendation 18: Develop a detailed control system block diagram. Obtain outside assistance in controller design from Aeroscience and Flight Mechanics Division, Structural Engineering Division or Avionics System Division where needed

2.

Insufficient control block diagram. A block diagram of the control system is required for analysis prior to human rating the hardware. This should have been done with or without man rating. The block diagram defines critical control loop parameters needed to complete the design of a system. Even with a Black Box in the loop it is possible to characterize the system (at least to prove stability) with testing. From this it can also be determined if the safety systems are fast enough to protect the test subject. Also from this it can be determined if the CAN bus speed is sufficient to provide communications between the blocks with causing delays. Recommendation 19: Characterize the motor controller/motor transfer function in the region of operation.

3. Insufficient end-to-end frequency response testing. Recommendation 20: Develop an ARGOS simulation to conduct performance analysis with off-nominal conditions and failures, Monte Carlo runs, frequency response, and verification analysis.

4. Insufficient stability analysis. Recommendation 21: Perform linear stability analysis.
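As an illustration of the kind of linear stability check this recommendation refers to, the sketch below computes the phase margin of a deliberately simplified loop. It assumes, purely for illustration, that the open loop can be approximated as an integrator with gain K and a pure transport delay tau, L(s) = K·exp(-s·tau)/s. That model, the function names, and the numeric values are hypothetical; they are not taken from the ARGOS design, whose actual transfer function the board found had not been characterized.

```python
import math


def phase_margin_deg(gain_per_s: float, delay_s: float) -> float:
    """Phase margin of the ASSUMED loop L(s) = K * exp(-s*tau) / s.

    |L(jw)| = K / w, so the gain crossover is at w_c = K (rad/s).
    The phase there is -90 deg - w_c * tau, which gives
    PM = 90 deg - K * tau (expressed in degrees).
    """
    return 90.0 - math.degrees(gain_per_s * delay_s)


def max_stable_gain(delay_s: float) -> float:
    """Gain at which the phase margin reaches zero: K * tau = pi / 2."""
    return math.pi / (2.0 * delay_s)


if __name__ == "__main__":
    tau = 0.100  # hypothetical total loop delay in seconds
    for k in (2.0, 10.0, 15.0):  # hypothetical loop gains in 1/s
        print(f"K={k:5.1f} 1/s  PM={phase_margin_deg(k, tau):6.1f} deg")
    print(f"marginal gain for tau={tau} s: {max_stable_gain(tau):.2f} 1/s")
```

The point of the sketch is only that delay directly consumes phase margin, so any communication or processing latency in the loop needs to be known before stability can be claimed.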

5. Insufficient constrained motion testing. Recommendation 22: Develop performance and stability requirements and a test matrix to evaluate the requirements.

6. No simulation (i.e., no controller model) to verify system performance and stability utilizing Monte Carlo techniques. Recommendation 23: Perform ARGOS simulation and ARGOS comparison tests. Determine what ARGOS data is required to troubleshoot test runs and verify test run results.
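A Monte Carlo assessment of the kind mentioned in this finding can be sketched as follows. The example disperses two parameters of a hypothetical loop model (an integrator with gain K and a pure delay tau, which is an assumption made here and not the ARGOS controller) and reports the fraction of cases that fall below a required phase margin. All names, spreads, and thresholds are illustrative.

```python
import math
import random


def phase_margin_deg(gain_per_s: float, delay_s: float) -> float:
    """Phase margin of the ASSUMED loop L(s) = K * exp(-s*tau) / s."""
    return 90.0 - math.degrees(gain_per_s * delay_s)


def monte_carlo_margin(n_runs: int, nominal_gain: float, nominal_delay_s: float,
                       gain_spread: float, delay_spread: float,
                       required_pm_deg: float, seed: int = 0) -> float:
    """Fraction of dispersed cases whose phase margin is below the requirement.

    gain_spread and delay_spread are fractional half-widths of uniform
    dispersions around the nominal values (0.2 means +/- 20 percent).
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    failures = 0
    for _ in range(n_runs):
        gain = nominal_gain * rng.uniform(1 - gain_spread, 1 + gain_spread)
        delay = nominal_delay_s * rng.uniform(1 - delay_spread, 1 + delay_spread)
        if phase_margin_deg(gain, delay) < required_pm_deg:
            failures += 1
    return failures / n_runs


if __name__ == "__main__":
    fraction = monte_carlo_margin(10_000, nominal_gain=10.0, nominal_delay_s=0.05,
                                  gain_spread=0.2, delay_spread=0.5,
                                  required_pm_deg=45.0, seed=1)
    print(f"fraction of cases below required margin: {fraction:.3f}")
```

A real study would replace the one-line margin formula with the validated system simulation and would disperse the parameters the project identifies as uncertain.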

7. No analysis defining the vertical and horizontal system interaction or lack thereof. Recommendation 24: Create a dedicated ARGOS control system document.

8. Because the motor controller (Danaher S620) and motor (AKM73P) block diagrams are proprietary, the ARGOS team does not have them and may not have correctly understood their control system. The inner control loop consisting of the motor controller/motor is a proportional, integral, derivative (PID) loop. (A block diagram of the ARGOS control system that the team gave to this Investigation Board was incorrect: Figure 13, Closed Loop Control Plant Diagram; SRSD-11-016 Rev. D.) Without the block diagram, the control loop parameters cannot be determined correctly. The statement that the Black Box motor controller has the gain turned down to act as a Limiter is suspect. Since the gain is low on this stage, the gain on the previous stage has to be higher to drive the controller during rapid changes in speed. This can lead to saturating the amplifier on the previous stage, resulting in a nonlinear response. Recommendation 25: Insufficient recorded controller data during ARGOS operation to verify control system is operating as expected. More data should be recorded. The block diagram is a good guide for selecting points inside the control loop for data collection. These additional points are necessary to fully understand the system operation and are useful for analysis of anomalous events (like this one). Since the control loop operates at 1000 updates/sec, it is necessary to acquire data from the control system at this rate. This is a lot of data, but the resolution is necessary to allow analysis of what happens in the control loop on a cycle-by-cycle basis. This is how saturation, nonlinearity, and noise can be observed.
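To give a sense of scale for logging at the 1000 updates/sec loop rate, the following sketch estimates the data volume. The number of recorded signals, the bytes per sample, and the run duration are hypothetical values chosen for illustration; only the 1000 Hz rate comes from the text above.

```python
def logging_rate_bytes_per_s(update_hz: int, n_signals: int, bytes_per_sample: int) -> int:
    """Raw data rate when every signal is recorded on every control cycle."""
    return update_hz * n_signals * bytes_per_sample


def logging_volume_bytes(update_hz: int, n_signals: int,
                         bytes_per_sample: int, duration_s: int) -> int:
    """Total raw data volume for a run of the given duration."""
    return logging_rate_bytes_per_s(update_hz, n_signals, bytes_per_sample) * duration_s


if __name__ == "__main__":
    # Hypothetical: 16 signals stored as 8-byte values for a one-hour session.
    total = logging_volume_bytes(update_hz=1000, n_signals=16,
                                 bytes_per_sample=8, duration_s=3600)
    print(f"{total} bytes (about {total / 1e6:.0f} MB)")
```

Under these assumed figures the raw volume is about 461 MB per hour, before any timestamps or file overhead.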

7.3.3 Findings Specific to Software Design

1. The software does not detect motor run-away conditions or capture scenarios in which the output velocity deviates from the nominal response to a commanded input. Nominal system performance has not been characterized. Recommendation 26: Characterize the acceptable motor velocity and torque scenarios to allow development of fault detection algorithms and subsequent emergency shutdown. Design software fault detection and automatic braking for the case in which the velocity output is not within an acceptable range of the control command.
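One possible shape for such a check is sketched below: compare measured velocity against the commanded velocity on every control cycle and latch a fault after several consecutive out-of-band samples. The class name, the tolerance, the sample count, and the units are all hypothetical; the acceptable band would have to come from the characterization the recommendation calls for.

```python
from dataclasses import dataclass


@dataclass
class RunawayDetector:
    max_error: float        # allowed |measured - commanded| (hypothetical units)
    max_bad_samples: int    # consecutive out-of-band samples needed to trip
    bad_count: int = 0
    tripped: bool = False

    def update(self, commanded: float, measured: float) -> bool:
        """Feed one control-cycle sample; return True once the fault is latched."""
        if abs(measured - commanded) > self.max_error:
            self.bad_count += 1
        else:
            self.bad_count = 0  # a good sample clears the consecutive count
        if self.bad_count >= self.max_bad_samples:
            self.tripped = True  # latched: stays set until the object is recreated
        return self.tripped


if __name__ == "__main__":
    detector = RunawayDetector(max_error=50.0, max_bad_samples=3)
    samples = [(100.0, 120.0), (100.0, 400.0), (100.0, 500.0), (100.0, 600.0)]
    for commanded, measured in samples:
        if detector.update(commanded, measured):
            print("fault latched: request braking / emergency shutdown")
```

Requiring several consecutive bad samples is a design choice that trades detection latency against nuisance trips from sensor noise; at a 1000 Hz loop rate a few samples adds only a few milliseconds.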

2. Software Classification and Safety Criticality have not been proposed and concurred on by the JSC Software Engineering Process Group. ARGOS software is legacy software developed prior to the JSC CMMI certification and needs to be mapped to JPR 7150.2 requirements. Recommendation 27: Perform an internal project determination of the software classification and safety classification per JPR 7150.2A and ensure concurrence with the Engineering Directorate Software Engineering Process Group (EA SEPG). Perform a compliance matrix of the software requirements in NPR 7150.2A against the process used for development of ARGOS software.

3. Not all software control variables are bounded for minimum, maximum and acceptable values. Recommendation 28: For the ATM60 SICK absolute encoder, a wrap-around error causing a jump from the maximum to the minimum position is unlikely. However, adding a logic check to ensure there is enough range from the initial encoder position for the ARGOS range of motion, or increasing the intelligence of the calculation to handle absolute encoder wrap-around, would prevent the problem from ever occurring.
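Both options named in this recommendation can be sketched in a few lines. The first function is a startup check that the full travel fits within the encoder range without wrapping; the second computes a signed position change that tolerates a single wrap between readings. The encoder count range, travel values, and function names are hypothetical and are not taken from the ATM60 data sheet.

```python
def has_enough_range(initial_count: int, travel_down: int, travel_up: int,
                     encoder_counts: int) -> bool:
    """True if the whole travel fits inside [0, encoder_counts) without wrapping."""
    return (initial_count - travel_down >= 0
            and initial_count + travel_up < encoder_counts)


def unwrapped_delta(previous: int, current: int, encoder_counts: int) -> int:
    """Signed shortest-path change between two readings, tolerant of one wrap.

    Valid only if the true movement between readings is less than half
    the encoder range.
    """
    half = encoder_counts // 2
    return (current - previous + half) % encoder_counts - half


if __name__ == "__main__":
    counts = 4096  # hypothetical total count range
    print(has_enough_range(100, travel_down=200, travel_up=50, encoder_counts=counts))
    print(unwrapped_delta(4090, 5, counts))  # crossing the maximum-to-minimum boundary
```

The startup check is the simpler control because it refuses to operate in a region where a wrap could occur; the unwrapping calculation keeps position continuous if a wrap cannot be avoided.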

4. The motor velocity command from the Trick Simulation does not have minimum and maximum limits. Recommendation 29: Consider placing maximum and minimum limits on the motor velocity command, or develop an understanding of what commanding the motor controller to velocities greater than its capabilities does to the internal control loop. It is recognized that the S620 CAN bus parameter is a 32-bit signed integer; however, the motor controller response to commands greater than the maximum motor velocity is not understood.
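A minimal sketch of the first option, limiting the command before it is sent, is shown below. The limit value and function name are hypothetical; the only detail taken from the text is that the command travels as a 32-bit signed integer, so the limit itself is checked against that range.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can carry


def clamp_velocity_command(command: int, max_abs_velocity: int) -> tuple[int, bool]:
    """Limit a velocity command to +/- max_abs_velocity.

    Returns the (possibly limited) command and a flag that is True when
    limiting occurred, so the event can be logged or annunciated.
    """
    if not 0 <= max_abs_velocity <= INT32_MAX:
        raise ValueError("limit must fit in a 32-bit signed integer")
    limited = max(-max_abs_velocity, min(max_abs_velocity, command))
    return limited, limited != command


if __name__ == "__main__":
    print(clamp_velocity_command(5000, 3000))
    print(clamp_velocity_command(-1200, 3000))
```

Returning the flag alongside the value is a design choice: silent clamping hides the fact that the simulation asked for something the hardware limit does not allow.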

5. Node guarding allows for a worst-case drop of 12-18 inches (30.5 to 45.7 cm). Can the 100 msec heartbeat cycle time be reduced to decrease this drop distance? Recommendation 30: Consider testing node guard times to determine whether the heartbeat time between the Trick Simulation and the motor controller can be reduced. Sufficient resolution is required prior to resuming ARGOS human off-load operations.
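The relationship between detection-to-braking latency and drop distance can be bounded with simple kinematics. The sketch below assumes unarrested free fall from rest, d = ½·g·t², which ignores residual offload force, drivetrain inertia and friction, and any initial velocity; it is an illustrative bound and not a model of the ARGOS hardware. The function names are hypothetical.

```python
import math

G = 9.80665          # standard gravity, m/s^2
M_PER_INCH = 0.0254  # exact inch-to-metre conversion


def free_fall_drop_m(latency_s: float, initial_velocity_m_s: float = 0.0) -> float:
    """Distance fallen during latency_s with no arresting force (ASSUMED free fall)."""
    return initial_velocity_m_s * latency_s + 0.5 * G * latency_s ** 2


def latency_for_drop_s(drop_m: float) -> float:
    """Free-fall time from rest that produces the given drop distance."""
    return math.sqrt(2.0 * drop_m / G)


if __name__ == "__main__":
    d = free_fall_drop_m(0.100)
    print(f"100 ms of free fall: {d:.3f} m ({d / M_PER_INCH:.1f} in)")
    for inches in (12, 18):
        t = latency_for_drop_s(inches * M_PER_INCH)
        print(f"{inches} in drop corresponds to {t:.3f} s of free fall")
```

Under this assumption, 100 ms of free fall from rest is roughly 5 cm, while 12 to 18 inches corresponds to roughly 0.25 to 0.31 s. Because distance grows with the square of time, any reduction in total latency pays off more than proportionally in drop distance.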

7.3.4 Findings Specific to Safety and Hazards

1. After reviewing the HAs and how they were used, the most significant conclusion drawn by the Investigation Board was that an Integrated Hazard Analysis addressing the hazards to the test subject from operation of the ARGOS as a system was not performed. Integrating the two facets, i.e., the effects of human-induced non-linearities on the operation of the system as a whole and on the control system in particular, would likely have identified additional controls that needed to be in place when operating with a human in the loop. Additionally, the existence of two separately authored HAs was determined to have created a false sense of confidence in each group that the other had considered all the possible hazards to humans. Recommendation 31: Conduct an integrated hazard analysis of the total ARGOS-to-human system and document the results in one resource.

Review of the safety documents revealed gaps in the hazard assessment and the process of reviewing the hazards for the facility HA (SRSD-12-007):

2. Hazard #30, Testing Related Injuries, Risk Assessment Code (RAC) II/C/3, identified the possibility that a test subject could fall 12-18 inches (30.5 to 45.7 cm). This was considered to be a concession to the highly sensitive requirements of the control system. In the event of an automatic e-stop with shutdown of the motor controller output stage, it is possible for the test participant to have a 12 to 18 inch descent, which could lead to contact with the ground, mockups, or tools. This is caused by the lag between the fault detection and the engagement of the brakes. The system has been designed to minimize this time. The board determined, through interviews with center medical staff, that a drop of 12-18 inches could result in severe injury in the case of horizontal operations if the test subject contacted the underlying mockups with an unprotected throat or eyes. In fact, this short drop distance was determined to increase the likelihood of injury (regardless of orientation), since the test subject would not have the distance to react and reposition themselves in the fall compared to a fall starting at twice that height. Recommendation 32: Change the Hazard #30 RAC and improve the controls. This hazard should have a I/B/1 RAC before controls are applied. Do not combine this hazard with Slips, Trips and Falls hazards; treat it separately so that the RACs are appropriate. Consult with the IRB and center medical officers to determine the physiological implications of a fall from various heights or from uncommanded motion in any axis.

3. The signature portion at the bottom of each page of the Hazard Analysis was overlooked. This portion is designed for the test director to characterize each hazard as either Open/No Action, Closed/Controlled, Closed/Eliminated, or Closed/Accepted. The test team was unaware that their active participation was required to designate each hazard as Open or Closed and weren't sure of the purpose of that portion of each page. Additionally, since Hazard #30 conceded the possibility of a fall from 12-18 inches, the Hazard should have been designated by the test director as Closed/Accepted, requiring an additional concurrence signature by management. The board felt this would have raised the sensitivity to this risk and possibly prompted additional questioning. Recommendation 33: Review the Hazard Analysis worksheets to designate which hazards are Closed, Open, or Accepted prior to each test. Present that summary at the TRR.

4. Hazard #1, Uncommanded Motion, RAC III/B/3, described the hazard as damage to the equipment. All the controls in this worksheet were designed to protect the ARGOS facility hardware. The potential for uncommanded motion to harm the test subject or observers was overlooked. Given that there was an earlier, unresolved incident of a test subject being dropped, the RAC should probably have reflected a higher probability of occurring, e.g., I/B/1. Recommendation 34: Expand Hazard #1 or Hazard #30 to include injury to the test subjects due to uncommanded motion and revise the RACs accordingly. Recognize in the Hazard Causes that uncommanded motion can be in either direction and that mockup configuration and the 'virtual' upper limit switch configuration play a role in determining hazard potential. Given that the X-Y Horizontal System does not consider test subjects either and relies on common Hazard Controls, Hazard #2 should be reviewed as well.

5. Reviews of the available data and video show that the uncommanded motion of the Z-axis controller was actually at least 5 feet (1.5 m), not the 12-18 inches (30-46 cm) that the test subject experienced before he contacted the handrail. The line continued to unspool from the motor after the test subject was down. This implies that, had the test subject's positioning been different, the uncommanded motion hazard would have been greater than was implicitly accepted in the HA. Recommendation 35: Analyze the control system and hardware/software to determine the true potential for fall from height. Revise the Hazard Analysis and determine sufficient controls for the maximum distance, which currently is 5 feet (1.5 m).

6. In July 2012, a major redesign of the ARGOS support structure was completed, which added 7.5 feet (2.3 m) of height to the system. However, no additional hazards were identified in the facility HA because the height change didn't represent a danger to the equipment. When the first TRR was conducted in August 2012, the earlier HA was presented as unchanged/released. The height change wasn't noted in the facility HA until November 2012. Even then, it only addressed the test team's access to the hardware via the manbasket, not the hazard to the test subject. Due to its limited scope, the test subject HA did not address the height changes to the facility structure either. Recommendation 36: Analyze the safety implications to the test subject from the 2012 ARGOS height increase. Revise the Hazard Analysis and determine sufficient controls for the maximum fall distance.

7. The board determined that the test team relied heavily on the TRR approval process as concurrence that the TRR Board had also reviewed and approved all the documents referenced in the TRR presentation package, which was not the case. The hazards and their controls weren't summarized in the TRR presentation packages. In the case of the August 2012 TRR, the HA was simply listed as unchanged/released. The test team treated the TRR as a technical review of the adequacy of the HAs, FMEAs, and Operating Procedures without going into the technical detail of those documents. It is not the responsibility of the TRR board to do the technical review of the adequacy of the data products. Their concurrence to proceed with testing should not be construed as an independent sanity check on the HAs. Recommendation 37: Perform an independent review of the adequacy of the updated Hazard Analysis. Provide a summary of the Hazard Worksheets at each TRR, listing their Before and After RACs.

8 References

8.1 Appointment Letter

8.2 Materials Chemical Analysis Report

8.3 Materials Metallurgical Analysis Report

8.4 ARGOS Startup Checklist

2013-01-16 Startup Checklist AWI-14 Gen2 RevG.xlsx
