STATOIL-HCL Partnership

Root Cause Analysis

Project Name: STATOIL
SAP Code:
Tower / Track Name: Database / SQL
ID (Incident Request):
Version Number: 0.1
Date: 09/06/2012
Reported Date / Resolution Date:
09/06/2012 5:38:16 PM(CET)

Revision History

Version No: 0.1
Date: 10/06/2012
Prepared by / Modified by: Avinash Kumar (Prepared RCA)


Prepared By: Avinash Kumar
Occurred - Date & Time: 09/06/2012 05:38:24 PM(CET)
Reported - Date & Time: 09/06/2012 05:45:16 PM(CET)
Classification:
Reference if any:
Monitoring System Alert:
Problem Severity:
Problem Statement:
Business Impact:

SME Name: Avinash Kumar
Discussed - Date & Time: 09/06/2012 5:45:17 PM(CET)
Closed - Date & Time: 09/06/2012 11:06:52 PM(CET)
Defect Issue:



2.0 PROBLEM DETAILS

Chronology of the Events

While working on ticket Q1138543 for the installation of the new SQL cluster instance PM626 on nodes ST-W1808/ST-W1809, the SQL instance installation did not complete successfully because the service account did not have the required privileges, and the result was a standalone installation. It was later decided to remove the failed installation and re-install the instance PM626. (09-06-2012, 03:58 AM)

On 09/06/2012 at around 11:00 AM (CET), Ajay started the un-installation. While un-installing the failed instance PM626 from the passive node ST-W1809, he encountered an error (un-install pending for reboot) and the un-install did not complete successfully. At this point PM611 was still running. (Ajay Mishra)

At 16:50, Rune contacted Ajay and reported that instance PM611 was not visible. Ajay looked into the issue, found that the PM611 resources were down, and contacted the Windows team to look into it. Rakesh/Surjit were also informed about the issue. Avinash was involved from 17:30. It was later found that the EMC Cluster Enabler had failed and that all the associated cluster resources were unavailable. As suggested by Rune, both cluster nodes were rebooted by Ajay. Because the IP address and network name cluster resources were dependent on EMC_Cluster, and EMC_Cluster_enabler had gone down, no cluster resource for the PM611 instance was visible in the cluster console.

At 18:15, Avinash coordinated with the storage team, and the HCL storage team reported that the SAN link was down. This had caused the active production instance PM611 to go down, and all the cluster resources, including the IP address, SQL network name and associated disk resources, were not visible in the cluster console. The same message was conveyed to Rune.

At 19:15, the HCL storage team reported that the links were up again, but that they were still seeing some issues with the disk resources on both nodes. By this time Avinash had found that all the disk resources and the PM626 instance were online and running on ST-W1809 (making it the active node), but the IP address and SQL name resources associated with PM611 were still not visible in the cluster console. Since the whole PM611 cluster instance and all associated resources were down, it was decided in discussion with Rune that the Windows cluster group needed to restore the cluster configuration to get the failed instance up and running again.

At 20:23, Rune suggested un-installing the PM626 instance from the earlier active node ST-W1808 (by that time it had become the passive node, as all the resources were running on ST-W1809).

At 20:52, the first attempt to un-install instance PM626 from node ST-W1808 failed because the resources were not present there and were running on ST-W1809. Avinash then failed over the instance PM626 from ST-W1809 to ST-W1808 and tried to un-install PM626 using the setup. (At this time ST-W1808 was the passive node for the PM611 instance, as the associated resources were present on node ST-W1809.)

At 21:09, Avinash started un-installing the PM626 instance from node ST-W1808 (passive node). At the setup step where the instance to be removed is chosen from a drop-down list, Avinash tried to select instance PM626, because the list was showing PM611 as the default value, and expected the confirmation page to appear next. During the selection from the drop-down list the mouse behaved erratically, and setup proceeded to the "Remove Node Progress" step before the instance name had been selected and the selection confirmed. The un-installation started and Avinash could not stop it (there was no option to cancel). Once the step had completed, Avinash verified that the binaries of the PM611 instance had been removed from the passive node ST-W1808.

At 21:13, Avinash immediately informed Rune, Rakesh and Surjit that the binaries for the PM611 instance had been uninstalled from node ST-W1808.

At 21:45, Rune contacted Avinash again and informed him that another attempt to restore the cluster resource configuration would be made the next morning, and asked that nothing further be performed until then.

At 2:00 AM, Avinash checked the status of the PM611 instance on node ST-W1809 again and found that the resources and databases were up and running on node ST-W1809.

Note: All times mentioned above are in CET.

3.0 ROOT CAUSE ANALYSIS

3.1 Technique Used with Details

Windows event logs and cluster logs were reviewed.

3.2 Root Cause Analysis Impact on other Areas

All the user databases on instance PM611 went down.

3.3 Possible Causes

1. The required permissions on the cluster resources were not granted to the SQL Server service account and the SQL Server Agent service account, which resulted in the failure of the PM626 instance installation.
2. The storage link going down brought down the EMC Cluster Enabler service, which in turn brought down the PM611 instance and all associated cluster resources.
3. Erratic behavior of the mouse: a single mouse click was registered as a double click, which resulted in the un-installation of the PM611 SQL Server instance.
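The way one failed component took down the resources that depend on it can be illustrated with a small dependency walk. This is only a sketch: the chronology states that the IP address and network name resources depended on the EMC cluster resource, but the rest of the dependency map and all resource labels below are assumptions made for illustration, not the actual cluster configuration.

```python
from collections import deque

# Hypothetical dependency map: resource -> resources it depends on.
# Labels and most edges are illustrative assumptions, not the real cluster setup.
DEPENDS_ON = {
    "EMC Cluster Enabler": ["SAN link"],
    "PM611 disks": ["EMC Cluster Enabler"],
    "PM611 IP address": ["EMC Cluster Enabler"],
    "PM611 network name": ["PM611 IP address"],
    "PM611 SQL Server": ["PM611 disks", "PM611 network name"],
}


def affected_by(failed, depends_on):
    """Return every resource that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(resource)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    for name in affected_by("SAN link", DEPENDS_ON):
        print(name)
```

Under these assumed dependencies, a failure of the SAN link reaches every PM611 resource, which matches the observation that none of them were visible in the cluster console.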

3.4 Process Improvement

1. Process Template Name / Improvement Required: An un-installation checklist is to be prepared to avoid this kind of mistake in future.
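One possible shape for such a checklist is sketched below. It is a hypothetical illustration, not an existing tool: the checklist items, the class name and its methods are all assumptions. The idea is that the un-installation may proceed only when every item is ticked and the instance selected in setup matches the instance that was meant to be removed.

```python
from dataclasses import dataclass, field

# Hypothetical checklist items; the real list would be defined by the team.
CHECKLIST_ITEMS = (
    "instance name in the setup drop-down matches the intended instance",
    "screenshot of the selection captured",
    "other instances on the node confirmed unaffected",
)


@dataclass
class UninstallChecklist:
    intended_instance: str
    node: str
    done: set = field(default_factory=set)

    def tick(self, item):
        """Mark one checklist item as completed."""
        if item not in CHECKLIST_ITEMS:
            raise ValueError(f"unknown checklist item: {item}")
        self.done.add(item)

    def outstanding(self):
        """Return the items that have not been ticked yet, in checklist order."""
        return [item for item in CHECKLIST_ITEMS if item not in self.done]

    def may_proceed(self, selected_instance):
        """True only if all items are ticked and the selected instance is the intended one."""
        same = selected_instance.strip().upper() == self.intended_instance.strip().upper()
        return same and not self.outstanding()


if __name__ == "__main__":
    checklist = UninstallChecklist(intended_instance="PM626", node="ST-W1808")
    for item in CHECKLIST_ITEMS:
        checklist.tick(item)
    # The default value shown by setup was a different instance, so this is refused.
    print(checklist.may_proceed("PM611"))
```

A guard like this does not replace a confirmation page in the setup program itself; it only records that the operator checked the selection before starting.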

4.0 RESOLUTION

Windows cluster restored.

Solution Tested By

Verified that the cluster resources for PM611 were online.

Testing Note

All the databases are up and running, and applications/users are able to connect to the databases successfully.

5.0 ACTION DETAILS

5.1 Corrective Action

SL No 1
Action Item: Cluster resource configurations were restored to bring the PM611 instance online.
Owner: Rune/Windows Team
Completion Date: 06/10/2012
Status (Open/Closed): Closed

SL No 2
Action Item: Add the passive node ST-W1808 back to the failover cluster by re-installing SQL Server on it.
Owner: DBA
Status (Open/Closed): Open

5.2 Preventive Action

SL No 1
Action Item: Team lead to review all crucial activities and assign the resources accordingly.
Owner: DBA/Wintel
Status (Open/Closed): Open

Un-installation checklist to be prepared for every un-installation, with screenshots.