
ALERT REPORTING STRUCTURE

1. Receiving Alert
a. An alert can be received through:
i. A flashing indicator on the dashboard of any monitoring tool
ii. Email
iii. SMS
2. Kinds of Alerts
a. Availability
i. Physical Machine
ii. Virtual Machine
iii. Cloud
iv. Containers
v. Web Sites/URLs
vi. Network
b. Performance
i. CPU
ii. RAM (Memory)
iii. Hard Disk
iv. Application
1. APIs
a. Read Request
b. Send Response
2. Web hooks
3. Web Sockets
4. Web Request
5. Logs
6. Errors
v. Database
1. SQL Queries
vi. Middleware
3. Identifying Alert
a. False Alert
i. Someone may be working on one of the components and forgot to inform the
monitoring team.
ii. The error/failure is known but was not reported to the monitoring team.
iii. The alert was raised during a maintenance window.
iv. Something was down for a fraction of a second, but the monitoring tool still reported it.
b. True Alert
i. All other received alerts are treated as true alerts.
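
The false-alert checks above can be sketched in Python. This is a minimal illustration, not a feature of any particular monitoring tool: the `Alert` class, the `planned_work` list, the `known_issues` set, and the one-second `min_duration` default are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    component: str        # name of the affected component (hypothetical field)
    raised_at: datetime   # when the monitoring tool raised the alert
    duration: timedelta   # how long the failing condition lasted


def false_alert_reason(alert, planned_work, known_issues,
                       min_duration=timedelta(seconds=1)) -> Optional[str]:
    """Return why an alert looks false, or None if it should be treated as true.

    planned_work: list of (component, start, end) tuples covering ongoing work
                  and maintenance windows (assumed input format).
    known_issues: set of component names with an already known error/failure.
    min_duration: outages shorter than this count as a blip (assumed value).
    """
    for component, start, end in planned_work:
        if component == alert.component and start <= alert.raised_at <= end:
            return "planned work or maintenance window"
    if alert.component in known_issues:
        return "known issue"
    if alert.duration < min_duration:
        return "transient blip"
    return None  # none of the false-alert cases matched
```

Returning a reason string rather than a plain boolean keeps a record of why an alert was dismissed, which can be noted in the ticket.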
4. Severity of Alert (Example: Disk Space)
a. OK/Good
i. Less than 80% of the total disk space is used.
b. Warning
i. Between 80% and 89% of the total disk space is used.
c. Critical
i. 90% or more of the total disk space is used.
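
The disk space example can be expressed as a small Python function. This is a sketch under one assumption: the boundaries are closed at 80% and 90%, so usage below 80% is OK, 80% up to but not including 90% is a warning, and 90% or above is critical. The function name `disk_severity` is hypothetical.

```python
def disk_severity(used_percent: float) -> str:
    """Map disk usage (percent of total disk space used) to a severity label."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent >= 90:
        return "CRITICAL"
    if used_percent >= 80:
        return "WARNING"
    return "OK"
```

For example, `disk_severity(85)` returns `"WARNING"`.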
5. Priority of Alert
a. P1 – The alert has a 100% impact on the availability of the application. Everything
is down: the customer cannot use the application, data is not available at the
customer end, and the network is down. The customer's business is disrupted, so
the customer may suffer a financial loss.
b. P2 – The impact of the alert is lower, i.e. 50%. Some functions of the application
work and some are unavailable. The customer's business is disrupted, so the
customer may suffer a financial loss.
c. P3 – The impact of the alert is low, i.e. 25%. Optional components of the application
are not working, but the customer's business is not disrupted.
d. Normal – These are informational alerts. They do not affect any component of the
application and cause no financial loss.
6. Reporting Alert
a. Ticketing Tool – Tracks each alert: when it was reported, when it received its first
reply, and when it was resolved. This is helpful for SLA tracking.
b. Email – Follow-ups and escalations can be done by email.
c. Phone/Bridge/Meeting – This must be used to handle P1 and P2 alerts/issues.
Applications need to be up and running in good condition as soon as possible.
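
The three timestamps a ticketing tool tracks (reported, first reply, resolved) are enough to derive the usual SLA figures. The following Python sketch assumes a hypothetical `Ticket` class; it does not correspond to the API of any specific ticketing product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    reported_at: datetime
    first_reply_at: Optional[datetime] = None  # None until someone replies
    resolved_at: Optional[datetime] = None     # None until the issue is resolved

    def time_to_first_reply(self) -> Optional[timedelta]:
        """Elapsed time between reporting and the first reply, if any."""
        if self.first_reply_at is None:
            return None
        return self.first_reply_at - self.reported_at

    def time_to_resolution(self) -> Optional[timedelta]:
        """Elapsed time between reporting and resolution, if resolved."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.reported_at
```

Comparing these two durations with the targets agreed in the SLA shows whether a ticket was handled in time.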
7. Escalation Matrix Timeline
a. Reporting the issue (minutes)
i. P1/P2 – within 1 to 9
ii. P3/Normal – within 1 to 30
b. Follow-up with the engineer (up to 2 follow-ups)
i. P1/P2 – after the issue is reported, at half-hour intervals
ii. P3/Normal – after the issue is reported, at 2-hour intervals
c. Follow-up with the Team Lead (up to 2 follow-ups)
i. P1/P2 – once 1 hour has passed since the issue was reported, at half-hour intervals
ii. P3/Normal – once 4 hours have passed since the issue was reported, at 2-hour intervals
d. Follow-up with the Manager (up to 2 follow-ups)
i. P1/P2 – once 2 hours have passed since the issue was reported, at half-hour intervals
ii. P3/Normal – once 8 hours have passed since the issue was reported, at 2-hour intervals
e. Follow-up with Director or Bridge
f. Follow-up with Vice President or Bridge
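
The follow-up timings in the matrix can be turned into a concrete schedule. The Python sketch below makes two assumptions that the matrix does not state explicitly: the first follow-up at each level happens exactly at the level's start offset, and the engineer level starts one interval after the issue is reported. The Director and Vice President levels are left out because the matrix gives no timings for them. The names `ESCALATION` and `follow_up_schedule` are hypothetical.

```python
from datetime import datetime, timedelta

# (level, start offset after reporting, interval between follow-ups)
ESCALATION = {
    "P1/P2": [
        ("Engineer", timedelta(minutes=30), timedelta(minutes=30)),
        ("Team Lead", timedelta(hours=1), timedelta(minutes=30)),
        ("Manager", timedelta(hours=2), timedelta(minutes=30)),
    ],
    "P3/Normal": [
        ("Engineer", timedelta(hours=2), timedelta(hours=2)),
        ("Team Lead", timedelta(hours=4), timedelta(hours=2)),
        ("Manager", timedelta(hours=8), timedelta(hours=2)),
    ],
}

FOLLOW_UPS_PER_LEVEL = 2  # the matrix allows 2 follow-ups per level


def follow_up_schedule(priority: str, reported_at: datetime):
    """Return a list of (level, follow-up time) pairs for one reported issue."""
    if priority in ("P1", "P2"):
        group = "P1/P2"
    elif priority in ("P3", "Normal"):
        group = "P3/Normal"
    else:
        raise ValueError(f"unknown priority: {priority!r}")
    schedule = []
    for level, start, interval in ESCALATION[group]:
        for n in range(FOLLOW_UPS_PER_LEVEL):
            schedule.append((level, reported_at + start + n * interval))
    return schedule
```

Under these assumptions, a P1 issue reported at 09:00 gets engineer follow-ups at 09:30 and 10:00, Team Lead follow-ups at 10:00 and 10:30, and Manager follow-ups at 11:00 and 11:30.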
8. Closing Alert
a. Close the resolved ticket in the ticketing tool. If you opened a bridge, announce
that the issue has been resolved and the ticket has been closed.

End of the Document
