
International Journal of Trend in Scientific Research and Development (IJTSRD)

International Open Access Journal | www.ijtsrd.com

ISSN No: 2456 - 6470 | Volume - 2 | Issue – 6 | Sep – Oct 2018

Fault Detection and Prediction in Cloud Computing
Swetha. S, Dr. S. Venkatesh kumar
Department of Computer Applications, Dr. SNS Rajalakshmi College of Arts & Science,
Coimbatore, Tamil Nadu, India

ABSTRACT
Cloud computing is a new technology in distributed computing, and its usage is increasing rapidly. To serve customers and businesses reliably, faults occurring in datacenters and servers must be detected and predicted efficiently so that mechanisms to tolerate the resulting failures can be launched. A failure in one hosted datacenter may propagate to other datacenters and make the situation worse. To prevent such circumstances, one can predict a failure spreading throughout the cloud computing system and launch mechanisms to deal with it proactively.

Keyword: Cloud computing, failure detection, cloud datacenters, probability and statistics, Bayesian probability, machine learning, datacenters, failure management, dependable computing, coordinated fault propagation, IPMI, FTB, and clusters.

INTRODUCTION
Cloud computing is a new technology in distributed computing, and its usage is increasing rapidly. To serve customers and businesses reliably, faults occurring in datacenters and servers must be detected and predicted efficiently so that mechanisms to tolerate the resulting failures can be launched. A failure in one hosted datacenter may propagate to other datacenters and make the situation worse. To prevent such situations, one can predict a failure proliferating throughout the cloud computing system and launch mechanisms to deal with it proactively. One way to predict failures is to train a machine on the messages or logs passed between the various components of the cloud. In the training session, the machine can identify certain message patterns related to datacenter failures. Later, the machine can be used to check whether a given group of messages or logs follows such patterns. Additionally, each cloud server can be described by a state that indicates whether it is running properly or facing some failure. Parameters such as CPU usage and memory usage can be maintained for each server.

LITERATURE SURVEY:
• Availability depends directly on how quickly the cloud infrastructure can detect errors and take the necessary steps to troubleshoot them.
• Providing stable service is a major challenge for service providers; failing to do so may cause huge financial losses for organizations.

PROBLEM STATEMENT:
• The large scale and dynamic nature of the cloud add extra difficulty to fault detection and management.
• While effective fault detection and prediction is critical, one should also know the reasons that led to the fault.

Pre-process:
First, we derive the message patterns from the messages recorded in a message log.

Extracting the failure information:
Messages usually include a field that carries priority information and helps administrators handle the messages according to their severity.

Message pattern generation:
• A message pattern is defined as a set of message types in the message window.
• The message pattern can be expressed as a sequence of messages, either considering or ignoring their order.
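
The pre-processing steps described here can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the record fields (`type`, `priority`), the set of severity labels treated as failures, and the fixed window size are all assumptions made for the example.

```python
from collections import Counter

# Assumed severity labels that mark a message as failure-related (hypothetical).
FAILURE_PRIORITIES = {"emerg", "alert", "crit", "err"}


def failure_messages(log):
    """Keep only the messages whose priority field marks them as failures."""
    return [m for m in log if m["priority"] in FAILURE_PRIORITIES]


def message_patterns(types, window, ordered=True):
    """Slide a fixed-size window over message types and count each pattern.

    ordered=True  -> a pattern is the sequence of types in the window
    ordered=False -> a pattern is the set of types, ignoring their order
    """
    counts = Counter()
    for i in range(len(types) - window + 1):
        chunk = types[i:i + window]
        key = tuple(chunk) if ordered else frozenset(chunk)
        counts[key] += 1
    return counts


# Toy message log (illustrative data only).
log = [
    {"type": "A", "priority": "info"},
    {"type": "B", "priority": "warning"},
    {"type": "A", "priority": "info"},
    {"type": "C", "priority": "err"},
]

types = [m["type"] for m in log]
ordered = message_patterns(types, window=2, ordered=True)
unordered = message_patterns(types, window=2, ordered=False)
failures = failure_messages(log)
```

With order considered, the windows (A, B) and (B, A) are distinct patterns; with order ignored, both collapse into the single pattern {A, B}. Ignoring order gives fewer, more frequent patterns at the cost of losing sequence information.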

@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 6 | Sep-Oct 2018 Page: 878
FAILURE DETECTION AND PREDICTION MECHANISMS
• We may label the runtime health-related data with one of two classes: Class 0 for normal behavior and Class 1 for situations with failures. Class 1 is then very rare compared with Class 0.
• In addition, data from the rare class may be incomplete because of collection problems.

Ensemble of Bayesian Models for Failure Detection
• A data point is labeled as normal or failure based on its probability of appearing as a normal data point.
• To construct the probabilistic model and ensure high detection precision, we develop an ensemble of Bayesian submodels to represent a multimodal probability distribution.

Decision Tree for Failure Prediction
• The failure detection method based on an ensemble of Bayesian models, presented in the preceding section, identifies anomalous behaviors in a datacenter. The anomalies are reported to the system administrators for verification as failures.
• The goodness of a split is measured by impurity. A split is pure if, after the split, all the data taking each branch belong to the same class. We use entropy to quantify impurity.

BACKGROUND
Fault-Tolerance Backplane (FTB)
• The CIFTS Fault Tolerance Backplane is an asynchronous messaging backplane that provides communication among the various system software components. The Fault Tolerance Backplane (FTB) provides a common infrastructure for the operating system, libraries, and applications to exchange information related to hardware and software failures.
• Different components can subscribe to be alerted about one or more events of interest from other components, as well as notify other components about the faults they detect. The FTB framework comprises a set of distributed daemons called FTB Agents, which contain the bulk of the FTB logic and manage most of the event communication throughout the system.

Intelligent Platform Management Interface (IPMI)
• The Intelligent Platform Management Interface (IPMI) defines a set of common interfaces to a computer system which can be used to monitor system health.
• The BMC connects to SCs within the same chassis through the Intelligent Platform Management Bus/Bridge (IPMB). Among other pieces of information, IPMI maintains a Sensor Data Records (SDR) repository which provides the sensor readings.

DESIGN AND IMPLEMENTATION
• FTB-IPMI is designed to run as a single stand-alone daemon that handles multiple operations such as reading IPMI sensors, classifying events based on severity, and propagating the fault information via FTB.
• A single instance of the FTB-IPMI daemon running on one node can manage an entire cluster. Once configured, these actions are performed at periodic, user-set intervals.

CLOUD USAGE
• Private clouds are owned by the respective enterprises. Their functionality is not directly visible to the customer, though in some cases services with cloud-enhanced features may be offered, similar to Software as a Service. Example: eBay.
• Public clouds: enterprises may use cloud functionality from others, or offer their own services to users outside the company.

TYPES OF FAULTS
Faults can be classified according to several factors, such as:

Network faults: faults that occur in a network due to network partition, packet loss, packet corruption, destination failure, link failure, etc.

Physical faults: faults that occur in hardware, such as faults in CPUs, memory, or storage.

Media faults: faults that occur due to media head crashes.

Processor faults: faults that occur in the processor due to operating system crashes.

Process faults: faults that occur due to a shortage of resources, software bugs, etc.

Service expiry fault: the service time of a resource expires while an application is still using it.

A failure that occurs during computation on system resources can be classified as:
• OMISSION FAILURE
• TIMING FAILURE
• RESPONSE FAILURE
• CRASH FAILURE

Permanent: These failures are caused by accidents such as a wire cut or a power breakdown. They are easy to reproduce. They can cause major disruptions, and some part of the system may not function as desired.

Intermittent: These failures appear occasionally. They are mostly overlooked while testing the system and only appear when the system goes into operation. Therefore, it is hard to predict the extent of the damage they can bring to the system.

Transient: These failures are caused by some inherent fault in the system. They are corrected by retrying or by rolling the system back to a previous state, for example by restarting software or resending a message.

CONCLUSION:
Despite the many advantages offered by cloud computing, there are also networking concerns that hinder its rapid adoption. This article has reviewed and analyzed the networking-related issues that arise due to resource outsourcing; the virtualized, shared, and public nature of cloud computing; the emerging challenges from security breaches; and the increasing need to provide a resilient cloud computing infrastructure and services. The discussion also presented and examined related contributions from industry, academia and correction fields. Finally, the article highlighted relevant cloud computing areas requiring further research.

