Exploiting Dynamic Resource Allocation For Efficient Parallel Data Processing in The Cloud Abstract

Abstract: In recent years, cloud computing has emerged as a promising new approach for ad-hoc parallel data processing. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use stem from the field of cluster computing and disregard the particular nature of a cloud. As a result, the allocated compute resources may be inadequate for large parts of the submitted job and may unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our ongoing research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's compute clouds for both task scheduling and execution. It allows the individual tasks of a processing job to be assigned to different types of virtual machines and takes care of their instantiation and termination during job execution. Based on this new framework, we perform evaluations on a compute cloud system and compare the results to the existing data processing framework Hadoop.
Existing System: The vast amount of data that many companies have to deal with every day has made traditional database solutions prohibitively expensive. Instead, these companies have popularized an architectural paradigm based on a large number of commodity servers. Problems like processing crawled documents or regenerating a web index are split into several independent subtasks, distributed among the available nodes, and computed in parallel. To simplify the development of distributed applications on top of such architectures, many of these companies have also built customized data processing frameworks. They can be classified by terms like high-throughput computing (HTC) or many-task computing (MTC), depending on the amount of data and the number of tasks involved in the computation. Only recently, Amazon has integrated Hadoop as one of its core infrastructure services. However, instead of embracing the cloud's dynamic resource allocation, current data processing frameworks expect the cloud to imitate the static nature of the cluster environments they were originally designed for. For example, at the moment the types and number of VMs allocated at the beginning of a compute job cannot be changed in the course of processing, although the tasks the job consists of might have completely different demands on the environment. As a result, rented resources may be inadequate for large parts of the processing job, which may lower the overall processing performance and increase the cost.
DISADVANTAGES
The processing frameworks of current cloud computing companies are designed for static, homogeneous cluster setups and disregard the particular nature of a cloud.
The allocated resources may be inadequate for large parts of the submitted job.
Processing time and cost increase unnecessarily.
Proposed System: In this paper, we have discussed the challenges and opportunities for efficient parallel data processing in cloud environments and presented Nephele, the first data processing framework to exploit the dynamic resource provisioning offered by today's IaaS clouds. We have described Nephele's basic architecture and presented a performance comparison to the well-established data processing framework Hadoop. The performance evaluation gives a first impression of how the ability to assign specific virtual machine types to specific tasks of a processing job, as well as the possibility to automatically allocate and deallocate virtual machines in the course of a job execution, can help to improve the overall resource utilization and, consequently, reduce the processing cost. With a framework like Nephele at hand, there are a variety of open research issues, which we plan to address in future work. In particular, we are interested in improving Nephele's ability to adapt automatically to resource overload or underutilization during job execution.
ADVANTAGES
Dynamic allocation and deallocation of different compute resources from the cloud.
Each task is executed by a set of instances that share the task.
Allocating and deallocating resources per task keeps the job cost- and time-efficient.
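The cost argument behind these advantages can be illustrated with a small sketch. It compares renting one fixed set of VMs for the whole job against allocating a suitable VM type per task and releasing it when the task finishes. The task names, VM types, prices and runtimes are invented for illustration, the tasks are assumed to run one after another, and the code does not use Nephele's actual API.

```python
from dataclasses import dataclass

# Hypothetical hourly prices per VM type (illustrative values only).
PRICE_PER_HOUR = {"small": 0.10, "large": 0.80}

@dataclass
class Task:
    name: str
    vm_type: str    # VM type this task actually needs
    instances: int  # number of VMs the task runs on
    hours: float    # how long the task runs

def static_cost(tasks, vm_type, instances):
    """Cost when one fixed set of VMs is rented for the whole job."""
    total_hours = sum(t.hours for t in tasks)
    return PRICE_PER_HOUR[vm_type] * instances * total_hours

def dynamic_cost(tasks):
    """Cost when each task gets its own VMs, released when it finishes."""
    return sum(PRICE_PER_HOUR[t.vm_type] * t.instances * t.hours for t in tasks)

# A two-stage job: a demanding parallel stage, then a light sequential stage.
job = [
    Task("sort", "large", 4, 1.0),
    Task("aggregate", "small", 1, 2.0),
]
static = static_cost(job, "large", 4)  # four large VMs kept for all three hours
dynamic = dynamic_cost(job)            # large VMs released after the first stage
```

With these made-up numbers the static setup pays for four large VMs during the light second stage, which is where the dynamic allocation saves cost.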
1. NETWORK MODULE:
Client-server computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters (clients). Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that runs one or more server programs which share its resources with clients. A client does not share any of its resources; clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.
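The division of roles described above can be sketched with a minimal local example: the server listens and waits, and the client initiates the session. The echo reply, the loopback address and the helper name `serve_once` are assumptions made for this illustration and are not part of the project's network module.

```python
import socket
import threading

def serve_once(listener):
    """Accept one client, read its request, and send back a reply."""
    conn, _addr = listener.accept()  # the server waits for a client to connect
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"echo: " + request)

# The server listens on a free local port (port 0 lets the OS pick one).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# The client initiates the session and sends its request.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
listener.close()
```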
2. LBS SERVICES:
In particular, users are reluctant to use Location Based Services (LBSs), since revealing their position may link to their identity. Even though a user may create a fake ID to access the service, her location alone may disclose her actual identity. Linking a position to an individual is possible by various means, for example publicly available information like city maps. When a user u wishes to pose a query, she sends her location to a trusted server, the anonymizer (AZ), through a secure connection (SSL). The latter obfuscates her location, replacing it with an anonymizing spatial region (ASR) that encloses u. The ASR is then forwarded to the Location Service (LS). Ignoring where exactly u is, the LS retrieves (and reports to the AZ, the admin process) a candidate set (CS) that is guaranteed to contain the query results for any possible user location inside the ASR. The AZ receives the CS and reports to u the subset of candidates that corresponds to her original query.
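The three steps (anonymize, retrieve a candidate set, refine) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses Euclidean distance and a rectangular ASR rather than a road network, the query is a range query ("all points of interest within a given radius"), and all function names are hypothetical.

```python
import math

def make_asr(user, others):
    """Anonymizer: replace the user's location with a rectangle (ASR)
    that encloses her and the other given users."""
    xs = [user[0]] + [p[0] for p in others]
    ys = [user[1]] + [p[1] for p in others]
    return (min(xs), min(ys), max(xs), max(ys))

def dist_to_rect(p, rect):
    """Shortest distance from point p to rectangle rect (0 if p is inside)."""
    x1, y1, x2, y2 = rect
    dx = max(x1 - p[0], 0.0, p[0] - x2)
    dy = max(y1 - p[1], 0.0, p[1] - y2)
    return math.hypot(dx, dy)

def candidate_set(pois, asr, radius):
    """LS: without knowing where u is, return every POI within radius of the
    ASR. This covers the answer for any possible location inside the ASR."""
    return [p for p in pois if dist_to_rect(p, asr) <= radius]

def refine(candidates, user, radius):
    """Anonymizer: keep only the candidates that answer the original query."""
    return [p for p in candidates if math.dist(p, user) <= radius]

user = (2.0, 2.0)
others = [(0.0, 0.0), (4.0, 3.0)]
pois = [(2.0, 3.0), (5.0, 3.0), (9.0, 9.0)]

asr = make_asr(user, others)          # what the LS sees instead of (2.0, 2.0)
cs = candidate_set(pois, asr, 1.5)    # superset computed by the LS
answer = refine(cs, user, 1.5)        # exact result returned to u
```

The LS only ever sees `asr`, so the candidate set is larger than the true answer; the anonymizer pays for privacy with this extra filtering step.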
3. SYSTEM MODEL:
The ASR construction at the anonymization process abides by the user's privacy requirements. In particular, given an anonymity degree K specified by u, the ASR satisfies two properties: (i) it contains u and at least another K - 1 users, and (ii) even if the LS knew the exact locations of all users in the system, it could not distinguish u from the other users in the ASR. We propose an edge ordering anonymization approach for users in road networks, which guarantees K-anonymity under the strict reciprocity requirement (described later). We identify the crucial concept of border nodes, an important indicator of the CS size and of the query processing cost at the LS. We consider various edge orderings, and qualitatively assess their query performance based on border nodes.
We design efficient query processing mechanisms that exploit existing network database infrastructure, and guarantee CS inclusiveness and minimality. Furthermore, they apply to various network storage schemes. We devise batch execution techniques for anonymous queries that significantly reduce the overhead of the LS by computation sharing.
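One simple form of computation sharing can be sketched as follows: queries that carry the same ASR are grouped, and the candidate set is computed once per distinct ASR. This is only an illustration of the idea; the batch technique referred to above may share work at a finer granularity, and `batch_evaluate` and `evaluate_asr` are hypothetical names.

```python
from collections import defaultdict

def batch_evaluate(queries, evaluate_asr):
    """Evaluate each distinct ASR once and share the result among its queries.

    queries: list of (query_id, asr) pairs
    evaluate_asr: function mapping an ASR to its candidate set
    """
    by_asr = defaultdict(list)
    for query_id, asr in queries:
        by_asr[asr].append(query_id)

    results = {}
    for asr, query_ids in by_asr.items():
        candidates = evaluate_asr(asr)  # computed once per distinct ASR
        for query_id in query_ids:
            results[query_id] = candidates
    return results

calls = []

def fake_evaluate(asr):
    """Stand-in for the LS search; records how often it is invoked."""
    calls.append(asr)
    return ["poi-near-" + asr]

batch = [("q1", "asr-A"), ("q2", "asr-B"), ("q3", "asr-A")]
results = batch_evaluate(batch, fake_evaluate)
```

Here three queries trigger only two evaluations at the LS, because `q1` and `q3` share an ASR.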
4. SCHEDULED TASK:
Recently, considerable research interest has focused on preventing identity inference in location-based services, proposing spatial cloaking techniques. In the following, we describe existing techniques for ASR computation (at the AZ) and query processing (at the LS). At the end, we cover alternative location privacy approaches and discuss why they are inappropriate to our problem setting. Spatial cloaking offers privacy protection in the sense that the actual position of user u cannot be distinguished from the others in the ASR, even when a malicious LS is equipped or advanced enough to possess all user locations. This spatial K-anonymity model is the most widely used in location privacy research and applications, even though alternative models are emerging.
5. QUERY PROCESSING:
We propose the network-based anonymization and processing (NAP) framework, the first system for K-anonymous query processing in road networks. NAP relies on a global user ordering and bucketization that satisfies reciprocity and guarantees K-anonymity. We identify the ordering characteristics that affect subsequent processing, and qualitatively compare alternatives. Then, we propose query evaluation techniques that exploit these characteristics. Processing is based on an implementation of the theorem and uses (network-based) search operations as off-the-shelf building blocks. Thus, the NAP query evaluation methodology is readily deployable on existing systems, and can be easily adapted to different network storage schemes. In this case, the queries are evaluated in a batch. In addition to user privacy, NAP achieves low computational and communication costs, and quick responses overall. It is readily deployable, requiring only basic network operations.
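The bucketization idea can be sketched as follows, under simplifying assumptions: the users are already sorted by some global order (for example the edge ordering mentioned earlier), every user requests the same K, and the function names are hypothetical. Consecutive users are cut into buckets of at least K members, and every member of a bucket receives the same anonymizing set, which is what reciprocity requires.

```python
def bucketize(ordered_users, k):
    """Split a globally ordered user list into buckets of at least k users."""
    if k < 1 or len(ordered_users) < k:
        raise ValueError("need at least k users")
    buckets = [ordered_users[i:i + k] for i in range(0, len(ordered_users), k)]
    # Merge a short final bucket into its predecessor so every bucket has >= k users.
    if len(buckets) > 1 and len(buckets[-1]) < k:
        buckets[-2].extend(buckets.pop())
    return buckets

def anonymizing_set(user, buckets):
    """Return the bucket containing user; every member gets the same set."""
    for bucket in buckets:
        if user in bucket:
            return bucket
    raise KeyError(user)

users = ["a", "b", "c", "d", "e", "f", "g"]  # already in global order
buckets = bucketize(users, 3)
set_for_e = anonymizing_set("e", buckets)
set_for_g = anonymizing_set("g", buckets)
```

Because the set depends only on the bucket and not on who asked, an observer who sees the set cannot tell which of its members issued the query.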
System Requirement Specification

Software Interface
JDK 1.5
Java Swing
SQL Server

Hardware Interface
PROCESSOR : PENTIUM IV 2.6 GHz
RAM : 512 MB DD RAM
MONITOR : 15" COLOR
HARD DISK : 40 GB
KEYBOARD : STANDARD 102 KEYS
MOUSE : 3 BUTTON
