Abstract: Unstructured BigData is extracted from the dataset and converted to Hadoop format. The resultant data is stored in the cloud and secured by double encryption. The user can retrieve the data from the cloud through a user interface by double decryption. Security for the data in the cloud is provided using a Fully Homomorphic algorithm. As a result, efficient encryption, transmission, and storage of sensitive data are achieved. We analyze existing search algorithms over ciphertext; because most algorithms disclose the user's access patterns, we propose a new method of private information retrieval supporting keyword search, which combines Homomorphic encryption with private information retrieval.

I. INTRODUCTION
Big data is a term for data sets that are so large or complex disadvantage of these two algorithms is that the encrypted
that traditional data processing application software is
inadequate to deal with them. Challenges include capture, disadvantages of homomorphic and semi-homomorphic
storage, analysis, data curation, search, sharing, transfer, encryption we propose a fully homomorphic encryption
visualization, querying, updating and information privacy. scheme i.e., a scheme that allows one to evaluate circuits
The goal of most big data systems is to surface insights and over encrypted data without being able to decrypt.
connections from large volumes of heterogeneous data that
would not be possible using conventional methods.
Apache Hadoop is an open source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. The Hadoop Distributed File System is a distributed file-system that stores data on the commodity machines, providing very high aggregate bandwidth across the cluster.
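As a brief sketch of how an application hands data to this file-system, the snippet below copies a local file into the cluster using Hadoop's Java FileSystem API; the file names and the /data target path are placeholders for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS from the cluster configuration, which
        // normally points at the HDFS NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Copy a local file into HDFS; its blocks are then replicated
        // across the commodity machines of the cluster.
        fs.copyFromLocalFile(new Path("input.txt"), new Path("/data/input.txt"));
        fs.close();
    }
}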
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.
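This map-sort-reduce flow is easiest to see in the canonical Hadoop word-count job, sketched below: each map task emits a (word, 1) pair per token of its input chunk, the framework sorts and groups the pairs by word, and the reduce task sums the counts. Input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map task: runs on each input chunk in parallel, emitting (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: receives the sorted map outputs grouped by word and sums the counts.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}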
Cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. Cloud computing security or, more simply, cloud security refers to a broad set of policies, technologies, and controls deployed to protect data, applications, and the associated infrastructure of cloud computing.

Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, thus generating an encrypted result which, when decrypted, matches the result of operations performed on the plaintext. We define the relaxed notion of a semi-homomorphic encryption scheme, where the plaintext can be recovered as long as the computed function does not increase the size of the input too much. But the disadvantage of these two algorithms is that the encrypted data can be decrypted easily. To overcome the disadvantages of homomorphic and semi-homomorphic encryption, we propose a fully homomorphic encryption scheme, i.e., a scheme that allows one to evaluate circuits over encrypted data without being able to decrypt.
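The homomorphic property itself can be demonstrated in a few lines with textbook (unpadded) RSA, which is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the two plaintexts. This toy sketch, using 512-bit primes only to keep the demo fast, illustrates that property alone; it is not the fully homomorphic scheme proposed here, and unpadded RSA is insecure in practice.

import java.math.BigInteger;
import java.security.SecureRandom;

public class HomomorphicDemo {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        // Textbook RSA key generation (no padding).
        BigInteger p = BigInteger.probablePrime(512, rnd);
        BigInteger q = BigInteger.probablePrime(512, rnd);
        BigInteger n = p.multiply(q);
        BigInteger phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        BigInteger e = BigInteger.valueOf(65537); // coprime to phi with overwhelming probability
        BigInteger d = e.modInverse(phi);

        BigInteger m1 = BigInteger.valueOf(7);
        BigInteger m2 = BigInteger.valueOf(6);
        BigInteger c1 = m1.modPow(e, n); // Enc(7)
        BigInteger c2 = m2.modPow(e, n); // Enc(6)

        // Multiply the ciphertexts without ever decrypting them.
        BigInteger cProduct = c1.multiply(c2).mod(n);

        // Decrypting the ciphertext product yields the plaintext product: 42.
        System.out.println(cProduct.modPow(d, n));
    }
}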
II. EXISTING SYSTEM

Crawlers algorithm for extraction
A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. Web search engines and some other sites use Web crawling or spidering software to update their web content or indices of other sites' web content.
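A minimal breadth-first crawler along these lines is sketched below: it fetches a page, extracts outgoing links with a regular expression, and enqueues them until a small page cap is reached. The seed URL and the cap of 20 pages are placeholders; a real spider would also respect robots.txt, throttle its requests, and use a proper HTML parser instead of a regex.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniCrawler {
    // Naive link extraction; a production crawler would use an HTML parser.
    private static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        frontier.add("https://example.com"); // placeholder seed URL

        // Breadth-first crawl, capped at 20 pages for this sketch.
        while (!frontier.isEmpty() && visited.size() < 20) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already fetched this page
            }
            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(url + " -> " + response.statusCode());
            // Enqueue every outgoing link for later indexing.
            Matcher m = LINK.matcher(response.body());
            while (m.find()) {
                frontier.add(m.group(1));
            }
        }
    }
}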