
IDL - International Digital Library of Technology & Research
Volume 1, Issue 5, May 2017. Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017

Algorithm to Convert Unstructured Data in Hadoop and Framework to Secure Big Data in Cloud

1 Lakshmikantha G C, 2 Anusha Desai, 3 Keerthana G, 4 Keerthi Kiran M E, 5 Siri S Garadi

1 (Assistant Professor, Department of CSE, VKIT, Bengaluru, Lakshmikantha.gc@gmail.com)
2 (B.E Student, Department of CSE, VKIT, Bengaluru, anushadesai21@gmail.com)
3 (B.E Student, Department of CSE, VKIT, Bengaluru, bramara.keerthana@gmail.com)
4 (B.E Student, Department of CSE, VKIT, Bengaluru, keerthikiran351995@gmail.com)
5 (B.E Student, Department of CSE, VKIT, Bengaluru, Sirigaradi82@gmail.com)

Abstract: Extracting unstructured Big Data from the dataset and converting it to Hadoop format. The resultant data is stored in the cloud and secured by double encryption. The user can retrieve the data from the cloud through a user interface by double decryption. Security for the data in the cloud is provided using a Fully Homomorphic algorithm. As a result of efficient encryption, transmission and storage of sensitive data are achieved. We analyze the existing search algorithms over ciphertext; for the problem that most algorithms will disclose the user's access patterns, we propose a new method of private information retrieval supporting keyword search, which combines Homomorphic encryption and private information retrieval.

I. INTRODUCTION

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The goal of most big data systems is to surface insights and connections from large volumes of heterogeneous data that would not be possible using conventional methods.

Apache Hadoop is an open source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The Hadoop Distributed File System is a distributed file system that stores data on the commodity machines, providing very high aggregate bandwidth across the cluster.

A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.
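As a sketch of how such a job looks in practice, the following minimal word-count example uses the standard org.apache.hadoop.mapreduce API; the class names and the whitespace tokenization are our own illustration, not part of the proposed system.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Map task: emits (word, 1) for every token in its input chunk.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
      }
    }
  }
  // Reduce task: the framework has already sorted and grouped map output by word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) sum += c.get();
      ctx.write(key, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}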
Cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. Cloud computing security or, more simply, cloud security refers to a broad set of policies, technologies, and controls deployed to protect data, applications, and the associated infrastructure of cloud computing. Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result which, when decrypted, matches the result of operations performed on the plaintext. We also consider the relaxed notion of a semi-homomorphic encryption scheme, where the plaintext can be recovered as long as the computed function does not increase the size of the input too much. The disadvantage of these two approaches is that the encrypted data can be decrypted easily. To overcome the disadvantages of homomorphic and semi-homomorphic encryption, we propose a fully homomorphic encryption scheme, i.e., a scheme that allows one to evaluate circuits over encrypted data without being able to decrypt.

II. EXISTING SYSTEM

Crawlers algorithm for extraction

A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. Web search engines and some other sites use Web crawling or spidering software to update their web content or indices of other sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so the users can search much more efficiently. Crawlers consume resources on the systems they visit and often visit sites without approval. Issues of schedule, load, and "politeness" come into play when large collections of pages are accessed. Mechanisms exist for public sites not wishing to be crawled to make this known to the crawling agent. For instance, including a robots.txt file can request bots to index only parts of a website, or nothing at all. As the number of pages on the internet is extremely large, even the largest crawlers fall short of making a complete index. For that reason search engines were bad at giving relevant search results in the early years of the World Wide Web, before the year 2000. This has been improved greatly by modern search engines; nowadays very good results are given instantly.
Homomorphic Encryption
Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, thus generating an encrypted result which, when decrypted, matches the result of operations performed on the plaintext. This is sometimes a desirable feature in modern communication system architectures. Homomorphic encryption would allow the chaining together of different services without exposing the data to each of those services. For example, a chain of different services from different companies could calculate 1) the tax, 2) the currency exchange rate, and 3) shipping on a transaction without exposing the unencrypted data to each of those services. Homomorphic encryption schemes are malleable by design. This enables their use in cloud computing environments for ensuring the confidentiality of processed data. In addition, the homomorphic property of various cryptosystems can be used to create many other secure systems, for example secure voting systems, collision-resistant hash functions, private information retrieval schemes, and many more. There are several partially homomorphic cryptosystems, and also a number of fully homomorphic cryptosystems. Although a cryptosystem which is unintentionally malleable can be subject to attacks on this basis, if treated carefully homomorphism can also be used to perform computations securely.
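As a small worked illustration with toy numbers of our own (far too small to be secure): take textbook RSA with n = 33 and public exponent a = 3, so E(m) = m^3 mod 33. Then E(2) = 8 and E(4) = 31, and E(2) * E(4) mod 33 = 248 mod 33 = 17, which is exactly E(2 * 4) = 8^3 mod 33 = 17. Multiplying two ciphertexts therefore multiplies the underlying plaintexts; this is the multiplicative homomorphic property exploited in the proposed system of Section IV.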
Data Encryption Standard (DES)
The Data Encryption Standard is a symmetric-key block cipher, published as FIPS-46 in January 1977 in the Federal Register by the National Institute of Standards and Technology. On the encryption side, DES takes a 64-bit plaintext and produces a 64-bit ciphertext; on the decryption side, it takes a 64-bit ciphertext and produces a 64-bit plaintext, and the same 56-bit cipher key is used for both encryption and decryption. The encryption process is composed of two permutations (P-boxes), which we call the initial and final permutations, and sixteen Feistel rounds.

Advanced Encryption Standard (AES)
The Advanced Encryption Standard is a symmetric-key block cipher, published as FIPS-197 in the Federal Register in December 2001 by the National Institute of Standards and Technology. AES is a non-Feistel cipher. AES encrypts data with a block size of 128 bits. It uses 10, 12, or 14 rounds.

Blowfish Algorithm
Blowfish is a symmetric block cipher algorithm. It uses the same secret key for both encryption and decryption of messages. The block size of Blowfish is 64 bits; messages that are not a multiple of 64 bits in size have to be padded. It takes a variable-length key, from 32 bits to 448 bits. It is suitable for applications where the key is not changed frequently. It is considerably faster than most encryption algorithms when implemented on 32-bit microprocessors with large data caches.
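All three ciphers are exposed in Java through the standard javax.crypto API. The snippet below is a minimal sketch of AES encryption and decryption in CBC mode; the key size, cipher mode and sample message are our own choices for illustration.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class AesDemo {
  public static void main(String[] args) throws Exception {
    // Generate a fresh 128-bit AES key.
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(128);
    SecretKey key = keyGen.generateKey();

    // Encrypt: CBC mode with PKCS5 padding; the cipher creates a random IV for us.
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key);
    byte[] iv = cipher.getIV();
    byte[] ciphertext = cipher.doFinal("sensitive record".getBytes("UTF-8"));

    // Decrypt with the same key and IV.
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    byte[] plaintext = cipher.doFinal(ciphertext);
    System.out.println(new String(plaintext, "UTF-8"));
  }
}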
Disadvantages:
Encryption is a very complex technology. Management of encryption keys becomes an added administrative task for often overburdened IT staff. One big disadvantage of encryption as it relates to keys is that the security of the data becomes the security of the encryption key. Lose that key, and you effectively lose your data.
Encrypting data and creating the keys necessary to encrypt and decrypt the data is computationally expensive. No matter what type of encryption is used, the systems performing the computational heavy lifting must have available resources.
One of the common drawbacks of traditional full-disk encryption solutions is the reduction of overall system performance upon deployment. A key pitfall is that a poor encryption implementation could result in a false sense of security, when in fact it is wide open to attack.
There are three known attacks that can break the full 16 rounds of DES with less complexity than a brute-force search: differential cryptanalysis (DC), linear cryptanalysis, and Davies' attack. However, the attacks are theoretical and are unfeasible to mount in practice; these types of attack are sometimes termed certificational weaknesses.
The key space to be searched by brute-force attacks increases by a factor of 2 for each additional bit of key length. This alone increases the degree of difficulty for a brute-force search very rapidly. Key length itself does not provide sufficient security against attacks, however, as there are ciphers with very long keys which have still been found to be vulnerable.
Encrypting a message does not guarantee that the message is not changed while encrypted. Hence a message authentication code is often added to a ciphertext to ensure that changes to the ciphertext will be noticed by the receiver. Message authentication codes can be constructed from symmetric ciphers.

However, symmetric ciphers cannot be used for non-repudiation purposes except by involving additional parties. Another application is to build hash functions from block ciphers; see one-way compression function for descriptions of several such methods.
A reduced-round variant of Blowfish is known to be susceptible to known-plaintext attacks on reflectively weak keys. Blowfish implementations use 16 rounds of encryption and are not susceptible to this attack.
III. LITERATURE SURVEY

DESIGN OF FOCUSED CRAWLER BASED ON FEATURE EXTRACTION, CLASSIFICATION AND TERM EXTRACTION
A web crawler is a software program that scans the hypertext layout of web pages, starting from a set of seed pages. The crawler retrieves these pages, indexes them and extracts the hyperlinks inside these pages to find the addresses of more pages to be crawled. The aim is to improve the performance of the crawler by using different feature extraction and classification techniques.

AN EFFICIENT R-G-B ALGORITHM FOR WEB CRAWLER ON INFORMATION EXTRACTION
Vertical search engines have an unstable shooting rate and low average accuracy. This work develops RDF data transfer and query on the Hadoop framework; to handle the storage of huge data, MapReduce and Hadoop framework supporting tools are used to perform the data transfer. It is also used to reduce the access time.

HADOOP SECURITY MODELS - A STUDY
The new emerging technology to handle large numbers of datasets in an efficient manner is big data, which is used in various platforms and domains to support several services and improve system performance in a reliable manner. The main weakness in this domain is security, which can be easily destroyed or surpassed by the user. To enhance it in a protected way, this work provides a detailed survey of the various security mechanisms and methodologies used in big data techniques.

BIG DATA EMERGING ISSUES: HADOOP SECURITY AND PRIVACY
With the growing development of data, it has become increasingly vulnerable and exposed to malicious attacks. These attacks can damage the essential properties of information systems, such as confidentiality, integrity and availability. To deal with these malicious intents, it is necessary to develop effective protection mechanisms. This work indicates the main risks arising in Big Data and the existing security mechanisms, focusing on Hadoop security and its components, because Hadoop remains the required framework for the management and processing of big data.

DESIGN AND IMPLEMENTATION OF HDFS DATA ENCRYPTION SCHEME USING ARIA ALGORITHM ON HADOOP
Hadoop was developed as a distributed data processing platform for analyzing big data. Enterprises can analyze big data containing users' sensitive information by using Hadoop and utilize it for marketing. Therefore, research on data encryption has been widely done to prevent the leakage of sensitive data stored in Hadoop. However, the existing research supports only the AES international standard data encryption algorithm. Meanwhile, the Korean government selected the ARIA algorithm as a standard data encryption scheme for domestic usage. This work presents an HDFS data encryption scheme that supports both the ARIA and AES algorithms on Hadoop.

MODIFIED RSA ALGORITHM TO ENHANCE SECURITY FOR DIGITAL SIGNATURE
Digital signatures have been providing security services to secure electronic transactions over the internet. The Rivest, Shamir and Adleman (RSA) algorithm has been most widely used to provide this security. Here the RSA algorithm is modified to enhance its level of security. This paper presents a fair comparison between RSA and the modified RSA algorithm, along with time and security, by running several encryption and decryption settings to process data of different sizes. The efficiency of these algorithms was considered based on key generation speed and security level. Texts of different sizes were encrypted and decrypted using the RSA and modified RSA algorithms.

FINANCIALCLOUD: OPEN CLOUD FRAMEWORK OF DERIVATIVE PRICING
Predicting prices and risk measures of assets and derivatives and rating financial products have been studied and widely used by financial institutions and individual investors. In contrast to the centralized and oligopoly nature of the existing financial information services, in this paper, we advocate the notion of a Financial Cloud, i.e., an open distributed framework based on cloud computing architecture to host modularized financial services, such that these modularized financial services may easily be integrated flexibly and dynamically to meet users' needs on demand. This new cloud-based architecture of modularized financial services provides several advantages. We may have different types of service providers in the ecosystem on top of the framework. For example, market data resellers may collect and sell long-term historical market data. Statistical analyses of macroeconomic indices, interest rates, and the correlation of a set of assets may also be purchased online.

Some agencies might be interested in providing services based on rating or pricing values of financial products.

THREE STEP DATA SECURITY MODEL FOR CLOUD COMPUTING BASED ON RSA AND STEGANOGRAPHY TECHNIQUES
Cloud computing is based on network and computer applications. In the cloud, data sharing is an important activity. Small, medium, and big organizations use the cloud to store their data at a minimum rental cost. At present, clouds have proved their importance in terms of resource and network sharing, application sharing and data storage utility. Hence, most customers want to use cloud facilities and services, so security is the most essential part from the customer's point of view as well as the vendor's. There are several issues that need attention with respect to service of data, security or privacy of data, and management of data. The security of stored data and information is one of the most crucial problems in cloud computing. Using good protection techniques for access control, many security problems can be resolved.
IV. PROPOSED SYSTEM

Fully Homomorphic Encryption
A fully homomorphic algorithm is used to double encrypt and decrypt the content of a file; we use the RSA and RNS algorithms to encrypt and decrypt the content of the file using its private and public keys.
The RSA cryptosystem exhibits the properties of multiplicative homomorphic encryption.
Ronald Rivest, Adi Shamir and Leonard Adleman devised the RSA algorithm, which was later named after its inventors. RSA uses modular exponentiation for encryption and decryption. RSA uses two exponents, a and b, where a is public and b is private. Let the plaintext be Pt and the ciphertext be Ct; then at encryption, Ct = Pt^a mod n, and at the decryption side, Pt = Ct^b mod n. Here n is a very large number, created during the key generation process.
The RSA method's security rests on the fact that it is extremely difficult to factor very large numbers.
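A minimal sketch of this scheme using java.math.BigInteger, with 512-bit primes and variable names of our own choosing (a production deployment would use a 2048-bit or larger modulus, plus proper padding):

import java.math.BigInteger;
import java.security.SecureRandom;

public class RsaSketch {
  public static void main(String[] args) {
    SecureRandom rnd = new SecureRandom();
    // Key generation: n = p*q; public exponent a; private exponent b = a^-1 mod phi(n).
    BigInteger p = BigInteger.probablePrime(512, rnd);
    BigInteger q = BigInteger.probablePrime(512, rnd);
    BigInteger n = p.multiply(q);
    BigInteger phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
    BigInteger a = BigInteger.valueOf(65537);   // public exponent
    BigInteger b = a.modInverse(phi);           // private exponent (assumes gcd(a, phi) = 1,
                                                // which holds with overwhelming probability)
    // Encryption: Ct = Pt^a mod n; decryption: Pt = Ct^b mod n.
    BigInteger pt = new BigInteger("42");
    BigInteger ct = pt.modPow(a, n);
    System.out.println(ct.modPow(b, n));        // prints 42

    // Multiplicative homomorphic property: E(x) * E(y) mod n decrypts to x*y mod n.
    BigInteger ctProduct = ct.multiply(ct).mod(n);
    System.out.println(ctProduct.modPow(b, n)); // prints 1764 = 42 * 42
  }
}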
The residue number system (RNS) can be applied to any moduli set. Simulation results proved that the algorithm was many times faster than most competitive published work. Determining the position of the most significant nonzero bit of any residue number in that algorithm is the major speed-limiting factor. The same algorithm is customized to serve two specific moduli sets, (2^k, 2^k - 1, 2^(k-1) - 1) and (2^k + 1, 2^k, 2^k - 1), thus eliminating that speed-limiting factor. Based on this work, the hardware needed to determine the most significant bit position has been reduced to a single adder. Therefore, computation time and hardware requirements are substantially improved. This would enable RNS to be a stronger force in building general-purpose computers.

Advantages:
Security is easy, as only the private key must be kept secret.
Maintenance of the keys becomes easy, as the keys remain constant throughout the communication, depending on the connection.
The two main advantages of asymmetric encryption are that the two parties do not need to have already shared their secret in order to communicate using encryption, and that both authentication and non-repudiation are possible.
Most implementations use asymmetric encryption to encode a symmetric key and transfer it to the other party. They then transmit the actual message using the symmetric key, which is much more efficient in CPU time.
An important advantage of asymmetric ciphers over symmetric ciphers is that no secret channel is necessary for the exchange of the public key; the receiver needs only to be assured of the authenticity of the public key. Asymmetric ciphers also create fewer key-management problems than symmetric ciphers: only 2n keys are needed for n entities to communicate securely with one another.
Hashing is a one-way function, whereas encrypted messages can be recovered if you know the key with which they were encrypted. Hashing is non-reversible and deterministic, which is why it is used to store passwords: when you set the password for, say, your e-mail, the server never stores it; instead (assuming the password is "password"), it stores h("password"), and encryption comes into play only when you later log in.

V. SYSTEM DESIGN

The system architecture is a conceptual model that defines the structure, behavior, and other views of a system. An architecture description is a formal description and representation of a system, organized in a way that supports reasoning about the structures and behaviors of the system. A system architecture can comprise system components that will work together to implement the overall system. There have been efforts to formalize languages to describe system architecture; collectively these are called architecture description languages.

Components used in the system are:

A. HADOOP
Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of commodity hardware nodes, and to handle thousands of terabytes of data.

Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating in case of a node failure.
This approach lowers the risk of catastrophic system failure and unexpected data loss, even if a significant number of nodes become inoperative. Consequently, Hadoop quickly emerged as a foundation for big data processing tasks, such as scientific analytics, business and sales planning, and processing enormous volumes of sensor data, including from internet of things sensors. It was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts, which are also called fragments or blocks, can be run on any node in the cluster.
Organizations can deploy Hadoop components and supporting software packages in their local data center. However, most big data projects depend on short-term use of substantial computing resources. This type of usage is best suited to highly scalable public cloud services, such as Amazon Web Services (AWS), Google Cloud Platform and Microsoft Azure. Public cloud providers often support Hadoop components through basic services, such as AWS Elastic Compute Cloud and Simple Storage Service instances.

B. CLOUD
Cloud storage is a cloud computing model in which data is stored on remote servers accessed from the Internet, or "cloud." It is maintained, operated and managed by a cloud storage service provider on storage servers that are built on virtualization techniques. Cloud storage is also known as utility storage, a term subject to differentiation based on actual implementation and service delivery.
Cloud storage works through data center virtualization, providing end users and applications with a virtual storage architecture that is scalable according to application requirements. In general, cloud storage operates through a Web-based API that is remotely implemented through its interaction with the client application's in-house cloud storage infrastructure for input/output (I/O) and read/write (R/W) operations.
When delivered through a public service provider, cloud storage is known as utility storage. Private cloud storage provides the same scalability, flexibility and storage mechanism with restricted or non-public access.

MySQL: MySQL is an open source relational database management system (RDBMS) based on Structured Query Language (SQL). MySQL runs on virtually all platforms, including Linux, UNIX, and Windows. Although it can be used in a wide range of applications, MySQL is most often associated with web-based applications and online publishing, and is an important component of an open source enterprise stack called LAMP. LAMP is a Web development platform that uses Linux as the operating system, Apache as the Web server, MySQL as the relational database management system and PHP as the object-oriented scripting language.

ECLIPSE: Eclipse is an integrated development environment (IDE) used in computer programming, and is the most widely used Java IDE. It contains a base workspace and an extensible plug-in system for customizing the environment. Eclipse is written mostly in Java and its primary use is for developing Java applications, but it may also be used to develop applications in other programming languages such as C, C++ and JavaScript. It can also be used to develop documents with LaTeX (via a TeXlipse plug-in) and packages for the software Mathematica. Development environments include the Eclipse Java development tools (JDT) for Java and Scala, Eclipse CDT for C/C++, and Eclipse PDT for PHP, among others.
The initial codebase originated from IBM VisualAge. The Eclipse software development kit (SDK), which includes the Java development tools, is meant for Java developers. Users can extend its abilities by installing plug-ins written for the Eclipse Platform, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Since Equinox, plug-ins can be plugged and stopped dynamically and are termed OSGi bundles.
The Eclipse SDK is free and open-source software, released under the terms of the Eclipse Public License, although it is incompatible with the GNU General Public License. It was one of the first IDEs to run under GNU Classpath and it runs without problems under IcedTea.

JDK: The Java Development Kit (JDK) is a software development environment used for developing Java applications and applets. It includes the Java Runtime Environment (JRE), an interpreter/loader (java), a compiler (javac), an archiver (jar), a documentation generator (javadoc) and other tools needed in Java development. Java developers are initially presented with two JDK tools, java and javac. Both are run from the command prompt. Java source files are simple text files saved with an extension of .java. After writing and saving Java source code, the javac compiler is invoked to create .class files. Once the .class files are created, the 'java' command can be used to run the Java program.
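For example, a minimal source file (the file and class name HelloIDL are only illustrative) is compiled and run as the comments show:

// HelloIDL.java -- compile with: javac HelloIDL.java
//                  then run with: java HelloIDL
public class HelloIDL {
  public static void main(String[] args) {
    System.out.println("Hello from the JDK toolchain");
  }
}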
TOMCAT SERVER: Apache Tomcat, often referred to as Tomcat Server, is an open-source Java Servlet Container developed by the Apache Software Foundation (ASF). Tomcat implements several Java EE specifications, including Java Servlet, JavaServer Pages (JSP), Java EL, and WebSocket.
Tomcat is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation, released under the Apache License 2.0, and is open-source software.

Components of TOMCAT SERVER are:
(i) Catalina: Catalina is Tomcat's servlet container.
(ii) Coyote: Coyote is a Connector component for Tomcat that supports the HTTP 1.1 protocol as a web server. This allows Catalina, nominally a Java Servlet or JSP container, to also act as a plain web server that serves local files as HTTP documents.
(iii) Jasper: Jasper is Tomcat's JSP Engine. Jasper parses JSP files to compile them into Java code as servlets. At runtime, Jasper detects changes to JSP files and recompiles them.
(iv) Cluster: This component has been added to manage large applications. It is used for load balancing, which can be achieved through many techniques. Clustering support currently requires JDK version 1.5.
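As a minimal sketch of the kind of component Catalina hosts (the class name and URL pattern are our own illustration, not the paper's actual file service), a trivial servlet looks like:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Catalina instantiates this class and routes matching HTTP requests to it.
@WebServlet("/status")
public class StatusServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    resp.setContentType("text/plain");
    resp.getWriter().println("file service is up");
  }
}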
Fig. 1: Block Diagram

VI. WORKING PROCEDURE

HADOOP INSTALLATION
Steps to install Hadoop (for Windows 7, 32-bit):
1. Open Cygwin.exe to install Hadoop.
2. Click next.
3. The "Install from internet" radio button should be on.
4. Click next.
5. Select any available download site from the list.
6. Click next.
7. Search for openssl; inside openssl (Base, Debug, Devel, Libs, Net, Python), click on skip and next.
8. Search for tcp and follow the above procedure.
9. Search for utils; inside Debug, change diffutils-debuginfo from skip.
10. Click next.
11. Finish. It will take more than half an hour; you will get a Cygwin terminal on the desktop.
12. Paste the following into the environment variables by editing them (C:\cygwin\bin; C:\cygwin\usr\bin;.;).
13. Copy the Java folder and paste it outside Program Files.
14. Download Notepad++.
15. Go to the file Hadoop softwares\Hadoop Software\hadoop-0.18.0\conf\hadoop-env.sh.
16. Open it with Notepad++.
17. Paste the Java path into the export line of the hadoop-env.sh file and change \ to /.
18. Open Cygwin, which is present on the desktop.
19. Paste the following path: cd D:/Hadoop_softwares/Hadoop_Software/hadoop-0.18.0/bin
20. Enter ./hadoop namenode -format
21. Enter ./hadoop namenode
22. Open another Cygwin and type ssh-host-config
23. Answer: no, no, yes, no, no, yes.
24. Go to Control Panel, Administrative Tools, Services, and start the Cygwin service.
25. Go to Cygwin and type ssh-keygen.
26. cd ~/.ssh/
27. cat id_rsa.pub >> authorized_keys
28. ssh localhost
29. Open a browser and go to localhost:50030

FILE UPLOADING MODULE

Description of the MD5 Algorithm
Step 1: Get the message.
Step 2: Convert the message into bits.
Step 3: Append padding bits (pad so that the message bit length is an exact multiple of 512 bits, i.e., sixteen 32-bit words per block).
Step 4: Divide the total bits into blocks of 512 bits each.
Step 5: Initialize the MD buffer. A four-word buffer (A, B, C, D) is used to compute the message digest, 128 bits in total.
Step 6: Apply AND, XOR, OR, NOT operations on A, B, C, and D, taking three inputs and producing one output.
Step 7: Repeat Step 6 until the 128-bit (16-byte) hash is obtained.
Step 8: Stop.
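In Java, this whole procedure is available through the standard java.security.MessageDigest API; a minimal sketch with a sample input of our own:

import java.math.BigInteger;
import java.security.MessageDigest;

public class Md5Demo {
  public static void main(String[] args) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest("gradekey+keyword".getBytes("UTF-8"));
    // The digest is always 128 bits (16 bytes), whatever the input length.
    System.out.println(String.format("%032x", new BigInteger(1, digest)));
  }
}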
SEARCHING AND DOWNLOADING MODULE

Description of the Searching and Downloading Algorithm
Step 1: Start.
Step 2: Enter the keyword for the search.
Step 3: Get the grade key of the searching user.
Step 4: Concatenate the grade key and the search keyword and generate the hash code with the MD5 algorithm.
Step 5: Match the hash code against the database content.
Step 6: Get the respective files with a matching hash code for download.
Step 7: Download the files from the cloud.
Step 8: Decompress the files downloaded from the cloud.
Step 9: Decrypt the file with the RSA public key.
Step 10: Get the original file.
Step 11: Stop.
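A compact sketch of Steps 2 to 5 on the server side; the in-memory index map and the md5Hex helper are hypothetical stand-ins for the actual database lookup:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.List;
import java.util.Map;

public class KeywordSearch {
  // Hypothetical index: MD5(gradeKey + keyword) -> file names, as built at upload time.
  static List<String> lookup(Map<String, List<String>> index,
                             String gradeKey, String keyword) throws Exception {
    return index.get(md5Hex(gradeKey + keyword)); // null when no file matches
  }
  static String md5Hex(String s) throws Exception {
    byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"));
    return String.format("%032x", new BigInteger(1, d));
  }
}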

The user has to select the aggregate key and then input the search keyword. The keyword is converted into a hash code. The aggregate key is decrypted to separate out the hash keys and the public key. Using the hash key and the keyword, hash codes are generated and sent to the server. Based on the hash codes received, the server checks the keyword index and, if any matching files are available, lists all the file names to the user. The user views the shortlisted files from the server, downloads them, and finally decrypts each file with the owner's public key.
VII. RESULTS

Unstructured Big Data is extracted from the dataset and converted to Hadoop format. The resultant data is stored in the cloud and secured by double encryption. The user can retrieve the data from the cloud through a user interface by double decryption. Security for the data in the cloud is provided using the Fully Homomorphic algorithm. As a result of efficient encryption, transmission and storage of sensitive data are achieved. We analyze the existing search algorithms over ciphertext; for the problem that most algorithms will disclose the user's access patterns, we propose a new method of private information retrieval supporting keyword search, which combines Homomorphic encryption and private information retrieval.

CONCLUSION

Big Data consists of huge amounts of data, for which providing security is the main concern. Once the Admin uploads a file, it is sent to Hadoop for converting unstructured data into structured data. The analysis process takes place in Hadoop making use of the MapReduce algorithm. The resultant file is double encrypted and stored in the cloud. When the user requests a particular file, the required file is obtained from the cloud with the help of a Tomcat server, then double decrypted and sent to the user for downloading. This application is mainly used for storing and securing confidential Big Data.

FUTURE ENHANCEMENT

A blockchain algorithm can be used for encryption, for more security.
Queries can be included to make the user's work easier.
Different data formats can be included for storage, such as images, XML, etc.
A real-time case study can be implemented.
The single step of conversion of unstructured data can be split into two steps: converting the XML file to a text file, and then from the text file to a structured format.

REFERENCES

[1] B. Harikrishna, S. Kiran, G. Murali and R. Pradeep Kumar Reddy, "Security Issues In Service Model Of Cloud Computing Environment," Procedia Computer Science 87 (2016), pp. 246-251, ScienceDirect.
[2] B. Harikrishna, N. Anusha, K. Manideep, Madhusudhanraoch, "Quarantine Stabilizing Multi Keyword Rated Discover with Unfamiliar ID Transfer over Encrypted Cloud Warning," IJERCSE, Vol. 2, Issue 2, February 2015.
[3] Alexa Huth and James Cebula, "The Basics of Cloud Computing," United States Computer Emergency Readiness Team, 2011.
[4] Sandipan Basu, "International Data Encryption Algorithm (IDEA): A Typical Illustration," Journal of Global Research in Computer Science, July 2011, ISSN: 2229-371X, Vol. 2, Issue 7.
[5] Rajeev Bedi, Amritpal Singh and Tejinder Singh, "Comparative Analysis of Cryptographic Algorithms," International Journal of Advanced Engineering Technology, July-Sept. 2013, E-ISSN 0976-3945.
[6] Pradeep Mittal and Vinod Kumar, "Comparative Study of Cryptographic Algorithms," International Journal of Computer Science and Network, June 2014, ISSN (Online): 2277-5420, Volume 3, Issue 3.
[7] V. G. Korat, A. P. Deshmukh, K. S. Pamu, "Introduction to Hadoop Distributed File System," Int. J. of Engineering Innovations and Research, 1(2): 230-236, March 2012.
[8] Apache Hadoop. http://hadoop.apache.org/, 2012.
[9] P. K. Mantha, A. Luckow, S. Jha, "Pilot-MapReduce: An Extensible and Flexible MapReduce Implementation for Distributed Data," Proc. of 2012 Int. Conf. on MapReduce and Its Applications, pp. 17-24.
[10] Pradeep Adluru, Srikari Sindhoori Datla, Xiaowen Zhang, "Hug the Elephant: Migrating a Legacy Data Analytics Application to Hadoop Ecosystem."
[11] Thomas C. Bressoud, Qiuyi (Jessica) Tang, "Analysis, Modeling, and Simulation of Hadoop YARN MapReduce."
[12] Cao Nguyen, Jik-Soo Kim and Soonwook Hwang, "KOHA: Building a Kafka-based Distributed Queue System on the fly in a Hadoop cluster."
[13] "Hadoop Security Models - A Study."
[14] "Big Data Emerging Issues: Hadoop Security And Privacy."

[15] "Design And Implementation Of HDFS Data Encryption Scheme Using ARIA Algorithm On Hadoop."
[16] Poonam S. Patil, Rajesh N. Phursule, "Survey Paper on Big Data Processing and Hadoop Components," IJSR, Volume 3, Issue 10, pp. 585-590, October 2010.
[17] Singh Arpita Jitendrakumar, "Security Issues in Big Data: In Context with a Big Data," IJIERE, Volume 2, Issue 3, 2015, pp. 127-130, ISSN: 2394-3343.
[18] Dr. E. Laxmi Lydia, Dr. M. Ben Swarup, "Analysis of Big data through Hadoop Ecosystem Components like Flume, MapReduce, Pig and Hive," IJCSE, Vol. 5, No. 01, Jan 2016, pp. 21-29, ISSN: 2319-7323.
[19] Sanjeev Dhawan, Sanjay Rathee, "Big Data Analytics using Hadoop Components like Pig and Hive," AIJRSTEM, pp. 88-93, 2013, ISSN (Online): 2328-3580.
[20] Apache Flink. [Online] Available: https://en.wikipedia.org/wiki/Apache_Flink [Accessed: October 10, 2016]
[21] A gentle introduction to blockchain technology. [Online] Available: https://bitsonblocks.net/2015/09/09/a-gentleintroduction-to-blockchain-technology
[22] Bitcoin, part one. [Online] Available: http://tech.eu/features/808/bitcoin-part-one
[23] Block hashing algorithm. Available: https://en.bitcoin.it/wiki/Block_hashing_algorithm
