
FUZZY KEYWORD SEARCH OVER ENCRYPTED DATA IN CLOUD COMPUTING

1. INTRODUCTION

1.1 Abstract:

Nowadays, as cloud computing becomes prevalent, more and more sensitive
information is being centralized into the cloud. Although traditional
searchable encryption schemes allow a user to securely search over encrypted data
through keywords and selectively retrieve files of interest, these techniques
support only exact keyword search. This project formalizes and solves the
problem of effective fuzzy keyword search over encrypted cloud data while
maintaining keyword privacy. The solution exploits edit distance to quantify
keyword similarity and relies on two advanced techniques for constructing fuzzy
keyword sets, which achieve optimized storage and representation overheads. It
further proposes a symbol-based trie-traverse searching scheme, in which a
multi-way tree structure is built up using symbols transformed from the
resulting fuzzy keyword sets. Rigorous security analysis shows that the proposed
solution is secure and privacy-preserving while correctly realizing the goal of
fuzzy keyword search. Extensive experimental results demonstrate the efficiency
of the proposed solution.
1.2 Problem Selection:

Consider a cloud data system consisting of a cloud server, a data owner and a
data user. A user requests data files from a collection in which each file is
indexed by a file ID and linked with a set of keywords. The fuzzy keyword
search returns results according to the following rules:

• If the input exactly matches a pre-set keyword, the server should return
the files containing that keyword.
• If there is no exact match, the server should return the closest possible
results.

We assume a semi-trusted server that stores only the encrypted version of the
files. Even so, the cloud server may try to derive other sensitive information
from users' search queries. For that reason, the search (implemented in C#)
should be conducted in a secure manner that allows data files to be securely
retrieved while revealing as little information as possible to the cloud server.
It is required that nothing be leaked from the remotely stored files while the
search queries are being processed.

This project provides a solution that ensures an effective and efficient yet
privacy-preserving fuzzy keyword search service over encrypted cloud data.

1.3 Methodology:

1.3.1 Algorithm / Technique used: String Matching Algorithm

Algorithm Description:

Approximate string matching algorithms can be classified into two categories:
on-line and off-line. The on-line techniques, which perform the search without
an index, are unacceptable because of their low search efficiency, while the
off-line approach, which utilizes indexing techniques, is dramatically faster. A
variety of indexing algorithms, such as suffix trees, metric trees and q-gram
methods, have been presented. At first glance, it seems possible to apply these
string matching algorithms directly to searchable encryption by computing the
trapdoors character by character within an alphabet. However, this trivial
construction suffers from dictionary and statistics attacks and fails to achieve
search privacy.

An instance M of the data type string-matching is an object maintaining a
pattern and a string. It provides a collection of different algorithms for
solving the exact string matching problem. Each function computes a list of all
starting positions of occurrences of the pattern in the string.
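
As a minimal illustration of the interface just described, the following Python sketch returns every starting position of a pattern in a string. The function name find_occurrences is hypothetical, and the sketch relies on Python's built-in str.find rather than on any particular string matching algorithm. The project itself targets C#, so this is illustrative only.

```python
from typing import List


def find_occurrences(pattern: str, text: str) -> List[int]:
    """Return every index at which pattern starts in text (overlaps included)."""
    if not pattern:
        return []  # an empty pattern is treated as having no occurrences
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return positions
```

For instance, find_occurrences("ana", "banana") reports the two overlapping occurrences at positions 1 and 3.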

1.3.2 System Architecture:

1.3.3 Existing System:

The straightforward approach enumerates every fuzzy variant of each keyword. It
provides fuzzy keyword search over the encrypted files while achieving search
privacy using the technique of secure trapdoors. However, this approach has
serious efficiency disadvantages. The simple enumeration method for constructing
fuzzy keyword sets introduces large storage complexity, which greatly affects
usability.

For example, the following lists the variants obtained by a substitution
operation on the first character of the keyword CASTLE:

{AASTLE, BASTLE, DASTLE, ..., YASTLE, ZASTLE}.
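
To make the storage cost of plain enumeration concrete, the following Python sketch lists every string within edit distance 1 of a keyword. It assumes an alphabet of the 26 uppercase letters; the function name enumerate_variants is hypothetical and not taken from the project code.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: keywords use A-Z only


def enumerate_variants(word: str) -> set:
    """All strings within edit distance 1 of word, including word itself."""
    variants = {word}
    for i in range(len(word) + 1):
        for ch in ALPHABET:
            variants.add(word[:i] + ch + word[i:])      # insertion at position i
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])           # deletion of character i
        for ch in ALPHABET:
            variants.add(word[:i] + ch + word[i + 1:])  # substitution at position i
    return variants
```

Under these assumptions, enumerate_variants("CASTLE") contains 333 distinct strings for a single six-letter keyword, each of which would need its own index entry.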


1.3.4 Proposed System:

Main Modules:

1. Wildcard-Based Technique
2. Gram-Based Technique
3. Symbol-Based Trie-Traverse Search Scheme

1. Wildcard-Based Technique:

In the straightforward approach above, all the variants of a keyword have to be
listed even when the edit operations are performed at the same position. Based
on this observation, we propose to use a wildcard to denote all edit operations
at the same position, so that the wildcard-based fuzzy set covers the same edit
distance with far fewer entries. For example, for the keyword CASTLE with the
pre-set edit distance 1, its wildcard-based fuzzy keyword set can be
constructed as

S_{CASTLE,1} = {CASTLE, *CASTLE, *ASTLE, C*ASTLE, C*STLE, ...,
CASTL*E, CASTL*, CASTLE*}.
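
The construction of this set for edit distance 1 can be sketched in Python as follows. The function name wildcard_fuzzy_set is hypothetical, and the sketch handles only distance 1; it is an illustration of the idea, not the project's implementation.

```python
from typing import List


def wildcard_fuzzy_set(word: str) -> List[str]:
    """Wildcard-based fuzzy set for edit distance 1; '*' marks the edited position."""
    result = [word]
    for i in range(len(word) + 1):
        # A wildcard placed between characters stands for any insertion there.
        result.append(word[:i] + "*" + word[i:])
        if i < len(word):
            # A wildcard replacing character i stands for a substitution
            # or deletion at that position.
            result.append(word[:i] + "*" + word[i + 1:])
    return result
```

For a keyword of length 6 such as CASTLE, the sketch produces 14 entries (the keyword itself, 7 insertion patterns and 6 substitution/deletion patterns), in the same order as the set listed above.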

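Edit Distance:

The edit distance between two words is the minimum number of primitive
operations needed to transform one word into the other. The three primitive
operations are:

a) Substitution: changing one character to another in a word;

b) Deletion: deleting one character from a word;

c) Insertion: inserting a single character into a word.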
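
The three operations above define the classic edit (Levenshtein) distance. The Python sketch below computes it with the standard dynamic-programming recurrence and then applies the search rule from Section 1.2 to plaintext keywords: return the exact match if there is one, otherwise the keywords within distance d. The names edit_distance and fuzzy_lookup are hypothetical, and the lookup works on unencrypted keywords purely to illustrate the matching rule.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def fuzzy_lookup(query: str, keywords: List[str], d: int = 1) -> List[str]:
    """Exact match if present, otherwise all keywords within edit distance d."""
    if query in keywords:
        return [query]
    return [w for w in keywords if edit_distance(query, w) <= d]
```

With d = 1, the misspelled query CASLE still finds the keyword CASTLE, while CASTEL (two edits away) is found only when d is raised to 2.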

Fig.2 Wild card Based technique


2. Gram-Based Technique:

Another efficient technique for constructing fuzzy sets is based on grams. A
gram of a string is a substring that can be used as a signature for efficient
approximate search. While grams have been widely used for constructing inverted
lists for approximate string search, we use grams for matching purposes. We
propose to exploit the fact that any primitive edit operation affects at most
one specific character of the keyword, leaving all the remaining characters
untouched. In other words, the relative order of the remaining characters after
the primitive operation is always the same as before the operation.

For example, the gram-based fuzzy set S_{CASTLE,1} for the keyword CASTLE can
be constructed as:

{CASTLE, CSTLE, CATLE, CASLE, CASTE, CASTL, ASTLE}.
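
One way to read this construction is that the set holds the keyword plus every string obtained by deleting up to d characters while keeping the rest in order. The Python sketch below follows that reading; the function name gram_fuzzy_set is hypothetical, and treating larger d as "delete at most d characters" is an assumption that goes beyond the single example given here.

```python
from itertools import combinations
from typing import List


def gram_fuzzy_set(word: str, d: int = 1) -> List[str]:
    """Strings obtained by deleting at most d characters from word, order preserved."""
    grams = []
    for k in range(d + 1):
        # Choose which k positions to drop; the remaining characters keep their order.
        for drop in combinations(range(len(word)), k):
            gram = "".join(ch for i, ch in enumerate(word) if i not in drop)
            if gram not in grams:  # repeated letters can yield duplicates
                grams.append(gram)
    return grams
```

For CASTLE with d = 1 this yields the 7 entries shown above, compared with 14 entries for the wildcard-based set of the same keyword.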

Fig.3 Gram Based Technique


3. Symbol-Based Trie-Traverse Search Scheme:

To enhance search efficiency, a symbol-based trie-traverse search scheme is
proposed, in which a multi-way tree is constructed for storing the fuzzy
keyword set over a finite symbol set. The key idea behind this construction is
that all trapdoors sharing a common prefix may share common nodes. The root is
associated with an empty set, and the symbols in a trapdoor can be recovered by
a search from the root to the leaf that ends the trapdoor. All fuzzy words in
the trie can be found by a depth-first search.
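
The trie described above can be sketched in Python as follows. Several details are assumptions made only for illustration: the trapdoor is computed as an HMAC-SHA256 of the fuzzy word, each hexadecimal digit of the digest is treated as one symbol, and file IDs are stored in the clear at the node where a trapdoor ends. The names trapdoor, Node and TrapdoorTrie are hypothetical.

```python
import hashlib
import hmac


def trapdoor(key: bytes, word: str) -> str:
    """Keyed hash of a word, read as a sequence of hexadecimal symbols (assumption)."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


class Node:
    def __init__(self):
        self.children = {}     # symbol -> Node
        self.file_ids = set()  # non-empty only where a trapdoor ends


class TrapdoorTrie:
    def __init__(self):
        self.root = Node()  # the root carries no symbol

    def insert(self, symbols: str, file_ids: set) -> None:
        node = self.root
        for s in symbols:
            # Trapdoors with a common prefix reuse the same nodes.
            node = node.children.setdefault(s, Node())
        node.file_ids.update(file_ids)

    def search(self, symbols: str) -> set:
        node = self.root
        for s in symbols:
            node = node.children.get(s)
            if node is None:
                return set()  # the path breaks off: no such trapdoor is stored
        return set(node.file_ids)

    def all_trapdoors(self) -> list:
        """Depth-first traversal that recovers every stored trapdoor."""
        found, stack = [], [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.file_ids:
                found.append(prefix)
            for s, child in node.children.items():
                stack.append((child, prefix + s))
        return found
```

In use, the data owner would insert the trapdoor of every word in a keyword's fuzzy set together with the matching file IDs, and the server would walk the trie with each trapdoor received from the user. Because the server sees only keyed-hash symbols, it follows paths without learning the underlying words.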

The scheme also extends naturally from the single-user setting to a multi-user
setting, in which a data owner stores a file collection on the cloud server and
allows an arbitrary group of users to search over that file collection.

Fig.4 Symbol Based Technique


1.4 Objective of Project:

The objectives are as follows:

• To explore new mechanisms for constructing fuzzy keyword sets optimized
for cloud storage.
• To design a search scheme based on the constructed fuzzy keyword sets.
• To validate the security of the proposed information retrieval scheme.

1.5 Scope:

As future scope for this project, I intend to index the mapped words and fuzzy
sets so as to increase the functionality of the search procedure. Encryption of
one or more additional file formats can be supported. Decryption of image files
and media files can also be carried out while maintaining system integrity and
security.
1.6 Process Overview (DFD model Level-0 and Level-1)

Fig.5 DFD Level-0

Fig.6 Level-1 DFD


1.7 System Specification & Requirement:

Hardware Requirements:

• System : Pentium IV 2.4 GHz
• Hard Disk : 40 GB
• Floppy Drive : 1.44 MB
• Monitor : 15-inch VGA Colour
• Mouse : Logitech
• RAM : 512 MB

Software Requirements:

• Operating System : Windows XP
• Coding Language : .NET with C#
• Database : SQL Server 2005

1.8 Role in Project:

My role is to incorporate an advanced encryption scheme that provides more
secure and efficient keyword searching at the user end while also securing the
indexes associated with the keywords, preventing their duplication and any
security leak. Secondly, it is to give the user a narrower and more precise
keyword search in order to save time.
1.9 Testing Technology and security mechanism:

1.9.1 Testing Mechanism:

The purpose of testing is to discover errors. Testing is the process of trying
to discover every conceivable fault or weakness in a work product. It provides
a way to check the functionality of components, sub-assemblies, assemblies
and/or a finished product. It is the process of exercising software with the
intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various
types of tests, each addressing a specific testing requirement.

Basics of software testing

There are two basic approaches to software testing: black box testing and white
box testing.

Black Box Testing

Black box testing is a testing technique that ignores the internal mechanism of
the system and focuses on the output generated in response to any input and
execution of the system. It is also called functional testing.

White Box Testing

White box testing is a testing technique that takes into account the internal
mechanism of a system. It is also called structural testing or glass box
testing.

Black box testing is often used for validation, and white box testing is often
used for verification.

Types of testing

There are many types of testing, such as:

• Unit Testing
• Integration Testing
• Functional Testing
• System Testing
• Stress Testing
• Performance Testing
• Usability Testing
• Acceptance Testing
• Regression Testing
• Beta Testing

1.9.2 Security Mechanism:

If the data files are stored on a less secure server, confidential data may
leak through the user's requests over the cloud. To eliminate this risk, the
search is carried out in a secure manner that does not expose sensitive
information even though the cloud server receives information derived from the
keyword entered by the user.

The security measures pursue three goals: 1) the storage of fuzzy keyword sets
should be efficient; 2) the search over fuzzy keywords should be efficient; 3)
the security of the scheme should be maintained.

1.10 Contribution of the Project in Real-life Application:

Fuzzy keyword search greatly enhances system usability and efficiency by
returning the matching files when the user's search input exactly matches a
predefined keyword, or the closest possible matching files based on keyword
similarity semantics when the exact match fails.

This technique eliminates the need to enumerate all the fuzzy keywords, and the
resulting size of the fuzzy keyword sets is significantly reduced.
1.11 Limitation of project:

This scheme overcomes the one-time-only search limitation of previous schemes.
The proposed system has the following disadvantages. First, keyword privacy is
compromised once a keyword has been searched. As a result, the index must be
rebuilt for the keyword once it has been searched, and such a solution is
counterproductive because of the high overhead incurred. Secondly, the existing
inverted-index-based searchable schemes do not support conjunctive
multi-keyword search, which is the most common form of query nowadays.

2. Conclusion:

Data security is very important in cloud computing. Encrypting data for
security makes effective data utilization a challenging task. To achieve
efficient data retrieval from an encrypted data collection, I am using the
wildcard-based, gram-based and symbol-based trie techniques, with some
improvements, and will present a system that efficiently retrieves files
containing information related to a specified keyword, in rank order, from an
encrypted file collection, i.e. the topmost files contain information more
relevant to the keyword than the other files. The result analysis will show
that the developed application supports secure and efficient data retrieval
while maintaining data security.
3. Reference & Bibliography

1. User Interfaces in C#: Windows Forms and Custom Controls by Matthew
MacDonald.
2. Applied Microsoft® .NET Framework Programming (Pro-Developer) by
Jeffrey Richter.
3. Practical .Net2 and C#2: Harness the Platform, the Language, and the
Framework by Patrick Smacchia.
4. Data Communications and Networking, by Behrouz A Forouzan.
5. Computer Networking: A Top-Down Approach, by James F. Kurose.
6. Operating System Concepts, by Abraham Silberschatz.
7. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski,
G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the
clouds: A Berkeley view of cloud computing,” University of California,
Berkeley, Tech. Rep. UCB-EECS-2009-28, Feb 2009.
8. Amazon Web Services (AWS), Online at http://aws.amazon.com.
9. Google App Engine, Online at http://code.google.com/appengine/.
10. Microsoft Azure, http://www.microsoft.com/azure/.
11. Y.-C. Chang and M. Mitzenmacher, “Privacy preserving keyword
searches on remote encrypted data,” in Proc. of ACNS’05, 2005.

Websites Referred

http://www.networkcomputing.com/

http://www.ieee.org

http://www.almaden.ibm.com/software/quest/Resources/

http://www.computer.org/publications/dlib

http://www.ceur-ws.org/Vol-90/
