
A Project Synopsis Presentation

“Plagiarism Detection Using n-tuple algorithm For Big Data”

Presented By
Kiran.T (1DS14IS404)
Manjunath.V (1DS14IS406)
Sheetal.S (1DS13IS090)
Yogitha.Rangashree.R (1DS13IS117)

Under The Guidance Of
Mr. Muzameel Ahmed
Assistant Professor

Department of ISE, DSCE


INTRODUCTION

What is plagiarism, and why perform plagiarism detection?

Plagiarism is one of the biggest problems in scientific research and engineering.
Plagiarism is understood as presenting, intentionally or otherwise, someone else’s
words, thoughts, analyses, argumentation, pictures, techniques, computer
programs, etc. as one’s own.
Plagiarism detection is essential to maintain the originality of content in its different
forms. It can be either manual or software-assisted.
Plagiarism detection techniques are applied differently to natural-language and
programming-language text. A similarity score is determined for
each pair of documents that match significantly.



PROPOSED SYSTEM

N-tuple algorithm on a distributed computing platform using Hadoop.

Advantages:

• Performance improvement on Big Data
• Ability to run on a distributed computing platform



ARCHITECTURE DIAGRAM



DATA FLOW DIAGRAM

• User Registration and Authentication
• Other Profile Operations
• File Operations Component: File Upload, File Read, File Delete
• Plagiarism Detection Component: Run the algorithm, Reporting, View Results

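SEQUENCE DIAGRAM

Participants: USER, LOGIN AUTHENTICATION, REPOSITORY ACCESS, PLAGIARISM DETECTION, STORAGE

• USER sends Username & Password to LOGIN AUTHENTICATION
• LOGIN AUTHENTICATION: Verifying credentials, then Login successful or Unsuccessful
• REPOSITORY ACCESS: File read, write, delete
• PLAGIARISM DETECTION: Run Plagiarism, then Plagiarism Results
• STORAGE: Files and report storage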



USE CASE DIAGRAM



MODULES BREAKDOWN

Module 1: Hadoop Hortonworks Data Platform setup and configuration
Module 2: Implementation of user profile operations
Module 3: Implementation of N-Tuple algorithm for plagiarism detection
Module 4: Implementation of File Operations on HDFS
Module 5: Extending N-Tuple algorithm on HDFS
Module 6: Plagiarism Result Reporting



Module 2:

Implementation of User Profile Operations



JSP pages and the Request_type each sends through the deployment descriptor to User Servlet:
• Index.jsp
• register.jsp: Request_type=register
• login.jsp: Request_type=login
• welcome.jsp
• Update_profile.jsp: Request_type=updateprofile
• changepassword.jsp: Request_type=changepassword

User Servlet request types:
• Register User
• Login
• Changepassword
• Updateprofile
• Forgotpassword
• Deleteprofile

UserDAOImpl:
• Register()
• GetUserDetails()
• Changepassword()
• Updateprofile()
• Forgotpassword()
• Deleteprofile()

MySQLUtility:
• GetConnection()
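
The request-type routing in this module can be illustrated with a small plain-Java sketch. It is not the project's servlet code: the class name UserRequestDispatcher, the handle method, the in-memory map standing in for UserDAOImpl and MySQL, and the page returned for each outcome are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the User Servlet: routes on request_type and
// returns the name of the page to show next.
class UserRequestDispatcher {

    // In-memory stand-in for UserDAOImpl + MySQL (assumption for this sketch).
    // A real system would store salted password hashes, never plain text.
    private final Map<String, String> passwords = new HashMap<>();

    String handle(String requestType, String username, String password) {
        if (requestType == null || username == null || password == null) {
            return "Index.jsp";
        }
        switch (requestType) {
            case "register":
                // putIfAbsent returns null only when the user did not exist yet.
                return passwords.putIfAbsent(username, password) == null
                        ? "login.jsp" : "register.jsp";
            case "login":
                return password.equals(passwords.get(username))
                        ? "welcome.jsp" : "login.jsp";
            case "changepassword":
                // replace returns null when the user is unknown.
                return passwords.replace(username, password) != null
                        ? "welcome.jsp" : "changepassword.jsp";
            default:
                return "Index.jsp";
        }
    }

    public static void main(String[] args) {
        UserRequestDispatcher dispatcher = new UserRequestDispatcher();
        System.out.println(dispatcher.handle("register", "alice", "secret"));
        System.out.println(dispatcher.handle("login", "alice", "secret"));
    }
}
```

Keeping the routing in one switch mirrors the single-servlet layout of the slide, where every page posts to the same servlet and only the Request_type value differs.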



Module 3:

Implementation of N-Tuple algorithm for plagiarism detection



function main
    read file1, file2, syns, tuplesize
    if (file1 == null || file2 == null)
        return
    create synonym_map from syns
    file1 ← replace synonyms
    file2 ← replace synonyms
    initiate_tuple_map()
    return checkduplicatetuple()

function initiate_tuple_map
    for i = 0; i <= file2.wordcount() - tuplesize; i++
        tuple ← tuplesize words of file2 starting at word i
        if (tupleMap.containsKey(tuple))
            tupleMap.put(tuple, tupleMap.get(tuple) + 1)
        else
            tupleMap.put(tuple, 1)

function checkduplicatetuple
    matchedtuplecount ← 0
    unmatchedtuplecount ← 0
    for each word in file1 that begins a complete tuple
        construct a sentence of tuplesize words starting from word
        if (tupleMap.containsKey(sentence))
            matchedtuplecount++
        else
            unmatchedtuplecount++
    if (matchedtuplecount + unmatchedtuplecount == 0)
        return 0
    matchingratio ← matchedtuplecount / (matchedtuplecount + unmatchedtuplecount)
    percentage ← matchingratio * 100
    return percentage
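
The pseudocode above can be sketched in runnable Java roughly as follows. This is one possible reading, not the project's actual code: the class and method names, the lowercase and split-on-non-alphanumerics tokenization, and the representation of the synonym list as a word-to-canonical-word map are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NTupleSketch {

    // Lowercase the text, split on runs of non-alphanumeric ASCII characters,
    // and replace each word by its canonical synonym when one is configured.
    static List<String> normalize(String text, Map<String, String> synonyms) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(synonyms.getOrDefault(w, w));
            }
        }
        return words;
    }

    // initiate_tuple_map: count every window of tupleSize consecutive words.
    static Map<String, Integer> buildTupleMap(List<String> words, int tupleSize) {
        Map<String, Integer> tupleMap = new HashMap<>();
        for (int i = 0; i + tupleSize <= words.size(); i++) {
            String tuple = String.join(" ", words.subList(i, i + tupleSize));
            tupleMap.merge(tuple, 1, Integer::sum);
        }
        return tupleMap;
    }

    // checkduplicatetuple: percentage of file1's tuples that also occur in file2.
    static double matchPercentage(String file1, String file2,
                                  Map<String, String> synonyms, int tupleSize) {
        if (file1 == null || file2 == null || tupleSize < 1) {
            return 0.0;
        }
        List<String> words1 = normalize(file1, synonyms);
        Map<String, Integer> tupleMap =
                buildTupleMap(normalize(file2, synonyms), tupleSize);
        int matched = 0;
        int total = 0;
        for (int i = 0; i + tupleSize <= words1.size(); i++) {
            String tuple = String.join(" ", words1.subList(i, i + tupleSize));
            if (tupleMap.containsKey(tuple)) {
                matched++;
            }
            total++;
        }
        // Guard against documents shorter than one tuple.
        return total == 0 ? 0.0 : 100.0 * matched / total;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("fast", "quick");
        System.out.println(matchPercentage(
                "the fast brown fox jumps", "the quick brown fox sleeps", synonyms, 3));
    }
}
```

In the example call, two of the three 3-word tuples of the first text also occur in the second once "fast" is mapped to "quick", so the score is about 66.7 percent. Note that the measure is asymmetric: it reports how much of file1 appears in file2, not the reverse.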



Module 4:

Implementation of File Operations on HDFS

• Client Layer
• Deployment Descriptor
• Service Layer
• HDFS cluster: Name Node, Data Node, Data Node, Data Node



Module 5:

Extending N-Tuple algorithm on HDFS

• HDFS commands
• N-Tuple Algorithm Implementation (Module 3)
• MapReduce components: Mapper, Combiner, Partitioner, Reducer
• HDFS cluster: Name Node, Data Node, Data Node, Data Node
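
How the Mapper, Combiner, Partitioner, and Reducer roles might divide the tuple-counting step can be shown with a plain-Java simulation. It deliberately uses no Hadoop classes, so it is only a sketch of the data flow; the class name TupleCountSimulation, its method names, and the whitespace tokenization are assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the four roles; hypothetical names, no Hadoop API.
class TupleCountSimulation {

    // Mapper: emit (tuple, 1) for every window of tupleSize words in one split.
    static List<Map.Entry<String, Integer>> map(String split, int tupleSize) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String text = split.trim().toLowerCase();
        if (text.isEmpty()) {
            return out;
        }
        String[] words = text.split("\\s+");
        for (int i = 0; i + tupleSize <= words.length; i++) {
            String tuple = String.join(" ", Arrays.copyOfRange(words, i, i + tupleSize));
            out.add(new AbstractMap.SimpleEntry<>(tuple, 1));
        }
        return out;
    }

    // Combiner and Reducer: both sum the counts per tuple. The combiner runs on
    // one mapper's local output, the reducer on everything routed to it.
    static Map<String, Integer> sum(Iterable<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Partitioner: equal tuples must always be routed to the same reducer.
    static int partition(String tuple, int numReducers) {
        return (tuple.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static Map<String, Integer> run(List<String> splits, int tupleSize, int numReducers) {
        List<List<Map.Entry<String, Integer>>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            buckets.add(new ArrayList<>());
        }
        for (String split : splits) {
            // Map, then combine locally before anything is "sent" to a reducer.
            Map<String, Integer> combined = sum(map(split, tupleSize));
            for (Map.Entry<String, Integer> e : combined.entrySet()) {
                buckets.get(partition(e.getKey(), numReducers)).add(e);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> bucket : buckets) {
            result.putAll(sum(bucket)); // one reducer per bucket
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b c a b", "a b x"), 2, 2));
    }
}
```

One limitation of this sketch is that tuples straddling the boundary between two splits are never emitted; a real job would need splits that overlap by tuplesize - 1 words, or whole documents as map inputs.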

Module 6:

Plagiarism Result Reporting

• Client Layer
• Deployment Descriptor
• Service Layer
• HDFS cluster: Name Node, Data Node, Data Node, Data Node

Thank you.

Any Queries?
