
CHAPTER 1

INTRODUCTION

1.1 General
Programming assignments are an intrinsic part of many courses. Such assignments
require students to produce solutions in different languages, including Java. Staff
members need to verify the assignments submitted by the students. Evaluating these
assignments is time-consuming and needs human intervention at every level of
correction. Moreover, students sometimes submit a copy of another student's work.
Locating similarities in large sets of submissions is an arduous task.
A solution to this problem is an automated plagiarism detection system that narrows
the set of all submissions down to a few suspicious files to be considered carefully.

1.2 Motivation
Most available plagiarism detection software is built to detect content copied from
the web. Aside from stand-alone programs, there also exist a number of online
applications that can be used to aid detection. JPlag is one such system; it requires
users to submit their files online and allows them to process a great deal of data.
Each of these applications relies on a single algorithm, which leaves weaknesses in
detecting plagiarism.
This project delivers a tool for detecting plagiarism in student software
submissions. Four algorithms have been used in the design of this tool, reducing
its vulnerability to undetected plagiarism.

1.3 Aim and Objective of the Project


The aim of the project is to automatically detect plagiarism in student software
submissions.
The objective of the project is to develop an application that evaluates the
submissions of a group of students and displays the similarity among them.

1.4 Requirements

1.4.1 Hardware Requirements

The minimum requirements for this application are given below:

800 MHz Pentium III, AMD Athlon or equivalent
128 MB RAM

1.4.2 Software Requirements

Java Development Kit 1.6
Windows 98/XP/Vista

CHAPTER 2

RELATED WORK
2.1 Literature Survey
Many tools are already available for plagiarism detection, implemented using
different algorithms. The literature summarises several of these algorithms,
describes their common features, and discusses the ethical and administrative
issues involving detected plagiarism.

Programs that automatically grade student programs have been common for more
than a decade. A newer development is the automatic detection of plagiarism in
student programs. A number of algorithms to detect plagiarism can be found in the
technical literature. One simple algorithm proceeds as follows:
1) Remove all comments.
2) Ignore all blanks and extra lines, except when needed as delimiters.
3) Perform a character string comparison between the two files using the UNIX
utilities diff, grep, and wc.
4) Maintain a count of the percentage of characters which are the same. This measure
is called character correlation.
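
The four steps above can be sketched in Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, comments are stripped with simple regular expressions rather than a real parser, and the comparison is done position by position instead of with diff, grep and wc.

```java
class CharacterCorrelation {

    // Steps 1 and 2: drop comments, then collapse whitespace runs to one space.
    // Naive on purpose: comment markers inside string literals are not handled.
    static String normalize(String source) {
        String noBlockComments = source.replaceAll("(?s)/\\*.*?\\*/", " ");
        String noLineComments = noBlockComments.replaceAll("//[^\\n]*", " ");
        return noLineComments.replaceAll("\\s+", " ").trim();
    }

    // Steps 3 and 4: percentage of positions holding the same character,
    // measured against the longer of the two normalized texts.
    static double correlation(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longer = Math.max(x.length(), y.length());
        if (longer == 0) {
            return 100.0; // two empty texts are trivially identical
        }
        int shorter = Math.min(x.length(), y.length());
        int same = 0;
        for (int i = 0; i < shorter; i++) {
            if (x.charAt(i) == y.charAt(i)) {
                same++;
            }
        }
        return 100.0 * same / longer;
    }
}
```

Two files that differ only in comments and spacing score 100 with this sketch, while a single inserted character shifts every later position and lowers the score sharply, which is one reason position-based measures are easy to defeat.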

2.2 Proposed System


In view of the above, Autoplag is proposed. Autoplag takes the program code
submitted by different students and compares each submission with all the others.
It detects transformations of, and similarities between, one submission and another
using several algorithms. Autoplag will have a graphical user interface (GUI) in
which similar submissions are displayed. Autoplag can be implemented in the
object-oriented programming language Java.

CHAPTER 3

DESIGN AND IMPLEMENTATION

3.1 Architecture:
The following four sections outline the four key modules required to build the
fully working system: the processor, the algorithms, the graphical user interface
and the visualisations. Each module contains a large number of classes that build
upon one another and follow efficient coding practices. The modules are loosely
coupled, so any changes required in the communication between them were relatively
simple to implement. Figure 1 outlines the architecture of these key components
and their relations.

3.2 Detailed Design:

3.2.1 Processor:

The processor is contained in a shared package with the algorithm classes and consists
of Java class files. Its primary function is to take a set of parameters when a
comparison request is initialised and to process all submission source files efficiently.
It is required to act as a central hub for the breakdown and comparison of files. To
achieve this, the processor must constantly communicate with the algorithms to
ensure all result sets are built and stored correctly. The processor classes also have to
constantly communicate with the GUI to ensure that the user is being updated on
current progress.

3.2.1.1 The processor stages

There are three key stages involved in the operation of the processor classes. The
first stage is to pre-process all the submitted source files into a format the
application can read. To achieve this, all submitted source files are read into the
pre-comparator object and modified appropriately. Once the pre-comparator has
finished running against an individual submission directory, a unique object is
created for it. Following completion of this stage, all the individual objects are
stored collectively.

Stage two involves iterating over the object collection and passing the objects in
pairs to the root algorithm class. The final stage involves the storage of all
collected data for subsequent access by the GUI. This stage also involves the
generation of groups.
Please note that the above three stages are key to any scan run by the user on a
student data set, but the processor is not limited to them. Classes such as load
and save and the system log are also available via the processor.

3.2.1.2. The pre-comparator


The pre-comparator has one purpose: to prepare source files for comparison by the
algorithm classes. To do this, a number of methods carry out a series of steps:
removing whitespace, converting all code to lowercase, removing any specified
skeleton code, replacing identifiers, removing output and removing curly brackets.
These methods undo common changes made by possible plagiarists to disguise another
solution as their own. To apply them, the pre-comparator iterates over the
student's submission directory. It collects all .java submission files and stores
their contents in memory, which ensures that the original file system is left
intact. Only once the file collection is complete are the files checked to see
whether they are allowed to be used. A number of options let the user limit
processing to specific filenames or allow all files to be processed. This includes
allowing or disallowing files linked from other documents via import, extends,
implements or other means.
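
As a rough illustration of three of the steps listed above (lowercasing, removing curly brackets and removing whitespace), the following Java sketch normalizes lines held in memory. The class and method names are hypothetical, and skeleton-code removal, identifier replacement and output removal are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PreComparatorSketch {

    // Lowercase the line, drop curly brackets, then drop all whitespace.
    static String normalizeLine(String line) {
        return line.toLowerCase(Locale.ROOT)
                   .replaceAll("[{}]", "")
                   .replaceAll("\\s+", "");
    }

    // Normalize every line and keep only those that still have content.
    // The input list is not modified, mirroring the idea that the
    // original files are left intact.
    static List<String> normalize(List<String> lines) {
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            String cleaned = normalizeLine(line);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

For example, the line `  Int Count = 0; {` is reduced to `intcount=0;`, so differences in case, spacing and brace placement no longer affect a later comparison.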

Once a list of files is compiled, it is a case of reading in these documents and
performing the specified pre-comparator methods on all lines. Finally, this data is
stored in a unique object called PlagData, which stores information about the user
and their submission(s), including the username, the pathname, a status flag, the
list of files available for comparison (including their start and finish line
numbers) and an array containing the data required for comparisons.

It should be noted that the status flag is required to define the type of object.
The flag is one of four characters. An ‘S’ indicates a current student submission,
a ‘P’ indicates a previous submission, an ‘I’ states that the file is being used
for an individual comparison and that results should not be passed to the results
class, and an ‘N’ indicates that the student did not submit any files and that
comparisons should not take place.
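
The PlagData fields and the four status flags described above might be modelled as follows. This is a minimal sketch, not the project's actual class: the names `PlagDataSketch`, `Status`, `canBeCompared` and `storesResults` are assumptions made for illustration, and per-file line ranges are omitted.

```java
import java.util.ArrayList;
import java.util.List;

class PlagDataSketch {

    // One constant per flag character described in the text.
    enum Status {
        CURRENT('S'), PREVIOUS('P'), INDIVIDUAL('I'), NOT_SUBMITTED('N');

        final char code;

        Status(char code) {
            this.code = code;
        }

        static Status fromCode(char code) {
            for (Status s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown status flag: " + code);
        }

        // 'N' means nothing was submitted, so no comparison takes place.
        boolean canBeCompared() {
            return this != NOT_SUBMITTED;
        }

        // 'I' results are not passed on to the results class.
        boolean storesResults() {
            return this != INDIVIDUAL;
        }
    }

    final String username;
    final String pathname;
    final Status status;
    final List<String> files = new ArrayList<String>(); // files available for comparison
    final List<String> lines = new ArrayList<String>(); // pre-processed data to compare

    PlagDataSketch(String username, String pathname, Status status) {
        this.username = username;
        this.pathname = pathname;
        this.status = status;
    }
}
```

Using an enum rather than a raw character means an unknown flag is rejected when the object is built instead of surfacing later during a scan.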

The most complex of the pre-comparator options was the replace identifiers method.
Replacing identifiers is a common technique for detecting the most basic form of
plagiarism, in which students change variable names within their source code. To
ensure replacement is accurate, the method is passed a list of keywords to keep in
its strings. All lines are tokenized and each token is compared to the given list.
If no match is found, the token is recognised as an identifier and removed.
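
The token check described above can be sketched as follows. This is an illustration only: the class name, the method name and the regular expression used to recognise word tokens are assumptions, and the keyword set in the usage note is a small hypothetical sample rather than the project's list.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceIdentifiersSketch {

    // A word token: a letter, underscore or dollar sign followed by word characters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // Keep tokens found in the keyword set; remove every other word token.
    static String stripIdentifiers(String line, Set<String> keywords) {
        Matcher m = WORD.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String replacement = keywords.contains(token) ? token : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the keywords `int` and `return`, both `int total = count + 1;` and `int sum = n + 1;` reduce to the same string, so renaming variables no longer hides the copy.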

Although the pre-comparator appears simple, its replacement methods are not; they
required a high degree of planning.

3.2.1.3. The comparator.
The comparator is a class key to the success of the project. Its sole purpose is
simple yet vital: to iterate over all the PlagData objects in pairs. If one of the
PlagData objects in a pair carries the flag ‘N’, the comparator informs the results
class that no submission was entered; otherwise, it passes the two current PlagData
objects to the root algorithm class for comparison.

The results from the algorithm comparisons are passed back through the comparator
to the results class in the form of an ArrayList for recording. Following each cycle of
recording information, the comparator updates the status panel and the progress bar in
the GUI. This in turn keeps the user updated on the progress of the current scan.

3.2.1.4. The results.


The results class stores all the information produced throughout a scan. Via the
given accessor methods it is possible to access the four key information
components: the three-dimensional array containing all results, the ArrayList of
PlagData objects, the groups ArrayList and an ArrayList containing the stats for
display in the GUI stats panel.

The x and y axes of the three-dimensional array each point to a specific user. The
z axis denotes which of the three algorithms is represented at an (x,y) coordinate.
The array is sized to the number of PlagData objects upon completion of the
pre-comparator process and construction of the results class. All cells in the 3D
array are set to “0” at creation and populated following successful completion of a
comparison by the algorithm class. The comparator returns an ArrayList following a
successful comparison, and the results class interprets this by taking the first
element and putting it into the first position along the z axis for that (x,y)
pair. It then repeats the same process for the second and third elements, putting
them in their corresponding array positions.
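
The layout of the three-dimensional array can be illustrated with a short sketch. The class and method names are hypothetical, and integer percentages are assumed for the cell values; the source does not state the element type.

```java
import java.util.List;

class ResultsSketch {

    // scores[x][y][z]: submission x compared with submission y by algorithm z.
    private final int[][][] scores;

    ResultsSketch(int submissions, int algorithms) {
        // Java initialises every int cell to 0, matching the "set to 0 at creation" rule.
        scores = new int[submissions][submissions][algorithms];
    }

    // Store one comparison result: element z of the list goes to position z.
    void record(int x, int y, List<Integer> perAlgorithm) {
        if (perAlgorithm.size() != scores[x][y].length) {
            throw new IllegalArgumentException("expected one score per algorithm");
        }
        for (int z = 0; z < perAlgorithm.size(); z++) {
            scores[x][y][z] = perAlgorithm.get(z);
        }
    }

    int get(int x, int y, int z) {
        return scores[x][y][z];
    }

    // Extract the 2D table of one algorithm, e.g. to feed a single chart.
    int[][] sliceForAlgorithm(int z) {
        int n = scores.length;
        int[][] slice = new int[n][n];
        for (int x = 0; x < n; x++) {
            for (int y = 0; y < n; y++) {
                slice[x][y] = scores[x][y][z];
            }
        }
        return slice;
    }
}
```

Rejecting a list of the wrong length is one simple way to approximate the strict interface mentioned for the results class, so that only complete result sets are written.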

The results class was created to act as a storage medium that can be accessed and
written to from any class throughout the project. It follows a strict interface so
that only valid data can be written at any time. It is accessed from many classes,
including many GUI elements (such as the Graphs and the Stats tabs), the load and
save features (contained within the processor environment) and the comparator.
3.2.2 Graphical User Interface

The GUI comprises five key components, created over six classes that combine to
create the finished layout. The GUI has been designed in this way not only to break
up the overall complexity but also to allow the various parts to be coded
individually (often by different developers). In addition, this allowed components
to be tested separately (unit testing) before they were fully integrated within
the GUI and tested using integration testing methods. The main frame itself holds
together the entire structure, and components are added to this frame separately.
See Figure 2 for the basic layout. The separate components labeled in Figure 2 are
all created within separate classes. The menu bar and toolbar were designed this
way to allow interaction with other GUI components. The status panel was likewise
designed as a stand-alone component to allow interaction from other classes, such
as the processor and the algorithms. When the program loads, the only tab presented
to the user is the main options tab. Once a user has selected their chosen options
and run a scan, additional tabs are added to the tabbed pane.

3.2.2.1. Main Options Tab.


The main tab is the initial tab the user sees when the program is loaded. No other
tabs are viewable until a scan has been run and completed successfully. Because of
the large number of components on this tab, it is one of the larger classes. Under
the GUI design plan, each tab is implemented in a separate class that contains the
tab's structure. This was mainly to avoid confusion and keep the GUI as simple as
possible. The code for the main tab is rather large because not only are the
options selected in this tab, but the scan itself is also run from it. This means
that the main tab class calls the processor, which in turn communicates with the
GUI to provide progress feedback to the user. When a scan is run from this tab, a
new thread is created so that the GUI and the algorithm components can operate
simultaneously. This also allows the user to dispose of the thread if they wish to
cancel the current scan. Stopping the thread cuts off the algorithm side while
leaving the GUI responsive.
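
A background scan thread with a cancel option could look roughly like this. The sketch is an assumption about how such a design might be written, not the project's code: the class name is hypothetical, cancellation uses thread interruption, and each comparison is replaced by a counter increment.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ScanThreadSketch {

    // Number of completed comparisons; a progress bar could poll this value.
    final AtomicInteger completed = new AtomicInteger();
    private Thread worker;

    // Start the scan on its own thread so the caller (the GUI) is not blocked.
    void start(final int totalPairs) {
        worker = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int pair = 0; pair < totalPairs; pair++) {
                    if (Thread.currentThread().isInterrupted()) {
                        return; // the user cancelled the scan
                    }
                    completed.incrementAndGet(); // placeholder for one comparison
                }
            }
        });
        worker.start();
    }

    // Ask the worker to stop; it exits at its next interruption check.
    void cancel() {
        if (worker != null) {
            worker.interrupt();
        }
    }

    // Block until the worker ends; returns false if the wait itself was interrupted.
    boolean await() {
        try {
            worker.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Cooperative interruption is used here instead of forcibly stopping the thread, so the worker always ends between two comparisons and never leaves a half-written result.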

3.2.2.2. Advanced Options.


The Advanced Options menu lets the user specify keywords that should be kept when
the ‘Replace Identifiers’ method is performed in the processor environment. This
feature enables the user to add keywords that may be generic and unchanged
throughout different student submissions. This is of benefit to lecturers
performing a scan on an assignment: the lecturer can insert keywords which are
unique to the assessment and which all student submissions should contain, for
example unique API library functions. This stops the system from identifying these
keywords as variable identifiers and removing them.

3.2.2.3. Drop Target Panes.
The drop target pane was designed in a very methodical way. The design is both
simple and robust, allowing other classes to call the drop target pane and create
more than one instance of the class. It was designed in this way to allow more than
one drop target pane in the review panel. When a file is dragged into either of the
drop panes, methods are called within those separate instances, which keeps the
drag and drop facility robust and easy to use.

3.2.2.4. Groups
The Group class was implemented in much the same way as the drop target pane, in
that it allows more than one instance of itself to be created and used within
different parts of the GUI. The adaptability of such components allows for easy
integration into the GUI and reduces replicated code. For example, both the review
tab and the stats tab use an instance of the Group class. The groups are built
around a tree structure and are placed within a scroll pane for ease of use and
hierarchical navigation.

3.2.3 Visualizations
All visualization classes are stored within the GUI package. The classes that
complement visualization include the following:

- The RawDataTab is used to show the results from each of the algorithms.
- The ReviewTab is used to show side by side comparison of the transformed files
with line highlighting.

- The GraphTab converts the raw data into several graphs to show the user any
trends identified. The visualization also uses a third-party package, the free
charting library JFreeChart, which is used solely by the GraphTab.

3.2.3.1 Graphs.
Graphs are used in this application to display trends in the data. The JFreeChart
package was used to implement the graph features. The choice of JFreeChart was a
recommendation from the Java forum and suited the needs of the project. The
package has over two thousand styles of graph allowing a high degree of
customisation. We were therefore able to choose the most appropriate graphs.

The graphs within the Autoplag application obtain data in the form of a 3D array from
the results class. This array is then broken down into three 2D arrays which are then
used to populate the three individual histograms.

3.2.3.2 Review Tab


It should be noted that the data in the GraphTab or the RawDataTab may not give a
clear view of what the results are showing. To solve this problem and to indicate
the actual similarity found between two individual submissions, we implemented a
document viewer. This allows the user to select two students' work from the group
tree and, using the drag and drop functions, display them next to each other. A
list of common lines between the two files is provided following the drag and drop
operation. Clicking one of these line numbers highlights the matching lines that
we believe identify possible plagiarism. When a line number is selected, multiple
lines may be highlighted in the right-hand document; these identify multiple line
matches. This line highlighting provides a clear visual representation of the
suspected matches.
The review function also allows the lecturer to review possible plagiarism cases
within the application's environment and to decide whether the file is a genuine
match or a false positive.

3.3 Implementation

Different plagiarism detection techniques have been implemented for the
verification of students' assignments, paper write-ups, software code, etc. Four
algorithms have been used to implement plagiarism detection. They are:
1. Algorithm- Straight-Comparison.
2. Algorithm- Common-Words.
3. Algorithm- Common-String.
4. Algorithm- Common-Block.
Each algorithm can find a particular kind of plagiarism, so each has its own
importance. Any one of the above algorithms can be used for verification; a more
accurate result can be generated by using all of them together, possibly alongside
more complex algorithms. The result is given as the percentage match between one
file and another, on the basis of one-to-many comparisons.

These algorithms are implemented in Java. Each algorithm is described below in
pseudocode; the Java code is given in Appendix I.

3.3.1 Algorithm1: Straight Comparison


This algorithm can be considered level 1 comparison of two files for plagiarism.
Many students copy assignments from their friends and neighbours without changing
a single word; this kind of plagiarism can be detected using this algorithm.
In this algorithm, each character of one file is compared with the character at the
same position in the other file, in sequential order. The length of the original
file is noted, and the number of matches is counted while comparing the files. The
ratio of the number of matches to the length of the original file gives the
percentage match between the two files.
This algorithm can be implemented as follows:
Input: Two files
Output: Matching percentage
Method:
1. Read the files into arrays ar1, ar2 (each terminated by -1)
2. match ← 0
3. j ← 0
4. org ← length of the original file (file 1)
5. do
Begin
6. if ar1[j] = -1 or ar2[j] = -1 then exit the loop
7. if ar1[j] = ar2[j] then
8. match ← match + 1
9. j ← j + 1
End
10. while j ≠ org
11. percentage ← (match / org) × 100
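
The pseudocode above translates into a few lines of Java. This is a minimal sketch with hypothetical names; it works on byte arrays already held in memory and uses the array lengths in place of the -1 end markers.

```java
class StraightComparison {

    // Percentage of the original file's bytes that the other file
    // reproduces at the same positions.
    static double matchPercentage(byte[] original, byte[] other) {
        if (original.length == 0) {
            return 0.0; // nothing to match against
        }
        int limit = Math.min(original.length, other.length);
        int match = 0;
        for (int j = 0; j < limit; j++) {
            if (original[j] == other[j]) {
                match++;
            }
        }
        return 100.0 * match / original.length;
    }
}
```

The measure is not symmetric. Comparing `abc` against `abcdef` gives 100, because all of the original is reproduced, while comparing `abcdef` against `abc` gives 50.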

3.3.2 Algorithm2: Common Words

This algorithm can be considered level 2 plagiarism detection. Sometimes students
change the position of words or use synonyms while copying their neighbours'
documents or programs. In such situations this algorithm can be used to detect the
plagiarism.
In this algorithm every word of a file is separated, and the words and their
synonyms are stored in a two-dimensional array, after removing conjunctions and
some commonly used words. A two-dimensional array is obtained for the other file
in the same way. The two arrays are compared to get the number of matches. The
number of matches as a percentage of the total number of words in the original
file gives the percentage of plagiarism.
This algorithm can be implemented as follows:
Input: Two files
Output: Matching percentage
Method:
1. Read arrays st1, st2 (the words of file1 and file2)
2. match ← 0
3. n1 ← number of words in file1
4. n2 ← number of words in file2
5. for k := 0 to n1 - 1 do
Begin
6. for m := 0 to n2 - 1 do
Begin
7. if st1[k] = st2[m] then
8. match ← match + 1
End
End
9. percentage ← (match / n1) × 100
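
A simplified Java sketch of the word comparison follows. It departs from the pseudocode in two labelled ways: each word of the original is counted at most once, by looking it up in a set of the other file's words, so the result cannot exceed 100; and no synonym lookup is attempted. The class name and the short stop-word list are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CommonWords {

    // Hypothetical sample of conjunctions and common words to ignore.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("and", "or", "but", "the", "a", "an"));

    // Split on whitespace, lowercase, and drop stop words.
    static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    // Percentage of the original's words that also occur in the other text.
    static double matchPercentage(String original, String other) {
        List<String> a = words(original);
        if (a.isEmpty()) {
            return 0.0;
        }
        Set<String> b = new HashSet<String>(words(other));
        int match = 0;
        for (String w : a) {
            if (b.contains(w)) {
                match++;
            }
        }
        return 100.0 * match / a.size();
    }
}
```

Because word order is ignored, `the quick brown fox` and `fox brown quick` score 100, which is the reordering case this level is meant to catch.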

3.3.3 Algorithm3: Common String
This algorithm can be considered level 3 plagiarism detection. Sometimes copying
takes the following form:
File 1:
int i =1;
int j=2;
k=i+j;
File 2:
int j=2;
int i=1;
k=j+i;
In the above example the code of File 2 is generated by slightly transforming the
code of File 1. This algorithm can detect transformations of this kind.
The entire file or program is divided into strings. Each string is obtained by
splitting the file at ‘;’. These strings are placed in an array. The other file is
similarly divided into strings and a second array is formed. Each string of one
array is compared with every string in the other array, and the number of matches
is computed. The ratio of the number of matches to the total number of strings
gives the percentage of plagiarism between the two files.
This algorithm can be implemented as follows:
Input: Two files
Output: Matching percentage
Method:
1. Read arrays st1, st2 (the strings of file1 and file2, split at ‘;’)
2. match ← 0
3. n1 ← number of strings in file1
4. n2 ← number of strings in file2
5. for k := 0 to n1 - 1 do
Begin
6. for m := 0 to n2 - 1 do
Begin
7. if st1[k] = st2[m] then
8. match ← match + 1
End
End
9. percentage ← (match / n1) × 100
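
The statement comparison can be sketched in Java as follows. The names are hypothetical, whitespace is stripped from each statement before comparing (an assumption borrowed from the pre-comparator steps), and each statement of the original is counted at most once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonString {

    // Split at ';' and strip whitespace so spacing differences are ignored.
    static List<String> statements(String source) {
        List<String> result = new ArrayList<String>();
        for (String part : source.split(";")) {
            String s = part.replaceAll("\\s+", "");
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    // Percentage of the original's statements found anywhere in the other file.
    static double matchPercentage(String original, String other) {
        List<String> a = statements(original);
        if (a.isEmpty()) {
            return 0.0;
        }
        Set<String> b = new HashSet<String>(statements(other));
        int match = 0;
        for (String s : a) {
            if (b.contains(s)) {
                match++;
            }
        }
        return 100.0 * match / a.size();
    }
}
```

Applied to File 1 and File 2 from the example above, two of the three statements match, because the two declarations are only reordered. The third statement does not match under exact comparison, since `k=i+j` and `k=j+i` differ in operand order.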

3.3.4 Algorithm4: Common Block


This algorithm can be considered level 4 plagiarism detection of two programs or
files. Sometimes students do their assignments by themselves but copy the code of
some functions or classes from their friends. This type of plagiarism can be
detected by this algorithm.
In this algorithm the program in one file is divided into blocks, which are
labelled. This is done by extracting the text between { and }. The same procedure
is applied to the other file. Then each block of one file is compared with every
block of the other file, and the number of matches is counted. The percentage match
is obtained from the number of matches and the total number of blocks.
This algorithm can be implemented as follows:
Input: Two files
Output: Matching percentage
Method:
1. Read arrays st1, st2 (the blocks of file1 and file2, taken between { and })
2. match ← 0
3. n1 ← number of blocks in file1
4. n2 ← number of blocks in file2
5. for k := 0 to n1 - 1 do
Begin
6. for m := 0 to n2 - 1 do
Begin
7. if st1[k] = st2[m] then
8. match ← match + 1
End
End
9. percentage ← (match / n1) × 100
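
A Java sketch of block extraction and comparison is given below. It is an illustration under assumptions: names are hypothetical, a stack is used so that nested blocks are each extracted (the source does not say how nesting is handled), braces inside string literals are not treated specially, and whitespace is removed before comparing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonBlock {

    // Collect the text of every balanced { ... } block, inner blocks first.
    static List<String> blocks(String source) {
        List<String> result = new ArrayList<String>();
        Deque<Integer> open = new ArrayDeque<Integer>();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                open.push(i);
            } else if (c == '}' && !open.isEmpty()) {
                int start = open.pop();
                result.add(source.substring(start + 1, i).replaceAll("\\s+", ""));
            }
        }
        return result;
    }

    // Percentage of the original's blocks found anywhere in the other file.
    static double matchPercentage(String original, String other) {
        List<String> a = blocks(original);
        if (a.isEmpty()) {
            return 0.0;
        }
        Set<String> b = new HashSet<String>(blocks(other));
        int match = 0;
        for (String block : a) {
            if (b.contains(block)) {
                match++;
            }
        }
        return 100.0 * match / a.size();
    }
}
```

If one of two function bodies in the original also appears in another file under a different function name, the score is 50, which reflects the case of a student copying a single function.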

CHAPTER 4

RESULTS AND DISCUSSION

For each of the four algorithms, n input files produce an n × n matrix of match
percentages, as shown in the following figures.

4.1 Straight chain comparison result

Figure 3
In the above figure three files are compared such that each file is compared with
the other files and the matrix is generated. Figure 4 shows the result in graphical
form. In Figure 4 it is observed that file 1 compared with file 3 gives a 100%
match, but when file 3 is compared with file 1 the percentage is below 100. This
indicates that file 3 contains all the content of file 1 together with additional
content, which suggests that one of the files has been created using the other.
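
The asymmetry described above follows directly from dividing by the length of the file taken as the original. The sketch below builds such a matrix for in-memory strings using a position-by-position comparison; the class and method names, and the sample contents in the usage note, are hypothetical.

```java
class SimilarityMatrix {

    // Percentage of the original's characters reproduced at the same positions.
    static double straight(String original, String other) {
        if (original.isEmpty()) {
            return 0.0;
        }
        int limit = Math.min(original.length(), other.length());
        int match = 0;
        for (int j = 0; j < limit; j++) {
            if (original.charAt(j) == other.charAt(j)) {
                match++;
            }
        }
        return 100.0 * match / original.length();
    }

    // Cell [i][j] holds the match of file i (as original) against file j.
    static double[][] build(String[] files) {
        int n = files.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                matrix[i][j] = straight(files[i], files[j]);
            }
        }
        return matrix;
    }
}
```

With the contents `abc`, `xyz` and `abcdef`, cell [0][2] is 100 while cell [2][0] is 50: the third file contains the whole of the first plus extra material, the same pattern as files 1 and 3 in the figure.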

Figure 4

4.2 Result for common string algorithm

Figure 5

Figure 6
Figure 5 is a snapshot of the result obtained when three files are compared with
each other using the common string algorithm. Figure 6 is the graph of that result.
4.3 Result of common word algorithm

Figure 7

Figure 8

4.4 Result for Common Block algorithm

Figure 10

From Figures 4, 6, 8 and 10 it is observed that the graphs plotted for the same
three files differ from one algorithm to another. This is because the files have
been plagiarized with some changes made, and each algorithm responds to those
changes differently. Of these four results, priority should be given to the common
block algorithm, since it identifies the plagiarized block of code.

CHAPTER 5

CONCLUSIONS AND FUTURE ENHANCEMENTS

5.1 Conclusion:

The project is able to detect most of the plagiarism techniques used by students,
and its aims and objectives are met. All the proposed algorithms are implemented
successfully, but the GUI module is only partially implemented. The user should
have a general idea of all the algorithms, and the normalized result should be
considered. The user should fix an acceptance value depending on the assignment
given and the algorithm used; if the result for a file exceeds that value, the file
is considered plagiarized.

5.2 Future Enhancements


The project can be improved in future by enhancing the common word algorithm.
Natural language processing could be added to the common word algorithm to make it
more effective at detecting plagiarism in paper write-ups.
Another enhancement is connecting to the web and checking submissions for
plagiarism against papers on the Internet using the Google API library.

REFERENCES

Journal Papers
[1] Alan Parker and James O. Hamblen, "Computer algorithms for plagiarism
detection," IEEE Transactions on Education, vol. 32, no. 2, May 1989.
[2] Colin J. Neill and Ganesh Shanmuganthan, "Web enabled plagiarism detection
tool," IEEE Computer Society, October 2004.

Books
[3] Herbert Schildt, "Java 2: The Complete Reference," 5th ed., McGraw-Hill, 2002.
[4] G. S. Baluja, "Algorithm Analysis," Dhanpat Rai Publications, 2003.

APPENDIX -I

CODE
/*
option=1--->byte to byte
option=2---->word to word
option=3--->string to string
option=4--->block to block
*/

import java.io.*;
import java.util.*;
class bl2
{
public static void main(String args[])throws Exception
{
// Working variables: byte buffers, loop counters, token/block boundaries and sizes.
int size,i,ar1[],j=0,ar2[],org=100,start=0,end=0,k,m;
int great,small,strlen1,strlen2,n1,n2,balu=0,x2=-1,x3=-1;
int option=1;
int x1=-1;

float match=0;
ar1=new int[1000];
ar2=new int[1000];
String st1[],st2[];
st1=new String[1000];
st2=new String[1000];
for(i=0;i<st1.length;i++) // initialise every slot, not only the first 100
{
st1[i]="";
st2[i]="";
}
FileInputStream f0,f1;

String dirname1="D:/Users/varun/sample directory/1.c";


String dirname2="D:/Users/varun/sample directory/2.c";
f0=new FileInputStream(dirname1);
f1=new FileInputStream(dirname2);

do
{
i=f0.read();
ar1[j]=i;
j++;
}

while(i!=-1);
org=j-1; //Taking 1.txt as original file
ar1[j]=-1;
j=0;
do
{
i=f1.read();
ar2[j]=i;
j++;
}
while(i!=-1);

j=0;
/////////////////////////////////////
switch(option)
{
case 1:
f0=new FileInputStream(dirname1);
f1=new FileInputStream(dirname2);

do
{

i=f0.read();
ar1[j]=i;
j++;
}
while(i!=-1);
org=j-1; //Taking 1.txt as original file

ar1[j]=-1;
j=0;
do
{
i=f1.read();
ar2[j]=i;
j++;
}
while(i!=-1);
ar2[j]=-1;
j=0;
l:
{
do
{
if(ar1[j]==-1||ar2[j]==-1)break l;
if(ar1[j]==ar2[j])
match++;
j++;

}
while(j!=org);
}

break;
//////////////////////////////////////////////
case 4:

f0=new FileInputStream(dirname1);
f1=new FileInputStream(dirname2);

j=0;
do
{
i=f0.read();
x1++;
if(i=='{')
{
start=x1;

}
if(i=='}')
{

end=x1-1;

for(k=start;k<=end;k++)
{

st1[j]=st1[j]+(char)ar1[k];
}

j++;
}
}while(i!=-1);

strlen1=j;
System.out.println(strlen1);
org=j;
x1=-1;
j=0;
do
{
i=f1.read();
x1++;
if(i=='{')
{

start=x1;

}
if(i=='}')
{
end=x1-1;

for(k=start;k<=end;k++)
{
st2[j]=st2[j]+(char)ar2[k];
}
j++;
}
}
while(i!=-1);

strlen2=j;
System.out.println(strlen2); // number of blocks found in file 2
small=strlen1<=strlen2?strlen1:strlen2;
great=strlen1>strlen2?strlen1:strlen2;
j=0;
match=0;

System.out.println(small);
System.out.println(great);

for(k=0;k<small;k++)
{
for(m=0;m<great;m++)
{

if(strlen1<=strlen2)
{
System.out.println("***********************************");
System.out.println(st1[k]);
System.out.println(st2[m]);
System.out.println("***********************************");
if(st1[k].equals(st2[m]))
{
match++;
}
}

if(strlen1>strlen2)
{
if(st2[k].equals(st1[m]))
match++;
}

}
}
System.out.println(match);
break;
///////////////////////////////////////
case 2:
x1=-1;
x2=0;
x3=-1;
f0=new FileInputStream(dirname1);
f1=new FileInputStream(dirname2);

j=0;
do
{

if(x3==-1)
{
i=f0.read();
x1++;
}

if((i==' '||i=='\n')&&(x2==0))
{
start=x1;
i=f0.read();
x1++;
x2=1;
x3=-1;
}

if((i==' '||i=='\n')&&(x2==1))
{
x3=0;
x2=0;
end=x1-1;
for(k=start;k<=end;k++)
{
st1[j]=st1[j]+(char)ar1[k];
}

j++;
}
}while(i!=-1);

x2=0;
strlen1=j;
System.out.println(strlen1);

org=j;

x1=-1;
x3=-1;
j=0;
do
{
if(x3==-1)
{
i=f1.read();
x1++;
}
if((i==' '||i=='\n')&&(x2==0))
{
x3=-1;
start=x1;
i=f1.read();
x1++;
x2=1;
}
if((i==' '||i=='\n')&&(x2==1))
{
x3=0;
end=x1-1;
x2=0;
for(k=start;k<=end;k++)
{
st2[j]=st2[j]+(char)ar2[k];
}
//System.out.println(st2[j]);
j++;
}
}
while(i!=-1);

strlen2=j;
System.out.println(strlen2);
small=strlen1<=strlen2?strlen1:strlen2;
great=strlen1>strlen2?strlen1:strlen2;
j=0;
match=0;

for(k=0;k<small;k++)
{
for(m=0;m<great;m++)
{

if(strlen1<=strlen2)

{
if(st1[k].equals(st2[m]))
{
match++;
}
}
if(strlen1>strlen2)
{
if(st2[k].equals(st1[m])) // k indexes the shorter list (file 2), m the longer (file 1)
{
match++;
}
}
}
}

break;
//////////////////////////////////////////////////////////
case 3:
x1=-1;
x2=0;
x3=-1;
f0=new FileInputStream(dirname1);
f1=new FileInputStream(dirname2);

j=0;
do
{

if(x3==-1)
{
i=f0.read();
x1++;
}

if((i==';'||i=='.')&&(x2==0))
{
start=x1;
i=f0.read();
x1++;
x2=1;
x3=-1;
}

if((i==';'||i=='.')&&(x2==1))
{
x3=0;
x2=0;
end=x1-1;
for(k=start;k<=end;k++)

{
st1[j]=st1[j]+(char)ar1[k];
}
System.out.println(st1[j]);
j++;
}
}while(i!=-1);

x2=0;
strlen1=j;
org=j;
System.out.println(strlen1);
x1=-1;
x3=-1;
j=0;

do
{
if(x3==-1)
{
i=f1.read();
x1++;
}
if((i==';'||i=='.')&&(x2==0))
{
x3=-1;
start=x1;
i=f1.read();
x1++;
x2=1;
}
if((i==';'||i=='.')&&(x2==1))
{
x3=0;
end=x1-1;
x2=0;
for(k=start;k<=end;k++)
{
st2[j]=st2[j]+(char)ar2[k];
}
System.out.println(st2[j]);
j++;
}
}
while(i!=-1);

strlen2=j;
//System.out.println(strlen2);
small=strlen1<=strlen2?strlen1:strlen2;

great=strlen1>strlen2?strlen1:strlen2;
j=0;
match=0;

//System.out.println(great);

// Compare every statement of file 1 with every statement of file 2.
for(k=0;k<strlen1;k++)
{
for(m=0;m<strlen2;m++)
{

//if(st1.length<=st2.length)
//{
if(st1[k].equals(st2[m]))
{
match++;
}
//}
//if(st1.length>st2.length)
//{
//if(st2[k].equals(st1[m]))
//match++;
}
}
//match=match/2;
break;
} // end of switch(option)

//System.out.println(org);
System.out.println(match);
// Express the match count as a percentage of the original file (file 1).
match=(match/org);
match=match*100;
System.out.println("Match percentage="+match);
f0.close();
f1.close();
}
}
