
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 260-290, IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

HYBRID ZERO-WATERMARKING AND MARKOV MODEL OF WORD MECHANISM AND ORDER TWO ALGORITHM FOR CONTENT AUTHENTICATION OF ENGLISH TEXT DOCUMENTS
Kulkarni U. Vasantrao1, Fahd N. Al-Wesabi2, Adnan Z. Alsakaf3
1 Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg. and Tech., Maharashtra, INDIA.
2 PhD Candidate, Faculty of Engineering, SRTM University, Nanded, INDIA; Assistant Teacher, Department of IT, Faculty of Computing and IT, UST, Sanaa, Yemen.
3 Professor, Department of IS, Faculty of Computing and IT, UST, Sanaa, Yemen.

ABSTRACT

Content authentication and tamper detection of digital text documents have become major concerns in the era of communication and information exchange via the Internet, yet very few techniques are available for authenticating text content with digital watermarking. In this paper, an English text zero-watermarking approach based on the word mechanism and order two of the Markov model is developed for content authentication and tamper detection of text documents. In the proposed approach, the Markov model is used as a soft-computing tool for text analysis and is hybridized with digital watermarking techniques in order to improve the accuracy and complexity of the earlier watermarking technique presented in [27]. The proposed approach is implemented in the PHP programming language. Furthermore, its effectiveness and feasibility are demonstrated in experiments on six datasets of varying lengths. The experimental results show that the proposed approach is sensitive to all kinds of tampering attacks and achieves good tampering-detection accuracy. This accuracy is compared with that of other recent approaches under random insertion, deletion, and reorder attacks applied at multiple random locations in the experimental datasets. The comparative results show that the proposed approach outperforms the WO1 approach in terms of watermark complexity, capacity, and detection accuracy under insertion and deletion attacks; it is therefore recommended in these cases, but it is not suitable under reorder attacks, especially on large text documents.

Keywords: Digital watermarking, Markov model, order two, word mechanism, probabilistic patterns, information hiding, content authentication, tamper detection, copyright protection.

I. INTRODUCTION

With the increasing use of the Internet, e-commerce, and other efficient communication technologies, copyright protection and authentication of digital content have gained great importance. Most of this digital content is in text form, such as email, websites, chats, e-commerce, eBooks, news, and short messaging services (SMS) [1]. These text documents may be tampered with by malicious attackers, and the modified data can lead to fatally wrong decisions and transaction disputes [2]. Content authentication and tamper detection of digital images, audio, and video have long been of great interest to researchers; more recently, copyright protection, content authentication, and tamper detection of text documents have attracted their attention as well. During the last decade, however, research on text watermarking schemes focused mainly on copyright protection and gave less attention to content authentication, integrity verification, and tamper detection [4].

Various techniques have been proposed for copyright protection, authentication, and tamper detection of digital text documents. Digital watermarking (DWM) techniques are considered the most powerful solutions to most of these problems. Digital watermarking is a technology in which information such as an image, plain text, audio, video, or any combination of these can be embedded as a watermark in digital content for applications such as copyright protection, owner identification, content authentication, tamper detection, access control, and many others [2].
Traditional text watermarking techniques, such as format-based, content-based, and image-based techniques, require transformations or modifications of the text document's contents in order to embed the watermark information. A newer technique, named zero-watermarking, has been proposed for text documents. The main idea of zero-watermarking is that it does not change the contents of the original text document but instead uses those contents to generate the watermark information [13]. In this paper, the authors present a new zero-watermarking technique for digital text documents. The technique exploits the probabilistic nature of natural languages, specifically the second order of the Markov model at the word level.

The paper is organized as follows. Section 2 provides an overview of previous work on text watermarking. The proposed generation and detection algorithms are described in detail in Section 3. Section 4 presents the experimental results for various tampering attacks such as insertion, deletion, and reordering; the performance of the proposed approach is evaluated on multiple text datasets. The last section concludes the paper along with directions for future work.

II. PREVIOUS WORK

Text watermarking techniques have been proposed and classified in the literature according to several features and embedding modes. We briefly examine some traditional classifications of digital watermarking as found in the literature. These techniques include text-image, content-based, format-based, feature-based, synonym-substitution, syntactic-structure, acronym-based, and noun-verb-based algorithms, among many others, each depending on a different viewpoint [1][3][4].

A. Format-based Techniques

Text watermarking techniques based on format are layout dependent. In [5], the authors proposed three different embedding methods for text documents: line-shift coding, word-shift coding, and feature coding. In line-shift coding, each even line is shifted up or down depending on the corresponding watermark bit; typically, the line is shifted up if the bit is one and down otherwise. The odd lines are treated as control lines and used during decoding. Similarly, in word-shift coding, words are shifted and the inter-word spaces are modified to embed the watermark bits. Finally, in feature coding, certain text features, such as character pixels or the lengths of character end-lines, are altered in a specific way to encode the zeros and ones of the watermark. Watermark detection is performed by comparing the original and watermarked documents.

B. Content-based Techniques

Text watermarking techniques based on content are structure-based and natural-language dependent [4]. In [6][14], a syntactic approach was proposed that uses the syntactic structure of the cover text to embed watermark bits by applying syntactic transformations to the syntactic tree diagram, while preserving the natural properties of the text during the embedding process. In [18], synonym substitution was proposed to embed the watermark by replacing certain words with their synonyms without changing the sense and context of the text.

C. Binary Image-based Techniques

Text watermarking techniques for binary image documents rely on traditional image watermarking techniques based on the spatial and transform domains, such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Least Significant Bit (LSB) [5].
Several formal text watermarking methods have been proposed that embed the watermark in the text image by shifting words and sentences right or left, or shifting lines up or down, as mentioned above in the section on format-based watermarking [5][7].

D. Zero-based Techniques

Text watermarking techniques based on zero-watermarking depend on content features. Several approaches designed for text documents have been proposed in the literature and are reviewed in this paper [1][19][20][21]. The first algorithm was proposed in [19] for tamper detection in plain text documents based on word lengths, using digital watermarking and certifying-authority techniques. The second algorithm was proposed in [20] for improving text authenticity; it uses the contents of the text to generate a watermark, which is later extracted to prove the authenticity of the text document. The third algorithm was proposed in [1] for copyright protection of text contents based on the occurrence frequency of non-vowel ASCII characters and words. The last algorithm was proposed in [21] to protect all open textual digital contents from counterfeiting; it logically inserts the watermark image in the text and extracts it later to prove ownership. In [22], a Chinese text zero-watermarking approach was proposed based on a space model, using the two-dimensional model coordinates at the word level and the sentence weights at the sentence level.


E. Combined-based Techniques

Text is unlike an image: language has a distinct syntactic nature that makes image-based techniques more difficult to apply. Thus, text should be treated as text rather than as an image, and the watermarking process should be performed differently. In [23], a combined method was proposed for copyright protection that combines the best of both image-based and language-based text watermarking techniques.

The text watermarking approaches mentioned above are not appropriate for all types of text documents across document sizes, types, and random tampering attacks, and their embedding and extraction mechanisms may be discovered easily by attackers. Moreover, these approaches are not designed specifically to solve the problem of authentication and tamper detection of text documents; they rely on making modifications to the original text document in order to embed additional external information, which can later be used for purposes such as content authentication, integrity verification, tamper detection, or copyright protection. This paper proposes a novel intelligent approach for content authentication and tamper detection of English text documents in which the watermark embedding and extraction processes are performed logically, based on analyzing the text and extracting content features with a hidden Markov model, so that the original text document is not altered to embed the watermark.

III. THE PROPOSED APPROACH

This paper presents an improved intelligent approach for English text zero-watermarking based on the word level and second order of the Markov model for content authentication and tamper detection of text documents. The improved approach relies on the word mechanism and order two of the Markov model to improve the performance, complexity, and tampering-detection accuracy of the similar approach, based on order one of the Markov model, presented in [27] and developed by F. Al-Wesabi et al. The improved approach performs the watermark generation, embedding, extraction, and detection processes with higher accuracy and stronger security measures. It hybridizes text zero-watermarking techniques with soft-computing tools for natural language processing in order to protect digital text documents. The Markov model is used to analyze the text and extract the interrelationships among its contents as probabilistic patterns, based on the word level and second order of the Markov model, in order to generate the watermark information. This watermark can later be extracted using the extraction algorithm and matched against the watermark generated from the attacked document using the detection algorithm, in order to identify any tampering and prove the authenticity of the text document. Before explaining the watermark generation and detection processes, the next subsection gives a preliminary mathematical description of the second order of Markov models based on the word mechanism for text analysis.

A. Markov Models for Text Analysis

In this subsection, we explain how to model text using a Markov chain, which is defined as a stochastic (random) model describing the way a process moves from state to state. For example, suppose that we want to analyse the following sentence: "The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."

When we use a Markov model of order two with the word mechanism, each sequence of two words is a state. As the sample text above is processed, the system makes the following transitions:

"the quick" -> "quick brown" -> "brown fox" -> "fox jumps" -> "jumps over" -> "over the" -> "the brown" -> "brown fox" -> "fox who" -> "who is" -> "is slow" -> "slow jumps" -> etc.

Next we present a simple method to build the Markov matrix of states and transitions M, which is the most basic part of text analysis using a Markov model. In this approach, the size of the Markov matrix is not fixed; the number of states and transition probabilities varies with the contents of the given text. The number of possible states Ps and possible transitions Pt can be computed by equations (1) and (2):

Ps = (n - 2)        (1)
Pt = (n - 2)^2      (2)

where n is the length (in words) of the given text document. So the matrix of state probabilities for the sample text above should have (20 - 2) = 18 word pairs. From each state there are (n - 2) possible transitions. If the Markov chain is currently at the first state (the first two words) of the given text document, the possible states that could come next are [W(i+2), W(i+3), W(i+4), ..., W(i+n)], so the matrix of transition probabilities should have (n - 2)^2 entries. For example, in the sample text above, if the Markov chain is currently at the "the quick" state, the possible transitions that could come next are [brown, fox, jumps, over, the, ..., dead], so the matrix of transition probabilities for the sample text should have (20 - 2)^2 = 18^2 = 324 entries. In general, for a text of n words there are (n - 2) states, and the matrix of transition probabilities needs (n - 2)^2 entries.

As a result of analysing the sentence above, which contains 20 words, with a Markov model of order two based on the word mechanism, and after it is processed by the system and represented as a Markov chain, we obtain figure 2, which shows the 11 distinct present states as word pairs in the Markov chain matrix, without repetitions, and all (20 - 2) = 18 possible transitions.
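The state-building step described above can be sketched in a few lines. The following is an illustrative Python sketch of the order-two word mechanism (the paper's implementation is in PHP, and all names here are ours); it reproduces the counts for the 20-word sample sentence.

```python
from collections import defaultdict

def build_order2_model(text):
    """Order-two, word-level Markov model: each state is a pair of
    consecutive words; we count how often each next word follows it."""
    words = text.lower().split()          # pre-processing: lower-case, tokenize
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - 2):
        state = (words[i], words[i + 1])  # present state: two-word sequence
        nxt = words[i + 2]                # next-state transition word
        counts[state][nxt] += 1
    return counts

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead")
model = build_order2_model(sample)
print(len(model))                         # 11 distinct states, as in figure 2
print(dict(model[("brown", "fox")]))      # {'jumps': 1, 'who': 2}
```

Note that the distinct-state count (11) and the repeated "who" transition from the "brown fox" state match the analysis of the sample sentence given in the text.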



Fig. 2. Sample text states and transitions based on order 2 of a Markov model

Now consider the state "brown fox": its next-state transitions are "jumps", "who", and "who"; we observe that the transition "who" occurs twice. When analysing large texts, we count the frequencies of occurrence of the next states and finally convert them into probabilities. The following is a simple procedure to obtain a Markov model of order two for a given text. Build (and initialize to all zeroes) an (n-2)-by-(n-2) matrix M to store the transitions. The entry M[i,j] keeps track of the number of times that state i is followed by word j within the given text. For i = 1 to L - 2, where L is the length of the text document in words, let x be the state formed by the i-th and (i+1)-th words and let y be the (i+2)-th word; then increment M[x,y]. Now the matrix M contains the counts of all transitions. To turn these counts into probabilities, sum the entries of each row i, i.e., let counter[i] = M[i,1] + M[i,2] + ... + M[i,n-2], and define P[i,j] = M[i,j] / counter[i] for all pairs i,j. This gives a matrix of probabilities: P[i,j] is the probability of making a transition from state i to word j. Hence a matrix of probabilities that describes a Markov model of order two for the given text is obtained.

B. Watermark Generation and Embedding Algorithm

The watermark generation and embedding algorithm requires the original text document (TO), provided by the author, as input; as a pre-processing step, capital letters are converted to small letters. A watermark pattern is generated as the output of this algorithm. This watermark is then stored in the watermark database along with the main properties of the original text document, such as document identity, author name, and the current date and time.
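The counting-and-normalisation procedure above can be sketched as follows (a Python illustration; the paper's implementation is in PHP, and the function name `transition_probabilities` is ours). Each row of counts is divided by its row sum to obtain a row-stochastic matrix.

```python
def transition_probabilities(counts):
    """Normalise raw transition counts into probabilities:
    P[state][word] = M[state][word] / counter[state]."""
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())                     # counter[i]: row sum
        probs[state] = {w: c / total for w, c in row.items()}
    return probs

# tiny example: the "brown fox" state from the sample sentence
counts = {("brown", "fox"): {"jumps": 1, "who": 2}}
probs = transition_probabilities(counts)
print(probs[("brown", "fox")]["who"])   # 2/3
```

For instance, from state "brown fox" the transition "who" occurred twice out of three total transitions, so its probability is 2/3.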
This stage involves three algorithms: pre-processing and building the Markov matrix, text analysis, and watermark generation and embedding, as shown in figure 3.



[Figure 3 flowchart: original text document (TDO) → text pre-processing → building the Markov matrix → text analysis using the Markov model (word level, order two) → compute the number of occurrences of next-state (NS) transitions for every present state (PS) → probabilistic patterns → (loop until terminated) → digest the WM patterns using the MD5 algorithm → store the WMO patterns, DocID, date, and time in the watermark database.]

Fig. 3. Watermark generation and embedding processes

1) Pre-processing and Building the Markov Matrix

This algorithm takes the original text document as input and produces the pre-processed text document and the Markov matrix as outputs. Building the state-and-transition matrix is the most basic part of text analysis and watermark generation using the Markov model. A Markov matrix representing the possible states and transitions in the given text is constructed without repetitions. In this approach, each unique sequence of two words within the given text is represented as a state (word pair) and a transition in the Markov matrix. While building the Markov matrix, the proposed algorithm initializes all transition values to zero, so that these cells can later keep track of the number of times that state i is followed by word j within the given text document.


The preProcessing algorithm executes as follows:

PROCEDURE preProcessing(TO)
- Input: original text document (TO)
- Output: pre-processed text document (TP); state matrix of the given text without repeats, arrayList[Ts]
- BEGIN
-   Loop index = 0 to Text.Length - 2
-     // convert letters from upper case to lower case
-     IF UpperCharacter(TO[index]) = True THEN TP[index] = LowerCharacter(TO[index]);
-     // list every unique sequence of two words in the given text as a state in the array list
-     exist = TP[index];
-     Loop j = 0 to index
-       IF arrayList[j] <> exist THEN arrayList[index] = exist;
-     index++;
- END

where TO is the original text document, TP is the pre-processed text document, arrayList is the state array of the given text after pre-processing, and index is the current word in the given text.

The Build_Markov_Matrix algorithm executes as follows:

PROCEDURE Build_Markov_Matrix(TP)
- Input: pre-processed text (TP)
- Output: Markov matrix with all entries initialized to zero
- BEGIN
-   // perform the pre-processing step
-   Call preProcessing(TO)
-   // build the state-and-transition matrix of the Markov model and initialize it to all zeros
-   Loop ps = 0 to arrayList.Length - 2
-     Loop ns = 0 to arrayList.Length
-       MarkovMatrix[ps][ns] = 0;
-       ns++;
-     ps++;
- END

where TP is the pre-processed text document, MarkovMatrix is the state-and-transition matrix with all cells set to zero, ps is the present state, and ns is the next state.

2) Text Analysis Algorithm

This algorithm takes the pre-processed text document as input and produces the watermark patterns as output. After the Markov matrix has been constructed, text analysis is performed using the Markov model, based on order two of the word mechanism, by finding the interrelationships between the words of the given text document. In other words, the proposed algorithm computes the number of occurrences of the next-state transitions for every present state. The matrix of transition counts, which records the number of occurrences of the transition from one state to another, is constructed by equation (3):

MarkovMatrix[ps][ns] = Total Number of Transitions[i][j], for i, j = 1, 2, ..., n-2    (3)

where n is the total number of states, i refers to PS (the present state), j refers to NS (the next state), and P[i,j] is the probability of making a transition from state i to word j. Text analysis of the given sentence based on the word mechanism and order two is shown as a Markov chain and proceeds as illustrated in figure 4.

Fig. 4. Text analysis processes based on order 2 of a Markov model

Let TP be the pre-processed text and let MarkovMatrix[ps][ns] be the Markov matrix storing the number of times that state i is followed by word j in the given text. The text analysis algorithm is presented formally and executes as follows:

PROCEDURE text_analysis(TP)
- Input: pre-processed text (TP)
- Output: Markov matrix with transition-count values
- BEGIN
-   // build the state-and-transition matrix of the Markov model
-   Call Build_Markov_Matrix(TP)
-   // compute the total frequency of transitions for every state
-   Loop ps = 0 to arrayList.Length - 2
-     Loop ns = 1 to arrayList.Length
-       Loop counter = 2 to TP.length - 1
-         MarkovMatrix[ps][ns] = Total Number of Transitions[ps][ns]
-         counter++;
-       ns++;
-     ps++;
- END

where TP is the pre-processed text document and MarkovMatrix is the state-and-transition matrix holding the transition counts for every state.

3) Watermark Generation and Embedding Algorithm

After performing the text analysis and extracting the probability features, the watermark is obtained by identifying all the nonzero values in the Markov matrix above. These nonzero values are concatenated sequentially to generate a watermark pattern, denoted WMPO, as given by equation (4) and presented in figure 5:

WMPO &= MarkovMatrix[ps][ns], for all i, j with nonzero values in the Markov matrix    (4)

Fig. 5. The original watermark patterns for a given sample text

The embedding process is performed logically during text analysis by keeping track of all nonzero transitions and their values in the Markov matrix; the cells of nonzero transitions contain the number of times that state i is followed by word j within the given text document. These tracks can later be used by the detection algorithm, which matches them against the tracks produced from the attacked text document. The watermark is then stored in the watermark database along with some properties of the original text document, such as the document identity, author name, and current date and time. After the watermark is generated as sequential patterns, an MD5 message digest is computed to obtain a secure and compact form of the watermark, as given by equation (5) and presented in figure 6:

DWM = MD5(WMPO)    (5)
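The generation and digest steps of equations (4) and (5) can be sketched as follows. This is a Python illustration only: the exact serialisation of the pattern (ordering and separators) is not specified in the paper, so the deterministic sorted order used here is our assumption.

```python
import hashlib

def generate_watermark(counts):
    """Concatenate all nonzero transition counts into a watermark
    pattern (eq. 4) and digest it with MD5 (eq. 5)."""
    pattern = ""
    for state in sorted(counts):               # assumed deterministic state order
        for word in sorted(counts[state]):     # assumed deterministic transition order
            c = counts[state][word]
            if c != 0:                         # keep nonzero transitions only
                pattern += str(c)
    digest = hashlib.md5(pattern.encode("utf-8")).hexdigest()
    return pattern, digest

counts = {("brown", "fox"): {"jumps": 1, "who": 2},
          ("who", "is"): {"slow": 1, "dead": 1}}
pattern, digest = generate_watermark(counts)
print(pattern)        # "1211"
print(len(digest))    # 32 hex characters
```

Digesting the concatenated pattern gives a fixed-size (128-bit) value regardless of document length, which is what makes the stored watermark compact.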

Fig. 6. The original watermark after MD5 digesting


The proposed watermark generation and embedding algorithm, using order two of the Markov model at the word level, is presented formally and executes as follows:

PROCEDURE watermark_gen_embed(MarkovMatrix[i][j])
- Input: Markov matrix[i][j]
- Output: original watermark patterns (WMPO)
- BEGIN
-   // compute the total frequency of transitions for every state of the original document
-   Call text_analysis(TP)
-   // concatenate the watermark patterns of every state shown in the Markov matrix
-   Loop ps = 0 to MarkovMatrix[rows].Length - 2
-     Loop ns = 1 to MarkovMatrix[columns].Length
-       IF MarkovMatrix[ps][ns] != 0 THEN   // states that have nonzero transitions
-         WMPO &= MarkovMatrix[ps][ns]
-       ns++;
-     ps++;
-   Store WMPO in the DWM database.
-   // digest the original watermark using the MD5 algorithm
-   WMO = MD5(WMPO)
-   Output WMPO, WMO
- END

where WMO is the original watermark, WMPO the original watermark patterns, and MD5 the hash algorithm.

C. Watermark Extraction and Detection Algorithms

The watermark detection algorithm is based on zero-watermarking, so before detection can be performed on an attacked text document TA, the proposed algorithm must first generate the attacked watermark patterns. Once the watermark patterns have been obtained, the pattern matching rate and the watermark distortion rate are calculated in order to decide on tampering detection and content authentication. This stage includes two main processes: watermark extraction and watermark detection. Extracting the watermark from the received attacked text document and matching it against the original watermark is done by the detection algorithm. The proposed watermark extraction algorithm takes the attacked text document and performs the same watermark generation algorithm to obtain the watermark pattern of the attacked text document, as shown in figure 7.



[Figure 7 flowchart: attacked text document (TDA) → text pre-processing → building the Markov matrix → text analysis using the Markov model (word level, order two) → compute the number of occurrences of next-state (NS) transitions for every present state (PS) → probabilistic patterns → (loop until terminated) → attacked watermark EWMA; original watermark WMO (patterns, DocID, date, time) retrieved from the watermark database → WM pattern matching → if the patterns match, the text document is authentic; otherwise the text document has been tampered with.]

Fig. 7. Watermark extraction and detection processes

1) Watermark Extraction Algorithm

This algorithm takes the attacked text document (TA) and the original watermark patterns or the original text document as inputs; the procedure is similar to that of watermark generation. Its output is the attacked watermark patterns (WMPA).


The watermark extraction algorithm executes as follows:

PROCEDURE watermark_extraction(TA)
- Input: attacked text document (TA)
- Output: attacked watermark patterns (WMPA)
- BEGIN
-   // perform the pre-processing step for the attacked text document
-   Call preProcessing(TA)
-   // compute the total frequency of transitions for every state of the attacked document
-   Call text_analysis(TAp)
-   // generate the attacked watermark patterns from the attacked text document
-   Loop ps = 0 to MarkovMatrix[rows].Length - 2
-     Loop ns = 0 to MarkovMatrix[columns].Length
-       IF MarkovMatrix[ps][ns] != 0 THEN
-         WMPA &= MarkovMatrix[ps][ns]
-       ns++;
-     ps++;
-   Output WMPA
- END

where WMPA is the attacked watermark pattern, TA the attacked text document, TAp the pre-processed attacked text document, and MarkovMatrix[ps][ns] the Markov matrix of the attacked text document.

2) Watermark Detection Algorithm

After the attacked watermark pattern has been extracted, watermark detection is performed in three steps. Primary matching is performed on the whole watermark pattern of the original document, WMPO, and of the attacked document, WMPA. If these two patterns are identical, the text document is declared authentic and untampered. If the primary matching is unsuccessful, the text document is declared not authentic, tampering has occurred, and we proceed to the next step. Secondary matching is performed by comparing the components associated with each state of the overall pattern: the extracted watermark pattern of each state is compared with the equivalent transitions of the original watermark pattern. This process can be described by equations (6) and (7).

On the transition level, the pattern matching rate PMRT compares the original and attacked values of each transition:

PMRT[i][j] = 1 if WMPO[i][j] = WMPA[i][j], and 0 otherwise, (0 < PMRT <= 1)    (6)

where PMRT[i][j] is the pattern matching rate at the transition level; i and j are the indexes of states and transitions respectively, i = 0 .. number of non-zero states in the given text, j = 0 .. number of non-zero transitions in the given text; WMPO[i][j] is the value of the original watermark at the transition level; and WMPA[i][j] is the value of the attacked watermark at the transition level.

On the state level, the matching rates of the transitions of each state are averaged:

PMRS[i] = (1 / n) * SUM(j = 1 to n) PMRT[i][j], (0 < PMRS <= 1)    (7)

where n is the number of non-zero transitions of each state represented in the Markov matrix, i is the index over the non-zero patterns of each state represented in the Markov matrix, and PMRS[i] is the pattern matching rate at the state level. After obtaining the pattern matching rate of every state, we find the weight of each state among all states in the Markov matrix by equation (8):

Sw[i] = PMRS[i] / N    (8)

where PMRS[i] is the total pattern matching rate of state i and N is the number of states of the given text document. Finally, the PMR, which represents the pattern matching rate between the original and attacked text documents, is calculated by equation (9):

PMR = SUM(i = 1 to N) Sw[i]    (9)

where N is the total number of states in the Markov matrix. The watermark distortion rate refers to the amount of tampering inflicted by attacks on the contents of the attacked text document; it is represented by WDR and obtained by equation (10):

WDR = 1 - PMR    (10)

This process is illustrated in figure 8.

Fig. 8. Watermark extraction process based on order 2 of a Markov model
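The state-weighted matching of equations (6)-(10) can be sketched in Python as follows. This is an illustrative sketch under our interpretation that a transition "matches" when its count is unchanged between the original and attacked documents; the function name and data layout are ours, not the paper's.

```python
def detection_rates(wmpo, wmpa):
    """Pattern matching rate (PMR, eqs. 6-9) and watermark distortion
    rate (WDR, eq. 10) between original and attacked transition counts."""
    N = len(wmpo)                       # total number of states
    pmr = 0.0
    for state, row in wmpo.items():
        attacked = wmpa.get(state, {})
        n = len(row)                    # non-zero transitions of this state
        # eqs. (6)/(7): fraction of transitions whose counts still match
        pmrs = sum(1 for w, c in row.items() if attacked.get(w) == c) / n
        pmr += pmrs / N                 # eqs. (8)/(9): weighted sum over states
    return pmr, 1.0 - pmr               # eq. (10): WDR = 1 - PMR

original = {("brown", "fox"): {"jumps": 1, "who": 2},
            ("who", "is"): {"slow": 1, "dead": 1}}
attacked = {("brown", "fox"): {"jumps": 1, "who": 1},   # one count changed
            ("who", "is"): {"slow": 1, "dead": 1}}
pmr, wdr = detection_rates(original, attacked)
print(pmr)   # 0.75
print(wdr)   # 0.25
```

In this toy example one of the four transitions is altered, so half of the transitions of the first state match (PMRS = 0.5), all of the second state match (PMRS = 1.0), and the document-level PMR is (0.5 + 1.0) / 2 = 0.75.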



The watermark detection algorithm executes as follows:

PROCEDURE watermark_detection(PT, PT')
- Input: preprocessed texts (TP, TP')
- Output: PMR, WDR
- BEGIN
- // generate the watermark of the original document
- Call watermark_gen_embed(MarkovMatrix[ps][ns])
- // extract the watermark from the attacked document
- Call watermark_extraction(MarkovMatrix[ps][ns])
- // perform the matching process between the original and attacked watermark patterns
- IF WMA = WMO
  o Print "Document is authentic and no tampering occurred"
  o PMR = 1
- ELSE
  o Print "Document is not authentic and tampering occurred"
  o // compute the pattern matching rate on the transition level
  o Loop i = 0 to MarkovMatrix[rows].Length
    o Loop j = 0 to MarkovMatrix[columns].Length
      o IF WMPO[i][j] != 0 THEN patternCount += 1
      o IF WMPA[i][j] = WMPO[i][j] THEN transPMRTotal += 1
    o // compute the pattern matching rate on the state level
    o stateWeight = (transPMRTotal / patternCount) / numberOfStates
    o Sw += stateWeight
  o // compute the pattern matching rate on the document level
  o PMR = Sw
- // compute the watermark distortion rate on the document level
- WDR = 1 - PMR
- END

Where:
o Sw: is the weight of the states correctly matched.
o WDR: represents the value of the watermark distortion rate (0 <= WDR <= 1).
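The detection procedure above can be sketched end to end in a few lines, assuming a simplified dictionary-based order-two word model in place of the paper's matrix representation (`markov_order2` and `detect` are illustrative names, not the paper's PHP routines):

```python
from collections import defaultdict

def markov_order2(words):
    """Order-two word-level Markov model: each state is a word pair and its
    non-zero transition counts are the watermark patterns."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - 2):
        model[(words[i], words[i + 1])][words[i + 2]] += 1
    return model

def detect(original_text, attacked_text):
    """Sketch of watermark_detection: returns (PMR, WDR)."""
    wm_o = markov_order2(original_text.lower().split())
    wm_a = markov_order2(attacked_text.lower().split())
    if wm_o == wm_a:                 # identical watermarks: document authentic
        return 1.0, 0.0
    pmr = 0.0
    n_states = len(wm_o)
    for state, transitions in wm_o.items():
        # transition-level matching within one state
        matched = sum(1 for nxt, cnt in transitions.items()
                      if wm_a.get(state, {}).get(nxt, 0) == cnt)
        state_rate = matched / len(transitions)
        pmr += state_rate / n_states     # eq. (8)-(9): state weight, summed
    return pmr, 1.0 - pmr                # eq. (10)

# 4 of the 7 original word-pair states survive the single-word substitution:
pmr, wdr = detect("the quick brown fox jumps over the lazy dog",
                  "the quick brown cat jumps over the lazy dog")
```

A single substituted word disturbs every state that contains it, which is why even small tampering produces a measurable drop in PMR.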

IV. EXPERIMENTAL SETUP, RESULTS AND DISCUSSION

A. Experimental Setup

In order to test the proposed approach and compare it with the other approach, we conducted a series of simulation experiments. The experimental environment was as follows: CPU: Intel Core i5 M480/2.67 GHz; RAM: 8.0 GB; Windows 7; programming language: PHP, NetBeans IDE 7.0. With regard to the data sets, six samples were taken from the data sets designed in [24]. These samples were categorized into three classes according to their size, namely Small Size Text (SST), Medium Size Text (MST), and Large Size Text (LST). Next, we define the types of attacks and their rates: insertion, deletion, and reorder attacks performed randomly at multiple locations in these datasets. The details of the dataset volumes and attack rates used are shown in Table I; they are similar to those performed in [25] for comparison purposes, although it should be mentioned that the reorder attack performed here on the datasets is not contained in that paper.

TABLE I
ORIGINAL AND ATTACKED TEXT SAMPLES WITH INSERTION, DELETION AND REORDER PERCENTAGES

Sample Text ID   Original Text Word Count   Insertion            Deletion             Reorder
[SST4]           179                        5%, 10%, 20%, 50%    5%, 10%, 20%, 50%    5%, 10%, 20%, 50%
[SST2]           421                        5%, 10%, 20%, 50%    5%, 10%, 20%, 50%    5%, 10%, 20%, 50%
[MST5]           469                        5%, 10%, 20%, 50%    5%, 10%, 20%, 50%    5%, 10%, 20%, 50%
[MST2]           559                        5%, 10%, 20%, 50%    5%, 10%, 20%, 50%    5%, 10%, 20%, 50%
[LST4]           2018                       5%, 10%, 20%, 50%    5%, 10%, 20%, 50%    5%, 10%, 20%, 50%

To measure the performance of our approach and compare it with others, the tampering detection accuracy, which is a measure of watermark robustness, is used. The PMR value gives the tampering detection accuracy for the given text document. The watermark distortion rate (WDR) is also measured and compared with other approaches. The values of both PMR and WDR range between 0 and 1. A larger PMR value, and correspondingly a lower WDR value, means more robustness, while a lower PMR value and a larger WDR value mean less robustness; the desirable PMR value is therefore close to 1, and the desirable WDR value close to 0. We categorize tamper detection states into three classes based on PMR threshold values: High when the PMR is greater than 0.70, Mid when the PMR is between 0.40 and 0.70, and Low when the PMR is less than 0.40. To evaluate the accuracy of the proposed approach, a series of experiments was conducted with all the well-known attacks, namely random insertion, deletion and reorder of words and sentences, on each sample of the datasets. These various kinds of attacks were applied at multiple locations in the datasets. The experiments were conducted first with individual attacks, then with all attacks at the same time, and comparative results of the proposed approach against a recent similar approach were obtained.
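The attack scenarios and PMR threshold classes described above can be simulated with a small sketch. The helpers below are hypothetical, written only to illustrate the experimental setup; the paper's own tooling is in PHP:

```python
import random

def attack(words, rate, kind, seed=42):
    """Randomly tamper with a word list; `rate` is the attack volume
    (e.g. 0.05 for the 5% scenario). Illustrative helper, not the paper's code."""
    rng = random.Random(seed)
    words = list(words)
    k = max(1, int(len(words) * rate))    # number of words affected
    if kind == "insertion":
        for _ in range(k):
            # insert a duplicated word at a random position
            words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    elif kind == "deletion":
        for _ in range(k):
            words.pop(rng.randrange(len(words)))
    elif kind == "reorder":
        for _ in range(k):
            i, j = rng.randrange(len(words)), rng.randrange(len(words))
            words[i], words[j] = words[j], words[i]
    return words

def tamper_class(pmr):
    """Tamper-detection class from the PMR thresholds used in the experiments."""
    if pmr > 0.70:
        return "High"
    return "Mid" if pmr >= 0.40 else "Low"
```

A reorder attack, unlike insertion and deletion, preserves the word multiset while destroying word-pair transitions, which is consistent with it being the hardest case reported below.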


B. Experiments with the proposed approach

In this section, we evaluate the performance of the proposed approach. The character set of words covers all English letters, spaces, numbers, and special symbols. The experiments were conducted with the various kinds of attacks individually at rates of 5%, 10%, 20% and 50% respectively. The performance results of this approach under all the mentioned attacks are presented in tabular form in Table II and graphically in figures 9, 10, 11 and 12 for the 5%, 10%, 20% and 50% attack scenarios respectively. These results are discussed below.

TABLE II
EXTRACTED WATERMARK MATCHING AND DISTORTION PERCENTAGES UNDER VARIOUS INDIVIDUAL ATTACKS

Sample   Words   Volume   Insertion PMR / WDR   Deletion PMR / WDR   Reorder PMR / WDR
[SST4]   179     5%       0.9409 / 0.0591       0.8936 / 0.1064      0.7354 / 0.2646
                 10%      0.8929 / 0.1071       0.8506 / 0.1494      0.7825 / 0.2175
                 20%      0.6930 / 0.3070       0.8652 / 0.1348      0.3630 / 0.6370
                 50%      0.6386 / 0.3614       0.7576 / 0.2424      0.3008 / 0.6992
[SST2]   421     5%       0.9246 / 0.0754       0.9448 / 0.0552      0.8835 / 0.1165
                 10%      0.9052 / 0.0948       0.7423 / 0.2577      0.7412 / 0.2588
                 20%      0.8182 / 0.1818       0.8083 / 0.1917      0.7535 / 0.2465
                 50%      0.6622 / 0.3378       0.9144 / 0.0856      0.7624 / 0.2376
[MST5]   469     5%       0.9473 / 0.0527       0.9854 / 0.0146      0.8589 / 0.1411
                 10%      0.9068 / 0.0932       0.9553 / 0.0447      0.7484 / 0.2516
                 20%      0.8233 / 0.1767       0.9475 / 0.0525      0.5715 / 0.4285
                 50%      0.6428 / 0.3572       0.5480 / 0.4520      0.2619 / 0.7381
[MST2]   559     5%       0.9463 / 0.0537       0.9565 / 0.0435      0.8916 / 0.1084
                 10%      0.9006 / 0.0994       0.8230 / 0.1770      0.7544 / 0.2456
                 20%      0.8282 / 0.1718       0.8269 / 0.1731      0.5697 / 0.4303
                 50%      0.6576 / 0.3424       0.3258 / 0.6742      0.0493 / 0.9507
[LST4]   2018    5%       0.0102 / 0.9898       0.9852 / 0.0148      0.8697 / 0.1303
                 10%      0.0095 / 0.9905       0.9790 / 0.0210      0.0577 / 0.9423
                 20%      0.0106 / 0.9894       0.0065 / 0.9935      0.0676 / 0.9324
                 50%      0.0066 / 0.9934       0.0080 / 0.9920      0.0502 / 0.9498

Results of various attacks under the 5% scenario

The results show the PMR accuracy of the proposed algorithm, as applied to the different datasets, under a 5% rate of insertion, deletion and reorder attacks. The PMR is more than 70% for all kinds of attacks except the insertion attack on the large text document (LST4), as shown in figure 9. It can also be observed that the PMR is worst under the reorder attack and best under the deletion attack, for which the PMR maintains a value close to or greater than 90% in all cases.



Fig. 9 PMR accuracy under the 5% scenario of various attacks

Results of various attacks under the 10% scenario

As applied to the different datasets under a 10% rate of insertion, deletion and reorder attacks, as shown in figure 10, the PMR value is best under the deletion attack, for which it maintains a value greater than 70% for all datasets. Under the insertion attack, the PMR maintains values close to 90% for all datasets except LST4, which indicates that the proposed approach is not applicable under 10% insertion attacks on large text documents. Finally, in the case of the reorder attack, the PMR value is high for small text documents and decreases for large documents.

Fig. 10 PMR accuracy under the 10% scenario of various attacks


Results of various attacks under the 20% scenario

Figure 11 shows the experimental results as applied to the different datasets under a 20% rate of insertion, deletion and reorder attacks. As shown in figure 11, the PMR accuracy is good for small and medium text documents, but poor for the large text document under all kinds of attacks, as shown with the LST4 dataset. This indicates that the proposed approach is not applicable to large documents under 20% rates of the various kinds of attacks.

Fig. 11 PMR accuracy under the 20% scenario of various attacks

Results of various attacks under the 50% scenario

As applied to the different datasets under a 50% rate of insertion, deletion and reorder attacks, as shown in figure 12, the PMR accuracy is high for small text documents, decreases for medium text documents, and is very poor for the large text document, for which the values are close to zero in all scenarios. Figure 12 also shows that the PMR maintains a value greater than 60% for the small and medium datasets under the insertion attack.

Fig. 12 PMR accuracy under the 50% scenario of various attacks



C. Comparative Results

In order to compare the performance of the proposed approach, named here WO2, with the recently published text watermarking approach presented in [27], named here WO1 and proposed by F. Al-Wesabi et al., the same environment and parameters were used for both approaches. Both WO2 and WO1 depend on the word mechanism of the Markov model; the core difference between them is the model order: the WO1 approach is based on order one of the Markov model, while the proposed approach (WO2) is based on order two. In these experiments, random multiple insertion, deletion and reorder attacks were performed individually on each sample of the datasets at the various attack rates shown above in Table I. The ratios of successfully detected watermarks of the proposed algorithm, as compared with reference [27] (WO1), are shown in Table III and graphically represented in figures 13, 14, 15 and 16.

TABLE III
COMPARATIVE PERFORMANCE ACCURACY OF THE PROPOSED ALGORITHM WITH WO1 UNDER INDIVIDUAL ATTACKS
                          Successfully detected watermark (%)
                          Reference 27 (WO1)              Proposed approach (WO2)
Sample   Words   Volume   Ins.    Del.    Reorder         Ins.    Del.    Reorder
[SST4]   179     5%       94.59   86.02   81.85           94.09   89.36   73.54
                 10%      89.53   86.39   81.65           89.29   85.06   78.25
                 20%      67.47   88.02   45.91           69.30   86.52   36.30
                 50%      64.21   71.56   42.68           63.86   75.76   30.08
[SST2]   421     5%       91.91   95.19   72.05           92.46   94.48   88.35
                 10%      90.34   69.49   78.69           90.52   74.23   74.12
                 20%      81.42   73.95   81.46           81.82   80.83   75.35
                 50%      65.54   75.13   82.59           66.22   91.44   76.24
[MST5]   469     5%       94.64   97.33   88.21           94.73   98.54   85.89
                 10%      90.60   93.19   78.85           90.68   95.53   74.84
                 20%      80.93   89.18   63.20           82.33   94.75   57.15
                 50%      63.03   40.96   0.70            64.28   54.80   26.19
[MST2]   559     5%       94.80   93.69   90.02           94.63   95.65   89.16
                 10%      89.99   78.71   80.13           90.06   82.30   75.44
                 20%      82.43   73.97   65.60           82.82   82.69   56.97
                 50%      66.08   27.86   7.89            65.76   32.58   4.93
[LST4]   2018    5%       94.76   96.62   88.61           1.02    98.52   86.97
                 10%      89.67   93.02   60.80           0.95    97.90   5.77
                 20%      4.09    1.20    9.09            1.06    0.65    6.76
                 50%      1.07    1.54    7.37            0.66    0.80    5.02
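The core difference between WO1 and WO2, order one versus order two of the word-level Markov model, can be illustrated with a short dictionary-based sketch (illustrative only; the paper's implementation is PHP and matrix-based):

```python
from collections import defaultdict

def markov_states(words, order):
    """Word-mechanism Markov model: order 1 (WO1) keys on single words,
    order 2 (the proposed WO2) keys on word pairs."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])][words[i + order]] += 1
    return model

words = "the cat sat on the mat the cat ran".split()
wo1 = markov_states(words, 1)   # 5 distinct single-word states
wo2 = markov_states(words, 2)   # 6 distinct word-pair states
# Order two spreads the same text over more, and more specific, states,
# which is why the watermark complexity and capacity of WO2 exceed WO1's.
```

The more specific word-pair states make the order-two watermark harder to preserve under tampering, which explains the higher sensitivity (and, on large documents, the lower insertion-attack PMR) reported for WO2.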


Comparative Results for Individual Datasets

In order to evaluate the accuracy of the proposed approach, we compare its experimental results with the WO1 approach under various scenarios of insertion, deletion and reorder attacks. To perform this comparison, we choose three classes of experimental datasets: SST4 as a small dataset, MST5 as a medium dataset, and LST4 as a large dataset.

Comparative Results under Various Scenarios for Individual Datasets

As shown in figure 13, under a 5% volume of the various attacks, the pattern matching rate (PMR) of WO2 is better than the PMR of WO1 under deletion attacks for all datasets. However, under insertion and reorder attacks, the PMR of WO1 is better than the PMR of WO2 for all datasets except MST5 under the insertion attack. This means the proposed approach provides added value for all sizes of text documents under deletion attacks.


Fig. 13 Comparison results between (WO1) and (WO2) under 5% of various attacks


As shown in figure 14, the performance of the WO2 approach is better than WO1 on medium text documents under insertion and deletion attacks. On the other hand, WO1 is better than WO2 under reorder attacks for all datasets, and under the insertion attack on large text documents such as LST4. This means the proposed approach is robust against deletion attacks for all sizes of text documents, is recommended for small and medium text documents under this range of insertion attacks, and is not applicable under insertion and reorder attacks for large text documents.


Fig. 14 Comparison results between (WO1) and (WO2) under 10% of various attacks

As shown in figure 15, in comparison with the 20% scenario of the various attacks, the robustness of the proposed approach (WO2) improves for small and medium text documents as the volume of insertion and deletion attacks increases. On the other hand, the comparative results also show that the robustness of the WO2 approach is worse than that of the WO1 approach under this rate of reorder attack for all datasets. This means that the proposed approach (WO2) is applicable under attack volumes of 20% and below for small and medium text documents, but is not recommended for large text documents.


Fig. 15 Comparison results between (WO1) and (WO2) under 20% of various attacks

As shown in figure 16, in comparison with the previously discussed scenarios, the robustness of the proposed approach is still better than that of the WO1 approach under this rate (50%), especially under insertion and deletion attacks, for all datasets, although the robustness decreases for large text documents. In other words, the proposed approach provides added value in terms of robustness on small and medium text documents, especially under insertion and deletion attacks.


Fig. 16 Comparison results between (WO1) and (WO2) under 50% of various attacks

Comparative Results under Various Scenarios for All Datasets

Figure 17 shows the performance of the two approaches as applied under 5% of the various kinds of attacks on the different datasets. As shown, for all datasets the proposed approach WO2 performs better under insertion and deletion attacks, while WO1 performs better under reorder attacks. In general, this shows that the proposed approach is recommended under low volumes of all tampering attacks for all sizes of text documents.

Fig. 17 Comparison results between (WO1) and (WO2) under 5% of various attacks for all datasets

Figure 18 illustrates the comparative results under a 10% rate of the various attacks. As shown, for all datasets the WO1 and WO2 approaches are close together under insertion and deletion attacks, except on the large dataset (LST4), where WO1 is better under insertion attacks. Under the reorder attack, the comparison shows that the WO1 approach is better than WO2 for all datasets.

Fig. 18 Comparison results between (WO1) and (WO2) under 10% of various attacks for all datasets

As applied under 20% of the different attacks for all datasets, the performance of the two approaches is essentially the same under insertion and deletion attacks, as shown in figure 19. However, the performance of the WO1 approach is better than WO2 under reorder attacks.

Fig. 19 Comparison results between (WO1) and (WO2) under 20% of various attacks for all datasets

Figure 20 illustrates the comparative results under a high rate (50%) of the various attacks. As shown, for all datasets the proposed approach WO2 has the best performance and provides added value under insertion and deletion attacks, but it is not effective under reorder attacks.

Fig. 20 Comparison results between (WO1) and (WO2) under 50% of various attacks for all datasets

Comparative Results of PMR Standard Deviation for Individual Datasets

In order to evaluate the performance of the proposed approach (WO2), we compute the PMR standard deviation between the WO1 and WO2 approaches (PMR of WO2 minus PMR of WO1) over all scenarios of each attack applied to each dataset, as shown in Table IV.

TABLE IV
STANDARD DEVIATION OF ALL SCENARIOS FOR ALL DATASETS UNDER VARIOUS ATTACKS

           Reference 27 (WO1)               This approach (WO2)
Dataset    Insertion  Deletion  Reorder     Insertion  Deletion  Reorder
SST4       78.95      83.00     99.44       79.14      84.18     54.54
SST2       82.30      78.44     99.52       82.76      85.25     78.52
MST5       82.30      80.17     87.53       83.01      85.91     61.02
MST2       83.33      68.56     98.19       83.32      73.31     56.63
LST4       47.40      48.10     99.23       0.92       49.47     26.13

The averages of the standard deviation over all scenarios for the small dataset (SST4), the medium dataset (MST5), and the large dataset (LST4) are shown respectively in figure 21. As shown, in the case of the SST4 dataset, the proposed approach WO2 is observed to be the best under insertion and deletion attacks. On the other side, the WO1 approach is the best under the reorder tampering attack, for which the difference in the standard deviation average from the proposed approach WO2 is -44.9; this means that the WO1 approach is recommended for detecting reorder attacks, while the performance has been improved by the proposed approach WO2 under insertion and deletion attacks. As shown in the case of the MST5 dataset, the performance of WO2 has improved under insertion and deletion attacks, especially under deletion attacks, with a difference in the standard deviation average from the WO1 approach of approximately 5.74. We also observe that the PMR of the proposed approach has improved under reorder attacks on the medium text document (MST5) compared with the small text document (SST4), but the WO1 approach is still the best under reorder tampering attacks.
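The per-dataset figures discussed here appear to be the averages of the Table III PMR values over the four attack volumes; the MST5 deletion entries and the 5.74 improvement cited above, for example, can be checked directly (a small verification sketch, not part of the paper's implementation):

```python
def scenario_average(pmr_values):
    """Average PMR over the four attack volumes (5%, 10%, 20%, 50%)."""
    return sum(pmr_values) / len(pmr_values)

# Table III PMR values for the MST5 dataset under deletion attacks:
wo1 = [97.33, 93.19, 89.18, 40.96]   # reference 27 (WO1)
wo2 = [98.54, 95.53, 94.75, 54.80]   # proposed approach (WO2)

avg_wo1 = scenario_average(wo1)      # ~80.17, the WO1 deletion figure for MST5
avg_wo2 = scenario_average(wo2)      # ~85.91, the WO2 deletion figure for MST5
diff = avg_wo2 - avg_wo1             # ~5.74, the improvement cited in the text
```

The same averaging over the four volumes reproduces the other insertion and deletion entries from the Table III columns.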

Finally, in the case of the large dataset LST4, the comparative results show that the PMR standard deviation of the WO2 approach is still the best under deletion attacks, and decreases under insertion and reorder attacks.


Fig. 21 PMR standard deviation of all scenarios for the SST4, MST5 and LST4 datasets under various attacks

Comparative Results of PMR Standard Deviation for All Datasets

As shown in figure 22, the average of the standard deviation over all scenarios for all datasets shows that the proposed approach WO2 has a positive difference from the WO1 approach (WO2 PMR minus WO1 PMR) under the deletion attack, equal to 3.97, and negative differences under the insertion (-9.03) and reorder (-41.42) attacks. Thus, WO2 provides added value and is recommended under deletion attacks, and is not recommended for insertion and reorder attacks.


Fig. 22 PMR standard deviation of all scenarios for all datasets under various attacks

V. CONCLUSION

Based on the word mechanism of the Markov model of order two, the authors have designed a text zero-watermark approach based on text analysis. The algorithm uses text features as probabilistic patterns of states and transitions in order to generate and detect the watermark. The proposed approach is implemented using the PHP programming language. The experimental results show that the proposed approach is sensitive to all kinds of random tampering attacks and has good tampering detection accuracy. Compared with the recent watermark approach named WO1, presented in reference [27], under random insertion, deletion and reorder attacks at multiple locations in five variable-size text datasets, the comparative results show that the watermark complexity is increased with the proposed approach and that it is not effective under reorder attacks. However, the tampering detection accuracy of the proposed approach is improved under all rates of deletion attacks for all sizes of text documents, and is close to the accuracy of the WO1 approach under insertion attacks. This means that the proposed approach provides added value and is recommended in these cases, but it is not robust against reorder attacks, especially for large text documents.


REFERENCES

[1] Z. Jalil, A. Hamza, S. Shahid, M. Arif, A. Mirza, A Zero Text Watermarking Algorithm based on Non-Vowel ASCII Characters. International Conference on Educational and Information Technology (ICET 2010), IEEE.
[2] M. A. Suhail, Digital Watermarking for Protection of Intellectual Property. University of Bradford, UK, 2008.
[3] L. Robert, A Study on Digital Watermarking Techniques. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 223-225, 2009.
[4] X. Zhou, S. Wang, S. Xiong, Security Theory and Attack Analysis for Text Watermarking. International Conference on E-Business and Information System Security, IEEE, pp. 1-6, 2009.
[5] J. Brassil, S. Low, N. F. Maxemchuk, Copyright Protection for the Electronic Distribution of Text Documents. Proceedings of the IEEE, Vol. 87, No. 7, pp. 1181-1196, July 1999.
[6] M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. Mohamed, S. Naik, Natural language watermarking: Design, analysis, and implementation. Proceedings of the Fourth Information Hiding Workshop, LNCS 2137, 2001.
[7] N. F. Maxemchuk, S. Low, Marking Text Documents. Proceedings of the IEEE International Conference on Image Processing, Washington, DC, pp. 13-16, Oct. 26-29, 1997.
[8] D. Huang, H. Yan, Interword distance changes represented by sine waves for watermarking text images. IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1237-1245, 2001.
[9] N. Maxemchuk, S. Low, Performance Comparison of Two Text Marking Methods. IEEE Journal on Selected Areas in Communications (JSAC), Vol. 16, No. 4, pp. 561-572, 1998.
[10] S. Low, N. Maxemchuk, Capacity of Text Marking Channel. IEEE Signal Processing Letters, Vol. 7, No. 12, pp. 345-347, 2000.
[11] M. Kim, Text Watermarking by Syntactic Analysis. 12th WSEAS International Conference on Computers, Heraklion, Greece, 2008.
[12] H. Meral, B. Sankur, A. Sumru, T. Gungor, E. Sevinc, Natural language watermarking via morphosyntactic alterations. Computer Speech and Language, 23, pp. 107-125, 2009.
[13] Z. Jalil, A. Mirza, A Review of Digital Watermarking Techniques for Text Documents. International Conference on Information and Multimedia Technology, pp. 230-234, IEEE, 2009.
[14] M. Atallah, C. McDonough, S. Nirenburg, V. Raskin, Natural Language Processing for Information Assurance and Security: An Overview and Implementations. Proceedings of the 9th ACM/SIGSAC New Security Paradigms Workshop, pp. 51-65, 2000.
[15] H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, Syntactic tools for text watermarking. In Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, 65050X, 2007.
[16] O. Vybornova, B. Macq, Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis. IEEE International Conference on Information Reuse and Integration, IEEE, 2007.
[17] M. Atallah, V. Raskin, C. Hempelmann, et al., Natural language watermarking and tamperproofing. Proc. of the 5th International Information Hiding Workshop, Noordwijkerhout, Netherlands, pp. 196-212, 2002.
[18] U. Topkara, M. Topkara, M. J. Atallah, The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions. In Proceedings of the ACM Multimedia and Security Conference, Geneva, 2006.
[19] Z. Jalil, A. Mirza, H. Jabeen, Word Length Based Zero-Watermarking Algorithm for Tamper Detection in Text Documents. 2nd International Conference on Computer Engineering and Technology, pp. 378-382, IEEE, 2010.
[20] Z. Jalil, A. Mirza, M. Sabir, Content based Zero-Watermarking Algorithm for Authentication of Text Documents. (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 2, 2010.
[21] Z. Jalil, A. Mirza, T. Iqbal, A Zero-Watermarking Algorithm for Text Documents based on Structural Components. pp. 1-5, IEEE, 2010.
[22] M. Yingjie, G. Liming, W. Xianlong, G. Tao, Chinese Text Zero-Watermark Based on Space Model. In Proceedings of the 3rd International Workshop on Intelligent Systems and Applications, pp. 1-5, IEEE, 2011.
[23] S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, Combined Text Watermarking. International Journal of Computer Science and Information Technologies, Vol. 1, No. 5, pp. 414-416, 2010.
[24] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, A Zero Text Watermarking Algorithm based on the Probabilistic Weights for Content Authentication of Text Documents. International Journal of Computer Applications (IJCA), U.S.A., pp. 388-393, 2012.
[25] Fahd N. Al-Wesabi, Adnan Z. Alsakaf, Kulkarni U. Vasantrao, A Zero Text Watermarking Algorithm Based on the Probabilistic Patterns for Content Authentication of Text Documents. International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, pp. 284-300, 2013, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[26] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, English Text Zero-Watermark Based on Markov Model of Letter Level Order Two. International Journal of Applied Cryptography (IJACT), Inderscience, submitted.
[27] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, Content Authentication of English Text Documents Using Word Mechanism Order One of Markov Model and Zero-Watermarking Techniques. Applied Soft Computing, Elsevier, submitted.

