Steganalysis

A Seminar Report on Steganalysis
Prepared by: Pratixa I Mistry
Roll No.: 110420704003
Class: MEEC
Semester: 3rd Semester
Year: 2nd Year
Guided by: Prof. Chirag N Paunwala
Department Of Electronics & Communication Engineering Sarvajanik College of Engineering & Technology Dr R.K. Desai Road, Athwalines, Surat - 395001, India
STEGANALYSIS
A Seminar Report
Submitted by
Ms. PRATIXA I MISTRY

Enrollment Number (110420704003)
in partial fulfillment for the award of the degree of
MASTER OF ENGINEERING
IN
ELECTRONICS & COMMUNICATION
Year-2012-13
At
SARVAJANIK COLLEGE OF ENGINEERING & TECHNOLOGY DR.R.K. DESAI ROAD, ATHWALINES SURAT-395001, INDIA
Abstract
Steganography and steganalysis are important topics in information hiding. Steganography refers to the technology of hiding data into digital media without drawing any suspicion, while steganalysis is the art of detecting the presence of steganography. Steganalysis is a relatively new branch of research. While steganography deals with techniques for hiding information, the goal of steganalysis is to detect and/or estimate potentially hidden information from observed data with little or no knowledge about the steganography algorithm or its parameters. It is fair to say that steganalysis is both an art and a science. The art of steganalysis plays a major role in the selection of features or characteristics a typical stego message might exhibit, while the science helps in reliably testing the selected features for the presence of hidden information. Steganalysis has gained prominence in national security and forensic sciences since detection of hidden messages can lead to the prevention of disastrous security incidents. Steganalysis is a very challenging field because of the scarcity of knowledge about the specific characteristics of the cover media (an image, an audio or video file) that can be exploited to hide information and detect the same. The approaches adopted for steganalysis also sometimes depend on the underlying steganography algorithm(s) used.
Acknowledgement
I would like to thank Prof. Chirag N Paunwala for supervising my seminar and guiding me throughout its duration. He has always been supportive and eager to help. His great experience helped me immensely with the difficulties and delays that I faced in my seminar. My dear colleagues have also helped me, directly or indirectly. Finally, I thank my parents for their constant support and Almighty God for providing the strength to complete the work.
Ms. Pratixa I Mistry.
Table of Contents
Abstract
Acknowledgement
1 Introduction
1.1 Motivation
1.2 Types of Steganalysis
1.3 Basic Model
1.4 Evaluation Criteria
1.4.1 Criteria for Steganography
1.4.2 Criteria for Steganalysis
2 Literature review
2.1 LSB Matching Steganalysis
2.2 LSB Steganalysis
2.3 JSteg steganalysis
2.4 Universal (blind) image Steganalysis using Blockiness
2.4.1 Comparing The Calibrated image against the original image
2.4.2 Blockiness
2.5 Universal (blind) image Steganalysis
2.5.1 Parameterized Run-Length Representations
2.6 Universal JPEG steganalysis
2.6.1 Statistical models and information hiding
2.6.2 Feature Extraction
2.6.3 Feature Based JPEG Steganalysis using Neighboring Joint Density Based Features
2.7 Improving Steganographic Security
REFERENCES
LIST OF FIGURES
Figure 1.1 The model of steganography and steganalysis [4]
Figure 1.2 Confusion matrix [4]
Figure 1.3 ROC curve [4]
Figure 2.1 The flow chart of extracting feature
Figure 2.2 (a) Cover Image in RGB Colour Model (b) Stego-Image in RGB Colour Model [6]
Figure 2.3 (a) Cover Image in HSI Colour Model (b) Stego-Image in HSI Colour Model [6]
Figure 2.4 The histogram of a clean 8-bit JPEG image [1]
Figure 2.5 The histogram of an 8-bit stegogramme produced using JSteg [1]
Figure 2.6 Comparing the histogram of the calibrated image with that of the cover image [1]
Figure 2.7 Graphical representation of the blockiness algorithm [1]
Figure 2.8 Example of run length histograms of original and stego Lena [8]
Figure 2.9 (a): quantization RLHs with Q = 4; (b): difference RLHs with a difference threshold of 2 [8]
Figure 2.10 The DCT neighboring joint density probabilities and the difference between the cover and the steganograms [9]
1 Introduction
Cryptography is often used to protect the secrecy of information by making messages illegible. However, indecipherable messages may raise an opponent's suspicion and may lead the opponent to disrupt the communication. Steganography therefore has a role to play in information security [1]. Steganography refers to the technique of hiding information in digital media in order to conceal the very existence of that information. Media with and without hidden information are called stego media and cover media, respectively [1]. Steganography can serve both legal and illegal interests. For example, civilians may use it to protect their privacy, while terrorists may use it to spread terrorist information. Compared with digital watermarking, another branch of information hiding, steganography places more emphasis on preserving the secrecy of the information than on making the hidden information robust to attacks.

Image steganography can be done in two ways: spatial steganography and JPEG steganography. Spatial steganography directly changes the image pixel values to hide data, and its embedding rate is usually measured in bits per pixel (bpp). JPEG is the common format of images produced by digital cameras, scanners, and other photographic image capture devices, so hiding secret information in JPEG images may provide better camouflage. Most JPEG steganographic schemes embed data into the non-zero alternating current (AC) discrete cosine transform (DCT) coefficients, because most of the DCT energy is concentrated in the low frequencies (the DC coefficient) and less in the higher frequencies (the AC coefficients). Modifying AC coefficients therefore affects image quality less than modifying DC coefficients. As a result, the embedding rate of JPEG steganography is usually evaluated in bits per non-zero AC DCT coefficient (bpac).
Steganalysis is the art of deterring covert communication without affecting innocent communication. Its basic requirement is to determine accurately whether a secret message is hidden in the medium under test [1]. Further requirements may include identifying the type of steganography, estimating the approximate length of the message, or even extracting the hidden message. Steganography and steganalysis are engaged in a game of hide-and-seek [5]: each tries to defeat the other, and each develops alongside the other. Based on the medium used to embed the message, steganography is classified into three basic types: image steganography, audio steganography and video steganography. Digital images have a high degree of redundancy in their representation and are pervasive in daily life, which makes them appealing for hiding data. As a result, the past decade has seen growing interest in research on image steganography and image steganalysis [3,5].
1.1 Motivation:
Image steganalysis is the science of analyzing images to discover methods of detecting hidden messages and data within them. On the steganography side, this is important for improving the algorithms that implement steganography. Once the flaws of an algorithm are exposed, its user can improve it so that it becomes more difficult to detect whether data is hidden in the images. Steganalysis is also especially important for security, namely for monitoring a user's communication with the outside world. In the age of the internet, images are sent by email or posted on websites. Detecting whether data is hidden in these images allows the monitor to analyze the suspicious images further in order to find out what the hidden message is. Steganalysis is very important to international security, as interest grows in whether terrorist organizations use steganographic techniques to communicate with each other. Steganalysis is therefore taken very seriously in security contexts.
1.2 Types of Steganalysis:

Based on the ultimate outcome of the effort, steganalysis can be classified into two categories [3]:

Passive steganalysis: detect the presence or absence of a hidden message in a stego signal, and identify the stego embedding algorithm.

Active steganalysis: estimate the embedded message length, estimate the locations of the hidden message, estimate the secret key used in embedding, estimate some parameters of the stego embedding algorithm, and extract the hidden message.

The methods described in this report are for image steganalysis and are of the passive type.
1.3 Basic Model:

The issue in steganography and steganalysis is often modeled by the prisoner's problem [4], which involves three parties, as illustrated in figure 1.1. Alice and Bob are two prisoners who collaborate to hatch an escape plan while their communications are monitored by a warden, Wendy. Using a data embedding method $\mathrm{Emb}(\cdot)$, Alice hides the secret information $m$ in a cover medium $X$ with a key $k_1$. The generation of an innocuous-looking stego medium $Y$ can be described as $Y = \mathrm{Emb}(X, m, k_1)$. On the receiver's side, the medium obtained by Bob, denoted by $Y'$, is passed to a data extraction method $\mathrm{Ext}(\cdot)$ to extract the information $m'$ with a key $k_2$. The extraction process can be described as $m' = \mathrm{Ext}(Y', k_2)$. The steganographic scheme should ensure $m' = m$. Although public key steganographic schemes are considered in some of the literature, the private key scheme, where $k_1 = k_2$ is assumed, remains the most common scenario in a steganographic system.

Wendy can be active or passive, depending on how she examines the media in transmission. If she makes $Y' \neq Y$ in order to foil all possible covert communication between Alice and Bob, she is called an active warden. If she only takes action when $Y$ is found suspicious, she is a passive warden. In the passive warden case, which is the main focus of this report, the steganographic method is considered broken once Wendy can differentiate $Y$ from $X$.

Note that this model only aims to explain the concepts of steganography and steganalysis, not to detail how they are carried out in practice.
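[Figure 1.1 diagram: Alice passes the cover image x, the secret message m and the secret key k1 to data embedding, which produces the stego image y. The stego image travels over the channel, where Wendy performs steganalysis. Bob passes the received stego image y and the secret key k2 to data extraction to recover the secret message m.]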
Figure 1.1 The model of steganography and steganalysis[4]
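The roles in the prisoner's problem can be sketched with a toy private key scheme. This is only an illustration under stated assumptions, not the model of [4]: the cover is a flat list of 8-bit pixel values, embedding is plain LSB replacement, the key seeds Python's `random` module to select pixel positions, and the names `embed`, `extract` and `active_warden` are hypothetical.

```python
import random

def _positions(num_pixels, num_bits, key):
    """Key-dependent pseudo-random choice of pixel positions (illustrative only)."""
    return random.Random(key).sample(range(num_pixels), num_bits)

def embed(cover, message_bits, key):
    """Alice: write each message bit into the LSB of a key-selected pixel."""
    stego = list(cover)
    positions = _positions(len(cover), len(message_bits), key)
    for pos, bit in zip(positions, message_bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego

def extract(received, num_bits, key):
    """Bob: read the LSBs back from the same key-selected pixels."""
    return [received[pos] & 1 for pos in _positions(len(received), num_bits, key)]

def active_warden(medium):
    """Wendy as an active warden: alter every medium by clearing all LSBs."""
    return [p & ~1 for p in medium]

rng = random.Random(0)
cover = [rng.randrange(256) for _ in range(64)]
message = [1, 0, 1, 1, 0, 0, 1, 0]
shared_key = 1234  # private key scheme: the same key is used on both sides

stego = embed(cover, message, shared_key)
recovered = extract(stego, len(message), shared_key)
foiled = extract(active_warden(stego), len(message), shared_key)
```

With a passive warden the received medium equals the stego medium and `recovered` matches `message`. After the active warden's modification, the extracted bits no longer carry the message.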
1.4 Evaluation Criteria:

In order to evaluate the performance of the various steganographic and steganalytic methods reasonably, it is necessary to define criteria that are acceptable to the majority. Moreover, the evaluation criteria may also point in the right direction for improving the techniques.
1.4.1 Criteria for Steganography:

Three common requirements, security, capacity, and imperceptibility, may be used to rate the performance of steganographic techniques [4].

Security: Steganography may be subject to many active or passive attacks, corresponding to Wendy acting as an active or passive warden in the prisoner's problem. If, in the presence of a given steganalytic system, the existence of the secret message can be estimated with a probability no higher than random guessing, the steganography may be considered secure under that steganalytic system. Otherwise it is considered insecure.

Capacity: To be useful in conveying a secret message, the hiding capacity provided by steganography should be as high as possible. It may be given as an absolute measurement (such as the size of the secret message) or as a relative value (called the data embedding rate, such as bits per pixel, bits per non-zero DCT coefficient, or the ratio of the secret message size to the cover medium size).

Imperceptibility: Stego images should not have severe visual artifacts. If the resulting stego image appears innocuous enough, this requirement can be considered well satisfied, since the warden does not have the original cover image to compare against.
1.4.2 Criteria for Steganalysis:

The main goal of steganalysis is to identify whether or not a suspected medium is embedded with secret data, in other words, to determine whether the medium under test belongs to the cover class or the stego class. When a steganalytic method is applied to a suspicious medium, there are four possible outcomes [4]:

True positive (TP): a stego medium is correctly classified as stego.
False negative (FN): a stego medium is wrongly classified as cover.
True negative (TN): a cover medium is correctly classified as cover.
False positive (FP): a cover medium is wrongly classified as stego.

Confusion Matrix: When a steganalytic method is applied to a testing data set, which may consist of cover and stego media, the outcomes can be summarized in a 2x2 confusion matrix [4], illustrated in figure 1.2.
                        True type
Detected type           Stego image               Cover image
Stego image             True positives (TPs)      False positives (FPs)
Cover image             False negatives (FNs)     True negatives (TNs)
Sum up by column        Number of stego images    Number of cover images
                        (TPs + FNs)               (FPs + TNs)
Figure 1.2 Confusion matrix [4]
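$$\text{TP Rate} = \frac{TPs}{TPs + FNs} \tag{1-1}$$

$$\text{FP Rate} = \frac{FPs}{FPs + TNs} \tag{1-2}$$

$$\text{Accuracy} = \frac{TPs + TNs}{TPs + FNs + TNs + FPs} \tag{1-3}$$

$$\text{Precision} = \frac{TPs}{TPs + FPs} \tag{1-4}$$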
Receiver Operating Characteristic (ROC) Curve: The performance of a steganalytic classifier may be visualized by an ROC curve [4], in which the true positive rate is plotted on the vertical axis and the false positive rate on the horizontal axis (see Figure 1.3). The larger the area under the ROC curve (AUC), the better the performance of the steganalytic method. For example, in Figure 1.3 the classifier with ROC curve C performs better than the one with curve B, and B performs better than A.
Figure 1.3 ROC curve[4]
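The criteria above can be computed directly from the confusion matrix counts. The sketch below is a minimal illustration, assuming the usual definitions of TP rate, FP rate, accuracy and precision, a classifier that outputs one score per medium (higher means "more likely stego"), distinct scores, and a trapezoidal approximation of the AUC. The function names and the example numbers are hypothetical.

```python
def detection_metrics(tp, fn, tn, fp):
    """Rates derived from the four confusion matrix counts."""
    return {
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),
    }

def roc_points(scores, labels):
    """ROC points (fp_rate, tp_rate) obtained by lowering the decision threshold.

    labels: 1 for stego, 0 for cover. Scores are assumed to be distinct.
    """
    n_stego = sum(labels)
    n_cover = len(labels) - n_stego
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda t: -t[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_cover, tp / n_stego))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical test set: 100 stego and 100 cover images.
metrics = detection_metrics(tp=80, fn=20, tn=90, fp=10)

# Hypothetical classifier scores for three stego (1) and three cover (0) images.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

An AUC of 0.5 corresponds to random guessing and an AUC of 1 to perfect separation of the cover and stego classes.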
2 Literature review
Steganalysis can be regarded as a two-class pattern classification problem that aims to determine whether a medium under test is a cover medium or a stego one. According to its field of application, it can be divided into specific methods and universal methods [4]. A specific steganalytic method fully utilizes knowledge of a targeted steganographic technique and may only be applicable to that kind of steganography. A universal steganalytic method can be used to detect several kinds of steganography. Universal methods usually do not require knowledge of the details of the embedding operations, and are therefore also called blind methods. Some methods can be considered "semi-universal": they can reliably detect many JPEG steganographic schemes but may not be effective against spatial steganography.
2.1 LSB Matching Steganalysis:

LSB matching is a minor modification of LSB steganography. Instead of replacing the LSBs of the cover image pixels, LSB matching adds 1 to or subtracts 1 from a pixel value when its LSB does not match the message bit. The authors of [2,10] focus on image steganalysis based on higher-order image statistics, using the neighborhood information of pixels (NIP) to distinguish stego images from original ones. They subtract the gray values of adjacent pixels to capture neighborhood information. Adjacent pixels in a neighborhood carry more local information about the image itself and are therefore suitable for steganalysis. Inspired by these ideas, they developed image steganalysis feature sets based on NIP.

Let $a_{x,y}$ be the gray value of the image pixel at coordinates $(x, y)$. The neighbor set $N(x,y) = \{a_{x-1,y}, a_{x,y+1}, a_{x+1,y}, a_{x,y-1}\}$ contains the gray values around pixel $(x, y)$. The gray value of the center pixel is subtracted from that of each neighbor, and the differences are thresholded with $T$. The differenced and thresholded set $DS(x,y)$ for a neighbor set $N(x,y)$ is

$$DS(x,y) = \{DS_1(x,y), DS_2(x,y), DS_3(x,y), DS_4(x,y)\}$$

$$DS_1 = Tsh(a_{x-1,y} - a_{x,y}), \quad DS_2 = Tsh(a_{x,y+1} - a_{x,y}), \quad DS_3 = Tsh(a_{x+1,y} - a_{x,y}), \quad DS_4 = Tsh(a_{x,y-1} - a_{x,y})$$

where $Tsh(\cdot)$ clips its input to the range $[-T, T]$:

$$Tsh(x) = \begin{cases} T, & x \ge T \\ x, & -T < x < T \\ -T, & x \le -T \end{cases}$$

After thresholding, the elements of $DS$ take values from $-T$ to $T$, so $DS$ has $(2T+1)^4$ possible states for any single pixel, because each of its four elements can take $2T+1$ values. Although thresholding reduces the number of possible states, and $T$ can be set to a very small number, $(2T+1)^4$ states are still too many for a useful histogram of $DS$. Hence rotation-invariant states are combined: in principle, states that are rotations of one another are mapped to the same value. Each $DS(x,y)$ is mapped to a code $c(x,y) \in [1, (2T+1)^4]$ such that states that are rotations of one another receive the same code and all other states receive distinct codes. After coding, the histogram $H$ of the coded $DS(x,y)$ is calculated as the feature set:

$$H = (h_1, h_2, \ldots, h_{(2T+1)^4}), \qquad h_i = \sum_{x,y} \delta(c(x,y), i), \qquad i = 1, 2, \ldots, (2T+1)^4 \tag{2-1}$$

where $\delta(x, y) = 1$ if $x = y$ and $\delta(x, y) = 0$ if $x \neq y$.

Although the dimensionality of $H$ equals $(2T+1)^4$ at this step, some bins of $H$ are always zero because of this special encoding method. These zero bins can be identified by a simple analysis. Removing the redundant zero bins yields a feature set denoted $F$, whose dimensionality is lower than that of $H$. The flowchart of the feature extraction is presented in figure 2.1.
[Figure 2.1 flow chart: Image -> Neighboring pixels difference -> Thresholding with T -> Rotation invariant coding -> Calculate normalized histogram -> NIP feature set]
Figure 2.1 The flow chart of extracting feature
The authors ran their experiments on the BOWS2 image database. The performance of the feature sets is assessed by their detection rate on test samples, using the true positive (TP) rate, the true negative (TN) rate, and the average rate (AR) to compare detection performance. Considering the trade-off between preserving the neighbor structure and keeping the feature dimensionality low, setting T = 3 is acceptable. Besides the previously defined neighborhood, the neighborhood of pixel $(x, y)$ can also be defined as the set of adjacent pixels on the diagonal and the mirror diagonal, $\{a_{x-1,y-1}, a_{x+1,y-1}, a_{x+1,y+1}, a_{x-1,y+1}\}$, and the NIP feature extracted with the same procedure. The dimensionality of this type of NIP feature is also 616. From the results given in [2], the TN and TP rates for LSB matching are higher for the horizontal and vertical NIP, and the results are better at 0.25 bpp than at 0.15 bpp embedding.
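The feature extraction steps for the horizontal and vertical neighborhood can be sketched as follows. This is a minimal illustration rather than the implementation of [2]: it assumes that "rotation invariant" means invariance under cyclic rotation of the four thresholded differences, represents the image as a list of rows of gray values, ignores border pixels, and numbers the rotation classes directly instead of removing zero bins afterwards. Under that assumption the number of classes for T = 3 comes out as 616, which agrees with the dimensionality quoted above. The function names are hypothetical.

```python
import random
from itertools import product

def build_rotation_codes(T):
    """Give every thresholded 4-tuple a code shared by all its cyclic rotations."""
    codes, next_code = {}, 0
    for state in product(range(-T, T + 1), repeat=4):
        # The lexicographically smallest rotation represents the whole class.
        canonical = min(state[k:] + state[:k] for k in range(4))
        if canonical not in codes:
            codes[canonical] = next_code
            next_code += 1
        codes[state] = codes[canonical]
    return codes, next_code

def clip(value, T):
    """Tsh(.): clip a difference to the range [-T, T]."""
    return max(-T, min(T, value))

def nip_features(image, T=3):
    """Normalized histogram of rotation-invariant codes over interior pixels."""
    codes, n_codes = build_rotation_codes(T)
    hist = [0] * n_codes
    rows, cols = len(image), len(image[0])
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            center = image[x][y]
            ds = (clip(image[x - 1][y] - center, T),
                  clip(image[x][y + 1] - center, T),
                  clip(image[x + 1][y] - center, T),
                  clip(image[x][y - 1] - center, T))
            hist[codes[ds]] += 1
    total = (rows - 2) * (cols - 2)
    return [h / total for h in hist]

rng = random.Random(1)
image = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
rotated = [list(r) for r in zip(*image[::-1])]  # the same image turned by 90 degrees

features = nip_features(image, T=3)
features_rotated = nip_features(rotated, T=3)
```

Rotating the image by 90 degrees only rotates each pixel's neighbor tuple cyclically, so `features` and `features_rotated` are identical.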
2.2 LSB Steganalysis:

LSB steganography replaces the LSBs of randomly selected pixels in the cover image with the secret message bits. The selection of pixels may be determined by a secret key. The author of [6] developed a steganalysis algorithm based on RGB to HSI colour model conversion. It was tested on a stego-image database obtained by implementing various RGB least significant bit steganographic algorithms. Three commonly used colour models are HSI, HSV, and RGB, and any of them can be converted to another by a mathematical expression. In the HSI model, the hue, saturation and intensity values are each derived from all three of the R, G and B values, so any change in the red, green or blue value is reflected in all values of the HSI colour model. The RGB to HSI conversion, expressed in terms of normalized RGB values, is:

$$H = \cos^{-1}\left\{\frac{\frac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^2+(R-B)(G-B)\right]^{1/2}}\right\} \tag{2-2}$$

$$S = 1 - \frac{3}{R+G+B}\left[\min(R, G, B)\right] \tag{2-3}$$

$$I = \frac{1}{3}(R+G+B) \tag{2-4}$$
In the proposed model, any given input image is converted to the HSI colour model, and by careful observation the stego image can be differentiated from the cover image, as shown in figures 2.2 and 2.3. Figure 2.2 shows the original and stego versions of the Ace picture in the RGB colour model, where it is difficult to differentiate the cover image from the stego-image by visual perception. When the same images are converted to the HSI colour model, visual distortion is seen in the top rows of the Ace stego-image, as shown in figure 2.3. The proposed method was tested only on stego-images generated by LSB steganography algorithms. Input images were chosen from various categories such as natural scenery, birds and animals. Images generated by the authors' own Stego Image Generator (SIG) tool were given as input.
Figure 2.2 (a) Cover Image in RGB Colour Model (b) Stego-Image in RGB Colour Model [6]
Figure 2.3 (a) Cover Image in HSI Colour Model (b) Stego-Image in HSI Colour Model [6]
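The conversion in equations 2-2 to 2-4 can be sketched per pixel as follows. The sketch makes three assumptions that are not stated in the text: "normalized" is taken to mean scaling 8-bit values to the range [0, 1], the hue is set to 360 degrees minus the angle when B > G (the common textbook convention), and the hue of a grey pixel is set to 0. The function name is hypothetical.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert 8-bit R, G, B values to (hue in degrees, saturation, intensity)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0  # assumed normalization
    total = r + g + b
    intensity = total / 3.0
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        hue = 0.0  # grey pixel: hue is undefined, 0 is used here by convention
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        theta = math.degrees(math.acos(ratio))
        hue = 360.0 - theta if b > g else theta  # assumed convention for B > G
    return hue, saturation, intensity

# A single LSB flip in the red channel moves all three HSI components.
before = rgb_to_hsi(200, 100, 50)
after = rgb_to_hsi(201, 100, 50)
```

Comparing `before` and `after` illustrates the point made above: a one-level change in a single RGB channel alters hue, saturation and intensity together.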
2.3 JSteg steganalysis:

JSteg embeds secret information into a cover image by successively replacing the LSBs of non-zero quantized DCT coefficients with secret message bits. As the author discusses in [1], the JSteg algorithm introduces Pairs of Values (PoVs) as a result of sequential bit-flipping. These PoVs can be illustrated by extracting all of an image's AC DCT coefficients and tallying their frequencies of occurrence. If we split the values into bins, we can narrow the results to a focused subsection and display them centered across a specified range x. The result is referred to as a histogram.
What we expect to see for a clean image is a histogram in which the frequencies of the DCT coefficients follow a roughly linear distribution on either side of zero. Because the values have not been altered by any embedding process, they have a clear structure, and this structure provides a characteristic for detecting steganography (see figure 2.4).
Figure 2.4 The histogram of a clean 8-bit JPEG image [1].
Figure 2.4 shows the histogram of a clean 8-bit JPEG image. As expected, the frequencies increase in a linear fashion towards zero and decrease after it. If we compare this histogram with that of a JSteg stegogramme at 80% embedding capacity (Figure 2.5), we can see how important a role the PoVs play in steganalysis.
Figure 2.5 The histogram of an 8-bit stegogramme produced using JSteg [1].
Figure 2.5 shows that the PoVs created by JSteg's bit-flipping methodology are apparent in the stegogramme's histogram. All of the values (except 0 and 1, which JSteg does not embed within) can be paired with a neighboring value because their frequencies of occurrence have become very similar. For example, the value -2 occurs roughly as often as the value -1, and similarly the value 2 occurs roughly as often as the value 3. This trait is characteristic of a stegogramme created by bit-flipping.
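The pairing effect can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not an analysis of real JPEG files: the "coefficients" are rounded samples from a Laplacian-like distribution, the message is a random bit stream that fills the full capacity (the figure above uses 80%), and LSB replacement uses two's complement arithmetic so that -2 pairs with -1 and 2 pairs with 3. The function names are hypothetical.

```python
import random
from collections import Counter

def jsteg_like_embed(coeffs, bits):
    """Sequentially replace LSBs, skipping the values 0 and 1 as JSteg does."""
    stego, bit_iter = [], iter(bits)
    for c in coeffs:
        if c in (0, 1):
            stego.append(c)
            continue
        bit = next(bit_iter, None)
        # (c & ~1) | bit also works for negative integers in Python.
        stego.append(c if bit is None else (c & ~1) | bit)
    return stego

def pair_imbalance(coeffs, a, b):
    """Relative difference between the counts of the values a and b."""
    hist = Counter(coeffs)
    return abs(hist[a] - hist[b]) / (hist[a] + hist[b])

rng = random.Random(7)
# Synthetic stand-in for quantized AC DCT coefficients (assumption).
cover = [round(rng.choice((-1, 1)) * rng.expovariate(1 / 3.0)) for _ in range(200000)]
usable = sum(1 for c in cover if c not in (0, 1))
message = [rng.getrandbits(1) for _ in range(usable)]
stego = jsteg_like_embed(cover, message)

cover_gap = pair_imbalance(cover, 2, 3)
stego_gap = pair_imbalance(stego, 2, 3)
```

In the cover data the count of 2 clearly exceeds the count of 3, while after embedding the two counts are nearly equal, which is the PoV signature described above.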
2.4 Universal (blind) image Steganalysis using Blockiness:

Perhaps the most important aspect of blind steganalysis is deriving an estimate of the cover image that is as accurate as possible. Attacks that follow this procedure often compare the data in the estimated cover image with that of the suspect image, so the estimate must be as sound as possible in order not to obscure the results. One of the most famous approaches for creating an estimate of the cover image is the model proposed by Jessica Fridrich [1,7], known as JPEG calibration. The method takes advantage of the fact that most stego-systems encode the message data in the transform domain during the compression procedure to produce JPEG stegogrammes. Given that the JPEG compression algorithm operates on 8x8 blocks of the image, and that the message is encoded within these blocks, we can estimate the cover work by introducing a new block structure and comparing the result with the suspect image. A large difference suggests that the suspect image is a stegogramme, whereas a small difference typically indicates that the image is innocent. The calibration process decompresses the suspect image, removes 4 pixels from each side, and then recompresses the result using the same quantisation table. Visually, and technically (by measures such as PSNR), the calibrated image is still very close to the suspect image. However, cropping and recompressing the image effectively breaks the block structure of the suspect image, because the second compression does not take the first into account.
2.4.1 Comparing The Calibrated image against the original image:

Perhaps the most effective way of determining how similar the images are is to overlay their histograms on one plot. Figure 2.6 shows the histograms of the cover image, the calibrated image, and the stegogramme. The stegogramme was created by embedding a message at 50% capacity using the F4 steganography algorithm. As Figure 2.6 shows, the histograms of the cover image and the calibrated image are very close together, meaning that the calibration process has been successful: it has created an image with roughly the same statistical properties as the original cover image, even though the cover image was not available at any point. Compared with these two histograms, the histogram of the stegogramme is much more distant from both the cover image and the calibrated image. If the plot for the cover image were removed from Figure 2.6, two clearly different plots would remain. We could guess that the suspect image is a stegogramme based on this information, but because the calibration process relies heavily on the information in the quantized DCT coefficients, this histogram cannot be used alone to make a final decision. The author of [1] illustrates a blind steganalytic method for evaluating the probability that the image is a stegogramme.
Figure 2.6 Comparing the histogram of the calibrated image with that of the cover image [1].
2.4.2 Blockiness:
The previous part derived an estimate of the cover image. We now need a statistical property that differs between the calibrated image and the suspect image, so that we can determine the probability that the image is a stegogramme. One of the strongest methods for achieving this is known as Blockiness, which takes advantage of the fact that JPEG-driven stego-systems encode the message data in the same 8x8 blocks that are used for compression. The method is defined best by Dongdong Fu in [1,7,11], where it is stated that: "[Blockiness] defines the sum of spatial discontinuities along the boundary of all 8x8 blocks of JPEG images". Essentially, the logic behind Blockiness is that a stegogramme will contain a different set of coefficients across the boundaries of each 8x8 block from that of a clean image. We can therefore total the boundary differences column-wise and row-wise for both a suspect image and a clean image (or our calibrated image) and then calculate the difference between the two. A large difference suggests that the image is a stegogramme, whilst a small difference is probably due to compression and therefore reflects a clean image. The formula for calculating the Blockiness of an image is shown in equation 2-5.
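$$B = \sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} \left| g_{8i,j} - g_{8i+1,j} \right| + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} \left| g_{i,8j} - g_{i,8j+1} \right| \tag{2-5}$$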
where $g_{i,j}$ denotes the value of the pixel at coordinates $(i, j)$ in an $M \times N$ grayscale image. To express this in graphical terms, consider Figure 2.7. It first shows the boundaries of the 8x8 blocks in (a), and then shows what these values look like in the spatial domain in (b). The red lines indicate the columns whose index is a multiple of 8, and the yellow lines represent their neighboring columns, whose index is a multiple of 8 plus 1. For each such pair of columns, the values in the yellow column are subtracted from those in the red column and the absolute differences are summed. The green rows are treated in the same way against the blue rows. The two totals are then added together to yield the blockiness value.
Figure 2.7 Graphical representation of the blockiness algorithm[1].
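The blockiness measure can be sketched directly on a grayscale image stored as a list of rows. This is a minimal illustration that assumes the absolute differences are taken per pixel across each block boundary, as in the commonly cited form of the measure. The helper `crop4` only performs the cropping step of calibration; the JPEG decompression and recompression steps are omitted, and both function names are hypothetical.

```python
def blockiness(g):
    """Sum of absolute pixel differences across all 8x8 block boundaries."""
    M, N = len(g), len(g[0])
    total = 0
    # Horizontal boundaries: between rows 8i and 8i+1 (1-based numbering).
    for i in range(1, (M - 1) // 8 + 1):
        total += sum(abs(g[8 * i - 1][j] - g[8 * i][j]) for j in range(N))
    # Vertical boundaries: between columns 8j and 8j+1 (1-based numbering).
    for j in range(1, (N - 1) // 8 + 1):
        total += sum(abs(g[i][8 * j - 1] - g[i][8 * j]) for i in range(M))
    return total

def crop4(g):
    """Remove 4 pixels from each side, which shifts the 8x8 block grid."""
    return [row[4:-4] for row in g[4:-4]]

# Synthetic 24x24 image that is constant inside each 8x8 block, so every
# discontinuity sits exactly on a block boundary.
blocky = [[10 * (r // 8) + 20 * (c // 8) for c in range(24)] for r in range(24)]

b_suspect = blockiness(blocky)
b_cropped = blockiness(crop4(blocky))
```

For this synthetic image all discontinuities lie on the original block grid, so `b_suspect` is large, while after cropping the measured boundaries fall inside the old blocks and `b_cropped` drops to zero.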
2.5 Universal (blind) image Steganalysis:

In [8] the authors propose a simple but effective method for blind image steganalysis based on run-length histogram analysis. Higher-order statistics of the characteristic functions of three types of image run-length histograms are selected as features. Their method is described below. For a given image, a run-length matrix $p(i, j)$ is defined as the number of runs with pixels of gray level $i$ and run length $j$. For a run-length matrix $p(i, j)$, let $M$ be the number of gray levels and $N$ be the maximum run length. The image run-length histogram (RLH) can be defined as a vector:

$$\mathrm{RLH}(j) = \sum_{i=1}^{M} p(i, j), \qquad 1 \le j \le N \tag{2-6}$$
This vector represents the total number of runs of length j in the corresponding image. In order to reduce the effect of different image sizes, the RLH may be normalized by the maximal value of the histogram. The lengths of runs reflect the details of the image texture elements, and hence the local intensity variations of the image. This is the major reason for using the run-length histogram as the basis of the features for steganalysis. After per-pixel data embedding, the perceived local intensity continuity is disturbed and the corresponding pixel runs are altered. For example, after LSB data embedding, the values of some image pixels are increased or decreased by one. These changes directly influence the image RLH: long runs in the image break into short runs, leading to a smaller number of long runs and a larger number of short runs. As a result, the image RLH shrinks. Although there are also cases in which short runs combine into a long run, such combinations are much less frequent than the splitting of long runs, because of the spatial correlation of natural images. Figure 2.8 shows the RLHs of the Lena image before and after data hiding, where the shrinkage is clearly seen.
Figure 2.8 Example of run length histograms of original and stego Lena [8]. Horizontal axis: Run Length; vertical axis: Log(Number of Run Length + 1).
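The run-length histogram and its shrinkage can be illustrated with a small sketch. It assumes horizontal runs only and uses two hypothetical toy images (the names `cover` and `stego`, and the pixel values, are made up for illustration and are not taken from [8]); the helper names are likewise assumptions.

```python
from collections import Counter

def run_lengths(row):
    """Return (gray_level, run_length) pairs for one row of pixels."""
    runs = []
    start = 0
    for k in range(1, len(row) + 1):
        # Close the current run at the end of the row or when the gray level changes.
        if k == len(row) or row[k] != row[start]:
            runs.append((row[start], k - start))
            start = k
    return runs

def run_length_matrix(image):
    """p[(i, j)] = number of horizontal runs with gray level i and run length j."""
    p = Counter()
    for row in image:
        p.update(run_lengths(row))
    return p

def run_length_histogram(image):
    """Sum p(i, j) over all gray levels i for each run length j, as in Eq. 2-6."""
    hist = Counter()
    for (_, length), count in run_length_matrix(image).items():
        hist[length] += count
    return hist

# Hypothetical toy images: the stego version differs by +1 in two pixels,
# as per-pixel embedding might cause.
cover = [[10, 10, 10, 10, 10, 10],
         [20, 20, 20, 20, 7, 7]]
stego = [[10, 10, 11, 10, 10, 10],
         [20, 21, 20, 20, 7, 7]]

h_cover = run_length_histogram(cover)  # one run each of length 2, 4 and 6
h_stego = run_length_histogram(stego)  # three runs of length 1, three of length 2, one of length 3
```

In this toy case two changed pixels are enough to remove every run longer than 3, which is the shrinkage effect described above.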
2.5.1 Parameterized Run-Length Representations:

The two run-length representations defined below produce many more long runs in the image run-length matrix than the traditional run-length matrix does. As a result, the shrinkage of the corresponding RLH caused by data hiding becomes much more obvious, and the RLH is more sensitive to data embedding, as can be seen in Figure 2.9. For natural images, the number of short runs in an image RLH is significantly larger than the number of long runs. The maximal run length is usually very limited compared to the range of possible length values (see Figure 2.9). To make the shrinkage of the image RLH more obvious, and the RLH therefore more sensitive to data embedding, two new run-length representations are defined. They are variations of the traditional run-length matrix p(i, j) that group pixels into the same run using different rules and parameters.
Quantization Run-length Representation: First apply intensity quantization to the image plane using a quantization step factor Q, then calculate the RLH of the quantized image matrix. For example, for a 256 gray-level image with Q = 2, we get a new image matrix whose intensity values range from 0 to 127. Hence, the number of long runs in the new image RLH increases compared to the original image RLH, because each pair of neighboring intensities falls into the same run. The larger Q is, the more long runs we can expect. The traditional image run-length matrix is just the special case of the quantization run-length matrix with Q = 1.
Difference Run-length Representation: A run in this representation is defined as a string of pixels along a direction with a maximum inter-pixel absolute intensity difference of Δ. Thus, a string of consecutive pixels with small intensity differences forms a single run. For example, for a string of 4 image pixels with intensities 124, 125, 125 and 126, the corresponding traditional run-length matrix entries are p(124, 1), p(125, 2) and p(126, 1), while the corresponding difference run-length matrix entry for Δ = 2 is p(124, 4). Similarly, the larger Δ is, the more long runs we obtain. When Δ is 0, the difference run-length matrix is simply the traditional image run-length matrix.
Figure 2.9 (a): quantization RLHs with Q = 4; (b): difference RLHs with difference threshold Δ = 2 [8]. Horizontal axes: Run Length; vertical axes: Log(Number of Run Length + 1).
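The two parameterized representations can be sketched on the four-pixel example from the text. Two assumptions are made here that the text does not spell out: quantization is taken to be integer division by Q, and a difference run is taken to continue while each pixel stays within the difference threshold (named `delta` in the code) of the first pixel of the run. The function names are hypothetical.

```python
def difference_runs(row, delta):
    """Runs in which every pixel differs from the run's first pixel by at most delta.

    With delta = 0 this is the traditional run-length definition.
    """
    runs = []
    start = 0
    for k in range(1, len(row) + 1):
        if k == len(row) or abs(row[k] - row[start]) > delta:
            runs.append((row[start], k - start))
            start = k
    return runs

def quantized_runs(row, q):
    """Traditional runs of the row after intensity quantization with step q."""
    return difference_runs([v // q for v in row], 0)

pixels = [124, 125, 125, 126]
traditional = difference_runs(pixels, 0)  # [(124, 1), (125, 2), (126, 1)]
merged = difference_runs(pixels, 2)       # [(124, 4)]
quantized = quantized_runs(pixels, 2)     # [(62, 3), (63, 1)]
```

Both variants turn three short runs into fewer, longer ones, so a single changed pixel in a stego image breaks a longer run and moves more mass in the histogram.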
The authors chose a commonly used image database, the CorelDraw database, for their experiments. In total, 1142 images from CorelDraw version 11 CD #4 were collected as the original images. Six sets of stego images were then generated using six different stego-algorithms of both spatial-domain and transform-domain types.
2.6 Universal JPEG steganalysis:

In [9] the authors propose a blind image steganalysis method based on intra-block and inter-block neighboring joint density. The neighboring joint densities on both intra-block and inter-block are extracted from the DCT coefficient array. After the feature space has been constructed, a binary classifier such as an SVM is used for training and classification (the classifier is not discussed in this paper).
2.6.1 Statistical models and information hiding:

A model probability density function (PDF) can characterize the statistical behavior of a signal. For multimedia signals, the Generalized Gaussian distribution (GGD) is often used. The GGD can be applied to model the distribution of Discrete Cosine Transform (DCT) coefficients, wavelet transform coefficients, pixel differences, etc. Thus, it may be used in video and geometry compression, watermarking, etc. The GGD is also known in economics as the Generalized Error Distribution (GED). The probability density function of a continuous random variable x following the GGD takes the form:
\rho(x; \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left(-\left(|x|/\alpha\right)^{\beta}\right)    (2-7)
\Gamma(z) = \int_{0}^{\infty} e^{-t}\, t^{z-1}\, dt, \quad z > 0    (2-8)
Here Γ(z) is the Gamma function, the scale parameter α models the width of the PDF peak and the shape parameter β models the shape of the distribution. There exists a dependency between the compressed DCT coefficients and their neighbors, and information hiding modifies the neighboring joint density of the DCT coefficients. Let the left or upper adjacent DCT coefficient be denoted by the random variable X1 and the right or lower adjacent DCT coefficient by the random variable X2; let X = (X1, X2). When hidden data are embedded in the compressed DCT domain of JPEG images by a steganographic algorithm, the neighboring joint probability density of the DCT coefficients is affected, and these changes are helpful for steganalysis. The change in joint density due to message embedding is shown by the following example. Figure 2.10 (a), (b) and (c) show the cover image, an F5 steganogram carrying some hidden data and a Steghide steganogram carrying some hidden data, each together with the neighboring joint density of its compressed DCT coefficients. From these panels it is clear that the neighboring joint density is approximately symmetric about the origin. Figure 2.10 (d) shows the difference between the neighboring joint density of the cover image and those of the F5 and Steghide steganograms. Embedding a message thus modifies the neighboring joint density.
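As a quick numerical check of the GGD density in Eq. 2-7, the following sketch evaluates it with the standard-library Gamma function. The function names, the parameter values and the midpoint-rule integrator are illustrative assumptions, not part of [9].

```python
import math

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha > 0 and shape beta > 0 (Eq. 2-7)."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule numerical integration, adequate for a sanity check."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

# beta = 1 gives the Laplacian density, whose peak is 1 / (2 * alpha).
laplace_peak = ggd_pdf(0.0, alpha=1.0, beta=1.0)
# beta = 2 with alpha = sqrt(2) gives the standard Gaussian, peak 1 / sqrt(2 * pi).
gauss_peak = ggd_pdf(0.0, alpha=math.sqrt(2.0), beta=2.0)
# Any valid (alpha, beta) should integrate to approximately 1.
area = integrate(lambda x: ggd_pdf(x, 1.0, 1.5), -20.0, 20.0)
```

The two special cases show how the shape parameter moves the model between a sharply peaked, heavy-tailed density and the Gaussian.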
(a) A cover and neighboring joint density
(b) The F5 steganogram and neighboring joint density
(c) The Steghide steganogram and neighboring joint density
(d) The difference of the neighboring joint density
Figure 2.10 The DCT neighboring joint density probabilities and the difference between the cover and the steganograms [9]
Neighboring Joint Density Features: Information hiding modifies the neighboring joint density. When messages are embedded in the compressed DCT domain of JPEG images by any steganographic algorithm, the neighboring joint probability density of the DCT coefficients is affected, which gives a way to perform steganalysis.
2.6.2 Feature Extraction:

The neighboring joint density features are extracted from the DCT coefficient array on intra-block and inter-block respectively, as shown below. Let F denote the compressed DCT coefficient array of a JPEG image, which consists of M×N blocks F_ij (i = 1, 2, ..., M; j = 1, 2, ..., N), each of size 8×8. The intra-block neighboring joint density matrix in the horizontal direction, NJ1h, and the matrix in the vertical direction, NJ1v, are constructed as follows:
NJ_{1h}(x, y) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{m=1}^{8} \sum_{n=1}^{7} \delta(c_{ijmn} = x,\ c_{ijm(n+1)} = y)}{56MN}    (2-9)
NJ_{1v}(x, y) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{m=1}^{7} \sum_{n=1}^{8} \delta(c_{ijmn} = x,\ c_{ij(m+1)n} = y)}{56MN}    (2-10)
where c_ijmn stands for the compressed DCT coefficient located at the mth row and the nth column of block F_ij, and δ = 1 if and only if its arguments are satisfied (otherwise δ = 0). For computational efficiency, the intra-block neighboring joint density features NJ1 are calculated by:
NJ_{1}(x, y) = \{NJ_{1h}(x, y) + NJ_{1v}(x, y)\} / 2    (2-11)
Here the values of x and y are in the range [-6, +6], so NJ1 has 13 × 13 = 169 features. Similarly, the inter-block neighboring joint density matrix in the horizontal direction, NJ2h, and the matrix in the vertical direction, NJ2v, are constructed as follows:
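NJ_{2h}(x, y) = \frac{\sum_{m=1}^{8} \sum_{n=1}^{8} \sum_{i=1}^{M} \sum_{j=1}^{N-1} \delta(c_{ijmn} = x,\ c_{i(j+1)mn} = y)}{64M(N-1)}    (2-12)
NJ_{2v}(x, y) = \frac{\sum_{m=1}^{8} \sum_{n=1}^{8} \sum_{i=1}^{M-1} \sum_{j=1}^{N} \delta(c_{ijmn} = x,\ c_{(i+1)jmn} = y)}{64(M-1)N}    (2-13)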
The inter-block neighboring joint density NJ2 is calculated by:
NJ_{2}(x, y) = \{NJ_{2h}(x, y) + NJ_{2v}(x, y)\} / 2    (2-14)
Similarly, the values of x and y are in the range [-6, +6] and NJ2 has 169 features. Hence 169 features are extracted from each of the intra-block and inter-block neighboring joint densities, giving 338 features in total from the DCT coefficient array.
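The feature extraction described above can be sketched in plain Python. The sketch assumes the quantized DCT coefficients are already available as a nested list `F[i][j][m][n]` (block row, block column, row in block, column in block); obtaining them from a JPEG file is outside its scope. All function names, the random toy data and the normalizing constants (pair counts per direction) are assumptions made for illustration, not the implementation of [9].

```python
import random

def joint_density(pairs, norm, lim=6):
    """Normalized count of coefficient pairs (x, y) with both values in [-lim, lim]."""
    size = 2 * lim + 1
    density = [[0.0] * size for _ in range(size)]
    for x, y in pairs:
        if abs(x) <= lim and abs(y) <= lim:
            density[x + lim][y + lim] += 1.0 / norm
    return density

def average(a, b):
    """Element-wise mean of two equally sized matrices."""
    return [[(u + v) / 2.0 for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def intra_block(F):
    """NJ1: pairs of horizontally and vertically adjacent coefficients inside each block."""
    M, N = len(F), len(F[0])
    blocks = [F[i][j] for i in range(M) for j in range(N)]
    horiz = [(b[m][n], b[m][n + 1]) for b in blocks for m in range(8) for n in range(7)]
    vert = [(b[m][n], b[m + 1][n]) for b in blocks for m in range(7) for n in range(8)]
    return average(joint_density(horiz, 56 * M * N), joint_density(vert, 56 * M * N))

def inter_block(F):
    """NJ2: pairs of coefficients at the same position in adjacent blocks."""
    M, N = len(F), len(F[0])
    pos = [(m, n) for m in range(8) for n in range(8)]
    horiz = [(F[i][j][m][n], F[i][j + 1][m][n])
             for i in range(M) for j in range(N - 1) for m, n in pos]
    vert = [(F[i][j][m][n], F[i + 1][j][m][n])
            for i in range(M - 1) for j in range(N) for m, n in pos]
    return average(joint_density(horiz, 64 * M * (N - 1)),
                   joint_density(vert, 64 * (M - 1) * N))

def features(F):
    """Flatten NJ1 and NJ2 into one 338-element feature vector."""
    nj1, nj2 = intra_block(F), inter_block(F)
    return [v for row in nj1 for v in row] + [v for row in nj2 for v in row]

# Hypothetical toy input: 2 x 3 blocks of random small coefficients.
rng = random.Random(0)
F = [[[[rng.randint(-3, 3) for _ in range(8)] for _ in range(8)]
      for _ in range(3)] for _ in range(2)]
vec = features(F)
```

Because every toy coefficient lies inside [-6, +6], each of the two densities sums to 1. On real images, pairs outside that range are simply not counted, so the sums fall below 1.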
2.6.3 Feature Based JPEG Steganalysis using Neighboring Joint Density Based Features:
From the intra-block neighboring joint density 169 features are extracted, and from the inter-block neighboring joint density another 169, so in total 338 distinguishing statistics are available for steganalysis. After the features have been extracted from both stego and cover images, they are given to a binary classifier such as an SVM for training. After training is completed, the features from test images are given to the classifier for classification.
2.7 Improving Steganographic Security:

Several factors may influence steganographic security, such as the number of changed pixels/coefficients and the properties of the cover images. Some techniques for making steganography less detectable are discussed below [4], with respect to the basic model (Figure 1.1).
Increasing the Embedding Efficiency: If cover images did not need to be modified at all to convey secret information, the warden certainly could not differentiate cover images from stego images. Therefore, if the probability of modifying the image is lower, there are fewer embedding changes to the image, and the security of the steganographic method may increase. The embedding efficiency is defined as the number of embedded bits per embedding change. Hence, increasing the embedding efficiency is a possible way to enhance steganographic security.
Reducing the Embedding Distortion: Increasing the embedding efficiency reduces the number of embedding changes to the image. However, it cannot guarantee that the distortion to the image is minimized. If not all of the coefficients are used for carrying data, Alice has the freedom to modify those coefficients whose resultant distortions after data embedding are the smallest. In this way, the stego image will be close to the cover image perceptually and statistically, thus enhancing steganographic security.
Selecting Proper Cover Images: In some scenarios, Alice has the freedom to select the cover images that yield the least suspicious stego images for conveying secret information. The better images can be chosen according to the available knowledge of a potential steganalyzer.
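The definition of embedding efficiency can be made concrete with a small simulation. The sketch below assumes plain LSB replacement with uniformly random message bits and a hypothetical random cover; under those assumptions about half of the pixels already carry the right bit, so the efficiency comes out close to 2 bits per change. The function names are illustrative.

```python
import random

def lsb_replace(pixels, bits):
    """Write each message bit into the least significant bit of one pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def embedding_efficiency(cover, stego, n_bits):
    """Embedded bits per embedding change; infinite if nothing had to change."""
    changes = sum(1 for c, s in zip(cover, stego) if c != s)
    return n_bits / changes if changes else float("inf")

# Hypothetical cover pixels and message bits, drawn with a fixed seed.
rng = random.Random(1)
cover = [rng.randrange(256) for _ in range(100000)]
bits = [rng.randrange(2) for _ in range(len(cover))]

stego = lsb_replace(cover, bits)
efficiency = embedding_efficiency(cover, stego, len(bits))  # close to 2 bits per change
```

A scheme that conveys the same number of bits with fewer changed pixels would score higher on this measure.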
REFERENCES
[1] Philip Bateman, Image Steganography and Steganalysis, Thesis of Master of Science in Security Technologies & Applications, University of Surrey, United Kingdom, August 2008.
[2] Qingxiao Guan, Jing Dong, and Tieniu Tan, An Effective Image Steganalysis Method Based on Neighborhood Information of Pixels, 18th IEEE International Conference on Image Processing, 2011.
[3] R. Chandramouli, A mathematical framework for active steganalysis, Springer Multimedia Systems, vol. 9, pp. 303-311, 2003.
[4] Bin Li, Junhui He, Jiwu Huang, Yun Qing Shi, A Survey on Image Steganography and Steganalysis, International Journal of Information Hiding and Multimedia Signal Processing, Vol. 2, No. 2, April 2011.
[5] Niels Provos, Peter Honeyman, Hide and Seek: An Introduction to Steganography, Journal of IEEE Computer Society, vol. 03, June 2003.
[6] P. Thiyagarajan, G. Aghila and V. Prasanna Venkatesan, Steganalysis using Colour Model Conversion, Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 4, December 2011.
[7] Dongdong Fu, Yun Q. Shi, Dekun Zou, Guorong Xuan, JPEG Steganalysis Using Empirical Transition Matrix in Block DCT Domain, IEEE Conference of Multimedia Signal Processing, pp. 310-313, Oct. 2006.
[8] Jing Dong, Tieniu Tan, Blind Image Steganalysis based on Run-length Histogram Analysis, 15th IEEE International Conference on Image Processing, 2008.
[9] Arun R, Nithin Ravi S and Thiruppathi K, Intra Block and Inter Block Neighboring Joint Density Based Approach for JPEG steganalysis, International Journal on Soft Computing (IJSC), Vol. 3, No. 2, May 2012.
[10] Vajiheh Sabeti, Shadrokh Samavi, Shahram Shirani, An adaptive LSB matching steganography based on octonary complexity measure, Springer Science+Business Media, LLC 2012.
[11] Kanchan Patil, Ravindra Gupta, Gajendra Singh, Digital Image Steganalysis Schemes for Breaking Steganography, International Journal of Computer Applications (IJCA), 2012.