
19th International Conference on Database and Expert Systems Applications

Steganography Detection by means of Neural Networks

Zuzana Oplatkova, Jiri Holoska, Ivan Zelinka, Roman Senkerik
Tomas Bata University in Zlin, Faculty of Applied Informatics, Nad Stranemi 4511, 762 72 Zlin, Czech Republic
{oplatkova}@fai.utb.cz

Abstract
Cryptography is one of the most discussed topics nowadays. The security of message transfer is very important, and specialists work continually on new cryptographic methods. Cryptography, on the other hand, can be and often is misused by criminals, so cryptanalysts have the equally important job of detecting, revealing and then decoding coded messages. Steganography is an additional method for securing messages which goes hand in hand with cryptography, which is why revealing such a message is not easy. This paper deals with neural network models that are able to detect steganographic content encoded by the program OutGuess. Neural networks are methods which adapt flexibly to various difficult problems. The results in this article show that the model used had almost 100 % success in revealing steganography produced by OutGuess.

1. Introduction
In the current world we cannot imagine our lives without computers. But with their usage, the question of secure data transfer appears very soon; therefore coding and cryptography are very important. On the other hand, this excellent work can be misused by criminals, which is dangerous. Cryptanalysts therefore work as hard as possible to detect and decode coded messages. Steganography and cryptography are more or less connected [1] - [4]. Cryptography relies on the strength of the key, and the message is encoded in some way. But if such a message is sent openly, attackers will notice it very soon and try to break it. Steganography helps here: it supports the secure transfer of secret messages by hiding a message inside a picture. Looking at such a picture, one normally does not recognize that a secret message is present, and this is the point: an attacker will pass over it and pay no attention to the message. Therefore it is necessary to have a method for its detection. Decoding the message itself is another problem; first of all, it has to be revealed that some message is hidden inside the picture. This paper is therefore focused on the detection of pictures in which a message was encoded by the program OutGuess. Firstly, the program OutGuess is described, then the method of extracting suitable samples which can be used for training a neural network. A description of the neural networks follows in the next sections, together with the results.

2. OutGuess
OutGuess is a universal steganography tool which is able to insert hidden information into the redundant bits of input data [5]. The type of input data is not important for OutGuess, because the program uses specific drivers for specific graphic formats which extract the redundant bits and, after the changes, write them back. The version used for the simulations below is able to work with the JPEG and PNG formats; JPEG pictures were used in this work. OutGuess is available under the BSD license, so it can also be used commercially. OutGuess is hard to detect by means of statistical calculations based on frequency analysis: such calculations are not able to reveal the steganographic content, because OutGuess determines the maximal length of the message before inserting it into the picture. As a consequence, the resulting image is unchanged from the point of view of frequency analysis, as described in [6].
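To make the general principle of bit-level embedding concrete, the following minimal Python sketch hides message bits in the least significant bits of an array of coefficients. This is a simplified illustration only, not OutGuess's actual algorithm (which additionally selects bits pseudorandomly from a key and applies statistics-preserving corrections); all names here are chosen for this example.

```python
import numpy as np

def embed_bits(coeffs: np.ndarray, message: bytes) -> np.ndarray:
    """Toy LSB embedding: write message bits into the least significant
    bits of the given coefficients (simplified; NOT OutGuess itself)."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    if len(bits) > len(coeffs):
        raise ValueError("message too long for this cover")
    out = coeffs.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | bits  # replace the LSBs
    return out

def extract_bits(coeffs: np.ndarray, n_bytes: int) -> bytes:
    """Read the LSBs back and repack them into bytes."""
    bits = (coeffs[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

# usage: hide and recover a short message in fake "coefficients"
cover = np.random.randint(0, 256, size=4096)
stego = embed_bits(cover, b"secret")
assert extract_bits(stego, 6) == b"secret"
```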

3. Sample extraction
Firstly, it was necessary to obtain data for the neural network training sets. The program JPEGsnoop was used, which is able to work with extended information in image, video and text files. JPEGsnoop is able to extract information such as:


- matrices of quantization tables of colour and brightness
- information about the reduction of colour parts
- quality of the JPEG image
- EXIF information
- RGB histograms
- tables of Huffman code

At the beginning, we thought that a way through the quantization tables would be used. Unfortunately, this was not a suitable way to create a training set for neural networks, even though the tables show big changes before and after the usage of OutGuess. The next idea was Huffman coding, whose code-length tables can be read directly from the JPEG file, as sketched below.
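For reference, the histograms of Huffman code lengths used in the rest of this section are stored directly in a JPEG file's DHT (Define Huffman Table, 0xFFC4) segments: each table records the number of codes of length 1 to 16 bits for a DC or AC class. The following Python sketch reads them out; it is our own illustration of the standard JPEG format, not JPEGsnoop's actual implementation, and it assumes a well-formed baseline JPEG.

```python
import struct

def huffman_length_histograms(path):
    """Return (table_class, table_id, counts) for every Huffman table in a
    baseline JPEG, where counts[i] is the number of codes of i+1 bits.
    Sketch of the standard DHT (0xFFC4) layout; stops at start-of-scan."""
    with open(path, "rb") as f:
        data = f.read()
    tables = []
    i = 2  # skip the SOI marker (0xFFD8)
    while i + 4 <= len(data):
        if data[i] != 0xFF or data[i + 1] == 0xFF:
            i += 1  # resync on fill bytes / stray data
            continue
        marker = data[i + 1]
        if marker == 0xDA:  # SOS: all DHT segments precede the scan data
            break
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker == 0xC4:  # DHT segment
            seg = data[i + 4:i + 2 + length]  # payload after the length field
            j = 0
            while j + 17 <= len(seg):
                table_class = seg[j] >> 4          # 0 = DC, 1 = AC
                table_id = seg[j] & 0x0F
                counts = list(seg[j + 1:j + 17])   # code lengths 1..16 bits
                tables.append((table_class, table_id, counts))
                j += 17 + sum(counts)              # skip the symbol bytes
        i += 2 + length
    return tables
```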

3.1. Huffman coding

Huffman coding was designed by David Huffman in 1952. The method takes symbols, represented e.g. by the values of the discrete cosine transformation (one of the ways information such as colour and brightness is represented in pictures), and encodes them into a variable-length code so that, according to the symbol statistics, the shortest bit representations are assigned to the most frequent symbols. It has two very important properties: it is a code of minimal length, and it is a prefix code, which means that it can be decoded uniquely. On the other hand, the disadvantage is that the frequency of each symbol must be known a priori; in the case of pictures, however, an estimate can be used which is refined during the compression [7].

According to the histograms of Huffman coding, the following tables and figures were prepared (Table 1, Table 2, Fig. 1 and Fig. 2) as an example of the changes between a clear and a coded image.

Table 1: Table of Huffman coding - clear picture

            DC, Class0   DC, Class1   AC, Class0   AC, Class1
  1 bit          0            0            0            0
  2 bit         19           66           50           35
  3 bit         49            0           21           34
  4 bit         24           26            7           14
  5 bit          5            6           11            8
  6 bit          2            2            6            3
  7 bit          1            0            3            3
  8 bit          0            0            1            2
  9 bit          0            0            1            1
 10 bit          0            0            0            0
 11 bit          0            0            0            0
 12-16 bit       0            0            0            0

Fig. 1: Graph of Huffman coding - clear picture (histogram of code lengths, 1-16 bits, series DC,Class0 / DC,Class1 / AC,Class0 / AC,Class1)

Table 2: Table of Huffman coding - coded picture

            DC, Class0   DC, Class1   AC, Class0   AC, Class1
  1 bit          0            0            0            0
  2 bit         30           93           46           73
  3 bit         66            5            9            8
  4 bit          3            1           27            9
  5 bit          1            0            8            5
  6 bit          0            0            4            3
  7 bit          0            0            3            0
  8 bit          0            0            1            1
  9 bit          0            0            1            0
 10 bit          0            0            1            0
 11-16 bit       0            0            0            0

Fig. 2: Graph of Huffman coding - coded picture (histogram of code lengths, 1-16 bits, series DC,Class0 / DC,Class1 / AC,Class0 / AC,Class1)
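As an illustration of the principle described above, here is a minimal, self-contained Python sketch of Huffman coding. It is our own example, independent of the JPEG-internal tables; the exact code assignment depends on tie-breaking.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code for a sequence of symbols: the most frequent
    symbols receive the shortest (prefix-free) bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap items: (frequency, unique tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # merging two subtrees prefixes their codes with 0 and 1
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

code = huffman_code("abracadabra")
encoded = "".join(code[s] for s in "abracadabra")
print(code)     # e.g. {'a': '0', 'b': '110', 'r': '111', 'c': '100', 'd': '101'}
print(encoded)  # the variable-length bit string
```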


4. Used neural networks


For the experiments, the Neural Networks toolbox for the Mathematica 5.2 environment (www.wolframresearch.com) was used. Two types of feedforward nets were employed, one with one hidden layer and one with two hidden layers. These names are taken from the Mathematica environment to avoid speculation about what they mean [8], [9].

{{0,3,63,15,10,5,3,2,0,0,0,48,22,12,8,6,4,1,0,0,0,50,11,17,10,4,4,2,1,1,0,39,21,18,10,6,2,2,1,0},
 {0,1,39,20,20,14,5,1,0,0,0,32,21,21,15,8,2,0,0,0,0,50,9,18,10,4,4,2,1,1,0,36,20,20,12,7,2,2,1,1},
 {0,4,70,15,8,3,1,0,0,0,0,46,22,18,9,3,1,0,0,0,0,49,10,18,10,4,4,2,1,1,0,38,20,20,10,6,2,2,1,0},
 {0,4,77,11,5,2,1,0,0,0,0,48,24,17,6,2,1,0,0,0,0,52,5,18,11,6,3,2,1,1,0,41,22,19,8,6,1,2,1,0},
 {0,1,32,18,19,17,10,2,0,0,0,23,17,19,17,12,7,4,1,0,0,46,12,18,10,3,5,2,1,1,0,34,19,20,15,6,2,2,1,0}}
Fig. 5: Example of clear inputs in a training set

Fig. 3: Neural net with one hidden layer

{{0,2,50,18,13,7,5,4,1,0,0,54,13,8,8,8,7,2,0,0,0,52,12,17,10,2,3,2,1,1,0,38,12,19,15,8,2,2,2,1},
 {0,0,25,16,20,19,14,5,1,0,0,24,16,19,17,14,7,2,0,0,0,47,12,18,11,3,4,2,1,1,0,34,13,20,18,8,3,2,2,1},
 {0,4,60,17,12,5,1,0,0,0,0,46,18,16,12,6,2,0,0,0,0,49,13,18,10,3,3,1,1,0,0,40,13,19,14,7,2,2,2,1},
 {0,4,68,14,8,4,2,1,0,0,0,47,23,12,7,5,4,1,0,0,0,57,6,18,10,4,2,1,1,0,0,44,15,19,11,7,1,2,1,0},
 {0,1,19,15,20,23,17,6,0,0,0,14,12,17,20,16,12,8,2,0,0,39,13,19,13,2,7,3,1,1,0,32,16,19,17,7,3,2,3,1}}
Fig. 6: Example of coded inputs in a training set

As the numbers show, there is a difference. Examples of two pictures, without and with a secret message inside, are given in Fig. 7 and Fig. 8; at first sight there is no visible difference between them.

Fig. 4: Neural net with two hidden layers

4.1. Training sets

For the training of neural networks, it is necessary to define suitable training sets. We used photos in 5 different resolutions; the first 3 come directly from a photo camera and the last two were converted from the previous group:
1. 2592x1944
2. 2048x1536
3. 3872x2592
4. 1280x1024
5. 1024x768
Into the same group of photos, a secret message was inserted by means of OutGuess. The message is unique in each image, thanks to a pseudorandom number generator. After that, JPEGsnoop created the tables of Huffman coding. Each column of a table was appended to a row, one after another, which led to a vector of length 40; a sketch of this construction follows below. Examples of clear and coded inputs in a training set are given in Fig. 5 and Fig. 6; each is a matrix of 5 individual inputs of length 40. These inputs are presented to the input layer of the neural network. The output was 0 in the case of a clear input and 1 in the case of a coded input.
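A minimal sketch of this feature construction, assuming the per-table code-length histograms are already available (e.g. from a DHT parser such as the one in Section 3), might look as follows. The dictionary keys and the truncation to 10 bins are assumptions, inferred from the example vectors in Fig. 5 and Fig. 6.

```python
import numpy as np

def huffman_feature_vector(table):
    """Concatenate the four histogram columns (DC Class0, DC Class1,
    AC Class0, AC Class1), truncated to the first 10 code lengths,
    into one 40-element input vector for the network."""
    cols = ["dc_class0", "dc_class1", "ac_class0", "ac_class1"]
    return np.concatenate([np.asarray(table[c][:10]) for c in cols])

# hypothetical example: histograms in the shape of Table 1
clear = {
    "dc_class0": [0, 19, 49, 24, 5, 2, 1, 0, 0, 0],
    "dc_class1": [0, 66, 0, 26, 6, 2, 0, 0, 0, 0],
    "ac_class0": [0, 50, 21, 7, 11, 6, 3, 1, 1, 0],
    "ac_class1": [0, 35, 34, 14, 8, 3, 3, 2, 1, 0],
}
x = huffman_feature_vector(clear)   # length-40 input vector
y = 0                               # target: 0 = clear, 1 = coded
print(x.shape)  # (40,)
```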

Fig. 7: Picture without a secret message


Fig. 8: Picture with a secret message

The training sets consist of 1000 samples for each resolution and class (clear or coded), i.e. 3000 samples of clear inputs and 3000 samples of inputs coded by OutGuess.

5. Results

5.1. One hidden layer net

The setting of the net was: 7 neurons in the hidden layer with the saturated linear transfer function, the default in the Neural Networks toolbox. The output neuron has a sigmoid transfer function.
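The paper's nets were built in Mathematica's Neural Networks toolbox; as a rough, hedged equivalent, the following numpy sketch shows a 40-7-1 feedforward pass with a saturated linear hidden layer (assumed here to clip to [0, 1]) and a sigmoid output, thresholded at 0.5 as in the testing below. The weight values are random placeholders standing in for the trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder weights standing in for the trained ones (40 -> 7 -> 1)
W1, b1 = rng.normal(size=(7, 40)), np.zeros(7)
W2, b2 = rng.normal(size=(1, 7)), np.zeros(1)

def saturated_linear(x):
    """Linear inside [0, 1], clipped outside: the hidden transfer function
    (the exact saturation range in the toolbox is an assumption here)."""
    return np.clip(x, 0.0, 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(features):
    """Forward pass over a length-40 Huffman-histogram vector;
    answers >= 0.5 are taken as 'coded', < 0.5 as 'clear'."""
    h = saturated_linear(W1 @ features + b1)
    out = sigmoid(W2 @ h + b2)[0]
    return out, ("coded" if out >= 0.5 else "clear")

score, label = classify(rng.uniform(size=40))
print(score, label)
```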

Fig. 9: Linear saturated function

Fig. 10: Sigmoid function

Training iterations were set to 30. The following figure shows the root mean square error (RMSE) depending on the iterations during training.

Fig. 11: Root Mean Square Error (RMSE) for the one hidden layer net

Once a net is trained, its behaviour has to be tested on known pictures before it can be used on real problems. For testing, 366 pictures for each resolution and each class (clear or coded), i.e. 2196 pictures altogether, were taken. During the search, the trained net made a mistake in 8 cases for clear images, i.e. the answer of the net lay in the interval 0.5 to 1, precisely {0.697057, 0.721635, 0.634685, 0.891021, 0.543743, 0.586112, 0.566472, 0.622812}. In the case of the coded set, 100 percent success was reached, see Table 3.

Table 3: Results of detection by the one hidden layer neural net

                                            Clear      Coded
Nr. of badly classified cases (mistakes)      8          0
Percentage mistake (%)                        0.728597   0
Percentage success (%, 100 - mistake)        99.2714   100

5.2. Two hidden layer net

The setting of the transfer functions was the same as in the previous case, as was the number of training iterations. The number of neurons was set to 12 in the first hidden layer and 4 in the second hidden layer. The following picture shows the dependency of RMSE on the iterations during training, as in the previous case.
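Under the same assumptions as the earlier sketch, the two hidden layer variant (40-12-4-1) would differ only in its layer shapes; weights are again random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder weights for the 40 -> 12 -> 4 -> 1 architecture
W1, b1 = rng.normal(size=(12, 40)), np.zeros(12)
W2, b2 = rng.normal(size=(4, 12)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

def saturated_linear(x):
    return np.clip(x, 0.0, 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_two_layer(features):
    """Forward pass with two saturated-linear hidden layers
    and a sigmoid output neuron (sketch, placeholder weights)."""
    h1 = saturated_linear(W1 @ features + b1)
    h2 = saturated_linear(W2 @ h1 + b2)
    return sigmoid(W3 @ h2 + b3)[0]
```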


Fig. 12: Root Mean Square Error (RMSE) for the two hidden layer net

The same set of images was tested to find out the quality of the trained net. During the search, the trained net made a mistake in only one case for clear images, i.e. the answer of the net lay in the interval 0.5 to 1, precisely {0.975671}. In the case of the coded set, 100 percent success was reached, see Table 4.

Table 4: Results of detection by the two hidden layer neural net

                                            Clear      Coded
Nr. of badly classified cases (mistakes)      1          0
Percentage mistake (%)                        0.0910747  0
Percentage success (%, 100 - mistake)        99.9089   100

As can be seen from Table 3 and Table 4, the feedforward neural networks with one and two hidden layers were almost 100 % successful on a large number of tested images. The second case, with two hidden layers, was more successful than the first one. This might be due to its greater number of neurons and hence weights, which can make the result more precise. On the other hand, the training process is then much more demanding. The results are described in more detail in [10].

6. Conclusions
This paper dealt with the detection of steganographic content inserted into images by the program OutGuess. The detection is done by means of feedforward neural networks with either one or two hidden layers. 6000 samples were used for learning and 2196 samples for testing. The results show that the neural networks used were highly successful, almost 100 %. The better of the two was the net with two hidden layers, which could be caused by its larger number of weights. Further work will involve training other types of neural networks to examine their ability to adapt to such problems. A parallel step is to train neural networks on other steganographic methods, e.g. messages inserted by the program Steghide.

7. Acknowledgements
This work was supported by grant No. MSM 7088352101 of the Ministry of Education of the Czech Republic and by grant No. 102/06/1132 of the Grant Agency of the Czech Republic (GACR).

8. References
[1] D. Radcliff, "Steganography: Hidden Data", Computerworld, June 10, 2002, http://www.computerworld.com/securitytopics/security/story/0,10801,71726,00.html
[2] K. Bailey, K. Curran, Steganography, BookSurge Publishing, 2005, ISBN 159457667X
[3] "Steganography", Wikipedia, cited 2008-03-20, http://en.wikipedia.org/wiki/Steganography
[4] P. Wayner, Disappearing Cryptography, ISBN 1558607692
[5] Software OutGuess, www.outguess.org
[6] N. Provos, "Defending Against Statistical Steganalysis", 10th USENIX Security Symposium, Washington, DC, August 2001
[7] M. Beneš, Komprese: Huffmanovo kódování (Compression: Huffman coding) [online, in Czech], cited 2008-02-15, http://www.cs.vsb.cz/benes/vyuka/pte/texty/komprese/ch02s02.html
[8] "Neural Network Theory: Feedforward Neural Networks", Mathematica Neural Networks toolbox help
[9] K. Gurney, An Introduction to Neural Networks, CRC, 1997, ISBN 1857285034
[10] J. Holoska, Revealing of Steganography by means of Neural Networks, Master thesis (in Czech), UTB Zlin, Czech Republic, 2008