
1. INTRODUCTION:

1.1. DEFINITION:
Steganography is defined by Markus Kuhn as follows: "Steganography is the art and science of communicating in a way which hides the existence of the communication."
(OR)
Steganography is the art and science of hiding information within other information.
(OR)
We can define steganography as cryptography with the additional property that its output looks unobtrusive.

Figure showing Steganography + Cryptography

Steganography is a useful tool that allows covert (hidden) transmission of information over an overt (not hidden) communications channel. The covert channel concept was first introduced by Lampson as "a channel used for information transmission, but not designed nor intended for communication." Both steganography and cryptography are excellent means by which to accomplish this, but neither technology alone is perfect and both can be broken. It is for this reason that most experts would suggest using both to add multiple layers of security. Many governments have created laws that either limit the strength of cryptosystems or prohibit them completely, which can leave users without enough security for the data they intend to send. So, in order to provide multiple layers of security and to ease the conflict between cryptography and the law, it is good practice to combine steganography and cryptography.

A. Steganography enables the user to transmit information masked inside a file in plain view.
B. The hidden data is both difficult to detect and, when combined with known encryption algorithms, equally difficult to decipher.
C. Steganography can be an effective means that enables concealed data to be transferred inside of seemingly innocuous (not harmful) carrier files.
D. The purpose of steganography is to convey a message inside of a conduit of misrepresentation such that the existence of the message is both hidden and difficult to recover when discovered.
E. Steganography is a word derived from the Greek language: STEGOS: HIDDEN/COVERED, GRAPHIA: WRITING.
F. Steganography clearly fits within cryptography.
G. For steganography to remain undetected, the unmodified cover medium must be kept secret, because if it is exposed, a comparison between the cover and stego media immediately reveals the changes.
H. Similar in nature to the sleight of hand used in traditional magic, steganography uses the illusion of normality to mask the existence of covert activity. The illusion is manifested through the use of a myriad of forms including written documents, photographs, paintings, music, sounds, physical items, and even the human body. Two parts of the system are required to accomplish the objective: successful masking of the message, and keeping the key to its location and/or deciphering a secret.

The encoding step of a steganographic system identifies redundant bits and then replaces a subset of them with data from a secret message.
Modern Steganographic Communication.

1.2. HISTORICAL EXAMPLES:
The earliest recordings of steganography were by the Greek historian Herodotus in his chronicles known as "Histories" and date back to around 440 BC.

• Classical examples include King Darius of Susa, who shaved the head of a slave and tattooed a message on his scalp. When the slave's hair grew back, the King dispatched the slave to deliver the hidden message to its intended recipient.

• Ancient Greeks covered tablets with wax and used them to write on. The tablets were composed of wooden slabs. A layer of melted wax was poured over the wood and allowed to harden. Hidden messages could be carved into the wood prior to covering the slab. Once the melted wax had been poured over the slab, the message was concealed; the recipient later revealed it by re-melting the wax and pouring it off the tablet.

• From the 1st century (Romans) through World War II, invisible inks (based on natural substances such as fruit juices and milk) were often used to conceal hidden messages. At first, the inks were organic substances that oxidized when heated. The heat reaction revealed the hidden message. As time passed, compounds and substances were chosen based on desirable chemical reactions. When the recipient mixed the compounds used to write the invisible message with a reactive agent, the resulting chemical reaction revealed the hidden data. Today, some commonly used compounds are visible when placed under an ultraviolet light.

• A microdot is a document or photograph reduced in size until it is as small as a pencil dot (about the size of the period at the end of this sentence). Between World War I and II Germany used microdots for steganographic messaging purposes, and later many countries passed these microdot messages through insecure postal channels.

• During World War I and World War II, null ciphers (for example, taking the 3rd letter from each word in a harmless message to create a hidden message) were also introduced.

But as we see in all these historical examples, just knowing how the message is being passed on (for example, shaving the head of everyone passing through a checkpoint, or melting the wax off any discovered tablets) reveals not only the existence of a hidden message but the message itself.

2. FUNCTIONAL OVERVIEW:
The basic principle of steganography is that modifications to the data in the cover file must have insignificant or no impact on the final presentation. "Insignificant or no impact on the final presentation" means changes so minor in nature that the casual observer cannot tell that a hidden message is even present.

Every digital file is composed of a sequence of binary digits (0 or 1). It is a relatively simple task to modify the content of a file by changing a single bit in the sequence. Accomplishing the modification without changing the presentation or the final form of the file is altogether a different task. For example, the binary value of the decimal number 13 consists of 4 bits (1101); changing one bit in the sequence changes the decimal value of the number it represents and ultimately changes the meaning of the value (i.e. 1100 is the decimal equivalent of the number 12, not 13).

Example:
An electric signal conducted on a wire can contain varying voltage levels over time. When using a single bit to sample the voltage level, we can only represent two states for any given time interval (off or on, 0 or 1). We cannot represent a specific value such as +3.3 V unless the value happens to be a boundary condition (i.e. the high voltage of this signal is +3.3 V). By adding bits to the representation of the measurement we can reproduce measurements between the boundary values. Two bits can define up to four states (0, 1.1, 2.2, and 3.3 V for example), three can define eight, four bits define sixteen, and so on. The precision of the measurement grows with the number of bits used in the binary representation of the voltage level: each added bit doubles the number of representable levels.
Additional bits eventually become unnecessary once the waveform is reproduced accurately enough. A trade-off between file size and sample accuracy is often made, and the bit depth (number of bits per sample) is chosen as a compromise between the two. It is also possible to graphically represent the waveform in a voltage vs. time plot.
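The relationship between bit depth and representable voltage levels described above can be sketched in a few lines of Python. The helper names `levels` and `quantize`, and the assumption that levels are evenly spaced between 0 V and the maximum voltage, are illustrative choices and not part of the original text.

```python
def levels(n_bits, v_max):
    """Return the 2**n_bits evenly spaced voltages from 0 to v_max."""
    count = 2 ** n_bits
    step = v_max / (count - 1)
    return [round(i * step, 6) for i in range(count)]


def quantize(voltage, n_bits, v_max):
    """Return the integer code of the level closest to `voltage`."""
    count = 2 ** n_bits
    step = v_max / (count - 1)
    code = round(voltage / step)
    return max(0, min(count - 1, code))  # clamp to the valid code range


two_bit = levels(2, 3.3)      # the four states named in the text
flipped = 0b1101 ^ 0b0001     # flipping the lowest bit of 13 (1101) gives 12 (1100)
```

With two bits, a measured 1.2 V can only be stored as code 1 (the 1.1 V level); adding bits narrows the gap between neighbouring levels, which is what later makes a one-level change hard to notice.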
With enough bits to provide fidelity to the measurement, a close approximation of the actual signal can be reproduced. In the following example, 8 bits will be chosen to represent a value between -5 and +5 volts with the most significant bit determining the sign (+/-) of the measurement. The remaining seven bits provide 128 discrete values for the amplitude of the sampled voltage.
A plot of our hypothetical waveform is displayed in figure 1. Six randomly selected samples (represented by eight binary digits) are included below simply to illustrate that the binary data changes over the time interval.

Figure 1 (before modification)

By modifying the least significant bit of each sample, it is possible to embed information into the waveform without having significant impact on the graphical representation of the data. By using 7 bits to represent 5 volts of amplitude, we create a relatively small division between values (5 V / 128, about 0.04 V). By modifying the least significant bit (LSB) of any datum we can only change its reproduced value by the same amount (0.04 V). This imperceptible change means that intentional modifications to the LSB of every sample may go unnoticed and allow data to be embedded into the bit sequence.
When viewing the waveform after modification, the difference in voltage at any given datum is imperceptible to the naked eye. To illustrate, consider the following bit stream:
01001010, 01001011, 01001100, 01001101, 01001110, 01001111, 01010000, 01010001 …
In the event we wished to inject the 8-bit message (11110000) into the data, we would modify the corresponding LSBs of the above bit stream to match our message. The resulting steganographic data stream would become
01001011, 01001011, 01001101, 01001101, 01001110, 01001110, 01010000, 01010000
where the modified bits were set in blue bold type in the original layout (the first, third, sixth and eighth samples are the ones that actually change).
Note that while the carrier data has changed, what is represented or displayed in the final form (i.e. the form delivered to the end user) has been modified only in an imperceptible manner. Figure 2 shows our example waveform embedded with the following ASCII message after conversion to binary: "The truth shall set you free". The existence of the embedded message can only be seen in the blow-up of the first few samples of the reproduced waveform.

Figure 2 (After modification to waveform)

3. TYPICAL STEGOSYSTEMS:
Any file that requires multiple bits to reasonably quantify its message, such that minor changes to the data are imperceptible when the file is presented in final form, is an acceptable candidate for a carrier. Three different aspects in information-hiding systems contend with each other: capacity, security, and robustness. Capacity refers to the amount of information that can be hidden in the cover medium, security to an eavesdropper's inability to detect hidden information, and robustness to the amount of modification the stego medium can withstand before an adversary can destroy hidden
information. When considered broadly there are two types of stegosystems.
A. Passive Warden
B. Active Warden

3.1. PASSIVE WARDEN:
A passive warden just monitors the communication channel. He can pass the cover texts through several statistical tests, but does not modify them. It is the same situation as when network packets go through an Intrusion Detection System.

3.2. ACTIVE WARDEN:
An active warden manipulates cover texts in order to preclude the possibility of hidden communication. A typical real-life application is watermarking and fingerprinting. A watermark is a small piece of embedded information which can prove that material is copyrighted. A fingerprint is very similar, but is intended to track a concrete copy of copyrighted data.
Digital Stegosystems: image, video and sound files. Data can even be embedded in standard TCP/IP packet headers.
The most common image formats include BMP, GIF, and JPEG.
The majority of software applications designed for steganography utilize the JPEG image file format as the carrier.

3.3. TEXT & PRINTED DOCUMENTS:
Text documents of all types can contain embedded messages that are difficult if not impossible to locate. This paragraph contains a hidden message that can be decoded using a decoding key provided at the end. In the case of this paragraph, the decoding is performed by referencing a character in each line by its position and using the character's numeric location as the key to the hidden message. This type of data embedding is identical to one time pad cryptography where a key is used to extract the message from a stream of data. Steganography is not the encryption methodology, but rather the means by which to conceal the message. The message contained in this paragraph reads "secret", and is decoded using the following key (14, 3, 21, 2, 2, 11) – the letters in the paragraph above have a blue typeset for ease of location.

We can also embed a letter in a string by moving the position of a letter in a word by a very small distance.
For example:
Consider embedding the binary value of the ASCII letter "T" (01010100) into the word "Singular." We can inject the binary string by varying the spacing between the letters to indicate a zero or a one. For comparison, a fixed or naturally spaced version of the word is displayed below the encoded version. Grey lines have been added to more easily identify the characters that have been shifted to represent a binary value of one. In the example below, all non-shifted (i.e. normally spaced and not touching the reference line) characters are assumed to represent a zero.

Note that the "i", "g", and the "l" are touching grey lines, thus indicating a high state or the binary value of one for that position. When pieced back together the values are as follows: S-0, i-1, n-0, g-1, u-0, l-1, a-0, and r-0, or 01010100.

Other methods include:
Stepped character approach (where the message is conveyed with embedded characters separated by a fixed number or constant step).
Addition or subtraction of white space and/or carriage returns at the end of every line.

Example of stepped character approach:
Consider encoding "secret words" into a carrier sentence using a seven character stepping algorithm (again the characters are in blue typeset for clarity).
"This is much easier curing roses; each petal had too few dewdrop awards for sicknesses …"

The primary focus of the linguistic steganography field is in the area of automating the selection of synonyms for common words to embed data in writing such that it eliminates the unnatural and/or awkward wording problems.

3.4. STILL AND MOTION IMAGE FILES:
Images consist of pixels with contributions from primary colors (red, green, and blue) adding to the total color composition of the pixel. Most images are represented as triples (Red contribution, Green contribution, Blue contribution). Depending on the depth of color desired in the final image, each component is represented by a separate number of bits. In the case of a 24 bit bitmap, each color component has eight bits.

When closely viewing any specific color, single digit modifications to the contribution level are imperceptible to the human eye (i.e., a pixel with a value of (255, 255, 0) is indistinguishable from
(254, 255, 0)).

Least Significant Bit (LSB) substitution is a well known and widely used method. Take for example the True-Colour BMP image format. The colour of a pixel is coded as three bytes, one each for the red, green and blue components. If you change only the LSB in each colour element, then the picture will seem still the same, but is not. It carries hidden information. A picture with size 120x100 pixels can hold approximately up to 4500 B of hidden data if this method is used (12,000 pixels × 3 bits per pixel = 36,000 bits = 4,500 bytes).

We can use some conventional random number generator. The supplied password will serve as the initial seed. Generated numbers will specify which pixel to use for encoding the next 3 bits of embedded data. The intruder/adversary, even with complete knowledge of the stegosystem, cannot extract the hidden message without knowing the password.
Similar procedures can be successfully applied to a wide variety of multimedia formats. Only instead of the colour values we slightly modify the Discrete Cosine Transformation (DCT) or Fourier Transformation (FT) coefficients.

3.4.1. REASON FOR USING MOSTLY JPEG:
When using the LSB of the discrete coefficients for any given block, the modifications to a single coefficient affect the values of each of the 64 discrete pixels. This translates into 64 minor changes to a single block and that results in a smoother color transition between blocks. Some of the picture formats (such as GIF) embed information pertaining to the visual representation to the color palette which affects all of the bit layers in the image. Steganographic systems that use these formats and modify the LSB to embed data often impart noticeable changes to the reproduced image, and that serves as a cue indicating the existence of embedded data. Because JPEG images do not distribute the image information across the entire image, but rather to discrete 8 * 8 pixel blocks, the format is less susceptible to those visual attacks.

3.5. AUDIO FILES:
The human ear can distinguish frequencies between 20 Hz and 20 kHz. By embedding a stream of data into an audio signal at frequencies above those, the effect is inaudible and cannot be detected by the human ear.
Not only does the carrier's reproduction of the data sound identical to that of the original, but the only impact the added data has is an increase to the size of the file.
A frequency spectrum analyser or a calculation of the total amount of data required to produce the same audible spectrum over that time interval would be able to detect the presence of the additional information.
LSB modification of the bit stream can also be used but has a noticeable detrimental impact on the carrier file when it is reproduced.

3.6. TCP/IP HEADERS:
If the capability to generate and read the contents of well-formed TCP/IP packets exists at both ends of a communication channel, it is possible for the two parties to covertly pass hidden data. The exploitation modifies several of the fields in standard packet headers to carry information over a covert channel.
Specifically, the flags field of the IPv4 header and the sequence number field of the TCP header are particularly susceptible to manipulations that serve this purpose.

The flags field in the IPv4 header contains three bits: a reserved bit, a DF (do not fragment) bit, and finally an MF (more fragments) bit. Provided that the parties wishing to communicate covertly both know the maximum transmission unit (MTU) of the network, they can manipulate the flags field to carry a message within standard TCP/IP packets that contain innocuous data. By keeping the total packet size below the MTU, modification of the DF bit has no impact on the transmission of the cover message. Alternatively, packets exceeding the MTU with the DF flag set are returned to the sender as undeliverable. By keeping the size of the cover packets below the MTU for the given network, the DF flag can be arbitrarily assigned, allowing the field to carry binary data covertly. Conversely, by exceeding the MTU of the packet, the sender is able to transmit packets that will arrive out of sequence and thus can convey binary information through the order of packet arrival. If the order of arrival of the cover message packets is unimportant, individuals can take advantage of the packet sequence number field to relay covert information to the receiver. An algorithm that packetizes the cover message and then transmits the packets in an altered sequence can indicate binary data by either transmitting a sequential packet to indicate a zero, or an out of sequence packet to indicate a one. Further, a program that controls the composition of packets can transmit all packets in the correct order but modify the sequence number of each transmitted packet when needed (to indicate a change in state). Both of these approaches can embed binary data inside of an overt communications channel.

4. STEGANOGRAPHY TECHNIQUES:
Christian Cachin proposed an information-theoretic model for steganography that considers the security of steganographic systems against passive eavesdroppers. In this model, you assume that the adversary has complete knowledge of the encoding system but does not know the secret key. His or her task is to devise a model for the probability distribution PC of all possible cover media and PS of all possible stego media. The adversary can then use detection theory to decide between hypothesis C (that a message contains no hidden information) and hypothesis S (that a message carries hidden content). A system is perfectly secure if no decision rule exists that can perform better than random guessing. Essentially, steganographic communication senders and receivers agree on a steganographic system and a shared secret key that determines how a message is encoded in the cover medium.

4.1. METHODS IN SPATIAL DOMAIN:
A basic classification of steganographic algorithms operating in the spatial domain, taking the method for selecting the pixels as the criterion, distinguishes three main types: non-filtering algorithms, randomized algorithms and filtering algorithms.

4.1.1. NON-FILTERING ALGORITHM:
This is the simplest steganographic method based on the use of the LSB, and therefore the most vulnerable. The embedding process consists of the sequential substitution of each least significant bit of the image pixel for each bit of the message. For its simplicity, this method can camouflage a great volume of information. Extraction is equally simple: only a sequential LSB reading, starting from the first image pixel, is needed to extract the secret message. This method also generates an unbalanced distribution of the changed pixels, because the message is embedded at the first pixels of the image, leaving the remaining pixels unchanged.

4.1.2. RANDOMIZED ALGORITHM:
This technique was born as a solution for the problems of the previous method. Both the sender and the receiver of the image have a password, called the stego-key, that is employed as the seed for a pseudo-random number generator. This creates a sequence which is used as the index to access the image pixels. Each message bit is embedded in the pixel of the cover image at the index given by the pseudo-random number generator. The two main features of the pseudo-random permutation methods are the use of a password to have access to the message, and the well-spread message bits over the image.

4.1.3. FILTERING ALGORITHM:
This algorithm filters the cover image by using a default filter and hides information in those areas that get a better rate. The filter is applied to the most significant bits of every pixel, leaving the less significant to hide information. The filter ensures the choice of the areas of the image with the least impact from the inclusion of information, which makes the presence of hidden messages more difficult to detect [10]. The retrieval of information is ensured because the bits used for filtering are not changed, implying that reapplying the filter will select the same bits as in the process of concealment. It is the most efficient method to hide information. The algorithm SLSB belongs to this type.
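The non-filtering (sequential) and randomized LSB schemes of sections 4.1.1 and 4.1.2 can be sketched as follows. This is a minimal illustration over a flat list of 8-bit samples, not an image codec; the function names are hypothetical, and Python's `random.Random`, seeded with the stego-key, stands in for whatever pseudo-random generator a real system would use (it is not cryptographically secure).

```python
import random


def embed_sequential(samples, bits):
    """Replace the LSB of the first len(bits) samples with the message bits."""
    if len(bits) > len(samples):
        raise ValueError("message does not fit in the cover")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_sequential(stego, n_bits):
    """Read the message back from the first n_bits LSBs."""
    return [value & 1 for value in stego[:n_bits]]


def _positions(stego_key, cover_len, n_bits):
    """Derive n_bits distinct sample indices from the stego-key (PRNG seed)."""
    rng = random.Random(stego_key)
    return rng.sample(range(cover_len), n_bits)


def embed_randomized(samples, bits, stego_key):
    """Spread the message bits over key-dependent positions in the cover."""
    stego = list(samples)
    for pos, bit in zip(_positions(stego_key, len(samples), len(bits)), bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract_randomized(stego, n_bits, stego_key):
    """Regenerate the same positions from the key and read their LSBs."""
    return [stego[pos] & 1 for pos in _positions(stego_key, len(stego), n_bits)]


# The eight-sample bit stream and 8-bit message used in section 2.
cover = [0b01001010, 0b01001011, 0b01001100, 0b01001101,
         0b01001110, 0b01001111, 0b01010000, 0b01010001]
message = [1, 1, 1, 1, 0, 0, 0, 0]
stego = embed_sequential(cover, message)
```

Sequential extraction needs no secret at all, which is why the text calls the method the most vulnerable; the randomized variant returns the message only when the same stego-key regenerates the same positions.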
Description of the Algorithm SLSB:

4.2. DISCRETE COSINE TRANSFORM:
(OR)
JPEG ENCODING ALGORITHM:
The JPEG encoding algorithm sections the input image into 8 * 8 grids containing 64 pixels per grid. Within each grid a discrete cosine transform coefficient for every color component in the pixel is calculated. Thus, the data in each grid is an 8 * 8 matrix of 64 DCT coefficients for each color. The formula used to calculate the DCT coefficient F(u, v) of an 8 * 8 grid of image pixels f(x, y) is:

\[ F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \]

where C(x) = 1/1.414 (that is, 1/√2) when x = 0 and C(x) = 1 for all other values of x.
To determine the grid bias or colour offset, a follow-on function must be employed. The following formula, where Q(u, v) defines the quantization table for the internal elements, is used:

\[ F^{Q}(u,v) = \mathrm{round}\!\left( \frac{F(u,v)}{Q(u,v)} \right) \]

where Q(u, v) is a 64 element quantization table.

When 24 bits (8 bits per colour) are input into the DCT, the information describing any given pixel can be reduced from 24 bits to as little as 3. Depending on the number of bits used to represent the DCT coefficients, the resulting compression of data describing the pixel can reduce the total size of the file without noticeably altering its final form. JPEG steganography uses the least significant bit of the DCT coefficients to hide the desired message. Since the coefficient represents the relative difference from the grid's quantized value, modifying the LSB of each coefficient changes the value of the entire grid and has an imperceptible impact on the reproduced image.

Embedded information in a JPEG.
(a) The unmodified original picture;
(b) The picture with the first chapter of The Hunting of the Snark embedded in it.

Frequency histograms.
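The transform and quantization steps of section 4.2 can be tried on a single block with the sketch below. It uses the standard 8 × 8 forward DCT with C(0) = 1/√2 and divides by a quantization table before rounding; the function names and the flat table of 16s are assumptions made for illustration (real JPEG tables vary per coefficient).

```python
import math


def dct_8x8(block):
    """Forward 8 x 8 DCT of `block`, a list of 8 rows of 8 pixel values."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]


flat = [[100] * 8 for _ in range(8)]       # a uniform grey block
uniform_q = [[16] * 8 for _ in range(8)]   # hypothetical flat quantization table
quantized = quantize(dct_8x8(flat), uniform_q)
```

For a uniform block all the energy lands in the single F(0, 0) coefficient and every other quantized coefficient is zero, which illustrates why so few bits are needed per pixel after the transform. The LSBs of the non-trivial quantized coefficients are the redundant bits that JPEG steganography overwrites.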

4.3. SEQUENTIAL:
Derek Upham's JSteg was the first publicly available steganographic system for JPEG images. Its embedding algorithm sequentially replaces the least-significant bit of DCT coefficients with the message's data. The algorithm does not require a shared secret; as a result, anyone who knows the steganographic system can retrieve the message hidden by JSteg. Andreas Westfeld and Andreas Pfitzmann noticed that steganographic systems that change least-significant bits sequentially cause distortions detectable by steganalysis [8]. They observed that for a given image, the embedding of high-entropy data (often due to encryption) changed the histogram of colour frequencies in a predictable way.
In the simple case, the embedding step changes the least-significant bit of colours in an image. The colours are addressed by their indices i in the colour table; we refer to their respective frequencies before and after embedding as n_i and n_i*. Given uniformly distributed message bits, if n_2i > n_2i+1, then pixels with colour 2i are changed more frequently to colour 2i + 1 than pixels with colour 2i + 1 are changed to colour 2i. As a result, the following relation is likely to hold:

\[ |n_{2i} - n_{2i+1}| \ge |n^{*}_{2i} - n^{*}_{2i+1}| \]

In other words, embedding uniformly distributed message bits reduces the frequency difference between adjacent colours. The same is true in the JPEG data format. Instead of measuring colour frequencies, we observe differences in the DCT coefficients' frequency.

Sequential changes to the (a) original and (b) modified image's least-significant bit of discrete cosine transform coefficients tend to equalize the frequency of adjacent DCT coefficients in the histograms.

The figure above displays the histogram before and after a hidden message is embedded in a JPEG image. We see a reduction in the frequency difference between coefficient –1 and its adjacent DCT coefficient –2. We can see a similar reduction in frequency difference between coefficients 2 and 3. Westfeld and Pfitzmann used a χ2-test to determine whether the observed frequency distribution y_i in an image matches a distribution y_i* that shows distortion from embedding hidden data. Although we do not know the cover image, we know that the sum of the frequencies of adjacent DCT coefficients remains invariant, which lets us compute the expected distribution y_i* from the stego image. Letting n_i be the DCT histogram, we compute the arithmetic mean

\[ y^{*}_{i} = \frac{n_{2i} + n_{2i+1}}{2} \]

to determine the expected distribution and compare it against the observed distribution

\[ y_{i} = n_{2i}. \]
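The observed and expected distributions used by the test can be computed directly from a list of values. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the data is synthetic, an alternating 0/1 pattern stands in for uniformly distributed message bits, and the pair (2i, 2i + 1) is found with an arithmetic right shift so that –2 and –1 fall into the same pair.

```python
from collections import Counter


def pair_distributions(values):
    """Return (observed, expected) frequencies over the pairs (2i, 2i+1)."""
    hist = Counter(values)
    pair_ids = sorted({v >> 1 for v in hist})   # v >> 1 maps 2i and 2i+1 to i
    observed = [hist[2 * i] for i in pair_ids]
    expected = [(hist[2 * i] + hist[2 * i + 1]) / 2 for i in pair_ids]
    return observed, expected


def overwrite_lsbs(values, bits):
    """Replace the LSB of each value with the corresponding message bit."""
    return [(v & ~1) | b for v, b in zip(values, bits)]


# Synthetic cover with clearly unequal neighbours: 30 vs 10 and 5 vs 15.
cover = [2] * 30 + [3] * 10 + [-2] * 5 + [-1] * 15
bits = [0, 1] * 30            # deterministic stand-in for uniform message bits
stego = overwrite_lsbs(cover, bits)

cover_obs, cover_exp = pair_distributions(cover)
stego_obs, stego_exp = pair_distributions(stego)
```

The expected distribution is the same before and after embedding, because overwriting LSBs only moves counts within a pair; the observed distribution, however, collapses onto it in the stego data. That is the signature the χ2-test looks for.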

The value for the difference between the distributions is given as

\[ \chi^{2} = \sum_{i=1}^{\nu+1} \frac{(y_{i} - y^{*}_{i})^{2}}{y^{*}_{i}} \]

where ν is the degrees of freedom, that is, one less than the number of different categories in the histogram. It might be necessary to sum adjacent values from the expected distribution and the observed distribution to ensure that each category has enough counts. Combining two adjacent categories reduces the degrees of freedom by one. The probability p that the two distributions are equal is given by the complement of the cumulative distribution function,

\[ p = 1 - \int_{0}^{\chi^{2}} \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\, \Gamma(\nu/2)}\, dt \]

where Γ is the Euler Gamma function.
The probability of embedding is determined by calculating p for a sample from the DCT coefficients. The samples start at the beginning of the image; for each measurement the sample size is increased.

The figure above shows the probability of embedding for a stego image created by JSteg.
The high probability at the beginning of the image reveals the presence of a hidden message; the point at which the probability drops indicates the end of the message.

4.3.1. JSTEG ALGORITHM:
Input: message, cover image
Output: stego image
while data left to embed do
    get next DCT coefficient from cover image
    if DCT ≠ 0 and DCT ≠ 1 then
        get next LSB from message
        replace DCT LSB with message LSB
    end if
    insert DCT into stego image
end while

4.4. PSEUDO RANDOM:
OutGuess 0.1 (created by Niels Provos) is a steganographic system that improves the encoding step by using a pseudo-random number generator to select DCT coefficients at random. The least-significant bit of a selected DCT coefficient is replaced with encrypted message data. The χ2-test for JSteg does not detect data that is randomly distributed across the redundant data and, for that reason, it cannot find steganographic content hidden by OutGuess 0.1. However, it is possible to extend the χ2-test to be more sensitive to local distortions in an image. Two identical distributions produce about the same χ2 values in any part of the distribution. Instead of increasing the sample size and applying the test at a constant position, we use a constant sample size but slide the position where the samples are taken over the image's entire range. Using the extended test, we can detect pseudo-randomly distributed hidden data. Given a constant sample size, we take samples at the beginning of the image and increase the sample position by 1 percent for every χ2 calculation. We then take the sum of the probability of embedding for all samples. If the sum is greater than the detection threshold, the test indicates that an image contains a hidden message. To find an appropriate sample size, we select an expected distribution for the extended χ2-test that should cause a negative test result. Instead of calculating the arithmetic mean of coefficients and their adjacent ones, we take the arithmetic mean of two unrelated coefficients,

\[ y^{*}_{i} = \frac{n_{2i-1} + n_{2i}}{2} \]

A binary search on the sample size helps find a value for which the extended χ2-test does not show a correlation to the expected distribution derived from unrelated coefficients.
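The JSteg loop of section 4.3.1 and the χ2 statistic it is detected with can be sketched together. This is a simplified model that works on a plain list of integer DCT coefficients instead of a real JPEG file; the function names are hypothetical, and the probability p is not computed because it needs the χ2 cumulative distribution function from a statistics library.

```python
from collections import Counter


def jsteg_embed(coeffs, bits):
    """Sequentially overwrite the LSB of every coefficient other than 0 and 1."""
    stego, pending = [], list(bits)
    for c in coeffs:
        if pending and c not in (0, 1):
            c = (c & ~1) | pending.pop(0)
        stego.append(c)
    if pending:
        raise ValueError("cover too small for message")
    return stego


def jsteg_extract(stego, n_bits):
    """Read the LSBs of the usable coefficients back in order."""
    usable = [c & 1 for c in stego if c not in (0, 1)]
    return usable[:n_bits]


def chi_square(coeffs):
    """Chi-square statistic and degrees of freedom over the pairs (2i, 2i+1)."""
    hist = Counter(c for c in coeffs if c not in (0, 1))
    pair_ids = sorted({c >> 1 for c in hist})
    chi2 = 0.0
    for i in pair_ids:
        expected = (hist[2 * i] + hist[2 * i + 1]) / 2   # y*_i
        chi2 += (hist[2 * i] - expected) ** 2 / expected  # y_i = n_2i
    return chi2, len(pair_ids) - 1


cover = [5, 0, 1, -3, 2, 2, 7, 0, 4, -1]
message = [1, 0, 1, 1, 0, 0]
stego = jsteg_embed(cover, message)
```

A χ2 value near zero means the observed histogram already matches the pair-averaged one, which corresponds to a high probability of embedding; unequal neighbours, as in typical unmodified data, give a larger value. OutGuess 0.1 differs from this loop only in how the coefficients are chosen: a pseudo-random generator selects them instead of taking them in order.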
discrimination function is called the feature vector.
Lyu and his colleague used a support vector
machine (SVM) to create a nonlinear
discrimination function. Here, we present a less
sophisticated but easier to understand method for
determining a linear discrimination function,

of the measured image statistics X= (x1, x2, …,


xk)T that, for appropriately chosen bi, discriminates
between the two classes.
For a new image X, the discriminant function Λ
lets us decide between two hypotheses: the
hypothesis H0 that the new image contains no
steganographic content and the hypothesis H1 that
the new image contains a hidden message.
For the binary hypothesis problem, detection theory
provides us with the Neyman-Pearson criterion,
which shows that the likelihood ratio test
maximizes the detection rate PD for a fixed false-
negative rate PF,14 where px|H1 (X|H1) and px|
H0(X|H0) denote the joint probability functions for
(x1, x2, …, xk) under H1 and H0, respectively. The
constant η is the detection threshold.
The figure above shows an analysis of the extended χ2-test for different false-positive rates. Its detection rate depends on the hidden data’s size and the number of DCT coefficients in an image. We characterize their respective relation by using the change rate, that is, the fraction of DCT coefficients available for embedding a hidden message that have been modified. With a false-positive rate of less than 0.1 percent, the extended χ2-test starts detecting embedded content for change rates greater than 5 percent. We improve the detection rate by using a heuristic that eliminates coefficients likely to lead to false negatives. Due to the heuristic, the detection rate for embedded content with a change rate of 5 percent is greater than 40 percent for a 1 percent false-positive rate.
Niels Provos showed that applying correcting transforms to the embedding step could defeat steganalysis based on the χ2-test.12 He observed that not all the redundant bits were used when embedding a hidden message. If the statistical tests used to examine an image for steganographic content are known, it is possible to use the remaining redundant bits to correct statistical deviations that the embedding step created. In this case, preserving the DCT frequency histogram prevents steganalysis via the χ2-test.
Siwei Lyu and Hany Farid suggested a different approach based on discrimination of two classes: stego image and non-stego image.10,11 Statistics collected from images in a training set determine a function that discriminates between the two classes. The discrimination function determines the class of a new image that is not part of the training set. The set of statistics used by the
To choose the weights bi, we assume that the set xi of non-stego images and the set yi of stego images are independently and normally distributed. This assumption lets us calculate the probability functions px|H1(X|H1) and px|H0(X|H0), which we use to derive the weights bi. Determining the discrimination functions is straightforward, but finding a good feature vector is difficult. Farid created a feature vector with a wavelet-like decomposition that builds higher-order statistical models of natural images.10 He derived the statistics by applying separable low- and high-pass filters along the image axes, generating vertical, horizontal, and diagonal subbands, which are denoted Vi(x,y), Hi(x,y) and Di(x,y), respectively, for different scales i = 1, …, n.
The first set of statistics for the feature vector is given by the mean, variance, skewness, and kurtosis of the subband coefficients at each orientation and at scales i = 1, …, n – 1.
The second set of statistics is based on the errors in an optimal linear predictor of coefficient magnitude. For each subband and scale, the error’s distribution is characterized by its mean, variance, skewness, and kurtosis, resulting in a total size of 24(n – 1) for the feature vector. Lyu and Farid’s training set consists of 1,800 nonstego images and a random subset of 1,800 stego images that contain images as hidden content. Using four different scales, a program (or a researcher) calculates a 72-length feature vector for each image in the training set.
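The four statistics named above, and the resulting feature-vector size, can be sketched as follows. This is only an illustration: it uses population (biased) moments, and the function names are hypothetical rather than taken from Lyu and Farid's implementation.

```python
def four_moments(values):
    """Return (mean, variance, skewness, kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Constant data: skewness and kurtosis are undefined, report zeros.
        return (mean, 0.0, 0.0, 0.0)
    return (mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2)

def feature_vector_length(n_scales):
    """3 orientations x 4 statistics x 2 sets (coefficients, predictor errors)
    at scales 1 .. n-1, i.e. 24(n - 1)."""
    orientations, statistics, sets = 3, 4, 2
    return orientations * statistics * sets * (n_scales - 1)

if __name__ == "__main__":
    print(four_moments([1, 2, 3, 4, 5]))
    print(feature_vector_length(4))
```

With four scales the length is 24 × 3 = 72, matching the vector size quoted in the text.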

Table 1 shows their achieved detection rate using a nonlinear SVM for false-positive rates 0.0 percent and 1.0 percent and different message sizes.
The discrimination function works well only if the training set captures the image space’s useful characteristics. For different types of images (for example, nature scenes and indoor photographs) the detection rate could decrease when using a single training set. Improving the training set by selecting images that match the type of image we’re analyzing might be possible. The probability models for clutter in natural images that Ulf Grenander and Anuj Srivastava15 first proposed let us select similar images from the training set automatically. We can improve the detection quality by using a feature vector based on different statistics. Instead of using a wavelet-like decomposition, we look at the distribution of squared differences,
where i enumerates the number of blocks in the image, and k enumerates the rows or columns in a single block. For each distribution, we calculate the mean and its first three central moments, resulting in 64 measurements for a single image.

The figure above compares the linear discrimination functions derived from the two feature vectors. Figure 8a shows receiver-operating characteristics (ROC) for OutGuess messages and their corresponding change rates; Figure 8b shows the ROCs for F5 messages (described in more detail later). For OutGuess, the feature vectors show comparable detection performance. However, for F5, the squared differences feature vector outperforms the wavelet feature vector.
Using a discrimination function does not help us determine a hidden message’s length. Jessica Fridrich and her colleagues made a steganalytic attack on OutGuess that can determine a hidden message’s length.16 OutGuess preserves the first-order statistics of the DCT coefficients, so Fridrich and her group devised a steganalytic method independent of the DCT histogram. They used discontinuities along the boundaries of 8 × 8 pixel blocks as a macroscopic quantity that increases with the hidden message’s length. The discontinuities are measured by the blockiness formula

\[ B = \sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} \left| g_{8i,j} - g_{8i+1,j} \right| + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} \left| g_{i,8j} - g_{i,8j+1} \right| \]

where gij are pixel values in an M × N greyscale image. Experimental evidence shows that the blockiness B increases monotonically with the number of flipped least-significant bits in the DCT coefficients. Its first derivative decreases with the hidden message’s length, meaning that the blockiness function’s slope is maximal for the cover image and decreases for an image that already contains a message.
Using the blockiness measure, the algorithm to detect OutGuess proceeds as follows:
1. Determine the blockiness BS(0) of the decompressed stego image.
2. Using OutGuess, embed a maximal length message and calculate the resulting stego image’s blockiness BS(1).
3. Crop the stego image by four pixels to reconstruct an image similar to the cover image. Compress the resulting image using the same JPEG quantization matrix as the stego image and calculate the blockiness B(0).
4. Using OutGuess, embed a maximal length message into the cropped image and calculate the resulting blockiness B(1).
5. Using OutGuess, embed a maximal length message into the stego image from the previous step and compute the resulting blockiness B1(1).
6. The slope S0 = B(1) – B(0) corresponds to the original cover image, and S1 = B1(1) – B(1) is the slope for an image with an embedded, maximal length message.
The stego image’s slope S = BS(1) – BS(0) is between the two slopes S0 and S1. The hidden message’s length is then determined as

\[ p = \frac{S_0 - S}{S_0 - S_1} \]

where p = 0 corresponds to the cover image and p = 1 to an image with the maximal embedded message length. To counter randomness in the OutGuess embedding algorithm, we repeat the detection algorithm 10 times. The average of the p-values is taken as the final message length.
Fridrich and her group tested their algorithm on 70 images of which 24 contained hidden messages. Their analysis showed an error in the estimated message length of –0.48 percent ± 6 percent. This approach has two advantages over class discrimination: it does not require a training set and it determines the length of hidden messages.

4.4.1. OUTGUESS 1.0 ALGORITHM:

Input: message, shared secret, cover image
Output: stego image
initialize PRNG with shared secret
while data left to embed do
    get pseudo-random DCT coefficient from cover image
    if DCT ≠ 0 and DCT ≠ 1 then
        get next LSB from message
        replace DCT LSB with message LSB
    end if
    insert DCT into stego image
end while

4.5. SUBTRACTION:

Instead of replacing the least-significant bit of a DCT coefficient with message data, F5 decrements its absolute value in a process called matrix encoding. As a result, there is no coupling of any fixed pair of DCT coefficients, meaning the χ2-test cannot detect F5. Matrix encoding computes an appropriate (1, 2^k – 1, k) Hamming code by calculating the message block size k from the message length and the number of nonzero non-DC coefficients. The Hamming code (1, 2^k – 1, k) encodes a k-bit message word m into an n-bit code word a with n = 2^k – 1. It can recover from a single bit error in the code word. F5 uses the decoding function

\[ f(a) = \bigoplus_{i=1}^{n} a_i \cdot i \]

and the Hamming distance d. With matrix encoding, embedding any k-bit message into any n-bit code word changes it by at most one bit. In other words, we can find a suitable code word a′ for every code word a and every message word m so that m = f(a′) and d(a, a′) ≤ 1. Given a code word a and message word m, we calculate the difference s = m ⊕ f(a) and get the new code word as

\[ a' = \begin{cases} a, & s = 0 \\ (a_1, \ldots, \lnot a_s, \ldots, a_n), & s \neq 0 \end{cases} \]
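The matrix-encoding step described above can be sketched as follows. This is a minimal illustration on plain bit lists: it flips the selected bit directly, whereas F5 itself reaches the same LSB change by decrementing a coefficient's absolute value. The function names are hypothetical.

```python
def f(code_word):
    """Decoding function: XOR of the 1-based positions of all set bits."""
    result = 0
    for position, bit in enumerate(code_word, start=1):
        if bit:
            result ^= position
    return result

def embed(code_word, message):
    """Embed a k-bit message (as an integer) into an n-bit code word,
    n = 2**k - 1, changing at most one bit."""
    s = message ^ f(code_word)
    new_word = list(code_word)
    if s != 0:
        new_word[s - 1] ^= 1  # flip bit a_s (positions are 1-based)
    return new_word

if __name__ == "__main__":
    a = [1, 0, 0, 1, 1, 0, 1]   # n = 7, so k = 3
    m = 0b101
    a_new = embed(a, m)
    print(a, "->", a_new, "decodes to", f(a_new))
```

Flipping bit s changes f by exactly s, so f(a′) = f(a) ⊕ s = m, which is why a single change always suffices.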

4.5.1. F5 ALGORITHM:

Input: message, shared secret, cover image
Output: stego image
initialize PRNG with shared secret
permutate DCT coefficients with PRNG
determine k from image capacity
calculate code word length n ← 2^k – 1
while data left to embed do
    get next k-bit message block
    repeat
        G ← {n non-zero AC coefficients}
        s ← k-bit hash f of LSB in G
        s ← s ⊕ k-bit message block
        if s ≠ 0 then
            decrement absolute value of DCT coefficient Gs
            insert Gs into stego image
        end if
    until s = 0 or Gs ≠ 0
    insert DCT coefficients from G into stego image
end while

4.5.2. IMPLEMENTATION OF F5:
First, the DCT coefficients are permuted by a keyed pseudo-random number generator (PRNG), then arranged into groups of n while skipping zero and DC coefficients. The message is split into k-bit blocks. For every message block m, we get an n-bit code word a by concatenating the least significant bit of the current coefficients’ absolute value. If the message block m and the decoding f(a) are the same, the message block can be embedded without any changes; otherwise, we use s = m ⊕ f(a) to determine which coefficient needs to change (its absolute value is decremented by one). If the coefficient becomes zero, shrinkage happens, and it is discarded from the coefficient group. The group is filled with the next nonzero coefficient and the process repeats until the message can be embedded. For smaller messages, matrix encoding lets F5 reduce the number of changes to the image. For example, for k = 3, every change embeds 3.43 message bits while the total code size more than doubles. Because F5 decrements DCT coefficients, the sum of adjacent coefficients is no longer invariant, and the χ2-test cannot detect F5-embedded messages. However, Fridrich and her group presented a steganalytic method that does detect images with F5 content. They estimated the cover-image histogram from the stego image and compared statistics from the estimated histogram against the actual histogram. As a result, they found it possible to get a modification rate β that indicates if F5 modified an image.
Fridrich and her colleagues’ steganalysis determined how F5’s embedding step changes the cover image’s AC coefficients. Let

huv(d) := |{F(u,v)| d = |F(u,v)|, u + v ≠ 0}|

be the total number of AC DCT coefficients in the


cover image with frequency (u,v) whose absolute
value equals d. Huv(d) is the corresponding
function for the stego image. If F5 changes n AC
coefficients, the change rate β is n/P, where P is the
total number of AC coefficients. As F5 changes
coefficients pseudo randomly, we expect the
histogram values for the stego image to be

Huv(d) < (1 – β)huv(d) + β huv(d + 1), for d > 0


Huv(0) < huv(0) + β huv(1), for d = 0.

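Fridrich and her group used this estimate to calculate the expected change rate β from the cover image histogram. They found the best correspondence when using d = 0 and d = 1 because these coefficient values change the most during the embedding step. This leads to the approximation

\[ \beta_{uv} = \frac{h_{uv}(1)\,[H_{uv}(0) - h_{uv}(0)] + [H_{uv}(1) - h_{uv}(1)]\,[h_{uv}(2) - h_{uv}(1)]}{h_{uv}(1)^2 + [h_{uv}(2) - h_{uv}(1)]^2} \]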
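A minimal sketch of this estimate follows. It assumes the two histogram relations above are treated as equalities and that β is fitted by least squares over d = 0 and d = 1 only; the function names and histogram counts are illustrative, not taken from the authors' implementation.

```python
def expected_stego_histogram(h, beta):
    """Apply the histogram model: H(0) = h(0) + beta*h(1) and
    H(d) = (1 - beta)*h(d) + beta*h(d + 1) for d > 0."""
    H = [h[0] + beta * h[1]]
    for d in range(1, len(h) - 1):
        H.append((1 - beta) * h[d] + beta * h[d + 1])
    return H

def estimate_beta(h, H):
    """Least-squares change rate from cover counts h[0..2] and stego counts H[0..1].
    Minimises (H0 - h0 - b*h1)^2 + (H1 - h1 - b*(h2 - h1))^2 over b.
    Assumes the denominator is non-zero (i.e. the histogram is not degenerate)."""
    numerator = h[1] * (H[0] - h[0]) + (H[1] - h[1]) * (h[2] - h[1])
    denominator = h[1] ** 2 + (h[2] - h[1]) ** 2
    return numerator / denominator

if __name__ == "__main__":
    cover = [1000, 400, 200, 100]          # made-up counts for |coefficient| = 0..3
    stego = expected_stego_histogram(cover, 0.25)
    print(stego, estimate_beta(cover, stego))
```

In practice the cover histogram h is not available and has to be estimated from the stego image, so the recovered value is only as good as that estimate.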
The final value of β is calculated as the average of βuv for the frequencies (u,v) ∈ {(1,2),(2,1),(2,2)}.
The histogram values for the cover image are unknown and must be estimated from the stego image. We do this by decompressing the stego image into the spatial domain. The resulting image is then cropped by four pixels on each side to move the errors at the block boundaries. We recompress the cropped image using the same quantization tables as the stego image, getting the estimates for the cover image histogram from the recompressed image. Because many images are stored already in the JPEG format, embedding information with F5 leads to double compression, which could confuse this detection algorithm. Fridrich and her group proposed a method for eliminating the effects of double compression by estimating the quality factor used to compress the cover image. Unfortunately, they based their evaluation of the detection algorithm on only 20 images. To get a better understanding of its accuracy, we present an evaluation of the algorithm based on our own implementation.
The figure above shows the ROC for a test set of 500 nonstego and 500 stego images. In the first test, both types of images are double-compressed due to F5. The only difference is that the stego images contain a steganographic message. Notice that the false-positive rate is fairly high compared to the detection rate. The second test uses the original JPEG images without double compression as reference.

4.6. COLOUR CLIPPING:
Depending on the rounding used in quantization, it is possible that the reconstructed image data may be outside the expected range, which is often known as colour overflowing.
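Colour clipping, as discussed in section 4.6, can be illustrated with a minimal sketch. It assumes 8-bit samples (valid range 0 to 255) and simply counts how many reconstructed values had to be clamped; the function name is hypothetical.

```python
def clip_and_count(samples, low=0, high=255):
    """Clamp reconstructed sample values to [low, high] and count the overflows."""
    clipped = [min(max(s, low), high) for s in samples]
    overflow = sum(1 for s in samples if s < low or s > high)
    return clipped, overflow

if __name__ == "__main__":
    reconstructed = [-3, 0, 128, 255, 260]   # made-up values after an inverse DCT
    print(clip_and_count(reconstructed))
```

A clipping count that is unusually high for the image's quality setting is the kind of side effect that can hint at embedding.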
Many JPEG steganographic techniques embed the message into the quantized DCT coefficients in the compression process. For example, JSteg embeds the message by modulating the least significant bit of the non-zero and non-one quantized DCT coefficients. This embedding process, as well as the rounding, degrades the image quality. Accordingly, the number of colour clippings in the decompression process also increases.

4.7. TRANSLATION BASED STEGANOGRAPHY:

4.7.1. PROTOCOL:
The sender first needs to obtain a cover in the source language. The cover does not have to be secret and can be obtained from public sources, for example, a news website. The sender then translates the sentences in the source text into the target language using the steganographic encoder. The steganographic encoder essentially creates multiple translations for each sentence and selects one of these to encode bits from the hidden message. The translated text is then transmitted to the receiver, together with information that is sufficient to obtain the source text. This can either be the source text itself or a reference to the source. The receiver then also performs the translation of the source text using the same steganographic encoder configuration. By comparing the resulting sentences, the receiver reconstructs the bit stream of the hidden message. As with most steganographic systems, the hidden message itself can be encrypted with a secret key, making it harder for the adversary to perform guessing attacks on the secret configuration.

4.7.2. SELECTING A TRANSLATION:
When selecting a translation to encode the hidden message, the encoder first builds a Huffman tree of the available translations using the probabilities assigned by the generator algorithm. Then the algorithm selects the sentence that corresponds to the bit-sequence that is to be encoded.
Using a Huffman tree to select sentences in accordance with their translation quality estimate ensures that sentences that are assumed to have a low translation quality are selected less often. Furthermore, the lower the quality of the selected translation, the higher the number of transmitted bits. This reduces the total amount of cover text required and thus the amount of text the adversary can analyze. The encoder can use a lower limit on the relative translation quality to eliminate sentences from consideration where the estimated translation quality is below a certain threshold, in which case that threshold becomes part of the shared secret between sender and receiver.
Figure showing Huffman tree

4.7.3. PRODUCING TRANSLATIONS:
The first step for both sender and receiver after obtaining the source text is to produce multiple translations of the source text using the same algorithm. The goal of this step is to deterministically produce multiple different translations of the source text. The simplest approach to achieve this is to apply (a subset of) all available machine translation (MT) systems on each sentence in the source text. If the parties have full access to the code of a statistical MT system, they can generate multiple MT systems from the same codebase by training it with different corpora. In addition to generating different sentences using multiple translation systems, it is also possible to apply post-processing on the resulting translations to obtain additional variations. Such post-processing includes transformations that mimic the noise inherent in any (MT) translation.

4.7.4. KEEPING SOURCE TEXT SECRET:
The presented scheme can be adapted to be suitable for watermarking where it would be desirable to keep the source text secret. This can be achieved as follows.
The encoder computes a (cryptographic) hash of each translated sentence. It then selects a sentence such that the last bit of the hash of the translated sentence corresponds to the next bit in the hidden message that is to be transmitted. The decoder then just computes the hash codes of the received sentences and concatenates the respective lowest bits to obtain the hidden message. This scheme assumes that sentences are long enough to almost always have enough variation to obtain a hash with the desired lowest bit. Error-correcting codes must be used to correct errors whenever none of the sentences produces an acceptable hash code. Using this variation reduces the bitrate that can be achieved by the encoding. More details on this can be found in our technical report.

5. SOFTWARE AVAILABLE:

5.1. STEGANOGRAPHY ANALYZER ARTIFACT SCANNER (StegAlyzerAS):
StegAlyzerAS gives you the capability to scan the entire file system, or individual directories, on suspect media for the presence of Steganography application artifacts. And, unlike other popular forensic tools, you can perform an automated or manual search of the Windows Registry to determine whether or not any Registry keys or values exist that can be associated with a particular Steganography application.

5.2. STEGANOGRAPHY ANALYZER SIGNATURE SCANNER (StegAlyzerSS):
StegAlyzerSS gives you the capability to scan every file on the suspect media for the presence of hexadecimal byte patterns, or signatures, of particular Steganography applications in the files. If a known signature is detected, it may be possible to extract information hidden with the Steganography application associated with the signature.

5.3. DIGITAL INVISIBLE INK TOOLKIT:
This project provides a simple Java-based steganography tool that can hide a message inside a 24-bit color image so that knowing how it was embedded, or performing statistical analysis, does not make it any easier to find the concealed information.
Apart from these, numerous other software tools are available for different types of Steganography.

• Hidden directories can be created that are not included in the allocation tables of the main operating system. Files are stored in these directories through a ghost or mirror OS directory structure that is managed by the software.
When the file containing the embedded information is provided to the recipient, only the correct password and decoding algorithm will produce the decoding sequence or decrypt the embedded file. The most popular piece of software used to perform this kind of Steganography in documents is SNOW.
a) picture of a Word document before using SNOW
b) picture of a Word document after using SNOW
As seen in both the images, there is practically no visible change in the document before and after using SNOW.
An overview of the different types of software available for Steganography:
• Gif-It-Up: Hides information in GIF carrier files
• JPHide-&-Seek: Hides information in JPEG carrier files
• MP3Stego: Hides information in MP3 carrier files
• S-Tools: Hides information in BMP, GIF, or WAV carrier files
• Stash: Hides information in BMP, PCX, PNG, and TIFF carrier files
• Stegotif: Hides information in TIFF carrier files
• Stegowav: Hides information in WAV carrier files

6. DETECTION AND RECOVERY ANALYSIS:

Steganalysis is the art and science behind the detection of the use of Steganography by a third party. Embedding typically leaves statistical distortions in the carrier, and the process of finding these distortions is called statistical steganalysis. The basic function of steganalysis is to first detect or estimate the probability that hidden information is present in any given file. The detection and estimation is based only on the data presented in its observable form.

6.1. STEGANALYSIS:
Simply detecting the presence of hidden data may not be sufficient; steganalysis also covers the functions of extracting the message, disabling and/or destroying the hidden message so that it cannot be extracted, and finally, altering the hidden message such that misinformation can be sent to the intended recipient instead of the original message.

Steganalysis attack methods can be broken into six types:

6.1.1. Steganography-only attack: Only the file with the embedded data is available for analysis.

6.1.2. Known-carrier attack: Both the original carrier file and the final (hidden message embedded) file are available for analysis.

6.1.3. Known-message attack: The original message prior to embedding in the carrier is known.

6.1.4. Chosen-Steganography attack: Both the algorithm used to embed the data and the final (hidden message embedded) file are known and available for analysis.

6.1.5. Chosen-message attack: The original message and the algorithm used to embed the message are available, but neither the carrier nor the final (hidden message embedded) file is. This attack is used by the analyst for comparison to future files.

6.1.6. Known-Steganography attack: All components of the system (the original message, the carrier message, and the algorithm) are available for analysis.

The success of any steganalysis technique relies on the amount of information known about the file prior to investigation. As more information about the file is known prior to investigation, the investigator can move from simply detecting to modifying or altering the hidden message before sending it on to the intended recipient.

In a Steganography-only attack, the purpose is only to detect the existence of a message; without knowing the message beforehand, recovering it takes an excessive amount of time.

If we know both the carrier file and the final file, we can recover the embedded changes simply by comparing the two files. The Steganography-only attack can be accomplished through the use of statistical analysis performed on the final medium.
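One such statistic can be sketched as follows. It is a simplified version of the pairs-of-values idea behind the χ2-test referred to earlier: LSB replacement tends to equalise the counts of the two values in each pair (2i, 2i+1), so a statistic close to zero is suspicious. The function name is hypothetical, and no p-value is computed here.

```python
from collections import Counter

def pov_chi_square(values):
    """Chi-square statistic comparing the count of each even value 2i with the
    mean count of its pair (2i, 2i+1). Returns (statistic, degrees_of_freedom)."""
    counts = Counter(values)
    pair_ids = sorted({v // 2 for v in counts})
    statistic = 0.0
    used_pairs = 0
    for p in pair_ids:
        even = counts.get(2 * p, 0)
        odd = counts.get(2 * p + 1, 0)
        expected = (even + odd) / 2
        if expected > 0:
            statistic += (even - expected) ** 2 / expected
            used_pairs += 1
    return statistic, max(used_pairs - 1, 0)

if __name__ == "__main__":
    natural = [0] * 30 + [1] * 10      # uneven pair, as in unmodified data
    embedded = [0, 1] * 20             # equalised pair, as after LSB replacement
    print(pov_chi_square(natural), pov_chi_square(embedded))
```

A full test would convert the statistic into a probability with the χ2 distribution for the given degrees of freedom; that step is omitted to keep the sketch dependency-free.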

EXAMPLE:
The colour contents of JPEG images are examined. A modification to each coefficient’s LSB produces variations in the data that result in deviations in the histogram for the given file. If the deviations are large enough to produce noticeable aberrations, the embedded file’s histogram can identify the existence of the hidden message. Likewise, LSB modifications to palette-based images (GIF, etc.) cause duplications of the colours in the palette with identical or nearly identical colours appearing. This duplication of colours can also serve as an indicator pointing to the existence of hidden data. When examining the greyscale histograms for an original and a steganographically embedded JPEG (such as in Figure 4), slight deviations in the histograms are noticeable. The greyscale histogram provides a cumulative value for all three colour channels (red, green, and blue) at each brightness level (0-255). Figure 5 compares the same photograph in its original form (containing 42,784 colours) to an embedded version of the file (containing 42,886 colours).
The arrows in the embedded histogram indicate two obvious differences in the waveform. Steganalysis takes this phenomenon one step further by comparing the normalized distribution of colours against a predicted value. For palette based images, a normal distribution of colour frequency is likely. A scalable standard bell curve can be assumed as the comparison benchmark against the suspect file. Changes to the LSBs for any given pixel can create duplicate (or near duplicate) colours in the image’s colour palette. The duplicate colours increase the frequency for that value and can create a spike in the distribution exceeding the benchmark reference. Any large deviations from the benchmark can be an indicator of anomalies or modifications to the contents of the file.
The process for JPEGs can be a bit more complicated. Because the JPEG format does not
use a palette based encoding algorithm, a second
step is necessary to compare DCT frequency to a
benchmark. Algorithms that sequentially modify
the DCT coefficients in JPEG files tend to cause
distortions in the histogram that flatten out the
frequency values of adjacent DCTs. To compensate
for this issue, newer algorithms do not sequentially
embed the data but rather use a password or key to generate a random order for DCT or LSB modifications.
Westfeld and Pfitzmann used a test to predict the probability that an image contained steganographic content by comparing the expected distribution (the null hypothesis) against the sampled values. If the measured value produced a deviation from the expected, then the amplitude of the deviation was proportional to the probability of steganographic content at that point in the file. Because their algorithm ran on sequential bytes with an increasing sample size for each calculation, when the probability dropped, the size of the hidden message was often revealed as well.
After detection of hidden content within a carrier file, the next step is recovery of the hidden message itself. Because modifications to the data comprising the carrier file are made without incorporating a mapping back to the original values, recovery of the original carrier file is difficult and sometimes impossible. For digital pictures, audio, video, and even file slack space, steganographic modifications to the original contents often destroy the integrity of the carrier file in the process.

6.2. STEGANOGRAPHY DETECTION ON INTERNET:
To assess claims that steganographic content is regularly posted to the Internet, we use a steganography detection framework that gets JPEG images off the Internet and uses steganalysis to identify subsets of the images likely to contain steganographic content.

6.2.1. DETECTION FRAMEWORK:
Stegdetect is an automated utility that can analyze JPEG images that have content hidden with JSteg, JPHide, and OutGuess 0.13b. Stegdetect’s output lists the steganographic systems it finds in each image or writes “negative” if it couldn’t detect any. Stegdetect’s false-negative rate depends on the steganographic system and the embedded message’s size. The smaller the message, the harder it is to detect by statistical means. Stegdetect is very reliable in finding images that have content embedded with JSteg. For JPHide, detection depends also on the size and the compression quality of the JPEG images. Furthermore, JPHide 0.5 reduces the hidden message size by employing compression. For JSteg, we cannot detect messages smaller than 50 bytes. The false-negative rate in such cases is almost 100 percent. However, once the message size is larger than 150 bytes, our false-negative rate is less than 10 percent. For JPHide, the detection rate is independent of the message size, and the false-negative rate is at least 20 percent in all cases. Although the false-negative rate for OutGuess is around 60 percent, a high false-negative rate is preferable to a high false-positive rate.

6.2.2. VERIFYING CONTENT:
The statistical tests used to find steganographic content in images indicate nothing more than a likelihood that content is embedded. Because of that, Stegdetect cannot guarantee a hidden message’s existence. To verify that the detected images have hidden content, Stegbreak must launch a dictionary attack against the JPEG files. JSteg-Shell, JPHide, and OutGuess all hide content based on a user-supplied password, so an attacker can try to guess the password by taking a large dictionary and trying to use every single word in it to retrieve the hidden message. In addition to message data, the three systems also embed header information, so attackers can verify a guessed password using header information such as message length. For a dictionary attack28 to work, the steganographic system’s user must select a weak password.
6.2.3. DISTRIBUTED DICTIONARY ATTACK:
Stegbreak is too slow to run a dictionary attack against JPHide on a single computer. Because a dictionary attack is inherently parallel, distributing it to many workstations is possible. To distribute Stegbreak jobs and data sets, we developed Disconcert, a distributed computing framework for loosely coupled workstations. There are two natural ways to parallelize a dictionary attack: each node is assigned its own set of images or each node is assigned its own part of the dictionary. With more words existing than images, the latter approach permits finer segmentation of the work. To run the dictionary attack, Disconcert hands out work units to workstations in the form of an index into the dictionary. After a node completes a work unit, it receives a new index to work on.

7. APPLICATIONS:
The three most popular and researched uses for steganography in an open systems environment are
• Covert channels,
• Embedded data and
• Digital watermarking.

7.1. COVERT CHANNELS:
Covert channels in TCP/IP involve masking identification information in the TCP/IP headers to hide the true identity of one or more systems. This can be very useful for any secure communications needs over open systems such as the Internet when absolute secrecy is needed for an entire communication process and not just one document. Using containers (cover messages) to embed secret messages into is by far the most popular use of Steganography today. This method of Steganography is very useful when a party must send a top secret, private or highly sensitive document over an open systems environment such as the Internet.

7.2. EMBEDDED DATA:
By embedding the hidden data into the cover message and sending it, you can gain a sense of security from the fact that no one other than the intended recipients knows you have sent more than a harmless message.

7.3. DIGITAL WATERMARKING:
Although not a pure steganographic technique, digital watermarking is very common in today's world and does use Steganographic techniques to embed information into documents. Digital watermarking is usually used for copyright reasons by companies or entities that wish to protect their property by either embedding their trademark into their property or by concealing serial numbers/license information in software, etc. Digital watermarking is very important in the detection and prosecution of software pirates/digital thieves.

The Hydan programme by Rakan El-Khalil exploits redundancy in the Intel x86 instruction set. The instruction addl $20, %eax can be altered to subl $-20, %eax and vice versa. If add is then taken to mean logical 1 and sub logical 0, this gives a solid base for a stegosystem.
8. DENYING STEGANOGRAPHY:
Proving that the files contain differences can be done through the use of a cryptographic hashing algorithm that verifies differences indeed exist. The use of a guard processor at the entry and exit point(s) of the system's network could accomplish this task. An MD5 128-bit hash provides a high degree of confidence that different inputs produce different hash outputs. Thus, differences in MD5 hashes provide a high level of certainty that the given inputs (the binary contents composing the two photos) contain differences. The photograph with the embedded data (Kids_steg.bmp) is the same size, contains the same number of pixels, and has the same depth as the original, but the binary contents of the file are different from the original (Kids_orig.bmp).
a) Kids_orig b) Kids_steg
Visually, the two files appear to be identical, but the MD5 sum provides credible evidence that that is not the case.
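The hash comparison can be sketched with Python's standard hashlib module. The byte strings below are made-up stand-ins for the two bitmap files; in practice the bytes would be read from Kids_orig.bmp and Kids_steg.bmp.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex digest of the MD5 hash of the given bytes."""
    return hashlib.md5(data).hexdigest()

def flip_lsb(data: bytes, index: int) -> bytes:
    """Return a copy of data with the least-significant bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 1
    return bytes(changed)

# Stand-in "images": same length, differing in a single least-significant bit.
original = bytes(range(256)) * 4
stego = flip_lsb(original, 100)

if __name__ == "__main__":
    print(len(original) == len(stego))   # same size
    print(md5_hex(original))
    print(md5_hex(stego))                # a different digest reveals the change
```

A matching size combined with a different digest is exactly the situation described for the two photographs: the files look alike, but their binary contents differ.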
To illustrate how to defeat the steganographic
mechanism, the final file (Kids_steg.bmp) was
converted into a JPEG by opening it in Microsoft
Paint and using the “save as” feature to save it in
the JPEG format. Note that the MD5 sum of
Kids_steg.jpg does not match either the original or
the embedded version of the photograph. An
expected and noticeable reduction in file size is
achieved when using JPEG compression. In this
case, once the final file is converted into a new
format, the embedded message is destroyed and the
covert steganographic channel is effectively denied.
The final step in demonstrating this was to
reconvert the JPEG image back into a bitmap.
Again Microsoft Paint was used
to open the JPEG image and the “save as” feature
was used to save it in the bitmap format. Note that
the recovered image (Kids_recov.bmp) has
identical properties to the original and
steganographic files, but contains a different MD5
sum. The recovered image no longer contains the
hidden message and it is not the same file as the
original. The modifications made to the original file
when the Microsoft Office Excel spreadsheet was
embedded made irrecoverable changes to the bits
defining each pixel.
For video and audio files the process
outlined above remains the same. Convert the file
to another format that requires a conversion, such
as a lossy compression or expansion routine, and
the embedded data will be destroyed in the process.
With the exception of high compression data
formats, the resulting “cleaned” reproduction of the
file should show no noticeable deviations from the
original.
For text based denial techniques, the process can be
a bit more complicated. These steganographic
insertions can be defeated by using standard optical
character recognition (OCR) software to rebuild the
original file from the OCR output. Synonyms can
also be used to replace the awkward text often
found when words are substituted in stepped
character routines. This approach not only denies
the steganographic channel, but leaves the intended
message in the carrier intact and can make the
document more pleasant to read.
The injection of bits into the headers of TCP/IP
packets does not modify the content of the payload
in any way. Steganographic covert channels
utilizing techniques such as this are easily defeated
through the use of monitoring features at the switch
or router level. Malformed packets can be screened
out or modified to conform to a specific rule set.
Consider packets with the do-not-fragment (DF) bit
manipulated so that the packets carry a covert
message. A history or state based rule set could
trigger on packets going to the same destination
under the same protocol but having inconsistent DF
bits. Other network steganography denial
techniques could include a security specification
stating that the DF bit on every packet leaving the
switch/router should have a value of one and all
packets entering should have a value of zero. At a
more rudimentary level (knowing that it could be
detrimental to some fragment sensitive
applications) network security could be achieved
by forcing the above conditions and modifying the
flags.

9. SHORTCOMINGS OF
STEGANOGRAPHY:
Because steganography has gained popularity only
in the past decade, there are many flaws and
vulnerabilities that still need to be addressed.

9.1. REVEALING THE
EXISTENCE OF DATA:
Because steganography modifies an existing file
that is most likely in circulation on the internet, a
bitwise comparison of a given file with the “same”
file suspected of containing hidden information can
reveal the use of steganography. Additionally, two
communicating parties can be easily identified as
communicating covertly if files that normally
would not be exchanged suddenly are. For
example, two business executives frequently
exchanging photographs of cars over a period of
time could arouse suspicion.

9.2. RENDERING DATA
USELESS:
Once a file is identified as possibly containing
hidden data, one can either attempt to recover the
information if the algorithm is known, or destroy
the data without affecting the quality of the original
file. Converting an altered bitmap to JPEG would
compress the file and remove unnecessary bits of
information, thereby removing any hidden data.
Converting to any other format may not necessarily
cause the image to lose information, but would
change the bit composition of the data, making any
hidden data unreadable.

10. FUTURE SCOPE:
Steganography can be used to design a
steganographic file system, where an adversary
cannot deduce the existence of any file. Surely, a raw
disk looks suspicious, but one can only guess
whether there are any files inside and what their
names are. There have been a few proposals in
recent years. Anderson introduced two ideas. In
both schemes the user has to supply a file name and
an associated password to access the desired file.
The first scheme initializes the file system with
several randomly generated cover files. A newly
created object is embedded as the exclusive-or of a
subset of cover files. The subset is selected by the
password and file name.
The second construction fills the whole disk with
random bits. Then the blocks of new objects are
written to absolute disk addresses given by some
pseudorandom process. Another construction was
proposed by HweeHwa Pang. His scheme supports
plain and hidden files at the same time. The position
of the header of a hidden file is located by a hash
value obtained from the file name and password.
The header contains a link to an inode table that
indexes all the data blocks in the hidden file.
Additionally the header is encrypted. Then there
are various blocks whose type and location confuse
an adversary.
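The first file system scheme described above (a hidden object stored as the exclusive-or of a password-selected subset of random cover files) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the block size, the use of SHA-256 to pick the subset, and the function names are hypothetical and are not taken from Anderson's construction. The sketch also stores only a single hidden object, whereas a real system must solve a set of linear equations so that several objects can coexist.

```python
import hashlib
import secrets

BLOCK_SIZE = 32  # bytes per cover file (illustrative value)


def select_subset(password, name, n_covers):
    """Pick cover-file indices from a hash of the file name and password."""
    digest = hashlib.sha256(f"{name}\0{password}".encode("utf-8")).digest()
    bits = int.from_bytes(digest, "big")
    subset = [i for i in range(n_covers) if (bits >> i) & 1]
    return subset or [0]  # never return an empty subset


def xor_bytes(a, b):
    """Byte-wise exclusive-or of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def extract(covers, password, name):
    """XOR together the selected cover files to recover the hidden block."""
    result = bytes(BLOCK_SIZE)
    for i in select_subset(password, name, len(covers)):
        result = xor_bytes(result, covers[i])
    return result


def embed(covers, password, name, data):
    """Adjust one selected cover so the subset XORs to the padded data."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in one block")
    padded = data.ljust(BLOCK_SIZE, b"\0")
    delta = xor_bytes(extract(covers, password, name), padded)
    target = select_subset(password, name, len(covers))[0]
    covers[target] = xor_bytes(covers[target], delta)


covers = [secrets.token_bytes(BLOCK_SIZE) for _ in range(16)]
embed(covers, "correct horse", "notes.txt", b"meet at noon")
print(extract(covers, "correct horse", "notes.txt").rstrip(b"\0"))
```

Because the modified cover file is a random block combined with another value by exclusive-or, it still looks random, which is what keeps an observer from telling whether anything is stored at all.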
Another interesting application of steganography is
developing a scheme where the content is
encrypted with one key and can be decrypted with
several other keys. The relative entropy between
the encryption key and one specific decryption key
corresponds to the amount of information that can
be used to fingerprint the obtained data. Such
fingerprinted data can later be easily tracked. This
idea, together with partial extraction, was presented
at FEE CTU Poster 2004.
A possible problem that the presented
steganographic encoding might face in the future is
significant progress in machine translation. If
machine translation were
to become substantially more accurate, the possible
margin of plausible mistakes might get smaller.
However, one large category of current machine
translation errors results from the lack of context
that the machine translator takes into consideration.
In order to significantly improve existing machine
translation systems, one necessary feature would be
the preservation of context information from one
sentence to the next. Only with that information
will it be possible to eliminate certain errors. But
introducing this context into the machine
translation system also brings new opportunities for
hiding messages in translations. Once machine
translation software starts to keep context, it would
be possible for the two parties that use the
steganographic protocol to use this context as a
secret key. By seeding their respective translation
engines with k bits of context they can make
deviations in the translations plausible, forcing the
adversary to potentially try 2^k possible contextual
inputs in order to even establish the possibility that
the mechanism was used. This is similar to the idea
of splitting the corpus based

11. CONCLUSION:
Steganography is not a threat in general, but it
always remains a POTENTIAL THREAT to
society. It is similar to a deadly weapon: the person
using it and the intended purpose make it useful or
deadly. People should be focusing on
the important aspects of steganography, such as
what it is really used for, instead of believing
propaganda put out by the media.
Computer forensic professionals need to be aware
of the difficulties in identifying the use of
steganography in any investigation. As with many
digital age technologies, steganography techniques
are becoming increasingly sophisticated and
difficult to reliably detect. Once use is detected or
discovered, obtaining the ability to recover the
embedded content is becoming difficult as well.
Acquiring knowledge of current steganographic
techniques, along with their associated data types,
can provide a critical advantage to an investigator
by adding valuable tools to their forensic toolkit.
