Figure 1 (before modification)

By modifying the least significant bit of each sample, it is possible to embed information into the waveform without a significant impact on the graphical representation of the data. By using 7 bits to represent 5 volts of amplitude, we create a relatively small division between values (5 V / 2^7 ≈ 0.04 V). Modifying the least significant bit (LSB) of any datum can only change its reproduced value by that same amount (0.04 V). This imperceptible change means that intentional modifications to the LSB of every sample may go unnoticed and allow data to be embedded into the bit sequence.

Figure 2 (after modification to waveform)

When viewing the waveform after modification, the difference in voltage at any given datum is imperceptible to the naked eye. To illustrate, consider the following bit stream:
01001010, 01001011, 01001100, 01001101, 01001110, 01001111, 01010000, 01010001 …
If we wished to inject the 8-bit message (11110000) into the data, we would set the LSBs of the eight bytes in the above bit stream to 1, 1, 1, 1, 0, 0, 0, 0, respectively. Only four of the eight bytes (the first, third, sixth, and eighth) actually change, because the other four already carry the required bit.

3. TYPICAL STEGOSYSTEMS:
Any file that requires multiple bits to reasonably quantify its message, such that minor changes to the data are imperceptible when the file is presented in final form, is an acceptable candidate for a carrier. Three different aspects of information-hiding systems contend with each other: capacity, security, and robustness. Capacity refers to the amount of information that can be hidden in the cover medium, security to an eavesdropper's inability to detect hidden information, and robustness to the amount of modification the stego medium can withstand before an adversary can destroy hidden
information. When considered broadly, there are two types of stegosystems:
A. Passive Warden
B. Active Warden

3.1. PASSIVE WARDEN:
A passive warden only monitors the communication channel. He can pass the cover texts through several statistical tests but does not modify them. This is the same situation as network packets passing through an Intrusion Detection System.

3.2. ACTIVE WARDEN:
An active warden manipulates cover texts in order to preclude the possibility of hidden communication. Typical real-life applications are watermarking and fingerprinting. A watermark is a small piece of embedded information that can prove ownership of copyrighted material. A fingerprint is very similar, but is intended to track a specific copy of copyrighted data.

Digital Stegosystems: image, video, and sound files; data can even be embedded in standard TCP/IP packet headers. The most common image formats include BMP, GIF, and JPEG. The majority of software applications designed for steganography use the JPEG image file format as the carrier.

3.3. TEXT & PRINTED DOCUMENTS:
Text documents of all types can contain embedded messages that are difficult if not impossible to locate. This paragraph contains a hidden message that can be decoded using a decoding key provided at the end. In the case of this paragraph, the decoding is performed by referencing a character in each line by its position and using the character's numeric location as the key to the hidden message. This type of data embedding is comparable to one-time pad cryptography, where a key is used to extract the message from a stream of data. Steganography is not an encryption methodology, but rather the means by which to conceal the message. The message contained in this paragraph reads "secret" and is decoded using the following key: (14, 3, 21, 2, 2, 11). The letters in the paragraph above are set in blue for ease of location.

We can also embed a letter in a string just by moving the position of a letter in a word by a very small distance. For example:

Consider embedding the binary value of the ASCII letter "T" (01010100) into the word "Singular." We can inject the binary string by varying the spacing between the letters to indicate a zero or a one. For comparison, a fixed or naturally spaced version of the word is displayed below the encoded version. Grey lines have been added to more easily identify the characters that have been shifted to represent a binary value of one. In the example below, all non-shifted (i.e., normally spaced and not touching the reference line) characters are assumed to represent a zero.

Note that the "i", "g", and "l" are touching grey lines, thus indicating a high state, or the binary value one, for that position. When pieced back together, the values are as follows: S-0, i-1, n-0, g-1, u-0, l-1, a-0, r-0, or 01010100.

Other methods include:
• Stepped character approach (where the message is conveyed with embedded characters separated by a fixed number of characters, or constant step).
• Addition or subtraction of white space and/or carriage returns at the end of every line.

Example of stepped character approach:
Consider encoding "secret words" into a carrier sentence using a seven-character stepping algorithm (again, the characters are in blue typeset for clarity).
"This is much easier curing roses; each petal had too few dewdrop awards for sicknesses …"

The primary focus of the linguistic steganography field is automating the selection of synonyms for common words to embed data in writing in a way that eliminates unnatural and/or awkward wording.

3.4. STILL AND MOTION IMAGE FILES:
Images consist of pixels with contributions from the primary colors (red, green, and blue) adding to the total color composition of the pixel. Most images are represented as triples (red contribution, green contribution, blue contribution). Depending on the depth of color desired in the final image, each component is represented by a separate number of bits. In the case of a 24-bit bitmap, each color component has eight bits.

When closely viewing any specific color, single-unit modifications to the contribution level are imperceptible to the human eye (i.e., a pixel with a value of (255, 255, 0) is indistinguishable from (254, 255, 0)). Some picture formats (such as GIF) embed information pertaining to the visual representation in a color palette, which affects all of the bit layers in the image. Steganographic systems that use these formats and modify the LSB to embed data often impart noticeable changes to the reproduced image, which serve as a cue indicating the existence of embedded data. Because JPEG images do not distribute the image information across the entire image, but rather into discrete 8 × 8 pixel blocks, the format is less susceptible to those visual attacks.
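Both the waveform bit stream in the opening example and the 24-bit pixel example above rely on the same operation: overwriting the least significant bit of successive 8-bit values with message bits. The following is a minimal sketch, assuming the carrier is already available as a list of integers (reading real audio or bitmap files is out of scope); the function names are illustrative only.

```python
def embed_lsb(values, bits):
    """Overwrite the LSB of successive values with message bits (one bit per value)."""
    if len(bits) > len(values):
        raise ValueError("carrier too short for the message")
    out = list(values)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return out


def extract_lsb(values, n_bits):
    """Recover the first n_bits message bits from the LSBs."""
    return [v & 1 for v in values[:n_bits]]


# The bit stream and the 8-bit message 11110000 from the waveform example.
stream = [0b01001010, 0b01001011, 0b01001100, 0b01001101,
          0b01001110, 0b01001111, 0b01010000, 0b01010001]
message = [1, 1, 1, 1, 0, 0, 0, 0]
stego = embed_lsb(stream, message)
print([f"{v:08b}" for v in stego])
# ['01001011', '01001011', '01001101', '01001101', '01001110', '01001110', '01010000', '01010000']
```

Applied to one 24-bit pixel, `embed_lsb([255, 255, 0], [0, 1, 0])` yields `[254, 255, 0]`, the pair of colours used as the example above.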
3.4.1. REASON FOR USING MOSTLY JPEG:
When using the LSB of the discrete coefficients for any given block, a modification to a single coefficient affects the values of each of the 64 discrete pixels in that block. This translates into 64 minor changes to a single block, which results in a smoother colour transition between blocks.

If the capability to generate and read the contents of well-formed TCP/IP packets exists at both ends of a communication channel, it is possible for the two parties to pass hidden data covertly. The exploitation modifies several of the fields in standard TCP/IP packet headers to carry information over a covert channel.
Specifically, the flags field of the IPv4 header and the sequence number field of the TCP header are particularly susceptible to manipulations that serve this purpose.
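The flags field singled out above can carry one covert bit per packet through its DF bit. A minimal sketch, assuming every cover packet stays below the path MTU so that the DF value never affects delivery: it only builds and parses raw 20-byte IPv4 headers with `struct`, since actually sending them would need raw sockets and privileges. The addresses (documentation ranges), protocol number, and payload length are arbitrary illustrative choices.

```python
import socket
import struct

DF_MASK = 0x4000  # DF is bit 14 of the 16-bit flags/fragment-offset word


def checksum(header: bytes) -> int:
    """Standard IPv4 one's-complement header checksum."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(bit, ident, payload_len=20, src="192.0.2.1", dst="198.51.100.7"):
    """Build an IPv4 header whose DF flag carries one covert bit."""
    flags_frag = DF_MASK if bit else 0
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + payload_len,  # version/IHL, TOS, total length
                         ident, flags_frag,           # identification, flags + offset
                         64, 6, 0,                    # TTL, protocol (TCP), checksum=0
                         socket.inet_aton(src), socket.inet_aton(dst))
    csum = checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]


def encode(bits):
    """One header per message bit (identification must stay below 65536)."""
    return [build_header(bit, ident=i) for i, bit in enumerate(bits)]


def decode(headers):
    """Read the DF flag back out of each header."""
    return [1 if struct.unpack("!H", h[6:8])[0] & DF_MASK else 0 for h in headers]
```

A receiver that sees only ordinary-looking headers, each with a valid checksum, recovers the bits by reading DF; whether DF toggling stands out to a warden is a separate question this sketch does not address.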
The flags field in the IPv4 header contains three bits: a reserved bit, a DF (don't fragment) bit, and finally an MF (more fragments) bit. Provided that the parties wishing to communicate covertly both know the maximum transmission unit (MTU) of the network, they can manipulate the flags field to carry a message within standard TCP/IP packets that contain innocuous data. As long as the total packet size stays below the MTU, modifying the DF bit has no impact on the transmission of the cover message, whereas packets that exceed the MTU with the DF flag set are dropped as undeliverable and the sender is notified. By keeping the size of the cover packets below the MTU for the given network, the DF flag can therefore be assigned arbitrarily, allowing the field to carry binary data covertly. Conversely, by sending packets that exceed the MTU, the sender is able to transmit packets that may arrive out of sequence and can thus convey binary information through the order of packet arrival.

If the order of arrival of the cover message packets is unimportant, individuals can take advantage of the packet sequence number field to relay covert information to the receiver. An algorithm that packetizes the cover message and then transmits the packets in an altered sequence can indicate binary data either by transmitting a sequential packet to indicate a zero or an out-of-sequence packet to indicate a one. Further, a program that controls the composition of packets can transmit all packets in the correct order but modify the sequence number of each transmitted packet when needed (to indicate a change in state). Both of these approaches can embed binary data inside an overt communications channel.

4. STEGANOGRAPHY TECHNIQUES:
Christian Cachin proposed an information-theoretic model for steganography that considers the security of steganographic systems against passive eavesdroppers. In this model, we assume that the adversary has complete knowledge of the encoding system but does not know the secret key. His or her task is to devise a model for the probability distribution $P_C$ of all possible cover media and $P_S$ of all possible stego media. The adversary can then use detection theory to decide between hypothesis C (that a message contains no hidden information) and hypothesis S (that a message carries hidden content). A system is perfectly secure if no decision rule exists that can perform better than random guessing. Essentially, steganographic communication senders and receivers agree on a steganographic system and a shared secret key that determines how a message is encoded in the cover medium.

4.1. METHODS IN SPATIAL DOMAIN:
A basic classification of steganographic algorithms operating in the spatial domain, based on the method for selecting the pixels, distinguishes three main types: non-filtering algorithms, randomized algorithms, and filtering algorithms.

4.1.1. NON-FILTERING ALGORITHM:
This is the simplest steganographic method based on the use of the LSB, and therefore the most vulnerable. The embedding process consists of the sequential substitution of the least significant bit of each image pixel with a bit of the message. Because of its simplicity, this method can camouflage a great volume of information. Extraction is equally simple: only a sequential LSB reading, starting from the first image pixel, is needed to extract the secret message. This method also generates an unbalanced distribution of the changed pixels, because the message is embedded in the first pixels of the image, leaving the remaining pixels unchanged.

4.1.2. RANDOMIZED ALGORITHM:
This technique was born as a solution to the problems of the previous method. The sender and the receiver of the image each have a password, called the stego-key, that is employed as the seed for a pseudo-random number generator. This creates a sequence that is used as the index for accessing the image pixels: each message bit is embedded in the cover-image pixel at the index given by the pseudo-random number generator. The two main features of the pseudo-random permutation methods are the use of a password to access the message and the spreading of the message bits over the whole image.

4.1.3. FILTERING ALGORITHM:
This algorithm filters the cover image using a default filter and hides information in the areas that obtain a better rate. The filter is applied to the most significant bits of every pixel, leaving the less significant bits to hide information. The filter ensures that the chosen areas of the image are those least affected by the inclusion of information, which makes the presence of hidden messages more difficult to detect [10]. Retrieval of the information is ensured because the bits used for filtering are not changed, so reapplying the filter selects the same bits as in the process of concealment. Of the three types, it is the most efficient method for hiding information.

The algorithm SLSB belongs to this type.
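The non-filtering (4.1.1) and randomized (4.1.2) algorithms differ only in the order in which pixels are visited. A minimal sketch, assuming a flat list of 8-bit grey values and using Python's `random.Random` seeded with the stego-key as the pseudo-random number generator; that generator is a simplification, since it is not cryptographically secure. The filtering algorithm (4.1.3) is not covered, because its filter is not specified here.

```python
import random


def visit_order(n_pixels, stego_key=None):
    """Pixel indices in embedding order.

    No key: sequential order, as in the non-filtering algorithm.
    With a key: a permutation from a PRNG seeded with the stego-key.
    """
    order = list(range(n_pixels))
    if stego_key is not None:
        # Mersenne Twister: adequate for a sketch, not for real secrecy.
        random.Random(stego_key).shuffle(order)
    return order


def embed(pixels, bits, stego_key=None):
    """Write one message bit into the LSB of each visited pixel."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover capacity")
    stego = list(pixels)
    for bit, idx in zip(bits, visit_order(len(pixels), stego_key)):
        stego[idx] = (stego[idx] & ~1) | bit
    return stego


def extract(stego, n_bits, stego_key=None):
    """Read the LSBs back in the same order the sender used."""
    order = visit_order(len(stego), stego_key)
    return [stego[idx] & 1 for idx in order[:n_bits]]
```

Without a key every change lands in the first pixels and the rest of the image is untouched, which is the unbalanced distribution noted in 4.1.1; with a key the same bits are scattered, and only a receiver holding the key regenerates the order.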
Description of the Algorithm SLSB:
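$$F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
where f(x, y) is the 8 × 8 block of pixel values, C(x) = 1/√2 (≈ 1/1.414) when x = 0, and C(x) = 1 for all other values of x.

Embedded information in a JPEG. (a) The unmodified original picture; (b) the picture with the first chapter of The Hunting of the Snark embedded in it.

The coefficients are then quantized by a follow-on function. With Q(u, v) denoting the quantization table entry for each element of the block, the following formula is used:
$$F^{Q}(u,v) = \operatorname{round}\left(\frac{F(u,v)}{Q(u,v)}\right)$$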
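A minimal sketch of the JPEG 8 × 8 forward DCT with the C(x) weighting defined in the text, the quantization by Q(u, v), and the matching inverse transform. The block and the flat quantization table (every entry 16) are hypothetical stand-ins for real image data and a real JPEG table. The last helper checks the claim from Section 3.4.1: flipping the LSB of a single quantized coefficient alters all 64 reconstructed pixel values.

```python
import math


def c(k):
    """DCT weighting: C(0) = 1/sqrt(2), C(k) = 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(x, y, u, v):
    return (math.cos((2 * x + 1) * u * math.pi / 16)
            * math.cos((2 * y + 1) * v * math.pi / 16))


def dct2(block):
    """Forward DCT F(u, v) of an 8x8 pixel block f(x, y)."""
    return [[0.25 * c(u) * c(v) * sum(block[x][y] * _basis(x, y, u, v)
                                      for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct2(coeffs):
    """Inverse DCT: rebuild f(x, y) from F(u, v)."""
    return [[0.25 * sum(c(u) * c(v) * coeffs[u][v] * _basis(x, y, u, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


def quantize(coeffs, table):
    """F^Q(u, v) = round(F(u, v) / Q(u, v)); Python's round() sends halves to even."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]


def dequantize(q, table):
    return [[q[u][v] * table[u][v] for v in range(8)] for u in range(8)]


def pixels_changed_by_lsb_flip(q, table, u, v, tol=1e-6):
    """Count reconstructed pixels that change when the LSB of q[u][v] is flipped."""
    flipped = [row[:] for row in q]
    flipped[u][v] ^= 1
    before = idct2(dequantize(q, table))
    after = idct2(dequantize(flipped, table))
    return sum(abs(before[x][y] - after[x][y]) > tol
               for x in range(8) for y in range(8))


# Hypothetical gradient block and flat quantization table.
block = [[(3 * x + 5 * y) % 256 for y in range(8)] for x in range(8)]
table = [[16] * 8 for _ in range(8)]
q = quantize(dct2(block), table)
print(pixels_changed_by_lsb_flip(q, table, 2, 3))  # 64
```

For coefficient (2, 3), none of the cosine factors vanishes at any pixel position, so a one-unit change in the quantized value (16 units after dequantization) moves every pixel of the block, each by less than about 4 grey levels.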
4.3. SEQUENTIAL:
Derek Upham's JSteg was the first publicly available steganographic system for JPEG images. Its embedding algorithm sequentially replaces the least-significant bit of DCT coefficients with the message's data. The algorithm does not require a shared secret; as a result, anyone who knows the steganographic system can retrieve the message hidden by JSteg. Andreas Westfeld and Andreas Pfitzmann noticed that steganographic systems that change least-significant bits sequentially cause distortions detectable by steganalysis [8]. They observed that for a given image, the embedding of high-entropy data (often due to encryption) changed the histogram of colour frequencies in a predictable way.

In the simple case, the embedding step changes the least-significant bit of colours in an image. The colours are addressed by their indices $i$ in the colour table; we refer to their respective frequencies before and after embedding as $n_i$ and $n_i^*$. Given uniformly distributed message bits, if $n_{2i} > n_{2i+1}$, then pixels with colour $2i$ are changed more frequently to colour $2i+1$ than pixels with colour $2i+1$ are changed to colour $2i$. As a result, the following relation is likely to hold:
$$|n_{2i} - n_{2i+1}| \ge |n_{2i}^* - n_{2i+1}^*|$$
In other words, embedding uniformly distributed message bits reduces the frequency difference between adjacent colours. The same is true in the JPEG data format. Instead of measuring colour frequencies, we observe differences in the DCT coefficients' frequency.

Frequency histograms. Sequential changes to the least-significant bit of the discrete cosine transform coefficients, shown for (a) the original and (b) the modified image, tend to equalize the frequency of adjacent DCT coefficients in the histograms.

The figure above displays the histogram before and after a hidden message is embedded in a JPEG image. We see a reduction in the frequency difference between coefficient −1 and its adjacent DCT coefficient −2, and a similar reduction in the frequency difference between coefficients 2 and 3. Westfeld and Pfitzmann used a χ²-test to determine whether the observed frequency distribution $y_i$ in an image matches a distribution $y_i^*$ that shows distortion from embedding hidden data. Although we do not know the cover image, we know that the sum of the frequencies of each pair of adjacent DCT coefficient values remains invariant, which lets us compute the expected distribution $y_i^*$ from the stego image. Letting $n_i$ be the DCT histogram, we compute the arithmetic mean
$$y_i^* = \frac{n_{2i} + n_{2i+1}}{2}$$
to determine the expected distribution and compare it against the observed distribution $y_i = n_{2i}$.
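The claim that embedding uniformly distributed message bits reduces the frequency difference between adjacent values can be checked with a small simulation. This is a sketch on a hypothetical skewed pair of values (2, 3), not real image data; because Python's `&` and `|` act on two's-complement integers, the same pairing also holds for negative coefficients such as −2 and −1.

```python
import random
from collections import Counter


def embed_random_bits(values, rng):
    """Overwrite the LSB of every value with a uniformly random message bit."""
    return [(v & ~1) | rng.getrandbits(1) for v in values]


def pair_gap(hist, i):
    """|n_2i - n_2i+1| for the pair of values (2i, 2i + 1)."""
    return abs(hist[2 * i] - hist[2 * i + 1])


rng = random.Random(7)
cover = [2] * 10_000 + [3] * 2_000          # hypothetical skewed pair (2, 3)
stego = embed_random_bits(cover, rng)       # every value stays within {2, 3}
n, n_star = Counter(cover), Counter(stego)
print(pair_gap(n, 1), pair_gap(n_star, 1))  # 8000, then a much smaller gap
```

The total count of the pair stays at 12,000, which is the invariant that lets the steganalyst estimate the expected distribution from the stego image alone.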
4.5.1. F5 ALGORITHM:
5.2. STEGANOGRAPHY ANALYZER SIGNATURE SCANNER (StegAlyzerSS):
StegAlyzerSS gives you the capability to scan every file on the suspect media for the presence of hexadecimal byte patterns, or signatures, of particular steganography applications. If a known signature is detected, it may be possible to extract information hidden with the steganography application associated with that signature.
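StegAlyzerSS's signature database and internals are not described here, so the following is only a generic sketch of signature scanning: it searches raw file bytes for known byte patterns and reports each hit with its offset. Both signatures are made up for illustration and do not belong to any real tool.

```python
from pathlib import Path

# Hypothetical signatures for illustration only.
SIGNATURES = {
    "ExampleStegA": bytes.fromhex("deadbeef0102"),
    "ExampleStegB": b"STEGB\x00",
}


def scan_bytes(data, signatures=SIGNATURES):
    """Return (application, offset) for every occurrence of every signature."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        while offset != -1:
            hits.append((name, offset))
            offset = data.find(pattern, offset + 1)
    return sorted(hits, key=lambda hit: hit[1])


def scan_tree(root, signatures=SIGNATURES):
    """Scan every regular file below root; map each flagged path to its hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            hits = scan_bytes(path.read_bytes(), signatures)
            if hits:
                results[str(path)] = hits
    return results
```

Reading whole files into memory keeps the sketch short; a scanner for large media images would stream the data in overlapping chunks instead.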
EXAMPLE:
The colour contents of JPEG images are examined. A modification to each coefficient's LSB produces variations in the data that result in deviations in the histogram for the given file. If the deviations are large enough to produce noticeable aberrations, the embedded file's histogram can identify the existence of the hidden message. Likewise, LSB modifications to palette-based images (GIF, etc.) cause duplications of the colours in the palette, with identical or nearly identical colours appearing. This duplication of colours can also serve as an indicator pointing to the existence of hidden data. When examining the greyscale histograms for an original and a steganographically embedded JPEG (such as in Figure 4), slight deviations in the histograms are noticeable. The greyscale histogram provides a cumulative value for all three colour channels (red, green, and blue) at each brightness level (0-255). Figure 5 compares the same photograph in its original form (containing 42,784 colours) to an embedded version of the file (containing 42,886 colours).

The arrows in the embedded histogram indicate two obvious differences in the distribution. Steganalysis takes this phenomenon one step further by comparing the normalized distribution of colours against a predicted value. For palette-based images, a normal distribution of colour frequency is likely, so a scalable standard bell curve can be assumed as the comparison benchmark for the suspect file. Changes to the LSBs of any given pixel can create duplicate (or near-duplicate) colours in the image's colour palette. The duplicate colours increase the frequency for that value and can create a spike in the distribution exceeding the benchmark reference. Any large deviation from the benchmark can be an indicator of anomalies or modifications to the contents of the file.

The process for JPEGs can be a bit more complicated. Because the JPEG format does not use a palette-based encoding algorithm, a second step is necessary to compare DCT frequency to a benchmark. Algorithms that sequentially modify the DCT coefficients in JPEG files tend to cause distortions in the histogram that flatten out the frequency values of adjacent DCT coefficients. To compensate for this issue, newer algorithms do not embed the data sequentially but rather use a password or key to generate a random order for DCT or LSB modifications.
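The palette-duplication cue described above can be sketched as a simple check that flags palette entries whose red, green, and blue values all differ by at most one unit from another entry. This assumes the palette is already available as a list of (R, G, B) tuples; the one-unit tolerance and the ratio it reports are hypothetical choices, not a published benchmark, and the bell-curve comparison is not implemented here.

```python
from itertools import combinations


def near_duplicate_pairs(palette, max_diff=1):
    """Index pairs of palette entries whose R, G, B each differ by <= max_diff."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(palette), 2):
        if all(abs(x - y) <= max_diff for x, y in zip(a, b)):
            pairs.append((i, j))
    return pairs


def duplicate_ratio(palette, max_diff=1):
    """Fraction of palette entries that have at least one near-duplicate."""
    flagged = {k for pair in near_duplicate_pairs(palette, max_diff) for k in pair}
    return len(flagged) / len(palette) if palette else 0.0
```

A natural palette usually has few such pairs, while LSB embedding in palette indices or colours tends to create them, so an unusually high ratio is a reason to look closer rather than proof of hidden data.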
Westfeld and Pfitzmann used a χ²-test to predict the probability that an image contained steganographic content by comparing the expected distribution (the null hypothesis) against the sampled values. The smaller the deviation of the measured values from the expected distribution, the higher the probability of steganographic content at that point in the file. Because their algorithm ran on sequential bytes with an increasing sample size for each calculation, the point at which the probability dropped often revealed the size of the hidden message as well.
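A sketch of the χ² attack as described here, under the following assumptions: the expected count for each pair of values (2i, 2i + 1) is the mean of the two counts, the observed count is the count of the even value, categories with an expected count below 5 are skipped (a common rule of thumb), and the probability of embedding is p = Q((k − 1)/2, χ²/2), the regularized upper incomplete gamma function for k − 1 degrees of freedom. The gamma routine follows the standard series / continued-fraction approach so that the sketch needs only the standard library.

```python
import math
from collections import Counter


def gammaincc(a, x, eps=1e-15, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:  # series for P(a, x), then Q = 1 - P
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    tiny = 1e-300  # continued fraction for Q(a, x), modified Lentz method
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return h * math.exp(log_prefactor)


def embedding_probability(values, min_expected=5):
    """Probability of LSB embedding from the pair-of-values chi-square statistic."""
    hist = Counter(values)
    chi2, k = 0.0, 0
    for i in sorted({v >> 1 for v in hist}):  # v >> 1 maps 2i and 2i + 1 to i
        expected = (hist[2 * i] + hist[2 * i + 1]) / 2
        if expected < min_expected:
            continue
        chi2 += (hist[2 * i] - expected) ** 2 / expected
        k += 1
    return gammaincc((k - 1) / 2, chi2 / 2) if k >= 2 else None


def scan(values, step):
    """Probability for growing prefixes of the data, as in the sequential attack."""
    return [(n, embedding_probability(values[:n]))
            for n in range(step, len(values) + 1, step)]
```

On data whose first part has fully equalized pairs and whose remainder is skewed, the probability stays near 1 for the embedded prefix and collapses once unmodified values dominate, which is how the length of a sequentially embedded message shows up.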
After detection of hidden content within a carrier file,
the next step is recovery of the hidden message
itself. Because modifications to the data comprising
the carrier file are made without incorporating a
mapping back to the original values, recovery of
the original carrier file is difficult and sometimes
impossible. For digital pictures, audio, video, and
even file slack space, steganographic modifications
to the original contents often destroy the integrity
of the carrier file in the process.
6.2. STEGANOGRAPHY DETECTION ON INTERNET:
To find out whether claims that steganographic content is regularly posted to the Internet are true, we have a steganography detection framework that gets JPEG images off the Internet and uses steganalysis to identify subsets of the images likely to contain steganographic content.

6.2.1. DETECTION FRAMEWORK:
Stegdetect is an automated utility that can analyze JPEG images that have content hidden with JSteg, JPHide, and OutGuess 0.13b. Stegdetect's output lists the steganographic systems it finds in each image, or "negative" if it couldn't detect any. Stegdetect's false-negative rate depends on the steganographic system and the embedded message's size: the smaller the message, the harder it is to detect by statistical means. Stegdetect is very reliable in finding images that have content embedded with JSteg. For JPHide, detection also depends on the size and the compression quality of the JPEG images. Furthermore, JPHide 0.5 reduces the hidden message size by employing compression. For JSteg, we cannot detect messages smaller than 50 bytes; the false-negative rate in such cases is almost 100 percent. However, once the message size is larger than 150 bytes, our false-negative rate is less than 10 percent. For JPHide, the detection rate is independent of the message size, and the false-negative rate is at least 20 percent in all cases. Although the false-negative rate for OutGuess is around 60 percent, a high false-negative rate is preferable to a high false-positive rate.

6.2.2. VERIFYING CONTENT:
The statistical tests used to find steganographic content in images indicate nothing more than a likelihood that content is embedded. Because of that, Stegdetect cannot guarantee a hidden message's existence. To verify that the detected images have hidden content, Stegbreak must launch a dictionary attack against the JPEG files. JSteg-Shell, JPHide, and OutGuess all hide content based on a user-supplied password, so an attacker can try to guess the password by taking a large dictionary and trying every single word in it to retrieve the hidden message. In addition to message data, the three systems also embed header information, so attackers can verify a guessed password using header information such as message length. For a dictionary attack [28] to work, the steganographic system's user must select a weak password.
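JSteg-Shell, JPHide, and OutGuess each use their own ciphers and header formats, which are not described here. The sketch below substitutes a hypothetical toy scheme (a SHA-256-based keystream and a 4-byte length header) purely to show how an embedded header lets an attacker verify a guessed password without knowing anything about the message itself.

```python
import hashlib
import struct


def keystream(password: str, length: int) -> bytes:
    """Toy keystream: SHA-256 over password + counter (not any real tool's cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password.encode() + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(out[:length])


def xor(data, key):
    return bytes(a ^ b for a, b in zip(data, key))


def hide(message: bytes, password: str) -> bytes:
    """Prefix a 4-byte length header, then encrypt header + message."""
    plain = struct.pack(">I", len(message)) + message
    return xor(plain, keystream(password, len(plain)))


def dictionary_attack(payload: bytes, words):
    """Accept a word when the decrypted length header matches the payload size."""
    for word in words:
        (length,) = struct.unpack(">I", xor(payload[:4], keystream(word, 4)))
        if length == len(payload) - 4:
            return word, xor(payload, keystream(word, len(payload)))[4:]
    return None
```

Each wrong guess passes the header check only by chance (about one in 2^32 here), so the header turns an expensive "does this look like a message?" judgment into a cheap exact comparison, which is also why the attack parallelizes so easily.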
6.2.3. DISTRIBUTED DICTIONARY ATTACK:
Stegbreak is too slow to run a dictionary attack against JPHide on a single computer. Because a dictionary attack is inherently parallel, distributing it to many workstations is possible. To distribute Stegbreak jobs and data sets, we developed Disconcert, a distributed computing framework for loosely coupled workstations. There are two natural ways to parallelize a dictionary attack: each node is assigned its own set of images, or each node is assigned its own part of the dictionary. With more words existing than images, the latter approach permits finer segmentation of the work. To run the dictionary attack, Disconcert hands out work units to workstations in the form of an index into the dictionary. After a node completes a work unit, it receives a new index to work on.

7. APPLICATIONS:
The three most popular and researched uses for steganography in an open systems environment are:
• Covert channels,
• Embedded data, and
• Digital watermarking.

… send a top secret, private or highly sensitive document over an open systems environment such as the Internet.

7.3. DIGITAL WATERMARKING:
Although not a pure steganographic technique, digital watermarking is very common in today's world and does use steganographic techniques to embed information into documents. Digital watermarking is usually used for copyright reasons by companies or entities that wish to protect their property, either by embedding their trademark into their property or by concealing serial numbers/license information in software, etc. Digital watermarking is very important in the detection and prosecution of software pirates/digital thieves.

Hydan, a program by Rakan El-Khalil, exploits redundancy in the Intel x86 instruction set. The instruction addl $20, %eax can be replaced by subl $-20, %eax and vice versa. If add denotes a logical 1 and sub a logical 0, this equivalence provides a solid basis for a stegosystem.
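Hydan's actual encoding tables and binary rewriting are not described here; the following toy rewrites textual AT&T-syntax `addl`/`subl` immediate instructions only, to illustrate the add = 1, sub = 0 idea. It assumes the carry and overflow flags set by these instructions are never read afterwards: `addl $20, %eax` and `subl $-20, %eax` leave the same value in the register but do not always set CF identically. It also ignores the −2^31 immediate, whose negation does not fit in 32 bits.

```python
import re

# Toy pattern: "addl $imm, %reg" or "subl $imm, %reg", one instruction per line.
ADD_SUB = re.compile(r"^\s*(addl|subl)\s+\$(-?\d+),\s*(%\w+)\s*$")


def embed(lines, bits):
    """Rewrite eligible instructions so that addl carries a 1 and subl carries a 0."""
    out, pending = [], iter(bits)
    for line in lines:
        m = ADD_SUB.match(line)
        bit = next(pending, None) if m else None
        if bit is None:
            out.append(line)
            continue
        op, imm, reg = m.groups()
        amount = int(imm) if op == "addl" else -int(imm)  # value effectively added
        out.append(f"addl ${amount}, {reg}" if bit else f"subl ${-amount}, {reg}")
    if next(pending, None) is not None:
        raise ValueError("not enough add/sub instructions to carry the message")
    return out


def extract(lines):
    """Read one bit from every eligible instruction: addl -> 1, subl -> 0."""
    return [1 if m.group(1) == "addl" else 0
            for m in map(ADD_SUB.match, lines) if m]
```

The capacity is one bit per eligible instruction, so a real program offers only a small channel, and the rewritten listing loses its original whitespace, which a careful tool would preserve.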