
SECURITY AND COMMUNICATION NETWORKS

Security Comm. Networks (2017)


Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sec.1757

RESEARCH ARTICLE

Text steganography: a novel character-level embedding algorithm using font attribute
Bala Krishnan Ramakrishnan1*, Prasanth Kumar Thandra2 and A. V. Satya Murty Srinivasula2
1 Indira Gandhi Centre for Atomic Research, HBNI, Kalpakkam, Tamil Nadu, India
2 Indira Gandhi Centre for Atomic Research, Kalpakkam, Tamil Nadu, India

ABSTRACT
Text Steganography uses text documents as cover medium to communicate the secret messages, covertly, by making
unnoticeable distortions in the cover medium. Character-Level Embedding Technique (CLET), a variant of text
steganography, embeds a secret character by serially marking/distorting an identical character in the cover medium. Hence,
these techniques suffer from low embedding capacity as the occurrence of certain alphabets in a text document is not
uniform/guaranteed. In addition, identification of the marked cover character itself can reveal the hidden secret. This makes
the CLET a less preferable choice. To overcome the aforementioned shortcomings, this paper proposes a novel CLET by
introducing the Frequency Normalization Set in combination with the Character and String Mapping. The combination
allows a low occurring character to get embedded in several cover characters, and thereby boosts its embedding probability.
In addition, the combination also ensures the uniform embedding probability of secret characters. A font attribute called the
character spacing is used to embed the secret. Experiments were conducted to investigate the embedding capacity,
uniformity in embedding probability and frequency profile of stego characters of the proposed method. A comparison of
the proposed method with the existing CLET algorithms is also provided. Various applications and the security aspects
of the proposed method have also been discussed. Copyright © 2017 John Wiley & Sons, Ltd.
KEYWORDS
character-level embedding; Frequency Normalization Set; Character and String Mapping; covert communication; information hiding;
text steganography

*Correspondence
Bala Krishnan Ramakrishnan, Computer Division, EIG, Indira Gandhi Centre for Atomic Research, HBNI, Kalpakkam, Tamil Nadu, India.
E-mail: balakrishnanr1987@yahoo.co.in

1. INTRODUCTION

The substantial growth of digital communication has increased users' concern for secure communication over time. To meet the privacy requirements, both cryptographic and steganographic techniques were employed [1]. Steganography provides secrecy [2], whereas cryptography ensures security [3]. Though cryptography succeeds in providing security through encryption, it attracts the attention of adversaries as it cannot hide its own presence [4,5]. Conversely, steganography avoids such attention by performing the communication in a stealthy manner [5–8].

Based on the type of cover medium used, digital steganography is classified into various categories [9]. Commonly used cover media are text, image, audio and video files [10]. Recently, Electronic Mail [10], network packets [11,12] and Short Message Service [13] also came into the picture. A steganographic technique embeds the secret by making unnoticeable distortions in the redundant information of the cover medium [14]. As the structure and visual appearance of text documents are directly related, using text documents as a cover medium is difficult.

Despite the complications and challenges involved, text steganography cannot be avoided altogether. There can be many organizations where text documents are exchanged more frequently than other types of digital media. Performing the covert communication over a cover medium that is exchanged in abundance clearly makes the life of an adversary tougher. In addition, because of the smaller file sizes involved, text steganography requires less memory and bandwidth [5]. Considering these facts, this paper introduces a novel text steganographic scheme that embeds the secret inside text documents with high embedding capacity.


Table I. Occurrence frequency of characters in English text.*
* Since the standard occurrence frequency [21] does not consider the ‘Dot’ and ‘Space’ characters, a normal English text document is taken to calculate the occurrence frequency.
‡ Ideal occurrence probability = 100 / 28 = 3.571; for experimental purposes, any value between 2 and 5 is considered as the average occurrence probability.

The three possible ways to embed a secret inside a text document are character-level embedding, bit-level embedding and mixed-type embedding. CLET is further classified into two categories. Category I algorithms serially embed the secret characters by marking them directly in the cover medium, and Category II algorithms, based on the secret character, directly generate the stego work without the need for a cover medium. In bit-level embedding, the secret message is either encrypted [15] or converted into its equivalent binary bits and the resultant bit string is embedded inside the cover medium. In mixed-type embedding, the secret message is first converted into an equivalent bit string, which is then split into groups of 2 or 4 bits each. Each group is then mapped to an alphabet, and the mapped alphabet is embedded inside the cover medium.

The proposed work falls under the classification of CLET Category I. In this type of embedding, each cover character has the capability to carry a secret character. Existing Category I techniques, like [3,9,16–20], serially embed a secret character, say ‘R’, inside the same cover character ‘R’. Hence the utilization of the available embedding space by these techniques directly depends on the availability of characters (in the same order as they appear in the secret message) in the cover medium. Therefore, a low occurring character takes more embedding space to get embedded, and vice versa. In other words, the ‘Space’ character requires only 5* cover characters, whereas the character ‘Q’ requires 2128† cover characters.

* 100/20.2944 ≈ 5 (occurrence probability of ‘Space’ is 20.2944; refer Table I)
† 100/0.0470 ≈ 2128 (occurrence probability of ‘Q’ is 0.0470; refer Table I)

Thus, the existing techniques make inefficient utilization of the available embedding space while embedding a low occurring character. This highly affects the embedding capacity of such systems, as a majority of characters have a low occurrence probability in English text documents (refer Table I). In addition, as the relationship between the secret and cover character is straightforward, identification of the marked cover characters can itself reveal the hidden secret.

Hence, the challenges involved in Category I steganography are: (i) the secret character(s) must be embeddable serially, even when they are absent from the cover medium (or not present in the same order); (ii) embedding a low occurring character (refer Table I) should not waste the available embedding space; and (iii) identification of the marked cover characters should not by itself reveal the hidden secret.

To handle the aforementioned challenges, the proposed method introduces a novel idea called the Frequency Normalization Set (FNS) in combination with the Character and String Mapping (CSM). The main purpose of the combination is to make the embedding probability of all the characters uniform. The proposed procedure of embedding a low occurring secret character in several different high and average occurring cover characters makes this uniformity achievable. Doing so guarantees that a secret character gets embedded even in its absence in the cover medium and also ensures the efficient utilization of the available embedding space. The embedding capacity attained by this novel idea is even higher than that of the bit-wise and mixed-type embedding techniques. In addition, as a single cover character carries several different secret characters, the extraction process is not straightforward.

The rest of this paper is organized as follows. Section 2 discusses the existing techniques and their drawbacks. Section 3 introduces the proposed algorithms, which include the generation of FNS, CSM, and the embedding and extraction procedures. Section 4 presents the evaluation of the proposed method and compares the obtained experimental results with other existing techniques. Section 5 discusses the various security aspects of the proposed method. Section 6 highlights the application of the proposed method. Section 7 provides the future scope, and Section 8 concludes this paper.

2. RELATED WORKS

This section introduces the various existing techniques of Text Steganography that are mentioned in Section 1.

2.1. Character-level embedding techniques

2.1.1. Category I

(1) Character Marking

This technique marks the characters in a text document by changing its feature like height [19], font size [9,16,17],


font style [9], or by making it bold or underlined. Grouping the marked characters together reveals the hidden secret.

(2) Null Cipher [3,9,20]

This technique uses a fixed position from each word, sentence, paragraph or page to represent the secret character. Grouping the characters at the specified position reveals the hidden secret.

(3) Typing Errors

This technique exploits common typing errors, like misspelling [9,17,18] or misplacing the characters [17,19] slightly above or below the baseline, to embed the secret. Grouping the original characters in the misspelled word or the misplaced characters reveals the secret message. An improved version of the misspelling technique chooses the new word in such a way that the chosen word also exists in the vocabulary [18]. For example, ‘chat’ in the place of ‘that’.

2.1.2. Category II

(1) Missing Letter Puzzle [16]

The generated stego work contains a list of words with one or two characters in each word replaced by a question mark, making it look like a puzzle. The ASCII value of the secret character to be hidden determines the length of the word. The original character at the place of the question mark in each word represents the secret character. On average, this technique requires 10.5 cover characters to embed one secret character.

(2) Hiding Data in Wordlist [16]

This technique also generates a list of words. It uses the ASCII value of the secret character to determine the first character of each word and its length. To embed one secret character, this technique requires 10.5 cover characters on average.

2.2. Bit-level embedding techniques

(1) Line and Word Shifting [19,22,23]

The line shifting technique embeds a secret by shifting a line up or down to some degree. A line is moved up or down, while the lines adjacent to it are left unmoved. Similar to line shifting, word shifting moves the middle word, keeping the adjacent words in a line constant. Line and word shifting embed 2-bits/5-lines and 5-bits/line respectively.

(2) White Spacing [24,25]

Extra white space or tab is inserted between words, sentences, paragraphs or at the end of each line to embed the secret bits. Their extra presence represents ‘1’ while the other represents ‘0’.

(3) Unicode Space Characters [26]

This technique injects special Unicode space characters like En Quad, Em Quad, Three-per-Em, Six-per-Em, Figure, Punctuation, Thin and Hair in inter-word, inter-sentence, end-of-line and inter-paragraph spacings to embed the bits secretly [23]. This technique embeds 2-bits/space-character.

(4) Change Tracking Technique [27]

This technique uses the change tracking facility of some document formats, like Microsoft Word, to embed the bits secretly. The sender selects a normal document and degenerates it by making mistakes in it. Then the sender corrects the mistakes using comments and sends the document to the receiver. Using the change tracking information, the receiver identifies the intentionally created mistakes and extracts the hidden bits. This technique has a very low embedding capacity of 0.33-bits/word.

2.3. Mixed-type embedding technique

(1) Publishing Summary [28]

Using the vertical and horizontal reflection symmetry property, English alphabets are divided into four groups, each representing 2 bits of information. Based on each secret bit pair, the embedding algorithm picks the sentences from the cover document whose first alphabet of the first word (not an article) matches any one of the alphabets present in the corresponding group. Thus, the generated stego work will look like a summary. This technique embeds 2-bits/sentence.

Thus, the CLET Category I and mixed-type embedding techniques suffer from low embedding capacity because of the non-uniform occurrence probability of characters in text documents. CLET Category II can attract the attention of third parties because of the use of special characters. In addition, sending a list of words having no relation between them can raise suspicion. The bit-level embedding techniques require a larger number of distortions to embed one secret character. From now on, any reference to the character-level embedding technique (CLET) made in this paper refers to Category I.

3. PROPOSED WORK

The proposed work comprises four modules: the generation of FNS, CSM, the embedding algorithm and the extraction algorithm. FNS generates 28 strings with the cumulative frequency of each string close to uniform. CSM maps these 28 strings to the 28 possible secret characters and thereby achieves the uniform distribution


in stego characters in addition to the uniform embedding probability. The embedding algorithm embeds the secret characters inside the cover document and the extraction algorithm does the reverse of the embedding algorithm.

Figure 1. Generation of the Frequency Normalization Set.

Table II. Sample Character and String Mapping.

Mapped character   Frequency Normalization Set (FNS)   Cumulative frequency
A                  WRZFOIN                             23.9587
B                  F□CWQXP                             26.1824
C                  JMFXY□Z                             25.8691
D                  XA.MNTU                             25.0548
E                  YCHIJOS                             24.8356
F                  KUX□ZJC                             25.3523
G                  □.BCPQX                             25.2272
H                  BLOSWMI                             25.4777
I                  PJGALHO                             23.9744
J                  NSVOBGH                             25.0079
K                  UWY.TSL                             23.0505
L                  DIREFCQ                             25.3367
M                  GHTR.WB                             22.753
N                  ZBADEFG                             25.0392
O                  MNSLAB.                             24.5851
P                  TVIZM.A                             24.1936
Q                  SGKPHEM                             25.4306
R                  HEUVIYJ                             24.9452
S                  IYLNUAK                             25.1018
T                  OFJGCDE                             24.5694
U                  VDEHKLY                             25.0079
V                  LONQSUR                             25.1018
W                  QP□KXRV                             26.7617
X                  EZDYGPT                             25.3054
Y                  .KQUVZ□                             26.0101
Z                  AXMTRVD                             24.8356
□                  RTWBDNF                             24.9766
.                  CQPJ□KW                             26.0571
where ‘□’ represents ‘Space’ and ‘.’ represents ‘Dot’

3.1. Generation of the Frequency Normalization Set

As explained earlier, the embedding capacity of the existing CLET techniques [3,9,16–20] is directly

dependent on the availability of the respective characters in the cover medium. Hence, a low occurring character ‘Q’ requires 2128‡ cover characters to get embedded. However, embedding the same ‘Q’ in several different cover characters, say ‘S’, ‘G’, ‘K’, ‘P’, ‘H’, ‘E’ and ‘M’, will boost the cumulative probability to embed ‘Q’ from 0.0470% to 25.4306%, requiring only 4§ cover characters.

Let ‘P’ be the ideal cumulative probability of embedding a secret character inside a cover character. Then, ‘P’ can be defined as in equation (1).

\[ P = \left[ \frac{100}{\text{Number of alphabets in English} + 2} \right] \times \left( \text{Number of characters cumulated to achieve the uniform embedding probability } (N_{CC}) \right) \tag{1} \]

where the ‘+2’ in the denominator represents the inclusion of the ‘Dot’ and ‘Space’ characters in the secret message. Our empirical studies suggest that when the number of cumulated characters is 7††, a uniform embedding probability can be achieved. Hence, by substituting the value 7 in equation (1) we obtain,

\[ P = \frac{100}{28} \times 7 = 25 \]

which means that, on average, one secret character would be embedded in every four consecutive characters encountered in the cover medium.

‡ 100 / 0.0470 ≈ 2128
§ 100 / 25.4306 ≈ 4
†† Substituting NCC = 5 in equation (1) results in P = 17.86. As the largest occurrence frequency value in Table I is 20.2944% (‘Space’ character), the generation of FNS is not possible when NCC ≤ 5. This is because of the later-mentioned properties (2) and (5) of FNS. Also at NCC = 6, as P = 21.43, it allows the ‘Space’ character to be combined only with the low occurring alphabets, resulting in a smaller number of combinations during FNS generation. Hence, NCC = 7 is considered

In this view, the proposed method introduces a novel technique called FNS that generates 28 strings with the following properties:

(1) Each string contains English alphabets, Dot and Space (ADS) characters.
(2) The length of each string ‘L’ is equivalent to the chosen number of characters that are cumulated to achieve the uniform embedding probability (NCC). Therefore, each string has ‘L’ positions/columns {0, 1, 2, 3, ..., L-1}.
(3) Each character occurs only once in a string.
(4) Each character occurs only once in a position in the whole FNS. That is, column-wise repetitions should not occur.
(5) When the individual frequencies of the characters present in any string are summed up, the sum falls close to the respective ‘P’ value of the chosen NCC.

An algorithm has been designed to generate such an FNS. Figure 1 and the subsequent pseudo codes give a simplified version of the developed algorithm. Readers who are not interested in the finer details of the generation of FNS can skip this algorithm and proceed to the end of this sub-section, for further reading.

Pseudo code of Test 1:
//Test 1 (): Chooses a String “str” from a Set of available Strings.


Input: Set S, ArrayList chkduplicate
Output: String str
Int sim_count2 ← 100
Int value2 ← 100
String str ← “”
For each String “str1” in “S” do
    Int sim_count[cardinality of chkduplicate] ← No. of similar characters between “str1” and each string in “chkduplicate” respectively
    Int max_count ← Largest value in sim_count[]
    If max_count < sim_count2
        Int value[7] ← No. of times each character of “str1” appeared in “chkduplicate” respectively
        Int max_value ← Largest value in value[]
        If max_value < value2
            str ← str1
            sim_count2 ← max_count
            value2 ← max_value
        End if
    End if
End for
Return str

Pseudo code of Sub-Module: Count
Input: Char ch, ArrayList chkduplicate
Output: Int no
Return Number of times “ch” has occurred in “chkduplicate”

Pseudo code of Test 2:
//Test2 (): Tries to replace the characters of “str”, when its number of occurrences in “chkduplicate” crosses certain limit.
Input: String str, Int occurred_first[], Char individualchar[], Double individualfreq[], Double minerror, Double maxerror, ArrayList chkduplicate
Output: String str
String result ← “SUCCESS”
For each Character “ch” in “str” do
    Int pos ← Position of “ch” in “str”
    If (Count(ch) == 7) || (Count(ch) == 6 && occurred_first of “ch” == 0 && str[0] != ch)
        Search for a character “ch1” in “individualchar[]” such that ch1 ∉ str && ch1 ∉ chkduplicate @ position “pos” && Frequency[ch1] - Frequency[ch] is between “minerror” and “maxerror” exclusive
        If Search == SUCCESSFUL then str[pos] ← ch1
        Else result ← “FAIL”
    End if
End for
Return (str, result)

Pseudo code of Test 3:
//Test3 (): Handles the problematic characters of “str”.
Input: String str, ArrayList chkduplicate, ArrayList alreadyselected
Output: String str
For each problematic character “ch1” in “str” do
    Int pos ← Position of “ch1” in “str”
    L: Char ch ← Test_3a ()
    If ch ∉ str then str[pos] ← ch
    Else
        String str1 ← Test_3b ()
        If str1 != “”
            str[pos] ← str1[pos]
            chkduplicate ← Remove “str1” from “chkduplicate”
            str1[pos] ← ch
            chkduplicate ← Add “str1” in “chkduplicate”
        End if
        Else add “ch” in “alreadyselected” & Goto L
    End else
End for
Return str

Pseudo code of Test_3a:
//Chooses a Character from “individualchar” that satisfies certain properties
Input: Char individualchar[], Double individualfreq[], ArrayList chkduplicate, Int pos, Char ch1, ArrayList alreadyselected, String str
Output: Char ch
Char ch ← “”
Int dupe ← 0
Int max ← 0
L: For each Character “ch2” in “individualchar[]” do
    If dupe == 0 && ch2 ∉ alreadyselected && Count(ch2) < 7 && ch2 ∉ chkduplicate @ position “pos” && Frequency[ch2] <= Frequency[ch1] && Frequency[ch2] > max
        ch ← ch2
        max ← Frequency[ch2]
    End if
    If ch == “” && Last element of “individualchar[]” == ch2
        dupe ← 1
        Goto L
    End if
    If dupe == 1 && (ch2 ∉ alreadyselected && Count(ch2) < 7 && ch2 ∉ chkduplicate @ position “pos”) && ((Frequency[ch2] > Frequency[ch1]) || (ch2 ∈ str && Count(ch2) <= 5))
        ch ← ch2
        Break
    End if
End for
Return ch

Pseudo code of Test_3b:
//Chooses a String from “chkduplicate” that satisfies certain properties
Input: ArrayList chkduplicate, Double individualfreq[], Char ch1, Int pos, Char ch, String str
Output: String str2
String str2 ← “”
Int dupe ← 0
Int min ← 100
L: For each String “str1” in “chkduplicate” do
    If dupe == 0 && ch1 ∈ str1 @ position “pos” && ch ∉ str1
        str2 ← str1
    End if
    If str2 == “” && Last element of “chkduplicate[]” == str1
        dupe ← 1
        Goto L
    End if
    If dupe == 1 && ch ∉ str1 && str1[pos] ∉ str && (Frequency[str1[pos]] - Frequency[ch]) < min && (Frequency[str1[pos]] - Frequency[ch]) < 4.5
        str2 ← str1
        min ← Frequency[str1[pos]] - Frequency[ch]
    End if
End for
Return str2

From the experiments conducted, it has been observed that a large number of such FNS exist. Table II depicts one such generated FNS along with the respective cumulative frequency values.

Table III. Possible ways to embed the secret character ‘X’.

String from CSM that is mapped to Xi   Yj ∈ selected string   Position of Yj in selected string   Respective character spacing
EZDYGPT                                E                      0                                   Condense by 0.1 pt
                                       Z                      1                                   Condense by 0.2 pt
                                       D                      2                                   Condense by 0.3 pt
                                       Y                      3                                   Expand by 0.1 pt
                                       G                      4                                   Expand by 0.2 pt
                                       P                      5                                   Expand by 0.3 pt
                                       T                      6                                   Expand by 0.4 pt
Xi - Secret character to be hidden; Yj - Character encountered in the cover work
CSM, Character and String Mapping.

3.2. Character and String Mapping

Map the generated 28 strings of FNS to the 28 possible ADS characters of the secret message. Map the characters in the same order as in Table I (refer Table II). Perform the mapping in such a way that:


(1) The mapped character does not exist in the selected string.
(2) Map a high frequency character to a string that contains at least 2-low, 1-average and 1-high frequency characters.
(3) Map an average or a low frequency character to a string that contains at least 1-low, 1-average and 1-high frequency characters.
(4) The mapping should not make any cover character carry more secret characters**.

** For example, mapping several high frequency characters to strings containing ‘Space’ will make the ‘Space’ character carry more secret characters.

This mapping procedure not only increases the embedding probability of a low occurring character but also makes it uniform, by efficiently utilizing the available embedding space. Also, it makes the embedding process faster than other existing CLET algorithms.

3.3. Embedding algorithm

To embed the secret, the proposed method exploits a font attribute called the character spacing of text documents like Microsoft Word/Power Point and Libre Office Writer. The character spacing is either decreased/condensed by {0.1, 0.2 or 0.3} points or increased/expanded by {0.1, 0.2, 0.3 or 0.4} points to represent the seven respective positions of the string in CSM.

The embedding procedure starts if the number of ADS characters in the cover work is at least four times greater than the number of characters in the secret message (including the End of Secret (EoS)††). The pseudo code of the embedding procedure is as follows.

†† End of Secret characters can be anything that does not appear in the secret message. For experimental purposes, it has been considered as ‘. . .’ → ‘Dot Space Dot Space Dot’

Pseudo code of the Embedding Procedure:
Input: Secret_Message, Cover_Work, CSM, EoS characters, character spacing and their respective positions
Output: Modified_Cover_Work
Int count ← 1
Secret_Message ← Secret_Message + EoS Characters
For each Character “Xi” in Secret_Message do
    String Sk ← String that is mapped to Xi in CSM
    L: Yj ← Read the character @ position “count” in Cover_Work
    count++
    If Yj ∈ Sk
        Int pos ← Position of Yj in Sk
        Change the character spacing of Yj based on pos
    End if
    Else Goto L
    CSM ← Perform a Circular Left Shift on all the Strings in CSM // makes the CSM dynamic
End for
Return Modified_Cover_Work

For example, let Xi be ‘X’, the first character in the secret message. Then, the seven possible ways to embed Xi in Yj are given in Table III (use the CSM provided in Table II).

The modified cover work is the required stego work, which has to be saved and communicated to the receiver. In addition, the used CSM, the EoS characters and the respective character spacing of the seven positions should also be communicated.

3.4. Extraction algorithm

The extraction process is the reverse of the embedding process. The pseudo code of the extraction procedure is as follows.

Pseudo code of the Extraction Procedure:
Input: Stego_Work, CSM, EoS characters, character spacing and their respective positions
Output: Secret_Message
Int count ← 1
String Secret_Message ← “”
Repeat
    Char Xi ← Read the character @ position “count” in Stego_Work
    count++
    If character spacing of Xi ≠ 0.0
        Int pos ← Respective position, based on the character spacing of Xi
        String Sk ← String in CSM that has Xi @ position pos
        Secret_Message ← Secret_Message + Character that is mapped to Sk in CSM
        CSM ← Perform a Circular Left Shift on all the Strings in CSM
    End if
Until (count > Total no. of characters in Stego_Work || EoS is read)
Return Secret_Message

For example, let Xi be ‘U’, the first encountered stego character. Table IV gives the seven possible characters that are embedded in ‘U’ (use the CSM provided in Table II).

Table IV. Possible characters hidden in the stego character ‘U’.

Character spacing of Xi   Position based on character spacing   String in CSM that contains ‘U’ at the specified position   Actual embedded character
Condensed by 0.1 pt       0                                     UWY.TSL                                                     K
Condensed by 0.2 pt       1                                     KUX□ZJC                                                     F
Condensed by 0.3 pt       2                                     HEUVIYJ                                                     R
Expanded by 0.1 pt        3                                     .KQUVZ□                                                     Y
Expanded by 0.2 pt        4                                     IYLNUAK                                                     S
Expanded by 0.3 pt        5                                     LONQSUR                                                     V
Expanded by 0.4 pt        6                                     XA.MNTU                                                     D
Xi - Character encountered in the stego work
CSM, Character and String Mapping.

Reaching the end of the stego work without encountering the EoS characters indicates to the receiver that a corrupted stego work has been received.
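The embedding and extraction procedures can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stego work is modelled as a list of (character, spacing) pairs rather than a real Word or Writer document, the names `CSM_TABLE`, `embed` and `extract` are hypothetical, and the CSM is the sample one from Table II.

```python
# Sample CSM from Table II as "secret-char string" pairs; '□' is Space, '.' is Dot.
_ROWS = """
A WRZFOIN  B F□CWQXP  C JMFXY□Z  D XA.MNTU  E YCHIJOS  F KUX□ZJC  G □.BCPQX
H BLOSWMI  I PJGALHO  J NSVOBGH  K UWY.TSL  L DIREFCQ  M GHTR.WB  N ZBADEFG
O MNSLAB.  P TVIZM.A  Q SGKPHEM  R HEUVIYJ  S IYLNUAK  T OFJGCDE  U VDEHKLY
V LONQSUR  W QP□KXRV  X EZDYGPT  Y .KQUVZ□  Z AXMTRVD  □ RTWBDNF  . CQPJ□KW
"""
_TOKENS = _ROWS.split()
CSM_TABLE = dict(zip(_TOKENS[0::2], _TOKENS[1::2]))

# Character spacing (in points) for positions 0..6, as in Table III.
SPACING_PT = (-0.1, -0.2, -0.3, 0.1, 0.2, 0.3, 0.4)
EOS = ". . ."  # Dot Space Dot Space Dot, the EoS used in the experiments


def _norm(ch):
    """Map a document character to the ADS alphabet (case-insensitive)."""
    return "□" if ch == " " else ch.upper()


def _shift(csm):
    """Circular left shift of every string in the CSM."""
    return {key: s[1:] + s[0] for key, s in csm.items()}


def embed(secret, cover, csm=CSM_TABLE):
    """Return the stego work as a list of (character, spacing) pairs."""
    stego = [(ch, 0.0) for ch in cover]
    count = 0
    for x in secret + EOS:
        sk = csm[_norm(x)]
        while True:  # scan forward until a cover character lies in Sk
            if count >= len(cover):
                raise ValueError("cover work too short for this secret")
            pos = sk.find(_norm(cover[count]))
            count += 1
            if pos >= 0:
                stego[count - 1] = (cover[count - 1], SPACING_PT[pos])
                break
        csm = _shift(csm)  # makes the CSM dynamic
    return stego


def extract(stego, csm=CSM_TABLE):
    """Recover the secret; raise if the EoS pattern is never reached."""
    out = ""
    for ch, spacing in stego:
        if spacing == 0.0:
            continue  # unmarked cover character
        pos = SPACING_PT.index(spacing)
        y = _norm(ch)
        key = next(k for k, s in csm.items() if s[pos] == y)
        out += " " if key == "□" else key
        csm = _shift(csm)
        if out.endswith(EOS):
            return out[: -len(EOS)]
    raise ValueError("EoS not found: corrupted stego work")
```

Extraction is unambiguous in this sketch because, in the Table II sample, every column of the CSM contains each of the 28 ADS characters exactly once (FNS property (4)), and a circular shift of all strings preserves that property. Non-ADS characters in the cover are simply skipped, which matches the ‘Other’ column of the experiments only under the assumption that such characters never carry a secret.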


Table V. Result of embedding the secrets in the cover medium.

No. of characters in the    No. of characters encountered in the cover medium    Embedding
secret message              while embedding the secret + EoS characters          capacity
(excluding EoS)             Alphabet    Dot    Space    Other    Total           (in %)
500                         1484        13     282      8        1787            28.10568
1000                        2842        27     556      12       3437            29.19708
1500                        4535        48     864      26       5473            27.53809
2000                        5999        63     1139     32       7233            27.77392
2500                        7612        85     1461     45       9203            27.29854
3000                        9238        104    1769     59       11 170          27.00027
3500                        10 459      116    2032     83       12 690          27.76235
4000                        11 736      130    2287     100      14 253          28.26256
4500                        13 403      146    2608     114      16 271          27.85171
5000                        14 848      164    2879     128      18 019          27.94701
EoS, End of Secret.
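The capacity column of Table V can be re-derived from the other columns. The sketch below assumes that the capacity is the secret length (excluding EoS) divided by the number of ADS cover characters consumed, that is Alphabet + Dot + Space with the ‘Other’ column left out; the names `TABLE_V` and `capacity_percent` are illustrative only.

```python
# Rows of Table V: (secret length, alphabets, dots, spaces, others)
TABLE_V = [
    (500, 1484, 13, 282, 8),
    (1000, 2842, 27, 556, 12),
    (1500, 4535, 48, 864, 26),
    (2000, 5999, 63, 1139, 32),
    (2500, 7612, 85, 1461, 45),
    (3000, 9238, 104, 1769, 59),
    (3500, 10459, 116, 2032, 83),
    (4000, 11736, 130, 2287, 100),
    (4500, 13403, 146, 2608, 114),
    (5000, 14848, 164, 2879, 128),
]


def capacity_percent(secret_len, alphabets, dots, spaces, _others):
    """Secret characters per ADS cover character, as a percentage."""
    return 100.0 * secret_len / (alphabets + dots + spaces)


capacities = [capacity_percent(*row) for row in TABLE_V]
average = sum(capacities) / len(capacities)
print(f"average embedding capacity: {average:.2f}%")
```

Under that assumption, each computed value agrees with the tabulated capacity to the printed precision.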

3.5. Salient features of the proposed procedure

(1) Uses text documents as cover medium.
(2) Uses character-level embedding technique.
(3) Embeds the secrets in the ADS characters of the cover medium.
(4) Secret messages can contain the ADS characters.
(5) Secret message should not contain the EoS pattern.
(6) The embedding and extraction procedures are not case-sensitive.

4. EVALUATION OF THE PROPOSED METHOD

The proposed method is evaluated using the following four parameters:

(1) Secrecy: Secrecy is the imperceptibility of the embedded secret [29,30]. A good steganographic algorithm must embed the secret without creating a noticeable difference in the visual appearance of the cover.

To test the imperceptibility level, experiments were conducted using the proposed method with various secret messages. For verification purposes, an illustration is provided in the sample given below.

Cover Work:
INTRODUCTION: Internet which is extensively used to share any kind of information does not imply any strict rules for the security of data on its own.

Secret Message: Come to my home tomorrow.

Stego Work:
INTRODUCTION: Internet which is extensively used to share any kind of information does not imply any strict rules for the security of data on its own.

By looking at both the cover and stego work of the given sample, it can be observed that the proposed method does not create any noticeable change in the visual appearance.

Table VI. Uniformity in embedding probability.

Secret message (5000 + EoS char)
Characters   No. of times occurred   Average no. of cover characters used to embed one secret character
A            184                     4.011
B            173                     3.572
C            176                     3.699
D            159                     3.874
E            155                     3.142
F            205                     3.624
G            184                     3.549
H            196                     3.673
I            164                     4.238
J            172                     3.622
K            166                     4.265
L            174                     3.046
M            194                     4.361
N            180                     3.578
O            180                     4
P            202                     4.327
Q            172                     3.192
R            174                     3.184
S            173                     4.023
T            192                     3.010
U            192                     3.938
V            173                     3.312
W            179                     3.156
X            171                     3.421
Y            191                     3.822
Z            160                     3.819
Space        189                     3.386
Dot          175                     3.577
EoS, End of Secret.
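The figures in Table VI can be sanity-checked with a few lines of Python. The sketch below only re-reads the tabulated values; the tolerance of 0.8 around the ideal value 100/28 ≈ 3.571 is taken from the paper's own discussion of this table, and the names `TABLE_VI` and `IDEAL` are illustrative.

```python
# Table VI: character -> (no. of times occurred, average cover characters used)
TABLE_VI = {
    "A": (184, 4.011), "B": (173, 3.572), "C": (176, 3.699), "D": (159, 3.874),
    "E": (155, 3.142), "F": (205, 3.624), "G": (184, 3.549), "H": (196, 3.673),
    "I": (164, 4.238), "J": (172, 3.622), "K": (166, 4.265), "L": (174, 3.046),
    "M": (194, 4.361), "N": (180, 3.578), "O": (180, 4.0), "P": (202, 4.327),
    "Q": (172, 3.192), "R": (174, 3.184), "S": (173, 4.023), "T": (192, 3.010),
    "U": (192, 3.938), "V": (173, 3.312), "W": (179, 3.156), "X": (171, 3.421),
    "Y": (191, 3.822), "Z": (160, 3.819), "Space": (189, 3.386), "Dot": (175, 3.577),
}

IDEAL = 100 / 28  # ideal occurrence probability for 28 ADS characters

total_secret_chars = sum(count for count, _ in TABLE_VI.values())
all_within_band = all(abs(avg - IDEAL) <= 0.8 for _, avg in TABLE_VI.values())
worst = max(TABLE_VI, key=lambda ch: abs(TABLE_VI[ch][1] - IDEAL))
print(total_secret_chars, all_within_band, worst)
```

The occurrence counts add up to the 5000 secret characters plus the five EoS characters, and the character farthest from the ideal value is ‘M’ at 4.361.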


(2) Embedding Capacity [10,30]: Embedding capacity is the measure of the maximum size of the secret that a chosen cover medium can hide. Hence, the embedding capacity can be defined as in equation (2).

\[
\text{Embedding Capacity} = \frac{\text{Number of Characters in the Secret Message (Excluding EoS)}}{\text{Number of ADS Characters required in the Cover Medium}} \tag{2}
\]

Experiments were carried out by embedding various English secret messages inside a standard English cover medium. Table V furnishes the number of characters present in the secret message and the number of characters used in the cover medium to embed the same.
From Table V, it can be observed that the proposed method achieves an average embedding capacity of 27.87%.

(3) Uniformity in Embedding Probability: A good character-level embedding algorithm must embed any secret character, in a uniform manner, irrespective of its occurrence frequency in the cover medium.

To validate the uniformity of the proposed method, a random secret message is embedded inside an English cover medium and the average number of cover characters used to embed each secret character is given in Table VI.

From Table VI, it can be observed that the average embedding probability of any secret character uniformly falls within the range 3.571 ± 0.8. This proves that the proposed method utilizes the available embedding space efficiently.

(4) Comparison with the Existing Methods: Table VII provides the comparison of the proposed method with the techniques explained in Section 2.

The comparison is carried out by considering the average length of a word as 4.50 (not including Space) [31], the average number of characters in a line as 60 (including Space) [32], and the average number of words in a sentence (should be between 15 and 20 [33]) as 15.
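As an illustration of equation (2), the following minimal Python sketch computes the embedding capacity as a percentage. The function name and the sample counts are hypothetical and are not taken from Table V; the multiplication by 100 is an assumption made so that the result is in the percentage form in which the capacity is reported.

```python
def embedding_capacity(secret_chars, cover_ads_chars):
    """Equation (2), expressed as a percentage (the factor 100 is an assumption).

    secret_chars    -- number of characters in the secret message, excluding EoS
    cover_ads_chars -- number of ADS characters consumed in the cover medium
    """
    if cover_ads_chars <= 0:
        raise ValueError("the cover medium must contribute at least one ADS character")
    return 100.0 * secret_chars / cover_ads_chars


if __name__ == "__main__":
    # Hypothetical counts: 100 secret characters hidden in 400 ADS cover characters.
    print(embedding_capacity(100, 400))
```

With four cover characters consumed per secret character, the approximate figure listed for the proposed method in Table VII, this sketch yields a capacity of 25%.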

Table VII. Comparison of the proposed method with the existing techniques.

| Technique | Type of embedding | Level of stealthiness (based on visibility) | No. of distortions required to embed a secret character | Embedding probability | Whether possibility of non-occurrence of alphabets is handled? | No. of embeddable locations required to embed a secret character | No. of cover characters required to embed one secret character (approximate) |
|---|---|---|---|---|---|---|---|
| Character marking [9,16,17,19], Typos [9,17–19] | Character-level | Low | 1 | Non-uniform | No | 1 | Variable |
| Missing letter puzzle [16] | Character-level | Low | 1 | Uniform | Yes | 1 | 10.5 |
| Hiding data in wordlist [16] | Character-level | Low | 1 | Uniform | Yes | 1 | 10.5 |
| Inter-sentence spacing [24,25] | Bit-level | Low | 8 | Uniform | Yes | 8 | 751 |
| Inter-word spacing [24,25] | Bit-level | Low | 8 | Uniform | Yes | 8 | 44 |
| Line shifting [19,22,23] | Bit-level | Low | 8 | Uniform | Yes | 8 | 1020 |
| Word shifting [19,22,23] | Bit-level | High | 8 | Uniform | Yes | 8 | 98.5 |
| Unicode space characters [26] | Bit-level (2-bits) | High | 4 | Uniform | Yes | 4 | 26.5 (inter-word space) |
| Change tracking technique [27] | Bit-level | High | 24 | Uniform | Yes | 24 | 132 |
| Publishing summary [28] | Mixed-type (2-bits) | High | 4 | Non-uniform | No | 4 | 333 |
| Proposed method | Character-level | High | 1 | Close to uniform | Yes | 4 | 4 |


From Table VII, it can be observed that the proposed method records the least number of cover characters required to embed a secret character. In addition, when compared with the existing bit-level embedding techniques, the proposed method performs best by making only one distortion to embed a secret character, whereas the former methods require a minimum of four distortions.

5. SECURITY ASPECT

This section discusses the security aspects of the proposed method, which include the uniform distribution in stego characters and the cryptographic aspects.

5.1. Uniform distribution in stego characters
To avoid any leakage of information to an adversary, the proposed method should distribute the secret characters uniformly among the seven levels of character spacing of a cover character. This has been studied by embedding an English secret message inside an English cover medium. Table VIII presents the obtained results.
From Table VIII, it can be observed that the proposed method distributes each secret character, among the available seven levels, in a uniform manner.

Table VIII. Character spacing of the identified stego characters.

Identified character spacing:

| Stego characters | 0.1 | 0.2 | 0.3 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|
| A | 57 | 59 | 49 | 65 | 53 | 67 | 59 |
| B | 16 | 14 | 22 | 20 | 11 | 26 | 20 |
| C | 36 | 29 | 20 | 25 | 26 | 25 | 18 |
| D | 29 | 31 | 41 | 36 | 37 | 41 | 44 |
| E | 98 | 86 | 82 | 85 | 91 | 81 | 102 |
| F | 22 | 19 | 29 | 18 | 21 | 28 | 23 |
| G | 14 | 18 | 10 | 18 | 21 | 24 | 22 |
| H | 33 | 38 | 38 | 28 | 37 | 32 | 31 |
| I | 88 | 64 | 98 | 68 | 62 | 75 | 72 |
| J | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| K | 2 | 2 | 1 | 5 | 4 | 5 | 4 |
| L | 23 | 23 | 21 | 24 | 21 | 18 | 23 |
| M | 19 | 20 | 13 | 17 | 10 | 20 | 17 |
| N | 90 | 89 | 81 | 113 | 85 | 73 | 85 |
| O | 64 | 74 | 71 | 91 | 85 | 81 | 73 |
| P | 5 | 5 | 8 | 14 | 5 | 8 | 7 |
| Q | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| R | 56 | 70 | 75 | 72 | 62 | 60 | 62 |
| S | 54 | 39 | 48 | 51 | 50 | 46 | 51 |
| T | 73 | 59 | 76 | 90 | 77 | 77 | 85 |
| U | 13 | 19 | 12 | 10 | 13 | 12 | 16 |
| V | 2 | 1 | 5 | 3 | 1 | 3 | 2 |
| W | 12 | 14 | 14 | 18 | 17 | 19 | 17 |
| X | 0 | 0 | 0 | 0 | 1 | 2 | 1 |
| Y | 14 | 12 | 16 | 12 | 12 | 12 | 11 |
| Z | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| Space | 90 | 81 | 86 | 75 | 78 | 79 | 75 |
| Dot | 6 | 9 | 6 | 10 | 4 | 1 | 9 |

5.2. Cryptographic aspect
By default, the proposed method provides a security level comparable with that of a polyalphabetic substitution cipher of cryptography, when the used CSM is kept

Figure 2. English secret in English cover work.


Figure 3. Random secret in English cover work.

Figure 4. Random secret in Random cover work.

Figure 5. Proposed method combined with the Format Preserving Encryption system. CSM, Character and String Mapping.


secret. In the proposed method, as both the secret and the cover document are English text, an attacker can try to gain knowledge about the hidden secret by performing frequency analysis on the stego characters. Hence, experiments were conducted to study the correlation between the frequency profiles of the secret and the stego characters.

5.2.1. Frequency distribution of stego characters
To study the distribution of secret characters in the available embedding space, English and random secret messages are embedded inside English and random cover works. Figures 2, 3 and 4 illustrate the occurrence frequency of characters in the secret message, cover work and stego characters.
From Figures 2, 3 and 4, it can be observed that the frequency profile of the stego characters is predominantly due to the frequency profile of the cover work and not due to that of the secret message.

5.2.2. Use of the Format Preserving Encryption system
Combining the proposed method with the Format Preserving Encryption (FPE) system [34] could further enhance the confidentiality of the embedded secret. Unlike other cryptographic schemes, FPE preserves the length and format of the given input. Thereby the encrypted cipher text, which again consists of English alphabets, is then embedded inside the chosen cover medium using the proposed method (refer Figure 5).
To break this type of dual security, an adversary first has to identify the presence of the hidden message and extract it. After this, the adversary has to break the FPE security system in order to get the original secret message. As it is very difficult even for an advanced adversary to scan all the traffic flowing in/out of an organization, identifying the steganographically embedded message on the fly is more complex. Hence, the combined system can provide a greater challenge to an adversary than when the FPE and the proposed method are individually applied.

6. APPLICATION

The proposed method, by default, can be used as a CLET to send any secret message that is not case-sensitive. In addition, it can also be used as a mixed-type embedding technique to send encrypted or multimedia data such as image, audio, video, etc. In such a case, the binary data has to be divided into groups of 4-bits each. The 16 possible 4-bit values are mapped to the 28 ADS characters as shown in Table IX. Mapping is carried out in such a way that the number of common characters between the strings mapped to the corresponding characters in CSM is minimal.
Using Table IX, the divided bit stream is converted into a character stream, which is in turn embedded using CSM. This is identical to the case of embedding a random secret in an English text (refer Figure 3).
For example:
Secret bit stream: 0000 1111 0010 1101 0101
Equivalent character stream: A/U Z C/M V F/□
In the above example, the first 4-bits ‘0000’ are embedded using the strings that are mapped to the character ‘A’ or ‘U’ in CSM, achieving an embedding capacity that varies from 4-bits/2-cover-characters to 4-bits/4-cover-characters. In contrast, the string mapped to the character ‘Z’ alone is used to embed the second 4-bits ‘1111’, achieving an embedding capacity of 4-bits/4-cover-characters. Hence, an average embedding capacity of 4-bits/3.25-cover-characters can be attained by the proposed method in this mixed-type manner. However, by using the mapping presented in Table IX, an embedding capacity of 4-bits/2.5-cover-characters can be achieved. This is due to the non-occurrence of common characters between the strings of the mapped characters.
Furthermore, the proposed method is not restricted to the English language and can be applied to any language.

7. FUTURE WORK

Future work can focus on making the proposed method suitable to embed all printable characters. Developing a

Table IX. Mapping of the ADS characters and the 4-bit values.

| 4-bit value | Mapped character | No. of common characters between the strings of the mapped character |
|---|---|---|
| 0000 | A/U | 0 |
| 0001 | B/P | 0 |
| 0010 | C/M | 0 |
| 0011 | D/T | 0 |
| 0100 | E/N | 0 |
| 0101 | F/□ | 0 |
| 0110 | G/R | 0 |
| 0111 | H/X | 0 |
| 1000 | I/Y | 0 |
| 1001 | J/. | 0 |
| 1010 | K/W | 0 |
| 1011 | L/O | 0 |
| 1100 | Q | - |
| 1101 | V | - |
| 1110 | S | - |
| 1111 | Z | - |

where ‘□’ represents ‘Space’ and ‘.’ represents ‘Dot’.
ADS, English alphabets, Dot and Space.
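The conversion between a bit stream and ADS characters described in Section 6 can be sketched in Python as follows. This is only an illustrative sketch: the dictionary reproduces Table IX (with ‘□’ written as an ordinary space), whereas the function names are hypothetical. The cost model in `average_cover_cost` is an assumption and not a statement of the authors' derivation: a 4-bit value mapped to a single character is taken to need 4 cover characters, and the cost for a value mapped to two characters is passed in as a parameter.

```python
from typing import List

# Table IX: each 4-bit value and its mapped ADS character(s).
# "□" (Space) is written as " " and Dot as ".".
NIBBLE_TO_ADS = {
    "0000": "AU", "0001": "BP", "0010": "CM", "0011": "DT",
    "0100": "EN", "0101": "F ", "0110": "GR", "0111": "HX",
    "1000": "IY", "1001": "J.", "1010": "KW", "1011": "LO",
    "1100": "Q", "1101": "V", "1110": "S", "1111": "Z",
}


def bits_to_candidates(bits: str) -> List[str]:
    """Split a bit string into 4-bit groups and return the candidate ADS characters of each group."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("bit stream length must be a multiple of 4")
    return [NIBBLE_TO_ADS[bits[i:i + 4]] for i in range(0, len(bits), 4)]


def characters_to_bits(chars: str) -> str:
    """Inverse lookup: every ADS character identifies exactly one 4-bit group."""
    ads_to_nibble = {c: nibble for nibble, cs in NIBBLE_TO_ADS.items() for c in cs}
    return "".join(ads_to_nibble[c] for c in chars.upper())


def average_cover_cost(pair_cost: float, single_cost: float = 4.0) -> float:
    """Mean number of cover characters per 4-bit group, weighting all 16 values equally (assumed model)."""
    pairs = sum(1 for cs in NIBBLE_TO_ADS.values() if len(cs) == 2)
    singles = len(NIBBLE_TO_ADS) - pairs
    return (pairs * pair_cost + singles * single_cost) / len(NIBBLE_TO_ADS)


if __name__ == "__main__":
    print(bits_to_candidates("0000 1111 0010 1101 0101"))
    print(characters_to_bits("UZMV "))
    print(average_cover_cost(pair_cost=2.0))
    print(average_cover_cost(pair_cost=3.0))
```

Under this assumed model, a pair cost of 2 cover characters gives the 4-bits/2.5-cover-characters figure, and a pair cost of 3 (the midpoint of the stated range of 2 to 4) gives the 4-bits/3.25-cover-characters figure.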


flexible algorithm that can generate FNS with strings of any user-defined length can also be targeted.

8. CONCLUSION
This paper has introduced a character-level embedding text steganographic algorithm that uses a font attribute called the character spacing to embed the secret. A novel technique named FNS in combination with the CSM has been proposed to achieve uniformity in embedding probability and thereby to increase the embedding capacity. The various characteristics of FNS and the procedure to generate them were provided. From the experiments conducted, an average embedding capacity of 27.87% is acquired. This is close to the theoretically expected value of 25% and outperforms other existing methods. The process of embedding a secret character in multiple cover characters enabled the proposed method to achieve a uniform embedding probability of 3.66 cover characters for a secret character. The imperceptible changes made in the cover work attained high secrecy and hence created no attraction in its visual appearance. The frequency profile of the stego characters is analyzed and found to follow the frequency profile of the cover medium and not the frequency profile of the secret message. The proposed embedding scheme, by default, provides a security level comparable with that of a polyalphabetic substitution cipher of cryptography. The use of the FPE system with the proposed method has been discussed to enhance the security level further. In addition, a method has been suggested to convert the proposed character-level embedding into a mixed-type embedding, making the method suitable to communicate any encrypted or multimedia data like image, audio, video, etc., with an average embedding capacity of 4-bits/3.25-cover-characters.

REFERENCES
1. Thiyagarajan P, Aghila G, Prasanna Venkatesan V. Stepping up internet banking security using dynamic pattern based image steganography. Communications in Computer and Information Science, Springer Berlin Heidelberg 2011; 193: 98–112.
2. Li X, Wang J. A steganographic method based upon JPEG and particle swarm optimization algorithm. Information Sciences, Elsevier 2007; 177(15): 3099–3109.
3. Petitcolas FAP, Anderson RJ, Kuhn MG. Information hiding-a survey. Proceedings of the IEEE, IEEE 1999: 1062–1078.
4. Chang C-C, Kieu TD. A reversible data hiding scheme using complementary embedding strategy. Information Sciences, Elsevier 2010; 180(16): 3045–3058.
5. Wang Z-H, Chang C-C, Lin C-C, Li M-C. A reversible information hiding scheme using left–right and up–down Chinese. The Journal of Systems and Software, Elsevier 2009; 82(8): 1362–1369.
6. Johnson NF, Jajodia S. Exploring steganography: seeing the unseen. IEEE Computer, IEEE 1998; 31(2): 26–34.
7. Thiyagarajan P, Aghila G. Reversible dynamic secure steganography for medical image using graph coloring. Health Policy and Technology, Elsevier 2013; 2(3): 151–161.
8. Thiyagarajan P, Natarajan V, Aghila G, Prasanna Venkatesan V, Anitha R. Pattern based 3D Image Steganography. 3D Research 2013; 4(1): 1–8.
9. Desoky A, Younis M. Chestega: chess steganography methodology. Security and Communication Networks, Wiley 2009; 2(6): 555–566.
10. Satir E, Isik H. A compression-based text steganography method. The Journal of Systems and Software, Elsevier 2012; 85(10): 2385–2394.
11. Nair AS, Kumar A, Sur A, Nandi S. Length based network steganography using UDP protocol. In IEEE 3rd International Conference on Communication Software and Networks (ICCSN), IEEE 2011: 726–730.
12. Mazurczyk W, Smolarczyk M, Szczypiorski K. Retransmission steganography and its detection. Soft Computing, Springer-Verlag 2011; 15(3): 505–515.
13. Shirali-Shahreza M, Shirali-Shahreza MH. Text Steganography in chat. In 3rd IEEE/IFIP International Conference in Central Asia on Internet, IEEE 2007: 1–5.
14. Thampi SM. Information hiding techniques: a tutorial review. In ISTE-STTP on Network Security & Cryptography, LBSCE, 2004.
15. Ettinger JM. Steganalysis and game equilibria. Information Hiding, Springer Berlin Heidelberg 1998; 1525: 319–328.
16. Agarwal M. Text steganographic approaches: a comparison. International Journal of Network Security & its Applications, AIRCC 2013; 5: 91–106.
17. Bennett K. Linguistic steganography: survey, analysis, and robustness concerns for hiding information in text. Purdue University, CERIAS, Technical 2004-13.
18. Topkara M, Topkara U, Atallah MJ. Information hiding through errors: a confusing approach. In Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents IX, SPIE 2007; 6505.
19. Brassil JT, Low S, Maxemchuk NF. Copyright protection for the electronic distribution of text documents. Proceedings of the IEEE, IEEE 1999; 87(7): 1181–1196.


20. Kahn D. The code-breakers: the comprehensive history of secret communication from ancient times to the internet (2nd edn). Scribner, New York, 1996.
21. Forouzan BA, Mukhopadhyay D. Cryptography and network security. Tata McGraw-Hill Education, India, 2011.
22. Brassil JT, Low S, Maxemchuk NF. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications, IEEE 1995; 13(8): 1495–1504.
23. Kim Y-W, Il-Seok O. Watermarking text document images using edge direction histograms. Pattern Recognition Letters, Elsevier 2004; 25(11): 1243–1251.
24. Singh P, Chaudhary R, Agarwal A. A novel approach of text steganography based on null spaces. IOSR Journal of Computer engineering, IOSR 2014; 3(4): 11–17.
25. Bender W, Gruhl D, Morimoto N, Lu A. Techniques for data hiding. IBM Systems Journal, IBM 1996; 35(3&4): 313–336.
26. Yee Por L, Wong KS, Onn Chee K. UniSpaCh: a text-based data hiding method using Unicode Space Characters. The Journal of Systems and Software, Elsevier 2012; 85(5): 1075–1082.
27. Liu T-Y, Tsai W-H. A new steganographic method for data hiding in Microsoft word documents by a change tracking technique. IEEE Transactions on Information Forensics and Security, IEEE 2007; 2: 24–30.
28. Majumder A, Changder S. A novel approach for text steganography: generating text summary using reflection symmetry. In International Conference on Computational Intelligence: Modeling Techniques and Application, Elsevier 2013: 112–120.
29. Ni Z, Shi Y-Q, Ansari N, Wei S. Reversible data hiding. IEEE Transactions on Circuits and Systems for Video Technology, IEEE 2006; 16(3): 354–362.
30. Zhang X, Wang S. Vulnerability of pixel-value differencing steganography to histogram analysis and modification for enhanced security. Pattern Recognition Letters, Elsevier 2004; 25(3): 331–339.
31. Pierce JR. An Introduction to Information Theory: Symbols, Signals and Noise (2nd edn). Dover Publications, New York, 1980.
32. de Ridder I. Reading from the screen in a second language: empirical studies on the effect of marked hyperlinks on incidental vocabulary learning, text comprehension and the reading process. Garant, Belgium, 2003.
33. Cutts M. Oxford Guide to Plain English. Oxford University Press, United Kingdom, 2013.
34. Bellare M, Ristenpart T, Rogaway P, Stegers T. Format-preserving encryption. Selected Areas in Cryptography, Springer Berlin Heidelberg 2009; 5867: 295–312.
