This dissertation investigates the possibility of a steganographic file system
which does not have to duplicate hidden data in order to avoid "collisions"
between the hidden and non-hidden data. This will ensure the consistency of
the hidden data, and avoid unnecessary data duplication while at the same
time providing an acceptable level of information security.
Original title:
INFORMATION SECURITY THROUGH IMAGE STEGANOGRAPHY USING LEAST SIGNIFICANT BIT ALGORITHM
Copyright:
Attribution Non-Commercial (BY-NC)
NON-DUPLICATING PROPERTIES

by

IAN DAVID ELLEFSEN

DISSERTATION

submitted in the fulfilment of the requirements for the degree

MAGISTER SCIENTIAE

in

INFORMATION TECHNOLOGY

in the

FACULTY OF SCIENCE

at the

UNIVERSITY OF JOHANNESBURG

SUPERVISOR: PROFESSOR SH VON SOLMS
CO-SUPERVISOR: MR WJC VAN STADEN

NOVEMBER 2008

Contents

List of Figures
List of Tables
List of Listings
Notation and Definitions
Summary

Part I: File Systems, Cryptography, and Steganography

1 Introduction
1.1 Introduction
1.2 Problem Statement
1.3 Goals
1.4 Structure of this Dissertation
1.4.1 Terminology Used in this Dissertation
1.5 Conclusion

2 File Systems
2.1 Introduction
2.2 The Disk
2.2.1 The Physical Disk
2.2.2 The Logical Disk
2.3 File System Layers
2.4 Basic File System Abstractions
2.4.1 Files
2.4.2 Directories
2.5 File System Structures
2.5.1 File System Descriptor
2.5.2 Storage Management
2.5.3 File Control Block
2.5.4 Directory Entries
2.6 File System Operations
2.6.1 POSIX Compliance
2.6.2 Read and Write Operations
2.6.3 System Operations
2.6.4 File Operations
2.6.5 Directory Operations
2.7 Virtual File System
2.8 Filesystem in Userspace (FUSE)
2.9 Summary
2.10 Conclusion

3 Cryptography
3.1 Introduction
3.2 Basic Concepts
3.3 Symmetric Encryption
3.3.1 Substitution Boxes
3.3.2 Data Encryption Standard (DES)
3.3.3 Serpent
3.4 Block Cipher Modes
3.4.1 Electronic Codebook Mode
3.4.2 Cipher Block Chaining Mode
3.4.3 ECB versus CBC
3.5 Asymmetric Encryption
3.5.1 RSA Encryption
3.6 Cryptographic Hash Functions
3.6.1 Message Integrity Codes
3.6.2 Message Authentication Codes
3.6.3 Birthday Attack
3.6.4 Secure Hash Algorithm (SHA)
3.7 Summary
3.8 Conclusion

4 Steganography and Steganographic File Systems
4.1 Introduction
4.2 Steganography
4.2.1 Terminology
4.2.2 Historic Steganography
4.2.3 Currency Protection Mechanisms
4.2.4 Copyright Protection Mechanisms
4.3 Digital Steganography
4.3.1 Image Steganography
4.3.2 Image Steganography Example
4.3.3 Audio Steganography
4.3.4 Least Significant Bit (LSB) Attacks
4.4 Cryptographic File Systems
4.4.1 The Cryptographic File System - CFS
4.4.2 Cryptfs
4.4.3 Linux Cryptoloop Driver
4.5 Steganographic File Systems
4.5.1 File System Assumptions
4.5.2 Anderson, Needham and Shamir
4.5.3 McDonald and Kuhn
4.5.4 Pang, Tan, and Zhou
4.6 Summary
4.7 Conclusion

Part II: SSFS: The Secure Steganographic File System

5 SSFS: File System Implementation
5.1 Introduction
5.2 Definitions
5.3 Problems with Existing Implementations
5.3.1 McDonald and Kuhn
5.3.2 Pang, Tan, and Zhou
5.4 Aim
5.4.1 The Need for a Steganographic File System
5.4.2 Limitations of a Steganographic File System
5.5 Basic Construction
5.5.1 Modes of Operation
5.5.2 The Host File System
5.5.3 The Hidden File System
5.5.4 Logical and Physical View
5.5.5 Operational Scenario
5.6 Summary
5.7 Conclusion

6 File System Structures for SSFS
6.1 Introduction
6.2 File System Structures
6.2.1 Superblock
6.2.2 TMap Array
6.2.3 Translation Map
6.2.4 Inode Table
6.2.5 Files and Directories
6.3 File System Initialisation
6.3.1 Host File System Initialisation
6.3.2 Hidden File System Initialisation
6.4 Summary
6.5 Conclusion

7 File System Operations for SSFS
7.1 Introduction
7.2 Layered File System Operations
7.3 Low-Level Operations
7.3.1 Read and Write Operations Overview
7.4 Intermediate-Level Operations
7.4.1 Logical-Physical Translation Operation
7.4.2 Translation-Map Operations
7.4.3 Inode Operations
7.5 High-Level Operations
7.5.1 Directory Operations
7.5.2 File Operations
7.6 Summary
7.7 Conclusion

8 File System Security for SSFS
8.1 Introduction
8.2 Security Overview
8.2.1 Security through Information Hiding
8.2.2 Security through Cryptography
8.3 Data Cryptography
8.3.1 Choice of Algorithm
8.4 Cryptographic Layer
8.4.1 Transparent Encryption
8.5 File System Data Encryption Scheme
8.5.1 Data Classes
8.5.2 Interactions
8.6 Encryption Hierarchy
8.6.1 Initialisation Vectors (IV)
8.6.2 Operational Scenario
8.7 Performance Concerns
8.8 Summary
8.9 Conclusion

9 Dynamic Reallocation
9.1 Introduction
9.2 Overview
9.2.1 Other Possible Collision Avoidance Techniques
9.2.2 Operational Scenario
9.2.3 Process Overview
9.3 Operational Details
9.3.1 Access to Hidden File System Structures
9.3.2 Write Redirection
9.3.3 Hidden Data Reallocation
9.3.4 Reallocation Categories
9.3.5 Sacrificial versus Preserving
9.4 Summary
9.5 Conclusion

10 Steganographic File System Performance
10.1 Introduction
10.2 Hidden File System Performance
10.2.1 Hidden Data Fragmentation
10.3 Host File System Performance
10.3.1 Dynamic Reallocation Performance
10.3.2 Dynamic Reallocation Code Profiles
10.4 Summary
10.5 Conclusion

11 Conclusion
11.1 Introduction
11.2 Contribution
11.3 Contribution of SSFS
11.4 Future Work
11.5 Conclusion

Appendices

A SSFS Implementation
A.1 Introduction
A.2 Host File System
A.3 Hidden File System
A.4 Screenshots
A.5 Conclusion

Bibliography

List of Figures

2.1 Physical disk
2.2 Logical disk
2.3 File system layers
2.4 A file system bitmap
2.5 A simple inode
2.6 An inode with levels of indirection
2.7 Virtual file system overview
3.1 Symmetric encryption
3.2 DES S1 substitution box [18]
3.3 DES encryption algorithm flow
3.4 Electronic codebook mode
3.5 Cipher block chaining mode
3.6 Comparison between ECB and CBC modes
3.7 Asymmetric encryption
3.8 File verification using a message authentication code
4.1 Example of an EURion constellation
4.2 Image steganography example
4.3 CFS design architecture [14]
4.4 Cryptfs design architecture [14]
4.5 Linux Cryptoloop driver architecture [14]
5.1 Simple host file system layout
5.2 Hidden file system logical layout
5.3 Hidden and host file system integration
5.4 Steganographic file system operational scenario
7.1 File system operation layers
7.2 Simple read and write operation overview
7.3 Logical to physical translation operation
7.4 Block allocation
7.5 Creating a directory
8.1 Cryptographic layer
8.2 Transparent encryption
8.3 Initialisation vector hierarchy
9.1 Write operation execution redirection
9.2 Function modified with reallocation methods
9.3 Reallocation black-box functions
9.4 Reallocation categories
10.1 Optimised versus unoptimised dynamic reallocation
10.2 Code profile of unoptimised dynamic reallocation
10.3 Code profile of optimised dynamic reallocation
A.1 Using the makehfs utility
A.2 Starting the hidden file system shell
A.3 The hsh commands
A.4 Directory listing with hsh
A.5 File creation with hsh
A.6 Displaying the contents of a file
A.7 Deleting a file with hsh
A.8 Deleting a directory with hsh
A.9 Creating a file with fsh
A.10 Dynamic reallocation in fsh

List of Tables

3.1 Basic cryptographic terms
3.2 Basic cryptographic functions
4.1 Basic steganographic terms
5.1 SSFS definitions
6.1 Calculation of the size of the Translation Map
6.2 Calculation of the size of the Inode Table

Notation and Definitions

x ⊕ y - Bitwise Exclusive OR (XOR) of x and y.
x ⊞ y - Bitwise Addition of x and y.
x ≫ y - Bitwise Right Shift of x by y bits.
x ≪ y - Bitwise Left Shift of x by y bits.
¬x - Unary Negation of x.
x ∧ y - Bitwise And of x and y.
x ∨ y - Bitwise Or of x and y.
|S| - The size of set S.
⌈x⌉ - Ceiling function that returns the smallest integer ≥ x.
⌊x⌋ - Floor function that returns the largest integer ≤ x.
f : A → B - A function f mapping A to B.
gcd(a, b) - A function to compute the Greatest Common Divisor of two non-zero integers a and b.
bit - A binary digit, either 0 or 1.
byte - A set of 8 bits.
kibibyte (KiB) - A kilo binary byte, where 1 KiB = 2^10 bytes = 1024 bytes.
mebibyte (MiB) - A mega binary byte, where 1 MiB = 2^20 bytes = 1024 KiB.
gibibyte (GiB) - A giga binary byte, where 1 GiB = 2^30 bytes = 1024 MiB.

List of Listings

6.1 Superblock structure
6.2 Definition of the TMap Array
6.3 Translation Map structures
6.4 Inode Table entry
6.5 Directory Entry structure

Summary

This dissertation investigates the possibility of a steganographic file system which does not have to duplicate hidden data in order to avoid "collisions" between the hidden and non-hidden data. This will ensure the consistency of the hidden data, and avoid unnecessary data duplication, while at the same time providing an acceptable level of information security.

The dissertation will critically analyse a number of existing steganographic file systems in order to determine the problems which are faced by this field. These problems will then be addressed, which will allow for the definition of a possible solution.

In order to provide a more complete understanding of the implementation discussed in the latter part of this dissertation, a number of background concepts are discussed. This includes a discussion of file systems, cryptography, and steganography, each of which contributes to the body of knowledge required for later chapters.

The latter part of this dissertation outlines the Secure Steganographic File System (SSFS). This implementation will attempt to effectively manage the storage of hidden data which is embedded within a host file system. The dissertation will outline how SSFS allows fragments of hidden data to exist in any physical location on a storage device, while still maintaining a consistent file system structure. The dissertation will then critically analyse the impact of such a system by examining the impact on the host file system's performance. This will allow the feasibility of such a system to be demonstrated.

Keywords

Information Security, Cryptography, Information Hiding, Steganography, File Systems, Steganographic File Systems.

Part I
File Systems, Cryptography, and Steganography

1 Introduction

1.2 Problem Statement

The application of steganography as a viable method for hiding large amounts of information has always been limited.
Traditional steganographic techniques, such as image and audio steganography [1, 41], can hide only a very limited amount of information, depending on the total size of the cover-file.

Steganographic file systems strive to solve this problem by allowing relatively large amounts of data to be hidden within an existing host file system [33]. This however presents a new set of challenges. By hiding data within an existing file system, a conflict arises between the hidden and non-hidden data. This results in "collisions" when the host file system attempts to write to a physical block which contains hidden data. This in turn results in hidden data being overwritten, and thus the consistency of that data being called into question.

In this dissertation we attempt to address this issue by defining a steganographic file system which has the ability to dynamically reallocate fragments of hidden data to any free physical location on a storage device. This will allow hidden data to be effectively embedded within a host file system, and thus guarantee the consistency of that data.

This dissertation is primarily concerned with the construction of a dynamically reallocating steganographic file system. As such, this study does not specifically go into the details of data consistency following the incorrect shutdown of the file system, and assumes a computing environment in which data is always written to storage media in a consistent manner.

In the following section we will discuss the goals which we wish to achieve with our steganographic file system implementation.

1.3 Goals

In order to address the problems outlined above, we present our steganographic file system implementation, which we call the Secure Steganographic File System (SSFS). To allow SSFS to have the required impact we present the design goals listed below. These goals are outlined in greater detail in chapter 5.
Security - hidden data must remain protected from attack through the use of effective security mechanisms.

Consistency - hidden data which is retrieved from the hidden file system must be the same as the original data which was stored.

Transparency - operation of the host file system must not be impacted in a significant way.

Backward Compatibility - SSFS must remain backward compatible with the host file system implementation.

Dynamic Reallocation - hidden data must be locatable from any physical location. The physical location of the hidden data must be allowed to change.

These goals are achieved by SSFS through the construction of a compound file system, where the implementation contains both the host file system and hidden file system components. The interactions between these two components are carefully orchestrated to allow for seamless integration.

In the following section we will outline the structure of this dissertation.

1.4 Structure of this Dissertation

We will now outline the structure of this dissertation by briefly discussing the content of each of the chapters which follow.

Chapter 2 is the first of the background chapters. In this chapter we present an overview of traditional storage media, and discuss a number of concepts relating to the construction of a file system. Concepts introduced in chapter 2 are used throughout the later chapters.

Chapter 3 discusses the concepts relating to cryptography as a method of providing information security. We discuss a number of cryptographic techniques and algorithms. These concepts are used primarily in chapter 8 to describe the implementation's security scheme.

Chapter 4 introduces steganography as a method of providing information hiding. This chapter discusses both traditional steganographic techniques and steganographic file systems. An important aspect of this chapter is the distinction between a cryptographic and a steganographic file system.
Concepts discussed in this chapter are used throughout the following chapters. This chapter concludes the background discussion.

Chapter 5 is the first of the implementation chapters, where we discuss the implementation details of SSFS. This chapter is concerned with critically analysing a number of existing implementations. We then present our aim for SSFS, and go on to discuss the basic construction of SSFS, which lays the framework for the following chapters.

Chapter 6 discusses the control structures which are used within SSFS in order to embed hidden data within the host file system. The structures discussed in this chapter are used extensively throughout the following chapters.

Chapter 7 discusses the hidden file system operations as a mechanism for interacting with the hidden data. In this chapter we define the operational layers used by SSFS in order to allow for easy interaction with the hidden file system structures. This is of particular importance for chapter 8, as the security scheme extends these layers.

Chapter 8 is concerned with the security scheme used by SSFS in order to provide information security. Hidden data is encrypted using multiple initialisation vectors to allow for maximum security. This chapter also defines the transparent encryption layer, which allows hidden data to be encrypted and decrypted transparently, as needed. The security scheme works in tandem with the dynamic reallocation mechanism discussed in chapter 9 in order to allow encrypted data to be reallocated by the host file system.

Chapter 9 defines the dynamic reallocation mechanism which is used by SSFS to avoid "collisions" between the hidden and non-hidden data. This chapter utilises almost all of the concepts from the previous chapters in order to describe the dynamic reallocation process.

Chapter 10 discusses the performance impact of the dynamic reallocation mechanism on the host file system.
This will allow us to critically analyse the effectiveness of SSFS.

Finally, in chapter 11 we reflect on the content of this dissertation and discuss the contribution made by SSFS.

1.4.1 Terminology Used in this Dissertation

In this dissertation, a UNIX approach is taken when discussing various concepts. The UNIX environment has a long-standing, accepted set of terms used to describe various aspects of the operating system. These terms are generally easy to understand and provide a good basis for development.

An advantage of utilising a UNIX approach is that many UNIX-like operating systems are open source. This allows the source code, along with large amounts of documentation, for the operating system to be freely accessed. Using this approach as a platform, we can describe many different concepts in this dissertation in a consistent and accepted manner.

1.5 Conclusion

In this chapter we introduced the content of this dissertation. In section 1.2 we discussed the problem statement which will define the theme for the following chapters. Section 1.3 presented an overview of the goals which we hope to achieve with SSFS. Finally, in section 1.4 we outlined the structure of this dissertation.

In the following chapter we will start the discussion with the first of the background chapters, which deals with file systems.

Chapter 2
File Systems

2.1 Introduction

As the storage capability of physical devices grew from only a few Megabytes to hundreds of Gigabytes in modern computer systems, a need arose for data to be organised in a consistent, logical way. Modern file systems need to provide many more features in today's security-centric world as compared to file systems that existed only decades ago. However, although the functional requirements for a file system have changed, the original design concepts are still in use today, and form the basis for many modern file systems.
In this chapter the discussion primarily focuses on UNIX-based file systems, and as such uses terminology that is specific to UNIX-based file systems. References to other types of file systems are included for interest. This is done as later chapters will make reference to UNIX-styled operating systems and related UNIX-specific terminology.

In this chapter we discuss the different core components of a file system. Firstly we discuss the hard disk drive as a storage medium for a file system. We then introduce four file system layers that describe the interaction between the file system metadata, the data, and the physical disk. Two basic file system abstractions, files and directories, are then discussed as a method for organising data within the file system. We then go on to introduce a number of structures which are implemented within a file system in order to organise data. Finally we introduce a number of file system operations and conclude with an overview of a Virtual File System. These will all be discussed in the following sections.

2.2 The Disk

Information is usually stored permanently on secondary memory. Primary memory refers to the Random Access Memory (RAM) or the cache memory that is physically on the Central Processing Unit (CPU); this memory is usually made up of fast volatile memory that is accessed by the CPU when a program is executed. In fact, all data needed for processing has to be stored in primary memory.

The volatility of primary memory does not lend itself to the long-term storage of information. If a system should be powered down, all the information that is stored in primary memory would be lost [50]. All permanent data needs to be stored in secondary memory, such as a hard disk drive. A management structure such as a file system needs to be in place to effectively manage the storage and retrieval of information that is stored on secondary memory.
A hard disk drive can be viewed as either logical or physical; these views are discussed below.

2.2.1 The Physical Disk

Physically, a hard disk drive is made up of one or more magnetic platters, a number of read and write heads, and control circuitry that allows the computer to interface with the hard disk drive. Data is stored magnetically on the platters, which are coated in a magnetic substance. The control circuitry allows information to be stored on and retrieved from the magnetic platters.

Data is stored on the disk in physical blocks or sectors on the magnetic platters; a sector is the smallest unit of data that the hard disk will read or write [23]. The size of the blocks on the hard disk drive is usually 512 bytes, and thus all data that is stored on the disk is written in sequences of 512-byte blocks. Blocks on the hard disk drive are ordered in the following way, as seen in figure 2.1:

Sectors are the blocks themselves.

Clusters are two or more adjacent sectors.

Tracks are concentric rings of sectors on the disk. Blocks that are physically located next to each other are said to be in the same track.

Cylinders (not seen in figure 2.1) are groups of sectors that are located on different platters but that are directly underneath each other. The cylinder will refer to every block in this grouping.

[Figure 2.1: Physical disk - hard disk drive platter, showing sectors, clusters, and tracks]

The physical location of data on the disk is addressed using Cylinder Head Sector (CHS) addressing, where data is located by referring to the cylinder where the data exists, the head number that will read or write the data, and the sector or block in the cylinder where the data can be located.

2.2.2 The Logical Disk

It would become very difficult to continually refer to a disk block using its physical address, which may be dependent on a particular manufacturer's specification. So there is a generic view of the hard disk drive, called the logical disk. The logical disk can be simply viewed as a linear array of equally sized blocks (see figure 2.2) [23].

[Figure 2.2: Logical disk - linear array of blocks]

The Logical Block Address (LBA) is the number of a logical block within the logical disk array. The LBA allows system designers to reference a storage location in a simple, consistent manner, regardless of the physical construction of the hard disk drive. This is achieved through the use of methods which convert logical block addresses into physical block addresses.

Now that we have discussed the physical and logical construction of the hard disk drive, we will discuss a number of file system layers in the next section. The file system layers are a high-level overview of the interaction between the file system implementation and the hard disk drive.

2.3 File System Layers

The file system layers are a high-level overview of the different functional components that collectively form a working file system. The lower layers would usually be implemented in the operating system's kernel, and the higher levels would be implemented in what we would know as the file system implementation. Silberschatz, Galvin, and Gagne [50] point out that the higher levels are extended by functionality defined in the lower levels. Information flows through the different layers until it is at a point where the data can be written directly to the disk.

The four different functional layers that are used when interacting with a file system are I/O Control, Basic File System, File-Organisation Module, and the Logical File System. These different file system layers interact together as shown in figure 2.3.

[Figure 2.3: File system layers - application programs in user space above the Logical File System, File-Organisation Module, Basic File System, and I/O Control in kernel space, above the physical device]

We will now discuss each of these functional layers in detail.

I/O Control

Silberschatz, Galvin, and Gagne [50] define the lowest file system level as I/O Control. This level is responsible for interacting with the hardware devices through the use of device drivers, which communicate with the hardware controller in order to retrieve or store data on the device.

Basic File System

The Basic File System is simply responsible for passing generic read and write commands to the I/O Control level [50]. A generic command would be used to reference a particular physical block (using an addressing method such as CHS) to access data that is stored on the disk.

File-Organisation Module

The File-Organisation Module layer is responsible for converting the logical position of data on the disk to a physical address which can then be used to access the data. This layer also manages a list of disk blocks which are currently being used by the file system, called the allocated blocks, and those which are not being used, called the unallocated blocks.
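The translation between CHS addressing (section 2.2.1) and the linear LBA view (section 2.2.2) can be sketched as follows. This is an illustrative sketch only: the geometry constants and function names are assumptions for the example, not values taken from any particular drive or from this dissertation.

```python
# Hypothetical drive geometry, chosen purely for illustration.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63

def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Convert a Cylinder/Head/Sector address to a Logical Block Address.

    Sectors are conventionally numbered from 1, hence the (sector - 1).
    """
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba: int) -> tuple:
    """Inverse translation, recovering the physical coordinates from an LBA."""
    cylinder, remainder = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector_index = divmod(remainder, SECTORS_PER_TRACK)
    return (cylinder, head, sector_index + 1)

# The first sector on the disk, CHS (0, 0, 1), is logical block 0.
assert chs_to_lba(0, 0, 1) == 0
# The two translations are inverses of each other.
assert lba_to_chs(chs_to_lba(2, 5, 17)) == (2, 5, 17)
```

The File-Organisation Module performs exactly this kind of mapping, so that the layers above it can work purely with logical block numbers.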
Logical File System

The Logical File System is responsible for managing the metadata for the files and directories. This is the layer with which the user and application programs interact. Metadata, such as the human-readable name of a file or directory, is translated to block addresses in this level to be passed to the File-Organisation Module. This level is responsible for allocating and managing the file system structures that are defined in the lower levels [50].

The file system layers discussed above describe the interaction between the user data and the hard disk drive. We will now discuss files and directories. These two data types can be used to build up an organisational structure, which is essential to the operation of a file system.

2.4 Basic File System Abstractions

There are two basic properties that are common to every file system:

1. The ability to store information; this is usually achieved by storing information in files.

2. The ability for files to be organised into a directory structure. This provides a hierarchical organisation of all the information on disk.

The primary purpose of the file system is to manage the organisation of files and directories, and to implement mechanisms that facilitate the fast and efficient storage and retrieval of the data on the disk.

2.4.1 Files

The most basic file system object is the file. Giampaolo [23] refers to the fact that all information on the file system is stored in some sort of file. Files generally do not have a system-defined structure, and are viewed by the file system simply as a "stream of bytes" which needs to be written to or read from the disk [23]. The meaning of the data within a file is designed and interpreted by the creator of the file. Different content types such as audio, video, or text are in essence all a stream of bytes, and are all managed in the same way by the file system.
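The "stream of bytes" view can be illustrated with a minimal sketch. The File class and its fields here are hypothetical, invented purely for this illustration; the point is that the file system treats every content type identically, consulting only the size.

```python
# A hypothetical file object: raw bytes plus metadata ("data about data").
# Only the size matters to the file system itself; interpreting the bytes
# is left to the file's creator.
class File:
    def __init__(self, data: bytes):
        self.data = data                      # the raw stream of bytes
        self.metadata = {"size": len(data)}   # the one attribute the FS needs

text_file = File("hello".encode("utf-8"))
audio_file = File(bytes([0x52, 0x49, 0x46, 0x46]))  # first bytes of a RIFF/WAV header

# To the file system both are the same kind of thing: byte sequences of some size.
assert text_file.metadata["size"] == 5
assert audio_file.metadata["size"] == 4
```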
Files usually have a collection of attributes, commonly referred to as metadata. Simply put, metadata is data about data. Very little of the information that is contained in the metadata is useful to the operation of the file system, except metadata indicating the size of the file. Metadata serves as an interface between the raw data contained in the file system and the human operator.

2.4.2 Directories

Directories are used to organise data on the disk into an organisational structure. Originally there was no need for a complex hierarchical directory structure, because the small sizes of disk drives prevented the storage of a large number of files. As disk drives grew in size, a need arose to organise files on the disk drive in a logical way to allow the operator to quickly find and access the data; the traditional hierarchical directory structure evolved from this need, as it allows files to be efficiently organised.

A directory can contain sub-directories and files. Love [31] explains that directories in the UNIX file system are simply modified files. UNIX directories are files that contain a list of inodes for associated child sub-directories and files. There are a number of different methods that a file system can use to handle the organisational structure of the file system; each of these methods will have an impact on the overall performance of the file system. As a result, directory structures are usually designed to be managed using an abstract data structure such as a tree, to allow for quick traversal of the directory structure. For example, the Linux Ext2 file system uses B-Trees, and the Mac OS HFS+ file system uses B*-Trees (see the discussion on Multiway Search Trees on page 19) to manage its directory structure [23].

In order for files and directories to be organised in a meaningful way, there needs to be a method of referencing the physical location of the underlying data on the storage device.
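The idea that a UNIX directory is simply a file containing a table of name-to-inode entries can be sketched as follows. The structures and the resolve function are hypothetical illustrations of the general mechanism, not the structures used by any real file system or by SSFS.

```python
# A toy inode table: inode number -> content. A directory's content is just
# a table of (name, inode number) entries; a regular file's content is bytes.
ROOT_INODE = 0

inodes = {
    0: {"usr": 1, "home": 2},    # /        (directory)
    1: {"bin": 3},               # /usr     (directory)
    2: {},                       # /home    (empty directory)
    3: b"...executable data...", # /usr/bin (a regular file, for the example)
}

def resolve(path: str) -> int:
    """Resolve an absolute path to an inode number by walking directory files."""
    inode = ROOT_INODE
    names = path.strip("/").split("/") if path.strip("/") else []
    for name in names:
        entries = inodes[inode]
        if not isinstance(entries, dict):  # tried to descend into a regular file
            raise NotADirectoryError(name)
        inode = entries[name]              # follow the name-to-inode entry
    return inode

assert resolve("/") == ROOT_INODE
assert resolve("/usr/bin") == 3
```

Real file systems replace the flat lookup table with tree structures (such as the B-Trees mentioned above) so that directories with many entries can still be traversed quickly.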
In the following section we will discuss a number of generic file system structures which are used to build up the "structure" of the files and directories on the file system.

2.5 File System Structures

A file system in its most basic form can store, retrieve, and organise information in a logical way. In order for this to be achieved, a set of basic information needs to be maintained. This information usually takes the form of a number of data structures that exist within the file system, which are used to manage, coordinate, and reference data. A file system will have to reference and maintain these file system structures for every operation that can be performed on the data.

Every file system implementation has a different set of data structures that it will maintain. The following generic file system structures are discussed below: the file system descriptor, the storage map which provides storage management, file control blocks, and directory entries. These basic structures are found in one form or another on most file systems, although they may differ in design and implementation.

2.5.1 File System Descriptor

The file system descriptor contains the most basic set of information that can be used to describe and reference all other structures within the file system. When a file system is created, all the basic on-disk structures are defined and the physical position of these structures on the disk is determined. Once the on-disk structures have been created, the file system records their physical location in the file system descriptor. Traditionally the file system descriptor is called the superblock within UNIX file systems; it is also called the Master File Table in the NTFS file system [50]. For the purposes of this dissertation, we will use the term "superblock" to refer to the file system descriptor. The superblock must include all attributes of the file system needed to allow data to be retrieved.
This may include the total number of usable blocks within the file system, the number of blocks that are currently in use, pointers to any storage maps, and pointers to any file control blocks [50]. Although there will be a number of similarities between the design of the superblock on different file systems, the contents of the superblock are determined by the file system designers. A common component of most superblocks is some form of consistency information; this is used to mark whether certain operations and structures in the file system have been stored correctly, and to determine whether a consistency check needs to be run. The superblock will also contain generic metadata about the file system, such as an identifying name, or any "Magic Numbers"¹.

In the following section we will discuss storage management, which allows physical disk blocks to be allocated and deallocated within the file system.

2.5.2 Storage Management

All the blocks on a disk which are allocated to a particular file system need to be mapped in some way. This mapping allows the file system to record which physical blocks are currently in use. The file system will use the storage map in conjunction with an allocation policy to determine where data should be positioned; this is usually done to minimise fragmentation.

File system blocks can be organised in a number of different ways in order to facilitate the structure of the data. Listed below are a number of commonly used methods for organising the physical file system data.

Block - a file system block. A file system block is the smallest unit of data which the file system will process, and can differ from the physical disk block size.

Extent - a number of contiguous file system blocks. An extent will usually be represented by a start address and a total length.
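The extent representation described above can be sketched minimally: a start block address plus a length stand in for that many contiguous file system blocks, and expanding the extent recovers the individual block addresses. The values used are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    start: int   # first file system block address
    length: int  # number of contiguous blocks

    def blocks(self):
        """Expand the extent into the block addresses it covers."""
        return list(range(self.start, self.start + self.length))

# Four contiguous blocks starting at block 1000.
e = Extent(start=1000, length=4)
assert e.blocks() == [1000, 1001, 1002, 1003]
```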
¹ Magic Numbers are simply constant numbers that can be used to identify data structures, to provide a simple method of consistency checking, or to differentiate between versions of data structures. An example of a magic number could be 0x53424C4B, which can be represented in ASCII as SBLK.

Blockrun - another term for an extent.

Allocation Group - a number of contiguous extents or blocks. Usually a file system can be broken up into a number of equal-sized allocation groups. As in the case of Ext3, each allocation group is regarded as a "mini-file system", each with a set of corresponding file system structures.

In the following sections we will discuss a number of methods that can be used to manage the file system blocks. These techniques allow the file system to record which of the file system blocks are allocated and which are unallocated.

Storage Bitmap

The simplest approach to storage management is to use a bitmap. This approach was used in early file system designs as it is very simple to implement, and easy to understand. A storage bitmap represents the entire physical device as a linear array of file system blocks, as seen in figure 2.4, with each physical block having a corresponding bit in the storage bitmap. Each bit is either 0 when the block is not allocated or 1 when the block is allocated [34].

Although a bitmap is a very simple solution to mapping file system blocks, it can be inefficient. If the bitmap is implemented as a linear array of bits, then it is subject to the same searching constraints as normal linear arrays. The worst-case running time for searching a linear array has a time complexity of O(n) [29].
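The storage bitmap described above can be sketched as follows: one bit per file system block, 0 for unallocated and 1 for allocated, with the linear scan for a free block exhibiting exactly the O(n) worst case just noted. This is a minimal illustration, not any particular file system's implementation.

```python
class StorageBitmap:
    """One bit per file system block: 0 = free, 1 = allocated."""

    def __init__(self, nblocks: int):
        self.bits = [0] * nblocks

    def allocate(self):
        """Linear scan for the first free block: O(n) in the worst case."""
        for block, bit in enumerate(self.bits):
            if bit == 0:
                self.bits[block] = 1
                return block
        return None  # device is full

    def free(self, block: int):
        """Mark a block as unallocated so it can be reused."""
        self.bits[block] = 0

bm = StorageBitmap(8)
assert bm.allocate() == 0
assert bm.allocate() == 1
bm.free(0)
assert bm.allocate() == 0  # the freed block is reused
```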
[Figure 2.4: A file system bitmap]

Silberschatz, Galvin, and Gagne [50] explain that modern processors implement "bit-manipulation" instructions which allow the bitmap operations to be implemented in a very efficient manner, and this allows a bitmap implementation to gain a major performance advantage. However, they also point out that there will only be an advantage when the bitmap is kept in memory, and as seen above, this is not always possible because of the storage requirements of larger hard disk drives.

Multiway Search Tree Implementation

Another approach to storage management is to use a more complex data structure, such as a multiway search tree, for example a B-Tree or a B+Tree. The XFS² file system implementation utilises B+Trees to manage the storage blocks on a physical device. XFS manages disk blocks in allocation groups, using two B+Trees to manage the free space within each allocation group [38]. Both B+Trees store a sorted array of free space extents, where the first is sorted by block offset, and the second is sorted by the size of the extent. This allows free space to be located near a particular physical block offset [38].

Allocation Policies

Allocation policies are implemented in most file systems in order to allocate blocks in the most contiguous way, so that all the data relating to a file is stored as sequentially as possible on the disk. This usually involves trying to locate a set of contiguous unallocated file system blocks in the storage map. Smith and Seltzer [51] point out that a typical UNIX file system will have a performance degradation of about 15% after two years of operation, due to file fragmentation. Most allocation policies aim to increase the locality of reference of data in order to minimise seek times, and to improve the overall layout of the data on the disk [34].
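A simple contiguous allocation policy of the kind described above can be sketched as a first-fit scan over the storage bitmap for a run of n free blocks, so that a file's blocks end up sequential on disk. Real policies are considerably more refined; this is only an illustration.

```python
def allocate_contiguous(bits, n):
    """First-fit search for n contiguous free blocks in a bitmap.

    Returns the start block of the allocated run, or None if no
    contiguous region of size n exists.
    """
    run_start, run_len = 0, 0
    for i, bit in enumerate(bits):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                for b in range(run_start, run_start + n):
                    bits[b] = 1  # mark the whole run as allocated
                return run_start
        else:
            run_len = 0  # run broken by an allocated block
    return None

bitmap = [1, 0, 0, 1, 0, 0, 0, 1]
assert allocate_contiguous(bitmap, 3) == 4  # blocks 4-6 form the first free run of 3
assert bitmap[4:7] == [1, 1, 1]
```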
FFS³ will simply place data from the same file within the same allocation group [34], thereby increasing locality of reference.

Modern hard drives can be very large; a 500 GiB hard drive is not uncommon. A storage map may therefore become extremely large; this must be a consideration, because the storage map will need to be searched for free blocks. Through efficient use of a storage management structure and an efficient allocation policy, free space can be located quickly, and this can greatly improve the performance of a file system.

² XFS is a file system implementation created and maintained by Silicon Graphics, Inc. (SGI).
³ FFS, the Fast File System for UNIX, is used in the 4.3BSD family of operating systems.

In the following sections we will discuss the File Control Blocks and the Directory Entries as mechanisms for referencing specific files and directories within the file system.

2.5.3 File Control Block

[Figure 2.5: A simple inode, showing an inode structure (inode number, attributes, and direct block pointers) referencing data blocks on the physical disk]

The file control block is one of the most important structures within the file system, as it is responsible for describing the location of a file on the disk, and for storing the file's metadata. Traditionally the file control block is called an "inode" on UNIX systems [34]. The design of an inode is of critical importance, because specific attributes of the file system are defined by the inodes, such as the maximum amount of disk space that can be allocated to a single file [23]. With files becoming larger to accommodate data such as video and audio, inodes need to be designed in a way that allows the file system to address large amounts of data. A simple inode could have a structure similar to that shown in figure 2.5, where the inode contains metadata relating to a particular file.
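The simple inode of figure 2.5 can be sketched as a small structure holding an inode number, some metadata, and a list of direct block addresses; mapping a byte offset within the file to a disk block is then a single array lookup. The field names and block numbers are illustrative, not any real file system's layout.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    number: int                                  # inode number
    size: int = 0                                # file size in bytes (metadata)
    direct: list = field(default_factory=list)   # direct block addresses

    def block_for_offset(self, offset: int, block_size: int = 1024) -> int:
        """Map a byte offset within the file to a disk block address."""
        return self.direct[offset // block_size]

# An inode for a 2048-byte file occupying blocks 1001 and 1002.
inode = Inode(number=7, size=2048, direct=[1001, 1002])
assert inode.block_for_offset(0) == 1001
assert inode.block_for_offset(1500) == 1002  # second 1024-byte block
```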
References to the physical location of the file data are stored in a "block list". A block list is an array of disk block addresses where the data an inode references is located.

To allow file systems to store large amounts of data, a level of "indirection" is introduced into the inode structure. For example, consider a file system that has a block size of 1024 bytes, and assume an inode directly references eight disk blocks in a single structure. A single inode, and therefore a file, can then only reference a maximum of 8192 bytes, or 8 KiB, of disk space; this is not nearly sufficient for modern computing.

A file system implementation will introduce a level of indirection into the structure of an inode to increase the number of disk blocks that it can refer to, and as a result increase the maximum file size. An inode will reference a number of "direct" blocks, and will then have a reference to a number of "indirect" blocks, which in turn reference a number of "direct" blocks. In most cases an inode will also reference "double-indirect" blocks, which reference a number of "indirect" blocks. In rare cases there can be a third level of indirection, where an inode references "triple-indirect" blocks. The relationship between direct, indirect, and double-indirect blocks can be seen in figure 2.6.

[Figure 2.6: An inode with levels of indirection, showing direct, indirect, and double-indirect blocks referencing data blocks on the physical disk]

The levels of indirection can dramatically increase the maximum file size.
For example, again assume a file system block size of 1024 bytes, and an inode that references eight "direct" blocks and eight "indirect" blocks, each of which in turn references eight direct blocks. The direct blocks would reference 8 KiB, and each of the eight indirect blocks would reference a further 8 KiB, so the total amount of disk space that a single inode could reference would be 72 KiB; this could be increased in a similar way by using "double-indirect" blocks.

UNIX file systems store inodes in a "table of inodes" located somewhere on the disk. Inodes can be stored in one large single table, but most UNIX file systems store the inodes for a particular allocation group in a separate table. This improves the performance of the file system by increasing the locality of reference between a file and any associated management structures.

The NTFS file system manages file metadata in a different way: the Master File Table stores all the metadata for a file in a relational database, with an entry for each file on the system [50]. This is a more complex method of handling file metadata, but it has all the advantages of a relational database, such as easy indexing and complex queries.

In the following section we will discuss Directory Entries as a method for managing a hierarchical directory structure, which is vital for the management of data on the physical disk.

2.5.4 Directory Entries

Directory Entries are used to manage the directory structure on the disk. Different file system implementations will use different types of abstract structures to manage the file system directory hierarchy. Directory structures can be maintained using anything from simple arrays to complex trees, each offering a different set of advantages to the overall directory structure.
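The indirection arithmetic discussed above can be sketched as a small function: with d direct pointers, i indirect blocks each holding p pointers, and optionally some double-indirect blocks, the reachable block count is d + i·p (plus p² per double-indirect block). With 8 direct blocks and 8 indirect blocks of 8 pointers each, a 1024-byte-block file system reaches 8 + 64 = 72 blocks, i.e. 72 KiB, from one inode.

```python
def max_file_blocks(direct, indirect, ptrs_per_block, double_indirect=0):
    """Blocks reachable from one inode under the given indirection scheme."""
    return (direct
            + indirect * ptrs_per_block
            + double_indirect * ptrs_per_block ** 2)

BLOCK = 1024  # bytes per file system block, as in the example

# The worked example from the text: 8 direct + 8 indirect blocks of 8 pointers.
blocks = max_file_blocks(direct=8, indirect=8, ptrs_per_block=8)
assert blocks == 72
assert blocks * BLOCK == 72 * 1024  # 72 KiB reachable from a single inode

# Adding 8 double-indirect blocks of 8 * 8 pointers extends this further.
assert max_file_blocks(8, 8, 8, double_indirect=8) == 72 + 8 * 64
```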
The simplest approach is to view directory entries as a special type of file; the directory entries themselves are simply a linear list of the sub-directories and files that exist within the current working directory [23]. This approach can become inefficient when a directory contains a large number of files. Giampaolo [23] points out that another approach is to store directory entries in a tree structure such as a B-Tree, B+Tree, or B*-Tree. Carrano and Savitch [9] explain that a B-Tree is a balanced multiway search tree of order m, where each node in the tree can have up to 2m children. Giampaolo [23] goes on to explain that a B-Tree allows the file system to store a key for each directory entry, which allows the directory structure to be traversed quickly.

Every file system, regardless of structure, requires a root directory, usually referred to simply as the root, which contains a number of files and directories. The root is represented by a slash ('/') on UNIX systems, and by a letter designation ('C:\') on Microsoft Windows platforms. It serves as a "mounting point" from which the rest of the directory structure can be referenced.

In order for the operating system to interact with the file system implementation, a number of file system operations must be supplied by the operating system. These operations allow access to the file system structures and the data which is stored on the storage device. A number of generic file system operations are discussed in the following sections.

2.6 File System Operations

The file system needs to provide mechanisms to manage the data that it contains; this is achieved through a number of operations which the file system provides. The most basic of these are the read and write operations; all other file system operations are simply a combination of read and write operations.
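The role of the root as a "mounting point" can be sketched by resolving a path component by component, starting at '/': each component is looked up in the current directory's entries until the target is reached. The directory tree below is entirely hypothetical.

```python
# A hypothetical directory hierarchy: each directory maps child names
# to the node they resolve to (another directory, or a file marker).
tree = {
    "/": {"home": "/home", "etc": "/etc"},
    "/home": {"ian": "/home/ian"},
    "/home/ian": {"thesis.tex": "file:thesis"},
}

def resolve(path):
    """Walk a path from the root, one component at a time."""
    node = "/"
    for part in filter(None, path.split("/")):
        node = tree[node][part]  # one directory lookup per component
    return node

assert resolve("/home/ian/thesis.tex") == "file:thesis"
assert resolve("/home") == "/home"
```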
2.6.1 POSIX Compliance

The Portable Operating System Interface (POSIX) is a standard maintained by the IEEE and The Open Group, which allows for interoperability between different operating systems. This is achieved by requiring compliant operating systems to implement a standard set of system calls and system utilities [24]. Part of the requirement for POSIX compliance is the implementation of standard File and Directory Operations, which are discussed below. The interested reader is referred to the Single UNIX Specification [24] for more information regarding the POSIX interface.

2.6.2 Read and Write Operations

The read and write operations are the two most basic operations that the file system must support. Both the read and write operations handle the translation from the logical addressing used in normal file system operations to the physical addresses on the hard disk drive. Giampaolo [23] points out that all file systems need to implement these low-level operations, and furthermore implement more advanced features that extend the functionality of the file system.

File system operations can be divided into different categories, namely System Operations, File Operations, and Directory Operations. System operations provide the operating system with access to the file system. File and directory operations act on the data within the file system. These different types of operations are discussed in the following sections.

2.6.3 System Operations

The file system must support a number of basic system operations that manage the creation of the file system, called initialisation; the initial access of the file system, called mounting; and the shutdown of the file system, called unmounting. Each of these operations is described below.

Initialisation

The initialisation operation controls the creation of the file system.
This operation is responsible for the creation and set-up of all the file system structures that are going to be used during normal file system operation. The superblock, any storage maps, file control blocks, and all associated file system information are gathered and stored on the storage device where the file system will be located [23]. The root directory also needs to be created during this operation, which will allow data to be created and accessed at a later stage. Once all the file system structures have been created, the location of the structures is recorded in the superblock, and from this point the file system is ready to be used. As Giampaolo [23] points out, the initialisation of modern file systems is done by user programs, and not the file system itself.

Mounting

The mounting operation is performed whenever the file system is initially exposed to the operating system. During this operation, the superblock is read into primary memory, and any required access control mechanisms are created. The operating system will then be able to access the storage maps and the file metadata.

The mount operation will usually attempt to run a consistency check on the structure of the file system. Should a basic consistency check fail, a more intensive check is performed on the file system. A consistency check is usually performed on the file system if it was not unmounted cleanly from the operating system.

Unmounting

The unmounting operation cleanly detaches the file system from the operating system and releases any resources the file system is utilising. The unmount operation flushes any blocks that are waiting in the block cache to the storage media, and updates any of the file system structures that have been changed during the normal operation of the file system.
Once the file system's structures and data have been written to disk, the unmounting operation marks a flag in the superblock to indicate that the file system was unmounted cleanly, and finally the superblock is flushed to disk.

We will now discuss the file operations which are used to support the storage and retrieval of files from the file system implementation.

2.6.4 File Operations

The file system needs to support a variety of operations which are used to perform a number of actions on files. The reading and writing operations extend the file system's read and write operations and operate directly on the block space where the file exists. The create(), delete() and open() file operations operate on the file's metadata. By combining operations, the file system can create complex operations, such as a move or a rename operation. A file can be regarded as an abstract data type, and as such the file system needs to provide generic operations that act on files regardless of their internal structure [50]. These generic file operations are described below.

Creating a file

Silberschatz, Galvin, and Gagne [50] state that there are two steps involved in creating a new file: firstly, allocating space on the hard disk drive for the file, and secondly, modifying the directory entry where this file will exist in order to reflect the new file.

Allocating space for the file involves finding unallocated space in the storage map to house the file, modifying the storage map in order to reflect that the storage space is now allocated, and then allocating or creating a file control block for the new file. Once the file has been allocated, the file's parent directory entry must be modified to reflect the new file. Once all the metadata has been created and written to the disk, data can be written to the underlying storage device.

Deleting a file

Removal of a file from the file system is the reverse of file creation.
There are again two steps involved: unallocating the storage space where the file existed, and removing any metadata related to the file. The storage map needs to be modified, but unlike file creation, the storage space now needs to be marked as unallocated, which allows the file system to reallocate the storage space to any new files created at a later stage. Removal of the metadata belonging to the file involves removing the file reference from the directory entry, and removing, or unallocating, the file control block which references the file.

Most file systems do not remove the actual file data, but simply allow the blocks where the file existed to be overwritten at a later stage. This, however, is a very insecure method of removal, as there is always a chance that deleted information can be recovered at a later stage. It is done to allow for a performance improvement; removing only the file metadata is a much less intensive operation than removing all of the file's data. When this deletion method is in use, it is possible for deleted data to be recovered from a file system [10].

More secure file systems "zeroize"⁴ the blocks where the file existed, making it more difficult for the information to be recovered at a later stage. However, this will greatly decrease the performance of a file system.

Opening a file

Opening a file is usually achieved through a system call that instructs the operating system to access the file system and create a pointer to the file. Any controls on the file, such as whether a file is read-only or not, are managed through the operating system's interface. A file is usually opened using a POSIX-compliant open() command on UNIX systems. The operating system will then request the file system to return the relevant metadata for the file. A file pointer will then be created which allows user processes to interact with the file.
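The POSIX-style open, write, read, and close sequence described in this section can be sketched using Python's os module, whose functions are thin wrappers over the corresponding system calls. The file path is a temporary one created for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# open() with O_CREAT creates the file and returns a file descriptor.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"file systems")   # write raw bytes through the descriptor
os.close(fd)                    # release the descriptor

# Reopen read-only; read() advances the per-file position indicator.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 4096)
os.close(fd)

assert data == b"file systems"
```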
⁴ "Zeroize" refers to the process of writing zero (0x00) to every byte where a file existed.

The operating system maintains a table of all open files and the associated control mechanisms. Should a process request a file pointer that does not point to a valid file, or points to a closed file, the operating system will generate an error, which is passed to the process.

Reading a file

The file read operation is required to read the data from the underlying device and store the result in a buffer that is returned to the invoking process. The read() operation is POSIX-compliant on UNIX systems. The file read operation provided by the operating system is simpler than the file write operation, because none of the on-disk structures are modified [23]. The operating system stores a position indicator for each open file, which is used by the read operation to retrieve data, and also to indicate whether the end of the file has been reached. This allows the operating system to read streams of data.

Writing a file

The file write operation is more complex than the read operation, because it needs to handle many different situations; for instance, appending data to a file may require that the file's metadata be expanded, and as Giampaolo [23] points out, even modifications to the superblock may be required. The write() operation is a POSIX-compliant system operation.

The most basic form of the write operation allows data to be written to an existing file; should a file not exist, it will be created by the operating system. This operation needs to handle many different situations, such as when the file needs to grow beyond the size of the blocks that are currently allocated to it; the file system will then need to allocate more blocks to the file. This process needs to be handled with care, as many of the file system's structures may need to be modified.
Firstly, the file system needs to find space in the storage map and mark the blocks as allocated. The file's control block will then need to be modified, specifically the direct and indirect block addresses, in order to allocate file system blocks to the file. The file's metadata will then need to be updated to reflect the new size of the file.

In the following section we will discuss directory operations. Directories are most easily viewed as special types of files, and thus directory operations are closely related to file operations in design.

2.6.5 Directory Operations

Directory operations are very closely related to file operations, because both sets of operations act on the same generic type of data; however, there are differences in the way in which files and directories need to be handled. Operations such as creating a directory are generally more complicated because of the hierarchical directory structure that needs to be maintained.

Creating a directory

Giampaolo [23] argues that creating a directory is a more complex operation than creating a file. In UNIX systems both files and directories have inodes to store metadata, and different file systems use similar methods to store metadata. As a result, the creation or allocation of an inode is very similar to the creation of a file. However, a directory also needs to be initialised, and the more complex the structure used to manage the directory hierarchy, the more complex this initialisation will be. Together with initialising the directory structure for the newly created directory, the parent directory entry will also need to be modified in order to maintain the hierarchy correctly. This operation needs to be handled with care because it is fundamental to the creation of a hierarchical file system [23].

Deleting a directory

The directory deletion operation is very similar to the operation for deleting a file; however, care needs to be taken to manage the items which the directory contains.
This is an implementation-dependent approach, as every file system will handle this situation in a different way. The most common solution is to only allow a directory to be deleted if it has no dependencies. Another solution is to recursively delete everything that is contained in the directory; this can be a very expensive and time-consuming operation if the directory contains many other files and directories.

Opening a directory

The directory open command is fairly simple: the operating system requests the file system to open the directory using the POSIX opendir() command. Just as the open() command provides access to the contents of a file, the opendir() command must provide access to the contents of the directory [23]. Internally this operation needs to provide a mechanism for the operating system to access the directory entry that refers to a directory. Again, if a simple internal directory structure is used, then this is a fairly trivial operation.

Reading a directory

As Giampaolo [23] discusses, the operation which reads the contents of a directory operates together with the directory open command to provide the directory listing, usually achieved by issuing a POSIX-compliant readdir() command. The main purpose of the directory reading operation is to provide a convenient method of enumerating the directory contents.

Writing a directory

The directory writing operation does not manifest itself as a single operation in most operating systems, and refers to the processes involved in updating a directory entry to reflect a newly created entity, such as a file or a sub-directory. This operation again varies in complexity depending on the underlying structure of the directory entries: the more complex the structure, the more complex the handling of the directory entries.
Such is the case in a system that utilises a complex structure such as a B-Tree: the directory write operation will need to ensure that the tree remains balanced, and as a result may need to perform rotations and rebalancing operations on the directory structure.

Most modern operating systems aim for a level of interoperability between multiple file system implementations. In the following section we will discuss the Virtual File System as a method of providing an abstraction layer that allows multiple file systems to transparently interact within an operating system.

2.7 Virtual File System

The goal of most modern operating systems is to provide the ability for their users to access a large variety of different file systems, thus allowing maximum interoperability between systems. A Virtual File System (VFS) is an abstraction layer that sits between a file system implementation and the operating system kernel. The VFS provides a generic interface that is utilised by the kernel, and then relays the commands to the file system implementation. The operating system does not differentiate between files on different file systems, because all operations are performed indirectly through the VFS. The VFS is sometimes called the Vnode layer.

[Figure 2.7: Virtual file system overview, showing the VFS layer between the operating system kernel and the file system implementations]

The VFS can be easily extended to include new file systems, and this does not require modification to the kernel of the operating system. In order for the VFS to utilise a file system, an interface between the open(), read(), and write() commands of the VFS and the file system's corresponding commands must be provided. This allows an operating system to transparently access many different file systems regardless of their implementation, as outlined in figure 2.7.
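The VFS dispatch idea described above can be sketched as follows: callers use one generic interface, and each file system implementation plugs in its own operations behind a mount point. The classes and mount points here are purely illustrative, not a real kernel API.

```python
class FileSystem:
    """Generic interface every implementation must provide."""
    def read(self, path):
        raise NotImplementedError

class Ext2Like(FileSystem):
    def read(self, path):
        return f"ext2 data for {path}"

class NtfsLike(FileSystem):
    def read(self, path):
        return f"ntfs data for {path}"

class VFS:
    def __init__(self):
        self.mounts = {}  # mount point -> file system implementation

    def mount(self, point, fs):
        self.mounts[point] = fs

    def read(self, path):
        # Dispatch to the implementation owning the longest matching
        # mount point; the caller never sees which one handled it.
        point = max((m for m in self.mounts if path.startswith(m)), key=len)
        return self.mounts[point].read(path)

vfs = VFS()
vfs.mount("/", Ext2Like())
vfs.mount("/mnt/win", NtfsLike())
assert vfs.read("/home/x").startswith("ext2")
assert vfs.read("/mnt/win/y").startswith("ntfs")
```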
An extension to the VFS layer is the concept of Vnode stacking [26, 61], which allows modules to be inserted into the VFS interface to transparently extend the abilities of a file system. Function calls that are passed through the VFS layer are then passed through any number of stackable layers until the actual file system implementation interacts with the disk. An example of such a stackable file system is WrapFS [61], which wraps onto a directory on an existing file system, and can be used to provide additional features such as transparent encryption or compression.

In the following section we will discuss the Filesystem in Userspace.

2.8 Filesystem in Userspace (FUSE)

Filesystem in Userspace (FUSE) [53] is an extension of the operating system kernel which allows a file system to be implemented in userspace. This allows file systems to be easily developed without having a direct interface with the operating system kernel. FUSE is currently part of the Linux kernel, and is available on a number of platforms, such as FreeBSD and Mac OS X. File systems implemented using FUSE are highly portable and can be used on any operating system which includes the FUSE kernel extension.

FUSE file system implementations communicate with the operating system kernel through the FUSE libraries, which in turn communicate with the kernel's VFS layer. FUSE file systems allow the easy expansion of the overall functionality of the kernel. FUSE can be used to create a powerful file system implementation, such as the NTFS-3G driver. This driver offers a fully stable NTFS file system implementation which operates completely in userspace.

The interaction between kernelspace and userspace does introduce a performance impact; as such, FUSE file systems are not considered to be as efficient as native implementations. However, steps can be taken to maximise performance and produce a high-performance file system.
In the following section we will present a summary of this chapter.

2.9 Summary

In this chapter we covered the following topics:

The Disk - in which we described the layout and organisation of the physical device on which a file system structure is created.

File System Layers - where we discussed a number of conceptual layers which can be used to describe the component parts of a file system implementation.

File System Abstractions - where we discussed the basic abstract storage containers for a file system, namely:
- Files - as a container for storing raw streams of bytes.
- Directories - as a container for creating a hierarchical organisational structure containing files and other directories.

File System Structures - in which we discussed a number of file system control structures which are used to house the file system metadata, namely:
- File System Descriptor - which stores file system metadata, such as the file system size.
- Storage Management - which is used to mark allocated and unallocated file system blocks.
- File Control Blocks - which store metadata concerning the files.
- Directory Entries - which store metadata concerning the directories.

File System Operations - which describe the operations performed by the file system in order to manipulate metadata, files, and directories, namely:
- POSIX Compliance - a set of standard operations which allow the file system implementation to interface with different operating systems.
- Read and Write Operations - the operations provided by the operating system which allow the file system implementation to interact with the physical device.
- System Operations - operations which interact with the file system metadata.
- File Operations - operations which interact with files.
- Directory Operations - operations which interact with directories.

Virtual File System - an abstraction layer provided by the operating system which allows the interaction of multiple file system implementations.
Filesystem in Userspace - an operating system kernel extension which allows file systems to be implemented in userspace.

2.10 Conclusion

In this chapter we discussed the basic concept of a file system. We discussed the structure of the low-level storage media. We then went on to introduce how a file system can be conceptualised as a number of interacting layers, which can be used to control the flow of data through the file system, eventually resulting in the permanent storage of the data on the storage media. The discussion continued with the introduction of the file system structures that are used to control the storage and retrieval of the data on the disk. We then commented on some of the operations that are found in the file system in order to act on the stored data. Lastly, we discussed the Virtual File System, and the Filesystem in Userspace, which allows a file system to be implemented in userspace. This chapter, along with chapter 3 and chapter 4, forms the foundation which is used in later chapters. Concepts introduced in this chapter are used extensively throughout chapters 6 and 7 in order to describe the component parts of the steganographic file system. Many different types of information systems rely on cryptography to provide a level of information security. In the following chapter we will cover many different aspects relating to cryptography which will be referred to throughout the remaining chapters.

Chapter 3

Cryptography

3.1 Introduction

Cryptography plays a very important role in modern society. With an ever increasing amount of personal information being stored on computer systems, and transmitted over the Internet, mechanisms need to be in place to ensure that this data remains secure.
This need to secure information has always played an important role in human history; from the simple substitution ciphers made famous by the Romans, to a new age of quantum cryptography, there has always been a need to keep information secure. In this chapter we discuss some basic cryptographic principles, and then go on to discuss some specific cryptographic techniques and algorithms. Firstly, in section 3.2 we discuss some basic cryptographic terms that form a basis for the cryptographic techniques discussed in the later sections. We go on in section 3.3 to discuss symmetric encryption techniques by introducing some basic theory, and then, in sections 3.3.2 and 3.3.3, discuss the DES and Serpent algorithms respectively. DES is a good example of a cryptosystem, because it has been in use for a number of years and its properties are fully understood. Serpent is a modern cryptosystem which was created using elements of the DES algorithm. Although there are many different encryption algorithms, DES and Serpent provide a complete overview of the design elements of symmetric cryptosystems. We then go on to discuss two different block cipher modes in section 3.4, namely electronic codebook mode in section 3.4.1 and cipher block chaining mode in section 3.4.2. This is then followed by a comparison of these two techniques in section 3.4.3. These two techniques provide a good understanding of block cipher modes; all other block cipher modes extend the basic principles that will be discussed. A discussion of asymmetric encryption is then presented in section 3.5, again presenting some theory, followed by a brief discussion of RSA encryption in section 3.5.1. Finally, in section 3.6 we introduce cryptographic hash functions with a discussion of message integrity codes in section 3.6.1, and message authentication codes in section 3.6.2.
We continue this discussion with the Birthday Attack in section 3.6.3, and conclude this chapter with a discussion of the SHA-1 algorithm in section 3.6.4. Although there are many different cryptographic systems in use today, the ones discussed in the following sections provide a good understanding of the principles employed in many different cryptosystems.

3.2 Basic Concepts

Throughout this chapter certain terms will be used in order to describe the discussed cryptographic systems. These terms are in line with those outlined by Schneier [46], and are described in table 3.1.

Plaintext - The unencrypted message or data
Ciphertext - The encrypted message or data
Cryptosystem - The cryptographic system used for encryption
Key - The key that is used to facilitate encryption

Table 3.1: Basic cryptographic terms

A cryptosystem is simply a collection of mathematical functions that allow plaintext to be obfuscated into ciphertext; this process is called encryption. A reverse function is usually defined that can return the ciphertext to the original plaintext; this is called decryption. In order to control the process and to provide security, a key is provided. To ensure that the message can only be decrypted by an authorised person, the key or part of the key needs to remain secret. Along with the basic terms used to describe a cryptosystem, a set of basic functions is also defined in order to mathematically describe a cryptosystem. These basic functions are described in table 3.2.

P - Plaintext
C - Ciphertext
E() - Encryption function
D() - Decryption function
Ek() - Encryption function using key k
Dk() - Decryption function using key k

Table 3.2: Basic cryptographic functions

In the following sections we will discuss Symmetric Encryption and Asymmetric Encryption. These two encryption schemes provide the basis for the standard encryption algorithms that are in use today.
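The relationship between the functions in table 3.2 can be demonstrated with a deliberately weak toy cipher. Repeating-key XOR is used here purely for illustration; it is not a secure cryptosystem.

```python
# A toy cryptosystem illustrating the E_k / D_k notation of table 3.2.
# Repeating-key XOR is NOT secure; it is used only to show that
# D_k(E_k(P)) = P when the same key k controls both operations.

def E_k(plaintext: bytes, k: bytes) -> bytes:
    """Encryption function using key k."""
    return bytes(p ^ k[i % len(k)] for i, p in enumerate(plaintext))

def D_k(ciphertext: bytes, k: bytes) -> bytes:
    """Decryption function using key k (XOR is its own inverse)."""
    return E_k(ciphertext, k)

P = b"attack at dawn"
k = b"secret"
C = E_k(P, k)
assert D_k(C, k) == P  # the round trip recovers the plaintext
```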
We will then go on to discuss different algorithms for each of the two schemes.

3.3 Symmetric Encryption

Symmetric encryption refers to the family of cryptosystems that utilise a "shared secret" approach to data encryption. The shared secret usually takes the form of an encryption key, or a passphrase that is used to control the encryption process. Symmetric cryptosystems have two different forms, namely stream ciphers and block ciphers. Stream ciphers encrypt a single character at a time, as opposed to block ciphers, which encrypt a block (a number of characters) at a time. We will only be concerned with block ciphers because of the prominent role they play in cryptographic and steganographic file systems, which will be discussed in later sections. A block cipher that encrypts 128 bits at a time is said to have a 16-byte block size. Symmetric cryptosystems have the form shown in equation 3.1, and illustrated in figure 3.1.

C = Ek(P)
P = Dk(C)    (3.1)

Figure 3.1: Symmetric encryption

Symmetric cryptosystems have an inherent weakness: the key has to remain secret for the cryptosystem to be effective. If the key is compromised in any way, then the validity of the encrypted data can no longer be assured. Therefore all parties involved in the encryption process need to take adequate steps to ensure that the encryption key remains secure. Schneier [46] outlines how a symmetric cryptosystem would be used to encrypt and decrypt data:

1. Alice and Bob agree on a cryptosystem that will be used to encrypt the data.
2. Alice and Bob agree on a key k.
3. Alice encrypts the data with the selected cryptosystem and the key k.
4. Alice sends the encrypted data to Bob.
5. Bob decrypts the data using the selected cryptosystem and the key k.

Now that the principles of symmetric encryption have been discussed, we will discuss substitution boxes.
Substitution boxes are used in many different symmetric cryptosystems as a method of securely encrypting data, as with the DES algorithm discussed later.

3.3.1 Substitution Boxes

Substitution boxes, or S-Boxes, are used in block ciphers to perform a substitution of bits; it is argued that S-Boxes are what give block ciphers their security, because the substitution is a non-linear step in the encryption process [46]. S-Boxes are discussed because of the important role they play in symmetric cryptosystems, specifically in the DES and Serpent algorithms that will be discussed below.

Figure 3.2: DES S1 substitution box [18]

An S-Box can be represented as a function resembling equation 3.2, where S1 is a substitution box, bi are the inner bits, bo are the outer bits, and bsub is the result of the substitution.

bsub = S1(bo, bi)    (3.2)

In figure 3.2, S1 represents the first S-Box from the DES block cipher. This S-Box is implemented as a 4 x 16 matrix; each of the rows represents a number from 0 to 3, and each column represents a number from 0 to 15. The original bit block is a binary number consisting of 6 bits, where b0, b1, ..., b5 represent each individual bit, and b0 is the least significant bit. The complete original bit block in figure 3.2 is 001101b. The outer bits are obtained by selecting b5 and b0, the most significant bit and the least significant bit; in this case the outer bits form the number 01b. The inner bits are obtained by selecting b4, b3, b2, and b1, the bits between the most significant and least significant bits; the inner bits in this case form the number 0110b. The substitution bits would be the result of the function S1(01b, 0110b), which would form 1101b.
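The worked example above can be checked in code using the actual S1 table from the DES specification (FIPS PUB 46-3 [18]); the bit-numbering convention follows the text, with b0 as the least significant bit.

```python
# The S-Box worked example above, using the real first S-Box (S1) from
# the DES specification (FIPS PUB 46-3). Bit numbering follows the
# text: b0 is the least significant bit of the 6-bit input block.

S1 = [
    [14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7],
    [ 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8],
    [ 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0],
    [15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13],
]

def sbox_substitute(block6: int) -> int:
    """Substitute a 6-bit block: the outer bits (b5, b0) select the
    row, the inner bits (b4..b1) select the column."""
    outer = (((block6 >> 5) & 1) << 1) | (block6 & 1)   # row, 0..3
    inner = (block6 >> 1) & 0b1111                      # column, 0..15
    return S1[outer][inner]

# 001101b: outer bits = 01b, inner bits = 0110b, S1(01b, 0110b) = 1101b
assert sbox_substitute(0b001101) == 0b1101
```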
The result of this substitution would be combined with other substitution operations to form a step in the encryption process. Substitution using S-Boxes is an integral step in the encryption process of both the DES and Serpent algorithms discussed below. The DES algorithm was one of the first to be widely adopted by institutions to secure data. Although other encryption algorithms existed, there was no standardisation, which limited the commercial use of cryptography. The acceptance of DES as a standard allowed for more commercial applications of cryptography. The Serpent algorithm, also discussed below, uses principles of the DES algorithm to create a cryptosystem that meets the needs of modern data security. Serpent was an AES finalist, and as such was designed to provide a fast and secure data encryption algorithm. Both DES and Serpent are good examples of cryptosystems that are widely in use today.

3.3.2 Data Encryption Standard (DES)

History of DES

Schneier [46] explains that in the early 1970s the use of non-military cryptography was not standardised; although a number of cryptographic algorithms existed, they were all different and could not be used to interchange encrypted data, which limited their commercial use. The Data Encryption Standard (DES) became a United States federal standard in 1976, and was used to encrypt "non-classified" government data. The American National Standards Institute (ANSI) then adopted DES for commercial use, and eventually many different industries started to utilise DES as the preferred method for securing data. The DES algorithm itself was derived from the IBM Lucifer algorithm that was developed in the early 1970s [52].

DES Encryption Algorithm

The full DES algorithm is outlined in the Federal Information Processing Standards Publication 46-3 [18], and operates on 64-bit blocks using a 64-bit key (of which 8 bits are parity bits, leaving 56 effective key bits).
In order to encrypt a block of plaintext, a number of operations are applied to the block in order to produce the ciphertext block. Firstly, a block of plaintext is permuted using what is known as the Initial Permutation (IP), which simply rearranges the bits of the plaintext block. This permuted block then goes through 16 iterations of a key-dependent calculation in order to obtain the preoutput, which is finally permuted using the Inverse Initial Permutation. This can be seen graphically in figure 3.3.

Figure 3.3: DES encryption algorithm flow

The basic encryption algorithm for a single block of plaintext is shown in equation 3.3. The 64-bit plaintext block is broken up into two 32-bit blocks, called L and R, representing the left and right 32 bits of the plaintext. For each iteration of the encryption function, R and L are combined with a unique key (Kn) to generate a new R and L, which are used in subsequent iterations to finally produce the preoutput for the current block. For a description of the notation used below, please refer to "Notation" on page xiii.

Kn = KS(n, KEY)
Ln = Rn-1
Rn = Ln-1 ⊕ f(Rn-1, Kn)    where n = 1, 2, ..., 16    (3.3)

In equation 3.3, KS is the "key schedule" function, which is used to produce the unique key Kn for the current iteration (n). The "cipher function" (f) is used to encrypt a 32-bit block using the unique key. Ln and Rn are generated for the current iteration. The complete DES encryption algorithm can be seen in algorithm 1.

Input: inputBlock - a block of plaintext.
Input: key - a secret key.
Output: outputBlock - a block of ciphertext.

LR ← InitialPermutation(inputBlock);
L0 ← left 32 bits of LR (bits 63-32);
R0 ← right 32 bits of LR (bits 31-0);
for n ← 1 to 16 do
    Kn ← KeySchedule(n, key);
    Ln ← Rn-1;
    Rn ← Ln-1 ⊕ CipherFunction(Rn-1, Kn);
end
outputBlock ← InverseInitialPermutation(L16, R16);

Algorithm 1: DES Encryption Algorithm [18]

The cipher function (f) and the key schedule function (KS) will now be discussed below. These two functions are discussed because of the important role that they play in the functioning of the DES algorithm. The interested reader is referred to FIPS PUB 46-3 [18] for more detailed information regarding these and other elements of the DES algorithm.

The Cipher Function (f)

The cipher function is used during each of the 16 iterations of the DES encryption algorithm. The cipher function accepts two input blocks, a 32-bit block and a 48-bit key, and produces a 32-bit ciphered block. Firstly, the cipher function creates a 48-bit block from the 32-bit input block through the use of what is referred to as the E function; this is simply a number of bit selections on the original block to produce a new permuted block. The new 48-bit block is then added to the 48-bit key using bitwise addition modulo 2 (XOR), producing a single 48-bit block. This 48-bit block is then broken up into eight 6-bit blocks, each of which is passed through one of eight substitution boxes to produce eight 4-bit blocks. These eight 4-bit blocks are then combined to form a single block, which is then passed through the permutation function (simply a permutation of the bits of the block) to produce the final 32-bit output block for the current iteration.

The Key Schedule Function (KS)

The purpose of the key schedule function is to generate a key for each of the 16 iterations of the DES encryption algorithm. The key schedule firstly creates two 28-bit blocks, called C and D, using the Permuted Choice 1 function. Depending on the current iteration of the cipher function, C and D are left-shifted either one or two places. Again depending on the current iteration, C and D will either go on to another round of shifting, or be passed to the Permuted Choice 2 function, which forms the completed 48-bit key to be used within the cipher function. For a more comprehensive description of the functions used within the DES encryption algorithm, the interested reader is referred to FIPS PUB 46-3 [18]. The DES algorithm is a relatively slow method of encrypting data.
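The L/R recurrence of equation 3.3 is a Feistel network, and the sketch below shows why that structure is invertible regardless of the round function. This is a toy illustration only, not real DES: the round function f and the key schedule here are invented placeholders.

```python
# Toy Feistel network mirroring the L/R recurrence of equation 3.3.
# The round function f and key schedule are placeholders, NOT the real
# DES functions; the point is that the structure inverts cleanly.

def f(r: int, k: int) -> int:
    """Placeholder round function on 32-bit halves (not DES's f)."""
    return ((r * 2654435761) ^ k) & 0xFFFFFFFF

def round_keys(key: int):
    """Placeholder key schedule: 16 round keys from one key."""
    return [(key ^ (n * 0x9E3779B9)) & 0xFFFFFFFF for n in range(1, 17)]

def encrypt(block64: int, key: int) -> int:
    l, r = block64 >> 32, block64 & 0xFFFFFFFF
    for k in round_keys(key):
        l, r = r, l ^ f(r, k)    # Ln = Rn-1, Rn = Ln-1 xor f(Rn-1, Kn)
    return (l << 32) | r

def decrypt(block64: int, key: int) -> int:
    l, r = block64 >> 32, block64 & 0xFFFFFFFF
    for k in reversed(round_keys(key)):
        l, r = r ^ f(l, k), l    # run the same rounds backwards
    return (l << 32) | r

c = encrypt(0x0123456789ABCDEF, key=0xDEADBEEF)
assert decrypt(c, key=0xDEADBEEF) == 0x0123456789ABCDEF
```

Because each round only XORs f's output into one half, decryption never needs to invert f itself; it simply recomputes f with the same inputs and XORs again, which is why DES can use a non-invertible cipher function.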
In order to meet the demands of modern computing, a new encryption algorithm was needed which would be just as secure, and would encrypt data in a much more efficient way. One such algorithm is the Serpent algorithm, which will be discussed below.

3.3.3 Serpent

History of Serpent

The Serpent algorithm was introduced in 1998 as a candidate for the Advanced Encryption Standard (AES), a competition organised by the US National Institute of Standards and Technology (NIST) to find a successor to the DES algorithm. The design requirements for AES stated that the new algorithm should be faster and more secure than Triple DES [2]. The Serpent algorithm was initially designed to build upon elements of DES, because of the well-understood nature of the DES algorithm; specifically, the original Serpent algorithm used the S-Boxes from DES.

Serpent Encryption Algorithm

The Serpent algorithm operates on a 128-bit block of plaintext with a 256-bit key, although the key size can be any length between 64 bits and 256 bits, as a shorter key is padded so that the key used in the encryption is always 256 bits [2]. The Serpent algorithm encrypts a 128-bit block of plaintext to a 128-bit block of ciphertext, using 32 iterations of the round function (discussed later), with a different 128-bit key in each of the iterations. As Anderson, Biham, and Knudsen [2] explain, the Serpent algorithm operates on a number of input blocks of plaintext. Firstly, the initial permutation (IP) is applied to a block of plaintext (P), producing a 128-bit block B0 which is used in the first of the 32 iterations of the round function (R). Each iteration of the round function produces a block (Bi+1) that is used in the following iteration, where i is the current iteration. Finally, a final permutation (FP) is applied to the last block produced by the round function, producing the 128-bit block of ciphertext.
The algorithm can be described as seen in equation 3.4.

B0 = IP(P)
Bi+1 = Ri(Bi)    where i = 0, 1, ..., 31
C = FP(B32)    (3.4)

The Round Function (R)

Each of the 32 iterations of the round function produces a round output by applying a single S-Box per iteration in parallel. As Anderson et al. [2] explain, R0 would use S0, where S0 is the first S-Box and R0 is the first iteration; it follows that R1 would use S1 during the next iteration, and so on. The S-Boxes produce a 4-bit output from a 4-bit input and, as stated above, are applied in parallel: during iteration i, Si operates on bits 0-3 of the input block, concurrently operates on bits 4-7 of the same input, and so on. The results of these independent operations are combined to produce the final output that will be used in the next round. As in the implementation provided for AES, Serpent utilises a set of eight S-Boxes that were generated from the standard eight DES S-Boxes. As a result, for an iteration i, the S-Box that will be applied is S(i mod 8) [3]. The round function is described in equation 3.5, where L is a linear transformation, Si is the S-Box for the current iteration, and Ki is the key that is used in the current iteration.

Ri(X) = L(Si(X ⊕ Ki))    where i = 0, 1, ..., 30
R31(X) = S31(X ⊕ K31) ⊕ K32    (3.5)

Decryption is achieved by applying the inverses of the S-Boxes in the reverse order, with the inverse of the linear transformation and the reverse order of the keys used in the round function. The linear transformation is simply a permutation of the bits of an input block to produce a permuted output block; it is implemented using a number of bitwise shifts and XOR operations. For a detailed description of this linear transformation and a detailed discussion of the Serpent algorithm, the interested reader is referred to Anderson et al. [2].
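The parallel application of a 4-bit S-Box across a 128-bit block, together with the S(i mod 8) selection rule, can be sketched as follows. The S-Box contents here are invented permutations for illustration; they are not Serpent's real S-Boxes.

```python
# Sketch of Serpent-style parallel S-Box application. The eight 4-bit
# S-Boxes below are invented permutations for illustration only; they
# are NOT the real Serpent S-Boxes.

SBOXES = [
    [(i + j) % 16 for i in range(16)]   # eight distinct toy permutations
    for j in range(8)
]

def apply_sbox_parallel(block: int, round_index: int, nibbles: int = 32) -> int:
    """Apply S-Box (round_index mod 8) to every 4-bit slice of a
    128-bit block (32 nibbles), as the round function does."""
    sbox = SBOXES[round_index % 8]
    out = 0
    for n in range(nibbles):
        nibble = (block >> (4 * n)) & 0xF
        out |= sbox[nibble] << (4 * n)
    return out

# Rounds 0 and 8 select the same S-Box, since 0 mod 8 == 8 mod 8
x = 0x0123456789ABCDEF0123456789ABCDEF
assert apply_sbox_parallel(x, 0) == apply_sbox_parallel(x, 8)
```

Real implementations typically "bitslice" this step for speed, computing all 32 parallel S-Box applications with a handful of word-wide boolean operations, but the nibble-at-a-time version above matches the description in the text most directly.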
In order for the block ciphers discussed above to operate efficiently, there are a number of different methods that can be used to encrypt sequential blocks. Discussed below are two of the most common block cipher modes, namely Electronic Codebook Mode (ECB) and Cipher Block Chaining Mode (CBC). A comparison between ECB and CBC is then presented in order to graphically demonstrate the differences between the two modes.

3.4 Block Cipher Modes

Symmetric block ciphers use different modes of operation to encrypt data. As Schneier [46] states, the chosen mode depends on the application. All block ciphers operate on "blocks" of data; the plaintext P is broken up into equal-size blocks depending on the block size of the cipher being used. For the purposes of the following discussion, P is considered to be the complete plaintext, n the number of plaintext blocks, and P1, P2, ..., Pn the individual blocks of the plaintext; the size of Pn may be smaller than the cipher's block size. The same is assumed for the ciphertext C. Two common block cipher modes are discussed in the following sections, namely Electronic Codebook Mode (ECB) and Cipher Block Chaining Mode (CBC). There are other block cipher modes; however, they all follow similar principles in their approach to encrypting blocks. The interested reader is referred to Schneier [46] for more information regarding these and other block cipher modes.

3.4.1 Electronic Codebook Mode

Electronic Codebook Mode (ECB) is the simplest way to utilise a symmetric block cipher. Every block of the plaintext is encrypted with the key to produce the output. ECB mode follows directly from the traditional definition of a symmetric cipher, and as such can be defined as seen in equation 3.6.

Ci = Ek(Pi)    where i ∈ {1 ... n}
Pi = Dk(Ci)    where i ∈ {1 ... n}    (3.6)

ECB can be implemented in a very efficient manner because it can be calculated in parallel, as can be seen from figure 3.4 [15]; Ci can be calculated directly from Pi. This does expose an undesirable feature of ECB mode: Schneier [46] points out that if enough of the original plaintext blocks and their corresponding ciphertext blocks are known, then parts of the messages can be decrypted, even if the key is not known. This is particularly true for messages that have a regular structure, such as the headers of an email message. For example, if it is known that the plaintext "foobar" in encrypted form is "0x4f653a018fcd", then every occurrence of that block of ciphertext can be replaced with the corresponding block of plaintext.

Figure 3.4: Electronic codebook mode

In the following section we will introduce Cipher Block Chaining mode as a method for improving on ECB mode.

3.4.2 Cipher Block Chaining Mode

Cipher Block Chaining mode implements a feedback mechanism where the first block of the plaintext is XOR'ed with an Initialisation Vector and then encrypted; the resulting block of ciphertext is XOR'ed with the next block of plaintext, and so on, until all the plaintext blocks have been encrypted [46]. CBC mode is defined as shown in equation 3.7.

C1 = Ek(P1 ⊕ IV)
Ci = Ek(Pi ⊕ Ci-1)    where i ∈ {2 ... n}
P1 = Dk(C1) ⊕ IV
Pi = Dk(Ci) ⊕ Ci-1    where i ∈ {2 ... n}    (3.7)

CBC mode introduces IV, an Initialisation Vector that is XOR'ed with the first block of the plaintext to start the chaining. CBC mode is explained graphically in figure 3.5.
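Equation 3.7 can be exercised directly with a toy block cipher. Ek here is a simple XOR with the key, standing in for a real block cipher such as DES or Serpent; the chaining structure is the point of the example.

```python
# CBC chaining exactly as in equation 3.7, built on a toy block cipher.
# E_k is plain XOR with the key (NOT secure), standing in for a real
# block cipher; the chaining logic is what the example demonstrates.

BLOCK = 8  # bytes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E_k(block: bytes, k: bytes) -> bytes:
    return xor(block, k)          # toy cipher

def D_k(block: bytes, k: bytes) -> bytes:
    return xor(block, k)          # XOR is its own inverse

def cbc_encrypt(blocks, k, iv):
    out, prev = [], iv
    for p in blocks:
        c = E_k(xor(p, prev), k)  # C_i = E_k(P_i xor C_{i-1}), with C_0 = IV
        out.append(c)
        prev = c
    return out

def cbc_decrypt(blocks, k, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(D_k(c, k), prev))  # P_i = D_k(C_i) xor C_{i-1}
        prev = c
    return out

k  = b"8bytekey"
iv = b"\x01" * BLOCK
pt = [b"ABCDEFGH", b"ABCDEFGH"]      # two identical plaintext blocks
ct = cbc_encrypt(pt, k, iv)
assert cbc_decrypt(ct, k, iv) == pt
assert ct[0] != ct[1]                # chaining hides the repetition
```

Note the final assertion: under ECB the two identical plaintext blocks would encrypt to identical ciphertext blocks, which is exactly the weakness the chaining feedback removes.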
Dworkin [15] states that the IV does not need to remain secret, but needs to be randomly generated; this ensures that the chaining is not predictable in any way.

Figure 3.5: Cipher block chaining mode

Schneier [46] discusses two problems that can occur with CBC mode: padding and error propagation. Padding refers to the fact that most plaintext data does not divide cleanly into the block size of the cipher, and the last block will need to be padded in order to allow the ciphertext for the block to be produced. Error propagation refers to the fact that if the first ciphertext block becomes corrupt, then the entire resulting ciphertext will be corrupt.

3.4.3 ECB versus CBC

The effect of encrypting data in ECB mode can be clearly seen in figure 3.6. The image that is produced when the original image is encrypted in ECB mode is still discernible; this is a result of the same key being used on each of the plaintext blocks. This clearly demonstrates the weakness of encrypting data in ECB mode when the image is compared to an image encrypted using CBC mode, which results in only noise (random pixels of colour) being produced. We produced figure 3.6 by taking the original vectorised image of the Darwin OS mascot and converting it to a Portable Pixmap image (ppm), which is a very simple bitmap image format. The ppm version of the image was encrypted using the mcrypt [32] utility, firstly in ECB mode, and then again in CBC mode. Both images were encrypted using the DES algorithm, and in both cases the key used was "hexley".
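The padding problem mentioned above has several standard solutions; one common scheme, shown here as an illustration (PKCS#7-style padding, which is not specifically named by the sources cited above), appends N bytes each of value N.

```python
# One common answer to the CBC padding problem: PKCS#7-style padding.
# N bytes of value N are appended so the total length is a multiple of
# the block size; the last byte then says how much padding to strip.

def pad(data: bytes, block_size: int) -> bytes:
    n = block_size - (len(data) % block_size)  # always pads, 1..block_size
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    n = data[-1]
    return data[:-n]

msg = b"14 bytes long!"            # not a multiple of the 8-byte block size
padded = pad(msg, 8)
assert len(padded) % 8 == 0
assert unpad(padded) == msg
```

Because a full final block still receives an entire extra block of padding, the receiver can always strip padding unambiguously.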
Figure 3.6: Comparison between ECB and CBC modes (panels: original image [28], ECB encryption, CBC encryption)

The shared secret approach to data encryption used in symmetric encryption does present a problem regarding key management: if the key is compromised, then the entire encryption scheme is no longer valid. Asymmetric encryption, discussed in the next section, provides a scheme that allows data to be encrypted using a "composite key". This scheme is used for digital certificates and digital signatures, where authenticity can be guaranteed.

3.5 Asymmetric Encryption

Asymmetric encryption refers to the family of cryptosystems that use one key for encryption and a different key for decryption, also called public-key cryptography. This type of cryptosystem was first described by Whitfield Diffie and Martin Hellman in 1976 [12, 46]. In this section we will explain asymmetric encryption, and then discuss RSA encryption as an example of an asymmetric cryptosystem. Asymmetric encryption differs from symmetric encryption in that each party involved in the transmission of encrypted data has two different keys: a public-key and a private-key, collectively called a key-pair. The public-key is distributed freely, while the private-key remains secret. Asymmetric cryptosystems allow data that is encrypted with the public-key to be decrypted only with the corresponding private-key, as seen in figure 3.7. These cryptosystems have the form shown in equation 3.8, where kprivate and kpublic are elements of the same key-pair.

C = Ekpublic(P)
P = Dkprivate(C)    (3.8)
It is considered nearly impossible for a standalone modern desktop computer to factorise a significantly large number into its two prime components. However, through the use of large distributed computing projects, sometimes involving thousands of computers, the time taken to find the component prime numbers can be reduced. Schneier [46] again outlines how an asymmetric cryptosystem would be used to encrypt and decrypt data:

1. Alice and Bob both generate a key-pair, consisting of a public-key and a private-key.
2. Alice and Bob now agree on an asymmetric cryptosystem that will be used to encrypt the data.
3. Bob sends Alice his public-key.
4. Alice encrypts the data using the selected cryptosystem with Bob's public-key.
5. Alice sends the encrypted data to Bob.
6. Bob decrypts the data using his private-key.

In order to further discuss asymmetric encryption, RSA encryption will be discussed below. RSA is a well-understood and widely used asymmetric cryptosystem.

3.5.1 RSA Encryption

History of RSA

The RSA encryption algorithm was first introduced in 1978, and is named after its inventors Rivest, Shamir, and Adleman [44]. As Schneier [46] points out, it is a very popular public-key algorithm because it is very easy to understand and implement.

Figure 3.7: Asymmetric encryption

RSA Encryption Algorithm

The RSA algorithm [44] makes use of trap-door functions in order to provide strength to the cryptosystem; specifically, RSA makes use of the inability of computers to quickly factorise large numbers into prime numbers. RSA makes use of two pairs of numbers: the public-key pair (e, n) and the private-key pair (d, n), where d, e, and n are three positive integers.
To encrypt a message M, which is represented by a sequence of integers, M is raised to the power of e, and the ciphertext C is then the remainder of M^e when divided by n. To decrypt the ciphertext C, C is raised to the power of d, and the plaintext results from the remainder of C^d when divided by n. These functions are formally defined in equation 3.9.

C = Ekpublic(M) = M^e mod n
M = Dkprivate(C) = C^d mod n    (3.9)

The key-pairs are chosen in such a way that they are related to two very large prime numbers. Firstly, n is defined as the product of two large random primes p and q. The integer d is chosen to be a large number that is relatively prime [1] to (p - 1) * (q - 1). Finally, e is chosen to be the multiplicative inverse [2] of d modulo (p - 1) * (q - 1). The definitions for n, d, and e can be seen in equation 3.10.

n = p * q
gcd(d, (p - 1) * (q - 1)) = 1
e * d ≡ 1 mod (p - 1) * (q - 1)    (3.10)

[1] Relatively Prime - When gcd(a, b) = 1, then a and b are relatively prime [22].
[2] Multiplicative Inverse - When a * b = 1, then a is the multiplicative inverse of b.

There are often cases where the consistency or authenticity of data must be determined. The use of cryptographic hash functions, discussed in the following section, allows data to be represented by a unique hash, which allows the consistency or authenticity of the data to be checked.
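Equations 3.9 and 3.10 can be checked with deliberately tiny primes. The values below are a common textbook example, far too small to be secure and not taken from the sources above; note that e is fixed first and d computed as its inverse, which satisfies the same relation e * d ≡ 1 mod (p - 1)(q - 1).

```python
# Toy RSA key generation and round trip following equations 3.9/3.10.
# p and q are absurdly small textbook values, chosen only so the
# arithmetic is easy to follow; real keys use primes of 1024+ bits.
from math import gcd

p, q = 61, 53
n = p * q                      # n = 3233
phi = (p - 1) * (q - 1)        # (p-1)*(q-1) = 3120
e = 17
assert gcd(e, phi) == 1        # e is relatively prime to phi
d = pow(e, -1, phi)            # multiplicative inverse of e mod phi
assert (e * d) % phi == 1      # the relation from equation 3.10

M = 65                         # the message, as an integer < n
C = pow(M, e, n)               # C = M^e mod n  (equation 3.9)
assert pow(C, d, n) == M       # M = C^d mod n recovers the message
```

The security argument in the text maps directly onto this sketch: an attacker who sees (e, n) must factorise n into p and q to compute phi and hence d, which is easy for n = 3233 but infeasible for thousand-bit moduli.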
Given a cryptographic hash function h with a domain D and a range R, the mapping between the domain and the range is shown in equation 3.11.

h : D → R, where |D| > |R|    (3.11)

The size of the domain (|D|) is always greater than the size of the mapped range (|R|), and because the hash-function h can accept any arbitrary sized input, this implies that the function is many-to-one [35]. Menezes et al. [35] point out that this implies that a hash-function will contain collisions, which are identical outputs for distinct inputs. One of the design aims of a cryptographic hash function is to minimise the probability that a collision will occur in real world applications. The fact that a cryptographic hash function will produce collisions can be used as the basis for an attack to compromise the integrity of the hashed message; this attack is discussed in section 3.6.3.

A cryptographic hash function will always produce a standard length output, which is known as the bitlength of the hash-function. If a hash-function produces an output that consists of m bits then the hash-function has a bitlength of m [35].

There are two main groups of cryptographic hash functions, namely keyed and unkeyed. Keyed hash functions generate a hash-value using a secret key and an input message. This type of hash-function is used to generate what is known as a Message Authentication Code (MAC), which can be used to verify the source and integrity of a message. An unkeyed hash-function generates a hash-value based solely upon the input message, without the use of a secret key; this type of hash-function is used to generate what is known as a Message Integrity Code (MIC) [35]. This type of hash-value is used to verify the integrity of a message. These two types of hash-functions are discussed briefly in section 3.6.1 and section 3.6.2.

MICs and MACs are not used to secure messages, only to provide mechanisms to verify the integrity of a message.
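The many-to-one property of equation 3.11 can be made concrete with a toy hash whose range is deliberately tiny. The sketch below truncates SHA-1 to a single byte, so |R| = 256 and a collision is guaranteed once more than 256 distinct messages are hashed (the pigeonhole principle); the function name is an illustrative invention, not a standard API.

```python
# Demonstrating the many-to-one mapping h : D -> R with |D| > |R|.
import hashlib

def tiny_hash(msg: bytes) -> int:
    # Toy 8-bit hash: the last byte of the SHA-1 digest (|R| = 256).
    return hashlib.sha1(msg).digest()[-1]

seen = {}
collision = None
for i in range(1000):                 # 1000 distinct input messages
    h = tiny_hash(str(i).encode())
    if h in seen:
        collision = (seen[h], i)      # two distinct inputs, one output
        break
    seen[h] = i

print(collision is not None)          # True: a collision must exist
```

With a realistic bitlength (e.g. 160 bits for SHA-1) the range is so large that collisions, while unavoidable in principle, are computationally hard to find.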
An example of the use of hash-values to provide a mechanism to verify integrity can be seen on many FTP services across the Internet. Files that can be downloaded from a particular website or FTP server are often distributed with their corresponding hash-values, which can be used to verify that a downloaded file is the same as the original file. If the file was corrupted during download then its hash-value will not match the hash-value of the original file.

3.6.1 Message Integrity Codes

A Message Integrity Code (MIC) function will generate a hash-value based solely on the input message; in the ideal case the output that is produced is unique to the input message, however collisions can occur as discussed above. Menezes et al. [35] point out that there are two types of MICs, namely:

1. One-Way Hash Functions - where finding an input message that hashes to a given output hash-value is computationally difficult.
2. Collision Resistant Hash Functions - where finding two messages that hash to the same hash-value is computationally difficult.

The use of MICs to verify the integrity of a message can be seen in figure 3.8, where the file ubuntu-7.10-dvd-i386.iso has been provided along with the file MD5SUMS, which contains the original MD5 [43] hash-value for the file as generated by the distributor. In order to verify the integrity of the file, the program md5sum [13] is used to calculate the hash-value of the locally stored copy of the file and compare it to the hash-value provided by the distributor. If the hash-value of the file matches the hash-value provided then we can say with relative certainty that the two files are identical.
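The check performed by md5sum can be reproduced with Python's hashlib; the sketch below simulates a "downloaded" file and its published digest (the filename and payload are hypothetical stand-ins).

```python
# Minimal md5sum-style file verification using hashlib.
import hashlib

def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Hash the file in chunks so large files need little memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Simulate a downloaded file and the distributor's published hash-value.
with open("download.bin", "wb") as f:
    f.write(b"example payload")
published = hashlib.md5(b"example payload").hexdigest()

# Accept the file only if the digests match.
print("OK" if md5_of("download.bin") == published else "MISMATCH")
```

If even one byte of the file changes during transfer, the recomputed digest will differ from the published one and the check fails.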
rootmachine:/hash# ls
MD5SUMS  ubuntu-7.10-dvd-i386.iso
rootmachine:/hash# cat MD5SUMS
b5d9aaa45af862b4c804530734216a15 *ubuntu-7.10-dvd-i386.iso
rootmachine:/hash# md5sum -c MD5SUMS
ubuntu-7.10-dvd-i386.iso: OK
rootmachine:/hash#

Figure 3.8: File verification using a message integrity code

3.6.2 Message Authentication Codes

A Message Authentication Code (MAC) is a hash-value that is used to ensure the integrity and source of a message. MAC hash-functions accept a message and a secret key. MACs can be generated in a number of ways, either using a symmetric block cipher, or by using a Message Integrity Code that is combined with the secret key; in both cases a hash-value is produced that can be used to verify the integrity of a message.

Block Cipher MAC

The simplest approach to generating a MAC is to use a block cipher to encrypt a message using a specific block cipher mode (see section 3.4), and then use the last block of the ciphered message as the MAC [46]. Provided both parties involved in the authentication of the message use the same secret key to generate the MAC, the same result will be achieved.

Message Integrity Code MAC

This again is a simple approach to the generation of a MAC; this method simply involves combining the input message and the secret key to generate a hash-value using a MIC (see section 3.6.1). Again, provided that both parties involved in the authentication of a message use the same secret key and the same algorithm to generate the hash-value, the authenticity of the message can be verified.

Hash functions suffer from collisions because of their limited output range. As such they are subject to an attack known as the birthday attack, in which the probability that a collision will occur can be calculated. The birthday paradox is discussed below.
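Before turning to the birthday attack, the MIC-plus-key construction described above can be sketched with Python's hmac module. HMAC is the standard, hardened form of combining a hash function with a secret key; naive concatenation such as hash(key + message) is shown nowhere here because it is vulnerable to length-extension attacks. The key and message below are illustrative values.

```python
# Sketch of a MAC built from an unkeyed hash plus a shared secret key.
import hashlib
import hmac

key = b"shared-secret"
message = b"transfer 100 to Bob"

# Sender computes the MAC over the message with the secret key.
mac = hmac.new(key, message, hashlib.sha1).hexdigest()

# Receiver, holding the same key, recomputes and compares.
expected = hmac.new(key, message, hashlib.sha1).hexdigest()
print(hmac.compare_digest(mac, expected))   # True: source and integrity verified
```

A party without the secret key cannot produce a matching MAC, which is what distinguishes a MAC from a plain MIC.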
3.6.3 Birthday Attack

The birthday attack is one of the most common attacks used on cryptographic hash functions, and exploits the fact that a hash-function will generate collisions for two or more distinct inputs. It is named after the birthday paradox, which is a standard statistical distribution problem. In order to fully explain the birthday attack, the birthday paradox is briefly explained.

Definition 3.1 Combinatoric definitions [35]

1. Let m, n ∈ ℕ, where m ≥ n. Then m^(n) is:

m^(n) = m(m - 1)(m - 2) ··· (m - n + 1) = m! / (m - n)!    (3.12)

This is the falling factorial, which counts the number of permutations of m distinct objects when n of those objects are chosen.

2. Let m, n ∈ ℤ, where m ≥ n. Then the Stirling number of the second kind, written {m n}, is:

{m n} = (1/n!) Σ_{k=0}^{n} (-1)^k (n choose k) (n - k)^m    (3.13)

This counts the number of ways to partition m objects into n non-empty subsets.

The definitions presented in equations 3.12 and 3.13 are standard combinatoric functions that deal with counting the permutations and partitioning of a given number of objects; these functions are used in the following theorems to explain the classical occupancy problem and the birthday paradox.

Theorem 3.1 Classical occupancy problem [35]

A bucket contains m balls that are numbered 1 through m. If n balls are drawn from the bucket one at a time, their numbers listed, and then returned to the bucket (i.e. with replacement), then the probability that exactly t different balls have been drawn is

f(m, n, t) = {n t} · m^(t) / m^n, where 1 ≤ t ≤ n    (3.14)

The classical occupancy problem is a probability function that calculates the probability of a certain number of distinct occurrences over a larger set. The birthday paradox follows from this, as is seen in equation 3.15.

Theorem 3.2 Birthday paradox [35]

A bucket contains m balls that are numbered 1 through m.
If n balls are drawn from the bucket one at a time, their numbers listed, and then returned to the bucket (i.e. with replacement), then the probability of at least one coincidence is

g(m, n) = 1 - f(m, n, n) = 1 - m^(n) / m^n, where 1 ≤ n ≤ m    (3.15)

As m → ∞, n = O(√m) (the upper asymptotic bound).

The birthday paradox can be demonstrated by the following example. Consider a situation where there is a large group of people, and you would like to calculate the number of people required such that there is a greater than fifty-percent chance that at least one person has the same birthday as you. Since each other person independently misses your birthday with probability 364/365, this requires n such that 1 - (364/365)^n > 0.5, which gives n ≈ 253. Now consider a situation where you would like to calculate the number of people required such that there is a greater than fifty-percent chance of at least one coincidence, that is, that some 2 people share the same birthday. This situation can be calculated using the birthday paradox, and is simply g(365, 23) ≈ 0.507 (see theorem 3.2). Therefore you would need only 23 people together to have a greater than fifty-percent chance of at least one coincidence.

The birthday attack follows directly from the birthday paradox; consider the following situation as outlined by Schneier [46]. Finding a message that produces a given n-bit hash-value would require calculating the hash-values of approximately 2^n random messages. However, finding any two messages that share the same hash-value would only require calculating the hash-values of approximately 2^(n/2) messages.

An attacker could theoretically generate multiple messages with different minor changes in each; if enough messages are generated then there could be a case where two of those messages generate the same hash-value. This could be exploited by an attacker, allowing a legitimate message to be exchanged with a false message that has an identical hash-value, and which would therefore be considered to be a valid message.

We will now discuss an implementation of a cryptographic hash function.
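Before moving on, the worked example above can be checked numerically; this is a direct evaluation of theorem 3.2 rather than any code from the dissertation.

```python
# Evaluate g(m, n) = 1 - m^(n) / m^n, the probability of at least one
# coincidence among n draws (with replacement) from m possibilities.
def g(m: int, n: int) -> float:
    p_no_coincidence = 1.0
    for k in range(n):
        # Multiply (m-k)/m terms: this builds m^(n) / m^n incrementally,
        # avoiding overflow from the huge factorials.
        p_no_coincidence *= (m - k) / m
    return 1.0 - p_no_coincidence

print(round(g(365, 23), 3))   # 0.507, as in the worked example
```

The same function applied to an n-bit hash (m = 2^n possible digests) shows why roughly 2^(n/2) messages suffice to find a collision.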
The Secure Hash Algorithm (SHA) is a standard algorithm for calculating hash-values. There are many variations of the SHA, each giving a slightly different hash-value. SHA-1 will be discussed in the following section.

3.6.4 Secure Hash Algorithm (SHA)

SHA Algorithm

There are four SHA algorithms that are outlined in the Federal Information Processing Standards Publication 180-2 [16]; these are SHA-1, SHA-256, SHA-384, and SHA-512. All of the algorithms are similar in design and produce different output bitlengths. The SHA-1 algorithm will be discussed below.

The SHA-1 algorithm operates on 512-bit blocks of a message and will produce a 160-bit message digest. The SHA algorithm is broken up into two stages; these are Pre-processing and Hash Computation.

SHA-1 Pre-processing

The SHA-1 algorithm can accept an input message (M) of arbitrary length, although the input message is padded so the total bitlength of the message will always be a multiple of 512 bits. The message is then broken up into N 512-bit blocks that will be operated on by the Hash Computation phase of the algorithm; these blocks can be represented as M^(1), M^(2), ..., M^(N).

The Hash Computation phase uses an iterative algorithm to produce the final message digest. For each iteration, a hash block (H) is produced and will be used in the next iteration, and so on. The hash block that is used in the first iteration is called the initial hash value, which is used to produce H^(1). The interested reader is referred to FIPS PUB 180-2 [16] for more detailed information on the initial hash value.

SHA-1 Hash Functions

The Secure Hash Standard (SHS) defines a number of functions which are used in each iteration to calculate the hash-value. There are a number of different functions that are defined for each version of the SHA algorithm. Discussed below are the functions that relate to the generation of a SHA-1 hash.
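The two stages just described, pre-processing and hash computation, can be sketched end-to-end in Python. This is a compact sketch following FIPS 180-2 rather than the dissertation's own code; the round functions f_t, constants K_t, and message schedule W that it uses are the ones detailed in the remainder of this section, and the result is checked against Python's hashlib.

```python
# Compact SHA-1 sketch (FIPS 180-2), verified against hashlib.
import hashlib

MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    # Circular left shift of a 32-bit word by n positions.
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1(message: bytes) -> str:
    # Pre-processing: append a 1 bit, zero padding, and the 64-bit
    # message length, giving a multiple of 512 bits (64 bytes).
    bitlen = len(message) * 8
    m = message + b"\x80"
    m += b"\x00" * ((56 - len(m) % 64) % 64)
    m += bitlen.to_bytes(8, "big")

    # Initial hash value H^(0) from FIPS 180-2.
    H = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    for i in range(0, len(m), 64):        # each 512-bit block M^(i)
        block = m[i:i + 64]
        # Message schedule: eighty 32-bit words W_0 .. W_79.
        W = [int.from_bytes(block[t * 4:t * 4 + 4], "big") for t in range(16)]
        for t in range(16, 80):
            W.append(rotl(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1))

        a, b, c, d, e = H
        for t in range(80):               # the 80 rounds
            if t <= 19:
                f, k = (b & c) | (~b & d), 0x5A827999       # Ch
            elif t <= 39:
                f, k = b ^ c ^ d, 0x6ED9EBA1                # Parity
            elif t <= 59:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC  # Maj
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6                # Parity
            T = (rotl(a, 5) + f + e + k + W[t]) & MASK      # addition mod 2^32
            a, b, c, d, e = T, a, rotl(b, 30), c, d

        # Fold the working variables back into the hash block.
        H = [(x + y) & MASK for x, y in zip(H, [a, b, c, d, e])]

    return "".join(f"{h:08x}" for h in H)

assert sha1(b"abc") == hashlib.sha1(b"abc").hexdigest()
```

The assertion at the end confirms the sketch agrees with the library implementation for the standard test vector "abc".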
Firstly, the SHS defines a set of primary functions that are used in the hash computation. A circular left shift (rotation) function is defined, as seen in equation 3.16, where ROTL^n(x) represents a circular left shift of the 32-bit word x by n positions.

ROTL^n(x) = (x << n) ∨ (x >> (32 - n))    (3.16)

The SHS also defines a set of eighty functions, called f_0, f_1, ..., f_79, which have the form f_t(x, y, z). The particular form of f differs depending on the number of the current iteration. These functions are used during each iteration of the hash computation and are defined in equation 3.17.

f_t(x, y, z) =
    Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)                     0 ≤ t ≤ 19
    Parity(x, y, z) = x ⊕ y ⊕ z                          20 ≤ t ≤ 39
    Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)           40 ≤ t ≤ 59
    Parity(x, y, z) = x ⊕ y ⊕ z                          60 ≤ t ≤ 79
    (3.17)

The complete hash computation is given in algorithm 2, where all additions are performed modulo 2^32.

Algorithm 2: Complete SHA-1 algorithm

Input: M - the input message.
Input: N - the number of message blocks.
Output: OutputHash - the hash-value.

/* Operate on every input message block */
for i ← 1 to N do
    /* Initialise the working variables with words from the previous iteration */
    a ← H_0^(i-1); b ← H_1^(i-1); c ← H_2^(i-1); d ← H_3^(i-1); e ← H_4^(i-1);
    for t ← 0 to 79 do
        /* Perform 80 iterations of the cipher functions for this block */
        T ← ROTL^5(a) + f_t(b, c, d) + e + K_t + W_t;
        e ← d; d ← c; c ← ROTL^30(b); b ← a; a ← T;
    end
    /* Create the hash block that will be used in the next iteration */
    H_0^(i) ← a + H_0^(i-1); H_1^(i) ← b + H_1^(i-1); H_2^(i) ← c + H_2^(i-1);
    H_3^(i) ← d + H_3^(i-1); H_4^(i) ← e + H_4^(i-1);
end
/* The hash block generated for the final message block is the digest */
OutputHash ← H^(N);

SHA-1 Constants

The SHS defines a set of eighty constant values that are used in the hash computation, labelled K_t where t ∈ {0, 1, ..., 79}. These constant values are shown in equation 3.18.
K_t =
    0x5a827999    0 ≤ t ≤ 19
    0x6ed9eba1    20 ≤ t ≤ 39
    0x8f1bbcdc    40 ≤ t ≤ 59
    0xca62c1d6    60 ≤ t ≤ 79
    (3.18)

SHA-1 Hash Computation

The hash computation phase operates on each of the N blocks of the input message. For each iteration of the hash computation, firstly a message schedule (W) must be created. During the hash computation it is necessary to reference certain 32-bit words within a larger structure. The SHS defines the following notation to perform this reference: for example, M_n^(i) refers to the nth 32-bit word in the ith message block.

The message schedule is made up of eighty 32-bit values; each of these 32-bit values is referenced as W_t, where t = 0, 1, ..., 79. The message schedule is constructed as follows:

W_t =
    M_t^(i)                                             0 ≤ t ≤ 15
    ROTL^1(W_{t-3} ⊕ W_{t-8} ⊕ W_{t-14} ⊕ W_{t-16})     16 ≤ t ≤ 79

The complete SHA-1 algorithm can be described as seen in algorithm 2. The interested reader is referred to FIPS PUB 180-2 [16] for more detailed information concerning the SHA family of hash functions.

3.7 Summary

In this chapter we covered the following cryptographic concepts:

Basic Concepts - where we discussed a number of concepts common to many different cryptographic systems.

Symmetric Encryption - where we discussed the family of cryptosystems which use a single "shared secret"; we discussed a number of concepts and algorithms, namely:

- Substitution Boxes - a method which is used in many different symmetric cryptosystems to securely encrypt data.
- Data Encryption Standard (DES) - a historically widely used symmetric encryption algorithm.
- Serpent - a modern symmetric encryption algorithm which is faster and more secure than DES.

Block Cipher Modes - which are used in conjunction with a symmetric cryptosystem to ensure data security; we discussed the following block cipher modes:

- Electronic Codebook Mode - a simple cipher mode which uses the same encryption key for each block which is encrypted.
- Cipher Block Chaining Mode - a cipher mode in which the encryption of each block is dependent on the preceding encrypted block.
- ECB versus CBC - where we compared the two discussed block cipher modes to show the differences between them.

Asymmetric Encryption - where we discussed the family of cryptosystems which use different keys for encryption and decryption; we discussed the following cryptosystem:

- RSA Encryption - a widely used asymmetric cryptosystem, used primarily for digital certificates.

Cryptographic Hash Functions - a family of cryptosystems which aim to produce a unique output for a particular input. We discussed the following concepts:

- Message Integrity Codes - used to verify the integrity of a message.
- Message Authentication Codes - used to verify the authenticity of a message.
- Birthday Attack - a widely used attack on cryptographic hash functions.
- Secure Hash Algorithm (SHA-1) - a commonly used cryptographic hash algorithm.

3.8 Conclusion

Cryptography is used extensively in modern information systems to provide both secure communication and secure storage of data. Although many different algorithms and techniques exist, they all strive towards the same goals: to ensure data remains secure, and to provide that security quickly and efficiently.

In this chapter we discussed some basic cryptographic theory and techniques. We started the discussion with some basic cryptographic concepts that can be used to describe cryptosystems. This was followed with a discussion of symmetric encryption with an overview of the DES and Serpent algorithms. We then went on to describe different techniques that are used to encrypt data using symmetric block ciphers with specific block cipher modes. We then went on to describe asymmetric cryptosystems, and this was followed with a brief discussion of RSA encryption.
Finally we discussed cryptographic hash functions and their specific forms, namely Message Authentication Codes and Message Integrity Codes. The Birthday Attack was then discussed, along with a brief discussion of the SHA-1 algorithm.

In the following chapter we will discuss steganography and steganographic file systems. Concepts and algorithms introduced in this chapter are used throughout the following chapters in order to explain different aspects of information security. Steganography in particular relies heavily on cryptography to ensure information security. As such, this chapter plays a vital role in the understanding of the following chapters.

Chapter 4

Steganography and Steganographic File Systems

4.1 Introduction

Data hiding techniques are becoming more prominent as more digital media becomes available. Steganography can be used in many different situations where data of some form needs to be hidden. Applications of steganography can be found from intellectual property protection to currency anti-counterfeiting software. As increasingly more personal information is found in a digital form there is a need to protect that information, and steganography offers a method of hiding data that provides plausible deniability that particular hidden data ever existed.

Steganography as a concept is not a new one. Historically there have been many different attempts to ensure that information remains hidden, as this increases security. These techniques have been adapted for use in a modern world where digital information can be hidden within other digital information to such a degree that it can become almost undetectable. As security of information becomes increasingly important in modern society, steganography can play an important role in ensuring that data remains secure.

In this chapter we will introduce steganography and steganographic file systems and discuss some applications and methods thereof.
In section 4.2 we will introduce steganography and some non-digital applications, in order to demonstrate the extent to which steganography plays a part in our day-to-day lives. In section 4.3 we will discuss image and audio steganography. We will then introduce cryptographic file systems in section 4.4, and discuss some implementations. We will then introduce steganographic file systems in section 4.5, and finally we will discuss some implementations.

4.2 Steganography

Steganography is a component of information hiding and literally means "covered writing". Generally it refers to the hiding of information within other information. Where cryptography is the art of obscuring information, steganography is the art of obscuring the presence of information [41]. Modern steganographic techniques often use cryptographic algorithms to further obfuscate information before it is hidden.

In the modern world, our personal information must remain secure, and it is the responsibility of the user to ensure that private data remains private. Such is the case in the United States, where border officials are allowed to search the laptop computers of travellers without a warrant, based purely on suspicion [42, 48, 55]. Steganography and data hiding techniques can play an important role in ensuring that our personal information remains secure.

Steganography has many wide-ranging applications, from paper watermarking and currency protection mechanisms, to intellectual property and copyright protection mechanisms such as digital watermarking and digital fingerprinting, or simply to provide anonymity or allow for covert communications. We will now discuss terminology that is used to describe steganographic systems, and we will then discuss some historic uses of information hiding.
4.2.1 Terminology

Like cryptography, steganography utilises a number of standard terms in order to describe the different components of a steganographic system. The data that is going to be hidden is called the embedded data, and the file in which the data is to be hidden is called the cover-file. Depending on the type of data the cover-file contains, it can be referred to as the cover-text, cover-image, cover-audio, cover-video, or generically as the cover-object. The file produced by the steganographic process is called the stego-object, or can be referred to as the stego-text, stego-image, stego-video, or stego-audio. Finally, the key used to control the steganographic process is called the stego-key [41, 1]. The terms are summarised in table 4.1.

A stego-object is referred to as steganographically strong if it is impossible to detect the presence of the embedded data [36]. In the following section we will discuss historic uses of information hiding, and some modern non-digital applications of steganography.

4.2.2 Historic Steganography

Steganography has historically been used in many different forms to either protect a message, or to verify the authenticity of a message. Early cryptographic techniques were not particularly sophisticated, and often took the form of a simple shifting cipher, such as the Caesar cipher. It was difficult to distribute the keys from one far-flung outpost to another, so if messages or keys were intercepted, decoding them was a fairly trivial task.

Hiding the existence of messages became an attractive solution for the secure transportation of important messages. Techniques included writing messages on wax tablets, or shaving the head of a slave, tattooing the message on the head of the slave, and then sending the slave with the message once his hair had grown back [41].
These techniques would ensure that if a message-carrier was intercepted, the message would have a greater chance of not being located.

Paper watermarking has long been used to verify the authenticity of paper documents, and although not effective today, paper watermarks are still seen on paper currency and official documents. Paper watermarks are created during the milling of the paper, and are used to hide simple information in the fibre of the paper [41].

Cover-file      An unassuming file that data will be hidden in
Embedded data   The data that will be hidden in the cover-file
Stego-object    The result of the steganographic process
Stego-key       A key used to control the steganographic process

Table 4.1: Basic steganographic terms

4.2.3 Currency Protection Mechanisms

Currency incorporates many different steganographic objects in order to combat counterfeiting [6], by embedding information that proves that a note or coin is authentic. Most paper currency will incorporate many different anti-counterfeiting elements such as:

Paper watermark - a watermark embedded into the fabric of the paper currency.
Colour changing ink - ink that will change colour based on the viewing angle.
Moire patterns - geometric patterns that will appear blurred if the note is duplicated using normal computer equipment.
Raised ink - special ink that is raised off the surface of the note; this cannot be produced with standard computer equipment.
Security strips - metallic strips that are embedded into the fabric of the note.
Micro-writing - very small writing that will not be clear if the note is duplicated.
EURion - a geometric shape consisting of five 1mm circles (see figure 4.1) that can be detected by anti-counterfeiting software [30, 37].
UV ink - ink that will only become visible under ultraviolet light.

[Figure 4.1: Example of an EURion constellation]
Currency protection mechanisms are a closely guarded secret of the country issuing the note. There are probably many more anti-counterfeiting objects built into the design of a note that the general public is not aware of.

4.2.4 Copyright Protection Mechanisms

Steganography can be used to protect digital media, such as images, audio, or video. This is done by embedding a digital watermark or digital fingerprint in the digital content. The term Digital Watermark was first described by Tirkel et al. [54] in their paper Electronic Watermark. Copyrighted digital data such as images, audio and video, or any other digital data that can be transmitted electronically, can be watermarked in order to control its distribution, or to prove legal ownership.

Digital watermarks strive to be persistent in nature. Ideally a watermark should still be detectable even after the data has been manipulated; in the case of an image, even after the image has undergone a number of transformations [41]. Another use for digital watermarking is as a Digital Rights Management (DRM) system. Peinado et al. [40] explain that distribution information can be included within video in order to only allow authorised people to view the media. Wu et al. [59] explain how hidden information can be embedded into medical images to prevent any form of tampering.

Digital watermarking has many applications as we use more and more digital data in our daily lives. Steganography has wide-ranging digital applications, which will be discussed in the following section. We will discuss methods of image and audio steganography. We will then discuss methods of attacking steganographic systems that utilise the least significant bit.

4.3 Digital Steganography

Digital steganography utilises digital data to hide other digital data. Common types of cover-files that are used are images and audio.
Steganography can be applied to any digital data, as long as the underlying structure is well-understood. In this section, image steganography and audio steganography are discussed, as they are the techniques most likely to be encountered.

4.3.1 Image Steganography

Image steganography uses an image as the cover-file and is probably the most well known of all the steganographic techniques.

Image steganographic techniques encode embedded data within the pixel information of the cover-image. In a simple system, a single bit of the embedded data can be stored in the Least Significant Bit (LSB) of a single pixel of the cover-image. This changes the colour value of a pixel by at most one, which does not produce any visible change in the cover-image. For example, assume that a cover-image uses 24 bits to store the colour information for a single pixel, with 8 bits representing each of the colour channels of the pixel, namely Red, Green, and Blue. Then 3 bits of the embedded data can be stored in each pixel of the cover-image: a bit in the red channel, a bit in the green channel, and a bit in the blue channel. The number of least significant bits used can be increased, but this could produce a stego-object which appears visibly different from the cover-image. The maximum size of the embedded data is therefore dictated by the size of the cover-file [36, 5].

In order to obscure the presence of the embedded data further, a pseudo-random sequence of pixels from the cover-image can be chosen; this is controlled through the use of a stego-key. The stego-object that is produced will be the same size as the original cover-file [1].

However, the least significant bit approach also has limitations; the embedding is not particularly robust, as the embedded data can be lost if the stego-image undergoes a transformation, such as a rotation.
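The sequential LSB embedding just described can be sketched in a few lines. This sketch assumes the cover-image is already available as a flat bytearray of 8-bit R, G, B channel values (real code would obtain and rewrite these via an image library); the function names are illustrative.

```python
# Minimal sequential LSB embedding and extraction over raw channel bytes.
def embed(cover: bytearray, data: bytes) -> bytearray:
    # Split the embedded data into bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    assert len(bits) <= len(cover), "embedded data exceeds cover capacity"
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # overwrite only the LSB
    return stego

def extract(stego: bytearray, nbytes: int) -> bytes:
    bits = [b & 1 for b in stego[:nbytes * 8]]
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits), 8)
    )

cover = bytearray(range(48))        # 16 pixels x 3 channels (toy data)
stego = embed(cover, b"Hi")
assert extract(stego, 2) == b"Hi"
# Each channel value changed by at most 1, so the image looks unchanged.
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```

A stego-key would replace the sequential index i with a key-driven pseudo-random permutation of pixel positions, as the text describes.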
The embedded data is not likely to survive operations such as JPEG compression, as this will rewrite the pixel information of the cover-image, effectively destroying any embedded data [1, 41]. As Francia and Gomez [19] explain, this can be an effective method for destroying any steganographic content in a cover-file.

4.3.2 Image Steganography Example

We created figure 4.2 using the steghide [27] application. Stego-objects are created by steghide using a pseudo-random sequence of parts of the cover-image. Graph theory is then used to find which of these parts, when exchanged, will have the effect of encoding the embedded data. The colour values are not changed in the resulting stego-object, which the author of the application claims will make it resistant to standard steganographic detection methods. However, this approach will create a stego-object that is larger in size than if the least significant bit approach is used. The interested reader is referred to the Steghide Manual [27] for more detailed information.

[Figure 4.2: Image steganography example: the original cover-image [56] (MD5 hash: edfa4c0babbba4de75b746600aec78ce), the stego-image (MD5 hash: fb09dc81f3fbb4b74f0b2dfca8527fcb), and the embedded data [57] (MD5 hash: 7e4890ea23e10bcd82a720362b65296e)]

As can be seen from figure 4.2, the cover-image and the resulting stego-image are visually indistinguishable, although the MD5 [43] hashes of the two images are different.

4.3.3 Audio Steganography

Another very interesting application of steganography is audio steganography. Audio steganographic techniques attempt to hide information within an audio file, while not changing the perceivable output [5]. With the ever increasing sale of digital audio on the Internet, and the demand from recording companies to ensure that digital music cannot be pirated, steganography can be used to provide an effective Digital Rights Management (DRM) solution.
One method of audio steganography is the use of echo hiding. As Gruhl, Lu, and Bender [25] explain, the human auditory system is more sensitive than the other human senses. It is generally difficult to embed data within cover-audio because of the large audio range that humans can perceive, both in terms of frequency and power. They go on to explain that there are "holes" in the human auditory system that can be exploited to encode steganographic data without changing the audible output.

Echo hiding operates by introducing an echo into the cover-file to hide embedded data. Two different length echoes are used to encode 0 and 1, thus allowing binary data to be hidden. This data hiding technique relies on the fact that if the original sound and an echo are close enough together, humans will not distinguish between the two distinct sounds, but will only hear a single "compound" sound. The interested reader is referred to "Echo Hiding" [25] for more information.

4.3.4 Least Significant Bit (LSB) Attacks

LSB steganography is the most common form of steganography used, and as such there are a number of methods for detecting steganographic content. As Westfeld and Pfitzmann [58] explain, there are two general types of LSB detection methods: visual attacks and statistical attacks. Both of these types of attacks are discussed below.

Visual Attacks

Visual attacks are performed manually for any steganographic method that modifies the LSB of a cover-image. This type of detection works best on greyscale images, where the embedded data is encoded sequentially within the cover-image. A standard visual attack works by mapping the LSB of each pixel in the stego-object. The resulting map will contain 1 if the LSB of a pixel was 1 and 0 if the LSB of a pixel was 0. The likelihood that an image contains embedded data can be determined by visually analysing the amount of "noise" that is present in the least significant bits of the cover-image.
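The LSB map described above amounts to amplifying the least significant bit plane so it can be inspected by eye; a minimal sketch, using a toy list of greyscale pixel values in place of a real image:

```python
# Extract and amplify the LSB plane of a greyscale image for visual inspection.
def lsb_plane(pixels):
    # Map each pixel's LSB to full black (0) or full white (255),
    # so embedding noise becomes visible when rendered as an image.
    return [(p & 1) * 255 for p in pixels]

pixels = [120, 121, 130, 131, 87, 86]
print(lsb_plane(pixels))   # [0, 255, 0, 255, 255, 0]
```

In natural images the LSB plane retains faint structure from the scene; sequential LSB embedding replaces part of it with uniformly random-looking noise, and the visible boundary between the two regions is what the analyst looks for.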
However, Westfield and Pfitzmann [58] go on to explain that most LSB steganographic techniques hide data very carefully so as to avoid detection. The interested reader is referred to Westfield and Pfitzmann [58], pp. 64-68, for more information.

Statistical Attacks

Statistical attacks generally calculate the probability that embedded data of a certain length is hidden in an object. Statistical attacks can be automated to calculate the probability that embedded data exists quickly and accurately. Westfield and Pfitzmann [58] present a statistical attack that uses pairs of values (PoV), which are pixel values that differ only in the least significant bit, to calculate the probability that embedded data exists. Fridrich, Goljan, and Du [21] explain that this approach works well for data that is embedded sequentially, but does not produce accurate results for randomly embedded data. The interested reader is referred to Westfield and Pfitzmann [58], pp. 68-71, for more information.

Fridrich, Goljan, and Du [21] propose a statistical steganographic attack that calculates the length of potential embedded data, and therefore the existence of embedded data. The stego-image is divided into groups of pixels, which are then quantified to eliminate excess noise. The groups of pixels are then divided into regular, singular, and unusable groups through the application of a flipping function. The flipping function simply negates the LSB of a pixel value. This technique calculates how the number of regular and singular groups changes with increasing embedded data lengths. For more information the interested reader is referred to Fridrich et al. [21].

We will now discuss cryptographic file systems as a method for obfuscating data on the hard disk, by discussing some implementations thereof.
Figure 4.3: CFS design architecture [14]

Cryptographic file systems obfuscate data but do not hide the presence of data. We will then discuss steganographic file systems in order to contrast the application of each of these two different types of file system.

4.4 Cryptographic File Systems

The goal of cryptographic file systems is to provide transparent encryption and decryption for user data. Cryptographic file systems are usually implemented as an encryption layer on an existing host file system, as is the case with CFS and Cryptfs discussed below. These file systems do not maintain a traditional file system of their own, but rely completely on the host file system to provide low-level access to the data. Other cryptographic implementations, such as the Linux Cryptoloop driver, also discussed below, manage the encryption and decryption of raw data, which allows for indirect creation of a cryptographic file system through the use of other userspace tools.

4.4.1 The Cryptographic File System - CFS

Blaze [7] was one of the first to propose creating a file system that transparently encrypted and decrypted user data. The Cryptographic File System (CFS) was created in order to demonstrate these techniques. Transparency is achieved by limiting the amount of human interaction with the "cryptographic housekeeping" that usually occurs when trying to encrypt data.

Normal user tools that are available for encrypting data involve a fair amount of interaction with the human operator. The UNIX mcrypt utility can be used to encrypt data streams; the user however needs to specify a number of parameter arguments, such as the encryption algorithm to utilise, the keysize, and the keymode. CFS tries to eliminate a large amount of user interaction through the use of transparency.
This is achieved by introducing encryption and decryption routines directly into the file system implementation. CFS creates a "virtual" file system on the host machine, through which a particular user can interact with their encrypted files. CFS is implemented as an interface between a UNIX file system and encrypted user data. Blaze [7] explains that a user can access their encrypted data by using a userspace tool to issue an "attach" command on an encrypted directory using an encryption key. CFS will then encrypt and decrypt user data as needed. When the user is done with the file system, a "detach" command is issued and CFS will detach the encrypted directory from the virtual file system.

The CFS virtual file system is simply a directory that contains all the encrypted data on the host file system; this allows CFS to operate completely independently of the host machine. CFS is implemented as a userspace daemon that interacts with the file system using a modified NFS [45] server. CFS provides a number of userspace tools that can be used to interact with the CFS daemon and the virtual file system.

The CFS virtual file system is created using the cmkdir command; this creates an initial directory that will contain the encrypted data. Interaction with CFS is initiated using the cattach command; this command will instruct the CFS daemon to mount the virtual file system within the host Vnode sub-system. Because the CFS daemon is a modified NFS server, the standard NFS client is used to control access to the CFS daemon. All of the transparent encryption and decryption is then handled by the CFS daemon. In order for a virtual file system to be unmounted, the cdetach command is used.

Data in CFS is encrypted using the DES algorithm (see section 3.3.2, page 40); however, there are different encryption algorithms available. The interested reader is referred to the article authored by Blaze [7] for more specific implementation details.
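The transparent encryption principle shared by these file systems can be sketched as follows. This is a toy model, not CFS's actual implementation: a hash-based keystream stands in for DES, and a dictionary stands in for the NFS-backed directory:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream derived from the key (a stand-in for a real block cipher).
    NB: reusing one keystream for every file is insecure; real systems derive
    per-file keys and initialisation vectors."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

class TransparentStore:
    """Encrypts on write and decrypts on read; callers never see ciphertext."""
    def __init__(self, key):
        self.key = key
        self.disk = {}                      # name -> ciphertext, as stored on disk
    def write(self, name, plaintext):
        ks = keystream(self.key, len(plaintext))
        self.disk[name] = bytes(p ^ k for p, k in zip(plaintext, ks))
    def read(self, name):
        ct = self.disk[name]
        ks = keystream(self.key, len(ct))
        return bytes(c ^ k for c, k in zip(ct, ks))

fs = TransparentStore(b"passphrase")
fs.write("notes.txt", b"attack at dawn")
assert fs.read("notes.txt") == b"attack at dawn"      # transparent round trip
assert fs.disk["notes.txt"] != b"attack at dawn"      # only ciphertext reaches the disk
```

The caller reads and writes plaintext, while only ciphertext ever reaches the "disk"; this is the property that CFS and Cryptfs provide at the file system layer.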
4.4.2 Cryptfs

Cryptfs [60] is another implementation of a transparent cryptographic file system. Like CFS it is implemented as a virtual file system that uses an existing file system as a host. Unlike CFS, Cryptfs is implemented as a loadable kernel module. This allows Cryptfs to have better performance and security, as all internal workings are protected within the kernelspace of the operating system. A userspace tool is provided with Cryptfs in order for the user to access and manage encrypted data.

Figure 4.4: Cryptfs design architecture [14]

Cryptfs is implemented as a stackable Vnode (see section 2.7, on page 29) level file system; this allows the Cryptfs kernel module to extend the functionality of any existing file system. Data is encrypted using the Blowfish [47] algorithm. The interested reader is referred to the article authored by Zadok, Badulescu, and Shender [60] for specific implementation details.

4.4.3 Linux Cryptoloop Driver

The Linux Cryptoloop driver uses the Linux loopback driver to allow a file to be mounted as a block device [11]. This allows a file system to be created within a normal file. Interaction with the loopback device is done via the standard Linux VFS layer [23]. The Cryptoloop device adds a level of transparent encryption to the standard loopback driver.

The Cryptoloop device uses the Linux CryptoAPI to provide many different encryption algorithms for the underlying file. The underlying file is initially mounted using a userspace application, which will also initialise the encryption key for the Cryptoloop device. After the initial mounting operation, the underlying file is accessed as if it were a normal block device, with the Cryptoloop driver providing transparent encryption and decryption of data.
A userspace tool is also used to unmount the device and release any kernel resources the Cryptoloop driver is utilising. Once the underlying file has been created and mounted via the Cryptoloop driver, standard file system creation tools can be used to create a complete file system within the encrypted file. The standard file system implementation will then handle the storage of data, and the Cryptoloop driver will handle the underlying encryption.

Figure 4.5: Linux Cryptoloop driver architecture [14]

Although the Cryptoloop driver is generally considered a secure method of transparent encryption, an exploit was discovered in 2005 which allows watermarked data to be detected within the underlying encrypted file [49].

4.5 Steganographic File Systems

Steganographic file systems are file system implementations that strive to hide the presence of data within the structure of an existing file system. Cryptographic file systems obscure data through cryptographic algorithms, but never deny the presence of the encrypted data. Steganographic file systems, however, obscure data, usually through cryptography and data hiding techniques, to provide plausible deniability.

Plausible deniability is a feature exhibited by steganographic file systems, in that it allows the existence of data to be denied. This allows sensitive data to be hidden from adversaries, for example to thwart industrial espionage, or to protect trade secrets.

File system steganography can be seen as low-level steganography, while image and audio steganography (see section 4.3) can be seen as high-level steganography. High-level steganography makes it possible to almost completely hide the presence of embedded data, to such a degree that it becomes almost impossible to detect. This is achievable because the structure of the cover-file is well known.
Low-level steganography is much less precise, as in the case of a file system no assumptions can be made about the structure of existing data. In order to fully discuss different file system implementations we need to make a number of assumptions about the structure of a file system. These assumptions will be discussed below.

4.5.1 File System Assumptions

In order to fully discuss steganographic file systems, a number of assumptions need to be made about the cover file system. The cover file system is the existing file system where data will be hidden. This file system will contain, in some form or another, a storage map which will mark allocated file system blocks. The size of a file system block will be defined by the cover file system, and will usually be about 1024 bytes or larger.

Existing data in the cover file system must be considered to be raw data, as no assumptions can be made about the structure or makeup of the data. As an extension to this, any allocated file system block must be considered to be "completely allocated". A steganographic file system is confined to hiding data within the unallocated cover file system blocks, while at the same time not inhibiting the normal operation of the cover file system.

As a result, steganographic file systems rely heavily on encryption algorithms to obfuscate the presence of hidden data to the untrained eye. Ultimately, hidden data is allocated in such a way as to make it appear as if the data is a result of normal file system operations, such as the continual creation and deletion of files.

In the sections below we will discuss three different proposed methods for creating a steganographic file system. These three steganographic file systems will be critiqued in the following chapter. Each of the discussed methods uses the above assumptions in order to hide data within the structure of a host file system.
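The storage-map assumption can be sketched as follows; a minimal bitmap model (names hypothetical) identifying the only blocks a steganographic file system may use:

```python
class StorageMap:
    """Minimal block-allocation map, as assumed of the cover file system:
    it marks which file system blocks are allocated."""
    def __init__(self, n_blocks):
        self.allocated = [False] * n_blocks
    def allocate(self, block):
        self.allocated[block] = True
    def free_blocks(self):
        # A steganographic file system is confined to hiding data in these blocks.
        return [i for i, used in enumerate(self.allocated) if not used]

cover = StorageMap(8)
for block in (0, 1, 5):        # blocks holding non-hidden data
    cover.allocate(block)
print(cover.free_blocks())     # hidden data may only occupy these: [2, 3, 4, 6, 7]
```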
4.5.2 Anderson, Needham and Shamir

Two different steganographic file system implementations were presented by Anderson, Needham, and Shamir [4]. They propose that a steganographic file system should provide plausible deniability while also securing the hidden data. Their two proposed solutions are discussed below.

Method I

The first method proposed by Anderson et al. [4] utilises a number of random cover-files in order to hide embedded data. The embedded data is hidden in an exclusive or (XOR) of a subset of the random cover-files, which are chosen with a password P.

Assume that the file F is to be the embedded data, and the user specifies a stego-key P that has a bitlength of k. Then suppose the complete set of random cover-files is C_0, C_1, ..., C_{k-1}. The subset of cover-files is then obtained by selecting C_j if the jth bit of P is one. They go on to explain that the subset of cover-files is combined using a bitwise XOR to produce C_XOR. C_XOR is then XOR'ed with F, and the result of this is then XOR'ed with a cover-file, C_j, from the original subset. Anderson et al. [4] then go on to extend this method to include multiple security levels. This system relies on the existence of cover-files on the cover file system, which could potentially give away the use of this method.

Method II

In this method Anderson, Needham, and Shamir [4] propose that a whole disk is filled with pseudo-random bits and the embedded data is stored at some pseudo-random location on the disk. They go on to explain that this approach is subject to the Birthday Paradox (see section 3.6.3, on page 54), and that collisions are likely to occur after √n disk blocks have been written, assuming that there are n disk blocks in total. This implies that a disk is considered full once only a fraction of the total blocks has been written. Their solution is to write the embedded data at two or more pseudo-random locations on the disk.
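Method I can be sketched as follows. This is a toy, single-security-level sketch with short byte strings standing in for whole cover-files; the function names are ours, and the final step replaces one selected cover-file so that the subset XORs to F, following the construction described above:

```python
import os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def subset(covers, password):
    """Select cover-file C_j whenever bit j of the password P is one."""
    return [j for j in range(len(covers)) if (password >> j) & 1]

def hide(covers, password, secret):
    """Modify one selected cover-file so the XOR of the subset equals the secret."""
    sel = subset(covers, password)
    c_xor = bytes(len(secret))
    for j in sel:
        c_xor = xor(c_xor, covers[j])      # C_XOR: bitwise XOR of the subset
    j = sel[0]                             # any member of the subset will do
    covers[j] = xor(covers[j], xor(c_xor, secret))

def reveal(covers, password):
    """XOR the password-selected subset together to recover the secret."""
    sel = subset(covers, password)
    out = bytes(len(covers[0]))
    for j in sel:
        out = xor(out, covers[j])
    return out

covers = [os.urandom(8) for _ in range(8)]      # random cover-files C_0..C_7
hide(covers, password=0b10110100, secret=b"8 bytes!")
assert reveal(covers, 0b10110100) == b"8 bytes!"
```

Without the password, an observer sees only a collection of random-looking cover-files; every wrong password selects a different subset whose XOR is itself random.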
This has the effect of reducing the possibility that the embedded data will be overwritten, thus increasing the total number of disk blocks that can be used. The interested reader is referred to Anderson et al. [4] for more information.

4.5.3 McDonald and Kuhn

McDonald and Kuhn [33] propose a system that is inspired by the second method presented by Anderson et al. [4] (see section 4.5.2). Their method uses a modified version of the Ext2 [8] file system to store embedded information within unused file system blocks. Their design, called StegFS, is implemented as a Linux file system that is backwards compatible with Ext2. It incorporates fifteen different security levels and features all of the elements of a standard UNIX file system, such as a directory structure that contains directories and files, and hard and soft links. They make no attempt to hide the existence of their file system from the trained user, but provide plausible deniability through the use of encryption and a number of security levels.

The backward compatibility of their file system with Ext2 allows the file system driver to be removed from the system, allowing non-hidden files to be accessed using the normal file system driver. This ability allows further deniability, because the hidden data will just appear to be unused disk blocks that have been overwritten.

Access to the hidden data is controlled through a number of userspace tools. These tools give access to a particular security level using a specific password. Each of the fifteen security levels is accessible as a directory under the root directory. McDonald and Kuhn [33] explain that normal file system operations are performed exactly as if the file system were a generic Ext2 file system; this however opens up the possibility that hidden data will be overwritten. To counter this, StegFS will replicate the hidden data on disk; this will allow the hidden data to be recovered.
This does not, however, guarantee that hidden data, including the replications, will never be completely overwritten. In this case, an error is returned to the user and a file system repair tool can be used to clean up any remaining remnants of the hidden data.

StegFS Structures

In order for StegFS to reference hidden files within a security level, a Block Table and inode structure are used. The Block Table controls the blocks that are allocated to hidden files. For every allocated block there is a 128-byte Block Entry structure, which contains magic numbers, checksum values, an initialisation vector, and an associated inode number.

The inode number in the Block Entry will reference an inode structure. The StegFS inode structure is similar to the Ext2 inode structure and contains 12 direct blocks, one indirect block, one double indirect block, and one triple indirect block. Unlike the Ext2 inode, the StegFS inode contains references to all the replicated versions of the hidden file. This allows StegFS to retrieve the hidden data even if one of the replicated versions has been overwritten.

4.5.4 Pang, Tan, and Zhou

Pang, Tan, and Zhou [39] propose a steganographic file system that strives to minimise processing and storage overhead. To completely hide the embedded data, any metadata, such as inode tables and usage statistics, is embedded within the hidden data and encrypted as a single object to the disk. Block allocation is managed with a single storage bitmap that manages both the hidden and non-hidden data. To minimise the overhead of block replication, as seen in section 4.5.3, hidden blocks are marked as allocated in the storage bitmap; this prevents these blocks from being allocated to non-hidden files during normal operation. This method of marking the location of hidden blocks alongside non-hidden blocks in the storage bitmap could be used to betray the location of the hidden data.
To combat this, random values are written to every block during initialisation, and a number of abandoned blocks are introduced into the file system. These abandoned blocks are blocks that are marked as allocated in the bitmap but only contain random values. This is done to obfuscate the presence of actual hidden data.

Together with abandoned blocks, a number of dummy files are introduced throughout the file system. These files are periodically allocated and deallocated, resulting in the storage bitmap periodically changing. This is done to prevent "snapshots" of the block bitmap betraying the location of any hidden data.

No metadata is stored separately from the hidden files; as such, hidden files are located using a hash value of the file name combined with an access key. The access key will only provide access to files that were created by a specific user. Allocated hidden files have the ability to hold on to free blocks: when a file is truncated, its inode can still reference the blocks that would normally no longer be allocated to the file. This further obfuscates the hidden file, and makes it difficult to distinguish a hidden file from an abandoned block or a dummy file.

4.6 Summary

This chapter was concerned with different steganographic concepts. The following concepts were discussed in this chapter:

Steganography - discussed as an overview of steganography, in which we introduced a number of different steganographic concepts, including:

- Terminology - domain-specific terminology was introduced to the reader, which is used throughout this and later chapters.

- Historic Steganography - we gave a brief overview of the historic use of information hiding.

- Currency Protection Mechanism - we discussed this concept as an example of a non-digital use of steganography.
- Copyright Protection Mechanism - in this section we discussed the use of steganography as a method for providing digital rights management (DRM).

Digital Steganography - in this section we discussed the popular digital applications of steganography, including:

- Image Steganography - arguably the most well-known application of information hiding. In this section we discussed the hiding of information in images.

- Audio Steganography - where we discussed the hiding of information in audio data.

- Least Significant Bit Attacks - we discussed methods of attacking steganographic content based on the least significant bit approach.

Cryptographic File Systems - we discussed the family of file systems which transparently encrypt user data, as a precursor to later sections. We introduced the following implementations:

- The Cryptographic File System - CFS - in which we discussed this early cryptographic file system implementation.

- Cryptfs - a cryptographic file system which is implemented as a loadable kernel module.

- Linux Cryptoloop Driver - an implementation of the Linux loopback driver which adds a cryptographic layer.

Steganographic File Systems - where we discussed the family of file systems which attempt to embed data within a file system implementation. We discussed the following:

- File System Assumptions - we introduced a number of assumptions which relate to the cover file system.

- Anderson, Needham, and Shamir - the first to describe a steganographic file system, laying down the framework for other implementations.

- McDonald and Kuhn - a steganographic file system implementation which is more closely modelled on a traditional file system.

- Pang, Tan, and Zhou - a steganographic file system implementation which operates in a slightly different way, using dummy and abandoned blocks.

4.7 Conclusion

The ability to hide information in digital data is an important part of the electronic data that we interact with. We interact with steganographic techniques on almost a daily basis, and probably never realise it. Steganography is gradually finding its way into many different forms of digital data. As this data becomes more accessible through the growing use of the Internet, there is a developing need for data to be secured, and steganography offers the techniques for doing so. There are often situations where cryptography is simply not sufficient to secure data, such as top-secret military information. Steganography offers another level of protection against such sensitive data being compromised.

In this chapter we introduced the concepts and applications of steganography. In section 4.2 we explained how steganography was historically used to hide important messages, and how data hiding techniques are used in non-digital data, such as currency. We then introduced some digital techniques for steganography in section 4.3, namely image steganography and audio steganography. In section 4.4 we explained some concepts of transparent cryptographic file systems and discussed some implementations. Finally, we introduced steganographic file systems in section 4.5, and discussed how data hiding is achieved in some file system implementations.

Concepts introduced in this chapter are fundamental to chapters 5, 6, and 7. Steganographic concepts and terms are used extensively throughout the following chapters in order to describe certain components. The concepts introduced in chapters 2, 3, and 4 form the basis for the remaining chapters. In the following chapter we will discuss the design and implementation of a Secure Steganographic File System.
Part II

SSFS: The Secure Steganographic File System

Chapter 5

SSFS: File System Implementation

5.1 Introduction

A steganographic file system can be used to ensure the security of information, not only through conventional encryption mechanisms, but by allowing data to be hidden from unauthorised access. Large amounts of information can be stored in a secure manner within the structure of a host file system, which provides an advantage over traditional image or audio steganography.

The implementation of a steganographic file system requires that a number of different aspects be addressed, such as the duplication of hidden data. A steganographic file system will require a careful interaction between the so-called "hidden" and "non-hidden" data in order to maximise the overall performance and reliability of the system. Additional information security features must be addressed, such as the use of cryptography, in order to provide a solution which can effectively secure steganographic content.

In order to address the problems with existing steganographic file system implementations, this chapter introduces our Secure Steganographic File System (SSFS). Firstly, in section 5.2 we discuss a number of terms which will be used in this and later chapters to describe components of the steganographic file system. We go on in section 5.3 to discuss the problems with existing steganographic file system implementations. In section 5.4 we introduce the aim of this file system implementation by discussing a number of aspects which must be addressed in order to achieve a non-duplicating steganographic file system. Finally, in section 5.5 we discuss the basic construction of the steganographic file system with respect to the interaction of the different components.
Table 5.1: SSFS definitions

System: A high-level computing environment which contains an operating system that allows access to a block device (such as a hard disk drive), usually through the use of a Kernel API.

Host File System: A file system implementation that contains (in some form) a superblock, storage map, and file and directory control blocks. The host file system is used as a container for the hidden file system.

Hidden File System: A file system that will reference hidden data within the host file system. The hidden file system is embedded within the host file system.

Non-Hidden Data: Data that is stored on the host file system.

Hidden Data: Data that is stored on the hidden file system.

Shell: A command line interface (CLI) which allows an operator to interact with the file system using human-understandable commands.

5.2 Definitions

In order to fully discuss the implementation of our steganographic file system, a number of concepts must be defined; these can be seen in table 5.1. These concepts will be used throughout the following chapters to describe certain components of the steganographic file system implementation, and are essential to describe the interactions between different components of the overall system.

5.3 Problems with Existing Implementations

In order to fully understand the proposed steganographic file system, we will now critically evaluate some problems with the existing steganographic file systems which SSFS will try to address. We will evaluate the implementations of McDonald and Kuhn and of Pang, Tan, and Zhou, which were introduced in sections 4.5.3 and 4.5.4.

5.3.1 McDonald and Kuhn

The steganographic file system implementation by McDonald and Kuhn (see section 4.5.3 on page 77) stores hidden data in a backward compatible file system implementation.
The major drawback of this particular implementation is the duplication of steganographic data which has to occur in order to avoid collisions between hidden data and non-hidden data. Hidden data is encrypted; as such, its contents are only accessible with the correct passphrase. However, the existence of the hidden data can be betrayed through a combination of the data duplication and a low-level examination of the physical file system blocks. In order to limit the exposure of hidden data, and therefore limit the risk of detection, it would be advantageous to avoid data duplication altogether. Data duplication will create instances where an exact copy of the hidden data is stored in two or more physical locations on a device.

In order for a steganographic file system to effectively hide data, the hidden data is masked to appear as random artefacts from the multiple manipulations of a disk block. During the normal operation of a file system, the possibility that unused data in two or more unallocated blocks will be exactly the same is extremely low. Steganographic content can therefore be detected by physically examining the file system and finding two or more unallocated blocks which contain the same random data. If this condition is met, then it can be said with relative certainty that the host file system contains steganographic data. A passphrase will however still need to be obtained in order to actually view the steganographic content, which is computationally expensive. It would be advantageous to avoid data duplication in order to eliminate the possibility that hidden data could be detected by examination of the file system structure.

Summary

Presented below is a summary which outlines the findings of the discussion above.

- The file system implementation is backward compatible with the host file system.

- In order to avoid collisions between the hidden and non-hidden data, duplication of the hidden data is used.
This has the effect of:

- causing two or more physical blocks to contain the same steganographic content; and

- increasing the possibility that the steganographic content can be detected by low-level examination of the device.

- The file system contains its own set of control structures, which are also duplicated. The control structures reference all duplicates of the hidden data.

In the following section we will evaluate the implementation described by Pang, Tan, and Zhou.

5.3.2 Pang, Tan, and Zhou

This particular implementation, discussed in section 4.5.4 on page 78, does not suffer from the duplication of data which exists in the previously discussed implementation. The authors however make no attempt to hide the fact that a steganographic file system is in place. There is no distinction between the host and hidden file systems; there is only a single file system which can hide data. This has performance benefits, but lacks the inherent plausible deniability of utilising a cover file system.

In order to manage the storage of data in this implementation, all the storage information is stored in a common location. This however could provide a single point of failure for the entire system. Should the storage object become corrupt, all data, hidden and non-hidden, will be lost. The conventional file system design of distributing the file system metadata structures to different physical locations minimises the risk of file system data becoming corrupt through a single failure.

The use of abandoned and dummy blocks allows for an efficient method of obscuring the location of hidden data, but this does not ensure that data will remain hidden. This implementation relies on strong cryptography in order to ensure that data will remain secure. The lack of a cover file system does betray the location of hidden data, as the exact physical positions must be explicitly marked in a shared storage map.
Given this, the implementation does not create a steganographic file system in the purest sense of the concept.

Summary

Below we present a summary of the above discussion.

- All control information for the hidden data is stored in a single object.

- The storage map marks physical blocks allocated to both hidden and non-hidden data.

- Blocks allocated to hidden data are not automatically reclaimed when they are no longer used, in order to obscure the location of the hidden data.

- Abandoned and dummy blocks are used to obscure the presence of the hidden data. These blocks are moved around the file system in order to obscure the location of the hidden data.

- The lack of a cover file system could betray the presence of the hidden data.

In the following section, the aim of the proposed implementation will be discussed, together with the aspects which a steganographic file system should possess.
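The low-level examination attack described in section 5.3.1 can be sketched as follows; a hypothetical scan for identical unallocated blocks, whose presence suggests replicated hidden data:

```python
def find_duplicate_blocks(blocks, allocated):
    """Scan unallocated blocks for identical content. During normal operation
    the chance of two unallocated blocks matching exactly is extremely low,
    so a match suggests replicated steganographic data."""
    seen, duplicates = {}, []
    for index, content in enumerate(blocks):
        if index in allocated:
            continue                       # only unallocated blocks are of interest
        if content in seen:
            duplicates.append((seen[content], index))
        else:
            seen[content] = index
    return duplicates

# Hypothetical block contents: block 4 is a replica of hidden block 1.
blocks = [bytes([i]) * 16 for i in range(6)]
blocks[4] = blocks[1]
print(find_duplicate_blocks(blocks, allocated={0}))   # reveals the pair: [(1, 4)]
```

It is exactly this condition that SSFS's non-duplicating design aims to make impossible to satisfy.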
The aim of the following chapters is to describe the SSFS implementation, which provides a secure and convenient mechanism for embedding arbitrary user data within a host file system. A further aim is to present an implementation which is free from having to duplicate data in order to avoid collisions, as described in the previous chapter (see section 4.5, page 75). The proposed solution is a dynamic reallocation mechanism that will transparently reallocate hidden data.

This implementation will take the form of a file system within a file system; this will allow for efficient storage and provide the foundation for dynamic reallocation. The steganographic file system will focus on modifying an existing file system implementation in order to support embedded data. This will also allow the host file system to be accessed by a standard file system driver. Backward compatibility with the original host file system driver must be maintained in order to effectively obscure the hidden data, and to provide a plausible deniability feature.

In the following sections we will discuss each of the goals for SSFS stated above: security, consistency, transparency, backward compatibility, and dynamic reallocation.

Security

Information security is achieved through a combination of information hiding and cryptography, providing confidentiality, integrity, and availability. The security of the user data is achieved through the use of cryptography, which allows the file system to restrict unauthorised access to the hidden data. Only by supplying the correct passphrase will access to the data be allowed. A strong cryptographic algorithm will ensure that the user data cannot be accessed or modified without knowing the passphrase, thus providing confidentiality and integrity. Availability is assured through the file system implementation, which will manage and ensure access to hidden data when required.
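A minimal sketch of the per-block encryption idea: hidden blocks are only intelligible with a key derived from the passphrase. The XOR keystream and the toy key derivation below are deliberate stand-ins for a real cipher and KDF, and all names are our own; a real implementation would use a modern block cipher in a suitable mode.

```c
#include <stddef.h>
#include <stdint.h>

#define FS_BLOCK_SIZE 1024

/* Toy key derivation from a passphrase (FNV-1a style mixing; a stand-in
 * for a real key derivation function). */
uint32_t derive_key(const char *passphrase)
{
    uint32_t k = 2166136261u;
    for (; *passphrase; passphrase++)
        k = (k ^ (uint8_t)*passphrase) * 16777619u;
    return k;
}

/* XOR "cipher" applied to one file system block in place. Applying it twice
 * with the same key restores the plaintext; a real implementation would use
 * a strong block cipher here instead of this illustrative keystream. */
void crypt_block(uint8_t block[FS_BLOCK_SIZE], uint32_t key, uint32_t block_no)
{
    for (size_t i = 0; i < FS_BLOCK_SIZE; i++)
        block[i] ^= (uint8_t)((key >> (8 * (i % 4))) ^ block_no ^ (uint8_t)i);
}
```

Mixing the block number into the keystream illustrates why identical plaintext blocks should not produce identical ciphertext blocks on disk.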
The choice of algorithm is important, as it will greatly influence the overall performance of the file system. A good example of a suitable cryptographic algorithm is Serpent, as discussed in section 3.3.3 on page 43. Recall that Serpent is designed to be a fast and secure, modern cryptographic algorithm and is thus well suited to this kind of application; however, any modern cryptographic algorithm, such as Rijndael [17], would provide a good basis for data encryption. The Serpent algorithm was chosen for this implementation because it is not patented and is in the public domain, with no restrictions on its use.

Consistency

It is important for the data contained within the file system, hidden and non-hidden, to remain consistent. This is especially important for hidden data, which exists in an environment that supports dynamic reallocation. Data contained in the hidden file system should remain unchanged even after multiple reallocations. Precautions must be taken with both the metadata and the user data to ensure that they remain consistent even after reallocations have taken place.

Data consistency is difficult to achieve without mechanisms such as a journal. There is always the possibility that a catastrophic event, such as a hard drive failure, would render user data inaccessible; this is unavoidable. Throughout the development of a steganographic file system, great care needs to be taken to ensure that all the data remains consistent.

Transparency

Transparency refers to the ability of the host file system and hidden file system to interoperate on the same physical device without interfering with each other. This allows the user of the host file system to be unaware of the existence of the hidden file system, or of the hidden data. The mechanism provided within the file system implementation must support transparent access to data in both the host and the hidden file system.
The operation of the host file system must appear to be completely normal; it must behave as it would under normal circumstances. This is an important consideration when taking dynamic reallocation of the hidden data into account, because of the interaction which will have to take place between the host and hidden file systems. Any interactions between the two must be designed so as not to produce any behaviour that would indicate that another file system is in operation.

Backward Compatibility

Backward compatibility refers to the ability of the steganographic file system implementation to remain compatible with a standard driver for the host file system. The host file system is implemented from an existing file system; therefore a standard driver for the original file system must be able to access the data stored within the host file system. For example, if the host file system is constructed from the FAT file system, then a standard FAT driver should be able to read and write to the file system as if it were a normal FAT file system. This is achieved by ensuring that the steganographic file system implementation does not modify existing file system structures, while maintaining interoperability between the host and hidden file systems.

Backward compatibility aids plausible deniability by allowing a user to access the steganographic file system with the standard file system driver. This allows the user to deny the existence of the steganographic content, because as far as the standard file system driver is concerned, the steganographic content does not exist. Note, however, that accessing the steganographic file system with the standard file system driver will render the hidden data permanently inaccessible, as the protection of the steganographic file system implementation, which ensures interoperability between the host and hidden file systems, is no longer present.
This can be useful if a user is requested to access data whilst under duress, as the presence of the data can effectively be denied.

Dynamic Reallocation

Dynamic reallocation is the ability of the file system to automatically reallocate hidden data to a different physical block. This occurs when the host file system requires a physical block which contains hidden data for non-hidden data; this will be discussed later in this chapter. Dynamic reallocation avoids having to duplicate the hidden data to survive collisions with the non-hidden data: hidden data is simply moved to a different physical location and the hidden file system control structures are updated. This allows the file system to effectively mask the presence of the hidden data, as no duplicates are stored, which limits the overall exposure of the hidden data.

In the following section we will discuss the need for a steganographic file system, and the role it can play in securing information.

5.4.1 The Need for a Steganographic File System

As more of our personal information moves into the electronic realm and is distributed via email and the Internet, there is a growing need for that information to be protected. If our personal information falls into the wrong hands we open ourselves up to issues such as identity theft and fraud.

A steganographic file system adds another layer to conventional information security in a transparent and convenient way. This gives individuals the confidence to store personal and sensitive information on a computer system, removing the fear that this information can be obtained without their express permission and knowledge. Information hiding techniques can be used to obfuscate not only the data content, but also the presence of the data. Data contained within a steganographic file system can only be revealed with the express permission of the user.
Information which is stored on a computer is not inherently secure from attack. A resourceful user can gain access to almost any data stored on a computer, with varying degrees of effort. Cryptography allows us to secure our information from outside attack, but the evidence of encrypted data is clearly visible. Information hiding allows the existence of data to be denied, giving users greater control over access to their data.

Information security is of utmost importance as we move towards a cyber-existence. As our presence on the Internet increases, users must take adequate steps to ensure that personal information remains secure. A steganographic file system can be used to hide important information, giving us the ability to deny the existence of our information; this will ensure that sensitive information can remain almost completely secure.

In the following section we will discuss a number of limitations which exist for steganographic file systems.

5.4.2 Limitations of a Steganographic File System

Unlike image and audio steganography (discussed in the previous chapter), file system steganography does not allow data to be completely hidden from a forensic examination of a hard disk drive; this is inherent in the way in which data is stored on the physical disk. Steganographic file systems draw their strength from an effective organisation of the hidden data on the physical device, and from the use of cryptographic algorithms. They aim to store hidden data in such a way that it appears to be random artefacts of normal file system use. Most modern file systems do not remove the data associated with a file when it is deleted; they only mark the corresponding file system blocks as unallocated, and reclaim the associated file control blocks.
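The deletion behaviour described above can be sketched as follows: removing a file clears allocation bits but leaves the block contents untouched, which is what produces the residual artefacts that hidden data imitates. The bitmap layout and all names here are illustrative, not taken from any particular file system.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BLOCKS 64

struct toy_fs {
    uint8_t bitmap[TOY_BLOCKS / 8];  /* one bit per block: allocated or not */
    uint8_t data[TOY_BLOCKS][16];    /* block contents (tiny blocks) */
};

void set_allocated(struct toy_fs *fs, unsigned int b, int on)
{
    if (on) fs->bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
    else    fs->bitmap[b / 8] &= (uint8_t)~(1u << (b % 8));
}

int is_allocated(const struct toy_fs *fs, unsigned int b)
{
    return (fs->bitmap[b / 8] >> (b % 8)) & 1u;
}

/* "Delete" a file occupying the given blocks: only the bitmap changes;
 * the data remains on disk as an artefact of normal file system use. */
void delete_file(struct toy_fs *fs, const unsigned int *blocks, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        set_allocated(fs, blocks[i], 0);
}
```

After `delete_file`, the blocks read as free in the bitmap while their old contents survive until overwritten, which is exactly the ambiguity a steganographic file system exploits.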
Leaving deleted data in place improves the performance of the file system, but it also leads to artefacts, or remnants of deleted files, accumulating in the unallocated blocks over time.

Embedding data in the unallocated blocks of a host file system introduces a limitation: host file system blocks must be considered completely allocated. For example, should the host file system have a block size of 1024 bytes but write only 128 bytes to a block, the whole block must be marked as allocated, even though there are theoretically 896 unallocated bytes. This file slack space is not used by the steganographic file system, as the host file system and the hidden file system are generally two separate entities, designed to have only very minor interactions, if any. The structure of the data contained in the host file system is not exposed to the hidden file system implementation; this data is considered raw in nature. The high-level construction of the files and directories in the host file system is not parsed by the hidden file system; the physical blocks which they occupy are simply considered allocated.

Although storing hidden data in the slack space would increase the overall storage capacity of the hidden file system, a large administrative overhead would be introduced, as hidden data would no longer be contained in discretely sized file system blocks. This would affect the encryption and dynamic reallocation mechanisms, as "file system blocks" of unequal size introduce large performance and administrative limitations. In order to maximise performance, all hidden file system blocks are considered to be equally sized, which allows them to be reallocated easily, thus eliminating data duplication. However, the total maximum size of the hidden file system is reduced as a result.
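The 1024/128 slack-space example above generalises to a simple computation. The helper names are ours; this is a sketch of the arithmetic, not part of the SSFS implementation.

```c
/* Slack space: unused bytes in the final block of a file. For a 1024-byte
 * block holding only 128 bytes this yields the 896 bytes mentioned above.
 * Precondition: bytes_in_block <= block_size. */
unsigned int slack_space(unsigned int block_size, unsigned int bytes_in_block)
{
    return block_size - bytes_in_block;
}

/* Number of whole blocks the host must mark allocated for a file,
 * rounding up: even a partially filled final block is fully allocated. */
unsigned int blocks_needed(unsigned int block_size, unsigned int file_bytes)
{
    return (file_bytes + block_size - 1) / block_size;
}
```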
The analysis and investigation of forensic techniques relating to the detection of SSFS is not the main focus of this dissertation, and therefore falls outside the scope of this document. However, forensic analysis of the hidden file system would allow forensic examiners to detect steganographic content on a physical device. This would be advantageous in preventing the abuse of a steganographic file system through the storage of illegal data.

In the following section we will discuss the basic construction of SSFS. This section will serve as an overview for the detailed construction which will be presented in later sections, and will help to introduce concepts and aspects of the overall design.

5.5 Basic Construction

To facilitate a detailed discussion of a steganographic file system, this section will introduce a number of concepts to help understand how the host and hidden file systems will interact. The steganographic file system described in this dissertation is implemented in a Linux environment, because Linux is open and fairly well documented; as such, a number of terms and concepts used in this dissertation are based on those found in a UNIX-type environment.

In order to achieve a non-duplicating steganographic file system, a file-system-within-a-file-system approach will be taken. The steganographic file system is divided into two parts, namely the host file system and the hidden file system. We assume that there are a number of hooks into the kernel API to provide low-level functions such as reading and writing file system blocks. Each of these two parts will be discussed below.

5.5.1 Modes of Operation

There are two different high-level modes of operation that must be considered when interacting with a steganographic file system: host-only mode and hidden-only mode. These modes operate on either the non-hidden or the hidden data on the block device.
Both these modes are discussed below.

Host-Only Mode

This is what would be considered normal operation of a file system; it performs as a normal file system would. A shell would allow the operator to access only non-hidden data. This is the mode in which the system would operate on a day-to-day basis. The host-only mode does, however, require access into the hidden file system in order to facilitate the dynamic reallocation mechanism; in terms of access to data, only non-hidden data is exposed to the user while in this mode.

Hidden-Only Mode

This mode is used to access the hidden data contained within the hidden file system. Access is controlled through a dedicated command line interface, and through access control mechanisms such as a passphrase. The hidden data can only be accessed with pre-existing knowledge of the passphrase.

The separation of access to the hidden and non-hidden data is done in order to clearly define the separate roles the data plays in the overall system. The hidden data is kept more secure if access to it is tightly controlled, which is why the interactions between the two modes are kept to a minimum at all times.

5.5.2 The Host File System

The host file system is derived from an existing file system implementation, such as the Ext2 or FAT file system. The host file system implementation must remain backward compatible with the original file system implementation; this is done to provide a level of plausible deniability. Should the steganographic file system come under scrutiny, the host file system would appear to be the original implementation.

Host file system layout, blocks 0 to n:

| Superblock | Storage Bitmap | Inode Bitmap | Inode Table | User Data |

Figure 5.1: Simple host file system layout
For example, if the host file system is constructed from the FAT file system, then a normal FAT file system driver should be able to access all the non-hidden data as if it were a normal instance of the host file system. The steganographic implementation takes advantage of the way in which the host file system stores data on the physical device; this allows the hidden data to be embedded within the unallocated blocks of the host file system.

For the purposes of the following chapters, a simple host file system will be used. This file system has the characteristics listed below, with its structures arranged on disk as shown in figure 5.1.

Superblock - to manage control information about the file system. The superblock has a size of 1 file system block.

Storage Bitmap - to mark the allocated blocks within the file system. The storage bitmap has a variable size, depending on the size of the physical device.

Inode Storage Bitmap - to mark the allocated inodes within the inode table. The inode bitmap has a variable size, depending on the size of the physical device.

Inode Table - to store the file and directory control blocks. The inode table has a variable size, depending on the size of the physical device.

5.5.3 The Hidden File System

The hidden file system is the component of the steganographic file system which is used to store and reference the hidden data embedded within the unallocated blocks of the host file system. The hidden file system is a complete file system in its own right, laid out as shown in figure 5.2, and contains the following metadata structures:

Hidden file system logical layout, logical blocks 0 to n:

| Superblock | Translation Map | Inode Table | User Data |

Figure 5.2: Hidden file system logical layout

Superblock - to manage basic storage information about the hidden file system.

Translation Map - to facilitate storage and dynamic reallocation.
Inode Table - to store and manage the file and directory control blocks.

Every part of the hidden file system has to be reallocatable within the host file system, except for the superblock. Normal operation of the host file system will not be hampered in any way. The hidden file system therefore has two different views: the logical view and the physical view. These two views are discussed in the following section.

5.5.4 Logical and Physical View

The logical and physical views of the hidden file system are used to facilitate the dynamic reallocation of the hidden data, while providing a consistent way to store and reference it. The primary mechanism through which this is achieved is the Translation Map (which will be discussed in the following chapter), which stores a paired value for each allocated block within the hidden file system. This paired value allows hidden data to be stored at a consistent "logical" position in the hidden file system, while in actuality it can be stored at any "physical" position within the host file system.

The integration between the hidden and host file systems is also achieved through the logical and physical views. As seen in figure 5.3, data within the hidden file system is logically allocated in a contiguous manner, but the actual physical position of the data can be in any of the unallocated blocks within the host file system.

[Figure 5.3: Hidden and host file system integration, showing host-allocated blocks, hidden-allocated blocks, and the position of hidden data within the host file system]

In the following section an operational scenario is presented in order to demonstrate how the hidden and host file systems will operate with a dynamic reallocation mechanism in place.
The following section is presented in a high-level manner in order to simply demonstrate the operation of the steganographic file system.

5.5.5 Operational Scenario

In order to demonstrate the operation of a steganographic file system, an operational scenario will now be presented. This scenario will demonstrate the features and requirements of the host and hidden file systems. It is a simplistic overview of the operation of the steganographic file system. All the steps described below are demonstrated graphically in figure 5.4 on page 101.

- Alice initialises a steganographic file system on a block device. This process initialises the host and the hidden file system structures on the disk.
- Alice stores data on the host file system (File A); this would be considered normal operation of the host file system.
- Alice stores data on the hidden file system (Hidden File B); this hidden data is stored within the unallocated blocks of the host file system.
- Alice stores further data on the host file system (File C). The host file system still considers the blocks containing hidden data as unallocated, and would overwrite the hidden data with the new data. The following events must therefore occur:
  1. The file system detects that there is steganographic content in the unallocated blocks where the data is to be written.
  2. The hidden data is moved to a new location within the host file system.
  3. The host file system operation can now continue as normal.
- Alice now wishes to access the hidden data that was previously stored on the hidden file system (Hidden File B). The steganographic file system must return the correct data to Alice, even if the hidden data was reallocated.

5.6 Summary

In this chapter we introduced SSFS by discussing existing implementations, introducing the aim of our implementation, and discussing its basic construction.
We covered the following sections:

Definitions - where we discussed a number of concepts which will be used throughout the following chapters to describe components of the steganographic file system.

Problems with Existing Implementations - where we critically discussed the problems with the following existing file system implementations:

- McDonald and Kuhn
- Pan, Tan, and Zhou

Aim - where we outlined the aim for this steganographic file system. We then discussed the following:

- The Need for a Steganographic File System - where we outlined the need for a steganographic file system in modern computer systems.
- Limitations of a Steganographic File System - where we discussed a number of limitations of steganographic file systems.

Basic Construction - in this section we outlined the basic construction concepts for the steganographic file system implementation. We then discussed the following concepts:

- Modes of Operation - where we discussed how the host and hidden file systems will access their data.
- The Host File System - in this section we discussed the basic construction and operation of the host file system.
- The Hidden File System - where we discussed the basic layout of the hidden file system.
- Logical and Physical View - in this section we defined the differences between the logical and physical views of the physical device as used by the host and hidden file systems.
- Operational Scenario - in which we presented an operational scenario in order to describe the workings of the steganographic file system.

[Figure 5.4: Steganographic file system operational scenario - Step 1: Initialisation; Step 2: Host File Addition (File A); Step 3: Hidden File Addition (Hidden File B); Step 4: Host File Addition (File C); Step 5: Access Hidden File (Hidden File B); legend: allocated host block, allocated hidden block, dynamic reallocation]

5.7 Conclusion

Information security plays an important role within the framework of our modern lives. As more of our information is transmitted electronically, there is a growing threat to our security. Traditional information security mechanisms, such as cryptography, are becoming less effective in securing our personal information. Steganography can be used to add another layer of protection to our information, by hiding the presence of our data.

A steganographic file system allows a large amount of data to be stored on a physical device, with the existence of that data revealed only with our express permission. This gives us the ability to store our personal information with confidence that it will not be discovered.

In this chapter we discussed the design of a steganographic file system which allows information to be stored within the unallocated blocks of the host file system. In section 5.2 we introduced a number of terms used to describe the components of a steganographic file system. We then went on to discuss the problems with existing systems, and in section 5.4 we discussed the aim of this steganographic file system by introducing a number of different aspects which the file system must satisfy. Finally, in section 5.5, we gave an overview of the basic operation of the steganographic file system, in order to demonstrate how the component parts interact.

This chapter is the basis for chapters 6 and 7, which are fundamental to the discussion on dynamic reallocation in chapter 9. In the following two chapters we will discuss a number of file system structures and their implementation; this will allow us to define the working framework for SSFS.

Chapter 6

File System Structures for SSFS

6.1 Introduction

The design of the file system structures plays an important role in defining the layout of the steganographic data on the disk.
These structures will be modified by file system operations in order to manage all aspects of the hidden data on the physical device. The effectiveness of the underlying file system structures plays an important role in the management and performance of the file system. SSFS requires well-defined data structures that can effectively manage and reference hidden data; this allows operations such as encryption and dynamic reallocation to be applied at a later stage.

In this chapter we outline the structures used in SSFS in order for the hidden file system component to effectively store and retrieve data. We also discuss the initialisation of these structures, with emphasis on the limitations which the host file system introduces. In section 6.2 we discuss the construction of the structures, and describe each internal field in detail. We continue in section 6.3 by discussing the initialisation of these structures, as this plays an important role in determining the initial state of the file system.

6.2 File System Structures

The hidden file system consists of a number of different structures which control the layout of the hidden data within the file system. These structures are similar to the corresponding structures found in normal file systems; their construction is discussed in the following sections. The structures discussed below are of vital importance, as they play a role in all aspects of managing the hidden data within the hidden file system, and their construction must always be geared to achieve this end. The structures which will be discussed are the Superblock, the TMap Array, the Translation Map, the Inode Table and Entries, and the Directory Entries.
6.2.1 Superblock

The superblock is responsible for storing basic control information about the hidden file system, such as the block size, the location of other important structures, and management information such as the number of available blocks.

The superblock is the only structure in the hidden file system that must remain in a constant position within the host file system. Normally a file system's superblock is stored in the first physical block, and the superblock of the hidden file system is no exception: it is stored in the first physical block. At first glance this presents a problem, as the first file system block already contains the host file system's superblock. However, the host file system does not use the entire first block to house its superblock. The host file system reserves the entire first disk block for its superblock structure, which, depending on the host file system block size, can be anywhere from 1024 to 8192 bytes in size. The host superblock occupies only a small portion of that, allowing the hidden file system's superblock to be stored in the slack space directly after the host file system's superblock, within the same first file system block.

The superblock is stored in the first logical hidden block (logical position 0) in the hidden file system. Logical position 0 is the only logical block which must coincide with its physical block (i.e. logical block 0 always references physical block 0). The structure of the hidden superblock can be seen in listing 6.1, followed by a description of each of the elements of the structure. Each field in the superblock is used to store and manage information concerning the overall hidden file system, and every structure stored in the hidden file system must be locatable through the superblock.
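The placement described above, with the hidden superblock directly after the host superblock inside physical block 0, amounts to a simple bounds check. The host superblock size used in the test is an assumed example value, and the helper names are ours.

```c
/* Byte offset of the hidden superblock within physical block 0: it begins
 * immediately after the host superblock structure. */
unsigned int hidden_spblk_offset(unsigned int host_spblk_bytes)
{
    return host_spblk_bytes;
}

/* The hidden superblock fits only if block 0 has enough slack space left
 * after the host superblock. */
int hidden_spblk_fits(unsigned int block_size,
                      unsigned int host_spblk_bytes,
                      unsigned int hidden_spblk_bytes)
{
    return host_spblk_bytes + hidden_spblk_bytes <= block_size;
}
```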
Each of the fields in the superblock will now be discussed.

Listing 6.1: Superblock structure

 1  // Define the magic numbers for the superblock
 2  #define HIDDEN_FS_SPBLK_MAGIC  0x48535042 // 'HSPB'
 3  #define HIDDEN_FS_SPBLK_MAGIC2 0x48454e44 // 'HEND'

 5  typedef struct hiddenfs_superblock
 6  {
 7      unsigned int magic;   // set to HIDDEN_FS_SPBLK_MAGIC
 8      unsigned int flags;

10      unsigned int inode_number;
11      unsigned int inode_table_start;
12      unsigned int inode_table_size;

14      unsigned int tmap_start;
15      unsigned int tmap_size;

17      unsigned int num_blocks;
18      unsigned int num_blocks_used;

20      unsigned int root_inode;
21      unsigned int block_size;

23      unsigned int iv;
24      unsigned int magic2;  // set to HIDDEN_FS_SPBLK_MAGIC2
25  } hiddenfs_superblock;

Superblock Control and Consistency Fields

The first fields of interest are the magic numbers, magic and magic2. These 32-bit numbers are used to check the consistency of the superblock. Should either magic number not match its predefined constant, the file system will fail a consistency check. Although this is a very rudimentary way to check file system consistency, it does provide a simple and quick method of determining whether the superblock has become corrupt. In a dynamically changing environment, such as a file system, this is of utmost importance.

The next field is the flags field. This field stores file system flags which control the operation of the hidden file system. There are two flags defined for the hidden file system: Preserving mode and Sacrificial mode. These two modes will be discussed in detail in the following chapter; they control how the steganographic file system behaves should there be no unallocated host file system blocks available for dynamic reallocation.

Superblock Inode Table Fields

The next fields store information about the inode table. The number of inodes available to the hidden file system is stored in inode_number.
The logical hidden file system block where the inode table can be located is stored in inode_table_start. The size of the inode table is stored in inode_table_size; this value stores the number of hidden file system blocks allocated to the inode table. These fields allow the inode table to be located, which in turn allows a particular inode entry to be located.

Superblock Translation Map Fields

The Translation Map information is stored next in the superblock. The fields tmap_start and tmap_size store the logical starting block of the Translation Map and the number of hidden file system blocks allocated to the Translation Map, respectively. The Translation Map allows the hidden file system to translate between logical hidden file system blocks and host physical blocks.

Superblock Root Directory Field

The inode number of the root directory is stored in root_inode. This allows the hidden file system to locate the root directory, which is used to store any subsequent hidden files and directories. The root directory is created during file system initialisation, which will be discussed in the following section.

Additional Superblock Control Fields

The next field is block_size, which stores the byte size of each file system block. This value is the same for both the hidden and host file systems in order to maintain maximum interoperability. The following field, iv, stores a 32-bit initialisation vector that is used to control an encryption algorithm when accessing various parts of the file system.

One may notice that there are very few "metadata" related fields in the superblock, such as the name of the volume, which one would generally expect to find in a superblock. This is done for two reasons: firstly, to keep the size of the superblock as small as possible by only storing essential information, and secondly, to make the superblock appear less conspicuous.
This should help to slightly obscure the superblock from detection. The superblock has a size of 52 bytes. Recall that the superblock is stored directly after the host superblock, in the same physical location; the small size of the hidden file system's superblock facilitates this.

The next structure that will be discussed is the TMap Array. This structure is considered to be part of the superblock, and it allows the Translation Map, which is discussed in a later section, to be located in any physical location on the device.

6.2.2 TMap Array

The TMap Array is a very simple structure that lists all the physical blocks allocated to the Translation Map, which will be discussed in the following section.

The TMap Array is stored directly after the hidden superblock, in the same physical block, so that its position is always consistent. This is done so that the Translation Map can always be located, even if it has been reallocated on disk. Conceptually the TMap Array can be seen as part of the hidden superblock.

The physical position of each block allocated to the Translation Map is stored within the TMap Array. The Translation Map may exist in any of the unallocated physical blocks, and must be locatable by both the hidden file system and the host file system in order for dynamic reallocation to take place. The physical blocks allocated to the Translation Map need not be contiguous; the sole purpose of the TMap Array is to allow the Translation Map to be located.

For every physical block allocated to the Translation Map, a single integer value is required in the TMap Array. For instance, if the file system block size is 1 KiB and the Translation Map requires 16 KiB, then the TMap Array will require 16 4-byte integers (assuming a 32-bit architecture), and will have a total size of 64 bytes. The size, however, is dependent on the overall size of the hidden file system and the file system block size.
The determination of the exact size will be discussed in the following section. The definition of the TMap Array can be seen in listing 6.2, where the size of the array is obtained from line 15 of listing 6.1.

Listing 6.2: Definition of the TMap Array

1 unsigned int tmap_array[hiddenfs_superblock.tmap_size];

As can be seen from the above listing, the TMap Array is simply an array of integer values, one for each of the blocks allocated to the Translation Map. In the following section the Translation Map will be discussed; this structure allows hidden data to be stored in a logically contiguous manner, yet be located at any physical location.

6.2.3 Translation Map

The Translation Map provides the ability for data to be dynamically reallocated within the hidden file system. It allows hidden data to be stored in a constant logical order, but located at any physical position within the host file system.

The Translation Map is a structure which maps logical blocks to physical locations. All hidden data is organised in terms of its logical position within the hidden file system; when data needs to be stored or retrieved, the Translation Map is used to perform the translation between the logical and the physical position on disk. The structure of the Translation Map is seen in listing 6.3. Conceptually the Translation Map is made up of an array of hiddenfs_tmap_entry structures (as seen on line 7), one entry for each block allocated to the hidden file system.

The Translation Map also provides the storage map for the hidden file system; by using the allocated member (line 3 of listing 6.3) any unallocated blocks can be located. This allows the Translation Map to play the dual role of providing the location of a physical block for a particular logical location, and marking allocated logical blocks.
Listing 6.3: Translation Map structures

1 typedef struct hiddenfs_tmap_entry
2 {
3     unsigned char allocated[1];
4     unsigned int entry;
5 } hiddenfs_tmap_entry;

7 hiddenfs_tmap_entry hiddenfs_tmap[superblock.num_blocks];

Each Translation Map entry is a 5-byte structure that contains two fields. The first field, allocated, marks whether a particular entry is allocated within the hidden file system. The second field, entry, stores the physical block that is mapped to a particular Translation Map entry.

The Translation Map itself is simply an array of hiddenfs_tmap_entry structures, with one entry for every possible hidden file system block. The size of this structure therefore depends on the number of blocks allocated to the hidden file system, and the size of the hidden file system needs to be considered so that the Translation Map does not grow too large.

The 5-byte size of the Translation Map entries means that the structure as a whole will not be "block-aligned"; a Translation Map entry may fall across a block boundary. This was done to obscure the Translation Map slightly. Generally, structures are designed to fall neatly within a file system block, so that they are simple to locate. By allowing the Translation Map not to be block-aligned, its overall structure is obscured.

The TMap Array and Translation Map could be implemented as a B-tree, or an equivalent data structure. This would have the benefit of decreasing the time required to search for an entry in the Translation Map. The Translation Map is instead implemented as a linear array of Translation Map entries in order to minimise the storage requirement on the physical device.

In the following section the structures relating to the inode table will be discussed. These structures are used to reference blocks that are allocated to files and directories, and to manage any related metadata.
6.2.4 Inode Table

The inode table is used to store the hidden file and directory metadata. It is a collection of inode entries, each of which can be used to store information about a file or a directory.

Each inode is constructed from a number of smaller structures, which can be seen in listing 6.4; each element of these structures is described below. An inode entry always references the logical position of data within the hidden file system; this means the inode structure does not need to change when the hidden data is reallocated.

The blocks allocated to an inode entry are stored in extents (see listing 6.4, line 1). An extent stores allocated blocks using a starting position and a length, so a single extent can reference a large number of consecutive blocks. The inode structure also makes provision for indirect and double-indirect blocks (see listing 6.4, lines 10 and 11), which further increase the allowable size of a file. As with a normal UNIX file system, an indirect block references a logical block within the hidden file system which contains a number of extents. The double-indirect block references another logical disk block which contains a list of references to indirect blocks, each of which in turn references a number of extents. Indirect and double-indirect blocks are only likely to be used when the file system is heavily fragmented.

The size of an inode entry plays an important role in determining the overall size of the inode table, and the performance of finding and accessing an inode within it. The size of an inode entry is a power of 2; this ensures that the inode table fits cleanly into the host file system blocks.
For example, if an inode entry has a size of 128 bytes and the file system has a block size of 1024 bytes, then 8 inode entries will fit cleanly into a single file system block, with no overlap into the next block. This allows a particular inode entry to be located quickly.

In order to mark whether a particular inode in the inode table is allocated, the Most Significant Bit (MSB) of the inode_number field (see listing 6.4, line 21) is set to 0 if the inode entry is unallocated or 1 if the inode entry is allocated. This substitutes for a separate storage map marking allocated and unallocated inode entries.

Listing 6.4: Inode table entry

 1 typedef struct hiddenfs_extent
 2 {
 3     unsigned int start;
 4     unsigned int length;
 5 } hiddenfs_extent;

 7 typedef struct hiddenfs_inode_data
 8 {
 9     hiddenfs_extent direct[DIRECT_BLOCKS];
10     unsigned int indirect;
11     unsigned int double_indirect;
12     unsigned int size;
13     unsigned int padding;
14 } hiddenfs_inode_data;

16 typedef struct hiddenfs_inode
17 {
18     unsigned int magic;
19     unsigned int mode;
20     unsigned int key;
21     unsigned int inode_number;

23     hiddenfs_inode_data data;
24 } hiddenfs_inode;

Inode Extent Structure Fields

The hiddenfs_extent structure, as seen on line 1 of listing 6.4, is used to store block information for an inode entry. Recall that an extent stores a "list" of blocks as a run of consecutive logical blocks: the block where the run starts, and the number of blocks that follow it. The start field specifies the start of the extent, and the length field stores the length of the extent.

Extents allow the file system to reference a large number of blocks using a relatively small amount of space. For example, 1000 file system blocks can be referenced by a single extent structure (if the logical blocks are contiguous).
However, if there is significant file system fragmentation, multiple extents may be needed to represent the same number of allocated blocks. A single extent structure has a size of 8 bytes.

Inode Data Structure Fields

The next structure to be discussed is the hiddenfs_inode_data structure, as seen on line 7. This structure is used to reference the blocks allocated to hidden data. The direct field directly references extents from within the inode entry itself. This allows the file system to quickly find the blocks that are allocated to the inode.

The next field is the indirect field; this field stores the logical location of an indirect file system block. This indirect block is used to store a number of extents associated with this inode. The indirect block is only used once the inode structure has run out of directly stored extents.

The next field is the double_indirect field; this stores the logical location of a double-indirect block that is used to store extents. This field is only used once the inode has run out of storage in both the direct and indirect blocks. Assuming a file system block size of 1024 bytes, an indirect block can reference 128 extents. A double-indirect block requires a 4-byte integer (assuming a 32-bit architecture) to store a reference to a single indirect block, so a double-indirect block can reference 256 indirect blocks. Each indirect block in turn references 128 extents. This allows a double-indirect block to reference 32768 extents. Technically this amount of addressable storage is not required, as the hidden file system will typically never be this large; however, it does allow for expandability.

The size of the data referenced by the inode is stored in size. This is simply the byte count of the data referenced by the extents stored in the inode.
The padding field is used to bring the whole inode structure to the required byte size; it can be filled with random values to obscure the structure of the inode.

Inode Structure Fields

The hiddenfs_inode structure stores metadata about the file or directory which a particular inode references. The first field, magic, marks the start of the inode structure, and provides a method of validating the consistency of an inode entry using a magic number. The next field, mode, marks whether a particular inode is reserved for a file or for a directory by storing a corresponding flag value.

The key field stores an encryption key that is used to encrypt and decrypt the data referenced by this inode. Each inode has a number used to reference it within the inode table; this is stored in the inode_number field. As discussed above, the MSB of this field marks whether a particular inode entry is allocated.

Finally, the data field references a hiddenfs_inode_data structure (discussed above), which stores the extent information associated with a particular inode.

Inodes form the control structure for files and directories, allowing the hidden file system to retrieve and maintain a directory hierarchy. The particular file and directory structures for the hidden file system will be discussed in the following section.

Listing 6.5: Directory entry structure

1 typedef struct hiddenfs_directory
2 {
3     unsigned int inode_number;
4     unsigned char name_length;
5     char name;
6 } hiddenfs_directory;

6.2.5 Files and Directories

Files and directories are both regarded as streams of arbitrary bytes stored in the hidden file system. Directories have a regular structure, as seen in listing 6.5.
The sub-directories and files in a directory are stored in a list of hiddenfs_directory structures, each of which is stored sequentially in a hidden file system block.

To navigate the directory structure, two special directory entries must be provided for each directory: the root and the parent. The root directory entry is designated by "."; this entry simply points to the inode number of the directory itself. The parent directory entry is designated by ".."; this entry points to the inode number of the parent directory. In this way the hierarchical directory structure is constructed and can be navigated.

The first field in the hiddenfs_directory structure is the inode_number field. This field stores the inode number of the item that the directory entry references. The name_length field is the length of the name of the item. Finally, the name field is the name of the entry, stored as a linear array of characters.

In order for the hidden file system to operate, these structures need to be initialised. There are a number of considerations to be made during initialisation; the initialisation of each of the structures will be discussed in the following section.

6.3 File System Initialisation

A crucial part of the operation of the hidden file system is correct initialisation within the host file system. Initialisation involves constructing all of the above-mentioned structures and writing them out to the block device. During initialisation all of the parameters of the hidden file system, such as its overall size, are calculated. This phase also determines the overall operational constants of the hidden file system. Initialisation takes place in two parts, namely host file system initialisation and hidden file system initialisation; these are discussed below. The host file system initialisation takes place first, followed by the hidden file system initialisation.
6.3.1 Host File System Initialisation

Ideally the host file system is derived from an existing file system implementation, and is backward compatible with the existing file system drivers. This allows data stored on the host file system to be accessed using a standard file system driver.

This adds a level of security: the file system on the block device will appear to contain only the host file system, and the hidden data will appear to be remnants of normal day-to-day activity, so it is not plausible that a normal user would suspect that hidden data exists.

Initialisation of the host file system can be taken directly from the original host file system creation utility; the resulting host file system structures must be laid out on the disk as they would be if it were a standalone file system. The way in which the host file system structures are placed on the block device must be well understood, so that the position of host data can be used to embed the hidden file system structures. There are a number of factors that will influence the hidden file system initialisation:

- The size of the host's superblock, as this has a direct effect on the overall structure of the hidden file system.
- The host file system block size.
- The number of blocks that exist in the host file system.
- The structure that marks unallocated and allocated blocks.

For the purpose of this discussion, a simple host file system with the following structures is assumed, stored on the block device as seen in figure 5.1:

- A superblock
- A storage bitmap
- An inode bitmap
- An inode table

In the following section we will discuss the initialisation of the hidden file system.
In order to construct the hidden file system's structures on the physical device, the positioning of the host file system structures must be understood, so that the exact locations for the hidden file system structures can be determined.

6.3.2 Hidden File System Initialisation

Initialisation of the hidden file system relies heavily on a clear understanding of how the host file system stores its structures on the physical device. This allows the hidden file system structures to be embedded within the host file system. Initialisation of the hidden file system is partitioned into four separate stages:

- Superblock initialisation
- Translation Map initialisation
- Inode initialisation
- Root directory initialisation

Each of these stages is crucial in setting up the overall operating environment for effective storage of hidden data; they are discussed in the following sections.

Superblock Initialisation

The superblock is the only structure that must be stored in a consistent location. Initialisation of the superblock determines a number of parameters for the hidden file system, such as the number of allocated blocks, the location of the Translation Map, and the location of the inode table.

There are a number of considerations to be made when initialising the superblock, such as the maximum number of blocks that can be allocated to the hidden file system. This value is limited to a percentage of the total number of blocks in the host file system, for instance 5%. Limiting the hidden file system's size is done so that the existence of the hidden data can be obscured. If there were no limit and the hidden file system could grow to fill all the unallocated blocks within the host file system, a conflict would develop between the hidden data and the non-hidden data; this will be discussed in the following chapter.
The superblock stores the location and size of the Translation Map, the location and size of the inode table, and the inode number of the root directory. However, these values can only be set once the corresponding structures have been initialised.

Initialising the superblock is a simple operation. The following steps must be taken:

1. Allocate memory for the superblock structure.
2. Set the magic numbers in the superblock.
3. Set the block size for the file system in the superblock.
4. Set the flags in the superblock.
5. Calculate the number of blocks that will be available in the file system.

Algorithm 3 shows the basic steps needed to initialise the superblock. Only the first few entries need to be initialised, as the rest of the superblock structure will be constructed during the initialisation of the remaining structures. The superblock is written to the disk once the rest of the structures have been initialised. The values for the magic numbers can be seen in listing 6.1.

Hiddenfs.Superblock.magic ← HIDDEN_FS_SPBLK_MAGIC;
Hiddenfs.Superblock.magic2 ← HIDDEN_FS_SPBLK_MAGIC2;
Hiddenfs.Superblock.flags ← set the flags to control the access to the data;
Algorithm 3: Hidden file system superblock initialisation

The constant values seen below are set during the hidden superblock initialisation. These constants are used throughout the initialisation of the other structures in order to determine overall sizes and physical positions. Below, hiddenFS and hostFS represent the hidden and host file systems respectively. The size of the hidden file system is determined by LIMIT. The number of file system blocks in a particular file system is denoted numblocks, and the file system block size is denoted blocksize.
hiddenFSnumblocks = hostFSnumblocks * LIMIT
hiddenFSblocksize = hostFSblocksize
hiddenFSsize = hiddenFSnumblocks * hiddenFSblocksize     (6.1)

Translation Map and TMap Array Initialisation

The Translation Map provides the ability for the hidden file system to dynamically reallocate data, by providing the translation between the logical and physical views of the file system. Each block that can be allocated in the hidden file system requires an entry in the Translation Map.

The Translation Map is implemented as an array of Translation Map entries. The structure of a Translation Map entry can be seen in listing 6.3. Each entry has a size of 5 bytes; this implies that considerations need to be made when storing the Translation Map on the block device.

The Translation Map is also used to manage the free space within the hidden file system, by utilising the allocated byte (seen on line 3 of listing 6.3). By marking this byte as either 0 or 1, the corresponding logical block is considered to be either unallocated or allocated.

Although the Translation Map provides the mechanism for dynamic reallocation of the hidden data, it needs to be dynamically reallocatable itself. This is achieved through the use of the TMap Array. The TMap Array provides a static reference to every physical block that is allocated to the Translation Map. It is simply an array of integer values, where each integer gives the physical location of one block of the Translation Map.

The TMap Array is stored directly after the superblock structure, in the first block, and is therefore not reallocatable. It is considered to be part of the complete superblock structure. The byte sizes of the Translation Map and the TMap Array can be seen in equation 6.2, where translationMapEntry, translationMap and TMap represent a Translation Map entry, the Translation Map, and the TMap Array respectively.
translationMapEntrysize = 5 bytes
translationMapsize = hiddenFSnumblocks * translationMapEntrysize
TMapsize = ⌈translationMapsize / hiddenFSblocksize⌉ * sizeof(integer)     (6.2)

This does present a limitation on the hidden file system, in that the file system block size has a direct impact on the overall size of the hidden file system. This situation arises because the first file system block has to store a number of different structures: the host superblock, the hidden superblock, and the TMap Array.

The effect that this has on the hidden file system can be demonstrated by performing the calculation seen below. First assume that the file system block size is 1024 bytes, the host superblock has a size of 124 bytes, and the hidden superblock has a size of 56 bytes. The size of the TMap Array is then limited to the remaining bytes in the first block. This is calculated by subtracting the sizes of the host superblock and the hidden superblock from the size of the file system block. The remaining value is then divided by the size of each TMap Array entry (a single 4-byte integer). This gives the maximum number of blocks that can be allocated to the Translation Map, by extension the number of Translation Map entries, and thus the total allowable size of the hidden file system. The calculation can be seen in table 6.1 and is represented in equation 6.3.

superblockTotalsize = hiddenFSsuperblocksize + hostFSsuperblocksize
maxTMapsize = ⌊(hiddenFSblocksize − superblockTotalsize) / sizeof(integer)⌋
TMapsize ≤ maxTMapsize     (6.3)

File system block size in bytes               1024
Subtract host superblock size in bytes        −124
Subtract hidden superblock size in bytes      −56
Number of bytes remaining in 1st block        844
Maximum blocks allowed for Translation Map    ⌊844 ÷ 4⌋ = 211
Total size for the Translation Map            211 × 1024 = 216064 bytes
Total number of Translation Map entries       ⌊216064 ÷ 5⌋ = 43212
Total file system size                        43212 × 1024 ≈ 42 MiB

Table 6.1: Calculation of the size of the Translation Map

In order to initialise the Translation Map, the steps below must be taken; they are summarised in algorithm 4.

TranslationMap ← allocate memory for TranslationMap;
/* Calculate the size of the Translation Map */
TranslationMap.size ← Hiddenfs.Superblock.NumberBlocks * 5;
/* Calculate the number of blocks for the Translation Map */
NumberBlocks ← ⌈TranslationMap.size / Hiddenfs.Superblock.BlockSize⌉;
MaximumSize ← (BlockSize − Host.Superblock − Hiddenfs.Superblock) / 4;
/* Check to see if the number of blocks is allowable */
if NumberBlocks > MaximumSize then
    NumberBlocks ← MaximumSize;
    TranslationMap.size ← NumberBlocks * 5;
end
/* Allocate an array of integers for TMapArray */
TMapArray ← allocate NumberBlocks integers for TMapArray;
/* Mark the first entry as allocated - for the Superblock */
TranslationMap[0] ← allocated;
for i = 0 to NumberBlocks do
    /* Mark each block allocated to the Translation Map. When an entry
       is allocated, a physical block must be found in the host file
       system to become the storage location for that entry */
    TranslationMap[i] ← mark as allocated and find a physical block;
    TMapArray[i] ← record physical location of block;
end
Hiddenfs.Superblock ← record location and size of the Translation Map;
Algorithm 4: Translation Map initialisation

1. Allocate memory for the Translation Map structure.

2. Calculate the size (S) of the Translation Map.
   (a) Calculate the number of blocks (N) required to hold the Translation Map.
   (b) Check that the number of blocks does not exceed the maximum allowed.

3. Allocate an array of integers for the TMap Array.

4.
Allocate memory for each Translation Map entry in the Translation Map.

5. Mark the first Translation Map entry as allocated for the superblock (logical block 0).

6. Mark the next N blocks as allocated for the Translation Map.
   (a) For each Translation Map entry to be allocated, find an unallocated physical block in the host file system to store the block.

7. Record the physical locations of the N Translation Map blocks in the TMap Array.

Inode Initialisation

Once the superblock, Translation Map, and TMap Array have been initialised, the inode table for the hidden file system must be initialised. The inodes in the hidden file system are allocated statically; this means that space is reserved during initialisation for all of the inodes that could exist in the file system.

When choosing the number of inodes to store in the inode table, a trade-off must be made between storage requirement and number of inodes. The more inodes that exist in the inode table, the greater the number of files and directories that can exist within the file system; but the more inode entries, the larger the inode table, and hence the greater the storage requirement.

The size of the inode table can be calculated by dividing the number of blocks in the file system by the ratio of file system blocks to inodes, and multiplying the result by the number of bytes per inode entry. For example, if the file system contains 3276 blocks, and there is 1 inode for every 4 disk blocks, there will be 819 inode entries in the inode table. If each inode entry is 128 bytes, the total size of the inode table will be 104832 bytes. This calculation is demonstrated in table 6.2 and presented in equation 6.4.
inodeEntrysize = 128 bytes
inodeTableentries = ⌊hiddenFSnumblocks ÷ 4⌋
inodeTablesize = inodeTableentries * inodeEntrysize     (6.4)

Number of hidden file system blocks       3276 blocks
Size of an inode entry                    128 bytes
Number of inode entries in inode table    ⌊3276 ÷ 4⌋ = 819
Size of inode table                       819 × 128 = 104832 bytes

Table 6.2: Calculation of the size of the inode table

To strike a balance, the number of inodes is chosen to be one inode for every four disk blocks; this is the same ratio as is used in the Ext2 file system.

The inode table has to be dynamically reallocatable; this is achieved by allocating the blocks for the inode table via the Translation Map. The inode table will therefore appear to be contiguous within the logical file system, but it can be located at any physical location within the host file system.

Once the number of inodes, the size, and the location of the inode table have been calculated, they are stored in the superblock. The inode table will then be locatable by the file system. The following steps must be taken in order to initialise the inode table, as outlined in algorithm 5:

1. Calculate the number of inodes for this file system, using 1 inode entry for every 4 file system blocks.

2. Calculate the number of file system blocks that the inode table will occupy.

3. Allocate memory for each inode entry in the inode table.

4. Allocate file system blocks in the Translation Map for each block the inode table will occupy.

5. Record the start, size, and number of inode entries of the inode table in the superblock.
/* Calculate the number of inodes */
NumberInodes ← Hiddenfs.Superblock.NumberBlocks / 4;
/* Calculate the number of blocks for the Inode Table */
size ← NumberInodes * 128;
NumberBlocks ← ⌈size / BlockSize⌉;
/* Allocate memory for the inode table */
InodeTable ← allocate memory for InodeTable;
Allocate NumberBlocks logical blocks for the InodeTable from the Translation Map;
Mark InodeTable attributes in the Hidden Superblock;
Algorithm 5: Inode table initialisation

Root Directory Initialisation

Once the inode table has been initialised, the root directory can be created. The root directory is required so that the hidden file system has an initial directory to work with. The root directory requires that a directory entry (see listing 6.5) and an inode (see listing 6.4) be allocated.

There is no structural difference between the directory entry for the root and any other directory entry in the hidden file system; there is only a semantic difference, in that the root and parent directory entries ("." and "..") both point to the inode allocated to the root directory. This is done because the root directory does not have a parent; it ensures that a user can never navigate to the "parent" of the root directory (which does not exist).

The inode allocated to the root is a normal inode entry. Once the inode has been allocated, its number is recorded in the superblock so that the root can be located. The root directory must be dynamically reallocatable; this is achieved by allocating blocks for the root directory via the Translation Map. Initialising the root directory is quite simple; the following steps must be taken:

1. Allocate an inode from the inode table for the root directory; this should be inode 1.

2. Create a directory entry for the root directory (.) that references inode number 1.

3.
Create a directory entry for the parent directory (..) that references inode number 1.

4. Allocate a logical block from the Translation Map for the root directory.

5. Record the root directory's inode number in the superblock.

6. Write the root directory to the physical disk, using the Translation Map to obtain the physical location.

Once the superblock, Translation Map, inode table, and root directory have been created, all of the structures are written to the disk. The Translation Map provides the physical locations where allocated logical hidden file system blocks should be written.

6.4 Summary

In this chapter we discussed the following concepts:

File System Structures - where we introduced the control structures which will be used in SSFS.
- Superblock - in this section we described the superblock, which contains the file system metadata.
- TMap Array - where we described the structure of the TMap Array, which allows the Translation Map to be located on the physical device.
- Translation Map - in this section we discussed the Translation Map, which houses the logical-to-physical mappings for SSFS.
- Inode Table - in which we discussed the inode table, used to store inodes.
- Files and Directories - where we discussed the inodes and directory entries, used to store metadata about files and directories respectively.

File System Initialisation - in this section we discussed the initialisation of the host and hidden file system structures.
- Host File System Initialisation - in this section we discussed a simple host file system implementation.
- Hidden File System Initialisation - where we discussed the initialisation of each of the hidden file system structures, with emphasis on the physical limitations imposed by the host file system.

6.5 Conclusion

The effective implementation of a steganographic file system relies on the effective design of the structures which support the storage and retrieval of hidden data.
Designing these structures in such a way that data can be reallocated allows problems such as data collisions to be avoided. Steganographic file systems must be designed to provide convenient and transparent security for hidden data, as well as an adequate level of plausible deniability; this is facilitated by the file system's metadata structures. In this chapter we discussed the framework for a steganographic file system implementation by discussing the structures that will be used to manage and store hidden data. These structures will allow for convenient and transparent integration with a host file system, giving the hidden file system maximum flexibility. In section 6.2 we discussed the construction of the structures that will be used to store and manage hidden data within the hidden file system. In section 6.3.1 we discussed the initialisation of these structures in order to allow the hidden file system to enter an operational state. In the following chapter we will discuss the file system operations. These operations will operate upon the file system structures in order to allow data to be stored and retrieved.

Chapter 7

File System Operations for SSFS

7.1 Introduction

In order for the steganographic file system to operate, a number of data operations must be defined to allow a user to interact with the file system data structures. These operations are used to store and retrieve various forms of data from within the file system. There are two generic types of operations: those which operate on metadata, and those which operate on data. When a user requests access to data, there are a number of interactions between the file system's components. In this chapter we will discuss the file system operations. We introduce a number of the file system's operational layers, which are used to group file system operations into operational categories.
Each layer of operation will interact with other layers in order to achieve the storage and retrieval of data. Firstly we will introduce a layered approach for the file system operations in section 7.2. We will go on to discuss the low-level operations in section 7.3; this is where the lowest level of functionality is defined, and is primarily concerned with input and output to the physical disk. We will then discuss the intermediate-level operations in section 7.4 and the role which they play in maintaining the file system's metadata. Finally we will discuss the high-level operations in section 7.5 as a mechanism for the storage and retrieval of data.

[Figure 7.1: File system operation layers - the disk, the low-level operations, the intermediate-level operations, and the high-level operations]

7.2 Layered File System Operations

The steganographic file system uses operations which are designed in layers of functionality. This is done to provide interoperability between the "logical" and "physical" layouts of the file system, and in order to provide modularity. There are three main operational layers, namely: the low-level, the intermediate-level, and the high-level operations. The interaction between these layers can be seen in figure 7.1. The low-level operations are concerned with operating on the "physical" location of the data on the disk. These operations are usually provided by the operating system kernel API, and will provide low-level functionality such as moving the heads of the hard disk drive, spinning the platters, and reading and writing to the physical device. The low-level operations are discussed in the following section. The intermediate-level operations are concerned with the logical placement of data, and the translation between the logical and physical locations. This layer is implemented within the file system implementation and interacts with both the low- and high-level operations. This layer also includes the security mechanisms in the form of cryptographic routines, which will be covered in the following chapter. The high-level operations allow for user interactions, and are concerned only with the logical position of the hidden data within the steganographic file system. This layer contains all the operations relating to the manipulation of files and directories. This layer also provides an interface for a human user to interact with the file system, usually via a command shell. The interaction between the operational layers can be seen graphically in figure 7.1.
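The three-layer separation described above can be sketched as cooperating components. The class names and the in-memory "disk" below are illustrative assumptions, not part of the SSFS implementation; the point is that the high-level layer touches only logical positions, while only the lowest layer touches physical locations.

```python
class LowLevel:
    """Physical block I/O; here backed by an in-memory bytearray."""
    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.disk = bytearray(num_blocks * block_size)

    def read_block(self, physical):
        off = physical * self.block_size
        return bytes(self.disk[off:off + self.block_size])

    def write_block(self, physical, data):
        off = physical * self.block_size
        self.disk[off:off + self.block_size] = data.ljust(self.block_size, b"\0")


class IntermediateLevel:
    """Translates logical block numbers to physical locations."""
    def __init__(self, low):
        self.low = low
        self.tmap = {}  # logical block -> physical block

    def map_block(self, logical, physical):
        self.tmap[logical] = physical

    def read(self, logical):
        return self.low.read_block(self.tmap[logical])

    def write(self, logical, data):
        self.low.write_block(self.tmap[logical], data)


class HighLevel:
    """Operates purely on logical positions within the hidden file system."""
    def __init__(self, mid):
        self.mid = mid

    def store(self, logical, data):
        self.mid.write(logical, data)

    def fetch(self, logical):
        return self.mid.read(logical)
```

Because the high-level layer never sees physical addresses, the intermediate layer is free to remap a logical block to a new physical location without changing anything the user observes.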
7.3 Low-Level Operations

All operations that are performed on the file system are defined in terms of a number of low-level operations on the physical storage medium. The most basic low-level operations are those of reading and writing single blocks to a particular physical location. Every other file system operation relies on the ability to read and write to the physical disk. The read operation is very simple, and takes the form of a function that will read a specified number of bytes from a location on the physical medium into primary memory. The write operation works in the same way; it simply writes a specified number of bytes from primary memory to a location on the physical medium. These two operations are usually provided by the operating system kernel API. The design of the steganographic file system does not require any modification to these functions, and they can be used as provided by the kernel API. In order to fully discuss the other, higher-level file system operations, an overview of the read and write operations as provided by a kernel API is given below.

7.3.1 Read and Write Operations Overview

The most basic read and write operation that is required is that which reads a number of bytes from a physical position on a physical device, such as a hard disk drive. A file system will most often read and write blocks that are of a consistent size, which would normally be the file system block size. The operation of the read and write functions can be seen graphically in figure 7.2. The file system implementation will define a number of situations in which to utilise the kernel read and write operations. These operations are defined in order to simplify the access of data on the physical medium.

[Figure 7.2: Simple read and write operation overview]

1.
Read or write a number of bytes less than the file system block size - this operation will be used when the file system reads or writes to a file system structure that has a number of defined elements, such as the inode table. This will allow the file system to write directly to the physical location on the disk, which will increase performance.

2. Read or write an entire file system block - this will be the most common I/O operation. When a file or a directory is accessed on the physical device, the file system will access each file system block that is allocated to the file or directory individually, and each will either be stored to or retrieved from primary memory.

3. Read or write a stream of file system blocks - this operation will read or write a stream of bytes from the physical device. This will allow access to files and directories which are stored across a number of different file system blocks. This operation will usually take the form of a function which will read multiple file system blocks from the physical device.

In the following section we will discuss the intermediate-level operations as a mechanism for accessing file system metadata.

[Figure 7.3: Logical to physical translation operation - the Translation Map maps the logical disk onto physical locations]

7.4 Intermediate-Level Operations

The intermediate-level operations are concerned with the translation between the logical and physical layouts of the steganographic file system, and the modification of the metadata structures. These operations accept instructions from the high-level operations and invoke the required low-level operations after a translation or encryption has taken place. The intermediate-level operations also provide the security mechanisms so that data encryption can be transparent.
7.4.1 Logical-Physical Translation Operation

The process of performing the translation between the logical file system address and the physical location allows data to be stored and retrieved from the physical medium. The translation is achieved by interacting with the Translation Map (see section 6.2.3, on page 110). As described in the previous chapter, the Translation Map consists of a list of logical addresses and associated physical locations. The translation is achieved by simply returning the physical address for a particular logical location. When a high-level operation requests that a logical block be allocated to a file system structure, a free physical block is located through interaction with the host file system's storage map, and the mapping between the logical and physical locations is created in the Translation Map. This gives the ability for the physical location of the data to change without affecting the logical position within the hidden component of the steganographic file system. This forms the basis for dynamic reallocation and will be discussed further in a later chapter. In the following section we will discuss the Translation Map operations and their use in free block management. The logical to physical translation operation can be seen graphically in figure 7.3.

7.4.2 Translation-Map Operations

The Translation Map plays the dual role of providing the mechanism for the logical to physical translation, and marking logical file system blocks as either allocated or unallocated. The allocation and deallocation of blocks will be discussed below.

Block Allocation

Block allocation is a vital part of the steganographic file system implementation. Every block that is allocated in the Translation Map has an associated Translation Map Entry, as seen in listing 6.3 on page 111.
As discussed in the previous chapter, the first byte is used to mark the entry as allocated and the next 4 bytes are used to mark the physical location which is allocated to the Translation Map Entry. Each Translation Map Entry represents a single logical block within the steganographic file system, where the first Translation Map Entry represents the first logical file system block. Allocation of a hidden file system block involves a number of steps, and is demonstrated graphically in figure 7.4. We provide the steps as follows:

1. Find a free logical block in the Translation Map; this is a free block in the hidden file system.

2. Acquire a write lock on the host file system's storage map.

(a) Find a free physical block in the host file system.

3. Release the write lock on the host file system's storage map.

4. Map the physical block to the logical block in the Translation Map.

5. Return the logical block address to the calling function.

[Figure 7.4: Block allocation - Step 1: find a free block in the Translation Map; Step 2: find a free block on the physical disk; Step 3: the physical location is stored in the Translation Map]

Allocating a logical block is a simple operation, which involves searching through the Translation Map and finding an unallocated block. The unallocated block is marked as allocated, and then a free physical block is located in order to complete the mapping. A free physical block is located by interacting with the storage map of the host file system. This is achieved either through direct interaction with the storage map structure of the host file system, or by using functions provided by the host file system to locate a free physical block. Once a physical block is located it is stored in the Translation Map Entry for the newly allocated logical block.
Finally the logical block address is returned to the function that requested that a new block be allocated. It is important to note that the storage map of the host file system is not modified in any way by this operation; it is only used to find a free physical block. Because the host file system's storage map is not modified to mark these physical blocks as allocated, the physical blocks remain available for allocation within the host file system at a later stage. When this event occurs the dynamic reallocation policy will come into play; this will be discussed in detail in a later chapter. In the following section block deallocation will be discussed. This operation is used when a file or directory no longer needs to utilise a particular logical block.

Block Deallocation

Block deallocation occurs when a logical block is no longer needed by a file system object. The block must be marked as unallocated, and can then be reused for a different object at a later stage. Block deallocation is a very simple operation, and is accomplished in the following two steps:

1. Mark the logical block as unallocated.

2. Write the value "zero" to the entry field of the Translation Map Entry.

By marking the Translation Map Entry as unallocated, the steganographic file system will consider the block for allocation at a later stage. As a security precaution, the value "zero" is written to the entry field of the Translation Map Entry. This is done to ensure that any residual data remaining in the physical block cannot be referenced back to the hidden data in any way. In order to securely deallocate a block, the hidden file system removes all traces of redundant data. If the mapping in the Translation Map were to persist after the block was no longer allocated, there would be a possibility that the redundant user data could be exploited. To ensure that this cannot occur, the deallocated mapping is completely removed.
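The allocation and deallocation steps above can be sketched as follows. This is a minimal in-memory model, assuming a callback that consults the host file system's storage map (read-only) for a free physical block; field names mirror the first-byte allocation flag and 4-byte physical location of the Translation Map Entry.

```python
class TranslationMapEntry:
    __slots__ = ("allocated", "physical")

    def __init__(self):
        self.allocated = False  # the first byte in the on-disk form
        self.physical = 0       # the 4-byte physical block location


class TranslationMap:
    def __init__(self, num_logical_blocks, find_free_physical):
        # find_free_physical consults the host file system's storage
        # map without modifying it, and returns a free physical block.
        self.entries = [TranslationMapEntry() for _ in range(num_logical_blocks)]
        self.find_free_physical = find_free_physical

    def allocate(self):
        # Search for an unallocated logical block, mark it allocated,
        # and map it to a free physical block.
        for logical, entry in enumerate(self.entries):
            if not entry.allocated:
                entry.allocated = True
                entry.physical = self.find_free_physical()
                return logical
        raise OSError("no free logical blocks in the hidden file system")

    def deallocate(self, logical):
        # Mark the block unallocated and zero the entry field so no
        # residual mapping to the hidden data remains.
        entry = self.entries[logical]
        entry.allocated = False
        entry.physical = 0
```

Zeroing the entry on deallocation is the security precaution described in the text: a stale mapping would otherwise tie residual physical data back to the hidden file system.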
These block allocation methods play an important role when physical and logical blocks need to be allocated to files and directories. In the following section we will discuss the inode operations, which control the set of blocks which are allocated to a file or directory.

7.4.3 Inode Operations

The inodes are used to store metadata concerning files and directories. Every file or directory needs to have an associated inode in the inode table. An inode will have to be allocated, modified or deallocated depending on the operation requested by the file system. These operations will be discussed below.

Allocate Inode

Each inode entry in the inode table has a structure as seen in listing 6.4 on page 113, with each entry being 128 bytes in size. All of the inode entries are pre-allocated during initialisation of the inode table; this allows free inode entries to be located and allocated to a file or directory quickly. Inode entries hold items of metadata for files and directories within the file system. The metadata items are kept to a minimum within the file system implementation, in order to reduce the amount of information that can be referenced back to a file, which eliminates a number of security risks. As such, only the most basic information is recorded within the inode entry. The process of allocating an inode entry is performed in three stages:

1. Initialise the metadata variables within the inode structure.

2. Allocate a number of logical blocks to the inode structure.

3. Write the inode entry to the physical device.

Firstly, the metadata entries have to be initialised; the magic, mode, key, and inode_number fields are initialised as described in the latter part of section 6.2.4. These metadata fields allow the file system to recognise the inode entry and to decrypt the associated data if requested. Secondly, a number of logical file system blocks must be allocated to the inode entry.
Files and directories need to be stored in logical blocks within the file system, and the inode entry is where these blocks are recorded and managed by the file system implementation. As discussed in the previous chapter, the inode entry contains a number of extents which can directly store a number of contiguous block references. The number of logical blocks that will be needed for a file or directory can be determined as follows, where O_size is the byte size of the object, B_fs is the file system block size, and N_blocks is the number of file system blocks required to store the object:

    N_blocks = ⌈O_size / B_fs⌉    (7.1)

Once the number of blocks required has been calculated, the block allocation methods which were discussed above are invoked in order to allocate the logical blocks within the Translation Map. Finally, the inode entry is written to its proper location within the inode table. This can be done directly: the 128-byte inode entry size divides cleanly into the file system block size, which ensures that an inode entry within the inode table never spans a block boundary. This allows the exact location of the inode entry within the inode table on the physical disk to be easily calculated.

Modifying an Inode

Modification of an inode entry will occur when the file or directory associated with the particular inode is modified, such as when data is appended to, or deleted from, the data stream of a file or directory. In such a situation the number of blocks which are allocated to the object may either be increased or decreased, and the overall size of the object must be increased or decreased appropriately. This operation does not require any other modification to the inode entry, which results from only storing a minimum amount of information within the inode entry structure. The following two cases will require a modification to the inode structure:

1. Blocks are required to be appended to the data stream.

2.
Blocks are required to be removed from the data stream.

Blocks are appended to or removed from the inode structure in the same way in which they were added when the inode structure was created. In the case of an addition to the inode structure, the modification function will interact with the block allocation function in order to allocate a new set of logical blocks to the inode. This operation will maintain the mapping within the Translation Map. When blocks are required to be removed from the inode structure, the modification function will likewise interact with the block deallocation function in order to reclaim the logical blocks within the Translation Map. In both cases the size of the file system object, as represented by the size field (see listing 6.4 on page 113) in the inode entry structure, has to be modified in order to reflect the correct byte size of the object.

Deallocate Inode

Inode deallocation will occur when the file system is requested to delete a file or directory, in which case the associated inode will need to be reclaimed. The inode will then be available for later allocation by a different file system object. A number of steps must be taken to ensure the inode is deallocated securely. If the inode were simply marked as unallocated but the metadata remained intact, then information concerning deleted files and directories could be obtained through examination of the remains of the inode. In order to securely deallocate an inode, all the metadata must be overwritten; this can be done by writing "zero" to the entire inode structure. This will ensure that the metadata associated with a deleted object does not remain within the file system. There are therefore a number of steps required in order to securely remove an inode from the inode table; these steps are listed below.

1. Mark the inode as unallocated.

2. Deallocate the blocks that are allocated to the inode entry.

3.
Write "zero" to the entire inode entry structure.

4. Write the zero-filled structure to the inode table.

In order to deallocate the blocks which are allocated to the inode entry, the inode deallocation function will interact with the block deallocation function. Once this has occurred, the blocks which were allocated to this inode entry will be available for later reallocation by another file system object. Finally, the unallocated inode structure is written to the correct physical location within the inode table. This will allow the inode entry to be reallocated to another file system object sometime in the future. The intermediate-level operations discussed above provide the ability to interact with the file system's control structures. In the following section we will discuss the file and directory operations which form the high-level operations. These operations will interact with the intermediate-level operations in order to maintain the files and directories within the steganographic file system.

7.5 High-Level Operations

The high-level operations are concerned with operating on the logical positions of files and directories within the hidden component of the steganographic file system. These operations rely on the intermediate-level operations, discussed in the previous section, to perform the translation between the logical file system position and physical locations on the physical medium. Restricting the high-level operations to operate only on the logical positioning of data allows the steganographic implementation to freely reallocate the underlying physical location of the hidden data, while maintaining a consistent logical position. This allows data to be easily located within the hidden file system regardless of the underlying physical location.
In the following sections we will discuss the file and directory operations, which are the two sets of operations that allow a human user to interact with the data stored within the file system.

7.5.1 Directory Operations

Directories are used to form the hierarchical structure of data within the file system. As such, their existence is used to give an organisational structure to the file system. The operations on a directory can be seen as simplified file operations; this is because a directory in its simplest form is a stream of bytes, not unlike that of a file. A directory is made up of a linear list of directory entries, called a directory stream. Each directory entry has a structure as was seen in listing 6.5 on page 115, where each entry has a reference to a related inode number and a variable-length name. The inode number will reference either another directory or a file. An important element of every directory is that it must contain the directory entries for the root and the parent, designated by '.' and '..'. The root entry will reference the inode number for the particular directory entry, and the parent entry will reference the inode number for the directory which is the hierarchical parent of the particular directory. The special case is the so-called "root directory", which is the first directory entry created during file system initialisation, and which acts as the parent for all subsequently created files and directories. The root has the "root entry" and the "parent entry" both referencing the root itself. This is done to prevent a user from attempting to navigate to a non-existent directory above the root. In the following sections we will discuss the operations that can be performed on a directory. These operations allow the user to create and maintain the hierarchical organisational structure.
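The '.' and '..' semantics described above can be sketched as follows. The byte layout (a size field holding the name length, then an inode number, then the name) is an assumption modelled on listing 6.5; the actual field widths in SSFS may differ.

```python
import struct

def pack_entry(inode_number, name):
    # Assumed directory entry layout: 4-byte name size,
    # 4-byte inode number, then the variable-length name.
    raw = name.encode()
    return struct.pack("<II", len(raw), inode_number) + raw

def new_directory_stream(own_inode, parent_inode):
    # "." references the directory's own inode and ".." its parent.
    # For the root directory both reference the root's own inode,
    # so navigating "above" the root leads back to the root.
    return pack_entry(own_inode, ".") + pack_entry(parent_inode, "..")
```

Calling `new_directory_stream(1, 1)` therefore produces the root directory stream created during initialisation, with both entries referencing inode 1.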
Creating a directory

The process of creating a directory allows a hierarchical organisational structure to be established within the file system, and can be used to organise files and other directories into a logical structure. As discussed in the previous chapter, each directory is constructed of a number of directory entries, as described in listing 6.5 on page 115. Each directory entry is used to describe a particular item in the directory. Each entry has a variable-length name and an associated inode number. In order to create a directory there are two elements of the overall directory structure which need to be considered: the parent directory entry and the new directory entry that is to be created. The parent directory will be the directory in which the new directory is to be contained. In order to reflect the new directory entry within the directory structure, a reference to the new directory must be appended to the directory stream of the parent. This will require modification to the inode entry of the parent, and may require that new logical blocks be allocated. Firstly, the new directory entry structure must be created and written to the disk; this will require that an inode be allocated, and logical blocks be allocated to the inode. Once this has been accomplished the root and parent directory entries are created: the root being the inode number of the inode which is allocated to this new directory entry, and the parent being the inode number of the inode allocated to the parent directory, designated by '.' and '..' respectively. This will allow the directory structure to be traversed by a user at a later stage. Once the new directory entry has been created and written to the disk, the parent's directory entry can be modified in order to reflect the new directory entry. This is achieved by appending a new directory entry structure to the parent's directory entry.
The name of the new directory and the inode which was allocated to it are then appended to the parent's directory. The name of the new directory is simply a human-understandable representation of the directory to allow a user to easily identify it.

[Figure 7.5: Creating a directory - the new directory entry is created, blocks are allocated from the Translation Map, an inode is allocated from the inode table, the parent's directory stream is modified, and the structures are written to the disk]

The steps involved in creating a new directory can therefore be summarised as follows, and are represented graphically in figure 7.5:

1. Create the new directory entry.

(a) Allocate a set of logical blocks to hold the directory entry.

(b) Allocate an inode entry in the inode table for the new directory entry.

2. Modify the parent's directory entry to reflect the new directory entry by appending the name and inode number of the new directory.

Reading from a directory

Reading the contents of a directory occurs when a user requests that a directory listing be retrieved, such as by using the ls command on a UNIX system, or when a user requests an item in a directory by its name, in which case the associated inode number must be retrieved. In both cases the directory entries will need to be parsed in order to extract the required information. Each directory entry is a variable-sized structure, and the directory entries within the directory stream are not in any particular sorted order, which must be taken into account when parsing the directory entries. The directory data stream specified by the allocated blocks in the inode will contain multiple directory entries, one for each object which is stored in a particular directory. The offset of each directory entry is not fixed, and therefore in order to locate a particular directory entry every entry preceding it must first be analysed.
Within the directory entry structure, the size field specifies the character size of the name field (see listing 6.5), and the inode_number field will contain the inode number of the associated inode. By iterating through the list of directory entries and keeping track of the size of each entry, the exact offset of a particular entry within the directory stream can be calculated. In order to retrieve a directory listing, every directory entry in the directory stream must be analysed. This is simply a matter of iterating through the entire directory stream and returning each directory entry in turn. Listed below are the generic operations that must be performed in order to either locate a particular directory entry or retrieve a directory listing.

1. Read the directory stream into primary memory, using the inode allocated to the directory in order to locate the allocated blocks [1].

2. Iterate through the directory stream, keeping track of the overall size of each directory entry.

In order to retrieve a directory listing, each directory entry in the directory stream is analysed and returned. To retrieve a particular directory entry, each preceding entry must be iterated over, and then the particular entry returned.

Writing to a directory

Writing to a directory stream will occur when a file or directory is created. Each new file and directory must be contained within a parent directory, in this way building up the hierarchical structure. Every new file or directory must therefore have a unique directory entry in order for it to be locatable within the directory structure. This is done by appending a new directory entry to the directory stream of the parent directory. In the event of the directory stream not being able to accommodate a new directory entry, additional blocks will have to be allocated to the directory stream. This is achieved by locating a free logical block and appending it to the directory's inode entry.
[1] There are obvious memory concerns; however, only directory streams which contain a large number of directory entries will present a significant problem. This can be solved by reading only a single file system block at a time into primary memory.

This is a relatively simple operation, because the directory stream of the parent does not need to be iterated over, as discussed in the previous section. The overall size of the parent's directory stream is known, and can be retrieved from the associated inode entry. All that is required is to create the directory entry for the new file system object and then append that entry to the directory stream. In order to create the directory entry for a sub-directory, the process discussed in the section above is followed. To create a directory entry for a file, a similar process is taken, and will be discussed in the following section. Once the new directory entry has been created it can be appended to the parent's directory stream. This may involve more file system blocks needing to be appended to the parent's inode in order to house the new directory entry. Finally, the size of the parent's directory stream is modified in the parent's inode entry. The process used to write to a directory stream is summarised below.

1. Obtain the size of the particular directory stream from the corresponding inode entry.

2. Read the parent's directory stream into primary memory.

3. Create the new directory entry. The structure of a directory entry for a file or directory is identical; to determine whether a particular directory entry corresponds to a file or directory, the mode field (see listing 6.4, line 19, on page 113) in the corresponding inode entry is examined.

4. Append the new directory entry to the parent's directory stream; this will increase the overall size of the stream.

5. Allocate more logical file system blocks to the parent directory as needed.

6.
Modify the parent's inode entry to reflect any newly allocated logical file system blocks and the new size of the stream.

7. Write the parent's directory stream to the correct physical location on the physical disk.

Deleting a directory

The deletion of a directory will occur when it is no longer needed by the file system user. To remove a directory from the file system there are a number of events that must occur in order to ensure a secure removal of data from the file system. A directory cannot be removed if it contains other files or sub-directories. A directory must be completely empty in order to be removed; this is to ensure that the user will not mistakenly delete valid objects. If a directory is empty then the file system will permit it to be removed. Deleting a directory will require modification to the Translation Map, the inode table, the parent directory, and of course the directory entry itself. Firstly, the directory stream of the directory that is to be removed must be overwritten with either zero or random values. This ensures the secure removal of the directory stream. If this is not performed, then information concerning the files and sub-directories which it contained can be obtained from the remnants of the directory stream. Once the directory data has been overwritten, the inode which was allocated to the directory can be reclaimed, which will in turn reclaim the logical blocks in the Translation Map which were allocated to this directory. The inode and logical blocks will then be available for later reallocation by subsequent file system objects. Finally, the directory stream of the parent directory must be modified so that it no longer reflects the deleted directory. This is achieved by removing the particular directory entry from the parent's directory stream. Removing the directory entry is done by "shifting" all the following directory entries, thus in effect removing the particular entry.
The size of the parent directory as reflected in the inode entry must then be modified. To summarise the above process, in order to remove a directory from the file system the following steps must be taken:

1. Ensure that the directory to be removed is empty.
2. Overwrite the directory stream with zeros.2
3. Reclaim the inode entry for this directory from the inode table. The logical blocks allocated to this directory will be reclaimed.
4. Remove the corresponding directory entry from the parent's directory stream.

2 Only a single overwrite is used; however, this can be increased to ensure that the directory stream is thoroughly overwritten.

In this section we discussed the directory operations which form the basis for interacting with the directory structure within the file system. In the following section a number of file operations will be discussed.

7.5.2 File Operations

Files are the raw streams of data that form the bulk of information stored within the file system. Files have no discernible structure as far as the file system is concerned; they are simply streams of bytes. Files are described with inode entries and are stored within directories, and thus require a directory entry.

Depending on the overall size of the file data, a file can be stored in multiple file system blocks. The physical and logical locations of these blocks will differ, and need not be contiguous. It remains the aim of the file system to manage the storage and retrieval of the blocks related to a particular file when the human operator requests them.

Files differ only slightly from directories, in that directories have an implied structure. The operations on files and directories are very similar, but directory operations are generally more complex because of the structure that must be maintained.
In the following sections we will discuss a number of file operations which allow the human operator to interact with the file data stored on the disk.

Creating a file

The process of creating a file is the basis for storing meaningful information within the file system. Every file that is created has a number of associated file system metadata structures. Files and directories are technically both streams of bytes stored within the file system; the only difference is that a structure is imposed on directories, whereas there are no such restrictions on file data.

For each file that is active within the file system, a set of logical blocks, an inode entry, and a directory entry must be created. Files will normally occupy more file system blocks than directories, and as such greater care must be taken when allocating logical file system blocks, as indirect and double-indirect blocks may have to be allocated within the inode.

To create a file, firstly an inode entry must be allocated to contain the file's metadata. The inode is allocated through interaction with the inode allocation functions, which will allocate the inode within the inode table. When a file is initially created it does not have any file system blocks allocated to it, and as such the overall size of the file remains zero. File system blocks are added to the file's inode entry once data is written to the file.

Once the file has a valid inode entry, a directory entry must be created in order to reference the file within the hierarchical directory structure. This is a similar process to the creation of a directory discussed above. A directory entry for the file must be created within the directory stream of the parent directory. The parent directory is simply the directory that houses the file. Once the directory entry has been created, the file is available for use within the file system.
The process used to create a file is summarised below. If there are no longer any hidden file system blocks available for allocation then the file cannot be created, and an error is returned to the user.

1. Allocate an inode to the new file. The file will initially have no allocated file system blocks and a size of zero.
2. Create a directory entry for the file.
3. Append the directory entry to the directory stream of the parent.

Writing to a file

Writing to a file assumes that a file has already been created within the file system which can be written to. The file may either have no size (a size of "zero" reflected in the inode entry) or any arbitrary size. In both cases the process of writing to a file is the same, and is accomplished in a number of steps.

A file may or may not contain existing data; this data is stored at a logical position within the steganographic file system, which is in turn stored at a physical location on the hard disk. The bytes which make up the file data are referred to as the file stream.

Firstly, a number of file system blocks may need to be allocated to the file; this will only need to occur if the data to be written will exceed the amount of space available in the currently allocated blocks. In the case of writing to a file which is zero bytes long, the file system will always have to allocate new file system blocks to the inode in order to store the new data. This is achieved through interaction with the inode modification methods, which will in turn operate on the block allocation methods to allocate new blocks to the file. Once a set of blocks has been allocated to the file, new data can be appended to the end of the file stream.
For example, if the size of the current file stream is N bytes, and the new data is M bytes long, then the new data will be written at byte position N + 1 in the file stream (counting positions from 1) for a length of M bytes, thereby increasing the overall size of the file stream to N + M bytes.

The inode entry for the file is then updated in order to reflect the new size of the file. The inode is written to the correct position within the inode table. The new data is then written to the physical disk. The process of writing data to a file is summarised in the following steps.

1. Allocate more blocks to house the new data, if needed.
2. Modify the inode entry to reflect the new blocks and the increased size of the file.
3. Write the inode entry to the inode table.
4. Append the new data to the existing file stream.

Reading from a file

All data that is stored within the steganographic file system will have to be accessed at some stage. This involves the file contents being transferred from the physical disk to primary memory. The processes involved in reading the file stream from the physical disk into primary memory involve interaction with a number of different file system structures.

Normally a user will specify a file using the human-understandable name that is stored in the directory entry. The inode number corresponding to the file system entry will have to be extracted from the directory entry in order to access the data stream.

A file has a size that is specified by the inode entry. A user can only read a number of bytes from the file that is less than, or equal to, the overall byte size. This restriction prevents the user from accessing data that does not form part of the actual file contents. File data is stored within logical blocks in the steganographic file system, each of which is mapped to a physical location.
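The logical-to-physical mapping can be pictured as a simple lookup table. The following Python sketch is purely illustrative (the real Translation Map structure is defined in chapter 6, and the mapping values here are invented); it shows how a read of a logical block resolves to a physical block that may sit anywhere on the disk:

```python
# Hypothetical Translation Map: logical block number -> physical block number.
# Under dynamic reallocation these mappings change over time.
translation_map = {0: 731, 1: 88, 2: 4102}

def logical_to_physical(logical_block: int) -> int:
    """Resolve a logical block via the Translation Map. Because the mapping
    is not constant, every access must consult the current map."""
    return translation_map[logical_block]

# A file occupying logical blocks 0..2 need not be physically contiguous.
physical_blocks = [logical_to_physical(b) for b in range(3)]
```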
The read functions will have to interact with the Translation Map in order to obtain the exact physical location for a particular logical block. The dynamic reallocation policy creates a situation where the logical-to-physical mapping is not constantly defined; as such, there must always be interaction with the Translation Map in order to obtain the correct mappings.

Data from the file stream can be read in many different ways; normally this involves reading the file data from the physical disk into a buffer in primary memory to be utilised by the user. The user will specify an offset within a particular file and a length of bytes that should be read. The file system will read those particular bytes from the physical disk and place them into the user buffer. The read functions will interact with the low-level input/output commands in order to achieve this. Care must be taken, as the encryption mechanisms within the steganographic file system will impact the way in which the bytes are read from the physical disk.

The process to read file data from the physical disk is summarised below.

1. Obtain the inode number associated with the file from the parent's directory entry.
2. Read the file data specified by the inode into primary memory.
   (a) Obtain the physical location for each logical file system block.
   (b) Read the data for each physical block into primary memory.
3. Return the requested file data to the user.

Deleting a file

Files are removed from the file system when the user no longer has a need for them. An aspect of the steganographic file system is the mandatory secure delete procedure within the file system. All data that is removed from the file system must be completely removed; this includes all the file data and associated metadata.

Normally, in order to improve performance, file data is not removed from the file system, only marked as unallocated and overwritten at a later stage with newer file data.
This does present a security risk, as deleted data can be recovered by examination of the physical disk.

In order to provide security and privacy, all data must be securely removed. This process involves overwriting file data during deletion to ensure that it cannot be recovered. This can, however, impact performance, as a large file will require a relatively large amount of time to overwrite all of its data. The performance impact is warranted to ensure security and privacy.

A number of items must be considered when removing a file from the file system. Firstly, the blocks which were allocated to the file must be marked as unallocated. These blocks can then be reallocated to new file system objects. The inode entry corresponding to the file must be reclaimed; the inode entry itself must also be securely removed to avoid any information about the file being recovered. This can be achieved by writing either "zero" or random values to the entire inode entry. The directory entry corresponding to the file must be removed from the parent's directory stream; this is done as discussed in the sections above. Finally, the file data must be securely removed by writing "zero" to every physical location which was allocated to the file. This will prevent deleted file data from being extracted from the steganographic file system.

To summarise the above process, the following events must occur in order to securely remove a file from the steganographic file system.

1. Deallocate the logical blocks allocated to this particular file.
2. Deallocate the inode entry corresponding to this file.
3. Write "zero" or random values to the inode entry to securely remove the data it contains.
4. Remove the directory entry corresponding to the file from the parent's directory stream.
5. Write "zero" or random values to the physical locations where the file was contained to securely remove the data.
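The secure-removal steps above can be sketched end-to-end. This Python sketch uses simplified in-memory stand-ins (`disk`, `block_bitmap`, `inode_table`, `parent_stream`) for the real SSFS structures; the names and layout are assumptions for illustration only.

```python
import os

BLOCK_SIZE = 16  # tiny block size for illustration

# Simplified in-memory stand-ins for the real file system structures.
disk = {7: b"secret data.....", 9: b"more secret data"}  # physical block -> data
block_bitmap = {7: True, 9: True}                        # physical block -> allocated?
inode_table = {3: {"blocks": [7, 9], "size": 32, "iv": b"\x01" * 16}}
parent_stream = [("doc.txt", 3), ("other", 5)]           # (name, inode number)

def secure_delete(inode_no: int, use_random: bool = False) -> None:
    inode = inode_table[inode_no]
    for blk in inode["blocks"]:
        # Step 5: overwrite the physical locations with zero or random values.
        disk[blk] = os.urandom(BLOCK_SIZE) if use_random else b"\x00" * BLOCK_SIZE
        # Step 1: deallocate the blocks so they can be reallocated later.
        block_bitmap[blk] = False
    # Steps 2-3: reclaim the inode entry and scrub its contents.
    inode_table[inode_no] = {"blocks": [], "size": 0, "iv": b"\x00" * 16}
    # Step 4: remove the directory entry from the parent's stream.
    parent_stream[:] = [e for e in parent_stream if e[1] != inode_no]

secure_delete(3)
```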
7.6 Summary

In this chapter we discussed the following concepts:

Layered File System Operations - where we introduced the layered model which is used to classify the steganographic file system operations.

Low-Level Operations - in this section we described the low-level operations; these are the operations which interact with the physical device. In this section we covered:
- Read and Write Operations - the operations which are used to read and write data to the physical device.

Intermediate-Level Operations - where we discussed the intermediate-level operations which are used to modify the hidden file system's metadata. In this section we covered:
- Logical-Physical Translation - the operations which are used to perform the logical to physical translation via the Translation Map.
- Translation-Map Operations - the operations which support the logical to physical translation and control storage management.
- Inode Operations - these operations are used to control the metadata for files and directories.

High-Level Operations - in this section we discussed the operations which are used to interact with the user. We discussed the following concepts:
- Directory Operations - operations which are used to interact with the directory structure of the hidden file system.
- File Operations - operations which are used to interact with the file data stored in the hidden file system.

7.7 Conclusion

In order for a user to interact with data in the steganographic file system, a number of operations must be defined. These operations describe the functionality of the file system, and ensure secure storage and retrieval of data. All of the operations discussed above require interaction with a number of different file system layers to achieve the desired effect.

Firstly we introduced the file system layers which are used in order to provide multiple layers of functionality.
We then proceeded to discuss the low-level operations and the role which they play in interaction with the physical disk. We then discussed the intermediate-level operations with regard to the maintenance of the file system metadata. Finally we discussed the high-level operations, specifically the file and directory operations which allow the user to interact with the data stored in the file system.

This chapter is presented in conjunction with chapter 6 in order to define the layout and operation of the steganographic file system on the physical device. The steganographic structures and operations are only concerned with the embedding of the hidden data within the host file system. In order for data to be securely hidden, the following chapter will present a scheme for securing the hidden data through the use of cryptography.

Chapter 8

File System Security for SSFS

8.1 Introduction

Data security within the steganographic file system is achieved through two primary techniques: information hiding and cryptography. These two elements work together in order to produce a security scheme which will ensure data remains secure from attackers. There are a number of different aspects which must be taken into consideration when implementing a data security mechanism; these will be discussed in this chapter.

In this chapter we will discuss the security scheme used by SSFS to ensure information security. This is achieved through the use of cryptography. The encryption scheme must support the dynamic reallocation policy which will be discussed in the following chapter.

Firstly we will give an overview of the security scheme with respect to information hiding and data cryptography in section 8.2. We then discuss cryptography in section 8.3 with regard to how cryptographic operations are implemented within the steganographic file system.
The cryptographic operational layer is then discussed, along with a discussion on transparent encryption, in section 8.4. The overall data encryption scheme and encryption hierarchy are discussed in sections 8.5 and 8.6 respectively. Finally, a number of performance considerations are presented in section 8.7.

8.2 Security Overview

One of the primary goals of a steganographic file system is to provide a high level of data security. Data is secured through two techniques, namely information hiding and cryptography. Both techniques are used concurrently in order to provide a security model which will ensure that data remains secure.

Information hiding capabilities are built into the structure of the steganographic file system, as discussed in the previous chapters. This provides methods to ensure that data is securely hidden within the structure of a host file system.

Cryptography is used to construct another layer of security which works in conjunction with the information hiding techniques in order to ensure the security of data. These two combined methods act together to provide a complete solution for data security. Both the information hiding and cryptographic methods will be discussed in the following sections.

8.2.1 Security through Information Hiding

Information hiding is the primary principle on which a steganographic file system is based. This allows data to be hidden within the structure of a host file system. As discussed in the previous chapters, data is hidden in the unallocated blocks of a host file system, which can only be accessed through interaction with the steganographic file system implementation. The management of hidden data is maintained by the steganographic file system implementation. Data security is derived, in part, from the process of storing data within the unallocated blocks of the host file system.
During normal interaction with the host file system, a user will not be aware of the hidden data stored within the structure of the host file system. The user will only be exposed to data which is stored within the host file system. The presence of the steganographic component of the file system will only be known to the user who created the steganographic file system. Hidden data can only be accessed through the use of a dedicated command shell which allows access to the hidden file system component, provided that the correct access controls are met.

In an unencrypted environment, forensic examination of the physical device will reveal the presence of hidden data. Examination of all the physical blocks will reveal that there is a large amount of structured data which is not referenced by the host file system. A forensic examiner could reconstruct the steganographic data which is stored in the unallocated file system blocks.

Classic steganography provides a much better cover medium, as data can be "completely hidden" within the high-level structure of a picture or audio file. It is, however, limited in the amount of data which can be hidden; a picture can only contain a certain maximum amount of steganographic data, depending on the overall pixel dimensions. A steganographic file system can contain a much larger amount of data, but is limited by the low-level nature of the cover medium.

The unallocated blocks of a host file system are not the ideal place to store data, as they can easily be overwritten due to the dynamic nature of a file system. Data protection methods need to be put into place to ensure that data is not inadvertently overwritten. The ability to store a large amount of steganographic data is the greatest appeal of a steganographic file system.
As discussed above, hiding data within the unallocated file system blocks of a host file system provides very little protection for the data; an experienced user or forensic examiner could easily extract the hidden data. Extra security measures must come into play in order to provide complete security for the hidden data. This is achieved through the use of cryptography, which will ensure that hidden data remains secure even if it is detected. The use of cryptography to secure hidden data will be discussed in the following section.

8.2.2 Security through Cryptography

Information hiding and cryptography are used together within the steganographic file system in order to provide a greater level of security. As discussed above, information hiding alone will not sufficiently obfuscate the presence of steganographic data. If data is hidden in its plaintext form, it is a simple matter to extract and reconstruct the information. Cryptography is used to ensure that the hidden data will remain secure even if it is detected by a third party. The decrypted form of the hidden data will only be accessible with the correct passphrase, which is known only to the owner of the data.

The steganographic file system allows data to be transparently encrypted and decrypted when the correct passphrase is given. The file system user will be unaware of the encryption process, which allows for more efficient access to the hidden information.

The hidden data is encrypted in such a way as not to be affected by the dynamic reallocation of the hidden data, which will be discussed in the following chapters. This ability stems from the logical organisation of the hidden file system blocks within the physical device. This allows data to be decrypted and accessed regardless of the underlying organisation of the physical blocks.
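The encrypt-on-write / decrypt-on-read idea can be sketched per block. Since this is only an illustration, a SHA-256-based keystream stands in for a real block cipher (SSFS would use a cipher such as Serpent, and this stand-in is not cryptographically sound); the point is simply that the disk sees only ciphertext while the user sees only plaintext.

```python
import hashlib

def keystream(key: bytes, block_no: int, length: int) -> bytes:
    """Derive a per-block keystream from the key and block number
    (illustrative stand-in for a real block cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + block_no.to_bytes(8, "big") +
                              counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def transform(key: bytes, block_no: int, data: bytes) -> bytes:
    # XOR with the keystream: the same call both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, block_no, len(data))))

key = b"derived-from-user-passphrase"   # hypothetical derived key
plaintext = b"hidden file system block"
stored = transform(key, 42, plaintext)  # what actually reaches the disk
recovered = transform(key, 42, stored)  # what the user transparently sees
```

Because the block number feeds the keystream, an encrypted block can be reallocated to a different physical location without re-encryption, matching the requirement that encryption survive dynamic reallocation.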
Introducing cryptographic routines to the file system implementation does introduce a performance impact, as all data must be encrypted and decrypted as it is requested, using a unique key. The security of the system warrants the performance impact which cryptography introduces. This impact can be minimised through the use of modern, efficient cryptographic algorithms. The performance concerns will be discussed in a later section. In the following section we will discuss the use of cryptography within the steganographic file system.

8.3 Data Cryptography

All data which is stored within the steganographic file system is transparently encrypted and decrypted as required by the user. This allows the complexity of managing data encryption to be the responsibility of the file system implementation.

Data encryption is implemented as an operation within the intermediate layer (see section 7.4, on page 133). A cryptographic block cipher (see section 3.3, on page 37) is used for the encryption process, as this simplifies the overall implementation. Data is encrypted and decrypted as discrete blocks of data of a particular size, which is dependent on the block size of the cryptographic algorithm being used and the file system block size.

In this section we will discuss the choice of cryptographic algorithm, as this will play an important role in the overall performance of data access within the file system.

8.3.1 Choice of Algorithm

The choice of cryptographic algorithm will have a direct impact on the overall performance of the file system implementation. Different aspects of the algorithm will influence the design and construction of the file system. There are a number of considerations which must be made when selecting a cryptographic algorithm; these are listed below.

1. Block cipher - the cryptographic algorithm must be a block cipher.
This allows the file system to encrypt and decrypt "blocks", which can be easily reallocated if needed.

2. Cryptographic block size - as discussed in the previous chapters, a file system reads and writes data in discrete blocks, equivalent to the file system block size, which is a multiple of the physical block size. The file system block size must be cleanly divisible by the cryptographic block size, to allow for interaction between the two components.

3. Performance - modern cryptographic algorithms are designed to maximise performance; to encrypt and decrypt as fast as possible. This allows data to be encrypted and decrypted within the file system very quickly and efficiently, which has a direct impact on the overall performance of the file system implementation.

4. Security - the cryptographic algorithm must provide an adequate level of data security in terms of the strength of the overall cryptographic cipher. As modern computers become more powerful, cryptographic algorithms must become more secure to prevent unauthorised parties from accessing the encrypted data. This aspect is dependent on the overall strength of the user's passphrase, as a weak passphrase will negate any security benefit the cryptographic algorithm provides.

The Serpent algorithm (see section 3.3.3, on page 43) is a good example of a current cryptographic algorithm that satisfies the above requirements. Any block cipher can be chosen to perform this task; the particular cipher will be chosen in line with the functional requirements for the file system implementation.

In the following section we will discuss the cryptographic layer of SSFS with regard to its interaction with the overall file system implementation.

8.4 Cryptographic Layer

The cryptographic functions are implemented as an extension of the intermediate layer, as discussed in section 7.4 on page 133.
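The block-size compatibility requirement from section 8.3.1 can be checked mechanically. A small illustrative Python sketch, using Serpent's 128-bit (16-byte) cipher block size as the example value:

```python
CIPHER_BLOCK_SIZE = 16  # Serpent operates on 128-bit (16-byte) blocks

def blocks_compatible(fs_block_size: int) -> bool:
    """A file system block must hold a whole number of cipher blocks."""
    return fs_block_size % CIPHER_BLOCK_SIZE == 0

def cipher_blocks_per_fs_block(fs_block_size: int) -> int:
    if not blocks_compatible(fs_block_size):
        raise ValueError("file system block size must be a multiple "
                         "of the cipher block size")
    return fs_block_size // CIPHER_BLOCK_SIZE

# A common 4 KiB file system block holds exactly 256 cipher blocks.
ratio = cipher_blocks_per_fs_block(4096)
```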
The aim of the cryptographic layer is to implement a transparent encryption extension to the file system, which allows file system data to be encrypted and decrypted without user interaction.

Figure 8.1: Cryptographic layer (the operational stack: low-level operations; intermediate-level operations, incorporating the cryptographic layer; high-level operations)

A cryptographic block cipher operates on discretely sized blocks of data. This allows the cryptographic layer to exist within the intermediate layer, which is also concerned with discretely sized blocks of file system data. As discussed in the previous chapter, when data is stored within the file system, it flows through each of the operational file system layers, culminating in storage on the physical device. When data is accessed from the file system, it again flows through each layer until it is presented to the user. Each file system operational layer operates on progressively smaller "portions" of data.

As mentioned above, the cryptographic layer is an extension of the intermediate operational layer; when data flows through the intermediate operations it is either encrypted and written to disk, or decrypted for presentation to the user. The flow of information through the file system operational layers is shown graphically in figure 8.1.

In the following section we will discuss the concept of transparent encryption and the important role which it plays in the overall data encryption scheme.

8.4.1 Transparent Encryption

The concept of transparent encryption is important to the overall operation of the steganographic file system implementation. Transparent encryption takes the complexities of data encryption away from the user, and allows the file system to manage all the encryption and decryption of user data.
Figure 8.2: Transparent encryption (data moves between the disk and the user through the intermediate layer, where the transparent encryption operation converts between the encrypted and unencrypted forms)

When a user is working with the steganographic file system, all data which is requested is encrypted or decrypted as it is accessed from the physical device. The user does not have to interact with the cryptographic operations in any way. The user only interacts with the plaintext form of the data, with the lower levels of the file system implementation managing the encryption and decryption process.

The transparent encryption mechanism operates on the user data. This allows the user to remain confident that the data which is being stored within the steganographic file system will be secure. Ideally the cryptographic algorithm which is used to encrypt and decrypt the user data should be capable of operating as fast as possible, in order to maximise performance. Data flow with a transparent encryption extension is demonstrated graphically in figure 8.2.

In the above sections we discussed the requirements and components of the cryptographic system which is used to manage the transparent data encryption. In the following section we will discuss the use of unique initialisation vectors to form an overall system with a greater level of security.

8.5 File System Data Encryption Scheme

In order to achieve the transparent encryption mechanism discussed in the above section, an encryption scheme for the file system implementation must be developed. There are a number of aspects of the file system and the cryptographic system which must be considered in order to fully implement a transparent encryption system.

8.5.1 Data Classes

All data which is stored within the file system is divided into two different data classes, namely system data and user data.
This division of data is done in order to specify different realms of data, where access to the specific data class can be controlled through the file system implementation.

System data consists of the Superblock, the TMap Array, and the Translation Map. User data consists of the inode table, and the file and directory data. Access to the encrypted data is managed by the file system implementation and controlled through the data class to which the data belongs.

Access to the system data is not as tightly controlled as access to the user data. As discussed in the previous chapters, only a minimum of information is contained within the steganographic file system's metadata structures, particularly to minimise the exposure of the user data, while at the same time providing adequate support for the interaction between the host and hidden file system implementations. This allows the system data to be less tightly controlled, as the exposure of user data will be minimal.

All user data is encrypted with a unique initialisation vector, which will be discussed in detail in the following sections. System data must be available to the host file system in order to facilitate the dynamic reallocation process, which is why there are fewer restrictions on this data class.

In order to clarify the interplay between the particular data classes within the hidden file system, the interactions between the system and user data will be discussed in the following section.

8.5.2 Interactions

The need for specific data classes can be seen by presenting the following scenario. The file system implementation must be able to dynamically reallocate data within the hidden file system when the host file system requests that a specific file system block be written to. In order to maintain security, the host file system must not have access to the user data stored within the file system, yet the encrypted user data must be able to be reallocated when requested.
To achieve this, the host file system requires access to the hidden file system's Translation Map to determine whether a particular file system block contains any hidden file system data. As discussed in the previous chapters, the Translation Map is itself reallocatable; as such the host file system also requires access to the hidden file system's TMap Array in order to locate the exact position of the Translation Map. The Translation Map and TMap Array must therefore be accessible to the host file system.

By keeping a minimum of information within these metadata structures, and ensuring that user data is strongly encrypted, there is only a very limited risk presented to the user data. The worst case is that an attacker can determine the location of the encrypted data, not its content or construction. An inexperienced user should never be aware of the presence of the hidden data.

It is important to note these interactions because of the direct impact which they have on the security of the hidden file system, and to ensure that user data remains strongly encrypted. In the following section we will discuss the encryption hierarchy which is used to ensure that user data will remain secure.

8.6 Encryption Hierarchy

The encryption hierarchy is used to define a security scheme which will ensure that user data remains secure. The process utilises multiple randomly generated initialisation vectors (IVs), on multiple levels, in order to encrypt user data. This process is controlled with the master passphrase which the user specifies when the file system is initialised. This ensures that the exposure of the master passphrase, and therefore the user data, is reduced, which limits the possibility that the passphrase can be brute-forced.1

1 Bruteforcing - the process of extracting a passphrase by trying different possibilities until the passphrase is found.
Usually this will occur about halfway through the key-space.

The encryption hierarchy is formed by using the master passphrase when the file system is initially accessed; this will give access to another unique, randomly generated IV, which will in turn give access to the file system's metadata. This IV can then be used to obtain the IVs for specific user data. The process of using a number of different IVs to secure different aspects of the overall file system will greatly increase the overall security.

This scheme is similar in design to the Derived Unique Key Per Transaction (DUKPT) key management scheme. This allows hidden data to remain secure even if one of the encryption keys is compromised. Although DUKPT is normally used to secure transactions between two parties, the idea can be adapted to allow SSFS to secure hidden data with a set of unique initialisation vectors.

In the following section we will discuss the initialisation vectors in detail with regard to the overall encryption scheme, and their use in accessing various aspects of the file system data.

8.6.1 Initialisation Vectors (IV)

Data which is stored within the hidden file system is encrypted with randomly generated initialisation vectors (IVs). For the purposes of this discussion we assume the use of the Serpent algorithm as discussed in section 3.3.3. The Serpent algorithm uses a 256-bit IV for encrypting data. The IV can range in size from 64 bits to 256 bits, as an IV shorter than 256 bits is padded with zeros so that the IV is always 256 bits in size. An IV of 256 bits gives 2^256 (approximately 1.16 x 10^77) different possible key combinations. This key-space is so large that it is impractical to randomly guess the correct key using modern computers. A unique IV is generated for each item of data which is stored within the hidden file system.
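The zero-padding rule described above can be sketched as follows. This is an illustrative fragment only; the function name and byte-level layout are our own assumptions, not part of the SSFS implementation.

```c
#include <stdint.h>
#include <string.h>

#define IV_BYTES 32  /* 256 bits */

/* Illustrative sketch (hypothetical names): an IV of 8 to 32 bytes
 * (64 to 256 bits) is padded with trailing zeros so that the cipher
 * always receives a full 256-bit IV. */
static void pad_iv(const uint8_t *iv, size_t iv_len, uint8_t out[IV_BYTES])
{
    memset(out, 0, IV_BYTES);   /* zero-fill all 256 bits first        */
    if (iv_len > IV_BYTES)
        iv_len = IV_BYTES;      /* clamp oversized input               */
    memcpy(out, iv, iv_len);    /* copy the supplied IV bytes in front */
}
```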
When the file system is created, the user will specify their master passphrase, which will be transformed into an IV to be used to "unlock" the file system metadata. The inode entries can then be accessed, which will in turn allow the directory structure and user data to be accessed.

As discussed in the above chapters, there is no distinction between directories and files; they are both considered to be forms of user data. When user data is created, a randomly generated IV is stored within the associated inode entry (see listing 6.4 on page 113). This IV is then used to encrypt and decrypt the user data transparently when requested.

The inode table is also encrypted transparently, using the key field within the superblock. Portions of the superblock are encrypted with the master passphrase. These encryption levels add complexity to the overall system, making it very difficult for an attacker to forcibly gain access to the user data, as multiple layers of encryption will have to be overcome.

In order to facilitate the interaction between the host file system and the hidden file system, the Translation Map and TMap Array will remain unencrypted. This is to allow for dynamic reallocation to take place. This does not pose a significant security threat, as to the unaware user these will appear to be blocks of unrelated integers.

As can be seen in figure 8.3, the encryption hierarchy is formed through the interaction of multiple IVs. Firstly the master passphrase is used to access the file system metadata. The superblock IV is then used to access the individual inode IVs. Finally the user data is accessed on the physical device.

Figure 8.3: Initialisation vector hierarchy (the master passphrase unlocks the superblock IV; the superblock IV unlocks the inode IVs in the inode table; each inode IV unlocks its item of user data).

The IV hierarchy can also be described as seen below, where IVmp is the master passphrase IV, and IVsb is the superblock IV.
IV[0,n] is a set of IVs which are used to encrypt and decrypt the user data, UD[0,n], such that IV0 is used to encrypt UD0, and IV1 is used to encrypt UD1. The number of unique items of user data, n, defines the size of the set of randomly generated IVs.

IVmp -> IVsb -> IV[0,n] -> UD[0,n]

The master passphrase is used to access the superblock IV. The superblock IV is then used to access the set of IVs relating to the unique items of user data. This set of IVs is stored in the inode entries for each item of user data.

In the following section we will present an operational scenario in order to explain how different portions of the encryption hierarchy will interact in order to secure the hidden user data.

8.6.2 Operational Scenario

In order to fully discuss the operation of the IVs with regard to the encryption hierarchy, an operational scenario is presented below. When a user initialises the file system, the master passphrase is specified. This is used to encrypt various portions of the file system metadata. The superblock IV, specified by the iv field in the superblock structure (see section 6.1 on page 107), is used to encrypt the inode table and the inode entries. Each inode entry contains a key field (see listing 6.4 on page 113), which is used to encrypt the associated user data.

The user will request operations on the user data, and will always be presented with the unencrypted plaintext form, provided that the correct master passphrase is specified when the hidden file system is initially accessed. When a user requests hidden data, the file system will use the master passphrase to decrypt and access the superblock. The IV which is stored in the superblock will then be used to decrypt the inode table, and finally access the particular inode entry of the data being requested. A particular inode entry contains a unique IV which will be used to decrypt an item of hidden user data.
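The chain of "unlocking" just described, from master passphrase to superblock IV to inode IV to user data, can be illustrated with the following sketch. A one-byte XOR stands in for the Serpent cipher purely to keep the hierarchy visible; all structure and function names here are hypothetical simplifications, not the SSFS implementation.

```c
#include <stdint.h>

/* Illustrative sketch of the hierarchy IVmp -> IVsb -> IV[0,n] -> UD[0,n].
 * XOR is a toy stand-in for the real cipher; names are hypothetical. */
typedef uint8_t iv_t;

struct superblock { iv_t sealed_sb_iv;   };  /* superblock IV, sealed under IVmp */
struct inode      { iv_t sealed_file_iv; };  /* per-item IV, sealed under IVsb   */

static iv_t unseal(iv_t sealed, iv_t key) { return sealed ^ key; }

/* Walk the hierarchy: the master passphrase IV unseals the superblock IV,
 * which unseals the item's IV, which finally decrypts the data byte. */
static uint8_t read_hidden_byte(iv_t iv_mp, const struct superblock *sb,
                                const struct inode *ino, uint8_t enc_byte)
{
    iv_t iv_sb   = unseal(sb->sealed_sb_iv, iv_mp);
    iv_t iv_file = unseal(ino->sealed_file_iv, iv_sb);
    return (uint8_t)(enc_byte ^ iv_file);
}
```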
All the hidden data will therefore be encrypted using a different IV, which will increase the overall complexity of the encryption scheme.

8.7 Performance Concerns

The overall performance of the steganographic file system must be kept in mind, especially concerning the transparent encryption operation. The encryption and decryption process should not consume an excessive amount of time. In the following section we will discuss the transparent encryption operations with regard to various file system operations.

Accessing the file system metadata

Portions of the file system metadata are encrypted with different IVs, such as portions of the superblock and the inode table. In order to access the inode table, it must first be decrypted. All inode entries in the inode table will be encrypted with the same IV. Access to an inode entry will therefore only require a single level of decryption. The encryption of the inode entries will have an impact on the access of the file and directory data, which will be discussed below.

Accessing files and directories

As discussed above, file and directory data is encrypted with a unique IV. This will increase the overall security of the system, as different portions of the user data are encrypted independently of each other. This does not present a large performance impact, as access to a single file or directory will only require one or two accesses to encrypted data.

As discussed in the previous chapters, directory streams contain a linear list of all the objects which they contain, in the form of a human-readable name and an associated inode number. When a directory stream is decrypted, access to this entire list will be given. This will allow the user to navigate the directory structure. Directory streams are not very large in comparison to other forms of user data; as a result, a relatively small amount of time is required to decrypt a directory stream.
Files will form the bulk of the user data which will be stored within the file system. A relatively large amount of time will be required to decrypt a file's data stream, which can present a large performance impact as the size of the file grows. As discussed in the previous chapter, the user data is stored in discretely sized blocks equal to the file system block size. User data which consumes a large amount of space is encrypted and decrypted in these discrete blocks, and therefore portions of the file can be accessed as needed.

The worst case scenario will be when there is access to multiple files and directories in a single operation. This will require the file system to decrypt data from multiple inode entries, which could introduce a large performance impact as the number of files contained in the file system grows larger.

8.8 Summary

In this chapter we covered the following sections:

Security Overview - in which we gave an overview of the security scheme as used by SSFS. This included a discussion on the following concepts:

- Security through Information Hiding - in which we discuss information security which is provided by information hiding.
- Security through Cryptography - in which we discuss information security which is provided by cryptography.

Data Cryptography - where we introduce cryptography as used by SSFS to encrypt data.

- Choice of Algorithm - where we discuss the choice of cryptographic algorithm to be used in SSFS, along with a discussion on the categories used to choose such an algorithm.

Cryptographic Layer - where we discuss the cryptographic layer as an extension to the intermediate-level operations.

- Transparent Encryption - where we discuss the ability of the cryptographic layer to provide transparent encryption and decryption of hidden data.
File System Data Encryption Scheme - in this section we discussed the scheme used by SSFS to classify the type of data which is to be encrypted. This included a discussion on the following concepts:

- Data Classes - which are used to manage how certain forms of hidden data within SSFS are encrypted, namely system data and user data.
- Interactions - where we outline the interactions which have to take place between the host and hidden file systems, which underline the need for specific data classes.

Encryption Hierarchy - in this section we describe the hierarchy which is formed by making access to certain data types dependent on other data types. We discuss the following concepts in this section:

- Initialisation Vectors (IV) - which are used to form the encryption hierarchy by using unique IVs for each type of data.
- Operational Scenario - where we present an operational scenario in order to describe the overall operation of the security scheme.

Performance Concerns - in this section we discuss a number of performance concerns related to the encryption scheme, and the impact which it will have on access to hidden files and directories.

8.9 Conclusion

The security of the data within the steganographic file system plays an important role in ensuring that a user can store data, confident that it will not be compromised. This is achieved through the creation of a security scheme which offers an adequate level of data security while not compromising on the overall performance. This balance is achieved through the use of modern and efficient cryptographic algorithms, and specific methods of encrypting data.

In section 8.2 we gave a security overview, in order to explain how data security is achieved within the steganographic file system. We then went on in section 8.3 to discuss data cryptography, with particular focus on the requirements of a cryptographic algorithm.
In section 8.4 we introduced and discussed the cryptographic layer as an operational layer which is essential to performing transparent data encryption. We went on in sections 8.5 and 8.6 to discuss the data encryption scheme and encryption hierarchy respectively, both of which are used to provide a secure data encryption mechanism. Finally, in section 8.7 we addressed a number of performance concerns regarding the use of data encryption.

Information security through cryptography allows us to confidently hide information within the steganographic file system. In the following chapter we will discuss dynamic reallocation to avoid "collisions" between the hidden and non-hidden data, which forms the basis for the non-duplication ability of the steganographic file system.

Chapter 9

Dynamic Reallocation

9.1 Introduction

In order to avoid duplication of hidden data and thus avoid data collisions, a dynamic reallocation mechanism is introduced, which will give the ability for hidden data to be automatically reallocated as needed by the host file system. The hidden data reallocation mechanism will build upon the host file system's existing write operation, which will check for, and reallocate, hidden data as needed from physical locations on a device.

The purpose of this chapter is to define the dynamic reallocation mechanism used by SSFS in order to avoid collisions between hidden and non-hidden data.

In order to explain the dynamic reallocation process, an overview is presented in section 9.2, in which we will introduce the operational processes. In section 9.3 we discuss the details of the dynamic reallocation process, at which point we will discuss access to various hidden file system structures in section 9.3.1 and the redirection of write operations in section 9.3.2. These two sections allow the host file system to execute the dynamic reallocation functions.
In section 9.3.3 we discuss the hidden data reallocation process, followed in section 9.3.4 by the reallocation categories, which describe how the reallocation process should be handled. Finally, in section 9.3.5 we discuss the sacrificial and preserving operational modes, which are used to control the reallocation process when there are no longer any available unallocated file system blocks.

9.2 Overview

The dynamic reallocation mechanism provides the core functionality to avoid unnecessary data duplication on the hidden file system. In simple terms, the dynamic reallocation mechanism will move hidden data away from physical locations requested by the host file system. The host file system will then have the ability to operate unhindered, and unaware of the underlying hidden file system. The dynamic reallocation mechanisms will interact with the read and write requests of the host file system, by redirecting these requests to a set of reallocation procedures, in order to ensure that hidden data is reallocated in a secure and reliable fashion.

As discussed in previous chapters, the hidden file system utilises a number of different on-disk structures in order to manage the storage of hidden data. The structures which are relevant to the dynamic reallocation process are listed below:

1. The Superblock - records the location of the important on-disk structures for the hidden file system.

2. The TMap Array - records the physical location of the Translation Map.

3. The Translation Map - records the mapping between the hidden file system logical blocks and physical locations.

The design of the hidden file system allows the physical blocks where hidden data exists to be modified without a need to make significant changes to the hidden file system control structures. When a block of hidden data must be reallocated, only a modification to the Translation Map need occur.
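The role of the Translation Map in a reallocation can be sketched as follows. This is a simplified in-memory model with hypothetical names; the actual on-disk layout of these structures is defined in the earlier chapters.

```c
#include <stddef.h>

#define TMAP_FREE ((size_t)-1)  /* sentinel: no mapping */

/* Simplified in-memory sketch of the Translation Map (representation
 * assumed): tmap[logical] holds the physical block where hidden
 * logical block `logical` currently lives. */
static size_t logical_to_physical(const size_t *tmap, size_t entries,
                                  size_t logical)
{
    return logical < entries ? tmap[logical] : TMAP_FREE;
}

/* Reverse search used during reallocation: does any hidden logical
 * block currently map to physical block `phys`? */
static int physical_holds_hidden(const size_t *tmap, size_t entries,
                                 size_t phys)
{
    for (size_t i = 0; i < entries; i++)
        if (tmap[i] == phys)
            return 1;
    return 0;
}
```

Reallocating a block of hidden data then amounts to rewriting a single entry of `tmap`, which is why no other control structure needs to change.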
The logical layout of the data within the hidden file system ensures that the logical blocks allocated to hidden data need not change. The design of the encryption scheme, as discussed in the previous chapter, allows hidden file system blocks to be reallocated without the need to re-encrypt the data. The overall design of the hidden file system supports the dynamic reallocation process, while minimising the amount of data modification needed for a reallocation operation.

In the following section we will discuss other possible collision avoidance techniques. We will then contrast these with the dynamic reallocation mechanism chosen for our solution.

9.2.1 Other Possible Collision Avoidance Techniques

There are other possible solutions which could be used to avoid collisions between the hidden and non-hidden data. We will briefly discuss these possibilities and then discuss why the dynamic reallocation of the hidden data was chosen as the most appropriate solution.

1. Reallocation of the non-hidden data - this would involve reallocating the host file system data in order to allow the hidden data to be written unhindered to the storage device. This is unacceptable, as it would require extensive modifications to the host file system's implementation. This is contrary to the design goals outlined in section 5.4 on page 90, as it would hinder the backward compatibility with the original host file system driver.

2. Utilise a shared storage map - use a single storage map for both the hidden and non-hidden data. This again would require extensive modifications to the host file system implementation, and the map could be used to easily identify physical blocks which contain hidden data.

Both of the above solutions were rejected because they require extensive modifications to the host file system implementation. The host file system implementation would no longer be backward compatible with the original file system implementation.
Dynamic reallocation of the hidden data ensures that the structure of the host file system remains intact; hidden data can then be stored and secured in a way which is detached from the host file system implementation.

In the following section we will present an operational scenario which will demonstrate the principles used in the dynamic reallocation process.

9.2.2 Operational Scenario

In this section we will discuss an operational scenario in order to demonstrate the basic principle of dynamic reallocation. In order for the dynamic reallocation of hidden data to be achieved, a degree of interaction between the hidden file system and the host file system must be introduced. These interactions must be kept to a minimum in order to minimise the exposure of hidden data.

Imagine hidden data which is stored within the hidden file system at some particular physical location, called block H. At some point in time, the host file system implementation will request that its data be stored in block H, which, as stated before, contains hidden data. The host file system is unaware that hidden data exists at that location, and as such considers it a valid block for allocation. To avoid the hidden data being overwritten, and the hidden file system object to which it belongs becoming unusable, the dynamic reallocation mechanism is invoked to move the hidden data out of the way. In order to allow the hidden file system to locate this block, the control structures are updated to reflect the new physical location. This operation allows both the host and hidden file system data to remain intact and usable, without the need to duplicate the hidden file system data.

Recall that in order for the physical location of hidden data to be located, the Translation Map is used to provide a mapping between the logical position within the hidden file system and the physical position on the device.
Also recall that the physical location of the hidden data is not recorded by the host file system; as far as the host file system is concerned, such physical blocks are "available" for allocation. This forms the crux of the dynamic reallocation mechanism: when the host file system requests that a physical block be written to, any hidden data which might be stored in that physical block must be "reallocated" in order to preserve it. It is important to note that hidden data must remain intact after a reallocation has occurred; this is facilitated through the logical positioning of hidden data within the hidden file system.

A process overview is presented in the following section, which will define the basic operational process which is followed by the dynamic reallocation mechanism.

9.2.3 Process Overview

As introduced above, in order for the dynamic reallocation process to take place, interaction between the hidden data and the host file system must be allowed in some regard. The dynamic reallocation process can be summarised into the following steps:

1. Intercept and redirect the host file system's write request.

2. Check to see if the physical block which is requested above contains hidden data. If it does:

(a) Determine a new unallocated physical location where the hidden data can be stored.

(b) Move the hidden data to the new location.

(c) Update the hidden file system's control structures.

3. Allow the host file system to write to the requested location.

As can be seen from the above process, the host file system will require a level of interaction with the hidden data. In order to determine the physical locations where hidden data exists, the host file system will require access to the Translation Map. In order to access the Translation Map, the host file system will require access to the TMap Array. In turn, in order to access the TMap Array, access to the hidden file system's superblock will be required.
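The numbered steps above can be sketched with a toy in-memory model. Here `map[i]` records which hidden logical block (if any) occupies physical block `i`, and `disk[]` stands in for the raw device; all names and the one-character "blocks" are illustrative assumptions, not the SSFS implementation.

```c
/* Toy sketch of the dynamic reallocation steps; names are hypothetical. */
#define NBLOCKS 8

static int  map[NBLOCKS]      = { -1, 3, -1, -1, -1, -1, -1, -1 };
static char disk[NBLOCKS + 1] = "-H------";   /* 'H' marks hidden data */

/* Step 2(a): find a block free in both file systems. */
static int find_free_block(void)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (map[i] < 0 && disk[i] == '-')
            return i;
    return -1;
}

/* Steps 1-3: check the target for hidden data, move it, then write. */
static void checked_write(int phys, char host_byte)
{
    if (map[phys] >= 0) {                /* step 2: block holds hidden data */
        int dest = find_free_block();
        if (dest >= 0) {
            disk[dest] = disk[phys];     /* step 2(b): move the hidden data */
            map[dest]  = map[phys];      /* step 2(c): update the mapping   */
            map[phys]  = -1;
        }
    }
    disk[phys] = host_byte;              /* step 3: the host write proceeds */
}
```

Note that the host write always succeeds at the requested location; only the hidden data moves, which mirrors the priority given to the host file system.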
No other structures are required for dynamic reallocation; remember that the host file system is not concerned with the structure of the underlying hidden data, only the raw data itself.

The design of the hidden file system component allows for data to be referenced independently of its physical location on the device. Remember that all data within the hidden file system is referenced in terms of the logical position of the data, and is mapped to a particular physical location through the use of the Translation Map. This allows the logical references to the hidden files and directories to remain consistent in the hidden Inode Table and the hidden Directory Entries. Any block of hidden data can exist in any physical block on the device, as long as the mapping between the hidden logical blocks and the physical device blocks is valid.

The individual blocks of hidden data are all encrypted independently of each other; this allows hidden data blocks to be moved freely within the physical device, without the need for any encryption or decryption to take place. The security implications with regard to the encrypted hidden data will be discussed in a later section.

In the following section we will discuss each of the operations in this process in detail.

9.3 Operational Details

To describe the entire dynamic reallocation process we will now discuss a number of concepts which will allow hidden data to be safely and securely reallocated. In the following section we will discuss host file system access to the hidden file system control structures to facilitate the reallocation process.

9.3.1 Access to Hidden File System Structures

As discussed above, the dynamic reallocation methods within the host file system will require access to a number of hidden file system structures. The particular structures in question are the Superblock, the TMap Array, and the Translation Map.
The core function of the dynamic reallocation methods is to modify the Translation Map when a hidden block is moved to a different location. As mentioned above, in order to locate the Translation Map on the physical device (as it is itself reallocatable), the TMap Array must be accessed. Likewise, in order to determine the length of the TMap Array, particular fields of the hidden file system's superblock must be accessed. The fields in question are the tmap_start and tmap_length fields, as seen in Listing 6.1 on page 107.

We will now discuss the redirection of the write operations in order to allow the dynamic reallocation mechanism to operate.

9.3.2 Write Redirection

When the host file system requests a write to a particular physical location on the device, it will invoke some function which will actually perform the write operation. For the purposes of the following discussion, imagine that the write function as invoked by the host file system takes the form of a function with the following prototype:

int write(int position, void* buffer, int length);

This write method will accept the physical position where the host data will be written, a memory buffer containing the data, and the length (number of bytes) of the data in the buffer. This method will invoke the kernel's write methods and write the data permanently to the specified physical location on the device. There is enough information contained within this function call to determine if the requested physical location contains any hidden data. This process will be discussed in the following section.

The redirection of the write involves redirecting the execution of the write operation, performing dynamic reallocation of hidden data if required, and then resuming the normal execution of the write operation, as seen in figure 9.1. It is important to note that only the hidden data is reallocated.
When the host file system requests that data is written to the device, it will have priority over the physical location. This is specifically done in order to minimise the processing which must occur when hidden data is to be reallocated, and to keep modification of the host file system implementation to a minimum. In the following section we will discuss the actual redirection of the write operation.

Figure 9.1: Write operation execution redirection

Write Operation Execution Redirection

The host file system's write operation will perform a number of operations which will eventually result in the data being written to the physical device. In order to ensure that hidden data is reallocated away from the requested physical location, another operation must be added to the overall function. This extra operational step will redirect the execution path of the write operation towards the reallocation functions. In order to achieve this, the original write function is modified to execute the reallocation methods; this can be seen in figure 9.2.

As can be seen from figure 9.2, only a very minor modification (see line 6 of the Modified Function) is made to this simplistic write function in order to allow the host file system to perform the reallocation. At this point of the execution, the dynamic reallocation function is considered to be a "black-box" function, in that we are not concerned with the operation of this function. Once the dynamic reallocation process is complete, the write operation will continue with an unhindered write to the physical device.
The purpose of having the dynamic reallocation functionality separate from the host file system's write will be discussed in the following section.

Original Function:

1  int write(dev, pos, buf, len) {
2      int ret = 0;
3      seek(dev, pos);
4
5
6
7
8      // write to the device
9      ret = write(dev, buf, len);
10
11     return ret;
12 }

Modified Function:

1  int write(device, pos, buf, len) {
2      int ret = 0;
3      seek(device, pos);
4
5      // reallocation if needed
6      check_reallocation(device, pos);
7
8      // write to the device
9      ret = write(device, buf, len);
10
11     return ret;
12 }

Figure 9.2: Function modified with reallocation methods

Black-box Reallocation

As introduced above, the dynamic reallocation methods are considered to be a black-box extension to the host file system implementation; this allows for a distinct separation between the host file system implementation and the dynamic reallocation methods. The separation is shown graphically in figure 9.3. The host file system will therefore not have direct access to the hidden file system control structures (which, as discussed above, are used to facilitate the reallocation process). This will limit the exposure of the hidden data, and increase the overall security during the dynamic reallocation process.

The hidden file system control structures which are exposed during this process are not complete enough to extract specific hidden data from the hidden file system. The information which can be obtained from these structures can only expose the presence of hidden data, not its content. Recall that the hidden data, the inode table, and the directory entries are encrypted and not directly exposed during this process. The data encryption, combined with the separation of the host file system implementation and the dynamic reallocation mechanisms, allows for secure reallocation of hidden data, without the worry that the hidden data can be compromised.
Write redirection allows the dynamic reallocation mechanisms to be executed. In the following section we will discuss the hidden data reallocation process, which will allow hidden data to be reallocated as needed when the host file system requests a write to a physical block.

Figure 9.3: Reallocation black-box functions

9.3.3 Hidden Data Reallocation

Hidden data reallocation forms the core of the dynamic reallocation mechanisms. The dynamic reallocation process is used to determine if a physical block contains hidden data and, if needed, move the hidden data it contains to another unallocated physical block. In order to determine if a block contains hidden data, the following process is followed:

1. Search the Translation Map to determine if a logical block maps to the particular physical block.

2. If such a mapping exists:

(a) Determine the Reallocation Category, as this will affect how the reallocation must be handled.

(b) Locate a free physical block which is unallocated in both the host and hidden file systems.

(c) Reallocate the physical block to the new location based on the Reallocation Category; generally the following will occur:

i. Move all the data from the specified physical block to the new location.

ii. Update the mapping in the Translation Map to reflect the new physical location for the physical block.

(d) Update the hidden file system's control structures on the physical device.

3. Continue execution of the write function.

The dynamic reallocation mechanism is deliberately designed to reallocate hidden data. This imparts a level of security to the hidden data in that hidden data is not static; the physical position of the hidden data can change over time.
This obscures the hidden data, making it difficult to determine its exact position, and thus increases the overall data security of SSFS.

The reallocation of hidden data also ensures that when the host file system requests a physical block for non-hidden data, it depends only on the free blocks available to the host file system, and not on the position of the hidden data. This reinforces the separation of functionality between the host and hidden file systems. It is the responsibility of the hidden file system to manage hidden data, and the responsibility of the host file system to manage non-hidden data. This reduces the exposure of the hidden data and thus will increase overall security.

The requirements for locating a free physical block will be discussed in the following section. This will be followed by a discussion of the Reallocation Categories as a method of describing the type of data which is contained in a physical block.

Searching for an Unallocated Block

In order for hidden data to be safely reallocated within the file system, an unallocated block must be located. A physical block must satisfy the following two constraints:

1. Be marked as unallocated in the host file system.

2. Not be mapped to any logical block in the Translation Map.

If these two constraints are not met, then there is a risk of overwriting data in one of the two file systems. These two constraints are derived from the way in which hidden data is stored within the file system. Recall that hidden data is stored in the unallocated blocks of the host file system. The physical blocks utilised by hidden data are not marked as allocated in the storage map of the host file system, only marked as mapped to a hidden logical block in the Translation Map. Therefore both of the above constraints must be satisfied by a physical block in order for it to be considered for reallocation.
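The two constraints can be expressed directly in code. The following sketch assumes a simple boolean storage map for the host file system and a linear Translation Map; both representations are our own illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the two constraints above: a physical block is a valid
 * reallocation target only if the host's storage map shows it as
 * unallocated AND no hidden logical block maps to it. */
static bool block_is_free(size_t phys,
                          const bool *host_allocated,  /* host storage map */
                          const size_t *tmap, size_t entries)
{
    if (host_allocated[phys])
        return false;                 /* constraint 1 violated */
    for (size_t i = 0; i < entries; i++)
        if (tmap[i] == phys)
            return false;             /* constraint 2 violated */
    return true;
}
```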
The method of determining whether these two constraints are satisfied depends on the construction of the storage map of the host file system; however, the ability to search for a free block will be provided by the host file system's implementation. It is then a simple case of comparing the result against the mappings in the Translation Map in order to determine if a block is suitable for reallocation.

In the following section we will discuss the Reallocation Categories as a mechanism for determining the appropriate course of action when reallocating a particular physical block.

9.3.4 Reallocation Categories

The reallocation category for a particular physical block refers to the type of data which exists in that block. The type of data can be described by which logical block is mapped to the particular physical location. Consider figure 9.4, which shows the logical layout of the hidden file system and which reallocation category will be used for different logical blocks. The figure represents the logical layout of the hidden file system where m blocks are allocated to the file system in total, and the Translation Map is n blocks long; therefore the Translation Map ends at block offset n + 1 and the hidden data continues from block offset n + 2.

Figure 9.4: Reallocation categories (hidden logical layout: block 0 holds the superblock, blocks 1 to n + 1 hold the Translation Map, and blocks n + 2 to m hold normal data; m blocks are allocated in total, with a Translation Map of n blocks)

The reallocation of hidden data will fall into the following three categories:

The Superblock category.

The Translation Map category.

The Normal Data category.

The reallocation categories are not to be confused with the hidden file system control structures of the same name. The name of the reallocation category describes the type of data which exists in a particular physical block.
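As a sketch, the category for a given hidden logical block can be derived directly from the block ranges of figure 9.4. The classify_block function and the enum names are illustrative assumptions, not identifiers from the implementation.

```c
#include <stddef.h>

typedef enum { CAT_SUPERBLOCK, CAT_TMAP, CAT_NORMAL } realloc_cat_t;

/* Classify a hidden logical block using the layout described above:
   block 0 is the superblock, blocks 1 to n + 1 are mapped to the
   Translation Map, and blocks n + 2 to m hold normal hidden data,
   where n is the Translation Map length in blocks. */
static realloc_cat_t classify_block(size_t logical, size_t n) {
    if (logical == 0)
        return CAT_SUPERBLOCK;
    if (logical <= n + 1)
        return CAT_TMAP;
    return CAT_NORMAL;
}
```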
Depending on the type of reallocation category, the reallocation of the physical blocks must be handled in slightly different ways.

Superblock Reallocation Category

The Superblock Reallocation Category is used when the host file system requests a write to physical block 0 (the superblock). Remember that physical block 0 contains the host file system's superblock, the hidden file system's superblock, and the TMap Array. When there is a write to physical block 0, no reallocation will take place, if and only if the data to be written to the physical block is equal to the byte size of the host file system's superblock. This category will be used whenever the host file system wants to modify its superblock; as long as the requested write does not overwrite the hidden file system superblock or the TMap Array, there is no need to perform a reallocation.

Translation Map Reallocation Category

The Translation Map Reallocation Category will be used when the host file system requests a write to a physical location which is mapped to the Translation Map. For instance, if the Translation Map is n blocks in length, then when the host file system requests a write to a physical location which is mapped to a logical block in the range 1 → (n + 1), the Translation Map Reallocation Category will be used. Remember that the Translation Map itself is reallocatable, and is located using the TMap Array. When a reallocation of a block allocated to the Translation Map is required, the following must occur:

1. Move the data from the specified physical block to a new location.

2. Modify the logical to physical block mapping for this physical block in the Translation Map (even though the block is allocated to the Translation Map).

3. Update the physical location of the particular block allocated to the Translation Map in the TMap Array.

The above process will ensure that the Translation Map can always be located regardless of the physical blocks which it occupies.
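A minimal sketch of steps 2 and 3, assuming flat arrays for both the Translation Map and the TMap Array (the real structures are more involved, the names are hypothetical, and the data move of step 1 is omitted):

```c
#include <stddef.h>

/* tmap[i] holds the physical location of hidden logical block i, and
   tmap_array[j] holds the physical location of the j-th block of the
   Translation Map. Hidden logical blocks 1 to n + 1 hold the map, so
   logical block i corresponds to TMap Array slot i - 1. */
static void reallocate_tmap_block(size_t *tmap, size_t *tmap_array,
                                  size_t logical, size_t new_phys) {
    /* Step 1: moving the raw block data to new_phys is omitted here. */
    /* Step 2: update the logical-to-physical mapping, even though this
       block is itself allocated to the Translation Map. */
    tmap[logical] = new_phys;
    /* Step 3: update the TMap Array so the map can still be located. */
    tmap_array[logical - 1] = new_phys;
}
```

Keeping the TMap Array consistent in step 3 is what allows the map itself to wander across the device without ever becoming unlocatable.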
Normal Data Reallocation Category

The Normal Data Reallocation Category is used when the requested physical location is mapped to a logical block which is occupied by the hidden file system's Inode Table, Directory Entries, or File Data. For instance, if the hidden file system consists of m logical blocks, and the Translation Map is n blocks in length, then any physical block which maps to a logical block in the range (n + 2) → m will fall into this category. This category will be used the most, as the bulk of data in the hidden file system falls into this range. In order to reallocate data within this category, the following must occur:

1. Move the data from the specified physical block to the new location.

2. Modify the logical to physical mapping for this physical block in the Translation Map.

The above process will ensure that normal hidden data can be located anywhere on the physical device through the Translation Map. In the following section we will discuss both sacrificial and preserving modes, as a mechanism for handling how the dynamic reallocation mechanism will behave if there are no longer any free physical blocks available for reallocation.

9.3.5 Sacrificial versus Preserving

A problem that will arise with reallocating hidden data to the unallocated blocks of the host file system is that eventually the number of available unallocated blocks will become exhausted. This arises from the fact that as hidden data is moved, the physical block becomes allocated by the host file system in order to store its own data. This is an inevitable side-effect of embedding the hidden data within the host file system. In order to prevent the hidden data from conflicting with the non-hidden data, there are two allocation modes, presented below, namely sacrificial and preserving. These two modes define how the reallocation mechanism should behave if the unallocated blocks within the file system become exhausted.
Either one of these modes will come into play, depending on the preference of the hidden file system, as specified in the flags field of the hidden file system superblock (see listing 8, on page 107). In the following sections we will discuss the sacrificial and preserving modes.

Sacrificial Mode

The policy for allocating blocks in this mode is to give priority to the host file system's data. In this mode hidden data will be overwritten in favour of the non-hidden data. This will result in the hidden file system being destroyed. This can be useful if the hidden data is to be stored only for a limited period of time, or data is to be hidden for "one-time" use only. The hidden file system will destroy itself as the number of allocated blocks in the host file system increases. This will usually not be the ideal mode of operation, as hidden data will usually be required to be stored for a greater period of time. In the following section we present the Preserving Mode as a method of allocating blocks to the host file system so as to ensure that hidden data is never lost.

Preserving Mode

Preserving Mode will give priority to data stored in the hidden file system. When the number of unallocated blocks available to the host file system falls to the number of blocks allocated to the hidden file system, this allocation mode will be enforced. When this point is reached, no more writes to the physical device by the host file system will be allowed. This will ensure that the hidden data remains intact, which will generally be the preferred method of operation.

In order to minimise the possibility that either of the two allocation modes discussed above will come into play, the maximum number of blocks which the hidden file system can consume must be kept to a minimum. These policies will only ever come into play as the host file system becomes very full (which is generally not the case on most computer systems).
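The two modes can be sketched as a single predicate consulted before the host file system allocates a fresh physical block; host_write_allowed and its parameters are illustrative assumptions, not part of the SSFS interface.

```c
#include <stddef.h>
#include <stdbool.h>

typedef enum { MODE_SACRIFICIAL, MODE_PRESERVING } alloc_mode_t;

/* In sacrificial mode the host always wins, so hidden data may be
   overwritten. In preserving mode, host writes that need a fresh
   block are refused once the host's unallocated blocks are down to
   the number of blocks the hidden file system occupies. */
static bool host_write_allowed(alloc_mode_t mode,
                               size_t host_unallocated,
                               size_t hidden_blocks) {
    if (mode == MODE_SACRIFICIAL)
        return true;
    return host_unallocated > hidden_blocks;
}
```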
However, by keeping the number of available blocks for the hidden file system down, the likelihood of these allocation modes coming into play will be kept to a minimum.

9.4 Summary

In this chapter we covered the following sections:

Overview - where we introduce the dynamic reallocation process. We cover the following concepts:

- Operational Scenario - in this section we provide an operational scenario in order to demonstrate the dynamic reallocation process.

- Process Overview - in this section we give an overview of the dynamic reallocation process, by explaining the generic process used to perform the dynamic reallocation of hidden data.

Operational Details - where we discuss the operational details for the dynamic reallocation process. We discuss the following concepts:

- Access to Hidden File System Structures - in this section we discuss the need for the host file system to access the hidden file system control structures. This allows the host file system to determine if a particular physical block contains hidden data.

- Write Redirection - where we discuss how the dynamic reallocation process is started. This is achieved by modifying the host file system's write operation to invoke the reallocation methods.

- Hidden Data Reallocation - in this section we fully discuss the dynamic reallocation process.

- Reallocation Categories - in this section we discuss how the dynamic reallocation process will handle certain types of hidden data. This allows hidden data to be reallocated in the correct way depending on which category it falls into.

- Sacrificial versus Preserving - where we discuss how the dynamic reallocation mechanism will operate if there are no longer any free physical blocks available for reallocation.

9.5 Conclusion

In this chapter we discussed the concept of dynamic reallocation as a mechanism for the secure reallocation of hidden data by the host file system.
This provides a mechanism for the host file system to avoid collisions between hidden and non-hidden data. We introduced the concept of dynamic reallocation with an overview of the reallocation process in section 9.2. We then went on in section 9.3 to discuss the operational details. We discussed all aspects of the reallocation process, including write operation redirection in section 9.3.2, the hidden data reallocation process in section 9.3.3, and the reallocation categories in section 9.3.4.

In the following chapter we will address the performance impact which the dynamic reallocation mechanism will have on the operation of the host file system. By analysing this impact we can make judgements concerning the feasibility of a steganographic file system.

Chapter 10

Steganographic File System Performance

10.1 Introduction

The performance of the steganographic file system will impact on the overall feasibility of such a system. If the overall performance impact is too great, then the existence of the hidden data can be betrayed, and the security of the data will be jeopardised. In order to analyse the performance impact of the steganographic file system, we will need to consider a number of different factors, for both the hidden file system and the host file system. In this chapter we will consider the factors impacting performance for the hidden file system, in section 10.2, and the host file system, in section 10.3.

The results presented in this chapter were obtained through experimentation with our implementation of SSFS. This implementation of SSFS allowed us to draw a number of conclusions concerning the performance and feasibility of the overall system.

The major concern for hidden file system performance is the impact which file fragmentation will have on the storage and retrieval of hidden data; these concerns can be negated by utilising an appropriate physical device, as will be discussed in the following sections.
Host file system performance is most significantly impacted by the dynamic reallocation methods which are used to avoid data collisions and data duplication. An analysis of this impact will be discussed in the following sections.

10.2 Hidden File System Performance

In the following section we will discuss the performance considerations for the hidden file system. The only real consideration is that of file fragmentation. As a side-effect of the dynamic reallocation process, file fragmentation will inevitably impact hidden file system performance; this, however, is only a consideration on traditional hard disk drives. File fragmentation is the most significant external factor which will impact hidden file system performance and will be discussed below.

10.2.1 Hidden Data Fragmentation

One of the factors which will have the greatest impact on the performance of the steganographic file system is file fragmentation. Ideally file data should be allocated contiguously, in adjacent physical blocks, in order to minimise the movement of the read/write heads. The greater the degree of file fragmentation, the more movements the read/write heads will have to make in order to access the hidden data. This so-called seek time 1 can greatly impact the performance of reading and writing hidden data.

Hidden file system fragmentation will increase more rapidly than on a normal file system implementation, as the dynamic reallocation mechanism is required to move hidden data to alternate physical locations as the host file system requires the physical blocks. This increased possibility of file fragmentation can impact the access time of large hidden files, as the read/write heads will be required to make more movements across the magnetic platter to read the data into primary memory. File fragmentation will only impact a hidden file system which is stored on a hard disk drive or removable diskette.
Fragmentation is not a consideration for storage devices which use "flash memory", such as a USB Flash Drive or a Solid State Disk; access to the physical blocks on these devices is controlled electrically, and therefore file fragmentation, and the associated seek time, is not a factor. The time required to access data stored on a physical block of a flash memory module is constant, and is usually achieved in a few microseconds.

This property of flash memory makes it especially well suited to SSFS. By using a USB Flash Drive as the underlying physical medium for the hidden file system, the performance impact brought about by file fragmentation is eliminated.

1 Seek time - the amount of time taken for the read/write heads to move to the correct position on the platter to read or write data.

In the following section we will discuss the impact of the dynamic reallocation methods on the host file system.

10.3 Host File System Performance

The performance of the host file system is most significantly impacted by the dynamic reallocation methods which were introduced to ensure there are no collisions between the hidden and non-hidden data. In order for SSFS to be feasible, the impact on the host file system must be kept to a minimum. In order to quantify the impact of the dynamic reallocation methods on the host file system, we will discuss both an optimised and an unoptimised implementation in the following sections. This will allow the efficient reallocation of hidden data and ensure the feasibility of such a system.

10.3.1 Dynamic Reallocation Performance

The dynamic reallocation mechanisms introduce a performance impact on the host file system implementation. Recall that for every write operation to the physical device, the physical block which is to be written to must be checked to see if it contains hidden data which must be reallocated.
Searching the Translation Map constitutes the largest performance penalty; however, this can be minimised by implementing the Translation Map as a more efficient data structure, such as a Red-Black tree.

Figure 10.1 shows the effect which dynamic reallocation has on the host file system. These results were generated through interaction with SSFS using a virtual machine running Ubuntu 7.10 with a 2.6GHz CPU. An empty 100MiB disk image was used in each case. A number of files of random size were created on the host file system, and the time taken to create these files was recorded. In each case the graph shows the amount of time required to allocate a number of files. The line indicated in blue shows the original performance of the host file system, allocating 2000 files of random size in approximately 0.5 seconds. The line indicated in black shows the amount of time taken when an unoptimised version of the dynamic reallocation methods is used, where the time taken to allocate 2000 files was approximately 4 seconds. This results from a linear search of the Translation Map, with a worst-case run-time of O(n). Recall that the Translation Map stores the logical to physical mapping as a set of paired values: an entry for the hidden logical address and a corresponding entry for the physical address.

This linear increase in time is clearly unacceptable, as the performance impact of the dynamic reallocation mechanism will betray the presence of the hidden file system. In order to improve on this result, an optimised version of the dynamic reallocation methods was introduced. This version utilises a Red-Black tree, indexed by the physical location on the device. This allows the Translation Map to be searched in a worst-case run-time of O(log n). This optimisation results in a dramatic improvement of the overall file system performance, taking only 0.8 seconds to allocate 2000 files.
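The difference between the two lookup strategies can be sketched as follows. A sorted array with binary search is used here as a stand-in for the Red-Black tree (both give O(log n) lookups); none of these names come from the SSFS sources.

```c
#include <stddef.h>

/* One Translation Map entry: a hidden logical block and the physical
   block it currently occupies. */
typedef struct { size_t phys; size_t logical; } tmap_entry_t;

/* O(n) scan over unordered entries, as in the unoptimised searchTmap. */
static const tmap_entry_t *search_linear(const tmap_entry_t *e, size_t n,
                                         size_t phys) {
    for (size_t i = 0; i < n; i++)
        if (e[i].phys == phys)
            return &e[i];
    return NULL;
}

/* O(log n) binary search over entries kept sorted by physical address,
   a sorted-array stand-in for the Red-Black tree, which is likewise
   indexed by the physical location on the device. */
static const tmap_entry_t *search_log(const tmap_entry_t *e, size_t n,
                                      size_t phys) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (e[mid].phys == phys)
            return &e[mid];
        if (e[mid].phys < phys)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

A Red-Black tree has the additional advantage over a sorted array that insertions and deletions, which occur on every reallocation, are also O(log n).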
The effect of this optimisation will be discussed fully in the following section.

Figure 10.1: Optimised versus unoptimised dynamic reallocation (time in seconds to create up to 2000 files: with dynamic reallocation, unoptimised; with dynamic reallocation, optimised; and without dynamic reallocation)

These results indicate that the performance impact of the dynamic reallocation mechanisms on the host file system is small enough to warrant the use of a hidden file system. As the unoptimised case shows, the presence of the hidden file system, and thus the hidden data, could be betrayed by the performance impact. However, by optimising the dynamic reallocation methods, the hidden data can be stored without impacting on the overall host file system performance.

In the following section we will discuss the code profiles for the host file system; this will allow us to observe how the hidden file system implementation impacts on the operation of the host file system. Again these profiles were obtained through experimentation with our implementation of SSFS.

10.3.2 Dynamic Reallocation Code Profiles

In order to analyse the impact of the dynamic reallocation methods on the host file system, consider the code profiles for the creation of 2000 randomly sized files, as seen in figure 10.2 on page 191 and figure 10.3 on page 192. These profiles will be discussed in detail below. These profiles were created using gprof (the GNU Profiler) [20]. This allows us to examine each function call to determine which function is taking the longest to execute. We can thus determine the impact of the dynamic reallocation mechanism on the host file system implementation.

Unoptimised Code Profile

Figure 10.2 shows the amount of time taken for each function call when an unoptimised version of the dynamic reallocation mechanism is in use.
The greatest impact is from the searchTmap function (indicated in red), which is responsible for searching the Translation Map for a particular logical-to-physical mapping. This profile reveals that this function consumes 93 percent of the total running time for this particular application. This is confirmed when examining the graph presented in figure 10.1, which shows the total running time required to allocate a particular number of files. This single function is responsible for the observed 800% increase in running time. This performance impact is unacceptable, as normal operation of the host file system will be impeded, which is contrary to the stated design goals.

The only other dynamic reallocation function which has a significant impact on the host file system is the hfs_write_pos function (indicated in blue). This function is responsible for writing both the non-hidden and reallocated hidden data to the physical disk. This function does not have a significant impact on the overall host file system performance (determined by examining the total execution time for this function); therefore, if the searchTmap function can be optimised, the existence of the hidden file system can become feasible. In the following section we will discuss the improved implementation of the dynamic reallocation mechanisms.

Optimised Code Profile

Figure 10.3 shows the amount of time taken for each function call when the searchTmap function has been optimised. The overall situation has been dramatically improved. The searchTmap function no longer consumes almost all of the total running time. The optimisation consisted of modifying the searchTmap function to utilise a Red-Black tree to search for a particular logical to physical mapping, and as can be seen from figure 10.3, it no longer has such a large impact. The hfs_write_pos function (indicated in blue) is found just below the searchTmap function, and has an unchanged run time.
This optimised code, as graphed in figure 10.1, introduces only a 0.3 second increase in running time, which is a significant improvement over the previous situation. This overhead is not detrimental to the overall performance of the host file system, which allows for the existence of the hidden file system. By introducing this optimisation into the dynamic reallocation methods, there is an observable improvement in overall performance. By analysing the impact of the dynamic reallocation methods on the run-time of the host file system, the feasibility of embedding hidden data within the host file system is ensured. This will allow hidden data to be securely stored within the host file system without impacting on the overall file system performance.

10.4 Summary

In this chapter we discussed the following concepts:

Hidden File System Performance - in which we introduce the performance impacting factors for the hidden file system. We discuss the following concept:

- Hidden Data Fragmentation - where we discuss file fragmentation with regard to its impact on the hidden file system.

Host File System Performance - in this section we discuss how the storage of hidden data will impact on the operation of the host file system. We discuss the following concepts:

- Dynamic Reallocation Performance - in this section we analyse the performance of the dynamic reallocation methods by examining their impact on the host file system.

- Dynamic Reallocation Code Profiles - where we analyse the code profiles for the host file system in order to examine the impact of the dynamic reallocation mechanism.

  %   cumulative   self               self     total
 time   seconds   seconds    calls   ms/call  ms/call  name
97.65      1.66      1.66    63609      0.03     0.03  searchTmap
 0.59      1.67      0.01   447447      0.00     0.00  delete_from_list
 0.59      1.68      0.01   210373      0.00     0.00  cache_block_io
 0.59      1.69      0.01     2795      0.00     0.00  read_into_ents
 0.59      1.70      0.01     1311      0.01     0.18  flush_ents
 0.00      1.70      0.00  1121200      0.00     0.00  atomic_add
 0.00      1.70      0.00   534686      0.00     0.00  hash_lookup
 0.00      1.70      0.00   467114      0.00     0.00  block_lookup
 0.00      1.70      0.00   419043      0.00     0.00  add_to_head
 0.00      1.70      0.00   210373      0.00     0.00  system_time
 0.00      1.70      0.00   147197      0.00     0.00  compare_vnode
 0.00      1.70      0.00   141987      0.00     0.00  release_block
 0.00      1.70      0.00   141222      0.00     0.00  get_block
 0.00      1.70      0.00   114754      0.00     0.00  mark_blocks_dirty
 0.00      1.70      0.00    69528      0.00     0.00  hash_delete
 0.00      1.70      0.00    69528      0.00     0.00  hash_insert
 0.00      1.70      0.00    69528      0.00     0.00  new_hash_ent
 0.00      1.70      0.00    68384      0.00     0.00  cached_write
 0.00      1.70      0.00    68384      0.00     0.00  write_blocks
 0.00      1.70      0.00    67310      0.00     0.00  file_pos_to_disk_addr
 0.00      1.70      0.00    66384      0.00     0.00  acquire_sem
 0.00      1.70      0.00    66384      0.00     0.00  release_sem
 0.00      1.70      0.00    57945      0.00     0.00  hfs_write_pos
 0.00      1.70      0.00    49546      0.00     0.00  GetFreeRangeOfBits
 0.00      1.70      0.00    42543      0.00     0.00  update_inode
 0.00      1.70      0.00    37253      0.00     0.00  myfs_allocate_blocks
 0.00      1.70      0.00    37253      0.00     0.00  real_allocate_blocks
 0.00      1.70      0.00    29428      0.00     0.00  add_to_tail

Figure 10.2: Code profile of unoptimised dynamic reallocation

  %   cumulative   self               self     total
 time   seconds   seconds    calls   us/call  us/call  name
30.00      0.06      0.06     1127     53.24   153.91  write_rand_data
20.00      0.10      0.04   208026      0.19     0.43  cache_block_io
15.00      0.13      0.03   530494      0.06     0.06  hash_lookup
 5.00      0.14      0.01   445902      0.02     0.02  delete_from_list
 5.00      0.15      0.01   416457      0.02     0.02  add_to_head
 5.00      0.16      0.01   113100      0.09     0.14  mark_blocks_dirty
 5.00      0.17      0.01    71116      0.14     0.14  hash_insert
 5.00      0.18      0.01    20459      0.49     5.60  myfs_write_data_stream
 5.00      0.19      0.01    18459      0.54     6.15  sys_write
 5.00      0.20      0.01     2940      3.40     3.40  get_ents
 0.00      0.20      0.00  1110314      0.00     0.00  atomic_add
 0.00      0.20      0.00   821605      0.00     0.00  intcmp
 0.00      0.20      0.00   461381      0.00     0.06  block_lookup
 0.00      0.20      0.00   208026      0.00     0.00  system_time
 0.00      0.20      0.00   149654      0.00     0.00  compare_vnode
 0.00      0.20      0.00   140255      0.00     0.10  release_block
 0.00      0.20      0.00   139494      0.00     0.43  get_block
 0.00      0.20      0.00    71116      0.00     0.00  hash_delete
 0.00      0.20      0.00    71116      0.00     0.00  new_hash_ent
 0.00      0.20      0.00    67769      0.00     0.43  cached_write
 0.00      0.20      0.00    67769      0.00     0.43  write_blocks
 0.00      0.20      0.00    66233      0.00     0.21  file_pos_to_disk_addr
 0.00      0.20      0.00    65769      0.00     0.00  acquire_sem
 0.00      0.20      0.00    65769      0.00     0.00  release_sem
 0.00      0.20      0.00    63359      0.00     0.00  searchTmap
 0.00      0.20      0.00    54949      0.00     0.00  hfs_write_pos
 0.00      0.20      0.00    48357      0.00     0.00  GetFreeRangeOfBits
 0.00      0.20      0.00    42045      0.00     0.68  update_inode

Figure 10.3: Code profile of optimised dynamic reallocation

10.5 Conclusion

The performance of the steganographic file system will play an important role in the overall ability to securely store hidden data. The performance impact on the host file system due to dynamic reallocation must be kept to a minimum in order to allow the presence of the hidden data to be kept secure. The above chapter analysed the impact of the dynamic reallocation methods on the host file system, and demonstrated the feasibility of such a system.

This chapter clearly demonstrates that hidden data can be embedded within a host file system without incurring a major performance impact. The design and implementation of the hidden file system as outlined in the above chapters allows for the dynamic reallocation mechanism to be effectively applied, and minimises the performance impact on the host file system. The impact of the dynamic reallocation methods on the overall host file system operation allows for the operation of a steganographic file system. This will allow hidden data to be stored with confidence that it will not be discovered through an impact in performance.

In the above chapter we discussed factors which will impact the performance of the hidden and host file systems. In section 10.2 we discussed the performance concerns regarding the hidden file system, giving special attention to hidden file fragmentation. This was followed by a discussion of the factors impacting the host file system in section 10.3.
Special concern was given to the dynamic reallocation methods as a performance impacting factor, as they play a major role in the operation of the steganographic file system.

Chapter 11

Conclusion

11.1 Introduction

In this chapter we will outline the content of this dissertation; this will allow us to look forward and examine areas of future research. Firstly, in section 11.2 we outline the contribution of each chapter of this dissertation. We go on in section 11.3 to discuss the contribution of SSFS to the field of information hiding. Finally, we discuss areas of future research in section 11.4.

11.2 Contribution

In this section we will outline the contribution of each of the chapters of the dissertation.

Chapter 1 serves as an introduction to this dissertation. The chapter briefly outlined the problem statement, introduced the secure steganographic file system (SSFS), and presented an overview of what was to follow.

Chapter 2 serves to introduce the concepts relating to hard disk drives and traditional file systems. The concepts which are introduced in this chapter are used throughout chapters 5-9. The interaction between the disk and the file system is the most important concept discussed in this chapter, as it laid the framework for the following chapters.

Chapter 3 introduces cryptography as a mechanism for providing information security. This chapter serves to provide a holistic overview of the many different cryptographic techniques. However, the most relevant of the discussed aspects are those of symmetric cryptosystems and block cipher modes. These two concepts are referred to extensively in chapter 8 to describe the security scheme for SSFS.

Chapter 4 discusses steganography and steganographic file systems. Most notably, this chapter outlines a number of domain-specific steganographic terms which are used frequently throughout the following chapters.
Another important concept introduced in this chapter is the distinction between cryptographic and steganographic file systems. This chapter also outlines a number of steganographic file system implementations which are referred to in later chapters.

Chapter 5 is the first of the chapters which describe the implementation of the secure steganographic file system (SSFS). This chapter outlines problems with existing implementations, our aims for SSFS, and the basic construction of such a file system. Important in this chapter is the definition of the relationship between the hidden and host file systems, and the distinction between the logical and physical views of the device. This chapter outlines the framework and concepts for SSFS, which are used extensively in later chapters.

Chapter 6 discusses the control structures which are used by the hidden file system component of SSFS in order to support the storage and retrieval of data. The control structures which are discussed in this chapter are the Superblock, the TMap Array, the Translation Map, the Inode Table, the Directory Entries, and the File Streams. The initialisation of the above-mentioned structures in relation to the host file system is then discussed. The control structures discussed in this chapter are used extensively throughout the remaining chapters to describe almost every component of SSFS.

Chapter 7 discusses the hidden file system operations as a mechanism for interacting with the hidden data. This chapter defines a framework which makes extensive reference to the structures discussed in chapter 6. This chapter outlines the operational layers as used by SSFS, which are used to control the storage and retrieval of the hidden data. These operational layers are of importance, as they are used in chapter 8 to allow for transparent encryption of the hidden data.

Chapter 8 defines the security scheme for SSFS, and makes extensive use of the cryptographic concepts discussed in chapter 3.
This chapter makes the important distinction between information security through information hiding and information security through data encryption, both of which are used in SSFS to provide a holistic security scheme.

Chapter 9 discusses the dynamic reallocation mechanism, which provides the core of the "non-duplication" functionality of SSFS. This chapter makes extensive reference to the structures described in chapter 6, and makes continuous use of concepts discussed in previous chapters. The dynamic reallocation mechanism allows hidden data to be reallocated to any physical block as needed by the host file system. This mechanism has a significant impact on the performance of the overall system, which is addressed in chapter 10.

Chapter 10 addresses the performance of SSFS, specifically the performance impact of the dynamic reallocation mechanism on the host file system. This chapter confirms that such a system is feasible, in that there is an acceptable impact on the host file system.

In the following section we will discuss the wider contribution of SSFS as a mechanism for providing information security.

11.3 Contribution of SSFS

Steganographic file systems provide a unique way to ensure information security. This is achieved through the use of both cryptography and steganography in a single environment, which allows for the convenient storage and retrieval of multiple items of hidden data.

The greatest problem which plagues all steganographic file systems is how to handle the interaction between the hidden and non-hidden data. A commonly used approach is to store multiple copies of the steganographic content to avoid it being overwritten at some stage by non-hidden data. There is, however, no guarantee that hidden data will remain intact.
SSFS's use of the dynamic reallocation mechanism allows hidden data to be stored in a manner which ensures that it will not be overwritten, while still allowing the host file system to operate normally. Furthermore, the ability of SSFS to locate hidden data regardless of the underlying physical layout of that data provides an interesting organisational mechanism which could be extended to support many different types of file system. Steganographic file systems, including SSFS, are subject to abuse through the storage of illegal data. The methods and mechanisms used by SSFS presented throughout this dissertation can be used to enable forensic examiners to develop mechanisms to detect hidden data. In the following section we will discuss the future improvements which could be made to SSFS. 11.4 Future Work Multi-user environment SSFS at present only allows a single user to hide data using their master passphrase. The introduction of a multi-user environment would allow multiple users to hide data within SSFS. "Multi-user" need not imply multiple individuals utilising SSFS, only that multiple passphrases exist which could be used to access different sets of hidden data. This would allow for a greater level of security, as data could be hidden in one of many different sets of files, thus making the detection of the hidden data even more complex. File permissions Once access to the hidden file system component has been granted, SSFS does not control user access to files and directories; this is an extension of the single-user environment currently in use. The implementation of user permissions would add an extra security layer when used in conjunction with the multi-user environment mentioned above. Optimisation of the Translation Map structure The Translation Map structure is relatively simple in nature: a linear list of logical-to-physical mappings. 
Utilising an optimised structure, such as a B-tree, would improve the overall performance of SSFS. Forensic examination of the hidden file system The question of forensic examination of a steganographic file system remains largely unexamined. Detection and examination of steganographic file systems would allow forensic examiners to reliably determine the existence of steganographic content, and then apply other conventional methods to obtain the associated passphrases. This area of research would prevent abuse of steganographic file systems by enabling forensic examiners to detect steganographic content. The use of a journal to ensure data consistency The consistency of hidden and non-hidden data following the incorrect unmounting of the file system is a specific area for further research and improvement. Structures such as a file system journal would guarantee that both hidden and non-hidden data is always available regardless of the state of the storage media. Revise placement of the TMap Array The TMap Array allows the Translation Map to be located from any physical location of the storage device. The current placement of this structure could adversely affect the overall system. It would be advantageous to research methods to remove or revise the TMap, which would allow for further refinement of the overall system. 11.5 Conclusion This chapter served to reflect back upon the content outlined within this dissertation. We discussed the contribution each of the preceding chapters made to the whole dissertation. We then went on to discuss the contribution of SSFS to the field of information hiding. Finally, we discussed a number of different areas of future research. Appendix A SSFS Implementation A.1 Introduction In this appendix we will discuss the technical aspects of our implementation of SSFS. Our implementation of SSFS was constructed with the C programming language, using both Linux and MacOS X machines. 
The C programming language was chosen because of its suitability to an implementation of this type. Linux and MacOS X were chosen because both provide UNIX-type environments. Both the Linux kernel and MacOS X's Darwin kernel are open and well documented. Recall that SSFS is constructed as a "compound file system", which contains both a host and a hidden file system. In the following sections we will discuss the choices for the design and implementation of SSFS. A.2 Host File System A simple host file system was chosen for our implementation of SSFS in order to provide a platform for testing and debugging. Giampaolo, in his book Practical File System Design with the Be File System [23], describes such a simple file system, and provides what he calls the File System Construction Kit (FS-Kit), available online from Giampaolo's website: http://www.letterp.com/~dbg/. FS-Kit provides a complete file system implementation which operates on a disk image in userspace. This provides the perfect platform for experimentation, as it allows for testing and debugging in a convenient manner. FS-Kit also provides an effective analogue for a kernel-level file system, as all of the internal workings are identical to those of a normal kernel file system implementation, except that it is implemented as a set of userspace utilities. FS-Kit provides a number of userspace utilities to interact with the file system; these are listed below: makefs - this utility is used to initialise the file system. fsh - this is used to access a file system shell in order to interact with the file system implementation. tstfs - this is used to perform "stress" tests on the file system. In the following section we will discuss the implementation of the hidden file system. A.3 Hidden File System For the hidden file system implementation we extensively added to FS-Kit to allow steganographic content to be embedded. 
Although FS-Kit was used as the host file system, the hidden file system implementation was written "from scratch" and then integrated within the host file system. To embed the hidden file system into the host file system implementation, very few modifications had to be made to the FS-Kit implementation. These modifications were only used to access the hidden file system initialisation and dynamic reallocation routines. The bulk of the host file system source code remained unaltered in order to allow for backward compatibility. A number of userspace utilities were created in order to interact and experiment with the hidden file system component of SSFS. These utilities are listed below. makehfs - this utility contains a modified version of makefs (discussed in the previous section) which will initialise both the host and hidden file systems. hsh - this utility provides a dedicated shell used to interact with the hidden file system component. The shell provides a number of commands which allow the user to create and modify files and directories. As discussed above, the hidden file system component is a complete file system implementation. The positions of the hidden file system's structures and files are determined through interaction with the host file system's on-disk structures. Once the hidden data has been positioned on the physical disk, the dynamic reallocation mechanism will perform the necessary reallocations when the host file system requests a write to a particular physical block. The hidden file system implementation manages all aspects of storage and retrieval of the hidden data; this includes the translation between the logical and physical locations of the hidden data. Recall that the logical-to-physical mappings are handled through interaction with the Translation Map, which is implemented within the hidden file system. 
In the following section we will discuss the hidden file system creation utility and the hidden file system command shell, which is used to interact with the hidden file system. The SSFS Creation Utility (makehfs) The makehfs utility is used to create and initialise both the host and hidden file systems. The makehfs utility will first create the host file system using the FS-Kit makefs utility, and then initialise and embed the hidden file system. The limits of the hidden file system are determined during its creation, depending on the overall size of the host file system. This allows a reasonable amount of space to be reserved for the steganographic content, without significantly consuming the space available to the host file system. In the following section we will discuss the hidden command shell, which allows the user to interact with the hidden file system. The Hidden Command Shell (hsh) The hidden command shell is used to provide a seamless interface between the hidden data and the user. The transparent encryption and decryption of the hidden data is managed by the hidden file system implementation, through the hidden shell. The command shell will also allow the user to access hidden data once it has been reallocated on the physical device. The command shell provides the following user commands: ls - displays a directory listing. pwd - displays the current working directory. mkdir - creates a new directory. rmdir - removes an existing directory. cd - allows the user to change the current working directory. touch - creates a new file, with size 'zero'. appendr - appends a specified number of random bytes to an existing file. cat - displays a 'hexdump' of an existing file. rm - removes an existing file. crandom - creates a file with random size and which contains random data. tstfs - performs a "stress test" on the hidden file system. quit - terminates the hidden shell. 
In the following section we will present a number of screenshots to demonstrate the operation of SSFS. A.4 Screenshots In this section we present a number of screenshots which are used to demonstrate SSFS in operation. A description of the screenshots is presented below. The screenshots depict SSFS compiled with debugging information included, which allows the operation of SSFS to be examined as it runs. 1. Figure A.1 on page 209 - this figure shows the operation of the makehfs utility. Firstly the host file system is initialised, followed by the initialisation of the hidden file system. As can be seen in this figure, the limits of the hidden file system are determined by this utility. 2. Figure A.2 on page 209 - this figure shows the operation of the hsh utility, which allows the user to interact with the hidden file system. It is implemented as an interactive shell which accepts UNIX-style commands. 3. Figure A.3 on page 210 - this figure demonstrates all the commands which are available for the user to interact with hsh. 4. Figure A.4 on page 210 - this figure demonstrates a number of commands within hsh. Firstly the ls command is used to obtain a directory listing. The mkdir command is then used to create a new directory. Finally, the ls command is used to display the new directory in the directory hierarchy. 5. Figure A.5 on page 211 - this figure demonstrates the creation of a file within the hidden file system. Firstly the cd command is used to change into a new directory. The touch command is then used to create a new file with size 'zero'. The file is then populated with random data using the appendr command. Finally, a directory listing is obtained with the ls command. 6. Figure A.6 on page 211 - this figure demonstrates the ability to display the contents of a file. A new file is created and 100 bytes of random data are appended to it. The content of the new file is then displayed using the cat command. 7. 
Figure A.7 on page 212 - this figure demonstrates the removal of a file. The rm command is used to remove a file from the current working directory. 8. Figure A.8 on page 212 - this figure demonstrates the removal of a directory. The rmdir command is used to remove a directory from the hidden file system. As can be seen, only an empty directory can be removed. If the directory is not empty, an error is displayed. 9. Figure A.9 on page 213 - this figure demonstrates FS-Kit's fsh utility. This utility provides a shell for the host file system, and it is used to access non-hidden data. As can be seen from this figure, the fsh utility has been modified to provide access to some of the hidden file system's control structures. 10. Figure A.10 on page 213 - this figure demonstrates the dynamic reallocation mechanism operating within the fsh utility. As can be seen from this figure, as the host file system attempts to write to a number of physical blocks which contain hidden data, the hidden data is then reallocated. This figure also shows the identification of the Reallocation Categories, and the modification of the Translation Map and TMap Array. A.5 Conclusion The utilities presented in this appendix constitute a working prototype of SSFS, which gives us the ability to experiment and obtain meaningful results. The prototype consists of the following components: the host file system, the hidden file system, and a number of userspace utilities. The results presented in this dissertation were obtained through interaction with our prototype of SSFS. This implementation allows for convenient examination of the internal workings of a steganographic file system. This allows the strengths and weaknesses of a steganographic file system to be easily examined. In this appendix we discussed the components of SSFS: the host and hidden file systems. We then discussed a number of utilities which allow a user to interact with the file system implementation. 
Finally, we presented a number of screenshots which depict SSFS in operation. Figure A.1: Initialising the host and hidden file system with the makehfs application. Figure A.2: Starting the hidden file system shell. Figure A.3: Showing all the commands available to operate on the hidden file system. Figure A.4: Performing a directory listing with the ls command and creating a directory with the mkdir command. Figure A.5: Creating a file in the newly created directory, and then appending data using the appendr command. Figure A.6: Creating a file and displaying the contents of the file on the console. Figure A.7: Deleting a file from the hidden file system, using the rm command. Figure A.8: Attempting to delete a directory from the hidden file system, using the rmdir command. Figure A.9: Creating a file on the host file system, and then appending data to that file. Figure A.10: Showing the dynamic reallocation process, with identification of the reallocation categories, and modification of the Translation Map. Bibliography [1] R. Anderson and F.A.P. Petitcolas. On The Limits of Steganography. IEEE Journal of Selected Areas in Communications, 16:474-481, 1998. doi: 10.1109/49.668971. [2] R. Anderson, E. Biham, and L. Knudsen. Serpent: A New Block Cipher Proposal. Proceedings of the 5th International Workshop on Fast Software Encryption, Paris, France, LNCS 1372, pages 222-238, 1998. doi: 10.1007/3-540-69710-1_15. [3] R. Anderson, E. Biham, and L. Knudsen. Serpent: A Proposal for the Advanced Encryption Standard. NIST AES Proposal, 1998. URL http://www.cl.cam.ac.uk/~rja14/Papers/serpent.pdf. [4] R. Anderson, R. Needham, and A. Shamir. The Steganographic File System. In David Aucsmith, editor, Information Hiding, Second International Workshop, IH'98, Portland, Oregon, USA, April 1998, Proceedings, 1998. doi: 10.1007/3-540-49380-8_6. [5] W. Bender, D. 
Gruhl, N. Morimoto, and A. Lu. Techniques for data hiding. IBM Systems Journal, 35:313-336, 1996. doi: 10.1147/sj.353.0313. [6] W. Bender, F. J. Paiz, W. Butera, S. Pogreb, D. Gruhl, and R. Hwang. Applications for data hiding. IBM Systems Journal, 39:547-568, 2000. doi: 10.1147/sj.393.0547. [7] M. Blaze. A Cryptographic File System for UNIX. In CCS '93: Proceedings of the 1st ACM conference on Computer and communications security, pages 9-16, New York, NY, USA, 1993. ACM Press. ISBN 0-89791-629-8. doi: 10.1145/168588.168590. [8] R. Card, T. Ts'o, and S. Tweedie. Design and Implementation of the Second Extended Filesystem. Proceedings of the First Dutch International Symposium on Linux, pages 90-367, 1994. URL http://web.mit.edu/tytso/www/linux/ext2intro.html. [9] F. M. Carrano and W. Savitch. Data Structures and Abstractions with Java. Prentice Hall, 2003. ISBN 0-13-017489-0. [10] E. Casey. Digital Evidence and Computer Crime. Academic Press, 2004. ISBN 0-12-163104-4. [11] J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers. O'Reilly Media, Inc., third edition, 2005. ISBN 0-596-00590-3. [12] W. Diffie and M. Hellman. New Directions in Cryptography. Information Theory, IEEE Transactions on, 22(6):644-654, Nov 1976. ISSN 0018-9448. doi: 10.1109/TIT.1976.1055638. [13] U. Drepper, S. Miller, and D. Madore. md5sum - Manual Page. UNIX Man Page, September 2007. [14] I. Dubrawsky. Cryptographic Filesystems, Part One: Design and Implementation, March 2003. URL http://www.securityfocus.com/infocus/1673. [15] M. Dworkin. Recommendation for Block Cipher Modes of Operation. NIST Special Publication 800-38A, 2001. URL http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf. [16] FIPS PUB 180-2. Federal Information Processing Standards Publication 180-2, Secure Hash Standard, August 2002. URL http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf. [17] FIPS PUB 197. 
Federal Information Processing Standards Publication 197, Advanced Encryption Standard (AES), November 2001. URL http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf. [18] FIPS PUB 46-3. Federal Information Processing Standards Publication 46-3, Data Encryption Standard, October 1999. URL http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf. [19] G. A. Francia and T. S. Gomez. Steganography Obliterator: An Attack on the Least Significant Bits. In InfoSecCD '06: Proceedings of the 3rd annual conference on Information security curriculum development, pages 85-91, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-437-5. doi: 10.1145/1231047.1231066. [20] Free Software Foundation. GNU Binutils. UNIX Man Page. URL http://www.gnu.org/software/binutils/. [21] J. Fridrich, M. Goljan, and R. Du. Reliable Detection of LSB Steganography in Color and Grayscale Images. In MM&Sec '01: Proceedings of the 2001 workshop on Multimedia and security, pages 27-30, New York, NY, USA, 2001. ACM Press. ISBN 1-58113-393-6. doi: 10.1145/1232454.1232466. [22] J. A. Gallian. Contemporary Abstract Algebra. Houghton Mifflin Company, 5th edition, 2002. ISBN 0-618-12214-1. [23] D. Giampaolo. Practical File System Design with the Be File System. Morgan Kaufmann Publishers, Inc., 1999. ISBN 1-55860-497-9. URL http://www.letterp.com/~dbg/. [24] The Open Group and IEEE. Single UNIX Specification Version 3. URL http://www.unix.org/single_unix_specification/. [25] D. Gruhl, A. Lu, and W. Bender. Echo Hiding. In R. Anderson, editor, Information Hiding, First International Workshop, Isaac Newton Institute, Cambridge, England, May 1996, volume 1174 of LNCS, pages 295-315. Springer-Verlag, 1996. ISBN 3-540-61996-8. doi: 10.1007/3-540-61996-8_48. [26] J. S. Heidemann and G. J. Popek. File-System Development with Stackable Layers. ACM Transactions on Computer Systems, 12(1):58-89, 1994. ISSN 0734-2071. doi: 10.1145/174613.174616. [27] S. 
Hetzl. Steghide - Manual Page. UNIX Man Page, May 2002. URL http://steghide.sourceforge.net. [28] J. Hooper. Hexley - DarwinOS Mascot. URL http://www.hexley.com. Hexley DarwinOS Mascot Copyright 2000 by Jon Hooper. All Rights Reserved. [29] D. E. Knuth. Big Omicron and Big Omega and Big Theta. SIGACT News, 8(2):18-24, 1976. ISSN 0163-5700. doi: 10.1145/1008328.1008329. [30] M. Kuhn. The EURion Constellation, February 2002. URL http://www.cl.cam.ac.uk/~mgk25/eurion.pdf. [31] R. Love. Linux Kernel Development. Sams Publishing, 2004. ISBN 0-672-32512-8. [32] N. Mavroyanopoulos. MCrypt - Manual Page. UNIX Man Page, May 2002. URL http://mcrypt.sourceforge.net. [33] A. D. McDonald and M. G. Kuhn. StegFS: A Steganographic File System for Linux. In Andreas Pfitzmann, editor, Information Hiding, Third International Workshop, IH'99, Dresden, Germany, September/October, 1999, Proceedings, volume 1768 of LNCS, pages 462-477. Springer-Verlag, 1999. ISBN 3-540-67182-X. doi: 10.1007/10719724_32. [34] M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry. A Fast File System for UNIX. ACM Transactions on Computer Systems, 2(3):181-197, 1984. ISSN 0734-2071. doi: 10.1145/989.990. [35] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. ISBN 0-84938-523-7. URL http://www.cacr.math.uwaterloo.ca/hac/index.html. [36] I. S. Moskowitz, G. E. Langdon, and L. Chang. A New Paradigm Hidden in Steganography. In NSPW '00: Proceedings of the 2000 workshop on New security paradigms, pages 41-50, New York, NY, USA, 2000. ACM Press. ISBN 1-58113-260-3. doi: 10.1145/366173.366189. [37] S. J. Murdoch. Software Detection of Currency, May 2004. URL http://www.cl.cam.ac.uk/~sjm217/talks/ih04currency.pdf. [38] B. Naujok. XFS Filesystem Structure Rev 2.0. Technical report, Silicon Graphics, Inc, 2006. URL http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf. [39] H. Pang, K. Tan, and X. Zhou. 
StegFS: A Steganographic File System. In Data Engineering, 2003. Proceedings. 19th International Conference on, pages 657-667, 5-8 March 2003. doi: 10.1109/ICDE.2003.1260829. [40] M. Peinado, F.A.P. Petitcolas, and D. Kirovski. Digital Rights Management for Digital Cinema. Multimedia Systems, 9:228-238, 2003. ISSN 0942-4962. doi: 10.1007/s00530-003-0094-3. [41] F.A.P. Petitcolas, R.J. Anderson, and M.G. Kuhn. Information Hiding - A Survey. Proceedings of the IEEE, 87(7):1062-1078, 1999. doi: 10.1109/5.771065. [42] E. Michael Power, Jonathan Gilhen, and Roland L. Trope. Setting Boundaries at Borders: Reconciling Laptop Searches and Privacy. Security & Privacy, IEEE, 5(2):72-75, March-April 2007. ISSN 1540-7993. doi: 10.1109/MSP.2007.40. [43] R. Rivest. RFC1321: The MD5 Message-Digest Algorithm, 1992. URL http://www.ietf.org/rfc/rfc1321.txt. [44] R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM, 21(2):120-126, 1978. ISSN 0001-0782. doi: 10.1145/359340.359342. [45] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and Implementation of the Sun Network Filesystem. Proceedings of the Summer 1985 USENIX Conference, pages 119-130, 1985. URL http://citeseer.ist.psu.edu/sandberg85design.html. [46] B. Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley & Sons, Inc., 1994. ISBN 0-471-59756-2. [47] B. Schneier. Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish). Fast Software Encryption, Cambridge Security Workshop Proceedings (December 1993), pages 191-204, 1994. doi: 10.1007/3-540-58108-1_24. [48] B. Schneier. Crossing Borders with Laptops and PDAs, May 2008. URL http://www.schneier.com/essay-217.html. [49] SecuriTeam. Linux Cryptoloop Watermark Exploit, May 2005. URL http://www.securiteam.com/exploits/5UPOP1PFPM.html. [50] A. Silberschatz, P. B. Galvin, and G. Gagne. 
Operating System Concepts with Java, Sixth Edition. John Wiley & Sons, Inc., 2004. ISBN 0-471-48905-0. [51] K. A. Smith and M. Seltzer. A Comparison of FFS Disk Allocation Policies. In ATEC '96: Proceedings of the Annual Technical Conference on USENIX 1996 Annual Technical Conference, Berkeley, CA, USA, 1996. USENIX Association. URL http://www.usenix.org/publications/library/proceedings/sd96/smith.html. [52] D. R. Stinson. Cryptography Theory and Practice. Chapman & Hall/CRC, 2002. ISBN 1-58488-206-9. [53] M. Szeredi. FUSE: Filesystem in Userspace. Webpage. URL http://fuse.sourceforge.net/. [54] A. Z. Tirkel, G. A. Rankin, R. M. van Schyndel, W. J. Ho, N. R. A. Mee, and C. F. Osborne. Electronic Watermark. In Digital Image Computing, Technology and Applications (DICTA '93), pages 666-673, Macquarie University, Sydney, 1993. URL http://citeseer.ist.psu.edu/tirkel93electronic.html. [55] R. L. Trope and E. M. Power. Lessons for laptops from the 18th century. Security & Privacy, IEEE, 4(4):64-68, July-Aug. 2006. ISSN 1540-7993. doi: 10.1109/MSP.2006.97. [56] University of Southern California - Signal & Image Processing Institute Image Database. 4.2.03 - Mandrill. URL http://sipi.usc.edu/database/misc/4.2.03.tiff. [57] University of Southern California - Signal & Image Processing Institute Image Database. 5.1.09 - Moon Surface. URL http://sipi.usc.edu/database/misc/5.1.09.tiff. [58] A. Westfeld and A. Pfitzmann. Attacks on Steganographic Systems. In Andreas Pfitzmann, editor, Information Hiding, Third International Workshop, IH'99, Dresden, Germany, September/October, 1999, Proceedings, volume 1768 of LNCS, pages 61-76. Springer-Verlag, 1999. ISBN 978-3-540-67182-4. doi: 10.1007/10719724_5. [59] J.H.K. Wu, R. Chang, C. Chen, C. Wang, T. Kuo, W. Moon, and D. Chen. Tamper Detection and Recovery for Medical Images Using Near-lossless Information Hiding Technique. Journal of Digital Imaging, 0:59-76, 2007. 
doi: 10.1007/s10278-007-9011-1. [60] E. Zadok, I. Badulescu, and A. Shender. Cryptfs: A Stackable Vnode Level Encryption File System, 1998. URL http://citeseer.ist.psu.edu/zadok98cryptfs.html. [61] E. Zadok, I. Badulescu, and A. Shender. Extending File Systems Using Stackable Templates. Proceedings of the Annual USENIX Technical Conference, pages 57-70, June 1999. URL http://www.usenix.org/events/usenix99/full_papers/zadok/zadok.pdf.