
Table of Contents

1. Introduction to Memory Storage Drives
   1.1 As it exists today
   1.2 Solid State Drives: A Brief Overview
   1.3 Thesis Objective
   1.4 Summary of Chapters

2. System Memory Overview
   2.1 System Architecture
   2.2 Memory
   2.3 Storage Hierarchy
   2.4 Memory Controller
   2.5 Summary

3. Magnetic Disk Storage
   3.1 Hard Disk Drives
   3.2 Hard Disk Drive System Architecture
   3.3 Hard Disk Drive Interfaces
   3.4 External Hard Disk Drives
   3.5 Future of Hard Disk Drives

4. Solid State Drives
   4.1 Flash Market Development
   4.2 Solid State Drives
   4.3 Physical Layout
   4.4 Flash Translation Layer (FTL)
   4.5 Solid State Drive Interfaces
   4.6 SSD Market
   4.7 Future
   4.8 Summary
   4.9 Typical Characteristics of HDD and SSD

5. Performance: HDD vs SSD
   5.1 Benchmark
   5.2 Benchmark Environment
   5.3 TPC-H Benchmark
   5.4 Energy Efficiency Test
   5.5 HD Tune Benchmark
   5.6 Summary

6. Better Investment: SSD or Additional RAM?
   6.1 Benchmark Environment
   6.2 Results
   6.3 Conclusion
   6.4 Benchmark Problems

7. Reverse Engineering
   7.1 Intel X25-Extreme
   7.2 Crucial Real C300
   7.3 Summary
   7.4 Conclusion

8. Designing Optimal Performance Based SSD System Level Architecture and its Controller Cost Estimation
   8.1 Cost Estimation of Controller for System Designed to Meet Performance Specification
   8.2 Implementation Factors in Optimization Tool
   8.3 Optimization Tool Consistency Test for Controller Size
   8.4 Hints to Use Tool for Optimal System Design and Controller Cost Estimation

9. Summary
   9.1 Conclusion
   9.2 Future Work on SSD

Appendix A
Appendix B


List of Figures
Figure 1: View of personal computer system [25]
Figure 2: Interconnections of memory components
Figure 3: Forms of storage, divided according to their distance from the CPU [19]
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32]
Figure 5: Memory controller hub
Figure 6: Hard Disk Drive [27]
Figure 7: Representations of sectors, blocks and tracks on platter surface [27]
Figure 8: Representation of Hard Disk Drive as blocks
Figure 9: Role of Cache buffer in Hard disk
Figure 10: Typical IDE/ATA ribbon cable and its socket on a motherboard [28]
Figure 11: A single-drop 68-conductor SCSI ribbon cable [28]
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]
Figure 13: A Seagate 1TB external hard drive [28]
Figure 14: Moving Parts in Hard Disk Drives [29]
Figure 15: Evolution in density of NAND flash memory
Figure 16: HDD and SSD [30]
Figure 17: NAND flash memory chip [30]
Figure 18: Flash memory overwrite mechanism
Figure 19: A generic overview of a Flash memory bank [5]
Figure 20: Components of SSD
Figure 21: Organization of conventional SSD
Figure 22: Address translation in solid state drive [8]
Figure 23: Internal structure of solid state drive [6]
Figure 24: X4 PCI-Express card with NAND flash chips on it [31]
Figure 25: SSD Market development
Figure 26: TPC-H benchmark application outline
Figure 27: TPC-H benchmark performance results
Figure 28: Comparison for energy efficiency
Figure 29: Read speed comparison
Figure 30: Access time comparison
Figure 31: Performance comparison between HDD with 12GB system RAM vs SSDs with 2GB system RAM
Figure 32: Performance comparison between HDD with 2GB, 8GB, and 12GB system RAM
Figure 33: Intel X25 Extreme SSD
Figure 34: Controller from Marvell on Intel X25-E SSD board
Figure 35: Crucial Real C300 SSD
Figure 36: Controller from Marvell on Crucial Real C300 SSD board
Figure 37: Design tool outlook
Figure 38: Warning: system is over designed or under designed with respect to performance specified
Figure 39: Cost calculation tool
Figure 40: Process flow for flip chip BGA and wire bonded BGA packaging
Figure 41: Controller size for the system with SATA 2.0 interface
Figure 42: Controller size for the system with SATA 3.0 interface
Figure 43: Procedure to create application package


List of Tables
Table 4-1: SLC vs MLC [9]
Table 5-1: Overview of drives in benchmark environment
Table 6-1: Overview of drives in benchmark environment
Table 7-1: Controller chip details of Intel X25-E and Crucial Real C300 SSD
Table 7-2: Controller chip details of Intel X25-E and Crucial Real C300 SSD
Table 7-3: Interface compatibility of Intel X25-E and Crucial Real C300 SSD
Table 8-1: System interface types and their performances
Table 8-2: Buffer cache types and their performances
Table 8-3: SSD controller interface signals


Abbreviations:
BA: Bank Address
BGA: Ball Grid Array
CS: Chip Select
CK: Clock
CKE: Clock Enable (per rank)
CAS: Column Address Strobe
CLK: Clock
DRAM: Dynamic Random Access Memory
DQ: Data Bus
DQS: Data Strobe
DM: Data Mask
MA: Memory Address
MLC: Multi Level Cell
RST: Reset
RAS: Row Address Strobe
REF_CLK_P/N: PCI Express Clock (differential)
SATA: Serial ATA
SSD: Solid State Drive
SLC: Single Level Cell
PATA: Parallel ATA
PET_P/N: PCI Express differential signal
HDD: Hard Disk Drive
ODT: On-Die Termination
UART: Universal Asynchronous Receiver Transmitter
WE: Write Enable


Chapter 1

1. Introduction to Memory Storage Drives


1.1 As it exists today
Fifty years after the first commercial drive, the disk drive remains the prevailing storage medium in almost every computer system to date. Surprisingly, despite the technological improvements in storage capacity and operational performance, modern disk drives are still based on the same physical and electromagnetic principles. With rapidly changing technologies and innovations, electronic storage devices like computer hard disks are becoming more and more sophisticated in design as well as performance. Even though traditional hard disk drives (HDD) are being threatened to a certain extent by flash-based storage devices, they are still the most popular form of storage for computing today. Hard drives are used in everything from servers to desktops and notebooks and offer higher storage capacities at lower cost than the alternatives.

1.2 Solid State Drives: A Brief Overview


Solid State Drives cost significantly more per unit capacity than their rotating counterparts as of today, but there are numerous applications where they can be applied to great benefit. For example, in transaction-processing systems, disk capacity is often wasted in order to improve operation throughput. In such configurations, many small (cost-inefficient) rotating disks are deployed to increase I/O parallelism. Large SSDs, suitably optimized for random read and write performance, could effectively replace whole farms of slow, rotating disks. Currently, small SSDs are starting to appear in laptops because of their reduced power profile and their reliability benefits, such as shock resistance, in portable environments. As the cost of flash continues to decline, the potential application space for solid state disks will certainly continue to grow. Solid State Drives are amongst the most popular storage drives available in the electronic hardware market. It would be interesting to know what is so special about these drives and what makes them stand out.

1.3 Thesis Objective

Performance evaluation of hard disk drives and solid state drives by making an extensive comparison followed by benchmarking,

Analyzing the architecture of an SSD controller and reverse engineering it,

Finally, developing a tool which suggests the most optimal and cost-efficient system-level SSD architecture based on a selected interface.

1.4 Summary of Chapters


Chapter 2 gives an overview of memory design architectures in traditional computer systems and a glance at the storage hierarchy. Chapter 3 goes into the details of hard disk drives, their architecture, physical structure and operation, followed by a discussion of the different types of host interfaces used by hard disk drives today. Chapter 4 is dedicated to flash memory technology and gives a taste of solid state drive technology, its architecture, working and advantages over its counterpart, the hard disk drive. In Chapter 5 the performance of magnetic disks and SSDs is analysed in different scenarios; the aim is to identify the main characteristics and to point out possible weaknesses along with solutions to these drawbacks. Chapter 6 gives insights into deciding between an SSD and more RAM for performance enhancements. Chapter 7 covers reverse engineering of the solid state drive controller and its structure at package level; here, the different factors responsible for varied performance in solid state drives are listed. Chapter 8 describes the tool that was developed, which suggests a system-level solid state drive architecture for optimum performance, together with an estimated controller cost, based on the selected host interface. Chapter 9 summarizes the results and discusses the future of solid state drives.


Chapter 2

2. System Memory Overview


2.1 System Architecture
The system architecture determines the main hardware components that make up the physical computer system and the way in which they are interconnected. The main components required for a computer system are listed below:

Central processing unit (CPU)
Random access memory (RAM)
Read-only memory (ROM)
Input/output (I/O) ports
The system bus
A power supply unit (PSU)

In addition to these core components, in order to extend the functionality of the system and provide a computing environment with which a human operator can interact more easily, additional components are required, which could include:

Secondary storage devices (e.g. disk drives)
Input devices (e.g. keyboard, mouse, scanner)
Output devices (e.g. display adapter, monitor, printer)

The core system components are mounted on a backplane, more commonly referred to as a mainboard (or motherboard). The mainboard is a relatively large printed circuit board that provides the electronic channels (buses) that carry data and control signals between the various components, as well as the necessary interfaces (in the form of slots or sockets) to allow the CPU, memory cards and other components to be plugged into the system. In most cases, the ROM chip is built into the mainboard, and the CPU and RAM must be compatible with the mainboard in terms of their physical format and electronic configuration. Internal I/O ports are provided on the mainboard for devices such as internal disk drives and optical drives.

Figure 1: View of personal computer system [25]

The relationship between the elements that make up the core of the system is illustrated below.

Figure 2: Interconnections of memory components

The data flows back and forth between the processor and the memory over shared electrical conduits called buses, which carry address, data, and control signals. Depending on the particular bus design, data and address signals can share the same set of wires, or they can use different sets.

External I/O ports are also provided on the mainboard to enable the system to be connected to external peripheral devices such as the keyboard, mouse, video display unit, and audio speakers. Both the video adaptor and audio card may be provided on-board (i.e. built into the mainboard), or as separate plug-in circuit boards that are mounted in an appropriate slot on the mainboard. The mainboard also provides much of the control circuitry required by the various system components, allowing the CPU to concentrate on its main role, which is to execute programs. Memory is the most important integral part of a computational system. In this chapter, the focus is on memory organization, as a clear understanding of these ideas is vital for the analysis of system performance.

2.2 Memory
Memory lies at the heart of the stored-program computer. The system memory is the place where the computer holds current programs and data that are in use. Although memory is used in many different forms around modern PC systems, it can be divided into two essential types:

Read-only memory (ROM)
Random access memory (RAM)

2.2.1 ROM
ROM refers to non-volatile memory, which means that it always retains data even after power is removed. In fact, it needs very little charge to retain its memory. It is used to store permanent or semi-permanent data that persists even while the system is turned off. It is usually used to store small start-up programs like the BIOS, which is used to bootstrap the computer. There are several extended types of ROM, namely:

PROM: Programmable ROM
EPROM: Erasable PROM
EEPROM: Electrically Erasable PROM (Flash)


Flash memory is essentially EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster than EEPROM!

2.2.2 RAM
RAM refers to volatile memory which means that the data is lost once the power is turned off. There are two types of RAM, Static RAM (SRAM) and Dynamic RAM (DRAM).

SRAM: It consists of circuits similar to the D flip-flop. Therefore, it doesn't need to be refreshed, unlike its counterpart, the DRAM. SRAM is faster and much more expensive than DRAM and is used to build cache memory.

DRAM: It stores each bit of data in a separate capacitor within an integrated circuit. The capacitor can be either charged or discharged; these two states are taken to represent the two values of a bit, conventionally called '0' and '1'. Capacitors leak the charge stored in them slowly over time and thus must be refreshed every few milliseconds to prevent data loss. DRAM is cheap memory owing to its simple design compared to SRAM. Designers use DRAM as it is much denser, uses less power and generates less heat than SRAM. For these reasons, DRAMs are preferred over SRAMs for building the main memory.

There are many kinds of DRAM memory, and new kinds appear on the market regularly as manufacturers attempt to keep up with rapidly increasing processor speeds. Each design is based on the conventional DRAM cell, with optimizations that improve the speed with which the basic DRAM cells can be accessed.

Synchronous DRAM (SDRAM)
SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus. The clock is used to drive an internal finite state machine that pipelines incoming instructions. This allows the chip to have a more complex pattern of operation, enabling higher speeds.


Double Data Rate SDRAM (DDR SDRAM)
DDR SDRAM works on the same principle. The difference is that DDR SDRAM doubles the bandwidth by double-pumping (transferring data on both the rising and the falling edge of the clock signal), without increasing the clock frequency.

DDR2 SDRAM
DDR2 is the next generation of memory developed after DDR. DDR2 increased the data transfer rate (referred to as bandwidth) by increasing the operational frequency to match the high FSB frequencies and by doubling the pre-fetch buffer size. Like DDR SDRAM, DDR2 transfers data on both the rising and the falling edge of the clock signal. The trade-off is that internal operations are carried out at only half the clock rate!

DDR3 SDRAM
DDR3 is the successor to DDR2. DDR3 increased the pre-fetch buffer size to 8 bits and increased the operating frequency once again, resulting in higher data transfer rates than its predecessor DDR2. Like DDR2 SDRAM, DDR3 transfers data on both the rising and the falling edge of the clock signal, although internal operations are limited to only a quarter of the clock rate!

Rambus DRAM (RDRAM)
This is an alternative proprietary technology with a higher maximum bandwidth than DDR SDRAM. Compared to other contemporary standards, Rambus shows a slight increase in latency, heat output, manufacturing complexity, and cost.

Video RAM (VRAM)
VRAM has two ports, namely a DRAM port and a video port. The second port, the video port, is typically read-only and is dedicated to providing a high-bandwidth data channel for the graphics chipset. It is used in the frame buffers of graphics systems.
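To put rough numbers on the double-pumping described above, the sketch below estimates peak transfer rates for a few double-pumped SDRAM generations. The bus clocks and the 64-bit module width are illustrative assumptions for a typical desktop module, not values taken from this thesis.

```python
# Peak-bandwidth estimate for double-pumped SDRAM generations.
# Assumes a 64-bit (8-byte) wide module; the bus clocks are illustrative.

def peak_bandwidth_mb_s(bus_clock_mhz: float, transfers_per_clock: int = 2,
                        bus_width_bytes: int = 8) -> float:
    """Peak rate in MB/s = clock (MHz) * transfers per clock * bytes per transfer."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bytes

examples = {
    "DDR-400   (200 MHz bus)": 200,
    "DDR2-800  (400 MHz bus)": 400,
    "DDR3-1600 (800 MHz bus)": 800,
}

for name, clock in examples.items():
    print(f"{name}: {peak_bandwidth_mb_s(clock):.0f} MB/s peak")
```

For these assumed clocks the peak rates come out at 3200, 6400 and 12800 MB/s respectively, which illustrates why raising the bus clock and prefetch depth, rather than the internal cell speed, is what drove the generational bandwidth gains.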

2.3 Storage Hierarchy


Storage hierarchy refers to the different types of memory devices and equipment configured into an operational computer system to provide the necessary attributes of storage capacity, speed, access time, and cost to make a cost-effective practical system.

In practice, almost all computers use a variety of memory types, organized in a storage hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower a storage level is in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary and off-line storage is also guided by cost per bit.

2.3.1 Primary storage


Primary storage, commonly referred to as main memory, is the memory which is directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner. Besides the main large-capacity RAM, primary storage consists of two additional sublayers, namely processor registers and processor cache, as shown in Figure 3.

Processor registers are located inside the processor. Each register typically holds a word of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic unit to perform various calculations or other operations on this data (or with the help of it). Registers are the fastest of all forms of computer data storage.

Processor cache is an intermediate stage between the ultra-fast registers and the much slower main memory. It is introduced solely to increase the performance of the computer. The most actively used information in the main memory is simply duplicated in the cache memory, which is faster but of much smaller capacity; on the other hand, the cache is much slower but much larger than the processor registers. A multi-level hierarchical cache setup is also commonly used: the primary cache is the smallest and fastest and is located inside the processor, while the secondary cache is somewhat larger and slower. The secondary cache is the L2 cache, usually contained on the motherboard. However, more and more chip makers are putting this cache on board the processor itself. The benefit is that it then runs at the same speed as the processor, and it costs less to put on the chip than to set up a bus and logic externally from the processor.


The hierarchy continues with what is referred to as the L3 cache. This cache used to be the L2 cache on the motherboard, but now that some processors include the L1 and L2 caches on the chip, it becomes the L3 cache. Usually, it runs slower than the processor, but faster than main memory.

Random-Access Memory
It is small-sized, but quite expensive at the same time. (The particular types of RAM used for primary storage are also volatile, i.e. they lose the information when not powered.) Main memory is directly or indirectly connected to the central processing unit via a memory bus. It is actually two buses: an address bus and a data bus. The CPU first sends a memory address that indicates the desired location of data; then it reads or writes the data itself using the data bus. Additionally, a memory management unit (MMU), a small device between the CPU and RAM, recalculates the actual memory address, for example to provide an abstraction of virtual memory or to perform other tasks.

Figure 3: Forms of storage, divided according to their distance from the CPU [19]

2.3.2 Secondary Storage


Secondary storage is also known as external memory or auxiliary storage. The term 'secondary' refers to the inability of the CPU to access it directly; the data in secondary storage is accessed by the CPU through intermediary stages such as main memory and the processor cache, and the computer uses its secondary storage via its various input/output channels. Secondary storage is non-volatile, which means it does not lose the data when the device is powered down. Per unit of capacity, it is typically also about two orders of magnitude less expensive than primary storage. Consequently, modern computer systems typically have two orders of magnitude more secondary storage than primary storage, and data is kept there for a longer time. In modern computers, hard disk drives are commonly used as secondary storage.

2.3.3 Offline Storage


Offline storage is where removable types of storage media sit, such as tape cartridges and optical discs such as CDs and DVDs. Offline storage can be used to transfer data between systems, but it also allows data to be secured offsite to ensure companies always have a copy of valuable data in the event of a disaster.

2.3.4 Tertiary Storage


Tertiary storage is mainly used for backup and archival of data and, although based on the slowest devices, can be classed as the most important in terms of data protection against the variety of disasters that can affect an IT infrastructure. Most devices in this segment are automated via robotics and software to reduce management costs and the risk of human error, and they consist primarily of disk- and tape-based backup devices.


Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32]
*The values are approximated for illustration

2.4 Memory Controller


The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor. This is also called a Memory Chip Controller (MCC).

Figure 5: Memory controller hub

The memory controller scans for the type and speed of the RAM connected. It also determines the maximum size of each individual memory module and the overall memory capacity of the system. Memory controllers contain the logic necessary to read, write and refresh the main memory.

Considering DRAM as an example, reading and writing is performed by selecting the row and column addresses of the DRAM as the inputs to the multiplexer circuit; the de-multiplexer on the DRAM uses these inputs to select the correct memory location and return the data, which is then passed back through a multiplexer to consolidate the data in order to reduce the required bus width for the operation. The bus width is the number of parallel lines available to communicate with the memory cells. Memory controller bus widths range from 8-bit to 64-bit. In more complex systems, memory controllers are operated in parallel, for example four 64-bit buses operating in parallel, though some are designed to operate in "gang mode", where two 64-bit memory controllers can be used to access a 128-bit memory device.

2.5 Summary
In this chapter, an introduction to the different memory technologies in a computer system was given. The system memory hierarchy was analysed closely, which gives a much better idea of how to choose a storage technology based on its type and size. In the coming chapters, more insight into the current technological trends used for secondary storage is given. In the next chapter, the focus is on the current state of secondary storage, represented by magnetic disks, also called hard disk drives.


Chapter 3

3. Magnetic Disk Storage


As computing capacity increases, so does the need for secondary storage. The most important device of this class is the Hard Disk Drive (HDD), which is based on magnetic principles for permanently storing information. HDDs have a cost per byte at least two orders of magnitude smaller than that of DRAM, making them suitable for storing vast amounts of data. Hence, hard disk drives are used as secondary memory in most computer systems. In this chapter, an insight into the current state of disk storage, represented by magnetic disks, is given in Section 3.1.

3.1 Hard Disk Drives


The hard disk drive is by far the most common secondary storage device in use today. Being in use for half a century, hard disk drives are today considered very mature, and have seen many major improvements. Hard Disk Drives (HDD) are storage devices containing one or more rotating platters made out of a non-magnetic material and are coated with a thin layer of magnetic material. Small sections of this material are manipulated into different magnetic states, making it possible to store data. Magnetic disks have had a great ability to scale capacity and continue to do so today. Internal view of a magnetic disk can be seen in Figure 6.

3.1.1 Physical layout


Hard disk drives are called so because of the rotating magnetic platters in them, which are used for storage. The rotating platters in magnetic disks sometimes use both sides for storage. Each surface is divided into sectors and tracks, and the intersection of a single sector and a single track makes up a block. As seen in Figure 7, tracks on the outer part of the disk platter are made up of more sectors. This is because the outer surface passes faster under the disk head, and the larger surface area allows more data to be stored in these tracks. These different sections are called zones. To get a higher data capacity in a disk, several platters are put together on a single spindle.

The disk arm will have a separate head for each surface, and is able to write to more sectors without seeking to a different track. The same track across all surfaces is called a cylinder. Having cylinders makes it possible to speed up read and write operations, as the disk arm can perform operations on multiple surfaces without needing to move to a different position.

Figure 6 : Hard Disk Drive [27]

3.1.2 Working Principle


The platters are made from a non-magnetic material and are coated with a thin layer of magnetic material. Read-and-write heads are positioned on top of the disks. The platters are spun at very high speeds with a motor. A typical hard drive has two electric motors, one to spin the disks and one to position the read/write head assembly. Information is examined or altered on the platter as it rotates past the read/write heads. The read-and-write head can detect and modify the magnetization of the ferromagnetic material immediately under it.

3.1.3 Disk access time


Disk access time in magnetic disks is made up of three different operations. The time the different operations take will vary with the position of the disk head, where in the rotation the disk surface is, and the physical abilities of the disk. Disks read and write data in sector-sized blocks. The access time for a sector has three main components:

I. Seek time: To read the contents of some target sector, the arm first positions the head over the track that contains the target sector. The time required to move the arm is called the seek time. The seek time, Tseek, depends on the previous position of the head and the speed that the arm moves across the surface. The average seek time in modern drives, Tavg seek, is measured by taking the mean of several thousand seeks to random sectors.

II. Rotational latency: This depends on the rotational speed of the disk (RPM). Once the head is in position over the track, the drive waits for the first bit of the target sector to pass under the head. The performance of this step depends on both the position of the surface when the head arrives at the target sector and the rotational speed of the disk. In the worst case, the head just misses the target sector and waits for the disk to make a full rotation. Thus, the maximum rotational latency is given by

TMax rotation = (1/RPM) x (60 s / 1 min)

and, on average, the head has to wait for about half a rotation, so TAvg rotation = 1/2 x TMax rotation.

Figure 7: Representations of sectors, blocks and tracks on platter surface [27]


III. Transfer time: When the first bit of the target sector is under the head, the drive can begin to read or write the contents of the sector. The transfer time for one sector depends on the rotational speed and the number of sectors per track. Thus, the average transfer time for one sector can be roughly estimated as

TAvg transfer = (1/RPM) x (1/(average number of sectors per track)) x (60 s / 1 min)

With these characteristics, the seek time and rotational delay become a significant part of a random read or write operation. For sequential operations, the disk will be able to work on entire tracks/cylinders at a time, continuing with neighbouring tracks/cylinders. Sequential reads will, because of the short physical distance between the locations of the data, minimize the time spent on seeks, resulting in an overall lower access time for the data.
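To get a feel for the relative magnitudes, the following sketch plugs assumed drive parameters (7200 RPM, a 4 ms average seek time, 400 sectors per track) into the formulas above. The numbers are purely illustrative and are not taken from the drives benchmarked in later chapters.

```python
# Estimated average time to access one random sector, using the
# formulas from Section 3.1.3. All drive parameters are assumed values.

rpm = 7200                 # rotational speed
t_avg_seek_ms = 4.0        # assumed average seek time
sectors_per_track = 400    # assumed average sectors per track

t_max_rotation_ms = (1 / rpm) * 60 * 1000            # one full rotation in ms
t_avg_rotation_ms = 0.5 * t_max_rotation_ms          # on average, half a rotation
t_avg_transfer_ms = t_max_rotation_ms / sectors_per_track  # one sector passing the head

t_access_ms = t_avg_seek_ms + t_avg_rotation_ms + t_avg_transfer_ms
print(f"rotation: {t_avg_rotation_ms:.2f} ms, transfer: {t_avg_transfer_ms:.3f} ms")
print(f"average random access time: {t_access_ms:.2f} ms")
```

The point of the exercise is that the transfer time (a few hundredths of a millisecond here) is dwarfed by the seek time and rotational latency, which is exactly why sequential access patterns are so much cheaper than random ones.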

3.1.4 Addressing
The location of a specific sector is referenced using its cylinder number, head number and sector number (this addressing scheme is often abbreviated to CHS). Indeed, the total number of sectors on the drive could be calculated by multiplying the number of cylinders by the number of read/write heads, and then multiplying the result by the number of sectors per track. Since the introduction of zoned bit recording (as mentioned above, this is a drive geometry in which the number of sectors per track is smaller at the centre of the disk) this calculation can no longer be used. The way in which sectors are addressed has also become more abstract, relieving the operating system software of the need to know about physical drive geometry. Note that sectors that are logically sequential are not necessarily physically contiguous. After reading a sector, there may be a small delay before the drive controller is ready to read another sector. Sectors that are logically sequential may therefore be spaced at discrete intervals on the disk to give the drive controller time to get ready to read the next sector - a technique known as interleaving. If an interleave factor of 3:1 were used for example, it would take three full rotations for the controller to read all of the sectors on a single track. Thanks to advances in technology, most modern hard drives do not need to use interleaving. Modern hard disk drives use logical block addressing (LBA), a simple linear addressing scheme in which each sector is given an integer index number, starting with 0. The drive controller translates each logical block address into a cylinder, head and sector number in order to obtain the physical location of the sector on disk. The maximum number of sectors that can be addressed is dependent on the number of bits used for the logical block address.
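The LBA-to-CHS translation performed by the drive controller can be written down directly from the drive geometry. The sketch below assumes an idealized geometry with a fixed number of sectors per track (which zoned bit recording breaks, as noted above); the head and sector counts are made-up illustration values, and sectors are numbered from 1 by convention.

```python
# Idealized LBA <-> CHS translation for a drive with fixed geometry.
# Real drives with zoned bit recording remap addresses internally.

HEADS = 16                 # assumed number of read/write heads
SECTORS_PER_TRACK = 63     # assumed sectors per track (numbered from 1)

def lba_to_chs(lba: int):
    cylinder = lba // (HEADS * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS
    sector = (lba % SECTORS_PER_TRACK) + 1        # sector numbers are 1-based
    return cylinder, head, sector

def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

lba = 123456
c, h, s = lba_to_chs(lba)
print(f"LBA {lba} -> CHS ({c}, {h}, {s})")
assert chs_to_lba(c, h, s) == lba                 # round-trip check
```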

3.2 Hard disk drive system architecture


The system-level design of a hard disk drive is characterized by the use of a few highly-integrated chips; their interconnection is represented as a block diagram in Figure 8. As one can see in the picture, the whole layout is based upon the chips below:

System controller chip including the read/write channel, disk controller and RISC control processor (microcontroller),

Flash ROM chip containing drive firmware,

RAM chip used as a cache buffer.

Figure 8: Representation of Hard Disk Drive as blocks

The disk controller is the most complicated drive component; it determines the speed of data exchange between the HDD and the host.


The disk controller has four ports, used for connection to the host, the microcontroller, the buffer RAM, and the data exchange channel between it and the head disk assembly. The disk controller is an automatic device driven by the microcontroller; from the host side only the standard task-file registers are accessible. The disk controller is programmed by the microcontroller at the initialization stage; during this procedure it sets up the data encoding methods, selects the polynomial used for error correction, defines flexible or hard partitioning into sectors, etc. The buffer manager is a functional part of the disk controller governing the operation of the buffer RAM, referred to as the cache. The capacity of the latter ranges in modern HDDs from 512KB to 16MB. The buffer manager splits the whole buffer RAM into separate sectioned buffers. Special registers accessible from the microcontroller contain the initial addresses of those sectioned buffers. While the host exchanges data with one of the buffers, the read/write channel can exchange data with another buffer section. Thus the system achieves multi-sequencing of the processes of reading/writing data from/to the disk and data exchange with the host.

3.2.1 Hard disk drive cache


Hard disk drives contain an integrated cache, also often called a buffer. The purpose of this cache is not dissimilar to other caches used in the PC, even though it is not normally thought of as part of the regular PC cache hierarchy. The function of the cache is to act as a buffer between a relatively fast device and a relatively slow one. For hard disks, the cache is used to hold the results of recent reads from the disk and also to 'pre-fetch' information that is likely to be requested in the near future, for example, the sector or sectors immediately after the one just requested.

Figure 9 : Role of Cache buffer in Hard disk


The basic principle behind the operation of a simple cache is straightforward. Reading data from the hard disk is generally done in blocks of various sizes, not just one 512-byte sector at a time. The cache is broken into segments, or pieces, each of which can contain one block of data. When a request is made for data from the hard disk, the cache circuitry is first queried to see if the data is present in any of the segments of the cache. If it is present, it is supplied to the logic board without access to the hard disk's platters being necessary. If the data is not in the cache, it is read from the hard disk, supplied to the controller, and then placed into the cache in the event that it gets asked for again. Since the cache is limited in size, there are only so many pieces of data that can be held before the segments must be recycled. Typically the oldest piece of data is replaced with the newest one. This is called circular, first-in, first-out (FIFO) or wrap-around caching. The use of cache improves performance of any hard disk, by reducing the number of physical accesses to the disk on repeated reads and allowing data to stream from the disk uninterrupted when the bus is busy.
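The segment-recycling behaviour described above can be modelled in a few lines. The sketch below is a deliberately simplified wrap-around (FIFO) read cache with a handful of one-block segments; it is a toy model, not a description of any particular drive's firmware.

```python
from collections import OrderedDict

# Toy model of a wrap-around (FIFO) disk read cache: each segment holds
# one block of data, and the oldest segment is recycled first.

class FifoDiskCache:
    def __init__(self, segments: int):
        self.segments = segments
        self.blocks = OrderedDict()       # block number -> cached data

    def read(self, block: int, read_from_platter):
        if block in self.blocks:          # cache hit: no platter access needed
            return self.blocks[block]
        data = read_from_platter(block)   # cache miss: go to the platters
        if len(self.blocks) >= self.segments:
            self.blocks.popitem(last=False)   # recycle the oldest segment
        self.blocks[block] = data
        return data

cache = FifoDiskCache(segments=4)
for blk in [7, 8, 7, 9, 10, 11, 12, 7]:   # the last read of 7 misses again
    cache.read(blk, read_from_platter=lambda b: f"<block {b}>")
```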

3.3 Hard Disk Drive Interfaces


The host interface, also called the drive interface, defines the characteristics of the electronic interface between the disk drive and the computer. The type of interface used will to a great extent depend on the purpose for which the computer is to be used, and the type of interface(s) supported by the system motherboard. A number of different interfaces have been developed over the years, some of which are described below.

3.3.1 Advanced Technology Attachment (ATA)


ATA has in the past been somewhat incorrectly referred to as Integrated Drive Electronics (IDE) and has been retrospectively renamed as Parallel ATA (PATA) to distinguish it from the more recent Serial ATA (SATA) interface. The use of the popular IDE misnomer comes from the fact that this interface was the first in widespread use to have the drive controller built into the drive itself. Previously, the drive controller was a separate add-on card that occupied one of the ISA slots on the computer's motherboard. The drive was connected to the motherboard using a 40 or 80-conductor ribbon cable that connected a 40-pin socket on the drive itself to a similar socket on the motherboard and transferred sixteen bits of data in parallel. Each ribbon cable could connect two ATA drives in a master-slave configuration. Enhanced IDE, introduced in anticipation of changes to the ATA standard, allowed the use of direct memory access (DMA) which meant that data could be transferred directly between the disk and memory without involving the CPU in the data transfer process. This freed up the CPU for other tasks.

Figure 10: Typical IDE/ATA ribbon cable and its socket on a motherboard [28]

3.3.2 Small Computer System Interface (SCSI)


SCSI disk and tape drives were standard fare on servers and high-performance workstations and despite advances in ATA technology can still be found in many high-performance server applications. SCSI can be used to connect a wide range of devices, and the SCSI standard defines command sets for many specific types of peripheral device. The SCSI interface allows a maximum of either 8 or 16 peripheral devices to connect to the host computer via a shared parallel bus. Servers typically employ RAID drives in which multiple disks are connected to a SCSI RAID controller card via a SCSI backplane inside a disk enclosure. The connection between the backplane and the controller card will typically be a 68 or 80-conductor single drop ribbon cable. Multiple non-RAID devices could also be connected to a SCSI controller card using multi-drop cables. SCSI drives have not been widely adopted for personal computers due to their cost, and the availability of relatively inexpensive ATA drives that provide perfectly adequate performance for most desktop computing environments. SCSI controller cards are nonetheless still available for personal computers, and can be mounted in a standard PCI-X or PCI-E expansion slot. Parallel SCSI has largely been superseded in server and mass storage applications by Fibre Channel (FC) or Serially Attached SCSI (SAS), both of which use a high-speed serial interface.

Figure 11: A single-drop 68-conductor SCSI ribbon cable [28]

3.3.3 Serial Advanced Technology Attachment (SATA)


SATA is the successor to Parallel ATA. One of the most obvious differences is the use of a high-speed serial signal cable instead of the parallel ribbon cable used for ATA drives. It has two pairs of wires for carrying data and 3 ground wires, giving a total of seven wires. The cable is cheaper and less bulky than its PATA counterpart, allowing a better flow of air within the system case and making it easier to install. A SATA signal cable connects a single drive to a SATA socket on the motherboard - there is no master/slave arrangement. SATA drives use a 15-pin power connector rather than the 4-pin Molex power connectors used for PATA drives, although adapters are available to enable a SATA drive to be connected to a power supply via a 4-pin Molex power cable should the need arise. The first version of the SATA standard is officially designated as Serial ATA International Organization: Serial ATA Revision 1.0 (the technology itself should be referred to as SATA 1.5 Gbps) and specifies a gross transfer rate of 1.5 gigabits per second. Taking encoding into account, this equates to a usable data rate of 1.2 gigabits per second (150 megabytes per second). Subsequent revisions have doubled and redoubled the transfer rates. Revision 2.0 (SATA 3.0 Gbps) is capable of a gross transfer rate of 3.0 gigabits per second, and Revision 3.0 (SATA 6.0 Gbps) has a gross transfer rate of 6.0 gigabits per second. As of 2010, most installed hard drives and PC chipsets implement SATA 3.0 Gbps, although SATA 6.0 Gbps products are now becoming available (the Version 3.0 standard was released in May 2010). Most motherboards produced since 2003 have integrated SATA controllers (although an add-on controller card can be installed in a PCI or PCI-E slot). The SATA controller can use the Advanced Host Controller Interface (AHCI) in order to take advantage of advanced features such as the hot-swapping of drives, provided both the motherboard and operating system support AHCI. If not, SATA controllers are capable of operating in "IDE emulation" mode.

Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]
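The relation between the gross line rate and the usable data rate quoted above follows from the 8b/10b encoding used on the SATA link: ten bits travel on the wire for every eight bits of payload. The sketch below simply restates that arithmetic for the three SATA generations.

```python
# Usable data rate of a SATA link after 8b/10b encoding:
# 10 bits are transmitted on the wire for every 8 bits of payload.

def sata_payload_mb_s(line_rate_gbit_s: float) -> float:
    payload_gbit_s = line_rate_gbit_s * 8 / 10     # strip the encoding overhead
    return payload_gbit_s * 1000 / 8               # Gbit/s -> MB/s

for name, rate in [("SATA 1.5 Gbps", 1.5), ("SATA 3.0 Gbps", 3.0), ("SATA 6.0 Gbps", 6.0)]:
    print(f"{name}: about {sata_payload_mb_s(rate):.0f} MB/s of payload data")
```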

3.4 External Hard Disk Drives


External hard disk drives are generally standard ATA, SCSI or SATA hard disk drives mounted in a suitable portable disk enclosure. The drive can be connected to a computer via a Universal Serial Bus (USB) or Firewire port, or in the case of SATA drives via an eSATA (external SATA) or eSATAp (power over eSATA) interface. If an eSATA or eSATAp port is not available on the system, one can usually be added using a PCI add-on card. The use of an eSATA interface has the advantage that data transfer rates are generally faster than for contemporary versions of either USB or Firewire. Having said that, a future iteration of Firewire is predicted to be able to achieve a data transfer rate of 6.4 Gbps, which will be slightly faster than the SATA 6.0 Gbps version of eSATA, while USB 3.0 will not be far behind with a data transfer rate of 4.8 Gbps. Unlike USB or Firewire, however, eSATA allows low-level drive features such as SMART (Self-Monitoring, Analysis, and Reporting Technology) to be made available to the host. Unlike Firewire, neither USB 2.0 nor eSATA are capable of providing the 12V power supply required by some 3.5" external hard disk drives (such as the 1TB Seagate external drive pictured below), which means they need a separate power supply. The introduction of eSATAp is intended to resolve this issue, while USB 3.0 will reportedly be able to provide voltages of 5V, 12V or 24V. At the time of writing, the storage capacity of a typical external hard drive can range from a few hundred gigabytes up to 4 terabytes.

Figure 13: A Seagate 1TB external hard drive [28]

To meet the demands of the fast-growing interface technologies, the data access time must decrease considerably, which is possible either by increasing the rotational speed of the platters or by increasing the cache size to hide the latency.

3.5 Future of Hard Disk Drives


Magnetic disks have followed Moore's Law during the last decades, doubling in capacity roughly every 12 months. As well as capacity, bandwidth has also followed this trend. Latency does, however, improve by a smaller factor, making random seeks more and more expensive [13]. To continue this trend, one either needs to rethink the way magnetic disks are used or move to an alternative storage solution. Considering the future of magnetic disk storage technology, there are a few bottlenecks to continuing this trend, chief among them being the rotational speed of the platters (RPM). Disk RPM is a critical component of hard disk drive performance as it directly impacts the latency and the data transfer rate from the disk. The faster the disk spins, the more data the head can read; the slower the RPM, the higher the mechanical latencies.


A Fujitsu white paper, Trends in Enterprise Hard Disk Drives [10], states: "Ultrahigh-speed HDDs rotating at speeds exceeding 20,000 rpm have also been researched but not commercialized due to heat generation, power consumption, noise, vibration and other problems in characteristics, and a lack of long term reliability." Companies have tried ingenious designs to reduce the excessive heat produced by a high spin rate. Generally, the physical disk platters of a standard 3.5 inch hard disk have an approximate diameter of 3 inches. However, in some designs, such as the Pegasus II, the platter size has been further reduced to 2.5 inches.

Figure 14: Moving Parts in Hard Disk Drives [29]

The smaller platters cause less air friction and therefore reduce the amount of heat generated by the drive. In addition, the actual drive chassis is one big heat fin, which also helps dissipate the heat. The disadvantage is that, since the platters are smaller, they have less data capacity. This can be overcome by using more of them in a stack, but consequently the height of the drive increases. To get higher data rates from HDDs, manufacturers can:

Spin the disks faster, but at 20,000 RPM enterprise-class HDD platters are already under severe mechanical stress.

Increase the number of read/write heads that can be active simultaneously, which constitutes a radical, substantial, and costly architectural and electronic change to HDD design.

Add a second servo actuator with another set of read/write heads and another set of read/write electronics, which is completely out of the question from an economic perspective.

Combining these trends suggests that what customers of big multi-user servers would really like is faster disk drives with lower power consumption! But that is just getting tougher with hard disk technology.


Chapter 4

4. Solid State Drives


In the past few years flash memory has become more and more important. In many mobile devices like mobile phones, digital cameras, USB memory sticks and mp3 players, flash memory has been used in small amounts for years. But as the price for flash memory is rapidly decreasing and the storage density of flash memory chips is growing, it becomes feasible to use flash memory even in notebooks, desktop computers and servers. It is now possible to construct a device containing an array of single flash chips such that its amount of memory is sufficient for use as main storage. Such a device is called a Solid State Drive (SSD). Solid State Drives are increasingly common in small form factor computing like notebooks, but SSDs are also used in the desktop and enterprise server space by those looking to leverage the speed of an SSD to get maximum performance. While solid state drives have several benefits, including speed, longevity and practically no noise output, they are not always the best choice, as hard drives still dominate in both capacity and cost.

4.1 Flash Market Development


The market for flash memory is changing. The density of NAND flash is increasing drastically. While flash memory chips can be made much smaller, their capacity doubles approximately every year. This development leads to a widespread usage of flash memory. Today solid state drives are mainly used in notebooks; in the future they might even be used in server architectures as the standard configuration. Another interesting development is the cost of flash memory. The price of flash memory is rapidly dropping: every month flash memory devices or flash memory storage cards get cheaper, and new products with larger capacity emerge on the market.

Figure 15: Evolution in density of NAND flash memory

4.2 Solid State Drives


Solid state drives do not need any mechanical parts. They are fully electronic devices and use solid-state memory to store the data persistently. Two different types of storage chips are used: flash memory or SDRAM memory chips. In this thesis only flash-memory-based solid state drives, which are the ones mostly used today, are considered.


Figure 16: HDD and SSD [30]

Flash memory is the cornerstone of the Solid State Drive. With the increasing use of flash-based secondary storage, a detailed understanding of flash behaviour, which affects operating system design and performance, becomes important. This chapter provides detailed information about flash memory. Then, in multiple sections, the internal parts of a solid state drive are discussed. Section 4.4 describes the flash translation layer and the techniques which ensure the functionality of the solid state drive.

4.2.1 FLASH MEMORY


Flash memory is a specific type of EEPROM that can be electrically erased and programmed in blocks. Flash memory is non-volatile memory. There are two different types of flash memory cells:

NOR flash memory cells. NAND flash memory cells.

In the early days of flash memory, NOR flash was often used. It can be addressed directly by the processor and is handy for small amounts of storage.


Figure 17: NAND flash memory chip [30]

Today, NAND flash memory is used to store the data. It offers a much higher density, which is more suitable for large amounts of data, the cost is lower and the endurance is much longer than that of NOR flash. NAND flash can only be addressed at the page level. Flash memory comes with either single level cells (SLC) or multi level cells (MLC). The difference between the two cell models is that an SLC can store only one bit per cell (1 or 0), whereas an MLC can store multiple bits (e.g. 00, 01, 10 or 11); internally these values are represented by different voltage levels. Both flash memory cells are similar in their design. MLC flash devices cost less and allow a higher storage density, which is why MLC cells are used in most mass production. SLC flash devices provide faster write performance and greater reliability, so SLC flash cells are usually used in high performance storage solutions. Table 4-1 compares the two cell models.

                              SLC   MLC
High Density                        X
Low Cost per Bit                    X
Endurance                     X
Operating Temperature Range   X
Low Power Consumption         X
Write/Erase Speeds            X
Write/Erase Endurance         X

Table 4-1 SLC vs MLC [9]


Flash memory only allows two possible states:


erased programmed

When a flash memory cell is in the erased state, its bits are all set to zero (or one, depending on the flash device). Only when a flash cell is in the erased state can the controller write to that cell; in this example this means the 0 can be set to 1. The cell is now programmed and, in a sense, frozen: it is not possible to simply change the 1 back to a 0 and write again. The flash memory cell has to be erased first (Figure 18). Even worse, it is not possible to erase just a few cells. The erase operation has to be performed on a much larger scale: it can only be done at the granularity of erase units of, for example, 512 KB. If the amount of data being written is small compared to the erase unit, a correspondingly large penalty is incurred in writing the data. The flash memory architecture is divided into blocks of flash memory, and the smallest erasable block is called an erase unit. If the written data overlaps two blocks, both blocks have to be erased. However, this erase operation does not necessarily have to be executed right before or after the write; the controller of the device might instead choose a new block for the write request and update the internal addressing map.

Figure 18: Flash memory overwrite mechanism
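To make the penalty of this erase-before-write behaviour concrete, the following Python sketch estimates how many bytes would actually have to be reprogrammed if a small update were performed strictly in place. It is illustrative only; the 4 KB page and 512 KB erase-unit sizes are assumptions taken from the typical values quoted later in this chapter.

```python
PAGE_SIZE = 4 * 1024          # assumed page size (bytes)
ERASE_UNIT = 512 * 1024       # assumed erase-unit size (bytes)
PAGES_PER_UNIT = ERASE_UNIT // PAGE_SIZE  # 128 pages per erase unit

def in_place_update_cost(update_bytes: int) -> int:
    """Bytes that must be rewritten if the drive really updated data in place:
    every touched erase unit has to be erased and fully reprogrammed."""
    pages_touched = -(-update_bytes // PAGE_SIZE)      # ceiling division
    units_touched = -(-pages_touched // PAGES_PER_UNIT)
    return units_touched * ERASE_UNIT

if __name__ == "__main__":
    for size in (512, 4 * 1024, 64 * 1024):
        cost = in_place_update_cost(size)
        print(f"updating {size:>6} B rewrites {cost} B (factor {cost / size:.0f}x)")
```

In practice the controller avoids this penalty by writing to a fresh block and remapping, as described above.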


4.2.1.1 Flash Structure


NAND flash memory is organized into blocks, where each block consists of a fixed number of pages. Each page stores data together with corresponding metadata and error correction code (ECC) information. A single page is the smallest read and write unit. The internal structure of flash memory is rarely identical from chip to chip; as the technology has matured over the years, many smaller architectural changes have been made. There are, however, a few fundamentals in how flash memory is constructed. Each chip has a large number of storage cells which, to be able to store data, are arranged into rows and columns [1]. This is called the flash array. The flash array is connected to a data register and a cache register (Figure 19). These registers are used when reading or writing data to or from the flash array. By having a cache register in addition to a data register, the flash memory bank can internally start processing the next request while data for the current one is still being read or written.

Figure 19 : A generic overview of a Flash memory bank [5]

4.2.1.2 Page
Pages in a flash array are the smallest unit any higher level of abstraction works on. The size of a page may vary depending on the physical structure, but is typically 4 KB [6, 5]. With 128 pages per block, the next larger unit in the flash memory hierarchy is the erase unit of 512 KB; this can vary from drive to drive. In addition, each page has an allotted space for error-correction code (ECC). During a read operation, all the data from the page is transferred to the data register. In a similar way, a write operation to a page writes all data in the data register to the cells within the page. Recall that flash cells support only two states when writing: a cell can be in a neutral or a negative state. When writing data to a page, it is only possible to change from the neutral (logical one) to the negative (logical zero) state, which means that to change a bit from zero back to one, the entire page needs to be reset. Finally, flash chips can be grouped together in so-called planes to increase storage capacity, and multiple planes can be accessed in parallel to enhance data throughput [12].

4.2.1.3 Erase Block


When resetting cell state with field emission, multiple pages are affected by the reset. This group of pages is called an erase block. A typical erase block contains 128 pages [7], but this can differ depending on how the flash cells are structured. Given a page size of 4 KB, an erase block would then be 512 KB in size. This means that changing the content of any page within the erase block would require rewriting all 512 KB. For this simple reason, in-place writes are not practical in flash memory.

4.2.1.4 Cell degradation


Each time a flash cell is erased, the stress on the cell from the field emission contributes to cell degradation [1]. Modern flash memory banks are usually rated for approximately 10^5 erase cycles; to be able to handle a small number of faulty cells, each page is fitted with ECC data.
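A rough back-of-the-envelope estimate shows what such an endurance rating means for a whole drive. The sketch below is illustrative only; the 32 GB capacity, the 10^5 erase-cycle rating and the write amplification factor are assumptions, not measured values, and perfect wear-leveling is assumed.

```python
def drive_write_lifetime_tb(capacity_gb: float,
                            erase_cycles: int,
                            write_amplification: float) -> float:
    """Total host data (in TB) that can be written before the rated
    erase cycles are exhausted, assuming perfect wear-leveling."""
    total_flash_writes_gb = capacity_gb * erase_cycles
    host_writes_gb = total_flash_writes_gb / write_amplification
    return host_writes_gb / 1024.0

if __name__ == "__main__":
    # Assumed example: 32 GB SLC drive, 100,000 cycles, write amplification of 2.
    print(f"{drive_write_lifetime_tb(32, 100_000, 2.0):.0f} TB of host writes")
```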

4.3 Physical layout


While flash memory is the cornerstone of the Solid State Drive, before data gets to the flash memory it must pass through several other SSD components. An SSD does not actually have many unique parts, and the differentiation between SSDs from different manufacturers happens in the controller and firmware more than anything else. Hardware manufacturers release little information about drive layout and how data is organized. To illustrate this, consider the entirety of what the Intel X25-E datasheet has to say about its architecture.

Figure 20 : Components of SSD

The Intel X25-E SATA Solid State Drive utilizes a cost effective System on Chip (SoC) design to manage the full SATA 3 Gbps bandwidth with the host while managing multiple flash memory devices on multiple channels internally [2]. The structure of the flash memory banks in Figure 19 gives a general idea of what to expect, but only for a simple read/write operation. As the block diagram in Figure 23 shows, an SSD connects several flash memory banks to a flash controller (FC). In a single SSD there are usually multiple FCs, commonly called channels. As the name implies, each channel can process requests independently, giving the SSD the ability to internally process a number of operations in parallel.


Figure 21: Organization of a conventional SSD

4.4 Flash Translation Layer (FTL)


In order to alleviate the erase-before-write problem in flash memory, most flash memory storage devices are equipped with a software or firmware layer called Flash Translation Layer (FTL) [11]. An FTL makes a flash memory storage device look like a hard disk drive to the upper layers. One key role of an FTL is to redirect each logical page write from the host to a clean flash memory page which has been erased, and to remap the logical page address from an old physical page to a new physical page. In addition to this address mapping, an FTL is also responsible for data consistency and uniform wear-leveling. The concept of the FTL is implemented by the controller of the solid state drive. The layer tries to efficiently manage the read and write access to the underlying flash memory chips. It hides all the details from the user. So when writing to the solid state drive the user does not have to worry about free blocks and the erase operation. All the managing is done internally by the FTL. It provides a mechanism to ensure that writes are distributed uniformly across the media. This process is called wear-leveling and prevents flash memory cells from wearing out.

4.4.1 Controller
The controller of a solid state drive manages all the internal processes to make the FTL work. It contains a mapping table that performs the logical-to-physical mapping: the logical address that comes with the request is mapped to the physical address which points to the flash memory block where the data is in fact stored. Whenever a read or write request arrives at the solid state drive, the logical block address (LBA) first has to be translated into the physical block address (PBA) (Figure 22). The LBA is the block address used by the operating system to read or write a block of data on the flash drive. The PBA is the physical address of a block of data on the flash drive. Note that over time the PBA corresponding to one and the same LBA can change often.

Figure 22: Address translation in a solid state drive [8]

The controller handles the wear-leveling process (see section 4.4.2). When a write request arrives at the solid state drive, a free block is selected, the data is written and the address translation table is updated. The old block does not have to be erased immediately: the controller can defer the erasure and perform a kind of garbage collection once the number of free blocks falls below a certain limit, or wait until the drive is not busy. Certainly some data structures are needed to maintain a free block list and to track the used blocks. Each flash memory block has a little overhead memory where metadata can be stored to help manage these structures; for example, a counter stores how many times the block has already been erased. Like conventional hard disk drives, SSDs usually have an internal DRAM cache to buffer write requests or store prefetched pages. This buffer enables solid state drives to back up and restore pages during erase cycles and to keep in-memory information such as page-mapping structures. Using a larger DRAM cache and adding more intelligent techniques to organize requests can make a huge difference. By using an FTL it is possible to avoid most drawbacks of flash chips while exploiting their advantages; the FTL is therefore a major performance-critical part of every SSD. For such an SSD controller one can think of many optimizations.
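As a concrete illustration of this address translation, the following Python sketch implements a minimal page-mapped FTL: every host write goes to a clean page and only the mapping table is updated, so the old physical page merely becomes invalid. This is a simplified model under assumed behaviour (it ignores ECC, caching and the real block layout) and is not the mapping scheme of any particular drive.

```python
class SimplePageFTL:
    """Minimal page-mapped FTL: out-of-place writes with an LBA->PBA table."""

    def __init__(self, num_pages: int):
        self.mapping = {}                      # LBA -> PBA
        self.free_pages = list(range(num_pages))
        self.invalid_pages = set()             # stale pages awaiting erase

    def write(self, lba: int) -> int:
        if not self.free_pages:
            raise RuntimeError("no clean pages: garbage collection needed")
        new_pba = self.free_pages.pop(0)       # pick any clean page
        old_pba = self.mapping.get(lba)
        if old_pba is not None:
            self.invalid_pages.add(old_pba)    # old data is now stale
        self.mapping[lba] = new_pba            # remap the logical page
        return new_pba

    def read(self, lba: int) -> int:
        return self.mapping[lba]               # physical page holding the data

if __name__ == "__main__":
    ftl = SimplePageFTL(num_pages=8)
    print(ftl.write(0))   # first write of LBA 0
    print(ftl.write(0))   # rewrite lands on a new PBA; the old one becomes invalid
    print(ftl.read(0))    # a read always follows the current mapping
```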


Figure 23 : Internal structure of solid state drive [6]

Pre-fetching data when sequential read patterns occur (much as a conventional hard disk drive fills its whole buffer) might speed up the reading process. A controller could also write to different flash chips in parallel (Figure 23). Since all the parts are purely electronic, parallelization is not very hard to add; flash memory can be seen as many memory cells arranged in parallel. With parallelization, the I/O requests, the erase process and the internal maintenance of the data structures become more complicated, but much higher performance can be achieved. One could even think of constructing an SSD that internally combines several drives in a RAID configuration.

4.4.2 Garbage collection and Wear-leveling


Garbage collection and wear-leveling are other important tasks of the FTL. Garbage collection is needed because blocks must be erased before they can be reused; the garbage collector works by scanning the SSD blocks for invalid pages and then reclaiming them. Wear-leveling is necessary because most workloads write to a small subset of blocks frequently while rarely writing to others. Because each block of flash memory survives only a limited number of write-erase cycles before it is worn out, the frequently written blocks would, without wear-leveling, wear out well before the other blocks. Wear-leveling helps solve this problem by shuffling cold (unused or less frequently used) blocks with hot (frequently used) blocks to balance out the number of writes over all of the flash memory.
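The sketch below shows one common, but here only assumed, greedy policy for these two tasks: the garbage collector picks the block with the most invalid pages as its victim, copies the remaining valid pages elsewhere and erases the block, while a simple wear-leveling tie-breaker prefers victims with low erase counts. Real controllers use more elaborate policies.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    valid_pages: set = field(default_factory=set)
    invalid_pages: set = field(default_factory=set)
    erase_count: int = 0

def pick_victim(blocks: list[Block]) -> int:
    """Greedy garbage-collection policy with a wear-leveling tie-breaker:
    reclaim the block with the most invalid pages; among equally dirty
    blocks, prefer the least-worn one."""
    return max(range(len(blocks)),
               key=lambda i: (len(blocks[i].invalid_pages),
                              -blocks[i].erase_count))

def garbage_collect(blocks: list[Block], spare: Block) -> int:
    victim = pick_victim(blocks)
    blk = blocks[victim]
    copied = len(blk.valid_pages)          # extra writes caused by GC
    spare.valid_pages |= blk.valid_pages   # relocate still-valid data
    blk.valid_pages.clear()
    blk.invalid_pages.clear()
    blk.erase_count += 1                   # erase the reclaimed block
    return copied
```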

4.4.3 Write Amplification


As seen in the earlier sections, SSDs need workarounds because in-place writes are not possible: changing a few bytes of data requires either moving the affected data to another erase block or rewriting the entire erase block. Write amplification is a measure of the number of bytes actually written to flash when the host writes a certain number of bytes. For example, when writing a 4 KB file, the drive may on average write 40 KB worth of data. This comes back to the flash characteristics: at some point, data from several partially used blocks has to be combined to free up pages for new data. Write amplification has a direct impact on the life of a drive; one effective way to express drive lifetime is how many host bytes can be written to the drive over its lifetime.
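The write amplification factor can be expressed as the ratio of flash writes to host writes. The short sketch below computes it for the hypothetical numbers used above (4 KB requested, 40 KB actually programmed); the values are illustrative, not measurements of any particular drive.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """WA = bytes physically programmed to flash / bytes requested by the host."""
    return flash_bytes / host_bytes

if __name__ == "__main__":
    # Hypothetical example from the text: a 4 KB host write costs 40 KB of flash writes.
    wa = write_amplification(4 * 1024, 40 * 1024)
    print(f"write amplification factor: {wa:.1f}")   # -> 10.0
```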

4.4.4 Error correction


Knowing that a cell will lose its ability to store data properly after a certain number of writes, the SSD controller needs to handle erroneous pages gracefully. To detect errors, each page has an allotted space for ECC, which makes it possible to check the consistency of the data. The ECC can compensate for a given number of damaged cells, but at some point the amount of noise becomes uncorrectable. Such a page is then marked as invalid and no longer used by the FTL.

4.4.5 Trim
Trim is a command with which the operating system tells the drive that a page is no longer valid. This helps to reduce write amplification because stale pages are not copied; with fewer pages to copy, the process of freeing up partially valid blocks also speeds up. When it is time to consolidate blocks to free up space, the SSD must copy all of the data it considers valid to a new block before it can erase the current block. Without trim, the SSD does not know a page is invalid unless the LBA associated with it has been rewritten.


4.5 Solid State Drive Interfaces


Solid State Drives are available with a variety of system interfaces, chosen primarily according to the performance requirements for the SSD in the system. Since SSDs are generally used alongside, or interchangeably with, magnetic disk drives, a common mass storage bus interface is used in most cases. This also allows the system software to manage both drive types in a similar way, making system integration nearly plug-and-play. There are also interfaces initially designed for other purposes that have been adopted by SSDs in some cases. SSDs generally support SATA, Serial Attached SCSI and ATA/IDE, just like HDDs, but also the latest revisions of these standards. To meet higher demands, SSDs with PCI Express interfaces are used. In many server applications with HDDs, PCI Express appears as the end interface only when the drives are combined in RAID mode, simply to fill the bandwidth PCI Express offers. With a PCI Express SSD, however, the flash chips are placed directly on the PCI Express card, extracting better performance from the flash chips than HDDs in RAID can deliver through other interfaces.

Figure 24: x4 PCI Express card with NAND flash chips on it [31]

4.5.1 PCI Express

PCI Express is a 2.5 gigatransfers/second serial, differential, point-to-point high-speed interconnect with added flexibility and scalability. The immediate benefit is increased bandwidth: PCI Express offers 4 GB/s of peak bandwidth per direction for a x16 link and 8 GB/s concurrent bandwidth. This allows for the highest performance in gaming and video capture. In addition, PCI Express is designed for cost parity: the PCI Express x16 connector is expected to be at cost parity with the high volume standard connectors. Peripheral Component Interconnect Express (PCIe) is an internal interface, so such an SSD sits on a circuit board that is plugged into a PCI Express slot on the motherboard.
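The 4 GB/s figure follows directly from the per-lane signalling rate and the 8b/10b line coding used by the first PCI Express generations. A minimal sketch of that arithmetic, assuming 2.5 GT/s per lane and a x16 link as in the text:

```python
def pcie_peak_bandwidth_gb_s(gt_per_s: float, lanes: int,
                             coding_efficiency: float = 8 / 10) -> float:
    """Peak bandwidth per direction in GB/s for an 8b/10b-coded PCIe link."""
    bits_per_s = gt_per_s * 1e9 * coding_efficiency   # payload bits per lane
    return bits_per_s / 8 / 1e9 * lanes               # bytes per second, all lanes

if __name__ == "__main__":
    # 2.5 GT/s per lane, x16 link -> 4 GB/s per direction (8 GB/s concurrent).
    print(f"{pcie_peak_bandwidth_gb_s(2.5, 16):.1f} GB/s per direction")
```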

4.6 SSD Market


Industry sources predict that SSDs will have a major impact on the storage market. Today, companies such as Crucial, Intel and Fusion-io, to name just a few, have already released high speed SSDs. The architectural technologies that can speed up performance and IOPS are developed independently by various original equipment manufacturers to suit particular products or markets. The key architectural features which will increase throughput and shrink the asymmetry gap in read/write IOPS are:

Parallelization of the internal flash arrays.
Improved flash management technology.
Faster flash controllers.
Faster host interface controllers (and faster interfaces driven by the needs of the SSD market rather than adapted from the HDD market).

Hybridizing on-board memory technologies - for example, using faster RAM-like non-volatile memory in some parts of the device and slower flash-like memory in the bulk storage arrays.

A lot of trial and error will be involved as original equipment manufacturers throw products at the market which tweak the technologies they understand best, and see which products stick. Some of these will enhance currently known architectures, while others may make some architectural features obsolete. In the coming years, flash SSD technology is expected to reach a point where the architecture of an ideal SSD is well established and ongoing developments will be driven more by process changes than anything else.


Figure 25 : SSD Market development

4.7 Future
The availability and maturity of SSD technology has changed drastically over the last couple of years, going from a vastly more expensive technology that proved better in only a small subset of scenarios to a mainstream option. With ONFI (Open NAND Flash Interface) [3] working intensively on NAND technology, the future of SSDs looks bright. ONFI has created the Block Abstracted NAND addendum specification to simplify the host controller design by relieving the host of the complexities of ECC, bad block management and other low-level NAND management tasks. The Block Abstracted NAND revision 1.1 specification adds the high speed source-synchronous interface, which provides up to a 5x improvement in bandwidth compared with the traditional asynchronous NAND interface. The ONFI workgroup continues to evolve the ONFI specifications to meet the needs of a rapidly growing and changing industry.


ONFI 2.1 [3] contains a plethora of new features that deliver speeds of 166 MB/s and 200 MB/s, plus other enhancements to power, performance and ECC capabilities. Along with ONFI, the SSD manufacturing companies are designing their products to support fast interface technologies such as SATA III and PCI Express. ONFI is dedicated to simplifying NAND flash integration into consumer electronic products, computing platforms and any other application that requires solid state mass storage.

4.8 Summary
This chapter has given an overview of the technology behind SSDs. Flash cells are at a point where production and technology are mature enough to make storage devices capable of competing with magnetic disks. Some of the challenges SSDs face when using these flash cells for bulk storage, such as the FTL and wear-leveling, have also been discussed.

4.9 Typical characteristics of HDD and SSD


Reliability of the drive - HDDs use mechanical parts whose lifespan is limited, while SSDs using flash memory can sustain almost 10^5 write cycles per cell [21].

Access speed - The typical access time for a flash based SSD is about 35-100 microseconds, whereas that of a rotating disk is around 5,000-10,000 microseconds. That makes a flash based SSD roughly 100 times faster than a rotating disk.

Consistent read performance - Read performance does not change based on where data is stored on an SSD. On an HDD, if data is written in a fragmented way, reading it back will have varying response times.

Defragmentation - SSDs do not benefit from defragmentation, because there is little benefit to reading data sequentially and any defragmentation process adds additional writes to the NAND flash, which already has a limited cycle life [22]. HDDs may require defragmentation after continued operation or after erasing and writing data, especially involving large files or when disk space becomes low.

Audible noise - HDDs produce audible clicks and crunching sounds, while SSDs are quieter because they have no mechanical parts.

Size - Flash based SSDs are manufactured in standard 2.5" and 3.5" form factors; 2.5" SSDs are normally used in laptops or notebooks while the 3.5" form factor is used in desktops.

Vibration - SSDs are naturally more rugged than HDDs. An SSD can sustain up to 1,000 G/0.5 ms of shock [16] before sustaining damage or a drop in performance, while HDDs can withstand up to 63 G/2 ms while operating and 350 G/1 ms [24] when turned off.

Power consumption - SSDs have lower power consumption than HDDs.

Heat dissipation - Along with the lower power consumption, systems using flash based SSDs as their data storage solution also dissipate much less heat, due to the absence of heat generated by rotating or moving media. This is one of the main advantages of flash based SSDs over a traditional HDD, and with less heat dissipation they are an ideal data storage solution for mobile systems such as PDAs, notebooks, etc.

Mean Time Between Failures (MTBF) - The average MTBF for SSDs is approximately 2,000,000 hours [16], while the MTBF for HDDs is approximately 700,000 hours [24].

Cost considerations - As of February 2011, NAND flash SSDs cost about (US) $1.20-2.00 per GB, while HDDs cost about (US) $0.05/GB for 3.5 inch and $0.10/GB for 2.5 inch drives.


Chapter 5

5 . Performance: HDD vs SSD


The previous chapter indicated that SSDs have the advantage of not having moving parts, giving them overall low latency. Magnetic disks, on the other hand, have a harder time keeping latency low, due to seek and rotational latency. In this chapter the focus is on how these general performance characteristics add up when faced with specific application scenarios. The goal is to get a clear profile of both SSD and HDD, so that the right choice can be made when it comes to performance. Various techniques are used to analyse the performance of storage drives and the architecture behind that performance.

5.1 Benchmark
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it [23]. Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures. The performance of both SSDs and magnetic disks can be difficult to summarize with just a few numbers. As discussed earlier, certain aspects of a disk might give different performance results, and one might get different performance depending on the workload. In addition to these uncertainties, different file systems store data in fundamentally different ways. All this put together, it is hard to get a clear answer on what level of performance a given application can expect by looking only at datasheet numbers.

To investigate performance levels, up-to-date high-end SATA consumer and enterprise flash solid state drives are benchmarked against a mechanical hard disk drive. When choosing drives for the benchmark, the focus was on mid-range alternatives: two of the most popular SSDs on the market today are considered, namely the Intel X25-E and the Crucial Real C300. The two SSDs differ in the type of memory and the system interface technology used. An HDD from Seagate is considered for benchmarking.

Disk Specification

                         Seagate [24]       Intel X25-E [16]   Crucial Real C300 [17]
Type                     HDD                SSD                SSD
Size (GB)                80                 32                 128
Form factor              3.5"               2.5"               2.5"
Interface                SATA               SATA               SATA
Transfer rate (Gbps)     1.5/3              1.5/3              6/3/1.5
Rotation (RPM)           7200               -                  -
Memory                   Magnetic platter   SLC NAND           MLC NAND
Average access time      4.16 ms            0.08 ms            <0.1 ms
Sequential read          -                  250 MB/s           355 MB/s
Sequential write         -                  170 MB/s           140 MB/s

Table 5-1 Overview of drives in Benchmark environment

5.2 Benchmark environment


The benchmark environment consists of two different SSDs, one from Intel and one from Crucial, along with a magnetic disk drive from Seagate. Information about the drives, as provided by their datasheets, is available in Table 5-1 for comparison. During the benchmarks the drives are, for simplicity, referred to by their manufacturer. The test PC consists of an Intel Xeon 5600 processor (2.67 GHz) with the Intel 5520 chipset and 2 GB of random access memory (RAM), running the Windows XP (32-bit) operating system. In section 5.3, different benchmarks are run on both SSDs and the magnetic disk, and their performance is analysed using the resulting benchmark values.

5.3 TPC-H Benchmark


Transaction Processing Performance Council (TPC) is a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. TPC benchmarks are widely used today in evaluating the performance of computer systems.

The TPC-H benchmark is widely used in the database community as a yardstick to assess the performance of database management systems against large scale decision support applications. The benchmark is designed and maintained by the Transaction Processing Performance Council.

5.3.1 BACKGROUND AND SIGNIFICANCE OF TPC-H


The TPC-H benchmark [13] tests the performance of analytics servers used by decision support systems by measuring the performance of ad-hoc queries against a data set (called a scale factor) of a specific size while the underlying data is being modified. The objective is to simulate an online production database environment with an unpredictable query load that represents a business-oriented decision support workload, where a DBA must balance query performance and operational requirements such as locking levels and refresh functions. Results are usually expressed as QphH@Size for performance or $/QphH@Size, where Size indicates the database size or scale factor used for the testing. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size) and reflects multiple aspects of the system's capability to process queries. TPC-H benchmarking database sizes are currently 1 GB, 10 GB, 30 GB, 100 GB, 300 GB, 1,000 GB, 3,000 GB, 10,000 GB, 30,000 GB and 100,000 GB, but the TPC discourages comparing results across different database sizes, since database size is a major and obvious factor in performance. Although any benchmark, including TPC-H, is unlikely to represent any particular customer's decision support workload or environment, TPC-H is an important test because of the high level of stress it puts on many parts of a decision support system, and it is used by virtually all major platform vendors, and many decision support system suppliers, to demonstrate the performance attributes of their systems. In this thesis there is one important deviation to note: in contrast to expressing results as QphH@Size, the time taken by individual queries to run against the database set is measured. The TPC provides a set of tools, in the form of code files, to build the TPC-H benchmark. The tools provided with TPC-H include a database population generator (DBGEN) and a query template translator (QGEN).

DBGEN and QGEN are written in ANSI C for portability and have been successfully ported to over a dozen different systems. While the TPC-H specification allows an implementer to use any utility to populate the benchmark database and to create the benchmark query sets, the resulting population must exactly match the output of DBGEN. The source code is provided to make building a compliant database population and query sets as simple as possible. A TPC-H benchmark application package was created and bound to a database; this application measures the time taken by each individual query to run against a 10 GB database generated with DBGEN. An overview of the TPC-H benchmark application created is shown in Figure 26. There are several steps to follow in order to create an application package; for the detailed procedure see appendix A. TPC benchmark results are expected to be accurate representations of system performance, so there are certain guidelines that must be followed when measuring those results. The approach and methodology used in the measurements are explicitly described in the specification [13].

Figure 26 : TPC-H benchmark application outline



5.3.2 Test scenario


SSDs are somewhat more sensitive to random reads than to sequential reads, and it is therefore interesting to observe how they perform when the queries change the data access pattern while scanning through a large database. In addition to raw read performance, it is interesting to know how long the same operations, such as scanning tables during database traversals, take on the Intel X25-E and the Crucial Real C300 as opposed to Seagate's HDD. The application package was created using 17 queries from QGEN, each of which accesses the 10 GB database. The application database was created independently on all three drives, and the identical application was run against each drive by binding the package to the respective database. The execution time of every individual query was recorded. To discover how much impact the access time has on performance, the same set of queries was tested across the three different disks listed in Table 5-1. Each query was run 115 times and the initial 15 runs were excluded when calculating the average time taken; this was done to ensure that the standard deviation between query execution times is less than 3 percent and that the processor is busy only with the application query.
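The averaging procedure described above is straightforward to express in code. The following Python sketch is illustrative only: `run_query` stands in for whatever mechanism actually executes a TPC-H query and returns its wall-clock time, and the run counts and 3 percent limit are the ones stated above.

```python
import statistics
from typing import Callable

def average_query_time(run_query: Callable[[], float],
                       runs: int = 115, warmup: int = 15,
                       max_rel_stddev: float = 0.03) -> float:
    """Run a query `runs` times, drop the first `warmup` results, and return
    the mean execution time, verifying the spread stays within the limit."""
    times = [run_query() for _ in range(runs)][warmup:]
    mean = statistics.mean(times)
    rel_stddev = statistics.stdev(times) / mean
    if rel_stddev > max_rel_stddev:
        raise RuntimeError(f"relative std. deviation {rel_stddev:.1%} exceeds limit")
    return mean
```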

5.3.3 Results
Comparing the results in Figure 27 shows that the query execution times of the Intel X25-E and the Crucial Real C300 are low and comparable to one another, in contrast to the very high execution times of the Seagate HDD.
Figure 27: TPC-H benchmark performance results (average execution time in seconds for queries 1-17 on the Seagate HDD, Intel X25-E SSD and Crucial C300 SSD)



The difference in execution time is not consistent over the queries executed; this depends on how the database is laid out across the drives. Considering the query execution times, on average the Intel X25-E is 8 times and the Crucial C300 10 times faster than the Seagate HDD. Both the Intel X25-E and the Crucial Real C300 perform relatively close to their advertised speeds when doing reads, thanks to the symmetric latency properties of SSDs. Except for query readings 1, 8, 10, 12 and 16, both SSDs, across all file systems, achieve significantly lower execution times. This can be attributed to the fact that the database is distributed across flash memory chips and the flash memory banks are channelled in parallel, hence the lower execution times: when receiving a series of requests for data located on different channels, the SSDs are able to handle these requests in parallel.

The execution times of query readings 1, 8, 10, 12 and 16 are comparable on both SSDs and the Seagate HDD; this is because these queries access sequentially stored data. Summarizing, solid state drives perform significantly better than hard disk drives in random operations, while the gap narrows for sequential operations.

5.4 Energy Efficiency Test


When talking about solid state drives, power consumption becomes an interesting point of discussion. Nowadays many large server architectures run 24 hours a day and consume a considerable amount of power. An advantage of flash memory is its low power consumption. The main advantage of a solid state drive over a good hard disk drive in terms of power consumption is that its power consumption during operation is lower; on the other hand, several measurements have shown that the idle power consumption of a hard drive can be much lower. Workloads can certainly be found where the hard disk drive wins and others where the solid state drive shows better energy efficiency. Newer high performance solid state drives often contain an additional DRAM cache which also uses some power. A general power consumption comparison is therefore not easy to make, and no general statement can be given; every hard disk drive or solid state drive has slightly different power consumption. See the critical article from Tom's Hardware [15] for more information.


5.4.1 Test scenario


It is interesting to know how much power the Seagate hard disk drive consumes on average in comparison to the Intel X25-E and the Crucial Real C300.

The power consumption was measured while running all 17 TPC-H queries, each of them 10 times. For the measurement, the Cost Control 3000, a product from Basetech, was used, which measures not only power consumption but also the corresponding energy costs. The consumption indicated is not caused purely by the drives alone but also by the system on which they are running; the variations observed are nevertheless acceptable and are mainly due to the moving parts of the Seagate hard disk.

5.4.2 Results
The results are displayed in Figure 28.

Figure 28: Comparison for energy efficiency (consumption in kWh for the Seagate HDD, Intel X25-E and Crucial C300)

These values are indirectly influenced by the total amount of time taken by the individual drives to execute all the TPC-H queries. The Seagate drive consumes approximately 6 times more power than the solid state drives under test. This is mainly due to the moving parts of the Seagate drive; especially during random reads, the head needs to be moved repeatedly. In the case of the SSDs, no additional power is required to drive platters or mechanical arms.



However, as said earlier, no general conclusion can be drawn from this; the picture merely shows the overall amount of power that can be saved by replacing the HDD with a solid state drive while performing the same task.

5.5 HD Tune Benchmark


HD Tune is a hardware-independent utility that administrators can use to perform a variety of hard disk diagnostics, regardless of manufacturer, to confirm hard disk health and performance. HD Tune offers many functions, namely a benchmark (which measures low level read/write performance), a file benchmark, a random access test, a health status check and a drive temperature display. The main interest here is the benchmark functionality; the benchmark function itself offers four different tests:

Transfer rate - The data transfer rate is measured across the entire disk surface (default) or across the selected capacity [14] for the specified data block size. It offers the option to measure the transfer rate for both read and write. To prevent accidental data loss, the write test can only be performed on a disk with no partitions. During the transfer test, certain parameters have to be set according to the requirements. Test speed/accuracy: the full test reads or writes every sector on the disk; this gives the most accurate results, but the test time is very long. By choosing the partial test, the transfer speed is sampled across the disk surface, and the test time and accuracy can be chosen by moving a slider. Block size: the block size used during the transfer rate test; lower values may give lower test results, and the default and recommended value is 64 KB.

Access time - The average access time is measured and displayed in milliseconds (ms).

Burst rate - The burst rate is the highest speed (in megabytes per second) at which data can be transferred from the drive interface (IDE, SATA, USB) to the operating system.

CPU usage - The CPU usage shows how much CPU time (in %) the system needs to read data from the hard disk.

As seen in section 5.3.3, the time consumed by the Seagate hard disk to run the queries was quite high in comparison to the Intel X25-E and Crucial Real C300. This variation could be due to various factors such as access time and average transfer rate. In section 5.5.1 these factors are examined by running HD Tune on all three drives.

5.5.1 Test scenario


The HD Tune utility was run on all three drives. For the transfer rate test, the block size was set to 4 MB with the full test in fast mode.

5.5.2 Results
Transfer rate:

Figure 29: Read speed comparison (average read speed in MB/s for the Seagate HDD, Intel X25-E and Crucial C300)

The sequential read speed of the Seagate hard disk is low compared to the solid state drives, as shown in Figure 29: the Seagate HDD is nearly 4 times slower than the Intel X25-E and 5 times slower than the Crucial C300. This is mainly because of its large access time.

Access time

Figure 30: Access time comparison (access time in ms: Seagate HDD 15.7, Intel X25-E 0.1, Crucial C300 0.1)

This access time difference alone shows how valuable solid state drives can be in high speed, real time applications: the SSDs access data nearly 150 times faster than the HDD.

5.6 Summary
The results obtained from the benchmarks fit many of the observations made in section 3.1.3. Magnetic disks showed overall low performance on read operations, due to seek time and rotational delay. SSDs are 8-10 times faster for reads on average, and 150 times faster with respect to access time, compared to the hard disk drive. The SSDs showed high performance on read operations, with an even greater advantage on random reads in the TPC-H benchmark. Most likely, this can be attributed to the fact that an SSD consists of multiple flash memory chips connected in parallel, as discussed in section 4.3; the SSD depends heavily on the FTL, as each channel can handle requests in parallel. The overall results indicate that solid state drives give better performance than HDDs, but on closer observation a considerable performance difference can also be seen between the two solid state drives. The Crucial Real C300 SSD outperforms the Intel X25-E SSD in most of the benchmark tests conducted above. The performance difference could be due to various factors such as flash type, controller, system architecture and cache buffer used. The coming chapters analyse this performance difference.


Chapter 6

6 . Better Investment: SSD or additional RAM?


In comparison to hard disk drives, solid state drives are faster, quieter and more energy efficient, but costlier. Usually, to get better performance out of hard disk drives, systems are equipped with more random access memory. Since the cost per gigabyte of hard disk drives is low, the performance of an HDD-based system can be improved simply by adding RAM. If a system working on 10 GB of data is installed with more than 10 GB of RAM, system performance should increase immensely, as the data can be loaded completely into RAM, thereby hiding the access time of the hard disk drive. But RAM itself is costly, as stated in section 2.3.1. Performance can thus be increased immensely by adding more RAM to the system; as seen in the previous chapter, it can also be enhanced by replacing the HDD with an SSD. To get a better picture of which is the better buy, section 6.1 compares the performance of the HDD with additional RAM to the SSDs with 2 GB of RAM. This gives an idea of the better investment, RAM or SSD, for enhancing system performance. For this analysis, the same two solid state drives used in section 5.1 and the Seagate hard disk drive are considered, and the TPC-H benchmark is used for the performance comparison.

6.1 Benchmark Environment


Drive Specification

                   Seagate [24]       Intel X25-E [16]   Crucial Real C300 [17]
Type               HDD                SSD                SSD
Size (GB)          80                 32                 128
Interface          SATA 2.0           SATA 2.0           SATA 2.0
Rotation (RPM)     7200               -                  -
Memory             Magnetic platter   SLC NAND           MLC NAND
System RAM (GB)    2 / 8 / 12         2                  2

Table 6-0-1 Overview of drives in Benchmark environment

The TPC-H benchmark with a scale factor of 10 was initially run on all the drives with 2 GB of system RAM. The benchmark was then repeated on the Seagate HDD with 8 GB and with 12 GB of RAM.

CPU: Intel Xeon Processor 5600
Main board: Intel 5520
OS: Microsoft Windows 7 Professional x64
Memory:
o 2 GB, DDR3-1333 SDRAM (Kingston)
o 4 GB, DDR3-1333 SDRAM (Kingston)
o 8 GB, DDR3-1333 SDRAM (Micron)

The TPC-H queries were run 115 times, but when calculating the average query execution time, the first 15 results were excluded to ensure that the processor was busy only with the application query.

6.2 Results
To match the performance of the SSDs, the system RAM was increased in steps.

Figure 31: Performance comparison between HDD with 12 GB system RAM and SSDs with 2 GB system RAM

Analysis of Figure 31 indicates that, except for queries 1, 3, 5, 12 and 16, the performance of the SSDs with 2 GB RAM cannot be matched even by increasing the RAM of the Seagate HDD system to 12 GB. Increasing the RAM beyond 12 GB to attain better performance is not productive in the current scenario. Since database sizes are normally huge, a large amount of RAM would have to be added continually as the database grows in order to sustain the performance. Considering the overall query performance, the SSDs provide better results. The comparison of HDD performance with varying RAM is shown in Figure 32: performance appears to saturate regardless of the RAM increase, except for queries 1, 2, 8, 10, 12 and 15. Increasing RAM beyond the actual database size to attain better performance is not productive.

Figure 32: Performance comparison between HDD with 2 GB, 8 GB and 12 GB system RAM

6.3 Conclusion
Database (server) applications benefit greatly from random disk access speed; this is why servers have large DRAM footprints used as disk cache. Solid state drives, however, provide significantly better performance on random reads, and the test results indicate that they perform better when dealing with large amounts of data. For server applications it is therefore worth investing in solid state drives rather than RAM; solid state drives are the better choice for performance enhancement. A definitive decision cannot be taken based on the above results alone, as the scenario cannot be generalised. For applications where random reads or writes are rare compared to sequential reads or writes, an HDD with more RAM is the better buy and saves a significant part of the investment. Therefore one has to be careful about where SSDs are used; otherwise it is very difficult to justify their additional cost.

6.4 Benchmark problems


There are no standard benchmarking tools that are specifically built to test solid state drives. Benchmarking SSDs using tools developed for HDDs causes several unique problems that can only be solved by developing benchmarking software that catches up with the technology. As mentioned, SSDs use different strategies and data geometry than conventional HDDs. This causes some functional differences and, more importantly, makes some benchmarks inadequate, particularly those that were optimized for the standard platter configuration of HDDs. Due to these addressing issues, some benchmarks may show radically different results on the transfer graphs or average the performance values incorrectly. The same algorithms applied to a functionally different device will not yield equally realistic performance values: many of the test points will fall within one block, but others will span from the end of one block to the beginning of another. That causes delays in the completion of the reads or writes and, since the test samples are relatively small in size, results in low calculated performance values. Because the stride size is constant in most benchmarks and the page size is constant too, the performance graph shows a saw-tooth pattern, simply as a consequence of the periodicity of the two address patterns. Some benchmarks appear to use test patterns that do not work well with SSDs and thus generate artefacts. Benchmarks therefore cannot be viewed as one hundred percent representative of SSD performance.
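The saw-tooth effect can be illustrated with a few lines of Python. The sketch below is purely illustrative: the 512 KB erase block, the sample size and the constant stride are assumed values. It marks which fixed-size test samples straddle an erase-block boundary; because both periods are constant, the crossings recur in a regular pattern, which is exactly what produces the periodic dips in such benchmark graphs.

```python
ERASE_BLOCK = 512 * 1024     # assumed erase-block size in bytes
SAMPLE_SIZE = 64 * 1024      # assumed benchmark sample size
STRIDE = 200 * 1024          # assumed constant stride between test points

def crosses_block_boundary(offset: int, size: int, block: int = ERASE_BLOCK) -> bool:
    """True if a sample starting at `offset` spills into the next erase block."""
    return (offset % block) + size > block

if __name__ == "__main__":
    for i in range(12):
        offset = i * STRIDE
        mark = "crosses boundary" if crosses_block_boundary(offset, SAMPLE_SIZE) \
               else "within one block"
        print(f"sample {i:2d} @ {offset:>8}: {mark}")
```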


Chapter 7

7 . Reverse engineering
As seen in section 5.3.3, the Crucial Real C300 performed better than the Intel X25-E. This chapter looks more deeply at the system level structure of solid state drives and tries to analyze the factors behind the difference in performance. For this, a reverse engineering process is carried out on the Intel X25-E and the Crucial Real C300. Reverse engineering is the process of discovering the technological principles of a human-made device, object or system through analysis of its structure, function and operation. It often involves taking something (a mechanical device, electronic component or software program) apart and analyzing its workings in detail, either for maintenance or to try to make a new device or program that does the same thing without using or simply duplicating (without understanding) any part of the original. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage; the purpose is to deduce design decisions from end products with little or no additional knowledge about the procedures involved in the original production. The basic building blocks of a solid state drive are the flash chip array, the host interface and the controller chip, which holds the other two together and manages the entire system. The Intel X25-E and Crucial Real C300 are analysed in terms of these building blocks.

7.1 Intel X25-Extreme


The Intel X25-E [16] board, shown below in Figure 33, mainly carries three kinds of chips: a set of flash chips, a controller and a DRAM chip. The Intel X25 Extreme uses 50 nm single level cell (SLC) NAND to build its flash array. The X25-E uses a 10-channel storage controller backed by 16 MB of cache; interestingly, the cache is provided by a Samsung K4S281632I-UC60 SDRAM memory chip. The storage controller, an Intel-branded part, is particularly crafty, supporting not only SMART monitoring but also Native Command Queuing (NCQ). NCQ was originally designed to compensate for the rotational latency inherent to mechanical hard drives, but here it exploits the ability of the SATA drive to queue and re-order commands to maximize execution efficiency, since a little time passes (time is of course relative when talking about an SSD whose access latency is measured in microseconds) between when the system completes a request and when the next one is issued.

Figure 33: Intel X25 Extreme SSD

The Intel X25-E is compatible with SATA 1.5 Gbps and 3 Gbps. The flash packages are of course only the building blocks of an SSD; much of the magic comes from the architecture and optimizations of the SSD controller logic.

7.1.1 Controller Analysis


The Intel X25-E controller was scanned and opened to gather more information about its internal structure. The decapsulation pictures of the Intel X25-E controller are shown below.


Figure 34: Controller from Marvell on the Intel X25-E SSD board

The controller was a ball grid array (BGA) package with single-row wire bonding; the bonding wires were made of gold. Although the controller chip seems to be from Intel at first glance, the markings on the die indicate that it is from Marvell. The analog and digital sections of the controller die were clearly distinguishable, and the orientation of the controller die on the Intel X25-E clearly reveals the SATA controller, DRAM controller and flash controller sections. The specifications of the controller are listed in Table 7-1, Table 7-2 and Table 7-3.

7.2 Crucial Real C300


The Crucial Real C300 [17] features 16 MLC flash memory chips split evenly between the two sides of the circuit board. The 128 GB capacity SSD uses 8 GB flash chips that have two NAND dies apiece; the Crucial Real SSD uses flash memory chips from Micron. The flash chips in modern solid state drives usually conform to the Open NAND Flash Interface (ONFI) 1.0 standard, as the Intel X25-E does, but the Crucial Real SSD flash chips follow the much more recent ONFI 2.1 specification. ONFI 2.1 pushes NAND performance into a new range of 166 MB/s to 200 MB/s, and is the first NAND specification to specifically address the performance needs of solid state drives by offering faster data transfer rates in combination with other new technologies such as SATA 6 Gbps, USB 3.0 and PCI Express Gen2.

Figure 35: Crucial Real C300 SSD

To wring more than 300 MB/s from mechanical hard drives, several of them have to be combined in RAID. Solid state drive makers are faced with the same challenge: individual flash chips do not necessarily offer superior sequential throughput to traditional hard drives, which means that an SSD seeking to maximize performance must distribute the load across numerous chips tied to multiple memory channels, effectively creating a multi-channel array within the confines of a single drive. The Crucial Real SSD inherits its 6 Gbps Serial ATA support from Marvell's 88SS9174 flash controller, which supports the TRIM command set. TRIM works in conjunction with Marvell's garbage collection routine, which runs in the background to reclaim flash pages marked as available by the command; the frequency with which garbage collection is performed depends on how the drive is being used and how much free capacity it has available. With eight memory channels, the Marvell controller is two short of the ten channels Intel squeezed into its X25-E SSD. Crucial claims the C300 SSD can sustain a sequential read rate of 355 MB/s when connected to a 6 Gbps SATA interface; the drive's sequential read performance purportedly drops to 265 MB/s when using a 3 Gbps link. Flipping the C300's circuit board over reveals a DDR memory chip that serves as the drive's cache. The 128 MB Micron DDR3 DRAM module offers decent cache performance for fast transaction buffering, which becomes more important as SATA III 6.0 Gbps transfers are used.

7.2.1 Controller Analysis


Since the Marvell controller provides much better performance, it is interesting to have a closer look. The scanned and opened-up pictures provide more information about its internal structure.

Figure 36: Controller from Marvell on the Crucial Real C300 SSD board

Unlike the Intel X25-E controller, the Crucial Real C300 controller die did not give a clear picture. However, from the orientation of the controller die on the board, the SATA and cache interconnections could be identified. A closer look showed that the controller chip is a ball grid array (BGA) package with wire bonding. In contrast to the single-row bonding on the Intel X25-E controller die, the Crucial Real C300's Marvell controller uses three rows for bonding, except for the SATA signals where it uses a single row. The pads for wire bonding are neatly arranged in multiple rows to shrink the die size.

The surface of the die looked like an FPGA, but a more detailed analysis suggested it was a mesh used by Marvell to prevent its competitors from copying the design. A much deeper analysis would have been of interest, but due to certain limitations the analysis was concluded at this stage.

7.3 Summary
The reverse engineering analysis of the controllers from the Intel X25-E and the Crucial C300 is summarized in Table 7-1. With the same package technology and an increased number of balls, the die of the Crucial Real C300 SSD controller is comparatively small. Although both SSDs use a controller from Marvell, the Crucial C300 uses the more recent release, which includes improved firmware features.

7.3.1 Controller Specification


SSD                            Intel X25-E SSD (32 GB)   Crucial Real C300 SSD (128 GB)
Controller Manufacturer-Year   Marvell - 2007            Marvell - 2009
Chip Size (cm)                 1.9 x 1.9                 1.7 x 1.7
Part Number                    PC29AS21AA0               88SS9174-BJP2
Package Technology             BGA                       BGA
Balls                          409                       521
Die Size (mm)                  5.9 x 5.9                 4.4 x 4.4
Bonding                        Wire                      Wire
# Bonding Rows                 1 row                     3 rows (SATA signals 1 row)

Table 7-1 Controller chip details of Intel X25-E and Crucial Real C300 SSD

As seen in section 4.4.1, better performance can be obtained by increasing the number of channels, giving the SSD the ability to process a number of operations in parallel internally. The Intel X25-E uses 10 channels with 20 flash chips, but the Crucial C300 uses only 8 channels with 16 flash chips and still gives better performance. This is because the flash chips in the Crucial C300 use the more advanced interface standard (ONFI 2.1), in contrast to the Intel X25-E's ONFI 1.0 standard. ONFI 2.1 offers a simplified synchronous flash controller design and pushes performance levels into the higher range of 166 MB/s to 200 MB/s. This is summarized in Table 7-2.

7.3.2 Flash Interface


SSD                            Intel X25-E SSD (32 GB)   Crucial Real C300 SSD (128 GB)
Flash Chip                     SLC                       MLC
Number of Channels             10                        8
ONFI standard*                 1.0                       2.1
Speed/flash chip (MB/s) [26]   50                        166-200
Cache                          SDRAM                     DDR3 SDRAM

Table 7-2 Flash interface details of Intel X25-E and Crucial Real C300 SSD
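These interface numbers translate directly into an aggregate internal flash bandwidth. The sketch below simply multiplies the per-chip interface speed by the channel count; the per-chip figures are the interface ratings quoted in Table 7-2, and the calculation assumes, purely for illustration, that one chip per channel is active at a time.

```python
def aggregate_flash_bandwidth(channels: int, mb_per_s_per_chip: float) -> float:
    """Peak internal flash bandwidth in MB/s, assuming one active chip per channel."""
    return channels * mb_per_s_per_chip

if __name__ == "__main__":
    # Values taken from Table 7-2 (interface ratings, not measured throughput).
    print(f"Intel X25-E : {aggregate_flash_bandwidth(10, 50):.0f} MB/s")
    print(f"Crucial C300: {aggregate_flash_bandwidth(8, 166):.0f} - "
          f"{aggregate_flash_bandwidth(8, 200):.0f} MB/s")
```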

The advanced flash interface technology in the Crucial Real C300 is matched by a SATA III (6 Gbps) host interface, while the Intel drive supports only the SATA II interface.

7.3.3 Host Interface


SSD                              SATA Interface Compatibility
Intel X25-E SSD (32 GB)          1.5 Gbps, 3 Gbps
Crucial Real C300 SSD (128 GB)   1.5 Gbps, 3 Gbps, 6 Gbps

Table 7-3 Interface compatibility of Intel X25-E and Crucial Real C300 SSD

7.4 Conclusion
The latest SSD controller from Marvell and the advanced flash chip interface standard used in the Crucial Real C300 give it higher bandwidth, which, backed by a hefty DDR3 buffer, enables it to meet the demand for faster data rates with SATA 6 Gbps. These features allow the Crucial Real C300 SSD to outperform the Intel X25-E SSD. The controller is the central part that ties the interfaces of the surrounding units together to deliver high overall performance despite the bottlenecks of the individual parts. Overall NAND performance is an important factor at a time when faster speeds are a critical design goal for solid state drives, especially as the interfaces those SSDs connect to offer faster data rates with Serial ATA 6 Gbps, USB 3.0 and PCI Express Gen2.

*Note: In our tests, a common SATA 3 Gbps interface was used for testing both SSDs.


Chapter 8

8 . Designing optimal performance based SSD system level architecture and its controller cost estimation
System designers perform a series of trade-offs when selecting a particular controller for their target product and target market(s). The trade-offs include:

- Programmatic: cost, schedule, support, warranty, and availability.
- Technical: performance, power, package options, features, scalability, and flexibility.
- Other: commonality, compatibility, documentation, development support, testing, and reputation.

In the process of controller selection, the system designer is also performing the same analysis for the flash parts and the other components needed in the design. It is an iterative process to find the right combination of components that best meets the requirements of the particular product. Due to proprietary concerns, not all controller design data is available to the general public over the Internet. There is, however, a significant amount of application detail that can be learned for each SSD controller on the market by studying its use in existing SSDs. In order to meet the known performance and bandwidth specifications of current interface technologies through the SSD, and to put them into an economic perspective, this section performs a package-level cost-effectiveness analysis of controllers by varying the system-level architecture of the SSD, taking the SSD controller price per performance as the metric of choice.
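As a minimal illustration of the price-per-performance metric used here, the MATLAB sketch below compares two hypothetical controllers; the prices and bandwidth figures are placeholders, not vendor data.

% Price-per-performance comparison of candidate controllers (illustrative values)
names  = {'Controller A', 'Controller B'};
price  = [12.50  9.00];     % unit price in $, hypothetical
perf   = [600    300];      % sustainable bandwidth in MB/s, hypothetical
metric = price ./ perf;     % $ per MB/s - lower is better
for k = 1:numel(names)
    fprintf('%s: %.4f $ per MB/s\n', names{k}, metric(k));
end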

8.1 Cost estimation of controller for a system designed to meet performance specification
8.1.1 MATLAB GUIDE
GUIDE, the MATLAB graphical user interface development environment, provides a set of tools for creating graphical user interfaces (GUIs). These tools simplify the process of laying out and programming GUIs.

Using the GUIDE Layout Editor, the user can populate a GUI by clicking and dragging GUI components (such as axes, panels, buttons, text fields, sliders, and so on) into the layout area. The user can also create menus and context menus for the GUI. From the Layout Editor, the user can size the GUI, modify component look and feel, align components, set the tab order, view a hierarchical list of the component objects, and set GUI options. A tool was created using MATLAB GUIDE in which the GUI presents options for the user to design his own system. The tool determines the controller size and its cost for the designed system. It is a system-level optimization tool, designed to optimize the different interfaces in a solid state storage system so as to obtain the best performance and cost for a desired system interface. It determines the controller that meets the performance of the system interface by varying the quantity of the other selected integral parts of the SSD.
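A minimal programmatic sketch of this kind of GUI is shown below. It uses plain figure/uicontrol calls instead of the GUIDE editor, and the component names and data rates are illustrative assumptions; it only demonstrates how selecting a host interface can drive a recalculation callback, not the full tool.

% Minimal GUI sketch: pick a host interface, report its target payload rate
function ssd_tool_sketch
    rates = containers.Map({'SATA 2.0','SATA 3.0'}, {300, 600});   % MB/s
    f  = figure('Name','SSD design sketch','MenuBar','none');
    uicontrol(f,'Style','text','Position',[20 120 160 20], ...
              'String','Host interface:');
    pm  = uicontrol(f,'Style','popupmenu','Position',[20 95 160 25], ...
              'String',rates.keys,'Callback',@update);
    txt = uicontrol(f,'Style','text','Position',[20 50 260 25],'String','');
    update();                                  % initialise the label

    function update(~,~)
        sel  = get(pm,'String');
        mbps = rates(sel{get(pm,'Value')});
        set(txt,'String',sprintf('Target performance: %d MB/s', mbps));
    end
end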

8.2 Implementation factors in optimization tool


While designing the optimization tool, several factors were taken into consideration; they are listed below.

Performance Factors

Host Interface
Host Interface | SATA 2.0 | SATA 3.0 | PCI-e 2.0* | PCI-e 3.0*
Coding technique | 8b/10b | 8b/10b | 8b/10b | 128b/130b
Data rate/second | 300 MB | 600 MB | 500 MB | 1 GB
Transfers/second | 3 GT/s | 6 GT/s | 5 GT/s | 8 GT/s
Clock | 3 GHz | 6 GHz | 5 GHz | 8 GHz

Table 8-1 System Interface types and their performances


* Per Lane
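The data-rate row follows directly from the transfer rate and the coding overhead (payload bytes per second = transfers per second x coding efficiency / 8). A short MATLAB sketch of that arithmetic:

% Payload data rate from line transfer rate and coding efficiency
name = {'SATA 2.0','SATA 3.0','PCI-e 2.0 (per lane)','PCI-e 3.0 (per lane)'};
gtps = [3 6 5 8];                      % giga-transfers per second
eff  = [8/10 8/10 8/10 128/130];       % coding efficiency
mbps = gtps .* 1e9 .* eff / 8 / 1e6;   % payload MB/s
for k = 1:numel(name)
    fprintf('%-22s %6.0f MB/s\n', name{k}, mbps(k));
end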

Buffer Cache Interface


Cache Type | DDR1 | DDR2 | DDR3
Transfers/second | 200-400 MT/s | 400-1066 MT/s | 800-2133 MT/s

Table 8-2 Buffer Cache types and their performances



In setting the buffer cache interface for the designed system, the assigned cache bandwidth is four times the maximum performance of the selected system (host) interface.

Flash Interface

The flash interface is the main part of the solid state drive through which the performance of the designed system is controlled. The performance is adjusted by varying the number of flash channels, the number of flash chips per channel, and the channel width. The tool offers two cell types, Single Level Cell and Multi Level Cell, to choose from while designing the system.

Note: The flash read performance per chip per channel is treated as a variable, as it depends on the manufacturer.

Controller Size Factors

When calculating the controller size resulting from the designed system, the different interfaces that have to be handled and their resulting signal pins are considered. The table below lists the signal pins considered for the respective interfaces on the controller.
Interface | Signals
Flash Chips | Control signals: Chip select, Write enable, Command latch enable, Ready/busy, Reset/write protect
DRAM** | Control signals: CK, CK#, CKE, RST#, RAS#, CAS#, WE#, CS#, ODT; Memory address: MA; Bank address: BA; Data signals: DQ, DQS, DQS#, DM
SATA | Transmitter [+,-], Receiver [+,-], CLK
PCI-Express* | Transmitter: PET_P[x:0], PET_N[x:0]; Receiver: PER_P[x:0], PER_N[x:0]; REF_CLK_P, REF_CLK_N
UART | CLK

Table 8-3 SSD Controller Interface signals


* X: 2^(Number of lanes)
** More signals, with varying quantity

The ratio of power and ground pins to signal pins is treated as a variable.

Controller Cost Factors

The tool calculates the controller cost for two package types, Flip-Chip BGA (FCBGA) and wire-bonded BGA. The packaging cost for FCBGA and wire-bonded BGA is calculated as a function of variables such as die cost, number of I/Os, wafer-level die yield, and assembly process yield. The cost of the designed controller is calculated in two parts: the cost of the die and the cost of the package. The size of the die depends on the number of I/O pads, the pad pitch, and their arrangement.
Package type | Pad pitch (microns) | Bond pad configuration
FCBGA | 150 | Area array pads
Wirebonded BGA | 80 | Peripheral pads

The gross dies per wafer (DPW) can be estimated from the wafer diameter d (mm) and the die size S (mm^2) by the standard expression

DPW = pi * (d/2)^2 / S - pi * d / sqrt(2 * S)

and the cost of the die is then given by

Cost per Die ($) = Wafer cost / (DPW x Die yield).

The cost of the respective package depends on the cost of each process step indicated in Figure 37; a short numeric sketch of the die-cost estimate follows the figure.

Figure 37 : Process flow for flip chip BGA and wire bonded BGA packaging.
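A small numeric sketch of the die-cost estimate described above is given below; the wafer diameter, wafer cost, and yield figures are placeholder assumptions chosen only to show the arithmetic (the die size is taken from Table 7-1), not values used by the tool.

% Gross dies per wafer and die cost (illustrative numbers)
d         = 300;                               % wafer diameter in mm (assumed)
S         = 4.4 * 4.4;                         % die size in mm^2 (C300-class die)
waferCost = 3000;                              % $ per processed wafer (assumed)
dieYield  = 0.85;                              % wafer-level die yield (assumed)

DPW     = pi*(d/2)^2 / S - pi*d / sqrt(2*S);   % gross dies per wafer
costDie = waferCost / (DPW * dieYield);        % $ per good die
fprintf('Gross dies per wafer: %.0f\n', DPW);
fprintf('Cost per die        : %.2f $\n', costDie);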

8.2.1 The properties of the tool are as defined below


1. Verifies the designed system for optimality by warning about over-design and under-design based on the maximum performance of the selected system interface.
2. Estimates the number of pins on the controller chip based on the designed system.
3. Based on the indicated number of pins, calculates the required die size and its cost for the selected technology and resources.
4. Calculates the cost of the estimated controller chip for the designed system.
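Properties 1 and 2 can be sketched in a few lines of MATLAB. The per-interface signal-pin counts below are rough assumptions for illustration (the real tool derives them from Table 8-3), and the +/-10 percent optimality band follows the behaviour described in Appendix B.

% Optimality check and ball-count estimate for a designed SSD system
hostMBps     = 300;        % SATA 2.0 target (Table 8-1)
nChannels    = 4;          % flash channels
chanMBps     = 80;         % effective read bandwidth per channel (assumed)
designedMBps = nChannels * chanMBps;

if designedMBps < 0.9 * hostMBps
    warning('System is under-designed for the selected host interface.');
elseif designedMBps > 1.1 * hostMBps
    warning('System is over-designed for the selected host interface.');
end

% Ball-count estimate: signal pins per interface (rough assumptions) plus
% power/ground pins from the Vcc-Vdd to I/O ratio.
flashPins  = nChannels * (16 + 5);   % 16-bit data bus + control signals per channel
dramPins   = 60;                     % DDR cache interface (assumed)
hostPins   = 7;                      % SATA Tx/Rx pairs, clock and misc (assumed)
ioPins     = flashPins + dramPins + hostPins;
pgRatio    = 1;                      % 1 Vcc and 1 Vdd per 2 I/O pins
totalBalls = ceil(ioPins * (1 + pgRatio));
fprintf('Designed performance: %d MB/s, estimated balls: %d\n', designedMBps, totalBalls);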

Figure 38: Design tool outlook

Figure 39: Warning - system is over-designed or under-designed with respect to the performance specified.

Figure 40: Cost calculation tool

8.2.2 Advantages of the tool


- The user can design the SSD system architecture for the highest degree of performance by varying the flash type, the buffer type, their sizes, the number of channels, and the channel bandwidth.
- The user is provided with an input option to set the performance of the flash modules (one of the main contributors to system performance), based on which the tool calculates the overall SSD system performance.
- Based on the flash module specifications, the tool calculates the system capacity.
- The tool warns the user when the system is over- or under-designed.
- The tool automatically suggests the number of buffer channels required, based on the type, size, and channel bandwidth of the selected buffer.
- The tool provides options to select the package type for the controller before calculating the cost.

8.2.3 Limitations
- The tool has a limited number of package options when calculating the cost.
- The performance values indicated in the tool during design are all theoretical.


8.3 Optimization tool consistency test for controller size


Systems are designed using the optimization tool with the SATA 2.0 and SATA 3.0 interfaces. The controller sizes of the designed systems are compared with those of the Intel X25-E and the Crucial Real C300 below.
SSD Controller (system interface) | Tool Designed (Number of pins/balls) | Company Designed (Number of pins/balls)
Intel X25-E (SATA 2.0) | 448 | 409
Crucial C300 (SATA 3.0) | 554 | 521

Figure 41 : Controller size for the system with SATA 2.0 interface

Figure 42 : Controller size for the system with SATA 3.0 interface

The values from the tool indicate that the controller sizes are comparable. The difference in values could be due to various factors, such as the assumed signal-to-power pin ratio. These values are intended only for comparison. The use of the tool for optimal system design and controller cost estimation is illustrated in Appendix B.

8.4 Hints for using the tool for optimal system design and controller cost estimation
1. Select the desired host interface to set the performance specification for the system to be designed.
2. Fill in the inputs, such as the flash chip performance and the power/ground to I/O pin ratio, and select the desired cache type.
3. While designing the system, watch for the warning message indicating over-design or under-design of the system. This helps the user design an optimal system architecture for the performance specified in the first step.
4. Press the Calculate button to obtain the total number of pins (balls) on the controller chip for the designed system.
5. To obtain the cost of the controller for the designed system, press Die Cost ($) at the bottom left of the screen. Select the desired node technology, then the desired package type, and finally press the Chip cost button to view the cost.


Chapter 9

9. Summary
9.1 Conclusion
In conclusion, the common-sense intuition that flash-based Solid State Drives (SSDs) provide superior performance for large read I/O is validated. As studied, SSDs are several times faster for reads on average, and dramatically faster in access time, compared to hard disk drives. Solid state drives are also comparatively more efficient in power consumption. Heavily used transactional databases with an intense random I/O workload benefit the most from SSD technology, as it additionally helps to sidestep disk-configuration issues. With HDD devices, it is critical how the database structure is laid out, how many spindles are used, and so on; for SSD-based systems, however, it matters little how the data is laid out or whether column- or row-oriented storage is used, since the entire data space ultimately delivers the same performance.

Although this may be application specific, the test results suggest that, in terms of performance gain, investing in solid state drives is better than investing in additional Random Access Memory.

The system-level optimization tool simulates the design scenario, which helps in studying the solid state storage system and in exploring the various trade-offs to enhance the performance of the system and to save cost when it is implemented in practice. The system-level optimization tool is very effective in designing a solid state storage system architecture for the best performance with a desired system interface. A detailed analysis of the factors considered in the tool helps guide the design decision and clarifies the effect of each variable on the controller cost.


Despite the tremendous hype, which makes the potential of SSDs and the rate at which manufacturers are improving this technology truly exciting and increasingly attractive, seeing them dominate the market is still some way from reality at this point.

9.2 Future work on SSD


- DRAM-based SSDs will continue as a niche product, as cost and capacity will continue to limit their use relative to system memory. If this continues, improvements will have to be made in data management to make better use of these SSDs, bringing a tiered storage architecture into an area that was traditionally just a flat file system. Developing middleware software to take advantage of SSDs will, however, require a much longer time frame than simply improving an existing product. It will also require a certain amount of discipline to manage a more graduated architecture with a better level of overall management.
- The controller is a vital part of a solid state drive; companies have to focus on bringing out better architectures. SSD controller technology has to be targeted in parallel with the flash interface. Major SSD designs have already moved from 4-channel to 10-channel controllers, and controllers with even more channels have to be implemented. This will allow SSDs to perform much faster.
- Improvements in MLC technology in terms of reliability, capacity, and cost will increase the appeal of SSDs.
- Alternative technologies are also waiting in the wings, such as phase change memory (PRAM) and resistive memory (RRAM), which may offer more appealing alternatives to SSDs in terms of cost and performance beyond what SSDs can achieve.


Appendix A
Building TPC-H benchmark
The TPC Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The TPC-H benchmark is an embedded SQL database application, which connects to the database and executes embedded SQL statements. Embedded SQL statements are embedded within a host-language application. To build the TPC-H benchmark, TPC provides a set of tools, namely a database population generator (DBGEN) and a query template translator (QGEN). With IBM DB2 as the platform, DBGEN provides the data for the database and QGEN provides the SQL queries. Using these, the TPC-H benchmark application can be created. This is done in two parts: creating the TPC-H database, and creating a package binding the query source file to the database.

TPC-H database generation


DBGEN generates 8 separate ASCII files; each file contains pipe-delimited data. Create 8 tables under a database schema named TPC-H and import each of the ASCII files into the tables defined in the TPC-H database schema.

Assign keys by altering tables in TPC-H database as per TPC-H specification [13].


TPC-H application package

The source file is created by embedding the TPC-H queries/SQL statements in the 'C' programming language. To run applications written in compiled host languages, the packages needed by the database manager at execution time must be created. Figure 43 shows the order of these steps, along with the various modules of a typical compiled DB2 application [4].

1. Create source files that contain programs with the TPC-H queries.
2. Connect to the TPC-H database generated using DBGEN, then precompile each source file to convert the embedded SQL statements into a form the database manager can use.

Figure 43: procedure to create application package


[The precompiler converts embedded SQL statements directly into DB2 run-time services API calls. When the precompiler processes a source file, it specifically looks for SQL statements and ignores the non-SQL host language. PRECOMPILE (PREP) is an application process that modifies source files containing embedded SQL statements (*.sqc) and yields host-language source file(s) (*.c) and a package. It is at this precompile time that the TIMESTAMP, also known as the UNIQUE ID or CONSISTENCY TOKEN, is generated and associated with the package through the bind file and the modified source code.]
3. Compile the modified source files (*.c) using the host language compiler.
4. Link the object files with the DB2 and host language libraries to produce an executable program. Compiling and linking (steps 3 and 4) create the required object modules.
5. The BIND command invokes the bind utility. It prepares the SQL statements stored in the bind file generated by the precompiler and creates a package that is stored in the database. Bind the bind file to create the package, or bind again if a different database is to be accessed. Binding creates the package to be used by the database manager when the program is run.
6. Run the TPC-H benchmark application. The application accesses the TPC-H database using the access plans.


Appendix B
System level optimization tool
The system-level optimization tool, designed to optimize the different interfaces in a solid state storage system to obtain the best performance and cost for a desired system interface, is illustrated here. The different interfaces to the controller that influence the performance of a solid state drive are:

- Host interface
- Flash interface: number of channels, channel width, flash chip read performance
- Buffer cache interface: cache type, cache standard, I/O channel width, number of channels, cache size

The tool is operated in two sections:

Section 1: Design a solid state drive for optimal performance


The tool provides options to select the different interfaces mentioned above while designing the solid state storage system. Based on the selected system interface (host interface) and its maximum performance, the tool focuses on performance optimality: it warns if the system is over- or under-designed while the different performance-influencing interfaces are being selected. This is illustrated below in steps.

Step 1: Select the desired host interface, which is the critical factor based on which the system is designed. The system is designed to match the maximum performance offered by the selected host interface. Here SATA 2.0 is selected for illustration.

Step 2: Enter all the parameters needed to design the system and calculate its performance, such as the flash chip read performance and the signal to power pin ratio.

Step 3: Select the type of cache and its standard; the number of cache channels is varied accordingly. The buffer cache channels are selected such that the buffer cache bandwidth is 4 times that of the selected system interface. Here DDR2 is selected and the flash chip read performance is entered as 25 ns (nanoseconds).
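The 4x rule used in this step can be checked with a few lines of MATLAB. The 8-bit cache channel width and the DDR2-800 operating point below are assumptions of this sketch, not values fixed by the tool.

% Number of buffer cache channels needed to stay 4x ahead of the host interface
hostMBps     = 300;                 % SATA 2.0 payload rate
requiredMBps = 4 * hostMBps;        % cache must be 4x faster than the host
ddr2MTps     = 800;                 % DDR2 transfer rate (MT/s), assumed operating point
chanBytes    = 1;                   % 8-bit cache channel width (assumed)
perChanMBps  = ddr2MTps * chanBytes;
nCacheChan   = ceil(requiredMBps / perChanMBps);
fprintf('Required cache bandwidth: %d MB/s -> %d DDR2 channel(s)\n', requiredMBps, nCacheChan);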

Step 4: The Signal-pins button calculates the designed system performance along with the number of balls on the designed controller. In this case the tool shows a system warning because the system is under-designed for the desired system interface. This can be seen by comparing the desired and the designed system performance, which are 300 MB/s and 80 MB/s respectively. The Vcc-Vdd to I/O pin ratio is set to 1 (every 2 I/O pins require 1 Vdd and 1 Vcc).

Step 5: To increase the performance, the number of parallel flash channels is increased; in this case from 2 to 4 channels.


Step 6: The system warning indicates that the system is still under-designed. Either the number of flash channels must be increased further, or the channel width can be increased to attain better performance. In this case the channel width is increased from 8 bits to 16 bits.

Step 7: When the designed system performance matches the desired system performance to within +/-10 percent, the warning stops, indicating that the system design is optimal with respect to the selected host interface. The designed system is optimal in performance, and the resulting controller has 340 pins (balls).
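The progression of Steps 4 to 7 can be reproduced with a short MATLAB sketch using the figures quoted above; the per-channel rate of 40 MB/s for an 8-bit channel is inferred from the 80 MB/s starting point and is an assumption of this sketch.

% Steps 4-7: designed performance as channels and channel width are increased
target   = 300;                            % SATA 2.0 payload rate, MB/s
perChan8 = 40;                             % MB/s per 8-bit channel (inferred)
steps    = [2 8; 4 8; 4 16];               % [channels, channel width in bits]
for k = 1:size(steps,1)
    perf  = steps(k,1) * perChan8 * steps(k,2)/8;
    state = 'optimal';
    if perf < 0.9*target, state = 'under-designed'; end
    if perf > 1.1*target, state = 'over-designed';  end
    fprintf('%d channels, %2d-bit: %3d MB/s -> %s\n', steps(k,1), steps(k,2), perf, state);
end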


Step 8: If the designed system exceeds the maximum performance of the host interface, the tool warns that the system is over-designed. The system should then be altered by comparing the host interface performance and the designed system performance.

Step 9: The capacity of the designed system can be increased via the Flash Chips/Channel menu and also by selecting the number of Dies/Flash chip.

Section 2: Cost calculation of the controller for the designed Solid State Drive
Continuing the previous example, the designed system has 340 pins (balls), as seen in Section 1, Step 7. This section of the tool calculates the cost of the die based on the resulting number of pins, together with selected parameters such as the node technology, the wafer diameter, and the chosen package. By considering the similar packages available from Texas Instruments, an estimate of the cost of the controller for the designed system can be made.


The cost of the controller chip for the designed system is approximately $9.89.


Bibliography
[1] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti. Introduction to flash memory. Proceedings of the IEEE, 91(4):489-502, April 2003.
[2] Intel X25-E Extreme SATA Solid-State Drive. http://download.intel.com/design/flash/nand/extreme/319984.pdf
[3] http://onfi.org/specifications/
[4] IBM DB2 Guide - IBM public library. http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.embed.doc/doc/c0021136.html
[5] Cagdas Dirik and Bruce Jacob. The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization. In ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture, pages 279-289, New York, NY, USA, 2009. ACM.
[6] Nitin Agrawal, Vijayan Prabhakaran. Design tradeoffs for SSD performance. www.usenix.org/event/usenix08/tech/full_papers/agrawal/agrawal.pdf
[7] David Roberts, Taeho Kgil, and Trevor Mudge. Integrating NAND flash devices onto servers. Communications of the ACM, 52(4):98-103, 2009.
[9] Super Talent Technology. SLC vs. MLC: An Analysis of Flash Memory. http://www.supertalent.com/datasheets/SLC_vs_MLCwhitepaper.pdf
[10] Seiichi Sugaya. Trends in Enterprise Hard Disk Drives (June 30, 2005). http://www.fujitsu.com/downloads/MAG/vol42-1/paper08.pdf
[11] Intel Corporation. Understanding the Flash Translation Layer (FTL) Specification. http://www.embeddedfreebsd.org/Documents/Intel-FTL.pdf, 1998.
[12] Imation. Solid State Drives - Data Reliability and Lifetime. http://www.imation.com/PageFiles/83/SSD-Reliability-Lifetime-White-Paper.pdf
[13] TPC Benchmark H. www.tpc.org/tpch/spec/tpch2.1.0.pdf

[14] HD Tune Pro manual, hdtunepro.pdf
[15] Tom's Hardware. Flash SSD Update: More Results, Answers. http://www.tomshardware.com/reviews/ssd-hard-drive,1968-4.html
[16] Intel X25-E Extreme SATA Solid-State Drives. http://download.intel.com/design/flash/nand/extreme/319984.pdf
[17] RealSSD C300 2.5 Technical Specifications, Crucial. www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf
[18] David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface (pages 450-475).
[19] en.wikipedia.org/wiki/computer_data_storage
[20] Intel: Disk Interface Technology, Quick Reference Guide, NP2108.pdf 1040211
[21] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. http://www.usenix.org/event/fast10/tech/full_papers/boboila.pdf
[22] Intel High Performance Solid State Drive - Solid State Drive Frequently Asked Questions. http://www.intel.com/support/ssdc/hpssd/sb/CS-029623.htm#5
[23] http://en.wikipedia.org/wiki/Benchmark_(computing)
[24] Barracuda 7200.10. www.seagate.com/docs/pdf/datasheet/ds_7200_10.pdf
[25] en.wikipedia.org
[26] http://onfi.org/wp-content/uploads/2011/03/20100818_S104_Grunzke.pdf
[27] http://www.novopc.com/2008/09/hard-disk/
[28] http://www.easy-computer-tech.com
[29] http://www.ramsan.com/resources/SSDOverview
[30] http://www.datarecoverytools.co.uk
[31] www.bit-tech.net
[32] http://tjliu.myweb.hinet.net


Index
A
Addressing, 22
Advanced Technology Attachment, 25

B
ball grid array package (BGA), 65
Benchmark, 49

C
cache, 12, 14, 23, 24, 25, 29, 37, 41, 54, 58, 63, 67
Cell degradation, 38
Controller, 40
Crucial Real C300, 65

D
Disk access time, 21

E
Erase Block, 38

F
Flash memory, 34
Flash Structure, 37
Flash Translation Layer, 40

G
Garbage collection, 42

H
Hard Disk Drives, 19
HD Tune, 56

I
Intel X25-Extreme, 63

M
Marvell, 65, 66, 67, 68, 69
MATLAB GUIDE, 71
Memory, 11
MLC, 35

O
Offline Storage, 16

P
Page, 37
PCI Express, 44
Primary storage, 14
Processor cache, 14

R
RAM, 12
Reverse engineering, 63
ROM, 11
Rotational latency, 21

S
Secondary Storage, 16
Seek time, 21
Serial Advanced Technology Attachment, 27
SLC, 35
Small Computer System Interface, 26
Solid State Drives, 32
Storage Hierarchy, 13
System Architecture, 9

T
Tertiary Storage, 16
TPC-H, 50, 51, 52, 60, 81, 82, 83
Trim, 43

W
Wear-leveling, 42
Write Amplification, 43

