1. INTRODUCTION TO MEMORY STORAGE DRIVES ................................ 7
1.1 AS IT EXISTS TODAY ................................ 7
1.2 SOLID STATE DRIVES: A BRIEF OVERVIEW ................................ 7
1.3 THESIS OBJECTIVE ................................ 8
1.4 SUMMARY OF CHAPTERS ................................ 8
2. SYSTEM MEMORY OVERVIEW ................................ 9
2.1 SYSTEM ARCHITECTURE ................................ 9
2.2 MEMORY ................................ 11
2.3 STORAGE HIERARCHY ................................ 13
2.4 MEMORY CONTROLLER ................................ 17
2.5 SUMMARY ................................ 18
3. MAGNETIC DISK STORAGE ................................ 19
3.1 HARD DISK DRIVES ................................ 19
3.2 HARD DISK DRIVE SYSTEM ARCHITECTURE ................................ 23
3.3 HARD DISK DRIVE INTERFACES ................................ 25
3.4 EXTERNAL HARD DISK DRIVES ................................ 28
3.5 FUTURE OF HARD DISK DRIVES ................................ 29
4. SOLID STATE DRIVES ................................ 32
4.1 FLASH MARKET DEVELOPMENT ................................ 32
4.2 SOLID STATE DRIVES ................................ 33
4.3 PHYSICAL LAYOUT ................................ 38
4.4 FLASH TRANSLATION LAYER (FTL) ................................ 40
4.5 SOLID STATE DRIVE INTERFACES ................................ 44
4.6 SSD MARKET ................................ 45
4.7 FUTURE ................................ 46
4.8 SUMMARY ................................ 47
4.9 TYPICAL CHARACTERISTICS OF HDD AND SSD ................................ 47
5. PERFORMANCE: HDD VS SSD ................................ 49
5.1 BENCHMARK ................................ 49
5.2 BENCHMARK ENVIRONMENT ................................ 50
5.3 TPC-H BENCHMARK ................................ 50
5.4 ENERGY EFFICIENCY TEST ................................ 54
5.5 HD TUNE BENCHMARK ................................ 56
5.6 SUMMARY ................................ 58
6. BETTER INVESTMENT: SSD OR ADDITIONAL RAM? ................................ 59
6.1 BENCHMARK ENVIRONMENT ................................ 59
6.2 RESULTS ................................ 60
6.3 CONCLUSION ................................ 61
6.4 BENCHMARK PROBLEMS ................................ 62
7. REVERSE ENGINEERING ................................ 63
7.1 INTEL X25-EXTREME ................................ 63
7.2 CRUCIAL REAL C300 ................................ 65
7.3 7.4
8. DESIGNING OPTIMAL PERFORMANCE BASED SSD SYSTEM LEVEL ARCHITECTURE AND ITS CONTROLLER COST ESTIMATION ................................ 70
8.1 COST ESTIMATION OF CONTROLLER FOR SYSTEM DESIGNED TO MEET PERFORMANCE SPECIFICATION ................................ 70
8.2 IMPLEMENTATION FACTORS IN OPTIMIZATION TOOL ................................ 71
8.3 OPTIMIZATION TOOL CONSISTENCY TEST FOR CONTROLLER SIZE ................................ 76
8.4 HINTS TO USE TOOL FOR OPTIMAL SYSTEM DESIGN AND CONTROLLER COST ESTIMATION ................................ 77
9. SUMMARY ................................ 78
9.1 CONCLUSION ................................ 78
9.2 FUTURE WORK ON SSD ................................ 79
List of Figures
Figure 1: View of personal computer system [25] ________ 10
Figure 2: Interconnections of memory components ________ 10
Figure 3: Forms of storage, divided according to their distance from the CPU [19] ________ 15
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32] ________ 17
Figure 5: Memory controller hub ________ 17
Figure 6: Hard Disk Drive [27] ________ 20
Figure 7: Representations of sectors, blocks and tracks on platter surface [27] ________ 21
Figure 8: Representation of Hard Disk Drive as blocks ________ 23
Figure 9: Role of Cache buffer in Hard disk ________ 24
Figure 10: Typical IDE/ATA ribbon cable and its socket on a motherboard [28] ________ 26
Figure 11: A single-drop 68-conductor SCSI ribbon cable [28] ________ 27
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28] ________ 28
Figure 13: A Seagate 1TB external hard drive [28] ________ 29
Figure 14: Moving Parts in Hard Disk Drives [29] ________ 30
Figure 15: Evolution in density of NAND flash memory ________ 33
Figure 16: HDD and SSD [30] ________ 34
Figure 17: NAND flash memory chip [30] ________ 35
Figure 18: Flash memory overwrite mechanism ________ 36
Figure 19: A generic overview of a Flash memory bank [5] ________ 37
Figure 20: Components of SSD ________ 39
Figure 21: Organization of conventional SSD ________ 40
Figure 22: Address translation in solid state drive [8] ________ 41
Figure 23: Internal structure of solid state drive [6] ________ 42
Figure 24: X4 PCI-Express card with NAND flash chips on it [31] ________ 44
Figure 25: SSD Market development ________ 46
Figure 26: TPC-H benchmark application outline ________ 52
Figure 27: TPC-H benchmark performance results ________ 53
Figure 28: Comparison for energy efficiency ________ 55
Figure 29: Read speed comparison ________ 57
Figure 30: Access time comparison ________ 58
Figure 31: Performance comparison between HDD with 12GB system RAM vs SSDs with 2GB system RAM ________ 60
Figure 32: Performance comparison between HDD with 2GB, 8GB, and 12GB system RAM ________ 61
Figure 33: Intel X25 Extreme SSD ________ 64
Figure 34: Controller from Marvell on Intel X25-E SSD board ________ 65
Figure 35: Crucial Real C300 SSD ________ 66
Figure 36: Controller from Marvell on Crucial Real C300 SSD board ________ 67
Figure 37: Design tool outlook ________ 74
Figure 38: Warning: system is over-designed or under-designed with respect to the performance specified ________ 74
Figure 39: Cost calculation tool ________ 75
Figure 40: Process flow for flip chip BGA and wire bonded BGA packaging ________ 73
Figure 41: Controller size for the system with SATA 2.0 interface ________ 76
Figure 42: Controller size for the system with SATA 3.0 interface ________ 76
Figure 43: Procedure to create application package ________ 81
List of Tables
Table 4-1: SLC vs MLC [9] ................................ 35
Table 5-1: Overview of drives in benchmark environment ................................ 50
Table 6-1: Overview of drives in benchmark environment ................................ 59
Table 7-1: Controller chip details of Intel X25-E and Crucial Real C300 SSD ................................ 68
Table 7-2: Controller chip details of Intel X25-E and Crucial Real C300 SSD ................................ 68
Table 7-3: Interface compatibility of Intel X25-E and Crucial Real C300 SSD ................................ 69
Table 8-1: System Interface types and their performances ................................ 71
Table 8-2: Buffer Cache types and their performances ................................ 71
Table 8-3: SSD Controller Interface signals ................................ 72
Abbreviations:
Acronym         Definition
BA              Bank Address
BGA             Ball Grid Array
CS              Chip Select
CK              Clock Enable
CKE             Clock Enable Rank
CAS             Column Address Strobe
CLK             Clock
DRAM            Dynamic Random Access Memory
DQ              Data Bus
DQS             Data Strobe
DM              Data Mask
MA              Memory Address
MLC             Multi Level Cell
RST             Reset
RAS             Row Address Strobe
REF_CLK_P/N     PCI Express Clock
SATA            Serial ATA
SSD             Solid State Drive
SLC             Single Level Cell
PATA            Parallel ATA
PET_P/N         PCI Express differential signal
HDD             Hard Disk Drive
ODT             On-Die Termination
UART            Universal Asynchronous Receiver Transmitter
WE              Write Enable
Chapter 1
Performance evaluation of hard disk drives and solid state drives through an extensive comparison followed by benchmarking,
Analysis of the architecture of an SSD controller by reverse engineering,
Finally, development of a tool that suggests the optimal and most cost-efficient system-level SSD architecture for a selected interface.
Chapter 2
Central processing unit (CPU),
Random access memory (RAM),
Read-only memory (ROM),
Input/output (I/O) ports,
The system bus,
A power supply unit (PSU).
In addition to these core components, further components are required to extend the functionality of the system and provide a computing environment with which a human operator can interact more easily. These could include:
Secondary storage devices (e.g. disk drives),
Input devices (e.g. keyboard, mouse, scanner),
Output devices (e.g. display adapter, monitor, printer).
The core system components are mounted on a backplane, more commonly referred to as a mainboard (or motherboard). The mainboard is a relatively large printed circuit board that provides the electronic channels (buses) that carry data and control signals between the various components, as well as the necessary interfaces (in the form of slots or sockets) that allow the CPU, memory cards and other components to be plugged into the system. In most cases, the ROM chip is built into the mainboard, and the CPU and RAM must be compatible
Department of EIT Hochschule Rosenheim Page 9
with the mainboard in terms of their physical format and electronic configuration. Internal I/O ports are provided on the mainboard for devices such as internal disk drives and optical drives.
Figure 1: View of personal computer system [25]
The relationship between the elements that make up the core of the system is illustrated below.
Figure 2: Interconnections of memory components
Data flows back and forth between the processor and the memory over shared electrical conduits called buses, which carry address, data, and control signals. Depending on the particular bus design, data and address signals can share the same set of wires, or they can use different sets.
External I/O ports are also provided on the mainboard to enable the system to be connected to external peripheral devices such as the keyboard, mouse, video display unit, and audio speakers. Both the video adapter and the audio card may be provided on-board (i.e. built into the mainboard) or as separate plug-in circuit boards mounted in an appropriate slot on the mainboard. The mainboard also provides much of the control circuitry required by the various system components, allowing the CPU to concentrate on its main role: executing programs.
Memory is an integral part of a computational system. This chapter focuses on memory organization, as a clear understanding of these ideas is vital for the analysis of system performance.
2.2 Memory
Memory lies at the heart of the stored-program computer. The system memory is the place where the computer holds the programs and data currently in use. Although memory appears in many different forms throughout modern PC systems, it can be divided into two essential types.
2.2.1 ROM
ROM refers to non-volatile memory, which means that it retains data even after power is removed; in fact, it needs very little charge to retain its contents. It is used to store permanent or semi-permanent data that persists even while the system is turned off, typically small start-up programs such as the BIOS, which bootstraps the computer. There are several extended types of ROM, namely:
PROM: Programmable ROM
EPROM: Erasable PROM
EEPROM: Electrically Erasable PROM (Flash)
Flash memory is essentially EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster than EEPROM.
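The block-wise erase behaviour can be illustrated with a toy model. The block size, class and method names below are illustrative, not taken from any real device:

```python
# Toy model of flash behaviour: programming can only clear bits
# (1 -> 0); restoring them requires erasing an entire block back
# to the all-ones state. Block size here is illustrative; real
# erase blocks are kilobytes in size.
BLOCK_SIZE = 8  # bytes per erase block

class FlashBlock:
    def __init__(self):
        self.cells = [0xFF] * BLOCK_SIZE  # erased state: all bits set

    def program(self, offset, value):
        # A write can only clear bits, never set them back.
        self.cells[offset] &= value

    def erase(self):
        # Erase works on the whole block at once -- this coarse
        # granularity is what removes the byte-at-a-time limitation.
        self.cells = [0xFF] * BLOCK_SIZE

blk = FlashBlock()
blk.program(0, 0x0F)   # first write succeeds: 0xFF & 0x0F == 0x0F
blk.program(0, 0xF0)   # "overwriting" only clears more bits -> 0x00
blk.erase()            # block erase restores 0xFF everywhere
```

This is why, as later chapters discuss, flash-based drives must erase before rewriting in place.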
2.2.2 RAM
RAM refers to volatile memory which means that the data is lost once the power is turned off. There are two types of RAM, Static RAM (SRAM) and Dynamic RAM (DRAM).
DRAM: It stores each bit of data in a separate capacitor within an integrated circuit. The capacitor can be either charged or discharged; these two states represent the two values of a bit, conventionally called '0' and '1'. Capacitors slowly leak their stored charge over time, so the cells must be refreshed every few milliseconds to prevent data loss.
DRAM is cheap owing to its simpler design compared to SRAM. It is also denser, uses less power and generates less heat than SRAM. For these reasons, DRAMs are preferred over SRAMs for building main memory. There are many kinds of DRAM, and new kinds appear on the market regularly as manufacturers attempt to keep up with rapidly increasing processor speeds. Each design is based on the conventional DRAM cell, with optimizations that improve the speed with which the basic cells can be accessed.
Synchronous DRAM (SDRAM)
SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus. The clock drives an internal finite state machine that pipelines incoming instructions. This allows the chip to have a more complex pattern of operation, enabling higher speeds.
Double Data Rate SDRAM (DDR SDRAM)
DDR SDRAM has the same working principle. The difference is that DDR SDRAM doubles the bandwidth by double-pumping (transferring data on both the rising and the falling edge of the clock signal) without increasing the clock frequency.
DDR2 SDRAM
DDR2 is the next generation of memory developed after DDR. DDR2 increased the data transfer rate (bandwidth) by raising the operational frequency to match high FSB frequencies and by doubling the pre-fetch buffer depth. Like DDR SDRAM, DDR2 transfers data on both the rising and the falling edge of the clock signal. The trade-off is that internal operations are carried out at only half the clock rate.
DDR3 SDRAM
DDR3 is the successor to DDR2. DDR3 increased the pre-fetch buffer size to 8 bits and raised the operating frequency once again, resulting in higher data transfer rates than its predecessor. Like DDR2 SDRAM, DDR3 transfers data on both the rising and the falling edge of the clock signal, although internal operations are limited to only a quarter of the clock rate.
Rambus DRAM (RDRAM)
This is an alternative proprietary technology with a higher maximum bandwidth than DDR SDRAM. Compared to other contemporary standards, Rambus shows a slight increase in latency, heat output, manufacturing complexity, and cost.
Video RAM (VRAM)
VRAM has two ports: a DRAM port and a video port. The video port is typically read-only and is dedicated to providing a high-bandwidth data channel for the graphics chipset. VRAM is used in the frame buffers of graphics systems.
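The effect of double-pumping on bandwidth can be seen from simple arithmetic: peak transfer rate = bus clock × transfers per clock × bus width in bytes. The sketch below uses illustrative figures for PC133 SDR and DDR-400 modules:

```python
def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_clock, bus_width_bits):
    """Peak transfer rate in MB/s: clock rate times transfers per
    clock cycle times bytes moved per transfer."""
    return bus_clock_mhz * transfers_per_clock * (bus_width_bits // 8)

# SDR SDRAM (PC133): 133 MHz, one transfer per clock, 64-bit bus
print(peak_bandwidth_mb_s(133, 1, 64))   # 1064 MB/s

# DDR-400 (PC3200): 200 MHz bus clock, two transfers per clock
print(peak_bandwidth_mb_s(200, 2, 64))   # 3200 MB/s
```

Note that doubling the transfers per clock, not the clock itself, is what gives DDR its name.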
In practice, almost all computers use a variety of memory types, organized in a storage hierarchy around the CPU as a trade-off between performance and cost. Generally, the lower a storage medium sits in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary and off-line storage is also guided by cost per bit.
Processor registers are located inside the processor. Each register typically holds a word of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic unit to perform various calculations or other operations on this data (or with the help of it). Registers are the fastest of all forms of computer data storage.
Processor cache is an intermediate stage between ultra-fast registers and much slower main memory. It is introduced solely to increase the performance of the computer. The most actively used information in main memory is duplicated in the cache, which is faster but of much smaller capacity; conversely, the cache is slower but much larger than the processor registers. A multi-level hierarchical cache setup is commonly used: the primary (L1) cache is the smallest and fastest and is located inside the processor; the secondary (L2) cache is somewhat larger and slower, and has usually been contained on the motherboard. However, more and more chip makers are putting this cache on board the processor itself. The benefit is that it then runs at the same speed as the processor, and costs less to put on the chip than to set up a bus and logic external to the processor. The hierarchy continues with what is referred to as L3 cache. This cache used to be the L2 cache on the motherboard, but now that some processors include L1 and L2 cache on
the chip, it becomes L3 cache. Usually, it runs slower than the processor, but faster than main memory.
Random-Access Memory
Main memory is small in size yet quite expensive. (The particular types of RAM used for primary storage are also volatile, i.e. they lose their information when not powered.) Main memory is directly or indirectly connected to the central processing unit via a memory bus, which is actually two buses: an address bus and a data bus. The CPU first sends a memory address that indicates the desired location of data, then reads or writes the data itself over the data bus. Additionally, a memory management unit (MMU), a small device between the CPU and RAM, recalculates the actual memory address, for example to provide an abstraction of virtual memory or other tasks.
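The address-recalculation role of the MMU can be sketched as a page-table lookup. The page size is a typical value, but the page-table contents and function name here are hypothetical:

```python
PAGE_SIZE = 4096  # bytes; 4 KiB pages are typical

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_addr):
    """Split a virtual address into page number and offset, then
    substitute the physical frame number from the page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault: no mapping for page %d" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4 -> frame 2 -> 8196
```

A real MMU performs this lookup in hardware, with a TLB caching recent translations, but the arithmetic is the same.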
Figure 3: Forms of storage, divided according to their distance from the CPU [19]
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32] *The values are approximated for illustration
Figure 5: Memory controller hub
The memory controller scans for the type and speed of the RAM connected. It also determines the maximum size of each individual memory module and the overall memory capacity of the system. Memory controllers contain the logic necessary to read, write and refresh the main memory. Considering DRAM as an example, reading and writing are performed by
selecting the row and column addresses of the DRAM as inputs to the multiplexer circuit; the demultiplexer on the DRAM uses these inputs to select the correct memory location and return the data, which is then passed back through a multiplexer to consolidate the data and reduce the required bus width for the operation. Bus width is the number of parallel lines available to communicate with the memory cell; memory controller bus widths range from 8 to 64 bits. In complex systems, memory controllers are operated in parallel, for example four 64-bit buses operating in parallel, though some are designed to operate in "gang mode", where two 64-bit memory controllers can be used to access a 128-bit memory device.
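The row/column multiplexing described above amounts to splitting a flat address into two strobed halves. The 13-bit row / 10-bit column geometry below is illustrative, not a specific DRAM part:

```python
ROW_BITS, COL_BITS = 13, 10  # illustrative DRAM array geometry

def split_address(addr):
    """Multiplex a flat memory address into the row and column
    values the DRAM expects (row bits above the column bits)."""
    col = addr & ((1 << COL_BITS) - 1)   # low bits select the column
    row = addr >> COL_BITS               # remaining bits select the row
    assert row < (1 << ROW_BITS), "address exceeds array size"
    return row, col

print(split_address(10940))  # -> (10, 700): row 10, column 700
```

Sharing one set of pins for both halves is what keeps DRAM packages small; the RAS and CAS strobes tell the chip which half is currently on the pins.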
2.5 Summary
In this chapter, an introduction to the different memory technologies in a computer system was given. The system memory hierarchy was closely analysed, which gives a much better idea of how to choose a storage technology of suitable type and size. The coming chapters give more insight into current technological trends in secondary storage. The next chapter focuses on the current state of secondary storage, represented by magnetic disks, also called hard disk drives.
Chapter 3
spindle. The disk arm has a separate head for each surface, and can therefore write to more sectors without seeking to a different track. The set of identical tracks across all surfaces is called a cylinder. Cylinders make it possible to speed up read and write operations, as the disk arm can operate on multiple surfaces without needing to move to a different position.
III. Transfer time: When the first bit of the target sector is under the head, the drive can begin to read or write the contents of the sector. The transfer time for one sector depends on the rotational speed and the number of sectors per track. Thus, the average transfer time for one sector can be roughly estimated as

TAverage transfer = (1/RPM) x (1/(average number of sectors per track)) x 60 s/min
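Plugging in illustrative numbers (a hypothetical 7200 RPM drive with an average of 400 sectors per track) gives a feel for the magnitudes involved; the factor 60 000 converts minutes to milliseconds:

```python
def avg_transfer_time_ms(rpm, sectors_per_track):
    """Average transfer time for one sector in milliseconds:
    (1/RPM) * (1 / sectors per track) * 60 000 ms/min."""
    return (1.0 / rpm) * (1.0 / sectors_per_track) * 60_000

# One rotation at 7200 RPM takes 60000/7200 ~ 8.33 ms, so one of
# 400 sectors passes under the head in roughly 8.33/400 ~ 0.02 ms.
print(round(avg_transfer_time_ms(7200, 400), 4))  # 0.0208
```

At about 0.02 ms, the transfer time is tiny compared with typical seek times and rotational delays of several milliseconds, which is the point made next.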
With these characteristics, the seek time and rotational delay become a significant part of a random read or write operation. For sequential operations, the disk can work on entire tracks/cylinders at a time, continuing with neighbouring tracks/cylinders. Because of the short physical distance between the locations of the data, sequential reads minimize the time spent on seeks, resulting in an overall lower access time for the data.
3.1.4 Addressing
The location of a specific sector is referenced using its cylinder number, head number and sector number (this addressing scheme is often abbreviated to CHS). Indeed, the total number of sectors on the drive could be calculated by multiplying the number of cylinders by the number of read/write heads, and then multiplying the result by the number of sectors per track. Since the introduction of zoned bit recording (as mentioned above, this is a drive geometry in which the number of sectors per track is smaller at the centre of the disk) this calculation can no longer be used. The way in which sectors are addressed has also become more abstract, relieving the operating system software of the need to know about physical drive geometry. Note that sectors that are logically sequential are not necessarily physically contiguous. After reading a sector, there may be a small delay before the drive controller is ready to read another sector. Sectors that are logically sequential may therefore be spaced at discrete intervals on the disk to give the drive controller time to get ready to read the next sector - a technique known as interleaving. If an interleave factor of 3:1 were used for example, it would take three full rotations for the controller to read all of the sectors on a single track. Thanks to advances in technology, most modern hard drives do not need to use interleaving. Modern hard disk drives use logical block addressing (LBA), a simple linear addressing scheme in which each sector is given an integer index number, starting with 0. The drive controller translates each logical block address into a cylinder, head and sector
number in order to obtain the physical location of the sector on disk. The maximum number of sectors that can be addressed is dependent on the number of bits used for the logical block address.
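The controller's translation can be sketched by inverting the classic CHS formula LBA = (C x heads + H) x sectors_per_track + (S - 1). The geometry constants below are illustrative, not those of any particular drive:

```python
HEADS_PER_CYL = 16       # illustrative logical geometry
SECTORS_PER_TRACK = 63   # sectors are numbered from 1 in CHS

def lba_to_chs(lba):
    """Translate a logical block address back into a (cylinder,
    head, sector) triple under the geometry above."""
    c = lba // (HEADS_PER_CYL * SECTORS_PER_TRACK)
    h = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYL
    s = (lba % SECTORS_PER_TRACK) + 1
    return c, h, s

print(lba_to_chs(0))    # (0, 0, 1): the first sector on the drive
print(lba_to_chs(63))   # (0, 1, 1): first sector under the next head
```

Modern drives perform this mapping internally against their true (zoned) geometry, which is exactly why the logical geometry reported to the host no longer needs to match the physical one.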
System controller chip, including the read/write channel, disk controller and RISC control processor (microcontroller),
Flash ROM chip containing the drive firmware,
RAM chip used as a cache buffer.
Figure 8: Representation of Hard Disk Drive as blocks
The disk controller is the most complicated drive component; it determines the speed of data exchange between the HDD and the HOST.
The disk controller has four ports used for connection to the HOST, the microcontroller, the buffer RAM, and the data exchange channel between it and the head disk assembly. The disk controller is an automatic device driven by the microcontroller; from the HOST side, only the standard task-file registers are accessible. The disk controller is programmed at the initialization stage by the microcontroller; during this procedure it sets up the data encoding methods, selects the polynomial for error correction, defines flexible or hard partitioning into sectors, etc. The buffer manager is a functional part of the disk controller governing the operation of the buffer RAM, referred to as the cache. In modern HDDs its capacity ranges from 512 KB to 16 MB. The buffer manager splits the whole buffer RAM into separate sectioned buffers. Special registers accessible from the microcontroller contain the initial addresses of those sectioned buffers. While the HOST exchanges data with one of the buffers, the read/write channel can exchange data with another buffer section. Thus the system achieves multi-sequencing of the processes of reading/writing data from/to the disk and exchanging data with the HOST.
The basic principle behind the operation of a simple cache is straightforward. Reading data from the hard disk is generally done in blocks of various sizes, not just one 512-byte sector at a time. The cache is broken into segments, each of which can contain one block of data. When a request is made for data from the hard disk, the cache circuitry is first queried to see whether the data is present in any of the segments. If it is, the data is supplied to the logic board without any access to the hard disk's platters. If the data is not in the cache, it is read from the hard disk, supplied to the controller, and then placed into the cache in case it is requested again. Since the cache is limited in size, only so many pieces of data can be held before segments must be recycled; typically the oldest piece of data is replaced with the newest one. This is called circular, first-in, first-out (FIFO) or wrap-around caching. The cache improves the performance of any hard disk by reducing the number of physical accesses to the disk on repeated reads and by allowing data to stream from the disk uninterrupted when the bus is busy.
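The wrap-around (FIFO) segment cache described above can be modelled in a few lines. The segment count and naming here are illustrative, not drawn from any real drive firmware:

```python
from collections import OrderedDict

class FifoSegmentCache:
    """Toy model of a drive's wrap-around segment cache: a fixed
    number of segments, with the oldest block recycled first."""
    def __init__(self, segments=4):
        self.segments = segments
        self.store = OrderedDict()   # block number -> block data

    def read(self, block, read_from_disk):
        if block in self.store:          # cache hit: no platter access
            return self.store[block]
        data = read_from_disk(block)     # cache miss: go to the disk
        if len(self.store) >= self.segments:
            self.store.popitem(last=False)   # recycle oldest segment
        self.store[block] = data
        return data

disk_reads = []
cache = FifoSegmentCache(segments=2)
fake_disk = lambda b: disk_reads.append(b) or ("data-%d" % b)
cache.read(1, fake_disk)
cache.read(1, fake_disk)   # second read is served from the cache
```

After the two reads of block 1, `disk_reads` contains a single entry: the repeated read never touched the simulated platters.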
parallel. Each ribbon cable could connect two ATA drives in a master-slave configuration. Enhanced IDE, introduced in anticipation of changes to the ATA standard, allowed the use of direct memory access (DMA) which meant that data could be transferred directly between the disk and memory without involving the CPU in the data transfer process. This freed up the CPU for other tasks.
PCI-E expansion slot. Parallel SCSI has largely been superseded in server and mass storage applications by Fibre Channel (FC) or Serially Attached SCSI (SAS), both of which use a high-speed serial interface.
since 2003 have integrated SATA controllers (although an add-on controller card can be installed in a PCI or PCI-E slot). The SATA controller can use the Advanced Host Controller Interface (AHCI) to take advantage of advanced features such as hot-swapping of drives, provided both the motherboard and the operating system support AHCI. If not, SATA controllers are capable of operating in "IDE emulation" mode.
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]
power supply. The introduction of eSATAp is intended to resolve this issue, while USB 3.0 will reportedly be able to provide voltages of 5V, 12V or 24V. At the time of writing, the storage capacity of a typical external hard drive can range from a few hundred gigabytes up to 4 terabytes.
To meet the demands of fast-growing interface technologies, the data access time must be reduced significantly, which is possible either by increasing the rotational speed of the platters or by increasing the cache size to hide the latency.
A Fujitsu white paper, Trends in Enterprise Hard Disk Drives [10], states: "Ultrahigh-speed HDDs rotating at speeds exceeding 20,000 rpm have also been researched but not commercialized due to heat generation, power consumption, noise, vibration and other problems in characteristics, and a lack of long term reliability." Companies have tried ingenious designs to reduce the excessive heat produced by a high spin rate. Generally, the physical disk platters of a standard 3.5-inch hard disk have an approximate diameter of 3 inches; in some drives, however, such as the Pegasus II, the platter size has been further reduced to 2.5 inches.
Figure 14: Moving Parts in Hard Disk Drives [29]

The smaller platters cause less air friction and therefore reduce the amount of heat generated by the drive. In addition, the drive chassis acts as one big heat fin, which also helps dissipate the heat. The disadvantage is that, since the platters are smaller, they have less data capacity. This can be overcome by stacking more of them, but the height of the drive then increases. To get higher data rates from HDDs, manufacturers can:
- Spin the disks faster; but at 20,000 rpm, enterprise-class HDD platters are already under severe mechanical stress.
- Increase the number of read/write heads that can be active simultaneously; this constitutes a radical, substantial, and costly architectural and electronic change to HDD design.
- Add a second servo actuator with another set of read/write heads and another set of read/write electronics; this is completely out of the question from an economic perspective.
Department of EIT Hochschule Rosenheim Page 30
Taken together, these trends suggest that what customers of big multi-user servers would really like is faster disk drives with lower power consumption, and that is only getting tougher to deliver with hard disk technology.
Chapter 4
Flash memory is the cornerstone of the solid state drive. With the increasing use of flash-based secondary storage, a detailed understanding of flash behaviour, which affects operating system design and performance, becomes important. This chapter provides detailed information about flash memory; the internal parts of a solid state drive are then discussed over several sections. Section 4.4 describes the flash translation layer and the techniques that ensure the functionality of the solid state drive.
In the early days of flash memory, NOR flash was often used. It can be addressed directly by the processor and is handy for small amounts of storage.
Figure 17: NAND flash memory chip [30]

Today, NAND flash memory is used to store the data. It offers a much higher density, which is more suitable for large amounts of data; its cost is lower and its endurance much longer than NOR flash. NAND flash can only be addressed at the page level. Flash memory comes with either single-level cells (SLC) or multi-level cells (MLC). The difference between the two cell models is that an SLC can store only 1 bit per cell (1 or 0), whereas an MLC can store multiple bits (e.g. 00, 01, 10 or 11). Internally these values are represented by holding different voltage levels. Both flash memory cells are similar in their design. MLC flash devices cost less and allow a higher storage density; therefore MLC cells are used in most mass-produced devices. SLC flash devices provide faster write performance and greater reliability, so SLC flash cells are usually used in high-performance storage solutions. Table 4-1 compares the two cell models.

[Table 4-1: SLC vs. MLC comparison across endurance, operating temperature range, low power consumption, write/erase speeds, and write/erase endurance]
[Figure 18: erased and programmed states of a flash memory cell]
When a flash memory cell is in the erased state, its bits are all set to zero (or one, depending on the flash device). Only when a flash cell is in the erased state can the controller write to it; in this example, the 0 can be set to 1. The cell is then programmed and, in a sense, frozen: it is not possible to simply change the 1 back to a 0 and write again. The flash memory cell has to be erased first (Figure 18). Worse still, it is not possible to erase just a small group of cells; the erase operation works at a much larger granularity, namely in erase units of e.g. 512 KB. If the amount of data being written is small compared to the erase unit, a correspondingly large penalty is incurred in writing the data. The flash memory is divided into blocks, and the smallest erasable block is called an erase unit. If the position of the written data overlaps two blocks, both blocks have to be erased. However, this erase operation need not be executed immediately before or after the write: the controller of the device may simply choose a new block for the write request and update its internal addressing map.
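A minimal model of this program/erase asymmetry; the 512 KB erase unit made of 128 pages matches the example figures used in this chapter, and the class and method names are only illustrative.

```python
PAGES_PER_ERASE_UNIT = 128          # e.g. 128 x 4 KB pages = 512 KB erase unit

class EraseUnit:
    """Flash erase unit: pages can be programmed once, erased only as a whole."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_ERASE_UNIT   # None = erased

    def program(self, page_no, data):
        if self.pages[page_no] is not None:
            # A programmed cell cannot simply be rewritten.
            raise RuntimeError("page already programmed; erase the whole unit first")
        self.pages[page_no] = data

    def erase(self):
        # There is no per-page erase: the entire unit is reset at once.
        self.pages = [None] * PAGES_PER_ERASE_UNIT
```

Rewriting even a single page therefore forces an erase of the whole unit, which is exactly the penalty described above for small writes.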
4.2.1.2 Page
Pages in a flash array are the smallest unit that any higher level of abstraction works on. The size of a page may vary depending on the specifics of the physical structure, but is typically 4 KB [6, 5]. With 128 pages, the next larger unit in the flash memory hierarchy is the erase unit of 512 KB; this can vary from drive to drive. In addition, each cell also has space allotted for an error-correction code (ECC). During a read
operation, all the data from the page is transferred to the data register. Similarly, a write operation to a page writes all the data in the data register to the cells within the page. Recall that flash cells support only two states when writing: a cell can be in a neutral or a negative state. When writing data to a page, it is only possible to change from the neutral (logical one) to the negative (logical zero) state, meaning that to change a zero back to a one, the entire page has to be reset. Finally, flash chips can be grouped together into so-called planes to increase storage capacity, and multiple planes can be accessed in parallel to enhance data throughput [12].
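The sizes quoted above are consistent with one another: 128 pages of 4 KB each make up one 512 KB erase unit (both figures vary between devices), and in the worst case rewriting a single page touches an erase unit 128 times its size.

```python
page_kb = 4                          # typical page size quoted above
pages_per_erase_unit = 128

erase_unit_kb = page_kb * pages_per_erase_unit
assert erase_unit_kb == 512          # matches the 512 KB erase unit figure

# Worst-case penalty factor for a single-page rewrite:
assert erase_unit_kb // page_kb == 128
```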
The Intel X25-E SATA solid state drives use a cost-effective system-on-chip (SoC) design to manage the full 3 Gbps SATA bandwidth to the host while managing multiple flash memory devices on multiple channels internally [2]. The structure of the flash memory banks in Figure 19 gives a general idea of what to expect, but only for a simple read/write operation. As the block diagram in Figure 23 shows, an SSD connects several flash memory banks to a flash controller (FC). A single SSD usually contains multiple FCs, commonly called channels. As the name implies, each channel can process requests independently, giving SSDs the ability to process a number of operations in parallel internally.
4.4.1 Controller
The controller of a solid state drive manages all the internal processes that make the FTL work. It contains a mapping table that performs the logical-to-physical mapping: the logical address of an incoming request is mapped to the physical address pointing to the flash memory block where the data is actually stored. Whenever a read or write request arrives at the solid state drive, the logical block address (LBA) first has to be translated into the physical block address (PBA) (Figure 22). The LBA is the block address used by the operating system to
read or write a block of data on the flash drive. The PBA is the physical address of a block of data on the flash drive. Note that over time the PBA corresponding to one and the same LBA can change often.
Figure 22: Address translation in a solid state drive [8]

The controller also handles the wear-leveling process (see section 4.4.2). When a write request arrives at the solid state drive, a free block is selected, the data is written and the address translation table is updated. Internally, the old block does not have to be erased immediately: the controller may postpone the erasure and perform a kind of garbage collection when the number of free blocks falls below a certain limit, or it may wait until the drive is not busy. Naturally, some data structures are needed to maintain a free block list and to track the used blocks. Each flash memory block has a small overhead area where metadata can be stored to help manage these structures; for example, a counter records how many times each block has already been erased. Like conventional hard disk drives, SSDs usually have an internal DRAM cache to buffer write requests or store prefetched pages. This buffer enables solid state drives to back up and restore pages during erase cycles and to keep in-memory information such as page-mapping structures. Using a larger DRAM cache and more intelligent techniques to organize requests could make a huge difference. By using an FTL, most drawbacks of flash chips can be avoided while their advantages are exploited; the FTL is therefore a major performance-critical part of every SSD, and many optimizations are conceivable for such an SSD controller.
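The out-of-place update scheme described above can be sketched as follows. The structure names, the simple FIFO free-list policy and the garbage-collection trigger are illustrative assumptions; real FTLs select blocks with wear-leveling in mind.

```python
class SimpleFTL:
    """Toy flash translation layer: LBA -> PBA map with deferred erasure."""

    def __init__(self, n_blocks, gc_threshold=2):
        self.free = list(range(n_blocks))   # free physical block list
        self.map = {}                       # LBA -> PBA translation table
        self.stale = []                     # old copies awaiting erasure
        self.erase_count = [0] * n_blocks   # per-block metadata counter
        self.gc_threshold = gc_threshold

    def write(self, lba):
        pba = self.free.pop(0)              # always write to a fresh block
        if lba in self.map:
            self.stale.append(self.map[lba])   # old block is NOT erased now
        self.map[lba] = pba                 # PBA for this LBA changes over time
        if len(self.free) < self.gc_threshold:
            self._garbage_collect()         # erase only when free blocks run low
        return pba

    def _garbage_collect(self):
        for pba in self.stale:              # deferred erasure of stale blocks
            self.erase_count[pba] += 1      # wear counter kept in block metadata
            self.free.append(pba)
        self.stale = []
```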
Prefetching data when sequential read patterns occur (much as a conventional hard disk drive can fill its whole buffer) might speed up the reading process. A controller could also write to different flash chips in parallel (Figure 23). Since all the parts of flash memory are electronic, parallelization is not very hard to add; flash memory can be seen as many memory cells arranged in parallel. With parallelization, the I/O requests, the erase process and the internal maintenance of the data structures become more complicated, but much higher performance can be achieved. One could even construct an SSD containing several drives combined in an internal RAID configuration.
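One simple way a controller could exploit several channels is to stripe consecutive pages round-robin, so that a burst of sequential requests is spread over all channels and can proceed concurrently. The round-robin policy itself is an assumption for illustration; real controllers use more elaborate allocation schemes.

```python
def channel_for_page(page_no, n_channels):
    """Round-robin striping: consecutive pages land on different channels."""
    return page_no % n_channels

# With 4 channels, pages 0..5 map to channels 0,1,2,3,0,1, so four
# back-to-back requests can all be serviced by different channels in parallel.
assert [channel_for_page(p, 4) for p in range(6)] == [0, 1, 2, 3, 0, 1]
```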
4.4.5 Trim
Trim is a function by which the operating system tells the drive that a page is no longer valid. This helps to reduce write amplification, because stale pages are not copied; there are also fewer pages to copy, which speeds up the process of freeing partially valid blocks. When it is time to consolidate blocks to free up space, the SSD must copy all of the data it considers valid to a new block before it can erase the current block. Without Trim, the SSD does not know a page is invalid unless the LBA associated with it has been rewritten.
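A small illustration of the saving: when consolidating a 128-page block of which the file system has deleted 100 pages, a Trim-aware drive copies only the 28 pages that are still live, while a drive without Trim must copy all 128. The figures are illustrative, not measured.

```python
def pages_to_copy(pages_in_block, pages_deleted_by_os, trim_supported):
    """Pages the SSD must relocate before it can erase the block."""
    if trim_supported:
        # Trimmed pages are known to be stale and are skipped.
        return pages_in_block - pages_deleted_by_os
    # Without Trim, deleted pages still look valid to the drive.
    return pages_in_block

assert pages_to_copy(128, 100, trim_supported=True) == 28
assert pages_to_copy(128, 100, trim_supported=False) == 128
```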
4.5.1 PCI Express
PCI Express is a 2.5 Giga transfers/second serial differential point-to-point high-speed interconnect with added flexibility and scalability. The immediate benefit is increased
bandwidth. PCI Express offers 4 GB/s of peak bandwidth per direction for a x16 link, and 8 GB/s concurrent bandwidth. This allows for the highest performance in gaming and video capture. In addition, PCI Express is designed for cost parity: the PCI Express x16 connector is expected to reach cost parity with high-volume standard connectors. Peripheral Component Interconnect Express (PCIe) is an internal interface, so a PCIe SSD is a circuit board plugged into a PCI Express slot on the motherboard.
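The 4 GB/s figure follows directly from the line rate: 2.5 GT/s per lane with 8b/10b encoding leaves 250 MB/s of payload per lane per direction, and a x16 link has sixteen lanes.

```python
transfers_per_s = 2.5e9       # PCI Express 1.x line rate per lane
payload_fraction = 8 / 10     # 8b/10b encoding: 8 payload bits per 10 line bits
lanes = 16                    # x16 link

bytes_per_lane = transfers_per_s * payload_fraction / 8   # 250 MB/s per lane
per_direction = bytes_per_lane * lanes

assert per_direction == 4e9               # 4 GB/s per direction
assert 2 * per_direction == 8e9           # 8 GB/s concurrent, both directions
```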
- Parallelization of the internal flash arrays
- Improved flash management technology
- Faster flash controllers
- Faster host interface controllers (and faster interfaces driven by the needs of the SSD market rather than adapted from the HDD market)
- Hybridizing on-board memory technologies, for example using faster RAM-like non-volatile memory in some parts of the device and slower flash-like memory in the bulk storage arrays
A lot of trial and error will be involved as original equipment manufacturers throw products at the market that tweak the technologies they understand best, and see which products stick. Some of these will enhance currently known architectures, while others may render some architectural features obsolete. In the years ahead, flash SSD technology is expected to reach a point where the architecture of an ideal SSD is well established and ongoing developments are driven more by process changes than anything else.
4.7 Future
The availability and maturity of SSD technology has changed drastically over the last couple of years, going from a vastly more expensive technology that proved better in only a small subset of scenarios to a serious competitor. With ONFI (Open NAND Flash Interface) [3] working intensively on NAND technology, the future of SSDs seems bright. ONFI has created the Block Abstracted NAND addendum specification to simplify host controller design by relieving the host of the complexities of ECC, bad block management, and other low-level NAND management tasks. The ONFI Block Abstracted NAND revision 1.1 specification adds the high-speed source-synchronous interface, which provides up to a 5x improvement in bandwidth compared with the traditional asynchronous NAND interface. The ONFI workgroup continues to evolve the ONFI specifications to meet the needs of a rapidly growing and changing industry.
ONFI 2.1 [3] contains a plethora of new features that deliver speeds of 166 MB/s and 200 MB/s, plus other enhancements to improve power efficiency, performance, and ECC capabilities. Along with ONFI, the SSD manufacturers are designing their products to match fast interface technologies such as SATA III and PCI Express. ONFI is dedicated to simplifying NAND flash integration into consumer electronic products, computing platforms, and any other application that requires solid state mass storage.
4.8 Summary
This chapter has given an overview of the technology behind SSDs. Flash cells are at a point where production and technology are mature enough to build storage devices capable of competing with magnetic disks. Some of the challenges SSDs face when using flash cells for bulk storage, such as the FTL and wear-leveling, were also discussed.
continued operations or erasing and writing data, especially with large files or when disk space becomes low.

Audible noise: HDDs produce audible clicks and crunching sounds, while SSDs are often quieter because they have no mechanical parts.

Size: Flash-based SSDs are manufactured in standard 2.5" and 3.5" form factors; 2.5" SSDs are normally used in laptops and notebooks, while the 3.5" form factor is used in desktops.

Vibration: SSDs are naturally more rugged than HDDs. An SSD can sustain up to 1,000 G/0.5 ms of shock [16] before suffering damage or a drop in performance, while HDDs can withstand up to 63 G/2 ms while operating and 350 G/1 ms [24] when turned off.

Power consumption: SSDs consume less power than HDDs.

Heat dissipation: Along with the lower power consumption, systems using flash-based SSDs as their data storage solution dissipate much less heat, due to the absence of heat generated by rotating or movable media. This is one of the main advantages of flash-based SSDs over traditional HDDs and makes them an ideal data storage solution for mobile systems such as PDAs and notebooks.

Mean Time Between Failures (MTBF): the average MTBF for SSDs is approximately 2,000,000 hours [16], while the MTBF for HDDs is approximately 700,000 hours [24].

Cost considerations: As of February 2011, NAND flash SSDs cost about US $1.20 to $2.00 per GB, while HDDs cost about US $0.05/GB for 3.5-inch and $0.10/GB for 2.5-inch drives.
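A quick worked example of those February 2011 prices, filling 1 TB of capacity (assuming 1 TB = 1000 GB for simplicity):

```python
capacity_gb = 1000

hdd_cost = capacity_gb * 0.05            # 3.5-inch HDD at $0.05/GB
ssd_cost_low = capacity_gb * 1.20        # NAND flash SSD, low end of the range
ssd_cost_high = capacity_gb * 2.00       # NAND flash SSD, high end of the range

assert hdd_cost == 50.0
assert (ssd_cost_low, ssd_cost_high) == (1200.0, 2000.0)
# The SSD solution costs roughly 24x to 40x as much as the HDD.
```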
Chapter 5
5.1 Benchmark
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it [23]. Benchmarks provide a method of comparing the performance of various subsystems across different chip and system architectures. The performance of both SSDs and magnetic disks can be difficult to summarize with just a few numbers. As discussed earlier, certain aspects of a disk may give different performance results, and performance varies with the workload. In addition to these uncertainties, different file systems store data in fundamentally different ways. Taken together, this makes it hard to predict, from datasheet numbers alone, what level of performance a given application can expect. To investigate actual performance levels, up-to-date high-end SATA consumer and enterprise flash solid state drives are benchmarked against a mechanical hard disk drive. When choosing drives for the benchmark, the focus was on mid-range alternatives; the two most popular SSDs in
the market today are considered, namely the Intel X25-E and the Crucial Real C300. The two SSDs differ in the type of memory and the system interface technology used. An HDD from Seagate is taken as the baseline for the benchmarks.
Disk Specification
Seagate [24]: HDD, 80 GB, 3.5" form factor, SATA interface (1.5/3 Gbps), 7200 rpm magnetic platters, average access time 4.16 ms.

Intel X25-E [16]: SSD, 32 GB, 2.5" form factor, SATA interface (1.5/3 Gbps), SLC NAND memory, average access time 0.08 ms, sequential read 250 MB/s, sequential write 170 MB/s.

Crucial Real C300 [17]: SSD, 128 GB, 2.5" form factor, SATA interface (6/3/1.5 Gbps), MLC NAND memory, average access time <0.1 ms, sequential read 355 MB/s, sequential write 140 MB/s.
The TPC-H benchmark is widely used in the database community as a yardstick to assess the performance of database management systems against large scale decision support applications. The benchmark is designed and maintained by the Transaction Processing Performance Council.
DBGEN and QGEN are written in ANSI C for portability and have been successfully ported to over a dozen different systems. While the TPC-H specification allows an implementer to use any utility to populate the benchmark database and to create the benchmark query sets, the resulting population must exactly match the output of DBGEN. The source code is provided to make building a compliant database population and query sets as simple as possible. A TPC-H benchmark application package, bound to a database, was created; using DBGEN, this application measures the time taken by each individual query to run against an application-specific 10 GB database. An overview of the TPC-H benchmark application is shown in Figure 26. Several steps have to be followed to create the application package; for the detailed procedure, see Appendix A. TPC benchmark results are expected to be accurate representations of system performance, so certain guidelines are expected to be followed when measuring them; the measurement approach and methodology are explicitly described in the specification [13].
5.3.3 Results
A comparison of the results in Figure 27 shows that the query execution times of the Intel X25-E and Crucial Real C300 are low and comparable, in contrast to the very high execution times of the Seagate HDD.
[Figure 27: TPC-H benchmark performance: execution time per query (queries 1-17) for the Seagate HDD, Intel X25-E SSD and Crucial C300 SSD]
The difference in execution time is not consistent across the queries; it depends on how the database is laid out across the drives. In terms of query execution time, on average the Intel X25-E is 8 times and the Crucial C300 10 times faster than the Seagate HDD. Both the Intel X25-E and the Crucial Real C300 perform relatively close to their advertised speeds when doing reads, owing to the symmetrical latency properties of SSDs. Except for queries 1, 8, 10, 12 and 16, both SSDs achieve significantly lower execution times across all file systems. This can be attributed to the fact that, although the database is distributed across flash memory chips, the flash memory banks are channelled in parallel; when a series of requests for data located on different channels arrives, the SSDs can handle the requests in parallel, hence the lower execution times.
The execution times of queries 1, 8, 10, 12 and 16 are comparable on both SSDs and the Seagate HDD, because these queries access sequentially stored data. In summary, solid state drives perform significantly better than hard disk drives in random operations, in contrast to sequential operations.
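The "8 times / 10 times faster on average" figures are obtained as the mean of the per-query speedups. A minimal helper is shown below; the numbers in the check are placeholders, not the measured Figure 27 values.

```python
def average_speedup(hdd_times, ssd_times):
    """Mean ratio of HDD to SSD execution time over all queries (seconds)."""
    ratios = [h / s for h, s in zip(hdd_times, ssd_times)]
    return sum(ratios) / len(ratios)

# Placeholder times for two queries: speedups of 10x and 8x average to 9x.
assert average_speedup([100.0, 200.0], [10.0, 25.0]) == 9.0
```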
The power consumption was measured for all 17 TPC-H queries, running each of them 10 times. For the measurement, the Cost Control 3000, a product from Base Tech, was used, which measures not only power consumption but also the corresponding energy costs. The power consumption indicated is not that of the drives alone but includes the system on which they run; the variations are nevertheless acceptable and largely due to the moving parts of the Seagate hard disk.
5.4.2 Results
The results are displayed in Figure 28.
Figure 28: Comparison of energy efficiency

These values are indirectly influenced by the total time taken by each drive to execute all of the TPC-H queries. The Seagate drive consumes approximately 6 times more power than the solid state drives under test. This is mainly due to the moving parts of the Seagate drive; especially during random reads, the head has to be moved repeatedly. In the case of SSDs, no such mechanical movement is involved.
As said earlier, no specific conclusion can be drawn from this alone, but the figure shows the overall amount of power that can be saved by replacing an HDD with a solid state drive performing the same task.
During the transfer test, certain parameters have to be set according to the requirements:

Test speed/accuracy: the full test reads or writes every sector on the disk. This gives the most accurate results, but the test time is very long. With the partial test, the transfer speed is sampled across the disk surface; the test time and accuracy can be traded off by moving a slider.

Block size: the block size used during the transfer rate test. Lower values may give lower test results; the default and recommended value is 64 KB.

Access time: the average access time is measured and displayed in milliseconds (ms).

Burst rate: the burst rate is the highest speed (in megabytes per second) at which data can be transferred from the drive interface (IDE, SATA, USB) to the operating system.
CPU usage: the CPU usage shows how much CPU time (in %) the system needs to read data from the hard disk.

As seen in section 5.3.3, the time taken by the Seagate hard disk to run the queries was quite high in comparison to the Intel X25-E and Crucial Real C300. This variation could be due to various factors such as access time and average transfer rate. In section 5.5.1, these factors are examined by running HD Tune on all three drives.
5.5.2 Results
Transfer rate:
Figure 29: Read speed comparison

As shown in Figure 29, the sequential read speed of the Seagate hard disk is low compared to the solid state drives: the Seagate HDD is nearly 4 times slower than the Intel X25-E and 5 times slower than the Crucial C300. This is mainly because of its large access time.
Access time
[Figure 30: Access time comparison in ms; the measured access time of the Seagate HDD is about 15.7 ms]

This access time difference alone shows how valuable solid state drives can be in high-speed real-time applications: the SSDs access data nearly 150 times faster than the HDD.
5.6 Summary
From the benchmarks, the results obtained fit many of the observations made in section 3.1.3. Magnetic disks showed an overall low performance on read operations, due to seek time and rotational delay. On average, the SSDs are 8 to 10 times faster for reads and 150 times faster in access speed than the hard disk drive. The SSDs showed high performance on read operations, with an even higher degree of performance on random reads in the TPC-H benchmark. Most likely this can be attributed to the fact that an SSD consists of multiple flash memory chips connected in parallel, as discussed in section 4.3; much depends on the FTL, since each channel can handle requests in parallel. The overall results indicate that solid state drives outperform the HDD, but on closer observation a considerable performance difference can be seen between the two solid state drives: the Crucial Real C300 SSD outperforms the Intel X25-E SSD in most of the benchmark tests conducted above. The difference could be due to various factors such as flash type, controller, system architecture, or cache buffer. The coming chapters analyze this performance difference.
Chapter 6
The TPC-H benchmark with a scale factor of 10 was initially run on all drives with 2 GB of system RAM. The benchmark was then repeated on the Seagate HDD with 8 GB and with 12 GB of RAM.
CPU: Intel Xeon Processor 5600
Main board: Intel 5520
OS: Microsoft Windows 7 Professional x64
Memory:
- 2 GB, DDR3-1333 SDRAM (Kingston)
- 4 GB, DDR3-1333 SDRAM (Kingston)
- 8 GB, DDR3-1333 SDRAM (Micron)
The TPC-H queries were run 115 times, but when calculating the average query execution time the first 15 results were excluded, to ensure that the processor was occupied only with running the application query.
6.2 Results
To match the performance of the SSDs, the system was upgraded with more RAM in steps.
Figure 31: Performance comparison between the HDD with 12 GB system RAM and the SSDs with 2 GB system RAM
Analysis of Figure 31 indicates that, except for queries 1, 3, 5, 12 and 16, the performance of the SSDs with 2 GB RAM cannot be matched even by increasing the RAM to 12 GB in the system
with the Seagate HDD. Increasing the RAM beyond 12 GB to attain better performance is not productive in the current scenario: since databases are normally huge, large amounts of RAM would have to be added continually as the database grows in order to sustain the performance. Considering overall query performance, the SSDs provide better results. The comparison of HDD performance with varying amounts of RAM is shown in Figure 32. The performance appears to saturate irrespective of the increase in RAM, except for queries 1, 2, 8, 10, 12 and 15. Increasing RAM beyond the actual database size to attain better performance is not productive.
Figure 32: Performance comparison between the HDD with 2 GB, 8 GB, and 12 GB system RAM
6.3 Conclusion
Database (server) applications benefit greatly from random disk access speed; this is why servers have large DRAM footprints used as disk cache. Solid state drives, however, provide significantly better performance in random reads, and the test results indicate that they perform better when dealing with large data sets. For server applications, it is therefore worth investing in solid state drives rather than in RAM; solid state drives are the better option for performance enhancement. Still, a definitive decision cannot be based on the above results alone, as the scenario cannot be generalized: for applications where random reads and writes are rare compared to sequential ones, an HDD with more RAM is the better buy, saving a significant part of the investment. Therefore
one has to be really careful about where SSDs are used; otherwise it is very difficult to justify their additional cost.
Chapter 7
7 . Reverse engineering
As seen in section 5.3.3, the Crucial Real C300 performed better than the Intel X25-E. This chapter looks deeply into the system-level structure of the two solid state drives and tries to analyze the factors behind the difference in performance. To this end, a reverse engineering process was carried out on the Intel X25-E and the Crucial Real C300. Reverse engineering is the process of discovering the technological principles of a man-made device, object or system through analysis of its structure, function and operation. It often involves taking something (a mechanical device, electronic component, or software program) apart and analyzing its workings in detail, either for maintenance or to make a new device or program that does the same thing without using or simply duplicating (without understanding) any part of the original. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage; the purpose is to deduce design decisions from end products with little or no additional knowledge about the procedures involved in the original production. The basic building blocks of a solid state drive are the flash chip array, the host interface and the controller chip, which holds the other two together and manages the entire system. The Intel X25-E and Crucial Real C300 are analyzed in terms of these building blocks.
Native Command Queuing (NCQ): NCQ was originally designed to compensate for the rotational latency inherent in mechanical hard drives, but here it is used in reverse. It exploits the ability of a SATA drive to queue and reorder commands to maximize execution efficiency, since a little time passes (time is of course relative when talking about an SSD whose access latency is measured in microseconds) between the moment the system completes one request and issues the next.
Figure 33: Intel X25 Extreme SSD

The Intel X25-E is compatible with SATA at 1.5 Gbps and 3 Gbps. The flash packages, of course, are only the building blocks of an SSD; much of the magic comes from the architecture and optimizations of the SSD controller logic.
Figure 34: Controller from Marvell on the Intel X25-E SSD board

The controller is a ball grid array (BGA) package with single-row wire bonding; the bonding wires are made of gold. Although at first glance the controller chip appears to be from Intel, the markings on the die indicate that it is from Marvell. The analog and digital sections of the controller die are clearly distinguishable, and the orientation of the die on the Intel X25-E clearly reveals the SATA controller, DRAM controller and flash controller sections. The specifications of the controller are given in Table 7-1, Table 7-2 and Table 7-3.
Figure 35: Crucial Real C300 SSD

To wring more than 300 MB/s out of mechanical hard drives, several of them have to be combined in a RAID. Solid state drive makers face the same challenge: individual flash chips do not necessarily offer sequential throughput superior to traditional hard drives, so an SSD seeking to maximize performance must distribute the load across numerous chips tied to multiple memory channels, effectively creating a multi-channel array within the confines of a single drive. The Crucial Real SSD inherits its 6 Gbps Serial ATA support from Marvell's 88SS9174 flash controller, which supports the TRIM command set. TRIM works in conjunction with Marvell's
garbage collection routine, which runs in the background to reclaim flash pages marked as available by the command. How frequently garbage collection runs depends on how the drive is used and how much free capacity it has available. With eight memory channels, the Marvell controller is two short of the ten channels Intel squeezed into its X25-E SSD. Crucial claims the C300 can sustain a sequential read rate of 355 MB/s when connected to a 6 Gbps SATA interface; its sequential read performance purportedly drops to 265 MB/s over a 3 Gbps link. Flipping the C300's circuit board reveals a DDR memory chip that serves as the drive's cache: a 128 MB Micron DDR3 DRAM module offering decent cache performance for fast transaction buffering, which will become more important as SATA III 6.0 Gbps transfers become common.
Figure 36: Controller from Marvell on Crucial Real C300 SSD board
Unlike the Intel X25-E controller, the Crucial Real C300 controller die did not give a clear picture. However, from the orientation of the controller die on the board, the SATA and cache interconnections could be visualized. A closer look suggested the controller chip was a ball grid array (BGA) package with wire bonding. In contrast to the single-row bonding on the Intel X25-E controller die, the C300's Marvell controller used three rows of bond pads, except for the SATA section, where a single row was used. The pads for wire bonding were neatly arranged in multiple rows to shrink the die size.
The surface of the die looked like an FPGA, but a detailed analysis suggested it was a mesh used by Marvell to hinder its competitors from copying the design. A much deeper analysis would have been of interest, but due to certain limitations the analysis was concluded at this stage.
7.3 Summary
The reverse engineering analysis of the controllers from the Intel X25-E and Crucial C300 is summarized in Table 7-1. With the same package technology and a higher ball count, the die of the Crucial Real C300 SSD controller is comparatively small. Although both SSDs use a controller from Marvell, the Crucial C300 uses the latest release, which includes improved firmware features.
The advanced flash interface technology in the Crucial Real C300 is paired with a SATA III (6Gbps) host interface, while the Intel X25-E supports only a SATA II interface.
Table 7-3: Interface compatibility of the Intel X25-E and Crucial Real C300 SSDs
7.4 Conclusion
The latest Marvell SSD controller and the advanced flash chip interface standards found in the Crucial Real C300 give it higher bandwidth, and, backed by a hefty DDR3 buffer, enable it to meet the demand for faster data rates over SATA 6Gbps. These features allow the Crucial Real C300 SSD to outperform the Intel X25-E SSD. The controller is the main part that ties together the interfaces of the surrounding units to produce extended performance, bearing the bottlenecks of the individual parts. Overall NAND performance is an important factor at a time when faster speeds are a critical design factor for solid state drives, especially as the interfaces these SSDs connect to offer faster data rates with Serial ATA 6Gbps, USB 3.0, and PCI Express Gen2.
*Note: In our tests, a common SATA 3Gbps link was used for testing both SSDs.
Chapter 8
The criteria for selecting an SSD controller fall into three groups:
Programmatic: cost, schedule, support, warranty, and availability.
Technical: performance, power, package options, features, scalability, and flexibility.
Other: commonality, compatibility, documentation, development support, testing, and reputation.
In the process of controller selection, the system designer also performs the same analysis for the flash parts and the other parts needed in the design. It is an iterative process to find the right combination of components that best meets the requirements of the particular product. Due to proprietary concerns, not all controller design data is available to the general public over the Internet. There is, however, a significant amount of application detail that can be learned about each of the SSD controllers on the market by studying their use in existing SSDs. In order to meet the known performance specifications of current interface technologies and put them into economic perspective, this section performs a package-level cost-effectiveness analysis of controllers by varying the system-level architecture of the SSD, with SSD controller price per performance as our metric of choice.
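The price-per-performance metric used in this chapter can be sketched as dollars per MB/s of sustained bandwidth. The controller names, prices, and bandwidth figures below are invented placeholders, not measured data:

```python
# Minimal sketch of a price-per-performance comparison between
# candidate SSD controllers. All entries are hypothetical placeholders.

controllers = {
    "controller_a": {"price_usd": 12.0, "bandwidth_mb_s": 250},
    "controller_b": {"price_usd": 9.0,  "bandwidth_mb_s": 150},
}

def price_per_performance(spec):
    """Dollars per MB/s: lower means more cost-effective."""
    return spec["price_usd"] / spec["bandwidth_mb_s"]

# Rank candidates from most to least cost-effective.
ranked = sorted(controllers,
                key=lambda name: price_per_performance(controllers[name]))
print(ranked[0])  # controller_a (0.048 $/MBps beats 0.060 $/MBps)
```

The more expensive part can still win on this metric when its bandwidth advantage is large enough, which is exactly the trade-off the iterative selection process explores.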
8.1 Cost estimation of controller for a system designed to meet performance specification
8.1.1 MATLAB GUIDE
GUIDE, the MATLAB graphical user interface development environment, provides a set of tools for creating graphical user interfaces (GUIs). These tools simplify the process of laying out and programming GUIs.
Using the GUIDE Layout Editor, the user can populate a GUI by clicking and dragging GUI components (such as axes, panels, buttons, text fields, sliders, and so on) into the layout area. The user can also create menus and context menus for the GUI. From the Layout Editor, the user can size the GUI, modify component look and feel, align components, set the tab order, view a hierarchical list of the component objects, and set GUI options. A tool was created using MATLAB GUIDE in which the GUI presents options for the user to design his own system. The tool determines the controller size and its cost for the designed system. It is a system-level optimization tool, designed to optimize the different interfaces in a solid state storage system to obtain the best performance and cost for a desired system interface. It determines the controller that meets the performance of the system interface by varying the quantity of the other selected integral parts of the SSD.
Buffer Cache Interface
In setting the buffer cache interface for the designed system, its value is assigned so that it performs at 4 times the maximum performance of the selected system/host interface.
Flash Interface
The flash interface is the main part of the solid state drive through which the performance of the designed system is controlled. The performance is controlled by varying the number of flash channels, the number of flash chips per channel, and the channel width. The tool provides two options, Single Level Cell and Multi Level Cell, to choose between while designing the system. Note: flash read performance per chip per channel is treated as a variable, as it depends on the manufacturer.
Controller Size Factors
When calculating the controller size resulting from the designed system, the different interfaces to be handled and their resulting signal pins are considered. The table below lists the signal pins considered for each interface on the controller.
Interface      Signals considered
Flash chips    Control signals: chip select, write enable, command latch enable, ready/busy, reset/write protect
DRAM**         Control signals: CK, CK#, CKE, RST#, RAS#, CAS#, WE#, CS#, ODT; Memory address: MA; Bank address: BA; Data signals: DQ, DQS, DQS#, DM
PCI-Express*   Transmitter[+,-]: PET_P[x:0], PET_N[x:0]; Receiver[+,-]: PER_P[x:0], PER_N[x:0]; Clock: REF_CLK_P, REF_CLK_N
SATA           Transmitter, Receiver, CLK
UART           CLK
The power-to-ground pin ratio for the signal pins is set as a variable.
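The ball-count estimate the tool performs can be sketched as follows. The 170-signal split is a hypothetical figure, chosen so that a ratio of 1 (every 2 I/O pins requiring 1 Vdd and 1 Vcc, as in the worked example later in this chapter) reproduces a 340-ball package:

```python
# Sketch of estimating a controller's package ball count from its
# signal-pin total plus supply pins derived from the power/ground ratio.
# The 170-signal figure is a hypothetical assumption for illustration.

def total_balls(signal_pins, supply_per_signal=1.0):
    """Package balls = signal pins + supply (Vdd/Vcc) pins.
    With a ratio of 1, every 2 I/O pins add 1 Vdd and 1 Vcc,
    i.e. supply pins equal signal pins."""
    supply = int(signal_pins * supply_per_signal)
    return signal_pins + supply

print(total_balls(170))       # 340 balls at ratio 1
print(total_balls(100, 0.5))  # 150 balls at a looser ratio
```

The ratio is left as a variable because supply-pin budgets differ between package families and signaling standards.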
Controller Cost Factors
The tool calculates the controller cost for two package types: flip-chip BGA (FCBGA) and wire-bonded BGA. The packaging cost for each is calculated as a function of variables such as die cost, number of I/Os, wafer-level die yield, and assembly process yield. The cost of the designed controller is calculated in two parts: the cost of the die and the cost of the package. The size of the die depends on the number of I/O pads, the pad pitch, and their arrangement.
Package type     Pad pitch (microns)   Bond pad configuration
FCBGA            150                   Area array pads
Wire-bonded BGA  80                    Peripheral pads
The gross dies per wafer (DPW) can be estimated from the wafer diameter d (mm) and the die size S (mm2) by the standard expression [18]:

DPW = pi * (d/2)^2 / S - pi * d / sqrt(2 * S)

The first term is the wafer area divided by the die area; the second term approximates the dies lost along the circular wafer edge. The cost of the die is then given by

Cost per Die ($) = Wafer cost ($) / (DPW x die yield)

The cost of each package depends on the cost of each process step indicated in the figure below.
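These two expressions can be sketched in Python; the 300 mm wafer, 100 mm2 die, $5000 wafer cost, and 80% yield below are illustrative assumptions, not values taken from the tool:

```python
import math

# Standard gross dies-per-wafer and cost-per-die expressions
# (as given in Patterson & Hennessy [18]). Input values are
# illustrative assumptions for the sketch.

def gross_dies_per_wafer(d_mm, die_mm2):
    """pi*(d/2)^2 / S covers the wafer area; the subtracted term
    approximates dies lost along the circular wafer edge."""
    return int(math.pi * (d_mm / 2) ** 2 / die_mm2
               - math.pi * d_mm / math.sqrt(2 * die_mm2))

def cost_per_die(wafer_cost, d_mm, die_mm2, die_yield):
    return wafer_cost / (gross_dies_per_wafer(d_mm, die_mm2) * die_yield)

print(gross_dies_per_wafer(300, 100))                  # 640 gross dies
print(round(cost_per_die(5000.0, 300, 100, 0.8), 2))   # 9.77 $ per die
```

Note how strongly die size drives cost: halving S roughly doubles DPW and typically improves yield as well, which is why the tightly packed bond pads observed in Chapter 7 matter economically.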
Figure 37 : Process flow for flip chip BGA and wire bonded BGA packaging.
Figure 39: Warning that the system is overdesigned or underdesigned with respect to the specified performance.
8.2.3 Limitations
The tool offers only a limited number of package options when calculating the cost. The performance values indicated in the tool during design are all theoretical.
Figure 41 : Controller size for the system with SATA 2.0 interface
Figure 42 : Controller size for the system with SATA 3.0 interface
The values from the tool indicate that the controller sizes are comparable. The difference in values could be due to various factors such as the signal-to-power pin ratio; these values are intended only for comparison. The use of the tool for optimal system design and controller cost estimation is illustrated in Appendix B.
8.4 Hints to use tool for optimal system design and controller cost estimation:
1. Select the desired host interface to set the performance specification for the system to be designed.
2. Fill in inputs such as the flash chip performance and the power/ground-to-I/O pin ratio, and select the desired cache type.
3. While designing the system, watch for the warning message indicating over-design or under-design of the system. This helps the user design an optimal system architecture for the performance specified in step 1.
4. Press the Calculate button to see the total number of pins (balls) on the controller chip for the designed system.
5. To find the cost of the controller for the designed system, press Die Cost ($) at the bottom left of the screen. Select the desired node technology, then the desired package type, and finally press the Chip cost button to view the cost.
Chapter 9
9 . Summary
9.1 Conclusion
In conclusion, the common-sense intuition that flash-based Solid State Drives (SSDs) provide superior performance for large read I/O is validated. As studied, SSDs are on average several times faster for reads, and dramatically faster in access speed, compared to hard disk drives. Solid state drives are also comparatively more efficient in power consumption. Heavily used transactional databases with an intense random I/O workload benefit the most from SSD technology, which additionally helps to negate disk configuration issues. With HDD devices, the way the database structure is laid out, the number of spindles, and so on are critical; for SSD-based systems, it does not matter how the data is laid out, or whether column- or row-oriented storage is used, as every part of the data space ultimately delivers the same performance.
Although it may be considered application specific, the test results suggest that, in terms of performance boost, investing in solid state drives is better than investing in additional Random Access Memory.
The system-level optimization tool simulates scenarios that help in studying a solid state storage system and exploring the trade-offs that enhance its performance, in order to save cost when the system is implemented in practice. The tool is very effective for designing a solid state storage system architecture with the best performance for a desired system interface. A detailed analysis of the factors considered in the tool helps guide the design decision and clarifies the effect of each variable on the cost of the controller.
Despite the tremendous hype, and although the potential of SSDs and the rate at which manufacturers are improving this technology are truly exciting, making the drives increasingly attractive, seeing them dominate the market is still some way from reality at this point.
Appendix A
Building TPC-H benchmark
The TPC Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. The benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The TPC-H benchmark is an embedded SQL database application, which connects to a database and executes embedded SQL statements; embedded SQL statements are embedded within a host-language application. To build the TPC-H benchmark, TPC provides a set of tools, namely a database population generator (DBGEN) and a query template translator (QGEN). With IBM DB2 as the platform, DBGEN provides the data for the database and QGEN provides the SQL queries. Using these, the TPC-H benchmark application can be created. This is done in two parts: creating the TPC-H database, and creating a package binding the query source file to the database.
DBGEN generates 8 separate ASCII files, each containing pipe-delimited data. Create 8 tables under a database schema named TPC-H and import each of the ASCII files into the corresponding table defined in the TPC-H schema.
Assign keys by altering the tables in the TPC-H database as per the TPC-H specification [13].
TPC-H application package
The source file is created by embedding the TPC-H queries/SQL statements in the 'C' programming language. To run applications written in compiled host languages, you must create the packages needed by the database manager at execution time. Figure 43 shows the order of these steps, along with the various modules of a typical compiled DB2 application [4]: 1. Create source files that contain programs with TPC-H queries. 2. Connect to the TPC-H database generated using DBGEN, then precompile each source
file to convert the embedded SQL statements into a form the database manager can use. [The precompiler converts embedded SQL statements directly into DB2 run-time services API calls. When the precompiler processes a source file, it looks specifically for SQL statements and skips the non-SQL host language. PRECOMPILE (PREP) is a process that modifies source files containing embedded SQL statements (*.sqc) and yields host-language source file(s) (*.c) and a package. It is at precompile time that the TIMESTAMP, also known as the UNIQUE ID or CONSISTENCY TOKEN, is generated and associated with the package through the bind file and the modified source code.] 3. Compile the modified source files (*.c) using the host language compiler. 4. Link the object files with the DB2 and host-language libraries to produce an executable program. Compiling and linking (steps 3 and 4) create the required object modules. 5. The BIND command invokes the bind utility. It prepares the SQL statements stored in the bind file generated by the precompiler and creates a package that is stored in the database. Bind the bind file to create the package, or bind again if a different database is going to be accessed. Binding creates the package used by the database manager when the program is run. 6. Run the TPC-H benchmark application. The application accesses the TPC-H database using the access plans.
Appendix B
System level optimization tool
The system-level optimization tool, designed to optimize the different interfaces in a solid state storage system to obtain the best performance and cost for a desired system interface, is illustrated here. The different interfaces to the controller influencing the performance of Solid State Drives are:
Host Interface
Flash Interface: number of channels, channel width, flash chip read performance
Buffer cache Interface: cache type, cache standard, I/O channel width, number of channels, cache size
The tool is operated in two sections:
Step 1: Select the desired Host Interface, the critical factor on which the system design is based. The system is designed to match the maximum performance offered by the selected Host Interface. Here SATA 2.0 is selected for illustration.
Step 2: Enter all the parameters needed to design the system and calculate its performance, such as the flash chip read performance and the signal-to-power pin ratio.
Step 3: Select the cache type and its standard to vary the number of cache channels suitably. The buffer cache channels are selected to ensure that the buffer cache performs 4 times faster than the selected system interface. Here DDR2 is selected, and the flash chip read performance is entered as 25ns (nanoseconds).
Step 4: The Signal-pins button calculates the designed system's performance along with the number of balls on the designed controller. In this case the tool shows a system warning, as the system is under-designed for the desired system interface. This can be seen by comparing
the desired and designed system performance, which are 300MBps and 80MBps respectively. The Vcc-Vdd/IO pin ratio is set to 1 (every 2 I/O pins require 1 Vdd and 1 Vcc).
Step 5: To increase the performance, the number of parallel flash channels is increased, in this case from 2 to 4 channels.
Step 6: The system warning indicates that the system is still under-designed, so either the number of flash channels or the channel width should be increased to attain better performance. In this case the channel width is increased from 8 bits to 16 bits.
Step 7: When the designed system performance matches the desired system performance within +/- 10 percent, the warning stops, indicating that the system design is optimal with respect to the selected Host Interface. The designed system is optimal in performance, and the resulting controller has 340 pins (balls).
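The walkthrough above can be reproduced with a minimal sketch, assuming the tool models each flash channel as transferring one channel-width word per 25 ns read cycle and that channels scale ideally (MB = 10^6 bytes):

```python
# Sketch of the designed-system performance arithmetic from the
# walkthrough: channels x bytes-per-cycle / cycle time. The per-cycle
# transfer model is an assumption about how the tool computes this.

def designed_mb_s(channels, width_bits, cycle_ns):
    bytes_per_cycle = width_bits / 8
    return channels * bytes_per_cycle / (cycle_ns * 1e-9) / 1e6

print(designed_mb_s(2, 8, 25))   # 80.0  -> under-designed vs 300 MB/s
print(designed_mb_s(4, 16, 25))  # 320.0 -> within +/-10% of 300 MB/s

def within_tolerance(designed, desired, tol=0.10):
    """The +/-10 percent acceptance band used by the tool's warning."""
    return abs(designed - desired) <= tol * desired

print(within_tolerance(320, 300))  # True: warning stops
```

Under these assumptions the numbers line up with the walkthrough: doubling the channel count and doubling the channel width each double the designed bandwidth, taking 80MBps to 320MBps.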
Step 8: If the designed system exceeds the maximum performance of the host interface, the tool warns of an overdesigned system. The system should then be altered by comparing the host interface performance with the designed system performance. Step 9: The designed system capacity can be increased via the Flash Chips/Channel menu and also by selecting the number of dies per flash chip.
Section 2: Cost calculation of the controller for the designed Solid State Drive
Continuing the previous example, the designed system has 340 pins (balls), as seen in Section 1, Step 7. This section of the tool calculates the cost of the die based on the resulting number of pins along with the selected parameters, such as node technology, wafer diameter, and package type. Considering similar packages available from Texas Instruments, an estimate can be made of the cost of the controller for the designed system.
The cost of the controller chip for the designed system is approximately $9.89.
Bibliography
[1] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti. Introduction to flash memory. Proceedings of the IEEE, 91(4):489-502, April 2003.
[2] Intel X25-E Extreme SATA Solid-State Drive. http://download.intel.com/design/flash/nand/extreme/319984.pdf
[3] http://onfi.org/specifications/
[4] IBM DB2 Guide, IBM public library. http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.embed.doc/doc/c0021136.html
[5] Cagdas Dirik and Bruce Jacob. The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization. In ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture, pages 279-289, New York, NY, USA, 2009. ACM.
[6] Nitin Agrawal and Vijayan Prabhakaran. Design tradeoffs for SSD performance. www.usenix.org/event/usenix08/tech/full_papers/agrawal/agrawal.pdf
[7] David Roberts, Taeho Kgil, and Trevor Mudge. Integrating NAND flash devices onto servers. Commun. ACM, 52(4):98-103, 2009.
[9] Super Talent Technology. SLC vs. MLC: An Analysis of Flash Memory. http://www.supertalent.com/datasheets/SLC_vs_MLCwhitepaper.pdf
[10] Seiichi Sugaya. Trends in Enterprise Hard Disk Drives (June 30, 2005). http://www.fujitsu.com/downloads/MAG/vol42-1/paper08.pdf
[11] Intel Corporation. Understanding the Flash Translation Layer (FTL) Specification. http://www.embeddedfreebsd.org/Documents/Intel-FTL.pdf, 1998.
[12] Imation. Solid State Drives - Data Reliability and Lifetime. http://www.imation.com/PageFiles/83/SSD-Reliability-Lifetime-White-Paper.pdf
[13] TPC Benchmark H. www.tpc.org/tpch/spec/tpch2.1.0.pdf
[14] HD Tune Pro manual, hdtunepro.pdf
[15] Tom's Hardware. Flash SSD Update: More Results, Answers. http://www.tomshardware.com/reviews/ssd-hard-drive,1968-4.html
[16] Intel X25-E Extreme SATA Solid-State Drive. http://download.intel.com/design/flash/nand/extreme/319984.pdf
[17] RealSSD C300 2.5 Technical Specifications, Crucial. www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf
[18] David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface (pages 450-475).
[19] en.wikipedia.org/wiki/computer_data_storage
[20] Intel: Disk Interface Technology, Quick Reference Guide, NP2108.pdf 1040211
[21] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. http://www.usenix.org/event/fast10/tech/full_papers/boboila.pdf
[22] Intel High Performance Solid State Drive - Frequently Asked Questions. http://www.intel.com/support/ssdc/hpssd/sb/CS-029623.htm#5
[23] http://en.wikipedia.org/wiki/Benchmark_(computing)
[24] Barracuda 7200.10. www.seagate.com/docs/pdf/datasheet/ds_7200_10.pdf
[25] en.wikipedia.org
[26] http://onfi.org/wp-content/uploads/2011/03/20100818_S104_Grunzke.pdf
[27] http://www.novopc.com/2008/09/hard-disk/
[28] http://www.easy-computer-tech.com
[29] http://www.ramsan.com/resources/SSDOverview
[30] http://www.datarecoverytools.co.uk
[31] www.bit-tech.net
[32] http://tjliu.myweb.hinet.net
Index
A
Addressing, 22
Advanced Technology Attachment, 25

B
ball grid array package (BGA), 65
Benchmark, 49

C
cache, 12, 14, 23, 24, 25, 29, 37, 41, 54, 58, 63, 67
Cell degradation, 38
Controller, 40
Crucial Real C300, 65

D
Disk access time, 21

E
Erase Block, 38

F
FLASH MEMORY, 34
Flash Structure, 37
Flash Translation Layer, 40

G
Garbage collection, 42

H
Hard Disk Drives, 19
HD Tune, 56

I
Intel X25-Extreme, 63

M
Marvell, 65, 66, 67, 68, 69
MATLAB GUIDE, 71
Memory, 11
MLC, 35

O
Offline Storage, 16

P
Page, 37
PCI Express, 44
Primary storage, 14
Processor cache, 14

R
RAM, 12
Reverse engineering, 63
ROM, 11
Rotational latency, 21

S
Secondary Storage, 16
Seek time, 21
Serial Advanced Technology Attachment, 27
SLC, 35
Small Computer System Interface, 26
Solid State Drives, 32
Storage Hierarchy, 13
System Architecture, 9

T
Tertiary Storage, 16
TPC-H, 50, 51, 52, 60, 81, 82, 83
Trim, 43

W
Wear-leveling, 42
Write Amplification, 43