

CHAPTER I
INTRODUCTION

1.1 INTRODUCTION ABOUT THE PROJECT

Nanotechnology provides smaller, faster, and lower energy devices which allow more powerful and compact circuitry; however, these benefits come with a cost: the nanoscale devices may be less reliable. Thermal- and shot-noise estimations alone suggest that the transient fault rate of an individual nanoscale device (e.g., transistor or nanowire) may be orders of magnitude higher than today's devices. As a result, we can expect combinational logic to be susceptible to transient faults in addition to storage cells and communication channels. Therefore, the paradigm of protecting only memory cells and assuming the surrounding circuitry (i.e., encoder and decoder) will never introduce errors is no longer valid. In this paper, we introduce a fault-tolerant nanoscale memory architecture which tolerates transient faults both in the storage unit and in the supporting logic (i.e., encoder, decoder (corrector), and detector circuitry). In particular, this involves identifying a class of error-correcting codes (ECCs) that guarantees the existence of a simple fault-tolerant detector design. This class satisfies a new, restricted definition for ECCs which guarantees that the ECC codeword has an appropriate redundancy structure such that it can detect multiple errors occurring in both the stored codeword in memory and the surrounding circuitry. We call this type of error-correcting code a fault-secure detector capable ECC (FSD-ECC). The parity-check matrix of an FSD-ECC has a particular structure such that the decoder circuit generated from it is fault-secure. The ECCs we identify

in this class are close to optimal in rate and distance, suggesting we can achieve this property without sacrificing traditional ECC metrics. We use the fault-secure detection unit to design a fault-tolerant encoder and corrector by monitoring their outputs. If a detector detects an error in either of these units, that unit must repeat the operation to generate the correct output vector. Using this retry technique, we can correct potential transient errors in the encoder and corrector outputs and provide a fully fault-tolerant memory system. The novel contributions of this paper include the following:
1. a mathematical definition of ECCs which have a simple fault-secure detector (FSD) and do not require the addition of further redundancy in order to achieve the fault-secure property;
2. identification and proof that an existing LDPC code (EG-LDPC) has the FSD property;
3. a detailed ECC encoder, decoder, and corrector design that can be built out of fault-prone circuits when protected by this fault-secure detector, which is also implemented in fault-prone circuits and guarded with a simple OR gate built out of reliable circuitry;

4. to further show the practical viability of these codes, the engineering design of a nanoscale memory system based on these encoders and decoders, including memory banking strategies, scrubbing and reliability analysis, and a unified ECC scheme for both permanent memory bit defects and transient upsets.

This allows us to report the area, performance, and reliability achieved for systems based on these encoders and decoders.

1.2 LITERATURE SURVEY

H. Naeimi and A. DeHon, Fault secure encoder and decoder for memory applications, in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007.

Proposed the concept of a nanowire-based, sub-lithographic memory architecture tolerant to transient faults. Both the storage elements and the supporting ECC encoder and corrector are implemented in dense, but potentially unreliable, nanowire-based technology. This compactness is made possible by a recently introduced fault-secure detector design [18]. Using Euclidean Geometry error-correcting codes (ECCs), particular codes are identified which correct up to 8 errors in data words, achieving a FIT rate at or below one for the entire memory system for bit and nanowire transient failure rates as high as 10^-17 upsets/device/cycle, with a total area below 1.7 times the area of the unprotected memory for memories as small as 0.1 Gbit. Scrubbing designs are explored, and it is shown that the overhead for serial error correction and periodic data scrubbing can be below 0.02% for fault rates as high as 10^-20 upsets/device/cycle. A design is presented to unify the error-correction coding and circuitry used for permanent defect and transient fault tolerance.

M. Davey and D. J. MacKay, Low density parity check codes over GF(q), IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.

Proposed the conventional model in which memory cells are the only circuitry susceptible to transient faults, and all the supporting circuitry around the memory (i.e., encoders and decoders) is assumed to be fault-free. As a result, most prior designs for fault-tolerant memory systems focused on protecting only the memory cells. However, as we continue scaling down feature sizes or use sub-lithographic devices, the surrounding circuitry of the memory system will also be susceptible to permanent defects and transient faults.

S. J. Piestrak, A. Dandache, and F. Monteiro, Designing fault-secure parallel encoders for systematic linear error correcting codes, IEEE Trans. Reliab., vol. 52, Jun. 2003.

Proposed a scheme that uses redundancy to build a fault-tolerant encoder; it develops a fault-secure encoder unit using a concurrent parity prediction scheme. Like the general parity-prediction technique, it concurrently generates (predicts) the parity bits of the encoder outputs (encoded bits) from the encoder inputs (information bits). The predicted parity bits are then compared against the actual parity function of the encoder output (encoded bits) to check the correctness of the encoder unit. The parity predictor circuit implementation is further optimized for each ECC to make a more compact design. For this reason, efficient parity prediction designs are tailored to a specific code. Simple parity prediction guarantees single error detection; however, no generalization is given for detecting multiple errors in the detector other than complete replication of the prediction and comparison units.

H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, Codes on finite geometries, IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.

Proposed Euclidean Geometry codes constructed from the lines and points of the corresponding finite geometries. Euclidean Geometry codes

are also called EG-LDPC codes, based on the fact that they are low-density parity-check (LDPC) codes. LDPC codes have a limited number of 1s in each row and column of the matrix; this limit guarantees limited complexity in their associated detectors and correctors, making them fast and lightweight.

D. J. C. MacKay, Good error-correcting codes based on very sparse matrices, IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399-431, Mar. 1999.

Proposed a simple electromechanical memory device in which an iron nanoparticle shuttle is controllably positioned within a hollow nanotube channel. The shuttle can be moved reversibly via an electrical write signal and can be positioned with nanoscale precision. The position of the shuttle can be read out directly via a blind resistance read measurement, allowing application as a nonvolatile memory element with potentially hundreds of memory states per device. The shuttle memory has application for archival storage, with information density as high as 10^12 bits/in^2 and thermodynamic stability in excess of one billion years.

H. Wymeersch, H. Steendam, and M. Moeneclaey, Log-domain decoding of LDPC codes over GF(q), in Proc. IEEE Int. Conf. Commun., Paris, France, Jun. 2004, pp. 772-776.

Proposed a performance and reliability analysis of a scaled crossbar molecular switch memory and demultiplexer. In particular, a multi-switch junction fault tolerance scheme is compared with a banking defect tolerance scheme. Results indicate that delay and power scale linearly with an increasing number of redundant molecular switch junctions. The

multi-switch junction scheme was also shown to achieve greater than 99% reliability for molecular switch junction failure rates of less than 20%, when a redundancy of at least 3 was implemented. In contrast, the banking scheme was only effective for molecular switch junction failure rates of less than 1%, and it requires over three times the number of banking modules.

CHAPTER II
SYSTEM ANALYSIS

2.1 EXISTING METHOD

With the popularity of mobile wireless devices soaring, the wireless communication market continues to see rapid growth. However, with this growth comes a significant challenge. Many applications, such as digital video, need new high data rate wireless communication algorithms. The continuous evolution of these wireless specifications is constantly widening the gap between wireless algorithmic innovation and hardware implementation. In addition, low power consumption is now a critical design issue, since battery life is a key differentiator among consumer mobile devices. The chip designer's most important task is to implement highly complex algorithms in hardware as quickly as possible, while still retaining power efficiency. High Level Synthesis (HLS) methodology has already been widely adopted as the best way to meet this challenge. This article gives an example in which an HLS tool is used, together with architectural innovation, to create a low power LDPC decoder.

HLS methodology allows the hardware design to be completed at a higher level of abstraction, such as a C/C++ algorithmic description. This provides significant time and cost savings, and paves the way for designers to handle complex designs quickly and efficiently, producing results that compare favorably with hand design. HLS tools also offer specific power-saving features, designed to solve the problems of power optimization. In any design, there are huge opportunities for power reduction at both the system and the architecture levels. HLS can make a significant contribution to power reduction at the architecture level, specifically by offering ease of architecture and micro-architecture exploration, ease of frequency and voltage exploration, and the use of high-level power reduction techniques such as multi-level clock gating, which are time-consuming and error-prone when done manually at the RTL level. Power-saving opportunities at the RTL and gate level are limited and have a much smaller impact on the total power consumption.

LOW DENSITY PARITY CHECK CODERS

Forward Error Correction (FEC) coding, a core technology in wireless communications, has already advanced from 2G convolutional/block codes to more powerful 3G Turbo codes. Recently, designers have been looking elsewhere for help with the more complex 4G systems. A Low-Density Parity-Check (LDPC) encoding scheme is an attractive proposition for these systems because of its excellent error correction performance and highly parallel decoding scheme. Nevertheless, it is a major challenge for any designer to create quickly and efficiently a high performance LDPC decoder which also meets the data rate and power consumption constraints of wireless handsets. LDPC decoders vary significantly in their levels of parallelism, which range from fully parallel to partially parallel to fully sequential. A fully parallel

decoder requires a large amount of hardware resources. Moreover, it hardwires the entire parity matrix into hardware, and therefore can only support one particular LDPC code. This makes it impractical to implement in a wireless system-on-a-chip (SoC), because different or multiple LDPC codes might need to be supported eventually. Partially parallel architectures can achieve high-throughput decoding at reduced hardware complexity. However, the level of parallelism in these instances has to be at the sub-circulant (shifted identity matrix) level, which makes them code-specific as well and therefore too inflexible for a wireless SoC.

2.2 PROPOSED METHOD

In this paper a fault-tolerant nanotechnology memory system is presented that tolerates faults in the encoder, corrector, and detector circuitry as well as in the memory itself. Euclidean Geometry codes with a fault-secure detector are used to design this memory system. These particular codes tolerate up to 8 errors in the stored data and up to 16 total errors in memory and correction logic, with an area less than 1.7 times the unprotected memory area. This involves determining an optimum scrubbing interval, banking scheme, and corrector parallelism so that error correction has negligible performance overhead. The design also shows a nanoscale corrector that tolerates permanent crosspoint defects. Nanotechnology provides smaller, faster, and lower energy devices, which allow more powerful and compact circuitry; however, these benefits come with a cost: the nanoscale devices may be less reliable. Thermal- and shot-noise estimations alone suggest that the transient fault rate of an individual nanoscale device (e.g., transistor or

nanowire) may be orders of magnitude higher than today's devices. As a result, we can expect combinational logic to be susceptible to transient faults, not just the storage and communication systems. Therefore, to build fault-tolerant nanoscale systems, we must protect both combinational logic and memory against transient faults. In the present work we introduce a fault-tolerant nanoscale memory architecture which tolerates transient faults both in the storage unit and in the supporting logic (i.e., encoder and decoder (corrector) circuitry). Our proposed system with high fault-tolerant capability is feasible when the following two fundamental properties are satisfied: 1) Any single error in the encoder or corrector circuitry can only corrupt a single codeword digit (i.e., it cannot propagate to multiple codeword digits). 2) There is a fault-secure detector (FSD) circuit which can detect any limited combination of errors in the received codeword or in the detector circuit itself. Property 1 is guaranteed by not sharing logic between the circuitry which produces each bit. The FSD (Property 2) is possible with a more constrained definition for the ECC. Fig. 2.1 shows the memory architecture based on this FSD. There are two FSD units monitoring the output vectors of the encoder and corrector circuitry. If an error is detected at the output of the encoder or corrector units, that unit has to redo the operation to generate the correct output vector. Using this detect-and-repeat technique, potential transient errors in the encoder or corrector output can be corrected, providing a fault-tolerant memory system with fault-tolerant supporting circuitry. The conventional strategy only works as long as we can expect the encoding, decoding, and checking logic to be fault-free, which would prevent the use of nanoscale devices.


It is important to note that transient errors accumulate in the memory words over time. In order to avoid error accumulation that exceeds the code correction capability, the system must scrub memory frequently to remove errors. Memory scrubbing is periodically reading memory words from the memory, correcting any potential errors, and writing the corrected words back into the memory. The frequency of scrubbing must be determined carefully. The scrubbing frequency impacts the throughput in two ways: i) The memory cannot be used on scrubbing cycles, reducing the memory bandwidth available to the application; more frequent scrubbing increases this throughput loss. ii) During normal operation, when an error is detected in a memory word, the system must spend a number of cycles correcting the error; these cycles also take bandwidth away from the application. When scrubbing happens less frequently, more errors accumulate in the memory, and therefore more memory reads require error correction, increasing bandwidth loss.
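These two effects can be combined into a rough first-order model of the throughput cost. The symbols below are illustrative assumptions introduced only for this sketch and are not defined in this report: T_s is the scrubbing period, N_w the number of memory words, t_w the time to scrub one word, p_e(T_s) the probability that a word read during normal operation needs correction, f_r the application read rate, and t_c the time spent on one correction:

fraction of bandwidth lost ≈ (N_w * t_w) / T_s + p_e(T_s) * f_r * t_c

The first term (scrubbing cycles) shrinks as the scrubbing period T_s grows, while the second term (on-demand correction) grows with T_s because p_e increases as errors accumulate; an optimum scrubbing interval balances the two, and scrubbing banks in parallel effectively reduces the N_w * t_w cost per bank.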


Fig 2.1: Fault-tolerant memory architecture with multiple parallel pipelined correctors

The information bits are fed into the encoder to encode the information vector, and the fault-secure detector of the encoder verifies the validity of the encoded vector. If the detector detects any error, the encoding operation must be redone to generate the correct codeword. The codeword is then stored in the memory. During a memory access operation, the stored codewords are read from the memory unit. Codewords are susceptible to transient faults while they are stored in the memory; therefore a corrector unit is designed to correct potential errors in the retrieved codewords.
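A minimal behavioral sketch of the detect-and-retry control described above is given below in VHDL. The entity name, port names, state names, and the single-cycle encode/check timing are illustrative assumptions, not the interfaces of the actual design.

-- Illustrative sketch only: retry control around an encoder whose output is
-- checked by a fault-secure detector. Port and signal names are assumptions.
library ieee;
use ieee.std_logic_1164.all;

entity encode_retry_ctrl is
  port (
    clk, rst     : in  std_logic;
    start        : in  std_logic;   -- new information word available
    detector_err : in  std_logic;   -- '1' when the FSD flags an invalid codeword
    encode_en    : out std_logic;   -- (re)run the encoder this cycle
    write_en     : out std_logic    -- codeword verified, commit it to memory
  );
end entity encode_retry_ctrl;

architecture rtl of encode_retry_ctrl is
  type state_t is (IDLE, ENCODE, CHECK, COMMIT);
  signal state : state_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE   => if start = '1' then state <= ENCODE; end if;
          when ENCODE => state <= CHECK;            -- encoder produces a candidate codeword
          when CHECK  => if detector_err = '1' then
                           state <= ENCODE;         -- transient fault detected: redo encoding
                         else
                           state <= COMMIT;         -- codeword is valid
                         end if;
          when COMMIT => state <= IDLE;
        end case;
      end if;
    end if;
  end process;

  encode_en <= '1' when state = ENCODE else '0';
  write_en  <= '1' when state = COMMIT else '0';
end architecture rtl;

The same monitoring structure is applied to the corrector output on the read path: if the detector flags the corrected word, the correction is simply repeated.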

CHAPTER III


DEVELOPMENT ENVIRONMENT

3.1. HARDWARE ENVIRONMENT

1. WINDOWS XP
2. DUAL CORE PROCESSOR
3. 512 MB SDRAM
4. JTAG CABLE
5. CPLD

3.1.1 INTRODUCTION TO CPLD

A complex programmable logic device (CPLD) is a programmable logic device with complexity between that of PALs and FPGAs, and with architectural features of both. The building block of a CPLD is the macrocell, which contains logic implementing disjunctive expressions and more specialized logic operations. Features in common with PALs:

Non-volatile configuration memory. Unlike many FPGAs, an external configuration ROM isn't required, and the CPLD can function immediately on system start-up.

For many legacy CPLD devices, routing constrains most logic blocks to have input and output signals connected to external pins, reducing opportunities for internal state storage and deeply layered logic. This is usually not a factor for larger CPLDs and newer CPLD product families.


Features in common with FPGAs:

Large number of gates available. CPLDs typically have the equivalent of thousands to tens of thousands of logic gates, allowing implementation of moderately complicated data processing devices. PALs typically have a few hundred gate equivalents at most, while FPGAs typically range from tens of thousands to several million.

Some provisions for logic more flexible than sum-of-product expressions, including complicated feedback paths between macrocells, and specialized logic for implementing various commonly used functions, such as integer arithmetic.

The most noticeable difference between a large CPLD and a small FPGA is the presence of on-chip non-volatile memory in the CPLD. This distinction is rapidly becoming less relevant, as several of the latest FPGA products also offer models with embedded configuration memory. The characteristic of nonvolatility makes the CPLD the device of choice in modern digital designs to perform 'boot loader' functions before handing over control to other devices not having this capability. A good example is where a CPLD is used to load configuration data for an FPGA from non-volatile memory.

CPLDs were an evolutionary step from even smaller devices that preceded them, PLAs (first shipped by Signetics), and PALs. These in turn were


preceded by standard logic products that offered no programmability and were "programmed" by wiring several standard logic chips together.

Because they offer high speeds and a range of capacities, CPLDs are useful for a very wide assortment of applications, from implementing random glue logic to prototyping small gate arrays. One of the most common uses in industry at this time, and a strong reason for the large growth of the CPLD market, is the conversion of designs that consist of multiple SPLDs into a smaller number of CPLDs.

CPLDs can realize reasonably complex designs, such as graphics controllers, LAN controllers, UARTs, cache control, and many others. As a general rule of thumb, circuits that can exploit wide AND/OR gates and do not need a very large number of flip-flops are good candidates for implementation in CPLDs. A significant advantage of CPLDs is that they allow simple design changes through re-programming (all commercial CPLD products are re-programmable). With in-system programmable CPLDs it is even possible to reconfigure hardware (an example might be to change a protocol for a communications circuit) without power-down. Designs often partition naturally into the SPLD-like blocks in a CPLD. The result is more predictable speed performance than would be the case if a design were split into many small pieces and then those pieces were mapped into different areas of the chip. Predictability of circuit implementation is one of the strongest advantages of CPLD architectures.


Commercially Available FPGAs

As one of the largest growing segments of the semiconductor industry, the FPGA marketplace is volatile. As such, the pool of companies involved changes rapidly, and it is somewhat difficult to say which products will be the most significant when the industry reaches a stable state. For this reason, and to provide a more focused discussion, we will not mention all of the FPGA manufacturers that currently exist, but will instead focus on those companies whose products are in widespread use at this time. In describing each device we will list its capacity, nominally in 2-input NAND gates as given by the vendor. Gate count is an especially contentious issue in the FPGA industry, and so the numbers given in this paper for all manufacturers should not be taken too seriously. Wags have taken to calling them dog gates, in reference to the traditional ratio between human and dog years. There are two basic categories of FPGAs on the market today: 1. SRAM-based FPGAs and 2. antifuse-based FPGAs. In the first category, Xilinx and Altera are the leading manufacturers in terms of number of users, with the major competitor being AT&T. For antifuse-based products, Actel, Quicklogic, Cypress, and Xilinx offer competing devices.

3.2 SOFTWARE ENVIRONMENT

SOFTWARE TOOLS:
MODELSIM
XILINX


3.2.1 AN INTRODUCTION ABOUT MODELSIM

ModelSim XE-III is a complete PC HDL simulation environment that enables you to verify the HDL source code and the functional and timing models of your designs. Each of the ModelSim tools includes a complete HDL simulation and debugging environment providing 100% VHDL and Verilog language coverage, a source code viewer/editor, waveform viewer, design structure browser, list window, and a host of other features designed to enhance productivity. ModelSim is an easy-to-use yet versatile VHDL/

(System)Verilog/SystemC simulator by Mentor Graphics. It supports behavioral, register transfer level, and gate-level modeling. ModelSim supports all platforms used here at the Institute of Digital and Computer Systems (i.e., Linux, Solaris and Windows) and many others too. On Linux and Solaris platforms ModelSim can be found preinstalled on the Institute's computers. Windows users, however, must install it by themselves. This tutorial is intended for users with no previous experience with the ModelSim simulator. It introduces the basic flow of setting up the ModelSim simulator and compiling your designs, and covers the simulation basics with ModelSim SE. The example used in this tutorial is a small design written in VHDL, and only the most basic commands will be covered. This tutorial was made using version 6.1b of ModelSim SE on Linux. The example used in this tutorial is a simple design describing an electronic lock that can be unlocked by entering a 4-digit PIN (4169) code from a keypad. When the lock detects the correct input sequence, it sets its output high for one clock cycle as a sign to unlock the door. The figure below shows the state machine of the design. The design also includes one dummy


variable (count_v) which has no practical meaning but is used to demonstrate debug methods in ModelSim. ModelSim eases the process of finding design defects with an intelligently engineered debug environment. The ModelSim debug environment efficiently displays design data for analysis and debug of all languages. ModelSim allows many debug and analysis capabilities to be employed post-simulation on saved results, as well as during live simulation runs. For example, the coverage viewer analyzes and annotates source code with code coverage results, including FSM state and transition, statement, expression, branch, and toggle coverage. Signal values can be annotated in the source window and viewed in the waveform viewer, easing debug navigation with hyperlinked navigation between objects and their declarations and between visited files. Race conditions, delta, and event activity can be analyzed in the list and wave windows. User-defined enumeration values can be easily defined for quicker understanding of simulation results. For improved debug productivity, ModelSim also has graphical and textual dataflow capabilities.

FEATURES

High-performance, high-capacity engine for the fastest regression suite throughput.
Native support of Verilog, VHDL, and SystemC for effective verification of the most sophisticated design environments.
Fast time-to-debug causality tracing and multi-language debug environment.
Advanced code coverage and analysis tools for fast time to coverage closure.


3.2.2 AN INTRODUCTION ABOUT XILINX

Xilinx is a supplier of programmable logic devices. It is known for inventing the field programmable gate array (FPGA) and as the first semiconductor company with a fabless manufacturing model. Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman and Bernard Vonderschmitt, who were both working for integrated circuit and solid-state device manufacturer Zilog Corp. Xilinx designs, develops and markets programmable logic products including integrated circuits (ICs), software design tools, predefined system functions delivered as intellectual property (IP) cores, design services, customer training, field engineering and technical support. Xilinx sells both FPGA and CPLD programmable logic devices for electronic equipment manufacturers in end markets such as communications, industrial, consumer, automotive and data processing. Xilinx's FPGAs have been used for the ALICE (A Large Ion Collider Experiment) at the CERN European laboratory on the French-Swiss border to map and disentangle the trajectories of thousands of subatomic particles. The Virtex-II Pro, Virtex-4, Virtex-5, and Virtex-6 FPGA families are focused on system-on-chip (SoC) designers because they include up to two embedded IBM PowerPC cores. Xilinx FPGAs can run a regular embedded OS (such as Linux or VxWorks) and can implement processor peripherals in programmable logic. Xilinx's IP cores include IP for simple functions (BCD encoders, counters, etc.), domain-specific cores (digital signal processing, FFT and FIR cores), and complex systems (multi-gigabit networking cores, the MicroBlaze soft microprocessor, and the compact PicoBlaze microcontroller). Xilinx also creates custom cores for a fee. The ISE Design Suite is the central electronic


design automation (EDA) product family sold by Xilinx. The ISE Design Suite features include design entry and synthesis supporting Verilog or VHDL, place-and-route (PAR), completed verification and debug using ChipScope Pro tools, and creation of the bit files that are used to configure the chip. Xilinx's Embedded Developer's Kit (EDK) supports the embedded PowerPC 405 and 440 cores (in Virtex-II Pro and some Virtex-4 and -5 chips) and the MicroBlaze core. Xilinx's System Generator for DSP implements DSP designs on Xilinx FPGAs. A freeware version of its EDA software, called ISE WebPACK, is used with some of its non-high-performance chips. Xilinx is the only FPGA vendor (as of 2007) to distribute a native Linux freeware synthesis toolchain. The Spartan series targets applications with a low-power footprint, extreme cost sensitivity and high volume, e.g., displays, set-top boxes, wireless routers and other applications. The Spartan-6 family is built on a 45-nanometer (nm), 9-metal-layer, dual-oxide process technology. The Spartan-6 was marketed in 2009 as a low-cost solution for automotive, wireless communications, flat-panel display and video surveillance applications.

3.2.3 HISTORICAL PERSPECTIVE: VLSI

The electronics industry has achieved phenomenal growth over the last two decades, mainly due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications and consumer electronics has been rising steadily and at a very fast pace. Typically, the required computational power (or, in other words, the intelligence) of these applications is the driving force for the fast development of this field. The current leading-edge technologies (such as low bit-rate video and cellular communications) already provide the end users a


certain amount of processing power and portability. This trend is expected to continue, with very important implications for VLSI and systems design.

As more and more complex functions are required in various data processing and telecommunications devices, the need to integrate these functions in a small system/package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to rapid progress in processing technology and interconnect technology. The evolution of logic complexity in integrated circuits over the last three decades marks the milestone of each era; the numbers for circuit complexity should be interpreted only as representative examples to show the order of magnitude. A logic block can contain ten to a hundred transistors, depending upon the function. The important message here is that the logic complexity per chip has been increasing exponentially. The monolithic integration of a large number of functions on a single chip usually provides:
Less area/volume and therefore compactness.
Less power consumption.
Less testing requirements at the system level.
Higher reliability, mainly due to improved on-chip interconnects.
Higher speed, due to significantly reduced interconnection length.
Significant cost savings.


Therefore, the current trend of integration will also continue in the foreseeable future.

3.2.3.1 VLSI DESIGN FLOW

Fig 3.1: VLSI Design Flow


The design process, at various levels, is usually evolutionary in nature. It starts with a given set of requirements. An initial design is developed and tested against the requirements. When requirements are not met, the design has to be improved. If such improvement is either not possible or too costly, then a revision of the requirements and an analysis of its impact must be considered. The VLSI design flow consists of three major domains, namely:
Behavioral domain
Structural domain
Geometrical layout domain

The design flow starts from the algorithm that describes the behavior of the target chip. The corresponding architecture of the processor is first defined. It is mapped onto the chip surface by floorplanning. The next design evolution in the behavioral domain defines finite state machines (FSMs), which are structurally implemented with functional modules such as registers and arithmetic logic units (ALUs). These modules are then geometrically placed onto the chip surface using CAD tools for automatic module placement, followed by routing, with a goal of minimizing the interconnect area and signal delays. In the third evolution, the behavioral modules are implemented with leaf cells. At this stage the chip is described in terms of logic gates (leaf cells), which can be placed and interconnected by using a cell placement and routing program. The last evolution involves a detailed Boolean description of leaf cells and mask generation. In standard-cell based design, leaf cells are already pre-designed and stored in a library for logic design use.


3.2.4 VHDL: AN OVERVIEW

VHDL is a hardware description language. The word hardware, however, is used in a wide variety of contexts, which range from complete systems like personal computers on one side to the small logic gates on their internal integrated circuits on the other side.

3.2.4.1 USES OF VHDL:

Since VHDL is a standard, chip vendors can easily exchange their circuit designs without depending on proprietary software. The designing process can be greatly simplified, as each component is designed individually and all such components are interconnected to form a full system; hierarchy and timing are always maintained. With simulators available, a circuit can be tested easily and any error found can be rectified without the expense of using a physical prototype, which means that design time and expenditure are greatly reduced. Programs written in either of the HDLs can be easily understood, as they are similar to programs in C or Pascal.

3.2.4.2 FEATURES OF VHDL:

VHDL provides five different types of primary constructs, called design units. They are:

Entity: It consists of a design's interface signals to the external circuitry.
Architecture: It describes a design's behavior and functionality.
Package: It contains frequently used declarations, constants, functions, procedures, user data types and components.

24

Configuration: It binds an entity to an architecture when there are multiple architectures for a single entity.
Library: It consists of all the compiled design units, such as entities, architectures, packages and configurations.
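A minimal, self-contained illustration of these design units is sketched below for a simple 2-input AND function. The names (and2_pkg, and2, rtl, and2_cfg, GATE_DELAY) are invented purely for this illustration; compiling these units places them in a library (by default, the work library), which is the fifth design unit.

-- Minimal illustration of the VHDL design units described above.
library ieee;
use ieee.std_logic_1164.all;

package and2_pkg is                        -- Package: shared declarations
  constant GATE_DELAY : time := 1 ns;
end package and2_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.and2_pkg.all;

entity and2 is                             -- Entity: the design's interface signals
  port (a, b : in  std_logic;
        y    : out std_logic);
end entity and2;

architecture rtl of and2 is                -- Architecture: the design's behavior
begin
  y <= a and b after GATE_DELAY;
end architecture rtl;

configuration and2_cfg of and2 is          -- Configuration: binds the entity to one architecture
  for rtl
  end for;
end configuration and2_cfg;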

3.2.4.3 RANGE OF USE:

The design process always starts with a specification phase. The component which is to be designed is defined with respect to function, size, interfaces, etc. Despite the complexity of the final product, mainly simple methods based on paper and pencil are used most of the time. After that, self-contained modules have to be defined at the system level. Behavior models of standard components can be integrated into the system from libraries of commercial model developers. The overall system can already be simulated. On the logic level, the models that have to be designed are described with all the synthesis aspects in view. As long as only a certain subset of VHDL constructs is used, commercial synthesis programs can derive the Boolean functions from this abstract model description and map them to the elements of an ASIC gate library or the configurable logic blocks of FPGAs. The result is a netlist of the circuit or of the module at the gate level. Finally, the circuit layout for a specific ASIC technology can be created by means of other tools from the netlist description. Every transition to a lower abstraction level must be proven by functional validation. For this purpose, the description is simulated in such a way that for all stimuli (input


signals for the simulation) the module's responses are compared. VHDL is suitable for the design phases from system level to gate level.

3.2.4.4 APPLICATION FIELD:

VHDL is used mainly for the development of Application Specific Integrated Circuits (ASICs). Tools for the automatic transformation of VHDL code into a gate-level netlist were developed at an early point in time. This transformation is called synthesis and is an integral part of current design flows. For use with Field Programmable Gate Arrays (FPGAs), several problems exist. In the first step, Boolean equations are derived from the VHDL description, no matter whether an ASIC or an FPGA is the target technology. But now, this Boolean code has to be partitioned into the configurable logic blocks (CLBs) of the FPGA. This is more difficult than mapping onto an ASIC library. Another big problem is the routing of the CLBs, as the available resources for interconnections are the bottleneck of current FPGAs.

MODELING PROCEDURES USING VHDL

STRUCTURAL STYLE OF MODELING

In the structural style of modeling, an entity is described as a set of interconnected components. Here the architecture body is composed of two parts: the declarative part and the statement part. The declarative part contains the component declarations. The declared components are instantiated in the statement part.

BEHAVIORAL STYLE OF MODELING


The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which are specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.

DATA FLOW STYLE OF MODELING

In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The structure of the entity is not explicitly specified in this modeling style, but it can be implicitly deduced.

MIXED STYLE OF MODELING

It is possible to mix the three modeling styles described above in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements, and process statements.
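As a compact illustration of these styles, the same 2-to-1 multiplexer is described below first in the dataflow style (a concurrent signal assignment) and then in the behavioral style (a process with sequential statements). The entity name mux2 and the port names are invented for this sketch.

-- Dataflow vs. behavioral description of the same 2-to-1 multiplexer.
library ieee;
use ieee.std_logic_1164.all;

entity mux2 is
  port (a, b, sel : in  std_logic;
        y         : out std_logic);
end entity mux2;

-- Dataflow style: the function is expressed with a concurrent signal assignment.
architecture dataflow of mux2 is
begin
  y <= a when sel = '0' else b;
end architecture dataflow;

-- Behavioral style: sequential statements inside a process describe the same function.
architecture behavioral of mux2 is
begin
  process (a, b, sel)
  begin
    if sel = '0' then
      y <= a;
    else
      y <= b;
    end if;
  end process;
end architecture behavioral;

A structural architecture of the same entity would instead instantiate lower-level gate components and connect them with signals, and a mixed architecture may freely combine all three kinds of statements.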

CHAPTER IV
ARCHITECTURE DETAILS

This paper presents a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes. Various optimizations are employed to increase the clock speed. A row permutation


scheme is proposed to significantly simplify the implementation of the shuffle network in the LDPC decoder. An approximate layered decoding approach is explored to reduce the critical path of the layered LDPC decoder. Also provided are an LDPC encoder and decoder, and LDPC encoding and decoding methods. The LDPC encoder includes: a code generating circuit that includes a memory storing a first parity check matrix and sums a first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to output a second parity check matrix; and an encoding circuit receiving the second parity check matrix and an information word to output an LDPC-encoded code word. Likewise, the LDPC decoder includes: a code generating circuit including a memory which stores a first parity check matrix and which sums a first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to output a second parity check matrix; and a decoding circuit receiving the second parity check matrix and a code word to output an LDPC-decoded information word. The low-density parity-check (LDPC) code, invented in 1962 by Robert Gallager, is a linear block code defined by a very sparse parity check matrix, which is populated primarily with zeros and sparsely with ones. When it was first introduced, the LDPC code was too complicated to implement, and so it was forgotten for a long time until not too long ago. The LDPC code was brought to light again in 1995, and an irregular LDPC code (which is a generalization of the LDPC code suggested by Robert Gallager) was introduced in 1998. When the LDPC code was first introduced by Gallager, a probabilistic decoding algorithm was also suggested, and the LDPC code decoded using this algorithm exhibited excellent performance characteristics. The LDPC code also showed improved performance when


extended to non-binary code as well as binary code to define code words. Like the turbo code, the LDPC code yields a bit error rate (BER) approaching the Shannon channel capacity limit, which is the theoretical maximum amount of digital data that can be transmitted in a given bandwidth in the presence of a certain noise interference. The irregular LDPC code, which is known to have the best performance, only needs an additional 0.13 dB from the Shannon channel capacity to achieve a BER of 10^-6 when the code length is a million bits in an additive white Gaussian noise (AWGN) channel environment, and is thus suitable for applications which require high-quality transmission with a very low BER. Unlike the algebraic decoding algorithms usually used for decoding a block code, the decoding algorithm of the LDPC code is a probabilistic decoding algorithm to which a belief-propagation algorithm, which employs graph theory and estimation theory, is applied as is. An LDPC decoder computes the probability of each bit of a code word received through a channel being 1 or 0. The probability information computed by the LDPC decoder is referred to as a message, and the quality of the message can be checked through each parity defined in a parity check matrix. If a certain parity of the parity check matrix is satisfied, i.e., the result of a parity check is positive, the computed message is specially referred to as a parity check message and contains the most probable value of each code word bit. The parity check message for each parity is used to determine the value of a corresponding bit, and the information on a computed bit is referred to as a bit message. Through a procedure of repeating such message transmission, the information for the bits of each code word comes to satisfy all parities of the parity-check matrix. Finally, when all parities of the parity-check matrix are satisfied, the decoding of the code word is finished. In an environment where the


signal-to-noise (S/N) ratio is low, systematic codes are used, and thus certain portions of the code word are extracted to reproduce the information bits. If a channel is a frequency selective fading channel, adaptive modulation and coding is used for low-error communication. The LDPC code is a type of block channel code and thus has the disadvantage of being difficult to adaptively modulate compared to a trellis code such as a convolutional code or a turbo code, to which a desired form of modulation and coding can easily be applied through puncturing. In order for the LDPC code to support various code rates for adaptive transmission, it has to have various code matrices, which carries the disadvantage of the encoder and the decoder needing a large memory.

4.1 SUMMARY OF THE INVENTION

The present invention is directed to an LDPC encoder, an LDPC decoder, and LDPC encoding and decoding methods in which the size of the memory of the encoder and decoder can be reduced by forming, from one parity-check matrix, a smaller parity-check matrix. A first aspect of the present invention is to provide an LDPC encoder, including: a code generating circuit including a memory which stores a first parity check matrix and summing a first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to output a second parity check matrix; and an encoding circuit receiving the second parity check matrix and an information word to output an LDPC-encoded code word. A second aspect of the present invention is to provide an LDPC decoder, including: a code generating circuit including a memory which stores a first


parity check matrix and summing a first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to output a second parity check matrix; and a decoding circuit receiving the second parity check matrix and a code word to output an LDPC-decoded information word. A third aspect of the present invention is to provide an LDPC encoder, including: a code generating circuit including a memory which stores a first parity check matrix and outputting a second parity check matrix formed by removing a first row which is at least one row of the first parity check matrix; and an encoding circuit receiving the second parity check matrix and an information word to output an LDPC-encoded code word. A fourth aspect of the present invention is to provide an LDPC decoder, including: a code generating circuit including a memory which stores a first parity check matrix and outputting a second parity check matrix formed by removing a first row which is at least one row of the first parity check matrix; and a decoding circuit receiving the second parity check matrix and a code word to output an LDPC-decoded information word. A fifth aspect of the present invention is to provide an LDPC encoding method, including: storing a first parity check matrix in a memory; summing a first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to form a second parity check matrix; and receiving the second parity check matrix and an information word and performing LDPC-encoding. A sixth aspect of the present invention is to provide an LDPC decoding method, including: storing a first parity check matrix in a memory; summing a


first row which is at least one row of the first parity check matrix and a second row which is at least one of the remaining rows of the first parity check matrix to form a second parity check matrix; and receiving the second parity check matrix and a code word and performing LDPC-decoding.

Low Density Parity Check (LDPC) codes offer excellent error-correcting performance. However, current implementations are not capable of achieving the performance required by next generation storage and telecom applications. Extrapolation of many of those designs is not possible because of routing congestion. This article proposes a new architecture, based on a redefinition of a lesser-known LDPC decoding algorithm. As random LDPC codes are the most powerful, we abstain from making simplifying assumptions about the LDPC code which could ease the routing problem. We avoid the routing congestion problem by using multiple independent sequential decoding machines, each decoding separate received codewords. In this serial approach the required amount of memory must be multiplied by the large number of machines. Our key contribution is a check-node-centric reformulation of the algorithm which gives a huge memory reduction and which thus makes the serial approach possible.
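The row summation used above to form the second parity check matrix is a GF(2) (modulo-2) addition, i.e., a bitwise XOR of the selected rows. A minimal VHDL sketch of this operation is given below; the package name, the row width N, and the function name sum_rows are invented for the illustration.

-- Sketch: the GF(2) "sum" of two parity-check matrix rows is a bitwise XOR;
-- the combined row can then stand in for the two original rows in the smaller
-- second parity-check matrix described above.
library ieee;
use ieee.std_logic_1164.all;

package pchk_rows_pkg is
  constant N : natural := 16;                    -- row width / codeword length (illustrative)
  subtype pchk_row_t is std_logic_vector(N-1 downto 0);
  function sum_rows(r1, r2 : pchk_row_t) return pchk_row_t;
end package pchk_rows_pkg;

package body pchk_rows_pkg is
  function sum_rows(r1, r2 : pchk_row_t) return pchk_row_t is
  begin
    return r1 xor r2;                            -- GF(2) addition of the two rows
  end function sum_rows;
end package body pchk_rows_pkg;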

NANO-X API

The Nano-X API tries to be compliant with the Microsoft Win32 and WinCE GDI standard. Currently, there is support for most of the graphics drawing and clipping routines, as well as automatic window title bar drawing and dragging windows for movement. The Nano-X API is message-based, and allows programs to be written without regard to the eventual window management policies implemented by the system. The Nano-X API is not


currently client/server, and will be discussed in more detail in the section called Nano-X API.

NANO-X API

The Nano-X API is modeled after the mini-x server written initially by David Bell, which was a reimplementation of X on the MINIX operating system. It loosely follows the X Window System Xlib API, but the names all begin with GrXXX() rather than X...(). Currently, the Nano-X API is client/server, but does not have any provisions for automatic window dressings, title bars, or user window moves. There are several groups writing widget sets currently, which will provide such things. Unfortunately, the user programs must also then write only to a specific widget set API, rather than using the Nano-X API directly, which means that only the functionality provided by the widget set will be upwardly available to the applications programmer. (Although this could be considerable in the case that, say, Gdk was ported.) In recent years, research on nanotechnology has advanced rapidly. Novel nanodevices have been developed, such as those based on carbon nanotubes, nanowires, etc. Using these emerging nanodevices, diverse nanoarchitectures have been proposed. Among them, hybrid nano/CMOS reconfigurable architectures have attracted attention because of their advantages in performance, integration density, and fault tolerance. Recently, a high performance hybrid nano/CMOS reconfigurable architecture, called NATURE, was presented. NATURE comprises CMOS reconfigurable logic and interconnect fabric, and CMOS-fabrication-compatible nanomemory. High-density, fast nano RAMs are distributed in NATURE as on-chip storage to store multiple reconfiguration copies for each reconfigurable element. It


enables cycle-by-cycle runtime reconfiguration and a highly efficient computational model, called temporal logic folding. Through logic folding, NATURE provides more than an order of magnitude improvement in logic density and area-delay product, and significant design flexibility in performing area-delay trade-offs, at the same technology node. Moreover, NATURE can be fabricated using mainstream photolithography fabrication techniques. Hence, it offers a currently commercially viable reconfigurable architecture with high performance, superior logic density, and outstanding design flexibility, which is very attractive for deployment in cost-conscious embedded systems. In order to fully explore the potential of NATURE and further improve its performance, in this article a thorough design space exploration is conducted to optimize its architecture. Investigations in terms of different logic element architectures, interconnect designs, and various technologies for nano RAMs are presented. Nano RAMs can not only be used as storage for configuration bits, but the high density of nano RAMs also makes them excellent candidates for large-capacity on-chip data storage in NATURE. Many logic- and memory-intensive applications, such as video and image processing, require large storage of temporary results. To enhance the capability of NATURE for implementing such applications, we investigate the design of nano data memory structures in NATURE and explore the impact of memory density. Experimental results demonstrate significant throughput improvements due to the area saving from logic folding and parallel data processing.


CHAPTER V

SYSTEM MODULES

5.1 FAULT TOLERANCE APPROACH

A fault tolerance technique is based on at least one of three types of redundancy: time, data, or hardware redundancy. Hardware redundancy means the replication of hardware modules and some kind of result comparison or voting instance. The inherent redundancy in field-programmable logic resulting from the regular cell-based structure allows a very efficient implementation of hardware redundancy. The faulty resource must not be reused by the new configuration. After the reconfiguration, the possible effect of the fault must be confined for some applications, and the circuit must be reset to a consistent state. Then the system can continue to operate. The idea of an autonomous mechanism for fault detection and reconfiguration at an appropriate speed, in terms of the regarded system, is the starting point for the fault tolerance technique presented here. The technique combines a scalable hardware-based fault detection mechanism with a fast online fault reconfiguration technique and a checkpointing and rollback mechanism for fault recovery. The reconfiguration is based on a hardware-implemented reconfiguration controller, the reconfiguration control unit (RCU), in contrast to other online fault test and reconfiguration strategies described in the literature. The fault detection mechanism must provide the fault location and trigger reconfiguration. The reconfiguration step must replace the current


configuration data set with an alternative configuration (which provides a fault-avoiding mapping of the user circuit) and trigger recovery. The recovery step must bring the whole system back into a consistent state. For a fast online technique, such differentiations (e.g., between transient and permanent faults) are too time-consuming and a simpler approach must be taken: all faults are assumed to be permanent. Even under this assumption, no general technique is available today which controls the appropriate reconfiguration procedure.

Fig.5.1: Phases of the fault tolerance technique.

The basic characteristics of fault tolerance require:
1. No single point of repair
2. Fault isolation to the failing component
3. Fault containment to prevent propagation of the failure
4. Availability of reversion modes
Fault-tolerant systems are typically based on the concept of redundancy.

5.2 NANOMEMORY ARCHITECTURE MODEL


This section presents the design structure of the encoder, corrector, and detector units of our proposed fault-tolerant memory system, together with the implementation of these units on a sub-lithographic, nanowire-based substrate. Before going into the design structure details, we start with a brief overview of the sub-lithographic memory architecture model.

Fig. 5.2: Structure of the Nano Memory core

We use the Nano Memory and Nano PLA architectures to implement the memory core and the supporting logic, respectively. Nano Memory and Nano PLA are based on nanowire crossbars. The Nano Memory architecture can achieve very high bit density even after including the lithographic-scale address wires and defects. This design uses a nanowire crossbar to store memory bits and a limited number of lithographic-scale wires for address and control lines. Fig. 5.2 shows a schematic overview of this memory structure. The nanowires can be uniquely selected through the two address decoders located on the two sides of the memory core. Instead of using a lithographic-scale interface to read and write into the memory core, we use a nanowire-based interface. The reason that we can remove the lithographic-scale interface is that all the blocks interfacing with the memory core (encoder, corrector and detectors) are implemented with nanowire-based crossbars.


5.3 FAULT SECURE DETECTOR

The core of the detector operation is to generate the syndrome vector, which basically implements the following vector-matrix multiplication of the received encoded vector c and the parity-check matrix H:

s = c · H^T

Fig.5.3: Fault-secure detector for (15, 7, 5) EG-LDPC code

This binary sum is implemented with an XOR gate. Fig. 5.3 shows the detector circuit for the (15, 7, 5) EG-LDPC code. Since the row weight of the parity-check matrix is ρ, to generate one digit of the syndrome vector we need a ρ-input XOR gate, or (ρ - 1) two-input XOR gates. For the whole detector, it takes n(ρ - 1) two-input XOR gates. Table II illustrates this quantity for some of the smaller EG-LDPC codes.

Hamming bound:            (14, 7, 5)   (58, 37, 9)   (222, 175, 17)
EG-LDPC:                  (15, 7, 5)   (63, 37, 9)   (255, 175, 17)
Gilbert-Varshamov bound:  (17, 7, 5)   (67, 37, 9)   (255, 175, 17)

TABLE 5.1: Code parameters (n, k, d) of the EG-LDPC codes compared with the Hamming and Gilbert-Varshamov bounds

An error is detected if any of the syndrome bits has a nonzero value. The final error detection signal is implemented by an OR function of all the syndrome bits. The output of this n-input OR gate is the error detector signal (see Fig. 5.3). In order to avoid a single point of failure, we must implement the OR gate with a reliable substrate (e.g., in a system with a sub-lithographic nanowire substrate, the OR gate is implemented with reliable lithographic technology, i.e., a lithographic-scale wired-OR).
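A behavioral VHDL sketch of such a fault-secure detector is given below. The parity-check matrix constant H is only a placeholder to be filled with the actual EG-LDPC rows, and the entity name, port names, and generic are illustrative assumptions rather than the exact interfaces of this design.

-- Illustrative fault-secure detector sketch: syndrome s = c . H^T over GF(2),
-- followed by an OR of all syndrome bits to form the error flag.
library ieee;
use ieee.std_logic_1164.all;

entity fsd_detector is
  generic (N : natural := 15);               -- codeword length, e.g., 15 for the (15, 7, 5) code
  port (
    codeword     : in  std_logic_vector(N-1 downto 0);
    error_detect : out std_logic             -- '1' when any syndrome bit is nonzero
  );
end entity fsd_detector;

architecture behavioral of fsd_detector is
  type h_matrix_t is array (0 to N-1) of std_logic_vector(N-1 downto 0);
  -- Placeholder parity-check matrix; each row of the real EG-LDPC matrix has
  -- a small, fixed weight (rho), which bounds the size of each XOR tree.
  constant H : h_matrix_t := (others => (others => '0'));
begin
  process (codeword)
    variable syndrome : std_logic_vector(N-1 downto 0);
    variable acc      : std_logic;
  begin
    for i in 0 to N-1 loop                   -- one syndrome bit per parity-check row
      acc := '0';
      for j in 0 to N-1 loop
        if H(i)(j) = '1' then
          acc := acc xor codeword(j);        -- GF(2) inner product = XOR of the selected bits
        end if;
      end loop;
      syndrome(i) := acc;
    end loop;
    acc := '0';                              -- final OR of all syndrome bits: this is the one
    for i in 0 to N-1 loop                   -- gate that must be built from reliable circuitry
      acc := acc or syndrome(i);
    end loop;
    error_detect <= acc;
  end process;
end architecture behavioral;

Because each syndrome bit is produced by its own XOR tree with no shared logic, a single internal fault corrupts at most one syndrome bit; only the final OR gate needs the reliable lithographic-scale implementation noted above.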

5.4 ENCODER

An n-bit codeword c, which encodes a k-bit information vector i, is generated by multiplying the k-bit information vector with a k x n bit generator matrix G; i.e., c = i · G. EG-LDPC codes are not systematic, and the information bits must be decoded from the encoded vector, which is not desirable for our fault-tolerant approach due to the further complication and delay that it adds to the operation. However, these codes are cyclic codes [15]. We used the procedure to convert the cyclic generator matrices to systematic generator matrices for all the EG-LDPC codes under consideration.


Fig. 5.4: Structure of an encoder circuit for the (15, 7, 5) EG-LDPC code

The above figure shows the encoder circuit that computes the parity bits of the (15, 7, 5) EG-LDPC code. In this figure, i = (i0, ..., i6) is the information vector, which is copied to bits c0, ..., c6 of the encoded vector c, and the rest of the encoded vector, the parity bits, are linear sums (XORs) of the information bits. If the building block is the two-input gate, then the encoder circuitry takes 22 two-input XOR gates. Table I shows the area of the encoder circuits for each of the EG-LDPC codes under consideration, based on their generator matrices.
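A behavioral sketch of such a systematic encoder is given below. The parity selection constant P is a placeholder for the parity part of the systematic generator matrix, and the entity, port, and generic names are illustrative assumptions.

-- Illustrative systematic encoder sketch: the information bits are copied through
-- and each parity bit is the XOR (GF(2) sum) of a subset of the information bits.
library ieee;
use ieee.std_logic_1164.all;

entity ecc_encoder is
  generic (K : natural := 7;                     -- information bits
           N : natural := 15);                   -- codeword bits
  port (
    info     : in  std_logic_vector(K-1 downto 0);
    codeword : out std_logic_vector(N-1 downto 0)
  );
end entity ecc_encoder;

architecture behavioral of ecc_encoder is
  type parity_matrix_t is array (0 to N-K-1) of std_logic_vector(K-1 downto 0);
  -- Placeholder: row r selects which information bits feed parity bit r.
  -- Fill in from the parity part P of the systematic generator matrix G = [I | P].
  constant P : parity_matrix_t := (others => (others => '0'));
begin
  process (info)
    variable p_bit : std_logic;
  begin
    codeword(K-1 downto 0) <= info;              -- systematic part: copy the information bits
    for r in 0 to N-K-1 loop                     -- each parity bit has its own XOR tree
      p_bit := '0';
      for j in 0 to K-1 loop
        if P(r)(j) = '1' then
          p_bit := p_bit xor info(j);
        end if;
      end loop;
      codeword(K + r) <= p_bit;
    end loop;
  end process;
end architecture behavioral;

As in the detector, no logic is shared between output bits, so a single fault inside the encoder corrupts at most one codeword bit, which the fault-secure detector monitoring its output can then catch.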

5.5 CORRECTOR

1) ONE-STEP MAJORITY-LOGIC CORRECTOR

One-step majority-logic correction is the procedure that identifies the correct value of each bit in the codeword directly from the received codeword; this is in contrast to the general message-passing error correction strategy (e.g., [23]), which may demand multiple iterations of error diagnosis and trial correction. Avoiding iteration makes the correction latency both small and deterministic. This method consists of two parts:
1) Generating a specific set of linear sums of the received vector bits.


2) Finding the majority value of the computed linear sums.
A linear sum of the received encoded vector bits can be formed by computing the inner product of the received vector and a row of the parity-check matrix. This sum is called a parity-check sum.

2) MAJORITY CIRCUIT IMPLEMENTATION

Here we present a compact implementation of the majority gate using sorting networks (a behavioral sketch of the overall corrector is given below, after the banked-memory overview).

5.6 BANKED MEMORY

Large memories are conventionally organized as sets of smaller memory blocks called banks. The reason for breaking a large memory into smaller banks is to trade off overall memory density against access speed and reliability. Excessively small bank sizes incur a large area overhead for memory drivers and receivers. Large memory banks require long rows and columns, which results in high-capacitance wires that consequently increase the delay. Furthermore, long wires are more susceptible to breaks and bridging defects. Therefore, excessively large memory banks have a high defect rate and low performance.
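The following behavioral sketch illustrates the one-step majority-logic correction of Section 5.5: for each received bit, a set of parity-check sums that all involve that bit is computed, and the bit is flipped when the majority of those sums signal an error. The constant CHECK_SET and all entity, port, and generic names are placeholders; the actual design derives the check sums from the EG-LDPC parity-check matrix and realizes the majority function with a sorting network rather than the counter used here.

-- Illustrative one-step majority-logic corrector sketch.
library ieee;
use ieee.std_logic_1164.all;

entity ml_corrector is
  generic (N : natural := 15;                 -- codeword length
           J : natural := 4);                 -- parity-check sums orthogonal on each bit
  port (
    received  : in  std_logic_vector(N-1 downto 0);
    corrected : out std_logic_vector(N-1 downto 0)
  );
end entity ml_corrector;

architecture behavioral of ml_corrector is
  -- CHECK_SET(b)(s) is a mask over the codeword bits giving the s-th parity-check
  -- sum orthogonal on bit b. Placeholder values only; the real masks come from
  -- the rows of the EG-LDPC parity-check matrix.
  type mask_array_t is array (0 to J-1) of std_logic_vector(N-1 downto 0);
  type check_set_t  is array (0 to N-1) of mask_array_t;
  constant CHECK_SET : check_set_t := (others => (others => (others => '0')));
begin
  process (received)
    variable check_sum : std_logic;
    variable votes     : natural range 0 to J;
  begin
    for b in 0 to N-1 loop
      votes := 0;
      for s in 0 to J-1 loop
        check_sum := '0';                     -- GF(2) inner product with one check mask
        for j in 0 to N-1 loop
          if CHECK_SET(b)(s)(j) = '1' then
            check_sum := check_sum xor received(j);
          end if;
        end loop;
        if check_sum = '1' then
          votes := votes + 1;                 -- this check sum votes that bit b is in error
        end if;
      end loop;
      if votes > J / 2 then                   -- majority vote: flip the bit
        corrected(b) <= not received(b);
      else
        corrected(b) <= received(b);
      end if;
    end loop;
  end process;
end architecture behavioral;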


Fig. 5.5: Banked memory organization, with single global corrector.

The number of faults that accumulate in the memory is directly related to the scrubbing period. The longer the scrubbing period is, the larger the number of errors that can accumulate in the system. However, scrubbing all memory words serially can take a long time. If the time to serially scrub the memory becomes noticeable compared to the scrubbing period, it can reduce the system performance. To reduce the scrubbing time, we can potentially scrub all the memory banks in parallel.

CHAPTER VI SYSTEM IMPLEMENTATION


6.1 PROCESS (Dynamic Reconfiguration)

The feasibility of run-time reconfiguration of FPGAs has been established by a large number of case studies. However, these systems have typically involved an ad hoc combination of hardware and software, and the software that manages the dynamic reconfiguration is usually specialized to one application and one hardware configuration. We present three different applications of dynamic reconfiguration, based on research activities at Glasgow University, and extract a set of common requirements. Motivated by these requirements, we present the design of an extensible run-time system for managing the dynamic reconfiguration of FPGAs. The system is called RAGE and incorporates operating-system-style services that permit sophisticated, high-level operations on circuits.

ECC stands for "Error Correction Codes" and refers to methods used to detect and correct errors introduced during the storage or transmission of data. Certain kinds of RAM chips inside a computer implement this technique to correct data errors and are known as ECC memory. ECC memory chips are predominantly used in servers rather than in client computers. Memory errors are proportional to the amount of RAM in a computer as well as the duration of operation. Since servers typically contain several gigabytes of RAM and operate 24 hours a day, the likelihood of errors cropping up in their memory chips is comparatively high, and hence they require ECC memory. Memory errors are of two types, namely hard and soft. Hard errors are caused by fabrication defects in the memory chip and cannot be corrected once they start appearing. Soft errors, on the other hand, are caused predominantly by electrical disturbances. Memory errors that are not corrected immediately can eventually crash a computer. This again has more relevance to a server than to a client computer in an office or home environment. When a client crashes, it normally does not affect other computers even when it is connected to a network, but when a server crashes it brings the entire network down with it. Hence, ECC memory is mandatory for servers but optional for clients unless they are used for mission-critical applications. ECC memory chips mostly use Hamming codes or triple modular redundancy as the method of error detection and correction. These are known as FEC, or forward error correction, codes, which manage error correction on their own instead of going back and requesting that the data source resend the original data. These codes can correct single-bit errors occurring in data. Multibit errors are very rare and hence do not pose much of a threat to memory systems.

ENCODING PROCESS

EG-LDPC codes have received tremendous attention in the coding community because of their excellent error-correction capability and near-capacity performance. Some randomly constructed LDPC codes, measured in bit error rate (BER), come very close to the Shannon limit for the AWGN channel (within 0.05 dB) with iterative decoding and very long block sizes (on the order of 10^6 to 10^7). However, for many practical applications (e.g., packet-based communication systems), shorter and variable-block-size codes with good frame error rate (FER) performance are desired. Communications in packet-based wireless networks usually involve a large per-frame overhead, including both the physical (PHY) layer and MAC layer headers. As a result, the design of a reliable wireless link often faces a tradeoff between channel utilization (frame size) and error-correction capability. One solution is to use adaptive burst profiles, in which transmission parameters relevant to modulation and coding may be assigned dynamically on a burst-by-burst basis. Therefore, LDPC codes with variable block lengths and multiple code rates for different quality-of-service requirements under various channel conditions are highly desired.
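As a simple illustration of the forward-error-correction idea mentioned earlier in this section, the sketch below models triple modular redundancy: each word is stored as three copies and a bitwise majority vote on read masks any single-copy bit error. This is a generic illustration only and is not part of the EG-LDPC design described in this report.

    # Sketch of triple modular redundancy (TMR): three stored copies per word,
    # bitwise majority vote on read corrects any single-copy bit error.

    def tmr_write(word_bits):
        return [list(word_bits), list(word_bits), list(word_bits)]

    def tmr_read(copies):
        a, b, c = copies
        return [(x & y) | (y & z) | (x & z)        # at least two copies agree
                for x, y, z in zip(a, b, c)]

    # Example: a single flipped bit in one copy is corrected on read.
    stored = tmr_write([1, 0, 1, 1])
    stored[1][2] ^= 1                              # transient upset in copy 2
    assert tmr_read(stored) == [1, 0, 1, 1]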

FLOW OF ENCODING PROCESS

Fig. 6.1: Flow of encoding process

In the recent literature there are many LDPC decoder architectures, but few of them support variable-block-size and multi-rate decoding. For example, a 1 Gbps, 1024-bit, rate-1/2 LDPC decoder has been implemented; however, this architecture supports only one particular code, since the whole Tanner graph is wired into hardware. A code-rate-programmable decoder has also been proposed, but the code length is still fixed to 2048 bits for simple VLSI implementation. In [3], a decoder that supports three block sizes and four code rates is designed by storing 12 different parity-check matrices on-chip. As can be seen, the main design challenge in supporting variable block sizes and multiple code rates stems from the random or unstructured nature of general LDPC codes: supporting different block sizes would generally require different hardware architectures. To address this problem, a generalized decoder architecture based on quasi-cyclic LDPC codes is proposed that can support a wider range of block sizes and code rates at a low hardware requirement. To balance implementation complexity and decoding throughput, a structured LDPC code was recently proposed for modern wireless communication systems, including but not limited to IEEE 802.16e and IEEE 802.11n. An expansion factor P divides the variable nodes and the check nodes into clusters of size P such that, if there exists an edge between a variable cluster and a check cluster, then P variable nodes connect to P check nodes via a permutation (cyclic shift) network. Generally, support for different block sizes and code rates implies the use of multiple parity-check matrices (PCMs), and storing all the PCMs on-chip is almost impractical and expensive. A good tradeoff between design complexity and decoding throughput is partially parallel decoding, obtained by grouping a certain number of variable and check nodes into a cluster for parallel processing. Furthermore, the layered decoding algorithm can be applied to improve the decoding convergence time by a factor of two and hence increase the throughput. The structured code is well suited to efficient VLSI implementation because it significantly simplifies the memory access and message passing. The PCM can be viewed as a group of concatenated horizontal layers, where the column weight is at most 1 in each layer due to the cyclic-shift structure.

6.2 TESTING TECHNIQUES

This project describes simple iterative decoders for low-density parity-check codes based on Euclidean geometries, suitable for practical very-large-scale integration (VLSI) implementation in applications requiring very fast decoders.


The decoders are based on shuffled and replica-shuffled versions of iterative bit-flipping (BF) and quantized weighted-BF schemes. The proposed decoders converge faster and provide better ultimate performance than standard BF decoders. Simulations are presented that illustrate the performance-versus-complexity tradeoffs for these decoders, and in some cases it is shown through importance sampling that no significant error floor exists. Novel architectures are also presented, comprising one parallel and two semi-parallel decoder architectures for popular PG-based LDPC codes. These architectures have no memory clashes and are furthermore reconfigurable for different lengths (and their corresponding rates). The architectures can be configured either for regular belief-propagation-based decoding or for majority-logic decoding (MLD). In addition, storage circuits constructed from unreliable memory components are analyzed. A memory construction using low-density parity-check codes is proposed, based on a construction originally made by Taylor. The storage circuit consists of unreliable memory cells along with a correcting circuit; the correcting circuit is itself constructed from unreliable logic gates together with a small number of perfect gates. The modified construction enables the memory device to perform better than the original construction, and numerical results supporting these claims are presented.

CHAPTER VII PERFORMANCE AND LIMITATIONS

REED-SOLOMON APPLICATIONS

Modem technologies: xDSL, cable modems
CD and DVD players
Digital audio and video broadcast
HDTV / digital TV
Data storage and retrieval systems: hard-disk drives, CD-ROM
Wireless communications: cell phones, base stations, wireless-enabled PDAs
Digital satellite communication and broadcast
RAID controllers with fault tolerance

7.1 APPLICATIONS
Used in SoC and NoC processors
Used in radios
Used in almost all electronic devices
Loopback BIST model for digital transceivers with limited test circuitry
Spot-defect models (typical of CMOS technology) based on noise and nonlinear analysis, using fault abstraction

7.2 MERITS OF SYSTEM
Reduces maintenance cost
High-speed fault tolerance
Can easily identify faults
Process capability
No external circuitry
Does not affect the internal architecture of the nano memory
Multiple faults can be easily corrected

7.3 LIMITATIONS OF SYSTEM
Hardware faults cannot be recognized
Only pre-designed regions can be checked
May negatively impact manufacturers' current silicon-chip technology
Only used in specific applications

7.4 FUTURE ENHANCEMENT

With the advancement of science, electrical and electronic devices have reached unimaginable levels. The main requirement of any good device is that it serves its purpose effectively, and BIST enables this efficiency. A future BIST system can be designed in such a way that hardware faults can also be indicated so that they can be corrected. A multiprocessor system-on-chip is an integrated system that performs real-time tasks at low power and low cost.

CHAPTER VIII


OUTPUT RESULTS AND DISCUSSIONS

ENCODER


DECODER

EXISTING METHODS RESULT


PAPERS RESULT

PROPOSED METHODS RESULT


CHAPTER IX CONCLUSION

This paper presents an algebraic method for constructing modified Euclidean geometry low-density parity-check (EG-LDPC) codes based on the structural properties of Euclidean geometries. The construction method results in a class of M-EG-LDPC codes. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability.


CHAPTER X REFERENCES

[1] H. Naeimi and A. DeHon, "Fault secure encoder and decoder for memory applications," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007, pp. 409-417.

[2] M. Davey and D. J. C. MacKay, "Low density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.

[3] H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, "Codes on finite geometries," IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.

[4] S. J. Piestrak, A. Dandache, and F. Monteiro, "Designing fault-secure parallel encoders for systematic linear error correcting codes," IEEE Trans. Reliab., vol. 52, no. 4, Dec. 2003.

[5] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399-431, Mar. 1999.

[6] H. Wymeersch, H. Steendam, and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE Int. Conf. Commun., Paris, France, Jun. 2004, pp. 772-776.

