You are on page 1of 43

Low-Power and Area-Efficient Carry Select Adder

Introduction: Addition usually impacts widely the overall performance of digital systems and a crucial arithmetic function. In electronic applications adders are most widely used. Applications where these are used are multipliers, DSP to execute various algorithms like FFT, FIR and IIR. Wherever concept of multiplication comes adders come in to the picture. As we know millions of instructions per second are performed in microprocessors. So, speed of operation is the most important constraint to be considered while designing multipliers. Due to device portability miniaturization of device should be high and power consumption should be low. Devices like Mobile, Laptops etc. require more battery backup. So, a VLSI designer has to optimize these three parameters in a design. These constraints are very difficult to achieve so depending on demand or application some compromise between constraints has to be made. Ripple carry adders exhibits the most compact design but the slowest in speed. Whereas carry lookahead is the fastest one but consumes more area [2]. Carry select adders act as a compromise between the two adders. In 2002, a new concept of hybrid adders is presented to speed up addition process by Wang et al. that gives hybrid carry look-ahead/carry select adders design [7]. In 2008, low power multipliers based on new hybrid full adders is presented in [8]. DESIGN of area- and power-efficient high-speed data path logicsystems are one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating mul-tiple carries and then select a carry to generate the sum [1]. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sum and carry by considering carry input 5 and 5 , then the final sum and carry are selected by the multiplexers (mux).

2. FAST ADDERS 2.1 Ripple Carry Adder Concatenating the N full adders forms N bit Ripple carry adder. In this carry out of previous full adder becomes the input carry for the next full adder. It calculates sum and carry according to the following equations. As carry ripples from one full adder to the other, it traverses longest critical path and exhibits worst-case delay. Si = Ai xor Bi xorCi Ci+1 = Ai Bi + (Ai + Bi) Ci; where i = 0, 1, , n-1 RCA is the slowest in all adders (O (n) time) but it is very compact in size (O (n) area). If the ripple carry adder is implemented by concatenating N full adders, the delay of such an adder is 2N gate delays from Cin to Cout. The delay of adder increases linearly with increase in number of bits. Block diagram of RCA is shown in figure 1.

Figure1: Block diagram of RCA 2.2 Carry Skip Adder (CSKA) A carry skip divides the words to be added in to groups of equal size of k-bits. Carry Propagate pi signals may be used within a group of bits to accelerate the carry propagation. If all the pi signals within the group are pi=1, carry bypasses the entire group as shown in figure 2. P = pi * pi+1 * pi+2 * pi+k In this way delay is reduced as compared to ripple carry adder. The worst-case carry propagation delay in a N-bit carry skip adder with fixed block width b, assuming that one stage of ripple has the same delay as one skip, can be derived: TCSKA = (b -1)+0.5+(N/b-2)+(b -1) = 2b + N/b 3.5 Stages Block width tremendously affects the latency of adder. Latency is directly proportional to block width. More number of blocks means block width is less, hence more delay. The idea behind Variable Block Adder (VBA) is to minimize the critical path delay in the carry chain of a carry skip adder, while allowing the groups to take different sizes. In case of carry skip adder, such condition will result in more number of skips between stages. Such adder design is called variable block design, which is tremendously used to fasten the speed of adder. In the variable block carry skip adder design we divided a 32-bit adder in to 4 blocks or groups. The bit widths of groups are taken as; First block is of 4 bits, second is of 6 bits, third is 18 bit wide and the last group consist of most significant 4 bits. Table 1 shows that the logic utilization of carry skip and variable carry skip 32-bit adder. The

power and delay, which are obtained also given in the table1. From table it can be observed that variable block design consumes more area as gate count and number of LUTs consumed by variable block design is more than conventional carry skip adder. 2.3 Carry Select Adder (CSA) The carry select adder comes in the category of conditional sum adder. Conditional sum adder works on some condition. Sum and carry are calculated by assuming input carry as 1 and 0 prior the input carry comes. When actual carry input arrives, the actual calculated values of sum and carry are selected using a multiplexer. The conventional carry select adder consists of k/2 bit adder for the lower half of the bits i.e. least significant bits and for the upper half i.e. most significant bits (MSBs) two k/ bit adders. In MSB adders one adder assumes carry input as one for performing addition and another assumes carry input as zero. The carry out calculated from the last stage i.e. least significant bit stage is used to select the actual calculated values of output carry and sum. The selection is done by using a multiplexer. This technique of dividing adder in to stages increases the area utilization but addition operation fastens. The block diagram of conventional k bit adder is shown in figure 3.

Basics: In electronics, a carry-select adder is a particular way to implement an adder, which is a logic element that computes the -bit sum of two -bit numbers. The carry-select adder is . simple but rather fast, having a gate level depth of

The carry-select adder generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, one time with the assumption of the carry being zero and the other assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is then selected with the multiplexer once the correct carry is known. The number of bits in each carry select block can be uniform, or variable. In the uniform case, the optimal delay occurs for a block size of . When variable, the block size should have a delay, from addition inputs A and B to the carry out, equal to that of the multiplexer chain leading into it, so that the carry out is calculated just in time. The delay is derived from

uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added, since that will yield an equal number of MUX delays. Adder: This is about a digital circuit. For an electronic circuit that handles analog signals see Electronic mixer. In electronics, an adder or summer is a digital circuit that performs addition of numbers. In many computers and other kinds of processors, adders are used not only in the arithmetic logic unit(s), but also in other parts of the processor, where they are used to calculate addresses, table indices, and similar. Although adders can be constructed for many numerical representations, such as binary-coded decimal or excess-3, the most common adders operate on binary numbers. In cases where two's complement or ones' complement is being used to represent negative numbers, it is trivial to modify an adder into an addersubtractor. Other signed number representations require a more complex adder. Half adder

Half Adder logic diagram File:My work in.svg Half Adder logic diagram The half adder adds two one-bit binary numbers A and B. It has two outputs, S and C (the value theoretically carried on to the next addition); the final sum is 2C + S. The simplest half-adder design, pictured on the right, incorporates an XOR gate for S and an AND gate for C. With the addition of an OR gate to combine their carry outputs, two half adders can be combined to make a full adder.[1]

Full adder

Schematic symbol for a 1-bit full adder with Cin and Cout drawn on sides of block to emphasize their use in a multi-bit adder A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as A, B, and Cin; A and B are the operands, and Cin is a bit carried in from the next less significant stage.[2] The full-adder is usually a component in a cascade of adders, which add 8, 16, 32, etc. binary numbers. The circuit produces a two-bit output sum typically represented by the signals Cout and S, where . The one-bit full adder's truth table is:

Full-adder logic diagram Inputs Outputs

A B Cin Cout S

0 0 0

1 0 0

0 1 0

1 1 0

0 0 1

1 0 1

0 1 1

1 1 1

A full adder can be implemented in many different ways such as with a custom transistor-level circuit or composed of other gates. One example implementation is with and . In this implementation, the final OR gate before the carry-out output may be replaced by anXOR gate without altering the resulting logic. Using only two types of gates is convenient if the circuit is being implemented using simple IC chips which contain only one gate type per chip. In this light, Cout can be implemented as . A full adder can be constructed from two half adders by connecting A and B to the input of one half adder, connecting the sum from that to an input to the second adder, connecting Ci to the other input and OR the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B, and Ci, and Cout could be made the three-bit majority function of A, B, and Ci. More complex adders Ripple carry adder

4-bit adder with logic gates shown

It is possible to create a logical circuit using multiple full adders to add N-bit numbers. Each full adder inputs a Cin, which is the Cout of the previous adder. This kind of adder is a ripple carry adder, since each carry bit "ripples" to the next full adder. Note that the first (and only the first) full adder may be replaced by a half adder. The layout of a ripple carry adder is simple, which allows for fast design time; however, the ripple carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated from the previous full adder. The gate delay can easily be calculated by inspection of the full adder circuit. Each full adder requires three levels of logic. In a 32-bit [ripple carry] adder, there are 32 full adders, so the critical path (worst case) delay is 3 (from input to carry in first adder) + 31 * 2 (for carry propagation in later adders) = 65 gate delays. A design with alternating carry polarities and optimized AND-OR-Invert gates can be about twice as fast.[3] [edit]Carry-lookahead adders

4-bit adder with carry lookahead To reduce the computation time, engineers devised faster ways to add two binary numbers by using carry-lookahead adders. They work by creating two signals (P and G) for each bit position, based on if a carry is propagated through from a less significant bit position (at least one input is a '1'), a carry is generated in that bit position (both inputs are '1'), or if a carry is killed in that bit position (both inputs are '0'). In most cases, P is simply the sum output of a half-adder and G is the carry output of the same adder. After P and G are generated the carries for every bit position are created. Some advanced carry-lookahead architectures are the Manchester carry chain, BrentKung adder, and the KoggeStone adder. Some other multi-bit adder architectures break the adder into blocks. It is possible to vary the length of these blocks based on the propagation delay of the circuits to optimize computation time. These block based adders include the carry bypass adder which will determine P and G values for each block rather than each bit, and the carry select adderwhich pre-generates sum and carry values for either possible carry input to the block. Other adder designs include the carry-save adder, carry-select adder, conditional-sum adder, carry-skip adder, and carry-complete adder. [edit]Lookahead carry unit

A 64-bit adder By combining multiple carry lookahead adders even larger adders can be created. This can be used at multiple levels to make even larger adders. For example, the following adder is a 64-bit adder that uses four 16-bit CLAs with two levels of LCUs. 3:2 compressors We can view a full adder as a 3:2 lossy compressor: it sums three one-bit inputs, and returns the result as a single two-bit number; that is, it maps 8 input values to 4 output values. Thus, for example, a binary input of 101 results in an output of 1+0+1=10 (decimal number '2'). The carry-out represents bit one of the result, while the sum represents bit zero. Likewise, a half adder can be used as a 2:2 lossy compressor, compressing four possible inputs into three possible outputs. Such compressors can be used to speed up the summation of three or more addends. If the addends are exactly three, the layout is known as the carry-save adder. If the addends are four or more, more than one layer of compressors is necessary and there are various possible design for the circuit: the most common are Dadda and Wallace trees. This kind of circuit is most notably used in multipliers, which is why these circuits are also known as Dadda and Wallace multipliers. Multiplexer In electronics, a multiplexer (or MUX) is a device that selects one of several analog or digital input signals and forwards the selected input into a single line.[1] A multiplexer of 2n inputs has n select lines, which are used to select which input line to send to the output.[2] Multiplexers are mainly used to increase the amount of data that can be sent over the network within a certain amount of time and bandwidth.[1] A multiplexer is also called a data selector. They are used in CCTV, and almost every business that has CCTV fitted, will own one of these. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.

On the other hand, a demultiplexer (or demux) is a device taking a single input signal and selecting one of many data-output-lines, which is connected to the single input. A multiplexer is often used with a complementary demultiplexer on the receiving end.[1] An electronic multiplexer can be considered as a multiple-input, single-output switch, and a demultiplexer as a single-input, multiple-output switch.[3] The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the short parallel side containing the output pin.[4] The schematic on the right shows a 2-to-1 multiplexer on the left and an equivalent switch on the right. The wire connects the desired input to the output. In digital circuit design, the selector wires are of digital value. In the case of a 2-to-1 multiplexer, a logic value of 0 would connect to the output while a logic value of 1 would connect to the output. In larger multiplexers, the number of selector pins is equal to where is the number of inputs. For example, 9 to 16 inputs would require no fewer than 4 selector pins and 17 to 32 inputs would require no fewer than 5 selector pins. The binary value expressed on these selector pins determines the selected input pin. A 2-to-1 multiplexer has a boolean equation where input, and is the output: and are the two inputs, is the selector

A 2-to-1 mux This truth table shows that when then but when then . A straightforward realization of this 2-to-1 multiplexer would need 2 AND gates, an OR gate, and a NOT gate. Larger multiplexers are also common and, as stated above, require selector pins for inputs. Other common sizes are 4-to-1, 8-to-1, and 16-to-1. Since digital logic uses binary values, powers of 2 are used (4, 8, 16) to maximally control a number of inputs for the given number of selector inputs.

4-to-1 mux

8-to-1 mux

16-to-1 mux The boolean equation for a 4-to-1 multiplexer is:

Two realizations for creating a 4-to-1 multiplexer are shown below:

These are two realizations of a 4-to-1 multiplexer:

one realized from a decoder, AND gates, and an OR gate one realized from 3-state buffers and AND gates (the AND gates are acting as the decoder) inputs indicate the decimal value of the binary control

Note that the subscripts on the

inputs at which that input is let through.

Carry Select Adder (CSA) DESIGN of area- and power-efficient high-speed data path logicsystems are one of the most substantial areas of research in VLSIsystem design. In digital adders, the speed of addition is limited by thetime required to propagate a carry through the adder. The sum for eachbit position in an elementary adder is generated sequentially only afterthe previous bit position has been summed and a carry propagated intothe next position.The CSLA is used in many computational systems to alleviate theproblem of carry propagation delay by independently generating multiplecarries and then select a carry to generate the sum [1]. However,the CSLA is not area efficient because it uses multiple pairs of RippleCarry Adders (RCA) to generate partial sum and carry by consideringcarry input __ _ _ and __ _ _, then the final sum and carry areselected by the multiplexers (mux).The basic idea of this work is to use Binary to Excess-1 Converter(BEC) instead of RCA with __ _ _ in the regular CSLA to achievelower area and power consumption [2][4]. The main advantage of thisBEC logic comes from the lesser number of logic gates than the -bitFull Adder (FA) structure. The details of the BEC logic are discussedin Section III. The carry select adder comes in the category of conditional sum adder. Conditional sum adder works on some condition. Sum and carry are calculated by assuming input carry as 1 and 0 prior the input carry comes. When actual carry input arrives, the actual calculated values of sum and carry are selected using a multiplexer. The conventional carry select adder consists of k/2 bit adder for the lower half of the bits i.e. least significant bits and for the upper half i.e. most significant bits (MSBs) two k/ bit adders. In MSB adders one adder assumes carry input as one for performing addition and another assumes carry input as zero. The carry out calculated from the last stage i.e. least significant bit stage is used to select the actual calculated values of output carry and sum. The selection is done by using a multiplexer. This technique of dividing adder in to stages increases the area utilization but addition operation fastens. The block diagram of conventional k bit adder is shown in figure 3.

Fig. 4.Regular 16-b SQRT CSLA.

Fig. 1.Delay and Area evaluation of an XOR gate. II. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASICADDER BLOCKS The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 1. The gates between the dotted lines are performing the operations in parallel and the numeric representation of each gate indi-cates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having delay equal to 1 unit and area equal to 1 unit. We then add up the number of gates in the longest path of a logic block that contributes to the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I.

TABLE I DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA Adder Block Delay Aera XOR 2:1 mux Half adder Full adder 3 3 3 6 5 4 6 13

TABLE II FUNCTION TABLE OF THE 4-b BEC B[3:0] X[3:0] 0000 0001 0001 0010 . . . . . . 1110 1111 1111 0000

III. BEC As stated above the main idea of this work is to use BEC instead of the RCA with 5 in order to reduce the area and power consump-tion of the regular CSLA. To replace the -bit RCA, an -bit BEC is required. A structure and the function table of a 4-b BEC are shown in Fig. 2 and Table II, respectively.

Fig. 3 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux gets as it input (B3, B2, B1, and B0) and another input of the mux is the BEC output. This produces the two possible partial results in parallel and the mux is used to select either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction when the CSLA with large number of bits are designed. The Boolean expressions of the 4-bit BEC is listed as (note the functional symbols ~NOT, &AND, ^XOR)

Fig. 2.4-b BEC.

Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. F is a Full Adder.

IV. DELAY AND AREA EVALUATION METHODOLOGY OF REGULAR 16-B SQRT CSLA The structure of the 16-b regular SQRT CSLA is shown in Fig. 4. Ithas ve groups of different size RCA. The delay and area evaluation ofeach group are shown in Fig. 5, in which the numerals within [] specifythe delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows. 1) The group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based onthe consideration of delay values of Table I, the arrival time ofselection input_____________of 6:3 mux is earlier than __ and later than____ ___. Thus, ______ _ ___is summation of

2) Except for group2, the arrival time of mux selection input is alwaysgreater than the arrival time of data outputs from the RCAs.Thus, the delay of group3 to group5 is determined, respectively asfollows:

3) The one set of 2-b RCA in group2 has 2 FA for ___ _ _and theother set has 1 FA and 1 HA for___ _ _. Based on the area countof Table I, the total number of gate counts in group2 is determinedas follows:

4) Similarly, the estimated maximum delay and area of the othergroups in the regular SQRT CSLA are evaluated and listed inTable III.

Fig. 6.Modified 16-b SQRT CSLA. The parallel RCA with Cin = 1 is replaced with BEC. TABLE III DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS Group Delay Area Group2 Group3 Group4 Group5 11 13 16 19 57 87 117 147

Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.

TABLE IV DELAY AND AREA COUNT OF MODIFIED SQRT CSLA Group Group2 Group3 Group4 Group5 Delay 13 16 19 22 Area 43 61 84 107

FPGA IMPLEMENTATION 5.1 FPGA Implementation Flow Diagram:


Initially the market research should be carried out which covers the previous version of the design and the current requirements on the design. Based on this survey, the specification and the architecture must be identified. Then the RTL modeling should be carried out in Verilog with respect to the identified architecture. Once the RTL modeling is done, it should be simulated and verified for all the cases. The functional verification should meet the intended architecture and should pass all the test cases.

Once the functional verification is clear, the RTL model will be taken to the synthesis process. Three operations will be carried out in the synthesis process such as Translate Map Place and Route The developed RTL model will be translated to the mathematical equation format which will be in the understandable format of the tool. These translated equations will be then mapped to the library that is, mapped to the hardware. Once the mapping is done, the gates were placed and routed. Before these processes, the constraints can be given in order to optimize the design. Finally the BIT MAP file will be generated that has the design information in the binary format which will be dumped in the FPGA board. 5.2 Introduction to FPGA FPGA stands for Field Programmable Gate Array which has the array of logic module, I /O module and routing tracks (programmable interconnect). FPGA can be configured by end user to implement specific circuitry. Speed is up to 100 MHz but at present speed is in GHz. Main applications are DSP, FPGA based computers, logic emulation, ASIC and ASSP. FPGA can be programmed mainly on SRAM (Static Random Access Memory). It is Volatile and main advantage of using SRAM programming technology is re-configurability. Issues in FPGA technology are complexity of logic element, clock support, IO support and interconnections (Routing). In this work, design of a UART is made using VHDL and is synthesized on FPGA family of Spartan 3E through XILINX ISE Tool. This process includes following: Translate Map Place and Route

5.3 FPGA Flow The basic implementation of design on FPGA has the following steps. Design Entry Simulation

Synthesis Logic Optimization Technology Mapping Placement Routing Programming Unit Configured FPGA Above shows the basic steps involved in implementation. The initial design entry of may be VHDL, schematic or Boolean expression. The optimization of the Boolean expression will be carried out by considering area or speed. In technology mapping, the transformation of optimized Boolean expression to FPGA logic blocks, that is said to be as Slices. Here area and delay optimization will be taken place. During placement the algorithms are used to place each block in FPGA array. Assigning the FPGA wire segments, which are programmable, to establish connections among FPGA blocks through routing. The configuration of final chip is made in programming unit. 5.4 Synthesis Result The developed UART is simulated and verified their functionality. Once the functional verification is done, the RTL model is taken to the synthesis process using the Xilinx ISE tool. In synthesis process, the RTL model will be converted to the gate level netlist mapped to a specific technology library. Here in this Spartan 3E family, many different devices were available in the Xilinx ISE tool. In order to synthesis this DWT and IDWT design the device named as XC3S500E has been chosen and the package as FG320 with the device speed such as -4. The design of UART is synthesized and its results were analyzed as follows.

top Project Status (06/07/2012 - 04:43:04) Project File: carryselectexisting.xise Parser Errors: Implementation State:

No Errors Placed and Routed

Module Name: top Target Device: Product Version: Design Goal: Design Strategy: Environment: xc3s500e-5cp132 ISE 13.4

Errors: Warnings: All Signals Completely Routed

Balanced Xilinx Default (unlocked) System Settings

Routing Results: Timing Constraints: Final Timing Score:

0 (Timing Report)

Device Utilization Summary Logic Utilization Number of 4 input LUTs Number of occupied Slices Number of Slices containing only related logic Number of Slices containing unrelated logic Total Number of 4 input LUTs Number of bonded IOBs Used Available Utilization Note(s) 49 27 27 0 49 54 9,312 4,656 27 27 9,312 92 1% 1% 100% 0% 1% 58%


Average Fanout of Non-Clock Nets


Performance Summary Final Timing Score: Routing Results: Timing Constraints: 0 (Setup: 0, Hold: 0) All Signals Completely Routed Pinout Data: Clock Data:

[-] Pinout Report Clock Report

Detailed Reports Report Name Synthesis Report Status Current Generated Thu Jun 7 04:41:44 2012 Thu Jun 7 04:42:45 2012 Thu Jun 7 04:42:50 2012 Thu Jun 7 04:43:00 2012 0 0 Errors Warnings 0 Infos


10 Warnings (0 0 new) 0 0

Translation Report


Map Report Place and Route Report Power Report Post-PAR Static Timing Report Bitgen Report



1 Info (0 new)

Current Out of Date

Thu Jun 7 04:43:03 2012 Sat May 19 10:46:38 2012

6 Infos (0 new)

Secondary Reports Report Name ISIM Simulator Log Post-Map Static Timing Report WebTalk Report WebTalk Log File Status Out of Date Out of Date Out of Date Out of Date Generated Thu May 17 17:32:44 2012 Wed Apr 11 12:41:26 2012 Sat May 19 10:46:40 2012 Sat May 19 10:47:16 2012


Date Generated: 06/07/2012 - 04:43:04

Carry Select Proposed:

Process "Generate Post-Place & Route Static Timing" completed successfully.

top Project Status (06/07/2012 - 05:20:27) Project File: carryselectproposed.xise Parser Errors: Implementation State:

No Errors Placed and Routed No Errors 11 Warnings (0 new) All Signals Completely Routed

Module Name: top Target Device: xc3s100e-5cp132 Product Version: Design Goal: Design Strategy: Environment: ISE 13.4 Balanced Xilinx Default (unlocked) System Settings

Errors: Warnings: Routing Results: Timing Constraints: Final Timing Score:

0 (Timing Report)

Device Utilization Summary Logic Utilization Number of 4 input LUTs Number of occupied Slices Number of Slices containing only related logic Number of Slices containing unrelated logic Total Number of 4 input LUTs Number of bonded IOBs Average Fanout of Non-Clock Nets Used Available Utilization 45 25 25 0 45 54 2.64 1,920 960 25 25 1,920 83 2% 2% 100% 0% 2% 65% Note(s)


Performance Summary Final Timing Score: Routing Results: Timing Constraints: Detailed Reports Report Name Status Generated Errors Warnings Infos 0 (Setup: 0, Hold: 0) All Signals Completely Routed Pinout Data: Clock Data: Pinout Report Clock Report



Synthesis Report Translation Report Map Report Place and Route Report Power Report Post-PAR Static Timing Report Bitgen Report

Current Current Current Current

Thu Jun 7 05:18:48 2012 Thu Jun 7 05:20:11 2012 Thu Jun 7 05:20:16 2012 Thu Jun 7 05:20:22 2012 Thu Jun 7 05:20:25 2012

0 0 0 0

11 Warnings (0 0 new) 0 0 0 0 2 Infos (0 new) 1 Info (1 new)


6 Infos (0 new)

Secondary Reports Report Name ISIM Simulator Log Post-Map Static Timing Report Status Out of Date Out of Date Generated Thu May 17 17:33:22 2012 Wed Apr 11 12:49:54 2012


Date Generated: 06/07/2012 - 05:20:27

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device.

A VLSI integrated-circuit die The first semiconductor chips held two transistors each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Now known retrospectively as small-scale integration (SSI), improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark and today's microprocessors have many millions of gates and billions of individual transistors. At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use. As of early 2008, billion-transistor processors are commercially available. This is expected to become more commonplace as semiconductor fabrication moves from the current generation of 65 nm processes to the next 45 nm generations (while experiencing new challenges such as increased variation across process corners). A notable example is Nvidia's 280 series GPU. This GPU is unique in the fact that almost all of its 1.4 billion transistors are used for logic, in

contrast to the Itanium, whose large transistor count is largely due to its 24 MB L3 cache. Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks like the SRAM (Static Random Access Memory) cell, however, are still designed by hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain the last bit of performance by trading stability)[citation needed].

Structured design Structured VLSI design is a modular methodology originated by Carver Mead and Lynn Conway for saving microchip area by minimizing the interconnect fabrics area. This is obtained by repetitive arrangement of rectangular macro blocks which can be interconnected using wiring by abutment. An example is partitioning the layout of an adder into a row of equal bit slices cells. In complex designs this structuring may be achieved by hierarchical nesting. Structured VLSI design had been popular in the early 1980s, but lost its popularity later because of the advent of placement and routing tools wasting a lot of area by routing, which is tolerated because of the progress of Moore's Law. When introducing the hardware description language KARL in the mid' 1970s, Reiner Hartenstein coined the term "structured VLSI design" (originally as "structured LSI design"), echoing Edsger Dijkstra's structured programming approach by procedure nesting to avoid chaotic spaghetti-structured programs. Challenges As microprocessors become more complex due to technology scaling, microprocessor designers have encountered several challenges which force them to think beyond the design plane, and look ahead to post-silicon:

Power usage/Heat dissipation As threshold voltages have ceased to scale with advancing process technology, dynamic power dissipation has not scaled proportionally. Maintaining

logic complexity when scaling the design down only means that the power dissipation per area will go up. This has given rise to techniques such as dynamic voltage and frequency scaling (DVFS) to minimize overall power.

Process variation As photolithography techniques tend closer to the fundamental laws of optics, achieving high accuracy in doping concentrations and etched wires is becoming more difficult and prone to errors due to variation. Designers now must simulate across multiple fabrication process corners before a chip is certified ready for production.

Stricter design rules Due to lithography and etch issues with scaling, design rules for layout have become increasingly stringent. Designers must keep ever more of these rules in mind while laying out custom circuits. The overhead for custom design is now reaching a tipping point, with many design houses opting to switch to electronic design automation (EDA) tools to automate their design process.

Timing/design closure As clock frequencies tend to scale up, designers are finding it more difficult to distribute and maintain low clock skew between these high frequency clocks across the entire chip. This has led to a rising interest in multicore and multiprocessor architectures, since an overall speedup can be obtained by lowering the clock frequency and distributing processing.

First-pass success As die sizes shrink (due to scaling), and wafer sizes go up (to lower manufacturing costs), the number of dies per wafer increases, and the complexity of making suitable photomasks goes up rapidly. A mask set for a modern technology can cost several million dollars. This non-recurring expense deters the old iterative philosophy involving several "spin-cycles" to find errors in silicon, and encourages first-pass silicon success. Several design philosophies have been developed to aid this new design flow, including design for manufacturing (DFM), design for test (DFT), and Design for X

VLSI Technology, Inc was a company which designed and manufactured custom and semicustom ICs. The company was based in Silicon Valley, with headquarters at 1109 McKay Drive in San Jose, California. Along with LSI Logic, VLSI Technology defined the leading edge of the

application-specific integrated circuit (ASIC) business, which accelerated the push of powerful embedded systems into affordable products. The company was founded in 1979 by a trio from Fairchild Semiconductor by way of Synertek Jack Balletto, Dan Floyd, Gunnar Wetlesen - and by Doug Fairbairn of Xerox PARC and Lambda (later VLSI Design) magazine. Alfred J. Stein became the CEO of the company in 1982. Subsequently VLSI built its first fab in San Jose; eventually a second fab was built in San Antonio, Texas. VLSI had its initial public offering in 1983, and was listed on the stock market as (NASDAQ: VLSI). The company was later acquired by Philips and survives to this day as part of NXP Semiconductors.

Advanced tools for VLSI design

A VLSI VL82C106 Super I/O chip. The original business plan was to be a contract wafer fabrication company, but the venture investors wanted the company to develop IC (Integrated Circuit) design tools to help fill the foundry.

Thanks to its Caltech and UC Berkeley students, VLSI was an important pioneer in the electronic design automation industry. It offered a sophisticated package of tools, originally based on the 'lambda-based' design style advocated by Carver Mead and Lynn Conway. VLSI became an early vendor of standard cell (cell-based technology) to the merchant market in the early 80s where the other ASIC-focused company, LSI Logic, was a leader in gate arrays. Prior to VLSI's cell-based offering, the technology had been primarily available only within large vertically integrated companies with semiconductor units such as AT&T and IBM. VLSI's design tools eventually included not only design entry and simulation but eventually cellbased routing (chip compiler), a datapath compiler, SRAM and ROM compilers, and a state machine compiler. The tools were an integrated design solution for IC design and not just point tools, or more general purpose system tools. A designer could edit transistor-level polygons and/or logic schematics, then run DRC and LVS, extract parasitics from the layout and run Spice simulation, then back-annotate the timing or gate size changes into the logic schematic database. Characterization tools were integrated to generate FrameMaker Data Sheets for Libraries. VLSI eventually spun off the CAD and Library operation into Compass Design Automation but it never reached IPO before it was purchased by Avanti Corp. VLSI's physical design tools were critical not only to its ASIC business, but also in setting the bar for the commercial EDA industry. When VLSI and its main ASIC competitor, LSI Logic, were establishing the ASIC industry, commercially-available tools could not deliver the productivity necessary to support the physical design of hundreds of ASIC designs each year without the deployment of a substantial number of layout engineers. The companies' development of automated layout tools was a rational "make because there's nothing to buy" decision. The EDA industry finally caught up in the late 1980s when Tangent Systems released its TanCell and TanGate products. In 1989, Tangent was acquired by Cadence Design Systems (founded in 1988). Unfortunately, for all VLSI's initial competence in design tools, they were not leaders in semiconductor manufacturing technology. VLSI had not been timely in developing a 1.0 m

manufacturing process as the rest of the industry moved to that geometry in the late 80s. VLSI entered a long-term technology parthership with Hitachi and finally released a 1.0 m process and cell library (actually more of a 1.2 m library with a 1.0 m gate). As VLSI struggled to gain parity with the rest of the industry in semiconductor technology, the design flow was moving rapidly to a Verilog HDL and synthesis flow. Cadence acquired Gateway, the leader in Verilog hardware design language (HDL) and Synopsys was dominating the exploding field of design synthesis. As VLSI's tools were being eclipsed, VLSI waited too long to open the tools up to other fabs and Compass Design Automation was never a viable competitor to industry leaders. Meanwhile, VLSI entered the merchant high speed static RAM (SRAM) market as they needed a product to drive the semiconductor process technology development. All the large semiconductor companies built high speed SRAMs with cost structures VLSI could never match. VLSI withdrew once it was clear that the Hitachi process technology partnership was working. ARM Ltd was formed in 1990 as a semiconductor intellectual property licensor, backed by Acorn, Apple and VLSI. VLSI became a licensee of the powerful ARM processor and ARM finally funded processor tools. Initial adoption of the ARM processor was slow. Few applications could justify the overhead of an embedded 32 bit processor. In fact, despite the addition of further licensees, the ARM processor enjoyed little market success until they developed the novel 'thumb' extensions. Ericsson adopted the ARM processor in a VLSI chipset for its GSM handset designs in the early 1990s. It was the GSM boost that is the foundation of ARM the company/technology that it is today. Only in PC chipsets, did VLSI dominate in the early 90s. This product was developed by five engineers using the 'Megacells" in the VLSI library that led to a business unit at VLSI that almost equaled its ASIC business in revenue. VLSI eventually ceded the market to Intel because Intel was able to package-sell its processors, chipsets, and even board level products together.

VLSI also had an early partnership with PMC, a design group that had been nurtured of British Columbia Bell. When PMC wanted to divest its semiconductor intellectual property venture, VLSI's bid was beaten by a creative deal by Sierra Semiconductor. The telecom business unit management at VLSI opted to go it alone. PMC Sierra became one of the most important telecom ASSP vendors. Scientists and innovations from the 'design technology' part of VLSI found their way to Cadence Design Systems (by way of Redwood Design Automation). Compass Design Automation (VLSI's CAD and Library spin-off) was sold to Avant! Corporation, which itself was acquired by Synopsys. Global expansion VLSI maintained operations throughout the USA, and in Britain, France, Germany, Italy, Japan, Singapore and Taiwan. One of its key sites was in Tempe, Arizona, where a family of highly successful chipsets was developed for the IBM PC. In 1990, VLSI Technology, along with Acorn Computers and Apple Computer were the founding investing partners in ARM Ltd. Ericsson of Sweden, after many years of fruitful collaboration, was by 1998 VLSI's largest customer, with annual revenue of $120 million. VLSI's datapath compiler (VDP) was the valueadded differentiator that opened the door at Ericsson in 1987/8. The silicon revenue and GPM enabled by VDP must make it one of the most successful pieces of customer-configurable, nonmemory silicon intellectual property (SIP) in the history of the industry. Within the Wireless Products division, based at Sophia-Antipolis in France, VLSI developed a range of algorithms and circuits for the GSM standard and for cordless standards such as the European DECT and the Japanese PHS. Stimulated by its growth and success in the wireless handset IC area, Philips Electronics acquired VLSI in June 1999, for about $1 billion. The former components survive to this day as part of Philips spin-off NXP Semiconductors.

Hardware description language In electronics, a hardware description language or HDL is any language from a class of computer languages, specification languages, or modeling languages for formal description and design of electronic circuits, and most-commonly, digital logic. It can describe the circuit's operation, its design and organization, and tests to verify its operation by means of simulation.[citation needed] HDLs are standard text-based expressions of the spatial and temporal structure and behaviour of electronic systems. Like concurrent programming languages, HDL syntax and semantics includes explicit notations for expressing concurrency. However, in contrast to most software programming languages, HDLs also include an explicit notion of time, which is a primary attribute of hardware. Languages whose only characteristic is to express circuit connectivity between a hierarchy of blocks are properly classified as netlist languages used on electric computer-aided design (CAD). HDLs are used to write executable specifications of some piece of hardware. A simulation program, designed to implement the underlying semantics of the language statements, coupled with simulating the progress of time, provides the hardware designer with the ability to model a piece of hardware before it is created physically. It is this executability that gives HDLs the illusion of being programming languages, when they are more-precisely classed as specification languages or modeling languages. Simulators capable of supporting discrete-event (digital) and continuous-time (analog) modeling exist, and HDLs targeted for each are available. It is certainly possible to represent hardware semantics using traditional programming languages such as C++, although to function such programs must be augmented with extensive and unwieldy class libraries. Primarily, however, software programming languages do not include any capability for explicitly expressing time, and this is why they do not function as a hardware description language. Before the recent introduction of SystemVerilog, C++ integration with a logic simulator was one of the few ways to use OOP in hardware verification. SystemVerilog is the first major HDL to offer object orientation and garbage collection.

Using the proper subset of virtually any (hardware description or software programming) language, a software program called a synthesizer (or synthesis tool) can infer hardware logic operations from the language statements and produce an equivalent netlist of generic hardware primitives to implement the specified behaviour. Synthesizers generally ignore the expression of any timing constructs in the text. Digital logic synthesizers, for example, generally use clock edges as the way to time the circuit, ignoring any timing constructs. The ability to have a synthesizable subset of the language does not itself make a hardware description language.

History The first hardware description languages were ISP (Instruction Set Processor),[1] developed at Carnegie Mellon University, and KARL, developed at University of Kaiserslautern, both around 1977. ISP was, however, more like a software programming language used to describe relations between the inputs and the outputs of the design. Therefore, it could be used to simulate the design, but not to synthesize it. KARL included design calculus language features supporting VLSI chip floorplanning and structured hardware design, which was also the basis of KARL's interactive graphic sister language ABL, implemented in the early 1980s as the ABLED graphic VLSI design editor, by the telecommunication research center CSELT at Torino, Italy. In the mid 80's, a VLSI design framework was implemented around KARL and ABL by an international consortium funded by the commission of the European Union (chapter in

). In 1983 Data-I/O

introduced ABEL. It was targeted for describing programmable logical devices and was basically used to design finite state machines. The first modern HDL, Verilog, was introduced by Gateway Design Automation in 1985. Cadence Design Systems later acquired the rights to Verilog-XL, the HDL-simulator that would become the de-facto standard (of Verilog simulators) for the next decade. In 1987, a request from the U.S. Department of Defense led to the development of VHDL (VHSIC Hardware

Description Language, where VHSIC is Very High Speed Integrated Circuit). VHDL was based on the Ada programming language. Initially, Verilog and VHDL were used to document and simulate circuit-designs already captured and described in another form (such as a schematic file.) HDL-simulation enabled engineers to work at a higher level of abstraction than simulation at the schematic-level, and thus increased design capacity from hundreds of transistors to thousands[citation needed]. The introduction of logic-synthesis for HDLs pushed HDLs from the background into the foreground of digital-design. Synthesis tools compiled HDL-source files (written in a constrained format called RTL) into a manufacturable gate/transistor-level netlist description. Writing synthesizeable RTL files required practice and discipline on the part of the designer; compared to a traditional schematic-layout, synthesized-RTL netlists were almost always larger in area and slower in performance[citation

. Circuit design by a skilled engineer, using labor-intensive

schematic-capture/hand-layout, would almost always outperform its logically-synthesized equivalent, but synthesis's productivity advantage soon displaced digital schematic-capture to exactly those areas that were problematic for RTL-synthesis: extremely high-speed, low-power, or asynchronous circuitry. In short, logic synthesis propelled HDL technology into a central role for digital design. Within a few years, both VHDL and Verilog emerged as the dominant HDLs in the electronics industry, while older and less-capable HDLs gradually disappeared from use. But VHDL and Verilog share many of the same limitations: neither HDL is suitable for analog/mixed-signal circuit simulation. Neither possesses language constructs to describe recursively-generated logic structures. Specialized HDLs (such as Confluence) were introduced with the explicit goal of fixing a specific Verilog/VHDL limitation, though none were ever intended to replace VHDL/Verilog. Over the years, a lot of effort has gone into improving HDLs. The latest iteration of Verilog, formally known as IEEE 1800-2005 SystemVerilog, introduces many new features (classes, random variables, and properties/assertions) to address the growing need for better testbench

randomization, design hierarchy, and reuse. A future revision of VHDL is also in development, and is expected to match SystemVerilog's improvements. Design using HDL Efficiency gains realized using HDL means a majority of modern digital circuit design revolves around it. Most designs begin as a set of requirements or a high-level architectural diagram. Control and decision structures are often prototyped in flowchart applications, or entered in a state-diagram editor. The process of writing the HDL description is highly dependent on the nature of the circuit and the designer's preference for coding style . The HDL is merely the 'capture language'often beginning with a high-level algorithmic description such as MATLAB or a C++ mathematical model. Designers often use scripting languages (such as Perl) to automatically generate repetitive circuit structures in the HDL language. Special text editors offer features for automatic indentation, syntax-dependent coloration, and macro-based expansion of entity/architecture/signal declaration. The HDL code then undergoes a code review, or auditing. In preparation for synthesis, the HDL description is subject to an array of automated checkers. The checkers report deviations from standardized code guidelines, identify potential ambiguous code constructs before they can cause misinterpretation, and check for common logical coding errors, such as dangling ports or shorted outputs. This process aids in resolving errors before the code is synthesized. In industry parlance, HDL design generally ends at the synthesis stage. Once the synthesis tool has mapped the HDL description into a gate netlist, this netlist is passed off to the back-end stage. Depending on the physical technology (FPGA, ASIC gate array, ASIC standard cell), HDLs may or may not play a significant role in the back-end flow. In general, as the design flow progresses toward a physically realizable form, the design database becomes progressively more laden with technology-specific information, which cannot be stored in a generic HDL description. Finally, an integrated circuit is manufactured or programmed for use. Simulating and debugging HDL code

Main article: Logic simulation Essential to HDL design is the ability to simulate HDL programs. Simulation allows an HDL description of a design (called a model) to pass design verification, an important milestone that validates the design's intended function (specification) against the code implementation in the HDL description. It also permits architectural exploration. The engineer can experiment with design choices by writing multiple variations of a base design, then comparing their behavior in simulation. Thus, simulation is critical for successful HDL design. To simulate an HDL model, an engineer writes a top-level simulation environment (called a testbench). At minimum, a testbench contains an instantiation of the model (called the device under test or DUT), pin/signal declarations for the model's I/O, and a clock waveform. The testbench code is event driven: the engineer writes HDL statements to implement the (testbench-generated) reset-signal, to model interface transactions (such as a host bus read/write), and to monitor the DUT's output. An HDL simulator the program that executes the testbench maintains the simulator clock, which is the master reference for all events in the testbench simulation. Events occur only at the instants dictated by the testbench HDL (such as a reset-toggle coded into the testbench), or in reaction (by the model) to stimulus and triggering events. Modern HDL simulators have a full-featured graphical user interfaces, complete with a suite of debug tools. These allow the user to stop and restart the simulation at any time, insert simulator breakpoints (independent of the HDL code), and monitor or modify any element in the HDL model hierarchy. Modern simulators can also link the HDL environment to user-compiled libraries, through a defined PLI/VHPI interface. Linking is system-dependent (Win32/Linux/SPARC), as the HDL simulator and user libraries are compiled and linked outside the HDL environment. Design verification is often the most time-consuming portion of the design process, due to the disconnect between a device's functional specification, the designer's interpretation of the specification, and the imprecision[citation needed] of the HDL language. The majority of the initial test/debug cycle is conducted in the HDL simulator environment, as the early stage of the

design is subject to frequent and major circuit changes. An HDL description can also be prototyped and tested in hardware programmable logic devices are often used for this purpose. Hardware prototyping is comparatively more expensive than HDL simulation, but offers a real-world view of the design. Prototyping is the best way to check interfacing against other hardware devices and hardware prototypes. Even those running on slow FPGAs offer much faster simulation times than pure HDL simulation. Design Verification with HDLs Main article: Functional verification Historically, design verification was a laborious, repetitive loop of writing and running simulation test cases against the design under test. As chip designs have grown larger and more complex, the task of design verification has grown to the point where it now dominates the schedule of a design team. Looking for ways to improve design productivity, the EDA industry developed the Property Specification Language. In formal verification terms, a property is a factual statement about the expected or assumed behavior of another object. Ideally, for a given HDL description, a property or properties can be proven true or false using formal mathematical methods. In practical terms, many properties cannot be proven because they occupy an unbounded solution space. However, if provided a set of operating assumptions or constraints, a property checker can prove (or disprove) more properties, over the narrowed solution space. The assertions do not model circuit activity, but capture and document the "designer's intent" in the HDL code. In a simulation environment, the simulator evaluates all specified assertions, reporting the location and severity of any violations. In a synthesis environment, the synthesis tool usually operates with the policy of halting synthesis upon any violation. Assertion-based verification is still in its infancy, but is expected to become an integral part of the HDL design toolset. HDL and programming languages

A HDL is analogous to a software programming language, but with major differences. Many programming languages are inherently procedural (single-threaded), with limited syntactical and semantic support to handle concurrency. HDLs, on the other hand, resemble concurrent programming languages in their ability to model multiple parallel processes (such as flipflops, adders, etc.) that automatically execute independently of one another. Any change to the process's input automatically triggers an update in the simulator's process stack. Both programming languages and HDLs are processed by a compiler (usually called a synthesizer in the HDL case), but with different goals. For HDLs, 'compiler' refers to synthesis, a process of transforming the HDL code listing into a physically realizable gate netlist. The netlist output can take any of many forms: a "simulation" netlist with gate-delay information, a "handoff" netlist for post-synthesis place and route, or a generic industry-standard EDIF format (for subsequent conversion to a JEDEC-format file). On the other hand, a software compiler converts the source-code listing into a microprocessorspecific object-code, for execution on the target microprocessor. As HDLs and programming languages borrow concepts and features from each other, the boundary between them is becoming less distinct. However, pure HDLs are unsuitable for general purpose software application development, just as general-purpose programming languages are undesirable for modeling hardware. Yet as electronic systems grow increasingly complex, and reconfigurable systems become increasingly mainstream, there is growing desire in the industry for a single language that can perform some tasks of both hardware design and software programming. SystemC is an example of suchembedded system hardware can be modeled as non-detailed architectural blocks (blackboxes with modeled signal inputs and output drivers). The target application is written in C/C++, and natively compiled for the host-development system (as opposed to targeting the embedded CPU, which requires host-simulation of the embedded CPU). The high level of abstraction of SystemC models is well suited to early architecture exploration, as architectural modifications can be easily evaluated with little concern for signallevel implementation issues. However, the threading model used in SystemC and its reliance on shared memory mean that it does not handle parallel execution or lower level models well.

In an attempt to reduce the complexity of designing in HDLs, which have been compared to the equivalent of assembly languages, there are moves to raise the abstraction level of the design. Companies such as Cadence, Synopsys and Agility Design Solutions are promoting SystemC as a way to combine high level languages with concurrency models to allow faster design cycles for FPGAs than is possible using traditional HDLs. Approaches based on standard C or C++ (with libraries or other extensions allowing parallel programming) are found in the Catapult C tools from Mentor Graphics, the Impulse C tools from Impulse Accelerated Technologies, and the free and open-source ROCCC 2.0 tools from Jacquard Computing Inc. Annapolis Micro Systems, Inc.'s CoreFire Design Suite and National Instruments LabVIEW FPGA provide a graphical dataflow approach to high-level design entry. Languages such as SystemVerilog, SystemVHDL, and Handel-C seek to accomplish the same goal, but are aimed at making existing hardware engineers more productive versus making FPGAs more accessible to existing software engineers. There is more information on C to HDL and Flow to HDL in their respective articles