
A

SEMINAR REPORT
ON
Modern Era of Computing
(22nm Process)

By

Sonam Kumari

Guided by
Ashish Sharma
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
FACULTY OF ENGINEERING & TECHNOLOGY
JODHPUR NATIONAL UNIVERSITY
2013 - 2014

(Established under Section 2 (f) of the UGC Act, 1956)

Narnadi, Jhanwar Road, Jodhpur


Ph. No.: 02931-281551-54 Fax No.: 02931-281416
Website: www.jodhpurnationaluniversity.com Email: info@jodhpurnationaluniversity.com

FACULTY OF ENGINEERING AND TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

CERTIFICATE

This is to certify that the seminar entitled "Modern Era of Computing" has been
carried out by Sonam Kumari under my guidance in partial fulfillment of the
degree of Master of Technology in Computer Science & Engineering of Jodhpur
National University, Jodhpur during the academic year 2013-2014. To the best of
my knowledge and belief, this work has not been submitted elsewhere for the
award of any other degree.

Guide

Examiner

HOD

ACKNOWLEDGEMENT
Even a burning desire can't be sustained in the absence of a proper
infrastructure. One has to have a direction to proceed in, a correct path to
follow. I attempt to thank all those people who have helped me in this small
endeavor.
I would like to express my sincere gratitude to Dr. V.P. Gupta, Dean,
FE&T, Jodhpur National University, and Prof. D.K. Mehta, Chairperson,
CSE Department, for providing me the right environment along with all the
required facilities.
I am fortunate to have worked under the able guidance of Ashish
Sharma. I wish to express my sincere gratitude to her. Her painstaking
guidance despite a very busy schedule, her inspiring supervision and keen
interest, invaluable and tireless devotion, scientific approach and brilliant
technological acumen have been a source of tremendous help.
Finally, I extend my thanks to all those sources which provided me
information related to my topic, which proved very helpful for my seminar
report.

Sonam Kumari

PAGE INDEX

S.No.   TOPICS

        ABSTRACT
1.      INTRODUCTION
        1.1 Moore's Law
        1.2 History
2.      GENERATIONS
3.      45 nm Process
4.      32 nm Process
5.      Carbon Nano Tubes
        5.1 Introduction
        5.2 Applications
6.      22 nm Process
        6.1 Introduction
        6.2 Comparison between 32nm Process and 22nm Process
        6.3 Microarchitecture
        6.4 Facts
7.      References

FIGURE INDEX

Figure No.   Figure Caption
1.1          Moore's Law chart
1.2          Design Rule
1.3          Transistor Count by Year Graph
1.4          Node Size v/s nm process Graph
3.1          NMOS and PMOS Transistor
3.2          Stress Graph of Transistor
3.3          iOn & iOff Graph of 10% iDsat Benefit
3.4          iOn & iOff Graph of 6% iDsat Benefit
3.5          iOn & iOff Graph of Various iDsat Benefit
3.6          RO Gain Data
3.7          RO Gain Data v/s 65 nm Results
3.8          Microscopic Cross Section of 65nm process
5.1          Carbon Nano tubes Structure
5.2          Carbon Nano tubes Structure
5.3          Carbon Nano tubes as transistors
6.1          Traditional Planar Transistor
6.2          22nm Tri-gate Transistor
6.3          22nm Tri-gate Transistor: 2D
6.4          22nm Tri-gate Transistor: 3D
6.5          Microscopic Image of 32nm Planar Transistor
6.6          Microscopic Image of 22nm Planar Transistor
6.7          Microarchitecture

Abstract:
Intel has deployed a fundamentally different technology for future
microprocessor families: 3D transistors manufactured at 22 nm. These new
transistors enable Intel to continue to relentlessly pursue Moore's Law and to
ensure that the pace of technology advancement consumers expect can
continue for years to come.
Previously, transistors, the core of microprocessors, were 2D (planar) devices.
Intel's 3D Tri-Gate transistor, and the ability to manufacture it in high volume,
mark a dramatic change in the fundamental structure of the computer chip.
This also means Intel can continue to lead in powering products, from the
world's fastest supercomputers to very small mobile handhelds.

Smaller is Better
Transistor size and structure are at the very center of delivering the benefits of
Moore's Law to the end user. The smaller and more power efficient the
transistor, the better. Intel continues to predictably shrink its manufacturing
technology in a series of "world firsts": 45 nm with high-k/metal gate in 2007;
32 nm in 2009; and now 22 nm with the world's first 3D transistor in a high
volume logic process beginning in 2011.
With a smaller, 3D transistor, Intel can design even more powerful processors
with incredible power efficiency. The new technology enables innovative
microarchitectures, System on Chip (SoC) designs, and new products, from
servers and PCs to smart phones and innovative consumer products.

Transistors in the 3rd Dimension


Intel's 3D Tri-Gate transistor uses three gates wrapped around the silicon
channel in a 3D structure, enabling an unprecedented combination of
performance and energy efficiency. Intel designed the new transistor to provide
unique ultra-low power benefits for use in handheld devices, like smart phones
and tablets, while also delivering improved performance normally expected for
high-end processors.

Enabling Processor Innovation


The new transistors are so impressively efficient at low voltages they allow the
Intel Atom processor design team to innovate new architectural
approaches for 22 nm Intel Atom microarchitecture. The new design
specifically maximizes the benefit of the extremely low-power 3D Tri-Gate
transistor technology. And, Intel's future SoC products based on the 22 nm 3D
Tri-Gate transistors will hit sub-1 mW idle power, for incredibly low-power
SoCs.

Intel Continues to Lead, Users Continue to Benefit


Introduced at the end of 2011, the 3rd generation Intel Core processor was
the first high-volume chip to use 3D transistors.
As Intel continues its product leadership for servers, PCs, laptops, and
handheld devices with 22 nm 3D transistor technology, consumers and
businesses should expect faster computing and graphics, and longer battery life
in a variety of sleek form factors.

CHAPTER-1
INTRODUCTION

Moore's law
Moore's law is the observation that, over the history of computing hardware,
the number of transistors in a dense integrated circuit doubles approximately
every two years. The law is named after Gordon E. Moore, co-founder of the
Intel Corporation, who described the trend in his 1965 paper. His prediction
has proven to be accurate, in part because the law is now used in the
semiconductor industry to guide long-term planning and to set targets for
research and development. The capabilities of many digital electronic
devices are strongly linked to Moore's law: quality-adjusted microprocessor
prices, memory capacity, sensors and even the number and size of pixels in
digital cameras. All of these are improving at roughly exponential rates as
well.

This exponential improvement has dramatically enhanced the impact of
digital electronics in nearly every segment of the world economy. Moore's
law describes a driving force of technological and social change,
productivity and economic growth in the late 20th and early 21st centuries.
The period is often quoted as 18 months because of Intel executive David
House, who predicted that chip performance would double every 18 months
(being a combination of the effect of more transistors and their being faster).
Although this trend has continued for more than half a century, Moore's law
should be considered an observation or conjecture and not a physical or
natural law. Sources in 2005 expected it to continue until at least 2015 or
2020. However, the 2010 update to the International Technology Roadmap
for Semiconductors predicted that growth would slow at the end of
2013, after which transistor counts and densities were to double only every
three years.

History
For the 35th anniversary issue of Electronics Magazine, which was published
on April 19, 1965, Gordon E. Moore, who was then working as the
Director of R&D at Fairchild Semiconductor, was asked to predict what was
going to happen in the semiconductor components industry over the next 10
years. His response was a brief article entitled, "Cramming more
components onto integrated circuits". Within his editorial, he speculated that
by 1975 it would be possible to contain as many as 65,000 components on a
single quarter-inch semiconductor.
"The complexity for minimum component costs has increased at a rate of
roughly a factor of two per year. Certainly over the short term this rate can
be expected to continue, if not to increase. Over the longer term, the rate of
increase is a bit more uncertain, although there is no reason to believe it
will not remain nearly constant for at least 10 years."
His reasoning was a log-linear relationship between device complexity
(higher circuit density at reduced cost) and time.
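A log-linear relationship means that the component count grows by a constant factor per unit of time, so it plots as a straight line on a logarithmic axis. The sketch below illustrates this under stated assumptions: the starting value of about 64 components per chip in 1965 and the function name are illustrative choices, not figures taken from this report.

```python
import math


def projected_components(start_count, start_year, target_year, doubling_years=1.0):
    """Project a component count assuming a fixed doubling period."""
    doublings = (target_year - start_year) / doubling_years
    return start_count * 2 ** doublings


# Assumed starting point (illustrative): about 64 components per chip in 1965.
count_1975 = projected_components(64, 1965, 1975)
print(f"Projected components per chip in 1975: {count_1975:,.0f}")

# On a log2 scale the count rises by exactly one unit per doubling period.
print(f"Doublings between 1965 and 1975: {math.log2(count_1975) - math.log2(64):.1f}")
```

With yearly doubling, ten doublings from the assumed starting point land close to the 65,000 components Moore speculated about for 1975.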
In 1975 Moore slowed his forecast regarding the rate of density-doubling,
stating circuit density-doubling would occur every 24 months. During the
1975 IEEE International Electron Devices Meeting he outlined his analysis
of the contributing factors to this exponential behavior:
Die sizes were increasing at an exponential rate, and as defect
densities decreased, chip manufacturers could work with larger areas
without losing yield.
Simultaneous evolution to finer minimum dimensions.
What Moore called "circuit and device cleverness".
Shortly after the 1975 IEEE Meeting, Caltech professor Carver
Mead popularized the term "Moore's law".
Despite a popular misconception, Moore is adamant that he did not predict a
doubling "every 18 months." Rather, David House, an Intel colleague, had
factored in the increasing performance of transistors to conclude that
integrated circuits would double in performance every 18 months.
Predictions of similar increases in computer power had existed years
prior. Douglas Engelbart, for example, discussed the projected downscaling
of integrated circuit size in 1959 or 1960.
In April 2005, Intel offered US$10,000 to purchase a copy of the
original Electronics Magazine issue in which Moore's article appeared. An
engineer living in the United Kingdom was the first to find a copy and offer
it to Intel.
Other formulations and similar laws
Several measures of digital technology are improving at exponential rates
related to Moore's law, including the size, cost, density and speed of
components. Moore himself wrote only about the density of components, "a
component being a transistor, resistor, diode or capacitor," at minimum cost.
Transistors per integrated circuit: The most popular formulation is of the
doubling of the number of transistors on integrated circuits every two years.
At the end of the 1970s, Moore's law became known as the limit for the
number of transistors on the most complex chips. The graph at the top shows
this trend holds true today.
Density at minimum cost per transistor: This is the formulation given in
Moore's 1965 paper. It is not just about the density of transistors that can be
achieved, but about the density of transistors at which the cost per transistor
is the lowest. As more transistors are put on a chip, the cost to make each
transistor decreases, but the chance that the chip will not work due to a
defect increases. In 1965, Moore examined the density of transistors at
which cost is minimized, and observed that, as transistors were made smaller
through advances in photolithography, this number would increase at "a rate
of roughly a factor of two per year".
Dennard scaling: This suggests that power requirements are proportional to
area (both voltage and current being proportional to length) for transistors.
Combined with Moore's law, performance per watt would grow at roughly
the same rate as transistor density, doubling every 1 to 2 years. According to
Dennard scaling, transistor dimensions are scaled by 30% (0.7x) every
technology generation, thus reducing their area by 50%. This reduces the
delay by 30% (0.7x) and therefore increases operating frequency by about
40% (1.4x). Finally, to keep the electric field constant, voltage is reduced by
30%, reducing energy by 65% and power (at 1.4x frequency) by 50%.
Therefore, in every technology generation transistor density doubles, the
circuit becomes 40% faster, and power consumption (with twice the
number of transistors) stays the same.
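The percentages in the Dennard scaling paragraph follow from one linear shrink factor. The sketch below reproduces that arithmetic; it assumes the usual idealisation that capacitance scales with length and that switching energy goes as C·V², and the function and key names are illustrative.

```python
def dennard_generation(scale=0.7):
    """Relative changes for one generation of ideal Dennard scaling.

    `scale` is the linear shrink factor (0.7 means dimensions shrink by 30%).
    All returned values are ratios relative to the previous generation.
    """
    area = scale ** 2                    # transistor area
    density = 1.0 / area                 # transistors per unit chip area
    frequency = 1.0 / scale              # delay scales with length
    voltage = scale                      # lowered to keep the field constant
    capacitance = scale                  # assumed to scale with length
    energy = capacitance * voltage ** 2  # switching energy ~ C * V^2
    power_per_transistor = energy * frequency
    power_density = power_per_transistor * density
    return {
        "area": area,
        "density": density,
        "frequency": frequency,
        "energy": energy,
        "power_per_transistor": power_per_transistor,
        "power_density": power_density,
    }


for name, ratio in dennard_generation().items():
    print(f"{name:>22}: {ratio:.2f}x")
```

The power density ratio comes out at 1.00x, which is the claim that twice as many transistors in the same area consume the same total power.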
The exponential processor transistor growth predicted by Moore does not
always translate into exponentially greater practical CPU performance. Since
around 2005 to 2007, Dennard scaling appears to have broken down, so even
though Moore's law continued for several years after that, it has not yielded
dividends in improved performance. The primary reason cited for the
breakdown is that at small sizes, current leakage poses greater challenges,
and also causes the chip to heat up, which creates a threat of thermal
runaway and therefore further increases energy costs. The breakdown of
Dennard scaling prompted a switch among some chip manufacturers to a
greater focus on multicore processors, but the gains offered by switching to
more cores are lower than the gains that would be achieved had Dennard
scaling continued. In another departure from Dennard scaling, Intel
microprocessors adopted a non-planar tri-gate FinFET at 22 nm in 2012,
which is faster and consumes less power than a conventional planar
transistor.
Quality-adjusted price of IT equipment: The price of Information
Technology (IT), computers and peripheral equipment, adjusted for quality
and inflation, declined 16% per year on average over the five decades from
1959 to 2009. However, the pace accelerated to 23% per year in 1995-1999,
triggered by faster IT innovation, and later slowed to 2% per year in
2010-2013. The rate of quality-adjusted microprocessor price improvement
likewise varies, and is not linear on a log scale. Microprocessor price
improvement accelerated during the late 1990s, reaching 60% per year
(halving every nine months) versus the typical 30% improvement rate
(halving every two years) during the years earlier and later.
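The link between an annual rate of price decline and a halving time can be checked with a short calculation. The sketch below assumes the decline compounds at a constant rate each year; the function name is illustrative.

```python
import math


def halving_time_years(annual_decline):
    """Years for a price to halve at a constant fractional decline per year."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


for rate in (0.60, 0.30, 0.16):
    months = halving_time_years(rate) * 12
    print(f"{rate:.0%} decline per year: price halves about every {months:.0f} months")
```

A 60% annual decline gives a halving time of roughly nine months, and a 30% decline gives a little under two years, in line with the figures quoted above.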
The number of transistors per chip cannot explain quality-adjusted
microprocessor prices fully. Moore's 1995 paper does not limit Moore's law
to strict linearity or to transistor count: "The definition of 'Moore's Law' has
come to refer to almost anything related to the semiconductor industry that
when plotted on semi-log paper approximates a straight line. I hesitate to
review its origins and by doing so restrict its definition."
Moore (2003) credits chemical mechanical planarization (chip smoothing)
with increasing the connectivity of microprocessors from two or three metal
layers in the early 1990s to seven in 2003. This has leveled off at 9-11 layers
since 2007. Connectivity improves performance, and relieves network
congestion. Just as additional floors may not enlarge a building's footprint,
connectivity is not tallied in transistor count. Microprocessors rely more on
communications (interconnect) than do DRAM chips, which have three or
four metal layers. Microprocessor prices in the late 1990s improved faster
than DRAM prices.
Hard disk drive areal density: A similar law (sometimes called Kryder's law)
has held for hard disk drive areal density. The rate of progress in disk storage
over the past decades has sped up more than once, corresponding to the
utilization of error correcting codes, the magnetoresistive effect and the giant
magnetoresistive effect. The outlook for the rate of progress slowed in recent
years, because of noise related to smaller grain size of the disk media,
thermal stability and writability using available magnetic fields.
Network capacity: According to Gerald Butters, the former head of
Lucent's Optical Networking Group at Bell Labs, there is another version,
called Butters' Law of Photonics,[48] a formulation which deliberately
parallels Moore's law. Butters' law[49] says that the amount of data coming
out of an optical fiber is doubling every nine months. Thus, the cost of
transmitting a bit over an optical network decreases by half every nine
months. The availability of wavelength-division multiplexing (sometimes
called WDM) increased the capacity that could be placed on a single fiber by
as much as a factor of 100. Optical networking and dense wavelength-division
multiplexing (DWDM) are rapidly bringing down the cost of
networking, and further progress seems assured. As a result, the wholesale
price of data traffic collapsed in the dot-com bubble. Nielsen's Law says that
the bandwidth available to users increases by 50% annually.
Pixels per dollar: Similarly, Barry Hendy of Kodak Australia has plotted
pixels per dollar as a basic measure of value for a digital camera,
demonstrating the historical linearity (on a log scale) of this market and the
opportunity to predict the future trend of digital camera price, LCD and LED
screens and resolution.
The great Moore's law compensator (TGMLC), generally referred to as
bloat, and also known as Wirth's law, is the principle that successive
generations of computer software acquire enough bloat to offset the
performance gains predicted by Moore's law. In a 2008 article in InfoWorld,
Randall C. Kennedy, formerly of Intel, introduces this term using successive
versions of Microsoft Office between the year 2000 and 2007 as his premise.
Despite the gains in computational performance during this time period
according to Moore's law, Office 2007 performed the same task at half the
speed on a prototypical year 2007 computer as compared to Office 2000 on a
year 2000 computer.
Library expansion was calculated in 1945 by Fremont Rider to double in
capacity every 16 years, if sufficient space were made available. He
advocated replacing bulky, decaying printed works with miniaturized
microform analog photographs, which could be duplicated on-demand for
library patrons or other institutions. He did not foresee the digital technology
that would follow decades later to replace analog microform with digital
imaging, storage, and transmission mediums. Automated, potentially lossless
digital technologies allowed vast increases in the rapidity of information
growth in an era that is now sometimes called an Information Age.
The Carlson Curve is a term coined by The Economist to describe the
biotechnological equivalent of Moore's law, and is named after author Rob
Carlson. Carlson accurately predicted that the doubling time of DNA
sequencing technologies (measured by cost and performance) would be at
least as fast as Moore's law. Carlson Curves illustrate the rapid (in some
cases hyperexponential) decreases in cost, and increases in performance, of a
variety of technologies, including DNA sequencing, DNA synthesis and a
range of physical and computational tools used in protein expression and in
determining protein structures.
As a target for industry and a self-fulfilling prophecy
Although Moore's law was initially made in the form of an observation and
forecast, the more widely it became accepted, the more it served as a goal
for an entire industry. This drove both marketing and engineering
departments of semiconductor manufacturers to focus enormous energy
aiming for the specified increase in processing power that it was presumed
one or more of their competitors would soon actually attain. In this regard, it
can be viewed as a self-fulfilling prophecy.
Moore's second law
As the cost of computer power to the consumer falls, the cost for producers
to fulfill Moore's law follows an opposite trend: R&D, manufacturing, and
test costs have increased steadily with each new generation of chips. Rising
manufacturing costs are an important consideration for the sustaining of
Moore's law. This has led to the formulation of Moore's second law, also
called Rock's law, which is that the capital cost of a semiconductor fab also
increases exponentially over time.
Major enabling factors and future trends
Numerous innovations by a large number of scientists and engineers have
helped significantly to sustain Moore's law since the beginning of the
integrated circuit (IC) era. Whereas assembling a detailed list of such
significant contributions would be as desirable as it would be difficult,
below just a few innovations are listed as examples of breakthroughs that
have played a critical role in the advancement of integrated circuit
technology by more than seven orders of magnitude in less than five
decades:
The foremost contribution, which is the raison d'être for Moore's law, is the
invention of the integrated circuit itself, credited contemporaneously to Jack
Kilby at Texas Instruments and Robert Noyce at Fairchild Semiconductor.
The invention of the complementary metal-oxide-semiconductor (CMOS)
process by Frank Wanlass in 1963. A number of advances in CMOS
technology by many workers in the semiconductor field since the work of
Wanlass have enabled the extremely dense and high-performance ICs that
the industry makes today.
The invention of the dynamic random access memory (DRAM) technology
by Robert Dennard at IBM in 1967, which made it possible to fabricate
single-transistor memory cells, and the invention of flash memory by Fujio
Masuoka at Toshiba in the 1980s, leading to low-cost, high-capacity memory
in diverse electronic products.
The invention of chemically amplified photoresist by C. Grant Willson,
Hiroshi Ito and J.M.J. Fréchet at IBM c. 1980, which was 10 to 100 times more
sensitive to ultraviolet light. IBM introduced chemically amplified
photoresist for DRAM production in the mid-1980s.
The invention of deep UV excimer laser photolithography by Kanti Jain at
IBM c. 1980, which has enabled the smallest features in ICs to shrink from 800
nanometers in 1990 to as low as 22 nanometers in 2012. This built on the
invention of the excimer laser in 1970 by Nikolai Basov, V. A. Danilychev
and Yu. M. Popov, at the Lebedev Physical Institute. From a broader
scientific perspective, the invention of excimer laser lithography has been
highlighted as one of the major milestones in the 50-year history of the laser.
The interconnect innovations of the late 1990s. IBM developed CMP or
chemical mechanical planarization c.1980, based on the centuries-old
polishing process for making telescope lenses. CMP smooths the chip
surface. Intel used chemical-mechanical polishing to enable additional layers
of metal wires in 1990; higher transistor density (tighter spacing) via trench
isolation, local polysilicon (wires connecting nearby transistors) and
improved wafer yield (all in 1995). Higher yield, the fraction of working
chips on a wafer, reduces manufacturing cost. IBM with assistance from
Motorola used CMP for lower electrical resistance copper interconnect
instead of aluminum in 1997.
Computer industry technology roadmaps predict (as of 2001) that Moore's
law will continue for several generations of semiconductor chips. Depending
on the doubling time used in the calculations, this could mean up to a
hundredfold increase in transistor count per chip within a decade. The
semiconductor industry technology roadmap uses a three-year doubling time
for microprocessors, leading to a tenfold increase in the next decade. Intel
was reported in 2005 as stating that the downsizing of silicon chips with
good economics can continue during the next decade, and in 2008 as
predicting the trend through 2029.
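How strongly the projected growth depends on the assumed doubling time can be seen with a small calculation. The sketch below is illustrative only: the three doubling times are example inputs (18 months, two years, three years), and the function name is hypothetical.

```python
def growth_over(years, doubling_time_years):
    """Multiplicative growth over `years` for a given doubling time."""
    return 2 ** (years / doubling_time_years)


for doubling in (1.5, 2.0, 3.0):
    factor = growth_over(10, doubling)
    print(f"Doubling every {doubling} years: about {factor:.0f}x in a decade")
```

An 18-month doubling time gives roughly a hundredfold increase over ten years, while the three-year doubling time used by the industry roadmap gives roughly a tenfold increase.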
Some of the new directions in research that may allow Moore's law to
continue are:
Researchers from IBM and Georgia Tech created a new speed record when
they ran a supercooled silicon-germanium transistor above 500 GHz at a
temperature of 4.5 K (-269 °C; -452 °F).
In April 2008, researchers at HP Labs announced the creation of a working
memristor, a fourth basic passive circuit element whose existence had
previously only been theorized. The memristor's unique properties permit
the creation of smaller and better-performing electronic devices.
In February 2010, researchers at the Tyndall National Institute in Cork,
Ireland announced a breakthrough in transistors with the design and
fabrication of the world's first junctionless transistor. The research led by
Professor Jean-Pierre Colinge was published in Nature Nanotechnology and
describes a control gate around a silicon nanowire that can tighten around
the wire to the point of closing down the passage of electrons without the
use of junctions or doping. The researchers claim that the new junctionless
transistors can be produced at 10-nanometer scale using existing fabrication
techniques.
In April 2011, a research team at the University of Pittsburgh announced the
development of a single-electron transistor 1.5 nanometers in diameter made
out of oxide-based materials. According to the researchers, three "wires"
converge on a central "island" which can house one or two electrons.
Electrons tunnel from one wire to another through the island. Conditions on
the third wire result in distinct conductive properties including the ability of
the transistor to act as a solid state memory.
In February 2012, a research team at the University of New South Wales
announced the development of the first working transistor consisting of a
single atom placed precisely in a silicon crystal (not just picked from a large
sample of random transistors). Moore's law predicted this milestone to be
reached in the lab by 2020.
In April 2014, bioengineers at Stanford University developed a new circuit
board modeled on the human brain. 16 custom designed "Neurocore" chips
simulate 1 million neurons and billions of synaptic connections. This
Neurogrid is claimed to be 9,000 times faster and more energy efficient than
a typical PC. The cost of the prototype was $40,000; however with current
technology a similar Neurogrid could be made for $400.
The advancement of nanotechnology could spur the creation of microscopic
computers and restore Moore's Law to its original rate of growth.
Ultimate limits of the law
Atomistic simulation result for formation of inversion channel (electron
density) and attainment of threshold voltage (I-V) in a nanowire MOSFET.
Note that the threshold voltage for this device lies around 0.45 V. Nanowire
MOSFETs lie towards the end of the ITRS roadmap for scaling devices
below 10 nm gate lengths.
On 13 April 2005, Gordon Moore stated in an interview that the law cannot
be sustained indefinitely: "It can't continue forever. The nature of
exponentials is that you push them out and eventually disaster happens". He
also noted that transistors would eventually reach the limits of
miniaturization at atomic levels:
"In terms of size [of transistors] you can see that we're approaching the size
of atoms which is a fundamental barrier, but it'll be two or three generations
before we get that far, but that's as far out as we've ever been able to see.
We have another 10 to 20 years before we reach a fundamental limit. By
then they'll be able to make bigger chips and have transistor budgets in the
billions."
In January 1995, the Digital Alpha 21164 microprocessor had 9.3 million
transistors. This 64-bit processor was a technological spearhead at the time,
even if the circuit's market share remained average. Six years later, a state of
the art microprocessor contained more than 40 million transistors. It is
theorised that with further miniaturisation, by 2015 these processors should
contain more than 15 billion transistors, and by 2020 will be in molecular
scale production, where each molecule can be individually positioned.
In 2003, Intel predicted the end would come between 2013 and 2018 with 16
nanometer manufacturing processes and 5 nanometer gates, due to quantum
tunnelling, although others suggested chips could just get bigger, or become
layered. In 2008 it was noted that for the last 30 years it has been predicted
that Moore's law would last at least another decade.
Some see the limits of the law as being in the distant future. Lawrence
Krauss and Glenn D. Starkman announced an ultimate limit of around 600
years in their paper, based on rigorous estimation of the total
information-processing capacity of any system in the Universe, which is
limited by the Bekenstein bound. On the other hand, based on first principles,
there are predictions that Moore's law will collapse in the next few decades
(20 to 40 years).
One could also limit the theoretical performance of a rather practical
"ultimate laptop" with a mass of one kilogram and a volume of one litre.
This is done by considering the speed of light, the quantum scale, the
gravitational constant and the Boltzmann constant, giving a performance of
5.4258 × 10^50 logical operations per second on approximately 10^31 bits.
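The operations-per-second figure for the one-kilogram "ultimate laptop" can be approximately reproduced. The sketch below assumes the limit in question is the Margolus-Levitin bound, 2E/(πħ) with E = mc², which this report does not state explicitly; the function name and constant names are illustrative.

```python
import math

SPEED_OF_LIGHT = 299792458.0        # metres per second (exact SI value)
REDUCED_PLANCK = 1.054571817e-34    # joule seconds (CODATA 2018 value of hbar)


def max_ops_per_second(mass_kg):
    """Upper bound on logical operations per second for a given rest mass.

    Assumes the Margolus-Levitin bound: 2 * E / (pi * hbar), with E = m * c^2.
    """
    energy_joules = mass_kg * SPEED_OF_LIGHT ** 2
    return 2.0 * energy_joules / (math.pi * REDUCED_PLANCK)


print(f"Bound for 1 kg of matter: {max_ops_per_second(1.0):.4e} operations per second")
```

Under these assumptions the result is on the order of 5.4 × 10^50 operations per second; small differences in the last digits depend on the values used for the physical constants.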
Then again, the law has often met obstacles that first appeared
insurmountable but were indeed surmounted before long. In that sense,
Moore says he now sees his law as more beautiful than he had realized:
"Moore's law is a violation of Murphy's law. Everything gets better and
better."
Futurists and Moore's law
Kurzweil's extension of Moore's law from integrated circuits to earlier
transistors, vacuum tubes, relays and electromechanical computers.
If the current trend continues to 2020, the number of transistors would reach
32 billion.
Futurists such as Ray Kurzweil, Bruce Sterling, and Vernor Vinge believe
that the exponential improvement described by Moore's law will ultimately
lead to a technological singularity: a period where progress in technology
occurs almost instantly.
Although Kurzweil agrees that by 2019 the current strategy of ever-finer
photolithography will have run its course, he speculates that this does not
mean the end of Moore's law:
Moore's law of Integrated Circuits was not the first, but the fifth paradigm to
forecast accelerating price-performance ratios. Computing devices have
been consistently multiplying in power (per unit of time) from the
mechanical calculating devices used in the 1890 U.S. Census, to [Newman]
relay-based "[Heath] Robinson" machine that cracked the Lorenz cipher, to
the CBS vacuum tube computer that predicted the election of Eisenhower, to
the transistor-based machines used in the first space launches, to the
integrated-circuit-based personal computer.
Kurzweil speculates that it is likely that some new type of technology (e.g.
optical, quantum computers, DNA computing) will replace current
integrated-circuit technology, and that Moore's Law will hold true long after
2020.
Seth Lloyd shows how the potential computing capacity of a kilogram of
matter equals pi times energy divided by Planck's constant. Since the energy
is such a large number and Planck's constant is so small, this equation
generates an extremely large number: about 5.0 × 10^50 operations per
second.
He believes that the exponential growth of Moore's law will continue
beyond the use of integrated circuits into technologies that will lead to the
technological singularity. The Law of Accelerating Returns described by
Ray Kurzweil has in many ways altered the public's perception of Moore's
law. It is a common (but mistaken) belief that Moore's law makes
predictions regarding all forms of technology, when it was originally
intended to apply only to semiconductor circuits. Many futurists still use the
term Moore's law in this broader sense to describe ideas like those put forth
by Kurzweil. Kurzweil has hypothesised that Moore's law will apply, at
least by inference, to any problem that can be attacked by digital computers
and is in its essence also a digital problem. Therefore, because of the digital
coding of DNA, progress in genetics may also advance at a Moore's law
rate. Moore himself, who never intended his law to be interpreted so broadly,
has quipped:
"Moore's law has been the name given to everything that changes
exponentially. I say, if Gore invented the Internet, I invented the exponential."
Consequences and limitations
Technological change is a combination of more and of better technology. A
2011 study in the journal Science showed that the peak of the rate of change
of the world's capacity to compute information was in the year 1998, when
the world's technological capacity to compute information on general-purpose
computers grew at 88% per year. Since then, technological change
has clearly slowed. In recent times, every new year allowed mankind to
carry out roughly 60% of the computations that could have possibly been
executed by all existing general-purpose computers before that year. This is
still exponential, but shows the varying nature of technological change.
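To put the 88% annual growth figure in more familiar Moore's-law terms, a constant yearly growth rate can be converted into a doubling time. The short sketch below does that conversion. The function name is illustrative, and treating the rate as constant over time is a simplifying assumption.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant fractional growth rate per year."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # 0.88 is the 1998 peak growth rate quoted in the text.
    print(round(doubling_time_years(0.88), 2), "years to double")
```

At 88% per year the capacity doubles in a little over one year, which is noticeably faster than the two-year doubling usually associated with Moore's law.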

The primary driving force of economic growth is the growth of productivity,
and Moore's law factors into productivity. Moore (1995) expected that the
rate of technological progress would be controlled by financial
realities. However, the reverse could and did occur around the late 1990s,
with economists reporting that "Productivity growth is the key economic
indicator of innovation." An acceleration in the rate of semiconductor
progress contributed to a surge in US productivity growth, which reached
3.4% per year in 1997-2004, outpacing the 1.6% per year during both
1972-1996 and 2005-2013. As economist Richard G. Anderson notes, "Numerous
studies have traced the cause of the productivity acceleration to
technological innovations in the production of semiconductors that sharply
reduced the prices of such components and of the products that contain them
(as well as expanding the capabilities of such products)."
Intel transistor gate length trend. Transistor scaling has slowed down
significantly at advanced (smaller) nodes.
While physical limits to transistor scaling such as source-to-drain leakage,
limited gate metals, and limited options for channel material have been
reached, new avenues for continued scaling are open. The most promising of
these approaches rely on using the spin state of the electron (spintronics), tunnel
junctions, and advanced confinement of channel materials via nano-wire
geometry. A comprehensive list of available device choices shows that a
wide range of device options is open for continuing Moore's law into the
next few decades. Spin-based logic and memory options are actively being
developed in industrial labs as well as academic labs.
Another source of improved performance is in microarchitecture techniques
exploiting the growth of available transistor count. Out-of-order execution
and on-chip caching and prefetching reduce the memory latency bottleneck
at the expense of using more transistors and increasing the processor
complexity. These increases are empirically described by Pollack's Rule,
which states that performance increases due to microarchitecture techniques
are roughly proportional to the square root of the number of transistors or
the area of a processor.
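Pollack's Rule can be tried out with a few lines of arithmetic. The sketch below is only an illustration of the square-root relationship as stated above; the function name is hypothetical, and the rule itself is an empirical rule of thumb rather than an exact law.

```python
import math


def pollack_speedup(transistor_ratio: float) -> float:
    """Estimated performance ratio for a given transistor-count (or area) ratio."""
    return math.sqrt(transistor_ratio)


if __name__ == "__main__":
    for ratio in (1.45, 2.0, 4.0):
        gain_percent = (pollack_speedup(ratio) - 1.0) * 100.0
        print(f"{ratio:.2f}x transistors -> about {gain_percent:.0f}% more performance")
```

Under this rule, a core with 45% more transistors gains only about 20% in performance, and quadrupling the transistor budget is needed to double single-core performance.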
For years, processor makers delivered increases in clock rates and
instruction-level parallelism, so that single-threaded code executed faster on
newer processors with no modification. Now, to manage CPU power
dissipation, processor makers favor multi-core chip designs, and software
has to be written in a multi-threaded manner to take full advantage of the
hardware. Many multi-threaded development paradigms introduce overhead,
and will not see a linear increase in speed vs number of processors. This is
particularly true while accessing shared or dependent resources, due to lock
contention. This effect becomes more noticeable as the number of processors
increases. There are cases where a roughly 45% increase in processor
transistors has translated to a roughly 10-20% increase in processing power.
On the other hand, processor manufacturers are taking advantage of the 'extra
space' that the transistor shrinkage provides to add specialized processing
units to deal with features such as graphics, video and cryptography. For one
example, Intel's Parallel JavaScript extension not only adds support for
multiple cores, but also for the other non-general processing features of their
chips, as part of the migration in client side scripting towards HTML5.
A negative implication of Moore's law is obsolescence, that is, as
technologies continue to rapidly "improve", these improvements can be
significant enough to rapidly render predecessor technologies obsolete. In
situations in which security and survivability of hardware or data are
paramount, or in which resources are limited, rapid obsolescence can pose
obstacles to smooth or continued operations. Because of the toxic materials
used in the production of modern computers, obsolescence if not properly
managed can lead to harmful environmental impacts.
Moore's law has significantly impacted the performance of other
technologies: Michael S. Malone wrote of a Moore's War following the
apparent success of shock and awe in the early days of the Iraq War.
Progress in the development of guided weapons depends on electronic
technology. Improvements in circuit density and low-power operation
associated with Moore's law have also contributed to the development of
Star Trek-like technologies including mobile phones and replicator-like 3D
printing.


CHAPTER-2
GENERATIONS

8086
Introduced June 8, 1978
Number of transistors 29,000 at 3 µm
The first x86 CPU.
Later renamed the iAPX 86[4]
8088
Introduced June 1, 1979
External bus width 8 bits data, 20 bits address
Number of transistors 29,000 at 3 µm
Addressable memory 1 megabyte
80186
Introduced 1982
Number of transistors ~78,999 at 2 µm
80286
Introduced February 2, 1982
Number of transistors 134,000 at 1.5 µm
80386DX
Introduced October 17, 1985
Number of transistors 275,000 at 1 µm
80386SX
Introduced June 16, 1988
Number of transistors 275,000 at 1 µm
Later renamed Intel386™ SX

80386SL
Introduced October 15, 1990
Number of transistors 855,000 at 1 µm
80486DX
Introduced April 10, 1989
Number of transistors 1.2 million at 1 µm; the 50 MHz version was at 0.8 µm
80486SX
Introduced April 22, 1991
Number of transistors 1.185 million at 1 µm and 900,000 at 0.8 µm
80486SL
Introduced November 9, 1992
Number of transistors 1.4 million at 0.8 µm
80486DX4
Introduced March 7, 1994
Number of transistors 1.6 million at 0.6 µm
P5 0.8 µm process technology
Introduced March 22, 1993
Number of transistors 3.1 million
The only Pentium running on 5 Volts
P54 0.6 µm process technology
Number of transistors 3.2 million
Introduced October 10, 1994
P54CQS 0.35 µm process technology
Number of transistors 3.2 million
Introduced March 27, 1995
P54CS 0.35 µm process technology
Number of transistors 3.3 million
Introduced June 12, 1995

Pentium with MMX Technology
P55C 0.35 µm process technology
Introduced January 8, 1997
Pentium Pro
Introduced November 1, 1995
Precursor to Pentium II and III
Primarily used in server systems
Number of transistors 5.5 million
0.6 µm process technology
Pentium II
Introduced May 7, 1997
Pentium Pro with MMX and improved 16-bit performance
Number of transistors 7.5 million
Klamath 0.35 µm process technology
Deschutes 0.25 µm process technology
Introduced January 26, 1998
Celeron (Pentium II-based)
Covington 0.25 µm process technology
Introduced April 15, 1998
Number of transistors 7.5 million
Mendocino 0.25 µm process technology
Introduced August 24, 1998
Number of transistors 19 million
Pentium III
Katmai 0.25 µm process technology
Introduced February 26, 1999
Number of transistors 9.5 million
Coppermine 0.18 µm process technology
Introduced October 25, 1999
Number of transistors 28.1 million

Pentium II and III Xeon
Introduced October 25, 1999
Number of transistors: 9.5 million at 0.25 µm or 28 million at 0.18 µm
Celeron (Pentium III Coppermine-based)
Coppermine-128, 0.18 µm process technology
Introduced March, 2000
Pentium 4 (not 4EE, 4E, 4F), Itanium, P4-based Xeon, Itanium 2
(chronological entries)
Introduced April 2000 to July 2002
Tualatin Celeron 0.13 µm process technology
Pentium M
Banias 0.13 µm process technology
Introduced March 2003
Dothan 0.09 µm (90 nm) process technology
Introduced May 2004
140 million transistors
Intel Core
Yonah 0.065 µm (65 nm) process technology
Introduced January 2006
Pentium 4
0.18 µm process technology (1.40 and 1.50 GHz)
Introduced November 20, 2000
Pentium 4E
Introduced February 2004
Prescott built on 0.09 µm (90 nm) process technology (2.4A, 2.8, 2.8A, 3.0,
3.2, 3.4, 3.6, 3.8), 1 MB L2 cache
Number of transistors 125 million on 1 MB Models
Number of transistors 169 million on 2 MB Models

Pentium 4F
Prescott-2M built on 0.09 µm (90 nm) process technology
Introduced February 20, 2005
Cedar Mill built on 0.065 µm (65 nm) process technology
Introduced January 16, 2006
Pentium D
Smithfield 90 nm process technology (2.66 to 3.2 GHz)
Introduced May 26, 2005
Presler 65 nm process technology (2.8 to 3.6 GHz)
Introduced January 16, 2006
Pentium D 945
Smithfield 90 nm process technology (3.2 GHz)
Presler 65 nm process technology (3.46, 3.73 GHz)
Xeon
Introduced 200
Dempsey 65 nm process (2.67 to 3.73 GHz)
Introduced May 23, 2006
Intel Core 2
Conroe 65 nm process technology
Desktop CPU (SMP support restricted to 2 CPUs)
Two cores on one die
Introduced July 27, 2006
SSSE3 SIMD instructions
Number of transistors: 291 million
Pentium Dual-Core
Allendale 65 nm process technology
Desktop CPU (SMP support restricted to 2 CPUs)
Two cores on one die
Introduced January 21, 2007
SSSE3 SIMD instructions
Number of transistors 167 million


Intel Pentium
Clarkdale 32 nm process technology
Introduced January 2010
Core i3
Clarkdale 32 nm process technology
Introduced January, 2010
Core i5
Lynnfield 45 nm process technology
4 physical cores
Introduced January, 2010
Core i7
Bloomfield 45 nm process technology
4 physical cores
781 million transistors
Introduced November 17, 2008
Celeron
Sandy Bridge 32 nm process technology
2 physical cores/2 threads (500 series), 1 physical core/1 thread (model
G440) or 1 physical core/2 threads (models G460 & G465)
2 MB L3 cache (500 series), 1 MB (model G440) or 1.5 MB (models G460
& G465)
Introduced 3rd quarter, 2011
Pentium
Sandy Bridge 32 nm process technology
2 physical cores/2 threads
624 million transistors
Introduced May, 2011
Core i3
Sandy Bridge 32 nm process technology
2 physical cores/4 threads
624 million transistors
Introduced January, 2011


Core i3
Ivy Bridge 22 nm Tri-gate transistor process technology
2 physical cores/4 threads
32+32 KB (per core) L1 cache
Core i5
Sandy Bridge 32 nm process technology
4 physical cores/4 threads (except for i5-2390T which has 2 physical cores/4
threads)
995 million transistors
Introduced January, 2011
Core i7
Sandy Bridge 32 nm process technology
4 physical cores/8 threads
995 million transistors
Introduced January, 2011
Sandy Bridge-E 32 nm process technology
2270 million transistors
Introduced November, 2011
Core i7 Haswell
22 nm process technology
Coming Processors
8000+ million transistors
To be introduced in 2015
8-4 nm carbon nanotube process technology
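The transistor counts listed in this chapter can be used to estimate how quickly the count has doubled in practice. The sketch below is a rough illustration: it takes the 8086 (1978, 29,000 transistors) and the Sandy Bridge Core i7 (2011, 995 million transistors) from the list and assumes a constant doubling period between them. The function name and the use of whole calendar years are simplifications.

```python
import math


def doubling_period_years(count_start: float, year_start: float,
                          count_end: float, year_end: float) -> float:
    """Average years per doubling between two (year, transistor count) points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # End points taken from the processor list in this chapter.
    period = doubling_period_years(29_000, 1978, 995_000_000, 2011)
    print(f"Average doubling period: {period:.2f} years")
```

The result comes out at a little over two years per doubling, which is consistent with the usual statement of Moore's law.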


CHAPTER-3
45 nm PROCESS
Per the International Technology Roadmap for Semiconductors, the 45
nanometer (45 nm) technology node should refer to the average half-pitch of
a memory cell manufactured around the 2007-2008 time frame.
Matsushita and Intel started mass-producing 45 nm chips in late 2007, and
AMD started production of 45 nm chips in late 2008, while IBM, Infineon,
Samsung, and Chartered Semiconductor have already completed a common
45 nm process platform. At the end of 2008, SMIC was the first China-based
semiconductor company to move to 45 nm, having licensed the bulk 45 nm
process from IBM.
Many critical feature sizes are smaller than the wavelength of light used for
lithography (i.e., 193 nm and 248 nm). A variety of techniques, such as
larger lenses, are used to make sub-wavelength features. Double patterning
has also been introduced to assist in shrinking distances between features,
especially if dry lithography is used. It is expected that more layers will be
patterned with 193 nm wavelength at the 45 nm node. Moving previously
loose layers (such as Metal 4 and Metal 5) from 248 nm to 193 nm
wavelength is expected to continue, which will likely further drive costs
upward, due to difficulties with 193 nm photoresists.
High-k dielectrics
Chipmakers initially voiced concerns about introducing new high-k
materials into the gate stack, for the purpose of reducing leakage current
density. As of 2007, however, both IBM and Intel have announced that they
have high-k dielectric and metal gate solutions, which Intel considers to be a
fundamental change in transistor design. NEC has also put high-k materials
into production.
Key Points
At IEDM 2007, more technical details of Intel's 45 nm process were
revealed.
Since immersion lithography is not used here, the lithographic patterning is
more difficult. Hence many lines have been lengthened rather than
shortened. A more time-consuming double patterning method is used
explicitly for this 45 nm process, resulting in potentially higher risk of
product delays than before. Also, the use of high-k dielectrics is introduced
for the first time, to address gate leakage issues. For the 32 nm node,
immersion lithography will begin to be used by Intel.
1. 160 nm gate pitch (73% of 65 nm generation)
2. 200 nm isolation pitch (91% of 65 nm generation) indicating a
slowing of scaling of isolation distance between transistors
3. Extensive use of dummy copper metal and dummy gates
4. 35 nm gate length (same as 65 nm generation)
5. 1 nm equivalent oxide thickness, with 0.7 nm transition layer
6. Gate-last process using dummy polysilicon and damascene metal gate
7. Squaring of gate ends using a second photoresist coating
8. 9 layers of carbon-doped oxide and Cu interconnect, the last being a
thick "redistribution" layer
9. Contacts shaped more like rectangles than circles for local
interconnection
10. Lead-free packaging
11. 1.36 mA/µm nFET drive current
12. 1.07 mA/µm pFET drive current, 51% faster than 65 nm generation,
with higher hole mobility due to increase from 23% to 30% Ge in
embedded SiGe stressors
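The percentages in the first two list items can be checked with simple ratios. The sketch below is an illustration only: the 220 nm value for the 65 nm generation is the transistor pitch quoted elsewhere in this chapter, the function names are hypothetical, and multiplying the two pitch ratios gives only a crude estimate of area scaling.

```python
def percent_of_previous(new_nm: float, old_nm: float) -> float:
    """Return the new dimension as a percentage of the previous one."""
    return 100.0 * new_nm / old_nm


def implied_previous(new_nm: float, percent: float) -> float:
    """Back out the previous-generation dimension from a quoted percentage."""
    return new_nm * 100.0 / percent


if __name__ == "__main__":
    gate_percent = percent_of_previous(160.0, 220.0)   # gate pitch, 45 nm vs 65 nm
    isolation_65nm = implied_previous(200.0, 91.0)     # isolation pitch at 65 nm
    area_factor = 0.73 * 0.91                          # crude area scaling estimate
    print(f"Gate pitch is {gate_percent:.0f}% of the 65 nm value")
    print(f"Implied 65 nm isolation pitch: {isolation_65nm:.0f} nm")
    print(f"Approximate area factor: {area_factor:.2f}")
```

The gate pitch shrinks by close to the classical 0.7x per generation, while the isolation pitch barely shrinks, so the combined area factor is noticeably larger than the ideal 0.5.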
In a recent Chipworks reverse-engineering, it was disclosed that the trench
contacts were formed as a "Metal-0" layer in tungsten serving as a local
interconnect. Most trench contacts were short lines oriented parallel to the
gates covering diffusion, while gate contacts were even shorter lines
oriented perpendicular to the gates.
It was recently revealed that both the Nehalem and Atom microprocessors
used SRAM cells containing eight transistors instead of the conventional six,
in order to better accommodate voltage scaling. This resulted in an area
penalty of over 30%.
Process
Two key process features that are used to make 45nm generation metal gate
+ high-k gate dielectric CMOS transistors are highlighted in this paper. The
first feature is the integration of stress-enhancement techniques with the dual
metal-gate + high-k transistors. The second feature is the extension of
193nm dry lithography to the 45nm technology node pitches. Use of these
features has enabled industry-leading transistor performance and the first
high volume 45nm high-k + metal gate technology.
High-k + metal gate transistors have been incorporated into our 45nm logic
technology to provide improved performance and significantly reduced gate
leakage. High-k + metal gates have also been shown to have improved
variability at the 45nm node. The transistors in this work feature 1.0nm EOT
high-k gate dielectrics with dual workfunction metal gate electrodes and
35nm gate lengths. The addition of new gate materials is complicated by the
need to mesh the process requirements of the metal gate process with the
uniaxial strain-inducing components that have become central to the
transistor architecture. The resultant process flow needs to ensure that the
performance benefits of both elements are fully realized.
The standard scaling requirements for the strained silicon components and
for the gate and contact pitches also need to be addressed at the 45nm node.
Using 193nm dry lithography for critical layers at the 45nm technology node
is preferred over moving to 193nm immersion lithography due to lower cost
and greater maturity of the toolset. In order to achieve the 160nm gate and
contact pitch requirements, unique gate and contact patterning process flows
have been implemented.
Strain + Metal Gate: Key process considerations/results
The most commonly used techniques for implementing strain in the
transistors include embedded SiGe in the PMOS S/D, stress memorization
for the NMOS and a nitride stress capping layer for NMOS and PMOS
devices. The two common methods for introducing a metal gate to the
standard CMOS flow are the gate-first and gate-last processes. Most
comparisons of these two process flows focus on the ability to select the
appropriate workfunction metals, the ease of integration or the ability to
scale, but typically fail to consider the interaction with the
strain-inducing techniques.

In the gate-first flow, the dual-metal processing is completed prior to the
polysilicon gate deposition. The metal gates are then subtractively etched
along with the poly gates prior to S/D formation. In contrast, for the gate-last
flow, a standard polysilicon gate is deposited after the high-k gate dielectric
deposition, which is followed by standard polysilicon processing through the
salicide and the 1st ILD deposition. The wafer is then planarized and the
dummy poly gate removed. The dual-metal gates are then deposited along
with a low-resistance gate fill material. The excess metal is then polished off
and followed by contact processing.

By removing the poly gate from the transistor after the stress-enhancement
techniques are in place, it has been shown that the stress benefit from the
embedded S/D SiGe process can be enhanced. This is a key benefit for the
gate-last process and can be illustrated in simulation with an estimated 50%
increase in lateral compressive stress by removal of the polysilicon gate.

Figure: PMOS transistors before and after the roughly 50% increase in stress
to about 1.2 GPa. Red sections show the highest-stress areas (~1.2 GPa) and
black areas show the lowest-stress areas (~0.8 GPa).
The Ge concentration of the SiGe stressors was increased from 22% in our
65nm technology to 30% in 45nm. The combined impact of the increased
Ge fraction and the strain enhancement from the gate-last process allows for
1.5x higher hole mobility compared to 65nm despite the scaling of the
transistor pitch from 220nm to 160nm.
Two methods of stress enhancement have been employed on the NMOS in
this technology. First, the loss of the nitride stress layer benefit due to
scaling the pitch from 65nm has been overcome by the introduction of
trench contacts and tailoring the contact fill material to induce a tensile
stress in the channel. The NMOS response to tensile vs. compressive contact
fill materials is shown in figure.

The trench contact fill material impact on the PMOS device is mitigated by
use of the raised S/D inherent to the embedded SiGe S/D process. The S/D
component of stress memorization is compatible with the gate-last flow
while the poly gate component would be compromised. The poly gate
component is replaced by Metal Gate Stress (MGS): modifying the metal-gate
fill material to directly induce stress in the channel. By introducing a
compressive stress gate fill material the performance of the NMOS device is
enhanced, and the gain is additive to the contact fill technique.

By use of a dual-metal process with PMOS 1st, the stress of the NMOS gate
is decoupled from the PMOS gate through optimization of the PMOS gate
stack to buffer the stress. Through the strain enhancement and elimination of
poly depletion both the saturation and linear drive currents improved.

Subthreshold characteristics are well-behaved. Ring oscillator data for a
fanout-of-2 gate delay show an improvement of 23%.

The table in figure breaks out the RO gains between Idsat, Idlin and the gate
and junction capacitances.
193nm Dry Patterning @ 45nm.
The gate patterning process uses a double patterning scheme. Initially the
gate stack is deposited including the polysilicon and hardmask deposition.
The first lithography step patterns a series of parallel, continuous lines. Only
discrete pitches are allowed, with the smallest at 160nm. A second masking
step is then used to define the cuts in the lines. The 2-step process enables
abrupt poly endcap regions, allowing tight CTG design rules.
The contact patterning process also uses a similar restriction to facilitate
lithography. Trench diffusion contacts run parallel to the gates with discrete
pitches, while trench gate contacts run orthogonal to the gates. Use of trench
contacts has the added benefits of lowering the contact resistance by >50%
and allowing use as a local interconnect which improves SRAM/logic
density by up to 10%.
High-k + metal gate transistors have been integrated into a manufacturable
45nm CMOS process using 193nm dry lithography. The significant strain
enhancement benefits of the gate-last process flow have been highlighted.
The process has demonstrated record drive current at low leakage.
Commercial introduction
Matsushita Electric Industrial Co. started mass production of
System-on-a-chip (SoC) for use in digital consumer equipment based on the 45-nm
process technology.
Intel shipped its first 45 nanometer based processor, the Xeon 5400-series, in
November 2007.
Many details about Penryn appeared at the April 2007 Intel Developer
Forum. Its successor is called Nehalem. Important advances include the
addition of new instructions (including SSE4, also known as Penryn New
Instructions) and new fabrication materials (most significantly a
hafnium-based dielectric).
AMD released its Sempron II, Athlon II, Turion II and Phenom II (in
generally increasing order of strength), as well as Shanghai Opteron
processors using the 45-nm process technology.
The Xbox 360 S, released in 2010, has its Xenon processor in 45 nm
process.
The PlayStation 3 Slim model introduced Cell Broadband Engine in 45 nm
process.

CHAPTER-4
32 nm PROCESS
The 32 nanometer (32 nm) node is the step following the 45 nanometer
process in CMOS semiconductor device fabrication. "32 nanometer" refers
to the average half-pitch (i.e., half the distance between identical features) of
a memory cell at this technology level. Intel and AMD both produced
commercial microchips using the 32 nanometer process in the early 2010s.
IBM and the Common Platform also developed a 32 nm high-k metal gate
process. Intel began selling its first 32 nm processors using the Westmere
architecture on 7 January 2010. The 32 nm process was superseded by
commercial 22 nm technology in 2012.
Technology demos
Prototypes using 32 nm technology first emerged in the mid-2000s. In 2004,
IBM demonstrated a 0.143 µm² SRAM cell with a poly gate pitch of 135
nm, produced using electron-beam lithography and photolithography on the
same layer. It was observed that the cell's sensitivity to input voltage
fluctuations degraded significantly at such a small scale. In October 2006,
the Interuniversity Microelectronics Centre (IMEC) demonstrated a 32 nm
flash patterning capability based on double patterning and immersion
lithography. The necessity of introducing double patterning and hyper-NA
tools to reduce memory cell area offset some of the cost advantages of
moving to this node from the 45 nm node. TSMC similarly used double
patterning combined with immersion lithography to produce a 32 nm node
0.183 µm² six-transistor SRAM cell in 2005.
Intel Corporation revealed its first 32 nm test chips to the public on 18
September 2007 at the Intel Developer Forum. The test chips had a cell size
of 0.182 µm², used a second-generation high-k gate dielectric and metal
gate, and contained almost two billion transistors. 193 nm immersion
lithography was used for the critical layers, while 193 nm or 248 nm dry
lithography was used on less critical layers. The critical pitch was 112.5 nm.
In January 2011, Samsung completed development of what it claimed was
the industry's first DDR4 DRAM module using a process technology with a
size between 30 nm and 39 nm. The module could reportedly achieve data
transfer rates of 2.133 Gbit/s at 1.2V, compared to 1.35V and 1.5V DDR3
DRAM at an equivalent 30 nm-class process technology with speeds of up
to 1.6 Gbit/s. The module used pseudo open drain (POD) technology,
specially adapted to allow DDR4 DRAM to consume just half the current of
DDR3 when reading and writing data.
28nm: Last node of Moore's law
The 28nm generation represents TSMC's most energy-efficient and
high-performance method of manufacturing to date. 28nm is the first generation
in which the foundry industry started to use a high-k metal gate (HKMG) process.
Still, a poly/oxynitride process is offered to meet customers' time-to-market needs.
The 28nm Process Family
TSMC's 28nm technology delivers twice the gate density of the 40nm
process and also features an SRAM cell size shrink of 50 percent.
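The claim of twice the gate density follows roughly from ideal geometric scaling, where density grows with the inverse square of the linear dimension. The sketch below illustrates that idealised relationship. It assumes node names can be treated as linear dimensions, which is only an approximation for modern processes, and the function name is hypothetical.

```python
def ideal_density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal density ratio if every linear dimension shrinks with the node name."""
    return (old_node_nm / new_node_nm) ** 2


if __name__ == "__main__":
    gain = ideal_density_gain(40.0, 28.0)
    print(f"40nm -> 28nm ideal density gain: {gain:.2f}x")
    print(f"Ideal cell area factor: {1.0 / gain:.2f}")
```

The ideal gain is just over 2x and the ideal area factor is about one half, in line with the quoted doubling of gate density and the 50 percent SRAM cell shrink.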
The low power (LP) process is the first available 28nm technology. It is
ideal for low standby power applications such as cellular baseband. The
28LP process boasts a 20 percent speed improvement over the 40LP process at
the same leakage/gate.
The 28nm high performance (HP) process is the first option to use high-k
metal gate process technology. Featuring superior speed and performance,
the 28HP process targets CPU, GPU, FPGA, PC, networking, and consumer
electronics applications. The 28HP process supports a 45 percent speed
improvement over the 40G process at the same leakage/gate.
The 28nm low power with high-k metal gates (HPL) technology adopts the
same gate stack as HP technology while meeting more stringent low leakage
requirements with a trade-off in performance. With a wide leakage and
performance spectrum, N28HPL is best suited for cellular baseband,
application processors, wireless connectivity, and programmable logic. The
28HPL process reduces both standby and operation power by more than
40%.
The 28nm High Performance Mobile Computing (HPM) process provides high
performance for mobile applications that require high speed. It provides
the highest speed among the 28nm technologies. With this performance
coverage, 28HPM is ideal for many applications, from networking to high-end
smartphone and mobile consumer products.
TSMC also provides high performance compact mobile computing (HPC)
for customers looking to tap the chip area and power saving benefits in mid- to
low-end SoC designs. Compared with TSMC's 28LP, 28HPC provides 10%
smaller die size and more than 30% power reduction at all levels of speed. A
comprehensive 28HPC IP ecosystem has also been built and is compatible with
28HPM, accelerating time-to-market for customers. 28HPC is also ideal for
many applications in mid-range and mid-to-low-end smartphones, tablets and
mobile consumer products.
The 28nm family will also provide a wider variety of metal options to
support a broad range of product applications for better trade-off between
performance and density.
The 28nm Design Ecosystem
TSMC provides a robust design ecosystem, technology platforms and
manufacturing excellence that promote a high level of collaboration to
drive customers' next innovations.


CHAPTER-5
CARBON NANO TUBES
Introduction
The Amazing and Versatile Carbon Chemical basis for life
With an atomic number of 6, carbon is the 4th most abundant
element in the Universe by mass (after hydrogen, helium and
oxygen). It forms more compounds than any other element, with
almost 10 million pure organic compounds. Abundance, together
with the unique diversity of organic compounds and their unusual
polymer forming ability at the temperatures commonly
encountered on Earth makes the element the chemical basis of all
known life.
Carbon Nanotubes
Carbon Nanotubes, long, thin cylinders of carbon, were discovered
in 1991 by Sumio Iijima. These are large macromolecules that are
unique for their size, shape, and remarkable physical properties.
They can be thought of as a sheet of graphite (a hexagonal lattice
of carbon) rolled into a cylinder. These intriguing structures have
sparked much excitement in recent years and a large amount of
research has been dedicated to their understanding. Currently, the
physical properties are still being discovered and disputed.
Nanotubes have a very broad range of electronic, thermal, and
structural properties that change depending on the different kinds
of nanotube (defined by its diameter, length, and chirality, or
twist). To make things more interesting, besides having a single
cylindrical wall (SWNTs), nanotubes can have multiple walls
(MWNTs), that is, cylinders inside other cylinders.
Carbon Nanotubes and Moore's Law
At the rate Moore's Law is progressing, by 2019 it will result in
transistors just a few atoms in width. This means that the strategy of
ever finer photolithography will have run its course; we have
already seen a progression from a micron, to sub-micron, to the 45 nm
scale. Carbon nanotubes, whose walls are just 1 atom thick, with
diameters of only 1 to 2 nm, seem to be one of the perfect
candidates to take us right to the end of the Moore's Law curve. We
possibly cannot go beyond that, so carbon nanotubes certainly have
a promising future!
Key properties of Carbon Nanotubes
Carbon Nanotubes are an example of true nanotechnology: they
are less than 100 nanometers in diameter and can be as thin as 1 or
2 nm. They are molecules that can be manipulated chemically and
physically in very useful ways. They open an incredible range of
applications in materials science, electronics, chemical processing,
energy management, and many other fields. Some properties
include:
1. Extraordinary electrical conductivity, heat conductivity, and
mechanical properties.
2. They are probably the best electron field-emitter known,
largely due to their high length-to-diameter ratios.
3. As pure carbon polymers, they can be manipulated using the
well-known and tremendously rich chemistry of that
element.
Some of the above properties provide opportunity to modify their
structure, and to optimize their solubility and dispersion. These
extraordinary characteristics give CNTs potential in numerous
applications.
Key application areas
1. Field Emitters/Emission
2. Conductive or reinforced plastics
3. Molecular electronics: CNT based non volatile RAM
4. CNT based transistors
5. Energy Storage
6. CNT based fibers and fabrics
7. CNT based ceramics
8. Biomedical applications.
Properties of Carbon Nanotubes
The structure of a carbon nanotube is formed by a layer of carbon
atoms that are bonded together in a hexagonal (honeycomb) mesh.
This one-atom thick layer of carbon is called graphene, and it is
wrapped in the shape of a cylinder and bonded together to form a
carbon nanotube. Nanotubes can have a single outer wall of
carbon, or they can be made of multiple walls (cylinders inside
other cylinders of carbon). Carbon nanotubes have a range of
electric, thermal, and structural properties that can change based on
the physical design of the nanotube.
Single-walled carbon nanotube structure
Single-walled carbon nanotubes can be formed in three different
designs: Armchair, Chiral, and Zigzag. The design depends on the way
the graphene is wrapped into a cylinder. For example, imagine rolling
a sheet of paper from its corner, which can be considered one design,
and a different design can be formed by rolling the paper from its edge.
A single-walled nanotube's structure is represented by a pair of indices
(n,m) called the chiral vector.

The chiral vector is defined in the image below.

The structural design has a direct effect on the nanotube's electrical
properties. When n - m is a multiple of 3, the nanotube is
described as "metallic" (highly conducting); otherwise the
nanotube is a semiconductor. The Armchair design is always
metallic, while other designs can make the nanotube a
semiconductor.
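The rule above can be expressed as a short check on the chiral indices. The following is only a minimal sketch: the function name classify_nanotube is hypothetical, and it assumes the common convention n >= m >= 0, with armchair tubes having n = m and zigzag tubes having m = 0.

```python
def classify_nanotube(n, m):
    """Return (design, electrical type) for chiral indices (n, m).

    Assumes the convention n >= m >= 0 with n > 0.
    """
    if n <= 0 or m < 0 or m > n:
        raise ValueError("expected indices with n >= m >= 0 and n > 0")

    # The design follows from how the graphene sheet is rolled up.
    if n == m:
        design = "armchair"
    elif m == 0:
        design = "zigzag"
    else:
        design = "chiral"

    # Rule from the text: metallic when n - m is a multiple of 3.
    electrical = "metallic" if (n - m) % 3 == 0 else "semiconducting"
    return design, electrical


if __name__ == "__main__":
    for indices in [(5, 5), (9, 0), (10, 0), (6, 4)]:
        print(indices, classify_nanotube(*indices))
```

Because n - m is zero for every armchair tube, the check reproduces the statement that the Armchair design is always metallic.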
Multi-walled carbon nanotube structure
There are two structural models of multi-walled nanotubes. In the
Russian Doll model, a carbon nanotube contains another nanotube
inside it (the inner nanotube has a smaller diameter than the outer
nanotube). In the Parchment model, a single graphene sheet is
rolled around itself multiple times, resembling a rolled up scroll of
paper. Multi-walled carbon nanotubes have similar properties to
single-walled nanotubes, yet the outer walls on multi-walled
nanotubes can protect the inner carbon nanotubes from chemical
interactions with outside materials. Multi-walled nanotubes also
have a higher tensile strength than single-walled nanotubes.
Strength
Carbon nanotubes have a higher tensile strength than steel and
Kevlar. Their strength comes from the sp2 bonds between the
individual carbon atoms. This bond is even stronger than the sp3
bond found in diamond. Under high pressure, individual nanotubes
can bond together, trading some sp2 bonds for sp3 bonds. This
gives the possibility of producing long nanotube wires. Carbon
nanotubes are not only strong, they are also elastic. You can press
on the tip of a nanotube and cause it to bend without damaging
the nanotube, and the nanotube will return to its original shape
when the force is removed. A nanotube's elasticity does have a
limit, and under very strong forces, it is possible to permanently
deform the shape of a nanotube. A nanotube's strength can be
weakened by defects in the structure of the nanotube. Defects
occur from atomic vacancies or a rearrangement of the carbon
bonds. Defects in the structure can cause a small segment of the
nanotube to become weaker, which in turn causes the tensile
strength of the entire nanotube to weaken. The tensile strength of a
nanotube depends on the strength of the weakest segment in the
tube, similar to the way the strength of a chain depends on the
weakest link in the chain.
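The weakest-link idea can be written as a one-line model. This is an illustrative sketch only: the function name and the segment strengths (in GPa) are hypothetical values chosen for the example, not measurements.

```python
def tube_tensile_strength(segment_strengths_gpa):
    """Weakest-link model: the tube fails where its weakest segment fails."""
    if not segment_strengths_gpa:
        raise ValueError("at least one segment is required")
    return min(segment_strengths_gpa)


if __name__ == "__main__":
    # Hypothetical tube: the third segment contains a defect.
    segments = [60.0, 58.5, 22.0, 61.2]
    print(tube_tensile_strength(segments))
```

A single defective segment sets the strength of the whole tube, no matter how strong the remaining segments are.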
Electrical properties
As mentioned previously, the structure of a carbon nanotube
determines how conductive the nanotube is. When the structure of
atoms in a carbon nanotube minimizes the collisions between
conduction electrons and atoms, a carbon nanotube is highly
conductive. The strong bonds between carbon atoms also allow
carbon nanotubes to withstand higher electric currents than copper.
Electron transport occurs only along the axis of the tube. Single-walled
nanotubes can route electrical signals at speeds up to 10
GHz when used as interconnects on semiconducting devices.
Nanotubes also have a constant resistivity.
Thermal Properties
The strength of the atomic bonds in carbon nanotubes allows them
to withstand high temperatures. Because of this, carbon nanotubes
have been shown to be very good thermal conductors. When
compared to copper wires, which are commonly used as thermal
conductors, carbon nanotubes can have a thermal conductivity (measured in
watts per meter per kelvin) more than 15 times higher. The thermal conductivity of
carbon nanotubes is dependent on the temperature of the tubes and
the outside environment.
Applications:
There are many potential applications for carbon nanotubes: waterproof
and tear-resistant cloth fabrics, concrete- and steel-like applications (a space
elevator has even been proposed) based on the property of strength,
electrical circuits based on the property of electrical conductivity, sensors
based on the property of thermal conductivity, vacuum-proof food
packaging, and even vessels for delivering drugs. For the purpose of this
paper we are going to focus on the applications related to nano-electronics.
Nano-Electronics
One of the most significant potential applications of single-walled nanotubes
is believed to be in the domain of nano-electronics. This is a result of
SWNTs being highly conductive. In fact, single-walled nanotube ropes are
reported to be the most conductive carbon fibers known. Alternative
configurations of a carbon nanotube can result in the material being
semiconducting, like silicon.
Conductivity in nanotubes is based on the degree of chirality (i.e. the
degree of twist) and the diameter of the actual nanotube, which
results in a nanotube that is either extremely conductive (making it
suitable as an interconnect on an integrated circuit) or non-conductive
(making it suitable as the basis for semiconductors).
Interconnect
Chip manufacturers require metallic compounds to serve as the basis for
interconnects between transistors on chips. Up until around seven years ago
chip manufacturers used aluminum, at which point they switched to copper.
However, copper's resistance to electricity flow increases as the metal's
dimensions decrease, creating a lower bound for the size of copper-based
interconnects. By 2012 it is expected that higher-performance chips
combined with more tightly packed transistors will require interconnects less
than 40 nanometers wide, at which point copper's resistance will make it
ineffective as an interconnection technology.
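The geometric part of this effect follows from the standard wire formula R = rho * L / (w * h). The sketch below uses the bulk resistivity of copper and a hypothetical 1 micrometer wire with a square cross-section. It is an assumption-laden illustration: at these dimensions the effective resistivity of copper also rises because of surface and grain-boundary scattering, so real wires fare worse than this simple model suggests.

```python
# Bulk resistivity of copper at room temperature, in ohm-metres.
COPPER_BULK_RESISTIVITY = 1.68e-8


def wire_resistance(length_m, width_m, height_m,
                    resistivity=COPPER_BULK_RESISTIVITY):
    """Resistance of a rectangular wire: R = rho * L / (w * h)."""
    if min(length_m, width_m, height_m) <= 0:
        raise ValueError("dimensions must be positive")
    return resistivity * length_m / (width_m * height_m)


if __name__ == "__main__":
    length = 1e-6  # hypothetical 1 micrometre interconnect
    for width_nm in (160, 80, 40, 20):
        side = width_nm * 1e-9  # square cross-section assumed
        ohms = wire_resistance(length, side, side)
        print(f"{width_nm:>4} nm wide: {ohms:8.2f} ohm")
```

Halving the width and height of the wire quadruples its resistance even before any change in resistivity is taken into account.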
With high conductivity and small dimensions, carbon nanotubes may
provide an alternative interconnect option to copper. Toshiba and Stanford
University recently published results demonstrating a CNT-based interconnect
operating at 1 GHz on a chip containing 11,000 transistors in an area of
1/100th of a square inch.
This research demonstrates that carbon nanotubes are not just a viable
alternative to copper, but that they can also be used alongside existing IC
manufacturing processes. Another advantage to carbon nanotubes is that
unlike copper there is no need to embed the interconnects into trenches on
the circuit board, which could make for a simpler manufacturing process.
Transistors
Transistors form the basis for modern integrated circuits, functioning as
digital switches. Alternative configurations of carbon nanotubes result in
defects being present that allow single-walled nanotubes to act as transistors.
Nanotube-based switches the size of an individual electron had been
envisioned but had originally required cryogenic-like temperatures.
In such a switch a molecule can be positioned inside a carbon nanotube to
affect the electronic current flowing across it. The result is a molecular-scale
gate in which the position of the molecule controls the flow of the electrical
current. In this model, the gate is about one nanometer in size, or three
orders of magnitude smaller than a silicon chip. In 2001 researchers
demonstrated that nanotube transistors could be realized that would operate
at room temperature. IBM has also demonstrated fabrication of nanotube
transistors.
Energy Production and Storage
Carbon nanotube technology also holds promise for a wide range of
energy-related applications.
Batteries
Most portable electronic devices use rechargeable lithium-ion batteries.
These batteries release charge when lithium ions move between two
electrodes - one of which is graphite and the other is metal oxide.
Researchers at the University of North Carolina have demonstrated that by
replacing the graphite with SWCNTs they can double storage capacity.
Electrodes made of carbon nanotubes can be ten times thinner and lighter
than amorphous carbon electrodes and their conductivity is more than one
thousand times greater. In some cases, such as electric vehicles, the
reduction in weight can make a significant reduction in battery power
requirements. Carbon nanotubes have been used in supercapacitors
producing a power density of 30 kW/kg (compared to 4 kW/kg for
commercially available devices). Such supercapacitors could drastically
reduce the time it takes to recharge devices such as laptops and cell phones.
Ultra-thin flexible batteries have been made with CNT infused paper. Ionic
liquid is soaked into the paper as the battery's electrolyte. Electrolytes in
human blood, sweat, and urine can also help to power the battery which may
be useful in implantable medical devices. These batteries can be rolled,
folded, or cut without loss of efficiency. They can also be stacked to boost
their output power. Although the materials used in the batteries, which are
over 90% cellulose, are very inexpensive, an inexpensive method of
mass-production has not yet been developed.
Solar Cells
Researchers at Georgia Tech Research Institute have created solar cells
consisting of 100-micrometer-high towers built of CNTs grown on
iron-coated silicon wafers. There are 40,000 of these
towers in each square centimeter of the surface; each tower is an array of
millions of vertically aligned CNTs. These cells absorb more light as it
reflects off the sides of the towers. Unlike typical solar cells that have peak
efficiency when the sun is at 90°, these cells have two peaks at 45° and
operate with relatively high efficiency during most of the day. This makes
them particularly appropriate for applications in space because it eliminates
the requirement of having a mechanical means of orienting the cells to face
the sun.
Current hurdles
There are several remaining obstacles, technical and non-technical, to CNT
success. Covering these in detail would be lengthy and probably impossible; flipped
on its head, such a passage would be a path to success with nanotubes! But
here are a couple of the major impediments.
Electronic Heterogeneity
One problem with nanotube production for electronics is that batches of
nanotubes are heterogeneous mixtures of metallic and semiconducting tube
types. Electrical devices typically require these types to be separated, but so
far it has been difficult to tune production in this regard. There also remain
issues with doping, or tuning conductivity, and electrical behavior at contact
points. (Rogers, UIUC).
Orientation
A problem with nanotubes where recent progress has been made is
controlling their orientation. Nanotubes are commonly grown in a chaotic
organization (affectionately known as a "rat's nest"), which is difficult to
use in microprocessors. Recently John Rogers and his team at the University
of Illinois at Urbana-Champaign discovered that carefully growing nanotubes
on quartz wafers can lead to a highly organized configuration.
Size and Density
The size of manufactured nanotubes typically varies widely. For commercial
use, nanotube manufacturers will need to make size more consistent. Though
nanotubes are very narrow, nanotube matrices typically have quite large
(100 nm) spacing between tubes.
Export Policy
As with other multi-use technologies, nanotubes may be subject to export
controls. Finding information about this was difficult; multiple sources were
unaware of restrictions, but in at least one case a foreign researcher was
denied access to nanotubes. Adelina Santos, a Brazilian nuclear scientist,
says a U.S.-based supplier refused to ship nanotubes due to federal
regulations. However, restrictions seem difficult to enforce; Santos had a
friend smuggle in a gram of nanotubes. Current customs protocols
probably do not place a priority on detecting nanotubes.
Export restrictions could slow adoption of nanotube technologies and
prevent standardization. Regulation of commerce in nanotube technology
will increase costs.
Environmental concerns
The environmental risks of nanotubes are still unclear[6]. Naturally
occurring carbon is fairly benign, and is largely unregulated, but nanotubes
interact with the environment differently. There have been several studies
performed to test the effects of carbon nanotubes on living systems.
1. Fruit fly larvae fed a diet containing nanotubes appeared to develop
normally.
2. One study showed that CNTs delay embryo development in zebrafish,
but the fish otherwise appeared normal.
3. Mouse lungs became inflamed when exposed to nanotubes. Though the
inflammation subsided within a few months, this has stark parallels to
the effect of asbestos on human lungs.
4. Some human tumor cells seem to proliferate more rapidly in the
presence of nanotubes.
In some situations, coatings applied to the nanotubes, rather than the
nanotubes themselves, may become environmental culprits. The solar cells
described earlier are coated with a cadmium-telluride mix, which would be
too toxic for widespread use. Perhaps most sobering to consider is that some
forms of nanotubes biodegrade slowly; tubes released into the environment
may make their way into our food supply, and from there, throughout our
bodies. Some researchers believe that nanotube use in electronics is probably
not very risky because of the small volumes involved, but this argument
hinges on computing being limited to small numbers of devices.
Entrenched dominance
The single biggest hurdle to nanotube success in integrated circuits is the
continued success of silicon-based devices. Nanotubes have some
outstanding properties, but exploiting these properties to build robust chips
could prove very difficult. One company, Nantero, received funding in 2001
for a nanotube-based RAM (NRAM) product. At the time, NRAM promised
to be better than DRAM in several dimensions, but seven years later they
don't have a product or even a vague timeline for when something might be
available. By the time a product is released, if ever, NRAM's advantages will
undoubtedly have diminished. Competing with silicon apples-to-apples has
been the downfall of other promising materials technologies; CNT
nanofabrics, which use a design paradigm that considers defects in
nanomaterials to be the norm rather than the exception, hold considerable
promise, but nanotube-based products will probably find their first successes
in specialized applications such as interconnect.
Fabrication
Advancing fabrication technology will require the ability to produce, with
high yield, nanotubes of a specified length, diameter, number of walls,
and chirality. There are several procedures that have been developed for
fabricating CNT structures. In this section, we give an overview of a few of
them.
Arc Discharge Method
A chamber containing a graphite cathode and anode contains evaporated
carbon molecules in a buffer gas such as helium. The chamber also contains
some amount of metal catalyst particles (such as cobalt, nickel, and/or iron).
DC current is passed through the chamber while the chamber is also
pressurized and heated to ~4000K. In the course of this procedure, about
half of the evaporated carbon solidifies on the cathode tip into a "cylindrical
hard deposit." The remaining carbon condenses into "chamber soot" around
the walls of the chamber and "cathode soot" on the cathode.
The cathode soot and chamber soot yield either single-walled or
multi-walled carbon nanotubes. The cylindrical hard deposit doesn't yield anything
particularly interesting.
The choice of buffer gas, the pressure of the chamber, and the metallic
catalyst added to the chamber determine the shape of the nanotubes
and whether they are single- or multi-walled. The nanotubes apparently
grow from the surfaces of the metallic catalyst particles.
The advantage of this method is that it produces a large quantity of
nanotubes. But the main disadvantage is that there is relatively little control
over the alignment (i.e. chirality) of the produced nanotubes, which is
critical to their characterization and role. Furthermore, due to the metallic
catalyst included in the reaction, the products need to be purified afterwards.
Methods such as oxidation, centrifugation, filtration, and acid treatment have
been used.
Laser Ablation Method
A quartz tube containing a block of graphite is heated in a furnace. A flow of
argon gas is maintained throughout the reaction. A laser is used to vaporize
the graphite within the quartz. The carbon vaporizes, is carried away by the
argon, and condenses downstream on the cooler walls of the quartz. The
condensate consists of SWNTs and metallic particles. Thereafter, purification
methods are applied to this mixture.
The key to the proper formation of the condensed nanotubes is that the
location where the carbon atoms begin to condense should be set up as a
curved sheet of graphene with a catalyst metallic atom nearby. As carbon
atoms begin to attach and form rings, the metallic atom, if it has the proper
electronegativity properties, will preserve the open edge of the tube and
prevent it from drawing to a close. The authors of the paper describe this
phenomenon as the "scooter" effect, because the metallic atom "scoots"
around the open edge, preventing it from closing.

The large atom in this figure is the "scooting" metallic atom.
Advantages of this technique include a relatively high yield and relatively
low metallic impurities, since the metallic atoms involved tend to evaporate
from the end of the tube once it is closed. One disadvantage is that the
nanotubes produced from this method are not necessarily uniformly straight,
but instead do contain some branching.
Chemical Vapor Deposition
The CVD approach allows CNTs to grow on a variety of materials, which
makes it more viable to integrate into already existent processes for
synthesizing electronics. This process involves the chemical breakdown of a
hydrocarbon on a substrate.
It's already been shown in previous methods, such as the arc discharge
method, that a main way to grow carbon nanotubes is by exciting carbon
atoms that are in contact with metallic catalyst particles. The CVD method
extends this idea by embedding these metallic particles (iron, in the case of
the seminal paper) in properly aligned holes in a substrate (silicon, in this
case). Essentially, tubes are drilled into silicon and implanted with iron
nanoparticles at the bottom. Then, a hydrocarbon such as acetylene is heated
and decomposed onto the substrate. The carbon comes into contact with the
metal particles embedded in the holes and starts to form nanotubes that are
"templated" from the shape of the tunnel. It turns out that the carbon
nanotubes grow very long and very well aligned, at the angle of the tunnel.

The advantages of this method are that the yield is very high, the alignment
of the nanotubes is consistent (which is crucial for creating particular types
of nanotubes, e.g. semiconductor or metallic), and the size of the growth
area is theoretically arbitrary.
The main disadvantage is that, though the size of the growth area is basically
arbitrary, large sized areas (several millimeters) tend to crack, shrink, and
otherwise warp. The substrates need to be dried very thoroughly to prevent
this.
n-hexane Pyrolysis
Researchers developed a method to synthesize large, long single walled
nanotube bundles in a vertical furnace by pyrolyzing hexane molecules.
These n-hexane molecules are mixed with certain other chemicals that have
been shown independently to help with growth of nanotubes. These are
burned (pyrolyzed) at a very high temperature in a flow of hydrogen and
other optional gases. According to the paper, using a different hydrocarbon
or using a different gas prevented the formation of long nanotubes.
The primary advantage of this method is that it produces macroscopic
nanotube bundles ("microtubes"): their diameters are typically larger than
that of human hair, and their length is several centimeters. The disadvantage
is that the alignment is not as good as that produced by other methods, making it viable
for creating "microcables", but not nanotubes with precise electrical
properties. Another disadvantage is that, from the researchers' measurements,
the elasticity of these nanotube bundles is not as great as hoped (i.e. they are
more brittle).
There is much about carbon nanotubes that is still unknown. More research
needs to be done regarding the environmental and health impacts of
producing large quantities of them. There is also much work to be done
towards cheaper mass-production and incorporation with other materials
before many of the current applications being researched can be
commercialized. There is no doubt, however, that carbon nanotubes will
play a significant role in a wide range of commercial
applications in the very near future. Not only will they help create some very
cool tech gadgets, they may also help solve the world's energy problems.
Carbon Nanotubes and Processor
MIT, Samsung, Intel, and many other research and development groups have
already produced test models demonstrating the potential of transistors as
small as 5 nm to 3 nm, which could change the way computing is done. According
to a professor at MIT, if processors are built on a transistor process as small
as 5 nm to 3 nm, their processing power will approach that of today's
supercomputers at a drastically smaller size, with less heat generation and
stress formation than older silicon (Si) based transistors.

CHAPTER-6
22 nm PROCESS

The 22 nanometer (22 nm) node is the process step following the 32 nm node in CMOS
semiconductor device fabrication. The typical half-pitch (i.e., half the
distance between identical features in an array) for a memory cell using the
process is around 22 nm. It was first introduced by semiconductor
companies in 2008 for use in memory products, while first consumer-level
CPU deliveries started in April 2012.
The ITRS (International Technology Roadmap for Semiconductors) 2006
Front End Process Update indicates that equivalent physical oxide thickness
will not scale below 0.5 nm (about twice the diameter of a silicon atom),
which is the expected value at the 22 nm node. This is an indication that
CMOS scaling in this area has reached a wall at this point, possibly
disturbing Moore's law.
On the ITRS (International Technology Roadmap for Semiconductors)
roadmap, the successor to 22 nm technology will be 14 nm technology.
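The sequence 32 nm, 22 nm, 14 nm roughly follows a common rule of thumb in which each generation shrinks linear dimensions by about 0.7x, which halves the area of a given circuit. The sketch below illustrates that arithmetic only. The 1/sqrt(2) factor and the function names are assumptions made for this example, and commercial node names are labels that do not track this ideal shrink exactly.

```python
import math

# Rule-of-thumb linear shrink per generation (an assumption, about 0.707).
LINEAR_SCALE = 1 / math.sqrt(2)


def next_node(feature_nm):
    """Ideal feature size of the following generation."""
    return feature_nm * LINEAR_SCALE


def relative_area(old_nm, new_nm):
    """Area of the same circuit at the new node, relative to the old one."""
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    print(f"ideal successor of 32 nm: {next_node(32):.1f} nm")
    print(f"ideal successor of 22 nm: {next_node(22):.1f} nm")
    print(f"area at 22 nm relative to 32 nm: {relative_area(32, 22):.2f}")
```

Under this idealization, moving from 32 nm to 22 nm leaves a circuit with slightly less than half of its original area.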
Intel is introducing revolutionary Tri-Gate transistors on its 22 nm
logic technology.
Tri-Gate transistors provide an unprecedented combination of
improved performance and energy efficiency.
22 nm processors using Tri-Gate transistors, code-named Ivy Bridge,
are now demonstrated working in systems.
Intel is on track for 22 nm production in 2H '11, maintaining a 2-year
cadence for introducing new technology generations.
This technological breakthrough is the result of Intel's highly
coordinated research-development-manufacturing pipeline.
Tri-Gate transistors are an important innovation needed to continue
Moore's Law.

Comparison between 32nm Process and 22nm Process:


Microarchitecture

Intel Core 2 Architecture on 22nm Process, with Microarchitecture.

In electronics engineering and computer engineering, microarchitecture
(sometimes abbreviated to µarch or uarch), also called computer
organization, is the way a given instruction set architecture (ISA) is
implemented on a processor. A given ISA may be implemented with
different microarchitectures; implementations may vary due to different goals
of a given design or due to shifts in technology.
Computer architecture is the combination of microarchitecture and
instruction set design.

Relation to instruction set architecture


The ISA is roughly the same as the programming model of a processor as
seen by an assembly language programmer or compiler writer. The ISA
includes the execution model, processor registers, address and data formats
among other things. The microarchitecture includes the constituent parts of
the processor and how these interconnect and interoperate to implement the
ISA.
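The distinction can be illustrated with a toy model in which one set of instruction semantics (the ISA) is implemented by two different cores (the microarchitectures). Everything here is hypothetical and simplified for illustration: the two-instruction ISA, the class names, and the cycle counts are assumptions, not a description of any real processor.

```python
def isa_semantics(op, a, b):
    """The ISA: what each instruction means, independent of implementation."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    raise ValueError(f"unknown instruction: {op}")


class SingleCycleCore:
    """Microarchitecture A: each instruction takes one long cycle."""
    CYCLES_PER_INSTRUCTION = 1

    def run(self, program, registers):
        regs = list(registers)
        cycles = 0
        for op, dest, src1, src2 in program:
            regs[dest] = isa_semantics(op, regs[src1], regs[src2])
            cycles += self.CYCLES_PER_INSTRUCTION
        return regs, cycles


class MultiCycleCore(SingleCycleCore):
    """Microarchitecture B: each instruction is split into four short cycles."""
    CYCLES_PER_INSTRUCTION = 4


if __name__ == "__main__":
    # r2 = r0 + r1, then r3 = r2 - r0
    program = [("add", 2, 0, 1), ("sub", 3, 2, 0)]
    for core in (SingleCycleCore(), MultiCycleCore()):
        print(type(core).__name__, core.run(program, [5, 7, 0, 0]))
```

Both cores leave the registers in the same state, which is what a programmer sees through the ISA; only the cycle count, a microarchitectural detail, differs.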

Single bus organization microarchitecture


The microarchitecture of a machine is usually represented as (more or less
detailed) diagrams that describe the interconnections of the various
microarchitectural elements of the machine, which may be everything from
single gates and registers, to complete arithmetic logic units (ALUs) and
even larger elements. These diagrams generally separate the datapath (where
data is placed) and the control path (which can be said to steer the data).
The person designing a system usually draws the specific microarchitecture
as a kind of data flow diagram. Like a block diagram, the microarchitecture
diagram shows microarchitectural elements such as the arithmetic and logic
unit and the register file as a single schematic symbol. Typically the diagram
connects those elements with arrows and thick lines and thin lines to
distinguish between three-state buses -- which require a three state buffer for
each device that drives the bus; unidirectional buses -- always driven by a
single source, such as the way the address bus on simpler computers is
always driven by the memory address register; and individual control lines.
Very simple computers have a single data bus organization -- they have a
single three-state bus. The diagram of more complex computers usually
shows multiple three-state buses, which help the machine do more
operations simultaneously.
Each microarchitectural element is in turn represented by a schematic
describing the interconnections of logic gates used to implement it. Each
logic gate is in turn represented by a circuit diagram describing the
connections of the transistors used to implement it in some particular logic
family. Machines with different microarchitectures may have the same
instruction set architecture, and thus be capable of executing the same
programs. New microarchitectures and/or circuitry solutions, along with
advances in semiconductor manufacturing, are what allow newer
generations of processors to achieve higher performance while using the
same ISA.
In principle, a single microarchitecture could execute several different ISAs
with only minor changes to the microcode.

Aspects of microarchitecture
Intel 80286 microarchitecture
The pipelined datapath is the most commonly used datapath design in
microarchitecture today. This technique is used in most modern
microprocessors, microcontrollers, and DSPs. The pipelined architecture
allows multiple instructions to overlap in execution, much like an assembly
line. The pipeline includes several different stages which are fundamental in
microarchitecture designs. Some of these stages include instruction fetch,
instruction decode, execute, and write back. Some architectures include
other stages such as memory access. The design of pipelines is one of the
central microarchitectural tasks.
Execution units are also essential to microarchitecture. Execution units
include arithmetic logic units (ALU), floating point units (FPU), load/store
units, branch prediction, and SIMD. These units perform the operations or
calculations of the processor. The choice of the number of execution units,
their latency and throughput is a central microarchitectural design task. The
size, latency, throughput and connectivity of memories within the system are
also microarchitectural decisions.
System-level design decisions such as whether or not to include peripherals,
such as memory controllers, can be considered part of the microarchitectural
design process. This includes decisions on the performance-level and
connectivity of these peripherals.
Unlike architectural design, where achieving a specific performance level is
the main goal, microarchitectural design pays closer attention to other
constraints. Since microarchitecture design decisions directly affect what
goes into a system, attention must be paid to such issues as:

Chip area/cost
Power consumption
Logic complexity
Ease of connectivity
Manufacturability
Ease of debugging
Testability
Microarchitectural concepts
Instruction cycle
In general, all CPUs, single-chip microprocessors or multi-chip
implementations run programs by performing the following steps:
1. Read an instruction and decode it
2. Find any associated data that is needed to process the instruction
3. Process the instruction
4. Write the results out
5. The instruction cycle is repeated continuously until the power is
turned off.
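The steps above can be sketched as a loop over a toy program. This is a minimal illustration under stated assumptions: the instruction format (operation, destination, two sources), the "add" and "mul" operations, and the use of a dictionary as memory are all hypothetical, and the loop stops at the end of the program rather than running until power-off.

```python
def run(program, memory):
    """Execute a toy program and return the resulting memory contents."""
    memory = dict(memory)  # work on a copy
    pc = 0                 # program counter
    while pc < len(program):
        # Step 1: read an instruction and decode it.
        op, dest, src_a, src_b = program[pc]

        # Step 2: find the data needed to process the instruction.
        a, b = memory[src_a], memory[src_b]

        # Step 3: process the instruction.
        if op == "add":
            result = a + b
        elif op == "mul":
            result = a * b
        else:
            raise ValueError(f"unknown instruction: {op}")

        # Step 4: write the results out.
        memory[dest] = result

        # Step 5: repeat with the next instruction.
        pc += 1
    return memory


if __name__ == "__main__":
    # t = x + y, then z = t * t
    program = [("add", "t", "x", "y"), ("mul", "z", "t", "t")]
    print(run(program, {"x": 2, "y": 3}))
```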

Increasing execution speed


Complicating this simple-looking series of steps is the fact that the memory
hierarchy, which includes caching, main memory and non-volatile storage
like hard disks (where the program instructions and data reside), has always
been slower than the processor itself. Step (2) often introduces a lengthy (in
CPU terms) delay while the data arrives over the computer bus. A
considerable amount of research has been put into designs that avoid these
delays as much as possible. Over the years, a central goal was to execute
more instructions in parallel, thus increasing the effective execution speed of
a program. These efforts introduced complicated logic and circuit structures.
Initially, these techniques could only be implemented on expensive
mainframes or supercomputers due to the amount of circuitry needed for
these techniques. As semiconductor manufacturing progressed, more and
more of these techniques could be implemented on a single semiconductor
chip. See Moore's law.

Instruction set choice


Instruction sets have shifted over the years, from originally very simple to
sometimes very complex (in various respects). In recent years, load-store
architectures, VLIW and EPIC types have been in fashion. Architectures that
deal with data parallelism include SIMD and vector designs. Some labels
used to denote classes of CPU architectures are not particularly descriptive,
especially so the CISC label; many early designs retroactively denoted
"CISC" are in fact significantly simpler than modern RISC processors (in
several respects).
However, the choice of instruction set architecture may greatly affect the
complexity of implementing high performance devices. The prominent
strategy, used to develop the first RISC processors, was to simplify
instructions to a minimum of individual semantic complexity combined with
high encoding regularity and simplicity. Such uniform instructions were
easily fetched, decoded and executed in a pipelined fashion, and offered a simple
strategy to reduce the number of logic levels in order to reach high operating
frequencies; instruction cache memories compensated for the higher
operating frequency and inherently low code density, while large register sets
were used to factor out as much of the (slow) memory accesses as possible.

Instruction pipelining
One of the first, and most powerful, techniques to improve performance is
the use of the instruction pipeline. Early processor designs would carry out
all of the steps above for one instruction before moving onto the next. Large
portions of the circuitry were left idle at any one step; for instance, the
instruction decoding circuitry would be idle during execution and so on.
Pipelines improve performance by allowing a number of instructions to
work their way through the processor at the same time. In the same basic
example, the processor would start to decode (step 1) a new instruction
while the last one was waiting for results. This would allow up to four
instructions to be "in flight" at one time, making the processor look four
times as fast. Although any one instruction takes just as long to complete
(there are still four steps) the CPU as a whole "retires" instructions much
faster.
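The "four times as fast" figure can be checked with simple cycle counting. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function names are hypothetical.

```python
def unpipelined_cycles(instructions, stages=4):
    """Each instruction passes through every stage before the next starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages=4):
    """The first instruction fills the pipeline; after that one retires per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def speedup(instructions, stages=4):
    """Ratio of unpipelined to pipelined execution time."""
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(instructions, stages)


if __name__ == "__main__":
    for count in (1, 4, 100, 1000):
        print(f"{count:>5} instructions: speedup {speedup(count):.2f}")
```

A single instruction still takes four cycles, so its latency is unchanged, while the speedup for a long instruction stream approaches the number of stages.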
RISC makes pipelines smaller and much easier to construct by cleanly
separating each stage of the instruction process and making them take the
same amount of time: one cycle. The processor as a whole operates in an
assembly line fashion, with instructions coming in one side and results out
the other. Due to the reduced complexity of the Classic RISC pipeline, the
pipelined core and an instruction cache could be placed on the same size die
that would otherwise fit the core alone on a CISC design. This was the real
reason that RISC was faster. Early designs like the SPARC and MIPS often
ran over 10 times as fast as Intel and Motorola CISC solutions at the same
clock speed and price.
Pipelines are by no means limited to RISC designs. By 1986 the
top-of-the-line VAX implementation (VAX 8800) was a heavily pipelined design,
slightly predating the first commercial MIPS and SPARC designs. Most
modern CPUs (even embedded CPUs) are now pipelined, and microcoded
CPUs with no pipelining are seen only in the most area-constrained
embedded processors. Large CISC machines, from the VAX 8800 to the
modern Pentium 4 and Athlon, are implemented with both microcode and
pipelines. Improvements in pipelining and caching are the two major
microarchitectural advances that have enabled processor performance to
keep pace with the circuit technology on which they are based.

Cache
It was not long before improvements in chip manufacturing allowed for even
more circuitry to be placed on the die, and designers started looking for
ways to use it. One of the most common was to add an ever-increasing
amount of cache memory on-die. Cache is simply very fast memory,
memory that can be accessed in a few cycles as opposed to the many needed to
"talk" to main memory. The CPU includes a cache controller which
automates reading and writing from the cache. If the data is already in the
cache it simply "appears", whereas if it is not, the processor is "stalled" while
the cache controller reads it in.
RISC designs started adding cache in the mid-to-late 1980s, often only 4 KB
in total. This number grew over time, and typical CPUs now have at least
512 KB, while more powerful CPUs come with 1 or 2 or even 4, 6, 8 or 12
MB, organized in multiple levels of a memory hierarchy. Generally
speaking, more cache means more performance, due to reduced stalling.
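The hit, miss and stall behaviour described above can be sketched with a toy model. This is only an illustration, assuming a direct-mapped cache (each memory block maps to exactly one cache line); the class name, the line count, the block size and the cycle costs are all assumed values, not figures from the report.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block maps to exactly one line."""

    def __init__(self, n_lines: int = 4, block_size: int = 16):
        self.n_lines = n_lines
        self.block_size = block_size
        self.tags = [None] * n_lines  # tag held by each line, None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size
        index = block % self.n_lines   # which line the block must use
        tag = block // self.n_lines    # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag         # the controller "reads it in"
        self.misses += 1
        return False


def total_cycles(cache, addresses, hit_cycles=2, miss_penalty=100):
    """Sum access time; hit_cycles and miss_penalty are illustrative assumptions."""
    cycles = 0
    for address in addresses:
        hit = cache.access(address)
        cycles += hit_cycles if hit else hit_cycles + miss_penalty
    return cycles
```

With these assumed costs, every avoided miss saves 100 cycles of stalling, which is the sense in which more cache generally means more performance.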
Caches and pipelines were a perfect match for each other. Previously, it
didn't make much sense to build a pipeline that could run faster than the
access latency of off-chip memory. Using on-chip cache memory instead
meant that a pipeline could run at the speed of the cache access latency, a
much smaller length of time. This allowed the operating frequencies of
processors to increase at a much faster rate than that of off-chip memory.

Branch prediction
One barrier to achieving higher performance through instruction-level
parallelism stems from pipeline stalls and flushes due to branches. Normally,
whether a conditional branch will be taken isn't known until late in the
pipeline as conditional branches depend on results coming from a register.
From the time that the processor's instruction decoder has figured out that it
has encountered a conditional branch instruction to the time that the deciding
register value can be read out, the pipeline needs to be stalled for several
cycles, or if it's not and the branch is taken, the pipeline needs to be flushed.
As clock speeds increase, the depth of the pipeline increases with them, and
some modern processors may have 20 stages or more. On average, every
fifth instruction executed is a branch, so without any intervention, that
adds up to a large amount of stalling.
Techniques such as branch prediction and speculative execution are used to
lessen these branch penalties. Branch prediction is where the hardware
makes educated guesses on whether a particular branch will be taken. In
reality one side or the other of the branch will be called much more often
than the other. Modern designs have rather complex statistical prediction
systems, which watch the results of past branches to predict the future with
greater accuracy. The guess allows the hardware to prefetch instructions
without waiting for the register read. Speculative execution is a further
enhancement in which the code along the predicted path is not just
prefetched but also executed before it is known whether the branch should
be taken or not. This can yield better performance when the guess is good,
with the risk of a huge penalty when the guess is bad because instructions
need to be undone.
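How hardware can "watch the results of past branches" may be illustrated with a small sketch. It assumes a two-bit saturating counter per branch, a common textbook scheme that the report itself does not name; real predictors are considerably more elaborate, and the class and function names here are hypothetical.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter state in 0..3

    def predict(self, pc: int) -> bool:
        # Unseen branches start in state 1 ("weakly not taken"), an arbitrary choice.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc: int, taken: bool) -> None:
        state = self.counters.get(pc, 1)
        self.counters[pc] = min(state + 1, 3) if taken else max(state - 1, 0)


def accuracy(predictor, pc, outcomes):
    """Fraction of branch outcomes guessed correctly, updating the predictor as it goes."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)
```

For a loop branch that is taken nine times and then falls through once, repeated ten times, this sketch guesses 89 of the 100 outcomes correctly: a single surprise at each loop exit does not flip the prediction for the next run of the loop.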

Superscalar
Even with all of the added complexity and gates needed to support the
concepts outlined above, improvements in semiconductor manufacturing
soon allowed even more logic gates to be used.
In the outline above the processor processes parts of a single instruction at a
time. Computer programs could be executed faster if multiple instructions
were processed simultaneously. This is what superscalar processors achieve,
by replicating functional units such as ALUs. The replication of functional
units was only made possible when the die area of a single-issue processor
no longer stretched the limits of what could be reliably manufactured. By the
late 1980s, superscalar designs started to enter the marketplace.
In modern designs it is common to find two load units, one store (many
instructions have no results to store), two or more integer math units, two or
more floating point units, and often a SIMD unit of some sort. The
instruction issue logic grows in complexity by reading in a huge list of
instructions from memory and handing them off to the different execution
units that are idle at that point. The results are then collected and re-ordered
at the end.

Out-of-order execution
The addition of caches reduces the frequency or duration of stalls due to
waiting for data to be fetched from the memory hierarchy, but does not get
rid of these stalls entirely. In early designs a cache miss would force the
cache controller to stall the processor and wait. Of course there may be some
other instruction in the program whose data is available in the cache at that
point. Out-of-order execution allows that ready instruction to be processed
while an older instruction waits on the cache, then re-orders the results to
make it appear that everything happened in the programmed order. This
technique is also used to avoid other operand dependency stalls, such as an
instruction awaiting a result from a long latency floating-point operation or
other multi-cycle operations.
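The benefit can be sketched with a toy scheduler. This is a simplified model under stated assumptions: one instruction issues per cycle, every instruction writes a distinct register (so no renaming is needed), and a load that misses the cache is represented simply by a long latency. The names Instr, run and EXAMPLE are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: tuple
    latency: int  # cycles until the result is available


def run(program, out_of_order: bool) -> int:
    """Return the cycle at which the last result becomes available."""
    ready_at = {}            # register -> cycle its value becomes available
    pending = list(program)  # not yet issued, in program order
    cycle = 0
    finish = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction.
        window = pending if out_of_order else pending[:1]
        for pos, instr in enumerate(window):
            older_dests = {p.dest for p in pending[:pos]}
            operands_ready = all(
                s not in older_dests and ready_at.get(s, 0) <= cycle
                for s in instr.srcs
            )
            if operands_ready:
                ready_at[instr.dest] = cycle + instr.latency
                finish = max(finish, cycle + instr.latency)
                pending.pop(pos)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


# A load that misses (10 cycles), its dependent add, then an independent hit and add.
EXAMPLE = [
    Instr("load_a", "r1", (), 10),
    Instr("add_1", "r2", ("r1", "r0"), 1),
    Instr("load_b", "r3", (), 2),
    Instr("add_2", "r4", ("r3", "r0"), 1),
]
```

In this toy program, run(EXAMPLE, False) finishes at cycle 14 and run(EXAMPLE, True) at cycle 11, because the second load and its add proceed while the first add waits on the miss.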

Register renaming
Register renaming refers to a technique used to avoid unnecessary serialized
execution of program instructions because of the reuse of the same registers
by those instructions. Suppose we have two groups of instructions that will
use the same register. Normally one set of instructions must execute first and
then leave the register to the other set, but if the other set is assigned to a
different, equivalent register, both sets of instructions can be executed in
parallel.
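A minimal sketch of the idea follows. It assumes an unlimited supply of physical registers named p0, p1, and so on, and three-operand instructions written as (dest, src1, src2); the function name and register names are hypothetical.

```python
def rename(program, arch_regs=("r1", "r2", "r3", "r4")):
    """Map architectural registers to physical ones; program is a list of (dest, src1, src2)."""
    mapping = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}  # initial map
    next_free = len(arch_regs)
    renamed = []
    for dest, src1, src2 in program:
        s1, s2 = mapping[src1], mapping[src2]  # read sources under the current map
        mapping[dest] = f"p{next_free}"        # every write gets a fresh physical register
        next_free += 1
        renamed.append((mapping[dest], s1, s2))
    return renamed


# Two groups of instructions that both reuse r1 as a temporary.
TWO_GROUPS = [
    ("r1", "r2", "r3"),  # group 1 writes r1
    ("r4", "r1", "r1"),  # group 1 reads r1
    ("r1", "r2", "r2"),  # group 2 overwrites r1
    ("r3", "r1", "r1"),  # group 2 reads r1
]
```

After renaming, the first group works on p4 and the second on p6, so the only thing that forced the groups to run one after the other, the shared name r1, is gone.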

Multiprocessing and multithreading


Computer architects have become stymied by the growing mismatch in CPU
operating frequencies and DRAM access times. None of the techniques that
exploited instruction-level parallelism (ILP) within one program could make
up for the long stalls that occurred when data had to be fetched from main
memory. Additionally, the large transistor counts and high operating
frequencies needed for the more advanced ILP techniques required power
dissipation levels that could no longer be cheaply cooled. For these reasons,
newer generations of computers have started to exploit higher levels of
parallelism that exist outside of a single program or program thread.
This trend is sometimes known as throughput computing. This idea
originated in the mainframe market where online transaction processing
emphasized not just the execution speed of one transaction, but the capacity
to deal with massive numbers of transactions. With transaction-based
applications such as network routing and web-site serving greatly increasing
in the last decade, the computer industry has re-emphasized capacity and
throughput issues.
One way this parallelism is achieved is through multiprocessing systems,
computer systems with multiple CPUs. Once reserved for high-end
mainframes and supercomputers, small-scale (2-8) multiprocessor servers
have become commonplace for the small business
market. For large corporations, large scale (16-256) multiprocessors are
common. Even personal computers with multiple CPUs have appeared since
the 1990s.
With further transistor size reductions made available by semiconductor
technology advances, multicore CPUs have appeared where multiple CPUs
are implemented on the same silicon chip. These were initially used in chips
targeting embedded markets, where simpler and smaller CPUs would allow
multiple instantiations to fit on one piece of silicon. By 2005, semiconductor
technology allowed dual high-end desktop CPU chips (chip multiprocessors,
CMP) to be manufactured in volume. Some designs, such as Sun Microsystems'
UltraSPARC T1, have reverted to simpler (scalar, in-order) designs in order
to fit more processors on one piece of silicon.
Another technique that has become more popular recently is multithreading.
In multithreading, when the processor has to fetch data from slow system
memory, instead of stalling for the data to arrive, the processor switches to
another program or program thread which is ready to execute. Though this
does not speed up a particular program/thread, it increases the overall system
throughput by reducing the time the CPU is idle.
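The throughput gain can be estimated with a simple idealized model. It assumes every thread alternates between a fixed burst of computation and a fixed memory stall, and that switching threads costs nothing; the function name and the cycle counts in the usage note are assumptions chosen for illustration.

```python
def cpu_utilization(n_threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the CPU does useful work when it switches threads on a stall."""
    period = compute_cycles + stall_cycles  # one compute-then-stall round of a single thread
    return min(1.0, n_threads * compute_cycles / period)
```

With 20 cycles of work per 80-cycle stall, one thread keeps the CPU busy 20% of the time, four threads 80%, and from five threads on the CPU is never idle. No individual thread finishes sooner, which matches the point that only overall throughput improves.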
Conceptually, multithreading is equivalent to a context switch at the
operating system level. The difference is that a multithreaded CPU can do a
thread switch in one CPU cycle instead of the hundreds or thousands of CPU
cycles a context switch normally requires. This is achieved by replicating the
state hardware (such as the register file and program counter) for each active
thread.
A further enhancement is simultaneous multithreading. This technique
allows superscalar CPUs to execute instructions from different
programs/threads simultaneously in the same cycle.


Facts of 22nm Process


The original transistor built by Bell Labs in 1947 was large enough
that it was pieced together by hand. By contrast, more than 100
million 22nm tri-gate transistors could fit onto the head of a pin.
More than 6 million 22nm tri-gate transistors could fit in the period at
the end of this sentence.
A 22nm tri-gate transistor's gates are so small that you could fit more
than 4000 of them across the width of a human hair.
If a typical house shrank as transistors have, you would not be able to
see a house without a microscope. To see a 22nm feature with the
naked eye, you would have to enlarge a chip to be larger than a house.
Compared to Intel's first microprocessor, the 4004, introduced in
1971, a 22nm CPU runs over 4000 times as fast and each transistor
uses about 5000 times less energy. The price per transistor has
dropped by a factor of about 50,000.
A 22nm transistor can switch on and off well over 100 billion times in
one second. It would take you around 2000 years to flick a light
switch on and off that many times.
It's one thing to design a tri-gate transistor but quite another to get it
into high volume manufacturing. Intel's factories produce over 5
billion transistors every second. That's 150,000,000,000,000,000
transistors per year, the equivalent of over 20 million transistors for
every man, woman and child on earth.
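The arithmetic behind the last two facts can be checked with a short calculation. The world population of about 7 billion is an assumption (roughly the figure at the time of this report), and the constant names are chosen here for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60          # 31,536,000 seconds

# Production rate quoted in the text: over 5 billion transistors per second.
TRANSISTORS_PER_SECOND = 5_000_000_000
TRANSISTORS_PER_YEAR = TRANSISTORS_PER_SECOND * SECONDS_PER_YEAR

# Assumption: a world population of about 7 billion people.
WORLD_POPULATION = 7_000_000_000
TRANSISTORS_PER_PERSON = TRANSISTORS_PER_YEAR // WORLD_POPULATION

# Light-switch comparison: 100 billion flicks spread over 2000 years.
SWITCHES_PER_SECOND = 100_000_000_000
FLICKS_PER_SECOND_BY_HAND = SWITCHES_PER_SECOND / (2000 * SECONDS_PER_YEAR)
```

The yearly total comes to about 1.58 x 10^17, consistent with the rounded figure of 150,000,000,000,000,000, and to roughly 22 million transistors per person. The light-switch comparison implies flicking the switch about 1.6 times per second without pause for 2000 years.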


References:
[1]. http://www.intel.com : Photos and Process details.
[2]. http://www.wikipedia.com : Definitions.
