
Design planning for large SoC implementation at 40nm - Part 3

Bhupesh Dasila - August 09, 2013

[Part 1 explores the process technology to learn its capabilities and limitations, including evaluating
the technology libraries, determining the implementation tools and flows, and capturing the SoC
requirements.
Part 2 covers comprehensive planning for complex designs at lower geometry.]
Floorplanning and PnR
A thorough exercise during physical architecture is the foundation for an efficient floorplan, and it helps reduce the overall turnaround time of the physical design phase. The broader perspective of the floorplan should be established during the physical architecture phase, and the actual floorplanning phase should address the finer details of the floorplan, which impact the physical design's QoR.
Floorplanning guidelines
The seed for a floorplan primarily comes from the physical architecture, the die size-power estimation exercise, and the technology. When creating a floorplan, it's important to consider some basic characteristics of the process technology. The designer should have explored the technology enough in the context of the metal stack and metal configuration, and should have gained ample experience with the availability of vertical and horizontal routing resources and their requirements for the design as per the physical architecture.
At any level, creating non-preferred routing (i.e. not using the preferred routing direction for that
level) is not recommended. In the case of a channel-based floorplan, when placing blocks, four-way
intersections in top-level channels should be avoided; T intersections create much less congestion.
This consideration can be critical in leaving adequate space for routing channels, especially if there
is not much opportunity for over-the-cell routing. Using fly lines can help determine optimal
placement and orientation, but when the fly lines are numerous enough to paint the area between
blocks, designers must rely on their best judgment for block placement, and later evaluate the
results for possible modification.
Once blocks are placed, block-level pins may be placed. It is necessary to determine the correct
layer for the pins and spread the pins out to reduce congestion. Placing pins in corners where
routing access is limited should be avoided; instead, multiple pin layers for less congestion should be
used. It is worth spending the time needed to place block pins manually so that block-to-block routes
are straight, have minimum distance between them, and are crosstalk-immune. This will help
immensely down the line during full-chip timing closure.

While placing the hard macros, like PLL or other analog blocks, it is important to adhere to the
guidelines provided by the IP vendor. Placing cells within the perimeter of hard macros is not
recommended. To keep from blocking access to signal pins, it is a good idea to avoid placing cells
under power straps unless the straps are on high metal layers (i.e., higher than metal2). Density
constraints or placement of blockage arrays may be used to reduce congestion since these strategies
will help spread cells over a larger area, thereby reducing the routing requirements in that area.
In any physical design work, it is essential to understand the requirements of the target process
technology. Lower utilization would result in a larger chip, but the chip is less likely to have
problems in routing. For example, most processes now require the insertion of holes in large metal
areas in a step known as slotting or cheesing. Slotting relieves stress-related problems in the
metal due to thermal effects, but may change the metal's current-carrying characteristics. It is
imperative to consult the design rule document for this and many other physical variables.
For technology nodes below 40nm, there are other important rules that must be considered while
creating the floorplan. For example, TCD structures are placed to monitor the various processes on the die. These components are mandated by the foundry and are required to be placed at regular intervals throughout the chip. They can be of significant size, and that space may need to be allocated on the die early on; if this is not considered early, it may impact the floorplan of a block at a later stage, for instance a block packed with memories. Similarly, for core ESD protection, it
is recommended to place ESD clamps at regular intervals on the die. These are a few of the
mandatory components that must be considered in the early stages of full-chip floorplanning, and
must be considered for block-level floorplanning.
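As a rough, illustrative sketch of this kind of planning arithmetic (the die size and maximum spacing below are made-up placeholders, not foundry numbers), the number of regularly spaced structures can be estimated up front in a few lines of Python:

import math

def count_regular_structures(die_w_um, die_h_um, max_spacing_um):
    # Grid-based estimate of how many regularly spaced structures (e.g., core
    # ESD clamps or foundry-mandated TCD cells) are needed so that no point on
    # the die is farther than max_spacing_um from one along either axis.
    # A rough planning figure only, not a foundry-qualified rule.
    cols = math.ceil(die_w_um / max_spacing_um) + 1
    rows = math.ceil(die_h_um / max_spacing_um) + 1
    return rows * cols

# Hypothetical example: 8 mm x 8 mm die, 1 mm maximum spacing (placeholder value)
print(count_regular_structures(8000, 8000, 1000))  # -> 81 grid locations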
The RTL should be examined for logical models to break out into hierarchical physical elements. If
there are multiple instances of any logical hierarchical element, these elements can be grouped to
form one physical element. It is easier to floorplan with same-size blocks, so small blocks should be
grouped and large blocks divided when appropriate. Working with medium-sized blocks is typically
best; six to twelve roughly equivalent-sized blocks is a reasonable target.
Typically, floorplans should be started with I/Os at the periphery (depending on the package design).
It is crucial to determine the total number of I/Os required, and their placements. The physical
designer must calculate the number of core Vdd and Vss pads in the I/O ring for ESD protection (in
the case of a flip chip). In the case of a wire-bond chip, the core Vdd and Vss must be calculated
through a thorough analysis after considering the chip power requirements and IR drop. The SSO
ratio must also be considered for calculation of VddIO and VssIO; care must be taken for high-speed
interfaces. Apart from this, there could be some custom requirements for I/O planning, such as the placement of PLLs in the I/O ring, or the placement of SerDes or DDR IP that includes its own bumps. During and after I/O planning, it is very important that the bump or bond pad DRC and LVS are clean before a floorplan can be considered final. Hence, I/O planning is yet another seed for
floorplanning.
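As an illustration of the kind of pad-count arithmetic involved (a hedged sketch in Python; the power, voltage, resistance, and IR-drop numbers are assumptions, not library or package data):

import math

def estimate_core_power_pads(core_power_w, vdd_v, pad_path_res_ohm, ir_budget_v):
    # Rough lower bound on the number of core Vdd (and, similarly, Vss) pads
    # for a wire-bond chip: total core current divided by the current a single
    # pad can carry without exceeding the per-pad IR-drop budget.
    # All values below are illustrative assumptions, not library or package data.
    total_current_a = core_power_w / vdd_v
    current_per_pad_a = ir_budget_v / pad_path_res_ohm
    return math.ceil(total_current_a / current_per_pad_a)

# Hypothetical example: 2 W core at 1.1 V, 0.5 ohm pad-plus-bond-wire path, 50 mV budget
print(estimate_core_power_pads(2.0, 1.1, 0.5, 0.05))  # -> 19 pads each for Vdd and Vss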
It is best to place parts of the design that have special layout requirements (e.g., memories, analog
circuitry, PLLs, logic that works with a double-speed clock, blocks that require a different voltage,
any exceptionally large blocks, etc.) first to ensure that their needs are accommodated. Design
blocks with special needs must be understood at the beginning; for example, flash memory has a
high-voltage programming input that must be within a certain distance of an I/O pin, so it is best to
place this element first.
If there are two or more large blocks or other features that make a reasonable floorplan impossible,
it may be necessary to increase the die size or re-arrange I/Os. Finding this problem early in the flow

enables an easier business decision about whether the chip will be financially viable with a larger,
more expensive die. If any of the large blocks are soft (synthesizable) IP or otherwise available as
RTL, it might be possible to avoid going to a larger die by repartitioning that block into smaller
pieces.
Another key aspect in I/O planning is to consider the scan I/O requirement. Here, the physical
designer must engage with the DFT architect. The scan architecture could also play a significant
role in I/O planning and floorplanning.
Block-level floorplanning
A good floorplan of the blocks is crucial to faster timing closure. The design knowledge and the data
flow understanding help immensely in creating an optimum floorplan, which leads to faster
convergence of the block. So, the physical designer must engage with the RTL designer.
Establish up front whether the block floorplan needs to be pushed down from the top or the other
way around. Typically, the block floorplan is pushed down from top, but in the case of some critical
blocks, the top floorplan may have to adjust to the block floorplan requirement.

Initial synthesis should be run to determine the total area of the cells in a block. The area of a block beyond the area of its cells is a function of utilization. Utilization varies depending on the library, technology, and characteristics of the design implemented; for a typical library, the sweet spot is usually about 70% utilization. An unusually high percentage of registers, or hard IP, will increase this number; large numbers of multiplexers or other small, pin-dense cells will decrease it.
The 70% utilization from a synthesized netlist may not be optimal for every block, as the growth of every block depends upon various factors, most importantly the number of memories and the associated MBIST logic added post DFT insertion. A block without any memory can have a starting utilization as high as 75%, while a block packed with memories may have a starting utilization as low as 60%. Therefore, it's best to arrive at the right utilization number after taking a block through a rough pass of the entire PD process in the early phase.
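A back-of-the-envelope sketch of this sizing exercise, in Python, using the utilization figures quoted above; the cell and memory areas are made-up inputs:

def block_area_um2(std_cell_area_um2, macro_area_um2, target_utilization):
    # Macros are counted at their full footprint; standard cells are spread out
    # according to the target utilization. Halos and routing channels are ignored,
    # so this is only a rough planning figure.
    return macro_area_um2 + std_cell_area_um2 / target_utilization

# Hypothetical logic-only block: 0.8 mm^2 of synthesized cells at 75% utilization
print(block_area_um2(800_000, 0, 0.75))          # ~1.07 mm^2
# Hypothetical memory-heavy block: 0.5 mm^2 of cells plus 1.2 mm^2 of memories at 60%
print(block_area_um2(500_000, 1_200_000, 0.60))  # ~2.03 mm^2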
Care must be taken while placing the memories. For example, there is a poly orientation rule from the foundry at technology nodes below 40nm that does not allow memories to be oriented at 90 degrees from one another (i.e., all memories should be aligned to either the X axis or the Y axis). Realizing this at a later stage could be catastrophic, hence the foundry rules must be understood and considered right from the beginning of the physical design activity.

PnR considerations
The physical designer must not rely on a push-button PnR approach. It's better to take stock of the situation at every PD stage. The physical design optimization and implementation go through various stages, like placement, placement optimization, CTS, post-CTS optimization, hold fixing, routing and routing optimization, crosstalk repair, and chip finishing. The best approach in physical design is to be able to push down the physical information from the physical synthesis environment to the PnR environment. However, this may not be possible if the physical synthesis and PnR tool vendors are different. In such cases, the physical designer must build in a margin for miscorrelation by exercising some critical benchmark blocks through synthesis to the PnR process.
Typically, every block would be processed through the PD flow a couple of times before going in for the final PD implementation. Every design can impose its own challenges and have its own customizations for better optimization. Some designs with higher pin density and more complex cells can impose major routing challenges. Some designs can have a high combinational-to-sequential cell ratio, which is more challenging in PD; in general, a ratio beyond eight could be highly challenging to converge (see the sketch below). Some designs may require the creation of soft regions. If there are critical signal nets that need special attention, such as non-default routing rules (NDRs) of the kind used for clock nets, then these must be identified. All of these issues must be identified by the physical designer prior to final implementation.
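A minimal sketch of the kind of early netlist profiling referred to above; the cell counts are illustrative, and the threshold of eight is the rule of thumb quoted in the text:

def combo_to_seq_ratio(cell_counts):
    # cell_counts maps a category ('combinational', 'sequential') to its instance
    # count from an early synthesized netlist; the ratio is used only as a rough
    # convergence-risk indicator, per the rule of thumb above.
    return cell_counts["combinational"] / cell_counts["sequential"]

# Hypothetical netlist profile
profile = {"combinational": 950_000, "sequential": 105_000}
ratio = combo_to_seq_ratio(profile)
print(f"combo/seq ratio = {ratio:.1f}",
      "-> high PD convergence risk" if ratio > 8 else "-> typical")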
One should also consider incorporating the process technology-related requirements early on, like
well taps and end caps to avoid physical DRC-related issues later on. The routing resource allocation
and assessment for signal nets must be done when the power structure for the chip is created. This
is where the metal stack plays a key role. A physical designer would always be comfortable with a
higher metal stack; but the metal stack decision has to align with business decisions.
The implementation flow from the RTL to the PD environment could vary depending on how the MBIST logic insertion is done (i.e., on the RTL or in the netlist). It's better if the physical synthesis is done post DFT insertion, as the BIST logic-to-memory paths at the functional frequency may require physical synthesis optimization. The synthesis, MBIST implementation, and PD teams must collaborate to achieve a better PD implementation of a block.
One of the crucial seeds to PD is the constraints. The STA and PD engineers must streamline the constraints before starting the PnR activity. The PD engineer must figure out what margins are to be kept at different stages of PnR for better optimization. One important thing to assess is where the hold should be fixed. Encounter does a good job of fixing hold after the post-CTS stage, but care must be taken regarding the number of hold buffers inserted. If there is a huge difference between the estimated routing delay and the actual routing delay (after detailed routing), then the PD tool might find the design environment at the post-CTS stage very pessimistic for hold and may end up adding more hold buffers than needed. At the same time, it may not be effective to wait until the crosstalk-fix stage to fix the hold violations. However, if the design is sensitive toward power, then it may be worth assessing the hold-violation fixing through timing ECOs.
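One way to make these stage-dependent margins explicit is a simple lookup that the flow scripts can consult. The stages listed match the ones named above, but the margin values are illustrative placeholders only:

# Hold-fixing margins (ns) applied at each PD stage; the values are illustrative
# placeholders, not a recommendation for any particular flow or tool.
HOLD_MARGIN_NS = {
    "placement_opt": None,   # hold usually not fixed this early
    "post_cts_opt":  0.05,   # larger margin: routing estimates are still optimistic
    "post_route":    0.02,   # tighter margin once real RC data is available
    "crosstalk_fix": 0.00,   # remaining violations handled through timing ECOs
}

def hold_margin(stage):
    # Return the hold margin to apply at a given PD stage (None means skip hold fixing).
    return HOLD_MARGIN_NS[stage]

print(hold_margin("post_cts_opt"))  # -> 0.05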
From the chip-level perspective, at the beginning of PnR, it is advisable to figure out the exact number of interactions between the blocks (i.e., the interface pins and the frequency of every interface). This helps in allocating the routing resources or customizing the critical interfaces. The PD engineer must also consider a keep-out region around the block to stay away from crosstalk issues from within the block.
PDV (physical DRC and LVS) is traditionally planned at a very late stage of SoC implementation, but it is highly recommended to look at this aspect early on. With the increased complexity of designs and the stringent foundry rules at 40nm and below, PDV is no longer an isolated activity left toward the end. In some cases it may also have an impact on the floorplan. It is mandatory to clean the bump or bond pad DRC before freezing the floorplan and I/O plan. The physical designer must look at the overall DRC and LVS situation thoroughly with the pre-final netlist so that there are no surprises once the design timing is closed with the final netlist. The block-level physical designers must try to clean the DRC and LVS on their respective blocks before they are assembled at the top level. The designer should assess the overall PDV situation, and should find solutions to every DRC at the pre-final stage so that toward the end, under huge schedule pressure, the DRC and LVS cycle can be reduced and made predictable. Hence, PDV must be incorporated in the pre-final stage of the PD schedule.
PD integration
An SoC typically has various components on one die. It needs thorough planning and extensive manual and automatic verification to ensure that each and every component functions as desired in the SoC environment. The PD engineer must thoroughly study the layout integration guidelines of each and every standard IP being integrated. The ESD and latch-up requirements and the overall electrical rules must be well understood. The designer should also be clear on the multiple power regions and their isolation requirements. At a high level, the ESD and latch-up rules appear simple, but their planning and verification become very exhaustive; therefore, they need to be considered during the floorplanning stage.
ESD is the transient discharge of static charge, or current through the chip's components, when the chip comes in contact with a charged body. If not designed carefully, a combination of parasitic components might trigger current flow in the substrate and result in a short between power and ground, producing a huge current flow through the components and eventually damaging the device; this is called latch-up. ESD events and latch-up can result in permanent damage to an ASIC, so it is imperative to take preventive measures while designing the chip. Some of the considerations for ESD and latch-up are:

I/Os are the only external interfaces susceptible to ESD; hence every standalone I/O should be protected. The ESD diode provides protection against an instantaneous current surge and passes it on to the low-resistance network. The designer should ensure that the power I/Os have ESD protection.
An ESD design should be such that, from any external interface point, there is a lowest-possible-resistance path for ESD current. The total bus resistance also includes the resistance from the bump to the I/O, or from the bond pad to the I/O, so the designer should ensure there are no weak connections from the source.
The designer must design for a common ESD path. Usually it's the core VSS, since it has the lowest-resistance mesh and connects to a big plane on the package.
The ESD scheme of every power domain needs to be understood, and an ESD plan that will protect the entire design needs to be developed. Enough ESD clamps should be added in each of the domains.
For latch-up protection, reduce the substrate resistance so that a voltage drop large enough to cause latch-up cannot build up. Alternatively, increase the substrate resistance so that there is no stray current in the substrate to cause latch-up. Use guard rings/guard bands to isolate the regions with a potential chance of latch-up.
The designer must check that all the standard IPs have substrate contacts. Standard cells may or may not have substrate contacts; per the foundry rule, tap cells must be added if the substrate contacts are not built in. Memories typically have substrate contacts.
The designer should check that all the standard I/Os have built-in substrate contacts. Latch-up protection is important when I/Os operating at different voltages are placed close together. Add substrate contacts (with p-diff) or guard bands (with n-diff), or a combination of both, to avoid latch-up.
For latch-up protection on IPs, designers must carefully place memories, complex IPs, and I/O domains, considering the voltage levels and the proximity of sensitive circuits to the high-power/switching circuits.

Overall electrical rule checks require identifying a lot of rules and their verification. This should be
done both manually and automatically. In the early phase of implementation, PD engineers should
consider identifying and automating those checks.
Clock planning

The foundation of the clock network starts right at the architecture stage. A lot of planning must go
into architecting the clock before it reaches the clock tree synthesis stage at the physical design
level. Appropriate clock definitions and constraints should be the seed for the clock network layout.
A sufficient amount of time must be spent creating a clock specification file. For example, the auto-generated clock tree constraints file may not contain all of the necessary constraints; it takes an understanding of the clock strategy, for instance, to define the correct root pins. Therefore, the recommendation is not to use the auto-generated constraint file blindly, but to create your own file after understanding the clock strategy. All clock group statements must be specified before any clock specification. Clock grouping is done to ensure that the maximum skew between their sinks does not exceed the maximum skew specified in the clock tree specification file.
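As a trivial illustration of the ordering rule just mentioned, a short script can check that all group statements precede the per-clock specifications in a spec file; the keywords used here are placeholders for whatever the chosen CTS tool expects:

def group_statements_precede_clock_specs(spec_lines):
    # Check that every clock-group statement appears before any per-clock
    # specification. 'ClkGroup' and 'ClockRoot' are placeholder keywords
    # standing in for whatever syntax the chosen CTS tool uses.
    clock_spec_seen = False
    for line_no, line in enumerate(spec_lines, 1):
        word = line.strip().split()[0] if line.strip() else ""
        if word == "ClockRoot":
            clock_spec_seen = True
        elif word == "ClkGroup" and clock_spec_seen:
            return False, line_no  # group statement found after a clock spec
    return True, None

spec = ["ClkGroup grp1 clk_a clk_b", "ClockRoot clk_a", "ClockRoot clk_b"]
print(group_statements_precede_clock_specs(spec))  # -> (True, None)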
It is crucial for physical designers to understand the clocking from a design perspective. The
physical designer should have a clear understanding about modes and about how various clock
domains interact. In the case of multiple clock domain designs, it is crucial to create a worst or
merged-mode clocking constraint, which can cover the maximum timing-critical paths. The idea
must be to create a clock tree that is optimized for all the modes.
The starting point of a clock spec is to identify the appropriate clock buffers. The libraries contain a range of clock net buffers and inverters that are designed to have nearly matching rise and fall signal behavior; such behavior helps in generating balanced clock circuitry. These cells also have much finer steps in drive strength compared to regular buffers and inverters. Additionally, the clock net buffers are designed such that the input capacitance of each drive-strength version is nearly identical. This makes it possible to exchange cells in a clock circuit to tune the drive strength without affecting the loading of the net connected to the cell's input, and hence without affecting the overall clock tree performance. Clock inverters are preferred over clock buffers since inverters have the ability to regenerate the edges; clock inverters maintain the duty cycle better, which is crucial for half-cycle paths.
Clock planning has to be crosstalk aware. Crosstalk on the clock path impacts setup paths twice as severely as crosstalk on the data path because it is accounted for in both the launch and capture delay computations. Crosstalk in the common clock path cancels out during hold analysis, since the analysis is done on the same edge, while for setup it is counted twice. Therefore, it's extremely important to lay out the top-level clock tree with the least amount of crosstalk; this directly benefits block-level setup timing closure. One of the factors PD engineers should consider is budgeting for the top-level crosstalk while closing the timing at the block level. One should be pessimistic yet realistic so as not to be surprised when block-level timings are seen in the top-level environment.
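A small numerical sketch of why common-clock-path crosstalk is counted twice for setup but cancels for hold; the delay numbers are made up and the model is the simplified one described above:

def setup_slack_ns(period, clk_latency, data_delay, xtalk_delta):
    # Setup: launch and capture are on different clock edges, so crosstalk on the
    # clock path can slow the launch clock AND speed up the capture clock -- the
    # delta is effectively counted twice.
    launch = clk_latency + xtalk_delta
    capture = period + clk_latency - xtalk_delta
    return capture - (launch + data_delay)

def hold_slack_ns(clk_latency, data_delay, xtalk_delta):
    # Hold: launch and capture use the same edge, so a delta on the common clock
    # path shifts both latencies together and cancels out.
    launch = clk_latency + xtalk_delta
    capture = clk_latency + xtalk_delta
    return (launch + data_delay) - capture

# Hypothetical numbers (ns): 2.0 ns period, 1.0 ns common clock latency,
# 1.5 ns data path, 0.1 ns crosstalk delta on the common clock segment
print(setup_slack_ns(2.0, 1.0, 1.5, 0.1))  # 0.3 vs. 0.5 without crosstalk: the delta is counted twice
print(hold_slack_ns(1.0, 1.5, 0.1))        # 1.5, unchanged: the common-path delta cancels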
Crosstalk-aware clock planning is thus challenging at 40nm and below. Usually the critical-path data optimization is already exhausted at multiple levels, from physical synthesis to physical optimization, so crosstalk fixing is usually handled through more custom efforts; the default flow does not help much. Usually there is also not much room left for routing, due to floorplan optimization and the higher pin density inherited from the RTL complexity. All these factors make crosstalk optimization a challenging job. Some of the key points to plan for while synthesizing and routing clocks are:

Using max-distance constraints along with slew constraints can help in building an SI-free clock network without shielding.
Use timing-aware downsizing of aggressor net drivers to fix SI issues.
Use max-distance constraints wisely. Although tighter max-distance constraints result in more clock buffers, the clock latency and skew come out much lower.
Change the routing topology of aggressor and/or victim net segments.

Relying on the tool alone to achieve the desired clock tree result may not be a good idea; the physical designer can understand and specify the requirement much better. Below is an example where one can see the divergence reduction going from a tool-based CTS approach to a semi-auto (custom) CTS approach.
Tool-based auto CTS approach

Clock routing to Block1

Clock routing to Block2

Observations on tool-based auto CTS

There is not much of a common point from which the tool diverges the clock between the interacting blocks.
A minimal common point between interacting blocks leads to divergence.
More divergence => more skew.
Total divergence distance in block1 = 6106 + 2402 = 8508 µm
Total divergence distance in block2 = 2 x 3060 + 3905 = 10025 µm
Skew due to divergence (see the sketch after this list):
block1-to-block2 = 8508 x derate - 10025 x derate
block2-to-block1 = 10025 x derate - 8508 x derate
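The arithmetic above can be reproduced with a short sketch; the per-micron derate factor is an assumed placeholder, since the skew is only expressed symbolically here:

def divergence_skew_ns(div_dist_a_um, div_dist_b_um, derate_ns_per_um):
    # Skew contribution from divergent clock routing between two blocks: the
    # difference of the divergent distances times an assumed delay-per-length
    # (derate) factor.
    return (div_dist_a_um - div_dist_b_um) * derate_ns_per_um

block1_div_um = 6106 + 2402        # 8508 um of divergent routing to block1
block2_div_um = 2 * 3060 + 3905    # 10025 um of divergent routing to block2
derate = 0.0002                    # assumed 0.2 ps/um, purely illustrative

print(divergence_skew_ns(block1_div_um, block2_div_um, derate))  # block1-to-block2
print(divergence_skew_ns(block2_div_um, block1_div_um, derate))  # block2-to-block1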

Semi-custom CTS approach


The following guidelines are adopted:

Manually add anchor points (clock buffers) at the desired locations in the layout.
Detach the clock sinks from the source (PLL) and attach them to the desired anchor point. This requires an ECO to be done in the netlist.
The clock path from the source (PLL) to block1 spans an approximate distance of 19 mm, out of which the divergence is only 3100 µm (6 buffers).
Anchor points are inserted from the PLL up to the maximum common point between interacting blocks from a floorplan placement perspective.

Clock tree after semi-auto CTS approach

Advantages:
Designer dictates the diverging point between the interacting blocks/sinks through ECOs.
Divergence is optimal.

One can also consider creating a feed-through path through a block. Example below:

Reduced clock divergence between the blocks

Timing constraints and budgeting


Constraints drive the chip implementation, and the implementation of a chip is verified against the required constraints; so, one can say constraints are the bounding box of SoC implementation. The definition of constraints must start at a very early phase of RTL development. The starting point must be understanding the various I/O specifications and data sheets, and translating them into an easily understandable spreadsheet from which the SDC should be coded. In the case of a design with multiple complex clocks, half the job is done if the clocks are correctly defined.
Constraint development is typically a refinement process. It's important that constraints are coded from a full-chip perspective. For large hierarchical chips, the crucial part is translating the constraints from top to bottom (i.e., from the chip level to the sub-chip level). It is equally important to merge sub-chip-level constraints at the full-chip level, for example, bringing a sub-chip exception up to the chip level. This is where the budgeting methodology plays a key role. The designer must be able to accurately estimate the top-level interconnect delay based on the delay estimation exercise; this data can be utilized to devise the block-level I/O constraints in the early phase. The basic idea of accurate budgeting is to reduce the STA iterations between the blocks and the top. The block-level constraints should be complete and accurate enough that once a block is timing closed at the block level, it remains timing closed when seen in the full-chip context.
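A minimal sketch of how a top-level interconnect-delay estimate can be turned into block-level I/O budgets; the formula and all numbers are simplifying assumptions, and real budgeting is refined iteratively:

def io_budget_ns(clock_period_ns, top_interconnect_ns, partner_block_share):
    # The external delay seen at a block pin is approximated as the partner
    # block's share of the clock period plus the estimated top-level interconnect
    # delay; this is the value fed to set_input_delay / set_output_delay in the
    # block SDC. The remainder of the period is the block's internal budget.
    external = partner_block_share * clock_period_ns + top_interconnect_ns
    return {"external_delay_ns": round(external, 3),
            "internal_budget_ns": round(clock_period_ns - external, 3)}

# Hypothetical interface: 2.5 ns clock, 0.4 ns estimated top-level route, 40% partner share
print(io_budget_ns(2.5, 0.4, 0.40))  # {'external_delay_ns': 1.4, 'internal_budget_ns': 1.1}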
In the full-chip context, there are two kinds of interactions that need to be understood thoroughly before executing the budgeting: block-to-block and block-to-chip I/O. These require a clear understanding of some important parameters, like pad delays, interconnect delays, clock propagation, data propagation, clock-data skew, and the slack margin parameters (i.e., the available setup/hold and max/min delays in the interface data sheet).
In fact, before thinking about budgeting, the STA engineers must come up with the STA strategy for
the full chip. Hierarchical STA is the way to go for large ASICs. The STA engineer must come up
with a strategy for timing and verification of the chip at various stages. In the initial phases it would
be model-based STA. QTM, ETM, or ILM models can be used. One can come up with a mixed
model approach depending upon the RTL and PD stage of each of the blocks. For the large and
complex ASIC, it is very important that full-chip timing is continuously seen from the early RTL
development stage so that the PD and STA engineers get a good hold on the chip interface and
block-to-block interconnect timings.

It's worth spending time cleaning up the constraints at the pre-layout stage of physical design. Constraints and timing analysis should drive the PD implementation, because eventually the quality of the PD implementation is validated against the constraints.
Design closure and signoff
At the design closure stage PD engineers primarily verify and signoff the design for timing, power
and foundry rules. However, the strategy for signoff must be chalked out well in advance in the PD
implementation stage. The designer must have clarity on the operating condition of the chip and
create the power and timing verification environment accordingly. It could be wise to look into it
before the PD implementation is in full swing.
Timing closure
Another important aspect is defining the signoff corners. It is worth taking a look at reducing the number of timing closure corners. The application of the chip should be taken into consideration while coming up with the various permutations and combinations of STA corners; one should avoid overdesigning for non-existent application corners. Every additional STA corner has a cost in effort and iteration time.
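As a simple illustration of this pruning exercise (the corner set and the dropped corner are assumptions for the sketch, not a recommendation):

from itertools import product

# Candidate STA corners: process x voltage x temperature x extraction.
# Which corners can safely be dropped must come from the chip's actual
# application profile; the pruned corner below is only an assumed example.
processes    = ["ss", "ff"]
voltages     = ["vmin", "vmax"]
temperatures = ["-40C", "125C"]
extraction   = ["cworst", "cbest"]

all_corners = set(product(processes, voltages, temperatures, extraction))
non_existent = {("ff", "vmin", "125C", "cworst")}  # assumed unused application corner
signoff_corners = sorted(all_corners - non_existent)

print(len(all_corners), "candidate corners ->", len(signoff_corners), "signoff corners")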
Defining the total number of functional and DFT modes right at the beginning of the PD
implementation stage is crucial. It has to be thoroughly verified that the STA modes of the chip are
optimum and cover everything required for timing verification, keeping all the defined applications
and the environment in mind. STA, DFT and PD engineers must spend time to craft an overall chip
timing verification strategy. The mode coded in the SDC for PD implementation has to be
comprehensive enough to cover all the modes. So, the key is correctly merging the modes for PD implementation as well as for timing verification.
Signal integrity has been one of the key challenges in complex SoC design at technology nodes like
40nm. Coupling capacitance becomes proportionally larger at these geometries. Interconnect delays
become bottlenecks for performance improvement. Large-scale integration of systems into silicon
brings many long interconnects between the blocks. Signal arrival times at the destination
dynamically change on these long interconnects, as a result of large associated coupling
capacitance. The induced delay (positive or negative) is called crosstalk delay. Crosstalk delay is
hard to predict due to its dynamic nature, and if not taken care of it will adversely affect the
performance of the device and may lead to device failure. Some of the traditional timing
closure/design budgeting techniques become ineffective while closing timing for crosstalk induced
delay. The challenges are realistic analysis, avoiding pessimism, careful isolation of valid violations,
and determining a methodology for efficient fixes.
Some of the key points to drive crosstalk-aware implementation and analysis are:

Account for larger die area for more buffers, less net congestion, more shielding, etc. so that
crosstalk effects are mitigated.
Add a couple of weeks in the schedule just for crosstalk cleanup.
Start the process early.
Make sure the crosstalk settings are realistic, and have a dependable crosstalk delay analysis flow.
Ensure slew (transition) control from the beginning.
Shield noisy macros like analog cells.
Shield long busses by routing ground (VSS) stripes beside them.
Macros like DPLLs, DLLs, DCDLs, and RAMs should be placed near the blocks that use them; otherwise, long outputs running from such macros would be disastrous from a crosstalk perspective. Create placement and routing blockages around some of these skew-sensitive macros to avoid possible crosstalk.
Interleave address and data busses with each other during port placement of the internal blocks. It
is less likely that the address bits and data bits change their value at the same time.
Use ILM-like models for top-level analysis.
Attack the root cause to reduce the number of iterations.
Control various parameters of the flow to get the results for the right analysis. For example,
control the number of aggressors and scaling factors, clock grouping, etc. to perform accurate
analysis.
Post-process the reports efficiently to filter out the real violations.
Derive efficient methods to fix the violations with minimal effect on other domains.

The timing closure engineers should be cognizant of the PD process and its impact on timing. So, the
timing closure should be a continual process with the PD closure. The impact of chip finishing like
DFM is another important aspect that the timing engineer should not leave until the end. The metal
fill, the redundant via, etc. can impact the ground cap and the net resistance, and therefore, the
crosstalk. Hence, the DFM process and its impact on timing should be assessed well ahead of the
tape-out phase.
Power analysis for signoff
Rail analysis (i.e., static and dynamic IR analysis) is usually done once the design has run through PD once. However, this data can be very important in validating the power plan and the floorplan itself. Accuracy of the analysis is key: the designer should be careful in choosing the correct corner while generating the power analysis data for rail analysis. Knowing where and how to apply power rail analysis can save a great deal of time in power planning and verification. For today's designs, both static and dynamic analysis should be utilized from initial floorplanning (power planning) through sign-off.
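As a simple example of the kind of hand check static analysis automates (sheet resistance, strap dimensions, and current are all assumed numbers):

def strap_ir_drop_mv(current_ma, length_um, width_um, sheet_res_ohm_per_sq):
    # Worst-case IR drop across a power strap fed from one end, modeled as a
    # single resistor: R = Rsheet * (length / width). A hand calculation only;
    # real rail analysis works on the full extracted power mesh.
    resistance_ohm = sheet_res_ohm_per_sq * (length_um / width_um)
    return current_ma * resistance_ohm  # mA * ohm = mV

# Hypothetical strap: 5 mA through a 500 um long, 4 um wide strap at 0.02 ohm/sq
print(strap_ir_drop_mv(5, 500, 4, 0.02))  # -> 12.5 mV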
Some important points to consider when determining a methodology for power rail analysis include:

Use static analysis to generate robust power rails (widths, vias, etc.).
Use dynamic analysis to optimize the insertion of de-coupling capacitance.
In the case of a power-gated design, use static analysis to optimize power switch sizes to minimize IR drop. Use dynamic power-up analysis to optimize power switch sizes to control the power-up ramp time.
Use both static and dynamic analysis early and late in the design flow.
Establish IR drop limits based on understanding how IR drop can affect timing.
Try to optimize for decoupling capacitors early in the flow, since late optimization for de-caps can
lead to major re-work.

Another important part of rail analysis is EM analysis. Designers must ensure that the design meets the DC current density limits of the power mesh and the power rails before finalizing the power plan. This becomes more critical if the design is packed with memories: the power tapping to the memories needs to be sufficient to sustain the current densities.
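A sketch of the DC current-density check mentioned above; the EM limit used is a made-up placeholder, since the actual limits come from the foundry DRM at the signed-off temperature:

def dc_em_ok(avg_current_ma, wire_width_um, em_limit_ma_per_um):
    # Compare a wire's average (DC) current density against an EM limit expressed
    # in mA per micron of width, as power-grid EM rules commonly are. The limit
    # passed in below is a made-up placeholder, not a DRM value.
    density = avg_current_ma / wire_width_um
    return density <= em_limit_ma_per_um, density

# Hypothetical memory power tap: 12 mA average through a 4 um strap,
# checked against an assumed 5 mA/um DC limit
ok, density = dc_em_ok(12.0, 4.0, 5.0)
print(f"density = {density:.1f} mA/um, within limit: {ok}")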
Signal EM is also becoming crucial for high-frequency designs at technology nodes below 40nm. Signal EM can impact the clock network; however, if proper care is taken, such as using NDRs for the clock network, the chances of a signal EM violation can be reduced. Before analyzing EM violations, the designer must check the DC and AC current limits in the tech file and validate them against the DRM.
The designer should also be aware of the operating conditions while generating the current density models for EM analysis. Typically EM is signed off at 110°C, and the tech file and DRM provide the tables for 110°C.
In summary, there is a lot at stake when the power requirement is established and the power is estimated. Hence, it's extremely important to have data that is as accurate as possible early in the process in order to make these decisions. Because there is so much at stake in terms of providing direction for the ASIC development, it's absolutely necessary to make these decisions in the very early stages. It is also advisable to do a complete rail analysis (i.e., static and dynamic IR and EM analysis) well before the tape-out phase; having to touch a lot of metal layers to fix rail violations after timing closure could cause schedule overhead.
Summary
Large SoCs in smaller-geometry technologies increase the design complexity multifold. The traditional waterfall approach to SoC implementation can no longer guarantee a predictable schedule and reliable silicon. Upfront and thorough thinking, in every aspect of SoC development, is needed for today's SoC designs. Thorough planning is required from an early stage, and all functions must work cohesively and in parallel with each other. More reviews and more involvement of teams across the functions can reduce the risk of mistakes. Additionally, with tighter constraints at advanced technology nodes (mainly variation), and with the cost of manufacturing, it is important to place more hooks and checks prior to signoff, for successful single-pass silicon.
More about Bhupesh Dasila
Also see:

Design planning for large SoC implementation at 40nm: Guaranteeing predictable schedule and
first-pass silicon success
Design planning for large SoC implementation at 40nm - Part 2
Kilopass develops embedded multi-time programmable non-volatile memory in 40nm logic CMOS
