Digital Design Implementation

20 October, 2016

Double patterning was introduced at 22/20nm due to lithography limitations. Then came the onslaught of 3D transistor complexity in the form of FinFETs at 16/14nm. Just when things were stabilizing, the transition to 10nm/7nm thrust multi-patterning and new sub-metal-layer physical design constraints upon us. Place and route software undergoes extensive updates with each new node transition, but the changes have typically been confined to the router and DRC checker, to handle new and more complex routing rules. The transition to 10nm, however, has impacted the entire implementation flow and all the key P&R engines, including the placer, optimizer, timer, and clock tree synthesis.

We just published a new whitepaper describing the new constraints and how they affect the physical design tools. You can download it here: Understanding Physical Design Constraints in the 10nm Era.

In summary, the new physical design rules can be classified into two main groups: submetal layer errors and metal/via layer errors. Here are some examples of both kinds of rules and how they can be addressed by the place and route engines.

Submetal layers DRC errors

  • Width, spacing, and area DRC on implant layers
  • Jog rules, typically on Oxide Diffusion (transistor active area) layer
  • Prohibited Drain-Drain abutment

Metal/via layers DRC errors

  • Direct DRC violations, including same mask spacing, between ports or blockages of adjacent lib cells
  • Violations between lib cell ports or blockages and preroutes, such as wires and vias in the power/ ground grid
  • Unroutable cell ports
  • Pin alignment and track color matching

Submetal rules—width, spacing, and area – Standard cells contain just two shapes on submetal layers, dividing the cell vertically in half—one half for the N implanted area, the other for the P. These shapes are usually expressed in LEF files as blockages and are often called implant layers. If such a submetal shape is too small, it is flagged as a DRC violation, as shown in the diagram. These violations can be either avoided or fixed by the placer, either with cell movement or by inserting fillers.
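To make the width/area idea concrete, here is a minimal sketch of how an implant-layer check might be expressed. The rule values and the simple rectangle model are invented for illustration; real PDK rules and shape processing are far richer.

```python
# Sketch of a minimum-width/area check on implant (submetal) shapes.
# Rule values and the rectangle model are illustrative, not from any real PDK.

MIN_WIDTH = 0.05   # um, hypothetical implant min-width rule
MIN_AREA = 0.02    # um^2, hypothetical implant min-area rule

def implant_violations(shapes):
    """shapes: list of (x1, y1, x2, y2) implant rectangles (LEF blockages)."""
    errors = []
    for x1, y1, x2, y2 in shapes:
        w, h = x2 - x1, y2 - y1
        if min(w, h) < MIN_WIDTH:
            errors.append(("min_width", (x1, y1, x2, y2)))
        elif w * h < MIN_AREA:
            errors.append(("min_area", (x1, y1, x2, y2)))
    return errors

# A placer can clear such an error by abutting a filler cell, which merges
# its implant shape with the offender and grows the effective width/area.
```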


Submetal rules—oxide diffusion jogs – Min-jog violations are another submetal rule, applying to the oxide diffusion (OD) layer, as shown in the diagram. A min-jog violation can be fixed by the placement engine, either by inserting a cell that matches the cell in the middle or by inserting a gap that will be filled later.

Metal and via layer rules—pin access and direct DRC with preroutes – Pin access problems are becoming more common due to the narrower standard cell structures (thanks to the 3D devices) and complex power grid structures. Abutted cells can cause pin blockages, but deploying a blanket prohibition of abutment between all cluster members is not the ideal solution. Mentor’s place and route tool takes a statistical/analytical approach and uses soft constraints to improve routability and pin reachability. The placer uses the Global Router pin density map to determine optimal placement of standard cells.

Metal and via layer rules—Pin-to-track color matching  – Pin-to-track color matching is a new placement constraint for 10nm and is a consequence of self-aligned double patterning. The cell ports must be centered on a routing track and the mask and track colors must match.

Routing pitch is not always equal to 2x the site width, so a cell can easily land in locations where its ports will either miss the track or sit on the opposite color. Mentor’s placer can figure out the discrete subset of legal locations for each library cell that guarantees alignment and mask matching between cell ports and routing tracks.
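As a rough illustration of why only a discrete subset of sites is legal, consider a toy model with vertical tracks at a fixed pitch and alternating SADP mask colors. The site width, pitch, pin offset, and two-color scheme below are all invented numbers, not taken from any real technology:

```python
from fractions import Fraction

# Toy model: vertical routing tracks at pitch PITCH with alternating SADP
# masks (color = track_index % 2). A cell placed at site i puts its port at
# i*SITE + PIN_OFFSET; the port is legal only if that x sits exactly on a
# track whose color matches the port's mask. All numbers are illustrative.

SITE = Fraction(27, 100)    # site width, um (hypothetical)
PITCH = Fraction(36, 100)   # routing pitch, um -- note: not 2x site width
PIN_OFFSET = Fraction(9, 100)

def legal_sites(pin_color, n_sites):
    legal = []
    for i in range(n_sites):
        x = i * SITE + PIN_OFFSET
        t, rem = divmod(x, PITCH)   # track index and distance off-track
        if rem == 0 and t % 2 == pin_color:  # on-track and color match
            legal.append(i)
    return legal
```

With these numbers only every fourth site even lands on a track, and the track color then alternates between those sites, which is exactly the kind of discrete legality map the placer must build per library cell.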

As we continue our march towards smaller geometries, new classes of physical constraints affect the digital implementation flow. These rules need to be considered by all the key engines including global placement, optimization, and clock tree synthesis. I have only touched upon some of the new constraints and how they affect the digital implementation flow. You can learn a lot more in the whitepaper.


27 September, 2016

Among the many threats to design schedules and design quality is achieving physical verification signoff on time and with minimal ECO iterations. At advanced nodes (16/14/10/7nm), with the introduction of FinFETs and multi-patterning, designs now have to meet very complex rules for DRC, multi-patterning (MP), and DFM (Figure 1). The number of physical verification errors found during signoff is increasing significantly, which requires multiple, sometimes non-convergent, signoff iterations. Each pass involves time-consuming transfers of huge data files between the signoff analysis and the implementation tools. Multi-patterning conflicts, unlike DRC violations, are global in nature and can impact several objects that span significant distances. Fixing multi-patterning problems in a local scope can contribute to global coloring cycles, and it may be too late to leave multi-cycle fixing to the last step of the routing flow.
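The global nature of multi-patterning conflicts comes from the underlying coloring problem: shapes closer than the same-mask spacing limit must land on different masks, which is graph 2-coloring, and 2-coloring fails exactly when the conflict graph contains an odd cycle. The following is a minimal sketch of that check on a hypothetical conflict graph, not a description of any production algorithm:

```python
from collections import deque

# Double patterning assigns each shape to one of two masks; shapes too close
# for the same mask must differ. That is 2-coloring of a "conflict graph",
# which fails exactly when the graph has an odd cycle -- a global property,
# unlike a point-to-point DRC violation. Example graph is invented.

def two_color(conflicts, n):
    """conflicts: list of (a, b) shape pairs too close for the same mask."""
    adj = [[] for _ in range(n)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)
    color = [-1] * n
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: unresolvable coloring conflict
    return color

# Three mutually-conflicting shapes form an odd (3-)cycle: no legal assignment
# exists, and fixing it means moving geometry, possibly far from the cycle.
```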

Designers used to be able to complete physical design before starting physical verification signoff analysis, but that approach no longer works due to the inherent complexities of advanced-node designs. The optimal way to solve these growing manufacturing closure problems is to address them starting in physical design, well before sign-off verification. This moves manufacturing closure actions early enough in the design process to allow them to be effective, to avoid late-stage surprises, and to avoid non-convergent iterations.

Mentor’s Calibre InRoute is an interactive design and manufacturing closure platform that enables all the Calibre sign-off capabilities from within the place and route environment. It is built on the Mentor Place and Route System (Nitro-SoC & Olympus-SoC) and Calibre, the industry standard for manufacturing sign-off. Some of the features of Calibre InRoute include:

  • DRC, DFM, and multi-patterning analysis and fixing during design, with direct access to signoff engines, ensuring that manufacturability issues are resolved without introducing new ones or degrading design performance
  • Automatic fixing and incremental verification of violations at either the block or full-chip level
  • Routing technology with native coloring, verification, and smart conflict resolution engines to handle both local and global multi-patterning violations
  • Concurrent optimization of the layout for timing, power, signal integrity (SI), and manufacturing issues

Calibre InRoute is based on an open router architecture that allows the place and route tools (Nitro-SoC and Olympus-SoC) to natively invoke the Calibre sign-off engines in the inner loop of the router without any file transfers. It has API-level access to the Calibre engines, performs true signoff analysis, and then uses the router to automatically fix any violations without introducing new ones or degrading design performance. All violations found with Calibre InRoute are persistent in the place and route database and can be viewed and edited through the error browser. Calibre InRoute includes the full suite of Calibre capabilities.

Shifting physical signoff into place and route minimizes the growing gap between design and signoff environments to improve design schedules and design quality and speed time-to-market. With Calibre InRoute, manufacturing closure that takes weeks or months can be reduced to days.

White paper:


6 September, 2016

No self-respecting engineer reads user manuals, even if that VCR blinks 12:00 its entire life. VCR days were simpler times – try deciphering the acronyms on the dashboard of any modern car without the aid of a manual. When there is a vast array of knobs, switches, and buttons labelled LKAS, TPMS, SH-AWD, LDWS, ACC, IDS, etc., along with a capacitive touch screen display stacked in three different levels, it makes the simple task of tuning your radio an absolute delight. Apparently, most modern cars have enough compute power and technology to put the Apollo guidance system to shame.


After decades of stagnation, the auto industry renaissance has begun – the end goal is to deliver completely autonomous, fully networked transport pods, electric or otherwise. In the interim, most auto manufacturers are offering solutions for semi-autonomous cars (Tesla, BMW, Infiniti, to name a few) with driving aid systems that carry these fancy acronyms. As a result, the number of IC components in modern cars has grown significantly to support this endeavor – check the different technologies packed into a modern car for comfort, safety, and reliability.

The automotive segment is the fastest growing (10%) for ICs due to the myriad of technologies essential to make self-driving cars a reality. A slew of sensors, cameras, radars, and ECUs, combined with advanced software algorithms and high compute power, is essential to make critical decisions in these semi-autonomous vehicles.

Auto ICs are subject to harsh conditions, with extreme temperatures, vibration stress, dust, and corrosive materials. The typical design challenges for auto ICs have to do with reliability and safety. But what about power consumption, in light of this explosive growth of semiconductor components? This plethora of new technologies enabled by hundreds of ECUs and MCUs comes at a cost. The additional weight and the cooling systems required to dissipate heat have an impact on fuel efficiency for internal combustion engines, and on battery range for electric vehicles. The impact will be especially noticeable on electric cars, which already face range issues with limited onboard power. It’s just a matter of time before the auto manufacturers tighten the noose on the power budgets for these ICs. It is much easier to put the onus on the IC design community than to improve the efficiency of the internal combustion engine or invent new battery technologies. After all, the semiconductor industry is the one that is keeping pace with Moore’s Law.


7 December, 2012

Do FDSOI and FinFET technologies provide better performance and better power than bulk? Will FDSOI at 20nm bridge the 16nm FinFET gap? Does FinFET offer better cost benefits than FDSOI? Does ShamWow actually hold 20 times its weight in liquid? While the last claim is questionable, the jury is still out on the FDSOI vs. FinFET war. Proponents of both technologies claim significant power and performance benefits, but there is no clear winner yet, as these are relatively new technologies that are still maturing, and it’s too early to call.

Fully Depleted Silicon on Insulator (FDSOI) technology relies on a thin layer of silicon over a Buried Oxide (BOx). Transistors are built into the thin silicon layer, which is fully depleted of charges and hence provides some unique advantages over bulk.

Cross section of a FDSOI transistor

FDSOI technology claims better power and better performance than its bulk counterpart. Since the body is fully depleted, the random dopant fluctuation that plagues bulk CMOS is reduced, which helps improve performance even at lower VDD. Power/performance claims of 30% to 40% are not uncommon, and FDSOI is already in production at 28nm, positioned as an alternate option to bulk 20nm. Even if FDSOI at 28nm delivers half the power savings of bulk 20nm, I would take it any day rather than deal with the beast that is double patterning. I digress. One of the other untold benefits from a P&R perspective is that FDSOI technology can use conventional design flows and is completely transparent to the tools.

FinFET is another newfangled technology using 3D transistors that promises the sun and the moon in terms of power, performance, and area. FinFET devices have their channels turned on their edge, with the gate wrapping around them. The term “fin” was coined by professors at Berkeley to describe the thin silicon conducting channel. This unique configuration provides a gate that wraps around the channel on three sides, thereby delivering much better channel control and better resistance to dopant fluctuations. Due to the innovative 3D structure and tighter channel control, this technology delivers improved area, better performance, and lower power than bulk. P&R flows are expected to see minimal impact from FinFET devices. FinFET technology is in production at 22nm and is quickly ramping up for the next generations.

FinFET with wrapped gate

Whether these two technologies will continue their battle to dominate, or collaborate and co-exist successfully as we continue the march towards single-digit nanometer devices, remains to be seen. One thing is for certain – both technologies will give designers a significant boost in terms of power reduction and performance. As always, production volumes will determine the eventual winner. Just look at the ShamWow sales numbers if you don’t trust me.


20 August, 2010

For those of you who were wondering if I had fallen off the face of the planet, the answer is no. My mind was stuck in a limbo when I got hurt in an extraction (not the parasitic kind) mission. Confused? Read on…

I finally got to watch the critically acclaimed sci-fi movie Inception last weekend, and life has never been the same again. Without giving away too much detail for the benefit of those who have not watched it yet, the main plot involves dreams within dreams within dreams – three levels to be precise, to incept an idea into someone’s subconscious mind. Are you still with me? Never mind; the first thing that came to my mind when I was exposed to the concept of dreams within dreams was nested domains in multi-voltage designs. Blame the nerd gene for triggering this reaction, but the truth remains.

One thought led to another, and before long I was dreaming about nested multi-voltage domains with donut-shaped domains – which happens to be reality. The donut-shaped nested domain is one of the new emerging flavors of nested multi-voltage design, and it brings a new set of requirements and challenges for the MV flow. Some of the key considerations for donut-shaped nested domains are:

  • Number of levels of nested hierarchy
  • Defining donut domains in the UPF
  • Hierarchy and netlist management for the top level and the donut domains
  • Placement of cells based on connectivity in the donut hole and the top level
  • Handling of level shifters based on connectivity (need to be placed in the donut hole or the top level)
  • Handling of isolation cells if the donut domain has a switching supply
  • Power routing to the donut hole if the donut domain has a switching supply
  • Power supply routing to the donut domain if the top level has a switching supply
  • Handling power switches if either the donut or the top level has a switching supply
  • Building a balanced clock tree for the donut domain
  • Signal Routing within the donut domain boundary and meeting timing requirements
  • Always-on buffer handling for the donut hole or the top level
  • Ensuring power integrity for all the domains, etc.

Nested Donut Domains in Multi-Voltage Designs


If there are more than two levels of nesting with donut shapes, this list will get even longer and much more complex. Why exactly a designer would need a donut domain is beyond me, but whoever planted the idea is playing a cruel practical joke. Now, if you will excuse me, I need to go spin my top.


4 February, 2010

Step 0 Commitment – Are you really sure you want to MV? Are you positive that multi-Vt and clock gating would not help with your power budgets? Proceed to step 1 with caution, and only if you really must.

Step 1 Architecture Selection – Ensure that the architecture is frozen, and capture all the power constraints required for the chosen MV style in the UPF file. As most of you are aware, this can also be done using the other power format, but we will stick to UPF as it simplifies interoperability.

Step 2 RTL Synthesis – Using the UPF file, complete RTL synthesis and derive the gate-level netlist. Ensure that the simulation and verification runs are complete and validated.

Step 3 Data Import – Import LEF, lib, SDC, Verilog, and DEF. Properties that are relevant to the multi-voltage design flow are:

  • Special cells in the library (always_on, is_isolation_cell, is_isolation_enable, is_level_shifter)
  • Corners & Modes – Define appropriate modes and corners for the different domains. Ensure that the worst-case timing and power corners are set up correctly to concurrently optimize for power and timing

Step 4 Power Domain Setup – Read the power domain definition by sourcing or loading the golden UPF file (the same one used for RTL synthesis). After reading the UPF file, the following items will be defined:

  • Domains with default power and ground nets
  • Power state table to define all possible power state combinations
  • Level shifter and isolation rules for the different voltage domains
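As a sketch of what such a UPF file might contain, here is a hypothetical two-domain example covering the three items above: an always-on top domain and a switchable core domain. All domain, supply, strategy, and signal names, and the voltages, are invented for illustration; a real UPF file would follow the project's naming and the full IEEE 1801 semantics.

```tcl
# Hypothetical two-domain UPF sketch; names and voltages are invented.
create_power_domain PD_TOP
create_power_domain PD_CORE -elements {u_core}

create_supply_port VDD
create_supply_port VDD_SW
create_supply_port VSS

create_supply_net VDD    -domain PD_TOP
create_supply_net VDD_SW -domain PD_CORE
create_supply_net VSS    -domain PD_TOP

connect_supply_net VDD    -ports VDD
connect_supply_net VDD_SW -ports VDD_SW
connect_supply_net VSS    -ports VSS

set_domain_supply_net PD_TOP  -primary_power_net VDD    -primary_ground_net VSS
set_domain_supply_net PD_CORE -primary_power_net VDD_SW -primary_ground_net VSS

# Power state table: PD_CORE can be shut off while PD_TOP stays up
add_port_state VDD    -state {ON 0.9}
add_port_state VDD_SW -state {ON 0.9} -state {OFF off}
create_pst chip_pst -supplies {VDD VDD_SW}
add_pst_state run   -pst chip_pst -state {ON ON}
add_pst_state sleep -pst chip_pst -state {ON OFF}

# Boundary strategies for the switchable domain
set_isolation iso_core -domain PD_CORE -isolation_power_net VDD \
    -clamp_value 0 -applies_to outputs
set_isolation_control iso_core -domain PD_CORE -isolation_signal iso_en \
    -isolation_sense high -location parent
set_level_shifter ls_core -domain PD_CORE -applies_to both -location parent
```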

Step 5 Floorplanning – Create physical domains and the corresponding power structures for each individual supply net defined in the UPF. Define domain-specific hierarchy mapping and library association based on the architecture. Insert power switches for domains that are shut down (either VDD- or VSS-gated).

Step 6 Power Domain Verification – Perform design checks for general design and UPF setup, verification of level shifters and isolation cells, and analysis of always-on connections. The intent here is to help you find any missing UPF or power domain setup data that could lead to potential misery.

Step 7 Pre-CTS Opt – During the pre-CTS flow, ensure that no port punching occurs on power domain interfaces. The optimization engine should use the power state table (PST) when buffering nets in a multi-voltage design to automatically choose always-on buffers or otherwise. Nothing much you can do here, since you are at the mercy of the tool.

Step 8 CTS – During CTS, ensure that no port punching occurs on power domain interfaces. Like the optimizer, the CTS engine should also use the PST-based buffering solution to determine the type of buffers to use while expanding the clock tree network. Some clock tree synthesis flows require special clock gate classes to be recognized in order to restrict sizing operations during CTS to equivalent class types. Have you been nice to your R&D lately?

Step 9 Routing – Ensure that the routing engine honors the domain boundaries and contains the routes within them. Secondary power pin connections for special cells such as always-on buffers and level shifters should also be handled using special properties set on the power pins. Many design flows also require double vias and non-default-width wires for routing the secondary power connections. Top-level nets that span across domains can be handled using gas stations to help optimize timing and area. Hail Mary…

Step 10 Hope and Pray – This step is optional. If your chip is DOA, start from step 0 and repeat until you either have a working part or are unemployed.



14 December, 2009

Clock designers are an enigma. Clock designers in general are die-hard Star Wars fans, own vintage Porsches that leak oil by the gallon, usually have lava lamps in their offices/cubicles, wear fancy leather jackets in peak summer, and have like-minded clock designers as best lunch buddies. Clock designers are notorious for making other, lesser designers cry with their fancy PLL SPICE runs and non-negotiable skew numbers, and for being resource hogs, especially of the higher metal layers. Clock designers live and breathe picoseconds and (more recently) watts, while the lesser mortals are perfectly happy to go for beers after two optimization runs. I have never been a clock designer myself, but I have worked with clocks and clock designers for the longest time in my career, first as a design engineer poring over timing reports and then as an application engineer supporting a sign-off timing tool. Building a good, well-balanced clock tree and effectively managing clock skew has been a challenge since the first transistor was invented, and it still is today, especially at 28 & 22nm. The only difference is that now power is in the mix along with timing, which complicates things even more. At smaller technology nodes the clock network is responsible for more than half the power consumed on any chip, and the majority of it is dynamic power due to the toggling clock.

As we are all aware, clocks are a significant source of dynamic power usage, and clock tree synthesis (CTS) and optimization is a great place to achieve power savings in the physical design flow. The traditional low-power CTS strategies include lowering overall capacitance (specifically leaf caps), minimizing switching activity, and minimizing area and buffer count in the clock tree.

While the traditional techniques help optimize clock tree power to a certain extent, Multi-Corner Multi-Mode (MCMM) CTS is an absolute must for achieving optimal QoR for both timing and power. One of the biggest challenges of design variation is clock tree synthesis. At smaller nodes, the large variation in resistance seen across process corners poses the additional challenge of balancing clock skew across multiple corners. With the proliferation of mobile devices, clock trees have become extremely complex circuits, with different clock tracing per circuit mode of operation. Furthermore, building robust clock trees that can withstand process variation is a huge challenge for design teams.

Getting the best power reduction from CTS depends on the ability to synthesize the clocks for multiple corners and modes concurrently, in the presence of design and manufacturing variability. Multi-corner CTS can measure early and late clock network delays over all process corners concurrently, with both global and local variation accounted for. A multi-corner dynamic tradeoff between buffering a wire and assigning it to less resistive layers is essential in order to achieve the best delay, area, and power tradeoff. In comparison to the single-mode/single-corner (1M1C) flow, the MCMM CTS solution provides a significant reduction in area, buffer count, skew, TNS, and WNS, in addition to lower dynamic power.
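A toy numerical example of why skew has to be measured per corner: a tree that looks well balanced at one corner can show much larger skew at another, because sink latencies scale differently with corner RC. The per-sink insertion delays below are invented for illustration:

```python
# Invented insertion delays (ns) per clock sink, per process corner,
# to illustrate per-corner skew measurement. Not from any real design.

latencies = {
    "slow":    {"ff_a": 1.20, "ff_b": 1.22, "ff_c": 1.35},
    "fast":    {"ff_a": 0.70, "ff_b": 0.80, "ff_c": 0.72},
    "typical": {"ff_a": 0.95, "ff_b": 0.99, "ff_c": 1.00},
}

def skew_per_corner(lat):
    """Skew at each corner = max sink latency - min sink latency."""
    return {c: max(d.values()) - min(d.values()) for c, d in lat.items()}

def worst_skew(lat):
    """The number a multi-corner CTS flow actually has to control."""
    return max(skew_per_corner(lat).values())
```

In this example the typical-corner skew is 50 ps, but the slow corner shows 150 ps; a flow that balances only the typical corner never sees the violation that matters.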

Now, before I forget, let me state what I wanted to in the first place – I have to confess that I can now relate to this unique breed of clock designers and have the utmost respect for them for solving some of the most difficult chip design challenges thrown at them. A whole new generation is evolving, with much cooler iPhones and Gore-Tex jackets.


30 September, 2009

Resistance is futile. I recently caved and switched to an iPhone after having been a loyal Google phone user for more than a year. Apart from the coolness factor, my main motivation was the corporate mail support that was absent in the Gphone, plus the fact that I got the iPhone for free when my wife upgraded hers. The difference between the two phones is day and night – the iPhone UI is much friendlier, menu options are simple and logical, and the device is much faster for certain applications like browsing, data download, and video capture. Most modern smart phones/PDAs increasingly employ the multi-voltage technique, specifically Dynamic Voltage and Frequency Scaling (DVFS), to reduce power without sacrificing performance. The iPhone designers, unlike the Gphone’s, have done a good job of creating this balance between the different applications running on the device.

Regardless of the phone type, multi-voltage designs, unlike vanilla designs, are difficult to implement because of the inherent complexity and the need to handle special cells such as level shifters and isolation cells. In addition, these design styles cause the number of modes and corners to increase significantly when min/max voltage combinations from all the power domains are considered. Because each voltage supply and operational mode implies different timing and power constraints on the design, multi-voltage methodologies cause the number of design corners to increase exponentially with the addition of each domain or voltage island. DVFS further complicates matters, with varying frequency and clock combinations leading to even more design modes and corners. Additionally, the worst-case power corners don’t necessarily correspond to the worst-case timing, so it’s critical to know how to pick a set of corners that will result in true optimization across all design objectives without excessive design margins.

So, what’s the story, you might ask? It’s pretty simple (or not) – in order to effectively close these MV designs across all modes and corners, timing and power must be analyzed and optimized simultaneously for different combinations of library models, voltages, and interconnect (RC) corners. In essence, true and concurrent Multi-Corner Multi-Mode analysis and optimization is a prerequisite for any multi-voltage design. Anything less would not guarantee convergence, because optimization in one scenario could create a new violation in a different scenario, lead to multiple iterations, create unpredictable ECO loops, result in poor QoR, and possibly reduce yield. In other words, low power designs, specifically MV designs, inherently require true MCMM optimization for both power and timing.
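A quick back-of-the-envelope sketch of the corner explosion described above: each power domain contributes its own set of voltage states, DVFS multiplies in the operating points, and the RC corners multiply again. The domain names, voltages, and corner names below are invented:

```python
from itertools import product

# Invented example: corner count for an MV design grows as the product of
# per-domain voltage states times the number of RC extraction corners.

domain_voltages = {
    "cpu": [0.7, 0.9, 1.1],   # DVFS domain: three operating points
    "gpu": [0.8, 1.0],
    "io":  [1.8],
}
rc_corners = ["cworst", "cbest", "typ"]

def mv_corners(domains, rcs):
    """Enumerate every (voltage combination, RC corner) analysis scenario."""
    names = sorted(domains)
    return [dict(zip(names, volts), rc=rc)
            for volts in product(*(domains[n] for n in names))
            for rc in rcs]

corners = mv_corners(domain_voltages, rc_corners)
# 3 x 2 x 1 voltage combinations, times 3 RC corners = 18 scenarios
```

Even this small three-domain example yields 18 scenarios; adding one more DVFS domain or mode multiplies the count again, which is why picking a dominant subset of corners matters so much.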

Now, will I go back to the Gphone? If and only if they support corporate mail and also improve the performance. It wouldn’t hurt to jack up the coolness factor either. Till then I will remain an iPhone user (loyal or not is debatable). The only problem with the iPhone is, quoting Seinfeld, “if you are mad at someone you cannot slam the iPhone, but instead you will have to slide the phone off”.






11 August, 2009

A recent issue of The New York Times Sunday magazine ran a very interesting article on data centers. The author, Tom Vanderbilt, describes the massive infrastructure that powers the internet and so much of our daily lives. The article says that data centers use more energy annually than Sweden. This includes the wattage to run the servers, to cool them, and the leakage power from the estimated 30% of servers that are in standby. According to the 2007 EPA study, data centers consumed a staggering 60 billion kWh in 2006 and are projected to reach 100 billion kWh by 2011, at a cost of ~8 billion dollars. The biggest challenges facing the growth of these server farms today are the availability of power and the cost associated with cooling, apart from the fact that they are not eligible for a government bailout.

Many ingenious design ideas are being considered for the next generation of data centers to effectively manage the heat generated by thousands or millions of servers – a roofless building with servers packed in shipping containers, an underground bunker, server farms in the Arctic and Antarctic, and Google’s “water-based data center”. Even if they manage to build these efficient server farms, delivering power to them would be a monumental task.

With the emerging middle class in this global economy and their quest for knowledge it is inevitable that more and more kilowatts will be needed to run these ever expanding data centers. The short term solution for this problem is to build the data centers close to the power generating stations as evidenced by the Google farm close to the Columbia River dam. The more effective long term solution is to design more power efficient chips that also provide the high performance needed for these compute intensive servers. I believe that the fundamental solution still lies in the very chips that power these servers. Microprocessor design is already adapting to this need by packing more cores on a single die without increasing the clock frequency. A lot more can be done to further minimize power using advanced power reduction techniques for both processor and peripheral designs. The key is to employ power optimization at all stages of the design flow starting at the system level, board level, architectural level and finally at the component level.

Stories like this in major news outlets remind us that low power design is not a techie issue, but rather an absolute necessity for the growth of our economy and culture. Regardless of whether you use Bing or Google, one thing is certain – data centers powered by low power chips will directly help minimize greenhouse gases and pave the way for the next generation of green server farms. I personally prefer Bing.

29 May, 2009

After some cautious and tentative moments, I finally managed to get my first post out. In this debut blog post, I’d like to introduce myself, present my bona fides and give you some idea about the likely content you’ll see here on a regular basis.

I’m Arvind Narayanan, Product Marketing Manager in the Place and Route Division at Mentor. I started my career as a microprocessor design engineer at Hal Computer Systems (no, the name was not derived from the movie) when 0.3u was state of the art and 100 MHz was considered blazingly fast. I have been in the semiconductor industry for about 14 years in different capacities, ranging from processor design engineer to application engineer, and currently in product marketing. After my design tenure at Hal, I worked at Synopsys focusing on STA, and then at Magma Design focusing on low power design implementation and analysis. I earned my Masters degree in Electrical and Computer Engineering from Mississippi State University, and my MBA from Duke University (and I’m an ardent Blue Devil fan!).

In the last five years I have seen the “low power” buzz gather enough momentum that it is now part of the design engineer vernacular alongside timing – as in “Dude, did you close the design, and are we cool?” I have been part of key product launches primarily targeted at the power-savvy engineering community to design greener chips. I have also been very involved in the development of the Unified Power Format (UPF), now IEEE P1801, right from its inception, to give designers the long-awaited power constraints file – take that, SDC. My position here affords me a comprehensive view of the low-power trends and challenges faced by a variety of in-the-trenches designers. I will speak to the usefulness of different technologies and methodologies, talk about new ideas that arise in the technical press or at conferences, and describe how specific challenges in low power design have been solved by our customers.

Please add this blog to your RSS feed, and leave comments and questions for me. I hope for this to be a fruitful and engaging encounter for everyone. A good place for me to start is by pointing you to the newly remodeled Mentor website with a dedicated low-power solution section. I won’t reiterate the website content in my blog posts, but the methodologies you’ll find on the Low Power Solution site reflect my point of view. Check it out when you have some time to spare:

 Next blog, I’ll ruminate on the relative importance of ESL, implementation, and verification to overall power reduction. Till then I remain…