Power Play
Do FDSOI and FinFET technologies provide better performance and better power than bulk? Will FDSOI at 20nm bridge the 16nm FinFET gap? Does FinFET offer better cost benefits than FDSOI? Does ShamWow actually hold 20 times its weight in liquid? While the last claim is questionable, the jury is still out on the FDSOI vs. FinFET war. Proponents of both technologies claim significant power and performance benefits, but there is no clear winner yet: these are relatively new technologies that are still maturing, and it’s too early to call.
FDSOI, or Fully Depleted Silicon on Insulator, technology relies on a thin layer of silicon over a Buried Oxide (BOx). Transistors are built into this thin silicon layer, which is fully depleted of charges and hence provides some unique advantages over bulk.
FDSOI technology claims better power and better performance than its bulk counterpart. Since the body is fully depleted, the random dopant fluctuation that plagues bulk CMOS is reduced, which helps improve performance even at lower VDD. Power/performance claims of 30% to 40% are not uncommon. FDSOI is already in production at 28nm and is positioned as an alternative to bulk 20nm. Even if FDSOI at 28nm delivers half the power savings of bulk 20nm, I would take it any day rather than dealing with the beast that is Double Patterning. I digress. One of the other untold benefits from a P&R perspective is that FDSOI technology can use the conventional design flows and is completely transparent to the tools.
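To see why "performance even at lower VDD" matters so much for power, here is a back-of-the-envelope sketch using the textbook switching-power relation P = alpha * C * V^2 * f. The function names and the voltages are my own illustrative assumptions, not measured FDSOI or bulk data.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Textbook switching-power estimate: P = alpha * C * V^2 * f."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


def relative_saving(vdd_nominal, vdd_reduced):
    """Fraction of dynamic power saved by lowering VDD with alpha, C and f held fixed."""
    return 1.0 - (vdd_reduced / vdd_nominal) ** 2


# Hypothetical supply voltages, chosen only for illustration.
saving = relative_saving(vdd_nominal=1.0, vdd_reduced=0.9)
print(f"Dynamic power saved by a 10% VDD drop: {saving:.0%}")
```

Because the voltage term is squared, a technology that holds its frequency at a lower supply gets a disproportionate dynamic power benefit. Leakage is a separate story and is not modeled here.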
FinFET is another newfangled technology using 3D transistors that promises the sun and the moon in terms of power, performance and area. FinFET devices have their channels turned on their edge with the gate wrapping around them. The term “fin” was coined by professors at Berkeley to describe the thin silicon conducting channel. This unique configuration provides a gate that is wrapped around the channel on three sides, thereby delivering much better channel control and better resistance to dopant fluctuations. Due to the innovative 3D structure and tighter channel control, this technology delivers improved area, better performance and lower power than bulk. FinFET devices are expected to have minimal impact on P&R flows. FinFET technology is in production at 22nm and is quickly ramping up for the next generations.
Whether these two technologies will continue their battle to dominate or collaborate and co-exist successfully as we continue the march towards single-digit nanometer devices remains to be seen. One thing is for certain – both these technologies will give a significant boost to designers in terms of power reduction and performance. As always, production volumes will determine the eventual winner. Just look at ShamWow sales numbers if you don’t trust me.
For those of you who were wondering if I had fallen off the face of the planet, the answer is no. My mind was stuck in a limbo when I got hurt in an extraction (not the parasitic kind) mission. Confused? Read on…
I finally got to watch the critically acclaimed sci-fi movie Inception last weekend and life has never been the same again. Without giving away too much detail for the benefit of those who have not watched it yet, the main plot involves dreams within dreams within dreams – three levels, to be precise – to incept an idea into someone’s subconscious mind. Are you still with me? Never mind, the first thing that came to my mind when I was exposed to the concept of dreams within dreams was – nested domains in Multi-Voltage designs. Blame the nerd gene for triggering this reaction but the truth remains.
One thought led to another and before long I was dreaming about nested Multi-Voltage domains with donut shaped domains, which happen to be reality. The donut shaped nested domain is one of the new emerging flavors of nested Multi-Voltage design, and it brings a new set of requirements and challenges for the MV flow. Some of the key considerations for donut shaped nested domains are:
- Number of levels of nested hierarchy
- Defining donut domains in the UPF
- Hierarchy and netlist management for the top level and the donut domains
- Placement of cells based on connectivity in the donut hole and the top level
- Handling of level shifters based on connectivity (need to be placed in the donut hole or the top level)
- Handling of isolation cells if the donut domain has a switching supply
- Power routing to the donut hole if the donut domain has a switching supply
- Power supply routing to the donut domain if the top level has a switching supply
- Handling power switches if either the donut or the top level has a switching supply
- Building a balanced clock tree for the donut domain
- Signal Routing within the donut domain boundary and meeting timing requirements
- Always-on buffer handling for the donut hole or the top level
- Ensuring power integrity for all the domains, etc.
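To make the donut geometry behind the list above concrete, here is a minimal sketch that classifies a placement location as belonging to the donut hole, the donut domain itself, or the top level. The class names, the rectangular shapes and the coordinates are all hypothetical; a real floorplan would use the P&R tool's own domain shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x, y):
        # Boundary points count as inside.
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass(frozen=True)
class DonutDomain:
    outer: Rect  # outline of the donut domain
    hole: Rect   # region cut out of the middle of the donut

    def region_of(self, x, y):
        """Return 'hole', 'donut' or 'top' for a placement location."""
        if self.hole.contains(x, y):
            return "hole"
        if self.outer.contains(x, y):
            return "donut"
        return "top"


# Hypothetical floorplan: a 10x10 donut with a 2x2 hole in the middle.
donut = DonutDomain(outer=Rect(0, 0, 10, 10), hole=Rect(4, 4, 6, 6))
for location in [(5, 5), (1, 1), (20, 20)]:
    print(location, "->", donut.region_of(*location))
```

Even this toy version hints at the trouble: anything going from the hole to the top level has to cross the donut, which is exactly where the routing, buffering and power delivery items in the list come from.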
If there are more than two levels of nesting with donut shapes this list will get even longer and much more complex. Why exactly a designer would need a donut domain is beyond me but whoever planted the idea is playing a cruel practical joke. Now, if you will excuse me I need to go and spin my top.
Step 0 Commitment – Are you really sure you want to MV? Are you positive that Multi-Vt & clock gating would not help with your power budgets? Proceed to Step 1 with caution only if you really must.
Step 1 Architecture Selection – Ensure that the architecture is frozen and capture all the power constraints required for the chosen MV style in the UPF file. As most of you are aware, this can also be done using the other power format, but we will stick to UPF as it simplifies interoperability.
Step 2 RTL Synthesis – Using the UPF file, complete RTL synthesis and derive the gate level netlist. Ensure that the simulation & verification runs are complete and validated.
Step 3 Data Import – Import LEF, lib, SDC, Verilog, and DEF. Properties that are relevant to the multi-voltage design flow are:
- Special cells in Library (always_on, is_isolation_cell, is_isolation_enable, is_level_shifter)
- Corners & Modes – Define appropriate modes and corners for the different domains. Ensure that the worst case timing and power corners are set up correctly to concurrently optimize for power & timing
Step 4 Power Domain setup – Read the power domain definition by sourcing or loading the golden UPF file (same that was used for RTL synthesis). After reading the UPF file, the following items will be defined:
- Domains with default power and ground nets
- Power state table to define all possible power state combinations
- Level shifter and isolation rules for the different voltage domains
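To illustrate how a power state table relates to the level shifter and isolation rules, here is a rough sketch that derives, for every ordered pair of domains, whether a crossing could need isolation or level shifting. The table contents, domain names and function names are hypothetical, and this is my simplification rather than how UPF or any particular tool evaluates the rules.

```python
from itertools import permutations

# Hypothetical power state table: each state maps a domain to its supply
# voltage in volts, with 0.0 meaning the domain is shut down.
PST = {
    "run":   {"TOP": 1.0, "CPU": 1.0, "MEM": 0.8},
    "sleep": {"TOP": 1.0, "CPU": 0.0, "MEM": 0.8},
}


def needs_isolation(pst, src, dst):
    """True if some state has the driving domain off while the receiver is on."""
    return any(s[src] == 0.0 and s[dst] > 0.0 for s in pst.values())


def needs_level_shifter(pst, src, dst):
    """True if some state has both domains on at different voltages."""
    return any(
        s[src] > 0.0 and s[dst] > 0.0 and s[src] != s[dst]
        for s in pst.values()
    )


def crossing_rules(pst):
    """Requirements for every ordered (driver, receiver) domain pair."""
    domains = sorted(next(iter(pst.values())))
    return {
        (src, dst): {
            "isolation": needs_isolation(pst, src, dst),
            "level_shifter": needs_level_shifter(pst, src, dst),
        }
        for src, dst in permutations(domains, 2)
    }


for pair, rule in crossing_rules(PST).items():
    print(pair, rule)
```

In this made-up table, signals leaving CPU need isolation because CPU can be off while its receivers are on, and anything crossing between MEM and the 1.0 V domains needs a level shifter.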
Step 5 Floorplanning – Create physical domains and the corresponding power structures for each individual supply net defined in the UPF. Define domain-specific hierarchy mapping and library association based on the architecture. Insert power switches for domains that are shut down (either VDD or VSS gated).
Step 6 Power Domain Verification – Perform design checks for general design and UPF setup, verification of level shifters and isolation cells, and analysis of always-on connections. The intent here is to help you find any missing UPF or power domain setup data that could lead to potential misery.
Step 7 Pre-CTS Opt – During the Pre-CTS flow ensure that no port punching occurs on power domain interfaces. The optimization engine should use the power state table (PST) when buffering nets in a multi-voltage design to automatically choose between always-on buffers and regular buffers. There is not much you can do here since you are at the mercy of the tool.
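The PST-based buffering decision can be pictured with a small sketch: if a signal can still be live while the domain it is buffered in is shut down, a regular buffer would go dead and an always-on buffer is needed. The table, the domain names and the `buffer_type` helper are hypothetical; real optimizers weigh many more factors than this.

```python
# Hypothetical power state table: supply voltage per domain in each state,
# with 0.0 meaning the domain is shut down.
PST = {
    "run":   {"TOP": 1.0, "CPU": 1.0, "MEM": 0.8},
    "sleep": {"TOP": 1.0, "CPU": 0.0, "MEM": 0.8},
}


def buffer_type(pst, driver, load, location):
    """Pick a buffer flavor for a net buffered inside the `location` domain.

    An always-on buffer is chosen when some state keeps both the driver and
    the load powered while the domain hosting the buffer is off.
    """
    for state in pst.values():
        signal_live = state[driver] > 0.0 and state[load] > 0.0
        if signal_live and state[location] == 0.0:
            return "always_on_buffer"
    return "regular_buffer"


# A TOP-to-MEM net routed through the switchable CPU domain.
print(buffer_type(PST, driver="TOP", load="MEM", location="CPU"))
# A TOP-to-CPU net buffered inside CPU itself.
print(buffer_type(PST, driver="TOP", load="CPU", location="CPU"))
```

The first net needs an always-on buffer because TOP and MEM keep talking while CPU sleeps. The second does not, since its load goes down together with the buffer.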
Step 8 CTS – During CTS ensure that no port punching occurs on the power domain interfaces. Like the optimizer, the CTS engine should also use the PST-based buffering solution to determine the type of buffers to use while expanding the clock tree network. Some clock tree synthesis flows require special clock gate classes to be recognized in order to restrict sizing operations during CTS to equivalent class types. Have you been nice to your R&D lately?
Step 9 Routing – Ensure that the routing engine honors the domain boundaries and contains the routes within them. Secondary power pin connections for special cells such as always-on buffers and level shifters should also be handled using special properties set on the power pins. Many design flows also require double vias and non-default width wires for routing of the secondary power connections. Top level nets that span across domains can be handled using gas stations to help optimize timing and area. Hail Mary…
Step 10 Hope and Pray – This step is optional. If your chip is DOA, start from Step 0 and repeat until you either have a working part or are unemployed.
Clock designers are an enigma. Clock designers in general are die-hard Star Wars fans, own vintage Porsches that leak oil by the gallon, usually have lava lamps in their offices/cubicles, wear fancy leather jackets in peak summer and have like-minded clock designers as best lunch buddies. Clock designers are notorious for making other lesser designers cry with their fancy PLL SPICE runs and non-negotiable skew numbers, and for being resource hogs, especially of higher layer metals. Clock designers live and breathe picoseconds & watts (more recently) while the lesser mortals are perfectly happy to go for beers after two optimization runs.
I have never been a clock designer myself, but I have worked with clocks & clock designers for most of my career, first as a design engineer poring over timing reports and then as an application engineer supporting a sign-off timing tool. Building a good, well-balanced clock tree and effectively managing clock skew has been a challenge since the first transistor was invented and it still is today, especially at 28 & 22nm – the only difference is that now power is in the mix along with timing, which complicates things even more. At smaller technology nodes the clock network can be responsible for more than half the power consumed on a chip, and the majority of it is dynamic power due to the toggling clock.
As we are all aware, clocks are a significant source of dynamic power usage, and clock tree synthesis (CTS) and optimization is a great place to achieve power savings in the physical design flow. The traditional low-power CTS strategies include lowering overall capacitance (specifically leaf caps), minimizing switching activity, and minimizing area and buffer count in the clock tree.
While the traditional techniques help optimize clock tree power to a certain extent, Multi-Corner Multi-Mode (MCMM) CTS is an absolute must for achieving optimal QoR for both timing and power. Clock tree synthesis is one of the places where design variation hurts the most. At smaller nodes, large variations in resistance across process corners pose the additional challenge of balancing clock skew across multiple corners. With the proliferation of mobile devices, clock trees have become extremely complex circuits, with different clock tracing per circuit mode of operation. Further, building robust clock trees that can withstand process variation is a huge challenge for design teams.
Getting the best power reduction from CTS depends on the ability to synthesize the clocks for multiple corners and modes concurrently in the presence of design and manufacturing variability. Multi-corner CTS can measure early and late clock network delays over all process corners concurrently, with both global and local variation accounted for. A multi-corner dynamic tradeoff between buffering a wire and assigning it to less resistive layers is essential in order to achieve the best delay, area & power tradeoff. In comparison to the single-mode, single-corner (1M1C) flow, the MCMM CTS solution can provide a significant reduction in area, buffer count, skew, TNS, and WNS in addition to lower dynamic power.
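Here is a small sketch of why a tree balanced in one corner is not necessarily balanced in another. It computes skew per corner from per-sink clock latencies and reports the worst corner. The corner names, sink names and latency values (in picoseconds) are invented for illustration and do not come from any real design.

```python
def skew(latencies):
    """Skew of one corner: latest minus earliest clock arrival across sinks."""
    return max(latencies.values()) - min(latencies.values())


def skew_per_corner(latency_by_corner):
    """Map each corner name to its skew."""
    return {corner: skew(lat) for corner, lat in latency_by_corner.items()}


def worst_corner(latency_by_corner):
    """Return (corner, skew) for the corner with the largest skew."""
    per_corner = skew_per_corner(latency_by_corner)
    corner = max(per_corner, key=per_corner.get)
    return corner, per_corner[corner]


# Hypothetical clock latencies in picoseconds for three sinks in two corners.
latencies_ps = {
    "ss_cworst": {"ff1": 520, "ff2": 545, "ff3": 530},
    "ff_cbest":  {"ff1": 310, "ff2": 318, "ff3": 350},
}

print(skew_per_corner(latencies_ps))
print(worst_corner(latencies_ps))
```

In this made-up data the slow corner looks reasonably balanced, while the fast corner has the larger skew because one branch speeds up less than the others. A single-corner flow that only looked at the slow corner would never see it.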
Now, before I forget let me state what I wanted to in the first place – I have to confess that I can now relate to this unique breed of clock designers and have the utmost respect for them for solving some of the most difficult chip design challenges thrown at them. A whole new generation is evolving with much cooler iPhones and Gore-Tex jackets.
Resistance is futile. I recently caved and switched to an iPhone after having been a loyal Google phone user for more than a year. Apart from the coolness factor, my main motivation was corporate mail support that was absent in the Gphone, plus the fact that I got the iPhone for free when my wife upgraded hers. The difference between the two phones is night and day – the iPhone UI is much friendlier, menu options are simple and logical, and the device is much faster for certain applications like browsing, data download and video capture. Most modern smartphones/PDAs are increasingly employing Multi-Voltage techniques, specifically Dynamic Voltage and Frequency Scaling (DVFS), to reduce power without sacrificing performance. The iPhone designers, unlike the Gphone designers, have done a good job of creating this balance between the different applications running on the device.
Regardless of the phone type, Multi-Voltage designs, unlike vanilla designs, are difficult to implement because of the inherent complexity and the need to handle special cells such as level shifters and isolation cells. In addition, these design styles cause the number of modes and corners to increase significantly when min/max voltage combinations from all the power domains are considered. Because each different voltage supply and operational mode implies different timing and power constraints on the design, multi-voltage methodologies cause the number of design corners to increase exponentially with the addition of each domain or voltage island. DVFS further complicates matters with varying frequency and clock combinations, leading to even more design modes and corners. Additionally, the worst case power corners don’t necessarily correspond to the worst case timing, so it’s critical to know how to pick a set of corners that will result in true optimization across all design objectives without excessive design margins.
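The exponential growth is easy to see with a quick sketch that enumerates every min/max voltage combination across the power domains. The domain names and voltage values are hypothetical; in practice many of these combinations would be pruned as unreachable.

```python
from itertools import product


def voltage_corners(domains):
    """Enumerate every combination of per-domain supply voltages.

    `domains` maps a domain name to the voltages it can take, here a
    (min, max) pair, so n domains yield 2**n combinations.
    """
    names = sorted(domains)
    return [
        dict(zip(names, combo))
        for combo in product(*(domains[name] for name in names))
    ]


# Hypothetical min/max supply voltages per power domain, in volts.
domains = {"TOP": (0.9, 1.1), "CPU": (0.7, 1.1), "MEM": (0.8, 1.0)}

corners = voltage_corners(domains)
print(len(corners), "voltage combinations for", len(domains), "domains")

# Adding one more voltage island doubles the count again.
domains["GPU"] = (0.8, 1.1)
print(len(voltage_corners(domains)), "voltage combinations for", len(domains), "domains")
```

Multiply that count by the number of operating modes and interconnect (RC) corners and the scenario explosion becomes obvious, which is why picking the right subset matters.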
So, what’s the story you might ask? It’s pretty simple (or not) – in order to effectively close these MV designs across all modes and corners, timing and power must be analyzed and optimized concurrently for different combinations of library models, voltages, and interconnect (RC) corners. In essence, true and concurrent Multi-Corner Multi-Mode analysis and optimization is a prerequisite for any Multi-Voltage design. Anything less would not guarantee convergence, because optimization in one scenario could create a new violation in a different scenario, lead to multiple iterations, create unpredictable ECO loops, result in poor QoR and possibly reduce yield. In other words, low power designs, specifically MV designs, inherently require true MCMM optimization for both power and timing.
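A toy sketch of the "fix one scenario, break another" problem: a change is accepted only if the worst negative slack (WNS) does not get worse in any scenario. The scenario names, slack values (in picoseconds) and the acceptance rule are my own simplifications, not any tool's actual cost function.

```python
def wns(slacks):
    """Worst negative slack: the most negative slack, or 0 if timing is met."""
    return min(0, min(slacks))


def tns(slacks):
    """Total negative slack: the sum of all negative slacks."""
    return sum(s for s in slacks if s < 0)


def summarize(scenarios):
    """Map each scenario name to its (WNS, TNS) pair."""
    return {name: (wns(s), tns(s)) for name, s in scenarios.items()}


def accept_change(before, after):
    """Accept an optimization move only if no scenario's WNS degrades."""
    return all(wns(after[name]) >= wns(before[name]) for name in before)


# Hypothetical endpoint slacks in picoseconds, per scenario.
before = {"func_ss": [-30, 10, 5], "test_ff": [20, 15, 8]}
# The same endpoints after a fix that only looked at func_ss.
after = {"func_ss": [5, 10, 5], "test_ff": [20, -12, 8]}

print("before:", summarize(before))
print("after: ", summarize(after))
print("accept:", accept_change(before, after))
```

Looking at func_ss alone, the change is a clear win. Looking at all scenarios together, it is rejected because it introduces a new violation in test_ff, which is the iteration loop that concurrent MCMM optimization is meant to avoid.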
Now, will I go back to the Gphone? If and only if they support corporate mail and also improve the performance. It wouldn’t hurt to jack up the coolness factor either. Till then I will remain an iPhone user (loyal or not is debatable). The only problem with the iPhone is, quoting Seinfeld, “if you are mad at someone you cannot slam the iPhone, but instead you will have to slide the phone off”.
A recent issue of the New York Times Sunday magazine ran a very interesting article on data centers. The author, Tom Vanderbilt, describes the massive infrastructure that powers the internet and so much of our daily lives. The article says that data centers use more energy annually than Sweden. This includes the wattage to run the servers, to cool them, and the leakage power from the estimated 30% of servers that are in standby. According to the 2007 EPA study, data centers consumed a staggering 60 billion kWh in 2006 and are projected to reach 100 billion kWh by 2011 at a cost of ~8 billion dollars. The biggest challenges facing the growth of these server farms today are the availability of power and the cost associated with cooling, apart from the fact that they are not eligible for a Government bailout.
Many ingenious design ideas are being considered for the next generation of data centers to effectively manage the heat generated by thousands or millions of servers – a roofless building with servers packed in shipping containers, an underground bunker, server farms in the Arctic & Antarctic, and Google’s “water-based data center”. Even if they manage to build these efficient server farms, delivering power to them would be a monumental task.
With the emerging middle class in this global economy and their quest for knowledge it is inevitable that more and more kilowatts will be needed to run these ever expanding data centers. The short term solution for this problem is to build the data centers close to the power generating stations as evidenced by the Google farm close to the Columbia River dam. The more effective long term solution is to design more power efficient chips that also provide the high performance needed for these compute intensive servers. I believe that the fundamental solution still lies in the very chips that power these servers. Microprocessor design is already adapting to this need by packing more cores on a single die without increasing the clock frequency. A lot more can be done to further minimize power using advanced power reduction techniques for both processor and peripheral designs. The key is to employ power optimization at all stages of the design flow starting at the system level, board level, architectural level and finally at the component level.
Stories like this in major news outlets remind us that low power design is not just a techie issue, but rather an absolute necessity for the growth of our economy and culture. Regardless of whether you use Bing or Google, one thing is certain – data centers powered by low power chips will directly help minimize greenhouse gases and pave the way for the next generation of green server farms. I personally prefer Bing.
After some cautious and tentative moments, I finally managed to get my first post out. In this debut blog post, I’d like to introduce myself, present my bona fides and give you some idea about the likely content you’ll see here on a regular basis.
I’m Arvind Narayanan, Product Marketing Manager in the Place and Route Division at Mentor. I started my career as a microprocessor design engineer for Hal Computer Systems (no, the name was not derived from the movie) when 0.3u was state of the art and 100 MHz was considered blazingly fast. I have been in the semiconductor industry for about 14 years in different capacities, ranging from processor design engineer to application engineer and currently in Product Marketing. After my design tenure at Hal, I worked at Synopsys focusing on STA and then at Magma Design focusing on low power design implementation and analysis. I earned my Master’s degree in Electrical and Computer Engineering from Mississippi State University, and my MBA from Duke University (and I’m an ardent Blue Devil fan!).
In the last five years I have seen the “low power” buzz gather enough momentum that it is now used in the design engineer vernacular alongside timing – as in “Dude, did you close the design and are we cool?” I have been part of key product launches primarily targeted at the power-savvy engineering community to design greener chips. I have also been very involved in the development of the Unified Power Format (UPF) initiative, now IEEE P1801, right from its inception to give designers the long-awaited power constraints file – take that, SDC. My position here affords me a comprehensive view of the low-power trends and challenges faced by a variety of in-the-trenches designers. I will speak to the usefulness of different technologies and methodologies, talk about new ideas that arise in the technical press or at conferences, and describe how specific challenges in low power design have been solved by our customers.
Please add this blog to your RSS feed, and leave comments and questions for me. I hope for this to be a fruitful and engaging encounter for everyone. A good place for me to start is by pointing you to the newly remodeled Mentor website with a dedicated low-power solution section. I won’t reiterate the website content in my blog posts, but the methodologies you’ll find on the Low Power Solution site reflect my point of view. Check it out when you have some time to spare: http://www.mentor.com/solutions/low-power.
Next blog, I’ll ruminate on the relative importance of ESL, implementation, and verification to overall power reduction. Till then I remain…
About Power Play
- Battle of Fins and BOXes
- Effects of Inception
- How to MV in 10 easy steps (really over simplified backend flow)
- Clocks will be Clocks..
- Why only MV when you can MC, MM & MV?
- To Bing or to Google is not the question