John Ferguson's Blog
Just a friendly reminder: the Mentor Graphics User Group Meeting is just around the corner. It is scheduled for April 26th at the Santa Clara Marriott in Santa Clara, CA. If you are a Calibre user, this is your chance to get free access to information on the Calibre and Mentor Graphics roadmaps, as well as to attend sessions on Mentor Graphics solutions in P&R, PCB, Custom IC design, and Test & Yield analysis. There is still time, so don’t be shy: register now!
Here’s a small sample of what you can find:
- Calibre Product Roadmap & Update presentation given by David Abercrombie.
- Details on deriving and matching m & nf device parameters in LVS using DFM operations by Sarojini Rajachidambaram and Venkat Ramasubramanian of Global Foundries.
- Data splitting into user-specifiable sub-block space using Calibre DRC/ERC by Arya Raychaudhuri of Fastrack Design and Duc Vu of PLX Technology.
- And more!
Hope to see you there!
Density and the Analog Cell
Analog design is a very sensitive business. Unlike the digital world, where circuits are either on or off and have built-in hysteresis to prevent inadvertent toggling, analog circuitry is intentionally designed to respond to minor fluctuations in the signal. As a result, analog layout is riskier than digital layout. To prevent race conditions, or minor (yet potentially catastrophic) fluctuations in the signal reaching more than one device, analog cells are constructed with very precise geometric matching and symmetry between devices.
So what, you say? Analog designers have successfully dealt with these issues for years. Unfortunately, when you take that carefully designed analog IP and place it in an SoC design, one threat can undo all of that detailed layout: dummy fill insertion. The typical metal fill flow checks full-chip density and inserts metal fill wherever it finds regions that violate the density requirements. This fill approach is woefully ignorant of the matching and symmetry requirements of analog circuitry. As a result, if a full-chip density violation overlaps an analog cell, metal fill may be applied willy-nilly, likely throwing off the cell’s expected behavior.
Again, so what, you say? You may be thinking that, as long as the analog cell itself passes density, it cannot generate a density violation when placed into context, right? Unfortunately, that is not the case. There are a couple of reasons why an analog cell that passed density checking in isolation may be associated with density violations when placed into a larger design. The first is the surrounding context. If an analog cell that passes density is placed into a low-density region, the resulting density violations may extend into the analog cell. In fact, if the analog cell is smaller than the density checking window, the density violation may cover the entire cell.
Another, less obvious, reason why the analog cell may pass when run in isolation but fail in the full chip is the nature of density checking itself. In density checking, a window of predefined area is stepped, with a predefined step size, across a design or region, and the windows that fail are output as errors. By definition, this approach requires some implied starting point for the density windows. For example, with Calibre DRC, the first density window starts at the lower-left corner of the design extent.
Imagine a case where density is run on an analog cell. The first window checked is in its lower leftmost corner. The next window checked is shifted by the step size, and so on. Let’s assume this cell passes density by itself. Now let’s put the analog cell into the context of a larger design, as shown in the figure below. In this case, the first window checked is in the lower leftmost corner of the larger design. As the density window is stepped across the design, there is no guarantee, and in fact, little likelihood, that the density window locations from the full-chip run will align with the density windows checked when the cell was in isolation. There is a very real possibility that, because the density windows are now in a new location, the analog geometries within the new window will now fail density. This is actually a real error. Unfortunately, it is one that could not be found or corrected when the analog cell itself was created. Now we have a real challenge.
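To make the origin effect concrete, here is a toy Python sketch. It is not how Calibre computes density; the 0/1 grid representation, the helper names window_densities and failing, and the layout values are all hypothetical. Square windows are stepped from a chosen origin, and the same cell is checked once with the windows anchored at its own corner and once with the windows shifted two grid units to the right, as they might be when the chip-level origin lands elsewhere.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grid[y][x] == 1 where the checked layer is drawn (hypothetical model)


def window_densities(grid: Grid, win: int, step: int,
                     x0: int = 0, y0: int = 0) -> List[Tuple[int, int, float]]:
    """Step a win x win window across the grid starting at (x0, y0).

    Only windows that fit entirely inside the grid are evaluated.
    Returns (x, y, density) for each window position.
    """
    height, width = len(grid), len(grid[0])
    results = []
    y = y0
    while y + win <= height:
        x = x0
        while x + win <= width:
            covered = sum(grid[j][i] for j in range(y, y + win)
                          for i in range(x, x + win))
            results.append((x, y, covered / (win * win)))
            x += step
        y += step
    return results


def failing(results: List[Tuple[int, int, float]], min_density: float):
    """Windows whose density is below the minimum."""
    return [r for r in results if r[2] < min_density]


# Hypothetical 12 x 4 analog cell: every row has metal in columns 0-1 and 6-9.
row = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
cell = [row[:] for _ in range(4)]

MIN_DENSITY = 0.5
cell_level = window_densities(cell, win=4, step=4)        # windows anchored at the cell corner
chip_level = window_densities(cell, win=4, step=4, x0=2)  # chip-level grid lands 2 units to the right

print(failing(cell_level, MIN_DENSITY))  # []
print(failing(chip_level, MIN_DENSITY))  # [(2, 0, 0.0)]
```

Every cell-level window sits exactly at the 0.5 limit and passes, while the shifted window at x = 2 sees no metal at all. The cell passes on its own only because it is sitting right at the limit, which is exactly the situation in which a shifted window can expose a real error.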
Figure 1. Because the density window origins at the chip level are misaligned with those at the cell level, it is possible to identify an error at the chip level that is not found in the cell itself.
So, what can we do about this issue? The good news is that there are some solutions. The bad news is that they all have some limitations. Let’s take a look…
First, we can do a better job of detecting these errors within the analog block. For a shifted density window at the full-chip level to flag an error not found at the cell level, the original density windows run on the analog cell must have been close to the limit, because there is not much room for variation in the geometries under the window. One commonly used approach is to apply a more restrictive checking requirement to the analog block, but this approach has its own challenges. First, it requires modifying the golden rule file for the analog cell run, which is generally considered taboo. Second, even if you do modify the rule, you need to know how much to tighten the constraint. This is not trivial. To know the correct adjustment, you need to know how far the window will shift when the cell is placed into context, which, of course, you don’t. To determine the worst-case scenario, you have to assume that the region of the analog cell that is no longer checked was 100% covered, and that the new region being checked has 0% density, that is, it is completely devoid of polygons on the layer being checked by the density rule.
Figure 2. Overlapping density windows.
Let’s take an example. Assume that the upper-right window in Figure 2 is the window used when calculating density at the analog block level, and that the lower-left window in Figure 2 is the window used when calculating density at the full-chip level. The inverted-L shape in the upper right, representing the portion of the analog block’s window that is not checked by the full-chip window, is exactly equal in area to the L-shaped region in the lower left, representing the new portion of the design that was not checked (not known) when we checked the analog block. In the worst-case scenario, the inverted L in the upper right is completely full of whatever geometry we are checking in the density calculation, the L shape in the lower left is completely devoid of that geometry, and each has a width exactly equal to one step size. In this case, for the analog cell to be assured of passing anywhere it is placed in the design, it must meet, at the cell level, a density requirement of [(100*100)-2*(10*10 + 10*9)]/100 = 96.2% of the original density constraint. The larger the step size, the greater this difference will be.
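The exact margin depends on the window and step dimensions of the rule. As a rough sketch of the worst-case reasoning above, the Python below uses a simple additive model (an assumption, not Calibre’s algorithm): when the checking grid shifts by dx and dy, the L-shaped area that leaves the window is assumed 100% covered and the L-shaped area that enters it is assumed empty, so the window density falls by the L-shaped area divided by the window area. Because a chip-level window can land at most one step away from the nearest cell-level window position, dx and dy are each bounded by the step size. The function names and rule values are hypothetical.

```python
def worst_case_density_drop(window: float, dx: float, dy: float) -> float:
    """Worst-case fall in window density when the checking grid shifts by (dx, dy).

    Simple additive model (an assumption): the L-shaped area that leaves the
    window is fully covered, and the L-shaped area that enters it is empty.
    """
    if not (0 <= dx <= window and 0 <= dy <= window):
        raise ValueError("shift must lie between 0 and the window size")
    full_area = window * window
    overlap = (window - dx) * (window - dy)  # area seen by both window positions
    l_shaped_area = full_area - overlap      # area that changed hands
    return l_shaped_area / full_area


def guarded_min_density(rule_min: float, window: float, step: float) -> float:
    """Cell-level minimum density that still passes a minimum-density rule
    for any grid shift of up to one step in x and y (same model as above)."""
    return rule_min + worst_case_density_drop(window, step, step)


# Hypothetical rule: 50 x 50 window, 5-unit step, 25% minimum density.
print(round(worst_case_density_drop(50, 5, 5), 4))  # 0.19
print(round(guarded_min_density(0.25, 50, 5), 4))   # 0.44
```

Under this model the margin is 2s/W - (s/W)^2 for a step s and window W, so it grows roughly linearly with the step size, in line with the observation that a larger step widens the difference. For a maximum-density rule, the same reasoning with the coverage assumptions swapped gives a margin to subtract instead.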
Another approach, which does not require such complex algebra, is simply to decrease the step size of the density window when running the analog block, thereby increasing the coverage of the density checking. To assure 100% coverage, the step would have to be the smallest step defined by the process. This approach requires longer runtimes and potentially much larger result databases. It also requires detailed analysis of the results to determine the best fix. In Calibre, several methods can help with this analysis, including histograms and color maps, or automated averaging and combining of window results.
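A toy illustration of why a finer step matters, again in Python with a hypothetical grid layout and helper names: the same layer is swept with a coarse step equal to the window size and with a one-unit step standing in for the smallest step the process allows. A crude histogram of the window densities hints at the kind of summary that helps locate the min and max windows; Calibre’s own histograms and color maps are produced by the tool, and this only sketches the idea.

```python
from collections import Counter


def densities(grid, win, step):
    """Density of every win x win window that fits in the grid, stepping by `step`."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            area = sum(grid[j][i] for j in range(y, y + win) for i in range(x, x + win))
            out.append(area / (win * win))
    return out


def histogram(values, bins=4):
    """Count densities in equal-width bins over [0, 1]; 1.0 falls in the top bin."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    return [counts.get(b, 0) for b in range(bins)]


# Hypothetical 12 x 4 cell: every row has metal in columns 0-1 and 6-9.
row = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
cell = [row[:] for _ in range(4)]

coarse = densities(cell, win=4, step=4)  # step equals the window size
fine = densities(cell, win=4, step=1)    # one grid unit, the finest step in this toy

print(min(coarse), max(coarse))  # 0.5 0.5
print(min(fine), max(fine))      # 0.0 1.0
print(histogram(fine))           # [1, 2, 3, 3]
```

The coarse sweep reports every window at exactly 0.5, while the fine sweep exposes windows ranging from 0.0 to 1.0. Even in this one-row example the fine sweep evaluates three times as many windows; in general the window count grows roughly with the inverse square of the step size, which is where the extra runtime and larger result databases come from.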
Figure 3. Density window histograms and color map enable fast identification of the min and max density windows within a block.
Of course, both of these approaches only solve part of the problem. Finding all possible density violations in an analog cell does not ensure that the cell can be modified to be 100% density-compliant in all cases. It also does not help in the case of pre-existing analog IP that cannot be modified.
One last method tries to ignore these errors at the chip level. Historically, users tried to do this by limiting the areas where density windows are and are not checked. Unfortunately, this approach also falls short. Density windows are square by default, and in context, multiple analog cells may line up to form non-square regions. In these situations, it is not always possible to check everywhere outside the analog cells without also checking somewhere inside them. Consider, for example, a “doughnut” ring of analog cells surrounding some digital area: in this case, the doughnut “hole” is not checked. Another common problem arises when the regions to check are narrower than the density window. How do you check only part of a density window?
Figure 4. Complex analog keep-out regions may not be easily partitioned by a square density window.
There is an alternative that can be used with this approach: the full-chip density windows are checked everywhere, and those windows that have a user-defined percentage of their area covered by an analog cell are simply ignored. Unfortunately, this approach also requires non-trivial modification of the density check in the golden rule file. The details of the necessary coding can be found in the Density Checking chapter of the Calibre Solutions Manual.
However, I’m not going to leave you with incomplete and unsatisfactory solutions. I’m just not that kind of guy.
To remove the need to modify the rules, a new method of DRC waiving, specific to density checking at the chip level, is available. With this approach, the user specifies the check name and the percentage of a density window’s area that must be covered by the analog region for the window to be waived. The analog region may be identified by cell name, by a drawn marker layer in the layout, or by previously identified density results. Density windows with enough of their area covered by the analog keep-out region are waived; all other windows are checked as normal. Unlike the existing Calibre Auto-Waivers approach, this waiving is done window by window rather than on merged results. The waived density windows are output to a separate database for user review, along with their measured density and the percentage of window area covered by the analog region. All of this is done without any user modification to the golden rule file.
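The window-by-window waiving idea can be sketched as follows. This is a hypothetical Python illustration, not the Calibre waiver implementation or its interface: windows and keep-out regions are plain rectangles, the keep-out rectangles are assumed not to overlap one another (so their intersections with a window can simply be summed), and waive_pct plays the role of the user-specified percent coverage. Failing windows whose coverage meets the threshold go to a separate waived list together with their measured density and coverage; everything else is reported as usual.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


@dataclass
class Window:
    rect: Rect
    density: float  # measured density of this window, 0.0 to 1.0


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def waive_density_windows(windows: List[Window], keepouts: List[Rect],
                          min_density: float, waive_pct: float):
    """Split failing windows into (reported, waived), one window at a time.

    Assumes keep-out rectangles do not overlap each other, so their
    intersections with a window can be summed.
    """
    reported: List[Window] = []
    waived: List[Tuple[Window, float]] = []
    for win in windows:
        if win.density >= min_density:
            continue  # passing windows produce no result
        x1, y1, x2, y2 = win.rect
        area = (x2 - x1) * (y2 - y1)
        covered_pct = 100.0 * sum(overlap_area(win.rect, k) for k in keepouts) / area
        if covered_pct >= waive_pct:
            waived.append((win, covered_pct))  # kept for review with density and coverage
        else:
            reported.append(win)
    return reported, waived


# Hypothetical example: one analog keep-out, 100 x 100 windows, 20% minimum, waive at 50% coverage.
windows = [Window((0, 0, 100, 100), 0.10), Window((50, 0, 150, 100), 0.15),
           Window((200, 0, 300, 100), 0.12), Window((300, 0, 400, 100), 0.40)]
reported, waived = waive_density_windows(windows, [(0, 0, 80, 100)], 0.20, 50.0)
print([w.rect for w in reported])           # [(50, 0, 150, 100), (200, 0, 300, 100)]
print([(w.rect, pct) for w, pct in waived])  # [((0, 0, 100, 100), 80.0)]
```

Because the decision is made per window, a window that only grazes the analog cell (30% coverage here) is still reported, while a window that sits mostly on the cell (80%) is set aside for review rather than silently discarded.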
Figure 5. Calibre RVE displaying both the density errors and the waived density errors for check “met1_density”. The results database shows the calculated density value as well as measured areas of the metal and the keep-out region. These can be used for sorting or filtering.
By combining the techniques described above, which help assure that analog blocks meet density regardless of context, with the ability to automatically waive violating density windows that interact with the analog cell, users now have safer, more automated solutions to a problem that has vexed the SoC design community for several years. Solving the analog IP density issue helps prevent the inadvertent destruction of analog circuitry by non-optimal metal fill placement, and saves significant time by automating the previously tedious manual approaches used to identify and remove metal fill geometries from analog regions. For more details, feel free to ping me directly.
In my last few posts, I began discussing what it takes to enable software quality and support. This post focuses on the latter: support.
Of course, the goal of any decent software provider is to deliver software that is bug free, intuitive to use, and performs a valuable service. While we strive for perfection, in reality these goals can never be fully achieved. In the EDA world, which is always growing and evolving with the electronics industry, even flawless software must at least enable a path to future enhancements.
This is where support comes in. Support provides many benefits beyond the software itself, but in general can be summarized as two key benefits: 1) Access to software updates enabling timely hand-offs for tool fixes or enhancements, and 2) Access to expert knowledge for complex user challenges or issues. Since my previous post already detailed the approach used by the Calibre team to enable high quality and timely updates, I will focus here on benefit #2.
Providing access to expert knowledge can take many forms. Typically we think most about the worst-case scenario: a user facing an urgent deadline who is struggling to get, or to interpret, the results they need. In this kind of scenario, the user is trying to pull in expertise. There is also the more proactive approach, where the vendor pushes expertise and knowledge on broad topics out to the user community. At Mentor, our aim is to provide both approaches to sharing this kind of knowledge, and to do so in a manner that is faster and simpler than the rest of the industry.
Let’s focus first on the proactive methods. One is the SupportPro Newsletter. Each weekly issue focuses on a different technology or trend and is full of helpful hints and pointers to more information. Another avenue is U2U, the Mentor Graphics users’ community, where users can share their own tips and feedback and connect directly with Mentor personnel. In addition, Mentor regularly provides webinars and seminars in which technical experts present in their areas of expertise. Many of these are recorded and available for review at the user’s convenience. All of these methods provide a means for users to educate themselves and to interact directly with Mentor.
Of course, this is only part of the solution. We also strive to provide users with timely answers to technical questions or problems, and there are many ways for a user to access this level of knowledge. For Mentor customers on support, often the easiest and fastest is SupportNet. Not only is this where users can quickly download the latest releases and documentation, but it is also a portal to a wealth of knowledge. Users can search by topic or keyword to find technotes, application notes, tutorials, webinars, and more. Most problems can be resolved quickly and easily without ever having to interact directly with anyone at all. And if you can’t find the answers you are looking for, this is also an easy way to leave a request for a service engineer. Did you know that more than 70% of all questions coming into SupportNet are answered within seconds? You can also communicate with our support engineers directly through email or phone (1-800-547-4303 for North America).
Of course, access to a person is one thing, but access to someone with the required expertise, and with the skills and aptitude to walk you through it, is another. We’ve all heard the jokes, and we’ve all lived through the pain of waiting on hold forever only to be put in touch with someone who can’t help. Making sure there is fast access to someone with the right knowledge level is imperative; the average response time for technical questions is under 10 minutes!
We use many methods to ensure our support engineers are more than up to the challenge. We have our own CSD University, where the engineers gain training and certification in their areas of expertise. They also attend AE training events twice a year to learn about all the new features recently implemented and coming down the pike. We also hold regular brown-bag lunches and tech talk sessions where tool development, marketing, and support have an opportunity to present their recent findings to one another. And, like any engineers, they are driven to keep up with industry trends through outside training and various publications.
In addition to all of this, our support engineers are considered an integral part of all our projects and planning sessions. As a result, they are represented, along with marketing, R&D, QA, and documentation, in all projects concerning key new functionality being implemented or considered. They represent the voice of you, the user. This gives them advance insight into, and expertise with, new features or functions before they are available to the market.
The support team also meets regularly, both internally and with the documentation and customer training teams, to cross-train and align on commonly observed issues or requirements.
Lastly, our CSD team regularly engages directly with users, not just through emails and phone calls, but through various technical and marketing review meetings, trade shows and events, and more. That’s where you can come in. Next time you see Burr Shaw at the annual U2U convention, exchange an email with Tricia Allgyer, or talk on the phone with Bill Drezen, don’t be shy; fill them in on what you’re working on and the technologies you’re involved in. They live to solve your problems, but they love to learn in the process! The end result is better solutions and better software.
In my last blog, I discussed the importance of support and the value it provides in the physical verification space. As indicated, one of the key components in providing support is having an infrastructure that helps assure quality software releases in the first place. In this blog, I will provide more insight into the procedures within the Calibre organization that help ensure the high standard of quality that Calibre has become known for.
For Calibre, the concept of quality starts at the very beginning. When new features or functions are conceived and planned, marketing takes the initial role to scope the expected behavior. Depending upon the complexity of the functionality, this is often done in the form of project teams that consist of marketing, development, product engineering and QA and often customer support and documentation. Through the course of the definition, the customer goals and requirements are always kept first and foremost. With those goals in mind, various implementation proposals are explored.
While R&D develops functionality, they will typically incorporate any tests that come from customer support or the customers directly as well as add their own validation tests.
Ultimately, however, product engineering has the goal to test that the final solution meets the initial requirements and does not break any existing functionality. Functional tests can come from customers, marketing, customer support, R&D, and their own generated test cases. All of these tests are collected to form an initial baseline test for the specific set of functionality.
Once an agreed-upon implementation is put into the code, testing can begin. For each new function, the functional tests are run and validated. In addition, all historic functional tests must also be run and validated. This assures that new functionality has not introduced an unexpected change to any existing behavior.
Unlike many solutions, Calibre is a platform of offerings. It consists of a single processing engine with offerings built on it for DRC, LVS, parasitic extraction, reticle enhancement technology, mask data prep and fracture, and the list goes on. It is important to ensure that a change to one offering, such as DRC, does not have an adverse effect on another application, such as OPC. This means even further testing and validation.
Because a product like Calibre can be run in several modes (flat, hierarchical, multi-threaded, distributed, hyper, …), it is critical to validate that all modes deliver the same expected behavior from release to release. This means that all those functional tests must be run and compared in every possible configuration. In addition, these configurations must be run across all supported hardware and OS platforms. This translates to literally thousands of validation runs that must be performed for each new release.
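As a rough illustration of how the configuration matrix multiplies the work, the Python sketch below runs every functional test in every mode on every platform and compares each result against a stored golden result, such as one recorded from the previous release. The mode names come from the list above; the platform labels, test names, result strings, and the run callable are hypothetical stand-ins for a real test harness.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

MODES = ("flat", "hierarchical", "multi-threaded", "distributed", "hyper")  # modes named in the post
PLATFORMS = ("platform-a", "platform-b")  # hypothetical hardware/OS platform labels

# Hypothetical harness hook: (test, mode, platform) -> result summary string
Runner = Callable[[str, str, str], str]


def regression_matrix(golden: Dict[str, str], run: Runner,
                      modes: Sequence[str] = MODES,
                      platforms: Sequence[str] = PLATFORMS) -> List[Tuple[str, str, str, str]]:
    """Run every test in every mode on every platform and return the
    (test, mode, platform, actual) combinations that differ from the golden result."""
    mismatches = []
    for test, expected in golden.items():
        for mode, platform in product(modes, platforms):
            actual = run(test, mode, platform)
            if actual != expected:
                mismatches.append((test, mode, platform, actual))
    return mismatches


def fake_run(test: str, mode: str, platform: str) -> str:
    """Stand-in runner that simulates a regression in exactly one configuration."""
    if test == "drc_spacing" and mode == "distributed" and platform == "platform-b":
        return "3 errors"
    return "2 errors"


golden = {"drc_spacing": "2 errors", "lvs_basic": "2 errors"}  # e.g., results from the previous release
print(len(golden) * len(MODES) * len(PLATFORMS))  # 20
print(regression_matrix(golden, fake_run))        # [('drc_spacing', 'distributed', 'platform-b', '3 errors')]
```

Even two tests across five modes and two platforms already mean 20 runs; with a full suite of functional tests, the count quickly reaches the thousands of validation runs per release described above.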
In addition to functional testing, it is also critical to validate performance and capacity. With Calibre, the goal is to continue to lead the industry in performance. In most releases, performance improvements can be anticipated. At an absolute minimum, it is critical to ensure that performance and capacity have not degraded. To validate that these goals have been met, large design tests specifically targeted to tax the performance limits are also run. Typically these consist of large designs and rule files from key partner customers. Because such cases are included, customers can feel comfortable that similar future designs will continue to see performance gains.
With each new release providing key enhancements for every tool offering in the Calibre platform, this testing becomes a critical component of the release process. For the Calibre platform, four new releases are planned each year, typically released in the middle of each calendar quarter. While no customer is expected to upgrade to a new release every quarter, it is important that new releases are available frequently, so that new functionality is available promptly whenever a customer does choose to transition.
Of course, even with all of the testing that is done, it is impossible to guarantee there will be no bugs. To help address that, in addition to each official release, there are typically 2 to 3 update releases for Calibre. These update releases build upon a previous official release and consist of bug fixes and minor enhancements. They give users a way to upgrade to new functionality or gain access to bug fixes without taking on the additional risk associated with other changes.
For the Mentor Graphics Calibre platform, the responsibility associated with sign-off quality and accuracy is taken very seriously. When you add up all of the official releases and all of the hand-off patches provided, the number of tests required per year for Calibre quickly climbs into the hundreds of thousands, consuming several thousand CPUs at near 100% utilization around the clock! Can your other vendors claim the same? When they fail, will they be there to bail you out? In my next blog, we’ll examine in more detail how Mentor Graphics has organized to provide the best support possible in the event that you do have a problem.
When asked about the value that the Calibre platform brings to the design community, most folks will respond with performance, foundry support, and ease of debugging. While these are all valuable aspects and traits of Calibre, there is one more benefit that is often taken for granted: support.
The word “support” is something bandied around loosely in EDA. Saying you have good support is akin to saying you have high “quality.” It’s an intangible that is difficult to get your hands on. But, if we can’t define support, we can’t possibly quantify it. Similarly, if you can’t quantify it, you can’t realistically compare one vendor’s support versus another’s.
So, from a Mentor Graphics and Calibre point of view, what does support mean, and how does it differ from quality? The simplest definition of support is probably a measure of the risk level the tool imposes on a business. Let’s face it, physical verification is a critical component of the electronic design cycle. It is among the very last tasks performed before tape-out. Its function is to ensure that the layout, which will dictate the mask (and ultimately the silicon of the design), can be manufactured and will implement the functionality as designed. This is critical. As Jon Kuppinger of LSI once commented to me, “If our place and route tools, or our layout tools, or our netlisters have bugs, it’s painful, but I can at least manage it, because I know Calibre will identify those errors before tape-out. But, if Calibre has a bug, there is no safety net.” Clearly, then, the risk associated with physical verification is quite high. Minimizing that risk is the role of support.
Quality, on the other hand, is a measure of the usefulness of a product for a given task. This includes a measure of the risk associated with a product in its current state. To be useful towards a task, it must contain all the functionality required to perform the task; it must be relatively easy to implement into an organization and flow, and it must also be dependable, meaning it is unlikely to have disastrous bugs, and that it provides alternative paths to success should a problem be found.
That said, for software products, and in particular for EDA products, which must constantly evolve to meet the ever-changing challenges of new technologies, it should be clear that one cannot achieve high quality without first having very strong support. The support ultimately drives the shape of the product as it continues to evolve and grow.
In the case of physical verification, the challenge for software providers is to do all that is possible to reduce tape-out time while ensuring acceptable results. To that end, there are several approaches that can be employed as part of a comprehensive support model:
* Implement a high-coverage testing approach to identify and correct bugs before a software version is released
* Provide an efficient communication vehicle for users to pass details back when faced with problems or challenges
* Activate a well-trained set of experts with an infrastructure that enables them to respond to those identified issues in a timely manner
* Establish a strong roadmap targeting expected future requirements, including advanced verification needs, as well as new processing requirements
* Employ and maintain an effective mechanism for delivering new code for fixes to problems and for new functionality enhancements
Each of these items represents goals at which Mentor Graphics and the Calibre product line strive to excel. In my coming blogs, I will take a closer look at each of these goals, and the specific actions taken for each to achieve superiority.
I don’t normally take the time to respond to any of the various competitive claims out there. But recently in ESNUG 483, item #2, there was a posting entitled “We recently dumped Mentor Calibre for Magma Quartz DRC/LVS” (http://www.deepchip.com/items/0483-02.html) that I feel needs to be addressed because it is misleading. So let me lay out the facts to set the record straight. Tezzaron Semiconductor Corp., Naperville, IL, became a Calibre customer when it purchased a perpetual license in 2000. In 2001, Tezzaron declined to renew technical support, and since then has not renewed the license, so they only have access to Calibre 2001.2, which was released in early 2001.
In his submission to John Cooley’s ESNUG site, Tezzaron’s Robert Patti stated:
“We had been using Calibre for many years, but were not satisfied with it. In particular, we were looking for:
- An ability to scale efficiently across a larger # of CPU’s and machines. Calibre is OK on a single machine, but doesn’t do well outside of that. We need to get results faster.
- Better technical support. We do highly specialized designs. DRC customization is the norm rather than an exception for us.
- Native support for TCL. This is really important, as we need to be able to customize both the runsets and tool ourselves. Other EDA tools we use are TCL based. Calibre is based on a legacy runset language, and newer design rules need a better language option.”
I’ll address each of these issues in turn.
Eight years ago Calibre’s distributed processing functionality had not yet been released. Since then, Mentor has released numerous Calibre performance innovations, including the ability to thread and distribute rule file operations (Hyperscaling), and the ability to distribute layout data (Hyper Remote). Together, these features consistently provide the best performance and scaling in the market, as demonstrated in many competitive benchmarks. Here’s the data comparing the latest version of Calibre to our 2001 performance:
As you can see, there is an order of magnitude performance improvement and great scalability on multiple CPUs.
If you want technical support, there is none better than Mentor. Mentor is the only Five Star Support award winner in the EDA industry, and we consistently rate higher than our competitors in independently conducted annual customer surveys. As I mentioned above, Tezzaron dropped their Mentor technical support in 2001.
Native Support for TCL
Calibre has supported TCL since 2004 through the TVF syntax. Like the significant scaling improvements, this is now a standard part of the Calibre DRC/DRCH package. Calibre customers with support contracts have had TCL support in Calibre at no additional charge since 2004.
Comparison to Quartz
Mr. Patti asserts in the ESNUG submission:
“In our best ‘apples to apples’ comparison we could do with Calibre (same number of threads for both tools), we found Magma Quartz to be 2-4X faster at a lower number of threads. As the number of threads increase, the gap grows even larger. We have tested from 2 to 16 cores and find the scaling to be very good. When the final top level checks on a 24 chip ‘design’ is ~2 hours, it literally saves us days.”
Tezzaron has the 2001 version of Calibre and does not have appropriate licensing to run on more than a single CPU. Since there wasn’t a Calibre version number listed in Mr. Patti’s write-up, nor were there any Calibre performance times, I can only assume this is a comparison of an eight-year-old, single-CPU version of Calibre to the current version of Quartz.
With the significant improvements in Calibre runtimes, stemming from engine optimizations, scaling improvements, and hardware and OS platform support, I’m confident that had Mr. Patti tested a current version of Calibre, the results would have been dramatically different.
For further evidence of Calibre’s performance, see the customer review in the same issue of ESNUG, “Calibre nmDRC 6X speed-up from HyperScaling and Hyper Remote” (http://www.deepchip.com/items/0483-07.html).
Mr. Patti also comments:
“We’ve run our ‘design’ that’s more than 20x larger than a normal design, and Quartz with Direct Read handled it, while Calibre choked.”
Without any background information, it is unclear what this means. Quartz runs on Linux platforms. With the 2001 version of Calibre, 64-bit support on similar Linux platforms was not available (64-bit Linux computing products were not mainstream in 2001). As a result, Tezzaron’s runs are limited by the 2 GB limit of 32-bit processing on Linux machines. Of course, the current version of Calibre supports 64-bit computing, and Calibre customers are verifying some of the largest and most complex designs in the world.
The Bottom Line
All the indications (performance, scaling, availability of TCL, license status) are that this was not a valid comparison of Magma Quartz to Calibre nmDRC. It is a comparison of Magma’s latest product to an eight-year-old version of Calibre.
A new season of NBC’s “The Biggest Loser” recently started. Have you seen this show? My wife, Cherie, loves it; she finds it inspirational to watch these folks go through such a tough ordeal to improve their health. I enjoy it as well, though my motives are completely different. Somehow watching these folks literally work their butts off, while someone is screaming at them, makes me feel less self-conscious about my physical shape and fitness, or lack thereof. Knowing that I haven’t gotten to that point yet allows me to justify sitting on a couch watching them while reaching for a handful of potato chips!
I do find it interesting how they measure gains or losses week in and week out. If you’ve watched from the beginning, you may have noticed that the current approach differs from the one used in the first season. When the show first started, contestants were measured purely on the total weight, in pounds, they lost that week. Now the biggest losers are determined by the percentage of body weight lost each week. I believe the change was made to counter the complaint that a pounds-only competition favored the most overweight contestants, since they had more to lose.
But it seems to me that the current approach doesn’t really solve the problem. Yes, it helps: a 250-pound contestant who loses 5 pounds now ties a 400-pound contestant who loses 8, since each has lost 2% of body weight. But the 400-pound contestant still has a lot more to lose over the long haul.
Let’s consider two theoretical contestants; I’ll call them Bob and Jillian. Assume that each has an ideal weight of 175 lbs, that Jillian starts the competition at 225 lbs, and that Bob starts at 275 lbs. Imagine that two-thirds of the way through the competition, they’ve both lost 22% of their body weight. That means Bob, the heavier of the two, has lost 60.5 lbs and now weighs 214.5 lbs. Jillian, the lighter of the two, has lost 49.5 lbs and now weighs 175.5 lbs. She has nowhere to go from here, while Bob still has room to lose another 39.5 lbs! As a result, it’s likely Bob will win in the long run, unless Jillian pushes herself below her ideal weight! But, in this fictional case, is Bob really the person who should be considered to have achieved the most? After all, the lighter of the two contestants made it to ideal weight much faster. Isn’t that what really counts?
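To make the arithmetic above easy to check, here is a small Python sketch that recomputes the example. Bob, Jillian, the 175 lb ideal weight and the 22% loss are the hypothetical values from the text; the helper names are illustrative only.

```python
# Hypothetical contestants from the example; all weights are in pounds.
IDEAL_WEIGHT = 175.0
FRACTION_LOST = 0.22


def after_loss(start_weight, fraction_lost):
    """Return (pounds lost, new weight) after losing a fraction of body weight."""
    lost = start_weight * fraction_lost
    return lost, start_weight - lost


def room_left(weight, ideal=IDEAL_WEIGHT):
    """Pounds still available to lose before reaching the ideal weight."""
    return max(0.0, weight - ideal)


for name, start in [("Jillian", 225.0), ("Bob", 275.0)]:
    lost, now = after_loss(start, FRACTION_LOST)
    print(f"{name}: lost {lost:.1f} lbs, now {now:.1f} lbs, "
          f"{room_left(now):.1f} lbs left to reach {IDEAL_WEIGHT:.0f}")
```

The same percentage loss leaves Jillian half a pound from her ideal weight and Bob nearly 40 pounds away, which is the whole point of the analogy.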
This same phenomenon crops up from time to time in the world of physical verification when we talk about scaling. Let’s face it, scaling is hugely important for DRC runtimes these days. If you are designing at 32nm or 28nm, with billions of devices, there is simply no way you are going to get reasonable runtimes without it.
As part of our effort to ensure the fastest total physical verification runtimes, we continue to improve Calibre’s scaling capability. If you are looking for the best Calibre runtimes on large designs, then “hyper remote” with remote data servers is the way to go. “Hyper remote” is a great concept. It combines Calibre’s strengths in true multi-threading with our existing distributed processing and initial hyperscaling concepts. In essence, it allows us to run multi-threaded processes on remote machines. And by letting the remotes manage their own processes, it lets us run many more tasks in parallel than traditional hyperscaling, improving scaling and cutting runtimes dramatically. Remote data servers also let us spread memory allocation across the memory of the remote machines, which greatly reduces the requirements on the “master” machine. The combination gives Calibre considerable improvements both in environments with lots of small (two-processor) machines and in environments with several large servers with many processors. As always, it’s just part of the standard Calibre licensing configuration, and it all comes as part of the support dollars spent on your Calibre investment.
But, with all that in mind, we still realize that scaling is only a means to an end. The end goal is really the fastest turnaround time. Sometimes it’s easy to lose track of this goal and put the emphasis on scaling itself rather than on runtimes. This can be misleading. Consider the two scaling graphs below.
If you consider these two graphs out of context, you may conclude that the second graph represents the best solution for physical verification performance, because it seems to scale to more CPUs. But this may not be true due to some unstated assumptions.
First, scaling is always measured relative to some starting point, and the starting points of two different curves are not necessarily comparable. To illustrate, let’s go back to our Biggest Loser analogy. Recall our two contestants, Bob and Jillian. Let’s assume that after two-thirds of the season, Jillian stopped losing weight, but Bob lost another 20 lbs over the rest of the season. You could plot “scaling” as a curve of each contestant’s weight per week. In doing so, you’d clearly see that the heavier contestant’s weight loss ‘scaled’ further. But could you then conclude that he is somehow more fit? Of course not; Jillian weighs less!
The same is true for physical verification and scaling. Let’s consider the same original scaling graphs, but this time, let’s plot them not by relative speed-up but by actual runtimes. Doing so brings new information to light that can dramatically change the picture. Now we can see that the first graph’s curve stopped scaling earlier, but actually reached a faster total runtime in the end.
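Here is a small sketch of the same idea, using made-up runtime numbers (they are not measurements of Calibre or any other tool). It computes each hypothetical tool’s speed-up relative to its own single-CPU run and compares that with the absolute runtime.

```python
# Hypothetical wall-clock runtimes (minutes) for two tools on one design,
# keyed by CPU count. These numbers are illustrative, not measured data.
runtimes = {
    "tool_a": {1: 100.0, 2: 55.0, 4: 32.0, 8: 22.0, 16: 20.0},
    "tool_b": {1: 300.0, 2: 155.0, 4: 82.0, 8: 45.0, 16: 27.0},
}


def speedups(table):
    """Speed-up relative to the same tool's own 1-CPU runtime."""
    base = table[1]
    return {cpus: base / t for cpus, t in table.items()}


for tool, table in runtimes.items():
    s = speedups(table)
    print(f"{tool}: speed-up at 16 CPUs = {s[16]:.1f}x, "
          f"runtime at 16 CPUs = {table[16]:.0f} min")

# The tool that "scales" further is not necessarily the one that finishes first.
best_scaling = max(runtimes, key=lambda t: speedups(runtimes[t])[16])
fastest = min(runtimes, key=lambda t: runtimes[t][16])
print("best scaling:", best_scaling, "| fastest:", fastest)
```

In this made-up case tool_b’s speed-up at 16 CPUs is more than double tool_a’s, yet tool_a still finishes first, because each speed-up is measured against a different starting point.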
Ah, but you might say ‘the graph on the right still looks best because it is continuing to scale and, by extrapolation, it looks as though it will eventually be faster.’ Well, that’s a stretch, to say the least. Let’s reconsider our contestants. We all know that, eventually, each body has its own minimum healthy sustainable weight. We know where Jillian’s minimum weight is; she already reached it at 175. But you can’t tell if Bob will continue to lose more weight or if he has already hit his minimum.
From this scenario, you can clearly see that for contestants on the Biggest Loser, it is to one’s advantage to come in weighing more. Quite frankly, any would-be contestant on the show would be well served to binge eat as much as possible before coming onto the show, just to have more weight to lose over the course of the season, thereby increasing their odds of staying above the dreaded yellow line!
Again, there is a similar analogy in the world of physical verification. Amdahl’s law clearly shows that all scaling solutions eventually reach a point of diminishing returns. In other words, like ideal body weight, every tool will eventually reach a runtime plateau, where adding more CPUs does not improve performance, and may even start to run slower. This basically means that you cannot extrapolate a scaling curve. We illustrate this point by extending the previous scaling curves to more CPUs below – clearly the extrapolation was not a safe bet.
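Amdahl’s law makes this concrete: if a fraction p of a job can run in parallel, the best speed-up on N CPUs is S(N) = 1 / ((1 - p) + p/N), which can never exceed 1 / (1 - p). The sketch below uses an assumed p of 0.95. Amdahl’s law by itself never predicts a slowdown, so to show the “may even start to run slower” case it adds a hypothetical coordination overhead that grows with the CPU count; that term and all of the numbers are illustrative assumptions, not measurements of any tool.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speed-up on n CPUs when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def runtime_with_overhead(t1, p, n, overhead_per_cpu):
    """Hypothetical model: Amdahl runtime plus a coordination cost that grows
    linearly with the CPU count (an assumption, not part of Amdahl's law)."""
    return t1 * ((1.0 - p) + p / n) + overhead_per_cpu * (n - 1)


p = 0.95  # assume 95% of the job parallelizes
for n in (1, 8, 32, 128, 1024):
    print(f"{n:5d} CPUs: Amdahl speed-up {amdahl_speedup(p, n):5.2f}x")
# The ceiling is 1 / (1 - p) = 20x, no matter how many CPUs are added.

# With the overhead term, runtime bottoms out and then rises again.
t1, overhead = 600.0, 0.05  # minutes; both values are illustrative
best_n = min(range(1, 1025),
             key=lambda n: runtime_with_overhead(t1, p, n, overhead))
print("fastest CPU count under this model:", best_n)
```

Under these assumptions the runtime curve bottoms out at roughly a hundred CPUs and then climbs again, which is why extrapolating the early part of a scaling curve is unsafe.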
A related and common mistake is to conclude that, for a larger design, the second solution, which scales further, will be better because the first one stopped scaling too early to give good returns. This is akin to saying that if Bob and Jillian left the show, only to return the following year each weighing 400 lbs, Bob would now be better suited to win. What this thinking fails to take into account is that the situation from the earlier season has completely changed and cannot be used to set an expectation.
The same is true for PV. One cannot assume that the scaling, and the point where that scaling stops, are consistent across all designs for a particular physical verification solution. I can’t speak for every tool, but for Calibre, that’s clearly not true. Calibre’s scaling will depend on many things: the size of the design, the hierarchy of the design, the number of rules being run, the complexity of those rules, the interactions between them, the number and types of machines used … In general, we can say that for two designs on the same process with similar design styles but different sizes, runtime will not increase in proportion to design size. The increase should be considerably less, given Calibre’s handling of hierarchy and repetition.
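As a rough illustration of why hierarchy helps, the sketch below models a design as a tree of cell definitions and placements, then compares the polygon count of a fully flattened design with the count stored once per unique cell. The cell names and numbers are hypothetical, and this is not how Calibre works internally; a real hierarchical tool must also process interactions between placed cells, so the actual work lands somewhere between these two extremes.

```python
def make_design(num_cores):
    """Hypothetical design: cell -> (own polygon count, [(child, placements)])."""
    return {
        "top":         (1_000,  [("core", num_cores), ("io_ring", 1)]),
        "core":        (5_000,  [("sram", 4), ("stdcell_row", 200)]),
        "sram":        (2_000,  [("bitcell", 65_536)]),
        "stdcell_row": (50,     [("inv", 100)]),
        "bitcell":     (6,      []),
        "inv":         (4,      []),
        "io_ring":     (20_000, []),
    }


def flat_polygons(design, cell="top"):
    """Polygons a fully flattened check would have to process."""
    own, children = design[cell]
    return own + sum(n * flat_polygons(design, child) for child, n in children)


def unique_polygons(design):
    """Polygons stored once per unique cell definition (idealized hierarchy)."""
    return sum(own for own, _ in design.values())


for cores in (4, 8):
    d = make_design(cores)
    print(f"{cores} cores: flat {flat_polygons(d):,} polygons, "
          f"unique-cell {unique_polygons(d):,} polygons")
```

Doubling the number of core placements roughly doubles the flat polygon count, while the unique-cell count does not change at all.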
For the producers of “The Biggest Loser,” this may all seem like a lot to digest! The point to remember is that scaling is just a means to an end. What are the real goals? For fitness, it is getting to the ideal target weight as fast as possible, not stretching out the time it takes to get there! For physical verification, the goals are two-fold. First and foremost is achieving the fastest possible runtimes. The second, less obvious one, should be getting there at the lowest cost, which generally means using the fewest CPUs and the least memory and disk resources.
It’s for this reason that Calibre is not just optimized for scaling. It would be relatively easy to modify the Calibre architecture so that it scaled further. This would be akin to binge eating before going on the Biggest Loser show. Doing so, however, would likely increase the total runtime and memory usage. Instead, the focus for Calibre is first on reducing total CPU time, through a combination of continued engine improvements and optimizations and new operations that simplify and speed up checking of new process requirements. Below is an example of performance improvements due to engine optimization in Calibre over the past year.
With this approach, the total CPU time required to run a job is significantly reduced. This means that when scaling to multiple CPUs, there is less computation that needs to be shared. In other words, less hardware is required to reach the ideal performance goals. Or, to go back to our Biggest Loser analogy, it means that Calibre is doing what it needs to do to stay fit and trim from the beginning, instead of having to spend weeks on the treadmill with people screaming for improvement!
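A simple model shows why lower total CPU time translates into less hardware. The sketch assumes each job splits into a serial portion and a perfectly parallel portion, and asks how many CPUs each of two hypothetical tools needs to hit a 50-minute target. The minute values are invented for illustration; the “lean” tool simply needs half the CPU time of the “heavy” one in both portions.

```python
def wall_time(serial_minutes, parallel_minutes, cpus):
    """Idealized wall-clock time: the serial part runs alone and the parallel
    part divides evenly across CPUs (no communication overhead is modeled)."""
    return serial_minutes + parallel_minutes / cpus


def cpus_needed(serial_minutes, parallel_minutes, target_minutes, max_cpus=4096):
    """Smallest CPU count that meets the target, or None if it is unreachable."""
    for cpus in range(1, max_cpus + 1):
        if wall_time(serial_minutes, parallel_minutes, cpus) <= target_minutes:
            return cpus
    return None


TARGET = 50  # minutes
# Hypothetical tools: "lean" needs half the CPU time of "heavy" in both parts.
heavy = cpus_needed(serial_minutes=40, parallel_minutes=1960, target_minutes=TARGET)
lean = cpus_needed(serial_minutes=20, parallel_minutes=980, target_minutes=TARGET)
print(f"heavy tool: {heavy} CPUs, lean tool: {lean} CPUs to finish in {TARGET} min")
```

Under these assumptions, halving the total CPU time cuts the CPU count needed for the same target from 196 to 33, far more than a factor of two, because the serial portion eats most of the heavy tool’s time budget.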
So, I’ve “volunteered” to provide the occasional highlight of my DAC experience this year for Mentor Graphics. I was a little concerned about this, as I was afraid this was going to be a rather lackluster event. Unfortunately, I have to say that so far my expectations have been dead on. But, thanks to a little serendipity, I did stumble upon something that at least sparked some thought and interest.
On Monday morning, I received a phone call from a colleague. He’d planned to attend the EDA Roadmap Workshop hosted by Juan-Antonio Carballo of IBM and Andrew Kahng of UCSD, but he’d been pulled away and asked if I would substitute. I was only free to attend the first hour, but I have to say that first hour alone was enough to spark some interesting discussion.
The concept alone is interesting. Can we, and should we, align the EDA vendors to work on the same technologies so they are better prepared with enablement software when the next generation of process technology rolls out? Let me first say, I think this is a discussion that warrants such meetings. I hope and expect that it will continue. That said, so far I’m not convinced we should, or ever will, get there.
Why do I say that? First, let’s admit it, EDA is shrinking. There are four (soon to be three??) big players. Each has its own area of expertise. Each pays attention to, and does its best to be involved with, the various industry technology roadmaps. But it’s only natural that as the roadmaps identify problem areas that need EDA solutions, each EDA provider would focus on an implementation that builds from, and integrates closely with, its existing areas of strength. For example, for a problem impacting designers, it is only natural that Synopsys would build from the P&R perspective, Cadence from the custom design space, and Mentor from the physical verification domain.
In my opinion, this is not a bad thing; in fact, it is a very good thing. Why? Because as each vendor works from its own area of knowledge, it intuitively solves the problems its users care about. Each may miss concerns from the spaces it is less involved in, but since another vendor will have come from that direction, that vendor will have found those holes. In the end, I believe, this approach leads all the vendors toward a better understanding of the requirements across the broad user spectrum. At that point, it just becomes a matter of who can implement fastest and deliver the best quality solution to market. On the other hand, if we drive EDA direction by committee, we are much more likely to end up with three or more vendors providing very similar solutions, all with the exact same holes and problems.
But, as I said, having a more centralized forum to discuss the issues, and to combine the needs and requirements of different customers into a common message for the EDA community, is a good thing. I think it will help get the ball rolling with all vendors. I just hope we do it in a way that still allows each vendor to diverge as it sees best.
Now for a little side discussion …
Most of the first hour was a summary of the various IC technology roadmaps, presented by Dr. Alan Allan of Intel, with a particular focus on ITRS but also some interesting commentary on where it diverges from other roadmaps, including TWG. While it was a bit like drinking from a fire hose, I found this discussion fascinating. One thought in particular kept coming back to me.
One of the first things Dr. Allan discussed was how the way the industry delivers on Moore’s Law has changed over time. Moore’s Law was initially targeted at improving IC performance, with a target of a 30% speed-up at each generation. Historically, this was achieved through a process node shrink. Shrinking the transistors sped them up, and because device performance was the primary limiter of overall IC performance, designs were inherently sped up as well.
But eventually, a simple shrink was no longer enough. Physical issues that once could be dismissed as noise, like performance loss due to interconnect parasitics or leakage current, started to creep in. Ultimately, in addition to device shrinks, other techniques, including new interconnect layers and new dielectric materials, were introduced to help provide the performance needed.
As Dr. Allan wrapped up his summary, one thing that popped out as a main difference between ITRS and TWG was TWG’s greater emphasis on System in Package (SiP) and techniques like Through Silicon Vias (TSV) to connect multiple chips together in one package. Here I got the impression that many in the room were unconvinced that this was as important as the focus on the next process node. Naturally, I find myself, once again, the contrarian.
Why do I say this? It all comes down to economics. What I think people forget is that, in the good old days, a pure process node shrink not only provided a performance bump but also represented an economic advantage. If you were at 0.25 micron, a move to 0.18 micron meant you could get more chips per wafer. As a result, the total cost per chip was reduced. But now, with the significant increase in the cost of moving to each new node, this no longer seems to be the case.
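To put rough numbers on the per-chip economics, the sketch below uses a common first-order approximation for gross dies per wafer (wafer area divided by die area, minus an edge-loss term) and divides the wafer cost by that count. The 200 mm wafer, the 100 mm² die and the wafer costs are hypothetical, and yield, scribe lines and edge exclusion are ignored.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order estimate of gross dies per wafer, with an edge-loss term.
    Ignores yield, scribe lines and edge exclusion."""
    r = wafer_diameter_mm / 2.0
    gross = math.pi * r * r / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(gross - edge_loss)


def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2):
    return wafer_cost / dies_per_wafer(wafer_diameter_mm, die_area_mm2)


# Illustrative numbers only: a 100 mm^2 die at 0.25 um, shrunk linearly to 0.18 um.
old_area = 100.0
new_area = old_area * (0.18 / 0.25) ** 2  # about 51.8 mm^2

old = cost_per_die(2000.0, 200.0, old_area)
same_wafer_cost = cost_per_die(2000.0, 200.0, new_area)
pricier_wafer = cost_per_die(4500.0, 200.0, new_area)  # hypothetical 2.25x wafer cost
print(f"old node: ${old:.2f}/die")
print(f"shrink, same wafer cost: ${same_wafer_cost:.2f}/die")
print(f"shrink, 2.25x wafer cost: ${pricier_wafer:.2f}/die")
```

A linear shrink from 0.25 to 0.18 micron roughly halves die area and about doubles the die count, so at equal wafer cost the cost per die falls by about half. If the new node’s wafer cost rises enough (2.25x in this made-up case), the shrunk die ends up costing more than the original.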
With that in mind, I’m predicting a shift in the way the industry works. For several years now we’ve been going down the SoC road: more and more design components get integrated into a single chip, all targeting the same, or at least complementary, process nodes. But does this really make sense? Why pay to have every transistor at 28nm if only some components are performance critical? What if you could create a package that efficiently connected a digital core at 40nm with a memory at 60nm and a high-performance graphics processor at 28nm? If we, as an industry, can give designers a means to connect timing-critical components on a chip processed at an advanced node with less timing-critical components processed at an older node, we may provide a more economical approach to delivering products to consumers.
Keep in mind, this is a big “if”. There is a lot of work to be done. Technologies like TSV, where a very large “via” is drilled through the substrate itself so that multiple chips can be stacked and connected through bump pads, still have many unknowns. How do you model their impact on performance? How much variability will they have in manufacturing? How do you manage heat flow and the other problems they introduce? How do we make them more consistent? Eventually the answers will come. There is still the possibility that, when they do, they will reveal that a proper implementation is as expensive as, or even more expensive than, keeping everything on a single chip.
Rest assured, this is not a topic of passing interest. Like the DFM buzz that started about 5 years ago, the SiP and TSV discussions are here to stay, along with some heated arguments both for and against. I predict it will be one of the hot topics for DAC next year. That is, of course, assuming that there is a DAC next year! If it can provide more discussions like the one I attended I think it will. But, if they are relying on vendor donations to carry the weight, somebody better start figuring out how to make actual customers care about DAC again!
That’s my 2 cents. TTFN,
In my last post, and in Michael White’s recent post, we discussed the reasons for, and challenges associated with, “waivers” for DRC. As detailed there, this is becoming a bigger and bigger challenge as designs become more intricate and design rules more complex. For the poor design team that has to integrate IP from multiple sources into a single working design, this can become a nightmare to manage. Not only is DRC debug time significantly drawn out, but there is also the very painful process of communicating across all parties to understand which results are real and which can safely be waived. What can you do? This is where the gratuitous plug comes in.
Fear not, dear reader, Calibre has a solution for you! The Calibre Automatic Waiver Flow enables accurate removal of errors in a design that match previously waived results from within the integrated IP blocks.
Let’s consider the requirements for an acceptable waiving solution. First and foremost, it must be accurate. This is where most historic approaches to automating the elimination of waived errors fall short. Having errors show up that should be waived is annoying and continues to impact debug time. But, worse yet, inadvertently waiving a real error can be disastrous.
Unfortunately, accurate waiving is trickier than it may sound. When an IP block is placed in context, its errors may be modified. Results may be promoted in the hierarchy. In addition to promotion, the shape of an error result may also change once the block is placed in context. This poses a difficult question. If the in-context error is not exactly the same shape as the error found in the stand-alone run, should it be waived?
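One conservative way to answer that question is to waive only exact matches and to route anything that overlaps a waiver but has a different shape to a human. The sketch below is a hypothetical illustration of that policy, not the Calibre Automatic Waiver Flow: waivers are stored as rectangles in the IP cell’s own coordinates, the IP placement is assumed to be a pure translation (no rotation, mirroring or hierarchy promotion), and errors are compared after transforming the waivers into top-level coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned error marker with x1 < x2 and y1 < y2 (database units)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def translated(self, dx, dy):
        return Rect(self.x1 + dx, self.y1 + dy, self.x2 + dx, self.y2 + dy)

    def overlaps(self, other):
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


def classify(in_context_errors, waived_in_cell, placement_offset):
    """Split one check's in-context errors into (waived, needs_review, real)."""
    dx, dy = placement_offset
    expected = {w.translated(dx, dy) for w in waived_in_cell}
    waived, needs_review, real = [], [], []
    for err in in_context_errors:
        if err in expected:
            waived.append(err)        # same shape, same place: safe to waive
        elif any(err.overlaps(w) for w in expected):
            needs_review.append(err)  # shape changed in context: a human decides
        else:
            real.append(err)          # no related waiver: treat as a real error
    return waived, needs_review, real


# Hypothetical data: one waiver recorded against the stand-alone IP block.
waivers = [Rect(0, 0, 10, 4)]
errors = [
    Rect(100, 200, 110, 204),  # identical to the waiver once placed at (100, 200)
    Rect(100, 200, 114, 204),  # overlaps the waiver but grew in context
    Rect(500, 500, 510, 504),  # unrelated to any waiver
]
waived, needs_review, real = classify(errors, waivers, placement_offset=(100, 200))
print(len(waived), len(needs_review), len(real))
```

The design choice here is to err on the side of reporting: a shape that changed in context is never silently waived.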
Another important criterion for a successful waiving flow is that it cannot depend on modifications to the golden rule file. A user who modifies the rule file takes on a great deal of risk: if they code it incorrectly and miss a real error, it will fall on their shoulders. With the Calibre solution, the waiving is done automatically, under the hood, using the golden sign-off rule file. To ensure accuracy, this flow is being validated by the various library validation teams.
Another historic issue that must be addressed is how the waivers are passed. On the one hand, reliance on proprietary formats will naturally limit the industry’s ability to adopt the solution. On the other hand, industry standards like GDSII have been tried in the past. One problem is that they typically rely on a separate layer for every check that can be waived, and with the number of checks in today’s rule files, this is nearly impossible. Another issue is that anyone can edit GDSII. What is to prevent a user from creating their own waiver geometry that covers the entire design, thus eliminating any errors from being reported for a specific check?
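One generic way to make waiver records tamper-evident, independent of any layout format, is to attach a cryptographic tag to each record. The Python sketch below uses an HMAC-SHA256 over a canonical JSON encoding; the key, the check name and the record fields are all hypothetical, and this is not a description of how Calibre stores waivers. Keying each record by check name, rather than by a dedicated layer per check, also sidesteps the one-layer-per-check problem. Note that an HMAC requires the verifier to hold the secret key; a production flow might prefer public-key signatures so that anyone can verify but only the approver can sign.

```python
import hashlib
import hmac
import json

# Hypothetical signing key held by whoever approves waivers (e.g. the foundry).
# Key management is out of scope for this sketch.
SECRET_KEY = b"example-key-not-for-production"


def sign_waiver(record: dict, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_waiver(record: dict, tag: str, key: bytes = SECRET_KEY) -> bool:
    """True only if the record is exactly what was signed."""
    return hmac.compare_digest(sign_waiver(record, key), tag)


waiver = {"check": "M1.S.1", "cell": "ip_block_a", "rect": [0, 0, 10, 4]}
tag = sign_waiver(waiver)

# A user enlarges the waiver geometry to cover the whole design.
tampered = dict(waiver, rect=[-1_000_000, -1_000_000, 1_000_000, 1_000_000])
print(verify_waiver(waiver, tag))    # True
print(verify_waiver(tampered, tag))  # False: the edited record no longer matches
```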
These issues and several other surprising challenges are all accounted for in the Calibre Automated Waiver Flow. To learn more, come see me at DAC at the Mentor Graphics suites. I’ll be demonstrating this functionality at 1pm daily.
Many, many years ago, when I started in this business, I encountered something that I thought was surprising. In my very first DRC benchmark, I was struggling with a particular rule. The customer had given me a 0.25 micron layout, which they had successfully taped out. My job was to write a rule file in the new tool to measure the performance improvement. My code matched the design rule manual and passed all the regression tests. But it seemed that no matter how I tweaked the coding of one particular rule, I kept flagging an error.
Finally, in frustration, I asked the layout designer to help explain what I was doing wrong. That’s when he hit me with “Oh, don’t worry about those errors — we waived them with the foundry”! I was flabbergasted. You mean you are not really DRC clean, but you taped-out anyway?? And the chip actually worked?? How could this be? I later came to find out that, although this was rare, it did happen from time to time.
Now fast forward a dozen years, and quite a few more gray hairs, to the present. In the past 3-5 years, it seems I have not encountered a single full-chip layout that does not have “waivers”. Worse, some layouts can have hundreds, maybe even thousands, of these waived results! A “waiver” is a DRC violation that the design team negotiates with the foundry to accept anyway. So what gives? Does “DRC clean” actually exist anymore? What is really required to tape out successfully?
The first question is probably the easiest. Why are there now so many waived results? There are a couple of inter-related reasons. First, there are many more rules than ever before. As a result, in a congested layout, it is sometimes difficult to correct one violation without creating another, short of a significant redesign, and that means a serious delay in time to market.
Secondly, and relatedly, the rules are now much, much more complex, with multiple layer and geometric dependencies. Just understanding the requirements is difficult. Debugging can be a very time-consuming process. Again, it is difficult to make a correction without introducing a new violation. What I find most interesting is that part of the reason for the complexity of the rules in the first place is an attempt to weed out corner cases that might otherwise result in false violations!
Finally, a big reason for waivers is the advent of “recommended rules”. These are DRC rules that are not required but, as the name implies, recommended. They are often standard DRC rules with a larger margin in the measurement. They were introduced at 130 nm and have grown with each release. The bane of many layout designers’ existence, recommended rules have caused lots of issues. Why? Because they often produce lots of violations in silicon-proven cells that you cannot edit, and because there is no enforcement of the rules.
Customers tend to handle recommended rules in one of two ways: 1) ignore them altogether, or 2) use them as their DRC constraints instead of the actual DRC constraints. If you go with approach #2, then you are going to have tons and tons of violations. Again, many of these will be associated with IP blocks that you know have worked in other designs in the past and which you cannot just edit. So, what do you do? You waive them!
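The difference between the two approaches is easy to see with a single spacing check. In this sketch the required and recommended limits are placeholder numbers, not any foundry’s values; a measurement between the two limits is legal but gets flagged once recommended rules are treated as hard constraints.

```python
# Hypothetical spacing rule: the numbers are placeholders, not any foundry's values.
REQUIRED_MIN_SPACE_NM = 50
RECOMMENDED_MIN_SPACE_NM = 60  # the same check with extra margin


def check_spacing(measured_nm):
    """Classify one measured spacing against required and recommended limits."""
    if measured_nm < REQUIRED_MIN_SPACE_NM:
        return "violation"         # must be fixed (or formally waived)
    if measured_nm < RECOMMENDED_MIN_SPACE_NM:
        return "recommended-only"  # legal, but flagged under approach #2
    return "clean"


spacings = [48, 52, 55, 60, 75]
results = [check_spacing(s) for s in spacings]
print(results)

# Approach #2: anything short of the recommended value counts as a violation.
strict = sum(r != "clean" for r in results)
print("required-only violations:", results.count("violation"), "| strict:", strict)
```

In this toy set, treating the recommended value as the constraint triples the violation count, from one to three.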
So, now we know why there are so many waived DRC violations in nanometer processes. It seems “DRC clean” really no longer exists; instead, what we’re attempting is “DRC clean enough”. But is it really clean enough? Just how does the foundry determine if a particular result is uber dangerous or just marginally material? If there is a way to determine this, why can’t the designer just make that determination independently of the foundry? All good questions and fodder for a follow-on blog post. I’m going to tee up my buddy Abercrombie to lean into this pitch! I’ll give you a hint: come see the Mentor Graphics DAC suite session or DAC workshop on eqDRC!
The next question is, assuming that the designer and foundry have negotiated and agreed that a certain violation can be waived, how do you safely capture and pass that information? You certainly don’t want the poor layout designer doing full chip integration to have to look at every violation and decide on his or her own whether each one represents the agreed upon waiver. But, you also don’t want to waive a result in IP that was created with version 0.1 of a rule file in version 2.0 of the rule file if the check requirements have changed. This is also a topic for a future post. And yes, you can also learn more about this at DAC by coming to the Mentor Graphics suite presentation on debug and waivers.
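One way to guard against the rule-version problem is to tie each waiver to a fingerprint of the exact check it was granted against. The sketch below is a hypothetical illustration: the rule text is shown as plain strings rather than real rule-deck syntax, the check name is made up, whitespace-only edits are ignored, and any other change to the check invalidates the waiver.

```python
import hashlib


def rule_fingerprint(check_text: str) -> str:
    """SHA-256 of a check's text with whitespace normalized."""
    normalized = " ".join(check_text.split())  # ignore whitespace-only edits
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def make_waiver(check_name: str, rules: dict) -> dict:
    """Record a waiver against the exact text of the check at approval time."""
    return {"check": check_name, "rule_hash": rule_fingerprint(rules[check_name])}


def waiver_applies(waiver: dict, current_rules: dict) -> bool:
    """Honor a waiver only if its check still exists with the same text."""
    text = current_rules.get(waiver["check"])
    return text is not None and rule_fingerprint(text) == waiver["rule_hash"]


# Hypothetical rule decks, keyed by check name (plain text, not real syntax).
rules_v0_1 = {"M1.S.1": "metal1 spacing >= 0.050"}
rules_v2_0_reformatted = {"M1.S.1": "metal1   spacing >= 0.050"}
rules_v2_0_changed = {"M1.S.1": "metal1 spacing >= 0.060"}

waiver = make_waiver("M1.S.1", rules_v0_1)
print(waiver_applies(waiver, rules_v2_0_reformatted))  # True: same check
print(waiver_applies(waiver, rules_v2_0_changed))      # False: requirement changed
```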
Yes, I’m using my blog post for a gratuitous pitch to come to our sessions!!! I hope it’s informative nevertheless. Now I’ll “wave” good-bye. Hopefully you are waiving back with more than one finger!
About John Ferguson's Blog
Will provide insight into the challenges and requirements of physical verification across multiple process nodes. We'll explore new requirements, solutions and challenges.