David Abercrombie’s Blog

Are your design rules letting you down? Did another yield problem slip through the cracks? Having trouble memorizing and debugging 2000 different design rules? Don’t know where DRC stops and DFM begins or maybe DFM never ends? Then this is the blog for you! There has to be a better way. Let’s figure it out together. Join me in a revolution to do things differently.

20 November, 2009

In case you missed the webinar by Jim Culp on November 3rd, I wanted to give you an opportunity to catch up. Jim is a Senior Engineer on IBM's Advanced Physical Design and Technology Integration team. He is leading a team in the development of Parametric DFM and the mitigation of Circuit Limited Yield (CLY). During the webinar he discussed how CLY is becoming the leading contributor to yield loss at advanced technology nodes.

One of his areas of focus has been static chip leakage. He showed how static leakage has become a significant, if not dominant, contributor to total leakage. Estimating chip leakage during design has historically been a "back of the envelope" calculation using some form of device count, maybe L&W counts, and a bit of Excel magic. Jim asserts that this is no longer sufficient to protect against a profit-impacting "surprise" when the chip hits production. He was determined to find a better way.
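For illustration, the kind of "back of the envelope" estimate Jim argues is no longer sufficient might look like the sketch below: device counts multiplied by an assumed average off-current per device type. All numbers and names here are hypothetical, not IBM's actual methodology.

```python
# Hypothetical back-of-envelope static leakage estimate: device counts
# times a single average off-current per device type. This kind of rough
# tally ignores the layout context of each individual device.

def naive_chip_leakage(device_counts, avg_ioff_na):
    """device_counts: {type: count}; avg_ioff_na: {type: average Ioff in nA}.
    Returns estimated total static leakage in mA."""
    total_na = sum(device_counts[t] * avg_ioff_na[t] for t in device_counts)
    return total_na / 1e6  # nA -> mA

# Hypothetical design: 40M NMOS and 35M PMOS devices
estimate_ma = naive_chip_leakage(
    {"nmos": 40e6, "pmos": 35e6},
    {"nmos": 2.0, "pmos": 1.2},  # assumed average Ioff per device, in nA
)
```

The point of Jim's work is precisely that a single average per device type hides the per-transistor context effects that dominate at advanced nodes.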

Jim had attended a presentation I gave a long time ago on equation-based DRC and how we were trying to add functionality to the traditional DRC language to enable a new approach to checking. He quickly grasped that this capability might give him the tools he needed to attack this leakage prediction problem. He developed empirical equations to model the various components of device leakage, calibrated them using the SPICE decks from the fab and coded them in Calibre nmDRC.

Not only did he get better prediction accuracy, but he was also able to get much improved visualization, drill down into the individual contributions, and prioritize the gates with the most problems. They use the tool in a complete flow to identify problem designs early, validate results as the chips move through production, and improve the model over time by localizing discrepancies between the model and actual silicon and refining the equations.

Unfortunately, we were not able to record the webinar, but Jim has been gracious enough to allow us to make the slides available for you to see. Here is a link to them: slides. I hope that you find them as intriguing and inspiring as I did.




20 August, 2009

I got some questions from my last installment of this series asking for some pictures of defects that caused yield issues in production that could have been avoided during design. It struck me that most designers probably never get a chance to see the manufacturing problems their designs encounter. Since my background is in the fab, I wrongly assumed everyone had lived through the same pain I have. It's a great question, so I decided to focus this installment on real-life examples.

There are actually three basic types of DFM issues that a design can encounter (Random, Systematic, Parametric). Random defects are defects that occur independent of the design layout, but the probability of the design failing because of them is dependent on the layout. Here are some examples of the defects I am talking about.


The image on the left is a composite wafer map showing the location of all the particle defects that occurred on this wafer during processing. By composite I mean the sum of the particles that occurred at various points during the manufacturing process and are located within various layers of the design. This map gives you a feel for the spatial distribution and occurrence rate of particles. For the most part you can see that they are randomly distributed across the wafer, independent of the repeating pattern of the design die. The exceptions are the defects that occur in the circular (spirograph-like) patterns. These are scratches generated during the CMP process as the grinding pad rubbed some hard particle across the wafer. The pictures on the right are optical and SEM images of some of the defects.

Depending on where and when the particles occur, they can have three possible effects. They can cause an electrical short, an electrical open, or they can have no effect at all if they land in open areas or are too small to create a complete short or open. This is how the design can affect the yield. It is clear from the wafer map that if every particle caused a short or open, then none of the die on this wafer would yield. In reality, a design layout is relatively empty on a given layer. If you think about it as layout density, then most parts of the layout on a given layer are less than 50% dense, meaning half of the space is unused. Therefore only a percentage of the defects land on active circuitry, and of those, only a percentage are big enough to cause a complete short or open. By utilizing the open space of the layout more effectively, a designer can limit the susceptibility of the design to these particles. Critical Area Analysis (CAA) is the DFM tool used to assess the design sensitivity to random defects. By measuring and reducing the amount of critical area, design teams can improve their yield.
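To make the critical area idea concrete, random-defect-limited yield is commonly modeled with a Poisson expression, Y = exp(-D0 × A_crit). The sketch below uses that textbook model with made-up numbers; it is not the actual CAA implementation in Calibre.

```python
import math

def random_defect_yield(critical_area_cm2, defect_density_per_cm2):
    """Poisson random-defect yield model: Y = exp(-D0 * A_crit).
    A_crit is the (defect-size-weighted) critical area, D0 the density
    of defects large enough to cause a short or open."""
    return math.exp(-defect_density_per_cm2 * critical_area_cm2)

# Hypothetical numbers: spreading wires into unused space shrinks the
# critical area from 0.40 to 0.30 cm^2 at a defect density of 0.5/cm^2.
y_before = random_defect_yield(0.40, 0.5)
y_after = random_defect_yield(0.30, 0.5)
```

Even this toy version shows the leverage: the same particles land on the wafer either way, but less critical area means fewer of them kill a die.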
An example of a design change affecting this sensitivity is shown below.


This comes from a paper I did with LSI at DesignCon this year. We used the Calibre YieldEnhancer tool to find opportunities for via doubling that their router missed on four different designs. We then ran Calibre YieldAnalyzer to assess the Critical Area impact of doubling the extra vias and the yield impact it would have in production. You can see that the design yields were increased by up to 2% by making these few incremental changes on top of what the router had already done on a process that was already running at mature yields. On a high volume product 2% could mean a lot of extra profit. Imagine the impact of a broad range of changes throughout the design flow.

The second type of DFM issue that a design can encounter is systematic defects. These are defects that only occur when a particular layout construct interacts with a particular process variation. Again, the problem is statistical, in that the process only exhibits the particular variation a small percentage of the time, and only a narrow range of layout constructs is susceptible to the variation. Several examples are shown below.


In this first example, an electrical short and an electrical open are shown that were caused by variation in the lithography process interacting with these particular layout constructs. You can see that the bulk of the patterns are produced without issue and the problem was very localized. These locations print perfectly fine at nominal litho dose and focus, but at one edge of the process variation these spots image improperly. These particular locations have a non-zero probability of having this occur, but the probability is not 100%. Tools like Calibre Litho Friendly Design (LFD) are used to identify these types of litho sensitivities.


In this second example, an electrical short is shown that was caused by the interaction of the previous layers with the CMP process. You can see in the picture on the left that all the lower levels of metal were aligned with the same spacing and width. This caused a slight thickness variation on each layer that added up as each layer was polished. Then in the top layer the layout was different, and the depression had accumulated to the point that the CMP process did not clear all the copper in the depressed area, leaving a slight amount of copper bridging the two wires. Again, these particular locations have a non-zero probability of having this occur, but the probability is not 100%. Tools like Calibre CMPAnalyzer (CMPA) are used to identify these types of thickness sensitivities, and tools like Calibre YieldEnhancer are used to do "smart" fill to correct them.


In this example, an electrical open is shown which is caused by the migration of small voids (bubbles, essentially) in the copper metal that move to a point of stress relief and accumulate to the point of creating a void large enough to cause an open. This phenomenon occurs when large areas of copper are in proximity to a single via. The via tends to act as a point of stress relief. Again, the probability of it occurring is non-zero but not 100%. As the graph on the right shows, the probability varies dramatically with the change in the width of the wire in this particular test structure.


In this example, a non-problem becomes a problem in a very limited combination of multiple layout dimensions. The dielectric deposition process that covers poly and active prior to cutting the local interconnect (LI) holes produces "keyholes" at certain gate spacings, as shown in the picture on the right. Normally these are no problem and do not affect anything about the circuit. However, when two LI cuts with small spacings between them are made between these gates, as shown in the layout on the left, an unexpected problem occurs. The keyhole acts as a tunnel between the two LI cuts, and when the titanium liner is deposited in the cut, small amounts of Ti diffuse into the tunnel. If the LI cuts are close enough together, then the tunnel is short enough for the diffused Ti from each side to touch, causing a short, as shown in the picture in the middle. Again, the probability of it occurring is non-zero but not 100%, and it is highly dependent on both the gate and LI spacing simultaneously.


In this final example, electrical shorts have an increased probability of occurring when minimum-width metal wires at minimum spacing run long distances beside each other. The cause is surface tension from evaporating water during the develop rinse and dry step. It is very sensitive to feature dimensions and has a non-zero but not 100% probability of occurring.

For the last three examples there is no dedicated process-simulator-based DFM solution in the EDA industry for identifying these types of issues. In these cases people are using Calibre YieldAnalyzer to create statistically based recommended rule analysis reports for these issues as they find them. We call this type of analysis Critical Feature Analysis (CFA). The idea is to take multi-dimensional measurements of the layout and relate them in a mathematical way to generate some level of empirical model of the probability or risk of these types of occurrences, and then to roll up the statistical probability at the block or chip level. Armed with this information, the designer can prioritize the various features by sensitivity and drive down the overall statistical probability of failure. This in turn improves the yield. An example of this was demonstrated by Samsung in the Common Platform joint paper at SPIE this year, shown below.
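As a rough illustration of the roll-up step, if each measured feature is assigned an independent failure probability, the chip-level limited yield is the product of the survival probabilities. The sketch below uses hypothetical probabilities, not a calibrated CFA model.

```python
def rolled_up_yield(feature_fail_probs):
    """Chip-level yield limited by these features, assuming each feature
    fails independently: Y = prod(1 - p_i). For small p_i this is
    approximately exp(-sum of p_i)."""
    y = 1.0
    for p in feature_fail_probs:
        y *= (1.0 - p)
    return y

# Hypothetical design: 10,000 marginal features at a 1e-6 failure
# probability each, plus 50 riskier constructs at 1e-4 each.
probs = [1e-6] * 10_000 + [1e-4] * 50
y = rolled_up_yield(probs)
```

Note how the 50 risky constructs contribute as much lost yield as the 10,000 marginal ones; this is exactly the prioritization-by-sensitivity argument in the paragraph above.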


The table on the left shows the difference in the MCD score between the DFM-enhanced and the nominal design. MCD is the Common Platform implementation of the Calibre YieldAnalyzer CFA solution. They ran the two versions of the layout side by side on a test chip. The table on the right shows that the DFM version yielded ~8% better than the non-optimized one. The MCD score doesn't predict the exact amount, but there is a strong statistical correlation between the improvement of these DFM quality scores from CFA and the yield in production.

The last type of DFM issue that a design can encounter is parametric variability. This might not accurately be called "yield loss," as it depends on your product specifications. However, some layout configurations can experience much more variation than others in a way that doesn't cause a short or open but does cause variation in some product performance measure. Again I will use a litho example.


In this example the L-shaped piece of poly rounds off when printed on the wafer. Because the bend is so close to the active area edge, it affects the gate length at the edge of this transistor. The difference will vary as the alignment and exposure vary during processing. By moving the bend farther away, or by reducing how far the bend runs parallel with the active edge, the designer can reduce the variation he or she will see in production. Recommended rules in general are layout guidelines that relate to statistical yield loss and parametric variability. In other words, they are rules that you don't always have to follow, but the more of them you follow, the greater the reduction in statistical variability you will see in the product. The following are good examples.


The left example shows that changing the contact-to-gate spacing from the minimum design rule to the increased recommended rule reduces the Ioff leakage in the transistor by 35%. A 35% change in one transistor may not be critical, but if a statistically significant number of transistors have room to make this change, then it will have a statistically significant impact on the chip leakage. The second example shows a 10% change in the resistivity of poly as the width varies from the minimum DRC rule to the RR. The third example indicates a significant change in the IDsat of a transistor as the gate spacing is changed from minimum DRC to RR. The bottom line is summed up well in the following data from ARM.


This experiment shows five different implementations of the same cell. The graph shows how the performance of the cell varied with the different implementations, and the table shows the change in relative yield between the approaches. All of them passed DRC and LVS! Design does make a difference, and using the DFM tools to guide your optimization will make a difference.

I hope these examples help you better understand the importance of investing in DFM tools, practices and methodologies.


6 August, 2009

My Monday started off well delivering the eqDRC presentation with Jim Culp. But I didn’t have long to enjoy it as I had to quickly head up to the mezzanine level to get ready for my lunch and learn event with ARM and Chartered. We have had a long relationship with both companies and we finally arranged to do a joint presentation on how we have collaborated to make more DFM compliant IP.

It started with a rather unexpectedly good lunch. Usually I don't like the buffet-style food you get at these things, but this was both good and healthy. They had grilled vegetables, two different salads, chicken, halibut, and beef. I have been working on my cholesterol (Design for Maturation), so I stuck to the halibut. However, after filling the belly we got down to business, and each of us gave our own view of the partnership.

I, of course, focused on all the DFM tools we have developed over the last 4 years: YieldAnalyzer, CMPAnalyzer, YieldEnhancer, and LFD. The YieldAnalyzer tool supports two types of analysis. The first is Critical Area Analysis, or CAA, which analyzes the design sensitivity to random defects in the manufacturing process. The second is Critical Feature Analysis, or CFA, which analyzes the design sensitivity to the host of issues covered by recommended rules. CMPAnalyzer models the thickness variation, and the YieldEnhancer tool provides several modes of fill to help improve planarity. Finally, LFD, or Litho Friendly Design, models the 2D variation due to litho/etch effects. However, my main point really was that all these tools are not much use without the partnership that provides the configuration data and use models that put them into useful practice. That is what was nice about having Rob Aitken from ARM and KK Lin from Chartered there to really show how they had enabled and utilized the tools to improve the DFM quality of incoming IP.

Rob talked about how ARM had learned, over the history of cooperation since 90nm, to get better and better at DFM. He also pointed out that it is very important for them to interact very early in the process development life cycle with Chartered to assure their IP is ready for customers when they need it. Their approach to making IP DFM compliant is to make it part of the architecture and design process as opposed to an add-on activity. I think that is a really smart way to go. I learned a lot from the following slide, which shows the various DFM loops that ARM utilizes in improving their IP.


The first is the tightest loop where the designer is using the tools interactively to optimize the IP as it is designed. Once that is complete they do additional analysis on the whole library to look for outlier cells that need to be cleaned up. They then work with Chartered to have them analyze the cells for any fine tuning. Finally, everything is evaluated in silicon test chips to see if anything was missed. Many people only do the last long loop which is very expensive and does not lead to good DFM quality IP.

Rob had some great results data that showed both performance and yield gains due to these types of activities. The following slide shows the performance and yield metrics from five different implementation approaches for the same set of IP. In the chart, the higher the value, the better the performance. The yield metric combines yield with cell utilization information into a single number for which higher is better.


As you can see approach 1 had high performance and high yield. Approach 4 had high performance but much lower yield. Approach 3 had poor performance and mediocre yield. This shows that different approaches to implementing IP and DFM can have significant impact on the final result.

KK talked about how they had continued to refine and expand their DFM offerings over the last 4 years and 4 technology nodes. This included not only configuration data for our DFM tools but also use flows and acceptance criteria development. The slide below shows their acceptance criteria flow for IP and the web portal that allows customers to see which IP has which level of qualification, including DFM.


I think that this type of process and information availability is really unique to Chartered. I know that I would want to know the quality status of IP I was going to use in a design. It is also great when someone else goes to the trouble to do the evaluation for you. KK also showed a great set of data about the DFM scores of the last three stdcell libraries that they qualified with ARM.


Each chart is a histogram of the DFM scores for a whole library of cells. The score ranges from 0 to 1, and the higher the better. The top chart shows the scores for redundancy and the bottom shows the scores for the process margin checks. You can see that the cells from all three libraries have distributions highly biased to the right with a small standard deviation. The yellow distributions are from the most recent library, which went through the most recent DFM processes, and you can see that it is better than the previous generations. I just love to see people drive improvement with data instead of just talking about it. It always amazes me how creating metrics like this drives long-term improvement.

Finally, KK showed some test chip results from a paper that Samsung (their alliance partner) presented at SPIE in February.


They ran a test chip with and without DFM enhancements alternating on the same wafer. The non-DFM version yielded 79% and the DFM version yielded 87%. That is 8 points in yield! The leakage and speed distributions were also better on the DFM version of the design. Finally some data to back up what everyone knows but wants to deny about DFM. It does make a difference!

Well that just goes to show that it is possible to have a good meal and learn some good information all at the same time:)


4 August, 2009

I felt privileged this year to get a paper accepted into the technical track at DAC. It seems more and more difficult to get something through. I think they said they only had a 20% acceptance rate this year. The paper was part of track 5 on Tuesday at DAC. I was glad to get to present this one because it was fun doing the experimentation for it and I think it helps answer one of the nagging questions I always get about eqDRC. I worked with Fedor Pikus, the Lead Software Architect, and Cosmin Cazan, a Portland State University Intern, on this project.

If you don't already know, eqDRC is just a set of command extensions we have added to the base Calibre DRC product that enables new ways to define and implement design rule and recommended rule checks. Basically, it provides a simple mathematical modeling engine based on multi-dimensional geometric layout measurements. That is a fancy way of saying that DRC is no longer limited to overly simplistic one-dimensional measurements to determine if a layout is manufacturable. It allows defining equations that relate multiple dimensional measurements together. With eqDRC you can better approximate the physics of the manufacturing issue, or at least build a much more accurate empirical model of the phenomenon. Here is a link to some papers and on-line seminars on the subject:

One of the questions that everyone always brings up is "How do I determine the equation?" Well, there are many answers to that, and it also depends on who you are. If you are the fab engineer who normally defines the design rules, then either you have silicon wafer data where you have characterized the phenomenon, or you are pulling the rule out of your @#$ based on your experience. Either way, I would argue that it is just as easy to define a mathematical function for the phenomenon as it is to pick some points along the curve to make a bunch of "bucketed" single-dimensional rules. If you are in a fabless design house and you are defining design methodology checks, then you know the phenomenon you are trying to check. Look to see if a mathematical relationship would capture the rule intent better than a bunch of single-dimensional checks.
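To illustrate the difference, the sketch below contrasts a fixed one-dimensional spacing threshold with a continuous two-dimensional equation over spacing and parallel run length. The model form and threshold are invented for illustration; this is not real Calibre eqDRC syntax or foundry data.

```python
import math

def risk_score(spacing_nm, run_length_nm):
    """Hypothetical 2D risk model: risk grows as spacing shrinks and as
    the parallel run length grows. Not a real foundry model."""
    return (run_length_nm / 1000.0) * math.exp(-spacing_nm / 50.0)

def equation_based_check(spacing_nm, run_length_nm, limit=0.5):
    """Flag a layout pair when the combined risk exceeds the limit,
    instead of applying a single fixed spacing threshold."""
    return risk_score(spacing_nm, run_length_nm) > limit

# A short parallel run at this spacing passes the equation-based check...
short_run = equation_based_check(60, 500)    # False
# ...while a much longer run at the exact same spacing is flagged.
long_run = equation_based_check(60, 5000)    # True
```

A bucketed rule set would have to approximate this smooth tradeoff curve with a staircase of spacing/run-length pairs; the equation captures the intent directly.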

The challenge is when you are a fab guy without silicon data, or a fabless guy trying to build a better check for a fab issue. A good example of this is litho corner rounding. Everyone knows that these days layout is not WYSIWYG. The picture below shows how the as-manufactured poly shape diverges from the drawn shape, causing potential changes to the effective gate channel length near the active edge.


Today most of us check this kind of phenomenon with a simple spacing check between the bent poly and the active. You can easily see that this does not accurately model such a complex situation. This is the reason we have built process simulation tools like Litho Friendly Design (LFD) to accurately simulate the image contours. The advantage of solutions like LFD is that the foundries provide the configuration data kits for a fabless company to use to do its own simulations. The disadvantage is that the simulation is much more compute intensive than standard DRC, deterring its use in an iterative loop during layout. In this paper, we proposed a flow that combines the advantages of eqDRC and LFD. We started by defining a two-dimensional empirical model for this phenomenon, as shown in the next picture.


We found that combining the width (distance from the L-poly to active) and the run length (length of the poly bend) gave good results versus simulation. Intuitively, the rounding effect changes exponentially with width and linearly with run length. We then drew a bunch of simple GDSII test structures in which we varied the width and run length and ran the LFD simulator on them to see how the actual contours varied from drawn at the gate edge. Below is a picture of the test structures and the data we extracted.


This data is simple and fast to generate with the simulator because the GDSII is so small. We curve-fit this data in Microsoft Excel using the model form shown earlier and calibrated an equation we could code in Calibre nmDRC using the eqDRC functionality. For each gate, we summed the corner rounding effect from each side of the gate to get the total gate length variation. We then ran both the eqDRC and LFD solutions on a real design, and below is a comparison of the results from each.
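The same kind of fit can be sketched outside of Excel. Assuming a model of the form dL = a·exp(-b·w)·run_length (an assumed form consistent with the exponential-in-width, linear-in-run-length observation above), a log-linearized least-squares fit recovers the coefficients from simulated data. The data and coefficients below are synthetic stand-ins for the LFD results.

```python
import math

def fit_corner_rounding(samples):
    """Fit dL = a * exp(-b * w) * rl by least squares on the
    linearized form ln(dL / rl) = ln(a) - b * w.
    samples: list of (w, rl, dL) tuples. Returns (a, b)."""
    xs = [w for w, rl, dL in samples]
    ys = [math.log(dL / rl) for w, rl, dL in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope

# Synthetic "simulation" data generated from known coefficients a=12, b=0.02
data = [(w, rl, 12.0 * math.exp(-0.02 * w) * rl)
        for w in (40, 60, 80, 100) for rl in (100, 200)]
a, b = fit_corner_rounding(data)

def gate_length_variation(left, right):
    """Total dL for one gate: the rounding effect summed from each side,
    as described above. left/right are (w, rl) tuples."""
    return sum(a * math.exp(-b * w) * rl for w, rl in (left, right))
```

With real LFD output in place of the synthetic data, the fitted a and b are what would be coded into the eqDRC deck.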


You can see that there is very good correlation between the empirical equation method and the simulation method. The advantage is in run time. On a big design like this, the simulation can run for days while the eqDRC deck runs in minutes. The results can be graded by how much gate length variation they exhibit, as shown in the following picture.


The full flow for utilizing the best of both tools is shown below.


You use the LFD simulator to calibrate an eqDRC deck. You then use the eqDRC deck in the iterative layout loop when you are trying to optimize your layout. You then use the eqDRC deck to define the most sensitive locations in your layout after you have finished optimization. Finally, you run LFD on these limited sites to get a high resolution accurate simulation to make sure nothing is still out of spec. These last two steps help improve the run time of the simulation on the whole layout by limiting where it needs to run.

Overall, I really think this shows a practical use of tools you can get from us and data you can get from your foundry to make a very useful flow for analyzing and optimizing a very complex manufacturing issue in your layouts. The only challenge at DAC was that I had to present the whole thing in 15 minutes! Luckily I talk fast. I would love to hear your feedback on this approach or ideas you might have to apply this concept to other issues.


31 July, 2009

Well, day two of DAC started a little earlier than the first day. I had to attend the speakers' breakfast for the paper I was going to give later that day. After breakfast I had my 9am suite presentation on eqDRC again, and I also had a special guest again. This time it was Robert Boone from Freescale in Austin, TX. He works on the DFM team, and he also agreed to come tell everyone what he and Freescale had been doing with eqDRC.

What was fun for me is that Robert's talk was much different from the one Jim Culp from IBM had given the day before. Jim's was all about power analysis, and Robert's was all about applications to recommended-rule-based DFM. Here is a mug shot of Robert giving his presentation. Sorry about the quality, Robert:)


Robert first showed an example of how Freescale had started using eqDRC for regular DRC applications, but he quickly moved to his two primary applications which were DFM scoring and DFM improvability. Freescale has been working with ST Microelectronics in a joint venture for Automotive Design in which they developed these flows. The slide below shows an overview of what they do for scoring recommended rules.


In their design rule manual they show not only the design rule limit but also three levels of DFM limits (L1, L2, L3). They know that the impact of recommended rule violations varies dramatically by dimension, as shown in the colored curve in the bottom left. They use eqDRC capabilities to grade each violation on a continuous scale. As seen in the next slide, the severity levels help give the designers target (bin) thresholds for improvement.


Each bin is weighted on a non-linear scale, as shown in the table on the right. Notice that the weights vary both by dimension and from highly critical rules to less critical rules. This matrix helps make tradeoffs between violations and rules. Robert stated that having the ability to create these scores has really helped them understand, measure, and track their designs and design methodology. Over time they have been able to track the improvement in their scores as they implement new tools, procedures, and practices.
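A minimal sketch of this kind of binned, weighted scoring is shown below. The thresholds, weights, and rule values are hypothetical, not Freescale's actual matrix.

```python
def severity_weight(value, drc_min, levels, weights):
    """Grade one measured dimension against binned DFM limits.
    levels: ascending DFM thresholds (L1, L2, L3) above the DRC minimum;
    weights: penalty per bin, worst bin (nearest the DRC minimum) first.
    All thresholds and weights here are hypothetical."""
    assert value >= drc_min, "hard DRC violation, not a DFM grade"
    for threshold, weight in zip(levels, weights):
        if value < threshold:
            return weight
    return 0.0  # meets the full recommended rule

def design_score(measurements, drc_min, levels, weights):
    """Sum of the weighted violations; lower is better."""
    return sum(severity_weight(v, drc_min, levels, weights)
               for v in measurements)

# Hypothetical contact-to-gate spacings in nm: DRC minimum 50,
# DFM levels at 55/60/65, with non-linear bin weights.
score = design_score([50, 52, 58, 64, 70], 50,
                     levels=(55, 60, 65), weights=(10.0, 4.0, 1.0))
```

The non-linear weights are the important part: a violation sitting right at the DRC minimum costs far more than one that almost meets the recommended value, which matches the colored impact curve on the slide.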

One of those new practices was "improvability." This is a metric they have created with the Calibre infrastructure (including eqDRC) to understand how much low-hanging fruit is available in a design that could easily be fixed by the designer. It is kind of a "stop your whining and fix the simple stuff" metric. The next slide shows what they mean.


They use Calibre to find places where simple local improvements can be made without impacting other rules or design area. If the improvement is sufficient to move a violation along the score curve from one bin to another, then they consider it important enough to fix. To deal with the tradeoffs between various recommended and design rules, they use the following system.


The previous table is expanded to compare the improvement against any detriment it may cause in another rule. This is where eqDRC becomes so helpful. Determining if the fix is the right thing to do requires a mathematical analysis of the options. This also helps alleviate the potential oscillation between fixes and new violations. The example below really helps show how they apply all this.


In this rule, which encourages widening field poly where possible, you can see in the table that two other recommended rules and four other design rules are analyzed in determining which violations are improvable. In the picture, the poly is in green, the active is in red, the violations are in cyan, and the improvable edges are in magenta. Freescale uses these scoring and improvability decks in many ways, as shown in the following slide.
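The improvability tradeoff can be sketched as a simple net-score comparison: make the edit only if the total weighted score across all affected rules improves. The rule names and deltas below are hypothetical.

```python
def is_worth_fixing(score_deltas):
    """score_deltas: {rule: change in weighted score if the edit is made},
    where negative means improvement. The edit is 'improvable' when the
    net change across all affected rules is an improvement and no hard
    design rule is broken (modeled here as an infinite penalty)."""
    net = sum(score_deltas.values())
    return net < 0 and float("inf") not in score_deltas.values()

# Widening field poly improves the target recommended rule but slightly
# worsens a neighboring spacing rule (hypothetical deltas).
fix_a = is_worth_fixing({"poly_width_rr": -6.0, "poly_space_rr": +2.0})  # worth it
fix_b = is_worth_fixing({"poly_width_rr": -1.0, "poly_space_rr": +3.0})  # not worth it
```

Requiring a net improvement is also what damps the oscillation problem mentioned above: an edit that merely trades one violation for an equal one never qualifies.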


The first two uses are in optimizing the router techLEF setups and the pCell generators for auto cell migration. They run experiments with different settings and see which one produces better scores and leaves fewer improvable locations. They then use the decks to drive manual optimization on top of the automated optimization. As the table shows, they identified that 10 cells out of the 600+ cell library accounted for 75% of the yield loss, because their utilization was so high and their scores were so bad. By focusing the design efforts on these 10 cells, they show in the table that they can get almost 1% yield improvement on a large design from defect-limited modes alone. Remember, this analysis was done on a mature process that had already been optimized by the automated tools. This is additional yield they squeezed out for volume production margin improvement. It also doesn't account for improvements in parametric variability and yield. I like the last reason on the slide as well: you just learn stuff when you have a tool that can measure your quality and give you useful feedback.

It was great having Robert present. Both he and Jim will be giving WebEx seminars later (in the August or September time frame) on this material. So if you missed it, you can hear it from them at those sessions. Keep a lookout on the Mentor website and in your email for the announcements. Much of the work that Freescale and ST have done has also been documented in two User2User papers that they wrote. Here are links to them if you are interested in more detail in the short term.



I hope all of you are having as much success with DFM!


28 July, 2009

Well it felt familiar to be back in San Francisco for DAC this year. However, I wasn’t ready for the cold. It was 100 degrees in Portland when I left and I always assume the Bay area will be warmer. Luckily I looked at the weather map before I finished packing and replaced my short sleeve shirts with long sleeve ones. I didn’t get in until late Sunday night so I only had time for a dinner in the Westin and then headed to bed.

Monday began pretty early for me. I gave the 9am presentation in the Mentor suite on eqDRC (Equation-Based DRC). I say that I gave it, but it turns out I had a great special guest who did most of the talking. Jim Culp (the "DFM Jedi," as we call him) from IBM in Fishkill, NY was on hand and agreed to present the work he has been doing with the eqDRC capabilities at IBM. I gave a brief intro to the basic ideas and concepts behind eqDRC and then let Jim run with it. Here is a not-so-professional picture of him doing the presentation. Sorry, Jim:)


I have been amazed at what Jim has done over the last 1-2 years. He has been one of the first to really grasp the potential power of the eqDRC capability and apply it to real-world problems. He was only able to discuss the most mature of his ventures to date, which is related to chip leakage analysis, but I know that he has several other applications as well. Jim's focus has been to use the generic tool capability as a platform for modeling circuit yield loss mechanisms on designs coming into the foundry at IBM.

For the leakage application he set out to generate a much more accurate modeling capability of static chip leakage that can run in just a few hours and give the foundry and the designer much more information about what to expect in leakage and what issues they may not be aware of. Jim presented the graph below to explain why he thinks static leakage analysis is so important in advanced designs.


The static leakage is becoming a dominant source of the total leakage and failure to predict it properly can cause significant yield loss in production when your leakage specs are tight.

The one thing people always bring up about eqDRC is "how do I get the equation?" Well, Jim's approach for leakage modeling was to use the SPICE models that the foundry provides to generate data for an empirical fit. For instance, below is the equation form and example data fit that he used for the N-well proximity effect on transistor leakage.



Using this technique combined with using equations you can get from any device physics book, Jim was able to create a complete set of statistics on leakage for each transistor in the layout that is unique to its specific context. Below is a picture of how he is able to “grade” each transistor.
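To make the idea of calibrating an empirical equation against SPICE data concrete, here is a minimal sketch. The functional form (log-leakage varying with the inverse of well-edge distance), the coefficients, and the data are all hypothetical stand-ins I made up for illustration; Jim's actual equations and his Calibre nmDRC implementation are not public.

```python
import numpy as np

def fit_proximity_model(distance_um, leakage_na):
    """Fit log(leakage) = a + b/distance to SPICE-generated sweep data.
    The 1/d form is a hypothetical stand-in for the real empirical equation."""
    A = np.column_stack([np.ones_like(distance_um), 1.0 / distance_um])
    coeffs, *_ = np.linalg.lstsq(A, np.log(leakage_na), rcond=None)
    return coeffs  # (a, b)

def predict_leakage(coeffs, distance_um):
    """Evaluate the fitted model for a transistor at a given well-edge distance."""
    a, b = coeffs
    return np.exp(a + b / distance_um)

# Synthetic "SPICE sweep": leakage rises as the device sits closer to the well edge
d = np.array([0.5, 1.0, 2.0, 4.0, 8.0])        # distance to well edge, um
leak = np.exp(1.0 + 0.8 / d)                   # noiseless ground truth for the sketch
coeffs = fit_proximity_model(d, leak)
```

Once fitted, the same closed-form equation can be evaluated per transistor inside a DRC run, which is exactly what makes the equation-based approach fast enough for full-chip analysis.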


Once the leakage of each transistor is modeled, Jim uses Calibre to statistically roll up the leakage into a "heat map" as shown below.


With this he can predict the total chip leakage and its distribution across the wafer. He also uses this to compare with the actual heat map from the real chip in failure analysis. If there are mismatches between the model and the actual results, he can investigate them to either uncover a new manufacturing issue or further tune the model.
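The roll-up itself is conceptually simple: bin each transistor's predicted leakage by its layout coordinates and sum per bin. This toy sketch shows the idea; the grid size, coordinates, and leakage values are invented, and the real flow does this inside Calibre, not in Python.

```python
import numpy as np

def leakage_heat_map(x_um, y_um, leak_na, grid=(4, 4), extent=100.0):
    """Roll per-transistor leakage up into a coarse spatial grid.
    A stand-in for the Calibre-based statistical roll-up described above."""
    heat = np.zeros(grid)
    ix = np.clip((x_um / extent * grid[1]).astype(int), 0, grid[1] - 1)
    iy = np.clip((y_um / extent * grid[0]).astype(int), 0, grid[0] - 1)
    np.add.at(heat, (iy, ix), leak_na)   # unbuffered accumulate into each bin
    return heat

# Invented layout: 1000 transistors scattered over a 100um x 100um block
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 1000), rng.uniform(0, 100, 1000)
leak = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # nA, skewed as leakage tends to be
heat = leakage_heat_map(x, y, leak)
```

The grid total equals the chip total by construction, so the same data answers both "what is my total leakage?" and "where is it coming from?"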

Each of the parameters that contributes to leakage is separately calculated and analyzed for distribution and outliers. The following is an example chip distribution of narrow channel effects (NCE).


He stated that most designers assume parameters like this have a normal distribution. Clearly that is not the case, and there are some dramatic outliers in the distribution. These transistors can now be highlighted on the layout to determine what is causing them and, hopefully, to drive layout changes to fix them.

All in all, it was an amazing presentation and a great start to DAC!


1 July, 2009

That is the question!

If you read my colleague John's most recent posting, "Waive of the future?", you will understand the question. I was as shocked as John to find that almost no one tapes out DRC clean anymore. I would add one other reason to John's list as to why this has happened: I think the traditional DRC rules are broken. Please read my first post, "Are Design Rules Broken?", for my stance on this one.

I can understand it from the standpoint of recommended rule violations, as no one expects you to always follow the recommended rules, but to follow them as often as possible without giving up area. Being so focused on yield over my career, I have focused my EDA tool development on finding better ways to encourage people to follow recommended rules, and made the assumption that they always followed the design rules. In the process I missed the biggest new problem in physical verification: people just want to know which rules (design or recommended) they can ignore. Deciding which violations to allow is a bigger problem for design teams than deciding which ones to fix! When I think about it from a human behavior standpoint, I guess it makes sense. Most people are more interested in identifying work they can "get out of" than extra credit work they can "volunteer to do":P

The key to deciding which violations to waive rests on the same manufacturing reality that nothing is black and white. Not all violations are created equal. Design (and recommended) rule checking must evolve beyond the pass/fail approach. All violations should be assessed on the grounds of yield, reliability or circuit performance risk. You should always attempt to fix the violations in order of risk until you have the smallest set of lowest-risk violations you can. Then the fab engineers or design management team can evaluate the remaining risk and make an informed decision about whether to allow the remaining violations as waivers. It was from this revelation that the new Calibre functionality like Equation-Based DRC (eqDRC) and Critical Feature Analysis (CFA) was born.

EqDRC allows the rule writer to not only check that a feature meets a minimum requirement, but to mathematically grade the violation in reference to the design rule and/or recommended rule requirement. The “grade” could be as simple as a delta from required value or a more elaborate measure of actual risk based on failure probability or performance loss, etc. The benefit is best captured in the following picture:


Admit it, we are all engineers, at least at heart, so it is much easier to convince someone to "let" you waive a violation if you can show them data. By looking at the histogram, the reviewer can quickly focus on the worst of the errors. If they are comfortable with those, then they might readily waive the entire set. There is also less chance that they miss an important error that they should not let you waive. Waiving does you no good if the chip doesn't yield.
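To illustrate what "grading" a violation means, here is a toy sketch. The rule values and measurements are made up, and a real eqDRC grade is computed inside the Calibre rule deck, not in Python; the point is just the shift from pass/fail to a continuous risk measure.

```python
# Hypothetical spacing rule: hard DRC minimum plus a recommended value.
MIN_SPACE = 0.10  # um, made-up design-rule minimum
REC_SPACE = 0.14  # um, made-up recommended value

def grade(measured_um):
    """Continuous risk grade instead of pass/fail:
    0.0 = meets the recommended rule, 1.0 = sitting at the hard minimum."""
    return max(0.0, min(1.0, (REC_SPACE - measured_um) / (REC_SPACE - MIN_SPACE)))

# Measured spacings that pass DRC but violate the recommended rule
violations = [0.11, 0.135, 0.10, 0.125, 0.139]
ranked = sorted(violations, key=grade, reverse=True)  # worst risk first
```

Sorting (or histogramming) by grade is what lets a reviewer look at the worst offenders first instead of wading through an undifferentiated error list.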

The Critical Feature Analysis functionality supported in Calibre YieldAnalyzer takes this idea to the next level. It enables statistical roll-ups of impact, and interactive and batch reporting of charts, tables, etc. In the case of recommended rules, for instance, you could calculate a cumulative impact of all the graded violations, and if the "score" is less than some threshold, then it is OK to tape out. It is also nice to generate batch HTML reports of the final statistical scores for management review and historical records.
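A cumulative score like that could be as simple as the following sketch. The grades, threshold, and straight summation are all illustrative assumptions on my part, not Calibre YieldAnalyzer's actual scoring scheme.

```python
# Per-violation risk grades, as might come out of an eqDRC run (invented values)
grades = [0.9, 0.4, 0.2, 0.1, 0.05]

# Made-up acceptance threshold agreed on with the fab / management
TAPEOUT_THRESHOLD = 2.0

score = sum(grades)                    # simplest possible roll-up: total risk
ok_to_tape_out = score < TAPEOUT_THRESHOLD
```

In practice you would want the roll-up weighted by actual failure probability or performance impact, but even a crude aggregate score gives management a single number to track from tapeout to tapeout.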

Are you facing these types of issues in your designs? I would love to hear what you think about this issue and whether you think these types of things would help. By the way, I am giving various presentations at DAC on eqDRC. There is a presentation in the Mentor Booth each morning at 9am. I am also presenting in the DAC Theater in North Hall, Booth 4359, at 2:20pm on Tuesday of DAC on the subject. I would love to have you come by and give me some feedback face-to-face. Just don't slap me:P


19 June, 2009

One of the fundamental questions everyone asks about DFM is “why should I do it?”

On the one hand this always strikes me as a funny question. I always look at DFM the same way I think of automobile safety. Statistically, most people never get in a serious accident, so why would you spend so much money on airbags, antilock brakes, better seat belts, side door reinforcements, traction control, etc.? It probably adds 20% to the cost of the car and makes it take longer to get cool new designs to market. The reason is, you don't want to be the one in the tail of that statistical distribution.

My previous blog talked about the risk of yield variability due to manufacturing interactions with the design. I talked a lot about the two or three designs on my chart that were having issues. However, did you notice that the large majority of the designs followed the curve as expected? You aren't doing DFM because you will get a yield problem; you are doing DFM because you might get one. It is always a matter of statistical probability. Doing DFM just moves you farther from the tail of the distribution.

The other thing I did want to clarify with this blog is that it is not only a matter of yield. Yield seems to be what everyone brings up when discussing DFM. I think it is just easy to relate yield to the bottom line. One of the other areas where DFM can have a really important effect is reliability. I have been working with several customers who are in the automotive or military product space, and reliability means a lot more to them than yield. However, I don't think a customer return for quality ever helps anyone in any product space.

When I used to work at LSI Logic we did some big studies in the yield and reliability space and there was some really good material published on the results. It was primarily focused on improving test coverage but I think it is very applicable to the DFM subject. The following chart shows a correlation over three process nodes in which we tracked the defect density (lower Dd equals higher yield) and reliability failures.

Correlation over three process nodes of yield to reliability

You can see that as the defect density decreased (yield got better) in each technology node, the reliability failures (EFR – Early Fail Rate) also decreased accordingly. It suggested a strong correlation between the two, so to investigate further we did a controlled split experiment.

Die that "almost failed" test ended up failing in burn-in reliability screening


In the wafer map in the bottom right of this picture you can see a map of one of the parametric tests that were done at wafer sort. This is a map of the minimum VDD voltage at which each die would function properly. All these die passed the test, but you can see a strong variation from one side of the wafer to the other. This is typical of systematic variation in the processing of the wafer, in which etch, photo or other processes cause slight variations in gate length or other parameters that make the chips behave slightly differently. What is interesting are the four die that are circled. They are no worse than the die on the left of the wafer, and they pass the test. However, in their "neighborhood" of other die they are clearly outliers. In the table on the right of this picture, we split the "normal" passing die from the "outlier" passing die on 14 different wafer lots of the same product. We then ran burn-in reliability stress testing on both groups. In the "Total" row you can see that the "normal" group failed 0.22% of the time and the "outlier" group failed 10.72% of the time!!!
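A simple way to picture how such "outliers among passing die" can be found: compare each die to the median of its immediate neighbors, so a smooth cross-wafer gradient is ignored but a die that jumps out of its local neighborhood is flagged. This is my own simplified sketch with invented numbers, not the screening method LSI Logic actually used.

```python
import numpy as np

def neighborhood_outliers(vmin, k=3.0):
    """Flag die whose min-VDD deviates from the median of their immediate
    neighbors by more than k sigma of the residual distribution."""
    rows, cols = vmin.shape
    resid = np.zeros_like(vmin)
    for r in range(rows):
        for c in range(cols):
            nbrs = [vmin[rr, cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c)]
            # residual vs local neighborhood, so wafer-level gradients cancel out
            resid[r, c] = vmin[r, c] - np.median(nbrs)
    return np.abs(resid) > k * resid.std()

# Toy 8x8 wafer: smooth left-to-right process gradient in min-VDD...
wafer = np.tile(np.linspace(0.80, 0.95, 8), (8, 1))
wafer[4, 4] += 0.10   # ...plus one die that passes spec but is a local outlier
flags = neighborhood_outliers(wafer)
```

Every die here "passes" in absolute terms; only the one that disagrees with its neighborhood gets flagged, which is exactly the population that failed burn-in at such a dramatically higher rate.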

The key is that these outliers are die that almost failed. The TEM cross section in the upper left of the picture shows the failure analysis result from one of these "outlier" die that passed wafer sort test but failed reliability testing. You can see that the tungsten was missing from the via, but the liner was pretty much intact. It conducted current, but very poorly. With the accelerated stress of burn-in, the liner broke down and it failed. The bottom line is that the "outlier" die are the ones that needed the extra safety gear in the car. The same things that make you robust for yield also make you robust for reliability.

So who thinks a seat belt is worth the extra time and money now?


2 June, 2009

I got a kick out of Rohan’s comment on my previous blog (How do you define DFM?).  It is too easy to assume that anyone knows what you are talking about when you say DFM.  Just because everyone has been talking about it doesn’t mean any of them know what they are talking about.

You could probably infer from my approach to the previous blog that my background is primarily on the manufacturing side. I was responsible for yield and reliability improvement in the wafer fab. We measured yield and reliability by counting the difference between the total number of die (chips) we started at the beginning of the manufacturing process and the number of die that passed a series of electrical and physical tests at the end of the process. You can measure this as a percentage, but that is misleading because different designs have different die sizes, which builds in a bias toward large die yielding lower. It is simply a matter that big die have more surface area and circuit content, so they have a higher chance of being impacted by any particular issue than a small die. Because of this, a metric called defect density (Dd or D0) is used to normalize yield across different die sizes. The chart below shows the expected yield by die size for a specified Dd. For a given die size you can predict the percent yield assuming exposure to a constant fab defect rate.

Percent Yield vs Die Size for a Specified Defect Density

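The shape of that curve follows from the classic Poisson yield model, Y = exp(-A * D0). To be clear, the fab may well have used a fancier model (negative binomial, Murphy's model, etc.), but the Poisson form is the textbook version and shows the big-die penalty clearly:

```python
import math

def poisson_yield(die_area_cm2, defect_density_per_cm2):
    """Classic Poisson yield model: Y = exp(-A * D0).
    Real fabs often use variants (negative binomial, Murphy), but the
    yield-vs-die-size trend is the same idea."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)

# At D0 = 0.5 defects/cm^2, a quarter-cm^2 die yields far better
# than a full cm^2 die exposed to the exact same defect rate.
small = poisson_yield(0.25, 0.5)   # ~0.88
large = poisson_yield(1.00, 0.5)   # ~0.61
```

Normalizing a measured percent yield back through this model is what turns die-size-biased yield numbers into a single comparable Dd across products.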

This Dd provides a very simple single measurement to track yield across product lines. It also provides a convenient hammer for management to beat you with:) However, making the numbers get better is a whole different ball game. With hundreds of process steps from beginning to end, it is like herding cats trying to tune out and control all the variation. That is why we were so focused on measuring data at every step along the way.

The thing that ultimately led to my transition into EDA was that we began to notice that some designs didn’t follow this curve. In the following chart you can see some of the data from my previous life in which we plotted a wide assortment of our designs on this curve. Each point on the chart represents the yield for a given “lot” of wafers (25 wafers) run through the fab.

Actual Yields vs Plan Yield for 12 products on the same process at the same time


You can see in the chart that products 1, 2 & 3 have very similar die sizes but dramatically different average yields, even though they were running on the same process in the same fab at the same time. They yield consistently but differently. Clearly, something was fundamentally different between them that made product 3 much more sensitive to fab issues than product 2. Also note product 7. It yielded very inconsistently from lot to lot. In other words, slight variations from one process run to the next had dramatic effects on its yield. You can't blame these problems on the fab, because all the products got the same process and the others fit the curve as they should.

The takeaway for the non-PhD is that design can make a difference on yield, and just passing DRC, LVS & timing does not account for this difference. The trick is figuring out which variations in design lead to these yield variations and determining ways of eliminating them. The process, tools and methods for doing this are DFM. In my next blogs I will discuss how this manifests itself in reliability problems as well, and the various types of design variations that cause these issues.


15 May, 2009

What does design for manufacturing (DFM) mean to you? "More work to do!" "Someone else's problem!" "Just more design constraints!" "The fab guys are expecting me to understand the process as well as design!"

I propose that we define DFM (design for mfg) as an attempt to transfer a way of doing business that has been tried and tested in the manufacturing space for years into the design space. The fabs have a long history of dealing with what I would call manufacturing for design (MFD). The basic mfg philosophy is that the only way we can ever hope to produce millions of chips from hundreds of designs with high yield is to measure, target and tighten variation in the manufacturing process.

For a fab guy, everything begins with metrology (measure something). Someone once said that "you get what you measure." They were right. With data you can understand the intrinsic distribution and ongoing trend that your process produces. Armed with that data, the fab tweaks the tools, flows, behaviors, etc. to tighten and center the distribution and control the trend over time.

I believe DFM can serve the same purpose in the design process. You will never improve the manufacturing robustness (quality) of your designs if you don't measure something about the design that correlates to manufacturing robustness. With a good measurement in place, it becomes a matter of tuning the design tools, flows, methods and behaviors to improve that metric.

I also don't believe in the need to quantify the ultimate ROI before beginning this process. Measuring the quality of your design costs very little, and understanding your own design quality variation will reveal low-hanging ROI for improvement. The results of small initial steps will justify the next steps, and so on.

What do you think? Do you do DFM in your design flow? What kinds of things do you do? How did it get justified? Do you think it actually works?
