Accelerating mechanical stress simulation for 3D-IC reliability in the cloud
Marc Swinnen, Product Marketing Director at Ansys, discusses a recent
collaboration with TSMC and Microsoft on a high-capacity cloud solution
for analysing the mechanical stresses in 2.5D/3D-IC multi-die systems,
which helps joint customers avoid field failures and extend product
lifetime and reliability.
PHILIP ALSOP, EDITOR, asks the questions.
PA: Ansys has been collaborating with TSMC and Microsoft, focusing on analysing mechanical stresses in multi-die 3D integrated circuit systems. The obvious place to start: how did the collaboration come about?
MS: Our collaboration with TSMC has been going on for decades. That's based largely on the fact that Ansys develops and sells RedHawk-SC™, an electronic design automation (EDA) software tool that chip designers use to verify the power integrity of their chips. Basically, every chip has a power and ground network on it. Like any electronic device, every single transistor has to be connected to power and to ground. If you have 50 billion transistors on your chip, you must design two electrical networks, each with 50 billion endpoints. So these are incredibly large and complicated on-chip networks that are vital to the proper functioning of the chip. They need to be checked because there is always voltage drop on the power lines. And these days, to save power, the supply voltage is so low that you really can't afford to lose even 100 millivolts between the package pin and the actual transistor. Hence everything has to be very carefully analysed to make sure your power integrity, or voltage drop, is properly accounted for and meets your spec.
And that's a very big, tough problem, which is what RedHawk-SC solves. It does the final sign-off for manufacturing, saying, yes, this will work. Of course, this all relies critically on the manufacturing rules, so we work very closely with all of the major foundries, including TSMC. TSMC and Ansys have a long-standing collaboration to get this golden sign-off tool out to the industry: the large majority of all the chips in the world are signed off for their power integrity using Ansys RedHawk-SC. That is the foundation of our deep and ongoing collaboration with TSMC.
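To make the voltage-drop problem concrete, here is a deliberately tiny sketch, not how RedHawk-SC works: one power rail is treated as a chain of equal resistors fed from the package pin at one end, and each tap draws a fixed current. The function name `rail_voltages`, the uniform segment resistance and all the numbers are illustrative assumptions. Because every segment carries the current of everything downstream of it, the drop accumulates toward the far end of the rail.

```python
def rail_voltages(vdd, r_segment, node_currents):
    """Voltage at each tap of a 1-D power rail fed from the package pin at one end.

    Hypothetical model: the rail is a chain of equal series resistors
    (r_segment ohms each) and tap i draws node_currents[i] amps. Each segment
    carries the current of all taps downstream of it.
    """
    voltages = []
    v = vdd
    downstream = sum(node_currents)  # current entering the first segment
    for i_tap in node_currents:
        v -= downstream * r_segment  # IR drop across this segment
        voltages.append(v)
        downstream -= i_tap  # this tap's current leaves the rail here
    return voltages


# Illustrative numbers only: 0.75 V supply, 1 milliohm per segment, 10 taps at 1 A each.
taps = rail_voltages(vdd=0.75, r_segment=0.001, node_currents=[1.0] * 10)
worst_drop_mv = (0.75 - min(taps)) * 1000
print(f"worst-case drop: {worst_drop_mv:.1f} mV")  # worst-case drop: 55.0 mV
```

Even this ten-node toy loses 55 mV at the far tap, more than half of a 100 mV budget. A real on-chip grid is a mesh with billions of nodes, which is why it is analysed with dedicated tools rather than by hand.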
Which brings us to the topic under discussion. Traditionally, a chip, or integrated circuit (IC), is a monolithic piece of silicon - it's all one thing. You cut it out of the wafer and it's one little chip of silicon that gets embedded in a package. But now, for multiple reasons, it is no longer possible to build the big systems you want today on just a single chip. So the industry has started making multiple chips and putting them together into a system we call 3D-IC, where you stack several chips on top of each other or, more commonly, put them right next to each other, which we call 2.5D. I'll just call all of these stacked and side-by-side configurations 3D-IC. The idea of a 3D-IC is that it contains multiple dice: bare dice, not packaged dice, placed right next to each other. Usually they're placed on top of another chip, called an interposer, which connects them all together. All high-performance computing is moving in this direction today.
Now, some of these dice get hot and some get less hot, so you have differential thermal expansion. The dice are connected to each other with microbumps. These are tiny, tiny bumps - up to a thousand per square millimetre - and they can't withstand much shear stress. If your assembly starts expanding and contracting differentially as it cycles through these thermal cycles, you're going to get mechanical deformation, warping and stresses in this 3D assembly. And that is something radically new for chip designers. I mean, someone always had to worry about thermal expansion at some point: usually a system or package designer way down the line, after the chip was assembled on a board, the board was put in the system and the heatsink was attached. At that point, somebody did some mechanical analysis. But now the problem has come crashing down onto the chip, and designers have to worry right out of the gate about how this thing is going to deform and warp. If I use the wrong materials or the wrong floor plan, my design will have much lower reliability than if it's done properly. Hence, designers need to run mechanical simulations early on and predict thermomechanical behaviour.
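Why microbumps suffer under differential expansion can be estimated with the textbook distance-from-neutral-point (DNP) rule of thumb. It is a first-order hand calculation, not the finite-element simulation discussed in this interview. Two bonded layers that expand by different amounts slide relative to each other by roughly (mismatch in thermal strain) × (distance from the point that does not move), and dividing that slip by the bump height gives an approximate shear strain. The function name, the roughly 2.6 ppm/K silicon CTE, and the temperatures and dimensions below are assumptions chosen for illustration.

```python
def bump_shear_strain(alpha_top, dt_top, alpha_bottom, dt_bottom, dnp_um, bump_height_um):
    """First-order (DNP) shear strain in a bump joining two layers.

    alpha_*: coefficients of thermal expansion in 1/K; dt_*: temperature rise
    of each layer in K; dnp_um: distance of the bump from the neutral point;
    bump_height_um: bump stand-off height. Warpage, underfill and plasticity
    are all ignored in this hypothetical estimate.
    """
    differential_strain = abs(alpha_top * dt_top - alpha_bottom * dt_bottom)
    displacement_um = differential_strain * dnp_um  # relative lateral slip
    return displacement_um / bump_height_um


SILICON_CTE = 2.6e-6  # approximate room-temperature CTE of silicon, 1/K

# Hypothetical case: a hot die (+60 K) on a cooler silicon interposer (+40 K),
# bump 5 mm from the neutral point, 20 um bump height.
strain = bump_shear_strain(SILICON_CTE, 60.0, SILICON_CTE, 40.0, dnp_um=5000.0, bump_height_um=20.0)
print(f"shear strain: {strain:.3f}")  # shear strain: 0.013
```

Two points from the discussion show up directly. Both layers here are silicon, so the mismatch comes purely from one die running hotter than the other. And the strain grows linearly with distance from the neutral point, so the bumps at the corners of large dice take the worst of it.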
Ansys has a rich history in this area - beyond the semiconductor division, we have many other simulation tools. We have computational fluid dynamics, we have mechanical, we have safety, we have optics, we have electromagnetics - many, many fields. But mechanical is one of our specialties, where we are industry leading. It was natural for us to take those algorithms and apply them to this semiconductor problem.
TSMC worked with us to solve some of the issues they've seen on their own production and design side. They saw this as a problem that needed solving. It's a tough computational problem, so they brought in Microsoft Azure to provide the cloud computing capacity required to solve it in the required timeframe. With cloud computing from Microsoft, mechanical/thermal simulation from Ansys and manufacturing capability from TSMC, together we came up with a solution flow that has been proven to work.
PA: And is the objective of the project to provide added confidence in addressing novel multi-physics requirements that improve the functional reliability and increase the product lifetimes of advanced 3D fabric designs?
MS: So, there are two points to that. One is novelty, and the other is reliability. So why is this novel? Mechanical simulation is not novel in itself, but for semiconductor designers it is. As I already mentioned, this was something monolithic designers never had to worry about. But with 3D assemblies, they do. I use 3D as a catch-all name for all the different architectures that the foundries supply, chip-on-chip and chip-next-to-chip; there are lots of ways of arranging these chips. I'll just call them all 3D-IC.
What is novel for chip designers is that they now have to think at the floor-planning stage: Okay, which of these chips is going to get hot, and which is going to stay cooler? If I put two hot chips right next to each other, is that going to be a thermal problem that I can't fix? That's especially true if two chips get hot in the same activity mode: in streaming mode, for example, while you're playing video, both of these chips get really hot in this corner. That could doom my project right from the get-go.
Thermal is the number-one limitation on achievable integration density today. You can very easily stack chips several layers deep. You can design that. You can manufacture that. All that is not a problem. The problem is you can’t cool it! It’ll get too hot, and it’ll melt. So how close and how compact you can make a system is determined, number one, far and away, by power dissipation. How do I control and manage my heat dissipation? So, when you’re assembling these large systems, multiple chips together - and we’re talking up to a dozen chips - how do I manage my power?
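The claim that heat, not manufacturability, caps how many dice you can stack can be illustrated with the simplest possible steady-state model: junction temperature equals ambient temperature plus total power times a single lumped thermal resistance to ambient. This is a back-of-the-envelope sketch under loudly stated assumptions. All dice in the stack share one heat path (`theta_c_per_w`), each die dissipates the same power, and lateral spreading and hot spots are ignored. The function names and every number in the example are hypothetical.

```python
def junction_temperature_c(t_ambient_c, power_per_die_w, n_dice, theta_c_per_w):
    """Steady-state temperature when all dice share one lumped heat path to ambient."""
    return t_ambient_c + n_dice * power_per_die_w * theta_c_per_w


def max_stack_depth(t_ambient_c, power_per_die_w, theta_c_per_w, t_limit_c):
    """Largest number of stacked dice that keeps the junction at or below t_limit_c."""
    if power_per_die_w <= 0 or theta_c_per_w <= 0:
        raise ValueError("power and thermal resistance must be positive")
    n = 0
    # Add dice one at a time until the next one would exceed the limit.
    while junction_temperature_c(t_ambient_c, power_per_die_w, n + 1, theta_c_per_w) <= t_limit_c:
        n += 1
    return n


# Hypothetical numbers: 45 C ambient, 50 W per die, 0.2 C/W heat path, 100 C limit.
print("max dice in stack:", max_stack_depth(45.0, 50.0, 0.2, 100.0))  # max dice in stack: 5
```

In this toy model each added die raises the temperature by the same 10 °C, so the stack depth is set entirely by the thermal budget and the quality of the heat path, which is the point Swinnen makes about power dissipation limiting integration density.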