TSMC processes...

As promised. I first read this paper in October: https://fuse.wikichip.org/news/6439/tsmc-extends-its-5nm-fam… At some point it stopped being paywalled, and the chart of processes, depending on your job, might be worth printing out and posting on your wall. For example, I didn’t know that N7P and N7+ were actually different process nodes.

Which of the four N5-based nodes is AMD using for Zen 4? As I understand it, the answer is none of them. AFAIK the two main differences are support for Vcache, and a power distribution network/layers patented by AMD. The trick? Instead of having a solid (copper) ground plane, there are Vcc and ground layers that are edited to create one effective plane, but of course with no connections between them. You want/need capacitance between power and ground, right? Sure, but putting it where you want it rather than everywhere may be a good idea.

Note that the Vcache requires that the (L3) connections be on the surface. This means that power and ground have to be provided below that layer. So the Vcache and ground-plane differences may all be part of one design.

This is really about Intel’s technology roadmap, but I figure that anyone who reads the Intel group and is interested in comparative technology reads this group.

You can find a roadmap here: https://www.anandtech.com/show/16823/intel-accelerated-offen… I’m just going to talk about power delivery and interposers.

What you gain from putting the power under the transistors is easy to figure. But using two wafers to manage that is expensive. Right?

What if you have that silicon there for other reasons? AMD Ryzen and EPYC “chips” all depend on an interposer to mount the Zen and I/O chips onto. So why not use that piece of silicon to distribute the power? The bulk of the various chips will serve as the ground plane. It will be necessary to run connections through the interposer to the pins (or non-pins with Zen 4 and later), but that has to be done anyway. Getting power above the transistor layers can be done with plated-through vias. That’s what I’m expecting in Zen 4. AMD could use a third piece of silicon, but I don’t see why.

Now on to Intel. Intel has picked a fancy name, PowerVia, for the technology. I gather from Anandtech’s (not Intel’s) graphic that Intel will be using a silicon interposer for the Intel 20A* node. I think they could start earlier, but Intel 4 is supposed to use Foveros at 36 microns, then EMIB at 45 microns, and finally Foveros at 25 microns and Foveros Direct, both aligned with Intel 3. This puts PowerVia, and by extension “full” interposers, in mid-to-late 2024. Do I believe that Intel will manage three new process nodes in two years? No. Saying more and staying polite? That’s difficult. So I’ll go back to AMD and TSMC for a bit.

Not interested in history? Skip to the next dashed line.

Going back into history, there have been long process nodes and short nodes. The reasons have little to do with the technology as such. Most companies (those without fabs) decide their moves from one node to the next based on cost. Period. They may have a new design they want to build, which takes away extra development costs. But say that you can expect 160% of current density (or 60% more) from the next node today; well, whenever your new design is expected to be running through the fab. If this is a “full node,” you expect to get twice as many chips per wafer once the learning curve has done some of its work. Call this the normal case.

Sometimes all the capex for the current node can be reused at the new node, and yields are expected to approach the present node’s yield. The old node won’t have much lifetime left. A short node.

What if there are serious problems getting yields above (say) 20%? That will make the current node a long node. In Intel’s recent experience, a very, very long node (14 nm).

A lot of fab customers looked at the (so-called) 10 nm offerings and saw nothing but trouble. Double-patterning, mostly. Double-patterning could be avoided with EUV, but that wasn’t really ready. What AMD and a lot of other companies did was to say: if we have to do double-patterning anyway, jumping to 7 nm instead will cover all those pesky double-patterning costs. So they skipped 10 nm. I could produce lots of guesses as to the comparative costs, but… AMD worked with GF to produce a version of 14 nm with tighter layout macros. They called it 12 nm. Whether the name matches anything is irrelevant. What was relevant was that it allowed AMD to make Zen+ parts for less than Zen parts, and with higher yields and clock speeds. (Higher yields? When redesigning the library, it makes sense to get rid of the few macros that cause most of the rejects.)

What is really important here is that 7 nm has been a long node, along with TSMC’s named variants. Expect 5 nm to be an even longer node, whether Zen 5 uses one of TSMC’s 5 nm variants whose name begins with a 4, or some AMD-only variant. Don’t expect Zen 5 to jump to N3. Way too big a risk. There may be a Zen 6 at the N3 node, or AMD may retire the Zen name. My guess is Zen 4 later this year, Zen 4c as a kicker next year, and Zen 5 in early 2024. Zen 6? Crystal ball very cloudy; check again in a year or two.

Could AMD use N3 for Zen 5? A big NO, based on corporate culture. Lisa Su doesn’t like to take risks, and AMD doesn’t need to take any right now. Could AMD design a Zen 6 alongside Zen 5, giving Zen 5 a very short market lifetime? The first part of that is easy: AMD is designing Zen 6 and putting the finishing touches on Zen 5 right now. Everyone in this business knows that you put the gun to your head and pull the trigger. Four years later, you find out if the gun went off.

Since I expect Zen 5 to use the same process as Zen 4, with perhaps a few tweaks, the second part follows: I expect Zen 4 to have a relatively short lifetime. All those things that didn’t go into Zen 4 will get crammed into Zen 5. Some will end up falling off the table, but most will get in. And since Zen 5, in addition to sharing a process with Zen 4, will share most of its detailed design, AMD could (possibly) have it ready to go by January. They won’t release it then, but the point I am making here is that there won’t be much fab work to do (hot lots, etc.) on Zen 5.

  • I decided that flagging Intel’s process names in bold is more useful than doing the same for their codenames. There are just too many Intel codenames, and recently they changed too fast. (Not Alder Lake, just everything before that. :wink:)

…And now for something completely different.

Intel is adding some new matrix instructions to Sapphire Lake, alongside AVX-512, that are aimed at AI training. They add a new set of eight tile registers, and a way to configure the registers, and parts of registers, as single precision, BF16, and various sizes of integers. Then one instruction (well, four integer variants and one floating-point) will multiply two matrices, assuming they fit in the space. If they don’t and you have a large matrix to multiply, you can set things up to multiply parts of matrices, then crunch away.
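That “multiply parts of matrices, then crunch away” pattern is just blocked matrix multiplication. A minimal sketch in plain C, with an illustrative 16×16 tile (not the hardware’s actual tile shape or instruction semantics), showing how one tile-multiply step accumulates INT8 products into an INT32 tile, and how a big multiply loops that step over blocks:

```c
#include <stdint.h>

/* Illustrative tile size only; real AMX-style tiles are configured
 * differently, this just shows the blocking idea. */
#define TILE 16

/* One "tile multiply": accumulate an INT8 x INT8 product block
 * into an INT32 accumulator block -- the shape of a single
 * hardware tile-multiply instruction, done in scalar code. */
static void tile_mul_i8(const int8_t a[TILE][TILE],
                        const int8_t b[TILE][TILE],
                        int32_t c[TILE][TILE])
{
    for (int i = 0; i < TILE; i++)
        for (int j = 0; j < TILE; j++)
            for (int k = 0; k < TILE; k++)
                c[i][j] += (int32_t)a[i][k] * (int32_t)b[k][j];
}

/* Large n x n multiply as a loop over tiles: load tiles, crunch,
 * store the accumulator back. n is assumed to be a multiple of
 * TILE, and C is assumed zeroed on entry, to keep the sketch short. */
static void gemm_i8(int n, const int8_t *A, const int8_t *B, int32_t *C)
{
    for (int i0 = 0; i0 < n; i0 += TILE)
        for (int j0 = 0; j0 < n; j0 += TILE)
            for (int k0 = 0; k0 < n; k0 += TILE) {
                int8_t  at[TILE][TILE], bt[TILE][TILE];
                int32_t ct[TILE][TILE];
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++) {
                        at[i][j] = A[(i0 + i) * n + (k0 + j)];
                        bt[i][j] = B[(k0 + i) * n + (j0 + j)];
                        ct[i][j] = C[(i0 + i) * n + (j0 + j)];
                    }
                tile_mul_i8(at, bt, ct);
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++)
                        C[(i0 + i) * n + (j0 + j)] = ct[i][j];
            }
}
```

The hardware wins by doing the whole inner tile in one instruction instead of a triple loop; the setup and blocking around it look much the same.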

I was asked if it made sense to build a BLAS library to use these new instructions. My first answer was, “Why?” If you need to do operations on small matrices, the code your compiler, BLAS, or LAPACK uses is fine. If you have lots of small matrices of INT8, INT16, or BF16 values? Your AI/machine learning system already deals with them. But if you are spending days on a single ML task? Uh, how many GPUs are you using? I don’t expect that doing any ML math on a CPU chip is going to win compared to high-end GPUs. And, yes, there are specialized systems sold for doing machine learning. I don’t expect anyone to replace them with a box of Sapphire Lake chips.

So who is going to use them? I see no point in Intel adding these instructions without also providing a BLAS to go with them. Intel might, but they haven’t asked me to write it. :wink: If they do ship one, and it supports integer and Boolean arrays along with floating-point, I’m sure they will print benchmark results using it. Compared to what? A CPU without those instructions? A GPU?

It might be fun to write an emulation library that could compete with Sapphire Lake. For some definitions of fun. Word to the wise: I would write it in Ada. Why? I could start out with benchmarks for the existing code in Ada. Then I could deal with cases like INT8 times BF16, etc.
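The BF16 side of such an emulation is cheap on any CPU, because bfloat16 is just the top 16 bits of an IEEE float32. The author would use Ada; as a language-neutral illustration (conversion helpers and a dot product with a float32 accumulator, names my own), a sketch in C:

```c
#include <stdint.h>
#include <string.h>

/* bfloat16: the sign, exponent, and top 7 mantissa bits of a
 * float32. Emulation is widen -> compute -> truncate. */
typedef uint16_t bf16;

static float bf16_to_f32(bf16 x)
{
    uint32_t bits = (uint32_t)x << 16;  /* restore the low 16 bits as zero */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static bf16 f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16)(bits >> 16);  /* truncation; real hardware rounds */
}

/* BF16 x BF16 dot product with a float32 accumulator -- the
 * scalarized shape of a BF16 tile-multiply step. */
static float dot_bf16(int n, const bf16 *a, const bf16 *b)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return acc;
}
```

Mixed cases like INT8 times BF16 fall out of the same pattern: widen both operands to float32 (or int32), multiply, and accumulate in the wider type.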

Does anyone else have a clue about using these instructions? I’m going to crosspost to the Intel group. I very seldom crosspost.

Just to be clear… PowerVia puts multiple layers of metal on the backside of the wafer to do the power routing. This means that somehow the front-side lithography alignment marks are now visible on the backside? There is no reason they could not also use TSVs (through-silicon vias) to move all the interconnect bumps to the same side of the wafer, but I don’t know if they do this or not.
Any thoughts?

Also, it is Sapphire Rapids, not Lake…


Also, it is Sapphire Rapids, not Lake…

Oops, my bad.

PowerVia puts multiple layers of metal on the backside of the wafer to do the power routing. This means that somehow the front-side lithography alignment marks are now visible on the backside?

Or you could put some TSVs (vias) through the wafer before litho to provide alignment marks? I don’t know, and I doubt there are all that many people inside Intel who know what it will finally look like.

My speculation was more about AMD using silicon that will be there anyway to route the power. AMD has an advantage right now in that, at least for non-APU Ryzen and EPYC chips, only one voltage is needed on the CPU chiplets. They may at some point try to run different processors at different voltages, but I suspect that if they do, it will be on a per-chiplet basis. (I’m also expecting that if AMD makes an APU that has both Zen 4 and Zen 4c cores, they will be on separate chiplets.)

AMD using silicon that will be there anyway to route the power.
I can see this working for the top layer of power, but I can’t imagine getting the bump pitch small enough to do local routing to the transistors. I am not sure exactly how you see this working.
Intel also supplies the CPU die with a single voltage, and then does on-chip regulation with a FIVR (fully integrated voltage regulator).

I can see this working for the top layer of power, but I can’t imagine getting the bump pitch small enough to do local routing to the transistors.

I’m assuming that the bulk of the top die will effectively be the ground plane. There will be a copper layer on top of the interposer to carry power, with vias, of course, to connect the ground to the bulk layer. I don’t see a need for fine-grained bumps on a per-transistor basis. AMD might be able to use the copper-to-copper direct connect they are using for VCache. If there are dozens of contacts in effect connecting two copper surfaces, where the upper layer provides power to a row of transistors, one bad contact would not ruin a die.