AMD using chiplets for chipsets

According to the report, AMD’s multi-chip approach for X670 and X670E offers advantages similar to AMD’s current chiplet architecture on Ryzen CPUs. With this approach, AMD can drastically increase I/O expansion while significantly reducing manufacturing costs, which would be impossible if AMD built single monolithic dies for X670 and X670E.

Here’s how AMD’s new chipset architecture works. The base chiplet for X670 and X670E is known as the Promontory 21 (PROM21) chipset, which is built by third-party supplier ASMedia. Each chip comes in a 19x19mm FCBGA package with a maximum power rating of 7W.

That chip provides one PCIe 4.0 uplink connection to the CPU and two PCIe 4.0 x4 downlink controllers, for a maximum of eight PCIe 4.0 lanes. It also supports four PCIe 3.0 x1/SATA 6Gbps ports, six USB 3.2 Gen 2 10Gbps ports (two of which can be fused into a single 20Gbps port), and six additional USB 2.0 ports. The SATA/PCIe 3.0 and USB 3.2 ports are configurable, so a motherboard manufacturer can, for example, opt for more SATA ports over PCIe ports, or vice versa.

A single PROM21 chipset functions as AMD’s midrange B650 chipset. A trimmed-down (harvested) version can also be used as an A600-series chipset, presumably A620. However, X670 and X670E take a different approach.

For X670 and X670E, a single PROM21 chiplet is only half of the equation: two PROM21 chiplets are daisy-chained together, effectively doubling the connectivity, with twice the USB ports and twice the PCIe 3.0/SATA 6Gbps ports. The exception is the downlink controllers, which go from two PCIe 4.0 x4 downlink controllers on a single chiplet to three in total, because one of the first chiplet’s downlink controllers connects to the uplink of the second chiplet.
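The daisy-chain arithmetic above can be sketched as a quick tally. Port counts come from this article; the dictionary keys and function name are purely illustrative:

```python
# Hypothetical tally of PROM21 I/O based on the figures in the article.
# Daisy-chaining doubles everything except the downlink controllers,
# since one downlink on the first chiplet feeds the second chiplet's uplink.

SINGLE_PROM21 = {
    "pcie4_x4_downlinks": 2,  # two PCIe 4.0 x4 downlink controllers (B650)
    "sata_or_pcie3": 4,       # four flexible SATA 6Gbps / PCIe 3.0 x1 ports
    "usb32_gen2": 6,          # six 10Gbps ports (two fusable into one 20Gbps)
    "usb2": 6,                # six USB 2.0 ports
}

def daisy_chain(chip):
    """Combine two chiplets; subtract the downlink consumed by the link."""
    total = {key: 2 * count for key, count in chip.items()}
    total["pcie4_x4_downlinks"] -= 1  # used for the chip-to-chip connection
    return total

x670 = daisy_chain(SINGLE_PROM21)
print(x670["pcie4_x4_downlinks"])  # 3, as described above
print(x670["sata_or_pcie3"])       # 8
```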

While the broader connectivity options are great, one of the biggest benefits is likely to be production cost. It’s more cost effective to produce a single SKU that can be scaled up or down to meet the demands of different tiers than to build two or more entirely separate chips. Like the Zen CPU chiplets in Ryzen 3000 and later processors, each with eight CPU cores that can be disabled in clusters of two, AMD can focus on mass producing PROM21 and using it across the entire suite of motherboard offerings. There’s no need to waste additional design and manufacturing resources trying to predetermine the appropriate ratios of production.

Another positive side effect of daisy-chaining chipsets is that cooling can be spread over a wider area. Instead of a single chipset that can draw 15W or more, you get two chips drawing roughly 7W each, separated by perhaps several centimeters. This can save motherboard makers a lot of headaches when it comes to cooling X670 and X670E, and it should result in most X670 and X670E motherboards having passive cooling despite the PCIe 5.0 requirements, a great improvement over the actively cooled X570 chipset. However, we expect mini-ITX boards to still have active cooling due to the physical constraints of that form factor.


4 PCIe4.0 lanes from Zen 4 CPUs to possibly two chipset(s)?

That doesn’t quite make sense. Zen 4 (Raphael) will have 28 PCIe5.0 lanes, but the chipset only uses PCIe4.0. I guess that leaves room for a 770 chipset a year or three down the road.

If I were designing a motherboard, I’d provide one slot with 8 PCIe5 lanes and split the rest of the PCIe5 lanes into four PCIe5 x4 NVMe slots. But then again, I need lots of disk space, and I currently have two 1 Gig NVMe, plus four (various terabyte) SATA drives. Switching to the SATA drives feels like going from a superhighway to hubcap deep mud. I guess I need to find a PCIe gen 4 or 5 8-lane card that supports multiple NVMe drives.
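As a sanity check on that hypothetical layout, the lane budget works out exactly, assuming Raphael’s 28 PCIe 5.0 lanes with 4 reserved for the chipset uplink as discussed above (the variable names are just illustrative):

```python
# Hypothetical lane budget for the layout described above, assuming
# Zen 4 (Raphael) exposes 28 PCIe 5.0 lanes and 4 of them feed the
# chipset uplink (which, per the article, runs at PCIe 4.0 speed).
TOTAL_LANES = 28
chipset_uplink = 4
gpu_slot = 8        # one x8 PCIe 5.0 slot
nvme_slots = 4 * 4  # four x4 NVMe slots

used = chipset_uplink + gpu_slot + nvme_slots
print(used, TOTAL_LANES - used)  # 28 0 -> the split exactly exhausts the lanes
```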

Why do I need so much disk bandwidth? I have about a dozen benchmarking files in the several-hundred-gigabyte size range. To keep the disk I/O from dominating benchmarks I need to copy the test data to (two) NVMe drives. The output isn’t all that fast, so a hard drive with a large cache can take it out of the (timing) picture. I run the benchmarks twice so that all of the OS stuff I need gets cached, but I’d really love to put it on an NVMe drive. (In production, this stuff gets run on a system with two 32-core EPYC CPUs and 128 Gig of memory…) I have no idea what the next dedicated system will look like, other than Zen 4. It will probably be an EPYC Genoa system, but I can probably out-bench the current EPYC system; just another item to put on the stack. Anyway, disk I/O is currently the limiting factor. I remember when smaller datasets took months. (FFTs and inverse FFTs of seismic data.)

4 PCIe4.0 lanes from Zen 4 CPUs to possibly two chipset(s)?

That doesn’t quite make sense. Zen 4 (Raphael) will have 28 PCIe5.0 lanes, but the chipset only uses PCIe4.0.

I don’t think they’re planning on this chipset being used with 8 SATA drives all maxed out sending data to the CPU.
For most people (even most server usage), only a single SATA drive is maxed out at a time.
So a PCIe 4.0 x4 uplink would be enough bandwidth to service that.
I think usage models that need more than that (mostly fileserver machines) will probably use separate SATA controllers connected to the other PCIe lanes.
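A rough estimate bears this out. Using nominal line rates and accounting only for encoding overhead (real-world throughput will be lower), even eight fully saturated SATA drives would fit inside a PCIe 4.0 x4 uplink:

```python
# Back-of-the-envelope bandwidth check, nominal figures only.
sata_usable = 6e9 * 8 / 10 / 8      # SATA 6Gbps line rate, 8b/10b -> ~600 MB/s
pcie4_lane = 16e9 * 128 / 130 / 8   # PCIe 4.0: 16 GT/s, 128b/130b -> ~1.97 GB/s
uplink = 4 * pcie4_lane             # PCIe 4.0 x4 chipset uplink

print(round(8 * sata_usable / 1e9, 1))  # ~4.8 GB/s for 8 maxed-out SATA drives
print(round(uplink / 1e9, 1))           # ~7.9 GB/s uplink
```

So the uplink is a bottleneck only once USB and downstream PCIe traffic pile on top of heavy SATA use, which matches the fileserver caveat above.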