Astera Labs presents at recent tech conferences

Astera Labs recently gave presentations and held discussions at three separate conferences in early December, hosted by UBS, Barclays, and Raymond James.

These short presentations contained quite a lot of new information, including two of the bigger announcements:

  1. Astera replaced Broadcom in the UALink (Ultra Accelerator Link) consortium, which is developing a competing standard to Nvidia’s NVLink. This group includes Microsoft, Google, Amazon, and AMD, among others. Astera noted that the hyperscalers almost never join these types of open standards groups, which shows how critical the emerging standards are.
  2. Astera plays a key role in Amazon’s new Trainium-2 chip, and mentions that each chip uses 128 of their PCIe-based AECs. They’ve mentioned repeatedly that hyperscalers are “continuing to double up on their own ASICs and accelerators that they are building”. Astera says they “have close to 400 design wins that are for the internal accelerator based platforms”

On the UALink group they said this,


Here is an under-one-minute intro to the AWS Trainium-2 chip,

In the middle of the video they say they are deploying the chip with a “Petabit scale network fabric”, which I believe is Astera’s Scorpio. Some deployments of these chips are already over 100,000 chips in size, connected and being installed in their data centers.

From the conferences,


Some other notes from the conferences include,

  • hyperscalers starting to become more vertically integrated and doing their own chips, lots of announcements from them
  • pure-play AI company; more than 80% of business comes from AI deployments across Nvidia, AMD, and the internally developed ASICs that hyperscalers are building
  • GPUs today are only 50% utilized; customers are writing big checks to Nvidia but half the time the chips are collecting dust; Astera is solving this problem and sees lots of opportunity to grow the business
  • developing products like our fabric devices that are becoming more central to the AI deployment
  • higher dollar content opportunities per GPU, dollar content is growing significantly generation over generation
  • on Nvidia systems Astera only plays in the front-end network (GPU → CPU, storage, and networking); on the back end, where NVLink operates, Astera doesn’t play
  • on non-Nvidia systems they play on both the front end and the back end, such as with the Trainium-2 chip
  • focus on four protocols: UALink GPU to GPU, Ethernet, CXL for memory, and PCI Express for interconnecting storage and networking
  • Scorpio is the industry’s first fabric device that is developed for AI interconnects
  • greenfield use case in the non-NVIDIA ecosystem, which is where the Scorpio X-Series plays
  • compared to competitors, Astera offers a module form factor, meaning it’s not just the chip
  • the Taurus line, which competes with Credo, does the work as a chip, while Credo built a complete cable: “Credo obviously did a good job, recognizes market, and their approach was to be a cable - a complete cable”
  • ASP tends to be different (compared to Credo), “so you’ll see that reflected in what Credo announces and what we announced. But generally our business is more profitable”; their approach is much more scalable and portable
  • Leo CXL product line is being deployed on the general compute side for large database applications, have all four major hyperscalers in the US developing CXL based platforms right now
  • “We are also seeing a lot of inference use cases, where CXL benefits”
  • our chips are software defined meaning 60-70% of the chip is actually implemented in software, the benefit of a software defined architecture is that it is customizable
  • systems have become very complicated, need to monitor them in terms of telemetry, diagnostics, fleet management, and predictive failure
  • hyperscalers use COSMOS API to monitor their infrastructure, detect failures before they happen
  • we don’t discuss unreleased products, not revealing anything right now… “it’s amazing how much energy there is, how much traction and engagement we have as we grow our product lines”
  • think of the company as the connectivity fabric or connectivity subsystem for “heterogeneous compute”; the nervous system for AI is different
  • hyperscalers starting to get more vertically integrated, meaning they are doing their own chips
  • have close to 400 design wins that are for internal accelerator based platforms
  • “I’ve always said I think investors have underestimated how the hyperscalers have reacted in terms of investing in their own accelerator programs… you saw that in our numbers”
  • non-NVIDIA systems all use variants of PCI Express or other standards, “very, very fertile ground”
  • our content is getting much richer
  • our vision is to own connectivity at the rack level
  • as you move to more complex protocols and faster protocols we’ll see an ASP increase
  • $12B TAM for products (in some product categories they are the far-and-away leader, so they capture more of the TAM in that category)
  • just by using Astera’s systems, hyperscalers essentially get for free a lift from 52% GPU utilization up to 55-56% through optimization; the CEO says customers point this out to him (see the rough sketch after this list)
  • power envelope and physical size envelope to contend with, have to provide more power to compete
  • AI servers that land in the data center only work 69% of the time on day one, 31% need some tweaking, that’s how complex the systems are
  • “One single chip will have like 8 different temperature sensors. We can detect if a fan stops working in one corner of a chip, a cable is inserted but it’s not fully inserted. Something else is heating up around our chip” (they have the ability to troubleshoot thermal issues)
  • GPUs in a cluster act like one GPU; when one goes down, the job rolls back to the previous checkpoint, which takes 45 minutes today, meaning each time something goes wrong 45 minutes of compute is lost (see the rough sketch after this list)
  • three-point formula for the company: 1) listen to customers, 2) innovate, 3) execute
  • Blackwell created a lot of confusion for folks who are not familiar with Astera or sockets like retimers; the Hopper generation was simpler to analyze
  • Astera is in customized versions of NVL racks, and that is where the retimer and fabric-device content comes from; Amazon was showcasing customized GB200 servers, which are an Astera design win
  • categorically noted their retimer shipments will be bigger in 2025 than in 2024
  • Scorpio to be at least 10% of overall revenue next year, CXL to reach production in the back half of the year, and customized NVL racks to show up in the second half of the year
  • CEO, “I think the market will need to learn more about how the systems are configured and how the retimer business is working” (implying here investors/analysts still don’t understand their growth)
  • for Gen 5 PCIe retimer products, Astera has 90%+ market share
  • retimer content in Blackwell does go down, but that is more than offset by adding Scorpio content; they were telling the market overall content would go up even before Scorpio was released, and looking at the overall retimer market including ASICs, the overall opportunity goes up
  • customers actually coming to us and saying we want you to build the fix
  • on Scorpio, they hoped to be faster to market than Broadcom, which worked out; they were first to market and worked very hard to build a better product, and do expect Broadcom announcements later
  • Taurus still niche, have a lead customer
  • COSMOS software can update the firmware on the cable without bringing down the server
  • Leo CXL working with Granite Rapids CPU from Intel, Turin from AMD, equivalent ARM CPUs
  • ROI from LEO is very clear, lots of excitement from customers
  • Scorpio, market is moving so fast that we identified a sweet spot, and are addressing multiple opportunities
  • will maintain margin model of 70%+, analyst suggests Scorpio is a higher margin product
  • at some point as data rates go up, they will intersect with optics; there is a huge existing market for optics, and they are mostly in exploration in this category
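
To put some very rough numbers on the utilization and checkpoint bullets above: this is my own back-of-the-envelope sketch, not anything Astera presented. Only the 52% → 55% utilization lift and the ~45-minute rollback come from the conference notes; the fleet cost, cluster size, and GPU-hour price are assumptions I made up for illustration.

```python
# Back-of-the-envelope sketch of the utilization and checkpoint claims above.
# Only the 52% -> 55% lift and the ~45 minute rollback come from the notes;
# every dollar figure here is a made-up assumption for illustration.

GPU_FLEET_COST = 1_000_000_000   # assumed $1B spent on GPUs (hypothetical)
UTIL_BEFORE = 0.52               # utilization figure quoted by the CEO
UTIL_AFTER = 0.55                # low end of the quoted 55-56% lift

effective_before = GPU_FLEET_COST * UTIL_BEFORE
effective_after = GPU_FLEET_COST * UTIL_AFTER
print(f"Utilization lift worth ~${effective_after - effective_before:,.0f} "
      f"of effective compute on a ${GPU_FLEET_COST:,.0f} fleet")

# Checkpoint rollback: every failure costs ~45 minutes of the whole cluster.
CLUSTER_GPUS = 100_000           # assumed cluster size (hypothetical)
GPU_HOUR_COST = 2.50             # assumed $/GPU-hour (hypothetical)
ROLLBACK_HOURS = 45 / 60         # the ~45 minutes cited at the conferences

cost_per_failure = CLUSTER_GPUS * GPU_HOUR_COST * ROLLBACK_HOURS
print(f"Each rollback costs roughly ${cost_per_failure:,.0f} of cluster time")
```

On those assumptions the utilization lift alone is worth about $30M of effective compute per $1B of GPUs, and every rollback burns close to $190K of cluster time, which is why the telemetry and fleet-management bullets keep coming up.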

One of the most impressive aspects of Astera Labs is that each time they have an earnings call or any sort of presentation, there is new information revealing even more upside to the company. It is simply incredible that Astera replaced Broadcom in the UALink consortium and is now the clear leader in defining the standards that hyperscalers will use for building out their next generation of chips, such as Amazon’s Trainium-2. I do believe that Astera can out-innovate Broadcom and has the possibility to reach or surpass the scale of Broadcom, which would present huge upside to investors.

58 Likes

That’s a nice spin attempt, but the real reason for UALink is to avoid making Nvidia the sole supplier for networking. The hyperscalers want to reduce costs, and that means multiple suppliers.

Dylan Patel had an interesting side comment in the BG2 episode discussed in the other thread here (start at 1:15:22). It’s that Amazon’s Trainium-2 chip is an “Amazon Basics” for AI, and Amazon’s tech requires active cabling instead of passive cabling. Astera Labs is a leader in active cabling, so that’s why we hear so much about internal ASIC accelerator development, and Amazon in particular, among Astera’s design wins. Amazon is OK paying for active cabling since they’ve got their per-chip cost down to about $5k, versus $30k and up for Nvidia.
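
A quick sketch of what that cost gap implies: the cluster size and the work-per-chip ratio below are purely my assumptions (Patel only gave the ~$5k vs ~$30k per-chip figures), so treat this as illustration rather than analysis.

```python
# Rough illustration of the $5k-vs-$30k per-chip point above.
# Cluster size and the efficiency ratio are assumptions, not from the podcast.

TRAINIUM_COST = 5_000    # ~$5k per Trainium-2 chip, per the podcast comment
NVIDIA_COST = 30_000     # "$30k and up" per Nvidia chip
CHIPS = 100_000          # hypothetical cluster size

print(f"Trainium silicon bill: ${TRAINIUM_COST * CHIPS / 1e9:.1f}B")
print(f"Nvidia silicon bill:   ${NVIDIA_COST * CHIPS / 1e9:.1f}B")

# Even assuming it takes 3 Trainium chips to match one Nvidia chip's work
# (a made-up ratio), the silicon bill stays lower, leaving room in the budget
# for things like active cabling.
EFFICIENCY_RATIO = 3
equivalent_work_cost = TRAINIUM_COST * CHIPS * EFFICIENCY_RATIO / 1e9
print(f"Trainium cost for equivalent work: ${equivalent_work_cost:.1f}B "
      f"vs ${NVIDIA_COST * CHIPS / 1e9:.1f}B")
```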

What they’re really saying here is that companies that choose to stay within the Nvidia ecosystem (eg complete racks) get Nvidia’s NVLink connectivity. So no or little UALink there. But Astera Labs still gets the retimer business within Nvidia (even though that’s shrinking with Blackwell over Hopper, I wouldn’t be surprised to see volumes higher overall). Hence, Astera makes more on the “ASIC-based platforms” (such as Amazon’s Trainium) than on the Nvidia platforms.

Now, don’t get me wrong - I actually like ALAB as an investment and have been increasing my ownership. This isn’t a winner-take-all market, and both Nvidia selling chips/systems as well as the hyperscalers doing their own designs for internal use will grow. So, Astera Labs grows either way.

45 Likes

Where is the spin if Astera is working with the hyperscalers for a competing standard against NVLink?

Here’s what you wrote in the previous ALAB thread about NVLink,

So what changed in the past three months where you are now saying it won’t be a winner take all for networking? The stock price of ALAB has tripled since you were making the case there would be no challenges to NVLink.


Dylan spent most of the time on the pod making a case for a winner-take-all market with Nvidia, repeating that Nvidia is 98% of the market for public AI. He mentions that Google has TPUs which account for a big part of the market, but failed to mention that all the hyperscalers are working on custom ASICs.

For an hour-and-a-half podcast which was supposed to be a deep dive on the competition, it seems any competition was downplayed, whether it was AMD or Amazon’s Trainium. He basically said AMD has zero chance, and Trainium only has a cost advantage. There was no mention of UALink, Astera, or Credo. His viewpoint is far from a comprehensive view of the ecosystem and what competition is emerging.

As the CEO said, “investors have underestimated how the hyperscalers have reacted in terms of investing in their own accelerator programs”, and “I think the market will need to learn more about how the systems are configured”

28 Likes

From your OP in this thread, from the conference:

When you buy Nvidia boards, you get Nvidia networking within that board, whether it’s multiple GPUs or a combination of a GPU with a CPU. Astera doesn’t play there - only Nvidia controls that, obviously, and it’s going to stay that way. If you buy an Nvidia-based rack (eg NVL36 or NVL72), internal to that rack is all Nvidia networking.

Where UALink can come into play is between racks. Re-read what you quoted from me in that context and I hope it makes sense.

Where the spin comes in is the claim that this technology is so critical even the hyperscalers are joining. Again, the real truth is that everyone is scared of being beholden to Nvidia for not just the chips, not just the software, but also the server to server networking. Nvidia currently has a lock on the chips and software, so networking is one place an alternate standard could survive, if its performance is good enough.

He discussed Google and Amazon quite a bit, pointing out that Google and Microsoft are using their chips mostly for their internal AI applications, as that’s their largest profit margin opportunity. And with Amazon, he talks about how Anthropic is building a 400,000 Trainium chip super-computer. And he briefly mentions Meta’s and Apple’s custom ASIC development - saying that some of these won’t hit until 2026 and even then could be failures (like Microsoft’s, he says, without providing details). Patel makes the case that Broadcom will be the biggest winner when it comes to custom ASICs because they all will need networking and only Broadcom can challenge Nvidia’s NVSwitch.

He also points out that Google is currently dependent on Broadcom for networking for its TPUs. But he thinks Google is trying to have an alternative there as well.

Yeah, unfortunately the interviewers didn’t skew questions to the “inside baseball” level given the time constraints they had.

15 Likes

What you are writing here is in direct contradiction with what was said at the conferences. As I explained in that bullet point, on the Nvidia-based systems they have content in the “front-end network”.


Astera is the one that is using the term “critical” and mentioned that hyperscalers typically don’t join these types of consortiums. It seems pretty obvious they are creating the consortium to counter Nvidia in the networking segment.

It is unclear to me why you have any position in Astera if you view the company as spinning a false narrative.

If Broadcom is the clear winner here, why did Astera replace them in the UALink consortium? Again, if you agree with this viewpoint he has about Broadcom being the big winner for hyperscaler ASICs, it makes no sense to be investing in Astera.

Dylan’s site has articles on Astera, and he was brought in as the expert on public-market semiconductor companies. The original topic of discussion was what’s going on with the public markets and semiconductor industry competition. Astera is one of the most hyped IPOs in the semi industry this year, and Credo just posted a phenomenal guide, so it’s a big oversight that they got no mention on the podcast. I’m not buying that an hour and a half was not enough time, as they let Dylan speak uninterrupted for a lot of it. Unfortunately, a lot of times guests like this just talk their own book, which is probably Nvidia, Google, and Broadcom.

14 Likes

Yeah, for the retimers, not for the kind of networking we’re talking about with NVSwitch, UALink, etc.

If I avoided all companies that hype up the product and company position I wouldn’t be able to own shares of any company.

I can’t find a second source for this claim. Do you have one?

Fine. I’m buying it. They liked letting Patel talk, as he was saying interesting things, but by the time they started talking beyond Nvidia, there was almost no time left.

6 Likes

Broadcom removed themselves. Note, all the hyperscalers are developing custom ASICs with Broadcom, because BRCM has the best SerDes today. Broadcom thus has little incentive to lead a standard that will attempt to (in practice) reduce their moat.

That won’t preclude them from developing chips (or IP) that meet the standard. They just have a bit of a conflict of interest.

26 Likes

I’m less sanguine. I do think Astera can beat Broadcom in a few products specific to CPU/GPU interconnects and even some server to server active cabling, but, just like Arista beats Cisco in its relatively narrow target market, Broadcom will continue to have a wider variety of products in a wider variety of markets. For Astera to branch out to cover what Broadcom does is a tall order. Possible, sure, but would take several years, with numerous risks along the way.

I do agree with @MFChips’ assessment that Broadcom dropping out of the UALink consortium after being one of the founding members is indicative of them increasing their offerings, not lessening them. It’s actually a pretty savvy strategy - help start the consortium to get people aligned with an alternative standard to Nvidia, then drop out to out-compete within that newly established standard.

Of course, the really savvy player in all this was Jensen Huang buying Mellanox back in the day when he realized AI compute was coming and needed better networking. Here’s a contemporaneous article:

“Datacenters are the most important computers in the world today, and in the future – as the workloads continue to change triggered by artificial intelligence, machine learning, data analytics and data sciences – future datacenters of all kinds will be built like high performance computers,” said Huang.
“We believe that in future datacenters, the compute will not start and end at the server, but the compute will extend into the network. And the network itself, the fabric, will become part of the computing fabric. Long-term, I think we have the opportunity to create datacenter-scale computing architectures; short-term, Mellanox’s footprint in datacenters is quite large. […] We will be in position to address this large market opportunity much better,” he said.

Of course, those were the days when “AI” wasn’t yet the magic buzzword.

16 Likes

Like I wrote above, Astera is already beating Broadcom to market with products like Scorpio, where they expect Broadcom to enter the market but it hasn’t yet.

Again I would ask why you have any position in Astera if you keep making bear cases against them in the thread?

How is it savvy to start the consortium and then drop out as some part of a master strategy? They no longer have as much say about the way the standards are going to go, and Astera will have more influence than Broadcom.

It doesn’t seem like you are applying the same scrutiny to Nvidia that you are applying to Astera where it’s only optimistic outcomes for Nvidia and bearish outcomes for Astera.

You’ve warned against buying into the hype or spin on Astera, but just recently you were writing there won’t possibly be any competitors to NVLink which is buying into the Nvidia hype.

11 Likes

There are nuances in my postings and positions that you’re apparently overlooking. Just as I made money in Arista without ever thinking they were going to be as big as Cisco, it’s possible I could make money in Astera without them getting as big as Broadcom.

I already stated why I think it’s a savvy move: the consortium was important to get people seeing there could be alternatives to Nvidia networking and to get some technical solutions agreed upon (standards are important; even Nvidia’s networking adheres to the InfiniBand standard). But, now that UALink is seen as a viable alternative, Broadcom wants to be free to offer products that are UALink+ (not a real trademark). Kind of like what Microsoft tried with HTML5 back in the browser war days.

Maybe re-read my posts for the subtleties. Saying Astera is unlikely to become as big as Broadcom (the 9th most valuable company today!) any time soon is not bearish, as you’re making it out to be.

I thought I explained this previously. And one needs to be careful about terminology (which I admit sometimes I’m not). This post you keep harping on about was trying to say that the networking on the boards Nvidia produces will always be the networking Nvidia chooses, and there’s no competition for that.

That said, Nvidia’s NVSwitch products are certainly something that can, and do, have competition from UALink. And we have to be careful not to confuse UALink with Ultra Ethernet, the latter of which competes with InfiniBand. And competition like CXL is just not relevant anymore with its 4096-node limit in a world of 100K AI nodes today, lol.

Your “Nvidia hype” seems something you’re dreaming up - they know well about competition, and have products like Spectrum-X, which is their Ethernet-based solution. A little discussion of this is here:

While InfiniBand currently has the edge in the data center networking market, several factors point to increased Ethernet adoption for AI clusters in the future…
By 2028, it’s estimated that: 1] 45% of generative AI workloads will run on Ethernet (up from <20% now) and 2] 30% will run on InfiniBand (up from <20% now).

It would be interesting to know just how dependent on Amazon’s ASICs Astera Labs’ business is today. Not just how close to the 10% disclosure threshold, but also what other ASIC developments are using - or intending to use - Astera. It appears Amazon sees shifting costs to Astera as overall cheaper than doing what Nvidia and perhaps other ASIC developers are doing. Is that something specific to Amazon (and its investment in ALAB), or is it something that others will adopt as well? This kind of chip development inside baseball is really hard for us outsiders to get a good handle on. Dylan Patel’s comments on the BG2 podcast (and I assume his paid service) are probably the closest we can get.

22 Likes

Nvidia defines NVLink as: "a 1.8TB/s bidirectional, direct GPU-to-GPU interconnect that scales multi-GPU input and output (IO) within a server." (my emphasis).

Indeed, if you watch Jensen’s CEO keynote presentation, you see that NVLink is built into both the highest end NVLink72 (aka NVL72) and the new lowest end “Digits”:

NVL72:
72 Blackwell GPUs connected via 18 NVLink Switches, also has 72 Connect-X NICs (NIC = Network Interface Card)

Project Digits:
An ARM-based CPU and a Blackwell GPU connected internally via NVLink on a single board (GB10). For $3,000.

And today, for just one example, there are 4 NVSwitches in the HGX H200 (8 GPUs) - on the motherboard. Picture here:

These are all on-board chips, so if you’re not happy with how the CPUs and GPUs are connected internally, you’re building your own boards with whatever networking chip you want instead.

OK, that’s NVLink, what about NVLink Switch (often just called NVSwitch)? From Nvidia:

“The NVIDIA NVLink Switch chips connect multiple NVLinks to provide all-to-all GPU communication at full NVLink speed within a single rack and between racks.”

So if you didn’t want to use Nvidia networking, you instead use UALink within a rack and Ultra Ethernet between racks. Getting Ultra Ethernet as the interface between racks can help data centers standardize on ethernet for all connectivity, but that doesn’t say anything about UALink - you could still use NVLink internally (scale up) and Ultra Ethernet externally (scale out). So, why UALink?

UALink has an advantage in that it supports more nodes (1,024) than NVLink currently does. NVLink itself only supports 18, but with NVSwitches on board (as NVL72 and NVL36 have), that extends to 576, so just over half of UALink’s limit. The question becomes how much do you do in scale-up versus scale-out? If you’re trying for a 100k GPU cluster, what’s the right balance? That’s beyond my knowledge.
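
To make that question concrete, here is a trivial back-of-the-envelope using the node limits above: how many scale-up “islands” a hypothetical 100k-accelerator cluster would need under each limit. Nothing here is from the conferences beyond the 576 and 1,024 figures.

```python
# How many scale-up domains a 100k-accelerator cluster needs under each limit.
# Pure division; no modeling of bandwidth, topology, or cost.
import math

TARGET_ACCELERATORS = 100_000   # the hypothetical 100k GPU cluster posed above
DOMAIN_LIMITS = {
    "NVLink + NVSwitch": 576,   # limit cited above for NVL-style systems
    "UALink": 1024,             # UALink node limit cited above
}

for name, limit in DOMAIN_LIMITS.items():
    domains = math.ceil(TARGET_ACCELERATORS / limit)
    print(f"{name:18s}: {domains} scale-up domains to stitch together "
          f"over the scale-out (Ultra Ethernet / InfiniBand) network")
```

Fewer, larger scale-up domains means less traffic has to cross the slower scale-out network, which is presumably part of why the node limit matters.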

But, it strikes me that the increased node limit for UALink better suits the non-Nvidia, internally developed ASIC chips that Amazon, Google, Microsoft, etc. are attempting. Those aren’t as fast as Nvidia’s, so you need more of them, so maybe they’re going to build servers with lots more non-Nvidia chips using UALink.

As for Astera Labs, I think it’s important to note that their retimers work with both NVLink and UALink, although the fewer GPUs you have and the faster the native connectivity between them, the less need there is for retimers. On the other hand, as data centers scale out, the distances between servers/racks increase and so the need for retimers increases. So, larger AI data centers and/or AI data centers using more GPUs because each is slower will use more retimers.

Here’s a recentish article on the new HGX B200 systems built with Blackwell chips:

These have NVSwitches on board, as well as 3 or 4 Astera Labs’ retimers.

But, it’s worth noting that SemiAnalysis says of the NVL72/NVL36 systems:

By having the ConnectX-8 ASICs extremely close to the GPUs, that would mean there is no need for retimers between the GPU and the ConnectX-8 NIC. This is unlike the HGX H100/B100/B200 which requires retimers to go from the HGX baseboard to the PCIe switch.

This is from where the concern about Astera Labs’ retimer business comes - with the redesign of the boards containing Blackwell chips for the top-of-the-line Nvidia servers eliminating the need for Astera Labs’ retimers, that business may go down. I guess it depends on the product mix, and what other server companies do when they build their own boards with Blackwell chips.

Separately, here’s an article on Astera Labs’ Scorpio PCIe switch, aimed at competing with Broadcom:

We were told that the Astera Labs Scorpio PCIe switches are sampling, and we were able to get actual photos of the devices. Our best sense is that they will hit the market in 2025, but these are certainly a step beyond just an idea. For Broadcom, this is another multi-billion dollar market that Astera Labs is pushing into.

For me, Astera Labs remains an evolving story. Retimer use within a server may go down, but increased use of PCIe connections between servers will probably need retimers. And Astera is doing more with PCIe, going into the switch business, so there’s potential expansion there. In the meantime, those developing their own in-house ASIC GPUs seem to be relying more on Astera Labs retimers and switches even as Nvidia seems to be becoming less reliant. Where the mix ends up is still, I believe, an unknown. I’d love to hear from anyone (@CMF_muji ?) with insights on this.

I still have a small position in ALAB while I analyze it further.

34 Likes