I’m trying to figure out if the announcement of Nvidia’s Spectrum-X product, which enables their GPUs to work better with Ethernet, is good news or bad news for ANET. I’m not sure if it makes ANET’s products obsolete (it competes with and can replace them) or if it helps ANET and increases their TAM. Hoping someone with a better understanding of network architecture can help.
I would say it is good news for ANET. NVDA also has a non-Ethernet solution, but that doesn’t matter because ANET will work with either one. I am sure NVDA will give the specifications for both products to ANET because they do not want to go into the router or networking business, just like they did not want to go into the server business, so they gave the specifications to Dell and SMCI.
Andy
Well, InfiniBand is already a public specification. Nvidia’s products are the leaders in that space, so much so that some customers are hesitant to use it since there aren’t good second-sourcing options. Nvidia agreed to buy InfiniBand leader Mellanox back in 2019 (the deal closed in 2020), as Huang was already seeing that networking bandwidth was going to be key for AI data flows. That was pretty expensive (they had to beat out other companies), but clearly was a very good acquisition (all cash, not stock, btw).
ANET doesn’t work with InfiniBand. They’re Ethernet only, as is just about everyone else.
Nvidia’s Spectrum-X products (Ethernet switches, BlueField-3 SuperNIC, and LinkX transceivers and cables) are hardware products and unique to Nvidia, although compatible, as the name implies, with Ethernet. This is Huang pushing on multiple fronts. He’s already acknowledged that the upcoming “Ultra Ethernet” standard may be suitable for AI training workloads, and is even starting to put out networking products for companies that want to stay 100% Ethernet. The Ultra Ethernet Consortium was created by Broadcom and other companies specifically to thwart the rise of Nvidia’s InfiniBand networking, so they’ve excluded Nvidia, but I suspect Nvidia can still adopt the standard when it’s time.
So, we see that Nvidia already is in the networking business. As a matter of fact, Nvidia makes over 4 times as much money in networking alone as AMD does in all of its AI products, and 2 times as much money in networking alone as ANET does in all its businesses.
Nvidia’s networking business is at a $13 billion annualized revenue run rate.
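For what it’s worth, those ratios can be turned into rough dollar figures. This is only back-of-the-envelope arithmetic on the numbers quoted above (the $13 billion run rate and the "over 4 times" and "2 times" multiples); the variable and function names are made up for illustration, and none of the inputs are independently verified.

```python
# Figures come straight from the post above; they are not independently verified.
NVDA_NETWORKING_RUN_RATE_B = 13.0  # annualized run rate, billions of USD
AMD_AI_MULTIPLE = 4.0              # "over 4 times" AMD's AI product revenue
ANET_MULTIPLE = 2.0                # "2 times" Arista's total revenue


def implied_revenue(run_rate_b: float, multiple: float) -> float:
    """Revenue implied by the claim 'run_rate is `multiple` times X'."""
    return run_rate_b / multiple


# "Over 4 times" means this is an upper bound on AMD's AI revenue, not an estimate.
amd_ai_ceiling_b = implied_revenue(NVDA_NETWORKING_RUN_RATE_B, AMD_AI_MULTIPLE)
anet_total_b = implied_revenue(NVDA_NETWORKING_RUN_RATE_B, ANET_MULTIPLE)

print(f"Implied AMD AI revenue: under ${amd_ai_ceiling_b:.2f}B annualized")
print(f"Implied ANET total revenue: about ${anet_total_b:.2f}B annualized")
```

Treat the results as a sanity check on the comparison rather than as reported financials.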
That all said, I don’t know much about how Nvidia impacts Arista. This video claims to present an understanding (and may be worth watching just to have networking and Arista’s business model explained to you), but in the end doesn’t say much about the competition:
I would think that long-term, Ethernet is still here to stay and that companies coming out with Ultra Ethernet products are going to do well. That said, Arista’s revenue growth guidance is barely double-digits, so it’s not the kind of growth company in which this board is interested.
Networking is a generic term. The OP was specifically asking about ANET, which is more in the router and switching equipment part of networking. Not routers or switches like Logitech makes, but the enterprise part of the business, which would compete with Cisco. So I do not know of any product that NVDA makes that competes with ANET.
Andy
Yes, I think it’s fair to say that almost nobody is going to look to Nvidia for networking solutions for non-AI data centers, which is where most of Arista’s business is today.
But, I also think it’s fair to say that Arista and Broadcom have their work cut out for them if they want to get a significant piece of the AI data center networking market. Considering the cost of H100s and the like, throttling those expensive GPUs with networking that is neither designed for nor really suited to AI training isn’t a smart business decision. This is where Ultra Ethernet wants to be, and will probably get there. I think the question remains whether it gets there in time, whether InfiniBand just keeps being better, or even whether Nvidia can grab a share of that AI data center Ethernet pie with its Spectrum-X products.
Basically, Nvidia won’t take away Arista’s existing business and target markets, but Arista is going to have its work cut out for it to expand into Nvidia’s existing networking market (AI data centers for training).
The article above, which came out a few days ago, is what got me thinking about this. It appears that ANET is working to get a slice of the AI networking pie. I was trying to figure out how realistic that is given Nvidia’s products.
I agree with that, but I would say that in AI training, routers and switches are not part of that portion of the network. But when they move the data from the training environment out to other users’ networks for inferencing, or to use it with the end user, they will still need the router and switch companies to perform that portion, and that is where ANET will come in. So as long as they keep building data centers they will still need to keep deploying ANET. I am not invested in ANET, although for those who are it has been a very good investment, congrats. I agree with Smorg that it will not be where the fastest part of the growth will be, although it could still be a good investment.
Andy
Came across two articles from Meta (aka Facebook) on their big Nvidia clusters and networking:
we built one cluster with a remote direct memory access (RDMA) over converged Ethernet (RoCE) network fabric solution based on the Arista 7800 with Wedge400 and Minipack2 OCP rack switches. The other cluster features an NVIDIA Quantum2 InfiniBand fabric. Both of these solutions interconnect 400 Gbps endpoints. With these two, we are able to assess the suitability and scalability of these different types of interconnect for large-scale training, giving us more insights that will help inform how we design and build even larger, scaled-up clusters in the future. Through careful co-design of the network, software, and model architectures, we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.
So Arista can be part of an Ethernet-based solution for AI, although note the work Meta had to do to make this work.
A more recent Meta blog entry:
We optimized the RoCE cluster for quick build time, and the InfiniBand cluster for full-bisection bandwidth. We used both InfiniBand and RoCE clusters to train Llama 3, with the RoCE cluster used for training the largest model. Despite the underlying network technology differences between these clusters, we were able to tune both of them to provide equivalent performance for these large GenAI workloads
With this note:
We spoke in depth about our RoCE load-balancing techniques at Networking @Scale 2023.
So, it’s doable with a bunch of cutting-edge engineering work. For this to benefit Arista, what Meta has pioneered has to become more mainstream. That said, with Nvidia now offering both Ethernet and InfiniBand solutions, it’ll be interesting to see if companies really want to get away from depending on Nvidia as much as possible. It’s going to be an interesting ride to watch.
I have Arista as a large holding so am very interested in this. Thank you for posting it!
AJ
This video, https://www.youtube.com/watch?v=wLQzaelC5PA, was shared with me. It helped me understand Nvidia’s Spectrum product versus ANET’s products and whether it would open up opportunities for ANET. As background, I’ve owned ANET since 2015. It’s done well and has grown to a 23% position for me. I’m figuring out whether to trim, sell, or leave it as is. It’s been a good bet to leave as is over the years, but the valuation is high with a 51 PE and 15.7X EV/S with a growth rate in the mid-to-high teens. I might have been willing to leave it alone if Nvidia’s Spectrum product complemented ANET’s solutions, but the above video helped me see how it competes. The global AI training data center buildout with GPUs is a new opportunity for ANET, but Nvidia will give them intense competition. They will continue to do well in the traditional CPU Ethernet market, but the AI buildout is a less significant opportunity for them than I had hoped. ANET, along with the Ultra Ethernet Consortium, needs to build an Ethernet design that optimizes how GPUs perform to compete effectively in the GenAI training space, which they are working on. They will likely make a good product, but it doesn’t appear they have it yet. I will likely trim ANET and lock in some of these gains.
I don’t see where this is a problem for ANET. ANET, if they chose, could produce routers and switches that use InfiniBand, but obviously they choose to stay with Ethernet and Ultra Ethernet. Since both are open standards they could choose to change and go with both, like Cisco has done. I am not sure how NVDA getting into switches is that big of a concern.
Andy
With individual Nvidia chips costing as much as a passenger car, and if what Huang says is true about time, I could see companies deciding to play it safe and stick with the Nvidia ecosystem for everything around the chip just to guarantee best performance with the least setup or downtime.
That said, I know there are companies with existing Ethernet infrastructure that would prefer to stick with that as they expand their AI capabilities.