AMD Advancing AI event

Running now, Lisa speaking.

Anyone else tuned in?

CPUs:

  • She tells a great story on Turin vs. Intel’s best
  • Turin servers should represent a ~7x improvement over the Intel Cascade Lake servers now coming up for retirement -- 130 Turin servers can do the work of 1K Cascade Lake servers… CIOs should like that :slight_smile:

Amin Vahdat from Google Cloud joins her on stage:

  • “Epyc adoption within Google for customer and internal workloads driven by gen over gen improvements, substantial cost improvements on conventional server workloads”
  • Google “AI Hypercomputer” architecture coming up will rely on Epyc CPUs (maybe not exclusively?)
  • Turin-based VMs will be available next year on Google Cloud

Data Center GPUs:

  • Data center accelerator TAM will grow to $500B in 2028, huge AMD growth opportunity
  • Over the last ten months, ROCm investment has enabled faster customer adoption and rollouts; MI300X inference performance has doubled through software improvements in ROCm; the Silo.AI team will help with optimization and customer adoption
  • Out-of-the-box performance is solid, but with some tuning it consistently outperforms the H100 by ~1.3x across a wide variety of use cases
  • MI325X launching today, drops into existing platforms, 40% better inference performance than H200 on e.g. Llama, “Very competitive” training performance
  • Widespread availability in Q1 from most major server OEMs
  • Oracle Cloud Senior VP of Cloud Infra Karan Batta – Oracle is deploying Epyc CPUs and Pensando DPUs; Uber currently uses AMD on OCI to serve its trip-routing infrastructure, as does the Oracle Exadata infrastructure; MI300X rollout underway; excited about the future roadmap; customers have embraced the solutions from the partnership; launching Turin compute instances next year and continuing to scale out MI300X, etc.
  • Databricks’ Naveen Rao, on the “Data Intelligence Platform” – MI300X has delivered a 50% performance improvement on critical LLM workloads; given the work in the last year, ROCm now provides easy transition and scaling, with many models running on AMD HW with no modification and optimization happening at many levels; looking forward to continued SW optimization as new HW rolls out; on the Databricks side, working on new ML models and techniques

Roadmap:

  • Annual cadence of GPUs
  • MI350 details – 3nm, the biggest generational leap in AI compute in AMD’s history; CDNA4 will provide a 35x performance uplift over CDNA3(?); leading memory capacity and bandwidth – MI355X should be 9.2 PF of compute and a 1.6x uplift over MI250
  • Recorded interview with Satya and Lisa to discuss the roadmap – “the scaling laws” are providing an abundance of compute power; silicon, algorithms, etc. are leading to a 100x improvement for each 10x of HW compute power, a really unprecedented pace of diffusion of this tech throughout the world… The AMD/MS partnership goes back a LONG time, but the last 4 years of silicon and software work going after emerging AI workloads have been hugely successful… For future innovation, performance per dollar per watt is the ultimate metric (as well as “is world GDP improving”), getting into the feedback loop that will continue to provide the 100x benefit for 10x HW performance
  • Meta partnership: using Epyc and Instinct broadly; Meta VP of Infrastructure Kevin Salvadori in person: started designing Milan into the Meta infrastructure in 2020, scaling AI deployments; Genoa and Turin were key to innovating at scale, and AMD’s next offerings will be key to deploying AI innovations at scale, with every generation bringing broader deployments; Bergamo provided huge improvements, and Meta is now running 1.5M Epyc CPUs across its infrastructure, etc.; MI300 adoption is ramping fast – MI300X has served all Llama traffic in the latest rollouts, and they’re working on moving training workloads to MI300 (PyTorch, Triton and Llama workloads); on the Instinct roadmap: “AMD is really good at listening to Meta as a partner”

AMD SVP, AI Vamsi Boppana

  • Progress with ROCm: in 2 years they have created an open software ecosystem ready to develop and deploy workloads at scale, with best performance and support for a broad ecosystem; many key open source projects, incl. models and frameworks, are being developed for AMD, compatibility is validated nightly, and new releases work on AMD on Day Zero
  • Relentless emphasis on performance work – 2.4x performance vs. last year’s ROCm via improved algorithms, parallelization, model optimization, etc.; now using Silo.AI to close the last-mile gap for customers – 300 AI experts incl. 125 AI PhDs, implementing end-to-end solutions for customers; they have also developed many European open source models on AMD HW
  • More partners in a panel discussion: the CEO of Reka.AI (multimodal AI from cloud to on-device), the CEO of Fireworks AI (tools for productionizing GenAI, a PyTorch maintainer), the CEO of Essential AI (Ashish Vaswani, co-inventor of the Transformer, targeting knowledge-work problems, excited about AMD’s ROCm and training stack), and the CEO of Lumalabs.AI (Amit – “Dream Machine,” creating videos from prompts to democratize video production) – all thanking AMD and lauding the rapid progress and overall results: per-device best-in-class performance, linear scaling, all the good stuff, inspirational yadda yadda :slight_smile:

…Trying to keep up with this but alas I need to go tend to real life for a minute. Would love to see an AI generated summary of the presentations at the event, must be some way to do that

Forrest Norrod co-presented with execs from HW partners: Dell; HPE (who announced new data center and edge products based on Turin, with “more announcements in the next couple of months”); and SuperMicro (Vik Malyala – “kick-a** solutions” based on fully optimized all-AMD stacks and ease of migration through further partnerships; announcing new H14 products which support MI325X and 5th-gen Epyc; benchmarks show up to “77%, no 73%, something like that” improvement; 4U liquid-cooled or 8U air-cooled systems, with best time to market, available for remote testing by customers today)…

Also, Lenovo with announcements for SMB and Tier 2 cloud service provider solutions, “Smarter AI for all” – 50% growth in Lenovo’s AMD Tier 2 CSP business last year, expecting 70% this year; also, AMD CPUs are the volume leader in Lenovo sales to CSPs… Spoke about a bunch of offerings for these segments, and then also slick AI PCs, Lenovo being Lenovo.

Lisa’s final topic: the AI PC, esp. the Ryzen AI Pro platform for enterprise-client Copilot+ PCs. Honestly I’m not able to keep up at this point, and it’s a nice-to-have as long as we hit data center and hyperscale.

On the whole I just see a lot of enterprise acceleration happening, which is great.


I am now! …

wanna start taking notes now that Norrod is on stage? I can’t keep up

Sorry, I kinda lost interest an hour ago. Tech details too sparsely dispersed among marketing speak. I’ll wait for your synopsis. :blush:


I did the best I could with the notes above… The main takeaways seem to be

  1. Nothing too unexpected on roadmap, just reaffirmation that things are proceeding well, as expected
  2. Lots of OEM and software vendor and hyperscaler support including in core use cases e.g. Meta, Google Cloud, Databricks, Oracle
  3. ROCm compatibility and performance have received HUGE investment with big improvements and should yield major dividends from here. Table stakes for taking a shot at Nvidia, certainly, but great to see the emphasis and the validation that it’s working.
  4. The needed work to get customers and software/hyperscaler partners succeeding is going to get done, via the Silo.AI investment – again, an absolute necessity to get customer traction against Nvidia, but great to see it happening.

So no news, probably, to move the needle on the stock vs. 24 hours ago, but no news of some shortfall that would tank the stock either. Progress is progress.

I wasn’t in a position to catch any further content that may have been streamed after this headline session. I know there were developer-oriented sessions later in the day.


Thanks. No bad news is good news as they say!