AMD’s AI Chip Event: Everything Revealed in 8 Minutes – Video

hintsmedia

December 7, 2023

Speaker 1: Good morning, everyone. Welcome to all of you who are joining us here in Silicon Valley and to everyone who’s joining us online from around the world. So that’s why I’m so excited today to launch our Instinct MI300X. It’s the highest-performance accelerator in the world for generative AI. MI300X is actually built on our new CDNA 3 data center architecture, and it’s optimized for performance and power efficiency. CDNA 3 has a lot of new features. It combines a new compute engine. [00:00:30] It supports sparsity and the latest data formats, including FP8. It has industry-leading memory capacity and bandwidth. And we’re going to talk a lot about memory today. And it’s built on the most advanced process technologies and 3D packaging. Now let’s talk about some of the performance and why it’s so great for generative AI. Memory capacity and bandwidth are really important for performance.

Speaker 1: If you look at MI300X, we made a very conscious decision to add [00:01:00] more flexibility, more memory capacity, and more bandwidth. And what that translates to is 2.4 times more memory capacity and 1.6 times more memory bandwidth than the competition. Now, when you run things like lower-precision data types that are widely used in LLMs, the new CDNA 3 compute units and memory density actually enable MI300X to deliver 1.3 times more teraflops of FP8 and FP16 performance than the competition. [00:01:30] And if you take a look at how we put it together, it’s actually pretty amazing. We start with four IODs in the base layer, and what we have on the IODs are 256 megabytes of Infinity Cache and all of the next-gen I/O that you need: things like 128-channel HBM3 interfaces, PCIe Gen 5 support, and our fourth-gen Infinity Fabric that connects multiple MI300Xs so that we get 896 gigabytes per second. And then we stack [00:02:00] eight CDNA 3 accelerator chips, or XCDs, on top of the IODs, and that’s where we deliver 1.3 petaflops of FP16 and 2.6 petaflops of FP8 performance.

Speaker 1: And then we connect these 304 compute units with dense through-silicon vias, or TSVs, and that supports up to 17 terabytes per second of bandwidth. And of course, to take advantage of all of this compute, we connect eight stacks of HBM3 [00:02:30] for a total of 192 gigabytes of memory at 5.3 terabytes per second of bandwidth. That’s a lot of stuff on that. What you see here is eight MI300X GPUs, and they’re connected by our high-performance Infinity Fabric in an OCP-compliant design. Now, what makes that special? So this board actually drops right into any OCP-compliant design, which is the majority of AI systems today. And we did this for [00:03:00] a very deliberate reason. We want to make this as easy as possible for customers to adopt. So you can take out your other board and put in the MI300X Instinct platform. And if you take a look at the specifications, we actually support all of the same connectivity and networking capabilities of our competition. So PCIe Gen 5, support for 400 gig Ethernet, that 896 gigabytes per second of total system bandwidth, but all of that is with 2.4 times [00:03:30] more memory and 1.3 times more compute per server than the competition. So that’s really why we call it the most powerful gen AI system in the world.
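
As a rough check on the figures quoted above, the per-component numbers can be worked out with simple division. The sketch below uses only totals stated in the talk and assumes they are split evenly across XCDs, HBM3 stacks, and GPUs; the talk does not give per-XCD or per-stack figures, so the results are illustrative arithmetic, not vendor specifications.

```python
# Totals quoted in the talk for a single MI300X.
XCDS = 8                   # accelerator chips stacked on the IODs
COMPUTE_UNITS = 304        # total compute units
HBM_STACKS = 8             # HBM3 stacks
HBM_CAPACITY_GB = 192      # total memory capacity
HBM_BANDWIDTH_TBPS = 5.3   # total memory bandwidth
GPUS_PER_PLATFORM = 8      # GPUs on the OCP-compliant board


def per_unit(total, count):
    """Split a total evenly across `count` units (an assumption, not a spec)."""
    return total / count


cus_per_xcd = per_unit(COMPUTE_UNITS, XCDS)
gb_per_stack = per_unit(HBM_CAPACITY_GB, HBM_STACKS)
tbps_per_stack = per_unit(HBM_BANDWIDTH_TBPS, HBM_STACKS)
platform_memory_gb = HBM_CAPACITY_GB * GPUS_PER_PLATFORM

print(f"Compute units per XCD: {cus_per_xcd:.0f}")
print(f"Memory per HBM3 stack: {gb_per_stack:.0f} GB")
print(f"Bandwidth per HBM3 stack: {tbps_per_stack:.2f} TB/s")
print(f"Memory on an 8-GPU platform: {platform_memory_gb} GB")
```

Under the even-split assumption this gives 38 compute units per XCD, 24 GB per HBM3 stack, and 1,536 GB of memory across the eight-GPU platform.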

Speaker 2: We architected ROCm to be modular and open source to enable very broad user accessibility and rapid contribution by the open source community and AI community. Open source and the ecosystem are really integral to our software strategy. And in fact, really open is integral to our overall strategy. This [00:04:00] contrasts with CUDA, which is proprietary and closed. Now the open source community, everybody knows, moves at the speed of light in deploying and proliferating new algorithms, models, tools, and performance enhancements. And we are definitely seeing the benefits of that in the tremendous ecosystem momentum that we’ve established. So I’m really super excited that we’ll be shipping ROCm 6 later this month. I’m really proud of what the team has done with this really big release. ROCm 6 has been optimized for [00:04:30] gen AI, particularly large language models. It has powerful new features, library optimizations, and expanded ecosystem support, and it increases performance by factors. It really delivers for AI developers. ROCm 6 supports FP16, BF16, and the new FP8 data type for higher performance while reducing both memory and bandwidth needs. We’ve incorporated advanced graph and kernel optimizations [00:05:00] and optimized libraries for improved efficiency. We’re shipping state-of-the-art attention algorithms like FlashAttention-2 and PagedAttention, which are critical for performant LLMs and other models.
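
To make the point about lower-precision data types reducing memory needs concrete, here is a minimal sketch of weight storage per format. FP16 and BF16 use 2 bytes per value and FP8 uses 1 byte. The 70-billion-parameter model is a hypothetical example that is not mentioned in the talk, and the estimate covers weights only, ignoring activations and KV cache.

```python
# Bytes per stored value for the data types named in the talk.
BYTES_PER_VALUE = {"FP16": 2, "BF16": 2, "FP8": 1}

MI300X_MEMORY_GB = 192  # capacity quoted in the talk


def weights_gb(num_params, fmt):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * BYTES_PER_VALUE[fmt] / 1e9


# Hypothetical model size, chosen only for illustration.
HYPOTHETICAL_PARAMS = 70_000_000_000

for fmt in ("FP16", "BF16", "FP8"):
    size = weights_gb(HYPOTHETICAL_PARAMS, fmt)
    fits = size <= MI300X_MEMORY_GB
    print(f"{fmt}: {size:.0f} GB of weights, fits in one GPU: {fits}")
```

Halving the bytes per value halves both the storage for the weights and the data that has to move through memory for each pass over them, which is the memory and bandwidth saving the speaker describes.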

Speaker 3: In 2021, we delivered the MI250, introducing third-generation Infinity architecture. It connected an EPYC CPU to the MI250 GPU through a high-speed bus, Infinity Fabric, that allowed the CPU and the GPU to share a coherent memory space [00:05:30] and easily trade data back and forth, simplifying programming and speeding up processing. But today we’re taking that concept one step further, really to its logical conclusion, with the fourth-generation Infinity architecture, bringing the CPU and the GPU together into one package sharing a unified pool of memory. This is an APU, an accelerated processing unit. And I’m very proud to say that [00:06:00] the industry’s first data center APU for AI and HPC, the MI300A, began volume production earlier this quarter and is now being built into what we expect to be the world’s highest-performing system. And let’s talk about that performance: 61 teraflops of double-precision floating point (FP64) and 122 teraflops of single precision, combined with 128 gigabytes of HBM3 [00:06:30] memory at 5.3 terabytes a second of bandwidth. The capabilities of the MI300A are impressive, and they’re impressive too when you compare it to the alternative. When you look at the competition, MI300A has 1.6 times the memory capacity and bandwidth of Hopper. For low-precision operations like FP16, the two are at parity in terms of computational performance, but where precision [00:07:00] is needed, MI300A delivers 1.8 times the double- and single-precision (FP64 and FP32) floating-point performance.
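
The comparison above is given only as ratios, so the competitor figures it implies can be backed out by division. The sketch below is plain arithmetic on the numbers quoted in this paragraph; the resulting baselines are implied values, not published specifications for any product, and the speaker's ratios are rounded.

```python
# MI300A figures quoted in the talk.
MI300A = {
    "memory_gb": 128,
    "bandwidth_tbps": 5.3,
    "fp64_tflops": 61,
    "fp32_tflops": 122,
}

# "Times the competition" ratios quoted in the talk.
RATIOS = {
    "memory_gb": 1.6,
    "bandwidth_tbps": 1.6,
    "fp64_tflops": 1.8,
    "fp32_tflops": 1.8,
}


def implied_baseline(value, ratio):
    """Baseline implied by 'value is ratio times the baseline'."""
    return value / ratio


implied = {key: implied_baseline(MI300A[key], RATIOS[key]) for key in MI300A}

for key, value in implied.items():
    print(f"Implied competitor {key}: {value:.1f}")

# Single precision is quoted at exactly twice the double-precision rate.
fp32_to_fp64 = MI300A["fp32_tflops"] / MI300A["fp64_tflops"]
print(f"FP32 to FP64 ratio: {fp32_to_fp64:.1f}")
```

The implied memory baseline comes out at roughly 80 GB, and the implied FP64 baseline at roughly 34 teraflops.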

Speaker 1: So today I’m very happy to say that we’re launching our Hawk Point Ryzen 8040 series mobile processors. And thank you.

Speaker 1: Hawk Point combines all of our industry-leading performance and battery life, [00:07:30] and it increases AI TOPS by 60% compared to the previous generation. So if you just take a look at some of the performance metrics for the Ryzen 8040 series, if you look at the top of the stack, so Ryzen 9 8945, it’s actually significantly faster than the competition in many areas, delivering more performance for multi-threaded applications, 1.8x higher frame rates for games, and 1.4x faster performance across content creation applications. A very, [00:08:00] very special thank you to all of our partners who joined us today, and thank you all for joining us.