Nvidia CEO Jensen Huang unveils next-gen ‘Blackwell’ chip family at GTC

Nvidia CEO Jensen Huang on Monday presided over the AI chipmaker’s first technology conference held in person since the COVID-19 pandemic, the GPU Technology Conference, or GTC, in San Jose, California, and unveiled the company’s new design for its chips, code-named “Blackwell.”

Many consider GTC to be the “Woodstock of AI” or the “Lollapalooza of AI.” “I hope you realize this is not a concert,” Huang said following big applause at the outset. He called out the vast collection of partners and customers in attendance.

“Michael Dell is sitting right there,” Huang said, noting the Dell founder and CEO was in the audience.

Huang emphasized the scale of computing required for training large language models of generative AI, or GenAI. A model that has trillions of parameters, combined with training data that is trillions of “tokens,” or word-parts, would require “30 billion quadrillion floating point operations,” or 30 billion petaFLOPS, Huang noted. “If you had a petaFLOP GPU, you would need 30 billion seconds to go compute, to go train that model. 30 billion seconds is roughly 1,000 years.”

“I’d like to do it sooner, but it’s worth it. That’s usually my answer,” Huang quipped.

Nvidia’s H100 GPU, the current state-of-the-art chip, delivers on the order of 2,000 trillion floating-point operations per second, or 2,000 TFLOPS. A thousand TFLOPS is equal to one petaFLOP; ergo, the H100 and its sibling, the H200, can manage only a couple of petaFLOPS, far below the 30 billion to which Huang referred.

“What we need are bigger GPUs. We need much, much bigger GPUs,” he said.
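Huang's arithmetic can be checked in a few lines of Python. This is only a rough sketch: it assumes a single GPU running at its full quoted rate the whole time and a 365.25-day year, and the names `TOTAL_OPS`, `training_years`, and `tflops_to_petaflops` are invented for the illustration.

```python
# Assumption: a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# "30 billion quadrillion" floating-point operations, the figure Huang cited.
TOTAL_OPS = 30e9 * 1e15


def tflops_to_petaflops(tflops: float) -> float:
    """Convert TFLOPS to petaFLOPS (1,000 TFLOPS = 1 petaFLOPS)."""
    return tflops / 1000.0


def training_years(total_ops: float, petaflops: float) -> float:
    """Years needed to perform total_ops at a sustained rate given in petaFLOPS."""
    ops_per_second = petaflops * 1e15
    return total_ops / ops_per_second / SECONDS_PER_YEAR


# A hypothetical 1-petaFLOPS GPU, as in Huang's example.
one_pf_years = training_years(TOTAL_OPS, 1.0)

# An H100-class GPU at the roughly 2,000 TFLOPS quoted in the article.
h100_years = training_years(TOTAL_OPS, tflops_to_petaflops(2000))

print(f"1 petaFLOPS GPU: {one_pf_years:,.0f} years")
print(f"2 petaFLOPS GPU: {h100_years:,.0f} years")
```

Under these assumptions the one-petaFLOPS case comes out at roughly 950 years, in line with Huang's "roughly 1,000 years," and a two-petaFLOPS chip only halves that.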

Blackwell, known in the industry as “Hopper Next,” can perform 20 petaFLOPS per GPU. It is meant to be delivered in an 8-way system, an “HGX” circuit board of the chips.

Using “quantization,” a kind of compressed math in which each value in a neural network is represented using fewer decimal places, called “FP4,” the chip can run as many as 144 petaFLOPS in an HGX system.
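To make the idea of quantization concrete, here is a minimal Python sketch of rounding values to a 4-bit floating-point grid. It rests on assumptions the article does not confirm: it uses the E2M1 layout (one sign bit, two exponent bits, one mantissa bit) described in the Open Compute Project's Microscaling formats specification, and it replaces the per-block scaling of real hardware with one shared scale for the whole list. The names `FP4_MAGNITUDES` and `quantize_fp4` are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (assumed layout, see above).
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values):
    """Round each value to the nearest FP4 magnitude, using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Scale so the largest input lands on the largest representable magnitude.
    scale = max_abs / FP4_MAGNITUDES[-1] if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        target = abs(v) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        quantized.append((nearest if v >= 0 else -nearest) * scale)
    return quantized


weights = [0.82, -0.11, 0.40, -1.2, 0.03]
approx = quantize_fp4(weights)

for original, rounded in zip(weights, approx):
    print(f"{original:+.2f} -> {rounded:+.2f}")
```

Each value now needs only four bits of storage, at the cost of rounding error: small weights such as 0.03 collapse to zero in this toy example.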

The chip has 208 billion transistors, Huang said, using a custom semiconductor manufacturing process at Taiwan Semiconductor Manufacturing known as “4NP.” That is more than double the 80 billion in Hopper GPUs.

Blackwell can run large language models of generative AI with a trillion parameters 25 times faster than prior chips, Huang said.

The chip is named after David Harold Blackwell, who, Nvidia relates, was “a mathematician who specialized in game theory and statistics, and the first Black scholar inducted into the National Academy of Sciences.”

The Blackwell chip uses a new version of Nvidia’s high-speed networking link, NVLink, which delivers 1.8 terabytes per second to each GPU. A discrete part of the chip is what Nvidia calls a “RAS engine,” to maintain “reliability, availability and serviceability” of the chip. A set of decompression circuitry improves performance of tasks such as database queries.

Amazon Web Services, Dell, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and xAI are among Blackwell’s early adopters.

As with its predecessors, two Blackwell GPUs can be combined with one of Nvidia’s “Grace” microprocessors to produce a combined chip, called the “GB200 Grace Blackwell Superchip.”

Thirty-six of the Grace processors and 72 of the GPUs can be combined in a rack-based computer Nvidia calls the “GB200 NVL72,” which can perform 1,440 petaFLOPS, getting closer to the 30 billion petaFLOPS Huang cited.
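The rack-level figure follows from the per-GPU number quoted earlier, and it shows how far one rack moves the training-time estimate. The sketch below is illustrative only: it assumes every GPU sustains the quoted 20 petaFLOPS with perfect scaling across the rack, which real workloads will not achieve, and the variable names are made up for this example.

```python
GPUS_PER_RACK = 72
PETAFLOPS_PER_GPU = 20        # per-GPU Blackwell figure quoted in the article
TOTAL_OPS = 30e9 * 1e15       # "30 billion quadrillion" operations, per Huang

# Aggregate rate of one rack, assuming perfect scaling across all 72 GPUs.
rack_petaflops = GPUS_PER_RACK * PETAFLOPS_PER_GPU

# Time to perform the whole workload at that sustained rate.
seconds = TOTAL_OPS / (rack_petaflops * 1e15)
days = seconds / 86400

print(f"Rack throughput: {rack_petaflops:,} petaFLOPS")
print(f"Idealized training time: {days:,.0f} days")
```

Under these idealized assumptions, a single rack would cut the job from centuries to roughly 240 days.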

A new system for the chips, the DGX SuperPOD, combines “tens of thousands” of the Grace Blackwell Superchips, boosting the operations per second even more.

Alongside Blackwell, Nvidia made several additional announcements:

New generative AI algorithms to enhance its existing library of semiconductor design algorithms known as “cuLitho,” referring to photolithography used in the semiconductor design process. The GenAI code generates an initial “photomask” for lithography, which can then be refined by traditional methods. It speeds up the design of such photomasks by 100%. TSMC and chip-design software maker Synopsys are integrating cuLitho and the new GenAI capabilities into their technologies.

A new line of network switches and network interface cards based on the InfiniBand technology developed by Nvidia’s Mellanox operation, the “Quantum-X800 InfiniBand,” and on the Ethernet networking standard, the “Spectrum-X800 Ethernet.” Both technologies deliver 800 billion bits per second, or 800Gbps. Nvidia says the switches and NICs are “optimized for trillion-parameter GPU computing” to keep up with the speed of floating-point operations of the chips.

A catalog of 25 “microservices,” cloud-based application container software pre-built for individual applications, including custom AI models, built on top of Nvidia’s “NIM” container software suite, which is in turn part of the company’s AI Enterprise software offering. The programs are what the company describes as a “standardized path to run custom AI models optimized for Nvidia’s CUDA installed base of hundreds of millions of GPUs across clouds, data centers, workstations and PCs.” The microservices include a bundle of life sciences-focused programs, some dedicated to “generative biology” and chemistry and “molecular prediction” tasks, to perform “inference,” the generation of predictions, “for a growing collection of models across imaging, medtech, drug discovery, and digital health.” The microservices are made available through Dell and other vendors’ systems and through public cloud services including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, and they can be trialed on Nvidia’s own cloud service.

Earth-2, a separate microservice designed as a “digital twin” simulation of extreme weather conditions, meant to “deliver warnings and updated forecasts in seconds compared to the minutes or hours in traditional CPU-driven modeling.” The technology is based on a generative AI model built by Nvidia called “CorrDiff,” which can generate “12.5x higher resolution images” of weather patterns “than current numerical models 1,000x faster and 3,000x more energy efficiently.” The Weather Company is an initial user of the technology.

In addition to the product and technology announcements of its own, Nvidia announced several initiatives with partners:

A collaboration with Oracle for “sovereign AI” to run AI programs locally, “within a country’s or organization’s secure premises.”

A new supercomputer for Amazon AWS, called “Ceiba,” built from DGX systems running the Blackwell chips.

A partnership with Google Cloud to extend the JAX programming framework to the Nvidia chips, “widening access to large-scale LLM training among the broader ML community.”

More news can be found in the Nvidia newsroom.

You can catch the full keynote address on replay on YouTube.