 superchip


SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips

Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang

arXiv.org Artificial Intelligence

The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates the GPU and CPU on the same package, offering unprecedented computational power. However, there has been scant research into how LLM training benefits from this new architecture. In this work, for the first time, we study offloading-based LLM training solutions for Superchips. We observe important differences between Superchips and the traditional loosely coupled GPU-CPU architecture, which necessitate revisiting prevailing assumptions about offloading. Based on these observations, we present SuperOffload, a Superchip-centric offloading system that simultaneously uses the Hopper GPU, Grace CPU, and NVLink-C2C interconnect more efficiently. SuperOffload accomplishes this via a combination of techniques, including adaptive weight offloading, bucketization repartitioning, Superchip-aware casting, speculative execution, and a highly optimized Adam optimizer for Grace CPUs. Our evaluation of SuperOffload on NVIDIA GH200 demonstrates up to 2.5x throughput improvement over state-of-the-art offloading-based systems, enabling training of models up to 25B parameters on a single Superchip while achieving high training throughput. We also extend SuperOffload with ZeRO-style data parallelism and DeepSpeed-Ulysses sequence parallelism, enabling training of a 13B model with sequence lengths up to 1 million tokens on 8 GH200s while achieving 55% MFU.
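The optimizer piece of the pattern the abstract describes can be illustrated in isolation. The sketch below is not the authors' implementation; it is a minimal pure-Python version of a CPU-resident Adam step, assuming the common offloading layout in which parameters and optimizer state live in CPU memory while the GPU streams gradients over the interconnect (NVLink-C2C on a Superchip):

```python
import math

def cpu_adam_step(params, grads, m, v, step, lr=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update performed on the CPU copy of the optimizer state.

    In an offloading setup, `params`, `m`, and `v` stay in host (CPU)
    memory across steps; only `grads` arrive from the GPU each step.
    """
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction, then the parameter update.
        m_hat = m[i] / (1 - beta1 ** step)
        v_hat = v[i] / (1 - beta2 ** step)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy loop: pretend the "GPU" produced these gradients of sum(p^2).
params = [1.0, -2.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
for step in range(1, 4):
    grads = [2 * p for p in params]
    params = cpu_adam_step(params, grads, m, v, step)
```

A production system would of course vectorize this over large tensors and overlap the gradient transfer with the update; the point here is only the division of labor between device and host.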


Isambard-AI: a leadership class supercomputer optimised specifically for Artificial Intelligence

Simon McIntosh-Smith, Sadaf R. Alam, Christopher Woods

arXiv.org Artificial Intelligence

Isambard-AI is a new leadership-class supercomputer designed to support AI-related research. Based on the HPE Cray EX4000 system, and housed in a new, energy-efficient Modular Data Centre in Bristol, UK, Isambard-AI employs 5,448 NVIDIA Grace Hopper superchips to deliver over 21 ExaFLOP/s of 8-bit floating point performance for LLM training, and over 250 PetaFLOP/s of 64-bit performance, for under 5 MW. Isambard-AI integrates two all-flash storage systems: a 20 PiByte Cray ClusterStor and a 3.5 PiByte VAST solution. Combined, these give Isambard-AI flexibility for training, inference, and secure data access and sharing. But it is the software stack where Isambard-AI will differ most from traditional HPC systems. Isambard-AI is designed to support users who may have been using GPUs in the cloud, so access will more typically be via Jupyter notebooks, MLOps tooling, or other web-based, interactive interfaces, rather than the traditional supercomputer approach of SSH-ing into a system and submitting jobs to a batch scheduler. Its stack is designed to be upgraded quickly and regularly to keep pace with the rapid evolution of AI software, with full support for containers. Phase 1 of Isambard-AI is due online in May/June 2024, with the full system expected in production by the end of the year.


Nvidia: what's so good about the tech firm's new AI superchip?

The Guardian

The chipmaker Nvidia has extended its lead in artificial intelligence with the unveiling of a new "superchip", a quantum computing service, and a new suite of tools to help develop the ultimate sci-fi dream: general-purpose humanoid robotics. Here we look at what the company is doing and what it might mean. The main announcement of the company's annual developer conference on Monday was the "Blackwell" series of AI chips, used to power the fantastically expensive datacentres that train frontier AI models such as the latest generations of GPT, Claude and Gemini. One, the Blackwell B200, is a fairly straightforward upgrade over the company's pre-existing H100 AI chip. Training a massive AI model, the size of GPT-4, would currently take about 8,000 H100 chips and 15 megawatts of power, Nvidia said – enough to power about 30,000 typical British homes.


Nvidia's Blackwell AI 'superchip' is the most powerful yet

New Scientist

Nvidia has unveiled a "superchip" for training artificial intelligence models, the most powerful it has ever produced. The US computing firm, which has recently rocketed in value to become the world's third-largest company, has not yet revealed the cost of its new chips, but observers expect a high price tag that will make them accessible to only a few organisations. The chips were announced by Nvidia CEO Jensen Huang at a press conference in San Jose, California on 18 March. He showed off the company's new Blackwell B200 graphics processing units (GPUs), each of which has 208 billion transistors – the tiny switches at the heart of modern computing devices – compared to the 80 billion transistors of Nvidia's current-generation Hopper chips. He also revealed the GB200 Grace Blackwell Superchip, which combines two of the B200 chips.


Tech giant Nvidia unveils higher performing 'superchips' to power AI

Al Jazeera

Nvidia has unveiled its latest family of chips for powering artificial intelligence as it seeks to consolidate its position as the major supplier to the AI frenzy. "So, ladies and gentlemen, I would like to introduce you to a very, very big GPU," CEO Jensen Huang said on Monday at a developers conference in California, referring to the graphics processors that are vital in the creation of generative AI. The event, dubbed the "AI Woodstock" by Wedbush analyst Dan Ives, has become a can't-miss date on big tech's calendar due to Nvidia's singular role in the AI revolution that has taken the world by storm since the introduction of ChatGPT in late 2022. "I hope you realise this is not a concert, this is a developers' conference," Huang joked as he took the stage in a packed arena usually reserved for ice hockey games and concerts. Nvidia's powerful GPU chips and software are an integral ingredient in the creation of generative AI, with rivals such as AMD and Intel still struggling to match the power and efficiency of the company's blockbuster H100 product, launched in 2022.


NVIDIA announces its next generation of AI supercomputer chips

Engadget

NVIDIA has launched its next generation of AI supercomputer chips, which will likely play a large role in future breakthroughs in deep learning and large language models (LLMs) like OpenAI's GPT-4, the company announced. The technology represents a significant leap over the last generation and is poised to be used in data centers and supercomputers -- working on tasks like weather and climate prediction, drug discovery, quantum computing and more. The key product is the HGX H200 GPU based on NVIDIA's "Hopper" architecture, a replacement for the popular H100 GPU. It is the company's first chip to use HBM3e memory, which is faster and has more capacity, making it better suited to large language models. "With HBM3e, the NVIDIA H200 delivers 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x more bandwidth compared with its predecessor, the NVIDIA A100," the company wrote.
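The ratios in that quote can be checked directly. The A100 baseline figures below (80 GB of HBM2e at roughly 2.0 TB/s for the 80GB SXM part) are published specs assumed here, not stated in the excerpt:

```python
# H200 figures quoted in the announcement.
h200_mem_gb, h200_bw_tbs = 141, 4.8
# A100 80GB SXM baseline (assumed from public spec sheets).
a100_mem_gb, a100_bw_tbs = 80, 2.0

capacity_ratio = h200_mem_gb / a100_mem_gb    # ~1.76: "nearly double"
bandwidth_ratio = h200_bw_tbs / a100_bw_tbs   # 2.4: "2.4x more bandwidth"
```

Both quoted multipliers are consistent with those baseline numbers.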


Will AMD's MI300 Beat NVIDIA In AI?

#artificialintelligence

The upcoming MI300, which will ship later this year after NVIDIA's Grace Hopper Superchip, certainly has a shot at it. But there remain a lot of unknowns that will determine how well it performs for AI applications. And then there is software. The AMD Instinct MI300 is a combination of the company's flagship CPU and GPU. In the 2023 CES keynote address, AMD CEO Dr. Lisa Su reiterated the company's plan to bring the Instinct MI300 to market by the end of this year, and showed the monster silicon in hand. The chip is certainly a major milestone for the company, and for the industry in general, being the most aggressive chiplet implementation seen so far.


Nvidia launches a new GPU architecture and the Grace CPU Superchip – TechCrunch

#artificialintelligence

At its annual GTC conference for AI developers, Nvidia today announced its next-gen Hopper GPU architecture and the Hopper H100 GPU, as well as a new data center chip that combines the GPU with a high-performance CPU, which Nvidia calls the "Grace CPU Superchip" (not to be confused with the Grace Hopper Superchip). With Hopper, Nvidia is launching a number of new and updated technologies, but for AI developers, the most important one may just be the architecture's focus on transformer models, which have become the machine learning technique de rigueur for many use cases and which power models like GPT-3 and BERT. The new Transformer Engine in the H100 chip promises to speed up model training by up to six times, and because this new architecture also features Nvidia's new NVLink Switch system for connecting multiple nodes, large server clusters powered by these chips will be able to scale up to support massive networks with less overhead. "The largest AI models can require months to train on today's computing platforms," Nvidia's Dave Salvator writes in today's announcement. AI, high performance computing and data analytics are growing in complexity, with some models, like large language ones, reaching trillions of parameters.


Nvidia describes Arm-based Grace CPU 'Superchip'

#artificialintelligence

Nvidia offered details on its Grace central processing unit (CPU) "Superchip" during CEO Jensen Huang's keynote speech at its virtual Nvidia GTC 2022 event. Huang said the chip would double the performance and energy efficiency of Nvidia's chips. It is on schedule to ship next year, he said, and it can be a "superchip," essentially two chips connected together. The chip is Nvidia's own variant of the Arm Neoverse architecture, and it is a discrete datacenter CPU designed for AI infrastructure and high-performance computing, providing the highest performance and twice the memory bandwidth and energy efficiency of today's leading server chips, Huang said.