perlmutter
Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training
Ranjan, Aditya K., Singh, Siddharth, Wei, Cunyang, Bhatele, Abhinav
Graph neural networks (GNNs) leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and training GNNs on such graphs requires techniques such as mini-batch sampling to scale. The alternative approach of distributed full-graph training suffers from high communication overheads and load imbalance due to the irregular structure of graphs. We propose a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs. In addition, we introduce optimizations such as a double permutation scheme for load balancing, and a performance model to predict the optimal 3D configuration of our parallel implementation -- Plexus. We evaluate Plexus on six different graph datasets and show scaling results on up to 2048 GPUs of Perlmutter, and 1024 GPUs of Frontier. Plexus achieves unprecedented speedups of 2.3-12.5x over prior state of the art, and a reduction in time-to-solution by 5.2-8.7x on Perlmutter and 7.0-54.2x on Frontier.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Missouri > St. Louis County > St. Louis (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (10 more...)
- Overview (0.67)
- Research Report (0.51)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Architecture > Distributed Systems (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Data Science > Data Mining (0.93)
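The per-layer computation that full-graph GNN training distributes is, in the common GCN formulation, H' = σ(Â H W) over the entire normalized adjacency Â. As a minimal illustration of that operation (not Plexus's actual 3D partitioning; `gcn_layer` and the toy data are invented for this sketch):

```python
import numpy as np

# Toy full-graph GCN layer: H' = relu(A_hat @ H @ W), where A_hat is the
# symmetrically normalized adjacency with self-loops. This only illustrates
# the per-layer operation that 3D-parallel schemes like Plexus distribute
# across GPUs; no partitioning is shown.
def gcn_layer(A, H, W):
    A = A + np.eye(A.shape[0])              # add self-loops
    d = A.sum(axis=1)                       # degree vector
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt     # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)   # aggregate, transform, ReLU

# 4-node path graph, 3 input features, 2 output features
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 2))
out = gcn_layer(A, H, W)
print(out.shape)  # (4, 2)
```

Because Â is as large as the graph itself, it is this sparse-times-dense product that exceeds single-GPU memory on billion-edge graphs and motivates partitioning it along three dimensions.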
Trump strikes a blow for AI – by firing the US copyright supremo
Sometimes it helps me to write by thinking about how a radio broadcaster or television presenter would deliver the information, so I'm your host, Blake Montgomery. Today in tech news: questions hover over the automation of labor in the worker-strapped US healthcare system; and drones proliferate in a new conflict: India v Pakistan, both armed with nuclear weapons. Meanwhile, in contrast to a thoughtful and robust conversation, the US is taking the opposite tack. Legend has it that Alexander the Great was presented with a knot in a rope tying a cart to a stake. So complex were its twistings that none of the hundreds of men who had tried had been able to untie it. Alexander silently drew his sword and sliced the knot in two.
- Asia > India (0.74)
- Asia > Pakistan (0.64)
- Europe > United Kingdom (0.48)
- (10 more...)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (0.72)
- Government > Military (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.49)
- Information Technology > e-Commerce > Financial Technology (0.51)
- Information Technology > Artificial Intelligence > Robots (0.31)
Trump admin fires top US copyright official days after terminating Librarian of Congress
An AI art lecturer said he believes the U.S. government would encounter difficulty if it attempted to establish a watermark system for AI-generated content. Trump fired Librarian of Congress Carla Hayden, who was the first woman and first African American to be Librarian of Congress, on Thursday. The termination was part of the administration's ongoing purge of government officials who are perceived to be opposed to Trump and his agenda. The White House did not immediately respond to Fox News Digital's requests for comment on the matter. Like Perlmutter, Hayden was notified of her firing in an email, according to The Associated Press.
InfoGain Wavelets: Furthering the Design of Diffusion Wavelets for Graph-Structured Data
Johnson, David R., Krishnaswamy, Smita, Perlmutter, Michael
Diffusion wavelets extract information from graph signals at different scales of resolution by utilizing graph diffusion operators raised to various powers, known as diffusion scales. Traditionally, the diffusion scales are chosen to be dyadic integers, $\mathbf{2^j}$. Here, we propose a novel, unsupervised method for selecting the diffusion scales based on ideas from information theory. We then show that our method can be incorporated into wavelet-based GNNs via graph classification experiments.
- North America > United States > Idaho > Ada County > Boise (0.04)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.93)
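The dyadic baseline the abstract refers to builds wavelets as differences of powers of a diffusion operator, Ψ_j = P^{2^(j-1)} − P^{2^j}. A sketch of that classical construction (the paper's information-theoretic scale selection is not reproduced here, and `dyadic_wavelets` is an invented helper name):

```python
import numpy as np

# Dyadic diffusion wavelets Psi_j = P^{2^(j-1)} - P^{2^j}, built from a lazy
# random-walk operator P. The paper above replaces the dyadic scales 2^j with
# scales chosen by an information-theoretic criterion; this is only the
# classical dyadic baseline it starts from.
def dyadic_wavelets(A, J):
    n = A.shape[0]
    P = 0.5 * (np.eye(n) + A / A.sum(axis=1, keepdims=True))  # lazy walk
    powers = {0: np.eye(n)}
    for t in range(1, 2 ** J + 1):   # repeated squaring would be cheaper
        powers[t] = powers[t - 1] @ P
    return [powers[2 ** (j - 1)] - powers[2 ** j] for j in range(1, J + 1)]

A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], dtype=float)
wavelets = dyadic_wavelets(A, J=3)
print(len(wavelets))  # 3 scales: P^1-P^2, P^2-P^4, P^4-P^8
```

Since every power of the row-stochastic P has rows summing to one, each wavelet's rows sum to zero, which is what makes them band-pass filters on the graph.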
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Singh, Siddharth, Singhania, Prajwal, Ranjan, Aditya, Kirchenbauer, John, Geiping, Jonas, Wen, Yuxin, Jain, Neel, Hans, Abhimanyu, Shu, Manli, Tomar, Aditya, Goldstein, Tom, Bhatele, Abhinav
Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In this work, we present a novel four-dimensional hybrid parallel algorithm implemented in a highly scalable, portable, open-source framework called AxoNN. We describe several performance optimizations in AxoNN: improvements to matrix multiply kernel performance, overlapping non-blocking collectives with computation, and performance modeling to choose performance-optimal configurations. These have resulted in unprecedented scaling and peak flop/s (bf16) for training of GPT-style transformer models on Perlmutter (620.1 Petaflop/s), Frontier (1.381 Exaflop/s) and Alps (1.423 Exaflop/s). While the abilities of LLMs improve with the number of trainable parameters, so do privacy and copyright risks caused by memorization of training data, which can cause disclosure of sensitive or private information at inference time. We highlight this side effect of scale through experiments that explore "catastrophic memorization", where models are sufficiently large to memorize training data in a single pass, and present an approach to prevent it. As part of this study, we demonstrate fine-tuning of a 405-billion parameter LLM using AxoNN on Frontier.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (5 more...)
- Information Technology (0.95)
- Energy (0.67)
- Government > Regional Government (0.67)
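The "performance modeling to choose configurations" step amounts to enumerating ways to factor a GPU count across parallel axes and scoring each with a cost model. A deliberately toy version (AxoNN's real model weighs message volumes, bandwidths, and compute overlap; `toy_comm_cost` below is a stand-in invented for illustration):

```python
# Toy illustration of configuration search with a performance model:
# enumerate factorizations of a GPU count into a 3-axis grid and pick the
# one a cost function scores best. The cost function here is purely
# illustrative, not AxoNN's actual model.
def factorizations(gpus, axes=3):
    if axes == 1:
        return [(gpus,)]
    result = []
    for d in range(1, gpus + 1):
        if gpus % d == 0:
            result += [(d,) + rest for rest in factorizations(gpus // d, axes - 1)]
    return result

def toy_comm_cost(cfg):
    # Pretend collective cost on each axis grows like (d - 1) / d; a real
    # model would be calibrated against measured bandwidths and overlap.
    return sum((d - 1) / d for d in cfg)

best = min(factorizations(8), key=toy_comm_cost)
print(best)  # (1, 1, 8) under this toy cost
```

Even this toy shows the shape of the problem: the search space is the set of ordered factorizations of the GPU count, and the quality of the chosen configuration is only as good as the cost model scoring it.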
Convergence of Manifold Filter-Combine Networks
Johnson, David R., Chew, Joyce, Viswanath, Siddharth, De Brouwer, Edward, Needell, Deanna, Krishnaswamy, Smita, Perlmutter, Michael
In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). The filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as the manifold analog of various popular GNNs. We then propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating the manifold by a sparse graph. We prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity.
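The "approximate the manifold by a sparse graph" step can be made concrete with points sampled from a circle and a k-nearest-neighbor graph; the filter-combine layers themselves are omitted, and `knn_graph` is an invented helper for this sketch:

```python
import numpy as np

# Sketch of discretizing a manifold: sample points from a circle, connect
# each to its k nearest neighbors, and form the graph Laplacian that stands
# in for the manifold operator. The MFCN layers built on top are not shown.
def knn_graph(X, k):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:  # skip self (distance 0)
            A[i, j] = A[j, i] = 1.0          # symmetrize
    return A

theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # points on a circle
A = knn_graph(X, k=2)
L = np.diag(A.sum(axis=1)) - A   # combinatorial graph Laplacian
print(A.sum())  # 80.0: a 40-node ring, every node of degree 2
```

The convergence result in the abstract says, informally, that filters built from L behave like the corresponding manifold filters as the number of sample points grows.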
Communication-minimizing Asynchronous Tensor Parallelism
Singh, Siddharth, Sating, Zack, Bhatele, Abhinav
As state-of-the-art neural networks scale to billions of parameters, designing parallel algorithms that can train these networks efficiently on multi-GPU clusters has become critical. This paper presents Tensor3D, a three-dimensional (3D) hybrid tensor and data parallel framework that strives to minimize the idle time incurred due to communication in the parallel training of large multi-billion parameter models. Our framework relies on three key ideas to minimize the idle time spent in communication. First, we show how a naive application of a tensor parallel strategy can lead to a significant amount of communication for satisfying the data dependencies of parallelized layers of a neural network. To this end, we propose an intelligent distribution of neural network parameters across GPUs that eliminates the communication required for satisfying these data dependencies.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- (5 more...)
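The simplest instance of distributing parameters so that a layer's data dependencies need no communication is a 1D column shard of a linear layer: each device owns some columns of W and computes an independent slice of Y = X W. A sketch under that assumption (Tensor3D's actual 3D placement generalizes this across three axes; `column_parallel_matmul` is an invented name):

```python
import numpy as np

# Toy tensor-parallel parameter distribution: shard a weight matrix W
# column-wise across simulated "GPUs" so each computes an independent slice
# of Y = X @ W, then concatenate the slices. Only this 1D case is shown.
def column_parallel_matmul(X, W, n_gpus):
    shards = np.array_split(W, n_gpus, axis=1)    # each GPU owns some columns
    partials = [X @ Wi for Wi in shards]          # no inter-GPU communication
    return np.concatenate(partials, axis=1)       # gather the output slices

rng = np.random.default_rng(0)
X, W = rng.normal(size=(4, 6)), rng.normal(size=(6, 8))
Y = column_parallel_matmul(X, W, n_gpus=4)
print(np.allclose(Y, X @ W))  # True
```

The block structure of the product guarantees the concatenated result matches the unsharded matmul exactly, which is why a well-chosen parameter placement can eliminate the dependency-satisfying communication the abstract describes.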
Copyright Office Sets Sights on Artificial Intelligence in 2023
"This year, the big milestone was having the board open its doors and start accepting claims," Perlmutter said, adding that board decisions will start coming in the next year. Though it is "still early days" and it remains unclear what the standard volume of claims will be, Perlmutter said she is "extremely impressed" with how well the board is doing. It's received over 260 cases so far. She added that several of the cases have been dismissed. The office believes that means they've been settled, which would adhere to the alternative dispute resolution mechanism of the board, she said. "We set up this totally new tribunal in really record time. I think most other agencies who have seen what we've done can't understand how we managed that in under a year and a half, because it required a lot of work," she said.
Learnable Filters for Geometric Scattering Modules
Tong, Alexander, Wenkel, Frederik, Bhaskar, Dhananjay, Macdonald, Kincaid, Grady, Jackson, Perlmutter, Michael, Krishnaswamy, Smita, Wolf, Guy
We propose a new graph neural network (GNN) module, based on relaxations of recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. The incorporation of our LEGS-module in GNNs enables the learning of longer-range graph relations compared to many popular GNNs, which often rely on encoding graph structure via smoothness or similarity between neighbors. Further, its wavelet priors result in simplified architectures with significantly fewer learned parameters compared to competing GNNs. We demonstrate the predictive performance of LEGS-based networks on graph classification benchmarks, as well as the descriptive quality of their learned features in biochemical graph data exploration tasks. Our results show that LEGS-based networks match or outperform popular GNNs, as well as the original geometric scattering construction, on many datasets, in particular in biochemical domains, while retaining certain mathematical properties of handcrafted (non-learned) geometric scattering.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Michigan (0.04)
- (5 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education (0.93)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Communications (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators
Kurth, Thorsten, Subramanian, Shashank, Harrington, Peter, Pathak, Jaideep, Mardani, Morteza, Hall, David, Miele, Andrea, Kashinath, Karthik, Anandkumar, Animashree
Extreme weather amplified by climate change is causing increasingly devastating impacts across the globe. The current use of physics-based numerical weather prediction (NWP) limits accuracy due to high computational cost and strict time-to-solution limits. We report that a data-driven deep learning Earth system emulator, FourCastNet, can predict global weather and generate medium-range forecasts five orders-of-magnitude faster than NWP while approaching state-of-the-art accuracy. FourCastNet is optimized and scales efficiently on three supercomputing systems: Selene, Perlmutter, and JUWELS Booster, up to 3,808 NVIDIA A100 GPUs, attaining 140.8 petaFLOPS in mixed precision (11.9% of peak at that scale). The time-to-solution for training FourCastNet, measured on JUWELS Booster on 3,072 GPUs, is 67.4 minutes, resulting in an 80,000 times faster time-to-solution relative to state-of-the-art NWP, in inference. FourCastNet produces accurate instantaneous weather predictions for a week in advance, enables enormous ensembles that better capture weather extremes, and supports higher global forecast resolutions.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Santa Clara (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (2 more...)
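The adaptive Fourier neural operator backbone behind FourCastNet filters fields in frequency space rather than with spatial convolutions. A heavily simplified 1D sketch of that idea (AFNO itself mixes channels with a per-mode MLP and adaptive sparsity, none of which appears here; `fourier_layer` and the random weights are invented for illustration):

```python
import numpy as np

# Highly simplified Fourier-operator layer: transform a spatial field to
# frequency space, scale a truncated set of low modes by (here random,
# normally learned) complex weights, and transform back. This is the basic
# spectral-filtering idea, not FourCastNet's AFNO architecture.
def fourier_layer(u, weights, n_modes):
    u_hat = np.fft.rfft(u)                 # to frequency space
    u_hat[:n_modes] *= weights[:n_modes]   # filter the retained low modes
    u_hat[n_modes:] = 0.0                  # truncate high frequencies
    return np.fft.irfft(u_hat, n=u.shape[0])

rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))  # 1D "field"
w = rng.normal(size=33) + 1j * rng.normal(size=33)         # rfft of 64 -> 33 modes
v = fourier_layer(u, w, n_modes=8)
print(v.shape)  # (64,)
```

Working in frequency space makes the filter's receptive field global in one step, which is one reason such emulators can be so much cheaper per forecast than grid-stepping NWP solvers.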