Collaborating Authors: perlmutter


Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training

Ranjan, Aditya K., Singh, Siddharth, Wei, Cunyang, Bhatele, Abhinav

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and training GNNs on such graphs requires techniques such as mini-batch sampling to scale. The alternative approach of distributed full-graph training suffers from high communication overheads and load imbalance due to the irregular structure of graphs. We propose a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs. In addition, we introduce optimizations such as a double permutation scheme for load balancing, and a performance model to predict the optimal 3D configuration of our parallel implementation -- Plexus. We evaluate Plexus on six different graph datasets and show scaling results on up to 2048 GPUs of Perlmutter, and 1024 GPUs of Frontier. Plexus achieves unprecedented speedups of 2.3-12.5x over prior state of the art, and a reduction in time-to-solution by 5.2-8.7x on Perlmutter and 7.0-54.2x on Frontier.
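The 3D-parallel aggregation described above can be illustrated with a toy sketch (this is an assumption-laden illustration, not Plexus's actual implementation): the adjacency matrix is tiled across two grid dimensions and the feature dimension across the third, with a reduction along one axis standing in for the all-reduce between GPUs.

```python
import numpy as np

# Toy sketch of a 3D decomposition of the sparse-dense product
# H_out = A @ H used in GNN aggregation. A is split into px x py blocks;
# the feature dimension of H is split into pz slices. Each "GPU" (i, j, k)
# computes A[i,j] @ H[j, slice k]; summing over j mimics the all-reduce
# along one axis of the 3D process grid.
def aggregate_3d(A, H, px, py, pz):
    n, f = H.shape
    rb, cb, fb = n // px, n // py, f // pz
    out = np.zeros_like(H)
    for i in range(px):
        for k in range(pz):
            acc = np.zeros((rb, fb))
            for j in range(py):  # reduction axis (all-reduce in practice)
                Aij = A[i*rb:(i+1)*rb, j*cb:(j+1)*cb]
                Hjk = H[j*cb:(j+1)*cb, k*fb:(k+1)*fb]
                acc += Aij @ Hjk
            out[i*rb:(i+1)*rb, k*fb:(k+1)*fb] = acc
    return out
```

The choice of (px, py, pz) trades off communication volume along each axis, which is what the paper's performance model selects automatically.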


Trump strikes a blow for AI – by firing the US copyright supremo

The Guardian

Sometimes it helps me to write by thinking about how a radio broadcaster or television presenter would deliver the information, so I'm your host, Blake Montgomery. Today in tech news: questions hover over the automation of labor in the worker-strapped US healthcare system; and drones proliferate in a new conflict between India and Pakistan, both armed with nuclear weapons. Meanwhile, in contrast to a thoughtful and robust conversation, the US is taking the opposite tack. Legend has it that Alexander the Great was presented with a knot in a rope tying a cart to a stake. So complex were its twistings that none of the hundreds of men who had tried had been able to untie it. Alexander silently drew his sword and sliced the knot in two.


Trump admin fires top US copyright official days after terminating Librarian of Congress

FOX News

An AI art lecturer said he believes the U.S. government would encounter difficulty if it attempted to establish a watermark system for AI-generated content. Trump fired Librarian of Congress Carla Hayden, who was the first woman and first African American to be Librarian of Congress, on Thursday. The termination was part of the administration's ongoing purge of government officials who are perceived to be opposed to Trump and his agenda. The White House did not immediately respond to Fox News Digital's requests for comment on the matter. Like Perlmutter, Hayden was notified of her firing in an email, according to The Associated Press.


InfoGain Wavelets: Furthering the Design of Diffusion Wavelets for Graph-Structured Data

Johnson, David R., Krishnaswamy, Smita, Perlmutter, Michael

arXiv.org Machine Learning

Diffusion wavelets extract information from graph signals at different scales of resolution by utilizing graph diffusion operators raised to various powers, known as diffusion scales. Traditionally, the diffusion scales are chosen to be dyadic integers, $\mathbf{2^j}$. Here, we propose a novel, unsupervised method for selecting the diffusion scales based on ideas from information theory. We then show that our method can be incorporated into wavelet-based GNNs via graph classification experiments.
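The dyadic baseline that InfoGain Wavelets builds on can be sketched briefly (a minimal illustration under the usual conventions; the paper's information-theoretic scale selection itself is not shown): with a lazy random-walk diffusion operator P, the wavelet at scale j is the difference of dyadic powers, W_j = P^(2^(j-1)) - P^(2^j).

```python
import numpy as np

# Minimal sketch of dyadic diffusion wavelets on a graph with
# adjacency matrix A. P = (I + D^-1 A) / 2 is the lazy random walk;
# each wavelet captures signal variation between two dyadic scales.
def diffusion_wavelets(A, num_scales=3):
    d = A.sum(axis=1)
    P = 0.5 * (np.eye(len(A)) + A / d[:, None])  # lazy random walk
    wavelets = []
    for j in range(1, num_scales + 1):
        W = (np.linalg.matrix_power(P, 2 ** (j - 1))
             - np.linalg.matrix_power(P, 2 ** j))
        wavelets.append(W)
    return wavelets
```

Because every power of the row-stochastic P is itself row-stochastic, each wavelet's rows sum to zero, i.e. the wavelets annihilate constant signals, which is the band-pass behavior the dyadic construction is designed to give.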


Convergence of Manifold Filter-Combine Networks

Johnson, David R., Chew, Joyce, Viswanath, Siddharth, De Brouwer, Edward, Needell, Deanna, Krishnaswamy, Smita, Perlmutter, Michael

arXiv.org Machine Learning

In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). The filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as the manifold analog of various popular GNNs. We then propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating the manifold by a sparse graph. We prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity.
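The point-cloud construction mentioned above can be sketched in a simplified form (an illustrative assumption, not the paper's exact construction): approximate the manifold sampled as a point cloud by a sparse k-nearest-neighbor graph, on which filters can then be applied as in graph signal processing.

```python
import numpy as np

# Sketch: build a symmetric k-NN graph from a point cloud X (n x d).
# The resulting sparse adjacency serves as a discrete proxy for the
# underlying manifold; its Laplacian approximates the Laplace-Beltrami
# operator as n grows, which is the regime the convergence result addresses.
def knn_graph(X, k=5):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]  # skip self (distance 0)
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)  # symmetrize
```

Sparsity is the point here: each node keeps only k neighbors, so filter applications cost O(nk) rather than O(n^2) for a dense kernel matrix.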


Communication-minimizing Asynchronous Tensor Parallelism

Singh, Siddharth, Sating, Zack, Bhatele, Abhinav

arXiv.org Artificial Intelligence

As state-of-the-art neural networks scale to billions of parameters, designing parallel algorithms that can train these networks efficiently on multi-GPU clusters has become critical. This paper presents Tensor3D, a three-dimensional (3D) hybrid tensor and data parallel framework that strives to minimize the idle time incurred due to communication in the parallel training of large multi-billion parameter models. Our framework relies on three key ideas to minimize the idle time spent in communication. First, we show how a naive application of a tensor parallel strategy can lead to a significant amount of communication for satisfying the data dependencies of parallelized layers of a neural network. To this end, we propose an intelligent distribution of neural network parameters across GPUs that eliminates the communication required for satisfying data dependencies.


Copyright Office Sets Sights on Artificial Intelligence in 2023

#artificialintelligence

"This year, the big milestone was having the board open its doors and start accepting claims," Perlmutter said, adding that board decisions will start coming in the next year. Though it is "still early days" and it remains unclear what the standard volume of claims will be, Perlmutter said she is "extremely impressed" with how well the board is doing. It's received over 260 cases so far. She added that several of the cases have been dismissed. The office believes that means they've been settled, which would adhere to the alternative dispute resolution mechanism of the board, she said. "We set up this totally new tribunal in really record time. I think most other agencies who have seen what we've done can't understand how we managed that in under a year and a half, because it required a lot of work," she said.


Learnable Filters for Geometric Scattering Modules

Tong, Alexander, Wenkel, Frederik, Bhaskar, Dhananjay, Macdonald, Kincaid, Grady, Jackson, Perlmutter, Michael, Krishnaswamy, Smita, Wolf, Guy

arXiv.org Artificial Intelligence

We propose a new graph neural network (GNN) module, based on relaxations of recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. The incorporation of our LEGS module in GNNs enables the learning of longer-range graph relations compared to many popular GNNs, which often rely on encoding graph structure via smoothness or similarity between neighbors. Further, its wavelet priors result in simplified architectures with significantly fewer learned parameters compared to competing GNNs. We demonstrate the predictive performance of LEGS-based networks on graph classification benchmarks, as well as the descriptive quality of their learned features in biochemical graph data exploration tasks. Our results show that LEGS-based networks match or outperform popular GNNs, as well as the original geometric scattering construction, on many datasets, in particular in biochemical domains, while retaining certain mathematical properties of handcrafted (non-learned) geometric scattering.
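The core relaxation behind LEGS can be sketched as follows (a hedged illustration of the idea, not the authors' code): instead of fixing dyadic diffusion scales, learn which diffusion steps each wavelet uses via a trainable selection matrix, whose softmax rows form convex combinations of the diffused signals.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Sketch of a learnable scale selection: stack the diffused signals
# [x, Px, ..., P^T x] and mix them with a (trainable) matrix F whose
# rows are softmax distributions over diffusion steps. With a peaked
# row, a filter recovers a single power P^t x; training can instead
# learn soft, data-adaptive scales.
def legs_like_filters(P, x, F_logits):
    T = F_logits.shape[1] - 1
    diffused = [x]
    for _ in range(T):
        diffused.append(P @ diffused[-1])
    U = np.stack(diffused)   # (T+1, n): signal at each diffusion step
    F = softmax(F_logits)    # (num_filters, T+1): learned scale weights
    return F @ U             # each row is one learned-scale filtering of x
```

In the actual module these weights are trained end-to-end with the downstream task, which is what allows band-pass features to emerge adaptively.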


FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators

Kurth, Thorsten, Subramanian, Shashank, Harrington, Peter, Pathak, Jaideep, Mardani, Morteza, Hall, David, Miele, Andrea, Kashinath, Karthik, Anandkumar, Animashree

arXiv.org Artificial Intelligence

Extreme weather amplified by climate change is causing increasingly devastating impacts across the globe. The current use of physics-based numerical weather prediction (NWP) limits accuracy due to high computational cost and strict time-to-solution limits. We report that a data-driven deep learning Earth system emulator, FourCastNet, can predict global weather and generate medium-range forecasts five orders of magnitude faster than NWP while approaching state-of-the-art accuracy. FourCastNet is optimized and scales efficiently on three supercomputing systems: Selene, Perlmutter, and JUWELS Booster, up to 3,808 NVIDIA A100 GPUs, attaining 140.8 petaFLOPS in mixed precision (11.9% of peak at that scale). The time-to-solution for training FourCastNet, measured on JUWELS Booster on 3,072 GPUs, is 67.4 minutes, resulting in an 80,000 times faster time-to-solution relative to state-of-the-art NWP in inference. FourCastNet produces accurate instantaneous weather predictions for a week in advance, enables enormous ensembles that better capture weather extremes, and supports higher global forecast resolutions.


Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Choi, Jong Youl, Zhang, Pei, Mehta, Kshitij, Blanchard, Andrew, Pasini, Massimiliano Lupo

arXiv.org Artificial Intelligence

Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to reduce the computational cost for GCNN training effectively. However, efficient utilization of high performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN showing i) reduction of data loading time up to 4.2 times compared with a conventional method and ii) linear scaling performance for training up to 1,024 GPUs on both Summit and Perlmutter.