PyTorch 2




GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2

Savini Kashmira, Jayanaka Dantanarayana, Thamirawaran Sathiyalogeswaran, Yichao Yuan, Nishil Talati, Krisztian Flautner, Lingjia Tang, Jason Mars

arXiv.org Artificial Intelligence

This paper presents GraphMend, a high-level compiler that eliminates FX graph breaks in PyTorch 2 programs. Although PyTorch 2 introduced TorchDynamo and TorchInductor to enable just-in-time graph compilation, unresolved dynamic control flow and unsupported Python constructs often fragment models into multiple FX graphs. These fragments force frequent fallbacks to eager mode, incur costly CPU-to-GPU synchronizations, and reduce optimization opportunities. GraphMend addresses this limitation by analyzing and transforming source code before execution. Built on the Jac compilation framework, GraphMend introduces two code transformations that remove graph breaks due to dynamic control flow and Python I/O functions. This design allows PyTorch's compilation pipeline to capture larger, uninterrupted FX graphs without requiring manual refactoring by developers. Evaluation across eight Hugging Face models shows that GraphMend removes all fixable graph breaks due to dynamic control flow and Python I/O functions, driving the break count to 0 in 6 models and reducing it from 5 to 2 in another model. On NVIDIA RTX 3090 and A40 GPUs, GraphMend achieves up to 75% latency reductions and up to 8% higher end-to-end throughput. These results demonstrate that high-level code transformation is an effective complement to PyTorch's dynamic JIT compilation pipeline, substantially improving both usability and performance.
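The kind of rewrite the abstract describes can be illustrated with a minimal sketch. This is not GraphMend's actual transformation, just an illustrative example of the underlying idea: a Python `if` on a tensor value forces TorchDynamo to break the FX graph, while an equivalent `torch.where` keeps the branch inside the graph.

```python
import torch

def gated_scale_eager(x):
    # Data-dependent Python branch: under torch.compile, evaluating
    # `x.sum() > 0` depends on runtime tensor values, so TorchDynamo
    # must split the trace here (a graph break).
    if x.sum() > 0:
        return x * 2.0
    return x * 0.5

def gated_scale_fused(x):
    # Tensor-level rewrite: torch.where expresses the same branch as a
    # graph operation, so capture proceeds without interruption.
    cond = x.sum() > 0
    return torch.where(cond, x * 2.0, x * 0.5)

x = torch.randn(4)
assert torch.allclose(gated_scale_eager(x), gated_scale_fused(x))
```

PyTorch's `torch._dynamo.explain` utility can report graph-break counts for a function, which is one way to confirm that a rewrite like this removed a break.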


iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations

Md Saidul Hoque Anik, Pranav Badhe, Rohit Gampa, Ariful Azad

arXiv.org Artificial Intelligence

Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because their performance depends significantly on the sparsity of input graphs, GNN models, and computing platforms. To address this challenge, we present iSpLib, a PyTorch-based C++ library equipped with auto-tuned sparse operations. iSpLib expedites GNN training with a cache-enabled backpropagation that stores intermediate matrices in local caches. The library offers a user-friendly Python plug-in that allows users to take advantage of our optimized PyTorch operations out-of-the-box for any existing linear algebra-based PyTorch implementation of popular GNNs (Graph Convolution Network, GraphSAGE, Graph Isomorphism Network, etc.) with only two lines of additional code. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU. Our library is publicly available at https://github.com/HipGraph/iSpLib (https://doi.org/10.5281/zenodo.10806511).
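The SpMM kernel the abstract refers to can be written in a few lines of plain PyTorch; libraries like iSpLib accelerate exactly this pattern. The tiny 3-node graph below is illustrative only, not taken from the paper.

```python
import torch

# Toy 3-node graph with edges 0->1, 1->2, 2->0, stored as a sparse COO
# adjacency matrix. Feature propagation A @ X is the sparse-dense matrix
# multiplication (SpMM) at the core of GNN message passing.
indices = torch.tensor([[0, 1, 2],   # row indices
                        [1, 2, 0]])  # column indices
values = torch.ones(3)
adj = torch.sparse_coo_tensor(indices, values, (3, 3))

feats = torch.randn(3, 4)            # dense node-feature matrix
out = torch.sparse.mm(adj, feats)    # SpMM: sparse adjacency x dense features

# The sparse product must agree with the equivalent dense computation.
assert torch.allclose(out, adj.to_dense() @ feats)
```

Because SpMM performance depends on the graph's sparsity pattern, hand-tuning a single kernel rarely works across inputs, which is the gap auto-tuning aims to close.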


PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever

#artificialintelligence

We are excited to announce the release of PyTorch 2.0, which we highlighted during the PyTorch Conference on 12/2/22! PyTorch 2.0 offers the same eager-mode development and user experience, while fundamentally changing and supercharging how PyTorch operates at the compiler level under the hood, with faster performance and support for dynamic shapes and distributed training. This next-generation release includes a stable version of Accelerated Transformers (formerly called Better Transformers), while torch.compile, the headline feature of 2.0, ships in beta. Along with 2.0, we are also releasing a series of beta updates to the PyTorch domain libraries, including those that are in-tree, and separate libraries including TorchAudio, TorchVision, and TorchText.
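Using torch.compile is a one-line change on top of ordinary eager code. The sketch below uses the `"eager"` debug backend so it runs without TorchInductor's native-code toolchain; in practice you would omit the `backend` argument to get the default `"inductor"` backend and its generated kernels.

```python
import torch

def mlp_step(x, w):
    # A small computation that torch.compile captures as one FX graph.
    return torch.relu(x @ w).sum()

# backend="eager" exercises TorchDynamo's graph capture without
# TorchInductor code generation, keeping this sketch portable.
compiled_step = torch.compile(mlp_step, backend="eager")

x, w = torch.randn(8, 16), torch.randn(16, 4)
# The compiled function must produce the same result as eager mode.
assert torch.allclose(compiled_step(x, w), mlp_step(x, w))
```

The first call triggers tracing and compilation; subsequent calls with compatible input shapes reuse the cached compiled graph.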


What's New in PyTorch 2.0? torch.compile - PyImageSearch

#artificialintelligence

Over the last few years, PyTorch has evolved into a popular and widely used framework for training deep neural networks (DNNs). The success of PyTorch is attributed to its simplicity, first-class Python integration, and imperative style of programming. Since its launch in 2017, PyTorch has strived for high performance alongside eager execution. It has provided some of the best abstractions for distributed training, data loading, and automatic differentiation. With continuous innovation from the PyTorch team, PyTorch has moved from version 1.0 to the most recent version, 1.13. However, over all these years, hardware accelerators like GPUs have become 15x and 2x faster in compute and memory access, respectively. Thus, to leverage these resources and deliver high-performance eager execution, the team moved substantial parts of PyTorch internals to C++.


Intel Contributes AI Acceleration to PyTorch 2.0 - cyberpogo

#artificialintelligence

In the release of PyTorch 2.0, contributions from Intel using Intel Extension for PyTorch, the oneAPI Deep Neural Network Library (oneDNN), and additional support for Intel CPUs enable developers to optimize inference and training performance for artificial intelligence (AI). As part of the PyTorch 2.0 compilation stack, the TorchInductor CPU backend optimization by Intel Extension for PyTorch and PyTorch ATen CPU achieved up to 1.7x faster FP32 inference performance when benchmarked with TorchBench, Hugging Face, and timm. This update brings notable performance improvements to graph compilation over PyTorch eager mode. Notices and disclaimers: performance results are based on testing as of the dates shown in configurations and may not reflect all publicly available updates.



PyTorch 2.0 vs. TensorFlow 2.10, which one is better?

#artificialintelligence

First, we need to choose a loss function and an optimizer and pass them to the model's compile() method. We can then simply call the model's fit() method to train it, and its evaluate() method to measure performance on the test set.
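The Keras workflow the article contrasts with PyTorch can be sketched in a few lines. The model and data below are illustrative stand-ins, not from the article.

```python
import numpy as np
import tensorflow as tf

# Minimal Keras workflow: compile() with a loss and optimizer,
# fit() to train, evaluate() to score on held-out data.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

x = np.random.rand(32, 2).astype("float32")
y = x.sum(axis=1, keepdims=True)          # simple linear target
model.fit(x, y, epochs=2, verbose=0)      # training loop handled by Keras
loss = model.evaluate(x, y, verbose=0)    # returns the scalar test loss
```

In PyTorch 2.0, by contrast, the training loop is written by hand in eager mode, and torch.compile accelerates the model without changing that workflow.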


The Official Release of PyTorch 2.0 Is Here!

#artificialintelligence

At PyTorch Conference 2022, the development team introduced PyTorch 2.0 and announced that the stable version would officially launch in March of this year; the official PyTorch 2.0 release has now arrived on schedule.