torchscript


A Comparative Survey of PyTorch vs TensorFlow for Deep Learning: Usability, Performance, and Deployment Trade-offs

Alawi, Zakariya Ba

arXiv.org Artificial Intelligence

This paper presents a comprehensive comparative survey of TensorFlow and PyTorch, the two leading deep learning frameworks, focusing on their usability, performance, and deployment trade-offs. We review each framework's programming paradigm and developer experience, contrasting TensorFlow's graph-based (now optionally eager) approach with PyTorch's dynamic, Pythonic style. We then compare model training speeds and inference performance across multiple tasks and data regimes, drawing on recent benchmarks and studies. Deployment flexibility is examined in depth - from TensorFlow's mature ecosystem (TensorFlow Lite for mobile/embedded, TensorFlow Serving, and JavaScript support) to PyTorch's newer production tools (TorchScript compilation, ONNX export, and TorchServe). We also survey ecosystem and community support, including library integrations, industry adoption, and research trends (e.g., PyTorch's dominance in recent research publications versus TensorFlow's broader tooling in enterprise). Applications in computer vision, natural language processing, and other domains are discussed to illustrate how each framework is used in practice. Finally, we outline future directions and open challenges in deep learning framework design, such as unifying eager and graph execution, improving cross-framework interoperability, and integrating compiler optimizations (XLA, JIT) for improved speed. Our findings indicate that while both frameworks are highly capable for state-of-the-art deep learning, they exhibit distinct trade-offs: PyTorch offers simplicity and flexibility favored in research, whereas TensorFlow provides a fuller production-ready ecosystem - understanding these trade-offs is key for practitioners selecting the appropriate tool. We include charts, code snippets, and more than 20 references to academic papers and official documentation to support this comparative analysis.
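To make the eager-versus-graph contrast concrete, here is a minimal illustrative sketch (not taken from the paper): the same tiny model run in PyTorch's eager, Pythonic style and then captured as a static TorchScript graph, which is closer in spirit to TensorFlow's graph-based execution.

```python
import torch
import torch.nn as nn

# A minimal model written in PyTorch's eager, Pythonic style.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Ordinary Python code that runs eagerly, op by op.
        return torch.relu(self.fc(x))

model = TinyNet()
x = torch.randn(1, 4)
eager_out = model(x)

# TorchScript captures the same model as a static graph that can be
# serialized and run without the Python class definition.
scripted = torch.jit.script(model)
graph_out = scripted(x)
```

Both calls compute the same values; the difference is that the scripted version carries a graph representation amenable to optimization and deployment.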


High-performance training and inference for deep equivariant interatomic potentials

Tan, Chuin Wei, Descoteaux, Marc L., Kotak, Mit, Nascimento, Gabriel de Miranda, Kavanagh, Seán R., Zichi, Laura, Wang, Menghang, Saluja, Aadit, Hu, Yizhong R., Smidt, Tess, Johansson, Anders, Witt, William C., Kozinsky, Boris, Musaelian, Albert

arXiv.org Artificial Intelligence

Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
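The train-time acceleration described above centers on the PyTorch 2.0 compiler. A minimal sketch of wrapping a training-style step with `torch.compile` (the lightweight "eager" debugging backend is used here so the snippet runs without a C++ toolchain; the default inductor backend performs the actual code generation and fusion):

```python
import torch

# Sketch of compiling a training-style step with the PyTorch 2.0 compiler.
model = torch.nn.Linear(8, 8)

@torch.compile(backend="eager")
def train_step(x):
    # A loss-like quantity; in real training this feeds an optimizer step.
    return model(x).pow(2).mean()

loss = train_step(torch.randn(4, 8))
loss.backward()
```

The compiled function traces the forward computation once and reuses the captured graph on later calls; gradients flow through it exactly as in eager mode.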


Review for NeurIPS paper: Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Weaknesses: This work is most applicable to networks with many small kernels, which may not be of broad interest in all cases. Nonetheless, it does help with training MobileNet and similar networks on desktop or server GPUs. I also feel that some parts of the paper overstate the contribution, either by only evaluating on these networks or by leaving out some optimized baselines. The biggest issues here are: - For inference, you should compare against an optimized inference runtime such as TensorRT. This will likely do better than PyTorch or Caffe2 do out of the box, even with TorchScript.


AI-driven Conservative-to-Primitive Conversion in Hybrid Piecewise Polytropic and Tabulated Equations of State

Kacmaz, Semih, Haas, Roland, Huerta, E. A.

arXiv.org Artificial Intelligence

We present a novel AI-based approach to accelerate conservative-to-primitive inversion in relativistic hydrodynamics simulations, focusing on hybrid piecewise polytropic and tabulated equations of state. Traditional root-finding methods are computationally intensive, particularly in large-scale simulations. To address this, we employ feedforward neural networks (NNC2PS and NNC2PL), trained in PyTorch and optimized for GPU inference using NVIDIA TensorRT, achieving significant speedups with minimal loss in accuracy. The NNC2PS model achieves $L_1$ and $L_\infty$ errors of $4.54 \times 10^{-7}$ and $3.44 \times 10^{-6}$, respectively, with the NNC2PL model yielding even lower error values. TensorRT optimization ensures high accuracy, with FP16 quantization offering 7x faster performance than traditional root-finding methods. Our AI models outperform conventional CPU solvers, demonstrating enhanced inference times, particularly for large datasets. We release the scientific software developed for this work, enabling the validation and extension of our findings. These results highlight the potential of AI, combined with GPU optimization, to significantly improve the efficiency and scalability of numerical relativity simulations.
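For reference, the root-finding baseline that the neural networks replace looks schematically like the following toy Newton-Raphson inversion. This is a hypothetical scalar stand-in, not the actual relativistic EOS relations, which couple several conserved and primitive variables.

```python
# Toy illustration of conservative-to-primitive style root finding.
# The forward map below is a made-up nonlinearity, not a physical EOS.

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Generic Newton-Raphson root finder."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def conserved_from_primitive(rho):
    # Hypothetical forward map: conserved quantity from primitive density.
    return rho + 0.1 * rho**2

def recover_primitive(D):
    # Invert the forward map numerically, one Newton solve per cell.
    f = lambda rho: conserved_from_primitive(rho) - D
    df = lambda rho: 1.0 + 0.2 * rho
    return newton(f, df, x0=D)

rho_true = 1.7
D = conserved_from_primitive(rho_true)
rho_rec = recover_primitive(D)
```

Each grid cell requires such an iterative solve at every time step, which is why replacing it with a single batched neural-network forward pass on a GPU can pay off at scale.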


Flexible and Efficient Surrogate Gradient Modeling with Forward Gradient Injection

Otte, Sebastian

arXiv.org Artificial Intelligence

Automatic differentiation is a key feature of present-day deep learning frameworks. Moreover, they typically provide various ways to specify custom gradients within the computation graph, which is of particular importance for defining surrogate gradients for non-differentiable operations such as the Heaviside function in spiking neural networks (SNNs). PyTorch, for example, allows the custom specification of the backward pass of an operation by overriding its backward method. Other frameworks provide comparable options. While these methods are common practice and usually work well, they also have several disadvantages, such as limited flexibility, additional source code overhead, poor usability, or a potentially strong negative impact on the effectiveness of automatic model optimization procedures. In this paper, an alternative way to formulate surrogate gradients is presented, namely, forward gradient injection (FGI). FGI applies a simple but effective combination of basic standard operations to inject an arbitrary gradient shape into the computational graph directly within the forward pass. It is demonstrated that using FGI is straightforward and convenient. Moreover, it is shown that FGI can significantly increase model performance in comparison to custom backward methods in SNNs when using TorchScript. These results are complemented by a general performance study on recurrent SNNs with TorchScript and torch.compile, revealing the potential for a training speedup of more than 7x and an inference speedup of more than 16x in comparison with pure PyTorch.
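A hedged sketch in the spirit of this idea (not the paper's exact formulation; the sigmoid surrogate shape is an assumption): the forward pass returns the hard Heaviside value, while `detach` arithmetic on basic ops routes the backward pass through a differentiable surrogate instead.

```python
import torch

def heaviside_fgi(x):
    # Forward value: hard Heaviside step (non-differentiable).
    hard = (x > 0).float()
    # Differentiable surrogate whose gradient we want to inject;
    # the shape is arbitrary - a scaled sigmoid is used here.
    soft = torch.sigmoid(4.0 * x)
    # Basic-op injection: numerically this equals `hard`, but only
    # `soft` contributes to the backward pass.
    return hard.detach() + soft - soft.detach()

x = torch.tensor([-1.0, 0.5], requires_grad=True)
y = heaviside_fgi(x)
y.sum().backward()
```

Because the construction uses only standard tensor operations, it survives TorchScript compilation without any custom `backward` override.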


RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup

Bohlinger, Nico, Dorer, Klaus

arXiv.org Artificial Intelligence

This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.


How to Run Inference on Ludwig Models Using TorchScript - Predibase

#artificialintelligence

In Ludwig 0.6, we have introduced the ability to export Ludwig models to TorchScript, making it easier than ever to deploy models for highly performant inference. In this blog post, we will describe the benefits of serving models using TorchScript and demonstrate how to train, export, and use the exported models on an example dataset. A common way to serve machine learning models is to wrap them in REST APIs and expose their endpoints. This works well if you do not have particularly strict SLA requirements or if backwards compatibility is not a concern. However, if you need to serve a model in a production environment, you will likely need a more robust solution.
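Stripped of the Ludwig-specific wrapper, the underlying TorchScript export/load round trip looks roughly like this (a generic sketch, not Ludwig's actual API): a scripted model is serialized and later reloaded for inference without any Python model definition.

```python
import io
import torch
import torch.nn as nn

# A stand-in model; Ludwig's export wraps the same TorchScript mechanism.
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
scripted = torch.jit.script(model)

# Serialize to a buffer (a file path works the same way), then reload
# without the original Python class being importable.
buf = io.BytesIO()
torch.jit.save(scripted, buf)
buf.seek(0)
restored = torch.jit.load(buf)

x = torch.randn(2, 3)
```

The reloaded artifact is self-contained, which is what makes TorchScript attractive for serving environments that cannot carry the training codebase.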


GitHub - VoltaML/voltaML: VoltaML is a lightweight library to convert and run your ML/DL deep learning models in high performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

#artificialintelligence

VoltaML can optimize, compile, and deploy your models to your target CPU and GPU devices with just one line of code. Classification was benchmarked on ImageNet data with batch size 1 and image size 224 on an NVIDIA RTX 2080 Ti. In terms of top-1 and top-5 accuracy for int8 models, we have not seen an accuracy drop of more than 1%. Object detection inference was done on dummy data with image size 640 and batch size 1 on an NVIDIA RTX 2080 Ti. Segmentation inference was done on dummy data with image size 224 and batch size 1 on the same GPU.


How PyTorch Is Challenging TensorFlow Lately

#artificialintelligence

Google's TensorFlow and Facebook's PyTorch are the most popular machine learning frameworks. TensorFlow has a head start: it was open-sourced in 2015, a year before PyTorch's 2016 release. TensorFlow's popularity reportedly declined after PyTorch burst onto the scene. However, Google released the more user-friendly TensorFlow 2.0 in 2019 to recover lost ground. PyTorch is emerging as a leader in terms of papers at leading research conferences.


Welcome to PyTorch Tutorials -- PyTorch Tutorials 1.8.0 documentation

#artificialintelligence

Learn how to load data, build deep neural networks, train and save your models in this quickstart guide. This tutorial introduces the fundamental concepts of PyTorch through self-contained examples. Use torch.nn to create and train a neural network. Learn to use TensorBoard to visualize data and model training. Train a generative adversarial network (GAN) to generate new celebrities.
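The quickstart workflow described above - load data, build a network, train, save - can be sketched in a few lines (synthetic regression data stands in for the tutorial's dataset and DataLoader):

```python
import torch
from torch import nn

# Build a small network with torch.nn.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Synthetic regression data standing in for a real Dataset/DataLoader.
X = torch.randn(64, 10)
y = X.sum(dim=1, keepdim=True)

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
final_loss = loss.item()

# Persist the trained weights, as in the tutorial's save step.
torch.save(model.state_dict(), "quickstart_model.pt")
```

The saved `state_dict` can later be restored with `model.load_state_dict(torch.load(...))`, completing the train-and-save loop the tutorial walks through.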