torchscript


A Comparative Survey of PyTorch vs TensorFlow for Deep Learning: Usability, Performance, and Deployment Trade-offs

Alawi, Zakariya Ba

arXiv.org Artificial Intelligence

This paper presents a comprehensive comparative survey of TensorFlow and PyTorch, the two leading deep learning frameworks, focusing on their usability, performance, and deployment trade-offs. We review each framework's programming paradigm and developer experience, contrasting TensorFlow's graph-based (now optionally eager) approach with PyTorch's dynamic, Pythonic style. We then compare model training speeds and inference performance across multiple tasks and data regimes, drawing on recent benchmarks and studies. Deployment flexibility is examined in depth - from TensorFlow's mature ecosystem (TensorFlow Lite for mobile/embedded, TensorFlow Serving, and JavaScript support) to PyTorch's newer production tools (TorchScript compilation, ONNX export, and TorchServe). We also survey ecosystem and community support, including library integrations, industry adoption, and research trends (e.g., PyTorch's dominance in recent research publications versus TensorFlow's broader tooling in enterprise). Applications in computer vision, natural language processing, and other domains are discussed to illustrate how each framework is used in practice. Finally, we outline future directions and open challenges in deep learning framework design, such as unifying eager and graph execution, improving cross-framework interoperability, and integrating compiler optimizations (XLA, JIT) for improved speed. Our findings indicate that while both frameworks are highly capable for state-of-the-art deep learning, they exhibit distinct trade-offs: PyTorch offers simplicity and flexibility favored in research, whereas TensorFlow provides a fuller production-ready ecosystem - understanding these trade-offs is key for practitioners selecting the appropriate tool. We include charts, code snippets, and more than 20 references to academic papers and official documentation to support this comparative analysis.
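To make the eager-versus-graph contrast concrete, here is a minimal illustrative sketch (not taken from the paper): the same tiny model run in PyTorch's eager, Pythonic style and then captured as a static TorchScript graph, which is closer in spirit to TensorFlow's graph-based execution.

```python
import torch
import torch.nn as nn

# A minimal model written in PyTorch's eager, Pythonic style.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Ordinary Python code that runs eagerly, op by op.
        return torch.relu(self.fc(x))

model = TinyNet()
x = torch.randn(1, 4)
eager_out = model(x)

# TorchScript captures the same model as a static graph that can be
# serialized and run without the Python class definition.
scripted = torch.jit.script(model)
graph_out = scripted(x)
```

Both calls compute the same values; the difference is that the scripted version carries a graph representation amenable to optimization and deployment.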


High-performance training and inference for deep equivariant interatomic potentials

Tan, Chuin Wei, Descoteaux, Marc L., Kotak, Mit, Nascimento, Gabriel de Miranda, Kavanagh, Seán R., Zichi, Laura, Wang, Menghang, Saluja, Aadit, Hu, Yizhong R., Smidt, Tess, Johansson, Anders, Witt, William C., Kozinsky, Boris, Musaelian, Albert

arXiv.org Artificial Intelligence

Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
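The train-time acceleration described above centers on the PyTorch 2.0 compiler. A minimal sketch of wrapping a training-style step with `torch.compile` (the lightweight "eager" debugging backend is used here so the snippet runs without a C++ toolchain; the default inductor backend performs the actual code generation and fusion):

```python
import torch

# Sketch of compiling a training-style step with the PyTorch 2.0 compiler.
model = torch.nn.Linear(8, 8)

@torch.compile(backend="eager")
def train_step(x):
    # A loss-like quantity; in real training this feeds an optimizer step.
    return model(x).pow(2).mean()

loss = train_step(torch.randn(4, 8))
loss.backward()
```

The compiled function traces the forward computation once and reuses the captured graph on later calls; gradients flow through it exactly as in eager mode.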


Review for NeurIPS paper: Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Weaknesses: This work is most applicable to networks with many small kernels, which may not be of broad interest in all cases. Nonetheless, it does help with training MobileNet and similar networks on desktop or server GPUs. I also feel that some parts of the paper overstate the contribution, either by only evaluating on these networks or by leaving out some optimized baselines. The biggest issues here are: - For inference, you should compare against an optimized inference runtime such as TensorRT. This will likely do better than PyTorch or Caffe2 do out of the box, even with TorchScript.


AI-driven Conservative-to-Primitive Conversion in Hybrid Piecewise Polytropic and Tabulated Equations of State

Kacmaz, Semih, Haas, Roland, Huerta, E. A.

arXiv.org Artificial Intelligence

We present a novel AI-based approach to accelerate conservative-to-primitive inversion in relativistic hydrodynamics simulations, focusing on hybrid piecewise polytropic and tabulated equations of state. Traditional root-finding methods are computationally intensive, particularly in large-scale simulations. To address this, we employ feedforward neural networks (NNC2PS and NNC2PL), trained in PyTorch and optimized for GPU inference using NVIDIA TensorRT, achieving significant speedups with minimal loss in accuracy. The NNC2PS model achieves $L_1$ and $L_\infty$ errors of $4.54 \times 10^{-7}$ and $3.44 \times 10^{-6}$, respectively, with the NNC2PL model yielding even lower error values. TensorRT optimization ensures high accuracy, with FP16 quantization offering 7x faster performance than traditional root-finding methods. Our AI models outperform conventional CPU solvers, demonstrating enhanced inference times, particularly for large datasets. We release the scientific software developed for this work, enabling the validation and extension of our findings. These results highlight the potential of AI, combined with GPU optimization, to significantly improve the efficiency and scalability of numerical relativity simulations.
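For reference, the root-finding baseline that the neural networks replace looks schematically like the following toy Newton-Raphson inversion. This is a hypothetical scalar stand-in, not the actual relativistic EOS relations, which couple several conserved and primitive variables.

```python
# Toy illustration of conservative-to-primitive style root finding.
# The forward map below is a made-up nonlinearity, not a physical EOS.

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Generic Newton-Raphson root finder."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def conserved_from_primitive(rho):
    # Hypothetical forward map: conserved quantity from primitive density.
    return rho + 0.1 * rho**2

def recover_primitive(D):
    # Invert the forward map numerically, one Newton solve per cell.
    f = lambda rho: conserved_from_primitive(rho) - D
    df = lambda rho: 1.0 + 0.2 * rho
    return newton(f, df, x0=D)

rho_true = 1.7
D = conserved_from_primitive(rho_true)
rho_rec = recover_primitive(D)
```

Each grid cell requires such an iterative solve at every time step, which is why replacing it with a single batched neural-network forward pass on a GPU can pay off at scale.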


Flexible and Efficient Surrogate Gradient Modeling with Forward Gradient Injection

Otte, Sebastian

arXiv.org Artificial Intelligence

Automatic differentiation is a key feature of present-day deep learning frameworks. Moreover, they typically provide various ways to specify custom gradients within the computation graph, which is of particular importance for defining surrogate gradients for non-differentiable operations such as the Heaviside function in spiking neural networks (SNNs). PyTorch, for example, allows the custom specification of the backward pass of an operation by overriding its backward method. Other frameworks provide comparable options. While these methods are common practice and usually work well, they also have several disadvantages, such as limited flexibility, additional source code overhead, poor usability, or a potentially strong negative impact on the effectiveness of automatic model optimization procedures. In this paper, an alternative way to formulate surrogate gradients is presented, namely, forward gradient injection (FGI). FGI applies a simple but effective combination of basic standard operations to inject an arbitrary gradient shape into the computational graph directly within the forward pass. It is demonstrated that using FGI is straightforward and convenient. Moreover, it is shown that FGI can significantly increase model performance in comparison to custom backward methods in SNNs when using TorchScript. These results are complemented by a general performance study on recurrent SNNs with TorchScript and torch.compile, revealing the potential for a training speedup of more than 7x and an inference speedup of more than 16x in comparison with pure PyTorch.
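A hedged sketch in the spirit of this idea (not the paper's exact formulation; the sigmoid surrogate shape is an assumption): the forward pass returns the hard Heaviside value, while `detach` arithmetic on basic ops routes the backward pass through a differentiable surrogate instead.

```python
import torch

def heaviside_fgi(x):
    # Forward value: hard Heaviside step (non-differentiable).
    hard = (x > 0).float()
    # Differentiable surrogate whose gradient we want to inject;
    # the shape is arbitrary - a scaled sigmoid is used here.
    soft = torch.sigmoid(4.0 * x)
    # Basic-op injection: numerically this equals `hard`, but only
    # `soft` contributes to the backward pass.
    return hard.detach() + soft - soft.detach()

x = torch.tensor([-1.0, 0.5], requires_grad=True)
y = heaviside_fgi(x)
y.sum().backward()
```

Because the construction uses only standard tensor operations, it survives TorchScript compilation without any custom `backward` override.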


RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup

Bohlinger, Nico, Dorer, Klaus

arXiv.org Artificial Intelligence

This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.


How to Run Inference on Ludwig Models Using TorchScript - Predibase

#artificialintelligence

In Ludwig 0.6, we have introduced the ability to export Ludwig models to TorchScript, making it easier than ever to deploy models for highly performant inference. In this blog post, we will describe the benefits of serving models using TorchScript and demonstrate how to train, export, and use the exported models on an example dataset. A common way to serve machine learning models is to wrap them in REST APIs and expose their endpoints. This works well if you do not have particularly strict SLA requirements or if backwards compatibility is not a concern. However, if you need to serve a model in a production environment, you will likely need a more robust solution.
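Stripped of the Ludwig-specific wrapper, the underlying TorchScript export/load round trip looks roughly like this (a generic sketch, not Ludwig's actual API): a scripted model is serialized and later reloaded for inference without any Python model definition.

```python
import io
import torch
import torch.nn as nn

# A stand-in model; Ludwig's export wraps the same TorchScript mechanism.
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
scripted = torch.jit.script(model)

# Serialize to a buffer (a file path works the same way), then reload
# without the original Python class being importable.
buf = io.BytesIO()
torch.jit.save(scripted, buf)
buf.seek(0)
restored = torch.jit.load(buf)

x = torch.randn(2, 3)
```

The reloaded artifact is self-contained, which is what makes TorchScript attractive for serving environments that cannot carry the training codebase.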


GitHub - VoltaML/voltaML: VoltaML is a lightweight library to convert and run your ML/DL deep learning models in high performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

#artificialintelligence

VoltaML can optimize, compile, and deploy your models to your target CPU and GPU devices with just one line of code. Classification was benchmarked on ImageNet data with batch size 1 and image size 224 on an NVIDIA RTX 2080 Ti. In terms of top-1 and top-5 accuracy for int8 models, we have not seen an accuracy drop of more than 1%. Object detection inference was done on dummy data with image size 640 and batch size 1 on an NVIDIA RTX 2080 Ti. Segmentation inference was done on dummy data with image size 224 and batch size 1 on the same GPU.


How PyTorch Is Challenging TensorFlow Lately

#artificialintelligence

Google's TensorFlow and Facebook's PyTorch are the most popular machine learning frameworks. TensorFlow has a head start: it was open-sourced in 2015, a year before PyTorch's 2016 release. TensorFlow's popularity reportedly declined after PyTorch burst onto the scene. However, Google released the more user-friendly TensorFlow 2.0 in 2019 to recover lost ground. PyTorch is emerging as a leader in terms of papers at leading research conferences.


Welcome to PyTorch Tutorials -- PyTorch Tutorials 1.8.0 documentation

#artificialintelligence

Learn how to load data, build deep neural networks, train and save your models in this quickstart guide. This tutorial introduces the fundamental concepts of PyTorch through self-contained examples. Use torch.nn to create and train a neural network. Learn to use TensorBoard to visualize data and model training. Train a generative adversarial network (GAN) to generate new celebrities.
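The quickstart workflow described above - load data, build a network, train, save - can be sketched in a few lines (synthetic regression data stands in for the tutorial's dataset and DataLoader):

```python
import torch
from torch import nn

# Build a small network with torch.nn.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Synthetic regression data standing in for a real Dataset/DataLoader.
X = torch.randn(64, 10)
y = X.sum(dim=1, keepdim=True)

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
final_loss = loss.item()

# Persist the trained weights, as in the tutorial's save step.
torch.save(model.state_dict(), "quickstart_model.pt")
```

The saved `state_dict` can later be restored with `model.load_state_dict(torch.load(...))`, completing the train-and-save loop the tutorial walks through.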