AITopics | ml compiler

ml compiler

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transferable Graph Optimizers for ML Compilers

Neural Information Processing Systems, Dec-24-2025, 09:17:38 GMT

Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics-based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to suboptimal solutions, especially for newer model architectures. Existing learning-based approaches in the literature are sample-inefficient, tackle a single optimization problem, and do not generalize to unseen graphs, making them infeasible to deploy in practice. To address these limitations, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO), based on a scalable sequential attention mechanism over an inductive graph neural network. GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate a 33%-60% speedup on three graph optimization tasks compared to TensorFlow's default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves on average a 21% improvement over human experts and an 18% improvement over the prior state of the art with 15x faster convergence, on a device placement task evaluated in real systems.
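
To make the device placement task concrete, here is a minimal sketch built on a toy cost model that is our own assumption, not GO's: each node has a compute cost, each edge has a transfer cost paid only when its endpoints sit on different devices, and a placement is scored as the busiest device's load plus total cross-device traffic. The graph (`COMPUTE`, `EDGES`) and the helper names are hypothetical. A whole-graph decision here is simply one complete node-to-device assignment; the brute-force search stands in for the learned policy that GO would use instead.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toy graph: node -> compute cost (arbitrary units).
COMPUTE: Dict[str, float] = {"embed": 4.0, "lstm1": 6.0, "lstm2": 6.0, "softmax": 2.0}
# Hypothetical edges: (src, dst, transfer cost paid only if src and dst are on different devices).
EDGES: List[Tuple[str, str, float]] = [
    ("embed", "lstm1", 1.0),
    ("lstm1", "lstm2", 1.0),
    ("lstm2", "softmax", 1.0),
]

def placement_cost(placement: Dict[str, int], num_devices: int) -> float:
    """Toy cost model: busiest device's compute load plus all cross-device transfers."""
    load = [0.0] * num_devices
    for node, cost in COMPUTE.items():
        load[placement[node]] += cost
    comm = sum(w for src, dst, w in EDGES if placement[src] != placement[dst])
    return max(load) + comm

def best_placement(num_devices: int) -> Tuple[Dict[str, int], float]:
    """Score every whole-graph assignment exhaustively (feasible only for tiny graphs)."""
    nodes = list(COMPUTE)
    best: Dict[str, int] = {}
    best_cost = float("inf")
    for devices in product(range(num_devices), repeat=len(nodes)):
        placement = dict(zip(nodes, devices))
        cost = placement_cost(placement, num_devices)
        if cost < best_cost:  # strict '<' keeps the first optimum found
            best, best_cost = placement, cost
    return best, best_cost

placement, cost = best_placement(2)
print(placement, cost)
```

Exhaustive search grows as `num_devices ** num_nodes`, which is hopeless for graphs of up to 80,000 nodes; that gap is what learned, transferable placement policies aim to close.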

ml compiler, name change, transferable graph optimizer

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Review for NeurIPS paper: Transferable Graph Optimizers for ML Compilers

Neural Information Processing Systems, Jan-27-2025, 02:40:31 GMT

This paper describes a well-motivated and elegant method for optimizing the execution of neural network computation graphs using reinforcement learning. The paper is well written; however, there is no open-source implementation, so the results may be difficult to reproduce.

ml compiler, neurips paper, transferable graph optimizer

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Transferable Graph Optimizers for ML Compilers

Neural Information Processing Systems, Oct-10-2024, 22:52:45 GMT

graph optimization task, ml compiler, transferable graph optimizer

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)

Operator Fusion in XLA: Analysis and Evaluation

Snider, Daniel, Liang, Ruofan

arXiv.org Artificial Intelligence, Jan-30-2023

Machine learning (ML) compilers are an active area of research because they offer the potential to automatically speed up tensor programs. Kernel fusion is often cited as an important optimization performed by ML compilers. However, there is a knowledge gap about how XLA, the most common ML compiler, applies this nuanced optimization, what kind of speedup it can afford, and what low-level effects it has on hardware. Our paper aims to bridge this gap by studying key compiler passes in XLA's source code. Our evaluation on the reinforcement learning environment Cartpole shows how different fusion decisions in XLA are made in practice. Furthermore, we implement several XLA kernel fusion strategies that can achieve up to a 10.56x speedup compared to our baseline implementation.
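
The idea behind kernel fusion can be illustrated with a minimal sketch in plain Python (not XLA itself). The unfused pipeline for `relu(x * a + b)` runs three separate elementwise "kernels", each materializing a full intermediate array, whereas the fused version computes every output element in a single pass. The function names and the `buffers` counter are hypothetical; the counter is only a stand-in for global-memory writes, and XLA's real fusion decisions are made on its HLO graph with far more nuance.

```python
from typing import List, Tuple

Vector = List[float]

def unfused(x: Vector, a: float, b: float) -> Tuple[Vector, int]:
    """Three separate elementwise passes; each one writes a full array."""
    buffers = 0
    t1 = [v * a for v in x]          # kernel 1: multiply, writes t1
    buffers += 1
    t2 = [v + b for v in t1]         # kernel 2: add, reads t1 back, writes t2
    buffers += 1
    out = [max(v, 0.0) for v in t2]  # kernel 3: relu, reads t2 back, writes out
    buffers += 1
    return out, buffers

def fused(x: Vector, a: float, b: float) -> Tuple[Vector, int]:
    """One pass: intermediates live only in locals (registers, on a GPU)."""
    out = [max(v * a + b, 0.0) for v in x]
    return out, 1  # only the final output array is written

x = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(unfused(x, 2.0, 1.0))
print(fused(x, 2.0, 1.0))
```

Both versions perform the same floating-point operations in the same order, so their outputs match exactly; the fused one simply never writes the two intermediate arrays, which is where the savings come from on memory-bound chains of elementwise ops.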

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2301.13062

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

5 Types of ML Accelerators

#artificialintelligence, Nov-2-2022, 14:30:08 GMT

Originally published on Towards AI. The past decade has been the era of deep learning.

compiler, lifecycle, platform

#artificialintelligence

Industry: Information Technology > Services (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Optimizing Data Collection in Deep Reinforcement Learning

Gleeson, James, Snider, Daniel, Yang, Yvonne, Gabel, Moshe, de Lara, Eyal, Pekhimenko, Gennady

arXiv.org Artificial Intelligence, Jul-15-2022

Reinforcement learning (RL) workloads take a notoriously long time to train due to the large number of samples collected at run time from simulators. Unfortunately, cluster scale-up approaches remain expensive, and commonly used CPU implementations of simulators induce high overhead when switching back and forth between CPU simulation and GPU computation. We explore two optimizations that increase RL data collection efficiency by increasing GPU utilization: (1) GPU vectorization: parallelizing simulation on the GPU for increased hardware parallelism, and (2) simulator kernel fusion: fusing multiple simulation steps to run in a single GPU kernel launch to reduce global memory bandwidth requirements. We find that GPU vectorization can achieve up to a $1024\times$ speedup over commonly used CPU simulators. We profile the performance of different implementations and show that, for a simple simulator, ML compiler implementations (XLA) of GPU vectorization outperform a DNN framework (PyTorch) by $13.4\times$ by reducing CPU overhead from repeated Python-to-DL-backend API calls. We show that simulator kernel fusion speedups with a simple simulator are $11.3\times$ and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements. We show that the speedups from simulator kernel fusion are orthogonal to and combinable with GPU vectorization, leading to a multiplicative speedup.
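
The two optimizations can be sketched together in plain Python, assuming the classic Gym CartPole dynamics (Euler integration, standard constants). `step_batch` advances a whole batch of environments in one call, which is the shape GPU vectorization parallelizes, and `fused_steps` runs several steps inside one call, standing in for simulator kernel fusion. The `Simulator` class and its `launches` counter are hypothetical proxies for kernel launches; no GPU, XLA, or PyTorch is involved, so this shows the structure of the optimizations, not their speedups.

```python
import math
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (x, x_dot, theta, theta_dot)

# Constants of the classic Gym CartPole environment.
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
LENGTH = 0.5  # half of the pole's length
POLEMASS_LENGTH = MASS_POLE * LENGTH
FORCE_MAG, TAU = 10.0, 0.02  # push force and integration time step

def cartpole_step(s: State, action: int) -> State:
    """Advance one environment by one Euler step."""
    x, x_dot, theta, theta_dot = s
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (x + TAU * x_dot, x_dot + TAU * x_acc,
            theta + TAU * theta_dot, theta_dot + TAU * theta_acc)

class Simulator:
    def __init__(self) -> None:
        self.launches = 0  # hypothetical proxy for GPU kernel launches

    def step_batch(self, states: List[State], actions: List[int]) -> List[State]:
        """Vectorized: one launch advances every environment by one step."""
        self.launches += 1
        return [cartpole_step(s, a) for s, a in zip(states, actions)]

    def fused_steps(self, states: List[State], action_seq: List[List[int]]) -> List[State]:
        """Fused: one launch advances every environment by len(action_seq) steps."""
        self.launches += 1
        for actions in action_seq:
            states = [cartpole_step(s, a) for s, a in zip(states, actions)]
        return states
```

Because both paths call `cartpole_step` in the same order, they produce identical states; only the number of launches differs, and that per-launch overhead is what fusion removes.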

implementation, kernel fusion, simulator

arXiv.org Artificial Intelligence

2207.07736

Country:

North America > Canada > Ontario > Toronto (0.49)
North America > United States > California > Santa Clara County > Santa Clara (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.47)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Why are ML Compilers so Hard?

#artificialintelligence, Feb-12-2022, 00:35:30 GMT

Even before the first version of TensorFlow was released, the XLA project was integrated as a "domain-specific compiler" for its machine learning graphs. Since then there have been a lot of other compilers aimed at ML problems, like TVM, MLIR, EON, and GLOW. They have all been very successful in different areas, but they're still not the primary way for most users to run machine learning models. In this post I want to talk about some of the challenges that face ML compiler writers, and some approaches I think may help in the future. I'm not a compiler expert at all, but I have been working on infrastructure to run deep learning models across different platforms for the last ten years, so most of my observations come from being a user rather than an implementer of compiler technology.

compiler, operation, representation

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

How can we democratize machine learning on IoT devices?

#artificialintelligence, Mar-15-2020, 22:01:14 GMT

TinyML, as a concept, concerns running ML inference on Ultra Low-Power (ULP 1mW) microcontrollers found in IoT devices. Yet today, various challenges still limit the effective execution of TinyML in the embedded IoT world. As both a concept and a community, it is still under development. Here at Ericsson, the focus of our TinyML as-a-Service (TinyMLaaS) activity is to democratize TinyML, enabling manufacturers to start their AI businesses using TinyML, which runs on 8-, 16- and 32-bit microcontrollers. Our goal is to make the execution of ML tasks possible and easy in a specific class of devices.

compiler, ecosystem, ml compiler

#artificialintelligence

Country: Europe > Hungary > Budapest > Budapest (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.86)
Information Technology > Internet of Things (0.72)
