device placement
Efficient Algorithms for Device Placement of DNN Graph Operators
Modern machine learning workloads use large models with complex structures that are very expensive to execute. The devices that execute these models are becoming increasingly heterogeneous, as Domain Specific Architectures (DSAs) flourish as hardware accelerators alongside CPUs. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices that is fed a stream of samples and yields high throughput for training and inference of DNNs. However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices.
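To make the optimization target concrete, here is a minimal sketch of pipeline-style placement for a linear chain of operators: split the chain into contiguous stages, one per device, so that the slowest stage (the pipeline bottleneck) is as fast as possible. The dynamic program, the per-op costs, and the device speed factors are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch: split a linear operator chain into k contiguous stages
# (one per device) minimizing the slowest stage. Cost model is made up.

def min_bottleneck_split(op_costs, device_speeds):
    """op_costs[i]: cost of operator i on a reference device.
    device_speeds[j]: relative speed of device j (a stage on device j runs
    at cost/speed). Returns (bottleneck_time, list of stage boundaries)."""
    n, k = len(op_costs), len(device_speeds)
    INF = float("inf")
    # dp[j][i] = best bottleneck using devices 0..j-1 for operators 0..i-1
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    prefix = [0.0]
    for c in op_costs:
        prefix.append(prefix[-1] + c)
    for j in range(1, k + 1):
        for i in range(n + 1):
            for s in range(i + 1):  # operators s..i-1 form stage j-1
                stage = (prefix[i] - prefix[s]) / device_speeds[j - 1]
                cand = max(dp[j - 1][s], stage)
                if cand < dp[j][i]:
                    dp[j][i], cut[j][i] = cand, s
    # Recover the stage boundaries by walking the cut table backwards.
    bounds, i = [], n
    for j in range(k, 0, -1):
        s = cut[j][i]
        bounds.append((s, i))
        i = s
    return dp[k][n], list(reversed(bounds))

# Hypothetical workload: 6 ops, 3 heterogeneous devices (4x, 2x, 1x speed).
t, stages = min_bottleneck_split([4, 2, 6, 3, 5, 1], [4.0, 2.0, 1.0])
print(t, stages)
```

The bottleneck objective reflects pipeline throughput: once the pipeline is full, one sample leaves it per max-stage-time interval, so minimizing the slowest stage maximizes throughput.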
Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
Training deep neural networks requires an enormous amount of computational resources, including a heterogeneous mix of GPU and CPU devices. It is critical to place the operations of a neural network on these devices optimally, so that training completes in the shortest possible time. The state of the art uses reinforcement learning to learn placement skills by repeatedly performing Monte-Carlo experiments. However, because it treats all placement samples equally, we argue that there remains ample room for significant improvement. In this paper, we propose a new joint learning algorithm, called Post, that integrates cross-entropy minimization and proximal policy optimization to achieve theoretically guaranteed optimal efficiency. To incorporate the cross-entropy method as a sampling technique, we propose to represent placements using discrete probability distributions, which allows us to estimate an optimal probability mass by maximum likelihood estimation, a powerful tool with the best possible efficiency. We have implemented Post on the Google Cloud platform, and our extensive experiments with several popular neural network training benchmarks demonstrate clear evidence of superior performance: with the same amount of learning time, it produces placements whose training times are up to 63.7% shorter than the state of the art.
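The sampling loop the abstract describes can be sketched directly: keep one categorical distribution per operator, draw Monte-Carlo placement samples, and refit the distributions by maximum likelihood on the fastest ("elite") samples. The cost model below is a stand-in assumption, and the sketch omits Post's proximal-policy-optimization component entirely.

```python
# Cross-entropy method for device placement: per-op categorical
# distributions, refit by MLE on the elite samples each iteration.
import numpy as np

rng = np.random.default_rng(0)
n_ops, n_devs = 8, 3
probs = np.full((n_ops, n_devs), 1.0 / n_devs)  # per-op placement distribution

def runtime(placement):
    # Hypothetical cost model: per-op compute time plus a fixed penalty
    # whenever consecutive ops land on different devices (communication).
    op_cost = np.array([1.0, 2.0, 1.5, 3.0, 1.0, 2.5, 1.0, 2.0])
    dev_speed = np.array([2.0, 1.5, 1.0])
    compute = (op_cost / dev_speed[placement]).sum()
    comm = 0.5 * np.count_nonzero(placement[1:] != placement[:-1])
    return compute + comm

for it in range(50):
    # Monte-Carlo sampling of placements from the current distributions.
    samples = np.stack([
        np.array([rng.choice(n_devs, p=probs[i]) for i in range(n_ops)])
        for _ in range(64)
    ])
    times = np.array([runtime(s) for s in samples])
    elite = samples[np.argsort(times)[:8]]  # keep the fastest 12.5%
    # MLE refit: new probability mass = normalized device counts per op,
    # smoothed so no device's probability collapses to exactly zero.
    counts = np.stack([(elite == d).mean(axis=0) for d in range(n_devs)], axis=1)
    probs = 0.9 * counts + 0.1 * probs

best = probs.argmax(axis=1)
print("placement:", best, "runtime:", runtime(best))
```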
Reviews: Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
This is great work, as it tackles an important problem: graph partitioning in heterogeneous, multi-device settings. An increasing number of problems could benefit from resource allocation optimization techniques such as the one described in this work. ML, and specifically RL, techniques have recently been developed to solve the device placement problem. This work addresses one of the main deficiencies of the prior work by making it more sample-efficient (as demonstrated by the empirical results). The novelty is in the way the placement parameters are trained: as opposed to directly training a placement policy for the best runtime, a softmax is used to model, for each op, the distribution of placements over the pool of available devices.
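For concreteness, the parameterization the review describes might look like the following sketch: one logit vector per op, turned into a softmax over the available devices, trained here with a plain REINFORCE-style update against a made-up runtime signal. The update rule and the cost signal are illustrative assumptions, not the paper's exact method.

```python
# Softmax-parameterized placement distribution with a REINFORCE update.
import numpy as np

rng = np.random.default_rng(1)
n_ops, n_devs, lr = 5, 3, 0.1
logits = np.zeros((n_ops, n_devs))  # one logit vector per op
baseline = 0.0

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def runtime(placement):
    # Hypothetical cost signal for the sketch: device 0 is always fastest.
    return float(placement.sum() + 1)

for _ in range(300):
    p = softmax(logits)
    placement = np.array([rng.choice(n_devs, p=p[i]) for i in range(n_ops)])
    reward = -runtime(placement)                  # faster => higher reward
    baseline = 0.9 * baseline + 0.1 * reward      # running-average baseline
    grad = -p.copy()                              # d log pi(a) / d logits
    grad[np.arange(n_ops), placement] += 1.0
    logits += lr * (reward - baseline) * grad     # policy-gradient step

print(softmax(logits).argmax(axis=1))  # converges toward all-zeros (device 0)
```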
A structure-aware framework for learning device placements on computation graphs
Shukai Duan, Heng Ping, Nikos Kanakaris, Xiongye Xiao, Peiyu Zhang, Panagiotis Kyriakis, Nesreen K. Ahmed, Guixiang Ma, Mihai Capota, Shahin Nazarian, Theodore L. Willke, Paul Bogdan
Existing approaches for device placement ignore the topological features of computation graphs and rely mostly on heuristic methods for graph partitioning. At the same time, they follow either a grouper-placer or an encoder-placer architecture, which requires understanding the interaction structure between code operations. To bridge the gap between encoder-placer and grouper-placer techniques, we propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit, using reinforcement learning. The framework consists of five steps, including graph coarsening, node representation learning, and policy optimization. It facilitates end-to-end training and takes into consideration the directed and acyclic nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, that enables joint graph representation learning and personalized graph partitioning, using an unspecified number of groups. To train the entire framework, we employ reinforcement learning techniques, using the execution time of the suggested device placements to formulate the reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve inference speed for the benchmark models by up to 58.2% over CPU execution and by up to 60.24% compared to other commonly used baselines.
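One of those steps, graph coarsening, can be illustrated with a generic chain-collapsing pass: nodes of the computation DAG that have exactly one predecessor and one successor are absorbed into their neighbor, shrinking the placement problem before any learning happens. This is a textbook coarsening idea shown for illustration, not the framework's actual coarsening or graph-parsing procedure.

```python
# Generic one-step DAG coarsening: collapse linear chains into super-nodes.
from collections import defaultdict

def coarsen_chains(edges):
    """edges: list of (u, v) pairs of a DAG. Returns the coarsened edge set
    plus a mapping from each original node to its super-node representative."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    rep = {n: n for n in nodes}

    def find(n):  # union-find representative lookup with path compression
        while rep[n] != n:
            rep[n] = rep[rep[n]]
            n = rep[n]
        return n

    for n in nodes:
        # A node in the middle of a chain is absorbed into its predecessor.
        if len(pred[n]) == 1 and len(succ[n]) == 1:
            (p,) = pred[n]
            if len(succ[p]) == 1:  # the predecessor also continues the chain
                rep[find(n)] = find(p)
    coarse = {(find(u), find(v)) for u, v in edges if find(u) != find(v)}
    return sorted(coarse), {n: find(n) for n in nodes}

# Hypothetical toy graph: a -> b -> c -> d with a side branch c -> e.
# The chain a -> b collapses into one super-node; c stays separate because
# it has two successors.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
print(coarsen_chains(edges))
```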