AITopics | parallelism

parallelism

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

References

Neural Information Processing Systems, Apr-25-2026, 06:15:28 GMT

Distributed balanced partitioning via linear embedding. Language models are few-shot learners. GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. More effective distributed ML via a stale synchronous parallel parameter server. TransGAN: Two pure transformers can make one strong GAN, and that can scale up.

artificial intelligence, arxiv preprint arxiv, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

Neural Information Processing Systems, Apr-25-2026, 06:15:25 GMT

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed-systems expertise to carefully design model-parallel execution strategies that suit the model architecture and cluster setup. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches it for high-performing strategies by leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers and cluster setups with more heterogeneous accelerators and bandwidth. We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match expert-tuned strategies on typical cluster setups. On heterogeneous clusters and on models with heterogeneous architectures, AMP finds strategies with 1.54× and 1.77× higher throughput than state-of-the-art model-parallel systems, respectively.
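To make "searching a strategy space with a cost model" concrete, here is a minimal Python sketch. It is not AMP's cost model or search procedure. The cost formula, the GPipe-style pipeline term, and the names `balanced_stages`, `estimate_time`, and `search` are hypothetical. The sketch enumerates every data-, tensor-, and pipeline-parallel degree (dp, tp, pp) whose product equals the device count. For each candidate, it balances uneven layers across pipeline stages with a small dynamic program and keeps the cheapest estimate. Unlike AMP, it assumes identical devices and uniform bandwidth.

```python
from functools import lru_cache


def balanced_stages(layer_costs, pp):
    """Split layers into pp contiguous stages, minimizing the slowest stage."""
    n = len(layer_costs)
    prefix = [0.0]
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Smallest possible bottleneck for layers i..n-1 split into k stages.
        if k == 1:
            return prefix[n] - prefix[i]
        return min(
            max(prefix[j] - prefix[i], best(j, k - 1))
            for j in range(i + 1, n - k + 2)  # first stage is layers i..j-1
        )

    return best(0, pp)


def estimate_time(layer_costs, dp, tp, pp, microbatches, tp_comm, dp_comm):
    """Toy per-iteration time model (hypothetical; identical devices assumed)."""
    # Compute is split evenly across tensor- and data-parallel ranks.
    scaled = [c / (dp * tp) for c in layer_costs]
    per_microbatch = balanced_stages(scaled, pp) / microbatches
    # GPipe-style fill and drain: (microbatches + pp - 1) slots of the slowest stage.
    pipeline = (microbatches + pp - 1) * per_microbatch
    tp_cost = tp_comm * 2 * (tp - 1) / tp  # ring all-reduce volume factor
    dp_cost = dp_comm * 2 * (dp - 1) / dp / (tp * pp)  # each rank holds a 1/(tp*pp) shard
    return pipeline + tp_cost + dp_cost


def search(layer_costs, num_devices, microbatches, tp_comm, dp_comm):
    """Score every (dp, tp, pp) with dp * tp * pp == num_devices; keep the best."""
    best = None
    for dp in range(1, num_devices + 1):
        for tp in range(1, num_devices // dp + 1):
            if num_devices % (dp * tp):
                continue
            pp = num_devices // (dp * tp)
            if pp > len(layer_costs):
                continue  # every stage needs at least one layer
            t = estimate_time(layer_costs, dp, tp, pp, microbatches, tp_comm, dp_comm)
            if best is None or t < best[0]:
                best = (t, dp, tp, pp)
    return best


if __name__ == "__main__":
    # Hypothetical uneven per-layer costs on 8 identical devices.
    print(search([1.0, 3.0, 1.0, 3.0, 2.0, 2.0], 8, microbatches=4, tp_comm=0.5, dp_comm=2.0))
```

The result is a tuple `(estimated_time, dp, tp, pp)`. Raising `tp_comm` relative to `dp_comm` makes tensor-parallel candidates more expensive and tends to steer the search toward data or pipeline parallelism.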

artificial intelligence, machine learning, optimization problem

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference

Neural Information Processing Systems, Mar-18-2026, 03:19:41 GMT

Large Transformer networks are increasingly used in settings where low inference latency is necessary to enable new applications and improve the end-user experience. However, autoregressive inference is resource-intensive and requires parallelism for efficiency. Parallelism introduces collective communication, which is expensive and marks a phase in which hardware resources are underutilized. To mitigate this, Kraken, an evolution of the standard Transformer architecture, is designed to complement existing tensor parallelism schemes for efficient inference on multi-device systems. By introducing a fixed degree of intra-layer model parallelism, the architecture allows collective operations to be overlapped with compute, decreasing latency and increasing hardware utilization. When trained on OpenWebText, Kraken models reach perplexity similar to that of standard Transformers while preserving their language modeling capabilities, as evaluated on the SuperGLUE benchmark. Importantly, when tested on multi-GPU systems using TensorRT-LLM engines, Kraken speeds up Time To First Token by a mean of 35.6% across a range of model sizes, context lengths, and degrees of tensor parallelism.
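A back-of-the-envelope latency model shows why overlapping collectives with compute helps. The sketch below is hypothetical and is not Kraken's implementation. Per-layer times are made-up inputs, and `overlapped_latency` assumes an idealized schedule in which layer i's collective runs while layer i+1 computes, with one collective on the link at a time.

```python
def serial_latency(compute, comm):
    """Each layer's collective must finish before the next layer computes."""
    return sum(c + m for c, m in zip(compute, comm))


def overlapped_latency(compute, comm):
    """Idealized overlap: layer i's collective runs while layer i+1 computes.

    Assumes (hypothetically) that the next layer's compute never waits on the
    previous collective, and that only one collective uses the link at a time.
    """
    compute_done = 0.0
    comm_done = 0.0
    for c, m in zip(compute, comm):
        compute_done += c  # compute units run back to back
        # A collective starts once its layer's output exists and the link is free.
        comm_done = max(comm_done, compute_done) + m
    return comm_done


compute = [2.0, 2.0, 2.0]  # hypothetical per-layer compute times (ms)
comm = [1.0, 1.0, 1.0]     # hypothetical per-layer collective times (ms)
print(serial_latency(compute, comm), overlapped_latency(compute, comm))
```

With 2 ms of compute and 1 ms of communication per layer over three layers, the script prints `9.0 7.0`. When compute dominates, overlap hides every collective except the last one.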

artificial intelligence, large language model, natural language

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

b14680dec683e744ada1f2fe08614086-Supplemental.pdf

Neural Information Processing Systems, Feb-19-2026, 06:21:37 GMT

accelerator, graph, workload

Country:

North America > Canada (0.04)
Europe > Germany (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

ASPEN: Breaking Operator Barriers for Efficient Parallel Execution of Deep Neural Networks

Neural Information Processing Systems, Feb-17-2026, 10:07:04 GMT

ASPEN also achieves high resource utilization and memory reuse by letting each resource asynchronously traverse the DNN graph depthwise, to its full computing potential.
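A small scheduling simulation illustrates the idea of each resource traversing the graph depthwise instead of waiting for a whole operator to finish. This is a hypothetical sketch, not ASPEN's runtime:

- Workers advance in lock-step ticks so that the output is deterministic.
- Each worker keeps a private stack of the nodes it just made ready, so it keeps descending the graph.
- An idle worker steals the oldest node from another worker's stack.

`depthwise_schedule` and the stealing rule are assumptions for illustration.

```python
from collections import deque


def depthwise_schedule(graph, num_workers):
    """Lock-step simulation of workers traversing a DAG depth-first.

    graph maps every node (each must appear as a key) to the nodes that
    depend on it. Returns a trace of (tick, worker, node) tuples.
    """
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    shared = deque(n for n, d in indegree.items() if d == 0)  # initial ready nodes
    local = [[] for _ in range(num_workers)]  # per-worker stacks of ready nodes
    trace, done, tick = [], 0, 0
    while done < len(graph):
        picks = [None] * num_workers
        # Depth-first: a worker continues with the newest node it made ready.
        for w in range(num_workers):
            if local[w]:
                picks[w] = local[w].pop()
        # Idle workers take initial work, otherwise steal another worker's oldest node.
        for w in range(num_workers):
            if picks[w] is None:
                if shared:
                    picks[w] = shared.popleft()
                else:
                    victim = next((v for v in range(num_workers) if local[v]), None)
                    if victim is not None:
                        picks[w] = local[victim].pop(0)
        if all(p is None for p in picks):
            raise ValueError("graph has a cycle")
        # A successor becomes ready as soon as its own inputs finish (no
        # operator-wide barrier) and is handed to the worker that freed it.
        for w, node in enumerate(picks):
            if node is None:
                continue
            trace.append((tick, w, node))
            done += 1
            for s in graph[node]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    local[w].append(s)
        tick += 1
    return trace


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(depthwise_schedule(diamond, num_workers=2))
```

On the diamond graph with two workers:

- Worker 0 runs `a`.
- Worker 0 then runs `c`, while worker 1 steals `b`.
- Worker 1 runs `d` in the next tick, because finishing `b` released `d` onto its own stack.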

artificial intelligence, execution, machine learning

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Africa > Mali (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Accelerated On-Device Forward Neural Network Training with Module-Wise Descending Asynchronism

Neural Information Processing Systems, Feb-16-2026, 07:02:35 GMT

However, FGD's dependencies across layers hinder parallel computation and can lead to inefficient resource utilization.
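For readers unfamiliar with forward gradient descent (FGD), the minimal sketch below shows the generic forward-gradient estimator on a hypothetical two-layer linear network. A random tangent is pushed forward alongside the activations. The resulting directional derivative, multiplied by the tangent, is an unbiased gradient estimate obtained without a backward pass. The network, the loss, and the function names are assumptions; this is not the paper's AsyncFGD method. The line computing `dy` shows the cross-layer dependency in miniature: layer 2's tangent cannot be formed until layer 1's tangent exists.

```python
import numpy as np


def forward_gradient(W1, W2, x, t, rng):
    """One forward-gradient estimate for the loss 0.5 * ||W2 @ W1 @ x - t||^2.

    Hypothetical setup: a two-layer linear network with standard normal
    tangent directions. Only forward computations are used.
    """
    V1 = rng.standard_normal(W1.shape)  # random tangent for layer-1 weights
    V2 = rng.standard_normal(W2.shape)  # random tangent for layer-2 weights
    h, dh = W1 @ x, V1 @ x              # layer 1: value and tangent
    y, dy = W2 @ h, V2 @ h + W2 @ dh    # layer 2's tangent needs layer 1's tangent
    d = float((y - t) @ dy)             # directional derivative of the loss
    return d * V1, d * V2               # unbiased estimate of the true gradient


def true_gradient(W1, W2, x, t):
    """Reference gradient from the usual backward (chain-rule) formulas."""
    h = W1 @ x
    r = W2 @ h - t
    return np.outer(W2.T @ r, x), np.outer(r, h)
```

Averaging many calls to `forward_gradient` approaches `true_gradient`, but any single estimate is noisy.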

artificial intelligence, asyncfgd, machine learning

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > UAE (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations

Neural Information Processing Systems, Feb-16-2026, 01:34:19 GMT

First-order optimization (FOO) algorithms are pivotal in numerous computational domains, such as reinforcement learning and deep learning. However, their application to complex tasks often entails significant optimization inefficiency because they need many sequential iterations to converge. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first general framework that enhances the optimization efficiency of FOO by leveraging parallel computing to directly mitigate its requirement of many sequential iterations for convergence. To achieve this, OptEx uses a kernelized gradient estimation, based on the history of evaluated gradients, to predict the gradients required by the next few sequential iterations in FOO. This breaks the inherent iterative dependency and hence enables the approximate parallelization of iterations in FOO. We further establish theoretical guarantees for the estimation error of our kernelized gradient estimation and for the iteration complexity of SGD-based OptEx. These guarantees confirm that the estimation error diminishes to zero as the history of gradients accumulates, and that SGD-based OptEx enjoys an effective acceleration rate of Θ(√N) over standard SGD given parallelism of N, in terms of the sequential iterations required for convergence. Finally, we provide extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training on various datasets, to underscore the substantial efficiency improvements achieved by OptEx in practice. Our implementation is available at https://github.com/youyve/OptEx .
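The sketch below illustrates the two ingredients the abstract describes:

- A kernelized estimate of the gradient from the history of evaluated gradients. Here this is a Gaussian-process posterior mean with an RBF kernel.
- A rollout that uses those estimates to propose the next few iterates, so that their true gradients can be evaluated concurrently.

The kernel, its `lengthscale` and `noise` values, the history cap `max_hist`, and the rule of taking one real step from the last proxy point are all assumptions. This is not the OptEx implementation in the linked repository.

```python
import numpy as np


def rbf(A, B, lengthscale):
    # Squared-exponential kernel between row sets A (m, d) and B (n, d).
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)


def predict_gradient(x, X_hist, G_hist, lengthscale=1.0, noise=1e-4):
    """Kernel-regression (GP posterior mean) estimate of grad f(x) from history."""
    K = rbf(X_hist, X_hist, lengthscale) + noise * np.eye(len(X_hist))
    k = rbf(x[None, :], X_hist, lengthscale)  # shape (1, n)
    return (k @ np.linalg.solve(K, G_hist))[0]


def approx_parallel_iteration(x, X_hist, G_hist, grad_fn, lr, n_parallel, max_hist=64):
    """One hypothetical OptEx-style round with n_parallel gradient evaluations."""
    # 1) Roll out n_parallel - 1 proxy steps using *predicted* gradients.
    points = [x]
    for _ in range(n_parallel - 1):
        g_hat = predict_gradient(points[-1], X_hist, G_hist)
        points.append(points[-1] - lr * g_hat)
    # 2) True gradients at all proxy points are independent of each other,
    #    so they could run on n_parallel workers (done serially here).
    grads = [grad_fn(p) for p in points]
    # 3) Extend the (capped) history, then take one real step from the last point.
    X_hist = np.vstack([X_hist, np.array(points)])[-max_hist:]
    G_hist = np.vstack([G_hist, np.array(grads)])[-max_hist:]
    return points[-1] - lr * grads[-1], X_hist, G_hist


# Usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
grad_fn = lambda v: v
x0 = np.array([2.0, -1.0])
x, X_hist, G_hist = x0, x0[None, :], grad_fn(x0)[None, :]
for _ in range(10):
    x, X_hist, G_hist = approx_parallel_iteration(x, X_hist, G_hist, grad_fn, lr=0.2, n_parallel=4)
```

Step 2 is the part that would run on N workers in parallel; in this sketch it is a plain list comprehension.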

artificial intelligence, deep learning, machine learning

Country: