model parallelism
Piper: Multidimensional Planner for DNN Parallelization
The rapid increase in the size of state-of-the-art DNN models, and the resulting increase in the compute and memory requirements of model training, has led to the development of many execution schemes such as data parallelism, pipeline model parallelism, tensor (intra-layer) model parallelism, and various memory-saving optimizations. However, no prior work has tackled the highly complex problem of optimally partitioning the DNN computation graph across many accelerators while combining all these parallelism modes and optimizations. In this work, we introduce Piper, an efficient optimization algorithm for this problem that is based on a two-level dynamic programming approach. Our two-level approach is driven by the insight that being given tensor-parallelization techniques for individual layers (e.g., Megatron-LM's splits for transformer layers) significantly reduces the search space and makes the global problem tractable, compared to considering tensor-parallel configurations for the entire DNN operator graph.
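To give a feel for the kind of search a dynamic-programming planner performs, here is a minimal Python sketch. It is not Piper's algorithm: it assumes a linear chain of layers (not a general operator graph), ignores memory limits, communication cost, and tensor parallelism, and assumes that replicating a stage over `d` devices divides its time by exactly `d`. The function name `plan_stages` and the cost numbers are hypothetical.

```python
from functools import lru_cache
from itertools import accumulate


def plan_stages(layer_costs, num_devices):
    """Split a chain of layers into contiguous pipeline stages.

    Each stage is replicated over some number of devices (idealized data
    parallelism). Returns (bottleneck_time, stages), where stages is a list
    of (start, end, devices) tuples covering layers[start:end].
    Assumption: perfect scaling, no communication or memory model.
    """
    n = len(layer_costs)
    prefix = [0.0, *accumulate(layer_costs)]  # prefix[i] = cost of layers[0:i]

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best plan for layers[0:i] using exactly k devices.
        if i == 0:
            return (0.0, ()) if k == 0 else (float("inf"), ())
        if k == 0:
            return (float("inf"), ())
        result = (float("inf"), ())
        for j in range(i):              # last stage covers layers[j:i]
            for d in range(1, k + 1):   # devices given to the last stage
                prev_time, prev_stages = best(j, k - d)
                stage_time = (prefix[i] - prefix[j]) / d
                bottleneck = max(prev_time, stage_time)
                if bottleneck < result[0]:
                    result = (bottleneck, prev_stages + ((j, i, d),))
        return result

    time, stages = best(n, num_devices)
    return time, list(stages)


if __name__ == "__main__":
    # Hypothetical per-layer costs (arbitrary time units) and 4 devices.
    time, stages = plan_stages([4, 4, 8], 4)
    print(time, stages)
```

The objective here is the slowest stage, since in steady state a pipeline runs at the pace of its bottleneck. A planner of the kind the abstract describes would replace the idealized `stage_time` with a cost model that also accounts for per-layer tensor-parallel choices and memory-saving optimizations.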
ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training
Ding, Yuran, Chen, Xinwei, Zhang, Xiaofan, Zhou, Zongwei
Optimizing large language model (LLM) training on distributed domain-specific accelerator systems presents significant challenges due to its complex optimization space. Existing optimization methods rely on time-consuming manual tuning or resource-intensive black-box searches, which struggle to keep pace with the rapidly evolving LLM domain, leading to slow development and underutilized resources. To address this, we introduce ASAP, an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training. It is a multi-agent system, featuring Coordinator, Analyzer, and Proposal agents, which integrates LLM reasoning with insights from performance profiling tools, roofline analysis, and a knowledge base of best practices and successful past optimizations from human experts. Our proposed design can automate the diagnosis of performance bottlenecks and recommend optimized sharding configurations with reasoning, thus effectively improving the efficiency of distributed LLM training. Experiments have shown that the ASAP-generated sharding configurations can contribute up to a 28% reduction in training step time and a 1.43 times throughput improvement. When combined with additional optimization from human experts, throughput can be further increased to 2.58 times. The proposed ASAP promises to provide a scalable and explainable methodology for AI-assisted performance engineering in large-scale LLM training.
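The roofline analysis mentioned in the abstract can be illustrated with a short Python sketch of the standard roofline model, in which attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. This is a generic illustration, not ASAP's implementation; the function names and the hardware numbers are hypothetical.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Standard roofline bound in FLOP/s.

    peak_flops: peak compute throughput (FLOP/s)
    mem_bandwidth: peak memory bandwidth (bytes/s)
    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def classify(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Label a kernel by which side of the ridge point it falls on."""
    ridge = peak_flops / mem_bandwidth  # intensity where the two roofs meet
    return "memory-bound" if arithmetic_intensity < ridge else "compute-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
    peak, bandwidth = 100e12, 1e12
    for intensity in (10, 500):
        print(intensity,
              attainable_flops(peak, bandwidth, intensity),
              classify(peak, bandwidth, intensity))
```

A kernel that lands in the memory-bound region gains little from more compute, which is the kind of signal a bottleneck diagnosis step could use when deciding whether a different sharding configuration is worth proposing.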
ffe10334251de1dc98339d99ae4743ba-AuthorFeedback.pdf
We thank the reviewers for their thoughtful comments. But consider the case of training BERT on a TPU pod, which takes around 4 days. We provide a formalization of the problem with rigorous guarantees. We now address a few of the specific reviewer concerns. However, in the revised version of this paper we will include a more thorough discussion of this. That post draws on Courcelle's theorem (namely, every graph property definable in the monadic second-order [...] We feel that it's more accurate to avoid [...]