AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

Neural Information Processing Systems

Synchronization is a key step in data-parallel distributed machine learning (ML). Different synchronization systems and strategies perform differently, and achieving optimal parallel training throughput requires synchronization strategies that adapt to model structures and cluster configurations. Existing synchronization systems often consider only one or a few synchronization aspects, and the burden of deciding the right synchronization strategy is then placed on ML practitioners, who may lack the required expertise. In this paper, we develop a model- and resource-dependent representation for synchronization, which unifies multiple synchronization aspects ranging from synchronization architecture and message partitioning to placement scheme and communication topology. Based on this representation, we build an end-to-end pipeline, AutoSync, to automatically optimize synchronization strategies given model structures and resource specifications, lowering the bar for data-parallel distributed ML. By learning from low-shot data collected in only 200 trial runs, AutoSync can discover synchronization strategies up to 1.6x better than manually optimized ones. We develop transfer-learning mechanisms to further reduce the auto-optimization cost -- the simulators can transfer among similar model architectures, among similar cluster configurations, or both. We also present a dataset that contains over 10000 synchronization strategy and runtime pairs on a diverse set of models and cluster specifications.
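The abstract describes a search over per-variable synchronization choices (architecture, partitioning, placement, topology) guided by a learned runtime simulator. The sketch below is a hypothetical illustration of that idea, not AutoSync's actual API: the aspect names, the toy cost model, and the random-search loop are our own stand-ins, with the trial budget echoing the ~200 trial runs mentioned above.

```python
import random

# Hypothetical aspect space (names are illustrative, not AutoSync's):
# each trainable variable gets one choice along every aspect.
ASPECTS = {
    "architecture": ["parameter_server", "allreduce"],
    "partitions": [1, 2, 4],
    "placement": ["cpu", "gpu"],
    "topology": ["ring", "tree"],
}

def random_strategy(var_names, rng):
    """Sample one synchronization choice per model variable."""
    return {v: {a: rng.choice(opts) for a, opts in ASPECTS.items()}
            for v in var_names}

def simulated_runtime(strategy):
    """Stand-in for the learned simulator: a toy cost model that
    scores a strategy (lower is better). Real AutoSync fits this
    from low-shot profiling data instead of hand-coding it."""
    cost = 0.0
    for choices in strategy.values():
        cost += 1.0 / choices["partitions"]   # partitioning spreads messages
        cost += 0.5 if choices["architecture"] == "parameter_server" else 0.3
        cost += 0.2 if choices["placement"] == "cpu" else 0.1
    return cost

def search_strategy(var_names, trials=200, seed=0):
    """Random search under a fixed trial budget: sample candidate
    strategies, score them with the simulator, keep the best."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = random_strategy(var_names, rng)
        cost = simulated_runtime(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

A cheap simulator is what makes the trial budget matter: once fitted, thousands of candidate strategies can be scored without launching real distributed runs.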





Review for NeurIPS paper: AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

Neural Information Processing Systems

Additional Feedback: In Section 3.1, Equation (2), I believe $p$ is missing in $r_{\Pi_{i,k}}$. The example of saving 7 days and 2200 AWS credits should be given in the context of the full cost. In the subsection 'Search space evaluation', I don't understand how 42% for VGG16 and 28.5% can be considered a large positive hit rate. The way I understand it, this means 58% and 71.5% of the strategies were worse than the hand-optimized baselines. Why is there no hit rate in Figure 3? Table 3 would be more informative in terms of improvement measures.


Review for NeurIPS paper: AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

Neural Information Processing Systems

The authors cast the task of parallel training as a learning problem, allowing data-driven decisions to be made instead of hand-crafted rules. The topic is relevant and the results are impactful. The comprehensive ablation studies performed to evaluate the system are also appreciated. Several aspects of the proposed system have room for improvement, in terms of both scope and quality. However, that does not seem to be a crucial problem with the paper, but rather room for follow-up work.

