Appendix of A Deep Learning Dataloader with Shared Data Preparation
In this part, we show the I/O speed in the synchronous and asynchronous cases. Figure 3a shows the I/O speed for four jobs that start at different moments. We then compare RefCnt with the generic cache policy in the above cases. D = sample([0, 13333], 10000) means sampling a subset D of size 10000 from [0, 13333] uniformly at random. DSA always achieves the minimum number of misses.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
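The sampling notation above can be made concrete with a short Python sketch; the `sample` helper below is written for illustration and is not the paper's implementation:

```python
import random

def sample(id_range, k):
    """Draw a subset of k distinct sample IDs uniformly at random
    from the inclusive range [id_range[0], id_range[1]]."""
    lo, hi = id_range
    return random.sample(range(lo, hi + 1), k)

# D = sample([0, 13333], 10000): a 10000-element subset of {0, ..., 13333}
D = sample([0, 13333], 10000)
```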
Verifiable Split Learning via zk-SNARKs
Alaa, Rana, González-Ferreiro, Darío, Beis-Penedo, Carlos, Fernández-Veiga, Manuel, Díaz-Redondo, Rebeca P., Fernández-Vilas, Ana
Split learning is an approach to collaborative learning in which a deep neural network is divided at a cut layer into two parts: a client side and a server side. The client side executes its part of the model on its raw input data and sends the intermediate activation to the server side. This architecture is very useful for enabling collaborative training when data or resources are separated across devices. However, split learning lacks the ability to verify the correctness and honesty of the computations performed and exchanged between the parties. To this end, this paper proposes a verifiable split learning framework that integrates zk-SNARK proofs to ensure correctness and verifiability. The zk-SNARK proof and verification are generated for both forward propagation and backward propagation on the server side, guaranteeing verifiability on both sides. The verifiable split learning architecture is compared to a blockchain-enabled system for the same deep learning network, one that records updates but does not generate zero-knowledge proofs. From the comparison, it can be deduced that applying zk-SNARK proofs achieves verifiability and correctness, whereas the blockchain-based system is lightweight but unverifiable.
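The client/server split at a cut layer can be sketched as follows. This is a minimal NumPy illustration with hypothetical layer sizes; the zk-SNARK proof generation that the paper adds on top of this exchange is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network split at the cut layer:
# the client holds W1, the server holds W2.
W1 = rng.standard_normal((16, 8))   # client-side weights
W2 = rng.standard_normal((8, 4))    # server-side weights

def client_forward(x):
    """Client runs its part on raw data and sends only the
    intermediate activation across the cut layer."""
    return np.maximum(x @ W1, 0.0)          # ReLU activation

def server_forward(activation):
    """Server completes the forward pass without ever seeing x."""
    return activation @ W2

x = rng.standard_normal((32, 16))           # a private client batch
smashed = client_forward(x)                 # crosses the network
logits = server_forward(smashed)
```

In the paper's framework, a zk-SNARK proof would accompany both the forward activation and the backward gradients so each party can verify the other's computation; producing such proofs requires a circuit compiler and is outside this sketch.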
Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers
In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data replication, limiting performance in real-world heterogeneous systems. To address these limitations, we formulate an optimization problem minimizing residual error while ensuring unbiased gradient estimation by explicitly considering individual straggler probabilities. We derive closed-form solutions for optimal encoding and decoding coefficients via Lagrangian duality and convex optimization, and propose data allocation strategies that reduce both redundancy and computation load. We also analyze convergence behavior for $\lambda$-strongly convex and $\mu$-smooth loss functions. Numerical results show that our approach significantly reduces the impact of stragglers and accelerates convergence compared to existing methods.
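The unbiasedness requirement under heterogeneous straggler probabilities can be illustrated with a minimal sketch. This is not the paper's closed-form scheme; it only shows the core idea that scaling worker i's partial gradient by 1/(1 - p_i) makes the aggregate unbiased under independent straggling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Full gradient split across 4 workers with heterogeneous
# straggler probabilities p[i] (worker i fails to respond w.p. p[i]).
partials = [rng.standard_normal(5) for _ in range(4)]
p = np.array([0.1, 0.3, 0.2, 0.4])
full_grad = np.sum(partials, axis=0)

def aggregate(responded):
    """Sum responding workers' partial gradients, each scaled by
    1/(1 - p_i) so the estimate is unbiased: E[g_hat] = full_grad."""
    g = np.zeros(5)
    for i in range(4):
        if responded[i]:
            g += partials[i] / (1.0 - p[i])
    return g

# Monte Carlo check of unbiasedness under independent straggling.
est = np.mean(
    [aggregate(rng.random(4) > p) for _ in range(20000)], axis=0
)
```

The scaling removes bias but inflates variance for unreliable workers, which is why the paper jointly optimizes encoding/decoding coefficients and data allocation rather than using this naive scheme.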
Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework
Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data rather than the raw data itself, and the final result is then decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing that integrates the principles of learning theory and develops a framework that seamlessly adapts to machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. To facilitate the search for the optimal encoding and decoding functions, we show that the loss function can be upper-bounded by the sum of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder.
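The idea that "each worker node processes a combination of the data" can be illustrated with a classical linear special case: encode two data blocks with a Vandermonde-style code across three workers, so that any two worker outputs suffice to decode. This is an illustrative fixed linear code, not the learned encoder/decoder the abstract's Sobolev-space framework derives:

```python
import numpy as np

rng = np.random.default_rng(2)

# Task: compute A @ x, with A split into k = 2 blocks A1, A2 and
# spread across n = 3 workers via a Vandermonde-style code.
A1, A2 = rng.standard_normal((2, 4, 4))
x = rng.standard_normal(4)
alphas = [0.0, 1.0, 2.0]                    # distinct evaluation points

# Encoder: worker j stores the combination A1 + alpha_j * A2 and
# returns (A1 + alpha_j * A2) @ x -- never the raw blocks.
outputs = [(A1 + a * A2) @ x for a in alphas]

# Decoder: any 2 of the 3 outputs determine A1 @ x and A2 @ x.
# Suppose worker 1 straggles; use workers 0 and 2.
a0, a2 = alphas[0], alphas[2]
y0, y2 = outputs[0], outputs[2]
A2x = (y2 - y0) / (a2 - a0)
A1x = y0 - a0 * A2x
result = np.concatenate([A1x, A2x])         # = the stacked blocks of A @ x
```

The paper's framework generalizes this picture: instead of a fixed linear code that decodes exactly, the encoder and decoder are functions learned to minimize mean squared error for general (e.g. nonlinear ML) computations.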
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Dai, Yuanjun, He, Keqiang, Wang, An
Abstract--Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequential decision-making problem using Proximal Policy Optimization (PPO). Our approach employs a multi-dimensional state representation encompassing network-level metrics, system-level resource utilization, and training statistical efficiency indicators to enable informed decision-making across diverse computational resources. It eliminates the need for explicit system modeling while integrating seamlessly with existing distributed training frameworks. Through evaluations across diverse workloads, hardware configurations, and network conditions, DYNAMIX achieves up to a 6.3% improvement in final model accuracy and a 46% reduction in total training time. Our scalability experiments demonstrate that DYNAMIX maintains the best performance as cluster size increases to 32 nodes, while policy transfer experiments show that learned policies generalize effectively across related model architectures.

Distributed machine learning (DML) has emerged as the predominant paradigm for training increasingly complex models on expansive datasets. As model architectures grow in parameter count and computational demands, practitioners increasingly rely on distributed training across multiple computational nodes to maintain feasible training timelines. Within this paradigm, batch size selection is a critical hyperparameter that significantly influences both training efficiency and model convergence properties. While larger batch sizes generally improve hardware utilization through increased parallelism, they may adversely affect statistical efficiency, potentially degrading convergence rates and generalization performance [19], [32].
The optimization complexity intensifies substantially in heterogeneous distributed environments, characterized by variance in computational capabilities, network characteristics, and hardware specifications across training nodes. These heterogeneous configurations arise from several practical considerations: cost optimization through spot instance utilization [12], consolidation of diverse hardware generations within organizational clusters [13], and workload deployment in multi-tenant infrastructure [15]. Under such conditions, the conventional approach of uniform batch size allocation frequently leads to suboptimal resource utilization, as demonstrated by Jia et al. [16], who observed significant throughput degradation due to synchronization barriers in heterogeneous clusters. Existing approaches to batch size optimization in distributed environments fall into several distinct categories, each exhibiting particular limitations.
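The sequential decision-making formulation above can be sketched as a single control step. The state components and candidate batch sizes are synthetic placeholders chosen for illustration, and a simple heuristic stands in for the trained PPO policy that DYNAMIX actually learns:

```python
import numpy as np

rng = np.random.default_rng(3)

# Discrete action space: candidate per-node batch sizes.
BATCH_SIZES = [32, 64, 128, 256, 512]

def observe_state():
    """Multi-dimensional state: network-, system-, and training-level
    metrics (synthetic placeholder values here)."""
    return np.array([
        rng.uniform(10, 100),   # network bandwidth estimate
        rng.uniform(0, 1),      # GPU utilization
        rng.uniform(0, 1),      # memory pressure
        rng.uniform(0, 1),      # gradient-noise proxy (statistical efficiency)
    ])

def policy(state):
    """Placeholder for the trained PPO policy: a heuristic that
    favours larger batches when GPU utilization is low."""
    idx = int((1.0 - state[1]) * (len(BATCH_SIZES) - 1))
    return BATCH_SIZES[idx]

# One decision step of the control loop: observe, act, then the
# chosen batch size would be applied for the next training window.
state = observe_state()
batch_size = policy(state)
```

In the full system, the reward signal would combine throughput and statistical-efficiency terms, and PPO would update the policy from trajectories of such steps; none of that machinery is shown here.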