Straggler Mitigation
Straggler Mitigation in Distributed Optimization Through Data Encoding
Karakus, Can, Sun, Yifan, Diggavi, Suhas, Yin, Wotao
Slow-running or straggler tasks can significantly reduce computation speed in distributed systems. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling by embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an alternative approach in which we embed the redundancy directly in the data itself, and allow the computation to proceed completely oblivious to the encoding. We propose several encoding schemes and demonstrate that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample-path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration. Moreover, the approximation can be controlled by the amount of redundancy and the number of nodes used in each iteration. We provide experimental results demonstrating the advantage of the approach over uncoded and data-replication strategies.
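As a rough illustration of the encoding idea, here is a minimal NumPy sketch, not the paper's exact construction: the Gaussian encoding matrix S, the redundancy factor, and the straggler pattern are all illustrative assumptions. The data is encoded once as SX and Sy, and plain gradient descent then runs on the encoded problem using only whichever workers respond in each iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 10, 8        # samples, features, workers (assumed)
beta = 2.0                  # redundancy factor (assumed)

X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# One-time encoding: S is tall, so the encoded data carries redundancy.
S = rng.standard_normal((int(beta * n), n)) / np.sqrt(beta * n)
Xe, ye = S @ X, S @ y

# Partition the encoded rows across the m workers.
blocks = np.array_split(np.arange(Xe.shape[0]), m)

w = np.zeros(d)
step = 1.0 / (np.linalg.norm(Xe, 2) ** 2 / m)  # rough 1/L for the averaged gradient
for _ in range(500):
    # Simulate stragglers: each iteration only m - 2 workers respond,
    # and the update proceeds obliviously on whatever arrives.
    fast = rng.choice(m, size=m - 2, replace=False)
    g = sum(Xe[blocks[i]].T @ (Xe[blocks[i]] @ w - ye[blocks[i]]) for i in fast)
    w -= step * g / len(fast)

print("residual on the original problem:", np.linalg.norm(X @ w - y))
```

Note the division of labor this buys: encoding is a one-time preprocessing cost, while the iteration loop is ordinary gradient descent that never needs to know which workers straggled.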
Reviews: Straggler Mitigation in Distributed Optimization Through Data Encoding
This paper addresses the problem of performing distributed optimization in the presence of straggling/slow computation units. In particular, the paper focuses on the linear regression problem
$$\min_w \|Xw - y\|_2, \qquad (1)$$
where $X = [X_1^T, X_2^T, \ldots, X_m^T]^T$ and $y = [y_1, y_2, \ldots, y_m]^T$ denote the data points and the corresponding labels. In general, the distributed setup with $m$ worker nodes allocates the $i$-th data point $X_i$ and the associated label $y_i$ to the $i$-th worker node. The linear regression problem is then solved in an iterative manner, with messages/information communicated between the (master) server and the worker nodes. In practice, however, some of the workers (aka stragglers) take longer to complete their share of the processing/communication, which slows down the entire distributed optimization process.
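To make the baseline concrete: in the uncoded setup the review describes, worker $i$ computes the partial gradient $X_i^T(X_i w - y_i)$ of (1) on its own shard, and the master sums all $m$ contributions before updating, so a single straggler delays the whole step. A minimal sketch, where shard sizes, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 4, 5                                  # workers, features (assumed)
shards = [rng.standard_normal((50, d)) for _ in range(m)]  # X_i per worker
w_true = rng.standard_normal(d)
labels = [Xi @ w_true for Xi in shards]      # y_i per worker (noiseless)

w = np.zeros(d)
for _ in range(300):
    # Worker i computes X_i^T (X_i w - y_i); the master sums all m partial
    # gradients, so the step completes only after the slowest worker replies.
    partials = [Xi.T @ (Xi @ w - yi) for Xi, yi in zip(shards, labels)]
    w -= 0.002 * sum(partials)

print("recovered w:", np.allclose(w, w_true, atol=1e-6))
```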
Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning
Maßny, Luis, Hofmeister, Christoph, Egger, Maximilian, Bitar, Rawad, Wachter-Zeh, Antonia
We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. To mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the workers such that the overall result can be recovered from the non-straggling workers alone. Gradient codes are designed to tolerate a fixed number of stragglers. Since the number of stragglers in practice is random and unknown a priori, tolerating a fixed number of stragglers can yield a sub-optimal computation load and higher latency. We propose a gradient coding scheme that tolerates a flexible number of stragglers by carefully concatenating gradient codes for different straggler tolerances. Through proper task scheduling and a small amount of additional signaling, our scheme adapts the computation load of the workers to the actual number of stragglers. We analyze the latency of the proposed scheme and show that it is significantly lower than that of fixed-tolerance gradient codes.
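For reference, the fixed-tolerance gradient codes this abstract builds on can be illustrated with the classic 3-worker, 1-straggler construction of Tandon et al. (this sketch shows plain gradient coding, not the paper's nested concatenation; the encoding matrix below is the standard textbook example): each worker computes two of three partial gradients and transmits one linear combination, and the master recovers the full gradient from any two workers.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
# Three partial gradients g1, g2, g3 (stand-ins for per-shard gradients).
G = rng.standard_normal((3, d))
full = G.sum(axis=0)

# Encoding matrix from the 3-worker, 1-straggler example of Tandon et al.:
# worker i computes the shards with nonzero coefficients in row i.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])
sends = B @ G                     # what each worker transmits

for straggler in range(3):
    alive = [i for i in range(3) if i != straggler]
    # Find a combination a with a @ B[alive] = (1, 1, 1), i.e. the full sum.
    a, *_ = np.linalg.lstsq(B[alive].T, np.ones(3), rcond=None)
    recovered = a @ sends[alive]
    assert np.allclose(recovered, full)

print("full gradient recovered from any 2 of 3 workers")
```

The nested scheme in this paper concatenates codes of this kind built for different straggler counts, so the tolerated number of stragglers no longer has to be fixed in advance.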