- Asia > Middle East > Jordan (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking.
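As a rough sketch of the lock-free scheme this abstract describes, the example below lets several threads apply SGD updates to one shared weight vector with no lock around the read-modify-write. The least-squares objective, data sizes, and step size are placeholder assumptions for illustration, and Python's GIL means this only demonstrates the update pattern, not the speedup.

```python
import threading
import numpy as np

# Toy least-squares problem; data, dimensions, and step size are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(20)   # shared model, written by every thread without any lock
step = 0.01

def worker(n_steps: int) -> None:
    local_rng = np.random.default_rng(threading.get_ident())
    for _ in range(n_steps):
        i = local_rng.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w[:] = w - step * grad            # racy read-modify-write, no lock taken

threads = [threading.Thread(target=worker, args=(5000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("residual norm after lock-free updates:", np.linalg.norm(X @ w - y))
```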
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California (0.04)
Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Conflict-free Asynchronous Machine Learning
In all of these studies, classic algorithms are parallelized by simply running parallel and asynchronous model updates without locks. These lock-free, asynchronous algorithms exhibit speedups even when applied to large, non-convex problems, as demonstrated by deep learning systems such as Google's Downpour SGD [6] and Microsoft's Project Adam [4]. While these techniques have been remarkably successful, many of the above papers require delicate and tailored analyses to quantify the benefits of asynchrony for each particular learning task. Moreover, in non-convex settings, we currently have little quantitative insight into how much speedup is gained from asynchrony.
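The sparsity intuition behind these lock-free schemes is that each example's update writes only a handful of coordinates, so concurrent writers rarely touch the same memory. The serial toy below (made-up sparse data and step size, not code from any of the cited systems) shows what such a sparse update looks like and how small the overlap between two random examples tends to be.

```python
import numpy as np

# Sparse logistic-regression-style SGD: each example has only a few nonzero
# features, so one step writes only those coordinates. All sizes are assumptions.
rng = np.random.default_rng(1)
d, n, nnz = 10_000, 5_000, 5              # dimension, examples, nonzeros per example
idx = rng.integers(d, size=(n, nnz))      # feature indices per example
val = rng.normal(size=(n, nnz))           # feature values per example
y = rng.choice([-1.0, 1.0], size=n)

w = np.zeros(d)
step = 0.1

def sparse_sgd_step(i: int) -> None:
    """One logistic-loss SGD step that touches only example i's nnz coordinates."""
    cols, xs = idx[i], val[i]
    margin = y[i] * (xs @ w[cols])
    g = -y[i] / (1.0 + np.exp(margin))    # derivative of log(1 + exp(-margin))
    w[cols] -= step * g * xs              # writes nnz entries out of d

for i in rng.permutation(n):
    sparse_sgd_step(i)

# Why unsynchronized writers rarely collide: two random examples share a
# coordinate only with small probability when nnz << d.
a, b = idx[rng.integers(n)], idx[rng.integers(n)]
print("coordinates shared by two random examples:", len(np.intersect1d(a, b)))
```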
- Asia > Middle East > Jordan (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
De Sa, Christopher M., Zhang, Ce, Olukotun, Kunle, Ré, Christopher
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifically, we use our new analysis in three ways: (1) we derive convergence rates for the convex case (Hogwild!) with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic. We show experimentally that our algorithms run efficiently for a variety of problems on modern hardware.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
De Sa, Christopher, Zhang, Ce, Olukotun, Kunle, Ré, Christopher
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifically, we use our new analysis in three ways: (1) we derive convergence rates for the convex case (Hogwild!) with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic. We show experimentally that our algorithms run efficiently for a variety of problems on modern hardware.
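To make the reduced-precision idea concrete, here is a hedged, single-threaded sketch in the spirit of a Buckwild!-style update (not the paper's implementation): the model is stored as 8-bit integers with a fixed-point scale, and updates are applied with stochastic rounding so the quantization error is zero-mean. The scale, bit width, problem, and step size are assumptions chosen for this example.

```python
import numpy as np

# Toy low-precision SGD: int8 model with a fixed-point scale, updates applied
# with stochastic (unbiased) rounding. Scale, bit width, data, and step size
# are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(2)
d, n = 50, 2_000
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.01 * rng.normal(size=n)

SCALE = 32.0                               # one int8 unit represents 1/SCALE
w_q = np.zeros(d, dtype=np.int8)           # quantized model
step = 0.02

def stochastic_round(x: np.ndarray) -> np.ndarray:
    """Round each entry up or down at random so that E[round(x)] = x."""
    low = np.floor(x)
    return (low + (rng.random(x.shape) < (x - low))).astype(np.int64)

for _ in range(20_000):
    i = rng.integers(n)
    w = w_q / SCALE                         # dequantize to compute the gradient
    grad = (X[i] @ w - y[i]) * X[i]
    delta = stochastic_round(-step * grad * SCALE)
    w_q = np.clip(w_q.astype(np.int64) + delta, -128, 127).astype(np.int8)

print("distance to true weights:", np.linalg.norm(w_q / SCALE - true_w))
```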
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)