Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
Neural Information Processing Systems
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques.
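To make the reduced-precision setting in the abstract concrete, here is a minimal, illustrative Python sketch (not the paper's algorithm or analysis): plain SGD on a least-squares problem whose iterates are stochastically rounded to a coarse grid, producing the kind of unbiased quantization noise a martingale-based analysis is designed to cover. All names and parameters (`stochastic_round`, `low_precision_sgd`, the step size, the grid scale) are assumptions chosen for illustration.

```python
import numpy as np

def stochastic_round(x, scale, rng):
    """Unbiased rounding: snap each entry of x to the multiple of `scale`
    just below or just above it, with probability proportional to the
    distance, so that E[rounded x] = x (illustrative quantization noise)."""
    scaled = x / scale
    low = np.floor(scaled)
    frac = scaled - low
    return (low + (rng.random(x.shape) < frac)) * scale

def low_precision_sgd(A, b, step=0.01, scale=0.01, epochs=50, seed=0):
    """SGD on the least-squares loss 0.5 * ||A w - b||^2, sampling one row
    per step; every iterate is stochastically rounded to a coarse grid to
    mimic reduced-precision arithmetic (a sketch, not the paper's method)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (A[i] @ w - b[i]) * A[i]  # gradient of one sample
            w = stochastic_round(w - step * grad, scale, rng)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    b = A @ w_true
    w_hat = low_precision_sgd(A, b)
    print("recovery error:", np.linalg.norm(w_hat - w_true))
```

Because the rounding is unbiased, the quantized update behaves like the exact update plus zero-mean noise, which is exactly the shape of perturbation a martingale argument can absorb.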