SWALP: Stochastic Weight Averaging in Low-Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Apr-26-2019, arXiv.org Artificial Intelligence

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates under a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers, including the gradient accumulators, quantized down to 8 bits. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and, in strongly convex settings, to a noise ball asymptotically smaller than that of low-precision SGD.
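The abstract describes SWALP only at a high level. What follows is a minimal sketch, not the authors' implementation, of the idea on a toy quadratic objective: SGD in which both gradients and weights are quantized with stochastic rounding to an 8-bit fixed-point format, plus a running average of the quantized iterates. Several details are assumptions chosen for illustration: the fixed-point format (`wl` word bits, `fl` fractional bits), stochastic rounding, the schedule (a larger learning rate before `swa_start`, then a constant `swa_lr`), Gaussian gradient noise, and keeping the average in full precision. Gradient accumulators are not modeled. The helper names `quantize`, `noisy_grad`, `swalp_quadratic`, and `dist` are hypothetical.

```python
import math
import random


def quantize(x, wl=8, fl=4):
    """Hypothetical fixed-point quantizer with stochastic rounding.

    wl is the total word length in bits (including the sign bit) and fl the
    number of fractional bits, so the grid spacing is 2**-fl.
    """
    delta = 2.0 ** -fl
    lower = -(2.0 ** (wl - fl - 1))
    upper = 2.0 ** (wl - fl - 1) - delta
    scaled = x / delta
    base = math.floor(scaled)
    # Round up with probability equal to the fractional part, so E[Q(x)] = x
    # whenever x lies inside the representable range.
    rounded = base + (1 if random.random() < scaled - base else 0)
    return min(max(rounded * delta, lower), upper)


def noisy_grad(w, w_star, a, sigma):
    # Gradient of f(w) = 0.5 * sum_i a_i * (w_i - w*_i)^2, plus Gaussian noise.
    return [ai * (wi - si) + random.gauss(0.0, sigma)
            for wi, si, ai in zip(w, w_star, a)]


def swalp_quadratic(w_star, a, steps=3000, swa_start=1000, lr_init=0.5,
                    swa_lr=0.1, sigma=1.0, wl=8, fl=4, seed=0):
    random.seed(seed)
    d = len(w_star)
    w = [0.0] * d       # low-precision weights, always on the fixed-point grid
    w_avg = [0.0] * d   # running average, kept in full precision (assumption)
    n_avg = 0
    for t in range(steps):
        # Assumed schedule: larger learning rate first, then a constant one.
        lr = lr_init if t < swa_start else swa_lr
        g = [quantize(gi, wl, fl) for gi in noisy_grad(w, w_star, a, sigma)]
        w = [quantize(wi - lr * gi, wl, fl) for wi, gi in zip(w, g)]
        if t >= swa_start:
            # Incremental mean of the quantized iterates collected so far.
            n_avg += 1
            w_avg = [m + (wi - m) / n_avg for m, wi in zip(w_avg, w)]
    return w, w_avg


def dist(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Usage on a hypothetical 10-dimensional quadratic with unit curvature.
w_star = [0.3 * (i - 4.5) for i in range(10)]
a = [1.0] * 10
w_last, w_avg = swalp_quadratic(w_star, a)
print("last low-precision iterate error:", dist(w_last, w_star))
print("averaged iterate error:          ", dist(w_avg, w_star))
```

Because stochastic rounding is unbiased and the average is stored in full precision in this sketch, the averaged weights can fall between grid points and settle much closer to the optimum than the last 8-bit iterate, which keeps bouncing inside its noise ball. This mirrors the abstract's quadratic-objective claim on one toy instance; it does not prove it.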

deep learning, neural network, swalp

Country:
- Oceania > Australia (0.14)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.94)
  - Statistical Learning (0.94)