Towards Exact Gradient-based Training on Analog In-memory Computing

Wu, Zhaoxian, Gokmen, Tayfun, Rasch, Malte J., Chen, Tianyi

Jun-18-2024–arXiv.org Artificial Intelligence

Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound on the asymptotic error to show that this is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD, and we rigorously show its ability to converge exactly to a critical point and hence eliminate the asymptotic error. The simulations verify the correctness of the analyses.
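
The inexact convergence described in the abstract can be illustrated with a toy simulation. The sketch below is not taken from the abstract: it assumes a scalar quadratic objective and a hypothetical asymmetric update in which each step also pulls the weight toward zero by an amount proportional to the gradient magnitude, controlled by an assumed device parameter `tau`. The function name `run_sgd` and all parameter values are illustrative choices.

```python
import random


def run_sgd(w_star=1.0, tau=None, lr=0.01, noise_std=1.0,
            steps=40000, seed=0):
    """Noisy SGD on f(w) = 0.5 * (w - w_star)**2.

    tau=None  -> ideal (digital) update: w <- w - lr * grad
    tau=float -> ASSUMED asymmetric (analog-style) update:
                 w <- w - lr * grad - (lr / tau) * |grad| * w
    Returns the average iterate over the second half of the run.
    """
    rng = random.Random(seed)
    w = 0.0
    tail = []
    for k in range(steps):
        # Stochastic gradient: true gradient plus zero-mean Gaussian noise.
        grad = (w - w_star) + rng.gauss(0.0, noise_std)
        step = lr * grad
        if tau is not None:
            # Assumed asymmetry term: drift toward 0, scaled by |grad|.
            step += (lr / tau) * abs(grad) * w
        w -= step
        if k >= steps // 2:
            tail.append(w)
    return sum(tail) / len(tail)


digital = run_sgd(tau=None)                       # ideal update
analog = run_sgd(tau=2.0)                         # asymmetric update
analog_noiseless = run_sgd(tau=2.0, noise_std=0.0)

print(f"digital SGD average iterate:  {digital:.3f}")
print(f"analog SGD average iterate:   {analog:.3f}")
print(f"analog SGD, no gradient noise: {analog_noiseless:.3f}")
```

In this toy model the ideal update averages out close to the minimizer `w_star`, while the asymmetric update settles at a visibly biased point. The bias comes from the gradient noise: at the minimizer the gradient has zero mean but nonzero magnitude, so the assumed asymmetry term keeps pushing the weight toward zero. With `noise_std=0.0` the same update reaches `w_star`.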

analog device, analog sgd, tiki-taka

Country:
- Europe > France (0.04)
- North America > United States
  - New York > Rensselaer County > Troy (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Semiconductors & Electronics (0.87)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.88)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.86)
