Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling
Niklas Dexheimer and Johannes Schmidt-Hieber
arXiv.org Artificial Intelligence
Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. For the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, slower by a factor of $d$ than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, so the suboptimality of the rate disappears whenever $\ell \gtrsim d$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
Nov-26-2024
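To make the scheme concrete, here is a minimal NumPy sketch of FGD with repeated sampling on the linear model. This is not the paper's exact algorithm: the Gaussian direction distribution, zero initialization, constant step size, and the toy dimensions below are all assumptions chosen for illustration. The key point it demonstrates is that each update uses only a directional derivative (a forward-mode quantity), never a backward pass, and that each sample is reused for $\ell$ updates.

```python
import numpy as np

def fgd_repeated(X, y, ell, lr, seed=0):
    """One pass of forward gradient descent with `ell` steps per sample.

    Sketch under assumed choices: standard Gaussian directions v, zero
    initialization, constant step size `lr`, squared loss on the linear
    model y ~ X @ theta. Each update moves along the forward gradient
        theta <- theta - lr * (grad f(theta) . v) * v,
    which requires only the scalar directional derivative grad f . v.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    for x, yi in zip(X, y):
        for _ in range(ell):  # repeated sampling: ell FGD steps per sample
            v = rng.standard_normal(d)
            # For f(theta) = (yi - x @ theta)^2 / 2 we have
            # grad f = -(yi - x @ theta) * x, hence
            # grad f . v = -(yi - x @ theta) * (x @ v).
            dir_deriv = -(yi - x @ theta) * (x @ v)
            theta -= lr * dir_deriv * v
    return theta

# Toy run (hypothetical parameters): with ell on the order of d, the
# abstract's suboptimality factor d/(ell ∧ d) is a constant.
rng = np.random.default_rng(1)
d, n = 20, 5000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = fgd_repeated(X, y, ell=d, lr=0.5 / d**2)
print(np.linalg.norm(theta_hat - theta_star))
```

Note that `E[(g . v) v] = g` when `v` is standard Gaussian, so each step is an unbiased (if noisier) substitute for a gradient step; the extra variance of order $d$ is what the repeated sampling compensates for.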