Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling
Niklas Dexheimer and Johannes Schmidt-Hieber
arXiv.org Artificial Intelligence
Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. For the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, slower by a factor of $d$ than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, so the suboptimality of the rate disappears whenever $\ell \gtrsim d$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
Nov-26-2024
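To make the scheme concrete, here is a minimal NumPy sketch of FGD with repeated sampling on the linear model. This is not the paper's exact algorithm: the Gaussian direction distribution, zero initialization, constant step size, and the toy dimensions below are all assumptions chosen for illustration. The key point it demonstrates is that each update uses only a directional derivative (a forward-mode quantity), never a backward pass, and that each sample is reused for $\ell$ updates.

```python
import numpy as np

def fgd_repeated(X, y, ell, lr, seed=0):
    """One pass of forward gradient descent with `ell` steps per sample.

    Sketch under assumed choices: standard Gaussian directions v, zero
    initialization, constant step size `lr`, squared loss on the linear
    model y ~ X @ theta. Each update moves along the forward gradient
        theta <- theta - lr * (grad f(theta) . v) * v,
    which requires only the scalar directional derivative grad f . v.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    for x, yi in zip(X, y):
        for _ in range(ell):  # repeated sampling: ell FGD steps per sample
            v = rng.standard_normal(d)
            # For f(theta) = (yi - x @ theta)^2 / 2 we have
            # grad f = -(yi - x @ theta) * x, hence
            # grad f . v = -(yi - x @ theta) * (x @ v).
            dir_deriv = -(yi - x @ theta) * (x @ v)
            theta -= lr * dir_deriv * v
    return theta

# Toy run (hypothetical parameters): with ell on the order of d, the
# abstract's suboptimality factor d/(ell ∧ d) is a constant.
rng = np.random.default_rng(1)
d, n = 20, 5000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = fgd_repeated(X, y, ell=d, lr=0.5 / d**2)
print(np.linalg.norm(theta_hat - theta_star))
```

Note that `E[(g . v) v] = g` when `v` is standard Gaussian, so each step is an unbiased (if noisier) substitute for a gradient step; the extra variance of order $d$ is what the repeated sampling compensates for.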