Step-size Optimization for Continual Learning
Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard Sutton
– arXiv.org Artificial Intelligence
In continual learning, a learner has to keep learning from data over its whole lifetime. A key issue is deciding what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented with a step-size vector that scales how much each gradient sample changes the network weights. Common algorithms, like RMSProp and Adam, adapt this step-size vector using heuristics, specifically normalization. In this paper, we show that these heuristics ignore the effect of their adaptation on the overall objective function, for example by moving the step-size vector away from better step-size vectors. Stochastic meta-gradient descent algorithms, like IDBD (Sutton, 1992), by contrast, explicitly optimize the step-size vector with respect to the overall objective function. On simple problems, we show that IDBD consistently improves the step-size vector, whereas RMSProp and Adam do not. We explain the differences between the two approaches and their respective limitations, and we conclude by suggesting that combining them could be a promising direction for improving the performance of neural networks in continual learning.
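The abstract names IDBD (Sutton, 1992) as a stochastic meta-gradient method but does not reproduce its update rules. The minimal Python sketch below follows the commonly cited form of IDBD for a linear learner rather than anything specific to this paper; the meta step-size theta, the initial step-size of 0.05, and the toy nonstationary task are illustrative assumptions.

```python
import numpy as np

def idbd_update(w, beta, h, x, y, theta=0.01):
    """One IDBD step: each weight w_i has its own step-size alpha_i = exp(beta_i),
    and beta_i is adapted by gradient descent on the same squared prediction error
    that the weights minimize, using the sensitivity trace h_i."""
    delta = y - np.dot(w, x)                                  # prediction error
    beta = beta + theta * delta * x * h                       # meta-gradient step on log step-sizes
    alpha = np.exp(beta)                                      # per-weight step-sizes
    w = w + alpha * delta * x                                 # LMS-style weight update
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x  # decayed sensitivity trace
    return w, beta, h

# Toy usage: a nonstationary linear target where only the first two of five inputs
# are relevant; their signs flip occasionally, so the learner must keep adapting.
rng = np.random.default_rng(0)
n = 5
w_true = np.zeros(n)
w_true[:2] = 1.0
w, beta, h = np.zeros(n), np.full(n, np.log(0.05)), np.zeros(n)
for t in range(20000):
    if t % 20 == 0:
        i = rng.integers(2)
        w_true[i] = -w_true[i]                                # drift in the relevant weights
    x = rng.standard_normal(n)
    y = float(np.dot(w_true, x)) + 0.1 * rng.standard_normal()
    w, beta, h = idbd_update(w, beta, h, x, y)
print("learned per-weight step-sizes:", np.exp(beta).round(4))
```

The contrast with RMSProp and Adam described in the abstract is visible in the beta update: the step-sizes move along the meta-gradient of the prediction error itself, not along a normalization statistic such as a running average of squared gradients.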
Jan-30-2024
- Country:
- Asia > Middle East
- Israel (0.14)
- North America
- Canada > Alberta (0.14)
- United States (0.28)
- Genre:
- Research Report (0.50)
- Industry:
- Education (0.47)