Reviews: Learning to learn by gradient descent by gradient descent
–Neural Information Processing Systems
Comments (mostly on related work): Authors: "The idea of using learning to learn or meta-learning to acquire knowledge or inductive biases has a long history [Thrun and Pratt, 1998]." But the intro to this reference is muddling the waters by confusing meta-learning (which is about learning the learning algorithm itself) and transfer learning, subsuming basically everything under "meta-learning," even standard back-propagation, because it can be applied to some data set, and then may learn new data points more quickly (so this is just standard transfer learning). To my knowledge, the first work on learning general learning algorithms written in a universal programming language was published in 1987: J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Authors: "This work was built on by [Younger et al., 2001, Hochreiter et al., 2001] wherein a higher-level network act as a gradient descent procedure, with both levels trained during learning. Alternatively Schmidhuber [1992, 1993] considers networks that are able to modify their own behavior and act as an alternative to recurrent networks in meta-learning. Note, however that these earlier works do not directly address the transfer of a learned training procedure to novel problem instances and instead focus on adaptivity in the online setting."
Neural Information Processing Systems
Jan-20-2025, 23:04:46 GMT
- Country:
- North America > United States
- California > San Francisco County > San Francisco (0.15)
- Europe > Germany
- North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
- North America > United States
- Technology: