ARCADe: A Rapid Continual Anomaly Detector Machine Learning

Although continual learning and anomaly detection have separately been well-studied in previous works, their intersection remains rather unexplored. The present work addresses a learning scenario where a model has to incrementally learn a sequence of anomaly detection tasks, i.e. tasks from which only examples from the normal (majority) class are available for training. We define this novel learning problem of continual anomaly detection (CAD) and formulate it as a meta-learning problem. Moreover, we propose A Rapid Continual Anomaly Detector (ARCADe), an approach to train neural networks to be robust against the major challenges of this new learning problem, namely catastrophic forgetting and overfitting to the majority class. The results of our experiments on three datasets show that, in the CAD problem setting, ARCADe substantially outperforms baselines from the continual learning and anomaly detection literature. Finally, we provide deeper insights into the learning strategy yielded by the proposed meta-learning algorithm.
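
The following is a minimal sketch, not the authors' implementation, of how a continual anomaly detection episode might be meta-trained in the spirit described above: an inner loop that sequentially adapts to a series of one-class tasks using normal examples only, and an outer loss that evaluates the adapted model on held-out normal and anomalous data from every task in the sequence. The task objects, their attributes, and the first-order gradient trick are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def meta_train_step(model, task_sequence, inner_lr=0.01, inner_steps=5):
    """One meta-training step over a sequence of one-class tasks (sketch).

    Each task is assumed to expose normal-only training data and a held-out
    evaluation set containing both normal and anomalous examples.  A
    first-order approximation is used: gradients computed on the adapted
    copy are applied directly to the meta-parameters.
    """
    adapted = copy.deepcopy(model)              # fast weights, adapted sequentially
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: learn each task in order from normal examples only.
    for task in task_sequence:
        for _ in range(inner_steps):
            scores = adapted(task.train_normal)   # assumed anomaly scores in (0, 1)
            loss = F.binary_cross_entropy(scores, torch.zeros_like(scores))
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()

    # Outer loss: after the whole sequence, evaluate on all tasks
    # (normal and anomalous held-out data) to penalise both catastrophic
    # forgetting and overfitting to the normal class.
    meta_loss = 0.0
    for task in task_sequence:
        scores = adapted(task.eval_inputs)
        meta_loss = meta_loss + F.binary_cross_entropy(scores, task.eval_labels)

    inner_opt.zero_grad()
    meta_loss.backward()
    # First-order trick: copy gradients from the adapted copy to the
    # meta-model; an outer optimiser would then step on model.parameters().
    for p_meta, p_fast in zip(model.parameters(), adapted.parameters()):
        p_meta.grad = p_fast.grad.clone()
    return meta_loss.item()
```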

Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting

Neural Information Processing Systems

We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.
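
To make the quadratic penalty concrete, here is a small sketch (assumptions, not the paper's code) of how a Kronecker-factored Laplace penalty on weight changes could be evaluated, given per-layer curvature factors accumulated after the previous task. The dictionaries `prev_means` and `kron_factors` are hypothetical bookkeeping maintained elsewhere.

```python
import torch

def kron_laplace_penalty(model, prev_means, kron_factors, lam=1.0):
    """Quadratic penalty from a Kronecker-factored Laplace approximation (sketch).

    For a weight matrix W of shape (out, in), the curvature is approximated
    as A (x) G, where A (in x in) is the covariance of layer inputs and
    G (out x out) the covariance of pre-activation gradients.  The penalty
        (lam / 2) * vec(W - W*)^T (A (x) G) vec(W - W*)
    can be evaluated without forming the Kronecker product as
        (lam / 2) * trace((W - W*)^T G (W - W*) A).
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name not in kron_factors:
            continue                      # e.g. biases or layers without factors
        A, G = kron_factors[name]
        dW = param - prev_means[name]     # deviation from the previous-task mode
        penalty = penalty + 0.5 * lam * torch.trace(dW.t() @ G @ dW @ A)
    return penalty

# Usage: total training loss on the new task would be
#   loss = task_loss + kron_laplace_penalty(model, prev_means, kron_factors)
```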

A Comprehensive Overview and Survey of Recent Advances in Meta-Learning Machine Learning

This article reviews meta-learning, also known as learning-to-learn, which seeks rapid and accurate model adaptation to unseen tasks, with applications in highly automated AI, few-shot learning, natural language processing, and robotics. Unlike deep learning, meta-learning can be applied to few-shot, high-dimensional datasets and aims to further improve model generalization to unseen tasks. Deep learning focuses on in-sample prediction, whereas meta-learning concerns model adaptation for out-of-sample prediction. Meta-learning can perform continual self-improvement toward highly autonomous AI, and may serve as an additional generalization block complementary to the original deep learning model. Meta-learning seeks adaptation of machine learning models to unseen tasks that differ substantially from the training tasks. Meta-learning with coevolution between agent and environment provides solutions for complex tasks that are unsolvable by training from scratch. Meta-learning methodology draws on a wide range of ideas. We briefly introduce meta-learning methodologies in the following categories: black-box meta-learning, metric-based meta-learning, layered meta-learning, and Bayesian meta-learning frameworks. Recent applications concentrate on integrating meta-learning with other machine learning frameworks to provide feasible, integrated problem solutions. We briefly present recent meta-learning advances and discuss potential future research directions.
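
As one concrete illustration of the metric-based category named above, the sketch below shows a prototypical-network-style episode: class prototypes are mean support embeddings, and queries are classified by distance to prototypes. This is not from the survey itself; the embedding network and shapes are placeholders.

```python
import torch
import torch.nn.functional as F

def prototypical_episode_loss(embed, support_x, support_y, query_x, query_y, n_classes):
    """Metric-based meta-learning episode (prototypical-network style sketch).

    `embed` maps inputs to an embedding space; each class prototype is the
    mean support embedding for that class, and queries are classified by
    negative squared Euclidean distance to the prototypes.
    """
    z_support = embed(support_x)                      # (n_support, d)
    z_query = embed(query_x)                          # (n_query, d)
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_classes)]
    )                                                 # (n_classes, d)
    logits = -torch.cdist(z_query, prototypes) ** 2   # (n_query, n_classes)
    return F.cross_entropy(logits, query_y)
```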

Uncertainty in Model-Agnostic Meta-Learning using Variational Inference Machine Learning

We introduce a new, rigorously formulated Bayesian meta-learning algorithm that learns a prior probability distribution over model parameters for few-shot learning. The proposed algorithm employs gradient-based variational inference to infer the posterior of model parameters for a new task. Our algorithm can be applied to any model architecture and can be implemented in various machine learning paradigms, including regression and classification. We show that models trained with our proposed meta-learning algorithm are well calibrated and accurate, with state-of-the-art calibration and classification results on two few-shot classification benchmarks (Omniglot and Mini-ImageNet), and competitive results in a multi-modal task-distribution regression.

Machine learning, in particular deep learning, has thrived during the last decade, producing results that were previously considered infeasible in several areas. For instance, outstanding results have been achieved in speech and image understanding [1-4] and medical image analysis [5]. However, these machine learning methods typically require a large number of training samples to achieve notable performance. This requirement contrasts with the human ability to quickly adapt to new learning tasks using few "training" samples. The difference may be due to the fact that humans tend to exploit prior knowledge to facilitate the learning of new tasks, while machine learning algorithms often use no prior knowledge (e.g., training from scratch with random initialisation) [6] or rely on weak prior knowledge (e.g., training from pre-trained models) [7]. This challenge has motivated the design of machine learning methods that can make more effective use of prior knowledge to adapt to new learning tasks using few training samples [8].
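
The sketch below illustrates, under stated assumptions, what gradient-based variational adaptation to a new task can look like: a diagonal Gaussian posterior over weights is fitted on the support set by maximising a Monte-Carlo ELBO against a meta-learned Gaussian prior. The functional model `forward_fn`, the flat parameter vectors, and the hyperparameters are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def adapt_variational_posterior(prior_mu, prior_logvar, support_x, support_y,
                                forward_fn, steps=10, lr=0.01, n_samples=5):
    """Gradient-based variational adaptation to a new task (sketch).

    Starting from a meta-learned Gaussian prior over weights, a per-task
    Gaussian posterior is fitted on the support set by maximising a
    Monte-Carlo ELBO: expected log-likelihood under reparameterised weight
    samples minus the KL divergence to the prior.  `forward_fn(weights, x)`
    is an assumed functional model returning class logits.
    """
    mu = prior_mu.detach().clone().requires_grad_(True)
    logvar = prior_logvar.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)

    for _ in range(steps):
        nll = 0.0
        for _ in range(n_samples):
            eps = torch.randn_like(mu)
            weights = mu + eps * torch.exp(0.5 * logvar)   # reparameterisation trick
            logits = forward_fn(weights, support_x)
            nll = nll + F.cross_entropy(logits, support_y) / n_samples
        # KL(q || prior) for diagonal Gaussians
        kl = 0.5 * torch.sum(
            prior_logvar - logvar
            + (logvar.exp() + (mu - prior_mu) ** 2) / prior_logvar.exp()
            - 1.0
        )
        loss = nll + kl
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), logvar.detach()
```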

Meta-Learning with Warped Gradient Descent Machine Learning

A versatile and effective approach to meta-learning is to infer a gradient-based update rule directly from data that promotes rapid learning of new tasks from the same distribution. Current methods rely on backpropagating through the learning process, limiting their scope to few-shot learning. In this work, we introduce Warped Gradient Descent (WarpGrad), a family of modular optimisers that can scale to arbitrary adaptation processes. WarpGrad methods meta-learn to warp task loss surfaces across the joint task-parameter distribution to facilitate gradient descent, which is achieved by a reparametrisation of neural networks that interleaves warp layers in the architecture. These layers are shared across task learners and fixed during adaptation; they represent a projection of task parameters into a meta-learned space that is conducive to task adaptation, and standard backpropagation through them induces a form of gradient preconditioning. WarpGrad methods are computationally efficient and easy to implement, as they rely on parameter sharing and backpropagation. They are readily combined with other meta-learners and can scale both in model size and in the length of adaptation trajectories, as meta-learning the warp parameters does not require differentiation through the task adaptation process. We show empirically that WarpGrad optimisers meta-learn a warped space where gradient descent is well behaved, with faster convergence and better performance in a variety of settings, including few-shot, standard supervised, continual, and reinforcement learning.
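
A minimal sketch of the architectural idea, assuming a simple MLP and illustrative layer sizes (this is not the authors' code): warp layers are interleaved between task layers, only the task layers are updated during adaptation, and the fixed warp layers transform the gradients flowing back into them.

```python
import torch
import torch.nn as nn

class WarpedMLP(nn.Module):
    """Task layers interleaved with meta-learned warp layers (sketch).

    During task adaptation only `task_layers` receive gradient updates;
    the warp layers stay fixed and transform (precondition) the gradients
    that flow back into the task layers.  Warp parameters would be updated
    only in the outer, meta-learning loop.
    """
    def __init__(self, dim_in=784, dim_hidden=128, dim_out=10):
        super().__init__()
        self.task_layers = nn.ModuleList([
            nn.Linear(dim_in, dim_hidden),
            nn.Linear(dim_hidden, dim_hidden),
            nn.Linear(dim_hidden, dim_out),
        ])
        # One warp layer interleaved after each hidden task layer.
        self.warp_layers = nn.ModuleList([
            nn.Linear(dim_hidden, dim_hidden),
            nn.Linear(dim_hidden, dim_hidden),
        ])

    def forward(self, x):
        h = torch.relu(self.warp_layers[0](torch.relu(self.task_layers[0](x))))
        h = torch.relu(self.warp_layers[1](torch.relu(self.task_layers[1](h))))
        return self.task_layers[2](h)

# Inner loop: adapt only the task parameters; warp parameters stay frozen.
model = WarpedMLP()
inner_opt = torch.optim.SGD(model.task_layers.parameters(), lr=0.1)
# Outer loop (elsewhere): update model.warp_layers.parameters() with a
# meta-objective accumulated across task adaptation trajectories.
```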