Using variational Bayes neural networks, we develop an algorithm capable of accumulating knowledge into a prior from multiple different tasks. The result is a rich and meaningful prior capable of few-shot learning on new tasks. The posterior can go beyond the mean field approximation and yields good uncertainty estimates on the performed experiments. Analysis on toy tasks shows that it can learn from significantly different tasks while finding similarities among them. Experiments on Mini-Imagenet yield a new state of the art with 74.5% accuracy on 5-shot learning. Finally, we provide experiments showing that other existing methods can fail to perform well on different benchmarks.
Considering music as a sequence of events with multiple complex dependencies, the Long Short-Term Memory (LSTM) architecture has proven very efficient in learning and reproducing musical styles. However, the generation of rhythms requires additional information regarding musical structure and accompanying instruments. In this paper we present DeepDrum, an adaptive Neural Network capable of generating drum rhythms under constraints imposed by Feed-Forward (Conditional) Layers which contain musical parameters along with given instrumentation information (e.g. bass and guitar notes). Results on generated drum sequences are presented indicating that DeepDrum is effective in producing rhythms that resemble the learned style, while at the same time conforming to given constraints that were unknown during the training process.
A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.
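To make the GP side of this comparison concrete, the following is a minimal sketch of exact GP regression on scalar inputs, written in plain Python. The squared-exponential kernel, the zero prior mean, the `noise` jitter and all function names are assumptions chosen for illustration; none of them come from the abstract. The Cholesky factorisation of the training covariance is the cubic-cost step behind the computational burden mentioned above.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((b[i] - s) / low[i][i])
    return y


def solve_lower_transposed(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    n = len(x_train)
    # Training covariance with a small diagonal term for numerical stability.
    k = [
        [rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(k)  # O(n^3): the expensive part of exact GP inference
    alpha = solve_lower_transposed(low, solve_lower(low, y_train))
    k_star = [rbf_kernel(x, x_test) for x in x_train]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf_kernel(x_test, x_test) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


if __name__ == "__main__":
    xs = [-1.0, 0.0, 1.0]
    ys = [math.sin(x) for x in xs]
    print(gp_posterior(xs, ys, 0.5))   # near the data: low variance
    print(gp_posterior(xs, ys, 10.0))  # far from the data: variance reverts to the prior
```

Near the observed points the predictive variance shrinks towards the noise level, while far from them it returns to the prior variance. This is the uncertainty behaviour that NPs are described as sharing with GPs.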
In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. Firstly, we show that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, also reducing the number of parameters by 80\%. Secondly, we demonstrate how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs. We incorporate local gradient information into the approximate posterior to sharpen it around the current batch statistics. We show how this technique is not exclusive to recurrent neural networks and can be applied more widely to train Bayesian neural networks. We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improves our model over a variety of other schemes for training them. We also introduce a new benchmark for studying uncertainty for language models so future methods can be easily compared.
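As a rough illustration of what a variational Bayes scheme over network weights involves, the sketch below shows two common ingredients: a reparameterised weight sample and the closed-form KL term between a diagonal Gaussian posterior and a Gaussian prior. The diagonal Gaussian family, the softplus parameterisation of the standard deviation and the names `sample_weights` and `kl_diag_gaussians` are assumptions made for illustration; the abstract does not specify them, and the posterior sharpening step it describes is not shown here.

```python
import math
import random


def softplus(rho):
    """Map an unconstrained parameter to a positive standard deviation."""
    return math.log1p(math.exp(rho))


def sample_weights(mu, rho, rng):
    """Reparameterised sample: w = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def kl_diag_gaussians(mu, rho, prior_mu=0.0, prior_sigma=1.0):
    """KL(q || p) for q = N(mu, softplus(rho)^2) and p = N(prior_mu, prior_sigma^2).

    Summed over independent weights. This is the complexity cost that is
    added to the expected negative log-likelihood in the variational objective.
    """
    total = 0.0
    for m, r in zip(mu, rho):
        sigma = softplus(r)
        total += (
            math.log(prior_sigma / sigma)
            + (sigma ** 2 + (m - prior_mu) ** 2) / (2.0 * prior_sigma ** 2)
            - 0.5
        )
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.1, -0.3, 0.7]
    rho = [-3.0, -3.0, -3.0]  # small initial standard deviations
    print(sample_weights(mu, rho, rng))
    print(kl_diag_gaussians(mu, rho))
```

In a training loop under these assumptions, one weight sample would be drawn per minibatch (or per truncated sequence for an RNN), and the KL term would be scaled so that it is counted once per pass over the data.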
Theoretical understanding of deep learning is one of the most important tasks facing the statistics and machine learning communities. While deep neural networks (DNNs) originated as engineering methods and models of biological networks in neuroscience and psychology, they have quickly become a centerpiece of the machine learning toolbox. Unfortunately, DNN adoption, powered by recent successes combined with the open-source nature of the machine learning community, has outpaced our theoretical understanding. We cannot reliably identify when and why DNNs will make mistakes. In some applications like text translation these mistakes may be comical and provide fun fodder for research talks, but a single error can be very costly in tasks like medical imaging. As we utilize DNNs in increasingly sensitive applications, a better understanding of their properties is thus imperative. Recent advances in DNN theory are numerous and draw on many different sources of intuition, such as learning theory, sparse signal analysis, physics, chemistry, and psychology. An interesting pattern begins to emerge in the breadth of possible interpretations. The seemingly limitless approaches are mostly constrained by the lens through which the mathematical operations are viewed. Ultimately, the interpretation of DNNs appears to mimic a type of Rorschach test, a psychological test wherein subjects interpret a series of seemingly ambiguous ink-blots. Validation for DNN theory requires a convergence of the literature. We must distinguish between universal results that are invariant to the analysis perspective and those that are specific to a particular network configuration. Simultaneously, we must deal with the fact that many standard statistical tools for quantifying generalization or empirically assessing important network features are difficult to apply to DNNs.