Traditionally, deep learning algorithms update the network weights whereas the network architecture is chosen manually, using a process of trial and error. In this work, we propose two novel approaches that automatically update the network structure while also learning its weights. The novelty of our approach lies in our parameterization where the depth, or additional complexity, is encapsulated continuously in the parameter space through control parameters that add additional complexity. We propose two methods: In tunnel networks, this selection is done at the level of a hidden unit, and in budding perceptrons, this is done at the level of a network layer; updating this control parameter introduces either another hidden unit or another hidden layer. We show the effectiveness of our methods on the synthetic two-spirals data and on two real data sets of MNIST and MIRFLICKR, where we see that our proposed methods, with the same set of hyperparameters, can correctly adjust the network complexity to the task complexity.
Training neural networks is traditionally done by providing a sequence of random mini-batches sampled uniformly from the entire training data. In this work, we analyze the effects of curriculum learning, which involves the dynamic non-uniform sampling of mini-batches, on the training of deep networks, and specifically CNNs trained on image recognition. To employ curriculum learning, the training algorithm must resolve 2 problems: (i) sort the training examples by difficulty; (ii) compute a series of mini-batches that exhibit an increasing level of difficulty. We address challenge (i) using two methods: transfer learning from some competitive "teacher" network, and bootstrapping. We show that both methods show similar benefits in terms of increased learning speed and improved final performance on test data. We address challenge (ii) by investigating different pacing functions to guide the sampling. The empirical investigation includes a variety of network architectures, using images from CIFAR-10, CIFAR-100 and subsets of ImageNet. We conclude with a novel theoretical analysis of curriculum learning, where we show how it effectively modifies the optimization landscape. We then define the concept of an ideal curriculum, and show that under mild conditions it does not change the corresponding global minimum of the optimization function.
Neuroscience experiments often require training animals to perform tasks designed to elicit various sensory, cognitive, and motor behaviors. Training typically involves a series of gradual adjustments of stimulus conditions and rewards in order to bring about learning. However, training protocols are usually hand-designed, relying on a combination of intuition, guesswork, and trial-and-error, and often require weeks or months to achieve a desired level of task performance. Here we combine ideas from reinforcement learning and adaptive optimal experimental design to formulate methods for adaptive optimal training of animal behavior. Our work addresses two intriguing problems at once: first, it seeks to infer the learning rules underlying an animal's behavioral changes during training; second, it seeks to exploit these rules to select stimuli that will maximize the rate of learning toward a desired objective.
At the heart of deep learning lies a hard optimization problem. So hard that for several decades after the introduction of neural networks, the difficulty of optimization on deep neural networks was a barrier to their mainstream usage and contributed to their decline in the 1990s and 2000s. Since then, we have overcome this issue. In this post, I explore the "hardness" in optimizing neural networks and see what the theory has to say. In a nutshell: the deeper the network becomes, the harder the optimization problem becomes.
The mushroom body is the key network for the representation of learned olfactory stimuli in Drosophila and insects. The sparse activity of Kenyon cells, the principal neurons in the mushroom body, plays a key role in the learned classification of different odours. In the specific case of the fruit fly, the sparseness of the network is enforced by an inhibitory feedback neuron called APL, and by an intrinsic high firing threshold of the Kenyon cells. In this work we took inspiration from the fruit fly brain to formulate a novel machine learning algorithm that is able to optimize the sparsity level of a reservoir by changing the firing thresholds of the nodes. The sparsity is only applied on the readout layer so as not to change the timescales of the reservoir and to allow the derivation of a one-layer update rule for the firing thresholds. The proposed algorithm is a combination of learning a neuron-specific sparsity threshold via gradient descent and a global sparsity threshold via a Markov chain Monte Carlo method. The proposed model outperforms the standard gradient descent, which is limited to the readout weights of the reservoir, on two example tasks. It demonstrates how the learnt sparse representation can lead to better classification performance, memorization ability and convergence time.