Deep Learning


Top-Down Regularization of Deep Belief Networks

Neural Information Processing Systems

Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: fully unsupervised learning followed by strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme with a method that regularizes deep belief networks using top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals, followed by a global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass. Experiments on the MNIST dataset show improvements over existing algorithms for deep belief networks, and object recognition experiments on the Caltech-101 dataset yield competitive results.
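
As a rough illustration of the kind of building block described above, the sketch below runs one contrastive-divergence (CD-1) update for a binary RBM in which the hidden signal is a simple convex mix of the bottom-up activation and a supplied top-down signal. The mixing weight alpha and the h_topdown argument are assumptions made for illustration only; the paper's actual combination rule and its three-phase procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_with_topdown(v, W, b, c, h_topdown=None, alpha=0.5, lr=0.01):
    """One CD-1 step for a binary RBM.

    Illustration only: `h_topdown` and the mixing weight `alpha` stand in for
    the idea of combining bottom-up and top-down sampled signals; the exact
    combination rule used in the paper is not reproduced.
    """
    h_bottomup = sigmoid(v @ W + c)              # bottom-up hidden probabilities
    h_prob = h_bottomup if h_topdown is None else (1 - alpha) * h_bottomup + alpha * h_topdown
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    v_recon = sigmoid(h_sample @ W.T + b)        # top-down reconstruction
    h_recon = sigmoid(v_recon @ W + c)

    # Contrastive-divergence gradient estimates
    W += lr * (v.T @ h_prob - v_recon.T @ h_recon) / v.shape[0]
    b += lr * (v - v_recon).mean(axis=0)
    c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c

# Toy usage: 20 binary visible units, 10 hidden units, a flat top-down signal.
v = (rng.random((32, 20)) < 0.5).astype(float)
W, b, c = 0.01 * rng.standard_normal((20, 10)), np.zeros(20), np.zeros(10)
W, b, c = cd1_with_topdown(v, W, b, c, h_topdown=np.full((32, 10), 0.5))
```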


Deep Neural Networks for Object Detection

Neural Information Processing Systems

Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is, not only classifying but also precisely localizing objects of various classes. We present a simple yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost using only a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC.
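
A minimal sketch of the representation the abstract describes, treating a bounding box as a coarse binary mask that a network would regress to. The grid resolution (24) and the simple threshold decoder are assumptions for illustration; the DNN itself and the multi-scale inference procedure are omitted.

```python
import numpy as np

def box_to_mask(box, image_size, grid=24):
    """Rasterize a bounding box (x1, y1, x2, y2) onto a coarse grid.
    The resulting grid is the regression target for the network."""
    H, W = image_size
    mask = np.zeros((grid, grid), dtype=np.float32)
    x1, y1, x2, y2 = box
    c1, r1 = int(x1 / W * grid), int(y1 / H * grid)
    c2, r2 = int(np.ceil(x2 / W * grid)), int(np.ceil(y2 / H * grid))
    mask[r1:r2, c1:c2] = 1.0
    return mask

def mask_to_box(mask, image_size, threshold=0.5):
    """Decode a predicted mask back into a bounding box by thresholding."""
    H, W = image_size
    grid = mask.shape[0]
    rows, cols = np.where(mask >= threshold)
    if len(rows) == 0:
        return None
    return (cols.min() / grid * W, rows.min() / grid * H,
            (cols.max() + 1) / grid * W, (rows.max() + 1) / grid * H)

# Round trip on a toy example: encode a box as a mask, then decode it back.
target = box_to_mask((120, 60, 300, 220), image_size=(480, 640))
print(mask_to_box(target, image_size=(480, 640)))
```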


Adaptive dropout for training deep neural networks

Neural Information Processing Systems

Recently, it was shown that by dropping out hidden activities with a probability of 0.5, deep neural networks can perform very well. We describe a model in which a binary belief network is overlaid on a neural network and is used to decrease the information content of its hidden units by selectively setting activities to zero. This "dropout" network can be trained jointly with the neural network by approximately computing local expectations of binary dropout variables, computing derivatives using back-propagation, and using stochastic gradient descent. Interestingly, experiments show that the learnt dropout network parameters recapitulate the neural network parameters, suggesting that a good dropout network regularizes activities according to magnitude. When evaluated on the MNIST and NORB datasets, we found that our method can be used to achieve lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines. For example, our model achieves 5.8% error on the NORB test set, which is better than state-of-the-art results obtained using convolutional architectures.
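
A minimal sketch of the adaptive-dropout idea: a second network produces a per-unit keep probability from the same input, instead of using a fixed rate such as 0.5. The separate weights W_drop and b_drop are stand-ins introduced for illustration; the paper's parameter sharing with the underlying network and its training by local expectations are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_dropout_forward(x, W, b, W_drop, b_drop, train=True):
    """Forward pass of one hidden layer with an overlaid dropout network.

    Sketch of the idea only: the keep probability of each hidden unit is
    produced by a second ("dropout") network from the same input, rather
    than being a fixed constant.
    """
    a = sigmoid(x @ W + b)                     # hidden activities
    keep_prob = sigmoid(x @ W_drop + b_drop)   # per-unit keep probabilities
    if train:
        mask = (rng.random(a.shape) < keep_prob).astype(float)
        return a * mask
    # At test time, scale by the expected mask instead of sampling.
    return a * keep_prob

# Toy usage: batch of 8 inputs, 50 input units, 100 hidden units.
x = rng.standard_normal((8, 50))
W, b = 0.1 * rng.standard_normal((50, 100)), np.zeros(100)
W_drop, b_drop = 0.1 * rng.standard_normal((50, 100)), np.zeros(100)
h = adaptive_dropout_forward(x, W, b, W_drop, b_drop)
```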


Generalized Denoising Auto-Encoders as Generative Models

Neural Information Processing Systems

Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued. This has led to various proposals for sampling from this implicitly learned density function, using Langevin and Metropolis-Hastings MCMC. However, it remained unclear how to connect the training procedure of regularized auto-encoders to the implicit estimation of the underlying data generating distribution when the data are discrete, or using other forms of corruption process and reconstruction errors. Another issue is that the mathematical justification is only valid in the limit of small corruption noise. We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty).
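
The sampling procedure analyzed in this line of work alternates corruption with sampling from the learned reconstruction distribution. The sketch below runs such a chain with a toy, untrained binary auto-encoder and salt-and-pepper corruption; the architecture, corruption rate, and chain length are placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_chain(x0, encode, decode, corrupt, n_steps=100):
    """Markov chain associated with a trained denoising auto-encoder:
    corrupt the current state, then sample from the reconstruction
    distribution P(X | corrupted X)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_tilde = corrupt(x)                          # sample from C(X~ | X)
        p = decode(encode(x_tilde))                   # parameters of P(X | X~)
        x = (rng.random(p.shape) < p).astype(float)   # sample binary X
        samples.append(x)
    return samples

# Toy binary example with random (untrained) weights, just to show the loop.
d, h = 20, 10
W, b, c = 0.1 * rng.standard_normal((d, h)), np.zeros(d), np.zeros(h)
encode = lambda x: sigmoid(x @ W + c)
decode = lambda z: sigmoid(z @ W.T + b)
corrupt = lambda x: np.where(rng.random(x.shape) < 0.1, 1.0 - x, x)  # flip 10% of bits
chain = sample_chain((rng.random(d) < 0.5).astype(float), encode, decode, corrupt)
```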


One-shot learning by inverting a compositional causal process

Neural Information Processing Systems

People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also used a "visual Turing test" to show that our model produces human-like performance on other conceptual tasks, including generating new examples and parsing.


Annealing between distributions by averaging moments

Neural Information Processing Systems

Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families: averaging the moments of the initial and target distributions. We derive an asymptotically optimal piecewise linear schedule for the moments path and show that it performs at least as well as geometric averages with a linear schedule. Moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models, including Deep Belief Networks and Deep Boltzmann Machines.
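
A toy illustration of the two interpolation paths on 1-D Gaussians, where the true log partition-function ratio is known in closed form. Exact resampling at each step stands in for the MCMC transition operators one would use with RBMs, the linear beta schedule is a simplification of the piecewise linear schedule derived in the paper, and the endpoint parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial and target 1-D Gaussians (mean, variance); their normalizers are
# known, so the AIS estimates can be checked against the true log ratio.
mu0, v0 = 0.0, 1.0
mu1, v1 = 4.0, 0.5

def geometric_path(beta):
    """Gaussian proportional to p0^(1-beta) * p1^beta."""
    var = 1.0 / ((1 - beta) / v0 + beta / v1)
    mean = var * ((1 - beta) * mu0 / v0 + beta * mu1 / v1)
    return mean, var

def moment_path(beta):
    """Gaussian whose first and second moments average the endpoints' moments."""
    mean = (1 - beta) * mu0 + beta * mu1
    second = (1 - beta) * (v0 + mu0**2) + beta * (v1 + mu1**2)
    return mean, second - mean**2

def log_f(x, mean, var):
    """Unnormalized log density: -(x - mean)^2 / (2 var)."""
    return -0.5 * (x - mean) ** 2 / var

def ais(path, n_chains=500, n_steps=50):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    log_w = np.zeros(n_chains)
    x = mu0 + np.sqrt(v0) * rng.standard_normal(n_chains)   # x ~ p_0
    for k in range(1, n_steps + 1):
        m_prev, v_prev = path(betas[k - 1])
        m_k, v_k = path(betas[k])
        log_w += log_f(x, m_k, v_k) - log_f(x, m_prev, v_prev)
        # Exact transition: resample from the k-th intermediate Gaussian.
        x = m_k + np.sqrt(v_k) * rng.standard_normal(n_chains)
    # log estimate of Z_target / Z_init for the unnormalized densities above.
    return np.logaddexp.reduce(log_w) - np.log(n_chains)

print("true          :", 0.5 * np.log(v1 / v0))
print("geometric path:", ais(geometric_path))
print("moment path   :", ais(moment_path))
```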


On the Representational Efficiency of Restricted Boltzmann Machines

Neural Information Processing Systems

This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM's unnormalized log-likelihood function as a type of neural network (called an RBM network), and through a series of simulation results relate these networks to types that are better understood. We show the surprising result that RBM networks can efficiently compute any function that depends on the number of 1's in the input, such as parity. We also provide the first known example of a particular type of distribution which provably cannot be efficiently represented by an RBM (or equivalently, cannot be efficiently computed by an RBM network), assuming a realistic exponential upper bound on the size of the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM, our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones.
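
The "RBM network" viewpoint rests on a standard identity: summing out the binary hidden units turns the RBM's unnormalized log-likelihood into a linear term plus a single layer of softplus units. Below is a small sketch of that computation with a brute-force check on a toy model; the paper's precise network definition and its lower-bound arguments are not reproduced.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)   # numerically stable log(1 + exp(x))

def rbm_unnormalized_log_prob(v, W, b, c):
    """Unnormalized log-likelihood of a binary RBM, log p*(v).

    Summing the hidden units out of the joint energy gives
        log p*(v) = b.v + sum_j softplus(c_j + v.W[:, j]),
    i.e. a linear term plus one hidden layer of softplus units.
    """
    return v @ b + softplus(v @ W + c).sum(axis=-1)

# Toy check with 3 visible units: brute-force the partition function and
# verify the resulting probabilities sum to one.
rng = np.random.default_rng(0)
W, b, c = rng.standard_normal((3, 4)), rng.standard_normal(3), rng.standard_normal(4)
all_v = np.array([[int(bit) for bit in f"{i:03b}"] for i in range(8)], dtype=float)
log_p_star = rbm_unnormalized_log_prob(all_v, W, b, c)
log_Z = np.logaddexp.reduce(log_p_star)
print(np.exp(log_p_star - log_Z).sum())   # ~1.0
```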


Convex Two-Layer Modeling

Neural Information Processing Systems

Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization---creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics.


Discriminative Transfer Learning with Tree-based Priors

Neural Information Processing Systems

High capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier's parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, making deep neural networks work well on infrequent classes as well. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set.
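
One common way to turn a class tree into a prior over last-layer weights is a Gaussian centered on the parent node's weight vector, which at training time becomes a squared-distance penalty pulling each class toward its parent. The sketch below assumes that instantiation; the paper's exact prior and its tree-learning algorithm are not reproduced.

```python
import numpy as np

def tree_prior_penalty(class_weights, parent_weights, parent_of, sigma2=1.0):
    """L2 penalty induced by a Gaussian tree prior on the last layer.

    Each class weight vector is treated as drawn around the weight vector of
    its parent node in the class hierarchy, so rare classes are pulled toward
    (and borrow statistical strength from) related classes.

    class_weights : (n_classes, d) array
    parent_weights: (n_parents, d) array
    parent_of     : length-n_classes array of parent indices
    """
    diffs = class_weights - parent_weights[parent_of]
    return 0.5 / sigma2 * np.sum(diffs ** 2)

# Toy example: 4 classes grouped under 2 parent nodes; add this penalty to
# the classifier's training loss.
rng = np.random.default_rng(0)
class_w = rng.standard_normal((4, 16))
parent_w = rng.standard_normal((2, 16))
penalty = tree_prior_penalty(class_w, parent_w, parent_of=np.array([0, 0, 1, 1]))
```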


Deep Fisher Networks for Large-Scale Image Classification

Neural Information Processing Systems

As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard handcrafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional handcrafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.
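
The building block being stacked here is the Fisher vector encoding itself. Below is a sketch of a single standard encoding step (gradients with respect to GMM means and variances, with power and L2 normalization) using scikit-learn's GaussianMixture; the stacking, spatial pooling, and dimensionality reduction of the deep Fisher network are not shown, and the descriptor dimensions are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode a set of local descriptors (N, d) as a Fisher vector using a
    fitted diagonal-covariance GaussianMixture with K components.
    Returns a 2*K*d vector (gradients w.r.t. GMM means and variances),
    with the usual signed-square-root and L2 normalization."""
    N, d = descriptors.shape
    gamma = gmm.predict_proba(descriptors)                # (N, K) posteriors
    w, mu = gmm.weights_, gmm.means_                      # (K,), (K, d)
    sigma = np.sqrt(gmm.covariances_)                     # (K, d) std devs
    diff = (descriptors[:, None, :] - mu[None]) / sigma   # (N, K, d)

    g_mu = (gamma[..., None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_sig = (gamma[..., None] * (diff**2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)              # L2 normalization

# Toy usage: fit a small GMM on random "local descriptors" and encode a set.
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(rng.standard_normal((1000, 8)))
fv = fisher_vector(rng.standard_normal((200, 8)), gmm)    # shape (2 * 4 * 8,)
```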