AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

A Statistical Theory of Deep Learning via Proximal Splitting

Polson, Nicholas G., Willard, Brandon T., Heidari, Massoud

arXiv.org Machine LearningSep-20-2015

In this paper we develop a statistical theory and an implementation of deep learning (DL) models. We show that an elegant variable splitting scheme for the alternating direction method of multipliers (ADMM) optimises a deep learning objective. We allow for non-smooth non-convex regularisation penalties to induce sparsity in parameter weights. We provide a link between traditional shallow layer statistical models such as principal component and sliced inverse regression and deep layer models. We also define the degrees of freedom of a deep learning predictor and a predictive MSE criteria to perform model selection for comparing architecture designs. We focus on deep multiclass logistic learning although our methods apply more generally. Our results suggest an interesting and previously under-exploited relationship between deep learning and proximal splitting techniques. To illustrate our methodology, we provide a multi-class logit classification analysis of Fisher's Iris data where we illustrate the convergence of our algorithm. Finally, we conclude with directions for future research.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1509.06061

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback

Robobarista: Object Part based Transfer of Manipulation Trajectories from Crowd-sourcing in 3D Pointclouds

Sung, Jaeyong, Jin, Seok Hyun, Saxena, Ashutosh

arXiv.org Artificial IntelligenceSep-18-2015

There is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly-operated object parts. We formulate the manipulation planning as a structured prediction problem and design a deep learning model that can handle large noise in the manipulation demonstrations and learns features from three different modalities: point-clouds, language and trajectory. In order to collect a large number of manipulation demonstrations for different objects, we developed a new crowd-sourcing platform called Robobarista. We test our model on our dataset consisting of 116 objects with 249 parts along with 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot can even manipulate objects it has never seen before.

demonstration, manipulation trajectory, trajectory, (13 more...)

arXiv.org Artificial Intelligence

1504.03071

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generative Image Modeling Using Spatial LSTMs

Theis, Lucas, Bethge, Matthias

arXiv.org Machine LearningSep-18-2015

Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but only recently have found their way into generative image models. We here introduce a recurrent image model based on multidimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting. 1 Introduction The last few years have seen tremendous progress in learning useful image representations [6]. While early successes were often achieved through the use of generative models [e.g., 13, 23, 30], recent breakthroughs were mainly driven by improvements in supervised techniques [e.g., 20, 34]. Y et unsupervised learning has the potential to tap into the much larger source of unlabeled data, which may be important for training bigger systems capable of a more general scene understanding. For example, multimodal data is abundant but often unlabeled, yet can still greatly benefit unsupervised approaches [36].

artificial intelligence, machine learning, mcgsm, (19 more...)

arXiv.org Machine Learning

1506.03478

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adapting Resilient Propagation for Deep Learning

Mosca, Alan, Magoulas, George D.

arXiv.org Machine LearningSep-16-2015

The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. The standard Rprop however encounters difficulties in the context of deep neural networks as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of the Rprop that combines standard Rprop steps with a special drop out technique. We apply the method for training Deep Neural Networks as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop's problems demonstrating improved learning speed and accuracy.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1509.04612

Country: Europe > United Kingdom (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CURL: Co-trained Unsupervised Representation Learning for Image Classification

Bianco, Simone, Ciocca, Gianluigi, Cusano, Claudio

arXiv.org Machine LearningSep-11-2015

Abstract--In this paper we propose a strategy for semi-supervised image classification that leverages unsupervised representation learning and co-training. The strategy, that is called CURL from Co-trained Unsupervised Representation Learning, iteratively builds two classifiers on two different views of the data. The two views correspond to different representations learned from both labeled and unlabeled data and differ in the fusion scheme used to combine the image features. T o assess the performance of our proposal, we conducted several experiments on widely used data sets for scene and object recognition. We considered three scenarios (inductive, transductive and self-taught learning) that differ in the strategy followed to exploit the unlabeled data. As image features we considered a combination of GIST, PHOG, and LBP as well as features extracted from a Con-volutional Neural Network. Moreover, two embodiments of CURL are investigated: one using Ensemble Projection as unsupervised representation learning coupled with Logistic Regression, and one based on LapSVM. The results show that CURL clearly outperforms other supervised and semi-supervised learning methods in the state of the art. Semi-supervised learning [1] consists in taking into account both labeled and unlabeled data when training machine learning models. It is particularly effective when there is plenty of training data, but only a few instances are labeled. In the last years, many semi-supervised learning approaches have been proposed including generative methods [2], [3], graph-based methods [4], [5], and methods based on Support V ector Machines [6], [7]. Co-training is another example of semi-supervised technique [8].

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Machine Learning

1505.08098

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

Balduzzi, David, Ghifary, Muhammad

arXiv.org Machine LearningSep-10-2015

This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, its gradient, and determine the actor's policy respectively. We evaluate GProp on two challenging tasks: a contextual bandit problem constructed from nonparametric regression datasets that is designed to probe the ability of reinforcement learning algorithms to accurately estimate gradients; and the octopus arm, a challenging reinforcement learning benchmark. GProp is competitive with fully supervised methods on the bandit task and achieves the best performance to date on the octopus arm.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1509.03005

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Deeply Learning the Messages in Message Passing Inference

Lin, Guosheng, Shen, Chunhua, Reid, Ian, Hengel, Anton van den

arXiv.org Machine LearningSep-8-2015

Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable for cases that a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Machine Learning

1506.02108

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Quantization based Fast Inner Product Search

Guo, Ruiqi, Kumar, Sanjiv, Choromanski, Krzysztof, Simcha, David

arXiv.org Machine LearningSep-4-2015

We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization error. Then, the inner product of a query to a database vector is approximated as the sum of inner products with the subspace quantizers. Different from recently proposed LSH approaches to MIPS, the database vectors and queries do not need to be augmented in a higher dimensional feature space. We also provide a theoretical analysis of the proposed approach, consisting of the concentration results under mild assumptions. Furthermore, if a small sample of example queries is given at the training time, we propose a modified codebook learning procedure which further improves the accuracy. Experimental results on a variety of datasets including those arising from deep neural networks show that the proposed approach significantly outperforms the existing state-of-the-art.

artificial intelligence, machine learning, precision 0, (17 more...)

arXiv.org Machine Learning

1509.01469

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Domain Generalization for Object Recognition with Multi-task Autoencoders

Ghifary, Muhammad, Kleijn, W. Bastiaan, Zhang, Mengjie, Balduzzi, David

arXiv.org Artificial IntelligenceAug-31-2015

The problem of domain generalization is to take knowledge acquired from a number of related domains where training data is available, and to then successfully apply it to previously unseen domains. We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. Our algorithm extends the standard denoising autoencoder framework by substituting artificially induced corruption with naturally occurring inter-domain variability in the appearance of objects. Instead of reconstructing images from noisy versions, MTAE learns to transform the original image into analogs in multiple related domains. It thereby learns features that are robust to variations across domains. The learnt features are then used as inputs to a classifier. We evaluated the performance of the algorithm on benchmark image recognition datasets, where the task is to learn features from multiple datasets and to then predict the image label from unseen datasets. We found that (denoising) MTAE outperforms alternative autoencoder-based models as well as the current state-of-the-art algorithms for domain generalization.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1508.0768

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Partitioning Large Scale Deep Belief Networks Using Dropout

Huang, Yanping, Zhang, Sai

arXiv.org Machine LearningAug-28-2015

Deep learning methods have shown great promise in many practical applications, ranging from speech recognition, visual object recognition, to text processing. However, most of the current deep learning methods suffer from scalability problems for large-scale applications, forcing researchers or users to focus on smallscale problems with fewer parameters. In this paper, we consider a well-known machine learning model, deep belief networks (DBNs) that have yielded impressive classification performance on a large number of benchmark machine learning tasks. To scale up DBN, we propose an approach that can use the computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processors (GPU). When training a DBN, each machine randomly drops out a portion of neurons in each hidden layer, for each training case, making the remaining neurons only learn to detect features that are generally helpful for producing the correct answer. Within our approach, we have developed four methods to combine outcomes from each machine to form a unified model. Our preliminary experiment on the MNIST handwritten digit database demonstrates that our approach outperforms the state of the art test error rate.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1508.07096

Country: North America > United States (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Add feedback