AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Training recurrent networks online without backtracking

Ollivier, Yann, Tallec, Corentin, Charpiat, Guillaume

arXiv.org Machine LearningNov-20-2015

We introduce the "NoBackTrack" algorithm to train the parameters of dynamical systems such as recurrent neural networks. This algorithm works in an online, memoryless setting, thus requiring no backpropagation through time, and is scalable, avoiding the large computational and memory cost of maintaining the full gradient of the current state with respect to the parameters. The algorithm essentially maintains, at each time, a single search direction in parameter space. The evolution of this search direction is partly stochastic and is constructed in such a way to provide, at every time, an unbiased random estimate of the gradient of the loss function with respect to the parameters. Because the gradient estimate is unbiased, on average over time the parameter is updated as it should. The resulting gradient estimate can then be fed to a lightweight Kalman-like filter to yield an improved algorithm. For recurrent neural networks, the resulting algorithms scale linearly with the number of parameters. Small-scale experiments confirm the suitability of the approach, showing that the stochastic approximation of the gradient introduced in the algorithm is not detrimental to learning. In particular, the Kalman-like version of NoBackTrack is superior to backpropagation through time (BPTT) when the time span of dependencies in the data is longer than the truncation span for BPTT.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1507.0768

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

Watter, Manuel, Springenberg, Jost Tobias, Boedecker, Joschka, Riedmiller, Martin

arXiv.org Machine LearningNov-20-2015

We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal control formulation in latent space, supports long-term prediction of image sequences and exhibits strong performance on a variety of complex control problems.

artificial intelligence, latent space, machine learning, (20 more...)

arXiv.org Machine Learning

1506.07365

Country:

Europe > Germany (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Teaching Machines to Read and Comprehend

Hermann, Karl Moritz, Kočiský, Tomáš, Grefenstette, Edward, Espeholt, Lasse, Kay, Will, Suleyman, Mustafa, Blunsom, Phil

arXiv.org Artificial IntelligenceNov-19-2015

Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1506.0334

Country:

North America > United States > Florida > Orange County > Orlando (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.88)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AUC-maximized Deep Convolutional Neural Fields for Sequence Labeling

Wang, Sheng, Sun, Siqi, Xu, Jinbo

arXiv.org Machine LearningNov-19-2015

Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This manuscript presents Deep Convolutional Neural Fields (DeepCNF), a combination of DCNN with Conditional Random Field (CRF), for sequence labeling with highly imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on highly imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. We then test our AUC-maximized DeepCNF on three very different protein sequence labeling tasks: solvent accessibility prediction, 8-state secondary structure prediction, and disorder prediction. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also have similar performance as the other two training methods on the solvent accessibility prediction problem which has three equally-distributed labels. Furthermore, our experimental results also show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

1511.05265

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Adjustable Bounded Rectifiers: Towards Deep Binary Representations

Wu, Zhirong, Lin, Dahua, Tang, Xiaoou

arXiv.org Machine LearningNov-19-2015

Binary representation is desirable for its memory efficiency, computation speed and robustness. In this paper, we propose adjustable bounded rectifiers to learn binary representations for deep neural networks. While hard constraining representations across layers to be binary makes training unreasonably difficult, we softly encourage activations to diverge from real values to binary by approximating step functions. Our final representation is completely binary. We test our approach on MNIST, CIFAR10, and ILSVRC2012 dataset, and systematically study the training dynamics of the binarization process. Our approach can binarize the last layer representation without loss of performance and binarize all the layers with reasonably small degradations. The memory space that it saves may allow more sophisticated models to be deployed, thus compensating the loss. To the best of our knowledge, this is the first work to report results on current deep network architectures using complete binary middle representations. Given the learned representations, we find that the firing or inhibition of a binary neuron is usually associated with a meaningful interpretation across different classes. This suggests that the semantic structure of a neural network may be manifested through a guided binarization process.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Machine Learning

1511.06201

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Unified Gradient Regularization Family for Adversarial Examples

Lyu, Chunchuan, Huang, Kaizhu, Liang, Hai-Ning

arXiv.org Machine LearningNov-19-2015

Adversarial examples are augmented data points generated by imperceptible perturbation of input samples. They have recently drawn much attention with the machine learning and data mining community. Being difficult to distinguish from real examples, such adversarial examples could change the prediction of many of the best learning models including the state-of-the-art deep learning models. Recent attempts have been made to build robust models that take into account adversarial examples. However, these methods can either lead to performance drops or lack mathematical motivations. In this paper, we propose a unified framework to build robust machine learning models against adversarial examples. More specifically, using the unified framework, we develop a family of gradient regularization methods that effectively penalize the gradient of loss function w.r.t. inputs. Our proposed framework is appealing in that it offers a unified view to deal with adversarial examples. It incorporates another recently-proposed perturbation based approach as a special case. In addition, we present some visual effects that reveals semantic meaning in those perturbations, and thus support our regularization method and provide another explanation for generalizability of adversarial examples. By applying this technique to Maxout networks, we conduct a series of experiments and achieve encouraging results on two benchmark datasets. In particular,we attain the best accuracy on MNIST data (without data augmentation) and competitive performance on CIFAR-10 data.

artificial intelligence, machine learning, perturbation, (18 more...)

arXiv.org Machine Learning

1511.06385

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning

Kraus, Oren Z., Ba, Lei Jimmy, Frey, Brendan

arXiv.org Machine LearningNov-17-2015

Convolutional neural networks (CNN) have achieved state of the art performance on both classification and segmentation tasks. Applying CNNs to microscopy images is challenging due to the lack of datasets labeled at the single cell level. We extend the application of CNNs to microscopy image classification and segmentation using multiple instance learning (MIL). We present the adaptive Noisy-AND MIL pooling function, a new MIL operator that is robust to outliers. Combining CNNs with MIL enables training CNNs using full resolution microscopy images with global labels. We base our approach on the similarity between the aggregation function used in MIL and pooling layers used in CNNs. We show that training MIL CNNs end-to-end outperforms several previous methods on both mammalian and yeast microscopy images without requiring any segmentation steps.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1093/bioinformatics/btw252

1511.05286

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Neural Adaptive Sequential Monte Carlo

Gu, Shixiang, Ghahramani, Zoubin, Turner, Richard E.

arXiv.org Machine LearningNov-16-2015

Sequential Monte Carlo (SMC), or particle filtering, is a popular class of methods for sampling from an intractable target distribution using a sequence of simpler intermediate distributions. Like other importance sampling-based methods, performance is critically dependent on the proposal distribution: a bad proposal can lead to arbitrarily inaccurate estimates of the target distribution. This paper presents a new method for automatically adapting the proposal using an approximation of the Kullback-Leibler divergence between the true posterior and the proposal distribution. The method is very flexible, applicable to any parameterized proposal distribution and it supports online and batch variants. We use the new framework to adapt powerful proposal distributions with rich parameterizations based upon neural networks leading to Neural Adaptive Sequential Monte Carlo (NASMC). Experiments indicate that NASMC significantly improves inference in a non-linear state space model outperforming adaptive proposal methods including the Extended Kalman and Unscented Particle Filters. Experiments also indicate that improved inference translates into improved parameter learning when NASMC is used as a subroutine of Particle Marginal Metropolis Hastings. Finally we show that NASMC is able to train a latent variable recurrent neural network (LV-RNN) achieving results that compete with the state-of-the-art for polymorphic music modelling. NASMC can be seen as bridging the gap between adaptive SMC methods and the recent work in scalable, black-box variational inference.

artificial intelligence, machine learning, proposal distribution, (16 more...)

arXiv.org Machine Learning

1506.03338

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

Huszár, Ferenc

arXiv.org Machine LearningNov-16-2015

Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to the winning entry to the MSCOCO image captioning benchmark in 2015. Here we show that despite this impressive empirical performance, the objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm. Secondly, we revisit the problems that scheduled sampling was meant to address, and present an alternative interpretation. We argue that maximum likelihood is an inappropriate training objective when the end-goal is to generate natural-looking samples. We go on to derive an ideal objective function to use in this situation instead. We introduce a generalisation of adversarial training, and show how such method can interpolate between maximum likelihood training and our ideal training objective. To our knowledge this is the first theoretical analysis that explains why adversarial training tends to produce samples with higher perceived quality.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1511.05101

Genre:

Instructional Material (0.56)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.58)

Add feedback

Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization

Xu, Baohan, Fu, Yanwei, Jiang, Yu-Gang, Li, Boyang, Sigal, Leonid

arXiv.org Artificial IntelligenceNov-15-2015

Rapid development of mobile devices has led to an explosive growth of user-generated images and videos, which creates a demand for computational understanding of visual media content. In addition to recognition of objective content, such as objects and scenes, an important dimension of video content analysis is the understanding of emotional or affective content, i.e. estimating the emotional impact of the video on a viewer. Emotional content can strongly resonate with viewers and plays a crucial role in the videowatching experience. Some successes have been achieved with the use of deep-learning architectures trained for text at both sentence-and document-level [40] or image sentiment analysis [8]. However, the ability to understand emotions from video, to a large extent, remains an unsolved problem. Analysis of emotional content in video has many realworld applications. Video recommendation services can benefit from matching user interests with the emotions of video content and prediction of interestingness [20], [21], [36], leading to improved user satisfaction. Better understanding of video emotions may enable advertising that is consistent with the main video's mood and help avoid social inappropriateness such as placing a funny advertisement alongside a funeral video. Video summarization [68] and coding [60] can also benefit from understanding emotions, since an accurate summary should keep the emotional content conveyed by the original video.

dataset, emotion, video, (15 more...)

arXiv.org Artificial Intelligence

1511.04798

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.93)
Overview (0.93)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(7 more...)

Add feedback