Oates, Tim


Adaptive Normalized Risk-Averting Training For Deep Neural Networks

arXiv.org Machine Learning

This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $\lambda$, we explain why learning $\lambda$ adaptively by gradient descent works. In practice, we show how this method improves training of deep neural networks on visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoising Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective on addressing the non-convex optimization problem in DNNs.
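The criterion family behind ANRAT builds on the normalized risk-averting error (NRAE): a log-mean-exp of per-example losses whose convexity index $\lambda$ interpolates between the plain mean error (small $\lambda$) and the worst-case error (large $\lambda$). A minimal NumPy sketch of that classical NRAE form follows; the exact ANRAT criterion, including the $L_p$ generalization and how $\lambda$ is penalized and adapted during training, is defined in the paper.

```python
import numpy as np

def nrae(errors, lam):
    """Normalized risk-averting error (sketch):
    (1/lam) * log(mean(exp(lam * e))), stabilized with the usual
    max-shift trick so large lam does not overflow."""
    e = np.asarray(errors, dtype=float)
    m = e.max()
    return m + np.log(np.mean(np.exp(lam * (e - m)))) / lam

errors = np.array([0.1, 0.2, 0.9, 0.3])  # per-example L_p losses
# Small lambda: close to the plain mean error.
# Large lambda: approaches the worst-case (max) error, which is
# how lambda trades off convexity against the original objective.
```

Because NRAE is differentiable in $\lambda$ as well as in the weights, $\lambda$ itself can be updated by the same gradient-descent step as the network parameters, which is the "adaptive" part of ANRAT.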


Adaptive Normalized Risk-Averting Training for Deep Neural Networks

AAAI Conferences

This paper proposes a set of new error criteria and a learning approach, called Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks without pretraining. Theoretically, we demonstrate its effectiveness based on the expansion of the convexity region. By analyzing the gradient on the convexity index $\lambda$, we explain why our gradient-descent-based learning method works. In practice, we show how this training method improves training of deep neural networks on visual recognition tasks on the MNIST and CIFAR-10 datasets. Using simple experimental settings without pretraining and other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoising Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other common tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective on the non-convex optimization problem in training DNNs.


Imaging Time-Series to Improve Classification and Imputation

AAAI Conferences

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoising Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%–48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.
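The bijection property mentioned above follows from the GASF diagonal: for data rescaled to $[0, 1]$ with $\phi_i = \arccos(x_i)$, the diagonal entries are $\cos(2\phi_i) = 2x_i^2 - 1$, so each value is recoverable as $x_i = \sqrt{(G_{ii} + 1)/2}$. A minimal sketch of this round trip (function names are illustrative, not from the paper):

```python
import numpy as np

def gasf_01(x):
    # Rescale to [0, 1] so phi = arccos(x) is well defined, then
    # form the Gramian of trigonometric sums: cos(phi_i + phi_j).
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])

def invert_gasf_diagonal(G):
    # Diagonal entries are cos(2*phi_i) = 2*x_i**2 - 1, so the
    # rescaled series is recovered exactly from the diagonal alone.
    # (The maximum() guards against tiny negative float error.)
    return np.sqrt(np.maximum(np.diagonal(G) + 1.0, 0.0) / 2.0)
```

This exact invertibility is what lets a Denoising Auto-encoder trained on GASF images map imputed pixels back to time-series values.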


Imaging Time-Series to Improve Classification and Imputation

arXiv.org Machine Learning

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoising Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%–48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.


Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks

AAAI Conferences

Inspired by recent successes of deep learning in computer vision and speech recognition, we propose a novel framework to encode time series data as different types of images, namely, Gramian Angular Fields (GAF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for classification. Using a polar coordinate system, GAF images are represented as a Gramian matrix where each element is the trigonometric sum (i.e., superposition of directions) between different time intervals. MTF images represent the first-order Markov transition probability along one dimension and temporal dependency along the other. We used Tiled Convolutional Neural Networks (tiled CNNs) on 12 standard datasets to learn high-level features from individual GAF, MTF, and GAF-MTF images that resulted from combining GAF and MTF representations into a single image. The classification results of our approach are competitive with five state-of-the-art approaches. An analysis of the features and weights learned via tiled CNNs explains why the approach works.
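The two encodings described above can be sketched in a few lines of NumPy. This is a simplified reading of the abstract; details such as the exact rescaling interval, quantile bin count, and any image aggregation follow the paper.

```python
import numpy as np

def gaf(x, method="summation"):
    """Gramian Angular Field of a 1-D series (sketch)."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] so arccos is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(x)                      # polar angle per time step
    if method == "summation":               # GASF: cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])  # GADF: sin(phi_i - phi_j)

def mtf(x, n_bins=4):
    """Markov Transition Field of a 1-D series (sketch).

    Quantile-bin the values, estimate the first-order Markov
    transition matrix from adjacent steps, then spread transition
    probabilities over the temporal grid:
    M[i, j] = P(bin(x_j) | bin(x_i))."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)               # bin index per time step
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):         # count adjacent transitions
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize
    return W[q[:, None], q[None, :]]
```

By construction GASF is symmetric and GADF antisymmetric, which is the "superposition of directions" the abstract refers to.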


Comparing Raw Data and Feature Extraction for Seizure Detection with Deep Learning Methods

AAAI Conferences

Personalized health monitoring is slowly becoming a reality due to advances in small, high-fidelity sensors, low-power processors, and energy-harvesting techniques. The ability to efficiently and effectively process this data and extract useful information is of the utmost importance. In this paper, we address this challenge for the application of automated seizure detection. We explore the use of a variety of representations and machine learning algorithms for the particular task of seizure detection in high-resolution, multi-channel EEG data. In doing so, we examine classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions.


The Thing That We Tried Didn't Work Very Well: Deictic Representation in Reinforcement Learning

arXiv.org Artificial Intelligence

Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, few experiments on learning with deictic representations have been reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naïve propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen learning performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.


Toward an Integrated Metacognitive Architecture

AAAI Conferences

Researchers have studied problems in metacognition both in computers and in humans. In response, some have implemented models of cognition and metacognitive activity in various architectures to test and better define specific theories of metacognition. However, current theories and implementations suffer from numerous problems and a lack of detail. Here we illustrate these problems with two different computational approaches. The Meta-Cognitive Loop and Meta-AQUA both examine the metacognitive reasoning involved in monitoring and reasoning about failures of expectations, and both learn from such experiences. But neither system presents a full accounting of the variety of known metacognitive phenomena, and, as far as we know, no extant system does. The problem is that no existing cognitive architecture directly addresses metacognition. Instead, current architectures were initially developed to study narrower cognitive functions and only later were modified to include higher-level attributes. We claim that the solution is to develop a metacognitive architecture outright, and we begin to outline the structure such a foundation might have.


Model AI Assignments 2011

AAAI Conferences

The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of the student learning experience, we present abstracts of three AI assignments from the 2011 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs.


The Metacognitive Loop: An Architecture for Building Robust Intelligent Systems

AAAI Conferences

What commonsense knowledge do intelligent systems need in order to recover from failures or deal with unexpected situations? It is impractical to represent predetermined solutions to every unanticipated situation or to provide predetermined fixes for all the different ways in which systems may fail. We contend that intelligent systems require only a finite set of anomaly-handling strategies to muddle through anomalous situations. We describe a generalized metacognition module that implements such a set of anomaly-handling strategies and that, in principle, can be attached to any host system to improve that system's robustness. Several implemented studies that support our contention are reported.