Information Technology
Learning Classification with Unlabeled Data
Department of Computer Science, University of Rochester, Rochester, NY 14627. Abstract: One of the advantages of supervised learning is that the final error metric is available during training. For classifiers, the algorithm can directly reduce the number of misclassifications on the training set. Unfortunately, when modeling human learning or constructing classifiers for autonomous robots, supervisory labels are often not available or are too expensive. In this paper we show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. We show that minimizing the disagreement between the outputs of networks processing patterns from these different modalities is a sensible approximation to minimizing the number of misclassifications in each modality, and leads to similar results. Using the Peterson-Barney vowel dataset we show that the algorithm performs well in finding appropriate placement for the codebook vectors, particularly when the confusable classes are different for the two modalities. 1 INTRODUCTION This paper addresses the question of how a human or autonomous robot can learn to classify new objects without experience with previous labeled examples.
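The disagreement quantity described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it only measures disagreement between two nearest-codebook classifiers on paired patterns and does not implement any update rule. The function names, the toy codebooks, and the toy data are all hypothetical.

```python
def nearest_label(x, codebook):
    """Return the label of the codebook vector closest to x.

    codebook is a list of (vector, label) pairs; labels are arbitrary
    cluster indices, not supervisory labels.
    """
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(x, entry[0]))
    return min(codebook, key=sq_dist)[1]


def disagreement(pairs, codebook_a, codebook_b):
    """Fraction of paired patterns on which the two modalities disagree."""
    mismatches = sum(
        nearest_label(xa, codebook_a) != nearest_label(xb, codebook_b)
        for xa, xb in pairs
    )
    return mismatches / len(pairs)


# Hypothetical toy data: modality A is 1-D, modality B is 2-D.
codebook_a = [((0.0,), 0), ((1.0,), 1)]
codebook_b = [((0.0, 0.0), 0), ((2.0, 2.0), 1)]
pairs = [
    ((0.1,), (0.2, 0.1)),   # both modalities say class 0
    ((0.9,), (1.8, 2.1)),   # both modalities say class 1
    ((0.4,), (1.5, 1.5)),   # A says 0, B says 1
    ((0.8,), (0.3, 0.2)),   # A says 1, B says 0
]
rate = disagreement(pairs, codebook_a, codebook_b)
```

No labels are consulted anywhere in the computation: only the co-occurrence of the two modalities' patterns is used, which is the property the abstract relies on.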
Feature Densities are Required for Computing Feature Correspondences
The feature correspondence problem is a classic hurdle in visual object recognition concerned with determining the correct mapping between the features measured from the image and the features expected by the model. In this paper we show that determining good correspondences requires information about the joint probability density over the image features. We propose "likelihood based correspondence matching" as a general principle for selecting optimal correspondences. The approach is applicable to nonrigid models, allows nonlinear perspective transformations, and can optimally deal with occlusions and missing features.
Classification of Multi-Spectral Pixels by the Binary Diamond Neural Network
Classification is widely used in the animal kingdom. Identifying an item as food is classification. Assigning words to objects, actions, feelings, and situations is classification. The purpose of this work is to introduce a new neural network, the Binary Diamond, which can be used as a general purpose classification tool. The design and operational mode of the Binary Diamond are influenced by observations of the underlying mechanisms that take place in human classification processes.
Neural Network Methods for Optimization Problems
In a talk entitled "Trajectory Control of Convergent Networks with applications to TSP", Natan Peterfreund (Computer Science, Technion) dealt with the problem of controlling the trajectories of continuous convergent neural network models for solving optimization problems, without affecting their set of equilibria or their convergence properties. Natan presented a class of feedback control functions which achieve this objective, while also improving the convergence rates. A modified Hopfield and Tank neural network model, developed through the proposed feedback approach, was found to substantially improve the results of the original model in solving the Traveling Salesman Problem. The proposed feedback overcame the 2n symmetric property of the TSP problem. In a talk entitled "Training Feedforward Neural Networks quickly and accurately using Very Fast Simulated Reannealing Methods", Bruce Rosen (Asst. Professor, Computer Science, UT San Antonio) presented the Very Fast Simulated Reannealing (VFSR) algorithm for training feedforward neural networks [2].
Catastrophic Interference in Connectionist Networks: Can It Be Predicted, Can It Be Prevented?
Catastrophic forgetting occurs when connectionist networks learn new information and, by so doing, forget all previously learned information. This workshop focused primarily on the causes of catastrophic interference, the techniques that have been developed to reduce it, the effect of these techniques on the networks' ability to generalize, and the degree to which prediction of catastrophic forgetting is possible. The speakers included Robert French and Phil Hetherington (Psychology Department, McGill University, het@blaise.psych.mcgill.ca). French indicated that catastrophic forgetting is at its worst when high representation overlap at the hidden layer combines with significant teacher-output error.
Credit Assignment through Time: Alternatives to Backpropagation
Bengio, Yoshua, Frasconi, Paolo
Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation. 1 Introduction Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production.
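One well-known source of difficulty with long input/output spans, assumed here for illustration because the abstract does not name it, is that the gradient propagated back through many time steps can shrink exponentially. The sketch below uses a hypothetical single-unit recurrent network h_t = tanh(w * h_{t-1} + x_t) and computes the derivative of the final state with respect to the initial state; the function name and parameter values are illustrative only.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for h_t = tanh(w * h_{t-1} + x_t).

    By the chain rule this is the product over t of w * (1 - h_t**2).
    """
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # one Jacobian factor per time step
    return grad


w = 0.9  # hypothetical recurrent weight with |w| < 1
short_span = gradient_through_time(w, [0.1] * 5)
long_span = gradient_through_time(w, [0.1] * 50)
```

Each factor has magnitude at most |w|, so for |w| < 1 the gradient over a span of T steps is bounded by |w|^T. This makes the influence of early inputs on a late error signal very small when T is large.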
Optimal Stopping and Effective Machine Complexity in Learning
Wang, Changfeng, Venkatesh, Santosh S., Judd, J. Stephen
We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the tradeoff between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [1] and many others [5, 1, 10, 3, 11] all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimension generalizes better. However, it is noted that these methods or criteria do not necessarily lead to optimal (or nearly optimal) generalization performance.
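The setting in this abstract (linear outputs, fixed input weights, gradient descent on a finite sample) can be sketched as below. This is an illustrative simulation under assumed choices, not the paper's analysis: the tanh hidden units, the sine target, the noise level, the step size, and every name in the code are hypothetical. It records the training error and a held-out error at each step and reports the step at which the held-out error is smallest.

```python
import math
import random


def features(xs, in_w, in_b):
    """Fixed (untrained) hidden layer: one tanh unit per input weight."""
    return [[math.tanh(w * x + b) for w, b in zip(in_w, in_b)] for x in xs]


def mse(feats, ys, out_w):
    """Mean squared error of the linear output on the given examples."""
    return sum(
        (sum(w * f for w, f in zip(out_w, row)) - y) ** 2
        for row, y in zip(feats, ys)
    ) / len(ys)


def train_with_validation(n_train=30, n_val=100, hidden=20, steps=1000, seed=0):
    rng = random.Random(seed)
    in_w = [rng.gauss(0, 3) for _ in range(hidden)]  # fixed input weights
    in_b = [rng.gauss(0, 1) for _ in range(hidden)]

    def sample(n):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [math.sin(3 * x) + rng.gauss(0, 0.3) for x in xs]  # noisy target
        return features(xs, in_w, in_b), ys

    f_tr, y_tr = sample(n_train)
    f_va, y_va = sample(n_val)
    out_w = [0.0] * hidden
    lr = 0.5 / hidden  # small enough that the training error cannot increase
    train_err, val_err = [], []
    for _ in range(steps):
        train_err.append(mse(f_tr, y_tr, out_w))
        val_err.append(mse(f_va, y_va, out_w))
        resid = [sum(w * f for w, f in zip(out_w, row)) - y
                 for row, y in zip(f_tr, y_tr)]
        grad = [2 * sum(r * row[k] for r, row in zip(resid, f_tr)) / n_train
                for k in range(hidden)]
        out_w = [w - lr * g for w, g in zip(out_w, grad)]
    best_t = min(range(steps), key=val_err.__getitem__)
    return train_err, val_err, best_t


train_err, val_err, best_t = train_with_validation()
```

The training error is non-increasing for this step size, while the held-out error need not be. Comparing `best_t` with the final step shows whether stopping early would have helped on a given draw of the data, which varies with the seed.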
Convergence of Stochastic Iterative Dynamic Programming Algorithms
Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P.
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong. 1 INTRODUCTION Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available for the learner that affect both the state transition probabilities, and the probability distribution for the immediate, state dependent costs (c_i(u)) incurred by the learner.
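The link between Q-learning and stochastic approximation can be illustrated with a small tabular sketch in the cost-minimizing form suggested by the notation above. Everything concrete here is an assumption made for illustration: the two-state environment, the discount factor, the noise on the costs, uniform sampling of state-action pairs, and the step size n^(-0.7), chosen so that the step sizes sum to infinity while their squares have a finite sum. It is not the paper's construction or proof.

```python
import random

# Hypothetical 2-state, 2-action Markov environment (illustration only).
# P[(i, u)] holds next-state probabilities, C[(i, u)] the expected cost.
P = {(0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
     (1, 0): [0.5, 0.5], (1, 1): [0.0, 1.0]}
C = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.5}
GAMMA = 0.8
STATES, ACTIONS = (0, 1), (0, 1)


def exact_q(sweeps=500):
    """Reference Q-values from value iteration with the known model."""
    q = {k: 0.0 for k in P}
    for _ in range(sweeps):
        q = {(i, u): C[(i, u)] + GAMMA * sum(
                 p * min(q[(j, b)] for b in ACTIONS)
                 for j, p in enumerate(P[(i, u)]))
             for (i, u) in P}
    return q


def q_learning(n_updates=100_000, seed=0):
    """Model-free Q-learning from sampled transitions and noisy costs."""
    rng = random.Random(seed)
    q = {k: 0.0 for k in P}
    visits = {k: 0 for k in P}
    for _ in range(n_updates):
        i, u = rng.choice(STATES), rng.choice(ACTIONS)
        j = rng.choices(STATES, weights=P[(i, u)])[0]  # sampled next state
        cost = C[(i, u)] + rng.uniform(-0.5, 0.5)      # noisy immediate cost
        visits[(i, u)] += 1
        alpha = visits[(i, u)] ** -0.7                 # decaying step size
        target = cost + GAMMA * min(q[(j, b)] for b in ACTIONS)
        q[(i, u)] += alpha * (target - q[(i, u)])      # stochastic approximation step
    return q


learned, reference = q_learning(), exact_q()
max_gap = max(abs(learned[k] - reference[k]) for k in P)
```

Each update moves an estimate a small step toward a noisy sample of the DP backup, which is the stochastic approximation view the abstract refers to. `max_gap` measures how far the sampled estimates are from the model-based values after the given number of updates.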