Goto

Collaborating Authors

 Computational Learning Theory


Can neural networks do better than the Vapnik-Chervonenkis bounds?

Neural Information Processing Systems

These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 11m result of the bound. In other cases, we do find the 11m behavior of the VC bound, and in these cases, the numerical prefactor is closely related to prefactor contained in the bound.


Learning Time-varying Concepts

Neural Information Processing Systems

This work extends computational learning theory to situations in which concepts vary over time, e.g., system identification of a time-varying plant. We have extended formal definitions of concepts and learning to provide a framework in which an algorithm can track a concept as it evolves over time. Given this framework and focusing on memory-based algorithms, we have derived some PACstyle sample complexity results that determine, for example, when tracking is feasible. We have also used a similar framework and focused on incremental tracking algorithms for which we have derived some bounds on the mistake or error rates for some specific concept classes. 1 INTRODUCTION The goal of our ongoing research is to extend computational learning theory to include concepts that can change or evolve over time. For example, face recognition is complicated bythe fact that a persons face changes slowly with age and more quickly with changes in make up, hairstyle, or facial hair.


Theory and Application of Minimal-Length Encoding: 1990 AAAI Spring Symposium Report

AI Magazine

This symposium was very successful and was perhaps the most unusual of the spring symposia this year. It brought together for the first time distinguished researchers from many diverse disciplines to discuss and share results on a particular topic of mutual interest. The disciplines included machine learning, computational learning theory, computer vision, pattern recognition, perceptual psychology, statistics, information theory, theoretical computer science, and molecular biology, with the involvement of the latter group having lead to a joint session with the AI and Molecular Biology symposium.


The Strength of Weak Learnability

Classics

This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a Source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing.In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error e.See also: SpringerLinkMachine Learning, 5 (2), 197-227


Learnability and the Vapnik-Chervonenkis dimension

Classics

In this paper we extend Valiant's model to learning concepts defined by regions in Euclidean n-dimensional space E", n 2 1. The general techniques we develop lead to new results in Boolean domains as well. Our methods are based on the pioneering work of Vapnik and Chervonenkis [6 I-631 on the distribution-free convergence of empirical probability estimates and its application to the theory of pattern recognition. These methods provide a unified treatment of some of Valiant's results, and extend previous results of Pearl [50, 5 I] and Devroye and Wagner ([ 151, see also [ 141) along with our results from [lo]. In learning a class C of concepts (e.g., subsets of E") from examples, a single target concept is selected from C and we are given a finite sequence of points in E", each labeled " 1" if it is in the target concept (a positive example) and "0" if it is not (a negative example). This set is called a sample of the target concept. A learning function for C is a function that, given a large enough randoml:y drawn sample of any target concept in C, returns a region in E" (a hypothesis) that is with high probability a good approximation to the target concept. More precisely: (1) We let P be a fixed probability distribution on E" and assume that examples are created by drawing points independently at random according to P. (2) The error of a hypothesis is taken to be the probability that it disagrees with the target concept on a randomly drawn example, that is, the error is just the probability (according to P) of the symmetric difference between the hypothesis and the target concept.



Universal coding, information, prediction, and estimation

Classics

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.


A general learning theory and its application to schema abstraction

Classics

This chapter focuses on ACT system that embodies the extremely powerful thesis that a single set of learning processes underlies the whole gamut of human learning—from children learning their first language by hearing examples of adult speech to adults learning to program a computer by reading textbook instructions. The computer simulation is called ACT. The ACT theory describes its application to research on abstraction of schemas. In ACT, knowledge is divided into two categories: declarative and procedural. The declarative knowledge is represented in a propositional network similar to semantic network representations.