AITopics | Computational Learning Theory

Collaborating Authors

Computational Learning Theory

In computer science, computational learning theory (or just learning theory) is a subfield of Artificial Intelligence devoted to studying the design and analysis of machine learning algorithms (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Can neural networks do better than the Vapnik-Chervonenkis bounds?

Neural Information Processing SystemsApr-6-2023, 19:36:41 GMT

These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 11m result of the bound. In other cases, we do find the 11m behavior of the VC bound, and in these cases, the numerical prefactor is closely related to prefactor contained in the bound.

experiment, prefactor, vapnik-chervonenkis

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Learning Time-varying Concepts

Neural Information Processing SystemsApr-6-2023, 19:34:05 GMT

This work extends computational learning theory to situations in which concepts vary over time, e.g., system identification of a time-varying plant. We have extended formal definitions of concepts and learning to provide a framework in which an algorithm can track a concept as it evolves over time. Given this framework and focusing on memory-based algorithms, we have derived some PAC-style sample complexity results that determine, for example, when tracking is feasible. We have also used a similar framework and focused on incremental tracking algorithms for which we have derived some bounds on the mistake or error rates for some specific concept classes.

algorithm, learning time-varying concept

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.70)

Add feedback

Chaitin-Kolmogorov Complexity and Generalization in Neural Networks

Neural Information Processing SystemsApr-6-2023, 19:32:15 GMT

We present a unified framework for a number of different ways of failing to generalize properly. The complexity of the function computed is therefore increased, and generalization is degraded. We analyze replicated networks, in which a number of identical networks are independently trained on the same data and their results averaged. We conclude that replication almost always results in a decrease in the expected complexity of the network, and that replication therefore increases expected generalization. Simulations confirming the effect are also presented.

chaitin-kolmogorov complexity and generalization, neural network, training data, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.40)

Add feedback

Polynomial Uniform Convergence of Relative Frequencies to Probabilities

Neural Information Processing SystemsApr-6-2023, 19:23:34 GMT

We define the concept of polynomial uniform convergence of relative frequencies to probabilities in the distribution-dependent context. Let Xn {O, l}n, let Pn be a probability distribution on Xn and let Fn C 2X,. be a family of events. The family {(Xn, Pn, Fn)}n l has the property of polynomial uniform convergence if the probability that the maximum difference (over Fn) between the relative frequency and the probabil(cid:173) ity of an event exceed a given positive e be at most 6 (0 6 1), when the sample on which the frequency is evaluated has size polynomial in n,l/e,l/b. Given at-sample (Xl, ...,Xt), let C t)(XI, ...,Xt) be the Vapnik-Chervonenkis dimension of the family {{x}, ...,xtl n f I f E Fn} and M(n, t) the expectation E(C t) It). Applications to distribution-dependent PAC learning are discussed.

polynomial uniform convergence, probability, relative frequency

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.63)

Add feedback

The VC-Dimension versus the Statistical Capacity of Multilayer Networks

Neural Information Processing SystemsApr-6-2023, 19:23:16 GMT

A general relationship is developed between the VC-dimension and the statistical lower epsilon-capacity which shows that the VC-dimension can be lower bounded (in order) by the statistical lower epsilon-capacity of a network trained with random samples. This relationship explains quan(cid:173) titatively how generalization takes place after memorization, and relates the concept of generalization (consistency) with the capacity of the optimal classifier over a class of classifiers with the same structure and the capacity of the Bayesian classifier. Furthermore, it provides a general methodology to evaluate a lower bound for the VC-dimension of feedforward multilayer neural networks. This general methodology is applied to two types of networks which are important for hardware implementations: two layer (N - 2L - 1) net(cid:173) works with binary weights, integer thresholds for the hidden units and zero threshold for the output unit, and a single neuron ((N - 1) net(cid:173) works) with binary weigths and a zero threshold. Here W is the total number of weights of the (N - 2L - 1) networks.

classifier, multilayer network, statistical capacity, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Automatic Capacity Tuning of Very Large VC-Dimension Classifiers

Neural Information Processing SystemsApr-6-2023, 19:13:14 GMT

Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension. This is achieved with a maximum margin algorithm (the Generalized Portrait). The technique is applicable to a wide variety of classifiers, including Perceptrons, polynomial classifiers (sigma-pi unit net(cid:173) works) and Radial Basis Functions. The effective number of parameters is adjusted automatically by the training algorithm to match the complexity of the problem.

automatic capacity tuning, classifier, vc-dimension classifier, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.91)

Add feedback

Developing Population Codes by Minimizing Description Length

Neural Information Processing SystemsApr-6-2023, 18:58:10 GMT

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representa(cid:173) tion that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center ofthis bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.

input vector, minimizing description length, population code, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.41)

Add feedback

Autoencoders, Minimum Description Length and Helmholtz Free Energy

Neural Information Processing SystemsApr-6-2023, 18:57:07 GMT

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distri(cid:173) bution, where the generative weights define the energy of each possible code vector given the input vector.

code vector, minimum description length, vector, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Neural Networks with Quadratic VC Dimension

Neural Information Processing SystemsApr-6-2023, 18:22:10 GMT

This paper shows that neural networks which use continuous acti(cid:173) vation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O( w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid gen(cid:173) eralization are discussed.

neural network, quadratic vc dimension

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.71)

Add feedback

A Convergence Proof for the Softassign Quadratic Assignment Algorithm

Neural Information Processing SystemsApr-6-2023, 18:13:49 GMT

The softassign quadratic assignment algorithm has recently emerged as an effective strategy for a variety of optimization prob(cid:173) lems in pattern recognition and combinatorial optimization. While the effectiveness of the algorithm was demonstrated in thousands of simulations, there was no known proof of convergence. Here, we provide a proof of convergence for the most general form of the algorithm.

convergence proof, softassign quadratic assignment algorithm

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Add feedback