AITopics | Inductive Learning

Collaborating Authors

Inductive Learning

Inductive learning, or induction, is the process of creating generalizations from individual instances.

News Overviews Instructional Materials AI-Alerts Classics

Discontinuous Generalization in Large Committee Machines

Neural Information Processing SystemsApr-6-2023, 18:56:42 GMT

The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights.

committee machine, discontinuous generalization, generalization error, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

A Comparison of Dynamic Reposing and Tangent Distance for Drug Activity Prediction

Neural Information Processing SystemsApr-6-2023, 18:53:07 GMT

In drug activity prediction (as in handwritten character recogni(cid:173) tion), the features extracted to describe a training example depend on the pose (location, orientation, etc.) of the example. In hand(cid:173) written character recognition, one of the best techniques for ad(cid:173) dressing this problem is the tangent distance method of Simard, LeCun and Denker (1993). Jain, et al. (1993a; 1993b) introduce a new technique-dynamic reposing-that also addresses this prob(cid:173) lem. Dynamic reposing iteratively learns a neural network and then reposes the examples in an effort to maximize the predicted out(cid:173) put values. New models are trained and new poses computed until models and poses converge.

cid, drug activity prediction, dynamic reposing and tangent distance, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.65)

Add feedback

Cross-Validation Estimates IMSE

Neural Information Processing SystemsApr-6-2023, 18:47:57 GMT

Let zN denote a given set of N training examples. Let QN(zN) denote the expected squared error (the expectation taken over all possible examples) of the network after being trained on zN. This measures the quality of fit afforded by training on a given set of N examples. Let IMSEN denote the Integrated Mean Squared Error for training sets of size N. Given reasonable assumptions, it is straightforward to show that IMSEN E[Q N(ZN)] - 0"2, where the expectation is now over all training sets of size N, ZN is a random training set of size N, and 0"2 is the noise variance. Let CN CN(zN) denote the "delete-one cross-validation" squared error measure for a network trained on zN.

cross-validation estimate imse, denote, training set, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.60)

Add feedback

Active Learning for Function Approximation

Neural Information Processing SystemsApr-6-2023, 18:42:43 GMT

In function approximation, example-based learning can be formulated as synthesiz(cid:173) ing an approximation function for data sampled from an unknown target function (Poggio and Girosi, 1990). Active learning describes a class of example-based learning paradigms that seeks out new training examples from specific regions of the input space, instead of passively accepting examples from some data generating source.

active learning, target function, unknown target function, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.65)

Add feedback

Active Learning with Statistical Models

Neural Information Processing SystemsApr-6-2023, 18:38:30 GMT

An active learning problem is one where the learner has the ability or need to influence or select its own training data. Many problems of great practical interest allow active learning, and many even require it. We consider the problem of actively learning a mapping X - Y based on a set of training examples {(Xi,Yi)} l' where Xi E X and Yi E Y. The learner is allowed to iteratively select new inputs x (possibly from a constrained set), observe the resulting output y, and incorporate the new examples (x, y) into its training set. The primary question of active learning is how to choose which x to try next. There are many heuristics for choosing x based on intuition, including choosing places where we don't have data, where we perform poorly [Linden and Weber, 1993], where we have low confidence [Thrun and Moller, 1992], where we expect it

learner, learning architecture, statistical model, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.57)

Add feedback

Using Unlabeled Data for Supervised Learning

Neural Information Processing SystemsApr-6-2023, 18:31:28 GMT

Many classification problems have the property that the only costly part of obtaining examples is the class label. This paper suggests a simple method for using distribution information contained in unlabeled examples to augment labeled examples in a supervised training framework. Empirical tests show that the technique de(cid:173) scribed in this paper can significantly improve the accuracy of a supervised learner when the learner is well below its asymptotic accuracy level.

learner, supervised learning, unlabeled data

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)

Add feedback

Family Discovery

Neural Information Processing SystemsApr-6-2023, 18:22:50 GMT

"Family discovery" is the task of learning the dimension and struc(cid:173) ture of a parameterized family of stochastic models. It is espe(cid:173) cially appropriate when the training examples are partitioned into "episodes" of samples drawn from a single parameter value. We present three family discovery algorithms based on surface learn(cid:173) ing and show that they significantly improve performance over two alternatives on a parameterized classification task.

cid, family discovery

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.77)

Add feedback

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Neural Information Processing SystemsApr-6-2023, 18:17:30 GMT

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. More specifi(cid:173) cally, consider an i-layer feed-forward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A. The misclassification probability con(cid:173) verges to an error estimate (that is closely related to squared error on the training set) at rate O((cA)l(l 1)/2J(log n)jm) ignoring log factors, where m is the number of training patterns, n is the input dimension, and c is a constant. This may explain the gen(cid:173) eralization performance of neural networks, particularly when the number of training examples is considerably smaller than the num(cid:173) ber of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training.

neural network, training pattern, valid generalization, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.63)

Add feedback

Learning Temporally Persistent Hierarchical Representations

Neural Information Processing SystemsApr-6-2023, 18:11:47 GMT

A biologically motivated model of cortical self-organization is pro(cid:173) posed. Context is combined with bottom-up information via a maximum likelihood cost function. Clusters of one or more units are modulated by a common contextual gating Signal; they thereby organize themselves into mutually supportive predictors of abstract contextual features. The model was tested in its ability to discover viewpoint-invariant classes on a set of real image sequences of cen(cid:173) tered, gradually rotating faces. It performed considerably better than supervised back-propagation at generalizing to novel views from a small number of training examples. The importance of context effects l in perception has been demonstrated in many domains.

cid, information, learning temporally persistent hierarchical representation, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Add feedback

Generalization in Decision Trees and DNF: Does Size Matter?

Neural Information Processing SystemsApr-6-2023, 17:48:47 GMT

Recent theoretical results for pattern classification with thresh(cid:173) olded real-valued functions (such as support vector machines, sig(cid:173) moid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions). For example, we show that with high probabil(cid:173) ity any decision tree of depth no more than d that is consistent with m training examples has misclassification probability no more than o ( ( (Neff VCdim(U) log2 m log d)) 1/2), where U is the class of node decision functions, and Neff::; N can be thought of as the effective number of leaves (it becomes small as the distribution on the leaves induced by the training data gets far from uniform). This bound is qualitatively different from the VC bound and can be considerably smaller. We use the same technique to give similar results for DNF formulae.

decision tree and dnf, generalization, size matter, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)

Add feedback