From Margin to Sparsity

Neural Information Processing Systems

We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself. 1 Introduction In the last few years there has been a large controversy about the significance of the attained margin, i.e. the smallest real-valued output of a classifier before thresholding, as an indicator of generalisation performance. Results in the VC, PAC and luckiness frameworks seem to indicate that a large margin is a prerequisite for small generalisation error bounds (see [14, 12]).
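The mistake-bound-as-sparsity reading can be illustrated with a small sketch of the kernel perceptron (not the paper's analysis): every mistake adds one example to the kernel expansion, so the number of examples the final classifier depends on is at most the number of mistakes, which Novikoff's theorem bounds in terms of the margin. The RBF kernel, the toy data, and the helper names below are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel between two points."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    """Run the kernel perceptron; return dual coefficients and the mistake count."""
    n = len(X)
    alpha = np.zeros(n)              # nonzero entries play the role of support vectors
    mistakes = 0
    for _ in range(epochs):
        errors = 0
        for i in range(n):
            # Real-valued output before thresholding (the "margin" discussed above)
            f_i = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n) if alpha[j] > 0)
            if y[i] * f_i <= 0:      # mistake: add this example to the expansion
                alpha[i] += 1.0
                mistakes += 1
                errors += 1
        if errors == 0:              # separable data: Novikoff guarantees convergence
            break
    return alpha, mistakes

# Toy usage on two separable clusters labelled +/-1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
alpha, mistakes = kernel_perceptron(X, y)
print("mistakes:", mistakes, "examples in the expansion:", int((alpha > 0).sum()))
```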


A Silicon Primitive for Competitive Learning

Neural Information Processing Systems

Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, which we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs a similarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task. 1 Introduction Competitive learning is a family of neural learning algorithms that has proved useful for training many classification and clustering networks [1]. In these networks, a neuron's synaptic weight vector typically represents a tight cluster of data points.
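As a point of reference for the hardware, here is a hedged software sketch of the competitive learning dynamics the circuit realizes in analog VLSI: a hard winner-take-all similarity computation followed by local adaptation of only the winning weight vector. The learning rate, number of units, and toy data are assumptions of this sketch.

```python
import numpy as np

def competitive_learning(X, n_units=3, lr=0.05, epochs=20, seed=0):
    """Each unit's weight vector drifts toward the cluster of inputs it wins."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)  # init at data points
    for _ in range(epochs):
        for x in rng.permutation(X):
            # Similarity computation: the winner is the closest weight vector
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            # Local adaptation: only the winner moves toward the input
            W[winner] += lr * (x - W[winner])
    return W

# Toy clustering task: three Gaussian clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in [(-2, 0), (0, 2), (2, 0)]])
print(competitive_learning(X))
```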



Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

Neural Information Processing Systems

Low-dimensional representations are key to solving problems in high-level vision, such as face compression and recognition. Factorial coding strategies for reducing the redundancy present in natural images on the basis of their second-order statistics have been successful in accounting for both psychophysical and neurophysiological properties of early vision. Class-specific representations are presumably formed later, at the higher-level stages of cortical processing. Here we show that when retinotopic factorial codes are derived for ensembles of natural objects, such as human faces, not only redundancy, but also dimensionality is reduced. We also show that objects are built from parts in a non-Gaussian fashion which allows these local-feature codes to have dimensionalities that are substantially lower than the respective Nyquist sampling rates.
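Not the paper's factorial-coding procedure, but a hedged illustration of the dimensionality claim: principal component analysis of an object ensemble shows how few components can account for most of the variance relative to the pixel (Nyquist) dimensionality. The synthetic "parts-based" data and the 95% variance threshold are assumptions of this sketch.

```python
import numpy as np

def effective_dimensionality(images, variance_kept=0.95):
    """Number of principal components needed to retain a fraction of the variance."""
    X = images.reshape(len(images), -1).astype(float)
    X -= X.mean(axis=0)                        # center the ensemble
    s = np.linalg.svd(X, compute_uv=False)     # singular values give principal variances
    var = s ** 2 / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(var), variance_kept) + 1)

# Toy ensemble: 200 "images" built from 10 latent parts plus pixel noise,
# so the effective dimensionality is far below the 32x32 = 1024 pixel dimension.
rng = np.random.default_rng(0)
parts = rng.normal(size=(10, 32 * 32))
images = rng.normal(size=(200, 10)) @ parts + 0.1 * rng.normal(size=(200, 32 * 32))
print("effective dimensionality:", effective_dimensionality(images.reshape(200, 32, 32)))
```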


A Gradient-Based Boosting Algorithm for Regression Problems

Neural Information Processing Systems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let g: X → Y be the function to be learned, where the label set Y is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to n training examples, {(x_1, y_1), ..., (x_n, y_n)}, drawn randomly from X × Y according to distribution D; it outputs a hypothesis f:
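As a hedged sketch of the general idea rather than the paper's specific algorithm, the following implements gradient-style boosting for regression: each stage fits a weak learner to the current residuals (the negative gradient of squared loss) and adds it to the ensemble. The decision-stump learner, learning rate, and toy data are assumptions of this sketch.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split constant predictor (a depth-1 regression tree) for residuals r."""
    best = (np.inf, None)
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    t, lo, hi = best[1]
    return lambda z, t=t, lo=lo, hi=hi: np.where(z <= t, lo, hi)

def boost_regression(x, y, stages=50, lr=0.1):
    """Stagewise additive model: F_m(x) = F_{m-1}(x) + lr * h_m(x)."""
    pred = np.full_like(y, y.mean(), dtype=float)
    learners = []
    for _ in range(stages):
        h = fit_stump(x, y - pred)     # fit the residuals (negative gradient of squared loss)
        pred += lr * h(x)
        learners.append(h)
    return lambda z: y.mean() + lr * sum(h(z) for h in learners)

x = np.linspace(0, 6, 200)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=200)
F = boost_regression(x, y)
print("train MSE:", float(np.mean((F(x) - y) ** 2)))
```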


Model Complexity, Goodness of Fit and Diminishing Returns

Neural Information Processing Systems

Igor V. Cadez, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Padhraic Smyth, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Abstract We investigate a general characteristic of the tradeoff in learning problems between goodness-of-fit and model complexity. Specifically, we characterize a general class of learning problems where the goodness-of-fit function can be shown to be convex within first order as a function of model complexity. This general property of "diminishing returns" is illustrated on a number of real data sets and learning problems, including finite mixture modeling and multivariate linear regression. 1 Introduction, Motivation, and Related Work Assume we have a data set D. Such learning tasks can typically be characterized by the existence of a model and a loss function. The complexity k is the number of Markov models being used in the mixture (see Cadez et al. (2000) for further details on the model and the data set). The empirical curve has a distinctly concave appearance, with large relative gains in fit for low complexity models and much more modest relative gains for high complexity models.
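A hedged illustration of the diminishing-returns curve, using Gaussian mixtures on synthetic data in place of the paper's Markov-mixture models: the per-point log-likelihood rises steeply for small k and flattens once k exceeds the number of true clusters. scikit-learn and the toy data are assumptions of this sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Data drawn from 3 true clusters; models with k > 3 buy little extra fit.
X = np.vstack([rng.normal(c, 0.5, (300, 2)) for c in [(-3, 0), (0, 3), (3, 0)]])

for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # Average per-sample log-likelihood: the goodness-of-fit axis of the tradeoff
    print(f"k={k}  log-likelihood per point = {gmm.score(X):.3f}")
```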


A Neural Probabilistic Language Model

Neural Information Processing Systems

A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model. 1 Introduction A fundamental problem that makes language modeling and other learning problems difficult is the curse of dimensionality. It is particularly obvious in the case when one wants to model the joint distribution between many discrete random variables (such as words in a sentence, or discrete attributes in a data-mining task).
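A minimal sketch of the model family described above, not the paper's exact architecture or training setup: shared word embeddings for a fixed context window feed a small network whose softmax output gives the probability of the next word. The vocabulary size, layer widths, and use of PyTorch are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class NPLMSketch(nn.Module):
    def __init__(self, vocab_size=10000, context=3, embed_dim=60, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # (1) distributed word representation
        self.hidden = nn.Linear(context * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)           # (2) probability of the next word

    def forward(self, context_ids):                        # context_ids: (batch, context)
        e = self.embed(context_ids).flatten(start_dim=1)   # concatenate context embeddings
        return torch.log_softmax(self.out(torch.tanh(self.hidden(e))), dim=-1)

# Usage: log-probabilities of every word following a 3-word context.
model = NPLMSketch()
log_probs = model(torch.randint(0, 10000, (4, 3)))         # batch of 4 contexts
print(log_probs.shape)                                     # torch.Size([4, 10000])
```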


Distribution of Mutual Information

arXiv.org Artificial Intelligence

The mutual information of two random variables i and j with joint probabilities t_ij is commonly used in learning Bayesian nets as well as in many other fields. The chances t_ij are usually estimated by the empirical sampling frequency n_ij/n leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(t) comprising prior information about t. From the prior p(t) one can compute the posterior p(t|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
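A hedged numerical counterpart to the closed-form approximations derived in the paper: under a Dirichlet prior, the posterior p(t|n) is again Dirichlet, so one can sample t, evaluate I(t) on each draw, and summarize the resulting p(I|n). The uniform prior (one pseudo-count per cell) and the toy contingency table are assumptions of this sketch.

```python
import numpy as np

def mutual_information(t):
    """I(t) = sum_ij t_ij log(t_ij / (t_i+ t_+j)), with 0 log 0 = 0."""
    ti, tj = t.sum(axis=1, keepdims=True), t.sum(axis=0, keepdims=True)
    mask = t > 0
    return float(np.sum(t[mask] * np.log(t[mask] / (ti @ tj)[mask])))

def posterior_I_samples(counts, alpha=1.0, draws=10000, seed=0):
    """Draws from p(I|n) under a Dirichlet(counts + alpha) posterior over t."""
    rng = np.random.default_rng(seed)
    post = (counts + alpha).ravel()
    samples = rng.dirichlet(post, size=draws).reshape(draws, *counts.shape)
    return np.array([mutual_information(t) for t in samples])

n = np.array([[30, 10], [10, 30]])           # toy 2x2 contingency table n_ij
I_hat = mutual_information(n / n.sum())      # point estimate I(n_ij / n)
I_post = posterior_I_samples(n)
print(f"point estimate {I_hat:.4f}, posterior mean {I_post.mean():.4f}, "
      f"sd {I_post.std():.4f}")
```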


Agent-Centered Search

AI Magazine

In this article, I describe agent-centered search (also called real-time search or local search) and illustrate this planning paradigm with examples. Agent-centered search methods interleave planning and plan execution and restrict planning to the part of the domain around the current state of the agent, for example, the current location of a mobile robot or the current board position of a game. These methods can execute actions in the presence of time constraints and often have a small sum of planning and execution cost, both because they trade off planning and execution cost and because they allow agents to gather information early in nondeterministic domains, which reduces the amount of planning they have to perform for unencountered situations. These advantages become important as more intelligent systems are interfaced with the world and have to operate autonomously in complex environments. Agent-centered search methods have been applied to a variety of domains, including traditional search, STRIPS-type planning, moving-target search, planning with totally and partially observable Markov decision process models, reinforcement learning, constraint satisfaction, and robot navigation. I discuss the design and properties of several agent-centered search methods, focusing on robot exploration and localization.
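A minimal sketch of one method in the LRTA*-style family of agent-centered search: the agent plans only one action ahead of its current state, updates that state's heuristic value, executes the chosen action, and repeats. The grid world, Manhattan heuristic, and unit action costs are illustrative assumptions rather than a specific system from the article.

```python
def lrta_star(size, walls, start, goal, max_steps=1000):
    """States are (row, col) cells on a size[0] x size[1] grid; unit action costs."""
    rows, cols = size
    h = {}                                                # learned heuristic values
    manhattan = lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    heuristic = lambda s: h.get(s, manhattan(s))

    def neighbors(s):
        cand = [(s[0] + 1, s[1]), (s[0] - 1, s[1]), (s[0], s[1] + 1), (s[0], s[1] - 1)]
        return [t for t in cand if 0 <= t[0] < rows and 0 <= t[1] < cols and t not in walls]

    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path
        # Local planning: look only one action ahead of the agent's current state
        best = min(neighbors(s), key=lambda t: 1 + heuristic(t))
        h[s] = max(heuristic(s), 1 + heuristic(best))     # value update makes h more informed
        s = best                                          # plan execution: commit to the action
        path.append(s)
    return path

# Usage: a 5x5 map with a wall the agent must learn to route around.
walls = {(1, c) for c in range(4)}
print(lrta_star((5, 5), walls, start=(0, 0), goal=(3, 0)))
```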


It Does So: Review of The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology

AI Magazine

In The Mind Doesn't Work That Way, Fodor dubs the synthesis of computationalism, nativism, massive modularity, and adaptationism the only cognitive science we've got, yet argues that it is the wrong paradigm for studying much of the mind because, he claims, classical computation doesn't work for abductive inferences. Fodor defines the frame problem as the problem of "[h]ow to make abductive inferences" without consulting the parts of a knowledge base antecedently deemed to be irrelevant to the inference. Yet consider just one case from research on analogy: information antecedently deemed irrelevant to the structure of the atom turned out to be relevant.