From Margin to Sparsity

Neural Information Processing Systems

We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself. 1 Introduction In the last few years there has been a large controversy about the significance of the attained margin, i.e. the smallest real-valued output of a classifier before thresholding, as an indicator of generalisation performance. Results in the VC, PAC and luckiness frameworks seem to indicate that a large margin is a prerequisite for small generalisation error bounds (see [14, 12]).
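The mistake-bound-as-sparsity reading can be illustrated with a small sketch of the kernel perceptron (not the paper's analysis): every mistake adds one example to the kernel expansion, so the number of examples the final classifier depends on is at most the number of mistakes, which Novikoff's theorem bounds in terms of the margin. The RBF kernel, the toy data, and the helper names below are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel between two points."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    """Run the kernel perceptron; return dual coefficients and the mistake count."""
    n = len(X)
    alpha = np.zeros(n)              # nonzero entries play the role of support vectors
    mistakes = 0
    for _ in range(epochs):
        errors = 0
        for i in range(n):
            # Real-valued output before thresholding (the "margin" discussed above)
            f_i = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n) if alpha[j] > 0)
            if y[i] * f_i <= 0:      # mistake: add this example to the expansion
                alpha[i] += 1.0
                mistakes += 1
                errors += 1
        if errors == 0:              # separable data: Novikoff guarantees convergence
            break
    return alpha, mistakes

# Toy usage on two separable clusters labelled +/-1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
alpha, mistakes = kernel_perceptron(X, y)
print("mistakes:", mistakes, "examples in the expansion:", int((alpha > 0).sum()))
```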


A Silicon Primitive for Competitive Learning

Neural Information Processing Systems

Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, which we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs a similarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task. 1 Introduction Competitive learning is a family of neural learning algorithms that has proved useful for training many classification and clustering networks [1]. In these networks, a neuron's synaptic weight vector typically represents a tight cluster of data points.
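As a point of reference for the hardware, here is a hedged software sketch of the competitive learning dynamics the circuit realizes in analog VLSI: a hard winner-take-all similarity computation followed by local adaptation of only the winning weight vector. The learning rate, number of units, and toy data are assumptions of this sketch.

```python
import numpy as np

def competitive_learning(X, n_units=3, lr=0.05, epochs=20, seed=0):
    """Each unit's weight vector drifts toward the cluster of inputs it wins."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)  # init at data points
    for _ in range(epochs):
        for x in rng.permutation(X):
            # Similarity computation: the winner is the closest weight vector
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            # Local adaptation: only the winner moves toward the input
            W[winner] += lr * (x - W[winner])
    return W

# Toy clustering task: three Gaussian clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in [(-2, 0), (0, 2), (2, 0)]])
print(competitive_learning(X))
```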



Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

Neural Information Processing Systems

Low-dimensional representations are key to solving problems in high-level vision, such as face compression and recognition. Factorial coding strategies for reducing the redundancy present in natural images on the basis of their second-order statistics have been successful in accounting for both psychophysical and neurophysiological properties of early vision. Class-specific representations are presumably formed later, at the higher-level stages of cortical processing. Here we show that when retinotopic factorial codes are derived for ensembles of natural objects, such as human faces, not only redundancy, but also dimensionality is reduced. We also show that objects are built from parts in a non-Gaussian fashion which allows these local-feature codes to have dimensionalities that are substantially lower than the respective Nyquist sampling rates.
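Not the paper's factorial-coding procedure, but a hedged illustration of the dimensionality claim: principal component analysis of an object ensemble shows how few components can account for most of the variance relative to the pixel (Nyquist) dimensionality. The synthetic "parts-based" data and the 95% variance threshold are assumptions of this sketch.

```python
import numpy as np

def effective_dimensionality(images, variance_kept=0.95):
    """Number of principal components needed to retain a fraction of the variance."""
    X = images.reshape(len(images), -1).astype(float)
    X -= X.mean(axis=0)                        # center the ensemble
    s = np.linalg.svd(X, compute_uv=False)     # singular values give principal variances
    var = s ** 2 / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(var), variance_kept) + 1)

# Toy ensemble: 200 "images" built from 10 latent parts plus pixel noise,
# so the effective dimensionality is far below the 32x32 = 1024 pixel dimension.
rng = np.random.default_rng(0)
parts = rng.normal(size=(10, 32 * 32))
images = rng.normal(size=(200, 10)) @ parts + 0.1 * rng.normal(size=(200, 32 * 32))
print("effective dimensionality:", effective_dimensionality(images.reshape(200, 32, 32)))
```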


A Gradient-Based Boosting Algorithm for Regression Problems

Neural Information Processing Systems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let g: X → Y be the function to be learned, where the label set Y is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to n training examples, {(x_1, y_1), ..., (x_n, y_n)}, drawn randomly from X × Y according to distribution D; it outputs a hypothesis f:
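As a hedged sketch of the general idea rather than the paper's specific algorithm, the following implements gradient-style boosting for regression: each stage fits a weak learner to the current residuals (the negative gradient of squared loss) and adds it to the ensemble. The decision-stump learner, learning rate, and toy data are assumptions of this sketch.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split constant predictor (a depth-1 regression tree) for residuals r."""
    best = (np.inf, None)
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    t, lo, hi = best[1]
    return lambda z, t=t, lo=lo, hi=hi: np.where(z <= t, lo, hi)

def boost_regression(x, y, stages=50, lr=0.1):
    """Stagewise additive model: F_m(x) = F_{m-1}(x) + lr * h_m(x)."""
    pred = np.full_like(y, y.mean(), dtype=float)
    learners = []
    for _ in range(stages):
        h = fit_stump(x, y - pred)     # fit the residuals (negative gradient of squared loss)
        pred += lr * h(x)
        learners.append(h)
    return lambda z: y.mean() + lr * sum(h(z) for h in learners)

x = np.linspace(0, 6, 200)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=200)
F = boost_regression(x, y)
print("train MSE:", float(np.mean((F(x) - y) ** 2)))
```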


Model Complexity, Goodness of Fit and Diminishing Returns

Neural Information Processing Systems

Igor V. Cadez, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Padhraic Smyth, Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. Abstract We investigate a general characteristic of the tradeoff in learning problems between goodness-of-fit and model complexity. Specifically, we characterize a general class of learning problems where the goodness-of-fit function can be shown to be convex within first order as a function of model complexity. This general property of "diminishing returns" is illustrated on a number of real data sets and learning problems, including finite mixture modeling and multivariate linear regression. 1 Introduction, Motivation, and Related Work Assume we have a data set D. Such learning tasks can typically be characterized by the existence of a model and a loss function. The complexity k is the number of Markov models being used in the mixture (see Cadez et al. (2000) for further details on the model and the data set). The empirical curve has a distinctly concave appearance, with large relative gains in fit for low complexity models and much more modest relative gains for high complexity models.
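A hedged illustration of the diminishing-returns curve, using Gaussian mixtures on synthetic data in place of the paper's Markov-mixture models: the per-point log-likelihood rises steeply for small k and flattens once k exceeds the number of true clusters. scikit-learn and the toy data are assumptions of this sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Data drawn from 3 true clusters; models with k > 3 buy little extra fit.
X = np.vstack([rng.normal(c, 0.5, (300, 2)) for c in [(-3, 0), (0, 3), (3, 0)]])

for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # Average per-sample log-likelihood: the goodness-of-fit axis of the tradeoff
    print(f"k={k}  log-likelihood per point = {gmm.score(X):.3f}")
```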


A Neural Probabilistic Language Model

Neural Information Processing Systems

A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model. 1 Introduction A fundamental problem that makes language modeling and other learning problems difficult is the curse of dimensionality. It is particularly obvious in the case when one wants to model the joint distribution between many discrete random variables (such as words in a sentence, or discrete attributes in a data-mining task).
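A minimal sketch of the model family described above, not the paper's exact architecture or training setup: shared word embeddings for a fixed context window feed a small network whose softmax output gives the probability of the next word. The vocabulary size, layer widths, and use of PyTorch are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class NPLMSketch(nn.Module):
    def __init__(self, vocab_size=10000, context=3, embed_dim=60, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # (1) distributed word representation
        self.hidden = nn.Linear(context * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)           # (2) probability of the next word

    def forward(self, context_ids):                        # context_ids: (batch, context)
        e = self.embed(context_ids).flatten(start_dim=1)   # concatenate context embeddings
        return torch.log_softmax(self.out(torch.tanh(self.hidden(e))), dim=-1)

# Usage: log-probabilities of every word following a 3-word context.
model = NPLMSketch()
log_probs = model(torch.randint(0, 10000, (4, 3)))         # batch of 4 contexts
print(log_probs.shape)                                     # torch.Size([4, 10000])
```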


Distribution of Mutual Information

arXiv.org Artificial Intelligence

The mutual information of two random variables i and j with joint probabilities t_ij is commonly used in learning Bayesian nets as well as in many other fields. The chances t_ij are usually estimated by the empirical sampling frequency n_ij/n leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(t) comprising prior information about t. From the prior p(t) one can compute the posterior p(t|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
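A hedged numerical counterpart to the closed-form approximations derived in the paper: under a Dirichlet prior, the posterior p(t|n) is again Dirichlet, so one can sample t, evaluate I(t) on each draw, and summarize the resulting p(I|n). The uniform prior (one pseudo-count per cell) and the toy contingency table are assumptions of this sketch.

```python
import numpy as np

def mutual_information(t):
    """I(t) = sum_ij t_ij log(t_ij / (t_i+ t_+j)), with 0 log 0 = 0."""
    ti, tj = t.sum(axis=1, keepdims=True), t.sum(axis=0, keepdims=True)
    mask = t > 0
    return float(np.sum(t[mask] * np.log(t[mask] / (ti @ tj)[mask])))

def posterior_I_samples(counts, alpha=1.0, draws=10000, seed=0):
    """Draws from p(I|n) under a Dirichlet(counts + alpha) posterior over t."""
    rng = np.random.default_rng(seed)
    post = (counts + alpha).ravel()
    samples = rng.dirichlet(post, size=draws).reshape(draws, *counts.shape)
    return np.array([mutual_information(t) for t in samples])

n = np.array([[30, 10], [10, 30]])           # toy 2x2 contingency table n_ij
I_hat = mutual_information(n / n.sum())      # point estimate I(n_ij / n)
I_post = posterior_I_samples(n)
print(f"point estimate {I_hat:.4f}, posterior mean {I_post.mean():.4f}, "
      f"sd {I_post.std():.4f}")
```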


Agent-Centered Search

AI Magazine

In this article, I describe agent-centered search (also called real-time search or local search) and illustrate this planning paradigm with examples. Agent-centered search methods interleave planning and plan execution and restrict planning to the part of the domain around the current state of the agent, for example, the current location of a mobile robot or the current board position of a game. These methods can execute actions in the presence of time constraints and often have a small sum of planning and execution cost, both because they trade off planning and execution cost and because they allow agents to gather information early in nondeterministic domains, which reduces the amount of planning they have to perform for unencountered situations. These advantages become important as more intelligent systems are interfaced with the world and have to operate autonomously in complex environments. Agent-centered search methods have been applied to a variety of domains, including traditional search, STRIPS-type planning, moving-target search, planning with totally and partially observable Markov decision process models, reinforcement learning, constraint satisfaction, and robot navigation. I discuss the design and properties of several agent-centered search methods, focusing on robot exploration and localization.
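A minimal sketch of one method in the LRTA*-style family of agent-centered search: the agent plans only one action ahead of its current state, updates that state's heuristic value, executes the chosen action, and repeats. The grid world, Manhattan heuristic, and unit action costs are illustrative assumptions rather than a specific system from the article.

```python
def lrta_star(size, walls, start, goal, max_steps=1000):
    """States are (row, col) cells on a size[0] x size[1] grid; unit action costs."""
    rows, cols = size
    h = {}                                                # learned heuristic values
    manhattan = lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    heuristic = lambda s: h.get(s, manhattan(s))

    def neighbors(s):
        cand = [(s[0] + 1, s[1]), (s[0] - 1, s[1]), (s[0], s[1] + 1), (s[0], s[1] - 1)]
        return [t for t in cand if 0 <= t[0] < rows and 0 <= t[1] < cols and t not in walls]

    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path
        # Local planning: look only one action ahead of the agent's current state
        best = min(neighbors(s), key=lambda t: 1 + heuristic(t))
        h[s] = max(heuristic(s), 1 + heuristic(best))     # value update makes h more informed
        s = best                                          # plan execution: commit to the action
        path.append(s)
    return path

# Usage: a 5x5 map with a wall the agent must learn to route around.
walls = {(1, c) for c in range(4)}
print(lrta_star((5, 5), walls, start=(0, 0), goal=(3, 0)))
```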


It Does So: Review of The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology

AI Magazine

In The Mind Doesn't Work That Way, Fodor dubs the synthesis of computationalism, nativism, massive modularity, and adaptationism the only cognitive science we've got, yet argues that it is the wrong paradigm for studying much of the mind because, he claims, classical computation doesn't work for abductive inferences. Fodor defines the frame problem as the problem of "[h]ow to make abductive inferences" without consulting the parts of a knowledge base antecedently deemed to be irrelevant to the inference. Yet consider just one case from research on analogy: information antecedently deemed irrelevant to the structure of the atom turned out to be relevant.