Technology
Learning Curves for Gaussian Processes
I consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. A simple expression for the generalization error in terms of the eigenvalue decomposition of the covariance function is derived, and used as the starting point for several approximation schemes. I identify where these become exact, and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth. 1 INTRODUCTION: GAUSSIAN PROCESSES Within the neural networks community, there has in the last few years been a good deal of excitement about the use of Gaussian processes as an alternative to feedforward networks [1]. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference, at least in the case of regression that I will consider, is relatively straightforward. One crucial question for applications is then how 'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance.
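To make the eigenvalue connection concrete, here is a minimal numpy sketch of one simple eigenvalue-based learning-curve approximation of the general form eps(n) ~ sum_i lambda_i sigma^2 / (sigma^2 + n lambda_i). The RBF kernel, length scale, noise level, and grid-based eigenvalue estimate are illustrative assumptions, not the paper's exact approximation schemes.

```python
import numpy as np

# Sketch: approximate a GP learning curve from kernel eigenvalues.
# The eigenvalues of the covariance function are estimated numerically
# from a Gram matrix on a dense input grid; `length_scale` and
# `noise_var` are illustrative choices.

def rbf_kernel(x, length_scale=0.2):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def learning_curve(n_train, eigvals, noise_var=0.1):
    # One simple eigenvalue-based approximation:
    # eps(n) ~ sum_i lambda_i * sigma^2 / (sigma^2 + n * lambda_i)
    return np.sum(eigvals * noise_var / (noise_var + n_train * eigvals))

x = np.linspace(0.0, 1.0, 500)            # inputs uniform on [0, 1]
K = rbf_kernel(x)
eigvals = np.linalg.eigvalsh(K) / len(x)  # approximate process eigenvalues
eigvals = eigvals[eigvals > 0]

for n in (1, 10, 100, 1000):
    print(n, learning_curve(n, eigvals))
```

The generalization error decays as the training set size n swamps each eigenvalue in turn, which is the qualitative behaviour the eigenvalue decomposition makes visible.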
Learning Instance-Independent Value Functions to Enhance Local Search
Moll, Robert, Barto, Andrew G., Perkins, Theodore J., Sutton, Richard S.
Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to low-cost solutions better than can the original cost function. We describe a reinforcement learning method for enhancing local search that combines aspects of previous work by Zhang and Dietterich (1995) and Boyan and Moore (1997, Boyan 1998). In an off-line learning phase, a value function is learned that is useful for guiding search for multiple problem sizes and instances. We illustrate our technique by developing several such functions for the Dial-A-Ride Problem. Our learning-enhanced local search algorithm exhibits an improvement of more than 30% over a standard local search algorithm.
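A minimal sketch of the idea, assuming a generic neighborhood structure: local search ranks candidate moves by a learned evaluation function V rather than by the raw cost function. The `neighbors` interface, the linear features, and the toy usage are placeholders, not the paper's Dial-A-Ride formulation.

```python
# Sketch of learning-enhanced local search: neighbors are ranked by a
# learned evaluation function V (predicting the cost of the solution
# reachable from a state) instead of the raw cost.

def local_search(start, neighbors, value_fn, max_steps=1000):
    state = start
    for _ in range(max_steps):
        candidates = neighbors(state)
        if not candidates:
            break
        best = min(candidates, key=value_fn)
        if value_fn(best) >= value_fn(state):
            break                        # local optimum under V
        state = best
    return state

def make_value_fn(weights, features):
    # Linear value function V(s) = w . phi(s), as might be learned
    # off-line by TD-style updates over many instances and sizes.
    return lambda s: sum(w * f for w, f in zip(weights, features(s)))

# Toy usage: minimize |s| over the integers, with V(s) = |s| standing in
# for a learned evaluation function.
print(local_search(17, lambda s: [s - 1, s + 1], abs))
```

Because V is trained across instances and sizes, the same function can be reused on new problems, which is what makes the off-line learning phase pay off.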
Inference in Multilayer Networks via Large Deviation Bounds
Kearns, Michael J., Saul, Lawrence K.
Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirected probabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et al, 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks. In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et al, 1997).
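A sketch of the large-deviation idea in its simplest setting, assuming a two-layer sigmoid belief network: a child node fires with probability sigmoid of a weighted sum of independent Bernoulli parents, and Hoeffding's inequality bounds how far that sum strays from its mean, yielding interval bounds on the child's marginal. The weights, priors, and tail mass delta below are illustrative, not the paper's exact construction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_bounds(weights, priors, delta=0.05):
    # Parents x_j ~ Bernoulli(p_j) independent; input = sum_j w_j x_j.
    mean_input = sum(w * p for w, p in zip(weights, priors))
    # Hoeffding: P(|input - mean| >= eps) <= 2 exp(-2 eps^2 / sum w^2);
    # solve for eps at tail mass delta.
    eps = math.sqrt(0.5 * sum(w * w for w in weights) * math.log(2.0 / delta))
    lo = (1 - delta) * sigmoid(mean_input - eps)           # typical-case lower bound
    hi = (1 - delta) * sigmoid(mean_input + eps) + delta   # worst case on the tail event
    return lo, hi

print(marginal_bounds([0.1] * 50, [0.5] * 50))
```

The point is that the bounds tighten as the network grows, provided the individual weights shrink, so large multilayer networks become more, not less, amenable to this kind of analysis.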
Risk Sensitive Reinforcement Learning
Neuneier, Ralph, Mihatsch, Oliver
A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicates that much real world data exhibits the low dimensional manifold structure assumed by these models [2, 3].
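A minimal numpy sketch of the PCA step described above: the low-dimensional linear manifold is spanned by the eigenvectors of the data correlation matrix with the largest eigenvalues. The synthetic data, with two hidden continuous units, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))              # 2 hidden continuous units
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.1 * rng.normal(size=(1000, 10))

centered = data - data.mean(axis=0)
corr = centered.T @ centered / len(centered)     # (co)variance matrix of the data
eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order

components = eigvecs[:, -2:]                     # top-2 characteristic directions
projection = centered @ components               # coordinates on the linear manifold
print(eigvals[::-1][:4])                         # two leading eigenvalues dominate
```

With two true latent dimensions, the top two eigenvalues carry almost all the variance, which is exactly the low-dimensional manifold structure the abstract appeals to.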
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning
Munos, Rémi, Moore, Andrew W.
In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called the barycentric interpolators. We establish sufficient conditions under which an RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement functions.
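A sketch of a barycentric interpolator in its simplest, one-dimensional form: the value at x is a convex combination V(x) = sum_i p_i(x) V(x_i), with barycentric coordinates p_i(x) >= 0 summing to 1. On a 1-D grid this reduces to linear interpolation between the two surrounding grid points; the grid and sample values below are illustrative.

```python
import numpy as np

def barycentric_value(x, grid, values):
    j = np.searchsorted(grid, x)
    j = np.clip(j, 1, len(grid) - 1)
    x0, x1 = grid[j - 1], grid[j]
    p1 = (x - x0) / (x1 - x0)        # barycentric coordinate of the right point
    p0 = 1.0 - p1                    # and of the left point; p0 + p1 = 1
    return p0 * values[j - 1] + p1 * values[j]

grid = np.linspace(0.0, 1.0, 11)
values = grid ** 2                   # stand-in for value-function samples
print(barycentric_value(0.37, grid, values))
```

Because the interpolated value is always a convex combination of stored values, the approximation operator is non-expansive, which is the kind of property that convergence conditions for approximate value iteration typically rely on.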
DTs: Dynamic Trees
Williams, Christopher K. I., Adams, Nicholas J.
A dynamic tree model specifies a prior over a large number of trees, each one of which is a tree-structured belief net (TSBN). Our aim is to retain the advantages of tree-structured belief networks, namely the hierarchical structure of the model and (in part) the efficient inference algorithms, while avoiding the "blocky" artifacts that derive from a single, fixed TSBN structure. One use for DTs is as prior models over labellings for image segmentation problems.
Neural Networks for Density Estimation
Magdon-Ismail, Malik, Atiya, Amir F.
Although quantities such as the mean, the variance, and possibly higher order moments of a random variable have often been sufficient to characterize a particular problem, the quest for higher modeling accuracy, and for more realistic assumptions, drives us towards modeling the available random variables using their probability density. This of course leads us to the problem of density estimation (see [6]). The most common approach for density estimation is the nonparametric approach, where the density is determined according to a formula involving the data points available. The most common nonparametric methods are the kernel density estimator, also known as the Parzen window estimator [4], and the k-nearest neighbor technique [1]. Nonparametric density estimation belongs to the class of ill-posed problems in the sense that small changes in the data can lead to large changes in the estimated density.
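A sketch of the two nonparametric estimators mentioned above, on illustrative 1-D data: the Parzen window (kernel) estimator with a Gaussian kernel, and the k-nearest-neighbor estimator. The bandwidth h and neighbor count k are illustrative choices, not values from the paper.

```python
import numpy as np

def parzen_density(x, data, h=0.3):
    # p(x) = (1/n) * sum_i K_h(x - x_i), Gaussian kernel of width h
    z = (x - data) / h
    return np.mean(np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi)))

def knn_density(x, data, k=10):
    # p(x) ~ k / (n * volume of the smallest interval holding k neighbors)
    r = np.sort(np.abs(data - x))[k - 1]
    return k / (len(data) * 2 * r)

data = np.random.default_rng(1).normal(size=500)
print(parzen_density(0.0, data), knn_density(0.0, data))
```

Both estimates depend directly on the individual data points through h or r, which makes the ill-posedness tangible: perturbing a few points near x perturbs the estimate there.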