AITopics

The problem of time series prediction is studied within the uniform convergence framework of Vapnik and Chervonenkis. The dependence inherent in the temporal structure is incorporated into the analysis, thereby generalizing the available theory for memoryless processes. Finite sample bounds are calculated in terms of covering numbers of the approximating class, and the tradeoff between approximation and estimation is discussed. A complexity regularization approach is outlined, based on Vapnik's method of Structural Risk Minimization, and shown to be applicable in the context of mixing stochastic processes.

artificial intelligence, machine learning, sequence, (14 more...)

Country: Asia > Middle East > Israel (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Leen, Todd K., Schottky, Bernhard, Saad, David

Two Approaches to Optimal Annealing

The latter studies are based on examining the Kramers-Moyal expansion of the master equation for the weight space probability densities. A different approach, based on the deterministic dynamics of macroscopic quantities called order parameters, has recently been presented [6, 7]. This approach enables one to monitor the evolution of the order parameters and the system performance at all times. In this paper we examine the relation between the two approaches and contrast the results obtained for different learning rate annealing schedules in the asymptotic regime. We employ the order parameter approach to examine the dependence of the dynamics on the number of hidden nodes in a multilayer system.

artificial intelligence, generalization error, machine learning, (16 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Asymptotic Theory for Regularization: One-Dimensional Linear Case

Koistinen, Petri

The generalization ability of a neural network can sometimes be improved dramatically by regularization. To analyze the improvement one needs more refined results than the asymptotic distribution of the weight vector. Here we study the simple case of one-dimensional linear regression under quadratic regularization, i.e., ridge regression. We study the random design, misspecified case, where we derive expansions for the optimal regularization parameter and the ensuing improvement. It is possible to construct examples where it is best to use no regularization.
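As a rough illustration of the setting described above, the following sketch computes the one-dimensional ridge estimate in closed form. The function name `ridge_slope_1d`, the absence of an intercept, and the penalty form `lam * w**2` are assumptions made for this example; the paper's own normalization of the regularization parameter may differ.

```python
def ridge_slope_1d(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept).

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x (illustrative values only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ridge_slope_1d(xs, ys, 0.0)     # no regularization
w_ridge = ridge_slope_1d(xs, ys, 14.0)  # penalty shrinks the slope toward 0
```

With `lam = 0` the estimate reduces to ordinary least squares; increasing `lam` shrinks the slope toward zero, which is the bias/variance tradeoff the optimal regularization parameter has to balance.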

artificial intelligence, machine learning, regularization parameter, (15 more...)

Country:

Europe > Finland (0.14)
Europe > United Kingdom > England (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.35)

Kivinen, Jyrki, Warmuth, Manfred K.

Relative Loss Bounds for Multidimensional Regression Problems

We study online generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow at the final layer transfer functions such as the softmax function that need to consider the linear activations to all the output neurons. We use distance functions of a certain kind in two completely independent roles in deriving and analyzing online learning algorithms for such tasks. We use one distance function to define a matching loss function for the (possibly multidimensional) transfer function, which allows us to generalize earlier results from one-dimensional to multidimensional outputs. We use another distance function as a tool for measuring progress made by the online updates. This shows how previously studied algorithms such as gradient descent and exponentiated gradient fit into a common framework. We evaluate the performance of the algorithms using relative loss bounds that compare the loss of the online algorithm to the best off-line predictor from the relevant model class, thus completely eliminating probabilistic assumptions about the data.
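To make the contrast between the two update families concrete, here is a minimal sketch of one online step of gradient descent and of exponentiated gradient. It is a simplification under stated assumptions: a single linear output with the half squared loss 0.5 * (y_hat - y)**2, rather than the multidimensional transfer functions and matching losses treated in the paper. The function names and the learning rate `eta` are illustrative.

```python
import math


def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def gd_update(w, x, y, eta):
    """Gradient descent: additive step along the negative loss gradient."""
    err = predict(w, x) - y  # gradient of the half squared loss is err * x
    return [wi - eta * err * xi for wi, xi in zip(w, x)]


def eg_update(w, x, y, eta):
    """Exponentiated gradient: multiplicative step, then renormalization.

    Assumes w has positive entries summing to 1 (a point on the simplex).
    """
    err = predict(w, x) - y
    unnorm = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


w0 = [0.5, 0.5]
x, y = [1.0, 0.0], 1.0
w_gd = gd_update(w0, x, y, eta=0.1)
w_eg = eg_update(w0, x, y, eta=0.1)
```

Both updates use the same gradient; they differ only in how the step is applied (additively versus multiplicatively), which corresponds to the choice of the distance function used to measure progress.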

algorithm, artificial intelligence, machine learning, (15 more...)

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Kappen, Hilbert J., Ortiz, Francisco de Borja Rodríguez

Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction

We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons. We conclude that (1) the method works well for paramagnetic problems; (2) the TAP correction gives a significant improvement over the Weiss mean field theory, both for paramagnetic and spin glass problems; and (3) the inclusion of diagonal weights improves the Weiss approximation for paramagnetic problems, but not for spin glass problems.
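For orientation, the first order (Weiss) mean field equations and the corresponding linear response estimate of the correlations can be written as below. This is the standard textbook form for ±1 units, given here as a sketch; the symbols (weights w_ij, thresholds θ_i, mean firing rates m_i, connected correlations χ_ij) are notation assumed for this note and may not match the paper's conventions, and the second order (TAP) terms are omitted.

```latex
% Weiss (first order) mean field equations for units s_i = \pm 1
m_i = \tanh\Big( \sum_{j} w_{ij} m_j + \theta_i \Big)

% Linear response: connected correlations from the inverse susceptibility
\chi_{ij} = \langle s_i s_j \rangle - m_i m_j ,
\qquad
\big( \chi^{-1} \big)_{ij} = \frac{\delta_{ij}}{1 - m_i^2} - w_{ij}
```

Obtaining χ requires inverting an n × n matrix for n neurons, which is consistent with the cubic complexity stated in the abstract.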

approximation, artificial intelligence, machine learning, (13 more...)

Country:

Europe > Spain (0.15)
Europe > Netherlands (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Selecting Weighting Factors in Logarithmic Opinion Pools

Heskes, Tom

A simple linear averaging of the outputs of several networks, as e.g. in bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the average model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the log-likelihood or the Kullback-Leibler divergence, and linear averaging of the outputs to logarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variance decompositions, and quadratic programming to find the optimal weighting factors is not specific to the sum-squared error, but applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.
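The logarithmic averaging mentioned above can be sketched for discrete distributions as follows: the pooled distribution is proportional to the weighted geometric mean of the individual distributions. The function name `log_opinion_pool` is illustrative, and the sketch assumes strictly positive probabilities and fixed weights; it does not include the quadratic programming step for choosing the weights.

```python
import math


def log_opinion_pool(dists, weights):
    """Combine discrete distributions by weighted averaging of log-probabilities.

    pooled[k] is proportional to prod_a dists[a][k] ** weights[a].
    Assumes every probability is strictly positive.
    """
    n = len(dists[0])
    log_p = [
        sum(w * math.log(d[k]) for d, w in zip(dists, weights))
        for k in range(n)
    ]
    m = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in log_p]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two class-probability statements over two classes (illustrative values).
p1 = [0.8, 0.2]
p2 = [0.5, 0.5]
pooled = log_opinion_pool([p1, p2], [0.5, 0.5])
```

With equal weights the pooled probabilities are proportional to sqrt(0.8 * 0.5) and sqrt(0.2 * 0.5), a ratio of 2 to 1, so the confident model pulls the pool toward the first class without fully overriding the uninformative one.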

artificial intelligence, machine learning, optimization problem, (14 more...)

Country: Europe > Netherlands (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

Golea, Mostefa, Bartlett, Peter L., Lee, Wee Sun, Mason, Llew

Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions).

artificial intelligence, decision tree learning, machine learning, (16 more...)

Country: Oceania > Australia (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Vinje, William E., Gallant, Jack L.

Modeling Complex Cells in an Awake Macaque during Natural Image Viewing

Our model consists of a classical energy mechanism whose output is divided by nonclassical gain control and texture contrast mechanisms. We apply this model to review movies, a stimulus sequence that replicates the stimulation a cell receives during free viewing of natural images. Data were collected from three cells using five different review movies, and the model was fit separately to the data from each movie. For the energy mechanism alone we find modest but significant correlations (r_E = 0.41, 0.43, 0.59, 0.35) between model and data. These correlations are improved somewhat when we allow for suppressive surround effects (r_{E+G} = 0.42, 0.56, 0.60, 0.37). In one case the inclusion of a delayed suppressive surround dramatically improves the fit to the data by modifying the time course of the model's response.

artificial intelligence, mechanism, review movie, (12 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.15)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence (0.48)
Information Technology > Sensing and Signal Processing > Image Processing (0.35)

Sahani, Maneesh, Pezaris, John S., Andersen, Richard A.

On the Separation of Signals from Neighboring Cells in Tetrode Recordings

We discuss a solution to the problem of separating waveforms produced by multiple cells in an extracellular neural recording. We take an explicitly probabilistic approach, using latent-variable models of varying sophistication to describe the distribution of waveforms produced by a single cell. The models range from a single Gaussian distribution of waveforms for each cell to a mixture of hidden Markov models. We stress the overall statistical structure of the approach, allowing the details of the generative model chosen to depend on the specific neural preparation.

artificial intelligence, machine learning, waveform, (18 more...)

Country: North America > United States > California (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Riesenhuber, Maximilian, Poggio, Tomaso

Just One View: Invariances in Inferotemporal Cell Tuning

In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth.

artificial intelligence, invariance, machine learning, (18 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)