AITopics

This paper presents a neural-model of pre-attentive visual processing. The model explains why certain displays can be processed very fast, "in parallel", while others require slower, "serial" processing, in subsequent attentional systems. Our approach stems from the observation that the visual environment is overflowing with diverse information, but the biological information-processing systems analyzing it have a limited capacity [1]. This apparent mismatch suggests that data compression should be performed at an early stage of perception, and that via an accompanying process of dimension reduction, only a few essential features of the visual display should be retained. We propose that only parallel displays incorporate global features that enable fast target detection, and hence they can be processed pre-attentively, with all items (target and dis tractors) examined at once.

node, principal axis, similarity, (16 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > New York (0.04)
North America > United States > Indiana (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.43)

Rattray, Magnus, Saad, David

Globally Optimal On-line Learning Rules

We present a method for determining the globally optimal online learning rule for a soft committee machine under a statistical mechanics framework. This work complements previous results on locally optimal rules, where only the rate of change in generalization error was considered. We maximize the total reduction in generalization error over the whole learning process and show how the resulting rule can significantly outperform the locally optimal rule. 1 Introduction We consider a learning scenario in which a feed-forward neural network model (the student) emulates an unknown mapping (the teacher), given a set of training examples produced by the teacher. The performance of the student network is typically measured by its generalization error, which is the expected error on an unseen example. The aim of training is to reduce the generalization error by adapting the student network's parameters appropriately. A common form of training is online learning, where training patterns are presented sequentially and independently to the network at each learning step.

algorithm, generalization error, optimal rule, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Priel, Avner, Kanter, Ido, Kessler, David A.

Analytical Study of the Interplay between Architecture and Predictability

We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless the details of the weights. We find hierarchies among classes of architectures with respect to the attract or dimension of the long term sequence they are capable of generating; larger number of hidden units can generate higher dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one dimensional.

attractor, fourier component, prediction, (15 more...)

Country:

Asia > Middle East > Israel (0.05)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)

Structural Risk Minimization for Nonparametric Time Series Prediction

Meir, Ron

The problem of time series prediction is studied within the uniform convergence framework of Vapnik and Chervonenkis. The dependence inherent in the temporal structure is incorporated into the analysis, thereby generalizing the available theory for memoryless processes. Finite sample bounds are calculated in terms of covering numbers of the approximating class, and the tradeoff between approximation and estimation is discussed. A complexity regularization approach is outlined, based on Vapnik's method of Structural Risk Minimization, and shown to be applicable in the context of mixing stochastic processes.

complexity, sequence, structural risk minimization, (12 more...)

Country:

North America > United States > New York (0.05)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Leen, Todd K., Schottky, Bernhard, Saad, David

Two Approaches to Optimal Annealing

We employ both master equation and order parameter approaches to analyze the asymptotic dynamics of online learning with different learning rate annealing schedules. We examine the relations between the results obtained by the two approaches and obtain new results on the optimal decay coefficients and their dependence on the number of hidden nodes in a two layer architecture.

equation, generalization error, order parameter approach, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Oregon (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
(2 more...)

Industry: Education > Educational Setting (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Asymptotic Theory for Regularization: One-Dimensional Linear Case

Koistinen, Petri

The generalization ability of a neural network can sometimes be improved dramatically by regularization. To analyze the improvement one needs more refined results than the asymptotic distribution of the weight vector. Here we study the simple case of one-dimensional linear regression under quadratic regularization, i.e., ridge regression. We study the random design, misspecified case, where we derive expansions for the optimal regularization parameter and the ensuing improvement. It is possible to construct examples where it is best to use no regularization.

asymptotic theory, expansion, regularization parameter, (13 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Austria > Vienna (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.35)

Kappen, Hilbert J., Ortiz, Francisco de Borja Rodríguez

Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction

We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons. We conclude that 1) the method works well for paramagnetic problems 2) the TAP correction gives a significant improvement over the Weiss mean field theory, both for paramagnetic and spin glass problems and 3) that the inclusion of diagonal weights improves the Weiss approximation for paramagnetic problems, but not for spin glass problems.

approximation, kl divergence, weiss approximation, (12 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Selecting Weighting Factors in Logarithmic Opinion Pools

Heskes, Tom

A simple linear averaging of the outputs of several networks as e.g. in bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the average model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the loglikelihood or the Kullback-Leibler divergence, and linear averaging of the outputs to logarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variance decompositions, and quadratic programming to find the optimal weighting factors, is not specific for the sumsquared error, but applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.

kullback-leibler divergence, logarithmic opinion pool, weighting factor, (11 more...)

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

Golea, Mostefa, Bartlett, Peter L., Lee, Wee Sun, Mason, Llew

Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions).

decision tree, misclassification probability, probability, (14 more...)

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Bohossian, Vasken, Bruck, Jehoshua

Multiple Threshold Neural Logic

This observation has boosted interest in the field of artificial neural networks [Hopfield 82], [Rumelhart 82]. The latter are built by interconnecting artificial neurons whose behavior is inspired by that of biological neurons.

boolean function, compute, symmetric function, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Semiconductors & Electronics (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)