AITopics

Support Vector (SV) Machines for pattern recognition, regression estimation and operator inversion exploit the idea of transforming into a high dimensional feature space where they perform a linear algorithm. Instead of evaluating this map explicitly, one uses Hilbert Schmidt Kernels k(x, y) which correspond to dot products of the mapped data in high dimensional space, i.e. k(x, y) ( I (x) · I (y))

kernel, regularization operator, sv machine, (9 more...)

Country:

North America > United States > New York (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Germany > Berlin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.74)

Shawe-Taylor, John, Cristianini, Nello

Data-Dependent Structural Risk Minimization for Perceptron Decision Trees

This paper presents a neural-model of pre-attentive visual processing. The model explains why certain displays can be processed very fast, "in parallel", while others require slower, "serial" processing, in subsequent attentional systems. Our approach stems from the observation that the visual environment is overflowing with diverse information, but the biological information-processing systems analyzing it have a limited capacity [1]. This apparent mismatch suggests that data compression should be performed at an early stage of perception, and that via an accompanying process of dimension reduction, only a few essential features of the visual display should be retained. We propose that only parallel displays incorporate global features that enable fast target detection, and hence they can be processed pre-attentively, with all items (target and dis tractors) examined at once.

node, principal axis, similarity, (16 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > New York (0.04)
North America > United States > Indiana (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.43)

Seung, H. Sebastian, Richardson, Tom J., Lagarias, J. C., Hopfield, John J.

Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks

A Lyapunov function for excitatory-inhibitory networks is constructed. The construction assumes symmetric interactions within excitatory and inhibitory populations of neurons, and antisymmetric interactions between populations. The Lyapunov function yields sufficient conditions for the global asymptotic stability of fixed points. If these conditions are violated, limit cycles may be stable. The relations of the Lyapunov function to optimization theory and classical mechanics are revealed by minimax and dissipative Hamiltonian forms of the network dynamics.

excitatory-inhibitory network, lyapunov function, sufficient condition, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.63)

Rattray, Magnus, Saad, David

Globally Optimal On-line Learning Rules

We present a method for determining the globally optimal online learning rule for a soft committee machine under a statistical mechanics framework. This work complements previous results on locally optimal rules, where only the rate of change in generalization error was considered. We maximize the total reduction in generalization error over the whole learning process and show how the resulting rule can significantly outperform the locally optimal rule. 1 Introduction We consider a learning scenario in which a feed-forward neural network model (the student) emulates an unknown mapping (the teacher), given a set of training examples produced by the teacher. The performance of the student network is typically measured by its generalization error, which is the expected error on an unseen example. The aim of training is to reduce the generalization error by adapting the student network's parameters appropriately. A common form of training is online learning, where training patterns are presented sequentially and independently to the network at each learning step.

algorithm, generalization error, optimal rule, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Priel, Avner, Kanter, Ido, Kessler, David A.

Analytical Study of the Interplay between Architecture and Predictability

We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless the details of the weights. We find hierarchies among classes of architectures with respect to the attract or dimension of the long term sequence they are capable of generating; larger number of hidden units can generate higher dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one dimensional.

attractor, fourier component, prediction, (15 more...)

Country:

Asia > Middle East > Israel (0.05)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)

Structural Risk Minimization for Nonparametric Time Series Prediction

Meir, Ron

The problem of time series prediction is studied within the uniform convergence framework of Vapnik and Chervonenkis. The dependence inherent in the temporal structure is incorporated into the analysis, thereby generalizing the available theory for memoryless processes. Finite sample bounds are calculated in terms of covering numbers of the approximating class, and the tradeoff between approximation and estimation is discussed. A complexity regularization approach is outlined, based on Vapnik's method of Structural Risk Minimization, and shown to be applicable in the context of mixing stochastic processes.

complexity, sequence, structural risk minimization, (12 more...)

Country:

North America > United States > New York (0.05)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Leen, Todd K., Schottky, Bernhard, Saad, David

Two Approaches to Optimal Annealing

We employ both master equation and order parameter approaches to analyze the asymptotic dynamics of online learning with different learning rate annealing schedules. We examine the relations between the results obtained by the two approaches and obtain new results on the optimal decay coefficients and their dependence on the number of hidden nodes in a two layer architecture.

equation, generalization error, order parameter approach, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Oregon (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
(2 more...)

Industry: Education > Educational Setting (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Asymptotic Theory for Regularization: One-Dimensional Linear Case

Koistinen, Petri

The generalization ability of a neural network can sometimes be improved dramatically by regularization. To analyze the improvement one needs more refined results than the asymptotic distribution of the weight vector. Here we study the simple case of one-dimensional linear regression under quadratic regularization, i.e., ridge regression. We study the random design, misspecified case, where we derive expansions for the optimal regularization parameter and the ensuing improvement. It is possible to construct examples where it is best to use no regularization.

asymptotic theory, expansion, regularization parameter, (13 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Austria > Vienna (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.35)

Kivinen, Jyrki, Warmuth, Manfred K. K.

Relative Loss Bounds for Multidimensional Regression Problems

We study online generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow at the final layer transfer functions such as the softmax function that need to consider the linear activations to all the output neurons. We use distance functions of a certain kind in two completely independent roles in deriving and analyzing online learning algorithms for such tasks. We use one distance function to define a matching loss function for the (possibly multidimensional) transfer function, which allows us to generalize earlier results from one-dimensional to multidimensional outputs. We use another distance function as a tool for measuring progress made by the online updates. This shows how previously studied algorithms such as gradient descent and exponentiated gradient fit into a common framework. We evaluate the performance of the algorithms using relative loss bounds that compare the loss of the online algoritm to the best off-line predictor from the relevant model class, thus completely eliminating probabilistic assumptions about the data.

algorithm, loss function, transfer function, (13 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

New Approximations of Differential Entropy for Independent Component Analysis and Projection Pursuit

Hyvärinen, Aapo

We derive a first-order approximation of the density of maximum entropy for a continuous 1-D random variable, given a number of simple constraints. This results in a density expansion which is somewhat similar to the classical polynomial density expansions by Gram-Charlier and Edgeworth. Using this approximation of density, an approximation of 1-D differential entropy is derived. The approximation of entropy is both more exact and more robust against outliers than the classical approximation based on the polynomial density expansions, without being computationally more expensive. The approximation has applications, for example, in independent component analysis and projection pursuit. 1 Introduction The basic information-theoretic quantity for continuous one-dimensional random variables is differential entropy. The differential entropy H of a scalar random variable X with density f(x) is defined as H(X) - / f(x) log f(x)dx.

approximation, entropy, projection pursuit, (15 more...)

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)