On the Optimality of Incremental Neural Network Algorithms

Neural Information Processing Systems

We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.
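The greedy scheme can be pictured as matching-pursuit-style residual fitting: at each stage a single sigmoidal unit is optimized against what the current network fails to explain. The following minimal sketch (squared loss, tanh units, BFGS for the per-stage optimization; all names illustrative, not the authors' exact procedure) shows the idea:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative greedy fit: at each stage, optimize one tanh unit's few
# parameters (input weights w, bias b, output weight a) against the
# residual left by the units added so far. Names are hypothetical.
def fit_incremental(X, y, n_units=10, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    residual = y.astype(float).copy()
    units = []
    for _ in range(n_units):
        def stage_loss(theta, r=residual):
            w, b, a = theta[:d], theta[d], theta[d + 1]
            return np.mean((r - a * np.tanh(X @ w + b)) ** 2)
        theta = minimize(stage_loss, rng.normal(scale=0.1, size=d + 2),
                         method="BFGS").x
        w, b, a = theta[:d], theta[d], theta[d + 1]
        residual = residual - a * np.tanh(X @ w + b)
        units.append((w, b, a))
    return units
```

Because each stage optimizes only d + 2 parameters, the per-stage problem stays low-dimensional no matter how many units the final network contains.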


Bayesian Modeling of Facial Similarity

Neural Information Processing Systems

In previous work [6, 9, 10], we advanced a new technique for direct visual matching of images for the purposes of face recognition and image retrieval, using a probabilistic measure of similarity based primarily on a Bayesian (MAP) analysis of image differences, leading to a "dual" basis similar to eigenfaces [13]. The performance advantage of this probabilistic matching technique over standard Euclidean nearest-neighbor eigenface matching was recently demonstrated using results from DARPA's 1996 "FERET" face recognition competition, in which this probabilistic matching algorithm was found to be the top performer. We have further developed a simple method of replacing the costly computation of nonlinear (online) Bayesian similarity measures by the relatively inexpensive computation of linear (offline) subspace projections and simple (online) Euclidean norms, thus resulting in a significant computational speedup for implementation with very large image databases as typically encountered in real-world applications.
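As a rough illustration of the offline/online split described above, one can precompute a whitened linear projection from training image differences and then score query-gallery pairs with a plain Euclidean norm online. The sketch below is a generic whitened-eigenspace stand-in, not the paper's dual-eigenspace formulation:

```python
import numpy as np

# Offline: build a whitened projection from training image differences.
# Whitening (dividing eigenvectors by sqrt(eigenvalue)) is the standard
# trick that lets a Euclidean norm mimic a Gaussian (Mahalanobis) score.
def offline_projection(diff_images, k):
    cov = np.cov(diff_images, rowvar=False)        # (d, d) covariance
    vals, vecs = np.linalg.eigh(cov)
    top = np.argsort(vals)[::-1][:k]               # k largest eigenvalues
    return vecs[:, top] / np.sqrt(vals[top])       # whitened basis, (d, k)

def project(images, W):
    return images @ W                              # done offline for the gallery

def similarity(p_query, p_gallery):
    # Larger (less negative) means more similar under the Gaussian model.
    return -np.linalg.norm(p_gallery - p_query[None, :], axis=1) ** 2
```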


Kernel PCA and De-Noising in Feature Spaces

Neural Information Processing Systems

Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real-world data.
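For Gaussian kernels k(x, y) = exp(-||x - y||^2 / c), approximate pre-images can be found by a simple fixed-point iteration in which the candidate point is repeatedly re-expressed as a kernel-weighted average of the training points. A minimal sketch, assuming the combination coefficients gamma (derived from the kernel PCA projection) have already been computed:

```python
import numpy as np

# Fixed-point pre-image iteration for a Gaussian kernel. gamma[i] are the
# coefficients expressing the projected feature-space point as
# sum_i gamma[i] phi(X[i]); computing them from the kernel PCA
# eigenvectors is assumed done. z0 is an initial guess (e.g. the noisy input).
def preimage(X, gamma, c, z0, n_iter=100, tol=1e-8):
    z = z0.copy()
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / c)
        z_new = (w @ X) / w.sum()          # kernel-weighted average of inputs
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z
```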


Viewing Classifier Systems as Model Free Learning in POMDPs

Neural Information Processing Systems

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs. rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model-free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to obtain a locally optimal stochastic policy when a set of rule conditions is given, and uses a GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most widely used approaches in reinforcement learning.
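The paper's GLS details are not reproduced here, but the division of labor it describes, a GA proposing rule conditions while an inner procedure evaluates the induced policy, can be sketched generically. Below, conditions are ternary strings over {0, 1, #} as in classical classifier systems, and evaluate() is a hypothetical stand-in for scoring a rule set's (locally optimized) policy:

```python
import random

ALPHABET = "01#"  # '#' = don't care, as in classical classifier systems

def random_rule(length):
    return "".join(random.choice(ALPHABET) for _ in range(length))

def mutate(rule, rate=0.05):
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in rule)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def ga_search(evaluate, n_rules=8, length=10, pop_size=20, generations=50):
    # Each individual is a set (list) of rule conditions.
    pop = [[random_rule(length) for _ in range(n_rules)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elite = pop[: pop_size // 2]               # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(elite, 2)
            children.append([mutate(crossover(r1, r2))
                             for r1, r2 in zip(p1, p2)])
        pop = elite + children
    return max(pop, key=evaluate)
```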


A Model for Associative Multiplication

Neural Information Processing Systems

Despite the fact that mental arithmetic is based on only a few hundred basic facts and some simple algorithms, humans have a difficult time mastering the subject, and even experienced individuals make mistakes. Associative multiplication, the process of doing multiplication by memory without the use of rules or algorithms, is especially problematic.


Fisher Scoring and a Mixture of Modes Approach for Approximate Inference and Learning in Nonlinear State Space Models

Neural Information Processing Systems

The difficulties lie in the Monte-Carlo E-step, which consists of sampling from the posterior distribution of the hidden variables given the observations. The new idea presented in this paper is to generate samples from a Gaussian approximation to the true posterior, from which it is easy to obtain independent samples. The parameters of the Gaussian approximation are derived either from the extended Kalman filter or from the Fisher scoring algorithm. In case the posterior density is multimodal, we propose to approximate the posterior by a sum of Gaussians (mixture of modes approach). We show that sampling from the approximate posterior densities obtained by the above algorithms leads to better models than using point estimates for the hidden states. In our experiment, the Fisher scoring algorithm obtained a better approximation of the posterior mode than the EKF. For a multimodal distribution, the mixture of modes approach gave superior results. 1 INTRODUCTION Nonlinear state space models (NSSM) are a general framework for representing nonlinear time series. In particular, any NARMAX model (nonlinear auto-regressive moving average model with external inputs) can be translated into an equivalent NSSM.
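Once the modes and local covariances have been obtained (from the EKF or Fisher scoring), the E-step sampling itself is straightforward. A minimal sketch of the mixture-of-modes sampler, assuming modes m_k, covariances S_k, and mixture weights pi_k are given:

```python
import numpy as np

def sample_mixture_of_modes(modes, covs, weights, n_samples, seed=0):
    # modes: list of mean vectors m_k; covs: list of covariance matrices S_k
    # (e.g. inverse curvature at each mode, assumed already computed);
    # weights: unnormalized mixture weights pi_k.
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    ks = rng.choice(len(modes), size=n_samples, p=p)   # pick a mode per sample
    return np.stack([rng.multivariate_normal(modes[k], covs[k]) for k in ks])
```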


Learning a Continuous Hidden Variable Model for Binary Data

Neural Information Processing Systems

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high-dimensional vector space, most of the variability in the data is captured by a much lower-dimensional manifold.
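The generative picture can be made concrete with a short sampling routine: a few continuous Gaussian hidden units drive correlated activations that are thresholded into binary outputs. The parameters below are illustrative, not the paper's fitted model:

```python
import numpy as np

def sample_binary(W, b, n_samples, noise=1.0, seed=0):
    # W: (n_hidden, n_visible) weights, b: (n_visible,) biases; illustrative.
    rng = np.random.default_rng(seed)
    n_hidden = W.shape[0]
    h = rng.standard_normal((n_samples, n_hidden))    # continuous Gaussian causes
    u = h @ W + b + noise * rng.standard_normal((n_samples, W.shape[1]))
    return (u > 0).astype(int)                        # thresholded binary outputs
```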


Robust, Efficient, Globally-Optimized Reinforcement Learning with the Parti-Game Algorithm

Neural Information Processing Systems

The former represents the number of cells that have to be traveled through to get to the goal cell, and the latter represents the belief that there is no reliable way of getting from that cell to the goal. Cells with a cost of infinity are called losing cells, while the others are called winning cells.
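As an illustration of how such costs could be assigned, a breadth-first pass backwards from the goal labels reachable cells with their shortest path length and leaves unreachable (losing) cells at infinity. This is a generic sketch, not parti-game's actual update, which revises costs as the agent discovers new transition outcomes:

```python
from collections import deque
import math

def cell_costs(neighbors, goal):
    # neighbors: dict mapping each cell to the cells adjacent to it
    # (transitions assumed symmetric for this sketch).
    cost = {cell: math.inf for cell in neighbors}   # infinity = losing cell
    cost[goal] = 0
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nb in neighbors[cell]:
            if cost[nb] == math.inf:                # not yet reached
                cost[nb] = cost[cell] + 1           # one more cell to traverse
                queue.append(nb)
    return cost
```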


Source Separation as a By-Product of Regularization

Neural Information Processing Systems

This paper reveals a previously ignored connection between two important fields: regularization and independent component analysis (ICA). We show that at least one representative of a broad class of algorithms (regularizers that reduce network complexity) extracts independent features as a byproduct. This algorithm is Flat Minimum Search (FMS), a recent general method for finding low-complexity networks with high generalization capability. FMS works by minimizing both training error and required weight precision. According to our theoretical analysis, the hidden layer of an FMS-trained autoassociator attempts to code each input by a sparse code with as few simple features as possible. In experiments the method extracts optimal codes for difficult versions of the "noisy bars" benchmark problem by separating the underlying sources, whereas ICA and PCA fail.
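Schematically, the FMS objective couples the autoassociator's reconstruction error with a complexity term over the weights. The penalty in the sketch below (a smooth log-magnitude term) is a simplified stand-in for FMS's actual flatness measure, which also depends on unit activations; see the paper for the real complexity term:

```python
import numpy as np

def fms_style_loss(w_hidden, w_out, X, lam=0.01):
    # Autoassociator: reconstruct X from a tanh hidden code.
    H = np.tanh(X @ w_hidden)
    X_hat = H @ w_out
    train_err = np.mean((X - X_hat) ** 2)
    # Stand-in complexity term: pushes weights toward zero / low precision.
    w_all = np.concatenate([w_hidden.ravel(), w_out.ravel()])
    complexity = np.sum(np.log1p(w_all ** 2))
    return train_err + lam * complexity
```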


Regularizing AdaBoost

Neural Information Processing Systems

We will also introduce a regularization strategy (analogous to weight decay) into boosting. This strategy uses slack variables to achieve a soft margin (section 4). Numerical experiments in section 5 show the validity of our regularization approach, and finally a brief conclusion is given. 2 AdaBoost Algorithm Let {h_t(x) : t = 1, ..., T} be an ensemble of T hypotheses defined on an input vector x ...
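To make the notation concrete, here is a minimal sketch of plain AdaBoost for labels in {-1, +1}; fit_weak is a hypothetical stand-in for training a weak learner on the weighted sample, and the soft-margin slack-variable regularization of section 4 is not included:

```python
import numpy as np

def adaboost(X, y, fit_weak, T):
    # y in {-1, +1}; fit_weak(X, y, d) returns a callable hypothesis h(X).
    n = len(y)
    d = np.full(n, 1.0 / n)                     # sample weights
    hypotheses, coeffs = [], []
    for _ in range(T):
        h = fit_weak(X, y, d)
        pred = h(X)
        eps = np.sum(d * (pred != y))           # weighted training error
        c = 0.5 * np.log((1.0 - eps) / eps)     # hypothesis weight
        d *= np.exp(-c * y * pred)              # up-weight misclassified points
        d /= d.sum()
        hypotheses.append(h)
        coeffs.append(c)
    def ensemble(Xq):
        return np.sign(sum(c * h(Xq) for c, h in zip(coeffs, hypotheses)))
    return ensemble
```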