AITopics

In many applications, such as credit default prediction and medical image recognition, test inputs are available in addition to the labeled training examples. We propose a method to incorporate the test inputs into learning.

incorporating test input, test error, test input, (16 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Baltimore (0.04)

Industry: Health & Medicine > Diagnostic Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Burger, Matthias, Graepel, Thore, Obermayer, Klaus

An Annealed Self-Organizing Map for Source Channel Coding

It is especially suited for speech and image data, which in many applications have to be transmitted under low-bandwidth/high-noise conditions. Following the idea of (Farvardin, 1990) and (Luttrell, 1989) of jointly optimizing the codebook and the data representation w.r.t. a given channel noise, we apply a deterministic annealing scheme (Rose, 1990; Buhmann, 1997) to the problem and develop a soft topographic vector quantization algorithm (STVQ) (cf.

self-organizing map, ssom, stvq, (12 more...)

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Germany > Berlin (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Blais, Brian S., Intrator, Nathan, Shouval, Harel Z., Cooper, Leon N.

Receptive Field Formation in Natural Scene Environments: Comparison of Single Cell Learning Rules

We study several statistically and biologically motivated learning rules using the same visual environment, one made up of natural scenes, and the same single cell neuronal architecture. This allows us to concentrate on the feature extraction and neuronal coding properties of these rules. Included in these rules are kurtosis and skewness maximization, the quadratic form of the BCM learning rule, and single cell ICA. Using a structure removal method, we demonstrate that receptive fields developed using these rules depend on a small portion of the distribution. We find that the quadratic form of the BCM rule behaves in a manner similar to a kurtosis maximization rule when the distribution contains kurtotic directions, although the BCM modification equations are computationally simpler.
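The two moment-based statistics named above can be sketched as follows. This is a minimal illustration, not the paper's learning rules: the abstract does not give the exact objective functions, so the standardized-moment definitions, the linear single-cell output `c = w . d`, and all function names here are assumptions.

```python
def cell_output(weights, inputs):
    """Linear single-cell response c = w . d (assumed model, for illustration)."""
    return sum(w * d for w, d in zip(weights, inputs))


def standardized_moment(xs, order):
    """Sample standardized moment: mean((x - mean)^order) / std^order."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("standardized moments are undefined for constant data")
    central = sum((x - mean) ** order for x in xs) / n
    return central / var ** (order / 2)


def skewness(xs):
    """Third standardized moment; 0 for a symmetric sample."""
    return standardized_moment(xs, 3)


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3, so a Gaussian scores about 0."""
    return standardized_moment(xs, 4) - 3.0
```

A rule that maximizes kurtosis would adjust the weights so that the distribution of `cell_output` values over the scene patches becomes more heavy-tailed, that is, so that `excess_kurtosis` of the outputs grows.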

kurtosis, natural scene, receptive field, (14 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Industry: Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bishop, Christopher M., Lawrence, Neil D., Jaakkola, Tommi, Jordan, Michael I.

Approximating Posterior Distributions in Belief Networks Using Mixtures

Exact inference in densely connected Bayesian networks is computationally intractable, and so there is considerable interest in developing effective approximation schemes. One approach which has been adopted is to bound the log likelihood using a mean-field approximating distribution. While this leads to a tractable algorithm, the mean field distribution is assumed to be factorial and hence unimodal. In this paper we demonstrate the feasibility of using a richer class of approximating distributions based on mixtures of mean field distributions. We derive an efficient algorithm for updating the mixture parameters and apply it to the problem of learning in sigmoid belief networks. Our results demonstrate a systematic improvement over simple mean field theory as the number of mixture components is increased.
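The bound mentioned above can be written out as follows. The notation is an assumption, since the abstract does not fix any symbols: V denotes the visible variables, H the hidden variables, Q the approximating distribution, and \alpha_m the mixing coefficients.

```latex
% Variational lower bound on the log likelihood (Jensen's inequality):
\ln P(V) \;\ge\; \sum_{H} Q(H) \,\ln \frac{P(H, V)}{Q(H)} \;=\; \mathcal{L}(Q)

% Mean-field (factorial) choice of Q:
Q_{\mathrm{MF}}(H) \;=\; \prod_{i} Q_i(h_i)

% Mixture of M factorial components, which can be multimodal:
Q_{\mathrm{mix}}(H) \;=\; \sum_{m=1}^{M} \alpha_m \, Q(H \mid m),
\qquad \alpha_m \ge 0, \quad \sum_{m=1}^{M} \alpha_m = 1,
\qquad Q(H \mid m) \;=\; \prod_{i} Q_i(h_i \mid m)
```

With M = 1 the mixture reduces to the ordinary mean-field distribution, so the mixture family can only tighten the best achievable bound.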

approximating posterior distribution, hlm, log likelihood, (11 more...)

Country:

Asia > Middle East > Jordan (0.07)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Bengio, Yoshua, Bengio, Samy, Isabelle, Jean-François, Singer, Yoram

Shared Context Probabilistic Transducers

Recently, a model for supervised learning of probabilistic transducers represented by suffix trees was introduced. However, this algorithm tends to build very large trees, requiring very large amounts of computer memory. In this paper, we propose a new, more compact transducer model in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions. We illustrate the advantages of the proposed algorithm with comparative experiments on inducing a noun phrase recognizer.

algorithm, node, transducer, (13 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

Barber, David, Bishop, Christopher M.

Ensemble Learning for Multi-Layer Networks

In contrast to the maximum likelihood approach, which finds only a single estimate for the regression parameters, the Bayesian approach yields a distribution of weight parameters, p(w|D), conditional on the training data D, and predictions are ex-

Present address: SNN, University of Nijmegen, Geert Grooteplein 21, Nijmegen, The Netherlands.

covariance matrix, gaussian, posterior distribution, (12 more...)

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.45)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Yang, Howard Hua, Amari, Shun-ichi

The Efficiency and the Robustness of Natural Gradient Descent Learning Rule

The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.
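The basic update that uses the inverse Fisher information matrix can be sketched on a toy problem. This is a minimal illustration only: it does not reproduce the paper's O(n) scheme for multi-layer perceptrons, and the two-parameter quadratic loss, the matrix `A` standing in for the Fisher information matrix, and all function names are assumptions.

```python
def solve_2x2(G, g):
    """Solve G x = g for a 2x2 matrix G using the closed-form inverse."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def natural_gradient_step(w, grad, G, lr):
    """w <- w - lr * G^{-1} grad, with G playing the role of the Fisher matrix."""
    direction = solve_2x2(G, grad)
    return [wi - lr * di for wi, di in zip(w, direction)]


def plain_gradient_step(w, grad, lr):
    """w <- w - lr * grad (ordinary gradient descent, for comparison)."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Toy loss L(w) = 0.5 * w^T A w with a badly conditioned A (assumed example).
A = [[10.0, 0.0], [0.0, 0.1]]


def grad_loss(w):
    """Gradient of the toy loss: A w."""
    return [A[0][0] * w[0] + A[0][1] * w[1], A[1][0] * w[0] + A[1][1] * w[1]]


w0 = [1.0, 1.0]
w_nat = natural_gradient_step(w0, grad_loss(w0), A, lr=1.0)
w_gd = plain_gradient_step(w0, grad_loss(w0), lr=0.1)
```

On this toy loss the preconditioned step reaches the minimum at the origin in one update, while the plain step with the largest stable learning rate for the steep direction barely moves along the shallow one.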

algorithm, fisher information matrix, gd algorithm, (12 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Xiong, Yuansheng, Kwon, Chulan, Oh, Jong-Hoon

The Storage Capacity of a Fully-Connected Committee Machine

We study the storage capacity of a fully-connected committee machine with a large number K of hidden nodes. The storage capacity is obtained by analyzing the geometrical structure of the weight space related to the internal representation.

committee machine, internal representation, storage capacity, (14 more...)

Country:

Asia > South Korea > Gyeongsangbuk-do > Pohang (0.05)
North America > United States (0.04)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.30)

On the Infeasibility of Training Neural Networks with Small Squared Errors

Vu, Van H.

We demonstrate that the problem of training neural networks with small (average) squared error is computationally intractable.

algorithm, neural network, threshold, (14 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Competitive On-line Linear Regression

Vovk, Volodya

We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithm resembles, but is slightly different from, the well-known ridge estimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference between our algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attained by the ridge regression procedure in general can be 4 times worse.

1 INTRODUCTION

The usual approach to regression problems is to assume that the data are generated by some stochastic mechanism and make some, typically very restrictive, assumptions about that stochastic mechanism. In recent years, however, a different approach to this kind of problem has been developed (see, e.g., DeSantis et al. [2], Littlestone and Warmuth [7]): in our context, that approach sets the goal of finding an online algorithm that performs not much worse than the best regression function found off-line; in other words, it replaces the usual statistical analyses by the competitive analysis of online algorithms. DeSantis et al. [2] performed a competitive analysis of the Bayesian merging scheme for the log-loss prediction game; later Littlestone and Warmuth [7] and Vovk [10] introduced an online algorithm (called the Weighted Majority Algorithm by the former authors) for the simple binary prediction game. These two algorithms (the Bayesian merging scheme and the Weighted Majority Algorithm) are special cases of the Aggregating Algorithm (AA) proposed in [9, 11]. The AA is a member of a wide family of algorithms called "multiplicative weight" or "exponential weight" algorithms.

Closer to the topic of this paper, Cesa-Bianchi et al. [1] performed a competitive analysis, under the square loss, of the standard Gradient Descent Algorithm, and Kivinen and Warmuth [6] complemented it by a competitive analysis of a modification of Gradient Descent, which they call the Exponentiated Gradient Algorithm.
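The online protocol and the "slight difference" from ridge regression can be sketched for scalar inputs. The excerpt does not state the prediction formula, so the form used here is an assumption taken from how this predictor is usually presented elsewhere: the only change from online ridge regression is that the current input x_t is also counted in the regularized denominator. The function names and the parameter `a` are hypothetical.

```python
def online_predictions(xs, ys, a=1.0, include_current_input=False):
    """Predict each y_t from x_t and the past pairs only, then observe y_t.

    include_current_input=False: online ridge regression,
        w_t = sum_{s<t} y_s x_s / (a + sum_{s<t} x_s^2).
    include_current_input=True: assumed Aggregating-Algorithm-style variant,
        same numerator, denominator a + sum_{s<=t} x_s^2.
    """
    sum_xx = 0.0  # running sum of x_s^2 over past rounds
    sum_xy = 0.0  # running sum of y_s * x_s over past rounds
    preds = []
    for x, y in zip(xs, ys):
        denom = a + sum_xx + (x * x if include_current_input else 0.0)
        preds.append(x * sum_xy / denom)
        # The true response is revealed only after the prediction is made.
        sum_xx += x * x
        sum_xy += x * y
    return preds


def cumulative_square_loss(preds, ys):
    """Total square loss, the quantity compared in a competitive analysis."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys))
```

A competitive analysis compares `cumulative_square_loss` for the online predictions against the loss of the best fixed linear function chosen in hindsight. Counting x_t in the denominator shrinks each prediction slightly toward zero relative to ridge regression.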

algorithm, learner, ridge regression procedure, (12 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)