AITopics

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in simulated datasets. The classifiers used include linear discriminants and nearest neighbors: application to support vector machines is also briefly described.

classification, friedman, procedure, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

Grove, Adam J., Roth, Dan

Linear Concepts and Hidden Variables: An Empirical Study

Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea or not depends on the robustness with respect to deviations from the postulated model. We study this question experimentally in a restricted, yet nontrivial and interesting case: we consider a conditionally independent attribute (CIA) model which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of anyone variable (given known values for the others) reduces to testing a linear function of the observed values. We learn CIA with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly-learned linear classifier.

algorithm, cia model, experiment, (13 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Goldberg, Paul W., Williams, Christopher K. I., Bishop, Christopher M.

Regression with Input-dependent Noise: A Gaussian Process Treatment

Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.

gaussian process, input-dependent noise, noise level, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Feraud, Raphaël, Bernier, Olivier

Ensemble and Modular Approaches for Face Detection: A Comparison

A new learning model based on autoassociative neural networks is developped and applied to face detection. To extend the detection ability in orientation and to decrease the number of false alarms, different combinations of networks are tested: ensemble, conditional ensemble and conditional mixture of networks. The use of a conditional mixture of networks allows to obtain state of the art results on different benchmark face databases. The set of all possible windows is E V uN, with V n N 0. Since collecting a representative set of non-face examples is impossible, face detection by a statistical model is a difficult task. An autoassociative network, using five layers of neurons, is able to perform a nonlinear dimensionnality reduction [Kramer, 1991].

conditional mixture, ensemble, face detector, (11 more...)

Country:

Europe > France (0.05)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Frey, Brendan J., MacKay, David J. C.

A Revolution: Belief Propagation in Graphs with Cycles

Until recently, artificial intelligence researchers have frowned upon the application of probability propagation in Bayesian belief networks that have cycles. The probability propagation algorithm is only exact in networks that are cycle-free. However, it has recently been discovered that the two best error-correcting decoding algorithms are actually performing probability propagation in belief networks with cycles. 1 Communicating over a noisy channel Our increasingly wired world demands efficient methods for communicating bits of information over physical channels that introduce errors. Examples of real-world channels include twisted-pair telephone wires, shielded cable-TV wire, fiberoptic cable, deep-space radio, terrestrial radio, and indoor radio. Engineers attempt to correct the errors introduced by the noise in these channels through the use of channel coding which adds protection to the information source, so that some channel errors can be corrected.

bayesian network, information bit, probability propagation, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Illinois (0.04)
(4 more...)

Industry:

Media > Television (0.54)
Leisure & Entertainment (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Freitas, João F. G. de, Niranjan, Mahesan, Gee, Andrew H.

Regularisation in Sequential Learning Algorithms

In this paper, we discuss regularisation in online/sequential learning algorithms. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine a confidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons is employed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and Mackay's Bayesian estimation framework. In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.

algorithm, ekf algorithm, posterior density function, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.06)
North America > United States > California > San Mateo County > San Mateo (0.04)
Africa > South Africa (0.04)

Cohen, William W., Schapire, Robert E., Singer, Yoram

Learning to Order Things

Most previous work in inductive learning has concentrated on learning to classify. However, there are many applications in which it is desirable to order rather than classify instances. An example might be a personalized email filter that gives a priority ordering to unread mail. Here we will consider the problem of learning how to construct such orderings, given feedback in the form of preference judgments, i.e., statements that one instance should be ranked ahead of another. Such orderings could be constructed based on a learned classifier or regression model, and in fact often are.

algorithm, pref, preference function, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.34)

Burger, Matthias, Graepel, Thore, Obermayer, Klaus

An Annealed Self-Organizing Map for Source Channel Coding

It is especially suited for speech and image data which in many applieations have to be transmitted under low bandwidth/high noise level conditions. Following the idea of (Farvardin, 1990) and (Luttrell, 1989) of jointly optimizing the codebook and the data representation w.r.t. to a given channel noise we apply a deterministic annealing scheme (Rose, 1990; Buhmann, 1997) to the problem and develop a An Annealed Self-Organizing Map for Source Channel Coding 431 soft topographic vector quantization algorithm (STVQ) (cf.

self-organizing map, ssom, stvq, (12 more...)

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Germany > Berlin (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Blais, Brian S., Intrator, Nathan, Shouval, Harel Z., Cooper, Leon N.

Receptive Field Formation in Natural Scene Environments: Comparison of Single Cell Learning Rules

We study several statistically and biologically motivated learning rules using the same visual environment, one made up of natural scenes, and the same single cell neuronal architecture. This allows us to concentrate on the feature extraction and neuronal coding properties of these rules. Included in these rules are kurtosis and skewness maximization, the quadratic form of the BCM learning rule, and single cell ICA. Using a structure removal method, we demonstrate that receptive fields developed using these rules depend on a small portion of the distribution. We find that the quadratic form of the BCM rule behaves in a manner similar to a kurtosis maximization rule when the distribution contains kurtotic directions, although the BCM modification equations are computationally simpler.

kurtosis, natural scene, receptive field, (14 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Industry: Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bishop, Christopher M., Lawrence, Neil D., Jaakkola, Tommi, Jordan, Michael I.

Approximating Posterior Distributions in Belief Networks Using Mixtures

Exact inference in densely connected Bayesian networks is computationally intractable, and so there is considerable interest in developing effective approximation schemes. One approach which has been adopted is to bound the log likelihood using a mean-field approximating distribution. While this leads to a tractable algorithm, the mean field distribution is assumed to be factorial and hence unimodal. In this paper we demonstrate the feasibility of using a richer class of approximating distributions based on mixtures of mean field distributions. We derive an efficient algorithm for updating the mixture parameters and apply it to the problem of learning in sigmoid belief networks. Our results demonstrate a systematic improvement over simple mean field theory as the number of mixture components is increased.

approximating posterior distribution, hlm, log likelihood, (11 more...)

Country:

Asia > Middle East > Jordan (0.07)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)