AITopics

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Tishby, Naftali, Slonim, Noam

Data Clustering by Markovian Relaxation and the Information Bottleneck Method

We introduce a new, nonparametric and principled, distance-based clustering method. This method combines a pairwise-based approach with a vector-quantization method that provides a meaningful interpretation of the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examining the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation, and are then extracted using the information bottleneck method.
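The relaxation step can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes transition probabilities P(j | i) ∝ exp(-λ d(i, j)) and a uniform distribution over starting points, and it tracks I(X_0; X_t) as the chain relaxes. The function names and the parameter `lam` are hypothetical, and the information bottleneck extraction step is omitted.

```python
import math


def transition_matrix(dist, lam=1.0):
    """Row-stochastic matrix with P[i][j] proportional to exp(-lam * dist[i][j]) (assumed form)."""
    P = []
    for row in dist:
        weights = [math.exp(-lam * d) for d in row]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mutual_information(Pt):
    """I(X_0; X_t) in bits, assuming X_0 is uniform over the n points."""
    n = len(Pt)
    # Marginal distribution of X_t when X_0 is uniform.
    q = [sum(Pt[i][j] for i in range(n)) / n for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            p = Pt[i][j]
            if p > 0.0:
                mi += (p / n) * math.log2(p / q[j])
    return mi


def relaxation_curve(dist, lam=1.0, steps=8):
    """Return [I(X_0; X_1), ..., I(X_0; X_steps)] for the chain built from dist."""
    P = transition_matrix(dist, lam)
    Pt = P
    curve = []
    for _ in range(steps):
        curve.append(mutual_information(Pt))
        Pt = matmul(Pt, P)  # advance the chain by one more step
    return curve


# Two tight pairs, {0, 1} and {2, 3}, far apart from each other.
dist = [[0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0]]
curve = relaxation_curve(dist)
print([round(v, 3) for v in curve])
```

On this toy matrix the information first drops quickly, as the walk forgets which member of its pair it started at. It then levels off near 1 bit, the single bit identifying the starting pair. That slowly decaying plateau is the kind of quasi-stable structure the abstract refers to.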

artificial intelligence, information, machine learning

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Ohio (0.14)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Sparse Kernel Principal Component Analysis

Tipping, Michael E.

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness.

1 Introduction

Principal component analysis (PCA) is a well-established technique for dimensionality reduction, and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.
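To make the "not sparse" point from the abstract concrete, here is a sketch of ordinary (non-sparse) kernel PCA. It is not Tipping's sparse method. It assumes a Gaussian kernel with an illustrative width and extracts only the leading component by power iteration. All function names and parameters are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_kernel(X, kernel=rbf):
    """Gram matrix of X, centred in feature space."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    row_mean = [sum(r) / n for r in K]
    grand_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def leading_component(Kc, iters=1000):
    """Top eigenpair of the centred Gram matrix by power iteration."""
    n = len(Kc)
    # Non-constant start vector: the constant vector lies in Kc's null space.
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v


X = [[0.0], [0.3], [1.1], [2.0], [3.7]]
Kc = centered_kernel(X)
lam, v = leading_component(Kc)
# Expansion coefficients of the first feature-space component:
# one coefficient per training vector.
alpha = [vi / math.sqrt(lam) for vi in v]
# Projections of the training points onto that component.
proj = [sum(Kc[i][j] * alpha[j] for j in range(len(X))) for i in range(len(X))]
print(alpha)
```

The component is a weighted sum of kernels centred on every training vector, and in general every weight in `alpha` is nonzero. That is the lack of sparsity the abstract describes. The paper's maximum-likelihood approximation, which this sketch does not implement, is what removes most of those terms.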

artificial intelligence, feature space, machine learning

Country: Europe > United Kingdom (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)

Szummer, Martin, Jaakkola, Tommi

Kernel Expansions with Unlabeled Examples

Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is both discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.
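The expansion-plus-EM idea can be sketched under stated assumptions. Each input is represented by its normalised kernel weights P(i | x) over all labeled and unlabeled points. The classifier is P(y | x) = Σ_i P(y | i) P(i | x), and EM fits the per-point label distributions P(y | i) from the labeled examples alone. The Gaussian kernel, its width `sigma`, and the function names are illustrative assumptions, not details taken from the abstract.

```python
import math


def membership(x, points, sigma=1.0):
    """P(i | x): normalised Gaussian weights over all (labeled + unlabeled) points."""
    w = [math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2)) for p in points]
    s = sum(w)
    return [v / s for v in w]


def fit_label_probs(points, labeled, classes, sigma=1.0, iters=50):
    """EM estimate of P(y | i) for every point i from a few labeled examples."""
    n = len(points)
    q = [{c: 1.0 / len(classes) for c in classes} for _ in range(n)]
    memb = [(membership(x, points, sigma), y) for x, y in labeled]
    for _ in range(iters):
        num = [{c: 0.0 for c in classes} for _ in range(n)]
        den = [0.0] * n
        for m, y in memb:
            # E-step: soft assignment of this labeled example to points i.
            r = [q[i][y] * m[i] for i in range(n)]
            z = sum(r)
            for i in range(n):
                num[i][y] += r[i] / z
                den[i] += r[i] / z
        # M-step: re-estimate P(y | i) from the soft assignments.
        for i in range(n):
            if den[i] > 0.0:
                q[i] = {c: num[i][c] / den[i] for c in classes}
    return q


def predict(x, points, q, sigma=1.0):
    """P(y | x) = sum_i P(y | i) P(i | x)."""
    m = membership(x, points, sigma)
    return {c: sum(q[i][c] * m[i] for i in range(len(points))) for c in q[0]}


points = [0.0, 0.2, 0.4, 0.6, 5.0, 5.2, 5.4, 5.6]  # all inputs, mostly unlabeled
labeled = [(0.0, "a"), (5.6, "b")]                   # only two labels
q = fit_label_probs(points, labeled, ["a", "b"])
print(predict(0.5, points, q))
```

The unlabeled points carry the cluster structure through P(i | x). With only two labels, each cluster still ends up with a confident label distribution.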

Mika, Sebastian, Rätsch, Gunnar, Müller, Klaus-Robert

A Mathematical Programming Approach to the Kernel Fisher Algorithm

We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD). A mathematical programming formulation based on the observation that KFD maximizes the average margin permits an interesting modification of the original KFD algorithm, yielding the sparse KFD. We find that both KFD and the proposed sparse KFD can be understood in a unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we are able to outline an interesting kernel-regression technique based upon the KFD algorithm.
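The abstract builds on the standard (non-sparse) KFD. Below is a minimal sketch of that baseline under common assumptions. It uses a Gaussian kernel and the regularised closed form α = (N + μI)^-1 (m₊ - m₋), where N is the within-class scatter of the kernel columns and m₊ and m₋ are the class means. The threshold is placed halfway between the projected class means. It does not implement the paper's mathematical-programming or sparse variant. `mu`, `gamma`, and the helper names are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * (x - y) ** 2)


def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def kfd_fit(X, y, mu=1e-3, kernel=rbf):
    """Kernel Fisher discriminant: alpha = (N + mu I)^-1 (m_pos - m_neg)."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    # Kernel columns K[:, j] grouped by the label of x_j.
    cols = {c: [[K[i][j] for i in range(n)] for j in range(n) if y[j] == c] for c in (1, -1)}
    means = {c: [sum(col[i] for col in cols[c]) / len(cols[c]) for i in range(n)] for c in cols}
    # Within-class scatter of the kernel columns, plus the mu * I regulariser.
    N = [[mu if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in cols:
        for col in cols[c]:
            d = [col[i] - means[c][i] for i in range(n)]
            for i in range(n):
                for j in range(n):
                    N[i][j] += d[i] * d[j]
    alpha = solve(N, [means[1][i] - means[-1][i] for i in range(n)])
    # Threshold halfway between the projected class means.
    proj = {c: sum(a * m for a, m in zip(alpha, means[c])) for c in means}
    b = -(proj[1] + proj[-1]) / 2.0
    return alpha, b


def kfd_predict(x, X, alpha, b, kernel=rbf):
    f = sum(a * kernel(xi, x) for a, xi in zip(alpha, X)) + b
    return 1 if f > 0 else -1


X = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
y = [1, 1, 1, -1, -1, -1]
alpha, b = kfd_fit(X, y)
print([kfd_predict(x, X, alpha, b) for x in X])
```

Every training point gets a coefficient in `alpha`. The sparse KFD described in the abstract changes the optimisation so that most of these coefficients vanish.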

artificial intelligence, kfd, machine learning

Country: Europe > Germany (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Mangasarian, Olvi L., Musicant, David R.

Active Support Vector Machine Classification

Classification is achieved by a linear or nonlinear separating surface in the input space of the dataset. In this work we propose a very fast, simple algorithm, based on an active set strategy for solving quadratic programs with bounds [18]. The algorithm is capable of accurately solving problems with millions of points and requires nothing more complicated than a commonly available linear equation solver [17, 1, 6] for a typically small (100) dimensional input space of the problem. Key to our approach are the following two changes to the standard linear SVM: 1. Maximize the margin (distance) between the parallel separating planes with respect to both orientation (w) as well as location relative to the origin (b).

algorithm, artificial intelligence, machine learning

Country:

North America > United States > Massachusetts > Middlesex County (0.29)
North America > United States > Wisconsin > Dane County > Madison (0.28)
North America > United States > Pennsylvania (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Herbrich, Ralf, Graepel, Thore

Large Scale Bayes Point Machines

The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently been demonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set), also known as the Bayes point, exhibits excellent generalisation abilities. In this paper we present a method based on the simple perceptron learning algorithm which allows us to overcome this algorithmic drawback. The method is algorithmically simple and is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digits which show that Bayes point machines (BPMs) are competitive with the current world champion, the support vector machine.
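Here is a minimal sketch in the spirit of the perceptron-based approach. It runs the perceptron on random permutations of the training set, normalises each solution, and averages them as an estimate of the centre of mass of version space. It assumes a plain linear perceptron with a constant bias feature rather than a kernel. The sample count, seed, data, and function names are illustrative assumptions.

```python
import math
import random


def perceptron(X, y, order, max_epochs=1000):
    """Classic perceptron visiting the examples in the given order until no mistakes."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for i in order:
            if y[i] * sum(wk * xk for wk, xk in zip(w, X[i])) <= 0:
                w = [wk + y[i] * xk for wk, xk in zip(w, X[i])]
                mistakes += 1
        if mistakes == 0:
            return w  # consistent with every training example
    raise RuntimeError("data not separated within max_epochs")


def bayes_point(X, y, samples=20, seed=0):
    """Average of unit-norm perceptron solutions from random orderings of the data."""
    rng = random.Random(seed)
    avg = [0.0] * len(X[0])
    for _ in range(samples):
        order = list(range(len(X)))
        rng.shuffle(order)
        w = perceptron(X, y, order)
        norm = math.sqrt(sum(v * v for v in w))
        avg = [a + v / norm for a, v in zip(avg, w)]
    return [a / samples for a in avg]


# Linearly separable toy data; the trailing 1.0 acts as a bias feature.
X = [[2.0, 1.0, 1.0], [1.5, 2.0, 1.0], [3.0, 2.5, 1.0],
     [-1.0, -1.5, 1.0], [-2.0, -0.5, 1.0], [-0.5, -2.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
w_bp = bayes_point(X, y)
print(w_bp)
```

For linear classifiers through the origin, version space is a convex cone. Each perceptron solution lies inside it, so the averaged vector is also consistent with the training set.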

artificial intelligence, generalisation error, machine learning

Country:

North America > United States > Wisconsin (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.39)

Gray, Alexander G., Moore, Andrew W.

'N-Body' Problems in Statistical Learning

We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation.
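The two-point correlation, one of the listed examples, reduces to counting pairs of points within a radius r. The sketch below contrasts the quadratic all-pairs count with a kd-tree count. The tree prunes whole bounding boxes using their minimum and maximum distance to a query point. It is a single-tree illustration under our own assumptions (leaf size 8, split on the widest dimension), not the paper's algorithms, and all names are hypothetical.

```python
import math
import random


def brute_pair_count(points, r):
    """O(N^2) reference: number of unordered pairs within distance r."""
    n = len(points)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if math.dist(points[i], points[j]) <= r)


class Node:
    """kd-tree node storing its points and their bounding box."""

    def __init__(self, pts, leaf_size=8):
        self.pts = pts
        dims = len(pts[0])
        self.lo = [min(p[d] for p in pts) for d in range(dims)]
        self.hi = [max(p[d] for p in pts) for d in range(dims)]
        self.left = self.right = None
        if len(pts) > leaf_size:
            d = max(range(dims), key=lambda k: self.hi[k] - self.lo[k])
            ordered = sorted(pts, key=lambda p: p[d])
            mid = len(ordered) // 2
            self.left = Node(ordered[:mid], leaf_size)
            self.right = Node(ordered[mid:], leaf_size)


def min_dist(q, node):
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))


def max_dist(q, node):
    return math.sqrt(sum(max(abs(x - lo), abs(x - hi)) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))


def count_within(q, node, r):
    if min_dist(q, node) > r:
        return 0                      # whole box is out of range
    if max_dist(q, node) <= r:
        return len(node.pts)          # whole box is inside the ball
    if node.left is None:
        return sum(1 for p in node.pts if math.dist(q, p) <= r)
    return count_within(q, node.left, r) + count_within(q, node.right, r)


def tree_pair_count(points, r):
    root = Node(points)
    # Each point counts itself once and every close pair twice.
    total = sum(count_within(q, root, r) for q in points)
    return (total - len(points)) // 2


rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(300)]
print(brute_pair_count(pts, 0.1), tree_pair_count(pts, 0.1))
```

The pruning rules are exact, so both counts agree. The tree version avoids touching most pairs individually when boxes are entirely inside or outside the radius.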

artificial intelligence, data mining, machine learning

Country: North America > United States (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Frey, Brendan J., Patrascu, Relu, Jaakkola, Tommi, Moran, Jodi

Sequentially Fitting "Inclusive" Trees for Inference in Noisy-OR Networks

For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom; on some occasions, a disease may fail to activate the symptom. Inference in richly connected noisy-OR networks is intractable, but approximate methods (e.g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy-OR networks that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network.

1 Inclusive variational approximations

Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T.

artificial intelligence, machine learning, symptom

Country:

North America > United States > Massachusetts (0.15)
North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Health & Medicine > Diagnostic Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)
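The noisy-OR model described in the abstract above, and the reason exact inference in it blows up, can be sketched as follows. The leak term (the chance a symptom stays off when no modeled disease is active), all probabilities, and the function names are illustrative assumptions. Exact inference here enumerates all 2^n disease configurations. That enumeration is what becomes intractable in richly connected networks such as QMR-DT, and it is what variational methods like the paper's approximate. The paper's sequential tree-fitting method itself is not shown.

```python
from itertools import product


def noisy_or(active, fail, leak_off):
    """P(symptom on | active diseases) for a noisy-OR with a leak.

    fail[d] is the chance that disease d, when present, fails to activate the symptom;
    leak_off is the chance the symptom stays off when no modeled disease is active.
    """
    p_off = leak_off
    for d in active:
        p_off *= fail.get(d, 1.0)  # diseases that are not parents never activate it
    return 1.0 - p_off


def posterior(priors, fail, leak_off, findings):
    """Exact P(disease configuration | findings) by enumerating all 2^n configurations."""
    names = list(priors)
    joint = {}
    for bits in product([0, 1], repeat=len(names)):
        active = [d for d, on in zip(names, bits) if on]
        p = 1.0
        for d, on in zip(names, bits):
            p *= priors[d] if on else 1.0 - priors[d]
        for symptom, observed in findings.items():
            p_on = noisy_or(active, fail[symptom], leak_off[symptom])
            p *= p_on if observed else 1.0 - p_on
        joint[bits] = p
    z = sum(joint.values())
    return {bits: p / z for bits, p in joint.items()}


priors = {"flu": 0.1, "cold": 0.2}
fail = {"fever": {"flu": 0.2, "cold": 0.9}}
leak_off = {"fever": 0.99}
post = posterior(priors, fail, leak_off, {"fever": 1})
print(post)
```

With these made-up numbers, observing fever raises P(flu) from its 0.1 prior to about 0.75. The enumeration doubles in cost with every added disease, which is why large diagnostic networks need approximate inference.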

Discovering Hidden Variables: A Structure-Based Approach

Elidan, Gal, Lotner, Noam, Friedman, Nir, Koller, Daphne

A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models.

artificial intelligence, bayesian inference, machine learning