AITopics

We introduce a new, nonparametric and principled, distance based clustering method. This method combines a pairwise based approach witha vector-quantization method which provide a meaningful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures during thisrelaxation, and then are extracted using the information bottleneck method.

artificial intelligence, information, machine learning, (18 more...)

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Ohio (0.14)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Sparse Kernel Principal Component Analysis

Tipping, Michael E.

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisationof the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation intoa feature space wherein standard PCA is performed. Unfortunately, thetechnique is not'sparse', since the components thus obtained are expressed in terms of kernels associated with every trainingvector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors,using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness. 1 Introduction Principal component analysis (PCA) is a well-established technique for dimensionality reduction,and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.

artificial intelligence, feature space, machine learning, (16 more...)

Country: Europe > United Kingdom (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)

Szummer, Martin, Jaakkola, Tommi

Kernel Expansions with Unlabeled Examples

Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance.We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trainedvia the EM algorithm, which in this case is both discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulationof the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requiresvery few labeled examples for high classification accuracy.

Mika, Sebastian, Rätsch, Gunnar, Müller, Klaus-Robert

A Mathematical Programming Approach to the Kernel Fisher Algorithm

We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD).A mathematical programming formulation based on the observation thatKFD maximizes the average margin permits an interesting modification of the original KFD algorithm yielding the sparse KFD. We find that both, KFD and the proposed sparse KFD, can be understood in an unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we are able to outline an interesting kernel-regression technique based upon the KFD algorithm.

artificial intelligence, kfd, machine learning, (18 more...)

Country: Europe > Germany (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Mangasarian, Olvi L., Musicant, David R.

Active Support Vector Machine Classification

Classificationis achieved by a linear or nonlinear separating surface in the input space of the dataset. In this work we propose a very fast simple algorithm, based on an active set strategy for solving quadratic programs with bounds [18]. The algorithm is capable of accurately solving problems with millions of points and requires nothing more complicated than a commonly available linear equation solver [17, 1, 6] for a typically small (100) dimensional input space of the problem. Key to our approach are the following two changes to the standard linear SVM: 1. Maximize the margin (distance) between the parallel separating planes with respect to both orientation (w) as well as location relative to the origin b).

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County (0.29)
North America > United States > Wisconsin > Dane County > Madison (0.28)
North America > United States > Pennsylvania (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Lu, Wei, Rajapakse, Jagath C.

Constrained Independent Component Analysis

The paper presents a novel technique of constrained independent component analysis (CICA) to introduce constraints into the classical ICAand solve the constrained optimization problem by using Lagrange multiplier methods. This paper shows that CICA can be used to order the resulted independent components in a specific manner and normalize the demixing matrix in the signal separation procedure. It can systematically eliminate the ICA's indeterminacy on permutation and dilation. The experiments demonstrate the use of CICA in ordering of independent components while providing normalized demixing processes. Keywords: Independent component analysis, constrained independent componentanalysis, constrained optimization, Lagrange multiplier methods 1 Introduction Independent component analysis (ICA) is a technique to transform a multivariate randomsignal into a signal with components that are mutually independent in complete statistical sense [1]. There has been a growing interest in research for efficient realization of ICA neural networks (ICNNs).

artificial intelligence, lagrange multiplier method, machine learning, (16 more...)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.77)

Lodhi, Huma, Shawe-Taylor, John, Cristianini, Nello, Watkins, Christopher J. C. H.

Text Classification using String Kernels

HUlna Lodhi John Shawe-Taylor N ello Cristianini Chris Watkins Department of Computer Science Royal Holloway, University of London Egham, Surrey TW20 OEX, UK {huma, john, nello, chrisw}Cdcs.rhbnc.ac.uk Abstract We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence ofk characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation ofthis feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique.

artificial intelligence, kernel, machine learning, (14 more...)

Industry: Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Herbrich, Ralf, Graepel, Thore

Large Scale Bayes Point Machines

The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently beendemonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set) - also known as the Bayes point - exhibits excellent generalisationabilities. In this paper we present a method based on the simple perceptron learning algorithm which allows to overcome this algorithmic drawback. The method is algorithmically simpleand is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digitswhich show that Bayes point machines (BPMs) are competitive with the current world champion, the support vector machine.

artificial intelligence, generalisation error, machine learning, (17 more...)

Country:

North America > United States > Wisconsin (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.39)

Gray, Alexander G., Moore, Andrew W.

`N-Body' Problems in Statistical Learning

We present efficient algorithms for all-point-pairs problems, or'Nbody'-like problems,which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation.

artificial intelligence, data mining, machine learning, (18 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Frey, Brendan J., Patrascu, Relu, Jaakkola, Tommi, Moran, Jodi

Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Forexample, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, butapproximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem withmost approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring otherplausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy OR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnosticnetwork. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T.

artificial intelligence, machine learning, symptom, (14 more...)

Country:

North America > United States > Massachusetts (0.15)
North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Health & Medicine > Diagnostic Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)