AITopics

The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e.

algorithm, basis function, mpmc, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Efficient and Robust Feature Extraction by Maximum Margin Criterion

Li, Haifeng, Jiang, Tao, Zhang, Keshu

A new feature extraction criterion, maximum margin criterion (MMC), is proposed in this paper. This new criterion is general in the sense that, when combined with a suitable constraint, it can actually give rise to the most popular feature extractor in the literature, linear discriminate analysis (LDA).

criterion, extractor, feature extractor, (11 more...)

Country:

North America > United States > New York (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Sarasota County > Sarasota (0.04)
North America > United States > California > Riverside County > Riverside (0.04)

Genre: Research Report (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Data Science > Data Mining (0.86)

Fischer, Bernd, Roth, Volker, Buhmann, Joachim M.

Clustering with the Connectivity Kernel

Clustering aims at extracting hidden structure in dataset. While the problem of finding compact clusters has been widely studied in the literature, extracting arbitrarily formed elongated structures is considered a much harder problem. In this paper we present a novel clustering algorithm which tackles the problem by a two step procedure: first the data are transformed in such a way that elongated structures become compact ones. In a second step, these new objects are clustered by optimizing a compactness-based criterion. The advantages of the method over related approaches are threefold: (i) robustness properties of compactness-based criteria naturally transfer to the problem of extracting elongated structures, leading to a model which is highly robust against outlier objects; (ii) the transformed distances induce a Mercer kernel which allows us to formulate a polynomial approximation scheme to the generally N P-hard clustering problem; (iii) the new method does not contain free kernel parameters in contrast to methods like spectral clustering or mean-shift clustering.

algorithm, dissimilarity, elongated structure, (16 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Fukumizu, Kenji, Bach, Francis R., Jordan, Michael I.

Kernel Dimensionality Reduction for Supervised Learning

We propose a novel method of dimensionality reduction for supervised learning. Given a regression or classification problem in which we wish to predict a variable Y from an explanatory vector X, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" of X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we characterize the notion of conditional independence using covariance operators on reproducing kernel Hilbert spaces; this allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y.

conditional independence, dimensionality reduction, reduction, (10 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.85)

Bie, Tijl D., Cristianini, Nello

Convex Methods for Transduction

The 2-class transduction problem, as formulated by Vapnik [1], involves finding a separating hyperplane for a labelled data set that is also maximally distant from a given set of unlabelled test points. In this form, the problem has exponential computational complexity in the size of the working set. So far it has been attacked by means of integer programming techniques [2] that do not scale to reasonable problem sizes, or by local search procedures [3]. In this paper we present a relaxation of this task based on semidefinite programming (SDP), resulting in a convex optimization problem that has polynomial complexity in the size of the data set. The results are very encouraging for mid sized data sets, however the cost is still too high for large scale problems, due to the high dimensional search space. To this end, we restrict the feasible region by introducing an approximation based on solving an eigenproblem. With this approximation, the computational cost of the algorithm is such that problems with more than 1000 points can be treated.

approximation, constraint, matrix, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > California > Yolo County > Davis (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

Zhu, Ji, Rosset, Saharon, Tibshirani, Robert, Hastie, Trevor J.

1-norm Support Vector Machines

The standard 2-norm SVM is known for its good performance in twoclass classi£cation. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an ef£cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM.

1-norm svm, penalty, svm, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.47)
Health & Medicine > Therapeutic Area > Hematology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Schultz, Matthew, Joachims, Thorsten

Learning a Distance Metric from Relative Comparisons

This paper presents a method for learning a distance metric from relative comparison such as "A is closer to B than A is to C". Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents.

constraint, distance metric, relative comparison, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > Tompkins County > Ithaca (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Graepel, Thore, Herbrich, Ralf

Invariant Pattern Recognition by Semi-Definite Programming Machines

Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm-- the Semidefinite Programming Machine (SDPM)--which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods.

constraint, polynomial, trajectory, (16 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.40)

Taskar, Ben, Guestrin, Carlos, Koller, Daphne

Max-Margin Markov Networks

In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees.

classification, constraint, markov network, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ihler, Alexander T., Sudderth, Erik B., Freeman, William T., Willsky, Alan S.

Efficient Multiscale Sampling from Products of Gaussian Mixtures

The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter ɛ of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods.

algorithm, product density, sampler, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)