AITopics

We propose a novel method of dimensionality reduction for supervised learning. Given a regression or classification problem in which we wish to predict a variable Y from an explanatory vector X, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" of X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we characterize the notion of conditional independence using covariance operators on reproducing kernel Hilbert spaces; this allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y.

conditional independence, dimensionality reduction, reduction, (10 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.85)

Bie, Tijl D., Cristianini, Nello

Convex Methods for Transduction

The 2-class transduction problem, as formulated by Vapnik [1], involves finding a separating hyperplane for a labelled data set that is also maximally distant from a given set of unlabelled test points. In this form, the problem has exponential computational complexity in the size of the working set. So far it has been attacked by means of integer programming techniques [2] that do not scale to reasonable problem sizes, or by local search procedures [3]. In this paper we present a relaxation of this task based on semidefinite programming (SDP), resulting in a convex optimization problem that has polynomial complexity in the size of the data set. The results are very encouraging for mid sized data sets, however the cost is still too high for large scale problems, due to the high dimensional search space. To this end, we restrict the feasible region by introducing an approximation based on solving an eigenproblem. With this approximation, the computational cost of the algorithm is such that problems with more than 1000 points can be treated.

approximation, constraint, matrix, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > California > Yolo County > Davis (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

Andrews, Stuart, Hofmann, Thomas

Multiple Instance Learning via Disjunctive Programming Boosting

Learning from ambiguous training data is highly relevant in many applications. We present a new learning algorithm for classification problems where labels are associated with sets of pattern instead of individual patterns. This encompasses multiple instance learning as a special case. Our approach is based on a generalization of linear programming boosting and uses results from disjunctive programming to generate successively stronger linear relaxations of a discrete non-convex problem.

algorithm, constraint, relaxation, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Rhode Island > Providence County > Providence (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)

Tsuda, Koji, Rätsch, Gunnar

Image Reconstruction by Linear Programming

A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion.

noise, pixel, reconstruction error, (14 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Zhu, Ji, Rosset, Saharon, Tibshirani, Robert, Hastie, Trevor J.

1-norm Support Vector Machines

The standard 2-norm SVM is known for its good performance in twoclass classi£cation. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an ef£cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM.

1-norm svm, penalty, svm, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.47)
Health & Medicine > Therapeutic Area > Hematology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Schultz, Matthew, Joachims, Thorsten

Learning a Distance Metric from Relative Comparisons

This paper presents a method for learning a distance metric from relative comparison such as "A is closer to B than A is to C". Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents.

constraint, distance metric, relative comparison, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > Tompkins County > Ithaca (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Graepel, Thore, Herbrich, Ralf

Invariant Pattern Recognition by Semi-Definite Programming Machines

Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm-- the Semidefinite Programming Machine (SDPM)--which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods.

constraint, polynomial, trajectory, (16 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.40)

Griffiths, Thomas L., Jordan, Michael I., Tenenbaum, Joshua B., Blei, David M.

Hierarchical Topic Models and the Nested Chinese Restaurant Process

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting--which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.

chinese restaurant process, hierarchy, restaurant, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.05)
(2 more...)

Industry: Consumer Products & Services > Restaurants (0.90)

Torresani, Lorenzo, Hertzmann, Aaron, Bregler, Christoph

Learning Non-Rigid 3D Shape from 2D Motion

This paper presents an algorithm for learning the time-varying shape of a nonrigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data.

algorithm, deformation, vec, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

Vovk, Vladimir, Shafer, Glenn, Nouretdinov, Ilia

Self-calibrating Probability Forecasting

In the problem of probability forecasting the learner's goal is to output, given a training set and a new object, a suitable probability measure on the possible values of the new object's label. An online algorithm for probability forecasting is said to be well-calibrated if the probabilities it outputs agree with the observed frequencies. We give a natural nonasymptotic formalization of the notion of well-calibratedness, which we then study under the assumption of randomness (the object/label pairs are independent and identically distributed). It turns out that, although no probability forecasting algorithm is automatically well-calibrated in our sense, there exists a wide class of algorithms for "multiprobability forecasting" (such algorithms are allowed to output a set, ideally very narrow, of probability measures) which satisfy this property; we call the algorithms in this class "Venn probability machines". Our experimental results demonstrate that a 1-Nearest Neighbor Venn probability machine performs reasonably well on a standard benchmark data set, and one of our theoretical results asserts that a simple Venn probability machine asymptotically approaches the true conditional probabilities regardless, and without knowledge, of the true probability measure generating the examples.

probability, probability machine, probability measure, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.05)
Europe > United Kingdom (0.04)
(4 more...)

Genre: Research Report (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)