
Collaborating Authors: Corduneanu, Adrian


Continuation Methods for Mixing Heterogenous Sources

arXiv.org Machine Learning

A number of modern learning tasks involve estimation from heterogeneous information sources. This includes classification with labeled and unlabeled data, as well as other problems with analogous structure such as competitive (game-theoretic) problems. The associated estimation problems can typically be reduced to solving a set of fixed-point equations (consistency conditions). We introduce a general method for combining a preferred information source with another in this setting by evolving continuous paths of fixed points at intermediate allocations. We explicitly identify critical points along the unique paths, either to increase the stability of estimation or to ensure a significant departure from the initial source. The homotopy continuation approach is guaranteed to terminate at the second source and involves no combinatorial effort. We illustrate the power of these ideas both in classification tasks with labeled and unlabeled data and in a competitive (min-max) formulation of DNA sequence motif discovery.
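
To make the continuation idea concrete, here is a minimal sketch that follows the fixed points of a convex blend of two maps as the allocation parameter moves from one source to the other. The maps f1 and f2 are toy stand-ins for the consistency conditions induced by each source, and the warm-started fixed-point iteration is an illustrative solver; the paper's method additionally identifies critical points along the path, which this sketch omits.

    import numpy as np

    def continuation_path(f_source, f_target, x0, steps=100, tol=1e-10, max_iter=500):
        # Follow the fixed points of the blended map
        #   F_lam(x) = (1 - lam) * f_source(x) + lam * f_target(x)
        # as the allocation lam moves from 0 (preferred source) to 1 (second source).
        x = np.asarray(x0, dtype=float).copy()
        path = []
        for lam in np.linspace(0.0, 1.0, steps + 1):
            for _ in range(max_iter):   # warm-start from the previous solution
                x_new = (1.0 - lam) * f_source(x) + lam * f_target(x)
                done = np.linalg.norm(x_new - x) < tol
                x = x_new
                if done:
                    break
            path.append((lam, x.copy()))
        return path

    # Toy contraction maps standing in for the two sources' consistency conditions.
    f1 = lambda x: 0.5 * np.tanh(x) + 0.1
    f2 = lambda x: 0.5 * np.tanh(x - 1.0) - 0.1
    path = continuation_path(f1, f2, x0=np.zeros(1))
    print(path[0][1], path[-1][1])   # fixed points at lam = 0 and lam = 1

Because each blended map here is a contraction, plain fixed-point iteration converges at every allocation; warm-starting from the previous solution is what keeps the computed points on a single continuous path.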


On Information Regularization

arXiv.org Machine Learning

We formulate a principle for classification with knowledge of the marginal distribution over the data points (unlabeled data). The principle is cast in terms of Tikhonov-style regularization, where the regularization penalty articulates the way in which the marginal density should constrain otherwise unrestricted conditional distributions. Specifically, it penalizes any information introduced between the examples and labels beyond what is provided by the available labeled examples. The work extends Szummer and Jaakkola's information regularization (NIPS 2002) to multiple dimensions, providing a regularizer independent of the covering of the space used in the derivation. We show in addition how the information regularizer can be used as a measure of the complexity of the classification task with unlabeled data, and we prove a relevant sample-complexity bound. We illustrate the regularization principle in practice by restricting the class of conditional distributions to logistic regression models and constructing the regularization penalty from a finite set of unlabeled examples.
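
As a hedged sketch of the logistic-regression instantiation: the region_information term below computes the mutual information between an example drawn uniformly from a small region of unlabeled points and its label under the current model, i.e. the average KL divergence from each conditional p(y|x) to the region's marginal p(y). The region construction, the toy data, and the use of SciPy's general-purpose optimizer are illustrative assumptions, not the paper's exact construction.

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def region_information(p):
        # I(x; y) for x uniform over the region: the mean KL divergence
        # between each conditional p(y=1|x) and the region marginal p(y=1).
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        p_bar = p.mean()
        return np.mean(p * np.log(p / p_bar)
                       + (1.0 - p) * np.log((1.0 - p) / (1.0 - p_bar)))

    def objective(w, X_lab, y_lab, regions, lam):
        # Labeled log-loss plus the information penalty over unlabeled regions.
        q = np.clip(sigmoid(X_lab @ w), 1e-12, 1.0 - 1e-12)
        nll = -np.mean(y_lab * np.log(q) + (1.0 - y_lab) * np.log(1.0 - q))
        penalty = sum(region_information(sigmoid(Xr @ w)) for Xr in regions)
        return nll + lam * penalty

    # Toy problem: two labeled points and one unlabeled region between them.
    X_lab = np.array([[-2.0, 1.0], [2.0, 1.0]])   # second column is a bias term
    y_lab = np.array([0.0, 1.0])
    regions = [np.array([[-0.5, 1.0], [0.0, 1.0], [0.5, 1.0]])]
    w_hat = minimize(objective, np.zeros(2), args=(X_lab, y_lab, regions, 0.5)).x

The penalty discourages the decision boundary from cutting through the dense unlabeled region: any weight vector that splits the region's labels sharply introduces information between x and y there and pays for it in the objective.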


Distributed Information Regularization on Graphs

Neural Information Processing Systems

We provide a principle for semi-supervised learning based on optimizing the rate of communicating labels for unlabeled points with side information. The side information is expressed in terms of identities of sets of points or regions, with the purpose of biasing the labels in each region to be the same. The resulting regularization objective is convex, has a unique solution, and the solution can be found with a pair of local propagation operations on graphs induced by the regions. We analyze the properties of the algorithm and demonstrate its performance on document classification tasks.
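
A minimal sketch of the alternating propagation, assuming a bipartite graph between points and regions and geometric averaging in both directions; the propagate function and its update details are illustrative stand-ins rather than the paper's exact operators.

    import numpy as np

    def propagate(P0, regions, labeled, n_iter=50):
        # P0: (n_points, n_classes) initial conditional label distributions.
        # regions: list of lists of point indices; labeled: {point: class}.
        P = np.asarray(P0, dtype=float).copy()
        k = P.shape[1]
        for i, y in labeled.items():         # clamp observed labels
            P[i] = np.eye(k)[y]
        for _ in range(n_iter):
            # Region step: summarize member points by a normalized geometric mean.
            Q = []
            for R in regions:
                q = np.exp(np.mean(np.log(P[R] + 1e-12), axis=0))
                Q.append(q / q.sum())
            # Point step: each unlabeled point combines the regions containing it.
            for i in range(P.shape[0]):
                logs = [np.log(Q[r] + 1e-12)
                        for r, R in enumerate(regions) if i in R]
                if logs and i not in labeled:
                    p = np.exp(np.mean(logs, axis=0))
                    P[i] = p / p.sum()
        return P

    # Three points in a single region; point 0 is labeled with class 1.
    P = propagate(np.full((3, 2), 0.5), regions=[[0, 1, 2]], labeled={0: 1})
    print(P.round(3))   # the unlabeled points are pulled toward class 1

Both steps are purely local (each touches one region or one point at a time), which is what makes the propagation distributable over the region-induced graph; convexity of the underlying objective is what lets such local updates reach the unique solution.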