Goto

Collaborating Authors

 Genre


Multi-Stage Multi-Task Feature Learning

arXiv.org Machine Learning

Multi-task sparse feature learning aims to improve the generalization performance by exploiting the shared features among tasks. It has been successfully applied to many applications including computer vision and biomedical informatics. Most of the existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal, due to its looseness for approximating an $\ell_0$-type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel non-convex regularizer. To solve the non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature Learning (MSMTFL) algorithm; we also provide intuitive interpretations, detailed convergence and reproducibility analysis for the proposed algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL in comparison with the state of the art multi-task sparse feature learning algorithms.


Typed Answer Set Programming and Inverse Lambda Algorithms

arXiv.org Artificial Intelligence

Our broader goal is to automatically translate English sentences into formulas in appropriate knowledge representation languages as a step towards understanding and thus answering questions with respect to English text. Our focus in this paper is on the language of Answer Set Programming (ASP). Our approach to translate sentences to ASP rules is inspired by Montague's use of lambda calculus formulas as meaning of words and phrases. With ASP as the target language the meaning of words and phrases are ASP-lambda formulas. In an earlier work we illustrated our approach by manually developing a dictionary of words and their ASP-lambda formulas. However such an approach is not scalable. In this paper our focus is on two algorithms that allow one to construct ASP-lambda formulas in an inverse manner. In particular the two algorithms take as input two lambda-calculus expressions G and H and compute a lambda-calculus expression F such that F with input as G, denoted by F@G, is equal to H; and similarly G@F = H. We present correctness and complexity results about these algorithms. To do that we develop the notion of typed ASP-lambda calculus theories and their orders and use it in developing the completeness results. (To appear in Theory and Practice of Logic Programming.)


Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

arXiv.org Artificial Intelligence

Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.


A Generalized Mean Field Algorithm for Variational Inference in Exponential Families

arXiv.org Machine Learning

The mean field methods, which entail approximating intractable probability distributions variationally with distributions from a tractable family, enjoy high efficiency, guaranteed convergence, and provide lower bounds on the true likelihood. But due to requirement for model-specific derivation of the optimization equations and unclear inference quality in various models, it is not widely used as a generic approximate inference algorithm. In this paper, we discuss a generalized mean field theory on variational approximation to a broad class of intractable distributions using a rich set of tractable distributions via constrained optimization over distribution spaces. We present a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which entails limiting the optimization over the class of cluster-factorizable distributions. GMF is a generic method requiring no model-specific derivations. It factors a complex model into a set of disjoint variable clusters, and uses a set of canonical fix-point equations to iteratively update the cluster distributions, and converge to locally optimal cluster marginals that preserve the original dependency structure within each cluster, hence, fully decomposed the overall inference problem. We empirically analyzed the effect of different tractable family (clusters of different granularity) on inference quality, and compared GMF with BP on several canonical models. Possible extension to higher-order MF approximation is also discussed.


Coalition Structure Generation over Graphs

Journal of Artificial Intelligence Research

We give the analysis of the computational complexity of coalition structure generation over graphs. Given an undirected graph G = (N,E) and a valuation function v : P(N) โ†’ R over the subsets of nodes, the problem is to find a partition of N into connected subsets, that maximises the sum of the components values. This problem is generally NPcomplete; in particular, it is hard for a defined class of valuation functions which are independent of disconnected membersthat is, two nodes have no effect on each others marginal con- tribution to their vertex separator. Nonetheless, for all such functions we provide bounds on the complexity of coalition structure generation over general and minorfree graphs. Our proof is constructive and yields algorithms for solving corresponding instances of the problem. Furthermore, we derive linear time bounds for graphs of bounded treewidth. However, as we show, the problem remains NPcomplete for planar graphs, and hence, for any K_k minorfree graphs where k โ‰ฅ 5. Moreover, a 3-SAT problem with m clauses can be represented by a coalition structure generation problem over a planar graph with O(m^2) nodes. Importantly, our hardness result holds for a particular subclass of valuation functions, termed edge sum, where the value of each subset of nodes is simply determined by the sum of given weights of the edges in the induced subgraph.


A Distance-Based Branch and Bound Feature Selection Algorithm

arXiv.org Machine Learning

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling.


On Information Regularization

arXiv.org Machine Learning

We formulate a principle for classification with the knowledge of the marginal distribution over the data points (unlabeled data). The principle is cast in terms of Tikhonov style regularization where the regularization penalty articulates the way in which the marginal density should constrain otherwise unrestricted conditional distributions. Specifically, the regularization penalty penalizes any information introduced between the examples and labels beyond what is provided by the available labeled examples. The work extends Szummer and Jaakkola's information regularization (NIPS 2002) to multiple dimensions, providing a regularizer independent of the covering of the space used in the derivation. We show in addition how the information regularizer can be used as a measure of complexity of the classification task with unlabeled data and prove a relevant sample-complexity bound. We illustrate the regularization principle in practice by restricting the class of conditional distributions to be logistic regression models and constructing the regularization penalty from a finite set of unlabeled examples.


Boltzmann Machine Learning with the Latent Maximum Entropy Principle

arXiv.org Machine Learning

We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes maximum entropy principle and from standard maximum likelihood estimation.We demonstrate the LME principle BY deriving new algorithms for Boltzmann machine parameter estimation, and show how robust and fast new variant of the EM algorithm can be developed.Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.


Learning Measurement Models for Unobserved Variables

arXiv.org Machine Learning

Observed associations in a database may be due in whole or part to variations in unrecorded ("latent") variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share only a single latent common cause, standard search algorithms for causal Bayes nets can infer structural relations between latent variables. We introduce an algorithm for discovering such partitions when they exist. Uniquely among available procedures, the algorithm is (asymptotically) correct under standard assumptions in causal Bayes net search algorithms, requires no prior knowledge of the number of latent variables, and does not depend on the mathematical form of the relationships among the latent variables. We evaluate the algorithm on a variety of simulated data sets.


Locally Weighted Naive Bayes

arXiv.org Machine Learning

Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome naive Bayes primary weakness - attribute independence - and improve the performance of the algorithm. This paper presents a locally weighted version of naive Bayes that relaxes the independence assumption by learning local models at prediction time. Experimental results show that locally weighted naive Bayes rarely degrades accuracy compared to standard naive Bayes and, in many cases, improves accuracy dramatically. The main advantage of this method compared to other techniques for enhancing naive Bayes is its conceptual and computational simplicity.