Goto

Collaborating Authors

 Africa


Scalable Sparse Covariance Estimation via Self-Concordance

AAAI Conferences

We consider the class of convex minimization problems, composed of a self-concordant function, such as the logdet metric, a convex data fidelity term h(.) and, a regularizing — possibly non-smooth — function g(.). This type of problems have recently attracted a great deal of interest, mainly due to their omnipresence in top-notch applications. Under this locally Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.


Adaptive Singleton-Based Consistencies

AAAI Conferences

Singleton-based consistencies have been shown to dramatically improve the performance of constraint solvers on some difficult instances. However, they are in general too expensive to be applied exhaustively during the whole search. In this paper, we focus on partition-one-AC, a singleton-based consistency which, as opposed to singleton arc consistency, is able to prune values on all variables when it performs singleton tests on one of them. We propose adaptive variants of partition-one-AC that do not necessarily run until having proved the fixpoint. The pruning can be weaker than the full version but the computational effort can be significantly reduced. Our experiments show that adaptive Partition-one-AC can obtain significant speed-ups over arc consistency and over the full version of partition-one-AC.


Semantic Data Representation for Improving Tensor Factorization

AAAI Conferences

Predicting human activities is important for improving recommender systems or analyzing social relationships among users. Those human activities are usually repre- sented as multi-object relationships (e.g. user’s tagging activities for items or user’s tweeting activities at some locations). Since multi-object relationships are naturally represented as a tensor, tensor factorization is becom- ing more important for predicting users’ possible ac- tivities. However, its prediction accuracy is weak for ambiguous and/or sparsely observed objects. Our so- lution, Semantic data Representation for Tensor Fac- torization (SRTF), tackles these problems by incorpo- rating semantics into tensor factorization based on the following ideas: (1) It first links objects to vocabu- laries/taxonomies and resolves the ambiguity caused by objects that can be used for multiple purposes. (2) It next links objects to composite classes that merge classes in different kinds of vocabularies/taxonomies (e.g. classes in vocabularies for movie genres and those for directors) to avoid low prediction accuracy caused by rough-grained semantics. (3) It then lifts sparsely observed objects into their classes to solve the sparsity problem for rarely observed objects. To the best of our knowledge, this is the first study that leverages seman- tics to inject expert knowledge into tensor factorization. Experiments show that SRTF achieves up to 10% higher accuracy than state-of-the-art methods.


Learning the Structure of Probabilistic Graphical Models with an Extended Cascading Indian Buffet Process

AAAI Conferences

This paper presents an extension of the cascading Indian buffet process (CIBP) intended to learning arbitrary directed acyclic graph structures as opposed to the CIBP, which is limited to purely layered structures. The extended cascading Indian buffet process (eCIBP) essentially consists in adding an extra sampling step to the CIBP to generate connections between non-consecutive layers. In the context of graphical model structure learning, the proposed approach allows learning structures having an unbounded number of hidden random variables and automatically selecting the model complexity. We evaluated the extended process on multivariate density estimation and structure identification tasks by measuring the structure complexity and predictive performance. The results suggest the extension leads to extracting simpler graphs without scarifying predictive precision.


Dynamic Bayesian Probabilistic Matrix Factorization

AAAI Conferences

Collaborative filtering algorithms generally rely on the assumption that user preference patterns remain stationary. However, real-world relational data are seldom stationary. User preference patterns may change over time, giving rise to the requirement of designing collaborative filtering systems capable of detecting and adapting to preference pattern shifts. Motivated by this observation, in this paper we propose a dynamic Bayesian probabilistic matrix factorization model, designed for modeling time-varying distributions. Formulation of our model is based on imposition of a dynamic hierarchical Dirichlet process (dHDP) prior over the space of probabilistic matrix factorization models to capture the time-evolving statistical properties of modeled sequential relational datasets. We develop a simple Markov Chain Monte Carlo sampler to perform inference. We present experimental results to demonstrate the superiority of our temporal model.


DFacTo: Distributed Factorization of Tensors

arXiv.org Machine Learning

We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to efficiently address a computationally challenging sub-step of both algorithms. Our algorithm, DFacTo, only requires two sparse matrix-vector products and is easy to parallelize. DFacTo is not only scalable but also on average 4 to 10 times faster than competing algorithms on a variety of datasets. For instance, DFacTo only takes 480 seconds on 4 machines to perform one iteration of the ALS algorithm and 1,143 seconds to perform one iteration of the GD algorithm on a 6.5 million x 2.5 million x 1.5 million dimensional tensor with 1.2 billion non-zero entries.


Scalable sparse covariance estimation via self-concordance

arXiv.org Machine Learning

We consider the class of convex minimization problems, composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$ and, a regularizing -- possibly non-smooth -- function $g(\cdot)$. This type of problems have recently attracted a great deal of interest, mainly due to their omnipresence in top-notch applications. Under this \emph{locally} Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.


Topic-Based Dissimilarity and Sensitivity Models for Translation Rule Selection

Journal of Artificial Intelligence Research

Translation rule selection is a task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection which incorporates global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions.With these topic distributions, we propose a topic dissimilarity model to select desirable (less dissimilar) rules by imposing penalties for rules with a large value of dissimilarity of their topic distributions to those of given documents. In order to encourage the use of non-topic specific translation rules, we also present a topic sensitivity model to balance translation rule selection between generic rules and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from topic information of both the source and target language. We integrate the proposed topic dissimilarity and sensitivity model into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection model can substantially improve translation quality.


Probabilistic Archetypal Analysis

arXiv.org Machine Learning

Archetypal analysis (AA) represents observations as composition of pure patterns, i.e., archetypes, or equivalently convex combinations of extreme values (Cutler and Breiman, 1994). Although AA bears resemblance with many well established prototypical analysis tools, such as principal component analysis (PCA, Mohamed et al, 2009), nonnegative matrix factorization (NMF, F evotte and Idier, 2011), probabilistic latent semantic analysis (Hofmann, 2013), andk -means (Steinley, 2006); AA is arguably unique, both conceptually and computationally . Conceptually, AA imitates the human tendency of representing a group of objects by its extreme elements (Davis and Love, 2010): this makes AA an interesting exploratory tool for applied scientists (e.g., Eugster, 2012; Seiler and Wohlrabe, 2013). Computationally, AA is data-driven, and requires the factors to be probability vectors: these make AA a computationally demanding tool, yet brings better interpretability . The concept of AA was originally formulated by Cutler and Breiman (1994).


Inapproximability of Treewidth and Related Problems

Journal of Artificial Intelligence Research

Graphical models, such as Bayesian Networks and Markov networks play an important role in artificial intelligence and machine learning. Inference is a central problem to be solved on these networks. This, and other problems on these graph models are often known to be hard to solve in general, but tractable on graphs with bounded Treewidth. Therefore, finding or approximating the Treewidth of a graph is a fundamental problem related to inference in graphical models. In this paper, we study the approximability of a number of graph problems: Treewidth and Pathwidth of graphs, Minimum Fill-In, One-Shot Black (and Black-White) pebbling costs of directed acyclic graphs, and a variety of different graph layout problems such as Minimum Cut Linear Arrangement and Interval Graph Completion. We show that, assuming the recently introduced Small Set Expansion Conjecture, all of these problems are NP-hard to approximate to within any constant factor in polynomial time.