Goto

Collaborating Authors

 Europe


High-dimensional Sparse Inverse Covariance Estimation using Greedy Methods

arXiv.org Machine Learning

In this paper we consider the task of estimating the non-zero pattern of the sparse inverse covariance matrix of a zero-mean Gaussian random vector from a set of iid samples. Note that this is also equivalent to recovering the underlying graph structure of a sparse Gaussian Markov Random Field (GMRF). We present two novel greedy approaches to solving this problem. The first estimates the non-zero covariates of the overall inverse covariance matrix using a series of global forward and backward greedy steps. The second estimates the neighborhood of each node in the graph separately, again using greedy forward and backward steps, and combines the intermediate neighborhoods to form an overall estimate. The principal contribution of this paper is a rigorous analysis of the sparsistency, or consistency in recovering the sparsity pattern of the inverse covariance matrix. Surprisingly, we show that both the local and global greedy methods learn the full structure of the model with high probability given just $O(d\log(p))$ samples, which is a \emph{significant} improvement over state of the art $\ell_1$-regularized Gaussian MLE (Graphical Lasso) that requires $O(d^2\log(p))$ samples. Moreover, the restricted eigenvalue and smoothness conditions imposed by our greedy methods are much weaker than the strong irrepresentable conditions required by the $\ell_1$-regularization based methods. We corroborate our results with extensive simulations and examples, comparing our local and global greedy methods to the $\ell_1$-regularized Gaussian MLE as well as the Neighborhood Greedy method to that of nodewise $\ell_1$-regularized linear regression (Neighborhood Lasso).


Estimation And Selection Via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

arXiv.org Machine Learning

The $\ell_1$-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted $\ell_1$-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted $\ell_1$-penalized estimator in sparse, high-dimensional settings where the number of predictors $p$ can be much larger than the sample size $n$. Adaptive Lasso is considered as a special case. A multistage method is developed to apply an adaptive Lasso recursively. We provide $\ell_q$ oracle inequalities, a general selection consistency theorem, and an upper bound on the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results.


Dr.Fill: Crosswords and an Implemented Solver for Singly Weighted CSPs

Journal of Artificial Intelligence Research

We describe Dr.Fill, a program that solves American-style crossword puzzles. From a technical perspective, Dr.Fill works by converting crosswords to weighted CSPs, and then using a variety of novel techniques to find a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Filll's performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top fifty or so crossword solvers in the world.


Document Clustering based on Topic Maps

arXiv.org Artificial Intelligence

Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collection of documents like World Wide Web (WWW). The next challenge lies in semantically performing clustering based on the semantic contents of the document. The problem of document clustering has two main components: (1) to represent the document in such a form that inherently captures semantics of the text. This may also help to reduce dimensionality of the document, and (2) to define a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs which have higher semantic relationship. Feature space of the documents can be very challenging for document clustering. A document may contain multiple topics, it may contain a large set of class-independent general-words, and a handful class-specific core-words. With these features in mind, traditional agglomerative clustering algorithms, which are based on either Document Vector model (DVM) or Suffix Tree model (STC), are less efficient in producing results with high cluster quality. This paper introduces a new approach for document clustering based on the Topic Map representation of the documents. The document is being transformed into a compact form. A similarity measure is proposed based upon the inferred information through topic maps data and structures. The suggested method is implemented using agglomerative hierarchal clustering and tested on standard Information retrieval (IR) datasets. The comparative experiment reveals that the proposed approach is effective in improving the cluster quality.


Convex Optimization without Projection Steps

arXiv.org Artificial Intelligence

For the general problem of minimizing a convex function over a compact convex domain, we will investigate a simple iterative approximation algorithm based on the method by Frank & Wolfe 1956, that does not need projection steps in order to stay inside the optimization domain. Instead of a projection step, the linearized problem defined by a current subgradient is solved, which gives a step direction that will naturally stay in the domain. Our framework generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to arbitrary convex domains. We give a convergence proof guaranteeing {\epsilon}-small duality gap after O(1/{\epsilon}) iterations. The method allows us to understand the sparsity of approximate solutions for any l1-regularized convex optimization problem (and for optimization over the simplex), expressed as a function of the approximation quality. We obtain matching upper and lower bounds of {\Theta}(1/{\epsilon}) for the sparsity for l1-problems. The same bounds apply to low-rank semidefinite optimization with bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well. As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero entries as {\epsilon}-approximate solutions when optimizing any convex function over a class of diagonally dominant symmetric matrices. We show that our proposed first-order method also applies to nuclear norm and max-norm matrix optimization problems. For nuclear norm regularized optimization, such as matrix completion and low-rank recovery, we demonstrate the practical efficiency and scalability of our algorithm for large matrix problems, as e.g. the Netflix dataset. For general convex optimization over bounded matrix max-norm, our algorithm is the first with a convergence guarantee, to the best of our knowledge.


Finding Density Functionals with Machine Learning

arXiv.org Machine Learning

Institute of Pharmaceutical Sciences, ETH Zurich, 8093 Zürich, Switzerland (Dated: March 5, 2018) Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of non-interacting fermions in 1d, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. Challenges for application of our method to real electronic structure problems are discussed.


Additive Gaussian Processes

arXiv.org Machine Learning

We introduce a Gaussian process model of functions which areadditive . An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.


A New Algorithm for Exploratory Projection Pursuit

arXiv.org Machine Learning

In this paper, we propose a new algorithm for exploratory projection pursuit. The basis of the algorithm is the insight that previous approaches used fairly narrow definitions of interestingness / non interestingness. We argue that allowing these definitions to depend on the problem / data at hand is a more natural approach in an exploratory technique. This also allows our technique much greater applicability than the approaches extant in the literature. Complementing this insight, we propose a class of projection indices based on the spatial distribution function that can make use of such information. Finally, with the help of real datasets, we demonstrate how a range of multivariate exploratory tasks can be addressed with our algorithm. The examples further demonstrate that the proposed indices are quite capable of focussing on the interesting structure in the data, even when this structure is otherwise hard to detect or arises from very subtle patterns.


Computable de Finetti measures

arXiv.org Machine Learning

We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable. Key words: de Finetti's theorem, exchangeability, computable probability theory, probabilistic programming languages, mutation 2010 MSC: 03D78, 60G09, 68Q10, 03F60, 68N18 1. Introduction The classical de Finetti theorem states that an exchangeable sequence of real random variables is a mixture of independent and identically distributed (i.i.d.) sequences of random variables. Moreover, there is an (almost surely unique) measure-valued random variable, called the directing random measure, conditioned on which the random sequence is i.i.d. The distribution of the directing random measure is called the de Finetti measure or the mixing measure. This paper examines the computable probability theory of exchangeable sequences of real-valued random variables. We prove a computable version of de Finetti's theorem: the distribution of an exchangeable sequence of real random variables is computable if and only if its de Finetti measure is computable. The classical proofs do not readily effectivize; instead, we show how to directly compute the de Finetti measure (as characterized by the classical theorem) in terms of a computable representation of the distribution of the exchangeable sequence. A key step in the proof is to describe the de Finetti measure in terms of the moments of a set of random variables derived from the exchangeable sequence. When the directing random measure is (almost surely) continuous, we can show that these moments are computable, which suffices to complete the proof of the main theorem in this case.


Theoretical and Practical Foundations of Large-Scale Agent-Based Micro-Storage in the Smart Grid

Journal of Artificial Intelligence Research

In this paper, we present a novel decentralised management technique that allows electricity micro-storage devices, deployed within individual homes as part of a smart electricity grid, to converge to profitable and efficient behaviours. Specifically, we propose the use of software agents, residing on the users' smart meters, to automate and optimise the charging cycle of micro-storage devices in the home to minimise its costs, and we present a study of both the theoretical underpinnings and the implications of a practical solution, of using software agents for such micro-storage management. First, by formalising the strategic choice each agent makes in deciding when to charge its battery, we develop a game-theoretic framework within which we can analyse the competitive equilibria of an electricity grid populated by such agents and hence predict the best consumption profile for that population given their battery properties and individual load profiles. Our framework also allows us to compute theoretical bounds on the amount of storage that will be adopted by the population. Second, to analyse the practical implications of micro-storage deployments in the grid, we present a novel algorithm that each agent can use to optimise its battery storage profile in order to minimise its owner's costs. This algorithm uses a learning strategy that allows it to adapt as the price of electricity changes in real-time, and we show that the adoption of these strategies results in the system converging to the theoretical equilibria. Finally, we empirically evaluate the adoption of our micro-storage management technique within a complex setting, based on the UK electricity market, where agents may have widely varying load profiles, battery types, and learning rates. In this case, our approach yields savings of up to 14% in energy cost for an average consumer using a storage device with a capacity of less than 4.5 kWh and up to a 7% reduction in carbon emissions resulting from electricity generation (with only domestic consumers adopting micro-storage and, commercial and industrial consumers not changing their demand). Moreover, corroborating our theoretical bound, an equilibrium is shown to exist where no more than 48% of households would wish to own storage devices and where social welfare would also be improved (yielding overall annual savings of nearly £1.5B).