Goto

Collaborating Authors

 Minimum Complexity Machines


Parsimonious module inference in large networks

arXiv.org Machine Learning

We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length (MDL) principle which seeks to minimize the total amount of information required to describe the network, and avoid overfitting. According to this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also obtain that the maximum number of detectable blocks scales as $\sqrt{N}$, where $N$ is the number of nodes in the network, for a fixed average degree $$. We also show that the simplicity of the MDL approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of $O(\tau N\log N)$, if the number of blocks is unknown, and $O(\tau N)$ if it is known, where $\tau$ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over $10^6$ edges, and a dissortative, bipartite block structure.


Minimum Encoding Approaches for Predictive Modeling

arXiv.org Machine Learning

We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this analysis, we present two revised versions of MML: a pointwise estimator which gives the MML-optimal single parameter model, and a volumewise estimator which gives the MML-optimal region in the parameter space. Our empirical results suggest that with small data sets, the MDL approach yields more accurate predictions than the MML estimators. The empirical results also demonstrate that the revised MML estimators introduced here perform better than the original MML estimator suggested by Wallace and Freeman.


SMML estimators for 1-dimensional continuous data

arXiv.org Machine Learning

The minimum message length(MML) principle[4] is an information theoretic criterion that links data compression with statistical inference [3]. It has a number of useful properties and it has close connections with Kolmogorov complexity [5]. Using the MML principle to construct estimators is known to be NPhard in general [1] so it is common to use approximations in practice [3]. The term'strict minimum message length' (SMML) is used to distinguish the exact MML criterion from these approximations. The only known algorithm for calculating an SMML estimator is Farr's algorithm [1] which applies to data taking values in a finite set which is (in some sense) 1-dimensional.


Context Tree Maximizing

AAAI Conferences

Recent developments in reinforcement learning for non-Markovianproblems witness a surge in history-based methods, among which weare particularly interested in two frameworks, PhiMDP and MC-AIXI-CTW. PhiMDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, toan MDP, while MC-AIXI-CTW incrementally learns a mixture of contexttrees as its environment model. The main idea of PhiMDP is toconnect generic reinforcement learning with classical reinforcementlearning. The first implementation of PhiMDP relies on astochastic search procedure for finding a tree that minimizes acertain cost function. This does not guarantee finding theminimizing tree, or even a good one, given limited search time. As aconsequence it appears that the approach has difficulties with largedomains. MC-AIXI-CTW is attractive in that it can incrementally andanalytically compute the internal model through interactions withthe environment. Unfortunately, it is computationally demanding dueto requiring heavy planning simulations at every single time step.We devise a novel approach called CTMRL, which analytically andefficiently finds the cost-minimizing tree. Instead of thecontext-tree weighting method that MC-AIXI-CTW is based on, we usethe closely related context-tree maximizing algorithm that selectsjust one single tree. This approach falls under the PhiMDPframework, which allows the replacement of the costly planningcomponent of MC-AIXI-CTW with simple Q-Learning. Our empiricalinvestigation show that CTMRL finds policies of quality as good as MC-AIXI-CTW's on sixdomains including a challenging Pacman domain, but in an order ofmagnitude less time.


An MDL framework for sparse coding and dictionary learning

arXiv.org Machine Learning

The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these models, such as under-fitting or over-fitting given sets of data, are still not well characterized in the literature. As a result, the success of sparse modeling depends on hand-tuning critical parameters for each data and application. This work aims at addressing this by providing a practical and objective characterization of sparse models by means of the Minimum Description Length (MDL) principle -- a well established information-theoretic approach to model selection in statistical inference. The resulting framework derives a family of efficient sparse coding and dictionary learning algorithms which, by virtue of the MDL principle, are completely parameter free. Furthermore, such framework allows to incorporate additional prior information to existing models, such as Markovian dependencies, or to define completely new problem formulations, including in the matrix analysis area, in a natural way. These virtues will be demonstrated with parameter-free algorithms for the classic image denoising and classification problems, and for low-rank matrix recovery in video applications.


Low-rank data modeling via the Minimum Description Length principle

arXiv.org Machine Learning

Robust low-rank matrix estimation is a topic of increasing interest, with promising applications in a variety of fields, from computer vision to data mining and recommender systems. Recent theoretical results establish the ability of such data models to recover the true underlying low-rank matrix when a large portion of the measured matrix is either missing or arbitrarily corrupted. However, if low rank is not a hypothesis about the true nature of the data, but a device for extracting regularity from it, no current guidelines exist for choosing the rank of the estimated matrix. In this work we address this problem by means of the Minimum Description Length (MDL) principle -- a well established information-theoretic approach to statistical inference -- as a guideline for selecting a model for the data at hand. We demonstrate the practical usefulness of our formal approach with results for complex background extraction in video sequences.


Multi-Select Faceted Navigation Based on Minimum Description Length Principle

AAAI Conferences

Faceted navigation can effectively reduce user efforts of reaching targeted resources in databases, by suggesting dynamic facet values for iterative query refinement. A key issue is minimizing the navigation cost in a user query session. Conventional navigation scheme assumes that at each step, users select only one suggested value to figure out resources containing it. To make faceted navigation more flexible and effective, this paper introduces a multi-select scheme where multiple suggested values can be selected at one step, and a selected value can be used to either retain or exclude the resources containing it. Previous algorithms for cost-driven value suggestion can hardly work well under our navigation scheme. Therefore, we propose to optimize the navigation cost using the Minimum Description Length principle, which can well balance the number of navigation steps and the number of suggested values per step under our new scheme. An emperical study demonstrates that our approach is more cost-saving and efficient than state-of-the-art approaches.


Notes on a New Philosophy of Empirical Science

arXiv.org Machine Learning

This book presents a methodology and philosophy of empirical science based on large scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc) can then be reused to solve any specific learning problem, such as face recognition or machine translation.


Universal Regularizers For Robust Sparse Coding and Modeling

arXiv.org Machine Learning

Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical in the success of such models. Based on a codelength minimization interpretation of sparse coding, and using tools from universal coding theory, we propose a framework for designing sparsity regularization terms which have theoretical and practical advantages when compared to the more standard l0 or l1 ones. The presentation of the framework and theoretical foundations is complemented with examples that show its practical advantages in image denoising, zooming and classification.


Reconstruction of Causal Networks by Set Covering

arXiv.org Machine Learning

We present a method for the reconstruction of networks, based on the order of nodes visited by a stochastic branching process. Our algorithm reconstructs a network of minimal size that ensures consistency with the data. Crucially, we show that global consistency with the data can be achieved through purely local considerations, inferring the neighbourhood of each node in turn. The optimisation problem solved for each individual node can be reduced to a Set Covering Problem, which is known to be NP-hard but can be approximated well in practice. We then extend our approach to account for noisy data, based on the Minimum Description Length principle. We demonstrate our algorithms on synthetic data, generated by an SIR-like epidemiological model.