AITopics | Minimum Complexity Machines

Collaborating Authors

Minimum Complexity Machines

News Overviews Instructional Materials AI-Alerts Classics

Universal Regularizers For Robust Sparse Coding and Modeling

arXiv.org Machine LearningAug-3-2010

Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical in the success of such models. Based on a codelength minimization interpretation of sparse coding, and using tools from universal coding theory, we propose a framework for designing sparsity regularization terms which have theoretical and practical advantages when compared to the more standard l0 or l1 ones. The presentation of the framework and theoretical foundations is complemented with examples that show its practical advantages in image denoising, zooming and classification.

artificial intelligence, coefficient, machine learning, (20 more...)

arXiv.org Machine Learning

1003.2941

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(3 more...)

Add feedback

Reconstruction of Causal Networks by Set Covering

Fyson, Nick, De Bie, Tijl, Cristianini, Nello

arXiv.org Machine LearningJun-4-2010

We present a method for the reconstruction of networks, based on the order of nodes visited by a stochastic branching process. Our algorithm reconstructs a network of minimal size that ensures consistency with the data. Crucially, we show that global consistency with the data can be achieved through purely local considerations, inferring the neighbourhood of each node in turn. The optimisation problem solved for each individual node can be reduced to a Set Covering Problem, which is known to be NP-hard but can be approximated well in practice. We then extend our approach to account for noisy data, based on the Minimum Description Length principle. We demonstrate our algorithms on synthetic data, generated by an SIR-like epidemiological model.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

1006.0849

Country: Europe (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Epidemiology (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.35)

Add feedback

A Generalization of the Chow-Liu Algorithm and its Application to Statistical Learning

Suzuki, Joe

arXiv.org Artificial IntelligenceFeb-10-2010

We extend the Chow-Liu algorithm for general random variables while the previous versions only considered finite cases. In particular, this paper applies the generalization to Suzuki's learning algorithm that generates from data forests rather than trees based on the minimum description length by balancing the fitness of the data to the forest and the simplicity of the forest. As a result, we successfully obtain an algorithm when both of the Gaussian and finite random variables are present.

artificial intelligence, machine learning, random variable, (18 more...)

arXiv.org Artificial Intelligence

1002.224

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.34)

Add feedback

Discrete MDL Predicts in Total Variation

Hutter, Marcus

Neural Information Processing SystemsDec-31-2009

The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance.

artificial intelligence, machine learning, prediction, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Is there an Elegant Universal Theory of Prediction?

Legg, Shane

arXiv.org Artificial IntelligenceDec-1-2009

Could there exist an elegant and universal theory of sequence pre diction? Solomonoff's model of induction rapidly learns to make optimal predict ions for any computable sequence, including probabilistic ones [13, 14]. In deed the problem of sequence prediction could well be considered solved [9, 8], if it were not for the fact that Solomonoff's theoretical model is incomputab le. Among computable theories there exist powerful general predict ors, such as the Lempel-Ziv algorithm [5] and Context Tree Weighting [18], that can learn to predict some complex sequences, but not others. Some prediction methods, such as the Minimum Description Length principle [12] and the Minimum Messa ge Length principle [17], can even be viewed as computable approximation s to Solomonoff induction [10]. However in practice their power and genera lity are limited by the power of compression and coding methods employed, as well as having a significantly reduced data efficiency as compared to Solom onoff induction [11]. This work was supported by SNF grant 200020-107616.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

cs/0606070

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.35)

Add feedback

Sparsification and feature selection by compressive linear regression

Popescu, Florin, Renz, Daniel

arXiv.org Machine LearningOct-21-2009

The Minimum Description Length (MDL) principle states that the optimal model for a given data set is that which compresses it best. Due to practial limitations the model can be restricted to a class such as linear regression models, which we address in this study. As in other formulations such as the LASSO and forward step-wise regression we are interested in sparsifying the feature set while preserving generalization ability. We derive a well-principled set of codes for both parameters and error residuals along with smooth approximations to lengths of these codes as to allow gradient descent optimization of description length, and go on to show that sparsification and feature selection using our approach is faster than the LASSO on several datasets from the UCI and StatLib repositories, with favorable generalization accuracy, while being fully automatic, requiring neither cross-validation nor tuning of regularization hyper-parameters, allowing even for a nonlinear expansion of the feature set followed by sparsification.

artificial intelligence, machine learning, regression, (19 more...)

arXiv.org Machine Learning

0910.4135

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Discrete MDL Predicts in Total Variation

Hutter, Marcus

arXiv.org Machine LearningSep-24-2009

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

0909.4588

Country:

Europe (0.46)
North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Add feedback

A Minimum Description Length Approach to Multitask Feature Selection

Tomasik, Brian

arXiv.org Artificial IntelligenceMay-29-2009

Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on ell_0 regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.

artificial intelligence, coefficient, machine learning, (19 more...)

arXiv.org Artificial Intelligence

0906.0052

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (0.45)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

The Redundancy of a Computable Code on a Noncomputable Distribution

Dębowski, Łukasz

arXiv.org Machine LearningApr-10-2009

We introduce new definitions of universal and superuniversal computable codes, which are based on a code's ability to approximate Kolmogorov complexity within the prescribed margin for all individual sequences from a given set. Such sets of sequences may be singled out almost surely with respect to certain probability measures. Consider a measure parameterized with a real parameter and put an arbitrary prior on the parameter. The Bayesian measure is the expectation of the parameterized measure with respect to the prior. It appears that a modified Shannon-Fano code for any computable Bayesian measure, which we call the Bayesian code, is superuniversal on a set of parameterized measure-almost all sequences for prior-almost every parameter. According to this result, in the typical setting of mathematical statistics no computable code enjoys redundancy which is ultimately much less than that of the Bayesian code. Thus we introduce another characteristic of computable codes: The catch-up time is the length of data for which the code length drops below the Kolmogorov complexity plus the prescribed margin. Some codes may have smaller catch-up times than Bayesian codes.

artificial intelligence, bayesian code, machine learning, (16 more...)

arXiv.org Machine Learning

0901.2321

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Classification via Minimum Incremental Coding Length (MICL)

Wright, John, Tao, Yangyu, Lin, Zhouchen, Ma, Yi, Shum, Heung-yeung

Neural Information Processing SystemsDec-31-2008

We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We prove asymptotic optimality of this criterion for Gaussian data and analyze its relationships to classical classifiers. Theoretical results provide new insights into relationships among popular classifiers such as MAP and RDA, as well as unsupervised clustering methods based on lossy compression [13]. Minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small-sample setting. Compression also provides a uniform means of handling classes of varying dimension. This simple classification criterion and its kernel and local versions perform competitively against existing classifiers on both synthetic examples and real imagery data such as handwritten digits and human faces, without requiring domain-specific information.

classifier, criterion, micl, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.47)

Add feedback