AITopics | Minimum Complexity Machines

Collaborating Authors

Minimum Complexity Machines

News Overviews Instructional Materials AI-Alerts Classics

Technical Perspective: Finding Connections between One-Way Functions and Kolmogorov Complexity

Communications of the ACMApr-22-2023, 10:50:40 GMT

Cryptography requires "useful" sources of computational hardness for most of its constructs. For example, in the classic setting of encryption schemes, decryption should be easy when given an appropriate decryption key, while it must be infeasible without it. Fortunately, the theory of computational complexity generously provides a wide variety of sources of computational hardness, but which ones may be useful for cryptography? The long-celebrated interplay between cryptography and computational complexity has been challenged constantly with understanding what "useful" hardness means, where it may be found, and how it may be utilized. This has led the cryptography community to embark on an exciting journey initiated by the pioneering work of Whitfield Diffie and Martin Hellman back in 1976.

artificial intelligence, machine learning, one-way function and kolmogorov complexity, (6 more...)

Communications of the ACM

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.40)

Add feedback

Chaitin-Kolmogorov Complexity and Generalization in Neural Networks

Neural Information Processing SystemsApr-6-2023, 19:32:15 GMT

We present a unified framework for a number of different ways of failing to generalize properly. The complexity of the function computed is therefore increased, and generalization is degraded. We analyze replicated networks, in which a number of identical networks are independently trained on the same data and their results averaged. We conclude that replication almost always results in a decrease in the expected complexity of the network, and that replication therefore increases expected generalization. Simulations confirming the effect are also presented.

artificial intelligence, chaitin-kolmogorov complexity and generalization, machine learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.40)

Add feedback

Developing Population Codes by Minimizing Description Length

Neural Information Processing SystemsApr-6-2023, 18:58:10 GMT

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representa(cid:173) tion that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center ofthis bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.

artificial intelligence, machine learning, minimizing description length, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.41)

Add feedback

Autoencoders, Minimum Description Length and Helmholtz Free Energy

Neural Information Processing SystemsApr-6-2023, 18:57:07 GMT

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distri(cid:173) bution, where the generative weights define the energy of each possible code vector given the input vector.

artificial intelligence, machine learning, minimum description length, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

The Use of MDL to Select among Computational Models of Cognition

Neural Information Processing SystemsApr-6-2023, 17:03:24 GMT

How should we decide among competing explanations of a cognitive process given limited observations? The problem of model selection is at the heart of progress in cognitive science. In this paper, Minimum Description Length (MDL) is introduced as a method for selecting among computational models of cognition. We also show that differential geometry provides an intuitive understanding of what drives model selection in MDL. Finally, adequacy of MDL is demonstrated in two areas of cognitive modeling.

artificial intelligence, complexity, machine learning, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.38)

Add feedback

Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

Neural Information Processing SystemsApr-6-2023, 14:48:49 GMT

In transfer learning we aim to solve new problems using fewer examples using information gained from solving related problems. Transfer learning has been successful in practice, and extensive PAC analysis of these methods has been de- veloped. However it is not yet clear how to define relatedness between tasks. This is considered as a major problem as it is conceptually troubling and it makes it unclear how much information to transfer and when and how to transfer it. In this paper we propose to measure the amount of information one task contains about another using conditional Kolmogorov complexity between the tasks.

artificial intelligence, kolmogorov complexity, machine learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.70)

Add feedback

Discrete MDL Predicts in Total Variation

Neural Information Processing SystemsApr-6-2023, 14:02:53 GMT

The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance.

artificial intelligence, discrete mdl predict, machine learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.70)

Add feedback

Universal coding, intrinsic volumes, and metric complexity

Mourtada, Jaouad

arXiv.org Machine LearningMar-13-2023

We study sequential probability assignment in the Gaussian setting, where the goal is to predict, or equivalently compress, a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a given subset of $\mathbf{R}^n$. First, in the case of a convex constraint set $K$, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of $K$; specifically, it equals the logarithm of the Wills functional from convex geometry. We then establish a comparison inequality for the Wills functional in the general nonconvex case, which underlines the metric nature of this quantity and generalizes the Slepian-Sudakov-Fernique comparison principle for the Gaussian width. Motivated by this inequality, we characterize the exact order of magnitude of the considered functional for a general nonconvex set, in terms of global covering numbers and local Gaussian widths. This implies metric isomorphic estimates for the log-Laplace transform of the intrinsic volume sequence of a convex body. As part of our analysis, we also characterize the minimax redundancy for a general constraint set. We finally relate and contrast our findings with classical asymptotic results in information theory.

artificial intelligence, inequality, machine learning, (19 more...)

arXiv.org Machine Learning

2303.07279

Country: Europe (0.67)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.50)

Add feedback

Evaluating Representations with Readout Model Switching

Li, Yazhe, Bornschein, Jorg, Hutter, Marcus

arXiv.org Artificial IntelligenceFeb-19-2023

Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the readout model, we design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The MDL score takes model complexity, as well as data efficiency into account. As a result, the most appropriate model for the specific task and representation will be chosen, making it a unified measure for comparison. The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures (ResNet and ViT) and objective functions (supervised and self-supervised) on a range of downstream tasks. We compare our methods with accuracy-based approaches and show that the latter are inconsistent when multiple readout models are used. Finally, we discuss important properties revealed by our evaluations such as model scaling, preferred readout model, and data efficiency.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2302.09579

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Interpolation Learning With Minimum Description Length

Manoj, Naren Sarayu, Srebro, Nathan

arXiv.org Artificial IntelligenceFeb-14-2023

We prove that the Minimum Description Length learning rule exhibits tempered overfitting. We obtain tempered agnostic finite sample learning guarantees and characterize the asymptotic behavior in the presence of random label noise.

artificial intelligence, description length, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2302.07263

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback