AITopics | Minimum Complexity Machines

Collaborating Authors

Minimum Complexity Machines

News Overviews Instructional Materials AI-Alerts Classics

SpaceTime: Causal Discovery from Non-Stationary Time Series

Mameche, Sarah, Cornanguer, Lénaïg, Ninad, Urmi, Vreeken, Jilles

arXiv.org Artificial IntelligenceJan-17-2025

Understanding causality is challenging and often complicated by changing causal relationships over time and across environments. Climate patterns, for example, shift over time with recurring seasonal trends, while also depending on geographical characteristics such as ecosystem variability. Existing methods for discovering causal graphs from time series either assume stationarity, do not permit both temporal and spatial distribution changes, or are unaware of locations with the same causal relationships. In this work, we therefore unify the three tasks of causal graph discovery in the non-stationary multi-context setting, of reconstructing temporal regimes, and of partitioning datasets and time intervals into those where invariant causal relationships hold. To construct a consistent score that forms the basis of our method, we employ the Minimum Description Length principle. Our resulting algorithm SPACETIME simultaneously accounts for heterogeneity across space and non-stationarity over time. Given multiple time series, it discovers regime changepoints and a temporal causal graph using non-parametric functional modeling and kernelized discrepancy testing. We also show that our method provides insights into real-world phenomena such as river-runoff measured at different catchments and biosphere-atmosphere interactions across ecosystems.

artificial intelligence, machine learning, mechanism, (14 more...)

arXiv.org Artificial Intelligence

2501.10235

Country:

Europe (0.46)
North America > United States > Virginia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Unifying Two Types of Scaling Laws from the Perspective of Conditional Kolmogorov Complexity

Wan, Jun

arXiv.org Artificial IntelligenceJan-12-2025

In 2020, OpenAI proposed the first type of Scaling Laws, describing the relationships between model performance and parameters, data, and compute. In 2024, OpenAI proposed the second type of Scaling Laws, describing the relationship between model inference performance and inference computation. In this paper, we analyze LLM training and inference processes from the perspective of lossless compression using conditional Kolmogorov complexity, and unify these two types of Scaling Laws. We find that both types of Scaling Laws improve approximation of conditional Kolmogorov complexity by increasing execution steps $t$. The first type of Scaling Laws increases $t$ by increasing model parameters $y$. The second type of Scaling Laws increases $t$ by increasing the number of output tokens.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.06802

Genre:

Workflow (0.49)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

The Complexity Dynamics of Grokking

DeMoss, Branton, Sapora, Silvia, Foerster, Jakob, Hawes, Nick, Posner, Ingmar

arXiv.org Artificial IntelligenceDec-12-2024

We investigate the phenomenon of generalization through the lens of compression. In particular, we study the complexity dynamics of neural networks to explain grokking, where networks suddenly transition from memorizing to generalizing solutions long after over-fitting the training data. To this end we introduce a new measure of intrinsic complexity for neural networks based on the theory of Kolmogorov complexity. Tracking this metric throughout network training, we find a consistent pattern in training dynamics, consisting of a rise and fall in complexity. We demonstrate that this corresponds to memorization followed by generalization. Based on insights from rate--distortion theory and the minimum description length principle, we lay out a principled approach to lossy compression of neural networks, and connect our complexity measure to explicit generalization bounds. Based on a careful analysis of information capacity in neural networks, we propose a new regularization method which encourages networks towards low-rank representations by penalizing their spectral entropy, and find that our regularizer outperforms baselines in total compression of the dataset.

artificial intelligence, complexity, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.0981

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.71)

Add feedback

CantorNet: A Sandbox for Testing Geometrical and Topological Complexity Measures

Lewandowski, Michal, Eghbalzadeh, Hamid, Moser, Bernhard A.

arXiv.org Machine LearningDec-2-2024

Many natural phenomena are characterized by self-similarity, for example the symmetry of human faces, or a repetitive motif of a song. Studying of such symmetries will allow us to gain deeper insights into the underlying mechanisms of complex systems. Recognizing the importance of understanding these patterns, we propose a geometrically inspired framework to study such phenomena in artificial neural networks. To this end, we introduce \emph{CantorNet}, inspired by the triadic construction of the Cantor set, which was introduced by Georg Cantor in the $19^\text{th}$ century. In mathematics, the Cantor set is a set of points lying on a single line that is self-similar and has a counter intuitive property of being an uncountably infinite null set. Similarly, we introduce CantorNet as a sandbox for studying self-similarity by means of novel topological and geometrical complexity measures. CantorNet constitutes a family of ReLU neural networks that spans the whole spectrum of possible Kolmogorov complexities, including the two opposite descriptions (linear and exponential as measured by the description length). CantorNet's decision boundaries can be arbitrarily ragged, yet are analytically known. Besides serving as a testing ground for complexity measures, our work may serve to illustrate potential pitfalls in geometry-ignorant data augmentation techniques and adversarial attacks.

artificial intelligence, cantornet, machine learning, (18 more...)

arXiv.org Machine Learning

2411.19713

Country: Europe > Austria (0.28)

Genre: Research Report (0.40)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.36)

Add feedback

Graph Community Augmentation with GMM-based Modeling in Latent Space

Fukushima, Shintaro, Yamanishi, Kenji

arXiv.org Machine LearningDec-2-2024

This study addresses the issue of graph generation with generative models. In particular, we are concerned with graph community augmentation problem, which refers to the problem of generating unseen or unfamiliar graphs with a new community out of the probability distribution estimated with a given graph dataset. The graph community augmentation means that the generated graphs have a new community. There is a chance of discovering an unseen but important structure of graphs with a new community, for example, in a social network such as a purchaser network. Graph community augmentation may also be helpful for generalization of data mining models in a case where it is difficult to collect real graph data enough. In fact, there are many ways to generate a new community in an existing graph. It is desirable to discover a new graph with a new community beyond the given graph while we keep the structure of the original graphs to some extent for the generated graphs to be realistic. To this end, we propose an algorithm called the graph community augmentation (GCA). The key ideas of GCA are (i) to fit Gaussian mixture model (GMM) to data points in the latent space into which the nodes in the original graph are embedded, and (ii) to add data points in the new cluster in the latent space for generating a new community based on the minimum description length (MDL) principle. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

2412.01163

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

Add feedback

Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length

Yu, Zihan, Ding, Jingtao, Li, Yong

arXiv.org Artificial IntelligenceNov-6-2024

Symbolic regression, a task discovering the formula best fitting the given data, is typically based on the heuristical search. These methods usually update candidate formulas to obtain new ones with lower prediction errors iteratively. However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonously as the search approaches the target formula, causing the low recovery rate of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. To estimate the minimum description length of any input data, we design a neural network, MDLformer, which enables robust and scalable estimation through large-scale training. With the MDLformer's output as the search objective, we implement a symbolic regression method, SR4MDL, that can effectively recover the correct mathematical form of the formula. Extensive experiments illustrate its excellent performance in recovering formulas from data. Our method successfully recovers around 50 formulas across two benchmark datasets comprising 133 problems, outperforming state-of-the-art methods by 43.92%.

evolutionary algorithm, formula, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.03753

Country: North America > United States (0.47)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)
(2 more...)

Add feedback

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

Ayonrinde, Kola, Pearce, Michael T., Sharkey, Lee

arXiv.org Artificial IntelligenceOct-14-2024

Sparse Autoencoders (SAEs) have emerged as a useful tool for interpreting the internal representations of neural networks. However, naively optimising SAEs for reconstruction loss and sparsity results in a preference for SAEs that are extremely wide and sparse. We present an information-theoretic framework for interpreting SAEs as lossy compression algorithms for communicating explanations of neural activations. We appeal to the Minimal Description Length (MDL) principle to motivate explanations of activations which are both accurate and concise. We further argue that interpretable SAEs require an additional property, "independent additivity": features should be able to be understood separately. We demonstrate an example of applying our MDL-inspired framework by training SAEs on MNIST handwritten digits and find that SAE features representing significant line segments are optimal, as opposed to SAEs with features for memorised digits from the dataset or small digit fragments. We argue that using MDL rather than sparsity may avoid potential pitfalls with naively maximising sparsity such as undesirable feature splitting and that this framework naturally suggests new hierarchical SAE architectures which provide more concise explanations.

artificial intelligence, description length, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2410.11179

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Introducing Routing Uncertainty in Capsule Networks

Neural Information Processing SystemsOct-10-2024, 02:51:42 GMT

Rather than performing inefficient local iterative routing between adjacent capsule layers, we propose an alternative global view based on representing the inherent uncertainty in part-object assignment. In our formulation, the local routing iterations are replaced with variational inference of part-object connections in a probabilistic capsule network, leading to a significant speedup without sacrificing performance. In this way, global context is also considered when routing capsules by introducing global latent variables that have direct influence on the objective function, and are updated discriminatively in accordance with the minimum description length (MDL) principle. We focus on enhancing capsule network properties, and perform a thorough evaluation on pose-aware tasks, observing improvements in performance over previous approaches whilst being more computationally efficient.

artificial intelligence, machine learning, routing uncertainty, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)

Add feedback

Minimum Description Length and Generalization Guarantees for Representation Learning

Neural Information Processing SystemsOct-9-2024, 09:24:15 GMT

A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has spurred much interest, most existing such approaches are heuristic; and very little is known about theoretical generalization guarantees. For example, the information bottleneck method seeks a good generalization by finding a minimal description of the input that is maximally informative about the label variable, where minimality and informativeness are both measured by Shannon's mutual information. In this paper, we establish a compressibility framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm in terms of the Minimum Description Length'' (MDL) of the labels or the latent variables (representations). Rather than the mutual information between the encoder's input and the representation, which is often believed to reflect the algorithm's generalization capability in the related literature but in fact, falls short of doing so, our new bounds involve the "multi-letter" relative entropy between the distribution of the representations (or labels) of the training and test sets and a fixed prior.

artificial intelligence, machine learning, representation, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)

Add feedback

Towards Practical Few-Shot Query Sets: Transductive Minimum Description Length Inference Ségolène Martin Malik Boudiaf Emilie Chouzenoux

Neural Information Processing SystemsOct-5-2024, 16:53:04 GMT

Standard few-shot benchmarks are often built upon simplifying assumptions on the query sets, which may not always hold in practice. In particular, for each task at testing time, the classes effectively present in the unlabeled query set are known a priori, and correspond exactly to the set of classes represented in the labeled support set. We relax these assumptions and extend current benchmarks, so that the query-set classes of a given task are unknown, but just belong to a much larger set of possible classes. Our setting could be viewed as an instance of the challenging yet practical problem of extremely imbalanced K-way classification, K being much larger than the values typically used in standard benchmarks, and with potentially irrelevant supervision from the support set.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Data Science (0.93)
(2 more...)

Add feedback