Minimum Complexity Machines
Causal Discovery from Event Sequences by Local Cause-Effect Attribution
Sequences of events, such as crashes in the stock market or outages in a network, contain strong temporal dependencies, whose understanding is crucial to react to and influence future events. In this paper, we study the problem of discovering the underlying causal structure from event sequences. To this end, we introduce a new causal model, where individual events of the cause trigger events of the effect with dynamic delays. We show that in contrast to existing methods based on Granger causality, our model is identifiable for both instant and delayed effects.We base our approach on the Algorithmic Markov Condition, by which we identify the true causal network as the one that minimizes the Kolmogorov complexity. As the Kolmogorov complexity is not computable, we instantiate our model using Minimum Description Length and show that the resulting score identifies the causal direction.
Seqret: Mining Rule Sets from Event Sequences
Siji, Aleena, Cüppers, Joscha, Mian, Osman Ali, Vreeken, Jilles
Summarizing event sequences is a key aspect of data mining. Most existing methods neglect conditional dependencies and focus on discovering sequential patterns only. In this paper, we study the problem of discovering both conditional and unconditional dependencies from event sequence data. We do so by discovering rules of the form $X \rightarrow Y$ where $X$ and $Y$ are sequential patterns. Rules like these are simple to understand and provide a clear description of the relation between the antecedent and the consequent. To discover succinct and non-redundant sets of rules we formalize the problem in terms of the Minimum Description Length principle. As the search space is enormous and does not exhibit helpful structure, we propose the Seqret method to discover high-quality rule sets in practice. Through extensive empirical evaluation we show that unlike the state of the art, Seqret ably recovers the ground truth on synthetic datasets and finds useful rules from real datasets.
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.48)
Comparative Analysis of MDL-VAE vs. Standard VAE on 202 Years of Gynecological Data
This study presents a comparative evaluation of a Variational Autoencoder (VAE) enhanced with Minimum Description Length (MDL) regularization against a Standard Autoencoder for reconstructing high - dimensional gynecological data. The MDL - VAE exhibits significantly lower reconstruction errors (MSE, MAE, RMSE) and more structured latent representations, driven by effective KL divergence regularization. Statistical analyses confirm these performance improvements are significant. Furthermore, the MDL - VAE shows consistent training and validation losses and achieves efficient inference times, underscoring its robustness and practical viability. Our findings suggest that incorporating MDL principles into VAE architectures can substantially improve data reconstruction and generalization, making it a promising approach for advanced applica tions in healthcare data modeling and analysis. Despite substantial advances in medical research, early detection of menstrual disorders and tumors in the female reproductive system remains a significant challenge. This issue is critical because timely detection is essential for improving treatment outcomes, quality of life, and patient survival rates.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.56)
Some Insights of Construction of Feature Graph to Learn Pairwise Feature Interactions with Graph Neural Networks
Yamchote, Phaphontee, Win, Saw Nay Htet, Amornbunchornvej, Chainarong, Noraset, Thanapon
Feature interaction is crucial in predictive machine learning models, as it captures the relationships between features that influence model performance. In this work, we focus on pairwise interactions and investigate their importance in constructing feature graphs for Graph Neural Networks (GNNs). Rather than proposing new methods, we leverage existing GNN models and tools to explore the relationship between feature graph structures and their effectiveness in modeling interactions. Through experiments on synthesized datasets, we uncover that edges between interacting features are important for enabling GNNs to model feature interactions effectively. We also observe that including non-interaction edges can act as noise, degrading model performance. Furthermore, we provide theoretical support for sparse feature graph selection using the Minimum Description Length (MDL) principle. We prove that feature graphs retaining only necessary interaction edges yield a more efficient and interpretable representation than complete graphs, aligning with Occam's Razor. Our findings offer both theoretical insights and practical guidelines for designing feature graphs that improve the performance and interpretability of GNN models.
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)
Towards Practical Few-shot Query Sets: Transductive Minimum Description Length Inference
Standard few-shot benchmarks are often built upon simplifying assumptions on the query sets, which may not always hold in practice. In particular, for each task at testing time, the classes effectively present in the unlabeled query set are known a priori, and correspond exactly to the set of classes represented in the labeled support set. We relax these assumptions and extend current benchmarks, so that the query-set classes of a given task are unknown, but just belong to a much larger set of possible classes. Our setting could be viewed as an instance of the challenging yet practical problem of extremely imbalanced K -way classification, K being much larger than the values typically used in standard benchmarks, and with potentially irrelevant supervision from the support set. Motivated by these observations, we introduce a \textbf{P}rim\textbf{A}l \textbf{D}ual Minimum \textbf{D}escription \textbf{LE}ngth (\textbf{PADDLE}) formulation, which balances data-fitting accuracy and model complexity for a given few-shot task, under supervision constraints from the support set.
SpaceTime: Causal Discovery from Non-Stationary Time Series
Mameche, Sarah, Cornanguer, Lénaïg, Ninad, Urmi, Vreeken, Jilles
Understanding causality is challenging and often complicated by changing causal relationships over time and across environments. Climate patterns, for example, shift over time with recurring seasonal trends, while also depending on geographical characteristics such as ecosystem variability. Existing methods for discovering causal graphs from time series either assume stationarity, do not permit both temporal and spatial distribution changes, or are unaware of locations with the same causal relationships. In this work, we therefore unify the three tasks of causal graph discovery in the non-stationary multi-context setting, of reconstructing temporal regimes, and of partitioning datasets and time intervals into those where invariant causal relationships hold. To construct a consistent score that forms the basis of our method, we employ the Minimum Description Length principle. Our resulting algorithm SPACETIME simultaneously accounts for heterogeneity across space and non-stationarity over time. Given multiple time series, it discovers regime changepoints and a temporal causal graph using non-parametric functional modeling and kernelized discrepancy testing. We also show that our method provides insights into real-world phenomena such as river-runoff measured at different catchments and biosphere-atmosphere interactions across ecosystems.
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Unifying Two Types of Scaling Laws from the Perspective of Conditional Kolmogorov Complexity
In 2020, OpenAI proposed the first type of Scaling Laws, describing the relationships between model performance and parameters, data, and compute. In 2024, OpenAI proposed the second type of Scaling Laws, describing the relationship between model inference performance and inference computation. In this paper, we analyze LLM training and inference processes from the perspective of lossless compression using conditional Kolmogorov complexity, and unify these two types of Scaling Laws. We find that both types of Scaling Laws improve approximation of conditional Kolmogorov complexity by increasing execution steps $t$. The first type of Scaling Laws increases $t$ by increasing model parameters $y$. The second type of Scaling Laws increases $t$ by increasing the number of output tokens.
- Workflow (0.49)
- Research Report (0.40)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
The Complexity Dynamics of Grokking
DeMoss, Branton, Sapora, Silvia, Foerster, Jakob, Hawes, Nick, Posner, Ingmar
We investigate the phenomenon of generalization through the lens of compression. In particular, we study the complexity dynamics of neural networks to explain grokking, where networks suddenly transition from memorizing to generalizing solutions long after over-fitting the training data. To this end we introduce a new measure of intrinsic complexity for neural networks based on the theory of Kolmogorov complexity. Tracking this metric throughout network training, we find a consistent pattern in training dynamics, consisting of a rise and fall in complexity. We demonstrate that this corresponds to memorization followed by generalization. Based on insights from rate--distortion theory and the minimum description length principle, we lay out a principled approach to lossy compression of neural networks, and connect our complexity measure to explicit generalization bounds. Based on a careful analysis of information capacity in neural networks, we propose a new regularization method which encourages networks towards low-rank representations by penalizing their spectral entropy, and find that our regularizer outperforms baselines in total compression of the dataset.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.71)
Graph Community Augmentation with GMM-based Modeling in Latent Space
Fukushima, Shintaro, Yamanishi, Kenji
This study addresses the issue of graph generation with generative models. In particular, we are concerned with graph community augmentation problem, which refers to the problem of generating unseen or unfamiliar graphs with a new community out of the probability distribution estimated with a given graph dataset. The graph community augmentation means that the generated graphs have a new community. There is a chance of discovering an unseen but important structure of graphs with a new community, for example, in a social network such as a purchaser network. Graph community augmentation may also be helpful for generalization of data mining models in a case where it is difficult to collect real graph data enough. In fact, there are many ways to generate a new community in an existing graph. It is desirable to discover a new graph with a new community beyond the given graph while we keep the structure of the original graphs to some extent for the generated graphs to be realistic. To this end, we propose an algorithm called the graph community augmentation (GCA). The key ideas of GCA are (i) to fit Gaussian mixture model (GMM) to data points in the latent space into which the nodes in the original graph are embedded, and (ii) to add data points in the new cluster in the latent space for generating a new community based on the minimum description length (MDL) principle. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.
CantorNet: A Sandbox for Testing Geometrical and Topological Complexity Measures
Lewandowski, Michal, Eghbalzadeh, Hamid, Moser, Bernhard A.
Many natural phenomena are characterized by self-similarity, for example the symmetry of human faces, or a repetitive motif of a song. Studying of such symmetries will allow us to gain deeper insights into the underlying mechanisms of complex systems. Recognizing the importance of understanding these patterns, we propose a geometrically inspired framework to study such phenomena in artificial neural networks. To this end, we introduce \emph{CantorNet}, inspired by the triadic construction of the Cantor set, which was introduced by Georg Cantor in the $19^\text{th}$ century. In mathematics, the Cantor set is a set of points lying on a single line that is self-similar and has a counter intuitive property of being an uncountably infinite null set. Similarly, we introduce CantorNet as a sandbox for studying self-similarity by means of novel topological and geometrical complexity measures. CantorNet constitutes a family of ReLU neural networks that spans the whole spectrum of possible Kolmogorov complexities, including the two opposite descriptions (linear and exponential as measured by the description length). CantorNet's decision boundaries can be arbitrarily ragged, yet are analytically known. Besides serving as a testing ground for complexity measures, our work may serve to illustrate potential pitfalls in geometry-ignorant data augmentation techniques and adversarial attacks.
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Austria > Upper Austria (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.36)