Computational Learning Theory
Complexity as Advantage: A Regret-Based Perspective on Emergent Structure
We introduce Complexity as Advantage (CAA), a framework that defines the complexity of a system relative to a family of observers. Instead of measuring complexity as an intrinsic property, we evaluate how much predictive regret a system induces for different observers attempting to model it. A system is complex when it is easy for some observers and hard for others, creating an information advantage. We show that this formulation unifies several notions of emergent behavior, including multiscale entropy, predictive information, and observer-dependent structure. The framework suggests that "interesting" systems are those positioned to create differentiated regret across observers, providing a quantitative grounding for why complexity can be functionally valuable. We demonstrate the idea through simple dynamical models and discuss implications for learning, evolution, and artificial agents.
Recursively Enumerably Representable Classes and Computable Versions of the Fundamental Theorem of Statistical Learning
Kattermann, David, Krapp, Lothar Sebastian
We study computable probably approximately correct (CPAC) learning, where learners are required to be computable functions. It had been previously observed that the Fundamental Theorem of Statistical Learning, which characterizes PAC learnability by finiteness of the Vapnik-Chervonenkis (VC-)dimension, no longer holds in this framework. Recent works recovered analogs of the Fundamental Theorem in the computable setting, for instance by introducing an effective VC-dimension. Guided by this, we investigate the connection between CPAC learning and recursively enumerable representable (RER) classes, whose members can be algorithmically listed. Our results show that the effective VC-dimensions can take arbitrary values above the traditional one, even for RER classes, which creates a whole family of (non-)examples for various notions of CPAC learning. Yet the two dimensions coincide for classes satisfying sufficiently strong notions of CPAC learning. We then observe that CPAC learnability can also be characterized via containment of RER classes that realize the same samples. Furthermore, it is shown that CPAC learnable classes satisfying a unique identification property are necessarily RER. Finally, we establish that agnostic learnability can be guaranteed for RER classes, by considering the relaxed notion of nonuniform CPAC learning.
The Limits of AI Explainability: An Algorithmic Information Theory Approach
This paper establishes a theoretical foundation for understanding the fundamental limits of AI explainability through algorithmic information theory. We formalize explainability as the approximation of complex models by simpler ones, quantifying both approximation error and explanation complexity using Kolmogorov complexity. Our key theoretical contributions include: (1) a complexity gap theorem proving that any explanation significantly simpler than the original model must differ from it on some inputs; (2) precise bounds showing that explanation complexity grows exponentially with input dimension but polynomially with error tolerance for Lipschitz functions; and (3) a characterization of the gap between local and global explainability, demonstrating that local explanations can be significantly simpler while maintaining accuracy in relevant regions. We further establish a regulatory impossibility theorem proving that no governance framework can simultaneously pursue unrestricted AI capabilities, human-interpretable explanations, and negligible error. These results highlight considerations likely to be relevant to the design, evaluation, and oversight of explainable AI systems.
Sorting by Strip Swaps is NP-Hard
Roy, Swapnoneel, Asaithambi, Asai, Mukhopadhyay, Debajyoti
We show that \emph{Sorting by Strip Swaps} (SbSS) is NP-hard by a polynomial reduction of \emph{Block Sorting}. The key idea is a local gadget, a \emph{cage}, that replaces every decreasing adjacency $(a_i,a_{i+1})$ by a guarded triple $a_i,m_i,a_{i+1}$ enclosed by guards $L_i,U_i$, so the only decreasing adjacencies are the two inside the cage. Small \emph{hinge} gadgets couple adjacent cages that share an element and enforce that a strip swap that removes exactly two adjacencies corresponds bijectively to a block move that removes exactly one decreasing adjacency in the source permutation. This yields a clean equivalence between exact SbSS schedules and perfect block schedules, establishing NP-hardness.
Few-Shot Multimodal Medical Imaging: A Theoretical Framework
Mohsin, Md Talha, Abdulrashid, Ismail
Medical imaging relies heavily on large, labeled datasets. But, unfortunately, they are not always easily accessible in clinical settings. Additionally, many practitioners often face various structural obstacles like limited data availability, fragmented data systems, and unbalanced datasets. These barriers often lead to the increased diagnostic uncertainty, underrepresentation of certain conditions, reduced model robustness, and biased diagnostic decisions. In response to these challenges, approaches such as transfer learning, meta-learning, and multimodal fusion have made great strides. However, they still need a solid theoretical justification for why they succeed or fail in situations where data is scarce. To address this gap, we propose a unified theoretical framework that characterizes learning and inference under low-resource medical imaging conditions. We first formalize the learning objective under few-shot conditions and compute sample complexity constraints to estimate the smallest quantity of data needed to achieve clinically reliable accuracy. Then based on ideas from PAC-learning and PAC-Bayesian theory, we explain how multimodal integration encourages generalization and quantifies uncertainty under sparse supervision. We further propose a formal metric for explanation stability, offering interpretability guarantees under low-data conditions. Taken together, the proposed framework establishes a principled foundation for constructing dependable, data-efficient diagnostic systems by jointly characterizing sample efficiency, uncertainty quantification, and interpretability in a unified theoretical setting.
The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence
Dittrich, Christian, Kinne, Jennifer Flygare
Existing frameworks converge on the centrality of compression to intelligence but leave underspecified why this process enforces the discovery of causal structure rather than superficial statistical patterns. We introduce a two-level framework to address this gap. The Information-Theoretic Imperative (ITI) establishes that any system persisting in uncertain environments must minimize epistemic entropy through predictive compression: this is the evolutionary "why" linking survival pressure to information-processing demands. The Compression Efficiency Principle (CEP) specifies how efficient compression mechanically selects for generative, causal models through exception-accumulation dynamics, making reality alignment a consequence rather than a contingent achievement. Together, ITI and CEP define a causal chain: from survival pressure to prediction necessity, compression requirement, efficiency optimization, generative structure discovery, and ultimately reality alignment. Each link follows from physical, information-theoretic, or evolutionary constraints, implying that intelligence is the mechanically necessary outcome of persistence in structured environments. This framework yields empirically testable predictions: compression efficiency, measured as approach to the rate-distortion frontier, correlates with out-of-distribution generalization; exception-accumulation rates differentiate causal from correlational models; hierarchical systems exhibit increasing efficiency across abstraction layers; and biological systems demonstrate metabolic costs that track representational complexity. ITI and CEP thereby provide a unified account of convergence across biological, artificial, and multi-scale systems, addressing the epistemic and functional dimensions of intelligence without invoking assumptions about consciousness or subjective experience.
Symbolic Snapshot Ensembles
Inductive logic programming (ILP) is a form of logical machine learning. Most ILP algorithms learn a single hypothesis from a single training run. Ensemble methods train an ILP algorithm multiple times to learn multiple hypotheses. In this paper, we train an ILP algorithm only once and save intermediate hypotheses. We then combine the hypotheses using a minimum description length weighting scheme. Our experiments on multiple benchmarks, including game playing and visual reasoning, show that our approach improves predictive accuracy by 4% with less than 1% computational overhead.
Last Iterate Analyses of FTRL in Stochasitc Bandits
Zhan, Jingxin, Han, Yuze, Zhang, Zhihua
The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by the regular function $Ψ(p)=-4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
Approximate Replicability in Learning
Hopkins, Max, Impagliazzo, Russell, Ye, Christopher
Replicability, introduced by (Impagliazzo et al. STOC '22), is the notion that algorithms should remain stable under a resampling of their inputs (given access to shared randomness). While a strong and interesting notion of stability, the cost of replicability can be prohibitive: there is no replicable algorithm, for instance, for tasks as simple as threshold learning (Bun et al. STOC '23). Given such strong impossibility results we ask: under what approximate notions of replicability is learning possible? In this work, we propose three natural relaxations of replicability in the context of PAC learning: (1) Pointwise: the learner must be consistent on any fixed input, but not across all inputs simultaneously, (2) Approximate: the learner must output hypotheses that classify most of the distribution consistently, (3) Semi: the algorithm is fully replicable, but may additionally use shared unlabeled samples. In all three cases, for constant replicability parameters, we obtain sample-optimal agnostic PAC learners: (1) and (2) are achievable for ``free" using $Θ(d/α^2)$ samples, while (3) requires $Θ(d^2/α^2)$ labeled samples.
The Parameterized Complexity of Computing the VC-Dimension
Foucaud, Florent, Gahlawat, Harmender, Inerney, Fionn Mc, Tale, Prafullkumar
The VC-dimension is a well-studied and fundamental complexity measure of a set system (or hypergraph) that is central to many areas of machine learning. We establish several new results on the complexity of computing the VC-dimension. In particular, given a hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$, we prove that the naive $2^{\mathcal{O}(|\mathcal{V}|)}$-time algorithm is asymptotically tight under the Exponential Time Hypothesis (ETH). We then prove that the problem admits a $1$-additive fixed-parameter approximation algorithm when parameterized by the maximum degree of $\mathcal{H}$ and a fixed-parameter algorithm when parameterized by its dimension, and that these are essentially the only such exploitable structural parameters. Lastly, we consider a generalization of the problem, formulated using graphs, which captures the VC-dimension of both set systems and graphs. We design a $2^{\mathcal{O}(\rm{tw}\cdot \log \rm{tw})}\cdot |V|$-time algorithm for any graph $G=(V,E)$ of treewidth $\rm{tw}$ (which, for a set system, applies to the treewidth of its incidence graph). This is in contrast with closely related problems that require a double-exponential dependency on the treewidth (assuming the ETH).