Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

Jacot, Arthur

arXiv.org Machine Learning

Technically speaking, no statistical model can be strictly better than another, and each model can be optimal under certain assumptions. However, it has been argued that there is a statistical model that outmatches all others up to constants: finding the function with minimal Kolmogorov complexity, i.e. the program with the minimal description length, that fits the data [42, 43]. Such an ideal statistical model would be at least as good as any model with a short description [20], which essentially includes all models that have been described within a single scientific paper. But this perfect statistical model appears unattainable because Kolmogorov complexity is uncomputable in general, let alone optimizable. In this paper, we prove results suggesting that Residual Network (ResNet) training implements a weaker version of this ideal model: greedily minimizing circuit complexity in the so-called Harder than Monte Carlo (HTMC) regime. These simplifications can be summarized as follows: Minimal Circuit Size Problem (MCSP): searching for an interpolating circuit with the minimal number of operations is "only" NP.
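To make the contrast with uncomputable Kolmogorov complexity concrete: minimal circuit search, while NP-hard, is at least a finite search. The toy sketch below brute-forces the fewest gates needed to realize a Boolean function on two inputs; the gate set, encoding, and depth bound are illustrative assumptions of mine, not the paper's construction.

```python
from itertools import product

# Toy Minimal Circuit Size Problem (MCSP) search: find the fewest gates
# needed to realize a Boolean function. Gate set and encoding are
# illustrative choices, not the paper's construction.
GATES = [
    lambda a, b: a & b,          # AND
    lambda a, b: a | b,          # OR
    lambda a, b: a ^ b,          # XOR
    lambda a, b: 1 - (a & b),    # NAND
]

def _reachable(wires, goal_vec, gates_left):
    # Depth-limited search: try every gate on every ordered pair of wires.
    if goal_vec in wires:
        return True
    if gates_left == 0:
        return False
    for i in range(len(wires)):
        for j in range(len(wires)):
            for gate in GATES:
                new = tuple(gate(a, b) for a, b in zip(wires[i], wires[j]))
                if new not in wires and _reachable(
                    wires + [new], goal_vec, gates_left - 1
                ):
                    return True
    return False

def minimal_circuit_size(fn, n_inputs=2, max_gates=3):
    # A "wire" is the vector of a signal's values over all input assignments.
    assignments = list(product([0, 1], repeat=n_inputs))
    goal_vec = tuple(fn(*x) for x in assignments)
    wires = [tuple(x[i] for x in assignments) for i in range(n_inputs)]
    for k in range(max_gates + 1):  # iterative deepening: smallest circuit first
        if _reachable(wires, goal_vec, k):
            return k
    return None  # no circuit within max_gates

print(minimal_circuit_size(lambda a, b: a ^ b))        # 1 (a single XOR gate)
print(minimal_circuit_size(lambda a, b: 1 - (a ^ b)))  # 2 (XOR, then NAND of it with itself)
```

The search is exponential in the gate budget, which is exactly the point: membership in NP makes it a hard but well-posed optimization target, unlike description length.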



Transformers know more than they can tell -- Learning the Collatz sequence

Charton, François, Narayanan, Ashvni

arXiv.org Artificial Intelligence

We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1}=u_n/2$ if $u_n$ is even, $u_{n+1}=(3u_n+1)/2$ if $u_n$ is odd). Model accuracy varies with the base used to encode input and output. It can be as high as $99.7\%$ for bases $24$ and $32$, and as low as $37\%$ and $25\%$ for bases $11$ and $3$. Yet all models, no matter the base, follow a common learning pattern. As training proceeds, they learn a sequence of classes of inputs that share the same residue modulo $2^p$. Models achieve near-perfect accuracy on these classes, and less than $1\%$ on all other inputs. This maps to a mathematical property of Collatz sequences: the length of the loops involved in the computation of a long Collatz step can be deduced from the binary representation of its input. The learning pattern reflects the model learning to predict inputs associated with increasing loop lengths. An analysis of failure cases reveals that almost all model errors follow predictable patterns. Hallucination, a common feature of large language models, almost never happens. In over $90\%$ of failures, the model performs the correct calculation but wrongly estimates the loop lengths. Our observations give a full account of the algorithms learned by the models. They suggest that the difficulty of learning such a complex arithmetic function lies in figuring out the control structure of the computation, i.e. the length of the loops. We believe that the approach outlined here, using mathematical problems as tools for understanding, explaining, and perhaps improving language models, can be applied to a broad range of problems and yield fruitful results.
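The link between loop lengths and the input's binary representation can be made concrete: in the shortened map above, the number of consecutive odd iterates starting from an odd $u$ equals the number of trailing 1-bits of $u$. The following sketch checks this; it is my own illustration of the property, not the paper's exact definition of a long Collatz step.

```python
def collatz_step(u):
    # Shortened Collatz map from the abstract:
    # u_{n+1} = u_n / 2 if u_n is even, (3*u_n + 1) / 2 if u_n is odd.
    return u // 2 if u % 2 == 0 else (3 * u + 1) // 2

def odd_run_length(u):
    # How many consecutive odd iterates occur starting from odd u,
    # i.e. how many times the (3u+1)/2 branch fires in a row.
    count = 0
    while u % 2 == 1:
        u = collatz_step(u)
        count += 1
    return count

def trailing_ones(u):
    # Number of trailing 1-bits in the binary representation of u.
    count = 0
    while u & 1:
        count += 1
        u >>= 1
    return count

# The loop length is readable directly off the input's binary representation:
for u in range(1, 1000, 2):
    assert odd_run_length(u) == trailing_ones(u)

print(odd_run_length(7), bin(7))    # 3 0b111
print(odd_run_length(11), bin(11))  # 2 0b1011
```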



Are Large Reasoning Models Interruptible?

Wu, Tsung-Han, Miroyan, Mihran, Chan, David M., Darrell, Trevor, Norouzi, Narges, Gonzalez, Joseph E.

arXiv.org Artificial Intelligence

Large Reasoning Models (LRMs) excel at complex reasoning but are traditionally evaluated in static, "frozen world" settings: model responses are assumed to be instantaneous, and the context of a request is presumed to be immutable over the duration of the response. While generally true for short-term tasks, the "frozen world" assumption breaks down in modern reasoning tasks such as assistive programming, where models may take hours to think through problems and code may change dramatically from the time the model starts thinking to the model's final output. In this work, we challenge the frozen world assumption and evaluate LRM robustness under two realistic dynamic scenarios: interruptions, which test the quality of the model's partial outputs on a limited budget, and dynamic context, which tests model adaptation to in-flight changes. Across mathematics and programming benchmarks that require long-form reasoning, static evaluations consistently overestimate robustness: even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context, with performance dropping by up to 60% when updates are introduced late in the reasoning process. Our analysis further reveals several novel failure modes, including reasoning leakage, where models fold the reasoning into their final answer when interrupted; panic, where under time pressure models abandon reasoning entirely and return incorrect answers; and self-doubt, where performance degrades while incorporating updated information. Project Page: http://dynamic-lm.github.io/



Representation Learning of Auxiliary Concepts for Improved Student Modeling and Exercise Recommendation

Badran, Yahya, Preisach, Christine

arXiv.org Artificial Intelligence

Personalized recommendation is a key feature of intelligent tutoring systems, typically relying on accurate models of student knowledge. Knowledge Tracing (KT) models enable this by estimating a student's mastery based on their historical interactions. Many KT models rely on human-annotated knowledge concepts (KCs), which tag each exercise with one or more skills or concepts believed to be necessary for solving it. However, these KCs can be incomplete, error-prone, or overly general. In this paper, we propose a deep learning model that learns sparse binary representations of exercises, where each bit indicates the presence or absence of a latent concept. We refer to these representations as auxiliary KCs. These representations capture conceptual structure beyond human-defined annotations and are compatible with both classical models (e.g., Bayesian Knowledge Tracing, BKT) and modern deep learning KT architectures. We demonstrate that incorporating auxiliary KCs improves both student modeling and adaptive exercise recommendation. For student modeling, we show that augmenting classical models like BKT with auxiliary KCs leads to improved predictive performance. For recommendation, we show that using auxiliary KCs enhances both reinforcement learning-based policies and a simple planning-based method (expectimax), resulting in measurable gains in student learning outcomes within a simulated student environment.
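The shape of such auxiliary-KC representations can be sketched as follows. In the paper the binary codes are learned end-to-end by a deep model; here the exercise logits are random, purely to illustrate how a sparse binary code per exercise would sit alongside human-annotated KC tags. All array sizes are hypothetical.

```python
import numpy as np

# Sketch of auxiliary KCs as sparse binary exercise codes. The logits
# stand in for a trained model's output; they are random here.
rng = np.random.default_rng(0)
n_exercises, n_aux_kcs, n_human_kcs = 12, 8, 5

logits = rng.normal(size=(n_exercises, n_aux_kcs))   # hypothetical learned logits
probs = 1.0 / (1.0 + np.exp(-logits))                # per-bit concept probabilities
aux_kcs = (probs > 0.5).astype(int)                  # binarize: 1 = latent concept present

# Each exercise now carries a multi-hot auxiliary-KC vector that can be
# concatenated with human-annotated KC tags and fed to BKT-style or
# deep KT models.
human_kcs = rng.integers(0, 2, size=(n_exercises, n_human_kcs))  # stand-in annotations
exercise_tags = np.concatenate([human_kcs, aux_kcs], axis=1)
print(exercise_tags.shape)  # (12, 13)
```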