surprisal
Delightful Distributed Policy Gradient
Distributed reinforcement learning trains on data from stale, buggy, or mismatched actors, producing actions with high surprisal (negative log-probability) under the learner's policy. The core difficulty is not surprising data per se, but \emph{negative learning from surprising data}. High-surprisal failures can dominate the update direction despite carrying little useful signal, while high-surprisal successes reveal opportunities the current policy would otherwise miss. The \textit{Delightful Policy Gradient} (DG) separates these cases by gating each update with delight, the product of advantage and surprisal, suppressing rare failures and amplifying rare successes without requiring behavior probabilities. Under contaminated sampling, the cosine similarity between the standard policy gradient and the true gradient collapses, while DG's grows as the policy improves. No sign-blind reweighting, including exact importance sampling, can reproduce this effect. On MNIST with simulated staleness, DG without off-policy correction outperforms importance-weighted PG with exact behavior probabilities. On a transformer sequence task with staleness, actor bugs, reward corruption, and rare discovery, DG achieves roughly $10{\times}$ lower error. When all four frictions act simultaneously, its compute advantage is an order of magnitude and grows with task complexity.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
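The delight gate described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's update rule: the exact gating form (here, a linear boost for positive delight and exponential damping for negative delight) and the function name `delight_weights` are assumptions; only the quantity delight = advantage × surprisal and its sign-aware use come from the abstract.

```python
import numpy as np

def delight_weights(advantages, logprobs):
    """Per-sample 'delight' = advantage * surprisal, where surprisal is the
    negative log-probability of the sampled action under the *learner's*
    policy (no behavior probabilities needed).

    Sign-aware gating: high-surprisal failures (negative advantage) are
    suppressed, while high-surprisal successes (positive advantage) are
    amplified. A sign-blind reweighting such as importance sampling scales
    both cases identically and cannot distinguish them.
    """
    surprisal = -np.asarray(logprobs)             # >= 0 for log-probs <= 0
    delight = np.asarray(advantages) * surprisal
    # Illustrative gate: amplify positive delight, damp negative delight.
    return np.where(delight >= 0.0, 1.0 + delight, np.exp(delight))

# A surprising success (A > 0, very low log-prob) gets a large weight,
# while an equally surprising failure is strongly suppressed.
w = delight_weights(advantages=[+1.0, -1.0], logprobs=[-5.0, -5.0])
```

The per-sample weights would then multiply the usual policy-gradient terms, so rare failures shrink toward zero influence instead of dominating the update direction.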
AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
Agarwal, Dhruv, Majumder, Bodhisattwa Prasad, Adamson, Reece, Chakravorty, Megha, Gavireddy, Satvika Reddy, Parashar, Aditya, Surana, Harshit, Mishra, Bhavana Dalvi, McCallum, Andrew, Sabharwal, Ashish, Clark, Peter
The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to drive exploration by its own criteria. The few existing approaches in open-ended ASD select hypotheses based on diversity heuristics or subjective proxies for human interestingness, but the former struggles to meaningfully navigate the typically vast hypothesis space, and the latter suffers from imprecise definitions. This paper presents AutoDiscovery -- a method for open-ended ASD that instead drives scientific exploration using Bayesian surprise. Here, we quantify the epistemic shift from the LLM's prior beliefs about a hypothesis to its posterior beliefs after gathering experimental results. To efficiently explore the space of nested hypotheses, our method employs a Monte Carlo tree search (MCTS) strategy with progressive widening using surprisal as the reward function. We evaluate AutoDiscovery in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science. Our results demonstrate that under a fixed budget, AutoDiscovery substantially outperforms competitors by producing 5-29% more discoveries deemed surprising by the LLM. Our human evaluation further reveals that two-thirds of discoveries made by our system are surprising to domain experts as well, suggesting this is an important step towards building open-ended ASD systems.
- Europe (0.46)
- North America > United States (0.28)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.34)
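The Bayesian-surprise criterion above can be made concrete with a toy sketch. Assumptions not in the abstract: beliefs are represented as discrete distributions over hypothesis verdicts, and the epistemic shift is measured as KL(posterior ‖ prior); the paper's actual belief representation and elicitation from the LLM are not specified here.

```python
import math

def bayesian_surprise(prior, posterior):
    """KL(posterior || prior): how far the experimental result moved the
    model's beliefs. Both arguments are discrete distributions over the
    same outcomes (here: verdicts about a hypothesis)."""
    return sum(q * math.log(q / p) for p, q in zip(prior, posterior) if q > 0)

# Before the experiment the model is unsure; afterwards, nearly certain:
# a large epistemic shift, hence a surprising (rewarding) discovery.
prior = [0.5, 0.5]            # P(hypothesis true), P(hypothesis false)
posterior = [0.95, 0.05]
surprise = bayesian_surprise(prior, posterior)

# A result that merely confirms the prior carries no surprise.
no_shift = bayesian_surprise(prior, [0.5, 0.5])
```

In the paper's MCTS framing, a score like `surprise` would serve as the reward guiding which branches of the nested hypothesis space get widened.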
Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
Bjare, Mathias Rose, Cantisani, Giorgia, Pasini, Marco, Lattner, Stefan, Widmer, Gerhard
We argue that training autoencoders to reconstruct inputs from noised versions of their encodings, when combined with perceptual losses, yields encodings that are structured according to a perceptual hierarchy. We demonstrate the emergence of this hierarchical structure by showing that, after training an audio autoencoder in this manner, perceptually salient information is captured in coarser representation structures than with conventional training. Furthermore, we show that such perceptual hierarchies improve latent diffusion decoding in the context of estimating surprisal in music pitches and predicting EEG-brain responses to music listening. Pretrained weights are available on github.com/CPJKU/pa-audioic.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
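The training objective can be sketched in a few lines. This toy uses a linear autoencoder, Gaussian latent noise, and plain MSE in place of the paper's audio models and perceptual losses; all of those simplifications are assumptions, and only the core idea, reconstructing the input from a *noised* encoding, comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: the point is the objective, not the architecture.
W_enc = rng.normal(size=(8, 16)) * 0.1   # input dim 16 -> latent dim 8
W_dec = rng.normal(size=(16, 8)) * 0.1

def training_loss(x, noise_scale=0.5):
    """Reconstruct x from a *noised* encoding. Because only information that
    survives the noise can aid reconstruction, salient content is pushed
    into coarser, more robust latent structure, inducing a hierarchy."""
    z = W_enc @ x
    z_noised = z + noise_scale * rng.normal(size=z.shape)
    x_hat = W_dec @ z_noised
    # Plain MSE stands in here; the paper pairs this with perceptual losses
    # so the hierarchy is ordered by *perceptual* salience.
    return float(np.mean((x_hat - x) ** 2))

x = rng.normal(size=16)
loss = training_loss(x)
```

Training would minimize this loss over the encoder/decoder weights; the sketch only evaluates it once to show the data flow.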
Surprisal reveals diversity gaps in image captioning and different scorers change the story
Ilinykh, Nikolai, Dobnik, Simon
We quantify linguistic diversity in image captioning with surprisal variance -- the spread of token-level negative log-probabilities within a caption set. On the MSCOCO test set, we compare five state-of-the-art vision-and-language LLMs, decoded with greedy and nucleus sampling, to human captions. Measured with a caption-trained n-gram LM, humans display roughly twice the surprisal variance of models, but rescoring the same captions with a general-language model reverses the pattern. Our analysis introduces a surprisal-based diversity metric for image captioning and shows that relying on a single scorer can completely invert conclusions; robust diversity evaluation must therefore report surprisal under several scorers.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
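The metric itself is simple: pool the token-level surprisals of a caption set under a fixed scoring LM and take their variance. In this sketch a unigram scorer stands in for the caption-trained n-gram LM (the choice of scorer is exactly what the paper shows can flip conclusions); the function and data are illustrative.

```python
import math
from collections import Counter

def surprisal_variance(captions, scorer_counts, total):
    """Variance of token-level surprisal (-log2 p) pooled over a caption
    set, under a fixed scoring LM. Higher variance means more spread in how
    (un)predictable the tokens are, i.e. more diversity under this scorer."""
    surprisals = [
        -math.log2(scorer_counts[tok] / total)
        for cap in captions for tok in cap.split()
        if scorer_counts[tok] > 0      # skip tokens unseen by the scorer
    ]
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

# Unigram "scorer" trained on a tiny reference corpus
# (stand-in for an n-gram or neural LM).
corpus = "a dog runs a cat sleeps a bird sings".split()
counts, total = Counter(corpus), len(corpus)

uniform = surprisal_variance(["a a a a"], counts, total)   # one token type
mixed = surprisal_variance(["a dog runs", "a bird sings"], counts, total)
```

A set repeating one frequent token has zero surprisal variance, while a set mixing frequent and rare tokens scores higher; swapping `counts` for a different scorer can reorder caption sets, which is the paper's central caveat.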
Identifying the Periodicity of Information in Natural Language
Ou, Yulin, Wang, Yu, Xu, Yang, Buschmeier, Hendrik
Recent theoretical advances in information density in natural language raise the following question: to what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and can identify any significant periods in the surprisal sequence of a single document. Applying the algorithm to a set of corpora yields the following results: first, a considerable proportion of human language demonstrates a strong pattern of periodicity in information; second, new periods outside the distributions of typical structural units in text (e.g., sentence boundaries, elementary discourse units) are found and further confirmed via harmonic regression modeling. We conclude that the periodicity of information in language is a joint outcome of both structural factors and other driving factors that take effect at longer distances. We further discuss the advantages of our periodicity detection method and its potential for detecting LLM-generated text.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
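The core step of such a detector can be sketched as a periodogram over the per-token surprisal sequence. This is an assumption-laden simplification: the actual AutoPeriod-style pipeline combines the periodogram with autocorrelation validation and significance testing, none of which appear here, and `dominant_period` is an illustrative name.

```python
import numpy as np

def dominant_period(surprisal):
    """Return the strongest period (in tokens) of a surprisal sequence,
    via the periodogram (power spectrum of the mean-centered signal)."""
    x = np.asarray(surprisal, dtype=float)
    x = x - x.mean()                     # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))      # cycles per token
    power[0] = 0.0                       # ignore the zero frequency
    k = int(np.argmax(power))
    return 1.0 / freqs[k]

# Synthetic surprisal that oscillates every 16 tokens
# (e.g. a sentence-length information rhythm), plus noise.
t = np.arange(256)
signal = (2.0 + np.sin(2 * np.pi * t / 16)
          + 0.1 * np.random.default_rng(1).normal(size=256))
period = dominant_period(signal)
```

On real documents one would run this on LM-derived surprisals and then test whether the recovered periods coincide with known structural units (sentences, discourse units) or lie outside them.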
CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments
Bhagwatkar, Rishika, Montariol, Syrielle, Romanou, Angelika, Borges, Beatriz, Rish, Irina, Bosselut, Antoine
Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, this long-standing challenge remains limited to industrial defects or unrealistic, synthetically generated anomalies, failing to capture the richness and unpredictability of real-world anomalies. In this work, we introduce CAVE, the first benchmark of real-world visual anomalies. CAVE supports three open-ended tasks -- anomaly description, explanation, and justification -- with fine-grained annotations for visually grounding anomalies and categorizing them by their visual manifestations, complexity, severity, and commonness. These annotations draw on cognitive science research into how humans identify and resolve anomalies, providing a comprehensive framework for evaluating Vision-Language Models (VLMs) in detecting and understanding anomalies. We show that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies. By offering a realistic and cognitively grounded benchmark, CAVE serves as a valuable resource for advancing research in anomaly detection and commonsense reasoning in VLMs.
- North America > United States (0.28)
- North America > Canada > Quebec (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Quantifying the Effects of Word Length, Frequency, and Predictability on Dyslexia
Rydel-Johnston, Hugo, Kafkas, Alex
Division of Psychology, Communication & Human Neuroscience, The University of Manchester. Author Note: Hugo Rydel-Johnston https://orcid.org/0009-0006-1103-1015; Alex Kafkas https://orcid.org/0000-0001-5133-8827. We have no conflicts of interest to disclose. Correspondence concerning this article should be addressed to Hugo Rydel-Johnston, Division of Psychology, Communication & Human Neuroscience, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
Abstract: We ask where, and under what conditions, dyslexic reading costs arise in a large-scale naturalistic reading dataset. Using eye-tracking aligned to word-level properties -- word length, frequency, and predictability -- we model the influence of each of these features on dyslexic time costs. We find that all three properties robustly change reading times in both typical and dyslexic readers, but dyslexic readers show stronger sensitivities to each of the three features, especially predictability. Counterfactual manipulations of these features substantially narrow the dyslexic-control gap -- by about one-third -- with predictability showing the strongest effect, followed by length and frequency. These patterns align with existing dyslexia theories suggesting heightened demands on linguistic working memory and phonological encoding in dyslexic reading, and directly motivate further research into lexical complexity and preview benefits to further explain the quantified gap. In effect, these findings break down when extra dyslexic costs arise and how large they are, and provide actionable guidance for the development of interventions and computational models for dyslexic readers.
Keywords: eye movements, reading time, word length, lexical frequency, predictability, skipping, total reading time
Why Dyslexic Reading Takes Longer - And When
Dyslexia is characterized by persistent difficulty in accurate and/or fluent word recognition and decoding (Lyon et al., 2003) and affects between 4-8% of individuals (Yang et al., 2022; Doust et al., 2022).
- Europe > United Kingdom (0.24)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.30)
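The counterfactual logic of the abstract can be illustrated with a toy linear reading-time model. All coefficients below are invented for illustration, not the paper's fitted values; only the qualitative pattern (steeper dyslexic slopes, strongest for predictability/surprisal, and a gap that narrows under easier counterfactual words) reflects the reported findings.

```python
def reading_time(length, surprisal, group):
    """Toy linear model of per-word reading time (ms). Dyslexic readers get
    a higher intercept and steeper slopes, especially for surprisal
    (inverse predictability). Coefficients are made up."""
    base, b_len, b_surp = (200, 20, 30) if group == "control" else (230, 30, 55)
    return base + b_len * length + b_surp * surprisal

# Observed word: longish and fairly unpredictable.
word = {"length": 7, "surprisal": 3.0}
gap = (reading_time(**word, group="dyslexic")
       - reading_time(**word, group="control"))

# Counterfactual: make the word shorter and more predictable,
# then remeasure the dyslexic-control gap.
easier = {"length": 4, "surprisal": 1.0}
gap_cf = (reading_time(**easier, group="dyslexic")
          - reading_time(**easier, group="control"))
```

Because the dyslexic slopes are steeper, shrinking length and surprisal closes part of the group gap, which is the shape of the paper's roughly one-third narrowing result.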
A Theory of the Mechanics of Information: Generalization Through Measurement of Uncertainty (Learning is Measuring)
Hazard, Christopher J., Resnick, Michael, Beel, Jacob, Xia, Jack, Mack, Cade, Glennie, Dominic, Fulp, Matthew, Maze, David, Bassett, Andrew, Koistinen, Martin
Traditional machine learning relies on explicit models and domain assumptions, limiting flexibility and interpretability. We introduce a model-free framework using surprisal (information-theoretic uncertainty) to directly analyze and perform inferences from raw data, eliminating distribution modeling, reducing bias, and enabling efficient updates including direct edits and deletion of training data. By quantifying relevance through uncertainty, the approach enables generalizable inference across tasks including generative inference, causal discovery, anomaly detection, and time series forecasting. It emphasizes traceability, interpretability, and data-driven decision making, offering a unified, human-understandable framework for machine learning, and achieves at or near state-of-the-art performance across most common machine learning tasks. The mathematical foundations create a ``physics'' of information that enables these techniques to apply effectively to a wide variety of complex data types, including missing data. Empirical results indicate that this may be a viable alternative path to neural networks with regard to scalable machine learning and artificial intelligence that can maintain human understandability of the underlying mechanics.
- North America > United States > Wisconsin (0.04)
- Oceania > Australia (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Leisure & Entertainment > Games > Computer Games (0.67)
The Mechanistic Emergence of Symbol Grounding in Language Models
Wu, Shuyu, Ma, Ziqiao, Luo, Xiaoxi, Huang, Yidong, Torres-Fonseca, Josue, Shi, Freda, Chai, Joyce
Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectives. Yet, the specific loci of this emergence and the mechanisms that drive it remain largely unexplored. To address this problem, we introduce a controlled evaluation framework that systematically traces how symbol grounding arises within the internal computations through mechanistic and causal analysis. Our findings show that grounding concentrates in middle-layer computations and is implemented through the aggregate mechanism, where attention heads aggregate the environmental ground to support the prediction of linguistic forms. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs. Our results provide behavioral and mechanistic evidence that symbol grounding can emerge in language models, with practical implications for predicting and potentially controlling the reliability of generation.
- North America > United States > Virginia (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)