surprisal
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
Bjare, Mathias Rose, Cantisani, Giorgia, Pasini, Marco, Lattner, Stefan, Widmer, Gerhard
We argue that training autoencoders to reconstruct inputs from noised versions of their encodings, when combined with perceptual losses, yields encodings that are structured according to a perceptual hierarchy. We demonstrate the emergence of this hierarchical structure by showing that, after training an audio autoencoder in this manner, perceptually salient information is captured in coarser representation structures than with conventional training. Furthermore, we show that such perceptual hierarchies improve latent diffusion decoding in the context of estimating surprisal in music pitches and predicting EEG brain responses to music listening. Pretrained weights are available at github.com/CPJKU/pa-audioic.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
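The noise-augmented objective above can be sketched in a few lines: encode, corrupt the code with Gaussian noise, decode, and penalize reconstruction error. A minimal numpy sketch with a toy linear autoencoder (all names are illustrative; the paper's audio autoencoder and perceptual losses are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: 8-dim inputs, 4-dim latent code.
W_enc = rng.normal(size=(4, 8)) * 0.5
W_dec = rng.normal(size=(8, 4)) * 0.5

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

def noised_reconstruction_loss(x, sigma):
    """Reconstruct x from a *noised* version of its encoding.

    Training against this loss (instead of plain reconstruction)
    pushes information the decoder must rely on into directions of
    the latent space that survive the noise, i.e. coarser structure.
    """
    z = encode(x)
    z_noised = z + sigma * rng.normal(size=z.shape)  # corrupt the code
    x_hat = decode(z_noised)
    return float(np.mean((x - x_hat) ** 2))

x = rng.normal(size=8)
print(noised_reconstruction_loss(x, sigma=0.0))  # plain reconstruction error
print(noised_reconstruction_loss(x, sigma=0.5))  # noise-augmented error
```

With a mean-squared loss this is an ordinary denoising bottleneck; the paper's claim is that substituting a perceptual loss makes the surviving coarse directions perceptually salient.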
Surprisal reveals diversity gaps in image captioning and different scorers change the story
Ilinykh, Nikolai, Dobnik, Simon
We quantify linguistic diversity in image captioning with surprisal variance - the spread of token-level negative log-probabilities within a caption set. On the MSCOCO test set, we compare five state-of-the-art vision-and-language LLMs, decoded with greedy and nucleus sampling, to human captions. Measured with a caption-trained n-gram LM, humans display roughly twice the surprisal variance of models, but rescoring the same captions with a general-language model reverses the pattern. Our analysis introduces a surprisal-based diversity metric for image captioning and shows that relying on a single scorer can completely invert conclusions; robust diversity evaluation must therefore report surprisal under several scorers.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (18 more...)
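The surprisal-variance metric is simple to compute once a scorer is fixed. A minimal sketch with a toy unigram scorer standing in for the caption-trained n-gram LM (the function names, captions, and crude smoothing are all illustrative):

```python
import math
from collections import Counter

def unigram_scorer(corpus_tokens):
    """Build a toy unigram LM; returns a token -> surprisal (-log2 p) function.

    A stand-in for the caption-trained n-gram LM used in the paper.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return lambda tok: -math.log2(counts.get(tok, 1) / (total + 1))  # crude smoothing

def surprisal_variance(captions, scorer):
    """Spread of token-level surprisals across a caption set."""
    s = [scorer(tok) for cap in captions for tok in cap.split()]
    mean = sum(s) / len(s)
    return sum((v - mean) ** 2 for v in s) / len(s)

captions = ["a dog runs", "a dog sits", "the spotted hound sprints across wet sand"]
scorer = unigram_scorer(" ".join(captions).split())
print(round(surprisal_variance(captions, scorer), 3))
```

Swapping `unigram_scorer` for a general-language model is exactly the substitution that, per the abstract, can invert the human-versus-model ranking.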
Identifying the Periodicity of Information in Natural Language
Ou, Yulin, Wang, Yu, Xu, Yang, Buschmeier, Hendrik
Recent theoretical advances on information density in natural language raise the following question: To what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and identifies significant periods in the surprisal sequence of a single document. Applying the algorithm to a set of corpora yields the following results: first, a considerable proportion of human language demonstrates a strong pattern of periodicity in information; second, new periods outside the distributions of typical structural units in text (e.g., sentence boundaries, elementary discourse units) are found and further confirmed via harmonic regression modeling. We conclude that the periodicity of information in language is a joint outcome of structural factors and other driving factors that operate at longer distances. We further discuss the advantages of our periodicity detection method and its potential for detecting LLM-generated text.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (8 more...)
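The core of an APS-style analysis - finding dominant periods in a surprisal sequence - can be sketched with a plain periodogram. This is a simplified stand-in (the paper's method additionally tests candidate periods for significance and refines them); the synthetic sequence and function name are illustrative:

```python
import numpy as np

def dominant_period(surprisals):
    """Return the strongest period in a surprisal sequence via the periodogram."""
    x = np.asarray(surprisals, dtype=float)
    x = x - x.mean()                      # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2   # periodogram
    freqs = np.fft.rfftfreq(len(x))
    k = 1 + int(np.argmax(power[1:]))     # skip the zero frequency
    return 1.0 / freqs[k]

# Synthetic surprisal sequence with an information "rhythm" of period 10.
n = 200
t = np.arange(n)
rng = np.random.default_rng(0)
s = 5 + np.sin(2 * np.pi * t / 10) + 0.2 * rng.normal(size=n)
print(dominant_period(s))  # close to 10
```

On real documents the interesting case is when the recovered period does not line up with sentence or discourse-unit lengths, which is the paper's second finding.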
CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments
Bhagwatkar, Rishika, Montariol, Syrielle, Romanou, Angelika, Borges, Beatriz, Rish, Irina, Bosselut, Antoine
Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, work on this long-standing challenge remains limited to industrial defects or unrealistic, synthetically generated anomalies, failing to capture the richness and unpredictability of real-world anomalies. In this work, we introduce CAVE, the first benchmark of real-world visual anomalies. CAVE supports three open-ended tasks - anomaly description, explanation, and justification - with fine-grained annotations for visual grounding and for categorizing anomalies by their visual manifestations, complexity, severity, and commonness. These annotations draw inspiration from cognitive science research on how humans identify and resolve anomalies, providing a comprehensive framework for evaluating Vision-Language Models (VLMs) in detecting and understanding anomalies. We show that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies. By offering a realistic and cognitively grounded benchmark, CAVE serves as a valuable resource for advancing research in anomaly detection and commonsense reasoning in VLMs.
- North America > United States (0.28)
- North America > Canada > Quebec (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- (4 more...)
Quantifying the Effects of Word Length, Frequency, and Predictability on Dyslexia
Rydel-Johnston, Hugo, Kafkas, Alex
Division of Psychology, Communication & Human Neuroscience, The University of Manchester. Author Note: Hugo Rydel-Johnston, https://orcid.org/0009-0006-1103-1015; Alex Kafkas, https://orcid.org/0000-0001-5133-8827. We have no conflicts of interest to disclose. Correspondence concerning this article should be addressed to Hugo Rydel-Johnston, Division of Psychology, Communication & Human Neuroscience, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK. Abstract: We ask where, and under what conditions, dyslexic reading costs arise in a large-scale naturalistic reading dataset. Using eye-tracking aligned to word-level properties -- word length, frequency, and predictability -- we model the influence of each of these features on dyslexic time costs. We find that all three properties robustly change reading times in both typical and dyslexic readers, but dyslexic readers show stronger sensitivities to each of the three features, especially predictability. Counterfactual manipulations of these features substantially narrow the dyslexic-control gap -- by about one-third -- with predictability showing the strongest effect, followed by length and frequency. These patterns align with existing dyslexia theories suggesting heightened demands on linguistic working memory and phonological encoding in dyslexic reading, and directly motivate further research into lexical complexity and preview benefits to further explain the quantified gap. In effect, these findings break down when extra dyslexic costs arise and how large they are, and provide actionable guidance for the development of interventions and computational models for dyslexic readers.
Keywords: eye movements, reading time, word length, lexical frequency, predictability, skipping, total reading time. Why Dyslexic Reading Takes Longer - And When: Dyslexia is characterized by persistent difficulty in accurate and/or fluent word recognition and decoding (Lyon et al., 2003) and affects between 4-8% of individuals (Yang et al., 2022; Doust et al., 2022).
- Europe > United Kingdom (0.24)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.30)
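The counterfactual logic of the abstract can be sketched on simulated data: fit a linear reading-time model per group, then shift one predictor and re-predict the group gap. The coefficients and data below are invented for illustration and are not the paper's model or dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated word-level predictors (standardised): length, log-frequency, predictability.
X = rng.normal(size=(n, 3))
X1 = np.column_stack([np.ones(n), X])  # add intercept

# Simulated reading times: the dyslexic group gets steeper slopes,
# especially for predictability (last column), mirroring the paper's
# qualitative finding with made-up magnitudes.
beta_control  = np.array([250.0, 10.0, -8.0, -12.0])
beta_dyslexic = np.array([320.0, 18.0, -14.0, -30.0])
rt_control  = X1 @ beta_control  + rng.normal(scale=20, size=n)
rt_dyslexic = X1 @ beta_dyslexic + rng.normal(scale=20, size=n)

def fit(X1, y):
    """Ordinary least squares via lstsq."""
    return np.linalg.lstsq(X1, y, rcond=None)[0]

b_c, b_d = fit(X1, rt_control), fit(X1, rt_dyslexic)
gap = np.mean(X1 @ b_d) - np.mean(X1 @ b_c)

# Counterfactual: raise every word's predictability by one SD and re-predict.
Xc = X1.copy()
Xc[:, 3] += 1.0
gap_cf = np.mean(Xc @ b_d) - np.mean(Xc @ b_c)
print(round(gap, 1), round(gap_cf, 1))  # the counterfactual gap is smaller
```

Because the dyslexic group's predictability slope is steeper (more negative), raising predictability benefits that group more and narrows the simulated gap, which is the mechanism behind the paper's roughly one-third reduction.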
A Theory of the Mechanics of Information: Generalization Through Measurement of Uncertainty (Learning is Measuring)
Hazard, Christopher J., Resnick, Michael, Beel, Jacob, Xia, Jack, Mack, Cade, Glennie, Dominic, Fulp, Matthew, Maze, David, Bassett, Andrew, Koistinen, Martin
Traditional machine learning relies on explicit models and domain assumptions, limiting flexibility and interpretability. We introduce a model-free framework that uses surprisal (information-theoretic uncertainty) to analyze and perform inferences directly from raw data, eliminating distribution modeling, reducing bias, and enabling efficient updates, including direct edits and deletion of training data. By quantifying relevance through uncertainty, the approach enables generalizable inference across tasks including generative inference, causal discovery, anomaly detection, and time series forecasting. It emphasizes traceability, interpretability, and data-driven decision making, offering a unified, human-understandable framework for machine learning, and achieves at or near state-of-the-art performance on most common machine learning tasks. The mathematical foundations create a "physics" of information that enables these techniques to apply effectively to a wide variety of complex data types, including missing data. Empirical results indicate that this may be a viable alternative to neural networks for scalable machine learning and artificial intelligence that maintains human understandability of the underlying mechanics.
- North America > United States > Wisconsin (0.04)
- Oceania > Australia (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (6 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Leisure & Entertainment > Games > Computer Games (0.67)
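One way to read "quantifying relevance through uncertainty" is case-based inference that converts distances into surprisal-like weights. The sketch below is an illustrative interpretation of that idea, not the authors' algorithm; all names are invented:

```python
import math

# Toy training set: (feature vector, target value).
cases = [((1.0, 2.0), 10.0), ((1.2, 1.9), 12.0),
         ((8.0, 8.5), 80.0), ((7.5, 9.0), 75.0)]

def infer(query, cases, scale=1.0):
    """Model-free inference: weight each stored case by how *unsurprising*
    it is as a match for the query, then average the targets.

    Distance is treated as a surprisal under an exponential kernel,
    so low surprisal means high relevance.
    """
    weighted, total = [], 0.0
    for x, y in cases:
        surprisal = math.dist(query, x) / scale  # -log of exp(-d/scale)
        w = math.exp(-surprisal)                 # back to a probability weight
        weighted.append((w, y))
        total += w
    return sum(w * y for w, y in weighted) / total

print(infer((1.1, 2.0), cases))  # dominated by the two nearby cases
```

Because inference reads the stored cases directly, editing or deleting training data is just a list mutation, which illustrates the abstract's "direct edits and deletion of training data" claim without any retraining step.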
The Mechanistic Emergence of Symbol Grounding in Language Models
Wu, Shuyu, Ma, Ziqiao, Luo, Xiaoxi, Huang, Yidong, Torres-Fonseca, Josue, Shi, Freda, Chai, Joyce
Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectives. Yet, the specific loci of this emergence and the mechanisms that drive it remain largely unexplored. To address this problem, we introduce a controlled evaluation framework that systematically traces how symbol grounding arises within the internal computations through mechanistic and causal analysis. Our findings show that grounding concentrates in middle-layer computations and is implemented through the aggregate mechanism, where attention heads aggregate the environmental ground to support the prediction of linguistic forms. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs. Our results provide behavioral and mechanistic evidence that symbol grounding can emerge in language models, with practical implications for predicting and potentially controlling the reliability of generation.
- North America > United States > Virginia (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
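The aggregation the abstract describes can be illustrated with a single attention head in numpy: the linguistic token's query pulls in value vectors from environment tokens. This is a toy construction to show the mechanism's shape, not activations extracted from any real model:

```python
import numpy as np

def attention(q, K, V):
    """Single attention head: softmax(K @ q) mixes the value rows."""
    scores = K @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V, w

# Toy setup: the "environmental ground" is two perceptual tokens; the
# query of the word being predicted matches the first one.
K = np.array([[1.0, 0.0], [0.0, 1.0]])  # keys of the two environment tokens
V = np.array([[5.0, 0.0], [0.0, 5.0]])  # their value vectors
q = np.array([3.0, 0.0])                # query from the linguistic token

out, w = attention(q, K, V)
print(w.round(3))    # attention mass concentrates on the matching ground
print(out.round(3))  # the head's output is dominated by that ground's value
```

A causal test in this toy setting would ablate the head (e.g., force `w` uniform) and check that the output no longer tracks the matching environment token, which is the flavor of intervention the paper applies to real middle-layer heads.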
Exploring Gaps in the APS: Direct Minimal Pair Analysis in LLM Syntactic Assessments
Pistotti, Timothy, Brown, Jason, Witbrock, Michael
Recent studies probing the Argument from the Poverty of the Stimulus (APS) have applied Large Language Models (LLMs) to test the learnability of complex syntax through surprisal-based metrics. However, divergent conclusions raise questions concerning the insights these metrics offer. While Wilcox et al. (2024) used direct minimal pair comparisons (the "wh-effect") to demonstrate that models successfully generalise knowledge of filler-gap dependencies, Lan et al. (2024) used a Difference-in-Differences (DiD) metric and found that models largely fail on parasitic gaps (PGs). This paper argues that the direct minimal pair approach offers greater diagnostic transparency. We demonstrate this by generating a full 8-permutation paradigm of refined PG stimuli and evaluating the GPT-2 model used in previous studies with a systematic Wilcox-style wh-effect analysis. Our results show that GPT-2 succeeds across all four tested conditions, indicating robust knowledge of filler-gap licensing principles even in complex PG environments. This finding, which contrasts with the more ambiguous results from DiD-style metrics, suggests that the choice of evaluation metric is critical for assessing an LLM's syntactic competence.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
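The contrast between the two metrics comes down to arithmetic over a 2x2 surprisal paradigm. The numbers below are invented purely to show how each metric is computed, not real GPT-2 scores:

```python
# Illustrative surprisals (in bits) at the critical region for the four
# filler/gap permutations of a single item.
s = {
    ("+filler", "+gap"): 4.0,   # licensed gap: low surprisal
    ("+filler", "-gap"): 7.5,   # filler with no gap: penalised
    ("-filler", "+gap"): 9.0,   # unlicensed gap: high surprisal
    ("-filler", "-gap"): 3.5,   # plain declarative baseline
}

def wh_effects(s):
    """Wilcox-style direct minimal-pair comparisons.

    Each effect holds everything constant except the filler, so a
    correct grammar predicts a definite sign for each difference.
    """
    gap_effect    = s[("-filler", "+gap")] - s[("+filler", "+gap")]  # filler licenses a gap
    no_gap_effect = s[("+filler", "-gap")] - s[("-filler", "-gap")]  # filler demands a gap
    return gap_effect, no_gap_effect

def did(s):
    """Difference-in-Differences collapses the 2x2 into one number,
    which can mask *which* of the two comparisons failed."""
    return (s[("-filler", "+gap")] - s[("+filler", "+gap")]) - (
        s[("+filler", "-gap")] - s[("-filler", "-gap")])

print(wh_effects(s))  # two signed effects, one per licensing condition
print(did(s))         # single summary score
```

Here both minimal-pair effects are positive (both licensing conditions pass) while the DiD score is a single small number, illustrating why the paper calls the direct comparisons more diagnostically transparent.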