dissociation
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
Reliable deployment of language models requires two capabilities that appear distinct but share a common geometric foundation: predicting whether a model will accept targeted behavioral control, and detecting when its internal structure degrades. We show that geometric stability, the consistency of a representation's pairwise distance structure, addresses both. Supervised Shesha variants that measure task-aligned geometric stability predict linear steerability with near-perfect accuracy ($\rho = 0.89$-$0.97$) across 35-69 embedding models and three NLP tasks, capturing unique variance beyond class separability (partial $\rho = 0.62$-$0.76$). A critical dissociation emerges: unsupervised stability fails entirely for steering on real-world tasks ($\rho \approx 0.10$), revealing that task alignment is essential for controllability prediction. However, unsupervised stability excels at drift detection, measuring nearly $2\times$ greater geometric change than CKA during post-training alignment (up to $5.23\times$ in Llama) while providing earlier warning in 73\% of models and maintaining a $6\times$ lower false alarm rate than Procrustes. Together, supervised and unsupervised stability form complementary diagnostics for the LLM deployment lifecycle: one for pre-deployment controllability assessment, the other for post-deployment monitoring.
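The abstract does not spell out how Shesha is computed, so the following is only a generic sketch of the underlying idea, assuming "geometric stability" can be approximated by how well two views of the same items preserve the rank order of their pairwise distances (a Spearman correlation of distance vectors, as in RSA). The function names `distance_structure_stability` and `linear_cka` are hypothetical; the latter is the standard linear CKA, included because the abstract uses CKA as a drift baseline. Neither function is the authors' method, and the supervised (task-aligned) variants are not modelled here.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for every pair (i < j), in a fixed order."""
    return [math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance_structure_stability(reps_a, reps_b):
    """Spearman correlation between the pairwise-distance structures of two views."""
    return pearson(ranks(pairwise_distances(reps_a)),
                   ranks(pairwise_distances(reps_b)))


def center_columns(points):
    """Subtract each feature's mean so the Gram matrix is centered."""
    n, d = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(d)]
    return [[p[k] - means[k] for k in range(d)] for p in points]


def gram(points):
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def frobenius_inner(m1, m2):
    return sum(x * y for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


def linear_cka(reps_a, reps_b):
    """Linear CKA: normalized alignment of the centered Gram matrices of two views."""
    ka = gram(center_columns(reps_a))
    kb = gram(center_columns(reps_b))
    return frobenius_inner(ka, kb) / math.sqrt(
        frobenius_inner(ka, ka) * frobenius_inner(kb, kb))


reps = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 2.0]]
rotated = [[-y, x] for x, y in reps]  # a 90-degree rotation preserves all distances
print(distance_structure_stability(reps, rotated), linear_cka(reps, rotated))
```

Both scores are invariant to rotations of the embedding space, so the rotated copy scores (numerically) 1.0 on each; reordering which point is which breaks the distance structure and drives the stability score down.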
A simple model of recognition and recall memory
Nisheeth Srivastava, Edward Vul
We show that several striking differences in memory performance between recognition and recall tasks are explained by an ecological bias endemic in classic memory experiments: such experiments universally involve more stimuli than retrieval cues. We show that while it is sensible to think of recall as simply retrieving items when probed with a cue (typically the item list itself), it is better to think of recognition as retrieving cues when probed with items. To test this theory, we manipulate the number of items and cues in a memory experiment and show a within-subject crossover effect: recognition performance is superior to recall performance when items outnumber cues, and recall performance is better than recognition when the converse holds. We build a simple computational model around this theory, using sampling to approximate an ideal Bayesian observer that encodes and retrieves situational co-occurrence frequencies of stimuli and retrieval cues. This model robustly reproduces a number of dissociations between recognition and recall previously used to argue for dual-process accounts of declarative memory.
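A toy illustration of the item/cue asymmetry, not the authors' sampling-based Bayesian model: assuming a hypothetical study design in which item `k % n_items` is paired with cue `k % n_cues`, the code counts item-cue co-occurrences and computes the chance that a single draw from the probe's normalized co-occurrence row lands on the studied partner. Recall probes with a cue and retrieves items; recognition probes with an item and retrieves cues, following the framing above. A sampler drawing from these rows would converge to the same probabilities. The names `study_pairs` and `retrieval_accuracy` are illustrative only.

```python
from fractions import Fraction


def study_pairs(n_items, n_cues):
    """Hypothetical design: item k % n_items is studied with cue k % n_cues."""
    return [(k % n_items, k % n_cues) for k in range(max(n_items, n_cues))]


def retrieval_accuracy(pairs):
    """Return (recall, recognition) single-draw hit probabilities, averaged over pairs.

    Recall: probe with a cue, draw an item from that cue's co-occurrence row.
    Recognition: probe with an item, draw a cue from that item's co-occurrence row.
    """
    counts, item_totals, cue_totals = {}, {}, {}
    for item, cue in pairs:
        counts[(item, cue)] = counts.get((item, cue), 0) + 1
        item_totals[item] = item_totals.get(item, 0) + 1
        cue_totals[cue] = cue_totals.get(cue, 0) + 1
    recall = sum(Fraction(counts[(i, c)], cue_totals[c]) for i, c in pairs) / len(pairs)
    recognition = sum(Fraction(counts[(i, c)], item_totals[i]) for i, c in pairs) / len(pairs)
    return recall, recognition


print(retrieval_accuracy(study_pairs(6, 2)))  # more items than cues
print(retrieval_accuracy(study_pairs(2, 6)))  # more cues than items
```

With six items and two cues, recall hits with probability 1/3 and recognition with probability 1; two items and six cues reverse the pattern, mirroring the crossover described above.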
How important is language for human-like intelligence?
Lupyan, Gary, Gentry, Hunter, Zettersten, Martin
We use language to communicate our thoughts. But is language merely the expression of thoughts, which are themselves produced by other, nonlinguistic parts of our minds? Or does language play a more transformative role in human cognition, allowing us to have thoughts that we otherwise could (or would) not have? Recent developments in artificial intelligence (AI) and cognitive science have reinvigorated this old question. We argue that language may hold the key to the emergence of both more general AI systems and central aspects of human intelligence. We highlight two related properties of language that make it such a powerful tool for developing domain-general abilities. First, language offers compact representations that make it easier to represent and reason about many abstract concepts (e.g., exact numerosity). Second, these compressed representations are the iterated output of collective minds. In learning a language, we learn a treasure trove of culturally evolved abstractions. Taken together, these properties mean that a sufficiently powerful learning system exposed to language, whether biological or artificial, learns a compressed model of the world, reverse engineering many of the conceptual and causal structures that support human (and human-like) thought.
Representation biases: will we achieve complete understanding by analyzing representations?
Lampinen, Andrew Kyle, Chan, Stephanie C. Y., Li, Yuxuan, Hermann, Katherine
A common approach in neuroscience is to study neural representations as a means to understand a system, increasingly by relating the neural representations to the internal representations learned by computational models. However, recent work in machine learning (Lampinen, 2024) shows that learned feature representations may be biased to over-represent certain features and to represent others more weakly and less consistently. For example, simple (linear) features may be more strongly and more consistently represented than complex (highly nonlinear) features. These biases could pose challenges for achieving full understanding of a system through representational analysis. In this perspective, we illustrate these challenges, showing how feature representation biases can lead to strongly biased inferences from common analyses like PCA, regression, and RSA. We also present homomorphic encryption as a simple case study of the potential for strong dissociation between patterns of representation and computation. We discuss the implications of these results for representational comparisons between systems, and for neuroscience more generally.
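To make the representation/computation dissociation concrete, here is a toy additively homomorphic masking scheme (a one-time pad over integers mod 2^16), chosen purely for illustration; the abstract does not describe the paper's actual construction. When a key is drawn uniformly, its ciphertext is uniformly distributed whatever the hidden value, so analysing the stored ciphertexts alone reveals nothing about the values, yet adding ciphertexts still computes the correct sum. `MODULUS`, `encrypt`, `decrypt` and `add_encrypted` are hypothetical names.

```python
import random

MODULUS = 2 ** 16  # hypothetical modulus for the toy scheme


def encrypt(value, key):
    """Additive one-time-pad masking: ciphertext = (value + key) mod MODULUS."""
    return (value + key) % MODULUS


def decrypt(ciphertext, key):
    return (ciphertext - key) % MODULUS


def add_encrypted(c1, c2):
    """Summing ciphertexts sums the hidden values under the summed key."""
    return (c1 + c2) % MODULUS


rng = random.Random(0)
k1, k2 = rng.randrange(MODULUS), rng.randrange(MODULUS)
c1, c2 = encrypt(3, k1), encrypt(4, k2)
# The ciphertexts look random, yet the computation performed on them is exact.
total = decrypt(add_encrypted(c1, c2), (k1 + k2) % MODULUS)
print(c1, c2, total)
```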
Transition States Energies from Machine Learning: An Application to Reverse Water-Gas Shift on Single-Atom Alloys
Cheula, Raffaele, Andersen, Mie
Obtaining accurate transition state (TS) energies is a bottleneck in computational screening of complex materials and reaction networks due to the high cost of TS search methods and first-principles methods such as density functional theory (DFT). Here we propose a machine learning (ML) model for predicting TS energies based on Gaussian process regression with the Wasserstein Weisfeiler-Lehman graph kernel (WWL-GPR). Applying the model to predict adsorption and TS energies for the reverse water-gas shift (RWGS) reaction on single-atom alloy (SAA) catalysts, we show that it can significantly improve the accuracy compared to traditional approaches based on scaling relations or ML models without a graph representation. Further benefitting from the low cost of model training, we train an ensemble of WWL-GPR models to obtain uncertainties through subsampling of the training data and show how these uncertainties propagate to turnover frequency (TOF) predictions through the construction of an ensemble of microkinetic models. Comparing the errors in model-based vs DFT-based TOF predictions, we show that the WWL-GPR model reduces errors by almost an order of magnitude compared to scaling relations. This demonstrates the critical impact of accurate energy predictions on catalytic activity estimation. Finally, we apply our model to screen new materials, identifying promising catalysts for RWGS. This work highlights the power of combining advanced ML techniques with DFT and microkinetic modeling for screening catalysts for complex reactions like RWGS, providing a robust framework for future catalyst design.
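The exponential dependence of rates on energy barriers is why energy errors matter so much for TOF predictions. As a stand-in for a full microkinetic model, the sketch below uses a single Arrhenius step with an assumed prefactor of 1e13 s^-1 and an assumed temperature of 500 K; it is not the WWL-GPR model or the paper's microkinetic setup. The hypothetical ensemble of barriers plays the role of the subsampled model ensemble, and `arrhenius_rate` and `rate_spread` are illustrative names.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_rate(barrier_ev, temperature_k, prefactor=1e13):
    """Single-step rate A * exp(-Ea / (kB * T)); the prefactor (1/s) is an assumption."""
    return prefactor * math.exp(-barrier_ev / (K_B_EV * temperature_k))


def rate_spread(barriers_ev, temperature_k):
    """Fastest/slowest rate ratio across an ensemble of predicted barriers."""
    rates = [arrhenius_rate(ea, temperature_k) for ea in barriers_ev]
    return max(rates) / min(rates)


# Hypothetical ensemble whose barrier predictions span 0.1 eV.
print(rate_spread([1.00, 1.05, 1.10], temperature_k=500.0))
```

At 500 K, barriers differing by only 0.1 eV give rates that differ by roughly a factor of ten, so an error of that size in a predicted barrier already shifts a rate estimate by about an order of magnitude.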
On Representational Dissociation of Language and Arithmetic in Large Language Models
Kisako, Riku, Kuribayashi, Tatsuki, Sasano, Ryohei
The association between language and (non-linguistic) thinking ability in humans has long been debated, and neuroscientific evidence from brain activity patterns has recently been brought to bear on it. Such a scientific context naturally raises an interdisciplinary question: does a similar language-thought dissociation exist in large language models (LLMs)? In this paper, as an initial foray, we explore this question by focusing on simple arithmetic skills (e.g., $1+2=$ ?) as a thinking ability and analyzing the geometry of their encoding in LLMs' representation space. Our experiments with linear classifiers and cluster separability tests demonstrate that simple arithmetic equations and general language input are encoded in completely separated regions of LLMs' internal representation space across all layers, a finding also supported with more controlled stimuli (e.g., spelled-out equations). These results tentatively suggest that arithmetic reasoning is mapped into a region distinct from general language input, in line with neuroscientific observations of human brain activations, although we also point out that these representations have somewhat cognitively implausible geometric properties.
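The abstract does not specify the probe architecture, so this is a minimal sketch of one common linear separability test, assuming a classic perceptron: if it completes a full pass over the data with no mistakes, the two classes are linearly separable. The 2-D vectors are hypothetical stand-ins for hidden states of arithmetic (+1) and general language (-1) inputs, not real LLM activations, and `train_perceptron` and `probe_accuracy` are illustrative names.

```python
def train_perceptron(points, labels, epochs=100):
    """Classic perceptron with a bias term; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified (or on the boundary): nudge the hyperplane toward x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a clean pass certifies linear separability
            break
    return w, b


def probe_accuracy(points, labels, w, b):
    """Fraction of points on the correct side of the learned hyperplane."""
    correct = sum(1 for x, y in zip(points, labels)
                  if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0)
    return correct / len(points)


# Hypothetical "hidden states": arithmetic inputs (+1) vs. language inputs (-1).
states = [[2.0, 1.0], [1.0, 2.0], [3.0, 3.0], [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(states, labels)
print(w, b, probe_accuracy(states, labels, w, b))
```

Reaching zero mistakes certifies separability; failing to converge within the epoch budget does not prove the opposite, and training accuracy alone says nothing about held-out generalization.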
Study: MRI with machine learning reveals brain changes from PTSD
A new machine learning approach added to conventional magnetic resonance imaging can identify the regions of the brain causing dissociative symptoms in people with post-traumatic stress disorder, researchers found in a study published Friday by the American Journal of Psychiatry. Although MRI has long been used to document changes in the brain that occur as a result of a number of neurological conditions, bolstering the approach with machine learning enabled researchers to uncover and measure changes in functional connections between different regions of the brain in women with PTSD. These altered connections correlated with their dissociative symptoms, including memory loss or amnesia, the researchers said. "This new work may help us to establish a new standard of care for traumatized patients with PTSD who struggle with significant symptoms of dissociation," study co-author Dr. Milissa Kaufman, director of the Dissociative Disorders and Trauma Research Program at McLean Hospital, said in a statement. PTSD is a mental health disorder that occurs following trauma (for example, violent personal assaults, natural or human-caused disasters, accidents and military combat), according to the National Institute of Mental Health.
Towards a general model for psychopathology
The DSM-1 was published in 1952; it contains 128 diagnostic categories, described in 132 pages. The DSM-5 appeared in 2013; it contains 541 diagnostic categories, described in 947 pages. The field of psychology is characterised by a steady proliferation of diagnostic models and subcategories, which seems to be inspired by the principle of "divide and inflate". This approach contrasts with experimental evidence, which suggests, on the one hand, that traumas of various kinds are often present in patients' anamnesis and, on the other, that the implicated gene variants are shared across a wide range of diagnoses. In this work I propose a holistic approach, built with tools borrowed from the field of Artificial Intelligence. My model is based on two pillars. The first is trauma, which represents the attack on the mind, is psychological in nature and originates in the environment. The second pillar is dissociation, which represents the mind's defence in both physiological and pathological conditions, and incorporates all other defence mechanisms. Damage to dissociation can be considered another category of attack, one that is neurobiological in nature and can be of genetic or environmental origin. Such attacks include, among other factors, synaptic over-pruning, drug abuse and inflammation. These factors combine to weaken the defence, represented by the neural networks that implement the dissociation mechanism in the brain. The model is then used to interpret five mental conditions: PTSD, complex PTSD, dissociative identity disorder, schizophrenia and bipolar disorder. Ideally, this is a first step towards a model that explains a wider range of psychopathological conditions within a single theoretical framework. The last part sketches a new psychotherapy for psychological trauma.