
 neuron


SGD Provably Prioritizes a Shortcut Spurious Feature in the XOR Model

arXiv.org Machine Learning

Neural networks are known to be susceptible to over-reliance on spurious correlations. However, the precise mechanism by which models exploit shortcut features is not fully understood, and algorithms to mitigate this behavior rely on as yet unjustified assumptions about the learned representations. In this work, we provide the first end-to-end theoretical characterization of spurious feature learning for two-layer ReLU neural networks trained by online minibatch SGD on the logistic loss. We consider data drawn from the high-dimensional Boolean hypercube with a quadratic signal function (namely XOR) and a linear spurious correlation. We show that SGD learns the spurious feature first, and exponentially fast. Moreover, the optimization dynamics couple the spurious and signal features, with a stronger spurious component inhibiting signal feature learning. Our analysis reveals precise phase transitions in the learning dynamics. In the first phase, alignment between the signs of the spurious feature and second-layer weight drives rapid growth of the spurious feature. In the second phase, large majority group margin slows learning and the signal feature remains suppressed. When the spurious correlation is maximally strong, we show theoretically that the spurious feature dominates even at the sample complexity threshold where XOR would be learned in isolation (i.e., if the spurious feature was absent). In contrast, when the correlation strength is constant, we provide preliminary empirical evidence that the model can eventually learn the XOR signal, although the spurious feature is not forgotten.
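
As a rough illustration of the data model described above, the following sketch samples points from the Boolean hypercube with an XOR label and one spuriously correlated coordinate. The coordinate layout (0 and 1 for the signal, 2 for the spurious feature), the function name, and the `corr` parameter (the probability that the spurious coordinate agrees with the label) are assumptions made for this example; the abstract does not specify them.

```python
import random


def sample_xor_with_spurious(dim, corr, rng):
    """Draw one (x, y) pair with x in {-1, +1}^dim.

    Assumptions (not from the abstract): the XOR signal lives on
    coordinates 0 and 1, and the spurious feature is coordinate 2.
    """
    if dim < 3:
        raise ValueError("need at least 3 coordinates")
    x = [rng.choice((-1, 1)) for _ in range(dim)]
    # In the +/-1 encoding, XOR of two bits is (up to sign) their product.
    y = x[0] * x[1]
    # The spurious coordinate agrees with the label with probability corr.
    x[2] = y if rng.random() < corr else -y
    return x, y


rng = random.Random(0)
batch = [sample_xor_with_spurious(10, 0.9, rng) for _ in range(1000)]
agreement = sum(x[2] == y for x, y in batch) / len(batch)
print(f"spurious coordinate agrees with label on {agreement:.0%} of samples")
```

The label depends on a product of two coordinates, so no single coordinate of the signal is correlated with it, whereas the spurious coordinate is linearly predictive on its own. That asymmetry is what makes the spurious feature a "shortcut" in this setting.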


How some people's brains make an extraordinary recovery from stroke

New Scientist

A well-known actor who had experienced a stroke was treated by stroke specialist Sandor Nardai. The actor had been left with aphasia, or an impaired ability to speak - brutal for anyone, but "probably the most devastating thing that could happen to an actor", says Nardai. After three months of recovery, though, the actor was able to say some words. After a year, he voiced a commercial. Remarkably, he eventually got well enough to return to live theatre, says Nardai, who is at Semmelweis University in Hungary.


The Remarkable Robustness of LLMs: Stages of Inference?

Neural Information Processing Systems

We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual calibration, where irrelevant features are suppressed to finalize the top-1 output distribution. Synthesizing behavioral and mechanistic evidence, we provide a hypothesis for interpreting depth-dependent computations in LLMs.
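
The two interventions named above, deleting a layer and swapping adjacent layers, can be sketched on a toy residual stack. This is only an illustration: the scalar "layers" and all function names are hypothetical stand-ins for transformer blocks, not the authors' code.

```python
def run_residual_stack(layers, hidden):
    """Apply each layer as a residual update: h <- h + layer(h)."""
    for layer in layers:
        hidden = hidden + layer(hidden)
    return hidden


def delete_layer(layers, index):
    """Return a copy of the stack with one layer removed."""
    return layers[:index] + layers[index + 1:]


def swap_adjacent(layers, index):
    """Return a copy with layers index and index + 1 exchanged."""
    if not 0 <= index < len(layers) - 1:
        raise IndexError("index must have a right-hand neighbour")
    swapped = list(layers)
    swapped[index], swapped[index + 1] = swapped[index + 1], swapped[index]
    return swapped


# Toy scalar "layers" standing in for transformer blocks (assumption).
layers = [lambda h: 0.5 * h, lambda h: 1.0, lambda h: -0.1 * h]
baseline = run_residual_stack(layers, 1.0)
without_middle = run_residual_stack(delete_layer(layers, 1), 1.0)
swapped = run_residual_stack(swap_adjacent(layers, 0), 1.0)
print(baseline, without_middle, swapped)
```

Because each layer only adds to the running hidden state, removing or reordering a layer perturbs the output rather than breaking the computation outright, which is one plausible reading of why such interventions are survivable at all.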


Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Neural Information Processing Systems

Sparse Autoencoders (SAEs) have recently gained attention as a means to improve the interpretability and steerability of Large Language Models (LLMs), both of which are essential for AI safety. In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron-level in visual representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Further, we demonstrate that applying SAE interventions on CLIP's vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying language model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised tool for enhancing both interpretability and control of VLMs.
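
For readers unfamiliar with SAEs, a minimal forward pass might look like the sketch below. It assumes one common formulation (ReLU encoder, linear decoder, L1 sparsity penalty); the abstract does not say which variant the authors use, and every name here is hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, l1_coeff):
    """Return (latents, reconstruction, loss) for one input vector.

    Assumed loss: squared reconstruction error + l1_coeff * L1(latents).
    """
    pre_activation = [a + b for a, b in zip(matvec(w_enc, x), b_enc)]
    latents = relu(pre_activation)
    reconstruction = matvec(w_dec, latents)
    recon_error = sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruction))
    sparsity = sum(abs(z) for z in latents)
    return latents, reconstruction, recon_error + l1_coeff * sparsity


identity = [[1.0, 0.0], [0.0, 1.0]]
latents, reconstruction, loss = sae_forward(
    [1.0, -1.0], identity, [0.0, 0.0], identity, l1_coeff=0.5
)
print(latents, reconstruction, loss)
```

In practice the latent dimension is usually much larger than the input dimension (the "wide latents" mentioned above), so `w_enc` would have more rows than columns; the 2x2 identity here is only for readability.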


Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification

Neural Information Processing Systems

State-of-the-art neural network (NN) verifiers demonstrate that applying the branch-and-bound (BaB) procedure with fast bounding techniques plays a key role in tackling many challenging verification properties. In this work, we introduce the linear constraint-driven clipping framework, a class of scalable and efficient methods designed to enhance the efficacy of NN verifiers. Under this framework, we develop two novel algorithms that efficiently utilize linear constraints to 1) reduce portions of the input space that are either verified or irrelevant to a subproblem in the context of branch-and-bound, and 2) directly improve intermediate bounds throughout the network. The process novelly leverages linear constraints that often arise from bound propagation methods and is general enough to also incorporate constraints from other sources. It efficiently handles linear constraints using a specialized GPU procedure that can scale to large neural networks without the use of expensive external solvers. Our verification procedure, Clip-and-Verify, consistently tightens bounds across multiple benchmarks and can significantly reduce the number of subproblems handled during BaB. We show that our clipping algorithms can be integrated with BaB-based verifiers such as α,β-CROWN, utilizing either the split constraints in activation-space BaB or the output constraints that denote the unverified input space. We demonstrate the effectiveness of our procedure on a broad range of benchmarks where, in some instances, we witness a 96% reduction in the number of subproblems during branch-and-bound, and also achieve state-of-the-art verified accuracy across multiple benchmarks. Clip-and-Verify is part of the α,β-CROWN verifier, the VNN-COMP 2025 winner.
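
To make the idea of shrinking an input box with a linear constraint concrete, here is a minimal sketch of textbook interval tightening. It is not the paper's GPU procedure; it only shows how a single constraint a·x ≤ b can clip per-coordinate bounds. The function name and argument layout are assumptions.

```python
def clip_box(lower, upper, a, b):
    """Tighten the box [lower, upper] using the constraint a . x <= b.

    For each coordinate i, the other terms contribute at least their
    minimum over the box, so a_i * x_i <= b - sum_{j != i} min_j.
    """
    # Smallest value each term a_j * x_j can take inside the box.
    term_min = [min(aj * lj, aj * uj) for aj, lj, uj in zip(a, lower, upper)]
    total_min = sum(term_min)
    new_lower, new_upper = list(lower), list(upper)
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # the constraint says nothing about this coordinate
        bound = (b - (total_min - term_min[i])) / ai
        if ai > 0:
            new_upper[i] = min(upper[i], bound)
        else:
            # Dividing by a negative coefficient flips the inequality.
            new_lower[i] = max(lower[i], bound)
    return new_lower, new_upper


# Unit square clipped by x0 - x1 <= -0.5, i.e. x1 >= x0 + 0.5.
print(clip_box([0.0, 0.0], [1.0, 1.0], [1.0, -1.0], -0.5))
```

If a clipped lower bound ends up above the corresponding upper bound, the constraint is infeasible on that box, which in a branch-and-bound setting would mean the subproblem can be discarded. The sketch leaves that check to the caller.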


Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity

Neural Information Processing Systems

Gradient-based algorithms are a cornerstone of artificial neural network training, yet it remains unclear whether biological neural networks use similar gradient-based strategies during learning. Experiments often discover a diversity of synaptic plasticity rules, but whether these amount to an approximation to gradient descent is unclear. Here we investigate a previously overlooked possibility: that learning dynamics may include fundamentally non-gradient "curl"-like components while still being able to effectively optimize a loss function. Curl terms naturally emerge in networks with inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity, resulting in learning dynamics that cannot be framed as gradient descent on any objective. To investigate the impact of these curl terms, we analyze feedforward networks within an analytically tractable student-teacher framework, systematically introducing non-gradient dynamics through neurons exhibiting rule-flipped plasticity.


Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems

Neural Information Processing Systems

Ring attractors, mathematical models inspired by neural circuit dynamics, provide a biologically plausible mechanism to improve learning speed and accuracy in Reinforcement Learning (RL). Serving as specialized brain-inspired structures that encode spatial information and uncertainty, ring attractors explicitly encode the action space, facilitate the organization of neural activity, and enable the distribution of spatial representations across the neural network in the context of Deep Reinforcement Learning (DRL). These structures also provide temporal filtering that stabilizes action selection during exploration, for example, by preserving the continuity between rotation angles in robotic control or adjacency between tactical moves in game-like environments. The application of ring attractors in the action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors by both building an exogenous model and integrating them as part of DRL agents. Our approach significantly improves state-of-the-art performance on the Atari 100k benchmark, achieving a 53% increase in performance over selected baselines.
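
The mapping of actions onto a ring and the decoding step described above can be illustrated with a deliberately simplified sketch. It applies a single pass of circular Gaussian smoothing to per-action values and picks the most active position. This is not the authors' attractor dynamics; the kernel shape, `kernel_width`, and the function name are all assumptions.

```python
import math


def ring_decode(action_values, kernel_width=1.0):
    """Smooth action values around a ring and return the winning index.

    Each action occupies one position on the ring; nearby positions
    excite each other through a Gaussian of the circular distance.
    """
    n = len(action_values)
    smoothed = []
    for i in range(n):
        total = 0.0
        for j, value in enumerate(action_values):
            # Circular distance, so positions 0 and n - 1 are neighbours.
            dist = min(abs(i - j), n - abs(i - j))
            total += value * math.exp(-dist ** 2 / (2 * kernel_width ** 2))
        smoothed.append(total)
    best = max(range(n), key=smoothed.__getitem__)
    return best, smoothed


# Two moderately valued actions on either side of position 0.
best, activity = ring_decode([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
print(best, [round(a, 3) for a in activity])
```

In this example the winning position lies between the two high-valued actions because both support it across the wrap-around, which is the kind of adjacency-preserving behaviour the abstract attributes to the ring structure.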


Connectome-Based Modelling Reveals Orientation Maps in the Drosophila Optic Lobe

Neural Information Processing Systems

The ability to extract oriented edges from visual input is a core computation across animal vision systems. Orientation maps, long associated with the layered architecture of the mammalian visual cortex, systematically organise neurons by their preferred edge orientation. Despite lacking cortical structures, the Drosophila melanogaster brain contains feature-selective neurons and exhibits complex visual detection capacity, raising the question of whether map-like vision representations can emerge without cortical infrastructure. We integrate a complete fruit fly brain connectome with biologically grounded spiking neuron models to simulate neuroprocessing in the fly visual system. By driving the network with oriented stimuli and analysing downstream responses, we show that coherent orientation maps can emerge from purely connectome-constrained dynamics. These results suggest that species of independent origin could evolve similar visual structures.


Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Neural Information Processing Systems

However, most RL algorithms for continuous control are designed for Artificial Neural Networks (ANNs), particularly the target network soft update mechanism, which conflicts with the discrete and non-differentiable dynamics of spiking neurons. We show that this mismatch destabilizes SNN training and degrades performance. To bridge the gap between discrete SNNs and continuous-control algorithms, we propose a novel proxy target framework. The proxy network introduces continuous and differentiable dynamics that enable smooth target updates, stabilizing the learning process. Since the proxy operates only during training, the deployed SNN remains fully energy-efficient with no additional inference overhead. Extensive experiments on continuous control benchmarks demonstrate that our framework consistently improves stability and achieves up to 32% higher performance across various spiking neuron models. Notably, to the best of our knowledge, this is the first approach that enables SNNs with simple Leaky Integrate and Fire (LIF) neurons to surpass their ANN counterparts in continuous control. This work highlights the importance of SNN-tailored RL algorithms and paves the way for neuromorphic agents that combine high performance with low power consumption.
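
The two ingredients contrasted above can be sketched side by side: the standard soft (Polyak) target update, which moves weights by a small continuous amount, and a leaky integrate-and-fire step, whose output is a binary spike. This is a generic illustration, not the proposed proxy target framework; the `decay` and `threshold` defaults and the hard reset are assumptions.

```python
def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def lif_step(voltage, current, decay=0.9, threshold=1.0):
    """One step of a leaky integrate-and-fire neuron with hard reset.

    Returns the new membrane voltage and a binary spike (0 or 1).
    """
    voltage = decay * voltage + current
    spike = 1 if voltage >= threshold else 0
    if spike:
        voltage = 0.0  # hard reset after firing (assumed reset rule)
    return voltage, spike


# The target weights drift smoothly toward the online weights...
print(soft_update([0.0, 2.0], [1.0, 2.0], tau=0.1))

# ...while the neuron's output jumps between 0 and 1.
voltage, spikes = 0.0, []
for _ in range(6):
    voltage, spike = lif_step(voltage, 0.4)
    spikes.append(spike)
print(spikes)
```

A small change in target weights can leave the spike train unchanged or flip a spike entirely, so the smooth weight-space update does not translate into a smooth change in the target network's output.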


Adaptive Fission: Post-training Encoding for Low-latency Spike Neural Networks

Neural Information Processing Systems

Spiking Neural Networks (SNNs) often rely on rate coding, where high-precision inference depends on long time-steps, leading to significant latency and energy cost, especially for ANN-to-SNN conversions. To address this, we propose Adaptive Fission, a post-training encoding technique that selectively splits high-sensitivity neurons into groups with varying scales and weights. This enables neuron-specific, on-demand precision and threshold allocation while introducing minimal spatial overhead. As a generalized form of population coding, it seamlessly applies to a wide range of pretrained SNN architectures without requiring additional training or fine-tuning. Experiments on neuromorphic hardware demonstrate up to 80% reductions in latency and power consumption without degrading accuracy.