Goto

Collaborating Authors

 Country


Spectral Analysis of Representational Similarity with Limited Neurons

Neural Information Processing Systems

Understanding representational similarity between neural recordings and computational models is essential for neuroscience, yet remains challenging to measure reliably due to the constraints on the number of neurons that can be recorded simultaneously. In this work, we apply tools from Random Matrix Theory to investigate how such limitations affect similarity measures, focusing on Centered Kernel Alignment (CKA) and Canonical Correlation Analysis (CCA). We propose an analytical framework for representational similarity analysis that relates measured similarities to the spectral properties of the underlying representations. We demonstrate that neural similarities are systematically underestimated under finite neuron sampling, mainly due to eigenvector delocalization. Moreover, for power-law population spectra, we show that the number of localized eigenvectors scales as the square root of the number of recorded neurons, providing a simple rule of thumb for practitioners. To overcome sampling bias, we introduce a denoising method to infer population-level similarity, enabling accurate analysis even with small neuron samples. Theoretical predictions are validated on synthetic and real datasets, offering practical strategies for interpreting neural data under finite sampling constraints.


Efficient Quadratic Corrections for Frank-Wolfe Algorithms

Neural Information Processing Systems

We develop a Frank-Wolfe algorithm with corrective steps, generalizing previous algorithms including Blended Conditional Gradients, Blended Pairwise Conditional Gradients, and Fully-Corrective Frank-Wolfe. For this, we prove tight convergence guarantees together with an optimal face identification property. Furthermore, we propose two highly efficient corrective steps for convex quadratic objectives based on linear optimization or linear system solving, akin to Wolfe's MinimumNorm Point algorithm, and prove finite-time convergence under suitable conditions. Beyond optimization problems that are directly quadratic, we revisit two algorithms, Split Conditional Gradient and Second-Order Conditional Gradient Sliding, which can leverage quadratic corrections to accelerate the solution of their quadratic subproblems. We show improved convergence rates for the first and prove broader applicability for the second. Finally, we demonstrate substantial computational speedups for Frank-Wolfe-based algorithms with quadratic corrections across the considered problem classes.


WorldEmbeddingVLAInstructionImageVLAActionImage/Video Generation InstructionImagePolicyVLAInstructionImageAction InstructionImageActionAction(a)(b)(c)(d)Dream Queries

Neural Information Processing Systems

Recent advances in vision-language-action (VLA) models have shown promise in integrating image generation with action prediction to improve generalization and reasoning in robot manipulation. However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information. To address these limitations, we propose DreamVLA, a novel VLA framework that integrates comprehensive world knowledge forecasting to enable inverse dynamics modeling, thereby establishing a perceptionprediction-action loop for manipulation tasks. Specifically, DreamVLA introduces a dynamic-region-guided world knowledge prediction, integrated with the spatial and semantic cues, which provide compact yet comprehensive representations for action planning.


Computational Budget Should Be Considered in Data Selection

Neural Information Processing Systems

Data selection improves computational efficiency by choosing informative subsets of training samples. However, existing methods ignore the compute budget, treating data selection and importance evaluation independently of compute budget constraints. Yet empirical studies show no algorithm can consistently outperform others (or even random selection) across varying budgets. We therefore argue that compute budget must be integral to data-selection strategies, since different budgets impose distinct requirements on data quantity, quality, and distribution for effective training. To this end, we propose a novel Computational budget-Aware Data Selection (CADS) method and naturally formulate it into a bilevel optimization framework, where the inner loop trains the model within the constraints of the computational budget on some selected subset of training data, while the outer loop optimizes data selection based on model evaluation.


ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Neural Information Processing Systems

Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language models (LLMs) for annotation, but LLM-generated labels still fall short of human-level quality. To address this problem, we propose the Annotation with Critical Thinking (ACT) data pipeline, where LLMs serve not only as annotators but also as judges to critically identify potential errors. Human effort is then directed towards reviewing only the most "suspicious" cases, significantly improving the human annotation efficiency. Our major contributions are as follows: (1) ACT is applicable to a wide range of domains, including natural language processing (NLP), computer vision (CV), and multimodal understanding, by leveraging multimodal-LLMs (MLLMs).


Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLMReasoning

Neural Information Processing Systems

Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. Understanding these mechanisms is crucial to improve their reasoning abilities. Drawing inspiration from the interplay between neural processes and human cognition, we propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads, which are key components of LLMs. We introduce CogQA, a dataset that decomposes complex questions into step-by-step subquestions with a chain-of-thought design, each associated with specific cognitive functions such as retrieval or logical reasoning. By applying a multi-class probing method, we identify the attention heads responsible for these functions. Our analysis across multiple LLM families reveals that attention heads exhibit functional specialization, characterized as cognitive heads. These cognitive heads exhibit several key properties: they are universally sparse, and vary in number and distribution across different cognitive functions, and they display interactive and hierarchical structures. We further show that cognitive heads play a vital role in reasoning tasks--removing them leads to performance degradation, while augmenting them enhances reasoning accuracy. These insights offer a deeper understanding of LLM reasoning and suggest important implications for model design, training and fine-tuning strategies.


Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation

Neural Information Processing Systems

Image generation requires intensive computations and faces challenges due to long latency. Exploiting redundancy in the input images and intermediate representations throughout the neural network pipeline is an effective way to accelerate image generation. Token merging (ToMe) exploits similarities among input tokens by clustering them and merges similar tokens into one, thus significantly reducing the number of tokens that are fed into the transformer block. This work introduces Fourier Token Merging, a new method for understanding and capitalizing frequency domain for efficient image generation. By introducing frequency token merging, we find that transforming the token into the frequency domain representation for clustering can better exert the ability of clustering based on the underlying redundancy after de-correlation. Through analytical and empirical studies, we demonstrate the benefits of using Fourier clustering over the original time domain clustering. We experimented Fourier Token Merging on the stable diffusion model, and the results show up to 25% reduction in latency without impairing image quality.


Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies

Neural Information Processing Systems

Reinforcement learning (RL) systems have countless applications, from energygrid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-theart RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date.


DoseSurv: Predicting Personalized Survival Outcomes under Continuous-Valued Treatments

Neural Information Processing Systems

Estimating heterogeneous treatment effects (HTEs) of continuous-valued interventions on survival, that is, time-to-event (TTE) outcomes, is crucial in various fields, notably in clinical decision-making and in driving the advancement of nextgeneration clinical trials. However, while HTE estimation for continuous-valued (i.e., dosage-dependent) interventions and for TTE outcomes have been separately explored, their combined application remains largely overlooked in the machine learning literature. We propose DoseSurv, a varying-coefficient network designed to estimate HTEs for different dosage-dependent and non-dosage treatment options from TTE data. DoseSurv uses radial basis functions to model continuity in doseresponse relationships and learns balanced representations to address covariate shifts arising in HTE estimation from observational TTE data.


POCO: Scalable Neural Forecasting through Population Conditioning

Neural Information Processing Systems

Predicting future neural activity is a core challenge in modeling brain dynamics, with applications ranging from scientific investigation to closed-loop neurotechnology. While recent models of population activity emphasize interpretability and behavioral decoding, neural forecasting--particularly across multi-session, spontaneous calcium recordings--remains underexplored. We introduce POCO, a unified forecasting model that combines a lightweight univariate forecaster with a population-level encoder to capture both neuron-specific and brain-wide dynamics in calcium imaging recordings. Trained across five calcium imaging datasets spanning zebrafish, mice, and C. elegans, POCO achieves state-of-the-art accuracy at cellular resolution in spontaneous behaviors.