Goto

Collaborating Authors

 Genre


Nyström-Accelerated Primal LS-SVMs: Breaking the O(an3) Complexity Bottleneck for Scalable ODEs Learning

Neural Information Processing Systems

A major problem of kernel-based methods (e.g., least squares support vector machines, LS-SVMs) for solving linear/nonlinear ordinary differential equations (ODEs) is the prohibitive O(an3) (a = 1 for linear ODEs and 27 for nonlinear ODEs) part of their computational complexity with increasing temporal discretization points n. We propose a novel Nyström-accelerated LS-SVMs framework that breaks this bottleneck by reformulating ODEs as primal-space constraints. Specifically, we derive for the first time an explicit Nyström-based mapping and its derivatives from one-dimensional temporal discretization points to a higher m-dimensional feature space (1 < m n), enabling the learning process to solve linear/nonlinear equation systems with m-dependent complexity. Numerical experiments on sixteen benchmark ODEs demonstrate: 1) 10 6000 times faster computation than classical LS-SVMs and physics-informed neural networks (PINNs), 2) comparable accuracy to LS-SVMs (< 0.13% relative MAE, RMSE, and y ˆy difference) while maximum surpassing PINNs by 72% in RMSE, and 3) scalability to n = 104 time steps with m = 50features. This work establishes a new paradigm for efficient kernel-based ODEs learning without significantly sacrificing the accuracy of the solution.


Error Forcing in Recurrent Neural Networks

Neural Information Processing Systems

One way to address the known limitations of backpropagation through time is to directly adjust neural activities during the learning process. However, it remains unclear how to effectively use feedback to shape RNN dynamics. Here, we introduce error forcing (EF), where the network activity is guided orthogonally toward the zero-error manifold during learning. This method contrasts with alternatives like teaching forcing, which impose stronger constraints on neural activity and thus induce larger feedback influence on circuit dynamics. Furthermore, EF can be understood from a Bayesian perspective as a form of approximate dynamic inference. Empirically, EF consistently outperforms other learning algorithms across several tasks and its benefits persist when additional biological constraints are taken into account. Overall, EF is a powerful temporal credit assignment mechanism and a promising candidate model for learning in biological systems.


MLLM-For3D: Adapting Multimodal Large Language Model for 3DReasoning Segmentation

Neural Information Processing Systems

Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning. While recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation, adapting these capabilities to 3D scenes remains underexplored. In this paper, we introduce MLLM-For3D, a simple yet effective framework that transfers knowledge from 2DMLLMs to 3D scene understanding. Specifically, we utilize MLLMs to generate multi-view pseudo-segmentation masks and corresponding text embeddings, then unproject 2D masks into 3D space and align them with the text embeddings. The primary challenge lies in the absence of 3D context and spatial consistency across multiple views, causing the model to hallucinate objects that do not exist and fail to target objects consistently.


What Happens During the Loss Plateau Understanding Abrupt Learning in Transformers

Neural Information Processing Systems

Training Transformers on algorithmic tasks frequently demonstrates an intriguing abrupt learning phenomenon: an extended performance plateau followed by a sudden, sharp improvement. This work investigates the underlying mechanisms for such dynamics, primarily in shallow Transformers. We reveal that during the plateau, the model often develops an interpretable partial solution while simultaneously exhibiting a strong repetition bias in their outputs. This output degeneracy is accompanied by internal representation collapse, where hidden states across different tokens become nearly parallel. We further identify the slow learning of optimal attention maps as a key bottleneck. Hidden progress in attention configuration during the plateau precedes the eventual rapid convergence, and directly intervening on attention significantly alters plateau duration and the severity of repetition bias and representational collapse. We validate that these identified phenomena--repetition bias and representation collapse--are not artifacts of toy setups but also manifest in the early pre-training stage of large language models like Pythia and OLMo.


Exploring and Leveraging Class Vectors for Classifier Editing

Neural Information Processing Systems

Image classifiers play a critical role in detecting diseases in medical imaging and identifying anomalies in manufacturing processes. However, their predefined behaviors after extensive training make post hoc model editing difficult, especially when it comes to forgetting specific classes or adapting to distribution shifts. Existing classifier editing methods either focus narrowly on correcting errors or incur extensive retraining costs, creating a bottleneck for flexible editing. Moreover, such editing has seen limited investigation in image classification. To overcome these challenges, we introduce Class Vectors, which capture class-specific representation adjustments during fine-tuning.


Learning Stochastic Multiscale Models

Neural Information Processing Systems

The physical sciences are replete with dynamical systems that require the resolution of a wide range of length and time scales. This presents significant computational challenges since direct numerical simulation requires discretization at the finest relevant scales, leading to a high-dimensional state space. In this work, we propose an approach to learn stochastic multiscale models in the form of stochastic differential equations directly from observational data. Drawing inspiration from physics-based multiscale modeling approaches, we resolve the macroscale state on a coarse mesh while introducing a microscale latent state to explicitly model unresolved dynamics. We learn the parameters of the multiscale model using a simulator-free amortized variational inference method with a Product of Experts likelihood that enforces scale separation. We present detailed numerical studies to demonstrate that our learned multiscale models achieve superior predictive accuracy compared to under-resolved direct numerical simulation and closure-type models at equivalent resolution, as well as reduced-order modeling approaches.


EPA Boosting Event based Video Frame Interpolation with Perceptually Aligned Learning

Neural Information Processing Systems

Event cameras, with their capacity to provide high temporal resolution information between frames, are increasingly utilized for video frame interpolation (VFI) in challenging scenarios characterized by high-speed motion and significant occlusion. However, prevalent issues of blur and distortion within the keyframes and ground truth data used for training and inference in these demanding conditions are frequently overlooked. This oversight impedes the perceptual realism and multiscene generalization capabilities of existing event-based VFI (E-VFI) methods when generating interpolated frames. Motivated by the observation that semanticperceptual discrepancies between degraded and pristine images are considerably smaller than their image-level differences, we introduce EPA. This novel E-VFI framework diverges from approaches reliant on direct image-level supervision by constructing multilevel, degradation-insensitive semantic perceptual supervisory signals to enhance the perceptual realism and multi-scene generalization of the model's predictions. Specifically, EPA operates in two phases: it first employs a DINO-based perceptual extractor, a customized style adapter, and a reconstruction generator to derive multi-layered, degradation-insensitive semantic-perceptual features (S).


World ModelBench: Judging Video Generation Models As World Models

Neural Information Processing Systems

Video generation models have rapidly progressed, positioning themselves as video world models capable of supporting decision-making applications like robotics and autonomous driving. However, current benchmarks fail to rigorously evaluate these claims, focusing only on general video quality, ignoring important factors to world models such as physics adherence. To bridge this gap, we propose WorldModelBench, a benchmark designed to evaluate the world modeling capabilities of video generation models in application-driven domains. WorldModelBench offers two key advantages: (1) Against to nuanced world modeling violations: By incorporating instruction-following and physics-adherence dimensions, WorldModelBench detects subtle violations, such as irregular changes in object size that breach the mass conservation law--issues overlooked by prior benchmarks.


Online Exploration Unknown GeometryOpen-world Objects

Neural Information Processing Systems

Current 3D scene understanding methods are limited by offline-collected multi-view data or pre-constructed 3D geometry. In this paper, we present ExtractAnything3D ( enables EA3D), simultaneous a unified online geome frame tric w reconstruction ork for open-w and orld holistic 3D object scene e understanding.


Contimask: Explaining Irregular Time Series Models via Perturbations in Continuous Time

Neural Information Processing Systems

Explaining black-box models for time series data is critical for the wide-scale adoption of deep learning techniques across domains such as healthcare. Recently, explainability methods for deep time series models have seen significant progress by adopting saliency methods that perturb masked segments of time series to uncover their importance towards the prediction of black-box models. Thus far, such methods have been largely restricted to regular time series. Irregular time series, however, sampled at irregular time intervals and potentially with missing values, are the dominant form of time series in various critical domains (e.g., hospital records). In this paper, we conduct the first evaluation of saliency methods for the interpretation of irregular time series models.