Learning Robust Spectral Dynamics for Temporal Domain Generalization

Neural Information Processing Systems

Modern machine learning models struggle to maintain performance in dynamic environments where temporal distribution shifts, i.e., concept drift, are prevalent. Temporal Domain Generalization (TDG) seeks to enable model generalization across evolving domains, yet existing approaches typically assume smooth incremental changes, struggling with complex real-world drifts involving both long-term structure (incremental evolution/periodicity) and local uncertainties. To overcome these limitations, we introduce FreKoo, which tackles these challenges through a novel frequency-domain analysis of parameter trajectories. It leverages the Fourier transform to disentangle parameter evolution into distinct spectral bands. Specifically, the low-frequency components with dominant dynamics are learned and extrapolated using the Koopman operator, robustly capturing diverse drift patterns including both incremental and periodic drifts. Simultaneously, potentially disruptive high-frequency variations are smoothed via targeted temporal regularization, preventing overfitting to transient noise and domain uncertainties. In addition, this dual-spectral strategy is rigorously grounded through theoretical analysis, providing stability guarantees for the Koopman prediction, a principled Bayesian justification for the high-frequency regularization, and culminating in a multiscale generalization bound connecting spectral dynamics to improved generalization. Extensive experiments demonstrate FreKoo's significant superiority over state-of-the-art TDG methods, particularly excelling in real-world streaming scenarios with complex drifts and uncertainties.
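
As a rough illustration of the dual-spectral idea in the abstract above, the following NumPy sketch splits a parameter trajectory into low- and high-frequency bands with a real FFT, then fits a least-squares linear operator to one-step transitions (a DMD-style stand-in for a finite Koopman approximation). The function names, the hard frequency cutoff `keep`, and the least-squares fit are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def split_spectral_bands(trajectory, keep):
    """Split a (T, D) trajectory into low- and high-frequency parts along time.

    `keep` is the number of lowest rFFT bins assigned to the low band
    (a hypothetical hard cutoff chosen for this sketch).
    """
    spectrum = np.fft.rfft(trajectory, axis=0)
    low_spectrum = np.zeros_like(spectrum)
    low_spectrum[:keep] = spectrum[:keep]
    low = np.fft.irfft(low_spectrum, n=trajectory.shape[0], axis=0)
    high = trajectory - low  # residual holds everything above the cutoff
    return low, high

def fit_linear_operator(states):
    """Least-squares K such that states[t + 1] is approximately states[t] @ K."""
    current, following = states[:-1], states[1:]
    K, *_ = np.linalg.lstsq(current, following, rcond=None)
    return K

def extrapolate(last_state, K, steps):
    """Roll the fitted operator forward `steps` times from `last_state`."""
    state = last_state
    for _ in range(steps):
        state = state @ K
    return state
```

In this sketch only the low band would be passed to `fit_linear_operator` and extrapolated, while the residual `high` is the part a smoothing penalty would act on.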


Explore In-Context Message Passing Operator for Graph Neural Networks in A Mean Field Game

Neural Information Processing Systems

In typical graph neural networks (GNNs), feature representation learning naturally evolves through iteratively updating node features and exchanging information based on graph topology. In this context, we conceptualize the learning process in GNNs as a mean-field game (MFG), where each graph node is an agent interacting with its topologically connected neighbors. However, current GNNs often employ an identical MFG strategy across different graph datasets, regardless of whether the graph exhibits homophilic or heterophilic characteristics. To address this challenge, we propose to formulate the learning mechanism as a variational framework of the MFG inverse problem, introducing an in-context selective message passing paradigm for each agent, which promotes the best overall outcome for the graph. Specifically, we seek, on the fly, the application-adaptive transportation function (controlling information exchange throughout the graph) and reaction function (controlling feature representation learning on each agent), which allows us to uncover the most suitable selective mechanism of message passing by solving an MFG variational problem through the lens of Hamiltonian flows. Taken together, our variational framework unifies existing GNN models into various mean-field games with distinct equilibrium states, each characterized by the learned in-context message passing operators. Furthermore, we present an agnostic end-to-end deep model, coined Game-of-GNN, to jointly identify the message passing mechanism and fine-tune the GNN hyper-parameters on top of the elucidated message passing operators. Game-of-GNN has achieved SOTA performance on diverse graph data, including popular benchmark datasets and human connectomes. More importantly, the mathematical insight of the MFG framework provides a new window to understand the foundational principles of graph learning as an interactive dynamical system, which allows us to reshape the idea of designing next-generation GNN models.


Counterfactual Evolution of Multimodal Datasets via Visual Programming

Neural Information Processing Systems

The rapid development of Multimodal Large Language Models (MLLMs) poses increasing demands on the diversity and complexity of multimodal datasets. Yet manual annotation pipelines can no longer keep pace. Existing augmentation methods often follow fixed rules and lack verifiable control over sample diversity and reasoning complexity. To address this, we introduce Scalable COunterfactual Program Evolution (SCOPE), a framework that uses symbolic Visual Programming to guide program evolution via counterfactual reasoning. SCOPE performs the three steps of counterfactual inference: (1) Abduction, by generating verifiable programs to model reasoning associations; (2) Action, by intervening on program structure along three axes--reasoning path, visual context, and cross-instance composition; and (3) Prediction, by categorizing evolved instances by difficulty, structure, and input multiplicity. Based on this process, we build SCOPE-Train and SCOPE-Test, evolving benchmarks with expert validation. To support training, we propose MAP, a curriculum learning strategy that aligns model capacity with sample difficulty. Experiments show that SCOPE improves reasoning performance, exposes model blind spots, and enhances visual dialog capabilities.


Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies

Neural Information Processing Systems

Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representation and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.


Continuous Domain Generalization

Neural Information Processing Systems

Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic contexts. However, existing domain generalization approaches typically treat domains as discrete or as evolving along a single axis (e.g., time). This oversimplification fails to capture the complex, multidimensional nature of real-world variation. This paper introduces the task of Continuous Domain Generalization (CDG), which aims to generalize predictive models to unseen domains defined by arbitrary combinations of continuous variations. We present a principled framework grounded in geometric and algebraic theories, showing that optimal model parameters across domains lie on a low-dimensional manifold. To model this structure, we propose a Neural Lie Transport Operator (NeuralLio), which enables structure-preserving parameter transitions by enforcing geometric continuity and algebraic consistency. To handle noisy or incomplete domain variation descriptors, we introduce a gating mechanism to suppress irrelevant dimensions and a local chart-based strategy for robust generalization. Extensive experiments on synthetic and real-world datasets, including remote sensing, scientific documents, and traffic forecasting, demonstrate that our method significantly outperforms existing baselines in both generalization accuracy and robustness.



Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability

Neural Information Processing Systems

The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
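
The stability threshold behind the Edge of Stability phenomenon can be seen in a toy problem. The sketch below runs gradient descent on the one-dimensional quadratic f(x) = 0.5 * curvature * x^2, where each step multiplies x by (1 - step_size * curvature), so the iterates shrink when curvature is below 2 / step_size and grow when it is above. The quadratic setting and the function name are assumptions chosen for illustration; this is not the paper's experimental setup and says nothing about NTK eigenvectors.

```python
def gd_on_quadratic(curvature, step_size, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return |x| at the end."""
    x = x0
    for _ in range(steps):
        gradient = curvature * x
        x = x - step_size * gradient
    return abs(x)

step_size = 0.1
threshold = 2.0 / step_size  # classical stability limit for a quadratic

# Curvature just below the threshold: iterates contract toward zero.
below = gd_on_quadratic(curvature=threshold - 1.0, step_size=step_size)

# Curvature just above the threshold: iterates oscillate in sign and blow up.
above = gd_on_quadratic(curvature=threshold + 1.0, step_size=step_size)
```

With a fixed step size, the only way for training to remain bounded is for the leading curvature to stay near or below this limit, which is consistent with the inverse dependence on step size described in the abstract.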


A solvable model of learning generative diffusion: theory and insights

Neural Information Processing Systems

In this manuscript, we analyze a solvable model of a flow- or diffusion-based generative model. We consider the problem of learning a model parametrized by a two-layer auto-encoder, trained with online stochastic gradient descent, on a high-dimensional target density with an underlying low-dimensional manifold structure. We derive a tight asymptotic characterization of low-dimensional projections of the distribution of samples generated by the learned model, ascertaining in particular its dependence on the number of training samples. Building on this analysis, we discuss how mode collapse can arise and lead to model collapse when the generative model is re-trained on generated synthetic data.


Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow

Neural Information Processing Systems

Understanding the evolution of cellular microenvironments in spatiotemporal data is essential for deciphering tissue development and disease progression. While experimental techniques like spatial transcriptomics now enable high-resolution mapping of tissue organization across space and time, current methods that model cellular evolution operate at the single-cell level, overlooking the coordinated development of cellular states in a tissue. We introduce NicheFlow, a flow-based generative model that infers the temporal trajectory of cellular microenvironments across sequential spatial slides. By representing local cell neighborhoods as point clouds, NicheFlow jointly models the evolution of cell states and spatial coordinates using optimal transport and Variational Flow Matching. Our approach successfully recovers both global spatial architecture and local microenvironment composition across diverse spatiotemporal datasets, from embryonic to brain development.

