 generalizability


Towards Generalizable Retina Vessel Segmentation with Deformable Graph Priors

Neural Information Processing Systems

Retinal vessel segmentation is critical for medical diagnosis, yet existing models often struggle to generalize across domains due to appearance variability, limited annotations, and complex vascular morphology. We propose GraphSeg, a variational Bayesian framework that integrates anatomical graph priors with structure-aware image decomposition to enhance cross-domain segmentation.


Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable

Neural Information Processing Systems

The rapid increase in AI-generated images (AIGIs) underscores the need for detection methods. Existing detectors are often trained on biased datasets, leading to overfitting on spurious correlations between non-causal image attributes and real/synthetic labels. While these biased features enhance performance on the training data, they result in substantial performance degradation when tested on unbiased datasets. A common solution is to perform data alignment through generative reconstruction, matching the content between real and synthetic images. However, we find that pixel-level alignment alone is inadequate, as the reconstructed images still suffer from frequency-level misalignment, perpetuating spurious correlations.


STARC-9: A Large-scale Dataset for Multi-Class Tissue Classification for CRC Histopathology

Neural Information Processing Systems

Multi-class tissue-type classification of colorectal cancer (CRC) histopathologic images is a significant step in the development of downstream machine learning models for diagnosis and treatment planning. However, publicly available CRC datasets used to build tissue classifiers often suffer from insufficient morphologic diversity, class imbalance, and low-quality image tiles, limiting downstream model performance and generalizability. To address this research gap, we introduce STARC-9 (STAnford coloRectal Cancer), a large-scale dataset for multi-class tissue classification. STARC-9 comprises 630,000 histopathologic image tiles uniformly sampled across nine clinically relevant tissue classes (each represented by 70,000 tiles), systematically extracted from hematoxylin & eosin-stained whole-slide images (WSI) from 200 CRC patients at the Stanford University School of Medicine. To construct STARC-9, we propose a novel framework, DeepCluster++, consisting of two primary steps to ensure diversity within each tissue class, followed by pathologist verification.


Is Grokking a Computational Glass Relaxation?

Neural Information Processing Systems

Understanding neural networks' (NNs') generalizability remains a central question in deep learning research. The special phenomenon of grokking, where NNs abruptly generalize long after training performance reaches a near-perfect level, offers a unique window into the underlying mechanisms of NNs' generalizability. Here we propose an interpretation of grokking by framing it as a computational glass relaxation: viewing NNs as a physical system where parameters are the degrees of freedom and train loss is the system energy, we find that the memorization process resembles a rapid cooling of a liquid into a non-equilibrium glassy state at low temperature, and the later generalization is like a slow relaxation towards a more stable configuration. This mapping enables us to sample NNs' Boltzmann entropy (density of states) landscape as a function of training loss and test accuracy.


Generalized and Invariant Single-Neuron In-Vivo Activity Representation Learning

Neural Information Processing Systems

In computational neuroscience, models representing single-neuron in-vivo activity have become essential for understanding the functional identities of individual neurons. These models, such as implicit representation methods based on Transformer architectures, contrastive learning frameworks, and variational autoencoders, aim to capture the invariant and intrinsic computational features of single neurons. The learned representations of a single neuron's computational role should remain invariant across changing environments, and they are affected by the neuron's molecular expression and location. Thus, the representations allow for in-vivo prediction of the molecular cell types and anatomical locations of single neurons, facilitating advanced closed-loop experimental designs. However, current models face the problem of limited generalizability.


Enhancing Sharpness-Aware Optimization Through Variance Suppression

Neural Information Processing Systems

Sharpness-aware minimization (SAM) has well-documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the utmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness.
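
The min-max idea described in this abstract can be illustrated with a minimal sketch of a plain SAM update, assuming the commonly used first-order approximation in which the adversary moves the parameters a distance rho along the normalized gradient. This is not the VaSSO method proposed in the paper; the toy quadratic loss and the names sam_perturbation, sam_step, curv, lr, and rho are assumptions introduced only for illustration.

```python
import numpy as np


def sam_perturbation(grad, rho):
    """Approximate worst-case perturbation inside an L2 ball of radius rho.

    Assumption: the inner maximization is linearized, so the adversary
    simply steps along the normalized gradient direction.
    """
    return rho * grad / (np.linalg.norm(grad) + 1e-12)


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One plain SAM update: perturb, then descend using the perturbed gradient."""
    eps = sam_perturbation(grad_fn(w), rho)
    g_adv = grad_fn(w + eps)  # gradient evaluated at the adversarial point
    return w - lr * g_adv


# Toy quadratic loss with one sharp and one flat direction (illustrative only).
curv = np.array([10.0, 0.1])


def loss_fn(w):
    return 0.5 * np.sum(curv * w ** 2)


def grad_fn(w):
    return curv * w


w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(100):
    w = sam_step(w, grad_fn)

initial_loss = loss_fn(w0)
final_loss = loss_fn(w)
```

In this sketch the perturbation is recomputed from the current gradient at every step. With stochastic minibatch gradients, that per-step adversary is the quantity the abstract proposes to stabilize through variance suppression.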


Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Neural Information Processing Systems

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation, which implements multiple-step policy optimization on one-time data collection. Beyond existing meta-RL analyses, we provide upper bounds of the expected optimality gap over the task distribution. This metric measures the distance of the policy adaptation from the learned meta-prior to the task-specific optimum, and quantifies the model's generalizability to the task distribution. We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.


Probing the Decision Boundaries of In-context Learning in Large Language Models

Neural Information Processing Systems

In-context learning is an emergent paradigm in large language models (LLMs) that enables them to generalize to new tasks and domains by simply prompting these models with a few exemplars without explicit parameter updates. Many attempts have been made to understand in-context learning in LLMs as a function of model scale, pretraining data, and other factors. In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. Decision boundaries are straightforward to visualize and provide important information about the qualitative behavior of the inductive biases of standard classifiers. To our surprise, we find that the decision boundaries learned by current LLMs in simple binary classification tasks are often irregularly non-smooth, regardless of task linearity. This paper investigates the factors influencing these decision boundaries and explores methods to enhance their generalizability. We assess various approaches, including training-free and fine-tuning methods for LLMs, the impact of model architecture, and the effectiveness of active prompting techniques for smoothing decision boundaries in a data-efficient manner. Our findings provide a deeper understanding of in-context learning dynamics and offer practical improvements for enhancing robustness and generalizability of in-context learning.