
Explanation-based Data Augmentation for Image Classification

Neural Information Processing Systems

Existing works have generated explanations for deep neural network decisions to provide insights into model behavior. We observe that these explanations can also be used to identify concepts that caused misclassifications. This allows us to understand the possible limitations of the dataset used to train the model, particularly the under-represented regions in the dataset.


Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

Ahmed M. Alaa, Mihaela van der Schaar

Neural Information Processing Systems

We develop a Bayesian model for decision-making under time pressure with endogenous information acquisition. In our model, the decision-maker decides when to observe (costly) information by sampling an underlying continuous-time stochastic process (time series) that conveys information about the potential occurrence/non-occurrence of an adverse event which will terminate the decision-making process. In her attempt to predict the occurrence of the adverse event, the decision-maker follows a policy that determines when to acquire information from the time series (continuation), and when to stop acquiring information and make a final prediction (stopping). We show that the optimal policy has a "rendezvous" structure, i.e. a structure in which whenever a new information sample is gathered from the time series, the optimal "date" for acquiring the next sample becomes computable. The optimal interval between two information samples balances a trade-off between the decision-maker's "surprise", i.e. the drift in her posterior belief after observing new information, and "suspense", i.e. the probability that the adverse event occurs in the time interval between two information samples. Moreover, we characterize the continuation and stopping regions in the decision-maker's state-space, and show that they depend not only on the decision-maker's beliefs, but also on the "context", i.e. the current realization of the time series.
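The surprise/suspense trade-off can be illustrated with a deliberately simplified sketch: two hypotheses about an exponential adverse-event hazard, and a binary observation of whether the event occurred in the waiting interval. The hazard rates, belief, and the scoring rule below are illustrative stand-ins, not the paper's continuous-time model:

```python
import numpy as np

# Two hypotheses about the adverse-event hazard rate.
rates = np.array([0.1, 1.0])     # benign vs. risky regime
belief = np.array([0.5, 0.5])    # current posterior over the two regimes

def suspense(delta):
    """Probability the adverse event occurs within the next `delta`
    time units, under the current belief (exponential hazards)."""
    return float(belief @ (1.0 - np.exp(-rates * delta)))

def expected_surprise(delta):
    """Expected L1 drift of the posterior after observing whether the
    event occurred in the interval (a crude discrete stand-in for the
    paper's continuous-time observation model)."""
    p_event = belief * (1.0 - np.exp(-rates * delta))  # joint: regime & event
    p_safe = belief * np.exp(-rates * delta)           # joint: regime & no event
    drift = 0.0
    for branch in (p_event, p_safe):
        if branch.sum() > 0:
            post = branch / branch.sum()               # posterior on that branch
            drift += branch.sum() * np.abs(post - belief).sum()
    return float(drift)

# Pick the sampling interval that trades information gained against risk.
deltas = np.linspace(0.05, 3.0, 60)
scores = [expected_surprise(d) - suspense(d) for d in deltas]
best = deltas[int(np.argmax(scores))]
```

Very short intervals yield little posterior drift (low surprise) but little risk; long intervals are informative but let the suspense term grow, so the score peaks at an interior interval.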





OFAL: An Oracle-Free Active Learning Framework

Khorsand, Hadi, Pourahmadi, Vahid

arXiv.org Artificial Intelligence

In the active learning paradigm, using an oracle to label data has always been a complex and expensive task, and with the emergence of large unlabeled data pools, it would be highly beneficial if we could achieve better results without relying on an oracle. This research introduces OFAL, an oracle-free active learning scheme that utilizes neural network uncertainty. OFAL uses the model's own uncertainty to transform highly confident unlabeled samples into informative uncertain samples. First, we separate and quantify the different parts of uncertainty and introduce Monte Carlo Dropout as an approximation of a Bayesian neural network. Second, by adding a variational autoencoder, we generate new uncertain samples by stepping toward the uncertain part of the latent space, starting from a confident seed sample. By generating these new informative samples, we can perform active learning and enhance the model's accuracy. Lastly, we compare and integrate our method with other widely used active learning sampling methods.
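The Monte Carlo Dropout approximation mentioned above can be sketched in a few lines of NumPy: keep dropout active at prediction time, run several stochastic forward passes, and use the spread of the softmax outputs as an uncertainty score. The toy one-layer "network" and all parameters here are illustrative, not OFAL's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer "network": logits = h @ W, with dropout applied to the input.
W = rng.normal(size=(4, 3))

def mc_dropout_predict(x, n_passes=200, p_drop=0.5):
    """Approximate Bayesian prediction via MC Dropout: keep dropout ON at
    test time, average the softmax over many stochastic forward passes,
    and score uncertainty by the predictive entropy."""
    probs = []
    for _ in range(n_passes):
        mask = rng.random(x.shape) >= p_drop      # Bernoulli dropout mask
        h = (x * mask) / (1.0 - p_drop)           # inverted-dropout scaling
        logits = h @ W
        e = np.exp(logits - logits.max())         # stable softmax
        probs.append(e / e.sum())
    mean = np.array(probs).mean(axis=0)           # predictive mean
    entropy = -(mean * np.log(mean + 1e-12)).sum()  # uncertainty score
    return mean, entropy

x = np.array([1.0, -0.5, 0.3, 0.8])
mean, uncertainty = mc_dropout_predict(x)
```

Samples with low entropy would serve as the "confident seed samples" of the abstract; high-entropy ones are the informative targets.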


Data Generation without Function Estimation

Daneshmand, Hadi, Soleymani, Ashkan

arXiv.org Machine Learning

Estimating the score function (or other population-density-dependent functions) is a fundamental component of most generative models. However, such function estimation is computationally and statistically challenging. Can we avoid function estimation for data generation? We propose an estimation-free generative method: a set of points whose locations are deterministically updated with (inverse) gradient descent can transport a uniform distribution to an arbitrary data distribution, in the mean-field regime, without function estimation, neural network training, or even noise injection. The proposed method is built upon recent advances in the physics of interacting particles. We show, both theoretically and experimentally, that these advances can be leveraged to develop novel generative methods.
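As a loose illustration of the idea (deterministic particle updates, no learned score, no noise), here is a generic interacting-particle scheme with kernel attraction toward the data and kernel repulsion between particles. It is a stand-in for, not a reproduction of, the paper's method; kernel, bandwidth, and step size are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, h=1.0):
    """Pairwise RBF kernel between two point sets of shape (n, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

def transport(particles, data, steps=400, lr=0.05):
    """Deterministically move particles toward the data distribution:
    kernel attraction to the data points, kernel repulsion between the
    particles themselves. No density or score estimation, no noise."""
    for _ in range(steps):
        K_pd = rbf(particles, data)         # particle-to-data kernel
        K_pp = rbf(particles, particles)    # particle-to-particle kernel
        attract = (K_pd[..., None] * (data[None] - particles[:, None])).mean(axis=1)
        repel = (K_pp[..., None] * (particles[:, None] - particles[None])).mean(axis=1)
        particles = particles + lr * (attract + repel)
    return particles

data = rng.normal(loc=2.0, size=(200, 1))      # target samples around x = 2
init = rng.uniform(-1.0, 1.0, size=(100, 1))   # particles start uniform on [-1, 1]
out = transport(init, data)
```

The repulsion term keeps the particles spread out so they cover the target distribution instead of collapsing onto its mean.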


Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions

Tran, Hoang, Zhang, Zezhong, Bao, Feng, Lu, Dan, Zhang, Guannan

arXiv.org Machine Learning

We propose a hybrid generative model for efficient sampling of high-dimensional, multimodal probability distributions for Bayesian inference. Traditional Monte Carlo methods, such as the Metropolis-Hastings and Langevin Monte Carlo sampling methods, are effective for sampling from single-mode distributions in high-dimensional spaces. However, these methods struggle to produce samples with the correct proportions for each mode in multimodal distributions, especially for distributions with well-separated modes. To address the challenges posed by multimodality, we adopt a divide-and-conquer strategy. We start by minimizing the energy function with initial guesses uniformly distributed within the prior domain to identify all the modes of the energy function. Then, we train a classifier to segment the domain corresponding to each mode. After the domain decomposition, we train a diffusion-model-assisted generative model for each identified mode within its support. Once each mode is characterized, we employ bridge sampling to estimate the normalizing constant, allowing us to directly adjust the ratios between the modes. Our numerical examples demonstrate that the proposed framework can effectively handle multimodal distributions with varying mode shapes in up to 100 dimensions. An application to a Bayesian inverse problem for partial differential equations is also provided.
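The first stage, gradient descent on the energy from uniformly scattered initial guesses followed by clustering the converged points into modes, can be sketched on a toy double-well energy. The energy function, step size, and clustering threshold are illustrative choices, not the paper's setup:

```python
import numpy as np

def energy(x):
    # Double-well energy with minima near x = -2 and x = +2.
    return (x**2 - 4.0) ** 2 / 8.0

def grad_energy(x):
    # d/dx of the energy above: x * (x^2 - 4) / 2.
    return x * (x**2 - 4.0) / 2.0

def find_modes(n_starts=50, steps=500, lr=0.02, tol=0.3):
    rng = np.random.default_rng(0)
    xs = rng.uniform(-4.0, 4.0, size=n_starts)   # uniform guesses in the prior domain
    for _ in range(steps):
        xs = xs - lr * grad_energy(xs)           # gradient descent on the energy
    # Merge converged points that lie within `tol` of each other into one mode.
    modes = []
    for x in np.sort(xs):
        if not modes or abs(x - modes[-1]) > tol:
            modes.append(float(x))
    return modes

modes = find_modes()
```

With enough uniform starts, every basin of attraction receives at least one initial guess, so both minima are recovered; the subsequent classifier and per-mode diffusion models in the abstract then operate on these basins.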


STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability

Sevillano-García, Iván, Luengo, Julián, Herrera, Francisco

arXiv.org Machine Learning

Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.
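The first-stage idea, comparing a test sample's feature-space distances against a reference in-distribution population with a Wilcoxon-Mann-Whitney rank-sum test, can be sketched as follows. The synthetic distance populations and the plain normal approximation (no tie correction) are illustrative; this is not the paper's exact pipeline:

```python
import numpy as np

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample `a` vs. sample `b`, returned
    as a z-score under the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    combined = np.concatenate([a, b])
    ranks = combined.argsort().argsort() + 1.0   # 1-based ranks, distinct values
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return float((u1 - mu) / sigma)

rng = np.random.default_rng(0)
# Feature-space distances from a test sample to an in-distribution bank.
dists_id = rng.normal(1.0, 0.2, size=100)    # an ID sample: small distances
dists_ood = rng.normal(2.0, 0.2, size=100)   # an OOD sample: large distances
ref = rng.normal(1.0, 0.2, size=500)         # reference ID distance population

z_id = mann_whitney_u(dists_id, ref)
z_ood = mann_whitney_u(dists_ood, ref)
# A large |z| flags the sample's distance profile as atypical, i.e. OOD.
```

Because the test is rank-based, no parametric assumption about the distance distribution is needed, which is the point of the nonparametric first stage.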


Generative diffusion models from a PDE perspective

Cao, Fei, Johnston, Kimball, Laurent, Thomas, Le, Justin, Motsch, Sébastien

arXiv.org Artificial Intelligence

Diffusion models have become the de facto framework for generating new datasets. The core of these models lies in the ability to reverse a diffusion process in time. The goal of this manuscript is to explain, from a PDE perspective, how this method works and how to derive the PDE governing the reverse dynamics as well as to study its solution analytically. By linking forward and reverse dynamics, we show that the reverse process's distribution has its support contained within the original distribution. Consequently, diffusion methods, in their analytical formulation, do not inherently regularize the original distribution, and thus, there is no generalization principle. This raises a question: where does generalization arise, given that in practice it does occur? Moreover, we derive an explicit solution to the reverse process's SDE under the assumption that the starting point of the forward process is fixed. This provides a new derivation that links two popular approaches to generative diffusion models: stable diffusion (discrete dynamics) and the score-based approach (continuous dynamics). Finally, we explore the case where the original distribution consists of a finite set of data points. In this scenario, the reverse dynamics are explicit (i.e., the loss function has a clear minimizer), and solving the dynamics fails to generate new samples: the dynamics converge to the original samples. In a sense, solving the minimization problem exactly is "too good for its own good" (i.e., an overfitting regime).
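The finite-data case can be made concrete: for an Ornstein-Uhlenbeck forward process started from a finite set of points, the marginal at time t is a Gaussian mixture with a closed-form score, and integrating the deterministic reverse (probability-flow) dynamics collapses samples back onto the original points, generating nothing new. The dataset, horizon, and step sizes below are illustrative:

```python
import numpy as np

data = np.array([-2.0, 0.0, 2.0])   # the finite "dataset"

def score(x, t):
    """Closed-form score of the OU forward process dx = -x dt + sqrt(2) dW
    at time t, when the initial law is the empirical measure on `data`:
    a Gaussian mixture with means exp(-t)*data and variance 1-exp(-2t)."""
    s = np.exp(-t)
    var = 1.0 - np.exp(-2.0 * t)
    e = -((x[:, None] - s * data[None, :]) ** 2) / (2.0 * var)
    e -= e.max(axis=1, keepdims=True)            # numerical stability
    w = np.exp(e)
    w /= w.sum(axis=1, keepdims=True)            # posterior mixture weights
    mixture_mean = (w * (s * data[None, :])).sum(axis=1)
    return (mixture_mean - x) / var

# Probability-flow ODE, integrated backwards from t = T toward t = 0.
T, steps = 4.0, 2000
dt = T / steps
rng = np.random.default_rng(0)
x = rng.normal(size=500)    # p_T is (nearly) a standard Gaussian for T = 4
t = T
for _ in range(steps):
    x = x + dt * (x + score(x, t))   # reverse-time Euler step
    t -= dt

# x now clusters tightly around the original data points: no new samples.
```

This is the overfitting regime the abstract describes: with the exact minimizer of the loss (the true score of the empirical measure), the reverse dynamics reproduce the training points instead of generalizing.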