Collaborating Authors

 Lemont







Kernel Model Validation: How To Do It, And Why You Should Care

arXiv.org Machine Learning

Gaussian Process (GP) models are popular tools in uncertainty quantification (UQ) because they purport to furnish functional uncertainty estimates that can be used to represent model uncertainty. It is often difficult to state with precision what probabilistic interpretation attaches to such an uncertainty, and in what way it is calibrated. Without such a calibration statement, the value of such uncertainty estimates is quite limited and qualitative. We motivate the importance of proper probabilistic calibration of GP predictions by describing how GP predictive calibration failures can cause degraded convergence properties in a target optimization algorithm called Targeted Adaptive Design (TAD). We discuss the interpretation of GP-generated uncertainty intervals in UQ, and how one may learn to trust them, through a formal procedure for covariance kernel validation that exploits the multivariate normal nature of GP predictions. We give simple examples of GP regression with misspecified 1-dimensional models, and discuss the situation with respect to higher-dimensional models.
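As a rough illustration of what a validation check based on the multivariate normal nature of GP predictions can look like, the sketch below whitens held-out residuals with the Cholesky factor of the predictive covariance. If the predictive distribution is correctly specified, the whitened residuals are independent standard normals and their sum of squares (the squared Mahalanobis distance) follows a chi-square distribution with n degrees of freedom, so its mean is n and its variance is 2n. This is a generic check and not necessarily the procedure of the paper; the function names are hypothetical, and the predictive mean and covariance are assumed to come from whatever GP library is in use.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive definite matrix (L L^T = a)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def whitened_residuals(y, mean, cov):
    """Solve L z = (y - mean) by forward substitution.

    Under a correctly specified predictive distribution N(mean, cov),
    the entries of z are independent standard normal draws.
    """
    L = cholesky(cov)
    r = [yi - mi for yi, mi in zip(y, mean)]
    z = []
    for i in range(len(r)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((r[i] - s) / L[i][i])
    return z

def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance of y from the predictive distribution."""
    return sum(zi * zi for zi in whitened_residuals(y, mean, cov))
```

A value of `mahalanobis_sq` far from n (relative to the spread sqrt(2n)) on held-out data suggests that the kernel or its hyperparameters are misspecified: values that are much too large indicate overconfident intervals, and values that are much too small indicate overly wide ones.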


Distributional Sensitivity Analysis: Enabling Differentiability in Sample-Based Inference

arXiv.org Machine Learning

We present two analytical formulae for estimating the sensitivity -- namely, the gradient or Jacobian -- at given realizations of an arbitrary-dimensional random vector with respect to its distributional parameters. The first formula interprets this sensitivity as partial derivatives of the inverse mapping associated with the vector of 1-D conditional distributions. The second formula, intended for optimization methods that tolerate inexact gradients, introduces a diagonal approximation that reduces computational cost at the expense of some accuracy. We additionally provide four second-order numerical algorithms to approximate both formulae when closed forms are unavailable. We performed verification and validation studies to demonstrate the correctness of these numerical algorithms and the effectiveness of the proposed formulae. A nuclear physics application showcases how our work enables uncertainty quantification and parameter inference for quantum correlation functions. Our approach differs from existing methods by avoiding the need for model fitting, knowledge of sampling algorithms, and evaluation of high-dimensional integrals. It is therefore particularly useful for sample-based inverse problems when the sampler operates as a black box or requires expensive physics simulations. Moreover, our method renders arbitrary sampling subroutines differentiable, facilitating their integration into programming frameworks for deep learning and automatic differentiation. Algorithmic details and code implementations are provided in this paper and in our open-source software DistroSA to enable reproducibility and further development.
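For intuition in one dimension only, a sample x drawn by inverting a CDF satisfies F(x; theta) = u for a fixed uniform draw u, so the implicit function theorem gives dx/dtheta = -(dF/dtheta) / (dF/dx). The sketch below approximates both partial derivatives with central finite differences, which are second-order accurate in the step size. It is an illustrative assumption, not one of the paper's four algorithms and not the DistroSA API; `sample_sensitivity` and `exponential_cdf` are hypothetical names.

```python
import math

def sample_sensitivity(cdf, x, theta, h=1e-5):
    """Approximate dx/dtheta for a sample x of a 1-D distribution with CDF cdf(x, theta).

    Uses dx/dtheta = -(dF/dtheta) / (dF/dx), with both partial derivatives
    estimated by central differences of step h.
    """
    dF_dtheta = (cdf(x, theta + h) - cdf(x, theta - h)) / (2.0 * h)
    dF_dx = (cdf(x + h, theta) - cdf(x - h, theta)) / (2.0 * h)  # approximates the density at x
    if dF_dx <= 0.0:
        raise ValueError("estimated density at x is not positive")
    return -dF_dtheta / dF_dx

def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate parameter."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0
```

For the exponential distribution the closed form is available as a check: x = -ln(1 - u) / rate, so dx/drate = -x / rate, and `sample_sensitivity(exponential_cdf, 2.0, 0.5)` should be close to -4.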


Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets

arXiv.org Machine Learning

Semantic data representations are critical in artificial intelligence, significantly enhancing model performance in tasks like transfer and zero-shot learning (Lake et al., 2017). Central to this effort is the disentanglement of latent representations in generative models, that is, representations in which each latent dimension corresponds to an independent underlying factor of variation in the data. Disentanglement is achieved by leveraging statistical properties of the latent space and the dataset, enabling models where changes in one latent dimension affect only its corresponding factor without impacting others. This not only improves model interpretability but also enhances robustness against adversarial attacks (Yang et al., 2021). For a comprehensive review of disentanglement and its statistical underpinnings, see Wang et al. (2023). Datasets encountered in scientific research are often heterogeneous in modalities, fidelities, and accuracy, where a particular entity or state may be simultaneously associated with multiple images, graphs, vectors, scalar parameters, or labels with various associated measurement uncertainties.


Improving the Predictability of the Madden-Julian Oscillation at Subseasonal Scales with Gaussian Process Models

arXiv.org Machine Learning

The Madden-Julian Oscillation, or MJO, is a significant weather pattern that influences rainfall, temperature, and even storm frequency and intensity. When the MJO is active, it can affect the weather globally. To better predict weather changes 3-4 weeks in advance, we rely on the ability to predict the MJO's activity. Data-driven methods, such as those that rely on deep neural networks, have recently been employed to make such predictions. By examining existing MJO patterns, neural networks attempt to predict upcoming ones. However, while neural networks are robust enough to predict the MJO's activity, they do not provide confidence intervals for those predictions. To address this shortcoming, we use a model known as the "Gaussian process" or GP. This statistical tool is distinctive because it not only provides predictions but also quantifies the level of confidence in them.


Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis

arXiv.org Artificial Intelligence

Deep learning (DL) and machine learning (ML) models have shown promise in drug response prediction (DRP), yet their ability to generalize across datasets remains an open question, raising concerns about their real-world applicability. Due to the lack of standardized benchmarking approaches, model evaluations and comparisons often rely on inconsistent datasets and evaluation criteria, making it difficult to assess true predictive capabilities. In this work, we introduce a benchmarking framework for evaluating cross-dataset prediction generalization in DRP models. Our framework incorporates five publicly available drug screening datasets, six standardized DRP models, and a scalable workflow for systematic evaluation. To assess model generalization, we introduce a set of evaluation metrics that quantify both absolute performance (e.g., predictive accuracy across datasets) and relative performance (e.g., performance drop compared to within-dataset results), enabling a more comprehensive assessment of model transferability. Our results reveal substantial performance drops when models are tested on unseen datasets, underscoring the importance of rigorous generalization assessments. While several models demonstrate relatively strong cross-dataset generalization, no single model consistently outperforms the others across all datasets. Furthermore, we identify CTRPv2 as the most effective source dataset for training, yielding higher generalization scores across target datasets. By sharing this standardized evaluation framework with the community, our study aims to establish a rigorous foundation for model comparison and accelerate the development of robust DRP models for real-world applications.
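The distinction between absolute and relative performance can be made concrete with a small sketch. The abstract does not give the exact metric definitions, so the ones below (mean score over unseen target datasets, and the fractional drop relative to the within-dataset score) are assumptions chosen for illustration, and `generalization_summary` is a hypothetical helper. It expects a nested mapping `scores[source][target]` holding one higher-is-better score per train/test dataset pair.

```python
def generalization_summary(scores):
    """Summarize cross-dataset generalization for each source (training) dataset.

    scores[source][target] is the score of a model trained on `source`
    and evaluated on `target`; the entry with target == source is the
    within-dataset result.
    """
    summary = {}
    for source, row in scores.items():
        within = row[source]
        cross = [value for target, value in row.items() if target != source]
        if not cross:
            raise ValueError(f"no cross-dataset scores for source {source!r}")
        mean_cross = sum(cross) / len(cross)
        summary[source] = {
            "within": within,                                # within-dataset score
            "mean_cross": mean_cross,                        # absolute cross-dataset performance
            "relative_drop": (within - mean_cross) / within, # fraction of performance lost
        }
    return summary
```

Ranking source datasets by `mean_cross` is one simple way to ask which training set transfers best, while `relative_drop` separates a model that is weak everywhere from one that is strong in-distribution but degrades on unseen data.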