Empirical CDF


DINOv3 as a Frozen Encoder for CRPS-Oriented Probabilistic Rainfall Nowcasting

Filho, Luciano Araujo Dourado, Neto, Almir Moreira da Silva, Miyaguchi, Anthony, David, Rodrigo Pereira, Calumby, Rodrigo Tripodi, Picek, Lukáš

arXiv.org Artificial Intelligence

This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (a V-JEPA Vision Transformer) paired with a lightweight probabilistic head is attached to a pre-trained satellite vision encoder (DINOv3-SAT493M) to map encoder tokens into a discrete empirical CDF (eCDF) over 4-hour accumulated rainfall. The projector and head are optimized end-to-end with the Ranked Probability Score (RPS). For comparison, 3D-UNET baselines trained with an aggregate Ranked Probability Score and a per-pixel Gamma-Hurdle objective are used. On the Weather4Cast 2025 benchmark, the proposed method achieved promising performance, with a CRPS of 3.5102, an effectiveness gain of $\approx 26\%$ over the best 3D-UNET baseline.
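As a concrete illustration of the scoring objective, the sketch below computes a ranked probability score between a predicted discrete CDF and an observed rainfall accumulation. The bin layout, tensor shapes, and names are assumptions for illustration, not the paper's configuration.

```python
import torch

def ranked_probability_score(pred_cdf, target, bin_edges):
    """Mean RPS between predicted discrete CDFs and scalar targets.

    pred_cdf:   (batch, n_bins) nondecreasing values in [0, 1]
    target:     (batch,) observed 4-hour rainfall accumulations
    bin_edges:  (n_bins,) upper edge of each rainfall bin
    """
    # Step-function CDF of the observation: 1 for bins at/above the target.
    target_cdf = (bin_edges.unsqueeze(0) >= target.unsqueeze(1)).float()
    # RPS = sum over bins of squared differences between the two CDFs.
    return ((pred_cdf - target_cdf) ** 2).sum(dim=1).mean()

# Illustrative usage with an assumed bin layout (not from the paper):
bin_edges = torch.linspace(0.0, 100.0, steps=64)
pred_cdf = torch.cumsum(torch.softmax(torch.randn(8, 64), dim=1), dim=1)
obs = torch.rand(8) * 100.0
loss = ranked_probability_score(pred_cdf, obs, bin_edges)
```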



Improving Coverage in Combined Prediction Sets with Weighted p-values

Wong, Gina, Prinster, Drew, Saria, Suchi, Chellappa, Rama, Liu, Anqi

arXiv.org Machine Learning

Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets, assuming exchangeability. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can be aggregated to create a prediction set that captures the overall uncertainty, often improving precision. However, aggregating multiple prediction sets with individual $1-\alpha$ coverage inevitably weakens the overall guarantee, typically resulting in $1-2\alpha$ worst-case coverage. In this work, we propose a framework for the weighted aggregation of prediction sets, where weights are assigned to each prediction set based on their contribution. Our framework offers flexible control over how the sets are aggregated, achieving tighter coverage bounds that interpolate between the $1-2\alpha$ guarantee of the combined models and the $1-\alpha$ guarantee of an individual model, depending on the distribution of weights. We extend our framework to data-dependent weights, and we derive a general procedure for data-dependent weight aggregation that maintains finite-sample validity. We demonstrate the effectiveness of our methods through experiments on synthetic and real data in the mixture-of-experts setting, and we show that aggregation with data-dependent weights provides a form of adaptive coverage.
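A minimal sketch of how weighted p-value aggregation might look, assuming per-model conformal p-values and a fixed weight vector. The doubling rule below is the classical validity correction for averaged p-values, not the paper's tighter interpolating bound; all names are illustrative.

```python
import numpy as np

def conformal_pvalue(scores_cal, score_test):
    """Standard conformal p-value for one model: rank of the test score
    among its calibration scores (higher score = less conforming)."""
    n = len(scores_cal)
    return (1 + np.sum(scores_cal >= score_test)) / (n + 1)

def aggregate_prediction_set(pvals_per_model, weights, alpha):
    """Include a label when the weighted combination of per-model
    p-values clears the level.

    pvals_per_model: (n_models, n_labels) p-values per candidate label
    weights:         (n_models,) nonnegative, summing to 1
    """
    combined = weights @ pvals_per_model  # weighted average p-value
    # Twice a (weighted) average of valid p-values is again a valid
    # p-value -- a classical result behind the 1-2*alpha worst case.
    return np.where(2.0 * combined > alpha)[0]  # indices of included labels

# Illustrative usage with two models and five candidate labels:
pvals = np.random.uniform(size=(2, 5))
w = np.array([0.7, 0.3])
print(aggregate_prediction_set(pvals, w, alpha=0.1))
```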


Predicting band gap from chemical composition: A simple learned model for a material property with atypical statistics

Ma, Andrew, Dugan, Owen, Soljačić, Marin

arXiv.org Artificial Intelligence

In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been created that can predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates thematic ideas from chemical heuristic models for other material properties in a manner suited to the band gap modeling task. The model has exactly one parameter corresponding to each element, which is fit using data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.
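Since the abstract fully specifies the prediction rule (a composition-weighted average of per-element parameters, floored at zero), a minimal sketch is straightforward; the parameter values below are hypothetical, not fitted ones.

```python
def predict_band_gap(composition, element_params):
    """Band gap prediction as described in the abstract: a
    composition-weighted average of per-element parameters, floored at
    zero so that metals map to the point mass at zero gap.

    composition:    dict mapping element symbol -> atomic fraction
    element_params: dict mapping element symbol -> learned scalar
    """
    weighted = sum(frac * element_params[el] for el, frac in composition.items())
    return max(weighted, 0.0)

# Hypothetical (unfitted) parameter values, for illustration only:
params = {"Ga": 1.2, "As": 1.6, "Cu": -3.0}
print(predict_band_gap({"Ga": 0.5, "As": 0.5}, params))  # positive gap
print(predict_band_gap({"Cu": 1.0}, params))             # clamped to 0 (metal)
```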


Modelling Sampling Distributions of Test Statistics with Autograd

Kadhim, Ali Al, Prosper, Harrison B.

arXiv.org Machine Learning

Automatic differentiation (see, for example, Ref. [1]) has revolutionized machine learning, permitting the routine application of gradient descent algorithms to fit models of essentially unlimited complexity to data. The same technology can be used to take the derivative of these models with respect to their inputs, without the need to calculate the derivatives explicitly [2]. A potentially useful application of this capability is approximating the probability density function (pdf), $f(x \mid \theta)$, given an accurate neural network model of the associated conditional cumulative distribution function (cdf), $F(x \mid \theta)$, using the fact that $f(x \mid \theta) = \partial F(x \mid \theta) / \partial x$ (1), where $\theta$ are the parameters of the data-generation mechanism, which we distinguish from the parameters $w$ of the neural network model. This paper explores this possibility in the context of simulation-based frequentist inference [3-7]. Equation (1) furnishes an approximation of the pdf $f(x \mid \theta)$ whether $x$ is a function of the underlying observations $D$ only or whether $x = \lambda(D; \theta)$ is a test statistic that depends on $D$ as well as on the parameters $\theta$. Moreover, computing the derivative of the cdf using autograd to obtain the pdf is exact; autograd does not use finite-difference approximations.
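A minimal sketch of Equation (1) in practice, assuming a generic PyTorch network standing in for the cdf model: autograd differentiates the network output with respect to its input $x$, yielding the pdf exactly, with no finite differences. The architecture and names are illustrative, not the paper's model.

```python
import torch

# Generic stand-in for a neural cdf model F(x); sigmoid keeps output in [0, 1].
cdf_net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1), torch.nn.Sigmoid(),
)

def pdf_from_cdf(x):
    """Evaluate f(x) = dF/dx exactly via autograd (no finite differences)."""
    x = x.clone().requires_grad_(True)
    F = cdf_net(x.unsqueeze(-1)).squeeze(-1)
    # Each F[i] depends only on x[i], so grad of the sum gives per-point f(x).
    (f,) = torch.autograd.grad(F.sum(), x)
    return f

x = torch.linspace(-3.0, 3.0, 101)
density = pdf_from_cdf(x)  # shape (101,)
```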


Generalization Error Bounds for Learning under Censored Feedback

Yang, Yifan, Payani, Ali, Naghizadeh, Parinaz

arXiv.org Machine Learning

Generalization error bounds from learning theory provide statistical guarantees on how well an algorithm will perform on previously unseen data. In this paper, we characterize the impacts of data non-IIDness due to censored feedback (a.k.a. selective labeling bias) on such bounds. We first derive an extension of the well-known Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which characterizes the gap between empirical and theoretical CDFs given IID data, to problems with non-IID data due to censored feedback. We then use this CDF error bound to provide a bound on the generalization error guarantees of a classifier trained on such non-IID data. We show that existing generalization error bounds (which do not account for censored feedback) fail to correctly capture the model's generalization guarantees, verifying the need for our bounds. We further analyze the effectiveness of (pure and bounded) exploration techniques, proposed by recent literature as a way to alleviate censored feedback, on improving our error bounds. Together, our findings illustrate how a decision maker should account for the trade-off between strengthening the generalization guarantees of an algorithm and the costs incurred in data collection when future data availability is limited by censored feedback.
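For reference, the classical IID DKW band that the paper extends can be computed in a few lines; the censored-feedback extension itself is not reproduced here, and the variable names are illustrative.

```python
import numpy as np

def dkw_band(samples, alpha=0.05):
    """Classical DKW confidence band for IID data: with probability at
    least 1 - alpha, the true CDF lies within +/- eps of the empirical
    CDF everywhere, where eps = sqrt(ln(2/alpha) / (2n))."""
    x = np.sort(samples)
    n = len(x)
    ecdf = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    lower = np.clip(ecdf - eps, 0.0, 1.0)
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return x, ecdf, lower, upper

x, F_hat, lo, hi = dkw_band(np.random.randn(500))
```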


Structured Evaluation of Synthetic Tabular Data

Yang, Scott Cheng-Hsin, Eaves, Baxter, Schmidt, Michael, Swanson, Ken, Shafto, Patrick

arXiv.org Machine Learning

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of these many metrics. To address this issue, we propose an evaluation framework with a single, mathematical objective that posits that the synthetic data should be drawn from the same distribution as the observed data. Through various structural decompositions of the objective, this framework allows us to reason, for the first time, about the completeness of any set of metrics, and it unifies existing metrics, including those that stem from fidelity considerations, downstream applications, and model-based approaches. Moreover, the framework motivates model-free baselines and a new spectrum of metrics. We evaluate structurally informed synthesizers and synthesizers powered by deep learning. Using our structured framework, we show that synthetic data generators that explicitly represent tabular structure outperform other methods, especially on smaller datasets.
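One example of the model-free baselines such a framework motivates is a per-column marginal comparison. The sketch below uses a two-sample Kolmogorov-Smirnov test per numeric column; this is only one slice of a structured decomposition, not the paper's full metric suite, and all names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def marginal_fidelity(real_df, synth_df):
    """Model-free marginal check: per numeric column, test whether the
    synthetic values could plausibly come from the same distribution as
    the observed values (two-sample Kolmogorov-Smirnov)."""
    results = {}
    for col in real_df.select_dtypes(include=[np.number]).columns:
        stat, pval = ks_2samp(real_df[col].dropna(), synth_df[col].dropna())
        results[col] = {"ks_stat": stat, "p_value": pval}
    return results

# Illustrative usage on toy data:
real = pd.DataFrame({"age": np.random.normal(40, 10, 300)})
synth = pd.DataFrame({"age": np.random.normal(42, 12, 300)})
print(marginal_fidelity(real, synth))
```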


Change Point Detection with Conceptors

Gade, Noah D., Rodu, Jordan

arXiv.org Machine Learning

Offline change point detection retrospectively locates change points in a time series. Many nonparametric methods that target i.i.d. mean and variance changes fail in the presence of nonlinear temporal dependence, and model-based methods require a known, rigid structure. For the at-most-one-change-point problem, we propose using a conceptor matrix to learn the characteristic dynamics of a baseline training window with arbitrary dependence structure. The associated echo state network acts as a featurizer of the data, and change points are identified from the nature of the interactions between the features and their relationship to the baseline state. This model-agnostic method can suggest potential locations of interest that warrant further study. We prove that, under mild assumptions, the method provides a consistent estimate of the true change point, and quantile estimates are produced via a moving block bootstrap of the original data. The method is evaluated with clustering metrics and Type I error control on simulated data, and applied to publicly available neural data from rats experiencing bouts of non-REM sleep prior to exploration of a radial maze. With sufficient spacing, the framework provides a simple extension to the sparse, multiple change point problem.
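A minimal sketch of the conceptor computation, following Jaeger's standard definition $C = R(R + \alpha^{-2}I)^{-1}$ with $R$ the state correlation matrix; the generic ESN update and all hyperparameters below are illustrative stand-ins for the paper's featurizer.

```python
import numpy as np

def conceptor(states, aperture=10.0):
    """Conceptor of a window of echo state network states:
    C = R (R + aperture^-2 I)^-1, with R the state correlation matrix."""
    R = states.T @ states / states.shape[0]  # (d, d) correlation matrix
    d = R.shape[0]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def esn_states(series, d=50, rho=0.9, seed=0):
    """Generic echo state network featurizer (illustrative)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, d))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    w_in = rng.normal(size=d)
    x, out = np.zeros(d), []
    for u in series:
        x = np.tanh(W @ x + w_in * u)
        out.append(x.copy())
    return np.array(out)

# Conceptor of a baseline window, against which later windows are compared:
baseline = esn_states(np.sin(np.linspace(0, 20, 400)))
C = conceptor(baseline[50:])  # drop the initial transient states
```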


Non-Stochastic CDF Estimation Using Threshold Queries

Okoroafor, Princewill, Gupta, Vaishnavi, Kleinberg, Robert, Goh, Eleanor

arXiv.org Artificial Intelligence

Estimating the empirical distribution of a scalar-valued data set is a basic and fundamental task. In this paper, we tackle the problem of estimating an empirical distribution in a setting with two challenging features. First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample. Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples, including an adaptive adversary. These considerations are relevant, for example, when modeling a seller experimenting with posted prices to estimate the distribution of consumers' willingness to pay for a product: offering a price and observing a consumer's purchase decision is equivalent to asking a single threshold query about their value, and the distribution of consumers' values may be non-stationary over time, as early adopters may differ markedly from late adopters. Our main result quantifies, to within a constant factor, the sample complexity of estimating the empirical CDF of a sequence of elements of $[n]$, up to $\varepsilon$ additive error, using one threshold query per sample. The complexity depends only logarithmically on $n$, and our result can be interpreted as extending the existing logarithmic-complexity results for noisy binary search to the more challenging setting where noise is non-stochastic. Along the way to designing our algorithm, we consider a more general model in which the algorithm is allowed to make a limited number of simultaneous threshold queries on each sample. We solve this problem using Blackwell's Approachability Theorem and the exponential weights method. As a side result of independent interest, we characterize the minimum number of simultaneous threshold queries required by deterministic CDF estimation algorithms.
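For intuition only, the sketch below shows the one-query-per-sample setting with a naive round-robin threshold schedule. This baseline is sensible only for exchangeable streams; it is not the paper's adversary-robust algorithm, which relies on Blackwell approachability and exponential weights.

```python
import numpy as np

def cdf_from_threshold_queries(stream, thresholds):
    """Naive baseline: cycle deterministically through a grid of
    thresholds, asking one query 'is sample <= t?' per arrival, and
    estimate the CDF at each grid point by the fraction of yes-answers."""
    yes = np.zeros(len(thresholds))
    asked = np.zeros(len(thresholds))
    for i, sample in enumerate(stream):
        j = i % len(thresholds)  # round-robin threshold choice
        asked[j] += 1
        yes[j] += float(sample <= thresholds[j])
    return yes / np.maximum(asked, 1)

grid = np.arange(1, 11)                    # thresholds over [n], n = 10
data = np.random.randint(1, 11, size=2000)
print(cdf_from_threshold_queries(data, grid))
```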


Group Meritocratic Fairness in Linear Contextual Bandits

Grazzi, Riccardo, Akhavan, Arya, Falk, John Isak Texas, Cella, Leonardo, Pontil, Massimiliano

arXiv.org Artificial Intelligence

We study the linear contextual bandit problem where an agent has to select one candidate from a pool and each candidate belongs to a sensitive group. In this setting, candidates' rewards may not be directly comparable across groups, for example when the agent is an employer hiring candidates from different ethnic groups and some groups have a lower reward due to discriminatory bias and/or social injustice. We propose a notion of fairness that states that the agent's policy is fair when it selects the candidate with the highest relative rank, which measures how good the reward is when compared to candidates from the same group. This is a very strong notion of fairness, since the relative rank is not directly observed by the agent and depends on the underlying reward model and on the distribution of rewards. Thus we study the problem of learning a policy which approximates a fair policy under the condition that the contexts are independent across groups and the distribution of rewards of each group is absolutely continuous. In particular, we design a greedy policy which at each round constructs a ridge regression estimate from the observed context-reward pairs, and then computes an estimate of the relative rank of each candidate using the empirical cumulative distribution function. We prove that, despite its simplicity and the lack of an initial exploration phase, the greedy policy achieves, up to log factors and with high probability, a fair pseudo-regret of order $\sqrt{dT}$ after $T$ rounds, where $d$ is the dimension of the context vectors. The policy also satisfies demographic parity at each round when averaged over all possible information available before the selection. Finally, we use simulated settings and experiments on the US census data to show that our policy also achieves sub-linear fair pseudo-regret in practice.
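A minimal sketch of the relative-rank step of the greedy policy, assuming estimated rewards are already available from ridge regression: each candidate's score is passed through the empirical CDF of past estimated rewards from its own group. Names and data are illustrative.

```python
import numpy as np

def relative_ranks(est_rewards, groups, history):
    """Within-group relative rank via the empirical CDF.

    est_rewards: (n,) estimated rewards for the current candidate pool
    groups:      (n,) group label of each candidate
    history:     dict group -> array of past estimated rewards for that group
    """
    ranks = np.empty(len(est_rewards))
    for i, (r, g) in enumerate(zip(est_rewards, groups)):
        past = history[g]
        # Empirical CDF of past within-group rewards, evaluated at r.
        ranks[i] = np.mean(past <= r) if len(past) else 0.5
    return ranks

# Illustrative usage: group 1 has systematically shifted rewards.
history = {0: np.random.normal(0, 1, 200), 1: np.random.normal(3, 1, 200)}
est = np.array([0.4, 3.1, -0.2])
grp = np.array([0, 1, 0])
choice = int(np.argmax(relative_ranks(est, grp, history)))  # fair greedy pick
```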