 cdf


Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows

Neural Information Processing Systems

Autoregressive models have driven remarkable progress in language modeling. Their foundational reliance on discrete tokens, unidirectional context, and single-pass decoding, while central to their success, also inspires the exploration of a design space that could offer new axes of modeling flexibility. In this work, we explore an alternative paradigm, shifting language modeling from a discrete token space to a continuous latent space. We propose a novel framework, TarFlowLM, that employs transformer-based autoregressive normalizing flows [73] to model these continuous representations. This approach unlocks substantial flexibility, enabling the construction of models that can capture global bi-directional context through stacked, alternating-direction autoregressive transformations, support block-wise generation with flexible token patch sizes, and facilitate a hierarchical multi-pass generation process. We further propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data, and demonstrate theoretical connections to conventional discrete autoregressive models. Extensive experiments on language modeling benchmarks demonstrate strong likelihood performance and highlight the flexible modeling capabilities inherent in our framework.


Learning Context-conditioned Gaussian Overbounds for Convolution-Based Uncertainty Propagation

arXiv.org Machine Learning

Uncertainty quantification is essential in safety-critical settings--from autonomous driving to aviation, finance, and health--where decisions must rely on conservative bounds rather than point estimates. Predictor-level intervals (e.g., from quantile regression, conformal prediction, variance networks, or Bayesian models) generally do not compose: adding two per-variable intervals need not yield a valid interval for their sum or preserve coverage. In aviation, Gaussian overbounding replaces complex error distributions with a conservative Gaussian whose tails dominate the truth, so conservatism propagates through linear operations. Yet classical overbounds are global, often overly conservative, and hard to adapt to feature-conditioned errors. We propose a unified learning framework that trains neural networks to produce context-aware Gaussian overbounds--mean and scale--with provable conservatism on a finite quantile grid and, under three explicit regularity assumptions, continuous-tail conservatism on a certified interval. Our overbounding loss enforces conservativeness at selected quantiles while penalizing distributional distance with a Wasserstein-style term. The learned bounds support conservative linear-combination and convolution analysis on the enforced grid, and on the certified interval when assumptions hold, while being less redundant than traditional methods. We provide a scoped analysis of discrete-to-continuous conservatism and compact-domain objective regularity, and validate on synthetic data and real-world datasets, including multipath, ionospheric, and tropospheric residual errors. Across these settings, the method yields tighter bounds while maintaining conservatism on the enforced grid and in experiments. The framework is modality-agnostic and applicable to learning systems that require conservative, feature-conditioned uncertainty estimates in dynamic environments.
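To make the idea of conservatism on a finite quantile grid concrete, the following is a minimal sketch, not the learned, context-conditioned method the abstract describes. It assumes a zero-mean Gaussian bound and a fixed, hypothetical grid `GRID`, and picks the smallest scale whose quantiles are at least as extreme as the empirical quantiles at every grid level. The helper names (`empirical_quantile`, `min_overbound_sigma`, `is_conservative`, `combine_sigmas`) and the synthetic heavy-tailed sample are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical quantile grid; 0.5 is excluded because the Gaussian quantile is 0 there.
GRID = [0.001, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 0.999]


def empirical_quantile(sorted_xs, p):
    """Nearest-rank empirical quantile of already sorted data."""
    idx = min(len(sorted_xs) - 1, max(0, math.ceil(p * len(sorted_xs)) - 1))
    return sorted_xs[idx]


def min_overbound_sigma(samples, grid):
    """Smallest sigma such that N(0, sigma) is at least as extreme as the
    empirical quantiles at every grid level (grid-only conservatism)."""
    xs = sorted(samples)
    std_normal = NormalDist()
    # Need sigma * z_p <= q_p for p < 0.5 and sigma * z_p >= q_p for p > 0.5;
    # both reduce to sigma >= q_p / z_p.
    return max(empirical_quantile(xs, p) / std_normal.inv_cdf(p) for p in grid)


def is_conservative(samples, sigma, grid, tol=1e-9):
    """Check the grid conditions for a zero-mean Gaussian with scale sigma."""
    xs = sorted(samples)
    bound = NormalDist(0.0, sigma)
    for p in grid:
        g, q = bound.inv_cdf(p), empirical_quantile(xs, p)
        if p < 0.5 and g > q + tol:
            return False
        if p > 0.5 and g < q - tol:
            return False
    return True


def combine_sigmas(coeffs, sigmas):
    """Scale of the Gaussian for a linear combination of independent Gaussian
    bounds. Whether this bounds the true combined error depends on the
    classical overbounding assumptions, which are not checked here."""
    return math.sqrt(sum((a * s) ** 2 for a, s in zip(coeffs, sigmas)))


# Synthetic heavy-tailed errors: a 90/10 mixture of N(0, 1) and N(0, 3).
rng = random.Random(0)
samples = [rng.gauss(0, 1) if rng.random() < 0.9 else rng.gauss(0, 3)
           for _ in range(5000)]
sigma = min_overbound_sigma(samples, GRID)
```

The check only covers the grid levels; it says nothing about the tails between or beyond them, which is the gap the abstract's continuous-tail analysis is meant to address.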



High-Resolution Tensor-Network Fourier Methods for Exponentially Compressed Non-Gaussian Aggregate Distributions

arXiv.org Machine Learning

Its low-rank QTT structure arises from intrinsic spectral smoothness in continuous models, or from spectral energy concentration as the number of components D grows in discrete models. We demonstrate this on weighted sums of Bernoulli and lognormal random variables. In the latter, the approach reaches high-resolution discretizations of N = 2^30 frequency modes on standard hardware, far beyond the N = 2^24 ceiling of dense implementations. These compressed representations enable efficient computation of Value at Risk (VaR) and Expected Shortfall (ES), supporting applications in quantitative finance and beyond.

I. INTRODUCTION

Weighted sums of independent random variables constitute a basic probabilistic model, describing macroscopic behavior arising from the aggregation of microscopic stochastic components. These models arise in a wide range of applications. Their probability distribution generally lacks a closed-form expression, and their evaluation involves multidimensional convolution integrals that are susceptible to the curse of dimensionality. Consequently, evaluating these models relies on specialized numerical methods. While these methods have been adapted for discrete settings [18, 19], they are frequently hampered by persistent Gibbs oscillations, which arise from distributional discontinuities and preclude uniform convergence [20, 21]. No existing method simultaneously achieves an accurate approximation of the exact, fully non-Gaussian target distribution while remaining scalable to larger, practically relevant system sizes. In this work, we introduce a new algorithm that combines the Fourier spectral method with tensor-network techniques.
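As a small illustration of the dense Fourier approach that the abstract contrasts with its tensor-network method, the sketch below computes the exact distribution of a weighted sum of independent Bernoulli variables from its characteristic function and then reads off VaR and ES. It assumes non-negative integer weights, uses a naive O(N^2) inverse DFT, and involves no QTT compression; the function names and the toy inputs are hypothetical.

```python
import cmath


def bernoulli_sum_pmf(weights, probs):
    """PMF of S = sum_i w_i * X_i with X_i ~ Bernoulli(p_i), integer w_i >= 0.
    The DFT of the PMF factorizes over the independent components."""
    n = sum(weights) + 1  # support is {0, ..., sum(weights)}
    phi = []
    for k in range(n):
        val = 1.0 + 0.0j
        for w, p in zip(weights, probs):
            val *= (1.0 - p) + p * cmath.exp(-2j * cmath.pi * k * w / n)
        phi.append(val)
    # Naive inverse DFT; fine for a toy example, quadratic in n.
    pmf = []
    for s in range(n):
        total = sum(phi[k] * cmath.exp(2j * cmath.pi * k * s / n) for k in range(n))
        pmf.append(max(0.0, (total / n).real))
    return pmf


def value_at_risk(pmf, alpha):
    """Smallest s with P(S <= s) >= alpha."""
    cumulative = 0.0
    for s, mass in enumerate(pmf):
        cumulative += mass
        if cumulative >= alpha:
            return s
    return len(pmf) - 1  # guard against rounding in the cumulative sum


def expected_shortfall(pmf, alpha):
    """Average of VaR_u over u in (alpha, 1) for a discrete distribution."""
    var = value_at_risk(pmf, alpha)
    cdf_at_var = sum(pmf[: var + 1])
    tail = sum(s * pmf[s] for s in range(var + 1, len(pmf)))
    return (tail + var * (cdf_at_var - alpha)) / (1.0 - alpha)


# Toy example: three components with weights 1, 2, 3 and p = 0.5 each.
pmf = bernoulli_sum_pmf([1, 2, 3], [0.5, 0.5, 0.5])
var_90 = value_at_risk(pmf, 0.9)
es_90 = expected_shortfall(pmf, 0.9)
```

Storing all N modes densely, as done here, is what limits the resolution of this approach.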



Distribution-Free Statistical Dispersion Control for Societal Applications

Neural Information Processing Systems

Previous work has focused mainly on bounding either the expected loss of a predictor or the probability that an individual prediction will incur a loss value in a specified range.





Time-uniform confidence bands for the CDF under nonstationarity

Neural Information Processing Systems

Estimation of a complete univariate distribution from a sequence of observations is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d.