Technology
Functional Gradient Descent with Adaptive Representations
Csillag, Daniel, Schuller, Rodrigo, Dall'Antonia, Pedro, Guibas, Leonidas, Velho, Luiz, Novello, Tiago
Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional, and thus can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness and a Polyak-Łojasiewicz-type condition) regardless of our approximations. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.
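For orientation, the kind of fixed-representation FGD that the abstract contrasts against can be sketched for least-squares regression in a reproducing kernel Hilbert space. This is not the adaptive algorithm proposed in the paper: the RBF kernel, the toy sine data, the step-size rule, and all names below are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * lengthscale**2))

# Toy regression data (hypothetical): noiseless samples of a sine wave.
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2.0 * np.pi * x_train)
n = x_train.size

K = rbf_kernel(x_train, x_train)
# The iterate is stored as f = sum_i coef[i] * k(x_train[i], .),
# i.e. a fixed representation tied to the training points.
coef = np.zeros(n)
# Step size chosen so that the residual map I - (step / n) K is non-expansive.
step = n / np.linalg.eigvalsh(K).max()

losses = []
for _ in range(500):
    residual = K @ coef - y_train
    losses.append(0.5 * np.mean(residual**2))
    # RKHS functional gradient of the loss: (1/n) sum_i residual[i] k(x_i, .),
    # so the update only changes the expansion coefficients.
    coef -= step * residual / n

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

Here the functional gradient happens to lie in the span of the kernel sections at the training points, so it can be stored exactly; in the general settings the abstract targets, that is not the case, which is where approximation error enters.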
Stochastic trace estimation with tensor train random vectors
Bujanović, Zvonimir, Kressner, Daniel, Olić, Hrvoje
Stochastic trace estimation is a standard tool for approximating the trace of a large-scale matrix available only through matrix-vector products. However, in tensor-structured settings, unstructured Gaussian or Rademacher test vectors may be prohibitively expensive to store and compute with, while cheaper rank-one tensor-product vectors can require sample complexities that grow exponentially with the tensor order. This work studies Gaussian random tensor train vectors as a structured alternative for stochastic trace estimation. We show that, with a suitable choice of the tensor train rank, random tensor train vectors recover dimension-independent guarantees for the Girard--Hutchinson estimator. In particular, a median-of-means variant with tensor train rank $r \geq d-1$ achieves the same dependence on the accuracy $\varepsilon$ and failure probability $\delta$ as the classical estimator based on unstructured Gaussian vectors. We further prove an oblivious subspace injection result for sketches formed from independent Gaussian random tensor train vectors: tensor train rank $r\geq d-1$ and $\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$ samples suffice for a $k$-dimensional target subspace. Finally, we investigate the use of such sketches within the Nyström++ framework. We show that the resulting estimator can achieve the desired $\mathcal{O}(\varepsilon^{-1})$ sample complexity under an additional spectral-tail condition. These results clarify both the potential and the limitations of random tensor train vectors in stochastic trace estimation.
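As a point of reference, the classical Girard--Hutchinson estimator with unstructured Gaussian vectors, together with a median-of-means variant, can be sketched as follows. This is the baseline the abstract compares against, not the tensor train construction studied in the paper; the test matrix, sample counts, and function names are assumptions.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_samples, rng):
    """Average of x^T A x over independent standard Gaussian vectors x."""
    samples = np.empty(num_samples)
    for i in range(num_samples):
        x = rng.standard_normal(n)
        samples[i] = x @ matvec(x)  # A is accessed only through matvec
    return samples.mean()

def median_of_means_trace(matvec, n, num_groups, group_size, rng):
    """Median of several independent Girard--Hutchinson estimates."""
    estimates = [
        hutchinson_trace(matvec, n, group_size, rng) for _ in range(num_groups)
    ]
    return float(np.median(estimates))

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T / n  # symmetric positive semidefinite test matrix (hypothetical)

estimate = median_of_means_trace(
    lambda x: A @ x, n, num_groups=9, group_size=50, rng=rng
)
exact = float(np.trace(A))
rel_err = abs(estimate - exact) / exact
print(f"estimate {estimate:.2f}, exact {exact:.2f}, relative error {rel_err:.3f}")
```

Each test vector here costs O(n) storage, which is the expense that becomes prohibitive when n grows exponentially with the tensor order.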
Conformal Candidate Certification for Offline Model-Based Optimization
Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose \emph{Conformal Candidate Certification} (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies $16.7\%$ of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.
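The idea of reusing the surrogate as an importance weight can be illustrated with a small weighted conformal sketch. It assumes the Gibbs tilt has the form q(x) ∝ p(x) exp(f(x)/T) for surrogate f and temperature T, so the weight of a point is exp(f(x)/T); the score definition, the function name, and the example values are likewise assumptions and are not taken from the paper.

```python
import numpy as np

def weighted_conformal_lower_bound(f_cal, y_cal, f_new, temperature, alpha):
    """One-sided lower bound for a candidate with surrogate value f_new.

    f_cal, y_cal: surrogate predictions and observed values on calibration data.
    Weights are exp(f / temperature), the assumed Gibbs-tilt density ratio.
    """
    scores = f_cal - y_cal  # positive when the surrogate overestimates
    log_w = np.append(f_cal, f_new) / temperature
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    p = w / w.sum()  # last entry is the candidate's mass, placed at +infinity
    order = np.argsort(scores)
    cum = np.cumsum(p[:-1][order])
    # First sorted score whose cumulative weight reaches 1 - alpha.
    idx = int(np.searchsorted(cum, 1.0 - alpha))
    if idx >= scores.size:
        return -np.inf  # the quantile falls on the point mass at +infinity
    return f_new - scores[order][idx]

f_cal = np.array([0.0, 0.0, 0.0, 0.0])
y_cal = np.array([-1.0, -2.0, -3.0, -4.0])
print(weighted_conformal_lower_bound(f_cal, y_cal, 0.0, 1.0, 0.5))
print(weighted_conformal_lower_bound(f_cal, y_cal, 5.0, 1.0, 0.5))
```

A candidate would then be advanced only if its bound exceeds the target threshold. The second call shows the effect of the covariate shift: a candidate scored far above the calibration data carries most of the weight itself, so the bound becomes uninformative and the candidate is not certified.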
A Koopman-PINN Framework for Epidemic Models: Parameter Inference and Forecasting
Zinihi, Achraf, Ehrhardt, Matthias, Ammi, Moulay Rchid Sidi
We propose a Koopman-enhanced physics-informed neural network (K--PINN) framework for parameter inference and forecasting in nonlinear epidemic models. This method combines Koopman operator theory and physics-informed learning. It maps epidemic states into a latent observable space where the dynamics evolve approximately linearly while satisfying the governing epidemic equations through automatic differentiation. This integration improves interpretability, parameter identifiability, and long-term predictive stability. We apply the proposed framework to a normalized SEIRSD epidemic model and evaluate it using synthetic monkeypox (Mpox) data and real-world datasets from Germany, Morocco, and Sweden for the SARS-CoV-2 virus. Synthetic trajectories are generated using a structure-preserving, nonstandard finite difference scheme to ensure reliable training data. Numerical results demonstrate that K--PINN achieves more accurate parameter estimation, trajectory reconstruction, and long-term forecasting than classical PINNs and Koopman-EDMD approaches. These results suggest that K--PINN is an effective machine learning framework for epidemic modeling that can be extended to more complex systems.
Service-Induced Congestion in Memory-Constrained LLM Serving
Ao, Ruicheng, Dong, Jing, Luo, Gan, Simchi-Levi, David
In large language model (LLM) serving, each request accumulates persistent graphics processing unit (GPU) memory during service as its key-value cache grows with every generated token. Under high concurrency, aggregate memory usage therefore increases endogenously over time: the service process itself creates future capacity pressure. When memory capacity is exceeded, systems evict active requests, discarding cached state and restarting them later, which wastes computation and reduces throughput. We develop a discrete-time dynamical model of memory-constrained LLM inference that captures admission, memory growth, and eviction under continuous batching. In the saturated-input regime, the system admits both eviction-free fixed points and limit cycles with evictions. For homogeneous workloads, we show that the eviction-free equilibrium is unstable and that, except for a Lebesgue-measure-zero exact-capture set, the system converges to a unique worst-case limit cycle that is asymptotically stable outside this exceptional set, with throughput losses as large as 50%. For heterogeneous workloads, we prove a stability criterion in the two-class common-input setting and explain how the survival-polynomial mechanism generalizes to multiple classes and heterogeneous-input lengths. Under an input-dominated scaling regime, coprime decoding lengths stabilize the eviction-free equilibrium, while non-coprime lengths create synchronized modes that drive instability. These results characterize when workload heterogeneity desynchronizes completions and helps stabilize memory-constrained serving. More broadly, we identify service-induced congestion as a structural instability mechanism and derive scheduling design principles for sustaining high throughput.
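The feedback loop described above (admission fills memory, decoding grows it, overflow triggers eviction) can be reproduced in a toy discrete-time simulation. This is not the authors' model: the newest-first eviction rule, the admit-whenever-the-prompt-fits policy, the homogeneous workload, and every parameter value are assumptions made for illustration.

```python
def memory_in_use(active, prompt_len):
    """Total KV-cache tokens held by active requests."""
    return sum(prompt_len + generated for generated in active)

def simulate(capacity, prompt_len, decode_len, steps):
    """Toy saturated-input serving loop with a hard memory capacity.

    `active` holds the number of tokens generated so far by each active
    request, oldest first. Returns (completed, evicted, peak_memory).
    """
    active = []
    completed = evicted = peak = 0
    for _ in range(steps):
        # Admission: the queue is never empty, so admit while a prompt fits.
        while memory_in_use(active, prompt_len) + prompt_len <= capacity:
            active.append(0)
        # Decoding: every active request generates one token.
        active = [generated + 1 for generated in active]
        # Completion: finished requests release their memory.
        completed += sum(1 for generated in active if generated >= decode_len)
        active = [generated for generated in active if generated < decode_len]
        # Eviction: drop the newest requests until memory fits again;
        # their progress is discarded.
        while memory_in_use(active, prompt_len) > capacity:
            active.pop()
            evicted += 1
        peak = max(peak, memory_in_use(active, prompt_len))
    return completed, evicted, peak

completed, evicted, peak = simulate(
    capacity=100, prompt_len=10, decode_len=10, steps=100
)
print(f"completed={completed}, evicted={evicted}, peak memory={peak}")
```

Even in this simplified setting, admitting greedily at time t guarantees overflow a few steps later, because every admitted request keeps growing; that is the service-induced pressure the abstract refers to.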
Score-Based Martingale Posteriors for Deep Neural Networks
Zhumekenov, Abylay, Jasra, Ajay, Maama, Mohamed, Tempone, Raul
In this paper we investigate the efficacy of score-based martingale posteriors (SMPs) (Cui & Walker, 2025; Fong et al., 2023) in the context of modern, large-scale machine learning problems and their potential for meaningful uncertainty quantification. SMPs use a stochastic gradient ascent-type recursion on the parameter space of stochastic models to construct a martingale on that space. Under simple mathematical assumptions, the recursion can be built so that the parameters form a martingale sequence with a limiting (in time) random variable, which can be simulated very quickly, in contrast to Monte Carlo-based methods such as Markov chain Monte Carlo. In this expository paper we explore the SMP for inferring the parameters of deep neural networks (DNNs) and, where feasible, compare our results to state-of-the-art Monte Carlo methods aimed at inferring conventional Bayesian posteriors.
Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces
Perets, Binyamin, Mannor, Shie
Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.
Generative Predictive Distributions for Time Series
Llorens-Terrazas, Jordi, Meitz, Mika
We propose a flexible framework for modeling the predictive distributions of nonlinear, possibly multivariate time series. Our approach expresses a general predictive distribution in an appropriate generative representation that is based on a folklore result from measure theoretic probability. This representation provides a direct simulation-based approximation to the predictive distribution, enabling straightforward computation of forecasts for the conditional mean and variance, fan charts, value at risk, expected shortfall, joint tail risks, and other quantities of interest. We estimate this generative representation using a version of conditional generative adversarial networks and provide a formal statistical analysis of estimation under weak temporal dependence. Specifically, estimation is expressed as a particular minimax problem and we establish consistency of its approximate solutions in Hausdorff distance. The empirical relevance of the approach is illustrated using applications to equity returns, realized variance, and realized covariances. The proposed method is also computationally manageable, with estimation in our applications taking approximately one minute on a standard laptop.
PHINN: Persistent Homology Inspired Neural Network for Rare-Event Time Series Generation
Yusuf, Emre, Takahashi, Ren, Bhaduri, Jayabrata
Rare events in time series are critical to model but hard to learn due to data scarcity. Current generative models struggle with extreme values. We observe that rare events leave distinct topological fingerprints (transitions in Betti numbers from point-cloud embeddings) that are more stable and discriminative than statistical moments. We introduce PHINN, a flow-matching framework that uses dynamic Betti curves as conditioning signals and a persistence landscape loss for homology consistency. It scales to multivariate data, includes a natural-language interface to set Betti targets, supports cross-domain meta-learning and few-shot generation, and provides certified adversarial robustness. On financial, epidemiological, and multi-modal benchmarks, PHINN outperforms statistical and diffusion baselines in topological fidelity (beta-RMSE down 41-63%, transition accuracy up 84%) and matches jump-diffusion models in tail coverage while exceeding them in shape fidelity. All results are reported with 95% confidence intervals.
LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction
Pons, Aina Vila, Tzachristas, Ioannis, Antoniou, Constantinos
Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.