Goto

Collaborating Authors

 Genre


Generative Modeling of Approximately Periodic Time Series by a Posterior-Weighted Gaussian Process

arXiv.org Machine Learning

Discrete automated processes in industrial and cyber-physical systems often exhibit a repetitive structure in which successive repetitions follow a common trajectory while differing in duration, amplitude, and fine-scale dynamics. Such \emph{approximately periodic} behavior poses a challenge for Gaussian Processes (GP) modeling: strictly periodic models suppress inter-repetition variability, while non-periodic models fail to capture the strong structural regularities required for generation. In this work, we propose a stochastic generative model for approximately periodic time series. The model is based on a GP whose posterior is modulated by a novel kernel. Our approach decouples intra-repetition structure from inter-repetition variability through a two-stage construction which yields a generative distribution with a identical mean function across repetitions, while allowing smooth variation between repetitions. The modeling choices are supported by an implementation in which realistic synthetic trajectories are generated from toy datasets.


Kernel-based guarantees for nonlinear parametric models in Bayesian optimization

arXiv.org Machine Learning

Modern Bayesian optimization and adaptive sampling methods increasingly rely on nonlinear parametric models, yet theoretical guarantees for such models under adaptive data collection remain limited. Existing analyses largely focus on Gaussian processes, kernel machines, linear models, or linearized neural approximations, leaving a gap between theory and the nonlinear models used in practice. We develop a kernel-based framework for analyzing regularized nonlinear parametric models trained on adaptively collected data. Our approach uses kernels over the parameter space to induce reproducing-kernel Hilbert space structures over the corresponding model class, yielding confidence bounds for models trained with broad classes of regularized convex losses. We show how these bounds can support convergence guarantees for nonlinear acquisition and surrogate models, including randomized regularized policies that select points by maximizing a trained random model. These results provide a unified route to analyzing nonlinear parametric models in Bayesian optimization and related adaptive optimization settings.


Coupling-Informed Transport Maps for Bayesian Filtering in Nonlinear Dynamical Systems

arXiv.org Machine Learning

A likelihood-free transport filtering method is proposed based on the couplings between state and observation variables. By exploiting a block-triangular structure in the transport map, the analysis step of filtering is reformulated as the minimization of the maximum mean discrepancy (MMD) between the true joint measure and its transport-based approximation. To circumvent the non-convexity in the MMD optimization, we introduce a training-free transport filter method via gradient flows, which leads to an analytic computation for the transport map that implies the steepest descent direction of the MMD. The proposed approach accurately approximates non-Gaussian filtering posteriors and avoids particle collapse. We provide a convergence analysis for the expectation of the MMD between the approximated posterior and the truth posterior. Finally, we extend the method to high-dimensional problems through domain localization. Numerical examples demonstrate the superior performance of our approach over conventional filtering methods in nonlinear, non-Gaussian scenarios.


LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

arXiv.org Machine Learning

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $ρ_R(α)$ that estimates the proportion of baseline uncertainty resolved by context level $α$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.


The Sample Complexity of Multiple Change Point Identification under Bandit Feedback

arXiv.org Machine Learning

We study multiple change point localization under bandit feedback. An unknown piecewise-constant function on a compact interval can be queried sequentially at adaptively chosen inputs, and each query returns a noisy evaluation of the function. The goal is to identify a prescribed number of discontinuities, known as change points, within a target precision $η$ and confidence level $1-δ$, while using as few samples as possible. We propose an adaptive algorithm that first detects intervals likely to contain change points and then refines their locations to precision $η$. We establish non-asymptotic upper bounds on its sample budget, together with corresponding lower bounds. Prior work shows that jump magnitudes alone determine the asymptotic sample complexity as $δ\to 0$. We reveal that this picture is incomplete beyond this regime. We demonstrate, both empirically and theoretically, that for general $δ$ and $η$, the complexity is jointly governed by the jumps and the relative positions of the change points.


Unified generalization analysis for physics informed neural networks

arXiv.org Machine Learning

Physics-Informed Neural Networks (PINNs) and their variational counterparts (VPINNs) are neural networks that incorporate physical laws, making them useful for scientific problems. Existing generalization analyses for PINNs and VPINNs remain limited, often requiring restrictive assumptions such as stability conditions or linear ellipticity. In this paper, we derive generalization bounds for neural networks that involve differentiation with respect to input variables, covering PINNs and VPINNs under a unified framework. We apply Taylor expansion to represent nonlinear differential operators as linear operators on a high-dimensional space, enabling the use of Koopman-based analysis and showing that high-rank networks can generalize well even in settings involving differential operators. We also show that the nonlinearity of the differential operator exponentially enlarges the bound, highlighting its significant impact on generalization.


Learning Perturbations to Extrapolate Your LLM

arXiv.org Machine Learning

Training large language models (LLMs) such as GPT-5 and Qwen-3 (Singh et al., 2025; Yang et al., 2025) on massive text corpora aims at capturing the underlying distribution of natural language. Yet, it remains challenging for the trained model to extrapolate to out-of-distribution or out-of-domain settings beyond the support of its training data. The literature has seen the development of various data perturbation techniques, such as synonym replacement, random insertion, deletion, and swap, that modify training instances into semantically similar variants to effectively expose LLMs to a broader range of inputs and improve their ability to generalize beyond the training data (Feng et al., 2019, 2020; Li et al., 2024; Cen et al., 2026). However, their approach remains grounded in the discrete, word-level augmentation procedures mentioned previously, which may restrict its adaptivity across diverse domains. While discrete perturbations are simple to use, they could be too coarse and hard to refine due to the complexity of natural language (Park et al., 2022; Li et al., 2023). Meanwhile, fixed perturbations apply the same transformations to the data regardless of the contexts, thus failing to generalize appropriately (Ismailov and Asanova, 2025).


Delightful Exploration

arXiv.org Machine Learning

Most exploration algorithms search broadly until uncertainty is resolved. When the action space is too large to resolve within budget, practitioners default to $\varepsilon$-greedy, which bounds disruption but spends its override blindly. We introduce \textit{Delight-gated exploration} (DE), a host--override rule that spends exploratory actions only when their prospective delight (expected improvement times surprisal) exceeds a gate price. This practical heuristic recovers a classical result: Pandora's reservation-value rule for costly search, with surprisal setting the effective inspection cost. Resolved arms exit the gate, fresh arms shut off above a prior-determined threshold, and selected linear-bandit overrides consume finite information budget. Across Bernoulli bandits, linear bandits, and tabular MDPs, the same hyperparameters transfer without retuning, and DE shows much weaker regret growth than Thompson Sampling and $\varepsilon$-greedy in the tested unresolved regimes. Delight improves acting for the same reason it improves learning: it prices scarce resources by the product of upside and surprisal.


Support-Conditioned Flow Matching Is Kernel Smoothing

arXiv.org Machine Learning

Generative models are often conditioned on a small set of examples via cross-attention. Under the Gaussian optimal-transport path, we show that the exact velocity field induced by a finite support set is a Nadaraya--Watson kernel smoother whose bandwidth decreases with flow time, from broad averaging at early steps to nearest-neighbor at late steps. A single Gaussian-kernel attention head exactly computes this field, connecting cross-attention conditioning to classical kernel theory. The theory predicts three failure regimes: nearest-neighbor collapse of the kernel at high dimension, mismatch between the isotropic kernel and the data geometry, and insufficient support for nonparametric estimation. Experiments on Gaussian mixtures, spherical shells, and DINOv2 ImageNet features confirm that learned conditioning improves in precisely these regimes, and that IP-Adapter's cross-attention implements approximate NW smoothing in practice.


Trajectory-Level Data Augmentation for Offline Reinforcement Learning

arXiv.org Machine Learning

We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. Particularly, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.