modeling
Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage
Luo, Fusheng, Geman, H'elyette
This paper introduces a physics-informed generative framework that resolves the fundamental conflict between the statistical flexibility of deep learning and the rigorous theoretical constraints of fixed-income modeling. We demonstrate that standard generative models and unconstrained statistical extrapolations suffer from "manifold collapse" and severe arbitrage violations when forecasting term structures across diverse macroeconomic regimes. To overcome this, we propose a two-stage architecture. First, a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) extracts a robust, heavy-tailed term structure manifold, effectively decoupling macroeconomic shape dynamics from absolute base rates. Second, the latent dynamic evolution is governed by a continuous-time Neural Stochastic Differential Equation (SDE) strictly penalized by a No-Arbitrage Partial Differential Equation (PDE). Empirical results across multiple sovereign currencies (USD, GBP, JPY) confirm that our synergistic approach drastically reduces out-of-sample forecasting errors -- achieving an exceptional 6.58 bps Mean Tenor RMSE -- and successfully overcomes the massive parallel drift and zero-lower-bound violations exhibited by the classical HJM model in extreme environments. Furthermore, through phase space vector field analysis, we demonstrate the model's superior capability in unsupervised macroeconomic regime detection and high-quality continuous-time scenario generation. Ultimately, this research provides a highly scalable, mathematically sound evolutionary engine for term structure modeling.
Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management
Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.
Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories
Tong, Ran, Wang, Tong, Wang, Lanruo, Ni, Yin
Medium-horizon Alzheimer's disease progression prediction is difficult because future clinical scores can remain tied to baseline severity, while biomarker histories are irregular and incompletely observed. We develop an anchor-based analysis of 24-month Clinical Dementia Rating Sum of Boxes (CDR-SB) change using harmonized Alzheimer's Disease Neuroimaging Initiative (ADNI) tables. Each labeled sample is anchored at a mild cognitive impairment visit, uses only clinical and biomarker history observed at or before that anchor, and defines the response as CDR-SB at the future visit closest to 24 months within an 18--30 month window minus anchor CDR-SB. The analytic cohort contains 2,600 labeled anchors from 858 participants and 7,276 longitudinal rows. We propose a residual gap-aware transformer that combines a mixed-effects statistical reference with transformer-based residual learning from pre-anchor clinical and biomarker histories. The model uses participant-level random intercepts in the mixed-effects reference, observation-level triplet tokenization for irregular histories, and a learned nonnegative time-gap penalty inside self-attention. We compare the proposed model with a Bayesian-information-criterion-selected linear mixed-effects baseline, GRU-D, and STraTS under repeated participant-level train--test splits. Across five participant-level random seeds, the proposed model achieves the best mean test performance across all reported metrics, reducing MSE by 13.1% and increasing prediction--observation correlation by 26.4% relative to the mixed-effects baseline. It also improves over both GRU-D and STraTS in mean error and correlation. These results show that statistical anchoring and gap-aware residual learning provide a useful structure for medium-horizon Alzheimer's disease progression prediction.
Introducing ARFBench: A time series question-answering benchmark based on real incidents
More than a trillion dollars are lost every year due to system failures. To resolve them, engineers must troubleshoot outages quickly. An important task in incident response involves analyzing observability metrics, or time series data that snapshot the health of software systems. For example, an engineer for a service may use Datadog to answer questions like "When did latency start increasing?" and "What metrics outside of latency are also behaving abnormally?" to localize the root cause of the anomalous behavior. These time series question-answering (TSQA) tasks are essential for engineers, and present challenging and necessary tasks for SRE models and agents to perform.
Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach
Boustany, Samer El, Mekkaoui, Samy, Hafsi, Yadh, Alouadi, Alexandre, Pham, Huyรชn
We propose a generative framework for learning stochastic dynamics from endpoint and intermediate distributional observations. The method formulates generation as a McKean-Vlasov control problem in which terminal and time-marginal laws are enforced through soft energy constraints. The associated optimality system is a forward-backward stochastic differential equation (FBSDE) whose backward component receives a continuous drift induced by the marginal law penalties. This provides a principled alternative to hard interpolation or optimal transport maps between observed distributions: the model learns a stochastic path law whose dynamics remain globally coupled through the mean-field objective. We derive the reduced FBSDE system for quadratic control cost and constant diffusion, connecting terminal and marginal law flat derivatives to score-like training signals. The resulting neural solver is evaluated on low-dimensional distributional benchmarks, where it recovers smooth stochastic paths matching prescribed marginal laws. In a higher-dimensional ALAE latent space, endpoint supervision is used as a qualitative stress test for transporting non-smiling faces toward smiling ones in a pretrained representation. We then use articulated human motion as a structured high-dimensional case study on a curated AMASS low-to-high position dataset, using SMPL-H pose sequences and reduced pose representations. The experiments show that soft marginal law constraints can produce coherent stochastic trajectories whose intermediate distributions follow the observed evolution of human motion. The code is available at https://github.com/murex/deep-mkv-gen/tree/main.
Sequential Neural Models with Stochastic Layers
Marco Fraccaro, Sรธren Kaae Sรธnderby, Ulrich Paquet, Ole Winther
This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
Operator Learning with Neural Fields: Tackling PDEs on General Geometries
Machine learning approaches for solving partial differential equations require learning mappings between function spaces. While convolutional or graph neural networks are constrained to discretized functions, neural operators present a promising milestone toward mapping functions directly. Despite impressive results they still face challenges with respect to the domain geometry and typically rely on some form of discretization. In order to alleviate such limitations, we present CORAL, a new method that leverages coordinate-based networks for solving PDEs on general geometries. CORAL is designed to remove constraints on the input mesh, making it applicable to any spatial sampling and geometry. Its ability extends to diverse problem domains, including PDE solving, spatio-temporal forecasting, and geometry-aware inference. CORAL demonstrates robust performance across multiple resolutions and performs well in both convex and non-convex domains, surpassing or performing on par with state-of-the-art models.
Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Binary Balanced Tree Recursive Neural Networks (BBT-RvNNs) enforce sequence composition according to a preset balanced binary tree structure. Thus, their nonlinear recursion depth (which is the tree depth) is just log2 n(nbeing the sequence length). Such logarithmic scaling makes BBT-RvNNs efficient and scalable on long sequence tasks such as Long Range Arena (LRA). However, such computational efficiency comes at a cost because BBT-RvNNs cannot solve simple arithmetic tasks like ListOps. On the flip side, RvNN models (e.g., Beam Tree RvNN) that do succeed on ListOps (and other structure-sensitive tasks like formal logical inference) are generally several times more expensive (in time and space) than even Recurrent Neural Networks.