objective
Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning
Dann, Christoph, Mansour, Yishay, Mohri, Mehryar
Model-based reinforcement learning (MBRL) agents typically learn world models by minimizing predictive loss. However, powerful RL optimizers inevitably exploit minor model inaccuracies, leading to simulator exploitation and a reality gap where policies succeed in simulation but fail in the real world. We propose that the objective for learning simulators should be strategic robustness rather than predictive accuracy, and formulate this as a zero-sum minimax game between a model player and an adversarial policy player. We provide a comprehensive theoretical analysis: (1) an online learning guarantee showing the game is learnable with sublinear regret bounds; (2) a tractable critic-based simplification bounding the global policy-value gap by the local critic's loss; and (3) an Error-MDP duality, proving that finding the worst-case policy is formally dual to a standard RL problem where the reward is the one-step critic error. This duality yields a provably convergent active data selection algorithm. Experiments on continuous control tasks demonstrate that our approach reduces prediction error in strategically important regions by $1.5$-$2.2\times$ and enables policies trained purely in simulation to match near-optimal real-world performance.
Joint Model and Data Sparsification via the Marginal Likelihood
Timans, Alexander, Möllenhoff, Thomas, Naesseth, Christian A., Khan, Mohammad Emtiyaz, Nalisnick, Eric
Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian mechanism for feature sparsity via marginal likelihood optimization. Yet, its reliance on a homoscedastic noise model renders it sensitive to data contaminations such as outliers or misspecified noise, harming model fit and predictions. Instead, we propose jointly learning individual feature and sample relevancies, enabling simultaneous model and data sparsification via a single Bayesian objective. This symmetric pruning of model and data offers a natural extension that preserves conjugacy, admits closed-form updates for standard optimization procedures, and aligns with perspectives from robust regression and influence functions. Empirical results across diverse regression tasks affirm that a joint ARD approach consistently yields both sparse and robust prediction models.
Soft Specialists: $α$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training
Cordero-Encinar, Paula, Tyukin, Georgy, Duncan, Andrew B.
Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is forced to compress conflicting goals, and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $α$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing between globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, manifesting contradictory or conflicting data as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.
Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach
Hao, Guang-Yuan, van der Laan, Lars, Bibaut, Aurélien, Kallus, Nathan
Expert demonstrations, such as those from car drivers, help navigate environments with unknown rewards, but are often collected in controlled settings, such as closed-course test tracks, while learned control policies must be deployed in new environments, such as city streets. We can imitate experts to perform well in the same source environment where demonstrations are observed, and we may even use inverse reinforcement learning (IRL) to improve on simple behavior cloning (Ng and Russell, 2000; Abbeel and Ng, 2004; Ziebart et al., 2008; Fu et al., 2018; Geng et al., 2020). But the target environment may have a different transition law, discount factor, or soft-control regularization. For this, IRL is crucial: we can learn a reward from demonstrations in the source environment and transfer it to the target environment, learning a policy that optimizes the same reward function in a new setting (Fu et al., 2018; Schlaginhaufen and Kamgarpour, 2024). In this paper, we characterize how well this transfer can be done and which approaches are preferable. In particular, we show the value in a coupled approach that takes the target environment into account even when learning from the source. In ordinary offline control, the Bellman equation uses a known reward, so the main statistical error comes from target transitions.
Convergence of empirical subgradients for optimal transport-based objectives
Optimal transport is widely used to learn distributions, enforce distributional constraints, and model uncertainty. In applications, transport losses are often computed from samples through tractable representations, such as one-dimensional sorting formulas or sliced Wasserstein costs, making them practical components in training pipelines. We study parameterized objectives defined by sampled transport costs and prove graphical convergence of their subdifferentials to the subdifferential of the population objective. In particular, this ensures that standard subgradient methods consistently approach stationary points of the population-level problem. We illustrate the results in several settings, including risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems. Our analysis highlights that smooth parameterizations provide a favorable interface between statistical consistency and optimization. By contrast, transport objectives with nonsmooth costs and models may exhibit unstable derivatives in the large-sample limit.
Decision-focused learning for optimal PV-Battery scheduling
Depoortere, Joris, Kazmi, Hussain, Driesen, Johan
The use of residential photovoltaics has increased dramatically in recent years. With battery systems becoming more affordable, the optimal operation of a photovoltaic-battery system can bring significant savings to households. Optimal control requires correct forecasts of underlying parameters, such as photovoltaic power generation, to schedule the battery. While forecasting models have become increasingly accurate due to algorithmic advances and data availability, accuracy is typically measured in generic metrics which might not align with the downstream application. This study proposes a decision-focused learning framework that integrates optimization and prediction by training a Long Short-Term Memory photovoltaic energy forecaster on the downstream optimal scheduling of a battery system. The proposed methodology is compared against a standard two-phase approach. Across a 14-month evaluation period, the decision-focused method reduced average electricity costs across twenty buildings by 3.6% when normalized against performance bounds defined by a perfect forecast and a baseline of no optimization. Critically, this financial improvement was achieved despite the model exhibiting a root mean squared error of 19.9%, significantly higher than the decoupled model's 8.2%. Warm-starting the decision-focused model further improves results, lowering average cost by approximately 8%, while also mitigating the negative impact on statistical accuracy (root mean squared error of 13.7%). The findings are statistically significant at the 0.001 level across the twenty households and for each household individually. These results demonstrate that aligning forecast models with optimization goals is key for achieving cost advantages in PV-battery systems. Future research should replicate these findings on other datasets, alternate forecasting models and alternate optimization algorithms.
Conservative neural posterior estimation via distributionally robust training
Laplante, William, Hikida, Yuga, Dellaporta, Charita, Briol, François-Xavier, Bharti, Ayush
Simulation-based inference (SBI; Cranmer et al., 2020) is a powerful framework for inferring parameters of scientific models whose likelihood functions are unavailable or computationally prohibitive to evaluate, but for which simulating data is straightforward. The use of flexible neural conditional density estimators has substantially expanded the applicability of SBI to challenging problems, especially in fields such as particle physics (Brehmer, 2021), cognitive neuroscience (Fengler et al., 2021), economics (Dyer et al., 2024) and cosmology (Alsing et al., 2018; Jeffrey et al., 2021). Neural SBI methods rely on simulations from the scientific model to approximate intractable quantities such as the posterior, the likelihood, the likelihood-to-evidence ratio, or the score function; see Zammit-Mangion et al. (2024) for a recent review. In this work, we focus on the widely used neural posterior estimation (NPE) method (Papamakarios and Murray, 2016; Radev et al., 2022). A central practical limitation of NPE is the simulation budget required to train the conditional density estimator. As many scientific simulators are expensive to run, generating a sufficiently large training set is often the main computational bottleneck.
A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning
Nguyen, Thien V., Habrard, Amaury, Guedj, Benjamin
Physics-informed machine learning (PIML) integrates mechanistic knowledge, typically in the form of partial differential equations (PDE), into data-driven models. Despite strong empirical performance, its statistical generalisation properties remain poorly understood, particularly in the regression setting with unbounded losses. Existing analyses rely on approximation or stability arguments and do not fully capture how physical structure influences generalisation from finite data. In this work, we develop a PAC-Bayesian framework for PIML that provides high-probability generalisation guarantees in the presence of unbounded losses. We adopt a multi-task perspective that jointly treats data fidelity, PDE residuals, initial and boundary conditions, avoiding the looseness induced by standard union-bound approaches. Our analysis leverages the structure of physics-informed objectives to derive novel bounds where the complexity scales with input-gradient norms of the losses, revealing a direct link between physical regularity and generalisation. We instantiate this framework under Sobolev and Poincaré-type assumptions, yielding two classes of bounds that trade off statistical complexity and smoothness in different regimes. Building on these results, we propose a self-bounding-aware learning algorithm that directly optimises tractable surrogates of the derived bounds, along with a practical procedure to estimate the associated constants in realistic settings. Empirical evaluations on standard PDE benchmarks demonstrate that our bounds are non-vacuous, significantly tighter than union-bound baselines, and can be effectively minimised during training. Overall, our results provide a principled statistical foundation for the generalisation of physics-informed models.
CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk
Mei, Tianxing, Fan, Yingying, Leng, Mingming, Lv, Jinchi
CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we develop a stochastic-control perspective on feature-subsampled CART random forests, named CART random opportunity-set allocation (CART-ROSA). At each node, the random subset of features is interpreted as a random feasible action set, and the CART split rule as a masked-action allocation policy. This policy induces a controlled stochastic process over informative split-count states, whose terminal law determines both single-tree error and cross-tree interaction terms in the forest mean squared error (MSE). Such representation opens the black box of CART-forests by separating two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength from the within-mask split policy. We establish that the CART policy is locally stabilizing: it contracts imbalances in informative split allocations and concentrates terminal tree geometry. At the system level, however, it can be globally suboptimal for the forest objective. Specializing to the linear model, we derive the MSE risk expansion explicitly. Our results show how an operations-research perspective makes tractable a theoretical gap difficult to access from the standard algorithmic description of CART forests.
Model Merging on Loss Landscape: A Geometry Perspective
Lu, Juanwu, Bhaskar, Anand, Axelrod, Brian, Tolstaya, Ekaterina, Emrich, Tristan
Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either ignore the geometry of the loss landscape or rely on intractable full-space Hessian approximations. We propose EpiMer, a framework that casts model merging as solving the Fréchet mean on a Riemannian manifold and restricts the computation to a low-rank subspace spanned by the task vectors. With the expected Hessian as the metric, we reveal a connection between local curvature and epistemic uncertainty of the parameters. Our theoretical analysis decomposes the merging error bound into the subspace Fréchet variance and the residual energy, and provides a closed-form characterization of when curvature-aware merging provably outperforms flat-geometry methods. In addition, our framework unifies both curvature-aware methods and recent spectral methods as special cases of the subspace Fréchet mean with different geometric metrics. Merging fine-tuned CLIP-ViT models on eight image classification tasks, Epistemic Merging strictly outperforms the baselines on all three CLIP-ViT backbones at matched rank, improving the across-task average accuracy and worst-task accuracy on every backbone.