Overview
Statistical Reinforcement Learning in the Real World: A Survey of Challenges and Future Directions
Gazi, Asim H., Guo, Yongyi, Gao, Daiqi, Xu, Ziping, Zhang, Kelly W., Murphy, Susan A.
Reinforcement learning (RL) has achieved remarkable success in real-world decision-making across diverse domains, including gaming, robotics, online advertising, public health, and natural language processing. Despite these advances, a substantial gap remains between RL research and its deployment in many practical settings. Two recurring challenges often underlie this gap. First, many settings offer limited opportunity for the agent to interact extensively with the target environment due to practical constraints. Second, many target environments often undergo substantial changes, requiring redesign and redeployment of RL systems (e.g., advancements in science and technology that change the landscape of healthcare delivery). Addressing these challenges and bridging the gap between basic research and application requires theory and methodology that directly inform the design, implementation, and continual improvement of RL systems in real-world settings. In this paper, we frame the application of RL in practice as a three-component process: (i) online learning and optimization during deployment, (ii) post- or between-deployment offline analyses, and (iii) repeated cycles of deployment and redeployment to continually improve the RL system. We provide a narrative review of recent advances in statistical RL that address these components, including methods for maximizing data utility for between-deployment inference, enhancing sample efficiency for online learning within-deployment, and designing sequences of deployments for continual improvement. We also outline future research directions in statistical RL that are use-inspired -- aiming for impactful application of RL in practice.
Optimal Transport under Group Fairness Constraints
Bleistein, Linus, Dagréou, Mathieu, Andrade, Francisco, Boudou, Thomas, Bellet, Aurélien
Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness requiring that the probability of matching two individuals from any two given groups in the OT plan satisfies a predefined target. We first propose \texttt{FairSinkhorn}, a modified Sinkhorn algorithm to compute perfectly fair transport plans efficiently. Since exact fairness can significantly degrade matching quality in practice, we then develop two relaxation strategies. The first one involves solving a penalised OT problem, for which we derive novel finite-sample complexity guarantees. This result is of independent interest as it can be generalized to arbitrary convex penalties. Our second strategy leverages bilevel optimization to learn a ground cost that induces a fair OT solution, and we establish a bound guaranteeing that the learned cost yields fair matchings on unseen data. Finally, we present empirical results that illustrate the trade-offs between fairness and performance.
Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning
Bordelon, Blake, Pehlevan, Cengiz
We provide an overview of high dimensional dynamical systems driven by random matrices, focusing on applications to simple models of learning and generalization in machine learning theory. Using both cavity method arguments and path integrals, we review how the behavior of a coupled infinite dimensional system can be characterized as a stochastic process for each single site of the system. We provide a pedagogical treatment of dynamical mean field theory (DMFT), a framework that can be flexibly applied to these settings. The DMFT single site stochastic process is fully characterized by a set of (two-time) correlation and response functions. For linear time-invariant systems, we illustrate connections between random matrix resolvents and the DMFT response. We demonstrate applications of these ideas to machine learning models such as gradient flow, stochastic gradient descent on random feature models and deep linear networks in the feature learning regime trained on random data. We demonstrate how bias and variance decompositions (analysis of ensembling/bagging etc) can be computed by averaging over subsets of the DMFT noise variables. From our formalism we also investigate how linear systems driven with random non-Hermitian matrices (such as random feature models) can exhibit non-monotonic loss curves with training time, while Hermitian matrices with the matching spectra do not, highlighting a different mechanism for non-monotonicity than small eigenvalues causing instability to label noise. Lastly, we provide asymptotic descriptions of the training and test loss dynamics for randomly initialized deep linear neural networks trained in the feature learning regime with high-dimensional random data. In this case, the time translation invariance structure is lost and the hidden layer weights are characterized as spiked random matrices.
Self-Supervised Learning from Noisy and Incomplete Data
Tachella, Julián, Davies, Mike
Many important problems in science and engineering involve inferring a signal from noisy and/or incomplete observations, where the observation process is known. Historically, this problem has been tackled using hand-crafted regularization (e.g., sparsity, total-variation) to obtain meaningful estimates. Recent data-driven methods often offer better solutions by directly learning a solver from examples of ground-truth signals and associated observations. However, in many real-world applications, obtaining ground-truth references for training is expensive or impossible. Self-supervised learning methods offer a promising alternative by learning a solver from measurement data alone, bypassing the need for ground-truth references. This manuscript provides a comprehensive summary of different self-supervised methods for inverse problems, with a special emphasis on their theoretical underpinnings, and presents practical applications in imaging inverse problems.
From 'sand theft auto' to space BABIES: The global innovations and trends set to shape 2026
Trump's ominous warning to Colombia as acting Venezuelan president issues message to world calling for'peace and dialogue, not war' Trump plans a military'quarantine' of Venezuela's oil to strong-arm Maduro's successor I got a GLP-1 drug with few questions asked... and never meeting a doctor face-to-face. But could that convenience have put my health at risk? Addicted, arrested and dead in a hotel corridor...Victoria Jones is the latest child of a famous parent to tragically spiral. So why ARE so many children of the rich and famous cursed? Marco Rubio'runs laps' around CBS reporter who asked why US commandos didn't nab Maduro associates in daring night time raid Prince Harry'desperately wants King Charles to come to Montecito and see Archie and Lilibet' Travis Kelce finally addresses possible retirement as Chiefs lose to NFL's worst team in what could be humiliating end to his iconic career State of Jennifer Garner and Jennifer Lopez's relationship revealed by insiders... as parents gossip about'less sociable' star at school play NASA's'queen of diamonds' EXPOSED: Genius is accused of treachery over top secret mission... as chilling details emerge Michael B. Jordan's unimpressed face sends fans wild as Timothee Chalamet cries on stage over Kylie Jenner North West, 12, sparks face piercing speculation after backlash over'risky' body modification'Out-of-touch' Gayle King slammed for complaining that her upper class seat doesn't have a window on her eight-hour flight'back to work' from Hawaii American family of seven stranded after Venezuela raids say they're trapped in a living hell... while oblivious influencers BOAST about getting stuck Ten people who spread false claims France's First Lady Brigitte Macron was born a man are found guilty of cyberbullying in Paris EXPOSED: The Air Force vet who let China steal America's nuclear secrets... and KEPT his $200K tax-funded salary From'sand theft auto' to space BABIES: The global innovations and trends set to shape 2026 From the rise of the humanoid robot to the weird world of AI girlfriends, 2025 had no shortage of strange and transformative inventions. Now, experts from the Nesta research foundation have revealed the global innovations and trends set to shape the world in 2026.
Statistical and computational challenges in ranking
Carpentier, Alexandra, Verzelen, Nicolas
We consider the problem of ranking $n$ experts according to their abilities, based on the correctness of their answers to $d$ questions. This is modeled by the so-called crowd-sourcing model, where the answer of expert $i$ on question $k$ is modeled by a random entry, parametrized by $M_{i,k}$ which is increasing linearly with the expected quality of the answer. To enable the unambiguous ranking of the experts by ability, several assumptions on $M$ are available in the literature. We consider here the general isotonic crowd-sourcing model, where $M$ is assumed to be isotonic up to an unknown permutation $π^*$ of the experts - namely, $M_{π^{*-1}(i),k} \geq M_{π^{*-1}(i+1),k}$ for any $i\in [n-1], k \in [d]$. Then, ranking experts amounts to constructing an estimator of $π^*$. In particular, we investigate here the existence of statistically optimal and computationally efficient procedures and we describe recent results that disprove the existence of computational-statistical gaps for this problem. To provide insights on the key ideas, we start by discussing simpler and yet related sub-problems, namely sub-matrix detection and estimation. This corresponds to specific instances of the ranking problem where the matrix $M$ is constrained to be of the form $λ\mathbf 1\{S\times T\}$ where $S\subset [n], T\subset [d]$. This model has been extensively studied. We provide an overview of the results and proof techniques for this problem with a particular emphasis on the computational lower bounds based on low-degree polynomial methods. Then, we build upon this instrumental sub-problem to discuss existing results and algorithmic ideas for the general ranking problem.
Distributionally Robust Imitation Learning
We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts' for a desired task, assumptions of consistent optimality for demonstrated behaviors are often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations based on an adversarial construction potentially solves this problem by avoiding optimistic generalizations of the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DRoIL) and establishes a close connection between DRoIL and Maximum Entropy Inverse Reinforcement Learning. We show that DRoIL can be seen as a framework that maximizes a generalized concept of entropy. We develop a novel approach to transform the objective function into a convex optimization problem over a polynomial number of variables for a class of loss functions that are additive over state and action spaces. Our approach lets us optimize both stationary and non-stationary policies and, unlike prevalent previous methods, it does not require repeatedly solving an inner reinforcement learning problem. We experimentally show the significant benefits of DRoIL's new optimization method on synthetic data and a highway driving environment.
How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?
This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks' generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between weight vectors of neurons, and for convolutional layers, the weight correlation is defined as the cosine similarity between filter matrices. Theoretically, we show that, weight correlation can, and should, be incorporated into the PAC Bayesian framework for the generalisation of neural networks, and the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC Bayes measure with weight correlation, and experimentally confirm that it is able to rank the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training, and provide extensive experiments that show that the generalisation error can be greatly reduced with our novel approach.