Jonckheere, Matthieu
Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning
Garcia, Ernesto, Bermolen, Paola, Jonckheere, Matthieu, Shneer, Seva
We investigate efficient exploration strategies of environments with unknown stochastic dynamics and sparse rewards. Specifically, we analyze first the impact of parallel simulations on the probability of reaching rare states within a finite time budget. Using simplified models based on random walks and L\'evy processes, we provide analytical results that demonstrate a phase transition in reaching probabilities as a function of the number of parallel simulations. We identify an optimal number of parallel simulations that balances exploration diversity and time allocation. Additionally, we analyze a restarting mechanism that exponentially enhances the probability of success by redirecting efforts toward more promising regions of the state space. Our findings contribute to a more qualitative and quantitative theory of some exploration schemes in reinforcement learning, offering insights into developing more efficient strategies for environments characterized by rare events.
Optimizing Asynchronous Federated Learning: A Delicate Trade-Off Between Model-Parameter Staleness and Update Frequency
Alahyane, Abdelkrim, Comte, Céline, Jonckheere, Matthieu, Moulines, Éric
Synchronous federated learning (FL) scales poorly with the number of clients due to the straggler effect. Algorithms like FedAsync and GeneralizedFedAsync address this limitation by enabling asynchronous communication between clients and the central server. In this work, we rely on stochastic modeling to better understand the impact of design choices in asynchronous FL algorithms, such as the concurrency level and routing probabilities, and we leverage this knowledge to optimize loss. We characterize in particular a fundamental trade-off for optimizing asynchronous FL: minimizing gradient estimation errors by avoiding model parameter staleness, while also speeding up the system by increasing the throughput of model updates. Our two main contributions can be summarized as follows. First, we prove a discrete variant of Little's law to derive a closed-form expression for relative delay, a metric that quantifies staleness. This allows us to efficiently minimize the average loss per model update, which has been the gold standard in literature to date. Second, we observe that naively optimizing this metric leads us to slow down the system drastically by overemphazing staleness at the detriment of throughput. This motivates us to introduce an alternative metric that also takes system speed into account, for which we derive a tractable upper-bound that can be minimized numerically. Extensive numerical results show that these optimizations enhance accuracy by 10% to 30%.
Queuing dynamics of asynchronous Federated Learning
Leconte, Louis, Jonckheere, Matthieu, Samsonov, Sergey, Moulines, Eric
We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds. In such an environment, each node is allowed to work on models with potential delays and contribute to updates to the central server at its own pace. Existing analyses of such algorithms typically depend on intractable quantities such as the maximum node delay and do not consider the underlying queuing dynamics of the system. In this paper, we propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity, taking into account the closed Jackson network structure of the associated computational graph. Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems
Comte, Céline, Jonckheere, Matthieu, Sanders, Jaron, Senen-Cerda, Albert
Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this paper, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, contrary to classical policy-gradient methods like actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product-form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.
Choosing the parameter of the Fermat distance: navigating geometry and noise
Chazal, Frédéric, Ferraris, Laure, Groisman, Pablo, Jonckheere, Matthieu, Pascal, Frédéric, Sapienza, Facundo
The Fermat distance has been recently established as a useful tool for machine learning tasks when a natural distance is not directly available to the practitioner or to improve the results given by Euclidean distances by exploding the geometrical and statistical properties of the dataset. This distance depends on a parameter $\alpha$ that greatly impacts the performance of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to navigate the geometric intricacies inherent to the problem. At the same, it should remain restrained enough to sidestep any deleterious ramifications stemming from noise during the process of distance estimation. We study both theoretically and through simulations how to select this parameter.
FEMDA: a unified framework for discriminant analysis
Houdouin, Pierre, Jonckheere, Matthieu, Pascal, Frederic
Although linear and quadratic discriminant analysis are widely recognized classical methods, they can encounter significant challenges when dealing with non-Gaussian distributions or contaminated datasets. This is primarily due to their reliance on the Gaussian assumption, which lacks robustness. We first explain and review the classical methods to address this limitation and then present a novel approach that overcomes these issues. In this new approach, the model considered is an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter. This flexible model allows for potentially diverse and independent samples that may not follow identical distributions. By deriving a new decision rule, we demonstrate that maximum-likelihood parameter estimation and classification are simple, efficient, and robust compared to state-of-the-art methods.
Symphony of experts: orchestration with adversarial insights in reinforcement learning
Jonckheere, Matthieu, Mignacco, Chiara, Stoltz, Gilles
Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; the modeling thereof constitutes our first contribution. We then establish value-functions regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and high probability. A key point of our approach lies in its arguably more transparent proofs compared to existing methods. Finally, we present simulations for a stochastic matching toy model.
FEMDA: Une m\'ethode de classification robuste et flexible
Houdouin, Pierre, Jonckheere, Matthieu, Pascal, Frederic
Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast and robust to scale changes in the data compared to others state-of-the-art methods.
Robust classification with flexible discriminant analysis in heterogeneous data
Houdouin, Pierre, Pascal, Frédéric, Jonckheere, Matthieu, Wang, Andrew
Linear and Quadratic Discriminant Analysis are well-known The new method called Generalized QDA (GQDA) classical methods but can heavily suffer from non-Gaussian relies on the estimation of a threshold parameter, whose optimal distributions and/or contaminated datasets, mainly because of value is fixed for each sub-family of distribution. The the underlying Gaussian assumption that is not robust. To fill case c 1 corresponds to the Gaussian case. Finally, [10] improved this gap, this paper presents a new robust discriminant analysis the previous work by adding robust estimators, coming where each data point is drawn by its own arbitrary Elliptically up with the Robust GQDA (RGQDA) method. Symmetrical (ES) distribution and its own arbitrary All these methods assume that all clusters belong to the scale parameter. Such a model allows for possibly very heterogeneous, same distribution family. In practice, such an hypothesis may independent but non-identically distributed samples.
Uncovering differential equations from data with hidden variables
Somacal, Agustín, Boechi, Leonardo, Jonckheere, Matthieu, Lefieux, Vincent, Picard, Dominique, Smucler, Ezequiel
Examples include meteorology, biology, and physics. The usual way to model deterministic dynamical systems is by using (partial) differential equations. Typically, differential equations models for a given dynamical system are derived using apriori insights into the problem at hand; then the model is validated using empirical observations. In an era in which massive data-sets pertaining to different fields of science are widely available, an interesting problem is whether it is possible for a useful differential equations model to be learned directly from data, without any major modeling effort required by the researcher. Our goal in this paper is to develop a general methodology for building such differential equations models in contexts in which not all relevant variables are observed, that is, in cases in which the main variable of interest depends on other variables of which no measurements are available. As a concrete example, consider the following problem. RTE, the electricity transmission system operator of France, uses high-level simulations of hourly temperature series to study the impact different climate scenarios have on electricity consumption, and hence on the French electrical power grid.