AITopics

2503.03565

Country:

Europe > France (0.14)
South America > Uruguay (0.14)
North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Energy > Oil & Gas > Upstream (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

arXiv.org Artificial Intelligence, Feb-12-2025

Optimizing Asynchronous Federated Learning: A Delicate Trade-Off Between Model-Parameter Staleness and Update Frequency

Alahyane, Abdelkrim, Comte, Céline, Jonckheere, Matthieu, Moulines, Éric

Synchronous federated learning (FL) scales poorly with the number of clients due to the straggler effect. Algorithms like FedAsync and GeneralizedFedAsync address this limitation by enabling asynchronous communication between clients and the central server. In this work, we rely on stochastic modeling to better understand the impact of design choices in asynchronous FL algorithms, such as the concurrency level and routing probabilities, and we leverage this knowledge to optimize loss. We characterize in particular a fundamental trade-off for optimizing asynchronous FL: minimizing gradient estimation errors by avoiding model parameter staleness, while also speeding up the system by increasing the throughput of model updates. Our two main contributions can be summarized as follows. First, we prove a discrete variant of Little's law to derive a closed-form expression for relative delay, a metric that quantifies staleness. This allows us to efficiently minimize the average loss per model update, which has been the gold standard in the literature to date. Second, we observe that naively optimizing this metric leads us to slow down the system drastically by overemphasizing staleness to the detriment of throughput. This motivates us to introduce an alternative metric that also takes system speed into account, for which we derive a tractable upper bound that can be minimized numerically. Extensive numerical results show that these optimizations enhance accuracy by 10% to 30%.
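
For context, the classical continuous-time Little's law is recalled below. This is only the textbook statement, not the discrete variant proved in the paper, which the abstract does not spell out; the symbols $L$, $\lambda$, $W$, and $m$ are notation assumed here for illustration.

$$ L = \lambda W $$

Here $L$ is the mean number of tasks in the system, $\lambda$ is the throughput, and $W$ is the mean time a task spends in the system. If a fixed number $m$ of tasks circulates in a closed system, then $L = m$ and $W = m / \lambda$, so raising $m$ lengthens the time each task spends in the system unless the throughput grows in proportion.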

artificial intelligence, deep learning, machine learning, (17 more...)

2502.08206

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning, Feb-12-2024

Queuing dynamics of asynchronous Federated Learning

Leconte, Louis, Jonckheere, Matthieu, Samsonov, Sergey, Moulines, Eric

We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds. In such an environment, each node is allowed to work on models with potential delays and contribute to updates to the central server at its own pace. Existing analyses of such algorithms typically depend on intractable quantities such as the maximum node delay and do not consider the underlying queuing dynamics of the system. In this paper, we propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity, taking into account the closed Jackson network structure of the associated computational graph. Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.

artificial intelligence, machine learning, node, (14 more...)

2405.00017

Country:

Europe (0.46)
North America > United States > New York (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial Intelligence, Dec-5-2023

Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems

Comte, Céline, Jonckheere, Matthieu, Sanders, Jaron, Senen-Cerda, Albert

Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this paper, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, contrary to classical policy-gradient methods like actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product-form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.

artificial intelligence, machine learning, probability, (17 more...)

2312.02804

Country:

Europe (0.67)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

arXiv.org Machine Learning, Nov-30-2023

Choosing the parameter of the Fermat distance: navigating geometry and noise

Chazal, Frédéric, Ferraris, Laure, Groisman, Pablo, Jonckheere, Matthieu, Pascal, Frédéric, Sapienza, Facundo

The Fermat distance has recently been established as a useful tool for machine learning tasks, either when a natural distance is not directly available to the practitioner or to improve the results given by Euclidean distances by exploiting the geometrical and statistical properties of the dataset. This distance depends on a parameter $\alpha$ that greatly impacts the performance of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to navigate the geometric intricacies inherent to the problem. At the same time, it should remain restrained enough to sidestep any deleterious ramifications stemming from noise during the process of distance estimation. We study both theoretically and through simulations how to select this parameter.
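
To make the role of $\alpha$ concrete, here is a minimal sketch of the sample Fermat distance as it is usually defined in the literature: the cheapest path through the sample points, where each hop costs its Euclidean length raised to the power $\alpha$. The abstract does not state this definition, so treat it as an assumption; the function name `sample_fermat_distances` and the toy points are hypothetical.

```python
import math


def sample_fermat_distances(points, alpha):
    """All-pairs sample Fermat distances (illustrative sketch).

    Each hop between two sample points costs their Euclidean distance
    raised to the power alpha; the distance between two points is the
    cost of the cheapest path that only visits sample points.
    """
    n = len(points)
    # Complete graph on the sample with weights |x_i - x_j|^alpha.
    dist = [[math.dist(p, q) ** alpha for q in points] for p in points]
    # Floyd-Warshall relaxation; the intermediate index k is outermost.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Three collinear points: with alpha > 1, two short hops beat one long hop.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d1 = sample_fermat_distances(points, alpha=1.0)
d2 = sample_fermat_distances(points, alpha=2.0)
```

With $\alpha = 1$ the triangle inequality makes the direct hop optimal, so the result is the plain Euclidean distance. With $\alpha = 2$ the endpoints in this toy sample are at distance 2 (two unit hops) rather than the direct cost 4, which shows how larger $\alpha$ favors paths through intermediate sample points. The cubic-time loop is only meant for small samples.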

artificial intelligence, fermat distance, machine learning, (18 more...)

2311.18663

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

arXiv.org Machine Learning, Nov-13-2023

FEMDA: a unified framework for discriminant analysis

Houdouin, Pierre, Jonckheere, Matthieu, Pascal, Frederic

Although linear and quadratic discriminant analysis are widely recognized classical methods, they can encounter significant challenges when dealing with non-Gaussian distributions or contaminated datasets. This is primarily due to their reliance on the Gaussian assumption, which lacks robustness. We first explain and review the classical methods to address this limitation and then present a novel approach that overcomes these issues. In this new approach, the model considered is an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter. This flexible model allows for potentially diverse and independent samples that may not follow identical distributions. By deriving a new decision rule, we demonstrate that maximum-likelihood parameter estimation and classification are simple, efficient, and robust compared to state-of-the-art methods.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2311.07518

Genre: Research Report > Promising Solution (0.54)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

arXiv.org Machine Learning, Oct-25-2023

Symphony of experts: orchestration with adversarial insights in reinforcement learning

Jonckheere, Matthieu, Mignacco, Chiara, Stoltz, Gilles

Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; the modeling thereof constitutes our first contribution. We then establish value-function regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and with high probability. A key point of our approach lies in its arguably more transparent proofs compared to existing methods. Finally, we present simulations for a stochastic matching toy model.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2310.16473

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Jul-4-2023

FEMDA: Une méthode de classification robuste et flexible (FEMDA: a robust and flexible classification method)

Houdouin, Pierre, Jonckheere, Matthieu, Pascal, Frederic

Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn from its own arbitrary Elliptically Symmetrical (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast, and robust to scale changes in the data compared to other state-of-the-art methods.

artificial intelligence, classification, contamination, (17 more...)

2307.01954

Country:

Europe > France (0.15)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.47)

arXiv.org Machine Learning, Jan-9-2022

Robust classification with flexible discriminant analysis in heterogeneous data

Houdouin, Pierre, Pascal, Frédéric, Jonckheere, Matthieu, Wang, Andrew

Linear and Quadratic Discriminant Analysis are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. To fill this gap, this paper presents a new robust discriminant analysis where each data point is drawn from its own arbitrary Elliptically Symmetrical (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples.

The new method called Generalized QDA (GQDA) relies on the estimation of a threshold parameter, whose optimal value is fixed for each sub-family of distribution. The case $c = 1$ corresponds to the Gaussian case. Finally, [10] improved the previous work by adding robust estimators, coming up with the Robust GQDA (RGQDA) method. All these methods assume that all clusters belong to the same distribution family. In practice, such a hypothesis may

artificial intelligence, contamination rate, machine learning, (16 more...)

2201.02967

Country: Europe (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

arXiv.org Machine Learning, Feb-6-2020

Uncovering differential equations from data with hidden variables

Somacal, Agustín, Boechi, Leonardo, Jonckheere, Matthieu, Lefieux, Vincent, Picard, Dominique, Smucler, Ezequiel

Examples include meteorology, biology, and physics. The usual way to model deterministic dynamical systems is by using (partial) differential equations. Typically, differential equation models for a given dynamical system are derived using a priori insights into the problem at hand; then the model is validated using empirical observations. In an era in which massive datasets pertaining to different fields of science are widely available, an interesting problem is whether a useful differential equation model can be learned directly from data, without any major modeling effort required from the researcher. Our goal in this paper is to develop a general methodology for building such differential equation models in contexts in which not all relevant variables are observed, that is, in cases in which the main variable of interest depends on other variables for which no measurements are available. As a concrete example, consider the following problem. RTE, the electricity transmission system operator of France, uses high-level simulations of hourly temperature series to study the impact that different climate scenarios have on electricity consumption, and hence on the French electrical power grid.

artificial intelligence, dynamical systems, machine learning, (17 more...)

2002.0225

Country: Europe > France (0.25)

Genre: Research Report (0.40)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)