Federated Ensemble-Directed Offline Reinforcement Learning

Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai

arXiv.org Artificial Intelligence 

Federated learning is an approach wherein clients learn collaboratively by sharing their locally trained models (not their data) with a federating agent, which periodically combines their models and returns the federated model to the clients for further refinement (Kairouz et al., 2021; Wang et al., 2021). Federated learning has seen much success in supervised learning applications due to its ability to generate well-trained models using small amounts of data at each client while preserving privacy and reducing the usage of communication resources. Recently, there has been growing interest in employing federated learning for online RL problems where each client collects data online by following its own Markovian trajectory, while simultaneously updating the model parameters (Khodadadian et al., 2022; Nadiger et al., 2019; Qi et al., 2021). However, such an online learning approach requires sequential interactions with the environment or the simulator, which may not be feasible in many real-world applications.

Instead, each client may have pre-collected operational data generated according to a client-specific behavior policy. The federated offline reinforcement learning problem is to learn the optimal policy using these heterogeneous offline datasets distributed across the clients and collected by different unknown behavior policies, without sharing the data explicitly. The framework of offline reinforcement learning (Levine et al., 2020) offers a way to learn the policy using only the offline data collected according to a behavior policy, without any direct interactions with the environment. However, naively combining an off-the-shelf offline RL algorithm such as TD3-BC (Fujimoto & Gu, 2021) with an off-the-shelf federated supervised learning approach such as FedAvg (McMahan et al., 2017) leads to a poorly performing policy, as we show later (see Figures 1-3). Federated offline RL is significantly more challenging than its supervised learning counterpart and centralized offline RL for the following reasons.
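To make the federated learning loop and the naive baseline discussed above concrete, the following is a minimal sketch, not the method proposed in this paper: each client refines the current federated model on its own offline dataset, and a federating agent aggregates the client models with a FedAvg-style weighted average. The names `federated_round`, `local_offline_rl_update`, and `local_steps` are hypothetical; the local update is stubbed out where an offline RL learner such as TD3-BC would run.

```python
from typing import Dict, List
import numpy as np

Params = Dict[str, np.ndarray]  # model parameters keyed by layer name


def local_offline_rl_update(params: Params, dataset, local_steps: int) -> Params:
    """Placeholder for a client's local offline RL learner (e.g., TD3-BC).

    In practice this would run `local_steps` gradient updates on the client's
    fixed offline dataset; here it returns the parameters unchanged so the
    sketch stays self-contained.
    """
    return params


def fedavg(client_params: List[Params], weights: List[float]) -> Params:
    """FedAvg-style aggregation: a weighted average of client parameters."""
    total = float(sum(weights))
    return {
        key: sum(w * p[key] for w, p in zip(weights, client_params)) / total
        for key in client_params[0]
    }


def federated_round(global_params: Params,
                    client_datasets: List[list],
                    local_steps: int = 100) -> Params:
    """One communication round: broadcast, local training, aggregation."""
    client_params, weights = [], []
    for dataset in client_datasets:
        # Each client starts from the current federated model and refines it
        # on its own offline data; only parameters, never data, are shared.
        local = {k: v.copy() for k, v in global_params.items()}
        local = local_offline_rl_update(local, dataset, local_steps)
        client_params.append(local)
        weights.append(len(dataset))  # weight clients by dataset size
    return fedavg(client_params, weights)
```

As the paper argues, this naive combination performs poorly because the clients' heterogeneous behavior policies make simple parameter averaging ill-suited to offline RL.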
