Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization
Houssem Sifaou, Osvaldo Simeone
Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.
arXiv.org Artificial Intelligence
Sep-19-2025
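The abstract's central mechanism is fidelity selection by maximizing information gain per unit of simulation cost, with the gain estimated through bootstrapping. The sketch below illustrates one plausible form of such a rule; it is not the paper's implementation. The `Simulator` class, the noise/cost values, and the use of bootstrap disagreement as a proxy for information gain are all illustrative assumptions.

```python
# Minimal sketch (not MF-HRL-IGM itself) of cost-aware fidelity selection via a
# bootstrap proxy for information gain. All names and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)

class Simulator:
    """Hypothetical environment model: higher fidelity = lower noise, higher cost."""
    def __init__(self, noise: float, cost: float):
        self.noise, self.cost = noise, cost

    def rollout_return(self) -> float:
        # Toy stand-in for a policy rollout: a fixed true return observed
        # under fidelity-dependent noise.
        return 1.0 + rng.normal(scale=self.noise)

def info_gain_proxy(samples: list, n_boot: int = 64) -> float:
    """Bootstrap disagreement as a cheap proxy for expected information gain:
    the variance of the resampled mean shrinks as a fidelity accumulates
    samples, so well-explored fidelities become less attractive over time."""
    arr = np.asarray(samples)
    boot_means = [rng.choice(arr, size=arr.size, replace=True).mean()
                  for _ in range(n_boot)]
    return float(np.var(boot_means))

def select_fidelity(history: list, sims: list) -> int:
    """Greedy rule: maximize estimated information gain per unit of cost."""
    scores = [info_gain_proxy(h) / s.cost for h, s in zip(history, sims)]
    return int(np.argmax(scores))

# Three fidelities: cheap/noisy, medium, expensive/accurate.
sims = [Simulator(noise=2.0, cost=1.0),
        Simulator(noise=0.5, cost=5.0),
        Simulator(noise=0.1, cost=20.0)]
budget = 200.0
# Warm-start samples per fidelity (cost accounting omitted for brevity).
history = [[s.rollout_return() for _ in range(3)] for s in sims]
counts = [0, 0, 0]

while budget >= min(s.cost for s in sims):
    k = select_fidelity(history, sims)
    if sims[k].cost > budget:  # best option unaffordable; fall back to cheapest
        k = min(range(len(sims)), key=lambda i: sims[i].cost)
    history[k].append(sims[k].rollout_return())
    counts[k] += 1
    budget -= sims[k].cost

print("queries per fidelity (low -> high):", counts)
```

Running the sketch shows the selection rule initially favoring the cheap, noisy simulator and shifting queries as bootstrap uncertainty at each fidelity shrinks, which conveys the exploration-per-cost trade-off behind the paper's no-regret analysis.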