Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

Marinov, Teodor V., Agarwal, Alekh, Trofin, Mircea

arXiv.org Artificial Intelligence 

This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation, but may have strong performance in complementary parts of the state space. The goal is to learn a policy that performs as well as the best combination of baselines on the entire state space. We propose a simple imitation-learning-based algorithm, establish a sample complexity bound on its accuracy, and prove that the algorithm is minimax optimal by showing a matching lower bound. Further, we apply the algorithm in the setting of machine-learning-guided compiler optimization to learn policies for inlining programs with the objective of creating a small binary. Through a few iterations of our approach, we demonstrate that we can learn a policy that outperforms an initial policy learned via standard RL.
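The abstract describes competing with the best combination of baselines via imitation learning. Below is a minimal, hypothetical sketch of one way such an algorithm could look: for each start state, keep the trajectory produced by the baseline with the highest observed return, then behavior-clone on the retained state-action pairs. The data layout, the per-start-state selection rule, and the softmax policy class are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: imitate, per start state, the baseline that achieved
# the best return, then fit a policy by behavior cloning on those trajectories.
# All names and the toy data generator are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: K baselines, each rolled out from the same set of start states.
K, n_starts, horizon, state_dim, n_actions = 3, 50, 10, 4, 2

def fake_trajectory(k, rng):
    """Stand-in trajectory: states, actions, and a scalar return."""
    states = rng.normal(size=(horizon, state_dim))
    actions = rng.integers(n_actions, size=horizon)
    ret = float(rng.normal(loc=0.1 * k))  # baselines differ in quality
    return states, actions, ret

# trajectories[k][i] is baseline k's trajectory from start state i.
trajectories = [[fake_trajectory(k, rng) for _ in range(n_starts)]
                for k in range(K)]

# Step 1: per start state, keep the trajectory with the highest return.
cloned_states, cloned_actions = [], []
for i in range(n_starts):
    best_k = max(range(K), key=lambda k: trajectories[k][i][2])
    states, actions, _ = trajectories[best_k][i]
    cloned_states.append(states)
    cloned_actions.append(actions)

X = np.concatenate(cloned_states)   # (n_starts * horizon, state_dim)
y = np.concatenate(cloned_actions)  # (n_starts * horizon,)

# Step 2: behavior cloning -- fit a softmax policy with gradient descent.
W = np.zeros((state_dim, n_actions))
for _ in range(200):
    logits = X @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = X.T @ (probs - np.eye(n_actions)[y]) / len(y)
    W -= 0.5 * grad

learned_policy = lambda s: int(np.argmax(s @ W))
print("action at a sample state:", learned_policy(X[0]))
```

In the compiler setting described in the abstract, the baselines would correspond to existing inlining policies, the return to (negative) binary size, and the cloning step would be repeated for a few iterations with the learned policy added back as a new baseline; the sketch above only illustrates the single-round selection-and-cloning idea.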
