Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

Neural Information Processing Systems 

A notable example that motivates our work is the problem of minimizing costs (or maximizing rewards) in a single-trajectory Markov Decision Process (MDP).
