High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Neural Information Processing Systems 

Here we consider contextual policies that are a map from a discrete context to a set of continuous parameters . For example, video streaming and real-time conferencing systems use adaptive bitrate (ABR) algorithms to balance between video quality and uninterrupted playback.