First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
Ben Norman

Neural Information Processing Systems 

The objective is to maximize the total reward accumulated over all episodes (e.g., the number of games won), expressed as
