 Reinforcement Learning


Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Neural Information Processing Systems

Function Approximation. Throughout the paper, we assume access to two function classes Q ⊂ (S × A → R) and W ⊂ (S × A → R). To develop intuition, they are supposed to model Q^π and w_{π/µ}, respectively, though most of our main results are stated without assuming any kind of realizability.





On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Neural Information Processing Systems

In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories.
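The cost gap between exact and stochastic gradients can be illustrated on a toy problem. The sketch below is not MAML and is not taken from the paper. It assumes a hypothetical stateless policy that picks action 1 with probability sigmoid(theta) at every step, and a return equal to the number of 1-actions. The exact gradient enumerates all 2^horizon trajectories, while the stochastic version averages a score-function (REINFORCE-style) estimate over a fixed number of sampled trajectories.

```python
import itertools
import math
import random


def action_prob(theta):
    """Probability of choosing action 1 under the toy policy (an assumption)."""
    return 1.0 / (1.0 + math.exp(-theta))


def exact_gradient(theta, horizon):
    """Gradient of the expected return, summing over every possible trajectory."""
    p = action_prob(theta)
    grad = 0.0
    # The number of trajectories is 2 ** horizon, which grows exponentially.
    for traj in itertools.product((0, 1), repeat=horizon):
        ones = sum(traj)  # toy return: number of 1-actions
        prob = p ** ones * (1.0 - p) ** (horizon - ones)
        score = sum(a - p for a in traj)  # d/dtheta of log-probability
        grad += prob * ones * score
    return grad


def stochastic_gradient(theta, horizon, num_samples, rng):
    """Monte Carlo estimate of the same gradient from sampled trajectories."""
    p = action_prob(theta)
    total = 0.0
    for _ in range(num_samples):
        traj = [1 if rng.random() < p else 0 for _ in range(horizon)]
        total += sum(traj) * sum(a - p for a in traj)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", exact_gradient(0.0, 6))
    print("sampled:", stochastic_gradient(0.0, 6, 20000, rng))
```

In this toy setting the enumeration is still feasible for a short horizon, but its cost doubles with every extra step, whereas the sampled estimate has a cost set by the chosen number of trajectories.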




Visual Adversarial Imitation Learning using Variational Models

Neural Information Processing Systems

Behaviour cloning (BC) is a classic algorithm to imitate expert demonstrations [7], which uses supervised learning to greedily match the expert behaviour at demonstrated expert states. Due to environment stochasticity, covariate shift, and policy approximation error, the agent may drift away from the expert state distribution and ultimately fail to mimic the demonstrator [8].
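The supervised-learning view of behaviour cloning can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method of the paper: the expert is a hypothetical linear controller on a one-dimensional state, and the learner fits a single gain by least squares on the demonstrated (state, action) pairs. All names here are invented for the illustration.

```python
import random


def expert_policy(state):
    """Hypothetical expert: a linear controller with gain -0.8 (an assumption)."""
    return -0.8 * state


def collect_demonstrations(num_samples, rng):
    """Record (state, action) pairs at states visited by the expert."""
    demos = []
    for _ in range(num_samples):
        state = rng.uniform(-1.0, 1.0)
        demos.append((state, expert_policy(state)))
    return demos


def fit_linear_policy(demos):
    """Least-squares fit of action = gain * state on the demonstrations."""
    numerator = sum(s * a for s, a in demos)
    denominator = sum(s * s for s, _ in demos)
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)
    gain = fit_linear_policy(collect_demonstrations(100, rng))
    print("learned gain:", gain)
```

The fit only sees states drawn from the expert's distribution, which is why errors at states outside that distribution are not penalised during training.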