 Reinforcement Learning


Temporal Regularization for Markov Decision Process

Neural Information Processing Systems

Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization.


Exponentially Weighted Imitation Learning for Batched Historical Data

Neural Information Processing Systems

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings.




LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

Neural Information Processing Systems

But in practice, it requires querying the behavior policy which is unknown, and using an erroneous approximation of the behavior policy can negatively affect the performance ([39]).



Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing Systems

We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.



eda9523faa5e7191aee1c2eaff669716-Supplemental-Conference.pdf

Neural Information Processing Systems

Though promising results have been reported on some RL application domains, policies learned with such representations usually fail to generalize well in a complex environment because minimizing a reconstruction loss may potentially introduce local (visual) features with task-irrelevant information.


eda9523faa5e7191aee1c2eaff669716-Paper-Conference.pdf

Neural Information Processing Systems

Though promising results have been reported on some RL application domains, policies learned with such representations usually fail to generalize well in a complex environment because minimizing a reconstruction loss may potentially introduce local (visual) features with task-irrelevant information.