Reinforcement Learning
Active Exploration for Inverse Reinforcement Learning
Instead of using an explicit reward function, Inverse Reinforcement Learning (IRL; Ng et al., 2000) seeks to recover the reward by observing an expert, e.g., a human who already knows how to perform a task. However, most existing IRL algorithms assume that the transition model and, in some cases, the expert's policy are known.
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
The goal of reinforcement learning (RL) [39] is to maximize the expected total reward by taking actions according to a policy in a stochastic environment, which is modelled as a Markov decision process (MDP) [4]. To obtain an optimal policy, one popular method is the direct maximization of the expected total reward via gradient ascent, which is referred to as the policy gradient (PG) method [40, 47].
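To make the idea of gradient ascent on the expected reward concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes a single-state MDP (a three-armed bandit) with a softmax policy and known, deterministic rewards, so the gradient can be computed exactly rather than estimated from samples. The names `softmax`, `expected_reward`, `policy_gradient`, and `gradient_ascent`, as well as the reward values, step count, and learning rate, are illustrative assumptions.

```python
import numpy as np


def softmax(theta):
    """Softmax policy over actions, computed in a numerically stable way."""
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()


def expected_reward(theta, rewards):
    """J(theta) = sum_a pi_theta(a) * r(a) for a single-state problem."""
    return float(softmax(theta) @ rewards)


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy: dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(theta)
    baseline = pi @ rewards  # J(theta), which also acts as a baseline here
    return pi * (rewards - baseline)


def gradient_ascent(rewards, steps=500, lr=0.5):
    """Plain gradient ascent on J, starting from the uniform policy."""
    theta = np.zeros_like(rewards, dtype=float)
    for _ in range(steps):
        theta = theta + lr * policy_gradient(theta, rewards)
    return theta


# Hypothetical reward for each of three actions; action 0 is the best one.
rewards = np.array([1.0, 0.0, 0.5])
theta = gradient_ascent(rewards)
print("policy:", softmax(theta))
print("expected reward:", expected_reward(theta, rewards))
```

In a full MDP the exact gradient is not available and is typically replaced by a sample-based estimate, which is where questions of sample complexity arise.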