 Reinforcement Learning





Identifiability in inverse reinforcement learning

Neural Information Processing Systems

Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions. As already observed in Russell [1998], the problem is ill-posed, and the reward function is not identifiable, even under the presence of perfect information about optimal behavior. We provide a resolution to this non-identifiability for problems with entropy regularization.




Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Neural Information Processing Systems

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.




EDGE: Explaining Deep Reinforcement Learning Policies S1 Additional Technical Details

Neural Information Processing Systems

Note that these games are two-player games; we select the runner in You-Shall-Not-Pass and the kicker in Kick-And-Defend as our target agents. As mentioned in Section 4, we download a well-trained policy for each game.