
Deep Inverse Q-learning with Constraints: Appendix
Gabriel Kalweit

Neural Information Processing Systems

Visualizations of the real and learned state-values of IAVI, IQL and DIQL can be found in Figure 7.
Figure 7: Visualization of state-values for different numbers of trajectories in Objectworld.
Table 2: Comparison between online and offline estimation of state-action visitations for the Objectworld environment, given a data set with an action distribution equivalent to the true optimal Boltzmann distribution.
The pseudocode of the tabular variant of Constrained Inverse Q-learning can be found in Algorithm 4. See [4] for further details of Constrained Q-learning.
Algorithm 4: Tabular Model-free Constrained Inverse Q-learning
The pseudocode of Deep Constrained Inverse Q-learning can be found in Algorithm 5.
The lower row shows the EVD.
3 For DIQL, the parameters were optimized in the range of
Hence, it can only increase.


Deep Inverse Q-learning with Constraints

Neural Information Processing Systems

Popular Maximum Entropy Inverse Reinforcement Learning approaches require the computation of expected state visitation frequencies for the optimal policy under an estimate of the reward function.
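To make the cost mentioned above concrete, the following is a minimal sketch of how expected state visitation frequencies can be computed for a fixed policy in a small tabular MDP, by pushing the state distribution forward through the dynamics for a finite horizon. It is an illustration of the general idea only, not the authors' method: the function name `expected_state_visitation`, the array layout, and the toy two-state MDP are all assumptions made for this example.

```python
import numpy as np

def expected_state_visitation(P, policy, start_dist, horizon):
    """Sum of state distributions over `horizon` time steps.

    Hypothetical helper for illustration.
    P:          (S, A, S) array, P[s, a, t] = Pr(next state t | s, a)
    policy:     (S, A) array, policy[s, a] = Pr(a | s)
    start_dist: (S,) array, initial state distribution
    """
    d_t = start_dist.astype(float).copy()
    total = np.zeros_like(d_t)
    for _ in range(horizon):
        total += d_t
        # Joint state-action probability mass at the current step.
        sa = d_t[:, None] * policy
        # Push that mass through the transition model to get the next step.
        d_t = np.einsum("sa,sat->t", sa, P)
    return total

# Toy MDP (assumed for this example): state 1 is absorbing.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]  # in state 0, action 0 stays in state 0
P[0, 1] = [0.0, 1.0]  # in state 0, action 1 moves to state 1
P[1, 0] = [0.0, 1.0]  # state 1 is absorbing under both actions
P[1, 1] = [0.0, 1.0]

policy = np.full((2, 2), 0.5)   # uniform random policy
start = np.array([1.0, 0.0])    # always start in state 0

visits = expected_state_visitation(P, policy, start, horizon=3)
print(visits)
```

Each pass of the loop needs the full transition model `P`, and the whole computation has to be repeated whenever the policy changes with a new reward estimate, which is the expense the sentence above refers to.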