 Reinforcement Learning



Toward the Fundamental Limits of Imitation Learning

Neural Information Processing Systems

We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by ≲ |S|H^{3/2}/N, matching our lower




Overleaf Example

Neural Information Processing Systems

We model episode sessions (parts of the episode where the latent state is fixed) and propose three key modifications to existing meta-RL methods: (i) consistency of latent information within sessions, (ii) session masking, and (iii) prior latent conditioning.





Safety through feedback in Constrained RL

Neural Information Processing Systems

This feedback can be system-generated or elicited from a human observing the training process. Previous approaches have not scaled to complex environments and are restricted to state-level feedback, which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus reducing the burden on the evaluator.