
 Reinforcement Learning


Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Neural Information Processing Systems

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed with a delay.



Real-Time Reinforcement Learning

Neural Information Processing Systems

While it is well suited to describe turn-based decision problems such as board games, this framework is ill suited for real-time applications in which the environment's state continues to evolve while the agent selects an action (Travnik et al., 2018). Nevertheless, this framework has been used for real-time problems by means of what are essentially tricks, e.g.





Learning to Discover Skills through Guidance

Hyunseung Kim, Byungkun Lee, Hojoon Lee

Neural Information Processing Systems

However, we have identified that the effectiveness of these rewards declines as the complexity of the environment rises. Therefore, we present a novel unsupervised skill discovery (USD) algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that has the highest potential to reach unexplored states, (2) guides the other skills to follow the guide skill, and then (3) disperses the guided skills to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark.


Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs

Neural Information Processing Systems

To this end, Andrychowicz et al. [1] introduced Hindsight Experience Replay (HER), which can rapidly train goal-conditioned policies by retroactively imagining failed trajectories as successful ones.
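The "retroactive imagining" in HER amounts to relabeling stored transitions with a goal the agent actually reached. The snippet below is a minimal sketch of that idea using the "final" goal-selection strategy (substitute the goal achieved at the end of the episode), under loudly stated assumptions: the state itself serves as the achieved goal, and the reward is a hypothetical sparse signal (0 within a tolerance of the goal, -1 otherwise). The names `Transition`, `sparse_reward`, and `her_relabel_final` are illustrative and are not taken from the HER authors' code.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Vec
    action: int
    next_state: Vec
    goal: Vec
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 0 if the goal is reached, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode: List[Tuple[Vec, int, Vec]], goal: Vec) -> List[Transition]:
    """Store every step twice: once with the intended goal and once with the
    goal the episode actually ended at ("final" relabeling strategy).
    Assumes the state doubles as the achieved goal."""
    final_achieved = episode[-1][2]
    buffer: List[Transition] = []
    for state, action, next_state in episode:
        for g in (goal, final_achieved):
            # Recompute the reward against whichever goal this copy uses.
            buffer.append(Transition(state, action, next_state, g, sparse_reward(next_state, g)))
    return buffer


if __name__ == "__main__":
    # The agent aimed for (5, 5) but only reached (2, 0): a failure under the real goal.
    episode = [((0.0, 0.0), 1, (1.0, 0.0)), ((1.0, 0.0), 1, (2.0, 0.0))]
    buf = her_relabel_final(episode, goal=(5.0, 5.0))
    print([t.reward for t in buf])  # [-1.0, -1.0, -1.0, 0.0]
```

In this toy episode every transition is a failure with respect to the intended goal, yet the relabeled copy of the last step carries a success signal, which is what gives a goal-conditioned learner something to learn from under sparse rewards.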


Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning

Neural Information Processing Systems

However, even when the prior is useful for generalization, distilling it into the RL agent often interferes with RL training and degrades sample efficiency.