Reinforcement Learning


Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning

Neural Information Processing Systems

Additionally, we offer a practical version of WSAC and compare it with existing state-of-the-art safe offline RL algorithms in several continuous control environments.


PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

Neural Information Processing Systems

Specifically, instead of directly measuring the divergence with paired images, we train a reward model with the dataset we construct, consisting of nearly 51,000 images annotated with human preferences.


Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

Xiong-Hui Chen

Neural Information Processing Systems

However, current research on decision-making, such as reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, while failing to utilize the existing knowledge already summarized in text.