
 Reinforcement Learning




PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

Neural Information Processing Systems

Specifically, instead of directly measuring the divergence with paired images, we train a reward model with the dataset we construct, consisting of nearly 51,000 images annotated with human preferences.



f0eb6568ea114ba6e293f903c34d7488-Paper.pdf

Neural Information Processing Systems

Several works have shown this vulnerability via adversarial attacks, but existing approaches to improving the robustness of DRL under this setting have had limited success and lack theoretical principles. We show that naively applying existing techniques for improving robustness in classification tasks, like adversarial training, is ineffective for many RL tasks.



A Self-Tuning Actor-Critic Algorithm

Neural Information Processing Systems

The general concept is to represent the training loss as a function of both the agent parameters and the hyperparameters. The agent optimizes the parameters to minimize this loss function with respect to the current hyperparameters.
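As a rough illustration of treating the loss as a function of both parameters and hyperparameters, here is a minimal sketch on a scalar toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the quadratic inner loss, the outer loss, the names `theta`, `eta`, `alpha`, `beta`, and the one-step meta-gradient used to adjust the hyperparameter. The snippet above only describes the inner step; the outer update is one plausible way to tune `eta`.

```python
def self_tune(a=1.0, b=0.5, alpha=0.1, beta=1.0, steps=3000):
    """Toy self-tuning loop (hypothetical setup, not the paper's algorithm).

    Inner loss:  L_in(theta, eta) = 0.5*(theta - a)**2 + 0.5*eta*theta**2
    Outer loss:  L_out(theta')    = 0.5*(theta' - b)**2
    """
    theta, eta = 0.0, 0.0
    for _ in range(steps):
        # Inner step: the parameter descends the inner loss under the current eta.
        grad_theta = (theta - a) + eta * theta
        theta_new = theta - alpha * grad_theta

        # Outer step: differentiate the outer loss through the inner update.
        # d(theta_new)/d(eta) = -alpha * theta, treating theta as fixed.
        meta_grad = (theta_new - b) * (-alpha * theta)
        eta -= beta * meta_grad

        theta = theta_new
    return theta, eta


theta, eta = self_tune()
print(theta, eta)  # settles near theta = b and eta = (a - b) / b
```

In this toy setup the only fixed point with nonzero `theta` is `theta = b`, which forces `eta = (a - b) / b`, so the hyperparameter ends up at the value that makes the inner optimum coincide with the outer target.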




XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games.
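To make the double oracle idea behind PSRO more concrete, the following is a minimal sketch on a small zero-sum matrix game. It is not PSRO or XDO: exact best responses stand in for the reinforcement-learning oracles, and fictitious play is swapped in as an approximate solver for the restricted game. The function names `fictitious_play` and `double_oracle` and the example payoff matrices are hypothetical.

```python
def fictitious_play(A, iters=2000):
    """Approximate equilibrium of zero-sum matrix game A (row player maximizes)."""
    n, m = len(A), len(A[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_counts[0] = 1  # arbitrary initial action for each player
    col_counts[0] = 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(A[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        row_counts[max(range(n), key=row_vals.__getitem__)] += 1
        col_counts[min(range(m), key=col_vals.__getitem__)] += 1
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


def double_oracle(A, max_rounds=50):
    """Grow each player's action set with best responses until neither adds an action."""
    n, m = len(A), len(A[0])
    rows, cols = [0], [0]  # restricted action sets (populations)
    for _ in range(max_rounds):
        # Solve the game restricted to the current populations.
        sub = [[A[i][j] for j in cols] for i in rows]
        p, q = fictitious_play(sub)

        # Best responses over the FULL action sets against the restricted solution.
        br_row = max(range(n), key=lambda i: sum(q[k] * A[i][cols[k]] for k in range(len(cols))))
        br_col = min(range(m), key=lambda j: sum(p[k] * A[rows[k]][j] for k in range(len(rows))))

        if br_row in rows and br_col in cols:
            break  # no new best response: the restricted solution is kept
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return rows, cols, p, q


# Rock-paper-scissors: every action ends up in the population.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(double_oracle(rps)[:2])

# A game with a pure saddle point: the dominated second row is never added.
saddle = [[3, 1], [2, 0]]
print(double_oracle(saddle)[:2])
```

The second example shows the appeal of the approach: the loop can stop before enumerating every action, which is what makes population-based methods attractive in large games.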