Generative Augmented Flow Networks
Pan, Ling, Zhang, Dinghuai, Courville, Aaron, Huang, Longbo, Bengio, Yoshua
arXiv.org Artificial Intelligence
Deep reinforcement learning (RL) has achieved significant progress in recent years, with particular success in games (Mnih et al., 2015, Silver et al., 2016, Vinyals et al., 2019). RL methods applied to settings where a reward is given only at the end (i.e., at terminal states) typically aim at maximizing that reward function to learn the optimal policy. However, diversity of the generated states is desirable in a wide range of practical scenarios, including molecule generation (Bengio et al., 2021a), biological sequence design (Jain et al., 2022b), recommender systems (Kunaver and Požrl, 2017), and dialogue systems (Zhang et al., 2020). For example, in molecule generation, the reward function used in in-silico simulations can itself be uncertain and imperfect (compared to the more expensive in-vivo experiments). It is therefore not sufficient to search only for the single solution that maximizes the reward. Instead, it is desirable to sample many high-reward candidates, which can be achieved by sampling them proportionally to the reward of each terminal state. Interestingly, GFlowNets (Bengio et al., 2021a,b) learn a stochastic policy to sample composite objects x ∈ X with probability proportional to the reward R(x). The learning paradigm of GFlowNets differs from that of other RL methods in that it explicitly aims at modeling the diversity of the target distribution, i.e., all the modes of the reward function. This makes GFlowNets natural for practical applications where the model should discover objects that are both interesting and diverse, a focus of previous GFlowNet works (Bengio et al., 2021a,b, Jain et al., 2022b, Malkin et al., 2022).
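To make the contrast concrete, the sketch below compares a reward-maximizing policy, which always returns the single best object, with a sampler drawing objects proportionally to their reward, P(x) = R(x) / Σ_x' R(x'), which is the target distribution a trained GFlowNet matches. The objects and reward values are illustrative toy examples, not from the paper, and this is not the GFlowNet training procedure itself (which learns a sequential constructive policy rather than sampling terminal states directly).

```python
import random
from collections import Counter

# Toy terminal states with non-negative rewards (illustrative values only).
rewards = {"x1": 10.0, "x2": 5.0, "x3": 4.0, "x4": 1.0}

def argmax_policy():
    """Reward maximization: always return the single highest-reward object."""
    return max(rewards, key=rewards.get)

def proportional_sampler(rng=random):
    """Draw one object with probability R(x) / sum_x' R(x')."""
    total = sum(rewards.values())
    u = rng.uniform(0.0, total)
    acc = 0.0
    for x, r in rewards.items():
        acc += r
        if u <= acc:
            return x
    return x  # guard against floating-point edge cases

random.seed(0)
counts = Counter(proportional_sampler() for _ in range(100_000))
# argmax_policy() only ever sees "x1"; the proportional sampler visits
# every mode, with "x1" appearing about half the time (10 / 20) and even
# the low-reward "x4" appearing a non-negligible fraction (1 / 20).
```

The point of the comparison is the second property: a reward-proportional sampler keeps producing diverse high-reward candidates, whereas the maximizing policy collapses onto a single mode.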
Oct-6-2022