Kim, Hyeonah
Neural Genetic Search in Discrete Spaces
Kim, Hyeonah, Choi, Sanghyeok, Son, Jiwoo, Park, Jinkyoo, Kwon, Changhyun
Effective search methods are crucial for improving the performance of deep generative models at test time. In this paper, we introduce a novel test-time search method, Neural Genetic Search (NGS), which incorporates the evolutionary mechanism of genetic algorithms into the generation procedure of deep models. The core idea behind NGS is its crossover, which is defined as parent-conditioned generation using trained generative models. This approach offers a versatile and easy-to-implement search algorithm for deep generative models. We demonstrate the effectiveness and flexibility of NGS through experiments across three distinct domains: routing problems, adversarial prompt generation for language models, and molecular design.
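To make the crossover concrete, the following is a minimal sketch of parent-conditioned generation, assuming a hypothetical autoregressive interface `model.step` that returns next-token logits; sampling is restricted to tokens that occur in either parent.

```python
import torch

def crossover(model, parent_a, parent_b, vocab_size, max_len):
    """NGS-style crossover (sketch): generate a child sequence whose
    tokens are restricted to the union of the two parents' tokens.
    `model.step` is an assumed interface returning next-token logits
    for the partial sequence built so far."""
    allowed = torch.zeros(vocab_size, dtype=torch.bool)
    allowed[parent_a] = True
    allowed[parent_b] = True

    child = []
    for _ in range(max_len):
        logits = model.step(child)                        # shape: (vocab_size,)
        logits = logits.masked_fill(~allowed, float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        child.append(torch.multinomial(probs, 1).item())  # sample an allowed token
    return child
```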
Improved Off-policy Reinforcement Learning in Biological Sequence Design
Kim, Hyeonah, Kim, Minsu, Yun, Taeyoung, Choi, Sanghyeok, Bengio, Emmanuel, Hernández-García, Alex, Park, Jinkyoo
Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to misspecification of the proxy on out-of-distribution inputs. We introduce $\delta$-Conservative Search, a novel off-policy search method for training GFlowNets designed to improve robustness against proxy misspecification. The key idea is to incorporate conservativeness, controlled by the parameter $\delta$, to constrain the search to reliable regions. Specifically, we inject noise into high-score offline sequences by randomly masking tokens with a Bernoulli distribution of parameter $\delta$ and then denoise the masked tokens using the GFlowNet policy. Additionally, $\delta$ is adaptively adjusted based on the uncertainty of the proxy model for each data point, so that the proxy's uncertainty determines the level of conservativeness. Experimental results demonstrate that our method consistently outperforms existing machine learning methods in discovering high-score sequences across diverse tasks, including DNA, RNA, protein, and peptide design, especially in large-scale scenarios.
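The masking-and-denoising step admits a compact sketch. Here `policy.denoise` is a hypothetical stand-in for refilling masked positions with the GFlowNet policy, and the uncertainty-to-$\delta$ mapping is one plausible choice rather than the paper's exact rule.

```python
import torch

MASK = -1  # hypothetical mask token id

def delta_conservative_search(policy, offline_seq, delta):
    """Inject Bernoulli(delta) noise into a high-score offline sequence,
    then let the trained policy rewrite the masked positions. A smaller
    delta masks fewer tokens, keeping the search closer to the known
    sequence (more conservative)."""
    seq = offline_seq.clone()
    mask = torch.bernoulli(torch.full(seq.shape, float(delta))).bool()
    seq[mask] = MASK
    return policy.denoise(seq, mask)  # assumed interface

def adaptive_delta(uncertainty, scale=1.0):
    """Plausible adaptive rule: higher proxy uncertainty -> smaller delta,
    i.e., more conservative search around trusted sequences."""
    return torch.sigmoid(-scale * uncertainty)
```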
Ant Colony Sampling with GFlowNets for Combinatorial Optimization
Kim, Minsu, Choi, Sanghyeok, Kim, Hyeonah, Son, Jiwoo, Park, Jinkyoo, Bengio, Yoshua
This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a neural-guided probabilistic search algorithm for solving combinatorial optimization (CO) problems. GFACS integrates generative flow networks (GFlowNets), an emerging amortized inference method, with ant colony optimization (ACO), a promising probabilistic search algorithm. Specifically, we use GFlowNets to learn a constructive policy in combinatorial spaces that enhances ACO by providing an informed prior distribution over decision variables conditioned on input graph instances. Furthermore, we introduce a novel off-policy training algorithm for scaling conditional GFlowNets to large-scale combinatorial spaces by leveraging local search and shared energy normalization. Our experimental results demonstrate that GFACS outperforms baseline ACO algorithms on seven CO tasks and is competitive with problem-specific heuristics for vehicle routing problems.
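The role of the learned prior can be illustrated with a single-ant construction in which the GFlowNet heatmap plays the part of ACO's heuristic term $\eta$ in the standard transition rule $p_{ij} \propto \tau_{ij}^{\alpha}\eta_{ij}^{\beta}$; this is a sketch, not the paper's full algorithm.

```python
import numpy as np

def sample_tour(heatmap, pheromone, alpha=1.0, beta=1.0, rng=None):
    """One ant builds a tour; edge probabilities combine the pheromone
    trail (tau) with the learned prior (heatmap standing in for eta)."""
    rng = rng or np.random.default_rng()
    n = heatmap.shape[0]
    tour, visited = [0], {0}
    while len(tour) < n:
        i = tour[-1]
        weights = (pheromone[i] ** alpha) * (heatmap[i] ** beta)
        weights[list(visited)] = 0.0          # forbid revisiting nodes
        probs = weights / weights.sum()
        j = int(rng.choice(n, p=probs))
        tour.append(j)
        visited.add(j)
    return tour
```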
Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization
Kim, Hyeonah, Kim, Minsu, Ahn, Sungsoo, Park, Jinkyoo
Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO). However, its practicality is hindered by the necessity for a large number of reward evaluations, especially in scenarios involving computationally intensive function assessments. To enhance sample efficiency, we propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various DRL methods. Our method leverages high-reward samples to encourage exploration of under-explored symmetric regions without additional online interactions, i.e., at no extra cost of reward evaluation. Through replay training, the policy is trained to maximize the likelihood of the symmetric trajectories of discovered high-reward samples. Experimental results demonstrate that our method consistently improves sample efficiency across diverse DRL methods applied to real-world tasks, such as molecular optimization and hardware design.
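A minimal sketch of one replay step, using the rotation and reversal symmetries of a TSP tour as the example; `policy.log_prob` is an assumed interface that scores an action sequence.

```python
import random

def symmetric_replay_loss(policy, tour):
    """One SRT replay step (sketch): re-express a discovered high-reward
    tour via a symmetry (any rotation or reversal encodes the same TSP
    solution) and maximize its likelihood under the policy. No new
    reward evaluations are required."""
    k = random.randrange(len(tour))
    sym_tour = tour[k:] + tour[:k]   # rotation: same cycle, new trajectory
    if random.random() < 0.5:
        sym_tour = sym_tour[::-1]    # reversal: also the same tour
    return -policy.log_prob(sym_tour)  # assumed interface; minimize NLL
```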
Genetic-guided GFlowNets: Advancing in Practical Molecular Optimization Benchmark
Kim, Hyeonah, Kim, Minsu, Choi, Sanghyeok, Park, Jinkyoo
The recent work of Gao et al. (2022a) proposes the practical molecular optimization (PMO) benchmark, an official benchmark emphasizing the importance of sample efficiency in de novo molecular optimization for practical applicability. The benchmark is reasonable because real-world applications of molecule optimization (e.g., drug discovery) require expensive scoring processes such as wet lab experiments. Our approach guides GFlowNet training with genetic algorithms (e.g., Jensen, 2019). The proposed method achieves a state-of-the-art PMO score of 16.213, significantly outperforming the reported best score in the benchmark of 15.185. Remarkably, ours exceeds all baselines, including reinforcement learning, Bayesian optimization, generative models, GFlowNets, and genetic algorithms, in 14 out of 23 tasks. Our code is publicly available.
RL4CO: a Unified Reinforcement Learning for Combinatorial Optimization Library
Berto, Federico, Hua, Chuanbo, Park, Junyoung, Kim, Minsu, Kim, Hyeonah, Son, Jiwoo, Kim, Haeyeon, Kim, Joungho, Park, Jinkyoo
Deep reinforcement learning offers notable benefits in addressing combinatorial problems over traditional solvers, reducing the reliance on domain-specific knowledge and expert solutions, and improving computational efficiency. Despite the recent surge in interest in neural combinatorial optimization, practitioners often do not have access to a standardized code base. Moreover, different algorithms are frequently based on fragmented implementations that hinder reproducibility and fair comparison. To address these challenges, we introduce RL4CO, a unified Reinforcement Learning (RL) for Combinatorial Optimization (CO) library. We employ state-of-the-art software and best practices in implementation, such as modularity and configuration management, to be flexible, easily modifiable, and extensible by researchers. Thanks to our unified codebase, we benchmark baseline RL solvers with different evaluation schemes on zero-shot performance, generalization, and adaptability on diverse tasks. Notably, we find that some recent methods may fall behind their predecessors depending on the evaluation settings. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing the community to compare with existing methods through a unified framework that decouples the science from software engineering. We open-source our library at https://github.com/ai4co/rl4co.
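A quickstart in the spirit of the repository's README; the import paths follow the documented API, but constructor arguments vary across versions, so treat the keyword arguments below as assumptions and consult the repo for the current interface.

```python
from rl4co.envs import TSPEnv
from rl4co.models import AttentionModel
from rl4co.utils import RL4COTrainer

# 50-node TSP environment (generator arguments may differ by version).
env = TSPEnv(generator_params={"num_loc": 50})
# Attention-based constructive policy trained with a rollout baseline.
model = AttentionModel(env, baseline="rollout")
trainer = RL4COTrainer(max_epochs=3)
trainer.fit(model)
```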
A Neural Separation Algorithm for the Rounded Capacity Inequalities
Kim, Hyeonah, Park, Jinkyoo, Kwon, Changhyun
The cutting plane method is a key technique for successful branch-and-cut and branch-price-and-cut algorithms that find the exact optimal solutions for various vehicle routing problems (VRPs). Among various cuts, the rounded capacity inequalities (RCIs) are the most fundamental. To generate RCIs, we need to solve the separation problem, whose exact solution takes a long time to obtain; therefore, heuristic methods are widely used. We design a learning-based separation heuristic algorithm with graph coarsening that learns the solutions of the exact separation problem with a graph neural network (GNN), which is trained on small instances of 50 to 100 customers. We embed our separation algorithm within the cutting plane method to find a lower bound for the capacitated VRP (CVRP) with up to 1,000 customers. We compare the performance of our approach with CVRPSEP, a popular separation software package for various cuts used in solving VRPs. Our computational results show that our approach finds better lower bounds than CVRPSEP for large-scale problems with 400 or more customers, while CVRPSEP remains highly competitive for problems with fewer than 400 customers.
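For reference, the RCI for a customer subset $S$ reads $x(\delta(S)) \geq 2\lceil d(S)/Q \rceil$, where $x(\delta(S))$ is the LP weight on edges crossing $S$, $d(S)$ the total demand of $S$, and $Q$ the vehicle capacity. The sketch below only scores a candidate subset; the separation problem of finding a violated subset is what the paper's GNN heuristic addresses.

```python
import math

def rci_violation(S, lp_edge_vals, demand, capacity):
    """Violation of the rounded capacity inequality for subset S:
        x(delta(S)) >= 2 * ceil(d(S) / Q).
    `lp_edge_vals` maps edges (i, j) to their LP values; a positive
    return value means the RCI for S is violated and can be added
    as a cutting plane."""
    crossing = sum(v for (i, j), v in lp_edge_vals.items()
                   if (i in S) != (j in S))
    rhs = 2 * math.ceil(sum(demand[c] for c in S) / capacity)
    return rhs - crossing
```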
Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization
Son, Jiwoo, Kim, Minsu, Kim, Hyeonah, Park, Jinkyoo
This paper proposes Meta-SAGE, a novel approach for improving the scalability of deep reinforcement learning models for combinatorial optimization (CO) tasks. Our method adapts pre-trained models to larger-scale problems at test time by introducing two components: a scale meta-learner (SML) and scheduled adaptation with guided exploration (SAGE). First, SML transforms the context embedding for subsequent adaptation by SAGE based on scale information. Then, SAGE adjusts the model parameters dedicated to the context embedding for a specific instance. SAGE introduces a locality bias, which encourages selecting nearby locations when determining the next location. The locality bias gradually decays as the model adapts to the target instance. Results show that Meta-SAGE outperforms previous adaptation methods and significantly improves scalability on representative CO tasks. Our source code is available at https://github.com/kaist-silab/meta-sage.
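The locality bias can be pictured as a distance penalty on the policy logits that is annealed away during adaptation; the linear schedule below is an illustrative assumption, not the paper's exact schedule.

```python
import torch

def biased_logits(logits, dist_from_current, lam):
    """SAGE-style guided exploration (sketch): penalize far-away nodes so
    nearby locations are preferred early in adaptation."""
    return logits - lam * dist_from_current

def lam_schedule(step, total_steps, lam0=1.0):
    """Linear decay: the locality bias vanishes as the model adapts."""
    return lam0 * (1.0 - step / total_steps)
```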