Active-GRPO: Adaptive Imitation and Self-Improving Reasoning for Molecular Optimization
