POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization
