POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization