SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
