Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs
