Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs