Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards
