Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards