Coarse-Tuning Models of Code with Reinforcement Learning Feedback