Learning to Reason in Large Theories without Imitation