CaRL: Learning Scalable Planning Policies with Simple Rewards