Surpassing legacy approaches to PWR core reload optimization with single-objective reinforcement learning