Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization