Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization