Reinforcement Learning for Graph Coloring: Understanding the Power and Limits of Non-Label Invariant Representations