Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue