Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations