The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning