The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning -- Supplementary Material -- A Tabular Experiments

Oct-2-2025, 20:18:27 GMT–Neural Information Processing Systems

Here, we discuss some additional settings for the tabular experiments. The reason for this is that Sarsa(0.95), in contrast to MB-VI and MB-SU, is a multi-step method. Therefore, there is stochasticity in the update target even in deterministic environments, due to the exploration of the behavior policy. All methods used optimistic initialization. The pseudocode of the tabular, on-policy method used in Section 5.1 is shown in Algorithm 1. These estimates are updated at the end of the episode, using the data gathered during the episode.
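
To illustrate why a multi-step method such as Sarsa(0.95) has a stochastic update target even when the environment is deterministic, the following is a minimal Python sketch of tabular Sarsa(λ) with optimistic initialization. It is not the paper's Algorithm 1. The function names, the accumulating eligibility traces, the ε-greedy behavior policy, and every hyperparameter other than λ = 0.95 are assumptions made for this example.

```python
import random


def make_q(n_states, n_actions, q_init):
    """Optimistic initialization: every Q-value starts at q_init (assumed value)."""
    return [[q_init] * n_actions for _ in range(n_states)]


def make_traces(n_states, n_actions):
    """Eligibility traces, all zero at the start of an episode."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q, s, epsilon, rng):
    """Hypothetical behavior policy: random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    best = max(q[s])
    # Break ties among greedy actions at random.
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])


def sarsa_lambda_step(q, traces, s, a, r, s2, a2, done,
                      alpha=0.5, gamma=0.9, lam=0.95):
    """One online Sarsa(lambda) update with accumulating traces.

    The target uses Q(s2, a2), where a2 was sampled by the behavior
    policy, so the target is random even if the environment is not.
    """
    target = r if done else r + gamma * q[s2][a2]
    delta = target - q[s][a]
    traces[s][a] += 1.0
    for i in range(len(q)):
        for j in range(len(q[i])):
            # Every recently visited pair shares in the TD error.
            q[i][j] += alpha * delta * traces[i][j]
            traces[i][j] *= gamma * lam
    return delta
```

In this sketch the TD error of each step is passed back along the traces to earlier state-action pairs, so an exploratory action taken later in the episode changes the update applied to pairs visited before it.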



