The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning -- Supplementary Material -- A Tabular Experiments

Feb-8-2026, 07:44:44 GMT–Neural Information Processing Systems

For all tabular experiments, we used ε-greedy exploration with ε = 0.1. Furthermore, during pretraining and training, we used a maximum episode length of 100. For evaluation, we set ε = 0 and ran 10 evaluation episodes. We used a fixed step-size α for all tabular experiments. Therefore, there is stochasticity in the update target even in deterministic environments due to exploration of the behavior policy.
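
The exploration and evaluation settings described above can be illustrated with a minimal sketch. Only the four constants come from the text. The `env` interface (`reset()` and `step(action)`), the `q_table` layout, the uniform tie-breaking rule, and the reuse of the 100-step cap during evaluation are assumptions made for illustration, not the authors' implementation.

```python
import random

EPSILON_TRAIN = 0.1        # exploration rate during pretraining and training (from the text)
EPSILON_EVAL = 0.0         # greedy action selection during evaluation (from the text)
MAX_EPISODE_LENGTH = 100   # step cap during pretraining and training (from the text)
NUM_EVAL_EPISODES = 10     # number of evaluation episodes (from the text)


def epsilon_greedy(q_values, epsilon, rng):
    """Return a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Assumption: ties are broken uniformly at random; the text does not specify this.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def run_episode(env, q_table, epsilon, rng, max_steps=MAX_EPISODE_LENGTH):
    """Roll out one episode and return its undiscounted return.

    `env` is a hypothetical object with reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = epsilon_greedy(q_table[state], epsilon, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def evaluate(env, q_table, rng):
    """Average return over NUM_EVAL_EPISODES greedy episodes.

    Assumption: the 100-step cap is reused here; the text states it only
    for pretraining and training.
    """
    returns = [run_episode(env, q_table, EPSILON_EVAL, rng)
               for _ in range(NUM_EVAL_EPISODES)]
    return sum(returns) / len(returns)
```

With ε = 0.1 the behavior policy occasionally takes a non-greedy action, so any update target that depends on the sampled actions varies from run to run even when the environment's transitions are deterministic. With ε = 0 the evaluation rollouts are greedy up to tie-breaking.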



