24cceab7ffc1118f5daaace13c670885-Supplemental.pdf
Neural Information Processing Systems
A.1 Algorithm

The code is available at https://github.com/mklissa/MOC.

A.2 Tabular experiments

A.2.1 Implementation Details

For our experiments in the FourRooms domain, we based our implementation on Bacon et al. [2016]. We ran the experiments for 500 episodes, each lasting a maximum of 1000 steps, with the goal located in the right hallway. In the first experiment, we verify whether our method can accelerate the learning of a fixed set of options. We define this fixed set as the hallway options from Sutton et al. [1999b]. Because the policies of these options were deterministic and we use importance sampling, we relax them to stochastic policies in which the most likely action is the one leading to a hallway.
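The relaxation of the hallway options can be illustrated with a minimal sketch. The text does not state the exact form of the relaxation, so the epsilon-soft scheme, the value EPSILON = 0.1, and the function names relax_policy and importance_ratio below are assumptions made for illustration only, not the released implementation. The sketch shows why the relaxation is needed: an importance ratio between two deterministic policies is either zero or undefined, whereas policies with full support always give a finite ratio.

```python
# Hypothetical sketch: epsilon-soft relaxation of a deterministic option policy.
# The epsilon-soft form and the value of EPSILON are assumptions, not from the paper.

N_ACTIONS = 4   # FourRooms has four primitive actions (up, down, left, right)
EPSILON = 0.1   # assumed probability mass spread uniformly over all actions


def relax_policy(greedy_action, n_actions=N_ACTIONS, epsilon=EPSILON):
    """Return action probabilities whose mode is the original deterministic action."""
    probs = [epsilon / n_actions] * n_actions
    probs[greedy_action] += 1.0 - epsilon
    return probs


def importance_ratio(target_probs, behavior_probs, action):
    """Ratio target(a|s) / behavior(a|s); finite because relaxed policies have full support."""
    return target_probs[action] / behavior_probs[action]


# Two relaxed hallway options that prefer different actions in the same state.
option_a = relax_policy(greedy_action=3)  # most likely action: index 3
option_b = relax_policy(greedy_action=1)  # most likely action: index 1

# Option A executed action 3; reweight that sample with respect to option B.
rho = importance_ratio(option_b, option_a, action=3)
```

With these assumed values, the action leading to the hallway keeps most of the probability mass, and every other action retains a small nonzero probability, so the ratio rho is small but well defined.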
Apr-25-2026, 03:48:08 GMT