Review for NeurIPS paper: A Boolean Task Algebra for Reinforcement Learning
Additional Feedback: Some ideas on how to relax the restrictive assumptions: the relationship to UVFAs is intriguing and may lead to a way of applying an approximate version of this paper's results in more complex settings. For example, what happens if one applies the Boolean operators directly to the value functions produced by UVFAs? While it is probably possible to construct MDPs in which this fails, it seems plausible that, in sufficiently sparse-reward settings, one would obtain good value-function approximations. I also wonder whether these results could be applied in the setting of van Niekerk et al., which appears to impose looser assumptions on the MDP transition dynamics and the reward function. A couple of points remain that I feel were not fully addressed by the rebuttal.
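To make the suggestion concrete, the Boolean operators in question act elementwise on (goal-conditioned) value functions: disjunction as a pointwise max, conjunction as a pointwise min, and negation relative to the maximal and minimal tasks. The sketch below is purely illustrative; the array shapes, function names, and toy Q-tables are my own assumptions, not the authors' code, and applying these operators to learned UVFA outputs would only be approximate.

```python
import numpy as np

# Illustrative sketch (reviewer's assumption, not the authors' implementation):
# Boolean composition of Q-value tables of shape (num_states, num_actions).

def q_or(q1, q2):
    # Task disjunction: pointwise maximum over the two value functions.
    return np.maximum(q1, q2)

def q_and(q1, q2):
    # Task conjunction: pointwise minimum over the two value functions.
    return np.minimum(q1, q2)

def q_not(q, q_max, q_min):
    # Task negation, defined relative to the value functions of the
    # maximal task (q_max) and minimal task (q_min).
    return (q_max + q_min) - q

# Toy sparse-reward tasks: 1.0 where the (state, action) pair achieves a goal.
qa = np.array([[1.0, 0.0],
               [0.0, 1.0]])
qb = np.array([[1.0, 1.0],
               [0.0, 0.0]])

q_union = q_or(qa, qb)        # solves "task A or task B"
q_intersection = q_and(qa, qb)  # solves "task A and task B"
```

If the UVFA's value estimates are close to the true task value functions, one might hope that composing them this way yields a usable approximation of the composed task's value function, though the paper's exact guarantees would not carry over.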
Jan-25-2025, 10:06:13 GMT