Supplementary Material: A Boolean Task Algebra For Reinforcement Learning
Neural Information Processing Systems
A Boolean algebra is a set $\mathcal{B}$ equipped with the binary operators $\vee$ (disjunction) and $\wedge$ (conjunction), and the unary operator $\neg$ (negation), which satisfies the following Boolean algebra axioms for $a, b, c \in \mathcal{B}$:

(i) Idempotence: $a \wedge a = a \vee a = a$.
(ii) Commutativity: $a \wedge b = b \wedge a$ and $a \vee b = b \vee a$.
(iii) Associativity: $a \wedge (b \wedge c) = (a \wedge b) \wedge c$ and $a \vee (b \vee c) = (a \vee b) \vee c$.
(iv) Absorption: $a \wedge (a \vee b) = a \vee (a \wedge b) = a$.
(v) Distributivity: $a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c)$ and $a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$.
(vi) Identity: there exist elements $\mathbf{0}, \mathbf{1} \in \mathcal{B}$ such that $\mathbf{0} \vee a = a$ and $\mathbf{1} \wedge a = a$.
(vii) Complements: $a \vee \neg a = \mathbf{1}$ and $a \wedge \neg a = \mathbf{0}$.

Theorem 1. Let $\mathcal{M}$ be a set of tasks which adhere to Assumption 2. Then $(\mathcal{M}, \vee, \wedge, \neg, \mathcal{M}_{MIN}, \mathcal{M}_{MAX})$ is a Boolean algebra.

Proof. We show that $\vee$, $\wedge$ and $\neg$ satisfy the Boolean properties (i)–(vii).

But this is a contradiction, since the return obtained by following an optimal trajectory up to a terminal state, without the reward for entering that terminal state, must be strictly less than receiving $r$.

This follows directly from Lemma 2, since all tasks $M \in \mathcal{M}$ share the same optimal policy $\pi^*$.

Corollary 2. Let $\mathcal{F}: \mathcal{M} \to \bar{\mathcal{Q}}^*$ be the map that assigns to each task $M \in \mathcal{M}$ its optimal extended value function $\bar{Q}^*_M$. Then $\mathcal{F}$ is a homomorphism between the Boolean algebra over tasks and the Boolean algebra over optimal extended value functions (the identities this entails are spelled out at the end of this section).

Proof. This follows from Theorem 3.

Below we list the pseudocode for the modified Q-learning algorithm used in the four-rooms domain; an illustrative sketch of the procedure is also given at the end of this section. Note that the greedy action-selection step is equivalent to generalised policy improvement (Barreto et al., 2017) over the set of extended value functions.

The theoretical results presented in this work rely on Assumptions 1 and 2, which restrict the tasks' transition dynamics and reward functions in potentially problematic ways.
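To make Corollary 2 above concrete, a homomorphism between the two Boolean algebras is a map that preserves each of the operators. Assuming the standard definition of a Boolean-algebra homomorphism, this requires for all $M_1, M_2 \in \mathcal{M}$:
\begin{align*}
\mathcal{F}(\neg M_1) &= \neg\, \mathcal{F}(M_1),\\
\mathcal{F}(M_1 \vee M_2) &= \mathcal{F}(M_1) \vee \mathcal{F}(M_2),\\
\mathcal{F}(M_1 \wedge M_2) &= \mathcal{F}(M_1) \wedge \mathcal{F}(M_2),
\end{align*}
where the operators on the left-hand sides act on tasks and those on the right-hand sides act on optimal extended value functions. In other words, the optimal extended value function of a composed task can be obtained by applying the corresponding operator directly to the already-learned value functions.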
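To accompany the description of the modified Q-learning algorithm above, the following is a minimal illustrative sketch (not the exact pseudocode used in the experiments) of tabular goal-conditioned Q-learning in which the greedy action is chosen by maximising over both goals and actions, i.e. GPI over the set of extended value functions. The environment interface (env.reset, env.step), the goal set, and the constants N_ACTIONS, GAMMA, ALPHA, EPSILON and R_MIN are assumptions made for illustration only.

import random
from collections import defaultdict

N_ACTIONS = 4   # assumed action space (e.g. up/down/left/right)
GAMMA = 1.0     # discount factor (assumed)
ALPHA = 0.5     # learning rate (assumed)
EPSILON = 0.1   # exploration rate (assumed)
R_MIN = -10.0   # assumed penalty for entering a terminal state other than the current goal

def make_q_table():
    # Q[(state, goal)][action]: one value function per goal, stored jointly.
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def gpi_greedy_action(Q, state, goals):
    # Greedy step of GPI: maximise over both goals and actions.
    best_value, best_action = float("-inf"), 0
    for g in goals:
        for a, q in enumerate(Q[(state, g)]):
            if q > best_value:
                best_value, best_action = q, a
    return best_action

def q_learning_episode(env, Q, goals):
    state, done = env.reset(), False
    while not done:
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = gpi_greedy_action(Q, state, goals)
        next_state, reward, done = env.step(action)
        for g in goals:
            # Extended reward: entering a terminal state that is not the
            # current goal g is penalised with R_MIN instead of the task reward.
            r_bar = R_MIN if (done and next_state != g) else reward
            target = r_bar if done else r_bar + GAMMA * max(Q[(next_state, g)])
            Q[(state, g)][action] += ALPHA * (target - Q[(state, g)][action])
        state = next_state
    return Q

This sketch covers learning a single task's extended value function; composed tasks reuse these tables by applying the Boolean operators to them, as in Corollary 2 above.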