Goto

Collaborating Authors

 Broda, Krysia


Hierarchies of Reward Machines

arXiv.org Artificial Intelligence

Hierarchical reinforcement learning (HRL; Barto & Mahadevan, 2003) frameworks, such as options (Sutton et al., Reward machines (RMs) are a recent formalism 1999), have been used to exploit RMs by learning policies for representing the reward function of a reinforcement at two levels of abstraction: (i) select a formula (i.e., subgoal) learning task through a finite-state machine from a given RM state, and (ii) select an action to whose edges encode subgoals of the task using (eventually) satisfy the chosen formula (Toro Icarte et al., high-level events. The structure of RMs enables 2018; Furelos-Blanco et al., 2021). The subtask decomposition the decomposition of a task into simpler and independently powered by HRL enables learning at multiple scales solvable subtasks that help tackle longhorizon simultaneously, and eases the handling of long-horizon and and/or sparse reward tasks. We propose sparse reward tasks. In addition, several works have considered a formalism for further abstracting the subtask the problem of learning the RMs themselves from structure by endowing an RM with the ability to interaction (e.g., Toro Icarte et al., 2019; Xu et al., 2020; call other RMs, thus composing a hierarchy of Furelos-Blanco et al., 2021; Hasanbeig et al., 2021).


Reactive Answer Set Programming

arXiv.org Artificial Intelligence

Logic Production System (LPS) is a logic-based framework for modelling reactive behaviour. Based on abductive logic programming, it combines reactive rules with logic programs, a database and a causal theory that specifies transitions between the states of the database. This paper proposes a systematic mapping of the Kernel of this framework (called KELPS) into an answer set program (ASP). For this purpose a new variant of KELPS with finite models, called $n$-distance KELPS, is introduced. A formal definition of the mapping from this $n$-distance KELPS to ASP is given and proven sound and complete. The Answer Set Programming paradigm allows to capture additional behaviours to the basic reactivity of KELPS, in particular proactive, preemptive and prospective behaviours. These are all discussed and illustrated with examples. Then a hybrid framework is proposed that integrates KELPS and ASP, allowing to combine the strengths of both paradigms. Under consideration in Theory and Practice of Logic Programming (TPLP).


Induction and Exploitation of Subgoal Automata for Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several grid-world and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning process performance in terms of the traces, the symmetric breaking and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.


The ILASP system for Inductive Learning of Answer Set Programs

arXiv.org Artificial Intelligence

The goal of Inductive Logic Programming (ILP) is to learn a program that explains a set of examples in the context of some pre-existing background knowledge. Until recently, most research on ILP targeted learning Prolog programs. Our own ILASP system instead learns Answer Set Programs, including normal rules, choice rules and hard and weak constraints. Learning such expressive programs widens the applicability of ILP considerably; for example, enabling preference learning, learning common-sense knowledge, including defaults and exceptions, and learning non-deterministic theories. In this paper, we first give a general overview of ILASP's learning framework and its capabilities. This is followed by a comprehensive summary of the evolution of the ILASP system, presenting the strengths and weaknesses of each version, with a particular emphasis on scalability.


Induction of Subgoal Automata for Reinforcement Learning

arXiv.org Machine Learning

Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automata, and the impact that the number of observable events has on the learner's performance.


Inductive Learning of Answer Set Programs from Noisy Examples

arXiv.org Artificial Intelligence

In recent years, non-monotonic Inductive Logic Programming has received growing interest. Specifically, several new learning frameworks and algorithms have been introduced for learning under the answer set semantics, allowing the learning of common-sense knowledge involving defaults and exceptions, which are essential aspects of human reasoning. In this paper, we present a noise-tolerant generalisation of the learning from answer sets framework. We evaluate our ILASP3 system, both on synthetic and on real datasets, represented in the new framework. In particular, we show that on many of the datasets ILASP3 achieves a higher accuracy than other ILP systems that have previously been applied to the datasets, including a recently proposed differentiable learning framework.


Learning Weak Constraints in Answer Set Programming

arXiv.org Artificial Intelligence

This paper contributes to the area of inductive logic programming by presenting a new learning framework that allows the learning of weak constraints in Answer Set Programming (ASP). The framework, called Learning from Ordered Answer Sets, generalises our previous work on learning ASP programs without weak constraints, by considering a new notion of examples as ordered pairs of partial answer sets that exemplify which answer sets of a learned hypothesis (together with a given background knowledge) are preferred to others. In this new learning task inductive solutions are searched within a hypothesis space of normal rules, choice rules, and hard and weak constraints. We propose a new algorithm, ILASP2, which is sound and complete with respect to our new learning framework. We investigate its applicability to learning preferences in an interview scheduling problem and also demonstrate that when restricted to the task of learning ASP programs without weak constraints, ILASP2 can be much more efficient than our previously proposed system.