Most logic-based machine learning algorithms rely on an Occamist bias where textual simplicity of hypotheses is optimised. This approach, however, fails to distinguish between the efficiencies of hypothesised programs, such as quick sort (O(n log n)) and bubble sort (O(n^2)). We address this issue by considering techniques to minimise both the resource complexity and textual complexity of hypothesised programs. We describe an algorithm proven to learn optimal resource complexity robot strategies, and we propose future work to generalise this approach to a broader class of logic programs.
Most logic-based machine learning algorithms rely on an Occamist bias where textual complexity of hypotheses is minimised. Within Inductive Logic Programming (ILP), this approach fails to distinguish between the efficiencies of hypothesised programs, such as quick sort (O(n log n)) and bubble sort (O(n 2 )). This paper addresses this issue by considering techniques to minimise both the textual complexity and resource complexity of hypothesised robot strategies. We develop a general framework for the problem of minimising resource complexity and show that on two robot strategy problems, 1) Postman 2) Sorter (recursively sort letters for delivery), the theoretical resource complexities of optimal strategies vary depending on whether objects can be composed within a strategy. The approach considered is an extension of Meta-Interpretive Learning (MIL), a recently developed paradigm in ILP which supports predicate invention and the learning of recursive logic programs. We introduce a new MIL implementation, Metagol O , and prove its convergence, with increasing numbers of randomly chosen examples to optimal strategies of this kind. Our experiments show that Metagol O learns theoretically optimal robot sorting strategies, which is in agreement with the theoretical predictions showing a clear divergence in resource requirements as the number of objects grows. To the authors’ knowledge this paper is the first demonstration of a learning algorithm able to learn optimal resource complexity robot strategies and algorithms for sorting lists.
In recent years Predicate Invention has been under-explored within Inductive Logic Programming due to difficulties in formulating efficient search mechanisms. However, a recent paper demonstrated that both predicate invention and the learning of recursion can be efficiently implemented for regular and context-free grammars, by way of abduction with respect to a meta-interpreter. New predicate symbols are introduced as constants representing existentially quantified higher-order variables. In this paper we generalise the approach of Meta-Interpretive Learning (MIL) to that of learning higher-order dyadic datalog programs. We show that with an infinite signature the higher-order dyadic datalog class H 2 2 has universal Turing expressivity though H 2 2 is decidable given a finite signature. Additionally we show that Knuth-Bendix ordering of the hypothesis space together with logarithmic clause bounding allows our Dyadic MIL implementation Metagol D to PAC-learn minimal cardinailty H 2 2 definitions. This result is consistent with our experiments which indicate that Metagol D efficiently learns compact H 2 2 definitions involving predicate invention for robotic strategies and higher-order concepts in the NELL language learning domain.
World-class human players have been outperformed in a number of complex two person games (Go, Chess, Checkers) by Deep Reinforcement Learning systems. However, owing to tractability considerations minimax regret of a learning system cannot be evaluated in such games. In this paper we consider simple games (Noughts-and-Crosses and Hexapawn) in which minimax regret can be efficiently evaluated. We use these games to compare Cumulative Minimax Regret for variants of both standard and deep reinforcement learning against two variants of a new Meta-Interpretive Learning system called MIGO. In our experiments all tested variants of both normal and deep reinforcement learning have worse performance (higher cumulative minimax regret) than both variants of MIGO on Noughts-and-Crosses and Hexapawn. Additionally, MIGO's learned rules are relatively easy to comprehend, and are demonstrated to achieve significant transfer learning in both directions between Noughts-and-Crosses and Hexapawn.