Note: PDF of full volume downloadable by clicking on title above (26 MB). Selected individual chapters available from the links below.

CONTENTS

INTRODUCTION

MATHEMATICAL FOUNDATIONS
1 The morphology of prex—an essay in meta-algorithmics. J. LASKI
2 Program schemata. M. S. PATERSON
3 Language definition and compiler validation. J. J. FLORENTIN
4 Placing trees in lexicographic order. H. I. SCOINS

THEOREM PROVING
5 A new look at mathematics and its mechanization. B. MELTZER
6 Some notes on resolution strategies. B. MELTZER
7 The generalized resolution principle. J. A. ROBINSON
8 Some tree-paring strategies for theorem proving. D. LUCKHAM
9 Automatic theorem proving with equality substitutions and mathematical induction. J. L. DARLINGTON

MACHINE LEARNING AND HEURISTIC PROGRAMMING
10 On representations of problems of reasoning about actions. S. AMAREL
11 Descriptions. E. W. ELCOCK
12 Kalah on Atlas. A. G. BELL
13 Experiments with a pleasure-seeking automaton. J. E. DORAN
14 Collective behaviour and control problems. V. I. VARSHAVSKY

MAN—MACHINE INTERACTION
15 A comparison of heuristic, interactive, and unaided methods of solving a shortest-route problem. D. MICHIE, J. G. FLEMING and J. V. OLDFIELD
16 Interactive programming at Carnegie Tech. A. H. BOND
17 Maintenance of large computer systems—the engineer's assistant. M. H. J. BAYLIS

COGNITIVE PROCESSES: METHODS AND MODELS
18 The syntactic analysis of English by machine. J. P. THORNE, P. BRATLEY and H. DEWAR
19 The adaptive memorization of sequences. H. C. LONGUET-HIGGINS and A. ORTONY

PATTERN RECOGNITION
20 An application of Graph Theory in pattern recognition. C. J. HILDITCH

PROBLEM-ORIENTED LANGUAGES
21 Some semantics for data structures. D. PARK
22 Writing search algorithms in functional form. R. M. BURSTALL
23 Assertions: programs written without specifying unnecessary order. J. M. FOSTER
24 The design philosophy of Pop-2. R. J. POPPLESTONE

INDEX

Machine Intelligence Workshop
We propose a method for efficient training of deep Reinforcement Learning (RL) agents when the reward is highly sparse and non-Markovian, but at the same time admits a high-level yet unknown sequential structure, as seen in a number of video games. This high-level sequential structure can be expressed as a computer program, which our method infers automatically as the RL agent explores the environment. Through this process, a high-level sequential task that occurs only rarely may nonetheless be encoded within the inferred program. A hybrid architecture for deep neural fitted Q-iteration is then employed to fill in low-level details and generate an optimal control policy that follows the structure of the program. Our experiments show that the agent is able to synthesise a complex program to guide the RL exploitation phase, which is otherwise difficult to achieve with state-of-the-art RL techniques.
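The idea of letting a high-level program guide low-level control can be illustrated with a small sketch. It is not the method of the abstract above: the automaton here is written by hand rather than inferred during exploration, and plain tabular Q-iteration stands in for deep neural fitted Q-iteration. The corridor environment, the `key` and `door` labels, and all function names are hypothetical. The sketch shows only that a reward which depends on the order of past events becomes Markovian once the automaton state is added to the environment state.

```python
# Hypothetical automaton (an assumption, not inferred as in the abstract):
# state 0 --"key"--> state 1 --"door"--> state 2 (accepting).
TRANSITIONS = {(0, "key"): 1, (1, "door"): 2}
ACCEPTING = 2

N_POS = 5                        # toy corridor with cells 0..4
LABELS = {4: "key", 0: "door"}   # high-level events observed in some cells
ACTIONS = (-1, 1)                # move left or right


def step(pos, q, action):
    """Advance the environment and the automaton together."""
    new_pos = min(N_POS - 1, max(0, pos + action))
    # An unlabelled cell, or a label with no outgoing edge, leaves q unchanged.
    new_q = TRANSITIONS.get((q, LABELS.get(new_pos)), q)
    # Sparse reward: paid once, when the whole sequence has been completed.
    reward = 1.0 if new_q == ACCEPTING and q != ACCEPTING else 0.0
    return new_pos, new_q, reward


def q_iteration(gamma=0.9, sweeps=50):
    """Tabular Q-iteration over product states (position, automaton state)."""
    Q = {((p, q), a): 0.0
         for p in range(N_POS) for q in range(3) for a in ACTIONS}
    for _ in range(sweeps):
        for (p, q), a in list(Q):
            if q == ACCEPTING:
                continue  # terminal product states keep value 0
            p2, q2, r = step(p, q, a)
            future = 0.0 if q2 == ACCEPTING else max(
                Q[((p2, q2), b)] for b in ACTIONS)
            Q[((p, q), a)] = r + gamma * future
    return Q


def greedy_rollout(Q, pos=2, q=0, max_steps=20):
    """Follow the greedy policy and return the visited product states."""
    path = [(pos, q)]
    while q != ACCEPTING and len(path) <= max_steps:
        a = max(ACTIONS, key=lambda b: Q[((pos, q), b)])
        pos, q, _ = step(pos, q, a)
        path.append((pos, q))
    return path
```

Calling `greedy_rollout(q_iteration())` from the middle of the corridor yields a path that first walks right to collect the key and only then walks left to the door. A policy over positions alone could not express this, because the best action in a given cell depends on whether the key has already been collected.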
Using the methods demonstrated in this paper, a robot with an unknown sensorimotor system can learn sets of features and behaviors adequate to explore a continuous environment and abstract it to a finite-state automaton. The structure of this automaton can then be learned from experience, and constitutes a cognitive map of the environment.
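As a rough illustration of the last step only, the sketch below tabulates an automaton's transition structure from logged experience. It assumes the hard part described above, abstracting the continuous environment into discrete states and behaviors, has already been done. The state names, action names, and the majority-vote rule are assumptions made for this example, not the paper's procedure.

```python
from collections import Counter, defaultdict


def learn_automaton(experience):
    """Estimate a deterministic transition table from (state, action, next_state) triples.

    Each (state, action) pair is mapped to the successor observed most often,
    which tolerates occasional noisy observations.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in experience:
        counts[(state, action)][next_state] += 1
    return {key: successors.most_common(1)[0][0]
            for key, successors in counts.items()}


# Hypothetical log of abstract states visited while exploring.
log = [
    ("A", "east", "B"),
    ("B", "east", "C"),
    ("B", "west", "A"),
    ("A", "east", "B"),
    ("A", "east", "C"),  # a single noisy observation, outvoted by the two above
]
cognitive_map = learn_automaton(log)
```

The resulting dictionary plays the role of a simple cognitive map: looking up `cognitive_map[("A", "east")]` predicts which abstract state the behavior leads to.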