Hutter, Marcus
Report on the Third Conference on Artificial General Intelligence
Goertzel, Ben (Novamente LLC) | Hutter, Marcus (Australian National University)
The second keynote was by Tecnalia neuroscientist Randal Koene, who also gave a tutorial on the connection between reinforcement learning models in AI and in computational neuroscience. Koene's keynote focused on technologies enabling detailed brain imaging and whole-brain emulation and on the [...]
[...] Future of Humanity Institute on AGI and possible paths to technological singularity. While the community of AGI researchers is nowhere near a consensus on the best approach to the original, grand goal of the AI field, it's clear that the pursuit of the goal is alive and well, and yielding interesting discoveries and discussions.
Reinforcement Learning via AIXI Approximation
Veness, Joel (University of New South Wales and NICTA) | Ng, Kee Siong (Medicare Australia and Australian National University) | Hutter, Marcus (Australian National University and NICTA) | Silver, David (University College London)
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.
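As a rough illustration of the prediction side of this approach: standard Context Tree Weighting uses the Krichevsky-Trofimov (KT) estimator at each context node to predict the next bit from the counts seen so far. The sketch below shows only that building block. The function names are hypothetical, and the agent-specific extension and the Monte Carlo Tree Search planner described in the abstract are not reproduced here.

```python
from fractions import Fraction


def kt_predict(zeros: int, ones: int, next_bit: int) -> Fraction:
    """KT probability that the next bit equals next_bit,
    given that `zeros` zeros and `ones` ones were seen so far."""
    count = ones if next_bit == 1 else zeros
    return (Fraction(count) + Fraction(1, 2)) / (zeros + ones + 1)


def kt_sequence_probability(bits) -> Fraction:
    """Probability the KT estimator assigns to a whole bit sequence,
    obtained by chaining the one-step predictions."""
    zeros = ones = 0
    prob = Fraction(1)
    for b in bits:
        prob *= kt_predict(zeros, ones, b)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return prob


if __name__ == "__main__":
    print(kt_predict(0, 0, 1))              # no data yet: both bits equally likely
    print(kt_sequence_probability([1, 1]))  # repeated bits get more mass than mixed ones
```

Exact fractions are used so that the probabilities of all sequences of a given length can be checked to sum to one. A practical implementation would more likely work with log-probabilities in floating point.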
Discrete MDL Predicts in Total Variation
Hutter, Marcus
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class needs to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.
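The selection rule in the first sentence can be made concrete with a small sketch. It assumes a finite class of Bernoulli models, each with a prior weight, and picks the model that minimizes the two-part code length, that is, the bits needed for the model plus the bits needed for the data given the model. The model class, the weights, and the function names are illustrative assumptions and are not taken from the paper, whose result covers arbitrary countable classes.

```python
import math


def code_length(data, theta, weight):
    """Two-part code length in bits: -log2(weight) for the model plus
    -log2 P(data | theta) for the data under a Bernoulli(theta) model."""
    ones = sum(data)
    zeros = len(data) - ones
    log_likelihood = ones * math.log2(theta) + zeros * math.log2(1 - theta)
    return -math.log2(weight) - log_likelihood


def mdl_select(data, models):
    """Return the parameter theta with the shortest total code length.
    `models` maps theta (strictly between 0 and 1) to its prior weight."""
    return min(models, key=lambda theta: code_length(data, theta, models[theta]))


if __name__ == "__main__":
    # Hypothetical model class: three coin biases with prior weights summing to 1.
    models = {0.25: 0.25, 0.5: 0.5, 0.75: 0.25}
    data = [1, 1, 1, 0, 1, 1, 1, 1]
    print(mdl_select(data, models))
```

With no data the rule falls back on the model with the largest prior weight. As data accumulates, the likelihood term dominates. Minimizing this code length is the same as maximizing prior times likelihood, which is why the abstract mentions MDL and MAP together.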
Open Problems in Universal Induction & Intelligence
Hutter, Marcus
Specialized intelligent systems can be found everywhere: fingerprint, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, etc. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks significant progress and the maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction were laid exactly half a century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. Since there are already good recent surveys, I describe the state of the art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence.
Feature Reinforcement Learning: Part I: Unstructured MDPs
Hutter, Marcus
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II. The role of POMDPs is also considered there.
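To make the phrase "reducing the general agent setup to the MDP framework" concrete, the sketch below applies a feature map to a raw history and estimates MDP transition probabilities from the resulting state sequence. Everything here is an illustrative assumption: the history layout, the map `phi_last_observation`, and the counting estimator are hypothetical, and the sketch does not implement the objective criterion that the article develops for choosing among feature maps.

```python
from collections import Counter, defaultdict


def phi_last_observation(history):
    """Hypothetical feature map: the state is the most recent observation."""
    return history[-1][0]


def estimate_transitions(history, phi):
    """Estimate P(next_state | state, action) by counting.

    `history` is a list of (observation, action, reward) triples, where the
    action in each triple is the one taken after seeing that observation.
    """
    counts = defaultdict(Counter)
    for t in range(1, len(history)):
        state = phi(history[:t])           # state before the action
        action = history[t - 1][1]         # action taken in that state
        next_state = phi(history[:t + 1])  # state after the next observation
        counts[(state, action)][next_state] += 1
    return {
        key: {s: n / sum(c.values()) for s, n in c.items()}
        for key, c in counts.items()
    }


if __name__ == "__main__":
    history = [("A", "go", 0), ("B", "go", 1), ("A", "go", 0),
               ("B", "stay", 1), ("B", "go", 0), ("A", "go", 0)]
    print(estimate_transitions(history, phi_last_observation))
```

A different feature map, for example one that keeps the last two observations, yields a different state space and a different estimated MDP. Comparing such candidates is the role of the formal criterion mentioned in the abstract.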
Temporal Difference Updating without a Learning Rate
Hutter, Marcus | Legg, Shane
Feature Dynamic Bayesian Networks
Hutter, Marcus
Feature Markov Decision Processes (PhiMDPs) are well-suited for learning agents in general environments. Nevertheless, unstructured (Phi)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the "best" DBN representation. I discuss all building blocks required for a complete general learning algorithm.
On the Possibility of Learning in Reactive Environments with Arbitrary Dependence
Ryabko, Daniil | Hutter, Marcus
We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e. environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.