If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
This paper revisits the problem of learning a k-CNF Boolean function from examples, for fixed k, in the context of online learning under the logarithmic loss. We give a Bayesian interpretation to one of Valiant’s classic PAC learning algorithms, which we then build upon to derive three efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods, and show that the cumulative log-loss can be upper bounded by a polynomial function of the size of each example.
Recent developments in reinforcement learning for non-Markovianproblems witness a surge in history-based methods, among which weare particularly interested in two frameworks, PhiMDP and MC-AIXI-CTW. PhiMDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, toan MDP, while MC-AIXI-CTW incrementally learns a mixture of contexttrees as its environment model. The main idea of PhiMDP is toconnect generic reinforcement learning with classical reinforcementlearning. The first implementation of PhiMDP relies on astochastic search procedure for finding a tree that minimizes acertain cost function. This does not guarantee finding theminimizing tree, or even a good one, given limited search time. As aconsequence it appears that the approach has difficulties with largedomains. MC-AIXI-CTW is attractive in that it can incrementally andanalytically compute the internal model through interactions withthe environment. Unfortunately, it is computationally demanding dueto requiring heavy planning simulations at every single time step.We devise a novel approach called CTMRL, which analytically andefficiently finds the cost-minimizing tree. Instead of thecontext-tree weighting method that MC-AIXI-CTW is based on, we usethe closely related context-tree maximizing algorithm that selectsjust one single tree. This approach falls under the PhiMDPframework, which allows the replacement of the costly planningcomponent of MC-AIXI-CTW with simple Q-Learning. Our empiricalinvestigation show that CTMRL finds policies of quality as good as MC-AIXI-CTW's on sixdomains including a challenging Pacman domain, but in an order ofmagnitude less time.
Veness, Joel (University of New South Wales and NICTA) | Ng, Kee Siong (Medicare Australia and Australian National University) | Hutter, Marcus (Australian National University and NICTA) | Silver, David (University College London)
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.