Collaborating Authors

 Hutter, Marcus


Performance Guarantees for Homomorphisms Beyond Markov Decision Processes

arXiv.org Machine Learning

Most real-world problems have huge state and/or action spaces. Therefore, a naive application of existing tabular solution methods is not tractable on such problems. Nonetheless, these solution methods are quite useful if an agent has access to a relatively small state-action space homomorphism of the true environment and near-optimal performance is guaranteed by the map. A plethora of research is focused on the case when the homomorphism is a Markovian representation of the underlying process. However, we show that near-optimal performance is sometimes guaranteed even if the homomorphism is non-Markovian. Moreover, we can aggregate significantly more states by lifting the Markovian requirement without compromising on performance. In this work, we extend the Extreme State Aggregation (ESA) framework to joint state-action aggregations. We also lift ESA's policy uniformity condition for aggregation, which allows even coarser modeling of the true environment.
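
As a rough illustration of the workflow the abstract describes, the sketch below builds a small abstract MDP from a state-action aggregation map, solves it by value iteration, and lifts the resulting policy back to the ground states. The toy MDP, the map `phi`, and the uniform within-block averaging are all made up for illustration; this is not the ESA construction itself.

```python
# Hedged sketch: aggregate a large random MDP, solve the small one,
# lift the policy back. Everything here is illustrative, not ESA.
import numpy as np

n_states, n_actions, gamma = 100, 4, 0.9
rng = np.random.default_rng(0)

# Ground MDP (random, for illustration only).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

# Aggregation map: each ground state gets one of a few abstract states.
n_abs = 10
phi = rng.integers(n_abs, size=n_states)

# Build the abstract MDP by uniform averaging within each block.
P_abs = np.zeros((n_abs, n_actions, n_abs))
R_abs = np.zeros((n_abs, n_actions))
for s_abs in range(n_abs):
    block = np.flatnonzero(phi == s_abs)
    if block.size == 0:
        continue
    for a in range(n_actions):
        to_ground = P[block, a].mean(axis=0)   # averaged next-state dist.
        for t_abs in range(n_abs):
            P_abs[s_abs, a, t_abs] = to_ground[phi == t_abs].sum()
        R_abs[s_abs, a] = R[block, a].mean()

# Value iteration on the small abstract MDP.
Q = np.zeros((n_abs, n_actions))
for _ in range(500):
    Q = R_abs + gamma * P_abs @ Q.max(axis=1)

# Lift: ground state s acts through its abstract image phi[s].
policy = Q.argmax(axis=1)[phi]
print(policy[:10])
```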


AGI Safety Literature Review

arXiv.org Artificial Intelligence

The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.


A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

arXiv.org Artificial Intelligence

Search is a central problem in artificial intelligence, and breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental ways to search. In this paper we derive estimates for average BFS and DFS runtime. The average runtime estimates can be used to allocate resources or judge the hardness of a problem. They can also be used for selecting the best graph representation, and for selecting the faster of BFS and DFS. They may also form the basis for an analysis of more advanced search methods. The paper treats both tree search and graph search. For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis additionally depends on statistics of path redundancy and average branching factor. As an application, we use the results to predict BFS and DFS runtime on two concrete grammar problems and on the N-puzzle. Experimental verification shows that our analytical approximations come close to empirical reality.
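
To make the algorithm-selection question concrete, here is a hedged empirical sketch (not the paper's analytical model): it counts node expansions for BFS and DFS on random trees whose goals sit at a fixed depth, each leaf being a goal with some probability. The branching factor, depth, and goal probability are arbitrary choices.

```python
# Illustrative sketch: compare average BFS and DFS node expansions on
# random trees with goals at a fixed depth. Parameters are made up.
from collections import deque
import random

def search(branching, depth, goal_prob, frontier_pop, seed):
    """Count node expansions until a goal is found or the tree is exhausted.

    Nodes are generated lazily and represented by their depth alone;
    each node at the target depth is a goal independently with
    probability goal_prob. frontier_pop selects BFS (popleft) or
    DFS (pop) behaviour on the shared frontier deque.
    """
    rng = random.Random(seed)
    frontier, expanded = deque([0]), 0
    while frontier:
        d = frontier_pop(frontier)
        expanded += 1
        if d == depth and rng.random() < goal_prob:
            return expanded
        if d < depth:
            frontier.extend([d + 1] * branching)
    return expanded

bfs = lambda seed: search(2, 12, 0.01, deque.popleft, seed)
dfs = lambda seed: search(2, 12, 0.01, deque.pop, seed)

trials = range(20)
print("avg BFS expansions:", sum(map(bfs, trials)) / 20)
print("avg DFS expansions:", sum(map(dfs, trials)) / 20)
```

With goals concentrated at the maximum depth, DFS typically expands far fewer nodes than BFS; regularities of this kind are what the analytical runtime estimates are meant to capture.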


Reinforcement Learning with a Corrupted Reward Channel

arXiv.org Machine Learning

No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called the Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and even when trying to compensate for the possibly corrupt rewards. We investigate two ways around the problem. First, by giving the agent richer data, as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors can sometimes be managed completely. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
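
The second mitigation, randomisation, can be illustrated with a quantiliser-style action rule: rather than always taking the argmax of (possibly corrupted) reward estimates, the agent samples uniformly from the top-q fraction of actions. The sketch below is a minimal toy with made-up estimates and a hypothetical `quantilise` helper, not the paper's formal construction.

```python
# Minimal sketch of randomisation as a hedge against corrupt rewards:
# sample uniformly among the top-q fraction of actions instead of
# always taking the argmax. All values below are made up.
import random

def quantilise(reward_estimates, q=0.2, rng=random.Random(0)):
    """Pick uniformly among the top q-fraction of actions by estimated reward."""
    ranked = sorted(reward_estimates, key=reward_estimates.get, reverse=True)
    k = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:k])

# One action's estimate is corrupted upward; argmax would always pick
# it, while the quantiliser only picks it some of the time.
estimates = {"a": 0.6, "b": 0.55, "c": 0.5, "d": 0.45, "corrupt": 9.9}
picks = [quantilise(estimates, q=0.4) for _ in range(1000)]
print({a: picks.count(a) for a in estimates})
```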


Thompson Sampling is Asymptotically Optimal in General Environments

arXiv.org Artificial Intelligence

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable class of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.
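
The core loop the abstract generalises to history-based environments is easiest to see in the Bernoulli-bandit special case: sample an environment from the posterior, act optimally for that sample, update. A minimal sketch with toy parameters (in the paper's general setting, resampling happens only every effective horizon rather than every step):

```python
# Thompson sampling in the simplest setting (Bernoulli bandit):
# sample a mean for each arm from its Beta posterior, pull the arm
# that is best for the sample, then update the posterior.
import random

true_means = [0.3, 0.5, 0.7]          # unknown to the agent
alpha = [1.0] * 3                      # Beta posterior parameters
beta  = [1.0] * 3
rng = random.Random(0)

for t in range(5000):
    samples = [rng.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = max(range(3), key=samples.__getitem__)
    reward = rng.random() < true_means[arm]   # observe true environment
    alpha[arm] += reward
    beta[arm]  += 1 - reward

print("posterior means:",
      [round(a / (a + b), 3) for a, b in zip(alpha, beta)])
```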


Loss Bounds and Time Complexity for Speed Priors

arXiv.org Machine Learning

This paper establishes for the first time the predictive performance of speed priors and their computational complexity. A speed prior is essentially a probability distribution that puts low probability on strings that are not efficiently computable. We propose a variant of the original speed prior (Schmidhuber, 2002), and show that our prior can predict sequences drawn from probability measures that are estimable in polynomial time. Our speed prior is computable in doubly exponential time, but not in polynomial time. On a polynomial-time computable sequence, our speed prior is computable in exponential time. We show better upper complexity bounds for Schmidhuber's speed prior under the same conditions, and that it predicts deterministic sequences that are computable in polynomial time; however, we also show that it is not computable in polynomial time, and the question of its predictive properties for stochastic sequences remains open.
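
For orientation, Schmidhuber's speed prior is commonly written as follows (a hedged reconstruction of the 2002 definition, not quoted from this paper; the paper's variant modifies it):

```latex
% One common statement of Schmidhuber's (2002) speed prior:
S(x) \;=\; \sum_{i=1}^{\infty} 2^{-i} \sum_{p \,:\, p \to_i x} 2^{-\ell(p)}
% where p \to_i x means that program p outputs a string beginning
% with x within phase i of the FAST algorithm, and \ell(p) is the
% length of p.
```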


Solomonoff Induction Violates Nicod's Criterion

arXiv.org Artificial Intelligence

Nicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black. We show that Solomonoff induction does not satisfy Nicod's criterion: there are time steps in which observing black ravens decreases the belief in H. Moreover, while observing any computable infinite string compatible with H, the belief in H decreases infinitely often when using the unnormalized Solomonoff prior, but only finitely often when using the normalized Solomonoff prior. We argue that the fault is not with Solomonoff induction; instead we should reject Nicod's criterion.
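
Read as a constraint on a Bayesian predictor, the criterion at issue can be written as follows (a hedged paraphrase with $x_{<t}$ for the observations before time $t$, not the paper's exact notation):

```latex
% Nicod's criterion as a monotonicity constraint on the posterior:
P(H \mid x_{<t},\ \text{black raven at } t) \;\ge\; P(H \mid x_{<t})
% The result exhibits time steps t at which the Solomonoff posterior
% violates this inequality, even though the observation is a positive
% instance of H.
```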


On the Computability of Solomonoff Induction and Knowledge-Seeking

arXiv.org Artificial Intelligence

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent.
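
For background, "limit-computable" here carries its standard recursion-theoretic meaning (textbook material, not a result of the paper):

```latex
% f is limit computable iff there is a total computable \varphi with
f(x) \;=\; \lim_{k \to \infty} \varphi(x, k) \quad \text{for all } x,
% i.e., f lies in \Delta^0_2 of the arithmetical hierarchy.
```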


Compress and Control

AAAI Conferences

This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.
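
The conversion the abstract describes, from a density model to a value estimate, can be caricatured as follows: learn a conditional distribution over discretised returns given a state from on-policy data, then read off the value as the model's expected return. In the sketch below a count-based model stands in for a real compressor, and all names and data are hypothetical; this is not the paper's exact estimator.

```python
# Hedged sketch: a density model over (state, discretised return)
# pairs, with the value estimate read off as the expected return.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # state -> return bin -> count

def update(state, discounted_return, bin_width=0.1):
    """Record one on-policy (state, discounted return) observation."""
    g = round(discounted_return / bin_width) * bin_width
    counts[state][g] += 1

def value_estimate(state):
    """Value = expected return under the learned conditional distribution."""
    dist = counts[state]
    total = sum(dist.values())
    return sum(g * n for g, n in dist.items()) / total if total else 0.0

# Toy usage with made-up returns observed from state "s0":
for g in [1.0, 1.1, 0.9, 1.0]:
    update("s0", g)
print(value_estimate("s0"))   # approximately 1.0
```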


Concentration and Confidence for Discrete Bayesian Sequence Predictors

arXiv.org Machine Learning

Bayesian sequence prediction is a simple technique for predicting future symbols sampled from an unknown measure on infinite sequences over a countable alphabet. While strong bounds on the expected cumulative error are known, there are only limited results on the distribution of this error. We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence. We also consider the problem of constructing upper confidence bounds on the KL and Hellinger errors similar to those constructed from Hoeffding-like bounds in the i.i.d. case. The new results are applied to show that Bayesian sequence prediction can be used in the Knows What It Knows (KWIK) framework with bounds that match the state-of-the-art.
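
For context, the classical expectation bound that these high-probability results strengthen is the following standard inequality for Bayes mixtures (standard notation, assumed rather than quoted from the paper):

```latex
% Bayes mixture over a countable class \mathcal{M} with prior weights w_\nu:
\xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu\, \nu(x_{1:n})
% Cumulative expected KL error under the true measure \mu \in \mathcal{M}:
\sum_{t=1}^{\infty} \mathbb{E}_\mu\!\left[ \mathrm{KL}\!\left(
  \mu(\cdot \mid x_{<t}) \,\middle\|\, \xi(\cdot \mid x_{<t})
\right) \right] \;\le\; \ln w_\mu^{-1}
```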