Collaborating Authors

 Orseau, Laurent


Policy-Guided Heuristic Search with Guarantees

arXiv.org Artificial Intelligence

The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relate to both the quality of the heuristic and that of the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time in all three domains tested.
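
The abstract does not spell out the exact PHS evaluation function, so the following is a minimal best-first sketch of the general idea: a priority that combines the path cost g, a heuristic estimate h, and the probability the policy assigns to the path leading to a node. The function names and the priority formula (g + h) / pi are illustrative assumptions, not the paper's definition.

```python
import heapq

def policy_guided_best_first(start, successors, is_goal, h, policy):
    """Illustrative policy-and-heuristic-guided best-first search.

    `successors(state)` yields (child_state, step_cost) pairs,
    `h(state)` estimates remaining cost, and `policy(state)` returns a
    dict mapping child states to probabilities.  States must be hashable.
    The priority (g + h) / pi is an illustrative assumption: nodes that
    are cheap to reach, look promising to the heuristic, and are likely
    under the policy are expanded first.
    """
    # Frontier entries: (priority, tie_breaker, state, g, path_probability).
    frontier = [(h(start), 0, start, 0.0, 1.0)]
    counter = 1
    seen = {start}
    while frontier:
        _, _, state, g, pi = heapq.heappop(frontier)
        if is_goal(state):
            return state, g
        probs = policy(state)
        for child, step_cost in successors(state):
            if child in seen:
                continue
            seen.add(child)
            child_g = g + step_cost
            child_pi = pi * max(probs.get(child, 0.0), 1e-12)
            priority = (child_g + h(child)) / child_pi
            heapq.heappush(frontier, (priority, counter, child, child_g, child_pi))
            counter += 1
    return None, float("inf")
```

Dividing by the path probability means an unlikely-looking node is delayed rather than excluded, which is the intuition behind guarantees that degrade gracefully with the quality of the policy and heuristic.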


Training a First-Order Theorem Prover from Synthetic Data

arXiv.org Artificial Intelligence

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally guided saturation-based prover. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps, and shows significant transfer to the unseen human-written theorems from the TPTP library, where it solves 72% of first-order problems without equality. Most work applying machine learning to theorem proving takes the following approach: 1) pick a dataset of formalized mathematics, such as Mizar or Metamath, or the standard library of a major proof assistant such as HOL-Light or Coq; 2) split the dataset into train and test; 3) use imitation learning or reinforcement learning on the training set to learn a policy; and finally 4) evaluate the policy on the test set (Loos et al. (2017), Bansal et al. (2019), Yang & Deng (2019), Han et al. (2021), Polu & Sutskever (2020)). Such methods are fundamentally limited by the size of the training set, particularly when relying on deep neural networks (Kaplan et al., 2020). Unfortunately, unlike in computer vision and natural language processing, theorem proving datasets are comparatively tiny.
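
The abstract describes the data-generation idea only at a high level; below is a hedged toy sketch of one way synthetic training theorems could be produced, by applying randomly chosen forward inferences to a set of axioms and recording the derivations. The clause representation and the `infer` function are placeholders, not the paper's actual generator.

```python
import random

def generate_synthetic_theorems(axioms, infer, n_theorems, max_steps=5, seed=0):
    """Toy forward-generation of synthetic theorems from axioms.

    `axioms` is a list of formulas and `infer(premises)` returns a new
    formula derived from the given premises (or None if no rule applies).
    Each generated theorem is stored together with its derivation, so a
    prover can later be trained to reconstruct the proof.
    """
    rng = random.Random(seed)
    theorems = []
    for _ in range(n_theorems):
        known = list(axioms)
        derivation = []
        for _ in range(rng.randint(1, max_steps)):
            premises = rng.sample(known, k=min(2, len(known)))
            conclusion = infer(premises)
            if conclusion is None or conclusion in known:
                continue
            known.append(conclusion)
            derivation.append((premises, conclusion))
        if derivation:
            theorems.append({"theorem": derivation[-1][1], "derivation": derivation})
    return theorems
```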


Logarithmic Pruning is All You Need

arXiv.org Machine Learning

The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network. An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network. This latter result, however, relies on a number of strong assumptions and guarantees a polynomial factor on the size of the large network compared to the target function. In this work, we remove the most limiting assumptions of this previous work while providing significantly tighter bounds: the overparameterized network only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork.


Avoiding Side Effects By Considering Future Tasks

arXiv.org Artificial Intelligence

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
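
As a rough illustration of the idea above, the sketch below scores a state by the average achievability of sampled future tasks, counting only tasks that are achievable from the state reached by a baseline policy (e.g., doing nothing). The function names and the simple averaging are assumptions made for illustration, not the paper's exact formulation.

```python
def auxiliary_future_task_reward(state, baseline_state, future_tasks, achievability):
    """Penalize side effects via future-task achievability.

    `achievability(state, task)` returns an estimate in [0, 1] of how well
    `task` could be completed starting from `state`.  Tasks that are not
    achievable from `baseline_state` (reached by a default course of
    action, such as doing nothing) are filtered out, so the agent gains
    no incentive to interfere with events that would have made those
    tasks unachievable anyway.
    """
    relevant = [t for t in future_tasks if achievability(baseline_state, t) > 0.0]
    if not relevant:
        return 0.0
    return sum(achievability(state, t) for t in relevant) / len(relevant)
```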


Iterative Budgeted Exponential Search

arXiv.org Artificial Intelligence

We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound. Existing algorithms that address this problem like B and B' improve this bound to $\Omega(n^2)$. For tree search, IDA* can also require $\Omega(n^2)$ expansions. We describe a new algorithmic framework that iteratively controls an expansion budget and solution cost limit, giving rise to new graph and tree search algorithms for which the number of expansions is $O(n \log C)$, where $C$ is the optimal solution cost. Our experiments show that the new algorithms are robust in scenarios where existing algorithms fail. In the case of tree search, our new algorithms have no overhead over IDA* in scenarios to which IDA* is well suited and can therefore be recommended as a general replacement for IDA*.
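
The abstract states the framework only at a high level; the sketch below shows the general shape of such a loop, in which an expansion budget and a solution-cost limit are grown between iterations. The doubling schedule and the `bounded_search` interface are illustrative assumptions rather than the exact IBEX procedure.

```python
def budgeted_iterative_search(root, bounded_search, initial_budget=1):
    """Illustrative budget-controlled iterative search loop.

    `bounded_search(root, cost_limit, budget)` explores nodes with
    f-value <= cost_limit, expanding at most `budget` nodes, and returns
    (solution, next_cost_limit, expansions), where `solution` is None if
    no goal was found and `next_cost_limit` is the smallest f-value that
    exceeded the current limit (infinity if none did).
    """
    budget = initial_budget
    cost_limit = 0.0
    while True:
        solution, next_limit, expansions = bounded_search(root, cost_limit, budget)
        if solution is not None:
            return solution
        if next_limit == float("inf"):
            return None  # Search space exhausted: no solution exists.
        if expansions < budget:
            # The cost limit, not the budget, was binding: raise the limit.
            cost_limit = next_limit
        else:
            # The budget was exhausted: double it and retry at this limit.
            budget *= 2
```

Growing the budget geometrically keeps the total work across iterations within a constant factor of the final, successful iteration, which is the flavor of guarantee the abstract describes.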


Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

arXiv.org Artificial Intelligence

We introduce and analyze two parameter-free linear-memory tree search algorithms. Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree. Previously, the best guarantee for a linear-memory algorithm under similar assumptions was achieved by IDA*, which in the worst case expands quadratically more nodes than in its last iteration. Empirical results support the theory and demonstrate the practicality and robustness of our algorithms. Furthermore, they are fast and easy to implement.


An investigation of model-free planning

arXiv.org Machine Learning

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree-structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.


Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

arXiv.org Machine Learning

We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time relative to the number of experts in each round. Despite the unbounded nature of the log-loss, we derive a bound that is independent of the largest loss and of the largest gradient, and depends only on the number of experts and the time horizon. Furthermore we give a Bayesian interpretation of Prod and adapt the algorithm to derive a tracking regret.
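
As a concrete illustration of a Prod-style multiplicative update under the log-loss, the sketch below keeps one weight per expert and multiplies it by a factor that mixes the expert's predicted probability with the mixture's own probability, so the factor stays bounded even when an expert assigns a tiny probability. This is a minimal sketch of the update style discussed above; the learning-rate choice and the surrounding regret analysis are not reproduced here.

```python
import numpy as np

def prod_style_update(weights, expert_probs, eta):
    """One Prod-style weight update for prediction under the log-loss.

    `weights` are the current mixture weights (summing to 1),
    `expert_probs` are the probabilities each expert assigned to the
    observed outcome, and `eta` is a learning rate in (0, 1].  Each
    weight is multiplied by (1 - eta + eta * expert_prob / mixture_prob),
    a factor that remains bounded regardless of how small any single
    expert's probability is, so no round can blow up the update.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    mixture_prob = float(np.dot(weights, expert_probs))
    factors = 1.0 - eta + eta * expert_probs / mixture_prob
    return weights * factors  # Still sums to 1 by construction.

# Example: three experts, the second predicted the observed outcome best.
w = np.array([1 / 3, 1 / 3, 1 / 3])
p = np.array([0.1, 0.8, 0.3])
w = prod_style_update(w, p, eta=0.5)
```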


Single-Agent Policy Tree Search With Guarantees

Neural Information Processing Systems

We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to provide an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for ``needle-in-a-haystack'' problems. The second algorithm, which is based on sampling, provides an upper bound on the expected number of nodes to be expanded before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
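
The second, sampling-based algorithm is described only at a high level, so the toy sketch below simply samples trajectories from the policy up to a depth limit until a goal state is reached. The restart and budgeting scheme of the actual algorithm is not reproduced; this is a hedged illustration of policy-guided sampling, not the bound-preserving procedure from the paper.

```python
import random

def policy_sampling_search(start, is_goal, policy, max_depth=100,
                           max_rollouts=10_000, seed=0):
    """Toy sampling-based policy-guided search.

    `policy(state)` returns a list of (child_state, probability) pairs.
    Each rollout follows the policy stochastically; the search stops as
    soon as a sampled trajectory reaches a goal state and returns the
    path taken, or None if no goal was found within the rollout budget.
    """
    rng = random.Random(seed)
    for _ in range(max_rollouts):
        state, path = start, [start]
        for _ in range(max_depth):
            if is_goal(state):
                return path
            children = policy(state)
            if not children:
                break
            states, probs = zip(*children)
            state = rng.choices(states, weights=probs, k=1)[0]
            path.append(state)
        if is_goal(state):
            return path
    return None
```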

