Search
Learning to Search via Self-Imitation
Song, Jialin, Lanka, Ravi, Zhao, Albert, Yue, Yisong, Ono, Masahiro
We study the problem of learning a good search policy. To do so, we propose the self-imitation learning setting, which builds upon imitation learning in two ways. First, self-imitation uses feedback provided by retrospective analysis of demonstrated search traces. Second, the policy can learn from its own decisions and mistakes without requiring repeated feedback from an external expert. Combined, these two properties allow our approach to iteratively scale up to larger problem sizes than the initial problem size for which expert demonstrations were provided.
AI has analyzed every chemical reaction ever performed
According to the journal Nature, chemists are heralding a new artificial intelligence platform as a significant milestone. The platform has the potential to accelerate drug discovery and should make organic chemistry more efficient. It is designed to help chemists plan the syntheses of small organic molecules. Traditionally, chemists use retrosynthesis, an established problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. The goal of retrosynthetic analysis is structural simplification.
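As a toy illustration of recursive retrosynthetic decomposition (not the platform described above), the sketch below searches a small, hand-written table of disconnection rules. A molecule counts as solved if it is purchasable, or if some rule splits it into precursors that are all solved in turn. The molecule names, `RULES`, `PURCHASABLE` and `max_depth` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

# A plan is either a purchasable molecule (leaf) or {molecule: [sub-plans]}.
Plan = Union[str, Dict[str, list]]

# Hypothetical disconnection rules: each molecule maps to alternative
# tuples of simpler precursors. Real systems derive rules from reaction data.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "target": [("intermediate_C",), ("intermediate_A", "reagent_B")],
    "intermediate_A": [("reagent_D", "reagent_E")],
    # "intermediate_C" has no known disconnection, so it is a dead end.
}
PURCHASABLE = {"reagent_B", "reagent_D", "reagent_E"}


def retrosynthesize(molecule: str, depth: int = 0, max_depth: int = 5) -> Optional[Plan]:
    """Recursively transform `molecule` into simpler precursors until every
    leaf is purchasable; return None if no plan exists within `max_depth`."""
    if molecule in PURCHASABLE:
        return molecule  # leaf: buy it
    if depth >= max_depth:
        return None  # give up on overly long routes
    for precursors in RULES.get(molecule, []):
        subplans = [retrosynthesize(p, depth + 1, max_depth) for p in precursors]
        if all(s is not None for s in subplans):
            return {molecule: subplans}
    return None  # every alternative failed: the caller backtracks


print(retrosynthesize("target"))
# {'target': [{'intermediate_A': ['reagent_D', 'reagent_E']}, 'reagent_B']}
```

The first rule for `target` leads to a dead end, so the search backtracks to the second rule, which bottoms out entirely in purchasable reagents.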
How to Implement a Beam Search Decoder for Natural Language Processing - Machine Learning Mastery
Natural language processing tasks, such as caption generation and machine translation, involve generating sequences of words. Models developed for these problems often operate by generating probability distributions across the vocabulary of output words, and it is up to decoding algorithms to search these distributions for the most likely sequences of words. In this tutorial, you will discover the greedy search and beam search decoding algorithms that can be used on text generation problems. In natural language processing tasks such as caption generation, text summarization, and machine translation, the prediction required is a sequence of words.
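Both decoders can be sketched in plain Python. As a simplification, the sketch assumes the model's output is a fixed list of per-step probability distributions (one row per output position, all probabilities strictly positive). In a real captioning or translation model, each step's distribution depends on the words already chosen. Scores are summed log probabilities, which avoids the underflow caused by multiplying many small numbers.

```python
import math
from typing import List, Tuple


def greedy_decoder(probs: List[List[float]]) -> List[int]:
    """Pick the single most likely token at every step."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def beam_search_decoder(probs: List[List[float]], k: int) -> List[Tuple[List[int], float]]:
    """Keep the k best partial sequences (by total log probability) at each step."""
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    for row in probs:
        candidates = []
        for seq, score in beams:
            for token, p in enumerate(row):
                # Extend every surviving beam with every token.
                candidates.append((seq + [token], score + math.log(p)))
        # Sort by score, best first, and prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams


# Three decoding steps over a toy three-word vocabulary.
data = [[0.1, 0.2, 0.7],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6]]
print(greedy_decoder(data))        # [2, 0, 2]
for seq, score in beam_search_decoder(data, 3):
    print(seq, round(score, 3))
```

Because the rows here are independent, greedy and beam search agree on the top sequence. Beam search pays off when each step is conditioned on earlier choices, where a locally weaker word can lead to a better overall sequence.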
Transforming Logistics with Self-Learning AI NVIDIA Blog
One of the longest-running challenges in the logistics industry is finding the shortest routes. First articulated in the 1930s, the "traveling salesman problem" asks for the shortest route that visits each of a group of cities exactly once and returns to the start, ensuring optimal use of time and resources. Karim Beguir, co-founder and CEO of London-based AI startup InstaDeep, told GPU Technology Conference attendees this week that GPU-powered deep learning and reinforcement learning may have the answer. Previous efforts to address the traveling salesman problem include optimization solvers, heuristics and Monte Carlo Tree Search algorithms. But, according to Beguir, these approaches all suffer from the same shortcoming: They don't learn.
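To illustrate the kind of non-learning baselines mentioned above, here is a minimal sketch of two classical approaches. The first is exact brute-force enumeration, which is feasible only for a handful of cities because fixing the start city still leaves (n-1)! orderings. The second is the greedy nearest-neighbor heuristic, which is fast but not guaranteed to be optimal. The city coordinates and function names are illustrative.

```python
import itertools
import math
from typing import List, Sequence, Tuple

City = Tuple[float, float]


def tour_length(cities: Sequence[City], order: Sequence[int]) -> float:
    """Length of the closed tour, including the leg back to the start."""
    n = len(order)
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % n]]) for i in range(n))


def brute_force_tsp(cities: Sequence[City]) -> Tuple[List[int], float]:
    """Exact: try every ordering of cities 1..n-1 with city 0 fixed as the start."""
    tours = ((0,) + perm for perm in itertools.permutations(range(1, len(cities))))
    best = min(tours, key=lambda order: tour_length(cities, order))
    return list(best), tour_length(cities, best)


def nearest_neighbor_tsp(cities: Sequence[City]) -> Tuple[List[int], float]:
    """Heuristic: from the current city, always move to the closest unvisited one."""
    order = [0]
    unvisited = set(range(1, len(cities)))
    while unvisited:
        last = cities[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = min(sorted(unvisited), key=lambda j: math.dist(last, cities[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order, tour_length(cities, order)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(brute_force_tsp(square))       # ([0, 1, 2, 3], 4.0)
print(nearest_neighbor_tsp(square))  # ([0, 1, 2, 3], 4.0)
```

On this tiny square the heuristic happens to find the optimum; on larger instances it can be noticeably worse, while the exact method quickly becomes intractable.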
When Subgraph Isomorphism is Really Hard, and Why This Matters for Graph Databases
McCreesh, Ciaran, Prosser, Patrick, Solnon, Christine, Trimble, James
The subgraph isomorphism problem involves deciding whether a copy of a pattern graph occurs inside a larger target graph. The non-induced version allows extra edges in the target, whilst the induced version does not. Although both variants are NP-complete, algorithms inspired by constraint programming can operate comfortably on many real-world problem instances with thousands of vertices. However, they cannot handle arbitrary instances of this size. We show how to generate "really hard" random instances for subgraph isomorphism problems, which are computationally challenging with a couple of hundred vertices in the target, and only twenty pattern vertices. For the non-induced version of the problem, these instances lie on a satisfiable / unsatisfiable phase transition, whose location we can predict; for the induced variant, much richer behaviour is observed, and constrainedness gives a better measure of difficulty than does proximity to a phase transition. These results have practical consequences: we explain why the widely researched "filter / verify" indexing technique used in graph databases is founded upon a misunderstanding of the empirical hardness of NP-complete problems, and cannot be beneficial when paired with any reasonable subgraph isomorphism algorithm.
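The difference between the non-induced and induced variants can be made concrete with a small backtracking search. This is a deliberately simplified stand-in for the constraint-programming solvers the abstract refers to: apart from a degree check, it does no domain filtering. Graphs are assumed to be undirected and given as adjacency sets, and the function name and representation are illustrative.

```python
from typing import Dict, Optional, Set

Graph = Dict[int, Set[int]]  # undirected: v in g[u] iff u in g[v]


def find_subgraph_iso(pattern: Graph, target: Graph, induced: bool = False) -> Optional[Dict[int, int]]:
    """Return an injective map pattern->target preserving edges (and, if
    `induced`, also non-edges), or None if no such map exists."""
    order = sorted(pattern, key=lambda v: -len(pattern[v]))  # high-degree vertices first
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def consistent(p: int, t: int) -> bool:
        if len(target[t]) < len(pattern[p]):  # t cannot host all of p's neighbours
            return False
        for q, u in mapping.items():
            p_edge, t_edge = q in pattern[p], u in target[t]
            if p_edge and not t_edge:
                return False  # a pattern edge must map to a target edge
            if induced and t_edge and not p_edge:
                return False  # induced: extra target edges are forbidden
        return True

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in target:
            if t not in used and consistent(p, t):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


path = {0: {1}, 1: {0, 2}, 2: {1}}                  # a - b - c
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(find_subgraph_iso(path, triangle))            # some mapping, e.g. {1: 0, 0: 1, 2: 2}
print(find_subgraph_iso(path, triangle, induced=True))  # None
```

A three-vertex path occurs in a triangle as a non-induced subgraph, but not as an induced one, because the triangle has an extra edge between the path's endpoints.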
Minimax Estimation of Quadratic Fourier Functionals
Singh, Shashank, Sriperumbudur, Bharath K., Póczos, Barnabás
We study estimation of (semi-)inner products between two nonparametric probability distributions, given IID samples from each distribution. These products include relatively well-studied classical $\mathcal{L}^2$ and Sobolev inner products, as well as those induced by translation-invariant reproducing kernels, for which we believe our results are the first. We first propose estimators for these quantities, and the induced (semi)norms and (pseudo)metrics. We then prove non-asymptotic upper bounds on their mean squared error, in terms of weights both of the inner product and of the two distributions, in the Fourier basis. Finally, we prove minimax lower bounds that imply rate-optimality of the proposed estimators over Fourier ellipsoids.
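For intuition, here is a sketch of a truncated Fourier-series estimator of the $\mathcal{L}^2$ inner product $\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$ for densities on $[0,1)$. By Parseval's identity, $\langle p, q\rangle = \sum_k \tilde p_k \overline{\tilde q_k}$ with $\tilde p_k = \mathbb{E}_{X\sim p}[e^{-2\pi i k X}]$. Each coefficient is replaced by its sample mean, and the sum is cut off at $|k| \le Z$. Because the two samples are independent, each term is unbiased for its population counterpart. The cutoff `Z` and the optional per-frequency `weight` are assumptions of this sketch; a symmetric weight growing like $|k|^{2s}$ gives a Sobolev-type (semi-)inner product. This is not claimed to be the paper's exact estimator or tuning.

```python
import cmath
import math
import random
from typing import Callable, Sequence


def empirical_fourier_coeff(samples: Sequence[float], k: int) -> complex:
    """Sample mean of exp(-2*pi*i*k*X): estimates the k-th Fourier coefficient of the density."""
    return sum(cmath.exp(-2j * math.pi * k * x) for x in samples) / len(samples)


def inner_product_estimate(xs: Sequence[float], ys: Sequence[float], Z: int,
                           weight: Callable[[int], float] = lambda k: 1.0) -> float:
    """Truncated, weighted Parseval sum over frequencies -Z..Z.
    xs ~ p and ys ~ q must be independent samples supported on [0, 1)."""
    total = 0j
    for k in range(-Z, Z + 1):
        total += weight(k) * empirical_fourier_coeff(xs, k) * empirical_fourier_coeff(ys, k).conjugate()
    # For a weight symmetric in k, the terms at k and -k are complex conjugates,
    # so the imaginary parts cancel (up to rounding).
    return total.real


rng = random.Random(0)
xs = [rng.random() for _ in range(5000)]  # samples from Uniform[0, 1)
ys = [rng.random() for _ in range(5000)]
print(inner_product_estimate(xs, ys, Z=3))  # close to 1, the true <p, q> for two uniform densities
```

For two uniform densities the $k = 0$ term is exactly 1 and every other population coefficient is zero, so the remaining terms contribute only sampling noise.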
Monte Carlo Tree Search - beginners guide - Machine learning blog
For quite a long time, the common opinion in the academic world was that a machine reaching human master level in the game of Go was far from realistic. It was considered a 'holy grail' of AI, a milestone unlikely to be reached within the coming decade. Deep Blue had its moment in chess more than 20 years ago, but since then no Go engine had come close to human masters. The notion of Go's 'numerical chaos' became so well established that it was even referenced in films. Surprisingly, in March 2016 an algorithm developed by Google DeepMind called AlphaGo defeated the Korean world champion 4-1, proving fictional and real-life sceptics wrong. Around a year after that, AlphaGo Zero, the next generation after AlphaGo Lee (the version that beat the Korean master), was reported to beat its predecessor 100-0, putting it far beyond human reach.
The First microRTS Artificial Intelligence Competition
Ontañón, Santiago (Drexel University) | Barriga, Nicolas A. (University of Alberta) | Silva, Cleyton R. (Universidade Federal de Viçosa) | Moraes, Rubens O. (Universidade Federal de Viçosa) | Lelis, Levi H. S. (Universidade Federal de Viçosa)
This article presents the results of the first edition of the microRTS (μRTS) AI competition, which was hosted by the IEEE Computational Intelligence in Games (CIG) 2017 conference. The goal of the competition is to spur research on AI techniques for real-time strategy (RTS) games. In this first edition, the competition received three submissions, focusing on addressing problems such as balancing long-term and short-term search, the use of machine learning to learn how to play against certain opponents, and finally, dealing with partial observability in RTS games.
Revisiting First-Order Convex Optimization Over Linear Spaces
Locatello, Francesco, Raj, Anant, Reddy, Sai Praneeth, Rätsch, Gunnar, Schölkopf, Bernhard, Stich, Sebastian U., Jaggi, Martin
Two popular examples of first-order optimization methods over linear spaces are coordinate descent and matching pursuit algorithms, with their randomized variants. While the former targets the optimization by moving along coordinates, the latter considers a generalized notion of directions. Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives. As a byproduct of our affine invariant analysis of matching pursuit, our rates for steepest coordinate descent are the tightest known. Furthermore, we show the first accelerated convergence rate $\mathcal{O}(1/t^2)$ for matching pursuit on convex objectives.
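The connection between the two methods can be sketched as follows, assuming a smooth objective with a known smoothness constant `L` and a finite set of unit-norm atoms. Matching pursuit moves along the atom most aligned with the gradient, and choosing the standard basis vectors as atoms recovers steepest (Gauss-Southwell) coordinate descent. The quadratic test problem and the function names are illustrative, and the accelerated variant mentioned in the abstract is not shown.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(grad: Callable[[Vector], Vector], x0: Sequence[float],
                     atoms: Sequence[Sequence[float]], L: float, iters: int) -> Vector:
    """Greedy first-order method over the span of `atoms` (assumed unit-norm).
    Each step picks the atom z maximizing |<grad f(x), z>| and sets
    x <- x - (<grad f(x), z> / L) * z, which cannot increase an L-smooth f."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        scores = [dot(g, z) for z in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(scores[i]))  # most aligned atom
        step = scores[j] / L
        x = [xi - step * zi for xi, zi in zip(x, atoms[j])]
    return x


# f(x) = 0.5 * x^T Q x - c^T x with Q = diag(2, 1), c = (2, 1); minimizer (1, 1), L = 2.
Q = [[2.0, 0.0], [0.0, 1.0]]
c = [2.0, 1.0]


def grad_f(x: Vector) -> Vector:
    return [dot(row, x) - ci for row, ci in zip(Q, c)]


basis = [[1.0, 0.0], [0.0, 1.0]]  # coordinate atoms -> steepest coordinate descent
print(matching_pursuit(grad_f, [0.0, 0.0], basis, L=2.0, iters=100))  # approximately [1.0, 1.0]
```

Adding further unit-norm directions to `atoms` (for example the normalized diagonal) turns the same loop into general matching pursuit without changing the update rule.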
Active Reinforcement Learning with Monte-Carlo Tree Search
Schulze, Sebastian, Evans, Owain
Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
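To make the ARL protocol concrete, here is a toy multi-armed Bernoulli bandit in which the agent earns reward on every pull but only sees that reward when it pays `query_cost`. The policy is a naive query-then-commit baseline chosen for brevity; it is not the paper's Monte-Carlo Tree Search algorithm. The function name, its parameters and the arm means are hypothetical.

```python
import random
from typing import List, Tuple


def run_active_bandit(means: List[float], query_cost: float, horizon: int,
                      queries_per_arm: int, seed: int = 0) -> Tuple[float, int]:
    """Toy ARL loop: reward accrues on every pull, but is observed (and the
    cost paid) only on queried pulls. Hypothetical query-then-commit policy.
    Returns (total return net of query costs, number of queries)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # observed pulls per arm
    totals = [0.0] * k  # sum of observed rewards per arm
    total_return, queries = 0.0, 0
    for t in range(horizon):
        if t < k * queries_per_arm:
            arm, query = t % k, True  # exploration: round-robin, pay to observe
        else:
            # Commit to the arm with the best observed average; stop paying.
            arm = max(range(k), key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
            query = False
        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_return += reward  # earned even when it is not observed
        if query:
            total_return -= query_cost
            queries += 1
            counts[arm] += 1
            totals[arm] += reward
    return total_return, queries


print(run_active_bandit([0.0, 1.0], query_cost=0.5, horizon=100, queries_per_arm=5))  # (90.0, 10)
```

Even this baseline exposes the tension the abstract describes: too few paid queries risk committing to the wrong arm, while too many spend reward on feedback. With the deterministic arms in the example (means 0 and 1), the ten paid queries exactly cancel the reward earned during exploration, and each of the remaining 90 pulls earns 1.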