Goto

Collaborating Authors

 Search


AlphaZero Chess: How It Works, What Sets It Apart, and What It Can Tell Us

#artificialintelligence

To those of you who have an interest in chess or who have been monitoring recent developments in artificial intelligence the name "AlphaZero" will be instantly recognisable; its victory over the then-leading chess engine in the world, Stockfish, had revolutionised the way that chess is played by both computers and, indeed, humans. However, if you aren't a chess aficionado or have missed the news a couple of years ago, you might be wondering what exactly this AlphaZero really is, and what makes it worth writing an entire blog post about. For you, I will explain. In short, AlphaZero is a game-playing program that, through a combination of self-play and neural network reinforcement learning (more on that later), is able to learn to play games such as chess and Go from scratch that is, after being fed nothing more than the rules of said games. In fact, a newer derivative of AlphaZero, called MuZero, isn't limited to only board games such as chess, but can also learn to play a range of simple video games from the Atari collection.


AutoSlicer: Scalable Automated Data Slicing for ML Model Analysis

arXiv.org Artificial Intelligence

Automated slicing aims to identify subsets of evaluation data where a trained model performs anomalously. This is an important problem for machine learning pipelines in production since it plays a key role in model debugging and comparison, as well as the diagnosis of fairness issues. Scalability has become a critical requirement for any automated slicing system due to the large search space of possible slices and the growing scale of data. We present AutoSlicer, a scalable system that searches for problematic slices through distributed metric computation and hypothesis testing. We develop an efficient strategy that reduces the search space through pruning and prioritization. In the experiments, we show that our search strategy finds most of the anomalous slices by inspecting a small portion of the search space.


A Simple Baseline for Beam Search Reranking

arXiv.org Artificial Intelligence

Reranking methods in machine translation aim to close the gap between common evaluation metrics (e.g. BLEU) and maximum likelihood learning and decoding algorithms. Prior works address this challenge by training models to rerank beam search candidates according to their predicted BLEU scores, building upon large models pretrained on massive monolingual corpora -- a privilege that was never made available to the baseline translation model. In this work, we examine a simple approach for training rerankers to predict translation candidates' BLEU scores without introducing additional data or parameters. Our approach can be used as a clean baseline, decoupled from external factors, for future research in this area.


Multi-Task Learning for Sparsity Pattern Heterogeneity: A Discrete Optimization Approach

arXiv.org Machine Learning

We extend best-subset selection to linear Multi-Task Learning (MTL), where a set of linear models are jointly trained on a collection of datasets (``tasks''). Allowing the regression coefficients of tasks to have different sparsity patterns (i.e., different supports), we propose a modeling framework for MTL that encourages models to share information across tasks, for a given covariate, through separately 1) shrinking the coefficient supports together, and/or 2) shrinking the coefficient values together. This allows models to borrow strength during variable selection even when the coefficient values differ markedly between tasks. We express our modeling framework as a Mixed-Integer Program, and propose efficient and scalable algorithms based on block coordinate descent and combinatorial local search. We show our estimator achieves statistically optimal prediction rates. Importantly, our theory characterizes how our estimator leverages the shared support information across tasks to achieve better variable selection performance. We evaluate the performance of our method in simulations and two biology applications. Our proposed approaches outperform other sparse MTL methods in variable selection and prediction accuracy. Interestingly, penalties that shrink the supports together often outperform penalties that shrink the coefficient values together. We will release an R package implementing our methods.


MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation

arXiv.org Artificial Intelligence

Prompting large language models has enabled significant recent progress in multi-step reasoning over text. However, when applied to text generation from semi-structured data (e.g., graphs or tables), these methods typically suffer from low semantic coverage, hallucination, and logical inconsistency. We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning. MURMUR is a best-first search method that generates reasoning paths using: (1) neural and symbolic modules with specific linguistic and logical skills, (2) a grammar whose production rules define valid compositions of modules, and (3) value functions that assess the quality of each reasoning step. We conduct experiments on two diverse data-to-text generation tasks like WebNLG and LogicNLG. These tasks differ in their data representations (graphs and tables) and span multiple linguistic and logical skills. MURMUR obtains significant improvements over recent few-shot baselines like direct prompting and chain-of-thought prompting, while also achieving comparable performance to fine-tuned GPT-2 on out-of-domain data. Moreover, human evaluation shows that MURMUR generates highly faithful and correct reasoning paths that lead to 26% more logically consistent summaries on LogicNLG, compared to direct prompting.


Reinforcement Learning to solve Rubik's cube (and other complex problems!)

#artificialintelligence

Half a year has passed since my book "Deep Reinforcement Learning Hands-On" has seen the light. It took me almost a year to write the book and after some time of rest from writing I've discovered that explaining RL methods and turning theoretical papers into working code is a lot of fun for me and I don't want to stop. Luckily, RL domain is evolving, so, there are lots of topics to write about. In mass perception, Deep Reinforcement Learning is a tool to be used mostly for game playing. This is not surprising, given the fact, that historically, the first success in the field was achieved in Atari game suite by Deep Mind in 2015. Atari benchmark suite turned out to be very successful for RL problems and, even now, lots of research papers are using it for demonstrating the efficiency of their methods. As the RL field progresses, the classical 53 Atari games continue to become less and less challenging (at the time of writing more than half of games are solved with super-human accuracy) and researches turn to more complex games, like StarCraft and Dota2. But this bias towards games creates a false impression "RL is about playing games'', which is very far from the truth. In my book, published in June 2018, I've tried to counterbalance this by accompanying Atari games with the examples from other domains, including stock trading (chapter 8), chatbots and NLP problems (chapter 12), web navigation automation (chapter 13), continuous control (chapters 14…16) and boards games (chapter 18). In fact RL having very flexible MDP model potentially could be applied to a wide variety of domains, where computer games is just one convenient and spectacular example of the complicated decision making. In this article I've tried to write a detailed description of the recent attempt to apply RL to a field of combinatorial optimisation. The paper discussed was published by the group of researchers from UCI (University of California, Irvine) and called "Solving the Rubik's Cube Without Human Knowledge''.


Learning to repeatedly solve routing problems

arXiv.org Artificial Intelligence

In the last years, there has been a great interest in machine-learning-based heuristics for solving NP-hard combinatorial optimization problems. The developed methods have shown potential on many optimization problems. In this paper, we present a learned heuristic for the reoptimization of a problem after a minor change in its data. We focus on the case of the capacited vehicle routing problem with static clients (i.e., same client locations) and changed demands. Given the edges of an original solution, the goal is to predict and fix the ones that have a high chance of remaining in an optimal solution after a change of client demands. This partial prediction of the solution reduces the complexity of the problem and speeds up its resolution, while yielding a good quality solution. The proposed approach resulted in solutions with an optimality gap ranging from 0\% to 1.7\% on different benchmark instances within a reasonable computing time.


Materials Discovery using Max K-Armed Bandit

arXiv.org Artificial Intelligence

Search algorithms for the bandit problems are applicable in materials discovery. However, the objectives of the conventional bandit problem are different from those of materials discovery. The conventional bandit problem aims to maximize the total rewards, whereas materials discovery aims to achieve breakthroughs in material properties. The max K-armed bandit (MKB) problem, which aims to acquire the single best reward, matches with the discovery tasks better than the conventional bandit. Thus, here, we propose a search algorithm for materials discovery based on the MKB problem using a pseudo-value of the upper confidence bound of expected improvement of the best reward. This approach is pseudo-guaranteed to be asymptotic oracles that do not depends on the time horizon. In addition, compared with other MKB algorithms, the proposed algorithm has only one hyperparameter, which is advantageous in materials discovery. We applied the proposed algorithm to synthetic problems and molecular-design demonstrations using a Monte Carlo tree search. According to the results, the proposed algorithm stably outperformed other bandit algorithms in the late stage of the search process when the optimal arm of the MKB could not be determined based on its expectation reward.


A comparative study of the performance of different search algorithms on FOON graphs

arXiv.org Artificial Intelligence

A robot finds it really hard to learn creatively and adapt to new unseen challenges. This is mainly because of the minimal information it has access to or experience towards. Paulius et al. [1] presented a way to construct functional graphs that encapsulate. Sakib et al. [2] further expanded FOON objects for robotic cooking. This paper presents a comparative study of Breadth First Search (BFS), Greedy Breadth First search (GBFS) with two heuristic functions, and Iterative Depth First Search (IDFS) and provides a comparison of their performance.


Monte-Carlo Tree-Search for Leveraging Performance of Blackbox Job-Shop Scheduling Heuristics

arXiv.org Artificial Intelligence

In manufacturing, the production is often done on out-of-the-shelf manufacturing lines, whose underlying scheduling heuristics are not known due to the intellectual property. We consider such a setting with a black-box job-shop system and an unknown scheduling heuristic that, for a given permutation of jobs, schedules the jobs for the black-box job-shop with the goal of minimizing the makespan. Here, the jobs need to enter the job-shop in the given order of the permutation, but may take different paths within the job shop, which depends on the black-box heuristic. The performance of the black-box heuristic depends on the order of the jobs, and the natural problem for the manufacturer is to find an optimum ordering of the jobs. Facing a real-world scenario as described above, we engineer the Monte-Carlo tree-search for finding a close-to-optimum ordering of jobs. To cope with a large solutions-space in planning scenarios, a hierarchical Monte-Carlo tree search (H-MCTS) is proposed based on abstraction of jobs. On synthetic and real-life problems, H-MCTS with integrated abstraction significantly outperforms pure heuristic-based techniques as well as other Monte-Carlo search variants. We furthermore show that, by modifying the evaluation metric in H-MCTS, it is possible to achieve other optimization objectives than what the scheduling heuristics are designed for -- e.g., minimizing the total completion time instead of the makespan. Our experimental observations have been also validated in real-life cases, and our H-MCTS approach has been implemented in a production plant's controller.