An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

arXiv.org Machine Learning

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of 'non-degenerate easy' and 'hard' finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants. The power of the generalised machinery is further demonstrated by proving that the minimax regret for $k$-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants.


Minimax Testing of Identity to a Reference Ergodic Markov Chain

arXiv.org Machine Learning

We exhibit an efficient procedure for testing, based on a single long state sequence, whether an unknown Markov chain is identical to or $\varepsilon$-far from a given reference chain. We obtain nearly matching (up to logarithmic factors) upper and lower sample complexity bounds for our notion of distance, which is based on total variation. Perhaps surprisingly, we discover that the sample complexity depends solely on the properties of the known reference chain and does not involve the unknown chain at all, which is not even assumed to be ergodic.


An Optimization Framework for Task Sequencing in Curriculum Learning

arXiv.org Machine Learning

Curriculum learning is gaining popularity in (deep) reinforcement learning. It can alleviate the burden on data collection and provide better exploration policies through transfer and generalization from less complex tasks. Current methods for automatic task sequencing for curriculum learning in reinforcement learning provided initial heuristic solutions, with little to no guarantee on their quality. We introduce an optimization framework for task sequencing composed of the problem definition, several candidate performance metrics for optimization, and three benchmark algorithms. We experimentally show that the two most commonly used baselines (learning with no curriculum, and with a random curriculum) perform worse than a simple greedy algorithm. Furthermore, we show theoretically and demonstrate experimentally that the three proposed algorithms provide increasing solution quality at moderately increasing computational complexity, and show that they constitute better baselines for curriculum learning in reinforcement learning.

Reinforcement Learning (RL) has recently been successfully applied to a number of tasks whose complexity would have appeared overwhelming only a few years ago [1], [2]. In such large and complex environments, classical exploration strategies designed for Markov Decision Processes (MDPs), aiming at visiting every state the most efficiently, are inadequate. One approach actively investigated is the use of transfer learning [3] to generalize from previous similar tasks, and more recently the application of transfer learning to sequences of tasks of increasing complexity forming a curriculum. Curriculum Learning is often employed in (Deep) RL [4], [5] to let the agent progress more quickly towards better behaviors, but curricula are mostly designed by hand. Curriculum learning has the potential to greatly increase the quality of the behavior discovered by the agent. However, at the moment, creating an appropriate curriculum requires significant human intuition.


Distributed Correlation-Based Feature Selection in Spark

arXiv.org Machine Learning

CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each consisting of a large number of instances and two also consisting of a large number of features. The results show that our algorithms were superior in terms of both time-efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms when compared to the original algorithm available in WEKA.
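As a rough illustration of what CFS optimises, the sketch below scores a feature subset with the standard CFS merit heuristic, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), and grows the subset greedily. It is a single-machine toy, not the DiCFS algorithm: the function names, the greedy forward search (WEKA's CFS defaults to best-first search), and the hand-written correlation values are all assumptions made for the example, and nothing here is distributed over Spark.

```python
import math
from itertools import combinations


def merit(subset, r_cf, r_ff):
    """CFS merit of a feature subset.

    r_cf maps feature -> feature-class correlation.
    r_ff maps a sorted (feature, feature) pair -> feature-feature correlation.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    mean_ff = sum(r_ff[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def greedy_cfs(features, r_cf, r_ff):
    """Greedy forward selection (a simplification of CFS's usual search)."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        cand, cand_merit = max(
            ((f, merit(selected + [f], r_cf, r_ff)) for f in remaining),
            key=lambda t: t[1],
        )
        if cand_merit <= best:
            break  # no extension improves the merit
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best


# Hypothetical correlations: "a" and "b" are both predictive but redundant.
r_cf = {"a": 0.8, "b": 0.7, "c": 0.6}
r_ff = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.1}
selected, score = greedy_cfs(["a", "b", "c"], r_cf, r_ff)
print(selected, round(score, 3))
```

On these made-up numbers the search keeps "a" and "c" and drops "b": adding a feature that is highly correlated with one already selected lowers the merit, which is the redundancy penalty the heuristic is designed to apply.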


Learning Position Evaluation Functions Used in Monte Carlo Softmax Search

arXiv.org Artificial Intelligence

This paper makes two proposals for Monte Carlo Softmax Search, which is a recently proposed method that is classified as a selective search like the Monte Carlo Tree Search. The first proposal separately defines the node-selection and backup policies to allow researchers to freely design a node-selection policy based on their searching strategies and conforms the principal variation produced by the Monte Carlo Softmax Search to that produced by a minimax search. The second proposal modifies commonly used learning methods for positional evaluation functions. In our new proposals, evaluation functions are learned by Monte Carlo sampling, which is performed with the backup policy in the search tree produced by Monte Carlo Softmax Search. The learning methods under consideration include supervised learning, reinforcement learning, regression learning, and search bootstrapping. Our sampling-based learning uses not only current positions and principal variations but also the internal nodes and important variations of a search tree. This step reduces the number of games necessary for learning. New learning rules are derived for sampling-based learning based on the Monte Carlo Softmax Search and combinations of the modified learning methods are also proposed in this paper.


Learning to Project in Multi-Objective Binary Linear Programming

arXiv.org Machine Learning

In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ one of the most effective and recently developed criterion space search algorithms, the so-called KSA, during our study. This algorithm computes all nondominated points of a problem with p objectives by searching on a projected criterion space, i.e., a (p-1)-dimensional criterion space. We present an effective and fast learning approach to identify on which projected space the KSA should work. We also present several generic features/variables that can be used in machine learning techniques for identifying the best projected space. Finally, we present an effective bi-objective optimization based heuristic for selecting the best subset of the features to overcome the issue of overfitting in learning. Through an extensive computational study over 2000 instances of tri-objective Knapsack and Assignment problems, we demonstrate that an improvement of up to 12% in time can be achieved by the proposed learning method compared to a random selection of the projected space.


How AlphaZero Works

#artificialintelligence

Recently I posted about the phenomenal performance of the AlphaZero algorithm in computer chess. For the first time in history, an algorithm displayed human-like understanding of chess. AlphaZero seemed to understand what moves were best and spent its time focusing only on them. It didn't mechanically crunch through millions of possible positions, run out of time, and then select the best move. The best moves emerged from its computer neural network, like a human grandmaster. It was given just the rules of chess and nine hours to play itself 44 million games, and then it learned something so deep about chess that it crushed the world champion computer chess program, Stockfish, 155 games to 6. (They played 1,000 total games; at this level most games are draws.)


Tutorial on Monte Carlo Tree Search - The Algorithm Behind AlphaGo

#artificialintelligence

Between 9 and 15 March, 2016, the second-highest ranked Go player, Lee Sedol, took on a computer program named AlphaGo. AlphaGo emphatically outplayed and outclassed Mr. Sedol and won the series 4-1. Designed by Google's DeepMind, the program has spawned many other developments in AI, including AlphaGo Zero. These breakthroughs are widely considered stepping stones towards Artificial General Intelligence (AGI). In this article, I will introduce you to the algorithm at the heart of AlphaGo – Monte Carlo Tree Search (MCTS). This algorithm has one main purpose – given the state of a game, choose the most promising move.
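To make "given the state of a game, choose the most promising move" concrete, here is a minimal sketch of MCTS with UCB1 selection on a toy game, not on Go and not AlphaGo's actual search (which also uses neural networks). The game is an assumption chosen for brevity: a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. The names `Node`, `ucb1`, `rollout` and `best_move` are hypothetical, and `best_move` assumes at least one stone is on the pile.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # stones left for the player about to move
        self.parent = parent
        self.move = move              # move that led from parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0               # wins for the player who moved INTO this node


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """Average win rate plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf
    return child.wins / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits
    )


def rollout(stones, rng):
    """Play uniformly random moves; True if the player to move at `stones` wins."""
    turn = 0
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return turn == 0
        turn ^= 1
    return False  # no stones left: the previous player took the last one


def best_move(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one unexplored child, if any.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        mover_wins = not rollout(node.stones, rng)
        # 4. Backpropagation: alternate the winner's perspective up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    # The most-visited child of the root is the recommended move.
    return max(root.children, key=lambda ch: ch.visits).move


print(best_move(5))
```

In this toy game optimal play leaves the opponent a multiple of four stones, so from 5 stones the search would be expected to favour taking 1, although the move actually returned depends on the iteration budget and the random seed.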


Evolutionary-Neural Hybrid Agents for Architecture Search

arXiv.org Machine Learning

Neural Architecture Search has recently shown potential to automate the design of Neural Networks. The use of Neural Network agents trained with Reinforcement Learning can offer the possibility to learn complex architectural patterns, as well as the ability to explore a vast and compositional search space. On the other hand, evolutionary algorithms offer the sample efficiency needed for such a resource-intensive application. We propose a class of Evolutionary-Neural hybrid agents (Evo-NAS) that retain the qualities of the two approaches. We show that the Evo-NAS agent outperforms both Neural and Evolutionary agents when applied to architecture search for a suite of text classification and image classification benchmarks. On a high-complexity architecture search space for image classification, the Evo-NAS agent surpasses the performance of commonly used agents with only 1/3 of the trials.


Automatic Synthesis of Totally Self-Checking Circuits

arXiv.org Artificial Intelligence

Totally self-checking (TSC) circuits are synthesised with a grid of computers running a distributed population based stochastic optimisation algorithm. The presented method is the first to automatically synthesise TSC circuits from arbitrary logic as all previous methods fail to guarantee the checker is self-testing (ST) for circuits with limited output codespaces. The circuits synthesised by the presented method have significantly lower overhead than the previously reported best for every one of a set of 11 frequently used benchmarks. Average overhead across the entire set is 23% of duplication and comparison overhead, compared with an average of 69% for the previous best reported values across the set. The methodology presented represents a breakthrough in concurrent error detection (CED). The highly efficient, novel designs produced are tailored to each circuit's function, rather than being constrained by a particular modular CED design methodology. Results are synthesised using two-input gates and are TSC with respect to all gate input and output stuck-at faults. The method can be used to add CED with or without modifications to the original logic, and can be generalised to any implementation technology and fault model. An example circuit is analysed and rigorously proven to be TSC.