parallelisation
Training multiple ML models and running data tasks in parallel via YARN Spark multithreading
The objective of this article is to show how a single data scientist can launch dozens or hundreds of data science-related tasks simultaneously (including machine learning model training) without using complex deployment frameworks. In fact, the tasks can be launched from a "data scientist"-friendly interface, namely a single Python script that can be run from an interactive shell such as Jupyter, Spyder or Cloudera Workbench. The tasks can themselves be parallelised in order to handle large amounts of data, which effectively adds a second layer of parallelism. "Data science" and "automation" are two words that invariably go hand-in-hand, as one of the key goals of machine learning is to allow machines to perform tasks more quickly, at lower cost, and/or with better quality than humans. Naturally, it wouldn't make sense for an organization to spend more on the tech staff who develop and maintain systems that automate work (data scientists, data engineers, DevOps engineers, software engineers and others) than on the staff who do the work manually.
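The idea of launching many tasks from one Python script can be sketched with driver-side threads. This is a minimal illustration under stated assumptions, not the article's code: `run_task` and `launch_all` are hypothetical names, and the task body is a plain computation standing in for a job that, in the article's setting, would be submitted to a shared Spark session on YARN.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(name, n):
    """Hypothetical stand-in for one data task (e.g. training one model).

    In a Spark setting this body would trigger a job on a shared session;
    here it only computes a sum of squares so the sketch runs anywhere.
    """
    return name, sum(i * i for i in range(n))


def launch_all(tasks, max_workers=4):
    """Submit every task from its own thread and collect results by task name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_task, name, n) for name, n in tasks]
        # as_completed yields futures in finishing order, not submission order.
        for future in as_completed(futures):
            name, value = future.result()
            results[name] = value
    return results


if __name__ == "__main__":
    tasks = [("model_a", 10), ("model_b", 100), ("model_c", 1000)]
    print(launch_all(tasks))
```

The threads here only coordinate submission; any second layer of parallelism would come from the work each task hands off to the cluster.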
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
Masegosa, Andres R., Martinez, Ana M., Borchani, Hanen
In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data, namely maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.
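Batch-parallel maximum likelihood estimation can be illustrated with a small sketch. It is written in Python rather than Java 8, it does not use the AMIDST API, and it assumes a univariate Gaussian as the model; the function names are hypothetical. Each batch yields sufficient statistics independently, and the partial results are then combined into the estimate.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_stats(batch):
    """Gaussian sufficient statistics for one batch: count, sum, sum of squares."""
    return len(batch), sum(batch), sum(x * x for x in batch)


def split_batches(data, batch_size):
    """Cut the data into consecutive batches (the last one may be shorter)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def gaussian_mle(data, batch_size=4, max_workers=4):
    """Map batches to partial statistics in a worker pool, then reduce them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(batch_stats, split_batches(data, batch_size)))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean  # maximum likelihood (biased) variance
    return mean, variance


if __name__ == "__main__":
    print(gaussian_mle([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))
```

Threads keep the sketch portable. For CPU-bound work in CPython, a process pool would be the more likely choice, because threads do not run Python bytecode in parallel.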
Distribution of UCT and Its Ramifications
Chee, Marc Yu-San (The University of New South Wales)
My thesis is largely focused on the parallelisation of UCT (and other Best-First Search techniques) and the ramifications of doing so. I have identified issues with chunking in UCT, created by some forms of parallelisation, and developed a solution to this involving buffering of simulations that appear “out of order” and reevaluation of propagation data. I have developed a technique for scalable distribution of both tree data and computation across a large scale compute cluster. The context of most of my work is General Game Playing, but the techniques themselves are largely agnostic to domain.
A Multicore Tool for Constraint Solving
Amadini, Roberto (University of Bologna) | Gabbrielli, Maurizio (University of Bologna) | Mauro, Jacopo (University of Bologna)
In Constraint Programming (CP), a portfolio solver uses a variety of different solvers for solving a given Constraint Satisfaction / Optimization Problem. In this paper we introduce sunny-cp2: the first parallel CP portfolio solver that enables a dynamic, cooperative, and simultaneous execution of its solvers in a multicore setting. It incorporates state-of-the-art solvers, providing also a usable and configurable framework. Empirical results are very promising. sunny-cp2 can even outperform the oracle solver, which always selects the best solver of the portfolio for a given problem.
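The general portfolio idea, running several solvers at once and taking the first answer, can be sketched as follows. This is a toy illustration and not sunny-cp2's implementation or interface: the two "solvers" are hypothetical search orders over a trivial constraint (find x with x * x == target), and the only interaction between them is a shared stop flag.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def ascending(target, limit, stop):
    """Toy solver: scan candidates upwards until solved or told to stop."""
    for x in range(limit + 1):
        if stop.is_set():
            return None
        if x * x == target:
            return x
    return None


def descending(target, limit, stop):
    """Toy solver: scan candidates downwards until solved or told to stop."""
    for x in range(limit, -1, -1):
        if stop.is_set():
            return None
        if x * x == target:
            return x
    return None


def run_portfolio(target, limit=100_000):
    """Run all solvers simultaneously; return (solver name, solution) of the first hit."""
    stop = threading.Event()
    solvers = {"ascending": ascending, "descending": descending}
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = {pool.submit(fn, target, limit, stop): name
                   for name, fn in solvers.items()}
        for future in as_completed(futures):
            solution = future.result()
            if solution is not None:
                stop.set()  # ask the remaining solvers to give up early
                return futures[future], solution
    return None, None  # every solver exhausted its search space


if __name__ == "__main__":
    print(run_portfolio(49, limit=1000))
```

Which solver wins depends on the instance and on thread scheduling, which is why a portfolio can beat any single fixed choice on a mixed set of problems.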
Efficient Argumentation for Medical Decision-Making
Craven, Robert (Imperial College London) | Toni, Francesca (Imperial College London) | Cadar, Cristian (Imperial College London) | Hadad, Adrian (Imperial College London) | Williams, Matthew (University College Hospital)
We describe the application of assumption-based argumentation (ABA) to a domain of medical knowledge derived from clinical trials of drugs for breast cancer. We adapt an algorithm for calculating the admissible semantics for ABA frameworks to take account of preferences and describe a prototype implementation which uses variant-based parallel computation to improve the efficiency of query answering.
Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process
Doshi-Velez, Finale, Mohamed, Shakir, Ghahramani, Zoubin, Knowles, David A.
Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, Bayesian inference methods often require high-dimensional averages and can be slow to compute, especially with the potentially unbounded representations associated with nonparametric models. We address the challenge of scaling nonparametric Bayesian inference to the increasingly large datasets found in real-world applications, focusing on the case of parallelising inference in the Indian Buffet Process (IBP). Our approach divides a large data set between multiple processors. The processors use message passing to compute likelihoods in an asynchronous, distributed fashion and to propagate statistics about the global Bayesian posterior. This novel MCMC sampler is the first parallel inference scheme for IBP-based models, scaling to datasets orders of magnitude larger than had previously been possible.