Goto

Collaborating Authors

 parallel optimization




High-Performance Parallel Optimization of the Fish School Behaviour on the Setonix Platform Using OpenMP

arXiv.org Artificial Intelligence

This paper presents an in-depth investigation into the high-performance parallel optimization of the Fish School Behaviour (FSB) algorithm on the Setonix supercomputing platform using the OpenMP framework. Given the increasing demand for enhanced computational capabilities for complex, large-scale calculations across diverse domains, there's an imperative need for optimized parallel algorithms and computing structures. The FSB algorithm, inspired by nature's social behavior patterns, provides an ideal platform for parallelization due to its iterative and computationally intensive nature. This study leverages the capabilities of the Setonix platform and the OpenMP framework to analyze various aspects of multi-threading, such as thread counts, scheduling strategies, and OpenMP constructs, aiming to discern patterns and strategies that can elevate program performance. Experiments were designed to rigorously test different configurations, and our results not only offer insights for parallel optimization of FSB on Setonix but also provide valuable references for other parallel computational research using OpenMP. Looking forward, other factors, such as cache behavior and thread scheduling strategies at micro and macro levels, hold potential for further exploration and optimization.


AutoML for neuromorphic computing and application-driven co-design: asynchronous, massively parallel optimization of spiking architectures

arXiv.org Artificial Intelligence

In this work we have extended AutoML inspired approaches to the exploration and optimization of neuromorphic architectures. Through the integration of a parallel asynchronous model-based search approach with a simulation framework to simulate spiking architectures, we are able to efficiently explore the configuration space of neuromorphic architectures and identify the subset of conditions leading to the highest performance in a targeted application. We have demonstrated this approach on an exemplar case of real time, on-chip learning application. Our results indicate that we can effectively use optimization approaches to optimize complex architectures, therefore providing a viable pathway towards application-driven codesign.


On the Learning Mechanisms in Physical Reasoning

arXiv.org Artificial Intelligence

Is dynamics prediction indispensable for physical reasoning? If so, what kind of roles do the dynamics prediction modules play during the physical reasoning process? Most studies focus on designing dynamics prediction networks and treating physical reasoning as a downstream task without investigating the questions above, taking for granted that the designed dynamics prediction would undoubtedly help the reasoning process. In this work, we take a closer look at this assumption, exploring this fundamental hypothesis by comparing two learning mechanisms: Learning from Dynamics (LfD) and Learning from Intuition (LfI). In the first experiment, we directly examine and compare these two mechanisms. Results show a surprising finding: Simple LfI is better than or on par with state-of-the-art LfD. This observation leads to the second experiment with Ground-truth Dynamics, the ideal case of LfD wherein dynamics are obtained directly from a simulator. Results show that dynamics, if directly given instead of approximated, would achieve much higher performance than LfI alone on physical reasoning; this essentially serves as the performance upper bound. Yet practically, LfD mechanism can only predict Approximate Dynamics using dynamics learning modules that mimic the physical laws, making the following downstream physical reasoning modules degenerate into the LfI paradigm; see the third experiment. We note that this issue is hard to mitigate, as dynamics prediction errors inevitably accumulate in the long horizon. Finally, in the fourth experiment, we note that LfI, the extremely simpler strategy when done right, is more effective in learning to solve physical reasoning problems. Taken together, the results on the challenging benchmark of PHYRE show that LfI is, if not better, as good as LfD for dynamics prediction. However, the potential improvement from LfD, though challenging, remains lucrative.


BoTorch: Programmable Bayesian Optimization in PyTorch

arXiv.org Machine Learning

Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, molecular chemistry, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization. Enabled by Monte-Carlo (MC) acquisition functions and auto-differentiation, BoTorch's modular design facilitates flexible specification and optimization of probabilistic models written in PyTorch, radically simplifying implementation of novel acquisition functions. Our MC approach is made practical by a distinctive algorithmic foundation that leverages fast predictive distributions and hardware acceleration. In experiments, we demonstrate the improved sample efficiency of BoTorch relative to other popular libraries. BoTorch is open source and available at https://github.com/pytorch/botorch.


99% of Parallel Optimization is Inevitably a Waste of Time

arXiv.org Machine Learning

It is well known that many optimization methods, including SGD, SAGA, and Accelerated SGD for over-parameterized models, do not scale linearly in the parallel setting. In this paper, we present a new version of block coordinate descent that solves this issue for a number of methods. The core idea is to make the sampling of coordinate blocks on each parallel unit independent of the others. Surprisingly, we prove that the optimal number of blocks to be updated by each of $n$ units in every iteration is equal to $m/n$, where $m$ is the total number of blocks. As an illustration, this means that when $n=100$ parallel units are used, $99\%$ of work is a waste of time. We demonstrate that with $m/n$ blocks used by each unit the iteration complexity often remains the same. Among other applications which we mention, this fact can be exploited in the setting of distributed optimization to break the communication bottleneck. Our claims are justified by numerical experiments which demonstrate almost a perfect match with our theory on a number of datasets.


Parallel Optimization of Motion Controllers via Policy Iteration

Neural Information Processing Systems

This paper describes a policy iteration algorithm for optimizing the performance of a harmonic function-based controller with respect to a user-defined index. Value functions are represented as potential distributions over the problem domain, being control policies represented as gradient fields over the same domain. All intermediate policies are intrinsically safe, i.e. collisions are not promoted during the adaptation process. The algorithm has efficient implementation in parallel SIMD architectures. One potential application - travel distance minimization - illustrates its usefulness.


Parallel Optimization of Motion Controllers via Policy Iteration

Neural Information Processing Systems

This paper describes a policy iteration algorithm for optimizing the performance of a harmonic function-based controller with respect to a user-defined index. Value functions are represented as potential distributions over the problem domain, being control policies represented as gradient fields over the same domain. All intermediate policies are intrinsically safe, i.e. collisions are not promoted during the adaptation process. The algorithm has efficient implementation in parallel SIMD architectures. One potential application - travel distance minimization - illustrates its usefulness.


Parallel Optimization of Motion Controllers via Policy Iteration

Neural Information Processing Systems

This paper describes a policy iteration algorithm for optimizing the performance of a harmonic function-based controller with respect to a user-defined index. Value functions are represented as potential distributionsover the problem domain, being control policies represented as gradient fields over the same domain. All intermediate policiesare intrinsically safe, i.e. collisions are not promoted during the adaptation process. The algorithm has efficient implementation inparallel SIMD architectures. One potential application - travel distance minimization - illustrates its usefulness.