Evolutionary Systems
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Conti, Edoardo, Madhavan, Vashisht, Such, Felipe Petroski, Lehman, Joel, Stanley, Kenneth, Clune, Jeff
Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
Space Expansion of Feature Selection for Designing more Accurate Error Predictors
Nikkhah, Shayan Tabatabaei, Kamal, Mehdi, Afzali-Kusha, Ali, Pedram, Massoud
Approximate computing is being considered as a promising design paradigm to overcome the energy and performance challenges in computationally demanding applications. If the case where the accuracy can be configured, the quality level versus energy efficiency or delay also may be traded-off. For this technique to be used, one needs to make sure a satisfactory user experience. This requires employing error predictors to detect unacceptable approximation errors. In this work, we propose a scheduling-aware feature selection method which leverages the intermediate results of the hardware accelerator to improve the prediction accuracy. Additionally, it configures the error predictors according to the energy consumption and latency of the system. The approach enjoys the flexibility of the prediction time for a higher accuracy. The results on various benchmarks demonstrate significant improvements in the prediction accuracy compared to the prior works which used only the accelerator inputs for the prediction.
Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies
Shen, Yiming, Yang, Kehan, Yuan, Yufeng, Liu, Simon Cheng
In this paper, we propose a novel meta-learning method in a reinforcement learning setting, based on evolution strategies (ES), exploration in parameter space and deterministic policy gradients. ES methods are easy to parallelize, which is desirable for modern training architectures; however, such methods typically require a huge number of samples for effective training. We use deterministic policy gradients during adaptation and other techniques to compensate for the sample-efficiency problem while maintaining the inherent scalability of ES methods. We demonstrate that our method achieves good results compared to gradient-based meta-learning in high-dimensional control tasks in the MuJoCo simulator. In addition, because of gradient-free methods in the meta-training phase, which do not need information about gradients and policies in adaptation training, we predict and confirm our algorithm performs better in tasks that need multi-step adaptation.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis
Zyarah, Abdullah M., Kudithipudi, Dhireesha
Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatiotemporal inputs. This paper presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the proposed architecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutilized crossbar regions and supports rapid on-chip training within 2 clock cycles. This research also leverages plasticity mechanisms such as neurogenesis and homeostatic intrinsic plasticity to strengthen the robustness and performance of the SP. The proposed design is benchmarked for image recognition tasks using MNIST and Yale faces datasets, and is evaluated using different metrics including entropy, sparseness, and noise robustness. Detailed power analysis at different stages of the SP operations is performed to demonstrate the suitability for mobile platforms.
A Hybrid Genetic Algorithm for the Traveling Salesman Problem with Drone
Ha, Quang Minh, Deville, Yves, Pham, Quang Dung, Hร , Minh Hoร ng
This paper addresses the Traveling Salesman Problem with Drone (TSP-D), in which a truck and drone are used to deliver parcels to customers. The objective of this problem is to either minimize the total operational cost (min-cost TSP-D) or minimize the completion time for the truck and drone (min-time TSP-D). This problem has gained a lot of attention in the last few years since it is matched with the recent trends in a new delivery method among logistics companies. To solve the TSP-D, we propose a hybrid genetic search with dynamic population management and adaptive diversity control based on a split algorithm, problem-tailored crossover and local search operators, a new restore method to advance the convergence and an adaptive penalization mechanism to dynamically balance the search between feasible/infeasible solutions. The computational results show that the proposed algorithm outperforms existing methods in terms of solution quality and improves best known solutions found in the literature. Moreover, various analyses on the impacts of crossover choice and heuristic components have been conducted to analysis further their sensitivity to the performance of our method.
Autonomous Tools and Design
Designers increasingly leverage autonomous software tools that make decisions independent of the designer. Examples abound in virtually every design field. For example, semiconductor chip designers use tools that make decisions about placement and logic checking. Game designers rely on software that generates initial drafts of virtual worlds. Autonomous tools employ artificial intelligence methods, including machine learning, pattern recognition, meta-heuristics, and evolutionary algorithms to generate design artifacts beyond any human's capabilities. A naรฏve view suggests these tools will someday replace human designers in the design process. An alternative perspective is that humans will continue to play an important role but also that this role is changing.
Fast Exact Computation of Expected HyperVolume Improvement
Zhao, Guang, Arroyave, Raymundo, Qian, Xiaoning
In multi-objective Bayesian optimization and surrogate-based evolutionary algorithms, Expected HyperVolume Improvement (EHVI) is widely used as the acquisition function to guide the search approaching the Pareto front. This paper focuses on the exact calculation of EHVI given a nondominated set, for which the existing exact algorithms are complex and can be inefficient for problems with more than three objectives. Integrating with different decomposition algorithms, we propose a new method for calculating the integral in each decomposed high-dimensional box in constant time. We develop three new exact EHVI calculation algorithms based on three region decomposition methods. The first grid-based algorithm has a complexity of $O(m\cdot n^m)$ with $n$ denoting the size of the nondominated set and $m$ the number of objectives. The Walking Fish Group (WFG)-based algorithm has a worst-case complexity of $O(m\cdot 2^n)$ but has a better average performance. These two can be applied for problems with any $m$. The third CLM-based algorithm is only for $m=3$ and asymptotically optimal with complexity $\Theta(n\log{n})$. Performance comparison results show that all our three algorithms are at least twice faster than the state-of-the-art algorithms with the same decomposition methods. When $m>3$, our WFG-based algorithm can be over $10^2$ faster than the corresponding existing algorithms. Our algorithm is demonstrated in an example involving efficient multi-objective material design with Bayesian optimization.
Online Decisioning Meta-Heuristic Framework for Large Scale Black-Box Optimization
Zhao, Mingde, Ge, Hongwei, Lian, Yi, Chen, C. L. Philip
Out of practical concerns and with the expectation to achieve high overall efficiency of the resource utilization, this paper transforms the large scale black-box optimization problems with limited resources into online decision problems from the perspective of dynamic multi-armed bandits, a simplified view of Markov decision processes. The proposed Online Decisioning Meta-heuristic framework (ODM) is particularly well suited for real-world applications, with flexible compatibility for various kinds of costs, interfaces for easy heuristic articulation as well as fewer hyper-parameters for less variance in performance. Experimental results on benchmark functions suggest that ODM has demonstrated significant capabilities for online decisioning. Furthermore, when ODM is articulated with three heuristics, competitive performance can be achieved on benchmark problems with search dimensions up to 10000.
A near Pareto optimal approach to student-supervisor allocation with two sided preferences and workload balance
Sanchez-Anguix, Victor, Chalumuri, Rithin, Aydogan, Reyhan, Julian, Vicente
Students are usually allocated tosupervisors for their projects by means of a centralized human decision maker or by means of interactions between students and staff members. The decision makers have to take into consideration the preferences of both students and supervisors with respect to the conduct of the project, as well as departmental constraintssuch as minimum and maximum levels of workload (in terms of supervision) for each supervisor. This situation results in an extremely time consuming process, and a suboptimal allocation due to a large and complex search space faced by human decision makers. Automating this process by applying artificial intelligence techniques may enhance the process in terms of satisfaction and performance of students with these individual projects. In this article, we present a genetic algorithm for matching students to supervisors accordingto both students' and supervisors' preferences and the constraints of the department. The rationale behind this problem is matching an appropriate student with a supervisor for the development of an individual project.The problem of matching students to supervisors, or students to projects [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], is a subclass of the wider problem of matching between two sets, one of the most studied fields in computer sciencedue to its applications to a wide range of domains such as the hospital/residents (HR) or the college admission (CA) problem [14, 15, 16]. Particularly, the student-supervisor allocation problem solved in this article can be considered as an instance of the CA problem with lower and upper quotas, where the colleges are the supervisors, both colleges and students (i.e., supervisors andstudents in our case) have some representation of preferences on each other for the conduct of a project, and the minimum and maximum quotas are the minimum and maximum number of students to be supervised by staff members.