Search
Monte-Carlo Tree Search as Regularized Policy Optimization
Grill, Jean-Bastien, Altché, Florent, Tang, Yunhao, Hubert, Thomas, Valko, Michal, Antonoglou, Ioannis, Munos, Rémi
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
Chen, Xin, Duan, Yawen, Chen, Zewei, Xu, Hang, Chen, Zihao, Liang, Xiaodan, Zhang, Tong, Li, Zhenguo
Neural Architecture Search (NAS) achieved many breakthroughs in recent years. In spite of its remarkable progress, many algorithms are restricted to particular search spaces. They also lack efficient mechanisms to reuse knowledge when confronting multiple tasks. These challenges preclude their applicability, and motivate our proposal of CATCH, a novel Context-bAsed meTa reinforcement learning (RL) algorithm for transferrable arChitecture searcH. The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces. CATCH utilizes a probabilistic encoder to encode task properties into latent context variables, which then guide CATCH's controller to quickly "catch" top-performing networks. The contexts also assist a network evaluator in filtering inferior candidates and speed up learning. Extensive experiments demonstrate CATCH's universality and search efficiency over many other widely-recognized algorithms. It is also capable of handling cross-domain architecture search as competitive networks on ImageNet, COCO, and Cityscapes are identified. This is the first work to our knowledge that proposes an efficient transferrable NAS solution while maintaining robustness across various settings.
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
Morris, John X., Lifland, Eli, Yoo, Jin Yong, Grigsby, Jake, Jin, Di, Qi, Yanjun
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack's modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks. TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness. TextAttack is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset, with just a few lines of code. Code and tutorials are available at https://github.com/QData/TextAttack.
What is important about the No Free Lunch theorems?
The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises by using them to analyze scenarios involving {non-uniform} distributions, and to compare different algorithms, without any assumption about the distribution over problems at all. In particular, the theorems prove that {anti}-cross-validation (choosing among a set of candidate algorithms based on which has {worst} out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption -- which has never been formalized -- about how the distribution over induction problems, on the one hand, is related to the set of algorithms one is choosing among using (anti-)cross validation, on the other. In addition, they establish strong caveats concerning the significance of the many results in the literature which establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a ``dictionary'' between supervised learning and improve blackbox optimization, which allows one to ``translate'' techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I also briefly discuss their implications for philosophy of science.
A Hierarchical Approach to Scaling Batch Active Search Over Structured Data
Myers, Vivek, Greenside, Peyton
Active search is the process of identifying high-value data points in a large and often high-dimensional parameter space that can be expensive to evaluate. Traditional active search techniques like Bayesian optimization trade off exploration and exploitation over consecutive evaluations, and have historically focused on single or small (<5) numbers of examples evaluated per round. As modern data sets grow, so does the need to scale active search to large data sets and batch sizes. In this paper, we present a general hierarchical framework based on bandit algorithms to scale active search to large batch sizes by maximizing information derived from the unique structure of each dataset. Our hierarchical framework, Hierarchical Batch Bandit Search (HBBS), strategically distributes batch selection across a learned embedding space by facilitating wide exploration of different structural elements within a dataset. We focus our application of HBBS on modern biology, where large batch experimentation is often fundamental to the research process, and demonstrate batch design of biological sequences (protein and DNA). We also present a new Gym environment to easily simulate diverse biological sequences and to enable more comprehensive evaluation of active search methods across heterogeneous data sets. The HBBS framework improves upon standard performance, wall-clock, and scalability benchmarks for batch search by using a broad exploration strategy across coarse partitions and fine-grained exploitation within each partition of structured data.
DeepCO: Offline Combinatorial Optimization Framework Utilizing Deep Learning
Combinatorial optimization serves as an essential part in many modern industrial applications. A great number of the problems are offline setting due to safety and/or cost issues. While simulation-based approaches appear difficult to realise for complicated systems, in this research, we propose DeepCO, an offline combinatorial optimization framework utilizing deep learning. We also design an offline variation of Travelling Salesman Problem (TSP) to model warehouse operation sequence optimization problem for evaluation. With only limited historical data, novel proposed distribution regularized optimization method outperforms existing baseline method in offline TSP experiment reducing route length by 5.7% averagely and shows great potential in real world problems.
Minimax Policy for Heavy-tailed Multi-armed Bandits
In these contexts, exploration means learning In the worst-case setting, the lower and upper bounds the environment while exploitation means taking empirically are distribution-free. Assuming the rewards are bounded, computed best actions. When finite time performance is concerned, Audibert and Bubeck [6] establish a Ω( KT) lower bound i.e., scenarios in which one cannot learn indefinitely, on the minimax regret. They also studied a modified UCB ensuring a good balance of exploration and exploitation is algorithm called Minimax Optimal Strategy in the Stochastic the key to a good performance. MAB and its variations are case (MOSS) and proved that it achieves an order-optimal prototypical models for these problems, and they are widely worst-case regret while maintaining a logarithm distributiondependent used in many areas such as network routing, recommendation regret. Degenne and Perchet [7] extend MOSS to systems and resource allocation; see [1, Chapter 1]. an anytime version called MOSSanytime. The stochastic MAB problem was originally proposed by The rewards being bounded or sub-Gaussian is a common Robbins [2]. In this problem, at each time, an agent chooses assumption that gives sample mean an exponential an arm from a set of K arms and receives the associated convergence and simplifies the MAB problem.
A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks
Aggarwal, Divyam, Saxena, Dhish Kumar, Bäck, Thomas, Emmerich, Michael
For an airline, the crew operating cost is second only to the fuel cost, making the crew pairing optimization (CPO) critical for business viability. Its aim is to generate a set of flight sequences (crew pairings) that cover all flights in an airline's schedule, at minimum cost, while satisfying several legality constraints. Being an NP-hard combinatorial optimization problem, CPO is tackled by relaxing the underlying Integer Programming Problem into a Linear Programming Problem, and solving the latter through Column generation (CG) technique. However, with the expansion of airlines' operations lately, the curse of dimensionality renders the exact CG-implementations obsolete, paving the way for heuristic-based CG-implementations. Yet, the much prevalent large-scale complex flight networks involving multiple-crew bases and hub-and-spoke sub-networks, largely remain unaddressed. To bridge the research gap, this paper proposes a novel CG heuristic, which has enabled in-house development of an Airline Crew Pairing Optimizer (AirCROP). The efficacy of the heuristic/AirCROP has been: (a) tested on real-world airline data with an unprecedented conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, and over a billion pairings, and (b) validated by the research consortium's industrial sponsor. This paper has a dedicated focus on the proposed CG heuristic which constitutes the core search mechanism of the optimizer, by balancing random exploration (of pairings' space), exploitation of domain knowledge (on optimal solution's features), and utilization of the past computational effort through archiving. Though this paper has an airline context, the underlying propositions may find applications across different domains as the proposed CG heuristic can serve as a template on how to utilize domain knowledge to better tackle large-scale combinatorial optimization problems.
How Can Smart Data Improve Search-Based Analytics? - DZone Big Data
Any business that is able to utilize its data effectively has shown great promise to survive diverse conditions and adapt to the growing market -- even in the Covid-19 pandemic. To ensure that business can be as flexible and adaptable as possible, they need to use Search-Based Analytics. Search-Based Analytics changed how business intelligence works. It gave businesses the necessary way to understand their data with the help of dashboards and visualization. In short, it provides the company's best minds to make data-driven decisions to reach company goals.
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Khadka, Shauharda, Aflalo, Estelle, Marder, Mattias, Ben-David, Avrech, Miret, Santiago, Tang, Hanlin, Mannor, Shie, Hazan, Tamir, Majumdar, Somdeb
As modern neural networks have grown to billions of parameters, meeting tight latency budgets has become increasingly challenging. Approaches like compression, sparsification and network pruning have proven effective to tackle this problem - but they rely on modifications of the underlying network. In this paper, we look at a complimentary approach of optimizing how tensors are mapped to on-chip memory in an inference accelerator while leaving the network parameters untouched. Since different memory components trade off capacity for bandwidth differently, a sub-optimal mapping can result in high latency. We introduce evolutionary graph reinforcement learning (EGRL) - a method combining graph neural networks, reinforcement learning (RL) and evolutionary search - that aims to find the optimal mapping to minimize latency. Furthermore, a set of fast, stateless policies guide the evolutionary search to improve sample-efficiency. We train and validate our approach directly on the Intel NNP-I chip for inference using a batch size of 1. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.