Search
Demystifying Map Space Exploration for NPUs
Kao, Sheng-Chun, Parashar, Angshuman, Tsai, Po-An, Krishna, Tushar
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator. It is known to be extremely computationally expensive, and there has been active research looking at both heuristics and learning-based methods to make the problem computationally tractable. However, while there are dozens of mappers out there (all empirically claiming to find better mappings than others), the research community lacks systematic insights on how different search techniques navigate the map-space and how different mapping axes contribute to the accelerator's performance and efficiency. Such insights are crucial to developing mapping frameworks for emerging DNNs that are increasingly irregular (due to neural architecture search) and sparse, making the corresponding map spaces much more complex. In this work, rather than proposing yet another mapper, we do a first-of-its-kind apples-to-apples comparison of search techniques leveraged by different mappers. Next, we extract the learnings from our study and propose two new techniques that can augment existing mappers -- warm-start and sparsity-aware -- that demonstrate speedups, scalability, and robustness across diverse DNN models.
Neural Topological Ordering for Computation Graphs
Gagrani, Mukul, Rainone, Corrado, Yang, Yang, Teague, Harris, Jeon, Wonseok, Van Hoof, Herke, Zeng, Weiliang Will, Zappi, Piero, Lott, Christopher, Bondesan, Roberto
Recent works on machine learning for combinatorial optimization have shown that learning based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph with focus on the memory minimization problem which arises in compilers. We propose an end-to-end machine learning based approach for topological ordering using an encoder-decoder framework. Our encoder is a novel attention based graph neural network architecture called \emph{Topoformer} which uses different topological transforms of a DAG for message passing. The node embeddings produced by the encoder are converted into node priorities which are used by the decoder to generate a probability distribution over topological orders. We train our model on a dataset of synthetically generated graphs called layered graphs. We show that our model outperforms, or is on-par, with several topological ordering baselines while being significantly faster on synthetic graphs with up to 2k nodes. We also train and test our model on a set of real-world computation graphs, showing performance improvements.
Large-Scale Differentiable Causal Discovery of Factor Graphs
Lopez, Romain, Hütter, Jan-Christian, Pritchard, Jonathan K., Regev, Aviv
A common theme in causal inference is learning causal relationships between observed variables, also known as causal discovery. This is usually a daunting task, given the large number of candidate causal graphs and the combinatorial nature of the search space. Perhaps for this reason, most research has so far focused on relatively small causal graphs, with up to hundreds of nodes. However, recent advances in fields like biology enable generating experimental data sets with thousands of interventions followed by rich profiling of thousands of variables, raising the opportunity and urgent need for large causal graph models. Here, we introduce the notion of factor directed acyclic graphs (f-DAGs) as a way to restrict the search space to non-linear low-rank causal interaction models. Combining this novel structural assumption with recent advances that bridge the gap between causal discovery and continuous optimization, we achieve causal discovery on thousands of variables. Additionally, as a model for the impact of statistical noise on this estimation procedure, we study a model of edge perturbations of the f-DAG skeleton based on random graphs and quantify the effect of such perturbations on the f-DAG rank. This theoretical analysis suggests that the set of candidate f-DAGs is much smaller than the whole DAG space and thus may be more suitable as a search space in the high-dimensional regime where the underlying skeleton is hard to assess. We propose Differentiable Causal Discovery of Factor Graphs (DCD-FG), a scalable implementation of -DAG constrained causal discovery for high-dimensional interventional data. DCD-FG uses a Gaussian non-linear low-rank structural equation model and shows significant improvements compared to state-of-the-art methods in both simulations as well as a recent large-scale single-cell RNA sequencing data set with hundreds of genetic interventions.
General Univariate Estimation-of-Distribution Algorithms
We propose a general formulation of a univariate estimation-of-distribution algorithm (EDA). It naturally incorporates the three classic univariate EDAs \emph{compact genetic algorithm}, \emph{univariate marginal distribution algorithm} and \emph{population-based incremental learning} as well as the \emph{max-min ant system} with iteration-best update. Our unified description of the existing algorithms allows a unified analysis of these; we demonstrate this by providing an analysis of genetic drift that immediately gives the existing results proven separately for the four algorithms named above. Our general model also includes EDAs that are more efficient than the existing ones and these may not be difficult to find as we demonstrate for the OneMax and LeadingOnes benchmarks.
Cost Aware Asynchronous Multi-Agent Active Search
Banerjee, Arundhati, Ghods, Ramina, Schneider, Jeff
Multi-agent active search requires autonomous agents to choose sensing actions that efficiently locate targets. In a realistic setting, agents also must consider the costs that their decisions incur. Previously proposed active search algorithms simplify the problem by ignoring uncertainty in the agent's environment, using myopic decision making, and/or overlooking costs. In this paper, we introduce an online active search algorithm to detect targets in an unknown environment by making adaptive cost-aware decisions regarding the agent's actions. Our algorithm combines principles from Thompson Sampling (for search space exploration and decentralized multi-agent decision making), Monte Carlo Tree Search (for long horizon planning) and pareto-optimal confidence bounds (for multi-objective optimization in an unknown environment) to propose an online lookahead planner that removes all the simplifications. We analyze the algorithm's performance in simulation to show its efficacy in cost aware active search.
Experiments in Underwater Feature Tracking with Performance Guarantees Using a Small AUV
Biggs, Benjamin, He, Hans, McMahon, James, Stilwell, Daniel J.
We present the results of experiments performed using a small autonomous underwater vehicle to determine the location of an isobath within a bounded area. The primary contribution of this work is to implement and integrate several recent developments real-time planning for environmental mapping, and to demonstrate their utility in a challenging practical example. We model the bathymetry within the operational area using a Gaussian process and propose a reward function that represents the task of mapping a desired isobath. As is common in applications where plans must be continually updated based on real-time sensor measurements, we adopt a receding horizon framework where the vehicle continually computes near-optimal paths. The sequence of paths does not, in general, inherit the optimality properties of each individual path. Our real-time planning implementation incorporates recent results that lead to performance guarantees for receding-horizon planning.
Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features
Gupta, Deepak, Loane, Russell, Gayen, Soumya, Demner-Fushman, Dina
Nearest neighbor search (NNS) aims to locate the points in high-dimensional space that is closest to the query point. The brute-force approach for finding the nearest neighbor becomes computationally infeasible when the number of points is large. The NNS has multiple applications in medicine, such as searching large medical imaging databases, disease classification, diagnosis, etc. With a focus on medical imaging, this paper proposes DenseLinkSearch an effective and efficient algorithm that searches and retrieves the relevant images from heterogeneous sources of medical images. Towards this, given a medical database, the proposed algorithm builds the index that consists of pre-computed links of each point in the database. The search algorithm utilizes the index to efficiently traverse the database in search of the nearest neighbor. We extensively tested the proposed NNS approach and compared the performance with state-of-the-art NNS approaches on benchmark datasets and our created medical image datasets. The proposed approach outperformed the existing approach in terms of retrieving accurate neighbors and retrieval speed. We also explore the role of medical image feature representation in content-based medical image retrieval tasks. We propose a Transformer-based feature representation technique that outperformed the existing pre-trained Transformer approach on CLEF 2011 medical image retrieval task. The source code of our experiments are available at https://github.com/deepaknlp/DLS.
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Shah, Aashaka, Chidambaram, Vijay, Cowan, Meghan, Maleki, Saeed, Musuvathi, Madan, Mytkowicz, Todd, Nelson, Jacob, Saarikivi, Olli, Singh, Rachee
Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the Nvidia Collective Communication Library (NCCL) by up to 6.7x. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11%--2.3x for different batch sizes.
Automaton-Guided Control Synthesis for Signal Temporal Logic Specifications
Ho, Qi Heng, Ilyes, Roland B., Sunberg, Zachary N., Lahijanian, Morteza
This paper presents an algorithmic framework for control synthesis of continuous dynamical systems subject to signal temporal logic (STL) specifications. We propose a novel algorithm to obtain a time-partitioned finite automaton from an STL specification, and introduce a multi-layered framework that utilizes this automaton to guide a sampling-based search tree both spatially and temporally. Our approach is able to synthesize a controller for nonlinear dynamics and polynomial predicate functions. We prove the correctness and probabilistic completeness of our algorithm, and illustrate the efficiency and efficacy of our framework on several case studies. Our results show an order of magnitude speedup over the state of the art.
Near-Optimal Algorithms for Making the Gradient Small in Stochastic Minimax Optimization
We study the problem of finding a near-stationary point for smooth minimax optimization. The recent proposed extra anchored gradient (EAG) methods achieve the optimal convergence rate for the convex-concave minimax problem in deterministic setting. However, the direct extension of EAG to stochastic optimization is not efficient.In this paper, we design a novel stochastic algorithm called Recursive Anchored IteratioN (RAIN). We show that the RAIN achieves near-optimal stochastic first-order oracle (SFO) complexity for stochastic minimax optimization in both convex-concave and strongly-convex-strongly-concave cases. In addition, we extend the idea of RAIN to solve structured nonconvex-nonconcave minimax problem and it also achieves near-optimal SFO complexity.