Search
Lookback for Learning to Branch
Gupta, Prateek, Khalil, Elias B., Chetélat, Didier, Gasse, Maxime, Bengio, Yoshua, Lodi, Andrea, Kumar, M. Pawan
The expressive and computationally inexpensive bipartite Graph Neural Networks (GNN) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models. Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to 22% decrease in the size of the B&B tree and up to 15% improvement in the solving times.
The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data
Hu, Addison J., Green, Alden, Tibshirani, Ryan J.
We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
HUSP-SP: Faster Utility Mining on Sequence Data
Zhang, Chunkai, Yang, Yuting, Du, Zilin, Gan, Wensheng, Yu, Philip S.
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
The LM-Cut Heuristic Family for Optimal Numeric Planning with Simple Conditions
Kuroiwa, Ryo (University of Toronto) | Shleyfman, Alexander (Technion - Israel Institute of Technology) | Piacentini, Chiara (Augmenta Inc) | Castro, Margarita P. (Pontificia Universidad Católica de Chile) | Beck, J. Christopher (University of Toronto)
The LM-cut heuristic, both alone and as part of the operator counting framework, represents one of the most successful heuristics for classical planning. In this paper, we generalize LM-cut and its use in operator counting to optimal numeric planning with simple conditions and simple numeric effects, i.e., linear expressions over numeric state variables and actions that increase or decrease such variables by constant quantities. We introduce a variant of hmaxhbd (a previously proposed numeric hmax heuristic) based on the delete-relaxed version of such planning tasks and show that, although inadmissible by itself, our variant yields a numeric version of the classical LM-cut heuristic which is admissible. We classify the three existing families of heuristics for this class of numeric planning tasks and introduce the LM-cut family, proving dominance or incomparability between all pairs of existing max and LM-cut heuristics for numeric planning with simple conditions. Our extensive empirical evaluation shows that the new LM-cut heuristic, both on its own and as part of the operator counting framework, is the state-of-the-art for this class of numeric planning problem.
Near-Term Quantum Computing Techniques: Variational Quantum Algorithms, Error Mitigation, Circuit Compilation, Benchmarking and Classical Simulation
Huang, He-Liang, Xu, Xiao-Yue, Guo, Chu, Tian, Guojing, Wei, Shi-Jie, Sun, Xiaoming, Bao, Wan-Su, Long, Gui-Lu
Quantum computing is a game-changing technology for global academia, research centers and industries including computational science, mathematics, finance, pharmaceutical, materials science, chemistry and cryptography. Although it has seen a major boost in the last decade, we are still a long way from reaching the maturity of a full-fledged quantum computer. That said, we will be in the Noisy-Intermediate Scale Quantum (NISQ) era for a long time, working on dozens or even thousands of qubits quantum computing systems. An outstanding challenge, then, is to come up with an application that can reliably carry out a nontrivial task of interest on the near-term quantum devices with non-negligible quantum noise. To address this challenge, several near-term quantum computing techniques, including variational quantum algorithms, error mitigation, quantum circuit compilation and benchmarking protocols, have been proposed to characterize and mitigate errors, and to implement algorithms with a certain resistance to noise, so as to enhance the capabilities of near-term quantum devices and explore the boundaries of their ability to realize useful applications. Besides, the development of near-term quantum devices is inseparable from the efficient classical simulation, which plays a vital role in quantum algorithm design and verification, error-tolerant verification and other applications. This review will provide a thorough introduction of these near-term quantum computing techniques, report on their progress, and finally discuss the future prospect of these techniques, which we hope will motivate researchers to undertake additional studies in this field.
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Wullach, Tomer, Chazan, Shlomo E.
Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed using the predicted distribution. While showing substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., the predicted distribution is massed for a single or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may hamper beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. Our proposed approach does not require further training beyond the original fine-tuning, nor additional model parameters. In fact, we find that our proposed method requires significantly less inference computation than current approaches. We propose aggregating the top M layers, potentially leveraging useful information encoded in intermediate layers, and relaxing model confidence. We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements in particular when applied to low-resource scenarios.
The Grey Wolf Optimizer - Teaching & Academics
Search Algorithms and Optimization techniques are the engines of most Artificial Intelligence techniques and Data Science. There is no doubt that the Grey Wolf Optimizer is one of the most recent, well-regarded and widely-used AI search techniques. A lot of scientists and practitioners use search and optimization algorithms without understanding their internal structure. However, understanding the internal structure and mechanism of such AI problem-solving techniques will allow them to solve problems more efficiently. This also allows them to tune, tweak, and even design new algorithms for different projects.
UB3: Best Beam Identification in Millimeter Wave Systems via Pure Exploration Unimodal Bandits
Ghosh, Debamita, Rahman, Haseen, Hanawal, Manjesh K., Zlatanov, Nikola
Millimeter wave (mmWave) communications have a broad spectrum and can support data rates in the order of gigabits per second, as envisioned in 5G systems. However, they cannot be used for long distances due to their sensitivity to attenuation loss. To enable their use in the 5G network, it requires that the transmission energy be focused in sharp pencil beams. As any misalignment between the transmitter and receiver beam pair can reduce the data rate significantly, it is important that they are aligned as much as possible. To find the best transmit-receive beam pair, recent beam alignment (BA) techniques examine the entire beam space, which might result in a large amount of BA latency. Recent works propose to adaptively select the beams such that the cumulative reward measured in terms of received signal strength or throughput is maximized. In this paper, we develop an algorithm that exploits the unimodal structure of the received signal strengths of the beams to identify the best beam in a finite time using pure exploration strategies. Strategies that identify the best beam in a fixed time slot are more suitable for wireless network protocol design than cumulative reward maximization strategies that continuously perform exploration and exploitation. Our algorithm is named Unimodal Bandit for Best Beam (UB3) and identifies the best beam with a high probability in a few rounds. We prove that the error exponent in the probability does not depend on the number of beams and show that this is indeed the case by establishing a lower bound for the unimodal bandits. We demonstrate that UB3 outperforms the state-of-the-art algorithms through extensive simulations. Moreover, our algorithm is simple to implement and has lower computational complexity.
Teaching - CS 221
CS 221 ― Artificial Intelligence My twin brother Afshine and I created this set of illustrated Artificial Intelligence cheatsheets covering the content of the CS 221 class, which I TA-ed in Spring 2019 at Stanford. They can (hopefully!) be useful to all future students of this course as well as to anyone else interested in Artificial Intelligence. You can help us translating them on GitHub!
Merging enzymatic and synthetic chemistry with computational synthesis planning
Synthesis planning programs trained on chemical reaction data can design efficient routes to new molecules of interest, but are limited in their ability to leverage rare chemical transformations. This challenge is acute for enzymatic reactions, which are valuable due to their selectivity and sustainability but are few in number. We report a retrosynthetic search algorithm using two neural network models for retrosynthesis–one covering 7984 enzymatic transformations and one 163,723 synthetic transformations–that balances the exploration of enzymatic and synthetic reactions to identify hybrid synthesis plans. This approach extends the space of retrosynthetic moves by thousands of uniquely enzymatic one-step transformations, discovers routes to molecules for which synthetic or enzymatic searches find none, and designs shorter routes for others. Application to (-)-Δ9 tetrahydrocannabinol (THC) (dronabinol) and R,R-formoterol (arformoterol) illustrates how our strategy facilitates the replacement of metal catalysis, high step counts, or costly enantiomeric resolution with more elegant hybrid proposals. The identification of synthetic routes combining enzymatic and non-enzymatic reactions has been challenging and requiring expert knowledge. Here, the authors describe a computational retrosynthetic approach relying on neural network models for planning synthetic routes using both strategies.