Search
Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
Watanabe, Shuhei, Awad, Noor, Onishi, Masaki, Hutter, Frank
Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".
Personalized Algorithmic Recourse with Preference Elicitation
De Toni, Giovanni, Viappiani, Paolo, Teso, Stefano, Lepri, Bruno, Passerini, Andrea
Algorithmic Recourse (AR) is the problem of computing a sequence of actions that -- once performed by a user -- overturns an undesirable machine decision. It is paramount that the sequence of actions does not require too much effort for users to implement. Yet, most approaches to AR assume that actions cost the same for all users, and thus may recommend unfairly expensive recourse plans to certain users. Prompted by this observation, we introduce PEAR, the first human-in-the-loop approach capable of providing personalized algorithmic recourse tailored to the needs of any end-user. PEAR builds on insights from Bayesian Preference Elicitation to iteratively refine an estimate of the costs of actions by asking choice set queries to the target user. The queries themselves are computed by maximizing the Expected Utility of Selection, a principled measure of information gain accounting for uncertainty on both the cost estimate and the user's responses. PEAR integrates elicitation into a Reinforcement Learning agent coupled with Monte Carlo Tree Search to quickly identify promising recourse plans. Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations.
Space Net Optimization
Tsai, Chun-Wei, Yang, Yi-Cheng, Tang, Tzu-Chieh, Hsu, Che-Wei
Most metaheuristic algorithms rely on a few searched solutions to guide later searches during the convergence process for a simple reason: the limited computing resource of a computer makes it impossible to retain all the searched solutions. This also reveals that each search of most metaheuristic algorithms is just like a ballpark guess. To help address this issue, we present a novel metaheuristic algorithm called space net optimization (SNO). It is equipped with a new mechanism called space net; thus, making it possible for a metaheuristic algorithm to use most information provided by all searched solutions to depict the landscape of the solution space. With the space net, a metaheuristic algorithm is kind of like having a ``vision'' on the solution space. Simulation results show that SNO outperforms all the other metaheuristic algorithms compared in this study for a set of well-known single objective bound constrained problems in most cases.
Symmetry-Aware Robot Design with Structured Subgroups
Dong, Heng, Zhang, Junyu, Wang, Tonghan, Zhang, Chongjie
Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous works on robot design have proven its ability to generate robots for various tasks. However, these works searched the robots directly from the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry in structured subgroups. Then robots are designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while covering the original design space, which is theoretically analyzed. We further empirically evaluate SARD on various tasks, and the results show its superior efficiency and generalizability.
Look-Ahead Task Offloading for Multi-User Mobile Augmented Reality in Edge-Cloud Computing
Mobile augmented reality (MAR) blends a real scenario with overlaid virtual content, which has been envisioned as one of the ubiquitous interfaces to the Metaverse. Due to the limited computing power and battery life of MAR devices, it is common to offload the computation tasks to edge or cloud servers in close proximity. However, existing offloading solutions developed for MAR tasks suffer from high migration overhead, poor scalability, and short-sightedness when applied in provisioning multi-user MAR services. To address these issues, a MAR service-oriented task offloading scheme is designed and evaluated in edge-cloud computing networks. Specifically, the task interdependency of MAR applications is firstly analyzed and modeled by using directed acyclic graphs. Then, we propose a look-ahead offloading scheme based on a modified Monte Carlo tree (MMCT) search, which can run several multi-step executions in advance to get an estimate of the long-term effect of immediate action. Experiment results show that the proposed offloading scheme can effectively improve the quality of service (QoS) in provisioning multi-user MAR services, compared to four benchmark schemes. Furthermore, it is also shown that the proposed solution is stable and suitable for applications in a highly volatile environment.
Local Branching Relaxation Heuristics for Integer Linear Programs
Huang, Taoan, Ferber, Aaron, Tian, Yuandong, Dilkina, Bistra, Steiner, Benoit
Large Neighborhood Search (LNS) is a popular heuristic algorithm for solving combinatorial optimization problems (COP). It starts with an initial solution to the problem and iteratively improves it by searching a large neighborhood around the current best solution. LNS relies on heuristics to select neighborhoods to search in. In this paper, we focus on designing effective and efficient heuristics in LNS for integer linear programs (ILP) since a wide range of COPs can be represented as ILPs. Local Branching (LB) is a heuristic that selects the neighborhood that leads to the largest improvement over the current solution in each iteration of LNS. LB is often slow since it needs to solve an ILP of the same size as input. Our proposed heuristics, LB-RELAX and its variants, use the linear programming relaxation of LB to select neighborhoods. Empirically, LB-RELAX and its variants compute as effective neighborhoods as LB but run faster. They achieve state-of-the-art anytime performance on several ILP benchmarks.
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
Liu, Songtao, Tu, Zhengkai, Xu, Minkai, Zhang, Zuobai, Lin, Lu, Ying, Rex, Tang, Jian, Zhao, Peilin, Wu, Dinghao
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. We view synthetic routes as reaction graphs and propose to incorporate context through three principled steps: encode molecules into embeddings, aggregate information over routes, and readout to predict reactants. Our approach is the first attempt to utilize in-context learning for retrosynthesis prediction in retrosynthetic planning. The entire framework can be efficiently optimized in an end-to-end fashion and produce more practical and accurate predictions. Comprehensive experiments demonstrate that by fusing in the context information over routes, our model significantly improves the performance of retrosynthetic planning over baselines that are not context-aware, especially for long synthetic routes. Code is available at https://github.com/SongtaoLiu0823/FusionRetro.
Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking
Urteaga, Iรฑigo, Draรฏdia, Moulay-Zaรฏdane, Lancewicki, Tomer, Khadivi, Shahram
We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters, aimed at optimizing language model performance, in a resource efficient manner. We design a Thompson sampling algorithm, with a surrogate Gaussian process reward model of the Masked Language Model (MLM) pre-training objective, for its sequential minimization. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates pre-training by sequentially selecting masking hyperparameters that improve performance. We empirically demonstrate how GP-TS pre-trains language models efficiently, i.e., it achieves lower MLM loss in fewer epochs, across a variety of settings. In addition, GP-TS pre-trained TLMs attain competitive downstream performance, while avoiding expensive hyperparameter grid search. GP-TS provides an interactive framework for efficient and optimized TLM pre-training that, by circumventing costly hyperparameter selection, enables substantial computational savings.
Unified Information Dynamic Analysis of Quantum Decision-Making and Search Algorithms: Computational Intelligence Measure
Ulyanov, Sergey V., Ghisi, Fabio, Kurawaki, Ichiro, Ulyanov, Viktor S.
There are important algorithms built upon a mixture of basic techniques described; for example, the Fast Fourier Transform (FFT) employs both Divide - and - Conquer and Transform - and - Conquer techniques. In this article, the evolution of a quantum algorithm (QA) is examined from a n information theory viewpoint. The complex vector entering the quantum algorithmic gate - QAG is considered as an information source both from the classical and the quantum level. The analysis of the classical and quantum information flow in Deutsch - Jozsa, Shor and Grover algorithm s is used. It is shown that QAG, based on superposition of states, quantum entanglement and interference, when acting on the input vector, stores information into the system state, minimizing the gap between classical Shannon ent ropy and quantum von Neumann entropy. Minimizing of the gap between Shannon and von Neumann entropies is considered as a termination criterion of QA computational intelligence measure. Let us discuss the main properties of classical and quantum information that in dynamic analysis of quantum algorithms are used. Additional necessary detail description of general properties of information amounts in Appendix 1 to this article is given. Any c omputation (both classical and quantum) is formally identical to a communication in time. By considering quantum computation as a communication process, it is possible to relate its efficiency to its classical communication capacity. At time, the programmer sets the computer to accomplish any one of several possible tasks. Each of these tasks can be regarded as embodying a different message. Another programmer can obtain this message by looking at the output of the computer when th e computation is finished at time . Computation based on quantum principles allows for more efficient algorithms for solving certain problems than algorithms based on pure classical principles [ 1 ]. The sender conveys the maximum information when all the message states have equal a priori probability (which also maximizes the channel capacity). In that case the mutual information (channel capacity) at the end of the computation is . Let us consider any peculiarities of information axioms and information capability of quantum computing as the dynamic evolution of QAs. If one breaks down the general unitary transformation of a QA into a number of successive unitary blocks, then the maximum capacity may be achieved only after the number of applications of the blocks. When its total value reaches the maximum possible value consistent with a given initial state o f the quantum computing, the computation is regarded as being complete (see, in details [ 2,3 ]). The classical capacity of a quantum communication channel is connected with the efficiency of quantum computing using entropic arguments [ 1 - 9 ]. This formalism allows us to derive lower bounds on the computational complexity of QA's in the most general context.
Efficient Stochastic Approximation of Minimax Excess Risk Optimization
While traditional distributionally robust optimization (DRO) aims to minimize the maximal risk over a set of distributions, Agarwal and Zhang (2022) recently proposed a variant that replaces risk with excess risk. Compared to DRO, the new formulation -- minimax excess risk optimization (MERO) has the advantage of suppressing the effect of heterogeneous noise in different distributions. However, the choice of excess risk leads to a very challenging minimax optimization problem, and currently there exists only an inefficient algorithm for empirical MERO. In this paper, we develop efficient stochastic approximation approaches which directly target MERO. Specifically, we leverage techniques from stochastic convex optimization to estimate the minimal risk of every distribution, and solve MERO as a stochastic convex-concave optimization (SCCO) problem with biased gradients. The presence of bias makes existing theoretical guarantees of SCCO inapplicable, and fortunately, we demonstrate that the bias, caused by the estimation error of the minimal risk, is under-control. Thus, MERO can still be optimized with a nearly optimal convergence rate. Moreover, we investigate a practical scenario where the quantity of samples drawn from each distribution may differ, and propose a stochastic approach that delivers distribution-dependent convergence rates.