Optimization
Two-step Lookahead Bayesian Optimization with Inequality Constraints
Zhang, Yunxiang, Zhang, Xiangyu, Frazier, Peter I.
Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost. These advances have been largely limited, however, to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For instance, one existing non-myopic constrained BO method [Lam and Willcox, 2017] relies on computationally expensive unreliable brute-force derivative-free optimization of a Monte Carlo rollout acquisition function. Methods that use the reparameterization trick for more efficient derivative-based optimization of non-myopic acquisition functions in the unconstrained setting, like sample average approximation and infinitesimal perturbation analysis, do not extend: constraints introduce discontinuities in the sampled acquisition function surface that hinder its optimization. Moreover, we argue here that being non-myopic is even more important in constrained problems because fear of violating constraints pushes myopic methods away from sampling the boundary between feasible and infeasible regions, slowing the discovery of optimal solutions with tight constraints. In this paper, we propose a computationally efficient two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. To enable fast acquisition function optimization, we develop a novel likelihood-ratio-based unbiased estimator of the gradient of the two-step optimal acquisition function that does not use the reparameterization trick. In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more.
BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
Cundy, Chris, Grover, Aditya, Ermon, Stefano
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.
Active Sensing for Search and Tracking: A Review
Varotto, Luca, Cenedese, Angelo, Cavallaro, Andrea
Active Position Estimation (APE) is the task of localizing one or more targets using one or more sensing platforms. APE is a key task for search and rescue missions, wildlife monitoring, source term estimation, and collaborative mobile robotics. Success in APE depends on the level of cooperation of the sensing platforms, their number, their degrees of freedom and the quality of the information gathered. APE control laws enable active sensing by satisfying either pure-exploitative or pure-explorative criteria. The former minimizes the uncertainty on position estimation; whereas the latter drives the platform closer to its task completion. In this paper, we define the main elements of APE to systematically classify and critically discuss the state of the art in this domain. We also propose a reference framework as a formalism to classify APE-related solutions. Overall, this survey explores the principal challenges and envisages the main research directions in the field of autonomous perception systems for localization tasks. It is also beneficial to promote the development of robust active sensing methods for search and tracking applications.
Towards the One Learning Algorithm Hypothesis: A System-theoretic Approach
Mavridis, Christos, Baras, John
The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can be specified yet, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a system-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations in multiple resolutions. The proposed algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm based on annealing optimization, solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model i.e. the number of the neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of the state-of-the-art learning algorithms and deep learning methods.
Learning to Search in Local Branching
Liu, Defeng, Fischetti, Matteo, Lodi, Andrea
Finding high-quality solutions to mixed-integer linear programming problems (MILPs) is of great importance for many practical applications. In this respect, the refinement heuristic local branching (LB) has been proposed to produce improving solutions and has been highly influential for the development of local search methods in MILP. The algorithm iteratively explores a sequence of solution neighborhoods defined by the so-called local branching constraint, namely, a linear inequality limiting the distance from a reference solution. For a LB algorithm, the choice of the neighborhood size is critical to performance. Although it was initialized by a conservative value in the original LB scheme, our new observation is that the best size is strongly dependent on the particular MILP instance. In this work, we investigate the relation between the size of the search neighborhood and the behavior of the underlying LB algorithm, and we devise a leaning based framework for guiding the neighborhood search of the LB heuristic. The framework consists of a two-phase strategy. For the first phase, a scaled regression model is trained to predict the size of the LB neighborhood at the first iteration through a regression task. In the second phase, we leverage reinforcement learning and devise a reinforced neighborhood search strategy to dynamically adapt the size at the subsequent iterations. We computationally show that the neighborhood size can indeed be learned, leading to improved performances and that the overall algorithm generalizes well both with respect to the instance size and, remarkably, across instances.
Adaptive Group Collaborative Artificial Bee Colony Algorithm
Wang, Haiquan, Hans-DietrichHaasis, null, Du, Panpan, Xu, Xiaobin, Su, Menghao, Wen, Shengjun, Yue, Wenxuan, Zhang, Shanshan
As an effective algorithm for solving complex optimization problems, artificial bee colony (ABC) algorithm has shown to be competitive, but the same as other population-based algorithms, it is poor at balancing the abilities of global searching in the whole solution space (named as exploration) and quick searching in local solution space which is defined as exploitation. For improving the performance of ABC, an adaptive group collaborative ABC (AgABC) algorithm is introduced where the population in different phases is divided to specific groups and different search strategies with different abilities are assigned to the members in groups, and the member or strategy which obtains the best solution will be employed for further searching. Experimental results on benchmark functions show that the proposed algorithm with dynamic mechanism is superior to other algorithms in searching accuracy and stability. Furthermore, numerical experiments show that the proposed method can generate the optimal solution for the complex scheduling problem.
Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows
Budhraja, Param, Baranwal, Mayank, Garg, Kunal, Hota, Ashish
Accelerated gradient methods are the cornerstones of large-scale, data-driven optimization problems that arise naturally in machine learning and other fields concerning data analysis. We introduce a gradient-based optimization framework for achieving acceleration, based on the recently introduced notion of fixed-time stability of dynamical systems. The method presents itself as a generalization of simple gradient-based methods suitably scaled to achieve convergence to the optimizer in a fixed-time, independent of the initialization. We achieve this by first leveraging a continuous-time framework for designing fixed-time stable dynamical systems, and later providing a consistent discretization strategy, such that the equivalent discrete-time algorithm tracks the optimizer in a practically fixed number of iterations. We also provide a theoretical analysis of the convergence behavior of the proposed gradient flows, and their robustness to additive disturbances for a range of functions obeying strong convexity, strict convexity, and possibly nonconvexity but satisfying the Polyak-{\L}ojasiewicz inequality. We also show that the regret bound on the convergence rate is constant by virtue of the fixed-time convergence. The hyperparameters have intuitive interpretations and can be tuned to fit the requirements on the desired convergence rates. We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms. Our work provides insights on developing novel optimization algorithms via discretization of continuous-time flows.
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras
Yuan, Ye, Iqbal, Umar, Molchanov, Pavlo, Kitani, Kris, Kautz, Jan
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. Our approach is robust to severe and long-term occlusions and tracks human bodies even when they go outside the camera's field of view. To achieve this, we first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions. Additionally, in contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras. Since the joint reconstruction of human motions and camera poses is underconstrained, we propose a global trajectory predictor that generates global human trajectories based on local body movements. Using the predicted trajectories as anchors, we present a global optimization framework that refines the predicted trajectories and optimizes the camera poses to match the video evidence such as 2D keypoints. Experiments on challenging indoor and in-the-wild datasets with dynamic cameras demonstrate that the proposed approach outperforms prior methods significantly in terms of motion infilling and global mesh recovery.
Constrained Machine Learning: The Bagel Framework
Perez, Guillaume, Ament, Sebastian, Gomes, Carla, Lallouet, Arnaud
Machine learning models are widely used for real-world applications, such as document analysis and vision. Constrained machine learning problems are problems where learned models have to both be accurate and respect constraints. For continuous convex constraints, many works have been proposed, but learning under combinatorial constraints is still a hard problem. The goal of this paper is to broaden the modeling capacity of constrained machine learning problems by incorporating existing work from combinatorial optimization. We propose first a general framework called BaGeL (Branch, Generate and Learn) which applies Branch and Bound to constrained learning problems where a learning problem is generated and trained at each node until only valid models are obtained. Because machine learning has specific requirements, we also propose an extended table constraint to split the space of hypotheses.
Bayesian Optimization over Permutation Spaces
Deshwal, Aryan, Belakaria, Syrine, Doppa, Janardhan Rao, Kim, Dae Hyun
Optimizing expensive to evaluate black-box functions over an input space consisting of all permutations of d objects is an important problem with many real-world applications. For example, placement of functional blocks in hardware design to optimize performance via simulations. The overall goal is to minimize the number of function evaluations to find high-performing permutations. The key challenge in solving this problem using the Bayesian optimization (BO) framework is to trade-off the complexity of statistical model and tractability of acquisition function optimization. In this paper, we propose and evaluate two algorithms for BO over Permutation Spaces (BOPS). First, BOPS-T employs Gaussian process (GP) surrogate model with Kendall kernels and a Tractable acquisition function optimization approach based on Thompson sampling to select the sequence of permutations for evaluation. Second, BOPS-H employs GP surrogate model with Mallow kernels and a Heuristic search approach to optimize expected improvement acquisition function. We theoretically analyze the performance of BOPS-T to show that their regret grows sub-linearly. Our experiments on multiple synthetic and real-world benchmarks show that both BOPS-T and BOPS-H perform better than the state-of-the-art BO algorithm for combinatorial spaces. To drive future research on this important problem, we make new resources and real-world benchmarks available to the community.