Optimization
On efficient global optimization via universal Kriging surrogate models
Palar, Pramudita Satria, Shimoyama, Koji
In this paper, we investigate the capability of the universal Kriging (UK) model for single-objective global optimization within an efficient global optimization (EGO) framework. We implemented this combined UK-EGO framework and studied four variants of the UK method: a UK with a first-order polynomial trend, a UK with a second-order polynomial trend, a blind Kriging (BK) implementation from the ooDACE toolbox, and a polynomial-chaos Kriging (PCK) implementation. The UK-EGO framework with automatic trend function selection, derived from the BK and PCK models, builds UK surrogate models and then optimizes the expected improvement criterion on the Kriging model with the lowest leave-one-out cross-validation error. We then compared the UK-EGO variants with standard EGO on five synthetic test functions and one aerodynamic problem. Our results show that a proper choice of trend function through automatic feature selection can improve the optimization performance of UK-EGO relative to EGO. PCK-EGO was the best variant, as it performed more robustly than the other UK-EGO schemes; however, a total-order expansion should be used to generate the candidate trend function set for high-dimensional problems. For some test functions, the UK with a predetermined polynomial trend performed better than BK and PCK, indicating that automatic trend function selection does not always lead to the best solutions. We also found that although some UK variants are not as globally accurate as ordinary Kriging (OK), they can still find better-optimized solutions because the added trend function helps the optimizer locate the global optimum.
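The EGO loop described above chooses its next sample by maximizing expected improvement (EI). The following is a minimal sketch of the standard closed-form EI for minimization. It assumes the Kriging surrogate (any of the UK variants) supplies a predictive mean and standard deviation at each candidate point. The function names are hypothetical, and this is not the authors' implementation.

```python
import math

def expected_improvement(mu, sigma, y_best):
    """Closed-form EI for minimization at a point where the surrogate predicts
    mean `mu` and standard deviation `sigma`; `y_best` is the best (lowest)
    objective value observed so far."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (y_best - mu) * cdf + sigma * pdf

def select_next(candidates, y_best):
    """Hypothetical helper: candidates are (x, mu, sigma) tuples produced by the
    chosen surrogate; return the x with the largest EI."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], y_best))[0]
```

EI trades off a low predicted mean (exploitation) against high predictive uncertainty (exploration). A candidate with a slightly worse mean but much larger sigma can therefore win, which is how the trend function's effect on the predictor feeds into the search.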
Broad Learning for Healthcare
A broad spectrum of data from different modalities is generated in the healthcare domain every day, including scalar data (e.g., clinical measures collected at hospitals), tensor data (e.g., neuroimages analyzed by research institutes), graph data (e.g., brain connectivity networks), and sequence data (e.g., digital footprints recorded by smart sensors). The capability to model information from these heterogeneous data sources is potentially transformative for investigating disease mechanisms and for informing therapeutic interventions. The work in this thesis aims to facilitate healthcare applications in the setting of broad learning, which focuses on fusing heterogeneous data sources for a variety of synergistic knowledge discovery and machine learning tasks. We are broadly interested in computer-aided diagnosis, precision medicine, and mobile health, pursued by creating accurate user profiles that include important biomarkers, brain connectivity patterns, and latent representations. In particular, this work addresses four data mining problems with applications to the healthcare domain: multi-view feature selection, subgraph pattern mining, brain network embedding, and multi-view sequence prediction.
Bayesian Optimization with Expensive Integrands
Toscano-Palmerin, Saul, Frazier, Peter I.
We propose a Bayesian optimization algorithm for objective functions that are sums or integrals of expensive-to-evaluate functions, allowing noisy evaluations. Such objective functions arise in multi-task Bayesian optimization for tuning machine learning hyperparameters, in optimization via simulation, and in sequential design of experiments with random environmental conditions. Our method is average-case optimal by construction when a single evaluation of the integrand remains within our evaluation budget. Achieving this one-step optimality requires solving a challenging value-of-information optimization problem, for which we provide a novel, efficient, discretization-free computational method. We also provide consistency proofs for our method on both continuous and finite discrete domains for objective functions that are sums. In numerical experiments against previous state-of-the-art methods, including those that also leverage sum or integral structure, our method performs as well as or better than these methods across a wide range of problems, and offers significant improvements when evaluations are noisy or the integrand varies smoothly in the integrated variables.
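For concreteness, the objective class can be written as follows. The symbols are notation assumed here, not taken from the paper: $x$ is the design, $w$ is the environmental condition or task index with known distribution $p(w)$, and $f$ is the expensive integrand. Each evaluation queries a single pair $(x, w)$ and returns a possibly noisy value of $f(x, w)$, so the algorithm chooses both where to evaluate and which integrand term to sample. The second line is the generic one-step value-of-information form for a maximization problem, with $\mu_n$ the posterior mean of $G$ after $n$ evaluations. It is shown only to illustrate the general shape of such a criterion and is not claimed to be the paper's exact acquisition function.

```latex
G(x) = \int f(x, w)\, p(w)\, dw
\quad\text{or}\quad
G(x) = \sum_{w \in W} p(w)\, f(x, w),
\qquad y = f(x, w) + \varepsilon

V_n(x, w) = \mathbb{E}_n\!\left[ \max_{x'} \mu_{n+1}(x') \,\middle|\, (x_{n+1}, w_{n+1}) = (x, w) \right] - \max_{x'} \mu_n(x')
```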
Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates
Wang, Yining, Balakrishnan, Sivaraman, Singh, Aarti
We consider the problem of global optimization of an unknown non-convex smooth function with zeroth-order feedback. In this setup, an algorithm may adaptively query the underlying function at different locations and receives noisy evaluations of the function values at the queried points (i.e., the algorithm has access to zeroth-order information). Optimization performance is measured by the expected difference between the function values at the estimated optimum and at the true optimum. In contrast to the classical optimization setup, first-order information such as gradients is not directly accessible to the optimization algorithm. We show that the classical minimax framework of analysis, which roughly characterizes the worst-case query complexity of an optimization algorithm in this setting, leads to excessively pessimistic results. We propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations, which provides a refined picture of the intrinsic difficulty of zeroth-order optimization. We show that for functions with fast level set growth around the global minimum, carefully designed optimization algorithms can identify a near-global minimizer with far fewer queries. For the special case of strongly convex and smooth functions, our implied convergence rates match those developed for zeroth-order convex optimization problems. At the other end of the spectrum, for worst-case smooth functions, no algorithm can converge faster than the minimax rate of estimating the entire unknown function in the $\ell_\infty$-norm. We provide an intuitive and efficient algorithm that attains the derived upper error bounds.
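To make the setup and the error measure $f(\hat{x}) - f(x^*)$ concrete, here is a deliberately naive, non-adaptive baseline. It spends the same query budget at every point of a grid and averages the noisy values. It is not the paper's algorithm. The test function, noise level, grid, and names are hypothetical. An adaptive method would instead concentrate queries where the function is likely near its minimum, which is the effect the local minimax analysis quantifies.

```python
import random

def grid_search_noisy(query, grid, repeats, rng):
    """Non-adaptive zeroth-order baseline: query every grid point `repeats`
    times, average the noisy values, and return the point with the lowest mean."""
    best_x, best_mean = None, float("inf")
    for x in grid:
        mean = sum(query(x, rng) for _ in range(repeats)) / repeats
        if mean < best_mean:
            best_x, best_mean = x, mean
    return best_x

def f(x):
    return (x - 0.3) ** 2          # hypothetical smooth test function, minimum at 0.3

def noisy_query(x, rng, noise_sd=0.05):
    return f(x) + rng.gauss(0.0, noise_sd)   # zeroth-order feedback: noisy value only

rng = random.Random(0)
grid = [i / 50 for i in range(51)]
x_hat = grid_search_noisy(noisy_query, grid, repeats=400, rng=rng)
excess = f(x_hat) - f(0.3)         # optimization error f(x_hat) - f(x*)
```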
Attention Solves Your TSP
We propose a framework for solving combinatorial optimization problems whose output can be represented as a sequence of input elements. As an alternative to the Pointer Network, we parameterize a policy by a model based entirely on (graph) attention layers, and train it efficiently using REINFORCE with a simple and robust baseline based on a deterministic (greedy) rollout of the best policy found during training. We significantly improve over state-of-the-art results for learning algorithms for the 2D Euclidean TSP, reducing the optimality gap for a single tour construction by more than 75% (to 0.33%) and 50% (to 2.28%) for instances with 20 and 50 nodes, respectively.
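The quantities in this abstract can be made concrete with a short sketch. The tour length is the cost being minimized. The optimality gap is the relative excess over the optimal tour length, so 0.33% means a tour 0.33% longer than optimal. With a rollout baseline, the REINFORCE weight is the sampled tour's length minus the baseline tour's length. The function names are hypothetical, and the policy network itself is not shown.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour visiting `coords` in the order given by `tour`."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def optimality_gap(length, optimal_length):
    """Relative gap; 0.0033 corresponds to the 0.33% quoted above."""
    return (length - optimal_length) / optimal_length

def rollout_advantage(sampled_length, baseline_length):
    """REINFORCE weight (L(pi) - b(s)) with a greedy-rollout baseline b(s).
    It is negative when the sampled tour is shorter than the baseline tour,
    so a gradient-descent step on (L - b) * grad log p(pi) raises that
    tour's probability."""
    return sampled_length - baseline_length
```

Using a baseline tour from the same instance rather than a learned critic means the advantage directly measures whether a sampled tour beats the current greedy policy.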
Resilient Monotone Sequential Maximization
Tzoumas, Vasileios, Jadbabaie, Ali, Pappas, George J.
Applications in machine learning, optimization, and control require the sequential selection of a few system elements, such as sensors, data, or actuators, to optimize system performance across multiple time steps. However, in failure-prone and adversarial environments, sensors get attacked, data get deleted, and actuators fail. Hence, traditional sequential design paradigms become insufficient, and resilient sequential designs that adapt against system-wide attacks, deletions, or failures become important. In general, resilient sequential design problems are computationally hard. Moreover, even though they often involve objective functions that are monotone and (possibly) submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first scalable algorithm that achieves the following characteristics: system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks, deletions, or failures; adaptiveness, i.e., at each time step, the algorithm selects system elements based on the history of inflicted attacks, deletions, or failures; and provable approximation performance, i.e., the algorithm guarantees, for monotone objective functions, a solution close to the optimal. We quantify the algorithm's approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with simulated experiments on a control-aware sensor scheduling scenario, namely, sensing-constrained robot navigation.
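To illustrate what "resilient" selection means at a single time step, here is a hypothetical two-stage rule on a toy coverage objective, which is monotone and submodular. The rule first reserves the `alpha` elements with the largest individual value, the ones an attacker removing `alpha` elements would most plausibly target. It then fills the remaining slots greedily. The sketch evaluates the worst-case value after removal by brute force. It is an assumption made for illustration, not necessarily the paper's algorithm. A sequential, adaptive version would rerun the selection at each time step after observing which elements were removed.

```python
from itertools import combinations

def coverage(selected, sets):
    """Monotone, submodular toy objective: number of targets covered."""
    covered = set()
    for s in selected:
        covered |= sets[s]
    return len(covered)

def greedy(candidates, k, sets):
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        chosen.append(max(remaining, key=lambda c: coverage(chosen + [c], sets)))
    return chosen

def two_stage_select(candidates, k, alpha, sets):
    """Hypothetical rule: reserve the `alpha` individually best elements, then
    choose the other k - alpha greedily among the rest, evaluated without the
    reserved ones so the backup set is valuable on its own."""
    bait = sorted(candidates, key=lambda c: coverage([c], sets), reverse=True)[:alpha]
    rest = [c for c in candidates if c not in bait]
    return bait + greedy(rest, k - alpha, sets)

def worst_case_value(selected, alpha, sets):
    """Objective value after the worst-case removal of `alpha` selected elements."""
    return min(coverage([s for s in selected if s not in removed], sets)
               for removed in combinations(selected, alpha))
```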
SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization
Wai, Hoi-To, Freris, Nikolaos M., Nedic, Angelia, Scaglione, Anna
We propose and analyze a new stochastic gradient method, which we call Stochastic Unbiased Curvature-aided Gradient (SUCAG), for finite sum optimization problems. SUCAG constitutes an unbiased total gradient tracking technique that uses Hessian information to accelerate convergence. We analyze our method under the general asynchronous model of computation, in which functions are selected infinitely often, but with delays that can grow sublinearly. For strongly convex problems, we establish linear convergence for the SUCAG method. When the initialization point is sufficiently close to the optimal solution, the established convergence rate depends only on the condition number of the problem, making it strictly faster than the known rate for the SAGA method. Furthermore, we describe a Markov-driven approach to implementing the SUCAG method in a distributed asynchronous multi-agent setting, via gossiping along a random walk on the communication graph. We show that our analysis applies as long as the undirected graph is connected and, notably, establishes an asymptotic linear convergence rate that is robust to the graph topology. Numerical results demonstrate the merit of our algorithm over existing methods.
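The core idea behind curvature-aided gradient tracking can be sketched as follows. Each stored component gradient is extrapolated from the point $\theta_i$ where it was last computed to the current iterate using its Hessian: $\nabla f_i(\theta) \approx \nabla f_i(\theta_i) + \nabla^2 f_i(\theta_i)(\theta - \theta_i)$. This sketch is a simplified, biased relative of SUCAG. It omits SUCAG's unbiasedness correction, the asynchronous delay model, and the gossip-based distributed variant. The l2-regularized logistic loss (chosen to make the problem strongly convex), the step size, and all names are assumptions.

```python
import numpy as np

def loss_grad_hess(theta, a, b, lam):
    """Gradient and Hessian of f_i(theta) = log(1 + exp(-b a^T theta)) + lam/2 ||theta||^2."""
    z = b * (a @ theta)
    s = 1.0 / (1.0 + np.exp(z))                      # sigmoid(-z)
    grad = -b * s * a + lam * theta
    hess = s * (1.0 - s) * np.outer(a, a) + lam * np.eye(theta.size)
    return grad, hess

def full_gradient(theta, A, B, lam):
    return np.mean([loss_grad_hess(theta, a, b, lam)[0] for a, b in zip(A, B)], axis=0)

def curvature_aided_gd(A, B, lam, step, iters, seed=0):
    n, d = A.shape
    theta = np.zeros(d)
    anchors = np.zeros((n, d))                       # theta_i: where f_i was last evaluated
    grads, hessians = np.zeros((n, d)), np.zeros((n, d, d))
    for i in range(n):
        grads[i], hessians[i] = loss_grad_hess(anchors[i], A[i], B[i], lam)
    g_sum = grads.sum(axis=0)                        # sum_i grad f_i(theta_i)
    H_sum = hessians.sum(axis=0)                     # sum_i H_i
    Hx_sum = np.einsum("ijk,ik->j", hessians, anchors)  # sum_i H_i theta_i
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        # Taylor-corrected estimate of the full gradient at the current theta:
        # (1/n) * sum_i [grad f_i(theta_i) + H_i (theta - theta_i)]
        estimate = (g_sum + H_sum @ theta - Hx_sum) / n
        theta = theta - step * estimate
        i = rng.integers(n)                          # refresh one randomly chosen component
        g_new, H_new = loss_grad_hess(theta, A[i], B[i], lam)
        g_sum += g_new - grads[i]
        H_sum += H_new - hessians[i]
        Hx_sum += H_new @ theta - hessians[i] @ anchors[i]
        grads[i], hessians[i], anchors[i] = g_new, H_new, theta
    return theta
```

Because the running sums are updated incrementally, each iteration evaluates only one component's gradient and Hessian. The curvature term keeps the stale information accurate to second order near the optimum.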
Entropy-based closure for probabilistic learning on manifolds
Soize, C., Ghanem, R., Safta, C., Huan, X., Vane, Z. P., Oefelein, J., Lacaze, G., Najm, H. N., Tang, Q., Chen, X.
In a recent paper, the authors proposed a general methodology for probabilistic learning on manifolds. The method was used to generate numerical samples that are statistically consistent with an existing dataset construed as a realization of a non-Gaussian random vector. The manifold structure is learned using diffusion manifolds, and the statistical sample generation is accomplished using a projected Itô stochastic differential equation. This probabilistic learning approach has been extended to the polynomial chaos representation of databases on manifolds and to probabilistic nonconvex constrained optimization with a fixed budget of function evaluations. The methodology introduces an isotropic-diffusion kernel with hyperparameter $\epsilon$. Currently, $\epsilon$ is chosen more or less arbitrarily. In this paper, we propose a selection criterion for identifying an optimal value of $\epsilon$, based on a maximum entropy argument. The result is a comprehensive, closed, probabilistic model for characterizing datasets with hidden constraints. The entropy argument ensures that, out of all possible models, the one selected is the most uncertain beyond any specified constraints. Applications are presented for several databases.
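The role of $\epsilon$ can be seen in a minimal diffusion-maps sketch. An isotropic Gaussian kernel with bandwidth $\epsilon$ is built on the data and row-normalized into a Markov transition matrix, and its leading eigenvectors form the reduced basis. The kernel form $\exp(-\|x - y\|^2 / (4\epsilon))$ and the normalization are assumptions for illustration. The paper's entropy-based selection of $\epsilon$ is not implemented here. This only shows the quantity whose choice that criterion closes.

```python
import numpy as np

def diffusion_basis(X, eps, m):
    """Diffusion-maps basis for data X (N x n) using an isotropic Gaussian kernel
    with hyperparameter eps (assumed form exp(-||x - y||^2 / (4 eps)))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    K = np.exp(-sq / (4.0 * eps))
    b = K.sum(axis=1)                                # row sums (degrees)
    P = K / b[:, None]                               # row-stochastic transition matrix
    # P is similar to the symmetric S = b^{-1/2} K b^{-1/2}, so diagonalize S with eigh.
    S = K / np.sqrt(np.outer(b, b))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:m]               # m largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(b)[:, None]                 # right eigenvectors of P
    return P, vals, psi
```

Small values of `eps` make the kernel very local, so the spectrum decays slowly. Large values blur the data's geometry. This sensitivity is why an arbitrary choice of $\epsilon$ is unsatisfying.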
Robust Blind Deconvolution via Mirror Descent
Ravi, Sathya N., Mehta, Ronak, Singh, Vikas
We revisit the blind deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations has recently been receiving interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission-critical applications. Further, many blind deconvolution methods based on deep architectures internally use or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first-order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The resulting algorithm has good convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice: on standard datasets, our algorithm yields results competitive with or superior to the state of the art.
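As an illustration of a mirror descent step in this setting, the following sketch estimates a 1-D blur kernel constrained to the probability simplex (nonnegative, summing to one). It uses entropic mirror descent, also known as exponentiated gradient, on a least-squares data term. Only the kernel sub-problem is shown, with the latent signal held fixed as in an alternating scheme, so this is not blind deconvolution by itself and not the paper's algorithm. The simplex constraint, step size, and names are assumptions.

```python
import numpy as np

def estimate_kernel_md(x, y, m, iters=2000):
    """Estimate a length-m kernel k on the simplex from a fixed signal x and an
    observation y ~ np.convolve(x, k), by entropic mirror descent on
    0.5 * ||np.convolve(x, k) - y||^2."""
    k = np.full(m, 1.0 / m)                      # start at the center of the simplex
    step = 1.0 / np.dot(x, x)                    # heuristic step size (assumption)
    for _ in range(iters):
        r = np.convolve(x, k) - y                # residual, length len(x) + m - 1
        grad = np.correlate(r, x, mode="valid")  # d/dk_j = sum_t r_t * x_{t-j}
        w = -step * grad
        k = k * np.exp(w - w.max())              # multiplicative (entropic) mirror step
        k /= k.sum()                             # normalization = Bregman projection onto the simplex
    return k
```

The entropic mirror map keeps every iterate strictly positive and normalized without an explicit Euclidean projection, which is the usual reason to prefer it for simplex-constrained variables.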
Similar Elements and Metric Labeling on Complete Graphs
We consider a problem that involves finding similar elements in a collection of sets. Let $X$ be a (possibly infinite) set and $d$ be a metric on $X$. Let $x^* = \arg\min_x C(x)$ be an optimal solution for the similar elements problem. In particular, we define $n$ different "star-graph" objectives. For each $1 \le r \le n$, define the objective $C_r(x)$ to account only for the terms in $C(x)$ involving $x_r$. Let $x^r = \arg\min_x C_r(x)$ be an optimal solution for the optimization problem defined by $C_r(x)$. We can compute $x^r$ efficiently using a simple form of dynamic programming, by first computing … Each of the $n$ "star-graph" objective functions leads to a possible solution. Let $G = (V, E)$ be an undirected simple graph on $n$ nodes $V = \{1, \ldots, n\}$. Let $x^* = \arg\min_x C(x)$. This optimization problem can be solved in polynomial time using … For each $r \in V$, define a different objective function $C_r(x)$, corresponding to a metric labeling … Let $x^r = \arg\min_x C_r(x)$.
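The star-graph idea can be sketched on a toy instance. The sketch assumes the objective is the sum of pairwise distances, $C(x) = \sum_{i<j} d(x_i, x_j)$ with $x_i \in S_i$, since the definition of $C$ is not legible above. Under that assumption, $C_r(x) = \sum_{j \ne r} d(x_r, x_j)$. Once the center $x_r$ is fixed, every other $x_j$ can be chosen independently as the element of $S_j$ nearest to it, so minimizing $C_r$ only requires a loop over the center's candidates. The $n$ star solutions are then compared under the full objective. The names and the brute-force checker are hypothetical.

```python
from itertools import product

def total_cost(x, d):
    """Assumed objective C(x) = sum_{i<j} d(x_i, x_j)."""
    return sum(d(x[i], x[j]) for i in range(len(x)) for j in range(i + 1, len(x)))

def star_solution(sets, d, r):
    """Minimize C_r(x) = sum_{j != r} d(x_r, x_j): for each candidate center in S_r,
    pick each other x_j independently as the nearest element of S_j."""
    best, best_cost = None, float("inf")
    for center in sets[r]:
        choice = [center if j == r else min(S, key=lambda p: d(center, p))
                  for j, S in enumerate(sets)]
        cost = sum(d(center, p) for j, p in enumerate(choice) if j != r)
        if cost < best_cost:
            best, best_cost = choice, cost
    return best

def best_star(sets, d):
    """Evaluate all n star-graph solutions x^r and keep the one with the lowest C."""
    return min((star_solution(sets, d, r) for r in range(len(sets))),
               key=lambda x: total_cost(x, d))

def brute_force_cost(sets, d):
    """Exact optimum by enumeration; only feasible for tiny instances."""
    return min(total_cost(list(c), d) for c in product(*sets))
```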