Optimization
Inference and Optimization for Engineering and Physical Systems
The central object of this PhD thesis is known under different names in the fields of computer science and statistical mechanics. In computer science, it is called the Maximum Cut problem, one of the famous twenty-one Karp's original NP-hard problems, while the same object from Physics is called the Ising Spin Glass model. This model of a rich structure often appears as a reduction or reformulation of real-world problems from computer science, physics and engineering. However, solving this model exactly (finding the maximal cut or the ground state) is likely to stay an intractable problem (unless $\textit{P} = \textit{NP}$) and requires the development of ad-hoc heuristics for every particular family of instances. One of the bright and beautiful connections between discrete and continuous optimization is a Semidefinite Programming-based rounding scheme for Maximum Cut. This procedure allows us to find a provably near-optimal solution; moreover, this method is conjectured to be the best possible in polynomial time. In the first two chapters of this thesis, we investigate local non-convex heuristics intended to improve the rounding scheme. In the last chapter of this thesis, we make one step further and aim to control the solution of the problem we wanted to solve in previous chapters. We formulate a bi-level optimization problem over the Ising model where we want to tweak the interactions as little as possible so that the ground state of the resulting Ising model satisfies the desired criteria. This kind of problem arises in pandemic modeling. We show that when the interactions are non-negative, our bi-level optimization is solvable in polynomial time using convex programming.
Generalization In Multi-Objective Machine Learning
Sรบkenรญk, Peter, Lampert, Christoph H.
Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, this could be the efficiency, robustness or fairness of the learned models, or any of their combinations. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we make first steps to fill this gap: we establish foundational generalization bounds for the multi-objective setting as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.
The case for fully Bayesian optimisation in small-sample trials
While sample efficiency is the main motive for use of Bayesian optimisation when black-box functions are expensive to evaluate, the standard approach based on type II maximum likelihood (ML-II) may fail and result in disappointing performance in small-sample trials. The paper provides three compelling reasons to adopt fully Bayesian optimisation (FBO) as an alternative. First, failures of ML-II are more commonplace than implied by the existing studies using the contrived settings. Second, FBO is more robust than ML-II, and the price of robustness is almost trivial. Third, FBO has become simple to implement and fast enough to be practical. The paper supports the argument using relevant experiments, which reflect the current practice regarding models, algorithms, and software platforms. Since the benefits seem to outweigh the costs, researchers should consider adopting FBO for their applications so that they can guard against potential failures that end up wasting precious research resources.
Selection of a representative sorting model in a preference disaggregation setting: a review of existing procedures, new proposals, and experimental comparison
Wรณjcik, Michaล, Kadziลski, Miลosz, Ciomek, Krzysztof
We consider preference disaggregation in the context of multiple criteria sorting. The value function parameters and thresholds separating the classes are inferred from the Decision Maker's (DM's) assignment examples. Given the multiplicity of sorting models compatible with indirect preferences, selecting a single, representative one can be conducted differently. We review several procedures for this purpose, aiming to identify the most discriminant, average, central, benevolent, aggressive, parsimonious, or robust models. Also, we present three novel procedures that implement the robust assignment rule in practice. They exploit stochastic acceptabilities and maximize the support given to the resulting assignments by all feasible sorting models. The performance of sixteen procedures is verified on problem instances with different complexities. The results of an experimental study indicate the most efficient procedure in terms of classification accuracy, reproducing the DM's model, and delivering the most robust assignments. These include approaches identifying differently interpreted centers of the feasible polyhedron and robust methods introduced in this paper. Moreover, we discuss how the performance of all procedures is affected by different numbers of classes, criteria, characteristic points, and reference assignments. Finally, we illustrate the use of all approaches in a study concerning the assessment of the green performance of European cities.
Maximum-Likelihood Quantum State Tomography by Soft-Bayes
Lin, Chien-Ming, Hsu, Yu-Ming, Li, Yen-Huan
Quantum state tomography (QST), the task of estimating an unknown quantum state given measurement outcomes, is essential to building reliable quantum computing devices. Whereas computing the maximum-likelihood (ML) estimate corresponds to solving a finite-sum convex optimization problem, the objective function is not smooth nor Lipschitz, so most existing convex optimization methods lack sample complexity guarantees; moreover, both the sample size and dimension grow exponentially with the number of qubits in a QST experiment, so a desired algorithm should be highly scalable with respect to the dimension and sample size, just like stochastic gradient descent. In this paper, we propose a stochastic first-order algorithm that computes an $\varepsilon$-approximate ML estimate in $O( ( D \log D ) / \varepsilon ^ 2 )$ iterations with $O( D^3 )$ per-iteration time complexity, where $D$ denotes the dimension of the unknown quantum state and $\varepsilon$ denotes the optimization error. Our algorithm is an extension of Soft-Bayes to the quantum setup.
Benchmark Results for Bookshelf Organization Problem as Mixed Integer Nonlinear Program with Mode Switch and Collision Avoidance
Lin, Xuan, Fernandez, Gabriel I., Hong, Dennis W.
Mixed integer convex and nonlinear programs, MICP and MINLP, are expressive but require long solving times. Recent work that combines data-driven methods on solver heuristics has shown potential to overcome this issue allowing for applications on larger scale practical problems. To solve mixed-integer bilinear programs online with data-driven methods, several formulations exist including mathematical programming with complementary constraints (MPCC), mixed-integer programming (MIP). In this work, we benchmark the performances of those data-driven schemes on a bookshelf organization problem that has discrete mode switch and collision avoidance constraints. The success rate, optimal cost and solving time are compared along with non-data-driven methods. Our proposed methods are demonstrated as a high level planner for a robotic arm for the bookshelf problem.
On the Universal Transformation of Data-Driven Models to Control Systems
Peitz, Sebastian, Bieker, Katharina
The advances in data science and machine learning have resulted in significant improvements regarding the modeling and simulation of nonlinear dynamical systems. It is nowadays possible to make accurate predictions of complex systems such as the weather, disease models or the stock market. Predictive methods are often advertised to be useful for control, but the specifics are frequently left unanswered due to the higher system complexity, the requirement of larger data sets and an increased modeling effort. In other words, surrogate modeling for autonomous systems is much easier than for control systems. In this paper we present the framework QuaSiModO (Quantization-Simulation-Modeling-Optimization) to transform arbitrary predictive models into control systems and thus render the tremendous advances in data-driven surrogate modeling accessible for control. Our main contribution is that we trade control efficiency by autonomizing the dynamics - which yields mixed-integer control problems - to gain access to arbitrary, ready-to-use autonomous surrogate modeling techniques. We then recover the complexity of the original problem by leveraging recent results from mixed-integer optimization. The advantages of QuaSiModO are a linear increase in data requirements with respect to the control dimension, performance guarantees that rely exclusively on the accuracy of the predictive model in use, and little prior knowledge requirements in control theory to solve complex control problems.
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
Li, Huan, Lin, Zhouchen, Fang, Yongchun
We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires the time of $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, which are the best complexities among the non-accelerated gradient-type methods, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a little higher communication cost of $O((\kappa_b+\kappa_c^2)\log\frac{1}{\epsilon})$. Our stochastic gradient computation complexities are the same as the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities keep the same as those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose the accelerated VR-EXTRA and VR-DIGing with both the optimal $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient computation complexity and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication complexity. Our stochastic gradient computation complexity is also the same as the ones of single-machine accelerated VR methods, such as Katyusha, and our communication complexity keeps the same as those of accelerated full batch decentralized methods, such as MSDA.
An Empirical Study on the Usage of Automated Machine Learning Tools
Majidi, Forough, Openja, Moses, Khomh, Foutse, Li, Heng
The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize the process of feature engineering, model training, and hyperparameter optimization and so on. Recent work performed qualitative studies on practitioners' experiences of using AutoML tools and compared different AutoML tools based on their performance and provided features, but none of the existing work studied the practices of using AutoML tools in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are mostly used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. Also, we identified the purpose of using AutoML tools (e.g. model parameter sampling, search space management, model evaluation/error-analysis, Data/ feature transformation, and data labeling) and the stages of the ML pipeline (e.g. feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages, so that they can pick the right tool for their purposes. Besides, AutoML tool developers can benefit from our findings to gain insight into the usages of their tools and improve their tools to better fit the users' usages and needs.
A Comprehensive Review of Digital Twin -- Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives
Thelen, Adam, Zhang, Xiaoge, Fink, Olga, Lu, Yan, Ghosh, Sayan, Youn, Byeng D., Todd, Michael D., Mahadevan, Sankaran, Hu, Chao, Hu, Zhen
As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions. Discussions focus on current methods of uncertainty quantification and optimization and how they are applied in different dimensions of a digital twin. Additionally, this paper presents a case study where a battery digital twin is constructed and tested to illustrate some of the modeling and twinning methods reviewed in this two-part review. Code and preprocessed data for generating all the results and figures presented in the case study are available on GitHub.