Optimization
Data-Driven Optimized Tracking Control Heuristic for MIMO Structures: A Balance System Case Study
Wang, Ning, Abouheaf, Mohammed, Gueaieb, Wail
A data-driven computational heuristic is proposed to control MIMO systems without prior knowledge of their dynamics. The heuristic is illustrated on a two-input two-output balance system. It integrates a self-adjusting nonlinear threshold accepting heuristic with a neural network to compromise between the desired transient and steady state characteristics of the system while optimizing a dynamic cost function. The heuristic decides on the control gains of multiple interacting PID control loops. The neural network is trained upon optimizing a weighted-derivative like objective cost function. The performance of the developed mechanism is compared with another controller that employs a combined PID-Riccati approach. One of the salient features of the proposed control schemes is that they do not require prior knowledge of the system dynamics. However, they depend on a known region of stability for the control gains to be used as a search space by the optimization algorithm. The control mechanism is validated using different optimization criteria which address different design requirements.
Revisiting Bayesian Optimization in the light of the COCO benchmark
Riche, Rodolphe Le, Picheny, Victor
It is commonly believed that Bayesian optimization (BO) algorithms are highly efficient for optimizing numerically costly functions. However, BO is not often compared to widely different alternatives, and is mostly tested on narrow sets of problems (multimodal, low-dimensional functions), which makes it difficult to assess where (or if) they actually achieve state-of-the-art performance. Moreover, several aspects in the design of these algorithms vary across implementations without a clear recommendation emerging from current practices, and many of these design choices are not substantiated by authoritative test campaigns. This article reports a large investigation about the effects on the performance of (Gaussian process based) BO of common and less common design choices. The experiments are carried out with the established COCO (COmparing Continuous Optimizers) software. It is found that a small initial budget, a quadratic trend, high-quality optimization of the acquisition criterion bring consistent progress. Using the GP mean as an occasional acquisition contributes to a negligible additional improvement. Warping degrades performance. The Mat\'ern 5/2 kernel is a good default but it may be surpassed by the exponential kernel on irregular functions. Overall, the best EGO variants are competitive or improve over state-of-the-art algorithms in dimensions less or equal to 5 for multimodal functions. The code developed for this study makes the new version (v2.1.1) of the R package DiceOptim available on CRAN. The structure of the experiments by function groups allows to define priorities for future research on Bayesian optimization.
End-to-End Constrained Optimization Learning: A Survey
Kotary, James, Fioretto, Ferdinando, Van Hentenryck, Pascal, Wilder, Bryan
This paper surveys the recent attempts at leveraging machine learning to solve constrained optimization problems. It focuses on surveying the work on integrating combinatorial solvers and optimization methods with machine learning architectures. These approaches hold the promise to develop new hybrid machine learning and optimization methods to predict fast, approximate, solutions to combinatorial problems and to enable structural logical inference. This paper presents a conceptual review of the recent advancements in this emerging area.
Robustifying Conditional Portfolio Decisions via Optimal Transport
Nguyen, Viet Anh, Zhang, Fan, Blanchet, Jose, Delage, Erick, Ye, Yinyu
We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution in an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation with side information problem can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to second-order or semi-definite cone programs. Empirical studies in the US and Chinese equity markets demonstrate the advantage of our integrative framework against other benchmarks.
Joint User Association and Power Allocation in Heterogeneous Ultra Dense Network via Semi-Supervised Representation Learning
Zhang, Xiangyu, Zhang, Zhengming, Yang, Luxi
Heterogeneous Ultra-Dense Network (HUDN) is one of the vital networking architectures due to its ability to enable higher connectivity density and ultra-high data rates. However, efficiently managing the wireless resource of HUDNs to reduce the wireless interference faces challenges. In this paper, we tackle this challenge by jointly optimizing user association and power control. The joint user association and power control problem is a typical non-convex problem that is hard and time-consuming to solve by traditional optimization techniques. This paper proposes a novel idea for resolving this question: the optimal user association and Base Station (BS) transmit power can be represented by some network parameters of interest, such as the channel information, the precoding matrices, etc. Then, we solve this problem by transforming it into an optimal representation function learning problem. We model the HUDNs as a heterogeneous graph and train a Graph Neural Network (GNN) to approach this representation function by using semi-supervised learning (SSL), in which the loss function is composed of the unsupervised part that helps the GNN approach the optimal representation function and the supervised part that utilizes the previous experience to reduce useless exploration in the initial phase. Besides, we use the entropy regularization to guarantee the effectiveness of exploration in the configuration space. To embrace both the generalization of the learning algorithm and higher performance of HUDNs, we separate the learning process into two parts, the generalization-representation learning (GRL) part, and the specialization-representation learning (SRL) part. In the GRL part, the GNN learns a representation with a tremendous generalized ability to suit any scenario with different user distributions, which processes offline. Based on the learned GRL representation, the SRL finely turn the parameters of GNN on-line to further improving the performance for quasi-static user distribution. Simulation results demonstrate that the proposed GRL-based solution has higher computational efficiency than the traditional optimization algorithm. Besides, the results also show that the performance of SRL outperforms the GRL.
Learning Under Adversarial and Interventional Shifts
Singh, Harvineet, Joshi, Shalmali, Doshi-Velez, Finale, Lakkaraju, Himabindu
Machine learning models are often trained on data from one distribution and deployed on others. So it becomes important to design models that are robust to distribution shifts. Most of the existing work focuses on optimizing for either adversarial shifts or interventional shifts. Adversarial methods lack expressivity in representing plausible shifts as they consider shifts to joint distributions in the data. Interventional methods allow more expressivity but provide robustness to unbounded shifts, resulting in overly conservative models. In this work, we combine the complementary strengths of the two approaches and propose a new formulation, RISe, for designing robust models against a set of distribution shifts that are at the intersection of adversarial and interventional shifts. We employ the distributionally robust optimization framework to optimize the resulting objective in both supervised and reinforcement learning settings. Extensive experimentation with synthetic and real world datasets from healthcare demonstrate the efficacy of the proposed approach.
Gaussian Process for Tomography
Dasgupta, Agnimitra, Graziani, Carlo, Di, Zichao Wendy
Tomographic imaging refers to the reconstruction of a 3D object from its 2D projections by sectioning the object, through the use of any kind of penetrating wave, from many different directions. It has had a revolutionary impact in a number of fields ranging from biology, physics, and chemistry to astronomy [1, 2]. The technique requires an accurate image reconstruction, however, and the resulting reconstruction problem is an ill-posed optimization problem because of insufficient measurements [3]. A direct consequence of ill-posedness is that the reconstruction does not have a unique solution. Therefore, quantifying the solution quality is challenging, given the absence of ground truth and the presence of measurement noise. Moreover, ill-posedness creates a requirement for regularization that imports new parameters to the problem. Regularization parameter choice can lead to substantial variations in reconstruction, and ascertaining optimal values of such parameters is difficult without availing oneself of ground truth [4]. The transition from an optimization perspective on tomographic inversion to a Bayesian statistical perspective can provide a useful reframing of these issues. In particular, the ill-posedness of the optimization view can be replaced by quantified uncertainty in the statistical view, whereas regularization now appears in the guise of parameter estimation.
Shape-constrained Symbolic Regression -- Improving Extrapolation with Prior Knowledge
Kronberger, Gabriel, de França, Fabricio Olivetti, Burlacu, Bogdan, Haider, Christian, Kommenda, Michael
We investigate the addition of constraints on the function image and its derivatives for the incorporation of prior knowledge in symbolic regression. The approach is called shape-constrained symbolic regression and allows us to enforce e.g. monotonicity of the function over selected inputs. The aim is to find models which conform to expected behaviour and which have improved extrapolation capabilities. We demonstrate the feasibility of the idea and propose and compare two evolutionary algorithms for shape-constrained symbolic regression: i) an extension of tree-based genetic programming which discards infeasible solutions in the selection step, and ii) a two population evolutionary algorithm that separates the feasible from the infeasible solutions. In both algorithms we use interval arithmetic to approximate bounds for models and their partial derivatives. The algorithms are tested on a set of 19 synthetic and four real-world regression problems. Both algorithms are able to identify models which conform to shape constraints which is not the case for the unmodified symbolic regression algorithms. However, the predictive accuracy of models with constraints is worse on the training set and the test set. Shape-constrained polynomial regression produces the best results for the test set but also significantly larger models.
Private Non-smooth Empirical Risk Minimization and Stochastic Convex Optimization in Subquadratic Steps
Kulkarni, Janardhan, Lee, Yin Tat, Liu, Daogao
We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for non-smooth convex functions. We get a (nearly) optimal bound on the excess empirical risk and excess population loss with subquadratic gradient complexity. More precisely, our differentially private algorithm requires $O(\frac{N^{3/2}}{d^{1/8}}+ \frac{N^2}{d})$ gradient queries for optimal excess empirical risk, which is achieved with the help of subsampling and smoothing the function via convolution. This is the first subquadratic algorithm for the non-smooth case when $d$ is super constant. As a direct application, using the iterative localization approach of Feldman et al. \cite{fkt20}, we achieve the optimal excess population loss for stochastic convex optimization problem, with $O(\min\{N^{5/4}d^{1/8},\frac{ N^{3/2}}{d^{1/8}}\})$ gradient queries. Our work makes progress towards resolving a question raised by Bassily et al. \cite{bfgt20}, giving first algorithms for private ERM and SCO with subquadratic steps. We note that independently Asi et al. \cite{afkt21} gave other algorithms for private ERM and SCO with subquadratic steps.
Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding
Bellot, Alexis, van der Schaar, Mihaela
Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset, rendering the causal effect unidentifiable. In this paper we show that beyond conditional independencies, under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.