Goto

Collaborating Authors

 Optimization


Local Latin Hypercube Refinement for Multi-objective Design Uncertainty Optimization

arXiv.org Machine Learning

Optimizing the reliability and the robustness of a design is important but often unaffordable due to high sample requirements. Surrogate models based on statistical and machine learning methods are used to increase the sample efficiency. However, for higher dimensional or multi-modal systems, surrogate models may also require a large amount of samples to achieve good results. We propose a sequential sampling strategy for the surrogate based solution of multi-objective reliability based robust design optimization problems. Proposed local Latin hypercube refinement (LoLHR) strategy is model-agnostic and can be combined with any surrogate model because there is no free lunch but possibly a budget one. The proposed method is compared to stationary sampling as well as other proposed strategies from the literature. Gaussian process and support vector regression are both used as surrogate models. Empirical evidence is presented, showing that LoLHR achieves on average better results compared to other surrogate based strategies on the tested examples.


Risk Bounds and Calibration for a Smart Predict-then-Optimize Method

arXiv.org Machine Learning

The predict-then-optimize framework is fundamental in practical stochastic decision-making problems: first predict unknown parameters of an optimization model, then solve the problem using the predicted values. A natural loss function in this setting is defined by measuring the decision error induced by the predicted parameters, which was named the Smart Predict-then-Optimize (SPO) loss by Elmachtoub and Grigas [arXiv:1710.08005]. Since the SPO loss is typically nonconvex and possibly discontinuous, Elmachtoub and Grigas [arXiv:1710.08005] introduced a convex surrogate, called the SPO+ loss, that importantly accounts for the underlying structure of the optimization model. In this paper, we greatly expand upon the consistency results for the SPO+ loss provided by Elmachtoub and Grigas [arXiv:1710.08005]. We develop risk bounds and uniform calibration results for the SPO+ loss relative to the SPO loss, which provide a quantitative way to transfer the excess surrogate risk to excess true risk. By combining our risk bounds with generalization bounds, we show that the empirical minimizer of the SPO+ loss achieves low excess true risk with high probability. We first demonstrate these results in the case when the feasible region of the underlying optimization problem is a polyhedron, and then we show that the results can be strengthened substantially when the feasible region is a level set of a strongly convex function. We perform experiments to empirically demonstrate the strength of the SPO+ surrogate, as compared to standard $\ell_1$ and squared $\ell_2$ prediction error losses, on portfolio allocation and cost-sensitive multi-class classification problems.


Structure Learning for Directed Trees

arXiv.org Machine Learning

Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large nonlinear models, these rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce a method for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.


Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity

arXiv.org Machine Learning

Classes of set functions along with a choice of ground set are a bedrock to determine and develop corresponding variants of greedy algorithms to obtain efficient solutions for combinatorial optimization problems. The class of approximate constrained submodular optimization has seen huge advances at the intersection of good computational efficiency, versatility and approximation guarantees while exact solutions for unconstrained submodular optimization are NP-hard. What is an alternative to situations when submodularity does not hold? Can efficient and globally exact solutions be obtained? We introduce one such new frontier: The class of quasi-concave set functions induced as a dual class to monotone linkage functions. We provide a parallel algorithm with a time complexity over $n$ processors of $\mathcal{O}(n^2g) +\mathcal{O}(\log{\log{n}})$ where $n$ is the cardinality of the ground set and $g$ is the complexity to compute the monotone linkage function that induces a corresponding quasi-concave set function via a duality. The complexity reduces to $\mathcal{O}(gn\log(n))$ on $n^2$ processors and to $\mathcal{O}(gn)$ on $n^3$ processors. Our algorithm provides a globally optimal solution to a maxi-min problem as opposed to submodular optimization which is approximate. We show a potential for widespread applications via an example of diverse feature subset selection with exact global maxi-min guarantees upon showing that a statistical dependency measure called distance correlation can be used to induce a quasi-concave set function.


Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

arXiv.org Artificial Intelligence

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2} +H^2d^{3/2}N_2^{-1/2}+H^{3/2}dN_1^{-1/2} \ )$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.


CARE: Coherent Actionable Recourse based on Sound Counterfactual Explanations

arXiv.org Artificial Intelligence

Counterfactual explanation methods interpret the outputs of a machine learning model in the form of "what-if scenarios" without compromising the fidelity-interpretability trade-off. They explain how to obtain a desired prediction from the model by recommending small changes to the input features, aka recourse. We believe an actionable recourse should be created based on sound counterfactual explanations originating from the distribution of the ground-truth data and linked to the domain knowledge. Moreover, it needs to preserve the coherency between changed/unchanged features while satisfying user/domain-specified constraints. This paper introduces CARE, a modular explanation framework that addresses the model- and user-level desiderata in a consecutive and structured manner. We tackle the existing requirements by proposing novel and efficient solutions that are formulated in a multi-objective optimization framework. The designed framework enables including arbitrary requirements and generating counterfactual explanations and actionable recourse by choice. As a model-agnostic approach, CARE generates multiple, diverse explanations for any black-box model in tabular classification and regression settings. Several experiments on standard data sets and black-box models demonstrate the effectiveness of our modular framework and its superior performance compared to the baselines.


Probabilistic methods for approximate archetypal analysis

arXiv.org Machine Learning

Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that, provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.


Synthesizing Pareto-Optimal Interpretations for Black-Box Models

arXiv.org Artificial Intelligence

We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.


Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks

arXiv.org Artificial Intelligence

With the continuous trend of data explosion, delivering packets from data servers to end users causes increased stress on both the fronthaul and backhaul traffic of mobile networks. To mitigate this problem, caching popular content closer to the end-users has emerged as an effective method for reducing network congestion and improving user experience. To find the optimal locations for content caching, many conventional approaches construct various mixed integer linear programming (MILP) models. However, such methods may fail to support online decision making due to the inherent curse of dimensionality. In this paper, a novel framework for proactive caching is proposed. This framework merges model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image. For parallel training and simple design purposes, the proposed MILP model is first decomposed into a number of sub-problems and, then, convolutional neural networks (CNNs) are trained to predict content caching locations of these sub-problems. Furthermore, since the MILP model decomposition neglects the internal effects among sub-problems, the CNNs' outputs have the risk to be infeasible solutions. Therefore, two algorithms are provided: the first uses predictions from CNNs as an extra constraint to reduce the number of decision variables; the second employs CNNs' outputs to accelerate local search. Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost compared to the MILP solution, which provides high quality decision making in real-time.


A Two-Layer Near-Optimal Strategy for Substation Constraint Management via Home Batteries

arXiv.org Artificial Intelligence

Within electrical distribution networks, substation constraints management requires that aggregated power demand from residential users is kept within suitable bounds. Efficiency of substation constraints management can be measured as the reduction of constraints violations w.r.t. unmanaged demand. Home batteries hold the promise of enabling efficient and user-oblivious substation constraints management. Centralized control of home batteries would achieve optimal efficiency. However, it is hardly acceptable by users, since service providers (e.g., utilities or aggregators) would directly control batteries at user premises. Unfortunately, devising efficient hierarchical control strategies, thus overcoming the above problem, is far from easy. We present a novel two-layer control strategy for home batteries that avoids direct control of home devices by the service provider and at the same time yields near-optimal substation constraints management efficiency. Our simulation results on field data from 62 households in Denmark show that the substation constraints management efficiency achieved with our approach is at least 82% of the one obtained with a theoretical optimal centralized strategy.