 Optimization


The Causal Marginal Polytope for Bounding Treatment Effects

arXiv.org Machine Learning

Due to unmeasured confounding, it is often not possible to identify causal effects from a postulated model. Nevertheless, we can ask for partial identification, which usually boils down to finding upper and lower bounds of a causal quantity of interest derived from all solutions compatible with the encoded structural assumptions. One appealing way to derive such bounds is to cast the problem as a constrained optimization that searches over all causal models compatible with the evidence, as introduced in the classic work of Balke and Pearl (1994) for discrete data. Although by construction this guarantees tight bounds, it poses a formidable computational challenge. To cope with this issue, alternatives include using algorithms that are not guaranteed to be tight or introducing restrictions on the class of models. In this paper, we introduce a novel alternative: inspired by ideas coming from belief propagation, we enforce compatibility between marginals of a causal model and data, without constructing a global causal model. We call this collection of locally consistent marginals the causal marginal polytope. As global independence constraints disappear when considering small dimensional tractable marginals, this also leads to a rethinking of how to elicit and express causal knowledge. We provide an explicit algorithm and implementation of this idea, and assess its practicality with numerical experiments.
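
To make partial identification concrete, the sketch below computes the classic no-assumptions bounds (often called natural or Manski-style bounds) on the average treatment effect for a binary treatment X and a binary outcome Y. This is a far simpler technique than the Balke and Pearl program or the causal marginal polytope described above, and it is not taken from the paper; the function name and the example joint distribution are invented for illustration.

```python
def no_assumption_ate_bounds(p_joint):
    """Bounds on E[Y(1)] - E[Y(0)] using only P(X=x, Y=y), with binary X and Y.

    p_joint maps (x, y) to the observed joint probability P(X=x, Y=y).
    """
    total = sum(p_joint.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")

    p_x1 = p_joint[(1, 0)] + p_joint[(1, 1)]
    p_x0 = 1.0 - p_x1

    # Y(1) is observed only for treated units; for untreated units it is
    # unknown and could be anywhere between 0 and 1.
    y1_low, y1_high = p_joint[(1, 1)], p_joint[(1, 1)] + p_x0
    # Symmetric argument for Y(0), observed only for untreated units.
    y0_low, y0_high = p_joint[(0, 1)], p_joint[(0, 1)] + p_x1

    return y1_low - y0_high, y1_high - y0_low


# Illustrative (made-up) observational distribution.
example_joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
lower, upper = no_assumption_ate_bounds(example_joint)
print(f"ATE lies in [{lower:.2f}, {upper:.2f}]")
```

Without further assumptions this interval always has width one, which is why structural assumptions of the kind discussed above are needed to obtain informative bounds.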


Amortized Proximal Optimization

arXiv.org Machine Learning

We propose a framework for online meta-optimization of parameters that govern optimization, called Amortized Proximal Optimization (APO). We first interpret various existing neural network optimizers as approximate stochastic proximal point methods which trade off the current-batch loss with proximity terms in both function space and weight space. The idea behind APO is to amortize the minimization of the proximal point objective by meta-learning the parameters of an update rule. We show how APO can be used to adapt a learning rate or a structured preconditioning matrix. Under appropriate assumptions, APO can recover existing optimizers such as natural gradient descent and KFAC. It enjoys low computational overhead and avoids expensive and numerically sensitive operations required by some second-order optimizers, such as matrix inverses. We empirically test APO for online adaptation of learning rates and structured preconditioning matrices for regression, image reconstruction, image classification, and natural language translation tasks. Empirically, the learning rate schedules found by APO generally outperform optimal fixed learning rates and are competitive with manually tuned decay schedules. Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods. Moreover, the absence of matrix inversion provides numerical stability, making it effective for low precision training.
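
As a rough illustration of the proximal point view mentioned above, the following sketch compares an exact proximal point step with a plain gradient step on a one-dimensional quadratic loss. It uses only a weight-space proximity term and is not APO itself; the loss, the function names, and the numbers are assumptions chosen so that the proximal objective has a closed-form minimizer.

```python
def proximal_point_step(w, a, b, lr):
    """Exact minimizer over v of 0.5*a*(v - b)**2 + (v - w)**2 / (2*lr).

    Setting the derivative a*(v - b) + (v - w)/lr to zero gives the formula.
    """
    return (lr * a * b + w) / (1.0 + lr * a)


def gradient_step(w, a, b, lr):
    """Ordinary gradient descent step on the loss 0.5*a*(w - b)**2."""
    return w - lr * a * (w - b)


w0, a, b = 0.0, 2.0, 1.0  # start at 0, curvature 2, optimum at 1
for lr in (0.5, 10.0):
    print(lr, proximal_point_step(w0, a, b, lr), gradient_step(w0, a, b, lr))
```

With the larger step size the gradient step overshoots the optimum badly, while the proximal point step stays between the current iterate and the optimum; the gradient step can be read as a first-order approximation of the proximal one.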


Serve your first model with Scikit-Learn + Flask + Docker

#artificialintelligence

One of the first steps in achieving this is to create a process to serve machine learning models to the organization. This is usually done by creating an application that runs the prediction model and returns the prediction. In the example in this post we are going to use a handy stack to create and serve models: Python as the base programming language; the Scikit-Learn package for building the model pipeline (preprocessing the data, training the model and saving the model into a file); the Flask package to develop a web application for the interaction between the client and the prediction model; and finally Docker for containerizing the application to prepare it for deployment. In this example we are going to work with the Breast Cancer Wisconsin (Diagnostic) dataset [1], a widely used dataset for testing machine learning models. In this dataset, features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. It was first introduced in K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23–34.
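
A minimal sketch of the Scikit-Learn and Flask part of this stack might look as follows. It is not the post's actual code: the choice of `StandardScaler` plus `LogisticRegression`, the `/predict` route, the `features` JSON key, and the temporary file path are all assumptions made for illustration, and the Docker step is left out.

```python
import os
import tempfile

import joblib
from flask import Flask, jsonify, request
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical location for the saved pipeline; a real project would pick its own.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "breast_cancer_pipeline.joblib")
N_FEATURES = 30  # the scikit-learn copy of the dataset has 30 numeric features


def train_and_save(path=MODEL_PATH):
    """Fit a preprocessing + classifier pipeline and save it to a file."""
    data = load_breast_cancer()
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipeline.fit(data.data, data.target)
    joblib.dump(pipeline, path)
    return path


train_and_save()
model = joblib.load(MODEL_PATH)  # the web app only needs the saved file

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return the predicted class for one sample sent as JSON."""
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return jsonify(error=f"expected {N_FEATURES} numbers under 'features'"), 400
    # This sketch does not validate that every entry is numeric.
    prediction = int(model.predict([features])[0])
    return jsonify(prediction=prediction)
```

Assuming the file is saved as `app.py`, the service could be started locally with `flask run` and queried by sending a POST request with a JSON body of the form `{"features": [...]}` to `/predict`.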


Globally Convergent Policy Search over Dynamic Filters for Output Estimation

arXiv.org Machine Learning

We introduce the first direct policy search algorithm which provably converges to the globally optimal $\textit{dynamic}$ filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing over filters that maintain internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of $\textit{informativity}$, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned degeneracy. Specifically, we propose a $\textit{regularizer}$ which explicitly enforces informativity, and establish that gradient descent on this regularized objective - combined with a ``reconditioning step'' - converges to the globally optimal cost at an $\mathcal{O}(1/T)$ rate. Our analysis relies on several new results which may be of independent interest, including a new framework for analyzing non-convex gradient descent via convex reformulation, and novel bounds on the solution to linear Lyapunov equations in terms of (our quantitative measure of) informativity.


A Robust Multi-Objective Bayesian Optimization Framework Considering Input Uncertainty

arXiv.org Machine Learning

Bayesian optimization is a popular tool for data-efficient optimization of expensive objective functions. In real-life applications like engineering design, the designer often wants to take multiple objectives as well as input uncertainty into account to find a set of robust solutions. While this is an active topic in single-objective Bayesian optimization, it is less investigated in the multi-objective case. We introduce a novel Bayesian optimization framework to efficiently perform multi-objective optimization considering input uncertainty. We propose a robust Gaussian Process model to infer the Bayes risk criterion to quantify robustness, and we develop a two-stage Bayesian optimization process to search for a robust Pareto frontier. The complete framework supports various distributions of the input uncertainty and takes full advantage of parallel computing. We demonstrate the effectiveness of the framework through numerical benchmarks.


Policy Learning for Optimal Individualized Dose Intervals

arXiv.org Machine Learning

We study the problem of learning individualized dose intervals using observational data. There are very few previous works for policy learning with continuous treatment, and all of them focused on recommending an optimal dose rather than an optimal dose interval. In this paper, we propose a new method to estimate such an optimal dose interval, named probability dose interval (PDI). The potential outcomes for doses in the PDI are guaranteed better than a pre-specified threshold with a given probability (e.g., 50%). The associated nonconvex optimization problem can be efficiently solved by the Difference-of-Convex functions (DC) algorithm. We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-n rate. Numerical simulations show the advantage of the proposed method over outcome modeling based benchmarks. We further demonstrate the performance of our method in determining individualized Hemoglobin A1c (HbA1c) control intervals for elderly patients with diabetes.


Bidding Agent Design in the LinkedIn Ad Marketplace

arXiv.org Machine Learning

We establish a general optimization framework for the design of automated bidding agents in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms, each running its own auction format. A bidding strategy derived from this framework automatically guarantees the optimality of budget allocation across ad units and platforms. Common constraints, such as budget delivery schedule, return on investment and guaranteed results, directly translate to additional parameters in the bidding formula. We share practical learnings from the deployed bidding system in the LinkedIn ad marketplace based on this framework.


Data Driven Modeling of Complex Systems

#artificialintelligence

The almost paradoxical concept of deterministic chaos describes systems which are so sensitive to initial conditions that long term forecasting becomes impossible. Despite the fact that there is no randomness in the dynamical equations, even the slightest error in calculation -- for instance numerical precision errors in a computer -- will cause future predictions to be wildly off. Chaotic systems appear in weather prediction, turbulent flows in fluids, plasma dynamics, chemical reactions, population dynamics, the motion of celestial bodies, the stock market, and many other areas. While current techniques tend to use noisy and partial measurement information to constrain a physical model (https://en.wikipedia.org/wiki/Kalman_filter), it is important to be able to use data driven methods such as machine learning (ML) to forecast such systems.
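
The sensitivity to initial conditions described above can be seen in a few lines using the logistic map, a standard textbook example of deterministic chaos. The map, the parameter value r = 4, and the size of the perturbation are choices made for this illustration and do not come from the article.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a fully deterministic update rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two trajectories whose starting points differ by a tiny amount,
# comparable to a small rounding or measurement error.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, gaps[step])
```

The gap roughly doubles at each step on average, so after a few dozen iterations the two trajectories are effectively unrelated even though no randomness is involved.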


Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction

arXiv.org Artificial Intelligence

The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. This is an entirely data-driven approach that extracts all necessary information from data snapshots, which are commonly supposed to be sampled from measurement. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale are either missing or unmeasured. Such a setting occurs very often in modeling complex dynamical systems such as power grids, in particular with reduced-order modeling. To take into account the effect of unresolved variables, the optimal prediction approach based on the Mori-Zwanzig formalism can be applied to obtain the most expected prediction under existing uncertainties. This effectively leads to the development of a time-predictive model accounting for the impact of missing data. In the present paper we provide a detailed derivation of the considered method from the Liouville equation and finalize it with the optimization problem that defines the optimal transition operator corresponding to the observed data. In contrast to the existing approach, we consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with a gradient-based optimization method. The gradient of the obtained objective function is computed precisely through automatic differentiation. The numerical experiments illustrate that the considered approach gives practically the same dynamics as the exact Mori-Zwanzig decomposition, but is less computationally intensive.


High-quality Thermal Gibbs Sampling with Quantum Annealing Hardware

arXiv.org Artificial Intelligence

Quantum Annealing (QA) was originally intended for accelerating the solution of combinatorial optimization tasks that have natural encodings as Ising models. However, recent experiments on QA hardware platforms have demonstrated that, in the operating regime corresponding to weak interactions, the QA hardware behaves like a noisy Gibbs sampler at a hardware-specific effective temperature. This work builds on those insights, identifies a class of small hardware-native Ising models that are robust to noise effects, and proposes a procedure for executing these models on QA hardware to maximize Gibbs sampling performance. Experimental results indicate that the proposed protocol results in high-quality Gibbs samples from a hardware-specific effective temperature. Furthermore, we show that this effective temperature can be adjusted by modulating the annealing time and energy scale. The procedure proposed in this work provides an approach to using QA hardware for Ising model sampling, presenting potential new opportunities for applications in machine learning and physics simulation.
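
For readers unfamiliar with Gibbs sampling of Ising models, the sketch below runs a classical software Gibbs sampler on a two-spin model and compares the result with the exact Boltzmann probability. It does not involve quantum annealing hardware and is not the paper's protocol; the couplings, temperature, and function name are assumptions for illustration.

```python
import math
import random


def gibbs_sample_ising(J, h, temperature, sweeps, burn_in=1000, seed=0):
    """Classical Gibbs sampler for spins s_i in {-1, +1} with energy
    E(s) = -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i.

    J is a symmetric matrix with zero diagonal.
    """
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for sweep in range(burn_in + sweeps):
        for i in range(n):
            # Local field on spin i from its neighbours and its own bias.
            field = h[i] + sum(J[i][j] * spins[j] for j in range(n) if j != i)
            # Conditional Boltzmann probability that spin i is +1.
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temperature))
            spins[i] = 1 if rng.random() < p_up else -1
        if sweep >= burn_in:
            samples.append(tuple(spins))
    return samples


J = [[0.0, 0.5], [0.5, 0.0]]  # weak ferromagnetic coupling
h = [0.0, 0.0]
samples = gibbs_sample_ising(J, h, temperature=1.0, sweeps=20000)

aligned = sum(s[0] == s[1] for s in samples) / len(samples)
exact_aligned = 1.0 / (1.0 + math.exp(-1.0))  # from enumerating all four states
print(aligned, exact_aligned)
```

Lowering `temperature` concentrates the samples on the aligned, low-energy states, which is the role the effective temperature plays in the abstract above.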