Optimization
Ranking with Adaptive Neighbors
Li, Muge, Li, Liangyue, Nie, Feiping
Retrieving the objects most similar to a given query from a large-scale database is a fundamental building block in many application domains, including web search and visual, cross-media, and document retrieval. State-of-the-art approaches have mainly focused on capturing the underlying geometry of the data manifolds. Graph-based approaches, in particular, define various diffusion processes on weighted data graphs. Despite their success, these approaches rely on fixed-weight graphs, which makes the ranking sensitive to the input affinity matrix. In this study, we propose a new ranking algorithm that simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization formulation assigns adaptive neighbors to each data point based on the local connectivity, and the smoothness constraint assigns similar ranking scores to similar data points. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations using synthetic and real datasets suggest that the proposed algorithm can outperform the existing methods.
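For context, the fixed-graph diffusion baseline that the abstract contrasts itself with can be sketched as a manifold-ranking-style iteration. This is not the proposed adaptive-neighbor method: the affinity matrix W is fixed in advance, and the function name, the damping value alpha, and the toy chain graph are assumptions made purely for illustration.

```python
import math

def manifold_rank(W, query, alpha=0.5, iters=100):
    """Diffuse a query indicator over a FIXED affinity matrix W (list of lists)."""
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = list(y)
    for _ in range(iters):
        # f <- alpha * S f + (1 - alpha) * y
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Toy chain graph 0 - 1 - 2 - 3 with unit weights (illustrative data only).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
scores = manifold_rank(W, query=0)
```

Because the scores depend entirely on W, a poorly chosen affinity matrix changes the ranking, which is the sensitivity the abstract points to.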
DEPSO Algorithm: Project Portal – Xiao-Feng Xie, Ph.D.
DEPSO [1], also called DEPS, is an algorithm for (constrained) numerical optimization problems (NOPs). DEPSO combines the advantages of Particle Swarm Optimization (PSO) and Differential Evolution (DE). It is incorporated into the cooperative group optimization (CGO) system [2]. The DEPSO paper has been cited over 400 times with various applications. DEPSO was also implemented (by Sun Microsystems Inc.) in NLPSolver (Solver for Nonlinear Programming), an extension of Calc in Apache OpenOffice.
Neural Conditional Gradients
Schramowski, Patrick, Bauckhage, Christian, Kersting, Kristian
The move from hand-designed to learned optimizers in machine learning has been quite successful for gradient-based and gradient-free optimizers. When facing a constrained problem, however, maintaining feasibility typically requires a projection step, which might be computationally expensive and not differentiable. We show how the design of projection-free convex optimization algorithms can be cast as a learning problem based on Frank-Wolfe Networks: recurrent networks implementing the Frank-Wolfe algorithm, a.k.a. conditional gradients. This allows them to learn to exploit structure when, e.g., optimizing over rank-1 matrices. Our LSTM-learned optimizers outperform hand-designed as well as learned but unconstrained ones. We demonstrate this for training support vector machines and softmax classifiers.
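The classic hand-designed Frank-Wolfe (conditional gradient) iteration that the abstract builds on can be sketched as follows. This is the textbook algorithm, not the learned Frank-Wolfe Network; the probability simplex as feasible set, the quadratic objective, and all names are assumptions chosen to keep the example small.

```python
def frank_wolfe_simplex(grad, x0, steps=2000):
    """Minimize a smooth convex function over the probability simplex."""
    x = list(x0)
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized
        # at the vertex with the smallest gradient coordinate.
        i = min(range(len(x)), key=lambda k: g[k])
        gamma = 2.0 / (t + 2.0)  # classic open-loop step size
        # A convex combination of feasible points stays feasible,
        # so no projection step is needed.
        x = [(1.0 - gamma) * xk for xk in x]
        x[i] += gamma
    return x

# Illustrative objective: f(x) = 0.5 * ||x - target||^2, gradient x - target.
target = [0.2, 0.5, 0.3]
grad = lambda x: [xk - tk for xk, tk in zip(x, target)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

Each iterate is a convex combination of simplex vertices, which is why feasibility is maintained without the projection the abstract describes as potentially expensive.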
COPA: Constrained PARAFAC2 for Sparse & Large Datasets
Afshar, Ardavan, Perros, Ioakeim, Papalexakis, Evangelos E., Searles, Elizabeth, Ho, Joyce, Sun, Jimeng
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is jointly modeling treatments across a set of patients with varying numbers of medical encounters, where the alignment of events in time bears no clinical meaning, and it may also be impossible to align them due to their varying length. Despite recent improvements in scaling up unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity, and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36x faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy.
Geometrical Insights for Implicit Generative Modeling
Bottou, Leon, Arjovsky, Martin, Lopez-Paz, David, Oquab, Maxime
Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences. In particular, we can establish surprising approximate global convergence guarantees for the $1$-Wasserstein distance, even when the parametric generator has a nonconvex parametrization.
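To make two of the criteria above concrete, the sketch below computes the 1-Wasserstein distance and an energy-distance statistic between two one-dimensional empirical samples. It assumes equal-size samples for the Wasserstein case and uses the convention 2E|X-Y| - E|X-X'| - E|Y-Y'| for the energy statistic (some authors take its square root); the function names and data are illustrative only.

```python
def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between equal-size 1D empirical samples.

    In one dimension the optimal coupling matches sorted points in order.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def mean_abs_diff(xs, ys):
    """Plug-in estimate of E|X - Y| over all pairs."""
    return sum(abs(a - b) for a in xs for b in ys) / (len(xs) * len(ys))

def energy_distance_1d(xs, ys):
    """Energy statistic 2E|X-Y| - E|X-X'| - E|Y-Y'| (not square-rooted)."""
    return (2.0 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))

xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]  # xs shifted by 1
w1 = wasserstein1_1d(xs, ys)
ed = energy_distance_1d(xs, ys)
```

For a pure shift, the 1-Wasserstein distance equals the shift size, while the energy statistic grows more slowly, a small illustration of the differing geometries the abstract refers to.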
A Minimax Surrogate Loss Approach to Conditional Difference Estimation
Goh, Siong Thye, Rudin, Cynthia
We present a new machine learning approach to estimate personalized treatment effects in the classical potential outcomes framework with binary outcomes. To overcome the problem that both treatment and control outcomes for the same unit are required for supervised learning, we propose surrogate loss functions that incorporate both treatment and control data. The new surrogates yield tighter bounds than the sum of losses for treatment and control groups. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting optimization problem requires the solution to only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used to handle nonlinear (also non-parametric) estimation. Statistical learning bounds for the framework and experimental results are also presented.
Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping
Cucuringu, Mihai, Tyagi, Hemant
Consider an unknown smooth function $f: [0,1]^d \rightarrow \mathbb{R}$, and say we are given $n$ noisy mod 1 samples of $f$, i.e., $y_i = (f(x_i) + \eta_i) \bmod 1$, for $x_i \in [0,1]^d$, where $\eta_i$ denotes the noise. Given the samples $(x_i,y_i)_{i=1}^{n}$, our goal is to recover smooth, robust estimates of the clean samples $f(x_i) \bmod 1$. We formulate a natural approach for solving this problem, which works with angular embeddings of the noisy mod 1 samples over the unit circle, inspired by the angular synchronization framework. This amounts to solving a smoothness regularized least-squares problem -- a quadratically constrained quadratic program (QCQP) -- where the variables are constrained to lie on the unit circle. Our approach is based on solving its relaxation, which is a trust-region sub-problem and hence solvable efficiently. We provide theoretical guarantees demonstrating its robustness to noise for adversarial, and random Gaussian and Bernoulli noise models. To the best of our knowledge, these are the first such theoretical results for this problem. We demonstrate the robustness and efficiency of our approach via extensive numerical simulations on synthetic data, along with a simple least-squares solution for the unwrapping stage, that recovers the original samples of $f$ (up to a global shift). It is shown to perform well at high levels of noise, when taking as input the denoised modulo $1$ samples. Finally, we also consider two other approaches for denoising the modulo 1 samples that leverage tools from Riemannian optimization on manifolds, including a Burer-Monteiro approach for a semidefinite programming relaxation of our formulation. For the two-dimensional version of the problem, which has applications in radar interferometry, we are able to solve instances of real-world data with a million sample points in under 10 seconds, on a personal laptop.
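The angular embedding mentioned above maps each mod 1 sample $y_i$ to the unit circle via $z_i = e^{2\pi \mathrm{i} y_i}$, so that values near 0 and near 1 become neighbors. The sketch below shows the embedding, its inverse, and a naive moving-average smoother on the circle. The smoother is only a stand-in to illustrate why the embedding helps; it is not the QCQP or trust-region relaxation described in the abstract, and all function names are assumptions.

```python
import cmath
import math

def embed(y):
    """Map a mod 1 value to the unit circle."""
    return cmath.exp(2j * math.pi * y)

def unembed(z):
    """Map a nonzero complex number back to a mod 1 value via its angle."""
    return (cmath.phase(z) / (2.0 * math.pi)) % 1.0

def circ_dist(a, b):
    """Wrap-around distance between two mod 1 values."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def smooth_mod1(ys, window=1):
    """Naive denoiser: average embedded neighbors, then read off the angle."""
    zs = [embed(y) for y in ys]
    out = []
    for i in range(len(zs)):
        lo, hi = max(0, i - window), min(len(zs), i + window + 1)
        avg = sum(zs[lo:hi]) / (hi - lo)
        # Fall back to the raw sample if the neighbors cancel out.
        out.append(unembed(avg) if abs(avg) > 1e-12 else ys[i])
    return out

# Samples straddling the wrap-around point (illustrative data only).
smoothed = smooth_mod1([0.98, 0.0, 0.02])
```

Averaging the raw values 0.98, 0.0, and 0.02 would give about 0.33, whereas averaging on the circle keeps the middle estimate at the wrap-around point, which is the motivation for working with the embedded samples.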
A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization
Milzarek, Andre, Xiao, Xiantao, Cen, Shicong, Wen, Zaiwen, Ulbrich, Michael
In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function. We assume that only noisy gradient and Hessian information of the smooth part of the objective function is available via calling stochastic first- and second-order oracles. The proposed method can be seen as a hybrid approach combining stochastic semismooth Newton steps and stochastic proximal gradient steps. Two inexact growth conditions are incorporated to monitor the convergence and the acceptance of the semismooth Newton steps, and it is shown that the algorithm converges globally to stationary points in expectation. Moreover, under standard assumptions and utilizing random matrix concentration inequalities, we prove that the proposed approach locally turns into a pure stochastic semismooth Newton method and converges r-superlinearly with high probability. We present numerical results and comparisons on $\ell_1$-regularized logistic regression and nonconvex binary classification that demonstrate the efficiency of our algorithm.
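The proximal gradient component of the hybrid method can be illustrated on an $\ell_1$-regularized problem, where the proximal operator is coordinate-wise soft-thresholding. The sketch below is deterministic and omits the semismooth Newton steps and growth conditions entirely; the quadratic smooth term, the step size, and the names are assumptions for illustration.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def proximal_gradient(grad, x0, lam, step, iters=100):
    """Minimize smooth(x) + lam * ||x||_1 with fixed-step proximal gradient."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Gradient step on the smooth part, then prox step on the l1 part.
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x

# Illustrative smooth part: 0.5 * ||x - b||^2, whose gradient is x - b.
b = [3.0, -0.5, 1.2]
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_hat = proximal_gradient(grad, [0.0, 0.0, 0.0], lam=1.0, step=0.5)
```

For this separable objective the minimizer is the soft-thresholded vector of b, so the second coordinate is driven exactly to zero, showing how the nonsmooth term induces sparsity.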
Bayesian Optimization for Dynamic Problems
Nyikosa, Favour M., Osborne, Michael A., Roberts, Stephen J.
We propose practical extensions to Bayesian optimization for solving dynamic problems. We model dynamic objective functions using spatiotemporal Gaussian process priors which capture all the instances of the functions over time. Our extensions to Bayesian optimization use the information learnt from this model to guide the tracking of a temporally evolving minimum. By exploiting temporal correlations, the proposed method also determines when to make evaluations, how fast to make those evaluations, and it induces an appropriate budget of steps based on the available information. Lastly, we evaluate our technique on synthetic and real-world problems.
A novel model-based heuristic for energy optimal motion planning for automated driving
Ajanovic, Zlatan, Stolz, Michael, Horn, Martin
Predictive motion planning is the key to achieving energy-efficient driving, which is one of the main benefits of automated driving. Researchers have been studying the planning of velocity trajectories, a simpler form of motion planning, for over a decade now, and many different methods are available. Dynamic programming has been shown to be the most common choice due to its numerical background and ability to include nonlinear constraints and models. Although planning of an optimal trajectory is done in a systematic way, dynamic programming does not use any knowledge about the considered problem to guide the exploration and therefore explores all possible trajectories. A* is a search algorithm that enables using knowledge about the problem to guide the exploration to the most promising solutions first. This knowledge has to be represented in the form of a heuristic function, which gives an optimistic estimate of the cost of transitioning to the final state; constructing such a function is not a straightforward task. This paper presents a novel heuristic incorporating air drag and auxiliary power as well as operational costs of the vehicle, in addition to the kinetic and potential energy and rolling resistance known in the literature. Furthermore, the optimal cruising velocity, which depends on vehicle aerodynamic properties and auxiliary power, is derived. Results are compared for different variants of the heuristic function, as well as with dynamic programming.
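A minimal sketch of A* search with an optimistic (admissible) heuristic is given below. The graph, edge costs, and heuristic values are hypothetical placeholders standing in for energy costs between driving states; the energy-based heuristic proposed in the paper is not reproduced here.

```python
import heapq

def a_star(graph, heuristic, start, goal):
    """Return (cost, path) of a cheapest path, or None if goal is unreachable.

    graph: dict mapping node -> {neighbor: edge_cost}
    heuristic: dict mapping node -> optimistic estimate of cost-to-go
    """
    # Heap entries: (cost so far + heuristic, cost so far, node, path).
    frontier = [(heuristic[start], 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry; a cheaper route to node was found later
        for nxt, w in graph.get(node, {}).items():
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic[nxt], new_cost, nxt, path + [nxt]),
                )
    return None

# Hypothetical state graph with made-up transition costs.
graph = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 6.0},
    "C": {"D": 2.0},
    "D": {},
}
# Heuristic values never exceed the true remaining cost (5, 3, 2, 0).
heuristic = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 0.0}
result = a_star(graph, heuristic, "A", "D")
```

Setting every heuristic value to zero reduces the search to uniform-cost exploration; a tighter optimistic estimate lets the search skip unpromising states, which is the motivation for designing a more informative heuristic.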