Goto

Collaborating Authors

 Optimization


SVRG meets SAGA: k-SVRG --- A Tale of Limited Memory

arXiv.org Machine Learning

In recent years, many variance reduced algorithms for empirical risk minimization have been introduced. In contrast to vanilla SGD, these methods converge linearly on strong convex problems. To obtain the variance reduction, current methods either require frequent passes over the full data to recompute gradients---without making any progress during this time (like in SVRG), or they require memory of the same size as the input problem (like SAGA). In this work, we propose k-SVRG, an algorithm that interpolates between those two extremes: it makes best use of the available memory and in turn does avoid full passes over the data without making progress. We prove linear convergence of k-SVRG on strongly convex problems and convergence to stationary points on non-convex problems. Numerical experiments show the effectiveness of our method.


Direct Runge-Kutta Discretization Achieves Acceleration

arXiv.org Machine Learning

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of the ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator. By increasing $s$, the convergence rate of our method approaches the optimal rate of $\mathcal{O}({N^{-2}})$. Furthermore, we introduce a new local flatness condition on the objective, according to which rates even faster than $\mathcal{O}(N^{-2})$ can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning, and it may be of broader independent interest. We provide numerical experiments that verify the theoretical rates predicted by our results.


Optimization in Machine Learning, Spring 2018

#artificialintelligence

A majority of machine learning algorithms minimize empirical risk by solving a convex or non-convex optimization. Structured predictors solve combinatorial optimizations, and their learning algorithms solve hybrid optimizations. And new approaches for stochastic optimization have become integral in modern deep learning methodology. Students who take this course will study the latest knowledge and foundational concepts on optimization in machine learning, including theoretical analyses of optimization-based learning algorithms, theoretical bounds of discrete optimization for structured prediction, and recent discoveries about non-convex optimization methods. Class meets Tuesday and Thursday from 12:30 PM to 1:45 PM in the New Classroom Building (NCB) 110A.


Optimal Transport on Discrete Domains

arXiv.org Artificial Intelligence

Inspired by the matching of supply to demand in logistical problems, the optimal transport (or Monge--Kantorovich) problem involves the matching of probability distributions defined over a geometric domain such as a surface or manifold. In its most obvious discretization, optimal transport becomes a large-scale linear program, which typically is infeasible to solve efficiently on triangle meshes, graphs, point clouds, and other domains encountered in graphics and machine learning. Recent breakthroughs in numerical optimal transport, however, enable scalability to orders-of-magnitude larger problems, solvable in a fraction of a second. Here, we discuss advances in numerical optimal transport that leverage understanding of both discrete and smooth aspects of the problem. State-of-the-art techniques in discrete optimal transport combine insight from partial differential equations (PDE) with convex analysis to reformulate, discretize, and optimize transportation problems. The end result is a set of theoretically-justified models suitable for domains with thousands or millions of vertices. Since numerical optimal transport is a relatively new discipline, special emphasis is placed on identifying and explaining open problems in need of mathematical insight and additional research.


Sample-to-Sample Correspondence for Unsupervised Domain Adaptation

arXiv.org Machine Learning

The assumption that training and testing samples are generated from the same distribution does not always hold for real-world machine-learning applications. The procedure of tackling this discrepancy between the training (source) and testing (target) domains is known as domain adaptation. We propose an unsupervised version of domain adaptation that considers the presence of only unlabelled data in the target domain. Our approach centers on finding correspondences between samples of each domain. The correspondences are obtained by treating the source and target samples as graphs and using a convex criterion to match them. The criteria used are first-order and second-order similarities between the graphs as well as a class-based regularization. We have also developed a computationally efficient routine for the convex optimization, thus allowing the proposed method to be used widely. To verify the effectiveness of the proposed method, computer simulations were conducted on synthetic, image classification and sentiment classification datasets. Results validated that the proposed local sample-to-sample matching method out-performs traditional moment-matching methods and is competitive with respect to current local domain-adaptation methods.



Equivalent Lipschitz surrogates for zero-norm and rank optimization problems

arXiv.org Machine Learning

This paper proposes a mechanism to produce equivalent Lipschitz surrogates for zero-norm and rank optimization problems by means of the global exact penalty for their equivalent mathematical programs with an equilibrium constraint (MPECs). Specifically, we reformulate these combinatorial problems as equivalent MPECs by the variational characterization of the zero-norm and rank function, show that their penalized problems, yielded by moving the equilibrium constraint into the objective, are the global exact penalization, and obtain the equivalent Lipschitz surrogates by eliminating the dual variable in the global exact penalty. These surrogates, including the popular SCAD function in statistics, are also difference of two convex functions (D.C.) if the function and constraint set involved in zero-norm and rank optimization problems are convex. We illustrate an application by designing a multi-stage convex relaxation approach to the rank plus zero-norm regularized minimization problem.


Adversarial Regression for Detecting Attacks in Cyber-Physical Systems

arXiv.org Artificial Intelligence

Attacks in cyber-physical systems (CPS) which manipulate sensor readings can cause enormous physical damage if undetected. Detection of attacks on sensors is crucial to mitigate this issue. We study supervised regression as a means to detect anomalous sensor readings, where each sensor's measurement is predicted as a function of other sensors. We show that several common learning approaches in this context are still vulnerable to \emph{stealthy attacks}, which carefully modify readings of compromised sensors to cause desired damage while remaining undetected. Next, we model the interaction between the CPS defender and attacker as a Stackelberg game in which the defender chooses detection thresholds, while the attacker deploys a stealthy attack in response. We present a heuristic algorithm for finding an approximately optimal threshold for the defender in this game, and show that it increases system resilience to attacks without significantly increasing the false alarm rate.


Safety-Aware Apprenticeship Learning

arXiv.org Artificial Intelligence

Apprenticeship learning (AL) is a kind of Learning from Demonstration techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting its learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.


Sparse Group Inductive Matrix Completion

arXiv.org Machine Learning

We consider the problem of inductive matrix completion under the assumption that many features are non-informative, which leads to row- and column-sparse structure of coefficient matrix. Under the additional assumption on the low rank of coefficient matrix we propose the matrix factorization framework with group-lasso regularization on parameter matrices. We suggest efficient optimization algorithm for the solution of the obtained problem. From theoretical point of view, we prove the oracle generalization bound on the expected error of matrix completion. Corresponding sample complexity bounds show the benefits of the proposed approach compared to competitors in the sparse problems. The experiments on synthetic and real-world datasets show the state-of-the-art efficiency of the proposed method.