AITopics | Optimization


Optimization


Stochastic Convex Optimization with Multiple Objectives

Mahdavi, Mehrdad, Yang, Tianbao, Jin, Rong

Neural Information Processing Systems, Feb-14-2020, 16:26:37 GMT

In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and bounding the other objectives by appropriate thresholds. We first examine a two-stage exploration-exploitation algorithm, which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by the projected gradient method. Our second approach is an efficient primal-dual stochastic algorithm. It leverages the theory of Lagrangian methods in constrained optimization and attains the optimal convergence rate of $O(1/\sqrt{T})$ with high probability for general Lipschitz continuous objectives.
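As a rough illustration of the primal-dual idea, the sketch below runs stochastic gradient descent in the primal variable and projected gradient ascent in a Lagrange multiplier, then returns the averaged iterate. It is a generic Lagrangian scheme on a one-dimensional toy problem, not the paper's algorithm; the function name `primal_dual_sgd`, the toy objective and constraint, the noise level, and the step size are all assumptions made for this example.

```python
import random

def primal_dual_sgd(T=20000, gamma=1.0, noise=0.1, seed=0):
    """Toy problem: minimize (x - 2)^2 subject to x^2 <= gamma on [-3, 3]."""
    rng = random.Random(seed)
    eta = 1.0 / T ** 0.5          # step size of order 1/sqrt(T)
    x, lam = 0.0, 0.0             # primal variable and Lagrange multiplier
    x_sum = 0.0
    for _ in range(T):
        # Noisy first-order information for the objective and the constraint.
        g_obj = 2.0 * (x - 2.0) + rng.gauss(0.0, noise)
        g_con = 2.0 * x + rng.gauss(0.0, noise)
        violation = x * x - gamma + rng.gauss(0.0, noise)
        # Descent step on the Lagrangian in x, projected onto [-3, 3].
        x_next = min(3.0, max(-3.0, x - eta * (g_obj + lam * g_con)))
        # Ascent step in the multiplier, projected onto lam >= 0.
        lam = max(0.0, lam + eta * violation)
        x = x_next
        x_sum += x
    return x_sum / T              # averaged iterate

x_avg = primal_dual_sgd()
```

With `gamma=1.0` the constrained minimizer of this toy problem is x = 1, so `x_avg` should land near that value.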

multiple objective, optimization problem, stochastic convex optimization, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)


Efficient Optimization for Sparse Gaussian Process Regression

Cao, Yanshuai, Brubaker, Marcus A., Fleet, David J., Hertzmann, Aaron

Neural Information Processing Systems, Feb-14-2020, 16:26:04 GMT

We propose an efficient discrete optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates this inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in the training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in the discrete case and competitive results in the continuous case.

algorithm, efficient optimization, sparse gaussian process regression

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)


Robust Data-Driven Dynamic Programming

Hanasusanto, Grani Adiwena, Kuhn, Daniel

Neural Information Processing Systems, Feb-14-2020, 15:41:08 GMT

In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display a high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate.
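To make the kernel-regression step concrete, here is a minimal sketch of a Nadaraya-Watson estimate of a conditional expectation from observed transitions. It covers only the nominal (non-robust) estimate; the Gaussian kernel, the bandwidth, the scalar state, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def kernel_weights(s, starts, bandwidth):
    """Normalized Gaussian kernel weights of the query state s against sampled start states."""
    raw = [math.exp(-0.5 * ((s - si) / bandwidth) ** 2) for si in starts]
    total = sum(raw)  # assumes s is close enough to the data that this does not underflow to 0
    return [r / total for r in raw]

def conditional_expectation(s, transitions, value_fn, bandwidth=0.5):
    """Estimate E[value_fn(next state) | current state = s] from (state, next_state) pairs."""
    starts = [cur for cur, _ in transitions]
    weights = kernel_weights(s, starts, bandwidth)
    # The historical next states act as the evaluation points of the value function.
    return sum(w * value_fn(nxt) for w, (_, nxt) in zip(weights, transitions))

transitions = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
estimate = conditional_expectation(1.0, transitions, value_fn=lambda v: v)
```

A robust variant in the spirit of the abstract would replace the fixed weights with a worst case over weight vectors close to them; that step is not shown here.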

data-driven dp scheme, out-of-sample test, robust data-driven dynamic programming, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)


Safe Adaptive Importance Sampling

Stich, Sebastian U., Raj, Anant, Jaggi, Martin

Neural Information Processing Systems, Feb-14-2020, 15:13:25 GMT

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants, which use importance values defined by the complete gradient information that changes during optimization, enjoy favorable theoretical properties but are typically computationally infeasible. In this paper we propose an efficient approximation of gradient-based sampling, which is based on safe bounds on the gradient. The proposed sampling distribution is (i) provably the best sampling with respect to the given bounds, (ii) always better than uniform sampling and fixed importance sampling, and (iii) efficiently computable, in many applications at negligible extra cost. The proposed sampling scheme is generic and can easily be integrated into existing algorithms.
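The basic mechanics of importance sampling for gradient estimation can be sketched as follows. Sampling index i with probability p_i and rescaling its gradient by 1/(n p_i) keeps the estimate unbiased, while a good choice of p_i lowers its second moment. This example simply samples proportionally to assumed upper bounds on the gradient magnitudes; it is a simplified stand-in, not the optimal distribution derived in the paper, and the scalar gradients and bound values are made up for illustration.

```python
def sampling_probs(bounds):
    """Probabilities proportional to (assumed) upper bounds on the gradient magnitudes."""
    total = sum(bounds)
    return [b / total for b in bounds]

def expected_estimate(grads, probs):
    """Exact expectation of the reweighted single-sample gradient g_i / (n * p_i)."""
    n = len(grads)
    return sum(p * (g / (n * p)) for g, p in zip(grads, probs))

def second_moment(grads, probs):
    """Exact second moment of the same estimator; lower means less variance."""
    n = len(grads)
    return sum(p * (g / (n * p)) ** 2 for g, p in zip(grads, probs))

grads = [0.1, 0.2, 4.0, 0.1]      # hypothetical per-example scalar gradients
bounds = [0.2, 0.3, 5.0, 0.2]     # hypothetical safe upper bounds on |g_i|
uniform = [0.25] * 4
adaptive = sampling_probs(bounds)
```

In this toy example both distributions give the same expected estimate (the mean gradient), and the bound-proportional one has the smaller second moment. Plain proportional sampling does not carry the paper's guarantee of always beating uniform sampling.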

algorithm, application, safe adaptive importance sampling

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)


Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

Yu, Yaodong, Xu, Pan, Gu, Quanquan

Neural Information Processing Systems, Feb-14-2020, 14:58:42 GMT

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. This improves upon the $\tilde{O}(\epsilon^{-7/2})$ gradient complexity achieved by the state-of-the-art stochastic local minima finding algorithms by a factor of $\tilde{O}(\epsilon^{-1/6})$. Experiments on two nonconvex optimization problems demonstrate the effectiveness of our algorithm and corroborate our theory.

faster stochastic optimization algorithm, local minima, third-order smoothness help, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)


Model-Based Relative Entropy Stochastic Search

Abdolmaleki, Abbas, Lioutikov, Rudolf, Peters, Jan R., Lau, Nuno, Reis, Luis Paulo, Neumann, Gerhard

Neural Information Processing Systems, Feb-14-2020, 14:58:10 GMT

Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function.

algorithm, model-based relative entropy stochastic search, objective function

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)


Large-Scale Price Optimization via Network Flow

Ito, Shinji, Fujimaki, Ryohei

Neural Information Processing Systems, Feb-14-2020, 14:57:07 GMT

This paper deals with price optimization, that is, finding the pricing strategy that maximizes revenue or profit on the basis of demand forecasting models. Though recent advances in regression technologies have made it possible to reveal the price-demand relationships of multiple products, most existing price optimization methods, such as mixed integer programming formulations, cannot handle tens or hundreds of products because of their high computational costs. To cope with this problem, this paper proposes a novel approach based on network flow algorithms. We reveal a connection between supermodularity of the revenue and cross elasticity of demand. On the basis of this connection, we propose an efficient algorithm that employs network flow algorithms.

algorithm, large-scale price optimization, network flow, (3 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.44)

Technology:

Information Technology > Communications > Networks (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.44)


Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Loh, Po-Ling, Wainwright, Martin J.

Neural Information Processing Systems, Feb-14-2020, 14:43:57 GMT

We establish theoretical results concerning all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss function satisfies restricted strong convexity and the penalty function satisfies suitable regularity conditions, any local optimum of the composite objective function lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision $\epsilon$ in $\log(1/\epsilon)$ iterations, which is the fastest possible rate of any first-order method. We provide a variety of simulations to illustrate the sharpness of our theoretical predictions.
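For readers unfamiliar with composite gradient descent, the sketch below shows its standard form for a smooth loss plus an $\ell_1$ penalty: take a gradient step on the loss, then apply soft-thresholding. It uses the convex $\ell_1$ penalty on a separable quadratic loss, so it does not reproduce the nonconvex regularizers (SCAD, MCP) or the adaptation analyzed in the paper; the function names and toy data are assumptions for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def composite_gradient_descent(grad, x0, step, lam, iters=100):
    """Minimize loss(x) + lam * ||x||_1, given grad, the gradient of the smooth loss."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Gradient step on the smooth part, then the l1 proximal (shrinkage) step.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, g)]
    return x

b = [3.0, -0.5, 1.2]                                   # toy data
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]     # gradient of 0.5 * ||x - b||^2
x_hat = composite_gradient_descent(grad, [0.0, 0.0, 0.0], step=0.5, lam=1.0)
```

For this separable toy loss the minimizer is the soft-thresholded data, here (2.0, 0.0, 0.2), and the iterates contract toward it geometrically.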

local optima, regularized m-estimator, statistical and algorithmic theory, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.64)


The Physical Systems Behind Optimization Algorithms

Yang, Lin, Arora, Raman, Braverman, Vladimir, Zhao, Tuo

Neural Information Processing Systems, Feb-14-2020, 14:41:41 GMT

We use differential-equation-based approaches to provide physical insights into the dynamics of popular optimization algorithms in machine learning. In particular, we study gradient descent, proximal gradient descent, coordinate gradient descent, proximal coordinate gradient, and Newton's method, as well as their Nesterov-accelerated variants, in a unified framework motivated by a natural connection of optimization algorithms to physical systems. Our analysis is applicable to more general algorithms and optimization problems beyond convexity and strong convexity, e.g. Polyak-Łojasiewicz and error bound conditions (possibly nonconvex).
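The simplest instance of the link between optimization algorithms and differential equations is that gradient descent is a forward Euler discretization of the gradient flow dx/dt = -∇f(x). The sketch below checks this on the quadratic f(x) = k x²/2, whose gradient flow has the closed form x(t) = x(0) e^{-kt}. It illustrates only this first-order connection, not the paper's full framework; the function name and constants are assumptions.

```python
import math

def gradient_descent(grad, x0, step, n_steps):
    """Plain gradient descent; each update is one forward Euler step of dx/dt = -grad(x)."""
    x = x0
    for _ in range(n_steps):
        x = x - step * grad(x)
    return x

k = 1.0                                   # curvature of f(x) = 0.5 * k * x^2
step, n_steps = 0.01, 100
x_gd = gradient_descent(lambda x: k * x, x0=1.0, step=step, n_steps=n_steps)
x_flow = math.exp(-k * step * n_steps)    # exact gradient-flow solution at t = step * n_steps
```

With a small step size the two values agree closely, and the gap shrinks as the step size decreases with the total time held fixed.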

gradient descent, optimization algorithm, physical system, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Graphical Time Warping for Joint Alignment of Multiple Curves

Wang, Yizhi, Miller, David J., Poskanzer, Kira, Wang, Yue, Tian, Lin, Yu, Guoqiang

Neural Information Processing Systems, Feb-14-2020, 14:28:25 GMT

Dynamic time warping (DTW) is a fundamental technique in time series analysis for comparing one curve to another using a flexible time-warping function. However, it was designed to compare a single pair of curves. In many applications, such as in metabolomics and image series analysis, alignment is simultaneously needed for multiple pairs. Because the underlying warping functions are often related, independent application of DTW to each pair is a sub-optimal solution. Yet, it is largely unknown how to efficiently conduct a joint alignment with all warping functions simultaneously considered, since any given warping function is constrained by the others and dynamic programming cannot be applied.
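As background, classic pairwise DTW can be written as a short dynamic program over a cumulative cost table. The sketch below is the textbook single-pair version with an absolute-difference local cost, which is an assumption of this example; it is not the joint alignment method the paper proposes.

```python
def dtw_distance(a, b):
    """Minimal cumulative |a[i] - b[j]| cost over monotone alignments of sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i points of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # second curve holds the value 2 longer
shifted = dtw_distance([0, 0], [1, 1])
```

Because each pair is solved independently here, related warping functions cannot inform one another, which is the limitation the abstract describes.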

graphical time warping, joint alignment, multiple curve, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.46)
