AITopics

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent {\em from random initialization}.

artificial intelligence, local minima, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Laurent, Thomas, Brecht, James von, Bresson, Xavier, szlam, arthur

The Product Cut

We introduce a theoretical and algorithmic framework for multi-way graph partitioning that relies on a multiplicative cut-based objective. We refer to this objective as the Product Cut. We provide a detailed investigation of the mathematical properties of this objective and an effective algorithm for its optimization. The proposed model has strong mathematical underpinnings, and the corresponding algorithm achieves state-of-the-art performance on benchmark data sets.

artificial intelligence, machine learning, partition, (18 more...)

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Suzuki, Taiji, Kanagawa, Heishiro, Kobayashi, Hayato, Shimizu, Nobuyuki, Tagami, Yukihiro

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning

We investigate the statistical performance and computational efficiency of the alternating minimization procedure for nonparametric tensor learning. Tensor modeling has been widely used for capturing the higher order relations between multimodal data sources. In addition to a linear model, a nonlinear tensor model has been received much attention recently because of its high flexibility. We consider an alternating minimization procedure for a general nonlinear model where the true function consists of components in a reproducing kernel Hilbert space (RKHS). In this paper, we show that the alternating minimization method achieves linear convergence as an optimization algorithm and that the generalization error of the resultant estimator yields the minimax optimality. We apply our algorithm to some multitask learning problems and show that the method actually shows favorable performances.

artificial intelligence, machine learning, minimization procedure, (17 more...)

Country:

Europe (0.46)
Asia > Japan > Honshū > Kantō (0.14)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

Woodworth, Blake E., Srebro, Nati

Tight Complexity Bounds for Optimizing Composite Objectives

We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smoothing that improve over methods using just gradient accesses.

algorithm, artificial intelligence, machine learning, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Arjevani, Yossi, Shamir, Ohad

Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure. However, whereas much progress has been made in developing faster algorithms for this setting, the inherent limitations of these problems are not satisfactorily addressed by existing lower bounds. Indeed, current bounds focus on first-order optimization algorithms, and only apply in the often unrealistic regime where the number of iterations is less than $\cO(d/n)$ (where $d$ is the dimension and $n$ is the number of samples). In this work, we extend the framework of Arjevani et al. \cite{arjevani2015lower,arjevani2016iteration} to provide new lower bounds, which are dimension-free, and go beyond the assumptions of current bounds, thereby covering standard finite sum optimization methods, e.g., SAG, SAGA, SVRG, SDCA without duality, as well as stochastic coordinate-descent methods, such as SDCA and accelerated proximal SDCA.

algorithm, artificial intelligence, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reconstructing Parameters of Spreading Models from Partial Observations

Lokhov, Andrey

Spreading processes are often modelled as a stochastic dynamics occurring on top of a given network with edge weights corresponding to the transmission probabilities. Knowledge of veracious transmission probabilities is essential for prediction, optimization, and control of diffusion dynamics. Unfortunately, in most cases the transmission rates are unknown and need to be reconstructed from the spreading data. Moreover, in realistic settings it is impossible to monitor the state of each node at every time, and thus the data is highly incomplete. We introduce an efficient dynamic message-passing algorithm, which is able to reconstruct parameters of the spreading model given only partial information on the activation times of nodes in the network. The method is generalizable to a large class of dynamic models, as well to the case of temporal graphs.

algorithm, artificial intelligence, machine learning, (19 more...)

Country: North America > United States (1.00)

Industry:

Transportation (1.00)
Health & Medicine (0.95)
Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Lindgren, Erik, Wu, Shanshan, Dimakis, Alexandros G.

Leveraging Sparsity for Efficient Submodular Data Summarization

The facility location problem is widely used for summarizing large datasets and has additional applications in sensor placement, image retrieval, and clustering. One difficulty of this problem is that submodular optimization algorithms require the calculation of pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed to only calculate nearest neighbor benefits. One limitation is that several strong assumptions were invoked to obtain provable approximation guarantees. In this paper we establish that these extra assumptions are not necessary--solving the sparsified problem will be almost optimal under the standard assumptions of the problem. We then analyze a different method of sparsification that is a better model for methods such as Locality Sensitive Hashing to accelerate the nearest neighbor computations and extend the use of the problem to a broader family of similarities. We validate our approach by demonstrating that it rapidly generates interpretable summaries.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.46)

Industry: Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)

Erdogdu, Murat A., Dicker, Lee H., Bayati, Mohsen

Scaled Least Squares Estimator for GLMs in Large-Scale Problems

We study the problem of efficiently estimating the coefficients of generalized linear models (GLMs) in the large-scale setting where the number of observations n is much larger than the number of predictors p, i.e. n p 1. We show that in GLMs with random (not necessarily Gaussian) design, the GLM coefficients are approximately proportional to the corresponding ordinary least squares (OLS) coefficients. Usingthis relation, we design an algorithm that achieves the same accuracy as the maximum likelihood estimator (MLE) through iterations that attain up to a cubic convergence rate, and that are cheaper than any batch optimization algorithm by at least a factor of O(p). We provide theoretical guarantees for our algorithm, and analyze the convergence behavior in terms of data dimensions. Finally, we demonstrate the performance of our algorithm through extensive numerical studies on large-scale real and synthetic datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms.

artificial intelligence, estimator, machine learning, (18 more...)

Country: Europe (0.46)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Horel, Thibaut, Singer, Yaron

Maximization of Approximately Submodular Functions

We study the problem of maximizing a function that is approximately submodular under a cardinality constraint. Approximate submodularity implicitly appears in a wide range of applications as in many cases errors in evaluation of a submodular function break submodularity. Say that $F$ is $\eps$-approximately submodular if there exists a submodular function $f$ such that $(1-\eps)f(S) \leq F(S)\leq (1+\eps)f(S)$ for all subsets $S$. We are interested in characterizing the query-complexity of maximizing $F$ subject to a cardinality constraint $k$ as a function of the error level $\eps > 0$. We provide both lower and upper bounds: for $\eps > n^{-1/2}$ we show an exponential query-complexity lower bound. In contrast, when $\eps < {1}/{k}$ or under a stronger bounded curvature assumption, we give constant approximation algorithms.

information retrieval, machine learning, natural language, (19 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)

Xu, Peng, Yang, Jiyan, Roosta-Khorasani, Farbod, Ré, Christopher, Mahoney, Michael W.

Sub-sampled Newton Methods with Non-uniform Sampling

We consider the problem of finding the minimizer of a convex function $F: \mathbb R^d \rightarrow \mathbb R$ of the form $F(w) \defeq \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available.We consider the regime where $n \gg d$. We propose randomized Newton-type algorithms that exploit \textit{non-uniform} sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity, and are applicable to a wide range of problems in machine learning. Two non-uniform sampling distributions based on {\it block norm squares} and {\it block partial leverage scores} are considered. Under certain assumptions, we show that our algorithms inherit a linear-quadratic convergence rate in $w$ and achieve a lower computational complexity compared to similar existing methods. In addition, we show that our algorithms exhibit more robustness and better dependence on problem specific quantities, such as the condition number. We numerically demonstrate the advantages of our algorithms on several real datasets.

artificial intelligence, leverage score, machine learning, (19 more...)

Country: North America > United States > California (0.14)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)