
Collaborating Authors: Ozdaglar, Asuman


Escaping Saddle Points in Constrained Optimization

arXiv.org Machine Learning

In this paper, we focus on escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem if the convex set $\mathcal{C}$ is simple for a quadratic objective function. To be more precise, our results hold if one can find a $\rho$-approximate solution of a quadratic program subject to $\mathcal{C}$ in polynomial time, where $\rho<1$ is a positive constant that depends on the structure of the set $\mathcal{C}$. Under this condition, we show that the sequence of iterates generated by the proposed framework reaches an $(\epsilon,\gamma)$-second-order stationary point (SOSP) in at most $\mathcal{O}(\max\{\epsilon^{-2},\rho^{-3}\gamma^{-3}\})$ iterations. We further characterize the overall arithmetic operations needed to reach an SOSP when the convex set $\mathcal{C}$ can be written as a set of quadratic constraints. Finally, we extend our results to the stochastic setting and characterize the number of stochastic gradient and Hessian evaluations needed to reach an $(\epsilon,\gamma)$-SOSP.
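
A minimal sketch of the two-phase idea described above, under stated assumptions: projected gradient steps are taken while they make first-order progress, and otherwise an approximate minimizer of the local quadratic model over $\mathcal{C}$ is used to escape. Here $\mathcal{C}$ is taken to be a Euclidean ball (whose projection is trivial), crude random search stands in for the paper's $\rho$-approximate QP oracle, and all parameter values are illustrative; this is not the authors' exact method.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Euclidean projection onto C = {x : ||x||_2 <= r}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

def escape_saddle(grad, hess, x0, eps=1e-3, gamma=1e-3, eta=0.1,
                  radius=0.1, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = project_ball(np.asarray(x0, dtype=float))
    for _ in range(iters):
        g = grad(x)
        x_new = project_ball(x - eta * g)
        if np.linalg.norm(x_new - x) / eta > eps:
            x = x_new                        # first-order progress: keep going
            continue
        # Second-order phase: approximately minimize the quadratic model
        # q(u) = g.u + 0.5 u^T H u over feasible steps x + u in C.
        # Crude random search stands in for the rho-approximate QP oracle.
        H = hess(x)
        best_u, best_q = np.zeros_like(x), 0.0
        for _ in range(200):
            d = rng.standard_normal(x.shape)
            u = project_ball(x + radius * d / np.linalg.norm(d)) - x
            q = g @ u + 0.5 * (u @ H @ u)
            if q < best_q:
                best_u, best_q = u, q
        if best_q > -gamma:                  # no useful descent direction:
            return x                         # declare x an approximate SOSP
        x = project_ball(x + best_u)         # escape along negative curvature
    return x

# Toy use: f(x) = 0.5 x^T A x has a saddle at the origin of the unit ball.
A = np.diag([1.0, -1.0])
x = escape_saddle(lambda z: A @ z, lambda z: A, np.array([1e-6, 1e-6]))
print("escaped to:", x)                      # ends up near (0, +1) or (0, -1)
```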


Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs

arXiv.org Machine Learning

Semidefinite programs (SDPs) with equality constraints arise in many optimization and machine learning problems, such as Max-Cut, community detection, and robust PCA. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. To address this issue, Burer and Monteiro \cite{burer2003nonlinear} proposed reducing the dimension of the problem via a low-rank factorization and solving the resulting non-convex problem instead. It is well understood that the resulting non-convex problem acts as a reliable surrogate for the original SDP and can be efficiently solved using the block-coordinate maximization method. Despite its simplicity, remarkable success, and wide use in practice, the theoretical understanding of the convergence of this method is limited. We prove that the block-coordinate maximization algorithm applied to the non-convex Burer-Monteiro formulation enjoys a global sublinear convergence rate without any assumptions on the problem, and a local linear convergence rate even though no local maximum of the problem is locally strongly concave. We illustrate our results through examples and numerical experiments.
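
As an illustration of the method being analyzed, the sketch below runs cyclic block-coordinate maximization on the Burer-Monteiro factorization of the Max-Cut SDP $\max \langle C, X\rangle$ s.t. $\mathrm{diag}(X)=\mathbf{1}$, $X \succeq 0$, with $X = VV^T$ and unit-norm rows $v_i$; each block update has the closed form $v_i \leftarrow g_i/\|g_i\|$ with $g_i = \sum_{j\neq i} C_{ij} v_j$. The rank choice and the random cost matrix are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def bcm_burer_monteiro(C, k, sweeps=100, seed=0):
    """Cyclic block-coordinate maximization of <C, V V^T> over unit rows."""
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n, k))
    V /= np.linalg.norm(V, axis=1, keepdims=True)    # feasible start
    for _ in range(sweeps):
        for i in range(n):                           # one cyclic sweep
            g = C[i] @ V - C[i, i] * V[i]            # sum_{j != i} C_ij v_j
            nrm = np.linalg.norm(g)
            if nrm > 0:
                V[i] = g / nrm                       # exact block maximizer
    return V

# Illustrative use on a random symmetric cost matrix:
n = 50
M = np.random.default_rng(1).standard_normal((n, n))
C = (M + M.T) / 2
V = bcm_burer_monteiro(C, k=int(np.ceil(np.sqrt(2 * n))))
print("objective <C, VV^T> =", np.trace(C @ V @ V.T))
```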


When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

Neural Information Processing Systems

The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in the selection of update coordinates. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples, and more generally problem classes, for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
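
A small experiment in the spirit of this comparison (a sketch, not one of the paper's worst-case instances): exact coordinate minimization of $f(x) = \frac{1}{2}x^T A x - b^T x$ under a cyclic versus a uniformly random coordinate order, with an illustrative positive definite $A$.

```python
import numpy as np

def coord_descent(A, b, order_fn, epochs=200, seed=0):
    """Exact coordinate minimization of f(x) = 0.5 x^T A x - b^T x."""
    n = len(b)
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    x_star = np.linalg.solve(A, b)
    for _ in range(epochs):
        for i in order_fn(n, rng):
            x[i] += (b[i] - A[i] @ x) / A[i, i]   # exact 1-D minimizer
    return np.linalg.norm(x - x_star)             # distance to the optimum

cyclic = lambda n, rng: range(n)                  # CCD: fixed order 0..n-1
randomized = lambda n, rng: rng.integers(0, n, size=n)  # RCD: n random picks

n = 20
M = np.random.default_rng(2).standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                       # positive definite Hessian
b = np.ones(n)
print("CCD final error:", coord_descent(A, b, cyclic))
print("RCD final error:", coord_descent(A, b, randomized))
```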


Computing the Stationary Distribution Locally

Neural Information Processing Systems

Computing the stationary distribution of a Markov chain (MC) with a large finite or countably infinite state space has become central to many problems, such as statistical inference and network analysis. Standard methods involve large matrix multiplications, as in power iteration, or simulations of long random walks to sample states from the stationary distribution, as in Markov chain Monte Carlo (MCMC). However, these methods are computationally costly: either they involve operations at every state, or their computation time scales at least linearly in the size of the state space. In this paper, we provide a novel algorithm that answers whether a chosen state in an MC has stationary probability larger than some $\Delta \in (0,1)$ and, if so, estimates that stationary probability. Our algorithm uses information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. We provide correctness and convergence guarantees that depend on the algorithm parameters and the mixing properties of the MC. Simulation results on example MCs show that the method gives tight estimates.
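
The sketch below illustrates the local flavor of this approach, under assumptions: it estimates $\pi(i)$ as the reciprocal of the expected return time to state $i$ from truncated random-walk samples, truncating at a horizon of order $1/\Delta$. The truncation schedule, sample count, and toy chain are illustrative choices, not the paper's exact algorithm or parameters.

```python
import numpy as np

def local_stationary_estimate(P, i, delta=0.05, samples=1000, seed=0):
    """Estimate pi(i) as 1 / E[T_i] from truncated return-time samples."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    theta = int(10 * np.ceil(1.0 / delta))   # truncation horizon ~ 1/Delta
    total = 0.0
    for _ in range(samples):
        s, t = i, 0
        while t < theta:                     # walk until return or truncation
            s = rng.choice(n, p=P[s])
            t += 1
            if s == i:
                break
        total += t                           # (truncated) return time to i
    est = samples / total                    # pi(i) ~= 1 / E[T_i]
    return est, est > delta                  # estimate and threshold answer

# Toy 3-state chain:
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
print(local_stationary_estimate(P, 0))
```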


A Fast Distributed Proximal-Gradient Method

arXiv.org Machine Learning

We present a distributed proximal-gradient method for optimizing the average of convex functions, each of which is the private local objective of an agent in a network with time-varying topology. The local objectives have distinct differentiable components, but they share a common nondifferentiable component that has a favorable structure suitable for effective computation of the proximal operator. In our method, each agent iteratively updates its estimate of the global minimum by optimizing its local objective function and exchanging estimates with others via communication over the network. Using Nesterov-type acceleration techniques and multiple communication steps per iteration, we show that this method converges at rate $1/k$ (where $k$ is the number of communication rounds between the agents), which is faster than the convergence rates of existing distributed methods for solving this problem. The superior convergence rate of our method is also verified by numerical experiments.
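
A schematic sketch of the kind of iteration described above (without the Nesterov-type acceleration the paper uses): each agent takes a gradient step on its private smooth term, the agents run several consensus-averaging communication steps, and the shared proximal operator (here $\ell_1$ soft-thresholding, an assumed choice) is applied. The doubly stochastic weight matrix W and the quadratic local losses are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    """Prox of lam * ||x||_1 (the shared nonsmooth term assumed here)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def distributed_prox_grad(grads, W, d, lam=0.1, eta=0.05,
                          rounds=200, consensus_steps=3):
    m = len(grads)                            # number of agents
    X = np.zeros((m, d))                      # row i: agent i's estimate
    for _ in range(rounds):
        G = np.stack([g(X[i]) for i, g in enumerate(grads)])
        X = X - eta * G                       # local gradient steps
        for _ in range(consensus_steps):      # multiple communication steps
            X = W @ X                         # consensus averaging
        X = soft_threshold(X, eta * lam)      # prox of the common term
    return X.mean(axis=0)

# Illustrative setup: 4 agents on a ring, local losses 0.5 * ||x - c_i||^2.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])      # doubly stochastic weights
centers = np.random.default_rng(3).standard_normal((4, 3))
grads = [(lambda x, c=c: x - c) for c in centers]
print("consensus estimate:", distributed_prox_grad(grads, W, d=3))
```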