
Collaborating Authors: Mangoubi, Oren


Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo

Neural Information Processing Systems

Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from high-dimensional distributions in statistics and machine learning. HMC is known to run very efficiently in practice, and its popular second-order "leapfrog" implementation has long been conjectured to run in $d^{1/4}$ gradient evaluations. Here we show that this conjecture is true when sampling from strongly log-concave target distributions that satisfy a weak third-order regularity property associated with the input data. Our regularity condition is weaker than the Lipschitz Hessian property and allows us to show faster convergence bounds for a much larger class of distributions than would be possible with the usual Lipschitz Hessian constant alone. Important distributions that satisfy our regularity condition include posterior distributions used in Bayesian logistic regression for which the data satisfies an "incoherence" property. Our result compares favorably with the best available bounds for the class of strongly log-concave distributions, which grow like $d^{1/2}$ gradient evaluations with the dimension. Moreover, our simulations on synthetic data suggest that, when our regularity condition is satisfied, leapfrog HMC performs better than its competitors, both in terms of accuracy and in terms of the number of gradient evaluations it requires.
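As context for the second-order discretization the abstract analyzes, here is a minimal sketch of leapfrog HMC in NumPy. The target distribution, step size, and trajectory length in the usage note are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Second-order 'leapfrog' discretization of Hamiltonian dynamics
    for potential U (negative log-density)."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)            # initial half step for momentum
    for _ in range(n_steps - 1):
        q += eps * p                      # full position step
        p -= eps * grad_U(q)              # full momentum step
    q += eps * p
    p -= 0.5 * eps * grad_U(q)            # final half step for momentum
    return q, p

def hmc_step(q, U, grad_U, eps, n_steps, rng):
    """One HMC transition: fresh Gaussian momentum, a leapfrog
    trajectory, then a Metropolis accept/reject correction."""
    p0 = rng.standard_normal(q.shape)
    q1, p1 = leapfrog(q, p0, grad_U, eps, n_steps)
    h0 = U(q) + 0.5 * p0 @ p0             # Hamiltonian before
    h1 = U(q1) + 0.5 * p1 @ p1            # Hamiltonian after
    return q1 if rng.random() < np.exp(h0 - h1) else q
```

For example, with `U(q) = 0.5 * q @ q` (a standard Gaussian target), running this chain produces samples whose empirical mean and variance match the target.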


Does Hamiltonian Monte Carlo mix faster than a random walk on multimodal densities?

arXiv.org Machine Learning

Hamiltonian Monte Carlo (HMC) is a very popular and generic collection of Markov chain Monte Carlo (MCMC) algorithms. One explanation for the popularity of HMC algorithms is their excellent performance as the dimension $d$ of the target becomes large: under conditions that are satisfied for many common statistical models, optimally-tuned HMC algorithms have a running time that scales like $d^{0.25}$. In stark contrast, the running time of the usual Random-Walk Metropolis (RWM) algorithm, optimally tuned, scales like $d$. This superior scaling of the HMC algorithm with dimension is attributed to the fact that, unlike RWM, it incorporates gradient information in the proposal distribution. In this paper, we investigate a different scaling question: does HMC beat RWM for highly $\textit{multimodal}$ targets? We find that the answer is often $\textit{no}$. We compute the spectral gaps of both algorithms for a specific class of multimodal target densities, and show that they are identical. The key reason is that, within one mode, the gradient is effectively ignorant about other modes, thus negating the advantage the HMC algorithm enjoys in unimodal targets. We also give heuristic arguments suggesting that the above observation may hold quite generally. Our main tool for answering this question is a novel simple formula for the conductance of HMC using Liouville's theorem. This result allows us to compute the spectral gap of HMC algorithms, for both the classical HMC with isotropic momentum and the recent Riemannian HMC, for multimodal targets.
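The observation that "within one mode, the gradient is effectively ignorant about other modes" is easy to check numerically. The sketch below uses an illustrative equal mixture of two unit Gaussians (not necessarily the paper's target class) and compares the mixture's log-density gradient near one mode with the gradient of that mode alone.

```python
import numpy as np

def grad_log_mixture(x, sep):
    """Gradient of the log-density of an equal two-Gaussian mixture
    with unit variances and modes at +/- sep/2."""
    a = np.exp(-0.5 * (x - sep / 2) ** 2)   # unnormalized weight of the right mode
    b = np.exp(-0.5 * (x + sep / 2) ** 2)   # unnormalized weight of the left mode
    return (-(x - sep / 2) * a - (x + sep / 2) * b) / (a + b)

def grad_log_single(x, mu):
    """Gradient of the log-density of a single unit Gaussian at mu."""
    return -(x - mu)
```

Evaluated half a unit away from the right mode with well-separated modes (e.g. `sep = 8`), the two gradients agree to machine precision: the left mode's contribution is exponentially small, so gradient-based proposals see only the local mode.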


Dimensionally Tight Running Time Bounds for Second-Order Hamiltonian Monte Carlo

arXiv.org Machine Learning

Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from a given high-dimensional distribution in statistics and machine learning. HMC is known to run very efficiently in practice, and its second-order variant was conjectured in 1988 to run in $d^{1/4}$ steps. Here we show that this conjecture is true when sampling from strongly log-concave target distributions that satisfy weak third-order regularity properties associated with the input data. This improves upon a recent result showing that the number of steps of the second-order discretization of HMC grows like $d^{1/4}$ under the much stronger assumption that the distribution is separable and its first four Fréchet derivatives are bounded. Our result also compares favorably with the best available running time bounds for the class of strongly log-concave distributions, namely the current best bounds for the overdamped Langevin, underdamped Langevin, and first-order HMC algorithms, which all grow like $d^{1/2}$ with the dimension. Key to our result is a new regularity condition for the Hessian that may be of independent interest. The class of distributions that satisfy this condition are natural and include posterior distributions used in Bayesian logistic "ridge" regression.
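Since the abstract names Bayesian logistic "ridge" regression as a distribution in the covered class, here is a minimal sketch of the potential (negative log-posterior) and its gradient that an HMC implementation would query at each leapfrog step. The prior precision `lam` and the label convention `y` in `{0, 1}` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def neg_log_posterior(theta, X, y, lam):
    """Negative log-posterior for Bayesian logistic 'ridge' regression:
    a logistic likelihood with labels y in {0, 1} plus a Gaussian prior
    of precision lam (the ridge term)."""
    z = X @ theta
    # logaddexp(0, z) = log(1 + e^z), computed stably
    return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * theta @ theta

def neg_log_posterior_grad(theta, X, y, lam):
    """Gradient of the potential above -- the quantity each gradient
    evaluation in the running-time bounds refers to."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # predicted probabilities
    return X.T @ (p - y) + lam * theta
```

The gradient can be sanity-checked against central finite differences of the potential on random data.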


Convex Optimization with Nonconvex Oracles

arXiv.org Machine Learning

In machine learning and optimization, one often wants to minimize a convex objective function $F$ but can only evaluate a noisy approximation $\hat{F}$ to it. Even though $F$ is convex, the noise may render $\hat{F}$ nonconvex, making the task of minimizing $F$ intractable in general. As a consequence, several works in theoretical computer science, machine learning and optimization have focused on coming up with polynomial time algorithms to minimize $F$ under conditions on the noise $F(x)-\hat{F}(x)$, such as its uniform boundedness, or on $F$, such as strong convexity. However, in many applications of interest, these conditions do not hold. Here we show that, if the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$, then there is a polynomial time algorithm to find an approximate minimizer of $F$. In particular, our result allows for unbounded noise and generalizes those of Applegate and Kannan, and Zhang, Liang and Charikar, who proved similar results for the bounded noise case, and that of Belloni et al., who assume that the noise grows in a very specific manner and that $F$ is strongly convex. Turning our result on its head, one may also view our algorithm as minimizing a nonconvex function $\hat{F}$ that is promised to be related to a convex function $F$ as above. Our algorithm is a "simulated annealing" modification of the stochastic gradient Langevin Markov chain that gradually decreases the temperature of the chain to approach the global minimizer. Analyzing such an algorithm for the unbounded noise model and a general convex function turns out to be challenging and requires several technical ideas that might be of independent interest in deriving non-asymptotic bounds for other simulated annealing based algorithms.
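A minimal sketch of such a temperature-decreasing Langevin chain follows. The `1/log(t+1)` cooling schedule and the toy noisy oracle in the usage note are illustrative assumptions; the paper's actual schedule and guarantees are not reproduced here.

```python
import numpy as np

def annealed_langevin(grad_hat, x0, n_steps, eta, rng):
    """Stochastic gradient Langevin chain with a decaying temperature,
    in the spirit of a simulated-annealing schedule: the injected noise
    lets early iterations escape spurious local minima of the noisy
    objective, while later iterations settle near a minimizer."""
    x = np.array(x0, dtype=float)
    for t in range(1, n_steps + 1):
        temp = 1.0 / np.log(t + 1)                  # assumed cooling schedule
        x = x - eta * grad_hat(x, rng)              # noisy gradient step
        x = x + np.sqrt(2 * eta * temp) * rng.standard_normal(x.shape)
    return x
```

As a toy check, minimizing $F(x) = \|x\|^2$ through an oracle whose gradient carries multiplicative noise drives the iterate from a distant start into a small neighborhood of the global minimizer at the origin.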