AITopics

2411.00568

Country:

Europe (0.67)
North America > United States (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Banking & Finance (0.67)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning, Nov-28-2023

Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm

Karimi, Mohammad Reza, Hsieh, Ya-Ping, Krause, Andreas

Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schrödinger equation" of (Claisse et al. 2023).
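For orientation, the classical discrete Sinkhorn iterates mentioned in the abstract can be sketched in a few lines of plain Python. This is only the textbook alternating-scaling scheme for two discrete marginals, not the continuous-time flow or the noise-robust variants the paper introduces; the function name `sinkhorn`, the toy marginals, the cost matrix, and the regularization strength `eps` are all assumptions chosen for illustration.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Textbook Sinkhorn iterations for entropy-regularized discrete OT."""
    n, m = len(a), len(b)
    # Gibbs kernel built from the cost matrix and the regularization strength.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling: make the plan's row sums match the marginal a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: make the plan's column sums match the marginal b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy problem: two points on each side, unit cost off the diagonal.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After enough iterations, `row_sums` and `col_sums` should agree with `a` and `b` up to numerical tolerance, which is the fixed-point property the alternating scaling is designed to reach.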

algorithm, artificial intelligence, machine learning, (17 more...)

2311.16706

Country: North America > United States (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Artificial Intelligence, Nov-4-2023

Riemannian stochastic optimization methods avoid strict saddle points

Hsieh, Ya-Ping, Karimi, Mohammad Reza, Krause, Andreas, Mertikopoulos, Panayotis

Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.
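As a rough illustration of the setting, the sketch below runs a retraction-based stochastic gradient method on the unit sphere for the objective f(x) = -x^T A x, whose minimizers are the leading eigenvectors of A. The matrix `A`, the constant step size, the Gaussian noise model, and the choice of starting exactly at a strict saddle point are assumptions made for this example; they are not taken from the paper, and the paper's precise conditions on step sizes and noise are not reproduced here.

```python
import math
import random

def riemannian_sgd_sphere(A, x0, step=0.05, noise_std=0.1, n_iters=2000, seed=0):
    """Retraction-based stochastic gradient descent for f(x) = -x^T A x on the unit sphere."""
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    for _ in range(n_iters):
        Ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(d)]
        # Noisy Euclidean gradient of f, standing in for a stochastic oracle.
        g = [-2.0 * Ax[i] + rng.gauss(0.0, noise_std) for i in range(d)]
        # Project onto the tangent space at x to get a Riemannian gradient estimate.
        inner = sum(g[i] * x[i] for i in range(d))
        rg = [g[i] - inner * x[i] for i in range(d)]
        # Retraction: step in the ambient space, then renormalize back to the sphere.
        y = [x[i] - step * rg[i] for i in range(d)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    return x

# Hypothetical diagonal matrix: e1 is the minimizer direction, e2 is a strict saddle.
A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
x_final = riemannian_sgd_sphere(A, x0=[0.0, 1.0, 0.0])
```

Although the run is initialized at the saddle point e2, where the exact Riemannian gradient vanishes, the oracle noise perturbs the iterate and the dynamics then carry it toward the e1 direction, so `abs(x_final[0])` ends up close to 1 in this toy run.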

artificial intelligence, machine learning, saddle point, (18 more...)

2311.02374

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Artificial Intelligence, Mar-13-2023

A Dynamical System View of Langevin-Based Non-Convex Sampling

Karimi, Mohammad Reza, Hsieh, Ya-Ping, Krause, Andreas

Non-convex sampling is a key challenge in machine learning, central to non-convex optimization in deep learning as well as to approximate probabilistic inference. Despite its significance, theoretically there remain many important challenges: Existing guarantees (1) typically only hold for the averaged iterates rather than the more desirable last iterates, (2) lack convergence metrics that capture the scales of the variables such as Wasserstein distances, and (3) mainly apply to elementary schemes such as stochastic gradient Langevin dynamics. In this paper, we develop a new framework that lifts the above issues by harnessing several tools from the theory of dynamical systems. Our key result is that, for a large class of state-of-the-art sampling schemes, their last-iterate convergence in Wasserstein distances can be reduced to the study of their continuous-time counterparts, which is much better understood. Coupled with standard assumptions of MCMC sampling, our theory immediately yields the last-iterate Wasserstein convergence of many advanced sampling schemes such as proximal, randomized mid-point, and Runge-Kutta integrators. Beyond existing methods, our framework also motivates more efficient schemes that enjoy the same rigorous guarantees.
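To make the kind of scheme under discussion concrete, here is a minimal sketch of the most elementary Langevin sampler: an Euler-Maruyama discretization of the Langevin diffusion with exact gradients, applied to a one-dimensional double-well potential. The potential U(x) = (x^2 - 1)^2, the step size, and the chain length are assumptions chosen for illustration; the proximal, randomized mid-point, and Runge-Kutta integrators named in the abstract are not implemented here.

```python
import math
import random

def grad_potential(x):
    # U(x) = (x^2 - 1)^2 is a non-convex double well with minima at x = -1 and x = +1.
    return 4.0 * x * (x * x - 1.0)

def langevin_chain(x0=0.0, step=0.01, n_steps=20000, seed=0):
    """Unadjusted Langevin iterates targeting the density proportional to exp(-U)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Euler-Maruyama step for dX = -U'(X) dt + sqrt(2) dW.
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

samples = langevin_chain()
```

Because the target is bimodal, a useful sanity check on the chain is that it visits both wells rather than staying near its starting mode.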

artificial intelligence, convergence, machine learning, (17 more...)

2210.13867

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial Intelligence, Feb-25-2023

Isotropic Gaussian Processes on Finite Spaces of Graphs

Borovitskiy, Viacheslav, Karimi, Mohammad Reza, Somnath, Vignesh Ram, Krause, Andreas

We propose a principled way to define Gaussian process priors on various sets of unweighted graphs: directed or undirected, with or without loops. We endow each of these sets with a geometric structure, inducing the notions of closeness and symmetries, by turning them into a vertex set of an appropriate metagraph. Building on this, we describe the class of priors that respect this structure and are analogous to the Euclidean isotropic processes, like squared exponential or Matérn. We propose an efficient computational technique for the ostensibly intractable problem of evaluating these priors' kernels, making such Gaussian processes usable within the usual toolboxes and downstream applications. We go further to consider sets of equivalence classes of unweighted graphs and define the appropriate versions of priors thereon. We prove a hardness result, showing that in this case, exact kernel computation cannot be performed efficiently. However, we propose a simple Monte Carlo approximation for handling moderately sized cases. Inspired by applications in chemistry, we illustrate the proposed techniques on a real molecular property prediction task in the small data regime.

artificial intelligence, gaussian process, machine learning, (19 more...)

2211.01689

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

arXiv.org Artificial Intelligence, Dec-27-2022

Riemannian stochastic approximation algorithms

Karimi, Mohammad Reza, Hsieh, Ya-Ping, Mertikopoulos, Panayotis, Krause, Andreas

We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this difficulty by introducing a suitable Fermi coordinate frame which allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM) algorithms under study to that of an associated deterministic dynamical system. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, despite the significant complications that arise due to the curvature and topology of the underlying manifold. We showcase the flexibility of the proposed framework by applying it to a range of retraction-based variants of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence.
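For readers unfamiliar with the Euclidean baseline that the abstract refers to, the following is a minimal sketch of a classical Robbins-Monro scheme on the real line: it seeks a root of g(x) = x - 2 using only noisy evaluations and step sizes 1/(n + 1). The target function, the noise level, and the helper name `robbins_monro` are assumptions for illustration; the Riemannian construction with Fermi coordinates described in the paper is not attempted here.

```python
import random

def robbins_monro(noisy_g, x0=0.0, n_iters=20000):
    """Euclidean Robbins-Monro iteration x <- x - a_n * (noisy evaluation of g at x)."""
    x = x0
    for n in range(n_iters):
        # Step sizes 1/(n+1): their sum diverges while the sum of squares converges.
        step = 1.0 / (n + 1)
        x = x - step * noisy_g(x)
    return x

# Hypothetical oracle: g(x) = x - 2 observed with additive unit-variance Gaussian noise.
rng = random.Random(0)
root = robbins_monro(lambda x: (x - 2.0) + rng.gauss(0.0, 1.0))
```

With this particular step-size choice the iterate reduces to 2 minus the running average of the noise, so `root` lands near the true root 2 and the error shrinks as the number of iterations grows.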

artificial intelligence, machine learning, manifold, (19 more...)

2206.06795

Country:

North America > United States (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine Learning, Oct-21-2020

Online Active Model Selection for Pre-trained Classifiers

Karimi, Mohammad Reza, Gürel, Nezihe Merve, Karlaš, Bojan, Rausch, Johannes, Zhang, Ce, Krause, Andreas

Model selection from a set of pre-trained models is an emerging problem in machine learning and has implications in several practical scenarios. Industrial examples include cases in which a telecommunication company or a flight booking company has multiple ML models trained over different sliding windows of data and hopes to pick the one that performs best on a given day. For many real-world problems, unlabeled data is abundant and can be inexpensively collected, while labels are expensive to acquire and require human expertise. Consequently, there is a need to robustly identify the best model under limited labeling resources. Similarly, one often needs reasonable predictions for the unlabeled data while keeping the labeling budget low. Depending on data availability, one can consider two settings: the pool-based setting assumes that the learner has access to a pool of unlabeled data, and she tries to select informative data samples from the pool to achieve her task. The online (streaming) setting works with a stream of data, where the samples arrive one at a time, and the learner decides on the go whether to ask for the label of each sample or to just throw it away. While offering fewer options on which data to label next, this streaming setting alleviates the scalability challenge of storing and processing a large pool of examples in the pool-based setting. Another important aspect is the nature of the data: the instance/label pairs might be sampled i.i.d.

big data, probability, telecommunications, (20 more...)

2010.09818

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Data Science > Data Mining > Big Data (0.46)

arXiv.org Machine Learning, Nov-5-2017

Stochastic Submodular Maximization: The Case of Coverage Functions

Karimi, Mohammad Reza, Lucic, Mario, Hassani, Hamed, Krause, Andreas

Stochastic optimization of continuous objectives is at the heart of modern machine learning. However, many important problems are of discrete nature and often involve submodular objectives. We seek to unleash the power of stochastic continuous optimization, namely stochastic gradient descent and its variants, to such discrete problems. We first introduce the problem of stochastic submodular optimization, where one needs to optimize a submodular objective which is given as an expectation. Our model captures situations where the discrete objective arises as an empirical risk (e.g., in the case of exemplar-based clustering), or is given as an explicit stochastic model (e.g., in the case of influence maximization in social networks). By exploiting that common extensions act linearly on the class of submodular functions, we employ projected stochastic gradient ascent and its variants in the continuous domain, and perform rounding to obtain discrete solutions. We focus on the rich and widely used family of weighted coverage functions. We show that our approach yields solutions that are guaranteed to match the optimal approximation guarantees, while reducing the computational cost by several orders of magnitude, as we demonstrate empirically.
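The pipeline described above (relax, run projected stochastic gradient ascent, then round) can be sketched on a toy weighted coverage instance. This sketch assumes a standard concave surrogate, the sum over elements u of w_u * min(1, sum of x_i over sets covering u); whether this matches the paper's exact construction is an assumption. Naive top-k rounding stands in for a proper rounding scheme, and the instance, step size, and all function names are hypothetical.

```python
import random

def project_capped_simplex(y, k):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum(x) <= k} via bisection."""
    clip = lambda t: min(1.0, max(0.0, t))
    if sum(clip(v) for v in y) <= k:
        return [clip(v) for v in y]
    lo, hi = 0.0, max(y)
    for _ in range(60):
        tau = 0.5 * (lo + hi)
        if sum(clip(v - tau) for v in y) > k:
            lo = tau
        else:
            hi = tau
    return [clip(v - hi) for v in y]

def stochastic_coverage_ascent(sets, weights, k, step=0.01, n_iters=3000, seed=0):
    """Projected stochastic supergradient ascent on the concave coverage surrogate."""
    rng = random.Random(seed)
    n, elems = len(sets), sorted(weights)
    x = [k / n] * n
    for _ in range(n_iters):
        # Sample one element; its rescaled supergradient is unbiased for the full sum.
        u = rng.choice(elems)
        covering = [i for i in range(n) if u in sets[i]]
        grad = [0.0] * n
        if sum(x[i] for i in covering) < 1.0:
            for i in covering:
                grad[i] = len(elems) * weights[u]
        x = project_capped_simplex([x[i] + step * grad[i] for i in range(n)], k)
    return x

def coverage_value(chosen, sets, weights):
    covered = set().union(*(sets[i] for i in chosen))
    return sum(weights[u] for u in covered)

# Hypothetical instance: four candidate sets over six unit-weight elements, budget k = 2.
sets = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1}]
weights = {u: 1.0 for u in range(6)}
k = 2
x = stochastic_coverage_ascent(sets, weights, k)
# Naive rounding: keep the k sets with the largest fractional values.
chosen = sorted(sorted(range(len(sets)), key=lambda i: -x[i])[:k])
value = coverage_value(chosen, sets, weights)
```

On this instance the fractional solution concentrates on the two large disjoint sets, so the rounded selection covers all six elements. Each iteration touches only the sets covering one sampled element, which is the source of the per-step savings that stochastic methods offer over evaluating the full objective.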

algorithm, artificial intelligence, optimization problem, (16 more...)

1711.01566

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

arXiv.org Machine Learning, May-22-2016

Smart broadcasting: Do you want to be seen?

Karimi, Mohammad Reza, Tavakoli, Erfan, Farajtabar, Mehrdad, Song, Le, Gomez-Rodriguez, Manuel

Many users in online social networks are constantly trying to gain attention from their followers by broadcasting posts to them. These broadcasters are likely to gain greater attention if their posts can remain visible for a longer period of time among their followers' most recent feeds. Then when to post? In this paper, we study the problem of smart broadcasting using the framework of temporal point processes, where we model users' feeds and posts as discrete events occurring in continuous time. Based on such a continuous-time model, choosing a broadcasting strategy for a user becomes a problem of designing the conditional intensity of her posting events. We derive a novel formula which links this conditional intensity with the visibility of the user in her followers' feeds. Furthermore, by exploiting this formula, we develop an efficient convex optimization framework for the when-to-post problem. Our method can find broadcasting strategies that reach a desired visibility level with provable guarantees. We experimented with data gathered from Twitter, and show that our framework can consistently make broadcasters' posts more visible than alternatives.

broadcaster, optimization problem, social media, (20 more...)

1605.06855

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)