AITopics | Mathematical & Statistical Methods

Mathematical & Statistical Methods

SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

Neural Information Processing Systems, Oct-9-2024, 14:53:18 GMT

Stein Variational Gradient Descent (SVGD), a popular sampling algorithm, is often described as the kernelized gradient flow for the Kullback-Leibler divergence in the geometry of optimal transport. We introduce a new perspective on SVGD that instead views SVGD as the kernelized gradient flow of the chi-squared divergence. Motivated by this perspective, we provide a convergence analysis of the chi-squared gradient flow. We also show that our new perspective provides better guidelines for choosing effective kernels for SVGD.
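
For orientation, the standard SVGD particle update that the abstract refers to can be sketched in one dimension. This is a minimal illustration, not the paper's chi-squared analysis: the RBF kernel, fixed bandwidth `h`, step size, standard normal target, and the names `svgd_step` and `run_svgd` are all assumptions chosen for the example.

```python
import math


def rbf(x, y, h):
    """RBF kernel k(x, y) with bandwidth h (an assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * h * h))


def svgd_step(particles, score, h=1.0, step=0.1):
    """One SVGD update for scalar particles.

    score(x) is the derivative of the log target density at x.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            # Derivative of k(xj, xi) with respect to xj: the repulsive term
            # that keeps particles from collapsing onto one point.
            grad_k = -(xj - xi) / (h * h) * k
            # Kernel-weighted score (attraction) plus repulsion.
            phi += k * score(xj) + grad_k
        updated.append(xi + step * phi / n)
    return updated


def run_svgd(n_particles=20, n_steps=500):
    """Move particles that start far from the mode toward a standard normal."""
    particles = [3.0 + 2.0 * i / (n_particles - 1) for i in range(n_particles)]
    score = lambda x: -x  # score of the standard normal target
    for _ in range(n_steps):
        particles = svgd_step(particles, score)
    return particles
```

Calling `run_svgd()` returns particles that have drifted from the interval [3, 5] toward the origin while staying spread out. The kernel choice enters only through `rbf`, which is the component the abstract says the new perspective helps to select.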

chi-squared divergence, gradient flow, kernelized wasserstein gradient flow, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.99)

Neural Lyapunov Control for Discrete-Time Systems

Neural Information Processing Systems, Oct-9-2024, 12:48:32 GMT

While ensuring stability for linear systems is well understood, it remains a major challenge for nonlinear systems. A general approach in such cases is to compute a combination of a Lyapunov function and an associated control policy. However, finding Lyapunov functions for general nonlinear systems is a challenging task. To address this challenge, several methods have been proposed that represent Lyapunov functions using neural networks. However, such approaches focus either on continuous-time systems or on highly restricted classes of nonlinear dynamics.
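
To make the discrete-time setting concrete, the sketch below checks the usual decrease condition V(f(x, u)) - V(x) < 0 on sampled states for a candidate Lyapunov function and policy. Everything specific here is an assumption for illustration: the scalar linear system, the quadratic `V`, the two policies, and the helper name `lyapunov_decreases`. The paper's neural parameterization is not reproduced, and a check on samples is not a proof of stability.

```python
def lyapunov_decreases(dynamics, policy, V, states, tol=1e-12):
    """Return True if V strictly decreases along one closed-loop step
    from every sampled state other than the origin."""
    for x in states:
        if abs(x) <= tol:
            continue  # the condition is only required away from the equilibrium
        x_next = dynamics(x, policy(x))
        if V(x_next) - V(x) >= 0.0:
            return False
    return True


# Hypothetical scalar system x_{t+1} = 1.2 x_t + u_t, unstable without control.
dynamics = lambda x, u: 1.2 * x + u
V = lambda x: x * x                  # candidate Lyapunov function
stabilizing = lambda x: -0.7 * x     # closed loop becomes x_{t+1} = 0.5 x_t
no_control = lambda x: 0.0           # closed loop stays x_{t+1} = 1.2 x_t
grid = [i / 10.0 for i in range(-20, 21)]
```

With these definitions, `lyapunov_decreases(dynamics, stabilizing, V, grid)` holds and the same call with `no_control` fails, because V grows along the uncontrolled trajectory.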

discrete-time system, lyapunov function, neural lyapunov control, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.58)

Learning Erdos-Renyi Random Graphs via Edge Detecting Queries

Neural Information Processing Systems, Oct-9-2024, 10:39:59 GMT

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega(\min\{k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k} \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.
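
The query model can be sketched as follows: a test on a set of nodes is positive exactly when some edge lies inside the set. The decoder shown is a simple elimination rule in the style of the COMP decoder from the group testing literature; whether it matches any of the paper's algorithms in detail is an assumption, as are the random test design, all parameter values, and the function names.

```python
import itertools
import random


def edge_query(edges, nodes):
    """Return True iff at least one edge has both endpoints in `nodes`."""
    return any(u in nodes and v in nodes for (u, v) in edges)


def comp_decode(n, tests):
    """Rule out every node pair covered by a negative test; keep the rest."""
    candidates = set(itertools.combinations(range(n), 2))
    for nodes, positive in tests:
        if not positive:
            # No edge inside this set, so every pair within it is a non-edge.
            candidates -= set(itertools.combinations(sorted(nodes), 2))
    return candidates


def simulate(n=12, edge_prob=0.05, num_tests=200, include_prob=0.3, seed=0):
    """Draw an Erdos-Renyi graph, run random tests, and decode."""
    rng = random.Random(seed)
    edges = {p for p in itertools.combinations(range(n), 2)
             if rng.random() < edge_prob}
    tests = []
    for _ in range(num_tests):
        nodes = {v for v in range(n) if rng.random() < include_prob}
        tests.append((nodes, edge_query(edges, nodes)))
    return edges, comp_decode(n, tests)
```

By construction this decoder never discards a true edge, since any test containing both endpoints of an edge is positive. Its errors are only false positives, which become rarer as more tests are run.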

constant factor, edge detecting query, learning erdos-renyi random graph, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)
Information Technology > Artificial Intelligence > Machine Learning (0.44)

Stochastic Cubic Regularization for Fast Nonconvex Optimization

Neural Information Processing Systems, Oct-8-2024, 19:33:04 GMT

This paper proposes a stochastic variant of a classic algorithm: the cubic-regularized Newton method [Nesterov and Polyak]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\tilde{\mathcal{O}}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.
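
As a rough illustration of the classic method the abstract builds on, the sketch below applies the deterministic cubic-regularized Newton step in one dimension, where the cubic submodel g s + (h/2) s^2 + (rho/6) |s|^3 has a closed-form minimizer. It does not implement the paper's stochastic variant. The test function x^4/4 - x^2/2, the value rho = 6, and the function names are assumptions for the example.

```python
import math


def cubic_step_1d(g, h, rho):
    """Minimizer of g*s + 0.5*h*s**2 + (rho/6)*abs(s)**3 over scalar s."""
    # Step length: positive root of (rho/2) r^2 + h r - |g| = 0.
    r = (-h + math.sqrt(h * h + 2.0 * rho * abs(g))) / rho
    # Move against the gradient; when g == 0 both signs are equally good,
    # and +1 is picked arbitrarily.
    direction = -1.0 if g > 0 else 1.0
    return direction * r


def minimize(x0=0.0, rho=6.0, n_iters=30):
    """Run cubic-regularized Newton on f(x) = x**4 / 4 - x**2 / 2."""
    grad = lambda x: x ** 3 - x
    hess = lambda x: 3.0 * x * x - 1.0
    x = x0
    for _ in range(n_iters):
        x += cubic_step_1d(grad(x), hess(x), rho)
    return x
```

Starting from `x0 = 0.0`, where the gradient vanishes and plain gradient descent would not move, the negative curvature alone produces a nonzero step, and the iterates settle at the local minimum x = 1.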

fast nonconvex optimization, local minima, stochastic cubic regularization, (2 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.12)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss

Neural Information Processing Systems, Oct-8-2024, 16:29:29 GMT

Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data. In this work, we give a theoretical explanation of this phenomenon, showing that uncertainty sampling on a convex (e.g., logistic) loss can be interpreted as performing a preconditioned stochastic gradient step on the population zero-one loss. Experiments on synthetic and real datasets support this connection.
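
A minimal sketch of the procedure under discussion, assuming a one-dimensional logistic model: at each round the learner queries the pool point whose predicted probability is closest to 0.5 and takes one gradient step on the logistic loss for that point. The model, learning rate, initialization, and function names are illustrative assumptions. The paper's interpretation as preconditioned SGD on the zero-one loss is not derived here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def most_uncertain(pool, w, b):
    """Index of the pool point whose predicted probability is closest to 0.5."""
    return min(range(len(pool)),
               key=lambda i: abs(sigmoid(w * pool[i] + b) - 0.5))


def uncertainty_sampling(pool, labels, w=1.0, b=0.0, lr=0.5, n_rounds=10):
    """Query the most uncertain point, then update on its logistic loss."""
    remaining = list(range(len(pool)))
    for _ in range(min(n_rounds, len(pool))):
        xs = [pool[i] for i in remaining]
        idx = remaining.pop(most_uncertain(xs, w, b))
        x, y = pool[idx], labels[idx]   # the label is revealed only on query
        err = sigmoid(w * x + b) - y    # derivative of logistic loss w.r.t. the logit
        w -= lr * err * x
        b -= lr * err
    return w, b
```

Because the query order depends on the current parameters, different initial values of `w` and `b` can lead to different sequences of labeled points. This is the initialization dependence the abstract mentions.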

preconditioned stochastic gradient descent, uncertainty sampling, zero-one loss

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.69)

Reviews: A Stein variational Newton method

Neural Information Processing Systems, Oct-8-2024, 10:56:19 GMT

Summary: SVGD iteratively moves a set of particles toward the target by choosing a perturbation direction in an RKHS that maximally decreases the KL divergence to the target distribution. The paper proposes to add second-order information to the SVGD updates; preliminary empirical results show that the method converges faster in a few cases. The paper is well written, and the proofs seem correct. An important reason for using second-order information is the hope of achieving a faster convergence rate. My major concern is the lack of theoretical analysis of the convergence rate in this paper: 1) An appealing property of SVGD is that the optimal decrease rate equals the Stein discrepancy D_F(q || p), where F is a function set that includes all possible velocity fields.

convergence rate, stein discrepancy, stein variational newton method, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Reviews: Efficient Stochastic Gradient Hard Thresholding

Neural Information Processing Systems, Oct-8-2024, 08:47:48 GMT

The article analyses convergence in hard-thresholding algorithms and proposes an accelerated stochastic hybrid hard thresholding method that converges better than the methods it is compared against. The article is dense but reasonably easy to follow. The theoretical development seems complete and accurate, though I admit I have not thoroughly followed the full derivation. The experimental section is in accordance with the theoretical claims and is more than sufficient. For the sake of reproducibility, exhaustive pseudocode or a repository should be made available as a companion to the article to further strengthen the authors' points.

convergence, efficient stochastic gradient hard thresholding, review

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Reviews: A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Neural Information Processing Systems, Oct-8-2024, 08:12:46 GMT

This paper focuses on the optimization problem min f(x) + h(x), where f has a finite-sum structure (with n functions in the sum) with nonconvex but smooth components, and h is a convex but possibly nonsmooth function. So, this is a nonconvex finite-sum problem with a convex regularizer. The function h is handled using a prox step. The authors propose a small modification to ProxSVRG (called ProxSVRG+), and prove that this small modification has surprisingly interesting consequences. The modification consists of replacing the full gradient computation in the outer loop of ProxSVRG with an approximation obtained through subsampling/minibatching (batch size B).
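
The prox step mentioned in the review can be illustrated for one common choice of regularizer. The sketch assumes h(x) = lam * ||x||_1, whose proximal operator is coordinate-wise soft-thresholding, and shows a single proximal step given some gradient estimate of f. The choice of h and the function names are assumptions, and the variance-reduced outer loop of ProxSVRG is not reproduced.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * |v| for a scalar v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def prox_gradient_step(x, grad_estimate, step, lam):
    """Gradient step on f using `grad_estimate` (for example, a minibatch
    average of component gradients), followed by the prox of step * lam * ||.||_1."""
    return [soft_threshold(xi - step * gi, step * lam)
            for xi, gi in zip(x, grad_estimate)]
```

For instance, `prox_gradient_step([1.0, 0.05], [0.0, 0.0], 0.1, 1.0)` shrinks the first coordinate by 0.1 and sets the second exactly to zero, which is how the prox step handles the nonsmooth term without differentiating it.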

artificial intelligence, machine learning, proxsvrg, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.42)

Reviews: GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

Neural Information Processing Systems, Oct-8-2024, 06:21:24 GMT

The paper introduces GIANT, a distributed variant of the Newton algorithm. The considered problem is important and the paper makes a nice contribution to the field of distributed optimisation. The paper is very clear and nice to read, and proposes nice theoretical contributions and experiments, with a detailed bibliography and positioning with respect to prior work. Here is my main criticism: * The authors acknowledge that their approach is close to previous work, namely DANE, with which GIANT seems to coincide in the least-squares loss case. However, the rate obtained in the paper is much better, certainly thanks to the introduction of the incoherence assumption, which is well known in the fields of compressed sensing and randomized linear algebra.

assumption, dataset, globally improved approximate newton method, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.72)

Model Predictive Control is Almost Optimal for Restless Bandit

Gast, Nicolas, Narasimha, Dheeraj

arXiv.org Machine Learning, Oct-8-2024

We consider the discrete-time, infinite-horizon, average-reward restless Markovian bandit (RMAB) problem. We propose a \emph{model predictive control} based non-stationary policy with a rolling computational horizon $\tau$. At each time-slot, this policy solves a $\tau$ horizon linear program whose first control value is kept as a control for the RMAB. Our solution requires minimal assumptions and quantifies the loss in optimality in terms of $\tau$ and the number of arms, $N$. We show that its sub-optimality gap is $O(1/\sqrt{N})$ in general, and $\exp(-\Omega(N))$ under a local-stability condition. Our proof is based on a framework from dynamic control known as \emph{dissipativity}. Our solution is easy to implement and performs very well in practice when compared to the state of the art. Further, both our solution and our proof methodology can easily be generalized to more general constrained MDP settings and should thus be of great interest to the burgeoning RMAB community.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2410.06307

Country:

Europe > France (0.14)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
