AITopics | Gradient Descent

Gradient Descent

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

Neural Information Processing Systems, Mar-15-2024, 01:43:48 GMT

We consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent (a.k.a. Robbins-Monro algorithm) as well as a simple modification where iterates are averaged (a.k.a.
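The two algorithms named in this abstract can be illustrated with a minimal sketch. It is not the paper's setup: it assumes a hypothetical one-dimensional least-squares problem, a step size proportional to 1/sqrt(t), and a plain running mean of the iterates. The names `sgd_with_averaging`, `samples`, and the synthetic data are illustrative only.

```python
import random

def sgd_with_averaging(samples, step, n_steps, seed=0):
    """Run SGD on the 1-D least-squares loss 0.5 * (w * x - y)**2.

    Returns (last_iterate, averaged_iterate). The gradient at a uniformly
    drawn sample is an unbiased estimate of the full gradient.
    """
    rng = random.Random(seed)
    w = 0.0
    w_avg = 0.0
    for t in range(1, n_steps + 1):
        x, y = rng.choice(samples)
        grad = (w * x - y) * x        # unbiased stochastic gradient
        w -= step(t) * grad           # Robbins-Monro style update
        w_avg += (w - w_avg) / t      # running mean of all iterates so far
    return w, w_avg

# Hypothetical data (an assumption for this sketch): y = 2 * x plus noise.
data_rng = random.Random(1)
samples = []
for _ in range(200):
    x = data_rng.uniform(-1.0, 1.0)
    samples.append((x, 2.0 * x + data_rng.gauss(0.0, 0.1)))

w_last, w_avg = sgd_with_averaging(samples, lambda t: 0.5 / t ** 0.5, 5000)
```

Both `w_last` and `w_avg` should land near the true slope of 2.0; the averaged iterate smooths out the noise that the last iterate still carries.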

assumption, gradient descent, strong convexity, (12 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.36)
Research Report > Experimental Study (0.35)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation

Kolios, Christopher, Bahoo, Yeganeh, Saeedi, Sajad

arXiv.org Artificial Intelligence, Mar-15-2024

We present DPPE, a dense pose estimation algorithm that functions over a Plenoxels environment. Recent advances in neural radiance field techniques have shown that it is a powerful tool for environment representation. More recent neural rendering algorithms have significantly improved both training duration and rendering speed. Plenoxels introduced a fully-differentiable radiance field technique that uses Plenoptic volume elements contained in voxels for rendering, offering reduced training times and better rendering accuracy, while also eliminating the neural net component. In this work, we introduce a 6-DoF monocular RGB-only pose estimation procedure for Plenoxels, which seeks to recover the ground truth camera pose after a perturbation. We employ a variation on classical template matching techniques, using stochastic gradient descent to optimize the pose by minimizing errors in re-rendering. In particular, we examine an approach that takes advantage of the rapid rendering speed of Plenoxels to numerically approximate part of the pose gradient, using a central differencing technique. We show that such methods are effective in pose estimation. Finally, we perform ablations over key components of the problem space, with a particular focus on image subsampling and Plenoxel grid resolution. Project website: https://sites.google.com/view/dppe
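The central differencing idea mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: `render_error` is a hypothetical stand-in for a re-rendering loss (here a simple quadratic in a 3-parameter pose), and the loop is plain gradient descent rather than the stochastic, partly analytic scheme the authors describe.

```python
def central_difference_gradient(f, params, eps=1e-4):
    """Approximate the gradient of f at params, one coordinate at a time."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        # Symmetric difference quotient: (f(p + eps) - f(p - eps)) / (2 * eps)
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad

# Hypothetical stand-in for a rendering error (assumption, not the DPPE loss).
TARGET_POSE = [0.3, -0.2, 1.0]

def render_error(pose):
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET_POSE))

pose = [0.0, 0.0, 0.0]
for _ in range(200):
    g = central_difference_gradient(render_error, pose)
    pose = [p - 0.1 * gi for p, gi in zip(pose, g)]
```

Each gradient costs two evaluations of `f` per parameter, which is why fast rendering matters for this kind of approach.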

dppe, pose estimation, representation, (12 more...)

arXiv.org Artificial Intelligence

2403.10773

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food

Artman, Conor M., Mate, Aditya, Nwankwo, Ezinne, Heching, Aliza, Idé, Tsuyoshi, Navrátil, Jiří, Shanmugam, Karthikeyan, Sun, Wei, Varshney, Kush R., Goldkind, Lauri, Kroch, Gidi, Sawyer, Jaclyn, Watson, Ian

arXiv.org Machine Learning, Mar-15-2024

We developed a common algorithmic solution addressing the problem of resource-constrained outreach encountered by social change organizations with different missions and operations: Breaking Ground -- an organization that helps individuals experiencing homelessness in New York transition to permanent housing and Leket -- the national food bank of Israel that rescues food from farms and elsewhere to feed the hungry. Specifically, we developed an estimation and optimization approach for partially-observed episodic restless bandits under $k$-step transitions. The results show that our Thompson sampling with Markov chain recovery (via Stein variational gradient descent) algorithm significantly outperforms baselines for the problems of both organizations. We carried out this work in a prospective manner with the express goal of devising a flexible-enough but also useful-enough solution that can help overcome a lack of sustainable impact in data science for social good.

algorithm, rmab, transition, (14 more...)

arXiv.org Machine Learning

2403.10638

Country:

Asia > Middle East > Israel (0.25)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Social Sector (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
(3 more...)

Sequential Monte Carlo for Inclusive KL Minimization in Amortized Variational Inference

McNamara, Declan, Loper, Jackson, Regier, Jeffrey

arXiv.org Machine Learning, Mar-15-2024

For training an encoder network to perform amortized variational inference, the Kullback-Leibler (KL) divergence from the exact posterior to its approximation, known as the inclusive or forward KL, is an increasingly popular choice of variational objective due to the mass-covering property of its minimizer. However, minimizing this objective is challenging. A popular existing approach, Reweighted Wake-Sleep (RWS), suffers from heavily biased gradients and a circular pathology that results in highly concentrated variational distributions. As an alternative, we propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses likelihood-tempered sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence. We propose three gradient estimators, all of which are asymptotically unbiased in the number of iterations and two of which are strongly consistent. Our method interleaves stochastic gradient updates, SMC samplers, and iterative improvement to an estimate of the normalizing constant to reduce bias from self-normalization. In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods.

estimator, lt-smc, sampler, (12 more...)

arXiv.org Machine Learning

2403.10610

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Michigan (0.04)
North America > United States > New York (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Neural Information Processing Systems, Mar-14-2024, 23:39:06 GMT

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking.
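The access pattern behind lock-free parallel SGD can be sketched as follows. This is a toy illustration, not the paper's implementation: it assumes a hypothetical separable objective in which every sample touches exactly one coordinate, and it uses Python threads, which in standard CPython are serialized by the global interpreter lock, so it shows the unsynchronized updates rather than any speedup. The names `lockfree_sgd` and `targets` are illustrative.

```python
import random
import threading

def lockfree_sgd(targets, n_threads=4, steps_per_thread=2000, lr=0.1):
    """Minimize sum_j 0.5 * (w[j] - targets[j])**2 with unsynchronized threads."""
    w = [0.0] * len(targets)          # shared parameters; no lock guards them

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps_per_thread):
            j = rng.randrange(len(w))           # sparse sample: one coordinate
            w_j = w[j]                          # read (may be stale by write time)
            w[j] = w_j - lr * (w_j - targets[j])  # write without any locking

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

targets = [1.0, -2.0, 0.5, 3.0, -1.5, 4.0, 0.0, 2.5, -3.0, 1.5]
w = lockfree_sgd(targets)
max_error = max(abs(wj - tj) for wj, tj in zip(w, targets))
```

Because each update touches a single coordinate, threads rarely collide, and an occasional overwritten update only delays convergence slightly. Sparse updates of this kind are the setting in which skipping the locks is most plausible.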

hogwild, processor, speedup, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Duchi, John C., Jordan, Michael I., Wainwright, Martin J.

Neural Information Processing Systems, Mar-14-2024, 19:36:42 GMT

We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors.
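A gradient estimate built from a pair of function values and a random perturbation can be sketched as below. This is one common form of such an estimator (a Gaussian direction with a forward difference), chosen here as an assumption; the paper's exact construction and noise model may differ. The objective is a hypothetical noiseless quadratic, and `two_point_gradient`, `objective`, and `center` are illustrative names.

```python
import random

def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function values.

    Draws u ~ N(0, I) and returns ((f(x + delta * u) - f(x)) / delta) * u,
    which approximates the gradient in expectation for smooth f.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_shift = [xi + delta * ui for xi, ui in zip(x, u)]
    slope = (f(x_shift) - f(x)) / delta   # directional derivative along u
    return [slope * ui for ui in u]

# Hypothetical objective (assumption): a quadratic bowl centred at `center`.
center = [1.0, -2.0, 0.5, 3.0, 0.0]

def objective(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, center))

rng = random.Random(0)
x = [0.0] * len(center)
for _ in range(2000):
    g = two_point_gradient(objective, x, 1e-3, rng)
    x = [xi - 0.05 * gi for xi, gi in zip(x, g)]
```

The variance of this estimate grows with the dimension, so the step size has to shrink accordingly. That is the intuition behind the dimension-dependent slowdown the abstract quantifies.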

algorithm, convergence rate, optimization, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.41)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Stochastic Gradient Descent with Only One Projection

Neural Information Processing Systems, Mar-14-2024, 16:45:59 GMT

Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with a high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O(ln T/T) rate for strongly convex optimization under mild conditions about the domain and the objective function.

algorithm, convergence rate, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Neural Information Processing Systems, Mar-14-2024, 12:51:09 GMT

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
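The idea of keeping a memory of previous gradient values can be sketched as a stochastic average gradient style loop. This is a minimal illustration under assumptions, not the authors' code: the problem is a hypothetical one-dimensional finite sum of quadratics f_i(w) = 0.5 * a_i * (w - b_i)^2, the step size 1/(16 L) with L = max a_i is one conservative choice, and `gradient_memory_sgd`, `a`, and `b` are illustrative names.

```python
import random

def gradient_memory_sgd(grad_i, n, w0, step, n_iters, seed=0):
    """Stochastic gradient method that stores the last gradient of each term."""
    rng = random.Random(seed)
    w = w0
    memory = [0.0] * n        # most recent gradient seen for each f_i
    grad_sum = 0.0            # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.randrange(n)
        g = grad_i(i, w)
        grad_sum += g - memory[i]     # swap the old gradient of f_i for the new one
        memory[i] = g
        w -= step * grad_sum / n      # step along the average of stored gradients
    return w

# Hypothetical finite-sum problem (assumption for this sketch).
n = 20
a = [0.5 + 1.5 * i / (n - 1) for i in range(n)]   # curvatures in [0.5, 2.0]
b = [float(i % 5) - 2.0 for i in range(n)]        # per-term minimizers

def grad_i(i, w):
    return a[i] * (w - b[i])

w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)   # exact minimizer
w = gradient_memory_sgd(grad_i, n, 0.0, 1.0 / (16.0 * max(a)), 5000)
```

Only one term's gradient is evaluated per iteration, as in standard stochastic gradient descent, yet the update direction uses information from all terms through the stored values. The memory cost is one stored gradient per training example.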

convergence rate, iteration, step size, (15 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)

Scaled Gradients on Grassmann Manifolds for Matrix Completion

Neural Information Processing Systems, Mar-14-2024, 10:46:21 GMT

This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods.

grassmann manifold, iteration, matrix, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Discriminative Learning of Sum-Product Networks

Neural Information Processing Systems, Mar-14-2024, 08:02:42 GMT

Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using "hard" gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.

inference, node, spn, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
