AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

8f121ce07d74717e0b1f21d122e04521-Reviews.html

Neural Information Processing SystemsMar-13-2024, 18:36:04 GMT

The paper presents a iterative algorithm to a robust principal component matrix factorization. The data is modeled as a sum of a low rank matrix approximation and a sparse noise matrix. Constraining the norms of the row and column factors of the former part of the sum allows to implement the nuclear norm minimization of the batch robust pca algorithm in an online stochastic gradient descent fashion. The approach taken by the authors resembles very much the approach taken by Marial et al 2009 (ICML) and 2010 (JMLR), only that the objective is slightly different (Robust PCA was not dealt with in the JMLR version). One difference between the JMLR and the ICML version is that a standard stochastic gradient version of the objective function did not perform as well as the proposed online dictionary learning approach (in which the statistics of the data are accumulated in the matrices A and B) in the ICML version, but in the JMLR version, a standard stochastic gradient implementation with appropriately chosen learning rate seemed to perform ok.

algorithm, nuclear norm minimization, objective function, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions

Neural Information Processing SystemsMar-13-2024, 17:52:02 GMT

In this work we develop efficient methods for learning random MAP predictors for structured label problems. In particular, we construct posterior distributions over perturbations that can be adjusted via stochastic gradient methods. We show that any smooth posterior distribution would suffice to define a smooth PAC-Bayesian risk bound suitable for gradient methods. In addition, we relate the posterior distributions to computational properties of the MAP predictors. We suggest multiplicative posteriors to learn super-modular potential functions that accompany specialized MAP predictors such as graph-cuts. We also describe label-augmented posterior models that can use efficient MAP approximations, such as those arising from linear program relaxations.

map prediction, posterior distribution, posterior model, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Non-strongly-convex smooth stochastic approximation with convergence rateO(1/n)

Neural Information Processing SystemsMar-13-2024, 17:51:20 GMT

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/ n) after n iterations. We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running-time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments showing that they often outperform existing approaches.

algorithm, log 10, regression, (14 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > West Sussex (0.04)

Genre: Research Report > New Finding (0.50)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

7380ad8a673226ae47fce7bff88e9c33-Reviews.html

Neural Information Processing SystemsMar-13-2024, 17:22:52 GMT

Summary of paper: This is a technical paper on supervised sparse coding. It defines a generalized form of Lasso problem that equates some input variables x (e.g. the input image) to some hidden variables y (e.g. the underlying, analyzed or denoised form of the image) in least squares via two arbitrary weight matrices -- M1*x-M2*y 2 -- adding an L1 penalty on a linear mapping Omega*y and an optional L2 penalty on y itself. This formulation is general enough to cover both analysis and synthesis (generative) problems. The paper solves this system for y(x) via a proximal iteration on auxilliary variables z Omega*y -- an ADMM style method -- and it also offers fast approximation of y(x) via a network that contains an unwound, truncated ADMM with re-learned parameters along the lines of Gregor & LeCun [11]. Finally, it proposes supervised parameter (mapping matrix) learning using stochastic gradient descent over an arbitrary problem-specific loss function on y(x), and to this end it derives some explicit formulae for cost gradients in terms of sign(Omega*y) and the matrices and auxilliary vectors involved. The method is illustrated on an image super-resolution problem and a polyphonic music transcription one.

equation, matrix, technical paper, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Add feedback

Understanding Dropout

Neural Information Processing SystemsMar-13-2024, 17:22:36 GMT

Dropout is a relatively new algorithm for training neural networks which relies on stochastically "dropping out" neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characterized by three recursive equations, including the approximation of expectations by normalized weighted geometric means. We provide estimates and bounds for these approximations and corroborate the results with simulations. Among other results, we also show how dropout performs stochastic gradient descent on a regularized error function.

approximation, dropout, nw gm, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas > Travis County > Austin (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

6d0f846348a856321729a2f36734d1a7-Reviews.html

Neural Information Processing SystemsMar-13-2024, 17:13:30 GMT

This equation, and others, probably overlap quite substantially with the approaches taken in e.g. "Optimization Algorithms on Matrix Manifolds" by Absil, P., Mahony, R., Sepulchre, R. ( this reference also addresses more general gradient flows, stepsize, retraction ("projection"), and convergence issues). S. Bonnabel, "Stochastic gradient descent on Riemannian manifolds" (arXiv) A. Edelman et al. "THE GEOMETRY OF ALGORITHMS WITH ORTHOGONALITY CONSTRAINTS" - The perspective of gradient descent on matrix manifolds is, to this reviewer, a major omission. The paper could greatly benefit from a discussion as to how this slice of the literature fits in to the picture, in terms of both practical algorithms and theory. I understand that space is a constraint. But if the authors decide to ultimately write a journal version of the submission, careful, considered inclusion of the above would make for a fantastic, authoritative paper bridging all of the major points of view on the topic.

algorithm, gradient descent, manifold, (2 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.59)

Add feedback

Stochastic Optimization of PCA with Capped MSG

Neural Information Processing SystemsMar-13-2024, 17:13:28 GMT

We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as "Matrix Stochastic Gradient" (MSG), as well as a practical variant, Capped MSG. We study the method both theoretically and empirically.

algorithm, eigenvalue, problem 3, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Linear Convergence with Condition Number Independent Access of Full Gradients

Neural Information Processing SystemsMar-13-2024, 15:51:44 GMT

For smooth and strongly convex optimizations, the optimal iteration complexity of the gradient-based algorithm is O( κlog1/ǫ), where κ is the condition number. In the case that the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which could be computationally expensive. In this paper, we propose to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full and stochastic gradients to update the intermediate solution.

gradient, objective function, stochastic gradient, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > New York (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

Neural Information Processing SystemsMar-13-2024, 15:36:53 GMT

In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied to large scale data. We apply this method to latent Dirichlet allocation in an online minibatch setting, and demonstrate that it achieves substantial performance improvements over the state of the art online variational Bayesian methods.

algorithm, langevin dynamic, parameterisation, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

Estimation Optimization and Parallelism when Data is Sparse

Neural Information Processing SystemsMar-13-2024, 15:21:15 GMT

We study stochastic optimization problems when the data is sparse, which is in a sense dual to current perspectives on high-dimensional statistical learning and optimization. We highlight both the difficulties--in terms of increased sample complexity that sparse data necessitates--and the potential benefits, in terms of allowing parallelism and asynchrony in the design of algorithms. Concretely, we derive matching upper and lower bounds on the minimax rate for optimization and learning with sparse data, and we exhibit algorithms achieving these rates. We also show how leveraging sparsity leads to (still minimax optimal) parallel and asynchronous algorithms, providing experimental evidence complementing our theoretical results on several medium to large-scale learning tasks.

algorithm, processor, sync da, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback