


Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing Systems

Weaknesses: While the "biased expectation" appears to be a powerful tool, the overall results are restricted to the gradients of the algorithm at _some_ time t in the last T iterates. While this is a common outcome of the standard analysis of SGD, it would be nice if (with some additional assumptions on f) the results could be transposed to f(x_t) or x_t within some basin of attraction. The special case of s = 0 needs a much more detailed treatment. While the authors point out in the supplement that \phi is continuous at s = 0, much of the document switches between looking at s \to 0 or s = 0 without explanation. Assumption 1: I see that the authors need to control \|X_t\|^2 in Thm 1. (Eq.


Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing Systems

After significant discussions, the reviewers unanimously appreciated the simplicity and cleanliness of the approach presented by the paper. However, the authors are strongly encouraged to improve the presentation of the paper, especially the crucial proof of Lemma 1: multiple steps have been contracted in the presentation, and clarifying them is necessary. Furthermore, the case of the diminishing step-size scheme should be fleshed out in the theory rather than left as a straightforward extension. Lastly, the reviewers suggested using heavier-tailed distributions, such as the Lévy distribution, to verify the theory better.
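A minimal sketch of how such a verification experiment might draw heavier-tailed gradient noise. The use of scipy's levy_stable sampler and all constants here are my own illustrative assumptions, not the authors' experimental setup:

```python
# Sketch (not from the paper): drawing alpha-stable noise to stress-test the
# theory on heavier tails, as the meta-review suggests. alpha in (1, 2) with
# beta=0 gives centered, integrable noise with unbounded variance; smaller
# alpha means heavier tails (the Levy case alpha=1/2 is no longer integrable).
import numpy as np
from scipy.stats import levy_stable

def heavy_tailed_noise(shape, alpha=1.3, seed=0):
    """Symmetric alpha-stable samples (beta=0, loc=0 => centered)."""
    return levy_stable.rvs(alpha, 0.0, size=shape, random_state=seed)

# Example: gradient perturbations for a d=10 problem over T=1000 steps.
noise = heavy_tailed_noise((1000, 10))
print(noise.mean(), np.abs(noise).max())  # mean near 0, occasional huge spikes
```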


Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing Systems

This work proposes a novel analysis of stochastic gradient descent (SGD) for non-convex and smooth optimization. In the case of sub-Gaussian and centered noise, we prove that, with probability 1-\delta, the number of iterations to reach a precision \varepsilon for the squared gradient norm is O(\varepsilon^{-2}\ln(1/\delta)). In the case of centered and integrable heavy-tailed noise, we show that, while the expectation of the iterates may be infinite, the squared gradient norm still converges with probability 1-\delta in O(\varepsilon^{-p}\delta^{-q}) iterations, where p, q > 2. This result shows that heavy-tailed noise on the gradient slows down the convergence of SGD without preventing it, proving that SGD is robust to gradient noise with unbounded variance, a setting of interest for Deep Learning. In addition, it indicates that choosing a step size proportional to T^{-1/b}, where b is the tail parameter of the noise and T is the number of iterations, leads to the best convergence rates.
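A minimal sketch of how the prescribed step size proportional to T^{-1/b} might be used in practice. The toy non-convex test function, the Pareto-tailed noise model, and all constants below are illustrative assumptions of mine, not the paper's experiments:

```python
# Sketch: SGD on a smooth non-convex function with centered, integrable,
# heavy-tailed gradient noise, using a step size eta proportional to T^{-1/b}.
import numpy as np

def noisy_grad(x, rng, b=1.5):
    """Gradient of f(x) = sum(x_i^2 / (1 + x_i^2)) plus centered Pareto noise.

    With tail parameter b in (1, 2) the noise has a finite mean but
    unbounded variance, i.e. the heavy-tailed regime discussed above."""
    grad = 2 * x / (1 + x**2) ** 2
    noise = rng.pareto(b, size=x.shape) - 1.0 / (b - 1)  # mean-zero Lomax noise
    return grad + noise

def sgd(T=10_000, b=1.5, d=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=d)
    eta = 0.1 * T ** (-1.0 / b)           # step size proportional to T^{-1/b}
    best_sq_grad = np.inf
    for _ in range(T):
        x = x - eta * noisy_grad(x, rng, b)
        g = 2 * x / (1 + x**2) ** 2        # true gradient at the new iterate
        best_sq_grad = min(best_sq_grad, np.sum(g**2))
    # Track the best squared gradient norm over the run, matching guarantees
    # stated for *some* iterate among the last T.
    return best_sq_grad

if __name__ == "__main__":
    print(sgd())
```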