AITopics | bsgd

Collaborating Authors

bsgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

cc225865b743ecc91c4743259813f604-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 05:07:05 GMT

bsgd, query, vertex

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Biased Stochastic First

Neural Information Processing Systems, Feb-7-2026, 17:05:21 GMT

Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives, except for smooth nonconvex objectives with a Lipschitz continuous gradient estimator.
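For context, conditional stochastic optimization (CSO) problems of the kind this snippet refers to are commonly written in the nested form below, with an outer sample $\xi$ and inner samples $\eta$ drawn conditionally on $\xi$. The second line is the plug-in gradient estimator that BSGD-type methods use with $m$ inner samples. This notation is our assumption, recalled from the standard CSO setup rather than taken from the snippet. Because $f_\xi$ is generally nonlinear, the expected value of $\nabla f_\xi$ evaluated at an $m$-sample average differs from $\nabla f_\xi$ evaluated at the true conditional mean; that gap is the bias, and under smoothness assumptions it shrinks as $m$ grows.

$$
\min_{x \in \mathcal{X}} \; F(x) := \mathbb{E}_{\xi}\Big[ f_{\xi}\big( \mathbb{E}_{\eta \mid \xi}[\, g_{\eta}(x, \xi) \,] \big) \Big]
$$

$$
\widehat{\nabla} F(x) = \Big( \frac{1}{m} \sum_{j=1}^{m} \nabla g_{\eta_j}(x, \xi) \Big)^{\top} \nabla f_{\xi}\Big( \frac{1}{m} \sum_{j=1}^{m} g_{\eta_j}(x, \xi) \Big)
$$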

artificial intelligence, arxivpreprintarxiv, machine learning

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

Neural Information Processing Systems, Dec-23-2025, 20:02:47 GMT

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives under smooth and non-smooth conditions. Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives except for smooth nonconvex objectives with a Lipschitz continuous gradient estimator. For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound complexity. We further conduct numerical experiments on invariant logistic regression and model-agnostic meta-learning to illustrate the performance of BSGD and BSpiderBoost.
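The BSGD update described in this abstract can be sketched as follows, assuming a hypothetical toy version of invariant logistic regression: outer sample $\xi = (a_i, b_i)$, inner samples $\eta = a_i$ plus Gaussian noise, $g_\eta(x, \xi) = \eta^\top x$ and $f_\xi(u) = \log(1 + e^{-b_i u})$. The function names, the noise model, the $1/\sqrt{t+1}$ step schedule, and every parameter value are illustrative assumptions, not the paper's experimental setup. Each step averages $m$ inner samples before applying the nonlinear $f_\xi'$, which is what makes the estimator biased.

```python
import numpy as np

# Hypothetical toy CSO (assumption, not the paper's setup):
#   outer xi = (a_i, b_i), inner eta | xi = a_i + noise * N(0, I),
#   g_eta(x, xi) = eta^T x,  f_xi(u) = log(1 + exp(-b_i * u)).

def bsgd(a, b, m=10, iters=2000, lr=0.5, noise=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = a.shape
    x = np.zeros(d)
    for t in range(iters):
        i = rng.integers(n)                                # draw one outer sample xi
        etas = a[i] + noise * rng.standard_normal((m, d))  # draw m inner samples eta | xi
        eta_bar = etas.mean(axis=0)                        # plug-in estimate of E[eta | xi]
        u = eta_bar @ x                                    # averaged inner function value
        # f_xi'(u) = -b * sigmoid(-b * u), written with tanh for numerical stability
        fprime = -b[i] * 0.5 * (1.0 - np.tanh(0.5 * b[i] * u))
        grad = fprime * eta_bar                            # biased chain-rule gradient
        x -= lr / np.sqrt(t + 1) * grad                    # decaying step (assumed schedule)
    return x

def objective(x, a, b):
    # In this toy model E[eta | xi] = a_i exactly, so F(x) has a closed form.
    return np.mean(np.logaddexp(0.0, -b * (a @ x)))

rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = np.where(a @ w_true > 0, 1.0, -1.0)
x_hat = bsgd(a, b)
print(objective(np.zeros(5), a, b), objective(x_hat, a, b))
```

Raising `m` lowers the bias of each step but spends more inner samples per iteration; that trade-off is what the sample-complexity results weigh.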

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

Neural Information Processing Systems, Oct-2-2025, 09:01:51 GMT

Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives, except for smooth nonconvex objectives with a Lipschitz continuous gradient estimator.
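For the one setting this sentence exempts (smooth nonconvex objectives with a Lipschitz continuous gradient estimator), the paper proposes BSpiderBoost. A minimal sketch of a SpiderBoost-style recursive estimator built from biased plug-in gradients is shown below, on a hypothetical toy logistic CSO problem; the restart period `q`, the batch sizes, and the constant step are assumptions, not values from the paper. The key detail is that the correction term evaluates the estimator at the current and previous iterates with the same outer and inner samples, so a Lipschitz estimator keeps that difference small.

```python
import numpy as np

# Hypothetical toy CSO (assumption): xi = (a_i, b_i), eta | xi = a_i + noise,
# g_eta(x, xi) = eta^T x, f_xi(u) = log(1 + exp(-b_i * u)).

def plugin_grad(x, b_s, eta_bar):
    # Biased plug-in gradient averaged over a batch of outer samples.
    # b_s: (B,) labels; eta_bar: (B, d) inner-sample means, one row per outer sample.
    u = eta_bar @ x
    fprime = -b_s * 0.5 * (1.0 - np.tanh(0.5 * b_s * u))
    return (fprime[:, None] * eta_bar).mean(axis=0)

def bspiderboost(a, b, m=10, q=20, big=100, small=10, iters=400, lr=0.3, noise=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = a.shape

    def draw(batch):
        # Fresh outer indices plus m inner samples per outer sample, averaged.
        idx = rng.integers(n, size=batch)
        etas = a[idx][:, None, :] + noise * rng.standard_normal((batch, m, d))
        return b[idx], etas.mean(axis=1)

    x_prev = x = np.zeros(d)
    v = np.zeros(d)
    for t in range(iters):
        if t % q == 0:
            b_s, eta_bar = draw(big)        # periodic large-batch restart of the estimator
            v = plugin_grad(x, b_s, eta_bar)
        else:
            b_s, eta_bar = draw(small)      # same samples at x and x_prev
            v = v + plugin_grad(x, b_s, eta_bar) - plugin_grad(x_prev, b_s, eta_bar)
        x_prev, x = x, x - lr * v           # constant step size (assumed value)
    return x

def objective(x, a, b):
    # E[eta | xi] = a_i in this toy model, so F(x) is available in closed form.
    return np.mean(np.logaddexp(0.0, -b * (a @ x)))

rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = np.where(a @ w_true > 0, 1.0, -1.0)
x_hat = bspiderboost(a, b)
print(objective(x_hat, a, b))
```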

artificial intelligence, machine learning, optimization

Neural Information Processing Systems

Country: North America (0.14)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On the Power of Differentiable Learning versus PAC and SQ Learning

Emmanuel Abbe

Neural Information Processing Systems, Aug-17-2025, 09:56:54 GMT

But this view of differentiable learning ignores two things.

artificial intelligence, machine learning, query

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

On the Power of Differentiable Learning versus PAC and SQ Learning

Emmanuel Abbe

Neural Information Processing Systems, Aug-17-2025, 09:56:50 GMT

But this view of differentiable learning ignores two things.

artificial intelligence, machine learning, precision

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Review for NeurIPS paper: Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

Neural Information Processing Systems, Jan-22-2025, 08:14:11 GMT

Strengths: To the best of my knowledge, the BSGD algorithm is the first stochastic-gradient-based algorithm that directly solves the CSO problem itself. The two most relevant works on CSO are [12] and [24]: [12] solves a saddle-point reformulation of CSO, while [24] provides sample complexities for the SAA approach to the general CSO problem. Compared with the SAA approach in [24], BSGD improves the sample complexity when F is general convex (the authors remove the dependence on d), matching the lower bounds they provide. Although BSGD is not optimal when F is strongly convex and smooth, it matches the complexity of the SAA approach [24]. The authors also discuss the settings in which BSGD may not be optimal, which gives a transparent evaluation of their algorithm.
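To make the comparison with [24] concrete: the sample average approximation (SAA) draws all outer and inner samples once, replaces both expectations by fixed averages, and then minimizes the resulting deterministic objective, whereas BSGD draws fresh samples at every step. A minimal sketch on a hypothetical toy logistic CSO problem (our assumption, not the setup of [24]), with plain gradient descent as the solver and illustrative parameter values:

```python
import numpy as np

# Hypothetical toy CSO (assumption): xi = (a_i, b_i), eta | xi = a_i + noise,
# g_eta(x, xi) = eta^T x, f_xi(u) = log(1 + exp(-b_i * u)).

def saa_solve(a, b, n_outer=200, m=10, steps=500, lr=0.5, noise=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = a.shape
    # Draw every sample once, up front: n_outer outer samples, m inner samples each.
    idx = rng.integers(n, size=n_outer)
    etas = a[idx][:, None, :] + noise * rng.standard_normal((n_outer, m, d))
    eta_bar, b_s = etas.mean(axis=1), b[idx]  # inner expectations replaced by fixed averages

    def saa_loss(x):
        return np.mean(np.logaddexp(0.0, -b_s * (eta_bar @ x)))

    def saa_grad(x):
        u = eta_bar @ x
        fprime = -b_s * 0.5 * (1.0 - np.tanh(0.5 * b_s * u))
        return (fprime[:, None] * eta_bar).mean(axis=0)

    x = np.zeros(d)
    for _ in range(steps):
        x -= lr * saa_grad(x)  # deterministic gradient descent on the SAA objective
    return x, saa_loss(x)

def objective(x, a, b):
    # True CSO objective of the toy model (E[eta | xi] = a_i exactly).
    return np.mean(np.logaddexp(0.0, -b * (a @ x)))

rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = np.where(a @ w_true > 0, 1.0, -1.0)
x_saa, loss_saa = saa_solve(a, b)
print(loss_saa, objective(x_saa, a, b))
```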

algorithm, conditional stochastic optimization and application, sample complexity

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)

On the Power of Differentiable Learning versus PAC and SQ Learning

Abbe, Emmanuel, Kamath, Pritish, Malach, Eran, Sandon, Colin, Srebro, Nathan

arXiv.org Machine Learning, Aug-9-2021

We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $\rho$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $\rho$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b \rho^2$ is large enough, the power of SGD is equivalent to that of SQ learning.
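The role of the gradient precision $\rho$ can be illustrated, very loosely, with mini-batch SGD whose gradient coordinates are rounded to multiples of $\rho$. Rounding is only one simple stand-in for a $\rho$-approximate gradient computation, and the least-squares model, data, and parameter values are hypothetical. The sketch shows the mechanics (any gradient coordinate smaller than $\rho/2$ is lost), not the paper's simulation arguments.

```python
import numpy as np

def round_to_precision(g, rho):
    # One simple model of a rho-precision gradient: snap each coordinate to a multiple of rho.
    return rho * np.round(g / rho)

def sgd_limited_precision(X, y, rho, batch=8, iters=500, lr=0.1, seed=0):
    # Mini-batch SGD on the squared loss 0.5 * (x^T w - y)^2 with rounded gradients.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)  # mini-batch of size b
        residual = X[idx] @ w - y[idx]
        g = X[idx].T @ residual / batch                 # exact mini-batch gradient
        w -= lr * round_to_precision(g, rho)            # only the rounded value is used
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true  # noiseless targets
w_fine = sgd_limited_precision(X, y, rho=1e-6)
w_coarse = sgd_limited_precision(X, y, rho=100.0)
print(np.linalg.norm(w_fine - w_true), np.linalg.norm(w_coarse))
```

With fine precision the iterates recover `w_true`; with `rho=100.0` every gradient coordinate rounds to zero and the weights never move.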

ime, query, vertex

arXiv.org Machine Learning

2108.0419

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

On Compression Principle and Bayesian Optimization for Neural Networks

Tetelman, Michael

arXiv.org Machine Learning, Jun-22-2020

Finding methods for making generalizable predictions is a fundamental problem of machine learning. By looking into similarities between the prediction problem for unknown data and lossless compression, we have found an approach that gives a solution. In this paper we propose a compression principle, which states that an optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability. Following the compression principle, we use a Bayesian approach to build probabilistic models of data and network definitions. A method to approximate Bayesian integrals using a sequence of variational approximations is implemented as an optimizer for hyper-parameters: Bayesian Stochastic Gradient Descent (BSGD). Training with BSGD is completely defined by setting only three parameters: the number of epochs, the size of the dataset, and the size of the minibatch, which together define the learning rate and the number of iterations. We show that dropout can be used for continuous dimensionality reduction, which makes it possible to find the optimal network dimensions required by the compression principle.
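The compression principle (choose the model that minimizes the total code length of the model plus the data) can be illustrated with the textbook two-part code. This sketch swaps in polynomial models, a Gaussian code for the residuals, and a BIC-style model cost of $(k/2)\log_2 n$ bits for $k$ parameters; it is not the paper's variational Bayesian BSGD optimizer, and the data and helper name are hypothetical.

```python
import numpy as np

def two_part_code_length(x, y, degree):
    # Total description length in bits of a polynomial model plus its residuals.
    # Data bits use a Gaussian code with the fitted residual variance; they are defined
    # only up to a quantization constant shared by all models, so they can be negative.
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * rss / n)
    # Model bits: (k / 2) * log2(n) for k = degree + 1 coefficients (BIC-style cost).
    model_bits = 0.5 * (degree + 1) * np.log2(n)
    return data_bits + model_bits

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(200)  # quadratic + noise
lengths = {k: two_part_code_length(x, y, k) for k in range(7)}
best = min(lengths, key=lengths.get)
print(best, lengths[best])
```

Underfitting models pay in data bits and overfitting models pay in model bits, so the minimum usually lands at or near the true degree.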

artificial intelligence, machine learning, variance

arXiv.org Machine Learning

2006.12714

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (1.00)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
