AITopics | stochastic computation graph

Modelers use automatic differentiation (AD) of computation graphs to implement complex Deep Learning models without defining gradient computations. Stochastic AD extends AD to stochastic computation graphs with sampling steps, which arise when modelers handle the intractable expectations common in Reinforcement Learning and Variational Inference. However, current methods for stochastic AD are limited: They are either only applicable to continuous random variables and differentiable functions, or can only use simple but high variance score-function estimators. To overcome these limitations, we introduce Storchastic, a new framework for AD of stochastic computation graphs. Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates. Furthermore, Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates. Finally, we implement Storchastic as a PyTorch library at github.com/HEmile/storchastic.

general stochastic automatic differentiation, name change, storchastic, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

Gradient Estimation Using Stochastic Computation Graphs

John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

Neural Information Processing SystemsOct-2-2025, 15:01:31 GMT

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function, using samples, lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs --directed acyclic graphs that include both deterministic functions and conditional probability distributions--and describe how to easily and automatically derive an unbiased estimator of the loss function's gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in variety of prior work, along with variance-reduction techniques therein. It could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.

estimator, gradient estimator, stochastic computation graph, (11 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsFeb-8-2025, 06:49:01 GMT

The manuscript proposes a formalism for computing stochastic estimates of gradients for loss functions. The formalism, referred to as stochastic computation graphs, is very general, applying to models with deterministic and stochastic components, and allowing the computation of gradient estimates for a broad range of models. Methods for deep networks, variational inference, and reinforcement learning are identified as special cases of the proposed framework. The proposed stochastic computation graphs are essentially Bayesian networks which may also contain deterministic nodes, and which are interpreted as encoding distributions over loss functions that are to be minimized. The gradient estimation algorithm extends backpropagation to the partially stochastic case covered by these models, by simply composing score function estimators and pathwise derivative estimators.

author feedback and meta-review, discussion, export review, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

Storchastic: A Framework for General Stochastic Automatic Differentiation

Neural Information Processing SystemsJan-24-2025, 01:59:44 GMT

Modelers use automatic differentiation (AD) of computation graphs to implement complex Deep Learning models without defining gradient computations. Stochastic AD extends AD to stochastic computation graphs with sampling steps, which arise when modelers handle the intractable expectations common in Reinforcement Learning and Variational Inference. However, current methods for stochastic AD are limited: They are either only applicable to continuous random variables and differentiable functions, or can only use simple but high variance score-function estimators. To overcome these limitations, we introduce Storchastic, a new framework for AD of stochastic computation graphs. Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates. Furthermore, Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.

computation graph, general stochastic automatic differentiation, storchastic, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

Gradient Estimation Using Stochastic Computation Graphs

Neural Information Processing SystemsJan-15-2025, 21:08:30 GMT

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function, using samples, lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs--directed acyclic graphs that include both deterministic functions and conditional probability distributions and describe how to easily and automatically derive an unbiased estimator of the loss function's gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in variety of prior work, along with variance-reduction techniques therein.

gradient estimation, loss function, stochastic computation graph, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)

Add feedback

Reviews: Total stochastic gradient algorithms and applications in reinforcement learning

Neural Information Processing SystemsOct-7-2024, 05:01:13 GMT

This paper provides another formalism for gradient estimation in probabilistic computation graphs. Using pathwise derivative and likelihood ratio estimators, existing and well-known policy gradient theorems are cast into the proposed formalism. This intuition is then used to propose two new methods for gradient estimation that can be used in a model-based RL framework. Some results are shown that demonstrate comparable results to PILCO on the cart-pole task. Quality: the idea in this work is interesting, and the proposed framework and methods may prove useful in RL settings.

estimator, stochastic gradient algorithm and application, total stochastic gradient algorithm, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Gradient Estimation Using Stochastic Computation Graphs

Neural Information Processing SystemsMar-13-2024, 04:15:37 GMT

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function, using samples, lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs--directed acyclic graphs that include both deterministic functions and conditional probability distributions--and describe how to easily and automatically derive an unbiased estimator of the loss function's gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in variety of prior work, along with variance-reduction techniques therein. It could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.

estimator, gradient estimator, stochastic computation graph, (11 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

Add feedback

Efficient Learning of Discrete-Continuous Computation Graphs

Friede, David, Niepert, Mathias

arXiv.org Artificial IntelligenceJul-26-2023

Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computations graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2307.14193

Country: