AITopics | virel

Neural Information Processing Systems http://nips.cc/

algorithm, learning, proceedings, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(12 more...)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

VIREL: A Variational Inference Framework for Reinforcement Learning

Neural Information Processing SystemsDec-25-2025, 10:12:44 GMT

Applying probabilistic models to reinforcement learning (RL) enables the uses of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal polices naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step. We then derive a family of actor-critic methods fromVIREL, including a scheme for adaptive exploration. Finally, we demonstrate that actor-critic algorithms from this family outperform state-of-the-art methods based on soft value functions in several domains.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.60)

Add feedback

VIREL: A Variational Inference Framework for Reinforcement Learning

Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

Neural Information Processing SystemsOct-2-2025, 18:57:10 GMT

Applying probabilistic models to reinforcement learning (RL) enables the uses of powerful optimisation tools such as variational inference in RL.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reviews: VIREL: A Variational Inference Framework for Reinforcement Learning

Neural Information Processing SystemsJan-23-2025, 22:03:24 GMT

This paper brings an novel perspective on probabilistic frameworks for new reinforcement learning algorithms, and the adaptive temperature reweighting may lead to more insightful exploration built into our RL algorithms. The paper is written clearly, and is also well-organized and easy to understand, and the appendix is structured clearly as well, although the full length of the appendix paper makes the paper a little unwieldy to read. The authors have clearly put in a lot of work into developing the theory and presentation in this paper, and although empirically the performance of the derived algorithms do not show significant improvement over max-ent RL methods (with twin Q functions as in TD3), the approach is interesting and I believe this paper would be well-suited for NeurIPS. Some specific comments: - In the definition of the residual error on L147, over what distribution is the L p norm being referred to? - Instead of e_w being a global constant, have the authors considered parametrizing e_w as a function of h - this would allow for state-adaptive uncertainty and exploration, and I believe a majority of the results would still hold. However, most works with the Max-Ent framework parametrize variational distributions through only the action distributions, and fix the variational distribution on dynamics to the actual dynamics model.

algorithm, variational distribution, variational inference framework, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reviews: VIREL: A Variational Inference Framework for Reinforcement Learning

Neural Information Processing SystemsJan-23-2025, 22:03:14 GMT

All reviewers were in agreement that the paper was interesting and well written.

reinforcement learning, variational inference framework, virel

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

VIREL: A Variational Inference Framework for Reinforcement Learning

Neural Information Processing SystemsOct-10-2024, 02:01:52 GMT

Applying probabilistic models to reinforcement learning (RL) enables the uses of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal polices naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step.

reinforcement learning, variational inference framework, virel, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.62)

Add feedback

VIREL: A Variational Inference Framework for Reinforcement Learning

Fellows, Matthew, Mahajan, Anuj, Rudner, Tim G. J., Whiteson, Shimon

Neural Information Processing SystemsMar-18-2020, 23:31:03 GMT

Applying probabilistic models to reinforcement learning (RL) enables the uses of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal polices naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step.

reinforcement learning, variational inference framework, virel, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.62)

Add feedback

VIREL: A Variational Inference Framework for Reinforcement Learning

Fellows, Matthew, Mahajan, Anuj, Rudner, Tim G. J., Whiteson, Shimon

arXiv.org Machine LearningNov-13-2018

Applying probabilistic models to reinforcement learning (RL) has become an exciting direction of research owing to powerful optimisation tools such as variational inference becoming applicable to RL. However, due to their formulation, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, for example, the absence of mode capturing behaviour in pseudo-likelihood methods and difficulties in optimisation of learning objective in maximum entropy RL based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises the action-value function in a parametrised form to capture future dynamics of the underlying Markov decision process. Owing to its generality, our framework lends itself to current advances in variational inference. Applying the variational expectation-maximisation algorithm to our framework, we show that the actor-critic algorithm can be reduced to expectation-maximisation. We derive a family of methods from our framework, including state-of-the-art methods based on soft value functions. We evaluate two actor-critic algorithms derived from this family, which perform on par with soft actor critic, demonstrating that our framework offers a promising perspective on RL as inference.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1811.01132

Country: