Review for NeurIPS paper: Munchausen Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: After Authors' Response: I still find the paper's analysis regarding action-gaps a bit weak, and the authors' response didn't help much in that regard. I think their action-gap analysis needs to be considered in light of the recent findings of van Seijen et al. (2019): increasing the action-gap is not important on its own; rather, it is the homogeneity of the action-gaps across states that matters. While I still stand by my verdict of accepting this paper, in light of the other reviews I think the paper's writing should be toned down a bit regarding its theoretical novelty and its claims about empirical results (e.g., being the first non-distributional-RL agent to beat a distributional one). Q1: To the best of my knowledge, IQN in Dopamine also uses Double Q-learning. Is this also the case for your M-IQN agent?


Generative Flow Networks as Entropy-Regularized RL

Tiapkin, Daniil, Morozov, Nikita, Naumov, Alexey, Vetrov, Dmitry

arXiv.org Machine Learning

The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects, with probabilities proportional to a given reward, via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to the general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive with established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.
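
As a rough illustration of this reformulation, the sketch below shows a one-step soft (entropy-regularized) Q-learning backup, the kind of standard soft-RL update the abstract alludes to. This is a minimal sketch, not the paper's implementation: the function name soft_q_target is illustrative, and it assumes the per-transition rewards already encode the GFlowNet-specific structure (the log backward-policy on intermediate edges and log R(x) on terminal transitions), with tau = 1 and gamma = 1 matching the GFlowNet setting.

    import torch

    def soft_q_target(q_next, rewards, dones, gamma=1.0, tau=1.0):
        # Soft state value of the next state: V(s') = tau * logsumexp(Q(s', .) / tau).
        soft_v_next = tau * torch.logsumexp(q_next / tau, dim=1)
        # One-step soft Q-learning target. `rewards` is assumed (see above) to
        # already carry the GFlowNet reward structure; `dones` marks terminal
        # transitions, where no bootstrapping occurs.
        return rewards + gamma * (1.0 - dones) * soft_v_next

Regressing Q(s, a) onto this target corresponds, under these assumptions, to the entropy-regularized view of GFlowNet training that the abstract describes.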


Munchausen Reinforcement Learning

Vieillard, Nino, Pietquin, Olivier, Geist, Matthieu

arXiv.org Machine Learning

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transitioning state with their current estimate of that value. However, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in this way yields an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns, or prioritized replay. To demonstrate the versatility of this idea, we also combine it with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, establishing a new state of the art with very few modifications to the original algorithm. To complement this empirical study, we provide strong theoretical insights into what happens under the hood -- implicit Kullback-Leibler regularization and an increase of the action-gap.
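
To make the core idea concrete, here is a minimal sketch of the Munchausen-DQN regression target described above: the immediate reward plus the scaled (and clipped) log-policy, followed by a soft bootstrap on the next state. The function name munchausen_target and the tensor layout are illustrative assumptions, not the authors' code; the default hyperparameters (tau = 0.03, alpha = 0.9, l0 = -1) follow the values reported in the paper.

    import torch
    import torch.nn.functional as F

    def munchausen_target(q_targ, q_targ_next, actions, rewards, dones,
                          gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
        # q_targ, q_targ_next: [B, A] target-network Q-values at the current
        # and next states; actions: [B] long tensor; rewards, dones: [B].
        log_pi = F.log_softmax(q_targ / tau, dim=1)
        log_pi_next = F.log_softmax(q_targ_next / tau, dim=1)
        pi_next = log_pi_next.exp()
        # Munchausen term: scaled log-policy of the taken action, clipped to
        # [l0, 0] for numerical stability, added to the immediate reward.
        munchausen = (tau * log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
                      ).clamp(min=l0, max=0.0)
        # Soft bootstrap: expectation over next actions of Q minus the
        # entropy (log-policy) correction.
        soft_next = (pi_next * (q_targ_next - tau * log_pi_next)).sum(dim=1)
        return rewards + alpha * munchausen + gamma * (1.0 - dones) * soft_next

The DQN loss is then a standard regression of the online network's Q(s, a) onto this target; the only changes relative to vanilla DQN are the added Munchausen term and the soft bootstrap in place of the max.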