Resetting the Optimizer in Deep RL: An Empirical Study
We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of solving a sequence of optimization problems in which the loss function changes at each iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters, such as estimates of the first-order and second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
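To make the resetting idea concrete, here is a minimal, illustrative sketch (not the paper's code): a toy Adam optimizer whose internal state (first/second moment estimates and step counter) is wiped at each iteration boundary of the outer sequence of optimization problems. The class name, hyper-parameter values, and stand-in loss are all assumptions for illustration.

```python
import math

class TinyAdam:
    """Toy Adam for a flat list of float parameters (illustrative only)."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params  # updated in place
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self):
        # Discard all state accumulated while solving the previous
        # optimization problem in the sequence.
        self.m = [0.0] * len(self.params)  # first-moment estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0                         # step counter

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# Sketch of the outer loop: one optimization problem per iteration
# (e.g. per target-network update), with a reset at each boundary.
params = [1.0]
opt = TinyAdam(params, lr=0.1)
for iteration in range(3):
    opt.reset()                   # the simple modification under study
    for _ in range(10):
        grad = [2.0 * params[0]]  # gradient of a stand-in loss x^2
        opt.step(grad)
```

The point of the sketch is only where `reset()` is called: without it, moments estimated on the previous iteration's landscape would bias the first updates of the next one.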
Faster Deep Reinforcement Learning with Slower Online Network
Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network. This improves the robustness of deep reinforcement learning in the presence of noisy updates. The resultant agents, called DQN Pro and Rainbow Pro, exhibit significant performance improvements over their original counterparts on the Atari benchmark, demonstrating the effectiveness of this simple idea in deep reinforcement learning.
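One natural way to realize "stay in the proximity of the target network" is a proximal penalty added to the usual squared TD loss. The sketch below is an assumption-laden illustration, not the paper's implementation: the function name `dqn_pro_loss` and the trade-off coefficient `c` are hypothetical, and parameters are plain float lists rather than network weights.

```python
def dqn_pro_loss(td_error, theta_online, theta_target, c=0.1):
    """Illustrative sketch: squared TD error plus a proximal term that
    penalizes the squared distance between the online parameters and the
    target-network parameters. `c` trades off the two terms (hypothetical
    value; a real agent would tune it)."""
    proximal = sum((w - w_t) ** 2
                   for w, w_t in zip(theta_online, theta_target))
    return td_error ** 2 + c * proximal
```

When the online and target parameters coincide, the loss reduces to the standard DQN objective; as the online network drifts away, the extra term grows, discouraging large excursions driven by noisy updates.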
Munchausen Reinforcement Learning
Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state with their current estimate of this value. Yet another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in this way yields an agent that is competitive with the state-of-the-art Rainbow on Atari games, without making use of distributional RL, n-step returns, or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, establishing a new state of the art with very few modifications to the original algorithm. To complement this empirical study, we provide strong theoretical insights into what happens under the hood -- implicit Kullback-Leibler regularization and an increase of the action gap.
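The core idea can be sketched in a few lines: form a Boltzmann (softmax) policy from the current Q-values and add the scaled, clipped log-probability of the taken action to the immediate reward. This is an illustrative sketch under assumptions: the function names are hypothetical, and the default values of the scaling factor `alpha`, temperature `tau`, and clipping floor `l0` are chosen for illustration (they match values reported for Munchausen-DQN, but verify against the paper before reuse).

```python
import math

def softmax_policy(q_values, tau):
    """Boltzmann policy over Q-values at temperature tau (max-shifted
    for numerical stability)."""
    m = max(q / tau for q in q_values)
    exps = [math.exp(q / tau - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def munchausen_reward(r, q_values, action, alpha=0.9, tau=0.03, l0=-1.0):
    """Sketch of the Munchausen augmentation: add the scaled log-policy
    of the taken action to the immediate reward. The tau-scaled
    log-policy is clipped to [l0, 0] to keep the bonus bounded."""
    pi = softmax_policy(q_values, tau)
    log_term = tau * math.log(pi[action])
    log_term = max(min(log_term, 0.0), l0)  # clip to [l0, 0]
    return r + alpha * log_term
```

Since log-probabilities are non-positive, the augmentation never increases the reward; actions the current policy considers unlikely are penalized, which is the source of the implicit KL regularization discussed in the abstract.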
10 Appendix

10.1 Pseudo-code for DQN Pro

Below, we present the pseudo-code for DQN Pro. Notice that the difference between DQN and DQN Pro is minimal (highlighted in gray).

Hyper-parameters:
- Sticky actions: True
- Optimizer: Adam (Kingma & Ba, 2015)
- Network architecture: Nature DQN network (Mnih et al., 2015)
- Random seeds: {0, 1, 2, 3, 4}
- Batch size (Rainbow, shared): 64
- Config file: rainbow_aaai.gin

Theorem 2. Consider the PMPI algorithm specified by:

We make two assumptions: 1. we assume error in the policy evaluation step, as already stated in equation (4).

All results are averaged over 5 independent seeds.
We thank all reviewers for their valuable comments. Entropy is used to measure sufficiency, compactness, and uniqueness; the use of variance to approximate entropy was discussed in L203. Therefore, the performance deteriorates dramatically. We will run our algorithm in more environments and provide the results in the Appendix, and we will add more details and discussions on related work in the camera-ready version.