AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Supplementary Material for Quantifying Generalisation in Imitation Learning

Neural Information Processing SystemsJun-22-2026, 23:34:22 GMT

Each split is entirely unique, and no labyrinth appears more than once in4 the entire dataset, regardless of its split. Each entry consists of five pieces of information:5 obs: the path to the image representation for that state;6 actions: the integer action performed for that solution in that state;7 rewards: the float reward received for that action in that state;8 episode_starts: the boolean status that states if it is the first state in an episode; and9 info: the textual information required to load the same labyrinth structure if needed.10 We provide images in each dataset since we believe that visual information is more useful to the11 imitation learning agent, and if a vector representation is needed, the info parameter allows researchers12 to load the same structure and the actions enables the recreation of the dataset in its vector format.13 Each observation is an image of size 600 600 3. Although each baseline trained in this work14 uses a 64 64 3input, we thought that providing a bigger image would benefit models requiring15 downsizing (e.g., the walls will not disappear during resizing).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

Pass@KPolicy Optimization: Solving Harder Reinforcement Learning Problems

Neural Information Processing SystemsJun-22-2026, 23:32:45 GMT

Reinforcement Learning algorithms commonly sample multiple (n > 1) solution attempts for each problem and reward them independently.

large language model, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.46)
Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hadamax Encoding: Elevating Performance in Model-Free Atari

Neural Information Processing SystemsJun-22-2026, 23:32:08 GMT

Neural network architectures have a large impact in machine learning. However, in the specific case of reinforcement learning, network architectures have remained notably simple, as changes often lead to small gains in performance. This work introduces a novel encoder architecture for pixel-based model-free reinforcement learning. The Hadamax (Hadamard max-pooling) encoder achieves state-of-the-art performance by max-pooling Hadamard products between GELU-activated parallel hidden layers. Based on the recent PQN algorithm, the Hadamax encoder achieves state-of-the-art model-free performance in the Atari-57 benchmark. Specifically, without applying any algorithmic hyperparameter modifications, Hadamax-PQN achieves an 80% performance gain over vanilla PQN and significantly surpasses Rainbow-DQN. For reproducibility, the full code is available on GitHub.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

Neural Information Processing SystemsJun-22-2026, 23:17:11 GMT

We study online preference-based reinforcement learning (PbRL) with the goal of improving sample efficiency. While a growing body of theoretical work has emerged--motivated by PbRL's recent empirical success, particularly in aligning large language models (LLMs)--most existing studies focus only on pairwise comparisons. A few recent works [93, 49, 76] have explored using multiple comparisons and ranking feedback, but their performance guarantees fail to improve--and can even deteriorate--as the feedback length increases, despite the richer information available. To address this gap, we adopt the Plackett-Luce (PL) model for ranking feedback over action subsets and propose M-AUPO, an algorithm that selects multiple actions by maximizing the average uncertainty within the offered subset.

large language model, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

Neural Information Processing SystemsJun-22-2026, 22:56:53 GMT

Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

Neural Information Processing SystemsJun-22-2026, 22:55:35 GMT

We study a sequential decision-making problem on a n-node graph G where each node has an unknown label from a finite set Ω, drawn from a joint distribution P that is Markov with respect to G. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when G is a forest.

data mining, machine learning, reinforcement learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Internal Medicine (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Communications (0.93)
(4 more...)

Add feedback

Reinforcement learning for one-shot DAG scheduling with comparability identification and dense reward

Neural Information Processing SystemsJun-22-2026, 22:55:07 GMT

In recent years, many studies proposed to generate solutions for Directed Acyclic Graph (DAG) scheduling problem in one shot by combining reinforcement learning and list scheduling heuristic. However, these existing methods suffer from biased estimation of sampling probabilities and inefficient guidance in training, due to redundant comparisons among node priorities and the sparse reward challenge. To address these issues, we analyze of the limitations of these existing methods, and propose a novel one-shot DAG scheduling method with comparability identification and dense reward signal, based on the policy gradient framework. In our method, a comparable antichain identification mechanism is proposed to eliminate the problem of redundant nodewise priority comparison. We also propose a dense reward signal for node level decision-making optimization in training, effectively addressing the sparse reward challenge. The experimental results show that the proposed method can yield superior results of scheduling objectives compared to other learning-based DAG scheduling methods.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Add feedback

FANS: AFlatness-Aware Network Structure for Generalization in Offline Reinforcement Learning

Neural Information Processing SystemsJun-22-2026, 22:44:17 GMT

Offline reinforcement learning (RL) aims to learn optimal policies from static datasets while enhancing generalization to out-of-distribution (OOD) data. To mitigate overfitting to suboptimal behaviors in offline datasets, existing methods often relax constraints on policy and data or extract informative patterns through data-driven techniques. However, there has been limited exploration into structurally guiding the optimization process toward flatter regions of the solution space that offer better generalization. Motivated by this observation, we present FANS, a generalization-oriented structured network framework that promotes flatter and robust policy learning by guiding the optimization trajectory through modular architectural design. FANS comprises four key components: (1) Residual Blocks, which facilitate compact and expressive representations; (2) Gaussian Activation, which promotes smoother gradients; (3) Layer Normalization, which mitigates overfitting; and (4) Ensemble Modeling, which reduces estimation variance. By integrating FANS into a standard actor-critic framework, we highlight that this remarkably simple architecture achieves superior performance across various tasks compared to many existing advanced methods.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Education (0.46)
Construction & Engineering (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Zero-shot World Models via Search in Memory

Neural Information Processing SystemsJun-22-2026, 22:34:12 GMT

World Models have vastly permeated the field of Reinforcement Learning. Their ability to model the transition dynamics of an environment have greatly improved sample efficiency in online RL. Among them, the most notorious example is Dreamer, a model that learns to act in a diverse set of image-based environments.

large language model, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.72)
(2 more...)

Add feedback

Meta-World+: An Improved, Standardized, RL Benchmark

Neural Information Processing SystemsJun-22-2026, 22:32:47 GMT

Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release a new open-source version of Meta-World1 that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback