AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Natural Language Instruction following with Task related Language Development and Translation

Neural Information Processing SystemsApr-25-2026, 15:33:09 GMT

Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implemented languageconditioned RL by providing the policy with human instructions in natural language (NL) and training the policy to follow instructions. In this is outside-in approach, the policy must comprehend the NL and manage the task simultaneously. However, the unbounded NL examples often bring much extra complexity for solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and easily understood by the policy, thus reducing the policy learning burden. Besides, we employ a translator to translate natural language into the TL, which is used in RL to achieve efficient policy training. We implement this scheme as TALAR (TAsk Language with predicAte Representation) that learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only better comprehends NL instructions but also leads to a better instruction-following policy that significantly improves the success rate over baselines and adapts to unseen expressions of NL instruction. Besides, the TL is also an effective sub-task abstraction compatible with hierarchical RL.

large language model, machine learning, reinforcement learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Learning Universal Policies via Text-Guided Video Generation

Neural Information Processing SystemsApr-25-2026, 15:13:39 GMT

A goal of artificial intelligence is to construct an agent that can solve a wide variety of tasks. Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents. Specifically, we cast the sequential decision making problem as a text-conditioned video generation problem, where, given a text-encoded specification of a desired goal, a planner synthesizes a set of future frames depicting its planned actions in the future, after which control actions are extracted from the generated video. By leveraging text as the underlying goal specification, we are able to naturally and combinatorially generalize to novel goals. The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks. Finally, by leveraging pretrained language embeddings and widely available videos from the internet, the approach enables knowledge transfer through predicting highly realistic video plans for real robots2.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

41da609c519d77b29be442f8c1105647-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 15:12:29 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

412604be30f701b1b1e3124c252065e6-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 14:38:26 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Communications > Networks (0.67)

Add feedback

412604be30f701b1b1e3124c252065e6-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 14:38:22 GMT

machine learning, reinforcement learning, state aggregation, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Off-Policy Evaluation for Human Feedback

Neural Information Processing SystemsApr-25-2026, 14:38:14 GMT

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > North Carolina (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Endocrinology (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Add feedback

4079016d940210b4ae9ae7d41c4a2065-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 14:37:26 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

400e5e6a7ce0c754f281525fae75a873-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 14:36:30 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Add feedback

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

Neural Information Processing SystemsApr-25-2026, 14:23:46 GMT

Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly under different types of black-box functions. It has remained a challenge to design one AF that can attain the best performance over a wide variety of black-box functions. This paper aims to attack this challenge through the perspective of reinforced few-shot AF learning (FSAF). Specifically, we first connect the notion of AFs with Q-functions and view a deep Q-network (DQN) as a surrogate differentiable AF. While it serves as a natural idea to combine DQN and an existing few-shot learning method, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO due to the need of a versatile sampling policy. To address this, we present a Bayesian variant of DQN with the following three features: (i) It learns a distribution of Q-networks as AFs based on the Kullback-Leibler regularization framework. This inherently provides the uncertainty required in sampling for BO and mitigates overfitting.

machine learning, optimization, reinforcement learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
(2 more...)

Add feedback

Filters

Collaborating Authors

Reinforcement Learning

Natural Language Instruction following with Task related Language Development and Translation

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Learning Universal Policies via Text-Guided Video Generation

41da609c519d77b29be442f8c1105647-Paper.pdf

412604be30f701b1b1e3124c252065e6-Supplemental.pdf

412604be30f701b1b1e3124c252065e6-Paper.pdf

Off-Policy Evaluation for Human Feedback

4079016d940210b4ae9ae7d41c4a2065-Paper.pdf

400e5e6a7ce0c754f281525fae75a873-Paper.pdf

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization