AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

Neural Information Processing SystemsNov-19-2025, 13:24:38 GMT

Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

899511e37a8e01e1bd6f6f1d377cc250-Paper-Conference.pdf

Neural Information Processing SystemsNov-19-2025, 02:18:46 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Common Concerns 2 Novelty

Neural Information Processing SystemsNov-19-2025, 00:22:20 GMT

We thank all reviewers for their valuable comments. We address the concerns raised by them below. The idea of using imitation learning to make approximate decisions is not new. The author needs to provide a wall-clock time cost comparison of different methods. We will include them in the final verision.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)

Add feedback

FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning

Kim, Woosung, Lee, Jinho, Lee, Jongmin, Lee, Byung-Jun

arXiv.org Artificial IntelligenceNov-19-2025

Multi-objective reinforcement learning (MORL) aims to optimize policies in the presence of conflicting objectives, where linear scalarization is commonly used to reduce vector-valued returns into scalar signals. While effective for certain preferences, this approach cannot capture fairness-oriented goals such as Nash social welfare or max-min fairness, which require nonlinear and non-additive trade-offs. Although several online algorithms have been proposed for specific fairness objectives, a unified approach for optimizing nonlinear welfare criteria in the offline setting-where learning must proceed from a fixed dataset-remains unexplored. In this work, we present FairDICE, the first offline MORL framework that directly optimizes nonlinear welfare objective. FairDICE leverages distribution correction estimation to jointly account for welfare maximization and distributional regularization, enabling stable and sample-efficient learning without requiring explicit preference weights or exhaustive weight search. Across multiple offline benchmarks, FairDICE demonstrates strong fairness-aware performance compared to existing baselines.

justification, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2506.08062

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

Li, Yubo, Song, Weiyi

arXiv.org Artificial IntelligenceNov-19-2025

Current AI alignment through RLHF follows a single directional paradigm that AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.

machine learning, natural language, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2509.12179

Genre: Research Report (0.83)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
(2 more...)

Add feedback

SkillGen: Learning Domain Skills for In-Context Sequential Decision Making

Ding, Ruomeng, Cheng, Wei, Shao, Minglai, Zhao, Chen

arXiv.org Artificial IntelligenceNov-19-2025

Large language models (LLMs) are increasingly applied to sequential decision-making through in-context learning (ICL), yet their effectiveness is highly sensitive to prompt quality. Effective prompts should meet three principles: focus on decision-critical information, provide step-level granularity, and minimize reliance on expert annotations through label efficiency. However, existing ICL methods often fail to satisfy all three criteria simultaneously. Motivated by these challenges, we introduce SkillGen, a skill-based ICL framework for structured sequential reasoning. It constructs an action-centric, domain-level graph from sampled trajectories, identifies high-utility actions via temporal-difference credit assignment, and retrieves step-wise skills to generate fine-grained, context-aware prompts. We further present a theoretical analysis showing that focusing on high-utility segments supports task identifiability and informs more effective ICL prompt design. Experiments on ALFWorld, BabyAI, and ScienceWorld, using both open-source and proprietary LLMs, show that SkillGen achieves consistent gains, improving progress rate by 5.9%-16.5% on average across models.

large language model, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2511.1467

Country: Europe > Austria (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Cheng, Mingyue, Ouyang, Jie, Yu, Shuo, Yan, Ruiran, Luo, Yucong, Liu, Zirui, Wang, Daoyu, Liu, Qi, Chen, Enhong

arXiv.org Artificial IntelligenceNov-19-2025

Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Learning (RL) is considered a key technology with significant potential for training such Agents; however, the effective application of RL to LLM Agents is still in its nascent stages and faces considerable challenges. Currently, this emerging field lacks in-depth exploration into RL approaches specifically tailored for the LLM Agent context, alongside a scarcity of flexible and easily extensible training frameworks designed for this purpose. To help advance this area, this paper first revisits and clarifies Reinforcement Learning methodologies for LLM Agents by systematically extending the Markov Decision Process (MDP) framework to comprehensively define the key components of an LLM Agent. Secondly, we introduce Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM Agents, designed for straightforward adaptation across diverse task scenarios and interactive environments. We conducted experiments on Multihop QA benchmark tasks, providing initial validation for the effectiveness of our proposed methods and framework.

large language model, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2511.1446

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Achieving Safe Control Online through Integration of Harmonic Control Lyapunov-Barrier Functions with Unsafe Object-Centric Action Policies

Fawn, Marlow, Scheutz, Matthias

arXiv.org Artificial IntelligenceNov-19-2025

Open-world environments pose many challenges for autonomous robots as unexpected events or task modulations can make learned robot behavior inapplicable or obsolete. Consider, for example, a robot that has learned to autonomously perform a sorting task on a table top without any human interventions when a human co-worker steps in to help with finishing the task. This change in task environment now requires the robot to avoid colliding with the human whose arms are extended into the robot's work space and are dynamically changing position. Even if the robot has the perceptual capability to detect and track the human's arms and hands, its trained action policy does not provide a way to account for the motion constraints they impose. Or consider a delivery robot in a warehouse that has an optimized policy for traversing indoor spaces when dynamic constraints are imposed on where it can drive (e.g., because parts of the floor are painted).

constraint, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.436.7

2511.14434

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Add feedback

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Krohn, Rickmer, Prasad, Vignesh, Tiboni, Gabriele, Chalvatzaki, Georgia

arXiv.org Artificial IntelligenceNov-19-2025

Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2511.14427

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Filters

Collaborating Authors

Reinforcement Learning

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

61caa89f7a5366023db6f5736b2c579d-Paper-Conference.pdf

899511e37a8e01e1bd6f6f1d377cc250-Paper-Conference.pdf

Common Concerns 2 Novelty

FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning

Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

SkillGen: Learning Domain Skills for In-Context Sequential Decision Making

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Achieving Safe Control Online through Integration of Harmonic Control Lyapunov-Barrier Functions with Unsafe Object-Centric Action Policies

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning